Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/5911
Title: | An Efficient Framework for Privacy Preservation for Big Data Applications |
Authors: | Kaur, Harmanjeet |
Supervisor: | Kumar, Neeraj; Batra, Shalini |
Keywords: | Privacy Preservation;Health Care;Big Data;Encryption |
Issue Date: | 20-Jan-2020 |
Abstract: | In the modern data-driven world, the real advantage of big data is realized only when the data is processed efficiently and the knowledge extracted from it serves as an important component in decision making. Data mining techniques are used to discover interesting patterns and knowledge in large datasets. Providing all the data to data miners may yield good analytics, but it also raises many security challenges, since such data can be misused by malicious users. A balance must therefore be maintained between data availability and data security: the confidentiality of sensitive data has to be protected without affecting the efficiency of applications. Privacy preserving data mining techniques extract useful information from data without compromising the security of the sensitive information it contains. Before any analysis is performed on a dataset, it is anonymized, either by encryption techniques or by removing personally identifiable information, so that the person to whom the data refers remains anonymous. The datasets used for data mining can be centralized, owned by a single owner, or distributed among multiple parties with horizontal, vertical or arbitrary distribution. Traditional cryptographic techniques for protecting the information lead to large computation and communication overheads, especially for large datasets. Anonymization techniques have lower computation and communication overheads, but they carry a risk of re-identification: since a large amount of data is available, linking different data sources with the anonymized dataset raises the probability of re-identification. This thesis proposes a framework for privacy preserving data mining on big data. Based on the proposed framework, two application domains have been identified.
The first is a privacy preserving collaborative filtering technique for recommendation generation in a healthcare system where data is arbitrarily distributed among multiple healthcare sites. It is an item-based collaborative filtering technique in which item-item similarity is securely computed using a homomorphic encryption technique and a secure scalar dot product algorithm. The second is a cloud-based privacy preserving collaborative filtering technique based on a naive Bayesian classifier for recommendation generation on data arbitrarily distributed among multiple parties. In this technique, conditional probability is securely calculated using the proposed privacy preserving conditional probability algorithm, and prior probability is securely calculated using a homomorphic encryption technique. Both techniques are secure and have less computation overhead than state-of-the-art privacy preserving collaborative filtering techniques. Further, k-anonymization based on neural network and support vector machine classifiers helps anonymize social network data before it is shared or analyzed. The proposed technique is evaluated on several parameters: precision, recall, F-measure, information loss and average path length. This thesis work concludes that efficient data analytics can be performed securely on both centralized and distributed datasets without much computational overhead. |
URI: | http://hdl.handle.net/10266/5911 |
Appears in Collections: | Doctoral Theses@CSED |
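Illustrative note: the abstract mentions computing item-item similarity with homomorphic encryption and a secure scalar dot product. The sketch below is not the thesis's algorithm; it is a minimal, generic example of the idea, assuming a textbook Paillier (additively homomorphic) scheme with toy key sizes. The names `encrypt`, `decrypt` and `encrypted_dot` and the sample rating vectors are hypothetical. One party encrypts its rating vector, the other combines the ciphertexts with its own plaintext vector, and only the key holder can decrypt the resulting dot product.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes.
# Assumption: illustration only; real use needs much larger random primes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public modulus
N2 = N * N
G = N + 1                      # standard generator choice g = n + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private
MU = pow(LAM, -1, N)           # private: inverse of lambda mod n (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt a non-negative integer m < N under the public key."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key: m = L(c^lambda mod n^2) * mu mod n."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def encrypted_dot(enc_a, b):
    """Combine ciphertexts Enc(a_i) with plaintext weights b_i.

    Multiplying ciphertexts adds plaintexts, and raising a ciphertext to
    b_i multiplies its plaintext by b_i, so the result is Enc(sum a_i*b_i).
    """
    acc = encrypt(0)
    for c, w in zip(enc_a, b):
        acc = (acc * pow(c, w, N2)) % N2
    return acc


# Hypothetical ratings of two items held by two different sites.
site_a = [5, 3, 0, 4]
site_b = [4, 0, 2, 5]
enc_a = [encrypt(x) for x in site_a]      # sent from site A to site B
enc_result = encrypted_dot(enc_a, site_b)  # computed by site B, sent back
dot = decrypt(enc_result)                  # decrypted by site A
```

In this sketch site B never sees site A's ratings in the clear, while site A learns only the dot product. A similarity such as cosine similarity could then be formed from that dot product and the vector norms, under whatever protocol the parties agree on for sharing the norms.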
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
An Efficient Framework for Privacy Preservation for Big Data Applications.pdf | | 2.84 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.