An Efficient Framework for Privacy Preservation for Big Data Applications

dc.contributor.author: Kaur, Harmanjeet
dc.contributor.supervisor: Kumar, Neeraj
dc.contributor.supervisor: Batra, Shalini
dc.date.accessioned: 2020-01-20T11:06:19Z
dc.date.available: 2020-01-20T11:06:19Z
dc.date.issued: 2020-01-20
dc.description.abstract [en_US]: In the modern data-driven world, the advantage of big data is realized only when the data is processed efficiently and the knowledge extracted from it informs decision making. Data mining techniques are used to discover interesting patterns and knowledge in large datasets. Giving data miners access to all the data may yield good analytics, but it also raises security challenges, since such data can be misused by malicious users. A balance must therefore be maintained between data availability and data security: the confidentiality of sensitive data has to be protected without degrading the efficiency of applications. Privacy-preserving data mining techniques extract useful information from data without compromising the security of the sensitive information it contains. Before any analysis is performed, a dataset is anonymized, either by encryption or by removing personally identifiable information, so that the people to whom the data refers remain anonymous. The datasets used for data mining can be centralized, owned by a single party, or distributed among multiple parties with a horizontal, vertical or arbitrary partitioning. Traditional cryptographic techniques for protecting the information incur large computation and communication overheads, especially for large datasets. Anonymization techniques have lower computation and communication overheads, but they carry a risk of re-identification: because a large amount of data is available, linking other data sources with the anonymized dataset raises the probability that records are re-identified. This thesis proposes a framework for privacy-preserving data mining on big data. Based on the proposed framework, two application domains have been identified.
The first is a privacy-preserving collaborative filtering technique for recommendation generation in a healthcare system where data is arbitrarily distributed among multiple healthcare sites. It is an item-based collaborative filtering technique in which item-item similarity is securely computed using a homomorphic encryption technique and a secure scalar dot product algorithm. The second is a cloud-based privacy-preserving collaborative filtering technique, based on a naive Bayesian classifier, for recommendation generation on data arbitrarily distributed among multiple parties. In this technique, the conditional probability is securely calculated using the proposed privacy-preserving conditional probability algorithm, and the prior probability is securely calculated using a homomorphic encryption technique. Both techniques are secure and have lower computation overhead than state-of-the-art privacy-preserving collaborative filtering techniques. Further, k-anonymization based on neural network and support vector machine classifiers helps anonymize social network data before it is shared or analyzed. The proposed technique is evaluated on several parameters: precision, recall, F-measure, information loss and average path length. This thesis concludes that efficient data analytics can be performed securely on both centralized and distributed datasets without much computational overhead.
dc.identifier.uri: http://hdl.handle.net/10266/5911
dc.language.iso [en_US]: en
dc.subject [en_US]: Privacy Preservation
dc.subject [en_US]: Health Care
dc.subject [en_US]: Big Data
dc.subject [en_US]: Encryption
dc.title [en_US]: An Efficient Framework for Privacy Preservation for Big Data Applications
dc.type [en_US]: Thesis
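
The abstract names a homomorphic encryption technique and a secure scalar dot product algorithm, but this record does not specify either one. As a minimal sketch only, assuming an additively homomorphic scheme such as Paillier (an assumption, not confirmed by the record), the toy Python below shows how one party could compute an encrypted dot product of its own plaintext vector with another party's encrypted vector, which is the quantity an item-item similarity would be built from. The function names, the tiny primes and the sample rating vectors are hypothetical and insecure; they serve only to illustrate the idea.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no large exponentiation is needed.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def secure_dot(n, enc_x, y):
    """Compute Enc(sum x_i * y_i) from Enc(x_i) and plaintext y_i.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to y_i
    multiplies its plaintext by y_i. The holder of y never sees x.
    """
    n2 = n * n
    acc = encrypt(n, 0)  # fresh encryption of 0 re-randomizes the result
    for c, yi in zip(enc_x, y):
        acc = acc * pow(c, yi, n2) % n2
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    x = [5, 3, 0, 4]  # hypothetical ratings held by site A
    y = [4, 0, 2, 5]  # hypothetical ratings held by site B
    enc_x = [encrypt(n, xi) for xi in x]   # site A encrypts and sends
    enc_dot = secure_dot(n, enc_x, y)      # site B combines under encryption
    print(decrypt(n, priv, enc_dot))       # site A decrypts the dot product
```

In this sketch site A learns only the dot product and site B sees only ciphertexts. A real deployment would need cryptographically sized keys and a vetted library, and the protocol in the thesis may differ from this one.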

Files

Original bundle

Name: An Efficient Framework for Privacy Preservation for Big Data Applications.pdf
Size: 2.78 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission