An Efficient Framework for Privacy Preservation for Big Data Applications

Kaur, Harmanjeet

Files

Primary An Efficient Framework for Privacy Preservation for Big Data Applications.pdf (2.78 MB)

Date

2020-01-20

Authors

Kaur, Harmanjeet

Supervisors

Kumar, Neeraj

Batra, Shalini

Abstract

In the modern data-driven world, the real advantage of big data is realized only when the data is processed efficiently and the knowledge extracted from it informs decision making. Data mining techniques are used to discover interesting patterns and knowledge in large datasets. Giving data miners access to all of the data may yield good analytics, but it also raises security challenges, since malicious users can misuse such data. A balance must therefore be maintained between data availability and data security: the confidentiality of sensitive data has to be protected without degrading the efficiency of applications. Privacy-preserving data mining techniques extract useful information from data without compromising the security of the sensitive information it contains. Before any analysis is performed, the dataset is anonymized, either by encryption or by removing personally identifiable information, so that the person to whom the data refers remains anonymous. The datasets used for data mining can be centralized and owned by a single party, or distributed among multiple parties with a horizontal, vertical, or arbitrary partitioning. Traditional cryptographic techniques for protecting the information incur large computation and communication overheads, especially for large datasets. Anonymization techniques have lower computation and communication overheads, but they carry a risk of re-identification: because a large amount of data is available, linking other data sources with the anonymized dataset raises the probability of re-identification. This thesis proposes a framework for privacy-preserving data mining on big data. Based on the proposed framework, two application domains have been identified.
The first is a privacy-preserving collaborative filtering technique for recommendation generation in a healthcare system where data is arbitrarily distributed among multiple healthcare sites. It is an item-based collaborative filtering technique in which item-item similarity is computed securely using a homomorphic encryption technique and a secure scalar dot product algorithm. The second is a cloud-based privacy-preserving collaborative filtering technique, based on a naive Bayesian classifier, for recommendation generation on data arbitrarily distributed among multiple parties. In this technique, the conditional probability is calculated securely using the proposed privacy-preserving conditional probability algorithm, and the prior probability is calculated securely using homomorphic encryption. Both techniques are secure and have lower computation overhead than state-of-the-art privacy-preserving collaborative filtering techniques. Further, k-anonymization based on neural network and support vector machine classifiers helps anonymize social network data before it is shared or analyzed. The proposed technique is evaluated on several parameters: precision, recall, F-measure, information loss, and average path length. This thesis concludes that efficient data analytics can be performed securely on both centralized and distributed datasets without much computational overhead.
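As an illustration of the general idea behind a secure scalar dot product over additively homomorphic ciphertexts, the following is a minimal sketch and not the algorithm proposed in the thesis. It assumes a textbook Paillier scheme with toy-sized primes (insecure, for demonstration only) and two hypothetical parties: site A holds the key pair and vector `a`, site B holds vector `b`. All function and variable names are illustrative assumptions.

```python
import math
import random

def keygen(p=1789, q=1867):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as (n+1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def encrypted_dot(n, enc_a, b):
    """Site B: combine A's ciphertexts with B's plaintext vector.

    Multiplying ciphertexts adds plaintexts, and raising a ciphertext
    to a power w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = encrypt(n, 0)  # fresh encryption of 0 also re-randomizes the result
    for c, w in zip(enc_a, b):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

# Site A encrypts its vector and sends only ciphertexts to site B.
n, priv = keygen()
a = [3, 0, 5, 2]
enc_a = [encrypt(n, x) for x in a]

# Site B never sees a; it returns a single ciphertext of the dot product.
b = [1, 4, 2, 5]
enc_result = encrypted_dot(n, enc_a, b)

# Site A decrypts the result.
dot = decrypt(n, priv, enc_result)
print(dot)  # 23
```

In this sketch site A learns only the dot product and site B learns nothing about `a`. A similarity measure such as cosine similarity would additionally need the vector norms, which are outside the scope of this example.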

Keywords

Privacy Preservation, Health Care, Big Data, Encryption

URI

http://hdl.handle.net/10266/5911

Collections

Doctoral Theses@CSED
