Please use this identifier to cite or link to this item:
Title: An Efficient Approach for Outlier Detection in Big Data
Authors: Saneja, Bharti
Supervisor: Rani, Rinkle
Keywords: Outlier detection;Body Area Networks;Sensors;Clustering;Classification
Issue Date: 28-Aug-2019
Abstract: Outlier detection is an important aspect of data mining that discovers unusual events occurring in data. Big data holds a large volume of hidden knowledge and many insights, which raises significant challenges for knowledge discovery. In certain kinds of data, the associations among the different attributes are of much more significance than the information itself. Hence, in such datasets these associations need to be extracted before outliers are detected. The associations can be mined by analyzing the correlation among the various attributes. However, it is very challenging to derive full benefit from a large amount of complex data. To address these issues, various methods for analyzing correlation are studied, along with existing approaches for outlier detection based on supervised and unsupervised learning models. In recent times, these approaches have become an indispensable tool for detecting anomalous events in various domains. With the advancement of sensor technologies, a large amount of data is being generated by wireless sensors in various application domains. This study focuses on data generated by wireless body sensor networks. Because a caretaker may not always be available to monitor physiological parameters, different sensors are attached to the patient's body to monitor the patient's health remotely. Outlier detection in this domain detects anomalous activities based on the sensor measurements and differentiates a sensor fault from a true medical condition. This thesis carries out research in the field of outlier detection in wireless body area sensor networks. The key objective of the research is to explore the benefits of using the distributed MapReduce framework for outlier detection. An approach is proposed to detect outliers based on the assumption that data attributes are linearly related to each other.
Further, in real application scenarios none of the sensors exhibits a truly linear relationship. Hence, to deal with the non-linear aspect of the data, the proposed approach is further enhanced so that it can detect outliers in datasets where data attributes are linearly or non-linearly correlated. Both proposed approaches are shown to be more effective than other competing approaches in terms of processing time and accuracy of outlier detection. The approaches are also tested for scalability on a multi-node Hadoop cluster of eight nodes. Furthermore, an integrated framework for outlier detection is proposed that is based on data compression, data clustering, and cluster refinement. The clustering algorithm in the proposed framework works on the principle of the clonal selection algorithm and uses the objective function of fuzzy clustering. The clusters formed by the proposed clustering algorithm are seen to have more optimal structures than those of state-of-the-art clustering algorithms. The formed clusters are further refined using a cluster refinement algorithm to increase the accuracy of outlier detection. The results show that the proposed framework outperforms the competing algorithms in terms of processing time and detection rate. It is suggested that utilizing the correlation between attributes detects discriminative and significant events, which can help in the accurate classification of events and reduce false alarms, which in turn can aid better utilization of resources.
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Bhart_thesis.pdf
Size: 1.94 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.