Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/4208
Title: Design and Performance Evaluation of Density Based Anomaly Detection Clustering Algorithms using Hadoop and Map Reduce
Authors: Behera, Sourajit
Supervisor: Rani, Rinkle
Keywords: Anomaly Detection, Density based clustering, Hadoop, MapReduce
Issue Date: 31-Aug-2016
Abstract: Advances in technology over the last few decades have produced loads of data from many different sources, including social media, customer-centric data, and online transactions, to mention a few. Companies and individuals are therefore keen to analyze this data, which is constantly increasing in both volume and complexity, using effective data mining algorithms so that they can make better decisions based on the analysis. Data mining offers various approaches to meet present-day demands of data analysis. Clustering is one such technique: it groups instances in a data set that are more similar to each other than to instances in other groups. Clustering also helps to detect data instances that do not conform to the notion of a well-defined instance and are therefore suspected of having been generated by some external process. DBSCAN and OPTICS are classified under the clustering approach to data mining. Because of the high volume and complexity of data, the Hadoop framework has been in demand for performing operations on the data. Much work has already been done on proposing the individual algorithms and implementing them on smaller data sets. In this thesis, we focus on creating a multi-node Hadoop cluster and comparing the performance of DBSCAN and OPTICS by implementing them on real data sets on multi-node clusters using the MapReduce programming approach. The implementation on a multi-node cluster shows that OPTICS is slower than DBSCAN by a factor of 1.5-1.6. It is also observed that increasing the number of nodes in the cluster reduces the execution time of the algorithms on the real data sets.
URI: http://hdl.handle.net/10266/4208
Appears in Collections: Masters Theses@CSED

Files in This Item:
File: 4208.pdf
Size: 33.55 MB
Format: Adobe PDF

