Title: Hierarchical Clustering Algorithm for Big Data using Hadoop and Mapreduce
Authors: Lathiya, Piyush
Supervisor: Rani, Rinkle
Keywords: Hierarchical Clustering;Hadoop;Mapreduce
Issue Date: 5-Aug-2016
Abstract: Mining massive data sets is a pressing need in the computing industry today. The exponential growth in the number of Internet users and in the volume of available data forces researchers to look for efficient ways to store data and extract useful patterns from it. Extracting useful information from massive data and processing it in a short span of time has become a crucial part of data mining. Many approaches exist for clustering data objects based on similarity. CURE (Clustering Using Representatives) is a hierarchical clustering algorithm that can identify clusters of arbitrary shape and detect outliers. However, the traditional CURE algorithm runs on a single machine and therefore cannot cluster large amounts of data efficiently. In this thesis, a version of the CURE algorithm is proposed for a distributed environment using Hadoop. A distributed environment is an efficient way to process huge amounts of data and extract useful patterns from it, so the clustering of data objects is performed with the MapReduce programming model. Another advantage of the CURE algorithm is that it detects outlier points and removes them from the rest of the clustering process, which improves the quality of the clusters. The major focus of this thesis is a new approach to clustering data objects using the CURE algorithm in a Hadoop distributed environment, together with a study of the effect of different parameters on outlier detection.
Description: Master of Engineering-Software Engineering
Appears in Collections:Masters Theses@CSED

Files in This Item:
File: 4021.pdf
Size: 2.45 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.