Hierarchical Clustering Algorithm for Big Data using Hadoop and Mapreduce

Lathiya, Piyush

Files

4021.pdf (2.39 MB)

Date

2016-08-05

Authors

Lathiya, Piyush

Supervisors

Rani, Rinkle

Abstract

Mining massive data sets is a pressing need in the computer science industry today. The exponential growth in the number of Internet users and in the volume of available data forces researchers to look for efficient approaches to storing data and extracting useful patterns from it. Extracting useful information from massive data and processing it in a short span of time has become a crucial part of data mining. Many approaches exist for clustering data objects based on similarity. CURE (Clustering Using Representatives) is a hierarchical algorithm that can identify clusters of arbitrary shape and can also identify outliers. However, the traditional CURE algorithm runs on a single machine and therefore cannot cluster large amounts of data efficiently. In this thesis, the CURE algorithm is proposed for a distributed environment using Hadoop. A distributed environment is an efficient way to process huge amounts of data and extract useful patterns from it, so clustering of data objects is performed with the MapReduce programming model. Another advantage of the CURE algorithm is that it detects outlier points and removes them from further clustering, which improves the quality of the clusters. The major focus of this thesis is exploring a new approach to clustering data objects using the CURE algorithm in a Hadoop distributed environment, and exploring the effect of different parameters on outlier detection.

Description

Master of Engineering-Software Engineering

Keywords

Hierarchical Clustering, Hadoop, Mapreduce

URI

http://hdl.handle.net/10266/4021

Collections

Masters Theses@CSED
