Hierarchical Clustering Algorithm for Big Data using Hadoop and Mapreduce

dc.contributor.author: Lathiya, Piyush
dc.contributor.supervisor: Rani, Rinkle
dc.date.accessioned: 2016-08-05T10:53:23Z
dc.date.available: 2016-08-05T10:53:23Z
dc.date.issued: 2016-08-05
dc.description: Master of Engineering-Software Engineering (en_US)
dc.description.abstract: Mining massive data sets is a pressing need in today's computer science industry. The exponential growth in the number of internet users and in the volume of available data forces researchers to think about efficient approaches for storing data and extracting useful patterns from it. Extracting useful information from massive data and processing it in a short span of time has become a crucial part of data mining. Many approaches exist for clustering data objects based on similarity. CURE (Clustering Using Representatives) is a very useful hierarchical algorithm that can identify clusters of arbitrary shape and detect outliers. However, the traditional CURE algorithm runs on a single machine and therefore cannot cluster large amounts of data efficiently. In this thesis, the CURE algorithm is proposed for a distributed environment using Hadoop. A distributed environment is an efficient solution for processing huge amounts of data and extracting useful patterns from it, so clustering of data objects is performed with the MapReduce programming model. Another advantage of the CURE algorithm is that it detects outlier points and removes them from further clustering, which improves the quality of the clusters. The major focus of this thesis is exploring a new approach to clustering data objects using the CURE algorithm in a Hadoop distributed environment, and exploring the effect of different parameters on outlier detection. (en_US)
dc.identifier.uri: http://hdl.handle.net/10266/4021
dc.language.iso: en (en_US)
dc.subject: Hierarchical Clustering (en_US)
dc.subject: Hadoop (en_US)
dc.subject: Mapreduce (en_US)
dc.title: Hierarchical Clustering Algorithm for Big Data using Hadoop and Mapreduce (en_US)
dc.type: Thesis (en_US)

Files

Original bundle

Name: 4021.pdf
Size: 2.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission