Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/3115
Title: Membership Set Based K_MEANS Approach for High Dimensional Data
Authors: Sharma, Varun Kumar
Supervisor: Bala, Anju
Keywords: Data Mining; K-Means; Clustering
Issue Date: 1-Sep-2014
Abstract: With the development of information technology, computers now assist people in nearly every aspect of life. In recent years, organizations' capacity to produce data has grown steadily. Gmail, Facebook, YouTube, Twitter, Yahoo, blogs, and other social networking websites all generate very large volumes of data, and these large datasets store high-dimensional data. Data mining is a set of data analysis methodologies that process this voluminous data and summarize it so that it can be easily understood. Data analysis can be exploratory or confirmatory. Exploratory analysis is used when the analyst has no prior model and wants to categorize the data on the basis of general features. In confirmatory analysis, by contrast, the classes are known and the data must be classified into them. The objective of clustering is to find groups of similar items. Similarity is defined on the basis of characteristics, also known as the dimensions of the data. Since most numeric data lies in Euclidean space, the measure that clustering algorithms use to assess similarity is distance. When the data is small and has few characteristics, the human eye can categorize it well, but once the data grows beyond three dimensions it can no longer be categorized by inspection, and clustering algorithms are needed. K-means is the most widely used clustering algorithm. It is an iterative approach that assigns points to k clusters. The standard k-means algorithm has several issues, among them its high time complexity on high-dimensional data. The research community has suggested several improvements, but on high-dimensional data the cost remains prohibitive because computing the distance function takes too much time and becomes a bottleneck. Therefore, a membership set based k-means approach is proposed to reduce this computation.
It defines a cluster membership set for every data point. The distance function is calculated only for the clusters contained in this set. With this membership set, the overall complexity of the algorithm is reduced.
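The idea of restricting distance computations to a per-point membership set can be illustrated with a minimal Python sketch. The abstract does not say how the membership set is built, so the rule used here (the `m` centroids nearest to each point after one full distance pass over the initial centroids) is an assumption made only for illustration and is not necessarily the thesis's method. The function name `membership_kmeans` and its parameters `m`, `n_iter`, and `seed` are likewise hypothetical.

```python
import numpy as np

def membership_kmeans(X, k, m=3, n_iter=20, seed=0):
    """Sketch of k-means in which each point is compared only with the
    centroids listed in its membership set (hypothetical construction)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = min(m, k)

    # Pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)

    # ASSUMPTION: one full n-by-k distance pass builds the membership sets;
    # each point keeps the indices of its m nearest initial centroids.
    full = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    members = np.argsort(full, axis=1)[:, :m]   # shape (n, m)
    labels = members[:, 0].copy()               # nearest initial centroid

    for _ in range(n_iter):
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)

        # Restricted assignment step: only n-by-m distances are computed,
        # instead of the n-by-k distances of standard k-means.
        cand = centroids[members]                         # shape (n, m, d)
        dist = np.linalg.norm(X[:, None, :] - cand, axis=2)
        new_labels = members[np.arange(n), dist.argmin(axis=1)]

        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, centroids
```

With `m` equal to `k` the sketch reduces to standard k-means. With `m` smaller than `k`, each iteration computes `n * m` distances rather than `n * k`, at the risk of missing a centroid that lies outside a point's membership set.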
Description: ME, CSED
URI: http://hdl.handle.net/10266/3115
Appears in Collections: Masters Theses@CSED

Files in This Item:
File        Description  Size       Format
3115.pdf                 985.66 kB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.