Role of Feature Selection in Data Filtering: A Comparative Analysis

Dhayal, Kamlesh

dc.contributor.author	Dhayal, Kamlesh
dc.contributor.supervisor	Batra, Shalini
dc.date.accessioned	2009-07-27T07:08:33Z
dc.date.available	2009-07-27T07:08:33Z
dc.date.issued	2009-07-27T07:08:33Z
dc.description.abstract	The quality of the data is one of the most important factors influencing the performance of any classification or clustering algorithm. The attributes defining the feature space of a given data set are often inadequate, which makes it difficult to discover interesting knowledge or the desired output. However, even when the original attributes are individually inadequate, it is often possible to combine them to construct new attributes with greater predictive power. Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant data and noise, and improving the comprehensibility of results. This thesis addresses the task of feature selection for clustering and classification. Its goal is to find the best feature subset from the given features in order to improve the performance of classification and clustering techniques on complex, real-world data. To partition a given document collection into clusters of similar documents, choosing good features, along with a good clustering algorithm, is very important. Feature selection is an important part of automatic text categorization and can change the resulting text clusters entirely. The thesis approaches feature selection for machine learning through several methods. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. The thesis presents a comparative study of a variety of feature selection methods for data mining, including Information Gain (IG) and the χ² statistic (CHI), using Weka, an open-source data mining tool.	en
dc.format.extent	2017879 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10266/819
dc.language.iso	en	en
dc.subject	classification	en
dc.subject	feature selection	en
dc.title	Role of Feature Selection in Data Filtering: A Comparative Analysis	en
dc.type	Thesis	en
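The abstract names Information Gain (IG) and the χ² statistic (CHI) among the compared filter criteria. As an illustration only, the sketch below scores one discrete feature (for example, whether a term occurs in a document) against the class labels with both measures. The function names and the toy data are hypothetical; this is neither the thesis's code nor Weka's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG = H(C) - sum over values v of P(F=v) * H(C | F=v), for a discrete feature F."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value by class contingency table."""
    n = len(labels)
    f_counts, c_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))  # missing (value, class) pairs count as 0
    stat = 0.0
    for v, fv in f_counts.items():
        for c, nc in c_counts.items():
            expected = fv * nc / n
            stat += (joint[(v, c)] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    # Hypothetical toy corpus: four documents, two classes, two candidate terms.
    labels = ["sport", "sport", "tech", "tech"]
    terms = {
        "goal": [1, 1, 0, 0],  # occurs only in sport documents
        "the": [1, 0, 1, 0],   # occurs equally often in both classes
    }
    # Rank terms by IG (higher means more informative about the class).
    for name in sorted(terms, key=lambda t: information_gain(terms[t], labels), reverse=True):
        col = terms[name]
        print(name, information_gain(col, labels), chi_square(col, labels))
```

On this toy data "goal" gets IG 1.0 and χ² 4.0, while "the" scores 0 on both, so a filter that keeps the top-ranked terms would discard "the".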

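The abstract does not reproduce its test-theory-based evaluation formula. Assuming it is the merit heuristic of correlation-based feature selection (CFS), a subset S of k features scores Merit_S = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff is the mean feature-feature inter-correlation. The sketch below uses absolute Pearson correlation for both averages as a simplification; CFS as usually described measures correlation between discrete attributes with symmetrical uncertainty instead. The names and data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, target):
    """Merit of a feature subset: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(col, target)) for col in subset) / k
    pairs = list(combinations(subset, 2))
    # With a single feature there are no pairs, so the inter-correlation term is 0.
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

if __name__ == "__main__":
    x1 = [1, 1, -1, -1]
    x2 = [1, -1, 1, -1]                   # uncorrelated with x1
    y = [a + b for a, b in zip(x1, x2)]   # target depends on both features
    print(cfs_merit([x1], y))             # one relevant feature
    print(cfs_merit([x1, x2], y))         # two complementary features
    print(cfs_merit([x1, x1], y))         # a feature plus a redundant copy
```

The three calls illustrate the hypothesis: x1 alone scores about 0.707, adding the uncorrelated but relevant x2 raises the merit to 1.0, and adding a redundant copy of x1 leaves it at about 0.707.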
Files

Original bundle

Name: 819 Kamlesh Dhayal (80732029).pdf
Size: 1.82 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.79 KB
Format:: Item-specific license agreed upon to submission

Collections

Masters Theses@CSED