Role of Feature Selection in Data Filtering: A Comparative Analysis

Dhayal, Kamlesh

Files

819 Kamlesh Dhayal (80732029).pdf (1.82 MB)

Date

2009-07-27T07:08:33Z

Authors

Dhayal, Kamlesh

Supervisors

Batra, Shalini

Abstract

The quality of the data is one of the most important factors influencing the performance of any classification or clustering algorithm. The attributes defining the feature space of a given data set can often be inadequate, which makes it difficult to discover interesting knowledge or produce the desired output. However, even when the original attributes are individually inadequate, it is often possible to combine them to construct new attributes with greater predictive power. Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant and noisy data, and improving the comprehensibility of results. This thesis addresses the task of feature selection for clustering and classification. Its goal is to find the best feature subset from the given features in order to improve the performance of classification and clustering techniques on complex, real-world data. Partitioning a given document collection into clusters of similar documents requires a good choice of features along with a good clustering algorithm. Feature selection is an important part of automatic text categorization and can change the resulting text clusters entirely. This thesis addresses the problem of feature selection for machine learning through various methods. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. This thesis gives a comparative study of a variety of feature selection methods for data mining, including Information Gain (IG) and the χ² statistic (CHI), using Weka, an open-source data mining tool.
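
The abstract does not state the feature evaluation formula itself. As an illustration of the central hypothesis only, the sketch below assumes the formula takes the form of the merit heuristic commonly used in correlation-based feature selection, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The function name `cfs_merit` and the input values are hypothetical and are not taken from the thesis.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Score a feature subset with a correlation-based merit heuristic.

    ASSUMPTION: this form of the formula is not given in the abstract;
    it is the heuristic commonly used in correlation-based feature selection.

    feature_class_corr   -- one correlation with the class per feature
    feature_feature_corr -- one correlation per pair of features
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("the feature subset must not be empty")

    # Mean absolute feature-class correlation (relevance).
    r_cf = sum(abs(c) for c in feature_class_corr) / k

    # Mean absolute feature-feature correlation (redundancy);
    # a single-feature subset has no pairs, so redundancy is zero.
    if feature_feature_corr:
        r_ff = sum(abs(c) for c in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0

    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical values: two equally relevant features, first uncorrelated
# with each other, then perfectly redundant.
independent = cfs_merit([0.6, 0.6], [0.0])
redundant = cfs_merit([0.6, 0.6], [1.0])
print(independent > redundant)
```

Under this assumed form, a subset whose features are equally relevant but mutually uncorrelated scores higher than a redundant one, which is the behaviour the hypothesis describes.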

Keywords

classification, feature selection

URI

http://hdl.handle.net/10266/819

Collections

Masters Theses@CSED
