Role of Feature Selection in Data Filtering: A Comparative Analysis
| dc.contributor.author | Dhayal, Kamlesh | |
| dc.contributor.supervisor | Batra, Shalini | |
| dc.date.accessioned | 2009-07-27T07:08:33Z | |
| dc.date.available | 2009-07-27T07:08:33Z | |
| dc.date.issued | 2009-07-27T07:08:33Z | |
| dc.description.abstract | The quality of the data is one of the most important factors influencing the performance of any classification or clustering algorithm. The attributes defining the feature space of a given data set can often be inadequate, which makes it difficult to discover interesting knowledge or obtain the desired output. However, even when the original attributes are individually inadequate, it is often possible to combine them to construct new attributes with greater predictive power. Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant and noisy data, and improving the comprehensibility of results. This thesis addresses the task of feature selection for clustering and classification. Its goal is to find the best feature subset among the given features in order to improve the performance of classification and clustering techniques on complex, real-world data. When partitioning a document collection into clusters of similar documents, the choice of good features, along with a good clustering algorithm, is very important. Feature selection is also an important part of automatic text categorization and can change the resulting text clusters entirely. This thesis approaches the problem of feature selection for machine learning through various methods. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. The thesis gives a comparative study of a variety of feature selection methods for data mining, including Information Gain (IG) and the χ² statistic (CHI), using Weka, an open-source data mining tool. | en |
| dc.format.extent | 2017879 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | http://hdl.handle.net/10266/819 | |
| dc.language.iso | en | en |
| dc.subject | classification | en |
| dc.subject | feature selection | en |
| dc.title | Role of Feature Selection in Data Filtering: A Comparative Analysis | en |
| dc.type | Thesis | en |
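
The abstract names Information Gain (IG) and the χ² statistic (CHI) as two of the methods compared. As a rough illustration of what these scores measure, the sketch below computes both for discrete features in plain Python. It is not the Weka workflow used in the thesis: the function names (`entropy`, `information_gain`, `chi_square`, `rank_features`) and the toy spam/ham data are assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in class entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/class contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for y, y_count in class_counts.items():
            # Expected cell count if feature and class were independent.
            expected = f_count * y_count / total
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns,
                  key=lambda name: score(columns[name], labels),
                  reverse=True)


# Hypothetical toy data: does a word occur (1) or not (0) in each document?
toy_labels = ["spam", "spam", "ham", "ham"]
toy_columns = {
    "hello": [1, 0, 1, 0],  # occurs equally often in both classes
    "offer": [1, 1, 0, 0],  # occurs only in spam documents
}

ig_ranking = rank_features(toy_columns, toy_labels, information_gain)
chi_ranking = rank_features(toy_columns, toy_labels, chi_square)
```

On this toy data both scores rank `offer` above `hello`, since `offer` separates the two classes perfectly while `hello` carries no class information. Both measures score each feature on its own, so neither captures the redundancy between features that the central hypothesis in the abstract is concerned with.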
