Subset Feature Selection Approach for Class Imbalance
Abstract
In machine learning, building an effective classification model is a major challenge when high-dimensional data suffers from the class imbalance problem. The problem becomes severe when negative samples make up a much larger percentage of the data than positive samples. Various techniques exist to handle the data imbalance problem, including cost-sensitive learning, recognition-based techniques, and sampling-based techniques. However, these techniques suffer from data loss and overfitting because they invariably change the original distribution of the data. To address the data imbalance and high dimensionality issues in datasets, this thesis proposes a framework named Subset Feature Selection (SFS). The proposed SFS framework comprises SMOTE filters, which are used for balancing the datasets, and a feature ranker for pre-processing the data. The SFS framework is developed using the R language and various R packages. The performance of the SFS framework is evaluated, and the results show that it outperforms existing techniques such as cost-sensitive learning and recognition-based techniques.
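The two building blocks named in the abstract, SMOTE-based balancing and feature ranking, can be illustrated with a minimal sketch. This is not the thesis implementation, which was written in R; it is a standard-library Python illustration under stated assumptions. The function names `smote` and `rank_features`, the toy data, the Fisher-style ranking score, and the order of the two steps (balance first, then rank) are all assumptions, since the abstract does not specify which ranker or ordering SFS uses.

```python
import math
import random
from statistics import mean, pvariance


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (basic SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def rank_features(X, y):
    """Rank feature indices, best first, by a Fisher-style score.

    Assumed score: squared difference of class means divided by the
    sum of class variances. The thesis may use a different ranker.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        score = (mean(p) - mean(n)) ** 2 / (pvariance(p) + pvariance(n) + 1e-12)
        scores.append((score, j))
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: 20 negative and 4 positive samples.
# Feature 0 separates the classes; feature 1 does not.
neg = [[float(i % 5), float(i % 3)] for i in range(20)]
pos = [[10.0 + i, float(i % 3)] for i in range(4)]

# Step 1 (assumed order): oversample the minority class until balanced.
balanced_pos = pos + smote(pos, n_new=len(neg) - len(pos))

# Step 2: rank features on the balanced data.
X = neg + balanced_pos
y = [0] * len(neg) + [1] * len(balanced_pos)
ranking = rank_features(X, y)
print(len(neg), len(balanced_pos), ranking)
```

Because SMOTE only interpolates between existing minority samples, every synthetic point stays within the range spanned by the original positives, so no majority-class samples have to be discarded to balance the data.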
Description
Master of Engineering-Software Engineering
