Subset Feature Selection Approach for Class Imbalance

Abstract

In machine learning, building an effective classification model is a major challenge when high-dimensional data suffers from class imbalance. The problem becomes severe when negative samples make up a larger percentage of the data than positive samples. Various techniques exist to handle data imbalance, including cost-sensitive learning, recognition-based techniques, and sampling-based techniques. However, these techniques suffer from data loss and overfitting because they invariably change the original distribution of the data. To address both the data imbalance and the high dimensionality of such datasets, this thesis proposes a framework named Subset Feature Selection (SFS). The SFS framework comprises SMOTE filters, which are used to balance the datasets, and a feature ranker for pre-processing the data. The framework is developed in the R language using various R packages. The performance of the SFS framework is evaluated, and the results show that it outperforms existing techniques such as cost-sensitive learning and recognition-based techniques.
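
The two ingredients named in the abstract, SMOTE-style balancing and feature ranking, can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in R as the thesis was, and it is not the SFS implementation. The function names `smote` and `rank_features`, the toy data, the Fisher-like ranking score, and the order of the two steps are all assumptions made for illustration; the abstract does not specify which ranker or which ordering SFS uses.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(row, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rank_features(positives, negatives):
    """Return feature indices ordered from most to least discriminative.
    Assumed score: squared gap between class means over summed class variances."""
    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    scores = []
    for j in range(len(positives[0])):
        pos_col = [row[j] for row in positives]
        neg_col = [row[j] for row in negatives]
        gap = (mean(pos_col) - mean(neg_col)) ** 2
        scores.append(gap / (variance(pos_col) + variance(neg_col) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: 4 positive (minority) rows, 8 negative (majority) rows.
positives = [[5.0, 1.0, 0.2], [5.5, 0.0, 0.1], [6.0, 1.0, 0.3], [5.2, 0.0, 0.2]]
negatives = [[1.0, 0.0, 0.2], [1.5, 1.0, 0.1], [0.8, 0.0, 0.3], [1.2, 1.0, 0.2],
             [0.9, 0.0, 0.1], [1.1, 1.0, 0.3], [1.3, 0.0, 0.2], [1.0, 1.0, 0.1]]

# Step 1: oversample the minority class until both classes have equal size.
balanced_pos = positives + smote(positives, len(negatives) - len(positives))

# Step 2: rank the features on the balanced data; a subset would keep the top ones.
ranking = rank_features(balanced_pos, negatives)
print(len(balanced_pos), len(negatives), ranking)
```

Because the synthetic rows are interpolated between existing minority rows, no majority samples are discarded; the ranking step then addresses dimensionality by indicating which features to retain.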

Description

Master of Engineering-Software Engineering
