Novel Technique(s) for Concept Drift Detection and Handling
Abstract
Learning in dynamic environments, where continuous change and development are evident,
is a challenging task. Recent advances in technology have led to an increase in the number
of real-world applications, including spam filtering, fraud detection, weather forecasting,
sensors, smart cities, and health monitoring. Data generated from such sources in the form
of streams tends to evolve over time. Predictive models trained on such data tend to become
obsolete with time, resulting in poor adaptability to the underlying drifting distributions.
In machine learning and data mining, this change in the statistical properties of data is
known as concept drift. Such changes degrade the performance of learning systems, since
models built on old data are no longer consistent with the new data.
To address the problem of concept drift, efficient learning models are required that can
monitor the evolving distributions and update themselves regularly. These models should
detect drifts and handle them in a timely manner using adaptive learning techniques, to
counter the deteriorating performance. Various learning methods, including single learners
as well as ensemble-based models that utilize drift detectors, are used in the literature to
handle evolving data streams.
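
To make the idea of explicit drift detection concrete, the following is a minimal sketch of an error-rate drift detector in the style of the well-known Drift Detection Method (DDM). It is not one of the techniques proposed in this thesis; the class name `SimpleDDM`, the helper `first_drift_index`, and the default thresholds are illustrative assumptions.

```python
import math


class SimpleDDM:
    """Illustrative DDM-style detector (an assumption, not the thesis's method).

    It tracks the running error rate p and its standard deviation s, remembers
    the point where p + s was lowest, and signals when the current p + s rises
    well above that minimum.
    """

    def __init__(self, min_samples=30, warn_factor=2.0, drift_factor=3.0):
        self.min_samples = min_samples
        self.warn_factor = warn_factor
        self.drift_factor = drift_factor
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) p + s seen so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # Note: an error-free prefix gives s_min == 0, so the first error then
        # triggers a drift signal; this sketch does not guard against that.
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()  # start monitoring the new concept from scratch
            return "drift"
        if p + s > self.p_min + self.warn_factor * self.s_min:
            return "warning"
        return "stable"


def first_drift_index(outcomes):
    """Return the 1-based position of the first drift signal, or None."""
    detector = SimpleDDM()
    for i, is_error in enumerate(outcomes, start=1):
        if detector.update(is_error) == "drift":
            return i
    return None
```

In an adaptive learner, a `'warning'` signal would typically start training a background model and a `'drift'` signal would replace the current model with it; how the proposed approaches react to drift is described in the thesis itself.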
This thesis proposes three techniques for concept drift detection and handling. The
first, a hybrid diversity-based ensemble approach called Ensemble Based Online Diversified
Drift Detection (En-ODDD), combines explicit drift detection and adaptive techniques to deal
with drifting distributions. In the second approach, Two-Level Pruning based Ensemble with
Abstained Learners (TLP-EnAbLe), a similarity-based pruning strategy is proposed
for adapting to all types of drift patterns. The third approach, Dynamically Adaptive and
Diverse Dual Ensemble (DA-DDE), utilizes the characteristics of both online and block-based
ensemble techniques for concept drift handling. It proposes a dual ensemble mechanism for
handling abrupt and gradual drifts separately, based on a novel Dynamic Dual
Selective Voting Mechanism (DDSVM) for ensemble selection and hypothesis generation.
The performance of the proposed approaches has been evaluated through comparative
analysis with existing concept drift techniques, using standard evaluation parameters
including classification accuracy, kappa statistic, train time, test time, and memory
consumption. Experiments conducted on several real datasets and artificially generated
data streams, with a variety of drift patterns, indicate that all three approaches
handle concept drift scenarios effectively and give better classification results.
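
As a brief illustration of one of the evaluation parameters named above, the sketch below computes the kappa statistic (Cohen's kappa), which corrects classification accuracy for agreement expected by chance. The function name `kappa_statistic` and the convention of returning 0.0 when chance agreement equals 1 are assumptions made for this example, not details taken from the thesis.

```python
from collections import Counter


def kappa_statistic(y_true, y_pred):
    """Cohen's kappa: (p0 - pe) / (1 - pe).

    p0 is the observed accuracy; pe is the agreement expected by chance,
    computed from the class frequencies of the true and predicted labels.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # A Counter returns 0 for classes that were never predicted.
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if pe == 1.0:
        # Kappa is undefined here; returning 0.0 is an assumed convention.
        return 0.0
    return (p0 - pe) / (1.0 - pe)
```

For example, a classifier that always predicts the majority class on the labels `[1, 1, 1, 0]` reaches an accuracy of 0.75 but a kappa of 0, which is why kappa is often reported alongside accuracy for imbalanced or drifting streams.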
