Anomaly Detection and Analysis in Big Data
Abstract
With the exponential growth of Internet traffic, the penetration of security threats into
the underlying computer networks has increased substantially. More recently, these threats
have also grown in severity, from uncomplicated sniffing and spoofing attacks to more
complex and critical attacks such as denial of service. Such anomalies adversely affect
normal network operations, including traffic classification, resource allocation, and
service management.
Further, due to the rapid proliferation of emerging computing paradigms such as the
Internet of Things (IoT), Edge/Fog Computing, Smart Grids, and Software Defined Networks
(SDN), massive amounts of data are being generated at an unprecedented rate. The
unconventional 5V features (volume, velocity, variety, veracity and variability) of this
data have given rise to Big Data. This abundance of data in turn exposes networks to
further security threats, and its management and analysis require Big Data analytics
schemes. Hence, it is essential to detect network anomalies in real time.
Most of the existing anomaly detection solutions reported in the literature are inefficient
for large-scale networks, for reasons such as the curse of dimensionality, class imbalance,
and variation in the types of anomalies. The efficiency of a model depends mainly on the
selection of relevant features and of the learning algorithm, which together play a vital
role in classifying network traffic patterns as benign or malicious. In view of these
challenges and the proliferation of Big Data, this thesis considers the problem of
anomaly detection in network traffic data. Accordingly, two ensemble-based techniques for
anomaly detection are proposed, aimed particularly at network-wide traffic.
The first technique, Ensemble based Anomaly Detection Technique (En-ADT), employs
a combination of the Fuzzy K-means clustering algorithm, an Extended Kalman Filter, and
Support Vector Machines to detect various anomalies in the network. Here, the first
module identifies the optimal subset of features, which the second module then refines.
Using these features, the third module classifies the traffic to identify malicious
entities. The second technique, Fuzzified Cuckoo based Clustering Technique (F-CBCT), is
proposed for the proactive prediction of attacks in network traffic data. It operates in
two phases: a training phase, in which the system is trained to recognize anomalies, and a
detection phase, in which the system detects anomalies on the basis of the employed
algorithms and the input data. The results and analysis of the proposed anomaly detection
schemes on benchmark datasets are presented in terms of standard evaluation parameters
such as detection rate, false positive rate, accuracy, and F-score.
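
For readers unfamiliar with these evaluation parameters, the following is a minimal sketch of how they are conventionally computed from the counts of a binary confusion matrix, with malicious traffic treated as the positive class. The function name evaluation_metrics and its arguments are hypothetical and are not taken from the thesis. The F-score is assumed here to be the balanced F1 variant, which the abstract does not specify.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute common anomaly-detection metrics from confusion-matrix counts.

    tp: malicious flows correctly flagged     fp: benign flows wrongly flagged
    tn: benign flows correctly passed         fn: malicious flows missed
    All names are illustrative assumptions, not the thesis's own notation.
    """
    total = tp + fp + tn + fn

    # Detection rate (recall): share of malicious flows that were caught.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of benign flows wrongly flagged.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    # Accuracy: share of all flows classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision is needed only as an ingredient of the F-score.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-score, assumed to be F1: harmonic mean of precision and detection rate.
    denom = precision + detection_rate
    f_score = 2 * precision * detection_rate / denom if denom else 0.0

    return {
        "detection_rate": detection_rate,
        "false_positive_rate": false_positive_rate,
        "accuracy": accuracy,
        "f_score": f_score,
    }
```

For example, evaluation_metrics(tp=80, fp=10, tn=90, fn=20) describes a detector that catches 80 of 100 malicious flows while wrongly flagging 10 of 100 benign ones. On heavily imbalanced traffic, accuracy alone can look high even when the detection rate is poor, which is one reason the four measures are usually reported together.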
