Anomaly Detection and Analysis in Big Data

Garg, Sahil

Files

Primary Anomaly Detection and Analysis in Big Data.pdf (3.08 MB)

Date

2018-10-03

Authors

Garg, Sahil

Supervisors

Batra, Shalini

Abstract

With the exponential growth of Internet traffic, security threats to the underlying computer networks have increased sharply. More recently, these threats have also grown in severity, shifting from simple sniffing and spoofing attacks to more complex and critical attacks such as denial of service. Such anomalies adversely affect normal network operations, including traffic classification, resource allocation, and service management. Further, the rapid proliferation of emerging computing paradigms such as the Internet of Things (IoT), Edge/Fog Computing, Smart Grids, and Software Defined Networks (SDN) is generating massive amounts of data at an unprecedented rate. The unconventional 5V characteristics of this data (volume, velocity, variety, veracity, and variability) have given rise to Big Data. This data abundance introduces further security threats, so its management and analysis require Big Data analytics schemes. Hence, it is essential to detect network anomalies in real time. Most existing anomaly detection solutions reported in the literature do not scale efficiently to large networks, for reasons such as the curse of dimensionality, class imbalance, and variation in the types of anomalies. The efficiency of any model depends mainly on the selection of relevant features and on the learning algorithm, both of which play a vital role in classifying network traffic patterns as benign or malicious. In view of these challenges and the proliferation of Big Data, this thesis addresses the problem of anomaly detection in network traffic data. To this end, two ensemble-based anomaly detection techniques have been proposed, specifically for network-wide traffic.
The first technique, Ensemble based Anomaly Detection Technique (En-ADT), combines the Fuzzy K-means clustering algorithm, the Extended Kalman Filter, and Support Vector Machines to detect various anomalies in the network. The first module identifies an optimal subset of features, which the second module then refines; using these refined features, the third module classifies the traffic to identify malicious entities. The second technique, Fuzzified Cuckoo based Clustering Technique (F-CBCT), has been proposed for the proactive prediction of attacks in networked traffic data. It operates in two phases: a training phase, in which the system is trained to recognize anomalies, and a detection phase, in which the system detects anomalies based on the employed algorithms and the input data. The results and analysis of the proposed anomaly detection schemes over benchmark datasets are presented in terms of standard evaluation parameters such as detection rate, false positive rate, accuracy, and F-score.
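The four evaluation parameters named above can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions and assumes that label `1` marks malicious traffic, that "detection rate" means the true positive rate (recall on the malicious class), and that "F-score" means F1; the thesis may define these slightly differently. The function name `evaluate` is hypothetical and not taken from the thesis.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute detection rate, false positive rate, accuracy and F1.

    Assumption: `positive` (default 1) is the malicious class.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # malicious traffic correctly flagged
        elif p == positive:
            fp += 1  # benign traffic wrongly flagged as malicious
        elif t == positive:
            fn += 1  # malicious traffic missed
        else:
            tn += 1  # benign traffic correctly passed

    total = tp + fp + tn + fn
    # Detection rate (true positive rate / recall on the malicious class)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of benign traffic flagged as malicious
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 score: harmonic mean of precision and detection rate
    denom = precision + detection_rate
    f_score = 2 * precision * detection_rate / denom if denom else 0.0

    return {
        "detection_rate": detection_rate,
        "false_positive_rate": false_positive_rate,
        "accuracy": accuracy,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Toy labels for eight flows (1 = malicious, 0 = benign); illustrative only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 1, 0, 0, 1]
    print(evaluate(truth, preds))
```

For the toy labels above there are 3 true positives, 1 false negative, 1 false positive and 3 true negatives, so all four metrics come out to 0.75 except the false positive rate, which is 0.25. On heavily imbalanced traffic data, accuracy alone can look high even when most attacks are missed, which is why detection rate and false positive rate are usually reported alongside it.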

Keywords

Anomaly Detection, Ensemble technique, Machine learning, Fuzzified cuckoo search

URI

http://hdl.handle.net/10266/5413

Collections

Doctoral Theses@CSED
