Ensemble Machine Learning Framework for Big Data Analytics

Abstract

Data is growing at a tremendous rate. Every domain is becoming data-rich, and researchers are therefore increasingly keen to apply big data concepts. Every business organization needs business insights. Researchers have lately embraced machine learning extensively in diverse research areas such as healthcare, astronomy, computational biology, and finance. The difficulty is that big data concepts must first be well understood. No threshold value defines the size of big data. Big data analytics is not only about the size of the data; it is an opportunity to extract valuable insights from the massive volume of available data. Machine Learning (ML) applies scientific algorithms to the collected data with the goal of creating an automated environment for making predictions or important business decisions. Researchers around the globe are working on improving machine learning algorithms for modeling prediction and analytics problems. No single machine learning algorithm is best for every possible problem. Numerous research efforts have therefore sought to improve the performance of machine learning models by developing an ensemble classifier, which is created by combining multiple machine learning models. Ensemble learning is a powerful tool in machine learning because it employs multiple classifiers and optimizes the performance of each base classifier separately. Although it cannot guarantee success, it generally offers better performance than a single-classifier solution. By choosing or developing a special aggregation technique, an ensemble classifier can help reduce the risk of obtaining poor results from a single-classifier system.
In this thesis, a modified variant of an ensemble builder, the Multi Criteria based TOPSIS Ensemble (MCTOPE), is proposed. The proposed method introduces three modifications. Firstly, the ensemble builder is developed as an automated process, so multiple classifiers need not be combined manually. Secondly, the user is relieved of defining the number of candidate classifiers: MCTOPE automatically tries combinations of classifiers and chooses the best performers. Thirdly, unlike other ensemble building techniques, candidate classifiers are not chosen on the basis of accuracy alone after the performance evaluation phase. Instead, MCTOPE employs the multiple-criteria decision making (MCDM) based TOPSIS algorithm during the ensemble building process. The TOPSIS performance score is evaluated using multiple classifier performance criteria, such as accuracy, sensitivity, specificity, F score, and area under the ROC curve.
The work presented in this thesis focuses mainly on using the ensemble machine learning technique to predict the target in two different case studies. In the first case study, a drug toxicity prediction problem is solved using the MCTOPE framework, with a focus on two V’s of big data: variety and value. Complex, unstructured, and high-dimensional drug molecular data is collected with the objective of finding valuable insights in order to predict the toxic or non-toxic class of a drug molecule. The second case study focuses on three V’s of big data: variety, veracity, and value. Unstructured audit data is collected with the objective of predicting the fraudulent or non-fraudulent class of a public firm. A web application, built using an R script and the Django Python web framework, is offered to auditors for predicting fraudulent firms on the basis of input features. This web application will help auditors automate part of the work that precedes the audit of a firm. The experimental results demonstrate the usefulness of ensemble machine learning models for fraud prediction during audit planning and for toxicity prediction during drug design and development. The thesis thereby contributes to the research areas of external auditing and biological computing.
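The abstract does not give the details of how MCTOPE computes its TOPSIS score, so the following is only a minimal sketch of the textbook TOPSIS procedure applied to classifier metrics, not the thesis's implementation. It assumes vector normalization, equal criterion weights, and that every criterion is a "higher is better" metric. The function name `topsis_scores`, the classifier names, and all metric values are hypothetical and chosen purely for illustration.

```python
import math


def topsis_scores(matrix, weights=None, benefit=None):
    """Return a TOPSIS closeness score in [0, 1] for each row (alternative).

    matrix  : list of rows, one per candidate classifier; columns are criteria.
    weights : per-criterion weights (assumption: equal weights if omitted).
    benefit : per-criterion flags, True when larger values are better
              (assumption: all True if omitted).
    """
    n_crit = len(matrix[0])
    weights = weights or [1.0 / n_crit] * n_crit
    benefit = benefit or [True] * n_crit

    # Step 1: vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]

    # Step 2: build the ideal-best and ideal-worst points per criterion.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    # Step 3: score = distance to worst / (distance to best + distance to worst).
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = math.sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical metrics per classifier, in the order:
# accuracy, sensitivity, specificity, F score, area under ROC curve.
candidates = {
    "clf_a": [0.90, 0.88, 0.91, 0.89, 0.95],
    "clf_b": [0.85, 0.80, 0.86, 0.82, 0.90],
    "clf_c": [0.70, 0.65, 0.72, 0.68, 0.75],
}

names = list(candidates)
score_by_name = dict(zip(names, topsis_scores([candidates[n] for n in names])))

# Rank candidates from best to worst TOPSIS score.
ranking = sorted(names, key=lambda n: score_by_name[n], reverse=True)
print(ranking)
```

Ranking on a combined score of this kind lets a classifier that is slightly less accurate but better balanced across sensitivity and specificity outrank one that is strong on accuracy alone. How many top-ranked candidates enter the ensemble, and how their predictions are aggregated, is not stated in the abstract and is left out of the sketch.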

Description

PhD Thesis
