Intelligent Framework for Omics Data Analysis using Machine Learning

dc.contributor.author: Kaur, Parampreet
dc.contributor.supervisor: Singh, Ashima
dc.contributor.supervisor: Chana, Inderveer
dc.date.accessioned: 2024-10-03T12:06:18Z
dc.date.available: 2024-10-03T12:06:18Z
dc.date.issued: 2024-10-03
dc.description.abstract: Omics data encompasses extensive genetic information, such as genomics, proteomics, transcriptomics, and metabolomics, generated through advanced sequencing and mass spectrometry technologies. In computational bioinformatics, machine learning techniques are harnessed for the analysis of omics data. Recent advancements in omics data analysis present a breakthrough in healthcare, enabling researchers to predict a disease before its onset. The combination of computational technologies and omics data in healthcare has revolutionized the way large datasets are retrieved and analyzed. This integration enables researchers to extract valuable insights and make significant advancements in prediction for the development of targeted therapies, which ultimately leads to improvements in human health. The substantial volume of omics data generated necessitates advanced computational methods for effective survival prediction and disease prediction. The aim of this research is to employ computational technologies, such as machine learning and metaheuristic methods, for effective disease prediction and survival prediction of patients using omics data. First, a comprehensive review was undertaken to explore computationally intelligent approaches for omics data analysis. It involved investigating, comparing, and categorizing diverse technologies and tools utilized in disease prediction, survival prediction, biomarker discovery, and disease recurrence using omics data. Through this critical analysis, it became evident that there is a significant demand for an effective framework specifically designed for survival prediction and disease prediction using omics data. Additionally, it was noted that existing tools in the field often lack the necessary provisions for users to make informed choices concerning data pre-processing, feature selection, and prediction models for omics data.
This limitation underscores the crucial need for an accessible solution that empowers researchers with a wide range of options for conducting omics data analysis. To address these gaps, the present research proposes the OmicsML framework for omics data analysis, and an application is developed using the proposed framework. The OmicsML framework consists of four phases: data acquisition, data preparation, development of learning models, and integration. In the data acquisition phase, omics data is collected from public repositories, i.e., The Cancer Genome Atlas (TCGA), Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), and National Center for Biotechnology Information-Gene Expression Omnibus (NCBI-GEO). The data preparation phase consists of pre-processing and feature selection techniques. Data pre-processing is performed by removal and imputation of null values, data normalization, and removal of duplicate samples. Feature selection is done using the Artificial Bee Colony (ABC) and ANOVA-Firefly techniques. In the development of learning models phase, a Bayesian optimized Stacked ensemble (BSense) model and a Bayesian optimized Deep Neural Network (BDNN) model are proposed for survival prediction and disease prediction, respectively. In the integration phase, a web application is developed using the previous three phases of the proposed OmicsML framework for validation. The BSense model is proposed for survival prediction using Multi-layer Perceptron, Gradient Boosting Machine, and Random Forest models. The hyperparameters of these models are tuned efficiently using parallel Bayesian optimization, leading to improved performance in a shorter processing time. Survival prediction is designed using the data acquisition, data preparation, and learning model phases of the proposed framework. In data preparation, the ABC technique is applied for feature selection.
Further, the BSense model is used as the learning model for survival prediction. The BSense model is validated using various breast cancer datasets, i.e., TCGA, METABRIC, Metabolomics, and RNA-seq. The results show that the BSense model gives an Area Under Curve (AUC) value of 83.9% for the TCGA dataset and 87.3% for the METABRIC dataset. For the Metabolomics and RNA-seq datasets, the BSense model provides AUC values of 91.1% and 80.1%, respectively. Accurate survival prediction of breast cancer using omics data complements clinical data in supporting insightful decision making. The ability of the BSense model to accurately predict breast cancer survival will help clinicians in guiding more suitable cancer treatment. Additionally, predicted short-term survivors could be prioritized and given an appropriate line of treatment in good time. The BDNN model is proposed for disease prediction using a Deep Neural Network, i.e., a Multi-layer Perceptron model. The hyperparameters of this model are tuned using Bayesian optimization. Disease prediction is designed using the data acquisition, data preparation, and learning model phases of the proposed framework. In data preparation, the ANOVA-Firefly technique is applied for feature selection. Further, the BDNN model is used as the learning model for disease prediction. The BDNN model is validated using datasets for various diseases, i.e., Alzheimer’s, breast cancer, and COVID-19. For the Alzheimer’s datasets, i.e., GEO:GSE33000 and GEO:GSE44770, the BDNN model gives an AUC value of 94.9%. For the breast cancer dataset, i.e., METABRIC, the BDNN model showed an AUC value of 98.7%. For the COVID-19 dataset, i.e., GSE157103, the BDNN model gives an AUC value of 98.9%. The enhanced and accurate performance of the BDNN model for disease prediction can help in recommending treatment to a patient diagnosed with a disease. This work makes a significant contribution by developing an omics data analysis application for the validation of the proposed framework.
The OmicsML application is developed by integrating the data acquisition, data preparation, and learning model phases of the proposed framework. The OmicsML application is deployed on a cloud server and provides a graphical user interface that lets users autopick the data preparation techniques and learning models for omics data analysis. (en_US)
dc.identifier.uri: http://hdl.handle.net/10266/6880
dc.language.iso: en (en_US)
dc.subject: Omics Data Analysis (en_US)
dc.subject: Disease Prediction (en_US)
dc.subject: Survival Prediction (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.title: Intelligent Framework for Omics Data Analysis using Machine Learning (en_US)
dc.type: Thesis (en_US)
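
The pre-processing steps named in the abstract (removal of duplicate samples, imputation of null values, and normalization) can be illustrated with a short sketch. This is not the thesis implementation: the abstract does not state which imputation or normalization method OmicsML uses or in what order the steps run, so column-mean imputation, min-max scaling, the step order, the function name `preprocess`, and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional

Row = List[Optional[float]]


def preprocess(samples: List[Row]) -> List[List[float]]:
    """Deduplicate samples, impute missing values, and min-max normalize.

    Assumptions (not from the thesis): missing values are None, imputation
    uses the column mean, and normalization is min-max scaling to [0, 1].
    """
    # 1. Remove duplicate samples while preserving first-seen order.
    seen = set()
    unique: List[Row] = []
    for row in samples:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    if not unique:
        return []

    n_features = len(unique[0])
    result = [[0.0] * n_features for _ in unique]
    for j in range(n_features):
        observed = [row[j] for row in unique if row[j] is not None]
        # 2. Impute nulls with the column mean (0.0 if the column is all null).
        mean = sum(observed) / len(observed) if observed else 0.0
        column = [row[j] if row[j] is not None else mean for row in unique]
        # 3. Min-max normalize; a constant column maps to 0.0.
        lo, hi = min(column), max(column)
        span = hi - lo
        for i, value in enumerate(column):
            result[i][j] = (value - lo) / span if span else 0.0
    return result


if __name__ == "__main__":
    toy = [
        [1.0, 10.0],
        [1.0, 10.0],   # exact duplicate of the first sample
        [3.0, None],   # missing value in the second feature
        [5.0, 30.0],
    ]
    for row in preprocess(toy):
        print(row)
```

Each row is one sample and each column one feature (for example, one gene's expression level). In practice a library such as pandas or scikit-learn would likely handle these steps; the plain-Python version is used here only to keep the sketch self-contained.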

Files

Original bundle

Name:
Thesis_Parampreet Kaur_901703013.pdf
Size:
4.69 MB
Format:
Adobe Portable Document Format
Description:
Ph.D Thesis- Intelligent Framework for Omics Data Analysis using Machine Learning

License bundle

Name:
license.txt
Size:
2.03 KB
Format:
Item-specific license agreed upon to submission
Description: