Computational Intelligent Framework for Biomarker Identification in Multi-Omics Data

Dhillon, Arwinder

Files

Primary Arwinder Dhillon (901903008) (1).pdf (6.26 MB)

Date

2024-04-19

Authors

Dhillon, Arwinder

Supervisors

Singh, Ashima

Bhalla, Vinod Kumar

Abstract

Omics data, encompassing genomics, proteomics, transcriptomics, and metabolomics, is generated through sequencing and mass spectrometry technologies. Biomarker identification, a central task in omics data analysis, relies on Deoxyribonucleic Acid (DNA), Ribonucleic Acid (RNA), and protein indicators to reveal physiological processes and disease symptoms. Applying machine learning and deep learning in computational bioinformatics enables the identification of biomarkers across single and multi-omics datasets, which supports early disease prediction. Integrating computational technologies with multi-omics data supports healthcare by providing deeper insights and aiding disease diagnosis, prognosis, and targeted therapy development. This research uses machine learning, deep learning, and statistical methods for biomarker identification in multi-omics data, targeting disease survival prediction, subtype classification, and disease prediction. The study begins with a comprehensive review of intelligent computational approaches for biomarker identification across single and multi-omics datasets. The review identifies the need for a framework designed specifically for biomarker identification using multi-omics data, and highlights shortcomings in existing tools related to data pre-processing, feature selection, biomarker validation, and prediction model creation. To bridge these gaps, the research proposes a novel framework for biomarker identification in multi-omics analysis, aiming to give researchers accessible and comprehensive options for conducting biomarker identification.
The proposed framework for biomarker identification in multi-omics data consists of six phases: data acquisition, data preprocessing, feature/biomarker identification, biological interpretation, modeling, and performance evaluation. In the data acquisition phase, omics data is collected from public repositories, namely The Cancer Genome Atlas (TCGA), Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), and Religious Orders Study and Rush Memory and Aging Project (ROSMAP). Data preprocessing comprises removal and imputation of null values, data normalization, and removal of duplicate samples. Feature/biomarker identification is done using three approaches: (1) statistical methods and Random Spatial Local Best Cat Swarm Optimization (RSLBCSO); (2) a Multimodal Variational Autoencoder (MVAE); and (3) CpG site aggregation, statistical methods, and Light Gradient Boosting Machine Recursive Feature Elimination (LGBMRFE). In the biological interpretation phase, the extracted biomarkers are validated using DAVID analysis and Kaplan-Meier (KM) plots. In the modeling phase, the features from the different omics are integrated and three models are developed: a Bayesian-optimized Deep Neural Network (DNN) model, Simplified Graph Convolutional Networks (SGC), and a stacked ensemble model, for survival prediction, subtype classification, and disease prediction, respectively. Each feature/biomarker selection technique is combined with its model, and the resulting approaches are named BioSurv, iMVAN, and HBS-STACK; they are designed for biomarker identification in multi-omics data for survival prediction, subtype classification, and disease prediction, respectively. In the performance evaluation phase, the proposed framework is evaluated using various performance parameters.
The integration of computational techniques such as machine learning, deep learning, and statistical methods improves the identification of biomarkers from multi-omics data. The proposed approaches, BioSurv, iMVAN, and HBS-STACK, exhibit high accuracy in survival prediction, disease subtype classification, and disease prediction on multi-omics datasets. These approaches yield biomarker insights relevant to early disease detection, customized treatment strategies, and informed clinical decisions.
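
The data preprocessing phase named in the abstract (removal and imputation of null values, data normalization, and removal of duplicate samples) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not say which imputation or normalization methods are used, so mean imputation, z-score normalization, the 50% missingness threshold, and the function name `preprocess` are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def preprocess(samples, max_missing=0.5):
    """Illustrative preprocessing of one omics matrix (assumed layout).

    samples: dict mapping sample id -> list of feature values, None = missing.
    Returns (processed samples, indices of the features that were kept).
    """
    # Removal of duplicate samples: keep the first sample per distinct vector.
    seen, unique = set(), {}
    for sid, values in samples.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            unique[sid] = list(values)

    ids = list(unique)
    n_features = len(unique[ids[0]]) if ids else 0

    # Removal of null values: drop features missing in too many samples
    # (threshold is an assumption), and features with no observed value.
    kept = []
    for j in range(n_features):
        missing = sum(unique[s][j] is None for s in ids)
        if missing < len(ids) and missing / len(ids) <= max_missing:
            kept.append(j)

    result = {s: [] for s in ids}
    for j in kept:
        # Imputation: fill remaining gaps with the feature mean (assumption).
        observed = [unique[s][j] for s in ids if unique[s][j] is not None]
        mu = mean(observed)
        filled = [mu if unique[s][j] is None else unique[s][j] for s in ids]

        # Normalization: z-score per feature (assumption); constant -> 0.0.
        center, sd = mean(filled), pstdev(filled)
        for s, v in zip(ids, filled):
            result[s].append((v - center) / sd if sd else 0.0)

    return result, kept
```

In a multi-omics setting, a function like this would be applied to each omics layer separately before the selected features are integrated in the modeling phase.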

Keywords

Biomarkers, Multi-omics, Machine Learning, Deep Learning, Diagnostics

URI

http://hdl.handle.net/10266/6705

Collections

Doctoral Theses@CSED
