Please use this identifier to cite or link to this item:
http://hdl.handle.net/10266/6705
Title: | Computational Intelligent Framework for Biomarker Identification in Multi-Omics Data |
Authors: | Dhillon, Arwinder |
Supervisor: | Singh, Ashima; Bhalla, Vinod Kumar |
Keywords: | Biomarkers;Multi-omics;Machine Learning;Deep Learning;Diagnostics |
Issue Date: | 19-Apr-2024 |
Abstract: | Omics data, encompassing genomics, proteomics, transcriptomics, and metabolomics, is generated through cutting-edge sequencing and mass spectrometry technologies. Biomarker identification, crucial in omics data analysis, relies on Deoxyribonucleic Acid (DNA), Ribonucleic Acid (RNA), and protein indicators to reveal physiological processes and disease symptoms. Leveraging machine learning and deep learning in computational bioinformatics enables the identification of biomarkers across single and multi-omics datasets, offering groundbreaking potential for early disease prediction. Integration of computational technologies with multi-omics data revolutionizes healthcare by facilitating advanced insights, aiding in disease diagnosis, prognosis, and targeted therapy development, thus advancing human health outcomes. This research aims to utilize computational technologies like machine learning, deep learning, and statistical methods for effective biomarker identification using multi-omics data, targeting disease survival prediction, subtype classification, and disease prediction. Beginning with a comprehensive review, the study explores intelligent computational approaches for biomarker identification across single and multi-omics datasets. It identifies a significant demand for a tailored framework specifically designed for biomarker identification using multi-omics data, highlighting shortcomings in existing tools related to data pre-processing, feature selection, biomarker validation, and prediction model creation. To bridge these gaps, the research proposes a novel framework for biomarker identification in multi-omics analysis, aiming to empower researchers with accessible and comprehensive options for conducting biomarker identification effectively. 
The proposed framework for biomarker identification in multi-omics data consists of six phases: data acquisition, data preprocessing, feature/biomarker identification, biological interpretation, modeling, and performance evaluation. In the data acquisition phase, omics data is collected from public repositories, namely The Cancer Genome Atlas (TCGA), the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), and the Religious Orders Study and Rush Memory and Aging Project (ROSMAP). Data preprocessing consists of removing or imputing null values, normalizing the data, and removing duplicate samples. Feature/biomarker identification is done using three approaches: (1) statistical methods and Random Spatial Local Best Cat Swarm Optimization (RSLBCSO); (2) a Multimodal Variational Autoencoder (MVAE); and (3) CpG site aggregation, statistical methods, and Light Gradient Boosting Machine Recursive Feature Elimination (LGBMRFE). In the biological interpretation phase, the extracted biomarkers are validated using DAVID analysis and Kaplan-Meier (KM) plots. In the modeling phase, the features from the different omics are integrated and three models are developed: a Bayesian-optimized Deep Neural Network (DNN), Simplified Graph Convolutional Networks (SGC), and a stacked ensemble model, for survival prediction, subtype classification, and disease prediction, respectively. The three feature/biomarker selection techniques and the three models are combined into three approaches, named BioSurv, iMVAN, and HBS-STACK, which are designed for biomarker identification in multi-omics data for survival prediction, subtype classification, and disease prediction, respectively. In the performance evaluation phase, the proposed framework is evaluated using various performance parameters.
The integration of computational techniques such as machine learning (ML), deep learning (DL), and statistical methods has significantly improved the precise identification of biomarkers from multi-omics data. The proposed approaches, BioSurv, iMVAN, and HBS-STACK, achieve high accuracy in survival prediction, disease subtype classification, and disease prognosis on multi-omics datasets. These approaches yield biomarker insights that are crucial for early disease detection, customized treatment strategies, and informed clinical decisions. |
URI: | http://hdl.handle.net/10266/6705 |
Appears in Collections: | Doctoral Theses@CSED |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Arwinder Dhillon (901903008) (1).pdf | | 6.41 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.