Computational Intelligent Framework for Biomarker Identification in Multi-Omics Data
Abstract
Omics data, encompassing genomics, proteomics, transcriptomics, and metabolomics,
is generated through cutting-edge sequencing and mass spectrometry technologies.
Biomarker identification, crucial in omics data analysis, relies on Deoxyribonucleic
Acid (DNA), Ribonucleic Acid (RNA), and protein indicators to reveal physiological processes and disease symptoms. Leveraging machine learning and deep
learning in computational bioinformatics enables the identification of biomarkers
across single and multi-omics datasets, offering groundbreaking potential for early
disease prediction. Integration of computational technologies with multi-omics
data revolutionizes healthcare by facilitating advanced insights, aiding in disease
diagnosis, prognosis, and targeted therapy development, thus advancing human
health outcomes.
This research aims to utilize computational technologies like machine learning,
deep learning, and statistical methods for effective biomarker identification using
multi-omics data, targeting disease survival prediction, subtype classification, and
disease prediction. Beginning with a comprehensive review, the study explores intelligent computational approaches for biomarker identification across single and
multi-omics datasets. It identifies a significant demand for a tailored framework
specifically designed for biomarker identification using multi-omics data, highlighting shortcomings in existing tools related to data pre-processing, feature selection,
biomarker validation, and prediction model creation. To bridge these gaps, the
research proposes a novel framework for biomarker identification in multi-omics
analysis, aiming to empower researchers with accessible and comprehensive options for conducting biomarker identification effectively.
The proposed framework for biomarker identification in multi-omics data
consists of six phases: data acquisition, data preprocessing, feature/biomarker
identification, biological interpretation, modeling, and performance evaluation. In the data acquisition phase, omics data are collected from public repositories, namely The Cancer Genome Atlas (TCGA), Molecular Taxonomy
of Breast Cancer International Consortium (METABRIC), and Religious Orders
Study and Rush Memory and Aging Project (ROSMAP). Data preprocessing consists of removing or imputing null values, normalizing the data,
and removing duplicate samples. Feature/biomarker identification
is performed using three approaches: (1) statistical methods and Random
Spatial Local Best Cat Swarm Optimization (RSLBCSO); (2) a Multimodal
Variational Autoencoder (MVAE); and (3) CpG site aggregation, statistical methods, and
Light Gradient Boosting Machine Recursive Feature Elimination (LGBMRFE).
The extracted biomarkers are validated using DAVID analysis and Kaplan-Meier
(KM) plots in the biological interpretation phase. In the modeling
phase, the features from the different omics are integrated and three models are
developed: a Bayesian-optimized Deep Neural Network (DNN) model,
Simplified Graph Convolutional Networks (SGC), and a stacked ensemble model for
survival prediction, subtype classification, and disease prediction, respectively. Combining
the three feature/biomarker selection techniques with these models yields three approaches,
BioSurv, iMVAN, and HBS-STACK, designed for biomarker identification in multi-omics data for survival prediction, subtype classification, and disease
prediction, respectively. The performance of the proposed framework is evaluated using various performance metrics in the performance evaluation phase.
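The preprocessing steps named above (duplicate-sample removal, null removal and imputation, normalization) can be pictured with a minimal sketch. It is illustrative only and not the framework's actual code: the function name `preprocess`, the `max_missing` threshold, mean imputation, and min-max scaling are all assumptions chosen for the example, since the abstract does not specify which imputation or normalization methods were used.

```python
from statistics import mean

def preprocess(samples, max_missing=0.5):
    """Hypothetical helper. samples maps sample_id -> list of feature
    values, with None marking a missing (null) value."""
    # 1. Remove duplicate samples (identical feature vectors), keeping the first.
    seen, unique = set(), {}
    for sample_id, values in samples.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            unique[sample_id] = list(values)

    ids = list(unique)
    n_features = len(unique[ids[0]])
    kept = []  # (original feature index, cleaned column)
    for j in range(n_features):
        column = [unique[s][j] for s in ids]
        observed = [v for v in column if v is not None]
        missing_fraction = 1 - len(observed) / len(column)
        # 2a. Remove features that are mostly null (assumed threshold).
        if not observed or missing_fraction > max_missing:
            continue
        # 2b. Impute the remaining nulls with the feature mean (assumed choice).
        fill = mean(observed)
        column = [fill if v is None else v for v in column]
        # 3. Min-max normalize to [0, 1] (assumed choice); constant columns map to 0.
        low, high = min(column), max(column)
        span = high - low
        column = [(v - low) / span if span else 0.0 for v in column]
        kept.append((j, column))

    matrix = {s: [col[i] for _, col in kept] for i, s in enumerate(ids)}
    return matrix, [j for j, _ in kept]
```

Calling `preprocess` on a small dictionary of samples returns the cleaned matrix together with the indices of the features that survived filtering, so the surviving columns can be mapped back to gene or CpG identifiers.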
The integration of computational techniques such as ML, DL, and statistical
methods has significantly improved the precise identification of biomarkers using multi-omics data. The proposed approaches, BioSurv, iMVAN, and
HBS-STACK, achieve high accuracy in survival prediction, disease subtype classification, and disease prognosis on multi-omics datasets. These approaches yield
biomarker insights that are crucial for early disease detection, customized treatment
strategies, and informed clinical decisions.
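For readers unfamiliar with the Kaplan-Meier plots used to validate biomarkers in the biological interpretation phase, the sketch below computes the standard Kaplan-Meier product-limit estimate of the survival function. It is a generic textbook illustration, not the code used in this work; the function name `kaplan_meier` and the toy inputs in the note that follows are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up duration for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve
```

In a biomarker setting, one would typically split patients into high-expression and low-expression groups for a candidate biomarker, call `kaplan_meier` on each group, and compare the two curves; a clear separation suggests the biomarker is associated with survival.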
