Cloud Based Network Analysis Model for Predicting Disease-Diet Associations
Abstract
Predictive analytics in healthcare integrates computational technologies with the
healthcare domain for the retrieval, storage and analysis of medical data. With the immense progress in computational techniques and technologies, the healthcare domain has
witnessed unparalleled achievements over the last decade. Understanding the relationship between health and diet is one area that presents numerous opportunities for predictive healthcare. Disease-diet associations pose a hard
computational problem because of their complex interdependencies. The intertwined relations among diseases, diets and their subtypes, along with the varying nature
of their associations (harmful or helpful), add to the complexity. Thus, the associations
need to be explored through a close integration of relevant computational technologies.
Predictive analysis of such associations would help healthcare professionals
foresee the risk of occurrence or progression of a disease on the basis of diet and thus
make informed decisions.
This work aims to predict unknown disease-diet associations efficiently and effectively
using integrated computational technologies. To achieve this, a review of existing
work on the relation between disease and diet was first undertaken. The review
shows that several studies explore such associations, but they
are designed for a specific disease and diet combination. It also shows that
while some disease-diet associations have long been well established, others
are acknowledged only in the literature. This presents an opportunity
to bring such studies together and exploit them on a large scale. Further, a
survey of existing services and techniques for understanding the relation between
disease and other factors such as drugs and symptoms has been carried out. It highlights the
widespread use of an emerging technology, Network Analysis, for representing and analyzing complex relationships. Thus, Network Analysis and its
role in predictive and healthcare applications have been investigated further. The challenges
posed in exploiting Network Analysis for healthcare, along with measures
that might be adopted to overcome them, have also been discussed. Several
promising applications of Network Analysis in the healthcare domain have been proposed.
As an outcome of the survey, Network Analysis is deemed a significant technology
for exploring disease-diet associations. Considering the complexity of the computations involved in this task, there is also a need for a platform that supports effective analysis
and does not compromise on performance under higher load. Another promising
technology, Cloud Computing, is found to be suitable for this work, given its extensive
application in the healthcare domain, which has also been reviewed. As a consequence,
the amalgamation of Network Analysis and Cloud Computing is established as a good
fit for exploring disease-diet associations.
Data on disease-diet associations is not readily available, so it must be extracted from the literature. It is also recognized that visualizing
known disease-diet associations as a graph and quantifying them offers
opportunities for advanced learning. Thus, a network model, DID-NEM, has been proposed and developed for extracting, visualizing and modelling disease-diet associations.
First, a custom-made automatic technique, DIDACE, is used to extract and quantify the associations through literature mining. This eliminates the drawbacks of manual
curation and assists in fast and efficient extraction. Using this technique, 274,131 records containing 1917
different diseases and 143 diet terms have been extracted. Further, the nature of a subset of associations is predicted by performing sentiment analysis
using a multilayer perceptron (MLP) with an accuracy of 86%. The associations are then transformed into a
graph so that they are readily available for analysis. DID-NEM is novel and can be used by
domain researchers to extract associations between entities other than disease
and diet. It also contributes a novel disease-diet association database to the research
community for further study.
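The extraction-and-graph step described above can be illustrated with a minimal sketch. It is not DIDACE itself: it assumes that an association is quantified by how often a disease term and a diet term occur in the same sentence, and the term lists, sentences and function names (`cooccurrence_edges`, `to_adjacency`) are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Hypothetical term lists, for illustration only (not the DIDACE vocabularies).
DISEASES = {"diabetes", "covid-19"}
DIETS = {"sugar", "fiber", "kefir"}


def cooccurrence_edges(sentences, diseases=DISEASES, diets=DIETS):
    """Count how often each (disease, diet) pair appears in the same sentence.

    Assumption: plain substring matching on lower-cased text stands in for
    real literature mining, which would need proper entity recognition.
    """
    edges = Counter()
    for sentence in sentences:
        text = sentence.lower()
        found_diseases = [d for d in diseases if d in text]
        found_diets = [f for f in diets if f in text]
        for pair in product(found_diseases, found_diets):
            edges[pair] += 1
    return edges


def to_adjacency(edges):
    """Turn weighted (disease, diet) edges into a disease -> {diet: weight} map."""
    graph = {}
    for (disease, diet), weight in edges.items():
        graph.setdefault(disease, {})[diet] = weight
    return graph


# Toy sentences written for this example; they are not extracted records.
SENTENCES = [
    "High sugar intake is discussed in relation to diabetes.",
    "Sugar consumption is mentioned again alongside diabetes.",
    "Kefir is mentioned in a study on Covid-19.",
]

EDGES = cooccurrence_edges(SENTENCES)
GRAPH = to_adjacency(EDGES)
```

The resulting adjacency map is a weighted bipartite graph that could be loaded into a graph database or a network library for further analysis; the nature of each edge (harmful or helpful) would come from a separate classification step.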
A prediction approach, PredNEM, has been proposed for manipulating and
analyzing the curated database. PredNEM first quantifies and integrates
different networks, such as disease-diet, disease-disease and diet-diet, using different
resources including the curated database, pattern mining and semantic similarity. Further, two different learning methods, TBM and TFM, have been engineered from a
combination of network algorithms/parameters and machine learning for the prediction of
unknown associations. The first method starts by finding communities in the network
followed by ranking of nodes to find the most similar nodes, while the second method
crafts network algorithms/parameters as features, compares different machine learning
algorithms and selects the best performing one for prediction. PredNEM and
its two learning methods have been validated through two case studies,
on Covid-19 and Inflammatory Bowel Disease (IBD) respectively. Out
of the top 20 diets predicted for Covid-19, some interesting associations, including kefir, carrot and strawberry, have been validated through the literature. In the second case
study, the nature of 16 out of 21 associations for Crohn’s disease has been correctly predicted, as per dietician
and medical literature, using a Naive Bayes classifier with a ROC AUC
value of 82.7%. The predictions enhance the traditional know-how of domain experts and
help them stay updated. They are also beneficial for researchers in further study.
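The idea of predicting unknown associations from network structure can be sketched with a generic neighbourhood-based score. This is a simplified stand-in, not TBM or TFM: it assumes an unweighted disease-diet graph and ranks diets not yet linked to a disease by the Jaccard similarity of other diseases that are linked to them. The node names and the function `rank_candidate_diets` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_diets(graph, disease):
    """Rank diets not yet linked to `disease`.

    Each candidate diet receives the summed Jaccard similarity between
    `disease` and every other disease already linked to that diet.
    Assumption: this simple score replaces the community detection, node
    ranking and learned classifiers described in the text.
    """
    known = graph[disease]
    scores = {}
    for other, diets in graph.items():
        if other == disease:
            continue
        similarity = jaccard(known, diets)
        for diet in diets - known:
            scores[diet] = scores.get(diet, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy graph with placeholder node names (not taken from the curated database).
GRAPH = {
    "disease_a": {"diet_1", "diet_2"},
    "disease_b": {"diet_1", "diet_2", "diet_3"},
    "disease_c": {"diet_2", "diet_4"},
    "disease_d": {"diet_5"},
}

RANKED = rank_candidate_diets(GRAPH, "disease_a")
```

In a feature-based setting, scores of this kind could serve as input features to a classifier instead of being used directly for ranking.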
A cloud platform has been introduced for provisioning the network-based analysis as an
efficient and adaptable service to the concerned stakeholders. EC2 and Neo4j instances
have been created for deploying the IBD case study over the cloud and for connecting to
the graph database, respectively. The prediction results are provided through an accessible
cloud service named CloudMenu. Optimum values of CPU utilization and throughput
suggest good performance and better resource utilization.
This work contributes significantly by developing a network model, DID-NEM, which
involves automatic curation of disease-diet associations through DIDACE and their visualization as a network. PredNEM further utilizes network algorithms/parameters and
machine learning for advanced analysis. This network-based analysis is deployed over
the cloud, making the service CloudMenu easy to use, flexible and economical.
