Toxicity Prediction of Pre-Clinical Trial Drugs using Physicochemical Properties and Computational Intelligence Approaches

Gupta, Vishan Kumar

dc.contributor.author	Gupta, Vishan Kumar
dc.contributor.supervisor	Rana, Prashant Singh
dc.date.accessioned	2020-03-19T07:30:39Z
dc.date.available	2020-03-19T07:30:39Z
dc.date.issued	2020-03-19
dc.description.abstract	Quantitative structure-activity relationships (QSARs), quantitative structure-property relationships (QSPRs), and quantitative structure-toxicity relationships (QSTRs) have been developed to predict various toxicities of drug molecules in terms of their activity, activity score, potency, and efficacy. These predictions rely on in silico toxicity prediction techniques, which reduce animal (in vivo) testing and offer a less time-consuming, more cost-efficient way to identify toxic effects at an early stage of drug development. The aim is to build a prediction model that quickly and efficiently tests whether chemical compounds have the potential to disrupt processes in the human body in ways that may adversely affect health. We propose a computational (in silico) method for predicting the toxicity of small drug molecules from their physicochemical properties (molecular descriptors). The molecules considered can bind to nuclear receptor (NR) signalling pathways such as the androgen receptor (AR), estrogen receptor (ER), and aryl hydrocarbon receptor (AhR), and to stress response (SR) signalling pathways such as the antioxidant response element (ARE). The Pharmaceutical Data Exploration Laboratory (PaDEL) software is used to extract the features of the drug molecules. The AhR dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive; the ER dataset contains 8481 drug molecules, of which 1084 are active and 7397 are inactive; the AR dataset contains 10273 drug molecules, of which 461 are active and 9812 are inactive; and the ARE dataset contains 7439 drug molecules, of which 1147 are active and 6292 are inactive.
The class imbalance is first resolved using the SMOTE algorithm for the ER dataset; for the AR, AhR, and ARE datasets, the data are divided into equal-sized data frames, each containing an equal number of active and inactive drug molecules. Feature selection is performed with the Boruta algorithm, the CFS algorithm, Gini importance, and random forest importance. The extended topochemical atom (ETA) descriptors, electro-topological state descriptors, Crippen's logP, and molar refractivity (MR) are found to be rich in chemical information that encodes the structural features contributing to toxicity, and these indices may be combined with other topological and physicochemical descriptors to develop predictive QSAR models. Five classification methods are first trained on the ER dataset to predict activity, activity score, potency, and efficacy, and random forest is found to have the best accuracy among them. Similarly, a multilevel ensemble model is proposed for the AR dataset, where it outperforms the other models. An ensemble model based on the votes of random forest is proposed for predicting the toxicity of AhR drug molecules, where it performs better than the other models. An ensemble model based on the votes of AdaBoost, random forest, decision tree, and support vector machine is proposed for predicting toxicity on the ARE signalling pathway dataset, where it outperforms the other models. K-fold cross-validation is performed to measure the consistency of all proposed models for all target classes. Finally, the proposed models are validated on some AIDS therapies, general food additives, cosmetics, detergents, preservatives, luciferase-tagged ATAD5, and other similar kinds of drug molecules.	en_US
dc.identifier.uri	http://hdl.handle.net/10266/5957
dc.language.iso	en	en_US
dc.subject	Androgen Receptor	en_US
dc.subject	Molecular Descriptor	en_US
dc.subject	Random Forest	en_US
dc.subject	Activity	en_US
dc.subject	Activity Score	en_US
dc.subject	Potency	en_US
dc.subject	Efficacy	en_US
dc.subject	Feature Selection	en_US
dc.subject	Toxicity	en_US
dc.subject	Ensemble Learning	en_US
dc.subject	Class Imbalance	en_US
dc.subject	Machine Learning	en_US
dc.subject	Nuclear Receptor	en_US
dc.subject	Stress Response	en_US
dc.subject	Estrogen Receptor	en_US
dc.subject	Antioxidant Response Element	en_US
dc.title	Toxicity Prediction of Pre-Clinical Trial Drugs using Physicochemical Properties and Computational Intelligence Approaches	en_US
dc.type	Thesis	en_US
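
The abstract mentions two steps that a short sketch may help make concrete: dividing an imbalanced dataset into equal-sized data frames with equal numbers of active and inactive molecules, and combining several classifiers by their votes. The Python below is only an illustration under stated assumptions, not the thesis's implementation. It assumes the majority (inactive) class is shuffled and cut into chunks the size of the minority (active) class, with each chunk paired with all active molecules, and it assumes a simple unweighted majority vote. The function names balanced_frames and majority_vote are hypothetical.

```python
import random
from collections import Counter


def balanced_frames(active, inactive, seed=0):
    """Build balanced subsets from an imbalanced two-class dataset.

    Assumption: the larger (inactive) class is shuffled and cut into chunks
    of len(active); each chunk is combined with all active samples.
    A final remainder smaller than len(active) is dropped.
    """
    size = len(active)
    if size == 0:
        raise ValueError("at least one active sample is required")
    shuffled = list(inactive)
    random.Random(seed).shuffle(shuffled)
    frames = []
    for start in range(0, len(shuffled) - size + 1, size):
        frames.append(list(active) + shuffled[start:start + size])
    return frames


def majority_vote(predictions):
    """Unweighted hard voting over per-model prediction lists.

    predictions[m][i] is the label model m assigns to sample i.
    On a tie, the label that appears first among the models wins.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Usage with the AR class counts from the abstract (461 active, 9812 inactive).
active = [("active", i) for i in range(461)]
inactive = [("inactive", i) for i in range(9812)]
frames = balanced_frames(active, inactive)
print(len(frames), len(frames[0]))  # 21 922
```

With the AR counts, this partitioning yields 21 balanced frames of 922 molecules each, and 131 inactive molecules are left out of the last incomplete chunk. Whether the thesis drops, reuses, or resamples such a remainder is not stated in the abstract.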

Files

Original bundle

Name: Thesis - Vishan - Final.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.03 KB
Format: Item-specific license agreed upon to submission

Collections

Doctoral Theses@CSED