Toxicity Prediction of Pre-Clinical Trial Drugs using Physicochemical Properties and Computational Intelligence Approaches
| dc.contributor.author | Gupta, Vishan Kumar | |
| dc.contributor.supervisor | Rana, Prashant Singh | |
| dc.date.accessioned | 2020-03-19T07:30:39Z | |
| dc.date.available | 2020-03-19T07:30:39Z | |
| dc.date.issued | 2020-03-19 | |
| dc.description.abstract | Quantitative structure-activity relationships (QSARs), quantitative structure-property relationships (QSPRs), and quantitative structure-toxicity relationships (QSTRs) have been developed to predict various toxicities of drug molecules in terms of their activity, activity score, potency, and efficacy. These predictions rely on in silico toxicity prediction techniques, which reduce animal (in vivo) testing and offer a less time-consuming, more cost-efficient way to identify toxic effects at an early stage of drug development. We aim to build a prediction model that assesses toxicity more reliably, testing quickly and efficiently whether certain chemical compounds have the potential to disrupt processes in the human body in ways that may adversely affect health. We propose a computational (in silico) method that uses various physicochemical properties (molecular descriptors) to predict the toxicity of small drug molecules that can bind to nuclear receptor (NR) signalling pathways, such as the androgen receptor (AR), estrogen receptor (ER), and aryl hydrocarbon receptor (AhR), and to stress response (SR) signalling pathways, such as the antioxidant response element (ARE). The PaDEL (Pharmaceutical Data Exploration Laboratory) software is used to extract the features of the drug molecules. The AhR dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive; the ER dataset has 8481 drug molecules, of which 1084 are active and 7397 are inactive; the AR dataset has 10273 drug molecules, of which 461 are active and 9812 are inactive; and the ARE dataset has 7439 drug molecules, of which 1147 are active and 6292 are inactive. 
Class imbalance is first resolved using SMOTE algorithms for the ER dataset; for the AR, AhR, and ARE datasets, the data are divided into equal-sized data frames, each containing an equal number of active and inactive drug molecules. Feature selection is performed with the Boruta algorithm, the CFS algorithm, Gini importance, and random forest importance. The extended topochemical atom (ETA) descriptors, electrotopological state descriptors, Crippen's logP, and molar refractivity (MR) are found to be rich in chemical information that encodes the structural features contributing to toxicity, and these indices may be used in combination with other topological and physicochemical descriptors to develop predictive QSAR models. Five classification methods are first trained on the ER dataset to predict activity, activity score, potency, and efficacy, and random forest is found to have the best accuracy among them. Similarly, a multilevel ensemble model is proposed for the AR dataset, where it outperforms the other models. An ensemble model based on the votes of random forest is proposed for predicting the toxicity of AhR drug molecules, where it performs better than the other models. An ensemble model based on the votes of AdaBoost, random forest, decision tree, and support vector machine is proposed for predicting toxicity on the ARE signalling pathway dataset, where it outperforms the other models. K-fold cross-validation is performed to measure the consistency of all proposed models for all target classes. Finally, we validate all the proposed models on some AIDS therapies, general food additives, cosmetics, detergents, preservatives, luciferase-tagged ATAD5, and other similar kinds of drug molecules. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10266/5957 | |
| dc.language.iso | en | en_US |
| dc.subject | Androgen Receptor | en_US |
| dc.subject | Molecular Descriptor | en_US |
| dc.subject | Random Forest | en_US |
| dc.subject | Activity | en_US |
| dc.subject | Activity Score | en_US |
| dc.subject | Potency | en_US |
| dc.subject | Efficacy | en_US |
| dc.subject | Feature Selection | en_US |
| dc.subject | Toxicity | en_US |
| dc.subject | Ensemble Learning | en_US |
| dc.subject | Class Imbalance | en_US |
| dc.subject | Machine Learning | en_US |
| dc.subject | Nuclear Receptor | en_US |
| dc.subject | Stress Response | en_US |
| dc.subject | Estrogen Receptor | en_US |
| dc.subject | Antioxidant Response Element | en_US |
| dc.title | Toxicity Prediction of Pre-Clinical Trial Drugs using Physicochemical Properties and Computational Intelligence Approaches | en_US |
| dc.type | Thesis | en_US |
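
The abstract describes balancing the AR, AhR, and ARE datasets by dividing them into equal-sized data frames with equal numbers of active and inactive molecules. The thesis record does not give the exact procedure, so the following is only a minimal sketch of one plausible reading: the inactive (majority) class is shuffled and cut into chunks the size of the active (minority) class, and each chunk is paired with all active molecules. The function name `balanced_frames`, the placeholder molecule identifiers, the fixed seed, and the choice to drop the incomplete trailing chunk are all assumptions made for illustration; only the AR class counts (461 active, 9812 inactive) come from the abstract.

```python
import random


def balanced_frames(active, inactive, seed=0):
    """Hypothetical helper: build balanced subsets from an imbalanced dataset.

    The majority class is shuffled and split into chunks the size of the
    minority class; each chunk is paired with the full minority class.
    """
    if not active or len(inactive) < len(active):
        raise ValueError("expected a non-empty minority class no larger than the majority class")
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(inactive)
    rng.shuffle(shuffled)
    size = len(active)
    n_frames = len(shuffled) // size   # assumption: incomplete trailing chunk is dropped
    frames = []
    for i in range(n_frames):
        chunk = shuffled[i * size:(i + 1) * size]
        # Label 1 = active, label 0 = inactive.
        frames.append([(mol, 1) for mol in active] + [(mol, 0) for mol in chunk])
    return frames


# Placeholder identifiers; only the class counts are taken from the abstract (AR dataset).
ar_active = [f"active_{i}" for i in range(461)]
ar_inactive = [f"inactive_{i}" for i in range(9812)]
ar_frames = balanced_frames(ar_active, ar_inactive)
print(len(ar_frames), len(ar_frames[0]))
```

Under these assumptions, each frame could be used to train one base model, with the models' predictions later combined by voting; whether the thesis combines its frames this way is not stated in this record.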
