What is the best way to calculate the biological diversity of samples from bacterial/viral OTU tables?

I have a few large OTU tables of bacterial and viral datasets. The samples are across different sites and times.

I would like to visualise the community 'diversity' across the times for which I have data. For example, it would be interesting to see whether community diversity peaks in the summer months and falls in the winter months, in a repeating pattern.

I have not come across much advice or literature on assessing diversity in large OTU datasets. Considering that the OTUs are essentially arbitrary and that there are thousands of them, what is the best way to calculate and visualise each sample's diversity?

With the vegan package in R it is quite easy to calculate the Shannon or Simpson diversity index for an OTU table. Can you simply use this on a 'raw' table of OTUs?


I mostly concur with @Nathan's answer, and in particular with the references he provided.

As Shannon and Simpson indices can be hard to interpret and can be non-intuitive, I prefer using Hill diversities as suggested by Nathan (the Jost 2006 and 2007 refs are great to read up on this). The main argument is that Hill diversities give an effective number of species, which is comparable between samples and follows the duplication principle.

Hill diversities rely on a unified formula (see this Wikipedia article) with one parameter, q. Increasing values of q correspond to increasing weighting of taxa abundances in the diversity calculation:

  • D with q=0 does not account for taxon abundance, so it is just the number of taxa, or richness.
  • D with q=1 is not defined directly, but its limit as q approaches 1 is e^H, where H is the Shannon entropy. D_q1 is the effective number of species with abundance weights.
  • D with q=2 corresponds to the Inverse Simpson index (1/D_Simpson). D_q2 weighs more abundant taxa even more strongly.

One can choose any value for q (using the limit e^H at q=1), and comparing diversity estimates for varying q can give you an idea of sample evenness. Setting q=∞ gives the reciprocal of the Berger-Parker index (the Berger-Parker index being the fraction of individuals in the sample belonging to the most abundant species).
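To make the role of q concrete, here is a minimal Python sketch of the Hill number formula described above. It is not the answerer's R code; the function name `hill_number` and the two toy count vectors are made up for illustration, and it uses only the standard library.

```python
import math

def hill_number(counts, q):
    """Hill diversity of order q for one sample, given raw taxon counts."""
    total = sum(counts)
    # Relative abundances of the taxa actually present in the sample.
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    if math.isinf(q):
        # Limit q -> infinity: reciprocal of the largest relative abundance.
        return 1.0 / max(p)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical samples: same richness, very different evenness.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
for q in (0, 1, 2, math.inf):
    print(q, round(hill_number(even, q), 3), round(hill_number(skewed, q), 3))
```

For the perfectly even sample every order returns the number of taxa, while for the skewed sample the value drops as q increases. That drop is what the comparison across q values picks up as (un)evenness.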

Importantly, for alpha div analyses on (16S/18S-based) OTUs, I would always generate rarefaction curves first, and then generate diversity estimates at a common, rarefied number of reads per sample.
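As a rough illustration of that workflow, the sketch below subsamples reads without replacement to a common depth and reports observed richness at several depths. It is a hedged Python stand-in, not the linked R functions; the names `rarefy` and `rarefaction_curve` and the toy samples are assumptions made for this example. In R, vegan's rrarefy() and rarecurve() cover the same ground.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one taxon label per read, then draw reads at random.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied

def rarefaction_curve(counts, depths, seed=0):
    """Observed richness (taxa with at least one read) at each depth."""
    return [sum(1 for n in rarefy(counts, d, seed) if n > 0) for d in depths]

# Hypothetical samples with very different sequencing depths.
sample_a = [500, 300, 150, 40, 8, 2]
sample_b = [60, 30, 10]
common_depth = min(sum(sample_a), sum(sample_b))
print(rarefy(sample_a, common_depth))
print(rarefaction_curve(sample_a, [10, 100, 1000]))
```

Expanding every read into a list is fine for a toy example but memory-hungry for real tables. A single random draw is also noisy, so in practice one would average over repeated subsamples.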

You can do most of this using the R package vegan. The phyloseq package provides various alpha div estimates in one command, but no Hill diversities. I've written a few simple functions to perform rarefaction and calculate (rarefied or non-rarefied) Hill diversities from an OTU count table matrix:

function.rarefaction.R

function.alpha_diversity.R


The vegan package is suitable for your needs, but you may find you need to use others or code your own functions.

Due to sequencing biases, you shouldn't trust the 'raw' counts of your OTUs (unless you have a good reason to do so; I'm not sure how your OTUs were obtained). Rather, you may consider relativizing your site-by-species matrix. You can do so using the decostand() function.
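For readers unfamiliar with the term, relativizing here means dividing each sample's counts by that sample's total, which is my understanding of what decostand() does with method = "total". The Python sketch below shows the idea on a made-up two-sample table; `relative_abundance` is a hypothetical helper, not part of vegan.

```python
def relative_abundance(table):
    """Convert each sample (row) of a site-by-taxon count table to proportions."""
    result = []
    for row in table:
        total = sum(row)
        # Leave all-zero samples as zeros rather than dividing by zero.
        result.append([c / total if total else 0.0 for c in row])
    return result

# Hypothetical table: same community composition, tenfold difference in depth.
otu_counts = [
    [120, 30, 0, 50],   # sample 1: 200 reads
    [12, 3, 0, 5],      # sample 2: 20 reads
]
for row in relative_abundance(otu_counts):
    print([round(p, 3) for p in row])
```

Both rows come out identical after relativizing, which is the point: differences in sequencing depth no longer masquerade as differences in community composition.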

Then, you can use the diversity() function to analyze diversity; but you may also consider looking into other approaches to assess local diversity, such as rarefaction and sampling-based approaches, species equivalents and Hill numbers (Hill 1973, Gotelli and Colwell 2001, Jost 2006, 2007). The books by Magurran and McGill (2011) and Legendre and Legendre (2012) are extremely helpful.


Computer-Aided Drug Design 9789811568145, 9789811568152

Table of contents :
Foreword
Preface
Acknowledgement
Contents
About the Editor
1: Computational Approaches in Drug Discovery and Design
1.1 Introduction
1.2 Structure-Based Drug Designing
1.2.1 Target Identification
1.2.2 Modeling and Visualization of Macromolecule Structure
1.2.3 Binding Site Prediction and Analysis
1.2.4 Molecular Docking
1.2.4.1 Flexible Docking
1.2.4.2 Rigid Docking
1.2.5 Structure-Based Virtual Screening
1.2.6 Validation of Molecular Docking
1.3 Ligand-Based Designing
1.3.1 Pharmacophore Modeling
1.3.2 Quantitative Structure-Activity Relationship (QSAR)
1.3.2.1 CoMFA
1.3.2.2 CoMSIA
1.4 Computation of HOMO and LUMO Energy
1.5 ADMET Prediction and Analysis
1.6 Molecular Dynamics Simulation
1.7 Identification of New Drug-Like Molecules for Hyperuricemia from Millets: A Case Study
1.8 Discovery and Designing of Natural Lead Compounds for Liver Cancer: A Case Study
1.9 Examples of Drugs Synthesized Using CADD
1.10 Success and Limitations
1.11 Conclusion
References
2: Molecular Modeling of Proteins: Methods, Recent Advances, and Future Prospects
2.1 Introduction
2.1.1 Amino Acids
2.1.2 Basic Principles of Protein Structure
2.2 Explosion of Protein Related Data
2.3 Protein Structure Determination
2.3.1 X-Ray Crystallography
2.3.2 NMR Spectroscopy
2.3.3 3D Electron Microscopy
2.4 Protein Structure Prediction
2.4.1 Homology or Comparative Modeling
2.4.1.1 Template Recognition and Initial Alignment
2.4.1.2 Alignment Correction
2.4.1.3 Modeling Structurally Conserved Region (SCR) and Backbone Generation
2.4.1.4 Loop Modeling
Knowledge-Based
Energy-Based
2.4.1.5 Side-Chain Modeling
2.4.1.6 Model Optimization
Quantum Force Fields
Self-Parameterizing Force Fields
2.4.1.7 Model Validation
2.4.2 Fold Recognition or Threading Method
2.4.3 Ab Initio Methods
2.5 Evaluation and Validation of Modeled Structure
2.6 Recent Advances in Prediction Approaches
2.7 Applications
2.8 Conclusion
References
3: Cavity/Binding Site Prediction Approaches and Their Applications
3.1 Introduction
3.2 Target Molecule
3.3 Binding Site and Active Site
3.4 Ligand Molecule
3.5 Binding Affinity
3.6 Chemical Specificity
3.7 Binding Site and Molecular Interactions
3.7.1 Protein-Drug Interactions
3.7.1.1 Reversible Binding
3.7.1.2 Irreversible Binding
3.7.1.3 Factors Affecting Protein-Drug Binding
3.7.1.4 Role of Water Molecules
3.7.2 Drug-Nucleic Acid Interactions
3.7.3 Protein-Protein Interactions
3.7.4 Interaction of Protein with Nucleic Acid, Lipid, and Carbohydrate
3.8 Binding Site Prediction
3.8.1 Evolutionary Algorithms/Sequence-Based Predictions
3.8.1.1 Single Residue Based Approach
3.8.1.2 Window Based Approach
3.8.2 Energy-Based Algorithms
3.8.3 Geometry-Based Algorithm/Structure-Based Predictions
3.9 Approaches
3.9.1 Statistical Approach
3.9.2 Machine Learning
3.9.3 Meta-Predictors
3.10 Prediction Tools and Servers
3.11 Validation of Binding Site
3.12 Role of the Binding Site in Drug Designing
3.13 Recent Advances and Future Perspective
3.14 Conclusion
References
4: Role of ADMET Tools in Current Scenario: Application and Limitations
4.1 Introduction
4.1.1 ADMET Prediction
4.1.2 ADMET Parameters and their Role
4.2 Importance of ADMET
4.3 The Evolving Science of ADMET
4.4 Blood-Brain Barrier Models
4.5 ADMET Prediction
4.6 Strategies for the Designing of ADMET Model
4.6.1 Selection of Experimental Data
4.6.2 Calculation of Physicochemical Parameters or Descriptor Values
4.6.3 ADMET Prediction Methods and Tools
4.6.3.1 Recursive Partitioning Regression
4.6.3.2 Partial Least Square (PLS) Regression
4.6.3.3 Random Forests (RF)
4.6.3.4 Decision Trees
4.6.3.5 Naive Bayes Classifiers
4.6.3.6 k-Nearest Neighbour (k-NN)
4.6.3.7 Support Vector Machine (SVM)
4.7 ADMET Tools
4.8 Challenges in Present Scenario and Future Prospective
4.9 Conclusions
References
5: Database Resources for Drug Discovery
5.1 Introduction
5.2 Therapeutic Target Information
5.2.1 Universal Protein Resource (UniProt)
5.2.1.1 UniProtKnowledgeBase (UniProtKB)
5.2.1.2 UniProt Reference Clusters (UniRef)
5.2.1.3 UniProt Archive (UniParc)
5.2.2 Protein Data Bank (PDB)
5.2.3 Molecular Modeling Database
5.2.4 Therapeutic Target Database
5.2.5 Herbal Ingredients Targets (HIT) Database
5.2.6 SuperTarget
5.3 Chemical Information
5.3.1 PubChem
5.3.2 Zinc
5.3.3 ChEMBL
5.3.4 Chemical Entities of Biological Interest (ChEBI)
5.3.5 NCI Database
5.3.6 ChemDB
5.3.7 ChemSpider
5.3.8 BindingDB
5.3.9 PDBbind
5.3.10 Toxin and Toxin-Target Database (T3DB)
5.3.11 BIAdb
5.3.12 Super Natural II
5.3.13 Naturally Occurring Plant-Based Anti-Cancer Compound-Activity-Target Database (NPACT)
5.3.14 Dictionary of Natural Products Online
5.3.15 Ligand Expo
5.3.16 SuperLigands
5.3.17 Toxicology Data Network
5.4 Drug Molecule Information
5.4.1 DrugBank
5.4.2 SuperDRUG2
5.4.3 PharmGKB
5.4.4 Search Tool for Interactions of Chemicals (STITCH)
5.5 Metabolomic Pathway Information
5.5.1 Kyoto Encyclopedia of Genes and Genomes (KEGG)
5.5.2 Human Metabolome Database (HMDB)
5.5.3 Small Molecule Pathway Database (SMPDB)
5.5.4 BiGG
5.5.5 MetaboLights Database
5.5.6 BioCyc
5.5.7 Reactome
5.5.8 WikiPathways
5.6 Disease and Physiology Information
5.6.1 Online Mendelian Inheritance in Man (OMIM)
5.6.2 METAGENE
5.6.3 RAMEDIS
5.6.4 Online Metabolic and Molecular Basis of Inherited Disease (OMMBID)
5.7 Peptide Information
5.7.1 PepBank
5.7.2 StraPep
5.7.3 Antimicrobial Peptide Database (APD)
5.7.4 CAMPR3
5.7.5 CancerPPD
5.8 Challenges and Future Perspective
5.9 Summary
References
6: Molecular Docking and Structure-Based Drug Design
6.1 Introduction
6.2 Docking Guidelines
6.2.1 Hardware and Software Requirements for Molecular Docking
6.2.2 Docking Process
6.2.3 Ligand and Protein Preparation
6.2.4 Ligand Conformations Strategies
6.2.5 Scoring Functions
6.2.5.1 Force Field
6.2.5.2 Empirical Scorings
6.2.5.3 Knowledge-Based Scoring
6.2.6 Ensemble Docking
6.2.7 Consensus Docking
6.3 Different Types of Docking Based on Interactions
6.3.1 Protein-Ligand Docking
6.3.2 Protein-Peptide like Ligand Docking
6.3.3 Protein-Protein Docking
6.3.4 Protein-Nucleic Acid Docking/Nucleic Acid-Ligand Docking
6.4 Water Solvation and Docking
6.5 Docking Tools
6.6 Virtual Screening
6.7 Analysis of Docking Results
6.8 Limitations of Docking Algorithms and Future Scope
6.9 Major Developments in Docking
6.10 Conclusion
References
7: Molecular Dynamics Simulation of Protein and Protein-Ligand Complexes
7.1 History and Background
7.2 Introduction
7.3 Principle of MD Simulation
7.3.1 Periodic Boundary Conditions
7.3.2 Ewald Summation Techniques
7.3.3 Particle Mesh Ewald Method
7.3.4 Thermostats in MD
7.3.5 Solvent Models
7.3.6 Energy-Minimization Methods in MD Simulations
7.4 Current Tools for MD Simulation
7.4.1 Recent Advances in Hardware to Run MD Simulation
7.4.2 GROMACS
7.4.3 AMBER
7.4.4 CHARMM-GUI
7.4.5 NAMD
7.4.6 Quantum-Mechanics/Molecular-Mechanics (QM/MM)
7.4.7 HyperChem
7.5 Other Advance Methods for MD Simulation
7.5.1 Metadynamics
7.6 Analysis of MD Trajectories Through GUI-Based Software
7.6.1 Visual Molecular Dynamics
7.6.2 PyMOL
7.6.3 Chimera
7.7 Structural Parameters for Analysis of MD Simulation
7.7.1 RMSD
7.7.2 RMSF
7.7.3 Radius of Gyration
7.7.4 Protein-Ligand Contacts
7.7.5 SASA
7.7.6 Principal Component Analysis or Essential Dynamics
7.7.7 Secondary Structure Analysis
7.8 Application of MD Simulation
7.8.1 Mutational Analysis
7.8.2 Application in the Drug Designing
7.8.2.1 Inhibitor Designing Against MtbICL
7.8.2.2 Inhibitor Designing Against Fasciola gigantica Thioredoxin Glutathione Reductase
7.8.3 Unfolding Studies
7.8.3.1 Urea Induced Unfolding of FgGST1
7.8.3.2 GdnHCl-Induced Unfolding Analysis
7.8.3.3 pH-Induced Effects on the Structure and Stability of the Protein
7.9 Conclusions
References
8: Computational Approaches for Drug Target Identification
8.1 Introduction
8.2 Drug Targets
8.3 Drug Target Identification
8.4 Computational Approaches for Drug Target Identification
8.5 Homology-Based Approaches
8.5.1 Human Homologs
8.5.2 Human-Microbiome Homologs
8.5.3 Essentiality
8.5.4 Virulence Factor Homologs
8.5.5 Drug Target Homologs
8.5.6 Cellular Location
8.5.7 Role in the Biological Pathway
8.5.8 Case Study: Subtractive Approach for Drug Target Identification
8.6 Network-Based Approaches
8.6.1 Centrality Based Drug Target
8.6.1.1 Hubs as Target
8.6.1.2 Betweenness Centrality Based Target
8.6.1.3 Mesoscopic Centrality Based Target
8.6.1.4 Weight-Based Drug Target
8.6.2 Limitations
8.7 Properties of an Ideal Drug Target
8.8 Druggability of Drug Target
8.8.1 Importance of Druggability
8.9 Computational Methods for Druggability Assessment
8.9.1 Sequence-Based Methods
8.9.2 Structure-Based Methods
8.9.2.1 Identifying Cavities and Binding Pockets
8.9.2.2 Druggability of Binding Pocket
Position of the Atoms
Cavity Size
8.9.2.3 Target Specificity Assessment
Sequence Alignment Based Assessment
Structure Alignment Based Assessment
8.9.3 Quantification of Druggability
8.9.4 Major Concern
8.9.4.1 Size of Training Sets
8.9.4.2 Binding Site Flexibility
8.10 Target-Based Drug Discovery
8.10.1 Multi-Target Drug Designing
8.10.1.1 Identification of a Set of Targets "Multi-Targets"
8.10.1.2 Generation of Multi-Target Pharmacophore
8.10.1.3 Virtual Screening
8.10.1.4 Generation or Selection of Multi-Target Compound
8.10.1.5 Evaluation and Optimization of Multi-Target Specific Compound
8.11 Summary
References
9: Computational Screening Techniques for Lead Design and Development
9.1 Introduction
9.2 High-Throughput Screening
9.2.1 Assay Design
9.2.2 Biochemical Assays
9.2.3 Whole-Cell Assays
9.2.4 Automatic Methods of Library Generation and Robotics in HTS
9.2.5 Profiling
9.2.6 Screening Expense and Outsourcing Screening
9.3 QSAR Theories
9.4 Molecular Descriptors Used in QSAR
9.5 Methods of QSAR
9.5.1 2D QSAR Methods
9.5.1.1 Free Energy Models-Hansch Analysis Linear Free Energy Relationship
9.5.1.2 Mathematical Model
Free Wilson Analysis
Statistical Methods
Discriminant Analysis
Cluster Analysis
9.5.1.3 Principal Component Analysis (PCA)
9.5.1.4 Quantum Mechanical Methods
9.5.2 3D-QSAR
9.5.2.1 Molecular Shape Analysis (MSA)
9.5.2.2 Self-Organizing Molecular Field Analysis (SOMFA)
9.5.2.3 Comparative Molecular Field Analysis (CoMFA)
9.5.2.4 Comparative Molecular Similarity Indices Analysis (CoMSIA)
9.5.2.5 3D Pharmacophore Modeling
9.5.3 4D-QSAR
9.5.4 5D-QSAR
9.5.5 4D vs 5D-QSAR
9.6 ADME Screening
9.6.1 Absorption
9.6.1.1 Biologic Factors
9.6.1.2 Passive Diffusion
9.6.1.3 Carrier-Mediated Facilitated Transport
9.6.1.4 Local Blood Flow
9.6.1.5 Gastric Emptying Time
9.6.1.6 pH-Partition Theory
9.6.1.7 Ion Trapping
9.6.1.8 Chemical Modifications Affect the Absorption
9.6.1.9 Optimizing Absorption
9.6.2 Distribution
9.6.2.1 Optimizing Distribution
9.6.3 Metabolism
9.6.3.1 Phase I
Oxidation
Reduction
Hydrolysis
9.6.3.2 Phase II-Conjugation
9.6.3.3 Factors Affecting the Metabolism of a Drug
9.6.3.4 Optimizing Metabolism
9.6.4 Excretion
9.6.4.1 Factors Affecting ADME Properties and Modeling Process
9.6.4.2 Drug Likeness
9.6.4.3 Lipophilicity
9.6.4.4 Solubility
9.6.4.5 Pharmacokinetic Process
9.7 Toxicological Screening
9.7.1 Acute Systemic Toxicity
9.7.2 Toxicological Endpoints
9.7.3 Structural Alerts and Rule-Based Method
9.7.4 Read Across Methods Using Chemical Category
9.7.5 Quantitative Structure Activity Relationship Model Using a Statistical Method
9.7.6 Organization for Economic Cooperation and Development (OECD) Guidelines
9.7.6.1 Optimizing Toxicity
9.8 Limitations and Future Scope
9.9 Conclusions
References
10: Advances in Pharmacophore Modeling and Its Role in Drug Designing
10.1 Introduction
10.2 Features in a Pharmacophore
10.3 Pharmacophore Modeling
10.3.1 Ligand-Based Pharmacophore
10.3.2 Building a Pharmacophore
10.3.2.1 Ligand Preparation
10.3.2.2 Pharmacophore Feature Mapping
10.3.2.3 Searching Common Pharmacophore
10.3.2.4 Scoring the Common Pharmacophore
10.3.3 Algorithms Used to Build a Pharmacophore
10.3.4 Structure-Based Pharmacophore
10.3.4.1 Redocking of Co-Crystal Ligand
10.3.4.2 Scoring for Pharmacophoric Sites
10.3.4.3 Building a Pharmacophoric Hypothesis
10.4 Tools for Pharmacophore Building
10.5 Validation of a Pharmacophore Hypothesis
10.6 A Case Study of Structure and Ligand-Based Pharmacophore
10.7 Uses of Pharmacophore
10.7.1 Virtual Screening
10.7.1.1 An Instance of Virtual Screening and Its Workflow
10.7.2 Pharmacophore Fingerprint
10.7.2.1 An Instance of Pharmacophore Fingerprint Searching
10.7.3 De Novo Ligand Design
10.7.3.1 An Instance of De Novo Ligand Design
10.8 Success Stories in Pharmacophore-Based Drug Designing
10.9 Significance of Pharmacophore
10.10 Downside of Pharmacophore Modeling
10.11 Conclusion
References
11: In Silico Designing of Vaccines: Methods, Tools, and Their Limitations
11.1 Introduction
11.1.1 Live Attenuated Vaccine
11.1.2 Inactivated Vaccine
11.1.3 Subunit Vaccine
11.1.4 Recombinant Vector and DNA Vaccines
11.1.5 Epitope-Based Vaccines
11.2 B and T Cell Epitopes
11.2.1 B Cell Epitopes
11.2.2 T Cell Epitopes and Their Processing
11.3 Bioinformatics in Vaccine Design
11.4 Prediction Tools for Class I and II MHC Binding
11.4.1 NetMHC
11.4.2 NetMHCPan
11.4.3 SYFPEITHI
11.4.4 ProPred-I
11.4.5 RANKPEP
11.4.6 MHCPred
11.4.7 EpiJen
11.4.8 SVMHC
11.4.9 MULTIPRED2
11.4.10 ProPred
11.4.11 MHC2Pred
11.5 CTL Epitope Prediction
11.5.1 NetCTL
11.5.2 CTLPred
11.5.3 NetChop
11.5.4 MAPPP
11.5.5 Pcleavage
11.6 B Cell Epitope Prediction
11.6.1 BCPred
11.6.2 LBtope
11.6.3 ABCPred
11.6.4 BepiPred 2.0
11.6.5 Bcepred
11.6.6 DiscoTope
11.6.7 ElliPro
11.6.8 PEASE
11.7 Methods for In Silico Designing of Epitope-Based Vaccines
11.7.1 Selection of Proteins
11.7.2 Epitope Prediction and Analysis
11.7.3 Molecular Docking and Molecular Dynamics Simulation
11.7.4 Construction of Vaccine
11.8 Case Studies of Vaccine Designing
11.8.1 Vaccine Designing for Viral Pathogens
11.8.2 Vaccine Designing for Bacteria
11.8.3 Vaccine Designing for Other Parasites
11.9 Limitations and Challenges
11.10 Conclusion
References
12: Machine Learning Approaches to Rational Drug Design
12.1 Drug Industry
12.2 Drug Discovery Pipeline
12.2.1 Target Discovery
12.2.2 Target Validation
12.2.3 Lead Identification
12.2.4 Lead Optimization
12.2.5 Preclinical Phase
12.2.6 Clinical Trials
12.2.6.1 Phase I
12.2.6.2 Phase II
12.2.6.3 Phase III
12.2.6.4 Phase IV
12.3 Dimensions and Complexity of the Problem and Role of ML Techniques
12.4 Genetic Algorithms
12.4.1 Working of Genetic Algorithms
12.4.2 Genetic Algorithm Operators
12.4.2.1 Natural Selection Operator
12.4.2.2 Stochastic Sampling with Replacement
12.4.2.3 Stochastic Universal Sampling
12.4.2.4 Crossover/Recombination Operator
12.4.2.5 Mutation Operator
12.5 Artificial Neural Networks
12.5.1 How the Human Brain Works?
12.5.2 A Simple Artificial Neuron
12.5.3 Architecture of ANNs
12.6 Deep Learning (DL)
12.7 Support Vector Machines
12.8 Artificial Intelligence and Drug Discovery
12.9 Applications of ANNs, GAs, and Other ML Algorithms in Drug Discovery
12.9.1 Molecular Docking
12.9.2 Pharmacophore Modeling
12.9.3 Quantitative Structure-Activity Relationship (QSAR)
12.10 Conclusions
References

Computer-Aided Drug Design

Editor: Dev Bukhsh Singh, Department of Biotechnology, Institute of Biosciences and Biotechnology, Chhatrapati Shahu Ji Maharaj University, Kanpur, Uttar Pradesh, India

ISBN 978-981-15-6814-5 ISBN 978-981-15-6815-2 https://doi.org/10.1007/978-981-15-6815-2

© The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2020. This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Ever since the cracking of the human genome at the beginning of the present century, scientists have been engaged in locating drug targets and designing and developing novel drugs through the systems approach. This has resulted in a tremendous reduction in research and production costs. Earlier, the drug design process used to take many decades and was carried out haphazardly without any direction. The surge in bioinformatics solutions has already redefined the way drug trials are done and is driving a shift from in vitro to in silico. In this age of multiple drug resistance, in silico drug design could be used to shorten the time of discovery, and this issue shall remain the biggest challenge for years to come. In the present fast-changing scenario, it is difficult to maintain expressive coherence in this rapidly growing area of drug designing. I am happy that Dr. Dev has ventured to collect twelve well-written chapters and has brought out an edited book named “Computer-Aided Drug Design” to be published by Springer Nature, Singapore. I feel that the authors are quite successful in “fusing” the otherwise diverse topics of this fast-emerging area. I am sure that this book will be exceedingly useful not only for under- and postgraduate students but also for research scholars, scientists, and pharma industries involved in developing new drugs. I hope that the readers of this book shall contribute in the future to making the text more useful for further development of this important field of computer-aided drug design.

Hony. Professor, IIIT-Allahabad, Prayagraj, India

Computer-aided drug design uses computational approaches for analysis of target, screening, and interaction of ligands, simulation of target–ligand complex, optimization of lead compounds, QSAR analysis, and ADMET studies. In structure-based drug designing, ligand molecules are built keeping in mind the binding cavity of the target by assembling small substructures in a stepwise manner. Ligand-based drug designing involves the 2D/3D analysis and chemical modification of ligands known to interact with a drug target of the disease. A large number of computational tools have been developed to fulfill the different objectives of drug designing. There are many success stories of computer-aided drug designing. This field has attracted many researchers working in diverse fields of knowledge such as chemistry, physics, biology, mathematics, and computer science. In drug designing, systematic and sequential use of different computer-aided drug designing tools/software is required. The algorithms and approaches of computer-aided drug designing have advanced considerably over time. The existing limitations of the tools and approaches used for drug designing have also been discussed, which can motivate readers and researchers to overcome such challenges in the future. The present book “Computer-Aided Drug Design” has been written considering the needs of researchers and students working in the domain of computer-aided drug designing. This book not only discusses recent advances in the field of computer-aided drug designing but also provides a basic knowledge of the principles, approaches, and tools used for drug designing. This book includes a discussion of biological database resources used for drug discovery. One chapter is focused on the computational approaches and resources used for vaccine designing. Similarly, a basic discussion and application of machine learning approaches such as genetic algorithm, artificial neural network, and support vector machine have been included. It also explains the basics and use of different biological, physical, and chemical parameters used for modeling, simulation, and ADMET prediction. The chapters provide a summary of related case studies along with the application, merits/demerits, limitations, and future perspectives related to the title. The steps and use of different computational approaches have been explained with the help of simple, suitable, and neat sketches and illustrations. This book offers many resources that can guide and motivate a learner to proceed with drug designing.

I hope this book will be very helpful in understanding the basics and recent advances in computer-aided drug design. I have tried my best to present a good-quality work to readers and the wider scientific community. This book will meet the needs of a broad spectrum of subjects such as bioinformatics, biotechnology, biochemistry, and pharmaceutical sciences. During the review and editing process, many suggestions, corrections, and suitable new topics have been incorporated. Still, I look forward to your valuable suggestions and feedback related to the content quality of the book.

Kanpur, India

I am highly grateful to Prof. (Mrs.) Krishna Mishra (Prayagraj), a renowned scientist and educator for her continuous support, guidance, and motivation. Prof. J.V. Vaishampayan, former Vice-Chancellor of CSJM University, Kanpur has encouraged me a lot to achieve academic excellence. I will always be highly thankful to him for his inspiration, encouragement, and support. I would like to acknowledge the effort of all the authors of this book for their extensive labor, vision, and planning in writing the chapters. I thank the reviewers whose critical comments improved the book in substantial ways. I am highly thankful to Dr. Pankaj Kumar Singh, GBPUA&T, Pantnagar for his technical support and suggestions. I am highly grateful to my parents Mr. Sudhakar Singh and Smt. Radhika Singh and other family members for their wishes, valuable support, and encouragement. I am also thankful to my colleagues Dr. Manish Kumar Gupta, Dr. Satendra Singh, Dr. P. K. Yadav, Dr. Budhayash Gautam, Dr. Prashant Ankur Jain, Dr. Anil Kumar, Dr. Durg Vijay Singh, Dr. Ajay Kumar Singh, Dr. K. K. Ojha, Dr. Pramod Katara, Dr. Prem Kumar Singh, Dr. R. K. Kesharwani, other friends and staff members for their support. I would like to appreciate the effort of Dr. Rajesh Kumar Pathak, Mr. Rohit Shukla, Mr. Apoorv Tiwari, Mr. Himanshu Avasthi, Mr. Ambuj Srivastava, and Ms. Shikha Agnihotri for their support. I am thankful to Dr. Bhavik Sawhney and the entire team of Springer Nature for their continuous support and cooperation during the entire process of publication.

1: Computational Approaches in Drug Discovery and Design
Rajesh Kumar Pathak, Dev Bukhsh Singh, Mamta Sagar, Mamta Baunthiyal, and Anil Kumar

2: Molecular Modeling of Proteins: Methods, Recent Advances, and Future Prospects
Apoorv Tiwari, Ravendra P. Chauhan, Aparna Agarwal, and P. W. Ramteke

3: Cavity/Binding Site Prediction Approaches and Their Applications
Himanshu Avashthi, Ambuj Srivastava, and Dev Bukhsh Singh

4: Role of ADMET Tools in Current Scenario: Application and Limitations
Rajesh Kumar Kesharwani, Virendra Kumar Vishwakarma, Raj K. Keservani, Prabhakar Singh, Nidhi Katiyar, and Sandeep Tripathi

5: Database Resources for Drug Discovery
Anil Kumar and Praffulla Kumar Arya

6: Molecular Docking and Structure-Based Drug Design
Shikha Agnihotry, Rajesh Kumar Pathak, Ajeet Srivastav, Pradeep Kumar Shukla, and Budhayash Gautam

7: Molecular Dynamics Simulation of Protein and Protein–Ligand Complexes
Rohit Shukla and Timir Tripathi

8: Computational Approaches for Drug Target Identification
Pramod Katara

9: Computational Screening Techniques for Lead Design and Development
Pramodkumar P. Gupta, Virupaksha A. Bastikar, Alpana Bastikar, Santosh S. Chhajed, and Parag A. Pathade

10: Advances in Pharmacophore Modeling and Its Role in Drug Designing
Priya Swaminathan

11: In Silico Designing of Vaccines: Methods, Tools, and Their Limitations
Parvez Singh Slathia and Preeti Sharma

12: Machine Learning Approaches to Rational Drug Design
Salman Akhtar, M. Kalim A. Khan, and Khwaja Osama

Dev Bukhsh Singh is an Assistant Professor at the Department of Biotechnology, Chhatrapati Shahu Ji Maharaj University, Kanpur, India. He received his B.Sc. and M.Sc. degrees from the University of Allahabad, Prayagraj, and his M.Tech. from the Indian Institute of Information Technology, Prayagraj. Holding a Ph.D. in Biotechnology with specialization in Bioinformatics from Gautam Buddha University, he has been actively involved in teaching and research since 2009, and his focus areas include molecular modeling, chemoinformatics, inhibitor/drug design, and in silico evaluation. He has authored numerous research articles and book chapters in the fields of medicinal research, molecular modeling, drug design, and systems biology. He has also published a book titled “protein structure, function, and dynamics” (Springer Nature, Singapore). He is a member of various national and international academic bodies and is a reviewer for several international journals.

Computational Approaches in Drug Discovery and Design
Rajesh Kumar Pathak, Dev Bukhsh Singh, Mamta Sagar, Mamta Baunthiyal, and Anil Kumar

Drug discovery is an expensive and complicated process. A drug must be nontoxic, bioavailable, and potent. In view of ever more stringent demands on efficacy, potency, and safety, finding a new drug-like molecule has become a complex and resource-intensive undertaking. The availability of 3D structures of molecular drug targets, together with advances in computational approaches and bioinformatics, now speeds up the application of molecular modeling in discovery. In this chapter, several molecular modeling strategies employed in modern drug discovery programs are discussed. The concepts of structure- and ligand-based drug designing, protein modeling and visualization, molecular docking, virtual screening, molecular dynamics simulation, pharmacophore modeling, and QSAR approaches are explained. We also list important database resources and tools available for drug research. Finally, we present case studies conducted in our lab, showing how computational approaches can be applied in practice to the discovery and designing of novel drugs from natural sources.

R. K. Pathak · M. Baunthiyal (*)
Department of Biotechnology, Govind Ballabh Pant Institute of Engineering & Technology, Pauri Garhwal, Uttarakhand, India

D. B. Singh
Department of Biotechnology, Institute of Biosciences and Biotechnology, Chhatrapati Shahu Ji Maharaj University, Kanpur, Uttar Pradesh, India

M. Sagar
Department of Bioinformatics, University Institute of Engineering and Technology, Chhatrapati Shahu Ji Maharaj University, Kanpur, Uttar Pradesh, India

A. Kumar
Rani Lakshmi Bai Central Agricultural University, Jhansi, Uttar Pradesh, India

© The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2020
D. B. Singh (ed.), Computer-Aided Drug Design, https://doi.org/10.1007/978-981-15-6815-2_1

Molecular modeling · Molecular docking · Molecular dynamics simulation · QSAR · Bioinformatics

Historically, new drugs were discovered from plants and other natural sources through accidental observation and analysis (Singh et al. 2012). Drug discovery is extremely challenging and requires adequate infrastructure and laboratory facilities. People in every nation have used drugs of plant or animal origin to treat and prevent disease. The quest for substances to fight sickness and to alter mood and consciousness is nearly as fundamental as the search for food and shelter. Various drug molecules derived from plants or animals are highly valued, but most drugs in the modern medical system are products of synthetic chemistry and biotechnology. A drug can thus be defined as a substance of natural or synthetic origin that is employed in the prevention, treatment, or diagnosis of disease, or to modulate a target or function of a biological system. More generally, a drug is a chemical that affects biological systems and their processes from the molecular to the cellular level (Huang et al. 2010). However, the traditional approach to discovering novel drug molecules is time-consuming and cost-intensive. Newer approaches were therefore developed to overcome the limitations of traditional drug discovery research. They rest on the consideration that the molecular target present in the body and the potential bioactive compound are directly related to each other. Designing a drug requires an understanding of the disease and of the molecular mechanism of the infectious process. For structure-based drug design, the first step is to identify a molecular target that is essential to a disease process or an infectious pathogen (Nag and Dey 2010). The next key step is to determine the molecular structure of the target through experimental or computational approaches. The success of structure-based drug discovery depends on an accurate target structure with detailed information on the amino acid residues present in the binding site, which a molecular docking program then uses to screen small-molecule databases (Kesharwani et al. 2018).

Drugs have historically been developed to act on a single biological entity, usually a protein known as the target, with high selectivity in order to prevent unintended effects arising from interactions with other biological targets. On this basis, the concept of multi-target drugs was long regarded as undesirable, as it was naturally linked with harmful effects (Bolognesi and Cavalli 2016; Ramsay et al. 2018). In parallel, several lines of evidence indicated that molecules that hit more than one target can have a safer profile than single-target molecules. The idea of multi-target drugs has therefore progressed rapidly and dramatically, from an emerging paradigm when first laid out in the early 2000s to one of the hottest topics in drug research in 2017 (Roth et al. 2004; Ramsay et al. 2018).


In multifactorial diseases such as cancer and Alzheimer’s disease, there is an urgent need to find multi-target inhibitors. Because they are essential for the development of Alzheimer’s disease, β-amyloid cleavage enzyme (BACE-1) and acetylcholinesterase (AChE) have been considered promising drug targets (Goyal et al. 2014). Numerous pathological manifestations likewise contribute to cancer, and the medications used to treat it can cause serious side effects. Multi-indication therapies that simultaneously inhibit multiple targets and minimize side effects are therefore required (Lim et al. 2019). Some multi-target drugs are already on the market. One recently approved example (April 2017) is midostaurin, a well-known multi-kinase inhibitor for the treatment of newly diagnosed adult patients with acute myeloid leukemia who have a particular FLT3 gene variant. A study has shown that it can inhibit the activity of protein kinase C alpha, KIT, VEGFR2, PDGFR, and wild-type (WT) and/or mutant FLT3 tyrosine kinases (Levis 2017; Ramsay et al. 2018).

Drug discovery is based on screening small-molecule databases against a receptor, whereas drug design is based on modifying the structure of a lead compound when that compound has therapeutically undesirable side effects. In addition, quantitative structure–activity relationship (QSAR) modeling is another area of molecular modeling that has helped medicinal chemists in the drug design process. Traditionally, the identification of a new drug molecule has been a very complex and time-consuming process. Of the 5000–10,000 compounds studied, only a single drug molecule reaches the market. The cost of each drug is about $156 million in the discovery phase. Phase I, II, and III clinical trials and Food and Drug Administration (FDA) processes cost another $75 million.
This brings the total to about $231 million for each drug that reaches the market for the benefit of society. Gaining FDA approval also requires an extensive and expensive procedure (Huang et al. 2010; de Ruyck et al. 2016).

Considering the high failure rates, considerable costs, and slow pace of new drug discovery, repurposing “existing” medicines to treat common and rare diseases is becoming increasingly attractive, because it uses low-risk compounds to develop drugs in a shorter time and in a cost-effective manner (Pushpakom et al. 2019). Three kinds of approaches are widely used in drug repositioning: computational, experimental, and mixed approaches (Xue et al. 2018; Talevi and Bellera 2020). The first case of drug repositioning, in the 1920s, was an accidental discovery. After a century of progress, further strategies for accelerating the drug repositioning cycle have been proposed, and machine learning algorithms have been applied to boost drug repositioning efficiency. Beyond the computational methods, experimental methods such as target screening, cell assays, animal models, and clinical approaches were established; these provide clear evidence of links between drugs and diseases and are effective and trustworthy. In recent years, growing numbers of researchers have combined computational and experimental methods to identify new drug indications, in what are called mixed approaches: biological experiments and clinical studies are used to confirm the findings of the computational methods. Mixed approaches provide incentives for the successful and rapid discovery of repositioned drugs (Xue et al. 2018). Some successful repositioned drugs are Zidovudine, Minoxidil, Sildenafil, Thalidomide, Celecoxib, Atomoxetine, Duloxetine, Rituximab, Raloxifene, Fingolimod, Dapoxetine, Topiramate, Ketoconazole, and Aspirin (Pushpakom et al. 2019).

However, there are also major technical and regulatory issues that need to be tackled. Drug repositioning is an intricate process involving several factors, including technology, business models, patents, and investment, as well as consumer demands. Although several medical databases have been developed, choosing the best approach to make full use of vast amounts of medical data remains a challenge, and novel approaches for drug repositioning are urgently needed. Another issue is intellectual property (IP): IP protection for repositioned drugs is limited. For example, some novel drug–target–disease associations discovered by repositioning researchers have been verified by publications or online databases; however, because of the law, it is difficult to obtain IP protection for these associations. The IP problem hinders the entry of such repositioned drugs into the market, and some repositioning research initiatives are forced to give up, which wastes money and time. Developing a new commercial model is therefore essential, because the current model is a serial model that causes problems related to funding and investment (Xue et al. 2018).

Molecular modeling is a data-driven branch of science, with many of its algorithms and databases being created or adapted in response to new forms of data (Xia 2017). Today, computer experiments play an increasingly important role in research. The advent of high-performance computing has enabled in silico experimentation as a tool for interpolating between laboratory experiments and theory (Aminpour et al. 2019).
Owing to advances in computational algorithms and the development of efficient software, the time required to identify lead compounds has been reduced dramatically. A detailed flowchart highlighting the different approaches of molecular modeling in drug discovery and design is shown in Fig. 1.1.

Structure-Based Drug Designing

Structure-based design is a multidisciplinary and iterative process that is well established in research institutions and the pharmaceutical industry. It has played a major role in the discovery and development of several registered drugs and clinical candidates, for example zanamivir, nelfinavir, and aleglitazar. In contrast, structure-based design is relatively new in the agrochemical industry, and at present no products on the market have been developed directly with this approach. However, there are several databases and software programs where structure-based design has a strong impact (Huang et al. 2010). The major database resources used in drug discovery programs are listed in Table 1.1. Different computational approaches used in the discovery of lead molecules are discussed in the following sections.


Fig. 1.1 Application of molecular modeling approaches in drug discovery and design

Drug target identification and validation is the initial step of the drug discovery process. A drug target is a macromolecule that has an established function in the pathophysiology of a disease. Four major classes of drug targets are found in organisms: proteins (including receptors and enzymes), nucleic acids (DNA and RNA), carbohydrates, and lipids. The majority of drugs on the market act on protein targets. However, with the decoding of the genomes of several pathogens, nucleic acids could gain great importance as drug targets in the future (Gashaw et al. 2012). The selection of potential drug targets from thousands of candidate macromolecules is a challenging task. In the post-genomic era, genomics and proteomics approaches are the most important tools for target identification (Singh et al. 2016). In addition, advances in high-throughput omics technologies have generated a huge amount of data on host–pathogen interactions. These data are integrated and analyzed by the scientific community through network and systems biology approaches to accelerate target identification in drug discovery programs.

Table 1.1 Availability of major compound database resources for molecular modeling

S. No. | Description | Availability | References
1 | It is a freely available database of commercially available compounds for molecular docking and virtual screening | http://zinc.docking.org/ | Irwin and Shoichet (2005)
2 | It is a database of small chemical molecules and their biological activities | https://pubchem.ncbi.nlm.nih.gov/ |
3 | It is a chemical structure database used for drug discovery | http://www.chemspider.com/ | Pence and Williams (2010)
4 | It is a small molecule database that contains information about ADMET and binding for a huge number of bioactive compounds | https://www.ebi.ac.uk/chembldb/ | Gaulton et al. (2012)
5 | It is a comprehensive database resource containing information about drugs, their targets, and other useful information | |

Modeling and Visualization of Macromolecule Structure

Determining three-dimensional structures through experimental approaches is costly and time-consuming. Comparative modeling, or homology modeling, which uses sequence information, is therefore a widely used and accurate method for predicting three-dimensional structures, yielding models suitable for a wide range of applications in drug discovery (Bodade et al. 2010; Pathak et al. 2016). It is generally the method of choice when homology exists between the target protein and a template structure (Sussman et al. 1998). This approach is based on the assumption that two similar sequences adopt similar three-dimensional structures. A higher sequence identity between the target and the template promises a more reliable model. Modeling the 3D structure of a protein from its sequence is necessary for drug design when no structure verified by X-ray crystallography or NMR is available (Hekkelman et al. 2010; Bagaria et al. 2012). Threading (fold recognition) and ab initio modeling are other methods used to model the 3D structure when no appropriate template is detected in the PDB database (Singh and Tripathi 2020).

CASP (Critical Assessment of protein Structure Prediction) plays a key role in protein structure prediction. It is a biennial community experiment designed to evaluate the state of the art in protein structure modeling. Participants are provided with the amino acid sequences of target proteins and model the corresponding 3D structures. Independent assessors then compare the submissions with the experimental structures. It is a double-blind experiment: participants do not have access to the experimentally determined structures, and the assessors do not know the identity of those who submit. In addition to structure models, a variety of other aspects of protein modeling are also tested: refinement of an approximate structure toward the experimental one, estimates of overall and per-residue model accuracy, modeling of protein oligomer structure, the ability to build models using a range of sparse data types, and the accuracy of protein structure features relevant to inferring function (Kryshtafovych et al. 2019). CASP thus provides an objective evaluation of the different servers used for protein 3D structure prediction.

RasMol, PyMol, Chimera, and other visualization tools play a significant role in viewing and analyzing predicted and experimentally determined 3D structures of macromolecules at the atomic level. Many efforts have been made in recent years to develop user-friendly, computer-graphics-based simulation environments for structural biologists. Such visualization is widely used in biology to present the results of simulations or experiments in post-processing, and graphical editors are used to build models for a better understanding of atomic 3D coordinate data (Seeliger and de Groot 2010; Mamgain et al. 2018).
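Because the reliability of a homology model is tied above to the sequence identity between target and template, it can help to see how that quantity is computed. The following is a minimal Python sketch, not a method from this chapter: it assumes the two sequences are already aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap. Other denominators (for example, the full alignment length) are also in use, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are already aligned and therefore equal in length.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy pre-aligned sequences (hypothetical, for illustration only).
target = "MKT-AYIAKQR"
template = "MKTLAYLAKQ-"
print(round(percent_identity(target, template), 1))  # 8 of 9 compared columns match -> 88.9
```

In practice the alignment itself would come from an alignment program, and a candidate template with higher identity to the target would usually be preferred.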

Binding Site Prediction and Analysis

The determination of binding sites is not a simple task, and researchers have suggested several criteria for selecting a binding site. The functional activity of a protein is governed by a highly conserved cluster of amino acid residues present in the binding site pocket. Most available algorithms are based on similarity searches of the molecular surface against functional-site information derived from databases such as the PDB, which contain fully reviewed and experimentally validated information on protein structures. Some methods have also been developed based on phylogenetic profiling of residues and on other models such as HMM and SVM, as well as methods assessed in CASP9 (Schmidt et al. 2011; Liu et al. 2014). Generally, binding site residues are highly conserved among closely related proteins. Such residues can also be identified by superimposing the predicted model on its template, which supports the integrity of the homology model and assists in positioning the conserved active site residues (Nag and Dey 2010; Bajorath 2015). In addition, many protein–ligand complex structures are available in public databases, and the cavity in which the ligand is bound serves as a signature of the binding site. Researchers usually separate the bound ligand from the protein and treat the region it occupied as the binding site for molecular docking studies with other ligand structures; because this region comes from an experimentally determined complex structure, it yields meaningful results. Advances in bioinformatics have provided several computational tools that can predict novel binding site residues in the cavity of a predicted protein model, which are then used in drug discovery research.
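The practice described above, taking the region occupied by a bound ligand as the binding site, is often implemented as a simple distance filter. The sketch below is an illustration under stated assumptions rather than the procedure of any particular tool: coordinates are assumed to be already parsed into plain tuples, the 4.5 Å cutoff is an arbitrary illustrative choice, and the function name and toy coordinates are hypothetical.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=4.5):
    """Return sorted residue ids with at least one atom within `cutoff` of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs
    ligand_atoms:  iterable of (x, y, z) coordinates
    cutoff:        distance threshold in angstroms (assumed value, not from the chapter)
    """
    ligand_atoms = list(ligand_atoms)
    site = set()
    for residue_id, coord in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another of its atoms
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            site.add(residue_id)
    return sorted(site)


# Toy coordinates (hypothetical); residue ids are (name, number) tuples.
protein = [
    (("HIS", 57), (1.0, 0.0, 0.0)),
    (("HIS", 57), (2.0, 0.0, 0.0)),
    (("SER", 195), (0.0, 3.0, 0.0)),
    (("GLY", 10), (20.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
print(binding_site_residues(protein, ligand))
```

With real data the coordinates would be read from the complex structure file, and the cutoff would be chosen to suit the docking protocol.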

Docking aims to fit the structure of a ligand precisely within the constraints of a receptor binding site and to evaluate the strength of binding accurately (Adrian-Scotto and Vasilescu 2008). When the binding site of the target protein is not known, blind docking is helpful because the whole protein structure is treated as the binding site region; when the binding site is known, site-specific docking is used to predict how the ligand interacts with it. Blind docking is generally less accurate and takes more time and memory than site-specific docking, because the latter targets only selected amino acid residues in the binding site cavity. The majority of the available literature presents case studies in which molecular docking has been used to address specific issues related to ligand design or target recognition (Huang et al. 2010; Yuriev and Ramsland 2013; Pathak et al. 2016; Rana et al. 2019). The highly cited molecular docking programs used in drug discovery are summarized in Table 1.2.
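The difference in cost between blind and site-specific docking can be made concrete by comparing the size of the region searched. The Python sketch below assumes a docking setup in which the search region is an axis-aligned box given by a center and edge lengths; this is an assumption for illustration, not the interface of any specific docking program. The 5 Å padding, the function names, and the toy coordinates are hypothetical.

```python
def search_box(coords, padding=5.0):
    """Axis-aligned box (center, size) enclosing `coords`, padded on every side.

    coords:  iterable of (x, y, z) coordinates in angstroms
    padding: margin added on each side of the box (assumed value)
    """
    coords = list(coords)
    if not coords:
        raise ValueError("no coordinates given")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2.0 * padding for lo, hi in zip(mins, maxs))
    return center, size


def box_volume(size):
    """Volume of a box given its three edge lengths."""
    return size[0] * size[1] * size[2]


# Toy coordinates (hypothetical): all protein atoms vs. binding-site atoms only.
protein_atoms = [(0.0, 0.0, 0.0), (40.0, 30.0, 20.0), (10.0, 10.0, 10.0), (12.0, 11.0, 9.0)]
site_atoms = [(10.0, 10.0, 10.0), (12.0, 11.0, 9.0)]

blind_center, blind_size = search_box(protein_atoms)  # blind docking: whole protein
site_center, site_size = search_box(site_atoms)       # site-specific docking: pocket only
print(box_volume(blind_size), box_volume(site_size))
```

In this toy case the blind-docking box is many times larger than the site-specific box, which illustrates why searching the whole protein tends to need more time and memory.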

Table 1.2 A summary of the highly cited molecular docking programs used in drug discovery S. No. 1

Description Used for molecular docking. It predicts the binding affinity and poses of a small molecule to a 3D structure target protein Used for virtual screening and molecular docking

A complete package for molecular modeling and computer-aided drug discovery (CADD)