Open Access Databases and Datasets for Drug Discovery

Name: Open Access Databases and Datasets for Drug Discovery
Brand: Wiley-VCH
Price: 142.99 EUR
Availability: OnlineOnly

Antoine Daina, Michael T. Przewosny, Vincent Zoete (Editors)

Wiley-VCH (Publisher)

1st edition

Published on 27 September 2023

352 pages

E-Book

ePUB with Adobe DRM

978-3-527-83048-0 (ISBN)

142.99 € incl. 7% VAT

E-book single licence

Available for download

People

Antoine Daina studied Pharmacy at the University of Lausanne (Switzerland) and obtained a Ph.D. in Pharmaceutical Sciences from the University of Geneva. After working in industry as a computational chemist in agrochemical research and in academia as a lecturer and researcher in drug discovery, he joined the SIB Swiss Institute of Bioinformatics in 2012. He is now a Senior Scientist in the Molecular Modeling Group, in charge of methodological developments in the SwissDrugDesign program, of supporting drug discovery projects, and of teaching computer-aided drug design. He is the author of 22 peer-reviewed research articles or reviews and a co-inventor on 5 patents.

Michael Przewosny studied chemistry at RWTH Aachen (Germany) and obtained a PhD in the field of peptides and protein chemistry at the DWI Aachen. He has over 20 years of experience in pharmaceutical research and drug discovery, having held several positions as laboratory manager in medicinal chemistry and process development. He is listed as co-inventor on 27 patents.

Vincent Zoete studied Chemistry at the Ecole Nationale Supérieure de Chimie de Lille and at Lille University (France). He studied Molecular Modeling in the Karplus Lab in Strasbourg and joined the Swiss Institute of Bioinformatics (SIB) in 2004. He was Associate Group Leader of the SIB Molecular Modeling Group until 2017 and has been Group Leader since then. He has also been an Assistant Professor in Molecular Modelling at the University of Lausanne since 2017. He is the coordinator/developer of SwissDock.ch, SwissParam.ch, SwissBioisostere.ch, SwissTargetPrediction.ch, SwissSimilarity.ch, and SwissADME.ch, and the author of 122 peer-reviewed research articles and reviews, and 5 patents.

Editors

Antoine Daina

ISNI: 0000 0005 1646 2127

Michael T. Przewosny

ISNI: 0000 0003 5951 0390

Vincent Zoete

Series editors

Raimund Mannhold

ISNI: 0000 0001 0879 830X

Helmut Buschmann

ISNI: 0000 0000 0365 2566

Jörg Holenz

ISNI: 0000 0000 1368 3163

Contents

INTRODUCTION
Open access databases and datasets for computer-aided drug design: A short list used in the Molecular Modelling Group of the SIB

PART 1: SMALL MOLECULES
PubChem: A Large-Scale Public Chemical Database for Drug Discovery
DrugBank Online: A How-to Guide
Bioisosteric replacement for drug discovery supported by the SwissBioisostere database

PART 2: MOLECULAR TARGETS
The Protein Data Bank (PDB) and macromolecular structure data supporting computer-aided drug design
The SWISS-MODEL repository of 3D protein structures and models
PDB-REDO in computational aided drug design (CADD)
Pharos & TCRD: Informatics tools for illuminating dark targets

PART 3: USER'S POINTS OF VIEW
Mining for bioactive molecules in open databases
Open access databases - an industrial view

1
Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB

Antoine Daina¹, María José Ojeda-Montes¹, Maiia E. Bragina², Alessandro Cuozzo², Ute F. Röhrig¹, Marta A.S. Perez¹, and Vincent Zoete¹,²

¹ SIB Swiss Institute of Bioinformatics, Molecular Modeling Group, Quartier UNIL-Sorge, Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland

² University of Lausanne, Ludwig Institute for Cancer Research, Department of Oncology UNIL-CHUV, Route de la Corniche 9A, CH-1066 Epalinges, Switzerland

The role of computer-aided drug design (CADD) in modern drug discovery [1-15] is to support its various processes, including hit finding, hit-to-lead, lead optimization, and the activities leading up to preclinical trials, through numerous in silico predictors and filters. These tools serve a wide variety of objectives: enriching the sets of molecules submitted to experimental screening with potentially active compounds; identifying potentially problematic molecules, such as those bearing toxic moieties or showing nonspecific activity; suggesting chemical modifications that increase a compound's affinity for the therapeutic target or improve its pharmacokinetics [16-19]; and assisting in the various selection processes aimed at identifying and promoting the most promising molecules. These approaches are generally divided into two main families [20].

Structure-based approaches [8,21-23] use the three-dimensional structure of the targeted protein, for example, to estimate with docking software how, and how strongly, a small molecule will bind to it. Because this information does not have to come solely from an experimental method (e.g. X-ray crystallography, NMR, or cryo-electron microscopy), a large number of molecules can be processed very quickly and at moderate cost. In turn, this information can be used to determine how to modify the chemical structure of a small molecule so as to rationally optimize its intermolecular interactions with the protein target. The most promising compounds can then be selected for experimental validation, creating a cyclic optimization process through this feedback loop between in silico and in vitro approaches.

Ligand-based approaches take advantage of already known molecules with certain bioactivities or physicochemical properties, in order to derive the information necessary to predict the bioactivity or properties of other compounds, real or virtual. Indeed, CADD has been a pioneering research area in the development and application of machine learning methods [24-32], with the emergence, as early as the 1960s [33], of quantitative structure-activity relationships (QSAR [34]) or quantitative structure-property relationships (QSPR).
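
The ligand-based principle described above can be illustrated with a deliberately minimal sketch of a one-descriptor QSAR model. Ordinary least squares is used here only as the simplest possible stand-in for the statistical and machine learning methods cited; the function names, the single descriptor, and all numerical values are hypothetical and synthetic (the activities are generated from a known linear rule), not taken from any database or from the chapter.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic "known molecules": one descriptor value per compound and an
# activity generated from activity = 2 * descriptor + 1 (not real data).
TRAIN_DESCRIPTOR = [1.0, 2.0, 3.0, 4.0]
TRAIN_ACTIVITY = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(TRAIN_DESCRIPTOR, TRAIN_ACTIVITY)


def predict_activity(descriptor):
    """Predict the activity of a new (real or virtual) compound."""
    return slope * descriptor + intercept


print(slope, intercept, predict_activity(5.0))
```

In practice, a QSAR or QSPR model would rely on many descriptors, a much larger training set, and proper validation; the sketch only shows the flow from known compounds to a prediction for an unseen one.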

To perform these tasks, CADD benefits from numerous databases and datasets of small molecules, bioactivities and biological processes, 3D structures of small compounds and biomacromolecules, or molecular properties, some of which relate to pharmacokinetics or toxicity [13,35-38]. Created in 1971, the Protein Data Bank (PDB) [39], which stores the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, is a pioneer among freely and publicly available databases with possible applications in CADD. Currently managed by the wwPDB [40] organization and its five members, RCSB PDB [41], PDBe [42], PDBj [43], EMDB [44] and BMRB [45], the PDB continues to provide the CADD community with numerous valuable 3D structures of therapeutically relevant proteins in the apo form or in complex with small drug-like molecules, which can be used to feed structure-based approaches. Several subsets of such structures have been created over time, for instance, to provide reference sets for benchmarking docking software, such as the Astex [46] or the Iridium [47] datasets.

For a very long time, ligand-based approaches were generally limited to the use of small datasets, collected on a case-by-case basis during specific drug design projects, which restricted their application to the building of focused models with limited scope. This situation changed dramatically during the 2000s with the rise of large-scale databases created specifically for the benefit of drug discovery in general and CADD in particular. ChEMBL [48, 49], released in 2008, and PubChem [50], released in 2004, which collect molecules and their activities in biological assays systematically extracted from the medicinal chemistry literature, patent publications, or experimental high-throughput screening programs, are certainly among the forerunners of this trend. Such databases paved the way for CADD approaches addressing, for instance, the prediction of bioactivities on a very large scale, including ligand-based methods.

ZINC [51], freely accessible since 2004, is another large-scale database of small molecules, this time prepared especially for virtual screening. This important resource focuses on the compilation and storage of commercially available chemical compounds. DrugBank [52], whose first version dates back to 2006, is an example of a database gathering abundant curated, high-quality information about a group of molecules of biological interest, in this case mainly, but not exclusively, approved or developmental drugs. Although smaller than ChEMBL or PubChem, for instance, this type of resource also plays a critical role in the development of new CADD techniques and filters, and in more direct applications in virtual screening, because of the quality, structure, and practicality of the information provided.

Researchers working in CADD can be considered to have two main activities: one consists in designing, validating, and benchmarking new in silico approaches; the other is applying existing tools to support drug discovery projects. The nature of the databases reflects this duality. Some are clearly oriented toward applied use. With virtual screening in mind, this is the case for resources gathering a large number of commercial or virtual molecules, such as ZINC [51] or GDB-17 [53], whose main purpose is to serve as a source of molecules to feed virtual screening campaigns. At the opposite end of the spectrum, we find molecular sets constructed specifically for benchmarking screening methods, such as DUD-E [54] or DEKOIS [55]. These contain a limited number of compounds, known to be active or inactive on certain protein targets, and carefully chosen to avoid any bias in molecular properties that would allow screening software to identify the active ones too easily. Between these two extremes, we find databases such as ChEMBL, PubChem, or TCRD/Pharos [56], which contain a large number of known bioactive molecules. These generalist databases can be used to develop a large range of CADD methods, including screening or reverse screening approaches such as the Similarity Ensemble Approach (SEA) [57, 58] or SwissTargetPrediction [59, 60], and they also constitute a source of real molecules to be virtually screened.

By definition, the value of many CADD-related databases lies in their capacity to store a potentially large number of molecules, along with useful annotations, and to distribute them efficiently to the public. This was made possible by the development and dissemination of widely accepted, dedicated file formats. The most common formats for representing molecules as strings are SMILES [61, 62] and InChI [63, 64]. These one-line formats have the great advantage of using little disk space or memory, facilitating the storage and rapid transfer of large numbers of molecules. It should be noted, however, that several SMILES strings can represent the same molecule. This can be problematic and can generate redundancy when compounds from different sources are gathered. To avoid this situation, one can generate canonical SMILES with suitable software, which are by definition unique for each molecule within a given canonicalization scheme, or use the UniChem [65] database, which provides pointers between the molecules of most common databases. Structure-based approaches, such as molecular docking, 3D fingerprinting [66], or pharmacophores [67, 68], require a spatial representation of small molecules. The most frequently employed file formats that include three-dimensional atomic coordinates are the Structure-Data File (SDF), the MDL Mol, and the Tripos Mol2 formats. Compounds are often available in such formats in the major small-molecule databases, such as ZINC [51], ChemSpider [69], or DrugBank [52], which allows their direct use in 3D-based approaches. Other formats are available to store 3D structures of biomacromolecules, taking advantage of the fact that large biomolecules are based on the repetition of a...
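
To make the notion of a 3D-capable file format more concrete, the following sketch reads atomic coordinates and bonds from a single MDL Mol block in the common V2000 fixed-column layout (an SDF file is essentially a concatenation of such blocks, each followed by optional data fields and a `$$$$` record terminator). The names `Atom`, `parse_molblock`, and `WATER_MOLBLOCK` are hypothetical, the water coordinates are hand-written for illustration rather than taken from any database, and real-world files call for a dedicated cheminformatics toolkit instead of this minimal parser.

```python
from __future__ import annotations

from typing import NamedTuple


class Atom(NamedTuple):
    symbol: str
    x: float
    y: float
    z: float


def parse_molblock(text: str) -> tuple[list[Atom], list[tuple[int, int, int]]]:
    """Parse atoms and bonds from a V2000 Mol block using fixed-width columns."""
    lines = text.splitlines()
    counts = lines[3]  # the fourth line is the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol follows a space
        atoms.append(Atom(line[31:34].strip(),
                          float(line[0:10]), float(line[10:20]), float(line[20:30])))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom, second atom (1-based indices) and bond type
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Hand-written illustrative Mol block for water (coordinates are approximate).
WATER_MOLBLOCK = "\n".join([
    "water (illustrative)",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.1173 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000   -0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])

atoms, bonds = parse_molblock(WATER_MOLBLOCK)
for atom in atoms:
    print(atom.symbol, atom.x, atom.y, atom.z)
```

Fixed-width slicing is used instead of whitespace splitting because adjacent fields in this layout are not guaranteed to be separated by spaces, for example when atom or bond counts reach three digits.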
