Representation and Manipulation of 3D Molecular Structures
3D Molecular Structures
C371 Fall 2004

Morgan Algorithm (Leach & Gillet, p. 8)

Bioisosteres (Leach & Gillet, p. 31)

Milestones In Chemical Information: IV (PW)
• Structure diagrams are planar but molecules are not, so existing 2D screening and graph-search methods need to be extended to allow 3D substructure searching (Pfizer and Lederle, 1986-87)
• Sources of 3D structural data
  – Experimental data (Cambridge Structural Database)
  – Computational chemistry (quantum mechanics, molecular mechanics, molecular dynamics)
  – Structure-generation methods for databases of molecules
    • CONCORD (Texas, 1987)
    • CORINA (Munich/Erlangen, 1990)
• Further extensions to allow flexible searching (ICI, MDL and Tripos, 1991-94)

Milestones In Molecular Modelling: IV (PW)
• Use of 3D information in QSAR to facilitate structure-based approaches to drug discovery
• COmparative Molecular Field Analysis (Tripos 1988), and related approaches
  – Calculate energies at points on a 3D grid surrounding a molecule
  – Statistical correlation with activity to identify important positions in space
  – Need for alignment

Pharmacophore (Leach & Gillet, p. 32)

3D Substructure Searching (PW)
[Figure: 3D pharmacophore query with inter-feature distances a = 8.62 ± 0.58 Å, b = 7.08 ± 0.56 Å and c = 3.35 ± 0.65 Å, shown with structure diagrams]

Current Activities: Virtual Screening (PW)
• Need to prioritise the many molecules that could be tested
• Increasingly sophisticated levels of filtering to maximise the number of potential leads
  – “Drugability” considerations
  – Similarity searching (both 2D and 3D) using initial weak leads
  – 3D substructure searching once possible pharmacophoric patterns have been identified
  – Docking once the 3D structure of the biological target is available

Cambridge Structural Database
• X-ray crystal structures of more than 250,000 compounds (organic and organometallic)
• Established in 1965
• Textual queries
• Structural queries
• Specific 3D constraints (conformation or distance variables)

Protein Data Bank
• More than 25,000 X-ray and NMR structures of proteins and protein-ligand complexes
• Some nucleic acid and carbohydrate structures
• Founded in 1971 at Brookhaven National Laboratory; now run by a consortium
• Retrieval by textual queries or, in some interfaces, by amino acid sequence

Uses of the CSD and PDB
• Data mining for conformational properties (CSD & PDB)
• Data mining for information about intermolecular interactions (CSD & PDB)
• Further understanding of the nature of protein structure and its relationship to amino acid sequence (PDB)
• Homology modeling (comparative modeling) (PDB)

3D Pharmacophores
• Definition: a set of features together with their relative spatial orientation that are thought to be capable of interaction with a particular biological target
  – Hydrogen bond donors and acceptors
  – Positively and negatively charged groups
  – Hydrophobic regions and aromatic rings
• Depends on atomic properties rather than element types
• Does not depend on specific chemical connectivity
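The following minimal brute-force sketch illustrates a 3D pharmacophore search like the one above. It assumes each molecule has already been reduced to a list of typed feature points (a feature type plus x, y, z coordinates in Å) taken from a single conformation. The three distance ranges come from the slide. The feature types and the assignment of a, b and c to particular pairs of features are hypothetical, because the original figure cannot be recovered. The name `find_matches` is illustrative only.

```python
import itertools
import math

# Hypothetical feature types for the three query points (not from the source figure).
QUERY_TYPES = ("acceptor", "donor", "acceptor")

# Distance constraints from the slide, as (target, tolerance) in angstroms.
# Mapping a, b, c onto these index pairs is an assumption for illustration.
QUERY_DISTANCES = {
    (0, 1): (8.62, 0.58),  # a
    (1, 2): (7.08, 0.56),  # b
    (0, 2): (3.35, 0.65),  # c
}


def find_matches(features):
    """Return every ordered triple of feature indices that satisfies the query.

    features: list of (feature_type, (x, y, z)) tuples for one conformation.
    """
    hits = []
    for triple in itertools.permutations(range(len(features)), 3):
        # Each query point must map onto a feature of the same type.
        if any(features[idx][0] != QUERY_TYPES[k] for k, idx in enumerate(triple)):
            continue
        # Every pairwise distance must fall within target +/- tolerance.
        if all(
            abs(math.dist(features[triple[i]][1], features[triple[j]][1]) - target) <= tol
            for (i, j), (target, tol) in QUERY_DISTANCES.items()
        ):
            hits.append(triple)
    return hits


# Hypothetical molecule: three features forming the query triangle plus a distant decoy.
molecule = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("donor", (8.62, 0.0, 0.0)),
    ("acceptor", (2.05, 2.65, 0.0)),
    ("donor", (20.0, 20.0, 20.0)),
]
print(find_matches(molecule))  # [(0, 1, 2)]
```

This exhaustive check costs O(n³) in the number of features. A practical system would run a rapid screen first and would also handle conformational flexibility. Both steps are omitted here.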
Lipinski Rule of Five
• Poor absorption or permeation is more likely when a molecule has:
  – More than five hydrogen bond donors
  – More than ten hydrogen bond acceptors
  – LogP greater than five
  – Molecular weight greater than 500

3D Database Searching
• As with 2D searching, usually involves a two-stage process
  – Rapid screen to eliminate molecules that cannot match the query
  – Graph matching to identify matches
• Interatomic distances between pairs of atoms are important

Structure Generation Programs
• CONCORD (Coordinates found in the CAS Registry File)
• CORINA (COoRdINAtes)
  – About CORINA
  – Generating 3D structures with CORINA

Conformational Search and Analysis; Systematic Conformational Search
• Goal of Conformational Analysis: identify all accessible minimum-energy structures of a molecule
• Global minimum-energy conformation: the minimum with the lowest energy
• Systematic searches assign values to the torsion angles of the rotatable bonds in the molecule, stepping each angle through a fixed set of values

Random Conformational Search
• Simulated annealing: the temperature is gradually reduced from a high value to a low value

Other Conformational Searches
• Distance geometry
• Molecular dynamics

Deriving 3D Pharmacophores
• Pharmacophore mapping: the process of deriving a 3D pharmacophore; the main difficulties are
  – Conformational flexibility
  – Different combinations of pharmacophoric groups in the molecule
• Genetic algorithms: a class of optimization methods based on computational models of Darwinian evolution

Applications: Structural Genomics
• Definitions (Goals)
  – Characterization of all protein structures in a given genome
  – Provide sufficient coverage of fold space to facilitate accurate homology modeling of the majority of proteins of biological interest
  – PDB Target Database (http://targetdb.rcsb.org/)

Searching 3D Protein Structures (PW)
• Searching protein sequences is well established, but how can the 3D structures in the Protein Data Bank (PDB) be searched?
• Extensive collaboration between Information Studies and Molecular Biology and Biotechnology to develop graph representations of proteins that can be searched with isomorphism algorithms analogous to those used for chemical structures
• The focus here is on folding motifs (secondary structure elements) in proteins, but other graph representations have also been developed:
  – Protein amino acid side chains
  – Carbohydrates
  – Nucleic acids

Representation Of Protein Folding Motifs: I (PW)
• The helix and strand secondary structure elements (SSEs) are both approximately linear, repeating structures, so each can be represented by a vector drawn along its major axis
• The nodes of the graph are these vectors and the edges comprise:
  – The angle between a pair of vectors
  – The distance of closest approach of the two vectors
  – The distance between the vectors’ mid-points
• PROTEP compares such representations using a maximal common subgraph isomorphism algorithm to identify common folds

Representation Of Protein Folding Motifs: II (PW)

Structural Relationship Between Leucine Aminopeptidase And Carboxypeptidase A (PW)
• Use of 1LAP as the target for a PROTEP search requiring structures with at least 7 SSEs in common with the target
• The four carboxypeptidase structures in the PDB at that time share with 1LAP a fold containing five helices and eight strands in a sheet
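The three edge attributes of the SSE graph described above can be computed from the end points of each axis vector. This is a minimal sketch. It assumes the start and end points of each helix or strand axis have already been fitted, and that fitting step is not shown. When it computes the distance of closest approach, it treats each axis as an infinite line, which may differ from the exact definition PROTEP uses. The function `sse_edge` is hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    return math.sqrt(dot(a, a))


def sse_edge(start1, end1, start2, end2):
    """Return (angle in degrees, closest-approach distance, mid-point distance)
    for two SSE axis vectors given by their start and end points."""
    d1, d2 = sub(end1, start1), sub(end2, start2)
    # Angle between the two direction vectors, clamped against rounding error.
    cos_angle = dot(d1, d2) / (norm(d1) * norm(d2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    # Closest approach of the two axes, treated here as infinite lines (assumption).
    w = sub(start2, start1)
    n = cross(d1, d2)
    if norm(n) < 1e-9:
        # Parallel axes: perpendicular distance from start2 to axis 1.
        closest = norm(cross(w, d1)) / norm(d1)
    else:
        closest = abs(dot(w, n)) / norm(n)
    # Distance between the mid-points of the two vectors.
    mid1 = tuple((s + e) / 2 for s, e in zip(start1, end1))
    mid2 = tuple((s + e) / 2 for s, e in zip(start2, end2))
    return angle, closest, math.dist(mid1, mid2)


# Hypothetical example: a helix axis along x and a strand axis along y, 3 Å above it.
print(sse_edge((0, 0, 0), (10, 0, 0), (5, -5, 3), (5, 5, 3)))
```

Computing these three values for every pair of SSEs gives the edge labels of the protein graph that a maximal common subgraph algorithm then compares.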
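The systematic conformational search described earlier can be sketched as a grid enumeration over torsion angles. This is a minimal sketch under stated assumptions. The 30° increment is arbitrary. `toy_energy` is a hypothetical stand-in for a real force field: a threefold cosine term per rotatable bond, with minima at 60°, 180° and 300°. The sketch also shows why the method scales badly: the number of conformations is (360 / increment) raised to the power of the number of rotatable bonds.

```python
import itertools
import math


def toy_energy(torsions_deg):
    """Hypothetical energy: sum of 1 + cos(3*phi) over all rotatable bonds."""
    return sum(1.0 + math.cos(math.radians(3 * phi)) for phi in torsions_deg)


def systematic_search(n_rotatable, increment_deg=30, energy=toy_energy):
    """Enumerate every combination of torsion values on a regular grid and
    return (number of conformations, lowest energy, best torsions)."""
    grid = range(0, 360, increment_deg)
    n_conformations = 0
    best_energy, best_torsions = math.inf, None
    for torsions in itertools.product(grid, repeat=n_rotatable):
        n_conformations += 1
        e = energy(torsions)
        # Keep the first grid point found with the lowest energy so far.
        if e < best_energy:
            best_energy, best_torsions = e, torsions
    return n_conformations, best_energy, best_torsions


count, e_min, torsions = systematic_search(3)
print(count, round(e_min, 6), torsions)
```

The lowest-energy grid point only approximates the global minimum-energy conformation. A practical implementation would usually also energy-minimise each grid point and discard duplicate minima, and those steps are omitted here. At 30° steps, going from 3 to 10 rotatable bonds raises the count from 1,728 to about 6.2 × 10^10.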