* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PowerPoint プレゼンテーション
Magnesium transporter wikipedia , lookup
Expression vector wikipedia , lookup
Drug discovery wikipedia , lookup
Genetic code wikipedia , lookup
Gene expression wikipedia , lookup
Point mutation wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Drug design wikipedia , lookup
Protein purification wikipedia , lookup
Interactome wikipedia , lookup
Metalloprotein wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Biochemistry wikipedia , lookup
Western blot wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Approaches to Protein Structure Prediction & Their Applications Dr. S. Selvaraj Department of Bioinformatics Bharathidasan University Tiruchirappalli 620 024 2017/5/23 1 Introduction to proteins one of the major biomacromolecules structural proteins (viral coat proteins, horny outer layer of human and animal skin) carrying out a variety of biological functions: enzymatic catalysis, transportation, immune response, hormones, storage, control of genetic transcription made up of 20 different kinds of amino acids linked together in a long string each string fold into a compact, unique threedimensional structure – to perform a specific function 2017/5/23 2 Copyright 1996-98 © Dale Carnegie & Associates, Inc. Levels of Description of Structural Complexity • Primary Structure (AA sequence) • Secondary Structure – Spatial arrangement of a polypeptide’s backbone atoms without regard to side-chain conformations - hydrogen bonding pattern of the main chain • , , coil, turns – Super-Secondary Structure • -helix hairpin, -hairpin, -- unit • Tertiary Structure – 3-D structure of an entire polypeptide (assembly and interaction of the helices and sheets • Quaternary Structure – Spatial arrangement of subunits (2 or more polypeptide 2017/5/23 chains) 3 Different structural classes 4MBN all- Dominated by -helices 4LYZ + Helices and strands tend to segregate 3CNA 1TIM all- / Dominated by -strands Helices and strands mix each other 2017/5/23 4 Protein Folding • Proteins need to maintain their tertiary structure to perform their specific function. This structure is stabilized by many non-covalent interactions such as electrostatic, hydrogenbonding, hydrophobic interaction etc. • Chemical agents such as urea (8M) or guanidinium chloride can unfold (denature) the proteins • C.B. Anfinsen in the late 50’s discovered that some proteins such as ribonuclease A and staphylococcal nuclease could be reversibly denatured i.e. they spontaneously refold to their native structures after denaturation • This observation led Anfinsen to conclude that the amino acid sequences contain the necessary information to encode 2017/5/23 the tertiary structure of proteins 5 Techniques to study protein structures / Databases • Primary structure – protein chemistry, cDNA sequencing • Secondary structure – ORD/CD • Tertiary (3-D) structure – x-ray crystallography, NMR • Information obtained from sequencing studies stored in databases such as Swiss-prot, PIR, NCBI etc • www.expasy.ch ; select swiss-prot • www-nbrf.georgetown.edu/pir • www.ncbi.nlm.nih.gov/Entrez 2017/5/23 6 3-D Structure Determination • Two methods for revealing positions of atoms in 3D: – X-Ray Crystallography • X-ray diffraction pattern + mathematical construction • Good resolution of diffraction needed • First, protein crystal needed – Nuclear Magnetic Resonance • Small proteins only (< 250 residues) • Inter-proton distances + geometric constraints 2017/5/23 7 DATABASE FOR PROTEIN STRUCTURES • Archive, annotate and distribute sets of atomic coordinates • The best-established data base for biological macromolecular structures is the Protein Data Bank (PDB) • Contains structures of proteins, nucleic acids and a few carbohydrates • web-site: www.rcsb.org/pdb 2017/5/23 8 Why so much of interest in bioinformatics these days ? • Human genome project – seeks to map every gene and spell out letter by letter the thread of life • complete DNA sequencing of more and more organisms will answer many important questions • how organisms evolved • how to treat a wide range of medical disorders 2017/5/23 9 The key issue ……… • is the need to annotate the vast amount of DNA sequence to give it meaning. This will come from the understanding of proteins encoded by the genes. 2017/5/23 10 Sequence-Structure Gap • Swissprot: Release 46.1 as of 15-Feb- 2005, contains 168297entries • PDB has about 26864 known structures of proteins and viruses (8 February 2005) • only about 15-20% of structures have been determined for known protein sequences • Can we shorten this gap using prediction techniques? 2017/5/23 11 The problem • Number of amino acid sequences available in sequence database far exceeds the number of structures known • Structure determination a time-consuming task • Hence efforts have been developed over the years to i) predict structure from sequence information ii) build models by homology 2017/5/23 12 Protein Structure Prediction Approaches • Secondary Structure Prediction • Homology Modeling • Threading • Ab initio prediction 2017/5/23 13 Secondary Structure Prediction Task • Given an amino acid sequence • Predict a secondary structure state (, , coil) for each residue in the sequence • Common approach in the past: – Make prediction for a given residue by considering a window of n (13 – 21) neighboring residues – Learn model that performs mapping from window of residues to secondary structure state 2017/5/23 14 Secondary Structure Prediction Task • Chou and Fasman (1978) – certain residues not only have high propensity for a particular secondary structure, but they tend to disrupt or break other secondary structures • Based on analysis of known structures: – table giving propensities of amino acid residues for Helical and Sheet conformation • Possible to predict secondary structure from sequence information using such empirically determined information 2017/5/23 15 Secondary Structure Prediction • Recent methods utilize evolutionary information (e.g., PHD system – Rost & Sander, 1993) • Main idea is to consider related sequences when making prediction • Why? Conservation of residues better in secondary structural regions such as helices and strands 2017/5/23 16 Applications of secondary structure prediction • Identification of remote homologs • Using predicted structure to discriminate between related and unrelated proteins in the range of 10-30% sequence identity (e.g. 1hbr-a and 2vhb-b vs 1ai7-a and 2vhb-a 16%id) • Fold recognition • Structural clustering 2017/5/23 17 Homology Modeling • Simplest, reliable approach • Basis: proteins with similar sequences tend to fold into similar structures • Has been observed that even proteins with 25% sequence identity fold into similar structures • Does not work for remote homologs (< 25% pair wise identity) 2017/5/23 18 Homology Modeling • Given: – A query sequence Q – A database of known protein structures • Find protein P such that P has high sequence similarity to Q • Return P’s structure as an approximation to Q’s structure 2017/5/23 19 Steps in homology modeling • Alignment and template selection • Generate multiple alignments • Construct initial models • Constructing variable side chains and main chains (loops, insertions and deletions) • Selecting the most native-like conformations 2017/5/23 20 Fold recognition (sequences with no sequence identity (<= 30%) to sequences of known structure) •Given the sequence, and a set of folds observed in PDB, see if any of the sequences could adopt one the known folds. •Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions). 2017/5/23 21 New sequence: •MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKS AEIKELL… •Library of known folds: 2017/5/23 22 2017/5/23 23 Ab Initio or De novo Structure Prediction • Assumption - that the native state of a protein is at the global free energy minimum • large-scale search of conformational space for protein tertiary structures that are particularly low in free energy for the given amino acid sequence. 2017/5/23 24 Rosetta • A Particularly successful method - based on a picture of protein folding in which short segments of the protein chain flicker between different local structures consistent with their local sequence, and folding to the native state occurs when these local segments are oriented such that low free energy interactions are made throughout the protein. 2017/5/23 25 2017/5/23 26 2017/5/23 27 Web-sites for secondary structure prediction • PHD – http://cubic.bioc.columbia.edu/predictpro tein • PSI-PRED http://insulin.brunel.ac.uk/psipred • JPRED – http://jura.ebi.ac.uk:8888/ • NPSA – http://npsa-pbil.ibcp.fr 2017/5/23 28 Web-sites for homology modeling • RAMP • http://software.compbio.washington.edu/ramp • SWISS-MODEL www.expasy.ch/swissmod/SWISSMODEL.html 2017/5/23 29 Web-sites for fold recognition • 3D-PSSM Protein Fold Recognition (Threading) Server • www.sbg.bio.ic.ac.uk/~3dpssm • UCLA-DOE Fold Server • fold.doe-mbi.ucla.edu/ 2017/5/23 30 Critical Assessment of Structure Prediction (CASP) • Judging techniques for predicting structures requires blind tests • Initiated by J. Moult • Crystallographers and NMR Spectroscopists in the process of determining a protein structure are invited to publish the amino acid sequences several months before the expected date of completion and to keep the results secret • Predictors submit models • Predictions and experimental results are compared 2017/5/23 31 Applications of homology models in drug discovery • Steps in Drug Discovery • Target Identification > Target Validation > Lead Identification > Lead Optimization > Development • Homology Models are useful in all of the above steps of drug discovery 2017/5/23 32 Target Identification & Validation • Complementary nature of drug molecules and their corresponding target proteins help to distinguish good target proteins from others • Assessment of target ‘drugability’ by analysis of ligand binding sites 2017/5/23 33 Lead Identification and Optimization • High-throughput docking • Design of selective or broad spectrum Compounds • Prediction of binding characteristics of lead compounds • Prediction of metabolism, toxicity and drug-drug interactions 2017/5/23 34 Summary and Conclusions • Brief overview of protein structure • Methods for structure prediction • Their accuracy and application • Homology modelling - Threading - Ab Initio • Application of homology modelling in the drug discovery process 2017/5/23 35