Structural Bioinformatics
Proteins: Secondary Structure Predictions

Structure Prediction: Motivation
• Better understand protein function
• Broaden homology
  – Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)
• Explain disease
  – Explain the effect of mutations
  – Design drugs

Myoglobin: the first high-resolution protein structure
Solved in 1958 by Max Perutz and John Kendrew of Cambridge University, who won the 1962 Nobel Prize in Chemistry.
"Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates."

MERFGYTRAANCEAP….
Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible). However, we can predict the secondary structure with relatively high precision.

What do we mean by Secondary Structure?
Secondary structure elements are the building blocks of the protein structure.
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop

Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn.
• Stabilized by H-bonds.
[Figure labels: 3.6 residues, 5.6 Å]

Beta Strand: Pauling and Corey (1951)
• Different polypeptide chains run alongside each other and are linked together by hydrogen bonds.
• Each section is called a β-strand and consists of 5-10 amino acids.

Beta Sheet
The strands lie adjacent to each other, forming a beta-sheet.
[Figure: antiparallel and parallel beta sheets; distance labels 3.47 Å, 4.6 Å, 3.25 Å, 4.6 Å]

Loops
• Connect the secondary structure elements.
• Have various lengths and shapes.
• Are located at the surface of the folded protein and therefore may have an important role in biological recognition processes.
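
A three-state description like the one above is commonly written as one letter per residue. The sketch below assumes the letters H (helix), B (strand) and L (turn/loop); the helper name `segments` is hypothetical. It groups such a string into runs, so that the length of each element can be compared with the typical ranges quoted above (5-40 residues for a helix, 5-10 for a strand).

```python
from itertools import groupby


def segments(ss):
    """Split a per-residue state string into (state, start, end) runs.

    `ss` uses one letter per residue: H = helix, B = strand, L = loop/turn
    (an assumed encoding). `start` is inclusive, `end` is exclusive, 0-based.
    """
    runs = []
    pos = 0
    for state, group in groupby(ss):
        length = len(list(group))
        runs.append((state, pos, pos + length))
        pos += length
    return runs


if __name__ == "__main__":
    for state, start, end in segments("HHHHHHHLLLLBBBBB"):
        print(state, start, end, "length", end - start)
```

Half-open (start, end) ranges are used so that `end - start` is the element length and consecutive runs share a boundary.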

Three-Dimensional Tertiary Structure
Describes the packing of alpha-helices, beta-sheets and random coils with respect to each other on the level of one whole polypeptide chain.
[Figure: secondary and tertiary structure of RBP and globin]

How do the (secondary and tertiary) structures relate to the primary protein sequence?

Sequence and Structure
• Early experiments showed that the sequence of a protein is sufficient to determine its structure (Anfinsen).
• Protein structure is more conserved than protein sequence and more closely related to function.

How (can) different amino acid sequences determine similar protein structure?
Lesk and Chothia, 1980

The Globin Family
Different sequences can result in similar structures.
[Figure: structures 1ecd and 2hhd]

We can learn about the important features that determine structure and function by comparing sequences and structures.

The Globin Family
Why is Proline 36 conserved in all the globin family?

Where are the gaps?
The gaps in the pairwise alignment map to the loop regions.

How are remote homologs related in terms of their structure?
[Figure: RBP (retinol-binding protein), apolipoprotein D, β-lactoglobulin, odorant-binding protein]

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159

[Figure: structures of the retinol-binding protein and β-lactoglobulin]

Structure Prediction: Motivation
• Hundreds of thousands of gene sequences translated to proteins (GenBank, SwissProt, PIR)
• Only about 50,000 solved protein structures
• Experimental methods are time-consuming and not always possible
• Goal: predict protein structure based on sequence information

Prediction Approaches
• Two-stage
  1. Primary (sequence) to secondary structure
  2. Secondary to tertiary
• One-stage
  – Primary to tertiary structure

According to the most simplified model:
• In a first step, the secondary structure is predicted based on the sequence.
• The secondary structure elements are then arranged to produce the tertiary structure, i.e. the structure of a protein chain.
• For molecules composed of different subunits, the protein chains are arranged to form the quaternary structure.

Secondary Structure Prediction
• Given a primary sequence ADSGHYRFASGFTYKKMNCTEAA, what secondary structure will it adopt?

Secondary Structure Prediction Methods
• Chou-Fasman / GOR method
  – Based on amino acid frequencies
• Machine learning methods
  – PHDsec and PSIpred
• HMM (Hidden Markov Model)

Chou and Fasman (1974)
The propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for both alpha helix and beta sheet, so it acts as a breaker).

Name            P(α)  P(β)  P(turn)
Alanine          142    83       66
Arginine          98    93       95
Aspartic Acid    101    54      146
Asparagine        67    89      156
Cysteine          70   119      119
Glutamic Acid    151    37       74
Glutamine        111   110       98
Glycine           57    75      156
Histidine        100    87       95
Isoleucine       108   160       47
Leucine          121   130       59
Lysine           114    74      101
Methionine       145   105       60
Phenylalanine    113   138       60
Proline           57    55      152
Serine            77    75      143
Threonine         83   119       96
Tryptophan       108   137       96
Tyrosine          69   147      114
Valine           106   170       50

Success rate of 50%

Secondary Structure Method Improvements
'Sliding window' approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
  – Look at all windows of size 6/12
  – Calculate a score for each window. If the score exceeds a threshold, predict an alpha helix/beta sheet.
Example sequence: TGTAGPOLKCHIQWMLPLKK

Improvements since the 1980s
• Adding information from conservation in MSA
• Smarter algorithms (e.g. machine learning, HMM)
Success -> 75%-80%

Machine learning approach for predicting Secondary Structure (PHD, PSIpred)
Step 1: Generating a multiple sequence alignment.
[Figure: query searched against SwissProt; query aligned with subject sequences]
Step 2: Additional sequences are added using a profile. We end up with an MSA that represents the protein family.
[Figure: seed MSA of query and subject sequences]
Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.
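
The sliding-window scoring described earlier for the Chou-Fasman propensities can be sketched in Python. This is a simplified illustration, not the published Chou-Fasman procedure. The propensity values are copied from the table, and the window sizes (12 for helix, 6 for strand) follow the slide. The threshold of 100, the tie-break rule and all function names are assumptions made for this sketch.

```python
# Helix and strand propensities from the table above, keyed by one-letter code.
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def window_scores(seq, table, size):
    """Mean propensity of every window of `size` residues along `seq`."""
    return [sum(table[aa] for aa in seq[i:i + size]) / size
            for i in range(len(seq) - size + 1)]


def mark_windows(seq, table, size, threshold):
    """Flag every residue covered by at least one window scoring above threshold."""
    marked = [False] * len(seq)
    for i, score in enumerate(window_scores(seq, table, size)):
        if score > threshold:
            for j in range(i, i + size):
                marked[j] = True
    return marked


def three_state(seq, helix_window=12, strand_window=6, threshold=100.0):
    """Return one letter per residue: H (helix), B (strand) or L (other).

    Only the 20 standard one-letter codes are supported.
    Assumed tie-break: if both flags are set, the larger single-residue
    propensity decides.
    """
    helix = mark_windows(seq, P_ALPHA, helix_window, threshold)
    strand = mark_windows(seq, P_BETA, strand_window, threshold)
    states = []
    for i, aa in enumerate(seq):
        if helix[i] and strand[i]:
            states.append("H" if P_ALPHA[aa] >= P_BETA[aa] else "B")
        elif helix[i]:
            states.append("H")
        elif strand[i]:
            states.append("B")
        else:
            states.append("L")
    return "".join(states)


if __name__ == "__main__":
    query = "ADSGHYRFASGFTYKKMNCTEAA"
    print(query)
    print(three_state(query))
```

Averaging over a window smooths out single residues with unusual propensities. It cannot use information from related sequences, which is one reason the later MSA-based methods do better.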
[Figure for Step 3: seed MSA of query and subject sequences, machine learning approach, known structures]

HMM approach for predicting Secondary Structure (SAM)
• An HMM enables us to calculate the probability of assigning a sequence to a secondary structure

TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?

[Figure: probability table with the following annotations]
• Beginning with an α-helix
• α-helix followed by α-helix
• The probability of observing alanine as part of a β-sheet
• The probability of observing a residue that belongs to an α-helix followed by a residue belonging to a turn = 0.15
Table built according to a large database of known secondary structures.

• The above table enables us to calculate the probability of assigning secondary structure to a protein
• Example: sequence TGQ, states HHH
  p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995 (about 2.1 x 10^-5)

Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al, 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN
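
The HMM path probability in the TGQ / HHH example can be checked with a few lines of Python. This is a minimal sketch. It assumes that the six factors in the product are, in order, the start probability for helix (0.45), then an emission probability for each residue (T: 0.041, G: 0.028, Q: 0.0635) alternating with the helix-to-helix transition (0.8). That assignment is inferred from the order of the factors, and the function and variable names are hypothetical.

```python
def path_probability(seq, states, start, trans, emit):
    """Joint probability of `seq` and a given state path under an HMM.

    start[s]     : probability that the path begins in state s
    trans[(s,t)] : probability of moving from state s to state t
    emit[s][aa]  : probability of observing residue aa in state s
    """
    if len(seq) != len(states) or not seq:
        raise ValueError("seq and states must be non-empty and equally long")
    p = start[states[0]] * emit[states[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= trans[(states[i - 1], states[i])] * emit[states[i]][seq[i]]
    return p


# Values taken from the worked example; only the entries it needs are filled in.
start = {"H": 0.45}
trans = {("H", "H"): 0.8, ("H", "L"): 0.15}
emit = {"H": {"T": 0.041, "G": 0.028, "Q": 0.0635}}

p = path_probability("TGQ", "HHH", start, trans, emit)

if __name__ == "__main__":
    print(f"{p:.4e}")
```

Multiplying the six factors gives about 2.0995 x 10^-5. For long sequences the product quickly becomes very small, so practical implementations usually sum log-probabilities instead.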