Structural Bioinformatics
Proteins: Secondary Structure Predictions

The first high-resolution structure of a protein, myoglobin, was solved in 1958 by John Kendrew, working with Max Perutz at Cambridge University. (They shared the 1962 Nobel Prize in Chemistry.)
As of 12.12.2013 there were 89,110 protein structures in the protein structure database.
A great increase, but still an order of magnitude lower than the total number of protein sequences in the sequence databases (close to 1,000,000).

What can we do to bridge the gap?
MERFGYTRAANCEAP….
Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).
However, we can predict the secondary structure with relatively high precision.

What do we mean by Secondary Structure?
Secondary structures are the building blocks of the protein structure.
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else – turn/loop

The different secondary structures are combined to form the Tertiary Structure of the protein.
[Figure: RBP and globin, shown at the secondary and tertiary structure levels]

Secondary Structure Prediction
• Given a primary sequence ADSGHYRFASGFTYKKMNCTEAA, what secondary structure will it adopt (alpha helix, beta strand or random coil)?

Secondary Structure Prediction Methods
• Statistical methods
  – Based on amino acid frequencies
  – HMM (Hidden Markov Model)
• Machine learning methods
  – SVM, neural networks

Statistical Methods for SS prediction
Chou and Fasman (1974)
The propensity of an amino acid to be part of a certain secondary structure
– e.g. proline has a low propensity for being in an alpha helix or a beta sheet (it is a "breaker")

Name             P(a)   P(b)   P(turn)
Alanine           142     83      66
Arginine           98     93      95
Aspartic Acid     101     54     146
Asparagine         67     89     156
Cysteine           70    119     119
Glutamic Acid     151     37      74
Glutamine         111    110      98
Glycine            57     75     156
Histidine         100     87      95
Isoleucine        108    160      47
Leucine           121    130      59
Lysine            114     74     101
Methionine        145    105      60
Phenylalanine     113    138      60
Proline            57     55     152
Serine             77     75     143
Threonine          83    119      96
Tryptophan        108    137      96
Tyrosine           69    147     114
Valine            106    170      50

Success rate of 50%

Secondary Structure Method Improvements
'Sliding window' approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score is above a threshold, predict an alpha helix/beta sheet
TGTAGPQLKCHIQWMLPLKK

Improvements since the 1980s
• Adding information from conservation in MSA (multiple sequence alignments)
• Smarter algorithms (e.g. machine learning, HMM)

HMM (Hidden Markov Model) approach for predicting Secondary Structure
• HMM enables us to calculate the probability of assigning a sequence to a secondary structure
TGTAGPQLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?

[Figure: HMM probability tables, with these annotations]
• Beginning with an α-helix
• α-helix followed by α-helix
• The probability of observing alanine as part of a β-sheet
• The probability of observing a residue that belongs to an α-helix followed by a residue belonging to a turn = 0.15
Tables built from a large database of known secondary structures

Example
What is the probability that the sequence TGQ will be in a helical structure?
TGQ
HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995 (about 2.1 x 10^-5)
Success of HMM-based methods: 75%-80%

What can we learn from secondary structure predictions?
Mad Cow Disease: PrPc to PrPsc
[Figure: PrPc and PrPsc]

How does the protein structure relate to the primary protein sequence?
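The TGQ / HHH example from the HMM slides can be checked with a few lines of Python. This is only a sketch: the slides give the six factors but not their labels, so reading 0.45 as the probability of starting in a helix, 0.8 as the helix-to-helix transition, and 0.041, 0.028 and 0.0635 as the emission probabilities of T, G and Q in a helix is an assumption, as are all names in the code. Multiplying the six factors gives roughly 2.1 x 10^-5.

```python
# Path probability for one sequence and one state path in a simple HMM.
# The numbers are the six factors from the TGQ / HHH slide; which factor
# is a start, transition or emission probability is an assumed reading.

START = {"H": 0.45}                  # assumed: P(path begins in a helix)
TRANSITION = {("H", "H"): 0.8}       # assumed: P(helix followed by helix)
EMISSION = {                         # assumed: P(residue | helix)
    ("H", "T"): 0.041,
    ("H", "G"): 0.028,
    ("H", "Q"): 0.0635,
}


def path_probability(sequence, states):
    """Return P(sequence, states) = start * emission * (transition * emission)..."""
    if len(sequence) != len(states):
        raise ValueError("sequence and state path must have the same length")
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


p = path_probability("TGQ", "HHH")
print(f"p(TGQ, HHH) = {p:.4e}")
```

To compare alternative assignments (for example HHL), the dictionaries would need the corresponding transition and emission entries, which the slide does not list.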
SEQUENCE
- Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen)
- Protein structure is more conserved than protein sequence and more closely related to function.

How (CAN) Different Amino Acid Sequences Determine Similar Protein Structures?
Lesk and Chothia 1980

The Globin Family
Different sequences can result in similar structures
1ecd 2hhd
We can learn about the important features that determine structure and function by comparing the sequences and structures.

The Globin Family
Why is proline 36 conserved throughout the globin family?

Where are the gaps?
The gaps in the pairwise alignment are mapped to the loop regions

How are remote homologs related in terms of their structure?
[Figure: retinol-binding protein (RBP), apolipoprotein D, β-lactoglobulin, odorant-binding protein]

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159

[Figure: the retinol-binding protein and β-lactoglobulin]

Taken together
MERFGYTRAANCEAP…. FUNCTION

Pfam
Database that contains a large collection of multiple sequence alignments of protein families (common structures). Very useful for function prediction.
http://pfam.sanger.ac.uk/

The zinc-finger family (domain)
Known family of transcription factors
[Figure: protein sequence with the ZINC FINGER DOMAIN marked]

Pfam
Based on profile hidden Markov models (HMMs), which represent the protein family.
Compared with a PSSM, an HMM is a model that considers dependencies between the different columns in the matrix (different residues) and is thus much more powerful.

Profile HMM (Hidden Markov Model) can accurately represent an MSA
[Figure: profile HMM for alignment columns 16-19, with match states M16-M19, insert states I16-I19 and delete states D16-D19; transitions labelled 50% or 100%]
Match-state emission probabilities:
M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6
Alignment represented by the model:
DRTR
DRTS
S--S
SPTR
DRTR
DPTS
D--S
D--S
D--S
D--R

Extra Slides (for your interest)

Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn (pitch about 5.4 Å).
• Stabilized by hydrogen bonds

Beta Strand: Pauling and Corey (1951)
• An extended polypeptide chain is called a β-strand (consists of 5-10 amino acids)
• The chains are connected by hydrogen bonds to form a β-sheet

Loops
• Connect the secondary structure elements (alpha helices and beta strands).
• Have various lengths and shapes.
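
Returning to the profile HMM slide: the match-state emission probabilities shown there can be reproduced by counting residues in each column of the ten-row alignment. The sketch below assumes plain relative frequencies with gaps ignored; real profile HMM tools typically add pseudocounts and sequence weighting, which the slide does not cover. The function and variable names are illustrative only.

```python
from collections import Counter

# The ten aligned rows from the profile HMM slide (columns 16-19).
MSA = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]


def match_emissions(msa):
    """Per-column residue frequencies, ignoring gap characters."""
    emissions = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in counts.items()})
    return emissions


def gap_fraction(msa):
    """Fraction of rows with a gap in each column (rows that skip the match state)."""
    return [column.count("-") / len(column) for column in zip(*msa)]


emissions = match_emissions(MSA)
gap_fractions = gap_fraction(MSA)

for number, (emit, gaps) in enumerate(zip(emissions, gap_fractions), start=16):
    print(f"column {number}: emissions={emit} gap fraction={gaps}")
```

Half of the rows have gaps in columns 17 and 18, which is consistent with the 50% labels on the transitions leaving the first match state in the figure.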