A Hidden Markov Model for Protein Secondary Structure Prediction
Wei-Mou Zheng
Institute of Theoretical Physics, Academia Sinica
PO Box 2735, Beijing 100080
[email protected]

Outline
• Protein structure
• A brief review of secondary structure prediction
• Hidden Markov model: simple-minded
• Hidden Markov model: realistic
• Discussion
• References

Protein structure
Protein sequences are written in 20 letters (the 20 naturally occurring amino acid residues):
AVCDE FGHIW KLMNY PQRST
grouped as hydrophobic, charged (+/-), and polar.
• Residues form a directed chain (cis-/trans- peptide bonds).
• [Figure: RasMol ribbon diagram of GB1 — helix (pink), sheets (yellow), coil (grey); hydrogen-bond network]
• 3D structure → secondary structure, written in three letters: H (helix), E (strand), C (coil).
• Observed frequencies H : E : C = 34.9 : 21.8 : 43.3.

Bayes formula
Generally, P(x, y) = P(x|y) P(y).
Protein sequence A = {a_i}, i = 1, 2, …, n
Secondary structure sequence S = {s_i}, i = 1, 2, …, n
Secondary structure prediction: 1D amino acid sequence → 1D secondary structure sequence — an old problem, open for more than 30 years. Inference of S from A: P(S|A).

1. Simple Chou-Fasman approach
Chou-Fasman propensities of amino acids for conformational states + independence approximation.
Parameter training: propensities q(a,s) from counts N(a,s) (20×3) in a database; summing over a gives N(s), summing over s gives N(a), summing over both gives N:
q(a,s) = [N(a,s) N] / [N(a) N(s)].

2. Garnier-Osguthorpe-Robson (GOR), window version
Conditional independence; weight matrices (20×17)×3 for P(W|s).
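The Chou-Fasman training step above can be sketched in a few lines. This is a minimal illustration on a toy two-residue alphabet, not the published Chou-Fasman table; `train_propensities` and the toy `pairs` data are assumptions made for the example.

```python
from collections import Counter

def train_propensities(pairs):
    """Estimate q(a,s) = N(a,s)*N / (N(a)*N(s)) from paired
    (sequence, structure) strings, as in the slide's formula."""
    n_as = Counter()                    # joint counts N(a, s)
    for seq, struct in pairs:
        for a, s in zip(seq, struct):
            n_as[(a, s)] += 1
    n_a, n_s = Counter(), Counter()     # marginals N(a), N(s)
    for (a, s), c in n_as.items():
        n_a[a] += c
        n_s[s] += c
    n = sum(n_as.values())              # total count N
    return {(a, s): c * n / (n_a[a] * n_s[s]) for (a, s), c in n_as.items()}

# Toy database (hypothetical): q(a,s) > 1 means residue a favors state s.
pairs = [("AAV", "HHC"), ("AVA", "HCH")]
q = train_propensities(pairs)
# q[("A", "H")] = 4*6/(4*4) = 1.5, q[("V", "C")] = 2*6/(2*2) = 3.0
```

Prediction under the independence approximation then assigns each site the state s maximizing q(a_i, s).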
3. Improved GOR (20×20×16×3, to include pair correlations)

Hidden Markov Model (HMM): simple-minded
Bayes formula: P(S|A) = P(S,A)/P(A) ∝ P(S,A) = P(A|S) P(S).
Simple version: the hidden sequence s_1 s_2 s_3 … is a Markov chain, emitting a_i at state s_i according to P(a|s).
Forward and backward functions: initial conditions and recursion relations; partition function Z.
Linear-time algorithm: dynamic programming — Baum-Welch (sum) and Viterbi (max).
Prob(s_i = s, s_{i+1} = s') = A_i(s) t_{ss'} P(a_{i+1}|s') B_{i+1}(s') / Z
Prob(s_{i:j})

Hidden Markov Model: realistic
1) Strong correlation in conformational states: at least two consecutive E and three consecutive H → refined conformational states (243 → 75).
2) Emission probabilities → improved window scores.
Proportion of accurately predicted sites ~70% (compared with <65% for prediction based on a single sequence).
• No post-prediction filtering
• Integrated (overall) estimation of refined conformational states
• Measure of prediction confidence

Discussion
• An HMM using refined conformational states and window scores is efficient for protein secondary structure prediction.
• A better scoring system should cover more of the correlation between conformation and sequence.
• Combining homologous information will improve prediction accuracy.
• From secondary structure to 3D structure (structure codes: discretized 3D conformational states).

References
• Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE 77 (1989) 257–286.
• Burkhard Rost, "Protein Secondary Structure Prediction Continues to Rise," Journal of Structural Biology 134 (2001) 204–218.

The End
[Figure: residues ordered from hydrophobic to polar — P I V G A L C S N T D Q M E Y F W K H R]
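The forward/backward recursions and the pairwise posterior formula above can be sketched as follows. All numerical parameters here (initial, transition, and emission probabilities, and the toy two-letter alphabet) are made-up assumptions for illustration, not the trained values from the talk.

```python
import numpy as np

states = ["H", "E", "C"]
pi = np.array([0.35, 0.22, 0.43])        # initial state probabilities (assumed)
T = np.array([[0.80, 0.05, 0.15],        # transition matrix t_{ss'} (assumed)
              [0.05, 0.80, 0.15],
              [0.20, 0.20, 0.60]])
E = np.array([[0.7, 0.3],                # emission P(a|s) over a toy
              [0.3, 0.7],                # 2-letter alphabet {0, 1} (assumed)
              [0.5, 0.5]])

def forward_backward(obs):
    """Linear-time forward/backward pass; returns A_i(s), B_i(s),
    the partition function Z, and site posteriors Prob(s_i = s | A)."""
    n, k = len(obs), len(states)
    A = np.zeros((n, k))
    B = np.zeros((n, k))
    A[0] = pi * E[:, obs[0]]                        # forward initial condition
    for i in range(1, n):                           # forward recursion
        A[i] = (A[i - 1] @ T) * E[:, obs[i]]
    B[-1] = 1.0                                     # backward initial condition
    for i in range(n - 2, -1, -1):                  # backward recursion
        B[i] = T @ (E[:, obs[i + 1]] * B[i + 1])
    Z = A[-1].sum()                                 # partition function
    return A, B, Z, A * B / Z

obs = [0, 1, 1, 0]
A, B, Z, post = forward_backward(obs)

# Pairwise posterior from the slide:
# Prob(s_i=s, s_{i+1}=s') = A_i(s) t_{ss'} P(a_{i+1}|s') B_{i+1}(s') / Z
pair = A[0][:, None] * T * (E[:, obs[1]] * B[1])[None, :] / Z
```

For sequences of realistic length the recursions should be done in log space (or with per-site rescaling) to avoid underflow; the Viterbi (max) variant replaces the sum in the forward recursion with a max and keeps back-pointers.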