Physical Properties of Amino Acids and Prediction of Secondary Structure
Huan-Xiang Zhou
CSIT, Physics, and IMB
3/27/02

20 Types of Amino Acids

Physical Properties
• Polarity
  – nonpolar: hydrophobic interactions
  – polar: hydrogen bonds
  – charged: salt bridges/charge pairs
• Rotational entropy

Nonpolar Amino Acids: G, A, P, C, M, F, V, I, L
Polar Amino Acids: Q, N, T, Y, S, W
Charged Amino Acids: R, H, D, K, E

Solvation of Charged Amino Acids
[Figure: solvation free energy ∆G of charged groups; charge separation r]

Distribution in Folded Proteins
Vijayakumar & Zhou, J. Phys. Chem. 104:9755 (2000)

Charge-Charge Interactions
[Figure: charge pairs and the folding free energy ∆G]
$$\frac{\partial \Delta G}{\partial\,\mathrm{pH}} = k_B T\,(Q_f - Q_u)\ln 10$$
where $Q_f$ and $Q_u$ are the net charges of the folded and unfolded states.

Charge-Charge Interactions: Unfolded State
$$u = \frac{e^2}{\varepsilon r}$$
• r is not fixed, but distributed according to
$$p(r) = 4\pi r^2 \left(\frac{3}{2\pi d^2}\right)^{3/2} \exp\!\left(-\frac{3r^2}{2d^2}\right)$$
• Average distance: $d = b\,l^{1/2} + s$
$$\langle u \rangle = \left(\frac{6}{\pi}\right)^{1/2} \frac{e^2}{\varepsilon d}$$
Zhou, Proc. Natl. Acad. Sci. USA 99:3569 (2002)
[Figure: barnase, CI2, OMTKY3, RNase A]

Charge-Charge Interactions: Folded State
[Figure: spherical model vs. detailed model of the charge distribution]

Barnase Mutations
[Figure: barnase charged residues D93, R69, D75, R83; contributions to stability, ∆∆G (kcal/mol), experiment vs. calculation, for mutants R69S, D93N, R69S/D93N, R69M, R83Q, D75N, R83N/D75N, D12A, R110A/D12A]
Vijayakumar & Zhou, J. Phys. Chem. 105:7334 (2001)

Conformations of Peptides
[Figure: backbone dihedral angles φ and ψ, sidechain dihedral angle χ1; φ-ψ map]

Sidechain Rotational Entropy
• In the unfolded state, sidechains have more rotational freedom.
• The loss of sidechain entropy on folding depends on the type of amino acid, the backbone conformation, and tertiary contacts.

Helix-Forming Propensities
• Propensities are manifested by the occurrence frequencies of amino acids in helices and can be measured experimentally by mutation. Order: Ala > Leu > Ile > Val > Ser, Thr > Asp, Asn.

Accounting for the Different Propensities
• Rose (1992) proposed that restriction of sidechain rotation by the helix is a major factor.
• This cannot explain the lower propensities of polar sidechains (Ser, Thr, Asp, and Asn).
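The unfolded-state charge-charge model earlier in this section can be checked numerically: the average energy follows from $\langle 1/r \rangle = \int_0^\infty p(r)/r\,dr$, which for the Gaussian distribution equals $(6/\pi)^{1/2}/d$. The Python sketch below integrates $p(r)/r$ with the midpoint rule and compares it with the closed form. It works in reduced units where $e^2/\varepsilon = 1$; the charge numbers `z1`, `z2` and the values passed for `b`, `s`, and `l` are illustrative assumptions, not parameters taken from the slides.

```python
import math


def p_r(r: float, d: float) -> float:
    """Gaussian-chain distribution p(r) of the distance r between two charges."""
    norm = (3.0 / (2.0 * math.pi * d**2)) ** 1.5
    return 4.0 * math.pi * r**2 * norm * math.exp(-3.0 * r**2 / (2.0 * d**2))


def mean_inverse_distance(d: float, n: int = 100_000) -> float:
    """<1/r> = integral of p(r)/r dr, by the midpoint rule on [0, 10 d]."""
    r_max = 10.0 * d  # p(r) is negligible beyond about 10 d
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h  # midpoint of the i-th interval, never exactly 0
        total += p_r(r, d) / r
    return total * h


def average_distance(l: int, b: float, s: float) -> float:
    """d = b * l**(1/2) + s; l, b and s are supplied by the caller."""
    return b * math.sqrt(l) + s


def mean_pair_energy(d: float, z1: int, z2: int, e2_over_eps: float = 1.0) -> float:
    """<u> = z1*z2 * (6/pi)**(1/2) * e^2/(eps*d), in units of e2_over_eps.

    z1, z2 (charge numbers) are an assumption added for the sign:
    +1 and -1 give a negative (attractive) average energy.
    """
    return z1 * z2 * math.sqrt(6.0 / math.pi) * e2_over_eps / d


# Usage with hypothetical b = 2.0, s = 1.0 and l = 9, which gives d = 7.0.
d = average_distance(l=9, b=2.0, s=1.0)
print(mean_inverse_distance(d), math.sqrt(6.0 / math.pi) / d)  # should agree closely
print(mean_pair_energy(d, +1, -1))  # negative: an opposite-charge pair attracts
```

Because $p(r)/r$ is smooth and vanishes at both ends of the interval, the midpoint rule converges quickly, so a plain loop is enough to confirm the $(6/\pi)^{1/2}$ prefactor.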
Sidechain-Backbone Hydrogen Bonding
$$\Delta\Delta G = T\,\Delta\Delta S_{\mathrm{sc}} - \Delta G_{\mathrm{sc}-\mathrm{bb}}$$
$$\Delta G_{\mathrm{sc}-\mathrm{bb}} = k_B T \ln\left[1 - p + p\,\exp\!\left(\frac{\Delta g_{\mathrm{hb}}}{k_B T}\right)\right]$$
p: probability of forming a sidechain-backbone hydrogen bond in the nonhelical state (32% for Thr).

Comparison with Experiment
[Figure: ∆∆Gexp (kcal/mol) vs. ∆∆Gcal (kcal/mol)]
Vijayakumar et al., Proteins 34:497 (1999)

Prediction of Solvent Accessibility
• Two-state representation: 0 for buried and 1 for exposed.
• Baseline method: residue types buried >50% of the time in a training set are predicted to be buried; the rest are predicted to be exposed. In particular, Leu, Ile, Val, Phe, Trp, and Cys are always predicted to be buried, whereas Asp, Glu, Lys, Arg, His, Asn, Gln, and Pro are always predicted to be exposed.

Bayesian Statistical Analysis
• Extends the baseline method by considering the statistics not of just one position but of a window of residues centered at that position.
• Because any particular stretch of residues occurs with low probability in protein sequences, no training set can yield statistically significant burial probabilities for a residue inside a particular stretch. Assumptions must be made.
• The simplest assumption is that the probability of a residue type appearing at a site, given the accessibility states of the segment, is independent of the neighboring positions.

Linear Regression Analysis
• The accessibility state at a position is assumed to be determined directly by the residue identities at that and neighboring positions, and by the transfer free energies ($G_i$) and relative molecular weights ($M_i$) of the residues occupying these positions, via
$$S_i = \sum_j \alpha_j(S_i)\,R_j + \sum_{j,k} \beta_{jk}(S_i)\,G_j G_k + \sum_{j,k} \gamma_{jk}(S_i)\,M_j M_k$$
The indices j and k run from the beginning to the end of a window centered at position i, whose accessibility state $S_i$ is calculated. The coefficients $\alpha_j$, $\beta_{jk}$, and $\gamma_{jk}$ are determined by minimizing the deviations of the calculated accessibility states from the actual ones for a training set.
• $R_j$ is an array of 19 zeros and a single one identifying the type of residue occupying position j.

Multiple Sequence Alignment and Sequence Profile
• Proteins are subject to mutations. Residues are likely replaced by those with similar properties (divergent evolution). Conversely, a protein structure dictates which types of residues occupy which types of positions (convergent evolution).
• When homologous proteins are aligned by sequence, the identities of the amino acids occupying a given position (the sequence profile) hold information about that position.
• A multiple sequence alignment can be readily obtained with PSI-BLAST.

MS Information Enhances Accuracy
• If a position is always occupied by residues favoring the buried (exposed) state among a set of homologous proteins, that position is very likely to be buried (exposed).
----L--D----
----L--E----
----I--E----
----V--K----
• Implementation
  – Baseline method: $\sum_l w_l p_l > 0.5$
  – Bayesian statistics: sequence profiles are represented by 28 classes
  – MLR: $R_j$ replaced by the sequence profile

[Figure: sequence profiles → neural network predictor → state]
• The sequence profile is fed as input. The network is trained on a set of known protein structures.
Shan et al., Proteins 42:23 (2001)

Prediction Results (two-state accuracy, %)
                                           Single sequence     Multiple sequence
Training set        Train seqs  Test seqs  BL    BS    MLR     BL    BS    MLR   NN
Set 1 (90-199 aa)   298         277        71.4  72.7  73.1    74.1  75.9  74.4  78.2
Set 2 (200-439 aa)  399         218        69.3  71.1  71.5    72.8  75.1  75.6  77.9
Set 3 (≥ 440 aa)    186         18         67.4  69.3  69.6    70.3  73.0  72.6  75.5
All                 883         513        69.8  71.5  71.9    73.0  75.2  74.9  77.8
                                           69.9  71.1  71.7    73.1  74.4  75.8  77.1
BL: baseline; BS: Bayesian statistics; MLR: multiple linear regression; NN: neural network.
• Neighboring residues do not exert great influence on solvent accessibility.

Prediction of Secondary Structure
• Amino acids have different preferences for α-helices (and β-strands). A string of helix-preferring residues will likely form a helix.
--AALILA--
Chou & Fasman, Biochemistry (1974)
• New idea: in a multiple sequence alignment, if a position is mostly occupied by helix-preferring residues, that position is likely to be helical.
----AL----
----AA----
----LL----
----LA----
[Figure: sequence profile → neural network predictor → state]
• The sequence profile is fed as input. The network is trained on a set of known protein structures. It consistently predicts secondary structure at 75% accuracy.
Shan et al., Proteins 42:23 (2001)

Prediction of 3-D Structure
• Proteins with similar sequences adopt nearly identical structures. Even proteins with very different sequences (e.g., 10% identity) often adopt similar structures. Perhaps the number of distinct structural folds is finite.
• New problem: which of the structural folds fits the sequence best?

Threading
Query sequence: --dhwqarpcwyAGFTviltvkhtswyhlmad--
[Figure: query sequence threaded onto templates]

Fitting Function of COBLATH
Shan et al., Proteins 42:23 (2001)
• When proteins have similar structures, their sequences do share similarities (e.g., Leu replaced by Ile). This similarity can be captured by comparing the sequence profile of the query (from a multiple sequence alignment) with the sequences of the templates.
• When 3-D structures are superimposable, their secondary structures and solvent accessibilities must also agree. This agreement can be captured by comparing the predicted secondary structure and accessibility of the query with the actual secondary structures and accessibilities of the templates.
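The "new idea" for secondary structure, like the profile form of the baseline accessibility rule $\sum_l w_l p_l > 0.5$, amounts to averaging a per-residue property over one alignment column. Below is a minimal Python sketch of that averaging, not the neural-network predictor described in these slides. It assumes $w_l$ is the fraction of residue type l in the column (gaps skipped). The helix-propensity numbers in `HELIX_PROPENSITY`, the 0.5 threshold, the optional window smoothing, and the function names are all hypothetical, chosen only to respect the order Ala > Leu > Ile > Val > Ser, Thr > Asp, Asn.

```python
from collections import Counter

# Hypothetical helix-propensity scale (placeholder values, not measured data).
# It only mimics the order Ala > Leu > Ile > Val > Ser, Thr > Asp, Asn;
# residue types not listed score 0.0.
HELIX_PROPENSITY = {"A": 1.0, "L": 0.8, "I": 0.6, "V": 0.4,
                    "S": 0.2, "T": 0.2, "D": 0.0, "N": 0.0}


def column_profile(alignment: list[str], pos: int) -> dict[str, float]:
    """Sequence profile at one column: fraction w_l of each residue type l ('-' gaps skipped)."""
    residues = [seq[pos] for seq in alignment if seq[pos] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


def profile_score(profile: dict[str, float], scale: dict[str, float]) -> float:
    """Profile-weighted property: sum over l of w_l * scale[l]."""
    return sum(w * scale.get(aa, 0.0) for aa, w in profile.items())


def predict_helix(alignment: list[str], threshold: float = 0.5,
                  half_window: int = 0) -> str:
    """Return 'H' (helix) or 'C' (other) per column from window-averaged profile scores."""
    length = len(alignment[0])
    scores = [profile_score(column_profile(alignment, i), HELIX_PROPENSITY)
              for i in range(length)]
    states = []
    for i in range(length):
        lo, hi = max(0, i - half_window), min(length, i + half_window + 1)
        window_mean = sum(scores[lo:hi]) / (hi - lo)
        states.append("H" if window_mean > threshold else "C")
    return "".join(states)


# Four aligned toy sequences; columns 1 and 2 are filled with Ala/Leu.
alignment = ["DALKN", "NAAKD", "SLLTD", "DLAVN"]
print(predict_helix(alignment))                 # CHHCC
print(predict_helix(alignment, half_window=1))  # CHHCC
```

Replacing `HELIX_PROPENSITY` with per-residue burial probabilities and keeping `threshold=0.5`, `half_window=0` would give the profile version of the baseline burial test under the same reading of $w_l$.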