Physical Properties of Amino Acids and Prediction of Secondary Structure
Huan-Xiang Zhou
CSIT, Physics, and IMB
3/27/02
20 Types of Amino Acids
Physical Properties
• Polarity
  nonpolar – hydrophobic interactions
  polar – hydrogen bonds
  charged – salt bridges/charge pairs
• Rotational entropy
Nonpolar Amino Acids
G, A, P, C, M, F, V, I, L
Polar Amino Acids
Q, N, T, Y, S, W
Charged Amino Acids
R, H, D, K, E
Solvation of Charged Amino Acids
[Figure: solvation of a positive charge, with ∆G and radius r labeled]
Distribution in Folded Proteins
Vijayakumar & Zhou, J. Phys. Chem. 104:9755 (2000)
Charge-Charge Interactions
[Figure: positive and negative charges, with ∆G labeled]
$\partial \Delta G / \partial \mathrm{pH} = k_B T\,(Q_f - Q_u)\ln 10$
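The relation above can be evaluated directly. The sketch below is only illustrative: it reads Q_f and Q_u as the net charges of the folded and unfolded states (the slide does not define them), and the charge values passed in are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)

def dG_dpH(q_folded, q_unfolded, temperature=298.15):
    """Slope of Delta G with pH, k_B T (Q_f - Q_u) ln 10, in kcal/mol per pH unit."""
    return K_B * temperature * (q_folded - q_unfolded) * math.log(10)

# Hypothetical net charges, chosen only to show the call
slope = dG_dpH(q_folded=2.0, q_unfolded=1.5)
print(f"slope = {slope:.3f} kcal/mol per pH unit")
```

At room temperature a charge difference of one unit corresponds to a slope of about 1.36 kcal/mol per pH unit.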
Charge-Charge Interactions: Unfolded State
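$u = e^2/(\varepsilon r)$
$r$ is not fixed, but distributed according to
$p(r) = 4\pi r^2 \left(\frac{3}{2\pi d^2}\right)^{3/2} \exp\left(-\frac{3r^2}{2d^2}\right)$
average distance $d = b\,l^{1/2} + s$
$\langle u \rangle = (6/\pi)^{1/2}\, e^2/(\varepsilon d)$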
Zhou, P. Natl. Acad. Sci. USA 99:3569 (2002)
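As a numerical check of the averaging step, the sketch below integrates 1/r over the Gaussian-chain distribution p(r) and compares the result with the closed form (6/π)^{1/2}/d. The value of d and the integration grid are arbitrary illustrative choices, not values from the slides.

```python
import math

def p(r, d):
    """Gaussian-chain distribution of the charge-charge distance r."""
    prefactor = (3 / (2 * math.pi * d**2)) ** 1.5
    return 4 * math.pi * r**2 * prefactor * math.exp(-3 * r**2 / (2 * d**2))

def average(f, d, r_max_factor=10.0, n=100_000):
    """Midpoint-rule estimate of the integral of f(r) p(r) from 0 to r_max_factor * d."""
    h = r_max_factor * d / n
    return sum(f((i + 0.5) * h) * p((i + 0.5) * h, d) for i in range(n)) * h

d = 10.0  # illustrative length, not taken from the slides
norm = average(lambda r: 1.0, d)            # should be close to 1
mean_inv_r = average(lambda r: 1.0 / r, d)  # <1/r>
mean_r2 = average(lambda r: r * r, d)       # <r^2>
closed_form = math.sqrt(6 / math.pi) / d

print(norm, mean_inv_r, closed_form, mean_r2)
```

The same integration gives <r²> = d², so d plays the role of a root-mean-square distance in this distribution.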
[Figure panels: Barnase, CI2, OMTKY3, RNase A]
Charge-Charge Interactions: Folded State
[Figure panels: Spherical Model; Detailed Model]
Barnase Mutations
[Figure: barnase residues D93, R69, D75, and R83 labeled]
Contributions to Stability
[Bar chart: ∆∆G (kcal/mol), Experiment vs. Calculation, for the mutations R69S, D93N, R69S/D93N, R69M, R83Q, D75N, R83N/D75N, D12A, and R110A/D12A]
Vijayakumar & Zhou, J. Phys. Chem. 105:7334 (2001)
Conformations of Peptides
[Figure: peptide dihedral angles χ1, ψ, and φ]
φ-ψ Map
Sidechain Rotational Entropy
• In the unfolded state, sidechains have more
rotational freedom.
• Loss of sidechain entropy on folding depends on the
amino acid type, backbone conformation, and
tertiary contacts.
Helix-Forming Propensities
• Propensities are manifested by the occurrence frequencies of
amino acids in helices and can be measured experimentally by
mutations. Order: Ala > Leu > Ile > Val > Ser, Thr > Asp, Asn.
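One way to turn occurrence frequencies into a propensity is to divide the frequency of a residue type inside helices by its overall frequency. The sketch below assumes that definition (the slides do not give one) and runs on a made-up toy sequence with invented helix annotations.

```python
from collections import Counter

def helix_propensities(sequence, states):
    """Propensity of residue type a =
    (fraction of helical positions occupied by a) / (fraction of all positions occupied by a).
    'H' in `states` marks a helical position."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have equal length")
    total = Counter(sequence)
    in_helix = Counter(aa for aa, s in zip(sequence, states) if s == "H")
    n_all, n_helix = len(sequence), sum(in_helix.values())
    if n_helix == 0:
        raise ValueError("no helical positions in the annotation")
    return {aa: (in_helix[aa] / n_helix) / (total[aa] / n_all) for aa in total}

# Toy data, invented for illustration only
seq    = "AALAGSAALG"
states = "HHHH--HHH-"
props = helix_propensities(seq, states)
for aa, value in sorted(props.items(), key=lambda kv: -kv[1]):
    print(aa, round(value, 2))
```

A value above 1 means the residue type is over-represented in helices relative to its overall abundance.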
Accounting for the Different Propensities
• Rose (1992) proposed restriction of
sidechain rotation by the helix as a major
factor.
• This cannot explain the lower propensities of
polar sidechains (Ser, Thr, Asp, and Asn).
Sidechain-Backbone Hydrogen Bonding
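$\Delta\Delta G = T\,\Delta\Delta S_{sc} - \Delta G_{sc\text{-}bb}$
$\Delta G_{sc\text{-}bb} = k_B T \ln\left[1 - p + p\,\exp(\Delta g_{hb}/k_B T)\right]$
$p$: probability of forming a sidechain-backbone hydrogen bond in
the nonhelical state (32% for Thr).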
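The hydrogen-bonding term can be evaluated numerically as written. In the sketch below, p = 0.32 is the Thr value quoted above, while the hydrogen-bond free energy `dg_hb` and the temperature are hypothetical inputs chosen only to show the calculation.

```python
import math

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)

def dG_sc_bb(p, dg_hb, temperature=298.15):
    """k_B T ln[1 - p + p exp(dg_hb / k_B T)], in kcal/mol."""
    kT = K_B * temperature
    return kT * math.log(1 - p + p * math.exp(dg_hb / kT))

# dg_hb = -1.0 kcal/mol is a placeholder, not a value from the slides
value = dG_sc_bb(p=0.32, dg_hb=-1.0)
print(f"dG_sc-bb = {value:.3f} kcal/mol")
```

Two limits are easy to check: the term vanishes when p = 0 and equals `dg_hb` when p = 1.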
Comparison with Experiment
[Scatter plot: ∆∆Gexp (kcal/mol) vs. ∆∆Gcal (kcal/mol)]
Vijayakumar et al. Proteins 34:497 (1999)
Prediction of Solvent Accessibility
• Two-state representation: 0 for buried and 1 for
exposed.
• Baseline Method: Residue types that are buried >50% of the
time in a training set are predicted to be buried; the rest are
predicted to be exposed. In particular, Leu, Ile, Val,
Phe, Trp, and Cys are always predicted to be buried,
whereas Asp, Glu, Lys, Arg, His, Asn, Gln, and Pro
are always predicted to be exposed.
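The baseline method amounts to a lookup table built from burial counts. A minimal sketch follows, using an invented toy training set; treating residue types absent from the training set as exposed is an assumption of this sketch.

```python
from collections import defaultdict

def train_baseline(examples):
    """examples: iterable of (residue, state), with state 0 = buried and 1 = exposed.
    A residue type is predicted buried only if it is buried more than 50% of the time."""
    counts = defaultdict(lambda: [0, 0])  # residue -> [buried count, exposed count]
    for residue, state in examples:
        counts[residue][state] += 1
    return {res: 0 if buried > exposed else 1
            for res, (buried, exposed) in counts.items()}

def predict(sequence, table, default=1):
    """Predict a state for every residue; unseen types get `default` (assumed exposed)."""
    return [table.get(residue, default) for residue in sequence]

# Toy training data, invented for illustration
toy = [("L", 0), ("L", 0), ("L", 1), ("D", 1), ("D", 1), ("D", 0), ("A", 0), ("A", 1)]
table = train_baseline(toy)
print(predict("LDA", table))
```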
Bayesian Statistical Analysis
• Extends the baseline method by considering statistics
of not just one position, but a window of residues
centered at one position.
• Because any particular stretch of residues occurs with low
probability in protein sequences, statistically significant
results for the burial probability of a residue inside that
stretch cannot be obtained from any training set.
Assumptions must be made.
• The simplest assumption is that the probability for a type
of residue to appear at a site within a segment of
accessibility states is independent of neighboring
positions.
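Under an independence assumption of this kind, a window-based predictor reduces to multiplying per-position probabilities. The sketch below is a simplified naive-Bayes version that conditions only on the state of the central residue, not on a whole segment of states. The pseudocount smoothing and the toy data are assumptions for illustration, and this is not the published method.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train(sequences, states, half_window=1, pseudocount=1.0):
    """Count residues at each window offset, conditioned on the central state (0/1)."""
    prior = [pseudocount, pseudocount]
    counts = defaultdict(lambda: pseudocount)  # key: (state, offset, residue)
    for seq, st in zip(sequences, states):
        for i, s in enumerate(st):
            prior[s] += 1
            for k in range(-half_window, half_window + 1):
                if 0 <= i + k < len(seq):
                    counts[(s, k, seq[i + k])] += 1
    return prior, counts

def predict(seq, i, prior, counts, half_window=1):
    """Return the more probable state (0 = buried, 1 = exposed) for position i."""
    log_post = []
    for s in (0, 1):
        lp = math.log(prior[s] / sum(prior))
        for k in range(-half_window, half_window + 1):
            if 0 <= i + k < len(seq):
                total = sum(counts[(s, k, a)] for a in AMINO_ACIDS)
                lp += math.log(counts[(s, k, seq[i + k])] / total)
        log_post.append(lp)
    return 0 if log_post[0] > log_post[1] else 1

# Toy training data, invented for illustration
sequences = ["LLDLD", "DLLDD"]
states = [[0, 0, 1, 0, 1], [1, 0, 0, 1, 1]]
prior, counts = train(sequences, states)
print(predict("DLD", 1, prior, counts), predict("LDL", 1, prior, counts))
```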
Linear Regression Analysis
• Accessibility state at a position is assumed to be determined
directly by the residue identities at that and neighboring
positions, and the transfer free energies ($G_i$) and relative
molecular weights ($M_i$) of the residues occupying these
positions via
$S_i = \sum_j \alpha_j(S_i)\,R_j + \sum_{j,k} \beta_{jk}(S_i)\,G_j G_k + \sum_{j,k} \gamma_{jk}(S_i)\,M_j M_k$
The indices $j$ and $k$ run from the beginning to the end of a
window centered at the position $i$ whose accessibility state $S_i$
is calculated. The coefficients $\alpha_j$, $\beta_{jk}$, and $\gamma_{jk}$ are determined
by minimizing the deviations of calculated accessibility states
from actual ones for a training set.
• $R_j$ is an array of 19 zeros and a one representing the particular
type of residue occupying position $j$.
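The residue-identity term can be illustrated with a one-hot encoding. In the sketch below, the alphabetical ordering of the 20 one-letter codes and the all-zero padding beyond the sequence ends are assumptions. The resulting feature vectors could be passed to any least-squares solver to fit the coefficients.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residue types

def one_hot(residue):
    """Array of 19 zeros and a single one marking the residue type."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec

def window_features(sequence, i, half_window=1):
    """Concatenate the one-hot arrays for the window centered at position i.
    Window positions beyond the sequence ends are encoded as all zeros."""
    features = []
    for j in range(i - half_window, i + half_window + 1):
        if 0 <= j < len(sequence):
            features.extend(one_hot(sequence[j]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features

x = window_features("ALD", 0)
print(len(x), sum(x))
```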
Multiple Sequence Alignment and
Sequence Profile
• Proteins are subject to mutations. Residues are likely
to be replaced by those with similar properties (divergent
evolution). Conversely, a protein structure dictates
which types of positions are occupied by which types
of residues (convergent evolution).
• When homologous proteins are aligned by sequence,
the identities of amino acids occupying a given position
(sequence profile) hold information about that
position.
• Multiple-sequence alignments can be readily obtained
with PSI-BLAST.
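A sequence profile in this sense can be computed column by column from an alignment. The sketch below uses plain residue frequencies with gaps ignored, which is one simple choice among several weighting schemes; the toy alignment is invented.

```python
from collections import Counter

def sequence_profile(alignment):
    """Return, for each alignment column, the frequency of every residue type.
    Gap characters ('-') are ignored."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa != "-")
        n = sum(counts.values())
        profile.append({aa: c / n for aa, c in counts.items()} if n else {})
    return profile

# Toy alignment of four homologous fragments, invented for illustration
alignment = ["LKD", "LRE", "I-E", "VKK"]
profile = sequence_profile(alignment)
for column in profile:
    print(column)
```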
Multiple-Sequence Information Enhances Accuracy
• If a position is always occupied by residues favoring the
buried (exposed) state among a set of homologous proteins,
that position is very likely to be buried (exposed).
----L--D----
----L--E----
----I--E----
----V--K----
• Implementation
  Baseline Method: $\sum_l w_l p_l > 0.5$
  Bayesian Statistics: sequence profiles are represented
  by 28 classes
  MLR: $R_j$ replaced by sequence profile
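The profile-weighted baseline criterion can be sketched as follows. The slides do not define the symbols, so this sketch assumes w_l is the weight of residue type l in the sequence profile at a position and p_l is the burial probability of that type from a training set. The burial probabilities used here are invented placeholders, not measured values.

```python
def predict_state(column_profile, burial_prob, threshold=0.5):
    """column_profile: {residue: weight w_l}, with weights summing to 1.
    burial_prob: {residue: burial probability p_l}.
    Returns 0 (buried) if sum_l w_l * p_l exceeds the threshold, else 1 (exposed)."""
    score = sum(w * burial_prob[aa] for aa, w in column_profile.items())
    return 0 if score > threshold else 1

# Placeholder burial probabilities, invented for illustration only
burial_prob = {"L": 0.8, "I": 0.8, "V": 0.7, "D": 0.2, "E": 0.2, "K": 0.1}

hydrophobic_column = {"L": 0.5, "I": 0.25, "V": 0.25}
charged_column = {"D": 0.25, "E": 0.5, "K": 0.25}
print(predict_state(hydrophobic_column, burial_prob),
      predict_state(charged_column, burial_prob))
```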
[Diagram: Sequence Profiles → Neural Network Predictor → State]
• Sequence profile is fed as input. Network is trained
on a set of known protein structures.
Shan et al., Proteins 42:23 (2001)
Prediction Results
                                        Single sequence     Multiple sequence
Training set        Training  Test      BL    BS    MLR     BL    BS    MLR   NN
                    sequences sequences
Set 1 (90-199 aa)   298       277       71.4  72.7  73.1    74.1  75.9  74.4  78.2
Set 2 (200-439 aa)  399       218       69.3  71.1  71.5    72.8  75.1  75.6  77.9
Set 3 (≥ 440 aa)    186       18        67.4  69.3  69.6    70.3  73.0  72.6  75.5
All                 883       513       69.8  71.5  71.9    73.0  75.2  74.9  77.8
                                        69.9  71.1  71.7    73.1  74.4  75.8  77.1
• Neighboring residues do not exert great influence on
solvent accessibility.
Prediction of Secondary Structure
• Amino acids have different preferences for α-helix
(and β-strands). A string of helix-preferring residues
will likely form helix.
--AALILA--
Chou & Fasman, Biochemistry (1974).
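The idea that a run of helix-preferring residues signals a helix can be sketched as a sliding-window scan. This is a simplified illustration rather than the Chou & Fasman procedure with its published parameters; the set of helix-preferring residues, the window length, and the threshold are all placeholders.

```python
HELIX_PREFERRING = set("ALI")  # placeholder set, chosen for illustration

def predict_helix(sequence, window=4, min_fraction=0.75):
    """Mark every position covered by a window in which at least
    `min_fraction` of the residues are helix-preferring."""
    helical = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        if sum(aa in HELIX_PREFERRING for aa in segment) / window >= min_fraction:
            for i in range(start, start + window):
                helical[i] = True
    return "".join("H" if h else "-" for h in helical)

prediction = predict_helix("GSAALILAGS")
print(prediction)
```

Because whole windows are marked, flanking residues that share a window with the helix-preferring run are also labeled helical.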
• New idea: in a multiple sequence alignment, if a position is
mostly occupied by helix-preferring residues, that position
will likely be helical.
----AL----
----AA----
----LL----
----LA----
[Diagram: Sequence Profile → Neural Network Predictor → State]
• Sequence profile is fed as input. Network is trained
on a set of known protein structures. Consistently
predicts secondary structure at 75% accuracy.
Shan et al., Proteins 42:23 (2001)
Prediction of 3-D Structure
• Proteins with similar sequences adopt nearly identical
structures. Even proteins with very different
sequences (e.g., 10% identity) often adopt similar
structures. Perhaps there is a finite number of
distinct structural folds.
• New problem: which of the structural folds fits the
sequence best?
Threading
Query sequence: --dhwqarpcwyAGFTviltvkhtswyhlmad--
Templates
Fitting Function of COBLATH
Shan et al., Proteins 42:23 (2001)
• When proteins have similar structures, their
sequences do share similarities (e.g., Leu replaced by
Ile). This similarity can be captured by comparing
the sequence profile of query (from a multiple
sequence alignment) with sequences of templates.
• When 3-D structures are superimposable, secondary
structures and solvent accessibilities must also agree.
This agreement can be captured by comparing the
predicted secondary structure and accessibility of the
query with the actual secondary structures and
accessibilities of templates.
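A fitting function built from these two ingredients could be sketched as a weighted sum of per-position terms. The code below is not the COBLATH function itself: the equal weights, the use of the profile frequency of the template residue as the sequence term, and the restriction to ungapped, equal-length pairings are all simplifying assumptions, and the data are invented.

```python
def fit_score(query_profile, query_ss, query_acc,
              template_seq, template_ss, template_acc,
              w_seq=1.0, w_ss=1.0, w_acc=1.0):
    """Score an ungapped pairing of a query with one template.
    query_profile: list of {residue: frequency} per query position.
    query_ss / query_acc: predicted secondary structure and accessibility.
    template_*: sequence and actual secondary structure / accessibility."""
    lengths = {len(query_profile), len(query_ss), len(query_acc),
               len(template_seq), len(template_ss), len(template_acc)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")
    score = 0.0
    for prof, q_ss, q_acc, t_aa, t_ss, t_acc in zip(
            query_profile, query_ss, query_acc,
            template_seq, template_ss, template_acc):
        score += w_seq * prof.get(t_aa, 0.0)  # profile vs. template residue
        score += w_ss * (q_ss == t_ss)        # secondary-structure agreement
        score += w_acc * (q_acc == t_acc)     # accessibility agreement
    return score

# Toy query and templates, invented for illustration
query_profile = [{"L": 0.5, "I": 0.5}, {"D": 1.0}, {"A": 1.0}]
query_ss, query_acc = "HHH", [0, 1, 1]
templates = {
    "template_1": ("IDA", "HHH", [0, 1, 1]),
    "template_2": ("KDG", "EEE", [1, 1, 1]),
}
scores = {name: fit_score(query_profile, query_ss, query_acc, *t)
          for name, t in templates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```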