Protein Mutational Analysis
Using Statistical Geometry
Methods
Majid Masso
[email protected]
http://mason.gmu.edu/~mmasso
Bioinformatics and Computational Biology
George Mason University
Protein Basics
proteins are formed by linearly linking amino acid residues (aa's are the building blocks of proteins)
20 distinct aa types: A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
backbone (+H3N-CαH-COO-): identical for all amino acids
unique side chain (R group) for each amino acid, e.g. Leucine (Leu or L): -CH2-CH(CH3)2
peptide bond: +H3N-CαH(R1)-COO- + +H3N-CαH(R2)-COO- → +H3N-CαH(R1)-CO-NH-CαH(R2)-COO- + H2O
Amino Acid Groups

Branden/Tooze (affinity for water)



hydrophobic aa’s: A,V,L,I,M,P,F
hydrophilic aa’s:
 polar: N,Q,W,S,T,G,C,H,Y
 charged: D,E,R,K
Dayhoff (similar wrt structure or function)



(A,S,T,G,P),(V,L,I,M),(R,K,H),(D,E,N,Q),(F,Y,W),(C)
conservative substitution: replacement with an
amino acid from within the same class
non-conservative substitution: interclass replacement
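The conservative / non-conservative distinction can be sketched in a few lines of Python. This is only an illustration of the Dayhoff classes listed above; the names `DAYHOFF_CLASSES`, `CLASS_OF`, and `is_conservative` are hypothetical and not taken from the original slides.

```python
# Dayhoff classes exactly as listed on the slide.
DAYHOFF_CLASSES = ["ASTGP", "VLIM", "RKH", "DENQ", "FYW", "C"]

# Map each one-letter amino acid code to the index of its class.
CLASS_OF = {aa: idx for idx, group in enumerate(DAYHOFF_CLASSES) for aa in group}


def is_conservative(wild_type, mutant):
    """Return True if the substitution stays within one Dayhoff class."""
    if wild_type == mutant:
        raise ValueError("not a substitution: residues are identical")
    return CLASS_OF[wild_type] == CLASS_OF[mutant]


print(is_conservative("L", "I"))  # both in (V,L,I,M)
print(is_conservative("D", "K"))  # (D,E,N,Q) vs (R,K,H)
```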
Protein Basics




genes: code, or “blueprint”
proteins: product, or “building”
protein structure gives rise to
function
why do “things go wrong”?



mistakes in “blueprint”
incorrectly built, or nonexistent
“buildings”
Protein Data Bank (PDB):
repository of protein structural
data, including 3D coords. of all
atoms (www.rcsb.org/pdb/)
PDB ID: 1REZ
Structure reference: Muraki M., Harata K., Sugita N., Sato K.,
Origin of carbohydrate recognition specificity of human
lysozyme revealed by affinity labeling, Biochemistry 35 (1996)
Computational Geometry Approach to
Protein Structure Prediction
Tessellation






protein structure represented as a set
of points in 3D, using Cα coordinates
Voronoi tessellation: convex polyhedra,
each contains one Cα , all interior points
closer to this Cα than any other
Delaunay tessellation: connect four Cα
whose Voronoi polyhedra meet at a
common vertex
vertices of Delaunay simplices
objectively define a set of four nearest-neighbor residues (quadruplets)
5 classes of Delaunay simplices
Quickhull algorithm (qhull program),
Barber et al., UMN Geometry Center
Voronoi/Delaunay tessellation in 2D space. Voronoi
tessellation-dashed line, Delaunay tessellation-solid
line (Adapted from Singh R.K., et al. J. Comput. Biol.,
1996, 3, 213-222.)
[Figure: the five simplex classes {1,1,1,1}, {2,1,1}, {2,2}, {3,1}, and {4}, with vertices labeled by sequence position (i, i+1, i+2, i+3, j, j+1, k, l)]
Five classes of Delaunay simplices. (Adapted from
Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)
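A minimal sketch of how a simplex could be assigned to one of the five classes, assuming (from the i, i+1, ... vertex labels in the figure) that a class records the lengths of the runs of sequence-consecutive residues among the four vertices. The function name `simplex_class` is hypothetical.

```python
def simplex_class(positions):
    """Classify a quadruplet by runs of consecutive sequence positions.

    positions: the four residue numbers at the simplex vertices.
    Returns a tuple of run lengths in descending order,
    e.g. (2, 1, 1) for residues i, i+1, j, k.
    """
    pos = sorted(positions)
    if len(set(pos)) != 4:
        raise ValueError("a simplex needs four distinct residue positions")
    runs = [1]
    for prev, cur in zip(pos, pos[1:]):
        if cur == prev + 1:
            runs[-1] += 1      # extend the current run of neighbors
        else:
            runs.append(1)     # start a new run
    return tuple(sorted(runs, reverse=True))


print(simplex_class([5, 6, 7, 8]))    # four consecutive residues
print(simplex_class([5, 6, 20, 40]))  # one consecutive pair, two isolated
```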
Counting Quadruplets

assuming order independence among residues comprising
Delaunay simplices, the maximum number of all possible
combinations of quadruplets forming such simplices is 8855
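C D E F (four distinct types): $\binom{20}{4}$
C C D E (one pair, two other types): $20 \binom{19}{2}$
C C D D (two pairs): $\binom{20}{2}$
C C C D (one triple, one other type): $20 \cdot 19$
C C C C (one type): $20$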
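The count of 8855 can be checked numerically. The sketch below evaluates the five composition patterns and, as an independent check, enumerates all unordered quadruplets with repetition drawn from 20 types; the dictionary keys are only labels for the patterns.

```python
from itertools import combinations_with_replacement
from math import comb

# Number of order-independent quadruplets for each composition pattern.
counts = {
    "CDEF": comb(20, 4),       # four distinct residue types
    "CCDE": 20 * comb(19, 2),  # one pair plus two other distinct types
    "CCDD": comb(20, 2),       # two pairs
    "CCCD": 20 * 19,           # one triple plus one other type
    "CCCC": 20,                # all four residues of the same type
}
total = sum(counts.values())

# Independent check: enumerate every multiset of size 4 over 20 types.
enumerated = sum(1 for _ in combinations_with_replacement(range(20), 4))

print(counts, total, enumerated)
```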
Residue Environment Scores

log-likelihood: $q_{ijkl} = \log\left(f_{ijkl} / p_{ijkl}\right)$

$f_{ijkl}$ = normalized frequency of quadruplets containing residues i,j,k,l in a representative training set of high-resolution protein structures with low primary sequence identity

i.e., $f_{ijkl}$ = total number of quadruplets in dataset containing only residues i,j,k,l divided by total number of observed quadruplets

$p_{ijkl}$ = frequency of random occurrence of the quadruplet (multinomial)

i.e., $p_{ijkl} = c\,a_i a_j a_k a_l$
$a_i$ = total number of occurrences of residue i divided by total number of residues in the dataset
$c = \frac{4!}{\prod_{i=1}^{n} t_i!}$, where n = number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i.
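The scoring scheme can be sketched as follows. This is an illustrative implementation under stated assumptions, not the original Java programs: the function names are hypothetical, the logarithm base is not given on the slide (base 10 is assumed here and exposed as a parameter), and the inputs are toy data.

```python
from collections import Counter
from math import factorial, log


def multinomial_coefficient(key):
    """c = 4! / prod(t_i!), with t_i the multiplicity of each residue type."""
    c = factorial(4)
    for t in Counter(key).values():
        c //= factorial(t)
    return c


def log_likelihoods(quadruplets, residues, base=10):
    """Return {sorted quadruplet string: q} for every observed quadruplet type.

    quadruplets: iterable of 4-tuples of one-letter residue codes.
    residues: all residues in the dataset (used for the frequencies a_i);
              every residue appearing in a quadruplet must occur here.
    base: logarithm base (an assumption; the slide only says "log").
    """
    quad_counts = Counter("".join(sorted(q)) for q in quadruplets)
    n_quads = sum(quad_counts.values())
    res_counts = Counter(residues)
    n_res = sum(res_counts.values())
    scores = {}
    for key, count in quad_counts.items():
        f = count / n_quads                 # observed normalized frequency
        p = multinomial_coefficient(key)    # expected frequency, c * a_i a_j a_k a_l
        for aa in key:
            p *= res_counts[aa] / n_res
        scores[key] = log(f / p, base)
    return scores


# Toy dataset: one quadruplet drawn from a four-residue "protein".
toy_scores = log_likelihoods([("A", "L", "L", "V")], "ALLV")
print(toy_scores)
```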
Residue Environment Scores


total statistical potential (topological score) of protein: sum the log-likelihoods of all quadruplets forming the Delaunay simplices
individual residue potentials: sum the log-likelihoods of all quadruplets
in which the residue participates (yields a 3D-1D potential profile)
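Both sums can be accumulated in a single pass over the simplices. In the sketch below, `potential_profile` is a hypothetical name, simplices are given as 4-tuples of 0-based sequence indices, and quadruplet types missing from the score table are scored as 0.0, which is an assumption made only for this illustration.

```python
def potential_profile(sequence, simplices, scores):
    """Return (total potential, per-residue potential profile).

    sequence: string of one-letter residue codes.
    simplices: iterable of 4-tuples of 0-based residue indices.
    scores: {sorted quadruplet string: log-likelihood q}.
    """
    profile = [0.0] * len(sequence)
    total = 0.0
    for simplex in simplices:
        key = "".join(sorted(sequence[i] for i in simplex))
        q = scores.get(key, 0.0)   # assumption: unseen quadruplet types score 0
        total += q                 # topological score of the whole protein
        for i in simplex:
            profile[i] += q        # each vertex residue receives the score
    return total, profile


total, profile = potential_profile(
    "ALLVG",
    [(0, 1, 2, 3), (1, 2, 3, 4)],
    {"ALLV": 0.5, "GLLV": -0.2},
)
print(total, profile)
```

Because every simplex contributes its score to four residues, the profile sums to four times the total potential.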
3phv Potential Profile
PDB ID: 3phv, HIV-1 Protease Monomer, 99 amino acids (total potential 27.93)
[Figure: potential profile; x-axis: Residue Number, y-axis: Potential]
Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution
confirms structural homology among retroviral enzymes, Nature 342 (1989) 299-302.
Properties of HIV-1 Protease

functional as a homodimer






99 residues per subunit
monomers form an
intermolecular two-fold axis of
symmetry
approximate intramolecular
two-fold axis of symmetry
dimer interface: N and C
termini (P1-T4 & C95-F99,
respectively) form a four-stranded beta sheet
active site triad: D25-T26-G27
h-phobic flaps (M46-V56) are
also G-rich, providing flexibility

accommodate / interact with
substrate molecule
Figure adapted from URL:
http://mcl1.ncifcrf.gov/hivdb/Informative/Facts/facts.html
HIV-1 Protease Comprehensive Mutational Profile (CMP)



replace the residue present at each of the 99 positions in the primary sequence with each of the 19 other amino acids
get total potential and potential profile of each artificially created mutant protein
create 20x99 matrix containing total potentials of all the single residue mutants

columns labeled with residues in the primary sequence of wild-type (WT) HIV-1
protease monomer, and rows labeled with the 20 naturally occurring amino acids
subtract WT total potential (TP) from each cell, then average columns to get CMP
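$\mathrm{CMP}_j = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - (\text{WT TP})\right] = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - 27.93\right], \quad j = 1,\dots,99$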
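The column averaging can be sketched as below, assuming the matrix is stored as a list of 20 rows (one per amino acid) by position. The function name and the three-column toy matrix are hypothetical; only the WT total potential of 27.93 comes from the slides.

```python
def comprehensive_mutational_profile(mutant_tp, wt_tp):
    """Average (mutant TP - WT TP) down each column of the matrix.

    mutant_tp: list of rows (one per amino acid), each a list of total
               potentials, one per sequence position.
    wt_tp: total potential of the wild-type protein.
    """
    n_rows = len(mutant_tp)
    n_cols = len(mutant_tp[0])
    return [
        sum(mutant_tp[i][j] - wt_tp for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]


WT_TP = 27.93
# Hypothetical 20 x 3 matrix: position 0 is insensitive to mutation,
# position 1 loses 1.0 for every row, position 2 loses 2.0 for every
# row except row 0, which stands for the WT residue itself.
toy_matrix = [[WT_TP, WT_TP - 1.0, WT_TP if i == 0 else WT_TP - 2.0] for i in range(20)]
cmp_values = comprehensive_mutational_profile(toy_matrix, WT_TP)
print(cmp_values)
```

The row holding the wild-type residue contributes zero to its column, so dividing by 20 rather than 19 slightly shrinks each average, as in the formula above.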
3phv Comprehensive Mutational Profile
[Figure: x-axis: Residue Number, y-axis: Mean Change in Total Protein Potential]
3phv Clustered Comprehensive Mutational Profiles
[Figure, two panels: Mean Change in Total Protein Potential at residues 1-5 (P Q I T L), 21-30 (E A L L D T G A D D), 71-80 (A I G T V L V G P T), and 95-99 (C T L N F). First panel series: C, NC, ALL. Second panel series: H-phobic, Charged, Polar, Total.]
3phv Comprehensive Mutational Profile vs. Potential Profile
[Figure: scatter plot with one labeled point per residue (P1 through F99); x-axis: Individual Residue Potentials of Wild-Type Protein (potential of residue j in WT HIV-1 protease), y-axis: Mean Change in Total Protein Potential (CMPj)]
3phv Comprehensive Non-Conservative Mutational Profile vs. Potential Profile
[Figure: scatter plot with one labeled point per residue (P1 through F99); x-axis: Individual Residue Potentials of Wild-Type Protein, y-axis: Mean Change in Overall Protein Potential]
3phv Comprehensive Conservative Mutational Profile vs. Potential Profile
[Figure: scatter plot with one labeled point per residue (P1 through F99); x-axis: Individual Residue Potentials of Wild-Type Protein, y-axis: Mean Change in Overall Protein Potential]
Experimental Data



536 single point missense mutations

336 published mutants: Loeb D.D., Swanstrom R., Everitt L.,
Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis
of the HIV-1 protease. Nature, 1989, 340, 397-400

200 mutants provided by R. Swanstrom (UNC)
each mutant placed in one of 3 phenotypic
categories, positive, negative, or intermediate,
based on activity
mutant activity to be compared with change in
sequence-structure compatibility elucidated by
potential data
Experimental Data
3phv Structure-Function Correlations (Average Change in Potential)

       Positive   Intermediate   Negative
ALL    -0.23      -0.74          -1.39
C      -0.14      -0.75          -0.23
NC     -0.29      -0.73          -1.65

[Figure panels: HIV-1 Protease Assay; HIV-1 Protease Mutagenesis Data]
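Averages of this kind (mean change in potential per phenotypic category) amount to a simple group-by. The sketch below uses a hypothetical function name and made-up records purely to show the computation; none of the numbers are from the experimental dataset.

```python
from collections import defaultdict


def mean_change_by_phenotype(records):
    """records: iterable of (phenotype, mutant TP - WT TP) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phenotype, delta in records:
        sums[phenotype] += delta
        counts[phenotype] += 1
    return {phenotype: sums[phenotype] / counts[phenotype] for phenotype in sums}


# Hypothetical records, for illustration only.
hypothetical_records = [
    ("positive", -0.1),
    ("positive", -0.3),
    ("intermediate", -0.7),
    ("negative", -1.2),
    ("negative", -1.6),
]
means = mean_change_by_phenotype(hypothetical_records)
print(means)
```

Running the same function on only the conservative or only the non-conservative substitutions would give the C and NC rows.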
Observations



set of mutants with unaffected protease activity exhibit minimal (negative) change
in potential
set of mutants that inactivate protease exhibit large negative change in potential,
weighted heavily by NC
set of mutants with intermediate phenotypes exhibit moderate negative change in
potential (similar among C and NC); wide range for intermediate phenotype in the
experiments
Evolutionarily Conserved Residue Positions

Apply chi-square test statistic on tables above, with the null
hypothesis being no association between residue position
conservation and level of sensitivity to mutation :


LHS table (1 df): χ2 = 10.44, reject null with p < 0.01
RHS table (2 df): χ2 = 75.49, reject null with p < 0.001
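The quoted significance levels can be checked from the chi-square statistics alone, since the upper-tail probability has a closed form for 1 and 2 degrees of freedom. The helper name `chi2_upper_tail` is hypothetical; the statistics 10.44 and 75.49 are the ones reported above.

```python
from math import erfc, exp, sqrt


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for df = 1 or 2 only."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return erfc(sqrt(x / 2.0))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


p_lhs = chi2_upper_tail(10.44, 1)
p_rhs = chi2_upper_tail(75.49, 2)
print(p_lhs, p_rhs)
```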
Mutagenesis at the Dimer Interface

Q2, T4, T96, and N98 are polar and side chains directed outward; P1,
I3, L97, and F99 are hydrophobic and side chains directed toward body
F99 in one subunit makes extensive contacts with I3, V11, L24, I66,
C67, I93, C95, and H96 in the complementary chain
Impact of the F99A Mutation in One Chain of the HIV-1 Protease on Contacts in the Complementary Subunit
[Figure: x-axis: Residue Number, y-axis: Difference in Residue Potential (F99A - WT)]
Mutagenesis at the Dimer Interface

Alanine scan conducted on interface residues individually
and in pairs, in one subunit and in both chains; activity
of mutants measured by % cleavage of β-galactosidase
containing a protease cleavage site


S. Choudhury, L. Everitt, S.C. Pettit, A.H. Kaplan, Mutagenesis of the
dimer interface residues of tethered and untethered HIV-1 protease
result in differential activity and suggest multiple mechanisms of
compensation, Virology 307 (2003) 204-212.
Results: Good correlation between % cleavage (protease
activity) and topological scores (protease sequence-structure compatibility)
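The strength of such a correlation is commonly summarized by the squared Pearson coefficient, R². A minimal sketch follows; the function name and the four data points are hypothetical and are not the measured cleavage values or topological scores.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical values, for illustration only.
cleavage = [0.0, 1.0, 2.0, 3.0]
score_difference = [1.0, 3.0, 2.0, 4.0]
print(r_squared(cleavage, score_difference))
```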
Structure-Function Correlation Based on Mutations in Both Subunits of HIV-1 Protease
[Figure: scatter plot, R² = 0.61; x-axis: % Cleavage, y-axis: Difference in Topological Scores (Mutant - WT); labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A]
Structure-Function Correlation Based on Mutations in One Subunit of HIV-1 Protease
[Figure: scatter plot, R² = 0.57; x-axis: % Cleavage, y-axis: Difference in Topological Scores (Mutant - WT); labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A]
Conformational Changes Due to Dimerization
and/or Ligand Binding
PDB ID: 1g35
HIV-1 Protease Dimer with Inhibitor aha024
Structure reference: W. Schaal, A. Karlsson, G. Ahlsen, et al., Synthesis and comparative molecular field
analysis (CoMFA) of symmetric and nonsymmetric cyclic sulfamide HIV-1 protease inhibitors, J. Med. Chem.
44 (2001) 155-169



monomer in a dimeric configuration with an inhibitor: obtain profile for
1g35, plot 3D-1D only for 1g35A
isolated monomer: eliminate all PDB coordinate lines in 1g35 except
those for 1g35A, obtain profile, plot 3D-1D
plot interface: difference between the 1g35A 3D-1D’s in the dimer and
monomer configurations
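Isolating one chain from a PDB file can be sketched with fixed-column parsing: in the PDB format the chain identifier is in column 22 and the atom name in columns 13-16 of ATOM/HETATM records. The function names and the synthetic demo lines below are hypothetical; other chain-specific records (e.g. TER, ANISOU) are left untouched in this simplified version.

```python
def keep_chain(pdb_lines, chain_id):
    """Drop ATOM/HETATM lines that belong to any chain other than chain_id."""
    kept = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if is_coordinate and line[21] != chain_id:
            continue
        kept.append(line)
    return kept


def ca_coordinates(pdb_lines):
    """Return (x, y, z) for every C-alpha atom in ATOM records."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a synthetic ATOM record (demo helper, not real structure data)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


demo = [
    "HEADER    HYPOTHETICAL DEMO",
    make_atom_line(1, " CA", "PRO", "A", 1, 1.0, 2.0, 3.0),
    make_atom_line(2, " CB", "PRO", "A", 1, 1.5, 2.5, 3.5),
    make_atom_line(3, " CA", "PRO", "B", 1, 9.0, 8.0, 7.0),
]
chain_a = keep_chain(demo, "A")
print(ca_coordinates(chain_a))
```

The C-alpha coordinates returned for the dimer and for the isolated chain are the point sets that would each be tessellated before taking the difference of the two profiles.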
1g35A Interface
[Figure: x-axis: Residue Number, y-axis: Difference in Potential Profiles]
Observations


majority of residues forming both dimer interface and flap region exhibit
increase in stability following dimerization: Q2, T4, I47-I54, T96, L97, and
F99

all h-phobic except Q2
increase in stability due to inhibitor binding evident for the active site
residues D25, T26, and G27; also true for the surrounding h-phobic
residues L24 and A28
Significance of Hydrophobic Residues
in HIV-1 Protease

35/99 amino acids with scores exceeding 1.0
27 of these are hydrophobic
altogether, 44/99 amino acids in protease are hydrophobic

Assuming h-phobic residues no more likely
than others (polar/charged) to have score>1.0
expect (35/99)x44, i.e. 15 or 16 h-phobics >1.0

$P(27 \text{ h-phobics} > 1.0) = \frac{44!}{27!\,17!} \left(\frac{35}{99}\right)^{27} \left(\frac{64}{99}\right)^{17} \approx 2.7 \times 10^{-4} < 0.001$, yet
this is exactly what we observe!
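The binomial probability can be reproduced directly from the counts on this slide (44 hydrophobic residues, 27 of them scoring above 1.0, null success probability 35/99). The function name is hypothetical; the upper-tail sum is an extra shown for comparison and is not a figure from the slides.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_null = 35 / 99             # chance that any residue scores above 1.0
expected = 44 * p_null       # expected number of h-phobics above 1.0
p_exact = binomial_pmf(27, 44, p_null)
# Probability of 27 or more, shown only for comparison.
p_tail = sum(binomial_pmf(k, 44, p_null) for k in range(27, 45))

print(expected, p_exact, p_tail)
```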
What about other cut-off scores, and other proteins?

applied similar test to all 996 proteins in the training set—
while varying cut-off between 0.0-5.0 in 0.25 increments,
binomial probabilities were calculated for each protein. For a
given p-value, # of proteins with a lower significance level at
each cut-off score was tabulated
Significance of Hydrophobic Residues

optimal cut-off score for rejection of the null is clearly
distinct for each of the individual proteins.


Ex. 827 proteins reject the null with a 2.0 cut-off score at p =
0.05, but 918 proteins reject the null at the same
significance level if all cut-off scores are considered.
alternate approach: 92,343 h-phobic amino acids and
136,329 others (polar/charged), total of 228,672
residues in the 996 proteins; assuming no difference in
the mean of the scores between the two groups, apply t-test.

Result: t=126.48, with 228,670 df => reject null!
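The reported 228,670 degrees of freedom equal n1 + n2 - 2, which is what a pooled-variance two-sample t-test gives. A minimal sketch of that statistic follows, assuming the pooled form was used; the function name and the two tiny samples are hypothetical stand-ins for the residue scores.

```python
from math import sqrt


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)   # within-group sums of squares
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_variance = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / sqrt(pooled_variance * (1 / na + 1 / nb))
    return t, df


# Hypothetical scores, for illustration only.
t_value, df = pooled_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(t_value, df)

# Degrees of freedom for the group sizes quoted on the slide.
print(92343 + 136329 - 2)
```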
Acknowledgements



Iosif Vaisman (Ph.D. advisor, first to
apply Delaunay to protein structure)
Zhibin Lu (Java programs for calculating
statistical potentials from tessellations)
Ronald Swanstrom (experimental HIV-1
protease mutants and activity measure)