Protein Mutational Analysis Using Statistical Geometry Methods

Majid Masso
http://mason.gmu.edu/~mmasso
Bioinformatics and Computational Biology
George Mason University

Protein Basics

- proteins are formed by linearly linking amino acid residues (aa's are the building blocks of proteins)
- 20 distinct aa types: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
- the backbone (+H3N-Cα-COO-) is identical for all amino acids
- the side chain (R group) is unique for each amino acid

[Figure: structure of leucine (Leu or L), showing the common backbone and the side chain CH2-CH(CH3)2.]
[Figure: two amino acids with side chains R1 and R2 joined by a peptide bond, with release of H2O.]

Amino Acid Groups

- Branden/Tooze (affinity for water)
  - hydrophobic aa's: A, V, L, I, M, P, F
  - hydrophilic aa's:
    - polar: N, Q, W, S, T, G, C, H, Y
    - charged: D, E, R, K
- Dayhoff (similar wrt structure or function)
  - (A,S,T,G,P), (V,L,I,M), (R,K,H), (D,E,N,Q), (F,Y,W), (C)
- conservative substitution: replacement with an amino acid from within the same class
- non-conservative substitution: interclass replacement

Protein Basics

- genes: code, or "blueprint"
- proteins: product, or "building"
- protein structure gives rise to function
- why do "things go wrong"?
  - mistakes in "blueprint"
  - incorrectly built, or nonexistent "buildings"
- Protein Data Bank (PDB): repository of protein structural data, including 3D coords. of all atoms (www.rcsb.org/pdb/)

[Figure: PDB ID: 1REZ. Structure reference: Muraki M., Harata K., Sugita N., Sato K., Origin of carbohydrate recognition specificity of human lysozyme revealed by affinity labeling, Biochemistry 35 (1996).]

Computational Geometry Approach to Protein Structure Prediction

Tessellation

- protein structure represented as a set of points in 3D, using Cα coordinates
- Voronoi tessellation: convex polyhedra, each contains one Cα, all interior points closer to this Cα than any other
- Delaunay tessellation: connect four Cα whose Voronoi polyhedra meet at a common vertex
- vertices of Delaunay simplices objectively define a set of four nearest-neighbor residues (quadruplets)
- 5 classes of Delaunay simplices
- Quickhull algorithm (qhull program), Barber et al., UMN Geometry Center

[Figure: Voronoi/Delaunay tessellation in 2D space. Voronoi tessellation: dashed line; Delaunay tessellation: solid line. (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]

[Figure: Five classes of Delaunay simplices, labeled {1-1-1-1}, {2-1-1}, {2-2}, {3-1}, and {4}, with vertices marked by residue indices such as i, i+1, i+2, i+3, j, j+1, k, l. (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]
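The five simplex classes in the figure above can be told apart from residue numbers alone. The following is a minimal sketch, assuming the classes are defined by how the four residues of a simplex group into runs of sequence-consecutive positions (as the i, i+1, ... labels in the figure suggest). The function name `simplex_class` is hypothetical and is not part of the qhull program or of the author's software.

```python
def simplex_class(residue_numbers):
    """Classify a Delaunay simplex by runs of consecutive residue numbers.

    Assumption (not stated explicitly on the slide): the class label lists
    the lengths of the runs of sequence-consecutive residues, largest first,
    e.g. (3, 1) for residues i, i+1, i+2 and a distant residue j.
    """
    nums = sorted(residue_numbers)
    if len(nums) != 4 or len(set(nums)) != 4:
        raise ValueError("a simplex needs four distinct residue numbers")

    runs = [1]  # length of each run of consecutive residue numbers
    for prev, cur in zip(nums, nums[1:]):
        if cur == prev + 1:
            runs[-1] += 1      # extend the current run
        else:
            runs.append(1)     # start a new run
    return tuple(sorted(runs, reverse=True))


# Hypothetical residue numbers, one example per class
examples = {
    "{4}": (10, 11, 12, 13),
    "{3-1}": (10, 11, 12, 40),
    "{2-2}": (10, 11, 40, 41),
    "{2-1-1}": (10, 11, 40, 75),
    "{1-1-1-1}": (10, 25, 40, 75),
}
for label, quad in examples.items():
    print(label, simplex_class(quad))
```

The order in which the four vertices are supplied does not matter, since the residue numbers are sorted first.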
Counting Quadruplets

- assuming order independence among residues comprising Delaunay simplices, the maximum number of all possible combinations of quadruplets forming such simplices is 8855

  Composition   Count                        Number
  C D E F       $\binom{20}{4}$              4845
  C C D E       $20 \cdot \binom{19}{2}$     3420
  C C D D       $\binom{20}{2}$              190
  C C C D       $20 \cdot 19$                380
  C C C C       $20$                         20
  Total                                      8855

Residue Environment Scores

- log-likelihood: $q_{ijkl} = \log \dfrac{f_{ijkl}}{p_{ijkl}}$
- $f_{ijkl}$ = normalized frequency of quadruplets containing residues i, j, k, l in a representative training set of high-resolution protein structures with low primary sequence identity, i.e., $f_{ijkl}$ = total number of quadruplets in dataset containing only residues i, j, k, l divided by total number of observed quadruplets
- $p_{ijkl}$ = frequency of random occurrence of the quadruplet (multinomial), i.e., $p_{ijkl} = C \, a_i a_j a_k a_l$
- $a_i$ = total number of occurrences of residue i divided by total number of residues in the dataset
- $C = \dfrac{4!}{\prod_{i=1}^{n} t_i!}$, where n = number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i

Residue Environment Scores

- total statistical potential (topological score) of protein: sum the log-likelihoods of all quadruplets forming the Delaunay simplices
- individual residue potentials: sum the log-likelihoods of all quadruplets in which the residue participates (yields a 3D-1D potential profile)

[Figure: 3phv Potential Profile. PDB ID: 3phv, HIV-1 Protease Monomer, 99 amino acids (total potential 27.93). x-axis: Residue Number; y-axis: Potential.]

Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes, Nature 342 (1989) 299-302.
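The quadruplet count and the log-likelihood score defined above can be checked with a short sketch. It assumes a base-10 logarithm (the slides do not state the base), and the helper names `multinomial_coefficient` and `log_likelihood` are hypothetical. The frequencies in the usage lines are made-up toy values, not data from the training set.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, factorial, log10, prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Order-independent quadruplets = multisets of size 4 drawn from 20 types
n_quadruplets = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))

# Same count, split by composition as in the table above
by_composition = (
    comb(20, 4)         # C D E F: four distinct types
    + 20 * comb(19, 2)  # C C D E: one pair plus two other distinct types
    + comb(20, 2)       # C C D D: two pairs
    + 20 * 19           # C C C D: one triple plus one other type
    + 20                # C C C C: four identical types
)
print(n_quadruplets, by_composition)  # 8855 8855


def multinomial_coefficient(quad):
    """C = 4! / prod(t_i!), with t_i the count of each residue type."""
    return factorial(4) // prod(factorial(t) for t in Counter(quad).values())


def log_likelihood(quad, f_ijkl, residue_freq):
    """q = log(f / p), with p = C * a_i * a_j * a_k * a_l.

    Base 10 is an assumption; the source only writes "log".
    """
    p_ijkl = multinomial_coefficient(quad) * prod(residue_freq[r] for r in quad)
    return log10(f_ijkl / p_ijkl)


# Toy usage: uniform residue frequencies (hypothetical, for illustration only)
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
print(multinomial_coefficient("CCDE"))       # 12
print(log_likelihood("CCDE", 3e-4, uniform))  # positive: seen more often than chance
```

A positive score means the quadruplet is observed more often than expected from residue composition alone; a negative score means it is observed less often.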
Properties of HIV-1 Protease

- functional as a homodimer
- 99 residues per subunit
- monomers form an intermolecular two-fold axis of symmetry
- approximate intramolecular two-fold axis of symmetry
- dimer interface: N and C termini (P1-T4 & C95-F99, respectively) form a four-stranded beta sheet
- active site triad: D25-T26-G27
- h-phobic flaps (M46-V56) are also G-rich, providing flexibility
- flaps accommodate / interact with substrate molecule

[Figure adapted from URL: http://mcl1.ncifcrf.gov/hivdb/Informative/Facts/facts.html]

HIV-1 Protease Comprehensive Mutational Profile (CMP)

- mutate 19 times the residue present at each of the 99 positions in the primary sequence
- get total potential and potential profile of each artificially created mutant protein
- create 20x99 matrix containing total potentials of all the single residue mutants
- columns labeled with residues in the primary sequence of wild-type (WT) HIV-1 protease monomer, and rows labeled with the 20 naturally occurring amino acids
- subtract WT total potential (TP) from each cell, then average columns to get CMP

$\mathrm{CMP}_j = \dfrac{1}{20} \sum_{i=1}^{20} \left[ (\text{mutant TP})_{ij} - (\text{WT TP}) \right] = \dfrac{1}{20} \sum_{i=1}^{20} \left[ (\text{mutant TP})_{ij} - 27.93 \right], \quad j = 1, \ldots, 99$

[Figure: 3phv Comprehensive Mutational Profile. x-axis: Residue Number; y-axis: Mean Change in Total Protein Potential.]

[Figure: 3phv Clustered Comprehensive Mutational Profiles, two panels. Panel 1 series: C, NC, ALL. Panel 2 series: H-phobic, Charged, Polar, Total. x-axis: Residue, shown for positions 1-5 (P Q I T L), 21-30 (E A L L D T G A D D), 71-80 (A I G T V L V G P T), and 95-99 (C T L N F); y-axis: Mean Change in Total Protein Potential.]

[Figure: 3phv Comprehensive Mutational Profile vs. Potential Profile. Scatter plot with one labeled point per residue. x-axis: Individual Residue Potentials of Wild-Type Protein (potential of residue j in WT HIV-1 protease); y-axis: Mean Change in Total Protein Potential (CMP_j).]

[Figure: 3phv Comprehensive Non-Conservative Mutational Profile vs. Potential Profile. Scatter plot with one labeled point per residue. x-axis: Individual Residue Potentials of Wild-Type Protein; y-axis: Mean Change in Overall Protein Potential.]

[Figure: 3phv Comprehensive Conservative Mutational Profile vs. Potential Profile. Scatter plot with one labeled point per residue. x-axis: Individual Residue Potentials of Wild-Type Protein; y-axis: Mean Change in Overall Protein Potential.]

Experimental Data

- 536 single point missense mutations
- 200 mutants provided by R. Swanstrom (UNC)
- 336 published mutants: Loeb D.D., Swanstrom R., Everitt L., Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis of the HIV-1 protease.
Nature, 1989, 340, 397-400
- each mutant placed in one of 3 phenotypic categories (positive, negative, or intermediate) based on activity
- mutant activity to be compared with change in sequence-structure compatibility elucidated by potential data

Experimental Data

3phv Structure-Function Correlations: Average Change in Potential

         Positive   Intermediate   Negative
  ALL    -0.23      -0.74          -1.39
  C      -0.14      -0.75          -0.23
  NC     -0.29      -0.73          -1.65

HIV-1 Protease Assay

HIV-1 Protease Mutagenesis Data

Observations

- the set of mutants with unaffected protease activity exhibits minimal (negative) change in potential
- the set of mutants that inactivate protease exhibits large negative change in potential, weighted heavily by NC
- the set of mutants with intermediate phenotypes exhibits moderate negative change in potential (similar among C and NC); wide range for intermediate phenotype in the experiments

Evolutionarily Conserved Residue Positions

[Two contingency tables (LHS and RHS) relating residue position conservation to level of sensitivity to mutation; contents not recovered.]

- Apply chi-square test statistic on tables above, with the null hypothesis being no association between residue position conservation and level of sensitivity to mutation:
  - LHS table (1 df): χ² = 10.44, reject null with p < 0.01
  - RHS table (2 df): χ² = 75.49, reject null with p < 0.001

Mutagenesis at the Dimer Interface

- Q2, T4, T96, and N98 are polar and side chains directed outward; P1, I3, L97, and F99 are hydrophobic and side chains directed toward body
- F99 in one subunit makes extensive contacts with I3, V11, L24, I66, C67, I93, C95, and H96 in the complementary chain

[Figure: Impact of the F99A Mutation in One Chain of the HIV-1 Protease on Contacts in the Complementary Subunit. x-axis: Residue Number; y-axis: Difference in Residue Potential (F99A - WT).]

Mutagenesis at the Dimer Interface

- Alanine scan conducted on interface residues individually and in pairs, in one subunit and in both chains; activity of mutants measured by % cleavage of β-galactosidase containing a protease cleavage site
- S. Choudhury, L. Everitt, S.C. Pettit, A.H. Kaplan, Mutagenesis of the dimer interface residues of tethered and untethered HIV-1 protease result in differential activity and suggest multiple mechanisms of compensation, Virology 307 (2003) 204-212.
- Results: Good correlation between % cleavage (protease activity) and topological scores (protease sequence-structure compatibility)

[Figure: Structure-Function Correlation Based on Mutations in Both Subunits of HIV-1 Protease. x-axis: % Cleavage; y-axis: Difference in Topological Scores (Mutant - WT); R² = 0.61. Points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A.]

[Figure: Structure-Function Correlation Based on Mutations in One Subunit of HIV-1 Protease. x-axis: % Cleavage; y-axis: Difference in Topological Scores (Mutant - WT); R² = 0.57. Points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A.]

Conformational Changes Due to Dimerization and/or Ligand Binding

[Figure: PDB ID: 1g35, HIV-1 Protease Dimer with Inhibitor aha024.]

Structure reference: W. Schaal, A. Karlsson, G. Ahlsen, et al., Synthesis and comparative molecular field analysis (CoMFA) of symmetric and nonsymmetric cyclic sulfamide HIV-1 protease inhibitors, J. Med. Chem.
44 (2001) 155-169

- monomer in a dimeric configuration with an inhibitor: obtain profile for 1g35, plot 3D-1D only for 1g35A
- isolated monomer: eliminate all PDB coordinate lines in 1g35 except those for 1g35A, obtain profile, plot 3D-1D
- plot interface: difference between the 1g35A 3D-1D's in the dimer and monomer configurations

[Figure: 1g35A Interface. x-axis: Residue Number; y-axis: Difference in Potential Profiles.]

Observations

- majority of residues forming both dimer interface and flap region exhibit increase in stability following dimerization: Q2, T4, I47-I54, T96, L97, and F99
  - all h-phobic except Q2
- increase in stability due to inhibitor binding evident for the active site residues D25, T26, and G27; also true for the surrounding h-phobic residues L24 and A28

Significance of Hydrophobic Residues in HIV-1 Protease

- 35/99 amino acids with scores exceeding 1.0
  - 27 of these are hydrophobic
- altogether, 44/99 amino acids in protease are hydrophobic
- Assuming h-phobic residues no more likely than others (polar/charged) to have score > 1.0:
  - expect (35/99) x 44, i.e. 15 or 16 h-phobics > 1.0
  - $P(27 \text{ h-phobics} > 1.0) = \dfrac{44!}{27!\,17!} \left(\dfrac{35}{99}\right)^{27} \left(\dfrac{64}{99}\right)^{17} \approx 2.7 \times 10^{-4} < 0.001$, yet this is exactly what we observe!
- What about other cut-off scores, and other proteins?
  - applied similar test to all 996 proteins in the training set; while varying cut-off between 0.0 and 5.0 in 0.25 increments, binomial probabilities were calculated for each protein. For a given p-value, # of proteins with a lower significance level at each cut-off score was tabulated

Significance of Hydrophobic Residues

- optimal cut-off score for rejection of the null is clearly distinct for each of the individual proteins. Ex. 827 proteins reject the null with 2.0 cut-off score at p = 0.05, but 918 proteins reject the null at the same significance level if all cut-off scores are considered.
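The binomial calculation for HIV-1 protease given earlier in this section can be reproduced with the standard library. This is a minimal sketch: the function name `binomial_pmf` and the variable names are hypothetical. The upper-tail sum P(X ≥ 27) is an addition for comparison and does not appear in the original slides, which report only the point probability.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_hydrophobic = 44   # hydrophobic residues in the 99-residue monomer
n_observed = 27      # hydrophobic residues with score > 1.0
p_high = 35 / 99     # null: any residue has score > 1.0 with this probability

# Expected number of high-scoring hydrophobic residues under the null
expected = n_hydrophobic * p_high

# Point probability, matching the formula on the slide
prob_exact = binomial_pmf(n_observed, n_hydrophobic, p_high)

# Upper tail (an addition, not in the source): 27 or more high-scoring residues
prob_tail = sum(
    binomial_pmf(k, n_hydrophobic, p_high)
    for k in range(n_observed, n_hydrophobic + 1)
)

print(f"expected under null: {expected:.2f}")
print(f"P(X = 27)  = {prob_exact:.2e}")
print(f"P(X >= 27) = {prob_tail:.2e}")
```

Repeating the same calculation for each protein and each cut-off score would only require changing the three inputs.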
- alternate approach: 92,343 h-phobic amino acids and 136,329 others (polar/charged), total of 228,672 residues in the 996 proteins; assuming no difference in the mean of the scores in both groups, apply t-test. Result: t = 126.48, with 228,670 df => reject null!

Acknowledgements

- Iosif Vaisman (Ph.D. advisor, first to apply Delaunay to protein structure)
- Zhibin Lu (Java programs for calculating statistical potentials from tessellations)
- Ronald Swanstrom (experimental HIV-1 protease mutants and activity measure)