CRB Journal Club
February 13, 2006
Jenny Gu

Selected for a Reason
• Residues are selected by evolution for a reason, but conservation alone does not distinguish between function and stability.
• To disentangle functional from structural constraints, predicted sequence profiles generated for structural stability are compared with naturally occurring sequence profiles.
• The method incorporates two additional measures, free energy and sequence profile difference, alongside residue conservation to identify functional residues.

Datasets
• Enzyme Active Site Set (Suspicious. What about cross-fold validation?)

Conservation Score Distribution
Sequence conservation score calculated by SCORECONS with a multiple sequence alignment from MUSCLE.
A) All Residue Sites
B) Enzyme Active Site Set

Calculating Difference between Profiles
Designed Sequence Profiles
• Rosetta design program
• Generate 40 protein sequences that are stable for the structure.
• Align with PSI-BLAST to generate a position-specific scoring matrix (PSSM).
Natural Sequence Profiles
• PSSM from PSI-BLAST
Euclidean distance rescaled between 0 (high similarity) and 1 (low similarity)

Difference between Natural and Designed Sequence Profiles
A) All residues in active sites.
B) Functional residues in active sites.
Differences between profiles are rescaled such that:
0 - High similarity
1 - Low similarity
In other words, selection for function vs. stability:
0 - Low selection
1 - High selection

Calculating Native/Optimal Residue Energy Difference
1. Use the Rosetta ΔΔG module to calculate free energy changes for each of the 20 amino acid substitutions at each position.
2. Compare to the native ΔG. If functional constraints are imposed, there should be a big gap between the ΔG values.

Rosetta ΔΔG
• Originally developed to identify binding interface hot spots.
• Model based on an all-atom rotamer description of side chains, with an energy function dominated by Lennard-Jones interactions, solvation interactions, and hydrogen bonding.

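
The two per-position measures on the preceding slides can be sketched roughly as follows. This is a minimal illustration, not the paper's or Rosetta's implementation: the function names, the min-max rescaling scheme, and the toy three-letter profiles and energies are all assumptions made here. The slides state only that a Euclidean distance between profiles is rescaled to the range 0 to 1, and that the native residue's energy is compared with that of the most favorable residue.

```python
import math

def profile_distance(natural_col, designed_col):
    """Euclidean distance between two profile columns of equal length."""
    return math.sqrt(sum((n - d) ** 2 for n, d in zip(natural_col, designed_col)))

def rescale_01(values):
    """Min-max rescale (an assumed scheme): 0 = most similar, 1 = least similar."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def native_optimal_gap(energies, native_residue):
    """Energy of the native residue minus the lowest energy over all residues."""
    return energies[native_residue] - min(energies.values())

# Toy, made-up profiles over a 3-letter alphabet (real PSSM columns have 20 entries).
natural = [[0.9, 0.1, 0.0], [0.3, 0.3, 0.4]]
designed = [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
raw = [profile_distance(n, d) for n, d in zip(natural, designed)]
profile_difference = rescale_01(raw)

# Toy, made-up substitution energies (kcal/mol) at one position; native is "D".
energies = {"A": -1.0, "D": 2.5, "L": -3.0}
gap = native_optimal_gap(energies, "D")

print(profile_difference, gap)
```

In this toy case the first position, where the natural and designed columns disagree, gets the largest profile difference, and the native residue sits well above the energetically optimal one, which is the pattern the slides associate with selection for function rather than stability.
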
Distribution of Free Energy Difference
Difference between the free energy of the naturally occurring residue and that of the energetically most favorable residue (kcal/mol).
A) For all residues in active sites.
B) Functional residues in active sites.
In other words: positions with smaller differences have been selected for stability.

Residue Classification
Combine:
1. Sequence Conservation
2. Profile Difference (Natural vs. Designed)
3. Residue Free Energy Changes (Natural vs. Optimal)
to classify functional vs. nonfunctional residues.
Logistic regression with a linear model module was used to determine the weights for the input features.

Classification Performance
• Largest improvement observed with free energy measures.
• Inclusion of profile difference with free energy measures resulted in minor improvements.
• Combined measures reduce false positives.

Chymosin B
[Figure panels: Sequence Conservation Only; Combined Measures]

Arginine Kinase
[Figure panels: Sequence Conservation Only; Combined Measures]

Testing Generality
• Dataset 2 includes ligand binding sites.
• Comparison to another predictor.

Sources of Error
• Sensitivity to multiple alignment quality.
• Loop regions are difficult to align.
• Functionally important residues can contribute to stability.
• Suggested Improvements:
  • Better multiple sequence alignments.
  • Spatial clustering of high-scoring residues.
  • Introducing backbone flexibility into energy calculations.

Other Approaches - Extracting from Sequence Design
1) Design procedure based on Monte Carlo simulation of the amino acid substitution process.
2) Fixed substitutions based on a scoring function from the template structure and a multiple alignment of homologs.

Other Approaches - Using Protein Homology Information
1) Identify a high degree of conservation between homologous proteins.
2) Use information theory to identify positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern.
3) Identify residues with highly conserved positions when members of a homologous family are superposed.

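
Returning to the Residue Classification slide: the idea of letting logistic regression weight the three input features can be sketched as below. This is only an illustration under stated assumptions. The slides do not say which software or fitting procedure was used, so plain batch gradient descent stands in here, and the function names, learning rate, and the six feature rows are made up for the example (they are not data from the paper).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit feature weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a residue is functional."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical rows: [conservation, profile difference, energy gap], each scaled to 0..1.
X = [
    [0.9, 0.8, 0.9],  # functional
    [0.8, 0.9, 0.7],  # functional
    [0.9, 0.7, 0.8],  # functional
    [0.9, 0.2, 0.1],  # conserved, but consistent with selection for stability
    [0.3, 0.1, 0.2],  # nonfunctional
    [0.2, 0.3, 0.1],  # nonfunctional
]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
predictions = [predict_proba(w, b, xi) > 0.5 for xi in X]
print(w, b, predictions)
```

The fourth row is the case the slides care about: a highly conserved residue that conservation alone would flag, but that the profile difference and energy gap mark as structurally rather than functionally constrained.
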
Interest in this Paper
• Distinguishing between functional and structural constraints.
• Designing sequences and subsequent profiles allows us to explore an enlarged sequence space that is not captured by natural sequences.

Questions
From an evolutionary perspective:
1) How does structure limit the exploration of sequence space?
2) How is sequence space expanded with structure change?
3) How do selective pressures for molten globules, flexible regions, and disordered structures impact the sequence space?

Current Domain Coverage of Genome
Current perspective:
• Ab initio structure evolution is difficult now that a system of checks and balances is in place.
• Evolution of the current protein repertoire is largely attributed to recombination of existing folds.

Reaching beyond structural genomics? ….
• With known structures:
  • Use of a Hidden Markov Model (HMM) or profile for domains to identify them in a genome.
  • Evolutionary plasticity is greater for loop regions than for the core.
  • Work has been done in this area.
• With unknown structures:
  • Can we design a structure not currently in the PDB and identify it in nature?
• With structures that nature “hasn’t seen before”:
  • De novo structure designed in 2003.
  • Maybe it already exists in nature, we just don’t know about it yet.
  • And if it doesn’t exist, is it just a proof of principle or can we actually do something with it?
