Lecture 7 - CS
Transcript
7. (Predicted) residue pair contacts guide ab initio modeling … and homolog refinement too…

Acknowledgments for slides in this lecture to Sergey Ovchinnikov!

Restraint function: Contact prediction via correlated mutations
Recent breakthrough: Significantly longer proteins can be modeled without template (ab initio)
• ab initio restricted to small (100 aa), single-domain proteins
• + information about contacts
• Contact prediction from coevolution -> dramatic increase of scope (… 500 aa)

What is co-evolution?
Important contacts in proteins are evolutionarily conserved and encoded in a Multiple Sequence Alignment (MSA)
[Figure labels: within; between; mediated by ligand; due to conformational change; co-evolution]
By measuring coevolution, we can infer important contacts in proteins!

Contacting residues can be represented as a contact map!
Contact: residue-residue interaction
[Figure: contact map, axes running from N to C. Grey = structural contact; blue = predicted contact; intensity = strength of prediction]

GREMLIN (Generative REgularized ModeLs of proteINs)
• based on pseudolikelihood framework: Markov Random Field (more complex than HMM: chain)
• optimized for maximum correct contact predictions
• includes predicted context information:
  • SS (PSIPRED)
  • Contacts (SVMcon)
• informative MSA: # S (sequences, <90% id) > 4-5 x L (protein length)
• at 4-5L sequence depth, the top 1.5L contacts are ~50% correct
• reliable modeling: ≥ 1 reliable non-local contact every <12 aa> -> prediction of longer proteins
Original paper: Balakrishnan …. Langmead. Proteins 2010
Bakerlab: Kamisetty et al. PNAS 2013; Kim et al. Proteins 2013; Ovchinnikov et al. eLife 2015 & Science 2017

GREMLIN used to measure co-evolution
Global statistical model (Lapedes et al. 1990s)
Generative REgularized ModeLs of proteINs (Balakrishnan et al. 2010)
[Figure: positions X1 X2 X3 X4 with one-body terms V1 V2 V3 V4]
x = position
vi = one-body energy (conservation)
wij = two-body energy (coupling)
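The one-body terms vi and two-body terms wij defined above can be combined into a score for a single aligned sequence, and into the per-position conditional probability that a pseudolikelihood multiplies over positions. The following is a minimal sketch, not the GREMLIN code: the function names, the toy parameters, the two-letter alphabet, and the sign convention (higher score = more probable) are assumptions made for illustration.

```python
import math

def sequence_score(seq, v, w):
    """Unnormalised log-probability: sum_i v[i][x_i] + sum_{i<j} w[(i, j)][x_i][x_j]."""
    one_body = sum(v[i][a] for i, a in enumerate(seq))
    two_body = sum(
        w[(i, j)][seq[i]][seq[j]]
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if (i, j) in w  # pairs without an entry are treated as uncoupled
    )
    return one_body + two_body

def conditional(seq, i, v, w, n_states):
    """P(x_i = a | all other positions) for every state a (softmax over local scores)."""
    logits = []
    for a in range(n_states):
        s = v[i][a]
        for j, b in enumerate(seq):
            if j == i:
                continue
            if (i, j) in w:
                s += w[(i, j)][a][b]
            elif (j, i) in w:
                s += w[(j, i)][b][a]
        logits.append(s)
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical toy model: 3 positions, 2 states, only positions 0 and 2 coupled.
v = [[0.0, 0.0], [0.5, -0.5], [0.0, 0.0]]
w = {(0, 2): [[1.0, -1.0], [-1.0, 1.0]]}

print(sequence_score([0, 0, 0], v, w))
print(conditional([0, 0, 0], 0, v, w, n_states=2))
```

In this toy model the coupling rewards positions 0 and 2 for carrying the same state, so the conditional at position 0 favours the state already present at position 2.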
Learn pseudo-likelihood model:
(1) Connectivity (sparse: few significant correlations – contacts)
(2) Parameters (optimize model of X - MSA)
Balakrishnan et al. 2010

Contact score: GREMLIN APC(L2norm(Wij))
Example: 50S ribosomal protein L6
Balakrishnan et al. 2010

GREMLIN used to measure co-evolution: When is it useful?
• Needs many sequences -> structural template often available -> no need for contact predictions ….

Model discrimination?
ΔGREMLIN: difference between native and model scores (CAMEO dataset, n=329)
For 10% (34/329 proteins) GREMLIN discriminates the native from the rest
Kamisetty et al. 2013

Better information than templates?
HHΔ (closeness of template: ΔHHPred scores)
0: HHPred query and template alignment identical
1: no homolog with known structure
(CAMEO dataset, n=339)
HHΔ > 0.5 -> GREMLIN is useful for model discrimination (GREMLIN Δ > 0) (TMalign)
Kamisetty et al. 2013
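The contact score APC(L2norm(Wij)) named above can be sketched in two steps: reduce each pairwise coupling matrix to a single number with the L2 (Frobenius) norm, then apply the average product correction (APC), which subtracts the part of each score explained by the row and column averages. This is an illustrative sketch under stated assumptions, not the GREMLIN implementation: the function names and toy couplings are hypothetical, and the diagonal is excluded from the averages here, which real implementations may handle differently.

```python
import math

def l2norm_matrix(w, length):
    """raw[i][j] = Frobenius norm of the coupling matrix stored under w[(i, j)]."""
    raw = [[0.0] * length for _ in range(length)]
    for (i, j), mat in w.items():
        norm = math.sqrt(sum(x * x for row in mat for x in row))
        raw[i][j] = raw[j][i] = norm
    return raw

def apc(raw):
    """Average product correction: raw[i][j] - mean_i * mean_j / overall_mean."""
    n = len(raw)
    row_mean = [sum(raw[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    overall = sum(row_mean) / n
    if overall == 0.0:  # no signal at all: nothing to correct
        return [[0.0] * n for _ in range(n)]
    return [
        [0.0 if i == j else raw[i][j] - row_mean[i] * row_mean[j] / overall
         for j in range(n)]
        for i in range(n)
    ]

def ranked_pairs(scores):
    """All pairs i < j, strongest predicted contact first."""
    n = len(scores)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: scores[p[0]][p[1]], reverse=True)

# Hypothetical toy couplings: 4 positions, weak background, one strong pair (0, 3).
weak = [[0.1, 0.0], [0.0, 0.1]]
strong = [[1.0, -1.0], [-1.0, 1.0]]
w_toy = {(i, j): weak for i in range(4) for j in range(i + 1, 4)}
w_toy[(0, 3)] = strong

scores = apc(l2norm_matrix(w_toy, 4))
print(ranked_pairs(scores)[0])
```

Taking the top 1.5L entries of such a ranking would give the "top 1.5L contacts" that the lecture refers to.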
Analysis of PFAM
GREMLIN could be useful for 14% (422/12,452) of the families
Estimated from:
• # cases with distant template (HHΔ > 0.5)
• # cases with enough sequences (Sequences/Length > 4)
Kamisetty et al. 2013

Example: CASP T0806 predicted contacts
YAAA_ECOLI
Seqs: 1208, Length: 258
Top 1.5L contacts
HHsearch results of top hit: Prob = 12.4%, E-value = 20
Improve confidence by combination with GREMLIN contacts

Not all contacts should be made!
• Monomer
• Homo-dimer
• Ligand mediated
• Multi-state

Functional form to “de-noise”
[Figure labels: starting conformation; sigmoid; harmonic]
Sigmoidal restraints prevent “false” contacts from distorting the structure, maximizing self-consistent contacts, though this requires LOTS of sampling.

Residue-pair-specific Cβ-Cβ distance
[Figure values: 2.9, 9.0]
Maximum Cβ-Cβ distance that allows a contact (< 5 Å between any heavy atom).
• Bring residues close enough to form contacts, let the Rosetta energy function decide if the contact should be formed
• Can be used in centroid mode

CASP target T0806 - each model made/missed a different subset of contacts
[Figure: contact maps of the top 4 models. Structure contacts (5 Å); predicted contacts; top 4 models]

Pipeline
• Hybridize (using RosettaCM)
• Fragment insertion (20 trials)
• Abinitio (using RosettaAB)
Contact prediction essential for convergence
Repeat until CASP deadline or convergence.
High-Resolution comparative modeling with RosettaCM. Y Song, F DiMaio, RYR Wang, D Kim, C Miles, TJ Brunette, J Thompson, D Baker
One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Kim, D.E., DiMaio, F., Yu-Ruei Wang, R., Song, Y. and Baker, D.

Iterative refinement essential for improved model quality

Transition ab initio -> Template based modeling
Contact-assisted ab initio prediction using Rosetta
Contacts refine template topology
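The contrast between sigmoidal and harmonic restraints described earlier can be made concrete with two simple penalty functions. This is a sketch only: the functional forms are generic (a flat-bottomed quadratic and a logistic curve), the parameter names and the 8 Å cutoff are hypothetical, and none of it is claimed to match the exact functions or values used in Rosetta.

```python
import math

def harmonic_penalty(d, d0):
    """Flat-bottomed quadratic: zero within d0, then grows without bound."""
    return max(0.0, d - d0) ** 2

def sigmoid_penalty(d, d0, slope=1.0, weight=1.0):
    """Logistic curve centred on d0: rises near the cutoff, then saturates at `weight`."""
    return weight / (1.0 + math.exp(-slope * (d - d0)))

CUTOFF = 8.0  # hypothetical Cβ-Cβ cutoff in Å, for illustration only

for distance in (5.0, 8.0, 15.0, 30.0):
    print(distance,
          harmonic_penalty(distance, CUTOFF),
          round(sigmoid_penalty(distance, CUTOFF), 4))
```

Because the sigmoid saturates, a wrongly predicted pair that sits 15 Å or 30 Å apart costs almost the same, so it exerts little pull on the structure. The harmonic form keeps increasing with distance, which is how a single false contact can distort a model.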
Determination of topology:
• Ab initio folding with constraints
• Find fragment pairs
Refinement of topology:
• Refine structure by imposing constraints
One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Kim, D.E., DiMaio, F., Yu-Ruei Wang, R., Song, Y. and Baker, D.

Modeling with contact predictions: CASP 12 results for Rosetta
• Examples
[Figure: predicted contacts, model, X-ray. Key: <5 Å; <10 Å; >10 Å]
Bakerlab: Kamisetty et al. PNAS 2013; Kim et al. Proteins 2013; Ovchinnikov et al. eLife 2015 & Science 2017

Modeling with contact predictions: New models for uncharacterized families
[Figure labels: Nf > 64; accurate model; same fold]
(1) Prokaryotic proteomes (58/121 protein families with no structural template -> templates for ~400K prokaryotic proteins)
(2) Large scale + metagenomic data (614/1024 -> 137 new folds; templates for ~500K UniProt and 3M metagenomic proteins)
Nf: # sequence clusters (80% seqid threshold) / √Length
• Correlates well with accuracy (TM)
• Length-independent
• Nf > 64: accurate model; Nf > 16: accurate fold
Ovchinnikov et al. eLife 2015 (1) & Science 2017 (2)

Summary: Structure prediction with correlated contacts
• Correlated evolution identifies neighboring residue pairs in protein structure
  • Informative alignment (MSA) is critical
  • Enough sequences are available today
• Contacts used to guide structure prediction
  • In particular when no template is identified
• Significant increase in proteins with reliable structural models
  • In particular for transmembrane proteins
  • Helped by metagenomic data
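The two alignment-depth measures used in this lecture are simple enough to compute directly: Nf (sequence clusters at 80% identity divided by the square root of the length) and the plain sequences-per-residue ratio. The sketch below assumes the cluster count is already known. The function names are hypothetical, and the cluster counts in the example are made-up inputs; only the T0806 sequence count and length come from the slides.

```python
import math

def nf(n_clusters, length):
    """Nf = number of sequence clusters (80% identity) / sqrt(protein length)."""
    return n_clusters / math.sqrt(length)

def expected_quality(nf_value):
    """Map Nf onto the thresholds quoted above (>64 accurate model, >16 accurate fold)."""
    if nf_value > 64:
        return "accurate model"
    if nf_value > 16:
        return "accurate fold"
    return "below both thresholds"

def depth_ratio(n_sequences, length):
    """Sequences per residue, the quantity compared against 4-5 for an informative MSA."""
    return n_sequences / length

# Made-up cluster counts for a 256-residue protein.
for clusters in (2048, 400, 100):
    value = nf(clusters, 256)
    print(clusters, value, expected_quality(value))

# CASP T0806 from the slides: 1208 sequences, length 258.
print(round(depth_ratio(1208, 258), 2))
```

For T0806 the ratio falls between 4 and 5, which is the depth at which the lecture describes the top 1.5L contacts as about 50% correct.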