Lecture 7 - CS

7. (Predicted) residue-pair contacts guide ab initio modeling
… and homolog refinement, too
Acknowledgments for the slides in this lecture go to Sergey Ovchinnikov!
Restraint function: contact prediction via correlated mutations
Recent breakthrough: significantly longer proteins can be modeled without a template (ab initio)
Ab initio modeling: restricted to small (100 aa), single-domain proteins
+ information about contacts
• Contact prediction from coevolution
-> dramatic increase in scope (… 500 aa)
What is co-evolution?
Important contacts in proteins are evolutionarily conserved and encoded in a multiple sequence alignment (MSA).
[Figure: correlated positions due to co-evolution: contacts within a protein, between proteins, mediated by a ligand, and arising from a conformational change]
By measuring co-evolution, we can infer important contacts in proteins!
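One simple way to quantify co-evolution between two alignment columns is mutual information (MI). The sketch below is a local, pairwise measure and not GREMLIN's global model; MI is known to also pick up indirect (transitive) correlations, which is one motivation for a global statistical model. The function name `column_mi` and the toy alignment are made up for illustration.

```python
import math
from collections import Counter

def column_mi(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(s[i] for s in msa)             # single-column counts
    p_j = Counter(s[j] for s in msa)
    p_ij = Counter((s[i], s[j]) for s in msa)    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Toy MSA: columns 0 and 1 co-vary (A<->K, V<->E); column 2 is fully conserved.
msa = ["AKLE", "AKLD", "VELE", "VELD"]
```

Here `column_mi(msa, 0, 1)` equals log 2, while a conserved column (2) or an independently varying one (3) gives zero, even though conservation itself is informative.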
Contacting residues can be represented as a contact map!
[Figure: contact map, both axes running from the N- to the C-terminus; contact = residue-residue interaction]
Grey = structural contact
Blue = predicted contact
Intensity = strength of prediction
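A structural contact map can be computed directly from coordinates. This sketch assumes the common convention of a Cβ-Cβ distance below 8 Å (used, for example, in CASP contact assessment, with Cα for glycine); the lecture's own structure contact maps use a 5 Å heavy-atom criterion instead. `contact_map` is a hypothetical helper.

```python
import numpy as np

def contact_map(cb: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """cb: (L, 3) array of C-beta coordinates in Angstrom.

    Returns a symmetric boolean (L, L) map; True where the distance < cutoff.
    """
    diff = cb[:, None, :] - cb[None, :, :]       # (L, L, 3) pairwise vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # (L, L) pairwise distances
    contacts = dist < cutoff
    np.fill_diagonal(contacts, False)            # ignore self-pairs
    return contacts

cb = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [20.0, 0.0, 0.0]])
cm = contact_map(cb)
```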
GREMLIN (Generative REgularized ModeLs of proteINs)
• Based on a pseudolikelihood framework: Markov random field
(more complex than an HMM, which is a chain)
• Optimized for the maximum number of correct contact predictions
• Includes predicted context information:
• Secondary structure, SS (PSIPRED)
• Contacts (SVMcon)
Informative MSA: # S (sequences, <90% identity) > 4-5 x L (protein length)
At a sequence depth of 4-5L, the top 1.5L contacts are ~50% correct
Reliable modeling: ≥ 1 reliable non-local contact every ~12 aa
-> prediction of longer proteins
Original paper: Balakrishnan … Langmead, Proteins 2010
Baker lab: Kamisetty et al., PNAS 2013; Kim et al., Proteins 2013; Ovchinnikov et al., eLife 2015 & Science 2017
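To make "the top 1.5L contacts are ~50% correct" concrete, the precision of the top-k predictions can be computed from a score matrix and a true contact map. A sketch, assuming a minimum sequence separation of 6 residues to exclude trivial local pairs (a common convention, not stated on the slide); `top_k_precision` is a hypothetical helper.

```python
import numpy as np

def top_k_precision(scores: np.ndarray, true_contacts: np.ndarray,
                    k: int, min_sep: int = 6) -> float:
    """Fraction of the k highest-scoring pairs (j - i >= min_sep) that are true contacts."""
    L = scores.shape[0]
    # upper-triangle pairs (i, j) with j - i >= min_sep
    i, j = np.triu_indices(L, k=min_sep)
    # indices of the k largest scores, best first
    top = np.argsort(scores[i, j])[::-1][:k]
    return float(true_contacts[i[top], j[top]].mean())
```

For a protein of length L, the slide's statistic corresponds to `top_k_precision(scores, true_map, k=int(1.5 * L))`.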
GREMLIN used to measure co-evolution
Global statistical model (Lapedes et al. 1990s)
[Figure: alignment positions X1 to X4, each with a one-body term V1 to V4, connected by two-body couplings W]
x = position
v_i = one-body energy (conservation)
w_ij = two-body energy (coupling)
Generative REgularized ModeLs of proteINs
Balakrishnan et al. 2010
Learn a pseudo-likelihood model:
(1) Connectivity (sparse: few significant correlations, i.e. contacts)
(2) Parameters (optimize the model of X, the MSA)
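The model and the pseudo-likelihood objective can be written out in the slides' notation (v_i one-body, w_ij two-body, L positions, N aligned sequences x^n). This is a sketch: R(v, w) stands for a generic regularization term (an assumption; the slides only say the learned connectivity should be sparse), and the couplings are taken to be symmetric, w_ij(a, b) = w_ji(b, a).

```latex
% Markov random field over an aligned sequence x = (x_1, ..., x_L)
P(x_1,\dots,x_L) = \frac{1}{Z}\exp\Big(\sum_{i=1}^{L} v_i(x_i) + \sum_{i<j} w_{ij}(x_i,x_j)\Big)

% Conditional of one position given all others (Z cancels)
P(x_i = a \mid x_{\setminus i}) =
  \frac{\exp\big(v_i(a) + \sum_{j\neq i} w_{ij}(a,x_j)\big)}
       {\sum_{b}\exp\big(v_i(b) + \sum_{j\neq i} w_{ij}(b,x_j)\big)}

% Regularized pseudo-likelihood objective over the N sequences of the MSA
\max_{v,w}\; \sum_{n=1}^{N}\sum_{i=1}^{L} \log P\big(x^{n}_i \mid x^{n}_{\setminus i}\big) \;-\; R(v,w)

% Contact score for the pair (i, j)
S_{ij} = \mathrm{APC}\big(\lVert w_{ij} \rVert_2\big)
```

The pseudo-likelihood replaces the intractable partition function Z with a product of per-position conditionals, each of which only needs a sum over the amino-acid alphabet.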
GREMLIN contact score: APC(L2norm(W_ij)), i.e. the L2 norm of each coupling matrix W_ij, corrected with the average product correction (APC)
[Figure: example, 50S ribosomal protein L6]
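The contact score APC(L2norm(W_ij)) can be sketched with NumPy. This assumes the couplings are stored as an array of shape (L, L, q, q), with q = 21 (20 amino acids plus gap) in the toy example; implementations differ in details such as whether the gap state is included in the norm. `contact_scores` is a hypothetical helper, not GREMLIN's own code.

```python
import numpy as np

def contact_scores(w: np.ndarray) -> np.ndarray:
    """Score residue pairs from a coupling tensor w of shape (L, L, q, q).

    w[i, j] is the q x q coupling matrix W_ij.
    """
    # L2 (Frobenius) norm of each coupling matrix -> raw (L, L) scores
    raw = np.sqrt((w ** 2).sum(axis=(2, 3)))
    raw = 0.5 * (raw + raw.T)       # enforce symmetry
    np.fill_diagonal(raw, 0.0)      # a position does not contact itself
    # Average product correction: subtract (row sum * column sum) / total sum
    apc = raw.sum(axis=0, keepdims=True) * raw.sum(axis=1, keepdims=True) / raw.sum()
    scores = raw - apc
    np.fill_diagonal(scores, 0.0)
    return scores

# Toy couplings: small noise everywhere, one strong coupling between positions 0 and 5
rng = np.random.default_rng(0)
L, q = 8, 21
w = rng.normal(scale=0.01, size=(L, L, q, q))
w[0, 5] = 1.0
w[5, 0] = 1.0
s = contact_scores(w)
```

APC removes the background that a position accumulates from many weak couplings (for example through high entropy or phylogenetic bias), so the planted pair stands out.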
GREMLIN used to measure co-evolution: when is it useful?
• Needs many sequences
-> a structural template is then often available
-> no need for contact predictions …
Model discrimination?
ΔGREMLIN: difference between the native and model scores (CAMEO dataset, n = 329)
For 10% of the proteins (34/329), GREMLIN discriminates the native structure from the rest
Kamisetty et al. 2013
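One way to picture ΔGREMLIN is to score each structure by the co-evolution scores of the residue pairs it brings into contact, then subtract the model's score from the native's. This is a hypothetical sketch: the scoring rule, the 8 Å Cβ cutoff, the 6-residue minimum separation, and the helper names are all assumptions, and the exact score used by Kamisetty et al. may differ.

```python
import numpy as np

def structure_score(scores: np.ndarray, cb: np.ndarray,
                    cutoff: float = 8.0, min_sep: int = 6) -> float:
    """Hypothetical: sum of contact scores over pairs the structure places in contact."""
    L = scores.shape[0]
    dist = np.linalg.norm(cb[:, None, :] - cb[None, :, :], axis=-1)
    i, j = np.triu_indices(L, k=min_sep)        # non-local pairs only
    in_contact = dist[i, j] < cutoff
    return float(scores[i, j][in_contact].sum())

def delta_gremlin(scores, native_cb, model_cb, cutoff=8.0, min_sep=6):
    """Positive when the native satisfies more high-scoring pairs than the model."""
    return (structure_score(scores, native_cb, cutoff, min_sep)
            - structure_score(scores, model_cb, cutoff, min_sep))

# Toy case: one strongly co-evolving pair (0, 7), made only in the "native"
scores = np.zeros((8, 8))
scores[0, 7] = scores[7, 0] = 2.0
model_cb = np.array([[10.0 * k, 0.0, 0.0] for k in range(8)])
native_cb = model_cb.copy()
native_cb[7] = [1.0, 0.0, 0.0]
```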
Better information than templates? HHΔ measures the closeness of the template (ΔHHPred scores):
0: HHPred query and template alignments identical
1: no homolog with known structure
(CAMEO dataset, n = 339)
HHΔ > 0.5 -> GREMLIN is useful for model discrimination (GREMLIN Δ > 0, TMalign)
Kamisetty et al. 2013
Analysis of Pfam: GREMLIN could be useful for 14% (422/12,452) of the families
Estimated from:
• # cases with a distant template (HHΔ > 0.5)
• # cases with enough sequences (Sequences/Length > 4)
Kamisetty et al. 2013
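The two Pfam criteria can be written as a simple filter. A sketch: `gremlin_likely_useful` is a hypothetical helper, and the thresholds are the ones on the slide. For example, 1208 sequences for a 258-residue protein gives 1208 / 258 ≈ 4.7 sequences per residue, which passes the depth criterion.

```python
def gremlin_likely_useful(hh_delta: float, n_seqs: int, length: int,
                          min_hh_delta: float = 0.5,
                          min_seqs_per_residue: float = 4.0) -> bool:
    """True when only a distant template exists AND the MSA is deep enough."""
    distant_template = hh_delta > min_hh_delta                # HH-delta > 0.5
    enough_sequences = n_seqs / length > min_seqs_per_residue  # Sequences/Length > 4
    return distant_template and enough_sequences
```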
Example: CASP target T0806 (YAAA_ECOLI), predicted contacts
Seqs: 1208; Length: 258 (top 1.5L = 387 contacts shown)
HHsearch result for the top hit: Prob = 12.4%, E-value = 20
Improve confidence in this weak hit by combining it with GREMLIN contacts
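Not all predicted contacts should be made! Co-evolving pairs can reflect contacts in the monomer, in a homo-dimer, contacts mediated by a ligand, or contacts from multiple states.
Functional form to "de-noise" the restraints:
[Figure: starting conformation; sigmoid vs. harmonic restraint]
Sigmoidal restraints prevent "false" contacts from distorting the structure, maximizing the number of self-consistent contacts, though they require LOTS of sampling.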
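The difference between the two functional forms shows up numerically: a harmonic penalty grows without bound as a restraint is violated, while a sigmoid saturates, so a single wrong contact can cost at most a fixed amount. These are generic functions; the parameter names `d0`, `sd` and `slope` are illustrative and are not Rosetta's constraint-file syntax.

```python
import math

def harmonic(d: float, d0: float, sd: float = 1.0) -> float:
    """Quadratic penalty: grows without bound as d moves away from d0."""
    return ((d - d0) / sd) ** 2

def sigmoid(d: float, d0: float, slope: float = 1.0) -> float:
    """Bounded penalty in (0, 1): near 0 for d << d0, saturates at 1 for d >> d0."""
    z = slope * (d - d0)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)              # numerically stable branch for negative z
    return e / (1.0 + e)
```

For a false contact whose partners end up 30 Å apart with d0 = 8, the harmonic term is (30 - 8)^2 = 484, while the sigmoid term is just under 1, so the rest of the structure is not pulled apart to satisfy it.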
Residue-pair-specific Cβ-Cβ distance cutoff
[Figure: cutoff for each amino-acid pair, values ranging from 2.9 to 9.0]
The cutoff is the maximum Cβ-Cβ distance that still allows a contact (< 5 Å between any heavy atoms).
• Bring residues close enough to form contacts, and let the Rosetta energy function decide whether the contact should be formed
• Can be used in centroid mode
CASP target T0806: each model made or missed a different subset of contacts
[Figure: contact maps of the top 4 models; structure contacts (5 Å) vs. predicted contacts]
Pipeline
• Hybridize (using RosettaCM)
• Fragment insertion (20 trials)
• Ab initio (using RosettaAB)
Contact prediction is essential for convergence.
Repeat until the CASP deadline or convergence.
High-resolution comparative modeling with RosettaCM
Song, Y., DiMaio, F., Wang, R.Y.-R., Kim, D., Miles, C., Brunette, T.J., Thompson, J. and Baker, D.
One contact for every twelve residues allows robust and accurate
topology‐level protein structure modeling
Kim, D.E., DiMaio, F., Wang, R.Y.-R., Song, Y. and Baker, D.
Iterative refinement is essential for improved model quality
Transition: ab initio -> template-based modeling
1. Contact-assisted ab initio prediction using Rosetta
Determination of topology:
• Ab initio folding with constraints
• Find fragment pairs
2. Contacts refine the template topology
Refinement of topology:
• Refine the structure by imposing constraints
Modeling with contact predictions: CASP12 results for Rosetta
• Examples
[Figure: predicted contacts on the model vs. the X-ray structure; contact distances <5 Å, <10 Å, >10 Å]
Modeling with contact predictions: new models for uncharacterized families
(1) Prokaryotic proteomes: 58/121 protein families with no structural template -> templates for ~400K prokaryotic proteins
(2) Large scale + metagenomic data: 614/1024 -> 137 new folds; templates for ~500K UniProt and 3M metagenomic proteins
Nf = (# sequence clusters at an 80% sequence-identity threshold) / √Length
• Correlates well with accuracy (TM-score)
• Length-independent
Nf > 64: accurate model
Nf > 16: accurate fold (same fold)
Ovchinnikov et al. eLife 2015 (1) & Science 2017 (2)
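Nf can be estimated directly from an alignment. A sketch, assuming each sequence is down-weighted by the number of sequences within 80% identity (so the weights sum to an effective number of sequence clusters) and that identity is computed over all aligned columns, gaps included; both choices are assumptions, and tools differ on them. `effective_sequences` and `nf` are hypothetical helpers.

```python
import numpy as np

def effective_sequences(msa: list[str], seqid: float = 0.8) -> float:
    """Sum of 1 / (number of sequences within `seqid` identity) over all sequences."""
    x = np.array([list(s) for s in msa])            # (N, L) array of characters
    # fraction of identical aligned positions for every pair of sequences
    ident = (x[:, None, :] == x[None, :, :]).mean(axis=2)
    n_neighbors = (ident >= seqid).sum(axis=1)      # each sequence counts itself
    return float((1.0 / n_neighbors).sum())

def nf(msa: list[str], seqid: float = 0.8) -> float:
    """Effective number of sequence clusters divided by sqrt(alignment length)."""
    return effective_sequences(msa, seqid) / np.sqrt(len(msa[0]))

# Four near-identical sequences form one cluster; two unrelated ones add one each.
msa = ["ACDEF", "ACDEF", "ACDEG", "ACDEF", "GHIKL", "MNPQR"]
```

Using the slide's thresholds, `nf(msa) > 64` would suggest an accurate model and `nf(msa) > 16` an accurate fold; dividing by √Length rather than Length is what makes the measure roughly length-independent.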
Summary: structure prediction with correlated contacts
• Correlated evolution identifies neighboring residue pairs in protein structures
• An informative multiple sequence alignment (MSA) is critical
• Enough sequences are available today
• Contacts are used to guide structure prediction
• In particular when no template is identified
• Significant increase in proteins with reliable structural models
• In particular for transmembrane proteins
• Helped by metagenomic data