Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences
Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011
Protein Structure Determination
- Proteins are essential to most cellular functions:
  - structural support
  - catalysis/enzymatic activity
  - cell signaling
- Protein structures determine function
- X-ray crystallography is the main technique for determining structures
Task Overview
Given:
- a protein sequence (e.g., SAVRVGLAIM...)
- an electron-density map (EDM) of the protein
Do:
- automatically produce a protein structure that
  - contains all atoms
  - is physically feasible
Challenges & Related Work
- Resolution is a property of the protein
- Higher resolution means better quality
Figure: resolution scale from 1Å to 4Å, marking the ranges addressed by ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI
Outline
- Protein Structures
- Prior Work on ACMI
- Probabilistic Ensembles in ACMI (PEA)
- Experiments and Results
Our Technique: ACMI
- Phase 1: Perform Local Match, giving the prior probability of each amino acid’s (AA’s) location
- Phase 2: Apply Global Constraints, giving the posterior probability of each AA’s location
- Phase 3: Sample Structure, giving all-atom protein structures
Figure labels: b_{k-1}, b_k, b_{k+1}, b*_{1…M}
Results
[DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]
Phase 2 – Probabilistic Model
- ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
Figure: MRF with one node per amino acid (ALA1, GLY2, LYS3, LEU4, SER5)
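For reference, a pairwise MRF over the amino-acid locations b_1, …, b_N factorizes into one potential per node and one per edge. The symbols ψ_i, ψ_{ij}, E and Z are our notation, not the slides'. Presumably the node potentials carry the Phase 1 local-match evidence and the edge potentials carry the global constraints. This is the generic textbook form, not ACMI's exact parameterization, and for continuous locations the sum in Z becomes an integral.

```latex
P(b_1, \dots, b_N) \;=\; \frac{1}{Z} \prod_{i=1}^{N} \psi_i(b_i) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j),
\qquad
Z \;=\; \sum_{b_1, \dots, b_N} \prod_{i=1}^{N} \psi_i(b_i) \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```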
Probabilistic Model
- Number of nodes: ~1,000
- Number of edges: ~1,000,000
Approximate Inference
- Computing the best structure exactly is intractable, i.e., we cannot infer the underlying structure analytically
- Phase 2 uses loopy belief propagation (BP) to approximate the solution:
  - a local, message-passing scheme
  - distributes evidence between nodes

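The message-passing scheme can be illustrated with a minimal sum-product loopy BP on a small discrete pairwise MRF. This is a generic sketch, not ACMI's implementation. ACMI's nodes range over 3-D locations, whereas each node here has only a few discrete states. `loopy_bp`, its arguments and the synchronous ("flooding") update schedule are our own illustrative choices.

```python
import numpy as np


def loopy_bp(node_pot, edge_pot, n_iters=50, tol=1e-8):
    """Sum-product loopy belief propagation on a discrete pairwise MRF.

    node_pot: dict node -> 1-D array psi_i(x_i) (local evidence)
    edge_pot: dict (i, j) -> 2-D array psi_ij(x_i, x_j), one entry per undirected edge
    Returns a dict node -> approximate marginal (belief).
    """
    # Store each edge in both directions so messages can flow either way.
    pot = {}
    for (i, j), table in edge_pot.items():
        pot[(i, j)] = table
        pot[(j, i)] = table.T
    neighbors = {n: [] for n in node_pot}
    for i, j in pot:
        neighbors[i].append(j)
    # Messages m_{i->j}(x_j) start out uniform.
    msgs = {(i, j): np.full(len(node_pot[j]), 1.0 / len(node_pot[j])) for i, j in pot}

    for _ in range(n_iters):
        new_msgs = {}
        for i, j in pot:
            # Local evidence at i times every incoming message except the one from j.
            h = node_pot[i].astype(float)
            for k in neighbors[i]:
                if k != j:
                    h = h * msgs[(k, i)]
            # m_{i->j}(x_j) = sum over x_i of h(x_i) * psi_ij(x_i, x_j), then normalize.
            m = h @ pot[(i, j)]
            new_msgs[(i, j)] = m / m.sum()
        change = max(np.abs(new_msgs[e] - msgs[e]).max() for e in pot)
        msgs = new_msgs  # synchronous update: all messages replaced at once
        if change < tol:
            break

    beliefs = {}
    for i in node_pot:
        b = node_pot[i].astype(float)
        for k in neighbors[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs


if __name__ == "__main__":
    # Toy MRF with one loop: three "residues", two candidate states each.
    phi = {0: np.array([0.9, 0.1]), 1: np.array([0.5, 0.5]), 2: np.array([0.5, 0.5])}
    agree = np.array([[0.8, 0.2], [0.2, 0.8]])  # neighbors prefer matching states
    psi = {(0, 1): agree, (1, 2): agree, (2, 0): agree}
    for node, belief in loopy_bp(phi, psi).items():
        print(node, belief.round(3))
```

On a loop-free graph this procedure converges to the exact marginals. On the toy triangle the beliefs are only approximate, which mirrors the situation in ACMI's densely connected MRF.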
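Loopy Belief Propagation
Figure: node LYS31 sends message m_{LYS31→LEU32} to its neighbor LEU32; each node holds a belief (p_{LYS31}, p_{LEU32})
Loopy Belief Propagation
Figure: in the reverse direction, LEU32 sends message m_{LEU32→LYS31} to LYS31 (beliefs p_{LYS31}, p_{LEU32})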
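The standard sum-product form of loopy BP computes a message such as m_{LYS31→LEU32} and the belief p_i as shown below. The notation is generic: ψ_i and ψ_{ij} are the node and edge potentials of the pairwise MRF, and N(i) is the set of neighbors of node i. For ACMI's continuous 3-D locations the sum over b_i would become an integral or a sum over discretized positions. How ACMI represents this is not stated on these slides.

```latex
\begin{aligned}
m_{i \to j}(b_j) &\propto \sum_{b_i} \psi_i(b_i)\, \psi_{ij}(b_i, b_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(b_i) \\
p_i(b_i) &\propto \psi_i(b_i) \prod_{k \in N(i)} m_{k \to i}(b_i)
\end{aligned}
```

On a tree these beliefs are exact marginals. On a graph with loops, such as ACMI's MRF, they are only approximations.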
Shortcomings of Phase 2
- Inference is very difficult:
  - ~1,000,000 possible outputs for one amino acid
  - ~250-1250 amino acids in one protein
  - evidence is noisy
  - O(N²) constraints
- The solutions are approximate, so there is room for improvement
Ensemble Methods
- Ensembles: the use of multiple models to improve predictive performance
- They tend to outperform the best single model [Dietterich ’00]
- E.g., the Netflix Prize
Phase 2: Standard ACMI
Figure: MRF → Protocol → P(b_k)
Phase 2: Ensemble ACMI
Figure: the same MRF is passed to Protocol 1, Protocol 2, …, Protocol C, producing P_1(b_k), P_2(b_k), …, P_C(b_k)
Probabilistic Ensembles in ACMI (PEA)
- A new ensemble framework (PEA):
  - run inference multiple times, under different conditions
  - output: multiple, diverse estimates of each amino acid’s location
- Phase 2 now has several probability distributions for each amino acid. So what?
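Below is a minimal sketch of the PEA idea. Purely for illustration, it assumes the "different conditions" are different random message-update orders for loopy BP. The slides do not say what the protocols vary, so `sequential_bp`, `run_ensemble` and the seed-based schedule are hypothetical. Each run returns one belief per node, so C runs give C estimates P_1(b_k), …, P_C(b_k) for every amino acid.

```python
import numpy as np


def sequential_bp(node_pot, edge_pot, seed, n_sweeps=5):
    """Loopy sum-product BP that updates one message at a time in a random order.

    In this sketch the update order, drawn from `seed`, is the (hypothetical)
    "condition" that differs between ensemble components.
    """
    rng = np.random.default_rng(seed)
    pot = {}
    for (i, j), table in edge_pot.items():
        pot[(i, j)], pot[(j, i)] = table, table.T
    nbrs = {n: [j for (i, j) in pot if i == n] for n in node_pot}
    msgs = {(i, j): np.full(len(node_pot[j]), 1.0 / len(node_pot[j])) for (i, j) in pot}
    edges = list(pot)
    for _ in range(n_sweeps):
        for idx in rng.permutation(len(edges)):
            i, j = edges[idx]
            # Evidence at i times incoming messages except j's (an empty product is 1).
            h = node_pot[i] * np.prod([msgs[(k, i)] for k in nbrs[i] if k != j], axis=0)
            m = h @ pot[(i, j)]
            msgs[(i, j)] = m / m.sum()  # in place: later updates in this sweep see it
    beliefs = {}
    for i in node_pot:
        b = node_pot[i] * np.prod([msgs[(k, i)] for k in nbrs[i]], axis=0)
        beliefs[i] = b / b.sum()
    return beliefs


def run_ensemble(node_pot, edge_pot, n_components=4):
    """Run inference C times under different conditions; component c gives P_c for every node."""
    return [sequential_bp(node_pot, edge_pot, seed=c) for c in range(n_components)]


if __name__ == "__main__":
    phi = {0: np.array([0.9, 0.1]), 1: np.array([0.5, 0.5]), 2: np.array([0.3, 0.7])}
    agree = np.array([[0.8, 0.2], [0.2, 0.8]])
    psi = {(0, 1): agree, (1, 2): agree, (2, 0): agree}
    for c, beliefs in enumerate(run_ensemble(phi, psi, n_components=4), start=1):
        print(f"P_{c}(node 1) =", beliefs[1].round(3))
```

On a loop-free graph, every schedule reaches the same exact marginals given enough sweeps. Differences between components can therefore only arise on loopy graphs, or when inference is stopped early.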
Backbone Step (Prior work)
Place the next backbone atom, given the previous positions b_{k-2} and b_{k-1}:
(1) Sample candidate positions b'_k from the empirical Cα-Cα-Cα pseudoangle distribution
(2) Weight each sample by its Phase 2-computed marginal (in the figure, e.g., 0.25, 0.20, 0.15)
(3) Select b_k with probability proportional to its sample weight
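The three steps can be sketched in Python. This is an illustrative sketch under stated assumptions, not ACMI's code. The 3.8 Å spacing between consecutive Cα atoms is standard protein geometry. The slides mention only the pseudoangle, so rotating uniformly about the b_{k-1}-b_{k-2} axis is our assumption. `pseudoangles` and `marginal`, which stands in for the Phase 2 marginal, are hypothetical inputs.

```python
import numpy as np

CA_CA_DIST = 3.8  # approximate distance (Å) between consecutive C-alpha atoms


def place_candidates(b_km2, b_km1, pseudoangles, n_samples, rng):
    """Step (1): sample candidate positions b'_k around b_{k-1}."""
    # Orthonormal frame: u points from b_{k-1} back toward b_{k-2}.
    u = (b_km2 - b_km1) / np.linalg.norm(b_km2 - b_km1)
    a = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    v = a - a.dot(u) * u
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    theta = rng.choice(pseudoangles, size=n_samples)   # empirical pseudoangles (radians)
    phi = rng.uniform(0.0, 2 * np.pi, size=n_samples)  # assumed: uniform rotation about u
    dirs = (np.cos(theta)[:, None] * u
            + np.sin(theta)[:, None] * (np.cos(phi)[:, None] * v + np.sin(phi)[:, None] * w))
    return b_km1 + CA_CA_DIST * dirs


def backbone_step(b_km2, b_km1, pseudoangles, marginal, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    cands = place_candidates(b_km2, b_km1, pseudoangles, n_samples, rng)
    weights = np.array([marginal(c) for c in cands])          # step (2): Phase 2 marginal
    idx = rng.choice(len(cands), p=weights / weights.sum())   # step (3): proportional selection
    return cands[idx], cands, weights


if __name__ == "__main__":
    # Hypothetical inputs: a made-up pseudoangle sample and a Gaussian stand-in marginal.
    angles = np.deg2rad([88.0, 95.0, 110.0, 120.0])
    target = np.array([5.0, 3.0, 0.0])

    def toy_marginal(x):
        return np.exp(-np.sum((x - target) ** 2) / 10.0)

    chosen, cands, w = backbone_step(np.zeros(3), np.array([3.8, 0.0, 0.0]), angles, toy_marginal)
    print("chosen b_k:", chosen.round(2))
```

Each candidate sits exactly 3.8 Å from b_{k-1} and makes one of the sampled pseudoangles with b_{k-2}. The Phase 2 marginal then decides which candidate is kept.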
Backbone Step for PEA
For a candidate b'_k (placed after b_{k-2} and b_{k-1}), each ensemble component gives its own marginal: P_1(b'_k) = 0.23, P_2(b'_k) = 0.15, P_C(b'_k) = 0.04. How should these be combined into a single weight w(b'_k)?
- Average: w(b'_k) = (0.23 + 0.15 + 0.04) / 3 = 0.14
- Maximum: w(b'_k) = 0.23
- Sample: w(b'_k) = 0.15, the marginal of one randomly chosen component (here P_2)
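The three aggregators can be written down directly. In this minimal sketch `aggregate` is a hypothetical helper. For SAMP we assume a component is drawn uniformly at random each time a weight is needed. The slides show only one sampled value, so drawing a single component once per backbone step is an equally plausible reading.

```python
import numpy as np


def aggregate(component_probs, method, rng=None):
    """Combine the C estimates P_1(b'_k), ..., P_C(b'_k) into one weight w(b'_k)."""
    p = np.asarray(component_probs, dtype=float)
    if method == "AVG":
        return p.mean()
    if method == "MAX":
        return p.max()
    if method == "SAMP":
        rng = rng or np.random.default_rng()
        return p[rng.integers(len(p))]  # marginal of one uniformly chosen component
    raise ValueError(f"unknown aggregator: {method}")


probs = [0.23, 0.15, 0.04]  # P_1, P_2, P_C for one candidate, as on the slides
print(aggregate(probs, "AVG"))  # ≈ 0.14
print(aggregate(probs, "MAX"))  # 0.23
print(aggregate(probs, "SAMP", np.random.default_rng(0)))
```

AVG and MAX are deterministic given the components, whereas SAMP reintroduces randomness into Phase 3's sampling.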
Review: Previous work on ACMI
Figure: Phase 2 runs one protocol to produce P(b_k); Phase 3 uses it to weight the candidates placed after b_{k-2}, b_{k-1} (0.25, 0.20, 0.15)
Review: PEA
Figure: Phase 2 runs several protocols; Phase 3 weights the candidates placed after b_{k-2}, b_{k-1} using the combined estimates (0.05, 0.14, 0.26)
Experimental Methodology
- PEA (Probabilistic Ensembles in ACMI)
  - 4 ensemble components
  - Aggregators: AVG, MAX, SAMP
- ACMI
  - ORIG – standard ACMI (prior work)
  - EXT – run inference 4 times as long
  - BEST – test the best of the 4 PEA components
Phase 2 Results
Figure note: * p-value < 0.01
Protein Structure Results
Figure panels: Correctness, Completeness (* p-value < 0.05)
Impact of Ensemble Size
Conclusions
- ACMI is the state-of-the-art method for determining protein structures from poor-resolution electron-density maps
- Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures
Future Work
- A general solution for inference
- Larger ensemble sizes
Acknowledgements
- Phillips Laboratory at UW-Madison
- UW Center for Eukaryotic Structural Genomics (CESG)
- NLM R01-LM008796
- NLM Training Grant T15-LM007359
- NIH Protein Structure Initiative Grant GM074901

Thank you!