Download Document

Probabilistic Methods for Interpreting Electron-Density Maps Frank DiMaio University of Wisconsin – Madison Computer Sciences Department [email protected] 3D Protein Structure backbone backbone sidechain sidechain C-alpha 3D Protein Structure … … ALA LEU PRO ? ? VAL ARG ? High-Throughput Structure Determination  Protein-structure determination important    Understanding function of a protein Understanding mechanisms Targets for drug design  Some proteins produce poor density maps  Interpreting poor electron-density maps is very (human) laborious  I aim to automatically interpret poor-quality electron-density maps … … Electron-Density Map Interpretation GIVEN: 3D electron-density map, (linear) amino-acid sequence Electron-Density Map Interpretation … … FIND: All-atom Protein Model Density Map Resolution 1.0Å 2.0Å Morris et al. (2003) 3.0Å Ioerger et al. (2002) Terwilliger (2003) My focus 4.0Å Thesis Contributions  A probabilistic approach to protein-backbone tracing DiMaio et al., Intelligent Systems for Molecular Biology (2006)  Improved template matching in electron-density maps DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)  Creating all-atom protein models using particle filtering DiMaio et al. (under review)  Pictorial structures for atom-level molecular modeling DiMaio et al., Advances in Neural Information Processing Systems (2004)  Improving the efficiency of belief propagation DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)  Iterative phase improvement in ACMI ACMI Overview  Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)  Independent amino-acid search  Templates model 5-mer conformational space  Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)  Protein structural constraints refine local search  Markov field (MRF) models pairwise constraints  Phase 3: Sample all-atom models   Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories ACMI Overview  Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)  Independent amino-acid search  Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)  Protein structural constraints refine local search  Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models   Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories 5-mer Lookup …SAW C VKFEKPADKNGKTE… Protein DB  ACMI searches map for each template independently  Spherical-harmonic decomposition allows rapid search of all template rotations Spherical-Harmonic Decomposition f (θ,φ) 5-mer Fast Rotation Search electron density map pentapeptide fragment from PDB (the “template”) sampled region of density in 5A sphere calculated (expected) density in 5A sphere map-region sampled in spherical shells template-density sampled in spherical shells 5-mer Fast Rotation Search map-region sphericalharmonic coefficients map-region sampled in spherical shells fast-rotation function (Navaza 2006, Risbo 1996) template sphericalharmonic coefficients correlation coefficient as function of rotation template-density sampled in spherical shells Convert Scores to Probabilities scan density map for fragment Bayes’ rule correlation coefficients over density map ti (ui) probability distribution over density map P(5-mer at ui | EDM) ACMI Overview  Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)  Independent amino-acid search  Templates model 5-mer conformational space  Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)  Protein structural constraints refine local search  Markov field (MRF) models pairwise constraints  Phase 3: Sample all-atom models   Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Probabilistic Backbone Model  Trace assigns a position and orientation ui={xi, qi} to each amino acid i  The probability of a trace U = {ui} is P(U | EDM)  P(u1   uN | EDM)  This full joint probability intractable to compute  Approximate using pairwise Markov field Pairwise Markov-Field Model ALA GLY LYS LEU SER  Joint probabilities defined on a graph as product of vertex and edge potentials P(U | EDM)     (u | EDM)      (u , u )   AAs i i i   AAs i , j ij i j  ACMI’s Backbone Model ALA GLY LYS LEU SER Observational potentials tie the map to the model ACMI’s Backbone Model ALA GLY LYS LEU SER  Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation  Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space Backbone Model Potential p(U | EDM)    adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i Backbone Model Potential p(U | EDM)    adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i Constraints between adjacent amino acids  adj (ui , u j ) = p x ( || xi  x j || ) × p (ui , u j ) Backbone Model Potential p(U | EDM)    adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i Constraints between all other amino acid pairs 0 if ||x i  x j||  K  occ (ui , u j )   1 otherwise Backbone Model Potential p(U | EDM)    adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i Observational (“template-matching”) probabilities  i (ui | EDM)  Pr(5mer si 2 ...si  2 at ui ) Inferring Backbone Locations  Want to find backbone layout that maximizes   adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i Inferring Backbone Locations  Want to find backbone layout that maximizes   adj (ui , u j )    occ (ui , u j )    i (ui | EDM) AAs i , j |i  j | 1 AAs i , j |i  j | 1 AAs i  Exact methods are intractable  Use belief propagation (Pearl 1988) to approximate marginal distributions pi (ui | EDM)    p(U | EDM) uk , k  i Belief Propagation Example LYS31 LEU32 mLYS31→LEU32 pˆ LYS31 pˆ LEU32 Belief Propagation Example LYS31 LEU32 mLEU32→LYS31 pˆ LYS31 pˆ LEU32 Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)  Naïve implementation O(N2G2)  N = the number of amino acids in the protein  G = # of points in discretized density map  O(G2) computation for each message passed  O(G log G) as Fourier-space multiplication  O(N2) messages computed & stored  Approx (N-3) occupancy msgs with 1 message  O(N) messages using a message accumulator  Improved implementation O(NG log G) Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)  Naïve implementation O(N2G2)  N = the number of amino acids in the protein  G = # of points in discretized density map  O(G2) computation for each message passed  O(G log G) as Fourier-space multiplication  O(N2) messages computed & stored  Approx (N-3) occupancy msgs with 1 message  O(N) messages using a message accumulator  Improved implementation O(NG log G) Occupancy Message Approximation  To pass a message n 1 ˆ p (ui ) n i mi  j (u j )    occ (ui , u j ) n1 dui m j i (ui ) ui occupancy edge potential product of incoming msgs to i except from j Occupancy Message Approximation  To pass a message n 1 ˆ p (ui ) n i mi  j (u j )    occ (ui , u j ) n1 dui m j i (ui ) ui  “Weak” potentials between nonadjacent amino acids lets us approximate min (u )   occ (ui , u ) pˆ in 1 (ui ) dui ui occupancy edge potential product of all incoming msgs to i Occupancy Message Approximation pˆ 3 x  occ  m53 3 pˆ 3 x  occ  m13 3 1 2 3 4 pˆ 3 x  occ  m63 3 5 6 Occupancy Message Approximation  occ   pˆ 3 occ  pˆ 3 x3 x3 1  occ  pˆ 3 x3 2 3 4 5 6 Occupancy Message Approximation ACC 1 2 4 3 5 6 Send outgoing occupancy message product to a central accumulator ACC ( x )   mi  ( x ) AAs i Occupancy Message Approximation ACC 1 2 3 4 5 6 Then, each node’s incoming message product is computed in constant time m23  m43 p̂3   3  ACC  m2  m3  m4 BP Output  After some number of iterations, BP gives probability distributions over Cα locations … ALA LEU … PRO … pLEU  xLEU  VAL ARG … … pVAL  xVAL  ACMI’s Backbone Trace  Independently choose Cα locations that maximize approximate marginal distribution b  arg max pˆ i ( xi ) * i xi … Example: 1XRI prob(AA at location) 3.3Å resolution density map 39° mean phase error HIGH 0.9 0.1 LOW 0.9009Å RMSd 93% complete Density-map mean phase error (deg.) Testset Density Maps (raw data) 75 60 45 30 15 1.0 2.0 3.0 4.0 Density-map resolution (Å) % Cα’s located within 2Å of some Cα / correct Cα Experimental Accuracy % backbone correctly placed % amino acids correctly identified 100 80 60 40 20 0 ACMI ARP/ Resolve Textal wARP ACMI % Cα’s located Experimental Accuracy on a Per-Protein Basis 100 100 100 80 80 80 60 60 60 40 40 40 20 20 20 0 0 0 0 20 40 60 80 100 ARP/wARP % Cα’s located 0 20 40 60 80 100 Resolve % Cα’s located 0 20 40 60 80 100 Textal % Cα’s located ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)  Independent amino-acid search  Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)  Protein structural constraints refine local search  Markov field (MRF) models pairwise constraints  Phase 3: Sample all-atom models   Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Problems with ACMI  Biologists want location of all atoms  All Cα’s lie on a discrete grid  Maximum-marginal backbone model may be physically unrealistic  Ignoring a lot of information  Multiple models may better represent conformational variation within crystal Probability=0.4 Probability=0.35 Probability=0.25 Maximummarginal structure ACMI with Particle Filtering (ACMI-PF) Idea: Represent protein using a set of static 3D all-atom protein models Particle Filtering Overview (Doucet et al. 2000)  Given some Markov process x1:KX with observations y1:K Y  Particle Filtering approximates some posterior probability distribution over X using a set of N weighted point estimates N  p  x1:K | y1:K    wt  x1:K  x i 1 (i ) K (i ) 1:K  Particle Filtering Overview  Markov process gives recursive formulation p  x1:k | y1:k   p  yk | xk   p  xk | xk 1   p  x1:k 1 | y1:k 1   Use importance fn. q(x k |x 0:k-1 ,y k) to grow particles  Recursive weight update, wt (i ) k  wt (i ) k 1     p yk | xk(i )  p xk(i ) | xk(i)1  q xk(i ) | xk(i)1 , yk   Particle Filtering for Protein Structures  Particle refers to one specific 3D layout of some subsequence of the protein  At each iteration advance particle’s trajectory by placing an additional amino-acid’s atoms Particle Filtering for Protein Structures  Alternate extending chain left and right Particle Filtering for Protein Structures bk-1 bk bk+1 sk  Alternate extending chain left and right  An iteration alternately places   Cα position bk+1 given bk All sidechain atoms sk given bk-1:k+1 Particle Filtering for Protein Structures bk-1 bk bk+1 sk  Key idea: Use the conditional distribution p(bk|bik-1,Map) to advance particle trajectories  Construct this conditional distribution from BP’s marginal distributions Particle Filtering for Protein Structures sk … bk-1 bk bk+1 Algorithm place “seeds” bki for each particle i=1…N while amino-acids remain place bki+1 / bji-1 given bj:ki for each i=1…N place ski given bki-1:k+1 for each i=1…N optionally resample N particles end while … Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 b k-1 (1) Sample L bk+1’s from bk-1–bk–bk+1 pseudoangle distribution Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 pk+1(b k1+1 ) pk+1(b k2+1 ) b k-1 pk+1(b kL+1 ) (2) Weight each sample by its ACMI-computed approximate marginal Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 1 pk+1(b k+1 ) 2 pk+1(b k+1 ) b k-1 L pk+1(b k+1 ) (3) Select bk+1 with probability proportional to sample weight Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b k+1 L b k-1   wtk 1   pk 1 bk 1  wtk 1 (4) Update particle weight as sum of sample weights Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N Protein Data Bank (1) Sample sk from a database of sidechain conformations Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N pk(EDM | s k2) pk(EDM | s k1) pk(EDM | sk3 ) (2) For each sidechain conformation, compute probability of density map given the sidechain Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N pk(EDM | s k2) pk(EDM | s k1) pk(EDM | sk3 ) (3) Select sidechain conformation from this weighted distribution Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N M   wtk 1   p EDM | skm  wtk m 1 (4) Update particle weight as sum of sample weights Particle Resampling wt = 0.4 wt = 0.2 wt = 0.3 wt = 0.2 wt = 0.1 wt = 0.2 wt = 0.1 wt = 0.2 wt = 0.1 wt = 0.2 Amino-Acid Sampling Order j k  Begin at some amino acid k with probability P(k )  exp  entropy  pˆ k (bk )   At each step, move left to right with probability  P( j  1)  exp entropy  pˆ j 1 (b j 1 )   P(k  1)  exp  entropy  pˆ k 1 (bk 1 )  Experimental Methodology  Run ACMI-PF 10 times with 100 particles each  Return highest-weight particle from each run  Each run samples amino-acids in a different order  Refine each structure for 10 iterations in Refmac5  Compare 10-structure model to others using Rfree  R Fobs  Fcalc F obs Refined Rfree ACMI-PF Versus ACMI-Naïve 0.5 0.4 0.3 Acmi-PF Acmi-Naive 0.2 1 2 3 4 5 6 7 8 9 10 Number of ACMI-PF runs Additionally, ACMI-PF’s models have …   Fewer gaps (10 vs. 28) Lower sidechain RMS error (2.1Å vs. 2.3Å) ACMI-PF Rfree ACMI-PF Versus Others 0.65 0.65 0.65 0.55 0.55 0.55 0.45 0.45 0.45 0.35 0.35 0.35 0.25 0.25 0.25 0.25 0.35 0.45 0.55 0.65 ARP/wARP Rfree 0.25 0.35 0.45 0.55 Resolve Rfree 0.65 0.25 0.35 0.45 0.55 Textal Rfree 0.65 ACMI-PF Example: 2A3Q 2.3Å resolution 66° phase err. 1.79Å RMSd 92% complete ACMI Overview  Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)  Independent amino-acid searchphase  Phase 4: Iterative  Templates model 5-mer conformational space improvement Use particle-filtering models to  Phase 2: Coarse backbone model improve density-map quality (ISMB 2006, ICDM 2006)  Rerun entire pipeline on  Protein structural constraints refine local search improved density map  Markov field (MRF) models pairwise constraints  Repeat until convergence   Phase 3: Sample all-atom models   Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Phase Problem Intensities Measured by X-ray crystallography f I,Φ  Phases Experimentally estimated (e.g. MAD, MIR) Density-Map Phasing 0° 30° 60° 75° mean phase error Iterative Phase Improvement Revised Initial density map Predicted 3D model I obs I calc Φ exp calc Φcalc (deg. mean phase error) Error in ACMI-PF’s phases ACMI-PF’s Phase Improvement 75 60 45 30 15 0 0 15 30 45 60 75 Error in initial phases (deg. mean phase error) % backbone located Iteration 2 Two-Iteration ACMI 100 90 80 70 60 50 50 60 70 80 90 100 % backbone located Iteration 1 Average mean phase error 60 50 40 ? 30 20 10 0 0 1 2 3 4 Number of ACMI iterations Average % uninterpreted AAs Future Work: Many-iteration ACMI 20 15 ? 10 5 0 1 2 3 4 5 Number of ACMI iterations Conclusions  ACMI’s three steps construct a set of all-atom protein models from a density map  Novel message approximation allows inference on large, highly-connected models  Resulting protein models are more accurate than other methods Ongoing and Future Work  Incorporate additional structural biology background knowledge  Incorporate more complex potential functions  Further work on iterative phase improvement  Generalize my algorithms to other 3D image data Acknowledgements  Advisor Jude Shavlik  Committee George Phillips  Charles Dyer  David Page  Mark Craven  Collaborators  Ameet Soni  Dmitry Kondrashov  Eduard Bitto  Craig Bingman  6th floor MSCers   Center for Eukaryotic Structural Genomics  Funding    UW-Madison Graduate School NLM 1T15 LM007359 NLM 1R01 LM008796

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document