Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Probabilistic Methods for Interpreting Electron-Density Maps Frank DiMaio University of Wisconsin – Madison Computer Sciences Department [email protected] 3D Protein Structure backbone backbone sidechain sidechain C-alpha 3D Protein Structure … … ALA LEU PRO ? ? VAL ARG ? High-Throughput Structure Determination Protein-structure determination important Understanding function of a protein Understanding mechanisms Targets for drug design Some proteins produce poor density maps Interpreting poor electron-density maps is very (human) laborious I aim to automatically interpret poor-quality electron-density maps … … Electron-Density Map Interpretation GIVEN: 3D electron-density map, (linear) amino-acid sequence Electron-Density Map Interpretation … … FIND: All-atom Protein Model Density Map Resolution 1.0Å 2.0Å Morris et al. (2003) 3.0Å Ioerger et al. (2002) Terwilliger (2003) My focus 4.0Å Thesis Contributions A probabilistic approach to protein-backbone tracing DiMaio et al., Intelligent Systems for Molecular Biology (2006) Improved template matching in electron-density maps DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007) Creating all-atom protein models using particle filtering DiMaio et al. (under review) Pictorial structures for atom-level molecular modeling DiMaio et al., Advances in Neural Information Processing Systems (2004) Improving the efficiency of belief propagation DiMaio and Shavlik, IEEE International Conference on Data Mining (2006) Iterative phase improvement in ACMI ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories 5-mer Lookup …SAW C VKFEKPADKNGKTE… Protein DB ACMI searches map for each template independently Spherical-harmonic decomposition allows rapid search of all template rotations Spherical-Harmonic Decomposition f (θ,φ) 5-mer Fast Rotation Search electron density map pentapeptide fragment from PDB (the “template”) sampled region of density in 5A sphere calculated (expected) density in 5A sphere map-region sampled in spherical shells template-density sampled in spherical shells 5-mer Fast Rotation Search map-region sphericalharmonic coefficients map-region sampled in spherical shells fast-rotation function (Navaza 2006, Risbo 1996) template sphericalharmonic coefficients correlation coefficient as function of rotation template-density sampled in spherical shells Convert Scores to Probabilities scan density map for fragment Bayes’ rule correlation coefficients over density map ti (ui) probability distribution over density map P(5-mer at ui | EDM) ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Probabilistic Backbone Model Trace assigns a position and orientation ui={xi, qi} to each amino acid i The probability of a trace U = {ui} is P(U | EDM) P(u1 uN | EDM) This full joint probability intractable to compute Approximate using pairwise Markov field Pairwise Markov-Field Model ALA GLY LYS LEU SER Joint probabilities defined on a graph as product of vertex and edge potentials P(U | EDM) (u | EDM) (u , u ) AAs i i i AAs i , j ij i j ACMI’s Backbone Model ALA GLY LYS LEU SER Observational potentials tie the map to the model ACMI’s Backbone Model ALA GLY LYS LEU SER Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space Backbone Model Potential p(U | EDM) adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Backbone Model Potential p(U | EDM) adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Constraints between adjacent amino acids adj (ui , u j ) = p x ( || xi x j || ) × p (ui , u j ) Backbone Model Potential p(U | EDM) adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Constraints between all other amino acid pairs 0 if ||x i x j|| K occ (ui , u j ) 1 otherwise Backbone Model Potential p(U | EDM) adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Observational (“template-matching”) probabilities i (ui | EDM) Pr(5mer si 2 ...si 2 at ui ) Inferring Backbone Locations Want to find backbone layout that maximizes adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Inferring Backbone Locations Want to find backbone layout that maximizes adj (ui , u j ) occ (ui , u j ) i (ui | EDM) AAs i , j |i j | 1 AAs i , j |i j | 1 AAs i Exact methods are intractable Use belief propagation (Pearl 1988) to approximate marginal distributions pi (ui | EDM) p(U | EDM) uk , k i Belief Propagation Example LYS31 LEU32 mLYS31→LEU32 pˆ LYS31 pˆ LEU32 Belief Propagation Example LYS31 LEU32 mLEU32→LYS31 pˆ LYS31 pˆ LEU32 Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006) Naïve implementation O(N2G2) N = the number of amino acids in the protein G = # of points in discretized density map O(G2) computation for each message passed O(G log G) as Fourier-space multiplication O(N2) messages computed & stored Approx (N-3) occupancy msgs with 1 message O(N) messages using a message accumulator Improved implementation O(NG log G) Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006) Naïve implementation O(N2G2) N = the number of amino acids in the protein G = # of points in discretized density map O(G2) computation for each message passed O(G log G) as Fourier-space multiplication O(N2) messages computed & stored Approx (N-3) occupancy msgs with 1 message O(N) messages using a message accumulator Improved implementation O(NG log G) Occupancy Message Approximation To pass a message n 1 ˆ p (ui ) n i mi j (u j ) occ (ui , u j ) n1 dui m j i (ui ) ui occupancy edge potential product of incoming msgs to i except from j Occupancy Message Approximation To pass a message n 1 ˆ p (ui ) n i mi j (u j ) occ (ui , u j ) n1 dui m j i (ui ) ui “Weak” potentials between nonadjacent amino acids lets us approximate min (u ) occ (ui , u ) pˆ in 1 (ui ) dui ui occupancy edge potential product of all incoming msgs to i Occupancy Message Approximation pˆ 3 x occ m53 3 pˆ 3 x occ m13 3 1 2 3 4 pˆ 3 x occ m63 3 5 6 Occupancy Message Approximation occ pˆ 3 occ pˆ 3 x3 x3 1 occ pˆ 3 x3 2 3 4 5 6 Occupancy Message Approximation ACC 1 2 4 3 5 6 Send outgoing occupancy message product to a central accumulator ACC ( x ) mi ( x ) AAs i Occupancy Message Approximation ACC 1 2 3 4 5 6 Then, each node’s incoming message product is computed in constant time m23 m43 p̂3 3 ACC m2 m3 m4 BP Output After some number of iterations, BP gives probability distributions over Cα locations … ALA LEU … PRO … pLEU xLEU VAL ARG … … pVAL xVAL ACMI’s Backbone Trace Independently choose Cα locations that maximize approximate marginal distribution b arg max pˆ i ( xi ) * i xi … Example: 1XRI prob(AA at location) 3.3Å resolution density map 39° mean phase error HIGH 0.9 0.1 LOW 0.9009Å RMSd 93% complete Density-map mean phase error (deg.) Testset Density Maps (raw data) 75 60 45 30 15 1.0 2.0 3.0 4.0 Density-map resolution (Å) % Cα’s located within 2Å of some Cα / correct Cα Experimental Accuracy % backbone correctly placed % amino acids correctly identified 100 80 60 40 20 0 ACMI ARP/ Resolve Textal wARP ACMI % Cα’s located Experimental Accuracy on a Per-Protein Basis 100 100 100 80 80 80 60 60 60 40 40 40 20 20 20 0 0 0 0 20 40 60 80 100 ARP/wARP % Cα’s located 0 20 40 60 80 100 Resolve % Cα’s located 0 20 40 60 80 100 Textal % Cα’s located ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Problems with ACMI Biologists want location of all atoms All Cα’s lie on a discrete grid Maximum-marginal backbone model may be physically unrealistic Ignoring a lot of information Multiple models may better represent conformational variation within crystal Probability=0.4 Probability=0.35 Probability=0.25 Maximummarginal structure ACMI with Particle Filtering (ACMI-PF) Idea: Represent protein using a set of static 3D all-atom protein models Particle Filtering Overview (Doucet et al. 2000) Given some Markov process x1:KX with observations y1:K Y Particle Filtering approximates some posterior probability distribution over X using a set of N weighted point estimates N p x1:K | y1:K wt x1:K x i 1 (i ) K (i ) 1:K Particle Filtering Overview Markov process gives recursive formulation p x1:k | y1:k p yk | xk p xk | xk 1 p x1:k 1 | y1:k 1 Use importance fn. q(x k |x 0:k-1 ,y k) to grow particles Recursive weight update, wt (i ) k wt (i ) k 1 p yk | xk(i ) p xk(i ) | xk(i)1 q xk(i ) | xk(i)1 , yk Particle Filtering for Protein Structures Particle refers to one specific 3D layout of some subsequence of the protein At each iteration advance particle’s trajectory by placing an additional amino-acid’s atoms Particle Filtering for Protein Structures Alternate extending chain left and right Particle Filtering for Protein Structures bk-1 bk bk+1 sk Alternate extending chain left and right An iteration alternately places Cα position bk+1 given bk All sidechain atoms sk given bk-1:k+1 Particle Filtering for Protein Structures bk-1 bk bk+1 sk Key idea: Use the conditional distribution p(bk|bik-1,Map) to advance particle trajectories Construct this conditional distribution from BP’s marginal distributions Particle Filtering for Protein Structures sk … bk-1 bk bk+1 Algorithm place “seeds” bki for each particle i=1…N while amino-acids remain place bki+1 / bji-1 given bj:ki for each i=1…N place ski given bki-1:k+1 for each i=1…N optionally resample N particles end while … Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 b k-1 (1) Sample L bk+1’s from bk-1–bk–bk+1 pseudoangle distribution Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 pk+1(b k1+1 ) pk+1(b k2+1 ) b k-1 pk+1(b kL+1 ) (2) Weight each sample by its ACMI-computed approximate marginal Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b1…L k+1 1 pk+1(b k+1 ) 2 pk+1(b k+1 ) b k-1 L pk+1(b k+1 ) (3) Select bk+1 with probability proportional to sample weight Backbone Step (for particle i ) place bki+1 given bki for each i=1…N bk b k+1 L b k-1 wtk 1 pk 1 bk 1 wtk 1 (4) Update particle weight as sum of sample weights Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N Protein Data Bank (1) Sample sk from a database of sidechain conformations Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N pk(EDM | s k2) pk(EDM | s k1) pk(EDM | sk3 ) (2) For each sidechain conformation, compute probability of density map given the sidechain Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N pk(EDM | s k2) pk(EDM | s k1) pk(EDM | sk3 ) (3) Select sidechain conformation from this weighted distribution Sidechain Step (for particle i ) place ski given bki-1:k+1 for each i=1…N M wtk 1 p EDM | skm wtk m 1 (4) Update particle weight as sum of sample weights Particle Resampling wt = 0.4 wt = 0.2 wt = 0.3 wt = 0.2 wt = 0.1 wt = 0.2 wt = 0.1 wt = 0.2 wt = 0.1 wt = 0.2 Amino-Acid Sampling Order j k Begin at some amino acid k with probability P(k ) exp entropy pˆ k (bk ) At each step, move left to right with probability P( j 1) exp entropy pˆ j 1 (b j 1 ) P(k 1) exp entropy pˆ k 1 (bk 1 ) Experimental Methodology Run ACMI-PF 10 times with 100 particles each Return highest-weight particle from each run Each run samples amino-acids in a different order Refine each structure for 10 iterations in Refmac5 Compare 10-structure model to others using Rfree R Fobs Fcalc F obs Refined Rfree ACMI-PF Versus ACMI-Naïve 0.5 0.4 0.3 Acmi-PF Acmi-Naive 0.2 1 2 3 4 5 6 7 8 9 10 Number of ACMI-PF runs Additionally, ACMI-PF’s models have … Fewer gaps (10 vs. 28) Lower sidechain RMS error (2.1Å vs. 2.3Å) ACMI-PF Rfree ACMI-PF Versus Others 0.65 0.65 0.65 0.55 0.55 0.55 0.45 0.45 0.45 0.35 0.35 0.35 0.25 0.25 0.25 0.25 0.35 0.45 0.55 0.65 ARP/wARP Rfree 0.25 0.35 0.45 0.55 Resolve Rfree 0.65 0.25 0.35 0.45 0.55 Textal Rfree 0.65 ACMI-PF Example: 2A3Q 2.3Å resolution 66° phase err. 1.79Å RMSd 92% complete ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid searchphase Phase 4: Iterative Templates model 5-mer conformational space improvement Use particle-filtering models to Phase 2: Coarse backbone model improve density-map quality (ISMB 2006, ICDM 2006) Rerun entire pipeline on Protein structural constraints refine local search improved density map Markov field (MRF) models pairwise constraints Repeat until convergence Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Phase Problem Intensities Measured by X-ray crystallography f I,Φ Phases Experimentally estimated (e.g. MAD, MIR) Density-Map Phasing 0° 30° 60° 75° mean phase error Iterative Phase Improvement Revised Initial density map Predicted 3D model I obs I calc Φ exp calc Φcalc (deg. mean phase error) Error in ACMI-PF’s phases ACMI-PF’s Phase Improvement 75 60 45 30 15 0 0 15 30 45 60 75 Error in initial phases (deg. mean phase error) % backbone located Iteration 2 Two-Iteration ACMI 100 90 80 70 60 50 50 60 70 80 90 100 % backbone located Iteration 1 Average mean phase error 60 50 40 ? 30 20 10 0 0 1 2 3 4 Number of ACMI iterations Average % uninterpreted AAs Future Work: Many-iteration ACMI 20 15 ? 10 5 0 1 2 3 4 5 Number of ACMI iterations Conclusions ACMI’s three steps construct a set of all-atom protein models from a density map Novel message approximation allows inference on large, highly-connected models Resulting protein models are more accurate than other methods Ongoing and Future Work Incorporate additional structural biology background knowledge Incorporate more complex potential functions Further work on iterative phase improvement Generalize my algorithms to other 3D image data Acknowledgements Advisor Jude Shavlik Committee George Phillips Charles Dyer David Page Mark Craven Collaborators Ameet Soni Dmitry Kondrashov Eduard Bitto Craig Bingman 6th floor MSCers Center for Eukaryotic Structural Genomics Funding UW-Madison Graduate School NLM 1T15 LM007359 NLM 1R01 LM008796