Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences
Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011

Slide 2: Protein Structure Determination
  - Proteins essential to most cellular function
      - Structural support
      - Catalysis/enzymatic activity
      - Cell signaling
  - Protein structures determine function
  - X-ray crystallography is main technique for determining structures

Slide 3: Task Overview
  - Given
      - A protein sequence
      - Electron-density map (EDM) of protein
  - Do
      - Automatically produce a protein structure that
          - Contains all atoms
          - Is physically feasible
  [Figure: example protein sequence SAVRVGLAIM...]

Slide 4: Challenges & Related Work
  - Resolution is a property of the protein
  - Higher Resolution : Better Quality
  [Figure: resolution scale marked 1Å, 2Å, 3Å, 4Å, with method labels ARP/wARP, TEXTAL & RESOLVE, and Our Method: ACMI]

Slide 5: Outline
  - Protein Structures
  - Prior Work on ACMI
  - Probabilistic Ensembles in ACMI (PEA)
  - Experiments and Results

Slide 6: Outline (repeated)

Slide 7: Our Technique: ACMI
  - Phase 1: Perform Local Match
      - Output: prior probability of each AA's location
  - Phase 2: Apply Global Constraints
      - Output: posterior probability of each AA's location
  - Phase 3: Sample Structure
      - Output: all-atom protein structures
  [Figure labels: b_{k-1}, b_k, b*1…M_{k+1}]

Slide 8: Results
  [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]

Slide 9: ACMI Outline (three-phase diagram from slide 7, repeated)

Slide 10: Phase 2 – Probabilistic Model
  - ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
  [Figure: MRF nodes ALA1, GLY2, LYS3, LEU4, SER5]

Slide 11: Probabilistic Model
  - # nodes: ~1,000
  - # edges: ~1,000,000

Slide 12: Approximate Inference
  - Best structure intractable to calculate, i.e., we cannot infer the underlying structure analytically
  - Phase 2 uses Loopy Belief Propagation (BP) to approximate solution
      - Local, message-passing scheme
      - Distributes evidence between nodes

Slide 13: Loopy Belief Propagation
  [Figure: nodes LYS31 and LEU32 with beliefs p_LYS31 and p_LEU32; message m_{LYS31→LEU32}]

Slide 14: Loopy Belief Propagation
  [Figure: nodes LYS31 and LEU32 with beliefs p_LYS31 and p_LEU32; message m_{LEU32→LYS31}]

Slide 15: Shortcomings of Phase 2
  - Inference is very difficult
      - ~1,000,000 possible outputs for one amino acid
      - ~250-1250 amino acids in one protein
      - Evidence is noisy
      - O(N^2) constraints
  - Approximate solutions, room for improvement

Slide 16: Outline (repeated)

Slide 17: Ensemble Methods
  - Ensembles: the use of multiple models to improve predictive performance
  - Tend to outperform best single model [Dietterich '00]
      - E.g., Netflix prize

Slide 18: Phase 2: Standard ACMI
  [Figure: MRF -> Protocol -> P(b_k)]

Slide 19: Phase 2: Ensemble ACMI
  [Figure: MRF -> Protocol 1, Protocol 2, …, Protocol C -> P1(b_k), P2(b_k), …, PC(b_k)]

Slide 20: Probabilistic Ensembles in ACMI (PEA)
  - New ensemble framework (PEA)
      - Run inference multiple times, under different conditions
      - Output: multiple, diverse estimates of each amino acid's location
  - Phase 2 now has several probability distributions for each amino acid, so what?

Slide 21: ACMI Outline (three-phase diagram from slide 7, repeated)

Slide 22: Backbone Step (Prior work)
  - Place next backbone atom
  - (1) Sample b_k from empirical Cα-Cα-Cα pseudoangle distribution
  [Figure: b_{k-2}, b_{k-1}, and candidate positions b'_k, each marked "?"]

Slide 23: Backbone Step (Prior work)
  - Place next backbone atom
  - (2) Weight each sample by its Phase 2 computed marginal
  [Figure: b_{k-2}, b_{k-1}, and candidates b'_k with weights 0.25, 0.20, 0.15]

Slide 24: Backbone Step (Prior work)
  - Place next backbone atom
  - (3) Select b_k with probability proportional to sample weight
  [Figure: b_{k-2}, b_{k-1}, and candidates b'_k with weights 0.25, 0.20, 0.15]

Slide 25: Backbone Step for PEA
  - P1(b'_k) = 0.23, P2(b'_k) = 0.15, PC(b'_k) = 0.04
  - w(b'_k) = ?

Slide 26: Backbone Step for PEA: Average
  - P1(b'_k) = 0.23, P2(b'_k) = 0.15, PC(b'_k) = 0.04
  - w(b'_k) = 0.14

Slide 27: Backbone Step for PEA: Maximum
  - P1(b'_k) = 0.23, P2(b'_k) = 0.15, PC(b'_k) = 0.04
  - w(b'_k) = 0.23

Slide 28: Backbone Step for PEA: Sample
  - P1(b'_k) = 0.23, P2(b'_k) = 0.15, PC(b'_k) = 0.04
  - w(b'_k) = 0.15

Slide 29: Review: Previous work on ACMI
  [Figure: Phase 2: Protocol -> P(b_k); Phase 3: b_{k-2}, b_{k-1}, and candidate weights 0.25, 0.20, 0.15]

Slide 30: Review: PEA
  [Figure: Phase 2: more than one Protocol box; Phase 3: b_{k-2}, b_{k-1}, and candidate weights 0.05, 0.14, 0.26]

Slide 31: Outline (repeated)

Slide 32: Experimental Methodology
  - PEA (Probabilistic Ensembles in ACMI)
      - 4 ensemble components
      - Aggregators: AVG, MAX, SAMP
  - ACMI
      - ORIG – standard ACMI (prior work)
      - EXT – run inference 4 times as long
      - BEST – test best of 4 PEA components

Slide 33: Phase 2 Results
  [Figure: results chart; *p-value < 0.01]

Slide 34: Protein Structure Results
  [Figure: Correctness and Completeness charts; *p-value < 0.05]

Slide 35: Protein Structure Results
  [Figure only]

Slide 36: Impact of Ensemble Size
  [Figure only]

Slide 37: Conclusions
  - ACMI is the state-of-the-art method for determining protein structures in poor-resolution images
  - Probabilistic Ensembles in ACMI (PEA) improves approximate inference, produces better protein structures
  - Future Work
      - General solution for inference
      - Larger ensemble size

Slide 38: Acknowledgements
  - Phillips Laboratory at UW - Madison
  - UW Center for Eukaryotic Structural Genomics (CESG)
  - NLM R01-LM008796
  - NLM Training Grant T15-LM007359
  - NIH Protein Structure Initiative Grant GM074901

Thank you!
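
Note on the PEA backbone step: the three aggregation rules shown on the "Backbone Step for PEA" slides (Average, Maximum, Sample), followed by the weight-proportional selection from the prior-work backbone step, could be sketched in Python roughly as below. This is only an illustration under stated assumptions, not ACMI's implementation: the function names `aggregate` and `select_candidate`, the string labels "avg", "max", and "samp", and the list-of-lists input layout (one inner list of component marginals per candidate position) are all hypothetical.

```python
import random


def aggregate(component_probs, method, rng=random):
    """Combine one candidate's marginals from the ensemble components into a weight.

    component_probs: hypothetical layout, one probability per ensemble component.
    method: "avg", "max", or "samp" (labels assumed for this sketch).
    """
    if not component_probs:
        raise ValueError("need at least one ensemble component")
    if method == "avg":
        # Average: mean of the component marginals.
        return sum(component_probs) / len(component_probs)
    if method == "max":
        # Maximum: the largest component marginal.
        return max(component_probs)
    if method == "samp":
        # Sample: the marginal of one component chosen uniformly at random.
        return rng.choice(component_probs)
    raise ValueError(f"unknown aggregator: {method}")


def select_candidate(candidate_probs, method, rng=random):
    """Weight every candidate position, then pick one with probability
    proportional to its weight. Returns (chosen index, list of weights)."""
    weights = [aggregate(probs, method, rng) for probs in candidate_probs]
    if sum(weights) <= 0:
        raise ValueError("all candidate weights are zero")
    index = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return index, weights
```

With the slide's example marginals 0.23, 0.15, and 0.04, `aggregate([0.23, 0.15, 0.04], "avg")` gives approximately 0.14 and `"max"` gives 0.23, matching the Average and Maximum slides; `"samp"` returns one of the three values at random, and the Sample slide shows the case where 0.15 is drawn.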
