Probabilistic Ensembles for Improved Inference in Protein-Structure Determination
Ameet Soni* and Jude Shavlik
Dept. of Computer Sciences, Dept. of Biostatistics and Medical Informatics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011

Slide 2: Protein Structure Determination
  Proteins essential to most cellular function
    Structural support
    Catalysis/enzymatic activity
    Cell signaling
  Protein structures determine function
  X-ray crystallography is main technique for determining structures

Slide 3: Task Overview
  Given
    A protein sequence (SAVRVGLAIM...)
    Electron-density map (EDM) of protein
  Do
    Automatically produce a protein structure that
      Contains all atoms
      Is physically feasible

Slide 4: Challenges & Related Work
  Resolution is a property of the protein
  Higher resolution: better quality
  [Figure: resolution scale 1 Å, 2 Å, 3 Å, 4 Å; labels: ARP/wARP, TEXTAL & RESOLVE, Our Method: ACMI]

Slide 5: Outline
  Protein Structures
  Prior Work on ACMI
  Probabilistic Ensembles in ACMI (PEA)
  Experiments and Results

Slide 6: Outline (repeated)

Slide 7: Our Technique: ACMI
  Phase 1: Perform Local Match (output: prior probability of each AA's location)
  Phase 2: Apply Global Constraints (output: posterior probability of each AA's location)
  Phase 3: Sample Structure (output: all-atom protein structures)
  [Figure labels: bk-1, bk, b*1…M k+1]

Slide 8: Results
  [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]

Slide 9: ACMI Outline (same three-phase pipeline as slide 7)

Slide 10: Phase 2 – Probabilistic Model
  ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF)
  [Figure: MRF nodes ALA1, GLY2, LYS3, LEU4, SER5]

Slide 11: Probabilistic Model
  # nodes: ~1,000
  # edges: ~1,000,000

Slide 12: Approximate Inference
  Best structure intractable to calculate
    i.e., we cannot infer the underlying structure analytically
  Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution
    Local, message-passing scheme
    Distributes evidence between nodes

Slide 13: Loopy Belief Propagation
  [Figure: nodes LYS31 and LEU32 with marginals pLYS31 and pLEU32; message mLYS31→LEU32]

Slide 14: Loopy Belief Propagation
  [Figure: nodes LYS31 and LEU32 with marginals pLYS31 and pLEU32; message mLEU32→LYS31]

Slide 15: Shortcomings of Phase 2
  Inference is very difficult
    ~1,000,000 possible outputs for one amino acid
    ~250-1250 amino acids in one protein
    Evidence is noisy
    O(N^2) constraints
  Approximate solutions, room for improvement

Slide 16: Outline (repeated)

Slide 17: Ensemble Methods
  Ensembles: the use of multiple models to improve predictive performance
  Tend to outperform best single model [Dietterich '00]
  e.g., Netflix prize

Slide 18: Phase 2: Standard ACMI
  [Figure: MRF, Protocol, P(bk)]

Slide 19: Phase 2: Ensemble ACMI
  [Figure: MRF; Protocol 1, Protocol 2, …, Protocol C; P1(bk), P2(bk), …, PC(bk)]

Slide 20: Probabilistic Ensembles in ACMI (PEA)
  New ensemble framework (PEA)
  Run inference multiple times, under different conditions
  Output: multiple, diverse estimates of each amino acid's location
  Phase 2 now has several probability distributions for each amino acid, so what?

Slide 21: ACMI Outline (same three-phase pipeline as slide 7)

Slide 22: Backbone Step (Prior work)
  Place next backbone atom
  (1) Sample bk from empirical Cα-Cα-Cα pseudoangle distribution
  [Figure labels: bk-2, bk-1, candidates b'k marked "?"]

Slide 23: Backbone Step (Prior work)
  Place next backbone atom
  (2) Weight each sample by its Phase 2 computed marginal
  [Figure labels: bk-2, bk-1, candidates b'k with weights 0.25, 0.20, 0.15]

Slide 24: Backbone Step (Prior work)
  Place next backbone atom
  (3) Select bk with probability proportional to sample weight
  [Figure labels: bk-2, bk-1, candidates b'k with weights 0.25, 0.20, 0.15]

Slide 25: Backbone Step for PEA
  [Figure: bk-2, bk-1, candidate b'k with P1(b'k) = 0.23, P2(b'k) = 0.15, PC(b'k) = 0.04; w(b'k) = ?]

Slide 26: Backbone Step for PEA: Average
  [Figure: P1(b'k) = 0.23, P2(b'k) = 0.15, PC(b'k) = 0.04; w(b'k) = 0.14]

Slide 27: Backbone Step for PEA: Maximum
  [Figure: P1(b'k) = 0.23, P2(b'k) = 0.15, PC(b'k) = 0.04; w(b'k) = 0.23]

Slide 28: Backbone Step for PEA: Sample
  [Figure: P1(b'k) = 0.23, P2(b'k) = 0.15, PC(b'k) = 0.04; w(b'k) = 0.15]

Slide 29: Review: Previous work on ACMI
  [Figure: Phase 2: Protocol, P(bk); Phase 3: bk-2, bk-1, candidate weights 0.25, 0.20, 0.15]

Slide 30: Review: PEA Protocol
  [Figure: Phase 2: multiple Protocols; Phase 3: bk-2, bk-1, candidate weights 0.05, 0.14, 0.26]

Slide 31: Outline (repeated)

Slide 32: Experimental Methodology
  PEA (Probabilistic Ensembles in ACMI)
    4 ensemble components
    Aggregators: AVG, MAX, SAMP
  ACMI
    ORIG – standard ACMI (prior work)
    EXT – run inference 4 times as long
    BEST – test best of 4 PEA components

Slide 33: Phase 2 Results
  [Figure; *p-value < 0.01]

Slide 34: Protein Structure Results
  [Figure panels: Correctness, Completeness; *p-value < 0.05]

Slide 35: Protein Structure Results
  [Figure]

Slide 36: Impact of Ensemble Size
  [Figure]

Slide 37: Conclusions
  ACMI is the state-of-the-art method for determining protein structures in poor-resolution images
  Probabilistic Ensembles in ACMI (PEA) improves approximate inference, produces better protein structures
  Future Work
    General solution for inference
    Larger ensemble size

Slide 38: Acknowledgements
  Phillips Laboratory at UW-Madison
  UW Center for Eukaryotic Structural Genomics (CESG)
  NLM R01-LM008796
  NLM Training Grant T15-LM007359
  NIH Protein Structure Initiative Grant GM074901
  Thank you!
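
Illustrative sketch (not part of the original slides): the "Backbone Step for PEA" slides combine the C ensemble marginals of a candidate position b'k into a single weight w(b'k) by Average, Maximum, or Sample, and the prior-work backbone step then selects a candidate with probability proportional to its weight. The Python below is a minimal sketch of that idea, not the authors' implementation. The function names, the list-of-lists data layout, and the uniform choice of ensemble component in the Sample aggregator are all assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence

# An aggregator maps one candidate's C ensemble marginals to a single weight.
Aggregator = Callable[[Sequence[float], random.Random], float]


def agg_avg(probs: Sequence[float], rng: random.Random) -> float:
    """AVG: mean of the ensemble marginals for one candidate."""
    return sum(probs) / len(probs)


def agg_max(probs: Sequence[float], rng: random.Random) -> float:
    """MAX: largest ensemble marginal for one candidate."""
    return max(probs)


def agg_samp(probs: Sequence[float], rng: random.Random) -> float:
    """SAMP: marginal of one ensemble component picked at random.

    Assumption: the component is chosen uniformly; the slides do not say.
    """
    return rng.choice(list(probs))


def select_candidate(
    candidates: Sequence[str],
    ensemble_probs: Sequence[Sequence[float]],
    aggregator: Aggregator,
    rng: random.Random,
) -> str:
    """Pick one candidate with probability proportional to its aggregated weight.

    ensemble_probs[i] holds P1..PC for candidates[i] (hypothetical layout).
    """
    if len(candidates) != len(ensemble_probs):
        raise ValueError("need one list of ensemble marginals per candidate")
    weights: List[float] = [aggregator(p, rng) for p in ensemble_probs]
    if sum(weights) <= 0:
        raise ValueError("all candidate weights are zero")
    return rng.choices(list(candidates), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    marginals = [0.23, 0.15, 0.04]  # P1, P2, PC from the slide figure
    print("AVG :", round(agg_avg(marginals, rng), 2))
    print("MAX :", agg_max(marginals, rng))
    print("SAMP:", agg_samp(marginals, rng))
```

With the slide's values 0.23, 0.15, and 0.04, the average is 0.14 and the maximum is 0.23, matching the weights shown on the Average and Maximum slides. The Sample slide's 0.15 corresponds to one possible draw (the second component). Passing a seeded random.Random keeps runs reproducible, which is a convenience of this sketch rather than something the slides specify.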