Shape Modeling and Matching in Protein Structure Identification
Sasakthi Abeysinghe, Tao Ju
Washington University, St. Louis, USA
Matthew Baker, Wah Chiu
Baylor College of Medicine, Houston, USA

Shape Matching
• Shape comparison
  – How similar are shape A and shape B?
  – Application: 3D model retrieval
• Shape alignment
  – What is the best alignment of A onto B?
  – Application: object recognition and registration
[Figure: 1D protein sequence; 3D protein image]

Structural Biology
• Protein: a sequence of amino acids
  – Folds into a 3D structure in order to interact with other molecules
  – Protein function derived from its 3D structure …
• Identifying protein structure
  – Imaging methods: X-ray, NMR
  – Drawback: cannot resolve large assemblies, like viruses

Domain Problem
• Cryo-electron microscopy (Cryo-EM)
  – Produces 3D density volumes
  – Drawback: insufficient resolution to resolve atom locations
• How to determine protein structure in a cryo-EM volume?

Shape Matching Formulation
• Matching 1D protein sequence with 3D density volume
• Intermediate goal: Matching alpha-helices
  – One of the basic building blocks in a protein
  – Identified as cylindrical densities in the volume [Baker 07]
• How to align the protein sequence with the cryo-EM volume to match the two sets of helices?
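To make the comparison between a sequence helix and a cylindrical density concrete, the following is a minimal sketch of how the two could be put on a common scale. The slides do not state how lengths are compared; the conversion below is an assumption based on the standard alpha-helix geometry of about 1.5 Å of axial rise per residue, and the function names are hypothetical.

```python
# Assumption: ideal alpha-helix geometry, about 1.5 Angstrom of axial rise per residue.
RISE_PER_RESIDUE = 1.5


def expected_helix_length(num_residues):
    """Expected axial length (in Angstrom) of a helix with num_residues amino acids."""
    return num_residues * RISE_PER_RESIDUE


def helix_error(num_residues, cylinder_length):
    """Absolute mismatch (in Angstrom) between a sequence helix and a detected cylinder."""
    return abs(expected_helix_length(num_residues) - cylinder_length)


# A 20-residue helix compared with a 27 Angstrom cylinder found in the volume.
print(helix_error(20, 27.0))  # prints 3.0
```

A per-helix error of this kind is one plausible ingredient of a matching error; the actual error terms used by the authors may differ.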

Method Overview
• Compatible shape representation
  – 1D sequence and 3D volume as attributed relational graphs
• Graph-based shape matching
  – A new constrained graph matching problem and an optimal solution
  – Error-tolerant (inexact) matching

Shape Representation
• Protein sequence as attributed relational graph
  – An edge: a helix segment or a non-helix segment
    • Attribute: number of amino acids in the segment
  – A node: end of a helix or end of the sequence
  – Add additional edges that skip at most m helix segments
    • To allow matching with a cryo-EM volume that has missing helices

Shape Representation
• Graph representation of Cryo-EM volume via skeletons
  – 3D Skeleton [Ju 06] builds connectivity among detected helices
  – An edge: a detected helix or a skeleton path between two helices
    • Attribute: length of the helix or skeleton path
  – A node: end of a helix or end of the protein
  – Add additional edges between helix-ends less than d apart
    • To account for missing helix connectivity in the skeleton

Shape Matching - Problem
• Finding two matching chains of helices
  – Same number of edges
  – Alternating types between non-helix and helix
  – Minimal attribute matching error
• Uniqueness of this problem:
  – Inexact: not all edges/nodes in the two graphs are used in the matched sequence
  – Constrained: the match must have a linear topology

Shape Matching - Review
• Previous work on graph matching
  – Exact matching
    • Graph monomorphism [Wong 90]
    • Subgraph isomorphism [Ullmann 76, Cordella 99]
  – Inexact matching
    • A* search [Nilsson 80], simulated annealing [Herault 90], neural networks [Feng 94], probabilistic relaxation [Christmas 95], genetic algorithms [Wang 97], graph decomposition [Messmer 98]
• All designed for unconstrained problems where there is no restriction on the topology of the matched subgraphs.

Shape Matching - Method
• Key idea: utilize the linearity of chains.
• Performing depth-first tree-search
  – Append matching nodes to the incomplete chain with minimal matching error
• A*-search
  – Reduce node expansion by estimating future matching error
  – Optimal if the estimated future error never exceeds the actual future error
  – 3 future error functions are designed
[Figure: sequence graph, volume graph, and the search tree of candidate node pairs labeled with their matching errors]

Experimental Setup
• Test data
  – Simulated data: 8 proteins (taken from Protein Data Bank)
  – Authentic data: 3 proteins (produced at Baylor)
• Test modes
  – Automatic
  – With a few user-specified helix correspondences
• Validation with the actual helix correspondence
  – Produce a list of candidates sorted by their matching errors
  – Find out where the actual correspondence ranks in the list

Results - 1
• Bluetongue Virus (simulated, 10 helices, 0 missing)
  – Actual correspondence ranks #1
[Figure: sequence; cryo-EM volume and its skeleton; top matching]

Results - 2
• Human Insulin Receptor (simulated, 9 helices, 1 missing)
  – Actual correspondence ranks #1
[Figure: sequence; cryo-EM volume and its skeleton; top matching]

Results - 3
• Bacteriophage P22 (authentic, 11 helices, 6 missing)
  – Actual correspondence ranks #4
[Figure: sequence; cryo-EM skeleton; top matching; actual correspondence]

Results - 4
• Triose Phosphate Isomerase (simulated, 12 helices, 3 missing)
  – Before user-specification: actual correspondence not in the candidate list
  – Given 2 specified helix pairs: actual correspondence ranks #9
[Figure: sequence; cryo-EM skeleton with 2 user-specified helix pairs; top matching without user-specification; actual correspondence]

Result - Summary
• Among the 11 proteins, the correct correspondence ranks as follows in the candidate list computed by our method:
  – Top 1: 4 proteins
  – Within top 10: 2 proteins (1 simulated)
  – Top 1 after user-interaction: 2 proteins (both simulated)
    • 4 specified helix pairs in a 14/20-helix protein.
  – Within top 10 after user-interaction: 3 proteins
    • 2 specified helix pairs in a 6/9/12-helix protein
• Performance
  – Under 4 seconds for proteins with 20 helices
  – Compare: [Wu 05] uses exhaustive search and takes 16 hours for finding correspondences in proteins with 8 helices

Conclusion
• Formulate protein structure identification as shape matching
  – 1D protein sequence vs. 3D cryo-EM density volume
  – Compatible representation of disparate biological data as graphs
• Formulate a constrained inexact matching problem and propose an optimal solution
  – Based on A*-search
• Validation on simulated and authentic data

Future Work (Bio)
• Incorporating beta-sheets for improved accuracy
  – Challenge: the match is no longer a linear chain
• Integrating homology and ab initio modeling
  – Utilizing known 3D structure of segments
  – Refining the alignment by molecular energy minimization

Future Work (CS)
• Faster graph matching algorithm
  – Explore variants of A*-search to reduce running time for larger proteins (>20 helices)
• Better skeleton generation
  – Generate skeletons directly from gray-scale density volume for iso-value-independent representation
  – Utilize cell-complex-based skeleton for better skeleton geometry
    • Currently used for topology editing, see [Ju, Zhou and Hu. Siggraph 2007]

Pacific Graphics • Hawaii • 2007
• Oct 29 – Nov 2, in Maui, Hawaii
Conference Chair: Ron Goldman
Program co-chairs: Marc Alexa, Steven Gortler, Tao Ju
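
The constrained matching summarized in the Conclusion (growing a linear chain of helix correspondences with A*-search) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that all attributes are already in a common unit, that the error is a sum of absolute differences, that a skipped sequence helix is charged its full length, and that helix direction and the extra "nearby helix-end" edges are ignored. The function name `match_helices` and its parameters are hypothetical. The future-error estimate is one simple lower bound, not one of the three functions mentioned in the slides.

```python
import heapq
from itertools import count


def match_helices(seq_helices, seq_loops, vol_helices, vol_paths, max_skips=1):
    """Return (error, chain) with minimal total error, or None if no chain exists.

    seq_helices[i]    : length of the i-th helix in the sequence
    seq_loops[i]      : length of the loop between sequence helices i and i+1
    vol_helices[j]    : length of the j-th helix detected in the volume
    vol_paths[(a, b)] : skeleton path length between volume helices a and b
    chain[i] is the volume helix matched to sequence helix i, or None if skipped.
    """
    n = len(seq_helices)
    # Lower bound on the error of each sequence helix: its best length match
    # against any volume helix, or the skip penalty (its own length).
    best = [min([abs(s - v) for v in vol_helices] + [s]) for s in seq_helices]
    # future[i] bounds the error of helices i..n-1; loop errors are >= 0 and ignored,
    # so the estimate never exceeds the actual future error.
    future = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        future[i] = future[i + 1] + best[i]

    tie = count()  # tie-breaker so the heap never compares chains directly
    heap = [(future[0], next(tie), 0.0, ())]
    while heap:
        _, _, cost, chain = heapq.heappop(heap)
        i = len(chain)
        if i == n:
            return cost, list(chain)  # first complete chain popped is optimal
        # Option 1: match sequence helix i to a volume helix not yet in the chain.
        for j, v in enumerate(vol_helices):
            if j in chain:
                continue
            step = abs(seq_helices[i] - v)
            if i > 0 and chain[-1] is not None:
                prev = chain[-1]
                path = vol_paths.get((prev, j), vol_paths.get((j, prev)))
                if path is None:
                    continue  # the two volume helices are not connected
                step += abs(seq_loops[i - 1] - path)
            new_cost = cost + step
            heapq.heappush(
                heap, (new_cost + future[i + 1], next(tie), new_cost, chain + (j,))
            )
        # Option 2: treat sequence helix i as missing from the volume.
        if chain.count(None) < max_skips:
            new_cost = cost + seq_helices[i]
            heapq.heappush(
                heap, (new_cost + future[i + 1], next(tie), new_cost, chain + (None,))
            )
    return None


# Toy data (made up for illustration): three sequence helices, three volume helices.
print(match_helices([10, 20, 15], [5, 8], [21, 9, 14],
                    {(1, 0): 6, (0, 2): 7, (1, 2): 30}))  # prints (5.0, [1, 0, 2])
```

Because every partial chain is a distinct node of the search tree and the estimate is a lower bound, the first complete chain taken from the queue has the smallest total error among all chains allowed by these assumptions.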