Shape Modeling and Matching in
Protein Structure Identification
Sasakthi Abeysinghe, Tao Ju
Washington University, St. Louis, USA
Matthew Baker, Wah Chiu
Baylor College of Medicine, Houston, USA
Shape Matching
• Shape comparison
– How similar are shape A and shape B?
– Application: 3D model retrieval
• Shape alignment
– What is the best alignment of A onto B?
– Application: object recognition and registration
[Figure labels: 1D Protein Sequence; 3D Protein Image]
Structural Biology
• Protein: a sequence of amino acids
– Folds into a 3D structure in order to interact with other molecules
– Protein function derived from its 3D structure
…
• Identifying protein structure
– Imaging methods: X-ray, NMR
– Drawback: cannot resolve large assemblies, like viruses.
Domain Problem
• Cryo-electron microscopy (Cryo-EM)
– Produces 3D density volumes
– Drawback: insufficient resolution to resolve atom locations
• How to determine protein structure in a cryo-EM volume?
Shape Matching Formulation
• Matching 1D protein sequence with 3D density volume
• Intermediate goal: Matching alpha-helices
– One of the basic building blocks in a protein
– Identified as cylindrical densities in the volume [Baker 07]
• How to align the protein sequence with the cryo-EM
volume to match the two sets of helices?
Method Overview
• Compatible shape representation
– 1D sequence and 3D volume as attributed relational graphs
• Graph-based shape matching
– A new constrained graph matching problem and an optimal
solution
– Error-tolerant (inexact) matching
Shape Representation
• Protein sequence as attributed relational graph
– An edge: a helix segment or a non-helix segment
• Attribute: number of amino acids in the segment
– A node: end of a helix or end of the sequence
– Add additional edges that skip at most m helix segments
• To allow matching with a cryo-EM volume that has missing helices
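The sequence graph described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_sequence_graph`, the `("helix" | "loop", length)` segment encoding, and the choice to give a skip edge the total amino-acid count of everything it spans are all assumptions. It also assumes helix and non-helix segments strictly alternate.

```python
from typing import List, Tuple

Segment = Tuple[str, int]          # ("helix" or "loop", number of amino acids)
Edge = Tuple[int, int, str, int]   # (from node, to node, kind, attribute)


def build_sequence_graph(segments: List[Segment], m: int) -> List[Edge]:
    """Build edges of a sequence graph (hypothetical encoding).

    Node i is the boundary before segment i, node len(segments) is the
    end of the sequence. Skip edges are typed "loop" and jump over at
    most m helix segments.
    """
    n = len(segments)
    # One edge per segment, attributed with its amino-acid count.
    edges: List[Edge] = [(i, i + 1, kind, length)
                         for i, (kind, length) in enumerate(segments)]
    # Skip edges: start at a loop, swallow up to m helices, end at a loop.
    for i in range(n):
        if segments[i][0] != "loop":
            continue
        total = segments[i][1]
        skipped = 0
        for j in range(i + 1, n):
            kind, length = segments[j]
            total += length
            if kind == "helix":
                skipped += 1
                if skipped > m:
                    break
            else:
                edges.append((i, j + 1, "loop", total))
    return edges


example = [("loop", 5), ("helix", 10), ("loop", 3), ("helix", 8), ("loop", 4)]
graph = build_sequence_graph(example, m=1)
```

With `m=1` the example gains two skip edges, one over each helix; with `m=0` only the five segment edges remain.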
Shape Representation
• Graph representation of Cryo-EM volume via skeletons
– 3D Skeleton [Ju 06] builds connectivity among detected helices
– An edge: a detected helix or a skeleton path between two helices
• Attribute: length of the helix or skeleton path
– A node: end of a helix or end of the protein
– Add additional edges between helix-ends less than d apart
• To account for missing helix connectivity in the skeleton
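A rough sketch of the volume graph follows, under several assumptions: helices are given as pairs of 3D endpoints, skeleton paths arrive precomputed as `(end_a, end_b, length)` triples, and "less than d apart" is read as straight-line Euclidean distance. The function name `build_volume_graph` and the `(helix index, end index)` node encoding are hypothetical, and skeleton extraction itself is not shown.

```python
import math
from itertools import combinations


def build_volume_graph(helices, skeleton_paths, d):
    """Return edges (node_a, node_b, kind, length) of a volume graph.

    helices: list of (p0, p1) endpoint coordinates of detected helices.
    skeleton_paths: list of (node_a, node_b, path_length), where a node
        is (helix_index, end_index) with end_index 0 or 1.
    d: distance threshold for the additional helix-end edges.
    """
    edges = []
    # One "helix" edge per detected helix, attributed with its length.
    for h, (p0, p1) in enumerate(helices):
        edges.append(((h, 0), (h, 1), "helix", math.dist(p0, p1)))
    # "loop" edges taken from skeleton paths between helix ends.
    connected = set()
    for a, b, length in skeleton_paths:
        edges.append((a, b, "loop", length))
        connected.add(frozenset((a, b)))
    # Additional "loop" edges between nearby ends of different helices.
    ends = [((h, e), helices[h][e])
            for h in range(len(helices)) for e in (0, 1)]
    for (a, pa), (b, pb) in combinations(ends, 2):
        if a[0] == b[0] or frozenset((a, b)) in connected:
            continue
        gap = math.dist(pa, pb)
        if gap < d:
            edges.append((a, b, "loop", gap))
    return edges


helices = [((0, 0, 0), (0, 0, 10)), ((0, 3, 10), (0, 3, 0)),
           ((0, 50, 0), (0, 50, 10))]
paths = [((0, 1), (1, 0), 4.0)]
volume_edges = build_volume_graph(helices, paths, d=5.0)
```

In this toy input the third helix is far from the others, so it receives no proximity edge; only the two free ends that lie 3 units apart are joined.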
Shape Matching - Problem
• Finding two matching chains of helices
– Same number of edges
– Alternating types between non-helix and helix
– Minimal attribute matching error
• Uniqueness of this problem:
– Inexact: not all edges/nodes in the two graphs are used in the
matched sequence
– Constrained: the match must have a linear topology
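The three chain conditions above can be checked and scored with a small helper. The slides do not state the error function, so the cost below is only an assumed stand-in: helix edges compare the detected length with the amino-acid count times 1.5 Å (the rise per residue of an ideal alpha-helix), and non-helix edges are penalized only when the skeleton path is longer than the segment could stretch at 3.8 Å per residue. The names `chain_error`, `HELIX_RISE`, and `LOOP_SPAN` are hypothetical.

```python
HELIX_RISE = 1.5  # Angstrom per residue along an ideal alpha-helix
LOOP_SPAN = 3.8   # Angstrom per residue, assumed upper bound for a loop


def is_alternating(chain):
    """True if consecutive edges switch between "helix" and "loop"."""
    return all(a[0] != b[0] for a, b in zip(chain, chain[1:]))


def chain_error(seq_chain, vol_chain):
    """Sum of per-edge attribute errors between two matched chains.

    seq_chain: list of (kind, number of amino acids)
    vol_chain: list of (kind, length in Angstrom)
    """
    if len(seq_chain) != len(vol_chain):
        raise ValueError("chains must have the same number of edges")
    if not (is_alternating(seq_chain) and is_alternating(vol_chain)):
        raise ValueError("edge types must alternate")
    total = 0.0
    for (seq_kind, residues), (vol_kind, length) in zip(seq_chain, vol_chain):
        if seq_kind != vol_kind:
            raise ValueError("edge types must agree position by position")
        if seq_kind == "helix":
            total += abs(residues * HELIX_RISE - length)
        else:
            # Assumed rule: a loop may be slack, but cannot be overstretched.
            total += max(0.0, length - residues * LOOP_SPAN)
    return total


error = chain_error(
    [("helix", 10), ("loop", 3), ("helix", 8)],
    [("helix", 16.5), ("loop", 9.0), ("helix", 12.0)],
)
```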
Shape Matching - Review
• Previous work on graph matching
– Exact matching
• Graph mono-morphism [Wong 90]
• Sub-graph isomorphism [Ullmann 76, Cordella 99]
– Inexact matching
• A* search [Nilsson 80], simulated annealing [Herault 90], neural
networks [Feng 94], probabilistic relaxation [Christmas 95],
genetic algorithms [Wang 97], graph decomposition [Messmer 98]
• All designed for un-constrained problems where there is no
restriction on the topology of the matched sub-graphs.
Shape Matching - Method
• Key idea: utilize the linearity of chains.
• Depth-first tree search
– Append matching nodes to the incomplete chain with minimal matching error
• A*-search
– Reduce node expansion by estimating future matching error
– Optimal if the future error estimate never exceeds the actual error
– 3 future error functions are designed
[Figure: Sequence Graph and Volume Graph, with a search tree of matched node pairs from {1,1} to {6,6}, each labeled with its matching error]
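To make the A* idea concrete, here is a deliberately simplified sketch. It matches an ordered list of sequence-helix lengths to an unordered set of detected helix lengths, lets a sequence helix be skipped at a fixed penalty, and ignores loop edges and graph connectivity entirely, so it is not the constrained graph matching of the talk. The heuristic is likewise an assumption, not one of the three future error functions mentioned above: for each unmatched sequence helix it takes the cheapest option still available, which can never overestimate the true remaining error.

```python
import heapq
from itertools import count


def astar_match(seq, vol, skip_penalty):
    """Return (total error, match) where match[i] is a vol index or None."""
    n = len(seq)

    def future_error(i, used):
        # Lower bound: each remaining helix independently takes its best option.
        free = [b for v, b in enumerate(vol) if v not in used]
        return sum(min([skip_penalty] + [abs(a - b) for b in free])
                   for a in seq[i:])

    tie = count()  # tie-breaker so the heap never compares sets
    start = frozenset()
    heap = [(future_error(0, start), next(tie), 0.0, 0, start, ())]
    while heap:
        _, _, g, i, used, match = heapq.heappop(heap)
        if i == n:
            return g, list(match)  # first complete chain popped is optimal
        options = [(None, skip_penalty)]
        options += [(v, abs(seq[i] - b))
                    for v, b in enumerate(vol) if v not in used]
        for v, cost in options:
            new_used = used if v is None else used | {v}
            g2 = g + cost
            f2 = g2 + future_error(i + 1, new_used)
            heapq.heappush(heap,
                           (f2, next(tie), g2, i + 1, new_used, match + (v,)))
    return None


best = astar_match([15.0, 12.0, 30.0], [29.0, 15.5, 12.5], skip_penalty=10.0)
```

Because partial chains are expanded in order of accumulated error plus estimated future error, poor branches are never expanded once a cheaper complete chain has been found.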
Experimental Setup
• Test data
– Simulated data: 8 proteins (taken from Protein Data Bank)
– Authentic data: 3 proteins (produced at Baylor)
• Test modes
– Automatic
– With a few user-specified helix correspondences
• Validation with the actual helix correspondence
– Produce a list of candidates sorted by their matching errors
– Find out where the actual correspondence ranks in the list
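The validation step amounts to sorting candidates by error and looking up the known answer. A minimal sketch, assuming each candidate is a `(correspondence, error)` pair and a correspondence is a tuple of helix indices; the function name and the toy numbers are made up for illustration.

```python
def rank_of_actual(candidates, actual):
    """1-based rank of the actual correspondence, or None if absent."""
    ordered = sorted(candidates, key=lambda c: c[1])  # ascending error
    for rank, (correspondence, _) in enumerate(ordered, start=1):
        if correspondence == actual:
            return rank
    return None


# Toy candidate list, not data from the talk.
toy = [((0, 1, 2), 7.5), ((1, 0, 2), 3.0), ((2, 1, 0), 5.0)]
rank = rank_of_actual(toy, (2, 1, 0))
```

A return value of `None` corresponds to the case where the actual correspondence is not in the candidate list at all.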
Results - 1
• Bluetongue Virus (simulated, 10 helices, 0 missing)
– Actual correspondence ranks #1
[Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]
Results - 2
• Human Insulin Receptor (simulated, 9 helices, 1 missing)
– Actual correspondence ranks #1
[Figure panels: Sequence; Cryo-EM volume and its skeleton; Top Matching]
Results - 3
• Bacteriophage P22 (authentic, 11 helices, 6 missing)
– Actual correspondence ranks #4
[Figure panels: Sequence; Cryo-EM skeleton; Top Matching; Actual Correspondence]
Results - 4
• Triose Phosphate Isomerase (simulated, 12 helices, 3 missing)
– Before user-specification: actual correspondence not in the candidate list
– Given 2 specified helix pairs: actual correspondence ranks #9
[Figure panels: Sequence; Cryo-EM skeleton with 2 user-specified helix pairs; Top Matching without user-specification; Actual Correspondence]
Result - Summary
• Among the 11 proteins, the correct correspondence ranks as follows in
the candidate list computed by our method:
– Top 1: 4 proteins
– Within top 10: 2 proteins (1 simulated)
– Top 1 after user-interaction: 2 proteins (both simulated)
• 4 specified helix pairs in a 14/20-helix protein.
– Within top 10 after user-interaction: 3 proteins
• 2 specified helix pairs in a 6/9/12-helix protein
• Performance
– Under 4 seconds for proteins with 20 helices
– Compare: [Wu 05] uses exhaustive search and takes 16 hours for finding
correspondences in proteins with 8 helices
Conclusion
• Formulate protein structure identification as shape
matching
– 1D protein sequence vs. 3D cryo-EM density volume
– Compatible representation of disparate biological data as graphs
• Formulate a constrained inexact matching problem and
propose an optimal solution
– Based on A*-search
• Validation on simulated and authentic data
Future Work (Bio)
• Incorporating beta-sheets for improved accuracy
– Challenge: the match is no longer a linear chain
• Integrating homology and ab initio modeling
– Utilizing known 3D structure of segments
– Refining the alignment by molecular energy minimization
Future Work (CS)
• Faster graph matching algorithm
– Explore variants of A*-search to reduce running time for larger
proteins (>20 helices)
• Better skeleton generation
– Generate skeletons directly from gray-scale density volume for
iso-value-independent representation
– Utilize cell-complex-based skeleton for better skeleton geometry
• Currently used for topology editing, see [Ju, Zhou and Hu. Siggraph 2007]
Pacific Graphics • Hawaii • 2007
• Oct 29 – Nov 2, in Maui, Hawaii
Conference Chair: Ron Goldman
Program co-chairs: Marc Alexa, Steven Gortler, Tao Ju