Download dimaio.icml03

Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps Frank DiMaio Jude Shavlik George N. Phillips, Jr. [email protected] [email protected] [email protected] ICML Bioinformatics Workshop 21 August 2003 Task Overview  Given • Electron density for a region in a protein • Protein’s topology Find • Atomic positions of individual atoms in the density map Pictorial Structures A pictorial structure is… a collection of image parts together with… a deformable conformation of these parts Pictorial Structures Formally, a model consists of  Set of parts V={v1, …, vn}  Configuration L=(l1, …, ln)  Edges eij  E, connect neighboring parts vi, vj v1 e13  Appearance parameters Ai for each part  Connection parameters Cij for each edge e23 v3 – Explicit dependency between li, lj – G=(V,E) forms a Markov Random Field v2 e35 e34 v4 v5 e46 v6 Matching Algorithm Overview  Want configuration L of model Θ maximizing  P(L|I,Θ) P(I|L,Θ) · P(L|Θ) P(I|L,Θ) = Πi P(I|li,Θ) = 1 Z1 - Σi matchi(li) e P(L|Θ) = Π (v ,v )E P(li,lj|Cij) = i  j 1 Z2 - Σ(vi,vj)E dij(li,lj) e Equivalent to minimizing Σi matchi(li) + Σ(vi,vj)E dij(li,lj) Linear-Time Matching Algorithm  A Dynamic Programming implementation runs in quadratic time   Requires tree configuration of parts Felzenszwalb & Huttenlocher (2000) developed linear-time matching algorithm   Additional constraint on part-to-part cost function dij Basic “Trick”: Parallelize minimization computation over entire grid using a Generalized Distance Transform Pictorial Structures for Map Interpretation Basic Idea: Build pictorial structure that is able to model all configurations of a molecule   Each part in “collection of parts” corresponds to an atom Model has low-cost conformation for low-energy states of the molecule The Screw-Joint Model  Ideally, we would have cost function = atomic energy  Problem: Impossible to represent atomic energy function using pairwise potentials while maintaining tree-structure  Solution: screw-joint model  Ignore non-bonded interactions  Edges correspond to covalent bonds  Allow free rotation around bonds Screw-Joint Model Details  Each part’s configuration has six params (x,y,z,α,β,γ) with  (x,y,z) is part’s position αi  α is part’s rotation (about bond connecting vi and vj)  (β,γ) is part’s orientation (βi,γi) (βj,γj) vi vj vj   (xj,yj,zj) αj vi (xi,yi,zi) (xij,yij,zij) Part-to-part cost function dij based on child’s deviation from ideal Matching cost function matchi based on 3x3x3 template match Pictorial Structures for Map Interpretation  Ideally, we would …   Build pictorial structure for the entire protein Run the matching algorithm to get best layout  However, computationally infeasible  Instead, we use two-phase algorithm that … a) b) computes best backbone trace computes best sidechain conformation (current focus) Sidechain Refinement  Assume we have a rough Cα trace of the protein  Next use pictorial structure matching to place sidechains  Walk along chain one residue at a time, placing individual atoms Cα, ALA_82 Cα, MET_80 Cα, ARG_81 Cα, PRO_83 Sidechain Refinement Given:    residue type approximate Cα locations  Find: most likely location for sidechain atoms in the residue  Example Alanine N C-1 O N Cα Cα-1 O-1 C Cβ N O O N+1 Cα+1 Matching algorithm Learning Model Parameters O N N N O Cα Cβ Averaged 3D Template N N C-1 O Alanine Cα C Cα Cβ C N+1 Cα r = 1.51 θ = 118.4° φ = -19.7° Canonic Orientation r = 1.53 θ = 0.0° φ = -19.3° Cβ C Averaged Bond Geometry Soft Maximums  Sometimes we may get an optimal match like the one to the right  When this occurs, explore the space of non-optimal solutions via soft maximums in DP  Basic Idea: Take a path with probability inversely proportional to its cost ACTUAL PREDICTED 1 Soft Maximums  Figure to the right shows soft maximums  Red molecule eventually found  Annealing increases “softness” until legal structure found PREDICTED 2 ACTUAL  Legal structure may not be “right” PREDICTED 1 Results  Only sidechain refinement implemented & tested  Experimental Methodology   Assume Cα’s known to within 2Å  Trained on 1.7 Å resolution protein, tested on 1.9 Å resolution protein  Templates built for ALA, VAL, TYR, LYS Model Parameters  Grid spacing of 0.5 Å within diameter 10 Å sphere  Rotational discretization:  12 rotational steps  84 orientations Sidechain Placement  Compared predicted vs. actual location for 599 atoms on testset protein 1   29.9% atoms within 0.5Å % atoms 0.8 0.6 0.4 72.3% atoms within 1.0Å 0.2  93.0% atoms within 2.0Å 0 0 2 4 6 Accuracy (angstroms)  Recall 0.5Å grid spacing 8 Predictive Accuracy Task  We used DP matching score as a predictor of amino acid type  Tested 49 ALA, LYS, TYR, VAL residues  Highest scoring normalized template determined type  61.2% accuracy (majority classification = 33%) The Good…  PREDICTED vs. ACTUAL LYSINE LYSINE TYROSINE VALINE … and the Bad  PREDICTED vs. ACTUAL LYSINE VALINE TYROSINE ALANINE Future Work  Implement & integrate backbone tracing algorithm, to create complete two-tiered solution  Better strategies to handle illegal molecule configurations  perturbation of branches involved in collisions  more accurate representation of atomic energy function, e.g. torsion angle  Better match function … make use of previous work?  More tests (larger training set, higher resolution) Acknowledgements  NLM grant 1T15 LM007359-01  NLM grant 1R01 LM07050-01  NIH grant P50 GM64598.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download dimaio.icml03