Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Biochemistry wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Photosynthetic reaction centre wikipedia , lookup
Protein–protein interaction wikipedia , lookup
Proteolysis wikipedia , lookup
Multi-state modeling of biomolecules wikipedia , lookup
Structural alignment wikipedia , lookup
Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps Frank DiMaio Jude Shavlik George N. Phillips, Jr. [email protected] [email protected] [email protected] ICML Bioinformatics Workshop 21 August 2003 Task Overview Given • Electron density for a region in a protein • Protein’s topology Find • Atomic positions of individual atoms in the density map Pictorial Structures A pictorial structure is… a collection of image parts together with… a deformable conformation of these parts Pictorial Structures Formally, a model consists of Set of parts V={v1, …, vn} Configuration L=(l1, …, ln) Edges eij E, connect neighboring parts vi, vj v1 e13 Appearance parameters Ai for each part Connection parameters Cij for each edge e23 v3 – Explicit dependency between li, lj – G=(V,E) forms a Markov Random Field v2 e35 e34 v4 v5 e46 v6 Matching Algorithm Overview Want configuration L of model Θ maximizing P(L|I,Θ) P(I|L,Θ) · P(L|Θ) P(I|L,Θ) = Πi P(I|li,Θ) = 1 Z1 - Σi matchi(li) e P(L|Θ) = Π (v ,v )E P(li,lj|Cij) = i j 1 Z2 - Σ(vi,vj)E dij(li,lj) e Equivalent to minimizing Σi matchi(li) + Σ(vi,vj)E dij(li,lj) Linear-Time Matching Algorithm A Dynamic Programming implementation runs in quadratic time Requires tree configuration of parts Felzenszwalb & Huttenlocher (2000) developed linear-time matching algorithm Additional constraint on part-to-part cost function dij Basic “Trick”: Parallelize minimization computation over entire grid using a Generalized Distance Transform Pictorial Structures for Map Interpretation Basic Idea: Build pictorial structure that is able to model all configurations of a molecule Each part in “collection of parts” corresponds to an atom Model has low-cost conformation for low-energy states of the molecule The Screw-Joint Model Ideally, we would have cost function = atomic energy Problem: Impossible to represent atomic energy function using pairwise potentials while maintaining tree-structure Solution: screw-joint model Ignore non-bonded interactions Edges correspond to covalent bonds Allow free rotation around bonds Screw-Joint Model Details Each part’s configuration has six params (x,y,z,α,β,γ) with (x,y,z) is part’s position αi α is part’s rotation (about bond connecting vi and vj) (β,γ) is part’s orientation (βi,γi) (βj,γj) vi vj vj (xj,yj,zj) αj vi (xi,yi,zi) (xij,yij,zij) Part-to-part cost function dij based on child’s deviation from ideal Matching cost function matchi based on 3x3x3 template match Pictorial Structures for Map Interpretation Ideally, we would … Build pictorial structure for the entire protein Run the matching algorithm to get best layout However, computationally infeasible Instead, we use two-phase algorithm that … a) b) computes best backbone trace computes best sidechain conformation (current focus) Sidechain Refinement Assume we have a rough Cα trace of the protein Next use pictorial structure matching to place sidechains Walk along chain one residue at a time, placing individual atoms Cα, ALA_82 Cα, MET_80 Cα, ARG_81 Cα, PRO_83 Sidechain Refinement Given: residue type approximate Cα locations Find: most likely location for sidechain atoms in the residue Example Alanine N C-1 O N Cα Cα-1 O-1 C Cβ N O O N+1 Cα+1 Matching algorithm Learning Model Parameters O N N N O Cα Cβ Averaged 3D Template N N C-1 O Alanine Cα C Cα Cβ C N+1 Cα r = 1.51 θ = 118.4° φ = -19.7° Canonic Orientation r = 1.53 θ = 0.0° φ = -19.3° Cβ C Averaged Bond Geometry Soft Maximums Sometimes we may get an optimal match like the one to the right When this occurs, explore the space of non-optimal solutions via soft maximums in DP Basic Idea: Take a path with probability inversely proportional to its cost ACTUAL PREDICTED 1 Soft Maximums Figure to the right shows soft maximums Red molecule eventually found Annealing increases “softness” until legal structure found PREDICTED 2 ACTUAL Legal structure may not be “right” PREDICTED 1 Results Only sidechain refinement implemented & tested Experimental Methodology Assume Cα’s known to within 2Å Trained on 1.7 Å resolution protein, tested on 1.9 Å resolution protein Templates built for ALA, VAL, TYR, LYS Model Parameters Grid spacing of 0.5 Å within diameter 10 Å sphere Rotational discretization: 12 rotational steps 84 orientations Sidechain Placement Compared predicted vs. actual location for 599 atoms on testset protein 1 29.9% atoms within 0.5Å % atoms 0.8 0.6 0.4 72.3% atoms within 1.0Å 0.2 93.0% atoms within 2.0Å 0 0 2 4 6 Accuracy (angstroms) Recall 0.5Å grid spacing 8 Predictive Accuracy Task We used DP matching score as a predictor of amino acid type Tested 49 ALA, LYS, TYR, VAL residues Highest scoring normalized template determined type 61.2% accuracy (majority classification = 33%) The Good… PREDICTED vs. ACTUAL LYSINE LYSINE TYROSINE VALINE … and the Bad PREDICTED vs. ACTUAL LYSINE VALINE TYROSINE ALANINE Future Work Implement & integrate backbone tracing algorithm, to create complete two-tiered solution Better strategies to handle illegal molecule configurations perturbation of branches involved in collisions more accurate representation of atomic energy function, e.g. torsion angle Better match function … make use of previous work? More tests (larger training set, higher resolution) Acknowledgements NLM grant 1T15 LM007359-01 NLM grant 1R01 LM07050-01 NIH grant P50 GM64598.