Download Conformational Space

Conformational Space Conformational Space  Conformation of a molecule: specification of the relative positions of all atoms in 3D-space,  Typical parameterizations:  List of coordinates of atom centers  List of torsional angles (e.g., the f-y-c for a protein)  Conformational space: Space of all conformations Conformational Space qj qi qN-1 q2 qN q1 Conformational Space q0 q1 qn q4 q3 Relation to Robotics/Graphics q0 q1 q2 qn t(t) q4 Configuration space q3 Need for a Metric  Simulation and sampling techniques can produce millions of conformations  Which conformations are similar?  Which ones are close to the folded one?  Do some conformations form small clusters (e.g. key intermediates while folding)? Metric in Conformational Space  A metric over conformational space C is a function: d: c,c’  C  d(c,c’)  +{0} such that:  d(c,c’) = 0  c = c’  d(c,c’) = d(c’,c)  d(c,c’) + d(c’,c”)  d(c,c”) (non-degeneracy) (symmetry) (triangle inequality) But not all metrics are “good”  Euclidean metric: d(c,c’) = Si=1,...,n(|fi-fi’|2+ |yi-yi’|2) Metric in Conformational Space  A “good” metric should measure how well the atoms in two conformations can be aligned  Usual metrics: cRMSD, dRMSD RMSD  Given two sets of n points in 3 A = {a1,…,an} and B = {b1,…,bn}  The RMSD between A and B is: RMSD(A,B) = [(1/n)Si=1,…,n||ai-bi||2]1/2 where ||ai-bi|| denotes the Euclidean distance between ai and bi in 3  RMSD(A,B) = 0 iff ai = bi for all i cRMSD  Molecule M with n atoms a1,…,an  Two conformations c and c’ of M  ai(c) is position of ai when M is at c  cRMSD(c,c’) is the minimized RMSD between the two sets of atom centers: minT[(1/n)Si=1,…,n||ai(c) – T(ai(c’))||2]1/2 where the minimization is over all possible rigid-body transform T cRMSD  cRMSD verifies triangle inequality  cRMSD takes linear time to compute  Often, cRMSD is restricted to a subset of atoms, e.g., the Ca atoms on a protein’s backbone Representation Restricted to Ca Atoms Protein 1tph - The positions of AA residue centers (Cα atoms) mainly determine the structure of a protein. - In structural comparison, people usually work only on the backbone of Cα atoms, and neglect the other atoms. Possible project: Design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric. dRMSD  Molecule M with n atoms a1,…,an  Two conformations c and c’ of M  {dij(c)}: nn symmetrical intra-molecular distance matrix in M at c  dRMD(c, c’) is : [(1/n(n-1))Si=1,…,n-1Sj=i+1,…,n(dij(c) – dij(c’))2]1/2  {dij} is usually restricted to a subset of atoms, e.g., the Ca atoms on a protein’s backbone Intra-Molecular Distance Matrix Distances between Ca pairs of a protein with 142 residues. Darker squares represent shorter distances. Intra-Molecular Distance Matrix 45 40 85 1 Distances between Ca pairs of a protein with 142 residues. Darker squares represent shorter distances. Intra-Molecular Distance Matrix dRMSD  Molecule M with n atoms a1,…,an  Two conformations c and c’ of M  {dij(c)}: nn symmetrical intra-molecular distance matrix in M at c  dRMSD(c, c’) = [(2/n(n-1))Si=1,…,n-1Sj=i+1,…,n(dij(c) – dij(c’))2]1/2  {dij} is usually restricted to a subset of atoms, e.g., the Ca atoms on a protein’s backbone dRMSD  Molecule M with n atoms a1,…,an  Two conformations c and c’ of M  {dij(c)}: nn symmetrical intra-molecular distance matrix in M at c  dRMSD(c, c’) = [(2/n(n-1))Si=1,…,n-1Sj=i+1,…,n(dij(c) – dij(c’))2]1/2  {dij} is usually restricted to a subset of atoms, e.g., the Ca atoms on a protein’s backbone  Advantage: No aligning transform  Drawback: Takes quadratic time to compute Is dRMSD a metric?  dRMSD(c, c’) = [(2/n(n-1))Si=1,…,n-1Sj=i+1,…,n(dij(c) – dij(c’))2]1/2 is a metric in the n(n-1)/2-dimensional space, where a conformation c is represented by {dij(c)}  But, in this representation, the same point represents both a conformation and its mirror image k-Nearest-Neighbors Problem Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, other metric) Can be done in time O(N(log k + L)) where: - N = size of S - L = time to compare two conformations k-Nearest-Neighbors Problem The total time needed to compute the k nearest neighbors of every conformation in S is O(N2(log k + L)) Much too long for large datasets where N ranges from 10,000’s to millions!!! Can be improved by: 1. Reducing L 2. More efficient algorithm (e.g., kd-tree) kd-Tree In a d-dimensional space, where d>2, range searching for a point takes O(dn1-1/d) k-Nearest-Neighbors Problem Idea: simplify protein’s description Assume that each conformation is described by the coordinates of the n Ca atoms cRMSD  O(n) time dRMSD  O(n2) time This representation is highly redundant  Proximity along the chain entails spatial proximity d  3l  Atoms can’t bunch up, hence far away atoms along the chain are on average spatially distant ci cj  m-Averaged Approximation  Cut the backbone into fragments of m Ca atoms  Replace each fragment by the centroid of the m Ca atoms   Simplified cRMSD and dRMSD 3n coordinates 3n/m coordinates Evaluation: Test Sets [Lotan and Schwarzer, 2003]  8 diverse proteins (54 -76 residues)  Decoy sets of N =10,000 conformations from the Park-Levitt set [Park et al, 1997] Correlation: m cRMSD dRMSD 3 0.99 0.96-0.98 4 0.98-0.99 0.94-0.97 6 0.92-0.99 0.78-0.93 9 0.81-0.98 0.65-0.96 12 0.54-0.92 0.52-0.69 Higher correlation for random sets ( greater savings) Running Times Further Reduction for dRMSD 1) Stack m-averaged distance matrices as vectors of a matrix A N r 1  n  n   r        1 2 m  m  A Vector ai of elements of distance matrix of ith conformation (i = 1 to N) 1 dRMSDm (c,c ai -aj i j )= r 2 Further Reduction for dRMSD 1) Stack m-averaged distance matrices as vectors of a matrix A 2) Compute the SVD A = UDVT SVD Decomposition N r A (rxN) = Vector aj of elements of distance matrix of jth conformation (j = 1 to N) U (rxr) D (rxr) Diagonal matrix Orthonormal (rotation) matrix VT (rxN) SVD Decomposition N r A (rxN) = Vector aj of elements of distance matrix of jth conformation (j = 1 to N) U (rxr) s1 s2 0 0 sr VT (rxN) Diagonal matrix s1  s2  ...  sr  0 (singular values) Orthonormal (rotation) matrix SVD Decomposition N r A (rxN) = Vector aj of elements of distance matrix of jth conformation (j = 1 to N) U (rxr) VT D (rxr) (rxN) vjT vkT Diagonal matrix Orthonormal (rotation) matrix Matrix with orthonormal rows vi and vj are orthogonal unit Nx1 vectors SVD Decomposition N A (rxN) r = U (rxr) D (rxr) y X VT (rxN) Representation of A in space (X,Y) 1 dRMSDm (c,c )= ai -aj i j r Y does not depend on the coordinate system! r-dimensional space r  1  n    n   1  2  m    m   x 2 SVD Decomposition N r s1 s2 A (rxN) = U (rxr) v1 T v2 T s3 sr D (rxr) VT (rxN) ||s1v1||  ||s2v2|| ... SVD Decomposition N r s1 s2 A (rxN) = U (rxr) v1 T v2 T s3 sr vpT D (rxr) VT (rxN) p principal components SVD Decomposition N r s1 A (rxN) = U (rxr) v1 T v2 T s2 sp vpT 0 D (rxr) VT (rxN) p principal components Further Reduction for dRMSD 1) Stack m-averaged distance matrices as vectors of a matrix A 2) Compute the SVD A = UDVT 3) Project onto p principal components Correlation between dRMSD and dRMSD4PC dRMSD is PC 4 reduced to summing up 12 to 20 terms (instead of ~ 80 to 200, since the proteins have 54 to 76 amino acids) Complexity of SVD  SVD of rxN matrix, where N > r, takes O(r2N) time  Here r ~ (n/m)2  So, time complexity is O(n4N)  Would be too costly without m-averaging Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003]          N = 100,000, k = 100, 4-averaging, 16 PCs 70% correct, with furthest NN off by 20% Brute-force: 84 h Brute-force + m-averaging: 4.8 h Brute-force + m-averaging + PC: 41 min kD-tree + m-averaging + PC: 19 min Speedup greater than x200 6k approximate NNs contain all true k NNs  Use m-averaging and PC reduction as fast filters

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Conformational Space