A Four-Body Statistical Potential For Protein Fold Recognition
Bala Krishnamoorthy and Alex Tropsha
UNC Chapel Hill
Nov 17, 2003

Four-Body Potentials

Outline
- Motivation
- Hypothesis
- Four-body statistical potentials
- Application to folding simulations
- Application to predictions from CASP5 and Livebench 6

Motivation
- Knowledge of protein structure is essential to understanding protein function
- The number of proteins with known sequences is growing exponentially
- Traditional methods for determining protein structure (X-ray crystallography, NMR, etc.) do not yield quick results
- Statistical methods that help with protein fold recognition are needed

Hypothesis
- Specific nearest-neighbor residue contacts in protein structures have non-random propensities for occurrence
- The propensities of occurrence of nearest-neighbor clusters can be used to score compatibility between protein sequence and structure

SNAPP: Simplicial Neighborhood Analysis of Protein Packing
- 2-D packing: 3 neighbors in mutual contact
- 3-D packing: 4-neighbor clusters

- An objective definition of the nearest neighborhood of each residue is needed
- Use the Voronoi diagram of the protein: it gives a convex polyhedron (Voronoi cell) around each residue (represented as a point), which defines the nearest neighborhood of that residue
- Delaunay triangulation: defined as the dual of the Voronoi diagram

Tessellation of protein structure (in 3D)
- Residues are represented by their side-chain centers (or by their C-α atoms)
- The protein structure is represented as an aggregate of space-filling, non-intersecting, irregular tetrahedra
- Nearest-neighbor residues are identified as unique sets of four residues each (tetrahedral quadruplets)

Four-body Statistical Potentials
- Denote each quadruplet by {i, j, k, l}
- i, j, k, and l can each be any of the 20 amino acids
- The total number of possible quadruplets is 8855
- Examples: AALV, VALI, TLKM, YYYY, …
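
The count of 8855 follows if a quadruplet is treated as an unordered set of four residues in which repeats are allowed (so VALI and AILV are the same quadruplet). A minimal sketch that checks this by enumeration; the names `AMINO_ACIDS` and `canonical` are illustrative and not from the slides:

```python
from itertools import combinations_with_replacement
from math import comb

# One-letter codes of the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Unordered quadruplets with repetition allowed (multisets of size 4).
quadruplets = ["".join(q) for q in combinations_with_replacement(AMINO_ACIDS, 4)]


def canonical(quad: str) -> str:
    """Return the order-independent (alphabetically sorted) form of a quadruplet."""
    return "".join(sorted(quad))


print(len(quadruplets))      # number of enumerated quadruplets
print(comb(20 + 4 - 1, 4))   # closed form: C(23, 4)
print(canonical("VALI"))     # sorted form used as a lookup key
```

Both printed counts equal 8855, matching the figure on the slide. Sorting the four letters gives a convenient dictionary key when tabulating quadruplet frequencies.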
- Based on the backbone connectivity of {i, j, k, l}, there can be five types of tetrahedra (indexed 0, 1, 2, 3, and 4)
- The propensities of the {i, j, k, l} quadruplets of each type t can be used to develop four-body statistical potentials

Four-body compositional propensities of Delaunay simplices

$$ q_{ijkl_t} = \log \frac{f_{ijkl_t}}{p_{ijkl_t}} $$

$$ p_{ijkl_t} = C \, a_i \, a_j \, a_k \, a_l \, p_t $$

- f_{ijkl_t}: observed frequency of occurrence in the training set of quadruplet {ijkl} in a type t tetrahedron
- p_{ijkl_t}: expected frequency of occurrence in the training set of residues i, j, k, and l in a type t tetrahedron
- a_i: individual amino acid frequency
- p_t: frequency of type t tetrahedra
- C: combinatorial factor

- The training set is a diverse set of 1166 protein chains with known structure
- For a test conformation, the total log-likelihood score is calculated by adding the score for each tetrahedron in its Delaunay tessellation
- Higher score ↔ better structure

MD Simulation of proteins
- Comparison of pre- and post-TS (transition state) structures of CI2 vs. native CI2*
- [Figure panels: Pre-TS (six structures); Post-TS (20 structures); Native]
- Go potentials (native-structure-specific) fail to discriminate between the three!
*Structures courtesy of Dr. E. Shakhnovich, Harvard (Ref: J. Mol. Biol. 296 (2000) p1183-1188)

Comparison of total scores for pre- and post-TS structures of CI2 vs. native CI2
- [Figure: total score (y-axis) for each instance (x-axis); red: pre-TS (6), yellow: post-TS (20), green: native]
- N.B.: The 5th pre-TS instance actually had a 0.10 probability of folding (the other five pre-TS structures had ~0 probability of folding)

Structure profiles of pre-TS vs. post-TS structure of CI2
- [Figure: log-likelihood score (y-axis) vs. residue # (x-axis) for the pre-TS, post-TS, and native structures; labeled residues: L8, V13, A16, I20, I29, V31, V47, L49, V51, I57]
- [Figure panels: Profile; ProCAM of Post-TS structure]

SNAPP analysis of pre-TS vs. post-TS structure of CI2
- [Figure panels: Pre-TS; Post-TS]

Structure profiles of pre-TS vs. post-TS structure of SH3
- [Figure: log-likelihood score (y-axis) vs. residue # (x-axis) for the pre-TS, post-TS, and native structures; labeled residues: Y8, L16, F18, W35, A37, G46, I48, Y52]

Scoring Livebench 6 and CASP5 predictions
Livebench
- Automated evaluation of structure prediction servers
- Set 6 had 32 “easy” and 66 “hard” targets
CASP 5
- 3D coordinate models were submitted for 56 targets
- The native structure of 33 targets has been released
Approach
- Rank 3D predictions using four-body potentials
- Compare with the ranking obtained using global structural similarity measures (like MaxSub)

- To compare rankings, use the predictive index (PI)
- In the PI formula, E denotes the experimental values and P the predicted values

Livebench 6
- 10 models for each target were made by PMODELLER
- PI is reported for 28 “easy” targets and 38 “hard” targets (those for which at least one model had a non-zero MaxSub score)

Method    Easy <PI>   Easy Std(PI)   Hard <PI>   Hard Std(PI)
4B pot    0.83        0.20           0.83        0.11
MJ        0.70        0.39           0.74        0.18
PMOD      0.80        0.19           0.84        0.15

CASP 5
- For 18 targets (out of 33), the native structure ranked better than all predictions
- For 26 targets (out of 33), the native structure was ranked within the top 3.5% of all the predictions

Method    <PI>   Std(PI)
4B pot    0.61   0.18
MJ        0.39   0.20
CRMSD     0.63   0.22

Conclusions
- A four-body statistical scoring function was developed based on the Delaunay tessellation of proteins
- It discriminates native from decoy structures in most cases
- It distinguishes pre- and post-transition state structures and the native structure from MD folding simulation trajectories
- It ranks Livebench 6 and CASP5 predictions effectively (mean PI of 0.83 on Livebench 6 and 0.61 on CASP5)
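
The log-likelihood scoring described in the slides (q = log(f / p) with p = C · a_i a_j a_k a_l · p_t, summed over the tetrahedra of a tessellation) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the natural-log base, the form C = 4! / ∏ n_r! (n_r being the number of times residue type r occurs in the quadruplet), the estimation of a_i from residues that occur in the training tetrahedra, and the zero score for unseen quadruplets are all assumptions. The input is a list of (quadruplet, type) pairs such as a Delaunay tessellation step would produce.

```python
import math
from collections import Counter


def combinatorial_factor(quad):
    """Assumed form of C: number of distinct orderings, 4! / prod(n_r!)."""
    denom = 1
    for n in Counter(quad).values():
        denom *= math.factorial(n)
    return math.factorial(4) // denom


def train(observations):
    """Build log-likelihood scores from (quadruplet, type) pairs.

    Returns a dict mapping (sorted quadruplet, type) -> log(f / p).
    """
    obs = [("".join(sorted(q)), t) for q, t in observations]
    total = len(obs)
    quad_type_counts = Counter(obs)                 # counts behind f_ijkl_t
    type_counts = Counter(t for _, t in obs)        # counts behind p_t
    residue_counts = Counter(r for q, _ in obs for r in q)  # counts behind a_i
    n_residues = sum(residue_counts.values())

    potentials = {}
    for (quad, t), n in quad_type_counts.items():
        f = n / total                               # observed frequency
        p = combinatorial_factor(quad) * (type_counts[t] / total)
        for r in quad:
            p *= residue_counts[r] / n_residues     # multiply in a_i, a_j, a_k, a_l
        potentials[(quad, t)] = math.log(f / p)
    return potentials


def total_score(tetrahedra, potentials, default=0.0):
    """Sum per-tetrahedron scores; unseen quadruplets get `default` (an assumption)."""
    return sum(
        potentials.get(("".join(sorted(q)), t), default) for q, t in tetrahedra
    )


# Toy usage with made-up observations (not real training data).
toy = [("AALV", 1), ("VALA", 1), ("LLLL", 0), ("AAAA", 0)]
pots = train(toy)
print(total_score([("LAVA", 1), ("YYYY", 3)], pots))
```

Sorting each quadruplet before lookup makes the score independent of residue order within a tetrahedron, which matches the unordered {i, j, k, l} notation used in the slides.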
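
The predictive index used to compare rankings is named in the slides, but its formula is not given in the text. The sketch below assumes the commonly cited definition due to Pearlman and Charifson: a weighted sum over all pairs, with weight |E_j − E_i| and a sign of +1 when the predicted values order the pair the same way as the experimental values, −1 when they order it the opposite way, and 0 when the predictions tie. The function name and the return value of 0.0 when all experimental values are equal are assumptions.

```python
from itertools import combinations


def predictive_index(experimental, predicted):
    """Assumed PI definition: sum(w_ij * c_ij) / sum(w_ij) over all pairs i < j."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    numerator = 0.0
    denominator = 0.0
    for (e_i, p_i), (e_j, p_j) in combinations(zip(experimental, predicted), 2):
        weight = abs(e_j - e_i)                  # w_ij
        product = (e_j - e_i) * (p_j - p_i)
        agreement = (product > 0) - (product < 0)  # +1, -1, or 0
        numerator += weight * agreement
        denominator += weight
    return numerator / denominator if denominator else 0.0


# Toy values, for illustration only.
print(predictive_index([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
print(predictive_index([1.0, 2.0, 3.0], [30.0, 20.0, 10.0]))
```

Under this definition a PI of 1 means the predicted values rank every pair the same way as the experimental values, −1 means every pair is reversed, and values near 0 indicate no ranking ability.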