An Eclectic Energy Function to Discriminate Native From Decoys

Avdesh Mishra, Sumaiya Iqbal, Md Tamjidul Hoque
email: {amishra2, siqbal1, thoque}@uno.edu
Department of Computer Science, University of New Orleans, New Orleans, LA, USA

Introduction

• 3D structure prediction is useful in drug design and novel enzyme design.
• Energy functions can aid in
  • Protein structure prediction and
  • Fold recognition
• Protein folding and structure prediction rely on an accurate energy function.
• The accuracy of a potential function depends on
  • Interaction distance between atom pairs
  • Hydrophobic (H) and hydrophilic (P) properties
  • Sequence-specific information
  • Orientation-dependent interactions and
  • Optimization techniques
• We propose the 3DIGARS3.0 potential for improved accuracy.
• We introduce two 3D structural features
  • uPhi based energy
  • uPsi based energy
• The motivation is that 3D structural features help improve accuracy.
• uPhi and uPsi are linearly combined with the prior energy components.
• We develop a potential function, which is an optimized, linearly weighted accumulation of
  • The 3-Dimensional Ideal Gas Reference State based Energy Function (3DIGARS)
    • It is formulated using the HP, HH and PP properties of amino acids
  • Mined accessible surface area (ASA) and
  • Ubiquitously computed Phi (uPhi) and Psi (uPsi) energies
• Optimization is performed using a Genetic Algorithm (GA).
• On the independent test datasets, the proposed energy function significantly outperformed the state-of-the-art approaches.

Methods

• 3DIGARS potential
  • Core statistical function based on HP, HH and PP interactions (see Fig. 1)
  • Segregated ideal gas reference state and libraries for the HP, HH and PP groups
  • Better training dataset (a 100% sequence identity cutoff can capture the natural frequency distribution)
  • Three shape parameters (αhp, αhh and αpp) control the shape of the assumed spherical protein surface
  • Three contribution parameters (βhp, βhh and βpp) control the contribution of each group

  $$E_{\mathrm{3DIGARS}} = \beta_{hp} E_{HP} + \beta_{hh} E_{HH} + \beta_{pp} E_{PP}$$

• 3DIGARS2.0 potential
  • Integration of the core energy and sequence-specific features
  • The sequence-specific feature is computed by modeling the error between the real and predicted ASA (see Fig. 2)
  • Real and predicted ASA are obtained from DSSP and REGAd3p respectively
  • 3DIGARS2.0 is a linearly weighted accumulation of 3DIGARS and mined ASA

  $$E_{\mathrm{3DIGARS2.0}} = E_{\mathrm{3DIGARS}} + (w \times E_{ASA})$$

• 3DIGARS3.0 potential
  • Integration of core energy, sequence-specific energy and 3D structural features (see Fig. 5)
    • 3DIGARS energy, which is based on HP, HH and PP interactions and their respective ideal gas reference states
    • ASA energy, computed by modeling real and predicted accessibility obtained from protein sequences
  • The added 3D structural features are based on the uPhi and uPsi angles
  • uPhi and uPsi are computed using the Cartesian coordinates of sets of 4 atoms (see Fig. 3 and 4)
  • uPhi and uPsi based energies are computed in the following steps (see Fig. 4)
    • The cosine value range (-1 to 1) of the angles uPhi and uPsi is divided into 20 bins, each of width 0.1
    • Individual frequency tables for uPhi and uPsi are computed
    • The frequency tables are then used to compute individual energy score libraries
    • The energy scores are then used to compute the uPhi and uPsi energies for a given protein
  • The linearly combined energies are optimized using GA

  $$E_{\mathrm{3DIGARS3.0}} = E_{\mathrm{3DIGARS}} + (w_1 \times E_{ASA}) + (w_2 \times E_{uPhi}) + (w_3 \times E_{uPsi})$$

• Three decoy sets were used in optimization
  • Moulder
  • Rosetta and
  • I-Tasser
• Five independent test decoy sets were used to evaluate the accuracy
  • 4state_reduced
  • fisa_casp3
  • hg_structal
  • ig_structal and
  • ig_structal_hires

Figure 1: (a) Native-like protein conformation, presented in a 3D hexagonal-close-packing (HCP) configuration using hydrophobic (H) and hydrophilic or polar (P) residues. The H-H interaction space is relatively smaller than the P-P interaction space, since hydrophobic residues (black balls), being averse to water, tend to remain inside the central space. (b) 3D metaphoric HP folding kernels, depicted using the HCP-configuration-based HP model, showing the 3 layers of distribution of amino acids.

Figure 2: The dark central area, composed of atoms, can be thought of as a 3D protein, and the outlines around the area in green and red can be thought of as the real and predicted accessible surface area respectively. The error between real and predicted ASA is modelled as an energy feature.

Figure 3: Definition of the angle ϴ formed by four atoms (At1, At2, At3 and At4). uPhi is computed using At1 belonging to one residue and a set of atoms At2, At3, At4 belonging to some other residues. Similarly, uPsi is computed using a set of atoms At1, At2, At3 belonging to some residues and an atom At4 belonging to some other residue.

Figure 4: (a) Shows the arrangement of atoms as well as the vectors created using the Cartesian coordinates of the atoms. (b) Shows the dihedral angle ϴ involving the four atoms.

Figure 5: Process flow of the design and development of the 3DIGARS3.0 energy function.

Results

Table 1: Performance comparison of different energy functions on optimization datasets based on correct native count.

Decoy set (targets)      DFIRE        RWplus       dDFIRE       GOAP         3DIGARS      3DIGARS2.0   3DIGARS3.0
Moulder (20)             19 (-2.97)   19 (-2.84)   18 (-2.74)   19 (-3.58)   19 (-2.99)   19 (-2.68)   20 (-3.851)
Rosetta (58)             20 (-1.82)   20 (-1.47)   12 (-0.83)   45 (-3.70)   31 (-2.023)  49 (-2.987)  46 (-2.683)
I-Tasser (56)            49 (-4.02)   56 (-5.77)   48 (-5.03)   45 (-5.36)   53 (-4.036)  56 (-4.296)  56 (-5.573)
Weighted average in %    38.64        28.42        56.41        11.93        18.45        -1.61

Legend: Entry format is native-count (z-score).

Table 2: Performance comparison of different energy functions on independent test datasets based on correct native count.

Decoy set (targets)      DFIRE        RWplus       dDFIRE       GOAP         3DIGARS      3DIGARS2.0   3DIGARS3.0
4state_reduced (7)       6 (-3.48)    6 (-3.51)    7 (-4.15)    7 (-4.38)    6 (-3.371)   4 (-2.642)   7 (-3.456)
fisa_casp3 (5)           4 (-4.80)    4 (-5.17)    4 (-4.83)    5 (-5.27)    5 (-4.319)   5 (-4.682)   4 (-4.076)
hg_structal (29)         12 (-1.97)   12 (-1.74)   16 (-1.33)   22 (-2.73)   12 (-1.914)  12 (-1.589)  28 (-3.678)
ig_structal (61)         0 (0.92)     0 (1.11)     26 (-1.02)   47 (-1.62)   0 (0.645)    0 (0.268)    60 (-2.526)
ig_structal_hires (20)   0 (0.17)     0 (0.32)     16 (-2.05)   18 (-2.35)   0 (-0.002)   1 (0.030)    20 (-2.378)
Weighted average in %    440.91       440.91       72.46        20.20        417.39       440.91

Legend: Entry format is native-count (z-score).

Discussions

• On the independent test datasets, 3DIGARS3.0 outperformed the state-of-the-art approaches:
  • DFIRE by 440.91%
  • RWplus by 440.91%
  • dDFIRE by 72.46%
  • GOAP by 20.20%
  • 3DIGARS by 417.39%
  • 3DIGARS2.0 by 440.91%
• The percentage weighted average improvement is calculated as

  $$\%WA = \frac{\sum_{i=1}^{n} (y_i - x_i)}{\sum_{i=1}^{n} x_i} \times 100$$

  where $y_i$ represents the new value and $x_i$ represents the old value.

Conclusions

• We propose an accurate potential which combines useful features
  • HP, HH and PP interactions among the amino acids
  • Sequence-based accessibility obtained for each amino acid
  • 3D structure-based properties, i.e. uPhi and uPsi
• The improved potential can be used for
  • Protein-ligand binding site prediction
  • Ab initio protein structure prediction
  • Fold recognition
  • Drug design and
  • Enzyme design
• The proposed potential outperforms all the state-of-the-art approaches.

Acknowledgements

We gratefully acknowledge the Louisiana Board of Regents through the Board of Regents Support Fund, LEQSF (2013-16)-RD-A-19.
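The first uPhi/uPsi steps listed in the Methods (an angle from the Cartesian coordinates of four atoms, then 20 cosine bins over the range -1 to 1) could be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes the standard dihedral construction from the three bond vectors between consecutive atoms, and the function names `dihedral_cosine`, `cosine_bin` and `frequency_table` are hypothetical. The step that turns frequencies into energy scores is not specified on the poster, so it is left out.

```python
import math

N_BINS = 20  # cosine range [-1, 1] split into 20 bins of width 0.1 (from the poster)


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_cosine(at1, at2, at3, at4):
    """Cosine of the dihedral angle defined by four atoms (assumed standard formula)."""
    b1, b2, b3 = _sub(at2, at1), _sub(at3, at2), _sub(at4, at3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    norm = math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    if norm == 0.0:
        raise ValueError("collinear atoms: dihedral angle is undefined")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, _dot(n1, n2) / norm))


def cosine_bin(cos_value):
    """Index (0..19) of the bin that contains cos_value; cos = 1 goes to the last bin."""
    return min(int((cos_value + 1.0) * N_BINS / 2.0), N_BINS - 1)


def frequency_table(atom_quadruples):
    """Count how many four-atom sets fall into each cosine bin."""
    counts = [0] * N_BINS
    for at1, at2, at3, at4 in atom_quadruples:
        counts[cosine_bin(dihedral_cosine(at1, at2, at3, at4))] += 1
    return counts


if __name__ == "__main__":
    quad = ((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))  # a 90 degree dihedral
    c = dihedral_cosine(*quad)
    print(c, cosine_bin(c))
```

In this sketch, one table would be built for uPhi and another for uPsi by passing the corresponding sets of four atoms to `frequency_table`.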
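As a worked check of the percentage weighted average improvement (%WA), the short script below recomputes the Table 2 figures from the native counts on the five independent test decoy sets. It reads the formula as the summed difference between new and old counts divided by the summed old counts, times 100; the function name `weighted_average_improvement` is an illustrative assumption.

```python
def weighted_average_improvement(new_counts, old_counts):
    """%WA = sum(y_i - x_i) / sum(x_i) * 100 over paired decoy sets."""
    if len(new_counts) != len(old_counts):
        raise ValueError("count lists must have the same length")
    total_old = sum(old_counts)
    if total_old == 0:
        raise ValueError("old counts sum to zero: improvement is undefined")
    diff = sum(y - x for y, x in zip(new_counts, old_counts))
    return diff / total_old * 100.0


# Native counts from Table 2, in the order 4state_reduced, fisa_casp3,
# hg_structal, ig_structal, ig_structal_hires.
NEW = [7, 4, 28, 60, 20]  # 3DIGARS3.0
OLD = {
    "DFIRE": [6, 4, 12, 0, 0],
    "RWplus": [6, 4, 12, 0, 0],
    "dDFIRE": [7, 4, 16, 26, 16],
    "GOAP": [7, 5, 22, 47, 18],
    "3DIGARS": [6, 5, 12, 0, 0],
    "3DIGARS2.0": [4, 5, 12, 0, 1],
}

if __name__ == "__main__":
    for method, counts in OLD.items():
        print(f"{method}: {weighted_average_improvement(NEW, counts):.2f}%")
```

For example, DFIRE identifies 22 natives in total against 119 for 3DIGARS3.0, so the improvement is (119 - 22) / 22 × 100 = 440.91%, which matches the value reported in Table 2.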