An Eclectic Energy Function to Discriminate Native From Decoys
Avdesh Mishra, Sumaiya Iqbal, Md Tamjidul Hoque
email: {amishra2, siqbal1, thoque}@uno.edu
Department of Computer Science, University of New Orleans, New Orleans, LA, USA
Introduction
Discussions
• Protein 3D structure prediction is useful in drug design and novel enzyme design.
• Energy functions can aid in
  • Protein structure prediction and
  • Fold recognition
• Protein folding and structure prediction problems rely on an accurate energy function.
• The accuracy of the potential function depends on
  • Interaction distance between atom pairs
  • Hydrophobic (H) and hydrophilic (P) properties
  • Sequence-specific information
  • Orientation-dependent interactions and
  • Optimization techniques
• We propose the 3DIGARS3.0 potential for improved accuracy.
• We introduce two 3D structural features:
  • uPhi-based energy
  • uPsi-based energy
• The motivation is that 3D structural features help advance the accuracy.
• uPhi and uPsi are linearly combined with prior energy components.
• We develop a potential function that is an optimized, linearly weighted accumulation of
  • the 3-Dimensional Ideal Gas Reference State based Energy Function (3DIGARS), formulated using the HP, HH and PP properties of amino acids,
  • mined accessible surface area (ASA), and
  • ubiquitously computed Phi (uPhi) and Psi (uPsi) energies.
• Optimization is performed using a Genetic Algorithm (GA).
• On independent test datasets, the proposed energy function significantly outperformed state-of-the-art approaches.
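The poster states that the weights of the linearly combined energies are optimized with a Genetic Algorithm, but it does not give the GA settings or the fitness function. The Python sketch below shows one way such an optimization could look under a hypothetical setup: each structure is reduced to its four component energies (E_3DIGARS, E_ASA, E_uPhi, E_uPsi), the fitness is assumed to be the native count (targets whose native has the lowest combined energy), and the population size, generation count, weight bounds, truncation selection, uniform crossover and Gaussian mutation are illustrative choices, not the authors' parameters.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: each structure is reduced to its component
# energies (E_3DIGARS, E_ASA, E_uPhi, E_uPsi); a target is (native, decoys).
Components = Tuple[float, float, float, float]
Target = Tuple[Components, List[Components]]


def combined_energy(c: Components, w: Sequence[float]) -> float:
    # E = E_3DIGARS + w1*E_ASA + w2*E_uPhi + w3*E_uPsi
    return c[0] + w[0] * c[1] + w[1] * c[2] + w[2] * c[3]


def native_count(targets: List[Target], w: Sequence[float]) -> int:
    # Assumed fitness: targets whose native has strictly the lowest energy.
    count = 0
    for native, decoys in targets:
        e_native = combined_energy(native, w)
        if all(e_native < combined_energy(d, w) for d in decoys):
            count += 1
    return count


def optimize_weights(targets: List[Target], pop_size: int = 30,
                     generations: int = 50,
                     bounds: Tuple[float, float] = (0.0, 2.0),
                     mutation_sd: float = 0.1, seed: int = 0):
    # All GA settings here are illustrative, not taken from the poster.
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x: float) -> float:
        return min(hi, max(lo, x))

    def fitness(w: Sequence[float]) -> int:
        return native_count(targets, w)

    population = [[rng.uniform(lo, hi) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Elitism: copy the top 20% unchanged, so the best fitness never drops.
        children = [list(w) for w in population[: max(1, pop_size // 5)]]
        parents = population[: max(2, pop_size // 2)]
        while len(children) < pop_size:
            # Truncation selection: two random parents from the better half.
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation clipped to the bounds.
            child = [clip(rng.choice((x, y)) + rng.gauss(0.0, mutation_sd))
                     for x, y in zip(a, b)]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data: target 1 needs w1 > 0.4, target 2 needs w2 > 0.2.
toy = [((-10.0, -1.0, 0.0, 0.0), [(-10.4, 0.0, 0.0, 0.0)]),
       ((-5.0, 0.0, -1.0, 0.0), [(-5.2, 0.0, 0.0, 0.0), (-4.0, 0.0, 0.0, 0.0)])]
weights, count = optimize_weights(toy)
print(count, weights)
```

With real data, each target would hold a native and its decoys from the optimization sets. Keeping an elite fraction unchanged guarantees that the best native count found so far is never lost between generations.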
Methods
• 3DIGARS potential
Figure 2: The dark central area, composed of atoms, can be thought of as a 3D protein, and the green and red outlines around it can be thought of as the real and predicted accessible surface area, respectively. The error between the real and predicted ASA is modelled as an energy feature.
Figure 3: Definition of the angle ϴ formed by four atoms (At1, At2, At3 and At4). uPhi is computed using an atom At1 from one residue and atoms At2, At3 and At4 from other residues. Similarly, uPsi is computed using atoms At1, At2 and At3 from some residues and an atom At4 from another residue.
• Five independent test decoy sets were used to evaluate the accuracy:
  • 4state_reduced
  • fisa_casp3
  • hg_structal
  • ig_structal and
  • ig_structal_hires
$$E_{\text{3DIGARS}} = \beta_{hp} \cdot E_{HP} + \beta_{hh} \cdot E_{HH} + \beta_{pp} \cdot E_{PP}$$
• 3DIGARS2.0 potential
$$E_{\text{3DIGARS2.0}} = E_{\text{3DIGARS}} + w \cdot E_{ASA}$$
• 3DIGARS3.0 potential
• On the independent test datasets, 3DIGARS3.0 improved on
  • DFIRE by 440.91%
  • RWplus by 440.91%
  • dDFIRE by 72.46%
  • GOAP by 20.20%
  • 3DIGARS by 417.39%
  • 3DIGARS2.0 by 440.91%
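• The percentage weighted average improvement is calculated as
$$\%WA = \frac{\sum_{i=1}^{n} (y_i - x_i)}{\sum_{i=1}^{n} x_i} \times 100$$
where i runs over the n decoy sets, y_i represents the new value (the native count of 3DIGARS3.0) and x_i represents the old value (the native count of the method being compared).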
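The formula can be checked directly against the native counts reported for the independent test sets. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Sequence


def weighted_average_improvement(new: Sequence[float], old: Sequence[float]) -> float:
    """%WA = sum_i (y_i - x_i) / sum_i x_i * 100, with y = new and x = old."""
    if len(new) != len(old):
        raise ValueError("new and old must cover the same decoy sets")
    total_old = sum(old)
    if total_old == 0:
        raise ValueError("%WA is undefined when the old values sum to zero")
    return sum(y - x for y, x in zip(new, old)) / total_old * 100.0


# Native counts from the independent test results, in the order
# 4state_reduced, fisa_casp3, hg_structal, ig_structal, ig_structal_hires.
digars3 = [7, 4, 28, 60, 20]
dfire = [6, 4, 12, 0, 0]
ddfire = [7, 4, 16, 26, 16]

print(round(weighted_average_improvement(digars3, dfire), 2))   # 440.91
print(round(weighted_average_improvement(digars3, ddfire), 2))  # 72.46
```

Because numerator and denominator are plain sums, %WA is the relative change in the total native count across all decoy sets, so sets with more targets carry more weight.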
Figure 4: (a) The arrangement of the atoms and the vectors constructed from their Cartesian coordinates. (b) The dihedral angle ϴ formed by the four atoms.
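Figures 3 and 4 describe ϴ as the angle formed by four atoms and computed from their Cartesian coordinates. Assuming ϴ is the standard dihedral (torsion) angle, it can be computed with the common atan2 formulation sketched below; the helper names are hypothetical, and the cosine helper returns the value that is later binned.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(at1: Vec, at2: Vec, at3: Vec, at4: Vec) -> float:
    """Signed dihedral angle in degrees, in [-180, 180], for four atom positions."""
    b1 = _sub(at2, at1)  # At1 -> At2
    b2 = _sub(at3, at2)  # At2 -> At3 (the rotation axis)
    b3 = _sub(at4, at3)  # At3 -> At4
    n1 = _cross(b1, b2)  # normal of the plane through At1, At2, At3
    n2 = _cross(b2, b3)  # normal of the plane through At2, At3, At4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def cos_dihedral(at1: Vec, at2: Vec, at3: Vec, at4: Vec) -> float:
    """Cosine of the dihedral angle, the quantity that is later binned."""
    return math.cos(math.radians(dihedral_deg(at1, at2, at3, at4)))


# Trans arrangement: At1 and At4 on opposite sides of the At2-At3 axis.
print(round(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)), 6))  # 180.0
```

Using atan2 instead of an arccos of the normalized dot product keeps the sign of the angle and stays numerically stable near 0° and 180°.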
• We propose an accurate potential that combines useful features:
  • HP, HH and PP interactions among the amino acids
  • Sequence-based accessibility obtained for each amino acid
  • 3D structure-based properties, i.e., uPhi and uPsi
Results
Table 1: Performance comparison of different energy functions on optimization datasets based on correct native count.
Table 2: Performance comparison of different energy functions on independent test datasets based on correct native count.
Table 1 (optimization datasets)
Decoy set (no. of targets)      DFIRE        RWplus       dDFIRE       GOAP         3DIGARS      3DIGARS2.0   3DIGARS3.0
Moulder (20)                    19 (-2.97)   19 (-2.84)   18 (-2.74)   19 (-3.58)   19 (-2.99)   19 (-2.68)   20 (-3.851)
Rosetta (58)                    20 (-1.82)   20 (-1.47)   12 (-0.83)   45 (-3.70)   31 (-2.023)  49 (-2.987)  46 (-2.683)
I-Tasser (56)                   49 (-4.02)   56 (-5.77)   48 (-5.03)   45 (-5.36)   53 (-4.036)  56 (-4.296)  56 (-5.573)
Weighted avg. improvement (%)   38.64        28.42        56.41        11.93        18.45        -1.61        -
Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best
scores.
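The legend reports each entry as native-count (z-score). The poster does not define these quantities, so the sketch below assumes the definitions conventionally used in decoy discrimination: a target is counted when its native has strictly the lowest energy, and the z-score is (E_native - mean of decoy energies) / standard deviation of decoy energies, averaged over the targets of a decoy set. The use of the population standard deviation and the function names are also assumptions.

```python
import statistics
from typing import List, Sequence, Tuple


def z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Assumed definition: (E_native - mean(E_decoys)) / std(E_decoys)."""
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)  # population std (assumption)
    return (native_energy - mu) / sigma


def evaluate(targets: List[Tuple[float, Sequence[float]]]) -> Tuple[int, float]:
    """Return (native count, mean z-score) for one decoy set."""
    # A target counts only if its native is strictly below every decoy.
    natives = sum(1 for e_nat, decoys in targets if e_nat < min(decoys))
    mean_z = statistics.mean([z_score(e_nat, decoys) for e_nat, decoys in targets])
    return natives, mean_z


# Two toy targets: (native energy, decoy energies).
toy = [(-10.0, [-8.0, -6.0]), (-5.0, [-7.0, -5.0])]
print(evaluate(toy))  # (1, -1.0)
```

A more negative z-score means the native stands further below the decoy energy distribution, which is why the best methods in the tables show the most negative values.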
Conclusions
Figure 5: Process flow of the design and development of the 3DIGARS3.0 energy function.
$$E_{\text{3DIGARS3.0}} = E_{\text{3DIGARS}} + w_1 \cdot E_{ASA} + w_2 \cdot E_{uPhi} + w_3 \cdot E_{uPsi}$$
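The energy can be read term by term. The sketch below, with hypothetical names, evaluates E_3DIGARS as the β-weighted sum of the HP, HH and PP terms and E_3DIGARS3.0 by adding the weighted ASA, uPhi and uPsi terms; 3DIGARS2.0 corresponds to the special case w2 = w3 = 0. How each component energy is obtained is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ComponentEnergies:
    """Hypothetical container for one structure's energy terms."""
    e_hp: float
    e_hh: float
    e_pp: float
    e_asa: float
    e_uphi: float
    e_upsi: float


def e_3digars(c: ComponentEnergies, betas: Tuple[float, float, float]) -> float:
    # E_3DIGARS = beta_hp * E_HP + beta_hh * E_HH + beta_pp * E_PP
    beta_hp, beta_hh, beta_pp = betas
    return beta_hp * c.e_hp + beta_hh * c.e_hh + beta_pp * c.e_pp


def e_3digars3(c: ComponentEnergies, betas: Tuple[float, float, float],
               w1: float, w2: float, w3: float) -> float:
    # E_3DIGARS3.0 = E_3DIGARS + w1 * E_ASA + w2 * E_uPhi + w3 * E_uPsi
    return e_3digars(c, betas) + w1 * c.e_asa + w2 * c.e_uphi + w3 * c.e_upsi


def e_3digars2(c: ComponentEnergies, betas: Tuple[float, float, float],
               w: float) -> float:
    # E_3DIGARS2.0 = E_3DIGARS + w * E_ASA, i.e. the case w2 = w3 = 0
    return e_3digars3(c, betas, w, 0.0, 0.0)


c = ComponentEnergies(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
print(e_3digars3(c, (1.0, 1.0, 1.0), 0.5, 1.0, 2.0))  # 25.0
```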
Figure 1: (a) Native-like protein conformation, presented in a 3D hexagonal-close-packing (HCP) configuration using hydrophobic (H) and hydrophilic or polar (P) residues. The H-H interaction space is relatively smaller than the P-P interaction space, since hydrophobic residues (black balls), which avoid water, tend to remain inside the central space. (b) 3D metaphoric HP folding kernels, depicted based on the HCP-configuration-based HP model, showing the 3 layers of amino acid distributions.
4state_reduced
fisa_casp3
hg_structal
ig_structal and
ig_structal_hires
• 3DIGARS3.0 outperformed the state-of-the-art approaches
• Integration of the core energy and sequence-specific features
• The sequence-specific feature is computed by modeling the error between the real and predicted ASA (see Fig. 2)
• Real and predicted ASA are obtained from DSSP and REGAd3p, respectively
• 3DIGARS2.0 is a linearly weighted accumulation of 3DIGARS and mined ASA
• Integration of core energy, sequence-specific energy and 3D structural features (see Fig. 5)
• The added 3D structural features are derived from the uPhi and uPsi angles
• uPhi and uPsi are computed from the Cartesian coordinates of a set of 4 atoms (see Fig. 3 and 4)
• uPhi- and uPsi-based energies are computed in the following steps (see Fig. 4):
  • The cosine range (-1 to 1) of the uPhi and uPsi angles is divided into 20 bins, each of width 0.1
  • Individual frequency tables for uPhi and uPsi are computed
  • The frequency tables are used to compute individual energy score libraries
  • The energy scores are then used to compute the uPhi and uPsi energies of a given protein
• The linearly combined energies are optimized using a GA
  • Three decoy sets were used in optimization:
    • Moulder
    • Rosetta and
    • I-Tasser
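The uPhi and uPsi energy steps listed above can be sketched as follows. The 20-bin discretization of the cosine range follows the poster; how frequencies are turned into energy scores is not specified, so the sketch assumes a common knowledge-based form, E(bin) = -ln(p_obs / p_ref), with a uniform reference distribution and a pseudocount of 1. All function names are hypothetical, and one library would be built for uPhi and a separate one for uPsi.

```python
import math
from typing import Iterable, List

N_BINS = 20                 # cosine range [-1, 1] split into 20 bins
BIN_WIDTH = 2.0 / N_BINS    # 0.1


def cos_bin(cos_value: float) -> int:
    """Map a cosine in [-1, 1] to a bin index 0..19 (cos = 1 goes to the last bin)."""
    idx = int((cos_value + 1.0) / BIN_WIDTH)
    return min(max(idx, 0), N_BINS - 1)


def frequency_table(cos_values: Iterable[float]) -> List[int]:
    """Count how many observed angles fall into each cosine bin."""
    counts = [0] * N_BINS
    for c in cos_values:
        counts[cos_bin(c)] += 1
    return counts


def energy_library(counts: List[int], pseudocount: float = 1.0) -> List[float]:
    """ASSUMED form: E(bin) = -ln(p_obs(bin) / p_ref(bin)) with uniform p_ref."""
    total = sum(counts) + pseudocount * N_BINS
    p_ref = 1.0 / N_BINS
    return [-math.log(((n + pseudocount) / total) / p_ref) for n in counts]


def protein_energy(cos_values: Iterable[float], library: List[float]) -> float:
    """Sum the library score of every angle observed in one structure."""
    return sum(library[cos_bin(c)] for c in cos_values)


# Training angles (cosines) from native structures, then score one structure.
counts = frequency_table([-0.95, -0.95, 0.05, 0.99, 1.0])
library = energy_library(counts)
print(protein_energy([-0.95, 0.05], library))
```

Frequently observed bins receive negative (favorable) scores and rarely observed bins positive ones; the pseudocount keeps empty bins finite.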
• Core statistical function based on HP, HH and PP interactions (see Fig. 1)
• Segregated ideal gas reference states and libraries for the HP, HH and PP groups
• Better training dataset (a 100% sequence identity cutoff can capture the natural frequency distribution)
• Three shape parameters (αhp, αhh and αpp) control the shape of the assumed spherical protein surface
• Three contribution parameters (βhp, βhh and βpp) control the contribution of each group
• 3DIGARS energy, based on HP, HH and PP interactions and their respective ideal gas reference states
• ASA energy, computed by modeling the real and predicted accessibility obtained from protein sequences
Table 2 (independent test datasets)
Decoy set (no. of targets)      DFIRE        RWplus       dDFIRE       GOAP         3DIGARS      3DIGARS2.0   3DIGARS3.0
4state_reduced (7)              6 (-3.48)    6 (-3.51)    7 (-4.15)    7 (-4.38)    6 (-3.371)   4 (-2.642)   7 (-3.456)
fisa_casp3 (5)                  4 (-4.80)    4 (-5.17)    4 (-4.83)    5 (-5.27)    5 (-4.319)   5 (-4.682)   4 (-4.076)
hg_structal (29)                12 (-1.97)   12 (-1.74)   16 (-1.33)   22 (-2.73)   12 (-1.914)  12 (-1.589)  28 (-3.678)
ig_structal (61)                0 (0.92)     0 (1.11)     26 (-1.02)   47 (-1.62)   0 (0.645)    0 (0.268)    60 (-2.526)
ig_structal_hires (20)          0 (0.17)     0 (0.32)     16 (-2.05)   18 (-2.35)   0 (-0.002)   1 (0.030)    20 (-2.378)
Weighted avg. improvement (%)   440.91       440.91       72.46        20.20        417.39       440.91       -
Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best
scores.
• The improved potential can be used for
  • Protein-ligand binding site prediction
  • Ab initio protein structure prediction
  • Fold recognition
  • Drug design and
  • Enzyme design
• The proposed potential outperforms all the state-of-the-art approaches.
Acknowledgements
We gratefully acknowledge the Louisiana Board of Regents
through the Board of Regents Support Fund, LEQSF (201316)-RD-A-19.