Template-based Prediction of Protein 8-state Secondary Structures
Ashraf Yaseen and Yaohang Li
3rd IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)
June 12th 2013
DEPARTMENT OF COMPUTER SCIENCE
OLD DOMINION UNIVERSITY, NORFOLK, VA
Contents
• Introduction
  - Secondary Structure Definition & Representation
  - Secondary Structure Prediction
  - C8-Scorpion
• Materials & Methods
  - Data Sets, Template Construction, and Encoding
  - Neural Network Model
• Results & Discussions
• Summary
Protein Secondary Structure Prediction in Protein Modeling
• Proteins: from "proteios", meaning "primary" or "of prime importance"; the primary components of living things
• In nature, proteins fold into specific 3D structures critical to their functions
• Protein modeling: sequence → intermediate prediction steps → 3D structure
• Correctly predicting protein secondary structure is a critical stepping stone toward obtaining correct 3D models
Secondary Structures - Definition
• General 3D form of local segments of residues
• Identified from determined protein 3D structures
• DSSP
[Figure: Protein 1BOO chain A with labeled secondary structure elements: α-helix, 3-10 helix, π-helix, β-strand, turn, bend, other]
Secondary Structures - Representation
• 3-10 helix (G)
• α-helix (H)
• π-helix (I)
• β-strand (E)
• β-bridge (B)
• turn (T)
• bend (S)
• others (C)
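
The eight one-letter codes above can be held in a small lookup table. The sketch below also shows a reduction to the 3-state alphabet (helix, sheet, coil). The particular 8-to-3 grouping is an assumption: it is one commonly used convention, and the slides do not state which reduction, if any, the authors apply. The example string is made up.

```python
# The eight state letters and names as listed on the slide.
EIGHT_STATE_NAMES = {
    "G": "3-10 helix",
    "H": "alpha-helix",
    "I": "pi-helix",
    "E": "beta-strand",
    "B": "beta-bridge",
    "T": "turn",
    "S": "bend",
    "C": "others",
}

# ASSUMPTION: one common 8-to-3 reduction (H = helix, E = sheet, C = coil).
# Other groupings exist; the slides do not specify one.
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def reduce_to_three_state(ss8: str) -> str:
    """Map an 8-state string to a 3-state string, residue by residue."""
    return "".join(EIGHT_TO_THREE[s] for s in ss8)


example = "CCHHHHGGGTTEEEEBSC"  # made-up 8-state string
reduced = reduce_to_three_state(example)
print(reduced)
```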
Secondary Structure Prediction - Effectiveness
• Correctly predicting secondary structure can:
  - Reduce the degrees of freedom in protein structure modeling, which reduces the difficulty of obtaining high-resolution 3D models
  - Derive a much smaller range of possible torsion angles
Figure source: http://www.imb-jena.de/~rake/Bioinformatics_WEB/basics_peptide_bond.html
Secondary Structure Prediction - Background
• Secondary structure prediction is a classification problem: each residue is predicted to be in one of a few states
• A machine learning predictor (ANN, SVM, HMM, ...) outputs the structural state of each residue Ri
• State alphabets:
  - 3-state (helix, sheet, coil)
  - 8-state (α-helix, π-helix, 3-10 helix, β-strand, β-bridge, turn, bend, and others)
• 3-state examples: GOR4, PSI-Pred, PHD, SAM, Porter, JPred, SPINE, SSPRO, NETSURF, and many others; ~80% (Q3)
• 8-state examples:
  - SSpro8, 62-63% Q8
  - RaptorXss8, 67.9% Q8
Secondary Structure Prediction - 8-state
Prediction accuracy (%) of RaptorXss8 on the benchmarks CB513, CASP9, Manesh215, and Carugo338. Prediction accuracies for 3-10 helices (G), π-helices (I), β-bridges (B), and bends (S) are particularly low due to their low frequencies of occurrence.

Benchmark    QG     QH     QI    QE     QB    QS     QT     QC     Q8
CB513        17.54  89.96  0.00  77.68  0.09  15.87  48.02  63.29  65.59
CASP9        20.58  92.90  0.00  81.64  0.00  18.11  51.45  59.37  69.31
Manesh215    18.43  90.22  0.00  79.60  0.32  17.80  51.28  63.73  67.69
Carugo338    19.20  89.91  0.00  79.45  0.44  17.14  50.11  63.36  66.64

[Figure: Distribution of 3-10 helices (G), α-helices (H), π-helices (I), β-sheets (E), β-bridges (B), turns (T), bends (S), and coils (C) in Cull5547]
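
For readers unfamiliar with the measures, the per-state accuracies (QG, QH, ...) and the overall Q8 reported above are percentages of correctly predicted residues. The following is a minimal sketch of that calculation; the function name and the two example strings are hypothetical and not taken from the slides.

```python
from collections import Counter

STATES = "GHIEBSTC"


def q_accuracies(true_ss: str, pred_ss: str) -> dict:
    """Return per-state accuracies (QG, QH, ...) and overall Q8, in percent.

    Qx = correctly predicted residues in state x / residues truly in state x
    Q8 = correctly predicted residues / all residues
    States that never occur in true_ss are omitted from the result.
    """
    if not true_ss or len(true_ss) != len(pred_ss):
        raise ValueError("sequences must be non-empty and of equal length")
    totals = Counter(true_ss)
    correct = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)
    result = {}
    for s in STATES:
        if totals[s]:
            result["Q" + s] = 100.0 * correct[s] / totals[s]
    result["Q8"] = 100.0 * sum(correct.values()) / len(true_ss)
    return result


# Made-up observed and predicted 8-state strings of equal length.
true_ss = "HHHHGGGEEEETTSCC"
pred_ss = "HHHHHHHEEEETTCCC"
scores = q_accuracies(true_ss, pred_ss)
print(scores)
```

In this toy case the rare states G and S are both missed while the overall Q8 stays fairly high, which mirrors the pattern in the benchmark figures: a good overall score can hide poor accuracy on infrequent states.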
Secondary Structure Prediction - Template-based
• Most current methods for secondary structure prediction are ab initio
• However, many protein sequences have some degree of similarity among themselves
• Latest version of Porter (in 3-state):
  - Improvement in prediction accuracy with >30% sequence similarity
  - Decline in efficiency with low sequence similarity (<20%)
Template-based C8-SCORPION
• An extension of our previous method, C3-SCORPION
• Input encoding: sequence & evolutionary information (PSSM) + structure information from templates or context-based scores
• Predictor output: structural feature (state) of Ri
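
To make the input encoding concrete, here is a minimal sketch of how per-residue PSSM values and template structure information could be concatenated into one feature vector. The sliding window, its size, the one-hot encoding of template states, and the zero padding are all assumptions chosen for illustration; the slides only state that sequence and evolutionary information is combined with structure information from templates or context-based scores.

```python
STATES = "GHIEBSTC"


def one_hot_state(state):
    """8-dim one-hot vector for a template state; all zeros for any other symbol."""
    return [1.0 if state == s else 0.0 for s in STATES]


def encode_residue(pssm, template_ss, i, half_window=1):
    """Concatenate PSSM rows and template states in a window centred on residue i.

    ASSUMPTION: window-based encoding with zero padding past the chain ends.
    """
    n_cols = len(pssm[0])
    features = []
    for j in range(i - half_window, i + half_window + 1):
        if 0 <= j < len(pssm):
            features.extend(pssm[j])
            features.extend(one_hot_state(template_ss[j]))
        else:
            # Zero padding beyond the chain termini.
            features.extend([0.0] * (n_cols + len(STATES)))
    return features


# Toy 3-residue chain with a 2-column "PSSM" (real PSSMs have 20 columns).
pssm = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
template_ss = "H-E"  # '-' marks a residue with no template information
vec = encode_residue(pssm, template_ss, i=0)
print(len(vec))
```

Encoding a missing template state as an all-zero vector lets the same input layout serve both template-based and template-less cases, which is one possible design rather than a description of the authors' encoding.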
Materials & Methods
• Data Sets: Cull5547, from the PISCES server (at most 25% sequence identity, 2.0 Å resolution); benchmarks CASP9, Carugo338, Manesh215, and CB513
• Template Construction
• Encoding
  - Context-based scores: statistics-based potential scores, derived from the protein data sets, that estimate the favorability of residues adopting specific structural states within their amino acid environment.
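
As an illustration of what a statistics-derived potential score looks like, the sketch below computes a simple log-odds score for each (amino acid, state) pair from counted frequencies. This is a deliberately simplified, single-residue potential and an assumption on our part: it ignores the neighbouring amino acid environment that the context-based scores described above take into account, and the pseudocount and toy data are made up.

```python
import math
from collections import Counter


def log_odds_scores(sequences, structures, pseudocount=1.0):
    """score[(aa, state)] = -ln( P(aa, state) / (P(aa) * P(state)) ).

    Negative scores mark (residue, state) pairs seen more often than
    expected by chance, i.e. favorable pairs.
    """
    pair, aa_count, st_count = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for aa, st in zip(seq, ss):
            pair[(aa, st)] += 1
            aa_count[aa] += 1
            st_count[st] += 1
    total = sum(aa_count.values())
    n_cells = len(aa_count) * len(st_count)
    scores = {}
    for aa in aa_count:
        for st in st_count:
            # Pseudocounts keep unseen pairs from producing log(0).
            p_pair = (pair[(aa, st)] + pseudocount) / (total + pseudocount * n_cells)
            p_indep = (aa_count[aa] / total) * (st_count[st] / total)
            scores[(aa, st)] = -math.log(p_pair / p_indep)
    return scores


# Made-up toy data: alanine (A) is always helical, valine (V) always in strands.
scores = log_odds_scores(["AAVV", "AVAV"], ["HHEE", "HEHE"])
print(scores[("A", "H")], scores[("A", "E")])
```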
Materials & Methods - cont.
Neural Network Model
[Figure: Two phases of template-based 8-state secondary structure prediction (architecture and encoding)]
Results & Discussions
7-fold cross-validation accuracy (%) in template-based 8-state prediction

State      Q8     SOV8
G          43.99  47.96
H          92.48  95.19
I           0.00   0.00
E          88.30  92.77
B          27.86  27.57
S          43.46  45.32
T          64.18  66.64
C          75.51  71.45
Overall    78.85  80.10

Comparison between 8-state predictions (%) with and without template on benchmarks

             Q8                          SOV8
Benchmark    No Template  With Template  No Template  With Template
CB513        67.22        79.39          67.66        80.64
CASP9        71.54        76.36          73.47        78.15
Manesh215    69.71        81.10          70.79        82.99
Carugo338    68.44        80.39          69.50        81.95

[Figure: Distribution of 8-state secondary structure prediction accuracy (Q8) as a function of sequence similarity; the first group of bars corresponds to template-less predictions]
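
The size of the template effect on each benchmark follows directly from the reported figures. The short sketch below subtracts the template-less accuracy from the template-based one; the numbers are copied from the benchmark comparison in the slides, while the variable and function names are ours.

```python
# (Q8 no template, Q8 with template, SOV8 no template, SOV8 with template)
benchmarks = {
    "CB513": (67.22, 79.39, 67.66, 80.64),
    "CASP9": (71.54, 76.36, 73.47, 78.15),
    "Manesh215": (69.71, 81.10, 70.79, 82.99),
    "Carugo338": (68.44, 80.39, 69.50, 81.95),
}


def template_gain(row):
    """Return (Q8 gain, SOV8 gain) in percentage points, rounded to 2 decimals."""
    q8_no, q8_with, sov_no, sov_with = row
    return round(q8_with - q8_no, 2), round(sov_with - sov_no, 2)


gains = {name: template_gain(row) for name, row in benchmarks.items()}
for name, (d_q8, d_sov8) in gains.items():
    print(f"{name}: Q8 +{d_q8:.2f}, SOV8 +{d_sov8:.2f}")
```

The Q8 gain is about 11 to 12 percentage points on CB513, Manesh215, and Carugo338, and noticeably smaller (under 5 points) on CASP9.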
Results & Discussions - cont.
Comparison of 7-fold cross-validation prediction accuracies (%) in eight states when templates with different sequence similarities are used

Sequence similarity (%)  (0, 10]  (10, 20]  (20, 40]  (40, 70]  (70, 95]
# of chains              4,426    4,215     3,204     1,437     1,133
QH                       92.05    92.70     93.60     94.97     95.94
QG                       22.07    23.93     35.09     55.03     69.44
QI                        0.00     0.00      0.00      0.00      0.00
QE                       83.37    84.53     86.59     90.16     93.61
QB                        1.53     3.59      7.24     22.30     44.26
QT                       53.35    55.34     60.89     69.66     77.06
QS                       22.83    26.41     35.19     54.09     73.40
QC                       66.55    67.84     71.81     79.56     86.80
Q8                       71.33    73.01     76.29     82.11     88.01
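
The similarity ranges above are half-open intervals: each excludes its lower bound and includes its upper bound. The following small sketch assigns a similarity value to one of these bins; the function name is hypothetical, and treating values outside (0, 95] as unbinned is our assumption.

```python
from bisect import bisect_left

# Upper edges of the bins (0, 10], (10, 20], (20, 40], (40, 70], (70, 95].
UPPER_EDGES = [10, 20, 40, 70, 95]
LABELS = ["(0, 10]", "(10, 20]", "(20, 40]", "(40, 70]", "(70, 95]"]


def similarity_bin(percent):
    """Return the bin label for a similarity in percent, or None if outside (0, 95]."""
    if not 0 < percent <= UPPER_EDGES[-1]:
        return None
    # bisect_left finds the first upper edge >= percent, so an exact edge
    # value such as 10 falls into the bin that it closes.
    return LABELS[bisect_left(UPPER_EDGES, percent)]


print(similarity_bin(10), similarity_bin(10.5), similarity_bin(95))
```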
Results & Discussions - cont.
[Figure: Comparison between template-less and template-based predictions on 1BTN chain A]
Working with C8-Scorpion
• Input a title
• Input your sequence
• Input your e-mail
• Submit, then wait for the results
• Check your e-mail and click the link provided
• The results are displayed
"C8-Scorpion" is available at: http://hpcr.cs.odu.edu/c8scorpion
Summary
• The effectiveness of using structural information in templates has been demonstrated in our computational results, both in 7-fold cross-validation and on benchmarks, where prediction accuracies improve.
• Overall, 78.85% Q8 accuracy and 80.10% SOV8 accuracy are achieved in 7-fold cross-validation.
• More importantly, when good templates are available, the prediction accuracy of less frequent secondary structure states, such as 3-10 helices, turns, and bends, is substantially improved, making the predictions suitable for practical use in applications.
• A webserver (C8-Scorpion) implementing template-less 8-state secondary structure prediction is currently available at http://hpcr.cs.odu.edu/c8scorpion. The integration of template-based prediction into the C8-Scorpion webserver is currently under development.
Acknowledgement
This work is partially supported by NSF grant 1066471 and an ODU 2013 Multidisciplinary Seed grant.
Questions?
Thank You