Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
Sheng Wang, Jian Peng, Jianzhu Ma & Jinbo Xu
Bentley Wingert – 23 March 2016
Outline
Secondary Structure Background
Neural Net Architecture
Performance
Conclusions
Secondary Structure
Secondary structures are defined by common patterns of hydrogen bonds
Dictionary of Protein Secondary Structure (DSSP)
Secondary structure has been predicted with neural nets since the 1980s.*
Letter  Description
G       3-10 helix
H       Alpha helix
I       Pi helix
T       Turn
E       Beta sheet
B       Beta bridge
S       Bend
C       Coil
*Qian, Ning, and Terrence J. Sejnowski. "Predicting the secondary structure of globular proteins using neural network models." Journal of Molecular Biology 202.4 (1988): 865-884.
Neural Net Overview
Deep Convolutional Neural Net (DCNN)
W: Neural net parameters
Conditional Random Field (CRF)
U: Parameters between the top DCNN layer and the label layer
T: Parameters modeling the correlation between labels of adjacent residues
λ: L2 regularization factor
Figure panels: A) Traditional deep neural net; B) Deep convolutional neural net
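Together, W, U, and T define a conditional distribution over the label sequence Y = (y_1, ..., y_L) of a protein X with L residues. The slides do not show the formula, so the following is a sketch assuming the standard linear-chain CRF form with bias terms omitted; o_{i,j}(X; W) (output j of the top DCNN layer at residue i) is our notation, not the authors'.

```latex
P_\theta(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{i=1}^{L} \sum_{j} U_{y_i, j}\, o_{i,j}(X; W) + \sum_{i=2}^{L} T_{y_{i-1}, y_i} \Big)

Z(X) = \sum_{Y'} \exp\Big( \sum_{i=1}^{L} \sum_{j} U_{y'_i, j}\, o_{i,j}(X; W) + \sum_{i=2}^{L} T_{y'_{i-1}, y'_i} \Big)
```

In this form W enters only through the per-residue (unary) term, while T is the only place where neighboring labels interact.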
Deep Convolutional Neural Net
Window size = 11 nodes
Nk = 5
Activation function h(): sigmoid or tanh
Weights are shared by all positions within a layer
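One convolutional layer of this kind can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `conv_layer`, the zero padding at the sequence ends, and the weight layout (window × input features × hidden units) are assumptions. Because the same weights are applied at every position, the parameter count does not depend on protein length.

```python
import math
from typing import Callable, List


def conv_layer(
    x: List[List[float]],              # L positions x D input features
    weights: List[List[List[float]]],  # window x D x H, shared by all positions
    bias: List[float],                 # H biases
    window: int = 11,
    act: Callable[[float], float] = math.tanh,
) -> List[List[float]]:
    """Hypothetical 1D convolutional layer: each output position sees
    `window` neighboring input positions through the same shared weights."""
    half = window // 2
    length, dim = len(x), len(x[0])
    out = []
    for i in range(length):
        pre = list(bias)  # pre-activation value for each hidden unit
        for w in range(window):
            j = i + w - half              # input position under window slot w
            if not 0 <= j < length:       # zero padding beyond the sequence ends
                continue
            for d in range(dim):
                for h in range(len(bias)):
                    pre[h] += weights[w][d][h] * x[j][d]
        out.append([act(v) for v in pre])
    return out


# 3 residues, 1 feature, 1 hidden unit, window of 3, identity activation.
hidden = conv_layer([[1.0], [2.0], [3.0]], [[[1.0]], [[1.0]], [[1.0]]], [0.0],
                    window=3, act=lambda v: v)
print(hidden)  # [[3.0], [6.0], [5.0]]
```

Stacking several such layers, each taking the previous layer's output as `x`, gives a deep convolutional network whose receptive field grows with depth.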
Conditional Random Field
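Given per-residue label scores (the top DCNN layer passed through U) and the transition scores T, one standard way to pick the highest-scoring label sequence of a linear-chain CRF is the Viterbi algorithm. The slides do not say how DeepCNF decodes its output, so this is a hedged sketch; `viterbi`, `unary`, and `trans` are hypothetical names.

```python
from typing import List, Tuple


def viterbi(unary: List[List[float]], trans: List[List[float]]) -> Tuple[List[int], float]:
    """Best label path for a linear-chain CRF (hypothetical helper).

    unary[i][b]: score of label b at residue i (e.g. U applied to the top DCNN layer).
    trans[a][b]: score for label a at residue i-1 followed by label b at residue i (T).
    """
    num_labels = len(unary[0])
    best = list(unary[0])           # best path score ending in each label so far
    backptrs: List[List[int]] = []  # best previous label for each label, per position
    for i in range(1, len(unary)):
        scores, ptrs = [], []
        for b in range(num_labels):
            a_best = max(range(num_labels), key=lambda a: best[a] + trans[a][b])
            scores.append(best[a_best] + trans[a_best][b] + unary[i][b])
            ptrs.append(a_best)
        best = scores
        backptrs.append(ptrs)
    # Trace the best path backwards from the highest-scoring final label.
    last = max(range(num_labels), key=lambda b: best[b])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]


# Two labels (0 and 1); transitions strongly favor staying in the same state.
path, score = viterbi([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [[2.0, -5.0], [-5.0, 2.0]])
print(path, score)  # [0, 0, 0] 6.0
```

The example shows the role of T: on its own, the unary scores would favor labels 0, 1, 0, but the transition scores make switching labels too costly.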
Training Data
Datasets
CullPDB: 6125 proteins (added before May 2012); ~5600 for training, ~500 for testing
CB513: 513 proteins
CASP10: 123 proteins
CASP11: 105 proteins
CAMEO: 179 proteins (5 Dec 2014 to 29 May 2015)
Input
21-element binary (one-hot) vector encoding the amino acid at that position
Sigmoid-transformed PSSM from PSI-BLAST
E-value threshold 0.001
3 iterations
42 features per residue
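The per-residue input encoding can be sketched as follows. Running PSI-BLAST (E-value 0.001, 3 iterations) is outside the sketch. The sketch assumes, hypothetically, that the PSSM row already has 21 columns: PSI-BLAST itself reports 20 amino-acid columns, and the slide does not say what the 21st PSSM feature is. It also assumes that non-standard residues map to the 21st one-hot slot. The names `one_hot`, `sigmoid`, and `residue_features` are illustrative.

```python
import math
from typing import List

# 20 standard amino acids; slot 20 is used for anything else (assumed encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """21-element binary vector for one residue."""
    vec = [0.0] * 21
    idx = AMINO_ACIDS.find(residue.upper()) if len(residue) == 1 else -1
    vec[idx if idx >= 0 else 20] = 1.0
    return vec


def sigmoid(score: float) -> float:
    # Logistic transform maps raw PSSM log-odds scores into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


def residue_features(residue: str, pssm_row: List[float]) -> List[float]:
    """42 features: 21 one-hot entries followed by 21 sigmoid-transformed PSSM scores."""
    if len(pssm_row) != 21:
        raise ValueError("expected a 21-column PSSM row")
    return one_hot(residue) + [sigmoid(s) for s in pssm_row]


feats = residue_features("W", [0.0] * 20 + [3.0])
print(len(feats))  # 42
```

Squashing the PSSM scores with a sigmoid keeps them on the same 0 to 1 scale as the binary amino-acid features.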
Training Technique
θ = {W, T, U}
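The equations on the training slides did not survive extraction. A common objective for a network with a CRF output layer, and one consistent with the L2 regularization factor λ, is the negative conditional log-likelihood summed over training proteins plus λ times the squared L2 norm of θ (scaling conventions for the penalty vary). The sketch below only evaluates that objective, using the forward algorithm for log Z(X). It takes the unary scores as given, omits gradients, and uses hypothetical names throughout.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(unary: List[List[float]], trans: List[List[float]], labels: List[int]) -> float:
    """-log P(labels | X) for a linear-chain CRF; unary scores assumed to come from W and U."""
    num_labels = len(unary[0])
    # Score of the observed (gold) label path.
    gold = unary[0][labels[0]]
    for i in range(1, len(unary)):
        gold += trans[labels[i - 1]][labels[i]] + unary[i][labels[i]]
    # Forward algorithm: alpha[b] = log of summed exp-scores of all paths ending in label b.
    alpha = list(unary[0])
    for i in range(1, len(unary)):
        alpha = [
            logsumexp([alpha[a] + trans[a][b] for a in range(num_labels)]) + unary[i][b]
            for b in range(num_labels)
        ]
    return logsumexp(alpha) - gold  # log Z(X) minus the gold path score


def objective(examples: List[Tuple[List[List[float]], List[int]]],
              trans: List[List[float]], params: List[float], lam: float) -> float:
    """Summed NLL over proteins plus lam * ||theta||^2 (params = flattened theta, hypothetical)."""
    nll = sum(crf_nll(unary, trans, labels) for unary, labels in examples)
    return nll + lam * sum(p * p for p in params)


# All scores zero: each of the 2**3 label paths is equally likely, so NLL = 3 * log(2).
print(round(crf_nll([[0.0, 0.0]] * 3, [[0.0, 0.0], [0.0, 0.0]], [0, 1, 0]), 4))  # 2.0794
```

Working in log space through `logsumexp` avoids overflow when path scores are large, which matters for long proteins.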
Evaluation
Q3 accuracy
Percentage of residues whose 3-state secondary structure is correctly predicted
Q8 accuracy
Percentage of residues whose 8-state secondary structure is correctly predicted
SOV (Segment OVerlap)
How well predicted secondary structure segments match the observed segments
Penalizes segments whose lengths do not match
Small penalty for errors at the end of a segment, large penalty for errors in the middle
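Q8 and Q3 accuracy can be sketched directly from their definitions. The sketch assumes the commonly used reduction of the 8 DSSP letters to 3 states (G, H, I to helix; E, B to strand; T, S, C to coil). The slides do not list this mapping, and a predictor could also output 3 states directly. SOV is not sketched because its segment-overlap scoring has several published variants.

```python
from typing import Dict

# Common reduction of the 8 DSSP states to 3 (an assumption; the slides do not list it).
EIGHT_TO_THREE: Dict[str, str] = {
    "G": "H", "H": "H", "I": "H",  # helices -> H
    "E": "E", "B": "E",            # strand and bridge -> E
    "T": "C", "S": "C", "C": "C",  # turn, bend, coil -> C
}


def accuracy(pred: str, true: str) -> float:
    """Percentage of residues whose predicted label equals the observed label."""
    if len(pred) != len(true) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def to_three_state(labels8: str) -> str:
    return "".join(EIGHT_TO_THREE[c] for c in labels8)


def q8(pred8: str, true8: str) -> float:
    return accuracy(pred8, true8)


def q3(pred8: str, true8: str) -> float:
    return accuracy(to_three_state(pred8), to_three_state(true8))


true8 = "HHHGEEBTSC"
pred8 = "HHGHEEETCC"
print(q8(pred8, true8), q3(pred8, true8))  # 60.0 100.0
```

The example shows why Q8 is the harder metric: confusing G with H, B with E, or S with C counts as an error in Q8 but not in Q3.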
Q3 Accuracy Results
Q8 Accuracy Results
SOV Scores
Recall and Precision
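The figure for this slide is not available. Per-label recall and precision can be computed from predicted and observed label strings using the standard definitions: precision is correct predictions of a label divided by all predictions of that label, and recall is correct predictions divided by all observed occurrences. Returning 0.0 when a denominator is zero is our convention, and `precision_recall` is a hypothetical helper.

```python
from typing import Dict, Tuple


def precision_recall(pred: str, true: str) -> Dict[str, Tuple[float, float]]:
    """Per-label (precision, recall); 0.0 when a denominator is zero (our convention)."""
    if len(pred) != len(true):
        raise ValueError("sequences must have equal length")
    result = {}
    for label in sorted(set(pred) | set(true)):
        tp = sum(p == label and t == label for p, t in zip(pred, true))
        n_pred, n_true = pred.count(label), true.count(label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        result[label] = (precision, recall)
    return result


print(precision_recall("HHEC", "HECC"))
# {'C': (1.0, 0.5), 'E': (0.0, 0.0), 'H': (0.5, 1.0)}
```

Per-label scores expose what overall accuracy hides. Rare states such as B or I can have low recall even when Q8 is high.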
JPRED Validation
Conclusions
DeepCNF outperforms current state-of-the-art secondary structure predictors on common test sets
The improvement is believed to come from the network architecture itself rather than from the training data
The authors plan to apply DeepCNF to other sequence labeling problems:
Solvent accessibility
Ordered/disordered regions