Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
Sheng Wang, Jian Peng, Jianzhu Ma & Jinbo Xu
Presented by Bentley Wingert – 23 March 2016
Outline
- Secondary Structure Background
- Neural Net Architecture
- Performance
- Conclusions
Secondary Structure
- Secondary structures are defined by patterns of hydrogen bonds in common configurations
- Eight-state classification from the Dictionary of Protein Secondary Structure (DSSP)
- Secondary structure prediction with neural nets dates back to the 1980s*
Letter  Description
G       3-10 helix
H       Alpha helix
I       Pi helix
T       Turn
E       Beta sheet
B       Beta bridge
S       Bend
C       Coil
*Qian, Ning, and Terrence J. Sejnowski. "Predicting the secondary structure of globular proteins using
neural network models." Journal of molecular biology 202.4 (1988): 865-884.
Neural Net Overview
- Deep Convolutional Neural Net (DCNN)
  - W: neural net parameters
- Conditional Random Field (CRF)
  - U: weights between the top and label layers
  - T: correlation between adjacent residues of the label layer
- λ: L2 regularization factor
- Figure: A) traditional deep neural net; B) convolutional deep neural net
Deep Convolutional Neural Net
- Window size = 11 nodes
- Nk = 5
- Activation function h() is either sigmoid or tanh
- Weights are shared by all positions in a layer
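As a concrete illustration of the windowed, weight-shared layer described above, here is a minimal NumPy sketch. The function name, the zero-padding at the sequence ends, and the choice of tanh for h() are assumptions, not the authors' implementation:

```python
import numpy as np

def conv_layer(features, weights, bias, window=11):
    """One convolutional layer whose weights are shared by every position.

    features: (seq_len, n_in)        per-residue input features
    weights:  (window * n_in, n_out) shared weights for the 11-residue window
    bias:     (n_out,)
    """
    seq_len, n_in = features.shape
    half = window // 2
    # Zero-pad so positions near the sequence ends still see a full window
    # (an assumption; the paper does not spell out its boundary handling).
    padded = np.vstack([np.zeros((half, n_in)), features, np.zeros((half, n_in))])
    out = np.empty((seq_len, weights.shape[1]))
    for i in range(seq_len):
        ctx = padded[i:i + window].ravel()      # flattened 11-residue context
        out[i] = np.tanh(ctx @ weights + bias)  # h() = tanh in this sketch
    return out

# Example: 42 input features per residue (as in the paper), 50 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 42))
w = rng.normal(scale=0.1, size=(11 * 42, 50))
b = np.zeros(50)
h = conv_layer(x, w, b)
print(h.shape)  # (30, 50)
```

Because the same `weights` matrix is applied at every position, the parameter count is independent of sequence length, which is what weight sharing buys over a traditional fully connected deep net.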
Conditional Random Field
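Using the U and T parameters defined in the overview, the CRF's unnormalized score for a candidate label sequence can be sketched as follows; the function and variable names are illustrative, not the paper's:

```python
import numpy as np

def crf_sequence_score(top_layer, labels, U, T):
    """Unnormalized log-score of one label sequence under a linear-chain CRF.

    top_layer: (seq_len, n_hidden)  output of the top DCNN layer
    labels:    (seq_len,)           integer label at each residue
    U:         (n_hidden, n_labels) weights between top and label layers
    T:         (n_labels, n_labels) correlation between adjacent labels
    """
    emit = top_layer @ U                       # per-position label scores
    score = emit[np.arange(len(labels)), labels].sum()
    score += T[labels[:-1], labels[1:]].sum()  # adjacent-residue correlations
    return score

# Tiny hand-checkable example (2 residues, 2 hidden units, 2 labels).
top_layer = np.array([[1.0, 0.0], [0.0, 1.0]])
U = np.array([[2.0, 0.0], [0.0, 3.0]])
T = np.array([[0.0, 5.0], [0.0, 0.0]])
labels = np.array([0, 1])
print(crf_sequence_score(top_layer, labels, U, T))  # 10.0
```

Normalizing this score over all label sequences (via the forward algorithm) yields the CRF's conditional probability; only the unnormalized score is shown here.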
Training Data
- Datasets
  - CullPDB: 6125 proteins (added before May 2012), ~5600 training, ~500 test
  - CB513: 513 proteins
  - CASP10: 123 proteins
  - CASP11: 105 proteins
  - CAMEO: 179 proteins (5 Dec 2014 – 29 May 2015)
- Input
  - 21-element binary vector encoding the amino acid at each position
  - Sigmoid-transformed PSSM from PSI-BLAST (E-value threshold 0.001, 3 iterations)
  - 42 features per residue in total
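The two input feature groups above can be assembled per residue like this; the one-hot ordering, the convention for the 21st amino-acid slot, and the assumption that the PSSM row already has 21 columns are mine, not the paper's:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; index 20 = other

def residue_features(aa, pssm_row):
    """Build the 42-dim input vector for one residue: a 21-element one-hot
    amino-acid encoding plus 21 sigmoid-transformed PSSM scores."""
    one_hot = np.zeros(21)
    one_hot[AMINO_ACIDS.index(aa) if aa in AMINO_ACIDS else 20] = 1.0
    # Sigmoid squashes the raw PSI-BLAST log-odds scores into (0, 1).
    pssm = 1.0 / (1.0 + np.exp(-np.asarray(pssm_row, dtype=float)))
    return np.concatenate([one_hot, pssm])  # 42 features per residue

feat = residue_features("A", np.zeros(21))
print(feat.shape)  # (42,)
```

Stacking these vectors for every residue of a protein gives the (seq_len, 42) feature matrix consumed by the first convolutional layer.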
Training Technique

𝜃 = {𝑊, 𝑇, 𝑈}
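Given the parameter set 𝜃 = {𝑊, 𝑇, 𝑈} and the L2 regularization factor λ introduced earlier, a standard training objective for this kind of model, shown here as a sketch and not necessarily in the paper's exact notation, maximizes the regularized conditional log-likelihood over the training proteins:

```latex
% theta = {W, T, U}; lambda is the L2 regularization factor
\max_{\theta}\; \sum_{n=1}^{N} \log P\!\left(y^{(n)} \mid x^{(n)}; \theta\right)
\;-\; \lambda \,\lVert \theta \rVert_2^2
```

Here \(x^{(n)}\) is the feature matrix of protein \(n\) and \(y^{(n)}\) its 8-state label sequence; the gradient flows through both the CRF parameters (U, T) and the convolutional weights W.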
Evaluation
- Q3 accuracy: percentage of residues with correctly predicted 3-state secondary structure (helix, strand, coil)
- Q8 accuracy: percentage of residues with correctly predicted 8-state secondary structure
- SOV (Segment OVerlap): measures how well predicted segments of secondary structure match the observed segments
  - Penalizes predictions whose segment lengths do not match
  - Lower penalty for errors at the ends of a segment, larger penalty for errors in the middle
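Q8 counts exact matches against the eight DSSP labels, while Q3 first collapses them to three states. A common mapping (H, G, I to helix; E, B to strand; the rest to coil) gives a simple sketch; the paper may use a slightly different convention:

```python
# Common 8-state-to-3-state collapse: H, G, I -> helix (H);
# E, B -> strand (E); T, S, C -> coil (C).
THREE_STATE = {"H": "H", "G": "H", "I": "H",
               "E": "E", "B": "E",
               "T": "C", "S": "C", "C": "C"}

def q_accuracy(pred, true):
    """Fraction of residues with correctly predicted labels
    (Q8 on 8-state strings, Q3 on collapsed 3-state strings)."""
    assert len(pred) == len(true)
    return sum(p == t for p, t in zip(pred, true)) / len(true)

def q3(pred8, true8):
    """Q3 accuracy computed from 8-state strings via the mapping above."""
    to3 = lambda s: "".join(THREE_STATE[c] for c in s)
    return q_accuracy(to3(pred8), to3(true8))

print(q3("HGECT", "HHEBS"))  # 0.8
```

In the example, G counts as correct against H (both collapse to helix) and B against E (both strand), so only the fourth residue is wrong.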
Letter  Description
G       3-10 helix
H       Alpha helix
I       Pi helix
T       Turn
E       Beta sheet
B       Beta bridge
S       Bend
C       Coil
Q3 Accuracy Results

Q8 Accuracy Results

SOV Scores

Recall and Precision

JPRED Validation
Conclusions
- DeepCNF outperforms current state-of-the-art secondary structure predictors on common test sets
- The gain is believed to come from the structure of the neural net itself, not from the training data
- The authors plan to apply the method to other sequence labeling problems:
  - Solvent accessibility
  - Ordered/disordered regions