Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science
Predicting Protein-RNA Binding Sites Using Structural Information
Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar
Introduction
RNA molecules play diverse functional and structural roles in cells:
- messengers for transferring genetic information from DNA to proteins
- primary genetic material in many viruses
- enzymes important for protein synthesis and RNA processing
- essential and ubiquitous regulators of gene expression in living organisms
These functions depend on interactions between RNA molecules and specific
proteins in cells.

Struct-SVM Classifier
A machine learning classifier that incorporates domain knowledge (that is, the
structure of the protein) to improve classification.
[Diagram labels: Learning System L; Resulting Classifier]
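The flowchart labels on this poster (Xtest,j = surface? yes / no; h(xtest,j) = -1; h(xtest,j) = y) suggest a two-stage rule: a test window that is not among the surface windows is labeled non-interface outright, and only surface windows are passed to the trained classifier. The following is a minimal Python sketch of that reading, not the authors' implementation. The names `struct_predict`, `surface_windows`, and `base_classifier` are hypothetical, and `toy_classifier` is a stand-in for a trained SVM.

```python
from typing import Callable, Iterable, List, Set


def struct_predict(
    windows: Iterable[str],
    surface_windows: Set[str],
    base_classifier: Callable[[str], int],
) -> List[int]:
    """Return -1 for non-surface windows, else the base classifier's label."""
    predictions = []
    for window in windows:
        if window in surface_windows:
            # Surface window: defer to the learned classifier.
            predictions.append(base_classifier(window))
        else:
            # Non-surface window: assumed unable to contact RNA.
            predictions.append(-1)
    return predictions


def toy_classifier(window: str) -> int:
    """Hypothetical stand-in for an SVM: +1 if the central residue is R or K."""
    center = window[len(window) // 2]
    return 1 if center in "RK" else -1


# Illustrative inputs only; the surface set here is made up.
test_windows = ["VLRKS", "ANTPV", "LALVI"]
surface = {"VLRKS", "ANTPV"}
predictions = struct_predict(test_windows, surface, toy_classifier)
```

In this sketch the structural knowledge acts purely as a filter in front of the classifier, so any classifier with a window-to-label interface could be substituted.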
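[Diagram labels: $X_{test,j}$; Collection of Surface Windows; Collection of Non-Surface Windows]

Protein-RNA interface residue identification
Example from 1T0K_B, a stretch of the sequence $x_i$ with its labels
(1 = interface residue, 0 = non-interface residue):
Residue:  A N T P V L R K S
Label:    0 0 1 1 0 0 1 0 0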
Feature Extraction
Sequence:
$x_i = (x_{i,1}, \ldots, x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k}, \ldots, x_{i,m})$
Label:
$y_i = (y_{i,1}, \ldots, y_{i,j-k}, \ldots, y_{i,j}, \ldots, y_{i,j+k}, \ldots, y_{i,m})$
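The sequence and label definitions above imply a sliding window of $2k+1$ residues centered on position $j$. A minimal Python sketch of such a windowing step follows. It assumes each window is labeled by the label of its central residue and that positions near the ends are padded with a placeholder character; the poster states neither detail, and the function name `seq_to_windows` is hypothetical.

```python
from typing import List, Tuple


def seq_to_windows(
    sequence: str, labels: List[int], k: int, pad: str = "-"
) -> List[Tuple[str, int]]:
    """Cut a sequence into windows of 2k+1 residues, one per position."""
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    # Assumption: pad both ends so every residue gets a full-width window.
    padded = pad * k + sequence + pad * k
    windows = []
    for j, label in enumerate(labels):
        window = padded[j : j + 2 * k + 1]  # residue j sits at the center
        windows.append((window, label))
    return windows


# The residue stretch and labels shown on the poster for 1T0K_B.
examples = seq_to_windows("ANTPVLRKS", [0, 0, 1, 1, 0, 0, 1, 0, 0], k=2)
```

With `k=2`, the window for the first labeled interface residue (T) is `ANTPV`, and the window for R is `VLRKS`.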
Seq2SeqWins (windowise)
SeqWins2TargetAA
SeqWins2ZeroOne
SeqWins2Blast
SeqWins2SS
SS2ZeroOne
TargetAA2Struct
Struct2Blast
SeqWins2CXValue
SeqWins2Roughness
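The poster does not define these transformations. As one illustration, the name SeqWins2ZeroOne suggests turning each sequence window into a vector of zeros and ones. The Python sketch below shows a common way to do that, a one-hot encoding over the 20 standard amino acids; this is an assumption about what the step does, and the function name `window_to_zero_one` is hypothetical.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_to_zero_one(window: str) -> List[int]:
    """One-hot encode a window: 20 bits per position, concatenated."""
    vector = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            bits[index] = 1
        # Unknown or padding characters stay all-zero (an assumption).
        vector.extend(bits)
    return vector


encoded = window_to_zero_one("VLRKS")
```

A window of 5 residues yields a 100-dimensional binary vector, which is a form an SVM can consume directly.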
Seq2SeqWins output, one window of $2k+1$ residues per position:
$x'_{i,j-1} = (x_{i,j-1-k}, \ldots, x_{i,j-1}, \ldots, x_{i,j-1+k})$
$x'_{i,j} = (x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k})$
$x'_{i,j+1} = (x_{i,j+1-k}, \ldots, x_{i,j+1}, \ldots, x_{i,j+1+k})$
SeqWins2TargetAA output, the central residue of each window:
$x'_{i,j-1} = (x_{i,j-1})$
$x'_{i,j} = (x_{i,j})$
$x'_{i,j+1} = (x_{i,j+1})$

[Diagram labels, Struct-SVM: Training Data with $y_i \in \{0,1\}^*$; Test Data; $X_{test,j}$ = surface? (yes / no); $h(x_{test,j}) = -1$; $h(x_{test,j}) = y$; Final Predictions]

Protein sequence shown in the diagram:
SINQKLALVIKSGK
YTLGYKSTVKSLRQ
GKSKLIIIAANTPV
LRKSELEYYAMLSK
TKVYYFQGGNNELG
TAVGKLFRVGVVSI
LEAGDSDILTTLA

Dataset
RNA-Protein Interface dataset, RB181: consists of RNA-binding protein
sequences extracted from structures of known RNA-protein complexes
solved by X-ray crystallography in the Protein Data Bank.

Results
Fig. 1. Receiver Operating Characteristic (ROC) curves for SVM and
Struct-SVM classifiers on the protein-RNA dataset.

Performance measure        SVM     Struct-SVM
Accuracy                   0.68    0.74
Correlation Coefficient    0.25    0.30
Area Under ROC Curve       0.73    0.76

Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve
for SVM and Struct-SVM.

Conclusions
- Developed a Struct-SVM classifier that takes into account domain knowledge to
  improve identification of protein-RNA interface residues.
- Results show that the ROC curve of Struct-SVM dominates the ROC curve of
  the Support Vector Machine (SVM) classifier.
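For readers who want to reproduce the kind of measures reported in Table 1, the Python sketch below computes accuracy, a correlation coefficient, and the area under the ROC curve from per-residue labels and classifier scores. It assumes the "Correlation Coefficient" is the Matthews correlation coefficient, which the poster does not state, and it computes the area under the curve as the fraction of correctly ordered positive-negative pairs. The toy labels and scores are made up for illustration and are not the poster's data.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of residues whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def correlation_coefficient(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for 0/1 labels (assumed definition)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def area_under_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, not from the poster.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
cc = correlation_coefficient(y_true, y_pred)
auc = area_under_roc(y_true, scores)
```

Accuracy and the correlation coefficient depend on the score threshold (0.5 here, chosen arbitrarily), whereas the area under the ROC curve summarizes performance across all thresholds.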
Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar & Drena Dobbs