Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science
MSCBB 2007
Prediction of RNA-Protein interfaces Using Structural Features
Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar
Abstract
RNA-protein interactions play a critical role in gene expression: from splicing
to translation, proteins must recognize and interact with specific
sites on RNA in order to perform their functions. In this study, 147
chains from RNA-binding proteins in the Protein Data Bank were
characterized according to multiple structural features and the type of RNA
bound to each protein chain. Naïve Bayes classifiers were then
constructed to predict protein-RNA interfaces among the surface residues of the
proteins. The three structural features used in this study were surface
roughness, solid angle, and CX value.
Dataset and Classification
The protein chains in the RB147 dataset, available from the RNAbindr website
(http://bindr.gdcb.iastate.edu/), were classified according to the type of RNA
bound by each chain. The RNA types were then clustered using ANOVA, as
described by Towfic et al. (2007); the resulting groups are shown in Table 1.
A Naïve Bayes classifier with a window size of 12 (Witten and Frank, 2005)
was then trained and evaluated with 10-fold cross-validation on each of the
groups shown in Table 1.
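The classification setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the window runs along the sequence with 6 positions before and 5 after each residue (the poster gives only the size, 12), that each numeric feature is modeled with one Gaussian per class, and that folds are formed by striding over residue indices. The names `window_features`, `k_fold_indices`, and `GaussianNaiveBayes` are hypothetical.

```python
import math

def window_features(values, size=12):
    """Build one feature vector per residue from a sliding sequence window."""
    left = size // 2          # assumption: 6 positions before the residue
    right = size - left - 1   # and 5 after it, so each window has `size` slots
    padded = [None] * left + list(values) + [None] * right
    return [padded[i:i + size] for i in range(len(values))]

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class; None is skipped."""

    def fit(self, X, y):
        self.priors, self.stats = {}, {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.priors[label] = len(rows) / len(X)
            per_feature = []
            for j in range(len(X[0])):
                col = [r[j] for r in rows if r[j] is not None]
                if not col:
                    per_feature.append(None)  # no observed value for this slot
                    continue
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                per_feature.append((mean, var))
            self.stats[label] = per_feature
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {}
            for label, per_feature in self.stats.items():
                log_p = math.log(self.priors[label])
                for v, params in zip(x, per_feature):
                    if v is None or params is None:
                        continue  # padding at chain ends carries no evidence
                    mean, var = params
                    log_p -= 0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var)
                scores[label] = log_p
            predictions.append(max(scores, key=scores.get))
        return predictions
```

In use, the per-residue values of one structural feature (for example roughness) for a chain would be passed through `window_features`, and the resulting vectors, paired with interface/non-interface labels, would be split with `k_fold_indices` and fed to `fit` and `predict`.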
Structural Feature      | Group 1    | Group 2                   | Group 3
CX Value (Alpha Carbon) | tRNA, mRNA | snRNA, rRNA, dsRNA, other | siRNA, SRP RNA, Viral RNA
Roughness Value         | Groups 1 and 2 together: tRNA, SRP RNA, Viral RNA, snRNA, rRNA, siRNA, mRNA, dsRNA, other
Solid Angle Value       | Groups 1 and 2 together: tRNA, SRP RNA, Viral RNA, snRNA, rRNA, siRNA, mRNA, dsRNA, other
Table 1: Clustering of each RNA-binding type based on ANOVA analysis of
the propensities for each chain.
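The poster does not spell out how the ANOVA results were turned into groups, so the following is only a sketch of the underlying statistic: a one-way ANOVA F value comparing per-chain propensities across RNA types. The function name `one_way_anova_f` and the sample numbers are hypothetical; a large F suggests that the RNA types differ in their mean propensity for the feature.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value lists."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical per-chain propensities for three RNA types.
propensities = {
    "tRNA": [1.0, 2.0, 3.0],
    "mRNA": [2.0, 3.0, 4.0],
    "rRNA": [5.0, 6.0, 7.0],
}
f_value = one_way_anova_f(list(propensities.values()))
```

RNA types whose pairwise comparisons show no significant difference could then be placed in the same group; that grouping rule is an assumption, not something stated on the poster.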
Results
As shown in Table 2, clustering the RNA types appears to improve the
correlation coefficient, sensitivity, and specificity in some cases
(CX value (alpha carbon) Group 2, roughness value Group 1, solid angle value
Group 1) while contributing to poor performance in others (CX value (alpha
carbon) Group 3, roughness value Group 2, solid angle value Group 2) compared
to the classifiers that do not use clustering. Accuracy improves alongside the
other three measures only for solid angle value Group 1.
A possible reason for this discrepancy is that the preliminary
clustering using ANOVA may not have been sophisticated enough to identify
subclusters that lie within each group. The poor clustering may in turn have
contributed to the poor classification performance of Naïve Bayes. It is worth
noting, however, that each of the structural features had at least one cluster
where classification performance improved over the "No clustering" baseline.
This result demonstrates the potential of more sophisticated clustering and
classification algorithms to improve the performance of RNA-protein interface
prediction.
Method/Group                          | Accuracy | Correlation Coefficient | Sensitivity+ | Specificity+
CX Value (Alpha Carbon)–No clustering | 0.724    | 0.196                   | 0.346        | 0.399
CX Value (Alpha Carbon)–Group 1       | 0.712    | 0.017                   | 0.235        | 0.142
CX Value (Alpha Carbon)–Group 2       | 0.645    | 0.204                   | 0.408        | 0.524
CX Value (Alpha Carbon)–Group 3       | 0.513    | -0.045                  | 0.401        | 0.088
Roughness Value–No clustering         | 0.736    | 0.212                   | 0.333        | 0.425
Roughness Value–Group 1               | 0.720    | 0.236                   | 0.362        | 0.477
Roughness Value–Group 2               | 0.794    | 0.044                   | 0.171        | 0.153
Solid Angle Value–No clustering       | 0.700    | 0.194                   | 0.357        | 0.428
Solid Angle Value–Group 1             | 0.709    | 0.250                   | 0.374        | 0.517
Solid Angle Value–Group 2             | 0.722    | 0.048                   | 0.263        | 0.167
Table 2: Comparison of the performance of the Naïve Bayes classifier with and
without clustering.
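The poster does not define the four measures reported in Table 2. The sketch below shows one common way to compute them from a confusion matrix, under these assumptions: the "+" suffix refers to the interface (positive) class, Sensitivity+ is TP/(TP+FN), Specificity+ is TP/(TP+FP), and the correlation coefficient is the Matthews correlation coefficient. The function name `interface_metrics` and the counts in the example are hypothetical.

```python
import math

def interface_metrics(tp, fp, tn, fn):
    """Compute accuracy, correlation coefficient, sensitivity+ and specificity+.

    tp/fn: interface residues predicted as interface / non-interface.
    fp/tn: non-interface residues predicted as interface / non-interface.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Assumed definitions for the positive (interface) class.
    sensitivity_pos = tp / (tp + fn) if (tp + fn) else 0.0
    specificity_pos = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "correlation": correlation,
        "sensitivity+": sensitivity_pos,
        "specificity+": specificity_pos,
    }

# Hypothetical counts for one cross-validation run.
scores = interface_metrics(tp=30, fp=20, tn=130, fn=20)
```

Under these definitions accuracy can stay high while the correlation coefficient falls, because non-interface residues usually dominate the data; this is one way to read rows such as Roughness Value–Group 2 in Table 2.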
References
F. Towfic, D. C. Gemperline, C. Caragea, F. Wu, D. Dobbs, and V. Honavar.
Structural Characterization of RNA-Binding Sites of Proteins: Preliminary
Results. Computational Structural Bioinformatics Workshop proceedings,
2007. In press.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and
Techniques, 2nd Edition. Morgan Kaufmann, 2005.
Acknowledgements: This research was supported in part by a grant from the National Institutes of Health (GM066387) to Vasant Honavar and Drena Dobbs, an Integrative Graduate Education and Research Training (IGERT)
fellowship to Fadi Towfic, funded by the National Science Foundation grant (DGE 0504304) to Iowa State University, and a Bioengineering and Bioinformatics Summer Institute (BBSI) fellowship to David Gemperline, funded
by a National Science Foundation award (EEC 0608769) to Iowa State University. This work has benefited from discussions with Dr. Robert Jernigan of Iowa State University.