Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science
MSCBB 2007

Prediction of RNA-Protein Interfaces Using Structural Features
Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar

Abstract
RNA-protein interactions play a critical role in gene expression: from splicing to translation, proteins must recognize and interact with specific sites on RNA in order to perform their functions. In this paper, 147 chains from RNA-binding proteins in the Protein Data Bank were characterized according to multiple structural features and the type of RNA bound to each chain. Naïve Bayes classifiers were then constructed to predict protein-RNA interfaces among the surface residues of the proteins. The three structural features used in this study were surface roughness, solid angle and CX value.

Dataset and Classification
The protein chains in the RB147 dataset, available from the RNABindR website (http://bindr.gdcb.iastate.edu/), were classified according to the type of RNA bound by each chain. The RNA types were then clustered using ANOVA, as described by Towfic et al. (2007); the resulting groups are shown in Table 1. A Naïve Bayes classifier (Witten and Frank, 2005) with a window size of 12 was then trained on each of the groups shown in Table 1 and evaluated with 10-fold cross-validation.

Structural Feature | Group 1 | Group 2 | Group 3
CX Value (Alpha Carbon) | tRNA, mRNA | snRNA, rRNA, dsRNA, other | siRNA, SRP RNA, Viral RNA
Roughness Value | tRNA, SRP RNA, Viral RNA, snRNA, rRNA, siRNA, mRNA, dsRNA | other | (none)
Solid Angle Value | tRNA, SRP RNA, Viral RNA, snRNA, rRNA, siRNA, mRNA, dsRNA | other | (none)

Table 1: Clustering of the RNA types bound, based on ANOVA of the propensities for each chain.
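The windowed Naïve Bayes setup described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the classifier used in the study: the names `window_features`, `GaussianNaiveBayes` and `cross_val_accuracy` are hypothetical, the window is assumed to place half of its residues before the target residue, numeric features are assumed to follow a per-class Gaussian, folds are assumed to be drawn over residues, and the toy values are invented for demonstration only.

```python
import math
from collections import defaultdict


def window_features(values, size=12, pad=0.0):
    """One feature vector per residue: the feature values in a window of
    `size` residues around it (size // 2 before it, the rest from the residue
    onward). Chain ends are padded with `pad`. The centering is an assumption."""
    before = size // 2
    padded = [pad] * before + list(values) + [pad] * (size - before - 1)
    return [padded[i:i + size] for i in range(len(values))]


class GaussianNaiveBayes:
    """Tiny Naive Bayes with a Gaussian likelihood per feature (an assumption)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            stats = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                stats.append((mean, var))
            self.params[label] = (math.log(len(rows) / len(X)), stats)
        return self

    def _log_posterior(self, label, row):
        log_prior, stats = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(c, row))
                for row in X]


def cross_val_accuracy(X, y, folds=10):
    """Accuracy under k-fold cross-validation with interleaved folds."""
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = GaussianNaiveBayes().fit(train_X, train_y)
        for i in test_idx:
            correct += model.predict([X[i]])[0] == y[i]
    return correct / len(X)


# Toy per-residue values for a hypothetical structural feature (not real data);
# label 1 marks an interface residue, 0 a non-interface residue.
toy_values = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8, 0.2, 0.1, 0.9, 1.1]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
X = window_features(toy_values, size=3)
predictions = GaussianNaiveBayes().fit(X, toy_labels).predict(X)
toy_accuracy = cross_val_accuracy(X, toy_labels, folds=5)
```

In this sketch each surface residue is represented by the feature values of its sequence neighbours, so the classifier sees local context rather than a single value. One classifier of this kind would be trained per group in Table 1.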
Results
As shown in Table 2, clustering the RNA types appears to improve prediction accuracy, correlation coefficient, sensitivity and specificity in some cases (alpha carbon group 2, roughness value group 1, solid angle value group 1) while degrading performance in others (alpha carbon group 3, roughness value group 2, solid angle value group 2), compared to the classifiers that do not use clustering. A possible reason for this discrepancy is that the preliminary ANOVA-based clustering may not have been sophisticated enough to identify subclusters within each group, and the resulting poor clusters may in turn have contributed to the poor classification performance of Naïve Bayes. Nevertheless, each structural feature had at least one cluster in which classification performance increased relative to the "No clustering" baseline. This result suggests that more sophisticated clustering and classification algorithms could improve the performance of RNA-protein interface prediction.

Method/Group | Accuracy | Correlation Coefficient | Sensitivity+ | Specificity+
CX Value (Alpha Carbon), No clustering | 0.724 | 0.196 | 0.346 | 0.399
CX Value (Alpha Carbon), Group 1 | 0.712 | 0.017 | 0.235 | 0.142
CX Value (Alpha Carbon), Group 2 | 0.645 | 0.204 | 0.408 | 0.524
CX Value (Alpha Carbon), Group 3 | 0.513 | -0.045 | 0.401 | 0.088
Roughness Value, No clustering | 0.736 | 0.212 | 0.333 | 0.425
Roughness Value, Group 1 | 0.720 | 0.236 | 0.362 | 0.477
Roughness Value, Group 2 | 0.794 | 0.044 | 0.171 | 0.153
Solid Angle Value, No clustering | 0.700 | 0.194 | 0.357 | 0.428
Solid Angle Value, Group 1 | 0.709 | 0.250 | 0.374 | 0.517
Solid Angle Value, Group 2 | 0.722 | 0.048 | 0.263 | 0.167

Table 2: Comparison of the performance of the Naïve Bayes classifier with and without clustering.

References
F. Towfic, D. C. Gemperline, C. Caragea, F. Wu, D. Dobbs, and V. Honavar. Structural Characterization of RNA-Binding Sites of Proteins: Preliminary Results.
Computational Structural Bioinformatics Workshop Proceedings, 2007. In press.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, 2005.

Acknowledgements: This research was supported in part by a grant from the National Institutes of Health (GM066387) to Vasant Honavar and Drena Dobbs, an Integrative Graduate Education and Research Training (IGERT) fellowship to Fadi Towfic, funded by the National Science Foundation grant (DGE 0504304) to Iowa State University, and a Bioengineering and Bioinformatics Summer Institute (BBSI) fellowship to David Gemperline, funded by a National Science Foundation award (EEC 0608769) to Iowa State University. This work has benefited from discussions with Dr. Robert Jernigan of Iowa State University.