Download Document

Improving Protein-RNA Interface Prediction by Combining a Sequence Homology-based Method with a Naïve Bayes Classifier: Preliminary Results Li Xue1,2, Rasna Walia1,2, Yasser El-Manzalawy2,4, Drena Dobbs1,3, Vasant Honavar1,2 1 Bioinformatics & Computational Biology Program; 2 Dept. of Computer Science; 3 Dept. of Genetics, Development & Cell Biology, Iowa State University; 4 Dept. of Systems & Computer Engineering, Al-Azhar University, Cairo, Egypt Method Protein-RNA interactions play important roles in cellular processes including protein synthesis, RNA processing, and gene expression regulation. Reliable identification of the interfaces involved in protein-RNA interactions is essential for comprehending their mechanisms and functional implications and provides a valuable guide for rational drug discovery and design. • NR216 – for analyzing protein interface conservation • RB199 – for testing the prediction performance of HomPRIP & its combination with a NB classifier • nr_RNAprot_s2c – for searching for putative sequence homologs using BLASTP Experimental determination of interfaces in protein-RNA complexes is time-consuming and expensive. Thus computational techniques for predicting RNA-binding sites on proteins are valuable. Here we propose a novel family of sequence homology-based methods: Query protein sequence Search nr_RNAprot_s2c to find homologous sequences • HomPRIP uses interface information from putative homologs of a query protein to predict interface residues in the query protein. • When no sequence homologs for the query protein can be found, HomPRIP-NB uses a Naïve Bayes (NB) classifier trained on evolutionary information derived from protein sequences in the NCBI nr database to return interface predictions. Homologous sequences found? Yes Safe zone Twilight zone Dark zone No HomPRIPNB returns predicted interface residues HomPRIP returns predicted interface residues http://einstein.cs.iastate.edu/HomPRIP-NB Protein-RNA Interface Conservation Homology Zones Results & Conclusion • Support Vector Machine & Naïve Bayes classifiers were trained using three different features: • amino acid identity • PSSM profiles • smoothed PSSM profiles and evaluated using five-fold cross-validation. ICscore Cutoff Safe Zone 0.70 Twilight Zone 1 0.50 Twilight Zone 2 0.40 Twilight Zone 3 0.20 Dark Zone 0.15 Classifiers CC Sensitivity F1 PPV Accuracy Sequence ID 0.24 0.64 0.37 0.26 0.67 Sequence PSSM 0.32 0.70 0.43 0.31 0.72 Smoothed Sequence PSSM 0.30 0.70 0.41 0.29 0.70 Sequence ID 0.24 0.65 0.37 0.26 0.68 Sequence PSSM 0.34 0.72 0.44 0.32 0.73 Smoothed Sequence PSSM 0.32 0.70 0.43 0.31 0.73 Sequence ID 0.28 0.17 0.27 0.62 0.86 Sequence PSSM 0.25 0.60 0.38 0.27 0.71 Smoothed Sequence PSSM 0.25 0.66 0.38 0.27 0.68 HomPRIP 0.69 0.70 0.74 0.78 0.91 HomPRIP-NB 0.51 0.67 0.59 0.53 0.86 SVM with Polynomial Kernel (default parameters) SVM with RBF Kernel (default parameters) An interface conservation score (ICscore) is calculated as a measurement of the similarity of a homolog’s interface residues to those of the query protein. A regression model is used to calculate the ICscore, based on BLAST sequence alignment statistics. Naïve Bayes • Performance of HomPRIP is reported only for 71% of complexes in the RB199 dataset (those for which homologs could be found); HomPRIP-NB returned predictions for the entire RB199 dataset. Future Directions Safe zone: a high degree of conservation (red data points) Twilight zone: moderate conservation of interfaces (yellow & orange data points) Dark zone: poor conservation of interfaces (blue data points) Acknowledgements Funding provided by: NIH GM 066387 Features Ongoing work is aiming at comparing HomPRIP-NB with other publically available servers that predict RNAbinding sites on proteins (e.g., BindN, PiRaNha, PRIP, RNABindR), using an independent test set. 1. 2. B.A. Lewis, R.R. Walia, M. Terribilini, J. Ferguson, C. Zheng, V. Honavar, and D. Dobbs. PRIDB: a protein–RNA interface database. Nucleic Acids Research, 39(suppl 1):D277, 2011. L.C. Xue, D. Dobbs, and V. Honavar. HOMPPI: A class of sequence homology based protein-protein interface prediction methods. BMC Bioinformatics, 12:244, 2011.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document