Predicting Protein-RNA Binding Sites Using Structural Information

Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar

Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science

Introduction

RNA molecules play diverse functional and structural roles in cells:
* messengers for transferring genetic information from DNA to proteins
* primary genetic material in many viruses
* enzymes important for protein synthesis and RNA processing
* essential and ubiquitous regulators of gene expression in living organisms

These functions depend on interactions between RNA molecules and specific proteins in cells.

Protein-RNA interface residue identification

Each residue of a protein sequence carries a binary label. Example window from chain 1T0K_B:

Residue:  A  N  T  P  V  L  R  K  S
Label:    0  0  1  1  0  0  1  0  0

Dataset

RNA-Protein Interface dataset, RB181: consists of RNA-binding protein sequences extracted from structures of known RNA-protein complexes solved by X-ray crystallography in the Protein Data Bank.

Example sequence (1T0K_B):
SINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVLRKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDILTTLA

Feature Extraction

Sequence: $x_i = (x_{i,1}, \ldots, x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k}, \ldots, x_{i,m})$
Label: $y_i = (y_{i,1}, \ldots, y_{i,j-k}, \ldots, y_{i,j}, \ldots, y_{i,j+k}, \ldots, y_{i,m})$, with $y_i \in \{0,1\}^*$

Seq2SeqWins (windowise) maps each position to a window of $2k+1$ residues:
$x'_{i,j-1} = (x_{i,j-1-k}, \ldots, x_{i,j-1}, \ldots, x_{i,j-1+k})$
$x'_{i,j} = (x_{i,j-k}, \ldots, x_{i,j}, \ldots, x_{i,j+k})$
$x'_{i,j+1} = (x_{i,j+1-k}, \ldots, x_{i,j+1}, \ldots, x_{i,j+1+k})$

SeqWins2TargetAA reduces each window to its central (target) residue:
$x'_{i,j-1} = (x_{i,j-1})$
$x'_{i,j} = (x_{i,j})$
$x'_{i,j+1} = (x_{i,j+1})$

Transformations shown in the feature extraction diagram: Seq2SeqWins, SeqWins2TargetAA, SeqWins2ZeroOne, SeqWins2Blast, SeqWins2SS, SS2ZeroOne, TargetAA2Struct, Struct2Blast, SeqWins2CXValue, SeqWins2Roughness.

Struct-SVM Classifier

A machine learning classifier that incorporates domain knowledge (that is, the structure of the protein) to improve classification.

[Diagram: Training Data -> Learning System L -> Resulting Classifier; Test Data -> Final Predictions. Windows are split into a Collection of Surface Windows and a Collection of Non-Surface Windows. For a test window $x_{test,j}$, the diagram asks whether $x_{test,j}$ is a surface window: if yes, $h(x_{test,j}) = y$; if no, $h(x_{test,j}) = -1$.]

Results

Fig. 1. Receiver Operating Characteristic (ROC) curves for SVM and Struct-SVM classifiers on the protein-RNA dataset.

Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve for SVM and Struct-SVM.

Performance measure        SVM     Struct-SVM
Accuracy                   0.68    0.74
Correlation Coefficient    0.25    0.30
Area Under ROC Curve       0.73    0.76

Conclusions

* Developed the Struct-SVM classifier, which takes domain knowledge into account to improve identification of protein-RNA interface residues.
* Results show that the ROC curve of Struct-SVM dominates the ROC curve of the Support Vector Machine (SVM) classifier.

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar & Drena Dobbs.
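
The Seq2SeqWins and SeqWins2TargetAA transformations named on the poster can be illustrated with a minimal Python sketch. It makes assumptions the poster does not state: a window is emitted only where all 2k+1 residues fit inside the sequence (boundary handling is not specified in the source), each window takes the label of its central residue, and the function names `sequence_windows` and `target_residue` are hypothetical. The fragment and labels are the 1T0K_B example shown on the poster.

```python
from typing import List, Tuple


def sequence_windows(sequence: str, labels: List[int], k: int) -> List[Tuple[str, int]]:
    """Sketch of Seq2SeqWins: slide a window of 2k+1 residues over the sequence.

    Assumption: only positions with a full window are kept, and each window
    is paired with the label of its central residue.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    pairs = []
    for j in range(k, len(sequence) - k):
        window = sequence[j - k : j + k + 1]
        pairs.append((window, labels[j]))
    return pairs


def target_residue(window: str) -> str:
    """Sketch of SeqWins2TargetAA: keep only the central residue of a window."""
    return window[len(window) // 2]


# Fragment of chain 1T0K_B with the interface labels given on the poster.
fragment = "ANTPVLRKS"
fragment_labels = [0, 0, 1, 1, 0, 0, 1, 0, 0]

pairs = sequence_windows(fragment, fragment_labels, k=1)
targets = [target_residue(window) for window, _ in pairs]
```

With k=1 this yields seven three-residue windows, one for each interior residue of the fragment. A real pipeline would also need a rule for residues near the sequence ends, for example padding, which is left out here.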