Abstract: We propose a frequent pattern-based algorithm for predicting the functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm on protein subcellular localization and protein function datasets. A nearest neighbor classifier based on pairwise sequence alignment and a basic SVM k-gram classifier are included as comparison algorithms. Results show that the frequent substring-based SVM classifier outperforms the other classifiers on the subcellular localization datasets and performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.
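The feature-construction idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the six-group alphabet in `GROUPS`, the function names, the support threshold, and the substring length bounds are hypothetical and are not taken from the paper. The sketch also reduces the sequences before mining, which is a simplification; the abstract describes mining frequent substrings first and then transforming them into different alphabets.

```python
from collections import Counter

# Illustrative grouping only (an assumption); the paper's actual reduced
# alphabets are not specified in the abstract.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # small / conformationally special
}
REDUCE = {aa: sym for sym, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def frequent_substrings(seqs, min_support, min_len=2, max_len=4):
    """Return substrings that occur in at least min_support sequences."""
    support = Counter()
    for seq in seqs:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)  # each substring counts once per sequence
    return sorted(s for s, c in support.items() if c >= min_support)


def featurize(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in seq, else 0."""
    return [1 if p in seq else 0 for p in patterns]


# Toy training sequences (made up for this example).
train = ["MKVLAA", "MKILAG", "DDEEST"]
reduced = [reduce_sequence(s) for s in train]
patterns = frequent_substrings(reduced, min_support=2)
features = [featurize(s, patterns) for s in reduced]
```

In this toy data the first two sequences differ at one position (V versus I), so they share only short substrings in the original alphabet. After reduction both residues map to the same group symbol, and longer shared substrings become frequent. The resulting binary vectors in `features` are the kind of input that could then be passed to an SVM implementation of the reader's choice.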