Protein Folding Recognition with Committee Machine
Mika Takata

Outline
  Background
  System outline
  Experiment
  Experimental result
  Reference

Background
  Computation + biology + chemistry + medicine + ... = significantly important
  SCOP: Structural Classification of Proteins database
    Class: all alpha, all beta, a/b, a+b, ...
    Fold: globin-like, cytochrome c, cupredoxins, (TIM) barrel, β-grasp, ...
  Fold-level classification: remote homology
  Better fold recognition, better tertiary structure prediction

1. Chemical approach parameters (i)
  i. 6 types of chemical features
  ii. String-window N-grams
  iii. Protein molecular weight value
  iv. Protein sequence length value

1. Chemical approach parameters (ii): global parameters
  Symbol C: frequencies of the 20 amino acid symbols in a protein sequence
  Symbols S, H, V, P, Z: 3-dim composition, 3-dim transition, 3×5-dim distribution

1. Chemical approach parameters (iii)
  Protein molecular weight value
    Sum of the molecular weights of the amino acids in the sequence
    Used in standardized form: $\hat{y}_i = \frac{y_i - \bar{y}}{SD}$
  Protein sequence length value
    Used in standardized form: $\hat{l}_i = \frac{l_i - \bar{l}}{SD}$

2. Feature parameters based on sliding-window N-grams
  Proteomic fragment similarity:
    $c(s, x) = \frac{\text{number of occurrences of } s \text{ in } x}{\text{length of } x}$
  Example sequence: ……NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA…
  (*) string length = 2

3. Feature parameters based on HMM
  [Fig. 1: Feature parameter flow based on HMM]

Step 1: feature models, built for both training data and test data
  Model I: C, V, S, P, H, Z, Mol-Weight, Seq-Length
  Model II: spectrum kernel
  Model III: HMM
Step 2: committee SVM array
  Committee SVM_1 -> decision_1
  ...
  Committee SVM_i -> decision_i
  ...
  Committee SVM_27 -> decision_27

Evaluation measurement: "Accuracy Q"
  $Q_i$ shows how correctly class $i$ is recognized:
    $Q_i = \frac{TP_j}{TP_j + FP_j}, \quad j \in \text{class } i$
  The number of data in each class varies, so the overall accuracy is a weighted sum:
    $Q = \sum_{i=1}^{27} \omega_i Q_i = \sum_{i=1}^{27} \omega_i \frac{c_i}{n_i} = \frac{C}{N}, \quad \omega_i = \frac{n_i}{N}$

Experiment
Parameters
  i. Chemical approach parameters
  ii. Feature parameters based on the sliding-window kernel (string length = 2 and 3)
  iii. Feature parameters based on HMM
Classification methods
  i. Independent SVM
  ii.
Committee SVM Array
Multi-class recognition approaches
  i. One-vs-others method
  ii. All-vs-all method
Data set
  Training data: 341, test data: 353 (total: 694)
  http://www.nersc.gov/~cding/protein
  Cross-validation: 10 times

Result (1): Independent SVM, Model I
Result (2): CM, Model I
Result (3): CM, Model II
Result (3): Model I & II
Result (4): Model I & III
Result (5): Model I & II & III

Conclusion
  Recognition improved when all models were used in the committee machine.
  The spectrum kernel worked when used with a string length of 2.
  Advantages
    Takes advantage of sporadic data (e.g. chemical-based and HMM features)
    Reduces computational cost

Reference (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science 5507 (2009) 369–377
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H06
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700–8704
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S-H.: Recognition of a Protein Fold in the Context of the SCOP Classification. Proteins: Structure, Function, and Genetics 35 (1999) 401–407

Reference (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001)
3.
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540
4. Leslie, C., Eskin, E., Noble, W.S.: The Spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566–575
5. Shamim, M.T.A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 23 (2007) 3320–3327
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. J. of Machine Learning Research 2 (2002) 419–444
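
Sketch: sliding-window N-gram features
The sliding-window N-gram feature from the slides divides the number of occurrences of each length-n substring by the length of the sequence. The following is a minimal sketch of that computation, not the implementation used in the talk. The function name ngram_frequencies is hypothetical, and counting overlapping windows is an assumption.

```python
from collections import Counter
from typing import Dict


def ngram_frequencies(sequence: str, n: int) -> Dict[str, float]:
    """Map each n-gram s in the sequence x to (occurrences of s in x) / (length of x)."""
    if n <= 0 or len(sequence) < n:
        return {}
    # Slide a window of width n over the sequence, one residue at a time
    # (overlapping windows are an assumption of this sketch).
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    length = len(sequence)
    return {gram: count / length for gram, count in counts.items()}


# Example fragment taken from the slides, with string length 2.
fragment = "NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA"
features = ngram_frequencies(fragment, 2)
```

Only the N-grams that actually occur are stored. A fixed-length vector for an SVM would need one entry per possible N-gram over the 20 amino acid symbols, with zeros for those that are absent.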
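
Sketch: standardized molecular weight and sequence length
The molecular weight and sequence length values are used relative to their mean and standard deviation. A minimal sketch of that standardization follows. The function name standardize is hypothetical, and the use of the population standard deviation is an assumption, since the slides only write "SD".

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Return (v - mean) / SD for each value, using the population SD (an assumption)."""
    if not values:
        return []
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All values are identical, so every standardized value is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Illustrative sequences only; these are not data from the experiment.
sequences = ["NSDWTNNETR", "HAIVILIIIIIMLRHG", "KIPYWCMIPFAA"]
standardized_lengths = standardize([float(len(s)) for s in sequences])
```

The same function applies to molecular weights once each protein's weight has been computed as the sum of its amino acid weights.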
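
Sketch: accuracy Q
The accuracy measure Q weights each class by its share of the data, because the class sizes differ. The sketch below assumes the per-class accuracy is c_i / n_i (correctly recognized samples over samples in class i) and the class weight is n_i / N. The function names are hypothetical, and the labels in the example are illustrative rather than experimental results.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_accuracy(true_labels: Sequence[Hashable],
                       predicted_labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Q_i = c_i / n_i for every class i that appears in true_labels."""
    n = Counter(true_labels)
    c = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: c[label] / n[label] for label in n}


def overall_accuracy(true_labels: Sequence[Hashable],
                     predicted_labels: Sequence[Hashable]) -> float:
    """Q = sum_i (n_i / N) * Q_i, which equals C / N."""
    total = len(true_labels)
    n = Counter(true_labels)
    q = per_class_accuracy(true_labels, predicted_labels)
    return sum((n[label] / total) * q[label] for label in n)


# Toy example with two classes of unequal size.
true_labels = [1, 1, 1, 2]
predicted_labels = [1, 1, 2, 2]
q_per_class = per_class_accuracy(true_labels, predicted_labels)
q_overall = overall_accuracy(true_labels, predicted_labels)
```

With weights n_i / N the weighted sum collapses to the fraction of correctly recognized samples, so large classes dominate Q while the per-class values Q_i expose how small classes fare.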