Characterization of Transmembrane Helices
Madhavi Ganapathiraju
Nov 07, 2005
JKS-seminar

Summary
Classification procedures for TM prediction using the LSA features have been completed.
A web tool for TM prediction has been designed; it is being developed by Christopher Jursa.
TMPDB, a set of 119 transmembrane proteins, has also been processed and included in the evaluations.
KChannelDB, the database of K-channel proteins subdivided into families with 1, 2, 4 and 6 TM segments each, has been collected and processed. The first 2 families have been evaluated.
Decision tree and support vector machine classifiers have been evaluated.
A paper summarizing the work has been written.
The Qok values in previous evaluations were found to be incorrect
– They have been corrected.

Recap: TM prediction method
Example input: MDPML…
(A) Map the amino acid sequence to 5 different property sequences.
(B) Window analysis from left to right: place a moving window at position i, compute the counts Ci1, Ci2, …, Ci10, then set i = i + 1. Output: matrix of counts, (L - l + 1) x 10.
(C) Singular value decomposition (PCA). Output: features, (L - l + 1) x 4.
(D) Neural network (4 input nodes, 1 output node). Output: prediction and confidence, L x 1 each.
(E) Hidden Markov model. Output: prediction, L x 1.
Here L is the sequence length and l is the window length.

Neural Net Classifier
The 4 dimensions of the vector obtained by LSA (Dimensions 1 to 4) form the input.

Decision Tree & SVM Classifiers
Used MATLABarsenal, the wrapper tools developed by Rong (LTI), to assess the performance of the following classifiers on the feature set:
– Decision trees
– SVM (2nd-degree polynomial kernel)

Evaluation Data Sets
Benchmark – 36 proteins with high-resolution TM information
TMPDB – 119 proteins of known 3D structure
KChannelDB – multiple sequence alignments of K-channel proteins with 1 and 2 TM segments

Results: 36 high-resolution proteins
Symbol  Method          Segment level               Residue level
                        Qok   F     Qobs  Qpred     Q2    F2T   F2N
1       TMHMM*          71    90    90    90        80    74    77
2       TMpro (LC)*     61    94    94    94        76    ?     ?
3       TMpro (HMM)*    66    95    97    92        77    76    76
4       TMpro (NN)*     75    95    95    94        73    70    75
Evaluations were performed by submitting predictions to the benchmark server.

Results: TMPDB
Method          Segment level         Residue level
                F     Qobs  Qpred     Q2    F2T   F2N
TMHMM           90    89    90        89    80    90
NN              90    90    89        86    75    90
HMM             85    90    80        84    74    77
SVM             93    95    90        84    77    88
Decision Trees  92    97    86        83    75    87

Other things
Processed KChannelDB proteins for evaluation
– Initial evaluations are done, but not ready for discussion …

TMPro web service
The TMPro website is being developed by Christopher, Dr. Karimi's student
– It should be up in 2 weeks.
Developed standalone versions of the feature processing required by the web service for DT and SVM.

Charge-rich proteins
I seem not to have mailed myself the latest figures for this slide; I will show them separately.

Ongoing work
Qok is not high for the TMPDB data set.
To address this, error analysis is being performed:
– Measure how far the prediction is from the "truth" (what threshold would have classified the segment correctly as TM or non-TM)
– Characterize the misclassified segments: are they only traditional globular hydrophobic segments, and can aromatic and other properties be used to recover from the error?
Combination with TMHMM prediction for improved performance
– Rule-based combination on the aromatic property has previously been shown to improve TMHMM predictions on high-resolution proteins (March/June 2005?)
– Do this on the TMPDB set as well
Study other NN architectures?
Study the TM segments in error further, using the DT rules that fail
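Feature-extraction steps (A) to (C) of the method recap can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the TMpro implementation. The slides do not give the five property alphabets or the ten count columns. So `property_sequences` uses two hypothetical toy properties, and all function names are illustrative. Only the shapes follow the slides: a sliding window produces a (L - l + 1) x K count matrix, and SVD reduces it to k features per window (K = 10 and k = 4 on the slides).

```python
import numpy as np

# HYPOTHETICAL property encodings: the slides do not list the five property
# alphabets, so two toy binary properties stand in for them here.
HYDROPHOBIC = set("AILMFVWC")
CHARGED = set("DEKR")


def property_sequences(seq):
    """Step (A), toy version: map an amino-acid string to property strings."""
    hydro = "".join("h" if aa in HYDROPHOBIC else "-" for aa in seq)
    charge = "".join("c" if aa in CHARGED else "-" for aa in seq)
    return [hydro, charge]


def window_counts(prop_seqs, columns, window):
    """Step (B): count matrix of shape (seq_len - window + 1, len(columns)).

    columns is a list of (property_index, symbol) pairs; entry [i, j] counts
    occurrences of that symbol in that property sequence inside the window
    of length `window` (l on the slides) starting at position i.
    """
    seq_len = len(prop_seqs[0])
    counts = np.zeros((seq_len - window + 1, len(columns)))
    for i in range(seq_len - window + 1):
        for j, (p, sym) in enumerate(columns):
            counts[i, j] = prop_seqs[p][i:i + window].count(sym)
    return counts


def lsa_features(counts, k):
    """Step (C): project each window's counts onto the top-k right singular vectors.

    Plain SVD as in LSA (no mean-centering); subtracting column means first
    would make this PCA. The result equals U[:, :k] * s[:k].
    """
    _, _, vt = np.linalg.svd(counts, full_matrices=False)
    return counts @ vt[:k].T


# Usage with a short made-up sequence, window length 5 and k = 2.
props = property_sequences("MKLLVIAGDEKRLLAV")
X = window_counts(props, [(0, "h"), (0, "-"), (1, "c")], window=5)
features = lsa_features(X, k=2)
print(X.shape, features.shape)
```

In the method on the slides, the SVD basis would presumably be learned from training sequences and then reused for new ones. Here it is computed from a single matrix only to keep the sketch short.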
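The segment-level scores in the results tables can be illustrated with a small sketch. It assumes the commonly used definitions:
- Qobs is the percentage of observed TM segments that are predicted.
- Qpred is the percentage of predicted TM segments that are observed.
- Qok is the percentage of proteins in which every observed segment is matched and no extra segment is predicted.
- F is taken to be the harmonic mean of Qobs and Qpred.

The matching rule and the pooling of segments across proteins are further assumptions. Matching is greedy and one-to-one, and requires at least 3 overlapping residues. The benchmark server may use a different overlap threshold or average per protein. The function names are hypothetical.

```python
def overlap(a, b):
    """Residues shared by inclusive segments a = (start, end) and b = (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_matches(observed, predicted, min_overlap=3):
    """Greedy one-to-one matching; min_overlap=3 is an ASSUMED criterion."""
    used = set()
    matched = 0
    for obs in observed:
        for j, pred in enumerate(predicted):
            if j not in used and overlap(obs, pred) >= min_overlap:
                used.add(j)
                matched += 1
                break
    return matched


def segment_scores(proteins, min_overlap=3):
    """proteins: list of (observed_segments, predicted_segments) pairs.

    Segments are pooled over all proteins (an assumption); returns percentages.
    """
    n_obs = n_pred = n_match = n_ok = 0
    for observed, predicted in proteins:
        m = count_matches(observed, predicted, min_overlap)
        n_obs += len(observed)
        n_pred += len(predicted)
        n_match += m
        # A protein counts toward Qok only if all observed segments are
        # matched and no extra segments are predicted.
        if m == len(observed) == len(predicted):
            n_ok += 1
    q_obs = 100.0 * n_match / n_obs if n_obs else 0.0
    q_pred = 100.0 * n_match / n_pred if n_pred else 0.0
    f = 2 * q_obs * q_pred / (q_obs + q_pred) if q_obs + q_pred else 0.0
    q_ok = 100.0 * n_ok / len(proteins) if proteins else 0.0
    return {"Qok": q_ok, "Qobs": q_obs, "Qpred": q_pred, "F": f}


# Usage: two toy proteins; the second has one extra (false-positive) segment.
example = [
    ([(5, 25), (40, 60)], [(6, 24), (42, 61)]),
    ([(10, 30)], [(12, 28), (50, 70)]),
]
print(segment_scores(example))
```

For this toy input the scores are Qok = 50, Qobs = 100, Qpred = 75 and F ≈ 85.7. A single false-positive segment lowers both Qpred and Qok, even though every observed segment is found. Greedy matching is used for brevity; an optimal assignment could differ when segments overlap several candidates.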
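Residue-level Q2 is the percentage of residues whose two-state label (TM or non-TM) is predicted correctly. The slides do not define F2T and F2N. The sketch below assumes they are F-measures computed on TM ('T') and non-TM ('N') residues respectively. Labels are given as equal-length strings, either for one protein or for a concatenation of proteins. The function name and label alphabet are hypothetical.

```python
def residue_scores(observed, predicted):
    """Two-state residue-level scores (percentages).

    observed, predicted: equal-length strings over {'T', 'N'}.
    F2T / F2N are ASSUMED to be per-class F-measures for 'T' and 'N'.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    pairs = list(zip(observed, predicted))
    scores = {"Q2": 100.0 * sum(o == p for o, p in pairs) / len(pairs)}
    for label in "TN":
        tp = sum(o == label and p == label for o, p in pairs)
        n_obs = observed.count(label)
        n_pred = predicted.count(label)
        recall = tp / n_obs if n_obs else 0.0
        precision = tp / n_pred if n_pred else 0.0
        total = recall + precision
        scores["F2" + label] = 100.0 * 2 * recall * precision / total if total else 0.0
    return scores


# Usage: one boundary residue missed at each end of a toy TM segment.
print(residue_scores("TTTTNNNN", "TTTNNNNT"))
```

For this toy pair all three scores come out at 75.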