From Expression, Through Annotation, to Function
Ohad Manor & Tali Goren
Open Day 2006

Have you ever wondered…

Types of data
What characterizes these data sets? A systematic, genome-scale view:
• Gene expression (microarray)
• GO annotations
• Protein sub-cellular localization
• Protein–protein interactions
• ChIP on chip

What is the tool?
• A computational tool for checking the enrichment of data sets
• Implemented in Perl
• Interactive command line
• Can be scripted
• Chains tests and matrix operations together
• Data-manipulation functions and queries

Using the tool
• Load biological data
• Check the enrichment of crossed data sets
• Extract statistically significant results
• Correct for multiple hypothesis testing
• Cluster gene sets
• Save results

What is statistically significant?
• How do we choose the right test to compare measurements?
• Non-parametric tests:
  – make no assumption about sample size or distribution
  – estimate no parameters such as expectation or variance
• Paired or unpaired?

Paired – binary version
[Figure: Gene1–Gene10, each labelled 0/1 for Ribosome Assembly and 0/1 for RAP1 binding; the paired labels are summarised as a 2 × 2 table of counts.]

Paired – continuous version
[Figure: expression of Gene1–Gene10 in YPD and after heat shock, each gene measured under both conditions; colour scale −1 to 1.]

Unpaired test
[Figure: heat-shock expression of Gene1–Gene10, split by RAP1 binding into two independent groups of genes; colour scale −1 to 1.]
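For the paired continuous case above, where each gene is measured both in YPD and after heat shock, the usual non-parametric choice is the Wilcoxon signed-rank test. The sketch below is a standard-library Python illustration, not the tool's Perl code. `wilcoxon_signed_rank` is a hypothetical name, and the toy values are illustrative, not data from the slides. The p-value uses the large-sample normal approximation without a tie correction, so treat it as rough for only ten genes.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x[i], y[i].

    Returns (W_plus, z, p). The p-value uses the large-sample normal
    approximation without a tie correction, so it is rough for small n.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # Per-gene differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| from 1..n, giving tied magnitudes the average of their ranks.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    # W+ is the sum of the ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail: 2 * (1 - Phi(|z|))
    return w_plus, z, p


if __name__ == "__main__":
    # Toy paired values for ten genes (illustrative only).
    ypd = [1, 1, 2, 2, 3, 4, 1, 5, 6, 2]
    heat_shock = [2, 3, 5, 4, 6, 7, 3, 8, 9, 5]
    print(wilcoxon_signed_rank(heat_shock, ypd))
```

In the toy call every gene is higher under heat shock, so W+ reaches its maximum of n(n + 1)/2 = 55. Swapping the two arguments flips the sign of z but leaves the two-sided p-value unchanged.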
Statistics: which test for which goal
Goal: parametric test / non-parametric test
• Compare two unpaired groups: unpaired t-test / Kolmogorov–Smirnov test
• Compare two paired groups: paired t-test / Wilcoxon test
• Quantify the association between two variables: Pearson correlation / Spearman correlation
• Binary measurements: chi-square test

How about some biology?

S. cerevisiae regulation
• Let’s presume we know nothing about yeast
• Use ENRICH to construct a basic regulatory network of yeast
• How can we do that?

Flow chart
[Flow chart: ChIP binding of the transcription factors STE12, RAP1, YAP5, MSN2, SFP1, FHL1 and GAT1 across Gene1–Gene10 is crossed with GO annotations (Ribosomal, Stress, Cell cycle, Metabolism) by an HG (hypergeometric) test; the TF × GO p-values pass a significance threshold and become binary values.]

Yeast regulation network
[Figure: the resulting yeast regulation network, with metabolism, stress and cell-cycle labels.]

Case study: the FHL1 protein

FHL1: what is known
• Putative transcriptional regulator
• Predicted to be involved in the stress response
• Required for rRNA processing
• Null mutant shows a reduced growth rate
• Could we have found all of that on our own?
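The TF × GO flow chart above (ChIP targets crossed with GO annotations by an HG test, then a significance threshold producing binary values) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions:

* targets and annotations arrive as sets of gene IDs, with the background being the gene universe passed in;
* multiple hypothesis correction is done with Benjamini–Hochberg, although the slides do not say which correction or threshold the tool uses;
* `hypergeom_sf` and `enrichment_matrix` are hypothetical names;
* the gene sets in the example are toy data.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)


def enrichment_matrix(tf_targets, go_sets, universe, alpha=0.05):
    """Cross every TF target set with every GO set (one-sided HG test), apply a
    Benjamini-Hochberg cutoff, and return (binary matrix, raw p-values), both
    keyed by (tf, go_term)."""
    N = len(universe)
    pvals = {}
    for tf, targets in tf_targets.items():
        t = targets & universe  # restrict to the background gene universe
        for term, members in go_sets.items():
            m = members & universe
            pvals[(tf, term)] = hypergeom_sf(len(t & m), N, len(m), len(t))
    # Benjamini-Hochberg: find the largest rank r with p_(r) <= (r / M) * alpha;
    # every test ranked at or below r is called significant.
    ranked = sorted(pvals, key=pvals.get)
    M = len(ranked)
    cutoff = 0
    for r, key in enumerate(ranked, start=1):
        if pvals[key] <= r / M * alpha:
            cutoff = r
    significant = set(ranked[:cutoff])
    binary = {key: int(key in significant) for key in pvals}
    return binary, pvals


if __name__ == "__main__":
    genes = {f"g{i}" for i in range(20)}  # toy universe of 20 genes
    tf_targets = {"RAP1": {f"g{i}" for i in range(5)},
                  "MSN2": {f"g{i}" for i in range(10, 14)}}
    go_sets = {"ribosome": {f"g{i}" for i in range(6)},
               "stress": {f"g{i}" for i in range(10, 16)}}
    binary, pvals = enrichment_matrix(tf_targets, go_sets, genes)
    print(binary)  # RAP1/ribosome and MSN2/stress are 1, the other two cells 0
```

The HG test here is the one-sided form of Fisher's exact test on the 2 × 2 overlap table. Keeping the raw p-values next to the binary matrix lets the threshold be changed without re-running the tests.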
Various experimental conditions
[Flow chart: FHL1 target genes among Gene1–Gene10 are tested against five experiments (Exp.1–Exp.5: heat shock, amino-acid (AA) starvation, osmotic stress, oxidative stress, invasive growth) with an unpaired t-test and an HG test; the p-values are thresholded into binary values.]

Tell me who your friends are…
[Flow chart: ChIP binding of FHL1 and of RAP1, FKH2, MBP1, GAT3 and SOK2 across Gene1–Gene10; an HG test on the overlap of their targets yields p-values.]

Enriched conditions: growth
Enriched GO annotations: ribosome assembly, stress response
Enriched TFs: RAP1, SFP1, GAT3

Remember this question?
• What is the connection between the expression level of a gene and its sub-cellular localization?
• Which transcription factors regulate amino acid biosynthesis?
• Does heat shock affect peripheral proteins more than it affects mitochondrial proteins?
[Figure: cell periphery and mitochondrion.]

Flow chart
[Flow chart: genes grouped by the sub-cellular localization of their proteins (mitochondria, bud neck, vacuole, cell periphery, nucleus) are tested against five heat-shock (HS) experiments (short, medium, long, severe and moderate HS) with an unpaired t-test and an HG test; the p-values are thresholded into binary values.]

Future plans
• Continue development
• More data are available out there
• Build regulation networks for yeast and other species

Questions

Thanks
• Prof. Nir Friedman
• Tommy Kaplan
• And to you for listening!!!
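The FHL1 experimental-conditions flow chart and the localization flow chart both run an unpaired test per condition. Each compares one group of genes (FHL1 targets, or proteins at one location) against the rest, then thresholds the p-values into binary values. Below is a minimal Python sketch of that per-condition step. It assumes expression arrives as `{condition: {gene: value}}` and computes Welch's t statistic. To stay within the standard library, it swaps in a label-permutation p-value instead of the t distribution, so this is not necessarily what the tool computes. The names `welch_t`, `permutation_p` and `condition_profile`, the 2000 permutations and the 0.05 cutoff are illustrative assumptions, and the expression values are toy data.

```python
import math
import random


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0:  # both groups constant: no spread to scale by
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided p-value for Welch's t, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return (hits + 1) / (n_perm + 1)


def condition_profile(expr, group, alpha=0.05, **kwargs):
    """Hypothetical helper: for each condition, compare genes in `group`
    with all other genes and return 1 if p < alpha, else 0."""
    profile = {}
    for condition, values in expr.items():
        inside = [v for gene, v in values.items() if gene in group]
        outside = [v for gene, v in values.items() if gene not in group]
        profile[condition] = int(permutation_p(inside, outside, **kwargs) < alpha)
    return profile


if __name__ == "__main__":
    # Toy data: five hypothetical FHL1 targets (t0..t4) and six other genes.
    expr = {
        "heat shock": {**{f"t{i}": v for i, v in enumerate([2.0, 2.2, 1.9, 2.1, 2.3])},
                       **{f"o{i}": v for i, v in enumerate([0.1, -0.2, 0.0, 0.3, -0.1, 0.2])}},
        "osmotic stress": {**{f"t{i}": v for i, v in enumerate([0.1, 0.3, -0.1, 0.2, 0.0])},
                           **{f"o{i}": v for i, v in enumerate([0.2, 0.0, 0.1, -0.1, 0.3, 0.1])}},
    }
    print(condition_profile(expr, {f"t{i}" for i in range(5)}))
```

A permutation p-value can never fall below 1/(n_perm + 1). When many conditions or localization classes are tested and then corrected, the number of permutations has to grow with the strictness of the threshold.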