Gene expression analysis and network discovery: Genevestigator
Philip Zimmermann, Genevestigator Team, ETH Zurich
© ETH Zürich | Genevestigator
6 November 2007

Presentation flow
Gene networks – biological context
Microarray compendium: how, and what for?
Meta-profile analysis: concepts and validation
Genevestigator® V3
Data integration
Summary & conclusion

6 November 2007 P. Zimmermann / ETH Zurich / [email protected]

Gene networks - biological context
What is the interpretational value of a gene network derived by graphical modeling or correlation analysis?
a snapshot in time?
a snapshot in space?
an average trend?

Gene networks - biological context
From what experiment(s) was this network derived?
time-course?
cell culture, whole organism?
stimulus, drug response?
anatomy part?
stage of development?
genetic modification?

Context and dynamics of networks
Hypothesis: networks are dynamic and context-dependent
=> networks evolve!
=> networks may have different functions in different contexts!
Question: how can we quantify the role of the context in shaping the network?

Context: the time-space-response dimensions
Time => time-course, development
Space => anatomy parts, intracellular localization
Response => response to external perturbations
=> response to modifications in the genome
Context and dynamics of networks
Modeling the time, space and response dimensions requires:
experiments testing time, space and response variables
storage of measurement data and its meta-data
developing analysis methods that incorporate these dimensions (→ meta-profiles)

Analysis versus meta-analysis
Microarray experiment, data analysis: 100 genes – what to do next?
Data storage: 10 billion data points – what to do next?

Data repositories
heterogeneous datasets + unsystematic or poor annotation = meta-analysis impossible!

Data warehouses
Data quality control => ordered datasets
Expert annotation with systematic ontologies (anatomy, development, stimulus, mutation) => systematic annotation
ordered datasets + systematic annotation = meta-analysis possible!

Data quality control
RLE
NUSE
Unprocessed values
Affy QC metrics
Border elements
Correlation matrix
RNA degradation

Ontologies – example of Anatomy
Mouse / Rat: Edinburgh Mouse Atlas
Human: mapping to Mouse and Rat anatomy tree
Arabidopsis / Barley: terms from Plant Ontology, tree created by Genevestigator

Ontologies – example of Development
Mouse: Theiler stages
Rat: Witschi stages
Human: Carnegie table
Arabidopsis: Boyes key
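Once every array carries a systematic annotation, arrays can be grouped by ontology term and summarized per category. The slides do not define the computation, so the following is only a minimal sketch, assuming a meta-profile is the mean log2 signal of one probe set over all arrays that share a category. The sample records, the probe set names and the function `meta_profile` are hypothetical and are not taken from Genevestigator.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotated arrays: one anatomy term and log2 signals per probe set.
arrays = [
    {"anatomy": "heart ventricle", "signal": {"probe_A": 11.0, "probe_B": 6.0}},
    {"anatomy": "heart ventricle", "signal": {"probe_A": 12.0, "probe_B": 7.0}},
    {"anatomy": "liver", "signal": {"probe_A": 5.0, "probe_B": 9.0}},
]


def meta_profile(arrays, dimension, probe_set):
    """Mean signal of one probe set per category of one annotation dimension.

    Assumption: averaging is the summary statistic; the source does not say so.
    """
    groups = defaultdict(list)
    for array in arrays:
        groups[array[dimension]].append(array["signal"][probe_set])
    return {category: mean(values) for category, values in groups.items()}


profile = meta_profile(arrays, "anatomy", "probe_A")
print(profile)
```

The same function could be pointed at a development or stimulus annotation by changing `dimension`, which is the point of keeping the annotations systematic.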
Meta-analysis tools
• Who is most interested to mine this data?
• Who can best interpret the results?
THE BIOLOGIST!
Genevestigator® – a tool for biologists

Expression meta-profiles
[space] [time] [response] [response]

Data validation [space] [time] [response]
[Figure: category type (e.g. heart ventricle) versus probe set (e.g. Mm.23432)]

Mouse anatomy meta-profiles [space]

Data validation [space] [time] [response]
[Figure: category type versus probe set (e.g. Mm.23432)]

Rnf33: Transcription of Rnf33 has been shown to occur as early as the mouse oocyte, but not beyond the eight-cell stage or in adult tissues.
Hoxa1: Hoxa1 expression starts at E7.5 and begins to retreat caudally by day E8.5.
hemopexin: hemopexin (hx) is known to be expressed only at low levels in embryos and newborn mice and does not reach its highest expression level until the first year of age.
a – f: pre-natal
g – l: post-natal

light-harvesting chlorophyll a/b binding protein (AT4G14690)
protochlorophyllide reductase A (At5g54190)
Development of Genevestigator®
Biological experiments => Microarray data => Public repositories => Curation & Quality control => Genevestigator database => Application server => Client Java application
Annotation dimensions: Anatomy, Development, Stimulus, Mutation
14'500 Affymetrix arrays (Nov 2007)
Human, mouse, rat, Arabidopsis, barley
Metabolic and regulatory pathway maps for mouse and Arabidopsis
> 10'000 registered users
> 500 citations in peer-reviewed journals

Genevestigator® V3
Website
Java Client Application
Database and Application Server Cluster

Toolsets and tools

Biomarker Search toolset

Abiotic stresses and hormonal responses
[Figure labels: salt (+) osmotic (+); salt (-) osmotic (-); salt (+) osmotic (+) cold (+); salt (+) drought (+); anoxia (-) hypoxia (-); hypoxia (-); ABA (+); ---; ABA (+) MeJA (+); BL / H3BO3 (+); ethylene (+); norflurazon (-); mycorrhiza (-); ozone (-) genotoxic (-); 2,4-D; glucose; syringolin (-); P. syringae (+) ozone (+) B. cinerea (+); syringolin (-) cycloheximide (-) H2O2 (-); AVG (+) chitin (+)]

Biclustering [space] [time] [response]
Searches subsets of genes coexpressed across subsets of conditions
BiMax algorithm
Finds all maximal bicliques
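The Biclustering slide states the goal as finding all maximal bicliques: inclusion-maximal pairs of a gene set and a condition set whose submatrix in a binarized expression matrix contains only ones. The sketch below illustrates that definition by brute force. It is not the BiMax algorithm itself, it is exponential in the number of conditions, and the function name and toy matrix are made up for illustration; the matrix is assumed to be binarized already.

```python
from itertools import combinations


def maximal_bicliques(matrix):
    """Enumerate inclusion-maximal all-ones submatrices of a 0/1 matrix.

    Brute force over condition subsets, for illustration only.
    Returns a set of (genes, conditions) pairs as frozensets of indices.
    """
    n_genes = len(matrix)
    n_conds = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(1, n_conds + 1):
        for conds in combinations(range(n_conds), size):
            # All genes that are "on" in every chosen condition.
            genes = [g for g in range(n_genes) if all(matrix[g][c] for c in conds)]
            if not genes:
                continue
            # Extend to every condition shared by all of these genes,
            # which makes the pair maximal in both directions.
            closed = [c for c in range(n_conds) if all(matrix[g][c] for g in genes)]
            found.add((frozenset(genes), frozenset(closed)))
    return found


# Toy binarized matrix: rows are genes, columns are conditions.
binary = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]

bicliques = maximal_bicliques(binary)
for genes, conds in sorted(bicliques, key=lambda p: sorted(p[0])):
    print(sorted(genes), sorted(conds))
```

For realistic compendium sizes an exhaustive search like this is infeasible, which is why a dedicated algorithm is needed.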
Example of a bicluster

[Figure labels, [space] [time] [response]: Proline; Phenylalanine / Tyrosine; Starch / sucrose; ABA biosynthesis; Cold response; Inositol phosphate; ABA response; Beta-alanine]

Biomarker search [time]
Genes expressed specifically in seeds and germinating seedlings
De-novo identification of cis-regulatory elements

Biomarker search [space]
z = 18.2
z = 5.8
z = 5.4

Biomarker search [response]
"Supervised biclustering"
isoxaben (+)
norflurazon (-)
light (+)
nitrate_low (-)

Anatomy clustering and promoter analysis
Clusters of genes expressed specifically in (z > 5.0):
cell suspension
petals
roots
seeds
stamen
xylem

Development clustering and promoter analysis
Clusters of Arabidopsis genes expressed specifically at (z > 5.0):
dev. stage 1
dev. stage 3
dev. stage 9

Stimulus clustering and promoter analysis
"Supervised biclustering" of stimulus meta-profiles (z > 5.0):
cluster 1
cluster 2
cluster 4
cluster 5
cluster 7

[Figure: proteins by organ: cell suspension, cotyledons, flowers, leaves, roots, seeds]
[Figure: transcripts by organ: seeds, roots, leaves, flowers, cotyledons, cell suspension]

Data integration: transcriptome - proteome
Arabidopsis leaf transcripts and proteins
[Figure: frequency against the transcript quantification measure and the protein quantification measure. Series: proteins detected in leaves; proteins not detected in leaves but for which there is a probe set on the ATH1 array. Marked: general background range for the transcript quantification measure]

Protein detection and transcript abundance
[Figure: number of transcripts/proteins against the transcript abundance measure (log2 signal). Series: probe sets called "absent" on ATH1 (p >= 0.05); probe sets called "present" on ATH1 (p < 0.05); leaf proteins detected by peptide identification]
[Figure: fraction of "present" transcripts that were detected on the protein level, against log2 signal]

GO analysis: GO Cellular Component
n = 221 specific probe sets with average signal in leaves > 13
Compared: ATH1 array (control) versus proteins not detected but transcripts have high abundance (> 13)
Categories: cell wall, chloroplast, cytosol, ER, extracellular, Golgi apparatus, mitochondria, nucleus, other cellular components, other cytoplasmic components, other intracellular components, other membranes, plasma membrane, plastid, ribosome
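The curve shown alongside the GO analysis, the fraction of "present" transcripts that were detected on the protein level, can be thought of as a per-bin ratio along the log2 signal axis. A minimal sketch under stated assumptions: bins are one log2 unit wide, and the records below are invented purely to exercise the code. The function name is hypothetical and nothing here reproduces the measured data.

```python
from collections import defaultdict

# Hypothetical records: (log2 signal, called "present" on ATH1, protein detected).
probe_sets = [
    (6.2, True, False),
    (6.8, True, False),
    (10.1, True, True),
    (10.7, True, False),
    (13.4, True, True),
    (13.9, True, True),
    (4.0, False, False),  # "absent" call, excluded from the fraction
]


def detected_fraction_by_bin(records):
    """Per integer log2-signal bin: share of 'present' probe sets with a detected protein."""
    present = defaultdict(int)
    detected = defaultdict(int)
    for signal, is_present, has_protein in records:
        if not is_present:
            continue
        bin_start = int(signal)  # floor, valid for non-negative log2 signals
        present[bin_start] += 1
        detected[bin_start] += int(has_protein)
    return {b: detected[b] / present[b] for b in sorted(present)}


fractions = detected_fraction_by_bin(probe_sets)
print(fractions)
```

Probe sets in a high-signal bin without a detected protein are the kind of group the GO comparison singles out (average signal in leaves > 13).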
GO analysis: GO Biological Process
n = 221 specific probe sets with average signal in leaves > 13
Compared: ATH1 array (control) versus proteins not detected but transcripts have high abundance (> 13)
Categories: cell organization and biogenesis, developmental processes, DNA or RNA metabolism, electron transport or energy pathways, other biological processes, other cellular processes, other metabolic processes, protein metabolism, response to abiotic or biotic stimulus, response to stress, signal transduction, transcription, transport

GO analysis: GO Molecular Function
n = 221 specific probe sets with average signal in leaves > 13
Compared: ATH1 array (control) versus proteins not detected but transcripts have high abundance (> 13)
Categories: DNA or RNA binding, hydrolase activity, kinase activity, nucleic acid binding, nucleotide binding, other binding, other enzyme activity, other molecular functions, protein binding, receptor binding or activity, structural molecule activity, transcription factor activity, transferase activity, transporter activity

Data integration – pathway analysis
[Figure: protein abundance against transcript abundance. Labeled pathways: Mevalonate biosynthesis; Riboflavin metabolism; Chlorophyll / Porphyrin metabolism; Phenylpropanoid metabolism; Carotenoid biosynthesis]

Relative protein-to-transcript ratio
[Figure labels: serine, glycine, cysteine; Calvin cycle; starch and sucrose metabolism; Fatty acid biosynthesis]

Relative protein-to-transcript ratio
[Figure labels: Chlorophyll / Porphyrin metabolism; Purine metabolism; Pyrimidine metabolism; Fatty acid biosynthesis; Glycolysis / Gluconeogenesis]
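The slides do not say how the relative protein-to-transcript ratio is computed. One plausible reading, sketched here purely as an assumption: per gene, subtract the log2 transcript signal from a log2 protein measure, average that difference per pathway, and center it on the mean over all genes, so that a positive value means more protein than the transcript level alone would suggest. Pathway names, values and the function name are invented placeholders.

```python
from statistics import mean

# Hypothetical per-gene measurements (log2 scale) with a placeholder pathway label.
genes = [
    {"pathway": "pathway_A", "log2_protein": 9.0, "log2_transcript": 12.0},
    {"pathway": "pathway_A", "log2_protein": 8.0, "log2_transcript": 12.0},
    {"pathway": "pathway_B", "log2_protein": 3.0, "log2_transcript": 9.0},
    {"pathway": "pathway_B", "log2_protein": 2.0, "log2_transcript": 10.0},
]


def relative_ratio_by_pathway(genes):
    """Mean log2(protein/transcript) per pathway, centered on the all-gene mean.

    Assumption: this definition is illustrative, not the one used in the talk.
    """
    diffs = [(g["pathway"], g["log2_protein"] - g["log2_transcript"]) for g in genes]
    overall = mean(d for _, d in diffs)
    by_pathway = {}
    for pathway, diff in diffs:
        by_pathway.setdefault(pathway, []).append(diff)
    return {p: mean(values) - overall for p, values in by_pathway.items()}


ratios = relative_ratio_by_pathway(genes)
print(ratios)
```

Working on the log2 scale keeps the ratio symmetric: the same fold change above and below the overall mean gives values of equal magnitude and opposite sign.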
Proteomic and transcriptomic biomarkers
"Root-specific" expression
Search by scoring the proteomic dataset
Search by scoring the Genevestigator dataset

Summary and conclusions
Biological networks: importance of the biological context
Meta-profiles: context-driven analysis
Biological validation of meta-profiles and clusters
Genevestigator – a tool for biologists!
Data integration: challenging biological complexity

Data type?
Experimental context?
Modes of interactions?
Organism?
Network dynamics?
Reproducibility?

Acknowledgements
ETH Zurich
Prof. Gruissem
Developer Team: Tomas Hruz, Oliver Laule, Stefan Bleuler, Philip Zimmermann, Gabor Szabo, Frans Wessendorp, Lukas Oertle, Dominique Dümmler, Matthias Hirsch-Hoffmann

Thanks for your attention!