Functional module identification with tomato gene and metabolite expression profiles

Cass Peluso
Project Leaders: Zhangjun Fei, Ph.D., Je-Gun Joung, Ph.D.
Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA

Introduction / Abstract

Cells carry out a multitude of complex functions through the coordinated effort of a set of genes. Such activity is often carried out through the organization of the genome into regulatory modules. Modules are sets of co-regulated genes that share a common function. Because a good deal of a cell's activity is organized into this network of interacting modules, it is essential to identify the modules, their regulators, and the conditions under which regulation occurs, and to determine their functions, in order to understand cellular responses to internal and external signals (Segal et al. 2003). Here we report the identification of functional modules in the tomato using gene expression and metabolite profile datasets generated from a set of Solanum pennellii introgression lines.

Methods

First, a computational pipeline was implemented to identify transcription factors on tomato TOM2 oligonucleotide arrays (see Fig. 2 for details). Then, the gene expression profiles generated using the TOM2 arrays and the targeted metabolite profiles from twenty-three S. pennellii introgression lines were processed and normalized. The processed and normalized gene expression and metabolite profiles and the set of candidate regulatory genes on the TOM2 arrays were then loaded into Genomica, a program that simultaneously searches for a partition of genes into modules and for each module's regulatory program. A module's regulatory program specifies the set of regulators that control the module and the expression of the genes in the module. The program outputs a list of modules and associated regulatory programs. Fig. 3 shows several interesting modules that were identified.

Each of the identified modules was then analyzed for GO term enrichment using a tool in the Tomato Functional Genomics Database. Significantly over-represented GO terms were identified in each module with an adjusted p-value (False Discovery Rate, FDR) < 0.05. A heatmap of the significance of GO term enrichment was generated using the web-based application Matrix2PNG, with orange indicating that a module is significantly associated with a given function (Fig. 4). A list of modules and their regulators was then processed using the program Cytoscape, which created a module-regulator network map, with modules in light blue and regulators in orange (Fig. 5).

Figure 1. The schema of module identification.

Figure 2. Computational pipeline for module identification.

Tomato TOM2 array transcription factor identification
Step 1: BLAST tomato TOM2 probe sequences against the SwissProt and TrEMBL protein databases. Parse the results using BioPerl to extract probe IDs and hit accessions.
Step 2: Map TOM2 array probe IDs to GO term IDs using the Gene Ontology Annotation Database (GOA), based on their homologues in SwissProt and TrEMBL.
Step 3: Associate GO IDs and GO names using the Gene Ontology definition file (OBO v1.2) downloaded from http://geneontology.org.
Step 4: Add each GO name to each GO ID in the result file from Step 2.
Step 5: Identify TOM2 array probes with GO names of the desired regulators.

Tomato functional module identification
Step 6: Impute the gene expression dataset.
Step 7: Make the input expression dataset: convert absolute values to log values (for gene and metabolite profiles), select the genes expressed in the introgression lines, and merge the expression profiles.
Step 8: Make the Genomica input file.
8.1: Insert associated genes (SGNs) with symbols (LEs) and sort.
8.2: Get symbols for the regulators.
8.3: Extract and add the expression data for the regulators, add the associated symbols, and merge them into the output file from step 8.1.

Results

Figure 3. Representative functional modules.
(A) The inferred regulatory modules.
(B) Module 35 contains a pathogenesis-related TF as a regulator. It also has a number of genes that are potentially involved in plant responses to biotic and abiotic stresses. This module is thus likely related to pathogen response, which could have important implications for the creation of disease-resistant tomato varieties.
(C) Module 6 shows two regulators acting on gene products that relate to the cell wall. The likely function of this module is related to cell wall organization and biogenesis.
(D) Module 20 contains phytofluene, a metabolite in the carotenoid biosynthesis pathway.
[Labels legible in Figure 3. Regulators: pathogenesis-related transcription factor (CBF1); TMV response-related gene product (WRKY); floral homeotic protein AGAMOUS (TAG1); MADS-box. Genes: heat shock protein; salicylic acid-binding protein; gibberellin 2-oxidase; wound-induced protein; syringolide-induced protein 19-1-5; heavy metal transport/detoxification protein; pathogenesis-related protein osmotin precursor; Avr9/Cf-9 rapidly elicited protein 231; CBF2 transcription factor - Apidaecin gene family; cell wall protein. Carotenoid pathway: phytoene, PDS3, phytofluene, ζ-carotene, ZDS, neurosporene, lycopene, β-LYC, ε-LYC, γ-carotene, α-carotene, β-carotene, lutein, ABA.]

Figure 4. A heatmap representing the significant biological functions of modules.

Figure 5. The regulator-module network, showing key regulators that are linked to several different modules. Module 35 shares the pathogenesis-related transcription factor with modules 4, 31, and 43. These modules need to be investigated to determine whether they interact functionally. Modules 6 and 20 also share the TMV response-related gene product with numerous other modules.
[Other regulator labels recovered from the figure images: WRKY, WRKY4, CBF1, NAC, NAC2, TAG1, ERF.]

References

Segal E. et al. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34: 166-176.

Acknowledgements

Thank you to BTI and Dr. Je Min Lee for the IL datasets used and helpful comments given.
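As an illustration of Steps 2 to 5 of the pipeline (mapping probes to GO terms through their protein hits, attaching GO names, and selecting probes annotated as regulators), the following Python sketch may help. It is not the BioPerl code used for the poster. The probe IDs, accessions, in-memory tables, and the function names `annotate_probes` and `select_regulators` are hypothetical; a real run would read BLAST results, a GOA annotation file, and the OBO file instead.

```python
# Sketch of Steps 2-5 under stated assumptions: probe and accession IDs are
# placeholders, and the lookup tables stand in for parsed BLAST/GOA/OBO files.

def annotate_probes(probe_hits, accession_to_go, go_names):
    """Return {probe_id: sorted [(go_id, go_name), ...]} via each probe's protein hit."""
    annotated = {}
    for probe_id, accession in probe_hits.items():
        go_ids = accession_to_go.get(accession, set())
        annotated[probe_id] = sorted(
            (go_id, go_names.get(go_id, "unknown")) for go_id in go_ids
        )
    return annotated


def select_regulators(annotated, regulator_go_names):
    """Return sorted probe IDs that carry at least one of the desired GO names."""
    wanted = set(regulator_go_names)
    return sorted(
        probe_id
        for probe_id, terms in annotated.items()
        if any(name in wanted for _, name in terms)
    )


# Step 1 output (placeholder): one protein hit per probe.
probe_hits = {"probe_A": "ACC_1", "probe_B": "ACC_2", "probe_C": "ACC_3"}
# Step 2 input (placeholder): GO IDs annotated to each protein accession.
accession_to_go = {"ACC_1": {"GO:0003700"}, "ACC_2": {"GO:0005618"}}
# Step 3 input: GO ID -> GO name, as it would be parsed from the OBO file.
go_names = {
    "GO:0003700": "DNA-binding transcription factor activity",
    "GO:0005618": "cell wall",
}

annotated = annotate_probes(probe_hits, accession_to_go, go_names)  # Steps 2-4
regulators = select_regulators(
    annotated, ["DNA-binding transcription factor activity"]
)  # Step 5
print(regulators)
```

Probes whose hit has no GO annotation (here `probe_C`) keep an empty term list, so they drop out of the regulator selection without special handling.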
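Step 7 of the pipeline (log conversion, selection of expressed genes, and merging of gene and metabolite profiles) could look roughly like the sketch below. The poster does not state the log base, the expression cutoff, or how many lines must pass it, so `log2`, `min_value`, and `min_lines` are assumptions chosen only for illustration, as are the profile names and values. The input is assumed to be already imputed (Step 6) and strictly positive.

```python
import math

# Sketch of Step 7. The log base and the "expressed" rule are assumptions;
# the poster does not specify them.

def log_transform(profile):
    """Convert positive absolute values to log2 values."""
    return [math.log2(value) for value in profile]


def is_expressed(profile, min_value, min_lines):
    """Treat a gene as expressed if at least min_lines lines reach min_value."""
    return sum(1 for value in profile if value >= min_value) >= min_lines


def build_input(gene_profiles, metabolite_profiles, min_value, min_lines):
    """Filter genes, log-transform genes and metabolites, and merge into one table."""
    merged = {}
    for gene, profile in gene_profiles.items():
        if is_expressed(profile, min_value, min_lines):
            merged[gene] = log_transform(profile)
    for metabolite, profile in metabolite_profiles.items():
        merged[metabolite] = log_transform(profile)
    return merged


# Placeholder profiles across three introgression lines.
gene_profiles = {
    "gene_1": [256.0, 512.0, 128.0],
    "gene_2": [4.0, 2.0, 8.0],
}
metabolite_profiles = {"metabolite_1": [16.0, 32.0, 64.0]}

merged = build_input(gene_profiles, metabolite_profiles, min_value=100.0, min_lines=2)
for name, values in merged.items():
    print(name, values)
```

Only genes pass through the expression filter in this sketch; metabolite profiles are always kept, which is one reading of the step description and not something the poster confirms.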
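The GO enrichment step reports terms with an FDR-adjusted p-value below 0.05 but does not say which adjustment was applied. Assuming the widely used Benjamini-Hochberg procedure, the adjustment can be sketched as follows. The raw p-values and term names are invented for illustration, and the enrichment tool's actual method may differ.

```python
# Sketch of an FDR adjustment, assuming the Benjamini-Hochberg procedure.
# Raw p-values below are invented; they are not results from the poster.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input list."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


terms = ["term_a", "term_b", "term_c", "term_d"]
raw_p = [0.001, 0.02, 0.04, 0.3]

adjusted_p = benjamini_hochberg(raw_p)
significant = [term for term, q in zip(terms, adjusted_p) if q < 0.05]
print(significant)
```

In this toy input, `term_c` has a raw p-value below 0.05 but is no longer significant after adjustment, which is the effect the FDR threshold is meant to have when many GO terms are tested per module.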
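The shared-regulator observations from the regulator-module network (Fig. 5) can be reproduced from a plain module-to-regulator table. The sketch below inverts such a table and also writes edges in Cytoscape's simple interaction format (source, interaction type, target). The table holds only the links stated in the text; the short names `PR_TF` and `WRKY`, the `TAG1` entry for module 6, the `regulates` label, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Sketch: find regulators linked to several modules and emit SIF-style edges.
# The table is a partial, illustrative reading of the poster text.

def shared_regulators(module_regulators):
    """Invert {module: [regulators]}; keep regulators linked to more than one module."""
    regulator_modules = defaultdict(set)
    for module, regulators in module_regulators.items():
        for regulator in regulators:
            regulator_modules[regulator].add(module)
    return {
        regulator: sorted(modules)
        for regulator, modules in regulator_modules.items()
        if len(modules) > 1
    }


def to_sif(module_regulators, interaction="regulates"):
    """Return tab-delimited 'regulator<TAB>interaction<TAB>module' edge lines."""
    lines = []
    for module in sorted(module_regulators):
        for regulator in sorted(module_regulators[module]):
            lines.append(f"{regulator}\t{interaction}\tmodule_{module}")
    return lines


module_regulators = {
    35: ["PR_TF"],         # pathogenesis-related transcription factor
    4: ["PR_TF"],
    31: ["PR_TF"],
    43: ["PR_TF"],
    6: ["WRKY", "TAG1"],   # second regulator of module 6 is an assumption
    20: ["WRKY"],          # TMV response-related gene product
}

shared = shared_regulators(module_regulators)
sif_lines = to_sif(module_regulators)
print(shared)
```

A regulator that appears under several modules, as `PR_TF` does here, is a natural starting point for the follow-up the poster proposes: checking whether those modules interact functionally.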