2011 PCB 5530 Class Projects

● Background
These projects take cutting-edge genome research into the classroom and so ‘bring to life’ what you are learning. They will train you to integrate different kinds of information in a team setting. The class is split into three groups, each coordinated by a postdoctoral instructor from our project. Groups are:

Group 1 – Folate (FT-like)
Ghulam Hasnain [email protected]
Eric R Schultz, Katrina K Cuddy, Junya Zhang, Jian Li

Group 2 – Pyridoxine (YggS)
Basma El Yacoubi [email protected]
Annette M Fahrenkrog, Shaun P Jensen, Jiani Yang, Wenlan Tian

Group 3 – Riboflavin (COG3236)
Océane Frelin [email protected]
Marcio Fernando Resende, Jessica L Gilbert, Sisi Geng, Yang Zhao, Rujira Tisarum

Each group will carry out the following tasks:
- Annotate a particular pathway or set of pathways of vitamin- or cofactor-related metabolism in the maize genome using a standard format.
- Identify genes that are missing from the pathways (‘functions without a gene’) and genes of unknown function that are in some way associated with the pathways (‘genes without a function’).
- Predict candidates for missing genes (enzymes, transporters) and predict functions for genes of unknown function using the tools of plant-prokaryote comparative genomics.

Each group will meet several times, under the supervision of the postdoctoral instructor, to divide up the work to be done, to discuss progress, and to integrate and write up a report in the format below.

Project reports. Reports should be submitted to Dr. Andrew Hanson [email protected] as an electronic file and a good quality hard copy, by 5 pm on Friday, November 18, 2011.

Grading. For each student, 50% of the grade will be based on the performance of their group as a whole as judged from the project report. The other 50% will be based on the postdoctoral instructor’s assessment of that student’s contribution to the group effort (independent of the group size).

Outcomes. It is anticipated that, in the best cases, the groups’ predictions will form part of a publication in a peer-reviewed journal, in which case the group members will be included as authors.

● Report format – summary
Reports should be arranged in four sections:
1. A diagram summarizing the metabolites and enzymes of the pathways, pathway variants in different organisms, subcellular compartmentation in Arabidopsis, and maize genes that are missing or have mysterious paralogs. Use the format on p. 2 (a PowerPoint file is available as a template).
2. A table listing:
- All pathway enzymes (give EC nos.) and transporters (whether or not they have plant homologs)
- Gramene identifiers (e.g. GRMZM2G107665) of the corresponding maize genes
- AGI numbers (e.g. At3g12930) of the corresponding Arabidopsis genes
- Predicted and experimentally determined subcellular location of the Arabidopsis proteins
- Arabidopsis mutant phenotypes (if available), e.g. lethality, growth defect, metabolome change
- When there is >1 maize or Arabidopsis gene, show a phylogenetic tree relating them
3. A figure showing expression in various organs (the ‘development’ display) from the Golm Arabidopsis Expression database Multiple Expression Query for each Arabidopsis gene in the pathways.
4. Summaries and supporting evidence for two predictions of candidates for maize genes ‘missing’ from the pathways, or for roles of ‘unknown function’ genes associated with the pathway or its B vitamin product, e.g. ‘FT-like’, YggS, and COG3236 (no more than 1 page total per gene).

● Report format – example
1. Pathway diagram. This is a simplified version showing only reactions in the cytosol. A full diagram (required for your report) would have three parts, each similar to the one shown, representing the reactions found in cytosol, mitochondria, and plastids. See also Fig. 2, PMID: 10785666.
[Figure: pathway diagram of folate-mediated C1 metabolism (cytosol). Panel labels: ‘Maize paralogs of unknown function’; ‘Missing from maize’. Enzyme labels: HutHUI, FT1, 2, 3, YgfZ, GcvT, GcvH, GcvP, Lpd, ThyA, FolD, PurN, PurH, PanB, MetE, GlyA, MetF, YgfA, COG0212, GCV, Fhs, PurU, Fmt.]

Folate forms: I, Tetrahydrofolate (THF); II, 5,10-Methylene-THF; III, 5,10-Methenyl-THF; IV, 10-Formyl-THF; V, 5-Formyl-THF; VI, 5-Methyl-THF; VII, Dihydrofolate; VIII, 5-Formimino-THF.

Reconstruction of folate-mediated C1 metabolism. Enzymes are denoted by their names in Escherichia coli or other bacteria. Maize has genes encoding all of the expected core enzymes except that most genes of histidine degradation are missing (blue highlights). Maize has five mysterious extra genes of unknown function (in red).

2. Summary table & phylogenetic trees

| Gene name | Enzyme (EC no.) | Maize genes | Arabidopsis genes | Arabidopsis location (1) |
| folD | 5,10-Methylene-THF dehydrogenase (EC 1.5.1.5) / 5,10-methenyl-THF cyclohydrolase (EC 3.5.4.9) | GRMZM2G130790, GRMZM2G150485, GRMZM2G143230 | At2g38660, At3g12290, At4g00600, At4g00620 | Mito P; Cytosol P E; Plasma memb E; Chloro P; Chloro P E |
| metF | Methylene-THF reductase (EC 1.5.1.20) | GRMZM2G082463 | At2g44160, At3g59970 | Cytosol P; Cytosol P |
| ygfZ | n/a | GRMZM2G107665 (a) | At4g12130, At1g60990 | Mito P; Chloro P E |
| etc | etc | etc | etc | etc |

(1) P = prediction; E = experimental evidence
(a) Identical to GRMZM2G147498

Arabidopsis phenotype (column entries in the example): Abnormal shape seedling; Morphology & metabolites normal; etc

[Figure: MEGA phylogenetic tree for FolD proteins. Leaves: AT4G00620, AT4G00600, GRMZM2G143230, AT2G38660, GRMZM2G130790, AT3G12290, GRMZM2G150485. Clade labels: Chloroplast, Mitochondrion, Cytosol.]
[Figure: MEGA phylogenetic tree for MetF proteins. Leaves: At3g59970, At2g44160, GRMZM2G082463.]
[Figure: MEGA phylogenetic tree for YgfZ proteins. Leaves: GRMZM2G107665, GRMZM2G147498, At4g12130, At1g60990. Clade labels: Mitochondrion, Chloroplast.]

3. Expression in various organs
- folD: At2g38660, At3g12290, At4g00620/At4g00600
- metF: At2g44160 (At3g59970 not on chip)
- ygfZ: At4g12130, At1g60990

4. Predictions

● YgfZ (GRMZM2G107665). YgfZ occurs in all plants, all other eukaryotes, most bacteria, and some archaea. YgfZ is a paralog of the folate-dependent GcvT protein of the glycine cleavage complex, and so is likely to use a folate. Bacterial genes encoding YgfZ often cluster with diverse iron/sulfur (Fe/S) proteins (Fig. 1A), and transcriptomic [1] and proteomic data [2,3] show induction by oxidative stress and confirm an Fe/S association. YgfZ is required for full activity of certain Fe/S enzymes in E. coli [4] and yeast [5]. Arabidopsis ygfZ At4g12130 is co-expressed with the iron storage protein ferritin 2 (Fig. 1B). We therefore predict that YgfZ is a novel folate-dependent protein involved in assembly or repair of Fe/S proteins, particularly under oxidative stress.

[Fig. 1 image, panels A and B. Gene labels in panel A: ygfZ, sufC, sufB, sufD, sufS, sdhC, sdhD, sdhA, sdhB, MiaB, nadA, nadC. Organism labels: Rx, Sm, Ba, Pu.]

Fig. 1. A. Clustering of ygfZ genes with Fe/S-related genes. Blue, YgfZ; red, Fe/S proteins; rose, proteins in same complex or pathway as Fe/S proteins; turquoise, Fe/S cluster assembly proteins. Rx, Rubrobacter xylanophilus; Sm, Stenotrophomonas maltophilia; Ba, Buchnera aphidicola; Pu, Pelagibacter ubique. B. Coexpressed gene network around At4g12130 (Atted). Arrow indicates Ferritin 2, which is both directly and indirectly connected to At4g12130 (in yellow).

1. Zheng M, Wang X, Templeton LJ, Smulski DR, LaRossa RA, Storz G (2001) DNA microarray-mediated transcriptional profiling of the Escherichia coli response to hydrogen peroxide. J Bacteriol 183: 4562-4570
2. Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G et al (2009) Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 7: e96
3. Chen JW, Sun CM, Sheng WL, Wang YC, Syu WJ (2006) Expression analysis of up-regulated genes responding to plumbagin in Escherichia coli. J Bacteriol 188: 456-463
4. Ote T, Hashimoto M, Ikeuchi Y, Su'etsugu M, Suzuki T, Katayama T, Kato J (2006) Involvement of the Escherichia coli folate-binding protein YgfZ in RNA modification and regulation of chromosomal replication initiation. Mol Microbiol 59: 265-275
5. Gelling C, Dawes IW, Richhardt N, Lill R, Mühlenhoff U (2008) Mitochondrial Iba57p is required for Fe/S cluster formation on aconitase and activation of radical SAM enzymes. Mol Cell Biol 28: 1851-1861

● COG0212 (GRMZM2G038128). The COG0212 protein is a paralog of YgfA (5-formyl-THF cycloligase). COG0212 occurs in plants, animals, archaea, and some bacteria. Comparative genomics analysis shows that COG0212 occurs in many archaea that lack folates (Fig. 2A), and that in most other organisms it co-occurs with YgfA; these data suggest that COG0212 differs from YgfA in function and has nothing to do with folates. Comparative genomics analysis also reveals clustering of archaeal and bacterial COG0212 genes with various genes of thiamine metabolism and transport, and with genes encoding the pyruvate dehydrogenase complex, which requires thiamine (Fig. 2B-D). Also, Arabidopsis COG0212 is co-expressed with pyruvate dehydrogenase kinase, which regulates the pyruvate dehydrogenase complex (Fig. 2E). We therefore predict that COG0212 mediates a reaction in thiamine metabolism, most probably a salvage reaction. COG0212 cannot mediate a biosynthetic reaction because it is present in animals, and animals do not synthesize thiamine.
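The co-occurrence reasoning in the COG0212 prediction (a gene found in taxa that lack folates is unlikely to be folate-related) can be made concrete with a small phylogenetic-profile comparison. The sketch below is only an illustration: the taxon names and presence/absence values are hypothetical toy data, not values read from Fig. 2A, and the function names are invented for this example.

```python
from typing import Dict, List, Set

# Hypothetical toy profiles: taxon -> features (genes or cofactors) present.
# These values are invented for illustration; they are NOT taken from Fig. 2A.
PROFILES: Dict[str, Set[str]] = {
    "taxon_A": {"COG0212"},
    "taxon_B": {"COG0212", "YgfA", "folates"},
    "taxon_C": {"COG0212"},
    "taxon_D": {"YgfA", "folates"},
    "taxon_E": set(),
}


def jaccard(profiles: Dict[str, Set[str]], x: str, y: str) -> float:
    """Jaccard index between the taxa that contain x and the taxa that contain y."""
    with_x = {taxon for taxon, feats in profiles.items() if x in feats}
    with_y = {taxon for taxon, feats in profiles.items() if y in feats}
    union = with_x | with_y
    return len(with_x & with_y) / len(union) if union else 0.0


def present_without(profiles: Dict[str, Set[str]], x: str, y: str) -> List[str]:
    """Sorted list of taxa that have x but lack y."""
    return sorted(t for t, feats in profiles.items() if x in feats and y not in feats)


if __name__ == "__main__":
    print("YgfA vs folates:", jaccard(PROFILES, "YgfA", "folates"))
    print("COG0212 vs folates:", jaccard(PROFILES, "COG0212", "folates"))
    print("COG0212 without folates:", present_without(PROFILES, "COG0212", "folates"))
```

In a real analysis the profiles would come from a comparative genomics database rather than a hand-written dictionary, and a low overlap score would be one line of evidence to weigh alongside gene clustering and co-expression.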
[Fig. 2 image, panels A–E. Panel A: columns Folates, Analogs, YgfA, 0212 across archaeal taxa (Crenarchaeota, Korarchaeota, Euryarchaeota), with legend ‘Gene or cofactor present’ / ‘Gene or cofactor absent’. Other panel labels: ‘Thiazole ECF transporter’, ‘Hydroxymethylpterin ABC transporter’, ‘Pyrimidine/purine ABC transporter’. Panel E labels: At1g76730, At3g06483.]

Fig. 2. A. Distribution among archaeal taxa of folates and folate analogs in relation to the distribution of genes encoding YgfA and COG0212. B. Clustering of archaeal COG0212 genes with genes for thiamine metabolism and transport.
Note that the COG0212-thiD/thiN duplet is conserved despite changes in gene orientation and flanking genes. C. Clustering of bacterial and archaeal COG0212 genes with genes encoding one or more subunits (E1-E3) of the pyruvate dehydrogenase complex, which requires thiamine pyrophosphate as cofactor. D. Clustering of bacterial COG0212 genes with genes encoding components of ABC transporters predicted to import hydroxymethylpterin and/or formylaminopyrimidine or pyrimidines or purines. E. Correlated expression of Arabidopsis COG0212 (At1g76730) and pyruvate dehydrogenase kinase (At3g06483) during development (from Atted).

● Instructions and recommendations
Start by identifying all the known metabolites, enzymes and their EC numbers, and transporters in the assigned pathway in plants, bacteria, yeast, and animals. Remember that some pathways have variants; be sure to include these. This work will yield the equivalent of a KEGG pathway map.

Next, identify first Arabidopsis and then maize orthologs for as many as possible of the enzymes and transporters, using BlastP searches of Arabidopsis and maize proteins (at NCBI and Maizesequence.org), AraCyc, the KEGG pathway database, and the bibliome. Identify also which enzymatic or transport steps have no corresponding gene in plants, i.e. are cases of ‘missing genes’. And look for paralogs of the known pathway enzymes. These are almost always interesting targets for function predictions, but remember that they may be ‘overannotated’ (via homology) as actually being pathway enzymes even though they are not.

Note also:
- Metabolites, enzymes, and genes have been given various names over the years, and GenBank contains different versions of predictions for the same genes/proteins. Hence multiple gene/protein accession numbers can refer to the same gene/protein.
- Genes can be fused together, so it is important to check whether the proteins you identify have any ‘extra’ domains (use the NCBI Conserved Domains tool). Such domains may carry functions that have yet to be discovered.

Where there are truly multiple genes in Arabidopsis or maize (not just multiple entries for the same gene) for a pathway step, align all the sequences and draw a phylogenetic tree (using Phylogeny.fr or MEGA5). This will distinguish which maize genes correspond to which Arabidopsis genes.

Use TargetP and Predotar to predict subcellular locations of proteins, and use PPDB and SUBA II (and the literature) to find experimental evidence on subcellular location.

Mine plant phenome databases (RAPID, SeedGenes, Chloroplast2010, BAPDB) for information on mutant phenotypes, if available.

Use the Golm Arabidopsis Expression database Multiple Expression Query tool to plot the expression in different organs of each Arabidopsis gene in the pathway.

For ‘missing genes’, predict candidates from Arabidopsis and maize based on homology with proteins from other organisms and/or comparative genomics analysis.

Use comparative genomics to identify candidates for unknown proteins that are (i) common to plants and bacteria, and (ii) associated in some way with the assigned pathway. Examples would be:
- paralogs (see above)
- cases where some bacteria have a gene in which a domain of unknown function is fused to an enzyme of the assigned pathway, and the unknown domain has a homolog in plants
- cases where an unknown gene is clustered on the chromosome in diverse bacteria with genes of the assigned pathway, and the unknown gene has a homolog in plants.

Then use comparative genomics analysis (including post-genomic evidence, e.g. microarray data) and the bibliome to predict a function as precise as possible for the ‘unknown proteins’.

For two cases from your predictions for ‘missing genes’ or ‘unknown proteins’ (you could take one of each, or two of either), summarize the evidence for your prediction in not more than 1 page total per gene. Make use of figures to present the evidence.
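The note above that multiple accession numbers can refer to the same gene also applies to the identifiers in this handout: Arabidopsis loci appear both as At4g00620 and AT4G00620, and the example table marks GRMZM2G107665 as identical to GRMZM2G147498. The following is a minimal sketch of how a group might normalize and deduplicate identifiers before filling in the summary table. The maize pattern is an assumption inferred from the examples in this handout rather than an official specification, and the function names and the alias table are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, List, Optional

# AGI locus codes: "At", chromosome (1-5, C or M), "g", five digits.
AGI_RE = re.compile(r"^AT([1-5CM])G(\d{5})$", re.IGNORECASE)
# Assumption: maize IDs follow the pattern of the examples in this handout.
MAIZE_RE = re.compile(r"^GRMZM2G\d{6}$")

# Alias taken from the example table footnote ("Identical to GRMZM2G147498").
ALIASES: Dict[str, str] = {"GRMZM2G147498": "GRMZM2G107665"}


def normalize(identifier: str, aliases: Dict[str, str] = ALIASES) -> Optional[str]:
    """Return a canonical form of a gene identifier, or None if unrecognized."""
    ident = identifier.strip()
    match = AGI_RE.match(ident)
    if match:
        # Use the mixed-case style of the handout, e.g. At4g00620.
        return f"At{match.group(1).upper()}g{match.group(2)}"
    ident = ident.upper()
    if MAIZE_RE.match(ident):
        return aliases.get(ident, ident)
    return None


def unique_ids(identifiers: Iterable[str]) -> List[str]:
    """Normalize identifiers and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in identifiers:
        norm = normalize(raw)
        if norm is not None and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


if __name__ == "__main__":
    print(unique_ids(["AT4G00620", "At4g00620", "GRMZM2G147498", "GRMZM2G107665"]))
```

Unrecognized strings return None rather than being guessed at, so anything the patterns do not cover (for example other maize ID series) is flagged for manual checking instead of being silently merged.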