2011 PCB 5530 Class Projects
● Background
These projects take cutting-edge genome research into the classroom and so ‘bring to life’ what you
are learning. They will train you to integrate different kinds of information in a team setting.
The class is split into three groups, each coordinated by a postdoctoral researcher from our project. Groups are:
Group 1 – Folate (FT-like)
Ghulam Hasnain [email protected]
Eric R Schultz
Katrina K Cuddy
Junya Zhang
Jian Li
Group 2 – Pyridoxine (YggS)
Basma El Yacoubi [email protected]
Annette M Fahrenkrog
Shaun P Jensen
Jiani Yang
Wenlan Tian
Group 3 – Riboflavin (COG3236)
Océane Frelin [email protected]
Marcio Fernando Resende
Jessica L Gilbert
Sisi Geng
Yang Zhao
Rujira Tisarum
Each group will carry out the following tasks:
- Annotate a particular pathway or set of pathways of vitamin- or cofactor-related metabolism in the
maize genome using a standard format.
- Identify genes that are missing from the pathways (‘functions without a gene’) and genes of unknown
function that are in some way associated with the pathways (‘genes without a function’).
- Predict candidates for missing genes (enzymes, transporters) and predict functions for genes of
unknown function using the tools of plant-prokaryote comparative genomics.
Each group will meet several times, under the supervision of the postdoctoral instructor, to divide up
the work to be done, to discuss progress, and to integrate and write up a report in the format below.
Project reports. Reports should be submitted to Dr. Andrew Hanson [email protected] as an electronic
file and a good quality hard copy, by 5 pm on Friday, November 18, 2011.
Grading. For each student, 50% of the grade will be based on the performance of their group as a
whole as judged from the project report. The other 50% will be based on the postdoctoral instructor’s
assessment of that student’s contribution to the group effort (independent of the group size).
Outcomes. It is anticipated that, in the best cases, the groups’ predictions will form part of a publication in a peer-reviewed journal, in which case the group members will be included as authors.
● Report format – summary
Reports should be arranged in four sections:
1. A diagram summarizing the metabolites and enzymes of the pathways, pathway variants in different
organisms, subcellular compartmentation in Arabidopsis, and maize genes that are missing or have
mysterious paralogs. Use the format on p. 2 (a PowerPoint file is available as a template).
2. A table listing:
- All pathway enzymes (give EC nos.) and transporters (whether or not they have plant homologs)
- Gramene identifiers (e.g. GRMZM2G107665) of the corresponding maize genes
- AGI numbers (e.g. At3g12930) of the corresponding Arabidopsis genes
- Predicted and experimentally determined subcellular location of the Arabidopsis proteins
- Arabidopsis mutant phenotypes (if available), e.g. lethality, growth defect, metabolome change
- When there is >1 maize or Arabidopsis gene, show a phylogenetic tree relating them
3. A figure showing expression in various organs (the ‘development’ display) from the Golm Arabidopsis Expression dbase Multiple Expression Query for each Arabidopsis gene in the pathways.
4. Summaries and supporting evidence for two predictions of candidates for maize genes ‘missing’
from the pathways, or for roles of ‘unknown function’ genes associated with the pathway or its B-vitamin product, e.g. ‘FT-like’, YggS, and COG3236 (no more than 1 page total per gene).
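The table described in item 2 can be kept as structured records while the group collects data. The following is a minimal sketch, not an official template: the class name `PathwayRow`, its fields, and the two identifier patterns are assumptions, and the patterns are inferred only from the example identifiers given above (GRMZM2G107665 and At3g12930).

```python
import re
from dataclasses import dataclass, field

# Assumed identifier patterns, inferred only from the examples in this handout;
# the real databases may accept other forms.
GRAMENE_RE = re.compile(r"^GRMZM2G\d{6}$")
AGI_RE = re.compile(r"^At[1-5CM]g\d{5}$", re.IGNORECASE)


@dataclass
class PathwayRow:
    """One row of the summary table (hypothetical layout)."""
    gene_name: str
    enzyme: str
    ec_numbers: list = field(default_factory=list)
    maize_genes: list = field(default_factory=list)
    arabidopsis_genes: list = field(default_factory=list)

    def notes(self):
        """Return reminders about malformed identifiers and required trees."""
        found = []
        for gene in self.maize_genes:
            if not GRAMENE_RE.match(gene):
                found.append(f"unexpected maize identifier: {gene}")
        for gene in self.arabidopsis_genes:
            if not AGI_RE.match(gene):
                found.append(f"unexpected Arabidopsis identifier: {gene}")
        # The handout asks for a tree when there is more than one gene.
        if len(self.maize_genes) > 1 or len(self.arabidopsis_genes) > 1:
            found.append("more than one gene: add a phylogenetic tree")
        return found


# Example row using the metF entries from the example report.
metf = PathwayRow(
    gene_name="metF",
    enzyme="Methylene-THF reductase",
    ec_numbers=["1.5.1.20"],
    maize_genes=["GRMZM2G082463"],
    arabidopsis_genes=["At2g44160", "At3g59970"],
)
print(metf.notes())
```

Keeping the rows in this form makes it easy to catch mistyped identifiers before the table is written up.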
● Report format – example
1. Pathway diagram. This is a simplified version showing only reactions in the cytosol. A full diagram
(required for your report) would have three parts, each similar to the one shown, representing the
reactions found in cytosol, mitochondria, and plastids. See also Fig. 2, PMID: 10785666.
[Pathway diagram: graphic not reproduced in this transcript. Metabolites shown: His, Formimino-Glu, Glu, Gly, Ser, Met, Hcy, dUMP, dTMP, Purines, Dehydropantoate, Methylbutanoate, HCOOH, tRNAmet, Formyl-tRNAmet, CO2 + NH3. Proteins shown: HutHUI, GcvT, GcvH, GcvP, Lpd (GCV), GlyA, ThyA, FolD, MetE, MetF, PurN, PurH, PanB, Fhs, PurU, Fmt, YgfA, YgfZ, COG0212, FT1, 2, 3. Diagram labels: ‘Maize paralogs of unknown function’ and ‘Missing from maize’.]
Folate Forms
I     Tetrahydrofolate (THF)
II    5,10-Methylene-THF
III   5,10-Methenyl-THF
IV    10-Formyl-THF
V     5-Formyl-THF
VI    5-Methyl-THF
VII   Dihydrofolate
VIII  5-Formimino-THF
Reconstruction of folate-mediated C1 metabolism. Enzymes are
denoted by their names in Escherichia coli or other bacteria. Maize
has genes encoding all of the expected core enzymes except that
most genes of histidine degradation are missing (blue highlights).
Maize has five mysterious extra genes of unknown function (in red).
2. Summary table & phylogenetic trees
Gene name | Enzyme (EC no.) | Maize genes | Arabidopsis gene | Arabidopsis location (1)
folD | 5,10-Methylene-THF dehydrogenase (EC 1.5.1.5) / 5,10-methenyl-THF cyclohydrolase (EC 3.5.4.9) | GRMZM2G130790, GRMZM2G150485, GRMZM2G143230 | At2g38660 | Mito P
     |  |  | At3g12290 | Cytosol P E, Plasma memb E
     |  |  | At4g00600 | Chloro P
     |  |  | At4g00620 | Chloro P E
metF | Methylene-THF reductase (EC 1.5.1.20) | GRMZM2G082463 | At2g44160 | Cytosol P
     |  |  | At3g59970 | Cytosol P
ygfZ | n/a | GRMZM2G107665 (a) | At4g12130 | Mito P
     |  |  | At1g60990 | Chloro P E
etc | etc | etc | etc | etc
(1) P = prediction; E = experimental evidence
(a) Identical to GRMZM2G147498
Arabidopsis Phenotype (example entries): Abnormal shape seedling; Morphology & metabolites normal; etc
[Figure: MEGA phylogenetic tree for FolD proteins. Leaves: AT4G00620, AT4G00600, GRMZM2G143230, AT2G38660, GRMZM2G130790, AT3G12290, GRMZM2G150485. Clade labels: Chloroplast, Mitochondrion, Cytosol.]
[Figure: MEGA phylogenetic tree for MetF proteins. Leaves: At3g59970, At2g44160, GRMZM2G082463.]
[Figure: MEGA phylogenetic tree for YgfZ proteins. Leaves: GRMZM2G107665, GRMZM2G147498, At4g12130, At1g60990. Clade labels: Mitochondrion, Chloroplast.]
3. Expression in various organs
folD At2g38660, At3g12290, At4g00620/At4g00600
metF At2g44160 (At3g59970 not on chip)
ygfZ At4g12130, At1g60990
4. Predictions
● YgfZ (GRMZM2G107665). YgfZ occurs in all plants, all other eukaryotes, most bacteria, and some
archaea. YgfZ is a paralog of the folate-dependent GcvT protein of the glycine cleavage complex,
and so is likely to use a folate. Bacterial genes encoding YgfZ often cluster with diverse iron/sulfur
(Fe/S) proteins (Fig. 1A), and transcriptomic [1] and proteomic data [2,3] show induction by oxidative
stress and confirm an Fe/S association. YgfZ is required for full activity of certain Fe/S enzymes in E.
coli [4] and yeast [5]. Arabidopsis ygfZ At4g12130 is co-expressed with the iron storage protein ferritin 2
(Fig. 1B). We therefore predict that YgfZ is a novel folate-dependent protein involved in assembly or
repair of Fe/S proteins, particularly under oxidative stress.
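[Fig. 1: graphic not reproduced in this transcript. Panel A: gene-cluster maps for Rx, Sm, Ba, and Pu showing ygfZ alongside genes including sufB, sufC, sufD, sufS, sdhA, sdhB, sdhC, sdhD, MiaB, nadA, and nadC. Panel B: co-expression network.]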
Fig. 1. A. Clustering of ygfZ genes with Fe/S-related genes. Blue, YgfZ; red, Fe/S proteins; rose, proteins in
same complex or pathway as Fe/S proteins; turquoise, Fe/S cluster assembly proteins. Rx, Rubrobacter xylanophilus; Sm, Stenotrophomonas maltophilia; Ba, Buchnera aphidicola; Pu, Pelagibacter ubique.
B. Coexpressed gene network around At4g12130 (Atted). Arrow indicates Ferritin 2, which is both directly and
indirectly connected to At4g12130 (in yellow).
1. Zheng M, Wang X, Templeton LJ, Smulski DR, LaRossa RA, Storz G (2001) DNA microarray-mediated transcriptional profiling of the Escherichia coli response to hydrogen peroxide. J Bacteriol 183: 4562-4570
2. Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G et al (2009) Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 7: e96
3. Chen JW, Sun CM, Sheng WL, Wang YC, Syu WJ (2006) Expression analysis of up-regulated genes responding to plumbagin in Escherichia coli. J Bacteriol 188: 456-463
4. Ote T, Hashimoto M, Ikeuchi Y, Su'etsugu M, Suzuki T, Katayama T, Kato J (2006) Involvement of the Escherichia coli folate-binding protein YgfZ in RNA modification and regulation of chromosomal replication initiation. Mol Microbiol 59: 265-275
5. Gelling C, Dawes IW, Richhardt N, Lill R, Mühlenhoff U (2008) Mitochondrial Iba57p is required for Fe/S cluster formation on aconitase and activation of radical SAM enzymes. Mol Cell Biol 28: 1851-1861
● COG0212 (GRMZM2G038128). The COG0212 protein is a paralog of YgfA (5-formyl-THF cycloligase). COG0212 occurs in plants, animals, archaea, and some bacteria. Comparative genomics
analysis shows that COG0212 occurs in many archaea that lack folates (Fig. 2A), and that in most
other organisms it co-occurs with YgfA; these data suggest that COG0212 differs from YgfA in
function and has nothing to do with folates. Comparative genomics analysis also reveals clustering
of archaeal and bacterial COG0212 genes with various genes of thiamine metabolism and transport,
and with genes encoding the pyruvate dehydrogenase complex, which requires thiamine (Fig. 2B-D).
Also, Arabidopsis COG0212 is co-expressed with pyruvate dehydrogenase kinase, which regulates
the pyruvate dehydrogenase complex (Fig. 2E). We therefore predict that COG0212 mediates a
reaction in thiamine metabolism, most probably a salvage reaction. COG0212 cannot mediate a biosynthetic reaction because it is present in animals, and animals do not synthesize thiamine.
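The co-occurrence part of this argument amounts to a presence/absence tally across genomes. A minimal sketch of such a tally follows. The function name `profile_summary` is an assumption, and the genome names and contents are hypothetical placeholders, not data from Fig. 2.

```python
def profile_summary(profiles, gene, partner, cofactor):
    """Tally how a gene co-occurs with a partner gene and a cofactor.

    profiles maps a genome name to the set of features recorded as present.
    """
    with_gene = [name for name, present in profiles.items() if gene in present]
    return {
        "genomes_with_gene": len(with_gene),
        "with_gene_but_no_cofactor": sum(
            1 for name in with_gene if cofactor not in profiles[name]
        ),
        "with_gene_and_partner": sum(
            1 for name in with_gene if partner in profiles[name]
        ),
    }


# Hypothetical placeholder profiles; replace with curated presence/absence data.
toy_profiles = {
    "genome_A": {"COG0212"},
    "genome_B": {"COG0212", "YgfA", "folate"},
    "genome_C": {"YgfA", "folate"},
    "genome_D": {"COG0212", "YgfA", "folate"},
}

print(profile_summary(toy_profiles, "COG0212", "YgfA", "folate"))
```

A gene that turns up in genomes lacking the cofactor, as in `genome_A` here, is the pattern that argues against a role tied to that cofactor.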
[Fig. 2: graphic not reproduced in this transcript. Panel A: presence/absence matrix (columns: Folates, Analogs, YgfA, 0212; key: gene or cofactor present / absent) for archaeal species grouped as Crenarchaeota (Thermoproteales, Sulfolobales), Korarchaeota, and Euryarchaeota (Thermococcales, Methanopyrales, Methanobacteriales, Methanococcales, Thermoplasmatales, Archaeoglobales, Halobacteriales, Methanomicrobiales, Methanosarcinales). Panel B: gene clusters in Pyrobaculum aerophilum, Pyrobaculum arsenaticum, and Thermoproteus neutrophilus showing 0212 with thiD/thiN, thiW, and a thiazole ECF transporter (ATPase, TM, LP). Panel C: gene clusters in Thermus thermophilus, Haloarcula marismortui, and Halomicrobium mukohataei showing 0212 with E1, E2, and E3. Panel D: gene clusters in Ochrobactrum spp., Clostridiales bacterium, and Thiomonas sp. showing 0212 with a hydroxymethylpterin ABC transporter (thiX, thiY, thiZ) or a pyrimidine/purine ABC transporter (ATPase, TM, PBP). Panel E: expression plot for At1g76730 and At3g06483.]
Fig. 2. A. Distribution among archaeal taxa of folates and folate analogs in relation to the distribution of genes
encoding YgfA and COG0212. B. Clustering of archaeal COG0212 genes with genes for thiamine metabolism
and transport. Note that the COG0212-thiD/thiN duplet is conserved despite changes in gene orientation and
flanking genes. C. Clustering of bacterial and archaeal COG0212 genes with genes encoding one or more
subunits (E1-E3) of the pyruvate dehydrogenase complex, which requires thiamine pyrophosphate as cofactor. D.
Clustering of bacterial COG0212 genes with genes encoding components of ABC transporters predicted to import
hydroxy-methylpterin and/or formylaminopyrimidine or pyrimidines or purines. E. Correlated expression of Arabidopsis COG0212 (At1g76730) and pyruvate dehydrogenase kinase (At3g06483) during development (from Atted).
● Instructions and recommendations
Start by identifying all the known metabolites, enzymes and their EC numbers, and transporters in
the assigned pathway in plants, bacteria, yeast, and animals. Remember that some pathways have
variants; be sure to include these. This work will yield the equivalent of a KEGG pathway map.
Next, identify first Arabidopsis and then maize orthologs for as many as possible of the enzymes and
transporters, using BlastP searches of Arabidopsis and maize proteins (at NCBI and Maizesequence.org), AraCyc, the KEGG pathway database, and the bibliome. Identify also which enzymatic or
transport steps have no corresponding gene in plants, i.e. are cases of ‘missing genes’. And look for
paralogs of the known pathway enzymes. These are almost always interesting targets for function
predictions – but remember that they may be ‘overannotated’ (via homology) as actually being pathway enzymes even though they are not. Note also:
- Metabolites, enzymes, and genes have been given various names over the years, and GenBank
contains different versions of predictions for the same genes/proteins. Hence multiple gene/protein
accession numbers can refer to the same gene/protein.
- Genes can be fused together, so it is important to check whether the proteins you identify have any
‘extra’ domains (use the NCBI Conserved Domains tool). Such domains may carry functions that
have yet to be discovered.
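If the BlastP searches are run so that they return tab-separated results, the first pass at picking candidate orthologs can be scripted. The sketch below assumes the standard 12-column tabular BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The function name `best_hits`, the E-value cutoff, and the example lines are illustrative assumptions, not real search results.

```python
import csv
import io

# Column order of standard 12-column tabular BLAST output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: best subject}, ranking hits by bit score.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best = {}
    for fields in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(BLAST_COLUMNS, fields))
        if float(row["evalue"]) > max_evalue:
            continue
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > current[1]:
            best[row["qseqid"]] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical example lines; identifiers and scores are placeholders.
EXAMPLE = "\n".join([
    "query1\tsubjectA\t45.0\t280\t150\t4\t1\t280\t5\t284\t1e-60\t210",
    "query1\tsubjectB\t48.0\t282\t140\t3\t1\t282\t3\t284\t1e-70\t240",
    "query2\tsubjectC\t25.0\t60\t40\t2\t10\t70\t12\t72\t0.5\t30",
])

print(best_hits(EXAMPLE))
```

A best hit is only a starting point. Paralogs and overannotated entries, as cautioned above, still need to be checked by hand and with a phylogenetic tree.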
Where there are truly multiple genes in Arabidopsis or maize (not just multiple entries for the same
gene) for a pathway step, align all the sequences and draw a phylogenetic tree (using Phylogeny.fr or
MEGA5). This will distinguish which maize genes correspond to which Arabidopsis genes.
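Before a tree is drawn, a quick sanity check is to compare aligned sequences pairwise and see which Arabidopsis protein each maize protein is closest to. The sketch below computes the p-distance, the fraction of differing positions ignoring gaps. It is a crude stand-in and not a substitute for the Phylogeny.fr or MEGA5 tree the report requires. The function names and the short sequences are hypothetical placeholders.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def closest_reference(aligned, queries, references):
    """Map each query name to the reference name with the smallest p-distance."""
    return {
        query: min(references, key=lambda ref: p_distance(aligned[query], aligned[ref]))
        for query in queries
    }


# Hypothetical aligned fragments; real input would be a full protein alignment.
toy_alignment = {
    "ath_1": "MKTAYIAKQR",
    "ath_2": "MKSAYLGKHR",
    "maize_1": "MKTAYIAKQK",
    "maize_2": "MRSAYLGKHR",
}

print(closest_reference(toy_alignment, ["maize_1", "maize_2"], ["ath_1", "ath_2"]))
```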
Use TargetP and Predotar to predict subcellular locations of proteins, and use PPDB and SUBA II
(and the literature) to find experimental evidence on subcellular location.
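The example table records locations in a compact form such as "Cytosol P E Plasma memb E", where P marks a prediction and E marks experimental evidence. A small helper can build these strings from two lists transcribed by hand from the tools and databases named above. This is a sketch under that assumption: the function name `location_codes` is hypothetical, and the code does not query TargetP, Predotar, PPDB, or SUBA II itself.

```python
def location_codes(predicted, experimental):
    """Combine predicted (P) and experimental (E) compartments into one string.

    Compartments keep the order in which they are first seen, predictions first.
    """
    order = []
    evidence = {}
    for code, compartments in (("P", predicted), ("E", experimental)):
        for compartment in compartments:
            if compartment not in evidence:
                evidence[compartment] = []
                order.append(compartment)
            if code not in evidence[compartment]:
                evidence[compartment].append(code)
    return " ".join(f"{name} {' '.join(evidence[name])}" for name in order)


# Reproduces one location entry from the example table.
print(location_codes(["Cytosol"], ["Cytosol", "Plasma memb"]))
```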
Mine plant phenome databases (RAPID, SeedGenes, Chloroplast2010, BAPDB) for information on
mutant phenotypes, if available.
Use the Golm Arabidopsis Expression dbase Multiple Expression Query tool to plot the expression in
different organs of each Arabidopsis gene in the pathway.
For ‘missing genes’, predict candidates from Arabidopsis and maize based on homology with proteins from other organisms and/or comparative genomics analysis.
Use comparative genomics to identify candidates for unknown proteins that are (i) common to plants
and bacteria, and (ii) associated in some way with the assigned pathway. Examples would be:
- paralogs (see above)
- cases where some bacteria have a gene in which a domain of unknown function is fused to an
enzyme of the assigned pathway, and the unknown domain has a homolog in plants
- cases where an unknown gene is clustered on the chromosome in diverse bacteria with genes of
the assigned pathway, and the unknown gene has a homolog in plants.
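The chromosomal clustering criterion in the last bullet can be checked with a simple neighborhood scan: for each genome, ask whether the unknown gene lies within a few genes of any pathway gene. The sketch below assumes each genome is given as an ordered list of gene labels on a linear chromosome, with no wrap-around for circular genomes. The function name, the window size, and the toy gene orders are all hypothetical.

```python
def clustering_support(genomes, unknown, pathway_genes, window=3):
    """Return the genomes where `unknown` lies within `window` genes of a pathway gene.

    genomes maps a genome name to its gene labels in chromosomal order.
    """
    supporting = []
    for name, gene_order in genomes.items():
        positions = [i for i, gene in enumerate(gene_order) if gene == unknown]
        near_pathway = any(
            gene_order[j] in pathway_genes
            for i in positions
            for j in range(max(0, i - window), min(len(gene_order), i + window + 1))
            if j != i
        )
        if near_pathway:
            supporting.append(name)
    return supporting


# Hypothetical gene orders; "unk" stands for the gene of unknown function.
toy_genomes = {
    "genome_1": ["x1", "thiD", "unk", "x2"],
    "genome_2": ["unk", "x1", "x2", "x3", "x4", "thiD"],
    "genome_3": ["x1", "x2"],
}

print(clustering_support(toy_genomes, "unk", {"thiD"}, window=2))
```

Clustering seen in many phylogenetically diverse genomes is stronger evidence than clustering in a few close relatives, so the list of supporting genomes should be read with their taxonomy in mind.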
Then use comparative genomics analysis (including post-genomic evidence, e.g. microarray data)
and the bibliome to predict a function as precise as possible for the ‘unknown proteins’.
For two cases from your predictions for ‘missing genes’ or ‘unknown proteins’ (you could take one of
each, or two of either), summarize the evidence for your prediction in not more than 1 page total. Make
use of figures to present the evidence.