Environmental Microbiology (2009) 11(6), 1358–1375
doi:10.1111/j.1462-2920.2008.01863.x
Comparative day/night metatranscriptomic analysis
of microbial communities in the North Pacific
subtropical gyre
Rachel S. Poretsky,1 Ian Hewson,2 Shulei Sun,1
Andrew E. Allen,3 Jonathan P. Zehr2 and
Mary Ann Moran1*
1 University of Georgia, Department of Marine Sciences, Athens, GA 30602, USA.
2 University of California Santa Cruz, Department of Ocean Sciences, Santa Cruz, CA 95064, USA.
3 J. Craig Venter Institute, Microbial and Environmental Genomics, San Diego, CA 92121, USA.
Summary
Metatranscriptomic analyses of microbial assemblages (< 5 µm) from surface water at the Hawaiian
Ocean Time-Series (HOT) revealed community-wide
metabolic activities and day/night patterns of differential gene expression. Pyrosequencing produced
75 558 putative mRNA reads from a day transcriptome
and 75 946 from a night transcriptome. Taxonomic
binning of annotated mRNAs indicated that Cyanobacteria contributed a greater percentage of the transcripts (54% of annotated sequences) than expected
based on abundance (35% of cell counts and 21% of 16S
rRNA libraries), and may represent the most
actively transcribing cells in this surface ocean community in both the day and night. Major heterotrophic
taxa contributing to the community transcriptome
included α-Proteobacteria (19% of annotated
sequences, most of which were SAR11-related) and
γ-Proteobacteria (4%). The composition of transcript
pools was consistent with models of prokaryotic gene
expression, including operon-based transcription
patterns and an abundance of genes predicted to be
highly expressed. Metabolic activities that are shared
by many microbial taxa (e.g. glycolysis, citric acid
cycle, amino acid biosynthesis and transcription and
translation machinery) were well represented among
the community transcripts. There was an overabundance of transcripts for photosynthesis, C1
metabolism and oxidative phosphorylation in the
day compared with night, and evidence that energy
acquisition is coordinated with solar radiation levels
for both autotrophic and heterotrophic microbes. In
contrast, housekeeping activities such as amino acid
biosynthesis, membrane synthesis and repair, and
vitamin biosynthesis were overrepresented in the
night transcriptome. Direct sequencing of these environmental transcripts has provided detailed information on metabolic and biogeochemical responses of a
microbial community to solar forcing.
Received 17 September, 2008; accepted 3 December, 2008. *For
correspondence. E-mail [email protected]; Tel. 706-542-6481; Fax
706-542-5888.
Introduction
Oceanic subtropical gyres make up 40% of the Earth’s
surface and play critical roles in carbon fixation and nutrient
cycling. The Hawaii Ocean Time-Series (HOT) in the North
Pacific subtropical gyre was established to provide a long-term perspective on oceanographic properties of such
systems (Karl and Lukas, 1996) and has served as the
focus of substantial research into the role of marine microorganisms in ocean biogeochemistry (Karl et al., 1997;
Cavender-Bares et al., 2001; Zehr et al., 2001). Station
ALOHA, the core study site at HOT, is characterized by
warm (> 23°C) surface waters with low NO3⁻ concentrations (< 15 nM), seasonally variable surface mixed-layers
(10–120 m), low standing biomass of living organisms
(10–15 µg C l⁻¹) and a persistent deep (75–140 m) chlorophyll a maximum layer. Since 1988, regular measurements
of physical, chemical and biological parameters have been
obtained with monthly ship-based monitoring as well as
bottom-moored instruments and buoys. Recent metagenomic sampling efforts at Station ALOHA have provided
information about the genes harboured by the bacterioplankton community and how they are distributed with
depth (DeLong et al., 2006). Characterizing patterns of
expression of these microbial genes and identifying what
factors induce their expression is the next critical step in
understanding this oceanic ecosystem.
Analogous to metagenomics, environmental transcriptomics (metatranscriptomics) retrieves and sequences
environmental mRNAs from a microbial assemblage
without prior knowledge of what genes the community
might be expressing (Poretsky et al., 2005; Frias-Lopez
et al., 2008). Thus it provides a less biased perspective on
© 2009 The Authors
Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd
microbial gene expression in situ compared with other
approaches (Wawrik et al., 2002; Bürgmann et al., 2003;
Zhou, 2003). Environmental transcriptomics protocols are
technically difficult, however, as prokaryotic mRNAs generally lack the poly(A) tails that make isolation of eukaryotic messages relatively straightforward (Liang and
Pardee, 1992) and because of the relatively short half-lives of mRNAs (Belasco, 1993). In addition, mRNAs are
much less abundant than rRNAs in total RNA extracts,
thus an rRNA background often overwhelms mRNA
signals.
A first analysis of environmental transcriptomes by creating clone libraries using random primers to reverse-transcribe and amplify environmental mRNAs was
successful in two different natural environments
(Poretsky et al., 2005), but results were biased by selection of the random primers used to initiate cDNA synthesis. Techniques to linearly amplify mRNA obviate the
need for random primers in the amplification step and
make it possible to use less starting material (Gelder
et al., 1990), while recently developed pyrosequencing
technologies allow direct sequencing (without cloning)
(Margulies et al., 2005). Initial application of this
approach at Station ALOHA (Frias-Lopez et al., 2008)
and in coastal water mesocosms (Gilbert et al., 2008)
demonstrated its utility for characterizing microbial community gene expression.
Here we use environmental transcriptomics to elucidate
day/night differences in gene expression in surface
waters of the North Pacific subtropical gyre (Karl and
Lukas, 1996). This analysis provides information on the
dominant metabolic processes within the bacterioplankton assemblages and reveals changes in expression patterns of biogeochemically relevant processes.
Results
cDNA sequence annotation
The cDNAs prepared from amplified RNA (collected from
the 0.2–5 µm size fraction) ranged in size from 100 bp to
1 kb, with the majority between 200 and 500 bp. The
average picoliter reactor pyrosequencing read length
was 99 bp, typical for the GS 20 sequencing platform.
Predicted rRNA sequences were removed based on
sequence similarity to the nt database using BLASTN.
While more laborious than our initial approach that used
sequence similarity to the RDP II database supplemented
with an 18S, 23S and 28S rRNA database from genome
sequences, it identified nearly all of the rRNA sequences
in our libraries. Accurate identification of rRNAs is crucial
because of numerous misidentified sequences in the
RefSeq protein database (i.e. rRNA sequences that are
incorrectly annotated as putative proteins). Relatively low
rRNA sequence contamination (37%) compared with the
rRNA content of prokaryotic cells (> 80%; Ingraham et al.,
1983) indicated that the steps for excluding rRNAs
through selective degradation and subtractive hybridization were largely successful.
Sequences remaining after deletion of rRNA
sequences (75 558 from the day and 75 946 from the
night) were categorized as possible protein encoding
sequences and BLASTX-queried against the NCBI
curated, non-redundant reference sequence database
(RefSeq) to determine putative functions (Fig. 1). About
one-third of HOT pyrosequences in each library met the
criteria for gene predictions determined empirically by in
silico analysis of known functional gene sequences fragmented into 100 bp pieces (see Experimental procedures
for more details). This is nearly twice the fraction of reads
identified in metagenomic efforts with similar pyrosequencing read lengths (Frias-Lopez et al., 2008; Mou
et al., 2008), as might be expected for sequences biased
towards coding regions of genomes. These sequences
were subsequently assigned to the function of their best
hit in RefSeq. Transcript abundance was analysed as
relative abundance within the collective community transcriptome rather than per-gene expression levels (see
Frias-Lopez et al., 2008). Empirically derived criteria were
established in separate in silico analyses for the Clusters
of Orthologous Groups (COG) and Kyoto Encyclopedia of
Genes and Genomes (KEGG) databases, which contain
fewer sequences than RefSeq (Fig. 1). Some of the
sequences without hits in RefSeq were similar to proteins
in the Global Ocean Sampling database, indicating that
similar sequences have been found in marine bacterioplankton communities, but functional annotation is not
currently possible.
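The in silico calibration mentioned above (known functional genes cut into 100 bp pieces to mimic the read length) can be illustrated with a minimal Python sketch. The function name `fragment_sequence`, the non-overlapping tiling and the choice to drop a short final piece are assumptions made for illustration; the text does not state how the fragments were generated.

```python
def fragment_sequence(seq, size=100):
    """Cut a nucleotide sequence into consecutive, non-overlapping pieces.

    A trailing piece shorter than `size` is dropped so that every fragment
    has the same length (assumption: fixed-length fragments of ~100 bp).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


# Toy 250 bp "gene" (hypothetical sequence, not real data).
gene = "ATGC" * 62 + "AT"
pieces = fragment_sequence(gene)
print(len(pieces), [len(p) for p in pieces])
```

Each fragment would then be queried against the reference database, and the score distribution of these known-function fragments used to choose cut-offs for the real reads.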
At the end of the annotation pipeline, half of the possible protein-encoding sequences in each library had no
significant hits to previously sequenced genes. To
examine how sequences from uncultured marine bacterial taxa might decrease annotation success or skew
taxonomic assignments, we randomly selected 100 bp
sequences from the coding regions of genome fragments
from SAR86 and SAR116 cells captured in environmental BAC libraries (SAR86 BAC, AF279106; SAR86 BAC,
AY552545; SAR116 BAC, AY744399). Excluding self-hits, approximately 60% of the sequences from the BACs
had no hits in RefSeq (Table S1). In a similar analysis of
coding sequences from cultured taxa with genome
sequences available (Pelagibacter ubique HTCC1062
and Prochlorococcus marinus MIT9312), only ~20% of
the sequences had no hits in RefSeq. Many unannotated
sequences in the HOT libraries are therefore likely to be
transcripts from poorly known taxa, but also include
some transcripts from well-known taxa with poor identity
to sequence databases for that particular 100 bp fragment. In support of the latter, a preliminary analysis of a
marine environmental transcriptome consisting of longer
reads (~200 bp; 454 GS FLX sequencing platform; R.S.
Poretsky and M.A. Moran, unpublished; and Table S1)
resulted in twice the frequency of annotated sequences
as the HOT metatranscriptome. The 100 bp genome
fragments from uncultured taxa that had significant hits
in RefSeq almost always matched a gene from an
organism in the same phylum (90%) or subphylum
(70%), and thus did not significantly skew the taxonomic
assignments (Table S1). SAR86, SAR116 and other currently recognized uncultured groups made up ~4% of the
16S rRNA amplicons from these samples (see below).
Finally, to examine the possibility that the unidentified
sequences were from non-protein-coding regions, these
sequences were BLAST-queried to tRNA genes, 5S rRNA
genes and intergenic region sequences from three
P. marinus genomes (MIT9301, MIT9312 and AS9601)
and two P. ubique genomes (HTCC1002 and
HTCC1062). Based on this analysis, ~4% of the 76 327
unidentified sequences were from non-protein-coding
regions of these genomes, and these primarily hit intergenic regions.
[Figure 1 (flow chart): 240,422 total 454 sequences; BLASTN against nt gave 88,916 rRNA sequences (37%) and 151,504 possible protein-encoding sequences (63%); BLASTX against RefSeq gave 48,648 identified sequences (21%) and 102,856 unidentified sequences (42%); BLASTX against COG assigned 24,474 sequences (10%) and BLASTX against KEGG assigned 35,927 sequences (15%); among the unidentified sequences, BLASTX against GOS matched 26,366 (11%) and BLASTX against nr matched 163 (0.07%), leaving 76,327 unidentified sequences (32%).]
Fig. 1. The mRNA annotation pipeline developed for 454 transcript reads showing combined counts for the day and night transcriptomes. All
percentages are relative to the total number of sequences entering the pipeline.
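The Fig. 1 caption states that all percentages are relative to the total number of sequences entering the pipeline. The short Python sketch below recomputes those percentages from the counts shown in the figure. The dictionary keys are descriptive labels chosen here, not names from the source, and recomputed values may differ from the printed figure by about one percentage point because of rounding.

```python
# Counts read from Fig. 1 (day and night libraries combined).
TOTAL_READS = 240_422

stage_counts = {
    "rRNA (BLASTN vs nt)": 88_916,
    "possible protein-encoding": 151_504,
    "identified (BLASTX vs RefSeq)": 48_648,
    "unidentified after RefSeq": 102_856,
    "assigned to COG": 24_474,
    "assigned to KEGG": 35_927,
    "matched in GOS": 26_366,
    "matched in nr": 163,
    "still unidentified": 76_327,
}


def percent_of_total(count, total=TOTAL_READS):
    """Express a stage count as a percentage of all reads entering the pipeline."""
    return 100.0 * count / total


for label, count in stage_counts.items():
    print(f"{label:32s} {count:>8,d} {percent_of_total(count):6.2f}%")
```

The counts are internally consistent: identified plus unidentified sequences sum to the 151,504 possible protein-encoding reads, and removing the GOS and nr matches from the 102,856 unidentified reads leaves 76,327.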
Community composition and taxonomic origin
of transcripts
Prochlorococcus are the most abundant Cyanobacteria at
Station ALOHA (> 95% of photosynthetic picoplankton
cells; Campbell and Vaulot, 1993) and in this study
accounted for approximately 2 × 10⁵ cells ml⁻¹ (based on
flow cytometric counting; http://hahana.soest.hawaii.edu/hot/hot-dogs/), or ~30% of the total microbial community
(Fig. 2). Heterotrophic bacteria (including phototrophs)
were numerically dominant with ~5 × 10⁵ cells ml⁻¹,
accounting for ~65% of the microbial community present
at the time of sampling. Direct counts also indicated the
presence of ~800 cells ml⁻¹ of pigmented nanoeukaryotes
(0.2%; Fig. 2).
Companion PCR-based 16S rRNA clone libraries were
generated from DNA collected in tandem with the RNA
samples and demonstrated close agreement with the flow
cytometric data in terms of taxonomic composition at
Station ALOHA. Cyanobacteria accounted for ~20% of the
16S rRNA sequences, and heterotrophic bacterial groups
were ~80% (Fig. 3). Among the heterotrophic 16S rRNA
sequences, Proteobacteria were most abundant (41%;
Fig. 3) and were dominated by α-Proteobacteria (22%),
β-Proteobacteria (8%) and γ-Proteobacteria (8%).
Bacteroidetes (8%) and Firmicutes (12%, biased towards
the day sample) were also well represented.
[Figure 2: depth profiles from 0 to 200 m; y-axis, Depth (m); series: chl a (10⁻³ µg l⁻¹), Prochlorococcus (× 10³ cells ml⁻¹), Synechococcus (× 10² cells ml⁻¹), nanoeukaryotes (× 10² cells ml⁻¹) and heterotrophic bacteria (× 10³ cells ml⁻¹).]
Fig. 2. Depth profiles of Prochlorococcus-like, Synechococcus-like, heterotrophic bacteria and pigmented nanoeukaryotes during the HOT-175
cruise, as determined by flow cytometry. The horizontal line indicates the mixed layer depth. The depth profile for chlorophyll a is also
indicated. Data were collected through the HOT project and downloaded from the HOT Data Organization and Graphical System
(http://hahana.soest.hawaii.edu/hot/hot-dogs/).
Taxonomically binned mRNA sequences were compared with community composition data to ask whether
taxa contributed to the HOT community mRNA in proportion to their representation in the microbial assemblage
(i.e. whether taxa are equally transcriptionally active on a
per-cell basis). Cyanobacteria dominated the transcript
libraries (55% of sequences) with about twofold higher
representation than in the 16S rRNA amplicons or the cell
count data (Fig. 3), indicating that there is more gene
expression in these autotrophic bacterioplankton than in
co-occurring heterotrophs (or possibly that their transcripts are longer-lived). When relative 16S rRNA abundance was calculated among just the heterotrophic
groups (i.e. with cyanobacterial sequences removed),
many taxa had similar contributions to the transcript pool
and amplicon pool, suggesting comparable levels of
transcriptional activity on a per-gene basis within the limits
of recognized biases of PCR amplification (Fig. 3).
Proteobacteria contributed the second largest number of
transcript sequences (28%), most of which were attributed to α-Proteobacteria (19%) and γ-Proteobacteria
(4%). Approximately 2% of the total transcripts were of
eukaryotic origin. Comparing putative taxonomic assignments of transcripts between day and night, Cyanobacteria contributed equally to the day and night transcriptome
(55% versus 56%) as did α-Proteobacteria (40% versus
45% of heterotrophic transcripts) and γ-Proteobacteria
(11% versus 8% of heterotrophic transcripts) (Fig. 3).
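The comparison of transcript share with community share described above reduces to a simple ratio. The sketch below computes it for Cyanobacteria from the percentages quoted in the text and in Fig. 3. The function name `representation_ratio` is introduced here for illustration, and the ratio is only a rough index of relative transcriptional activity given the PCR and counting biases the authors mention.

```python
def representation_ratio(transcript_share, abundance_share):
    """Ratio of a taxon's share of transcripts to its share of the community.

    Values above 1 mean the taxon contributes more transcripts than its
    abundance alone would predict.
    """
    if abundance_share <= 0:
        raise ValueError("abundance_share must be positive")
    return transcript_share / abundance_share


# Cyanobacterial shares in percent, as quoted in the text and Fig. 3.
print("day, mRNA vs 16S amplicons:  ", round(representation_ratio(55, 18), 2))
print("night, mRNA vs 16S amplicons:", round(representation_ratio(56, 21), 2))
print("mRNA vs cell counts (~30%):  ", round(representation_ratio(55, 30), 2))
```

Depending on which abundance estimate is used, the ratio falls between roughly 1.8 and 3, in line with the "about twofold" overrepresentation reported above.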
More detailed taxonomic assignment of transcripts was
carried out for the best represented clades. The Cyanobacteria transcripts were dominated by Prochlorococcus-like sequences most similar to P. marinus AS9601,
P. marinus MIT 9301 and P. marinus MIT 9312 (Table 1).
The α-Proteobacteria, the most transcriptionally active
among the heterotrophic groups, mostly contained
sequences with similarity to the SAR11 group members
P. ubique HTCC1002 and P. ubique HTCC1062 (~10% of
prokaryotic transcripts). Roseobacter-like sequences
were also represented and were primarily assigned to
Dinoroseobacter shibae DFL 12, Jannaschia sp. CCS1,
Silicibacter pomeroyi DSS-3, Roseobacter denitrificans
Och 114 and Silicibacter sp. TM1040 (Table 1 and Fig. 4).
These assignments do not imply that these actual species
were present at the time of sample collection, but rather
they represent the best current sequence matches for
some of the more abundant environmental transcripts.
[Figure 3: paired charts of taxon contributions. A (day): 16S rRNA genes, Cyanobacteria 18%, other 82%; mRNA, Cyanobacteria 55%, other 45%. B (night): 16S rRNA genes, Cyanobacteria 21%, other 79%; mRNA, Cyanobacteria 56%, other 44%. Legend: Cyanobacteria, Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Other Proteobacteria, Actinobacteria, Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Acidobacteria, Firmicutes, Lentisphaerae, Planctomycetes, Spirochaetes, Thermotogae, Verrucomicrobia.]
Fig. 3. Contribution of taxa to the 16S rRNA amplicon pool and transcript pool for the day (A) and night (B) samples. Taxonomy is presented
to the phylum level (based on NCBI taxonomy) except for Proteobacteria, which is at the subphylum level. The dashed red lines indicate
cyanobacterial abundance in the night sample as determined by flow cytometric counting.
Transcriptome coverage
To estimate transcriptome coverage, 16S rRNA clone
library data were used to establish a taxon-abundance
model for the HOT community at an identity level of 99%.
Assuming that each taxon expresses 1000 different
genes at any given time (based on the Escherichia coli
model; Ingraham et al., 1983) and that genome coverage
follows a Lander–Waterman model (Lander and Waterman, 1988), we estimate that the most abundant taxon in
the day or night sample had over 90% transcriptome
coverage (i.e. 90% of the expressed genes were
sequenced at least once), while the 15 most abundant
taxa had more than half of their transcriptome represented (Table S2). Alternately, we determined the singletons and doubletons among the COG categories (i.e. the
number of COGs containing only one or two sequences)
and applied the Chao1 index of diversity to determine the
theoretical abundance of COGs in the day and night. The
sequencing effort captured about 80% of the COGs predicted to be present in the night transcriptome and 70% of
the COGs predicted for the day transcriptome (Table S2).
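The two coverage estimates described above can be sketched in a few lines of Python. The first function uses the Poisson form commonly associated with the Lander–Waterman model, under the simplifying assumptions (made here, not stated in the source) that each read hits one gene and that all expressed genes are equally likely to be sampled. The second is the classic Chao1 richness estimator computed from singleton and doubleton counts. All function names and example numbers are illustrative.

```python
import math


def expected_coverage(n_reads, n_expressed_genes=1000):
    """Expected fraction of expressed genes sequenced at least once.

    Poisson approximation: a gene receives n_reads / n_expressed_genes hits
    on average, so the chance it is missed entirely is exp(-mean).
    """
    return 1.0 - math.exp(-n_reads / n_expressed_genes)


def reads_needed(target_fraction, n_expressed_genes=1000):
    """Reads from one taxon required to reach a target coverage fraction."""
    return -n_expressed_genes * math.log(1.0 - target_fraction)


def chao1(s_obs, singletons, doubletons):
    """Chao1 estimate of total richness from observed categories.

    Falls back to the bias-corrected form when there are no doubletons.
    """
    if doubletons > 0:
        return s_obs + singletons ** 2 / (2.0 * doubletons)
    return s_obs + singletons * (singletons - 1) / 2.0


# With 1000 expressed genes, about 2300 reads give 90% expected coverage.
print(round(reads_needed(0.9)))
# Hypothetical counts: 100 observed COGs, 20 singletons, 10 doubletons.
print(chao1(100, 20, 10))
```

The fraction of predicted COGs captured by sequencing is then the observed count divided by the Chao1 estimate.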
[Figure 4: two panels, A and B; axis labels: Frequency, Number of Adjacent Genes, % PHX Genes.]
Fig. 4. Evidence for prokaryotic gene expression patterns in the community transcriptome based on P. marinus, P. ubique and Roseobacter
genome bins.
A. Operon-based expression was evaluated by comparing the number of adjacent transcripts (closed circles) to the number of adjacent genes
found in 1000 random samples of the same size from the reference genome (black lines).
B. Preferential representation of transcripts from genes predicted to be highly expressed was evaluated by comparing the per cent of PHX
genes in the reference genome (grey bar) to the per cent in the transcript pool (black bar). Differences between transcript pools and reference
genomes were significant for both operon and PHX analyses (Wilcoxon signed-rank test; P < 0.05).
Based on these coverage estimates, increased
sequencing depth would have been required to fully
capture some specialized processes carried out by rarer
members of the HOT community, but frequently transcribed genes from abundant taxa were well represented.
In support of this, transcript mapping to the three P. marinus and two P. ubique reference genomes showed
sequences with homology to approximately half the
genes, at coverage depths ranging from 1 to nearly 500
hits per gene (Fig. 5). Moreover, many of the reference
genes with the greatest coverage are those mediating
metabolic processes expected to be dominant in the HOT
bacterioplankton community (e.g. the photosynthesis
genes psaA and psaB, the light-harvesting complex and
RuBisCo, ammonium transporters and transcription-related genes; Fig. 5). Other genes on the reference
genomes for which there is similarly deep transcript coverage (e.g. proteorhodopsin, Na+/solute symporters,
colicin V production and several hypothetical proteins)
can be hypothesized to also represent dominant metabolic activities (Fig. 5).
Table 1. Number of sequences from the community transcriptome
with highest homology to the listed reference genomes, as determined by top BLASTX hit to RefSeq.
Reference genome                          Night    Day
Prochlorococcus marinus str. MIT 9301      6309   6292
Prochlorococcus marinus str. AS9601        3214   2849
Pelagibacter ubique HTCC1002               2541   1851
Prochlorococcus marinus str. MIT 9312      1430   1264
Pelagibacter ubique HTCC1062               1308    944
Dinoroseobacter shibae DFL 12                48     34
Jannaschia sp. CCS1                          41     27
Silicibacter pomeroyi DSS-3                  39     30
Roseobacter denitrificans Och 114            30     28
Silicibacter sp. TM1040                      19     26
[Figure 5: five panels plotting transcript occurrences (y-axis) against gene number in the reference genome (x-axis). Labelled genes: A, MIT9312: ribosomal protein L14, hypothetical protein, photosystem II PsbJ protein, photosystem II D2, ribosomal protein L20, cytochrome b559 beta subunit. B, MIT9301: ammonium transporter family, photosystem I PsaA, ribulose bisphosphate carboxylase, elongation factor Tu, protoporphyrin IX magnesium chelatase subunit chlH, photosystem II PsbB (CP47). C, AS9601: photosystem II PsbA (D1), photosystem I PsaB, light-harvesting complex protein, integral membrane protein (interacts with FtsH), 30S ribosomal protein S3, photosystem II reaction center Z. D, HTCC 1002: Na+/solute symporter, DNA-directed RNA polymerase beta prime chain, bacteriorhodopsin, AcrB/AcrD/AcrF family protein (acriflavin resistance), chromosome segregation SMC family protein, hypothetical protein. E, HTCC 1062: 30S ribosomal protein S1, excinuclease ABC subunit C, heat shock protein a, octaprenyl-diphosphate synthase, translation elongation factor EF-G, adenylylsulfate reductase, lipoprotein precursor.]
Fig. 5. Mapping of transcripts to five reference genomes. A–C are P. marinus strains; D–E are P. ubique strains. The x-axis shows gene
number in the reference genome. Shaded areas represent possible hypervariable regions with few mapped transcripts.
Operon signature in environmental transcript pools
Genes that encode steps in the same metabolic pathway
are frequently clustered into operons in prokaryotic
genomes (Overbeek et al., 1999) to facilitate coordinated
transcription. Thus a cell’s transcript pool is anticipated to
include more mRNAs from adjacent genes than what is
expected from a random sampling of the genome. We
tested this using the transcripts assigned to taxonomic
bins for P. marinus, P. ubique and Roseobacter by counting the frequency with which transcripts from two adjacent
genes on the reference strain genome (defined as ≤ 1
gene intervening) were both present in the bin, recognizing that the wild and reference organisms will not be fully
syntenic. In all cases, the transcript bins had significantly
more adjacent genes than a null distribution generated
from the reference genomes (Fig. 4A), suggesting that
random transcript sequencing captures operon-based
expression patterns in natural marine bacterioplankton
communities.
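The adjacency test described above can be sketched as a resampling procedure in Python. The sketch assumes genes are numbered by position on a linear reference genome, counts pairs of detected genes with at most one gene between them, and builds the null distribution from 1000 random gene sets of the same size, as in the Fig. 4 legend. The empirical P-value computed here is a simple stand-in chosen for illustration; the figure legend reports a Wilcoxon signed-rank test, and all names below are hypothetical.

```python
import random


def count_adjacent_pairs(gene_indices, max_intervening=1):
    """Count pairs of detected genes separated by at most `max_intervening` genes."""
    present = set(gene_indices)
    pairs = 0
    for gene in present:
        # Look only "forward" so each pair is counted once.
        for step in range(1, max_intervening + 2):
            if gene + step in present:
                pairs += 1
    return pairs


def adjacency_null_test(detected, genome_size, n_samples=1000, seed=0):
    """Compare observed adjacency with random gene sets of the same size.

    Returns (observed pairs, mean pairs under the null, empirical P-value).
    """
    rng = random.Random(seed)
    detected = set(detected)
    observed = count_adjacent_pairs(detected)
    null_counts = [
        count_adjacent_pairs(rng.sample(range(genome_size), len(detected)))
        for _ in range(n_samples)
    ]
    at_least_as_large = sum(1 for c in null_counts if c >= observed)
    p_value = (at_least_as_large + 1) / (n_samples + 1)
    return observed, sum(null_counts) / n_samples, p_value


# Hypothetical bin: 100 consecutive genes detected on a 2000-gene genome.
print(adjacency_null_test(range(100), genome_size=2000))
```

A real analysis would also need to handle circular chromosomes, strand and imperfect synteny between wild and reference genomes, none of which this sketch attempts.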
Predicted highly expressed genes in environmental
transcript pools
Genes that are frequently transcribed by a cell can be
identified based on patterns in codon usage (Karlin and
Mrázek, 2000). We identified predicted highly expressed
(PHX) genes for the reference genomes, and then
assigned PHX status to the transcripts with best hits to
that reference genome based on homology. For all taxa,
and in accordance with biological expectations, the environmental transcript bins had a significantly higher percentage of PHX genes than the reference genomes
(Fig. 4B). This pattern was particularly evident for the
Roseobacters (9% of the genes in the reference genomes
are PHX versus 30% of the transcripts; 3.1-fold enrichment) and for P. marinus MIT9301 (4.6% versus 12.9%;
2.8-fold enrichment). A larger proportion of PHX transcripts were found in the day for all P. marinus bins and
the Roseobacter bin (although not for P. ubique), suggesting that highly expressed genes more frequently mediate
daytime-biased processes (data not shown).
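The fold-enrichment values quoted above are ratios of the PHX percentage in the transcript pool to the PHX percentage in the reference genome. A minimal Python check follows; the function name `fold_enrichment` is introduced here for illustration.

```python
def fold_enrichment(pct_in_transcripts, pct_in_genome):
    """Ratio of PHX gene percentage in transcripts to that in the genome."""
    return pct_in_transcripts / pct_in_genome


# P. marinus MIT9301: 12.9% of transcripts versus 4.6% of genome genes.
print(round(fold_enrichment(12.9, 4.6), 1))
# Roseobacter, using the rounded percentages quoted in the text (30% and 9%).
print(round(fold_enrichment(30, 9), 1))
```

The MIT9301 ratio reproduces the reported 2.8-fold enrichment. The rounded Roseobacter percentages give about 3.3 rather than the reported 3.1, presumably because the published value was calculated from unrounded percentages; that explanation is an assumption.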
Metatranscriptomic comparison of day and night
samples
The majority of annotated transcripts (~80%) were
assigned to genes related to metabolism, and in particular
to three KEGG categories: amino acid transport and
metabolism, energy production and conversion (particularly oxidative phosphorylation, carbon fixation and nitrogen metabolism), and carbohydrate transport (Fig. 6).
Membrane transport and signal transduction pathways
were also common in the community transcriptome,
specifically for ABC transporters of amino acids, glycine
betaine/L-proline, polyamines (spermidine and putrescine), iron and nutrients in the form of nitrate, phosphate and phosphonate.
The day/night samples allowed comparison of dominant
expression patterns in the presence and absence of solar
radiation in the bacterioplankton community. Among the
167 KEGG metabolic pathways represented in the annotated sequences, four pathways were better represented
at night (including those for glycosphingolipid biosynthesis
and nucleotide sugars metabolism) and six were better
represented in the day (including photosynthesis and oxidative phosphorylation) (95% confidence level; Table 2).
Some KEGG pathways had significant diel differences in
frequency for individual taxonomic bins. These include:
histidine biosynthesis, with evidence for expression of all
or nearly all genes in the pathway (both P. ubique and
P. marinus at night; Fig. 7A and Fig. S1A); metabolism of
glutathione, a reductant with multiple detoxifying and cytoprotective capabilities (P. marinus at night); the photosynthesis pathway (phycobilisome, photosystem I and II,
cytochromes, ATP synthase) and nearly all genes
involved in biosynthesis of phytoene, and subsequent
conversion into carotenoids (P. marinus in the day;
Fig. 7B); nucleotide sugars metabolism, glycosphingolipid
biosynthesis, carotenoid biosynthesis and vitamin B6
metabolism (P. ubique in the night; Fig. S1B); and transfer
of methyl groups for C1 metabolism (P. ubique and
Roseobacter in the day) (Table S3).
Transcript annotation based on the COG database was
comparable. Among the 1577 COGs represented, statistical comparisons identified 12 that were better represented at night and 13 that were better represented in the
day (Table S4). These included amino acid and nucleotide
metabolism, membrane biosynthesis and polyamine
dehydrogenation at night, and light-mediated energy production, protein turnover, catalase synthesis and inorganic ion transport and metabolism in the day.
Statistically significant differences in the distribution of
transcripts between the day and night samples were also
assessed independently of KEGG and COG assignments
in order to capture signals from genes not currently classified by these annotation systems. Among the additional
significant functions overrepresented in the night transcriptome were those for ABC-type spermidine/putrescine
transport system permeases, RNA methyltransferases
and signal transduction histidine kinases. For the day
transcriptome, genes encoding proteorhodopsin and an
aromatic-ring hydroxylase were significantly overrepresented (Table S5).
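To make the idea of a day/night frequency comparison concrete, the sketch below applies a two-proportion z-test to the read counts for one functional category in two libraries. This is a swapped-in textbook test chosen for brevity, not the authors' procedure (Fig. 6 cites Rodriguez-Brito et al., 2006 for the comparisons used). The counts in the example are hypothetical, and no multiple-testing correction is applied.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Two-sided z-test for a difference between two read proportions.

    Returns (z statistic, two-sided P-value) using the pooled standard error.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; counts carry no variation")
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 300 of 25 000 annotated day reads versus
# 200 of 25 000 annotated night reads fall in one pathway.
z, p = two_proportion_z_test(300, 25_000, 200, 25_000)
print(f"z = {z:.2f}, P = {p:.2g}")
```

Because hundreds of pathways or COGs are tested at once, a real analysis would normally adjust the resulting P-values for multiple comparisons.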
Eukaryotic sequences
The majority of eukaryotic transcripts were most closely
affiliated with sequences from green-lineage organisms
(Viridiplantae), such as the picoeukaryotic prasinophytes
Ostreococcus spp. (Derelle et al., 2006) and Micromonas
spp. A large number of transcripts also appeared to be
© 2009 The Authors
Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375
1366 R. S. Poretsky et al.
Fig. 6. The 50 most abundant KEGG pathways in the night (black) and day (gray) transcriptomes. The pathways marked with stars were
significantly overexpressed in one of the pools as determined by comparisons with P < 0.05 (Rodriguez-Brito et al., 2006).
most closely related to genes in Chromalveoltae
(Stramenopile or Alevolate) genomes. These groups are
major components of the picoeukaryotic phytoplankton
(McDonald et al., 2007) and are small enough to pass the
5 mm prefilter used in this study. Gene transcripts that
most closely matched reference genomes of photosynthetic eukaryotes were more abundant in the day compared with night sample. Among the most highly
© 2009 The Authors
Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375
Comparative Metatranscriptomic Analysis 1367
Table 2. KEGG pathways significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).
Pathway ID  Pathway                                           Category
path00520   Nucleotide sugars metabolism                      Carbohydrate Metabolism
path00521   Streptomycin biosynthesis                         Biosynthesis of Secondary Metabolites
path00602   Glycosphingolipid biosynthesis – neo-lactoseries  Glycan Biosynthesis and Metabolism
path00603   Glycosphingolipid biosynthesis – globoseries      Glycan Biosynthesis and Metabolism
path00190   Oxidative phosphorylation                         Energy Metabolism
path00195   Photosynthesis                                    Energy Metabolism
path03010   Ribosome                                          Translation
path03020   RNA polymerase                                    Transcription
path04940   Chaperonin                                        N/A
path05060   Chaperonin                                        N/A
expressed genes detected from eukaryotic organisms
were those encoding chlorophyll binding proteins, light
harvesting reactions and photosynthetic machinery
(Fig. 8). These included a photosystem II D1 reaction-centre protein related to that from the diatom Thalassiosira pseudonana, as well as the plastid-encoded
photosystem I subunit protein similar to psaB from the
diatom Odontella sinensis. Evidence for stramenopile
nitrogen metabolism via urea cycle activity was also
detected based on several transcripts that most closely
matched stramenopile carbamoyl phosphate synthetase
III, indicating that the unique diatom urea cycle (Armbrust
et al., 2004; Allen et al., 2006) is likely active in natural
populations of stramenopile picophytoplankton.
qPCR quality control
The half-life of microbial transcripts can be as short as
30 s based on studies of mRNAs of cultured bacteria
(Belasco, 1993), while processing times for environmental
nucleic acid samples can take hours (Fuhrman et al.,
1988). Linear amplification of RNA greatly reduces the
time between initiation of sampling and capture of transcripts because sample volumes can be reduced, but it
has potential to introduce bias into the sequenced mRNA
pool. A previous test with mRNA from the cultured marine
bacterium S. pomeroyi DSS-3 demonstrated minor bias
and good repeatability during linear amplification (Bürgmann et al., 2007). Here, we assessed the full environmental transcriptomic sequencing protocol by comparing
qPCR-based ratios of selected genes in day versus night
total RNA fractions to the pyrosequencing-based ratio of
these same genes in the sequenced transcript pools. Five
genes common in the transcriptome (P. marinus-like recA
and psaA, P. ubique-like proteorhodopsin and Na+/solute
symporter, and P. torquis-like membrane proteinase)
showed a strong positive correlation between night and
day ratios in the original RNA pool and the pyrosequence
data sets (r = 0.94, Fig. S2), indicating that the sequenced
metatranscriptome was representative of the unamplified
mRNA pool.
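The comparison described above reduces to a correlation between two sets of per-gene ratios. A minimal Python sketch of that calculation follows, assuming a Pearson coefficient computed on untransformed ratios (the text does not state whether ratios were transformed). The five ratio values are hypothetical placeholders, not the measurements behind Fig. S2, and the function name pearson_r is ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical night:day ratios for five genes (illustrative values only).
qpcr_ratio = [0.5, 0.8, 1.2, 2.0, 4.0]      # from qPCR on total RNA
pyroseq_ratio = [0.6, 0.7, 1.5, 1.8, 3.5]   # from read counts in the sequenced pools

r = pearson_r(qpcr_ratio, pyroseq_ratio)
print(f"r = {r:.2f}")
```

With only five genes, a single outlier can move r substantially, so a coefficient of this kind is best read as a consistency check rather than a calibration.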
Discussion
The HOT program provides comprehensive, long-term
oceanographic information for the oligotrophic North
Pacific Ocean (Karl and Lukas, 1996). In situ dissolved
organic constituents at 25 m depth at Station ALOHA are
typically 70–110 µM for carbon, 5–6 µM for nitrogen and
0.2–0.3 µM for phosphorus; ammonium concentrations in
these waters (~50 nM) are below the detection limit of
standard nutrient analysis (http://hahana.soest.hawaii.edu/hot/hot-dogs/). Surface water nutrient data over the
past several decades for the month of November, the
month in which the community transcriptomes in this
study were obtained, and taken during various times of
day show no discernable differences in organic and inorganic carbon, nitrogen, and/or phosphorus concentrations
at Station ALOHA on a diel basis.
Building on previous metagenomic and transcriptomic
analyses of this system (DeLong et al., 2006; Frias-Lopez
et al., 2008), this day/night environmental transcriptomics
effort provides insight into the temporal patterns of bacterioplankton metabolic processes and ecological activities
(Table 3). Three important caveats of the analysis are
that: (i) the composition of the environmental transcriptomes may be inadvertently shaped by collection and
filtration manipulations, (ii) mRNAs with intrinsically
shorter half-lives are less likely to be stabilized and
sequenced and (iii) only 32% of the 151 000 possible
transcript sequences could be confidently assigned to a
known function (Fig. 1). Despite these concerns, the community transcriptomes provided reasonable coverage of
mRNAs from the dominant organisms, and the relative
representation of transcripts was corroborated by RT-qPCR-based
expression analyses (Fig. S2).
The community transcriptomes had properties consistent with expected attributes of the HOT ecosystem,
including the apparent taxonomic affiliations of transcripts. Closely related P. marinus reference strains that
are members of high light clade eMIT9312 comprised the
most populated transcript bin. This clade has been shown
to dominate in the upper euphotic zone (< 50 m) at low
and mid latitudes (below 30°) (Johnson et al., 2006),
much like the HOT stations from which our samples were
collected. SAR11-like sequences comprised the second
largest taxonomic bin. This taxon is the most numerous
heterotrophic marine bacterioplankton group, particularly
in oligotrophic oceans where it makes up 30–40% of cells
in the euphotic zone (Morris et al., 2002).
Fig. 7. Transcript mapping to the KEGG histidine metabolism pathway for P. ubique, overrepresented at night (A) and the biosynthesis of
steroids and carotenoids pathway for P. marinus, overrepresented in the day (B). Colour (blue for night, yellow for day) indicates that
transcripts were found; grey indicates that genes were present in the reference genome but no transcripts were found; white indicates that
genes were not present in the reference genomes.
Studies of taxonomic composition of ocean assemblages consistently show the numerical importance of α- and γ-Proteobacteria, Cyanobacteria, and Bacteroidetes
(Morris et al., 2002; DeLong et al., 2006; Rusch
et al., 2007), but little is known about how abundance
specifically relates to activity levels. Based on comparisons of the relative abundance of taxa (flow cytometry
counts and 16S rRNA amplicons) to their representation
in the community transcriptome, by far the highest per-cell
transcriptional activity level in the HOT ecosystem was
seen for the Cyanobacteria. Assuming similar mRNA half-lives across the prokaryotic taxa, dominant autotrophs
produced more transcripts per gene than any
co-occurring heterotrophic group not only in the day, but
also at night (Fig. 3). This may reflect an advantage of
autotrophy over heterotrophy for maintaining cellular
activity levels given the low concentration and refractory
nature of organic carbon fuelling heterotrophic activity in
the oligotrophic ocean (Bauer et al., 1992).
As expected, many transcripts involved in light-mediated processes, such as photosynthesis and proteorhodopsin activity, were among those overrepresented in
the community transcriptome in the day. Transcripts
involved in protection or repair of light-induced DNA and
protein damage (e.g. catalase, chaperones, photolyases,
superoxide dismutase and various DNA repair proteins)
were also common in the day sample. Evidence
of daytime C1 utilization by some heterotrophs suggests
a source of C1 compounds or methyl groups in this
[Fig. 8 bar chart. Horizontal axis: number of transcripts (0–180). Gene Ontology categories plotted: electron transport; photosynthesis, light reaction; phosphorus metabolic process; oxidative phosphorylation; ion transmembrane transporter activity; energy derivation by oxidation of organic compounds; heme binding; cellular biosynthetic process; protein metabolic process; cellular macromolecule metabolic process; organelle organization and biogenesis; DNA metabolism; organic acid metabolic process; carbon utilization by fixation of carbon dioxide; aldehyde metabolic process; macromolecular complex assembly; cellular component assembly; ribonucleoprotein complex biogenesis and assembly; macromolecule biosynthetic process; intracellular transport; aromatic compound metabolic process; biopolymer metabolic process; amino acid and derivative metabolic process.]
Fig. 8. Number of eukaryotic transcripts in day (top bars) compared with night (bottom bars) samples. The relative contributions of
Viridiplantae (green), photosynthetic Chromist algae (yellow), and other Chromist (red) transcripts to each Gene Ontology (GO) annotation
category are depicted.
Table 3. Selected biogeochemically relevant genes in the HOT metatranscriptome.
Nitrogen                    Nitrogenase (N fixation)                          nifH, nifU, nifS, nifB
                            Ammonium transport                                amt
                            Ammonia monooxygenase                             amoA
                            Assimilatory nitrate reductase                    narB
                            Hydroxylamine oxidoreductase                      hao
                            Nitrate permease                                  napA
                            Nitrite reductase                                 nirA
                            Dissimilatory nitrite reductase                   nirK, nirS
                            Nitric oxide reductase                            norQ
                            Nitrate transporter                               narK
                            Urease                                            ureC, ureE, ureF
Methylotrophy               Serine-glyoxylate aminotransferase
                            Formate dehydrogenase                             fdh, fdsD
                            Methylene tetrahydrofolate reductase              metF
                            Methane monooxygenase                             mmo
                            Methanol dehydrogenase                            mxa
                            Methenyltetrahydromethanopterin cyclohydrolase    mch
                            Crotonyl-CoA reductase
                            Formaldehyde-activating enzyme                    fae
Polyamine degradation       Deoxyhypusine synthase                            dys2
                            Spermidine/putrescine transport system permease   potC
                            Acetylpolyamine aminohydrolase                    aphA
Sulphur cycle               Sulphur oxidation                                 soxB, soxC, soxA, soxZ, soxF
                            Dimethylsulphoniopropionate demethylase           dmdA
Glycine betaine             Dimethylglycine dehydrogenase                     dmgdh
                            Glycine cleavage system (aminomethyltransferase)  gcvT
Aromatic compounds          Aromatic ring hydroxylase                         chlP
                            Protocatechuate 3,4-dioxygenase                   pcaH
                            Benzoyl-CoA oxygenase                             boxA
Carbon monoxide             Carbon monoxide dehydrogenase                     coxS, coxM, coxL
Phototrophy and C fixation  Photosystem I                                     multiple
                            Photosystem II                                    multiple
                            Rubisco                                           rbcL, rbcS
                            Photosynthetic reaction centre, M subunit         pufM
                            Proteorhodopsin
Phosphate assimilation      Phosphonate uptake                                phnD, phnC
                            Alkaline phosphatase                              phoA
                            Phosphate uptake                                  pstA, pstS
Amino acid metabolism       Glutamate synthase                                gltB
                            Glutathione reductase                             gor
                            Histidine kinase                                  baeS
                            Threonine synthase                                thrC
Trace metal uptake          Selenium
                            Iron                                              tonB
                            Arsenite
                            Arsenate reductase                                arsC
Night and Day columns (one '+' or '+*' per occupied cell, in extracted order):
+ + + +* + + + + + + + + + + + + +
+* +* + + + + + + + + + +* + + + +* +
+ + + + + + + + + + +* +* +* +* + + +
+ + +* +* +* + +* + + + + + + + + + +
A ‘+’ indicates occurrence in the night or day sample. An asterisk indicates a significantly higher transcript frequency in that sample than in the other.
ecosystem. Compounds such as methanol and formaldehyde (Heikes et al., 2002; Carpenter et al., 2004; Giovannoni et al., 2008), methane (Ward et al., 1987), and
methyl halides (Woodall et al., 2001; Schaefer et al., 2002)
may be available to heterotrophic bacterioplankton in
surface sea water. Dimethylsulphoniopropionate, an
organic sulphur compound produced in abundance by
marine phytoplankton (Kiene et al., 2000), is a rich source
of methyl groups for surface ocean bacterioplankton, and
tetrahydrofolate-mediated C1 transfer (i.e. transcripts
mapping to the C1 pool by folate and methane metabolism
KEGG pathway; Table S5) has been shown to play a role
in its metabolism (Howard et al., 2006). Recovery of nearly
four times as much mRNA per volume of sea water in the
day (~30 ng l⁻¹) compared with night (~8 ng l⁻¹) is consistent with high relative abundance of RNA polymerase
transcripts in the day (Table 2) and likely reflects increased
gene expression when solar radiation is available.
Night-biased synthesis of vitamin B6, essential for a
variety of amino acid conversions including transaminations, decarboxylations and dehydrations, in conjunction
with evidence for other night-time activities such as the
γ-glutamyl pathway for amino acid uptake, the overrepresentation of amino acid transport and metabolism genes,
and the histidine synthesis pathway (Table 3 and
Tables S4–S6), indicate that amino acid acquisition in
general may be a relatively more important metabolic
activity in the night. Prochlorococcus marinus has recently
been shown to exhibit diel patterns of amino acid uptake,
with acquisition occurring predominantly at dusk (Mary
et al., 2008). Our data agree with this and further suggest
that heterotrophic taxa also devote a greater percentage
of their transcriptome to transporting and synthesizing
amino acids at night. Night-time accumulation of amino
acids might be a mechanism for nitrogen storage by many
organisms, particularly for P. marinus, which undergoes
cell division at night. Histidine, the amino acid with the
most consistent signal for synthesis at night by both
autotrophs and heterotrophs (Fig. 7A and Fig. S1), is one
of the most nitrogen-rich amino acids (only arginine has
more amino groups).
Overall, bacterial community investment in this oligotrophic ocean system was skewed towards energy
acquisition and metabolism during the day, while biosynthesis (specifically of membranes, amino acids and vitamins) received relatively greater investments at night.
Many microbial processes expected to be differentially
expressed over a day/night cycle, such as photosynthesis, oxidative phosphorylation and proteorhodopsin activity, were indeed captured in the sequence data. Less
anticipated processes that emerged included the utilization of C1 compounds, the uptake of polyamines and the
degradation of aromatic compounds (Table 3). Other
metabolic processes ongoing in this microbial community,
although without statistical evidence for day/night patterns, included: use of nitrate and urea as nitrogen
sources; use of phosphate, phosphonate and carbon-oxygen-phosphorus (C-O-P) compounds as phosphorus
sources; oxidation of reduced sulphur compounds; oxidation of carbon monoxide; and uptake of multiple trace
metals (Table 3). This comparative analysis of microbial
community transcripts has provided an inventory of
ongoing metabolic processes, offered insights into their
temporal patterns and supplied a new type of data for
predictive modelling of environmental controls on ecosystem properties.
Experimental procedures
Sample collection
Samples were collected at the Hawaii Ocean Time-series
(HOT) Station ALOHA, defined by the 6-nautical-mile radius
circle centred at 22°45′N, 158°W, in November 2005 (HOT-175). For RNA extraction, sea water was collected from a
depth of 25 m using Niskin bottles on a conductivity-temperature-depth rosette sampler. A night sample was collected at 03:00 on 11 November 2005, and a daytime
sample was collected at 13:00 on 13 November 2005.
During HOT-175, the peak PAR level was at 12:00, with
sunrise occurring around 07:00 and sunset just before
18:00. Sea water (80 l for the night sample and 40 l for the
day sample) was prefiltered through a 5 µm, 142 mm polycarbonate filter (GE Osmonics, Minnetonka, MN) followed
by a 0.2 µm, 142 mm Durapore (Millipore) filter using
positive air pressure. The 0.2 µm filters were placed in a
15 ml tube containing 2 ml Buffer RLT (containing
β-mercaptoethanol) from the RNeasy kit (Qiagen, Valencia,
CA) and flash-frozen in liquid nitrogen for RNA extraction.
For DNA extraction, an additional 20 l of sea water was
simultaneously filtered using the protocol outlined above at
both time points. The 0.2 µm filters were placed in Whirlpak
bags and flash-frozen. The total sampling time from initiation
of collection until freezing in liquid nitrogen was approximately 1.5 h. We obtained ~1 µg of total RNA from 40 to 80 l
of sea water. Following mRNA enrichment and amplification,
30–100 µg of mRNA was available for conversion to cDNA
for sequencing. Typically, only 3–5 µg of DNA was required
for pyrosequencing.
RNA and DNA preparation
DNA was extracted using a phenol : chloroform-based protocol (Fuhrman et al., 1988). Briefly, frozen filters inside Whirlpak bags were transferred to 50 ml Falcon centrifuge tubes.
Ten millilitres of extraction buffer [SDS (10% Sodium Dodecyl
Sulphate) : STE (100 mM NaCl, 10 mM Tris, 1 mM EDTA),
9:1] was added to the tubes and boiled in a water bath for
5 min. The extraction buffer was then removed from the
tubes, placed into Oak Ridge round-bottom centrifuge tubes,
to which 3 ml NaOAc and 28 ml 100% EtOH were added.
Organic macromolecules were precipitated overnight at
-20°C, before the tubes were centrifuged for 1 h at 15 000 g.
The supernatant was decanted, and pellets dried for 30 min
in the air. The pellets were resuspended in 600 µl deionized
water, and sequentially extracted with 500 µl phenol, 500 µl
phenol : chloroform : isoamyl alcohol (24:1:0.1), and 500 µl
chloroform:isoamyl alcohol (9:1); after each extraction the
organic phase was removed and discarded. The supernatant
was removed into a fresh tube at the end of the last extraction,
amended with 150 µl NaOAc and 1.2 ml 100% EtOH, and
precipitated overnight. The tube contents were then centrifuged at 15 000 g for 1 h, the supernatant decanted, and
pellets dried in a speed vacuum dryer for 10 min. The DNA
pellets were resuspended in 100 µl DNase- and RNase-free
deionized water (Ambion).
RNA was extracted using a modified version of the RNeasy
kit (Qiagen) that results in high RNA yields from material on
polycarbonate filters (Poretsky et al., 2008). Frozen samples
were first thawed slightly for 2 min in a 40–50°C water bath
and then vortexed for 10 min with RNase-free beads from the
Mo-Bio RNA PowerSoil kit (Carlsbad, CA). Following centrifugation for 5 min at 3000–5000 g, the supernatant was transferred to a new tube. Beginning with the RNeasy Midi kit,
1 vol. of 70% ethanol was added to the lysate and, in order to
shear large-molecular-weight nucleic acids, the lysate was
drawn through a 22-gauge needle several (~5) times. RNA
extraction then continued with the RNeasy Mini kit according
to the manufacturer’s instructions.
Following extraction, RNA was treated with DNase using
the TURBO DNA-free kit (Ambion, Austin, TX). Two methods
were employed to rid the RNA samples of rRNA. The RNA
was first treated enzymatically with the mRNA-ONLY
Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies,
Madison, WI) that uses a 5′-phosphate-dependent exonuclease to degrade rRNAs. Subtractive hybridization with the MICROBExpress kit (Ambion),
which uses capture oligonucleotides hybridized to magnetic beads, was subsequently used as an
additional mRNA enrichment step.
In order to obtain µg quantities of mRNA, approximately
500 ng of RNA was linearly amplified using the MessageAmp
II-Bacteria Kit (Ambion) according to the manufacturer’s
instructions. Finally, the amplified, antisense RNA (aRNA)
was converted to double-stranded cDNA with random hexamers using the Universal RiboClone cDNA Synthesis
System (Promega, Madison, WI). The cDNA was purified with
the Wizard DNA Clean-up System (Promega). The quality
and quantity of the total RNA, mRNA, aRNA and cDNA were
assessed by measurement on the NanoDrop-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) and
the Experion Automated Electrophoresis System (Bio-Rad,
Hercules, CA).
cDNA sequencing and quality control
cDNAs from each sample (night and day) were sequenced
using the GS 20 sequencing system by 454 Life Sciences
(Branford, CT) (Margulies et al., 2005), resulting in
10 682 120 bp from 106 907 reads for the night sample and
13 255 704 bp from 133 515 reads for the day sample. The
average sequence length was 99 bp. The sequences have
been deposited in the NCBI Short Read Archive with the
Genome Project ID #33463.
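As a quick arithmetic check on the figures above, the mean read length can be recomputed from the reported base-pair and read totals. The Python sketch below does only that division; the dictionary layout and variable names are ours. Each per-sample mean, and the pooled mean, lies between 99 and 100 bp, which agrees with the stated 99 bp if the value was truncated rather than rounded (an assumption on our part).

```python
# Base-pair and read totals as reported for the two GS 20 runs.
runs = {
    "night": {"bp": 10_682_120, "reads": 106_907},
    "day": {"bp": 13_255_704, "reads": 133_515},
}


def mean_read_length(bp, reads):
    """Average number of base pairs per read."""
    return bp / reads


per_sample = {name: mean_read_length(**totals) for name, totals in runs.items()}

total_bp = sum(totals["bp"] for totals in runs.values())
total_reads = sum(totals["reads"] for totals in runs.values())
pooled = total_bp / total_reads

for name, length in per_sample.items():
    print(f"{name}: {length:.2f} bp per read")
print(f"pooled: {pooled:.2f} bp per read")
```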
rRNA identification and removal
For rRNA sequence identification, the sequences were clustered at an identity threshold of 98% based on a local alignment (number of identical residues divided by length of
alignment) using the program Cd-hit (Li and Godzik, 2006).
Ribosomal RNA sequences were identified by BLASTN queries
of the reference sequence of each cluster against the non-curated GenBank nucleotide database (nt) (Benson et al.,
2007) using cut-off criteria of E-value ≤ 10⁻³, nucleic acid
length ≥ 69 and per cent identity ≥ 40%, previously established with in silico tests for rRNA sequence predictions of
short pyrosequences (Frias-Lopez et al., 2008; Mou et al.,
2008). We conservatively identified a sequence as rRNA-derived and removed it from the analysis pipeline if any of the
top three BLASTN hits were to an rRNA gene.
top three BLASTN hits were to an rRNA gene.
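The rRNA screen described above can be expressed as a small filter over BLASTN results. The Python sketch below is one possible reading of the rule, not the pipeline actually used: it assumes that the cut-offs are applied first and that the "top three" hits are then taken from the hits that pass, that "nucleic acid length" means alignment length, and that an rRNA hit can be recognized from keywords in its description line. The Hit record, the keyword test and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_desc: str    # description line of the database sequence (hypothetical field)
    evalue: float
    align_len: int       # alignment length in nucleotides (assumed meaning of "length")
    pct_identity: float


# Cut-offs quoted in the text for rRNA identification.
MAX_EVALUE = 1e-3
MIN_ALIGN_LEN = 69
MIN_PCT_IDENTITY = 40.0
TOP_N = 3


def passes_cutoffs(hit):
    return (hit.evalue <= MAX_EVALUE
            and hit.align_len >= MIN_ALIGN_LEN
            and hit.pct_identity >= MIN_PCT_IDENTITY)


def looks_like_rrna(description):
    # Hypothetical keyword heuristic; a real screen would use curated annotations.
    text = description.lower()
    return "rrna" in text or "ribosomal rna" in text


def is_rrna_derived(hits):
    """hits must be ordered best-first. True if any of the top three passing hits is rRNA."""
    qualifying = [h for h in hits if passes_cutoffs(h)]
    return any(looks_like_rrna(h.subject_desc) for h in qualifying[:TOP_N])


example = [
    Hit("hypothetical protein", 1e-10, 90, 95.0),
    Hit("ABC transporter permease", 1e-8, 80, 90.0),
    Hit("16S ribosomal RNA gene, partial sequence", 1e-5, 75, 88.0),
]
print(is_rrna_derived(example))
```

Flagging a read when any one of several top hits is rRNA trades a few lost mRNA reads for fewer rRNA reads slipping into the functional analysis, which matches the conservative intent stated in the text.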
cDNA sequence annotation
The criteria for protein predictions generated using BLASTX
against the NCBI curated, non-redundant reference
sequence database (RefSeq) (Pruitt et al., 2005) were established with in silico tests to determine suitable cut-off limits for
reliable functional prediction. For these tests, 100 arbitrarily
selected, known functional gene sequences were fragmented
into 20–500 bp fragments and analysed using BLASTX
against RefSeq to determine if the best BLAST hit was to the
correct gene function, excluding self-hits. Based on these
analyses, the cut-off criteria for protein prediction were
set as E-value < 0.01, identity > 40% and overlapping
length > 23 aa to the corresponding best hit.
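The in silico test described above has two parts that are easy to sketch: cutting known genes into short fragments, and scoring how often the best hit of a fragment that passes the cut-offs has the correct function. The Python sketch below assumes consecutive, non-overlapping fragments (the text does not say how fragments were generated) and takes the BLASTX results as plain dictionaries; the function names and dictionary keys are hypothetical, and no BLAST search is run here.

```python
def fragment(seq, size):
    """Split seq into consecutive, non-overlapping pieces of exactly `size` bases.

    A trailing remainder shorter than `size` is dropped.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def passes_refseq_cutoffs(evalue, pct_identity, overlap_aa):
    """Cut-offs quoted in the text for protein prediction against RefSeq."""
    return evalue < 0.01 and pct_identity > 40.0 and overlap_aa > 23


def fraction_correct(best_hits):
    """Share of fragments passing the cut-offs whose best hit had the correct function.

    best_hits: list of dicts with keys 'evalue', 'pct_identity', 'overlap_aa'
    and 'correct' (bool). Returns None if no fragment passes.
    """
    kept = [h for h in best_hits
            if passes_refseq_cutoffs(h["evalue"], h["pct_identity"], h["overlap_aa"])]
    if not kept:
        return None
    return sum(1 for h in kept if h["correct"]) / len(kept)


print(fragment("ACGTACGTAC", 4))
```

Note that an overlap of more than 23 amino acids corresponds to more than 69 nucleotides of aligned read, the same length used as the minimum in the rRNA screen.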
Sequences with hits to RefSeq were assigned functional
protein or pathway predictions based on the COG database
(Tatusov et al., 2000) or KEGG database (Kanehisa and
Goto, 2000). The cut-off criteria for functional protein prediction based on orthologous groups using BLASTX analysis
against the COG database were established using the same
in silico approach with 100 bp fragments of known functional
genes as E-value < 0.1, identity > 40% and overlapping
length > 23 aa to the corresponding best hit. The COG cut-off
criteria were also applied to the KEGG database for pathway
prediction because of the similarity in database size. Taxonomic binning of the sequences was carried out using MEGAN
with the default settings for all parameters (Huson et al.,
2007); this program assigns likely taxonomic origin to
sequences based on the NCBI taxonomy of closest BLAST
hits. The taxonomic affiliations of the putative mRNA
sequences were predicted using MEGAN to the family level,
and the top BLAST hit for any higher-resolution taxonomic
assignments. All non-rRNA sequences that had no RefSeq
hits were BLASTX-queried against the nr database as well as
against CAMERA un-assembled ORFs predicted from the
Global Ocean Survey reads (http://camera.calit2.net/
index.php) (Seshadri et al., 2007).
Eukaryotic sequence annotation
Eukaryotic transcripts were binned by MEGAN. Sequences
were queried (BLASTX) against a curated database of protein
sequences derived from all available complete eukaryotic
organelle and nuclear genomes (currently, 46 eukaryotic
genomes). Transcripts that matched a reference protein
sequence with > 60% identity and an E-value < 1e-10 were
retained and the reference protein for the cluster was used for
functional annotation. Functional annotation was performed
using Java-based Blast2go (Conesa et al., 2005) that annotates genes based on similarity searches with statistical
analysis and highlighted visualization on directed acyclic
graphs.
16S rRNA gene libraries
PCR amplification of ribosomal DNA was carried out using
primers 27F and 1522R (Johnson, 1994). The PCR conditions were as follows: 3 min at 96°C, followed by 30 cycles of
denaturation at 95°C for 50 s, annealing at 58°C for 50 s,
primer extension at 72°C for 1 min and a final extension at
72°C for 10 min. PCR products were cleaned using the
QIAquick PCR Purification Kit (Qiagen) and multiple PCR
reactions were pooled and cloned into pCR2.1 vector using
the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR
amplifications included standard no-template controls.
Clones from each sample (192) were sequenced at the University of Georgia Sequencing Facility on an ABI 3100
(Applied Biosystems, Foster City, CA).
Predicted highly expressed genes
The PHX genes were determined for cultured representatives
of three prokaryotic taxa that were well represented in the
transcript libraries (Prochlorococcus, Roseobacter and
SAR11) using an algorithm developed by Karlin and Mrázek
(2000). The algorithm is based on comparisons with codon
usage patterns in genes expected to be frequently transcribed in a prokaryotic genome (ribosomal proteins, chaperone proteins, etc.). Environmental transcript sequences
that had best BLAST hits to one of the PHX genes were
similarly designated as PHX.
Statistical analysis
A statistical program designed for comparing gene frequency
in metagenomic data sets (Rodriguez-Brito et al., 2006) was
used to compare the night and day mRNA sequences categorized based on COGs, KEGGs and proteins. The program
was run with 20 000 repeated samplings with a sample size
of 10 000 for COGs, 9000 for KEGGs and 25 000 for proteins. The significance level (P) was set at < 0.05.
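The general idea of a resampling comparison of category frequencies can be illustrated as follows. The Python sketch below is not the program of Rodriguez-Brito et al. (2006) and makes no claim to reproduce its algorithm; it is a simplified stand-in, assuming that a null distribution is built from differences between two subsamples of a pooled data set and that the median difference between subsamples of the two real data sets is then compared with the central 95% of that null distribution. All names, the counts and the small sample size and repeat number in the example are hypothetical and chosen so the code runs quickly; the study used 20 000 samplings of 9000 to 25 000 sequences.

```python
import random


def _count_hits(rng, p, n):
    """Number of category members among n draws with replacement at frequency p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def resampling_test(k_a, n_a, k_b, n_b, sample_size, reps, alpha=0.05, seed=0):
    """Compare the frequency of one category between data sets A and B.

    k_a of n_a sequences in A fall in the category; likewise k_b of n_b in B.
    Returns (median_diff, lower, upper, significant).
    """
    rng = random.Random(seed)
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)

    # Null distribution: both subsamples come from the pooled data.
    null = sorted(
        _count_hits(rng, p_pool, sample_size) - _count_hits(rng, p_pool, sample_size)
        for _ in range(reps)
    )
    lower = null[int((alpha / 2) * reps)]
    upper = null[min(reps - 1, int((1 - alpha / 2) * reps))]

    # Observed distribution: one subsample from each real data set.
    observed = sorted(
        _count_hits(rng, p_a, sample_size) - _count_hits(rng, p_b, sample_size)
        for _ in range(reps)
    )
    median_diff = observed[reps // 2]
    return median_diff, lower, upper, not (lower <= median_diff <= upper)


# Hypothetical counts: 300 of 1000 night reads versus 50 of 1000 day reads.
result = resampling_test(300, 1000, 50, 1000, sample_size=200, reps=300)
print(result)
```

Drawing equal-sized subsamples from both data sets is what allows libraries of different depth, such as the night and day read sets here, to be compared on the same footing.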
qPCR verifications
To confirm that the composition of the pyrosequence library
was representative of the initial mRNAs, transcripts of five
genes that were top hits to multiple sequences in both transcript pools were quantified in the total RNA pool. The qPCR
primer sets were designed for the P. marinus str. AS9601
recA and psaA, a proteorhodopsin gene and a Na+/solute
symporter (Ssf family) gene from P. ubique HTCC1062, and a
probable integral membrane proteinase attributed to Psychroflexus torquis ATCC 700755 (sequences and annealing
temperatures in Table S6). Reverse transcription reactions were
carried out on 200 ng of RNA using the Omniscript RT kit
(Qiagen) in 20 µl volumes containing 1× RT buffer, 0.3 mg ml⁻¹
of random hexamers (Invitrogen), 1 µl of 5 mM dNTPs, 2 U of
reverse transcriptase and 20 U of RNase inhibitor (Promega)
at 37°C for 1 h, followed by inactivation of the reverse transcriptase at 95°C for 2 min. The day : night ratio of each gene
transcript in the RNA pools was determined by qPCR amplification of a serial dilution of cDNAs in triplicate, and calculation of the difference in cycle threshold values (ΔCT)
between the two samples. Quantitative amplification was
done using the iCycler iQ RT PCR detection system (Bio-Rad) in a 20 µl reaction volume containing 10 µl of iQ SYBR
Green Supermix (Bio-Rad), 0.4 µl each of 10 µM of the
forward and reverse primers and 1 µl of the cDNA template.
PCR conditions included a preliminary denaturation at 95°C
for 3 min followed by 45 cycles of 95°C for 15 s, annealing for
1.5 s, 95°C for 1 min and 55°C for 1 min. A melt curve was
generated following the PCR, beginning with 55°C and
increasing 0.4°C every 10 s until 95°C. A PCR control without
an initial RT step was included with every set of reactions.
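The conversion from a difference in cycle threshold to a day : night ratio can be written out explicitly. The Python sketch below assumes equal RNA input for both samples and a fixed amplification efficiency (2.0, i.e. perfect doubling per cycle, unless another value is supplied); neither assumption is stated in the text, and the function name and example CT values are hypothetical. Because a lower CT means more starting template, the ratio is efficiency raised to the power (CT night minus CT day).

```python
from statistics import mean


def day_night_ratio(ct_day, ct_night, efficiency=2.0):
    """Fold difference in transcript abundance, day relative to night.

    ct_day, ct_night: replicate cycle-threshold values for the same gene.
    efficiency: amplification factor per cycle (2.0 = perfect doubling, assumed).
    """
    delta_ct = mean(ct_night) - mean(ct_day)
    return efficiency ** delta_ct


# Hypothetical triplicate CT values for one gene.
ratio = day_night_ratio(ct_day=[20.1, 19.9, 20.0], ct_night=[22.0, 22.1, 21.9])
print(f"day:night ratio = {ratio:.2f}")
```

A serial dilution of the cDNA, as used in the study, is the usual way to check how close the real efficiency is to the assumed value.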
Acknowledgements
We thank the Captain and crew of the R/V Kilo Moana and Dr
David Karl. Jennifer Oliver assisted with sample processing.
Jonathan Badger assisted with data processing. Funding was
provided by The Gordon and Betty Moore Foundation,
National Science Foundation grants MCB-0702125 (M.A.M.),
EF-0722374 (A.E.A) and OCE-0425363 (J.P.Z.), and the NSF
C-MORE Center for Microbial Oceanography.
References
Allen, A.E., Vardi, A., and Bowler, C. (2006) An ecological
and evolutionary context for integrated nitrogen metabolism and related signaling pathways in marine diatoms.
Curr Opin Plant Biol 9: 264–273.
Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., et al. (2004) The genome of the
diatom Thalassiosira pseudonana: ecology, evolution, and
metabolism. Science 306: 79–86.
Bauer, J.E., Williams, P.M., and Druffel, E.R.M. (1992) 14C
activity of dissolved organic carbon fractions in the north-central Pacific and Sargasso Sea. Nature 357: 667–670.
Belasco, J.G. (1993) mRNA degradation in prokaryotic cells:
an overview. In Control of Messenger RNA Stability.
Belasco, J.G., Brawerman, G. (eds). San Diego, CA, USA:
Academic Press, pp. 3–11.
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J.,
and Wheeler, D.L. (2007) GenBank. Nucleic Acids Res 35:
D21–D25.
Bürgmann, H., Widmer, F., Sigler, W.V., and Zeyer, J. (2003)
mRNA extraction and reverse transcription-PCR protocol
for detection of nifH gene expression by Azotobacter vinelandii in soil. Appl Environ Microbiol 69: 1928–1935.
Bürgmann, H., Howard, E.C., Ye, W., Sun, F., Sun, S., Napierala, S., and Moran, M.A. (2007) Transcriptional response
of Silicibacter pomeroyi DSS-3 to dimethylsulfoniopropionate (DMSP). Environ Microbiol 9: 2742–2755.
Campbell, L., and Vaulot, D. (1993) Photosynthetic picoplankton community structure in the subtropical North
Pacific Ocean near Hawaii (Station ALOHA). Deep Sea
Res. Part I Oceanogr Res Pap 40: 2043–2060.
Carpenter, L.J., Lewis, A.C., Hopkins, J.R., Read, K.A.,
Longley, I.D., and Gallagher, M.W. (2004) Uptake of
methanol to the North Atlantic Ocean surface. Global Biogeochem Cycles 18: GB4027.
Cavender-Bares, K.K., Karl, D.M., and Chisholm, S.W.
(2001) Nutrient gradients in the western North Atlantic
Ocean: relationship to microbial community structure and
comparison to patterns in the Pacific Ocean. Deep Sea
Res. Part I Oceanogr Res Pap 48: 2373–2395.
Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon,
M., and Robles, M. (2005) Blast2GO: a universal tool for
annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.
DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam,
S.J., Frigaard, N.-U., et al. (2006) Community genomics
among stratified microbial assemblages in the ocean’s
interior. Science 311: 496–503.
Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden,
A.Z., Robbens, S., et al. (2006) Genome analysis of the
smallest free-living eukaryote Ostreococcus tauri unveils
many unique features. Proc Natl Acad Sci USA 103:
11647–11652.
Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L.,
Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008)
Microbial community gene expression in ocean surface
waters. Proc Natl Acad Sci USA 105: 3805–3810.
Fuhrman, J.A., Comeau, D.E., Hagstrom, A., and Chan, A.M.
(1988) Extraction from natural planktonic microorganisms
of DNA suitable for molecular biological studies. Appl
Environ Microbiol 54: 1426–1429.
Gelder, R.N.V., von Zastrow, M.E., Yool, A., Dement, W.C.,
Barchas, J.D., and Eberwine, J.H. (1990) Amplified RNA
synthesized from limited quantities of heterogeneous
cDNA. Proc Natl Acad Sci USA 87: 1663–1667.
Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna,
P., and Joint, I. (2008) Detection of large numbers of novel
sequences in the metatranscriptomes of complex marine
microbial communities. PLoS ONE 3: e3042.
Giovannoni, S.J., Hayakawa, D.H., Tripp, H.J., Stingl, U.,
Givan, S.A., Cho, J.-C., et al. (2008) The small genome of
an abundant coastal ocean methylotroph. Environ Microbiol 10: 1771–1782.
Heikes, B.G., Chang, W.N., Pilson, M.E.Q., Swift, E., Singh,
H.B., Guenther, A., et al. (2002) Atmospheric methanol
budget and ocean implication. Global Biogeochem Cycles
16: 80-1–80-13.
Howard, E.C., Henriksen, J.R., Buchan, A., Reisch, C.R.,
Burgmann, H., Welsh, R., et al. (2006) Bacterial taxa that
limit sulfur flux from the ocean. Science 314: 649–652.
Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007)
MEGAN analysis of metagenomic data. Genome Res 17:
377–386.
Ingraham, J.L., Maaløe, O., and Neidhardt, F.C. (1983)
Growth of the Bacterial Cell. Sunderland, MA, USA:
Sinauer Associates.
Johnson, J.L. (1994) Similarity analysis of rRNAs. In Methods
for General and Molecular Bacteriology. Gerhardt, P.,
Murray, R.G.E., Wood, W.A., and Krieg, N.R. (eds). Washington, DC: American Society for Microbiology, pp. 683–700.
Johnson, Z.I., Zinser, E.R., Coe, A., McNulty, N.P., Woodward, E.M.S., and Chisholm, S.W. (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale
environmental gradients. Science 311: 1737–1740.
Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.
Karl, D., Letelier, R., Tupas, L., Dore, J., Christian, J., and
Hebel, D. (1997) The role of nitrogen fixation in biogeochemical cycling in the subtropical North Pacific
Ocean. Nature 388: 533–538.
Karl, D.M., and Lukas, R. (1996) The Hawaii Ocean Time-series (HOT) program: background, rationale and field
implementation. Deep Sea Res Part II Top Stud Oceanogr
43: 129–156.
Karlin, S., and Mrázek, J. (2000) Predicted highly expressed
genes of diverse prokaryotic genomes. J Bacteriol 182:
5238–5250.
Kiene, R.P., Linn, L.J., and Bruton, J.A. (2000) New and
important roles for DMSP in marine microbial communities.
J Sea Res 43: 209–224.
Lander, E.S., and Waterman, M.S. (1988) Genomic mapping
by fingerprinting random clones: a mathematical analysis.
Genomics 2: 231–239.
Li, W., and Godzik, A. (2006) Cd-hit: a fast program for
clustering and comparing large sets of protein or nucleotide
sequences. Bioinformatics 22: 1658–1659.
Liang, P., and Pardee, A.B. (1992) Differential display of
eukaryotic messenger RNA by means of the polymerase
chain reaction. Science 257: 967–971.
McDonald, S.M., Sarno, D., Scanlan, D.J., and Zingone, A.
(2007) Genetic diversity of eukaryotic ultraphytoplankton in
the Gulf of Naples during an annual cycle. Aquat Microb
Ecol 50: 75–89.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader,
J.S., Bemben, L.A., et al. (2005) Genome sequencing in
microfabricated high-density picolitre reactors. Nature 437:
376–380.
Mary, I., Garczarek, L., Tarran, G.A., Kolowrat, C., Terry,
M.J., Scanlan, D.J., et al. (2008) Diel rhythmicity in amino
acid uptake by Prochlorococcus. Environ Microbiol 10:
2124–2131.
Morris, R.M., Rappe, M.S., Connon, S.A., Vergin, K.L.,
Siebold, W.A., Carlson, C.A., and Giovannoni, S.J. (2002)
SAR11 clade dominates ocean surface bacterioplankton
communities. Nature 420: 806–810.
Mou, X., Sun, S., Edwards, R.A., Hodson, R.E., and Moran,
M.A. (2008) Bacterial carbon processing by generalist
species in the coastal ocean. Nature 451: 708–711.
Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., and
Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96: 2896–2901.
Poretsky, R.S., Bano, N., Buchan, A., LeCleir, G.,
Kleikemper, J., Pickering, M., et al. (2005) Analysis of
microbial gene transcripts in environmental samples. Appl
Environ Microbiol 71: 4121–4126.
Poretsky, R.S., Bano, N., Buchan, A., Moran, M.A., and
Hollibaugh, J.T. (2008) Environmental transcriptomics: a
method to access expressed genes in complex microbial
communities. In Molecular Microbial Ecology Manual.
Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans,
A.D.L., and van Elsas, J.D. (eds). Dordrecht, Netherlands:
Springer, pp. 1892–1904.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005) NCBI
Reference Sequence (RefSeq): a curated non-redundant
sequence database of genomes, transcripts and proteins.
Nucleic Acids Res 33: D501–D504.
Rodriguez-Brito, B., Rohwer, F., and Edwards, R. (2006) An
application of statistics to comparative metagenomics.
BMC Bioinformatics 7: 162.
Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B.,
Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II
Global Ocean Sampling Expedition: Northwest Atlantic
through Eastern Tropical Pacific. PLoS Biol 5: e77.
Schaefer, J.K., Goodwin, K.D., McDonald, I.R., Murrell, J.C.,
and Oremland, R.S. (2002) Leisingera methylohalidivorans
gen. nov., sp. nov., a marine methylotroph that grows on
methyl bromide. Int J Syst Evol Microbiol 52: 851–859.
Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier,
M. (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.
Tatusov, R.L., Galperin, M.Y., Natale, D.A., and Koonin, E.V.
(2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res
28: 33–36.
Ward, B.B., Kilpatrick, K.A., Novelli, P.C., and Scranton, M.I.
(1987) Methane oxidation and methane fluxes in the ocean
surface-layer and deep anoxic waters. Nature 327: 226–229.
Wawrik, B., Paul, J.H., and Tabita, F.R. (2002) Real-time
PCR quantification of rbcL (ribulose-1,5-bisphosphate
carboxylase/oxygenase) mRNA in diatoms and pelagophytes. Appl Environ Microbiol 68: 3771–3779.
Woodall, C.A., Warner, K.L., Oremland, R.S., Murrell, J.C.,
and McDonald, I.R. (2001) Identification of methyl halide-utilizing genes in the methyl bromide-utilizing bacterial
strain IMB-1 suggests a high degree of conservation of
methyl halide-specific genes in gram-negative bacteria.
Appl Environ Microbiol 67: 1959–1963.
Zehr, J.P., Waterbury, J.B., Turner, P.J., Montoya, J.P.,
Omoregie, E., Steward, G.F., et al. (2001) Unicellular
cyanobacteria fix N2 in the subtropical North Pacific Ocean.
Nature 412: 635–638.
Zhou, J.H. (2003) Microarrays for bacterial detection and
microbial community analysis. Curr Opin Microbiol 6: 288–294.
Supporting information
Additional Supporting Information may be found in the online
version of this article:
Fig. S1. Transcript mapping to the KEGG histidine metabolism pathway for P. marinus (A) and the vitamin B6 metabolism pathway for P. ubique (B) at night. Blue shading indicates
that transcripts were found; grey indicates genes that are
present in the genome, but no transcripts were found; white
indicates genes that are not present in the reference
genomes.
Fig. S2. Quality control of the pyrosequences using qPCR
verifications of transcript ratios for five genes: recA and psaA
from P. marinus str. AS9601, a bacteriorhodopsin and a
Na+/solute symporter (Ssf family) gene from P. ubique
HTCC1062, and a probable integral membrane proteinase
attributed to P. torquis ATCC 700755. The night : day ratio of
transcripts in the pyrosequence libraries is plotted against the
same ratio in the original total RNA fraction.
Table S1. Results of bioinformatic pipeline for 100 and
200 bp fragments from groups for which there are no genome
sequences currently available. BACs from uncultured marine
taxa (two from SAR86 and one from SAR116) were fragmented into random 100 bp pieces, using just the coding
regions. Fragments were blasted against RefSeq, not allowing a self-hit. As controls, we did the same for P. ubique
HTCC1062 and P. marinus MIT9312.
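The fragmentation step described for Table S1 can be illustrated with a minimal sketch. It assumes the coding regions are already available as plain nucleotide strings; the function name `random_fragments`, its parameters and the window-weighted sampling scheme are hypothetical choices made for illustration, not the procedure used in the study, and the subsequent BLAST search against RefSeq is not shown.

```python
import random


def random_fragments(coding_seqs, length=100, n=10, seed=0):
    """Draw n random fixed-length fragments from a list of coding sequences.

    Hypothetical helper: the sampling scheme is an assumption, not the
    published pipeline.
    """
    rng = random.Random(seed)
    # Only sequences at least `length` long can yield a full fragment.
    eligible = [s for s in coding_seqs if len(s) >= length]
    if not eligible:
        raise ValueError("no coding sequence is long enough")
    # Weight each sequence by its number of possible start positions,
    # so that every window across all sequences is equally likely.
    weights = [len(s) - length + 1 for s in eligible]
    fragments = []
    for _ in range(n):
        seq = rng.choices(eligible, weights=weights, k=1)[0]
        start = rng.randrange(len(seq) - length + 1)
        fragments.append(seq[start:start + length])
    return fragments
```

Calling the function with `length=200` would give the longer fragment set named in the table title.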
Table S2. Estimates of coverage using two different models.
The Lander–Waterman model uses the 16S rRNA clone
library data to establish a taxon-abundance model for the
system at a similarity level of 99%, and is based on the
assumptions that each taxon produces 1000 transcripts at
any given time and all expressed genes are expressed
equally. The Chao1 richness estimators for COGs are computed using EstimateS (version 8.0, R. K. Colwell, http://purl.oclc.org/estimates).
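The two estimators named for Table S2 can be sketched as follows. This is an illustration under stated assumptions rather than the calculation behind the table: `chao1` uses the bias-corrected form of the estimator, which may or may not match the EstimateS setting used, and `lander_waterman_coverage` gives only the generic expected covered fraction 1 − e^(−c), with c = (number of reads × read length) / target length. How transcripts and taxon abundances were mapped onto those quantities in the study is not specified here, so the function and argument names are hypothetical.

```python
import math
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness from per-category observation counts."""
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


def lander_waterman_coverage(n_reads, read_len, target_len):
    """Expected fraction of a target covered by randomly placed reads."""
    c = n_reads * read_len / target_len  # mean depth of coverage
    return 1.0 - math.exp(-c)
```

For COG data, `counts` would hold the number of reads assigned to each COG, one entry per COG.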
Table S3. KEGG pathways for three taxonomic bins
(P. marinus, P. ubique and Roseobacters) significantly overrepresented in the night (grey shading) and day (no shading)
transcriptomes (P < 0.10).
Table S4. COGs significantly overrepresented in the night
(grey shading) and day (no shading) transcriptomes
(P < 0.05).
Table S5. Genes significantly overrepresented in the night
(grey shading) and day (no shading) transcriptomes
(P < 0.05).
Table S6. Primer sets used in qPCR.
Please note: Wiley-Blackwell are not responsible for the
content or functionality of any supporting materials supplied
by the authors. Any queries (other than missing material)
should be directed to the corresponding author for the
article.