Environmental Microbiology (2009) 11(6), 1358–1375 doi:10.1111/j.1462-2920.2008.01863.x

Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre

Rachel S. Poretsky,1 Ian Hewson,2 Shulei Sun,1 Andrew E. Allen,3 Jonathan P. Zehr2 and Mary Ann Moran1*
1 University of Georgia, Department of Marine Sciences, Athens, GA 30602, USA.
2 University of California Santa Cruz, Department of Ocean Sciences, Santa Cruz, CA 95064, USA.
3 J. Craig Venter Institute, Microbial and Environmental Genomics, San Diego, CA 92121, USA.

Summary

Metatranscriptomic analyses of microbial assemblages (< 5 μm) from surface water at the Hawaiian Ocean Time-Series (HOT) revealed community-wide metabolic activities and day/night patterns of differential gene expression. Pyrosequencing produced 75 558 putative mRNA reads from a day transcriptome and 75 946 from a night transcriptome. Taxonomic binning of annotated mRNAs indicated that Cyanobacteria contributed a greater percentage of the transcripts (54% of annotated sequences) than expected based on abundance (35% of cell counts and 21% of 16S rRNA libraries), and may represent the most actively transcribing cells in this surface ocean community in both the day and night. Major heterotrophic taxa contributing to the community transcriptome included α-Proteobacteria (19% of annotated sequences, most of which were SAR11-related) and γ-Proteobacteria (4%). The composition of transcript pools was consistent with models of prokaryotic gene expression, including operon-based transcription patterns and an abundance of genes predicted to be highly expressed. Metabolic activities that are shared by many microbial taxa (e.g. glycolysis, citric acid cycle, amino acid biosynthesis and transcription and translation machinery) were well represented among the community transcripts. There was an overabundance of transcripts for photosynthesis, C1 metabolism and oxidative phosphorylation in the day compared with night, and evidence that energy acquisition is coordinated with solar radiation levels for both autotrophic and heterotrophic microbes. In contrast, housekeeping activities such as amino acid biosynthesis, membrane synthesis and repair, and vitamin biosynthesis were overrepresented in the night transcriptome. Direct sequencing of these environmental transcripts has provided detailed information on metabolic and biogeochemical responses of a microbial community to solar forcing.

Received 17 September, 2008; accepted 3 December, 2008. *For correspondence. E-mail [email protected]; Tel. 706-542-6481; Fax 706-542-5888.

Introduction

Oceanic subtropical gyres make up 40% of the Earth's surface and play critical roles in carbon fixation and nutrient cycling. The Hawaii Ocean Time-Series (HOT) in the North Pacific subtropical gyre was established to provide a long-term perspective on oceanographic properties of such systems (Karl and Lukas, 1996) and has served as the focus of substantial research into the role of marine microorganisms in ocean biogeochemistry (Karl et al., 1997; Cavender-Bares et al., 2001; Zehr et al., 2001). Station ALOHA, the core study site at HOT, is characterized by warm (> 23°C) surface waters with low NO₃⁻ concentrations (< 15 nM), seasonally variable surface mixed layers (10–120 m), low standing biomass of living organisms (10–15 μg C l⁻¹) and a persistent deep (75–140 m) chlorophyll a maximum layer. Since 1988, regular measurements of physical, chemical and biological parameters have been obtained with monthly ship-based monitoring as well as bottom-moored instruments and buoys. Recent metagenomic sampling efforts at Station ALOHA have provided information about the genes harboured by the bacterioplankton community and how they are distributed with depth (DeLong et al., 2006).
Characterizing patterns of expression of these microbial genes and identifying what factors induce their expression is the next critical step in understanding this oceanic ecosystem. Analogous to metagenomics, environmental transcriptomics (metatranscriptomics) retrieves and sequences environmental mRNAs from a microbial assemblage without prior knowledge of what genes the community might be expressing (Poretsky et al., 2005; Frias-Lopez et al., 2008). It therefore provides a less biased perspective on microbial gene expression in situ than other approaches (Wawrik et al., 2002; Bürgmann et al., 2003; Zhou, 2003). Environmental transcriptomics protocols are technically difficult, however, because prokaryotic mRNAs generally lack the poly(A) tails that make isolation of eukaryotic messages relatively straightforward (Liang and Pardee, 1992) and because mRNAs have relatively short half-lives (Belasco, 1993). In addition, mRNAs are much less abundant than rRNAs in total RNA extracts, so an rRNA background often overwhelms mRNA signals. A first analysis of environmental transcriptomes, which created clone libraries by using random primers to reverse-transcribe and amplify environmental mRNAs, was successful in two different natural environments (Poretsky et al., 2005), but the results were biased by the selection of the random primers used to initiate cDNA synthesis. Techniques to linearly amplify mRNA obviate the need for random primers in the amplification step and make it possible to use less starting material (Gelder et al., 1990), while recently developed pyrosequencing technologies allow direct sequencing without cloning (Margulies et al., 2005).

© 2009 The Authors. Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd
Initial application of this approach at Station ALOHA (Frias-Lopez et al., 2008) and in coastal water mesocosms (Gilbert et al., 2008) demonstrated its utility for characterizing microbial community gene expression. Here we use environmental transcriptomics to elucidate day/night differences in gene expression in surface waters of the North Pacific subtropical gyre (Karl and Lukas, 1996). This analysis provides information on the dominant metabolic processes within the bacterioplankton assemblages and reveals changes in expression patterns of biogeochemically relevant processes.

Results

cDNA sequence annotation

The cDNAs prepared from amplified RNA (collected from the 0.2–5 μm size fraction) ranged in size from 100 bp to 1 kb, with the majority between 200 and 500 bp. The average picoliter reactor pyrosequencing read length was 99 bp, typical for the GS 20 sequencing platform. Predicted rRNA sequences were removed based on sequence similarity to the nt database using BLASTN. Although more laborious than our initial approach, which used sequence similarity to the RDP II database supplemented with an 18S, 23S and 28S rRNA database from genome sequences, this step identified nearly all of the rRNA sequences in our libraries. Accurate identification of rRNAs is crucial because of numerous misidentified sequences in the RefSeq protein database (i.e. rRNA sequences that are incorrectly annotated as putative proteins). The relatively low rRNA content of the libraries (37% of reads) compared with the rRNA content of prokaryotic cells (> 80%; Ingraham et al., 1983) indicated that the steps for excluding rRNAs through selective degradation and subtractive hybridization were largely successful. Sequences remaining after deletion of rRNA sequences (75 558 from the day and 75 946 from the night) were categorized as possible protein-encoding sequences and BLASTX-queried against the NCBI curated, non-redundant reference sequence database (RefSeq) to determine putative functions (Fig. 1).
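The read accounting in the annotation pipeline (Fig. 1) can be checked with a few lines of arithmetic. The sketch below is our own illustration, not part of the authors' pipeline; the constant and function names are hypothetical, and the only data used are the combined day + night counts printed in Fig. 1. It verifies that each branch of the flowchart partitions its parent node and re-expresses the rRNA fraction and the annotated/unannotated fractions quoted in the text.

```python
# Read counts transcribed from Fig. 1 (day and night libraries combined).
TOTAL_READS = 240_422
RRNA = 88_916
PROTEIN_CANDIDATES = 151_504  # 75 558 day + 75 946 night
REFSEQ_IDENTIFIED = 48_648
REFSEQ_UNIDENTIFIED = 102_856
GOS_HITS = 26_366
NR_HITS = 163
STILL_UNIDENTIFIED = 76_327


def pct(part: int, whole: int) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole


def check_pipeline() -> dict[str, float]:
    """Check that flowchart branches add up, then return summary percentages."""
    # RefSeq search splits the protein candidates into hits and non-hits.
    assert REFSEQ_IDENTIFIED + REFSEQ_UNIDENTIFIED == PROTEIN_CANDIDATES
    # The non-hits are further split by the GOS and nr searches.
    assert GOS_HITS + NR_HITS + STILL_UNIDENTIFIED == REFSEQ_UNIDENTIFIED
    return {
        "rRNA, % of all reads": pct(RRNA, TOTAL_READS),
        "annotated, % of protein candidates": pct(REFSEQ_IDENTIFIED, PROTEIN_CANDIDATES),
        "no hit, % of protein candidates": pct(STILL_UNIDENTIFIED, PROTEIN_CANDIDATES),
    }


if __name__ == "__main__":
    for label, value in check_pipeline().items():
        print(f"{label}: {value:.1f}")
```

The three percentages come out at roughly 37%, 32% and 50%, matching the rRNA fraction, the "about one-third" annotated, and the "half" with no significant hits reported in the text.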
About one-third of HOT pyrosequences in each library met the criteria for gene predictions determined empirically by in silico analysis of known functional gene sequences fragmented into 100 bp pieces (see Experimental procedures for more details). This is nearly twice the fraction of reads identified in metagenomic efforts with similar pyrosequencing read lengths (Frias-Lopez et al., 2008; Mou et al., 2008), as might be expected for sequences biased towards coding regions of genomes. These sequences were subsequently assigned to the function of their best hit in RefSeq. Transcript abundance was analysed as relative abundance within the collective community transcriptome rather than as per-gene expression levels (see Frias-Lopez et al., 2008). Empirically derived criteria were established in separate in silico analyses for the Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, which contain fewer sequences than RefSeq (Fig. 1). Some of the sequences without hits in RefSeq were similar to proteins in the Global Ocean Sampling (GOS) database, indicating that similar sequences have been found in marine bacterioplankton communities, although functional annotation is not currently possible. At the end of the annotation pipeline, half of the possible protein-encoding sequences in each library had no significant hits to previously sequenced genes. To examine how sequences from uncultured marine bacterial taxa might decrease annotation success or skew taxonomic assignments, we randomly selected 100 bp sequences from the coding regions of genome fragments from SAR86 and SAR116 cells captured in environmental BAC libraries (SAR86 BAC, AF279106; SAR86 BAC, AY552545; SAR116 BAC, AY744399). Excluding self-hits, approximately 60% of the sequences from the BACs had no hits in RefSeq (Table S1).
In a similar analysis of coding sequences from cultured taxa with genome sequences available (Pelagibacter ubique HTCC1062 and Prochlorococcus marinus MIT9312), only ~20% of the sequences had no hits in RefSeq. Many unannotated sequences in the HOT libraries are therefore likely to be transcripts from poorly known taxa, but they also include some transcripts from well-known taxa whose particular 100 bp fragment has poor identity to sequences in the databases. In support of the latter, a preliminary analysis of a marine environmental transcriptome consisting of longer reads (~200 bp; 454 GS FLX sequencing platform; R.S. Poretsky and M.A. Moran, unpublished; and Table S1) yielded twice the frequency of annotated sequences as the HOT metatranscriptome. Those 100 bp genome fragments from uncultured taxa that did have significant hits in RefSeq almost always hit a gene from an organism in the same phylum (90%) or subphylum (70%), and thus did not significantly skew the taxonomic assignments (Table S1). SAR86, SAR116 and other currently recognized uncultured groups made up ~4% of the 16S rRNA amplicons from these samples (see below).

Fig. 1. The mRNA annotation pipeline developed for 454 transcript reads showing combined counts for the day and night transcriptomes. All percentages are relative to the total number of sequences entering the pipeline. [Flowchart: 240,422 total 454 sequences → BLASTN against nt → 88,916 rRNA sequences (37%) and 151,504 possible protein-encoding sequences (63%). Possible protein-encoding sequences → BLASTX against RefSeq → 48,648 identified sequences (21%) and 102,856 unidentified (42%). Identified → BLASTX against COG: 24,474 sequences (10%); BLASTX against KEGG: 35,927 sequences (15%). Unidentified → BLASTX against GOS: 26,366 GOS sequences (11%); BLASTX against nr: 163 sequences (0.07%); 76,327 unidentified sequences remaining (32%).]
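The fragment-based checks described above (cutting known coding sequences into 100 bp pieces to calibrate annotation criteria, and randomly drawing 100 bp pieces from BAC coding regions) can be sketched as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, tiling is non-overlapping with the short tail discarded, and the downstream BLASTX searches and score cut-offs (described in Experimental procedures) are not reproduced.

```python
import random


def tile_fragments(seq: str, size: int = 100) -> list[str]:
    """Cut a coding sequence into consecutive, non-overlapping pieces.

    A trailing piece shorter than `size` is discarded, so every fragment
    has the length of a typical ~100 bp GS 20 read.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def sample_fragments(seq: str, n: int, size: int = 100, seed: int = 0) -> list[str]:
    """Draw `n` pieces of length `size` from uniformly random start positions."""
    if len(seq) < size:
        return []
    rng = random.Random(seed)  # seeded so the draw is reproducible
    starts = [rng.randrange(len(seq) - size + 1) for _ in range(n)]
    return [seq[s:s + size] for s in starts]


if __name__ == "__main__":
    gene = "ATG" + "GCT" * 110 + "TAA"  # toy 336 bp open reading frame
    print(len(tile_fragments(gene)))     # prints 3
    print(len(sample_fragments(gene, 10)))
```

Each resulting fragment would then be searched against the reference database, and hits compared with the known source gene, to judge how often a 100 bp read is correctly annotated.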
Finally, to examine the possibility that the unidentified sequences were from non-protein-coding regions, these sequences were BLAST-queried against tRNA genes, 5S rRNA genes and intergenic region sequences from three P. marinus genomes (MIT9301, MIT9312 and AS9601) and two P. ubique genomes (HTCC1002 and HTCC1062). Based on this analysis, ~4% of the 76 327 unidentified sequences were from non-protein-coding regions of these genomes, and these primarily hit intergenic regions.

Community composition and taxonomic origin of transcripts

Prochlorococcus are the most abundant Cyanobacteria at Station ALOHA (> 95% of photosynthetic picoplankton cells; Campbell and Vaulot, 1993) and in this study accounted for approximately 2 × 10⁵ cells ml⁻¹ (based on flow cytometric counting; http://hahana.soest.hawaii.edu/hot/hot-dogs/), or ~30% of the total microbial community (Fig. 2). Heterotrophic bacteria (including phototrophs) were numerically dominant with ~5 × 10⁵ cells ml⁻¹, accounting for ~65% of the microbial community present at the time of sampling. Direct counts also indicated the presence of ~800 cells ml⁻¹ of pigmented nanoeukaryotes (0.2%; Fig. 2). Companion PCR-based 16S rRNA clone libraries were generated from DNA collected in tandem with the RNA samples and agreed closely with the flow cytometric data in terms of taxonomic composition at Station ALOHA. Cyanobacteria accounted for ~20% of the 16S rRNA sequences, and heterotrophic bacterial groups for ~80% (Fig. 3). Among the heterotrophic 16S rRNA sequences, Proteobacteria were most abundant (41%; Fig. 3) and were dominated by α-Proteobacteria (22%), β-Proteobacteria (8%) and γ-Proteobacteria (8%). Bacteroidetes (8%) and Firmicutes (12%, biased towards the day sample) were also well represented.

Fig. 2. Depth profiles of Prochlorococcus-like, Synechococcus-like, heterotrophic bacteria and pigmented nanoeukaryotes during the HOT-175 cruise, as determined by flow cytometry. The horizontal line indicates the mixed layer depth. The depth profile for chlorophyll a is also indicated. Data were collected through the HOT project and downloaded from the HOT Data Organization and Graphical System (http://hahana.soest.hawaii.edu/hot/hot-dogs/). [Axes: depth (m), 0–200; chl a (10⁻³ μg l⁻¹); Prochlorococcus (× 10³ cells ml⁻¹); Synechococcus (× 10² cells ml⁻¹); nanoeukaryotes (× 10² cells ml⁻¹); heterotrophic bacteria (× 10³ cells ml⁻¹).]

Taxonomically binned mRNA sequences were compared with community composition data to ask whether taxa contributed to the HOT community mRNA pool in proportion to their representation in the microbial assemblage (i.e. whether taxa are equally transcriptionally active on a per-cell basis). Cyanobacteria dominated the transcript libraries (55% of sequences), with about twofold higher representation than in the 16S rRNA amplicons or the cell count data (Fig. 3), indicating that there is more gene expression in these autotrophic bacterioplankton than in co-occurring heterotrophs (or possibly that their transcripts are longer-lived). When relative 16S rRNA abundance was calculated among just the heterotrophic groups (i.e. with cyanobacterial sequences removed), many taxa had similar contributions to the transcript pool and the amplicon pool, suggesting comparable levels of transcriptional activity on a per-gene basis within the limits of recognized biases of PCR amplification (Fig. 3). Proteobacteria contributed the second largest number of transcript sequences (28%), most of which were attributed to α-Proteobacteria (19%) and γ-Proteobacteria (4%). Approximately 2% of the total transcripts were of eukaryotic origin.
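The heterotroph-only comparison above amounts to dropping the cyanobacterial share and rescaling the remaining groups to sum to one, then comparing transcript and amplicon shares. The sketch below illustrates that arithmetic only; the function names are hypothetical, and the example inputs are the approximate percentages quoted in the text with unlisted taxa lumped into "other", so it does not reproduce the per-taxon values behind Fig. 3.

```python
def renormalize(shares: dict[str, float], exclude: set[str]) -> dict[str, float]:
    """Rescale relative abundances to sum to 1 after dropping `exclude`."""
    kept = {taxon: value for taxon, value in shares.items() if taxon not in exclude}
    total = sum(kept.values())
    return {taxon: value / total for taxon, value in kept.items()}


def representation_ratio(transcript_share: float, amplicon_share: float) -> float:
    """Share of the transcript pool divided by share of the 16S rRNA amplicon pool.

    Values near 1 suggest transcription roughly in proportion to abundance;
    values well above 1 suggest disproportionately high transcript output.
    """
    return transcript_share / amplicon_share


if __name__ == "__main__":
    # Approximate 16S rRNA shares (%) quoted in the text; remainder lumped as "other".
    amplicons = {"Cyanobacteria": 20, "alpha": 22, "beta": 8, "gamma": 8,
                 "Bacteroidetes": 8, "Firmicutes": 12, "other": 22}
    # Approximate transcript shares (%) quoted in the text; remainder lumped as "other".
    transcripts = {"Cyanobacteria": 55, "alpha": 19, "gamma": 4, "other": 22}
    het_amp = renormalize(amplicons, {"Cyanobacteria"})
    het_tx = renormalize(transcripts, {"Cyanobacteria"})
    print(f"alpha: {het_amp['alpha']:.2f} of heterotrophic amplicons, "
          f"{het_tx['alpha']:.2f} of heterotrophic transcripts")
    print(f"Cyanobacteria transcript/amplicon ratio: "
          f"{representation_ratio(55, 20):.2f}")
```

Rescaling before comparing keeps the large cyanobacterial transcript share from masking differences among the heterotrophic groups.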
Comparing putative taxonomic assignments of transcripts between day and night, Cyanobacteria contributed equally to the day and night transcriptomes (55% versus 56%), as did α-Proteobacteria (40% versus 45% of heterotrophic transcripts) and γ-Proteobacteria (11% versus 8% of heterotrophic transcripts) (Fig. 3). More detailed taxonomic assignment of transcripts was carried out for the best-represented clades. The Cyanobacteria transcripts were dominated by Prochlorococcus-like sequences most similar to P. marinus AS9601, P. marinus MIT 9301 and P. marinus MIT 9312 (Table 1). The α-Proteobacteria, the most transcriptionally active among the heterotrophic groups, mostly contained sequences with similarity to the SAR11 group members P. ubique HTCC1002 and P. ubique HTCC1062 (~10% of prokaryotic transcripts). Roseobacter-like sequences were also represented and were primarily assigned to Dinoroseobacter shibae DFL 12, Jannaschia sp. CCS1, Silicibacter pomeroyi DSS-3, Roseobacter denitrificans Och 114 and Silicibacter sp. TM1040 (Table 1 and Fig. 4). These assignments do not imply that these species were actually present at the time of sample collection; rather, they represent the best current sequence matches for some of the more abundant environmental transcripts.

Fig. 3. Contribution of taxa to the 16S rRNA amplicon pool and transcript pool for the day (A) and night (B) samples. Taxonomy is presented to the phylum level (based on NCBI taxonomy) except for Proteobacteria, which is at the subphylum level. The dashed red lines indicate cyanobacterial abundance in the night sample as determined by flow cytometric counting. [Panel A (day): Cyanobacteria 18% of 16S rRNA genes (other 82%) and 55% of mRNA (other 45%). Panel B (night): Cyanobacteria 21% of 16S rRNA genes (other 79%) and 56% of mRNA (other 44%). Legend: Cyanobacteria, Alpha-, Gamma-, Beta-, Delta- and Epsilonproteobacteria, other Proteobacteria, Actinobacteria, Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Acidobacteria, Firmicutes, Lentispaerae, Planctomycetes, Spirochaetes, Thermotogae, Verrucomicrobia, Other.]

Table 1. Number of sequences from the community transcriptome with highest homology to the listed reference genomes, as determined by top BLASTX hit to RefSeq.

Reference genome | Night | Day
Prochlorococcus marinus str. MIT 9301 | 6309 | 6292
Prochlorococcus marinus str. AS9601 | 3214 | 2849
Pelagibacter ubique HTCC1002 | 2541 | 1851
Prochlorococcus marinus str. MIT 9312 | 1430 | 1264
Pelagibacter ubique HTCC1062 | 1308 | 944
Dinoroseobacter shibae DFL 12 | 48 | 34
Jannaschia sp. CCS1 | 41 | 27
Silicibacter pomeroyi DSS-3 | 39 | 30
Roseobacter denitrificans Och 114 | 30 | 28
Silicibacter sp. TM1040 | 19 | 26

Fig. 4. Evidence for prokaryotic gene expression patterns in the community transcriptome based on P. marinus, P. ubique and Roseobacter genome bins. A. Operon-based expression was evaluated by comparing the number of adjacent transcripts (closed circles) to the number of adjacent genes found in 1000 random samples of the same size from the reference genome (black lines). B. Preferential representation of transcripts from genes predicted to be highly expressed was evaluated by comparing the per cent of PHX genes in the reference genome (grey bar) to the per cent in the transcript pool (black bar). Differences between transcript pools and reference genomes were significant for both operon and PHX analyses (Wilcoxon signed-rank test; P < 0.05). [Axes: A, frequency versus number of adjacent genes; B, % PHX genes.]

Transcriptome coverage

To estimate transcriptome coverage, 16S rRNA clone library data were used to establish a taxon-abundance model for the HOT community at an identity level of 99%. Assuming that each taxon expresses 1000 different genes at any given time (based on the Escherichia coli model; Ingraham et al., 1983) and that genome coverage follows a Lander–Waterman model (Lander and Waterman, 1988), we estimate that the most abundant taxon in the day or night sample had over 90% transcriptome coverage (i.e. 90% of its expressed genes were sequenced at least once), while the 15 most abundant taxa had more than half of their transcriptome represented (Table S2). Alternatively, we determined the singletons and doubletons among the COG categories (i.e. the number of COGs containing only one or two sequences) and applied the Chao1 index of diversity to estimate the theoretical number of COGs in the day and night. The sequencing effort captured about 80% of the COGs predicted to be present in the night transcriptome and 70% of the COGs predicted for the day transcriptome (Table S2). Based on these coverage estimates, greater sequencing depth would have been required to fully capture some specialized processes carried out by rarer members of the HOT community, but frequently transcribed genes from abundant taxa were well represented. In support of this, transcript mapping to the three P. marinus and two P. ubique reference genomes showed sequences with homology to approximately half the genes, at coverage depths ranging from 1 to nearly 500 hits per gene (Fig. 5). Moreover, many of the reference genes with the greatest coverage are those mediating metabolic processes expected to be dominant in the HOT bacterioplankton community (e.g. the photosynthesis genes psaA and psaB, the light-harvesting complex and RuBisCO, ammonium transporters and transcription-related genes; Fig. 5).
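The two coverage estimates described in the transcriptome coverage analysis can be sketched in a few lines. The exact parameterization behind Table S2 is not given here, so the first function is an assumption on our part: a gene-level Poisson simplification of the Lander–Waterman model in which each read lands on one of 1000 equally expressed genes, giving an expected coverage of 1 − exp(−n/1000). The second function is the standard Chao1 estimator, S_obs + F1²/(2F2), with the bias-corrected form F1(F1 − 1)/2 used when there are no doubletons. Function names and the toy counts are hypothetical.

```python
import math


def expected_gene_coverage(n_reads: float, n_expressed_genes: int = 1000) -> float:
    """Expected fraction of expressed genes hit by at least one read.

    Simplifying assumption: each read falls on one of `n_expressed_genes`
    genes chosen uniformly at random, so misses follow a Poisson law
    (a gene-level analogue of Lander-Waterman coverage, 1 - exp(-c)).
    """
    return 1.0 - math.exp(-n_reads / n_expressed_genes)


def chao1(abundances: list[int]) -> float:
    """Chao1 estimate of total richness from per-category read counts."""
    observed = [a for a in abundances if a > 0]
    s_obs = len(observed)
    f1 = sum(1 for a in observed if a == 1)  # singletons
    f2 = sum(1 for a in observed if a == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form when no doubletons are present.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Reads needed for 90% coverage of a 1000-gene transcriptome under this model.
    print(round(-1000 * math.log(0.1)))
    # Fraction of estimated COG richness captured by a toy set of COG counts.
    cog_counts = [1, 1, 1, 2, 2, 5, 10]
    print(len(cog_counts) / chao1(cog_counts))
```

Under this uniform-expression assumption, a taxon needs on the order of 2300 reads for 90% gene coverage; real transcript pools are highly skewed towards a few genes, so this is an optimistic approximation.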
Other genes on the reference genomes with similarly deep transcript coverage (e.g. proteorhodopsin, Na+/solute symporters, colicin V production and several hypothetical proteins) can be hypothesized to also represent dominant metabolic activities (Fig. 5).

Fig. 5. Mapping of transcripts to five reference genomes. A–C are P. marinus strains; D–E are P. ubique strains. The x-axis shows gene number in the reference genome. Shaded areas represent possible hypervariable regions with few mapped transcripts. [y-axis: occurrences (mapped transcripts per gene). Panels and labelled high-coverage genes: A, MIT9312 (ribosomal proteins L14 and L20, a hypothetical protein, photosystem II PsbJ and D2, cytochrome b559 beta subunit); B, MIT9301 (ammonium transporter family, photosystem I PsaA, ribulose bisphosphate carboxylase, elongation factor Tu, protoporphyrin IX magnesium chelatase subunit ChlH, photosystem II PsbB (CP47)); C, AS9601 (photosystem II PsbA (D1), photosystem I PsaB, light-harvesting complex protein, integral membrane protein interacting with FtsH, 30S ribosomal protein S3, photosystem II reaction center Z); D, HTCC 1002 (Na+/solute symporter, DNA-directed RNA polymerase beta prime chain, bacteriorhodopsin, AcrB/AcrD/AcrF family protein (acriflavin resistance), chromosome segregation SMC family protein, a hypothetical protein); E, HTCC 1062 (30S ribosomal protein S1, excinuclease ABC subunit C, heat shock protein a, octaprenyl-diphosphate synthase, translation elongation factor EF-G, adenylylsulfate reductase, lipoprotein precursor).]

Operon signature in environmental transcript pools

Genes that encode steps in the same metabolic pathway are frequently clustered into operons in prokaryotic genomes (Overbeek et al., 1999) to facilitate coordinated transcription. A cell's transcript pool is therefore expected to include more mRNAs from adjacent genes than would a random sample of the genome. We tested this using the transcripts assigned to taxonomic bins for P. marinus, P. ubique and Roseobacter by counting the frequency with which transcripts from two adjacent genes on the reference strain genome (defined as ≤ 1 intervening gene) were both present in the bin, recognizing that the wild and reference organisms will not be fully syntenic. In all cases, the transcript bins had significantly more adjacent genes than a null distribution generated from the reference genomes (Fig. 4A), suggesting that random transcript sequencing captures operon-based expression patterns in natural marine bacterioplankton communities.

Predicted highly expressed genes in environmental transcript pools

Genes that are frequently transcribed by a cell can be identified based on patterns in codon usage (Karlin and Mrázek, 2000). We identified predicted highly expressed (PHX) genes for the reference genomes, and then assigned PHX status to the transcripts with best hits to that reference genome based on homology. For all taxa, and in accordance with biological expectations, the environmental transcript bins had a significantly higher percentage of PHX genes than the reference genomes (Fig. 4B).
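The two Fig. 4 comparisons (adjacent-gene counts in a transcript bin against random gene sets of the same size, and PHX percentages in transcripts versus genome) can be sketched as follows. This is our own minimal illustration, not the authors' code: function names are hypothetical, genes are represented by their order on the reference genome, the genome is treated as linear (pairs spanning the origin are ignored), and the Wilcoxon signed-rank test used across bins in the paper is not reproduced.

```python
import random


def count_adjacent(genes: set[int], max_intervening: int = 1) -> int:
    """Count pairs of detected genes separated by at most `max_intervening` genes."""
    ordered = sorted(genes)
    pairs = 0
    for i, g in enumerate(ordered):
        for h in ordered[i + 1:]:
            if h - g > max_intervening + 1:
                break  # later genes are even farther away
            pairs += 1
    return pairs


def null_adjacent_counts(n_genome_genes: int, n_detected: int,
                         n_iter: int = 1000, seed: int = 0) -> list[int]:
    """Adjacent-pair counts for random gene sets of the same size as the bin."""
    rng = random.Random(seed)
    return [count_adjacent(set(rng.sample(range(n_genome_genes), n_detected)))
            for _ in range(n_iter)]


def empirical_p(observed: int, null: list[int]) -> float:
    """One-sided empirical P value: how often the null reaches `observed`."""
    return (1 + sum(1 for x in null if x >= observed)) / (len(null) + 1)


def fold_enrichment(pct_transcripts: float, pct_genome: float) -> float:
    """Ratio of the PHX percentage in the transcript bin to that in the genome."""
    return pct_transcripts / pct_genome


if __name__ == "__main__":
    detected = set(range(0, 100, 2))  # toy bin: every other gene, 50 genes
    obs = count_adjacent(detected)
    null = null_adjacent_counts(2000, len(detected))
    print(obs, empirical_p(obs, null))
    print(round(fold_enrichment(12.9, 4.6), 1))  # MIT9301 values from the text
```

Using a same-size random sample as the null controls for the fact that larger bins contain more adjacent pairs purely by chance.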
PHX enrichment was particularly evident for the Roseobacters (9% of the genes in the reference genomes are PHX versus 30% of the transcripts; 3.1-fold enrichment) and for P. marinus MIT9301 (4.6% versus 12.9%; 2.8-fold enrichment). A larger proportion of PHX transcripts was found in the day for all P. marinus bins and the Roseobacter bin (although not for P. ubique), suggesting that highly expressed genes more frequently mediate daytime-biased processes (data not shown).

Metatranscriptomic comparison of day and night samples

The majority of annotated transcripts (~80%) were assigned to genes related to metabolism, and in particular to three KEGG categories: amino acid transport and metabolism, energy production and conversion (particularly oxidative phosphorylation, carbon fixation and nitrogen metabolism), and carbohydrate transport (Fig. 6). Membrane transport and signal transduction pathways were also common in the community transcriptome, specifically ABC transporters for amino acids, glycine betaine/L-proline, polyamines (spermidine and putrescine), iron, and nutrients in the form of nitrate, phosphate and phosphonate. The day/night samples allowed comparison of dominant expression patterns in the presence and absence of solar radiation in the bacterioplankton community. Among the 167 KEGG metabolic pathways represented in the annotated sequences, four pathways were better represented at night (including those for glycosphingolipid biosynthesis and nucleotide sugars metabolism) and six were better represented in the day (including photosynthesis and oxidative phosphorylation) (95% confidence level; Table 2). Some KEGG pathways had significant diel differences in frequency for individual taxonomic bins. These include: histidine biosynthesis, with evidence for expression of all or nearly all genes in the pathway (both P. ubique and P. marinus at night; Fig. 7A and Fig. S1A); metabolism of glutathione, a reductant with multiple detoxifying and cytoprotective capabilities (P. marinus at night); the photosynthesis pathway (phycobilisome, photosystems I and II, cytochromes, ATP synthase) and nearly all genes involved in the biosynthesis of phytoene and its subsequent conversion into carotenoids (P. marinus in the day; Fig. 7B); nucleotide sugars metabolism, glycosphingolipid biosynthesis, carotenoid biosynthesis and vitamin B6 metabolism (P. ubique at night; Fig. S1B); and transfer of methyl groups for C1 metabolism (P. ubique and Roseobacter in the day) (Table S3).

Transcript annotation based on the COG database gave comparable results. Among the 1577 COGs represented, statistical comparisons identified 12 that were better represented at night and 13 that were better represented in the day (Table S4). These included amino acid and nucleotide metabolism, membrane biosynthesis and polyamine dehydrogenation at night, and light-mediated energy production, protein turnover, catalase synthesis and inorganic ion transport and metabolism in the day. Statistically significant differences in the distribution of transcripts between the day and night samples were also assessed independently of KEGG and COG assignments in order to capture signals from genes not currently classified by these annotation systems. Additional functions significantly overrepresented in the night transcriptome included ABC-type spermidine/putrescine transport system permeases, RNA methyltransferases and signal transduction histidine kinases. In the day transcriptome, genes encoding proteorhodopsin and an aromatic-ring hydroxylase were significantly overrepresented (Table S5).

Eukaryotic sequences

The majority of eukaryotic transcripts were most closely affiliated with sequences from green-lineage organisms (Viridiplantae), such as the picoeukaryotic prasinophytes Ostreococcus spp. (Derelle et al., 2006) and Micromonas spp.
Fig. 6. The 50 most abundant KEGG pathways in the night (black) and day (grey) transcriptomes. The pathways marked with stars were significantly overexpressed in one of the pools as determined by comparisons with P < 0.05 (Rodriguez-Brito et al., 2006).

Table 2. KEGG pathways significantly overrepresented in the night and day transcriptomes (P < 0.05).

Pathway ID | Pathway | Category | Overrepresented in
path00520 | Nucleotide sugars metabolism | Carbohydrate Metabolism | Night
path00521 | Streptomycin biosynthesis | Biosynthesis of Secondary Metabolites | Night
path00602 | Glycosphingolipid biosynthesis – neo-lactoseries | Glycan Biosynthesis and Metabolism | Night
path00603 | Glycosphingolipid biosynthesis – globoseries | Glycan Biosynthesis and Metabolism | Night
path00190 | Oxidative phosphorylation | Energy Metabolism | Day
path00195 | Photosynthesis | Energy Metabolism | Day
path03010 | Ribosome | Translation | Day
path03020 | RNA polymerase | Transcription | Day
path04940 | Chaperonin | N/A | Day
path05060 | Chaperonin | N/A | Day

A large number of transcripts also appeared to be most closely related to genes in Chromalveolata (stramenopile or alveolate) genomes. These groups are major components of the picoeukaryotic phytoplankton (McDonald et al., 2007) and are small enough to pass the 5 μm prefilter used in this study. Gene transcripts that most closely matched reference genomes of photosynthetic eukaryotes were more abundant in the day than in the night sample. Among the most highly expressed genes detected from eukaryotic organisms were those encoding chlorophyll binding proteins, light harvesting reactions and photosynthetic machinery (Fig. 8).
These included a photosystem II D1 reactioncentre protein related to that from the diatom Thalassiosira psuedonana, as well as the plastid-encoded photosystem I subunit protein similar to psaB from the diatom Odontella sinensis. Evidence for stramenopile nitrogen metabolism via urea cycle activity was also detected based on several transcripts that most closely matched stramenopile carbamoyl phosphate synthetase III, indicating that the unique diatom urea cycle (Armbrust et al., 2004; Allen et al., 2006) is likely active in natural populations of stramenopile picophytoplankton. qPCR quality control The half-life of microbial transcripts can be as short as 30 s based on studies of mRNAs of cultured bacteria (Belasco, 1993), while processing times for environmental nucleic acid samples can take hours (Fuhrman et al., 1988). Linear amplification of RNA greatly reduces the time between initiation of sampling and capture of transcripts because sample volumes can be reduced, but it has potential to introduce bias into the sequenced mRNA pool. A previous test with mRNA from the cultured marine bacterium S. pomeroyi DSS-3 demonstrated minor bias and good repeatability during linear amplification (Bürgmann et al., 2007). Here, we assessed the full environmental transcriptomic sequencing protocol by comparing qPCR-based ratios of selected genes in day versus night total RNA fractions to the pyrosequencing-based ratio of these same genes in the sequenced transcript pools. Five genes common in the transcriptome (P. marinus-like recA and psaA, P. ubique-like proteorhodopsin and Na+/solute symporter, and P. torquis-like membrane proteinase) showed a strong positive correlation between night and day ratios in the original RNA pool and the pyrosequence data sets (r = 0.94, Fig. S2), indicating that the sequenced metatranscriptome was representative of the unamplified mRNA pool. 
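The ratio comparison described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes a qPCR amplification efficiency of 100% (one doubling per cycle) unless told otherwise, normalises read counts by the total reads in each pool, and computes Pearson's r on log2-transformed ratios. The function names and the choice of log scale are our assumptions; the paper reports only the resulting r.

```python
import math
from typing import Sequence


def qpcr_day_night_ratio(ct_day: float, ct_night: float, efficiency: float = 1.0) -> float:
    """Day:night template ratio from qPCR cycle thresholds.

    A sample that crosses the threshold earlier (lower CT) started with more
    template. With (1 + efficiency)-fold amplification per cycle, the ratio is
    (1 + efficiency) ** (ct_night - ct_day). efficiency=1.0 assumes perfect
    doubling, an assumption rather than a measured value.
    """
    return (1.0 + efficiency) ** (ct_night - ct_day)


def read_day_night_ratio(day_reads: int, night_reads: int,
                         day_total: int, night_total: int) -> float:
    """Day:night ratio of reads assigned to one gene, normalised by pool size."""
    return (day_reads / day_total) / (night_reads / night_total)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ratio_agreement(qpcr_ratios: Sequence[float], read_ratios: Sequence[float]) -> float:
    """Correlate the two ratio sets on a log2 scale (assumed, not from the paper)."""
    return pearson_r([math.log2(r) for r in qpcr_ratios],
                     [math.log2(r) for r in read_ratios])
```

For example, a gene whose day sample crosses the threshold two cycles earlier than its night sample gives qpcr_day_night_ratio(20.0, 22.0) == 4.0 under the doubling assumption.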
Discussion
The HOT program provides comprehensive, long-term oceanographic information for the oligotrophic North Pacific Ocean (Karl and Lukas, 1996). In situ dissolved organic constituents at 25 m depth at Station ALOHA are typically 70–110 μM for carbon, 5–6 μM for nitrogen and 0.2–0.3 μM for phosphorus; ammonium concentrations in these waters (~50 nM) are below the detection limit of standard nutrient analysis (http://hahana.soest.hawaii.edu/hot/hot-dogs/). Surface-water nutrient data collected over the past several decades in November (the month in which the community transcriptomes in this study were obtained), at various times of day, show no discernible diel differences in organic or inorganic carbon, nitrogen or phosphorus concentrations at Station ALOHA. Building on previous metagenomic and transcriptomic analyses of this system (DeLong et al., 2006; Frias-Lopez et al., 2008), this day/night environmental transcriptomics effort provides insight into the temporal patterns of bacterioplankton metabolic processes and ecological activities (Table 3). Three important caveats of the analysis are that: (i) the composition of the environmental transcriptomes may be inadvertently shaped by collection and filtration manipulations, (ii) mRNAs with intrinsically shorter half-lives are less likely to be stabilized and sequenced and (iii) only 32% of the 151 000 putative mRNA sequences could be confidently assigned to a known function (Fig. 1). Despite these concerns, the community transcriptomes provided reasonable coverage of mRNAs from the dominant organisms, and the relative representation of transcripts was corroborated by RT-qPCR-based expression analyses (Fig. S2). The community transcriptomes had properties consistent with expected attributes of the HOT ecosystem, including the apparent taxonomic affiliations of transcripts. Closely related P.
marinus reference strains that are members of the high-light clade eMIT9312 comprised the most populated transcript bin. This clade has been shown to dominate in the upper euphotic zone (< 50 m) at low and mid latitudes (below 30°) (Johnson et al., 2006), much like the HOT stations from which our samples were collected. SAR11-like sequences comprised the second largest taxonomic bin. This taxon is the most numerous heterotrophic marine bacterioplankton group, particularly in oligotrophic oceans, where it makes up 30–40% of cells in the euphotic zone (Morris et al., 2002). Studies of the taxonomic composition of ocean assemblages consistently show the numerical importance of α- and γ-Proteobacteria, Cyanobacteria and Bacteroidetes (Morris et al., 2002; DeLong et al., 2006; Rusch et al., 2007), but little is known about how abundance specifically relates to activity levels. Based on comparisons of the relative abundance of taxa (flow cytometry counts and 16S rRNA amplicons) with their representation in the community transcriptome, by far the highest per-cell transcriptional activity in the HOT ecosystem was seen for the Cyanobacteria. Assuming similar mRNA half-lives across the prokaryotic taxa, dominant autotrophs produced more transcripts per gene than any co-occurring heterotrophic group not only in the day, but also at night (Fig. 3). This may reflect an advantage of autotrophy over heterotrophy for maintaining cellular activity levels, given the low concentration and refractory nature of the organic carbon fuelling heterotrophic activity in the oligotrophic ocean (Bauer et al., 1992).

As expected, many transcripts involved in light-mediated processes, such as photosynthesis and proteorhodopsin activity, were among those overrepresented in the community transcriptome in the day. Transcripts involved in protection against or repair of light-induced DNA and protein damage (e.g. catalase, chaperones, photolyases, superoxide dismutase and various DNA repair proteins) were also common in the day sample. Evidence of daytime C1 utilization by some heterotrophs suggests a source of C1 compounds or methyl groups in this ecosystem. Compounds such as methanol and formaldehyde (Heikes et al., 2002; Carpenter et al., 2004; Giovannoni et al., 2008), methane (Ward et al., 1987) and methyl halides (Woodall et al., 2001; Schaefer et al., 2002) may be available to heterotrophic bacterioplankton in surface sea water. Dimethylsulphoniopropionate, an organic sulphur compound produced in abundance by marine phytoplankton (Kiene et al., 2000), is a rich source of methyl groups for surface ocean bacterioplankton, and tetrahydrofolate-mediated C1 transfer (i.e. transcripts mapping to the 'C1 pool by folate' and methane metabolism KEGG pathways; Table S5) has been shown to play a role in its metabolism (Howard et al., 2006). Recovery of nearly four times as much mRNA per volume of sea water in the day (~30 ng l-1) as at night (~8 ng l-1) is consistent with the high relative abundance of RNA polymerase transcripts in the day (Table 2) and likely reflects increased gene expression when solar radiation is available.

Fig. 7. Transcript mapping to the KEGG histidine metabolism pathway for P. ubique, overrepresented at night (A), and to the biosynthesis of steroids and carotenoids pathway for P. marinus, overrepresented in the day (B). Colour (blue for night, yellow for day) indicates that transcripts were found; grey indicates that genes were present in the reference genome but no transcripts were found; white indicates that genes were not present in the reference genomes.

[Figure: bar chart of the number of eukaryotic transcripts (axis 0–180) per Gene Ontology category: electron transport; photosynthesis, light reaction; phosphorus metabolic process; oxidative phosphorylation; ion transmembrane transporter activity; energy derivation by oxidation of organic compounds; heme binding; cellular biosynthetic process; protein metabolic process; cellular macromolecule metabolic process; organelle organization and biogenesis; DNA metabolism; organic acid metabolic process; carbon utilization by fixation of carbon dioxide; aldehyde metabolic process; macromolecular complex assembly; cellular component assembly; ribonucleoprotein complex biogenesis and assembly; macromolecule biosynthetic process; intracellular transport; aromatic compound metabolic process; biopolymer metabolic process; amino acid and derivative metabolic process.]
Fig. 8. Number of eukaryotic transcripts in day (top bars) compared with night (bottom bars) samples. The relative contributions of Viridiplantae (green), photosynthetic Chromist algae (yellow) and other Chromist (red) transcripts to each Gene Ontology (GO) annotation category are depicted.

Table 3. Selected biogeochemically relevant genes in the HOT metatranscriptome.
Categories: Nitrogen; Methylotrophy; Polyamine degradation; Sulphur cycle; Glycine betaine; Aromatic compounds; Carbon monoxide; Phototrophy and C fixation; Phosphate assimilation; Amino acid metabolism; Trace metal uptake.
Genes: Nitrogenase (N fixation); Ammonium transport; Ammonia monooxygenase; Assimilatory nitrate reductase; Hydroxylamine oxidoreductase; Nitrate permease; Nitrite reductase; Dissimilatory nitrite reductase; Nitric oxide reductase; Nitrate transporter; Urease; Serine-glyoxylate aminotransferase; Formate dehydrogenase; Methylene tetrahydrofolate reductase; Methane monooxygenase; Methanol dehydrogenase; Methenyltetrahydromethanopterin cyclohydrolase; Crotonyl-CoA reductase; Formaldehyde-activating enzyme; Deoxyhypusine synthase; Spermidine/putrescine transport system permease; Acetylpolyamine aminohydrolase; Sulphur oxidation; Dimethylsulphoniopropionate demethylase; Dimethylglycine dehydrogenase; Glycine cleavage system (aminomethyltransferase); Aromatic ring hydroxylase; Protocatechuate 3,4-dioxygenase; Benzoyl-CoA oxygenase; Carbon monoxide dehydrogenase; Photosystem I; Photosystem II; Rubisco; Photosynthetic reaction centre, M subunit; Proteorhodopsin; Phosphonate uptake; Alkaline phosphatase; Phosphate uptake; Glutamate synthase; Glutathione reductase; Histidine kinase; Threonine synthase; Selenium; Iron; Arsenite; Arsenate reductase.
Gene symbols: nifH, nifU, nifS, nifB; amt; amoA; narB; hao; napA; nirA; nirK, nirS; norQ; narK; ureC, ureE, ureF; fdh, fdsD; metF; mmo; mxa; mch; fae; dys2; potC; aphA; soxB, soxC, soxA, soxZ, soxF; dmdA; dmgdh; gcvT; chlP; pcaH; boxA; coxS, coxM, coxL; multiple; multiple; rbcL, rbcS; pufM; phnD, phnC; phoA; pstA, pstS; gltB; gor; baeS; thrC; tonB; arsC.
Night / Day: + + + +* + + + + + + + + + + + + + +* +* + + + + + + + + + +* + + + +* + + + + + + + + + + + +* +* +* +* + + + + + +* +* +* + +* + + + + + + + + + +
A '+' indicates occurrence in the night or day sample. An asterisk indicates significantly higher transcript frequency in one sample than in the other.
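The per-cell activity comparison in the Discussion amounts to dividing a taxon's share of annotated transcripts by its share of the community. A minimal Python sketch, using the Cyanobacteria percentages reported in the Summary of this study (54% of annotated transcripts, 35% of cell counts, 21% of 16S rRNA libraries); the function name is hypothetical, and reading the ratio as per-cell activity assumes similar mRNA half-lives across taxa, as the text notes.

```python
def transcript_enrichment(transcript_pct: float, abundance_pct: float) -> float:
    """Share of annotated transcripts divided by share of the community.

    Values above 1 mean the taxon contributes more transcripts than its
    abundance alone would predict.
    """
    if abundance_pct <= 0:
        raise ValueError("abundance share must be positive")
    return transcript_pct / abundance_pct


# Cyanobacteria, percentages taken from the Summary of this study.
CYANO_TRANSCRIPTS = 54.0
for label, abundance in [("cell counts", 35.0), ("16S rRNA libraries", 21.0)]:
    ratio = transcript_enrichment(CYANO_TRANSCRIPTS, abundance)
    print(f"Cyanobacteria vs {label}: {ratio:.2f}")
```

This prints ratios of 1.54 (against cell counts) and 2.57 (against 16S rRNA libraries); both exceed 1, which is the pattern the text describes.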
Night-biased synthesis of vitamin B6, which is essential for a variety of amino acid conversions including transaminations, decarboxylations and dehydrations, together with evidence for other night-time activities such as the γ-glutamyl pathway for amino acid uptake, the overrepresentation of amino acid transport and metabolism genes, and the histidine synthesis pathway (Table 3 and Tables S4–S6), indicates that amino acid acquisition in general may be a relatively more important metabolic activity at night. Prochlorococcus marinus has recently been shown to exhibit diel patterns of amino acid uptake, with acquisition occurring predominantly at dusk (Mary et al., 2008). Our data agree with this and further suggest that heterotrophic taxa also devote a greater percentage of their transcriptome to transporting and synthesizing amino acids at night. Night-time accumulation of amino acids might be a mechanism for nitrogen storage by many organisms, particularly P. marinus, which undergoes cell division at night. Histidine, the amino acid with the most consistent signal for synthesis at night by both autotrophs and heterotrophs (Fig. 7A and Fig. S1), is one of the most nitrogen-rich amino acids (only arginine contains more nitrogen atoms). Overall, bacterial community investment in this oligotrophic ocean system was skewed towards energy acquisition and metabolism during the day, while biosynthesis (specifically of membranes, amino acids and vitamins) received relatively greater investment at night. Many microbial processes expected to be differentially expressed over a day/night cycle, such as photosynthesis, oxidative phosphorylation and proteorhodopsin activity, were indeed captured in the sequence data.
Less anticipated processes that emerged included the utilization of C1 compounds, the uptake of polyamines and the degradation of aromatic compounds (Table 3). Other metabolic processes ongoing in this microbial community, although without statistical evidence for day/night patterns, included: use of nitrate and urea as nitrogen sources; use of phosphate, phosphonate and carbon-oxygen-phosphorus (C-O-P) compounds as phosphorus sources; oxidation of reduced sulphur compounds; oxidation of carbon monoxide; and uptake of multiple trace metals (Table 3). This comparative analysis of microbial community transcripts has provided an inventory of ongoing metabolic processes, offered insights into their temporal patterns and supplied a new type of data for predictive modelling of environmental controls on ecosystem properties.

Experimental procedures
Sample collection
Samples were collected at the Hawaiian Ocean Time-series (HOT) Station ALOHA, defined by the 6-nautical-mile radius circle centred at 22°45′N, 158°W, in November 2005 (HOT-175). For RNA extraction, sea water was collected from a depth of 25 m using Niskin bottles on a conductivity-temperature-depth rosette sampler. A night sample was collected at 03:00 on 11 November 2005, and a day sample at 13:00 on 13 November 2005. During HOT-175, photosynthetically active radiation (PAR) peaked at 12:00, with sunrise around 07:00 and sunset just before 18:00. Sea water (80 l for the night sample and 40 l for the day sample) was prefiltered through a 5 μm pore-size, 142 mm diameter polycarbonate filter (GE Osmonics, Minnetonka, MN) followed by a 0.2 μm, 142 mm Durapore filter (Millipore) using positive air pressure. The 0.2 μm filters were placed in a 15 ml tube containing 2 ml Buffer RLT (containing β-mercaptoethanol) from the RNeasy kit (Qiagen, Valencia, CA) and flash-frozen in liquid nitrogen for RNA extraction.
For DNA extraction, an additional 20 l of sea water was filtered simultaneously at both time points using the protocol outlined above. The 0.2 μm filters were placed in Whirlpak bags and flash-frozen. The total sampling time from the start of collection until freezing in liquid nitrogen was approximately 1.5 h. We obtained ~1 μg of total RNA from 40 to 80 l of sea water. Following mRNA enrichment and amplification, 30–100 μg of mRNA was available for conversion to cDNA for sequencing; typically, only 3–5 μg of DNA was required for pyrosequencing.

RNA and DNA preparation
DNA was extracted using a phenol : chloroform-based protocol (Fuhrman et al., 1988). Briefly, frozen filters inside Whirlpak bags were transferred to 50 ml Falcon centrifuge tubes. Ten millilitres of extraction buffer [SDS (10% sodium dodecyl sulphate) : STE (100 mM NaCl, 10 mM Tris, 1 mM EDTA), 9:1] was added to each tube, and the tubes were boiled in a water bath for 5 min. The extraction buffer was then transferred to Oak Ridge round-bottom centrifuge tubes, and 3 ml NaOAc and 28 ml 100% EtOH were added. Organic macromolecules were precipitated overnight at −20°C, and the tubes were then centrifuged for 1 h at 15 000 g. The supernatant was decanted, and the pellets were air-dried for 30 min. The pellets were resuspended in 600 μl deionized water and sequentially extracted with 500 μl phenol, 500 μl phenol : chloroform : isoamyl alcohol (24:1:0.1) and 500 μl chloroform : isoamyl alcohol (9:1); after each extraction the organic phase was removed and discarded. After the last extraction, the aqueous supernatant was transferred to a fresh tube, amended with 150 μl NaOAc and 1.2 ml 100% EtOH, and precipitated overnight. The tube contents were then centrifuged at 15 000 g for 1 h, the supernatant decanted, and the pellets dried in a speed vacuum dryer for 10 min. The DNA pellets were resuspended in 100 μl DNase- and RNase-free deionized water (Ambion).
RNA was extracted using a modified version of the RNeasy kit (Qiagen) that gives high RNA yields from material on polycarbonate filters (Poretsky et al., 2008). Frozen samples were first thawed slightly for 2 min in a 40–50°C water bath and then vortexed for 10 min with RNase-free beads from the Mo-Bio RNA PowerSoil kit (Carlsbad, CA). Following centrifugation for 5 min at 3000–5000 g, the supernatant was transferred to a new tube. Beginning with the RNeasy Midi kit, 1 vol. of 70% ethanol was added to the lysate, and the lysate was drawn through a 22-gauge needle several (~5) times to shear high-molecular-weight nucleic acids. RNA extraction then continued with the RNeasy Mini kit according to the manufacturer's instructions. Following extraction, RNA was treated with DNase using the TURBO DNA-free kit (Ambion, Austin, TX). Two methods were used to remove rRNA from the RNA samples. The RNA was first treated enzymatically with the mRNA-ONLY Prokaryotic mRNA Isolation Kit (Epicentre Biotechnologies, Madison, WI), which uses a 5′-phosphate-dependent exonuclease to degrade rRNAs. The MICROBExpress kit (Ambion), which removes rRNA by subtractive hybridization with capture oligonucleotides bound to magnetic beads, was subsequently used as an additional mRNA enrichment step. To obtain μg quantities of mRNA, approximately 500 ng of RNA was linearly amplified using the MessageAmp II-Bacteria Kit (Ambion) according to the manufacturer's instructions. Finally, the amplified antisense RNA (aRNA) was converted to double-stranded cDNA with random hexamers using the Universal RiboClone cDNA Synthesis System (Promega, Madison, WI). The cDNA was purified with the Wizard DNA Clean-up System (Promega).
The quality and quantity of the total RNA, mRNA, aRNA and cDNA were assessed on the NanoDrop-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) and the Experion Automated Electrophoresis System (Bio-Rad, Hercules, CA).

cDNA sequencing and quality control
cDNAs from each sample (night and day) were sequenced using the GS 20 sequencing system by 454 Life Sciences (Branford, CT) (Margulies et al., 2005), yielding 10 682 120 bp from 106 907 reads for the night sample and 13 255 704 bp from 133 515 reads for the day sample. The average read length was 99 bp. The sequences have been deposited in the NCBI Short Read Archive under Genome Project ID #33463.

rRNA identification and removal
For rRNA sequence identification, the sequences were clustered at an identity threshold of 98% based on a local alignment (number of identical residues divided by alignment length) using the program Cd-hit (Li and Godzik, 2006). Ribosomal RNA sequences were identified by BLASTN queries of the reference sequence of each cluster against the non-curated GenBank nucleotide database (nt) (Benson et al., 2007), using cut-off criteria of E-value ≤ 10⁻³, nucleic acid length ≥ 69 and per cent identity ≥ 40%, previously established with in silico tests of rRNA prediction for short pyrosequences (Frias-Lopez et al., 2008; Mou et al., 2008). We conservatively identified a sequence as rRNA-derived and removed it from the analysis pipeline if any of its top three BLASTN hits was to an rRNA gene.

cDNA sequence annotation
The criteria for protein predictions generated using BLASTX against the NCBI curated, non-redundant reference sequence database (RefSeq) (Pruitt et al., 2005) were established with in silico tests to determine suitable cut-off limits for reliable functional prediction.
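The conservative rRNA rule above can be expressed as a small filter. A minimal Python sketch, assuming BLASTN hits have already been parsed into records and that the caller flags each subject as rRNA or not (how that flag is derived, for example from the GenBank annotation, is left open). We also assume the cut-offs are applied before the top three hits are taken, which is our reading of the text. In the study the test was run on the reference sequence of each Cd-hit cluster; propagating the result to cluster members is likewise an assumption here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BlastnHit:
    subject_id: str
    evalue: float
    length: int               # aligned nucleotides
    percent_identity: float
    subject_is_rrna: bool     # hypothetical flag supplied by the caller


def passes_cutoffs(hit: BlastnHit, max_evalue: float = 1e-3,
                   min_length: int = 69, min_identity: float = 40.0) -> bool:
    """Cut-offs from the text: E-value <= 1e-3, length >= 69, identity >= 40%."""
    return (hit.evalue <= max_evalue
            and hit.length >= min_length
            and hit.percent_identity >= min_identity)


def is_rrna_derived(hits: Iterable[BlastnHit], top_n: int = 3) -> bool:
    """True if any of the top_n hits passing the cut-offs is to an rRNA gene."""
    significant = sorted((h for h in hits if passes_cutoffs(h)),
                         key=lambda h: h.evalue)
    return any(h.subject_is_rrna for h in significant[:top_n])


def remove_rrna(reads: Dict[str, List[BlastnHit]]) -> Dict[str, List[BlastnHit]]:
    """Keep only reads whose BLASTN hits do not mark them as rRNA-derived."""
    return {rid: hits for rid, hits in reads.items() if not is_rrna_derived(hits)}
```

Sorting by E-value alone is a simplification; a real pipeline might rank by bit score instead.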
For these tests, 100 arbitrarily selected known functional gene sequences were cut into 20–500 bp fragments and analysed using BLASTX against RefSeq to determine whether the best BLAST hit, excluding self-hits, was to the correct gene function. Based on these analyses, the cut-off criteria for protein prediction were set as E-value < 0.01, identity > 40% and overlapping length > 23 aa to the corresponding best hit. Sequences with hits to RefSeq were assigned functional protein or pathway predictions based on the COG database (Tatusov et al., 2000) or KEGG database (Kanehisa and Goto, 2000). The cut-off criteria for functional protein prediction based on orthologous groups, using BLASTX analysis against the COG database, were established with the same in silico approach using 100 bp fragments of known functional genes, and were set as E-value < 0.1, identity > 40% and overlapping length > 23 aa to the corresponding best hit. The COG cut-off criteria were also applied to the KEGG database for pathway prediction because of the similarity in database size. Taxonomic binning of the sequences was carried out using MEGAN with the default settings for all parameters (Huson et al., 2007); this program assigns likely taxonomic origin to sequences based on the NCBI taxonomy of their closest BLAST hits. The taxonomic affiliations of the putative mRNA sequences were predicted with MEGAN down to the family level, and from the top BLAST hit for any higher-resolution taxonomic assignment. All non-rRNA sequences that had no RefSeq hits were queried with BLASTX against the nr database as well as against CAMERA unassembled ORFs predicted from the Global Ocean Sampling reads (http://camera.calit2.net/index.php) (Seshadri et al., 2007).

Eukaryotic sequence annotation
Eukaryotic transcripts were binned by MEGAN. Sequences were queried (BLASTX) against a curated database of protein sequences derived from all available complete eukaryotic organelle and nuclear genomes (currently, 46 eukaryotic genomes).
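The RefSeq and COG/KEGG cut-offs described above differ only in the E-value threshold, so they can share one acceptance test. A minimal Python sketch, assuming hits are already parsed and that the criteria are applied to the single best hit (lowest E-value), which is how we read "to the corresponding best hit"; the record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cutoffs:
    max_evalue: float       # E-value must be strictly below this
    min_identity: float     # percent identity must be strictly above this
    min_overlap_aa: int     # overlap must be strictly longer than this


# Thresholds stated in the text.
REFSEQ = Cutoffs(max_evalue=0.01, min_identity=40.0, min_overlap_aa=23)
COG_KEGG = Cutoffs(max_evalue=0.1, min_identity=40.0, min_overlap_aa=23)


@dataclass(frozen=True)
class BlastxHit:
    subject: str
    evalue: float
    percent_identity: float
    overlap_aa: int


def accepted(hit: BlastxHit, c: Cutoffs) -> bool:
    """True if the hit meets all three strict cut-offs."""
    return (hit.evalue < c.max_evalue
            and hit.percent_identity > c.min_identity
            and hit.overlap_aa > c.min_overlap_aa)


def assign(hits: Sequence[BlastxHit], c: Cutoffs) -> Optional[str]:
    """Return the best (lowest E-value) hit's subject if it passes, else None."""
    if not hits:
        return None
    best = min(hits, key=lambda h: h.evalue)
    return best.subject if accepted(best, c) else None
```

A hit with E = 0.05 is thus accepted for a COG or KEGG assignment but rejected for a RefSeq protein prediction.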
Transcripts that matched a reference protein sequence with > 60% identity and an E-value < 10⁻¹⁰ were retained, and the reference protein for the cluster was used for functional annotation. Functional annotation was performed using the Java-based tool Blast2GO (Conesa et al., 2005), which annotates genes based on similarity searches and provides statistical analysis and visualization of the annotations on directed acyclic graphs.

16S rRNA gene libraries
PCR amplification of ribosomal DNA was carried out using primers 27F and 1522R (Johnson, 1994). The PCR conditions were as follows: 3 min at 96°C, followed by 30 cycles of denaturation at 95°C for 50 s, annealing at 58°C for 50 s and primer extension at 72°C for 1 min, and a final extension at 72°C for 10 min. PCR products were cleaned using the QIAquick PCR Purification Kit (Qiagen), and multiple PCR reactions were pooled and cloned into the pCR2.1 vector using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). PCR amplifications included standard no-template controls. For each sample, 192 clones were sequenced at the University of Georgia Sequencing Facility on an ABI 3100 (Applied Biosystems, Foster City, CA).

Predicted highly expressed genes
Predicted highly expressed (PHX) genes were determined for cultured representatives of three prokaryotic taxa that were well represented in the transcript libraries (Prochlorococcus, Roseobacter and SAR11) using an algorithm developed by Karlin and Mrázek (2000). The algorithm is based on comparisons with codon usage patterns in genes expected to be frequently transcribed in a prokaryotic genome (ribosomal proteins, chaperone proteins, etc.). Environmental transcript sequences whose best BLAST hits were to one of the PHX genes were likewise designated as PHX.
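The core quantity behind the PHX approach is a codon-usage difference between a gene and a reference gene set. A minimal Python sketch of that measure, B(g|h), as we understand it from Karlin and Mrázek (2000): codon frequencies are normalised within each amino acid family, absolute differences are summed over synonymous codons, and the sums are weighted by the amino acid frequencies of g. The expression_level function combines B values using the 1/2, 1/4, 1/4 weights for ribosomal protein, chaperone and translation-factor reference sets that we recall from that paper. Check these weights, and the paper's PHX thresholds (not reproduced here), against the original before use. The sketch ignores stop codons and assumes the standard genetic code.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, ... GGG.
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(seqs: Iterable[str]) -> Counter:
    """Count sense codons in in-frame coding sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODE.get(codon, "*") != "*":  # skip stop and ambiguous codons
                counts[codon] += 1
    return counts


def family_usage(counts: Counter):
    """Return (codon -> frequency within its amino acid family, amino acid counts)."""
    aa_totals: Counter = Counter()
    for codon, n in counts.items():
        aa_totals[CODE[codon]] += n
    usage = {codon: n / aa_totals[CODE[codon]] for codon, n in counts.items()}
    return usage, aa_totals


def codon_bias(g_seqs: Iterable[str], h_seqs: Iterable[str]) -> float:
    """B(g|h): amino-acid-weighted codon usage difference of set g relative to set h."""
    f_g, aa_g = family_usage(codon_counts(g_seqs))
    f_h, _ = family_usage(codon_counts(h_seqs))
    total = sum(aa_g.values())
    if total == 0:
        raise ValueError("gene set g contains no sense codons")
    bias = 0.0
    for aa, n_aa in aa_g.items():
        synonymous = [c for c, a in CODE.items() if a == aa]
        diff = sum(abs(f_g.get(c, 0.0) - f_h.get(c, 0.0)) for c in synonymous)
        bias += (n_aa / total) * diff
    return bias


def expression_level(b_all: float, b_rp: float, b_ch: float, b_tf: float) -> float:
    """E(g) = B(g|all genes) / (B(g|RP)/2 + B(g|CH)/4 + B(g|TF)/4), weights as recalled."""
    return b_all / (0.5 * b_rp + 0.25 * b_ch + 0.25 * b_tf)
```

A gene that uses only CTG for leucine, compared with a reference that uses only TTA, gives B = 2, the maximum for a single amino acid family.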
Statistical analysis
A statistical program designed for comparing gene frequencies in metagenomic data sets (Rodriguez-Brito et al., 2006) was used to compare the night and day mRNA sequences categorized by COG, KEGG pathway and protein. The program was run with 20 000 repeated samplings, with a sample size of 10 000 for COGs, 9000 for KEGG pathways and 25 000 for proteins. The significance level was set at P < 0.05.

qPCR verifications
To confirm that the composition of the pyrosequence library was representative of the initial mRNAs, transcripts of five genes that were top hits to multiple sequences in both transcript pools were quantified in the total RNA pool. The qPCR primer sets were designed for the P. marinus str. AS9601 recA and psaA genes, a proteorhodopsin gene and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase gene attributed to Psychroflexus torquis ATCC 700755 (sequences and annealing temperatures in Table S6). Reverse transcription reactions were carried out on 200 ng of RNA using the Omniscript RT kit (Qiagen) in 20 μl volumes containing 1× RT buffer, 0.3 mg ml-1 of random hexamers (Invitrogen), 1 μl of 5 mM dNTPs, 2 U of reverse transcriptase and 20 U of RNase inhibitor (Promega) at 37°C for 1 h, followed by inactivation of the reverse transcriptase at 95°C for 2 min. The day : night ratio of each gene transcript in the RNA pools was determined by qPCR amplification of a serial dilution of cDNAs in triplicate and calculation of the difference in cycle threshold values (ΔCT) between the two samples. Quantitative amplification was done using the iCycler iQ real-time PCR detection system (Bio-Rad) in a 20 μl reaction volume containing 10 μl of iQ SYBR Green Supermix (Bio-Rad), 0.4 μl each of 10 μM forward and reverse primers and 1 μl of the cDNA template.
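The day/night comparison relies on repeated random sampling. The exact procedure of the Rodriguez-Brito et al. (2006) program is not reproduced here; the following is a generic Python resampling test in the same spirit, offered as an assumption-laden sketch. For one category (a COG, KEGG pathway or protein), it pools the annotated reads from both samples. It then repeatedly draws two random subsamples with replacement from the pool to build a null distribution of differences in that category's frequency, and reports how often a null difference is at least as large as the observed one. The defaults mirror the 20 000 samplings and the COG sample size given above. In pure Python that setting is slow, so smaller values are handy for exploration.

```python
import random
from typing import Hashable, Sequence


def resampling_p_value(day: Sequence[Hashable], night: Sequence[Hashable],
                       category: Hashable, n_samplings: int = 20_000,
                       sample_size: int = 10_000, seed: int = 0) -> float:
    """Two-sided resampling P value for a day/night frequency difference.

    day and night hold one annotation label per read. The null hypothesis is
    that both samples come from the same pooled distribution.
    """
    if not day or not night:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)

    def freq(labels: Sequence[Hashable]) -> float:
        return sum(1 for x in labels if x == category) / len(labels)

    observed = abs(freq(day) - freq(night))
    # 1 for reads in the category, 0 otherwise, so a subsample frequency is a mean.
    pooled = [1 if x == category else 0 for x in list(day) + list(night)]
    extreme = 0
    for _ in range(n_samplings):
        a = rng.choices(pooled, k=sample_size)
        b = rng.choices(pooled, k=sample_size)
        if abs(sum(a) - sum(b)) / sample_size >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_samplings + 1)
```

Comparing the returned value with 0.05 mirrors the significance level used in the study; the add-one correction is a common convention in permutation testing, not something the paper specifies.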
PCR conditions included a preliminary denaturation at 95°C for 3 min followed by 45 cycles of 95°C for 15 s, annealing for 1.5 s, 95°C for 1 min and 55°C for 1 min. A melt curve was generated following the PCR, starting at 55°C and increasing by 0.4°C every 10 s up to 95°C. A PCR control without an initial RT step was included with every set of reactions.

Acknowledgements
We thank the Captain and crew of the R/V Kilo Moana and Dr David Karl. Jennifer Oliver assisted with sample processing. Jonathan Badger assisted with data processing. Funding was provided by The Gordon and Betty Moore Foundation, National Science Foundation grants MCB-0702125 (M.A.M.), EF-0722374 (A.E.A.) and OCE-0425363 (J.P.Z.), and the NSF C-MORE Center for Microbial Oceanography.

References
Allen, A.E., Vardi, A., and Bowler, C. (2006) An ecological and evolutionary context for integrated nitrogen metabolism and related signaling pathways in marine diatoms. Curr Opin Plant Biol 9: 264–273.
Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., et al. (2004) The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306: 79–86.
Bauer, J.E., Williams, P.M., and Druffel, E.R.M. (1992) 14C activity of dissolved organic carbon fractions in the northcentral Pacific and Sargasso Sea. Nature 357: 667–670.
Belasco, J.G. (1993) mRNA degradation in prokaryotic cells: an overview. In Control of Messenger RNA Stability. Belasco, J.G., and Brawerman, G. (eds). San Diego, CA, USA: Academic Press, pp. 3–11.
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. (2007) GenBank. Nucleic Acids Res 35: D21–D25.
Bürgmann, H., Widmer, F., Sigler, W.V., and Zeyer, J. (2003) mRNA extraction and reverse transcription-PCR protocol for detection of nifH gene expression by Azotobacter vinelandii in soil. Appl Environ Microbiol 69: 1928–1935.
Bürgmann, H., Howard, E.C., Ye, W., Sun, F., Sun, S., Napierala, S., and Moran, M.A. (2007) Transcriptional response of Silicibacter pomeroyi DSS-3 to dimethylsulfoniopropionate (DMSP). Environ Microbiol 9: 2742–2755.
Campbell, L., and Vaulot, D. (1993) Photosynthetic picoplankton community structure in the subtropical North Pacific Ocean near Hawaii (Station ALOHA). Deep Sea Res. Part I Oceanogr Res Pap 40: 2043–2060.
Carpenter, L.J., Lewis, A.C., Hopkins, J.R., Read, K.A., Longley, I.D., and Gallagher, M.W. (2004) Uptake of methanol to the North Atlantic Ocean surface. Global Biogeochem Cycles 18: GB4027.
Cavender-Bares, K.K., Karl, D.M., and Chisholm, S.W. (2001) Nutrient gradients in the western North Atlantic Ocean: relationship to microbial community structure and comparison to patterns in the Pacific Ocean. Deep Sea Res. Part I Oceanogr Res Pap 48: 2373–2395.
Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., and Robles, M. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.
DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., Frigaard, N.-U., et al. (2006) Community genomics among stratified microbial assemblages in the ocean's interior. Science 311: 496–503.
Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, A.Z., Robbens, S., et al. (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103: 11647–11652.
Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008) Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805–3810.
Fuhrman, J.A., Comeau, D.E., Hagstrom, A., and Chan, A.M. (1988) Extraction from natural planktonic microorganisms of DNA suitable for molecular biological studies. Appl Environ Microbiol 54: 1426–1429.
Gelder, R.N.V., von Zastrow, M.E., Yool, A., Dement, W.C., Barchas, J.D., and Eberwine, J.H. (1990) Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 87: 1663–1667.
Gilbert, J.A., Field, D., Huang, Y., Edwards, R., Li, W., Gilna, P., and Joint, I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3: e3042.
Giovannoni, S.J., Hayakawa, D.H., Tripp, H.J., Stingl, U., Givan, S.A., Cho, J.-C., et al. (2008) The small genome of an abundant coastal ocean methylotroph. Environ Microbiol 10: 1771–1782.
Heikes, B.G., Chang, W.N., Pilson, M.E.Q., Swift, E., Singh, H.B., Guenther, A., et al. (2002) Atmospheric methanol budget and ocean implication. Global Biogeochem Cycles 16: 80.81–80.80.13.
Howard, E.C., Henriksen, J.R., Buchan, A., Reisch, C.R., Burgmann, H., Welsh, R., et al. (2006) Bacterial taxa that limit sulfur flux from the ocean. Science 314: 649–652.
Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377–386.
Ingraham, J.L., Maaløe, O., and Neidhardt, F.C. (1983) Growth of the Bacterial Cell. Sunderland, MA, USA: Sinauer Associates.
Johnson, J.L. (1994) Similarity analysis of rRNAs. In Methods for General and Molecular Bacteriology. Gerhardt, P., Murray, R.G.E., Wood, W.A., and Krieg, N.R. (eds). Washington, DC: American Society for Microbiology, pp. 683–700.
Johnson, Z.I., Zinser, E.R., Coe, A., McNulty, N.P., Woodward, E.M.S., and Chisholm, S.W. (2006) Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737–1740.
Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.
Karl, D., Letelier, R., Tupas, L., Dore, J., Christian, J., and Hebel, D. (1997) The role of nitrogen fixation in biogeochemical cycling in the subtropical North Pacific Ocean. Nature 388: 533–538.
Karl, D.M., and Lukas, R. (1996) The Hawaii Ocean Time-series (HOT) program: background, rationale and field implementation. Deep Sea Res. Part II Top Stud Oceanogr 43: 129–156.
Karlin, S., and Mrázek, J. (2000) Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182: 5238–5250.
Kiene, R.P., Linn, L.J., and Bruton, J.A. (2000) New and important roles for DMSP in marine microbial communities. J Sea Res 43: 209–224.
Lander, E.S., and Waterman, M.S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2: 231–239.
Li, W., and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.
Liang, P., and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257: 967–971.
McDonald, S.M., Sarno, D., Scanlan, D.J., and Zingone, A. (2007) Genetic diversity of eukaryotic ultraphytoplankton in the Gulf of Naples during an annual cycle. Aquat Microb Ecol 50: 75–89.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.
Mary, I., Garczarek, L., Tarran, G.A., Kolowrat, C., Terry, M.J., Scanlan, D.J., et al. (2008) Diel rhythmicity in amino acid uptake by Prochlorococcus. Environ Microbiol 10: 2124–2131.
Morris, R.M., Rappe, M.S., Connon, S.A., Vergin, K.L., Siebold, W.A., Carlson, C.A., and Giovannoni, S.J. (2002) SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420: 806–810.
Mou, X., Sun, S., Edwards, R.A., Hodson, R.E., and Moran, M.A. (2008) Bacterial carbon processing by generalist species in the coastal ocean. Nature 451: 708–711.
Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., and Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96: 2896–2901.
Poretsky, R.S., Bano, N., Buchan, A., LeCleir, G., Kleikemper, J., Pickering, M., et al. (2005) Analysis of microbial gene transcripts in environmental samples. Appl Environ Microbiol 71: 4121–4126.
Poretsky, R.S., Bano, N., Buchan, A., Moran, M.A., and Hollibaugh, J.T. (2008) Environmental transcriptomics: a method to access expressed genes in complex microbial communities. In Molecular Microbial Ecology Manual. Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans, A.D.L., and van Elsas, J.D. (eds). Dordrecht, Netherlands: Springer, pp. 1892–1904.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–D504.
Rodriguez-Brito, B., Rohwer, F., and Edwards, R. (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7: 162.
Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77.
Schaefer, J.K., Goodwin, K.D., McDonald, I.R., Murrell, J.C., and Oremland, R.S. (2002) Leisingera methylohalidivorans gen. nov., sp. nov., a marine methylotroph that grows on methyl bromide. Int J Syst Evol Microbiol 52: 851–859.
Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier, M. (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.
Tatusov, R.L., Galperin, M.Y., Natale, D.A., and Koonin, E.V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36.
Ward, B.B., Kilpatrick, K.A., Novelli, P.C., and Scranton, M.I. (1987) Methane oxidation and methane fluxes in the ocean surface-layer and deep anoxic waters. Nature 327: 226–229.
Wawrik, B., Paul, J.H., and Tabita, F.R. (2002) Real-time PCR quantification of rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase) mRNA in diatoms and pelagophytes. Appl Environ Microbiol 68: 3771–3779.
Woodall, C.A., Warner, K.L., Oremland, R.S., Murrell, J.C., and McDonald, I.R. (2001) Identification of methyl halide-utilizing genes in the methyl bromide-utilizing bacterial strain IMB-1 suggests a high degree of conservation of methyl halide-specific genes in gram-negative bacteria. Appl Environ Microbiol 67: 1959–1963.
Zehr, J.P., Waterbury, J.B., Turner, P.J., Montoya, J.P., Omoregie, E., Steward, G.F., et al. (2001) Unicellular cyanobacteria fix N2 in the subtropical North Pacific Ocean. Nature 412: 635–638.
Zhou, J.H. (2003) Microarrays for bacterial detection and microbial community analysis. Curr Opin Microbiol 6: 288–294.

Supporting information
Additional Supporting Information may be found in the online version of this article:
Fig. S1. Transcript mapping to the KEGG histidine metabolism pathway for P. marinus (A) and the vitamin B6 metabolism pathway for P. ubique (B) at night. Blue shading indicates that transcripts were found; grey indicates genes that are present in the genome but for which no transcripts were found; white indicates genes that are not present in the reference genomes.
Fig. S2. Quality control of the pyrosequences using qPCR verification of transcript ratios for five genes: recA and psaA from P. marinus str. AS9601, a bacteriorhodopsin gene and a Na+/solute symporter (Ssf family) gene from P. ubique HTCC1062, and a probable integral membrane proteinase gene attributed to P. torquis ATCC 700755. The night:day ratio of transcripts in the pyrosequence libraries is plotted against the same ratio in the original total RNA fraction.
Table S1. Results of the bioinformatic pipeline for 100 and 200 bp fragments from groups for which no genome sequences are currently available. BACs from uncultured marine taxa (two from SAR86 and one from SAR116) were fragmented into random 100 bp pieces, using only the coding regions. Fragments were searched against RefSeq with BLAST, excluding self-hits. As controls, we did the same for P. ubique HTCC1062 and P. marinus MIT9312.
Table S2. Estimates of coverage using two different models. The Lander–Waterman model uses the 16S rRNA clone library data to establish a taxon-abundance model for the system at a similarity level of 99%, and assumes that each taxon produces 1000 transcripts at any given time and that all expressed genes are expressed equally. Chao1 richness estimates for COGs were computed using EstimateS (version 8.0, R. K. Colwell, http://purl.oclc.org/estimates).
Table S3. KEGG pathways for three taxonomic bins (P. marinus, P. ubique and Roseobacters) significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.10).
Table S4. COGs significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).
Table S5. Genes significantly overrepresented in the night (grey shading) and day (no shading) transcriptomes (P < 0.05).
Table S6. Primer sets used in qPCR.

© 2009 The Authors Journal compilation © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology, 11, 1358–1375
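Table S2 names two coverage models but does not state their formulas. The sketch below shows the textbook forms, under stated assumptions, and is not the authors' code. For the Lander–Waterman-style estimate it assumes, as the caption does, that each taxon has 1000 equally expressed transcripts, so the expected fraction of a taxon's transcripts sampled at least once by N reads is approximately 1 − e^(−N/1000) (Poisson approximation). For Chao1 it uses the usual singleton/doubleton estimator, with the bias-corrected variant as the default. Which variant EstimateS 8.0 was set to use in this study is not stated, so that choice is an assumption. The function names `chao1` and `lander_waterman_fraction` are hypothetical.

```python
import math


def chao1(counts, bias_corrected=True):
    """Chao1 richness estimate from per-category read counts (e.g. reads per COG).

    Hypothetical helper. The bias-corrected form is
    S_obs + F1*(F1 - 1) / (2*(F2 + 1)); the classic form is S_obs + F1^2 / (2*F2).
    The classic form is undefined when F2 == 0, so the bias-corrected
    form is used in that case.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                       # categories actually seen
    f1 = sum(1 for c in observed if c == 1)     # singletons
    f2 = sum(1 for c in observed if c == 2)     # doubletons
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 * f1 / (2 * f2)


def lander_waterman_fraction(reads, transcripts_per_taxon=1000):
    """Expected fraction of a taxon's distinct transcripts hit at least once.

    Hypothetical helper assuming equal expression of all transcripts
    (the Table S2 assumption), so coverage c = reads / transcripts_per_taxon
    and the Poisson expectation of coverage is 1 - exp(-c).
    """
    coverage = reads / transcripts_per_taxon
    return 1.0 - math.exp(-coverage)
```

For example, under these assumptions a taxon allotted 1000 reads would be expected to have about 63% (1 − 1/e) of its 1000 transcripts represented at least once. Assigning reads to taxa from the 16S rRNA abundance model is not shown.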