Download Supplementary Information (docx 768K)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Metagenomics wikipedia , lookup

Community fingerprinting wikipedia , lookup

Transcript
SUPPLEMENTARY INFORMATION
FOR
Absolute quantification of microbial taxon abundances
Ruben Props1,2, Frederiek-Maarten Kerckhof1, Peter Rubbens3, Jo De Vrieze1, Emma
Hernandez Sanabria1, Willem Waegeman³, Pieter Monsieurs2, Frederik Hammes4 and
Nico Boon1,*
1
Center for Microbial Ecology and Technology (CMET), Department of Biochemical and
Microbial Technology, Ghent University, Coupure Links 653, B-9000 Gent, Belgium
2
3
Belgian Nuclear Research Centre (SCK•CEN), Boeretang 200, B-2400 Mol, Belgium
KERMIT, department of Mathematical modelling, Statistics and Bioinformatics, Ghent
University, Coupure Links 653, B-9000, Ghent, Belgium
4
Department of Environmental Microbiology, Eawag - Swiss Federal Institute for Aquatic
Science and Technology, Überlandstrasse 133, CH-8600, Dübendorf, Switzerland
* Corresponding author: Nico Boon, Ghent University; Faculty of Bioscience Engineering;
Center for Microbial Ecology and Technology (CMET); Coupure Links 653; B-9000 Gent,
Belgium; phone: +32 (0)9 264 59 76; fax: +32 (0)9 264 62 48; E-mail: [email protected];
Webpage: www.cmet.UGent.be.
Material and methods
Survey of the cooling water system
The microbial community of an industrial cooling water system that operates on a nuclear test
reactor was monitored over a total time period of approximately 80 days, with an average
sampling frequency of twice per day. During start-up (12 – 14h), temperature increased from
approximately 15 °C to 30 °C, conductivity from 1 to 7 µS cm-1, and pH remained relatively
stable between 4.18 and 4.87. All metadata, including time points, conductivity, temperature,
pH and cell densities are available in the Supplementary dataset. A more detailed description
of the system can be found in (Props et al 2016).
Flow cytometric analysis (adapted from Props et al 2016)
Aqueous samples were collected from five locations, mixed in equal ratios, and transported in
1 litre pre-rinsed and autoclaved glass containers. A sample-matrix optimized protocol
adjusted from a previous study was used (Props et al 2016). Triplicate sample volumes (250
µl) were diluted two times in sterile 0.22 µm filtered bottled mineral water (Prest et al 2013),
and left at room temperature for 10 minutes. Samples were stained with the nucleic acid stain
SYBR Green I (SG, 10 000x concentrate, Invitrogen), and incubated for 20 min at 37 °C (final
concentration of SG = 1x concentrate). Analysis by flow cytometry (Accuri, BD Biosciences,
Erembodegem, Belgium) on four fluorescence detectors (530/30 nm, 585/40 nm, > 670 nm,
675/25 nm) and two scatter detectors was conducted in fixed volume mode (50 µL sample-1)
to allow accurate quantification of cell densities. All samples were analysed immediately after
incubation. The data of each sample was denoised from (in)organic noise and cell aggregates
by a two-step filtering approach using the flowCore package (v1.38.1) in R (v3.3.0). First, the
bacterial cell population was extracted by a filter applied on the primary fluorescence emission
channels (Supplementary Figure S3, FL1-H/FL3-H). Secondly, cell aggregates were
removed by a filter on the area vs. height plot of the primary fluorescence detector (FL1,
Supplementary Figure S4). Disproportionate area-to-height ratios of signals are indicative of
aggregated bacterial cells, which result in an incorrect enumeration of the total cell density. As
there exists no method to correct for these agglomerates, we chose to exclude all aggregated
cells from the enumeration. Over this entire data set, the average proportion of aggregated
cells was 0.5 %, and the proportion never exceeded 2.5 %.
Community analysis (adapted from Props et al 2016)
In parallel to the flow cytometry analysis, one litre from each sample was filtered on sterile 0.2
µm filter funnels and stored dry at – 20 °C. DNA-extraction was performed on the filter material
as previously described (Vilchez-Vargas et al 2013). The DNA-quality was evaluated on 1 %
(w/v) agarose gel electrophoresis and DNA-concentration was quantified by the Quantifluor
dsDNA sample kit on a multi detection system (Promega, Leiden, the Netherlands). Highthroughput amplicon sequencing of the V3 – V4 hypervariable region (Klindworth et al 2013)
was performed with the Illumina MiSeq platform, according to the manufacturer’s guidelines at
LGC Genomics (GmbH, Berlin, Germany). This generated 50,000 ( 30 %) raw 2 x 300 bp
paired-end reads per sample. Contigs were created by merging paired-end reads based on
the Phred quality score (of both reads) heuristic (Kozich et al 2013), in mothur (v.1.37, seed =
777) (Schloss et al 2009). Contigs were aligned to the Silva database (v123), and filtered from
those with (i) ambiguous bases, (ii) more than 8 homopolymers, (iii) a length below 400, and
(iv) those not corresponding to the V3 – V4 region. The aligned sequences were filtered and
dereplicated, while sequencing errors were removed using the pre.cluster command. Chimera
removal was performed with the uchime command. Sequences were clustered into operational
taxonomic units (OTUs) at 97 % similarity with the cluster.split command (average neighbour
algorithm). Sequences were classified using the Wang method with the Silva database (v123)
and the freshwater database available at https://github.com/mcmahon-uw/FWMFG (Newton et
al 2011). Quality of the sequencing and post-processing pipeline was verified by incorporating
triplicate mock samples (n = 12 species) into the same sequencing run (Supplementary Table
1). All samples were rescaled by taking the proportions of each OTU, multiplying it with the
minimum sample size (825 reads) and rounding to the nearest integer (McMurdie and Holmes
2014). To evaluate whether further adjustment of the relative abundances by predicted 16S
rRNA gene copy numbers was justified, we utilized the accuracy option (-a) in the
normalize_by_copy_number.py code of PICRUSt (v1.0.0) to calculate the weighted Nearest
Sequenced Taxon Index (NSTI) for each sample (Langille et al 2013). The NSTI measures the
availability of reference genomes for the most abundant OTUs in a sample. High scores
(>0.15) indicate low availability of sufficient reference genomes, while low scores (<0.06)
indicate the presence of closely related reference genomes. As only 15 out of 79 samples had
an NSTI lower than 0.06 (Supplementary Figure S4), we did not correct the relative
abundances of our data set by the predicted 16S rRNA gene copy numbers. The code
necessary for formatting a mothur outputted biom file for PICRUSt, as well as the necessary
code to run the analysis is added as a supplementary file. Briefly, the OTU table was picked
against the Greengenes database (v13_8), formatted into a biom file (v1.0.0) and provided as
input to PICRUSt. The NSTI values are available in the supplementary dataset.
Statistical analysis
All statistical analyses were performed in R (v3.3.0) with seed 777. Pearson’s correlation
coefficients (rp) between OTU relative abundances and total cell densities were calculated with
the cor function. The difference between the relative and absolute abundance relationship for
the three most abundant OTUs (OTU1, OTU2 and OTU3) was investigated by ordinary least
squares regression (OLS). The absolute and relative abundance data were pruned from nearzero values (relative abundance < 5%), centred and scaled. Data points were grouped
according to their representative OTU and individual regressions were performed for each
group. Assumptions for statistical inference on OLS models were evaluated by visual
inspection of the Quantile-Quantile plot, residuals versus fitted value plot and autocorrelation
plot (Supplementary Figure S5). Errors on the model coefficients were corrected for
heteroscedasticity (Breusch-Pagan test, p<0.0001) and autocorrelation by using the HAC
correction available from the sandwich package (v2.3.4, vcovHAC function). The Wald test
was used to confirm the overall positive relation between the absolute and relative abundances
(p<0.0001). Posthoc analysis was performed to identify which OTUs differed in their absolute
vs. relative abundance relationship. This was performed with the glht function (Tukey’s all-pair
comparison) from the multcomp package (v1.4.5). P-values lower than 0.05 were considered
to represent significant differences.
Supplementary Figure S1: Distribution of the Pearson’s correlation coefficient for all 427
OTUs, and calculated between the relative abundances and the whole community cell density
over 79 time points. Dashed lines indicate the means of both sample groups.
Supplementary Figure S2: Distribution of the NSTI (weighted Nearest Sequenced Taxon
Index) score for all 79 samples. Only 15 samples had a good NSTI (< 0.06) and 64 samples
had a poor NSTI (> 0.06). Dashed lines indicate the mean NSTI score for each quality group.
A NSTI of 0.03 indicates the availability of reference genomes at the species level.
Supplementary Figure S3: Data filtering strategy of the flow cytometry data; each point
represents one cell that is characterized by two fluorescence signals. FL1-A, FL1-H and FL3H parameters were log10 transformed. In the first filtering step bacterial cells are separated
from the (in)organic noise (left panel). This filter was optimized based on a set of negative
controls (0.22 µm filtered sample, autoclaved sample). The filtered data (area within polygon)
was subjected to a second filtering step to remove cell aggregates (right panel). Cell
aggregates display disproportionate ratios between the area and height of their fluorescence
signal. This causes these aggregates to be positioned away from the diagonal (outside filter
2).
Supplementary Figure S4: Illustration of the non-linear relationship between the area and
height aspects of the scatter signals. SSC-A/H and FSC-H/A parameters were log10
transformed. Left panels display the scatterplots of the raw data (unfiltered) and right panels
display the scatterplots of the data denoised by filter 1.
Supplementary Figure S5: Evaluation of model assumptions. Left panel: Quantile-Quantile
plot for the ordinary least squares model (OLS) which depicts that the model residuals can be
approximated by a normal distribution. Dashed lines indicate point-wise 95% confidence
intervals after 5000 bootstraps. Middle panel: indication of heteroscedasticity in the residuals
of the fitted model. Right panel: the model residuals are correlated in time, as is shown by the
correlation between the residuals of samples that are separated by one, but also multiple lag
units.
Supplementary Table 1: Phylogenetic composition of triplicate mock communities (expressed
as number of reads) after removing OTUs with relative abundances less than 0.1 %. All species
present in the mock were correctly identified up to the genus level (except one Alcanivorax
species). Inter-replicate standard deviation of relative abundances was within 4 % for all OTUs.
The sequencing error rate was consistently lower than 0.081 % and the average error on the
relative abundances was 1.19%. For all replicates, relative abundances (%) are shown.
Assigned Family
Assigned Genus
Burkholderiaceae(100)
Porphyromonadaceae(100)
Fusobacteriaceae(100)
Alcanivoracaceae(100)
Lactobacillaceae (100)
Lachnospiraceae(100)
Planococcaceae(98)
Pseudomonadaceae(100)
Burkholderiaceae(100)
Burkholderiaceae(100)
Methylococcaceae1 (90)
Geobacteraceae(100)
Burkholderia(100)
Porphyromonas(99)
Fusobacterium(100)
Alcanivorax(100)
Lactobacillus(99)
Roseburia(100)
Sporosarcina(99)
Pseudomonas(100)
Cupriavidus(100)
Cupriavidus(100)
NA
Geobacter(100)
1
Misclassified on genus and family level
Replicate 1
Replicate 2
Replicate 3
9.84
19.10
7.89
19.60
11.34
17.80
17.72
3.89
8.11
7.61
5.59
3.38
4.28
7.47
7.62
5.39
16.77
3.32
8.36
6.41
6.42
8.47
4.82
6.91
5.80
5.22
15.67
4.34
6.82
6.79
5.76
9.50
0.00
9.08
7.56
5.35
Supplementary references
Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M et al (2013). Evaluation of
general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencingbased diversity studies. Nucleic Acids Research 41.
Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD (2013). Development of a
Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data
on the MiSeq Illumina Sequencing Platform. Applied and Environmental Microbiology 79:
5112-5120.
Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA et al (2013).
Predictive functional profiling of microbial communities using 16S rRNA marker gene
sequences. Nature Biotechnology 31: 814-+.
McMurdie PJ, Holmes S (2014). Waste Not, Want Not: Why Rarefying Microbiome Data Is
Inadmissible. Plos Computational Biology 10.
Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S (2011). A Guide to the Natural
History of Freshwater Lake Bacteria. Microbiology and Molecular Biology Reviews 75: 14-49.
Prest EI, Hammes F, Kotzsch S, van Loosdrecht MCM, Vrouwenvelder JS (2013). Monitoring
microbiological changes in drinking water systems using a fast and reproducible flow
cytometric method. Water Research 47: 7131-7142.
Props R, Monsieurs P, Mysara M, Clement L, Boon N (2016). Measuring the biodiversity of
microbial communities by flow cytometry. Methods in Ecology and Evolution: n/a-n/a.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al (2009). Introducing
mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing
and Comparing Microbial Communities. Applied and Environmental Microbiology 75: 75377541.
Vilchez-Vargas R, Geffers R, Suarez-Diez M, Conte I, Waliczek A, Kaser VS et al (2013).
Analysis of the microbial gene landscape and transcriptome for aromatic pollutants and alkane
degradation using a novel internally calibrated microarray system. Environmental Microbiology
15: 1016-1039.