Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
SUPPLEMENTARY INFORMATION FOR Absolute quantification of microbial taxon abundances Ruben Props1,2, Frederiek-Maarten Kerckhof1, Peter Rubbens3, Jo De Vrieze1, Emma Hernandez Sanabria1, Willem Waegeman³, Pieter Monsieurs2, Frederik Hammes4 and Nico Boon1,* 1 Center for Microbial Ecology and Technology (CMET), Department of Biochemical and Microbial Technology, Ghent University, Coupure Links 653, B-9000 Gent, Belgium 2 3 Belgian Nuclear Research Centre (SCK•CEN), Boeretang 200, B-2400 Mol, Belgium KERMIT, department of Mathematical modelling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium 4 Department of Environmental Microbiology, Eawag - Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, CH-8600, Dübendorf, Switzerland * Corresponding author: Nico Boon, Ghent University; Faculty of Bioscience Engineering; Center for Microbial Ecology and Technology (CMET); Coupure Links 653; B-9000 Gent, Belgium; phone: +32 (0)9 264 59 76; fax: +32 (0)9 264 62 48; E-mail: [email protected]; Webpage: www.cmet.UGent.be. Material and methods Survey of the cooling water system The microbial community of an industrial cooling water system that operates on a nuclear test reactor was monitored over a total time period of approximately 80 days, with an average sampling frequency of twice per day. During start-up (12 – 14h), temperature increased from approximately 15 °C to 30 °C, conductivity from 1 to 7 µS cm-1, and pH remained relatively stable between 4.18 and 4.87. All metadata, including time points, conductivity, temperature, pH and cell densities are available in the Supplementary dataset. A more detailed description of the system can be found in (Props et al 2016). Flow cytometric analysis (adapted from Props et al 2016) Aqueous samples were collected from five locations, mixed in equal ratios, and transported in 1 litre pre-rinsed and autoclaved glass containers. A sample-matrix optimized protocol adjusted from a previous study was used (Props et al 2016). Triplicate sample volumes (250 µl) were diluted two times in sterile 0.22 µm filtered bottled mineral water (Prest et al 2013), and left at room temperature for 10 minutes. Samples were stained with the nucleic acid stain SYBR Green I (SG, 10 000x concentrate, Invitrogen), and incubated for 20 min at 37 °C (final concentration of SG = 1x concentrate). Analysis by flow cytometry (Accuri, BD Biosciences, Erembodegem, Belgium) on four fluorescence detectors (530/30 nm, 585/40 nm, > 670 nm, 675/25 nm) and two scatter detectors was conducted in fixed volume mode (50 µL sample-1) to allow accurate quantification of cell densities. All samples were analysed immediately after incubation. The data of each sample was denoised from (in)organic noise and cell aggregates by a two-step filtering approach using the flowCore package (v1.38.1) in R (v3.3.0). First, the bacterial cell population was extracted by a filter applied on the primary fluorescence emission channels (Supplementary Figure S3, FL1-H/FL3-H). Secondly, cell aggregates were removed by a filter on the area vs. height plot of the primary fluorescence detector (FL1, Supplementary Figure S4). Disproportionate area-to-height ratios of signals are indicative of aggregated bacterial cells, which result in an incorrect enumeration of the total cell density. As there exists no method to correct for these agglomerates, we chose to exclude all aggregated cells from the enumeration. Over this entire data set, the average proportion of aggregated cells was 0.5 %, and the proportion never exceeded 2.5 %. Community analysis (adapted from Props et al 2016) In parallel to the flow cytometry analysis, one litre from each sample was filtered on sterile 0.2 µm filter funnels and stored dry at – 20 °C. DNA-extraction was performed on the filter material as previously described (Vilchez-Vargas et al 2013). The DNA-quality was evaluated on 1 % (w/v) agarose gel electrophoresis and DNA-concentration was quantified by the Quantifluor dsDNA sample kit on a multi detection system (Promega, Leiden, the Netherlands). Highthroughput amplicon sequencing of the V3 – V4 hypervariable region (Klindworth et al 2013) was performed with the Illumina MiSeq platform, according to the manufacturer’s guidelines at LGC Genomics (GmbH, Berlin, Germany). This generated 50,000 ( 30 %) raw 2 x 300 bp paired-end reads per sample. Contigs were created by merging paired-end reads based on the Phred quality score (of both reads) heuristic (Kozich et al 2013), in mothur (v.1.37, seed = 777) (Schloss et al 2009). Contigs were aligned to the Silva database (v123), and filtered from those with (i) ambiguous bases, (ii) more than 8 homopolymers, (iii) a length below 400, and (iv) those not corresponding to the V3 – V4 region. The aligned sequences were filtered and dereplicated, while sequencing errors were removed using the pre.cluster command. Chimera removal was performed with the uchime command. Sequences were clustered into operational taxonomic units (OTUs) at 97 % similarity with the cluster.split command (average neighbour algorithm). Sequences were classified using the Wang method with the Silva database (v123) and the freshwater database available at https://github.com/mcmahon-uw/FWMFG (Newton et al 2011). Quality of the sequencing and post-processing pipeline was verified by incorporating triplicate mock samples (n = 12 species) into the same sequencing run (Supplementary Table 1). All samples were rescaled by taking the proportions of each OTU, multiplying it with the minimum sample size (825 reads) and rounding to the nearest integer (McMurdie and Holmes 2014). To evaluate whether further adjustment of the relative abundances by predicted 16S rRNA gene copy numbers was justified, we utilized the accuracy option (-a) in the normalize_by_copy_number.py code of PICRUSt (v1.0.0) to calculate the weighted Nearest Sequenced Taxon Index (NSTI) for each sample (Langille et al 2013). The NSTI measures the availability of reference genomes for the most abundant OTUs in a sample. High scores (>0.15) indicate low availability of sufficient reference genomes, while low scores (<0.06) indicate the presence of closely related reference genomes. As only 15 out of 79 samples had an NSTI lower than 0.06 (Supplementary Figure S4), we did not correct the relative abundances of our data set by the predicted 16S rRNA gene copy numbers. The code necessary for formatting a mothur outputted biom file for PICRUSt, as well as the necessary code to run the analysis is added as a supplementary file. Briefly, the OTU table was picked against the Greengenes database (v13_8), formatted into a biom file (v1.0.0) and provided as input to PICRUSt. The NSTI values are available in the supplementary dataset. Statistical analysis All statistical analyses were performed in R (v3.3.0) with seed 777. Pearson’s correlation coefficients (rp) between OTU relative abundances and total cell densities were calculated with the cor function. The difference between the relative and absolute abundance relationship for the three most abundant OTUs (OTU1, OTU2 and OTU3) was investigated by ordinary least squares regression (OLS). The absolute and relative abundance data were pruned from nearzero values (relative abundance < 5%), centred and scaled. Data points were grouped according to their representative OTU and individual regressions were performed for each group. Assumptions for statistical inference on OLS models were evaluated by visual inspection of the Quantile-Quantile plot, residuals versus fitted value plot and autocorrelation plot (Supplementary Figure S5). Errors on the model coefficients were corrected for heteroscedasticity (Breusch-Pagan test, p<0.0001) and autocorrelation by using the HAC correction available from the sandwich package (v2.3.4, vcovHAC function). The Wald test was used to confirm the overall positive relation between the absolute and relative abundances (p<0.0001). Posthoc analysis was performed to identify which OTUs differed in their absolute vs. relative abundance relationship. This was performed with the glht function (Tukey’s all-pair comparison) from the multcomp package (v1.4.5). P-values lower than 0.05 were considered to represent significant differences. Supplementary Figure S1: Distribution of the Pearson’s correlation coefficient for all 427 OTUs, and calculated between the relative abundances and the whole community cell density over 79 time points. Dashed lines indicate the means of both sample groups. Supplementary Figure S2: Distribution of the NSTI (weighted Nearest Sequenced Taxon Index) score for all 79 samples. Only 15 samples had a good NSTI (< 0.06) and 64 samples had a poor NSTI (> 0.06). Dashed lines indicate the mean NSTI score for each quality group. A NSTI of 0.03 indicates the availability of reference genomes at the species level. Supplementary Figure S3: Data filtering strategy of the flow cytometry data; each point represents one cell that is characterized by two fluorescence signals. FL1-A, FL1-H and FL3H parameters were log10 transformed. In the first filtering step bacterial cells are separated from the (in)organic noise (left panel). This filter was optimized based on a set of negative controls (0.22 µm filtered sample, autoclaved sample). The filtered data (area within polygon) was subjected to a second filtering step to remove cell aggregates (right panel). Cell aggregates display disproportionate ratios between the area and height of their fluorescence signal. This causes these aggregates to be positioned away from the diagonal (outside filter 2). Supplementary Figure S4: Illustration of the non-linear relationship between the area and height aspects of the scatter signals. SSC-A/H and FSC-H/A parameters were log10 transformed. Left panels display the scatterplots of the raw data (unfiltered) and right panels display the scatterplots of the data denoised by filter 1. Supplementary Figure S5: Evaluation of model assumptions. Left panel: Quantile-Quantile plot for the ordinary least squares model (OLS) which depicts that the model residuals can be approximated by a normal distribution. Dashed lines indicate point-wise 95% confidence intervals after 5000 bootstraps. Middle panel: indication of heteroscedasticity in the residuals of the fitted model. Right panel: the model residuals are correlated in time, as is shown by the correlation between the residuals of samples that are separated by one, but also multiple lag units. Supplementary Table 1: Phylogenetic composition of triplicate mock communities (expressed as number of reads) after removing OTUs with relative abundances less than 0.1 %. All species present in the mock were correctly identified up to the genus level (except one Alcanivorax species). Inter-replicate standard deviation of relative abundances was within 4 % for all OTUs. The sequencing error rate was consistently lower than 0.081 % and the average error on the relative abundances was 1.19%. For all replicates, relative abundances (%) are shown. Assigned Family Assigned Genus Burkholderiaceae(100) Porphyromonadaceae(100) Fusobacteriaceae(100) Alcanivoracaceae(100) Lactobacillaceae (100) Lachnospiraceae(100) Planococcaceae(98) Pseudomonadaceae(100) Burkholderiaceae(100) Burkholderiaceae(100) Methylococcaceae1 (90) Geobacteraceae(100) Burkholderia(100) Porphyromonas(99) Fusobacterium(100) Alcanivorax(100) Lactobacillus(99) Roseburia(100) Sporosarcina(99) Pseudomonas(100) Cupriavidus(100) Cupriavidus(100) NA Geobacter(100) 1 Misclassified on genus and family level Replicate 1 Replicate 2 Replicate 3 9.84 19.10 7.89 19.60 11.34 17.80 17.72 3.89 8.11 7.61 5.59 3.38 4.28 7.47 7.62 5.39 16.77 3.32 8.36 6.41 6.42 8.47 4.82 6.91 5.80 5.22 15.67 4.34 6.82 6.79 5.76 9.50 0.00 9.08 7.56 5.35 Supplementary references Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M et al (2013). Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencingbased diversity studies. Nucleic Acids Research 41. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD (2013). Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform. Applied and Environmental Microbiology 79: 5112-5120. Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA et al (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology 31: 814-+. McMurdie PJ, Holmes S (2014). Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible. Plos Computational Biology 10. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S (2011). A Guide to the Natural History of Freshwater Lake Bacteria. Microbiology and Molecular Biology Reviews 75: 14-49. Prest EI, Hammes F, Kotzsch S, van Loosdrecht MCM, Vrouwenvelder JS (2013). Monitoring microbiological changes in drinking water systems using a fast and reproducible flow cytometric method. Water Research 47: 7131-7142. Props R, Monsieurs P, Mysara M, Clement L, Boon N (2016). Measuring the biodiversity of microbial communities by flow cytometry. Methods in Ecology and Evolution: n/a-n/a. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Applied and Environmental Microbiology 75: 75377541. Vilchez-Vargas R, Geffers R, Suarez-Diez M, Conte I, Waliczek A, Kaser VS et al (2013). Analysis of the microbial gene landscape and transcriptome for aromatic pollutants and alkane degradation using a novel internally calibrated microarray system. Environmental Microbiology 15: 1016-1039.