Supporting Information

Genome analysis of ‘Candidatus Ancillula trichonymphae’, first representative of a deep-branching clade of Bifidobacteriales, strengthens evidence for convergent evolution in flagellate endosymbionts

Jürgen F. H. Strassert1†, Aram Mikaelyan1, Tanja Woyke2, and Andreas Brune1*

Table of contents
Detailed experimental procedures
Supplementary Tables
Supplementary Figures
References

Detailed experimental procedures

Termites and sample preparation
Incisitermes marginipennis was obtained from the Federal Institute for Materials Research and Testing (BAM) in Berlin. The hindgut of a false worker (pseudergate) was removed and suspended in solution U (Trager, 1934). A single cell of Trichonympha paraspiralis (Fig. 1A) was isolated and washed in the same buffer using a micromanipulator (MMO-202ND; Narishige) equipped with a microinjector (CellTram Oil; Eppendorf). The flagellate cell was physically fixed with a holding capillary tube (inner diameter: 20 µm; Fig. 2A) and perforated with a confocal laser beam (XYClone; Hamilton Thorne Biosciences) near the anterior cell pole (Fig. 2B), which contains the majority of the ‘Candidatus Ancillula trichonymphae’ endosymbionts (Strassert et al., 2012; Fig. 1B and C). Cytoplasm with bacterial cells leaking from the flagellate was collected with a glass capillary tube (inner diameter: 20 µm) connected to a second, identical micromanipulator (Fig. 2B). After sample collection, the flagellate was disrupted to locate the nucleus and ensure that it had not been unintentionally aspirated (Fig. 2C). The sample was mixed with Triton X-100 (0.1% final concentration), heated to 95 °C for 10 min to release bacterial DNA, cooled on ice for 5 min, and centrifuged at 20,000 × g (4 °C) for 10 min to remove cell debris.

Whole genome amplification and purity check
Aliquots of each preparation were used to amplify genomic DNA by multiple-displacement amplification (MDA) with the REPLI-g UltraFast Mini Kit (Qiagen) following the manufacturer’s instructions, except that the incubation time was extended to 4 h. To confirm the successful amplification of ‘Ca. A. trichonymphae’ and the absence of potential contaminants, the MDA products were subjected to terminal restriction fragment length polymorphism (T-RFLP) analysis of the bacterial SSU rRNA genes using the FAM-labeled forward primer U341F (Baker et al., 2003) and the reverse primer 1390R (Thongaram et al., 2005). The PCR started with a denaturing step at 95 °C for 3 min, followed by 32 cycles at 95 °C for 30 s, 56 °C for 45 s, and 72 °C for 45 s, and a final extension step at 72 °C for 5 min. Aliquots of the PCR product were separately digested with the restriction enzymes MspI and TaqI and analyzed as described by Egert et al. (2003). Lengths of terminal restriction fragments (T-RFs) were determined on an automatic sequence analyzer (ABI 3130; Applied Biosystems, Carlsbad, Calif., USA). For each preparation, the products of four replicate amplifications that originated from the same flagellate cell and yielded exclusively the predicted T-RFs of ‘Ca. A. trichonymphae’ were pooled for sequencing.
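
The "predicted T-RFs" mentioned above can be obtained by an in silico digest of the expected amplicon. The following Python sketch illustrates the idea under stated assumptions: the amplicon is written 5'→3' starting at the first base of the FAM-labeled forward primer, only the first recognition site on that strand matters, and the toy sequence is hypothetical (it is not a real SSU rRNA gene). The function name `predict_trf` is introduced here for illustration and is not part of the original workflow.

```python
# Recognition sites and cut offsets on the labeled (top) strand.
ENZYMES = {
    "MspI": ("CCGG", 1),  # cuts C^CGG
    "TaqI": ("TCGA", 1),  # cuts T^CGA
}


def predict_trf(amplicon: str, enzyme: str) -> int:
    """Return the predicted terminal restriction fragment length in bases.

    The fragment runs from the labeled 5' end to the first cut site.
    If the enzyme does not cut, the full amplicon length is returned.
    """
    site, offset = ENZYMES[enzyme]
    pos = amplicon.upper().find(site)
    if pos == -1:
        return len(amplicon)
    return pos + offset


# Hypothetical toy amplicon used only to demonstrate the calculation.
toy_amplicon = "ACGTACGTAATCGATTTTCCGGAAAA"
for name in ENZYMES:
    print(name, predict_trf(toy_amplicon, name))
```

Comparing such predicted lengths with the measured T-RFs is one way to decide whether a profile contains only the expected fragments; the acceptance criteria actually used are those of Egert et al. (2003).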

Sequencing
DNA was sheared into smaller fragments by sonication (Covaris) and ligated to sequencing adapters. The samples named ImTpAt0 and ImTpAt1 were sequenced at GATC Biotech (Konstanz, Germany) and at the Joint Genome Institute (Walnut Creek, CA, USA), respectively.
Report by GATC Biotech: The DNA was run on a 2% agarose gel with TAE buffer, and the band of approximately 700 bp (the approximate size after Covaris fragmentation) was excised and column purified. Size selection was followed by 12 cycles of amplification and a final column purification. After concentration measurement, the resulting library was immobilized onto DNA capture beads, and the library beads obtained were amplified through emPCR according to the manufacturer’s recommendations. Following amplification, the emulsion was chemically broken, and the beads carrying the amplified DNA library were recovered and washed by filtration. The sample was sequenced on one half of a Genome Sequencer FLX Pico-Titer plate with a GS FLX Titanium XLR70 sequencing kit in a 200-cycle run on a GS FLX+ Instrument (single reads, 420 bp). The GS FLX produced the sequence data as a Standard Flowgram Format (SFF) file containing flowgrams for each read with basecalls and per-base quality scores. The data were analyzed with the GS FLX System Software GS De Novo Assembler (Newbler) version 2.6, taking the “read flowgrams” (SFF file) as input and using default parameters for genomic libraries for the assembly. The assembly contained 5,824 contigs.
Report by the Joint Genome Institute (JGI): The draft genome was generated using Illumina technology. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform (paired-end reads, 150 bp), which generated 24,764,830 reads totaling 3,714.7 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts. Artifact-filtered sequence data were then screened and trimmed according to the k-mers present in the dataset. High-depth k-mers, presumably derived from MDA amplification bias, cause problems in the assembly, especially if the k-mer depth varies by orders of magnitude between different regions of the genome. Reads with high k-mer coverage (>30X average k-mer depth) were normalized to an average depth of 30X. Reads with an average k-mer depth of less than 2X were removed. The following steps were then performed for assembly: (1) normalized Illumina reads were assembled using IDBA-UD version 1.0.9 (Peng et al., 2012); (2) 1–3 kb simulated paired-end reads were created from IDBA-UD contigs using wgsim (https://github.com/lh3/wgsim); (3) normalized Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r42328) (Gnerre et al., 2011). Parameters for the assembly steps were: a) IDBA-UD (--no_local), b) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0), and c) Allpaths-LG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=125 JUMP_COVERAGE=25 LONG_JUMP_COV=50; RunAllpathsLG: THREADS=8 RUN=std_shredpairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True MIN_CONTIG=2000). The final draft assembly contained 437 contigs in 436 scaffolds. The total size of the genome is 5.3 Mb and the final assembly is based on 182.6 Mb of Illumina data. Based on a presumed genome size of 5 Mb, the average coverage of the genome was 743X.
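
The depth thresholds and coverage figures in the JGI report can be retraced with a few lines of arithmetic. The Python sketch below is only an illustration: the thresholds (2X and 30X), read count, read length, and data volumes are taken from the report, whereas the function names and the action labels are assumptions made here and do not describe the JGI software itself. Note that the reported 743X appears to correspond to the raw read total rather than to the 182.6 Mb of normalized data.

```python
# Thresholds taken from the JGI report; the action names are illustrative.
LOW_DEPTH = 2.0      # reads below this average k-mer depth were removed
TARGET_DEPTH = 30.0  # reads above this depth were normalized to 30X


def normalization_action(avg_kmer_depth: float) -> str:
    """Classify a read by its average k-mer depth."""
    if avg_kmer_depth < LOW_DEPTH:
        return "remove"
    if avg_kmer_depth > TARGET_DEPTH:
        return "normalize"
    return "keep"


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average coverage = sequenced bases / genome size."""
    return total_bases / genome_size


READS = 24_764_830
READ_LENGTH = 150                 # bp
PRESUMED_GENOME_SIZE = 5_000_000  # bp, as presumed in the report

raw_bases = READS * READ_LENGTH
raw_coverage = fold_coverage(raw_bases, PRESUMED_GENOME_SIZE)
normalized_coverage = fold_coverage(182.6e6, PRESUMED_GENOME_SIZE)

print(f"raw data: {raw_bases / 1e6:.1f} Mb, {raw_coverage:.0f}X")
print(f"normalized data: 182.6 Mb, {normalized_coverage:.1f}X")
```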
The contigs of both assemblies (sample ImTpAt0 and sample ImTpAt1) were combined with CAP3 (Huang and Madan, 1999) using a sequence overlap of 100 bases and a sequence similarity of 99.0%.
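
To make the two merging criteria concrete, the following Python sketch tests whether the end of one contig overlaps the start of another by at least 100 bases at 99% identity or better, and joins them if so. This is a deliberately simplified, ungapped stand-in and not the CAP3 algorithm, which computes gapped alignments and handles reverse complements; the function names are hypothetical.

```python
from typing import Optional, Tuple


def best_overlap(left: str, right: str, min_len: int = 100,
                 min_identity: float = 0.99) -> Optional[Tuple[int, float]]:
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` that satisfies both thresholds.

    Returns (overlap_length, identity), or None if no overlap qualifies.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


def merge_contigs(left: str, right: str, **thresholds) -> Optional[str]:
    """Join two contigs across a qualifying overlap, or return None."""
    hit = best_overlap(left, right, **thresholds)
    if hit is None:
        return None
    overlap_length, _ = hit
    return left + right[overlap_length:]
```

With the thresholds stated above, two contigs that share a 120-base end are merged, whereas contigs overlapping by only 40 bases are left separate.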

Annotation
Coding DNA sequences of the combined assemblies (draft genome ImTpAt; 784 scaffolds) were identified with the Prokaryotic Dynamic Programming Gene-finding Algorithm (Prodigal; Hyatt et al., 2010) and manually curated using the Gene Prediction Improvement Pipeline (GenePRIMP) developed by the JGI (Pati et al., 2010). tRNA genes were predicted with tRNAscan-SE (Lowe and Eddy, 1997). Ribosomal RNA genes were found by searches against the SILVA database (Pruesse et al., 2007). Non-coding RNAs were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Annotation was further refined and metabolic pathways were reconstructed using the Integrated Microbial Genomes Expert Review software (IMG ER; Markowitz et al., 2009). All scaffolds that contained genes with a high sequence similarity (≥95%) to previously identified contaminants of the REPLI-g UltraFast Mini Kit (Woyke et al., 2011) were removed from the draft genomes. Scaffolds with suspicious G+C content and k-mer patterns (analyses implemented in IMG ER) were also scrutinized by BLASTp analysis of several randomly selected genes and removed if they were suspected to be contaminants.
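
The G+C screening step can be pictured as flagging scaffolds whose base composition deviates strongly from the rest of the assembly. The Python sketch below is a minimal illustration of that idea and not the analysis implemented in IMG ER; the deviation cutoff of 0.10 and the function names are assumptions chosen for the example, and flagged scaffolds would still need follow-up (here, BLASTp of selected genes) before removal.

```python
from statistics import median
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_gc_outliers(scaffolds: Dict[str, str],
                     max_deviation: float = 0.10) -> List[str]:
    """Return names of scaffolds whose G+C content differs from the
    assembly median by more than `max_deviation` (hypothetical cutoff)."""
    values = {name: gc_content(seq) for name, seq in scaffolds.items()}
    center = median(values.values())
    return sorted(name for name, value in values.items()
                  if abs(value - center) > max_deviation)
```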

Supplementary Tables
(see file Supplementary_Tables.xlsx)

Table S1. Presence of 182 single-copy genes generally conserved in most bacterial genomes (Martin et al., 2006) in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt and its closest relative with a sequenced genome, Bifidobacterium asteroides strain PRL2011.

Table S2. Phylogenetic context of the 2,131 protein-coding genes in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt with best BLASTx scores (>30% amino acid sequence identity) against homologs in other Actinobacteria in the IMG reference database (Integrated Microbial Genomes, https://img.jgi.doe.gov/). Top hits are shown for cut-off values of 30%, 60%, and 90% amino acid sequence similarity.

Table S3. Gene annotations in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt. The annotations are based on the Integrated Microbial Genomes Expert Review platform (IMG ER; see Supporting Information). Unless otherwise noted, the genes were grouped according to KEGG pathways. Top hits of BLAST searches against NCBI’s protein database are shown to the right of the vertical lines.

Supplementary Figures
(see file Supplementary_Figures.pdf)

Fig. S1. Phylogenetic tree based on maximum-likelihood (ML) depicting the relationship between the 16S rRNA sequences affiliated with ‘Candidatus Ancillula trichonymphae’ and other major actinobacterial groups. Nodes marked with circles indicate monophyletic clades in the ML tree that were well supported (○, ≥70%; •, ≥90%) by the parametric aBAYES test.

Fig. S2. Metabolic pathways of ‘Candidatus Ancillula trichonymphae’ involved in sugar metabolism, based on the gene annotations in the draft genome. (A) Glycolysis, gluconeogenesis, non-oxidative pentose-phosphate pathway, phosphoketolase pathway, and the pentose and glucuronate interconversions. (B) The non-oxidative branch of the citrate cycle. If a gene was not found in the draft genome, the corresponding reaction is indicated by a gray arrow.

Fig. S3. Detailed schemes showing the phosphotransferase system (A), imports of phosphate and sugar-phosphate (B and C, respectively), and the creation of a transmembrane proton gradient via the F1FO-ATPase (D). Gray arrows indicate reactions catalyzed by enzymes that are encoded by genes not detected in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S4. Metabolic pathways for the synthesis of amino acids. Gray arrows indicate reactions for which the corresponding genes were not found in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S5. Biosynthesis of cofactors and vitamins. Genes missing in the draft genome are indicated by gray arrows.

Fig. S6. Phylogenetic tree inferred from the maximum-likelihood analysis of bacterial pyruvate flavodoxin/ferredoxin oxidoreductase amino acid sequences (PF01855). The sequences were aligned with MAFFT (‘auto’ flag activated; Katoh and Standley, 2013) and trimmed with trimAl (‘automated1’ mode; Capella-Gutierrez et al., 2009). The tree topology was estimated using FastTree.
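
The align, trim, and tree-building sequence described for Fig. S6 could be scripted roughly as follows. This Python sketch only assembles and runs the three command lines; the file names, the executable names as installed (`mafft`, `trimal`, `FastTree`), and the use of FastTree defaults are assumptions, since the caption specifies only the MAFFT ‘auto’ flag and the trimAl ‘automated1’ mode.

```python
import subprocess
from pathlib import Path


def build_commands(fasta, workdir):
    """Return (command, stdout_file) pairs for alignment, trimming, and
    tree inference. A stdout_file of None means the tool writes its own
    output file."""
    workdir = Path(workdir)
    aligned = workdir / "aligned.faa"    # hypothetical file names
    trimmed = workdir / "trimmed.faa"
    tree = workdir / "tree.nwk"
    return [
        (["mafft", "--auto", str(fasta)], aligned),
        (["trimal", "-in", str(aligned), "-out", str(trimmed),
          "-automated1"], None),
        (["FastTree", str(trimmed)], tree),
    ]


def run_pipeline(fasta, workdir):
    """Run the three steps in order, stopping at the first failure."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for command, stdout_file in build_commands(fasta, workdir):
        if stdout_file is None:
            subprocess.run(command, check=True)
        else:
            with open(stdout_file, "w") as handle:
                subprocess.run(command, check=True, stdout=handle)
```

Calling `run_pipeline("sequences.faa", "work")` would require the three programs to be installed and on the search path; `build_commands` can be inspected without running anything.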

Fig. S7. Maximum-likelihood tree based on the analysis of bacterial [FeFe] hydrogenase amino acid sequences (PF02906). The tree topology was estimated as described for Fig. S6.

References
Baker, G.C., Smith, J.J., and Cowan, D.A. (2003) Review and re-analysis of domain-specific 16S primers. J Microbiol Methods 55: 541–555.
Capella-Gutierrez, S., Silla-Martinez, J.M., and Gabaldon, T. (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.
Egert, M., Wagner, B., Lemke, T., Brune, A., and Friedrich, M.W. (2003) Microbial community structure in midgut and hindgut of the humus-feeding larva of Pachnoda ephippiata (Coleoptera: Scarabaeidae). Appl Environ Microbiol 69: 6659–6668.
Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J. et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 108: 1513–1518.
Huang, X., and Madan, A. (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868–877.
Hyatt, D., Chen, G-L., LoCascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
Katoh, K., and Standley, D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772–780.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I-M.A., Chu, K., and Kyrpides, N.C. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271–2278.
Martin, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C. et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263–1269.
Pati, A., Ivanova, N.N., Mikhailova, N., Ovchinnikova, G., Hooper, S.D., Lykidis, A., and Kyrpides, N.C. (2010) GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7: 455–457.
Peng, Y., Leung, H.C., Yiu, S.M., and Chin, F.Y. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420–1428.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W., Peplies, J., and Glöckner, F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188–7196.
Strassert, J.F.H., Köhler, T., Wienemann, T.H.G., Ikeda-Ohtsubo, W., Faivre, N., Franckenberg, S. et al. (2012) ‘Candidatus Ancillula trichonymphae’, a novel lineage of endosymbiotic Actinobacteria in termite gut flagellates of the genus Trichonympha. Environ Microbiol 14: 3259–3270.
Thongaram, T., Hongo, Y., Kosono, S., Ohkuma, M., Trakulnaleamsai, S., Noparatnaraporn, N., and Kudo, T. (2005) Comparison of bacterial communities in the alkaline gut segment among various species of higher termites. Extremophiles 9: 229–238.
Trager, W. (1934) The cultivation of a cellulose-digesting flagellate, Trichomonas termopsidis, and of certain other termite protozoa. Biol Bull 66: 182–190.
Woyke, T., Sczyrba, A., Lee, J., Rinke, C., Tighe, D., Clingenpeel, S. et al. (2011) Decontamination of MDA reagents for single cell whole genome amplification. PLoS ONE 6: e26161.