Supporting Information

Genome analysis of ‘Candidatus Ancillula trichonymphae’, first representative of a deep-branching clade of Bifidobacteriales, strengthens evidence for convergent evolution in flagellate endosymbionts

Jürgen F. H. Strassert1†, Aram Mikaelyan1, Tanja Woyke2, and Andreas Brune1*

Table of contents
Detailed experimental procedures
Supplementary Tables
Supplementary Figures
References

Detailed experimental procedures

Termites and sample preparation
Incisitermes marginipennis was obtained from the Federal Institute for Materials Research and Testing (BAM) in Berlin. The hindgut of a false worker (pseudergate) was removed and suspended in solution U (Trager, 1934). A single cell of Trichonympha paraspiralis (Fig. 1A) was isolated and washed in the same buffer using a micromanipulator (MMO-202ND; Narishige) equipped with a microinjector (CellTram Oil; Eppendorf). The flagellate cell was physically fixed with a holding capillary tube (inner diameter: 20 µm; Fig. 2A) and perforated with a confocal laser beam (XYClone; Hamilton Thorne Biosciences) near the anterior cell pole (Fig. 2B), which contains the majority of the ‘Candidatus Ancillula trichonymphae’ endosymbionts (Strassert et al., 2012; Fig. 1B and C). Cytoplasm with bacterial cells leaking from the flagellate was collected with a glass capillary tube (inner diameter: 20 µm) connected to a second, identical micromanipulator (Fig. 2B). After sample collection, the flagellate was disrupted to locate the nucleus and ensure that it had not been unintentionally aspirated (Fig. 2C). The sample was mixed with Triton X-100 (0.1% final concentration), heated to 95 °C for 10 min to release bacterial DNA, cooled on ice for 5 min, and centrifuged at 20,000 × g (4 °C) for 10 min to remove cell debris.

Whole genome amplification and purity check
Aliquots of each preparation were used to amplify genomic DNA by multiple-displacement amplification (MDA) with the REPLI-g UltraFast Mini Kit (Qiagen) following the manufacturer’s instructions, except that the incubation time was extended to 4 h. To ensure the successful amplification of ‘Ca. A. trichonymphae’ and the absence of potential contaminants, the MDA products were subjected to terminal restriction fragment length polymorphism (T-RFLP) analysis of the bacterial SSU rRNA genes using the FAM-labeled forward primer U341F (Baker et al., 2003) and the reverse primer 1390R (Thongaram et al., 2005). The PCR started with a denaturing step at 95 °C for 3 min, followed by 32 cycles at 95 °C for 30 s, 56 °C for 45 s, and 72 °C for 45 s, and a final extension step at 72 °C for 5 min. Aliquots of the PCR product were separately digested with the restriction enzymes MspI and TaqI and analyzed as described by Egert et al. (2003). Lengths of terminal restriction fragments (T-RFs) were determined on an automatic sequence analyzer (ABI 3130; Applied Biosystems, Carlsbad, Calif., USA). For each preparation, the products of four replicate amplifications that originated from the same flagellate cell and yielded exclusively the predicted T-RFs of ‘Ca. A. trichonymphae’ were pooled for sequencing.

Sequencing
DNA was sheared into smaller fragments via sonication (Covaris) and ligated to sequencing adapters. Samples ImTpAt0 and ImTpAt1 were sequenced at GATC Biotech (Konstanz, Germany) and at the Joint Genome Institute (Walnut Creek, CA, USA), respectively.

Report by GATC Biotech: The DNA was run on a 2% agarose gel with TAE buffer, and the band of approximately 700 bp (approximate size after Covaris fragmentation) was excised and column purified.
Size selection was followed by 12 cycles of amplification and a final column purification. After concentration measurement, the resulting library was immobilized onto DNA capture beads, and the library beads obtained were amplified through emPCR according to the manufacturer’s recommendations. Following amplification, the emulsion was chemically broken, and the beads carrying the amplified DNA library were recovered and washed by filtration. The sample was sequenced on a half Genome Sequencer FLX Pico-Titer plate device with a GS FLX Titanium XLR70 sequencing kit in a 200-cycle run on a GS FLX+ Instrument (single reads, 420 bp). The GS FLX produced the sequence data as a Standard Flowgram Format (SFF) file containing flowgrams for each read with basecalls and per-base quality scores. The data were analyzed with the GS FLX System Software GS De Novo Assembler (Newbler) Version 2.6, taking the “read flowgrams” (SFF file) as input and using default parameters for genomic libraries for the assembly. The assembly contained 5,824 contigs.

Report by the Joint Genome Institute (JGI): The draft genome was generated using Illumina technology. An Illumina std shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform (paired end reads, 150 bp), which generated 24,764,830 reads totaling 3,714.7 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts. Artifact-filtered sequence data were then screened and trimmed according to the k-mers present in the dataset. High-depth k-mers, presumably derived from MDA amplification bias, cause problems in the assembly, especially if the k-mer depth varies by orders of magnitude for different regions of the genome.
Reads with high k-mer coverage (>30X average k-mer depth) were normalized to an average depth of 30X. Reads with an average k-mer depth of less than 2X were removed. The following steps were then performed for assembly: (1) normalized Illumina reads were assembled using IDBA-UD version 1.0.9 (Peng et al., 2012), (2) 1–3 kb simulated paired end reads were created from IDBA-UD contigs using wgsim (https://github.com/lh3/wgsim), (3) normalized Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r42328) (Gnerre et al., 2011), (4) parameters for assembly steps were: a) IDBA-UD (--no_local), b) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0), and c) Allpaths-LG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=125 JUMP_COVERAGE=25 LONG_JUMP_COV=50, RunAllpathsLG: THREADS=8 RUN=std_shredpairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True MIN_CONTIG=2000). The final draft assembly contained 437 contigs in 436 scaffolds. The total size of the genome is 5.3 Mb and the final assembly is based on 182.6 Mb of Illumina data. Based on a presumed genome size of 5 Mb, the average coverage of the genome was 743X.

The contigs of both assemblies (sample ImTpAt0 and sample ImTpAt1) were combined with CAP3 (Huang and Madan, 1999) using a sequence overlap of 100 bases and a sequence similarity of 99.0%.

Annotation
Coding DNA sequences of the combined assemblies (draft genome ImTpAt; 784 scaffolds) were identified with the Prokaryotic Dynamic Programming Gene-finding Algorithm (Prodigal; Hyatt et al., 2010) and manually curated using the Gene Prediction Improvement Pipeline (GenePRIMP) developed by the JGI (Pati et al., 2010). tRNA genes were predicted with the tRNAscan-SE tool (Lowe and Eddy, 1997). Ribosomal RNA genes were found by searches against the SILVA database (Pruesse et al., 2007).
Non-coding RNAs were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Annotation was further refined and metabolic pathways were reconstructed using the Integrated Microbial Genomes Expert Review software (IMG ER; Markowitz et al., 2009). All scaffolds that contained genes with a high sequence similarity (≥95%) to previously identified contaminants of the REPLI-g UltraFast Mini Kit (Woyke et al., 2011) were removed from the draft genomes. Scaffolds with suspicious G+C content and k-mer patterns (analyses implemented in IMG ER) were also scrutinized by BLASTp analysis of several randomly selected genes and removed if they were suspected contaminants.

Supplementary Tables
(see file Supplementary_Tables.xlsx)

Table S1. Presence of 182 single-copy genes generally conserved in most bacterial genomes (Martin et al., 2006) in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt and its closest relative with a sequenced genome, Bifidobacterium asteroides strain PRL2011.

Table S2. Phylogenetic context of the 2,131 protein-coding genes in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt with best BLASTx scores (>30% amino acid sequence identity) against homologs in other Actinobacteria in the IMG reference database (Integrated Microbial Genomes, https://img.jgi.doe.gov/). Top hits are shown for cut-off values of 30%, 60%, and 90% amino acid sequence similarity.

Table S3. Gene annotations in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt. The annotations are based on the Integrated Microbial Genomes Expert Review platform (IMG/ER; see Supporting Information). Unless otherwise noted, the genes were grouped according to KEGG pathways. Top hits of BLAST searches against NCBI’s protein database are shown right of the vertical lines.
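
As an illustration of how top hits at the three cut-offs named for Table S2 could be tabulated, the following minimal Python sketch selects, for each gene, the highest-scoring hit whose amino acid identity meets a given cut-off. It is not the procedure used to build the table: the record layout, the field names, and the example values are hypothetical, and treating the cut-offs as inclusive (≥) is an assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str        # query gene identifier (hypothetical)
    taxon: str       # taxon of the subject sequence (hypothetical)
    identity: float  # percent amino acid identity of the alignment
    bitscore: float  # alignment bit score


def top_hits(hits, cutoff):
    """Return {gene: best Hit} among hits with identity >= cutoff."""
    best = {}
    for hit in hits:
        if hit.identity < cutoff:
            continue  # hit does not reach the identity cut-off
        current = best.get(hit.gene)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return best


def summarize(hits, cutoffs=(30.0, 60.0, 90.0)):
    """Map each cut-off to {gene: taxon of the top hit at that cut-off}."""
    return {
        cutoff: {gene: hit.taxon for gene, hit in top_hits(hits, cutoff).items()}
        for cutoff in cutoffs
    }


# Made-up records for demonstration only; not data from the study.
EXAMPLE_HITS = [
    Hit("gene_0001", "Bifidobacterium", 92.0, 410.0),
    Hit("gene_0001", "Scardovia", 65.0, 450.0),
    Hit("gene_0002", "Gardnerella", 45.0, 120.0),
    Hit("gene_0003", "Bifidobacterium", 28.0, 60.0),
]

if __name__ == "__main__":
    for cutoff, genes in summarize(EXAMPLE_HITS).items():
        print(cutoff, genes)
```

In this sketch a gene can change its top hit as the cut-off rises (gene_0001 above), and genes without any hit at a cut-off simply drop out of that column.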

Supplementary Figures
(see file Supplementary_Figures.pdf)

Fig. S1. Phylogenetic tree based on maximum-likelihood (ML) depicting the relationship between the 16S rRNA sequences affiliated with ‘Candidatus Ancillula trichonymphae’ and other major actinobacterial groups. Nodes marked with circles indicate monophyletic clades in the ML tree that were well supported (○, ≥70%; •, ≥90%) by the parametric aBAYES test.

Fig. S2. Metabolic pathways of ‘Candidatus Ancillula trichonymphae’ involved in sugar metabolism, based on the gene annotations in the draft genome. (A) Glycolysis, gluconeogenesis, non-oxidative pentose-phosphate pathway, phosphoketolase pathway, and the pentose and glucuronate interconversions. (B) The non-oxidative branch of the citrate cycle. If a gene was not found in the draft genome, the corresponding reaction is indicated by a gray arrow.

Fig. S3. Detailed schemes showing the phosphotransferase system (A), imports of phosphate and sugar-phosphate (B and C, respectively), and the creation of a transmembrane proton gradient via the F1FO-ATPase (D). Gray arrows indicate reactions catalyzed by enzymes that are encoded by genes not detected in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S4. Metabolic pathways for the synthesis of amino acids. Gray arrows indicate reactions for which the corresponding genes were not found in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S5. Biosynthesis of cofactors and vitamins. Genes missing in the draft genome are indicated by gray arrows.

Fig. S6. Phylogenetic tree inferred from the maximum-likelihood analysis of bacterial pyruvate flavodoxin/ferredoxin oxidoreductase amino acid sequences (PF01855).
The sequences were aligned and trimmed with MAFFT (‘auto’ flag activated; Katoh and Standley, 2013) and trimAl (‘automated1’ mode; Capella-Gutierrez et al., 2009), respectively. The tree topology was estimated using FastTree.

Fig. S7. Maximum-likelihood tree based on the analysis of bacterial [FeFe] hydrogenase amino acid sequences (PF02906). The tree topology was estimated as described for Fig. S6.

References
Baker, G.C., Smith, J.J., and Cowan, D.A. (2003) Review and re-analysis of domain-specific 16S primers. J Microbiol Methods 55: 541–555.
Capella-Gutierrez, S., Silla-Martinez, J.M., and Gabaldon, T. (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.
Egert, M., Wagner, B., Lemke, T., Brune, A., and Friedrich, M.W. (2003) Microbial community structure in midgut and hindgut of the humus-feeding larva of Pachnoda ephippiata (Coleoptera: Scarabaeidae). Appl Environ Microbiol 69: 6659–6668.
Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J. et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 108: 1513–1518.
Huang, X., and Madan, A. (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868–877.
Hyatt, D., Chen, G-L., LoCascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
Katoh, K., and Standley, D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772–780.
Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.
Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I-M.A., Chu, K., and Kyrpides, N.C. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271–2278.
Martin, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C. et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263–1269.
Pati, A., Ivanova, N.N., Mikhailova, N., Ovchinnikova, G., Hooper, S.D., Lykidis, A., and Kyrpides, N.C. (2010) GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7: 455–457.
Peng, Y., Leung, H.C., Yiu, S.M., and Chin, F.Y. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420–1428.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W., Peplies, J., and Glöckner, F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188–7196.
Strassert, J.F.H., Köhler, T., Wienemann, T.H.G., Ikeda-Ohtsubo, W., Faivre, N., Franckenberg, S. et al. (2012) ‘Candidatus Ancillula trichonymphae’, a novel lineage of endosymbiotic Actinobacteria in termite gut flagellates of the genus Trichonympha. Environ Microbiol 14: 3259–3270.
Thongaram, T., Hongo, Y., Kosono, S., Ohkuma, M., Trakulnaleamsai, S., Noparatnaraporn, N., and Kudo, T. (2005) Comparison of bacterial communities in the alkaline gut segment among various species of higher termites. Extremophiles 9: 229–238.
Trager, W. (1934) The cultivation of a cellulose-digesting flagellate, Trichomonas termopsidis, and of certain other termite protozoa. Biol Bull 66: 182–190.
Woyke, T., Sczyrba, A., Lee, J., Rinke, C., Tighe, D., Clingenpeel, S. et al. (2011) Decontamination of MDA reagents for single cell whole genome amplification. PLoS ONE 6: e26161.