Supplementary Information

Methods

Bacterial isolation. Thermotoga bacteria were isolated from oil production fluids of the Troll B and Troll C oil platforms (60° 46′ 27.80″ N, 03° 30′ 11.50″ E) as described in (Dipippo et al., 2009). The reservoir is located 1560 m below the sea floor, with an in situ temperature of 68 °C and a predicted in situ pH of 6. Several enrichment cultures were established. One ml of production fluid was injected into either 20 ml of minimal medium (MM1) or 20 ml of Thermotoga petrophila medium under a 100% N2 atmosphere (Dipippo et al., 2009; Takahata et al., 2008). The composition of the MM1 medium per liter of distilled water was as follows: 20 g NaCl, 0.9 g MgCl2·6H2O, 1.4 g MgSO4·7H2O, 0.33 g KCl, 0.25 g NH4Cl, 0.14 g CaCl2·2H2O, 0.45 g KH2PO4, 1.0 ml trace minerals SL-10 (Widdel et al., 1983), 0.1 mg resazurin and 1 g yeast extract. After autoclaving, 4 ml of 0.5 M Na2S·9H2O and 10 ml of trace vitamin solution (Balch et al., 1979) were added per liter. For both media the pH was adjusted to 6.8 with 1 M NaOH. Each enrichment culture was then supplied with a different growth substrate (i.e., fructose, glucose, xylose, galactose or cellulose) to a final concentration of 0.5% (w/v), and the culture bottle was incubated at 70 °C for 1–7 days. Dilution series were made by transferring 1 ml of the enrichment culture into a Bellco tube containing 9 ml of medium, then transferring 1 ml from that tube into the next, and so on, to a 10⁻⁸ dilution (Hungate, 1962). One ml of 3% Gelrite was added to each tube to make shake tubes. After incubation at 70 °C for 3 days, white, round colonies were picked from the tubes in each dilution series that contained the fewest colonies, and their 16S rRNA gene sequences were determined after PCR amplification using primers 16S.27F (5'-AGAGTTTGATCCTGGCTCAG-3') and 16S.1406R (5'-ACGGGCGGTGTGTRC-3'). In all but one dilution series, the recovered bacteria from the order Thermotogales belonged to the genus Thermotoga.
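The tenfold dilution series described above (1 ml carried into 9 ml of medium at each step) can be checked with a short calculation. This is a minimal sketch only; the function name `serial_dilution` and its parameters are illustrative and are not part of the original protocol. Exact fractions are used so that the cumulative dilution is not blurred by floating-point rounding.

```python
from fractions import Fraction


def serial_dilution(transfer_ml, diluent_ml, steps):
    """Return the cumulative dilution reached after each transfer.

    transfer_ml: volume carried over at each step (1 ml in the text)
    diluent_ml:  volume of fresh medium in each tube (9 ml in the text)
    steps:       number of consecutive transfers
    """
    # Fraction of the previous tube's concentration left after one transfer.
    per_step = Fraction(transfer_ml) / Fraction(transfer_ml + diluent_ml)
    return [per_step ** n for n in range(1, steps + 1)]


# Eight consecutive 1 ml -> 9 ml transfers.
series = serial_dilution(1, 9, 8)
for tube, dilution in enumerate(series, start=1):
    print(f"tube {tube}: {dilution}")
```

Under these assumptions, eight transfers are needed to reach the final dilution quoted in the text.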
These Thermotoga isolates had almost identical 16S rRNA gene sequences (>99.8% identity) and clustered with Thermotoga sp. RQ2 in phylogenetic trees. In one series grown on fructose, the detected bacterium was Kosmotoga olearia TBF 19.5.1, described in (Dipippo et al., 2009). The Kuril Islands strains were isolated using similar methods from shallow marine hydrothermal vents in the Alechino area (43° 54′ 31″ N, 145° 29′ 40″ E) of Kunashir Island, Russia. Enrichment and pure cultures were obtained and cultivated on a medium similar to MM1, but with 25 g l⁻¹ of NaCl as described in (Svetlichny et al., 1991), 100% N2 in the gas phase and 2 g l⁻¹ of microcrystalline cellulose (MCC, Chemapol, Czech Republic) as substrate. Dilution series on Gelrite with 2 g l⁻¹ of sucrose were made as described above, and the resulting colonies were picked and inoculated into MCC medium.

DNA isolation, genome and fosmid sequencing and annotation. DNA was isolated following the protocol from (Charbonnier & Forterre, 1995). The genomic DNA and fosmid clones carrying rRNA operons were sequenced using Roche 454 technology with 8 kb paired-end libraries and ¼ run per genome, to 26–63x coverage, and assembled into contigs and scaffolds using NEWBLER version 2.0 at the Massively Parallel Sequencing (MPS) Unit at Genome Quebec, Montreal, Canada. All of the genomes assembled into one scaffold with 5–22 ordered contigs. In two of the genomes, Thermotoga sp. CELL2 and Thermotoga sp. 2812B, all gaps were closed by PCR, using primers designed to the ends of neighboring contigs in the scaffolds. In the Thermotoga sp. TBGT17.6.5 genome, two gaps remained after PCR closing. Fosmid libraries were constructed and screened as described in (Nesbø et al., 2006). Clones carrying rRNA operons were sequenced using 454 technology and assembled into contigs using NEWBLER version 2.0 at the Massively Parallel Sequencing (MPS) Unit at Genome Quebec, Montreal, Canada. The genomes and fosmid clones were submitted to GenBank (Acc.
Numbers XXX) and annotated by the Prokaryotic Automatic Annotation Pipeline. CRISPRs were identified in all genomes using the CRISPR Recognition Tool (CRT) v. 1.1 with default settings (Bland et al., 2007) as described in (Zhaxybayeva, Swithers, et al., 2009). Spacers identified in the CRT analysis were compared pairwise using the bl2seq program of BLAST v. 2.2.29+ with default settings (Altschul, 1997).

Assembly of a Thermotoga genome from a metagenome. Three metagenomes from three different sampling sites at Great Boiling Spring (GBS) in Nevada (Markowitz et al., 2014), each containing a large number of sequences with ≥ 90% identity to TM-group genomes (6.7 Mb in total, Table S7), were assembled de novo using Geneious 6 (www.geneious.com). Since very low within-site diversity was observed in the assembled scaffolds, we selected the sample with the longest assembled scaffolds (sample 85cSC) for a genome assembly. After removing the smallest of the redundant contigs (which could be a result of population diversity, paralogy or mis-assembly), we assembled a 2.1 Mb draft genome that we denote throughout the manuscript as Thermotoga sp. GBS. The Thermotoga sp. GBS and the Thermotoga sp. A7A (Sutcliffe et al., 2013) genomes contain 92% and 88% of the protein-coding genes in the T. maritima MSB8 genome, respectively, indicating that both draft genomes are nearly complete.

Analysis of quartets from the Quartet Decomposition analysis. Quartet topologies with substantial bootstrap support (> 80%) were summarized into a spectrogram. Quartet topologies supported with > 80% bootstrap values by at least 30% of gene families were extracted and coded into a weighted data matrix. Plurality networks were calculated from the matrix using SplitsTree 4 (Huson & Bryant, 2006). Genomes were further grouped into categories by their origin. In an “ecological niche” division, the taxa were divided into two groups: those originating from oil reservoirs (T.
petrophila RKU1, T. naphthophila RKU10 and Thermotoga sp. CELL2) and the “marine” isolates (T. maritima, Thermotoga sp. RQ2, Thermotoga sp. 2812B and Thermotoga sp. Mc24). In the “geographic proximity” division, the isolates were designated as originating from “Europe/Atlantic Ocean” (T. maritima, Thermotoga sp. CELL2 and Thermotoga sp. RQ2) or “Japan and Kuril Islands/Pacific Ocean” (T. petrophila RKU1, T. naphthophila RKU10, Thermotoga sp. 2812B and Thermotoga sp. Mc24). Support for these partitions by individual gene families was evaluated using agreement scoring and scatter plot analysis as described in (Zhaxybayeva, Doolittle, et al., 2009). Gene families with an agreement score above 0 and larger than the disagreement score were designated as “preferentially” supporting the partition in question. Furthermore, gene families that had an agreement score > 0.6 and a disagreement score of 0 were designated as “strongly supporting” the partition in question. We also repeated the QD analyses with a larger genome set that included the Thermotoga sp. GBS and Thermotoga sp. A7A draft genomes. Genes in the genomes assembled from metagenomic data with ≥ 90% identity to the corresponding genes in the T. maritima genome were extracted with the TimeZone package (Chattopadhyay et al., 2013) and aligned using ClustalW (Larkin et al., 2007). Phylogenetic tree reconstruction and QD analysis were conducted as described above.

Recombination analysis. The relative rate of recombination to mutation, as well as the average recombination tract length, were assessed using the pairwise program and a likelihood look-up table generated by the complete program in the LDhat package (McVean et al., 2002; Jolley, 2004).
For each LCB in the alignment of 7 TM-group bacteria, we calculated the population mutation rate (θ = 2Neμ) and the gene conversion parameter (2Nect), where Ne is the effective population size, μ is the mutation rate, c is the rate of initiation of gene conversion per base and t is the average gene conversion tract length. Since the parameter estimates varied depending on the look-up table used by the pairwise program (by the same order of magnitude as between LCBs; see Supplementary Table S5), for each LCB we performed three analyses using different look-up tables generated by the complete program in the LDhat package. Detection of recombinant fragments was carried out in RDP version 4.33 (Martin et al., 2010) and LikeWind (Archibald & Roger, 2002). In the RDP package we used the RDP, Genconv, Maxchi and Chimera algorithms, and counted only the events detected by at least three of the four methods. The neighbor-joining tree calculated from the aligned LCBs within the RDP package was used as the reference tree. On this tree, the sister taxa pairs are Thermotoga sp. RQ2 and Thermotoga sp. CELL2, T. petrophila and T. naphthophila, and T. maritima and Thermotoga sp. 2812B. The detected recombination events were manually inspected and rejected if necessary. Some rejected events consisted of large segments with many inferred recombination breakpoints, which made it difficult to disentangle their history (Fig. S7). We also manually corrected some recombination events where the donor and recipient genomes had been incorrectly inferred, and adjusted the recombination breakpoints for some of the larger recombination fragments. Events with predicted endpoints were used to estimate the average recombination tract length. In the LikeWind analysis, we used the maximum likelihood tree calculated in PAUP* version 4.0b10 (Swofford) under a GTR++ model as the reference tree.

Estimation of the expected divergence of two genomes in isolated oil reservoirs.
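The arithmetic behind the expected-divergence estimate in this subsection (2μt, with t converted from years to generations) can be sketched in a few lines. This is an illustration only; the function name `expected_snps` and its parameter names are hypothetical, and the inputs are the mutation rates, generation time and isolation time quoted in the text.

```python
def expected_snps(mu, years_isolated, years_per_generation):
    """Expected divergence between two genomes, 2 * mu * t.

    mu:                   mutations per genome per replication
    years_isolated:       time since the two lineages were separated
    years_per_generation: assumed generation time
    """
    generations = years_isolated / years_per_generation
    # Factor of 2 because both lineages accumulate mutations independently.
    return 2 * mu * generations


# Rates taken from the text: 0.003 mutations per genome per replication and
# 1,000 years per generation.
per_million_years = expected_snps(0.003, 1_000_000, 1_000)

# Presumed isolation of the Troll reservoir for 145 million years.
troll_estimate = expected_snps(0.003, 145_000_000, 1_000)

# More conservative thermophile mutation rate of 0.00033.
conservative_estimate = expected_snps(0.00033, 145_000_000, 1_000)

print(round(per_million_years, 1), round(troll_estimate, 1),
      round(conservative_estimate, 1))
```

The sketch applies no correction for multiple or back mutations, in line with the low-rate argument made in the text for the 1,000-year generation time.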
In the absence of recombination, divergence results from mutation alone and can be calculated as 2μt, where μ is the mutation rate and t is the time since the reservoir's isolation. Mutation rates have repeatedly been shown to average 0.003–0.004 mutations per genome per replication across all three domains of life. Generation times for subsurface bacteria are estimated to be as long as 1,000 years per generation (Morono et al., 2011). Hence, we expect the two genomes to accumulate K = 2 x 0.003 x 1,000,000 / 1,000 = 6 mutations per genome per million years of isolation. Because the mutation rate is so low, we do not correct for back mutations or for multiple mutations at the same nucleotide position in a genome. Using the above rates, indigenous bacteria in the Troll oil reservoir are expected to accumulate at least 145 x 6 = 870 SNPs per genome, given their presumed isolation for 145 MY. A more conservative thermophile mutation rate of 0.00033 (Drake, 2009) would yield 95.7 SNPs per genome. Since generation times are likely to be on the order of 10 rather than 1,000 years per generation, the number of mutations per genome may be as high as 92,800. Using the Jukes-Cantor correction for multiple substitutions and back mutations, we would expect to observe ~90,000 SNPs.

Table S1. Thermotoga maritima-like genomes analyzed in this study.

Name                          Sample site                                                             Size (bp)          GenBank accession number
Italy
T. maritima                   Geothermally heated seafloor, Vulcano island, Italy                     1,860,725          NC_000853
Japan
T. petrophila RKU1            Deep subterranean oil reservoir in Niigata, Japan                       1,823,511          NC_009486
T. naphthophila RKU10         Deep subterranean oil reservoir in Niigata, Japan                       1,809,823          NC_013642
Azores
Thermotoga sp. RQ2            Geothermally heated seafloor, Ribeira Quente, the Azores                1,877,693          NC_010483
North Sea
Thermotoga sp. CELL2 (a)      Troll oil reservoir (platform C)                                        1,749,971          XXXXXX
Thermotoga sp. XYL54 (a)      Troll oil reservoir (platform C)                                        > 1,737,772 (b)    JSFJ01000000
Thermotoga sp. TBGT17.6.5 (a) Troll oil reservoir (platform B)                                        > 1,747,913 (b)    JSFG01000000
Thermotoga sp. TBGT17.6.6 (a) Troll oil reservoir (platform B)                                        > 1,736,569 (b)    JSFI01000000
Kuril Islands
Thermotoga sp. 2812B (a)      Geothermally heated seafloor, Kunashir Island, Kuril islands, Russia    1,843,731          XXXXXX
Thermotoga sp. EMP (a)        Geothermally heated seafloor, Kunashir Island, Kuril islands, Russia    > 1,835,066 (b)    AJII01000000
Thermotoga sp. Mc24 (a)       Geothermally heated seafloor, Kunashir Island, Kuril islands, Russia    > 1,823,483 (b)    JSFH01000000

a) Sequenced as part of this study.
b) These genomes have not been closed.

Table S2. Number of SNPs observed between the TM-group genomes. The first, 7-genome comparison corresponds to the 7 representative TM-group genomes and is based on a 1,543,882 nt alignment. SNPs are shown below the diagonal, while the uncorrected distances calculated from the number of shared SNPs are shown above the diagonal. The last 4 genomes were only analyzed within the sample site: either the Kuril islands (K) or the Troll oil field (T). The former and the latter comparisons are based on a 1,833,634 nt and a 1,719,687 nt alignment, respectively. Only the number of SNPs is shown for the last 4 genomes.

Seven genomes
Genomes (a)   Tmar     2812B    Mc24     RQ2      CELL2    Tpet     Tnapht
Tmar          -        0.028    0.062    0.039    0.036    0.049    0.047
2812B (K)     43750    -        0.061    0.045    0.044    0.055    0.054
Mc24          96403    94212    -        0.062    0.062    0.057    0.061
RQ2           59920    70116    95733    -        0.024    0.035    0.032
CELL2 (T)     55659    67528    95827    36591    -        0.040    0.039
Tpet          75681    84830    88152    54515    61257    -        0.033
Tnapht        72282    83977    94838    49419    60423    50535    -

Kuril islands
Genomes (a)   2812B
EMP (K)       23

Troll
Genomes (a)   CELL2    XYL54    TBGT5
XYL54 (T)     54
TBGT5 (T)     122      60
TBGT6 (T)     121      97       7

a Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B; Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp. CELL2; Tpet, Thermotoga petrophila RKU1; Tnapht, Thermotoga naphthophila RKU10; EMP, Thermotoga sp. EMP; XYL54, Thermotoga sp. XYL54; TBGT5, Thermotoga sp. TBGT17.6.5; TBGT6, Thermotoga sp. TBGT17.6.6.

Table S3. Genes in TM-group genomes from Troll that vary by > 1 nt indel (a). Insertions are shown in grey and deletions in white. Genes in CRISPR regions, mobile elements (insertion sequences and transposases) and other repeats are not listed. Empty cells refer to regions where the gene was not predicted because the indel disrupts the open reading frame.

Functional annotation                                                  Length of indel (nt)   Locus tags (b) (CELL2, XYL54, TBGT1765, TBGT1766)
Alpha-amylase                                                          27                     04815, 01015, 00320, 00010
Pullulanase, type I                                                    21                     04790, 01040, 00295, 08996
Uncharacterized conserved protein                                      29                     03730, 02105, 08230, 07900
ABC-type antimicrobial peptide transport system, permease component    3                      02895, 02935, 07390, 07065
ABC-type Na+ efflux pump, permease component                           6                      01965, 03886, 06438, 06118
Phosphotransferase domain-containing protein (c)                       17                     04176, 06141, 05828
PAS/PAC sensor-containing diguanylate cyclase (c)                      14                     07895, 06916

a Single nucleotide indels were almost exclusively observed in homopolymer tracts and were excluded from the analyses, since they are likely the result of 454 sequencing errors (Loman et al., 2012).
b Locus tags (which are in the format TAG_XXXXX) are shown in two parts: the header row lists the first part (TAG), while the table cells show the last 5 digits. For example, locus tag CELL2_04790 would be listed in the CELL2 column as 04790. Genome abbreviations: CELL2, Thermotoga sp. CELL2; XYL54, Thermotoga sp. XYL54; TBGT1765, Thermotoga sp. TBGT17.6.5; TBGT1766, Thermotoga sp. TBGT17.6.6.
c The indel appears to disrupt the open reading frame and may result in a non-functional pseudogene.

Table S4. Number of genes that are either unique to a genome (diagonal) or shared only between a pair of genomes (off diagonal). The number of genes and the corresponding number of nucleotides are shown without and with parentheses, respectively. Noncoding regions were not included in nucleotide calculations.
Genomes (a)   Tmar          2812B         Mc24          CELL2         RQ2           Tpet          Tnapht
Tmar          64 (12110)    4             9             1             24            2             11
2812B         (69)          49 (15397)    10            2             7             2             0
Mc24          (3628)        (3027)        38 (8230)     3             0             5             1
CELL2         (34)          (665)         (979)         29 (6712)     3             8             11
RQ2           (8511)        (2061)        (0)           (850)         34 (12471)    11            10
Tpet          (276)         (313)         (1795)        (2722)        (4408)        36 (15312)    10
Tnapht        (4550)        (0)           (70)          (3430)        (3577)        (2356)        42 (11371)

a Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B; Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp. CELL2; Tpet, Thermotoga petrophila RKU1; Tnapht, Thermotoga naphthophila RKU10.

Table S5. Estimates of the population mutation rate (θ) and gene conversion parameter (2Nect). Shown values are an average of three separate analyses using three different complete look-up tables. The estimates were performed in the LDhat program (McVean et al., 2002).

LCB (a)        LCB length    θ                    Recombination tract length (b)
LCB3           129,763       0.05881    3.6       5700                              62
LCB9           17,498        0.02918    2.7       9800                              91
LCB10          12,289        0.03458    0.9       2300                              26
LCB11          17,901        0.01890    2.1       2300                              109
LCB12          180,309       0.05084    2.6       14600                             52
LCB14          48,811        0.04472    1.2       1400                              27
LCB15          264,028       0.03994    3.8       12700                             95
LCB16          41,522        0.04096    2.3       2000                              59
LCB17          47,454        0.39626    2.2       2300                              54
LCB18          26,975        0.04638    1.6       2000                              35
LCB22          100,696       0.03772    2.5       3700                              65
LCB23          89,956        0.04041    3.1       5300                              77
LCB24          309,639       0.05468    3.8       6000                              69
LCB26          1,429         0.04598    0         0                                 0
LCB27          128,460       0.04582    1.1       1500                              24
LCB30          117,152       0.04093    2.1       3000                              51
Average (c)    1,542,882     0.04585    2.9       6800                              63

a Locally collinear block from the genome alignment (see Methods).
b Rounded to the nearest hundred.
c This is a weighted average, where the weight is the LCB length.

Table S6. Summary of the detected recombination events. The analysis was performed in the RDP program (see Supplementary Methods).

i) Recombination events summarized by recipient and donor. Events involving isolates from the same type of environment are shaded in grey.
Recipient (a) (columns): Tmar, 2812B, Mc24, Cell2, RQ2, Tpet, Tnapht, Total donor
Donor (a) (rows): Tmar, 2812B, Mc24 (b), Cell2, RQ2, Tpet, Tnapht, Unknown, Total recipient
NA 3 10 11 7 13 13 2 16 NA 9 2 15 16 9 NA 12 2 1 5 6 23 0 3 1 0 1 3 30 31 0 10 11 18 5 NA 26 27 78 51 54 51 6 5 15 14 12 NA 20 38 0 15 21 17 20 62 64 94 48 49 81 73 52 131 471

ii) Total number of events between pairs of genomes, or between a genome and an unknown source outside of the analyzed genomes. Events involving isolates from the same type of environment are shaded grey.

Isolates (a) (columns): Tmar, 2812B, Mc24, Cell2, RQ2, Tpet, Tnapht, Unknown
Isolates (a) (rows): Tmar, 2812B, Mc24 (b), Cell2, RQ2, Tpet, Tnapht
NA 15 13 14 14 12 20 33 7 5 3 5 38 14 19 45 46 0 NA 26 25 15 27 17 21 NA 17 0 20

iii) Average number of recombination instances grouped either by environment type or geographic proximity.

Groups of genomes compared                        Average number of events
Within ‘marine vent’                              17.2
Within ‘oil reservoir’                            25.5
Between ‘marine vent’ and ‘oil reservoir’         18.5
Within Atlantic                                   9
Within Pacific                                    22
Between Pacific and Atlantic                      15

a Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B; Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp. CELL2; Tpet, Thermotoga petrophila RKU1; Tnapht, Thermotoga naphthophila RKU10.
b The higher number of recombination events in Thermotoga sp. Mc24 is probably a result of the easier detection of recombination in a more divergent genome.

Table S7. List of metagenomes containing sequences with > 90% similarity to Thermotoga sp.
Metagenome                                  Abbreviation   Location                                  IMG ID   Number of Thermotoga genes (a)   Number of scaffolds (size range) (b)   Total bp (c)
San Juan basin coal bed production water    CG7                                                      11650    16 (25)                          NA                                     6,266
Cellulolytic enrichment                     CS 85C         Sediment, Great Boiling Spring, Nevada    7164     2,123 (2,410)                    274 (200 – 333,637)                    2,230,489
Cellulolytic enrichment                     CS 77C         Sediment, Great Boiling Spring, Nevada    7783     2,009 (2,323)                    462 (202 – 49,510)                     2,206,530
Cellulolytic enrichment                     S 77C          Sediment, Great Boiling Spring, Nevada    7780     2,152 (2,397)                    517 (201 – 118,805)                    2,300,260

a Calculated as having > 90% identity to the Thermotoga maritima MSB8 genome. The number of genes pre-classified as belonging to Thermotogae by the phylogenetic distribution tool in IMG is shown in parentheses.
b IMG-classified Thermotogae scaffolds with significant similarity (BLASTN E-value < 10⁻²⁰) to any of the Thermotoga genomes listed in Table 1.
c The calculation was based on the lengths of complete contigs with similarity to Thermotoga genomes.

Supplementary figure legends

Figure S1. Shared CRISPR repeats across Thermotoga genomes. Pairwise identities were calculated for every CRISPR spacer sequence in all genomes using Blast2seq (Altschul, 1997). The heat map depicts the percentage of spacers shared between two genomes, represented by different colors. CRISPR spacers were defined as shared if their nucleotide identity was larger than 95%. The heat map shows that the genomes from the Troll population and two of the genomes from the Kuril island population share more CRISPR repeats with genomes from the same population than they do with any other genomes. It also reveals that Thermotoga maritima MSB8 shares spacers with both Thermotoga sp. RQ2 and Thermotoga petrophila RKU1, while Thermotoga naphthophila RKU10 and Thermotoga sp. Mc24 share few spacers with any other genome. CRISPR sequences were identified for each genome using the CRISPR recognition tool v1.2 (Bland et al., 2007).

Figure S2.
Maximum likelihood trees of commonly used phylogenetic marker genes. Trees were reconstructed in PhyML (Guindon & Gascuel, 2003) as implemented in Geneious 6 (www.geneious.com) under a GTR+Γ substitution model. The isolates are classified by geographic origin and environment type (colored circles). Note that although Thermotoga maritima MSB8 was isolated from the Mediterranean Sea, in our analyses it is classified as originating from the Atlantic Ocean.

Figure S3. Quartet Decomposition (QD) analysis of 7 TM-group genomes. Panel A. The histogram summarizes phylogenetic relationships supported (positive y-value) and conflicted (negative y-value) by 1728 gene families present in at least 4 analyzed genomes (quartets on the x-axis, sorted by the number of supporting gene families). The bars are color-coded according to the bootstrap support value on the internal branch of a quartet. Panel B. Gene families that support grouping of strains by ecological niche or by geographic location. Scatter plots show the agreement of individual gene families with data partitions by geographical proximity or environment type. Each gene family is represented by a dot. The position of the dot within an XY coordinate system depends on how many embedded quartets within a gene family agree with the data partition (x-value) and how many disagree (y-value). Gene families with poor phylogenetic signal are located near (0,0). From the plots we can infer that only 69 and 25 gene families strongly support the division by environment type and geographical proximity, respectively.

Figure S4. Examples of gene families whose phylogenetic histories do not support the grouping of Thermotoga sp. CELL2 and Thermotoga sp. GBS. The gene families were identified in the QD analysis. The maximum likelihood trees were reconstructed in RAxML version 7.3.6 (Stamatakis, 2006) under a GTR+Γ model with 100 bootstrap samples.

Figure S5. Illustration of possible routes for gene flow among Thermotoga populations.
The global Thermotoga collective (depicted as red ovals) is present in both subsurface and marine environments, including oil reservoirs and continental hot springs. Genetic exchange between an oil reservoir and a hot spring may occur either via the surface, mediated by marine and air dispersal, or directly within the subsurface (black arrows). The latter implies a substantial presence of microbial populations within favorable pockets of the subsurface, to allow for efficient dispersal. The diagram is not drawn to scale.

References

Altschul S. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
Archibald JM, Roger AJ. (2002). Gene Conversion and the Evolution of Euryarchaeal Chaperonins: A Maximum Likelihood-Based Method for Detecting Conflicting Phylogenetic Signals. J Mol Evol 55:232–245.
Balch WE, Fox GE, Magrum LJ, Woese CR, Wolfe RS. (1979). Methanogens: reevaluation of a unique biological group. Microbiol Rev 43:260–296.
Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, et al. (2007). CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8:209.
Charbonnier F, Forterre P. (1995). Protocol 12: Purification of plasmids from thermophilic and hyperthermophilic archaea. In: Archaea: a laboratory manual - thermophiles, Robb, FT & Place, AR (eds), Cold Spring Harbor Laboratory Press: Cold Spring Harbor, N.Y., pp. 87–90.
Chattopadhyay S, Paul S, Dykhuizen DE, Sokurenko EV. (2013). Tracking recent adaptive evolution in microbial species using TimeZone. Nature Protocols 8:652–665.
Dipippo JL, Nesbø CL, Dahle H, Doolittle WF, Birkeland N-K, Noll KM. (2009). Kosmotoga olearia gen. nov., sp. nov., a thermophilic, anaerobic heterotroph isolated from an oil production fluid. Int J Syst Evol Micr 59:2991–3000.
Drake JW. (2009). Avoiding Dangerous Missense: Thermophiles Display Especially Low Mutation Rates. PLoS Genet 1–6.
Guindon S, Gascuel O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52:696–704.
Hungate RE. (1962). A roll tube method for cultivation of strict anaerobes. In: Methods in microbiology, Norris, JR & Ribbons, DW (eds), Vol. 3B, Academic Press: London, pp. 117–132.
Huson DH, Bryant D. (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23:254–267.
Jolley KA. (2004). The Influence of Mutation, Recombination, Population History, and Selection on Patterns of Genetic Diversity in Neisseria meningitidis. Mol Biol Evol 22:562–569.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948.
Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology 30:434–439.
Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Pillay M, et al. (2014). IMG 4 version of the integrated microbial genomes comparative analysis system. Nucleic Acids Res 42:D560–7.
Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P. (2010). RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26:2462–2463.
McVean G, Awadalla P, Fearnhead P. (2002). A Coalescent-Based Method for Detecting and Estimating Recombination From Gene Sequences. Genetics 160:1231–1241.
Morono Y, Terada T, Nishizawa M, Ito M, Hillion F, Takahata N, et al. (2011). Carbon and nitrogen assimilation in deep subseafloor microbial cells. Proc Natl Acad Sci 108:18295–18300.
Nesbø CL, Dlutek M, Doolittle WF. (2006). Recombination in Thermotoga: implications for species concepts and biogeography. Genetics 172:759–769.
Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Sutcliffe B, Midgley DJ, Rosewarne CP, Greenfield P, Li D. (2013). Draft Genome Sequence of Thermotoga maritima A7A Reconstructed from Metagenomic Sequencing Analysis of a Hydrocarbon Reservoir in the Bass Strait, Australia. Genome Announc 1:e00688-13.
Svetlichny VA, Sokolova TG, Gerhardt M, Kostrikina NA, Zavarzin GA. (1991). Anaerobic extremely thermophilic carboxydotrophic bacteria in hydrotherms of Kuril Islands. Microb Ecol 21:1–10.
Swofford DL. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods).
Takahata Y, Nishijima M, Hoaki T, Maruyama T. (2008). Thermotoga petrophila sp. nov. and Thermotoga naphthophila sp. nov., two hyperthermophilic bacteria from the Kubiki oil reservoir in Niigata, Japan. Int J Syst Evol Micr 51:1901–1909.
Widdel F, Kohring GW, Mayer F. (1983). Studies on dissimilatory sulfate-reducing bacteria that decompose fatty acids. 3: Characterization of the filamentous gliding Desulfonema limicola gen. nov. sp. nov., and Desulfonema magnum sp. nov. Arch Microbiol 134:286–294.
Zhaxybayeva O, Doolittle WF, Papke RT, Gogarten JP. (2009). Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol Evol 1:325–339.
Zhaxybayeva O, Swithers KS, Lapierre P, Fournier GP, Bickhart DM, DeBoy RT, et al. (2009). On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales. Proc Natl Acad Sci 106:5865–5870.