Supplementary Information
Methods
Bacterial isolation. Thermotoga bacteria were isolated from oil production fluids of
the Troll B and Troll C oil platforms (60° 46′ 27.8″ N, 03° 30′ 11.5″ E) as described
in (Dipippo et al., 2009). The reservoir is located 1560 m below the sea floor, with an
in situ temperature of 68 °C and a predicted in situ pH of 6. Several enrichment
cultures were established. One ml of production fluid was injected into either 20 ml
minimal medium (MM1) or 20 ml Thermotoga petrophila medium under a 100%
N2 atmosphere (Dipippo et al., 2009; Takahata et al., 2008). The composition of the
MM1 medium per liter of distilled water was as follows: 20 g NaCl, 0.9 g MgCl2·6H2O,
1.4 g MgSO4·7H2O, 0.33 g KCl, 0.25 g NH4Cl, 0.14 g CaCl2·2H2O, 0.45 g
KH2PO4, 1.0 ml trace minerals SL-10 (Widdel et al., 1983), 0.1 mg resazurin and 1 g
yeast extract. After autoclaving, 4 ml 0.5 M Na2S·9H2O and 10 ml trace vitamin
solution (Balch et al., 1979) were added per liter. For both media the pH was adjusted
to 6.8 with 1 M NaOH. Each enrichment culture was then supplied with a different
growth substrate (i.e., fructose, glucose, xylose, galactose or cellulose) to a final
concentration of 0.5% (w/v), and the culture bottles were incubated at 70 °C for 1–7 days.
Dilution series were made by transferring 1 ml of the enrichment culture into a Bellco
tube containing 9 ml medium, then transferring 1 ml from that tube into the next tube,
and so on, to a 10⁻⁸ dilution (Hungate, 1962). One ml of 3% Gelrite was added to each
tube to make shake tubes. White, round colonies were picked from the dilution series
with the fewest colonies after incubation at 70 °C for 3 days, and their 16S rRNA
gene sequences were determined after PCR amplification using primers 16S.27F
(5′-AGAGTTTGATCCTGGCTCAG-3′) and 16S.1406R (5′-ACGGGCGGTGTGTRC-3′).
In all but one dilution series, the recovered bacteria from the order Thermotogales
belonged to the genus Thermotoga. These isolates had almost identical 16S rRNA gene
sequences (>99.8% identity) and clustered with Thermotoga sp. RQ2 in phylogenetic
trees. In one series grown on fructose, the detected bacterium was Kosmotoga olearia
TBF 19.5.1, described in (Dipippo et al., 2009).
The Kuril Islands strains were isolated using similar methods from shallow
marine hydrothermal vents in the Alechino area (43° 54′ 31″ N, 145° 29′ 40″ E) of Kunashir
Island, Russia. Enrichment and pure cultures were obtained and cultivated on a
medium similar to MM1, but with 25 g l⁻¹ NaCl as described in (Svetlichny et al.,
1991), 100% N2 in the gas phase and 2 g l⁻¹ microcrystalline cellulose (MCC,
Chemapol, Czech Republic) as substrate. Dilution series on Gelrite with 2 g l⁻¹
sucrose were made as described above, and the resulting colonies were picked and
inoculated into MCC medium.
DNA isolation, genome and fosmid sequencing and annotation. DNA was
isolated following the protocol described in (Charbonnier & Forterre, 1995). The genomic
DNA and fosmid clones carrying rRNA operons were sequenced using Roche 454
technology with 8 kb paired-end libraries and ¼ run per genome, to 26–63× coverage,
and assembled into contigs and scaffolds using NEWBLER version 2.0 at the
Massively Parallel Sequencing (MPS) Unit at Genome Quebec, Montreal, Canada.
Each genome assembled into a single scaffold with 5–22 ordered contigs. In two of
the genomes, Thermotoga sp. CELL2 and Thermotoga sp. 2812B, all gaps were
closed by PCR, using primers designed to the ends of neighboring contigs in the
scaffolds. In the Thermotoga sp. TBGT17.6.5 genome, two gaps remained after PCR
gap closure.
Fosmid libraries were constructed and screened as described in (Nesbø et al.,
2006). Clones carrying rRNA operons were sequenced using 454 technology and
assembled into contigs using NEWBLER version 2.0 at the Massively Parallel
Sequencing (MPS) Unit at Genome Quebec, Montreal, Canada.
The genomes and fosmid clones were submitted to GenBank (accession numbers
XXX) and annotated by the Prokaryotic Automatic Annotation Pipeline. CRISPRs
were identified in all genomes using the CRISPR Recognition Tool (CRT) v. 1.1 with
default settings (Bland et al., 2007), as described in (Zhaxybayeva, Swithers, et al.,
2009). Spacers identified in the CRT analysis were compared pairwise using the
bl2seq program of BLAST+ v2.2.29 with default settings (Altschul, 1997).
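The spacer comparison feeds the heat map in Figure S1, where spacers count as shared above 95% nucleotide identity. A minimal sketch of that tally is below. It assumes the bl2seq identities have already been parsed into a lookup table keyed by spacer pairs (the table format and the names `percent_shared` and `shared_matrix` are hypothetical). It also assumes "percent shared" means the fraction of one genome's spacers that hit the other genome; the exact normalization is not stated in the text.

```python
# Sketch only: assumes pairwise identities were already obtained (e.g. parsed from
# bl2seq output) and stored as {(spacer_id_a, spacer_id_b): percent_identity}.
IDENTITY_THRESHOLD = 95.0  # spacers count as "shared" above 95% identity (Figure S1)


def percent_shared(spacers_a, spacers_b, identity):
    """Percentage of genome A's spacers that match at least one spacer of genome B.

    Pairs with no reported hit are absent from `identity` and treated as 0% identity.
    """
    if not spacers_a:
        return 0.0
    shared = sum(
        1
        for a in spacers_a
        if any(identity.get((a, b), 0.0) > IDENTITY_THRESHOLD for b in spacers_b)
    )
    return 100.0 * shared / len(spacers_a)


def shared_matrix(genomes, identity):
    """Percent-shared value for every ordered pair of genomes (input for a heat map).

    `genomes` maps a genome name to its list of spacer IDs (hypothetical layout).
    """
    return {
        (g1, g2): percent_shared(genomes[g1], genomes[g2], identity)
        for g1 in genomes
        for g2 in genomes
        if g1 != g2
    }
```

Because each genome is normalized by its own spacer count, the matrix need not be symmetric. A symmetric variant could instead divide by the combined spacer count of both genomes.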
Assembly of a Thermotoga genome from a metagenome. Three
metagenomes from three different sampling sites of Great Boiling Spring (GBS) in
Nevada (Markowitz et al., 2014), containing a large number of sequences with ≥90%
identity to TM-group genomes (6.7 Mb in total; Table S7), were assembled de novo
using Geneious 6 (www.geneious.com). Since very low within-site diversity was
observed in the assembled scaffolds, we selected the sample with the longest
assembled scaffolds (sample 85cSC) for genome assembly. After removing the
smaller of redundant contigs (which could result from population diversity,
paralogy or mis-assembly), we assembled a 2.1 Mb draft genome, which we denote
throughout the manuscript as Thermotoga sp. GBS.
The Thermotoga sp. GBS and the Thermotoga sp. A7A (Sutcliffe et al., 2013)
genomes contain 92% and 88% of the protein-coding genes in the T. maritima MSB8
genome, respectively, indicating that both draft genomes are nearly complete.
Analysis of quartets from the Quartet Decomposition analysis. Quartet
topologies with substantial bootstrap support (>80%) were summarized in
a spectrogram. Quartet topologies supported by >80% bootstrap values in at least
30% of gene families were extracted and coded into a weighted data matrix. Plurality
networks were calculated from the matrix using SplitsTree 4 (Huson & Bryant,
2006). Genomes were further grouped into categories by their origin. In the
“ecological niche” division, the taxa were divided into two groups: isolates originating
from oil reservoirs (T. petrophila RKU1, T. naphthophila RKU10 and Thermotoga sp.
CELL2) and “marine” isolates (T. maritima, Thermotoga sp. RQ2, Thermotoga sp.
2812B and Thermotoga sp. Mc24). In the “geographic proximity” division, the
isolates were designated as originating from “Europe/Atlantic Ocean” (T. maritima,
Thermotoga sp. CELL2 and Thermotoga sp. RQ2) or “Japan and Kuril
Islands/Pacific Ocean” (T. petrophila RKU1, T. naphthophila RKU10, Thermotoga sp.
2812B and Thermotoga sp. Mc24). Support for these partitions by individual gene
families was evaluated using agreement scoring and scatter plot analysis as described
in (Zhaxybayeva, Doolittle, et al., 2009). Gene families with an agreement score
above 0 and larger than their disagreement score were designated as “preferentially”
supporting the partition in question. Furthermore, gene families with an
agreement score >0.6 and a disagreement score of 0 were designated as “strongly
supporting” the partition in question.
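The two decision rules above lend themselves to a short sketch. The thresholds come from the text: >80% bootstrap in ≥30% of gene families for the network input, and the agreement and disagreement cutoffs for support. The data structures are hypothetical. Each gene family is assumed to carry precomputed agreement and disagreement scores, as in the Figure S3B scatter plots. Quartet topologies are assumed to be keyed by a label, with the count of well-supported families used as the matrix weight.

```python
from dataclasses import dataclass


@dataclass
class GeneFamily:
    name: str
    agreement: float     # agreement score with a partition (x-value in Fig. S3B)
    disagreement: float  # disagreement score (y-value in Fig. S3B)


def classify(family: GeneFamily) -> str:
    """Apply the support categories defined in the Methods.

    "Strong" support is a stricter subset of "preferential" support, so it is tested first.
    """
    if family.agreement > 0.6 and family.disagreement == 0:
        return "strongly supporting"
    if family.agreement > 0 and family.agreement > family.disagreement:
        return "preferentially supporting"
    return "not supporting"


def network_quartets(support_counts, n_families, min_fraction=0.30):
    """Keep quartet topologies supported (bootstrap > 80%) by >= 30% of gene families.

    `support_counts` is a hypothetical mapping from a quartet topology label to the number
    of gene families that support it with > 80% bootstrap; the counts serve as weights.
    """
    return {q: n for q, n in support_counts.items() if n / n_families >= min_fraction}
```

For example, with the 1728 gene families of Figure S3, a topology backed by 600 families (about 35%) would be kept, while one backed by 100 families would be dropped.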
We also repeated the QD analyses with a larger genome set that included the
Thermotoga sp. GBS and Thermotoga sp. A7A draft genomes. Genes in the genomes
assembled from metagenomic data with ≥90% identity to the corresponding genes in the
T. maritima genome were extracted using the TimeZone package (Chattopadhyay et al.,
2013) and aligned using ClustalW (Larkin et al., 2007). Phylogenetic tree
reconstruction and QD analysis were conducted as described above.
Recombination analysis. The rate of recombination relative to mutation, as
well as the average recombination tract length, were assessed using the pairwise
program and a likelihood look-up table generated by the complete program in the
LDhat package (McVean et al., 2002; Jolley, 2004). For each locally collinear block
(LCB) in the alignment of 7 TM-group bacteria, we calculated the population mutation
rate $\theta = 2N_e\mu$ and the gene conversion parameter $\rho = 2N_e c t$, where $N_e$ is
the effective population size, $\mu$ is the mutation rate, $c$ is the rate of initiation of gene
conversion per base and $t$ is the average gene conversion tract length. Since the
parameter estimates varied depending on the look-up table used by the pairwise
program (by the same order of magnitude as between LCBs; see Supplementary Table
S5), for each LCB we performed three analyses using different look-up tables
generated by the complete program in the LDhat package.
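The relative rate reported per LCB is the ratio ρ/θ, and the genome-wide value in Table S5 is a length-weighted average over LCBs (footnote c of that table). The sketch below shows both calculations. It does not call LDhat. It takes already-estimated θ and ρ values as input, and the function names are illustrative.

```python
def rho_over_theta(rho: float, theta: float) -> float:
    """Relative rate of recombination to mutation, rho/theta = (2*Ne*c*t) / (2*Ne*mu)."""
    return rho / theta


def length_weighted_mean(values, lengths):
    """Average per-LCB estimates weighted by LCB length (cf. Table S5, footnote c)."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("total LCB length must be positive")
    return sum(v * w for v, w in zip(values, lengths)) / total


# Illustrative use with the LCB3 and LCB9 rows of Table S5: (theta, rho, length in nt).
lcbs = {"LCB3": (0.05881, 3.6, 129_763), "LCB9": (0.02918, 2.7, 17_498)}
ratios = {name: rho_over_theta(rho, theta) for name, (theta, rho, _) in lcbs.items()}
mean_theta = length_weighted_mean(
    [theta for theta, _, _ in lcbs.values()], [n for _, _, n in lcbs.values()]
)
```

Recomputing from the rounded table values gives about 61 for LCB3 and about 93 for LCB9. These are close to the tabulated 62 and 91, and the small differences are what rounding ρ to one decimal place would produce.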
Detection of recombinant fragments was carried out in RDP version 4.33
(Martin et al., 2010) and LikeWind (Archibald & Roger, 2002). In the RDP package
we used the RDP, Genconv, Maxchi and Chimera algorithms, and counted only the
events detected by at least three of the four methods. The neighbor-joining tree
calculated from the aligned LCBs within the RDP package was used as the reference tree.
On this tree, the sister taxa pairs are Thermotoga sp. RQ2 and Thermotoga sp. CELL2,
T. petrophila and T. naphthophila, and T. maritima and Thermotoga sp. 2812B. The
detected recombination events were manually inspected and rejected if necessary.
Some rejected events consisted of large segments with many inferred recombination
breakpoints, which made their history difficult to disentangle (Fig. S7). We also
manually corrected some recombination events where the donor and recipient genomes
had been incorrectly inferred, and adjusted the recombination breakpoints for some of
the larger recombination fragments. Events with predicted endpoints were used to
estimate the average recombination tract length. In the LikeWind analysis, we used
the maximum likelihood tree calculated in PAUP* version 4.0b10 (Swofford) under a
GTR++ model as the reference tree.
Estimation of the expected divergence of two genomes in isolated oil
reservoirs. In the absence of recombination, divergence results from mutation alone
and can be calculated as 2μt, where μ is the mutation rate and t is the time (in
generations) since the reservoirs became isolated. Mutation rates have repeatedly been
shown to average 0.003–0.004 mutations per genome per replication across all three
domains of life. Generation times for subsurface bacteria are estimated to be as long as
1,000 years per generation (Morono et al., 2011). Hence, we expect the two genomes to
accumulate K = 2 × 0.003 × 1,000,000 / 1,000 = 6 mutations per genome per
million years of isolation. Because this mutation rate is so low, we do not correct for
back mutations or multiple mutations at the same nucleotide position
in a genome. Using the above rates, indigenous bacteria in the Troll oil reservoir are
expected to accumulate at least 145 × 6 = 870 SNPs per genome, given their
presumed isolation for 145 million years. The more conservative thermophile mutation
rate of 0.00033 mutations per genome per replication (Drake, 2009) would yield
95.7 SNPs per genome.
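The arithmetic above can be checked with a small helper that applies K = 2μt, with t expressed in generations (isolation time divided by generation time). The function name is illustrative. The inputs are the rates, the 1,000-year generation time and the 145-million-year isolation stated in the text. As in the text, back mutations and multiple hits are ignored.

```python
def expected_snps(mu_per_genome, generation_time_years, isolation_years):
    """Expected SNPs separating two lineages isolated for `isolation_years`.

    K = 2 * mu * t, where mu is mutations per genome per replication and t is the number of
    generations; the factor 2 counts mutations accumulated on both lineages. Back mutations
    and multiple hits at the same site are ignored.
    """
    generations = isolation_years / generation_time_years
    return 2 * mu_per_genome * generations


per_my = expected_snps(0.003, 1_000, 1_000_000)      # about 6 per million years
troll = expected_snps(0.003, 1_000, 145_000_000)     # about 870 over 145 million years
drake = expected_snps(0.00033, 1_000, 145_000_000)   # about 95.7 with the thermophile rate
```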
Since generation times are likely to be on the order of 10 rather than 1,000
years per generation, the number of mutations per genome may be as high as 92,800.
Using the Jukes-Cantor correction for multiple substitutions and back mutations, we
would expect to observe ~90,000 SNPs.
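Going from 92,800 expected substitutions to about 90,000 observed SNPs means inverting the Jukes-Cantor correction. If d = −(3/4) ln(1 − 4p/3) is the corrected distance, then the expected observed proportion of differing sites is p = (3/4)(1 − e^(−4d/3)). The sketch below assumes a genome length of 1,860,725 bp, the T. maritima MSB8 size from Table S1. The text does not state which length was used, so this value is an illustrative assumption.

```python
import math


def jc_expected_observed(substitutions, sites):
    """Expected number of observed differences given `substitutions` actual substitutions.

    Inverts the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) to obtain
    p = (3/4) (1 - exp(-4d/3)), where d is substitutions per site.
    """
    d = substitutions / sites
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return p * sites


# Assumption: ~1.86 Mb genome (the T. maritima MSB8 size in Table S1).
observed = jc_expected_observed(92_800, 1_860_725)  # roughly 89,800, i.e. ~90,000
```

Because p grows more slowly than d, the observed count is always somewhat below the number of substitutions. The shortfall grows as divergence increases.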
Table S1. Thermotoga maritima-like genomes analyzed in this study.
Region | Name (GenBank accession number) | Sample site | Size (bp)
Italy | T. maritima MSB8 (NC_000853) | Geothermally heated seafloor, Vulcano Island, Italy | 1,860,725
Japan | T. petrophila RKU1 (NC_009486) | Deep subterranean oil reservoir in Niigata, Japan | 1,823,511
Japan | T. naphthophila RKU10 (NC_013642) | Deep subterranean oil reservoir in Niigata, Japan | 1,809,823
Azores | Thermotoga sp. RQ2 (NC_010483) | Geothermally heated seafloor, Ribeira Quente, the Azores | 1,877,693
North Sea | Thermotoga sp. CELL2ᵃ (XXXXXX) | Troll oil reservoir (platform C) | 1,749,971
North Sea | Thermotoga sp. XYL54ᵃ (JSFJ01000000) | Troll oil reservoir (platform C) | > 1,737,772ᵇ
North Sea | Thermotoga sp. TBGT17.6.5ᵃ (JSFG01000000) | Troll oil reservoir (platform B) | > 1,747,913ᵇ
North Sea | Thermotoga sp. TBGT17.6.6ᵃ (JSFI01000000) | Troll oil reservoir (platform B) | > 1,736,569ᵇ
Kuril Islands | Thermotoga sp. 2812Bᵃ (XXXXXX) | Geothermally heated seafloor, Kunashir Island, Kuril Islands, Russia | 1,843,731
Kuril Islands | Thermotoga sp. EMPᵃ (AJII01000000) | Geothermally heated seafloor, Kunashir Island, Kuril Islands, Russia | > 1,835,066ᵇ
Kuril Islands | Thermotoga sp. Mc24ᵃ (JSFH01000000) | Geothermally heated seafloor, Kunashir Island, Kuril Islands, Russia | > 1,823,483ᵇ
a) Sequenced as part of this study.
b) These genomes have not been closed.
Table S2. Number of SNPs observed between the TM-group genomes. The first comparison
includes the 7 representative TM-group genomes and is based on a 1,543,882 nt alignment.
SNPs are shown below the diagonal, while the uncorrected distances calculated from the
number of SNPs are shown above the diagonal. The last 4 genomes were analyzed only within
their sample site: either the Kuril Islands (K) or the Troll oil field (T). The former and the
latter comparisons are based on a 1,833,634 nt and a 1,719,687 nt alignment, respectively.
Only the number of SNPs is shown for the last 4 genomes.

Seven genomes
Genomesᵃ | Tmar | 2812B | Mc24 | RQ2 | CELL2 | Tpet | Tnaphth
Tmar | – | 0.028 | 0.062 | 0.039 | 0.036 | 0.049 | 0.047
2812B (K) | 43750 | – | 0.061 | 0.045 | 0.044 | 0.055 | 0.054
Mc24 | 96403 | 94212 | – | 0.062 | 0.062 | 0.057 | 0.061
RQ2 | 59920 | 70116 | 95733 | – | 0.024 | 0.035 | 0.032
CELL2 (T) | 55659 | 67528 | 95827 | 36591 | – | 0.040 | 0.039
Tpet | 75681 | 84830 | 88152 | 54515 | 61257 | – | 0.033
Tnaphth | 72282 | 83977 | 94838 | 49419 | 60423 | 50535 | –

Kuril Islands | 2812B
EMP (K) | 23

Troll | CELL2 | XYL54 | TBGT5
XYL54 (T) | 54 | |
TBGT5 (T) | 122 | 60 |
TBGT6 (T) | 121 | 97 |

a) Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B;
Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp.
CELL2; Tpet, Thermotoga petrophila RKU1; Tnaphth, Thermotoga naphthophila
RKU10; EMP, Thermotoga sp. EMP; XYL54, Thermotoga sp. XYL54; TBGT5,
Thermotoga sp. TBGT17.6.5; TBGT6, Thermotoga sp. TBGT17.6.6.
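The values above and below the diagonal of the seven-genome comparison appear to be related by the uncorrected p-distance, SNPs divided by alignment length. The sketch below reproduces two of the tabulated distances under that assumption. The helper name is illustrative, and the alignment length is the one given in the caption.

```python
ALIGNMENT_LENGTH = 1_543_882  # nt in the seven-genome alignment (Table S2 caption)


def p_distance(snps: int, alignment_length: int = ALIGNMENT_LENGTH) -> float:
    """Uncorrected distance: proportion of aligned sites that differ between two genomes."""
    return snps / alignment_length


# Tmar vs 2812B: 43,750 SNPs below the diagonal, 0.028 above it.
tmar_2812b = round(p_distance(43_750), 3)
# RQ2 vs CELL2: 36,591 SNPs below the diagonal, 0.024 above it.
rq2_cell2 = round(p_distance(36_591), 3)
```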
Table S3. Genes in TM-group genomes from Troll that differ by indels longer than 1 ntᵃ.
Insertions are shown in grey and deletions in white. Genes in CRISPR regions,
mobile elements (insertion sequences and transposases) and other repeats are not
listed. Empty cells indicate that no gene was predicted in the region, because the
indel disrupts the open reading frame.
Functional
Length of Locus tag b
Annotation
in-del nt
CELL2
XYL54
TBGT1765
TBGT1766
Alpha-amylase
27
04815
01015
00320
00010
pullulanase, type I
21
04790
01040
00295
08996
Uncharacterized
29
03730
02105
08230
07900
3
02895
02935
07390
07065
6
01965
03886
06438
06118
04176
06141
05828
conserved protein
ABC-type
antimicrobial
peptide transport
system, permease
component
ABC-type Na+
efflux pump,
permease
component
phosphotransferase
17
domain-containing
proteinb
PAS/PAC sensor-
14
07895
06916
containing
diguanylate
cyclaseb
a) Single nucleotide indels were almost exclusively observed in homopolymer tracts
and were excluded from the analyses, since they are likely the result of 454
sequencing errors (Loman et al., 2012).
b) Locus tags (which are in the format TAG_XXXXX) are shown in two parts: the header
row lists the first part (TAG), while the table cells show the last 5 digits. For
example, locus tag CELL2_04790 would be listed in the CELL2 column as 04790.
Genome abbreviations: CELL2, Thermotoga sp. CELL2; XYL54, Thermotoga sp.
XYL54; TBGT1765, Thermotoga sp. TBGT17.6.5; TBGT1766, Thermotoga sp.
TBGT17.6.6.
b) The indel appears to disrupt the open reading frame and may result in a non-functional
pseudogene.
Table S4. Number of genes that are either unique to a genome (diagonal) or
shared only between a pair of genomes (off-diagonal). Numbers of genes are shown
without parentheses and the corresponding numbers of nucleotides in parentheses;
gene counts are given above the diagonal, nucleotide counts below it, and the diagonal
shows both. Noncoding regions were not included in nucleotide calculations.
Genomesᵃ | Tmar | 2812B | Mc24 | CELL2 | RQ2 | Tpet | Tnapht
Tmar | 42 (11371) | 4 | 9 | 1 | 24 | 2 | 11
2812B | (69) | 64 (12110) | 10 | 2 | 7 | 2 | 0
Mc24 | (3628) | (3027) | 49 (15397) | 3 | 0 | 5 | 1
CELL2 | (34) | (665) | (979) | 38 (8230) | 3 | 8 | 11
RQ2 | (8511) | (2061) | (0) | (850) | 29 (6712) | 11 | 10
Tpet | (276) | (313) | (1795) | (2722) | (4408) | 34 (12471) | 10
Tnapht | (4550) | (0) | (70) | (3430) | (3577) | (2356) | 36 (15312)
a) Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B;
Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp.
CELL2; Tpet, Thermotoga petrophila RKU1; Tnapht, Thermotoga naphthophila
RKU10.
Table S5. Estimates of the population mutation rate (θ) and gene conversion
parameter (ρ). Values shown are averages of three separate analyses using three
different look-up tables generated by the complete program. The estimates were
obtained with the LDhat package (McVean et al., 2002).
LCBᵃ | LCB length (nt) | θ | ρ | Recombination tract lengthᵇ (nt) | ρ/θ
LCB3 | 129,763 | 0.05881 | 3.6 | 5700 | 62
LCB9 | 17,498 | 0.02918 | 2.7 | 9800 | 91
LCB10 | 12,289 | 0.03458 | 0.9 | 2300 | 26
LCB11 | 17,901 | 0.01890 | 2.1 | 2300 | 109
LCB12 | 180,309 | 0.05084 | 2.6 | 14600 | 52
LCB14 | 48,811 | 0.04472 | 1.2 | 1400 | 27
LCB15 | 264,028 | 0.03994 | 3.8 | 12700 | 95
LCB16 | 41,522 | 0.04096 | 2.3 | 2000 | 59
LCB17 | 47,454 | 0.39626 | 2.2 | 2300 | 54
LCB18 | 26,975 | 0.04638 | 1.6 | 2000 | 35
LCB22 | 100,696 | 0.03772 | 2.5 | 3700 | 65
LCB23 | 89,956 | 0.04041 | 3.1 | 5300 | 77
LCB24 | 309,639 | 0.05468 | 3.8 | 6000 | 69
LCB26 | 1,429 | 0.04598 | 0 | 0 | 0
LCB27 | 128,460 | 0.04582 | 1.1 | 1500 | 24
LCB30 | 117,152 | 0.04093 | 2.1 | 3000 | 51
Averageᶜ | 1,542,882 | 0.04585 | 2.9 | 6800 | 63
a) Locally collinear block from the genome alignment (see Methods).
b) Rounded to the nearest hundred.
c) This is a weighted average, where the weight is the LCB length.
Table S6. Summary of the detected recombination events. The analysis was
performed in the RDP program (see Supplementary Methods).
i) Recombination events summarized by recipient and donor. Events involving
isolates from the same type of environment are shaded in grey.
Donora
Recipienta
Tmar
2812B
Mc24
Cell2
RQ2
Tpet
Tnapht
Total
Donor
Tmar 2812B Mc24b Cell2 RQ2 Tpet Tnapht Unknown Total
recipient
NA
3
10
11
7
13
13
2
16
NA
9
2
15
16
9
NA
12
2
1
5
6
23
0
3
1
0
1
3
30
31
0
10
11
18
5
NA
26
27
78
51
54
51
6
5
15
14
12
NA
20
38
0
15
21
17
20
62
64
94
48
49
81
73
52
131
471
ii) Total number of events between pairs of genomes, or between a genome and an
unknown source outside of analyzed genomes. Events involving isolates from the
same type of environment are shaded grey.
Isolatesa
Tmar
2812B
Mc24
Cell2
RQ2
Tpet
Tnapht
Unknown
Tmar
2812B
Mc24b
Cell2
RQ2
Tpet
Tnapht
NA
15
13
14
14
12
20
33
7
5
3
5
38
14
19
45
46
0
NA
26
25
15
27
17
21
NA
17
0
20
iii) Average number of recombination events grouped either by environment type
or by geographic proximity.
Groups of genomes compared | Average number of events
Within ‘marine vent’ | 17.2
Within ‘oil reservoir’ | 25.5
Between ‘marine vent’ and ‘oil reservoir’ | 18.5
Within Atlantic | 9
Within Pacific | 22
Between Pacific and Atlantic | 15
a) Abbreviations: Tmar, Thermotoga maritima MSB8; 2812B, Thermotoga sp. 2812B;
Mc24, Thermotoga sp. Mc24; RQ2, Thermotoga sp. RQ2; CELL2, Thermotoga sp.
CELL2; Tpet, Thermotoga petrophila RKU1; Tnapht, Thermotoga naphthophila
RKU10.
b) The higher number of recombination events in Thermotoga sp. Mc24 probably
reflects easier detection of recombination in a more divergent genome.
Table S7. List of metagenomes containing sequences with >90% similarity to
Thermotoga sp.
Metagenome | Abbreviation | Location | IMG ID | Number of Thermotoga genesᵃ | Number of scaffolds (size range)ᵇ | Total bpᶜ
Coal bed production water | CG7 | San Juan Basin | 11650 | 16 (25) | NA | 6,266
Cellulolytic enrichment | CS 85C | Sediment, Great Boiling Spring, Nevada | 7164 | 2,123 (2,410) | 274 (200–333,637) | 2,230,489
Cellulolytic enrichment | CS 77C | Sediment, Great Boiling Spring, Nevada | 7783 | 2,009 (2,323) | 462 (202–49,510) | 2,206,530
Cellulolytic enrichment | S 77C | Sediment, Great Boiling Spring, Nevada | 7780 | 2,152 (2,397) | 517 (201–118,805) | 2,300,260
a) Calculated as having >90% identity to the Thermotoga maritima MSB8 genome.
The number of genes pre-classified as belonging to Thermotogae by the phylogenetic
distribution tool in IMG is shown in parentheses.
b) IMG-classified Thermotogae scaffolds with significant similarity (BLASTN E-value
< 10⁻²⁰) to any of the Thermotoga genomes listed in Table 1.
c) The calculation was based on the lengths of complete contigs with similarity to
Thermotoga genomes.
Supplementary figure legends
Figure S1. Shared CRISPR spacers across Thermotoga genomes. Pairwise identities
between all CRISPR spacer sequences from all genomes were calculated using bl2seq
(Altschul, 1997). The heat map depicts, by color, the percentage of spacers shared
between two genomes. CRISPR spacers were defined as shared if their nucleotide
identity was greater than 95%. The heat map shows that the genomes from the Troll
population and two of the genomes from the Kuril Islands population share more
CRISPR spacers with genomes from the same population than they do with any other
genomes. It also reveals that Thermotoga maritima MSB8 shares spacers with both
Thermotoga sp. RQ2 and Thermotoga petrophila RKU1, while Thermotoga
naphthophila RKU10 and Thermotoga sp. Mc24 share few spacers with any other
genome. CRISPR sequences were identified in each genome using the CRISPR
Recognition Tool v1.2 (Bland et al., 2007).
Figure S2. Maximum likelihood trees of commonly used phylogenetic marker genes.
Trees were reconstructed in PhyML (Guindon & Gascuel, 2003) as implemented in
Geneious 6 (www.geneious.com) under a GTR+Γ substitution model. The isolates are
classified by geographic origin and environment type (colored circles). Note that
although Thermotoga maritima MSB8 was isolated from the Mediterranean Sea, in
our analyses it is classified as originating from the Atlantic Ocean.
Figure S3. Quartet Decomposition (QD) analysis of 7 TM-group genomes. Panel
A. The histogram summarizes phylogenetic relationships supported (positive y-value)
and conflicted (negative y-value) by 1728 gene families present in at least 4 analyzed
genomes (quartets on x-axis, sorted by the number of supporting gene families). The
bars are color-coded according to the bootstrap support value on the internal branch of
a quartet. Panel B. Gene families that support grouping of strains by ecological niche
or by geographic location. Scatter plots of agreement of individual gene families with
data partitions by geographical proximity or environment type. Each gene family is
represented by a dot. The position of the dot within an XY coordinate system depends
on how many embedded quartets within a gene family agree with the data partition
(x-value) and how many disagree (y-value). Gene families with poor phylogenetic
signal are located near (0,0). From the plots we can infer that only 69 and 25 gene
families strongly support the division by environment type and geographical
proximity, respectively.
Figure S4. Examples of gene families whose phylogenetic histories do not support
the grouping of Thermotoga sp. CELL2 and Thermotoga sp. GBS. The gene families were
identified in the QD analysis. The maximum likelihood trees were reconstructed in
RAxML version 7.3.6 (Stamatakis, 2006) under the GTR+Γ model with 100 bootstrap
replicates.
Figure S5. Illustration of possible routes for gene flow among Thermotoga
populations. The global Thermotoga collective (depicted as red ovals) is present in both
subsurface and marine environments, including oil reservoirs and continental hot
springs. Genetic exchange between an oil reservoir and a hot spring may occur either
via the surface, mediated by marine and air dispersal, or directly within the subsurface
(black arrows). The latter implies a substantial presence of microbial populations within
favorable pockets of the subsurface, allowing efficient dispersal. The
diagram is not drawn to scale.
References
Altschul S. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 25:3389–3402.
Archibald JM, Roger AJ. (2002). Gene Conversion and the Evolution of Euryarchaeal
Chaperonins: A Maximum Likelihood-Based Method for Detecting Conflicting
Phylogenetic Signals. J Mol Evol 55:232–245.
Balch WE, Fox GE, Magrum LJ, Woese CR, Wolfe RS. (1979). Methanogens:
reevaluation of a unique biological group. Microbiol Rev 43:260–296.
Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, et al. (2007).
CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly
interspaced palindromic repeats. BMC Bioinformatics 8:209.
Charbonnier F, Forterre P. (1995). Protocol 12: Purification of plasmids from
thermophilic and hyperthermophilic archaea. In: Archaea: a laboratory manual
(Thermophiles), Robb, FT & Place, AR (eds), Cold Spring Harbor Laboratory Press:
Cold Spring Harbor, N. Y., pp. 87–90.
Chattopadhyay S, Paul S, Dykhuizen DE, Sokurenko EV. (2013). Tracking recent
adaptive evolution in microbial species using TimeZone. Nature Protocols 8:652–665.
Dipippo JL, Nesbø CL, Dahle H, Doolittle WF, Birkeland N-K, Noll KM. (2009).
Kosmotoga olearia gen. nov., sp. nov., a thermophilic, anaerobic heterotroph isolated
from an oil production fluid. Int J Syst Evol Micr 59:2991–3000.
Drake JW. (2009). Avoiding Dangerous Missense: Thermophiles Display Especially
Low Mutation Rates. PLoS Genet 1–6.
Guindon S, Gascuel O. (2003). A simple, fast, and accurate algorithm to estimate
large phylogenies by maximum likelihood. Systematic Biology 52:696–704.
Hungate RE. (1962). A roll tube method for cultivation of strict anaerobes.
In: Methods in microbiology, Norris, JR & Ribbons, DW (eds) Vol. 3B, Academic
Press: London, pp. 117–132.
Huson DH, Bryant D. (2006). Application of phylogenetic networks in evolutionary
studies. Mol Biol Evol 23:254–267.
Jolley KA. (2004). The Influence of Mutation, Recombination, Population History,
and Selection on Patterns of Genetic Diversity in Neisseria meningitidis. Mol Biol
Evol 22:562–569.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et
al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948.
Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al.
(2012). Performance comparison of benchtop high-throughput sequencing platforms.
Nature Biotechnology 30:434–439.
Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Pillay M, et al. (2014).
IMG 4 version of the integrated microbial genomes comparative analysis system.
Nucleic Acids Res 42:D560–7.
Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P. (2010). RDP3: a
flexible and fast computer program for analyzing recombination. Bioinformatics
26:2462–2463.
McVean G, Awadalla P, Fearnhead P. (2002). A Coalescent-Based Method for
Detecting and Estimating Recombination From Gene Sequences. Genetics 160:1231–
1241.
Morono Y, Terada T, Nishizawa M, Ito M, Hillion F, Takahata N, et al. (2011).
Carbon and nitrogen assimilation in deep subseafloor microbial cells. Proc Natl Acad
Sci 108:18295–18300.
Nesbø CL, Dlutek M, Doolittle WF. (2006). Recombination in Thermotoga:
implications for species concepts and biogeography. Genetics 172:759–769.
Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic
analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Sutcliffe B, Midgley DJ, Rosewarne CP, Greenfield P, Li D. (2013). Draft Genome
Sequence of Thermotoga maritima A7A Reconstructed from Metagenomic
Sequencing Analysis of a Hydrocarbon Reservoir in the Bass Strait, Australia.
Genome Announc 1:e00688-13.
Svetlichny VA, Sokolova TG, Gerhardt M, Kostrikina NA, Zavarzin GA. (1991).
Anaerobic extremely thermophilic carboxydotrophic bacteria in hydrotherms of Kuril
Islands. Microb Ecol 21:1–10.
Swofford DL. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other
Methods).
Takahata Y, Nishijima M, Hoaki T, Maruyama T. (2008). Thermotoga petrophila sp.
nov. and Thermotoga naphthophila sp. nov., two hyperthermophilic bacteria from the
Kubiki oil reservoir in Niigata, Japan. Int J Syst Evol Micr 51:1901–1909.
Widdel F, Kohring GW, Mayer F. (1983). Studies on dissimilatory sulfate-reducing
bacteria that decompose fatty acids. 3: Characterization of the filamentous gliding
Desulfonema limicola gen. nov. sp. nov., and Desulfonema magnum sp. nov. Arch
Microbiol 134:286–294.
Zhaxybayeva O, Doolittle WF, Papke RT, Gogarten JP. (2009). Intertwined
evolutionary histories of marine Synechococcus and Prochlorococcus marinus.
Genome Biol Evol 1:325–339.
Zhaxybayeva O, Swithers KS, Lapierre P, Fournier GP, Bickhart DM, DeBoy RT, et
al. (2009). On the chimeric nature, thermophilic origin, and phylogenetic placement
of the Thermotogales. Proc Natl Acad Sci 106:5865–5870.