Download Archaeogenomic Evidence of Punctuated Genome Evolution in

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Archaeogenomic Evidence of Punctuated Genome Evolution
in Gossypium
Sarah A. Palmer,1 Alan J. Clapham,2 Pamela Rose,3 Fábio O. Freitas,4 Bruce D. Owen,5
David Beresford-Jones,6 Jonathan D. Moore,7 James L. Kitchen,1 and Robin G. Allaby*,1
1
School of Life Sciences, The University of Warwick, Coventry, United Kingdom
Worcester Historic Environment and Archaeology Service, Woodbury, University of Worcester, Worcester, United Kingdom
3
The Austrian Archaeological Institute, Cairo Branch, Zamalek, Cairo, Egypt
4
Embrapa Recursos Genéticos e Biotecnologia, Parque Estacxão Biológica—PqEB, Brası́lia, DF, Brasil
5
Department of Anthropology, Sonoma State University
6
The MacDonald Institute for Archaeological Research, University of Cambridge, Cambridge, United Kingdom
7
Warwick Systems Biology, The University of Warwick, Coventry, United Kingdom
*Corresponding author: E-mail: [email protected].
Associate editor: Beth Shapiro
2
Transposable elements (TEs) are drivers of evolution resulting in episodic surges of genetic innovation and genomic
reorganization (Oliver KR, Greene WK. 2009. TEs: powerful facilitators of evolution. Bioessays 31:703–714.), but there is
little evidence of the timescale in which this process has occurred (Gingerich PD. 2009. Rates of evolution. Ann Rev Ecol
Evol Syst. 40:657–675.). The paleontological and archaeological records provide direct evidence for how evolution has
proceeded in the past, which can be accessed through ancient DNA to examine genomes using high-throughput
sequencing technologies (Palmer SA, Smith O, Allaby RG. 2011. The blossoming of plant archaeogenetics. Ann Anat.
194:146–156.). In this study, we report shotgun sequencing of four archaeological samples of cotton using the GS 454 FLX
platform, which enabled reconstruction of the TE composition of these past genomes and species identification. From this,
a picture of lineage specific evolutionary patterns emerged, even over the relatively short timescale of a few thousand
years. Genomic stability was observed between South American Gossypium barbadense samples separated by over 2,000
miles and 3,000 years. In contrast, the TE composition of ancient Nubian cotton, identified as G. herbaceum, differed
dramatically from that of modern G. herbaceum and resembled closely the A genome of the New World tetraploids. Our
analysis has directly shown that considerable genomic reorganization has occurred within the history of a domesticated
plant species while genomic stability has occurred in closely related species. A pattern of episodes of rapid change and
periods of stability is expected of punctuated evolution. This observation is important to understanding the process of
evolution under domestication.
Key words: archaeogenomics, evolution, Gossypium, transposable element.
Introduction
Genomes of domesticated plant species have experienced
extensive genome rearrangement (Bennetzen 2005;
Baucom et al. 2011). Some of this may have resulted from
periods of evolutionary stress upon introduction to new
environments or cultural practices during domestication,
and some may pertain to evolutionary events deeper in
time prior to domestication. Genome rearrangement has
been facilitated by transposable elements (TEs), which in
turn has generated variation that has been subject to natural selection (Oliver and Greene 2009; Baucom et al. 2011;
Tenaillon et al. 2011). Episodic surges of TE activity as adaptative responses fit the pattern of punctuated equilibrium
seen in paleontology (Oliver and Greene 2009). It is
thought that the rate at which TEs can infest genomes
and be fixed in lineages is rapid (Bennetzen 2005), but
an estimated constant rate of transposition based on phylogenies of extant taxa is usually used to place these evo-
lutionary events in history (Venner et al. 2009). The time
frame in which genomes of crop species have evolved under domestication remains unclear. Archaeogenomics
allows the direct examination of taxa in their natural temporal succession so facilitating a resolution of evolutionary
events not possible with modern genomics (Schindewolf
1948; Palmer et al. 2011).
Cotton is an interesting species for the study of genome
evolution because it is both an economically important
crop and has experienced recent genome divergence across
the genus. The cotton genus, Gossypium, includes 13 genomes comprising ;40 diploid and tetraploid species,
many of which have undergone large and independent expansions of gypsy-like, copia-like, and LINE TEs (Hawkins
et al. 2006; Hu et al. 2010), as well as episodes of rapid
DNA loss (Hawkins et al. 2009). Furthermore, Zaki and
Ghany (2004) demonstrated that TE activity in this genus
is ongoing. Cotton is the most important domesticated
plant species for fabric technologies, such as textiles, lines,
© The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 29(8):2031–2038. 2012 doi:10.1093/molbev/mss070 Advance Access publication February 14, 2012
2031
Research article
Abstract
MBE
Palmer et al. · doi:10.1093/molbev/mss070
FIG. 1. Archaeological cotton remains and the resulting metagenomic assignments of shotgun sequencing using the FLX 454 platform from
each sample. Taxa were assigned from highest scoring alignment to the NCBI nt, est, or gss databases using megablast and MEGAN. Sample
images are shown in scale (bar 5 1 cm).
and nets, and is present in the archaeological record in the
Old and New Worlds. Domesticated cotton has complex
origins involving two genomes (A and D) and four independent domestication events. In the Old World, two different
diploid species of the A genome type, Gossypium herbaceum and G. arboreum, were domesticated in the African
and South Asian continents, respectively. An A genome
species resembling G. herbaceum was translocated to
the New World, where hybridization with an indigenous
species of D genome type, G. raimondii, occurred giving rise
to wild allotetraploids of AD genome type. Senchina et al.
(2003) estimated a date of 1.5 Ma for this hybridization
event. Subsequently, two different AD allotetraploid species, G. hirsutum and G. barbadense were domesticated
in the North and South American continents, respectively
(Fryxell 1979; Wendel 1989; Senchina et al. 2003).
In this study, we directly examine recent genome evolution in Gossypium by using ancient DNA extracted from
archaeological samples of cotton. We used the approach of
shotgun high-throughput sequencing because it is the
most suitable methodology available to reflect the genomic
content of these samples (Tenaillon et al. 2011) and has not
been previously applied to archaeological samples from low
latitudes (Palmer et al. 2011). We selected a cross section of
the cotton archaeobotanical record, with one archaeological sample from the Old World from the site at Qasr Ibrim
(Egypt) (Clapham and Rowley-Conwy 2009) and three
samples from the New World from sites at Samaca (Peru),
Loreto Viejo (Peru), and the Boquette Caves (Brazil), respectively (Junqueira and Malta 1981; Freitas et al. 2003;
Beresford-Jones et al. 2009; Owen 2009) (fig. 1). Species
of the Gossypium genus are not discernable from the re2032
mains commonly found in the archaeobotanical record,
so attribution of species was necessary for both the archaeology and comparative genomic analysis, which we
achieved through the analysis of archaeogenomic data.
Materials and Methods
Methods are outlined in more detail in the Supplementary
Material online.
Archaeological Samples
The four samples used in this study represent a cross section of the cotton archaeobotanical record and are detailed
as follows: BQT90 was a boll excavated from Boquete Caves
in Januarı́a, Brazil, dated to 750 years BP by radiocarbon
dating of other samples from the same site (Junqueira
and Malta 1981; Freitas et al. 2003); P1 was seeds excavated
from Samaca H-13 site in the Ica Valley in Peru, stratigraphically dated to the Middle Horizon, 800–1,000 years BP
(Beresford-Jones et al. 2009); P2 was seeds excavated from
Loreto Viejo, Lower Osmore Valley, Peru, radiocarbon
dated to 3,820–3,630 years cal BP (Owen 2009), and
QI92 was seeds excavated from the site of Qasr Ibrim in
Egypt, stratigraphically dated to 1,600 years BP ± 50 years
(Clapham and Rowley-Conwy 2009) (for images, see fig. 1).
Ancient DNA Extraction and Sequencing
Ancient DNA was extracted from archaeobotanical samples
using a modified cetyltrimethylammonium bromide and
silica column purification protocol using stringent precautions to avoid contamination by foreign materials and ensure
authenticity. DNA samples (50–120 ng per sample) were
MBE
Cotton Genome Evolution · doi:10.1093/molbev/mss070
made into standard 454 GS FLX shotgun libraries following
the manufacturer’s low molecular weight protocol (Roche
Applied Science, Mannheim, Germany) and sequenced using
the 454 GS FLX platform on two LR70 plates using a four
region gasket. Reads were filtered for microsatellites and duplicate reads using a scripts developed in house for this purpose. All scripts used in these analyses are available for
download from http://www2.warwick.ac.uk/fac/sci/lifesci/
research/archaeobotany/downloads/.
Metagenome Analysis
Sequencing reads were treated as a metagenomic data set
and assigned to taxa by highest scoring alignments to any
known sequence in the NCBI nt, est, or gss databases using
megablast and MEGAN (v 3.8). Metagenomic analysis was
carried out in order to determine endogenous DNA content and to isolate the data sets assigned to Gossypium for
further analyses. The repetitive and nonrepetitive DNA
from this data set was separated for species assignation
analysis and repetitive DNA analysis.
Species Assignations
The data set that was assigned to the nonrepetitive portion
of the Gossypium genome was utilized to determine the
species of each sample by binomial analysis of the number
of sequences assigned to the respective Gossypium species
in the metagenomic analysis. The representation of each
species in the NCBI nt database was used to calculate
the unbiased probability of randomly selecting a particular
species of Gossypium over any other, out of all the Gossypium species found in the metagenomic analysis. In
the case of each sample, there was an expectation of
two possible species based on the archaeology of the region
and the geographical ranges of extant cotton; these were
G. barbadense or G. hirsutum for samples BQT90, P1 and P2,
and G. arboreum or G. herbaceum for sample QI92. The
probabilities of the observed assignations for each of the
Gossypium species were calculated using a posterior distribution. The reliability of the species identification was further assessed by the calculation of beta distributions of the
expected and observed numbers of assignations to establish the extent of their separation. Parameters for the beta
distribution used were a equal to p þ 1, where p was the
number of specific species assignations made in the metagenomic analysis, and the b parameter equal to q þ 1,
where q was equal to the total remaining assignations
to members of the Gossypium genus.
Repetitive Element Analysis
The data set assigned to the repetitive portion of the Gossypium genome was interrogated against a custom database of published TEs using megablast and categorized
to TE type and subtype according to annotation of the reference sequence. Retrotransposon frequencies were calculated for each sample for comparison with those already
published for modern accessions of Gossypium A-, D-,
and AD genomes. At the time of writing, retrotransposon
profiles for the following species were available: G. herba-
ceum (Hawkins et al. 2006), G. raimondii (Hawkins et al.
2006) and G. hirsutum (Guo et al. 2008), respectively. Absolute values of archaeogenomic sequences assigned to the
respective retrotransposon types were weighted according
to retrotransposon lengths of 9.7, 5.3, and 3.5 Kbp for gypsy,
copia, and LINE, respectively (Hawkins et al. 2006), to give
an estimate of the genomic contribution of each retrotransposon type. Posterior probability distributions were
calculated for each set of retrotransposon observations using a beta distribution. The parameter a equal to p þ 1,
where p was the number of counts observed for that retrotransposon type, and the b parameter equal to q þ1, where
q was the number of times a different retrotransposon type
was observed.
Theoretical tetraploid TE profiles were calculated using
the published TE profile of modern G. raimondii, with modern G. herbaceum (Hawkins et al. 2006) and the archaeological G. herbaceum (QI92), respectively, weighted by
relative genome size. The posterior distributions of the theoretical tetraploid TE profiles were generated from the posterior distributions of the respective diploid genome
progenitor genomes. In this process, the compositions of
the diploid genomes were randomly sampled from their
respective posterior distributions and the resulting tetraploid composition was calculated weighting the profiles
by relative genome size. The process was repeated
10,000 times to produce a frequency profile that was converted to a probability distribution.
Results
Metagenomic Analysis
GS FLX 454 shotgun sequencing generated 16.1 Mbp of
unique reads with an average length of 72.7 bp from
BQT90, 6.4 Mbp with average length 57.9 bp from P1,
5.3 Mbp with average length 38.6 bp from P2, and 8.5
Mbp with average length 37.4 bp from QI92 (supplementary table S1 and fig. S1, Supplementary Material online).
Sequence lengths decreased with sample age, which is consistent with other studies on samples of similar age and
origin (Freitas et al. 2003; Palmer et al. 2009) (supplementary fig. S2, Supplementary Material online). Metagenomic
analysis indicated proportions of endogenous DNA consistent with the context and integrity of each sample. Despite
the relatively poor coverage of Gossypium genomes on
publically available databases, the proportions of endogenous DNA from the assignable sequences were calculated
to be 45% for BQT90, 64% for P2, and 95% for QI92 (fig. 1);
however, it is likely that considerably more would have
been assigned to Gossypium if a complete reference genome were available. From the sample P1, less than 4%
DNA was assignable as Gossypium. This sample was of visibly poor integrity and our estimate of the base substitution
error rate (see supplementary methods, Supplementary
Material online) was higher from this sample than the
others. We conclude that the endogenous DNA sequenced
from this sample was too severely damaged for accurate
assignation to taxa of origin. The high level of endogenous
2033
MBE
Palmer et al. · doi:10.1093/molbev/mss070
FIG. 2. (A) Probabilities of species assignment of each sample from the metagenomic analysis based on database representation of Gossypium
species. (B) Beta distribution analysis of species assignment for each sample where the unbiased probability was below 1 105.
DNA in QI92 is comparable to that obtained from permafrost samples of human hairs (Rasmussen et al. 2010)
and considerably higher than previous paleontological
metagenomic studies of a permafrost preserved mammoth,
a Neandertal and a caprine (Green et al. 2006, 2010; Poinar
et al. 2006; Ramirez et al. 2009). Metagenomic analysis attributed only trace presence of bacteria from this sample,
which is consistent with previous observations of remarkable preservation at the Qasr Ibrim site (O’Donoghue et al.
1996; Palmer et al. 2009).
Species Assignations of Cotton Samples
In the metagenomic analysis, reads from each sample were
each assigned to one of a number of species of the
Gossypium genus (supplementary fig. S3, Supplementary
Material online). The unbiased probability of observing
the number of metagenomic assignations or more based
on database size is given in figure 2A. In this test, a probability value of 0.5 indicates no bias for or against selection
of a particular species and probabilities below 0.5 indicate
an increasing bias in favor of the associated species. For species in which the unbiased probability was below 1 105
the information value of the result was assessed using beta
distributions of the observed values compared with the unbiased values expected if no one species was more likely to
be assigned (fig. 2B). From these analyses, samples BQT90
and P2 were unambiguously assigned as G. barbadense and
QI92 as G. herbaceum. The sample P1 was also assigned
with statistical significance to G. barbadense, but beta distributions showed very little separation between observed
and expected assignations, indicating that there were too
few DNA sequences to be sure that the G. barbadense assignations of this sample were not drawn from the expected unbiased distribution.
2034
This study represents the first identification of archaeobotanical cotton remains to a species level and demonstrates the power of archaeogenomic analysis for this
purpose. The identification of the sample QI92 as G. herbaceum rather than G. arboreum indicates that it is of
African origin as opposed to the alternative possibility
of an imported plant from the Indian subcontinent
(Clapham and Rowley-Conwy 2009), which has occurred
for a number of domesticate species (Fuller and Boivin
2009). The identification of the South American samples
as G. barbadense rather than G. hirsutum confirms the
expectation for these samples based on biogeography
(Beresford-Jones et al. 2011). Specifically for this study,
the species identifications enabled comparative studies
of genomic architecture in samples spanning the domestication history of cotton.
Repetitive Element Analysis
The repetitive DNA content of archaeogenomes ranged
from 48% to 65% for BQT90, P2 and QI92, which is close
to previous estimates for Gossypium of 40–65% (Hawkins
et al. 2006; Guo et al. 2008; supplementary table S2, Supplementary Material online). The percentage of the QI92
archaeogenome assigned as repetitive DNA was 65%, concurrent with previous estimates for G. herbaceum of a minimum of 60% of the genome occupied by repetitive
sequences (Hawkins et al. 2006). While there have been
no direct estimates reported for modern G. barbadense,
53% and 48% of the BQT90 and P2 archaeogenomes, respectively, were repetitive in this study, so modern G. barbadense is likely to have about 50% repetitive DNA. The 5%
difference in repetitive DNA composition between these
two G. barbadense samples is unlikely to be the result of
sampling error because the posterior distributions of
MBE
Cotton Genome Evolution · doi:10.1093/molbev/mss070
genomic repetitive element proportion of the two samples
are well separated (supplementary fig. S4, Supplementary
Material online).
Gossypium retrotransposons accounted for the majority
of archaeogenomic repetitive DNA (84% from BQT90, 100%
for P1, 85% for P2, and 88% for QI92, supplementary table
S3, Supplementary Material online) and a high proportion
of which assigned to Gossypium retroelement gypsy elements, known as Gorge TEs (supplementary table S4, Supplementary Material online). Gorge TEs comprise a large
portion of Gossypium genomes, in some lineages up to half
of the genome, and only occur in members of the Gossypium genus (Hawkins et al. 2006). Hawkins et al.
(2006) describe three phylogenetically distinct Gorge subtypes, of which Gorge3 has undergone significant expansion
in lineages of the A genome. The relative abundance of
Gorge TEs in the archaeogenomes were similar to estimates
of modern Gossypium A and AD genomes, with a higher
proportion of the Gorge3 type occurring in QI92 (supplementary table S5, Supplementary Material online) than the
G. herbaceum accession of Hawkins et al. (2006). The evidence of Gorge TEs supported our findings of a high endogenous DNA content because Gorge TEs would not be
recovered from any contaminant DNA originating from
other species.
The retrotransposon profiles of the AD archaeogenomes were similar to that of the published G. hirsutum
(Guo et al. 2008; fig. 3; supplementary tables S6 and S7,
Supplementary Material online). In contrast, the profile
of the G. herbaceum archaeogenome differed greatly to
that of the modern published G. herbaceum profile
(Hawkins et al. 2006), with a much greater dominance
of gypsy-like elements occurring in the archaeogenome.
The extent of differentiation of the retrotransposon profiles, calculated by posterior distributions for the probabilities underlying the observed retrotransposon counts
observed, showed distinct separation between samples
for both the gypsy- and copia-type retrotransposons
(fig. 4; supplementary table S8, Supplementary Material
online), confirming that the differences in counts between samples reflect differentiated genome architectures for these retrotransposon types and are not the
result of sampling artifacts. For gypsy- and copia-like
retrotransposon types, the tetraploid genomes overlap,
distinct to the distributions of G. raimondii and G. herbaceum. In the case of the LINE-type retrotransposons, the
overlap between the distributions of most samples is likely
to be due to little activity of this retrotransposon type
in Gossypium lineages, supporting the current theory
(Hawkins et al. 2008). The retrotransposon data sets were
additionally interrogated against a data set of reverse
transcriptase (RT) genes from long terminal repeat
(LTR) and non-LTR types (Hu et al. 2010) to ensure that
the gene counts of the respective TE types were reflected
in the archaeogenomic profiles described above. Even
though the number of archaeogenomic alignments to
RT genes limited the analysis, no increase in the proportion of LINE retroelements was discovered (supplemen-
tary table S9, Supplementary Material online). The
analysis or RT genes presented an interesting ambiguity
in the relative abundance in gypsy-like retrotransposon
proportions in the QI92 sample. The RT gene analysis
estimates a lower proportion of gypsy-like elements, in
particular the Gorge3 subtype. Nested TE regions would
be underrepresented by the RT analysis, and interpreted
this way, our data show that one in every two Gorge3 elements in the QI92 lineage are nested.
Posterior probability profiles calculated for the theoretical AD genome tetraploid constructs demonstrated
that the theoretical tetraploid based on the A archaeogenome from QI92 is significantly similar to that of modern
G. hirsutum for all three retrotransposon types, as opposed to the profile constructed from the modern A genome, which sits outside the probability range observed
in G. hirsutum (fig. 4).
Discussion
Punctuated Evolution in Old World Cotton in the
Holocene
This study shows a punctuation event in G. herbaceum that
occurred over a timescale that is within the history of plant
domestication and cultivation. Comparative analysis of retrotransposon profiles from archaeological and modern
G. herbaceum genomes demonstrated significantly divergent genomic composition, indicating that the evolutionary development of the domesticate form of G. herbaceum
has been ongoing to a hitherto unappreciated level during
the course of human history.
The proportion of archaeogenomic sequences which
aligned to published RT genes (Hu et al. 2010) support
a Gorge3 expansion in the QI92 sample, an observation which
is consistent with previous reports of Gorge3 subtype TE proliferation in the modern A genome (Hawkins et al. 2006; Hu
et al. 2010). It is possible that the QI92 lineage may, like other
crop species found at this site (Palmer et al. 2009), have been
locally adapted to extreme environmental stress and that the
proliferation of Gorge3 TEs observed in the QI92 lineage may
be the signature of a transcriptional TE burst in response to
dehydration stress. The difference between the modern and
archaeogenome can be explained by a very recent copia expansion in the modern G. herbaceum lineage (Hawkins et al.
2008) that had not yet occurred at the time of the QI92 sample. These findings support a scenario of dynamic genome
evolution within the Holocene, under which domesticates
have actively responded to their environment and become
locally adapted.
Genomic Composition of Extinct A Genome Donor
to New World Cottons
It was surprising to observe such differentiation at the genomic level in the A genome lineage. Gossypium herbaceum is thought to be a relatively young species (Fryxell
1979) and the nearest extant relative of the A genome
progenitor of the AD genome tetraploids (Wendel and
Cronn 2003). The relationship between the contrasting
2035
Palmer et al. · doi:10.1093/molbev/mss070
MBE
FIG. 3. Genomic profiles of Gypsy-, Copia-, and LINE-type retrotransposons in modern and archaeobotanical cotton. Profiles of modern
Gossypium genomes were calculated from published estimates of retrotransposon copy number; Gossypium herbaceum and G. raimondii
(Hawkins et al. 2006) and G. hirsutum (Guo et al. 2008). Global locations of archaeological sites for each archaeobotanical sample are indicated
on world map. Read number attributable to retrotransposons from the archaeobotanical samples are: BQT90, n 5 850; P1, n 5 6; P2, n 5 754;
and QI92, n 5 5,026.
G. herbaceum genomes was explored as possible A genome
progenitors to the New World tetraploids by constructing
theoretical tetraploids based on the retroelement composition from the D genome donor G. raimondii and each of
the G. herbaceum genomes, respectively. The striking resemblance of the tetraploid construct using the archaeogenome as the A donor to modern G. hirsutum (figs. 3
and 4) led us to the conclusion that the profile represented
by QI92 is more likely to reflect the true A genome donor
than does the published modern G. herbaceum. This direct
ancient DNA evidence of an extinct A genome ancestor
2036
answers questions that could only be speculated on using
comparisons of modern genomes (Hu et al. 2010).
Genomic Stability of New World Cottons since
Polyploidization
The three archaeological samples from South America,
identified as G. barbadense, show retrotransposon composition close to that of the modern AD genome tetraploid
G. hirsutum (Guo et al. 2008) (fig. 3) deviating in gypsy/copia content between the archaeological and modern tetraploid samples by 4–8%. In the posterior distribution
MBE
Cotton Genome Evolution · doi:10.1093/molbev/mss070
FIG. 4. Posterior distributions of the probabilities of selecting (A) gypsy,
(B) copia, or (C) LINE retrotransposons from each of the data sets of
BQT90 (blue, n 5 850); P1 (red, n 5 6); P2 (green, n 5 754);
Gossypium hirsutum, Guo et al. (2008) (purple, n 5 69); QI92 (dotted
blue, n 5 5,026); G. herbaceum, Hawkins et al. (2006) (orange, n 5
291); G. raimondii, Hawkins et al. (2006) (dotted red, n 5 215);
theoretical AD genome construct 1; calculated from modern G.
herbaceum and G. raimondii (dotted green) and theoretical AD
genome construct 2; calculated from QI92 and G. raimondii (pink).
and among the tetraploids since formation. The profile of
the theoretical AD construct 2 differed in gyspy-like retrotransposon composition from G. hirsutum by 2.7% and differed to the archaeological G. barbadense by 7–10%. While
the deviation in TE composition between the tetraploid lineages has been small, these data show there has been greater
genomic change in the G. barbadense lineage than the
G. hirsutum lineage since tetraploid formation.
This low divergence of the tetraploids at a genomic level
is also reflected in the analysis of the nonrepetitive portion
of the archaeogenomes. Beta distribution analysis of reads
assigned to nonrepetitive DNA indicated that the older
G. barbadense sample, P2, was less divergent from the null
expected distribution than the younger G. barbadense sample, BQT90, which would be expected if the latter was in
a more advanced state of lineage sorting (fig. 2B; supplementary SI1, Supplementary Material online). If this is representative of 3,000 years of lineage sorting in the
G. barbadense lineage, and the null expected distribution
representative of the point of speciation between G. barbadense and G. hirsutum, then these two species could be
younger than has hitherto been realized.
In conclusion, the archaeogenomic analysis we present
here shows that the New World cotton tetraploids were
formed from an A genome donor that was ancestral to
the QI92 genotype, rather than a genotype resembling
the modern G. herbaceum. Both the QI92 and modern
G. herbaceum lineages experienced rapid bursts of TE activity
resulting in a punctuated change in genome content. This
punctuation was complimented by long periods of genomic
stability seen in the tetraploid lineages, giving an overall picture of evolutionary change reminiscent of punctuated equilibrium. It is possible that this observation may be extended
to other plant domesticates, which would have important
implications for the conservation of genetic resources. If
there has been extensive genome modification in many
landraces, which may be associated with local adaptation
to various conditions, then these germplasms may represent
an even more valuable resource relative to wild genetic resources than is currently appreciated.
Supplementary Material
analysis, the distributions of the AD genome tetraploid
samples overlap and are distinct of both A and D genomes
(fig. 4). BQT90 and P2 have distributions that overlap
closely, indicating that they are likely to be drawn from
highly similar underlying populations. The G. hirsutum distributions, for each retrotransposon type, encompass the
archaeological G. barbadense distributions of BQT90 and
P2. Gossypium barbadense and G. hirsutum are thought
to have had a common ancestor about 1.5 Ma (Senchina
et al. 2003). Our analysis reflects the minimal differentiation
between these two tetraploid species and indicates a gradual
genomic change over this time. The similarity that occurs
amongst the G. barbadense archaeological samples despite
separation by more than 2,000 miles in distance and 3,000
years in time implies stability in the G. barbadense lineage,
Supplementary tables S1–S9, figures S1–S4, methodology,
and SI1 are available at Molecular Biology and Evolution
online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
This work was funded by the Natural Environment Research
Council (NE/F000391/1). We thank the Egypt Exploration Society for access to the material from Qasr Ibrim and Dr Andre
Prous at the Museu de História Natural Universidade Federal
de Minas Gerais, for access to material from the Boquete
Caves. We also thank J. Wendel, C. Grover, and J. Hawkins
for supplying data and sequencing of modern Gossypium retrotransposon studies and additional G. arboreum sequencing
and Gérard Sécond for useful contacts and discussions
2037
Palmer et al. · doi:10.1093/molbev/mss070
surrounding the Loreto Viejo Peruvian samples. Thanks also
for useful comments from three anonymous reviewers. This
research was carried out at School of Life Sciences, The University of Warwick, Coventry, United Kingdom.
References
Baucom RS, Estill JC, Leebens-Mack J, Bennetzen JL. 2011. Natural
selection on gene function drives the evolution of LTR retrotransposon families in the rice genome. Genome Res. 19:243–254.
Bennetzen JL. 2005. Transposable element contributions to plant
gene and genome evolution. Plant Mol Biol. 42:251–269.
Beresford-Jones D, Arce Torres S, Whaley OQ, Chepstow-Lusty AJ.
2009. The role of Prosopsis in ecological and landscape change
in the Samaca basin, Lower Ica Valley, south coast Peru from the
Early Horizon to the Late Intermediate Period. Lat Am Antiq.
20:303–332.
Beresford-Jones DG, Whaley O, Ledesma CA, Cadwallader L. 2011.
Two millennia of changes in human ecology: archaeobotanical
and invertebrate records from the lower Ica Valley, south coast
Peru. Veg Hist Archaeobot. 20:273–292.
Clapham AJ, Rowley-Conwy PA. 2009. The archaeobotany of cotton
(Gossypium sp. L) in Egypt and Nubia with special reference to
Qasr Ibrim, Egyptian Nubia. In: Weiss E, Fairburn AS, editors.
From foragers to farmers: papers in honour of Gordon Hillman.
Oxford: Oxbow Books. p. 244–253.
Freitas FO, Bendel G, Allaby RG, Brown TA. 2003. DNA from
primitive maize landraces and archaeological remains: implications for the domestication of maize and its expansion into
South America. J Archaeol Sci. 30:901–908.
Fryxell PA. 1979. The natural history of the cotton tribe. Austin (TX):
Texas A&M University Press.
Fuller DQ, Boivin N. 2009. Crops, cattle and commensals across the
Indian Ocean: current and potential archaeobiological evidence.
Etudes Ocean Indien 42–43:13–46.
Green RE, Krause J, Briggs AW, et al. (57 co-authors) 2010. A draft
sequence of the Neandertal genome. Science 328:710–722.
Green RE, Krause J, Ptak SE, et al. (12 co-authors) 2006. Analysis
of one million base pairs of Neanderthal DNA. Nature
444:330–336.
Guo W, Cai C, Wang C, Zhao L, Wang L, Zhang T. 2008.
A preliminary analysis of genome structure and composition in
Gossypium hirsutum. BMC Genomics. 9:314.
Hawkins JS, Hu G, Rapp RA, Grafenberg JL, Wendel JF. 2008.
Phylogenetic determination of the pace of transposable element
proliferation in plants: copia and LINE-like elements in
Gossypium. Genome 51:11–18.
Hawkins JS, Kim HR, Nason JD, Wing RA, Wendel JF. 2006.
Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium.
Genome Res. 16:1252–1261.
Hawkins JS, Proulx SR, Rapp RA, Wendel JF. 2009. Rapid DNA loss
as a counterbalance to genome expansion through retrot-
2038
MBE
ransposon proliferation in plants. Proc Natl Acad Sci U S A.
106:17811–17816.
Hu G, Hawkins JS, Grover CE, Wendel JF. 2010. The history and
disposition of transposable elements in polyploid Gossypium.
Genome 53:599–607.
Junqueira PA, Malta IM. 1981. Horticultores e ceramistas préhistóricos do nordeste de Mina Gerais. Arquivos do Museu de
História Natural da UFMG. 6:275–289.
O’Donoghue K, Clapham A, Evershed RP, Brown TA. 1996.
Remarkable preservation of biomolecules in ancient radish
seeds. Proc R Soc B Biol Sci. 263:541–547.
Oliver KR, Greene WK. 2009. Transposable elements: powerful
facilitators of evolution. Bioessays 31:703–714.
Owen BD. 2009. Early agriculture in the coastal Osmore Valley, Peru:
synchronous events and macroregional processes in the
formation of Andean civilization. In: Marcus WPR Jr, editor.
Andean civilization: a tribute to Michael E. Mosely. Los Angeles
(CA): UCLA Cotsen Institute of Archaeology Press. p. 124–144.
Palmer SA, Moore JD, Clapham AJ, Rose P, Allaby RG. 2009.
Archaeogenetic evidence of ancient Nubian barley evolution from
six to two-row indicates local adaptation. PLoS One. 4:e6301.
Palmer SA, Smith O, Allaby RG. 2011. The blossoming of plant
archaeogenetics. Ann Anat. 194:146–156. doi:10.1016/j.aanat.2011.
03.012
Poinar HN, Schwarz C, Qi J, et al. (13 co-authors). 2006.
Metagenomics to paleogenomics: large-scale sequencing of
mammoth DNA. Science 311:392–394.
Ramirez O, Gigli E, Bover P, Alcover JA, Bertranpetit J, Castresana J,
Lalueza-Fox C. 2009. Paleogenomics in a temperate environment: shotgun sequencing from an extinct Mediterranean
caprine. PLoS One. 4:e5670.
Rasmussen M, Li Y, Lindgreen S, et al. (52 co-authors). 2010. Ancient
human genome sequence of an extinct Palaeo-Eskimo. Nature
463:757–762.
Schindewolf OH. 1948. Basic questions in paleontology: geologic
time, organic evolution, and biological systematics. Chicago (IL):
University of Chicago Press.
Senchina DS, Alvarez I, Cronn RC, Liu B, Rong J, Noyes RD,
Paterson AH, Wing RA, Wilkins TA, Wendel JF. 2003. Rate
variation among genes and the age of polyploidy in Gossypium.
Mol Biol Evol. 20:633–643.
Tenaillon MI, Hufford MB, Gaut BS, Ross-lbarra J. 2011. Genome size
and transposable element content as determined by highthroughput sequencing in maize and Zea luxurians. Genome Biol
Evol. 3:219–229.
Venner S, Feschotte C, Biémont C. 2009. Transposable element
dynamics: toward a community ecology of the genome. Trends
Genet. 25:317–323.
Wendel JF. 1989. New World tetraploid cottons contain Old World
cytoplasm. Proc Natl Acad Sci U S A. 86:4132–4136.
Wendel JF, Cronn RC. 2003. Polyploidy and the evolutionary history
of cotton. Adv Agron. 78:139–186.
Zaki EA, Ghany AAA. 2004. Ty3/gypsy retrotransposons in Egyptian
cotton (G. barbadense). J Cotton Sci. 8:179–185.