Download Population-Based Resequencing Reveals That the Flowering Time

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Population genetics wikipedia , lookup

Genome (book) wikipedia , lookup

Public health genomics wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Genetic engineering wikipedia , lookup

Polymorphism (biology) wikipedia , lookup

Designer baby wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Quantitative trait locus wikipedia , lookup

History of genetic engineering wikipedia , lookup

Human genetic variation wikipedia , lookup

Tag SNP wikipedia , lookup

Microevolution wikipedia , lookup

A30-Cw5-B18-DR3-DQ2 (HLA Haplotype) wikipedia , lookup

Transcript
Population-Based Resequencing Reveals That the Flowering Time Adaptation of
Cultivated Barley Originated East of the Fertile Crescent
Huw Jones,* Fiona J. Leigh,* Ian Mackay,* Mim A. Bower,à Lydia M.J. Smith,* Michael
P. Charles,§ Glynis Jones,§ Martin K. Jones,à Terence A. Brown, and Wayne Powell*1
*National Institute for Agricultural Botany, Cambridge, United Kingdom; Manchester Interdisciplinary Biocentre, University of
Manchester, Manchester, United Kingdom; àMcDonald Institute for Archaeological Research, University of Cambridge, Cambridge,
United Kingdom; and §Department of Archaeology, University of Sheffield, Sheffield, United Kingdom
Gene resequencing and association analysis present new opportunities to study the evolution of adaptive traits in crop
plants. Here we apply these tools to an extensive set of barley accessions to identify a component of the molecular basis
of the flowering time adaptation, a trait critical to plant survival. Using an association-based study to relate variation in
flowering time to sequence-based polymorphisms in the Ppd-H1 gene, we identify a causative polymorphism (SNP48)
that accounts for the observed variation in barley flowering time. This polymorphism also shows latitude-dependent
geographical distribution, consistent with the expected clinal variation in phenotype with the nonresponsive form
predominating in the north. Networks, genealogies, and phylogenetic trees drawn for the Ppd-H1 haplotypes reveal
population structure both in wild barley and in domesticated barley landraces. The spatial distribution of these population
groups indicates that phylogeographical analysis of European landraces can provide information relevant to the Neolithic
spread of barley cultivation and also has implications for the origins of domesticated barley, including those with the
nonresponsive ppd-H1 phenotype. Haplotypes containing the nonresponsive version of SNP48 are present in wild barley
accessions, indicating that the nonresponsive phenotype of European landraces originated in wild barley. The wild
accessions whose nonresponsive haplotypes are most closely similar to those of landraces are found in Iran, within
a region suggested as an area for domestication of barley east of the Fertile Crescent but which has previously been
thought to have contributed relatively little to the diversity of European cultivars.
Introduction
The domestication of ‘‘wild’’ cereals is closely associated with the transition from a hunter-gatherer to an agrarian lifestyle in western Asia, and barley is one of the earliest
domesticated crops found at Neolithic farming sites (Harlan
and Zohary 1966; Zohary and Hopf 2000). Domestication
of crop plants was accompanied by a series of genetic
changes, which has been described as the domestication
syndrome, affecting phenotypes such as seed retention,
seed size, and seed dormancy, that differentiate domesticated plants from their wild progenitors (Hillman and Davis
1990; Doebley et al. 2006; Fuller 2007). The spread of
agriculture from their area of domestication involved the
dispersal of crop plants well beyond their progenitors’ native range and may have required adaptation to new environments. The timing of flowering is a key adaptive trait
enabling plants to flower at the optimum time for pollination, seed development and, in the case of wild plants, dispersal (Cockram et al. 2007). A long annual growth season
promotes late flowering, thus allowing longer periods
of vegetative growth and increased resource storage.
Conversely, a short growth season favors early flowering
(Roux et al. 2006). Models of flowering time genes developed in the dicot Arabidopsis have been extended by comparative mapping in cereals (Decousset et al. 2000; Dunford
et al. 2002; Laurie et al. 2004). In barley, 2 major loci
determining photoperiod response have been identified:
Photoperiod-H2 (Ppd-H2) (chromosome 1H) that controls
flowering time under short days and Photoperiod-H1
1
Present address: Institute of Biological, Environmental & Rural
Sciences, Aberystwyth University, Aberystwyth, United Kingdom.
Key words: Hordeum vulgare, domestication, flowering, Neolithic,
photoperiod response, landrace.
E-mail: [email protected].
Mol. Biol. Evol. 25(10):2211–2219. 2008
doi:10.1093/molbev/msn167
Advance Access publication July 31, 2008
Ó The Author 2008. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
(Ppd-H1) (Eam1) (chromosome 2H) that controls flowering
under long-day conditions (Laurie et al. 1995). The organization and function of the barley (Ppd-H1) locus has been
described and a causative polymorphism proposed (Turner
et al. 2005). The recent isolation of the Ppd-H1 locus provides an opportunity to investigate how the full spectrum of
sequence variation in Ppd-H1, a gene involved in circadian
clock function (Gardner et al. 2006) originated and evolved
during domestication and dispersal of this major food crop.
Resequencing of candidate genes within diverse populations has considerable potential in understanding
complex human traits (Topol and Frazer 2007). Resequencing-based association studies have been used to study vernalization response and pathogen resistance in Arabidopsis
(Aranzana et al. 2005), but the full potential of these methods has yet to be fully explored in crop plants, particularly
in relation to understanding the domestication syndrome
and the selective pressures acting on crop plants during
their subsequent geographic expansion. Crop landraces
and their progenitor wild species have considerable potential as subjects for resequencing-based association studies:
the varied evolutionary histories of wild barleys and widely
dispersed landraces have generated diverse ecotypes, due to
natural or human selection, with a wide range of phenotypic
responses. Characteristically, landraces have historical origins, high local genetic adaptation, recognizable identities,
genetic diversity, a lack of formal genetic improvement,
and an association with traditional farming systems
(Camacho Villa et al. 2005). Landraces were established
in all regions where crops were grown up to the 19th century. In Europe, where improved agriculture developed in
the 19th and 20th centuries, landraces were replaced by varieties developed by selections from landraces and later by
the products of systematic plant breeding (Jones et al.
2008). The variation available within these landrace populations could provide a valuable resource to help unravel the
2212 Jones et al.
genetic mechanisms that regulate how plants adapt to different habitats.
Adaptive variation in photoperiod response may play
a role in allowing crops to be grown in new environments or
under differing agricultural practices. Variation in the photoperiod response of barley landraces from different agroecological zones has been described (Knüpffer et al. 2003),
and this variation is organized in a latitudinal cline. Similar
geographic variation has been observed for dormancy
phenotypes in the European aspen (Populus tremula) associated with genotypic variation in the candidate gene
(PHYB2) for day length–induced dormancy (Ingvarsson
et al. 2006). Phenotypic variation in hypocotyl growth in
response to light has been correlated to latitude in natural
populations of Arabidopsis thaliana from Scandinavia
(Stenøien et al. 2002). Latitudinal clines in flowering times
have also been observed in Arabidopsis modulated by the
FRIGIDA gene (Stinchcombe et al. 2004).
A monophyletic origin for barley, with a single domestication in the Israel–Jordan region, was initially suggested by
amplified fragment length polymorphism analysis and haplotype data from the gene Bkn-3 (Badr et al. 2000), but polyphyletic origins have also been proposed (Kolodinska et al.
2004; Molina-Cano et al. 2005; Azhaguvel and Komatsuda
2007). Haplotype studies at multiple loci have revealed geographic structure in wild barley populations (Morrell et al.
2003; Kilian et al. 2006), and the genotypes seen in landrace
and elite-cultivated barleys have been related to geographically localized subpopulations. These studies give further
support to the Israel–Jordan region as a center of domestication, but wild barley subpopulations from areas described by
Morrell and Clegg (2007) as lying east of the Fertile Crescent
have also contributed to the diversity seen in cultivated barley, probably via a second domestication. There is also archaeobotanical evidence to suggest that different cereal
species were domesticated independently in different areas
of Southwest Asia (Willcox 2005) suggesting that domestications occurred at a number of different locations within this
broad region.
The subsequent spread of cultivated barley into
Europe has not been subject to equivalent investigation,
but barley genetics holds the same potential for understanding the selective pressures acting on early crops and the
mechanisms of their dispersal. The patterns and chronology
of the spread of the Neolithic into Europe have been described by archaeologists studying the material cultures
linked with early agriculture (Dolukhanov et al. 2005).
Cereal agriculture spread rapidly into Greece and, from
there, by 2 principal routes through the rest of Europe (Price
2000). The southerly spread, through the Mediterranean
basin to Italy and Iberia, was apparently rapid but the northerly route more punctuated: after a rapid spread into the
Balkans, agricultural expansion seems to have slowed
down when it reached Hungary before again spreading rapidly through the Danube and Rhine valleys, followed by
further delays in the north European plain and the Alpine
foreland (Sümegi and Kertesz 2001; Gkiasta et al. 2003).
This paper quantifies variation at the Ppd-H1 photoperiod response gene in a sample of European barley landraces and wild barley accessions and addresses questions
regarding the molecular basis of adaptation of flowering
time, the origin of the adaptive trait, and the relationship
between geographically structured variation in the barley
landraces and the dispersal of agriculture into Europe.
Materials and Methods
Accessions
The Ppd-H1 gene was sequenced in 194 barley landraces from Europe (170), the Fertile Crescent (14), and Bolivia (10) and 72 wild barleys from the Fertile Crescent. The
Ppd-H1 phenotype was determined in a subset of 87 barley
landraces; this subset was also assayed to determine their
microsatellite and miniature inverted repeat transposable element (MITE) genotypes. The accessions and their collection sites are given in Supplementary Material online.
Sequencing
A 3,508-bp region spanning the Ppd-H1 gene was sequenced in both directions. DNA was extracted from one randomly selected coleoptile from each accession using
a Qiagen DNeasy96 kit. Seven overlapping polymerase
chain reactions (PCRs) were performed in 20 ll reactions
containing 1 PCR buffer (Roche Diagnostics, Burgess Hill,
UK), 1.5 mM MgCl2, 0.2 mM each deoxynucleoside triphosphate (dNTP), 0.2 M of both forward and reverse primers,
0.65 units Fast-Start Taq (Roche Diagnostics, Burgess Hill,
UK), and 50 ng genomic DNA. PCR was performed using an
Applied Biosystems 9700 PCR machine using the following
cycling conditions; 96 °C for 10 min; 40 cycles of 96 °C
for 40 s, primer-specific annealing temperature for 40 s,
and 72 °C for 1.5 min; and a final extension of 72 °C for 7
min. Primer sequences and the annealing temperatures are
given in Supplementary Material online. PCR products were
treated with Exo-SAP (New England Biolabs, Hitchin, UK,
and Promega, Southampton, UK) and sequenced (forward
and reverse) using the Applied Biosystems Big Dye Terminator 3.1 system. Sequences were aligned using Pre-Gap and
Gap (Bonfield et al. 1995; Bonfield and Staden 1996) and
polymorphic sites identified.
Microsatellite and MITE Genotyping
The genotypes at 24 genomic microsatellite loci were
determined in a series of 10 ll multiplex PCRs each containing 1 PCR buffer þ MgCl2 (Roche Diagnostics, Burgess
Hill, UK), 0.2 mM each dNTP, 0.2 M of both forward and
reverse primers, 0.65 units Taq (Roche Diagnostics, Burgess
Hill, UK), and 1.0 ll extracted DNA. Primer sequences are
given in Supplementary Material online. PCR was performed
using an Applied Biosystems 9700 PCR machine using the
following cycling conditions: 94 °C for 5 min; 7 cycles of
94 °C for 30 s, 65 °C for 30 s decreasing by 1 °C per cycle,
and 72 °C for 30 s; followed by 28 cycles of 94 °C for 30 s,
58 °C for 30 s, and 72 °C for 30 s; and followed by final
extension at 72 °C for 7 min. PCR products were separated
and visualized using an ABI 3100 genetic analyzer and data
analyzed using ABI Genemapper software. The genotypes at
10 MITE loci were determined in a series of PCRs. PCR
products were separated and visualized on 1.5% agarose gels
stained with ethidium bromide. Primer sequences and
Origin of Adaptation in Barley Flowering Time 2213
FIG. 1.—F-ratio scores for Ppd-H1 polymorphisms plotted against
their sequence position. The positions of the nonsynonymous SNPs
(black bars), synonymous SNPs (gray bars), and a cartoon of the Ppd-H1
gene are shown for reference. F-ratio scores greater than 11.6 are highly
significant (P , 0.001).
reaction conditions were supplied by Prof. A. Schulman,
University of Helsinki, Finland, under a confidentiality
agreement.
Statistical Methods
PowerMarker 3.25 (Liu and Muse 2005) was used to
calculate haplotype diversity. DnaSP (Rozas et al. 2003)
was used to calculate the nucleotide diversity (p) and
Wright’s FST and to detect recombination within the sequenced region. Phylogenetic trees used to visualize evolutionary histories were drawn using the DNADIST and
DNAML programs from the PHYLIP package (Felsenstein
2004) and the trees drawn using MrEnt (Zuccon A and
Zuccon D 2006). Networks were drawn using SplitsTree
4.6 (Huson and Bryant 2006), NETWORK 4.2.0.1 (Bandelt
et al. 1999; Anon 2007), and TCS 2.1 (Clemment et al.
2000). Coalescent ages (q) were calculated using NETWORK 4.2.0.1 (Forster et al. 1996)
Phenotyping and Association Study
The Ppd-H1 phenotype was determined in 87 barley
landraces from Europe by growing them in adjacent glasshouses under long-day and short-day conditions and comparing the flowering dates. The phenotype was expressed as
the number of days difference in flowering for long days
versus short days. Forty single nucleotide polymorphisms
(SNPs) were identified in Ppd-H1 within these 87 barley
landraces. The SNP genotypes were recoded as binary data
and tabulated together with the Ppd-H1 phenotype data. An
analysis of variance (ANOVA) was carried out on a linear
regression of Ppd-H1 phenotype against each SNP in turn
(R Development Core Team 2007). A common variance for
Ppd-H1 phenotype and SNP genotype is expected under the
null hypothesis of no association; association can be detected by an increase in the ratio of the variances (F-ratio)
(Aranzana et al. 2005; Balding 2006).
The possibility of false positives was investigated by
applying structured association (Mackay and Powell 2007).
The microsatellite data were interrogated for genetic structure within the accession set using STRUCTURE (Pritchard
FIG. 2.—Posterior probability (P) of each accession belonging to the
ppd-HI nonresponsive distribution with group membership shown for
comparison (crosses, SNP48 5 T; circles, SNP48 5 C). Accessions with
SNP48 5 T are shown to be associated with the ppd-H1 nonresponsive
phenotype.
et al. 2000) and the admixture ancestry model with a burnin of 200,000 steps and runs of 1,000,000 steps, following the
recommendations and methods of the manual. Genetic structure was investigated for population numbers (K) between 2
and 10. The optimum stable solution was given for K 5 8.
The Q-matrix output from STRUCTURE was tabulated with
the Ppd-H1 phenotype data and binary SNP data. ANOVA
was carried out on a multiple linear regression of Ppd-H1
phenotype against the coefficients of population ancestry
(Q-matrix) and each SNP in turn. Associations were detected
by an increase in F-ratios, as above, where F-ratios greater
than 11.6 were highly significant (P , 0.001). The F-ratio
for each polymorphism, estimated by structured association,
was plotted against its sequence position (see fig. 1).
The structured association method was validated by
comparison with a similar regression carried out on
10 MITE markers. The Q-matrix output from STRUCTURE
was tabulated with the Ppd-H1 phenotype data and binary
MITE data, and ANOVA was carried out on a multiple linear regression of Ppd-H1 phenotype against the coefficients
of population ancestry (Q-matrix) and each MITE. No significant association (P . 0.05) was found between the
MITEs and the Ppd-H1 phenotype.
A mixture of 2 normal distributions for the responsive
Ppd-H1 and nonresponsive ppd-H1 phenotypes, with independent means, variances, and a mixing coefficient, were
fitted to the observed phenotypes by least squares using
the Excel Solver function. The derived means and variances
were then used to calculate the posterior probability of
membership of each landrace to each of the 2 groups. These
were used to plot a cumulative probability curve where the
accessions could be identified according to their genotypes
(see fig. 2).
Results and Discussion
Identification of the Causative Polymorphism for the
Nonresponsive Phenotype
Resequencing the Ppd-H1gene in 266 barley accessions comprising 194 Hordeum vulgare landraces and
2214 Jones et al.
72 Hordeum spontaneum wild barleys identified 83 polymorphisms (78 SNPs, 4 indels, and 1 microsatellite). These
are organized into 121 haplotypes; following divergence
since domestication only 5 of these haplotypes are shared
between the landraces and wild barley. The nucleotide diversity for Ppd-H1 shows a reduction characteristic of
a ‘‘domestication bottleneck’’ between wild and cultivated
barley (table 1). The observed reduction to 78% of wild nucleotide diversity is comparable to that seen in domesticated
maize compared with its wild progenitor (Tenaillon et al.
2004).
To examine the phenotypic effects of sequence variation at Ppd-H1, a set of 87 European landraces were grown
and evaluated for differences in flowering time under longday and short-day conditions. Barley plants with the responsive phenotype (Ppd-H1) are expected to flower later
if grown under short days relative to barley plants grown
under long days, and those with the nonresponsive phenotype (ppd-H1) are expected to flower at the same time
whether they are grown under long days or short days.
The expectation is therefore a bimodal distribution for
the Ppd-H1 phenotype, and the observed phenotypes conform to this, with the modal values differing by 18 days
(supplementary fig. 1, Supplementary Material online).
These results did not appear to be complicated by any contribution from the second photoperiod gene Ppd-H2. Evidence for a phenotypic effect of variation at Ppd-H2 was
sought by examining the flowering dates for plants grown
under short-day conditions, but these dates showed no deviation from a normal distribution (Anderson-Darling’s test
for goodness of fit to the normal distribution) suggesting
that no phenotypic effect of Ppd-H2 had been detected
by our test system.
Of the 40 polymorphisms that were recorded within
Ppd-H1 for this set of landraces, 20 showed a significant
association with the responsive Ppd-H1 phenotype and
18 polymorphisms showed a highly significant (P , 0.001)
association with the Ppd-H1 phenotype after adjustment for
the effects of population structure (fig. 1 and supplementary
fig. 2, Supplementary Material online). Five polymorphisms, which are in perfect linkage disequilibrium (LD)
(r2 5 1) with each other, among the sample of 87 landraces
showed the strongest association (F-ratio 5 80.89) with the
trait. Among the 5 strongly associated polymorphisms, only
SNP48 produces a change in the translated protein. A ‘‘C’’
at SNP48 is associated with early flowering in long days
(Ppd-H1), whereas ‘‘T’’ at SNP48 is not associated with
early flowering in long days (ppd-H1). SNP79 (the causative polymorphism proposed by Turner et al. [2005]) is not
among the 5 polymorphisms that have the strongest association with the Ppd-H1 phenotype (F-ratio for SNP79 5
64.02) and are in perfect LD (r 5 0.86 for LD between
SNP48 and SNP79). Although this analysis associates phenotype with genotype at SNP48, a more certain link could
be obtained by observing the Ppd-H1 phenotype in near
isogenic lines transformed to harbor the alternate allelic
forms.
Each accession was assigned a probability of belonging either to the Ppd-H1 responsive group or to the ppd-H1
nonresponsive group by fitting a mixture model to the quantitative phenotype. When these posterior probabilities were
Table 1
Diversity of Ppd-H1 Haplotypes in Wild and Cultivated
Barley
Group
All accessions
Wild barley
Landraces
Number of Number of Haplotype
Accessions Haplotypes Diversity
266
72
194
121
60
66
0.97
0.99
0.95
Nucleotide
Diversity (p)
0.0038 ± 0.00001
0.0040 ± 0.00003
0.0031 ± 0.00001
used to plot a cumulative probability curve, the landraces
are identified by their genotypes at SNP48 and the association between SNP48 and Ppd-H1 phenotype can be seen
(fig. 2), supporting the conclusion that SNP48 is indeed the
causative polymorphism.
Latitudinal Cline for Ppd-H1 Phenotypes in European
Landraces
Barley with the responsive phenotype (Ppd-H1) is expected to predominate in regions where the growing season
is short and with a dry summer; conversely, the nonresponsive phenotype (ppd-H1) should predominate in regions
with a long growing season and a moist summer. In Europe,
this should result in a latitudinal cline for flowering time.
This pattern is evident in the geographic distribution of
the landraces, assigned to their predicted phenotype on
the basis of SNP48 (fig. 3A). We divided the landraces
by latitude into numerically equal sized classes and plotted
the frequency of each group for each class (fig. 3B). The
results show that the proportion of landraces with a C at
SNP48 (predicted to be responsive Ppd-H1) is higher in
southern latitudes and the proportion with T at SNP48 (predicted to be nonresponsive ppd-H1) increases in northern
latitudes. This latitudinal cline has a correlation coefficient
of r 5 0.46 that is highly significant (P , 0.01) when
tested by permutation (n 5 999). A latitudinal cline drawn
in the same way for SNP79 (proposed to be causative by
Turner et al. [2005]) has a correlation coefficient r 5 0.13
that is not significant (P . 0.05) when tested by permutation (n 5 999). The pattern for SNP48 is consistent with
the expected latitude-dependent distribution with the nonresponsive form predominating in the north, offering further support to the conclusion that SNP48 is the
causative polymorphism.
Evolutionary Relationships between Ppd-H1 Haplotypes
The relationships between all 121 Ppd-H1 haplotypes
were examined by constructing networks with NETWORK
4.2.0.1 and TCS 2.1. The resulting topologies (fig. 4) are
virtually identical and are in broad agreement with the relationships shown by a network drawn using SplitsTree 4.6,
with Neighbor-Joining trees produced by using DNADIST
and NEIGHBOR, and with a maximum likelihood tree produced by DNAML.
Five distinct groups of haplotypes are seen within the
networks. All haplotypes with a T at SNP48, and hence predicted to give rise to the nonresponsive ppd-H1 phenotype,
cluster together in a single group (Group A, highlighted
Origin of Adaptation in Barley Flowering Time 2215
haplotypes. Most of the wild barleys in this group have origins strongly resembling the geographical range ‘‘Zagros
and further east’’ (Morrell and Clegg 2007), considered
by those authors as harboring a subpopulation genetically
distinct from the wild population in the western Fertile
Crescent. The 2 landraces in Group D both come from
Syria. The final group, E (purple in fig. 4), comprises haplotypes found only in wild barleys. In the Neighbor-Joining
and maximum likelihood trees, Group E has the deepest
root, suggesting that it is ancestral to the other 4 groups.
The networks show evidence of character conflict arising from recombination, but most reticulations are confined
within groups, rather than occurring between haplotypes in
different groups, even when the parsimony settings in TCS
are reduced. This topology therefore indicates that the
5 groups listed above have evolved as independent populations, with some recombination occurring between accessions belonging to each individual population (with the
apparent exception of the ancestral Group E) but with only
limited gene flow between populations.
Implications for the Origins of Domesticated Barley
FIG. 3.—Latitudinal cline of Ppd-H1 haplotypes in Europe. (A)
Distribution of landraces with Ppd-H1 haplotypes in which SNP48 is
either C (open circles; predicted to be responsive Ppd-H1) or T (closed
circles; predicted to be nonresponsive ppd-H1). (B) The same results
expressed as a histogram with landraces divided into 6 numerically equal
sized classes according to latitude.
blue in fig. 4). This group comprises 20 haplotypes
found only in landraces, 6 found only in wild barley,
and 1 shared haplotype. Of the 18 polymorphisms identified
as significantly associated with the Ppd-H1 phenotype, only
SNP48 partitions congruently among the members of
Group A and all other accessions in this way. In contrast,
accessions harboring a ‘‘G’’ at SNP79 (predicted to be ppdH1 by Turner et al. [2005]) partition between Groups A, B,
and C (73, 5, and 7 accessions, respectively). The 100%
coincidence of T at SNP48 and membership of Group A
therefore provides further support that SNP48 is the causative polymorphism.
The remaining 4 groups contain haplotypes with a C at
SNP48 (predicted Ppd-H1). The majority of the responsive
haplotypes that are found only in landraces fall into
2 groups, B and C (highlighted yellow and green, respectively, in fig. 4). Group B comprises 18 landrace haplotypes, 3 wild, and 1 shared, and Group C comprises
20 landrace haplotypes, 9 wild, and 3 shared. All the landraces possessing Group B or C haplotypes come from
Europe, and all the wild barleys come from the western
arm of the Fertile Crescent. The fourth group, D (red in
fig. 4), is made up of 24 wild haplotypes and 2 landrace
Haplotypes found only in wild barley accessions are
broadly distributed across the network and, in particular,
are colocated with European landrace haplotypes in each
of Groups A, B and C. Hence, both the day length responsive form of cultivated barley (Ppd-H1, Groups B and C)
and the nonresponsive form (ppd-H1, Group A) originate
from wild barley, rather than one type evolving from the
other as a result of mutation during dispersal of landraces
in Europe.
First, we consider the responsive types. Sixteen of the
38 haplotypes in Groups B and C are found in wild barley,
all these accessions located in Turkey, the Israel–Jordan region, or another part of the western Fertile Crescent. All the
landraces with Group B or C haplotypes come from Europe.
These results are therefore consistent with an origin for the
responsive landraces in the western Fertile Crescent, possibly the Israel–Jordan region as previously suggested
(Badr et al. 2000; Kilian et al. 2006). Division of these haplotypes into 2 groups could suggest that there were 2 domestications in this region, but the reticulations in the
basal regions of these 2 groups leave possible a single origin
with subsequent splitting of the domesticated population.
Group D also comprises responsive haplotypes present
in both wild barley and landraces. Most of these wild accessions originate to the east of the Fertile Crescent and
have previously been shown to be genetically distinct from
the western wild populations (Morrell and Clegg 2007).
The absence from Group D of haplotypes found in European landraces suggests that the eastern wild population
has not contributed to functional diversity of the Ppd-H1
phenotype in European germplasm. Group D does, however, include 2 domesticated haplotypes, present in 2 landraces from Syria. The positioning of these haplotypes
within Group D indicates that they arose from the wild
Group D accessions, in agreement with the suggestion that
there was a second domestication of barley from the eastern
wild population (Morrell and Clegg 2007). Our results
2216 Jones et al.
FIG. 4.—Relationships between Ppd-H1 haplotypes shown as networks constructed with (A) NETWORK and (B) TCS. The nonresponsive wild
Iranian and Israeli accessions in Group A are marked with red and blue asterisks, respectively. The 2 Iranian accessions from the Zagros region,
suggested by recombination breakpoint analysis to be most closely related to the Group A landraces, are indicated by double asterisks.
predict that the landraces from South and East Asia that derive from this second domestication (Saisho and Purugganan 2007) have Ppd-H1 haplotypes falling within
Group D.
Now we consider the nonresponsive haplotypes, all of
which fall within Group A. The 7 of these haplotypes that
we located in wild barley occur in accessions from Israel (3
haplotypes from 22 Israeli accessions) and Iran (4 haplotypes from 20). The topologies of the networks place the
Israeli haplotypes at the base of Group A, with the Iranian
haplotypes in more derived positions reticulated with the
nonresponsive landraces. Although it is possible that other
populations of nonresponsive barley exist in the wild, the
implication of our results is that the ppd-H1 phenotype of
European landraces originated in wild barley in Iran, lending support to the earlier finding of an eastern center for
barley domestication (Morrell and Clegg 2007). An Iranian
origin for the nonresponsive landraces is also supported by
detailed analysis of recombination breakpoints in the Group
A haplotypes. Eighteen recombination breakpoints were
detected within the Ppd-H1 locus by the ‘‘4 gamete’’ test
(Hudson and Kaplan 1985), 15 of these present in wild barley haplotypes and 10 in landraces (see Supplementary Material online). The presence of 8 ‘‘private’’ recombinants in
wild barley compared with just 3 in landraces is consistent
with the reduction in diversity expected during the domestication of a wild species. Examination of the subhaplotypes
in the intervals between recombination breakpoints detected several cases where a subhaplotype rare (,10%)
in wild barley is common (.40%) in the landraces. The
wild barleys harboring these rare subhaplotypes can be interpreted as founders of those landrace lineages that harbor
the same subhaplotypes. At breakpoint intervals between
positions 498–734 and 634–788 of the sequenced region,
subhaplotypes that are rare in wild barley were present
in 2 of the Iranian Group A accessions but not in any of
the Israeli ones. These subhaplotypes are present in all
97 of the landraces in Group A, strong evidence that these
landraces are related to the Iranian wild accessions. The network topology is consistent with linear descent from wild to
landrace and hence with a separate domestication event giving rise to the nonresponsive landraces, but the alternative—transfer by admixture between wild and existing
landraces—cannot be discounted. The foothills of the Zagros Mountains are among the 3 general regions suggested
as centers for a secondary domestication of barley (Morrell
and Clegg 2007). One of the 2 wild barleys that our analysis
suggests as likely progenitors of the nonresponsive cultivated barleys was sampled in this region (HOR 2882,
32°33#N, 48°33#E). The second of these accessions originates in the north of the Zagros Range (HOR 2680
36°30#N, 48°42#E).
Implications for Future Studies of the Spread of Barley
Cultivation in Europe
Our results are also relevant to attempts to use genetic
and phylogeographical analysis of cereal landraces as
a means of studying the spread and subsequent development
of agriculture in Europe. The European landraces possessing
haplotypes falling into Groups A, B, and C are highly differentiated at the Ppd-H1 locus (FST (A vs. B) 5 0.84,
FST (A vs. C) 5 0.83, and FST (B vs. C) 5 0.76) suggesting
that they are distinct lineages with differing geographic distributions and different dispersal histories. A possible differentiation between the Groups B and C haplotypes is
visible when the landrace distribution across Europe is plotted (fig. 5), Group B landraces predominating in Southern
Europe in regions with a Mediterranean coast, and Group C
being distributed across Europe from the North European
Plain, the northern Balkans through to western Europe.
Origin of Adaptation in Barley Flowering Time 2217
FIG. 5.—Distribution in Europe of landraces with Ppd-H1 haplotypes
belonging to Group B (open circles) and C (triangles).
Growth studies (Jones H, unpublished data) have provided
no indication of any phenotypic difference between Groups
B and C accessions that might lead to differential selective
pressure on a north–south cline. The geographic distributions of the Groups B and C landraces show a striking similarity to the original distributions expected if Group B had
spread into Europe mainly by the southern route along the
Mediterranean and Group C by the northern route through
Central Europe. This in itself does not advance our archaeological knowledge as the existence of these routes is well
established, but it does suggest that modern landraces preserve at least some of the phylogeographical pattern set up
during the original Neolithic spread of agriculture, despite
the movement of germplasm anticipated during the 7000–
8000 years since Europe became domesticated. If this interpretation is correct, then more detailed analysis of the
genotypes of European landraces (of neutral as well as
adaptive markers) might be used to probe aspects of agricultural spread that are not fully resolved, such as the
degree of contact between the 2 trajectories as measured
by the amount of admixture at locations such as the Alpine
foreland where human populations along the 2 principal
routes may have come into contact.
The predominant distribution of the nonresponsive
Group A haplotypes in northern Europe has already been
described (fig. 3). Although this distribution is likely to
reflect the outcome of continuing selective pressure, the
emergence of this group raises questions about the spread
of ppd-H1 haplotypes into Europe. Our data show that the
nonresponsive haplotypes characterizing the northern part
of the cline evolved directly from wild barley populations
and are therefore not the result of de novo mutation within
cultivars but rather a dispersal from east of the Fertile Crescent. If the ppd-H1 haplotypes entered the cultivated gene
pool before agriculture spread out of Southwest Asia, then it
is conceivable that these haplotypes spread into Europe
with the initial Neolithic expansion of farming, selection
during or subsequent to this spread establishing the greater
frequency of ppd-H1 in northern Europe. Alternatively,
ppd-H1 haplotypes could have spread independently to
northern Europe at some period after the initial expansion
of agriculture. Our data do not provide any clear indication
of the period, if any, that elapsed between the original
spread of agriculture and the arrival of nonresponsive haplotypes in Europe, though the possibility that there was a delay between the 2 events is suggested by calculating the
coalescent age (q) (Forster et al. 1996) of each group of
landraces. Likely founder sequences, that occupy a central
position within the phylogeny, are suggested by the topology of the network for each group of European landraces.
Their status as likely founders is supported by their frequency within the European landraces, by their similarity
to haplotypes seen in wild barley, and by their inclusion
of accessions originating near to likely dispersal routes into
Europe. The coalescent age of each group of landraces was
calculated, based on the network, as the average mutational
distance from the likely founding sequence. The value for
the Group A landraces (1.61) is lower than the values for the
Group B or C landraces (2.13 and 2.36, respectively), possibly suggesting that barley with the nonresponsive form
(ppd-H1) dispersed into Europe more recently than barley
with the responsive form. If the Group A haplotypes do represent a later spread of domesticated barley into Europe
from Iran, as suggested for crops such as the oil plant Lallemantia, introduced during the Bronze Age (Jones and
Valamoti 2005), this raises the possibility that these
nonresponsive landraces spread to northern Europe not
via Turkey and Greece but through Transcaucasia and
around the north coast of the Black Sea.
Conclusions
By population-based resequencing of the photoperiod
response gene Ppd-H1 in an extensive sample of barley
landraces, we are able to discriminate between candidate
causative polymorphisms by the strength of statistical association with phenotype and to identify a new causative
polymorphism, SNP48, that has a stronger association with
phenotype than that proposed by Turner et al. (2005). Identification of SNP48 as the causative polymorphism is supported by the geographic distribution of SNP48 alleles
across Europe, which displays a latitudinal cline consistent
with the expected distribution of the Ppd-H1 phenotype,
and by phylogenetic analysis, which places all Ppd-H1 haplotypes containing the nonresponsive version of SNP48 in
a single group. Haplotypes containing the nonresponsive
version of SNP48 are also present in wild barley accessions,
indicating that the nonresponsive phenotype of European
landraces originated in wild barley. The wild accessions
whose nonresponsive haplotypes are most closely similar
to those of landraces are found in Iran, within a region suggested as an area for domestication of barley east of the
Fertile Crescent (Morrell and Clegg 2007). This eastern
origin was the progenitor of cultivated barley now found
in Central Asia and the Far East but has previously been
thought to have contributed relatively little to the diversity
of European cultivars (Morrell and Clegg 2007; Saisho and
Purugganan 2007). By revealing that this eastern region
was the origin of the nonresponsive version of Ppd-H1
found in Europe, we show that although domestications
in this region had only limited impact on the overall genetic
2218 Jones et al.
diversity of European barley, the impact on the phenotype
of European landraces and modern cultivars derived from
them has been significant.
Supplementary Material
Supplementary figures 1–3 tables 1–4 are available at
Molecular Biology and Evolution online (http://www.mbe.
oxfordjournals.org/).
Acknowledgments
This work was funded by the Natural Environment
Research Council as part of the consortium project ‘‘The
Domestication of Europe’’ and by National Institute for Agricultural Botany. We thank the consortium project partners
for helpful advice, in particular Robin Allaby (University of
Warwick), Keri Brown (University of Manchester), and
Amy Bogaard (University of Oxford) as well as David
Laurie and James Beales (John Innes Centre). We also
thank Phillipa Smallman (NIAB) for carrying out the MITE
genotyping.
Literature Cited
Anon. 2007. Network 4.2.0.1 user guide [Internet]. Fluxus
Technology Ltd., Clare, Suffolk, UK. [cited 17 Aug 2008].
Available from: http://www.fluxus-technology.com.
Aranzana MJ, Kim S, Zhao K, et al. (17 co-authors). 2005.
Genome-wide association mapping in Arabidopsis thaliana
identifies previously known genes responsible for variation in
flowering time and pathogen resistance. PLoS Genet. 1:e60.
Azhaguvel P, Komatsuda T. 2007. A phylogenetic analysis based
on nucleotide sequence of a marker linked to the brittle rachis
locus indicates a diphyletic origin of barley. Ann Bot.
100:1009–1015.
Badr A, Muller K, Schafer–Pregl R, El Rabey H, Effgen S,
Ibrahim HH, Pozzi C, Rohde W, Salamini F. 2000. On the
origin and domestication history of barley (Hordeum
vulgare). Mol Biol Evol. 17:499–510.
Bandelt H-J, Forster P, Röhl A. 1999. Median-joining networks
for inferring intraspecific phylogenies. Mol Biol Evol.
16:37–48.
Balding DJ. 2006. A tutorial on statistical methods for population
association studies. Nat Rev Genet. 7:781–791.
Bonfield JK, Smith KF, Staden R. 1995. A new DNA sequence
assembly program. Nucleic Acids Res. 24:4992–4999.
Bonfield JK, Staden R. 1996. Experiment files and their
application during large-scale sequencing projects. DNA
Seq. 6:109–117.
Camacho Villa TC, Maxted N, Scholten M, Ford-Lloyd B. 2005.
Defining and identifying crop landraces. Plant Genet Res.
3:373–384.
Clemment M, Posada D, Crandall KA. 2000. TCS: a computer
programme to estimate gene genealogies. Mol Ecol.
9:1657–1660.
Cockram J, Jones H, Leigh FJ, O’Sullivan D, Powell W,
Laurie DA, Greenland AJ. 2007. Control of flowering time in
temperate cereals: genes, domestication, and sustainable
productivity. J Exp Bot. 58:1231–1244.
Decousset L, Griffiths S, Dunford RP, Pratchett N, Laurie DA.
2000. Development of STS markers closely linked to the Ppd-
H1 photoperiod response gene of barley. Theor Appl Genet.
101:1202–1206.
Doebley JF, Gaut BS, Smith BD. 2006. The molecular genetics
of crop domestication. Cell. 127:1309–1321.
Dolukhanov S, Shukarov A, Gronenborn D, Sokoloff D,
Timofeev V, Zaitseva G. 2005. The chronology of Neolithic
dispersal in Central and Eastern Europe. J Archaeol Sci.
32:1441–1458.
Dunford RP, Yano M, Kurata N, Sasaki T, Huestis G,
Rocheford T, Laurie DA. 2002. Comparative mapping of
the barley Ppd-H1 photoperiod response gene region, which
lies close to a junction between two rice linkage segments.
Genetics. 161:825–834.
Felsenstein J. 2004. PHYLIP, phylogeny inference package,
version 3.6. Seattle (WA): University of Washington.
Forster P, Harding R, Torroni A, Bandelt H. 1996. Origin and
evolution of native American mtDNA variation: a reappraisal.
Am J Hum Genet. 59:935–945.
Fuller DQ. 2007. Contrasting patterns in crop domestication and
domestication rates: recent archaeobotanical insights from the
Old World. Ann Bot. 100:903–924.
Gardner MJ, Hubbard KE Hotta CT, Dodd AN, Webb AAR.
2006. How plants tell the time. Biochem J. 397:15–24.
Gkiasta M, Russell T, Sherman S, Steele J. 2003. Neolithic
transition in Europe: the radiocarbon record revisited.
Antiquity. 77:45–62.
Harlan JR, Zohary D. 1966. Distribution of wild wheats and
barley. Science. 153:1074–1080.
Hillman GC, Davis MS. 1990. Domestication rates in wild-type
wheats and barley under primitive cultivation. Biol J Linn
Soc. 38:39–78.
Hudson RR, Kaplan NL. 1985. Statistical properties of the
number of recombination events in the history of a sample of
DNA sequences. Genetics. 111:147–164.
Huson DH, Bryant D. 2006. Application of phylogenetic
networks in evolutionary studies. Mol Biol Evol. 23:254–267.
Ingvarsson PK, Garcia MV, Hall D, Luquez V, Jansson S. 2006.
Clinal variation in phyB2, a candidate gene for day-lengthinduced growth cessation and bud set, across a latitudinal
gradient in European aspen (Populus tremula). Genetics.
172:1845–1853.
Jones G, Valamoti SM. 2005. Lallemantia, an imported or
introduced oil plant in Bronze Age northern Greece. Veg Hist
Archaeobot. 14:571–577.
Jones H, Lister DL, Bower MA, Leigh FJ, Smith LM, Jones MK.
2008. Approaches and constraints of using existing landrace
and extant plant material to understand agricultural spread in
prehistory. Plant Genet Res. 6:98–112.
Kilian B, Özkan H, Kohl J, von Haeseler A, Barale F, Deusch O,
Brandolini A, Yucel C, Martin W, Salamini F. 2006.
Haplotype structure at seven barley genes: relevance to gene
pool bottlenecks, phylogeny of ear type and site of barley
domestication. Mol Genet Genomics. 276:230–241.
Knüpffer H, Terentyeva I, Hammer K, Kovaleva O, Sato K.
2003. Ecogeographical diversity—a Vavilovian approach. In:
von Bothmer R, van Hintum T, Knüpffer H, Sato K, editors.
Diversity in barley. Amsterdam (The Netherlands): Elsevier.
p. 53–76.
Kolodinska BA, Bothmer RV, Dayteg C, Rashal I, Tuvesson S,
Weibull J. 2004. Inter simple sequence repeat analysis of
genetic diversity and relationships in cultivated barley of
Nordic and Baltic origin. Hereditas. 141:186–192.
Laurie DA, Griffiths S, Dunford RP, Christodoulou V,
Taylor SA, Cockram J, Beales J, Turner A. 2004. Comparative genetic approaches to the identification of flowering time
genes in temperate cereals. Field Crops Res. 90:87–99.
Origin of Adaptation in Barley Flowering Time 2219
Laurie DA, Pratchett N, Bezant JH, Snape JW. 1995. RFLP
mapping of five major genes and eight quantitative trait loci in
a winter spring barley (Hordeum vulgare L.) cross.
Genome. 38:575–585.
Liu K, Muse SV. 2005. PowerMarker: an integrated analysis
environment for genetic marker analysis. Bioinformatics.
21:2128–2129.
Mackay I, Powell W. 2007. Methods for linkage disequilibrium
mapping in crops. Trends Plant Sci. 12:57–63.
Molina-Cano JL, Russell JR, Moralejo MA, Escacena JL,
Arias G, Powell W. 2005. Chloroplast DNA microsatellite
analysis supports a polyphyletic origin fro barley. Theor Appl
Genet. 110:613–619.
Morrell PL, Clegg MT. 2007. Genetic evidence for a second
domestication of barley (Hordeum vulgare) east of the Fertile
Crescent. Proc Natl Acad Sci USA. 104:3289–3294.
Morrell PL, Lundy KE, Clegg MT. 2003. Distinct geographic
patterns of genetic diversity are maintained in wild barley
(Hordeum vulgare ssp. spontaneum) despite migration. Proc
Natl Acad Sci USA. 100:10812–10817.
Price TD, editor. 2000. Europe’s first farmers. Cambridge:
Cambridge University Press.
Pritchard JK, Stephens M, Donnelly P. 2000. Inference of
population structure using multilocus genotype data. Genetics.
155:945–959.
R Development Core Team. 2007. R version 2.5.1 [Internet].
Vienna (Austria): R Foundation for Statistical Computing.
Available from: http://www.R-project.org.
Roux F, Touzet P, Cuguen J, Le Corre V. 2006. How to be earlyflowering: an evolutionary perspective. Trends Plant Sci.
11:375–381.
Rozas J, Sánchez-DelBarrio JCN, Messeguer X, Rozas R. 2003.
DnaSP, DNA polymorphism analyses by the coalescent and
other methods. Bioinformatics. 19:2496–2497.
Saisho D, Purugganan MD. 2007. Molecular phylogeography of
domesticated barley traces expansion of agriculture in the Old
World. Genetics. 177:1765–1776.
Stenøien HK, Fenster CB, Kuittinen H, Savolainen O. 2002.
Quantifying latitudinal clines to light responses in natural
populations of Arabidopsis thaliana (Brassicaceae). Am J Bot.
89:1604–1608.
Stinchcombe JR, Weinig C, Ungerer M, Olsen KM, Mays C,
Halldorsdottir SS, Purugganan MD, Schmitt J. 2004. A
latitudinal cline in flowering time in Arabidopsis thaliana
modulated by the flowering time gene FRIGIDA. Proc Natl
Acad Sci USA. 101:4712–4717.
Sümegi P, Kertesz R. 2001. Palaeogeographic characteristics of
the Carpathian Basin—an ecological trap during the early
Neolithic? In: Kertész R, Makkay J, editors. From the
Mesolithic to the Neolithic. Budapest (Hungary): Archaeolingua. p. 405–415.
Tenaillon MI, U’Ren J, Tenaillon O, Gaut BS. 2004. Selection
versus demography: a multilocus investigation of the
domestication process in maize. Mol Biol Evol.
21:1214–1225.
Topol EJ, Frazer KL. 2007. The resequencing imperative. Nat
Genet. 39:439–440.
Turner A, Beales J, Faure S, Dunford RP, Laurie DA. 2005. The
pseudo-response regulator Ppd-H1 provides adaptation to
photoperiod in barley. Science. 310:1031–1034.
Willcox G. 2005. The distribution, natural habitats and
availability of wild cereals in relation to their domestication
in the Near East: multiple events, multiple centres. Veg Hist
Archaeobot. 14:534–541.
Zohary D, Hopf M. 2000. Domestication of plants in the Old
World. 3rd ed. Oxford: Oxford University Press.
Zuccon A, Zuccon D. 2006. MrEnt v12 program. Stockholm
(Sweden): Swedish Museum of Natural History.
Beth Shapiro, Associate Editor
Accepted July 25, 2008