Download Seed bank - Section of population genetics

Document related concepts

NEDD9 wikipedia , lookup

Hybrid (biology) wikipedia , lookup

Polymorphism (biology) wikipedia , lookup

Genetically modified organism containment and escape wikipedia , lookup

Koinophilia wikipedia , lookup

Population genetics wikipedia , lookup

Microevolution wikipedia , lookup

Transcript
Mind the gaps: Theoretical and empirical aspects of
plant ecology and population genetics
S. chilense
S. peruvianum
University of Uppsala 26.02.2013
Aurélien Tellier
Populationsgenetik
TUM Weihenstephan. Freising
Seed banks and seed dormancy (1)
 Many plant species present variation in seed dormancy and long term seed banks
 If conditions are predictable within a year
No dormancy
time = 1 year
With dormancy**
favorable conditions
Seed banks and seed dormancy (2)
 In a variable and unpredictable environment over many years!
No seed bank = germination only by environmental clues or cyclic
Seed bank** = damp the effect of bad years = bet-hedging
Evolution of seed banks
 Seed banks may evolve as bet-hedging strategies
 If the environment is stochastically variable or competition between species generates
temporal variability
(Cohen 1966 J Theor Biol; Snyder and Adler 2011 Am Nat)
 Bet hedging strategies such as germ banking are ubiquitous to many species of:
 Bacteria (Jones and Lennon 2011 Nat Rev Microbiol)
 Invertebrates: diapause in insects, eggs banks in crustaceans (Daphnia)
a) Arbuscular mycorrhizal fungus, b) Cyanobacteria
c) Bacterial biofilm (P. aeruginosa), d) Bacteria Viridibacillus arvi
© J. Wolinska
From Jones and Lennon. 2011 Nat Rev Microbiol
Nature 2007
Importance of seed banks (1)
 Seed banks are important for conservation biology
 Promote the temporal rescue effect (Brown and Kodric-Brown 1977 Ecology)
 Decreasing the likelihood of population extinction
 Seed banks promote storage of diversity in the soil => time lag between above-ground
and seed banks
 Selection is slower (Hairston and De Stasio 1988 Nature)
 Balancing selection is favored (Turelli et al. 2001 Evolution)
 Coevolutionary dynamics are stabilized
(Tellier and Brown 2009 Am Nat)
Year 1
Year 2
Year 3
Linanthus parryae ©Bierzychudek lab
Year 4
Germination rate
Hypothetical phylogeny of Solanum section Lycopersicon
S. lycopersicum
S. pimpinellifolium
S. cheesmanii
Self
Compatible
S. neorickii
S. chmielewskii
S. arcanum
S. peruvianum*
S. chilense*
Self
Incompatible
S. habrochaites
S. pennellii
Solanum sp.
From Peralta, Spooner, Knapp (2008)
Background: Two wild tomato species
Solanum peruvianum
Solanum chilense
generalist species found in wide range
of habitats
specialist species found in dry to very
dry habitats
Städler et al. 2008 Genetics
Photos © TGRC. T. Städler
Plant populations are like iceberg
plants above ground = census size = the tip of the iceberg
from the Tomato Genetics Ressource Center
database (Davis. USA)
S. chilense
S. peruvianum
total number populations in database
147
304
Mean number of plants above ground (Ncs)
33 - 154
44 - 185
Maximum number of plants observed
400
600
total number of populations (From Nakazato et al. 2010 Am J Bot):
526 (S. peruvianum) and 428 (S. chilense)
The hidden part: Effective population size
Effective population size (Ne) reflects genetic diversity
Key question
Why is Ne so different from Ncs ???
Species
Population
Ne
Tarapaca
S. peruvianum
S. chilense
Arequipa
Nazca
Ncs (census size)
≈150
1.23106
9
10
Canta
≈300
Antofagasta
≈300
Tacna
Moquegua
Quicacha
1.06106
42
≈200
≈40
Our objectives
Objective 1:
 Reveal the existence of seed banks over evolutionary time scale without extensive
sampling of seeds and above-ground populations
 e.g. for species where sampling is complicated
Objective 2:
 What are the consequences of seed banks on statistical inference of past
demographic events?
 Of interest in conservation biology where DNA sequences are increasingly used to detect
past or recent crash of populations?
Objective 3:
 What are the consequences of seed banks on statistical inference for speciation
models?
 Can seed banks affect the detection of introgression between closely related species?
Part 1:
Evidence for the existence of seed banks
in wild tomato species
Work with Stefan Laurent, Wolfgang Stephan
? Metapopulation, Seed Bank or both ?
Spatial structuring of populations
Seed banks in plants
Germination
rate=b
Year 1
Year 2
Year 3
Year 4
Ne=Ncsnd+nd /4mig
Number of
individuals per
population
Number of
demes
Ne=Ncs(1/b)2
Migration rate
 Demes are linked by limited gene flow
(migration of seeds or pollen)
(Pannell and Charlesworth 2000; Wakeley and Aliacar 2001)
Number of individuals
per population
Germination
rate
 There is storage of diversity in the soil for
several years
(Nunney 2002; Kaj et al. 2001; Vitalis et al. 2004)
Overview of our approach
Goal
Explain the discrepancy between Ncs and Ne
 Are seed banks needed to generate the high observed diversity?
 Do these two wild tomato species have different seed banks?
Method and data
 Ecological data:
population sizes (35 < Ncs < 180) and number of demes (526 or 428) => priors
 Sequences at reference loci (> 300 silent SNPs in total)
 population samples
 species wide sample
 Population genetics statistics
 nucleotide diversity, site-frequency spectrum, index of fixation (FST)
 Approximate Bayesian Computation for statistical inference
Our sampling scheme
Our model of coalescence
 Based on Kaj, Krone and Lascoux J Appl Proba 2001
 The germination process is memoryless => the germination rate decreases geometrically
with age of seeds
 b = probability for a seed to germinate after one generation (= germination rate)
 b(1-b) = probability for a seed to germinate after two generations
 b(1-b)2 = probability for a seed to germinate after three generations
 β is a composite seed bank parameter function of b and m
b(1  (1  b)m1 )

1  (1  bm)(1  b)m
 The rate of coalescence is rescaled by β2 (the size of the genealogy is affected)
 Mutation does not increase with age of seeds
 The scaled mutation rate is scaled by β along a given ancestral line
 The recombination rate and migration rate between demes are also scaled by β
ABC: island model and demography
time
time
Ancestral unique population (at time of speciation?)
with split at time T. constant population size
Expansion from an ancestral population
AND fragmentation
With split at time T
time
Crash from an ancestral population
AND fragmentation
With split at time T
S. peruvianum
past demography
= expansion
Posterior probability for each model
Posterior probability for each model
Results 1: Seed banks are necessary for high diversity
are seed banks
necessary ?
= yes
S. chilense
past demography
≈ expansion
are seed banks
necessary ?
= yes
Tellier et al. PNAS 2011
Results 2: Estimates of germination rate
Posterior density for the germination rate b
S. peruvianum
(b = 0.03 [0.011 – 0.103])
S. chilense
(b = 0.093 [0.016 – 0.2])
P < 0.001 Kolmogorov-Smirnov
We estimate the germination parameter for both species
Tellier et al. PNAS 2011
Conclusions Part 1
We estimate parameters of seed bank and metapopulation with combination of
sampling schemes and chosen statistics in a flexible Bayesian framework
 Population genetics methods and theory to gain fundamental insights into the ecology and
adaptation of wild plant species:
 S. peruvianum = generalist species
 S. chilense = specialist species
 Different adaptations for seed dormancy
 differences in rates of purifying selection (Tellier et al. 2011 Heredity)
 rates of local adaptation (Xia et al. 2010 Mol Ecol. Fischer et al. 2011 New Phytol)
 inter-population differentiation (Arunyawat et al. 2007 MBE. Städler et al. 2008 Genetics)
Part 2:
Seed banks and statistical inference of simple
demographic events
Work with Daniel Živković
Rationale
 Application: In conservation biology, polymorphism analyses are used increasingly to
determine if populations show recent decline
 In our model from Kaj et al. (2001): coalescence occurs at slower rate (trees are longer by a
factor β2)
Scale of tree
function of β2
Time t in past
No seed bank
With seed bank
Rationale
 Application: In conservation biology, polymorphism analyses are used increasingly to
determine if populations show recent decline
 In our model from Kaj et al. (2001): coalescence occurs at slower rate (trees are longer by a
factor β2)
Scale of tree
function of β2
Time t in past
No seed bank
With seed bank
Our hypothesis
seed banks affect the scale of the coalescent tree => affect the inference of past events
Results 1: allele frequency-spectrum (SFS)
 We compute the SFS for a model with seed bank and varying population size
β =1
β =0.6
β =0.2
β =1
β =0.6
β =0.2
Different past demography can result in the same (relative) allele frequency spectrum
Results 1: expectations
 For a simple demographic scenario of population expansion
Time t2 in past
Growth rate R2
Time t1 in past
Growth rate R1
No seed bank
With seed bank
 We see that an expansion that occurred at time t2 in the past and with growth rate R2 in a
population with seed bank
 has equal SFS than more recent expansion at time t1 (t2 >> t1 )
 AND with larger growth rate R1 (R2 << R1 )
 in a population without seed bank
Our approach: test of expectation
No seed bank
Time t1 in past
Growth rate R1
With seed bank
Time t2 in past
Growth rate R2
 We simulate data with seed banks with various germination rate β
 Then estimate the parameters of the model in a population without seed bank
 Using Approximate Bayesian Computation (ABC) and SFS statistics
Results 2: for past expansion
 Over 500 datasets, we calculate errors in estimating the growth rate R
Knowing the seed bank rate
Long seed banks
Short seed banks
Ignoring the seed bank
Long seed banks
Short seed banks
Large error bias for estimating the time and growth rate of a past expansion!!
Results 3: more complex demographic scenarios
 For more complex demographic models, it may even be worst
 Various expected SFS can be obtained for one set of model parameters but for different
germination rates (β)
β =1
β =0.6
β =0.2
Živković and Tellier 2012 Mol Ecol
Results 3: more complex demographic scenarios
 For more complex demographic models. it may even be worst
 Various expected SFS can be obtained for one set of model parameters but for different
germination rates (β)
β =1
β =0.6
β =0.2
Good news: seed bank give us access to more ancient demographic events
Bad news: we need to know the germination rate
Živković and Tellier 2012 Mol Ecol
Part 3:
Seed banks and statistical inference in
speciation models
Work with Anja Hörger and Katharina Böndel
Coevolution in sister species?
 Coevolution may generate balancing selection at resistance genes in plants (Stahl et al. 1999 Nature;
Tellier and Brown 2011 Annu Rev Phytopathol)
 Under balancing selection with multiple alleles (Castric et al. 2008 PLoS Genetics) =>
 Repeated adaptive introgression due to frequency-dependent selection?
 Maintenance of ancestral polymorphism without gene flow?
Adaptive introgression model
Ancestral polymorphism model
Predictions
1) Higher amount of shared polymorphism at resistance genes than genomic
background due to balancing selection
2) Higher gene flow at resistance genes than genomic background
Methods
 We study three resistance genes: Pto, Pfi1, Rin4 (Rose et al. 2005 Genetics, Rose et al. 2011 Mol Plant Pathol)
 In two sister species S. chilense and S. peruvianum with history of gene flow (Städler et al. 2008
Genetics)
 10 reference loci are sequenced (=980 SNPs)
 Species wide sampling: one individual per population (Städler et al. 2009 Genetics)
Result 1: occurrence of balancing selection
 Several tests for comparison with 10 reference loci (980 SNPs)
 Balancing selection may occur at Pto and Pfi1 in both species
 Purifying selection at Rin4 as in reference loci (Tellier et al. 2011 Heredity)
Coalescence and IM speciation model
past
Ancestral
A=4NA
present
1
Derived 1
M12
M21
2
Time since
speciation = 
Derived 2
The number of migrants per generation M12=4N1m12
 We integrate seed banks in the coalescent model (using a modification of ms software)
 Use Joint-Site Frequency Spectrum frequency to summarize SNPs in both species (Wakeley and Hey
1997 Genetics; Tellier et al. 2011 PLoS One)
The JSFS
 Joint-Site Frequency Spectrum frequency of SNPs in both species
Species 1 = S. peruvianum
0
0
Species 2 = S. chilense
1
1
2
3
A1
A2
A4
A5
A6
A7
A8
A9
4
… n2-1
5
A3
A10
2
3
A14
4
A11
A12
A13
…
n1-1
n1
n2
A15
Result 1: Relative JSFS for reference vs candidate loci
Boxplot of JSFS classes at reference
loci
A1, A2, A4, A7 = private singletons and low frequency SNPs (low at Pto)
A3, A11 = private intermediate and high frequency SNPs (excess at Pto)
A6, A13 = shared intermediate and high frequency SNPs (excess at Pto)
Power analysis of ABC on model choice
 There is an excess of shared polymorphism at the Pto locus compared to other loci
 Question: is the shared polymorphism due to gene flow or ancestral polymorphism?
 Lets look first at a model for the reference loci (neutral model for genomic background)
 Using our JSFS classes, can we distinguish between model with and without gene flow?
 Our approach: Power analysis with the ABC method on wide range of parameter values, for
each parameter combination: 500 model choices
Model 1
Model 2
Result 2: Power analysis of ABC
 Confusion matrix: how often do we find out the wrong model (Bayes Factor = 3)
Divergence (τ)
Migration (M)
WITHOUT seed banks
WITH seed banks (β=0.1)
0.005
0.02
0.95
0.99
0.005
0.2
0.95
0.99
0.005
2
0.96
0.99
0.005
20
0.95
0.98
0.05
0.02
0
0.99
0.05
0.2
0
0.99
0.05
2
0
0.99
0.05
20
0.09
0.99
0.5
0.02
0
0.98
0.5
0.2
0
0.98
0.5
2
0
0.98
0.5
20
0
0.98
5
0.02
0
0.82
5
0.2
0
0.84
5
2
0
0.84
5
20
0
0.87
Conclusion: results of ABC IM model
 We cannot distinguish models of speciation with and without gene flow based on
genomic background (independent of number of loci)
 It is very difficult to estimate divergence time and migration rate
WHY?
Because seed banks retain genetic diversity for long period of time, and enhance
the amount of shared polymorphism between species (at whole genome level)
Following our previous results, this is because coalescent trees are lengthened by
seed banks
Conclusion: Coevolution in sister species?
 We find an increase of shared polymorphism at the Pto locus under balancing selection
 BUT based on the JSFS and ABC analysis, we cannot distinguish the origin of shared
polymorphisms between:
 Repeated adaptive introgression due to frequency-dependent selection?
 Maintenance of ancestral polymorphism without gene flow?
 because we cannot estimate the occurrence of gene flow for the neutral background
Unknown unknowns of germ/seed banking
“There are known knowns. These are things we know that we know.
There are known unknowns. That is to say, there are things that we know we don't know.
But there are also unknown unknowns. There are things we don't know we don't know.”
Donald Rumsfeld
 How important are germ banks in evolution?
 Consequences for statistical inference of past events?
 Affecting selection and drift in populations, and genomic signatures of selection?
 For dating speciation times and rate of diversification in phylogenies?
Thanks to hazards of migration
Wolfgang Stephan
Stefan Laurent
Daniel Živković
Katharina Böndel
Thomas Städler
Pavlos Pavlidis, Hilde Lainer, Laura Rose
Anja Hörger
Mamadou Mboup
James Brown
Research questions
 Do long term seed banks evolve as a bet-hedging strategy? => Ecological studies
(Evans and Dennehy 2005 Q. Rev. Biol.. Evans et al 2007 Am Nat)
 What is the genetic basis for short-term seed dormancy ?
 Genetic basis. physiological adaptations and ecological conditions for long-term
seed banks? Similar bases as dormancy? Correlation between these traits?
 Are seed banks important in plant evolution ? Do many plants have seed banks?
 Influence of seed banks on adaptation and genetic variability?
© SeedNet. transcription network in seeds
Importance of seed banks (2)




Consequence of seed banks for mutation
Increase of mutation rate in seed banks? (suggested by Levin 1990 Am Nat)
Comparison between seed banks and above-ground population (microsatellite data)
In 47 studies. 42 different plant species (Honnay et al. 2008 Oikos)
Higher values in
above -ground
Higher values in
seed bank
Importance of seed banks (2)




Consequence of seed banks for mutation
Increase of mutation rate in seed banks? (suggested by Levin 1990 Am Nat)
Comparison between seed banks and above-ground population (microsatellite data)
In 47 studies. 42 different plant species (Honnay et al. 2008 Oikos)
Higher values in
above -ground
Higher values in
seed bank
 Above-ground population is not differentiated from seed bank
 No evidence for increase of mutation rate in seed banks
 Purifying selection at the seedling stage (Vitalis et al. 2002 Am Nat)
The hidden part: Effective Population size
No correlation between census size and genetic diversity
1,6
ThetaW in percent per site
1,4
1,2
Peruvianum
1
chilense
0,8
0,6
0,4
0,2
0
0
50
100
150
200
N census size
250
300
350
Theory 1: Coalescence in metapopulation
Two phases: collecting (long) and scattering (short) (Wakeley and Aliacar 2001 Genetics)
Genealogy depends on the number of demes (n) and migration rate (M)
past
collecting phase
time
scattering phase
present
Deme 1
Deme 2
Deme 3
Deme 4
Theory 2: Species wide sample
past
collecting phase
time
scattering phase
present
Deme 1
Deme 2
Deme 3
Deme 4
1 individual per deme. over the species range = reflect the species wide evolution
(Wakeley and Aliacar 2001 Genetics. Pannell 2003 Evolution. Städler et al. 2009 Genetics)
Theory 3: Population sample
past
collecting phase
time
scattering phase
present
Deme 1
Deme 2
Deme 3
Deme 4
Several individuals per deme. few populations = reflect the local evolution
Combination of sampling schemes is necessary
Theory 3: Effect of sampling scheme
Model with 100 demes. Städler et al. 2009
Stepping stone with constant population size
Tajima’s D
Tajima’s D
Stepping stone with expansion
local
species wide
Migration rate
pooled
Expansion factor
 local samples are weakly affected by species wide demography
 expansion at the species level is seen in pooled and species-wide samples
use various sampling schemes to infer demography in metapopulation?
Model for seed banks (1)
If we have no information about what lies in the seed bank. here m = 5
Step 0
Cell 0
Our sample
Cell 1
Cell 2
Cell m-1
Cell m
Size=Ncs
compartments
Kaj. Krone. Lascoux. J. Appl. Proba. 2001
Model for seed banks (2)
If we have no information about what lies in the seed bank. here m = 5
Step 0
Cell 0
Cell 1
Our sample
Cell 2
Cell m-1
Cell m
Size=Ncs
compartments
b1
b4
b2
Step 1
Cell 0
For all plants in cell 0
Cell 1
Cell 2
Cell m-1
Cell m
Cell m+1
m-window
Kaj. Krone. Lascoux. J. Appl. Proba. 2001
Model for seed banks (3)
m = 5. if two seeds fall on the same compartment within a cell. there can be a coalescent
event
Coalescent event
Step 2
Cell 0
Cell 1
Cell 2
Cell m-1
Cell m
Cell m+1
m-window
At any moment. there are r lineages in the m-window
Then move every cell to the left. and start again step 0
Step 0
Cell 0
Cell 1
Cell 2
Cell m-1
Cell m
Cell m+1
Kaj. Krone. Lascoux. J. Appl. Proba. 2001
Results 3: Estimates of germination rate
Joint posterior density
B
S. chilense
Germination rate (b)
S. peruvianum
Germination rate (b)
A
Number of demes
b = 0.058
Number of demes
b = 0.135
Varying the number of demes affect also the estimates
Tellier et al. 2011 PNAS
Rationale
 The shape of the tree reflects past demographic events: expansion. crash.…
 The allele frequency-spectrum (SFS) summarizes the shape of the tree
 e.g. An excess of low frequency-variants reflects a possible past population
expansion
Wiuf et al. 2005. “Primer in Coalescent”
Two Orang-Utan populations
Locke et al. 2011 Nature
Results 2: for past expansion
Correcting by β2
Ignoring the seed bank
Long seed banks
Long seed banks
Short seed banks
Short seed banks
×β2
Large error bias for estimating the time and growth rate of a past expansion!!