Demographic events
Measuring Genetic Diversity
• θ = 4Nµ, 4Nm or 4N(µ+m), depending on whether new
variants enter the population by mutation (µ),
migration (m) or both.
• For haploid markers, θ = 2Nµ, 2Nm or 2N(µ+m).
• The all-important population genetic parameter.
• It is based on the number of alleles or the number of
different nucleotides in a given sample.
• It quantifies the genetic diversity of a given population.
Theta (θ) Hom
• Estimated from the expected homozygosity (Zouros, 1979;
Chakraborty and Weiss, 1991) in a population at
equilibrium between drift and mutation.
• Sensitive to small sample and allele sizes
• Used for microsatellite data
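As a rough illustration of how an expected homozygosity translates into θ, the sketch below inverts two textbook equilibrium relationships: Hom = 1/(1+θ) under the infinite alleles model and Hom = 1/sqrt(1+2θ) under the stepwise mutation model. The function names are illustrative, and the small-sample bias corrections applied by real software are not included.

```python
def theta_hom_iam(hom):
    """Theta from expected homozygosity under the infinite alleles model."""
    return 1.0 / hom - 1.0


def theta_hom_smm(hom):
    """Theta from expected homozygosity under the stepwise mutation model."""
    return (1.0 / hom**2 - 1.0) / 2.0


# Example: a locus with an expected homozygosity of 0.25
print(theta_hom_iam(0.25))  # 3.0
print(theta_hom_smm(0.25))  # 7.5
```

The same homozygosity implies a larger θ under the stepwise model, because stepwise mutations can recreate alleles that already exist.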
Theta (θ) S
• Estimated from the infinite-site equilibrium
relationship (Watterson, 1975) between the number
of segregating sites (S), the sample size (n) and θ for
a sample of non-recombining DNA.
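A minimal sketch of Watterson's estimator, θ(S) = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1), assuming the data are aligned sequences of equal length given as strings. The toy alignment and function names are made up for illustration.

```python
def segregating_sites(seqs):
    """Count alignment columns that show more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by a_n = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(segregating_sites(sample))  # 2
print(watterson_theta(sample))
```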
Theta (θ) k
• Estimated from the infinite-allele equilibrium
relationship (Ewens, 1972) between the expected
number of alleles (k), the sample size (n) and θ.
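• 95% confidence limits are calculated from the
probability of observing k alleles in a sample of size n
(formula not preserved in this transcript), which involves
a Stirling number (an expansion coefficient of a
factorial) and a falling factorial.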
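A minimal sketch of the point estimate only: under the infinite-allele model the expected number of alleles in a sample of size n is E(k) = θ/θ + θ/(θ+1) + ... + θ/(θ+n-1) (Ewens, 1972), so θ(k) can be found by solving E(k) = k numerically. The bisection bounds and function names below are assumptions chosen for illustration, and the confidence limits are not computed.

```python
def expected_alleles(theta, n):
    """Expected number of alleles in a sample of size n (Ewens, 1972)."""
    return sum(theta / (theta + i) for i in range(n))


def theta_k(k_obs, n, lo=1e-8, hi=1e6, iterations=200):
    """Solve expected_alleles(theta, n) = k_obs by bisection."""
    if not 1 < k_obs < n:
        raise ValueError("k_obs must lie strictly between 1 and n")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        # E(k) increases with theta, so move the bound on the correct side.
        if expected_alleles(mid, n) < k_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example: 8 alleles observed among 30 gene copies
print(theta_k(8, 30))
```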
Theta (θ) π̂
• Estimated from the infinite-site equilibrium
relationship (Tajima, 1983) between the mean number of
pairwise differences (π̂) and theta (θ).
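A minimal sketch of the mean number of pairwise differences, which under the infinite-site model has expectation θ. It assumes aligned sequences of equal length given as strings; the toy alignment is made up.

```python
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in pairs
    )
    return total / len(pairs)


sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(mean_pairwise_differences(sample))  # 1.0
```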
Why so many θ measures
• Not all methods are suitable for all types of data.
• Ultimately, all methods should yield the same
estimate of theta when their assumptions hold.
• Differences in estimates can be interpreted as
violations of assumptions, and each method is
sensitive to different assumptions.
Tajima’s D
• Tajima’s (1989) D test quantifies the discordance
between the estimate of theta from number of
segregating sites and from average pair-wise
sequence divergence.
• Negative values are interpreted as a signal of purifying
selection or, alternatively, of demographic expansion.
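A sketch of the D statistic, assuming the sample size n, the number of segregating sites S and the mean number of pairwise differences have already been computed. The constants follow the variance formula of Tajima (1989); the function and argument names are illustrative.

```python
import math


def tajimas_d(n, num_segregating, pi):
    """Tajima's D from sample size, segregating sites and mean pairwise differences."""
    if n < 4 or num_segregating < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    s = num_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: theta from pairwise differences minus theta from S.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Example: 10 sequences, 12 segregating sites, mean pairwise difference 2.0
print(tajimas_d(10, 12, 2.0))
```

D is negative when the pairwise-difference estimate falls below the segregating-sites estimate, which is the pattern produced by an excess of rare variants.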
Fu’s Fs
• Fu’s (1997) Fs measures the probability of observing
a certain number of haplotypes given a particular value
of θ. The test looks at discordance in values of θ
derived from the number of haplotypes and from average
pairwise sequence divergence.
• Negative values are interpreted as a signal of purifying
selection or, alternatively, of demographic expansion.
Strobeck’s S
• Strobeck’s (1987) S measures the level of discordance
in values of θ derived from number of haplotypes and
average pair-wise sequence divergence.
• The expected number of alleles in a sample is an
increasing function of the migration rates, whereas
the expected average number of nucleotide
differences is shown to be independent of the
migration rates and equal to 4Nµ.
• Negative values are interpreted as a signal of
population structure.
Differences in θ measures
• Have selective interpretations.
• Have demographic interpretations.
Inferring demographic change
• Demographic changes are changes in effective
population size over time.
• Differences between θ summary statistics based on
different population genetic measures will detect
demographic changes.
• Distributions of allele frequencies carry the
signatures of demographic changes.
• Coalescent analysis of data will recover signals of
demographic changes.
Mismatch distribution
• A frequency graph of pair-wise differences between
alleles.
• It is usually multimodal in samples drawn from
populations at demographic equilibrium (it reflects the
highly stochastic shape of gene trees)
• It is usually unimodal in populations having passed
through a recent demographic expansion (Rogers
and Harpending, 1992; Hudson and Slatkin, 1991) or
through a range expansion with high levels of
migration between neighboring demes (Ray et al.,
2003; Excoffier, 2004).
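A minimal sketch of how an observed mismatch distribution can be tabulated, assuming aligned sequences of equal length given as strings. The toy alignment and the function name are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences (0, 1, 2, ...)."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = sum(counts.values())
    return [counts.get(d, 0) / n_pairs for d in range(max(counts) + 1)]


sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(mismatch_distribution(sample))
```

Plotting the returned list against its index gives the frequency graph described above; with real data the number of modes is what is inspected.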
Mismatch distribution
• Multimodal
Mismatch distribution
• Unimodal
Mismatch distribution
• The mismatch distribution is a graphic way of
visualizing the signature of an expansion.
• If there is population expansion, then theoretically we
can calculate the population size before expansion,
the population size after expansion, and the time that
the expansion happened.
Pure demographic expansion
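• We assume that a population at equilibrium
suddenly changed in size from N0 to N1 at a time τ in
the past (τ is measured in mutational units, τ = 2µt
for an expansion t generations ago).
• The probability of observing S differences
between two randomly chosen non-recombining
haplotypes is a function of τ and of θ0 and θ1, the
values of θ before and after the expansion (Rogers
and Harpending, 1992).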
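The expression itself is not preserved in this transcript. As a sketch, the form published by Rogers and Harpending (1992) for a sudden expansion can be coded as below, assuming θ0 and θ1 are the values of θ before and after the expansion, τ is the expansion time in mutational units, and F_S(θ) = θ^S / (θ+1)^(S+1) is the stationary probability of S differences. The function names are illustrative.

```python
import math


def stationary_prob(s, theta):
    """Probability of s pairwise differences in a stationary population."""
    return (theta / (theta + 1.0)) ** s / (theta + 1.0)


def expansion_prob(s, tau, theta0, theta1):
    """Probability of s pairwise differences after a sudden size change."""
    decay = math.exp(-tau * (theta1 + 1.0) / theta1)
    total = 0.0
    poisson_term = 1.0  # tau**j / j!, built up iteratively from j = 0
    for j in range(s + 1):
        if j > 0:
            poisson_term *= tau / j
        total += poisson_term * (
            stationary_prob(s - j, theta0) - stationary_prob(s - j, theta1)
        )
    return stationary_prob(s, theta1) + decay * total


# Expected mismatch distribution for an expansion from theta0 = 1 to theta1 = 5
expected = [expansion_prob(s, 3.0, 1.0, 5.0) for s in range(15)]
print(expected)
```

With τ = 0 the expression reduces to the stationary distribution for θ0, which is a quick sanity check on the implementation.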
Pure demographic expansion
• In a simplified analysis θ1 is assumed to be ∞, i.e. no
coalescent events since the expansion.
• In this case θ0 and τ are estimated from m and v,
the mean and variance of the mismatch distribution.
• This simplifying assumption tends to underestimate
the time to expansion, but is a fast solution.
• This model is implemented in DnaSP.
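The estimating equations are missing from this transcript. A sketch under the assumption that they are the method-of-moments estimators usually attributed to Rogers (1995), θ0 = sqrt(v - m) and τ = m - θ0, is given below; the function name and the example frequencies are made up, and whether DnaSP computes exactly this is not confirmed here.

```python
import math


def expansion_moments(mismatch_freqs):
    """Estimate (theta0, tau) from the mean and variance of a mismatch distribution.

    mismatch_freqs[i] is the relative frequency of pairs differing at i sites.
    """
    m = sum(i * f for i, f in enumerate(mismatch_freqs))
    v = sum((i - m) ** 2 * f for i, f in enumerate(mismatch_freqs))
    if v < m:
        raise ValueError("variance below mean: the moment estimator is undefined")
    theta0 = math.sqrt(v - m)
    tau = m - theta0
    return theta0, tau


freqs = [0.1, 0.1, 0.1, 0.15, 0.15, 0.1, 0.1, 0.1, 0.05, 0.05]
print(expansion_moments(freqs))
```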
Pure demographic expansion
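• Alternatively, all three parameters θ1, θ0 and τ can be
estimated simultaneously using a generalized non-linear
least-squares approach.
• The objective is to adjust all three parameters
simultaneously so that the fit between the expected
mismatch distribution Fs and the observed one is
maximized, i.e. the sum of squared deviations is
minimized.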
• This approach is more exact, but it is computationally
intensive.
• This model is implemented in Arlequin.
Pure demographic expansion
• Using the model implemented in Arlequin, we can
also estimate the confidence intervals around all
three variables.
• This is done by a parametric bootstrap.
Parametric bootstrap
• Assume some model plus some set of parameters,
and simulate a new dataset.
• Simulate N datasets.
• For each new simulated dataset, calculate a new set
of values.
• Plot a frequency distribution of newly calculated
values, and see where actual values or parameters
are placed relative to values of parameters derived
from simulated data.
• In this case we know the values of three variables θ1,
θ0 and τ and using these we generate new datasets
under the assumption of population expansion.
Test of demographic expansion
• Using the parametric bootstrap, we can place
confidence intervals on the three variables θ1, θ0 and
τ.
• The model used in the parametric bootstrap assumes
population expansion, and we have not yet tested whether
the data carry a signature of population expansion.
• To test this, we again use the parametric bootstrap,
with two statistics.
• Statistic 1: the sum of squared deviations (SSD)
between the observed and the expected mismatch
distributions.
• Statistic 2: the raggedness index of the observed
distribution.
SSD statistic
• Using the parametric bootstrap (we simulate a new
dataset under the assumption of a demographic
expansion, and some values of θ1, θ0 and τ derived
from the original data), we get a new mismatch
distribution.
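• We calculate the sum of squared deviations of the
mismatch distribution for each bootstrapped dataset,
and compare it to the sum of squared deviations of
the actual dataset.
• The test statistic is the proportion of bootstrapped
datasets whose SSD is larger than or equal to the
observed SSD (the P value).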
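A minimal sketch of the two calculations, assuming the P value is the proportion of simulated SSD values that are at least as large as the observed one, and that the simulated values have already been obtained from datasets generated under the fitted expansion model. The numbers in the example are made up.

```python
def ssd(observed, expected):
    """Sum of squared deviations between two mismatch distributions."""
    length = max(len(observed), len(expected))
    obs = list(observed) + [0.0] * (length - len(observed))
    exp = list(expected) + [0.0] * (length - len(expected))
    return sum((o - e) ** 2 for o, e in zip(obs, exp))


def bootstrap_p_value(stat_observed, stats_simulated):
    """Proportion of simulated statistics at least as large as the observed one."""
    hits = sum(1 for s in stats_simulated if s >= stat_observed)
    return hits / len(stats_simulated)


observed_ssd = ssd([0.5, 0.5], [0.25, 0.75])
print(observed_ssd)  # 0.125
print(bootstrap_p_value(0.3, [0.1, 0.2, 0.3, 0.4]))  # 0.5
```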
Raggedness index statistic
• Using the parametric bootstrap (we simulate a new
dataset under the assumption of a demographic
expansion, and some values of θ1, θ0 and τ derived
from the original data), we get a new mismatch
distribution.
• We calculate a summary statistic r based on
maximum number of mutational differences (d) and
frequency of the allelic classes (x).
• The test statistic is the same as for SSD.
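A sketch of the raggedness index, assuming the usual definition r = sum over i = 1..d+1 of (x_i - x_(i-1))^2, where x_i is the relative frequency of pairs differing at i sites and the class beyond the maximum d has frequency zero. The example distributions are made up.

```python
def raggedness(freqs):
    """Raggedness index of a mismatch distribution given as relative frequencies."""
    padded = list(freqs) + [0.0]  # frequency of class d + 1 is zero
    return sum(
        (padded[i] - padded[i - 1]) ** 2
        for i in range(1, len(padded))
    )


smooth = [0.25, 0.5, 0.25]
ragged = [0.5, 0.0, 0.5]
print(raggedness(smooth))  # 0.1875
print(raggedness(ragged))  # 0.75
```

The smooth, single-peaked distribution gives the smaller value, in line with the expectation for expanded populations.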
Test of demographic expansion
• Populations that have undergone demographic
expansions are expected to have smaller sum of
squared differences and smaller raggedness index in
their mismatch distributions than non-expanded
populations.
• Therefore, based on the test statistics, what values of
P would be considered significant (i.e. support the
hypothesis of a demographic expansion)?
Test of demographic expansion
• Maximum likelihood approaches.
– We assume some model of molecular evolution, and
calculate the likelihood of our data under that model of
evolution
– We estimate relevant parameters under this model
– Model of molecular evolution has to be known a priori
• Bayesian inference.
– We do not assume a particular model
– We divide our sampling throughout the duration of the
coalescent and estimate relevant parameters
– Based on distribution of parameter estimates, we infer a
process (model).
Test of demographic expansion
• Both maximum likelihood and Bayesian inference
approaches allow the calculation of confidence
intervals.
• Likelihoods and posterior probabilities are
approximated with Markov chain Monte Carlo (MCMC)
algorithms, which are a kind of resampling algorithm.
• Likelihoods obtained under different models can be
compared using standard model selection criteria –
these include hierarchical Likelihood Ratio Test
(hLRT), Akaike Information Criterion (AIC) and
Bayesian Information Criterion (BIC).
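A minimal sketch of the two information criteria, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters and n the number of observations. The log-likelihoods and model labels in the example are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; penalizes parameters more as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: constant size (1 parameter) vs expansion (3 parameters)
models = {"constant": (-1250.0, 1), "expansion": (-1240.0, 3)}
for name, (lnl, k) in models.items():
    print(name, aic(lnl, k), bic(lnl, k, n_obs=500))
```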
Maximum likelihood methods
• ML assumes a model of sequence evolution.
• ML attempts to answer the question: What is
the probability that I would observe these
data (a multiple sequence alignment), given a
particular model of evolution (a tree and a
process)?
• ML uses a ‘model’. This is justifiable, since
molecular sequence data can be shown to
have arisen according to an evolutionary
process.
Maximum Likelihood - goal
• To estimate the probability that we would observe a
particular dataset, given a phylogenetic tree and
some notion of how the evolutionary process worked
over time.
Probability of
⎧a b c d⎫
⎪b a e f⎪
⎨c e a g⎬
⎩d c f a⎭
given
π = [a,c,g,t]
Bayesian inference methods
Demographic events
• Just as a population can undergo a
demographic expansion, it can also
undergo a demographic contraction.
• Severe demographic contractions are
called bottlenecks.
Demographic declines
• Demographic declines should result in
patterns opposite to demographic
expansions.
• Three popular methodologies exist
– Heterozygosity method
– Allele number vs. allele range method
– Coalescent method
Demographic declines
• The heterozygosity method takes note
of the fact that recently declined
populations will have an excess of
observed heterozygosity relative to the
heterozygosity expected at mutation-drift
equilibrium from the observed number of
alleles.
• Why would this be true?
• Implemented in the program Bottleneck.
Demographic declines
• The allele number vs. allele range
method takes note of the fact that
recently declined populations will have
fewer alleles than expected relative to
allele range.
• Why would this be true?
• Implemented in the program M value.
Demographic declines
• M = k/(R+1), where k is the number of
alleles and R is the allele size range in
repeat units.
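A minimal sketch of the ratio for one locus, assuming k is the number of distinct alleles and R is the difference between the largest and smallest allele expressed in repeat units. The function name, the `repeat_length` argument and the example alleles are illustrative.

```python
def m_ratio(allele_sizes, repeat_length=1):
    """M = k / (R + 1) for one microsatellite locus.

    allele_sizes: allele sizes observed in the sample (repeats or base pairs).
    repeat_length: size of one repeat unit in the same units as allele_sizes.
    """
    sizes = set(allele_sizes)
    k = len(sizes)
    size_range = (max(sizes) - min(sizes)) / repeat_length
    return k / (size_range + 1)


# Five alleles spread over a range of six repeats: gaps lower M
print(m_ratio([10, 11, 12, 15, 16]))
# Three contiguous dinucleotide alleles scored in base pairs: M = 1
print(m_ratio([100, 102, 104, 102], repeat_length=2))  # 1.0
```

Missing size classes inside the range pull M below 1, which is the pattern the method looks for after a decline.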
Demographic declines
• The heterozygosity and allele number /
allele range methods work only with
microsatellite data.
• They model expected heterozygosity
under three different models of
microsatellite evolution.
– IAM – Infinite alleles model
– SMM – Stepwise mutation model
– TPM – Two phase model
Demographic declines
• IAM – Infinite alleles model – it assumes
that every new mutation results in a new
allele, and that there is no relationship
between the newly generated allele
and the parental allele.
Demographic declines
• SMM – Stepwise mutation model – it
assumes that every new mutation
results in a new allele, and that this new
allele is either one step larger or one
step smaller than the parental allele.
Demographic declines
• TPM – Two phase model – this model is
a mix of the SMM and IAM models.
Some percentage of the mutations are
allowed to form according to the SMM
model (~80%), the rest according to
IAM. Some programs allow for the input
of an average allele jump size.
Demographic declines
• The method implemented in M value
also assumes a priori knowledge of
θ of the population prior to the
population decline.
• Calculating θ from the current genetic
diversity would result in a conservative
estimate of population decline.
Demographic declines
• Harpia harpyja example:
• The average value of M for the 24 microsatellite loci
was 0.84, a value significantly lower than that
obtained under simulation of a pre-bottleneck
population size (p = 0.026 using the genetic
parameter θ of 2.24).
• We derived θ from estimated census sizes of 10^4 to
10^5 harpy eagle individuals, assuming that the
effective number of individuals is equivalent to 1/10
the census size, and that microsatellite mutation rate
(µ) estimates range from 2.5 x 10^-3 to 5.6 x 10^-4.
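The reported θ of 2.24 is consistent with θ = 4Neµ when the lower census estimate is read as 10^4 individuals and combined with the lower mutation rate. A small check of that arithmetic, assuming the diploid formula and the 1/10 effective-to-census ratio stated above (the function name is illustrative):

```python
def theta_from_census(census_size, mutation_rate, ne_ratio=0.1):
    """Theta = 4 * Ne * mu, with Ne taken as a fixed fraction of census size."""
    effective_size = census_size * ne_ratio
    return 4 * effective_size * mutation_rate


# Lower census size with the lower mutation rate
print(theta_from_census(1e4, 5.6e-4))
# Upper census size with the higher mutation rate
print(theta_from_census(1e5, 2.5e-3))
```

The first call reproduces the value of about 2.24 used in the test; the second shows how much larger θ becomes at the other end of both ranges.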
Demographic declines
• Harpia harpyja example:
• When the parameter θ was estimated directly
from the microsatellite data (θ = 1.50), the M
value was not significant (p = 0.101).
• However, the θ calculated from the data themselves
is necessarily a lower-bound estimate if H.
harpyja shows any population structure, or if
the θ does not represent the original population
prior to reduction.
Demographic declines
• Different methods have different power to
detect a bottleneck, and to register a bottleneck
event for different amounts of time.
• Heterozygosity methods are often more immediately
sensitive, but they do not register a bottleneck
for a very long time.
• Allele number / allele range methods tend to
register equally or slightly less severe
bottlenecks, but further back in time.
Demographic declines
• The coalescent method estimates
parameters for the current and ancestral
population sizes, and for the time when the
change in population size occurred.
• Implemented in the program MSvar.
Demographic declines
• Both populations experienced decreases in
effective population sizes.
• Populations started decreasing 100-150
years ago.
• Reductions range from 25% to 60%.
[Figure: two panels, each plotting Ne (axis 0 to 8) against generations (axis 0 to 25000), with the past and present values marked.]
Population size
• Effective population size – the number of
individuals in an idealized population that
would show the same value of a given genetic
summary statistic as the population under study.
Population size
• Inbreeding effective population size – The
ideal number of individuals that are
contributing to the reproductive population –
can be calculated from pedigree information
and more commonly from the coalescent
properties of the observed gene tree.
Population size
• Variance effective population size – The
ideal number of individuals that represents
the sampling variance of the population –
this can be either the sampling variance across
generations, or the variance in gene
frequencies observed in the data (i.e.
based on θ estimates).
Population size
• Different concepts of effective population
sizes will result in different estimates of
effective population sizes.
• The inbreeding effective population size is a
backward looking statistic whereas the
variance effective population sizes reflect
recent demographic/population genetic
processes influencing genetic systems.
Population size
• Large inbreeding effective population sizes and
small variance effective population sizes are
indicative of recently reduced genetic variation due
to decreases in population size or habitat
fragmentation (Gerber & Templeton, 1996).
• In contrast, with a rapid increase in population size,
theory predicts a small inbreeding effective
population size and large variance and eigenvalue
effective sizes (Templeton, 1980).
Population size
• There is therefore no simple relationship
between effective population sizes and
census sizes, although it is often claimed
that in stable populations at equilibrium
there is a 1:10 relationship between
inbreeding effective and census population sizes.