Mutation-Drift Balance
Genetic Variation in Finite Populations
The amount of genetic variation found in a population is influenced by two opposing
forces: mutation and genetic drift.
1. Mutation tends to increase variation.
2. Genetic drift tends to reduce variation.
In particular, if both the mutation rate and the effective population size are relatively
stable, then the amount of genetic variation will tend towards an equilibrium known as
mutation-drift balance at which the rate at which variation is lost through drift is
equal to the rate at which new variation is created by mutation.
Jay Taylor (ASU)
Mutation-Drift Balance
25 Jan 2017
Multilocus Surveys Reveal Limited Variation in Nucleotide Diversity
(0.00005 ≤ π ≤ 0.1)
Source: Leffler et al. (2012): Revisiting an Old Riddle: What Determines Genetic Diversity Levels within Species?
Mutation-Drift Balance and Identity by Descent
Mutation and drift have opposing effects on the probabilities that individuals are
identical by descent (Cotterman 1940, Malecot 1941).
1. We say that two haploid individuals are identical in state at a locus if they carry
   the same allele.
2. We say that two haploid individuals are identical by descent at a locus if they
   share the same allele and if they inherited that allele without mutation (or
   recombination) from their most recent common ancestor.
Figure: Individuals can be identical by state even when they are not identical by descent
(homoplasy). One panel (identity by state) shows two lineages descending from an A1
ancestor that each mutate independently, A1 → A2, so that both sampled individuals carry
A2. The other panel (identity by descent) shows both sampled individuals inheriting A1
unchanged.
Suppose that we sample two chromosomes at random from generation t and let Ft be
the probability that they are identical by descent.
We can derive a recursive equation relating Ft+1 to Ft by considering the parentage of
the sampled individuals. For simplicity, we will make the following assumptions:
1. The population is diploid, with coalescent effective population size Ne.
2. Mutation is governed by the infinite-allele model (IAM), which assumes that
   every mutation generates a unique allele (no back mutation).
3. The mutation rate is µ per chromosome per generation.
In that case,
\[
F_{t+1} = \underbrace{\frac{1}{2N_e}\,(1-\mu)^2}_{\text{same parent}}
        + \underbrace{\left(1 - \frac{1}{2N_e}\right) F_t\,(1-\mu)^2}_{\text{different parents}} .
\]
As t increases, these probabilities tend to a limit Ft → F̃ , which is the probability of
identity by descent at equilibrium. This quantity satisfies the following equation:
\[
\tilde{F} = \frac{1}{2N_e}\,(1-\mu)^2 + \left(1 - \frac{1}{2N_e}\right) \tilde{F}\,(1-\mu)^2 .
\]
Rearranging gives
\[
\tilde{F} \cdot \left[ 1 - \left(1 - \frac{1}{2N_e}\right)(1-\mu)^2 \right] = \frac{1}{2N_e}\,(1-\mu)^2 ,
\]
which can then be solved for F̃:
\[
\tilde{F} = \frac{\frac{1}{2N_e}\,(1-\mu)^2}{1 - \left(1 - \frac{1}{2N_e}\right)(1-\mu)^2} .
\]
If we assume that µ ≪ 1, then, at equilibrium, the probability of identity by descent in a
diploid population is given by the following approximate expression:
Identity by descent at mutation-drift equilibrium in the IAM
\[
\tilde{F} \approx \frac{1}{1 + 4N_e\mu} = \frac{1}{1+\Theta} .
\]
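The approximation can be recovered from the exact equilibrium solution in a couple of lines. The sketch below multiplies numerator and denominator by 2Ne and then keeps only terms of first order in µ; that truncation is the only assumption used.

```latex
\begin{align*}
\tilde{F}
  &= \frac{\frac{1}{2N_e}(1-\mu)^2}{1 - \left(1-\frac{1}{2N_e}\right)(1-\mu)^2}
   = \frac{(1-\mu)^2}{2N_e\left[1-(1-\mu)^2\right] + (1-\mu)^2} \\
  &\approx \frac{1-2\mu}{4N_e\mu + 1 - 2\mu}
   \approx \frac{1}{1+4N_e\mu},
  \qquad \text{since } (1-\mu)^2 \approx 1 - 2\mu \text{ when } \mu \ll 1 .
\end{align*}
```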
F̃ only depends on the parameter Θ = 4Ne µ (population mutation rate).
Increasing µ reduces F̃ because individuals are more likely to have inherited alleles
that are mutated from their ancestral state.
Increasing Ne reduces F̃ because pairs of randomly sampled individuals are less
likely to be closely related in a large population than in a small population.
In other words, genetic drift reduces variation by increasing the relatedness of the
members of a population.
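As a numerical check on the recursion and its equilibrium, the following minimal sketch iterates the recursion for F and compares the limit with the exact and approximate equilibrium values. The function names and the parameter values (Ne = 1000, µ = 10^-4) are illustrative assumptions, not part of the original slides.

```python
def next_F(F, Ne, mu):
    """One generation of the identity-by-descent recursion (diploid, IAM)."""
    same_parent = 1.0 / (2 * Ne)          # probability of sharing a parent
    no_mutation = (1.0 - mu) ** 2         # neither lineage mutates
    return same_parent * no_mutation + (1.0 - same_parent) * F * no_mutation


def iterate_F(F0, Ne, mu, generations):
    """Apply the recursion repeatedly, starting from F0."""
    F = F0
    for _ in range(generations):
        F = next_F(F, Ne, mu)
    return F


def equilibrium_F_exact(Ne, mu):
    """Exact fixed point of the recursion."""
    same_parent = 1.0 / (2 * Ne)
    no_mutation = (1.0 - mu) ** 2
    return same_parent * no_mutation / (1.0 - (1.0 - same_parent) * no_mutation)


def equilibrium_F_approx(Ne, mu):
    """Small-mu approximation 1 / (1 + Theta) with Theta = 4 * Ne * mu."""
    return 1.0 / (1.0 + 4 * Ne * mu)


if __name__ == "__main__":
    Ne, mu = 1000, 1e-4                   # illustrative values (Theta = 0.4)
    print("iterated:", iterate_F(0.0, Ne, mu, 50_000))
    print("exact:   ", equilibrium_F_exact(Ne, mu))
    print("approx:  ", equilibrium_F_approx(Ne, mu))
```

Starting from F0 = 0 or F0 = 1 should lead to the same limit, which illustrates that the equilibrium does not depend on the initial condition.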
Since, in a randomly-mating population, the heterozygosity H is simply equal to 1 − F ,
we also obtain the following classical result:
Heterozygosity at mutation-drift equilibrium in the IAM
\[
\tilde{H} \approx \frac{\Theta}{1+\Theta} .
\]
Competing rates interpretation: As we trace two lineages backwards in time, there are
two possible events
The two lineages coalesce, at rate 1/(2Ne);
One of the lineages mutates, at total rate 2µ.
The two chromosomes will carry different alleles if one of the lineages experiences a
mutation before the two coalesce. This occurs with probability
\[
P(\text{mutation first}) = \frac{2\mu}{2\mu + 1/(2N_e)} = \frac{4N_e\mu}{1 + 4N_e\mu} = \frac{\Theta}{1+\Theta} .
\]
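The competing-rates argument can be checked by simulation. The sketch below assumes that the waiting times to mutation and to coalescence are independent exponential random variables with rates 2µ and 1/(2Ne) (a continuous-time approximation to the per-generation probabilities); the function names and parameter values are illustrative.

```python
import random


def prob_mutation_first(Ne, mu):
    """Analytic probability that a mutation precedes coalescence."""
    mutation_rate = 2.0 * mu              # either of the two lineages mutates
    coalescence_rate = 1.0 / (2 * Ne)     # the two lineages share a parent
    return mutation_rate / (mutation_rate + coalescence_rate)


def simulate_mutation_first(Ne, mu, trials, seed=0):
    """Monte Carlo estimate using two competing exponential waiting times."""
    rng = random.Random(seed)
    mutation_rate = 2.0 * mu
    coalescence_rate = 1.0 / (2 * Ne)
    hits = 0
    for _ in range(trials):
        t_mutation = rng.expovariate(mutation_rate)
        t_coalescence = rng.expovariate(coalescence_rate)
        if t_mutation < t_coalescence:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    Ne, mu = 1000, 2.5e-4                 # illustrative values, Theta = 1
    print("analytic: ", prob_mutation_first(Ne, mu))
    print("simulated:", simulate_mutation_first(Ne, mu, trials=20_000))
```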
Example: Coyne (1976) detected 23 electrophoretically distinguishable alleles at the
xanthine dehydrogenase locus in a sample of 60 D. persimilis chromosomes with the
following frequencies:
p1 = p2 = · · · = p18 = 1/60 (singletons)
p19 = p20 = p21 = 1/30
p22 = 1/15
p23 = 8/15
We can use this data to estimate both the probability of identity by descent at this locus
and the population mutation rate Θ:
\[
\hat{F} = \sum_{i=1}^{23} p_i^2 \approx 0.297, \qquad
\hat{\Theta} = \frac{1 - \hat{F}}{\hat{F}} \approx 2.37 .
\]
However, without additional information, we cannot separately estimate µ and Ne .
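The arithmetic in this example can be reproduced directly from the allele counts. The sketch below is a minimal illustration using exact fractions; the variable and function names are assumptions made here. Working from the unrounded frequencies gives Θ̂ ≈ 2.36, whereas rounding F̂ to 0.297 first gives the quoted 2.37.

```python
from fractions import Fraction

# Allele counts in the sample of 60 chromosomes:
# 18 singletons, 3 alleles seen twice, 1 seen four times, 1 seen 32 times.
ALLELE_COUNTS = [1] * 18 + [2] * 3 + [4] + [32]


def homozygosity(counts):
    """Sum of squared allele frequencies, used here as the estimate of F."""
    n = sum(counts)
    return sum(Fraction(c, n) ** 2 for c in counts)


def theta_from_F(F):
    """Invert F = 1 / (1 + Theta) to obtain Theta = (1 - F) / F."""
    return (1 - F) / F


if __name__ == "__main__":
    F_hat = homozygosity(ALLELE_COUNTS)
    theta_hat = theta_from_F(F_hat)
    print("F_hat     =", F_hat, "=", float(F_hat))
    print("Theta_hat =", theta_hat, "=", float(theta_hat))
```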
Mutation-Drift Balance for Microsatellite Loci
For certain kinds of loci, the infinite-alleles model is unsuitable and these predictions
need to be modified. This is often true, for example, of tandemly repeated DNA
sequences such as microsatellite loci.
Microsatellite repeats are
2-7 bp in length.
The number of repeats can vary
greatly between individuals.
These loci tend to mutate at very high rates and homoplasy may be common.
Replication slippage leads to changes in copy number
Replication slippage occurs when the
parent and daughter strands partially
separate during replication and then
incorrectly re-anneal.
Slippage usually leads to a gain or a loss
of a single repeat, although larger
changes sometimes occur.
Mutation rates can be on the order of 1
event per 1000 generations.
Copy-number change in microsatellite loci is often modeled using the stepwise mutation
model (SMM), which assumes that the number of repeats can only increase or decrease
by one per mutation event. For this model, Ohta & Kimura (1973) showed that
Heterozygosity at mutation-drift equilibrium in the SMM
\[
\tilde{H} = 1 - \frac{1}{(1 + 2\Theta)^{1/2}} .
\]
For the same value of Θ, the equilibrium heterozygosity under the SMM is less than that under the IAM.
This prediction ignores the possibility of copy number changes involving more than
one repeat, which may be common at some loci.
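To see how far apart the two predictions are, the following sketch evaluates both equilibrium heterozygosities for a few values of Θ. The function names and the chosen Θ values are illustrative assumptions.

```python
import math


def heterozygosity_iam(theta):
    """Equilibrium heterozygosity under the infinite-alleles model."""
    return theta / (1.0 + theta)


def heterozygosity_smm(theta):
    """Equilibrium heterozygosity under the stepwise mutation model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


if __name__ == "__main__":
    for theta in (0.1, 1.0, 10.0):        # illustrative values of Theta
        print(f"Theta = {theta:5.1f}   "
              f"IAM: {heterozygosity_iam(theta):.3f}   "
              f"SMM: {heterozygosity_smm(theta):.3f}")
```

The gap between the models widens as Θ grows, which is consistent with homoplasy becoming more frequent when mutations are common.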
Mutation-Drift Balance in the Infinite Sites Model
The infinite alleles model was useful in the pre-sequencing era when allelic variation
could only be discriminated using biochemical means. However, to handle DNA
sequence data, we need a more refined model.
The infinite sites model (ISM) was introduced by Kimura (1969).
It assumes that there are infinitely many sites, each of which is equally likely to
mutate and that no site mutates more than once.
This simplification is reasonable if the mutation rate per site is low and the
sequences being analyzed are not too distantly related, i.e., for intraspecific
polymorphism, but not for interspecific divergence.
With the ISM, we can ask questions about the number of segregating sites and
their frequencies at mutation-drift balance.
Suppose that n chromosomes are sampled from a population with coalescent effective
population size Ne and let Sn be the number of segregating sites. Then
Expected number of segregating sites at equilibrium under the ISM
\[
E[S_n] = \Theta \sum_{i=1}^{n-1} \frac{1}{i} .
\]
Here Θ = 4Ne µ, where µ is the locus-wide mutation rate of the region sequenced.
When n is large, E[S_n] ∼ Θ log(n). This grows very slowly with n, meaning that
very large sample sizes will often be needed to discover new segregating sites, e.g.,
\[
E[S_{10}] \approx 2.83\,\Theta, \quad
E[S_{100}] \approx 5.18\,\Theta, \quad
E[S_{1000}] \approx 7.48\,\Theta, \quad
E[S_{10000}] \approx 9.79\,\Theta .
\]
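The coefficients quoted above are partial sums of the harmonic series, which the short sketch below reproduces. The function names are illustrative assumptions.

```python
import math


def harmonic_sum(n):
    """Return sum_{i=1}^{n-1} 1/i, the coefficient of Theta in E[S_n]."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(n, theta):
    """Expected number of segregating sites in a sample of n chromosomes."""
    return theta * harmonic_sum(n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # log(n) is shown alongside to illustrate the slow, logarithmic growth
        print(f"n = {n:6d}   E[S_n]/Theta = {harmonic_sum(n):.2f}   log(n) = {math.log(n):.2f}")
```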
We can turn this last result into an estimator of Θ using the method of moments.
Watterson's estimator
\[
\Theta_W = \frac{S_n}{\sum_{i=1}^{n-1} \frac{1}{i}} .
\]
ΘW is unbiased and asymptotically normal as n → ∞.
However, the variance of the estimator is fairly large and decreases only slowly,
at a rate of order 1/log(n), as n → ∞.
Nonetheless, ΘW is sometimes useful when estimates are needed on the fly, e.g.,
migrate-n uses ΘW to estimate initial effective population sizes that are then
refined through more computationally intensive procedures.
The nucleotide diversity of a locus is defined to be the probability that two randomly
chosen individuals differ at a randomly chosen site within that locus. This can be
estimated from a sample of chromosomes, in which case the sample nucleotide diversity
is usually denoted π.
Equilibrium nucleotide diversity under the ISM
\[
E[\pi] = \frac{\Theta}{1+\Theta} .
\]
Here Θ = 4Ne µ, where µ is the mutation rate per site per generation.
This result can be derived using the competing rates calculation that we saw
previously.
The expected value does not depend on the sample size, but its variance does.
Example: Kreitman (1983) sequenced a 768 bp region of the ADH locus in 11
chromosomes sampled from D. melanogaster and found a total of 6 alleles containing 14
segregating sites, shown below.
Name    39   226  387  393  441  513  519  531  540  578  606  615  645  684
Ref     T    C    C    C    C    C    T    C    C    A    C    T    A    G
Wa-S    .    T    T    .    A    A    C    .    .    .    .    .    .    .
Fl-1S   .    T    T    .    A    A    C    .    .    .    .    .    .    .
Af-S    .    .    .    .    .    .    .    .    .    .    .    .    .    A
Fr-S    .    .    .    .    .    .    .    .    .    .    .    .    .    A
Fl-2S   G    .    .    .    .    .    .    .    .    .    .    .    .    .
Ja-S    G    .    .    .    .    .    .    .    T    .    T    .    C    A
Fl-F    G    .    .    .    .    .    .    G    T    C    T    C    C    .
Fr-F    G    .    .    .    .    .    .    G    T    C    T    C    C    .
Wa-F    G    .    .    .    .    .    .    G    T    C    T    C    C    .
Af-F    G    .    .    .    .    .    .    G    T    C    T    C    C    .
Ja-F    G    .    .    A    .    .    .    G    T    C    T    C    C    .
(Columns are the positions of the segregating sites; a dot denotes the same nucleotide as
in the reference sequence, Ref.)
For this data set, the population mutation rate Θ = 4Ne µ can be estimated in three
ways, using S11 , F or π. Here we will estimate the per-site population mutation rates,
so we will have to divide the first two estimates (which are locus-wide) by the number of
sites:
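\[
S_{11} = 14 \;\rightarrow\; \hat{\Theta}_W = \frac{1}{768} \cdot \frac{14}{2.929} \approx 0.00622
\]
\[
F = 0.223 \;\rightarrow\; \hat{\Theta}_F = \frac{1}{768} \cdot \frac{1 - F}{F} \approx 0.00453
\]
\[
\pi = 0.00786 \;\rightarrow\; \hat{\Theta}_\pi = \frac{\pi}{1 - \pi} \approx 0.00792 .
\]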
Remarks:
With a mutation rate of µ = 10^−8 mutations per site per generation, these
estimates correspond to Ne = Θ̂/(4µ) ≈ 113,000 to 198,000.
The variation between estimates has several possible sources: estimation error
(noise), use of different information from the data, and model misspecification.
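The three estimates can be recomputed from the table of segregating sites. In the sketch below, each haplotype string is transcribed from the table (one character per segregating site, "." meaning the reference nucleotide); the function names are illustrative assumptions, and the estimators follow the formulas given in these slides.

```python
from collections import Counter
from itertools import combinations

LOCUS_LENGTH = 768  # number of sites sequenced (bp)

# Columns: sites 39, 226, 387, 393, 441, 513, 519, 531, 540, 578, 606, 615, 645, 684
HAPLOTYPES = {
    "Wa-S":  ".TT.AAC.......",
    "Fl-1S": ".TT.AAC.......",
    "Af-S":  ".............A",
    "Fr-S":  ".............A",
    "Fl-2S": "G.............",
    "Ja-S":  "G.......T.T.CA",
    "Fl-F":  "G......GTCTCC.",
    "Fr-F":  "G......GTCTCC.",
    "Wa-F":  "G......GTCTCC.",
    "Af-F":  "G......GTCTCC.",
    "Ja-F":  "G..A...GTCTCC.",
}


def harmonic_sum(n):
    """sum_{i=1}^{n-1} 1/i, the denominator of Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(seqs):
    """Number of columns at which the sample is polymorphic."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def homozygosity(seqs):
    """Sum of squared haplotype (allele) frequencies in the sample."""
    n = len(seqs)
    return sum((count / n) ** 2 for count in Counter(seqs).values())


def nucleotide_diversity(seqs, length):
    """Average number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


def per_site_estimates(seqs, length):
    """Per-site estimates of Theta from S_n, F and pi."""
    n = len(seqs)
    F = homozygosity(seqs)
    pi = nucleotide_diversity(seqs, length)
    return {
        "theta_W": segregating_sites(seqs) / harmonic_sum(n) / length,
        "theta_F": (1.0 - F) / F / length,
        "theta_pi": pi / (1.0 - pi),
    }


if __name__ == "__main__":
    sample = list(HAPLOTYPES.values())
    for name, value in per_site_estimates(sample, LOCUS_LENGTH).items():
        print(f"{name}: {value:.5f}")
```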
Mutation-Drift Balance in Bi-allelic Models
We can incorporate bi-allelic mutation into the Wright-Fisher model by making the
following modifications:
1. Mutations occur only during reproduction and are independently transmitted to
   each offspring.
2. Each descendant of an A parent inherits a mutant a allele with probability v.
   Similarly, each descendant of an a parent inherits a mutant A allele with
   probability u.
All other assumptions remain unchanged, i.e., non-overlapping generations, constant
population size, binomial sampling and neutrality.
Mutation changes the behavior of the Wright-Fisher model in several ways.
It is no longer the case that the average frequency of allele A is constant. Instead,
\[
E[\Delta p_t] = u \cdot (1 - p_t) - v \cdot p_t ,
\]
which shows that A will tend to increase in frequency when rare and decrease in
frequency when common.
Although alleles may be transiently lost from the population, they will eventually
be reintroduced by mutation.
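A minimal simulation of this model is sketched below. It assumes a diploid population of N individuals (2N gene copies), applies mutation to the parental frequency and then draws the next generation by binomial sampling; the function names and parameter values are illustrative, and the binomial draw is implemented with the standard library only.

```python
import random


def expected_delta_p(p, u, v):
    """Expected one-generation change in the frequency of A."""
    return u * (1.0 - p) - v * p


def wright_fisher_step(p, N, u, v, rng):
    """One generation: mutation, then binomial sampling of 2N gene copies."""
    # Frequency of A among transmitted alleles after mutation (A -> a at rate v,
    # a -> A at rate u); note that p_mut - p equals expected_delta_p(p, u, v).
    p_mut = p * (1.0 - v) + (1.0 - p) * u
    copies_of_A = sum(rng.random() < p_mut for _ in range(2 * N))
    return copies_of_A / (2 * N)


def simulate(p0, N, u, v, generations, seed=0):
    """Return the trajectory [p_0, p_1, ..., p_generations]."""
    rng = random.Random(seed)
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(wright_fisher_step(trajectory[-1], N, u, v, rng))
    return trajectory


if __name__ == "__main__":
    traj = simulate(p0=0.0, N=100, u=0.001, v=0.001, generations=2000)
    lost_or_fixed = sum(p in (0.0, 1.0) for p in traj)
    print("final frequency:", traj[-1])
    print("generations spent at a boundary:", lost_or_fixed)
```

Starting from p0 = 0 shows the reintroduction of a lost allele by mutation: the trajectory leaves the boundary even though A is initially absent.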
Figure: Simulated allele-frequency trajectories (p against generation, 0 to 10,000
generations) in three panels: N = 10^3, µ = 10^−3; N = 10^4, µ = 10^−4; and
N = 10^3, µ = 10^−4.
Stationary Distribution of Allele Frequencies under Mutation-Drift Balance
If the mutation rates are positive, then the allele frequencies will never settle into
fixed values.
On the other hand, it can be shown that the distribution of pt will converge to a
limiting distribution which we call the stationary distribution.
The limiting distribution does not depend on the initial frequency of A.
It takes ∼ 4Ne generations for the population to forget the initial frequency.
Figure: Stationary behavior of the Wright-Fisher process (N = 100, u = 0.02). Three
panels show the density of p at t = 2, t = 20 and t = 100 for initial frequencies
p = 0.01, p = 0.5 and p = 0.9.
Two interpretations of the stationary distribution:
If we run a large number of independent simulations or experiments, then after a
sufficient number of generations, the distribution of allele frequencies across trials
will be given by the stationary distribution.
Alternatively, if we run a single simulation or experiment for a very long time, then
the proportion of time when the allele frequency is equal to p will be proportional
to the stationary density of p.
Figure: Ergodic behavior of the Wright-Fisher process (neutral Wright-Fisher model,
N = 100, u = 0.02): allele frequency p plotted against generation over 10 × 10^4
generations.
Provided that Ne is sufficiently large (Ne ≥ 100), the stationary distribution at a neutral
bi-allelic locus in a population with coalescent effective population size Ne is given by a
Beta distribution.
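Stationary distribution of allele frequencies
The stationary distribution can be approximated by a Beta distribution with parameters
4Ne u and 4Ne v, which has the following density:
\[
\pi(p) = \frac{1}{C}\, p^{4N_e u - 1} (1 - p)^{4N_e v - 1}, \qquad 0 \le p \le 1,
\]
where C is the normalizing constant, equal to the Beta function B(4Ne u, 4Ne v).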
In particular, if we sample the population at some sufficiently large time t, then the
probability that the allele frequency p(t) at that time is between a and b will be
approximately:
\[
P(a < p(t) < b) \approx \int_a^b \pi(p)\, dp .
\]
The stationary distribution reflects the competing effects of genetic drift, which
eliminates variation, and mutation, which generates variation.
Figure: Stationary density of the allele frequency p in two panels, 2Nu = 0.1 and
2Nu = 10.
When 4Ne u, 4Ne v < 1, drift dominates mutation and the stationary distribution is
bimodal, with peaks at the boundaries (one allele is common and one rare).
When 4Ne u, 4Ne v > 1, mutation dominates drift and the stationary distribution is
peaked about its mean (both alleles are common).
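The Beta approximation can be evaluated numerically with the standard library alone. The sketch below computes the stationary density, approximates P(a < p < b) with a midpoint rule, and reports which regime the parameters fall in; the function names, the integration scheme and the example parameters are assumptions made for illustration.

```python
import math


def beta_density(p, alpha, beta):
    """Beta(alpha, beta) density at 0 < p < 1; here alpha = 4*Ne*u, beta = 4*Ne*v."""
    log_C = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p) - log_C)


def prob_between(a, b, alpha, beta, steps=10_000):
    """Midpoint-rule approximation of the integral of the density over (a, b).

    Accurate when a and b lie strictly inside (0, 1); near the boundaries the
    density can be singular and convergence is slow.
    """
    h = (b - a) / steps
    return h * sum(beta_density(a + (i + 0.5) * h, alpha, beta) for i in range(steps))


def regime(alpha, beta):
    """Classify the shape of the stationary distribution."""
    if alpha < 1 and beta < 1:
        return "bimodal: drift dominates mutation"
    if alpha > 1 and beta > 1:
        return "unimodal: mutation dominates drift"
    return "intermediate"


if __name__ == "__main__":
    Ne, u, v = 100, 0.02, 0.02            # values used in the figures above
    alpha, beta = 4 * Ne * u, 4 * Ne * v
    print(regime(alpha, beta))
    print("P(0.4 < p < 0.6) ~", prob_between(0.4, 0.6, alpha, beta))
```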