Lecture IV: Genetic Drift
Genetic Drift - Evolution at random and loss of genetic variation
For the derivation of the Hardy-Weinberg-Law we assumed a population of infinite size.
Under this and other simplifying conditions (random mating, no selection, non-overlapping
generations) we concluded that both allele frequencies and genotype frequencies were stable
across generations. Yet, much of evolutionary genetic theory and practice concentrates on
understanding how and why some alleles drastically change over time, while others are
stable. If we understand the underlying forces, we have learnt a great deal about evolution.
The two most important factors that cause allele frequencies to change over time are selection
and genetic drift. For every allele that is found in a population, there are two possible reasons
why it may fail to be transferred to the offspring generation: bad fitness or just bad luck.
Selection describes the first of these reasons, drift the second. For example,
imagine a small population of finite size, say 10 diploid individuals (e.g. 3 AA, 5 Aa, 2 aa).
Perhaps some individuals leave more offspring than others, not because they carry ‘better adapted’
alleles, but simply because of external non-genetic factors. Some may never have a chance to
reproduce because they run into a predator, are born in a poor habitat patch, catch a
deadly virus or never encounter a mate. In addition, those that make it to the age of
reproduction may produce different numbers of offspring over their lifetime for similar
reasons. Moreover, a heterozygous individual may by pure chance pass on only gametes carrying
the A allele, regardless of any property of the allele itself. As a result, it is unlikely that
the next generation will have exactly 11 A alleles and 9 a alleles, as the parents did. With
high probability the allele frequency will have changed. This process of random change of
allele frequencies in finite populations is called genetic drift and has several consequences
that we will consider in turn:
Genetic Drift is Real
While theory is always useful to predict what we may expect, let us start with pointing out
that genetic drift is not just a statistical belief. It happens under real-world conditions. In a
classical experiment Buri (1956) established 107 small populations of Drosophila
melanogaster and propagated them by choosing 8 males and 8 females at random in each
generation and letting them reproduce (producing about 1000 offspring together). He ran the
experiment for about 20 generations and tracked the frequency of a Mendelian genetic marker
(eye colour) initially at 50% frequency. During the course of the experiment he made several
important observations that we will consider in turn:
1) Allele frequencies changed within populations
2) Variation in allele frequency across populations increased
3) Heterozygosity and the genetic variation within populations declined
Lecture WS Evolutionary Genetics Part I - Jochen B. W. Wolf
Figure 1: Random genetic drift in an experimental setup of 107 Drosophila populations.
see Futuyma 2005.
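The three observations can be reproduced qualitatively in a few lines of code. The following is a minimal sketch, not a reanalysis of Buri's data: it assumes that each generation is formed by drawing 2N = 32 gene copies at random with replacement from the 16 parents, and the function names (drift_step, simulate_replicates, summarise) are purely illustrative.

```python
import random

def drift_step(k, two_n, rng):
    """Draw two_n gene copies with replacement; return the new count of the marker allele."""
    p = k / two_n
    return sum(1 for _ in range(two_n) if rng.random() < p)

def simulate_replicates(n_pops=107, two_n=32, generations=20, seed=0):
    """Follow n_pops replicate populations, all starting at 50% frequency (idealised assumption)."""
    rng = random.Random(seed)
    counts = [two_n // 2] * n_pops
    history = [list(counts)]
    for _ in range(generations):
        counts = [drift_step(k, two_n, rng) for k in counts]
        history.append(list(counts))
    return history

def summarise(counts, two_n):
    """Mean frequency, variance among populations, mean heterozygosity 2p(1-p)."""
    freqs = [k / two_n for k in counts]
    mean_p = sum(freqs) / len(freqs)
    var_p = sum((p - mean_p) ** 2 for p in freqs) / len(freqs)
    mean_h = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return mean_p, var_p, mean_h

history = simulate_replicates()
for t in (0, 1, 5, 10, 20):
    mean_p, var_p, mean_h = summarise(history[t], 32)
    print(f"t={t:2d}  mean p={mean_p:.3f}  var p={var_p:.4f}  mean H={mean_h:.3f}")
```

In such a run the mean frequency hovers around 0.5, the variance among populations grows, and the mean heterozygosity shrinks, in line with observations 1) to 3).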
The Wright-Fisher Model
A mathematical description of the process again requires a model. Population geneticists have
developed a number of different models to describe genetic drift. One of the simplest and
most common is the Wright-Fisher model, named after the founders of population genetic
theory: Sewall Wright and Ronald A. Fisher (who by the way is also the father of the
ANOVA).
The Wright-Fisher model assumes a haploid population without sexes, in which each
individual reproduces without the need of finding a mate. This would be a realistic
assumption in, say, bacterial populations. It turns out, however, that the dynamics in a diploid,
randomly mating population are almost identical to the dynamics of the haploid model, which
is mathematically more tractable. The population is further assumed to be of constant
size, holding 2N individuals (the factor two is human-centric and used to mimic a diploid
situation). Gene copies are transmitted from generation t to generation t+1 by random
sampling, independently and with equal probability. Again, generations are discrete, i.e.
non-overlapping. In any generation, there are thus 2N homologous gene copies in the
population. If K is the number of A alleles in a given generation, the frequency of the A allele
is p = K/2N. We want to model the allele frequency change among generations.
How do we do this?
An analogy may help. Imagine you have two bags. One (the parent population) contains 2N
marbles of different colour (each representing an allele). The other one (the next
generation) is still empty. We now draw a marble from the parent population (a
reproduction event), note its colour, put the original marble back and put a new marble
of this colour into the second bag (the offspring). By returning the original marble to the first
bag, we make sure that the allele frequency among the parents remains unaltered. This is a realistic
assumption, since parents generally produce a very large (effectively infinite) number of gametes,
far more than the 2N gene copies that make it into the next generation. We now shake
the original bag to make sure we continue drawing at random and repeat the process until
there are 2N marbles in the new bag. We then count the number of marbles of each colour
in both bags and note the difference. Using this analogy, we have modeled the change in
allele frequencies from generation t to offspring generation t + 1.
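The marble analogy translates almost literally into code. The sketch below is only an illustration under the stated assumptions (two colours, 2N = 10, a fixed random seed); the function name draw_offspring_bag is made up for this example.

```python
import random
from collections import Counter

def draw_offspring_bag(parent_bag, rng):
    """Fill a new bag of the same size by drawing marbles with replacement."""
    # rng.choice leaves parent_bag untouched, i.e. the drawn marble is "put back".
    return [rng.choice(parent_bag) for _ in parent_bag]

rng = random.Random(42)                       # fixed seed, so the run is repeatable
parent_bag = ["red"] * 6 + ["blue"] * 4       # 2N = 10 marbles, illustrative counts
offspring_bag = draw_offspring_bag(parent_bag, rng)

print("parents:  ", Counter(parent_bag))
print("offspring:", Counter(offspring_bag))
```

Calling draw_offspring_bag repeatedly on its own output corresponds to following the population over successive generations.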
The Wright-Fisher model (with mutation included, see next lecture) is the null-model in
population genetics. This means, it is used to construct null hypotheses for the change of
allele frequencies in populations. These hypotheses are then tested against data.
Rejection of the simple Wright-Fisher model is then taken as evidence that something
more interesting than “just mutation and drift” acts in the population, for example
selection.
1) Allele frequencies change within populations
This process will introduce heterogeneity in the number of offspring each individual
produces. Some will not reproduce at all (marble not drawn), while others will have multiple
offspring (marble drawn several times). Note that these differences are not due to selection,
but merely due to the sampling process. As a result, allele frequencies will fluctuate, and
eventually one of the two alleles will become fixed and the other will be lost. If we trace the ancestry
of each gene copy (marble) backwards in time, we will notice that after many generations all
gene copies descend from a single ancestral gene copy. If the failure of gene copies to
leave descendants is random, then the gene copies at time t could equally likely have
descended from any of the original gene copies present at time 0. The concept of tracing gene
genealogies backwards in time is the basis of coalescent theory, a retrospective stochastic
model describing the effects of genetic drift.
Figure 2: Illustration of genetic drift by random sampling of gene copies (marbles) with
two allelic states (red, blue) across discrete generations t, t+1, ..., t+4.
For now, we will stick to the original, forward-in-time Wright-Fisher model describing the
effects of genetic drift. Mathematically, the bean-bag experiment corresponds to 2N Bernoulli
trials. For each trial the probability of drawing allele A is given by its frequency, p = Pr(A),
and the probability of drawing allele a is q = 1 – p = Pr(a). The probability of finding the A allele
K times in the second bag, i.e. the distribution of offspring alleles in generation t+1, is given by the
binomial distribution.
$$\Pr(K; 2N; p) = \binom{2N}{K}\, p^{K} (1 - p)^{2N - K}$$
where
$$\binom{2N}{K} = \frac{(2N)!}{K!\,(2N - K)!}$$
is the binomial coefficient.
Example: Assume N = 5, thus 2N = 10 gene copies in the population, and initially 2 copies
of the A allele, thus K = 2 and p = 2/10 = 0.2.
The probability of still having K = 2 copies in the following generation is:
$$\Pr(2; 10; 0.2) = \binom{10}{2}\, 0.2^{2}\, 0.8^{8} = 0.30$$
The probability of losing the allele altogether, i.e. K = 0, is:
$$\Pr(0; 10; 0.2) = \binom{10}{0}\, 0.2^{0}\, 0.8^{10} = 0.11$$
If we assume 10 times the population size, but still initially p = 0.2, the probability for p
= 0 in the following generation becomes
$$\Pr(0; 100; 0.2) = \binom{100}{0}\, 0.2^{0}\, 0.8^{100} = 2.0 \times 10^{-10}$$
We see that the probability for large changes in the allele frequency is much reduced in a
large population. Population size matters: the effects of genetic drift are stronger in
small populations!
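The three example values are easy to check numerically. The following sketch evaluates the binomial sampling probability directly; the function name wf_transition_prob is an illustrative choice, not established terminology.

```python
from math import comb

def wf_transition_prob(k_next, two_n, p):
    """Probability of k_next copies of A among two_n offspring gene copies,
    given frequency p of A in the parent generation (binomial sampling)."""
    return comb(two_n, k_next) * p**k_next * (1 - p) ** (two_n - k_next)

print(round(wf_transition_prob(2, 10, 0.2), 2))     # 0.3
print(round(wf_transition_prob(0, 10, 0.2), 2))     # 0.11
print(f"{wf_transition_prob(0, 100, 0.2):.1e}")     # 2.0e-10
```

Summing wf_transition_prob over all K from 0 to 2N gives 1, which is a quick sanity check on the formula.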
What happens if you repeat the Wright-Fisher sampling scheme over many generations?
Bob Sheehy at Radford University has compiled a nice simulation tool worth checking
out: http://www.radford.edu/~rsheehy/Gen_flash/popgen/Popgen_help/help.html
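For readers who prefer to experiment offline, repeating the sampling scheme takes only a short loop. This is a minimal sketch that is unrelated to the tool linked above; the function name, the seed and the chosen population sizes are assumptions made for illustration.

```python
import random

def simulate_trajectory(two_n, k0, rng, max_generations=100000):
    """Repeat binomial sampling until allele A is lost (p = 0) or fixed (p = 1)."""
    k = k0
    trajectory = [k / two_n]
    while 0 < k < two_n and len(trajectory) <= max_generations:
        p = k / two_n
        k = sum(1 for _ in range(two_n) if rng.random() < p)
        trajectory.append(k / two_n)
    return trajectory

rng = random.Random(7)   # fixed seed; other seeds give other trajectories
for two_n in (20, 200):
    traj = simulate_trajectory(two_n, two_n // 2, rng)
    outcome = {1.0: "fixed", 0.0: "lost"}.get(traj[-1], "still segregating")
    print(f"2N={two_n}: allele A {outcome} after {len(traj) - 1} generations")
```

Individual runs vary a lot, so it is worth changing the seed and comparing several trajectories for each population size.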
2) The mean of the allele frequency stays the same, but the variance increases.
While it is obvious that even extreme changes in allele frequency have a non-zero probability,
the expected allele frequency at t+1 is equal to the allele frequency in generation t. Since we
assume random sampling of genes in generation t, the chance that any one draw for
generation t+1 yields allele A is simply its frequency in generation t, p(t). Accordingly,
the number of A alleles we expect to sample into the next generation is 2N p(t), as we draw 2N
times. The expected frequency of the A allele in generation t+1 is thus the
expected number 2N p(t) divided by the total number of gene copies, i.e. 2N:
E[p(t+1)] = 2N p(t) / 2N = p(t)
The same result is apparent when considering the expected value of our binomially
distributed random variable K (number of A alleles in a population of 2N gene copies): E[K] = 2N p.
The number of A alleles, and therefore its frequency, is thus expected to remain constant.
However, in any single generation, allele frequencies can differ substantially
between the parental and offspring generation, more so in very small populations. While the
expected allele frequency does not change, its variance increases.
Imagine that a single parent population gives rise to several ‘colony’ offspring populations.
Initially, the frequency of A in all colonies will be very close to the original value p(t).
But over time, p will drift up in some colonies and down in others: the average p taken
over all colonies stays at p(t), but the variance of p among colonies increases.
Eventually, some of the colonies will have only A alleles and the others only a
alleles. We can quantify how much the variance in allele frequency among colonies
increases per generation. After a single generation, the variance in the number K of A
alleles among colonies is just the variance of the binomial distribution: Var[K] = 2Np(1-p) = 2Npq.
Accordingly, the variance of the allele frequency p = K/2N is given by
$$\mathrm{Var}[p] = \mathrm{Var}\left[\frac{K}{2N}\right] = \frac{2Npq}{4N^{2}} = \frac{pq}{2N}$$
Note that Var[p] → 0 as N → ∞. The colonies will diverge only very slowly if the population
size is large. For infinite population sizes, drift ceases to operate: we retain the prediction
under the Hardy-Weinberg-Law.
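Both statements, the unchanged mean and the variance pq/2N after one generation, can be verified by summing over the binomial distribution. The sketch below does this for a few population sizes; the function name next_generation_moments and the choice p = 0.2 are illustrative assumptions.

```python
from math import comb

def next_generation_moments(two_n, p):
    """Exact mean and variance of the allele frequency after one generation of sampling."""
    probs = [comb(two_n, k) * p**k * (1 - p) ** (two_n - k) for k in range(two_n + 1)]
    mean = sum(pr * k / two_n for k, pr in enumerate(probs))
    var = sum(pr * (k / two_n - mean) ** 2 for k, pr in enumerate(probs))
    return mean, var

p = 0.2
for two_n in (10, 100, 1000):
    mean, var = next_generation_moments(two_n, p)
    print(f"2N={two_n:5d}  E[p']={mean:.4f}  Var[p']={var:.6f}  pq/2N={p * (1 - p) / two_n:.6f}")
```

The last two columns agree, and both shrink in proportion to 1/2N as the population grows.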
3) The probability of fixation and the time it takes
As becomes apparent from the above, in the absence of mutation, any allele must eventually
be lost or fixed. As there is no selection that can increase or decrease the chance that any
particular gene copy will be the lucky one, the probability of becoming the ancestor of the
whole population is the same for all gene copies. As these probabilities must sum to 1 over
all 2N gene copies, the probability for any one gene copy must be 1/2N. Hence, the probability
that an allele of frequency 1/2N goes to fixation is simply 1/2N. For an allele with K copies
in the population, the probability of fixation is K/2N or p, the allele frequency. Hence,
$$\Pr(\text{fixation of allele } A) = K \cdot \frac{1}{2N} = p$$
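The symmetry argument can be checked without any simulation by propagating the full probability distribution of K through the binomial transition probabilities until practically all probability mass sits at K = 0 or K = 2N. The following is a sketch under assumptions: a small population (2N = 10), a fixed number of 500 iterations chosen to be ample for this size, and illustrative function names.

```python
from math import comb

def transition_matrix(two_n):
    """Row k holds the binomial probabilities of moving from k copies to j copies."""
    rows = []
    for k in range(two_n + 1):
        p = k / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows

def fixation_probability(k0, two_n, generations=500):
    """Probability mass at K = two_n after many generations, starting from k0 copies."""
    matrix = transition_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[k0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[k] * matrix[k][j] for k in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n]

for k0 in (1, 2, 5):
    print(f"K0={k0}: Pr(fixation) = {fixation_probability(k0, 10):.6f}  (K0/2N = {k0 / 10})")
```

For larger populations the number of iterations would have to grow roughly in proportion to 2N before the remaining unabsorbed mass becomes negligible.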
The time it takes for allele A to fix depends on the population size and its initial frequency p0.
Kimura & Ohta (1969) showed that the average number of generations it takes an allele to fix
(excluding the cases in which it is lost) is given by
$$E[\text{time to fixation}] = \frac{-4N\,(1 - p)\ln(1 - p)}{p}$$
Obviously, time scales positively with population size N. Also it takes longer if the allele in
question is rare than if it segregates at a high frequency. For a novel mutation entering a
population (p=1/2N) we can approximate
$$E[\text{time to fixation}] \approx 4N \quad \text{as } p \to 0$$
More generally, the average time it takes for a single new allele to fix within a
population is about twice the number of gene copies in the population. Hence, in a population of
N haploid individuals it will take 2N generations, in a population of N diploids 4N generations.
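To get a feeling for these numbers, the expression of Kimura & Ohta (1969) can simply be evaluated for a range of initial frequencies. This is a sketch of the formula as quoted in the text, with N = 1000 chosen arbitrarily and mean_fixation_time as an illustrative name.

```python
from math import log1p

def mean_fixation_time(n, p):
    """Expected generations until fixation, conditional on fixation,
    for diploid population size n and initial frequency p (0 < p < 1)."""
    # log1p(-p) computes ln(1 - p) accurately even for very small p.
    return -4 * n * (1 - p) * log1p(-p) / p

n = 1000
for p in (1 / (2 * n), 0.1, 0.5, 0.9):
    t = mean_fixation_time(n, p)
    print(f"p={p:.4f}  E[time to fixation] = {t:.0f} generations  (4N = {4 * n})")
```

The value for p = 1/2N is very close to 4N, and the expected time drops as the initial frequency increases.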
4) Genetic drift erodes genetic variation
Although the average allele frequencies in a population stay constant under drift (no
systematic effect), the heterozygosity (and the genetic variance by any measure) is
expected to decrease over time. This is intuitively clear if we consider the very long-term
effects of drift: Assume that allele A is initially present at frequency p0. We already know
that the average frequency, p̄ = E[p], will not change over time, i.e. p̄ = p0. However, in
every generation, there is a small chance that A either fixes (p → 1) or is lost from the
population (p → 0). Although this event is unlikely in any particular generation, it will
certainly happen in some generation if we only wait long enough. There is no conflict between
this fact and p̄ = p0: the A allele will go to fixation with probability p0 and be lost with
probability (1 - p0), thus p̄ = 1·p0 + 0·(1 - p0) = p0. Since we only have drift in the process
and no new mutation, the population will stay at p = 0 or p = 1 once it reaches this point.
This means that the heterozygosity H = 2p(1-p) will go to zero over the long haul: Even
though the variance among populations increases by drift, the genetic variance within a
population decreases.
For added generality, we express the heterozygosity H as the probability of finding two
different alleles at two homologous gene copies that are randomly drawn from the population. The
heterozygosity (often also expressed as nucleotide diversity π in a DNA sequence context) is
a central measure of the effects of genetic drift. We now consider its change across
generations as a function of population size. This is most easily derived for the homozygosity G
= 1 – H, i.e. the probability that the alleles at two randomly drawn gene copies are equal.
Consider a population of size N, i.e. 2N gene copies. At a given locus, the homozygosity in the
parent generation is G, and we ask for the expected homozygosity G’ in the offspring generation. For this, we
randomly draw a gene from the offspring and derive the probability that a second gene that is
also randomly drawn from the offspring has the same allele. There are two possibilities:
1. The second gene is a copy of the same parent gene as the first one. Under the
conditions of the Wright-Fisher model, this occurs with a probability of 1/2N.
2. The second gene is not a copy from the same parent gene, but from a randomly drawn
different parent (probability 1 – 1/2N). In this case, the probability that the alleles at
the target locus are equal is just given by the homozygosity G in the parent
generation.
Summing over these two cases, we obtain:
$$G' = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G = G + \frac{1}{2N}\,(1 - G)$$
We see that G’ exceeds G as long as G < 1, so homozygosity increases until it reaches 1. The
reason for this increase is case 1 above: in finite populations there is a non-zero probability
of inbreeding, i.e. that two gene copies are randomly drawn from the same parental gene, which
reduces heterozygosity in the offspring generation.
Using H = 1- G, we can express the change of heterozygosity as
$$H' = \left(1 - \frac{1}{2N}\right) H$$
Heterozygosity thus decreases by a factor (1 - 1/2N) per generation. Note that for an infinite
population size this factor goes to 1 and thus H’ = H. This is what we found for the Hardy-Weinberg-Law:
for an infinite population size, the genetic variance is preserved.
If H0 is the initial heterozygosity of the population, then the heterozygosity after t generations
(Ht) can be calculated as:
$$H_t = \left(1 - \frac{1}{2N}\right)^{t} H_0$$
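As a consistency check, we can iterate the one-generation recursion for homozygosity, G' = 1/2N + (1 - 1/2N) G, and compare the result with the closed-form decay of heterozygosity by a factor (1 - 1/2N) per generation. The sketch below uses illustrative function names and arbitrary values H0 = 0.5 and N = 50.

```python
def heterozygosity_closed_form(h0, n, t):
    """H_t = (1 - 1/(2N))^t * H_0."""
    return (1 - 1 / (2 * n)) ** t * h0

def heterozygosity_by_recursion(h0, n, t):
    """Apply G' = 1/(2N) + (1 - 1/(2N)) * G for t generations, then return H = 1 - G."""
    g = 1 - h0
    for _ in range(t):
        g = 1 / (2 * n) + (1 - 1 / (2 * n)) * g
    return 1 - g

h0, n = 0.5, 50
for t in (0, 10, 50, 100, 500):
    print(f"t={t:3d}  closed form={heterozygosity_closed_form(h0, n, t):.5f}"
          f"  recursion={heterozygosity_by_recursion(h0, n, t):.5f}")
```

Both columns agree up to floating-point rounding, which confirms that the closed form is just the recursion applied t times.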
We can derive the ‘half-life’ of heterozygosity under drift as:
$$\frac{H_t}{H_0} = \frac{1}{2} = \left(1 - \frac{1}{2N}\right)^{t}$$
$$\ln\frac{1}{2} = t \,\ln\left(1 - \frac{1}{2N}\right)$$
Using the approximation ln(1 + x) ≈ x for small x, we can solve for t:
$$t = \frac{-\ln 2}{\ln\left(1 - \frac{1}{2N}\right)} \approx \frac{-\ln 2}{-\frac{1}{2N}} = 2N \ln 2 \approx 1.39\,N$$
We see that the half-life of heterozygosity is of the order of the population size N: if N = 100,
it takes 139 generations to cut H in half; if N = one million, it takes 1.39 × 10^6 generations.
Once again, we see that drift is most potent in small populations; in large populations, it
erodes genetic variation only very slowly.
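How good is the approximation ln(1 + x) ≈ x here? A quick numerical comparison of the exact half-life, -ln 2 / ln(1 - 1/2N), with the approximation 2N ln 2 gives an impression. The function names and the population sizes below are chosen for illustration only.

```python
from math import log

def half_life_exact(n):
    """Generations until H is halved, solving (1 - 1/(2N))^t = 1/2 exactly."""
    return -log(2) / log(1 - 1 / (2 * n))

def half_life_approx(n):
    """Approximation 2N ln 2, i.e. about 1.39 N."""
    return 2 * n * log(2)

for n in (5, 100, 1_000_000):
    print(f"N={n:>9,d}  exact={half_life_exact(n):14.2f}  approx={half_life_approx(n):14.2f}")
```

Even for very small N the two values differ by less than one generation, so the rule of thumb of about 1.39 N generations is adequate for practical purposes.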
Literature: (Futuyma 2005; Barton et al. 2007; Nielsen & Slatkin 2013)
Barton NH, Briggs DEG, Eisen JA, Goldstein DB, Patel NH (2007) Evolution. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Futuyma DJ (2005) Evolution. Sinauer Associates.
Kimura M, Ohta T (1969) The Average Number of Generations Until Fixation of a Mutant
Gene in a Finite Population. Genetics, 61, 763–771.
Nielsen R, Slatkin M (2013) An Introduction to Population Genetics: Theory and
Applications. Macmillan Education, Sunderland, Mass.