Lecture IV: Genetic Drift

Genetic Drift - Evolution at random and loss of genetic variation

For the derivation of the Hardy-Weinberg law we assumed a population of infinite size. Under this and other simplifying conditions (random mating, no selection, non-overlapping generations) we concluded that both allele frequencies and genotype frequencies are stable across generations. Yet much of evolutionary genetic theory and practice concentrates on understanding how and why some alleles change drastically in frequency over time while others remain stable. If we understand the underlying forces, we have learnt a great deal about evolution.

The two most important factors that cause allele frequencies to change over time are selection and genetic drift. For every allele found in a population, there are two possible reasons why it may fail to be transferred to the offspring generation: bad fitness or just bad luck. Selection describes the first of these reasons, drift the second.

For example, imagine a small population of finite size, say 10 diploid individuals (e.g. 3 AA, 5 Aa, 2 aa). Perhaps some individuals leave more offspring than others, not because they carry ‘better adapted’ alleles, but simply because of external non-genetic factors. Some may never have a chance to reproduce because they run into a predator, are born on a terrible habitat patch, catch a deadly virus or never encounter a mate. In addition, those that make it to the age of reproduction may produce different numbers of offspring for similar reasons. Moreover, in heterozygous individuals it may happen that only gametes carrying the A allele make it into the offspring, by pure chance and unrelated to the allele itself. As a result, it is unlikely that the next generation will have exactly 11 A alleles and 9 a alleles, as the parents did. With high probability the allele frequency will have changed.
This process of random change of allele frequencies in finite populations is called genetic drift. It has several consequences that we will consider in turn.

Genetic Drift is Real

While theory is useful for predicting what we may expect, let us start by pointing out that genetic drift is not just a statistical abstraction. It happens under real-world conditions. In a classical experiment, Buri (1956) established 107 small populations of Drosophila melanogaster and propagated them by choosing 8 males and 8 females at random in each generation and letting them reproduce (producing about 1000 offspring together). He ran the experiment for about 20 generations and tracked the frequency of a Mendelian genetic marker (eye colour) that started at a frequency of 50%. Over the course of the experiment he made several important observations that we will consider in turn:

1) Allele frequencies changed within populations
2) Variation in allele frequency across populations increased
3) Heterozygosity and the genetic variation within populations declined

Lecture WS Evolutionary Genetics Part I - Jochen B. W. Wolf

Figure 1: Random genetic drift in an experimental setup of 107 Drosophila populations. See Futuyma 2005.

The Wright-Fisher Model

A mathematical description of the process again requires a model. Population geneticists have developed a number of different models to describe genetic drift. One of the simplest and most common is the Wright-Fisher model, named after the founders of population genetic theory, Sewall Wright and Ronald A. Fisher (who, by the way, is also the father of ANOVA). The Wright-Fisher model assumes a haploid population without sexes, in which each individual reproduces without needing to find a mate. This would be a realistic assumption in, say, bacterial populations.
But it turns out that the dynamics in a diploid, randomly mating population are almost identical to the dynamics of the haploid model, which is mathematically more tractable. The population is further assumed to be of constant size, holding 2N individuals (the factor two is human-centric and used to mimic a diploid situation). Gene copies are transmitted from generation t to generation t+1 by random sampling, independently and with equal probability. Again, generations are discrete, i.e. non-overlapping. In any generation there are thus 2N homologous gene copies in the population. If K is the number of A alleles in a given generation, the frequency of the A allele in that generation is p = K/2N.

We want to model the allele frequency change among generations. How do we do this? An analogy may help. Imagine you have two bags. One (the parent population) contains 2N marbles of different colours (each colour representing an allele). The other (the next generation) is still empty. We now draw a marble from the parent population (a reproduction event), note its colour, put the original marble back and put a new marble of this colour into the second bag (the offspring). By returning the original marble to the first bag, we make sure that the original allele frequency remains unaltered. This is a realistic assumption, since parents generally produce a very large (effectively infinite) number of gametes, much larger than the 2N gene copies used for the next generation. We now shake the original bag to make sure we continue drawing at random and repeat the process until there are 2N marbles in the new bag. We then count the number of marbles of each colour in both bags and note the difference. Using this analogy, we have modelled the change in allele frequencies from generation t to the offspring generation t+1.
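The two-bag procedure above can be sketched in a few lines of Python. This is only a minimal illustration of sampling with replacement, not a reference implementation; the function names `next_generation` and `simulate`, the parameter names and the seed values are assumptions made for this example and do not come from the lecture.

```python
import random


def next_generation(parents, rng):
    """Fill the second bag: draw len(parents) marbles with replacement."""
    return [rng.choice(parents) for _ in range(len(parents))]


def simulate(k_start, two_n, generations, seed=0):
    """Return the frequency p = K/2N of allele A in each generation."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    population = ["A"] * k_start + ["a"] * (two_n - k_start)
    trajectory = [population.count("A") / two_n]
    for _ in range(generations):
        population = next_generation(population, rng)
        trajectory.append(population.count("A") / two_n)
    return trajectory


# Hypothetical example: 2N = 20 gene copies, starting with K = 10 copies of A.
for generation, p in enumerate(simulate(k_start=10, two_n=20, generations=10, seed=1)):
    print(generation, f"{p:.2f}")
```

Running the sketch with different seeds gives different trajectories, which is the point of the analogy: the change from one generation to the next comes from the sampling alone.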
The Wright-Fisher model (with mutation included, see next lecture) is the null model of population genetics. This means it is used to construct null hypotheses for the change of allele frequencies in populations. These hypotheses are then tested against data. Rejection of the simple Wright-Fisher model is taken as evidence that something more interesting than “just mutation and drift” acts in the population, for example selection.

1) Allele frequencies change within populations

The sampling process introduces heterogeneity in the number of offspring each individual produces. Some will not reproduce at all (marble not drawn), while others will have multiple offspring (marble drawn several times). Note that these differences are not due to selection, but merely due to the sampling process. As a result, allele frequencies will fluctuate, and eventually one of the two alleles will become fixed and the other will be lost. If we trace the ancestry of each gene copy (marble) backwards, we will notice that after many generations all gene copies have descended from a single ancestral gene copy. If the failure of gene copies to leave descendants is random, then the gene copies at time t could equally likely have descended from any of the original gene copies present at time 0. The concept of tracing gene genealogies backwards in time is the basis of coalescent theory, a retrospective stochastic model describing the effects of genetic drift.

Figure 2: Illustration of genetic drift by random sampling of gene copies (marbles) with two allelic states (red, blue) across discrete generations t, t+1, ..., t+4.

For now, we will stick to the original, forward-in-time Wright-Fisher model describing the effects of genetic drift. Mathematically the bean-bag experiment corresponds to 2N Bernoulli trials. For each trial the probability of drawing allele A is given by its frequency p = Pr(A), and the probability of drawing allele a is q = 1 - p = Pr(a).
The probability of finding the A allele K times in the second bag, i.e. the distribution of offspring alleles in generation t+1, is given by the binomial distribution

$$\Pr(K; 2N; p) = \binom{2N}{K}\, p^{K} (1-p)^{2N-K}$$

where

$$\binom{2N}{K} = \frac{(2N)!}{K!\,(2N-K)!}$$

is the binomial coefficient.

Example: Assume N = 5, thus 2N = 10 alleles in the population, and initially 2 copies of the A allele, thus K = 2 and p = 2/10 = 0.2. The probability of still having K = 2 copies in the following generation is:

$$\Pr(2; 10; 0.2) = \binom{10}{2}\, 0.2^{2}\, 0.8^{8} \approx 0.30$$

The probability of losing the allele altogether, i.e. K = 0, is:

$$\Pr(0; 10; 0.2) = \binom{10}{0}\, 0.2^{0}\, 0.8^{10} \approx 0.11$$

If we assume 10 times the population size, but still initially p = 0.2, the probability for p = 0 in the following generation becomes

$$\Pr(0; 100; 0.2) = \binom{100}{0}\, 0.2^{0}\, 0.8^{100} \approx 2.0 \times 10^{-10}$$

We see that the probability of large changes in the allele frequency is much reduced in a large population. Population size matters: the effects of genetic drift are stronger in small populations!

What happens if you repeat the Wright-Fisher sampling scheme over many generations? Bob Sheehy at Radford University has compiled a nice simulation tool worth checking out: http://www.radford.edu/~rsheehy/Gen_flash/popgen/Popgen_help/help.html

2) The mean of the allele frequency stays the same, but the variance increases

While it is obvious that even extreme changes in allele frequency have a non-zero probability, the expected allele frequency at t+1 is equal to the allele frequency in generation t. Since we assume random sampling of genes in generation t, the chance that any one draw for generation t+1 yields allele A is simply given by its frequency in generation t, p(t). Accordingly, the number of A alleles we expect to sample into the next generation is 2N p(t), as we draw 2N times for the next generation.
The expected frequency of the A allele in generation t+1 is thus the expected number 2N p(t) divided by the total number of gene copies, i.e. 2N:

$$E[p(t+1)] = \frac{2N\,p(t)}{2N} = p(t)$$

The same result is apparent when considering the expected value of our binomially distributed random variable K (the number of A alleles among 2N gene copies): E[K] = 2Np. The number of A alleles, and therefore its frequency, is thus expected to remain constant.

However, in any single transition from one generation to the next, allele frequencies can differ substantially between the parental and offspring generation, more so in very small populations. While the expected allele frequency does not change, the variance increases. Imagine that a single parent population gives rise to several ‘colony’ offspring populations. Initially, the frequency of A in all colonies will be very close to the original value p(t). But over time, p will drift up in some colonies and down in others: the average p taken over all colonies stays at p(t), but the variance of p among colonies increases. Eventually, some of the colonies will have only A alleles and the others only a alleles.

We can quantify how much the variance in allele frequency among colonies increases per generation. After a single generation, the variance in the number K of A alleles among colonies is just the variance of the binomial distribution, Var[K] = 2Np(1-p) = 2Npq. Accordingly, the variance of the allele frequency p = K/2N is given by

$$\mathrm{Var}[p] = \mathrm{Var}\!\left[\frac{K}{2N}\right] = \frac{2Npq}{4N^{2}} = \frac{pq}{2N}$$

Note that Var[p] → 0 as N → ∞. The colonies will diverge only very slowly if the population size is large. For infinite population sizes, drift ceases to operate: we retain the prediction under the Hardy-Weinberg law.
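As a quick numerical check of this section, the following sketch evaluates the binomial distribution directly. It reproduces the three worked probabilities for p = 0.2 and compares the exact mean and variance of the offspring frequency K/2N with p and pq/2N. The helper names `binom_pmf` and `freq_mean_and_variance` are made up for this illustration; only `math.comb` from the Python standard library is used.

```python
from math import comb


def binom_pmf(k, two_n, p):
    """Pr(K; 2N; p): probability of exactly k copies of A among 2N draws."""
    return comb(two_n, k) * p**k * (1 - p) ** (two_n - k)


def freq_mean_and_variance(two_n, p):
    """Exact mean and variance of the offspring allele frequency K/2N."""
    probs = [binom_pmf(k, two_n, p) for k in range(two_n + 1)]
    mean = sum(pk * k / two_n for k, pk in enumerate(probs))
    var = sum(pk * (k / two_n - mean) ** 2 for k, pk in enumerate(probs))
    return mean, var


# Worked examples from the text (p = 0.2).
print(binom_pmf(2, 10, 0.2))   # about 0.30
print(binom_pmf(0, 10, 0.2))   # about 0.11
print(binom_pmf(0, 100, 0.2))  # about 2.0e-10

# The mean stays at p, and the variance matches pq/2N.
for two_n in (10, 100):
    mean, var = freq_mean_and_variance(two_n, 0.2)
    print(two_n, mean, var, 0.2 * 0.8 / two_n)
```

The tenfold larger population shows a tenfold smaller variance after one generation, in line with Var[p] = pq/2N.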
3) The probability of fixation and the time it takes

As becomes apparent from the above, in the absence of mutation any allele must eventually be lost or fixed. As there is no selection that could increase or decrease the chance that any particular gene copy will be the lucky one, the probability of eventually going to fixation is the same for all gene copies. As the probabilities for all gene copies must sum to 1, the probability for any one gene copy must be 1/2N. Hence, the probability that an allele of frequency 1/2N goes to fixation is simply 1/2N. For an allele with K copies in the population, the probability of fixation is K/2N or p, the allele frequency. Hence,

$$\Pr(\text{fixation of allele } A) = K \cdot \frac{1}{2N} = p$$

The time it takes for allele A to fix depends on the population size and its initial frequency p0. Kimura & Ohta (1969) showed that the average number of generations it takes an allele to fix (excluding the cases in which it is lost) is given by

$$E[\text{time to fixation}] = \frac{-4N\,(1-p)\ln(1-p)}{p}$$

Obviously, the time scales positively with population size N. Also, it takes longer if the allele in question is rare than if it segregates at a high frequency. For a novel mutation entering a population (p = 1/2N) we can approximate

$$E[\text{time to fixation}] \approx 4N, \quad \text{as } p \to 0$$

More generally, the average time it takes a single new allele to fix in a population is approximately twice the number of gene copies in the population. Hence, in a population of N haploid individuals it will take about 2N generations, in a population of N diploid individuals about 4N generations.

4) Genetic drift erodes genetic variation

Although the average allele frequencies in a population stay constant under drift (no systematic effect), the heterozygosity (and the genetic variance by any measure) is expected to decrease over time. This is intuitively clear if we consider the very long-term effects of drift: Assume that allele A is initially present at frequency p0.
We already know that the average frequency, $\bar{p} = E[p]$, will not change over time, i.e. $\bar{p} = p_0$. However, in every generation there is a small chance that A either fixes (p → 1) or is lost from the population (p → 0). Although this event is unlikely in any particular generation, it will certainly happen in some generation if we only wait long enough. There is no conflict between this fact and $\bar{p} = p_0$: the A allele will go to fixation with probability p0 and be lost with probability (1 - p0), thus

$$\bar{p} = 1 \cdot p_0 + 0 \cdot (1 - p_0) = p_0$$

Since we only have drift in the process and no new mutation, the population will stay at p = 0 or p = 1 once it reaches this point. This means that the heterozygosity H = 2p(1-p) will go to zero over the long haul: even though the variance among populations increases by drift, the genetic variance within a population decreases.

For added generality, we express the heterozygosity H as the probability of finding two different alleles in two homologous gene copies that are randomly drawn from the population. The heterozygosity (often also expressed as nucleotide diversity π in a DNA sequence context) is a central measure of the effects of genetic drift. We now consider its change across generations as a function of population size. This is most easily derived for the homozygosity G = 1 - H, i.e. the probability that the alleles at two randomly drawn genes are equal.

Consider a diploid population of size N, i.e. 2N gene copies. At a given locus, the homozygosity in the parent generation is G, and we ask for the expected homozygosity G’ in the offspring generation. For this, we randomly draw a gene from the offspring and derive the probability that a second gene that is also randomly drawn from the offspring has the same allele. There are two possibilities:

1. The second gene is a copy of the same parent gene as the first one. Under the conditions of the Wright-Fisher model, this occurs with a probability of 1/2N. In this case the two alleles are necessarily equal.
2.
The second gene is not a copy of the same parent gene, but of a different, randomly drawn parent gene (probability 1 - 1/2N). In this case, the probability that the alleles at the target locus are equal is just given by the homozygosity G in the parent generation.

Summing over these two cases, we obtain:

$$G' = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G = G + \frac{1}{2N}\,(1 - G)$$

We see that G’ will increase from generation to generation until G reaches 1. The reason for this increase is case 1 above: in finite populations there is a non-zero probability of inbreeding, i.e. that two alleles are copies of the same parental gene, which reduces heterozygosity in the offspring generation. Using H = 1 - G, we can express the change of heterozygosity as

$$H' = \left(1 - \frac{1}{2N}\right) H$$

Heterozygosity thus decreases by a factor (1 - 1/2N) per generation. Note that for an infinite population size this factor goes to 1 and thus H’ = H. This is what we found for the Hardy-Weinberg law: for an infinite population size, the genetic variance is preserved. If H0 is the initial heterozygosity of the population, then the heterozygosity after t generations (Ht) can be calculated as:

$$H_t = \left(1 - \frac{1}{2N}\right)^{t} H_0$$

We can derive the ‘half-life’ of heterozygosity under drift as:

$$\frac{H_t}{H_0} = \frac{1}{2} = \left(1 - \frac{1}{2N}\right)^{t}$$

$$\ln\frac{1}{2} = t \,\ln\left(1 - \frac{1}{2N}\right)$$

Using the approximation ln(1 + x) ≈ x for small x, we can solve for t:

$$t = \frac{-\ln 2}{\ln\left(1 - \frac{1}{2N}\right)} \approx \frac{-\ln 2}{-\frac{1}{2N}} = 2N \ln 2 \approx 1.39\,N$$

We see that the half-life of heterozygosity is of the order of the population size N: if N = 100, it takes 139 generations to cut H in half; if N = one million, it takes 1.39 × 10^6 generations. Once again, we see that drift is most potent in small populations; in large populations, it erodes genetic variation only very slowly.

Literature: (Futuyma 2005; Barton et al. 2007; Nielsen & Slatkin 2013)

Barton NH, Briggs DEG, Eisen JA, Goldstein DB, Patel NH (2007) Evolution. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Futuyma DJ (2005) Evolution. Sinauer Associates.
Kimura M, Ohta T (1969) The Average Number of Generations Until Fixation of a Mutant Gene in a Finite Population. Genetics, 61, 763–771.
Nielsen R, Slatkin M (2013) An Introduction to Population Genetics: Theory and Applications. Macmillan Education, Sunderland, Mass.