
Mutation-Drift Balance
Jay Taylor (ASU), 25 Jan 2017

Genetic Variation in Finite Populations

The amount of genetic variation found in a population is influenced by two opposing forces, mutation and genetic drift:

1. Mutation tends to increase variation.
2. Genetic drift tends to reduce variation.

In particular, if both the mutation rate and the effective population size are relatively stable, then the amount of genetic variation will tend towards an equilibrium known as mutation-drift balance. At this equilibrium, the rate at which drift removes variation equals the rate at which mutation creates new variation.

Multilocus Surveys Reveal Limited Variation in Nucleotide Diversity ($0.00005 \leq \pi \leq 0.1$)

[Figure. Source: Leffler et al. (2012), "Revisiting an Old Riddle: What Determines Genetic Diversity Levels within Species?"]

Mutation-Drift Balance and Identity by Descent

Mutation and drift have opposing effects on the probability that individuals are identical by descent (Cotterman 1940, Malécot 1941).

1. We say that two haploid individuals are identical in state at a locus if they carry the same allele.
2. We say that two haploid individuals are identical by descent at a locus if they carry the same allele and inherited that allele without mutation (or recombination) from their most recent common ancestor.

[Figure: Individuals can be identical in state even when they are not identical by descent (homoplasy). Panel "Identity by state": two lineages descend from an A1 ancestor and each independently mutates A1 → A2, so both carry A2. Panel "Identity by descent": both individuals inherit A1 from their common ancestor without mutation.]

Suppose that we sample two chromosomes at random from generation $t$, and let $F_t$ be the probability that they are identical by descent. We can derive a recursion relating $F_{t+1}$ to $F_t$ by considering the parentage of the sampled chromosomes.
For simplicity, we make the following assumptions:

1. The population is diploid, with coalescent effective population size $N_e$.
2. Mutation is governed by the infinite-alleles model (IAM), which assumes that every mutation generates a new, unique allele (no back mutation).
3. The mutation rate is $\mu$ per chromosome per generation.

The two sampled chromosomes descend from the same parental chromosome with probability $1/(2N_e)$. Otherwise they descend from different parental chromosomes, which are identical by descent with probability $F_t$. In either case, they remain identical by descent only if neither chromosome mutates, which has probability $(1-\mu)^2$. Hence

$$F_{t+1} = \underbrace{\frac{1}{2N_e}\,(1-\mu)^2}_{\text{same parent}} + \underbrace{\left(1 - \frac{1}{2N_e}\right) F_t\,(1-\mu)^2}_{\text{different parents}}.$$

As $t$ increases, these probabilities tend to a limit, $F_t \to \tilde{F}$, which is the probability of identity by descent at equilibrium. This quantity satisfies

$$\tilde{F} = \frac{1}{2N_e}\,(1-\mu)^2 + \left(1 - \frac{1}{2N_e}\right)\tilde{F}\,(1-\mu)^2.$$

Rearranging gives

$$\tilde{F}\left[1 - \left(1 - \frac{1}{2N_e}\right)(1-\mu)^2\right] = \frac{1}{2N_e}\,(1-\mu)^2,$$

which can then be solved for $\tilde{F}$:

$$\tilde{F} = \frac{\frac{1}{2N_e}(1-\mu)^2}{1 - \left(1 - \frac{1}{2N_e}\right)(1-\mu)^2}.$$

Now assume that $\mu \ll 1$. Using $(1-\mu)^2 \approx 1 - 2\mu$ and dropping the product $\mu/N_e$, the denominator is approximately $2\mu + 1/(2N_e)$ and the numerator is approximately $1/(2N_e)$. Multiplying both by $2N_e$ gives an approximate expression for the probability of identity by descent in a diploid population at equilibrium:

Identity by descent at mutation-drift equilibrium in the IAM:
$$\tilde{F} \approx \frac{1}{1 + 4N_e\mu} = \frac{1}{1+\Theta}.$$

$\tilde{F}$ depends only on the parameter $\Theta = 4N_e\mu$ (the population mutation rate). Increasing $\mu$ reduces $\tilde{F}$ because individuals are more likely to have inherited alleles that have mutated from their ancestral state. Increasing $N_e$ reduces $\tilde{F}$ because pairs of randomly sampled individuals are less likely to be closely related in a large population than in a small one. In other words, genetic drift reduces variation by increasing the relatedness of the members of a population.

In a randomly mating population, the heterozygosity $H$ is simply $1 - F$. This gives the following classical result:

Heterozygosity at mutation-drift equilibrium in the IAM:
$$\tilde{H} \approx \frac{\Theta}{1+\Theta}.$$

Competing rates interpretation: As we trace two lineages backwards in time, two kinds of events can occur. The two lineages can coalesce, at rate $1/(2N_e)$, or one of the lineages can mutate, at total rate $2\mu$. The two chromosomes will carry different alleles if one of the lineages mutates before the two lineages coalesce. This occurs with probability

$$P(\text{mutation first}) = \frac{2\mu}{2\mu + 1/(2N_e)} = \frac{4N_e\mu}{1 + 4N_e\mu} = \frac{\Theta}{1+\Theta}.$$

Example: Coyne (1976) detected 23 electrophoretically distinguishable alleles at the xanthine dehydrogenase locus in a sample of 60 D. persimilis chromosomes, with the following frequencies:

$p_1 = p_2 = \cdots = p_{18} = 1/60$ (singletons)
$p_{19} = p_{20} = p_{21} = 1/30$
$p_{22} = 1/15$
$p_{23} = 8/15$

We can use these data to estimate both the probability of identity by descent at this locus and the population mutation rate $\Theta$. The second estimate inverts $\tilde{F} \approx 1/(1+\Theta)$:

$$\hat{F} = \sum_{i=1}^{23} p_i^2 \approx 0.297, \qquad \hat{\Theta} = \frac{1 - \hat{F}}{\hat{F}} \approx 2.37.$$

However, without additional information, we cannot estimate $\mu$ and $N_e$ separately.

Mutation-Drift Balance for Microsatellite Loci

For certain kinds of loci, the infinite-alleles model is unsuitable and these predictions need to be modified. This is often true, for example, of tandemly repeated DNA sequences such as microsatellite loci.

Microsatellite repeat units are 2-7 bp long. The number of repeats can vary greatly between individuals. These loci tend to mutate at very high rates, and homoplasy may be common.

Replication slippage leads to changes in copy number

Replication slippage occurs when the parent and daughter strands partially separate during replication and then incorrectly re-anneal. Slippage usually leads to the gain or loss of a single repeat, although larger changes sometimes occur. Mutation rates can be on the order of 1 event per locus per 1000 generations.
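Returning to the infinite-alleles results above, the identity-by-descent recursion and the Coyne (1976) estimates can be checked numerically. The following is a minimal sketch, not code from the source. It iterates the recursion for $F_t$ until it settles, then compares the limit with the exact fixed point and with $1/(1+\Theta)$. It also recomputes $\hat{F}$ and $\hat{\Theta}$ from the allele frequencies. The values $N_e = 1000$ and $\mu = 10^{-4}$ are hypothetical, and the function names are illustrative.

```python
from __future__ import annotations

# Sketch: identity by descent (IBD) under the infinite-alleles model (IAM).
# The Ne and mu values in the usage block are hypothetical.


def iterate_ibd(ne: float, mu: float, f0: float = 0.0, generations: int = 100_000) -> float:
    """Iterate F_{t+1} = (1/2Ne)(1-mu)^2 + (1 - 1/2Ne) F_t (1-mu)^2 starting from F_0 = f0."""
    no_mutation = (1.0 - mu) ** 2  # probability that neither lineage mutates this generation
    same_parent = 1.0 / (2.0 * ne)  # probability both copies come from one parental chromosome
    f = f0
    for _ in range(generations):
        f = same_parent * no_mutation + (1.0 - same_parent) * f * no_mutation
    return f


def ibd_equilibrium_exact(ne: float, mu: float) -> float:
    """Exact fixed point of the recursion."""
    no_mutation = (1.0 - mu) ** 2
    same_parent = 1.0 / (2.0 * ne)
    return same_parent * no_mutation / (1.0 - (1.0 - same_parent) * no_mutation)


def ibd_equilibrium_approx(ne: float, mu: float) -> float:
    """Small-mu approximation F ~ 1 / (1 + Theta), with Theta = 4 Ne mu."""
    return 1.0 / (1.0 + 4.0 * ne * mu)


def theta_from_allele_frequencies(freqs: list[float]) -> tuple[float, float]:
    """Return (F_hat, Theta_hat) with F_hat = sum p_i^2 and Theta_hat = (1 - F_hat) / F_hat."""
    f_hat = sum(p * p for p in freqs)
    return f_hat, (1.0 - f_hat) / f_hat


if __name__ == "__main__":
    ne, mu = 1_000, 1e-4  # hypothetical: Theta = 4 * 1000 * 1e-4 = 0.4
    print("iterated:", iterate_ibd(ne, mu))
    print("exact:   ", ibd_equilibrium_exact(ne, mu))
    print("approx:  ", ibd_equilibrium_approx(ne, mu))

    # Coyne (1976): 23 alleles in 60 D. persimilis chromosomes
    coyne = [1 / 60] * 18 + [1 / 30] * 3 + [1 / 15] + [8 / 15]
    f_hat, theta_hat = theta_from_allele_frequencies(coyne)
    print(f"F_hat = {f_hat:.4f}, Theta_hat = {theta_hat:.3f}")
```

For these parameters, the exact fixed point and the approximation $1/(1+\Theta)$ agree to within about $10^{-4}$. Without intermediate rounding, $\hat{F} = 1070/3600 \approx 0.2972$ and $\hat{\Theta} = 2530/1070 \approx 2.36$. The value 2.37 quoted above comes from plugging in the rounded $\hat{F} = 0.297$.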
Copy-number change in microsatellite loci is often modeled using the stepwise mutation model (SMM), which assumes that each mutation event increases or decreases the number of repeats by exactly one. For this model, Ohta & Kimura (1973) showed:

Heterozygosity at mutation-drift equilibrium in the SMM:
$$\tilde{H} = 1 - \frac{1}{(1 + 2\Theta)^{1/2}}.$$

The equilibrium heterozygosity under the SMM is less than that under the IAM. For $\Theta > 0$ we have $(1+\Theta)^2 > 1 + 2\Theta$, so $1/(1+2\Theta)^{1/2} > 1/(1+\Theta)$. Intuitively, stepwise mutations can recreate allele states that already exist in the population. This prediction ignores the possibility of copy-number changes involving more than one repeat, which may be common at some loci.

Mutation-Drift Balance in the Infinite Sites Model

The infinite-alleles model was useful in the pre-sequencing era, when allelic variation could only be discriminated by biochemical means. To handle DNA sequence data, however, we need a more refined model.

The infinite sites model (ISM) was introduced by Kimura (1969). It assumes that there are infinitely many sites, each equally likely to mutate, and that no site mutates more than once. This simplification is reasonable if the mutation rate per site is low and the sequences being analyzed are not too distantly related. It therefore suits intraspecific polymorphism but not interspecific divergence.

With the ISM, we can ask questions about the number of segregating sites and their frequencies at mutation-drift balance.

Suppose that $n$ chromosomes are sampled from a population with coalescent effective population size $N_e$, and let $S_n$ be the number of segregating sites. Then:

Expected number of segregating sites at equilibrium under the ISM:
$$E[S_n] = \Theta \sum_{i=1}^{n-1} \frac{1}{i}.$$

Here $\Theta = 4N_e\mu$, where $\mu$ is the locus-wide mutation rate of the sequenced region. When $n$ is large, $E[S_n] \sim \Theta \log(n)$.
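A short sketch compares the IAM and SMM equilibrium heterozygosities for a few values of $\Theta$. It also evaluates the harmonic sum $a_n = \sum_{i=1}^{n-1} 1/i$ that multiplies $\Theta$ in $E[S_n]$. The function names are illustrative and do not come from any library.

```python
import math

# Sketch: equilibrium heterozygosity under the infinite-alleles model (IAM)
# versus the stepwise mutation model (SMM), plus the harmonic factor in E[S_n].


def heterozygosity_iam(theta: float) -> float:
    """IAM approximation: H ~ Theta / (1 + Theta)."""
    return theta / (1.0 + theta)


def heterozygosity_smm(theta: float) -> float:
    """Ohta & Kimura (1973) SMM result: H = 1 - (1 + 2 Theta)^(-1/2)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the factor multiplying Theta in E[S_n]."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(theta: float, n: int) -> float:
    """E[S_n] = Theta * a_n under the infinite sites model (Theta is locus-wide)."""
    return theta * harmonic_a(n)


if __name__ == "__main__":
    for theta in (0.1, 1.0, 10.0):
        print(f"Theta={theta:5.1f}  H_IAM={heterozygosity_iam(theta):.3f}  "
              f"H_SMM={heterozygosity_smm(theta):.3f}")
    for n in (10, 100, 1000, 10_000):
        print(f"n={n:6d}  a_n={harmonic_a(n):.2f}  log(n)={math.log(n):.2f}")
```

For every $\Theta > 0$, the SMM value is the smaller of the two. The printed $a_n$ values grow like $\log(n)$, and $a_n - \log(n)$ approaches Euler's constant (about 0.577) as $n$ grows.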
This grows very slowly with $n$, so very large samples are often needed to discover new segregating sites, e.g.,

$$E[S_{10}] \approx 2.83\,\Theta, \quad E[S_{100}] \approx 5.18\,\Theta, \quad E[S_{1000}] \approx 7.48\,\Theta, \quad E[S_{10000}] \approx 9.79\,\Theta.$$

We can turn this last result into an estimator of $\Theta$ using the method of moments:

Watterson's estimator:
$$\hat{\Theta}_W = S_n \Big/ \sum_{i=1}^{n-1} \frac{1}{i}.$$

$\hat{\Theta}_W$ is unbiased and asymptotically normal as $n \to \infty$. However, its variance is fairly large and decreases only very slowly as $n \to \infty$, roughly in proportion to $1/\log(n)$. Nonetheless, $\hat{\Theta}_W$ is sometimes useful when estimates are needed on the fly. For example, migrate-n uses $\hat{\Theta}_W$ to estimate initial effective population sizes that are then refined through more computationally intensive procedures.

The nucleotide diversity of a locus is defined as the probability that two randomly chosen individuals differ at a randomly chosen site within that locus. It can be estimated from a sample of chromosomes, in which case the sample nucleotide diversity is usually denoted $\pi$.

Equilibrium nucleotide diversity under the ISM:
$$E[\pi] = \frac{\Theta}{1+\Theta}.$$

Here $\Theta = 4N_e\mu$, where $\mu$ is the mutation rate per site per generation. This result can be derived by applying the competing-rates calculation from earlier to a single site. The expected value does not depend on the sample size, but its variance does.

Example: Kreitman (1983) sequenced a 768 bp region of the ADH locus in 11 chromosomes sampled from D. melanogaster. He found a total of 6 alleles containing 14 segregating sites, shown below ("." means the same base as the reference sequence Ref; column headers are site positions).

Name    39 226 387 393 441 513 519 531 540 578 606 615 645 684
Ref      T   C   C   C   C   C   T   C   C   A   C   T   A   G
Wa-S     .   T   T   .   A   A   C   .   .   .   .   .   .   .
Fl-1S    .   T   T   .   A   A   C   .   .   .   .   .   .   .
Af-S     .   .   .   .   .   .   .   .   .   .   .   .   .   A
Fr-S     .   .   .   .   .   .   .   .   .   .   .   .   .   A
Fl-2S    G   .   .   .   .   .   .   .   .   .   .   .   .   .
Ja-S     G   .   .   .   .   .   .   .   T   .   T   .   C   A
Fl-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Fr-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Wa-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Af-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Ja-F     G   .   .   A   .   .   .   G   T   C   T   C   C   .

For this data set, the population mutation rate $\Theta = 4N_e\mu$ can be estimated in three ways, using $S_{11}$, $F$, or $\pi$. Here we estimate the per-site population mutation rate. The first two estimates are locus-wide, so we divide them by the number of sites. In the first line, $2.929 \approx \sum_{i=1}^{10} 1/i$:

$$S_{11} = 14 \;\Rightarrow\; \hat{\Theta}_W = \frac{1}{768}\cdot\frac{14}{2.929} \approx 0.00622$$
$$F = 0.223 \;\Rightarrow\; \hat{\Theta}_F = \frac{1}{768}\cdot\frac{1-F}{F} \approx 0.00453$$
$$\pi = 0.00786 \;\Rightarrow\; \hat{\Theta}_\pi = \frac{\pi}{1-\pi} \approx 0.00792$$

Remarks: Take a mutation rate of $\mu = 10^{-8}$ mutations per site per generation. Solving $\Theta = 4N_e\mu$ for $N_e$ then gives estimates of $N_e \approx 110{,}000$ to $200{,}000$. The variation between estimates has several possible sources: estimation error (noise), use of different information from the data, and model misspecification.

Mutation-Drift Balance in Bi-allelic Models

We can incorporate bi-allelic mutation into the Wright-Fisher model by making the following modifications:

1. Mutations occur only during reproduction and arise independently in each offspring.
2. Each offspring of an A parent inherits a mutant a allele with probability $v$. Similarly, each offspring of an a parent inherits a mutant A allele with probability $u$.

All other assumptions remain unchanged: non-overlapping generations, constant population size, binomial sampling, and neutrality.

Mutation changes the behavior of the Wright-Fisher model in several ways. The expected frequency of allele A is no longer constant. Instead,

$$E[\Delta p_t \mid p_t] = u\,(1 - p_t) - v\,p_t.$$

This is positive when $p_t < u/(u+v)$ and negative when $p_t > u/(u+v)$. Thus A tends to increase in frequency when rare and decrease in frequency when common.
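The three per-site estimates for the Kreitman (1983) data can be reproduced from the table. This sketch re-enters the table as differences from Ref at the 14 segregating sites. It assumes, as the table implies, that the other 754 of the 768 sites are identical in all sequences. $F$ is computed from the frequencies of the 6 distinct haplotypes, and $\pi$ from the average number of pairwise differences divided by 768. The $N_e$ conversion uses $N_e = \hat{\Theta}/(4\mu)$ with $\mu = 10^{-8}$, as implied by $\Theta = 4N_e\mu$. Variable and function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Sketch: per-site Theta estimates (Watterson, identity, nucleotide diversity)
# for the Kreitman (1983) table. Only the 14 segregating sites are stored.

LENGTH = 768
POSITIONS = [39, 226, 387, 393, 441, 513, 519, 531, 540, 578, 606, 615, 645, 684]
REFERENCE = dict(zip(POSITIONS, "TCCCCCTCCACTAG"))

# Differences from Ref for each sequence ("." entries are omitted).
WA_FL1_S = {226: "T", 387: "T", 441: "A", 513: "A", 519: "C"}  # shared by Wa-S and Fl-1S
F_SHARED = {39: "G", 531: "G", 540: "T", 578: "C", 606: "T", 615: "C", 645: "C"}  # Fl/Fr/Wa/Af-F
DIFFS = {
    "Wa-S": WA_FL1_S, "Fl-1S": WA_FL1_S,
    "Af-S": {684: "A"}, "Fr-S": {684: "A"},
    "Fl-2S": {39: "G"},
    "Ja-S": {39: "G", 540: "T", 606: "T", 645: "C", 684: "A"},
    "Fl-F": F_SHARED, "Fr-F": F_SHARED, "Wa-F": F_SHARED, "Af-F": F_SHARED,
    "Ja-F": {**F_SHARED, 393: "A"},
}


def haplotype(diffs: dict[int, str]) -> str:
    """Bases at the segregating sites, filling '.' entries from the reference."""
    return "".join(diffs.get(pos, REFERENCE[pos]) for pos in POSITIONS)


def summarize(seqs: list[str], length: int) -> dict[str, float]:
    """Compute S, F, pi and the three per-site Theta estimates."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)  # segregating sites
    freqs = [seqs.count(h) / n for h in set(seqs)]  # haplotype (allele) frequencies
    f = sum(p * p for p in freqs)
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = mean_diff / length
    a_n = sum(1.0 / i for i in range(1, n))
    return {
        "S": s, "F": f, "pi": pi,
        "theta_W": s / a_n / length,
        "theta_F": (1.0 - f) / f / length,
        "theta_pi": pi / (1.0 - pi),
    }


if __name__ == "__main__":
    stats = summarize([haplotype(d) for d in DIFFS.values()], LENGTH)
    mu = 1e-8  # per site per generation, as in the remarks
    print(f"S = {stats['S']}, F = {stats['F']:.3f}, pi = {stats['pi']:.5f}")
    for key in ("theta_W", "theta_F", "theta_pi"):
        print(f"{key}: {stats[key]:.5f}  Ne = Theta/(4 mu) = {stats[key] / (4 * mu):,.0f}")
```

Storing only differences from Ref mirrors the "." convention of the table, which makes each row easy to check against the printed data.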
Although alleles may be transiently lost from the population, they will eventually be reintroduced by mutation.

[Figure: simulated trajectories of the frequency $p$ of A over 10,000 generations, in three panels: $N = 10^3, \mu = 10^{-3}$; $N = 10^3, \mu = 10^{-4}$; $N = 10^4, \mu = 10^{-4}$.]

Stationary Distribution of Allele Frequencies under Mutation-Drift Balance

If the mutation rates are positive, then the allele frequencies never settle into fixed values. However, it can be shown that the distribution of $p_t$ converges to a limiting distribution, which we call the stationary distribution. The limiting distribution does not depend on the initial frequency of A. It takes roughly $4N_e$ generations for the population to forget its initial frequency.

[Figure: Stationary behavior of the Wright-Fisher process ($N = 100$, $u = 0.02$). Densities of $p$ at $t = 2$, $t = 20$ and $t = 100$, for initial frequencies $p = 0.01$, $0.5$ and $0.9$.]

Two interpretations of the stationary distribution: If we run a large number of independent simulations or experiments, then after a sufficient number of generations, the distribution of allele frequencies across trials will be given by the stationary distribution. Alternatively, if we run a single simulation or experiment for a very long time, then the proportion of time that the allele frequency spends near $p$ will be proportional to the stationary density at $p$.
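The bi-allelic Wright-Fisher model with mutation can be simulated directly. This minimal sketch tracks the number of A copies among `n_genes` gene copies, so a diploid population of $N$ individuals would have `n_genes = 2N`. Each offspring copy is drawn from a random parent and then mutation is applied. The sketch checks the mean one-generation change against $u(1-p_t) - v\,p_t$ and runs one long trajectory to illustrate the time-average interpretation. Function names and parameter values are hypothetical, and plain `random.Random` draws stand in for a binomial sampler.

```python
import random

# Sketch: neutral bi-allelic Wright-Fisher model with mutation.
# State = number of A copies among n_genes gene copies; parameter values are hypothetical.


def next_count(count: int, n_genes: int, u: float, v: float, rng: random.Random) -> int:
    """One generation: each offspring copy picks a random parent, then may mutate.

    An offspring copy is A if its parent is A and it does not mutate to a (prob. 1 - v),
    or if its parent is a and it mutates to A (prob. u).
    """
    p = count / n_genes
    prob_a = p * (1.0 - v) + (1.0 - p) * u
    return sum(1 for _ in range(n_genes) if rng.random() < prob_a)  # Binomial(n_genes, prob_a)


def simulate(n_genes: int, u: float, v: float, p0: float, generations: int, seed: int = 1) -> list[float]:
    """Return the trajectory p_0, p_1, ..., p_generations of the frequency of A."""
    rng = random.Random(seed)
    count = round(p0 * n_genes)
    trajectory = [count / n_genes]
    for _ in range(generations):
        count = next_count(count, n_genes, u, v, rng)
        trajectory.append(count / n_genes)
    return trajectory


def mean_change(p: float, n_genes: int, u: float, v: float, reps: int, seed: int = 2) -> float:
    """Monte Carlo estimate of E[delta p | p], to compare with u(1 - p) - v p."""
    rng = random.Random(seed)
    count = round(p * n_genes)
    total = sum(next_count(count, n_genes, u, v, rng) / n_genes - count / n_genes for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    n_genes, u, v = 200, 0.01, 0.01  # hypothetical values
    print("simulated E[dp | p=0.2]:", mean_change(0.2, n_genes, u, v, reps=2000))
    print("formula u(1-p) - v p:   ", u * (1 - 0.2) - v * 0.2)
    traj = simulate(n_genes, u, v, p0=0.0, generations=5_000)
    print("time-averaged p:", sum(traj) / len(traj), "vs u/(u+v) =", u / (u + v))
```

Each offspring copy is A with probability $p(1-v) + (1-p)u$, independently of the others. The next count is therefore binomial, and its exact mean change is $u(1-p) - vp$. Starting the long run at $p_0 = 0$ shows A being reintroduced by mutation. The time-averaged frequency should settle near $u/(u+v)$, though with visible noise for runs of this length.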
[Figure: Ergodic behavior of the Wright-Fisher process ($N = 100$, $u = 0.02$): a single trajectory of $p$ under the neutral Wright-Fisher model over $10^5$ generations.]

Provided that $N_e$ is sufficiently large ($N_e \geq 100$), the stationary distribution at a neutral bi-allelic locus in a population with coalescent effective population size $N_e$ is well approximated by a Beta distribution.

Stationary distribution of allele frequencies: The stationary distribution can be approximated by a Beta distribution with parameters $4N_e u$ and $4N_e v$, which has density

$$\pi(p) = \frac{1}{C}\, p^{4N_e u - 1}(1-p)^{4N_e v - 1}, \qquad 0 \leq p \leq 1,$$

where the normalizing constant $C = B(4N_e u, 4N_e v)$ is the Beta function. Here $\pi$ denotes a density, not nucleotide diversity.

In particular, suppose we sample the population at some sufficiently large time $t$. The probability that the allele frequency $p(t)$ at that time lies between $a$ and $b$ is then approximately

$$P(a < p(t) < b) \approx \int_a^b \pi(p)\,dp.$$

[Figure: stationary densities plotted against $p$ for $2Nu = 0.1$ (left) and $2Nu = 10$ (right).]

The stationary distribution reflects the competing effects of genetic drift, which eliminates variation, and mutation, which generates variation.

When $4N_e u$ and $4N_e v$ are both less than 1, drift dominates mutation. The stationary distribution is then bimodal, with peaks at the boundaries (one allele is common and the other rare). When $4N_e u$ and $4N_e v$ are both greater than 1, mutation dominates drift. The stationary distribution is then peaked about its mean $u/(u+v)$ (both alleles are common).
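The Beta approximation and the interval probability $P(a < p(t) < b)$ can be evaluated numerically with only Python's `math` module. In this sketch, the normalizing constant $C = B(4N_e u, 4N_e v)$ is computed through `math.lgamma`. The integral is approximated with composite Simpson's rule, a generic quadrature choice that the source does not prescribe. Integration is restricted to $0 < a < b < 1$ so the integrand stays finite when a Beta parameter is below 1. Names and parameter values are hypothetical.

```python
import math

# Sketch: Beta(4 Ne u, 4 Ne v) approximation to the stationary distribution
# at a neutral bi-allelic locus. Parameter values below are hypothetical.


def beta_params(ne: float, u: float, v: float) -> tuple[float, float]:
    """Return the Beta parameters (a, b) = (4 Ne u, 4 Ne v)."""
    return 4.0 * ne * u, 4.0 * ne * v


def stationary_density(p: float, ne: float, u: float, v: float) -> float:
    """pi(p) = p^(a-1) (1-p)^(b-1) / B(a, b), valid for 0 < p < 1."""
    a, b = beta_params(ne, u, v)
    log_c = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log of C = B(a, b)
    return math.exp((a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p) - log_c)


def interval_probability(lo: float, hi: float, ne: float, u: float, v: float, steps: int = 2_000) -> float:
    """Approximate P(lo < p < hi) by composite Simpson's rule; requires 0 < lo < hi < 1."""
    if not 0.0 < lo < hi < 1.0:
        raise ValueError("need 0 < lo < hi < 1")
    steps += steps % 2  # Simpson's rule needs an even number of intervals
    h = (hi - lo) / steps
    total = stationary_density(lo, ne, u, v) + stationary_density(hi, ne, u, v)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * stationary_density(lo + k * h, ne, u, v)
    return total * h / 3.0


if __name__ == "__main__":
    ne = 100
    for u in (0.0005, 0.05):  # 4 Ne u = 0.2 (drift dominates) and 20 (mutation dominates)
        dens = [stationary_density(p, ne, u, u) for p in (0.05, 0.5, 0.95)]
        print(f"4Ne*u = {4 * ne * u:g}: density at p = 0.05, 0.5, 0.95 ->", [f"{d:.3f}" for d in dens])
        print("  P(0.4 < p < 0.6) ~", round(interval_probability(0.4, 0.6, ne, u, u), 3))
```

Take $N_e = 100$. Setting $u = v = 0.0005$ gives $4N_e u = 0.2$ and a U-shaped density, larger near 0.05 and 0.95 than at 0.5. Setting $u = v = 0.05$ gives $4N_e u = 20$ and a density concentrated around 0.5. These match the drift-dominated and mutation-dominated regimes described above.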
