1 Natural Selection

1.1 Maximum Likelihood Estimation

Maximum Likelihood Estimation of Selection
We have previously derived recurrence relations for allele frequencies over time, given relative fitnesses $w$ and starting allele frequencies $p$:
$$p_u(t+1) = \frac{p_u^2(t)\,w_{uu} + \sum_{v \neq u} p_u(t)\,p_v(t)\,w_{uv}}{\bar w(t)}.$$
Suppose you observe allele counts over multiple generations, $\{n_1(1), n_2(1), \ldots, n_1(2), n_2(2), \ldots\}$. The allele frequencies in each generation are $p_u(1), p_u(2), \ldots$, where $p_u(t+1)$ is a function (given above) of $p_u(t)$ for all $t > 0$. Then, the likelihood for the fitness model is
$$L(\{w_{uv}, p_u(1)\}) \propto \prod_t \prod_u [p_u(t)]^{n_u(t)}.$$
Numerical methods are required to maximize this likelihood over $w_{uv}$ and $p_u(1)$.

HIV Example (Wu06)

2 Mutation

2.1 Theory

Mutation
Mutation provides the raw material for evolution. All mutations are ultimately changes at the nucleotide level. The vast majority of mutations that have an effect are deleterious and incompletely dominant (relative fitnesses $1 : 1-hs : 1-s$ for AA : AB : BB, with $h < 1$). These mutations are present in populations because they arise by accident when the genome is copied during meiosis. How does selection act upon them? Such deleterious mutations ultimately reach an equilibrium state in which their production by mutation is balanced by their removal by selection.

Overview of mutation: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.chapter.1049

Modeling Mutation
Consider a locus with two possible alleles, A and B. Suppose the mutation rate (per replication cycle per locus) from A → B is $u$, and let $v$ be the rate from B → A. Let $p_A(t)$ be the frequency of allele A in generation $t$. In the next generation, type A alleles arise either by faithful copying of type A alleles from the previous generation or by mutation during copying of type B alleles from the previous generation. So,
$$p_A(t+1) = (1-u)\,p_A(t) + v\,[1 - p_A(t)],$$
$$\Delta p_A(t) = p_A(t+1) - p_A(t) = -u\,p_A(t) + v\,[1 - p_A(t)].$$
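Iterating the recurrence directly shows how slowly mutation alone moves allele frequencies. The sketch below is minimal and makes a few assumptions. The rates $u = 10^{-5}$ and $v = 10^{-6}$ are illustrative values, not measurements. The helper names `iterate_mutation` and `geometric_solution` are hypothetical. The iterates are compared with the geometric solution $p_A(t) = \hat p_A + (1-u-v)^t (p_{A0} - \hat p_A)$, where $\hat p_A = v/(u+v)$, which follows from the recurrence.

```python
def iterate_mutation(p0, u, v, generations):
    """Apply p_A(t+1) = (1 - u) p_A(t) + v [1 - p_A(t)] `generations` times."""
    p = p0
    for _ in range(generations):
        p = (1 - u) * p + v * (1 - p)
    return p


def geometric_solution(p0, u, v, t):
    """Closed form p_A(t) = p_hat + (1 - u - v)**t * (p0 - p_hat), p_hat = v / (u + v)."""
    p_hat = v / (u + v)
    return p_hat + (1 - u - v) ** t * (p0 - p_hat)


# Illustrative (hypothetical) rates; start with allele A fixed.
u, v, p0 = 1e-5, 1e-6, 1.0
for t in (0, 10_000, 100_000):
    # The two columns agree up to floating-point rounding.
    print(t, iterate_mutation(p0, u, v, t), geometric_solution(p0, u, v, t))
```

After 100,000 generations, $p_A$ has covered only about two thirds of the distance from 1 to $v/(u+v) \approx 0.091$, because $(1-u-v)^{100\,000} \approx e^{-1.1} \approx 0.33$.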
If the mutation rate per site per generation is $10^{-9}$ to $10^{-10}$ and we consider a gene locus of about 1000 nucleotides, the mutation rate per gene per replication cycle is therefore $10^{-6}$ to $10^{-7}$.

A Stable Mutation Equilibrium
A mutation equilibrium occurs when $\Delta p_A(t) = 0$ and $p_A(t) = \hat p_A$. Solving $\Delta p_A(t) = 0$,
$$0 = -u\,\hat p_A + v\,[1 - \hat p_A],$$
yields $\hat p_A = \frac{v}{u+v}$ at equilibrium.

Exercise. The equilibrium is stable. To verify this, plug $p_A = \frac{v}{u+v} - \delta$ into the equation for $\Delta p_A$. Will $p_A$ increase or decrease in the next generation? Repeat with $p_A = \frac{v}{u+v} + \delta$.

Exercise. If A is by far the most common allele ($p_A \approx 1$), show that $\Delta p_A(t) \approx -u$. How much relative error does this approximation introduce? Relative error is the difference between the exact and approximate values divided by the exact value.

Rate of Approach to Equilibrium
Take the recurrence relation for $p_A(t)$ and subtract the equilibrium $\hat p_A$, using $\hat p_A = (1-u)\hat p_A + v[1 - \hat p_A]$:
$$\begin{aligned} p_A(t+1) - \hat p_A &= (1-u)\,p_A(t) + v[1 - p_A(t)] - \hat p_A \\ &= (1-u)\,p_A(t) + v[1 - p_A(t)] - (1-u)\,\hat p_A - v[1 - \hat p_A] \\ &= (1-u)[p_A(t) - \hat p_A] + v[1 - p_A(t) - 1 + \hat p_A] \\ &= [1-u-v]\,[p_A(t) - \hat p_A]. \end{aligned}$$
We have already seen this kind of equation. Its solution is
$$p_A(t) - \hat p_A = (1-u-v)^t\,(p_{A0} - \hat p_A),$$
where $p_{A0}$ is the initial frequency of type A alleles. The approach to equilibrium is very slow because $1 - u - v \approx 1$.

Exercise. By Taylor series, $(1-u-v)^t \approx e^{-(u+v)t}$. Use this approximation to compute the landmark time $t_{1/2}$, the time it takes for the starting disequilibrium $p_{A0} - \hat p_A$ to decrease by one-half.

Neglecting Back Mutation
You will commonly hear someone say or write, "and we neglected back mutation." Back mutation changes the mutant form of the allele back to the wild type form. If A is the normal (wild type) form and B is the mutant form of the allele, then neglecting back mutation is equivalent to setting $v = 0$. It is biologically reasonable to neglect back mutation because we are often speaking of an allele variant of a protein.
Either the protein works (normal/wild type) or it doesn't (mutant). There are many more ways to make a protein that doesn't work than one that does, so generally $u \gg v$. However, when considering DNA sequences it is not reasonable to neglect back mutation. If A → C with probability $u$, then it is normally not valid to assume that C → A is virtually impossible (i.e. $v = 0$).

Mutation with Multiple Alleles
For DNA sequences you are often dealing with a large number $n$ of possible alleles $A_1, A_2, \ldots, A_n$. Let $u_{ij}$ be the mutation rate from allele $i$ to allele $j$. Then the recursion equation for type $i$ alleles is
$$p_i(t+1) = p_i(t)\Big[1 - \sum_{j \neq i} u_{ij}\Big] + \sum_{j \neq i} p_j(t)\,u_{ji}.$$
Equations for the equilibrium allele frequencies $\hat p_i$ are obtained by setting $p_i(t+1) = p_i(t) = \hat p_i$. For given $u_{ij}$, the result is a linear system of equations. Together with the constraint $\sum_i \hat p_i = 1$, this system can be solved for the $\hat p_i$.

Equilibrium with Multiple Alleles
For the special case $u_{ij} = u$ for all $i \neq j$,
$$\begin{aligned} \hat p_i &= \hat p_i[1 - (n-1)u] + \sum_{j \neq i} u\,\hat p_j \\ (n-1)\,\hat p_i &= \sum_{j \neq i} \hat p_j = 1 - \hat p_i \\ n\,\hat p_i &= 1 \\ \hat p_i &= \frac{1}{n}. \end{aligned}$$
So, when all mutations are equally likely, all alleles are equally prevalent at equilibrium. This model of evolution at the DNA level is called the Jukes-Cantor model of nucleotide substitution. It implies that all nucleotides A, C, G, and T are equally likely at every position in the alignment once mutational equilibrium has been reached.

2.2 With selection - haploid

Mutation and Selection
There are more ways to make a bad protein than a good, functional one, so it would seem that mutation generally pushes toward a worse equilibrium. How does nature handle and control mutation?

DNA Repair: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.section.1151

Mutation and Selection - Haploid
Consider two alleles (genotypes) A and B. Let allele A have frequency $p$ at birth. We will consider how selection followed by mutation, in the circle of life, affects the allele frequency.
If the fitness of genotype A is 1 and the fitness of genotype B is $1-s$, then selection shifts the frequency of A from the initial $p$ to
$$p^* = \frac{p}{p + (1-s)(1-p)} = \frac{p}{1 - (1-p)s}.$$
Next, the individuals copy themselves, which introduces the possibility of mutation. The allele frequency after mutation is $p' = (1-u)\,p^*$, where we have neglected back mutation.

Mutation and Selection Equilibrium - Haploid
Combining selection and mutation, we have
$$p' = \frac{(1-u)\,p}{1 - (1-p)s}.$$
At equilibrium $p' = p$. Make the substitution and solve for the equilibrium:
$$\begin{aligned} p[1 - (1-p)s] &= p(1-u) \\ up - sp(1-p) &= 0 \\ p[u - s(1-p)] &= 0. \end{aligned}$$
When will the system be at equilibrium? The nontrivial solution is
$$q_e = \frac{u}{s},$$
where $q_e = 1 - p_e$. If $s \gg u$, then $q_e$ is predicted to be small. There won't be much allele B around despite the efforts of mutation to increase its numbers! The force of selection is generally much stronger than the force of mutation. Only at the DNA level can you see mutations that have little impact on fitness and therefore have small $s$, even on the order of magnitude of $u$. What happens when $u > s$?

Mutation/Selection Balance Example
Ribeiro et al. 1998. The frequency of resistant mutant virus before antiviral therapy. AIDS. 12:461–465.
$$\begin{aligned} \dot x &= \lambda - dx - \beta x[w + (1-s)m] \\ \dot w &= \beta x[(1-\mu)w + (1-s)\mu m] - aw \\ \dot m &= \beta x[\mu w + (1-s)(1-\mu)m] - am \end{aligned}$$
In discrete generations,
$$\begin{aligned} x_{t+1} - x_t &= \lambda - d\,x_t - \beta x_t[w_t + (1-s)m_t] \\ w_{t+1} - w_t &= \beta x_t[(1-\mu)w_t + (1-s)\mu m_t] - a\,w_t \\ m_{t+1} - m_t &= \beta x_t[\mu w_t + (1-s)(1-\mu)m_t] - a\,m_t, \end{aligned}$$
where the parameters are:
$x_t$: count of susceptible cells in generation $t$
$w_t$: count of wild type-infected cells in generation $t$
$m_t$: count of mutant-infected cells in generation $t$
$\lambda$: count of new susceptible cells born in each generation
$d$: probability that a susceptible cell dies in a generation
$\beta$: probability that an encounter between a susceptible cell and a wild type-infected cell infects the susceptible cell
$(1-s)\beta$: probability that an encounter between a susceptible cell and a mutant-infected cell infects the susceptible cell
$\mu$: mutation rate
$a$: probability of death of an infected cell

They find at equilibrium
$$\frac{m_e}{w_e} \approx \frac{\mu}{s}.$$
Does this look familiar?

2.3 With selection - diploid

Mutation and Selection - Diploid Recessive
First consider recessive mutants. AA and AB individuals have relative fitness $w_{AA} = w_{AB} = 1$, while the homozygous recessive mutant BB has decreased relative fitness $w_{BB} = 1 - s$. Assume that the frequency of allele A is $p$ in the starting gamete pool. After selection,
$$p^* = \frac{p^2 + p(1-p)}{p^2 + 2p(1-p) + (1-s)(1-p)^2} = \frac{p}{1 - s(1-p)^2}.$$
After mutation, where again we neglect back mutation,
$$p' = \frac{p(1-u)}{1 - s(1-p)^2}.$$

Diploid Recessive - Mutation/Selection Balance
To find the mutation/selection balance for the recessive diploid case, again assume equilibrium, so $p' = p = p_e$. Then
$$1 - s(1-p_e)^2 = 1 - u.$$
Therefore, at equilibrium
$$q_e = 1 - p_e = \sqrt{\frac{u}{s}}.$$
Comparing with the haploid case shows that the frequency of the mutant allele B will be higher in the diploid recessive case than in the haploid case, since $\sqrt{u/s} > u/s$ when $u < s$. Can you explain this biologically?

Exercise. What is the frequency of affected individuals at equilibrium?

Example
Cystic fibrosis is a disease caused by a recessive allele, which we shall call B. The frequency of affected individuals at birth is 1 in 2,500. What is the frequency $q$ of the cystic fibrosis allele B in the population? Until recently, cystic fibrosis was fatal before affected individuals reached reproductive age. Therefore $w_{BB} = 1 - s = 0$, where $s = 1$. At equilibrium, $q_e = \sqrt{u}$. Is the required mutation rate reasonable for a disease caused by a single mutant protein?

Mutation/Selection Balance - Diploid Dominant
Now we consider the case where the mutant allele is completely or partially dominant to the wild type allele. First, we assume geometric (multiplicative) fitness:
$$\begin{array}{ccc} AA & AB & BB \\ 1 & 1-s & (1-s)^2 \end{array}$$
After selection,
$$p^* = \frac{p\,\bar w_A}{\bar w} = \frac{p}{1 - (1-p)s},$$
the same as the haploid case.
Mutation affects allele frequencies in diploids in the same way as in haploids. The haploid results therefore apply to diploid loci under multiplicative selection, and $q_e \approx u/s$.

Selection in Homozygotes vs. Heterozygotes
Every generation, a fraction $2sq(1-q)$ of the population, all of them heterozygotes, is "killed" by selection. Each such killing destroys one mutant B allele. Similarly, a fraction $q^2[1 - (1-s)^2]$ of the population, all of them BB homozygotes, is "killed" by selection each generation. Each of these killings destroys two B alleles. The ratio of mutant alleles destroyed in heterozygotes to those destroyed in homozygotes is
$$\frac{2sq(1-q)}{2q^2[1 - (1-s)^2]} = \frac{s(1-q)}{q(2s - s^2)} = \frac{1-q}{q(2-s)},$$
which falls between $\frac{1-q}{2q}$ and $\frac{1-q}{q}$ because $s \in [0,1]$. Since $q \ll 1$, this ratio is very large, and most mutant alleles are destroyed by selection on heterozygotes. The reason is that most mutant alleles are carried by heterozygotes when the mutant allele is rare.

Mut./Sel. Balance - Diploid Partial Dominance
Let's parameterize partial dominance as follows:
$$\begin{array}{ccc} AA & AB & BB \\ 1 & 1-hs & 1-s \end{array}$$
$h$ indicates how much of the fitness detriment of mutant homozygotes BB is also borne by the heterozygous mutant carriers AB. For example, if $h \approx 1$, then heterozygotes are nearly as affected as homozygotes. The mutant allele B will be rare when $hs \gg u$, so BB homozygotes will be rare in the population. Selection will act mostly on heterozygotes. There is therefore little practical difference between the above selection scheme and
$$\begin{array}{ccc} AA & AB & BB \\ 1 & 1-hs & (1-hs)^2 \end{array}$$
The latter selection scheme is the familiar multiplicative one. Therefore, the mutant allele frequency for the general partial dominance fitness landscape is
$$q_e \approx \frac{u}{hs}.$$
Caution. The above result holds only when the mutant allele B is rare, i.e. $u \ll hs$. Even small heterozygote effects (small $h$) can keep the mutant allele frequency $q_e$ low, as long as $u \ll hs$.
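The balances derived in this section can all be checked by iterating their recursions until they settle. They are: haploid $q_e = u/s$; diploid recessive $q_e = \sqrt{u/s}$; multiplicative and partial dominance $q_e \approx u/(hs)$; and the within-host ratio $m_e/w_e$ of Ribeiro et al. The sketch below is minimal. Every parameter value in it is a hypothetical choice and is not taken from the paper. The function names are illustrative. Back mutation is neglected, as in the text. The discrete within-host run starts at its mutation-free equilibrium $x = a/\beta$, $w = (\lambda - d x)/(\beta x)$, $m = 0$, which avoids large transients in the discrete map.

```python
import math


def haploid_q(s, u, generations=20_000):
    """Haploid: select (A fitness 1, B fitness 1 - s), then mutate A -> B at rate u.
    Starts from p = 1 and returns the mutant frequency q = 1 - p."""
    p = 1.0
    for _ in range(generations):
        p = (1.0 - u) * p / (1.0 - (1.0 - p) * s)
    return 1.0 - p


def diploid_q(w_AB, w_BB, u, generations=20_000):
    """Diploid: w_AA = 1, selection on genotypes, then A -> B mutation at rate u.
    Starts from p = 1 and returns the mutant frequency q = 1 - p."""
    p = 1.0
    for _ in range(generations):
        q = 1.0 - p
        wbar = p * p + 2.0 * p * q * w_AB + q * q * w_BB
        p = (1.0 - u) * (p * p + p * q * w_AB) / wbar
    return 1.0 - p


def within_host_ratio(lam=100.0, d=0.1, beta=0.001, s=0.2, mu=1e-3, a=0.5, steps=5_000):
    """Iterate the discrete within-host model and return m/w after `steps` generations."""
    x = a / beta                       # mutation-free equilibrium start
    w = (lam - d * x) / (beta * x)
    m = 0.0
    for _ in range(steps):
        bx = beta * x                  # uses the current (old) x for all three updates
        x, w, m = (
            x + lam - d * x - bx * (w + (1 - s) * m),
            w + bx * ((1 - mu) * w + (1 - s) * mu * m) - a * w,
            m + bx * (mu * w + (1 - s) * (1 - mu) * m) - a * m,
        )
    return m / w


u, s, h = 1e-4, 0.1, 0.5  # hypothetical values with u << hs
print("haploid       ", haploid_q(s, u), u / s)
print("recessive     ", diploid_q(1.0, 1 - s, u), math.sqrt(u / s))
print("partial       ", diploid_q(1 - h * s, 1 - s, u), u / (h * s))
print("multiplicative", diploid_q(1 - h * s, (1 - h * s) ** 2, u), u / (h * s))
print("within-host   ", within_host_ratio(), 1e-3 / 0.2)
```

Under the multiplicative scheme, the recursion reduces to the haploid one with $s$ replaced by $hs$, so that value matches $u/(hs)$ essentially exactly. The partial-dominance value differs from it only slightly, as expected when $u \ll hs$. The within-host ratio comes out slightly above $\mu/s$ because the iteration keeps the $(1-\mu)$ factors.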
Counterintuitively, in terms of population impact, the small fitness effects on mutant carriers (AB heterozygotes) are much more important than the potentially huge effects on affected BB homozygotes.

2.4 Genetic load

Haldane-Muller Principle for Haploids
We now turn to the magnitude of the detrimental effect of mutation on a population. For haploids, the mean relative fitness of the population is
$$\bar w = 1 - q + (1-s)q = 1 - sq,$$
and at equilibrium $q = q_e = u/s$, so the mean relative fitness is $\bar w_e = 1 - u$. Surprisingly, mutation decreases the mean relative fitness of the population by a fraction $u$, independent of the fitness of the mutant!

Haldane-Muller Principle for Diploids
For diploids with a recessive mutant allele, the mean relative fitness is
$$\bar w = (1-q)^2 + 2q(1-q) + (1-s)q^2 = 1 - sq^2.$$
At equilibrium $q_e = \sqrt{u/s}$, so again $\bar w_e = 1 - u$. For partially or fully dominant alleles, the mean relative fitness is
$$\bar w = (1-q)^2 + 2(1-hs)\,q(1-q) + (1-s)q^2 = 1 - 2hsq(1-q) - sq^2.$$
Substituting the equilibrium $q_e \approx u/(hs)$ gives
$$\bar w_e \approx 1 - 2u + \frac{2u^2}{hs} - \frac{u^2}{h^2 s} \approx 1 - 2u,$$
since $u$ is very small and the $u^2$ terms are negligible.

Genetic Load
The fraction of mean relative fitness lost because of mutation is called the genetic load. It represents the cost of mutation:
$$L = \frac{w_{\max} - \bar w}{w_{\max}},$$
where $w_{\max}$ is the fitness of the maximally fit genotype in the population. If $w_{\max} = 1$, then $L \approx 2u$ for diploids carrying partially dominant mutations. Consider $n$ independent loci, each mutating and contributing to the load. If we assume fitness effects across loci are multiplicative, then for $n$ partially dominant loci the mean fitness is $(1-2u)^n \approx e^{-2un}$, and the genetic load is $1 - e^{-2un}$.

Exercise. Show that the cost of recessive mutations is less than the cost of dominant mutations.

Mutation and Linkage Disequilibrium
Let the gamete frequencies for two loci be $P_{AC}, P_{AD}, P_{BC}, P_{BD}$, where locus 1 has alleles A and B and locus 2 has alleles C and D. Suppose the mutation rate A → B is $u_1$ and B → A is $v_1$ for locus 1.
Similarly define $u_2$ and $v_2$ for locus 2. Follow $P_{AC}$ over time, and assume linkage equilibrium at generation $t$:
$$\begin{aligned} P_{AC}(t+1) &= (1-u_1)(1-u_2)P_{AC}(t) + (1-u_1)v_2 P_{AD}(t) + v_1(1-u_2)P_{BC}(t) + v_1 v_2 P_{BD}(t) \\ &= (1-u_1)(1-u_2)p_A(t)p_C(t) + (1-u_1)v_2\,p_A(t)p_D(t) + v_1(1-u_2)\,p_B(t)p_C(t) + v_1 v_2\,p_B(t)p_D(t) \\ &= [(1-u_1)p_A(t) + v_1 p_B(t)]\,[(1-u_2)p_C(t) + v_2 p_D(t)] \\ &= p_A(t+1)\,p_C(t+1), \end{aligned}$$
which is linkage equilibrium.

Mutation and Linkage Disequilibrium II
So, in the next generation the two loci are still in linkage equilibrium. We conclude that mutation cannot create linkage disequilibrium. Unless... If mutation is so rare that it becomes a random force, then mutation creates temporary linkage disequilibrium. Imagine a mutation that occurs on average once every million years. When it first appears, it arises on one particular chromosome background.

[Figure: a new mutation (*) arising on a single chromosome that carries alleles A3 B6 C2 D9 E1 F1 G2 H4 I2 J6 K1.]
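The algebra for $P_{AC}(t+1)$ can be checked by applying the mutation step to all four gamete frequencies at once. This is a minimal sketch with some assumptions. Gamete frequencies are stored in a dictionary keyed by (locus-1 allele, locus-2 allele) tuples. The function names and mutation rates are hypothetical illustrations. $D$ is computed as $P_{AC}P_{BD} - P_{AD}P_{BC}$, which equals $P_{AC} - p_A p_C$ when the frequencies sum to 1. As an extra check not stated in the text, a population that starts with $D \neq 0$ should have $D$ multiplied by $(1-u_1-v_1)(1-u_2-v_2)$ each generation. This follows because deterministic mutation acts on each locus independently, so it shrinks existing disequilibrium rather than creating it.

```python
def mutate_gametes(P, u1, v1, u2, v2):
    """One generation of deterministic mutation acting independently at two loci.
    P maps (locus-1 allele, locus-2 allele) -> gamete frequency."""
    # M[(from, to)] = probability that a copied allele `from` ends up as `to`
    M1 = {("A", "A"): 1 - u1, ("A", "B"): u1, ("B", "A"): v1, ("B", "B"): 1 - v1}
    M2 = {("C", "C"): 1 - u2, ("C", "D"): u2, ("D", "C"): v2, ("D", "D"): 1 - v2}
    new = {(i, j): 0.0 for i in "AB" for j in "CD"}
    for (k, l), freq in P.items():
        for i in "AB":
            for j in "CD":
                new[(i, j)] += freq * M1[(k, i)] * M2[(l, j)]
    return new


def disequilibrium(P):
    """D = P_AC * P_BD - P_AD * P_BC."""
    return P[("A", "C")] * P[("B", "D")] - P[("A", "D")] * P[("B", "C")]


rates = {"u1": 0.01, "v1": 0.002, "u2": 0.03, "v2": 0.005}  # illustrative rates

# Start in linkage equilibrium: gamete frequencies are products of allele frequencies.
p_A, p_C = 0.7, 0.4
P = {("A", "C"): p_A * p_C, ("A", "D"): p_A * (1 - p_C),
     ("B", "C"): (1 - p_A) * p_C, ("B", "D"): (1 - p_A) * (1 - p_C)}
for _ in range(50):
    P = mutate_gametes(P, **rates)
print(disequilibrium(P))  # zero up to floating-point rounding

# Start out of equilibrium: D is multiplied by (1 - u1 - v1)(1 - u2 - v2).
Q = {("A", "C"): 0.4, ("A", "D"): 0.1, ("B", "C"): 0.1, ("B", "D"): 0.4}
Q1 = mutate_gametes(Q, **rates)
print(disequilibrium(Q1), (1 - 0.012) * (1 - 0.035) * disequilibrium(Q))
```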