Genome Evolution. Amos Tanay 2010
Genome evolution, Lecture 4: Population genetics III: selection

Population genetics
• Drift: the process by which allele frequencies change through generations
• Mutation: the process by which new alleles are introduced
• Recombination: the process by which multi-allelic genomes are mixed
• Selection: the effect of fitness on the dynamics of allele drift
• Epistasis: the drift effects of fitness dependencies among different alleles
• "Organismal" effects: ecology, geography, behavior

Wright-Fisher model for genetic drift
[Figure: ∞ gametes → N individuals → ∞ gametes → N individuals]
We follow the frequency of an allele in the population until fixation ($X = 2N$) or loss ($X = 0$). We can model it as a Markov process on a variable $X$ (the number of A alleles) with transition probabilities
$$T_{ij} = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1-\frac{i}{2N}\right)^{2N-j},$$
that is, sampling $j$ alleles for the next generation from a population of $2N$ alleles of which $i$ are A. In a larger population the frequency changes more slowly: the variance of the sampled allele frequency is $pq/2N$, so one round of sampling changes it only slightly.
[Figure: states 0 (loss), 1, …, 2N−1, 2N (fixation)]

The Moran model
Instead of working with discrete generations, we replace at most one individual at each time step. The removed individual is replaced by sampling from the current population.
[Figure: a population of A and a alleles at successive time steps t, with one individual replaced per step]
If the time steps are small, what kind of mathematical model describes the process?

Assume each individual is replaced at rate 1. We then obtain a model similar to Wright-Fisher, but in continuous time. It is a process on the random variable counting the number of A alleles, with rates
$$b_i = (2N-i)\,\frac{i}{2N} \quad (i \to i+1,\ \text{birth}), \qquad d_i = i\,\frac{2N-i}{2N} \quad (i \to i-1,\ \text{death}).$$
[Figure: states 0 (loss), …, i−1, i, i+1, …, 2N (fixation)]
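As a minimal numerical sketch (not part of the lecture), the Wright-Fisher transition matrix $T_{ij}$ can be built directly for a small population. Iterating it then checks that the neutral fixation probability from $i$ copies is $i/2N$. The function names and the choice $2N = 10$ are illustrative assumptions.

```python
from math import comb

def wf_transition_matrix(two_n):
    """T[i][j] = P(X_{t+1} = j | X_t = i): the next generation's A count is
    Binomial(2N, i/2N), i.e. 2N draws from a gamete pool with A-frequency i/2N."""
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n
        matrix.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                       for j in range(two_n + 1)])
    return matrix

def fixation_probabilities(T, generations=2000):
    """Start from h = indicator of state 2N and apply h <- T h repeatedly.
    After k steps h[i] = P_i(X_k = 2N), which converges to the probability
    of eventual fixation because 0 and 2N are the only absorbing states."""
    n_states = len(T)
    h = [0.0] * (n_states - 1) + [1.0]
    for _ in range(generations):
        h = [sum(T[i][j] * h[j] for j in range(n_states)) for i in range(n_states)]
    return h

two_n = 10  # illustrative: 2N = 10 gene copies
T = wf_transition_matrix(two_n)
h = fixation_probabilities(T)
for i, prob in enumerate(h):
    print(f"i = {i:2d}   P(fixation) = {prob:.6f}   i/2N = {i / two_n:.6f}")
```

Each row of $T$ has mean $\sum_j j\,T_{ij} = i$. This martingale property is what makes the fixation probability $i/2N$. A Moran version of the same check would replace the binomial rows by the birth and death rates $b_i$, $d_i$.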
Fixation probability
In the limit, the Moran model in fact converges to the Wright-Fisher model. For example:
Theorem: going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, except that time runs twice as fast.
Theorem: in the Moran model, the probability that A becomes fixed when there are initially $i$ copies is $i/2N$.
Proof: as for the Wright-Fisher model. Since $b_i = d_i$, births and deaths are equally likely, so the expected value of $X$ does not change over time. Hence $i = 2N \cdot P_i(\text{fixation})$.

Fixation time
$\bar{E}_i = E_i(\tau \mid T_{2N} < T_0)$ is the expected fixation time, assuming fixation.
Theorem: in the Moran model, let $p = i/2N$. Then
$$\bar{E}_i = -\frac{2N(1-p)\log(1-p)}{p}.$$
Proof: not given here.

Selection
• Fitness: the relative reproductive success of an individual (or genome).
• Fitness is only defined with respect to the current population.
• Fitness is unlikely to remain constant across all conditions and environments.
• The sampling probability is multiplied by a selection factor $1+s$.
Mutations can change fitness:
• A deleterious mutation decreases fitness, so it is selected against. This process is called negative or purifying selection.
• An advantageous (beneficial) mutation increases fitness, so it is subject to positive selection.
• A neutral mutation is one that does not change fitness.

Adaptive evolution in a tumor model (V. Rotter)
Human fibroblasts with telomerase were passaged in the lab for many months. Their growth rate increased spontaneously (selection).

Selection in haploids: infinite populations, discrete generations
This is a common situation:
• Bacteria gaining antibiotic resistance
• Yeast evolving to adapt to a new environment
• Tumor cells taking over a tissue

Allele                     A                                     B
Frequency (generation t−1) $p_{t-1}$                             $q_{t-1}$
Relative fitness           $w$                                   $1$
Gametes after selection    $p_{t-1}w$                            $q_{t-1}$
Frequency (generation t)   $p_t = p_{t-1}w/(p_{t-1}w+q_{t-1})$   $q_t = q_{t-1}/(p_{t-1}w+q_{t-1})$

Ratio as a function of time: $\dfrac{p_t}{q_t} = w^t\,\dfrac{p_0}{q_0}$.
Fitness represents the relative growth rate of the strain carrying allele A. It is common to write $w = 1+s$, which defines the selection coefficient $s$.

Selection in haploid populations: dynamics
[Figure: population size vs. generation (0 to 14) for growth rates 1.5 and 1.2; ratio A/B vs. generation (0 to 12)]
We can model it in continuous time: $A'(t) = aA(t)$, $B'(t) = bB(t)$.
In an infinite population we can just consider the ratio:
$$\frac{A(t)}{B(t)} = \frac{A(0)}{B(0)}\,e^{(a-b)t}.$$

Computing w
$$\frac{A(t)}{B(t)} = \frac{A(0)}{B(0)}\,w^t \quad\Rightarrow\quad \log\frac{A(t)}{B(t)} - \log\frac{A(0)}{B(0)} = (a-b)t = t\log w$$
Example (Hartl & Dykhuizen 81): E. coli with two gnd alleles, one of which is beneficial for growth on gluconate. A population of E. coli was tracked for 35 generations while evolving on two media. The observed frequencies were:

Medium      Initial   After 35 generations
Gluconate   0.455     0.898
Ribose      0.594     0.587

For gluconate: $\log(0.898/0.102) - \log(0.455/0.545) = 35\log w$, so $\log_{10} w \approx 0.0292$ and $w \approx 1.0696$. Compare this to $w \approx 0.999$ on ribose.

Fixation probability: selection in the Moran model
When the population is finite, we should consider the effect of selection more carefully. The model assumes that fitness is the probability that the offspring is viable; if it is not, no replacement takes place. Rates:
$$b_i = (1+s)\,(2N-i)\,\frac{i}{2N} \quad (\text{birth}), \qquad d_i = i\,\frac{2N-i}{2N} \quad (\text{death})$$
Theorem: in the Moran model with selection $s > 0$,
$$P_i(T_{2N} < T_0) = \frac{1-(1+s)^{-i}}{1-(1+s)^{-2N}}.$$
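The theorem can be checked numerically. The sketch below is an illustration with hypothetical helper names, not code from the lecture. It evaluates the closed form and, separately, solves the birth-death recurrence $h(i+1)-h(i) = (d_i/b_i)\bigl(h(i)-h(i-1)\bigr)$ with $h(0)=0$, $h(2N)=1$. It also compares a single new mutant's fixation chance with $s/(1+s)$, the $1+s \approx e^s$ approximation and the neutral value $1/2N$. The values $2N = 2000$ and $s = 0.01$ are assumptions.

```python
import math

def moran_fix_prob(i, two_n, s):
    """Closed form P_i(T_2N < T_0) for the Moran model with selection s.
    Falls back to the neutral result i/2N when s == 0."""
    if s == 0:
        return i / two_n
    r = 1.0 / (1.0 + s)  # d_i / b_i, identical for every state i
    return (1 - r**i) / (1 - r**two_n)

def moran_fix_prob_recurrence(two_n, s):
    """Solve h(i+1) - h(i) = (d_i/b_i) * (h(i) - h(i-1)), h(0)=0, h(2N)=1.
    Build h with the unnormalised choice h(1) = 1, then rescale so h(2N) = 1."""
    r = 1.0 / (1.0 + s)
    h = [0.0, 1.0]
    increment = 1.0
    for _ in range(2, two_n + 1):
        increment *= r              # each increment shrinks by d_i/b_i
        h.append(h[-1] + increment)
    return [x / h[-1] for x in h]

def moran_fix_prob_exp_approx(i, two_n, s):
    """Approximation using 1 + s ~ e^s: (1 - e^{-is}) / (1 - e^{-2Ns})."""
    return (1 - math.exp(-i * s)) / (1 - math.exp(-two_n * s))

two_n, s = 2000, 0.01  # illustrative values: 2N = 2000 copies, s = 1%
exact = [moran_fix_prob(i, two_n, s) for i in range(two_n + 1)]
by_recurrence = moran_fix_prob_recurrence(two_n, s)

print("single new mutant, exact:      ", exact[1])
print("single new mutant, s/(1+s):    ", s / (1 + s))
print("single new mutant, e^s approx: ", moran_fix_prob_exp_approx(1, two_n, s))
print("neutral value 1/2N:            ", 1 / two_n)
```

Solving the recurrence independently of the closed form guards against sign slips such as writing $(1+s)$ instead of $(1+s)^{-1}$ for $d_i/b_i$.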
Fixation probability: selection in the Moran model (approximations)
Note: for $i = 1$ and large $2Ns$ (so that $(1+s)^{-2N} \approx 0$), $P_1(T_{2N} < T_0) \approx 1-(1+s)^{-1} = \dfrac{s}{1+s} \approx s$.
Note: using $1+s \approx e^{s}$, $P_i(T_{2N} < T_0) \approx \dfrac{1-e^{-is}}{1-e^{-2Ns}}$.
Variant (Kimura 62): the probability of fixation in the Wright-Fisher model with selection is
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}}.$$
Reminder: we should be using the effective population size $N_e$.

Fixation probability: selection in the Moran model (proof)
Proof: first define the hitting time $T_y = \min\{t : X_t = y\}$ and the fixation probability given $i$ initial A's, $h(i) = P_i(T_{2N} < T_0)$. Births occur at rate $b_i$ and deaths at rate $d_i$, so the probability that a birth occurs before a death is $b_i/(b_i+d_i)$. Therefore:
$$h(i) = \frac{b_i}{b_i+d_i}\,h(i+1) + \frac{d_i}{b_i+d_i}\,h(i-1)$$
$$h(i+1) - h(i) = \frac{d_i}{b_i}\bigl(h(i) - h(i-1)\bigr) = (1+s)^{-1}\bigl(h(i) - h(i-1)\bigr)$$
Since $h(0) = 0$, $h(i+1) - h(i) = h(1)(1+s)^{-i}$. Summing gives
$$h(j) = h(1)\sum_{i=0}^{j-1}(1+s)^{-i} = c\bigl(1-(1+s)^{-j}\bigr), \qquad c = \frac{h(1)(1+s)}{s}.$$
Finally, $h(2N) = 1$ gives $c = \dfrac{1}{1-(1+s)^{-2N}}$.

Fixation probabilities and population size
For a new mutation ($p = 1/2N$):
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}} \approx \frac{2s}{1-e^{-4Ns}}$$
[Figure: fixation probability vs. s (from −0.005 to 0.009) for Ne = 100, 1,000, 10,000 and 100,000; linear scale (left) and log scale (right)]

Selection and fixation
Recall that the fixation time of a neutral mutation (assuming fixation occurs) equals the coalescent time, $t \approx 4N$ generations.
Theorem: in the Moran model, $E_1(\tau \mid T_{2N} < T_0) \approx \dfrac{2}{s}\log N$.
Theorem (Kimura): $t \approx \dfrac{2}{s}\ln(2N)$.
(As noted earlier, the Moran and Wright-Fisher time scales differ by a factor of two.)
Fixation process:
1. The allele is rare: the number of A's is a supercritical branching process. Duration $\approx \tfrac{1}{s}\log 2N$.
2. The allele is at intermediate frequency ($0 \ll p \ll 1$): $p$ follows the logistic differential equation and is essentially deterministic. Duration $\approx \log\log 2N$.
3. The allele is close to fixation: the number of a's is a subcritical branching process. Duration $\approx \tfrac{1}{s}\log 2N$.
[Figure: selection vs. drift]

Selection in diploids
Assume:
Genotype    AA          Aa          aa
Fitness     $w_{11}$    $w_{12}$    $w_{22}$
Frequency   $p^2$       $2pq$       $q^2$     (Hardy-Weinberg!)
There are several alternatives for the interaction between alleles:
• a is completely dominant (one a is enough): f(Aa) = f(aa)
• a is completely recessive: f(Aa) = f(AA)
• Codominance: f(AA) = 1, f(Aa) = 1+s, f(aa) = 1+2s
• Overdominance: f(Aa) > f(AA), f(aa)
The simple (linear) cases are not qualitatively different from the haploid scenario.

Mutation-Selection balance
When an allele is weakly deleterious, mutation can play a major role in driving allele frequencies.
Genotype         AA       Aa        aa
Fitness          $1$      $1-hs$    $1-s$
Frequency (HW)   $p^2$    $2pq$     $q^2$
New allele frequency, without mutation:
$$p' = \frac{p^2 w_{11} + pq\,w_{12}}{p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}}$$
New allele frequency, with mutation from A to a at rate $\mu$:
$$p' = \frac{(1-\mu)\bigl(p^2 + pq(1-hs)\bigr)}{p^2 + 2pq(1-hs) + q^2(1-s)}$$
What is the equilibrium frequency of the deleterious allele a?
• $h = 0$: $\hat{q} = \sqrt{\mu/s}$
• $h > 0$: $\hat{q} \approx \mu/(hs)$ (ignoring the $q^2$ term, since $q \ll 1$)

Mutation-Selection balance: Huntington disease
• A neurological genetic disease that appears after age 35.
• It results from a dominant mutation. How does this disease survive in the human population?
• Although it may be fatal, the fitness is not very low because of the late age of onset (estimated $w_{12} = 0.81$).
• Frequency in the human population: 70 per million (Europe) to 1 per million (Africa).
Since $h > 0$, we can estimate the mutation rate at the Huntington locus as $\mu = hs\,\hat{q}$, with $hs = 1 - w_{12} = 0.19$. This gives a range from $10^{-6} \times 0.19 = 1.9\times10^{-7}$ to $70\times10^{-6} \times 0.19 = 1.3\times10^{-5}$.

Mutation-Selection balance: Haldane-Muller
Recall: for $h = 0$, $\hat{q} = \sqrt{\mu/s}$; for $h > 0$, $\hat{q} \approx \mu/(hs)$.
Consider the average fitness of the population, given recurrent mutations at rate $\mu$ at a locus where the mutant allele reduces fitness by $s$.
Assume perfect recessivity ($h = 0$):
$$\bar{w} = 1 - \hat{q}^2 s = 1 - \frac{\mu}{s}\,s = 1 - \mu$$
Assuming partial dominance ($h > 0$):
$$\bar{w} = 1 - 2\hat{p}\hat{q}hs - \hat{q}^2 s \approx 1 - 2\Bigl(1-\frac{\mu}{hs}\Bigr)\frac{\mu}{hs}\,hs \approx 1 - 2\mu$$
The Haldane-Muller principle: the effect of mutation on the average population fitness depends only on the mutation rate, not on the fitness of the alleles!

Overdominance
A SNP in the beta-globin gene makes the encoded protein defective. The resulting red blood cells are curved and elongated, and they are removed from the circulation.
• Homozygotes for the mutation usually die from anemia without intensive care.
• Heterozygotes have mild anemia, but they cope better with the malaria parasite Plasmodium falciparum (possibly because infected red cells become sickled).
[Figures: (historical) malaria distribution; sickle-cell anemia (Wikipedia)]

Other types of selection
Different individuals can have different fitness, e.g., males vs. females. One example is male genes that take up female resources in mammals. This has been suggested to lead to imprinting, where cells express only the maternal or only the paternal allele. Imprinted genes behave much like haploids.

Other types of selection (continued)
• Frequency- and density-dependent selection: fitness depends on the frequency of the allele or on the population size.
• Fecundity selection: mating pairs differ in reproductive potential.
• Effects of heterogeneous environments.
• Effects that apply directly to the haplotype: gametic selection and meiotic drive (e.g., killing the reproductive potential of the homologous chromosome).
• Sexual selection: males advertising their reproductive potential, or confronting other males.
• Kin selection ("origin of altruism").

Recombination and selection

Linkage and selection
Linkage interferes with the purging of deleterious mutations and reduces the efficiency of positive selection!
[Figure: beneficial and weakly deleterious mutations on linked haplotypes. Labels: selective sweep, or hitchhiking effect, or genetic draft (Gillespie); Hill-Robertson effect]

Linkage and selection (continued)
The variance in allele frequency is used to define the effective population size:
$$V(p) = \frac{p(1-p)}{2N_e}$$
To simplify, assume a neutral locus evolves while selective sweeps hit a fully linked locus at rate $\rho$. A sweep fixes the neutral allele with probability $p$, and we further assume that each sweep happens instantly. Then
$$V(p) = \frac{p(1-p)}{2N_l} = p(1-p)\left(\frac{1}{2N_e} + \rho\right) \quad\Rightarrow\quad N_l = \frac{N_e}{1 + 2N_e\rho}.$$
This is very rough, but it shows the basic intuition: sweeps reduce the efficacy of selection at linked loci. The size of this effect can be quantified as a reduction in the effective population size. A refined version is
$$N_l = \frac{N_e}{1 + 2N_e\rho C},$$
where $C$ is the average frequency of the neutral allele after the sweep.

Don't let it confuse you…
• Purifying / negative selection: forces that drive genomic conservation
• Neutrality: the background
• Directed / adaptive / positive selection: forces that drive genome change
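The mutation-selection balance and Haldane-Muller results earlier in the lecture are approximations, and a quick numerical check can make them concrete. The sketch below is an illustration, not part of the lecture. The function names and the parameter values ($\mu = 10^{-5}$ and the listed $s$, $h$) are assumptions. It iterates the recursion for $p'$ with mutation from A to a at rate $\mu$, using genotype fitnesses $1$, $1-hs$, $1-s$ in Hardy-Weinberg proportions. It then compares the equilibrium $\hat q$ with $\sqrt{\mu/s}$ ($h = 0$) or $\mu/(hs)$ ($h > 0$), and the fitness reduction $1-\bar w$ with $\mu$ or $2\mu$.

```python
import math

def mean_fitness(p, s, h):
    """Mean fitness with genotype fitnesses AA = 1, Aa = 1 - h*s, aa = 1 - s
    in Hardy-Weinberg proportions p^2, 2pq, q^2."""
    q = 1.0 - p
    return p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)

def next_p(p, mu, s, h):
    """One generation: selection on diploids, then A -> a mutation at rate mu.
    p' = (1 - mu) * (p^2 + pq(1 - hs)) / w_bar."""
    q = 1.0 - p
    return (1 - mu) * (p * p + p * q * (1 - h * s)) / mean_fitness(p, s, h)

def equilibrium_q(mu, s, h, generations=100_000):
    """Iterate from a population free of the deleterious allele (p = 1)."""
    p = 1.0
    for _ in range(generations):
        p = next_p(p, mu, s, h)
    return 1.0 - p

mu = 1e-5  # illustrative mutation rate
results = {}
for s, h in [(0.1, 0.0), (0.01, 0.0), (0.1, 0.5)]:
    q_hat = equilibrium_q(mu, s, h)
    q_theory = math.sqrt(mu / s) if h == 0 else mu / (h * s)
    load = 1.0 - mean_fitness(1.0 - q_hat, s, h)  # reduction in mean fitness
    results[(s, h)] = (q_hat, q_theory, load)
    print(f"s={s}, h={h}: q_hat={q_hat:.3e} (theory {q_theory:.3e}), "
          f"1 - w_bar = {load:.3e} (mu = {mu:.0e})")
```

The two recessive runs use different $s$ values but should show nearly the same $1-\bar w$. That is the Haldane-Muller point: the load tracks $\mu$, not $s$.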