Topic 14. Lecture 20. Positive, negative, and balancing selection in natural populations.

We considered the five factors of Microevolution (mutation, selection, mode of reproduction, population structure, and genetic drift) separately. Of course, in nature they act together, and now we need to understand how this happens. As far as adaptive evolution of phenotypes is concerned, natural selection is the key factor of Microevolution, so the presentation is structured around different modes of selection. Still, other factors are also important. In particular, we need to understand the relationship between selection (systematic differences in fitness) and genetic drift (random differences in fitness). Which one prevails?

There is a simple answer to this question. The strength of selection acting on a particular pair of alleles A and a (selection is always a comparison: there can be no selection on a single allele!) is characterized by the coefficient of selection, or selective advantage of A over a, $s = 1 - w_a/w_A$. The strength of genetic drift is characterized by the effective population size $N_e$, the size of an equivalent Wright-Fisher population (with some exceptions, $N_e$ is approximately the same for all loci in the genome).

The key fact: if, at some locus, $N_e s \gg 1$, selection rules. In particular, the fittest allele will eventually be fixed and will never be lost after that. In contrast, if $N_e s \ll 1$, random drift rules. In particular, evolution is then reversible: no allele is fixed forever.

Effective population size also determines the level of genetic variation at selectively neutral loci. At such a locus, the virtual heterozygosity is $H = 4N_e\mu$, where $\mu$ is the mutation rate at this locus. Thus, knowing $\mu$ makes it possible to estimate $N_e$ for natural populations from easily observable levels of genetic variation. Some estimates of $N_e$ in nature are:
humans - 10,000 (not today!)
whales - 35,000
fruit flies - 1,000,000
worms - 100,000 - 1,000,000
marine invertebrates - 1,000,000
ciliates - 10,000,000 (or even more)
bacteria - 10,000,000 (only!)

Effective sizes of most natural populations are much smaller than their actual head counts, because of high variation in the number of offspring per individual. Thus, since selection matters only when $N_e s \gtrsim 1$, the weakest selection that still matters must be at least of the order of $1/N_e$ (roughly $10^{-6}$, or $10^{-7}$ in the very largest populations), and in some populations only much stronger selection can affect evolution.

Let us now consider:
1) Selection that promotes changes (positive)
2) Selection that prevents changes (negative and balancing)
3) Weak or absent selection

1) Selection that promotes changes

a) The complete story of one allele replacement

Replacements of old, inferior alleles by new, superior alleles, driven by positive selection, are the most important process in Microevolution, responsible for the evolution of adaptations. We already considered the last phase of this process. However, the whole story consists of 3 phases, the first two of which are affected by stochasticity, due to mutation and to genetic drift, respectively:
i) a beneficial mutation appears - the population has to wait for this to happen;
ii) the mutation survives the unavoidable initial genetic drift;
iii) the mutation takes over: $d[A]/dt = s[A](1-[A])$.

Quantitative analysis of an allele replacement (rough, but fair):

i) A beneficial mutation appears - the population has to wait. For how long? A particular mutation appears, on average, once every $1/(N\mu)$ generations, where N is the number of breeding individuals and $\mu$ is the rate of this particular mutation. Thus, a typical waiting time may be 100 generations for a simple nucleotide substitution (if $\mu = 10^{-8}$ and $N = 10^6$), 100,000 generations for a 3-nucleotide insertion (why 3?), or forever for a complex event (evolution is, to a large extent, mutation-limited).

ii) The mutation survives the unavoidable initial genetic drift. With what probability?
A mutation survives with probability ~$2s$, after which the number of its carriers will have risen to ~$1/(2s)$. Thus, if a typical strength of selection in favor of a new, beneficial allele is 0.01, only ~2 in 100 (1 in 50) such alleles escape being lost initially. In other words, the waiting time must be multiplied by ~50 (maybe more). This initial drift lasts ~$1/(2s)$ generations, after which a mutation is either lost or out of danger.

iii) The mutation takes over: $d[A]/dt = s[A](1-[A])$. How fast? Deterministic propagation of the mutation, after the number of its carriers exceeds ~100, is fast: it takes $t = \frac{1}{s}\ln\frac{[A]_1(1-[A]_0)}{[A]_0(1-[A]_1)} \approx 18/s$ generations (of the order of $10/s$) for $[A]$ to grow from $[A]_0 \approx 0.0001$ to $[A]_1 = 0.9999$. Although our equation was derived for asexuals, sex, even with diploid selection, does not change much, as long as $w_{AA} > w_{Aa} > w_{aa}$.

ii) More on the unavoidable initial genetic drift. Drift is a fair process. Thus, after a neutral mutation appears, the expected number of its copies always stays equal to 1: if the mutation has not been lost after some generations (which happens with probability, say, 1%), the expected number of its copies in these surviving cases is 100. During the short time when this branching process matters, the selective advantage of a new beneficial allele does not matter much. If a beneficial mutation is lost, the population has to wait for its next occurrence.

iii) More on how a mutation takes over after becoming frequent. Is dominance/recessivity important for Darwinian evolution? Not much, at least as a rule. If the beneficial allele is partially recessive, its initial expansion is retarded, and if it is partially dominant, the final stage of its expansion is retarded. And without dominance on the multiplicative scale ($w_{Aa}^2 = w_{AA}\,w_{aa}$), sex changes nothing.

Still, this is not yet the complete story of an adaptive allele replacement. Indeed, a replacement affects genetic variation at other loci, due to a phenomenon known as hitchhiking.
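The three phases of a replacement can be combined in a rough back-of-the-envelope calculation. The sketch below is a minimal Python illustration, not part of the lecture. It assumes (hypothetically) that every copy of the new allele leaves a Poisson-distributed number of copies with mean 1 + s, so that the survival probability solves π = 1 - exp(-(1 + s)π), which comes out close to the ~2s rule; the deterministic phase uses the closed-form logistic solution of d[A]/dt = s[A](1-[A]); and the waiting time counts only mutations that escape initial loss. Function names and parameter values (s = 0.01, N = 10^6, μ = 10^-8) are illustrative choices.

```python
import math


def survival_probability(s: float, tol: float = 1e-12) -> float:
    """Probability that one new copy of an allele escapes loss, assuming
    (hypothetically) Poisson offspring numbers with mean 1 + s.
    Solves pi = 1 - exp(-(1 + s) * pi) by fixed-point iteration."""
    if s <= 0:
        return 0.0                      # no advantage: loss is certain
    pi = 2 * s                          # start from the ~2s rule of thumb
    for _ in range(100_000):
        new_pi = 1 - math.exp(-(1 + s) * pi)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


def sweep_time(s: float, p0: float = 1e-4, p1: float = 1 - 1e-4) -> float:
    """Generations for d[A]/dt = s[A](1-[A]) to carry [A] from p0 to p1
    (closed-form logistic solution)."""
    return math.log(p1 * (1 - p0) / (p0 * (1 - p1))) / s


def waiting_time(n: float, mu: float, s: float) -> float:
    """Expected generations until a copy of the mutation appears that also
    survives initial drift: n * mu new copies per generation, each of which
    survives with probability survival_probability(s)."""
    return 1 / (n * mu * survival_probability(s))


s, n, mu = 0.01, 1e6, 1e-8
print(f"survival probability: {survival_probability(s):.4f} (2s = {2 * s})")
print(f"waiting time: {waiting_time(n, mu, s):.0f} generations")
print(f"deterministic phase: {sweep_time(s):.0f} generations")
```

With these illustrative numbers the waiting phase (about 5,000 generations) is longer than the deterministic phase (about 1,800 generations), in line with the statement that evolution is largely mutation-limited.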
In asexuals, when a unique beneficial mutation reaches fixation, it "accidentally" drives to fixation all the variants that happened to be present in the genotype in which it occurred. In this respect, sex makes a substantial difference: due to recombination, it limits the impact of hitchhiking to a relatively small region of the chromosome. A replacement driven by positive selection produces a region of very low variation, flanked by regions with some high-frequency derived alleles.

[Figure: A beneficial mutation (red) in a population with many segregating neutral (green) and slightly deleterious (blue) variants. Half-way towards fixation, the beneficial mutation carries with it the close-by variants. Some of these variants become detached, due to crossing-over, by the time of fixation.]

This process of removal of genetic variation close to the site of an advantageous allele replacement is called a selective sweep. At the boundaries of the swept region, initially rare variants can reach high frequencies, but not fixation. The size of the chromosome segment that is swept of genetic variation by an adaptive allele replacement increases with s, the coefficient of selection in favor of the new allele, and declines as $N_e$ and r, the probability of recombination between two nucleotides, increase. If selection is strong, the allele replacement occurs fast, and a larger segment of the genome is swept. In an infinite population and/or with extremely frequent recombination, the effect of hitchhiking would disappear.

b) Overlapping selection-driven allele replacements

Under reasonable strength of selection, a selection-driven allele replacement can take from ~100 to ~10,000 generations. Thus, successive replacements in a population would not overlap in time if there is, on average, much less than 1 of them per 100-10,000 generations. It seems that some populations accumulate beneficial alleles at faster rates.
If so, different selection-driven allele replacements must overlap in time. How can this happen? Here, the role of sex is critical: in a population of realistic size, overlapping adaptive allele replacements can happen only with sex.

Asexual population: beneficial alleles at different loci that emerged in different individuals compete with each other. A replacement sweeps the whole genome!

Sexual population: beneficial alleles at different loci that emerged in different individuals can be brought into the same genotype by recombination.

Still, even with sex, overlapping adaptive allele replacements may signify a problem for the population. Indeed, they imply that the fitness landscape changes fast, and if this happens, can the population survive? When a population follows a rapidly moving fitness peak, it lags substantially behind it, and without a big lag there would be no overlapping adaptive replacements. The reduction of the mean population fitness $\bar{W}$ relative to the optimal fitness $w_{max}$ at the top of the fitness peak is called the lag genetic load: $L = (w_{max} - \bar{W})/w_{max}$.

Haldane's dilemma: if a population has to follow a fitness peak that moves too rapidly, and thus tries to accumulate too many adaptive allele replacements per unit time, it may suffer from too high a lag load and go extinct. Remember that, in order to sustain a population with L = 0.8, individuals of the optimal genotype must produce, on average, at least 5 (or 10 with sex) offspring. Suppose that, at a given moment, 100 adaptive allele replacements are occurring. Then, if the new alleles are on average at frequency 0.5, an average individual lags behind the optimum by 50 alleles. Would such an individual be viable?

Is Haldane's dilemma real? If positive selection favors new, advantageous alleles independently, the fitness of an individual with k such alleles, each with advantage s, is $(1+s)^k \approx e^{ks}$.
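The arithmetic behind Haldane's dilemma can be made explicit. A minimal Python sketch, assuming independent multiplicative effects as in the formula above: if the average individual lacks k of the beneficial alleles carried by the optimal genotype, its relative fitness is (1 + s)^(-k), so the lag load is 1 - (1 + s)^(-k), and the optimal genotype must produce 1/(1 - L) offspring (2/(1 - L) with sex) for the average individual to replace itself. The function names are hypothetical.

```python
def lag_load(k_missing: int, s: float) -> float:
    """Lag load L = (w_max - W)/w_max when the average individual lacks
    k_missing beneficial alleles of the optimal genotype, assuming
    independent (multiplicative) effects: W / w_max = (1 + s) ** -k_missing."""
    return 1 - (1 + s) ** (-k_missing)


def offspring_needed(load: float, sexual: bool = False) -> float:
    """Mean number of offspring the optimal genotype must produce so that an
    average individual still replaces itself: 1 offspring per parent without
    sex, 2 with sex (every offspring has two parents)."""
    per_parent = 2 if sexual else 1
    return per_parent / (1 - load)


for s in (0.01, 0.05, 0.1):
    L = lag_load(50, s)  # 100 ongoing replacements, average lag of 50 alleles
    print(f"s = {s}: L = {L:.3f}, optimal genotype needs "
          f"{offspring_needed(L):.1f} offspring "
          f"({offspring_needed(L, sexual=True):.1f} with sex)")
```

For k = 50 and s = 0.05 the optimal genotype would need about 11.5 offspring (about 23 with sex); for s = 0.1, about 117 (about 235 with sex).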
Under such multiplicative selection, the average individual may have a fitness far below that of the optimal genotype, and the lag load would be too high if many replacements occur at the same time. However, epistatic selection can abolish this problem: if selection is soft and truncating, so that the best 50% of the population is left to reproduce, the per-individual number of good alleles increases by ~1 standard deviation of the number of beneficial alleles per individual (about 0.8 standard deviations if this number is normally distributed), which can be a lot. We do not know what kind of selection, hard and exponential or soft and epistatic, is responsible for adaptive evolution. Depending on the answer, the rate of adaptive evolution either is or is not limited by the lag load.

c) Selection-driven allele replacements in spatially structured populations

Replacement of an old, inferior allele with a new, beneficial allele can be substantially slowed down by the spatial structure of the population. Propagation of a beneficial allele follows "traveling wave" dynamics, with the velocity of propagation $2\sqrt{ms}$, where m is the rate of (localized) migration and s is the selective advantage of the new allele. Occasional long-distance leaps of some individuals can speed up this process substantially. Without epistasis, waves of propagation of different alleles spread approximately independently within sexual populations.

[Figure: Global spread of chloroquine-resistant strains of Plasmodium falciparum.]

Microevolutionary theory obviously needs to take spatial structure into account when the spread of an advantageous genotype is considered. However, spatial structure does not radically alter the outcome of evolution under strong selection, with the only exception of speciation.

2) Selection that prevents changes

Two forms of selection prevent changes: negative selection and balancing selection. From the point of view of evolutionary biology, selection that prevents changes is much less important than positive selection. The human eye and the peacock's tail evolved due to positive selection!
Still, selection that prevents changes cannot be ignored, because negative selection is the most common form of selection in natural populations. In other words, the vast majority of mutations that lead to a substantial change of the phenotype are deleterious.

[Figure: One more example of negative selection: deleterious mutations in human rhodopsin. ADRP: autosomal dominant retinitis pigmentosa. ARRP: autosomal recessive retinitis pigmentosa. CSNB: congenital stationary night blindness.]

Under negative selection, the population suffers from genetic load, which can be referred to as mutation load (abolish mutation, and this load would disappear). Let us consider the simplest case of one locus with two alleles, A and a, assuming asexual reproduction or sex with selection in the haploid phase. The fitnesses of alleles A and a are 1 and 1-s, respectively. Deleterious mutations A → a occur with rate $\mu$. Mutation and selection lead to the following changes in allele frequencies (assuming that a is rare, because $s \gg \mu$):

$[a]_t \xrightarrow{\text{mutation}} [a]_t + \mu \xrightarrow{\text{selection}} ([a]_t + \mu)(1-s) = [a]_{t+1}$

At equilibrium, $([a] + \mu)(1-s) = [a]$, i.e., $[a] + \mu - s[a] - s\mu = [a]$. Ignoring the term $-s\mu$ (a product of two small numbers), we obtain $[a]_{eq} = \mu/s$.

What is the value of the genetic load at mutation-selection equilibrium? $L = 1 - \bar{W}/w_{max}$, with $w_{max} = 1$ and $\bar{W} = 1\cdot[A]_{eq} + (1-s)[a]_{eq}$, where $[a]_{eq} = \mu/s$. Thus, $L = 1 - (1 - \mu/s) - (1-s)\mu/s = \mu$.

This remarkable fact, $L = \mu$, is known as the Haldane-Muller principle: mutation load is equal to the mutation rate and does not depend on the strength of selection against mutations ("one mutation - one genetic death"). Of course, this is true only if selection removes mutations one by one. What other situations are possible?

1) Recessive mutations at one locus. Consider two alleles at one locus of sexual diploids, with fitnesses 1 (AA), 1-hs (Aa), and 1-s (aa), where h characterizes the dominance of the deleterious allele a.
When a is recessive, the mutation load is two times lower: if deleterious alleles are removed only as homozygotes, each genetic death removes two alleles.

2) Truncation or similar epistatic selection against mutations at many loci. Exponential selection removes mutations, in a sense, one by one, but under truncation selection one genetic death can remove many mutations, reducing the mutation load (only with sex).

Both recessivity and truncation are forms of synergistic epistasis between different deleterious alleles: when present together, mutations reinforce each other's deleterious effects. How important this phenomenon is in nature remains a matter of debate.

In addition to negative selection, changes in the population can also be prevented by balancing selection, which, however, keeps the population variable. One form of balancing selection is direct dependence of the fitnesses of genotypes on allele frequencies, with rare genotypes having an advantage.

Interactions of selection with Mendelian segregation lead to another curious form of balancing selection, due to the advantage of heterozygotes. Consider a population of sexual diploids with two alleles, A and a, at one locus. The fitnesses of the 3 possible genotypes, AA, Aa, and aa, are $w_{AA}$, $w_{Aa}$, and $w_{aa}$, respectively. If $w_{AA} < w_{Aa} > w_{aa}$ (advantage of heterozygotes), selection protects variation. Indeed, due to the Hardy-Weinberg law, a rare allele is mostly exposed to selection in the heterozygous state, and thus the advantage of heterozygotes leads to a higher fitness of rare alleles. The frequencies of the 3 possible genotypes, AA, Aa, and aa, are $[A]^2$, $2[A][a]$, and $[a]^2$. If, for example, A is rare, $[A]^2$ is small relative to $2[A][a]$, so that rare A is mostly present in heterozygotes. A few examples of balancing selection are known, but this mode of selection is rare.
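Both equilibria discussed above can be checked numerically by iterating the recursions. This is a minimal Python sketch, not the lecture's own code. The first function iterates the haploid mutation-selection recursion, normalizing by mean fitness (the rough version in the text sets mean fitness to 1), and returns the equilibrium frequency of a together with the load. The second iterates standard diploid selection with random mating; for heterozygote advantage with fitnesses 1 - s1 (AA), 1 (Aa), and 1 - s2 (aa), the stable equilibrium is the standard result [A] = s2/(s1 + s2). Parameter values are illustrative.

```python
def mutation_selection_equilibrium(mu: float, s: float,
                                   generations: int = 20_000):
    """Haploid model: mutation A -> a at rate mu, then selection with
    w_A = 1 and w_a = 1 - s. Returns (frequency of a, mutation load)."""
    a = 0.0
    w_bar = 1.0
    for _ in range(generations):
        a = a + mu * (1 - a)                  # mutation A -> a
        w_bar = (1 - a) * 1 + a * (1 - s)     # mean fitness before selection
        a = a * (1 - s) / w_bar               # selection, renormalized
    return a, 1 - w_bar                       # load L = 1 - W / w_max


def heterozygote_advantage(w_AA: float, w_Aa: float, w_aa: float,
                           p: float = 0.01, generations: int = 5_000) -> float:
    """Diploid selection with Hardy-Weinberg proportions; p is [A].
    Returns [A] after the given number of generations."""
    for _ in range(generations):
        q = 1 - p
        w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        p = (p * p * w_AA + p * q * w_Aa) / w_bar
    return p


for s in (0.01, 0.1):
    a_eq, load = mutation_selection_equilibrium(1e-5, s)
    print(f"s = {s}: [a]eq = {a_eq:.2e} (mu/s = {1e-5 / s:.2e}), L = {load:.2e}")

s1, s2 = 0.1, 0.2
for start in (0.01, 0.99):
    p_eq = heterozygote_advantage(1 - s1, 1.0, 1 - s2, p=start)
    print(f"start [A] = {start}: [A]eq = {p_eq:.4f}, s2/(s1+s2) = {s2 / (s1 + s2):.4f}")
```

With μ = 10^-5 the load comes out as 10^-5 for both s = 0.01 and s = 0.1, as the Haldane-Muller principle predicts. Starting the diploid recursion from either a rare or a common A leads to the same interior equilibrium, which is how heterozygote advantage protects variation.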
3) Weak or absent selection

The simplest case of strict selective neutrality is particularly important, because there are many neutral nucleotide sites, at least in large genomes. In this case:

1. The equilibrium virtual heterozygosity is $H = 4N_e\mu$ in a diploid population. The derivation is easy, but we will not consider it.

2. The rate of evolution, the per-generation frequency of allele replacements, equals the mutation rate $\mu$. A new mutation appears with probability $N\mu$ per generation (say, 0.001). It will then be fixed with probability $1/N$, because it has the same probability of eventually taking over the population as any other allele (selection is absent). This gives us $N\mu \times 1/N = \mu$ allele replacements per generation.

If selection is not totally absent ($s \neq 0$), it can be regarded as weak if $N_e|s| < 20$. In this case the superior allele is not fixed permanently. In the simplest case of symmetric mutation, the rate of evolution and the level of variation are maximal when selection is absent and decline very rapidly as selection gets stronger.

[Figure: Locus A with two alleles, A1 and A2, and symmetric mutation with rate $\mu$, such that $4N_e\mu \ll 1$, so that most of the time either allele is fixed. The rate of evolution is the frequency of switches between A1 and A2 fixations.]

Detecting natural selection

We have reviewed the very basics of the direct theory of Microevolution, which tells us how all its factors, working together, affect genetic variation within populations. However, this theory is useful only if we know the actual parameters of the factors of Microevolution. These can be obtained either by direct measurements, for example of the mutation rate (by parent-offspring comparisons), or through the inverse theory of Microevolution, which infers the parameters of these factors from patterns of genetic variation and allele replacements.
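Two pieces of neutral and nearly neutral theory lend themselves to a short calculation. The Python sketch below is illustrative: estimate_ne inverts $H = 4N_e\mu$, and relative_substitution_rate gives the rate of evolution relative to the neutral rate $\mu$, using Kimura's diffusion approximation for the fixation probability in a diploid population with $N = N_e$ (one common convention, not derived in this lecture), which reduces to $4N_e s/(1 - e^{-4N_e s})$. The function names and input values are hypothetical.

```python
import math


def estimate_ne(H: float, mu: float) -> float:
    """Inverse theory at a neutral locus: H = 4 * Ne * mu  =>  Ne = H / (4 * mu)."""
    return H / (4 * mu)


def relative_substitution_rate(ne_s: float) -> float:
    """Rate of evolution divided by the neutral rate mu, approximated as
    x / (1 - exp(-x)) with x = 4 * Ne * s (Kimura's diffusion result)."""
    x = 4 * ne_s
    if abs(x) < 1e-12:
        return 1.0                               # neutral limit
    if x > 0:
        return x / -math.expm1(-x)               # -expm1(-x) = 1 - exp(-x)
    # same value rewritten for x < 0 so that exp() cannot overflow
    return -x * math.exp(x) / -math.expm1(x)


print(f"Ne from H = 0.001, mu = 2.5e-8: {estimate_ne(1e-3, 2.5e-8):.0f}")
for ne_s in (-5, -1, -0.1, 0, 0.1, 1, 5):
    print(f"Ne*s = {ne_s:>5}: rate / mu = {relative_substitution_rate(ne_s):.3g}")
```

The relative rate equals 1 at s = 0, is about 20 for $N_e s = 5$, and is below $10^{-6}$ for $N_e s = -5$, which illustrates how even weak negative selection nearly stops evolution.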
We already saw how this works for measuring genetic drift: theory predicts that without selection $H = 4N_e\mu$; thus, if we know $\mu$ and can measure H, we can recover $N_e$ (which is almost impossible to observe directly).

Now we will consider the key issue of measuring natural selection. Measuring fitnesses directly is very difficult (it is essentially impossible to measure the fitness of a multicellular organism with an error of less than 1-3%), and the results obtained in the laboratory cannot be applied to wild populations. Thus, indirect methods based on inverse theory are crucial.

1) Detecting negative selection

This is a relatively easy task, because negative selection is very common. Negative selection affects evolving sequences in two ways:
1) it reduces the probability of fixation of a mutation with s < 0;
2) it reduces the time until elimination of a mutation with s < 0.

As a result, negative selection leaves two kinds of footprints:

1) Reduced rate of evolution and level of within-population variation. Reduced relative to what? Relative to the rate of evolution at selectively neutral sites. According to the fundamental theorem of neutral evolution, neutral sites evolve at the mutation rate (this is intuitively obvious). In practice, negative selection is detected by comparing the amount of interspecies divergence or within-population polymorphism to that at plausibly neutral sequence sites.

Can we detect negative selection at individual sites or only at sequence segments? This depends on the depth of the alignment.

[Figure: Alignment of orthologous regulatory regions of 4 mammals. A transcription factor-binding site with low divergence is marked in blue.]

If the alignment includes only a few sequences, we can only detect substantial segments with reduced divergence rates (never call them mutation rates!), for example, using the Hidden Markov Model technique.

[Figure: A typical segment of an alignment of orthologous proteins from different species.]
Here the number of sequences makes it possible to detect negative selection even at individual sites. Data on within-population variation usually allow us only to detect negative selection in wide classes of sites, for example, to show that non-synonymous coding sites are under stronger selection than synonymous sites. However, with high H, making inferences about individual sites may become possible. We badly need 100 genotypes of Ciona savignyi.

2) An excess of rare alleles

[Figure: Distribution of allele (nucleotide) frequencies in Arabidopsis thaliana. PLoS Biology 3, 1289-1299, 2005.]

At non-synonymous sites, the excess of rare alleles relative to the neutral expectation is higher. Of course, here we cannot make inferences about individual sites. However, we can make inferences about the strength of negative selection, because only alleles with small |s| are observed as rare polymorphisms. In contrast, a reduced rate of evolution tells us very little about the strength of selection: s = -0.001 is enough to stop evolution.

2) Detecting positive selection

This is a difficult and important problem, because positive selection is rare relative to negative selection (this was proposed in 1935 by Ivan Schmalhausen) and because positive selection is the only driving force of adaptive evolution. Positive selection affects evolving sequences in two ways:
1) it increases the probability of fixation of a mutation with s > 0;
2) it reduces the time until fixation of a mutation with s > 0.

The footprint of positive selection looks rather different depending on its age.

1) Positive selection accomplished a long time ago - interspecies comparisons

In contrast to negative selection, positive selection accelerates evolution (not mutation!). Thus, it makes sites or segments evolve faster than neutrally.
As a result, we can detect positive selection only by comparing relatively close species, such that the number of accepted substitutions between them per neutral site, $K_{neu}$, is ~1-3. Ancient actions of positive selection, which occurred more than $1/\mu$ generations ago ($\mu$ is the per-nucleotide mutation rate), can never be detected.

So, if we have a large number of close enough sequences, even individual sites where $K > K_{neu}$ ($K_{neu}$ is measured for sites that are probably under no selection) can be detected. This approach works well for pathogens, with multiple moderately different strains.

[Figure: Distribution of amino acid replacements along the Neisseria gonorrhoeae transmembrane porin sequence. Each dot represents one replacement. Obviously, sequence segments exposed outside the cell evolve much faster, probably due to positive selection. Molecular Biology and Evolution 17, 423-436, 2000.]

[Figure: Positive selection in HIV-1 protease, detected on samples from 40,000 patients. For each codon site, the ratio of the rate of the most common allele replacement over the neutral rate is shown (Journal of Virology 78, 3722-3732, 2004).]

However, there are two problems with this approach:
1) Positive selection can act only within one clade, with negative selection acting at the same site in the rest of the phylogeny. Then, the overall K at the site will be low.
2) There may not be enough species to measure K for individual sites. If so, all probably important sites are treated together, and their average per-site number of changes, $K_{imp}$, is calculated. The trouble is that sites under positive selection are generally scattered among numerous sites under negative selection, leading to $K_{imp} < K_{neu}$. Only very rarely are there long enough segments with a majority of sites under positive selection.

[Figure: Positive selection acting in one clade, on a sparse phylogenetic tree.]
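The dilution problem in point 2 is simple averaging. In the minimal Python sketch below the site mix is hypothetical: 10% of "important" sites evolve three times faster than neutral sites, while 90% evolve at one tenth of the neutral rate. The pooled $K_{imp}$ then falls well below $K_{neu}$ even though positive selection is acting, and break_even_fraction (a hypothetical helper) shows what fraction of positively selected sites would be needed for $K_{imp}$ to reach $K_{neu}$.

```python
def mean_k(site_classes):
    """Average per-site number of substitutions over pooled 'important' sites.
    site_classes: list of (fraction_of_sites, K_at_those_sites) pairs."""
    total = sum(f for f, _ in site_classes)
    return sum(f * k for f, k in site_classes) / total


def break_even_fraction(k_pos: float, k_neg: float, k_neu: float = 1.0) -> float:
    """Fraction of positively selected sites needed for K_imp to equal K_neu,
    when the remaining sites evolve at k_neg (solves f*k_pos + (1-f)*k_neg = k_neu)."""
    return (k_neu - k_neg) / (k_pos - k_neg)


k_neu = 1.0
# Hypothetical mix: 10% of sites under positive selection (3x neutral rate),
# 90% under negative selection (0.1x neutral rate).
k_imp = mean_k([(0.1, 3.0 * k_neu), (0.9, 0.1 * k_neu)])
print(f"K_imp / K_neu = {k_imp / k_neu:.2f}")
print(f"positive-site fraction needed: {break_even_fraction(3.0, 0.1):.2f}")
```

Here $K_{imp}/K_{neu} = 0.39$, and about 31% of sites would have to be under positive selection before the pooled comparison could reveal it.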
Sophisticated statistical methods can be used to analyze such data, but, in my opinion, they reliably detect positive selection only if a substantial fraction of sites is under positive selection, so that $K_{imp} > K_{neu}$, at least within a large clade, and this is generally very rare. Most "important" sites are, most of the time, under negative, not positive, selection.

A clever idea of McDonald and Kreitman can offer some help. They realized that the condition $K_{imp} > K_{neu}$ (or $K_{imp}/K_{neu} > 1$) can be relaxed. If negative selection is strong, "important" sites under it will not be polymorphic in the population. Sites under positive selection also make only a minimal contribution to polymorphism (because polymorphism in the course of an allele replacement is very short-lived). Thus, instead of requiring $K_{imp}/K_{neu} > 1$ as a signature of positive selection, it is enough to require

$K_{imp}/K_{neu} > H_{imp}/H_{neu}$.

$H_{imp}/H_{neu}$ can be as low as 0.2-0.3 (due to a large fraction of sites under negative selection among "important" sites), so this is a much less stringent condition. One problem with this approach is that slightly deleterious variants with $-s \sim 1/N_e$ can segregate within the population but are only rarely fixed, and thus inflate $H_{imp}/H_{neu}$. A possible way of dealing with this problem is to ignore rare variants.

Some applications of the McDonald-Kreitman test to Drosophila species suggest that as many as 50% of allele replacements in fly evolution were driven by positive selection, because $K_{imp}/K_{neu} = 2H_{imp}/H_{neu}$ (the fraction of adaptive replacements is then $1 - (H_{imp}/H_{neu})/(K_{imp}/K_{neu}) = 1/2$). In contrast, in mammals $K_{imp}/K_{neu} < H_{imp}/H_{neu}$, suggesting no positive selection. The reasons for this contrast are unclear. Anyway, the MK test can never establish the identities of individual sites under positive selection.

2) Positive selection accomplished recently - within-population variation

A recent allele replacement driven by positive selection produces a region of very low variation, flanked by regions with some high-frequency derived alleles.
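The width of such a region can be estimated with a standard deterministic approximation from the hitchhiking literature (not derived in this lecture): immediately after a sweep, a neutral site at recombination distance r from the selected site retains a fraction $H/H_0 \approx 1 - (2N_e s)^{-2r/s}$ of its heterozygosity, valid when $2N_e s \gg 1$. The Python sketch below evaluates it, together with the distance at which half of the variation is retained; the function names and parameter values are hypothetical. It reproduces the qualitative rule stated earlier: the swept region grows with s and shrinks with $N_e$ and with r.

```python
import math


def remaining_heterozygosity(r: float, s: float, ne: float) -> float:
    """Approximate H/H0 right after a sweep at recombination distance r from
    the selected site: 1 - (2 Ne s) ** (-2 r / s). Requires 2 Ne s >> 1."""
    return 1 - (2 * ne * s) ** (-2 * r / s)


def half_recovery_distance(s: float, ne: float) -> float:
    """Recombination distance at which H/H0 has recovered to 1/2
    (solves (2 Ne s) ** (-2 r / s) = 1/2 for r)."""
    return s * math.log(2) / (2 * math.log(2 * ne * s))


r_per_bp = 1e-8  # hypothetical per-nucleotide recombination probability
for s, ne in ((0.01, 1e6), (0.1, 1e6), (0.01, 1e8)):
    d = half_recovery_distance(s, ne)
    print(f"s = {s}, Ne = {ne:.0e}: half-recovery at r = {d:.2e} "
          f"(~{d / r_per_bp / 1000:.0f} kb)")
```

For s = 0.01 and $N_e = 10^6$ the half-recovery distance is about $3.5 \times 10^{-4}$, i.e. roughly 35 kb if (hypothetically) $r = 10^{-8}$ per nucleotide.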
Such a scar of an allele replacement is due to the effect called hitchhiking, and it remains visible only for a limited time, of the order of $N_e$ generations (much less than $1/\mu$, where $\mu$ is the per-nucleotide mutation rate). There are several definite known cases of recently accomplished selective sweeps.

[Figure: Reduced levels of genetic variation around the site of a recent positive-selection-driven allele replacement (selective sweep) in human populations from Africa (a), Europe (b), and East Asia (c) (Nature Genetics 39, 218-225, 2007).]

3) Ongoing positive selection - within-population variation

One must be lucky to study the right population at the right time. Still, there are some definite cases of ongoing allele replacements driven by strong positive selection. One of them is the parallel acquisition of the ability of adults to digest milk (due to persistent expression of lactase) in Africans and non-Africans. These ongoing sweeps left clear-cut signatures.

[Figure: (a) Kenyan and Tanzanian C-14010 lactase-persistent (red) and non-persistent G-14010 (blue) homozygosity tracts. (b) European and Asian T-13910 lactase-persistent (green) and C-13910 non-persistent (orange) homozygosity tracts. Positions are relative to the start codon of the lactase locus (Nature Genetics 39, 31-40, 2006).]

4) A different approach - detecting positive selection by bursts of substitutions

Suppose that the fitness landscape at a codon site suddenly changes. The new optimal amino acid may not be reachable from the old one by a single nucleotide substitution. Then, a clump of two or even three non-synonymous substitutions may follow. Such clumps were observed in the evolution of mammals and HIV-1 (PNAS 103, 19396-19401, 2006).
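Whether a new optimal amino acid is reachable in one step can be read off the genetic code. This minimal Python sketch builds the standard genetic code (NCBI translation table 1) and computes, for a pair of amino acids, the smallest number of nucleotide substitutions separating any of their codons, and, for a given codon, the amino acids reachable by a single substitution. It ignores codon usage and mutational biases, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in the order
# TTT, TTC, TTA, TTG, TCT, ... ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(aa1: str, aa2: str) -> int:
    """Fewest nucleotide substitutions separating any codon of aa1 from any codon of aa2."""
    codons1 = [c for c, aa in CODE.items() if aa == aa1]
    codons2 = [c for c, aa in CODE.items() if aa == aa2]
    return min(hamming(c1, c2) for c1 in codons1 for c2 in codons2)


def one_step_neighbours(codon: str) -> set:
    """Amino acids reachable from this codon by one substitution (stops and
    synonymous changes excluded)."""
    out = set()
    for i, b in product(range(3), BASES):
        if b != codon[i]:
            aa = CODE[codon[:i] + b + codon[i + 1:]]
            if aa not in ("*", CODE[codon]):
                out.add(aa)
    return out


print("one step from TGG (Trp):", sorted(one_step_neighbours("TGG")))
print("Trp -> Phe needs", min_substitutions("W", "F"), "substitutions")
```

For example, from the single tryptophan codon TGG only C, G, L, R, and S are one substitution away, so 14 of the other 19 amino acids require a clump of at least two substitutions.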
Clumping of nonsynonymous substitutions is strongest in conservative regions of proteins, where the 1:1 situations occur only in ~20% of codons. Indeed, if an important amino acid is replaced, this must be beneficial. This approach reveals a number of slowly evolving sites that occasionally undergo positive selection.

[Figure: Amino acid sites inferred to be under positive selection in HIV-1 gp120. Left: rapidly evolving sites previously inferred to be under positive selection. Right: conservative sites with strongly clumped substitutions.]

3) Detecting balancing selection

Balancing selection, which requires changing fitness landscapes, favors rare alleles. It prevents fixations and losses of the alleles involved, leading to durable polymorphisms. In the extreme case, this can lead to trans-species polymorphisms, persisting from the time of species divergence. This is the case for the csd (complementary sex determination) locus in bees. Females must be heterozygous at this locus, and homozygotes develop into sterile males, causing strong selection against common alleles (Genome Res. 16, 1366-1375, 2006).

Quiz: Suppose that there is a genome segment with low genetic variation within a population. This can be due to either negative selection or a recent selective sweep within this segment. What additional data can be used to distinguish between these two explanations?
