
Brit. J. Phil. Sci. 62 (2011), 155–176

What is the Gene Trying to Do?

Warren J. Ewens

© The Author 2010. Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. doi: 10.1093/bjps/axq005. Advance Access published on June 11, 2010.

ABSTRACT

The aim of this paper is to offer a new biological interpretation of Fisher’s ‘Fundamental Theorem of Natural Selection’ and from this to consider optimality properties of gene frequency changes. These matters are of continuing interest to biologists and philosophers alike. In particular, the extent to which biological evolution can be calculated from the ‘gene’s-eye’ point of view is also discussed. In this sense, the paper bears indirectly on the concepts of the unit of selection and of the ‘selfish gene’. A new biological significance for the Fundamental Theorem, not previously found in the literature, is offered, together with an optimality principle connected with this theorem.

1 Introduction
2 Preliminary Calculations and Average Effects
3 The Modern Interpretation of the One-Locus Fundamental Theorem
4 The Whole-Genome Interpretation of the Fundamental Theorem
5 A New Biological Meaning for the Fundamental Theorem of Natural Selection
6 A Numerical Example
7 The Malthusian Parameter
8 Decreases in Mean Fitness
9 Optimality Considerations
10 Conclusions

1 Introduction

The title of this paper is of course inappropriate: the gene (more accurately, allele, and more accurately again allele frequency, on which more below) is not trying to do anything specific, and all teleological expressions in this paper are made merely for simplicity and convenience. The population frequency of each allele changes for many reasons, for example selection, mutation, and random drift. In this paper I focus only on changes arising as a result of selection acting on the phenotype of each individual in the population of interest, and through this on his entire genome. Interactive effects of genes at different loci imply an extremely complex behavior for evolution as directed by natural selection. Despite this, it does not necessarily follow that the most convenient, or appropriate, description of the evolutionary process focuses on whole-genome genotypes. The comparative permanence of any allele as opposed to the essential uniqueness of the whole-genome genotype of any individual is one reason for this.

One aim of this paper is to give a biological interpretation of Fisher’s ‘Fundamental Theorem of Natural Selection’, abbreviated here to FTNS. A second aim is to give an optimality principle in genetical evolution associated with the FTNS. Both aims are associated with a ‘gene’s-eye’ view (more accurately, ‘allele’s-eye’ view) of evolution.

It is appropriate to start by making several points. The first is that the opening sentence in (Fisher [1930]), in which the FTNS first appeared, is ‘Natural selection is not evolution’. Evolution in a biological species is affected by many processes, of which natural selection is of course an important one. However, while Fisher recognized the importance of mutation and other factors in reaching a full picture of evolution, and did discuss them in (Fisher [1930]), he deliberately set these aside and focused his attention on properties of the evolution of a population as directed by natural selection. In this paper the discussion is similarly restricted, and thus, for example, it is assumed that mutation is not allowed.

The second point arises from the interpretation of the FTNS. Its interpretation, its validity as a purely mathematical theorem, and its biological relevance have all occupied biologists, mathematicians, and philosophers for almost eighty years.
The viewpoint adopted here, as in (Edwards [1994]; Frank [1997]; Grafen [2003]; and Okasha [2008]), who are the main recent commentators on the theorem, is that the so-called modern interpretation of the theorem, introduced by (Price [1972]) and generalized by (Ewens [1989]), and described below, is the correct interpretation and that under this interpretation, the theorem is correct as a mathematical statement. Given this, attention then focuses on the theorem’s biological relevance and in particular on the three questions raised by (Okasha [2008]) described below.

The third point is that the FTNS was seen by Fisher as a mathematical theorem (as its very name implies). A theorem in mathematics is an exact result, derived by logical arguments from various assumptions (or axioms). Different axioms (as in different geometries, considered in Section 9) lead to different theorems and conclusions. The assumptions made in any mathematical analysis of the biological world cannot describe this world exactly, given that world’s complexity, and relate rather to an idealized and simplified picture of this world as defined by the assumptions made. These assumptions imply that various complicating features are ignored, either because insufficient information is available about them, because they are assumed to be of secondary importance, or because inclusion of these features would preclude a useful mathematical treatment. It is therefore appropriate, in considering the validity of the FTNS as a theorem, to consider the assumptions upon which it is based. These are either explicit or implicit in the development below. Among other simplifications, the theorem ignores the existence of two sexes and assumes that the fitness of any genotype is a fixed constant (and is not, for example, frequency-dependent) which remains unchanged from generation to generation. Only viability fitnesses are considered.
The FTNS also assumes in effect an infinitely large population in which random changes of gene frequency do not arise. Some of these simplifications can be and have been relaxed in the recent literature, but those made in this paper are those made by Fisher. Fisher appeared to use a continuous-time analysis in his treatment, but to make the points at issue clear, I find it convenient to assume that the population under discussion reproduces in non-overlapping generations so that, for example, I talk of a parental and a daughter generation. The discrete- versus continuous-time distinction makes no difference to the points at issue. So far as assumptions that are not made are concerned, the FTNS does not assume random mating in the population. One of the major previous misunderstandings, sometimes still current, of the FTNS stems from the incorrect view that it depends on this assumption for its validity. Fisher repeatedly stated that the theorem is true independent of the mating scheme, and from the start he emphasized, for example in (Fisher [1918]), the importance of non-random mating, at least in humans. The modern interpretation of the FTNS accepts that no assumption is made about the mating scheme. As a further important generalization, the FTNS allows the fitness of any individual to depend on the genes in his entire genome (and does not merely apply to the case where it depends only on the genes at one single gene locus, as again is often incorrectly claimed). The fourth point concerns the meaning of what Fisher actually meant in his writings. The FTNS was first proposed in (Fisher [1930]) and further elaborated in (Fisher [1941], [1958]). Unfortunately, (Fisher [1958]) contains a staggering number of errors, misprints, duplications of material (sometimes involving entire pages), etc., as pointed out in detail by (Price [1972]). 
Since (Fisher [1930]) is not easy to obtain, the 1999 ‘Variorum’ edition of this book, edited by Bennett, and (Fisher [1941]), which appears not to include any such errors, together provide the most reliable source for his ideas. The ‘Variorum’ edition is referred to here as (Fisher [1999]).

The fifth point concerns the concept of the Malthusian parameter, written by Fisher as m. This is discussed at length in (Fisher [1999], pp. 25–30) and defined there as the rate of change of the population size. However, this parameter is mentioned only briefly in the verbal part of, and not at all in his mathematical development of, the FTNS given in (Fisher [1999], pp. 34–5). In that development, Fisher uses the expression ‘fitness of any organism’, a strange expression in view of the fact that it is clear that he is referring to a population concept, and all commentators take this expression to mean ‘population mean fitness’, as do I. The Malthusian parameter reappears in (Fisher [1999], p. 42), where it is stated that population size must generally be held more or less constant due to external factors (such as food supply). Thus, the population mean fitness, if it is to be equated to m, must also be held more or less constant (at the value 1), in apparent contradiction to the conclusion of the FTNS. The relation of the Malthusian parameter and population mean fitness is discussed in more detail below. An associated (and much debated) question is whether one should regard genotype fitnesses as being absolute or relative values. Changes in allele frequency (see for example (3) below) are independent of the absolute/relative question, and much of the FTNS as described mathematically in (Fisher [1999], pp. 34–5) is independent of this question. The absolute/relative question is discussed below in connection with the Malthusian parameter.
Sixth, since the FTNS is a theorem, it is necessary in discussing its interpretation and biological usefulness to give enough of the relevant mathematics, equations, and expressions so that this discussion can be carried out. I have attempted to keep this to a minimum, and in particular often merely quote results rather than go through the calculations that lead to them.

Finally, I use the word ‘gene’, and also the expression ‘gene frequency’, when it is more appropriate to use ‘allele’ and ‘allele frequency’. The word ‘allele’ is a description of a class, for example the alleles A, B, and O in the ABO blood group system, while a gene is a material object that belongs to one or other allelic class. The mathematical population genetics literature is careful to distinguish between the words ‘allele’ and ‘gene’ (with the exception that for purely historical reasons, ‘gene frequency’ is often used when ‘allele frequency’ is appropriate). Despite this distinction, in this paper I often follow common practice and use the word ‘gene’ when ‘allele’ is appropriate, and ‘gene frequency’ instead of ‘allele frequency’, except in cases when this is potentially misleading.

2 Preliminary Calculations and Average Effects

Although I eventually assume, as is necessary, that the fitness of any individual depends on all the genes in his entire genome, in this and the next section I assume that it depends only on the two genes he has at some single gene locus. (The focus, as in all discussions of the FTNS, is on diploid individuals.) Doing this is sufficient to illustrate most of the points at issue concerning the FTNS, and also makes the whole-genome calculation easy to follow. I allow dominance effects at this single locus and any number of possible alleles at this locus.
In the (later) whole-genome analysis I also allow epistatic effects (interactive effects between alleles at different loci), any recombination pattern between gene loci, any arbitrary number of gene loci, and any arbitrary number of chromosomes in the genome.

Suppose then that the (viability) fitness of any individual depends only on his genotype at some gene locus $A$, at which alleles $A_1, A_2, \ldots, A_k$ can arise. I denote the fitness of an individual of genotype $A_iA_j$ by $w_{ij}$. Consider first some parental population and denote the population frequency of the genotype $A_iA_i$ by $P_{ii}$ and of the genotype $A_iA_j$ by $2P_{ij}$ (for $i \neq j$) at the time of conception of this generation. (It is convenient to call $P_{ij}$ the ‘ordered’ frequency of $A_iA_j$.) These genotype frequencies are not necessarily assumed to be in Hardy–Weinberg form since (with Fisher) I do not necessarily assume random mating in the preceding generation. The mean population fitness, $w$, of the population in this generation at the time of its conception is given by

$$w = \sum_i \sum_j P_{ij} w_{ij}, \tag{1}$$

the double sum (as with all double sums in this and the following section) being taken with both $i$ and $j$ running from 1 to $k$. The frequency $p_i$ of $A_i$ is thus given by

$$p_i = \sum_j P_{ij}. \tag{2}$$

This sum (and all single sums in this and the following section) is taken for $j$ running from 1 to $k$. The frequency $p'_i$ of $A_i$ at the time of reproduction of individuals in the parental generation is

$$p'_i = \sum_j P_{ij} w_{ij} / w, \tag{3}$$

and I denote the intra-generational change $p'_i - p_i$ in the frequency of $A_i$ by $\delta p_i$. Under any form of mating (for example random mating, selfing, partial selfing, assortative mating), $p'_i$ is also the frequency of $A_i$ in the daughter generation at its time of conception. In other words, Equation (3) can be taken as providing the frequency of $A_i$ in the daughter generation at the time of conception, and this is the interpretation that I now place on this equation.
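Equations (1) to (3) can be illustrated with a small calculation. The following is only a minimal sketch: the three-allele frequency matrix `P` and fitness matrix `W` are hypothetical values chosen for illustration (they do not come from the paper), and the function names are likewise assumptions.

```python
# Hypothetical one-locus example with k = 3 alleles (illustrative values only).
# P[i][j] is the 'ordered' frequency of genotype AiAj: P is symmetric and sums to 1.
P = [
    [0.10, 0.05, 0.05],
    [0.05, 0.30, 0.10],
    [0.05, 0.10, 0.20],
]
# W[i][j] is the viability fitness w_ij of genotype AiAj (symmetric).
W = [
    [1.0, 1.2, 0.8],
    [1.2, 0.9, 1.1],
    [0.8, 1.1, 1.0],
]


def mean_fitness(P, W):
    """Equation (1): w = sum_i sum_j P_ij w_ij."""
    k = len(P)
    return sum(P[i][j] * W[i][j] for i in range(k) for j in range(k))


def allele_freqs(P):
    """Equation (2): p_i = sum_j P_ij."""
    return [sum(row) for row in P]


def freqs_after_selection(P, W):
    """Equation (3): p'_i = sum_j P_ij w_ij / w."""
    w = mean_fitness(P, W)
    k = len(P)
    return [sum(P[i][j] * W[i][j] for j in range(k)) / w for i in range(k)]


p = allele_freqs(P)                          # parental frequencies at conception
p_new = freqs_after_selection(P, W)          # frequencies at reproduction
delta_p = [b - a for a, b in zip(p, p_new)]  # the changes delta p_i

print("w       =", mean_fitness(P, W))
print("p       =", p)
print("p'      =", p_new)
print("delta p =", delta_p)
```

Note that no mating scheme enters this calculation: `P` is not required to be in Hardy–Weinberg form, and the changes `delta_p` necessarily sum to zero.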
It is one component of the full evolutionary description of the changes over one generation of the frequencies of the various genotypes at the locus under consideration. It is not necessary, in developing the FTNS, to give this full evolutionary description of the population as determined by whatever the mating scheme might be. Although daughter generation gene frequencies at the time of conception can be calculated as shown above, it is not possible to calculate the daughter generation genotype frequencies at that time without knowledge of the mating scheme. This in turn implies that gene frequencies beyond the daughter generation cannot be calculated from genotype frequencies in the parental generation without knowledge of the mating scheme, and thus, we are unable to track gene frequency evolution over more than one generation without this knowledge. To further complicate matters, the mating scheme might well change from generation to generation. The importance of a lack of knowledge of the mating scheme in tracking evolutionary changes is discussed later in this paper.

One important point, not referred to sufficiently in the literature, and whose relevance becomes clear in the whole-genome analysis considered below, is that a gene is assumed to be passed on with unchanged allelic type from parent to child. In other words, it is assumed that no intracistronic recombination is possible (since it is also assumed, in developing the FTNS, that there is no mutation). This is the crucial fact which leads to a possible ‘gene’s-eye’ view of evolution.

It is now necessary to define the set of average effects of the alleles $A_i$ ($i = 1, 2, \ldots, k$). These are central to the FTNS. Unfortunately (Fisher [1958]) contains an ambiguity in the definition, even within the same paragraph (p. 35). (Price [1972]) in effect continues this ambiguity (see his Equations (2.4) and (2.5)) by defining average effects in both of Fisher’s ways.
In his second definition, he absorbs a constant additive factor (shown explicitly in Equation (6c) below) in the first definition in order to make his calculations more transparent. (Okasha [2008]) notes this ambiguity, and carries out his analysis using the second definition employed by Price. However, the first definition is preferred when defining the key concept of the additive genetic variance, and for this reason I used the first definition in my previous work on this topic (Ewens [1989], [1992], [2004]). Despite this, I use both definitions in this paper, mainly in order to compare my analysis with Okasha’s. This will have the benefit of making explicit the relation between the two definitions.

Under the second definition, the average effects of the alleles $A_1, A_2, \ldots, A_k$ are defined as the values $\beta_1, \beta_2, \ldots, \beta_k$ of $b_1, b_2, \ldots, b_k$ which minimize the quantity

$$\sum_i \sum_j P_{ij} (w_{ij} - b_i - b_j)^2. \tag{4}$$

They are found as the (unique) solutions of the equations

$$p_i \beta_i + \sum_j P_{ij} \beta_j = \sum_j P_{ij} w_{ij}, \quad (i = 1, 2, \ldots, k). \tag{5}$$

These equations have to be solved simultaneously, and in general their solution (for $\beta_1, \beta_2, \ldots, \beta_k$) cannot be written down explicitly. This does not matter since explicit solutions are not needed, all relevant information residing in (5). Summation over all alleles in (5) leads to the equation

$$2 \sum_i p_i \beta_i = w. \tag{6a}$$

This equation shows that from a ‘gene’s-eye’ point of view, we may regard the ‘fitness’ of the gene $A_i$ (more appropriately here, allele $A_i$) as being $2\beta_i$ and that mean fitness can be calculated using these gene ‘fitnesses’. I return to this viewpoint later.

The concept of the additive genetic variance is central to the FTNS. To define this concept, it is easier to use the first definition of the average effects. The total variance in genotype fitness is $\sum_i \sum_j P_{ij} (w_{ij} - w)^2$.
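Although the solution of (5) cannot in general be written down explicitly, it is easily found numerically. The following minimal sketch solves the linear system (5) for the second-definition average effects and checks Equation (6a). The matrices `P` and `W` are hypothetical illustrative values, and the helper names `solve_linear` and `average_effects_beta` are assumptions introduced here.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def average_effects_beta(P, W):
    """Solve equations (5): p_i b_i + sum_j P_ij b_j = sum_j P_ij w_ij."""
    k = len(P)
    p = [sum(row) for row in P]
    A = [[P[i][j] + (p[i] if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(P[i][j] * W[i][j] for j in range(k)) for i in range(k)]
    return solve_linear(A, rhs)


# Hypothetical ordered genotype frequencies and fitnesses (illustrative only).
P = [[0.10, 0.05, 0.05],
     [0.05, 0.30, 0.10],
     [0.05, 0.10, 0.20]]
W = [[1.0, 1.2, 0.8],
     [1.2, 0.9, 1.1],
     [0.8, 1.1, 1.0]]

beta = average_effects_beta(P, W)
p = [sum(row) for row in P]
w = sum(P[i][j] * W[i][j] for i in range(3) for j in range(3))

print("beta            =", beta)
print("2 * sum p*beta  =", 2 * sum(pi * bi for pi, bi in zip(p, beta)))  # equation (6a)
print("mean fitness w  =", w)
```

When the fitnesses are exactly additive, that is when each $w_{ij}$ is a sum $b_i + b_j$, the same routine recovers the values $b_i$ themselves.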
The (first definition) average effects are defined as the values $\alpha_1, \alpha_2, \ldots, \alpha_k$ of $a_1, a_2, \ldots, a_k$ which minimize the sum of squares $\sum_i \sum_j P_{ij} (w_{ij} - w - a_i - a_j)^2$. The sum of squares so removed is the (single locus) additive genetic variance in fitness, denoted by $\sigma_A^2$. It is that component of the total variance in fitness that is explained by the additive effects of genes within genotypes. Thus, if, for example, $w_{ij}$ is exactly equal to the sum $\beta_i + \beta_j$ for all $(i, j)$ combinations, the fitness of each genotype can be explained by the additive effects of the two genes in the genotype, and the additive genetic variance in fitness is equal to the total variance in fitness. Following a suggestion of (Crow and Kimura [1970]), the additive genetic variance is often called the ‘genic’ variance, but in view of the above comments, possibly the best expression is ‘additive genic’ variance. The reason why the additive genetic variance is important in evolution is that a parent passes on a gene, and not his genotype, to any offspring so that it is necessary to isolate that portion of the variance in fitness that is explained by ‘genes within genotypes’.

The average effects $\alpha_1, \alpha_2, \ldots, \alpha_k$ calculated under the first definition satisfy the equation

$$\sum_i p_i \alpha_i = 0, \tag{6b}$$

an equation that becomes relevant when discussing the whole-genome case. The above calculations show that the relation between the two definitions of average effects is that

$$\beta_j = \alpha_j + w/2, \quad (j = 1, 2, \ldots, k), \tag{6c}$$

the quantity $w/2$ being the constant additive factor referred to above. Least-squares theory and this relation between the $\alpha$ and $\beta$ values show that

$$\sigma_A^2 = 2w \sum_j \delta p_j \alpha_j = 2w \sum_j \delta p_j \beta_j. \tag{7}$$

Finally, and most important, the values of $\alpha_1, \alpha_2, \ldots, \alpha_k$ and of $\beta_1, \beta_2, \ldots, \beta_k$ depend on gene and genotype frequencies and thus change with these frequencies. They strictly should thus be written as $\alpha_1(p_1, \ldots, p_k), \ldots, \alpha_k(p_1, \ldots, p_k)$ and $\beta_1(p_1, \ldots, p_k), \ldots, \beta_k(p_1, \ldots, p_k)$.
As a result, under both definitions the numerical values of the daughter generation average effect values will generally differ from the parental generation values, since the daughter generation gene frequencies will generally differ from the parental generation values (as a result of selection). This point is discussed in more detail in the following sections.

3 The Modern Interpretation of the One-Locus Fundamental Theorem

With the above material in hand, I now turn to the modern interpretation of the one-locus version of the FTNS. The key point is that Fisher took a gene’s-eye view of fitness, deriving from the view that parents pass on genes and not genotypes to their offspring and that changes in gene frequencies are the substance of evolution. He thus conceptualized the fitness of the genotype $A_iA_j$ not as $w_{ij}$ but as $\beta_i + \beta_j$. (See the following section for excerpts from Fisher’s writing justifying this point of view in the whole-genome case.) Under this point of view, the mean fitness is defined not as in (1) but as

$$\sum_i \sum_j P_{ij} (\beta_i + \beta_j). \tag{8}$$

For any set of genotype fitness values, this expression yields a numerical value identical to that yielded by the expression on the left-hand side of (6a), so it is identical to the numerical value of the mean fitness $w$ as given in (1). Thus this change of point of view is purely a conceptual, rather than a numerical, change in the definition of population mean fitness.

The FTNS in its modern interpretation considers only the changes in (8) due to changes in the genotype frequencies, or equivalently changes in the expression on the left-hand side of (6a) due to changes in the gene frequencies $p_i$, and disregards any changes due to changes to the average effects $\beta_1, \ldots, \beta_k$. This change has often been called the change due to natural selection, an interpretation which I challenge below.
Here, as in (Ewens [1989]), I adopt the neutral terminology of calling it the partial change in mean fitness and denote it by $\delta_P(w)$, the suffix ‘P’ denoting ‘partial’. This change is

$$\sum_i \sum_j \delta P_{ij} (\beta_i + \beta_j), \tag{9}$$

or equivalently $2 \sum_j (\delta p_j) \beta_j$, where $\delta P_{ij}$ is the change in the frequency of the ordered genotype $A_iA_j$. Equation (7) shows that this is, exactly, $\sigma_A^2 / w$. This is the proof of the FTNS in the one-locus case. A parallel proof, given in (Ewens [1989]), applies under the definition of average effects using $\alpha_1, \alpha_2, \ldots, \alpha_k$.

As stated above, the changes $(\delta p_1, \delta p_2, \ldots, \delta p_k)$ have two interpretations. First, they are the changes in gene frequency in any parental generation between its time of conception and its time of reproduction, and are brought about solely by viability differences between the various genotypes. However, gene frequencies in a daughter generation at its time of conception are identical to those in the parental generation at its age of reproduction, whatever the mating scheme in the parental population might be, so that these are also the changes in gene frequencies arising between parental and daughter generations at their respective times of conception. It follows that the partial increase in mean fitness of the population from one generation at its time of conception to the next at its time of conception does not depend on the mating scheme of the parental generation, since the partial change in mean fitness as defined above depends only on gene frequency changes. The FTNS is regarded as an inter-generational result, so that in the modern interpretation it is true independent of the mating scheme. That is, the partial change in mean fitness between parental and daughter generations at their respective ages of conception is independent of the mating scheme and is positive (or at the worst zero), and is given by a specific formula ($\sigma_A^2 / w$).
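The one-locus result can be checked numerically. The sketch below is an illustration under stated assumptions: the matrices `P` and `W` are hypothetical values, not data from the paper, and the variable names are introduced here. It computes the additive genetic variance as the sum of squares removed by fitting the average effects, computes the partial change $2 \sum_j (\delta p_j) \beta_j$, and compares the two.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Hypothetical ordered genotype frequencies P_ij and fitnesses w_ij (illustrative only).
P = [[0.10, 0.05, 0.05],
     [0.05, 0.30, 0.10],
     [0.05, 0.10, 0.20]]
W = [[1.0, 1.2, 0.8],
     [1.2, 0.9, 1.1],
     [0.8, 1.1, 1.0]]

k = len(P)
pairs = [(i, j) for i in range(k) for j in range(k)]
p = [sum(row) for row in P]                                   # equation (2)
w = sum(P[i][j] * W[i][j] for i, j in pairs)                  # equation (1)
delta_p = [sum(P[i][j] * W[i][j] for j in range(k)) / w - p[i]
           for i in range(k)]                                 # from equation (3)

# Second-definition average effects from (5); first-definition ones from (6c).
A = [[P[i][j] + (p[i] if i == j else 0.0) for j in range(k)] for i in range(k)]
rhs = [sum(P[i][j] * W[i][j] for j in range(k)) for i in range(k)]
beta = solve_linear(A, rhs)
alpha = [b - w / 2 for b in beta]

# Additive genetic variance: total sum of squares minus residual sum of squares.
total_var = sum(P[i][j] * (W[i][j] - w) ** 2 for i, j in pairs)
residual = sum(P[i][j] * (W[i][j] - w - alpha[i] - alpha[j]) ** 2 for i, j in pairs)
var_additive = total_var - residual

# Partial change in mean fitness, holding the average effects fixed.
partial_change = 2 * sum(dp * b for dp, b in zip(delta_p, beta))

print("partial change  =", partial_change)
print("sigma_A^2 / w   =", var_additive / w)
```

The two printed quantities should agree up to rounding error, whatever (valid) frequencies and fitnesses are supplied, and neither depends on any assumption about the mating scheme.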
This is in contrast to the actual change in mean fitness between the two generations, which depends on the mating scheme and which, first, can (as shown below) be negative and second, cannot be calculated in the absence of knowledge of this mating scheme.

4 The Whole-Genome Interpretation of the Fundamental Theorem

I now turn to the whole-genome FTNS, and assume that the various (gigantically large number of) possible whole-genome genotypes are listed in some agreed order as genotypes $1, 2, \ldots, s, \ldots, S$. The ‘time of conception’ frequency of the typical genotype $s$ in the parental generation is denoted by $g_s$ and the fitness of this genotype by $w_s$. Thus the parental generation population mean fitness at this time, denoted as in the previous section by $w$, is $\sum_{s=1}^{S} g_s w_s$. Using an approach generalizing that for the single-locus case, the (‘α’ definition) average effects of the various alleles at the various loci in the genome are defined by a least-squares procedure as follows. If the frequency of the allele $A_i$ at the typical gene locus $A$ is denoted by $p_{ai}$, the average effects of all the alleles in the genome are determined by minimizing the sum of squares

$$\sum_{s=1}^{S} g_s \Big\{ w_s - w - \sum{}^{*} \alpha_{ai} \Big\}^2. \tag{10}$$

In the expression (10) the outer sum is taken over all whole-genome genotypes and the inner (starred) sum is taken, for each whole-genome genotype, over all alleles contained within that genotype, with $\alpha_{ai}$ being the (first definition) average effect of the allele $i$ at locus $A$. The average effect $\alpha_{ai}$ arises once, twice,
or not at all within the starred sum according to whether the allele $A_i$ at locus $A$ occurs once, twice, or not at all within the genotype $s$. This minimization process does not lead to unique values of the average effects, since one can add any constant to the average effects at one locus, and subtract the same constant from those at some other locus, without changing the sum of squares removed from the total variance in fitness by fitting average effects. To obtain unique values it is necessary to impose a constraint for the average effects at each gene locus. This is done by requiring (see (6b)) that for the typical locus $A$, $\sum_i p_{ai} \alpha_{ai} = 0$. The collection of $\alpha_{ai}$ values so obtained are the (‘α’ definition) whole-genome average effects of all genes at all gene loci in the genome. It is not necessary to give explicit formulae for the various $\alpha_{ai}$ values defined by this least-squares procedure: indeed, they can only be expressed implicitly as the (unique) solution of a gigantic set of simultaneous equations.

The whole-genome additive genetic variance is found in a manner parallel to that of Section 2, that is, by defining the whole-genome additive genetic variance as the sum of squares removed from the total whole-genome variance in fitness by fitting these $\alpha_{ai}$ values. Complete details are given in (Ewens [1989], [1992], [2004]) and are not given here. In this section, we denote the (whole-genome) additive genetic variance in fitness by $\sigma_A^2$. The relation between this whole-genome additive genetic variance and the sum of the single locus additive genetic variances, defined for each locus by using marginal genotype fitnesses and the calculations of Section 2, is discussed below.

As in the one-locus case, Fisher’s ‘gene’s-eye’ view of the fitness of the typical whole-genome genotype $s$ is not its actual fitness, but instead is the linear combination $w + \sum{}^{*} \alpha_{ai}$, with the starred summation defined as above.
In parallel with the corresponding result in the one-locus case, the mean fitness of the population is now thought of as being

$$\sum_{s=1}^{S} g_s \Big( w + \sum{}^{*} \alpha_{ai} \Big), \tag{11}$$

which (as with the corresponding one-locus result) is numerically identical to that given by the standard definition of mean fitness, here $\sum_{s=1}^{S} g_s w_s$ as given above.

The justification for claiming that (11) is Fisher’s view of the fitness of the whole-genome genotype $s$ can be seen from the following quotations (changing from Fisher’s example of stature to fitness): ‘The relation of the quantities β to [fitness] may be made more clear by supposing that for any specific gene combination, we build up an “expected value” […] by adding appropriate [α values], according to the genes present. This expected value will not necessarily represent real [fitness] […] but its statistical properties will be more intimately involved with real [fitness] than [fitness] itself’ (Fisher [1958]). Also ‘[…] the quantity [this paper’s (11) above] is determined solely by the genes present in the individual, and is built up of the average effects of those genes. It therefore reflects the genetic potentiality of the individual concerned, in the aggregate of mating possibilities open to him’ (Fisher [1999]). Further, ‘A […] discrepancy occurs when gene substitutions are not exactly additive in their average effects’ (Fisher [1999]), but (paraphrasing Fisher [1999], pp. 33–4), this assumption captures ‘the genetic potentiality, [represented in (11) by the genes in any genotype], […] [and will be] reflected in the [fitness] in the offspring’.

I now outline the proof of the whole-genome FTNS. This is a natural generalization of the one-locus FTNS given above, so only an outline is necessary (complete details can be found in Ewens [1989], [2004]).
In parallel with the one-locus case, the partial change $\delta_P(w)$ in mean fitness is defined as the change in the expression (11) derived solely from changes $\{\delta g_s\}$ in the various whole-genome genotype frequencies and ignoring changes in average effect values, namely

$$\delta_P(w) = \sum_s \delta g_s \Big\{ w + \sum{}^{*} \alpha_{ai} \Big\}. \tag{12}$$

The resulting expression, as shown by least-squares theory, is $\sigma_A^2 / w$, where now $\sigma_A^2$ is the whole-genome additive genetic variance, defined by the whole-genome least-squares procedure discussed above. This simple and exact result is the whole-genome FTNS. Stated in words, the theorem claims that the partial increase in mean fitness from one generation to the next is exactly equal to the whole-genome additive variance in fitness in the parental generation divided by the mean fitness in that generation.

Again, in parallel with the one-locus case, the expression (12) for the partial change in mean fitness can be found using only gene frequency changes (of all genes at all gene loci in the entire genome) between the parent and daughter generations at their respective times of conception. As in the one-locus case, these changes in gene frequencies are independent of the mating scheme. Thus the whole-genome FTNS is true whatever the mating scheme.

It is important to emphasize the contrast between this result and that of the (incorrect) version of the FTNS, which claims in the whole-genome case, the case of practical relevance, that mean fitness increases (or at worst remains unchanged) from one generation to the next. Even under random mating, the mean fitness can decrease from one generation to the next in the whole-genome case. This occurs because of recombination: a parent can pass on a chromosome (gamete, haplotype) to a child which the parent himself does not have. In this sense, the similarity between parent and child required for the Darwinian paradigm is partly lost.
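A decrease in mean fitness caused by recombination under random mating can be illustrated with a deliberately simple two-locus model. The sketch below is not taken from the paper: the fitness scheme (fitness 1 when an individual carries equal numbers of $A$ and $B$ alleles, 0.5 otherwise), the starting gamete frequencies, and the recombination fraction 0.5 are all hypothetical choices made so that the arithmetic is easy to follow.

```python
from itertools import product

GAMETES = ["AB", "Ab", "aB", "ab"]


def genotype_fitness(g1, g2):
    """Hypothetical epistatic fitness depending only on the two-locus genotype."""
    n_A = (g1[0] == "A") + (g2[0] == "A")  # copies of allele A (0, 1 or 2)
    n_B = (g1[1] == "B") + (g2[1] == "B")  # copies of allele B (0, 1 or 2)
    return 1.0 if n_A == n_B else 0.5


def mean_fitness(x):
    """Mean fitness at conception when genotypes form by random union of gametes."""
    return sum(x[g1] * x[g2] * genotype_fitness(g1, g2)
               for g1, g2 in product(GAMETES, repeat=2))


def gametes_produced(g1, g2, r):
    """Gamete output of a g1/g2 individual with recombination fraction r."""
    out = dict.fromkeys(GAMETES, 0.0)
    rec1 = g1[0] + g2[1]  # the two recombinant gametes
    rec2 = g2[0] + g1[1]
    for g, share in ((g1, (1 - r) / 2), (g2, (1 - r) / 2),
                     (rec1, r / 2), (rec2, r / 2)):
        out[g] += share
    return out


def next_generation(x, r):
    """Gamete frequencies forming the daughter generation, after viability selection."""
    w = mean_fitness(x)
    x_new = dict.fromkeys(GAMETES, 0.0)
    for g1, g2 in product(GAMETES, repeat=2):
        weight = x[g1] * x[g2] * genotype_fitness(g1, g2) / w
        for g, share in gametes_produced(g1, g2, r).items():
            x_new[g] += weight * share
    return x_new


# Parental generation: only AB and ab gametes, so every genotype present has fitness 1.
x0 = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
x1 = next_generation(x0, r=0.5)

print("parental mean fitness:", mean_fitness(x0))   # 1.0
print("daughter gametes     :", x1)
print("daughter mean fitness:", mean_fitness(x1))   # 0.796875
```

In this sketch every genotype present in the parental generation has fitness 1, so the variance in fitness (and hence the additive genetic variance) is zero, the gene frequencies do not change, and the partial change in mean fitness is zero. The actual mean fitness nevertheless falls, because recombination in double heterozygotes produces $Ab$ and $aB$ gametes that no parent received.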
The error implicit in the incorrect version of the FTNS has unfortunately been perpetuated in many textbooks and research papers.

It is also important to emphasize the ‘whole-genome’ property of the FTNS when correctly stated. Unlike chromosomes, genes are passed on unchanged between generations, and this leads to a viable ‘gene’s-eye’ view of evolution. The partial (or ‘gene-based’) change in mean fitness cannot be negative. One can then think of the genes in the genome doing their best to increase mean fitness and to overcome possible decreases in mean fitness because of the genetic phenomenon of recombination. The ‘gene’s-eye’ view of evolution implicit in this is often mistaken for a single-locus point of view.

It should be noted in concluding this section that Fisher’s method of calculating the ‘whole-genome’ additive genetic variance, at least as it appears in his publications, is not correct. He in effect assumes that the whole-genome additive genetic variance can be found by first calculating, for each locus in the genome, the additive genetic variance for that locus, presumably using marginal fitness values in the calculation, and then summing the values so obtained over all loci in the genome. It is well known that this is an invalid procedure (for more detail, see Ewens [2004], p. 258). On the other hand, it is quite possible that Fisher was aware of the correct derivation and that his published statements were a shorthand for this.

5 A New Biological Meaning for the Fundamental Theorem of Natural Selection

Once the purely mathematical aspects of the FTNS are clarified, it is natural to ask what biological significance it has. In this direction, (Okasha [2008]) has raised the following two questions about the FTNS in its modern interpretation. First, is the partial change as described above the change in mean fitness due to natural selection?
Second, is Fisher correct in interpreting any change in the average effects from one generation to the next as an ‘environmental’ change? Finally, he addresses the question above, namely what biological significance does the FTNS in its modern interpretation have? I consider these and other questions in this section.

Changes in gene frequencies are due entirely to natural selection (recall that the FTNS is not concerned with changes due to mutation, random drift, etc.) so that in effect, the interpretations that the FTNS concerns changes in mean fitness due to changes in gene frequency and due to natural selection are the same. The former interpretation is more commonly used, so below I refer to them as the changes in mean fitness due to changes in gene frequency.

The statement that the FTNS concerns the change in mean fitness due to gene frequency changes is the point of view of (Edwards [1994]), (Frank [1997]), and (Okasha [2008]). It also appears to be stated by Fisher himself in a sentence preceding the statement of the theorem both in (Fisher [1958]) and (Fisher [1999]). However, it is important to note that Fisher uses the word ‘increase’, not ‘change’, in his statement of the FTNS. This is an important distinction to which I return below.

I have expressed doubts (raised initially in Ewens [1989]) about the value of the FTNS as referring to the ‘change’ in mean fitness due to changes in gene frequency (see also Price [1972]). These doubts derive from the fact that changes in gene frequency also imply changes in the average effects; but as shown by comparing Equations (8) and (9) above, in deriving the modern interpretation of the FTNS these changes are either ignored, or assumed not to happen because of extrinsic action, or are described as changes due to the environment. A calculus analogue showing that they cannot be simply ignored, similar to the one that I produced in (Ewens [1989]), is the following.
Suppose that one wants to calculate the derivative of the function $x e^{-3x}$ at the value $x = 1$. One does not get the correct result by differentiating only the first component of this function, namely $x$, and holding the second component ($e^{-3x}$) constant (yielding the value $e^{-3}$ at $x = 1$). The component $e^{-3x}$ is just as much a function of $x$ as $x$ is itself, and standard calculus operations involving the derivative of a product show that the correct derivative is found by adding to the value above that found by holding the first term constant and differentiating the second term, eventually arriving (for $x = 1$) at the correct value $-2e^{-3}$. This is twice as large as, and of the opposite sign to, the value found from the incorrect procedure. Similarly, the quantities being summed in (8) and (11) are all products. It is surely not correct to describe, and to calculate, the change in mean fitness as being due to gene frequency changes by considering only a partial change such as (9) and (12), since changes in gene frequency change the second term in the products in (8) and (11): doing this is the direct analogue of only differentiating the first component in $x e^{-3x}$ and holding the second component constant in a failed attempt to find the derivative of this function with respect to $x$.

(Okasha [2008]) recognizes this point and addresses it by distinguishing between a so-called ‘direct’ effect of natural selection (yielding changes in gene frequencies) and downstream or ‘indirect effects’ (yielding changes in average effects). Changes in average effects are thus regarded as an indirect outcome of changes in gene frequency. In my opinion, this is the best current defense for regarding the FTNS as considering changes in mean fitness due to changes in gene frequency. However, my own view is that changes in average effects are a direct outcome of these changes, as the calculus analogy above shows.
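The calculus analogue given above can be written out explicitly. The lines below only restate that product-rule arithmetic; the labels ‘partial’ and ‘total’ are introduced here for illustration, to mirror the partial and total changes in mean fitness.

```latex
\begin{align*}
\frac{d}{dx}\left(x e^{-3x}\right)
  &= \underbrace{1 \cdot e^{-3x}}_{\text{first factor differentiated}}
   + \underbrace{x \cdot (-3) e^{-3x}}_{\text{second factor differentiated}}
   = (1 - 3x)\, e^{-3x}, \\
\text{partial value at } x = 1 &: \quad e^{-3} \approx 0.0498, \\
\text{total value at } x = 1 &: \quad e^{-3} - 3e^{-3} = -2e^{-3} \approx -0.0996 .
\end{align*}
```

The term that the incorrect procedure discards, $-3e^{-3}$, is three times the size of the term it keeps, which is why the partial value alone gives both the wrong magnitude and the wrong sign.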
The further view expressed by Okasha that average effects are part of the environment of a gene and thus can be held constant (in some unclear way) or that changes in them are part of changes in the ‘environment’ of a gene, either external or from the remainder of the genome, and thus not within the ambit of the FTNS, is to me ad hoc.

This leads to the question as to whether average effects can be held constant by extrinsic forces and whether changes in average effects can be regarded as being due to the environment. As emphasized in the numerical example in the following section, it is important to consider two phases of the life cycle: the time of conception and the time of reproduction. Parental generation gene frequencies at the time of reproduction are inherited exactly, as it were a split second later, by the gene frequencies in the daughter generation at its time of conception. However, the average effects for the daughter generation at its time of conception differ in general from parental generation average effects at its time of reproduction. It is difficult to imagine some extrinsic force acting in that split second, or that the environment can be invoked in that split second, negating these changes in average effects.

As is shown in the numerical example in the following section, the mean fitness of the daughter generation at its time of conception depends on the mating scheme of the parental generation. This is despite the fact that the gene frequencies in the daughter generation at its time of conception are independent of that mating scheme. Thus the (usually unknown) mating scheme is a key determinant in the generation-to-generation changes in the population mean fitness. It follows that changes in population mean fitness from one generation to the next are also determined by changes in average effects and therefore depend on more than just changes in gene frequencies.
These considerations lead to a new interpretation of the biological significance of the FTNS, one which does not require extrinsic assumptions and which removes other problems associated with previous interpretations. This focuses on an aspect of evolution that was of considerable interest to Fisher and emphasized in the previous paragraph, namely the mating scheme. Thus, for example, in discussing the relevance of average effects in (Fisher [1941]), he introduced a mating scheme which ensures self-fertilization of the ovules of a certain plant rather than allowing random pollination. Fisher repeatedly criticized Wright’s ‘hill-climbing’ view of evolution, a view which relies on the assumption of one specific mating scheme, namely random mating, which Fisher said does not necessarily occur in general and certainly does not occur in humans.

As shown above, daughter generation genotype frequencies, in contrast to daughter generation gene frequencies, do depend on the mating scheme in the parental generation. Thus since the numerical values of average effects depend on genotype frequencies, their changes from one generation to the next are determined in part by the mating scheme. Fisher stressed that this scheme is usually unknown. It would therefore seem to me that the biological relevance of the FTNS is encapsulated in the following statement:

Nothing is known in general about the mating scheme, and therefore nothing in general is known about the changes in mean fitness from one generation to the next, since these are determined by changes in average effects from one generation to the next, which in turn are determined in part by the mating scheme. This implies in turn that nothing can be stated in general about the long-term effects of selection as reflected in long-term gene frequency changes.
The Fundamental Theorem of Natural Selection isolates that part of the total change in mean fitness from one generation to the next about which something can be said independent of what the unknown mating scheme might be in each parental generation. This is that in generation after generation, the partial change in mean fitness, which is that component of the change in mean fitness independent of the mating scheme, is exactly equal in each generation to the additive genetic variance divided by population mean fitness in that generation, and is thus non-negative.

This interpretation could be brought into line with that which claims that the FTNS leaves aside changes in mean fitness due to the environment if the mating scheme were regarded as part of the environment. However, I regard this interpretation as a stretch.

Finally, as stated above, Fisher stated that the FTNS concerns an ‘increase’ in mean fitness, not a ‘change’ in mean fitness, brought about by ‘changes in gene ratio’. He was well aware that the total change in mean fitness can be negative under various forms of mating so that (I believe) he was trying to isolate a component of the total change that is positive (or at least non-negative), thus relating to an increase in mean fitness. I believe that this interpretation of his views is close to that given above.

The interpretation of the FTNS given above is quite different from the interpretation which claims that the partial increase in mean fitness is that due to changes in gene frequency, which are created intra-generationally. These gene frequency changes have nothing to do with the mating scheme, which leads to inter-generational changes. This new interpretation focuses not on what we can say, but on what we cannot say, about evolution. The relevance of intra-generational and inter-generational changes, and examples of the various points made above, are illustrated in the numerical example in the following section.
6 A Numerical Example

The points raised above can be illustrated by a numerical example in the comparatively simple case where fitness depends on the genotypes at one locus only, with two possible alleles at that locus, so I restrict attention to that case. It is convenient to use the second (‘β’) definition of average effects here to compare the analysis with that in (Okasha [2008]).

Suppose then that fitness depends on the genotypes at a locus admitting two alleles, A1 and A2, that the fitnesses of the genotypes A1A1, A1A2, and A2A2 are 1.2, 1.0, and 0.9, and that at the time of conception of some parental generation the frequencies of these three genotypes are, respectively, 0.2, 0.4, and 0.4. These are not in Hardy–Weinberg form so that the previous generation did not mate at random. These values imply a frequency of A1 of 0.4 and a population mean fitness of 1.00. The total variance $\sigma^2$ in fitness is 0.012, and the average effects of A1 and A2 as defined by (5) are 0.585714 and 0.442857, respectively. (All calculations here and below are accurate to six decimal places.) It is easy to confirm from these average effect values that the left-hand side in (6a) gives, as required, the correct mean fitness value of 1.00.

Consider next this generation of individuals at the age of reproduction. At that time the frequencies of A1A1, A1A2, and A2A2 are, respectively, 0.24, 0.4, and 0.36 so that the frequency of A1 is 0.44 (an increase of 0.04 over that at the time of conception). The population mean fitness is 1.012, an increase of 0.012. Of the total increase 0.012 in mean fitness, the partial increase is 0.011429, as calculated from (7). Because the mean fitness at the time of conception is 1.00, this partial increase is also the additive genetic variance $\sigma_A^2$, which is about 95.24% of the total variance 0.012 in fitness of the parental generation at its time of conception. These various values confirm the little-appreciated fact that for intra-generational total and partial changes in mean fitness, the equation $\delta_P(\bar{w})/\delta(\bar{w}) = \sigma_A^2/\sigma^2$ holds.
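The parental-generation figures above can be reproduced with a short sketch. It assumes that Equation (5) is the linear system whose matrix form is given later as (13), namely (D + P)β = r with r_i = Σ_j P_ij w_ij and P holding ordered genotype frequencies; the function and variable names are hypothetical and chosen only for this illustration.

```python
# Sketch: parental-generation quantities for the two-allele example.
# Assumption: the average effects solve (D + P) beta = r, where
# r_i = sum_j P_ij * w_ij and P_ij are ORDERED genotype frequencies.

def average_effects(P11, P12, P22, w11, w12, w22):
    """Return (beta1, beta2); P12 is half the A1A2 genotype frequency."""
    p1, p2 = P11 + P12, P12 + P22            # allele frequencies of A1, A2
    a, b, c = p1 + P11, P12, p2 + P22        # entries of the symmetric matrix D + P
    r1 = P11 * w11 + P12 * w12               # right-hand side for A1
    r2 = P12 * w12 + P22 * w22               # right-hand side for A2
    det = a * c - b * b
    return (r1 * c - b * r2) / det, (a * r2 - b * r1) / det  # Cramer's rule

w = (1.2, 1.0, 0.9)   # fitnesses of A1A1, A1A2, A2A2
x = (0.2, 0.4, 0.4)   # genotype frequencies at conception

mean_w = sum(f * v for f, v in zip(x, w))
var_w = sum(f * v * v for f, v in zip(x, w)) - mean_w ** 2
p1 = x[0] + x[1] / 2
beta1, beta2 = average_effects(x[0], x[1] / 2, x[2], *w)

# Viability selection between conception and reproduction.
y = tuple(f * v / mean_w for f, v in zip(x, w))
p1_repro = y[0] + y[1] / 2
mean_w_repro = sum(f * v for f, v in zip(y, w))

dp1 = p1_repro - p1                           # change in A1; A2 changes by -dp1
partial = 2 * (beta1 * dp1 + beta2 * (-dp1))  # partial change in mean fitness
total = mean_w_repro - mean_w                 # total intra-generational change
var_additive = mean_w * partial               # FTNS: partial = var_additive / mean_w

print("average effects:", round(beta1, 6), round(beta2, 6))
print("partial, total:", round(partial, 6), round(total, 6))
print("ratio partial/total:", round(partial / total, 6))
print("ratio var_additive/var_w:", round(var_additive / var_w, 6))
```

The two printed ratios agree, which is the intra-generational identity stated in the text.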
Because $\sigma_A^2 \le \sigma^2$ necessarily, the partial increase in mean fitness cannot exceed the total increase at the intra-generational level.

The average effects of A1 and A2 at the age of reproduction, found from (5), are 0.587705 and 0.441803. The left-hand side in (6a), calculated using these values and the (new) frequencies 0.44 and 0.56 of A1 and A2, is 1.012, in agreement with the value given above for the parental generation mean fitness at this time. The differences between these new average effect values and those obtaining at the time of conception of this generation are due entirely to selection, and they account for the difference of 0.000571 between the total and the partial changes in mean fitness. To ignore these changes in average effects, or to state that they do not occur, is to brush aside an important aspect of intra-generational evolution. Indeed, as the next three paragraphs show, inter-generational gene frequency changes depend not only on the mating scheme but also on the changes in average effects. To allow gene frequencies to change according to standard formulae, but not to let average effects change by standard formulae, seems arbitrary.

Consider next two possibilities for the daughter generation. The first is that the parental generation reproduced by random mating. The daughter generation genotype frequencies at the time of conception would then be 0.1936, 0.4928, and 0.3136, leading to a mean fitness of 1.00736. Although this represents an increase of 0.00736 over that of the parental generation at its time of conception, it is less than two-thirds of the partial increase in mean fitness (of 0.011429) between the two generations. Thus, the partial increase in mean fitness exceeds the total increase, an event that cannot happen for intra-generational increases, as noted above.
This arises because the average effects of A1 and A2 are now 0.58432 and 0.44032, both smaller than those for the parent generation at its time of conception. To ignore these changes in the average effects, arising in part from the mating scheme, is again to ignore an interesting aspect of evolution. Further, these average effects differ from those in the parental generation at its time of reproduction, and it is difficult to imagine any extrinsic force keeping these constant in the split second between the parental generation at its time of reproduction and the daughter generation at its time of conception or to regard these changes in that split second as being due to the environment.

A second mating scheme for the parental generation is that of selfing. The daughter generation genotype frequencies at the time of conception under this form of mating are 0.34, 0.2, and 0.46 and the population mean fitness is 1.022. This represents an increase of 0.022 over that of the parental generation at its time of conception. The partial increase of 0.011429, identical to the value obtaining under random mating, is about half of this. The average effects of A1 and A2 are now 0.594145 and 0.445672, both larger than the corresponding parental generation values, and the increases in these values account for the other half of the total increase in mean fitness. Comparison with the random-mating case illustrates the importance of the mating scheme on the values of the average effects. Again, to ignore the changes in average effects is to ignore an important aspect of evolution.

The importance of the mating scheme and of changes in average effects is illustrated by a further calculation. In the case of random mating, the frequency of A1 at the time of reproduction of the daughter generation is 0.475222. In the case of selfing, this frequency is 0.506986.
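The daughter-generation genotype frequencies, mean fitnesses, and average effects quoted above for the two mating schemes can be checked with a sketch along the following lines. It starts from the parental genotype frequencies at the age of reproduction (0.24, 0.4, 0.36) and again assumes that the average effects solve (D + P)β = r with ordered genotype frequencies; all names are hypothetical, and the selfing rule used is the standard one in which heterozygotes produce offspring in the proportions 1/4 : 1/2 : 1/4.

```python
# Sketch: daughter generation at conception under random mating and selfing.
# Assumption: average effects solve (D + P) beta = r, r_i = sum_j P_ij * w_ij.

def average_effects(P11, P12, P22, w11, w12, w22):
    """Return (beta1, beta2); P12 is half the A1A2 genotype frequency."""
    p1, p2 = P11 + P12, P12 + P22
    a, b, c = p1 + P11, P12, p2 + P22        # entries of D + P
    r1 = P11 * w11 + P12 * w12
    r2 = P12 * w12 + P22 * w22
    det = a * c - b * b
    return (r1 * c - b * r2) / det, (a * r2 - b * r1) / det

def mean_fitness(genotypes, fitnesses):
    """Mean fitness for genotype frequencies (A1A1, A1A2, A2A2)."""
    return sum(f * v for f, v in zip(genotypes, fitnesses))

w = (1.2, 1.0, 0.9)              # fitnesses of A1A1, A1A2, A2A2
parents = (0.24, 0.4, 0.36)      # parental genotype frequencies at reproduction
p1 = parents[0] + parents[1] / 2 # frequency of A1, passed on unchanged

# Random mating: Hardy-Weinberg proportions at the inherited allele frequency.
random_mating = (p1 ** 2, 2 * p1 * (1 - p1), (1 - p1) ** 2)

# Selfing: homozygotes breed true; heterozygotes give 1/4 : 1/2 : 1/4.
selfing = (parents[0] + parents[1] / 4,
           parents[1] / 2,
           parents[2] + parents[1] / 4)

results = {}
for name, g in (("random mating", random_mating), ("selfing", selfing)):
    betas = average_effects(g[0], g[1] / 2, g[2], *w)
    results[name] = (mean_fitness(g, w), betas)
    print(name, [round(v, 4) for v in g],
          round(results[name][0], 6), [round(b, 6) for b in betas])
```

Both schemes start from the same allele frequency of A1, so any difference between the two result rows is attributable to the mating scheme alone.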
The difference between the random-mating and selfing frequencies of A1 shows the effect of the mating scheme, and of the average effect values, on changes in gene frequencies between parental and daughter generations at their respective times of reproduction and, thus, on the future evolution of the population.

From the inter-generational standpoint, the one of main interest in population genetics theory, the effects of the mating scheme on the total change in mean fitness brought about by changes in average effects are clear. To ignore these changes, or to claim that they do not arise, is thus to ignore an important feature of evolution. If the mating scheme is unknown, changes in population mean fitness cannot be calculated. The question to ask, concerning the statement of the FTNS, is whether the fact that the partial increase in mean fitness can be calculated at each generation irrespective of the mating scheme, and is non-negative, is a useful one. Whatever answer one gives to this question, the theme of this paper, that the point of the FTNS is to separate what one can say about evolution (that irrespective of the mating scheme the partial change in mean fitness is non-negative and is equal to the additive genetic variance in fitness divided by the mean fitness) and what one cannot say (what the change in mean fitness is if the mating scheme is unknown), seems to me to be a highly plausible one.

7 The Malthusian Parameter

Another matter to consider, when one considers both intra-generational and inter-generational changes, is the relation between the Malthusian parameter and the absolute value of the population mean fitness. As the algebraic analysis and the numerical example above show, changes in mean fitness depend only on the relative fitness values of the various genotypes, as shown by Equation (3) and its whole-genome analogue. On the other hand the mean fitness depends on the absolute fitness values.
For the theme of this paper and the main statement made by the FTNS, the frequently conducted argument over absolute versus relative fitness values is irrelevant.

So far as the Malthusian parameter and its relation to the mean fitness is concerned, under viability selection, the intra-generational population mean fitness at the time of reproduction always exceeds that at the time of conception. In contrast, the population size at the time of reproduction must be less, if viability selection operates, than that at the time of conception. It therefore seems contradictory to equate increase in mean fitness in any way with increase in population size as measured by the Malthusian parameter. At the intra-generational level, one decreases as the other increases. I thus disconnect this parameter and the mean fitness and claim that the FTNS says nothing about changes in population size.

Indeed the relation between population size and mean fitness is a much-debated one. It is frequently argued that in practice the sizes of many populations are maintained at essentially fixed values from one generation to the next by external forces such as food supply. Under this view, one can interpret the FTNS as describing the attempt made by any population, via the changes in gene frequency made by natural selection within each generation, to increase its ability to maintain a stable size.

8 Decreases in Mean Fitness

It is perhaps useful to give here a simple example where mean fitness can decrease from one generation to the next, even though the partial change in mean fitness is positive. Suppose that fitness depends on the genotypes at a locus admitting two alleles, A1 and A2, that the fitnesses of the genotypes A1A1, A1A2, and A2A2 are 1.1, 1.2, and 1.0, and that at the time of conception of some parental generation, the frequencies of these three genotypes are, respectively, 0.16, 0.48, and 0.36, in Hardy–Weinberg form. The population mean fitness is 1.112.
At the age of reproduction, the genotype frequencies are, respectively, 22/139, 72/139, and 45/139, the frequency of A1 is 58/139, and that of A2 is 81/139. Suppose now that the population mates over successive generations by selfing. In the next generation, the genotype frequencies at the time of conception would be, respectively, 40/139, 36/139, and 63/139 and the population mean fitness would be 150.2/139 = 1.080576…, a decrease of about 0.0314 compared to that of the parental generation. (Despite this, the partial increase in mean fitness is positive, being about 0.0093.) Over successive generations the mean fitness continues to decrease, asymptotically approaching the value 1. This is an example of which Fisher would have been well aware, so it is inconceivable that he would take the FTNS to imply that population mean fitness as defined mathematically above steadily increases whatever the mating scheme.

9 Optimality Considerations

Whatever the correct interpretation of the FTNS might be, one can ask whether it goes far enough. Is it sufficiently interesting to say that some part of the change in mean fitness equals the additive genetic variance divided by the mean fitness? In this section, I offer a further development connected with the FTNS, first expounded in (Ewens [1992]), associated with the concept of optimality. In doing so, I shall pass over much of the algebra (for which see Ewens [2004], pp. 261–5, given there using the ‘α’ definition of average effects). I give details only in the one-locus case and do so by using matrix algebra, which considerably facilitates the analysis. I then outline the ‘whole-genome’ analysis which requires more complicated matrix algebra.

I consider throughout a parental and a daughter generation at their respective times of conception. For convenience, I form the set of inter-generational changes δpj in the frequencies of the various alleles into a column vector δ.
I write the parent generation average effects β1, β2, …, βk conformally as a column vector β and similarly write the parent generation gene frequencies as a column vector p. Define D as the diagonal matrix with j-th diagonal element pj, the parent generation frequency of Aj, and P as the matrix whose (i, j) element is Pij, the parent generation frequency of the ordered genotype AiAj. From the calculation (3) for daughter generation gene frequencies, Equation (5) may be written in matrix terms as

$$(D + P)\beta = \bar{w}(p + \delta). \tag{13}$$

From this, immediately,

$$\delta = (D + P)\beta\,\bar{w}^{-1} - p. \tag{14}$$

Now consider any arbitrary vector d of gene frequency changes whose j-th element is dj. (This vector is of course subject to the requirement that all gene frequencies must be non-negative and that all gene frequencies must add to 1.) It was shown in (Ewens [1989], [2004]) that a biologically natural (squared) distance measure between parental and daughter generation frequencies is $d'(D + P)^{-1}d$. It follows from (14) and some algebra that in the case of the natural selection changes δ in gene frequency, this distance measure takes the value $\sigma_A^2/(2\bar{w}^2)$. The discussion following (6a) shows that the partial change in mean fitness following these arbitrary changes is $2\sum_j \beta_j d_j$.

I now ask: subject to the condition that the changes d in gene frequency between parental and daughter generations lead to the same distance $\sigma_A^2/(2\bar{w}^2)$ as that achieved by natural selection, which changes maximize the partial increase in mean fitness? The answer is the natural selection changes given in the vector δ. The connection with the FTNS through the concept of the partial increase in mean fitness should be noted in that the maximizing gene frequency changes concern the partial increase in mean fitness, a central FTNS concept. Further, this optimality principle is focused on changes in gene frequencies. Even at the whole-genome level this continues to be true, as indicated below.
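One way to see why the natural selection changes are the maximizing ones is a Lagrange-multiplier sketch. What follows is an outline under stated assumptions and is not claimed to be the derivation given in (Ewens [1992]) or (Ewens [2004]): it assumes that D + P is symmetric and invertible, that the non-negativity constraints on the frequencies are not binding, and it uses the facts that the row sums of P are the gene frequencies, so that $(D + P)\mathbf{1} = 2p$, and that summing the components of (13) gives $2p'\beta = \bar{w}$. The multipliers $\mu$ and $\nu$, the vector of ones $\mathbf{1}$, and the constant $c$ are notation introduced here for illustration.

```latex
\begin{align*}
&\text{Maximize } 2\beta' d \quad \text{subject to} \quad
  d'(D + P)^{-1} d = c, \quad \mathbf{1}' d = 0, \qquad
  c = \sigma_A^2 / (2\bar{w}^2). \\[4pt]
&\text{Stationarity: } \quad 2\beta = 2\mu\,(D + P)^{-1} d + \nu\,\mathbf{1}
  \;\Longrightarrow\;
  d = \mu^{-1} (D + P)\left(\beta - \tfrac{\nu}{2}\mathbf{1}\right). \\[4pt]
&\mathbf{1}' d = 0: \quad
  \mathbf{1}'(D + P)\beta - \tfrac{\nu}{2}\,\mathbf{1}'(D + P)\mathbf{1}
  = 2p'\beta - \nu = \bar{w} - \nu = 0
  \;\Longrightarrow\; \nu = \bar{w}. \\[4pt]
&\text{From (14): } \quad
  \delta = \bar{w}^{-1}(D + P)\beta - \tfrac{1}{2}(D + P)\mathbf{1}
         = \bar{w}^{-1}(D + P)\left(\beta - \tfrac{\bar{w}}{2}\mathbf{1}\right)
  \;\Longrightarrow\; d = \frac{\bar{w}}{\mu}\,\delta. \\[4pt]
&\text{Distance constraint: } \quad
  \left(\frac{\bar{w}}{\mu}\right)^{2} \delta'(D + P)^{-1}\delta = c
  \;\Longrightarrow\; \mu = \pm\bar{w}
  \;\Longrightarrow\; d = \pm\,\delta.
\end{align*}
```

Since $2\beta'\delta = \sigma_A^2/\bar{w} \ge 0$, the choice $d = \delta$ gives the maximum of the partial change and $d = -\delta$ the minimum, in agreement with the statement above.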
This optimality principle is, then, a gene’s-eye view of evolution.

It is interesting that a frequently encountered ‘natural’ optimality principle is not true. This incorrect principle is most easily stated in the continuous-time analogue of the analysis above, and I consider here only the simple case of a population that mates at random and for which the fitness of any individual depends on the genes at a single locus. It is well known in this case that the population mean fitness increases (or at worst remains constant) from one generation to the next. This incorrect principle is, in effect, that natural selection acts in such a way as to maximize the rate of increase of the population mean fitness. This, however, is not the trajectory taken by natural selection, even in this simple case, as shown for example by Svirezhev and Passekov ([1990], pp. 105–6). Equivalently, and employing the frequently used concept that the population mean fitness represents a dimension in a Euclidean space for which the other dimensions represent gene frequencies, natural selection does not act in such a way that gene frequencies follow a path of steepest ascent up the mean fitness surface. On the other hand, if one employs a non-Euclidean geometry whose distance measure is the one defined above, then in this ‘distorted-from-Euclidean’ space the path of steepest ascent is taken. This might be thought of as another reason for using the distance measure described above and takes up a point made in Section 1 that different geometries lead to different conclusions. The analogy with the fact that in general relativity theory light takes a shortest path in a suitable non-Euclidean space is an interesting one.

In the whole-genome case the appropriate distance measure is of the form $d'(D + P + Q)^{-1}d$. Here d is a gigantically long vector of gene frequency changes for all genes in the entire genome, D is an appropriately conformal gigantic diagonal matrix, displaying along its diagonal the various gene frequencies at all the various loci in the genome, P is a conformal gigantic block diagonal matrix, with the various blocks containing single-locus genotype frequencies at the various gene loci in the genome, and Q is an equally gigantic matrix, best described in (Castilloux and Lessard [1995]), involving various two-locus frequencies. With this change, the results reached above in the one-locus case generalize to the entire genome.

10 Conclusions

This paper has two themes. The first is to offer a new biological significance for the FTNS. This is to contrast what one can say about evolution (that whatever the mating scheme, the partial change in mean fitness from one generation to the next must be positive, or at worst zero, and is closely associated with the additive genetic variance) and what one cannot say, given that one in practice does not know the mating scheme of any population (that is, whether the total change in mean fitness is positive or negative).

The second theme is that the relevance of the FTNS can be extended by showing that aspects of the evolution of a population as described by gene frequency changes, and thus connected to the FTNS, possess an optimality character. This optimality principle is perhaps counterintuitive in that it depends on a biologically relevant distance measure (defined above and whose biological relevance is discussed in Ewens [2004]) defining a certain non-Euclidean space. Genes are shown to evolve in an optimal way in this space, and from this point of view, one finds a meaningful picture of a gene’s-eye view of evolution.
Taking up the theme of (Dawkins [1996]), we can say that gene frequencies evolve in such a way that the partial change in mean fitness is non-negative (the FTNS) and also (the optimality principle) that these frequencies track along a path of steepest ascent when the topography of ‘Mount Improbable’ is suitably amended so that this mountain is represented in a biologically meaningful non-Euclidean space.

Acknowledgement

I thank Anya Plutynski for many useful comments on an earlier draft of this paper.

Department of Biology
University of Pennsylvania
Philadelphia, PA 19104-6018
USA
[email protected]

References

Castilloux, A. M. and Lessard, S. [1995]: ‘The Fundamental Theorem of Natural Selection in Ewens’ Sense (Case of Many Loci)’, Theoretical Population Biology, 48, pp. 306–15.
Crow, J. F. and Kimura, M. [1970]: An Introduction to Population Genetics Theory, New York: Harper and Row.
Dawkins, R. [1996]: Climbing Mount Improbable, New York: W.W. Norton.
Edwards, A. W. F. [1994]: ‘The Fundamental Theorem of Natural Selection’, Biological Reviews, 69, pp. 443–74.
Ewens, W. J. [1989]: ‘An Interpretation and Proof of the Fundamental Theorem of Natural Selection’, Theoretical Population Biology, 36, pp. 167–80.
Ewens, W. J. [1992]: ‘An Optimizing Principle of Natural Selection in Evolutionary Population Genetics’, Theoretical Population Biology, 42, pp. 333–46.
Ewens, W. J. [2004]: Mathematical Population Genetics, New York: Springer.
Fisher, R. A. [1918]: ‘The correlation between relatives on the supposition of Mendelian inheritance’, Proceedings of the Royal Society of Edinburgh, 52, pp. 399–433.
Fisher, R. A. [1930]: The Genetical Theory of Natural Selection, Oxford: The Clarendon Press.
Fisher, R. A. [1941]: ‘Average Excess and Average Effect of a Gene Substitution’, Annals of Eugenics, 11, pp. 53–63.
Fisher, R. A. [1958]: The Genetical Theory of Natural Selection, New York: Dover.
Fisher, R. A. [1999]: The Genetical Theory of Natural Selection, J. H. Bennett (ed), Variorum Edition, Oxford: Oxford University Press.
Frank, S. A. [1997]: ‘The Price Equation, Fisher’s Fundamental Theorem, Kin Selection, and Causal Analysis’, Evolution, 51, pp. 1712–29.
Grafen, A. [2003]: ‘Fisher the Evolutionary Biologist’, The Statistician, 52, pp. 319–29.
Okasha, S. [2008]: ‘Fisher’s Fundamental Theorem of Natural Selection: A Philosophical Analysis’, British Journal for the Philosophy of Science, 59, pp. 319–51.
Price, G. R. [1972]: ‘Fisher’s Fundamental Theorem Made Clear’, Annals of Human Genetics, 36, pp. 129–40.
Svirezhev, Y. U. and Passekov, V. P. [1990]: Fundamentals of Mathematical Evolutionary Genetics, Dordrecht: Kluwer.
