Download The Frequency Distribution of Nucleotide Variation in Drosophila

The Frequency Distribution of Nucleotide Variation in Drosophila simulans David J. Begun Section of Evolution and Ecology, University of California at Davis Patterns of codon bias in Drosophila suggest that silent mutations can be classified into two types: unpreferred (slightly deleterious) and preferred (slightly beneficial). Results of previous analyses of polymorphism and divergence in Drosophila simulans were interpreted as supporting a mutation-selection-drift model in which slightly deleterious, silent mutants make significantly greater contributions to polymorphism than to divergence. Frequencies of unpreferred polymorphisms were inferred to be lower than frequencies of other silent polymorphisms. Here, I analyzed additional D. simulans data to reevaluate the support for these ideas. I found that D. simulans has fixed more unpreferred than preferred mutations, suggesting that this lineage has not been at mutation-selection-drift equilibrium at silent sites. Frequencies of polarized unpreferred polymorphisms are not skewed toward rare alleles. However, frequencies of unpolarized unpreferred codons are lower in high-bias genes than in low-bias genes. This supports the idea that unpreferred codons are borderline deleterious mutations. Purifying selection on silent sites appears to be stronger at twofold-degenerate codons than at fourfold-degenerate codons. Finally, I found that Xlinked polymorphisms occur at a higher average frequency than polymorphisms on chromosome arm 3R, even though an average X-linked site is significantly less likely to be polymorphic than an average site on 3R. This result supports a previous analysis of D. simulans indicating different population genetics of X-linked versus autosomal mutations. Introduction An early result of theoretical population genetics was the expected frequency distribution of mutations under a neutral, equilibrium model of evolution (e.g., Wright 1938; Kimura 1983). Unfortunately, violations of any one (or several) of the assumptions of the neutral, equilibrium model could cause the frequency spectrum of natural variation to deviate from theoretical predictions, making discrepancies between observed and expected distributions difficult to interpret. For example, purifying selection maintains deleterious alleles at average frequencies that are lower than the average frequencies of neutral alleles in equilibrium populations. However, frequencies of neutral alleles might be reduced compared with equilibrium expectations if a population is rapidly expanding (Maruyama and Fuerst 1984). Positive selection is expected to cause frequencies of beneficial alleles to be higher than expected for neutral alleles in equilibrium populations. However, population bottlenecks can also cause higher-than-expected frequencies for neutral alleles. Such problems with regard to hypothesis testing can be addressed if one has some way of categorizing mutations a priori. For example, if replacement polymorphisms were significantly more rare than silent polymorphisms from the same population sample, then one might hypothesize that replacement polymorphisms are under stronger purifying selection than silent polymorphisms. Another example comes from the analysis of codon bias. Preferred codons are, by definition, significantly more abundant in genes exhibiting a high degree of codon bias than in genes exhibiting less codon bias (e.g., Sharp and Lloyd 1993). Key words: Drosophila, DNA variation, population genetics, molecular evolution, natural selection. Address for correspondence and reprints: David Begun, Section of Evolution and Ecology, University of California, Davis, California 95616. E-mail: [email protected]. Mol. Biol. Evol. 18(7):1343–1352. 2001 q 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 They are hypothesized to have a slightly higher average fitness than unpreferred codons. Recent analyses of nucleotide polymorphism and divergence at eight genes from Drosophila simulans and its close relatives have led to three hypotheses regarding the frequency distribution of nucleotide polymorphisms (Akashi 1996, 1999; Akashi and Schaeffer 1997). The first hypothesis is that roughly equal numbers of preferred and unpreferred codons (mutations) have fixed along the D. simulans lineage. The observation is consistent with the notion that codon bias is not evolving in D. simulans (i.e., that D. simulans is at equilibrium for codon bias). The second hypothesis (Akashi and Schaeffer 1997) is that unpreferred polymorphisms segregate at significantly lower frequencies than preferred polymorphisms in D. simulans. The third hypothesis is that replacement polymorphisms are skewed toward rare alleles in D. simulans (Akashi 1996, 1999). According to this worldview, many unpreferred polymorphisms and replacement polymorphisms in D. simulans belong to a special category of ‘‘borderline’’ alleles. These alleles have selection coefficients such that Ns, the product of the effective population size and the selection coefficient, is close to 1. Selection on such alleles is sufficiently weak that they can reach appreciable frequencies, yet sufficiently strong that they are unlikely to reach high frequencies or fix (Kimura 1983; Ohta 1992). A weakness of the D. simulans data, as acknowledged by Akashi (1996, 1999), is that support for a significant skew toward rare amino acid polymorphisms is based on data from only a few genes. Only three of the eight genes analyzed by Akashi (1996) harbored amino acid polymorphism. Of the nine singleton amino acid polymorphisms, five were from the period locus. We would be unwise to draw general conclusions about the frequency distribution of amino acid polymorphisms from so few data. Given that greater numbers of silent polymorphisms were observed in D. simulans, conclusions on their frequency distribution would seem to be more sound. Nevertheless, period data account for about 1343 1344 Begun 30% of the derived singleton unpreferred polymorphisms in the D. simulans data analyzed by Akashi and Schaeffer (1997). If period data are excluded, there is no significant skew toward rare, unpreferred alleles in D. simulans (one-tailed Mann-Whitney U test; P 5 0.11). This dependence of the statistical results on period data could be indicative of locus effects or could be attributable to reduced power associated with removal of a large amount of data from the analysis. The conclusion of roughly equal numbers of preferred and unpreferred fixations is based on observation of only 27 mutations (Akashi 1999). The observation of 14 unpreferred and 13 preferred fixations (Akashi 1999) is compatible with an equilibrium model (i.e., 50% of the fixations preferred and 50% unpreferred). However, this observation is also compatible with an underlying model with highly asymmetric fixation rates of the two mutant types. For example, the observation of 14 unpreferred and 13 preferred fixations in D. simulans is compatible with an underlying model of 65% unpreferred and 35% preferred fixations (two-tailed binomial probability; P 5 0.22). That is, although the equilibrium model was not rejected with the available D. simulans data, this should not be construed as strong support for the model. In general, the previously available data from D. simulans were insufficient to draw strong conclusions on the frequency distribution. Here, I reexamine the frequency distributions of different types of mutations from a larger sample of D. simulans codons (Begun and Whitley 2000b) and use the results to make inferences on the causes of variation in D. simulans populations. Materials and Methods Drosophila simulans alleles analyzed here are those reported in Begun and Whitley (2000b). Most of these sequences are from a set of highly inbred lines made from single D. simulans females captured in the Wolfskill Orchard in Winters, Calif., during the summer of 1995. Most remaining loci include alleles sampled from several locations. Forty genes, 19 on chromosome arm 3R and 21 on the X chromosome, are included in these analyses (appendix A). The number of chromosomes sampled per locus varies from 5 to 11, although the majority of genes were sampled for six to eight alleles (Begun and Whitley 2000b). The average number of alleles sampled per locus was 7.14 for the X chromosome and 7.06 for 3R. Drosophila yakuba sequences were available for 23 genes (Rh3, ry, Rel, hyd, T-cpl, Ap50, Osbp, boss, Tpi, mir, Cp190, Hsc70, eld, G6pd, g, sog, v, sn, dec-1, X, per, z, and Pgd) of the 40 surveyed for polymorphism in D. simulans. Sequences were analyzed primarily with the DnaSP program (Rozas and Rozas 1999). The numbers of polymorphisms can vary slightly from one analysis to another (e.g., between this report and Begun and Whitley [2000b]), because some codons with more than one mutation can be used in certain analyses, but not in others. The criteria of Sharp and Lloyd (1993) were used to assign codons to putative fitness classes, preferred and unpreferred. Following Akashi (1995, 1996), I analyzed the frequency distribution of polarized mutations. Polarized polymorphisms are those for which parsimony can be used to infer which of two alleles at a polymorphic codon is ancestral. Drosophila melanogaster and D. yakuba served as outgroups for the D. simulans data. I used a haphazardly selected allele from each of the two outgroup species for all inferences of the ancestral state in D. simulans. When each of the outgroup codons was identical to one of the segregating D. simulans codons, the outgroup codon was inferred to be the (monomorphic) codon in the hypothetical ancestral D. simulans population. Fixations along the D. simulans lineage were inferred when all D. simulans alleles had a particular base at a given site and both outgroups shared the same base, which was different from the base present in D. simulans. Changes from preferred alleles to unpreferred alleles are referred to as unpreferred mutations, while changes from unpreferred to preferred alleles are referred to as preferred (i.e., higher fitness) mutations. Codons harboring more than one mutation in the sample of three species were excluded from all analyses. Many of Akashi’s analyses focused on silent mutations assigned to either of two fitness categories. Here, I also analyzed ‘‘no-change’’ mutations, defined as unpreferred-to-unpreferred changes, or preferred-to-preferred changes. These mutations are hypothesized to have lesser fitness effects than mutations between categories. Replacement mutations were polarized in the same way as silent mutations for the purposes of estimating frequencies, although they were not assigned to presumptive fitness categories. Polarized polymorphisms can have frequencies between 1/n and (n 2 1)/n, where n is the number of sampled alleles. I also analyzed unpolarized mutations. This approach has at least two advantages. First, there is no inference regarding the ancestral state, and thus no potential uncertainty or bias introduced into the analysis. Second, many more codons are available for analysis. For the purposes of this paper, most unpolarized analyses are on the frequencies of unpreferred codons. Unpolarized unpreferred polymorphisms can have frequencies ranging from 1/n to (n 2 1)/n. Codons for which there were more than two alleles were excluded from the analysis. For silent versus replacement polymorphism frequencies (no parsing of silent mutations into fitness classes), the frequency of a codon was taken as the frequency of the less common allele. For some analyses, I assessed the effect of codon bias on frequency of mutations by dividing the D. simulans genes into ‘‘higher-bias’’ and ‘‘lower-bias’’ categories. Higher bias genes were defined as having an effective number of codons (ENC; Wright 1990) below the median ENC (43.8) of simulans genes in the data (appendix A); lower bias genes had ENCs above the median. The v and nos loci had the median ENC for the data; v was haphazardly assigned to the lower-bias category, while nos was assigned to the higher-bias category (none of the results are sensitive to this assignment). A more powerful approach for assessing the effect of bias on frequencies might result if we omit genes Population Genetics of Drosophila simulans 1345 Table 1 Average Frequencies of Drosophila simulans Polymorphisms Silent (n) X .......... 3R . . . . . . . . . X 1 3R . . . . . 0.278 (230) 0.254 (456) 0.262 (686) Silent 1 Replacement (n) Replacement (n) 0.245 (42) 0.271 (57) 0.260 (99) 0.273 (272) 0.256 (513) 0.262 (785) NOTE.—n 5 number of polymorphisms. The frequency at a codon is the frequency of the less common allele. of intermediate codon bias. Therefore, in some analyses, I included only the genes having ENC values near the tails of the ENC distribution for all the data. For analyses of polarized mutations, the following genes were assigned to the high-bias category: Tpi, Hsc70, G6pd, mir, per, Pgd, and sn; the low-bias category included ry, hyd, dec-1, and Cp190. The mean ENCs for these high- and low-bias categories were 33.6 and 55.1, respectively, compared to a mean ENC of 44.2 for the 40 D. simulans genes from Begun and Whitley (2000b). For analyses of unpolarized polymorphisms, the highbias genes included Yp3, Yp2, and sqh in addition to the above set. Low-bias genes for unpolarized analyses included those used in the polarized data, as well as fzo, mei-9, otu, Gld, mei-218, AATS, and ovo. Results Average Frequencies of Unpolarized Silent and Replacement Polymorphisms Table 1 shows the frequencies of silent and replacement polymorphisms in D. simulans. Overall frequencies (6SE) of silent and replacement polymorphisms are 0.262 (60.004) and 0.260 (60.012), respectively; there is no evidence that silent and replacement polymorphisms occur at different average frequencies in our sample. Mean frequencies (6SE) of X-linked and 3R polymorphisms (silent 1 replacement) are 0.273 (60.007) and 0.256 (60.005), respectively. Average Frequency of Polarized Mutations Table 2 shows the average frequencies among 350 polymorphisms from three silent categories and from the replacement category. A Kruskal-Wallis test of mutants in the four categories does not reject the null hypothesis of equal distributions (P 5 0.39). Average frequencies of unpreferred versus preferred polymorphisms are not significantly different (Mann-Whitney; P 5 0.16). The FIG. 1.—Codon bias (Drosophila simulans effective number of codons) versus average frequency of polarized unpreferred polymorphisms for each region. Only regions with five or more polymorphisms were included. Not significant by Spearman’s correlation. frequencies of unpreferred versus preferred polymorphisms are not significantly different for the X-linked genes (P 5 0.16) or for the 3R genes (P 5 0.54) considered separately. When fixed mutations (frequency 5 1.0) are included in the estimation of mean frequencies, the frequency of preferred mutants (n 5 89, mean 5 0.69) is significantly higher (Mann-Whitney; P , 0.0001) than the frequency of unpreferred mutants (n 5 275, mean 5 0.49). The difference between the results on mean frequencies with versus without fixations is easily understandable from the observation that the ratio of preferred to unpreferred fixations is much higher than the ratio of preferred to unpreferred polymorphisms. Under the mutation-selection-drift model of silentsite evolution, genes under stronger selection for codon bias might be expected to show a greater skew toward rare alleles for unpreferred mutations (Akashi 1999; McVean and Charlesworth 1999). Figure 1, a scatterplot of codon bias in D. simulans genes (ENC) versus the average frequency of unpreferred polymorphisms (per gene), reveals no effect of codon bias on the average frequency of unpreferred polymorphisms. Although the mean frequency of unpreferred polymorphisms is lower in higher-bias genes (n 5 114 polymorphisms, frequency 5 0.307) than in lower-bias genes (n 5 90 polymorphisms, frequency 5 0.333), the difference is not significant (Mann-Whitney; P 5 0.16). There is no difference in the average frequencies of unpreferred polymorphisms in high-bias genes (n 5 59 polymorphisms) Table 2 Average Frequencies of Polarized Polymorphisms in Drosophila simulans X (n) No change (NC) . . . . . . . . . . . . . . . . . . . . . Preferred (P) . . . . . . . . . . . . . . . . . . . . . . . . . Unpreferred (U) . . . . . . . . . . . . . . . . . . . . . . Silent (NC 1 P 1 U) . . . . . . . . . . . . . . . . . Replacement . . . . . . . . . . . . . . . . . . . . . . . . . Total (silent 1 replacement) . . . . . . . . . . . . NOTE.—n 5 number of polymorphisms. 0.403 0.432 0.350 0.373 0.376 0.373 (16) (18) (68) (102) (14) (116) 3R (n) 0.311 0.306 0.302 0.305 0.267 0.300 (42) (25) (136) (203) (31) (234) X 1 3R (n) 0.336 0.359 0.318 0.328 0.301 0.324 (58) (43) (204) (305) (45) (350) 1346 Begun Table 3 Numbers of Polarized Mutations in Drosophila simulans X FREQUENCY 3R P U NC R P U NC R .0.0, #0.2 . . . 4 .0.2, #0.4 . . . 5 .0.4, #0.6 . . . 4 .0.6, #0.8 . . . 4 .0.8, ,1.0 . . . 1 1.0 . . . . . . . . . . 28 24 20 14 6 4 34 5 1 8 2 0 7 6 4 0 2 2 27 11 6 7 1 0 18 66 36 21 12 1 37 18 14 6 3 1 9 15 7 9 0 0 38 NOTE.—P 5 preferred; U 5 unpreferred; NC 5 no change; R 5 replacement. Table 5 Numbers of Polarized Preferred and Unpreferred Mutations in High-Bias Versus Low-Bias Genes in Drosophila simulans HIGH BIAS LOW BIAS FREQUENCY P U P U .0.0, #0.2 . . . . . .0.2, #0.4 . . . . . .0.4, #0.6 . . . . . .0.6, #0.8 . . . . . .0.8, ,1.0 . . . . . 1.0 . . . . . . . . . . . . 0 3 1 3 1 18 30 11 8 8 2 19 6 2 2 2 0 12 27 9 5 4 0 20 NOTE.—P 5 preferred; U 5 unpreferred. versus low-bias genes (n 5 25 polymorphisms) (MannWhitney; P 5 0.47). The ratios of unpreferred to preferred polymorphisms are not significantly different in the higher-bias (114:19) versus lower-bias (90:24) genes; neither are the ratios significantly different in the high-bias (59:8) versus low-bias (25:6) genes. Overall, there is little evidence that selection has heterogeneous effects on the mean frequencies of derived polymorphisms across categories of mutants. mutations in the high- versus low-bias genes is marginally significant when the fixations are included (P 5 0.05) but is not significant when only the polymorphic mutations are used (P 5 0.23). The ratios of derived singleton to nonsingleton polymorphisms for unpreferred (90:114) versus preferred (15:28) mutants are not significantly different (G-test; P 5 0.31). Frequencies of Derived X-Linked Versus Autosomal Polymorphisms Frequency Distribution of Polarized Mutations Polarized polymorphisms and fixations from different categories were assigned to one of six frequency classes: .0.0 and #0.20, .0.20 and #0.40, .0.40 and #0.60, .0.60 and #0.80, .0.80 and ,1.0, and 1.0 (table 3). The 4 3 5 contingency table of polymorphisms (X and 3R data pooled) is not significantly heterogeneous (P 5 0.96). Thus, there is no support for a skew toward rare alleles for unpreferred polymorphisms, or for any difference in frequency distribution across mutational types. Table 4 shows the distributions of preferred and unpreferred polymorphisms and fixations in higher-bias versus lower-bias genes. The 4 3 5 contingency table of preferred versus unpreferred polymorphisms in higher-bias versus lower-bias genes is not significantly heterogeneous (P 5 0.22). The 4 3 6 contingency table that includes fixations is significantly heterogeneous (P 5 0.002). Table 5 shows the frequency distribution of preferred and unpreferred mutations for genes from the extreme bias categories, high versus low. A homogeneity test of the frequencies of unpreferred Table 4 Numbers of Polarized Preferred and Unpreferred Mutations in Higher-Bias Versus Lower-Bias Genes in Drosophila simulans HIGHER BIAS LOWER BIAS FREQUENCY P U P U .0.0, #0.2 . . . . . .0.2, #0.4 . . . . . .0.4, #0.6 . . . . . .0.6, #0.8 . . . . . .0.8, ,1.0 . . . . . 1.0 . . . . . . . . . . . . 8 6 1 3 1 24 57 28 16 10 3 38 7 5 10 2 0 22 33 28 19 8 2 33 NOTE.—Higher- and lower-bias genes defined in appendix B. P 5 preferred; U 5 unpreferred. For each of four mutant classes, X-linked alleles occur at higher average frequencies than alleles on chromosome arm 3R (table 2). Considering each of the 350 polarized polymorphisms as an independent observation, the difference in average frequency between chromosomes is highly significant (Mann-Whitney; P 5 0.004). The higher frequency of X-linked polymorphisms is consistent across preferred, unpreferred, nochange, and replacement variants. If one calculates the average frequency of polarized polymorphisms for each gene, the average is significantly higher for X-linked genes (0.389, n 5 10) than for 3R genes (0.304, n 5 13) (Mann-Whitney U; P 5 0.01). Tajima’s (1989) D statistics for silent mutations are given in appendix B. The mean Tajima’s D for X-linked genes (0.363) is more positive than the mean for 3R genes (20.004), although the difference is not significant. Tests of Polymorphism and Divergence Tables 6 and 7 show the numbers of polarized polymorphisms and fixations in D. simulans. The 2 3 4 contingency tables are significantly heterogeneous with all the data (P , 0.001) and with the Relish and G6pd data excluded (P , 0.001). There is strong evidence that both G6pd and Relish have undergone adaptive protein evoTable 6 Numbers of Polymorphic and Fixed Mutations (polarized) in Drosophila simulans Polymorphic . . . . . Fixed . . . . . . . . . . . P U NC R 43 46 204 71 58 16 45 65 NOTE.—P 5 preferred; U 5 unpreferred; NC 5 no change; R 5 replacement. Population Genetics of Drosophila simulans 1347 Table 7 Numbers of Polymorphic and Fixed Mutations (polarized) in Drosophila simulans (data from Relish and G6pd loci excluded) Polymorphic . . . . . Fixed . . . . . . . . . . . P U NC R 36 36 192 65 49 10 42 18 NOTE.—P 5 preferred; U 5 unpreferred; NC 5 no change; R 5 replacement. lution in the D. simulans lineage. Therefore, significant heterogeneity of the data in Table 7 shows that large numbers of excess amino acid fixations in Relish and G6pd (Eanes et al. 1996; Begun and Whitley 2000a) do not account for the result. As one would suspect from inspection of table 7, the ratio of polymorphic to fixed mutations is not significantly heterogeneous for the unpreferred, no-change, and replacement mutations (P 5 0.23). Thus, the main cause of the significant rejection of homogeneity in this table is that the ratio of preferred fixations to polymorphisms is significantly greater than the ratio observed for the other mutant classes. A lineage/gene at equilibrium for codon bias is expected to fix equal numbers of preferred and unpreferred mutations. A binomial test of the equilibrium hypothesis that 50% of D. simulans fixations are preferred is significant for 23 genes (table 6; P 5 0.021, two-tailed), and for 21 genes (table 7; Relish and G6pd excluded, P 5 0.004, two-tailed). Sixteen genes deviate from the expectation of equal numbers of preferred and unpreferred fixations in D. simulans; of these, 11 deviate in the direction of more unpreferred than preferred fixations, while only five deviate in the opposite direction. These results support the notion that there has been a decline of codon bias along the D. simulans lineage, although the effect is not as pronounced as it is in D. melanogaster (Akashi 1996). As seen in previous analyses (Akashi 1995, 1996), the ratio of unpreferred to preferred fixations in D. melanogaster (205:27) is much greater than the expected ratio of 1:1 (two-tailed binomial probability; P , 1025). This ratio in melanogaster is much greater than the ratio in D. simulans (G-test, P , 1025), also supporting previous analyses by Akashi. Although the D. simulans silent fixations reject the equilibrium model, there is no difference in the ratios of unpreferred to preferred fixations for higher-bias (38:24) versus lower-bias genes (33:22). Silent-site divergence was estimated by counting all silent mutants that fixed along the D. simulans lineage; polymorphism data from D. simulans, as well as outgroup data from D. melanogaster and D. yakuba, were used in the analysis. Figure 2 shows that there is no correlation between codon bias (ENC) and the silentsite divergence in the D. simulans lineage. An earlier study showing a similar result did not examine the D. simulans lineage separately (Powell and Moriyama 1997). Silent divergence for 10 X-linked genes (0.029) was slightly greater than the divergence for 3R genes (0.021); the difference was marginally significant (P 5 0.04) by a Mann-Whitney test. However, there was no FIG. 2.—Codon bias (Drosophila simulans effective number of codons) versus silent divergence per site along the D. simulans lineage for each region. Only ‘‘fixed’’ sites were included in the analysis. Not significant by Spearman’s correlation. difference in the ratio of unpreferred to preferred fixations for X-linked (34:28) versus 3R (37:18) genes. The data from tables 4 and 5 can be used to ask whether variation in levels of overall codon bias affect the ratio of polymorphic to fixed unpreferred mutations. The 2 3 2 contingency table of polymorphic versus fixed unpreferred mutants is not significantly heterogeneous for the higher-bias versus lower-bias genes. However, the corresponding table for the more extreme highbias versus low-bias comparison was significantly heterogeneous (P 5 0.03). Frequency of Unpolarized Unpreferred Codons There were 422 codons that were polymorphic for an unpreferred codon and a preferred codon (codons with allele frequencies of 0.5 were excluded). Of these, the rarer allele was unpreferred at 285 codons. Unpolarized data can be used directly in tests to determine if unpreferred codons are maintained at low frequency by natural selection. Under the mutation-selection-drift model, the degree of codon bias reflects the intensity of purifying selection at silent sites. The frequency of the unpreferred codon was calculated for each of 452 codons (n 5 40 genes) in which one allele was unpreferred and one was preferred. Figure 3 shows the relationship between ENC and the frequency of unpreferred alleles per gene; the two variable are significantly correlated (Spearman correlation; P 5 0.005). Furthermore, the average frequency of unpreferred codons is marginally significantly lower (Mann-Whitney; P 5 0.04) in the higher-bias genes (mean 5 0.213) than in the lower-bias genes (mean 5 0.252). The same is true for the 10 most biased (mean frequency of unpreferred alleles per gene 5 0.219) versus the 10 least biased (mean frequency of unpreferred alleles per gene 5 0.286) genes among the 40 D. simulans genes (Mann-Whitney; P 5 0.03). These analyses support the notion that frequencies of unpreferred codons are depressed by purifying selection. Further support for this notion comes from categorization of unpreferred polymorphisms into two categories, singletons versus nonsingletons. There are 85 singletons and 111 nonsingletons for higher-bias genes; there are 61 singletons and 195 nonsingletons for lower-bias genes. The proportion of unpreferred polymorphisms 1348 Begun Table 8 Numbers of Polarized Unpreferred Polymorphisms in Twofold and Fourfold codons in Drosophila simulans Frequency .0.0, .0.2, .0.4, .0.6, .0.8, FIG. 3.—Codon bias (Drosophila simulans effective number of codons) versus average frequency of unpolarized unpreferred polymorphisms for each region. Only regions with five or more polymorphisms were included. Spearman’s correlation; P 5 0.005. that are singletons is significantly greater in higher-bias than in lower-bias genes (G-test; P , 0.0001), as one would expect if purifying selection depresses frequencies of unpreferred codons more effectively in higherbias genes. Twofold-Degenerate Versus Fourfold-Degenerate Codons The previous analyses of silent mutations did not distinguish between those at twofold-degenerate versus fourfold-degenerate codons. The frequency of derived silent mutations (preferred 1 unpreferred 1 no change) is lower for the twofold (n 5 90 polymorphisms, mean 5 0.284) than for the fourfold (n 5 195, mean 5 0.343) codons; however, the difference is only marginally significant (Mann Whitney; P 5 0.04). As one might expect given this result and given that most D. simulans polymorphisms are unpreferred, the mean frequency of derived, unpreferred polymorphisms at twofold codons (n 5 70, mean 5 0.269) is lower than the corresponding frequency at fourfold codons (n 5 115, mean 5 0.339); the difference is marginally significant by a Mann-Whitney test (P 5 0.04). In spite of the marginally significant difference in means, the frequency distribution of unpreferred polymorphisms in twofold versus fourfold codons (table 8) is not significantly different (P 5 0.45). Table 9 shows the frequencies of unpolarized unpreferred codons in twofold-degenerate codons from genes of higher versus lower degrees of codons bias. This 2 3 5 contingency table is significantly heterogeneous (P 5 0.013). Frequencies of unpolarized unpreferred codons in higher-bias genes are skewed toward rare alleles compared with the distribution for lower-bias genes, as predicted if these codons are maintained at low frequency by purifying selection. Potential Biases Conclusions from polarized data would be weakened if the subset of genes for which we have D. yakuba #0.2 #0.4 #0.6 #0.8 ,1.0 .......... .......... .......... .......... .......... Twofold Fourfold 35 21 10 3 1 48 29 24 11 3 data differed in some important way from a random sample of genes (e.g., those analyzed in Begun and Whitley [2000b]). In fact, the X-linked genes included in the analyses here do differ from the set of X-linked genes from Begun and Whitley (2000b), in that they are not significantly less polymorphic at silent sites than 3R genes (P 5 0.86). The ratio of silent u/silent divergence is not significantly different for X versus 3R genes included in this study (P 5 0.09). Genes evolving more quickly are less likely to be included in the study of polarized mutations, because such genes are expected to have greater PCR failure rates in D. yakuba when PCR primers are designed from D. melanogaster sequence. This could result in a biased sample. However, under the neutral model, genes that evolve more slowly are expected to be less polymorphic. Therefore, if a bias were expected, one would imagine that the D. simulans samples for which D. yakuba data were available would tend to be less polymorphic than a random set of genes successfully amplified and sequenced from D. simulans. This is not the case. In fact, the silent-site divergences between D. melanogaster and D. simulans for genes with versus without a successfully isolated D. yakuba sequence are not significantly different (P 5 0.80). It seems likely, therefore, that the difference between the X-linked genes analyzed here and the entire set of Xlinked genes analyzed in Begun and Whitley (2000b) is a coincidence. Nevertheless, we should ask whether this coincidence affects our conclusions. For example, can the higher frequency of X-linked versus 3R polymorphisms be attributed to the unusually polymorphic sample of X-linked genes? Figure 4 shows a plot of silent u versus the average frequency of polarized polymorphisms for each gene (the frequency of a polymorphism does not affect its contribution to u). Genes, mutations in which can be polarized, show no correlation between the variables. Therefore, there is no reason to believe that conclusions from this paper are compromised by sampling artifacts. Another potential bias in the analysis of codon bias comes from the inference of ancestral Table 9 Numbers of Unpolarized Unpreferred Codons in HigherBias Versus Lower-Bias Genes in Drosophila simulans Frequency .0.0, .0.2, .0.4, .0.6, .0.8, #0.2 #0.4 #0.6 #0.8 ,1.0 .......... .......... .......... .......... .......... Higher Bias Lower Bias 34 15 8 5 8 28 31 15 20 6 Population Genetics of Drosophila simulans FIG. 4.—Silent u per site versus average frequency of polarized polymorphisms for each region. Not significant by Spearman’s correlation. states. A codon can be used in the analysis of polarized mutations only if there is a single mutation in the history of a sample that includes three species, D. simulans, D. melanogaster and D. yakuba. Therefore, we expect such codons to be evolving more slowly than a random sample of codons. If codons evolve more slowly because they experience stronger selection for codon bias, then we expect their frequency spectrum to be more skewed toward excess rare unpreferred polymorphisms compared with average D. simulans codons (Akashi 1996). Therefore, if anything, the analysis of polarized data biases one toward detecting a skewed frequency spectrum of unpreferred polymorphisms. However, we did not detect such a skew; from this perspective, our failure to reject homogeneity of frequencies of polarized unpreferred and preferred polymorphisms would seem to be conservative. Discussion Akashi (1996) interpreted the relatively small amount of fixation data from D. simulans as support for the idea that this lineage has been at equilibrium for codon bias. Larger amounts of data from D. melanogaster supported the idea that codon bias has been evolving (deteriorating) in this species. The results presented here suggest that codon bias is also declining in D. simulans, although, as Akashi pointed out, the relative accumulation of unpreferred fixations in D. melanogaster is dramatically greater than it is in D. simulans. Why do these species deviate from equilibrium? Why is the deviation greater in D. melanogaster? Evolution of a reduced recombination rate might result in reduced efficacy of natural selection and increased fixation rates for unpreferred mutations (e.g., Charlesworth, Morgan, and Charlesworth 1993). There are relatively few genetic data from species in the melanogaster subgroup other than D. melanogaster. However, recent genetic data from the tip of the X chromosome in D. yakuba (Takano-Shimizu 1999) suggest that for this region of the genome, the recombination rate is much greater for D. yakuba than for D. simulans and D. melanogaster. Given that D. yakuba is an outgroup for the other two species, this observation is consistent with (but does not strongly support) evolution of a reduced recombination rate along the D. simulans/D. melanogaster lineage. We have no idea whether this is a genomewide difference. 1349 The smaller genetic map of D. melanogaster relative to D. simulans (Sturtevant 1929; True, Mercer, and Laurie 1996) provides a plausible explanation for reduced effectiveness of purifying selection in melanogaster. Assessing the effect of recombination rates on molecular evolution will require molecular and genetic analysis of several members of the melanogaster subgroup. An alternative explanation for patterns of unpreferred and preferred fixations in D. simulans and D. melanogaster is that both are evolving toward some new optimal pattern of codon usage as a consequence of some biological change in the ancestor. Such changes might include alteration of tRNA abundance or other changes that affect selection on codon usage. However, this explanation seems unlikely, at least based on similar patterns of codon usage in widely divergent Drosophila species (McVean and Vieira 1999; Kreitman and Antezana 2000). Regardless of the cause, if these species are not at equilibrium, then one must use caution when attempting to estimate population genetic parameters through application of equilibrium models (Akashi 1995, 1996, 1999; McVean and Vieira 1999). The results presented here are similar to Akashi’s in that the ratio of polarized unpreferred to preferred polymorphisms is much greater than the ratio of unpreferred to preferred fixations. If one attributes this result to ‘‘too many’’ unpreferred polymorphisms, then a plausible explanation for such an excess is that unpreferred polymorphisms are borderline deleterious mutations (i.e., 1 , Ns , 3) (Akashi 1996). The contribution of such mutations to polymorphism is expected to be greater than their contribution to divergence (e.g., Ohta and Kimura 1971; Kimura 1983; Ohta 1992). Their frequencies in samples are expected to be lower than frequencies of neutral polymorphisms or borderline beneficial polymorphisms (Akashi 1999). Previous analyses of polymorphisms from D. simulans provided little support for skewed frequency distributions for mutants of various putative fitness classes. The results from polarized polymorphisms in D. simulans presented here also provide little support for heterogeneity of frequencies between unpreferred, preferred, or amino acid polymorphisms. On the other hand, analysis of unpolarized polymorphisms from higher-bias versus lower-bias genes provides the best evidence to date for a skew toward rare alleles for unpreferred polymorphisms. Excess numbers of unpreferred polymorphisms and the skew toward rare alleles for unpolarized unpreferred codons provide complementary support for the notion that borderline mutations make significant contributions to silent variation in D. simulans. The rather different results for the unpolarized versus polarized mutations is, however, a bit troubling. This is especially true given that one expects biases arising from analysis of polarized mutations to result in a greater likelihood of detecting skews toward rare alleles for unpreferred mutations. A possible explanation for the discrepancy is that analysis of unpolarized unpreferred mutations is more powerful because there are greater numbers of unpolarized mutations (452) than of polarized mutations (204). Given the results in table 4, it would not be surprising if larger samples of polar- 1350 Begun ized unpreferred polymorphisms from higher-bias versus lower-bias genes supported a skew toward rare alleles in higher-bias genes. The analyses presented here support the idea that selection at silent sites is stronger at twofold codons than at fourfold codons. A reasonable interpretation is that fourfold codons sometimes (or often) have more than two potential fitness classes. Assume the least fit allele at a fourfold codon is as deleterious as the less fit allele at a twofold codon. If this is true, then we expect the average unpreferred allele at a fourfold codon to be selected more weakly than the average unpreferred allele at a twofold codon. Kreitman and Antezana (2000) noted that the rank order of alternative codon frequency for most four-codon families was conserved between D. melanogaster and D. pseudoobscura. This suggests that there are more than two fitness classes and as many as four fitness classes for some codon families. Frequencies of polymorphisms for twofold and fourfold codons in D. simulans support this hypothesis. If true, the hypothesis predicts that many ‘‘no-change’’ mutations are very weakly deleterious (although some may also be weakly beneficial). The observation that the ratio of polymorphic to fixed no-change mutations is similar to the ratio for unpreferred mutations (tables 6 and 7) is consistent with the hypothesis that the two types of mutations have similar distributions of selection coefficients. The summary of codon use in high-bias D. melanogaster genes given in Kreitman and Antezana (2000) was used to assign a fitness ranking based on relative abundance. The number of fitness classes was equal to the size of the codon family (two-, three-, or fourfold). All D. simulans mutations previously assigned to the no-change category, along with those that had not been assigned any category based on the analysis of Sharp and Lloyd (1993), were reclassified as preferred or unpreferred based on these rankings. Among the reclassified polymorphic mutations, roughly twice as many are to ‘‘lower-fitness’’ codons (49) as to ‘‘higher-fitness’’ codons (23). Among the reclassified fixed mutations, 12 are to lower-fitness codons, while 9 are to higher-fitness codons. Although the 2 3 2 contingency table is not significantly heterogeneous, the configuration is in the same direction as for the unpreferred mutations. This, too, is consistent with the idea that categorization of silent mutations into two categories is overly simplistic. Begun and Whitley (2000b) suggested that reduced X-linked versus autosomal polymorphism in D. simulans is best explained by stronger effects of positively selected mutants on the X chromosome. The result reported here, that conditioned on a site being polymorphic in a sample, X-linked polymorphisms occur at a higher frequency than those on 3R (table 2), is another distinguishing feature of X-linked versus autosomal variation in this species. Further theoretical research is required to determine which models of linked selection may be able to account for these data (e.g., Gillespie 1997; Fay and Wu 2000). Sequencing surveys and microsatellite analyses of D. simulans are indicative of small but significant differentiation between populations and slightly higher levels of variation in African versus United States D. simulans (Irvin et al. 1998; Hamblin and Veuille 1999). Inferences on the dynamics of mutations in D. simulans populations reported here rely on comparisons of different mutant classes or comparison of mutations on different chromosomes. Because deviations from population equilibrium are expected to affect all weakly selected sites in the genome in a similar manner, such comparisons remain useful. Nevertheless, theoretical studies would be required to confirm that deviations from equilibrium have only minor effects on the behavior of the tests carried out for this paper. Cargill et al. (1999) measured polymorphism in 106 human genes, with an average sample size of 114 alleles per gene. They found that replacement polymorphisms occurred at a significantly lower average frequency than silent polymorphisms, primarily because replacement polymorphisms were overrepresented among the class of very rare alleles. They attributed this observation to stronger purifying selection against replacement polymorphisms than against silent polymorphisms. Determining whether the frequency distributions of replacement polymorphism in Drosophila populations and human populations are similar would require sampling of larger numbers of D. simulans alleles. Acknowledgments I thank the anonymous reviewers and D. Rand for useful comments. This work was supported by NIH GM55298 and by the Alfred P. Sloan Foundation. LITERATURE CITED AKASHI, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935. ———. 1995. Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila. Genetics 139:1067–1076. ———. 1996. Molecular evolution between Drosophila melanogaster and D. simulans: Reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297–1307. ———. 1999. Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221–238. AKASHI, H., and S. W. SCHAEFFER. 1997. Natural selection and the frequency distribution of ‘‘silent’’ DNA polymorphism in Drosophila. Genetics 146:295–307. BEGUN, D. J., and P. WHITLEY. 2000a. Adaptive evolution of RELISH, a Drosophila NF-kB/IkB protein. Genetics 154: 1231–1238. ———. 2000b. Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960– 5965. BULMER, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907. CARGILL, M., D. ALTSHULER, J. IRELAND et al. (17 co-authors). 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231–238. CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303. Population Genetics of Drosophila simulans EANES, W. F., M. KIRCHNER, J. YOON, C. H. BIERMANN, I. N. WANG, M. A. MCCARTNEY, and B. C. VERRELLI. 1996. Historical selection, amino acid polymorphism and lineage-specific divergence at the G6pd locus in Drosophila melanogaster and D. simulans. Genetics 144:1027–1041. FAY, J. C., and C.-I. WU. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413. GILLESPIE, J. H. 1997. Junk ain’t what junk does: neutral alleles in a selected context. Gene 205:291–299. HAMBLIN, M., and M. VEUILLE. 1999. Population structure among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture. Genetics 153:305–317. IRVIN, S. D., K. A. WETTERSTRAND, C. M. HUTTER, and C. F. AQUADRO. 1998. Genetic variation and differentiation at microsatellite loci in Drosophila simulans: evidence for founder effects in new world populations. Genetics 150: 777–790. KIMURA, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England. KREITMAN, M., and M. ANTEZANA. 2000. Population and evolutionary genetics of codon usage in Drosophila. Pp. 82– 101 in R. SINGH and C. KRIMBAS, eds. Evolutionary genetics: from molecules to morphology. Cambridge University Press, Oxford, England. LI, W.-H. 1987. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24:337–345. MCVEAN, G. A. T., and B. CHARLESWORTH. 1999. A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet. Res. 74:145–158. MCVEAN, G. A. T., and J. VIEIRA. 1999. The evolution of codon preference in Drosophila: a maximum-likelihood approach to parameter estimation and hypothesis testing. J. Mol. Evol. 49:63–75. MARUYAMA, T., and P. A. FUERST. 1984. Population bottlenecks and nonequilibrium models in population genetics. I. 1351 Allele numbers when populations evolve from zero variability. Genetics 108:745–763. OHTA, T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263–286. OHTA, T., and M. KIMURA. 1971. On the constrancy of the evolutionary rate of cistrons. J. Mol. Evol. 1:18–25. POWELL, J. R., and E. N. MORIYAMA. 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94: 7784–7790. ROZAS, J., and R. ROZAS. 1999. DnaSP 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175. SHARP, P. M., and A. T. LLOYD. 1993. Codon usage. Pp. 378– 397 in G. MARONI, ed. An atlas of Drosophila genes: sequences and molecular features. Oxford University Press, Oxford, England. STURTEVANT, A. H. 1929. Contributions to the genetics of Drosophila simulans and Drosophila melanogaster. Publ. Carnegie Inst. 399:1–62. TAJIMA, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595. TAKANO-SHIMIZU, T. 1999. Local recombination and mutation effects on molecular evolution in Drosophila. Genetics 153: 1285–1296. TRUE, J. R., J. M. MERCER, and C. C. LAURIE. 1996. Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142:507–523. WRIGHT, S. 1938. The distribution of gene frequencies under irreversible mutation. Proc. Natl. Acad. Sci. USA 24:253– 259. ———. 1990. The ‘‘effective number of codons’’ used in a gene. Gene 87:23–29. DAVID M. RAND, reviewing editor Accepted March 20, 2001 1352 Begun APPENDIX A APPENDIX B Estimates of Codon Bias in Drosophila simulans, Drosophila yakuba, and Drosophila melanogaster Tajima’s D Test Statistics for Silent Polymorphisms in Drosophila simulans ENC Gene Chromosome 3R Gld . . . . . . . . . . Rh3 . . . . . . . . . . mir . . . . . . . . . . . nos . . . . . . . . . . . eld . . . . . . . . . . . Hsc70 . . . . . . . . CP190 . . . . . . . . ry . . . . . . . . . . . . hyd . . . . . . . . . . Rel . . . . . . . . . . . pit . . . . . . . . . . . AP50 . . . . . . . . . T-cpl . . . . . . . . . fzo . . . . . . . . . . . AATS . . . . . . . . . tld . . . . . . . . . . . Osbp . . . . . . . . . boss . . . . . . . . . . Tpi . . . . . . . . . . . Chromosome I runt . . . . . . . . . . G6pd . . . . . . . . . bnb . . . . . . . . . . r ............. mei-218 . . . . . . . sog . . . . . . . . . . . g ............ Yp3 . . . . . . . . . . v ............ Yp2 . . . . . . . . . . otu . . . . . . . . . . . sn . . . . . . . . . . . . dec-1 . . . . . . . . . ct . . . . . . . . . . . . sqh . . . . . . . . . . . X ............ ovo . . . . . . . . . . mei-9 . . . . . . . . . per . . . . . . . . . . . z ............. Pgd . . . . . . . . . . D. simulans 56.0 42.8 33.9 43.8 44.7 30.1 57.5 51.4 55.4 49.4 45.8 40.8 39.1 60.9 54.2 48.5 44.3 48.9 27.5 35.3 31.0 40.4 42.3 54.9 40.4 45.9 28.0 43.8 33.3 56.6 38.8 55.9 43.2 33.5 44.1 52.1 56.6 36.8 42.1 37.4 C. yakuba 43.1 36.4 49.5 30.2 51.5 51.6 55.5 48.0 40.4 38.6 45.7 46.1 29.0 32.3 41.0 43.3 43.2 37.9 50.4 41.5 35.8 50.1 35.5 Gene D. melanogaster Length Bias 51.4 42.1 33.3 45.6 45.8 30.3 53.2 52.6 56.2 50.3 47.5 39.5 38.7 60.6 51.8 52.6 44.5 49.1 28.6 612 383 830 401 2,715 651 1,096 1,335 2,985 971 663 437 557 718 1,714 1,057 667 896 247 L H H H L H L L L L L H H L L L L L H 37.2 31.5 42.8 48.7 55.8 41.9 47.3 32.4 44.2 32.9 53.0 40.4 53.1 43.1 35.7 44.5 52.7 57.7 37.5 42.8 37.2 509 523 442 2,236 523 1,038 810 420 379 442 811 512 1,123 2,175 174 3,429 1,028 1,186 1,155 574 481 H H H H L H L H L H L H L H H L L L H H H NOTE.—Shown are effective numbers of codons (ENCs) for randomly selected alleles from each of the three species, calculated using only the codons that were sequenced in D. simulans. Genes were ranked by D. simulans ENC. Genes in the lower half of the distribution were placed in the higher-bias category, while genes in the upper half of the distribution were placed in the lowerbias category. Length 5 number of amino acid residues in complete proteins as indicated in the SwissProt database; H 5 higher bias; L 5 lower bias. Chromosome 3R Gld . . . . . . . . . . . . . . . . . . . . Rh3 . . . . . . . . . . . . . . . . . . . . mir . . . . . . . . . . . . . . . . . . . . . nos . . . . . . . . . . . . . . . . . . . . . eld . . . . . . . . . . . . . . . . . . . . . Hsc70 . . . . . . . . . . . . . . . . . . CP190 . . . . . . . . . . . . . . . . . . ry . . . . . . . . . . . . . . . . . . . . . . hyd . . . . . . . . . . . . . . . . . . . . Rel . . . . . . . . . . . . . . . . . . . . . pit . . . . . . . . . . . . . . . . . . . . . AP50 . . . . . . . . . . . . . . . . . . . T-cpl . . . . . . . . . . . . . . . . . . . fzo . . . . . . . . . . . . . . . . . . . . . AATS . . . . . . . . . . . . . . . . . . . tld . . . . . . . . . . . . . . . . . . . . . Osbp . . . . . . . . . . . . . . . . . . . boss . . . . . . . . . . . . . . . . . . . . Tpi . . . . . . . . . . . . . . . . . . . . . X chromosome runt . . . . . . . . . . . . . . . . . . . . G6pd . . . . . . . . . . . . . . . . . . . bnb . . . . . . . . . . . . . . . . . . . . r ....................... mei-218 . . . . . . . . . . . . . . . . . sog . . . . . . . . . . . . . . . . . . . . . g ...................... Yp3 . . . . . . . . . . . . . . . . . . . . v ...................... Yp2 . . . . . . . . . . . . . . . . . . . . otu . . . . . . . . . . . . . . . . . . . . . sn . . . . . . . . . . . . . . . . . . . . . . dec-1 . . . . . . . . . . . . . . . . . . . ct . . . . . . . . . . . . . . . . . . . . . . sqh . . . . . . . . . . . . . . . . . . . . . X ...................... ovo . . . . . . . . . . . . . . . . . . . . mei-9 . . . . . . . . . . . . . . . . . . . per . . . . . . . . . . . . . . . . . . . . . z ....................... Pgd . . . . . . . . . . . . . . . . . . . . Tajima’s D 0.808 20.728 20.394 20.079 0.412 20.044 1.044 20.149 0.092 20.775 0.554 0.299 20.793 0.380 0.197 20.188 0.159 20.008 20.863 1.280 1.564 21.595 0.050 1.598 20.861 21.623* — 1.670 — 0.955 0.948 0.421 — 20.561 1.385 0.874 20.572 20.699 0.612 1.093 NOTE.—D was not calculated for samples having only one or two segregating sites. * 5 P , 0.05.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download The Frequency Distribution of Nucleotide Variation in Drosophila