The Frequency Distribution of Nucleotide Variation in Drosophila simulans
David J. Begun
Section of Evolution and Ecology, University of California at Davis
Patterns of codon bias in Drosophila suggest that silent mutations can be classified into two types: unpreferred
(slightly deleterious) and preferred (slightly beneficial). Results of previous analyses of polymorphism and divergence in Drosophila simulans were interpreted as supporting a mutation-selection-drift model in which slightly
deleterious, silent mutants make significantly greater contributions to polymorphism than to divergence. Frequencies
of unpreferred polymorphisms were inferred to be lower than frequencies of other silent polymorphisms. Here, I
analyzed additional D. simulans data to reevaluate the support for these ideas. I found that D. simulans has fixed
more unpreferred than preferred mutations, suggesting that this lineage has not been at mutation-selection-drift
equilibrium at silent sites. Frequencies of polarized unpreferred polymorphisms are not skewed toward rare alleles.
However, frequencies of unpolarized unpreferred codons are lower in high-bias genes than in low-bias genes. This
supports the idea that unpreferred codons are borderline deleterious mutations. Purifying selection on silent sites
appears to be stronger at twofold-degenerate codons than at fourfold-degenerate codons. Finally, I found that X-linked polymorphisms occur at a higher average frequency than polymorphisms on chromosome arm 3R, even
though an average X-linked site is significantly less likely to be polymorphic than an average site on 3R. This
result supports a previous analysis of D. simulans indicating different population genetics of X-linked versus autosomal mutations.
Introduction
An early result of theoretical population genetics
was the expected frequency distribution of mutations
under a neutral, equilibrium model of evolution (e.g.,
Wright 1938; Kimura 1983). Unfortunately, violations
of any one (or several) of the assumptions of the neutral,
equilibrium model could cause the frequency spectrum
of natural variation to deviate from theoretical predictions, making discrepancies between observed and expected distributions difficult to interpret. For example,
purifying selection maintains deleterious alleles at average frequencies that are lower than the average frequencies of neutral alleles in equilibrium populations.
However, frequencies of neutral alleles might be reduced
compared with equilibrium expectations if a population
is rapidly expanding (Maruyama and Fuerst 1984). Positive selection is expected to cause frequencies of beneficial alleles to be higher than expected for neutral alleles in equilibrium populations. However, population
bottlenecks can also cause higher-than-expected frequencies for neutral alleles. Such problems with regard
to hypothesis testing can be addressed if one has some
way of categorizing mutations a priori. For example, if
replacement polymorphisms were significantly more
rare than silent polymorphisms from the same population sample, then one might hypothesize that replacement polymorphisms are under stronger purifying selection than silent polymorphisms. Another example comes
from the analysis of codon bias. Preferred codons are,
by definition, significantly more abundant in genes exhibiting a high degree of codon bias than in genes exhibiting less codon bias (e.g., Sharp and Lloyd 1993).
Key words: Drosophila, DNA variation, population genetics, molecular evolution, natural selection.
Address for correspondence and reprints: David Begun, Section
of Evolution and Ecology, University of California, Davis, California
95616. E-mail: [email protected].
Mol. Biol. Evol. 18(7):1343–1352. 2001
© 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
They are hypothesized to have a slightly higher average
fitness than unpreferred codons.
Recent analyses of nucleotide polymorphism and
divergence at eight genes from Drosophila simulans and
its close relatives have led to three hypotheses regarding
the frequency distribution of nucleotide polymorphisms
(Akashi 1996, 1999; Akashi and Schaeffer 1997). The
first hypothesis is that roughly equal numbers of preferred and unpreferred codons (mutations) have fixed
along the D. simulans lineage. The observation is consistent with the notion that codon bias is not evolving
in D. simulans (i.e., that D. simulans is at equilibrium
for codon bias). The second hypothesis (Akashi and
Schaeffer 1997) is that unpreferred polymorphisms segregate at significantly lower frequencies than preferred
polymorphisms in D. simulans. The third hypothesis is
that replacement polymorphisms are skewed toward rare
alleles in D. simulans (Akashi 1996, 1999). According
to this worldview, many unpreferred polymorphisms and
replacement polymorphisms in D. simulans belong to a
special category of “borderline” alleles. These alleles
have selection coefficients such that Ns, the product of
the effective population size and the selection coefficient, is close to 1. Selection on such alleles is sufficiently weak that they can reach appreciable frequencies, yet sufficiently strong that they are unlikely to
reach high frequencies or fix (Kimura 1983; Ohta 1992).
A weakness of the D. simulans data, as acknowledged by Akashi (1996, 1999), is that support for a significant skew toward rare amino acid polymorphisms is
based on data from only a few genes. Only three of the
eight genes analyzed by Akashi (1996) harbored amino
acid polymorphism. Of the nine singleton amino acid
polymorphisms, five were from the period locus. We
would be unwise to draw general conclusions about the
frequency distribution of amino acid polymorphisms
from so few data. Given that greater numbers of silent
polymorphisms were observed in D. simulans, conclusions on their frequency distribution would seem to be
more sound. Nevertheless, period data account for about
30% of the derived singleton unpreferred polymorphisms in the D. simulans data analyzed by Akashi and
Schaeffer (1997). If period data are excluded, there is
no significant skew toward rare, unpreferred alleles in
D. simulans (one-tailed Mann-Whitney U test; P =
0.11). This dependence of the statistical results on period data could be indicative of locus effects or could
be attributable to reduced power associated with removal of a large amount of data from the analysis.
The conclusion of roughly equal numbers of preferred and unpreferred fixations is based on observation
of only 27 mutations (Akashi 1999). The observation of
14 unpreferred and 13 preferred fixations (Akashi 1999)
is compatible with an equilibrium model (i.e., 50% of
the fixations preferred and 50% unpreferred). However,
this observation is also compatible with an underlying
model with highly asymmetric fixation rates of the two
mutant types. For example, the observation of 14 unpreferred and 13 preferred fixations in D. simulans is
compatible with an underlying model of 65% unpreferred and 35% preferred fixations (two-tailed binomial
probability; P = 0.22). That is, although the equilibrium
model was not rejected with the available D. simulans
data, this should not be construed as strong support for
the model.
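The point that 14 unpreferred and 13 preferred fixations cannot discriminate between a 50:50 model and a strongly asymmetric one can be checked with an exact binomial calculation. The sketch below is illustrative only: the function names are hypothetical, and the two-tailed convention (summing all outcomes no more probable than the observed one) is an assumption, since the text does not say which convention was used. The resulting value therefore need not match the reported P exactly.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_two_tailed(k, n, p):
    """Two-tailed exact binomial P value.

    Assumed convention: sum the probabilities of every outcome that is
    no more probable than the observed outcome.
    """
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1 + 1e-7)  # small tolerance so exact ties are counted
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= cutoff)
    return min(1.0, total)

# Counts quoted in the text (Akashi 1999): 14 unpreferred, 13 preferred fixations.
n_unpreferred, n_preferred = 14, 13
n_total = n_unpreferred + n_preferred

p_equilibrium = binom_two_tailed(n_unpreferred, n_total, 0.50)  # 50% unpreferred
p_asymmetric = binom_two_tailed(n_unpreferred, n_total, 0.65)   # 65% unpreferred

print(f"50:50 model: P = {p_equilibrium:.3f}")
print(f"65:35 model: P = {p_asymmetric:.3f}")
```

Neither model is rejected at the 5% level with only 27 fixations, which is the sense in which the earlier data give weak support for equilibrium.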
In general, the previously available data from D.
simulans were insufficient to draw strong conclusions
on the frequency distribution. Here, I reexamine the frequency distributions of different types of mutations from
a larger sample of D. simulans codons (Begun and Whitley 2000b) and use the results to make inferences on
the causes of variation in D. simulans populations.
Materials and Methods
Drosophila simulans alleles analyzed here are those
reported in Begun and Whitley (2000b). Most of these
sequences are from a set of highly inbred lines made
from single D. simulans females captured in the Wolfskill Orchard in Winters, Calif., during the summer of
1995. Most remaining loci include alleles sampled from
several locations. Forty genes, 19 on chromosome arm
3R and 21 on the X chromosome, are included in these
analyses (appendix A). The number of chromosomes
sampled per locus varies from 5 to 11, although the
majority of genes were sampled for six to eight alleles
(Begun and Whitley 2000b). The average number of alleles sampled per locus was 7.14 for the X chromosome
and 7.06 for 3R. Drosophila yakuba sequences were
available for 23 genes (Rh3, ry, Rel, hyd, T-cpl, Ap50,
Osbp, boss, Tpi, mir, Cp190, Hsc70, eld, G6pd, g, sog,
v, sn, dec-1, X, per, z, and Pgd) of the 40 surveyed for
polymorphism in D. simulans. Sequences were analyzed
primarily with the DnaSP program (Rozas and Rozas
1999). The numbers of polymorphisms can vary slightly
from one analysis to another (e.g., between this report
and Begun and Whitley [2000b]), because some codons
with more than one mutation can be used in certain analyses, but not in others.
The criteria of Sharp and Lloyd (1993) were used
to assign codons to putative fitness classes, preferred
and unpreferred. Following Akashi (1995, 1996), I analyzed the frequency distribution of polarized mutations.
Polarized polymorphisms are those for which parsimony
can be used to infer which of two alleles at a polymorphic codon is ancestral. Drosophila melanogaster and
D. yakuba served as outgroups for the D. simulans data.
I used a haphazardly selected allele from each of the
two outgroup species for all inferences of the ancestral
state in D. simulans. When each of the outgroup codons
was identical to one of the segregating D. simulans codons, the outgroup codon was inferred to be the (monomorphic) codon in the hypothetical ancestral D. simulans population. Fixations along the D. simulans lineage
were inferred when all D. simulans alleles had a particular base at a given site and both outgroups shared the
same base, which was different from the base present in
D. simulans. Changes from preferred alleles to unpreferred alleles are referred to as unpreferred mutations,
while changes from unpreferred to preferred alleles are
referred to as preferred (i.e., higher fitness) mutations.
Codons harboring more than one mutation in the sample
of three species were excluded from all analyses. Many
of Akashi’s analyses focused on silent mutations assigned to either of two fitness categories. Here, I also
analyzed “no-change” mutations, defined as unpreferred-to-unpreferred changes, or preferred-to-preferred
changes. These mutations are hypothesized to have lesser fitness effects than mutations between categories. Replacement mutations were polarized in the same way as
silent mutations for the purposes of estimating frequencies, although they were not assigned to presumptive
fitness categories. Polarized polymorphisms can have
frequencies between 1/n and (n − 1)/n, where n is the
number of sampled alleles.
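The parsimony rules described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used for the analyses: the function name and return format are hypothetical, and it assumes that a site is polarized only when the two outgroup states agree with each other.

```python
from collections import Counter

def polarize_site(ingroup, outgroup_a, outgroup_b):
    """Classify one codon (or site) from ingroup states and two outgroup states.

    Returns (kind, derived_state, derived_frequency), where kind is
    'polymorphism', 'fixation', or None when the site is unchanged or
    cannot be polarized under the assumed rules.
    """
    counts = Counter(ingroup)
    n = len(ingroup)
    if outgroup_a != outgroup_b:
        return (None, None, None)          # outgroups disagree: not polarized
    ancestral = outgroup_a
    if len(counts) == 1:
        state = next(iter(counts))
        if state == ancestral:
            return (None, None, None)      # monomorphic and unchanged
        return ("fixation", state, 1.0)    # all ingroup alleles differ from both outgroups
    if len(counts) == 2 and ancestral in counts:
        derived = next(s for s in counts if s != ancestral)
        return ("polymorphism", derived, counts[derived] / n)
    return (None, None, None)              # more than two states, or ancestral state absent

# Hypothetical example: seven sampled alleles, one carrying a derived codon.
print(polarize_site(["CTG"] * 6 + ["CTA"], "CTG", "CTG"))
```

With n sampled alleles, the derived frequency returned for a polymorphism always lies between 1/n and (n − 1)/n, as stated in the text.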
I also analyzed unpolarized mutations. This approach has at least two advantages. First, there is no
inference regarding the ancestral state, and thus no potential uncertainty or bias introduced into the analysis.
Second, many more codons are available for analysis.
For the purposes of this paper, most unpolarized analyses are on the frequencies of unpreferred codons. Unpolarized unpreferred polymorphisms can have frequencies ranging from 1/n to (n − 1)/n. Codons for which
there were more than two alleles were excluded from
the analysis. For silent versus replacement polymorphism frequencies (no parsing of silent mutations into
fitness classes), the frequency of a codon was taken as
the frequency of the less common allele.
For some analyses, I assessed the effect of codon
bias on frequency of mutations by dividing the D. simulans genes into “higher-bias” and “lower-bias” categories. Higher-bias genes were defined as having an
effective number of codons (ENC; Wright 1990) below
the median ENC (43.8) of D. simulans genes in the data
(appendix A); lower-bias genes had ENCs above the
median. The v and nos loci had the median ENC for the
data; v was haphazardly assigned to the lower-bias category, while nos was assigned to the higher-bias category (none of the results are sensitive to this assignment). A more powerful approach for assessing the effect of bias on frequencies might result if we omit genes
of intermediate codon bias. Therefore, in some analyses,
I included only the genes having ENC values near the
tails of the ENC distribution for all the data. For analyses of polarized mutations, the following genes were
assigned to the high-bias category: Tpi, Hsc70, G6pd,
mir, per, Pgd, and sn; the low-bias category included
ry, hyd, dec-1, and Cp190. The mean ENCs for these
high- and low-bias categories were 33.6 and 55.1, respectively, compared to a mean ENC of 44.2 for the 40
D. simulans genes from Begun and Whitley (2000b).
For analyses of unpolarized polymorphisms, the high-bias genes included Yp3, Yp2, and sqh in addition to the
above set. Low-bias genes for unpolarized analyses included those used in the polarized data, as well as fzo,
mei-9, otu, Gld, mei-218, AATS, and ovo.
Table 1
Average Frequencies of Drosophila simulans Polymorphisms
            Silent (n)     Replacement (n)    Silent + Replacement (n)
X           0.278 (230)    0.245 (42)         0.273 (272)
3R          0.254 (456)    0.271 (57)         0.256 (513)
X + 3R      0.262 (686)    0.260 (99)         0.262 (785)
NOTE.—n = number of polymorphisms. The frequency at a codon is the
frequency of the less common allele.
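The median split on ENC described in the Materials and Methods can be sketched as follows. The gene names and ENC values here are hypothetical placeholders, not the values from appendix A, and the tie-handling argument is an assumption that mimics the haphazard assignment of the two median loci to opposite categories.

```python
from statistics import median

def split_by_median_enc(enc_by_gene, ties_to_higher=()):
    """Split genes into higher-bias (lower ENC) and lower-bias (higher ENC) sets.

    Genes exactly at the median go to the higher-bias set only if they are
    listed in ties_to_higher; otherwise they go to the lower-bias set.
    """
    cutoff = median(enc_by_gene.values())
    higher, lower = [], []
    for gene, enc in enc_by_gene.items():
        if enc < cutoff or (enc == cutoff and gene in ties_to_higher):
            higher.append(gene)
        else:
            lower.append(gene)
    return cutoff, sorted(higher), sorted(lower)

# Hypothetical ENC values, for illustration only.
example_enc = {
    "geneA": 33.0, "geneB": 40.5, "geneC": 43.8,
    "geneD": 43.8, "geneE": 50.2, "geneF": 55.1,
}
cutoff, higher_bias, lower_bias = split_by_median_enc(
    example_enc, ties_to_higher={"geneC"}
)
print(cutoff, higher_bias, lower_bias)
```

Because a lower ENC means stronger codon bias, the genes below the cutoff form the higher-bias set.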
Results
Average Frequencies of Unpolarized Silent and
Replacement Polymorphisms
Table 1 shows the frequencies of silent and replacement polymorphisms in D. simulans. Overall frequencies (±SE) of silent and replacement polymorphisms are
0.262 (±0.004) and 0.260 (±0.012), respectively; there
is no evidence that silent and replacement polymorphisms occur at different average frequencies in our
sample. Mean frequencies (±SE) of X-linked and 3R
polymorphisms (silent + replacement) are 0.273
(±0.007) and 0.256 (±0.005), respectively.
Average Frequency of Polarized Mutations
Table 2 shows the average frequencies among 350
polymorphisms from three silent categories and from the
replacement category. A Kruskal-Wallis test of mutants
in the four categories does not reject the null hypothesis
of equal distributions (P = 0.39). Average frequencies
of unpreferred versus preferred polymorphisms are not
significantly different (Mann-Whitney; P = 0.16). The
frequencies of unpreferred versus preferred polymorphisms are not significantly different for the X-linked
genes (P = 0.16) or for the 3R genes (P = 0.54) considered separately.
FIG. 1.—Codon bias (Drosophila simulans effective number of
codons) versus average frequency of polarized unpreferred polymorphisms for each region. Only regions with five or more polymorphisms
were included. Not significant by Spearman’s correlation.
When fixed mutations (frequency = 1.0) are included in the estimation of mean frequencies, the frequency of preferred mutants (n = 89, mean = 0.69) is
significantly higher (Mann-Whitney; P < 0.0001) than
the frequency of unpreferred mutants (n = 275, mean
= 0.49). The difference between the results on mean
frequencies with versus without fixations is easily understandable from the observation that the ratio of preferred to unpreferred fixations is much higher than the
ratio of preferred to unpreferred polymorphisms.
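The effect of adding fixations can be reproduced from the reported summary numbers alone. The sketch below pools the polymorphism counts and mean frequencies from table 2 with the fixation counts from table 6, treating each fixation as a mutation at frequency 1.0; the function name is hypothetical.

```python
def mean_with_fixations(n_poly, mean_poly, n_fixed):
    """Mean frequency when fixations (frequency 1.0) are pooled with polymorphisms."""
    total = n_poly + n_fixed
    return (n_poly * mean_poly + n_fixed * 1.0) / total

# Preferred: 43 polymorphisms (mean 0.359) and 46 fixations.
preferred = mean_with_fixations(n_poly=43, mean_poly=0.359, n_fixed=46)
# Unpreferred: 204 polymorphisms (mean 0.318) and 71 fixations.
unpreferred = mean_with_fixations(n_poly=204, mean_poly=0.318, n_fixed=71)

print(round(preferred, 2), round(unpreferred, 2))  # 0.69 0.49
```

Fixations make up about half of the preferred mutations (46 of 89) but only about a quarter of the unpreferred ones (71 of 275), which is why the pooled means diverge even though the polymorphism means are similar.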
Under the mutation-selection-drift model of silent-site evolution, genes under stronger selection for codon
bias might be expected to show a greater skew toward
rare alleles for unpreferred mutations (Akashi 1999;
McVean and Charlesworth 1999). Figure 1, a scatterplot
of codon bias in D. simulans genes (ENC) versus the
average frequency of unpreferred polymorphisms (per
gene), reveals no effect of codon bias on the average
frequency of unpreferred polymorphisms. Although the
mean frequency of unpreferred polymorphisms is lower
in higher-bias genes (n = 114 polymorphisms, frequency = 0.307) than in lower-bias genes (n = 90 polymorphisms, frequency = 0.333), the difference is not
significant (Mann-Whitney; P = 0.16). There is no difference in the average frequencies of unpreferred polymorphisms in high-bias genes (n = 59 polymorphisms)
versus low-bias genes (n = 25 polymorphisms) (Mann-Whitney; P = 0.47). The ratios of unpreferred to preferred polymorphisms are not significantly different in
the higher-bias (114:19) versus lower-bias (90:24)
genes; neither are the ratios significantly different in the
high-bias (59:8) versus low-bias (25:6) genes. Overall,
there is little evidence that selection has heterogeneous
effects on the mean frequencies of derived polymorphisms across categories of mutants.
Table 2
Average Frequencies of Polarized Polymorphisms in Drosophila simulans
                                X (n)          3R (n)         X + 3R (n)
No change (NC)                  0.403 (16)     0.311 (42)     0.336 (58)
Preferred (P)                   0.432 (18)     0.306 (25)     0.359 (43)
Unpreferred (U)                 0.350 (68)     0.302 (136)    0.318 (204)
Silent (NC + P + U)             0.373 (102)    0.305 (203)    0.328 (305)
Replacement                     0.376 (14)     0.267 (31)     0.301 (45)
Total (silent + replacement)    0.373 (116)    0.300 (234)    0.324 (350)
NOTE.—n = number of polymorphisms.
Table 3
Numbers of Polarized Mutations in Drosophila simulans
                          X                           3R
FREQUENCY        P     U    NC     R         P     U    NC     R
>0.0, ≤0.2       4    24     5     6        11    66    18    15
>0.2, ≤0.4       5    20     1     4         6    36    14     7
>0.4, ≤0.6       4    14     8     0         7    21     6     9
>0.6, ≤0.8       4     6     2     2         1    12     3     0
>0.8, <1.0       1     4     0     2         0     1     1     0
1.0             28    34     7    27        18    37     9    38
NOTE.—P = preferred; U = unpreferred; NC = no change; R = replacement.
Table 5
Numbers of Polarized Preferred and Unpreferred Mutations in High-Bias Versus Low-Bias Genes in Drosophila simulans
                 HIGH BIAS        LOW BIAS
FREQUENCY        P     U         P     U
>0.0, ≤0.2       0    30         6    27
>0.2, ≤0.4       3    11         2     9
>0.4, ≤0.6       1     8         2     5
>0.6, ≤0.8       3     8         2     4
>0.8, <1.0       1     2         0     0
1.0             18    19        12    20
NOTE.—P = preferred; U = unpreferred.
Frequency Distribution of Polarized Mutations
Polarized polymorphisms and fixations from different categories were assigned to one of six frequency
classes: >0.0 and ≤0.20, >0.20 and ≤0.40, >0.40 and
≤0.60, >0.60 and ≤0.80, >0.80 and <1.0, and 1.0 (table 3). The 4 × 5 contingency table of polymorphisms
(X and 3R data pooled) is not significantly heterogeneous (P = 0.96). Thus, there is no support for a skew
toward rare alleles for unpreferred polymorphisms, or
for any difference in frequency distribution across mutational types. Table 4 shows the distributions of preferred and unpreferred polymorphisms and fixations in
higher-bias versus lower-bias genes. The 4 × 5 contingency table of preferred versus unpreferred polymorphisms in higher-bias versus lower-bias genes is not significantly heterogeneous (P = 0.22). The 4 × 6 contingency table that includes fixations is significantly heterogeneous (P = 0.002). Table 5 shows the frequency
distribution of preferred and unpreferred mutations for
genes from the extreme bias categories, high versus low.
A homogeneity test of the frequencies of unpreferred
mutations in the high- versus low-bias genes is marginally significant when the fixations are included (P =
0.05) but is not significant when only the polymorphic
mutations are used (P = 0.23). The ratios of derived
singleton to nonsingleton polymorphisms for unpreferred (90:114) versus preferred (15:28) mutants are not
significantly different (G-test; P = 0.31).
Table 4
Numbers of Polarized Preferred and Unpreferred Mutations in Higher-Bias Versus Lower-Bias Genes in Drosophila simulans
                 HIGHER BIAS      LOWER BIAS
FREQUENCY        P     U         P     U
>0.0, ≤0.2       8    57         7    33
>0.2, ≤0.4       6    28         5    28
>0.4, ≤0.6       1    16        10    19
>0.6, ≤0.8       3    10         2     8
>0.8, <1.0       1     3         0     2
1.0             24    38        22    33
NOTE.—Higher- and lower-bias genes defined in appendix B. P = preferred;
U = unpreferred.
Frequencies of Derived X-Linked Versus Autosomal Polymorphisms
For each of four mutant classes, X-linked alleles
occur at higher average frequencies than alleles on chromosome arm 3R (table 2). Considering each of the 350
polarized polymorphisms as an independent observation, the difference in average frequency between chromosomes is highly significant (Mann-Whitney; P =
0.004). The higher frequency of X-linked polymorphisms is consistent across preferred, unpreferred, no-change, and replacement variants. If one calculates the
average frequency of polarized polymorphisms for each
gene, the average is significantly higher for X-linked
genes (0.389, n = 10) than for 3R genes (0.304, n =
13) (Mann-Whitney U; P = 0.01). Tajima’s (1989) D
statistics for silent mutations are given in appendix B.
The mean Tajima’s D for X-linked genes (0.363) is more
positive than the mean for 3R genes (−0.004), although
the difference is not significant.
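For readers who want to see how the sign of Tajima's D relates to the frequency spectrum, the statistic can be computed from per-site allele counts. This is a minimal sketch using the standard constants from Tajima (1989); the function name and the example counts are hypothetical, and the analyses reported here were run in DnaSP rather than with code like this.

```python
from math import sqrt

def tajimas_d(allele_counts, n):
    """Tajima's D from allele counts at biallelic segregating sites.

    allele_counts: for each segregating site, the number of sampled
    sequences (out of n) that carry one of the two alleles.
    """
    s = len(allele_counts)
    if s == 0 or n < 4:
        return None  # undefined without variation; variance term vanishes for n < 4
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical samples of 8 alleles with 10 segregating sites each.
print(tajimas_d([1] * 10, 8))  # all singletons: negative D
print(tajimas_d([4] * 10, 8))  # all intermediate frequency: positive D
```

An excess of intermediate-frequency variants pushes D upward, which is the direction of the X-linked versus 3R contrast described above.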
Tests of Polymorphism and Divergence
Tables 6 and 7 show the numbers of polarized polymorphisms and fixations in D. simulans. The 2 × 4 contingency tables are significantly heterogeneous with all
the data (P < 0.001) and with the Relish and G6pd data
excluded (P < 0.001). There is strong evidence that both
G6pd and Relish have undergone adaptive protein evolution in the D. simulans lineage. Therefore, significant
heterogeneity of the data in table 7 shows that large
numbers of excess amino acid fixations in Relish and
G6pd (Eanes et al. 1996; Begun and Whitley 2000a) do
not account for the result. As one would suspect from
inspection of table 7, the ratio of polymorphic to fixed
mutations is not significantly heterogeneous for the unpreferred, no-change, and replacement mutations (P =
0.23). Thus, the main cause of the significant rejection
of homogeneity in this table is that the ratio of preferred
fixations to polymorphisms is significantly greater than
the ratio observed for the other mutant classes.
Table 6
Numbers of Polymorphic and Fixed Mutations (polarized) in Drosophila simulans
                 P      U     NC      R
Polymorphic     43    204     58     45
Fixed           46     71     16     65
NOTE.—P = preferred; U = unpreferred; NC = no change; R = replacement.
Table 7
Numbers of Polymorphic and Fixed Mutations (polarized) in Drosophila simulans (data from Relish and G6pd loci excluded)
                 P      U     NC      R
Polymorphic     36    192     49     42
Fixed           36     65     10     18
NOTE.—P = preferred; U = unpreferred; NC = no change; R = replacement.
A lineage/gene at equilibrium for codon bias is expected to fix equal numbers of preferred and unpreferred
mutations. A binomial test of the equilibrium hypothesis
that 50% of D. simulans fixations are preferred is significant for 23 genes (table 6; P = 0.021, two-tailed),
and for 21 genes (table 7; Relish and G6pd excluded, P
= 0.004, two-tailed). Sixteen genes deviate from the expectation of equal numbers of preferred and unpreferred
fixations in D. simulans; of these, 11 deviate in the direction of more unpreferred than preferred fixations,
while only five deviate in the opposite direction. These
results support the notion that there has been a decline
of codon bias along the D. simulans lineage, although
the effect is not as pronounced as it is in D. melanogaster (Akashi 1996). As seen in previous analyses
(Akashi 1995, 1996), the ratio of unpreferred to preferred fixations in D. melanogaster (205:27) is much
greater than the expected ratio of 1:1 (two-tailed binomial probability; P < 10⁻⁵). This ratio in D. melanogaster
is much greater than the ratio in D. simulans (G-test, P
< 10⁻⁵), also supporting previous analyses by Akashi.
Although the D. simulans silent fixations reject the equilibrium model, there is no difference in the ratios of
unpreferred to preferred fixations for higher-bias (38:24)
versus lower-bias genes (33:22).
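The comparison of fixation ratios between the two lineages can be illustrated with a G-test on a 2 × 2 table. The sketch below is an assumption-laden illustration: it takes the D. simulans counts to be the 71 unpreferred and 46 preferred fixations of table 6, applies no continuity or Williams correction, and uses a hypothetical function name. The text does not state which counts or corrections were used.

```python
from math import erfc, log, sqrt

def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]] (1 df, uncorrected)."""
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                # o * total / (row * col) equals observed / expected
                g += 2.0 * o * log(o * total / (row[i] * col[j]))
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 df: P(X > g) = erfc(sqrt(g / 2))
    p = erfc(sqrt(g / 2.0))
    return g, p

# Unpreferred:preferred fixations, D. melanogaster (205:27) versus
# D. simulans (71:46, assumed from table 6).
g, p = g_test_2x2(205, 27, 71, 46)
print(f"G = {g:.1f}, P = {p:.2e}")
```

Under these assumptions the P value falls well below 10⁻⁵, in line with the conclusion that unpreferred fixations are far more strongly overrepresented in D. melanogaster.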
Silent-site divergence was estimated by counting
all silent mutants that fixed along the D. simulans lineage; polymorphism data from D. simulans, as well as
outgroup data from D. melanogaster and D. yakuba,
were used in the analysis. Figure 2 shows that there is
no correlation between codon bias (ENC) and the silent-site divergence in the D. simulans lineage. An earlier
study showing a similar result did not examine the D.
simulans lineage separately (Powell and Moriyama
1997). Silent divergence for 10 X-linked genes (0.029)
was slightly greater than the divergence for 3R genes
(0.021); the difference was marginally significant (P =
0.04) by a Mann-Whitney test. However, there was no
difference in the ratio of unpreferred to preferred fixations for X-linked (34:28) versus 3R (37:18) genes.
FIG. 2.—Codon bias (Drosophila simulans effective number of
codons) versus silent divergence per site along the D. simulans lineage
for each region. Only “fixed” sites were included in the analysis. Not
significant by Spearman’s correlation.
The data from tables 4 and 5 can be used to ask
whether variation in levels of overall codon bias affects
the ratio of polymorphic to fixed unpreferred mutations.
The 2 × 2 contingency table of polymorphic versus
fixed unpreferred mutants is not significantly heterogeneous for the higher-bias versus lower-bias genes. However, the corresponding table for the more extreme high-bias versus low-bias comparison was significantly heterogeneous (P = 0.03).
Frequency of Unpolarized Unpreferred Codons
There were 422 codons that were polymorphic for
an unpreferred codon and a preferred codon (codons
with allele frequencies of 0.5 were excluded). Of these,
the rarer allele was unpreferred at 285 codons. Unpolarized data can be used directly in tests to determine if
unpreferred codons are maintained at low frequency by
natural selection. Under the mutation-selection-drift
model, the degree of codon bias reflects the intensity of
purifying selection at silent sites. The frequency of the
unpreferred codon was calculated for each of 452 codons (n = 40 genes) in which one allele was unpreferred
and one was preferred. Figure 3 shows the relationship
between ENC and the frequency of unpreferred alleles
per gene; the two variables are significantly correlated
(Spearman correlation; P = 0.005). Furthermore, the average frequency of unpreferred codons is marginally significantly lower (Mann-Whitney; P = 0.04) in the higher-bias genes (mean = 0.213) than in the lower-bias
genes (mean = 0.252). The same is true for the 10 most
biased (mean frequency of unpreferred alleles per gene
= 0.219) versus the 10 least biased (mean frequency of
unpreferred alleles per gene = 0.286) genes among the
40 D. simulans genes (Mann-Whitney; P = 0.03). These
analyses support the notion that frequencies of unpreferred codons are depressed by purifying selection. Further support for this notion comes from categorization
of unpreferred polymorphisms into two categories, singletons versus nonsingletons. There are 85 singletons
and 111 nonsingletons for higher-bias genes; there are
61 singletons and 195 nonsingletons for lower-bias
genes. The proportion of unpreferred polymorphisms
that are singletons is significantly greater in higher-bias
than in lower-bias genes (G-test; P < 0.0001), as one
would expect if purifying selection depresses frequencies of unpreferred codons more effectively in higher-bias genes.
FIG. 3.—Codon bias (Drosophila simulans effective number of
codons) versus average frequency of unpolarized unpreferred polymorphisms for each region. Only regions with five or more polymorphisms were included. Spearman’s correlation; P = 0.005.
Twofold-Degenerate Versus Fourfold-Degenerate Codons
The previous analyses of silent mutations did not
distinguish between those at twofold-degenerate versus
fourfold-degenerate codons. The frequency of derived
silent mutations (preferred + unpreferred + no change)
is lower for the twofold (n = 90 polymorphisms, mean
= 0.284) than for the fourfold (n = 195, mean = 0.343)
codons; however, the difference is only marginally significant (Mann-Whitney; P = 0.04). As one might expect given this result and given that most D. simulans
polymorphisms are unpreferred, the mean frequency of
derived, unpreferred polymorphisms at twofold codons
(n = 70, mean = 0.269) is lower than the corresponding
frequency at fourfold codons (n = 115, mean = 0.339);
the difference is marginally significant by a Mann-Whitney test (P = 0.04). In spite of the marginally significant
difference in means, the frequency distribution of unpreferred polymorphisms in twofold versus fourfold codons (table 8) is not significantly different (P = 0.45).
Table 9 shows the frequencies of unpolarized unpreferred codons in twofold-degenerate codons from genes of
higher versus lower degrees of codon bias. This 2 × 5
contingency table is significantly heterogeneous (P =
0.013). Frequencies of unpolarized unpreferred codons
in higher-bias genes are skewed toward rare alleles compared with the distribution for lower-bias genes, as predicted if these codons are maintained at low frequency
by purifying selection.
Table 8
Numbers of Polarized Unpreferred Polymorphisms in Twofold and Fourfold Codons in Drosophila simulans
Frequency        Twofold    Fourfold
>0.0, ≤0.2          35         48
>0.2, ≤0.4          21         29
>0.4, ≤0.6          10         24
>0.6, ≤0.8           3         11
>0.8, <1.0           1          3
Potential Biases
Conclusions from polarized data would be weakened if the subset of genes for which we have D. yakuba
data differed in some important way from a random
sample of genes (e.g., those analyzed in Begun and Whitley [2000b]). In fact, the X-linked genes included in the
analyses here do differ from the set of X-linked genes
from Begun and Whitley (2000b), in that they are not
significantly less polymorphic at silent sites than 3R
genes (P = 0.86). The ratio of silent θ/silent divergence
is not significantly different for X versus 3R genes included in this study (P = 0.09). Genes evolving more
quickly are less likely to be included in the study of
polarized mutations, because such genes are expected to
have greater PCR failure rates in D. yakuba when PCR
primers are designed from D. melanogaster sequence.
This could result in a biased sample. However, under
the neutral model, genes that evolve more slowly are
expected to be less polymorphic. Therefore, if a bias
were expected, one would imagine that the D. simulans
samples for which D. yakuba data were available would
tend to be less polymorphic than a random set of genes
successfully amplified and sequenced from D. simulans.
This is not the case. In fact, the silent-site divergences
between D. melanogaster and D. simulans for genes
with versus without a successfully isolated D. yakuba
sequence are not significantly different (P = 0.80). It
seems likely, therefore, that the difference between the
X-linked genes analyzed here and the entire set of X-linked genes analyzed in Begun and Whitley (2000b) is
a coincidence. Nevertheless, we should ask whether this
coincidence affects our conclusions. For example, can
the higher frequency of X-linked versus 3R polymorphisms be attributed to the unusually polymorphic sample of X-linked genes? Figure 4 shows a plot of silent
θ versus the average frequency of polarized polymorphisms for each gene (the frequency of a polymorphism
does not affect its contribution to θ). Genes whose mutations
can be polarized show no correlation between
the variables. Therefore, there is no reason to believe
that conclusions from this paper are compromised by
sampling artifacts. Another potential bias in the analysis
of codon bias comes from the inference of ancestral
states. A codon can be used in the analysis of polarized
mutations only if there is a single mutation in the history
of a sample that includes three species, D. simulans, D.
melanogaster and D. yakuba. Therefore, we expect such
codons to be evolving more slowly than a random sample of codons. If codons evolve more slowly because
they experience stronger selection for codon bias, then
we expect their frequency spectrum to be more skewed
toward excess rare unpreferred polymorphisms compared with average D. simulans codons (Akashi 1996).
Therefore, if anything, the analysis of polarized data biases one toward detecting a skewed frequency spectrum
of unpreferred polymorphisms. However, we did not detect such a skew; from this perspective, our failure to
reject homogeneity of frequencies of polarized unpreferred and preferred polymorphisms would seem to be
conservative.

Table 9
Numbers of Unpolarized Unpreferred Codons in Higher-Bias Versus Lower-Bias Genes in Drosophila simulans

Frequency            Higher Bias   Lower Bias
>0.0, ≤0.2 ......        34            28
>0.2, ≤0.4 ......        15            31
>0.4, ≤0.6 ......         8            15
>0.6, ≤0.8 ......         5            20
>0.8, <1.0 ......         8             6

FIG. 4.—Silent θ per site versus average frequency of polarized
polymorphisms for each region. Not significant by Spearman’s
correlation.

Discussion
Akashi (1996) interpreted the relatively small
amount of fixation data from D. simulans as support for
the idea that this lineage has been at equilibrium for
codon bias. Larger amounts of data from D. melanogaster supported the idea that codon bias has been
evolving (deteriorating) in this species. The results presented here suggest that codon bias is also declining in
D. simulans, although, as Akashi pointed out, the relative accumulation of unpreferred fixations in D. melanogaster is dramatically greater than it is in D. simulans.
Why do these species deviate from equilibrium? Why is
the deviation greater in D. melanogaster? Evolution of
a reduced recombination rate might result in reduced
efficacy of natural selection and increased fixation rates
for unpreferred mutations (e.g., Charlesworth, Morgan,
and Charlesworth 1993). There are relatively few genetic data from species in the melanogaster subgroup
other than D. melanogaster. However, recent genetic
data from the tip of the X chromosome in D. yakuba
(Takano-Shimizu 1999) suggest that for this region of
the genome, the recombination rate is much greater for
D. yakuba than for D. simulans and D. melanogaster.
Given that D. yakuba is an outgroup for the other two
species, this observation is consistent with (but does not
strongly support) evolution of a reduced recombination
rate along the D. simulans/D. melanogaster lineage. We
have no idea whether this is a genomewide difference.
The smaller genetic map of D. melanogaster relative to
D. simulans (Sturtevant 1929; True, Mercer, and Laurie
1996) provides a plausible explanation for reduced effectiveness of purifying selection in D. melanogaster. Assessing the effect of recombination rates on molecular
evolution will require molecular and genetic analysis of
several members of the melanogaster subgroup. An alternative explanation for patterns of unpreferred and
preferred fixations in D. simulans and D. melanogaster
is that both are evolving toward some new optimal pattern of codon usage as a consequence of some biological
change in the ancestor. Such changes might include alteration of tRNA abundance or other changes that affect
selection on codon usage. However, this explanation
seems unlikely, at least based on similar patterns of codon usage in widely divergent Drosophila species
(McVean and Vieira 1999; Kreitman and Antezana
2000). Regardless of the cause, if these species are not
at equilibrium, then one must use caution when attempting to estimate population genetic parameters through
application of equilibrium models (Akashi 1995, 1996,
1999; McVean and Vieira 1999).
The results presented here are similar to Akashi’s
in that the ratio of polarized unpreferred to preferred
polymorphisms is much greater than the ratio of unpreferred to preferred fixations. If one attributes this result
to “too many” unpreferred polymorphisms, then a plausible explanation for such an excess is that unpreferred
polymorphisms are borderline deleterious mutations
(i.e., 1 < Ns < 3) (Akashi 1996). The contribution of
such mutations to polymorphism is expected to be greater than their contribution to divergence (e.g., Ohta and
Kimura 1971; Kimura 1983; Ohta 1992). Their frequencies in samples are expected to be lower than frequencies of neutral polymorphisms or borderline beneficial
polymorphisms (Akashi 1999). Previous analyses of
polymorphisms from D. simulans provided little support
for skewed frequency distributions for mutants of various putative fitness classes. The results from polarized
polymorphisms in D. simulans presented here also provide little support for heterogeneity of frequencies between unpreferred, preferred, or amino acid polymorphisms. On the other hand, analysis of unpolarized polymorphisms from higher-bias versus lower-bias genes
provides the best evidence to date for a skew toward
rare alleles for unpreferred polymorphisms. Excess numbers of unpreferred polymorphisms and the skew toward
rare alleles for unpolarized unpreferred codons provide
complementary support for the notion that borderline
mutations make significant contributions to silent variation in D. simulans. The rather different results for the
unpolarized versus polarized mutations are, however, a bit
troubling. This is especially true given that one expects
biases arising from analysis of polarized mutations to
result in a greater likelihood of detecting skews toward
rare alleles for unpreferred mutations. A possible explanation for the discrepancy is that analysis of unpolarized
unpreferred mutations is more powerful because there
are greater numbers of unpolarized mutations (452) than
of polarized mutations (204). Given the results in table
4, it would not be surprising if larger samples of polarized
unpreferred polymorphisms from higher-bias versus
lower-bias genes supported a skew toward rare alleles
in higher-bias genes.
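The argument above, that borderline deleterious mutations contribute less to divergence than to polymorphism, can be made concrete with the standard diffusion result for the fixation probability of a new mutation relative to a neutral one. This is only an illustrative sketch: it assumes a diploid Wright-Fisher population with genic selection and a scaled coefficient S = 4Ns, which is not necessarily the scaling behind the "Ns" used in the text, and relative_fixation_rate is a hypothetical helper name.

```python
import math


def relative_fixation_rate(S):
    """Fixation probability of a new mutation divided by that of a neutral one.

    S is the scaled selection coefficient (assumed here to be 4*N*s for a
    diploid population with genic selection); S < 0 means deleterious.
    """
    if abs(S) < 1e-12:
        return 1.0  # neutral limit of S / (1 - exp(-S))
    # -expm1(-S) equals 1 - exp(-S), computed accurately for small |S|.
    return S / -math.expm1(-S)


for S in (-4.0, -2.0, -1.0, 0.0, 1.0):
    print(f"S = {S:+.1f}: relative fixation rate = {relative_fixation_rate(S):.4f}")
```

Under these assumptions a mutation with S = -4 fixes at roughly 7% of the neutral rate, so such mutations are strongly underrepresented among fixed differences even though they can still segregate in samples. The sketch covers only the divergence side of the argument; the expected contribution to polymorphism requires the sojourn-time calculation and is not shown.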
The analyses presented here support the idea that
selection at silent sites is stronger at twofold codons
than at fourfold codons. A reasonable interpretation is
that fourfold codons sometimes (or often) have more
than two potential fitness classes. Assume the least fit
allele at a fourfold codon is as deleterious as the less fit
allele at a twofold codon. If this is true, then we expect
the average unpreferred allele at a fourfold codon to be
selected more weakly than the average unpreferred allele
at a twofold codon. Kreitman and Antezana (2000) noted that the rank order of alternative codon frequency for
most four-codon families was conserved between D. melanogaster and D. pseudoobscura. This suggests that
there are more than two fitness classes and as many as
four fitness classes for some codon families. Frequencies
of polymorphisms for twofold and fourfold codons in
D. simulans support this hypothesis. If true, the hypothesis predicts that many “no-change” mutations are very
weakly deleterious (although some may also be weakly
beneficial). The observation that the ratio of polymorphic to fixed no-change mutations is similar to the ratio
for unpreferred mutations (tables 6 and 7) is consistent
with the hypothesis that the two types of mutations have
similar distributions of selection coefficients. The summary of codon use in high-bias D. melanogaster genes
given in Kreitman and Antezana (2000) was used to
assign a fitness ranking based on relative abundance.
The number of fitness classes was equal to the size of
the codon family (two-, three-, or fourfold). All D. simulans mutations previously assigned to the no-change
category, along with those that had not been assigned
any category based on the analysis of Sharp and Lloyd
(1993), were reclassified as preferred or unpreferred
based on these rankings. Among the reclassified polymorphic mutations, roughly twice as many are to “lower-fitness” codons (49) as to “higher-fitness” codons
(23). Among the reclassified fixed mutations, 12 are to
lower-fitness codons, while 9 are to higher-fitness codons. Although the 2 × 2 contingency table is not significantly heterogeneous, the configuration is in the
same direction as for the unpreferred mutations. This,
too, is consistent with the idea that categorization of
silent mutations into two categories is overly simplistic.
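The 2 × 2 comparison of reclassified mutations described above (polymorphic: 49 lower-fitness, 23 higher-fitness; fixed: 12 lower-fitness, 9 higher-fitness) can be sketched as follows. The paper does not state which test was applied to this table, so the use of a two-sided Fisher's exact test is an assumption, and fisher_exact_two_sided is a hypothetical helper written for this illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: polymorphic, fixed. Columns: to lower-fitness, to higher-fitness codons.
polymorphic = (49, 23)
fixed = (12, 9)
p_value = fisher_exact_two_sided(*polymorphic, *fixed)
odds_ratio = (polymorphic[0] * fixed[1]) / (polymorphic[1] * fixed[0])
print(f"odds ratio = {odds_ratio:.2f}, two-sided P = {p_value:.2f}")
```

The odds ratio exceeds 1, meaning the excess of lower-fitness changes is greater among polymorphisms than among fixations, which is the same direction as for the unpreferred mutations; the P value under this assumed test is well above 0.05, in line with the statement that the table is not significantly heterogeneous.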
Begun and Whitley (2000b) suggested that reduced
X-linked versus autosomal polymorphism in D. simulans is best explained by stronger effects of positively
selected mutants on the X chromosome. The result reported here, that conditioned on a site being polymorphic in a sample, X-linked polymorphisms occur at a
higher frequency than those on 3R (table 2), is another
distinguishing feature of X-linked versus autosomal variation in this species. Further theoretical research is required to determine which models of linked selection
may be able to account for these data (e.g., Gillespie
1997; Fay and Wu 2000).
Sequencing surveys and microsatellite analyses of
D. simulans are indicative of small but significant differentiation between populations and slightly higher levels
of variation in African versus United States D. simulans (Irvin et al. 1998; Hamblin and Veuille 1999).
Inferences on the dynamics of mutations in D. simulans
populations reported here rely on comparisons of different mutant classes or comparison of mutations on different chromosomes. Because deviations from population equilibrium are expected to affect all weakly selected sites in the genome in a similar manner, such
comparisons remain useful. Nevertheless, theoretical
studies would be required to confirm that deviations
from equilibrium have only minor effects on the behavior of the tests carried out for this paper.
Cargill et al. (1999) measured polymorphism in
106 human genes, with an average sample size of 114
alleles per gene. They found that replacement polymorphisms occurred at a significantly lower average frequency than silent polymorphisms, primarily because replacement polymorphisms were overrepresented among
the class of very rare alleles. They attributed this observation to stronger purifying selection against replacement polymorphisms than against silent polymorphisms.
Determining whether the frequency distributions of replacement polymorphism in Drosophila populations and
human populations are similar would require sampling
of larger numbers of D. simulans alleles.
Acknowledgments
I thank the anonymous reviewers and D. Rand for
useful comments. This work was supported by NIH
GM55298 and by the Alfred P. Sloan Foundation.
LITERATURE CITED
AKASHI, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935.
———. 1995. Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila. Genetics 139:1067–1076.
———. 1996. Molecular evolution between Drosophila melanogaster and D. simulans: Reduced codon bias, faster
rates of amino acid substitution, and larger proteins in D.
melanogaster. Genetics 144:1297–1307.
———. 1999. Inferring the fitness effects of DNA mutations
from polymorphism and divergence data: statistical power
to detect directional selection under stationarity and free
recombination. Genetics 151:221–238.
AKASHI, H., and S. W. SCHAEFFER. 1997. Natural selection and
the frequency distribution of “silent” DNA polymorphism
in Drosophila. Genetics 146:295–307.
BEGUN, D. J., and P. WHITLEY. 2000a. Adaptive evolution of
RELISH, a Drosophila NF-κB/IκB protein. Genetics 154:
1231–1238.
———. 2000b. Reduced X-linked nucleotide polymorphism in
Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960–
5965.
BULMER, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907.
CARGILL, M., D. ALTSHULER, J. IRELAND et al. (17 co-authors).
1999. Characterization of single-nucleotide polymorphisms
in coding regions of human genes. Nat. Genet. 22:231–238.
CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH.
1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
EANES, W. F., M. KIRCHNER, J. YOON, C. H. BIERMANN, I. N.
WANG, M. A. MCCARTNEY, and B. C. VERRELLI. 1996. Historical selection, amino acid polymorphism and lineage-specific divergence at the G6pd locus in Drosophila melanogaster and D. simulans. Genetics 144:1027–1041.
FAY, J. C., and C.-I. WU. 2000. Hitchhiking under positive
Darwinian selection. Genetics 155:1405–1413.
GILLESPIE, J. H. 1997. Junk ain’t what junk does: neutral alleles
in a selected context. Gene 205:291–299.
HAMBLIN, M., and M. VEUILLE. 1999. Population structure
among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture. Genetics 153:305–317.
IRVIN, S. D., K. A. WETTERSTRAND, C. M. HUTTER, and C. F.
AQUADRO. 1998. Genetic variation and differentiation at
microsatellite loci in Drosophila simulans: evidence for
founder effects in new world populations. Genetics 150:
777–790.
KIMURA, M. 1983. The neutral theory of molecular evolution.
Cambridge University Press, Cambridge, England.
KREITMAN, M., and M. ANTEZANA. 2000. Population and evolutionary genetics of codon usage in Drosophila. Pp. 82–
101 in R. SINGH and C. KRIMBAS, eds. Evolutionary genetics: from molecules to morphology. Cambridge University Press, Oxford, England.
LI, W.-H. 1987. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous
codons. J. Mol. Evol. 24:337–345.
MCVEAN, G. A. T., and B. CHARLESWORTH. 1999. A population genetic model for the evolution of synonymous codon
usage: patterns and predictions. Genet. Res. 74:145–158.
MCVEAN, G. A. T., and J. VIEIRA. 1999. The evolution of codon preference in Drosophila: a maximum-likelihood approach to parameter estimation and hypothesis testing. J.
Mol. Evol. 49:63–75.
MARUYAMA, T., and P. A. FUERST. 1984. Population bottlenecks and nonequilibrium models in population genetics. I.
Allele numbers when populations evolve from zero variability. Genetics 108:745–763.
OHTA, T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263–286.
OHTA, T., and M. KIMURA. 1971. On the constancy of the
evolutionary rate of cistrons. J. Mol. Evol. 1:18–25.
POWELL, J. R., and E. N. MORIYAMA. 1997. Evolution of codon
usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:
7784–7790.
ROZAS, J., and R. ROZAS. 1999. DnaSP 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.
SHARP, P. M., and A. T. LLOYD. 1993. Codon usage. Pp. 378–
397 in G. MARONI, ed. An atlas of Drosophila genes: sequences and molecular features. Oxford University Press,
Oxford, England.
STURTEVANT, A. H. 1929. Contributions to the genetics of Drosophila simulans and Drosophila melanogaster. Publ. Carnegie Inst. 399:1–62.
TAJIMA, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:
585–595.
TAKANO-SHIMIZU, T. 1999. Local recombination and mutation
effects on molecular evolution in Drosophila. Genetics 153:
1285–1296.
TRUE, J. R., J. M. MERCER, and C. C. LAURIE. 1996. Differences in crossover frequency and distribution among three
sibling species of Drosophila. Genetics 142:507–523.
WRIGHT, S. 1938. The distribution of gene frequencies under
irreversible mutation. Proc. Natl. Acad. Sci. USA 24:253–
259.
WRIGHT, F. 1990. The “effective number of codons” used in a
gene. Gene 87:23–29.
DAVID M. RAND, reviewing editor
Accepted March 20, 2001
APPENDIX A
Estimates of Codon Bias in Drosophila simulans, Drosophila yakuba, and Drosophila melanogaster

                        ENC
Gene          D. simulans   D. melanogaster   Length   Bias
Chromosome 3R
Gld ......       56.0            51.4            612     L
Rh3 ......       42.8            42.1            383     H
mir ......       33.9            33.3            830     H
nos ......       43.8            45.6            401     H
eld ......       44.7            45.8          2,715     L
Hsc70 ....       30.1            30.3            651     H
CP190 ....       57.5            53.2          1,096     L
ry .......       51.4            52.6          1,335     L
hyd ......       55.4            56.2          2,985     L
Rel ......       49.4            50.3            971     L
pit ......       45.8            47.5            663     L
AP50 .....       40.8            39.5            437     H
T-cpl ....       39.1            38.7            557     H
fzo ......       60.9            60.6            718     L
AATS .....       54.2            51.8          1,714     L
tld ......       48.5            52.6          1,057     L
Osbp .....       44.3            44.5            667     L
boss .....       48.9            49.1            896     L
Tpi ......       27.5            28.6            247     H
X chromosome
runt .....       35.3            37.2            509     H
G6pd .....       31.0            31.5            523     H
bnb ......       40.4            42.8            442     H
r ........       42.3            48.7          2,236     H
mei-218 ..       54.9            55.8            523     L
sog ......       40.4            41.9          1,038     H
g ........       45.9            47.3            810     L
Yp3 ......       28.0            32.4            420     H
v ........       43.8            44.2            379     L
Yp2 ......       33.3            32.9            442     H
otu ......       56.6            53.0            811     L
sn .......       38.8            40.4            512     H
dec-1 ....       55.9            53.1          1,123     L
ct .......       43.2            43.1          2,175     H
sqh ......       33.5            35.7            174     H
X ........       44.1            44.5          3,429     L
ovo ......       52.1            52.7          1,028     L
mei-9 ....       56.6            57.7          1,186     L
per ......       36.8            37.5          1,155     H
z ........       42.1            42.8            574     H
Pgd ......       37.4            37.2            481     H

D. yakuba ENC (23 values, listed in table order; the gene for each value is not indicated): 43.1, 36.4, 49.5, 30.2, 51.5, 51.6, 55.5, 48.0, 40.4, 38.6, 45.7, 46.1, 29.0, 32.3, 41.0, 43.3, 43.2, 37.9, 50.4, 41.5, 35.8, 50.1, 35.5

NOTE.—Shown are effective numbers of codons (ENCs) for randomly selected alleles from each of the three species, calculated using only the codons
that were sequenced in D. simulans. Genes were ranked by D. simulans ENC.
Genes in the lower half of the distribution were placed in the higher-bias category, while genes in the upper half of the distribution were placed in the lower-bias category. Length = number of amino acid residues in complete proteins as
indicated in the SwissProt database; H = higher bias; L = lower bias.

APPENDIX B
Tajima’s D Test Statistics for Silent Polymorphisms in Drosophila simulans

Gene          Tajima’s D
Chromosome 3R
Gld ......       0.808
Rh3 ......      −0.728
mir ......      −0.394
nos ......      −0.079
eld ......       0.412
Hsc70 ....      −0.044
CP190 ....       1.044
ry .......      −0.149
hyd ......       0.092
Rel ......      −0.775
pit ......       0.554
AP50 .....       0.299
T-cpl ....      −0.793
fzo ......       0.380
AATS .....       0.197
tld ......      −0.188
Osbp .....       0.159
boss .....      −0.008
Tpi ......      −0.863
X chromosome
runt .....       1.280
G6pd .....       1.564
bnb ......      −1.595
r ........       0.050
mei-218 ..       1.598
sog ......      −0.861
g ........      −1.623*
Yp3 ......       —
v ........       1.670
Yp2 ......       —
otu ......       0.955
sn .......       0.948
dec-1 ....       0.421
ct .......       —
sqh ......      −0.561
X ........       1.385
ovo ......       0.874
mei-9 ....      −0.572
per ......      −0.699
z ........       0.612
Pgd ......       1.093

NOTE.—D was not calculated for samples having only one or two segregating sites. * = P < 0.05.
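
For readers who want to see how a statistic of this kind is obtained, the following is a minimal sketch of Tajima's D as defined in Tajima (1989), computed from the sample size, the number of segregating sites, and the mean number of pairwise differences. It is not the program used for this appendix; the function name tajimas_d and the choice to return None for samples with fewer than three segregating sites (mirroring the note above) are assumptions made for this illustration.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites, and
    pi = mean number of pairwise differences between sequences.

    Returns None when there are only one or two segregating sites,
    following the convention stated in the note to the table.
    """
    if n < 3:
        raise ValueError("at least three sequences are needed")
    if seg_sites < 3:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by the
    # estimated standard deviation of that difference.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Made-up example values, not data from this study.
print(tajimas_d(n=10, seg_sites=16, pi=3.888889))
```

Negative values indicate an excess of rare variants relative to the neutral equilibrium expectation, and positive values indicate an excess of intermediate-frequency variants.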