Inference of Positive and Negative Selection on the 5′ Regulatory
Regions of Drosophila Genes
Michael H. Kohn,* Shu Fang,* and Chung-I Wu*
*Department of Ecology & Evolution, The University of Chicago; Institute of Zoology, Academia Sinica, Nankang,
Taipei, Taiwan, Republic of China
Both positive selection and negative selection have been shown to drive the evolution of coding regions. It is of interest to
know if the corresponding 5′ regions of genes may be subjected to selection of comparable intensities. For such
a comparison, we chose the Accessory gland protein (Acp) genes as our test case. About 700 bp and 600 bp for the 5′ and
coding regions, respectively, of eight previously unstudied genes were sequenced from 21 isogenic lines of
D. melanogaster and one line from D. simulans. The ratio of divergence at the amino-acid replacement sites (A) over
that at the synonymous sites (S) was twice the ratio for common polymorphism. Interestingly, the 5′ region shows the same
trend, with the 5′/S divergence ratio being 1.8 times as high as the 5′/S ratio for common polymorphism. There are several
possible explanations for the 5′/S ratios, including demography, negative selection, and positive selection. Under normal
conditions, positive selection is the most likely explanation. If that is true, about 45 to 50 percent of all fixed differences at
both the replacement and 5′ sites were adaptive, even though the substitution rate in the former is only half that of the latter
(KA/KS ~ 0.3 vs. K5′/KS ~ 0.6). As previous analyses have indicated, the inclusion of slightly deleterious polymorphism
confounds the inference of positive selection. The analysis of published polymorphism data covering 97 verified 5′ regions
of Drosophila suggests more pronounced selective constraint on the 5′ untranslated region and the core promoter (together
corresponding to ~200 bp in this data set) when compared to the more distal portion of the 5′ region of genes.
Introduction
The role that natural selection may play in shaping
patterns of polymorphism and divergence of protein-encoding sequences has been a long-standing issue of
debate (Kimura 1968; King and Jukes 1969; Ohta 1973;
Nei 1987). It has been addressed in recent years using
genomic data (e.g., Akashi 1999; Sunyaev et al. 2000;
Andolfatto 2001; Aquadro, DuMont, and Reed 2001; Fay,
Wykoff, and Wu 2001, 2002; Smith and Eyre-Walker
2002). The organization of genes as distinct units (introns
and exons), codons (preferred and unpreferred), and sites
(amino-acid replacement sites and synonymous sites) has
enabled the formulation of testable hypotheses regarding
their evolution (e.g., McDonald and Kreitman 1991;
Sawyer and Hartl 1992). For example, amino-acid replacement (A) and synonymous (S) sites experience different
selective forces. Whereas the former may experience positive and negative selection, the latter are generally assumed to be neutral (but see, e.g., Akashi 1995). The
contrast between A and S sites thus has opened up ample
opportunities for rigorous analyses of the effects of both
positive and negative selection.
The pervasiveness of negative selection on the genome is one of the most concrete lessons in molecular evolution
(e.g., Li 1997). Specifically, the prevalence of weak
selection has become clear in recent years (e.g., Ohta
1973; Akashi 1995). Negative selection strong enough to
prevent fixation, but not strong enough to prevent polymorphism, is of particular interest because it operates on
the standing variation in natural populations. Coding
region data have revealed 20% to 40% of amino acid
polymorphism in both human and Drosophila to be under
Key words: promoter, molecular evolution, selection, Acp genes,
accessory gland.
E-mail: [email protected].
Mol. Biol. Evol. 21(2):374–383. 2004
DOI: 10.1093/molbev/msh026
Advance Access publication December 5, 2003
Molecular Biology and Evolution vol. 21 no. 2
© Society for Molecular Biology and Evolution 2004; all rights reserved.
such weak negative selection (Fay, Wykoff, and Wu 2001,
2002; Smith and Eyre-Walker 2002). Most surprising,
however, positive selection appears to be the driving force
behind 30% to 40% of amino acid substitutions in these
two species (Fay, Wykoff, and Wu 2001, 2002; Smith and
Eyre-Walker 2002). Accelerated and adaptive evolution
could be attributed in part to certain functional groups of
genes, including those that are interlocked in sexual
conflict and reproduction (Aguade, Miyashita, and Langley 1992; Tsaur, Ting, and Wu 1998; Begun et al. 2000;
Wykoff, Wang, and Wu 2000; Swanson et al. 2001).
The degree of uncertainty concerning the role of
natural selection in the evolution of noncoding regions is
in sharp contrast to our understanding of selection on
protein coding sequences (Tautz 2000). Recognizable
features and motifs outside of coding regions are somewhat labile, are often small in size, and may depend on the
sequence context (e.g., Lemon and Tjian 2000; Fessele
et al. 2002). Therefore, we might expect the overall pattern
of noncoding regions to be evolutionarily neutral, or we
might expect the complex structure of regulatory regions
to be the object of accordingly complex modes of selection
(Ludwig 2002; Dermitzakis, Bergman, and Clark 2003),
which may not yield any clearly predictable pattern. For
example, for the Drosophila even-skipped gene, a compensatory mutation mechanism for enhancer motifs has been
proposed that may allow for rapid sequence divergence
between species while maintaining functional equivalence
with respect to gene expression (Ludwig et al. 2000).
The spatial and functional boundaries of the 5′ region
have been delineated for only a relatively small number of
genes (e.g., Wingender et al. 2000). The ongoing comparative and bioinformatic analyses of full-genome sequences are beginning to alleviate this resource shortage
(e.g., Bergman et al. 2002; Berman et al. 2002). Nonetheless, comparatively few empirical studies are thus far
available to provide insight into the population genetics of
regulatory sequences (e.g., Jenkins, Ortori, and Brookfield
1995; Ludwig and Kreitman 1995; Tautz and Nigro 1998;
Population Genetics of 5′ Regulatory Regions 375
Dermitzakis, Bergman, and Clark 2003). Regulatory
noncoding DNA sequences contribute much to the bulk
of the genome and, moreover, their role in biological
diversification has long been a matter of speculation (e.g.,
Raff 1996). Hence, the furthering of our understanding of
the evolutionary mode of regulatory sequences should be
of great relevance.
In this study, we sequenced and analyzed 5′-flanking
regions and coding sequences of eight different previously
unstudied Accessory gland protein (Acp) genes in Drosophila. Acps are part of the protein cocktail that makes up
the seminal fluid, which is passed on to the female along with
the sperm; it may stimulate egg-laying, inhibit the female’s
propensity to mate again, and function in sperm competition
(Clark et al. 1995; Chapman 2001; Wolfner 2002). Acps
may also be toxic to the female, thereby shortening her lifespan. Some Acp genes have been shown to evolve at high
rates between Drosophila species, and to bear the signature
of selection at the polymorphism level (Aguade, Miyashita,
and Langley 1992; Tsaur and Wu 1997; Tsaur, Ting, and Wu
1998; Begun et al. 2000; Swanson et al. 2001). A recently
obtained collection of expressed sequence tags (ESTs),
presumably covering nearly all of the Acp genes of the
Drosophila genome, was shown to have uniformly high
amino acid substitution rates when compared with silent
substitution rates (Swanson et al. 2001). Because positive
and negative selection both operate in large doses on the
coding regions of the Acp genes, this group of genes may
supply excellent candidates for comparing the effects of
such forces on the corresponding 5′ regions. We also
surveyed existing data on regulatory noncoding sequences
in Drosophila in order to measure the selective constraints in
a larger and presumably unbiased gene collection.
Methods
Construction of Isogenic Lines
To construct isogenic chromosome 2 and/or 3 lines,
single males were crossed to CyO/Sp;TM3, Ser/Sb females
(generation G1). Resulting single Cy males (marked by
dominant curly wing mutation) and Ser males (dominant
serrate wing mutation) were backcrossed to CyO/Sp;TM3/
Sb females (G2). At G3, Cy and Ser males and females
were mated inter se. The CyO and/or TM3 balancer
chromosomes were then eliminated at G4 to produce the
isogenic chromosome 2 and/or 3 lines.
In total, we surveyed 21 of these isogenic/isochromosomal lines of D. melanogaster. These included 17
African (A) and four non-African lines that were derived
from recently described lines (Hollocher et al. 1997;
Takahashi et al. 2001; Fang, Takahashi, and Wu 2002).
The genome sequence of the y;cn bw sp D. melanogaster
strain was included in our survey (Adams et al. 2000;
Celniker et al. 2002). Divergence data were obtained from
D. simulans (isolate from Davis, California).
Polymerase Chain Reaction (PCR) and
DNA Sequencing
Primers were designed for genes that are expressed
in the male accessory gland of D. simulans, as was determined
by Swanson et al. (2001). Furthermore, genes were
chosen for which a comparison between D. simulans ESTs
and D. melanogaster genomic sequence suggested a synonymous site divergence of about 11% and amino-acid
replacement site divergence of about 2% (Swanson et al.
2001). Primer design was done using the Primer3 software
(http://www.genome.wi.mit.edu/genome_software/other/
primer3.html; Rozen and Skaletsky 2000). The strategy
underlying their design relied on BDGP annotation
Release 2 and was to obtain PCR products of ~900 base
pairs (bp) of annotated 5′ region (P) of each of the genes and
~900 bp of corresponding coding sequence (CDS) (fig. 1).
Fragments were PCR-amplified from genomic DNA using
the primers given in table 1 of the Supplementary Material
online. Both the coding sequence and the corresponding
5′ region for the eight genes were obtained from a PCR
screen of a larger set of Acp genes. DNA sequencing was
done as described elsewhere (Takahashi et al. 2001), and
primary sequence data were deposited under GenBank
(AY394091–AY394430). The alignments underlying our
analyses are provided as table 2 of the Supplementary
Material online.
Description of Data Set
Of the eight genes that were obtained from our initial
PCR screening, six genes were located on the second
chromosome and two were on the third chromosome.
None were located in regions of low recombination (table
1). A tRNA gene (CR31494) 72 bp long was situated
within the second intron of CG31248 (fig. 1). Neither
polymorphic nor divergent sites were found in this tRNA
gene and, consequently, there was no variation at its
regulatory sites, which in tRNA genes are located entirely
within the gene. Three genes (CG10956, CG8137, and CG9334) were
members of the serine protease inhibitor (serpin) family
(table 1). The latter two had ~73% amino-acid similarity
over a region spanning 375 amino acids. A domain search
identified CG10956 as a putative and highly diverged
paralog to them. Comparison of the nucleotide sequences
obtained for the putative paralogs with the Drosophila
genome sequence confirmed that the authentic copy of
each gene was PCR-amplified and sequenced (table 2 in
the Supplementary Material online). Serpins are known to
assume reproductive roles (Wolfner et al. 1997). Two
genes are triacylglycerol lipases (CG31872, CG17097),
which are also known to be involved in reproductive
processes (Smith et al. 1994). The putative function of the
remaining three genes either is unknown (CG31248) or is
not known to be specific for reproduction [programmed
cell death (CG5333); phospholipase (CG8552)]. In all,
however, the high level of expression of these genes in the
accessory gland hints at their involvement in male
reproduction (Swanson et al. 2001).
Initially, for primer design we relied on the BDGP
annotation of the D. melanogaster genome sequence
Release 2. Gene annotation was compared to the new
Release 3 (Celniker et al. 2002; Misra et al. 2002) of the
Drosophila genome (table 1; fig. 1). Two genes CG2640
and CG17101 (with the synonym CG17093 under Rel. 2)
376 Kohn et al.
FIG. 1.—Annotation of eight sequenced Acp genes. Coding sequence was annotated using (a) cDNAs and ESTs from BDGP (Stapleton et al.
2002; http://www.fruitfly.org/DGC/index.html and http://www.fruitfly.org/EST/EST.shtml) and (b) ESTs from the D. simulans library (Swanson et al.
2001, available from http://www.pnas.org). The cDNAs and ESTs were identified using local NCBI-BlastN. The predicted transcripts were used as
query to search the databases dbEST (December 2002) and full-length cDNA (December 2002). Final mapping of expressed sequences onto genomic
sequence was done with sim4 (Florea et al. 1998). In the figure, lines connect ESTs and cDNAs that overlap.
have been renamed to CG31248 and CG31872. We refer
to them by their new CG numbers. Differences between
genome releases had no effect on our coding sequence
annotation. The putative 5′ regions differed between releases for two (CG17097 and CG31872) of our eight genes
(fig. 1). The 5′ regions of CG17097 and CG31872 have
been moved upstream of the previously (Rel. 2) assigned
5′ ends. However, there is no evidence from expressed
sequences (Stapleton et al. 2002) that supports the new
annotation (fig. 1). Moreover, polymorphic indels that
would disrupt the translation frame were located in the 5′
region of the genes, further suggesting this part was not
coding (table 2 in the Supplementary Material online).
Finally, we note that the exclusion of these 5′ regions from
our analysis would not alter our conclusions (see Results).
Analysis
Sequence analyses were carried out using DNASP 3.3
(Rozas and Rozas 1999) and ProSeq (Filatov et al. 2000).
Per site divergence between D. melanogaster and
D. simulans across all eight genes was computed separately for the concatenated (i.e., weighted by length)
sequence of the 5′ region (K5′), amino-acid replacement
sites (KA), and synonymous sites (KS). Divergence and its
standard deviation were estimated from the concatenated
sequences using the Kimura two-parameter model
(Kimura 1980). K5′, KA, and KS (and one SD) were also
computed for each gene separately (table 1). The effective number
of codons (ENC) was computed using
DNASP 3.3 (Rozas and Rozas 1999). The ENC in the CDS
was high for D. melanogaster and D. simulans (57.7 and
56.9, respectively). Neither was significantly biased
using the χ²-test at the 5% significance level. The GC
content of the CDS was about 49%, whereas the GC
content of the 5′ region was 40%.
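The Kimura two-parameter correction named above can be illustrated with a short sketch. The function name and the toy sequences in the usage note are illustrative assumptions and are not taken from DNASP or ProSeq; the code only applies the published formula K = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, two hypothetical 10-bp sequences that differ by a single transition, k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), give a corrected distance slightly above the uncorrected 0.1.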
For the analysis of polymorphism and divergence
within the framework of a McDonald-Kreitman (1991)
test, we separated sites into those that are polymorphic
within D. melanogaster and those that are fixed between
D. melanogaster and D. simulans. Furthermore, a polymorphic site with a derived variant frequency of three or more
chromosomes in our sample of 22 chromosomes (>13%)
was designated as common, whereas a derived variant
frequency of two or fewer (<10%) was considered to be rare
(cf. Fay, Wykoff, and Wu 2002). The rationale for
separating polymorphism into frequency classes is that
truly neutral mutations can best be seen in the high-frequency class. G-tests with Williams' correction were
used to determine the significance of the McDonald-Kreitman (1991) test.
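The frequency cutoff described above can be expressed as a minimal sketch. The function name and the example counts are hypothetical; only the sample size of 22 chromosomes and the threshold of three derived copies come from the text.

```python
def classify_derived_variant(derived_count, sample_size=22, common_min_count=3):
    """Label a polymorphic site 'common' or 'rare' by its derived-variant count."""
    if not 0 < derived_count < sample_size:
        raise ValueError("site is not polymorphic in the sample")
    return "common" if derived_count >= common_min_count else "rare"

# Derived-allele counts at five hypothetical polymorphic sites
counts = [1, 2, 3, 7, 21]
labels = [classify_derived_variant(c) for c in counts]
frequencies = [c / 22 for c in counts]
```

With 22 chromosomes, three derived copies correspond to a frequency of 3/22 (about 13.6%) and two copies to 2/22 (about 9.1%), which is why the two classes are separated by a gap between 10% and 13%.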
Results
Polymorphism and Divergence in the 5′ and
Coding Regions of Acp Genes
In total, we collected ~5.6 kb of the 5′ region
preceding the translation initiation start and ~5.1-kb
protein coding region of eight Acp genes from 22
D. melanogaster lines (including y;cn bw sp; Adams et
Table 1
Gene Information and Evolutionary Rates for Eight Sequenced Drosophila Acp Genes
Gene Information                       Sites Sampled (bp)ᵃ   Evolutionary Rate                            Rate Ratio
ID        Map  Functional Annotation     5′       A       S  K5′            KA             KS             K5′/KS  KA/KS
CG8552    28E  Phospholipase A1         835   373.2   145.8  0.105 ± 0.012  0.050 ± 0.012  0.120 ± 0.031    0.88   0.42
CG8137ᵇ   28F  Serpin                   564   635.3   171.7  0.083 ± 0.013  0.084 ± 0.012  0.114 ± 0.028    0.73   0.74
CG17097   31F  Triacylglycerol lipase   473   541.7   157.3  0.057 ± 0.011  0.011 ± 0.005  0.133 ± 0.032    0.43   0.08
CG31248ᶜ  31F  Unknown                  926   231.7    70.3  0.103 ± 0.011  0.004 ± 0.004  0.281 ± 0.078    0.37   0.01
CG9334ᵇ   38E  Serpin                   624   306.5    92.5  0.110 ± 0.014  0.087 ± 0.018  0.118 ± 0.039    0.93   0.74
CG10956   53F  Serpin                   679   720.8   194.2  0.075 ± 0.011  0.030 ± 0.007  0.118 ± 0.027    0.64   0.25
CG31872ᵈ  84C  Triacylglycerol lipase   747   394.5   116.5  0.119 ± 0.014  0.031 ± 0.009  0.165 ± 0.042    0.72   0.19
CG5333    87C  Cell death               811   763.7   213.3  0.051 ± 0.008  0.023 ± 0.006  0.170 ± 0.032    0.30   0.14
All                                    5623  3965.3  1158.7  0.088 ± 0.004  0.039 ± 0.003  0.143 ± 0.012    0.62   0.27
ᵃ The alignment length between D. melanogaster (y;cn bw sp) and D. simulans excluding insertions and deletions. Sequence coverage was 22 D. melanogaster lines
and D. simulans (see Methods).
ᵇ Duplicate gene pair with amino-acid similarity of 72.7%, KA of 0.173 and KS of 0.249. Common names are Spn2 (CG8137) and Spn3 (CG9334).
ᶜ Synonym CG2640.
ᵈ Synonyms CG17101, CG17093.
al. 2000; Celniker et al. 2002) and one D. simulans line
(table 1). The sequences are expected to capture variation
in proximal enhancer elements, core promoter, 5′ untranslated region (5′ UTR), and any intron sequence
preceding the translation start site (fig. 1). The boundaries
of each of these regions that make up the 5′ regulatory ends
of our set of annotated genes are unknown. Overall,
however, the sequenced portions of the 5′ regions are
expected to contain elements that exert regulatory control
(e.g., Lemon and Tjian 2000; Smale 2001). Because of the
weak signal expected to come from each individual gene, we
sum up the sites from the 5′ region and the coding region,
respectively, across genes (Sawyer and Hartl 1992; Akashi
1999; Cargill et al. 1999; Begun et al. 2000; Fay, Wykoff,
and Wu 2001, 2002).
Evolutionary rates observed for each of the different
regions among the eight Acp genes are summarized in
table 1. The average evolutionary rates at synonymous
sites (KS) and at amino-acid replacement sites (KA) were
within the range of previously reported values for Acp
genes (Swanson et al. 2001, Betancourt, Presgraves, and
Swanson 2002). That the rate of amino-acid substitution is
slowed by functional constraint when compared with
synonymous substitutions is indicated by an average KA/
KS ratio of 0.27 between D. melanogaster and D. simulans
(table 1). The overall K5′/KS is more than twice as high at
0.62, suggesting a lower, but still substantial, selective
constraint on the 5′ region when compared with amino-
acid sites (table 1). Inclusion of indels, when treated as
single mutation events regardless of their size, had a small
effect only, changing K5′ from 0.088 to 0.090.
On average, levels of polymorphism measured as
θπ per site in the 5′ regions (7.8 × 10⁻³) were only about
41% of those seen at synonymous sites (19.2 × 10⁻³) but
twice as high as polymorphism levels at replacement sites
(3.9 × 10⁻³; table 2). Thus, like the rate contrasts referred to
in table 1, this also is suggestive of selective constraint
on the 5′ regions, but on average the constraint is weaker than
that at amino-acid replacement sites (table 2). We have
to assume that the 5′ regions and the coding regions have
independent demographic histories. To examine whether
there is a systematic (i.e., across all eight Acp genes)
difference between the 5′ regions and the coding regions
that may have been caused by stochastic (e.g., demographic) events, we computed Tajima's D for each gene
region separately (Tajima 1989a). None of the individual
genes or the pooled data display a significant value for
Tajima's D at a significance level of α = 0.05 (critical
values were deduced from 10,000 coalescent simulation
runs with no recombination).
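The relationship between the two diversity estimates and Tajima's D can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the toy alignments are hypothetical, gaps and missing data are assumed absent, and the coalescent simulations used here to obtain critical values are not reproduced. The constants follow the standard formulas of Tajima (1989).

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Return (theta_W, theta_pi, D) for aligned sequences; thetas are per site."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of segregating sites in the sample
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # pi: mean number of pairwise differences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = seg_sites / a1  # Watterson's estimate for the whole region
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - theta_w) / math.sqrt(variance) if seg_sites > 0 else float("nan")
    return theta_w / length, pi / length, d
```

Because the numerator is the difference between the two estimates, D is negative when θπ is smaller than θW (an excess of rare variants) and positive when θπ is larger (an excess of intermediate-frequency variants).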
In table 3 we analyzed the coding sequences for their
level of divergence and polymorphism by means of the
McDonald-Kreitman (MK) test (1991). We treated common and rare polymorphism separately (Akashi 1999;
Fay and Wu 2002). The rationale is that common
polymorphisms are more likely to be neutral than the rare
Table 2
Polymorphismᵃ Data for Eight Sequenced Acp Genes
          5′ sites (5′)                 Replacement sites (A)         Synonymous sites (S)
Gene      θW       θπ       Tajima's D  θW       θπ       Tajima's D  θW       θπ       Tajima's D
CG8552    0.00899  0.00570  −1.319      0.01936  0.01955   0.034      0.03622  0.04755   1.094
CG8137    0.01626  0.01550  −0.171      0.00394  0.00303  −0.729      0.01619  0.01509  −0.219
CG17097   0.00999  0.00739  −0.900      0.00154  0.00095  −0.926      0.01944  0.01877  −0.112
CG31248   0.00300  0.00288  −0.130      0.00240  0.00280   0.348      0.00791  0.00826   0.095
CG9334    0.00980  0.00884  −0.349      0.00091  0.00031  −1.061      0.00300  0.00412   0.598
CG10956   0.01596  0.01644   0.110      0.00386  0.00341  −0.373      0.02290  0.02629   0.508
CG31872   0.00819  0.00890   0.309      0.00493  0.00297  −1.197      0.02624  0.03057   0.538
CG5333    0.00137  0.00099  −0.737      0.00218  0.00160  −0.783      0.00261  0.00089  −1.387
All       0.00860  0.00779  −0.366      0.00449  0.00386  −0.524      0.01727  0.01924   0.433
ᵃ Estimated using http://www.hgc.sph.uth.tmc.edu/fu/genealogy/test2/welcome.html; Fu 1997.
Table 3
Fixed and Polymorphic Sites for Eight Sequenced Acp Genes Classified by Derived Variant Frequencyᵃ
           5′ Sites (5′)               Replacement Sites (A)       Synonymous Sites (S)        5′/S                        A/S
Gene ID    Fixed  Polymorphic  Common  Fixed  Polymorphic  Common  Fixed  Polymorphic  Common  Fixed  Polymorphic  Common  Fixed  Polymorphic  Common
CG8552        73           27      11     12           26      14     10           19      15   7.30         1.42    0.73   1.20         1.36    0.93
CG8137        36           33      18     48            9       5     15           10       5   2.40         3.30    3.60   3.20         0.90    1.00
CG17097       21           17       7      5            3       1     14           11       7   1.50         1.55    1.00   0.36         0.27    0.14
CG31248       77           10       3      1            2       1     15            2       2   5.13         5.00    1.50   0.07         1.00    0.50
CG9334        55           22      14     24            1       0     10            1       1   5.50        22.00   14.00   2.40         1.00     N/A
CG10956       34           39      25     20           10       5     16           16      15   2.13         2.43    1.67   1.25         0.62    0.33
CG31872       75           22      14     11            7       1     15           11       7   5.00         2.00    2.00   0.73         0.64    0.14
CG5333        38            4       2     16            6       2     31            2       0   1.23         2.00     N/A   0.52         3.00     N/A
All          409          174      94    137           64      29    126           72      52   3.25         2.42    1.81   1.09         0.89    0.56
NOTE.—N/A = not applicable.
ᵃ Common frequency > 13%.
ones, which are often slightly deleterious (Cargill et al.
1999; Halushka et al. 1999; Fay, Wykoff, and Wu 2002;
Smith and Eyre-Walker 2002). The inclusion of the latter
may confound the analysis of positive selection. Indeed,
the ratio of common amino-acid polymorphism to
common synonymous site polymorphism (A/S) is 0.56,
much lower than the A/S ratio for the low-frequency
polymorphism (1.75 = 35/20). (Note that A/S generally
falls between 2.2 and 2.5 under strict neutrality, depending
on the amino acid composition and the ratio of transition to
transversion.) A decrease in the A/S ratio, when the variant
frequency increases, is indicative of weak selection against
amino acid polymorphism and is one of the most common
characteristics of coding sequence evolution (Fay, Wykoff,
and Wu 2001, 2002; Smith and Eyre-Walker 2002). The
A/S ratio for divergence (1.09) was about twice (1.95 =
1.09/0.56) as high as the A/S ratio (0.56) for common
polymorphism (table 3; G = 6.619, P = 0.010), possibly as
a result of positive selection. In contrast, the inclusion of
low-frequency polymorphism resulted in a much smaller
and nonsignificant difference (G = 0.905, P = 0.342)
between A/S divergence (1.09) and A/S polymorphism
(0.89) ratios (table 3).
In table 3 we also analyzed the 5′ regions versus
synonymous sites for their levels of divergence and
polymorphism. The contrast between the 5′ regions and
synonymous sites in their common polymorphism versus
divergence revealed a significant excess of divergence (G =
8.214, P = 0.004; table 3). Specifically, the 5′/S ratio for
divergence (3.25) was 1.8-fold as high as the 5′/S ratio
(1.81) for common polymorphism. Again, only a weak
signal for excess divergence in the 5′ region was evident
when all 174 polymorphic sites in the 5′ regions were
included in the analysis (G = 2.851, P = 0.091).
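The two McDonald-Kreitman contrasts based on common polymorphism can be recomputed from the pooled counts in table 3. The sketch below is an assumption about how such a test can be coded, not the authors' own script: the function name is hypothetical, the correction factor is the standard Williams formula for a 2 × 2 table, and the P value uses the chi-square distribution with one degree of freedom.

```python
import math

def g_test_williams(table):
    """G-test of independence for a 2x2 table with Williams' correction.

    table = [[fixed_x, fixed_s], [poly_x, poly_s]]; returns (G_adj, P).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = rows[i] * cols[j] / n
            if observed > 0:
                g += 2 * observed * math.log(observed / expected)
    # Williams' correction factor for a 2x2 table
    q = 1 + (n / rows[0] + n / rows[1] - 1) * (n / cols[0] + n / cols[1] - 1) / (6 * n)
    g_adj = g / q
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(g_adj / 2))
    return g_adj, p_value

# Pooled counts from table 3: fixed differences vs. common polymorphism
g_a, p_a = g_test_williams([[137, 126], [29, 52]])  # replacement vs. synonymous
g_5, p_5 = g_test_williams([[409, 126], [94, 52]])  # 5' region vs. synonymous
```

Under these assumptions the corrected statistics come out close to the values reported in the text (G of about 6.62 for replacement sites and about 8.21 for the 5′ regions).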
What the MK test can reveal is the excess/deficit of
divergence over polymorphism between two types of sites.
By itself, it does not suggest selection. Although excess is
often interpreted to mean positive selection, the interpretation depends on several assumptions (Fay and Wu 2002),
as will be discussed later. Nevertheless, if (and only if) the
assumptions are satisfied, the proportion of adaptive
substitutions between D. melanogaster and D. simulans
can be estimated based on a comparison between observed
levels of divergence and those predicted from common
synonymous polymorphism (table 3). Specifically, we
expected 228 [= (126/52) × 94] substitutions in the 5′
regions and 70 [= (126/52) × 29] amino acid replacement
substitutions between species. Compared with the observed 409 5′-region substitutions and 137 amino acid
substitutions, an excess of 181 and 67 substitutions,
respectively, can be inferred. The resulting proportion of
adaptive substitutions in the 5′ regions was 44% (181/
409), comparable to that of adaptive amino acid
substitutions at 49% (67/137) (fig. 2).
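The arithmetic behind these proportions can be written out as a short sketch. The function and argument names are illustrative assumptions; the counts are the pooled values from table 3, with synonymous sites serving as the neutral reference and only common polymorphism used.

```python
def adaptive_fraction(fixed_x, common_x, fixed_s, common_s):
    """Fraction of fixed differences at site class X in excess of the neutral expectation."""
    # Neutral expectation: scale common polymorphism at X by the synonymous fixed/common ratio
    expected_fixed = (fixed_s / common_s) * common_x
    return 1 - expected_fixed / fixed_x

alpha_5prime = adaptive_fraction(fixed_x=409, common_x=94, fixed_s=126, common_s=52)
alpha_replacement = adaptive_fraction(fixed_x=137, common_x=29, fixed_s=126, common_s=52)
print(f"5' regions: {alpha_5prime:.2f}, replacement sites: {alpha_replacement:.2f}")
# prints "5' regions: 0.44, replacement sites: 0.49"
```

The unrounded expectations are about 227.8 and 70.3 substitutions, which round to the 228 and 70 given in the text.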
The small number of polymorphic sites precluded
detailed tracking of the contribution each gene has made to
the overall amount of adaptive evolution (table 3). To see
if any of these eight genes contributed disproportionately
to the overall pattern, we omitted one gene at a time and
recalculated the proportion of adaptive evolution for the
reduced data set. The range of values is depicted in figure 2
as vertical bars. The proportion of adaptive substitution
remained high, regardless of the gene omitted. Exclusion
of the two genes with less certain 5′ regions (CG17097
and CG31872, cf. fig. 1) resulted in an estimate of 0.43 for
adaptive amino-acid divergence and 0.40 for adaptive 5′
region divergence.
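The leave-one-gene-out recalculation can be sketched from the per-gene counts in table 3. The dictionary layout, the key "5p" for the 5′ regions, and the function name are illustrative assumptions; each tuple holds (fixed, common polymorphic) sites.

```python
# Per-gene counts from table 3: (fixed, common polymorphic) for each site class
COUNTS = {
    "CG8552":  {"5p": (73, 11), "A": (12, 14), "S": (10, 15)},
    "CG8137":  {"5p": (36, 18), "A": (48, 5),  "S": (15, 5)},
    "CG17097": {"5p": (21, 7),  "A": (5, 1),   "S": (14, 7)},
    "CG31248": {"5p": (77, 3),  "A": (1, 1),   "S": (15, 2)},
    "CG9334":  {"5p": (55, 14), "A": (24, 0),  "S": (10, 1)},
    "CG10956": {"5p": (34, 25), "A": (20, 5),  "S": (16, 15)},
    "CG31872": {"5p": (75, 14), "A": (11, 1),  "S": (15, 7)},
    "CG5333":  {"5p": (38, 2),  "A": (16, 2),  "S": (31, 0)},
}

def adaptive_fraction(genes, site_class):
    """Proportion of adaptive substitutions for a site class, pooled over the given genes."""
    fixed_x = sum(COUNTS[g][site_class][0] for g in genes)
    common_x = sum(COUNTS[g][site_class][1] for g in genes)
    fixed_s = sum(COUNTS[g]["S"][0] for g in genes)
    common_s = sum(COUNTS[g]["S"][1] for g in genes)
    return 1 - (fixed_s / common_s) * common_x / fixed_x

all_genes = list(COUNTS)
for site_class in ("5p", "A"):
    # Omit one gene at a time and recompute the pooled estimate
    leave_one_out = [
        adaptive_fraction([g for g in all_genes if g != omitted], site_class)
        for omitted in all_genes
    ]
    print(site_class, round(min(leave_one_out), 2), round(max(leave_one_out), 2))

# Exclude the two genes with less certain 5' annotation
uncertain = {"CG17097", "CG31872"}
reduced = [g for g in all_genes if g not in uncertain]
```

Calling adaptive_fraction(reduced, "5p") and adaptive_fraction(reduced, "A") gives the reduced-set estimates quoted above (about 0.40 and 0.43).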
Insertions and deletions (indels) are often observed
in the noncoding regions but are exceedingly rare in
alignments of coding sequences between closely related
FIG. 2.—Proportion of adaptive substitutions in the 5′ regions (5′)
and at amino-acid replacement sites (A) summed over eight Acp genes of
Drosophila. Common polymorphism was analyzed separately because it
predominantly considers truly neutral mutations and thereby enhances the
power to detect positive selection compared to analyses that incorporate
all polymorphism. The vertical bars depict the range estimates of the
proportion of adaptive substitutions assumed when one gene was omitted
from the analysis at a time.
species. We found 42 indels that were fixed between
species and 18 that were polymorphic in D. melanogaster
(table 2 of the Supplementary Material online). These were
scored regardless of their size. When we compare fixed
and common indel polymorphism with the corresponding
synonymous polymorphism, the difference is significant (G
= 11.85; P = 0.0006). While an excess in the divergent
indels, vis-à-vis the polymorphic ones, may suggest
positive selection, we have too little information on the
underlying mutation characteristics of indels to be
confident about such an inference. This is also true with
the frequency spectrum of polymorphic indels. Only 3 of
the 18 polymorphic indels were common, i.e., had
a frequency higher than 10%. This is a stronger skew
toward rare variants than that seen for synonymous
polymorphism and may be taken as prima facie evidence
of negative selection on indels. However, it is prudent not
to reach such a conclusion until the mutation dynamics of
indels are understood in more detail (e.g., Comeron 2001).
Polymorphism in the 5′ Regions of Drosophila
in the Database
Because the eight Acp genes were chosen primarily
for detecting selection in the 5′ regions, we also compiled
sequence variation in the 5′ regions of Drosophila genes
from existing databases that presumably are less biased
with respect to selection. Specifically, we searched ~55.4
kilobase pairs (kb) spanning 97 experimentally studied
polymerase II promoters compiled in the Eukaryotic
Promoter Database (EPD; http://www.epd.isb-sib.ch; Praz
et al. 2002). To search for single-nucleotide polymorphic
sites (SNPs) and insertion/deletions (indels), these sequences were aligned against the cross-referenced sequence
entries given in the EPD. For comparison, the protein-encoding regions were also retrieved and searched for
variation. For 84 of the 97 entries, more than one
Drosophila sequence for both the 5′ region and coding
region was in the database. The average sample size was 3.3
(typically between 2 and 4); samples contained the non-African
strains y;cn bw sp, and/or Oregon R, and/or Canton S, and few
loci were deeply sampled. More than three-quarters (73 of
97) of the examined 5′ regions displayed SNP and/or indel
variation (table 4). There is thus ample variation in
Drosophila promoters that may conceivably result in
intraspecies expression differences (Stone and Wray 2001;
Rockman and Wray 2002) and that may therefore impose
constraints on the 5′ regions of genes.
In general terms, the observed levels of 5′ variation
within and between Drosophila species lend further
justification to studies searching for the underlying
molecular changes that may promote population adaptation and (incipient) species divergence in Drosophila at the
gene-expression level (e.g., Takahashi et al. 2001; Rockman and Wray 2002; Oleksiak, Churchill, and Crawford
2002; Michalak and Noor 2003). For example, annotation
of the examined promoters in table 4 for transcription
factor (TF) binding sites using optimized matrix recognition parameters for Drosophila (MatInspector 5.1; http://
genomatix.gsf.de; Quandt et al. 1995) predicted that SNP
and indel variation potentially could have notable effects
on transcription. Of the 73 promoters that displayed
polymorphism, 45 (~62%) had predicted TF sites that
were unique to one or another Drosophila strain, leading to
an average of two unique TF sites (range 1–7) per
examined strain (not shown). Some quantitative variation
in Drosophila may be due to regulatory mutations (Gibson
and MacKay 2002), and recent simulation studies have
documented the possibility of fixation of new functional
elements in Drosophila 5′ regions (Stone and Wray 2001;
Dermitzakis, Bergman, and Clark 2003).
We contrast the selective constraints between the 5′
regions and the coding regions, as well as between
different parts of the 5′ regions. It is clear from table 4 that
there may be strong selective constraints on the 5′ UTR
and the 50 bp chosen to represent the core promoter (e.g.,
Smale 2001), amounting to about 200 bp immediately
upstream of the translation start in this particular data set.
The average level of SNP polymorphism in these regions
is about half that at the synonymous sites (0.46 for the core
promoter regions and 0.59 for the 5′ UTR). Interestingly,
the further 350–400 bp upstream appear to be much less
constrained, as the level of polymorphism in these regions
is 95% of the level at the synonymous sites (table 4).
Nevertheless, from a technical standpoint it should not be
concluded that the distal 5′ region is neutral because the
small sample sizes precluded the partitioning of SNPs by
frequency, which might have informed us about negative
selection. Indels were found about half as frequently as
SNPs (table 4).
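The SNP ratios quoted above are each region's diversity estimate divided by the estimate at synonymous sites. As a minimal sketch, the arithmetic can be reproduced from the per-site values as we read them from table 4; the dictionary keys and the function name are illustrative choices, not part of the original analysis.

```python
# Per-site Watterson's theta for SNPs, as read from table 4.
THETA_SNP = {
    "distal 5' region": 9.0e-3,
    "core promoter": 4.4e-3,
    "5' UTR": 5.6e-3,
    "replacement sites (A)": 2.5e-3,
}
THETA_SYNONYMOUS = 9.5e-3  # reference class: synonymous sites (S)


def ratio_to_synonymous(theta, theta_s=THETA_SYNONYMOUS):
    """Return theta / theta_S rounded to two decimals."""
    return round(theta / theta_s, 2)


ratios = {region: ratio_to_synonymous(t) for region, t in THETA_SNP.items()}
for region, r in ratios.items():
    print(f"{region}: {r:.2f}")
```

The rounded values agree with the SNP ratio column of table 4 (0.95, 0.46, 0.59, and 0.26).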
Table 4
Summary of Variation Data on Drosophila Genes in the Database, the Partitioning of Variation(a) Among 5′ Regulatory
Regions, and Comparison of 5′ Variation to Synonymous (S) and Replacement (A) Site Variation
Region (positions) | bp(b) (per locus average) | θ, SNPs | θ, indels(c) | SNP ratio: θ/θS | θS > θ(d)
Distal 5′ region (−499 till −50) | 35,987 (375) | 9.0 × 10⁻³ | 2.9 × 10⁻³ | 0.95 | n.s.
Proximal 5′ region:
  Core promoter (−50 till +1) | 4,746 (50) | 4.4 × 10⁻³ | 3.6 × 10⁻³ | 0.46 | P < 0.005
  5′ UTR (+1 till AUG) | 14,999 (163) | 5.6 × 10⁻³ | 3.2 × 10⁻³ | 0.59 | P = 0.025
Coding region(e): | 89,619 (1,134) | | n.d. | |
  Synonymous sites (S) | | 9.5 × 10⁻³ | | 1.00 | n.a.
  Replacement sites (A) | | 2.5 × 10⁻³ | | 0.26 | P < 0.005
NOTE.—n.s., not significant at α = 0.05 in two-tailed test; n.d., not determined; n.a., not applicable.
(a) Watterson’s (1975) diversity estimate θW computed using DnaSP v3.51 (Rozas and Rozas 1999). X-linked genes adjusted for reduced population size. Loci
weighted equally during computation of means (Andolfatto 2001).
(b) Length and strain coverage varied due to availability of cross-references in EPD.
(c) Indels scored as single mutations regardless of size.
(d) One-tailed signed-ranks test assuming independence of loci (Andolfatto 2001).
(e) Genes with alternative promoters included only once.
Discussion
In interpreting the MK test results in light of
selection, we arrived at the following three postulates:
1. We assumed that selective constraint on the coding
regions and 5′ regions had remained constant since the
divergence between D. melanogaster and D. simulans.
Population size expansion of D. melanogaster (Aquadro,
DuMont, and Reed 2001), for example, would
have made selective constraints weaker in the past than
in the present, thus leading to an inflated type I error in
the MK test. This possibility of relaxed selective
constraints owing to effective population size change of
D. melanogaster in the past has not been found to be
a major factor when the excess of amino-acid replacement site divergence is examined, or when African
and non-African Drosophila are analyzed separately
(Fay, Wyckoff, and Wu 2002). However, fluctuation in
the selective constraints from factors other than
population size (Fay and Wu 2002) has not been ruled
out.
2. The MK test uses levels of polymorphism and
divergence at synonymous sites as a reference point
to be compared to the corresponding levels of
polymorphism and divergence at amino-acid replacement sites. We extend this principle to sites located in
the 5′ regulatory region of genes. The inferred selection
on the amino-acid replacement sites and 5′ regulatory
sites then reflects, strictly speaking, merely the
differential selection between these types of changes
and synonymous changes. Therefore it is not necessary
to assume strict neutrality for synonymous changes. If
known, nonfunctional sites that are located within the
regulatory regions may be compared with those that are
known to be functional (Jenkins, Ortori, and Brookfield
1995; Ludwig and Kreitman 1995). Efforts to develop
annotation procedures applicable to regulatory regions
throughout the Drosophila genome are ongoing
(Berman et al. 2002; Bergman et al. 2002), and soon
may allow for hypothesis-driven analyses that
contrast patterns of polymorphism and divergence
among thousands of putative functional and nonfunctional noncoding sites.
3. We partitioned the polymorphism into common and
rare variants because, as explained earlier, the former
are more likely to approximate neutral variants than the
latter (Akashi 1999; Cargill et al. 1999; Halushka et al.
1999; Fay, Wyckoff, and Wu 2001, 2002; Smith and
Eyre-Walker 2002).
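The three postulates can be made concrete with a small sketch of an MK-style comparison. It assumes a 10% minor-allele frequency cutoff for "common" variants and uses Fisher's exact test on the 2 × 2 table of divergence versus common polymorphism; the exact test is one common choice and is an assumption here, as is every name in the code. The allele counts and divergence counts are hypothetical and are not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def count_common(minor_allele_counts, n_lines, min_freq=0.10):
    """Count polymorphic sites whose minor-allele frequency is >= min_freq."""
    return sum(1 for k in minor_allele_counts if k / n_lines >= min_freq)


def mk_comparison(test_poly, test_div, ref_poly, ref_div):
    """Return (excess ratio, P) for test sites (e.g., 5') against reference sites (S)."""
    excess = (test_div / ref_div) / (test_poly / ref_poly)
    return excess, fisher_exact_two_sided(test_div, test_poly, ref_div, ref_poly)


# Hypothetical minor-allele counts among 21 lines (NOT data from the paper).
five_prime_counts = [1, 1, 2, 3, 5, 8, 1, 10, 4]
synonymous_counts = [1, 3, 4, 6, 7, 9, 2, 5, 10, 3, 8, 1]
p5 = count_common(five_prime_counts, 21)
ps = count_common(synonymous_counts, 21)

# Hypothetical fixed differences: 20 at 5' sites, 18 at synonymous sites.
excess, p_value = mk_comparison(p5, 20, ps, 18)
print(p5, ps, round(excess, 2), round(p_value, 3))
```

An excess ratio above 1 means that the test class has more divergence, relative to the synonymous reference, than its common polymorphism would predict.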
These three postulates apply to both the coding and 5′
regulatory regions; the latter need additional input,
however. In general, what we have observed in the 5′
regions of eight different Acp genes can be summarized as
follows: the level of polymorphism is only 40% as high as
at the synonymous sites (table 2), but the level of 5′
divergence is 62% as high as the synonymous divergence
(table 1). That negative selection is a major factor
contributing to this reduction in polymorphism levels in
the 5′ regions seems indisputable. But what might account
for the smaller reduction in the 5′ divergence relative to
divergence at synonymous sites? Either there is too little 5′
polymorphism or there is too much 5′ divergence. These
two possibilities are discussed below.
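The size of the discrepancy can be put in numbers. The sketch below assumes the usual MK-style reasoning: if polymorphism reflects the neutral expectation, then a fraction 1 − 1/r of the fixed differences is in excess, where r is the 5′/S divergence ratio divided by the 5′/S polymorphism ratio. The function names are illustrative.

```python
def excess_ratio(divergence_ratio, polymorphism_ratio):
    """Divergence (test/S) relative to polymorphism (test/S)."""
    return divergence_ratio / polymorphism_ratio


def excess_fraction(r):
    """Fraction of fixed differences in excess of the neutral expectation."""
    return 1.0 - 1.0 / r


# All polymorphism: divergence 62% and polymorphism 40% of the synonymous level.
r_all = excess_ratio(0.62, 0.40)
frac_all = excess_fraction(r_all)

# Common polymorphism only: the 1.8-fold excess stated in the Abstract.
frac_common = excess_fraction(1.8)

print(round(r_all, 2), round(frac_all, 3), round(frac_common, 3))
```

With the 1.8-fold excess for common polymorphism, the fraction is about 0.44, in line with the 45 to 50 percent given in the Abstract.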
First, the 5′ regions may have a genealogical history
different from that of the corresponding coding regions
and, merely by chance, happen to be uniformly less
polymorphic. Although this may be true for any individual
gene, the 5′ regions of the eight genes collectively are also
significantly less polymorphic (table 2); hence, chance
alone is unlikely to account for the difference. Moreover,
a reduction in polymorphism due to chance, much like the
bottleneck effect (Tajima 1989b), should affect rare alleles
more than common alleles and hence would result in
positive Tajima’s D. This is opposite to the pattern in table 2
and, in addition, we eliminated rare (<10% frequency)
alleles from our MK analysis.
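The sign argument for Tajima's D can be checked with a short sketch of the statistic as defined by Tajima (1989a). The inputs are whole-locus quantities (number of segregating sites and mean number of pairwise differences); the example values are hypothetical and serve only to show how a deficit or an excess of rare alleles changes the sign.

```python
from math import sqrt


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites, and the
    mean number of pairwise differences pi (per locus, not per site)."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need n >= 2 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimate for the locus
    return (pi - theta_w) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Hypothetical locus: 21 lines, 20 segregating sites.
d_few_rare = tajimas_d(21, 20, 8.0)   # pi above Watterson's estimate
d_many_rare = tajimas_d(21, 20, 3.0)  # pi below Watterson's estimate
print(round(d_few_rare, 2), round(d_many_rare, 2))
```

When rare alleles are depleted, pi exceeds Watterson's estimate and D is positive; an excess of rare alleles gives a negative D.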
Second, the indirect effect selection has on 5′ sites
may be stronger than its effect on synonymous sites. For
example, deleterious mutations reduce the level of polymorphism in their vicinity (the background selection effect
[Charlesworth, Morgan, and Charlesworth 1994; Charlesworth 1996]). However, background selection should be
stronger in the coding region than in the 5′ region, as
selection is stronger against amino-acid replacement
changes (tables 1 and 2). Alternatively, selective sweeps
may be more intense in the 5′ regions than in the coding
regions and may therefore lead to lower levels of polymorphism in the 5′ regions when compared to the coding
regions. This suggestion would be the equivalent of the
third explanation listed above: positive selection plays
a role in the 5′ divergence as in the standard interpretation
of the MK test.
For these reasons, we interpret our observation to be
due to the effect of selection on the 5′ versus synonymous
sites in general, rather than on any individual gene
specifically. With this in mind, our analysis of 5′ regions
from eight Acp genes and a published survey of Drosophila promoter variation revealed both positive and
negative selection on them. The possibility that negative
selection is pervasive in 5′ regulatory regions may generally apply to Drosophila genes, whether they function in
reproduction (such as Acp genes; tables 1–3), housekeeping, or development (table 4; Tautz and Nigro 1998;
Dermitzakis, Bergman, and Clark 2003; Hahn, Stajich, and
Wray 2003). However, even though the opportunity for
positive selection to act on regulatory regions may
frequently exist (Stone and Wray 2001; Dermitzakis,
Bergman, and Clark 2003), evidence for it has thus far
emerged only from this analysis of Acp genes involved in
sexual reproduction (table 3). Our intraspecies comparisons suggested that evolutionary constraint might be
unequally partitioned within the 5′ upstream regions (table
4). Moreover, our results suggest that there is no clear
correlation between negative and positive selection. This
can be deduced from the observation that the 5′ regions
experienced as much positive selection as coding regions
but lower levels of negative selection (tables 1–3). The
lower estimate of constraint in 5′ regions may reflect that
not all sites are likely to be of functional importance, that
regulatory motifs are labile and depend on sequence
context, and that selection modes may be complex.
Despite the limitations imposed on our analysis by
data availability, it is clear from table 4 that regulatory
polymorphism is a general feature of Drosophila. The
pervasiveness of negative selection on a subset of 5′ sites
that coincide with the core promoters and 5′ UTRs is
hinted at by the available data presented in table 4. Also
from table 4, these functional sections that appear to
experience higher levels of negative selection occupy
about 200 bp. Human genomic SNP data covering proximal 5′ regions (mostly 5′ UTRs) suggested weak levels of
constraint on them, as revealed by similar levels of rare
and common polymorphism (Fay, Wyckoff, and Wu 2001).
In contrast, noncoding sequences distal (~9 kb) from
human genes displayed patterns of variation that were
compatible with their neutrality (Zhao et al. 2000).
Negative selection on the 5′ regions should be more
pervasive in Drosophila than in humans
because of the larger effective population size of the
former (Aquadro, DuMont, and Reed 2001).
Broad-scale analysis of noncoding sequence polymorphism and divergence in Drosophila and other species
will be needed to confirm and refine our results, and to
systematically expand the search for the signature of
positive selection in the regulatory regions of genes.
Acknowledgments
We are indebted to Mao-Lien Wu and Steve Dorus
for help with sequencing and Chia-Ling Hu for help with
the maintenance of Drosophila lines. We thank Willie
Swanson, Bettina Harr, Casey Bergman, Justin Fay, Ines
Hellmann, and Kevin Thornton for discussion. We also
thank the Editor and two reviewers for the suggested
improvements of our manuscript. The study was supported
by grants from the National Institutes of Health and the
National Science Foundation.
Literature Cited
Adams, M. D., S. E. Celniker, R. A. Holt et al. (195 co-authors).
2000. The genome sequence of Drosophila melanogaster.
Science 287:2185–2195.
Aguade, M., N. Miyashita, and C. H. Langley. 1992. Polymorphism and divergence in the Mst26A male accessory gland
gene region in Drosophila. Genetics 132:755–70.
Akashi, H. 1995. Inferring weak selection from patterns of
polymorphism and divergence at ‘‘silent’’ sites in Drosophila
DNA. Genetics 139:1067–1076.
———. 1999. Within- and between-species DNA sequence
variation and the ‘‘footprint’’ of natural selection. Gene
238:39–51.
Andolfatto, P. 2001. Contrasting patterns of X-linked and
autosomal nucleotide variation in Drosophila melanogaster
and Drosophila simulans. Mol. Biol. Evol. 18:279–290.
Aquadro, C. F., V. DuMont, and F. A. Reed. 2001. Genome-wide
variation in the human and fruitfly: a comparison. Curr. Opin.
Genet. Dev. 11:627–634.
Begun, D. J., P. Whitley, B. L. Todd, H. M. Waldrip-Dail, and
A. G. Clark. 2000. Molecular population genetics of male
accessory gland proteins in Drosophila. Genetics 156:1879–
1888.
Bergman, C. M., B. D. Pfeiffer, D. E. Rincon-Limas et al. (17 co-authors). 2002. Assessing the impact of comparative genomic
sequence data on the functional annotation of the Drosophila
genome. Genome Biol. 3:research0086.1–0086.20.
Berman, B. P., Y. Nibu, B. D. Pfeiffer, P. Tomancak, S. E.
Celniker, M. Levine, G. M. Rubin, and M. B. Eisen. 2002.
Exploiting transcription factor binding site clustering to
identify cis-regulatory modules involved in pattern formation
in the Drosophila genome. Proc. Natl. Acad. Sci. USA
99:757–762.
Betancourt, A. J., D. C. Presgraves, and W. J. Swanson. 2002. A
test for faster X evolution in Drosophila. Mol. Biol. Evol.
10:1816–1819.
Cargill, M., D. Altshuler, J. Ireland et al. (18 co-authors). 1999.
Characterization of single-nucleotide polymorphism in coding
regions of human genes. Nat. Genet. 22:231–238.
Celniker, S. E., D. A. Wheeler, B. Kronmiller et al. (32 co-authors). 2002. Finishing a whole-genome shotgun: Release 3
of the Drosophila euchromatic genome sequence. Genome
Biol. 3:research0079.1–0079.14.
Chapman, T. 2001. Seminal fluid–mediated fitness traits in
Drosophila. Heredity 87:511–521.
Charlesworth, B. 1996. Background selection and patterns of
genetic diversity in Drosophila melanogaster. Genet. Res.
2:131–149.
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1994. The
effect of deleterious mutations on neutral molecular variation.
Genetics 4:1289–1303.
Clark, A. G., M. Aguade, T. Prout, L. G. Harshman, and C. H.
Langley. 1995. Variation in sperm displacement and its
association with accessory gland protein loci in Drosophila
melanogaster. Genetics 139:189–201.
Comeron, J. M. 2001. What controls the length of noncoding
DNA. Curr. Opin. Genet. Dev. 11:652–659.
Dermitzakis, E. T., C. M. Bergman, and A. G. Clark. 2003.
Tracing the evolutionary history of Drosophila regulatory
regions with models that identify transcription factor binding
sites. Mol. Biol. Evol. 20:703–714.
Fang, S., A. Takahashi, and C. I. Wu. 2002. A mutation in the
promoter of desaturase 2 is correlated with sexual isolation
between Drosophila behavioral races. Genetics 162:781–
784.
Fay, J. C., and C. I. Wu. 2002. The neutral theory in the genomic
era. Curr. Opin. Genet. Dev. 6:642–646.
Fay, J. C., G. J. Wyckoff, and C.-I. Wu. 2001. Positive and
negative selection on the human genome. Genetics 158:1227–
1234.
———. 2002. Testing the neutral theory of molecular evolution
with genomic data from Drosophila. Nature 415:1024–1026.
Fessele, S., H. Maier, C. Zischek, P. J. Nelson, and T. Werner.
2002. Regulatory context is crucial part of gene function.
Trends Genet. 18:60–63.
Filatov, D. A., F. I. Moneger, I. Negrutiu, and D. Charlesworth.
2000. Low variability in a Y-linked plant gene and its implications for Y-chromosome evolution. Nature 404:388–390.
Florea, L., G. Hartzell, Z. Zhang, G. M. Rubin, and W. Miller.
1998. A computer program for aligning a cDNA sequence
with a genomic DNA sequence. Genome Res. 8:967–974.
Fu, Y. X. 1997. Statistical tests of neutrality of mutations against
population growth, hitchhiking and background selection.
Genetics 147:915–925.
Gibson, G., and T. F. Mackay. 2002. Enabling population and
quantitative genomics. Genet. Res. 1:1–6.
Hahn, M. W., J. E. Stajich, and G. A. Wray. 2003. The effects of
selection against spurious transcription factor binding sites.
Mol. Biol. Evol. 20:901–906.
Halushka, M. K., J. B. Fan, K. Bentley, L. Hsie, N. Shen, A.
Weder, R. Cooper, R. Lipshutz, and A. Chakravarti. 1999.
Patterns of single-nucleotide polymorphisms in candidate
genes for blood-pressure homeostasis. Nat. Genet. 22:
239–247.
Hollocher, H., C.-T. Ting, M.-L. Wu, and C.-I. Wu. 1997.
Incipient speciation by sexual isolation in Drosophila
melanogaster: extensive genetic divergence without reinforcement. Genetics 147:1191–1201.
Jenkins, D. L., C. A. Ortori, and J. F. Brookfield. 1995. A test for
adaptive change in DNA sequences controlling transcription.
Proc. R. Soc. Lond. Ser. B Biol. Sci. 261:203–207.
Kimura, M. 1968. Evolutionary rate at the molecular level.
Nature 217:624–626.
———. 1980. A simple method for estimating evolutionary rates
of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
King, J. L., and T. H. Jukes. 1969. Non-Darwinian evolution.
Science 164:788–798.
Lemon, B., and R. Tjian. 2000. Orchestrated response: a symphony of transcription factors for gene control. Genes Dev.
14:2551–2569.
Li, W. H. 1997. Molecular Evolution. Sinauer Associates,
Sunderland, Mass.
Ludwig, M. Z. 2002. Functional evolution of noncoding DNA.
Curr. Opin. Genet. Dev. 12:634–639.
Ludwig, M. Z., and M. Kreitman. 1995. Evolutionary dynamics
of the enhancer region of even-skipped in Drosophila. Mol.
Biol. Evol. 12:1002–1011.
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000.
Evidence for stabilizing selection in a eukaryotic enhancer
element. Nature 403:564–567.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein
evolution at the Adh locus in Drosophila. Nature 351:652–654.
Michalak, P., and M. A. F. Noor. 2003. Genome-wide patterns of
expression in Drosophila pure species and hybrid males. Mol.
Biol. Evol. 20:1070–1076.
Misra, S., M. A. Crosby, C. J. Mungall et al. (30 co-authors).
2002. Annotation of the Drosophila melanogaster genome:
a systematic review. Genome Biol. 3:research0083.1–
0083.22.
Nei, M. 1987. Molecular evolutionary genetics. Columbia
University Press, New York.
Ohta, T. 1973. Slightly deleterious mutant substitutions in
evolution. Nature 246:96–98.
Oleksiak, M. F., G. A. Churchill, and D. L. Crawford. 2002.
Variation in gene expression within and among natural
populations. Nat. Genet. 2:261–266.
Praz, V., R. C. Périer, C. Bonnard, and P. Bucher. 2002. The
Eukaryotic Promoter Database, EPD: new entry types and
links to gene expression data. Nucleic Acids Res. 30:322–324.
Quandt, K., K. Frech, H. Karas, E. Wingender, and T. Werner.
1995. MatInd and MatInspector: new fast and versatile tools
for detection of consensus matches in nucleotide sequence
data. Nucleic Acids Res. 23:4878–4884.
Raff, R. A. 1996. The shape of life: genes, development, and the
evolution of animal form. University of Chicago Press,
Chicago.
Rockman, M. V., and G. A. Wray. 2002. Abundant raw material
for cis-regulatory evolution in humans. Mol. Biol. Evol.
19:1991–2004.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated
program for molecular population genetics and molecular
evolution analyses. Bioinformatics 15:174–175.
Rozen, S., and H. J. Skaletsky. 2000. Primer3 on the WWW for
general users and for biologist programmers. Pp. 365–386 in
S. Krawetz and S. Misener, eds, Bioinformatics methods and
protocols: methods in molecular biology. Humana Press,
Totowa, N.J.
Sawyer, S. A., and D. L. Hartl. 1992. The population genetics of
polymorphism and divergence. Genetics 132:1161–1176.
Smale, S. T. 2001. Core promoters: active contributors to
combinatorial gene expression. Genes Dev. 15:2503–2508.
Smith, G. M., K. Rothwell, S. L. Wood, S. J. Yeaman, and M.
Bownes. 1994. Specificity and localization of lipolytic activity in adult Drosophila melanogaster. Biochem. J. 304:
775–779.
Smith, N. G. C., and A. Eyre-Walker. 2002. Adaptive protein
evolution in Drosophila. Nature 415:1024–1026.
Stapleton, M., J. Carlson, P. Brokstein et al. (15 co-authors).
2002. A Drosophila full-length cDNA resource. Genome Biol.
3:research0080.1–0080.8.
Stone, J. R., and G. A. Wray. 2001. Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol.
Evol. 18:1764–1770.
Sunyaev, S. R., W. C. Lathe 3rd, V. E. Ramensky, and P. Bork.
2000. SNP frequencies in human genes and excess of rare
alleles and differing modes of selection. Trends Genet.
16:335–337.
Swanson, W. J., A. G. Clark, H. M. Waldrip-Dail, M. F. Wolfner, and
C. F. Aquadro. 2001. Evolutionary EST analysis identifies
rapidly evolving male reproductive proteins in Drosophila.
Proc. Natl. Acad. Sci. USA 13:7375–7379.
Tajima, F. 1989a. Statistical methods for testing the neutral
mutation hypothesis by DNA polymorphism. Genetics
123:585–595.
———. 1989b. The effect of change in population size on DNA
polymorphism. Genetics 123:597–601.
Takahashi, A., S. C. Tsaur, J. A. Coyne, and C. I. Wu. 2001. The
nucleotide changes governing cuticular hydrocarbon variation
and their evolution in Drosophila melanogaster. Proc. Natl.
Acad. Sci. USA 98:3920–3925.
Tautz, D. 2000. Evolution of transcriptional regulation. Curr.
Opin. Genet. Dev. 10:575–579.
Tautz, D., and L. Nigro. 1998. Microevolutionary divergence
pattern of the segmentation gene hunchback in Drosophila.
Mol. Biol. Evol. 15:1403–1411.
Tsaur, S. C., and C. I. Wu. 1997. Positive selection and the
molecular evolution of a gene of male reproduction, Acp26Aa
of Drosophila. Mol. Biol. Evol. 14:544–549.
Tsaur, S. C., C. T. Ting, and C. I. Wu. 1998. Positive selection
driving the evolution of a gene of male reproduction,
Acp26Aa, of Drosophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 8:1040–1046.
Watterson, G. A. 1975. On the number of segregating sites.
Theor. Popul. Biol. 7:256–276.
Wingender, E., X. Chen, R. Hehl, et al. (11 co-authors). 2000.
Transfac: an integrated system for gene expression regulation.
Nucleic Acids Res. 28:316–319.
Wolfner, M. F. 2002. The gifts that keep on giving: physiological
functions and evolutionary dynamics of male seminal proteins
in Drosophila. Heredity 88:85–93.
Wolfner, M. F., H. A. Harada, M. J. Bertram, T. J. Stelick, K. W.
Kraus, J. M. Kalb, Y. O. Lung, D. M. Neubaum, M. Park, and
U. Tram. 1997. New genes for male accessory gland proteins
in Drosophila melanogaster. Insect Biochem. Mol. Biol.
10:825–834.
Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution
of male reproductive genes in the descent of man. Nature
403:304–309.
Zhao, Z., Y.-X. Fu, M. Ramsay, T. Jenkins et al. (13 co-authors). 2000. Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22. Proc. Natl. Acad. Sci. USA
28:316–319.
Diethard Tautz, Associate Editor
Accepted September 29, 2003