Download PDF

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Vectors in gene therapy wikipedia , lookup

Epigenetics in learning and memory wikipedia , lookup

Koinophilia wikipedia , lookup

Gene desert wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Public health genomics wikipedia , lookup

Gene expression programming wikipedia , lookup

Copy-number variation wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

History of genetic engineering wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Point mutation wikipedia , lookup

NEDD9 wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Ridge (biology) wikipedia , lookup

Genome (book) wikipedia , lookup

Genomic imprinting wikipedia , lookup

Transposable element wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Genomic library wikipedia , lookup

Non-coding DNA wikipedia , lookup

Designer baby wikipedia , lookup

Human genome wikipedia , lookup

Gene wikipedia , lookup

Minimal genome wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Gene expression profiling wikipedia , lookup

Microevolution wikipedia , lookup

Pathogenomics wikipedia , lookup

RNA-Seq wikipedia , lookup

Genomics wikipedia , lookup

Metagenomics wikipedia , lookup

Genome editing wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Helitron (biology) wikipedia , lookup

Genome evolution wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
RESEARCH ARTICLE 2761
Development 138, 2761-2771 (2011) doi:10.1242/dev.065227
© 2011. Published by The Company of Biologists Ltd
Direct targets of the D. melanogaster DSXF protein and the
evolution of sexual development
Shengzhan D. Luo1,*, Guang W. Shi2 and Bruce S. Baker1,*,†
SUMMARY
Uncovering the direct regulatory targets of doublesex (dsx) and fruitless (fru) is crucial for an understanding of how they regulate
sexual development, morphogenesis, differentiation and adult functions (including behavior) in Drosophila melanogaster. Using a
modified DamID approach, we identified 650 DSX-binding regions in the genome from which we then extracted an optimal
palindromic 13 bp DSX-binding sequence. This sequence is functional in vivo, and the base identity at each position is important
for DSX binding in vitro. In addition, this sequence is enriched in the genomes of D. melanogaster (58 copies versus
approximately the three expected from random) and in the 11 other sequenced Drosophila species, as well as in some other
Dipterans. Twenty-three genes are associated with both an in vivo peak in DSX binding and an optimal DSX-binding sequence,
and thus are almost certainly direct DSX targets. The association of these 23 genes with optimum DSX binding sites was used to
examine the evolutionary changes occurring in DSX and its targets in insects.
INTRODUCTION
Although many of the regulatory hierarchies governing
development have been elucidated, progress has been slow in
identifying the direct targets of the terminal regulatory genes in
these hierarchies. These difficulties arise from: (1) few target genes
being unique to one hierarchy; (2) difficulties in distinguishing
direct from indirect targets; and (3) in vitro defined binding sites of
transcription factors occurring by chance many times in the
genome. Technological advances such as ChIP-seq solve some of
these difficulties, but challenges remain for transcription factors
that lack a good antibody or regulate their targets in restricted
tissues (Park, 2009). The sex hierarchy of Drosophila melanogaster
exhibits all of these features that make it difficult to identify direct
targets of the terminal genes in regulatory hierarchies.
Our interest here is in identifying the direct targets of doublesex
(dsx), a terminal gene in the fly sex hierarchy. dsx transcription is
highly regulated, as evidenced by its temporally and spatially
restricted expression patterns through development (Robinett et al.,
2010; Rideout et al., 2010). dsx pre-mRNAs are sex-specifically
spliced (Ryner et al., 1996; Burtis and Baker, 1989), resulting in
adults in whom some cells express DSXM or DSXF, and are thus
either male or female, respectively, and other cells do not express
these transcription factors.
DSXM and DSXF share identical DNA-binding domains, but
have different C termini (Burtis et al., 1991; Erdman and Burtis,
1993). The DSX proteins specify nearly all somatic sexual
differences outside the nervous system, as well as several nervous
system sexual dimorphisms (Lee et al., 2002; Sander and
1
Biology Department, Stanford University, Stanford, CA 94305, USA. 2Department
of Pathology, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA.
*Present address: Janelia Farm Research Campus, Howard Hughes Medical Institute,
19700 Helix Drive, Ashburn, VA 20147, USA
†
Author for correspondence ([email protected])
Accepted 22 April 2011
Arbeitman, 2008; Robinett et al., 2010; Rideout et al., 2010;
Mellert et al., 2010) (for reviews, see Christiansen et al., 2002;
Camara et al., 2008).
Although some genes whose activity levels are dependent on dsx
have been identified by a candidate gene approach, most have been
identified via plus/minus, microarray, SAGE or enhancer trapbased screens for genes expressed sex-differentially in the soma
(Shirangi et al., 2009; Lebo et al., 2009; Chatterjee et al., 2011) (for
reviews, see Christiansen et al., 2002; Camara et al., 2008). These
screens identified dozens of genes whose activity levels are
governed by dsx, but did not distinguish direct from indirect
targets. Studies of these dsx-regulated genes showed that the DSX
proteins largely act by modulating in some tissues the activities of
genes that are also used sex-nonspecifically in other tissues. The
array of processes regulated by dsx is complex, as different genes
are regulated by dsx in all cell and tissue types examined (for
reviews, see Christiansen et al., 2002; Camara et al., 2008). Of the
dsx-regulated genes that have been sufficiently characterized to
distinguish likely direct and indirect targets, the vast majority are
indirect targets (Arbeitman et al., 2004; Chapman and Wolfner,
1988; Wolfner, 1988). The few identified direct targets are the three
Yolk protein (Yp) genes (Burtis et al., 1991; Coschigano and
Wensink, 1993; Hutson and Bownes, 2003), bric à brac 1 (bab1)
(Williams et al., 2008) and Fad2 (Shirangi et al., 2009).
The paucity of direct DSX targets limits understanding of
important aspects of sexual biology. First, we have little knowledge
of the molecular links between dsx and the biological processes it
controls. Second, understanding how the functions of the sex
hierarchy and other patterning hierarchies are integrated during
development requires the identification of direct DSX targets
(Christiansen et al., 2002; Williams et al., 2008). Third, not
knowing the direct DSX targets seriously constrains studies of the
molecular evolution of sex. To open up these topics for more
systematic analysis, we carried out a screen for direct DSXF targets.
DSXF and DSXM bind with similar kinetics to identical
sequences in the Yp1 and Yp2 common promoter (Burtis et al.,
1991; Erdman and Burtis, 1993; Coschigano and Wensink, 1993).
DEVELOPMENT
KEY WORDS: DamID, Doublesex, Evolution, Sex determination, Transcription factor binding site
2762 RESEARCH ARTICLE
MATERIALS AND METHODS
Plasmids construction
To construct a FLP-activated DSXF-Dam plasmid, the coding region of the
E. coli Dam methyltransferase was cut from pCMycDam (a gift from Drs
van Steensel and Henikoff, Fred Hutchinson Cancer Research Center,
Seattle, WA) with EcoRI/XbaI, and inserted into pP{UAST} between EcoRI
and XbaI sites. The fragment encoding a V5-tagged DSXF was digested
from pMTDSXFV5 (Garrett-Engele et al., 2002), and inserted in-frame into
the above plasmid between EcoRI and AgeI sites. An FRT-stop-FRT
cassette was inserted into the EcoRI site to generate the final FLP-activated
DSXF-Dam construct (fsfDSXF-Dam).
To generate a DNA-binding mutant for DSXF-Dam, site-directed
mutagenesis was performed using primers 5⬘-TCTGCAAACGGCCTTGctgcagGCCCAGGCGCAGGAT-3⬘ and 5⬘-ATCCTGCGCCTGGGCctgcagCAAGGCCGTTTGCAGA-3⬘ (the lowercase letters indicate the
mutated sequences). This resulted in the substitutions of R90L and R91Q
in the DSX protein, and the mutated plasmid was confirmed by sequencing
to be free of other mutations. An 820 bp BglII/XhoI fragment containing
the mutated sequence was used to replace the wild-type sequence in
fsfDSXF-Dam, resulting in the plasmid fsfR91Q-Dam.
To construct a FLP-activated Dam control plasmid, the coding region of
Myc-tagged E. coli Dam methyltransferase was digested from pNdamMyc
(a gift from Drs van Steensel and Henikoff) with EcoRI/SalI, and cloned
into P{UAST} between EcoRI and XhoI sites. An FRT-stop-FRT cassette
was inserted at the EcoRI site and the correct orientation selected,
generating the final plasmid fsfNdam.
To generate bacterial expression plasmids for DSXF, the coding
sequence from the fsfDSXF-Dam was PCR amplified using primers 5⬘ATGGTTTCGGAGGAGAACTGGA-3⬘ and 5⬘-CGTAGAATCGAGACCGAGGAGA-3⬘. The PCR products were cloned into pCR-Blunt-TOPO
plasmid (Invitrogen, Carlsbad, CA), with the correct orientation confirmed
by sequencing.
Eletrophoretic mobility shift assay
Eletrophoretic mobility shift assays were carried out as described
previously (Erdman et al., 1996) with modifications. Bacteria expressing
the V5-tagged DSXF (in pCR-Blunt-TOPO) were cultured in 5 ml of LB
medium overnight at 30°C, then induced with 1 mM IPTG for 3 hours and
harvested by centrifugation at 4°C. The cells were re-suspended in 300 ml
of Z-50 buffer (Erdman et al., 1996) and briefly sonicated followed by
centrifuging at 15,000 g at 4°C for 5 minutes and the supernatant
immediately frozen (–80°C) until use. The DNA oligos for the probes
(from the branchless gene) were synthesized as 5⬘ biotin labeled. Binding
reactions (20 ml) were set up in Z-50 buffer and include 2 mg of poly[dI-
dC], 0.25 pmol of labeled probes, 10 ml of total cell lysate and various
amounts of competitors as indicated. After 20 minutes of incubation at
22°C, reactions were electrophoresed (5% PAGE gel, 0.5⫻ TBE buffer,
120 V) then transferred to a Hybond-N+ membrane (GE Life Sciences,
Piscataway, NJ) and the biotin detected using a LightShift
Chemiluminescent EMSA Kit (Thermo Scientific, Rockford, IL).
Fly crosses and heat shocks
Males carrying either fsfDSXF-Dam (two lines), fsfR91Q-Dam (three
lines) or fsfNdam (1 line) were crossed to y w hsFLP122; +; + virgin
females at 22°C. Newly eclosed female offspring with pigmented eyes
were subjected to heat shock at 37°C for 40 minutes to induce the
expression of FLPase that in turn activates the expression of Dam proteins,
then allowed to recover at 25°C for 96 hours (24 hours for fsfNdam) before
being collected for genomic DNA preparation.
Construction of DpnI sequence-tag libraries for ultra highthroughput sequencing
Genomic DNA was prepared with a DNeasy Blood & Tissue Kit (Qiagen,
Valencia, CA). For each sample 8 mg DNA was digested with 40 U of DpnI
in 100 ml at 37°C overnight and purified through a QIAquick PCR purification column (Qiagen). A biotin-labeled EcoP15I adaptor was generated
by annealing biotin-5⬘-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCAGCAG-3⬘ and 5⬘-CTGCTGAGATCGGAAGAGCT-3⬘ (the
EcoP15I sites are underlined) and ligated to each DpnI-digested end. DNA
polymerase I was used to repair the nicked strand followed by EcoP15I
(which cleaves DNA 27 bp 3⬘ to its recognition site) digestion for 4 hours
to generate a 27 bp sequence tag ligated to the biotin-labeled adaptor. The
biotin-labeled tags were purified using 30 ml of strepavidin-Dynabeads
(Invitrogen) per sample. After end repairing using Klenow polymerase, a
second adaptor (generated by annealing 5⬘-ccgagatctaCACTCTTTCCCTACACGACGCTCTTCCGATCT-3⬘ and 5⬘-AGATCGGAAGAGCGTCGTGTA-3⬘) was ligated to the other end of the sequence tags. After
washing, the beads were re-suspended in 25 ml 1⫻ Buffer 2 (New England
Biolabs, Ipswich, MA). DpnI tags were PCR amplified with primers 5⬘CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCAGCAGTC3⬘ and 5⬘-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3⬘ (underlined sequences correspond to the Illumina library
construction primers) as follows: 1 cycle of 68°C for 5 minutes; 11 cycles of
95°C for 30 seconds, 65°C for 1 minute, 72°C for 20 seconds; one cycle of
72°C for 5 minutes. The products were resolved on a 4% agarose gel. The
desired ~125 bp product was recovered using a QIAquick Gel extraction kit
(Qiagen) and quantified using a Nanodrop spectrophotometer (Thermo
Scientific, Rockford, IL). Ultra high-throughput sequencing was performed
on a Genome Analyzer platform (Illumina, San Diego, CA). A more
detailed protocol on library preparation is available upon request.
Analysis of sequencing data
Sequencing data (25 bp reads) from the Genome Analyzer were matched
to the fly genome using the Illumina software package and reads matching
a unique location were binned for further analysis. The unique reads were
aligned with GATC sites using Perl scripts, and any reads that were not
located at a sequence flanking a GATC site were discarded (~10%). Reads
that aligned with each GATC site were counted for each sample. Because
methylation of any particular GATC site is independent between cells, and
the probability of any particular site being methylated in a cell is small, the
read counts at each GATC site were treated as a Poisson count. A Z-scorebased approach was used to compare the counts at each GATC site for
different samples. The Z score was calculated by the formula:
Z=
Ca − Cb
Ca + Cb
,
where Ca is the count of reads for sample A, and Cb is the count of reads
for sample B. If Ca+Cb is smaller than 4, the Z score at that site was
artificially set as 0 (which means that Ca is not different from Cb) to
minimize bias created by the small sample. Any Z-score greater than 1.65
(equivalent to P<0.05) was transformed into a P value using the Z-score to
P-value table. The transformation was set at a halving of P value for the
DEVELOPMENT
SELEX experiments defined a consensus DSX-binding site
(Erdman et al., 1996) as a 13 bp palindromic sequence
(G/A)nnAC(A/T)A(T/A)GTnn(C/T) composed of two half-sites
around a central (A/T) base pair. One such site is expected by
chance every 2 kb in the D. melanogaster genome, thus, additional
information is needed in vivo to specify biologically relevant DSXbinding sites.
We identified at a genome-wide level potential DSX-binding
regions via DamID chromatin profiling. Analysis of these DSXbinding regions identified a new consensus 13 bp palindromic
DSX-binding sequence in which the identity of each base is
important for optimal DSX binding. This optimal DSX binding
sequence is enriched in the D. melanogaster genome (58 copies
versus approximately three expected from random). There are 23
genes associated with an in vivo peak in DSX binding and an
optimal DSX binding sequence, and therefore almost certainly are
direct DSX targets. The abundance of the optimal DSX-binding
site in other insect genomes, as well as the maintenance of its
association with the above 23 direct DSX target genes in these
species, provides insight into the molecular evolution of DSX and
its targets in insects.
Development 138 (13)
DSX targets and the evolution of sex
Identification of an optimal DSX-binding site by MEME, and its
analysis in insects
The peaks (see Table S2 in the supplementary material) used for searching
for an enriched motif by MEME (Bailey and Elkan, 1994) were picked
based on peak shapes and good GATC-site density such that allow us to
locate the center of the peak, to which DSX potentially binds, and to
remove extraneous sequences on both sides.
Whole-genome analysis of optimal DSX-binding site matches in the D.
melanogaster genome of all chromosome arms (release sequence BDGP
R5, April 2006) was carried out with a Perl script. The whole-genome
searches for a given 13 bp sequence used Blast (Altschul et al., 1997), with
E-value adjusted for genome size.
Orthologous genes were defined by the FlyBase annotation or by Blast
using the D. melanogaster sequence if no ortholog had been annotated. The
sequence of the gene and its 5⬘ and 3⬘ flanking regions were manually
exported and subjected to analysis using Perl script. The 5⬘ or 3⬘ flanking
region is defined as the sequence until the next gene with the following
exceptions: if the next gene is less than 10 kb away, the flanking region is
extended to 10 kb; if the next gene is greater than 50 kb away, the flanking
region is truncated at 50 kb.
Protein alignment
The amino acid sequences of DNA-binding domains for the 12 Drosophila
DSX proteins were obtained from NCBI GenBank or FlyBase except for
D. mojavensis and D. willistoni, which were derived from sequences of RTPCR products, as we found errors in the NCBI sequences in the DSX
proteins for these two species. The amino acid sequence alignment was
performed using Clustalw2 (Larkin et al., 2007).
RESULTS
Whole genome identification of DSX-binding
regions
In vivo DSX-binding regions were identified on a whole-genome
scale using DamID, a technique in which tethering E. coli DNA
adenine methyltransferase (Dam) to a chromatin binding protein
leads to specific methylation of GATC sites in the vicinity of that
protein’s binding/recruitment sites (van Steensel and Henikoff,
2000; van Steensel et al., 2001). To avoid non-specific methylation
from overexpression (van Steensel and Henikoff, 2000), Damfusion proteins were expressed using the basal expression level of
the minimal hsp70 promoter in P{UAST} (Fig. 1A). The GATC
methylation profile of a DSXF-Dam fusion construct was compared
with those of two control proteins. First, Dam alone was used to
measure non-specific methylation (van Steensel and Henikoff,
2000). As the fly genome contains ‘transcription factor
colocalization hotspots’, where transcription factors accumulate
through protein-protein interactions (Moorman et al., 2006), a
second control was Dam fused to a DSX protein with a mutation
in its DNA-binding domain (R91Q) that has lost the ability to bind
Fig. 1. Modified DamID approach to identify DSXbinding regions. (A)Schematic drawing of the flip-on
Dam-fusion constructs. The proteins are expressed
from the basal level of the UAS sequence after the FRTstop-FRT cassette is removed by expression of FLPase.
(B)Example of an Illumina sequencing library. Lane 1,
100 bp DNA ladder; lanes 2 and 3, PCR of a sample
skipping the DpnI digestion step to serve as a negative
control; 1ml (lane 2) and 4ml (lane 3) of ligation
product were added as a template; lanes 4 and 5, PCR
of a sample with every step finished, 1ml (lane 4) and
4ml (lane 5) of ligation product were added as a
template. The library band is indicated by an
arrowhead and the primer dimer is indicated by an
asterisk. (C)The Z-score peak of a DSX-binding region
at the Yp1, Yp2 locus. The Z-score comparing the tag
counts between DSXF-Dam and R91Q-Dam for each
GATC site was plotted at the Yp1, Yp2 locus. The
genomic position is indicated on the x-axis. The scale
of the Z-score is indicated on the y-axis. Each vertical
line indicates the Z-score for a GATC site at the
corresponding genomic position on the x-axis. The
height of the vertical line indicates the value of the Zscore. For those GATC sites with a Z-score less than 0,
only the position of the GATC site is indicated (by a
vertical line with the Z-score value of 0). The genes are
indicated under the x-axis, with open boxes indicating
genes on the forward strand and shaded boxes
indicating genes on the reverse strand. (D)Similar to C,
the Z-sore peak of a DSX-binding region near the gene
shd is shown.
DEVELOPMENT
convenience of the Perl scripts programming, such that a P value between
0.025 and 0.05 was set as P0.05, a P value between 0.025 and 0.0125 was
set as P0.025, and so on; Z-scores greater than 4.3 were set as P0.00001.
For any Z-score smaller than 1.65, the P value was set as 1. The cumulative
P value was then calculated for each peak by grouping the GATC sites
whose P values were equal or smaller than 0.05, and are separated from
each other by less than 1.5 kb or no more than one GATC site (in the case
that the distance between two GATC site is greater than 1.5 kb). The P
value for the peak is calculated by multiplying the P values of the sites and
the total number of sites within the peak.
RESEARCH ARTICLE 2763
2764 RESEARCH ARTICLE
Development 138 (13)
The read counts for each GATC site were compared between
DSXF-Dam and R91Q-Dam by Z-score testing, which takes into
account the difference and the absolute number of reads for the
samples. Sites where methylation is significantly higher in DSXFDam than in R91Q-Dam often clustered together in peaks (Fig.
1C,D). We calculated the cumulative P values for all such peaks
and chose 1.0e-5 as the cut off threshold for defining a DSXbinding region. This strategy yielded 1452 putative direct DSXbinding regions. To discern whether any of these are likely
transcription factor colocalization hot spots, we compared the
methylation profiles of R91Q-Dam with Dam alone. A total of
3456 peaks passed the 1.0e-5 cut off, and we consider the
corresponding genomic regions to be hot spots. This represents
approximately one hot spot every 35 kb, similar to the report of one
hot spot per 50 kb in a 1 Mb region studied (Moorman et al., 2006).
We re-assessed the DSX-binding peaks by the same Z-scoretesting, but excluded GATC sites located in hot spots, resulting in
1278 DSX-binding regions in the genome.
We repeated our experiments using different biological samples
for DSXF-Dam and R91Q-Dam (different transgenic lines were
used for R91Q). The libraries of sequence tags were sequenced by
a second generation Illumina Genome Analyzer II platform. About
11 million reads that map to unique GATC sites were generated for
each sample, providing about 34 times the coverage of GATC sites.
For Z-score comparison of the two samples, we chose a P value of
1.0e-10 as a cut off as that yielded 1460 peaks, which is similar to
the number of peaks obtained from the first run (1452). We
excluded the hot spots defined in the first run from the second run
data, resulting in 1370 peaks. Six-hundred and fifty (~50%) of the
peaks were common to both lists, and we consider these regions to
be the more reliable in vivo DSX-binding regions (see Table S1 in
the supplementary material).
In both runs, all four known direct DSX target genes have
significant peaks in the regions where their previously
demonstrated DSX-binding sites are located (Table 1), indicating
that our approach is robust in identifying real DSX-binding regions.
We also compared the genes near the identified DSX-binding
regions (see Table S1 in the supplementary material) with the lists
of (1) genes showing sex- biased expression or (2) genes regulated
by DSX (Ranz et al., 2003; Parisi et al., 2003; Arbeitman et al.,
2004; Gnad and Parsch, 2006; Goldman and Arbeitman, 2007;
Lebo et al., 2009). The only significant correlation is with DSXregulated genes (Goldman and Arbeitman, 2007): 14/54 genes
labeled as ‘DSX set’ are located near a DSX-binding region, which
is significantly more frequent than the overall probability (0.05) of
a random gene being found near a DSX-binding region (P3.0e–7).
There was no significant correlation between the sex-biased genes
from the other studies and proximity to a DSX-binding region,
suggesting that most of the sex-biased genes identified in these
papers are probably indirect DSX targets.
Table 1. Summary of the known DSX direct targets and their associated DSX-binding regions
DSX target
Yp1,Yp2
Yp3
bab1
Fad2
DSX-binding region (peak)
P value (1st run)
P value (2nd run)
Position of peak center
relative to the target gene
ChrX:9943693-9950830
ChrX:13653051-13654128
Chr3L:1082484-1085429
Chr3L:11016788-11018561
2.21e-50
6.13e-6
9.6e-10
1.04e-10
2.01e-80
4.39e-14
1.5e-12
2.39e-25
Promoter region
Promoter region
First intron
Promoter region
*Column 1 lists the previously known direct DSX targets up to date. Column 2 shows the DSX-binding regions (peaks) identified by the DamID-seq experiments. Columns 3
and 4 list the P values of the peaks from the two runs of sequencing experiments. Column 5 indicates the position of the peak center relative to the DSX direct target genes.
DEVELOPMENT
the Yp1 DSX-binding site (Erdman and Burtis, 1993; Zhang et al.,
2006). R91Q-Dam should bind to transcription factor hotspots, but
not to real DSX-binding sites.
Expression of the Dam-fusion proteins was induced by heat shock
of newly eclosed females (hspFLP; UAS-FRT-stop-FRT-Dam[fusion] protein coding sequence) and the flies were allowed to
recover at 25°C for 24 hours for Dam alone, and 96 hours for DSXFDam and R91Q-Dam (Fig. 1A). These times were chosen
empirically as producing comparable levels of methylation in the
alpha-tubulin84D region (an indicator of the whole genome
methylation levels), because comparable levels of methylation in
these samples are important to minimize the biases towards regions
with lowest or highest accessibility in the samples when their
methylation levels are compared. Using Yp1 as a positive control, we
also validated the DamID approach using semi-quantitative PCR.
Ultra high-throughput sequencing was used to profile methylated
GATC sites across the whole genome. To quantify the methylation
level at each GATC site, the genomic DNA from whole flies was
digested with DpnI, which cuts only methylated GATC sites, and
processed to generate a library in which each element has two
adapters flanking a 27 bp genomic sequence from a DpnI-digested
end (Fig. 1B). The final product was amplified by PCR to facilitate
visualization and purification. In theory, two 27 bp sequence tags
will be generated from each GATC site independently of the
methylation status of other GATC sites. In addition, because of the
low number of PCR cycles (10-12 cycles) and the uniform size of
the sequence tags (expected 27 bp, observed 27±1), little PCR bias
should be introduced. Thus, the library can be subjected to ultra
high-throughput sequencing to quantify the relative methylation
level at each GATC site (see Fig. S1 in the supplementary
material).
Using an Illumina Genome Analyzer sequencing platform, 3.1
million, 2.6 million and 3.5 million reads matching a unique
position in the fly genome were generated for Dam alone, DSXFDam and R91Q-Dam, respectively. Individual reads that matched
two or more genomic locations were discarded, as were reads
that matched a location distant from a GATC site, which were
probably generated from fragments produced by random DNA
breaks. In general, over 85% of the unique reads matched a
genomic GATC site. GATC sites in heterochromatic regions had
significantly fewer unique reads (over 10-fold fewer compared
with euchromatic regions, consistent with the expectation that
the heterochromatic regions are generally less accessible), and
were excluded from further analysis. These filtering steps
resulted in read counts of 2.6 million, 2.4 million and 3.1 million
for Dam alone, DSXF-Dam and R91Q-Dam, respectively. These
translate into 7.4-9.6 times coverage of the GATC sites, and 8489% of the GATC sites have at least one read mapped to them.
A large proportion of the 11-16% unmapped GATC sites were
common to the three samples, and in regions of repetitive
sequence, so the reads for these sites were discarded due to
multiple matches.
Identification of a new consensus DSX-binding
site
As the 13 bp consensus DSX-binding sequence derived from
SELEX experiments (Erdman et al., 1996) has insufficient
information to function as a full DSX-binding site in vivo, we
searched the DSX-binding regions that we identified above to see
whether we could refine the DSX-binding sequence. We chose 43
regions where the center GATC sites had higher Z scores and the
distribution of GATC sites in the regions was relatively even and
dense. We trimmed these regions to an average of 1.5 kb to remove
as much extraneous sequence as possible without much risk of
omitting the potential DSX-binding site. The trimmed regions were
searched for enriched motifs using MEME (Bailey and Elkan,
1994). The sequence identified as most enriched is
GCAACAATGTTGC (Fig. 2A), which resembles the SELEXidentified DSX-binding site, (G/A)nnAC(A/T)A(T/A)GTnn(C/T),
Fig. 2. Identification of a new consensus DSX-binding sequence.
(A)The sequence identified by MEME that is enriched in 43 DSXbinding regions (see Table S3 in the supplementary material) (Evalue3.4e-21). The overall height of each stack indicates the sequence
consensus at that position (measured in bits), whereas the height of
nucleotide symbols within the stack reflects the relative frequency of
the nucleotides at that position. (B)Comparing the new consensus
DSX-binding sequence with the previously in vitro identified DSXbinding sequence. A 33 bp sequence from the second intron of
branchless (bnl) containing a perfect new DSX-binding site was used as
a probe in an electrophoresis mobility shift assay. Two palindromic
sequences with mutation at the outside 6 bp of the DSX-binding site
were created and used as competitors. The sequences and the amount
(fold over labeled probe) of competitors are shown in the table under
the panel. Lower case letters indicate the mutated nucleotides.
RESEARCH ARTICLE 2765
except that it is a complete palindrome and the identity of the
outside six bases is specific, suggesting that the these bases may
contribute to interactions with DSX protein.
We analyzed the affinity of recombinant DSX protein for each
of these sequences in vitro by electrophoretic mobility shift
assay (EMSA) using a probe containing the new consensus
DSX-binding site (GCAACAATGTTGC) and two competitor
sequences that differ at the outside six bases (atgACAATGTcat
and cgtACAATGTacg). As there are same amounts of protein
and probe in all reactions, the decrease in the intensity for the
shifted band reflects the relative affinity of the respective
competitor to the DSX protein. The SELEX-identified DSXbinding site predicts that these three sequences will have similar
affinities, whereas the new consensus DSX-binding site predicts
that the first sequence will have higher affinity than the other
two. We found that the second sequence (atgACAATGTcat) has
about a threefold lower affinity compared with the new DSX
consensus site, whereas the third sequence (cgtACAATGTacg)
has an even lower affinity (Fig. 2B), suggesting that the new
consensus sequence provides more information about the identity
of an optimal DSX-binding site.
The Yp3 gene is a direct DSX target, and DSX binds a 177 bp
fragment from the Yp3 promoter by EMSA (Hutson and Bownes,
2003). However, the SELEX-identified DSX-binding site could not
be found. Alignment with the new consensus DSX-binding
sequence identified a good candidate for the DSX-binding site in
the Yp3 promoter, with only two mismatches out of the 13 bp
sequence (see Fig. S2A in the supplementary material). By EMSA,
this Yp3 promoter sequence was bound by DSXF protein (see Fig.
S2B in the supplementary material), and competed away by
unlabeled oligos of the same Yp3 sequence or oligos for the DSX
A site in the Yp1 gene, and to a lesser extent, by oligos for the DSX
B site in the Yp1 gene.
To elucidate the importance of each base pair in the new
consensus DSX-binding site, we used EMSA to evaluate the
change in DSX affinity for sites mutated at different positions. We
found that (1) the sequence identity of each of the 13 bp contributes
to the affinity of the DSX binding site, with the outside 6 bp
sequence contributing to a lesser extent than the 7 bp core
sequence; and (2) a single A/T is preferred at the center position
(see Fig. S3 in the supplementary material).
The new consensus sequence is enriched in the
Drosophila melanogaster genome and
significantly associated with DSX-binding regions
Searching the whole genome for perfect matches to the new
consensus sequence (hereafter referred to as the optimal DSXbinding site) revealed 58 matches (Table 2), well in excess of the
3 (±2) matches expected by chance assuming the genome sequence
is random and adjusting for a 42.5±2.5% GC content (Jabbari and
Bernardi, 2004). To test whether the optimal DSX-binding site is
significantly enriched in the genome compared with similar
sequences, we permuted the outside 6 bp of the consensus
sequence in such a way that the percentage of GC remained
unchanged and the resulting sequence was palindromic. There are
24 such sequences including the optimal DSX-binding site (Fig.
3A). The copy number for the optimal DSX-binding site was
significantly greater than the copy number for each of the other 23
sequences by Grubb’s test (P1.24e-12 for one-tailed test) (Fig.
3B). The largest copy number for the other 23 sequences, which is
12, is not significantly different from the other 22 sequences
(P0.22 for a one-tailed test). Excluding the optimal DSX-binding
DEVELOPMENT
DSX targets and the evolution of sex
2766 RESEARCH ARTICLE
Development 138 (13)
Table 2. Genes near the 58 optimal DSX-binding sites
X
X
X
X
X
X
2L
2L
2L
2L
2L
2L
2L
2L
2L
2L
2L
2L
2L
2L
2R
2R
2R
2R
2R
2R
2R
2R
2R
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3L
3R
3R
3R
3R
3R
3R
3R
3R
3R
3R
3R
3580716
11442523
12115551
13800635
20790737
21378722
672482
684251
715832
2426306
4794104
7134101
7400153
7915167
7915243
9692193
12010402
12529004
17339045
19846470
2533346
3094399
5795615
11488592
12296335
13761675
14470005
18554605
18641339
29848
30341
50987
51892
1085012
1118618
1120322
2991898
3001667
7446065
10368944
10600974
11192592
14369164
16208698
17260582
18378824
20634560
5699874
9008489
15636942
15958377
17434128
21932142
23884341
25116246
26169425
27742671
27765517
3580704
11442511
12115563
13800623
20790725
21378710
672470
684263
715844
2426294
4794092
7134113
7400165
7915155
7915255
9692205
12010414
12529016
17339033
19846482
2533358
3094411
5795603
11488604
12296347
13761687
14470017
18554617
18641327
29860
30353
50975
51880
1085024
1118630
1120334
2991910
3001655
7446077
10368932
10600986
11192604
14369176
16208710
17260570
18378836
20634548
5699862
9008477
15636930
15958365
17434140
21932154
23884329
25116258
26169413
27742683
27765505
Nearest gene(s)
P value (1st run)
P value (2nd run)
Mnt
CG32666
Ten-a
mamo
CG11229
Cyp6t1
ds
ds
ds
dpp
CG15630
Pvf3
CG1571
Snoo
Snoo
Pka-C1
Rh5
bun
CG31408
sick
Fmo-2/debcl
pk
Fmrf
CG42524
CG33960
elk
CG30115
CG30273
CG42260
mthl8
mthl8
Lsp1gama
Lsp1gama
bab1
bab1
bab1
CG2113
FR
CG42458
CG12362
CG34237
CG6199
CG13418
CG13073
CG13723
rpr
CG13251
CG34360/Teh1
CG8138
bnl
Gr92a
InR
CG31324
CG34362
CG14506
hdc
heph
heph
2.19E-08
1.88E-13
2.61E-32
ns
5.37E-17
ns
ns
1.49E-37
ns
ns
ns
ns
5.86E-10
3.75E-08
3.75E-08
ns
8.10E-27
ns
ns
6.27E-10
6.92E-33
ns
1.75E-10
ns
ns
ns
ns
ns
ns
6.52E-33
6.52E-33
8.45E-35
8.45E-35
9.60E-10
7.18E-19
8.98E-53
ns
ns
ns
5.91E-12
ns
1.14E-25
ns
1.98E-59
ns
ns
8.59E-09
6.25E-08
ns
9.63E-71
ns
1.54E-12
ns
ns
1.74E-20
ns
ns
ns
ns
2.76E-30
3.76E-69
ns
2.62E-89
3.13E-11
ns
1.92E-40
ns
ns
ns
1.07E-15
7.18E-23
3.24E-30
3.24E-30
ns
7.38E-23
ns
ns
3.72E-36
7.12E-38
ns
1.87E-22
5.95E-15
ns
4.27E-13
ns
ns
ns
8.83E-87
8.83E-87
6.63E-105
6.63E-105
1.50E-12
7.27E-133
7.27E-133
ns
ns
ns
1.72E-44
ns
9.72E-43
4.50E-11
2.43E-121
1.23E-21
ns
3.91E-11
1.39E-25
ns
8.82E-152
ns
ns
ns
ns
2.78E-50
ns
ns
ns
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
The 58 optimal DSX-binding sites are listed by the chromosome arm, the start and the end nucleotide positions. The nearest protein-encoding gene for each site is also listed.
P values for the DSX-binding peak within 500 bp of the sites from two runs of experiments are also listed. ns indicates where there is no peak with P value past the cut off.
Asterisk indicates that the gene is likely to be a DSX direct target.
site, the average copy number for the other 23 sequences is five,
with a standard deviation of three; this is close to the expected
number assuming that the presence of a sequence in the genome is
random.
Determination of whether perfect matches to each of these 24
sequences are located near one of the 650 DSX-binding regions
(including the 43 used by MEME) revealed that the optimal DSXbinding site is significantly associated with DSX-binding regions
DEVELOPMENT
Genomic position of DSX site
DSX targets and the evolution of sex
RESEARCH ARTICLE 2767
chromatin accessibility and therefore cannot be detected by
DamID. Alternatively, the sequence itself may not be sufficient to
define a DSX-binding site in vivo.
(Fig. 3, P2.1e-22 compared with a random 13 bp sequence in the
genome which has a probability of 0.019 of being within 0.5 kb of
a DSX-binding region). By contrast, the other 23 palindromic
sequences are not significantly associated with DSX-binding
regions. Therefore, the optimal DSX-binding site is both
significantly enriched in the genome and significantly associated
with the identified DSX-binding regions.
Is this optimal DSX-binding sequence sufficient to predict an in
vivo DSX-binding site? Twenty-five of the 58 optimal DSXbinding sites in the genome are associated with a DSX-binding
region (27 for the first run, 31 for the second run and 25 in
common, corresponding to 23 genes; see Table 2). Thus, the
optimal DSX-binding sequence has at least a 40% chance of
predicting an in vivo DSX-binding site, whereas the other 23
palindromic sequences do so no better than a random sequence.
That not all 58 optimal DSX-binding sites are associated with a
DSX-binding region may reflect that: (1) required co-factor(s) for
DSX are not present at the time point of our experiment; or (2) the
DSX-binding site is located in a GATC-sparse region and/or a
region without adequate sequencing coverage because of low
The enrichment of the optimal DSX-binding
sequence is evolutionarily conserved across the
12 Drosophila species
To determine whether the enrichment of the optimal DSX-binding
site was conserved in other Drosophila species, across which the
DNA-binding domain of DSX is very conserved (see Fig. S4 in the
supplementary material), we examined the number of copies of the
optimal DSX-binding sequence, as well as the other 23 related
palindromic sequences, in the other 11 sequenced Drosophila
genomes. The copy number for the optimal DSX-binding sequence
is always the highest in each Drosophila genome (Fig. 4 and see
Fig. S5 in the supplementary material) and stands out as the sole
significant outlier by Grubb’s test (all P<8e-6, on a two-tailed test).
We next examined whether the association of an optimal DSXbinding site with a particular gene is conserved for the 23 D.
melanogaster genes that we had identified as direct DSX targets
based on their being both associated with a DSX-binding region
and having an optimal DSX-binding site (Table 2). We determined
whether one or more potential DSX-binding site (13/13, 12/13 or
11/13 matches) is associated with each of these 23 genes in the
other 11 sequenced Drosophila species (four examples are in Fig.
DEVELOPMENT
Fig. 3. The new DSX consensus sequence is significantly enriched
in the genome of Drosophila melanogaster. (A)List of palindromic
sequences created by switching the position at the outside 6 bp of the
new DSX consensus binding sequence (column 1, altered positions are
shown in red), and their respective copy numbers in the whole genome
(column 2), as well as the number of the copies that are near a DSXbinding region (column 3). (B)Scatter plot of the copy numbers for
sequences listed in A. The copy number for the DSX consensus
sequence (in orange) is significantly greater than those numbers for the
rest sequences (P1.24e-12 for one-tailed test).
Other DSX-binding sites
Although the 13 bp optimal DSX-binding sequence is frequently
associated with direct DSX targets, mismatches are allowed in the
sequences through which DSX functions in vivo. The identified
DSX sites in the bab1 gene (Williams et al., 2008) match well with
the optimal DSX-binding site (13/13 and 12/13 bp) and are within
the DSX-binding region we identified for bab1. We found several
additional DSX-binding regions within the bab1 and bab2
intergenic region, two of which contain an optimal DSX-binding
sequence, the functional roles of which are not known. The DSX
protein regulates the female-specific expression of Fad2 by binding
to a sequence (GCAACAATGTatg; 10/13 bp; matches in capitals)
located 272 bp upstream of the Fad2 transcription start site. Finally,
in the case of the Yp1 and Yp2 intergenic region, there is a DSXbinding site A (aCtACAATGTTGC) and a weaker DSX-binding
site B (cCtACAAaGTgat), which have 11/13 and 7/13 matches to
the optimal DSX-binding site, respectively.
Given that particular sequences with 11/13 and 12/13 matches to
the optimal DSX-binding sequence function as DSX-binding sites
in vivo, we asked how many such sequences were observed in the
genome and how many were expected, assuming the genomic
sequence was random. For 12/13 matches, the number observed
(541) was four times the number expected (135). For 11/13
matches, the number of observed (4532) and expected (2310) were
less different. Of the sequences with 12/13 matches to the optimal
binding site, 14% were associated with a DSX-binding region,
whereas only 1.9% would be expected at random. In addition, the
numbers of copies of individual sequences with 12/13 bp matches
to the optimal DSX-binding site were nonrandom, with some
sequences having significantly more copies than expected.
However, we could not discern any correlation between the number
of copies of individual sequences and the likelihood that a sequence
was associated with a DSX-binding region. Thus, although
additional information probably determines whether 12/13 and
11/13 matches function as DSX-binding sites in vivo, the nature of
that information is currently unclear.
2768 RESEARCH ARTICLE
Development 138 (13)
genomes, the D. melanogaster optimal DSX-binding site is found
at both a significantly higher copy number than the 23 control
sequences and is 26- or 34-fold over expected, respectively (Fig.
4). In another mosquito species, Culex pipiens, the copy number of
the optimal DSX-binding site is the second highest and is 10-fold
more than random expectation. The highest number of perfect
matches in the C. pipiens genome is to the related sequence
aCgACAATGTcGt (Fig. 4). In the silkworm Bombyx mori, which
is further removed from Drosophila, the D. melanogaster optimal
DSX-binding site copy number is only twofold more than random
expectation and is not an outlier among the 24 palindromic
sequences. Instead, the related sequence cgtACAATGTacg is the
sole outlier, and is 16-fold more prevalent than expected. None of
these 24 sequences showed up as an outlier (P0.01 cut off) in the
more distantly related insect species examined (Tribolium
castaneum, Apis mellifera, Nasonia vitripennis, Acyrthosiphon
pisum and Pediculus humanus corporis) (Fig. 4). Taken together,
these observations suggest that a global evolution of the optimal
DSX-binding sequence has occurred within the insects.
5; the rest are in Fig. S6 in the supplementary material). The
association of genes with an optimal DSX-binding site is seen for
23-35% of the 23 genes, suggesting the possibility that these genes
may function in sex-specific differentiation in all these species.
Thus, five of these genes are associated with either at least one
copy of the optimal DSX-binding sites, or two or more copies of a
12/13 match to the optimal DSX-binding site in all 11 species for
which the orthologous gene can be found from FlyBase annotation
or a Blast search. For three additional genes, an optimal DSXbinding site is found associated with the gene in 10 species and a
12/13 match to this site found in one species. Similarly, within the
four melanogaster subgroup species examined, 65-90% of the 23
genes show an association with an optimal DSX-binding site,
suggesting that these genes may function in sex-specific
differentiation in all these species. Thus, 15/23 genes were
associated with at least one optimal DSX-binding site in all four
species, and an additional six genes associated with either an
optimal DSX-binding site or a sequence that is a 12/13 match to
that binding site.
The pattern of enrichment of the optimal DSXbinding sequence in other insects
Extending this analysis to other insect species revealed the
following. In two mosquito species, Aedes aegypti and Anopheles
gambiae, similar to what is observed in the 12 Drosophila
The advantage of profiling methylated GATC sites
using ultra high-throughput sequencing
We have modified the DamID technique to take advantage of nextgeneration sequencing. Previous whole-genome DamID
experiments largely used array-based methods (van Steensel and
Henikoff, 2000; van Steensel et al., 2001; Orian et al., 2003;
Bianchi-Frias et al., 2004; Song et al., 2004; de Wit et al., 2005;
Reddy et al., 2008; Pindyurin et al., 2007). In these approaches, the
methylated fragments are isolated by PCR-based methods, and
therefore are subjected to PCR bias, although efforts were made to
minimize such bias. Still, the recovery of a fragment depended on
the GATC sites at both ends being methylated, and large fragments
produced from GATC-poor regions would be less efficiently
amplified. In our approach, each GATC site is evaluated
independently. Two sequence tags are generated from each
methylated GATC site, and this does not depend on the methylation
status of nearby GATC sites. In addition, there are only minimal
PCR cycles (<13) employed during library construction and the
sizes of PCR products are very uniform (125±1 bp); hence, there
is minimal PCR bias.
In previous reports where cDNA arrays were used, only binding
regions close to exons could be identified. As transcription factorbinding sites are often located in the intergenic regions or in
introns, using a cDNA array would probably miss many binding
regions. Using tilling arrays may avoid the problem of missing
target regions distant from genes, but tilling arrays are limited by
the fraction of the genome that can be put on a chip. Moreover, the
information is harder to compare between different versions of
arrays. To avoid these pitfalls, we used ultra high-throughput
sequencing to generate digitized actual sequences that can be
directly matched to the genome and can be easily compared with
results from different experiments. Furthermore, the array-based
DEVELOPMENT
Fig. 4. The enrichment of the optimal DSX-binding sequence is
conserved in other insects. The genome-wide copy numbers for the
optimal D. melanogaster DSX-binding sequence and the other 23
palindromic sequences (listed in Fig. 4A) were normalized by the copy
number expected from randomness, and plotted for each insect
species. The copy number for the optimal D. melanogaster DSX-binding
sequence is indicated in orange for the first 15 species. The copy
number for sequence aCgACAATGTcGt in the Culex pipiens genome is
indicated in purple. The copy number for sequence cgtACAATGTacg in
the Bombyx mori genome is indicated in red. The mean±s.e.m. for each
species is indicated.
DISCUSSION
A modified DamID approach identified a collection of ~650
potential direct DSX target genes. The subset of 23 of these genes
shown to be associated with evolutionarily conserved optimal
DSX-binding sites seem extremely likely to be direct DSX targets,
substantially increasing the number of identified direct DSX
targets. Below, we discuss some salient aspects of our experimental
approach and findings.
DSX targets and the evolution of sex
RESEARCH ARTICLE 2769
methods are more easily saturated, thus resulting in false-negatives,
whereas the ultra high-throughput sequencing in theory has an
infinite linear range.
The DamID approach might lack tissue specificity
We found that for a binding region to be identified, the
corresponding target gene does not need to be in a transcriptionally
active state when the Dam-fusion protein experiment is carried out.
In the case of the Yp1 DSX-binding region, we observed a three- to
ninefold higher normalized methylation level for the DSXF-Dam
sample versus Dam alone sample in third instar larvae, even though
the Yp1 gene is not transcribed until after eclosion, some 4 days
later.
There are two possibilities for this lack of temporal specificity.
One is that it is an intrinsic feature of the DamID approach. It has
been suggested that transcription factors are always in a dynamic
mode of on and off their binding sites (e.g. Hager et al., 2009). The
Dam-fusion protein might be highly efficient in methylating the
GATC sites around its binding sites, so that even transient binding
of the protein is sufficient to generate methylation. Alternatively,
the DSX-binding sites may be readily recognizable by the DSX
protein either because of their sequences per se or because of other
factors associated with the sites are already in position, so that
when the DSX-Dam protein is expressed at an earlier stage, it binds
to these pre-arranged binding sites.
Implications for the evolution of sex
Our findings contribute at several levels to an understanding of the
evolution of sex. Considerations of where in developmental
regulatory hierarchies the mutations underlying evolutionary
changes are most likely has led to the proposal that such changes
are more likely to be in the cis-regulatory elements of the targets
of the transcription factors encoded by the terminal regulatory
genes in a hierarchy, than in the coding sequences of the terminal
regulatory genes themselves (Tautz, 2000; Carroll, 2000).
With respect to the sex hierarchy, the limited data previously
available are consistent with this proposal. Thus, in the DSX DNAbinding domain (DM domain), the Zn module, as well as the Nterminal region of the tail (amino acids 78-97) immediately
following the Zn module [which has been proposed to be the
functionally important part of a recognition helix providing the
binding specificity (Zhu et al., 2000)], are highly conserved among
these Drosophila species (see Fig. S4 in the supplementary
material). Of the four previously identified direct DSX targets, only
two (bab and Fad2) exhibit changes in their sex-specific expression
patterns within the Drosophila genus and in both cases these
changes in the sex specificity of their expression are correlated with
changes in adjacent DSX-binding sites (Williams et al., 2008;
Shirangi et al., 2009).
The 23 genes we identified as very probable direct DSX targets
provide a broader data set with which to examine this proposal.
Within the four other sequenced members of the melanogaster
DEVELOPMENT
Fig. 5. The evolutionary change of the
association of DSX-binding site with
genes. (A-D) Four examples of the
association of potential DSX-binding sites in
the 12 sequenced Drosophila species with
the 23 genes that are identified in D.
melanogaster as likely direct DSX targets. For
each gene, the phylogenetic tree of the 12
Drosophila species is illustrated. The
maximum number of base pairs matched to
the 13 bp optimal DSX site by any 13 bp
sequence in the gene and flanking regions is
listed for each species. For the 12/13
matches, the actual sequences (with the mismatched base pair in red) are listed instead.
N.A. indicates that there is no sequence
available for analysis.
subgroup species, 21/23 of these genes are also associated with
putative DSX-binding sites (13/13 or 12/13 matches) in all four
species, whereas 2/23 genes are not associated with close matches
to the optimal DSX-binding site in at least one species, suggesting
that the latter may be cases in which the sex-specificity of
expression of a gene has changed via an alteration of its cis-acting
DSX-binding sequence. When the data from these 23 genes were
examined across all 12 sequenced Drosophila species, a similar
result was seen: in 23-35% of the cases an association between the
gene and a DSX-binding site is retained in all species, while for 6572% of these genes, a recognizable DSX-binding site is not present
in one or more of these species. These findings with respect to the
latter group of genes are consistent with a potential change in the
sex specificity of expression as a consequence of a change in the
adjacent DSX-binding site.
A second theme in modern considerations of the evolution of sex
is that sex evolves rapidly when compared with other
developmental processes (Schütt and Nöthiger, 2000). This view
derives in part from the wide diversity in sex determination
mechanisms used in animals, and findings that the primary sex
determination mechanisms can differ in closely related species
(Schütt and Nöthiger, 2000; Bull, 1985; Gempe and Beye, 2011).
Furthermore, and probably independently, at a developmental level
evidence for the rapid evolution of sex comes directly from the
myriad of secondary sexual characteristics that frequently
distinguish closely related species, which otherwise have nearly
identical body plans (Darwin, 1871). Our data, cited immediately
above, provides a relatively broad look at the molecular genetic
level at the frequencies with which there are potential changes
between sex-specific and sex-nonspecific patterns of expression of
downstream effector genes. Consistent with relatively rapid
evolutionary changes in the sex-specificity of expression of some
of the downstream genes in the Drosophila sex hierarchy, we found
that ~70% of the 23 D. melanogaster genes associated with DSXbinding sites are not associated with such a site in one or more of
the other 11 sequenced Drosophila species.
It is also important to note that these data, in addition to
pinpointing a set of genes whose roles in sexual development may
have changed during the evolution of this genus, also identify
another subset of these genes, comprising ~30% of the putative
direct DSX targets in D. melanogaster, that retain an association
with putative DSX-binding sites in all 12 Drosophila species
examined. The latter finding suggests that a significant subset of
the genes directly regulated by DSX have functions in aspects of
sexual development that are evolutionarily relatively stable. Such
evolutionarily conserved aspect of maleness or femaleness might
encompass, for example, somatic functions that are necessary for
sex-specific germline functions or gametogenesis.
Extending our analysis of the optimal DSX-binding site and the
related 13 bp palindromic sequences beyond the Drosophila genus
provided additional evolutionary insights. In two mosquito species,
Aedes aegypti and Anopheles gambiae, enrichment in the optimal
DSX-binding site is also observed. However, in the mosquito
species Culex pipiens, although the number of optimal DSXbinding sites is still 10-fold more than expected, the related
sequence aCgACAATGTcGt had the highest number of perfect
matches. Further analysis found that this latter sequence is within
a repetitive element of the C. pipiens genome, so we infer that
GCAACAATGTTGC is most probably the optimal DSX site in
this species. The enriched sequence, cgtACAATGTacg, in the
silkworm Bombyx mori is also within a repetitive element. None of
the 24 sequences are enriched for the more distantly related insect
Development 138 (13)
species (Tribolium castaneum, Apis mellifera, Nasonia vitripennis,
Acyrthosiphon pisum, and Pediculus humanus corporis) examined.
Thus, in insects outside of the Diptera, the DSX proteins might
have diverged sufficiently that they bind to sequences different than
the 24 examined.
Extending approaches such as those described here may provide
significant insights into how a DNA-binding domain and its targets
have evolved to maintain control of a fundamental developmental
process across a broad evolutionary timescale.
Acknowledgements
We thank the members of the Baker and Sidow labs for helpful discussions
and comments. We thank Dr Bas van Steensel and Dr Steven Henikoff for
providing the pNDamMyc and pCMycDam plasmids, Dr Ziming Weng for
helping with the high-throughput sequencing, Phil Lacroute for initial
sequence mapping, Drs Mike Weiss, Loren Looger and Sean Eddy for helpful
discussions, and Guennet Bohm for preparing fly food. This work is supported
by NIGMS and the Howard Hughes Medical Institute. This study was also
facilitated by information from FlyBase (www.flybase.org). Deposited in PMC
for release after 6 months.
Competing interests statement
The authors declare no competing financial interests.
Supplementary material
Supplementary material for this article is available at
http://dev.biologists.org/lookup/suppl/doi:10.1242/dev.065227/-/DC1
References
Altschul, S. F., Madden, T. L., Schaffe, A. A., Zhang, J., Zhang, Z., Miller, W.
and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res. 25, 3389-3402.
Arbeitman, M. N., Fleming, A. A., Siegal, M. L., Null, B. H. and Baker, B. S.
(2004). A genomic analysis of Drosophila somatic sexual differentiation and its
regulation. Development 131, 2007-2021.
Bailey, T. L. and Elkan, C. (1994). Fitting a mixture model by expectation
maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol.
Biol. 2, 28-36
Bianchi-Frias, D., Orian, A., Delrow, J. J., Vazquez, J., Rosales-Nieves, A. E.
and Parkhurst, S. M. (2004). Hairy transcriptional repression targets and
cofactor recruitment in Drosophila. PLoS Biol. 2, E178.
Bull, J. J. (1985). Sex determining mechanisms: an evolutionary perspective.
Experientia 41, 1285-1296.
Burtis, K. C. and Baker, B. S. (1989). Drosophila doublesex gene controls somatic
sexual differentiation by producing alternatively spliced mRNAs encoding related
sex-specific polypeptides. Cell 56, 997-1010.
Burtis, K. C., Coschigano, K. T., Baker, B. S. and Wensink, P. C. (1991). The
doublesex proteins of Drosophila melanogaster bind directly to a sex-specific
yolk protein gene enhancer. EMBO J. 10, 2577-2582.
Camara, N., Whitworth, C. and Van Doren, M. (2008). The creation of sexual
dimorphism in the Drosophila soma. Curr. Top. Dev. Biol. 83, 65-107.
Carroll, S. B. (2000). Endless forms: the evolution of gene regulation and
morphological diversity. Cell 101, 577-580.
Chapman, K. B. and Wolfner, M. F. (1988). Determination of male-specific gene
expression in Drosophila accessory glands. Dev. Biol. 126, 195-202.
Chatterjee, S. S., Uppendahl, L. D., Chowdhury, M. A., Ip, P. L. and Siegal, M.
L. (2011). The female-specific Doublesex isoform regulates pleiotropic
transcription factors to pattern genital development in Drosophila. Development
138, 1099-1109.
Christiansen, A. E., Keisman, E. L., Ahmad, S. M. and Baker, B. S. (2002). Sex
comes in from the cold: the integration of sex and pattern. Trends Genet. 18,
510-516.
Coschigano, K. T. and Wensink, P. C. (1993). Sex-specific transcriptional
regulation by the male and female doublesex proteins of Drosophila. Genes Dev.
7, 42-54.
Darwin, C. (1871). The Descent of Man and Selection in Relation to Sex. London:
John Murray.
de Wit, E., Greil, F. and van Steensel, B. (2005). Genome-wide HP1 binding in
Drosophila: developmental plasticity and genomic targeting signals. Genome
Res. 15, 1265-1273.
Erdman, S. E. and Burtis, K. C. (1993). The Drosophila doublesex proteins share a
novel zinc finger related DNA binding domain. EMBO J. 12, 527-535.
Erdman, S. E., Chen, H. J. and Burtis, K. C. (1996). Functional and genetic
characterization of the oligomerization and DNA binding properties of the
Drosophila doublesex proteins. Genetics 144, 1639-1652.
DEVELOPMENT
2770 RESEARCH ARTICLE
Garrett-Engele, C. M., Siegal, M. L., Manoli, D. S., Williams, B. C., Li, H. and
Baker, B. S. (2002). intersex, a gene required for female sexual development in
Drosophila, is expressed in both sexes and functions together with doublesex to
regulate terminal differentiation. Development 129, 4661-4675.
Gempe, T. and Beye, M. (2011). Function and evolution of sex determination
mechanisms, genes and pathways in insects. Bioessays 33, 52-60.
Gnad, F. and Parsch, J. (2006). Sebida: a database for the functional and
evolutionary analysis of genes with sex-biased expression. Bioinformatics 22,
2577-2579.
Goldman, T. D. and Arbeitman, M. N. (2007). Genomic and functional studies
of Drosophila sex hierarchy regulated gene expression in adult head and nervous
system tissues. PLoS Genet. 3, e216.
Hager, G. L., McNally, J. G. and Misteli, T. (2009). Transcription dynamics. Mol.
Cell 35, 741-753.
Hutson, S. F. and Bownes, M. (2003). The regulation of yp3 expression in the
Drosophila melanogaster fat body. Dev. Genes Evol. 213, 1-8.
Jabbari, K. and Bernardi, G. (2004). Comparative genomics of Anopheles
gambiae and Drosophila melanogaster. Gene 333, 183-186.
Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A.,
McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R. et al. (2007).
ClustalW and ClustalX version 2. Bioinformatics 23, 2947-2948.
Lebo, M. S., Sanders, L. E., Sun, F. and Arbeitman, M. N. (2009). Somatic,
germline and sex hierarchy regulated gene expression during Drosophila
metamorphosis. BMC Genomics 10, 80.
Lee, G., Hall, J. C. and Park, J. H. (2002). Doublesex gene expression in the
central nervous system of Drosophila melanogaster. J. Neurogenet. 16, 229-248.
Mellert, D. J., Knapp, J. M., Manoli, D. S., Meissner, G. W. and Baker, B. S.
(2010). Midline crossing by gustatory receptor neuron axons is regulated by
fruitless, doublesex and the Roundabout receptors. Development 137, 323-332.
Moorman, C., Sun, L. V., Wang, J., de Wit, E., Talhout, W., Ward, L. D., Greil,
F., Lu, X. J., White, K. P., Bussemaker, H. J. et al. (2006). Hotspots of
transcription factor colocalization in the genome of Drosophila melanogaster.
Proc. Natl. Acad. Sci. USA 103, 12027-12032.
Orian, A., van Steensel, B., Delrow, J., Bussemaker, H. J., Li, L., Sawado, T.,
Williams, E., Loo, L. W., Cowley, S. M., Yost, C. et al. (2003). Genomic
binding by the Drosophila Myc, Max, Mad/Mnt transcription factor network.
Genes Dev. 17, 1101-1114.
Parisi, M., Nuttall, R., Naiman, D., Bouffard, G., Malley, J., Andrews, J.,
Eastman, S. and Oliver, B. (2003). Paucity of genes on the Drosophila X
chromosome showing male-biased expression. Science 299, 697-700.
Park, P. J. (2009). ChIP-seq: advantages and challenges of a maturing technology.
Nat. Rev. Genet. 10, 669-680.
Pindyurin, A. V., Moorman, C., de Wit, E., Belyakin, S. N., Belyaeva, E. S.,
Christophides, G. K., Kafatos, F. C., van Steensel, B. and Zhimulev, I. F.
(2007). SUUR joins separate subsets of PcG, HP1 and B-type lamin targets in
Drosophila. J. Cell Sci. 120, 2344-2351.
RESEARCH ARTICLE 2771
Ranz, J. M., Castillo-Davis, C. I., Meiklejohn, C. D. and Hartl, D. L. (2003). Sexdependent gene expression and evolution of the Drosophila transcriptome.
Science 300, 1742-1745.
Reddy, K. L., Zullo, J. M., Bertolino, E. and Singh, H. (2008). Transcriptional
repression mediated by repositioning of genes to the nuclear lamina. Nature
452, 243-247.
Rideout, E. J., Dornan, A. J., Neville, M. C., Eadie, S. and Goodwin, S. F.
(2010). Control of sexual differentiation and behavior by the doublesex gene in
Drosophila melanogaster. Nat. Neurosci.13, 458-466.
Robinett, C. C., Vaughan, A. G., Knapp, J. M. and Baker, B. S. (2010). Sex and
the single cell. II. There is a time and place for sex. PLoS Biol. 8, e1000365.
Ryner, L. C., Goodwin, S. F., Castrillon, D. H., Anand, A., Villella, A., Baker, B.
S., Hall, J. C., Taylor, B. J. and Wasserman, S. A. (1996). Control of male
sexual behavior and sexual orientation in Drosophila by the fruitless gene. Cell
87, 1079-1089.
Sanders, L. E. and Arbeitman, M. N. (2008). Doublesex establishes sexual
dimorphism in the Drosophila central nervous system in an isoform-dependent
manner by directing cell number. Dev. Biol. 320, 378-390.
Schütt, C. and Nöthiger, R. (2000). Structure, function and evolution of sexdetermining systems in Dipteran insects. Development 127, 667-677.
Shirangi, T. R., Dufour, H. D., Williams, T. M. and Carroll, S. B. (2009). Rapid
evolution of sex pheromone-producing enzyme expression in Drosophila. PLoS
Biol. 7, e1000168.
Song, S., Cooperman, J., Letting, D. L., Blobel, G. A. and Choi, J. K. (2004).
Identification of cyclin D3 as a direct target of E2A using DamID. Mol. Cell. Biol.
24, 8790-8802.
Tautz, D. (2000). Evolution of transcriptional regulation. Curr. Opin. Genet.
Dev.10, 575-579.
van Steensel, B. and Henikoff, S. (2000). Identification of in vivo DNA targets of
chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18,
424-428.
van Steensel, B., Delrow, J. and Henikoff, S. (2001). Chromatin profiling using
targeted DNA adenine methyltransferase. Nat. Genet. 27, 304-308.
Williams, T. M., Selegue, J. E., Werner, T., Gompel, N., Kopp, A. and Carroll,
S. B. (2008). The regulation and evolution of a genetic switch controlling
sexually dimorphic traits in Drosophila. Cell 134, 610-623.
Wolfner, M. F. (1988). Sex-specific gene expression in somatic tissues of
Drosophila melanogaster. Trends Genet. 4, 333-337.
Zhang, W., Li, B., Singh, R., Narendra, U., Zhu, L. and Weiss, M. A. (2006).
Regulation of sexual dimorphism: mutational and chemogenetic analysis of the
doublesex DM domain. Mol. Cell. Biol. 26, 535-547.
Zhu, L., Wilken, J., Phillips, N. B., Narendra, U., Chan, G., Stratton, S. M.,
Kent, S. B. and Weiss, M. A. (2000). Sexual dimorphism in diverse metazoans
is regulated by a novel class of intertwined zinc fingers. Genes Dev. 14, 17501764.
DEVELOPMENT
DSX targets and the evolution of sex