* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PDF
Vectors in gene therapy wikipedia , lookup
Epigenetics in learning and memory wikipedia , lookup
Koinophilia wikipedia , lookup
Gene desert wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Public health genomics wikipedia , lookup
Gene expression programming wikipedia , lookup
Copy-number variation wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
History of genetic engineering wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Point mutation wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Ridge (biology) wikipedia , lookup
Genome (book) wikipedia , lookup
Genomic imprinting wikipedia , lookup
Transposable element wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Genomic library wikipedia , lookup
Non-coding DNA wikipedia , lookup
Designer baby wikipedia , lookup
Human genome wikipedia , lookup
Minimal genome wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Gene expression profiling wikipedia , lookup
Microevolution wikipedia , lookup
Pathogenomics wikipedia , lookup
Metagenomics wikipedia , lookup
Genome editing wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Helitron (biology) wikipedia , lookup
Genome evolution wikipedia , lookup
RESEARCH ARTICLE 2761 Development 138, 2761-2771 (2011) doi:10.1242/dev.065227 © 2011. Published by The Company of Biologists Ltd Direct targets of the D. melanogaster DSXF protein and the evolution of sexual development Shengzhan D. Luo1,*, Guang W. Shi2 and Bruce S. Baker1,*,† SUMMARY Uncovering the direct regulatory targets of doublesex (dsx) and fruitless (fru) is crucial for an understanding of how they regulate sexual development, morphogenesis, differentiation and adult functions (including behavior) in Drosophila melanogaster. Using a modified DamID approach, we identified 650 DSX-binding regions in the genome from which we then extracted an optimal palindromic 13 bp DSX-binding sequence. This sequence is functional in vivo, and the base identity at each position is important for DSX binding in vitro. In addition, this sequence is enriched in the genomes of D. melanogaster (58 copies versus approximately the three expected from random) and in the 11 other sequenced Drosophila species, as well as in some other Dipterans. Twenty-three genes are associated with both an in vivo peak in DSX binding and an optimal DSX-binding sequence, and thus are almost certainly direct DSX targets. The association of these 23 genes with optimum DSX binding sites was used to examine the evolutionary changes occurring in DSX and its targets in insects. INTRODUCTION Although many of the regulatory hierarchies governing development have been elucidated, progress has been slow in identifying the direct targets of the terminal regulatory genes in these hierarchies. These difficulties arise from: (1) few target genes being unique to one hierarchy; (2) difficulties in distinguishing direct from indirect targets; and (3) in vitro defined binding sites of transcription factors occurring by chance many times in the genome. Technological advances such as ChIP-seq solve some of these difficulties, but challenges remain for transcription factors that lack a good antibody or regulate their targets in restricted tissues (Park, 2009). The sex hierarchy of Drosophila melanogaster exhibits all of these features that make it difficult to identify direct targets of the terminal genes in regulatory hierarchies. Our interest here is in identifying the direct targets of doublesex (dsx), a terminal gene in the fly sex hierarchy. dsx transcription is highly regulated, as evidenced by its temporally and spatially restricted expression patterns through development (Robinett et al., 2010; Rideout et al., 2010). dsx pre-mRNAs are sex-specifically spliced (Ryner et al., 1996; Burtis and Baker, 1989), resulting in adults in whom some cells express DSXM or DSXF, and are thus either male or female, respectively, and other cells do not express these transcription factors. DSXM and DSXF share identical DNA-binding domains, but have different C termini (Burtis et al., 1991; Erdman and Burtis, 1993). The DSX proteins specify nearly all somatic sexual differences outside the nervous system, as well as several nervous system sexual dimorphisms (Lee et al., 2002; Sander and 1 Biology Department, Stanford University, Stanford, CA 94305, USA. 2Department of Pathology, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA. *Present address: Janelia Farm Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA † Author for correspondence ([email protected]) Accepted 22 April 2011 Arbeitman, 2008; Robinett et al., 2010; Rideout et al., 2010; Mellert et al., 2010) (for reviews, see Christiansen et al., 2002; Camara et al., 2008). Although some genes whose activity levels are dependent on dsx have been identified by a candidate gene approach, most have been identified via plus/minus, microarray, SAGE or enhancer trapbased screens for genes expressed sex-differentially in the soma (Shirangi et al., 2009; Lebo et al., 2009; Chatterjee et al., 2011) (for reviews, see Christiansen et al., 2002; Camara et al., 2008). These screens identified dozens of genes whose activity levels are governed by dsx, but did not distinguish direct from indirect targets. Studies of these dsx-regulated genes showed that the DSX proteins largely act by modulating in some tissues the activities of genes that are also used sex-nonspecifically in other tissues. The array of processes regulated by dsx is complex, as different genes are regulated by dsx in all cell and tissue types examined (for reviews, see Christiansen et al., 2002; Camara et al., 2008). Of the dsx-regulated genes that have been sufficiently characterized to distinguish likely direct and indirect targets, the vast majority are indirect targets (Arbeitman et al., 2004; Chapman and Wolfner, 1988; Wolfner, 1988). The few identified direct targets are the three Yolk protein (Yp) genes (Burtis et al., 1991; Coschigano and Wensink, 1993; Hutson and Bownes, 2003), bric à brac 1 (bab1) (Williams et al., 2008) and Fad2 (Shirangi et al., 2009). The paucity of direct DSX targets limits understanding of important aspects of sexual biology. First, we have little knowledge of the molecular links between dsx and the biological processes it controls. Second, understanding how the functions of the sex hierarchy and other patterning hierarchies are integrated during development requires the identification of direct DSX targets (Christiansen et al., 2002; Williams et al., 2008). Third, not knowing the direct DSX targets seriously constrains studies of the molecular evolution of sex. To open up these topics for more systematic analysis, we carried out a screen for direct DSXF targets. DSXF and DSXM bind with similar kinetics to identical sequences in the Yp1 and Yp2 common promoter (Burtis et al., 1991; Erdman and Burtis, 1993; Coschigano and Wensink, 1993). DEVELOPMENT KEY WORDS: DamID, Doublesex, Evolution, Sex determination, Transcription factor binding site 2762 RESEARCH ARTICLE MATERIALS AND METHODS Plasmids construction To construct a FLP-activated DSXF-Dam plasmid, the coding region of the E. coli Dam methyltransferase was cut from pCMycDam (a gift from Drs van Steensel and Henikoff, Fred Hutchinson Cancer Research Center, Seattle, WA) with EcoRI/XbaI, and inserted into pP{UAST} between EcoRI and XbaI sites. The fragment encoding a V5-tagged DSXF was digested from pMTDSXFV5 (Garrett-Engele et al., 2002), and inserted in-frame into the above plasmid between EcoRI and AgeI sites. An FRT-stop-FRT cassette was inserted into the EcoRI site to generate the final FLP-activated DSXF-Dam construct (fsfDSXF-Dam). To generate a DNA-binding mutant for DSXF-Dam, site-directed mutagenesis was performed using primers 5⬘-TCTGCAAACGGCCTTGctgcagGCCCAGGCGCAGGAT-3⬘ and 5⬘-ATCCTGCGCCTGGGCctgcagCAAGGCCGTTTGCAGA-3⬘ (the lowercase letters indicate the mutated sequences). This resulted in the substitutions of R90L and R91Q in the DSX protein, and the mutated plasmid was confirmed by sequencing to be free of other mutations. An 820 bp BglII/XhoI fragment containing the mutated sequence was used to replace the wild-type sequence in fsfDSXF-Dam, resulting in the plasmid fsfR91Q-Dam. To construct a FLP-activated Dam control plasmid, the coding region of Myc-tagged E. coli Dam methyltransferase was digested from pNdamMyc (a gift from Drs van Steensel and Henikoff) with EcoRI/SalI, and cloned into P{UAST} between EcoRI and XhoI sites. An FRT-stop-FRT cassette was inserted at the EcoRI site and the correct orientation selected, generating the final plasmid fsfNdam. To generate bacterial expression plasmids for DSXF, the coding sequence from the fsfDSXF-Dam was PCR amplified using primers 5⬘ATGGTTTCGGAGGAGAACTGGA-3⬘ and 5⬘-CGTAGAATCGAGACCGAGGAGA-3⬘. The PCR products were cloned into pCR-Blunt-TOPO plasmid (Invitrogen, Carlsbad, CA), with the correct orientation confirmed by sequencing. Eletrophoretic mobility shift assay Eletrophoretic mobility shift assays were carried out as described previously (Erdman et al., 1996) with modifications. Bacteria expressing the V5-tagged DSXF (in pCR-Blunt-TOPO) were cultured in 5 ml of LB medium overnight at 30°C, then induced with 1 mM IPTG for 3 hours and harvested by centrifugation at 4°C. The cells were re-suspended in 300 ml of Z-50 buffer (Erdman et al., 1996) and briefly sonicated followed by centrifuging at 15,000 g at 4°C for 5 minutes and the supernatant immediately frozen (–80°C) until use. The DNA oligos for the probes (from the branchless gene) were synthesized as 5⬘ biotin labeled. Binding reactions (20 ml) were set up in Z-50 buffer and include 2 mg of poly[dI- dC], 0.25 pmol of labeled probes, 10 ml of total cell lysate and various amounts of competitors as indicated. After 20 minutes of incubation at 22°C, reactions were electrophoresed (5% PAGE gel, 0.5⫻ TBE buffer, 120 V) then transferred to a Hybond-N+ membrane (GE Life Sciences, Piscataway, NJ) and the biotin detected using a LightShift Chemiluminescent EMSA Kit (Thermo Scientific, Rockford, IL). Fly crosses and heat shocks Males carrying either fsfDSXF-Dam (two lines), fsfR91Q-Dam (three lines) or fsfNdam (1 line) were crossed to y w hsFLP122; +; + virgin females at 22°C. Newly eclosed female offspring with pigmented eyes were subjected to heat shock at 37°C for 40 minutes to induce the expression of FLPase that in turn activates the expression of Dam proteins, then allowed to recover at 25°C for 96 hours (24 hours for fsfNdam) before being collected for genomic DNA preparation. Construction of DpnI sequence-tag libraries for ultra highthroughput sequencing Genomic DNA was prepared with a DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA). For each sample 8 mg DNA was digested with 40 U of DpnI in 100 ml at 37°C overnight and purified through a QIAquick PCR purification column (Qiagen). A biotin-labeled EcoP15I adaptor was generated by annealing biotin-5⬘-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCAGCAG-3⬘ and 5⬘-CTGCTGAGATCGGAAGAGCT-3⬘ (the EcoP15I sites are underlined) and ligated to each DpnI-digested end. DNA polymerase I was used to repair the nicked strand followed by EcoP15I (which cleaves DNA 27 bp 3⬘ to its recognition site) digestion for 4 hours to generate a 27 bp sequence tag ligated to the biotin-labeled adaptor. The biotin-labeled tags were purified using 30 ml of strepavidin-Dynabeads (Invitrogen) per sample. After end repairing using Klenow polymerase, a second adaptor (generated by annealing 5⬘-ccgagatctaCACTCTTTCCCTACACGACGCTCTTCCGATCT-3⬘ and 5⬘-AGATCGGAAGAGCGTCGTGTA-3⬘) was ligated to the other end of the sequence tags. After washing, the beads were re-suspended in 25 ml 1⫻ Buffer 2 (New England Biolabs, Ipswich, MA). DpnI tags were PCR amplified with primers 5⬘CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCAGCAGTC3⬘ and 5⬘-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC-3⬘ (underlined sequences correspond to the Illumina library construction primers) as follows: 1 cycle of 68°C for 5 minutes; 11 cycles of 95°C for 30 seconds, 65°C for 1 minute, 72°C for 20 seconds; one cycle of 72°C for 5 minutes. The products were resolved on a 4% agarose gel. The desired ~125 bp product was recovered using a QIAquick Gel extraction kit (Qiagen) and quantified using a Nanodrop spectrophotometer (Thermo Scientific, Rockford, IL). Ultra high-throughput sequencing was performed on a Genome Analyzer platform (Illumina, San Diego, CA). A more detailed protocol on library preparation is available upon request. Analysis of sequencing data Sequencing data (25 bp reads) from the Genome Analyzer were matched to the fly genome using the Illumina software package and reads matching a unique location were binned for further analysis. The unique reads were aligned with GATC sites using Perl scripts, and any reads that were not located at a sequence flanking a GATC site were discarded (~10%). Reads that aligned with each GATC site were counted for each sample. Because methylation of any particular GATC site is independent between cells, and the probability of any particular site being methylated in a cell is small, the read counts at each GATC site were treated as a Poisson count. A Z-scorebased approach was used to compare the counts at each GATC site for different samples. The Z score was calculated by the formula: Z= Ca − Cb Ca + Cb , where Ca is the count of reads for sample A, and Cb is the count of reads for sample B. If Ca+Cb is smaller than 4, the Z score at that site was artificially set as 0 (which means that Ca is not different from Cb) to minimize bias created by the small sample. Any Z-score greater than 1.65 (equivalent to P<0.05) was transformed into a P value using the Z-score to P-value table. The transformation was set at a halving of P value for the DEVELOPMENT SELEX experiments defined a consensus DSX-binding site (Erdman et al., 1996) as a 13 bp palindromic sequence (G/A)nnAC(A/T)A(T/A)GTnn(C/T) composed of two half-sites around a central (A/T) base pair. One such site is expected by chance every 2 kb in the D. melanogaster genome, thus, additional information is needed in vivo to specify biologically relevant DSXbinding sites. We identified at a genome-wide level potential DSX-binding regions via DamID chromatin profiling. Analysis of these DSXbinding regions identified a new consensus 13 bp palindromic DSX-binding sequence in which the identity of each base is important for optimal DSX binding. This optimal DSX binding sequence is enriched in the D. melanogaster genome (58 copies versus approximately three expected from random). There are 23 genes associated with an in vivo peak in DSX binding and an optimal DSX binding sequence, and therefore almost certainly are direct DSX targets. The abundance of the optimal DSX-binding site in other insect genomes, as well as the maintenance of its association with the above 23 direct DSX target genes in these species, provides insight into the molecular evolution of DSX and its targets in insects. Development 138 (13) DSX targets and the evolution of sex Identification of an optimal DSX-binding site by MEME, and its analysis in insects The peaks (see Table S2 in the supplementary material) used for searching for an enriched motif by MEME (Bailey and Elkan, 1994) were picked based on peak shapes and good GATC-site density such that allow us to locate the center of the peak, to which DSX potentially binds, and to remove extraneous sequences on both sides. Whole-genome analysis of optimal DSX-binding site matches in the D. melanogaster genome of all chromosome arms (release sequence BDGP R5, April 2006) was carried out with a Perl script. The whole-genome searches for a given 13 bp sequence used Blast (Altschul et al., 1997), with E-value adjusted for genome size. Orthologous genes were defined by the FlyBase annotation or by Blast using the D. melanogaster sequence if no ortholog had been annotated. The sequence of the gene and its 5⬘ and 3⬘ flanking regions were manually exported and subjected to analysis using Perl script. The 5⬘ or 3⬘ flanking region is defined as the sequence until the next gene with the following exceptions: if the next gene is less than 10 kb away, the flanking region is extended to 10 kb; if the next gene is greater than 50 kb away, the flanking region is truncated at 50 kb. Protein alignment The amino acid sequences of DNA-binding domains for the 12 Drosophila DSX proteins were obtained from NCBI GenBank or FlyBase except for D. mojavensis and D. willistoni, which were derived from sequences of RTPCR products, as we found errors in the NCBI sequences in the DSX proteins for these two species. The amino acid sequence alignment was performed using Clustalw2 (Larkin et al., 2007). RESULTS Whole genome identification of DSX-binding regions In vivo DSX-binding regions were identified on a whole-genome scale using DamID, a technique in which tethering E. coli DNA adenine methyltransferase (Dam) to a chromatin binding protein leads to specific methylation of GATC sites in the vicinity of that protein’s binding/recruitment sites (van Steensel and Henikoff, 2000; van Steensel et al., 2001). To avoid non-specific methylation from overexpression (van Steensel and Henikoff, 2000), Damfusion proteins were expressed using the basal expression level of the minimal hsp70 promoter in P{UAST} (Fig. 1A). The GATC methylation profile of a DSXF-Dam fusion construct was compared with those of two control proteins. First, Dam alone was used to measure non-specific methylation (van Steensel and Henikoff, 2000). As the fly genome contains ‘transcription factor colocalization hotspots’, where transcription factors accumulate through protein-protein interactions (Moorman et al., 2006), a second control was Dam fused to a DSX protein with a mutation in its DNA-binding domain (R91Q) that has lost the ability to bind Fig. 1. Modified DamID approach to identify DSXbinding regions. (A)Schematic drawing of the flip-on Dam-fusion constructs. The proteins are expressed from the basal level of the UAS sequence after the FRTstop-FRT cassette is removed by expression of FLPase. (B)Example of an Illumina sequencing library. Lane 1, 100 bp DNA ladder; lanes 2 and 3, PCR of a sample skipping the DpnI digestion step to serve as a negative control; 1ml (lane 2) and 4ml (lane 3) of ligation product were added as a template; lanes 4 and 5, PCR of a sample with every step finished, 1ml (lane 4) and 4ml (lane 5) of ligation product were added as a template. The library band is indicated by an arrowhead and the primer dimer is indicated by an asterisk. (C)The Z-score peak of a DSX-binding region at the Yp1, Yp2 locus. The Z-score comparing the tag counts between DSXF-Dam and R91Q-Dam for each GATC site was plotted at the Yp1, Yp2 locus. The genomic position is indicated on the x-axis. The scale of the Z-score is indicated on the y-axis. Each vertical line indicates the Z-score for a GATC site at the corresponding genomic position on the x-axis. The height of the vertical line indicates the value of the Zscore. For those GATC sites with a Z-score less than 0, only the position of the GATC site is indicated (by a vertical line with the Z-score value of 0). The genes are indicated under the x-axis, with open boxes indicating genes on the forward strand and shaded boxes indicating genes on the reverse strand. (D)Similar to C, the Z-sore peak of a DSX-binding region near the gene shd is shown. DEVELOPMENT convenience of the Perl scripts programming, such that a P value between 0.025 and 0.05 was set as P0.05, a P value between 0.025 and 0.0125 was set as P0.025, and so on; Z-scores greater than 4.3 were set as P0.00001. For any Z-score smaller than 1.65, the P value was set as 1. The cumulative P value was then calculated for each peak by grouping the GATC sites whose P values were equal or smaller than 0.05, and are separated from each other by less than 1.5 kb or no more than one GATC site (in the case that the distance between two GATC site is greater than 1.5 kb). The P value for the peak is calculated by multiplying the P values of the sites and the total number of sites within the peak. RESEARCH ARTICLE 2763 2764 RESEARCH ARTICLE Development 138 (13) The read counts for each GATC site were compared between DSXF-Dam and R91Q-Dam by Z-score testing, which takes into account the difference and the absolute number of reads for the samples. Sites where methylation is significantly higher in DSXFDam than in R91Q-Dam often clustered together in peaks (Fig. 1C,D). We calculated the cumulative P values for all such peaks and chose 1.0e-5 as the cut off threshold for defining a DSXbinding region. This strategy yielded 1452 putative direct DSXbinding regions. To discern whether any of these are likely transcription factor colocalization hot spots, we compared the methylation profiles of R91Q-Dam with Dam alone. A total of 3456 peaks passed the 1.0e-5 cut off, and we consider the corresponding genomic regions to be hot spots. This represents approximately one hot spot every 35 kb, similar to the report of one hot spot per 50 kb in a 1 Mb region studied (Moorman et al., 2006). We re-assessed the DSX-binding peaks by the same Z-scoretesting, but excluded GATC sites located in hot spots, resulting in 1278 DSX-binding regions in the genome. We repeated our experiments using different biological samples for DSXF-Dam and R91Q-Dam (different transgenic lines were used for R91Q). The libraries of sequence tags were sequenced by a second generation Illumina Genome Analyzer II platform. About 11 million reads that map to unique GATC sites were generated for each sample, providing about 34 times the coverage of GATC sites. For Z-score comparison of the two samples, we chose a P value of 1.0e-10 as a cut off as that yielded 1460 peaks, which is similar to the number of peaks obtained from the first run (1452). We excluded the hot spots defined in the first run from the second run data, resulting in 1370 peaks. Six-hundred and fifty (~50%) of the peaks were common to both lists, and we consider these regions to be the more reliable in vivo DSX-binding regions (see Table S1 in the supplementary material). In both runs, all four known direct DSX target genes have significant peaks in the regions where their previously demonstrated DSX-binding sites are located (Table 1), indicating that our approach is robust in identifying real DSX-binding regions. We also compared the genes near the identified DSX-binding regions (see Table S1 in the supplementary material) with the lists of (1) genes showing sex- biased expression or (2) genes regulated by DSX (Ranz et al., 2003; Parisi et al., 2003; Arbeitman et al., 2004; Gnad and Parsch, 2006; Goldman and Arbeitman, 2007; Lebo et al., 2009). The only significant correlation is with DSXregulated genes (Goldman and Arbeitman, 2007): 14/54 genes labeled as ‘DSX set’ are located near a DSX-binding region, which is significantly more frequent than the overall probability (0.05) of a random gene being found near a DSX-binding region (P3.0e–7). There was no significant correlation between the sex-biased genes from the other studies and proximity to a DSX-binding region, suggesting that most of the sex-biased genes identified in these papers are probably indirect DSX targets. Table 1. Summary of the known DSX direct targets and their associated DSX-binding regions DSX target Yp1,Yp2 Yp3 bab1 Fad2 DSX-binding region (peak) P value (1st run) P value (2nd run) Position of peak center relative to the target gene ChrX:9943693-9950830 ChrX:13653051-13654128 Chr3L:1082484-1085429 Chr3L:11016788-11018561 2.21e-50 6.13e-6 9.6e-10 1.04e-10 2.01e-80 4.39e-14 1.5e-12 2.39e-25 Promoter region Promoter region First intron Promoter region *Column 1 lists the previously known direct DSX targets up to date. Column 2 shows the DSX-binding regions (peaks) identified by the DamID-seq experiments. Columns 3 and 4 list the P values of the peaks from the two runs of sequencing experiments. Column 5 indicates the position of the peak center relative to the DSX direct target genes. DEVELOPMENT the Yp1 DSX-binding site (Erdman and Burtis, 1993; Zhang et al., 2006). R91Q-Dam should bind to transcription factor hotspots, but not to real DSX-binding sites. Expression of the Dam-fusion proteins was induced by heat shock of newly eclosed females (hspFLP; UAS-FRT-stop-FRT-Dam[fusion] protein coding sequence) and the flies were allowed to recover at 25°C for 24 hours for Dam alone, and 96 hours for DSXFDam and R91Q-Dam (Fig. 1A). These times were chosen empirically as producing comparable levels of methylation in the alpha-tubulin84D region (an indicator of the whole genome methylation levels), because comparable levels of methylation in these samples are important to minimize the biases towards regions with lowest or highest accessibility in the samples when their methylation levels are compared. Using Yp1 as a positive control, we also validated the DamID approach using semi-quantitative PCR. Ultra high-throughput sequencing was used to profile methylated GATC sites across the whole genome. To quantify the methylation level at each GATC site, the genomic DNA from whole flies was digested with DpnI, which cuts only methylated GATC sites, and processed to generate a library in which each element has two adapters flanking a 27 bp genomic sequence from a DpnI-digested end (Fig. 1B). The final product was amplified by PCR to facilitate visualization and purification. In theory, two 27 bp sequence tags will be generated from each GATC site independently of the methylation status of other GATC sites. In addition, because of the low number of PCR cycles (10-12 cycles) and the uniform size of the sequence tags (expected 27 bp, observed 27±1), little PCR bias should be introduced. Thus, the library can be subjected to ultra high-throughput sequencing to quantify the relative methylation level at each GATC site (see Fig. S1 in the supplementary material). Using an Illumina Genome Analyzer sequencing platform, 3.1 million, 2.6 million and 3.5 million reads matching a unique position in the fly genome were generated for Dam alone, DSXFDam and R91Q-Dam, respectively. Individual reads that matched two or more genomic locations were discarded, as were reads that matched a location distant from a GATC site, which were probably generated from fragments produced by random DNA breaks. In general, over 85% of the unique reads matched a genomic GATC site. GATC sites in heterochromatic regions had significantly fewer unique reads (over 10-fold fewer compared with euchromatic regions, consistent with the expectation that the heterochromatic regions are generally less accessible), and were excluded from further analysis. These filtering steps resulted in read counts of 2.6 million, 2.4 million and 3.1 million for Dam alone, DSXF-Dam and R91Q-Dam, respectively. These translate into 7.4-9.6 times coverage of the GATC sites, and 8489% of the GATC sites have at least one read mapped to them. A large proportion of the 11-16% unmapped GATC sites were common to the three samples, and in regions of repetitive sequence, so the reads for these sites were discarded due to multiple matches. Identification of a new consensus DSX-binding site As the 13 bp consensus DSX-binding sequence derived from SELEX experiments (Erdman et al., 1996) has insufficient information to function as a full DSX-binding site in vivo, we searched the DSX-binding regions that we identified above to see whether we could refine the DSX-binding sequence. We chose 43 regions where the center GATC sites had higher Z scores and the distribution of GATC sites in the regions was relatively even and dense. We trimmed these regions to an average of 1.5 kb to remove as much extraneous sequence as possible without much risk of omitting the potential DSX-binding site. The trimmed regions were searched for enriched motifs using MEME (Bailey and Elkan, 1994). The sequence identified as most enriched is GCAACAATGTTGC (Fig. 2A), which resembles the SELEXidentified DSX-binding site, (G/A)nnAC(A/T)A(T/A)GTnn(C/T), Fig. 2. Identification of a new consensus DSX-binding sequence. (A)The sequence identified by MEME that is enriched in 43 DSXbinding regions (see Table S3 in the supplementary material) (Evalue3.4e-21). The overall height of each stack indicates the sequence consensus at that position (measured in bits), whereas the height of nucleotide symbols within the stack reflects the relative frequency of the nucleotides at that position. (B)Comparing the new consensus DSX-binding sequence with the previously in vitro identified DSXbinding sequence. A 33 bp sequence from the second intron of branchless (bnl) containing a perfect new DSX-binding site was used as a probe in an electrophoresis mobility shift assay. Two palindromic sequences with mutation at the outside 6 bp of the DSX-binding site were created and used as competitors. The sequences and the amount (fold over labeled probe) of competitors are shown in the table under the panel. Lower case letters indicate the mutated nucleotides. RESEARCH ARTICLE 2765 except that it is a complete palindrome and the identity of the outside six bases is specific, suggesting that the these bases may contribute to interactions with DSX protein. We analyzed the affinity of recombinant DSX protein for each of these sequences in vitro by electrophoretic mobility shift assay (EMSA) using a probe containing the new consensus DSX-binding site (GCAACAATGTTGC) and two competitor sequences that differ at the outside six bases (atgACAATGTcat and cgtACAATGTacg). As there are same amounts of protein and probe in all reactions, the decrease in the intensity for the shifted band reflects the relative affinity of the respective competitor to the DSX protein. The SELEX-identified DSXbinding site predicts that these three sequences will have similar affinities, whereas the new consensus DSX-binding site predicts that the first sequence will have higher affinity than the other two. We found that the second sequence (atgACAATGTcat) has about a threefold lower affinity compared with the new DSX consensus site, whereas the third sequence (cgtACAATGTacg) has an even lower affinity (Fig. 2B), suggesting that the new consensus sequence provides more information about the identity of an optimal DSX-binding site. The Yp3 gene is a direct DSX target, and DSX binds a 177 bp fragment from the Yp3 promoter by EMSA (Hutson and Bownes, 2003). However, the SELEX-identified DSX-binding site could not be found. Alignment with the new consensus DSX-binding sequence identified a good candidate for the DSX-binding site in the Yp3 promoter, with only two mismatches out of the 13 bp sequence (see Fig. S2A in the supplementary material). By EMSA, this Yp3 promoter sequence was bound by DSXF protein (see Fig. S2B in the supplementary material), and competed away by unlabeled oligos of the same Yp3 sequence or oligos for the DSX A site in the Yp1 gene, and to a lesser extent, by oligos for the DSX B site in the Yp1 gene. To elucidate the importance of each base pair in the new consensus DSX-binding site, we used EMSA to evaluate the change in DSX affinity for sites mutated at different positions. We found that (1) the sequence identity of each of the 13 bp contributes to the affinity of the DSX binding site, with the outside 6 bp sequence contributing to a lesser extent than the 7 bp core sequence; and (2) a single A/T is preferred at the center position (see Fig. S3 in the supplementary material). The new consensus sequence is enriched in the Drosophila melanogaster genome and significantly associated with DSX-binding regions Searching the whole genome for perfect matches to the new consensus sequence (hereafter referred to as the optimal DSXbinding site) revealed 58 matches (Table 2), well in excess of the 3 (±2) matches expected by chance assuming the genome sequence is random and adjusting for a 42.5±2.5% GC content (Jabbari and Bernardi, 2004). To test whether the optimal DSX-binding site is significantly enriched in the genome compared with similar sequences, we permuted the outside 6 bp of the consensus sequence in such a way that the percentage of GC remained unchanged and the resulting sequence was palindromic. There are 24 such sequences including the optimal DSX-binding site (Fig. 3A). The copy number for the optimal DSX-binding site was significantly greater than the copy number for each of the other 23 sequences by Grubb’s test (P1.24e-12 for one-tailed test) (Fig. 3B). The largest copy number for the other 23 sequences, which is 12, is not significantly different from the other 22 sequences (P0.22 for a one-tailed test). Excluding the optimal DSX-binding DEVELOPMENT DSX targets and the evolution of sex 2766 RESEARCH ARTICLE Development 138 (13) Table 2. Genes near the 58 optimal DSX-binding sites X X X X X X 2L 2L 2L 2L 2L 2L 2L 2L 2L 2L 2L 2L 2L 2L 2R 2R 2R 2R 2R 2R 2R 2R 2R 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3L 3R 3R 3R 3R 3R 3R 3R 3R 3R 3R 3R 3580716 11442523 12115551 13800635 20790737 21378722 672482 684251 715832 2426306 4794104 7134101 7400153 7915167 7915243 9692193 12010402 12529004 17339045 19846470 2533346 3094399 5795615 11488592 12296335 13761675 14470005 18554605 18641339 29848 30341 50987 51892 1085012 1118618 1120322 2991898 3001667 7446065 10368944 10600974 11192592 14369164 16208698 17260582 18378824 20634560 5699874 9008489 15636942 15958377 17434128 21932142 23884341 25116246 26169425 27742671 27765517 3580704 11442511 12115563 13800623 20790725 21378710 672470 684263 715844 2426294 4794092 7134113 7400165 7915155 7915255 9692205 12010414 12529016 17339033 19846482 2533358 3094411 5795603 11488604 12296347 13761687 14470017 18554617 18641327 29860 30353 50975 51880 1085024 1118630 1120334 2991910 3001655 7446077 10368932 10600986 11192604 14369176 16208710 17260570 18378836 20634548 5699862 9008477 15636930 15958365 17434140 21932154 23884329 25116258 26169413 27742683 27765505 Nearest gene(s) P value (1st run) P value (2nd run) Mnt CG32666 Ten-a mamo CG11229 Cyp6t1 ds ds ds dpp CG15630 Pvf3 CG1571 Snoo Snoo Pka-C1 Rh5 bun CG31408 sick Fmo-2/debcl pk Fmrf CG42524 CG33960 elk CG30115 CG30273 CG42260 mthl8 mthl8 Lsp1gama Lsp1gama bab1 bab1 bab1 CG2113 FR CG42458 CG12362 CG34237 CG6199 CG13418 CG13073 CG13723 rpr CG13251 CG34360/Teh1 CG8138 bnl Gr92a InR CG31324 CG34362 CG14506 hdc heph heph 2.19E-08 1.88E-13 2.61E-32 ns 5.37E-17 ns ns 1.49E-37 ns ns ns ns 5.86E-10 3.75E-08 3.75E-08 ns 8.10E-27 ns ns 6.27E-10 6.92E-33 ns 1.75E-10 ns ns ns ns ns ns 6.52E-33 6.52E-33 8.45E-35 8.45E-35 9.60E-10 7.18E-19 8.98E-53 ns ns ns 5.91E-12 ns 1.14E-25 ns 1.98E-59 ns ns 8.59E-09 6.25E-08 ns 9.63E-71 ns 1.54E-12 ns ns 1.74E-20 ns ns ns ns 2.76E-30 3.76E-69 ns 2.62E-89 3.13E-11 ns 1.92E-40 ns ns ns 1.07E-15 7.18E-23 3.24E-30 3.24E-30 ns 7.38E-23 ns ns 3.72E-36 7.12E-38 ns 1.87E-22 5.95E-15 ns 4.27E-13 ns ns ns 8.83E-87 8.83E-87 6.63E-105 6.63E-105 1.50E-12 7.27E-133 7.27E-133 ns ns ns 1.72E-44 ns 9.72E-43 4.50E-11 2.43E-121 1.23E-21 ns 3.91E-11 1.39E-25 ns 8.82E-152 ns ns ns ns 2.78E-50 ns ns ns * * * * * * * * * * * * * * * * * * * * * * * * * The 58 optimal DSX-binding sites are listed by the chromosome arm, the start and the end nucleotide positions. The nearest protein-encoding gene for each site is also listed. P values for the DSX-binding peak within 500 bp of the sites from two runs of experiments are also listed. ns indicates where there is no peak with P value past the cut off. Asterisk indicates that the gene is likely to be a DSX direct target. site, the average copy number for the other 23 sequences is five, with a standard deviation of three; this is close to the expected number assuming that the presence of a sequence in the genome is random. Determination of whether perfect matches to each of these 24 sequences are located near one of the 650 DSX-binding regions (including the 43 used by MEME) revealed that the optimal DSXbinding site is significantly associated with DSX-binding regions DEVELOPMENT Genomic position of DSX site DSX targets and the evolution of sex RESEARCH ARTICLE 2767 chromatin accessibility and therefore cannot be detected by DamID. Alternatively, the sequence itself may not be sufficient to define a DSX-binding site in vivo. (Fig. 3, P2.1e-22 compared with a random 13 bp sequence in the genome which has a probability of 0.019 of being within 0.5 kb of a DSX-binding region). By contrast, the other 23 palindromic sequences are not significantly associated with DSX-binding regions. Therefore, the optimal DSX-binding site is both significantly enriched in the genome and significantly associated with the identified DSX-binding regions. Is this optimal DSX-binding sequence sufficient to predict an in vivo DSX-binding site? Twenty-five of the 58 optimal DSXbinding sites in the genome are associated with a DSX-binding region (27 for the first run, 31 for the second run and 25 in common, corresponding to 23 genes; see Table 2). Thus, the optimal DSX-binding sequence has at least a 40% chance of predicting an in vivo DSX-binding site, whereas the other 23 palindromic sequences do so no better than a random sequence. That not all 58 optimal DSX-binding sites are associated with a DSX-binding region may reflect that: (1) required co-factor(s) for DSX are not present at the time point of our experiment; or (2) the DSX-binding site is located in a GATC-sparse region and/or a region without adequate sequencing coverage because of low The enrichment of the optimal DSX-binding sequence is evolutionarily conserved across the 12 Drosophila species To determine whether the enrichment of the optimal DSX-binding site was conserved in other Drosophila species, across which the DNA-binding domain of DSX is very conserved (see Fig. S4 in the supplementary material), we examined the number of copies of the optimal DSX-binding sequence, as well as the other 23 related palindromic sequences, in the other 11 sequenced Drosophila genomes. The copy number for the optimal DSX-binding sequence is always the highest in each Drosophila genome (Fig. 4 and see Fig. S5 in the supplementary material) and stands out as the sole significant outlier by Grubb’s test (all P<8e-6, on a two-tailed test). We next examined whether the association of an optimal DSXbinding site with a particular gene is conserved for the 23 D. melanogaster genes that we had identified as direct DSX targets based on their being both associated with a DSX-binding region and having an optimal DSX-binding site (Table 2). We determined whether one or more potential DSX-binding site (13/13, 12/13 or 11/13 matches) is associated with each of these 23 genes in the other 11 sequenced Drosophila species (four examples are in Fig. DEVELOPMENT Fig. 3. The new DSX consensus sequence is significantly enriched in the genome of Drosophila melanogaster. (A)List of palindromic sequences created by switching the position at the outside 6 bp of the new DSX consensus binding sequence (column 1, altered positions are shown in red), and their respective copy numbers in the whole genome (column 2), as well as the number of the copies that are near a DSXbinding region (column 3). (B)Scatter plot of the copy numbers for sequences listed in A. The copy number for the DSX consensus sequence (in orange) is significantly greater than those numbers for the rest sequences (P1.24e-12 for one-tailed test). Other DSX-binding sites Although the 13 bp optimal DSX-binding sequence is frequently associated with direct DSX targets, mismatches are allowed in the sequences through which DSX functions in vivo. The identified DSX sites in the bab1 gene (Williams et al., 2008) match well with the optimal DSX-binding site (13/13 and 12/13 bp) and are within the DSX-binding region we identified for bab1. We found several additional DSX-binding regions within the bab1 and bab2 intergenic region, two of which contain an optimal DSX-binding sequence, the functional roles of which are not known. The DSX protein regulates the female-specific expression of Fad2 by binding to a sequence (GCAACAATGTatg; 10/13 bp; matches in capitals) located 272 bp upstream of the Fad2 transcription start site. Finally, in the case of the Yp1 and Yp2 intergenic region, there is a DSXbinding site A (aCtACAATGTTGC) and a weaker DSX-binding site B (cCtACAAaGTgat), which have 11/13 and 7/13 matches to the optimal DSX-binding site, respectively. Given that particular sequences with 11/13 and 12/13 matches to the optimal DSX-binding sequence function as DSX-binding sites in vivo, we asked how many such sequences were observed in the genome and how many were expected, assuming the genomic sequence was random. For 12/13 matches, the number observed (541) was four times the number expected (135). For 11/13 matches, the number of observed (4532) and expected (2310) were less different. Of the sequences with 12/13 matches to the optimal binding site, 14% were associated with a DSX-binding region, whereas only 1.9% would be expected at random. In addition, the numbers of copies of individual sequences with 12/13 bp matches to the optimal DSX-binding site were nonrandom, with some sequences having significantly more copies than expected. However, we could not discern any correlation between the number of copies of individual sequences and the likelihood that a sequence was associated with a DSX-binding region. Thus, although additional information probably determines whether 12/13 and 11/13 matches function as DSX-binding sites in vivo, the nature of that information is currently unclear. 2768 RESEARCH ARTICLE Development 138 (13) genomes, the D. melanogaster optimal DSX-binding site is found at both a significantly higher copy number than the 23 control sequences and is 26- or 34-fold over expected, respectively (Fig. 4). In another mosquito species, Culex pipiens, the copy number of the optimal DSX-binding site is the second highest and is 10-fold more than random expectation. The highest number of perfect matches in the C. pipiens genome is to the related sequence aCgACAATGTcGt (Fig. 4). In the silkworm Bombyx mori, which is further removed from Drosophila, the D. melanogaster optimal DSX-binding site copy number is only twofold more than random expectation and is not an outlier among the 24 palindromic sequences. Instead, the related sequence cgtACAATGTacg is the sole outlier, and is 16-fold more prevalent than expected. None of these 24 sequences showed up as an outlier (P0.01 cut off) in the more distantly related insect species examined (Tribolium castaneum, Apis mellifera, Nasonia vitripennis, Acyrthosiphon pisum and Pediculus humanus corporis) (Fig. 4). Taken together, these observations suggest that a global evolution of the optimal DSX-binding sequence has occurred within the insects. 5; the rest are in Fig. S6 in the supplementary material). The association of genes with an optimal DSX-binding site is seen for 23-35% of the 23 genes, suggesting the possibility that these genes may function in sex-specific differentiation in all these species. Thus, five of these genes are associated with either at least one copy of the optimal DSX-binding sites, or two or more copies of a 12/13 match to the optimal DSX-binding site in all 11 species for which the orthologous gene can be found from FlyBase annotation or a Blast search. For three additional genes, an optimal DSXbinding site is found associated with the gene in 10 species and a 12/13 match to this site found in one species. Similarly, within the four melanogaster subgroup species examined, 65-90% of the 23 genes show an association with an optimal DSX-binding site, suggesting that these genes may function in sex-specific differentiation in all these species. Thus, 15/23 genes were associated with at least one optimal DSX-binding site in all four species, and an additional six genes associated with either an optimal DSX-binding site or a sequence that is a 12/13 match to that binding site. The pattern of enrichment of the optimal DSXbinding sequence in other insects Extending this analysis to other insect species revealed the following. In two mosquito species, Aedes aegypti and Anopheles gambiae, similar to what is observed in the 12 Drosophila The advantage of profiling methylated GATC sites using ultra high-throughput sequencing We have modified the DamID technique to take advantage of nextgeneration sequencing. Previous whole-genome DamID experiments largely used array-based methods (van Steensel and Henikoff, 2000; van Steensel et al., 2001; Orian et al., 2003; Bianchi-Frias et al., 2004; Song et al., 2004; de Wit et al., 2005; Reddy et al., 2008; Pindyurin et al., 2007). In these approaches, the methylated fragments are isolated by PCR-based methods, and therefore are subjected to PCR bias, although efforts were made to minimize such bias. Still, the recovery of a fragment depended on the GATC sites at both ends being methylated, and large fragments produced from GATC-poor regions would be less efficiently amplified. In our approach, each GATC site is evaluated independently. Two sequence tags are generated from each methylated GATC site, and this does not depend on the methylation status of nearby GATC sites. In addition, there are only minimal PCR cycles (<13) employed during library construction and the sizes of PCR products are very uniform (125±1 bp); hence, there is minimal PCR bias. In previous reports where cDNA arrays were used, only binding regions close to exons could be identified. As transcription factorbinding sites are often located in the intergenic regions or in introns, using a cDNA array would probably miss many binding regions. Using tilling arrays may avoid the problem of missing target regions distant from genes, but tilling arrays are limited by the fraction of the genome that can be put on a chip. Moreover, the information is harder to compare between different versions of arrays. To avoid these pitfalls, we used ultra high-throughput sequencing to generate digitized actual sequences that can be directly matched to the genome and can be easily compared with results from different experiments. Furthermore, the array-based DEVELOPMENT Fig. 4. The enrichment of the optimal DSX-binding sequence is conserved in other insects. The genome-wide copy numbers for the optimal D. melanogaster DSX-binding sequence and the other 23 palindromic sequences (listed in Fig. 4A) were normalized by the copy number expected from randomness, and plotted for each insect species. The copy number for the optimal D. melanogaster DSX-binding sequence is indicated in orange for the first 15 species. The copy number for sequence aCgACAATGTcGt in the Culex pipiens genome is indicated in purple. The copy number for sequence cgtACAATGTacg in the Bombyx mori genome is indicated in red. The mean±s.e.m. for each species is indicated. DISCUSSION A modified DamID approach identified a collection of ~650 potential direct DSX target genes. The subset of 23 of these genes shown to be associated with evolutionarily conserved optimal DSX-binding sites seem extremely likely to be direct DSX targets, substantially increasing the number of identified direct DSX targets. Below, we discuss some salient aspects of our experimental approach and findings. DSX targets and the evolution of sex RESEARCH ARTICLE 2769 methods are more easily saturated, thus resulting in false-negatives, whereas the ultra high-throughput sequencing in theory has an infinite linear range. The DamID approach might lack tissue specificity We found that for a binding region to be identified, the corresponding target gene does not need to be in a transcriptionally active state when the Dam-fusion protein experiment is carried out. In the case of the Yp1 DSX-binding region, we observed a three- to ninefold higher normalized methylation level for the DSXF-Dam sample versus Dam alone sample in third instar larvae, even though the Yp1 gene is not transcribed until after eclosion, some 4 days later. There are two possibilities for this lack of temporal specificity. One is that it is an intrinsic feature of the DamID approach. It has been suggested that transcription factors are always in a dynamic mode of on and off their binding sites (e.g. Hager et al., 2009). The Dam-fusion protein might be highly efficient in methylating the GATC sites around its binding sites, so that even transient binding of the protein is sufficient to generate methylation. Alternatively, the DSX-binding sites may be readily recognizable by the DSX protein either because of their sequences per se or because of other factors associated with the sites are already in position, so that when the DSX-Dam protein is expressed at an earlier stage, it binds to these pre-arranged binding sites. Implications for the evolution of sex Our findings contribute at several levels to an understanding of the evolution of sex. Considerations of where in developmental regulatory hierarchies the mutations underlying evolutionary changes are most likely has led to the proposal that such changes are more likely to be in the cis-regulatory elements of the targets of the transcription factors encoded by the terminal regulatory genes in a hierarchy, than in the coding sequences of the terminal regulatory genes themselves (Tautz, 2000; Carroll, 2000). With respect to the sex hierarchy, the limited data previously available are consistent with this proposal. Thus, in the DSX DNAbinding domain (DM domain), the Zn module, as well as the Nterminal region of the tail (amino acids 78-97) immediately following the Zn module [which has been proposed to be the functionally important part of a recognition helix providing the binding specificity (Zhu et al., 2000)], are highly conserved among these Drosophila species (see Fig. S4 in the supplementary material). Of the four previously identified direct DSX targets, only two (bab and Fad2) exhibit changes in their sex-specific expression patterns within the Drosophila genus and in both cases these changes in the sex specificity of their expression are correlated with changes in adjacent DSX-binding sites (Williams et al., 2008; Shirangi et al., 2009). The 23 genes we identified as very probable direct DSX targets provide a broader data set with which to examine this proposal. Within the four other sequenced members of the melanogaster DEVELOPMENT Fig. 5. The evolutionary change of the association of DSX-binding site with genes. (A-D) Four examples of the association of potential DSX-binding sites in the 12 sequenced Drosophila species with the 23 genes that are identified in D. melanogaster as likely direct DSX targets. For each gene, the phylogenetic tree of the 12 Drosophila species is illustrated. The maximum number of base pairs matched to the 13 bp optimal DSX site by any 13 bp sequence in the gene and flanking regions is listed for each species. For the 12/13 matches, the actual sequences (with the mismatched base pair in red) are listed instead. N.A. indicates that there is no sequence available for analysis. subgroup species, 21/23 of these genes are also associated with putative DSX-binding sites (13/13 or 12/13 matches) in all four species, whereas 2/23 genes are not associated with close matches to the optimal DSX-binding site in at least one species, suggesting that the latter may be cases in which the sex-specificity of expression of a gene has changed via an alteration of its cis-acting DSX-binding sequence. When the data from these 23 genes were examined across all 12 sequenced Drosophila species, a similar result was seen: in 23-35% of the cases an association between the gene and a DSX-binding site is retained in all species, while for 6572% of these genes, a recognizable DSX-binding site is not present in one or more of these species. These findings with respect to the latter group of genes are consistent with a potential change in the sex specificity of expression as a consequence of a change in the adjacent DSX-binding site. A second theme in modern considerations of the evolution of sex is that sex evolves rapidly when compared with other developmental processes (Schütt and Nöthiger, 2000). This view derives in part from the wide diversity in sex determination mechanisms used in animals, and findings that the primary sex determination mechanisms can differ in closely related species (Schütt and Nöthiger, 2000; Bull, 1985; Gempe and Beye, 2011). Furthermore, and probably independently, at a developmental level evidence for the rapid evolution of sex comes directly from the myriad of secondary sexual characteristics that frequently distinguish closely related species, which otherwise have nearly identical body plans (Darwin, 1871). Our data, cited immediately above, provides a relatively broad look at the molecular genetic level at the frequencies with which there are potential changes between sex-specific and sex-nonspecific patterns of expression of downstream effector genes. Consistent with relatively rapid evolutionary changes in the sex-specificity of expression of some of the downstream genes in the Drosophila sex hierarchy, we found that ~70% of the 23 D. melanogaster genes associated with DSXbinding sites are not associated with such a site in one or more of the other 11 sequenced Drosophila species. It is also important to note that these data, in addition to pinpointing a set of genes whose roles in sexual development may have changed during the evolution of this genus, also identify another subset of these genes, comprising ~30% of the putative direct DSX targets in D. melanogaster, that retain an association with putative DSX-binding sites in all 12 Drosophila species examined. The latter finding suggests that a significant subset of the genes directly regulated by DSX have functions in aspects of sexual development that are evolutionarily relatively stable. Such evolutionarily conserved aspect of maleness or femaleness might encompass, for example, somatic functions that are necessary for sex-specific germline functions or gametogenesis. Extending our analysis of the optimal DSX-binding site and the related 13 bp palindromic sequences beyond the Drosophila genus provided additional evolutionary insights. In two mosquito species, Aedes aegypti and Anopheles gambiae, enrichment in the optimal DSX-binding site is also observed. However, in the mosquito species Culex pipiens, although the number of optimal DSXbinding sites is still 10-fold more than expected, the related sequence aCgACAATGTcGt had the highest number of perfect matches. Further analysis found that this latter sequence is within a repetitive element of the C. pipiens genome, so we infer that GCAACAATGTTGC is most probably the optimal DSX site in this species. The enriched sequence, cgtACAATGTacg, in the silkworm Bombyx mori is also within a repetitive element. None of the 24 sequences are enriched for the more distantly related insect Development 138 (13) species (Tribolium castaneum, Apis mellifera, Nasonia vitripennis, Acyrthosiphon pisum, and Pediculus humanus corporis) examined. Thus, in insects outside of the Diptera, the DSX proteins might have diverged sufficiently that they bind to sequences different than the 24 examined. Extending approaches such as those described here may provide significant insights into how a DNA-binding domain and its targets have evolved to maintain control of a fundamental developmental process across a broad evolutionary timescale. Acknowledgements We thank the members of the Baker and Sidow labs for helpful discussions and comments. We thank Dr Bas van Steensel and Dr Steven Henikoff for providing the pNDamMyc and pCMycDam plasmids, Dr Ziming Weng for helping with the high-throughput sequencing, Phil Lacroute for initial sequence mapping, Drs Mike Weiss, Loren Looger and Sean Eddy for helpful discussions, and Guennet Bohm for preparing fly food. This work is supported by NIGMS and the Howard Hughes Medical Institute. This study was also facilitated by information from FlyBase (www.flybase.org). Deposited in PMC for release after 6 months. Competing interests statement The authors declare no competing financial interests. Supplementary material Supplementary material for this article is available at http://dev.biologists.org/lookup/suppl/doi:10.1242/dev.065227/-/DC1 References Altschul, S. F., Madden, T. L., Schaffe, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402. Arbeitman, M. N., Fleming, A. A., Siegal, M. L., Null, B. H. and Baker, B. S. (2004). A genomic analysis of Drosophila somatic sexual differentiation and its regulation. Development 131, 2007-2021. Bailey, T. L. and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28-36 Bianchi-Frias, D., Orian, A., Delrow, J. J., Vazquez, J., Rosales-Nieves, A. E. and Parkhurst, S. M. (2004). Hairy transcriptional repression targets and cofactor recruitment in Drosophila. PLoS Biol. 2, E178. Bull, J. J. (1985). Sex determining mechanisms: an evolutionary perspective. Experientia 41, 1285-1296. Burtis, K. C. and Baker, B. S. (1989). Drosophila doublesex gene controls somatic sexual differentiation by producing alternatively spliced mRNAs encoding related sex-specific polypeptides. Cell 56, 997-1010. Burtis, K. C., Coschigano, K. T., Baker, B. S. and Wensink, P. C. (1991). The doublesex proteins of Drosophila melanogaster bind directly to a sex-specific yolk protein gene enhancer. EMBO J. 10, 2577-2582. Camara, N., Whitworth, C. and Van Doren, M. (2008). The creation of sexual dimorphism in the Drosophila soma. Curr. Top. Dev. Biol. 83, 65-107. Carroll, S. B. (2000). Endless forms: the evolution of gene regulation and morphological diversity. Cell 101, 577-580. Chapman, K. B. and Wolfner, M. F. (1988). Determination of male-specific gene expression in Drosophila accessory glands. Dev. Biol. 126, 195-202. Chatterjee, S. S., Uppendahl, L. D., Chowdhury, M. A., Ip, P. L. and Siegal, M. L. (2011). The female-specific Doublesex isoform regulates pleiotropic transcription factors to pattern genital development in Drosophila. Development 138, 1099-1109. Christiansen, A. E., Keisman, E. L., Ahmad, S. M. and Baker, B. S. (2002). Sex comes in from the cold: the integration of sex and pattern. Trends Genet. 18, 510-516. Coschigano, K. T. and Wensink, P. C. (1993). Sex-specific transcriptional regulation by the male and female doublesex proteins of Drosophila. Genes Dev. 7, 42-54. Darwin, C. (1871). The Descent of Man and Selection in Relation to Sex. London: John Murray. de Wit, E., Greil, F. and van Steensel, B. (2005). Genome-wide HP1 binding in Drosophila: developmental plasticity and genomic targeting signals. Genome Res. 15, 1265-1273. Erdman, S. E. and Burtis, K. C. (1993). The Drosophila doublesex proteins share a novel zinc finger related DNA binding domain. EMBO J. 12, 527-535. Erdman, S. E., Chen, H. J. and Burtis, K. C. (1996). Functional and genetic characterization of the oligomerization and DNA binding properties of the Drosophila doublesex proteins. Genetics 144, 1639-1652. DEVELOPMENT 2770 RESEARCH ARTICLE Garrett-Engele, C. M., Siegal, M. L., Manoli, D. S., Williams, B. C., Li, H. and Baker, B. S. (2002). intersex, a gene required for female sexual development in Drosophila, is expressed in both sexes and functions together with doublesex to regulate terminal differentiation. Development 129, 4661-4675. Gempe, T. and Beye, M. (2011). Function and evolution of sex determination mechanisms, genes and pathways in insects. Bioessays 33, 52-60. Gnad, F. and Parsch, J. (2006). Sebida: a database for the functional and evolutionary analysis of genes with sex-biased expression. Bioinformatics 22, 2577-2579. Goldman, T. D. and Arbeitman, M. N. (2007). Genomic and functional studies of Drosophila sex hierarchy regulated gene expression in adult head and nervous system tissues. PLoS Genet. 3, e216. Hager, G. L., McNally, J. G. and Misteli, T. (2009). Transcription dynamics. Mol. Cell 35, 741-753. Hutson, S. F. and Bownes, M. (2003). The regulation of yp3 expression in the Drosophila melanogaster fat body. Dev. Genes Evol. 213, 1-8. Jabbari, K. and Bernardi, G. (2004). Comparative genomics of Anopheles gambiae and Drosophila melanogaster. Gene 333, 183-186. Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R. et al. (2007). ClustalW and ClustalX version 2. Bioinformatics 23, 2947-2948. Lebo, M. S., Sanders, L. E., Sun, F. and Arbeitman, M. N. (2009). Somatic, germline and sex hierarchy regulated gene expression during Drosophila metamorphosis. BMC Genomics 10, 80. Lee, G., Hall, J. C. and Park, J. H. (2002). Doublesex gene expression in the central nervous system of Drosophila melanogaster. J. Neurogenet. 16, 229-248. Mellert, D. J., Knapp, J. M., Manoli, D. S., Meissner, G. W. and Baker, B. S. (2010). Midline crossing by gustatory receptor neuron axons is regulated by fruitless, doublesex and the Roundabout receptors. Development 137, 323-332. Moorman, C., Sun, L. V., Wang, J., de Wit, E., Talhout, W., Ward, L. D., Greil, F., Lu, X. J., White, K. P., Bussemaker, H. J. et al. (2006). Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 103, 12027-12032. Orian, A., van Steensel, B., Delrow, J., Bussemaker, H. J., Li, L., Sawado, T., Williams, E., Loo, L. W., Cowley, S. M., Yost, C. et al. (2003). Genomic binding by the Drosophila Myc, Max, Mad/Mnt transcription factor network. Genes Dev. 17, 1101-1114. Parisi, M., Nuttall, R., Naiman, D., Bouffard, G., Malley, J., Andrews, J., Eastman, S. and Oliver, B. (2003). Paucity of genes on the Drosophila X chromosome showing male-biased expression. Science 299, 697-700. Park, P. J. (2009). ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669-680. Pindyurin, A. V., Moorman, C., de Wit, E., Belyakin, S. N., Belyaeva, E. S., Christophides, G. K., Kafatos, F. C., van Steensel, B. and Zhimulev, I. F. (2007). SUUR joins separate subsets of PcG, HP1 and B-type lamin targets in Drosophila. J. Cell Sci. 120, 2344-2351. RESEARCH ARTICLE 2771 Ranz, J. M., Castillo-Davis, C. I., Meiklejohn, C. D. and Hartl, D. L. (2003). Sexdependent gene expression and evolution of the Drosophila transcriptome. Science 300, 1742-1745. Reddy, K. L., Zullo, J. M., Bertolino, E. and Singh, H. (2008). Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452, 243-247. Rideout, E. J., Dornan, A. J., Neville, M. C., Eadie, S. and Goodwin, S. F. (2010). Control of sexual differentiation and behavior by the doublesex gene in Drosophila melanogaster. Nat. Neurosci.13, 458-466. Robinett, C. C., Vaughan, A. G., Knapp, J. M. and Baker, B. S. (2010). Sex and the single cell. II. There is a time and place for sex. PLoS Biol. 8, e1000365. Ryner, L. C., Goodwin, S. F., Castrillon, D. H., Anand, A., Villella, A., Baker, B. S., Hall, J. C., Taylor, B. J. and Wasserman, S. A. (1996). Control of male sexual behavior and sexual orientation in Drosophila by the fruitless gene. Cell 87, 1079-1089. Sanders, L. E. and Arbeitman, M. N. (2008). Doublesex establishes sexual dimorphism in the Drosophila central nervous system in an isoform-dependent manner by directing cell number. Dev. Biol. 320, 378-390. Schütt, C. and Nöthiger, R. (2000). Structure, function and evolution of sexdetermining systems in Dipteran insects. Development 127, 667-677. Shirangi, T. R., Dufour, H. D., Williams, T. M. and Carroll, S. B. (2009). Rapid evolution of sex pheromone-producing enzyme expression in Drosophila. PLoS Biol. 7, e1000168. Song, S., Cooperman, J., Letting, D. L., Blobel, G. A. and Choi, J. K. (2004). Identification of cyclin D3 as a direct target of E2A using DamID. Mol. Cell. Biol. 24, 8790-8802. Tautz, D. (2000). Evolution of transcriptional regulation. Curr. Opin. Genet. Dev.10, 575-579. van Steensel, B. and Henikoff, S. (2000). Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18, 424-428. van Steensel, B., Delrow, J. and Henikoff, S. (2001). Chromatin profiling using targeted DNA adenine methyltransferase. Nat. Genet. 27, 304-308. Williams, T. M., Selegue, J. E., Werner, T., Gompel, N., Kopp, A. and Carroll, S. B. (2008). The regulation and evolution of a genetic switch controlling sexually dimorphic traits in Drosophila. Cell 134, 610-623. Wolfner, M. F. (1988). Sex-specific gene expression in somatic tissues of Drosophila melanogaster. Trends Genet. 4, 333-337. Zhang, W., Li, B., Singh, R., Narendra, U., Zhu, L. and Weiss, M. A. (2006). Regulation of sexual dimorphism: mutational and chemogenetic analysis of the doublesex DM domain. Mol. Cell. Biol. 26, 535-547. Zhu, L., Wilken, J., Phillips, N. B., Narendra, U., Chan, G., Stratton, S. M., Kent, S. B. and Weiss, M. A. (2000). Sexual dimorphism in diverse metazoans is regulated by a novel class of intertwined zinc fingers. Genes Dev. 14, 17501764. DEVELOPMENT DSX targets and the evolution of sex