Male-Biased Mutation Rate and Divergence in Autosomal, Z-Linked and W-Linked Introns of Chicken and Turkey

Erik Axelsson, Nick G. C. Smith,¹ Hannah Sundström, Sofia Berlin, and Hans Ellegren
Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden

To investigate mutation-rate variation between autosomes and sex chromosomes in the avian genome, we analyzed divergence between chicken (Gallus gallus) and turkey (Meleagris gallopavo) sequences from 33 autosomal, 28 Z-linked, and 14 W-linked introns, with a total ungapped alignment length of approximately 43,000 bp. Mean divergence differs markedly among autosomes and sex chromosomes (autosomes [A] = 10.08%, Z chromosome = 10.99%, and W chromosome = 5.74%). We use these data to estimate the male-to-female mutation-rate ratio (αm) from Z/A, Z/W, and A/W comparisons at 1.71, 2.37, and 2.52, respectively. Because the αm estimates of the three comparisons do not differ significantly, we find no statistical support for a specific reduction in the Z chromosome mutation rate (Z reduction estimated at 4.89%, P = 0.286). A mutation-rate reduction on the sex chromosome that is hemizygous in one sex (i.e., X in mammals, Z in birds) has been proposed on the basis of theory on adaptive mutation-rate evolution. If such a reduction exists in birds, the effect therefore seems to be weak; a preliminary power analysis suggests that it is significantly less than 18%. Divergence may vary within chromosomal classes as a result of variation in mutation and/or selection. We therefore developed a novel double-bootstrapping method, which bootstraps both by introns and by sites from concatenated alignments, to estimate confidence intervals for chromosomal class rates and for αm. The narrowest interval for the αm estimate is 1.88 to 2.97 from the Z/W comparison.
We also estimated αm by maximum likelihood on data from all three chromosome classes. This method yielded αm = 2.47, with an approximate 95% confidence interval of 2.27 to 2.68. Our data are broadly consistent with the idea that the male mutation bias alone can explain mutation-rate differences between chromosomal classes.

Introduction

How do mutation rates vary within genomes? To date, most vertebrate studies of mutation-rate variation have focused on mammals. There is increasing evidence of significant local and regional substitution-rate variation at putatively neutral sites within mammalian chromosomes (Wolfe, Sharp, and Li 1989; Matassi, Sharp, and Gautier 1999; Williams and Hurst 2000; Lercher, Williams, and Hurst 2001; Smith, Webster, and Ellegren 2002; Waterston et al. 2002; Hardison et al. 2003). The causes of this variation are poorly understood (Ellegren, Smith, and Webster 2003). However, substitution rates in orthologous regions covary along independent primate lineages, which shows that regional rate variation is deterministic and repeatable (Smith, Webster, and Ellegren 2002; Hardison et al. 2003). Attempts to explain mutation-rate heterogeneity have invoked sequence context effects (Silva and Kondrashov 2002; Zhao and Boerwinkle 2002; Arndt, Petrov, and Hwa 2003; Smith, Webster, and Ellegren 2003), an association between mutation and recombination processes (Lercher and Hurst 2002; Waterston et al. 2002; Hardison et al. 2003; Hellmann et al. 2003), and the evolution of isochores together with the correlation between GC content and substitution rates (Eyre-Walker and Hurst 2001). Understanding how and why mutation rates vary matters for the molecular basis of mutation and for genome evolution. It is also needed to address whether selection can modify mutation rates (mutation-rate evolution [Sniegowski et al. 2000]).
¹ Present address: Dept of Mathematics and Statistics, Fylde College, Lancaster University, Lancaster, UK.

Key words: male-biased mutation, Z chromosome, W chromosome, adaptive mutation rates, nonparametric bootstrapping.

Mol. Biol. Evol. 21(8):1538–1547. 2004
doi:10.1093/molbev/msh157
Advance Access publication May 12, 2004

Substitution-rate variation is also seen at the level of individual chromosomes. Mean substitution rates differ significantly among autosomes in various mammalian comparisons (Lercher, Williams, and Hurst 2001; Ebersberger et al. 2002). The sex chromosomes show the most extreme variation: the X chromosome evolves more slowly than the autosomes, and the Y chromosome evolves faster (Li, Yi, and Makova 2002). At least two factors are thought to affect mutation-rate variation between the mammalian sex chromosomes. First, suppose replication error during germline cell division is the major mutagenic process. Males undergo many more germline cell divisions than females, so the Y chromosome mutation rate should then be higher than the X chromosome rate (Miyata et al. 1987). This difference underlies the argument for male-biased mutation, or male-driven evolution. In primates, molecular evolutionary analyses show Y chromosome divergence to be approximately 2.2 times higher than X chromosome divergence (Shimmin, Chang, and Li 1993; Makova and Li 2002; but see Bohossian, Skaletsky, and Page [2000]). This translates to a male-to-female mutation-rate ratio (αm) of 4 to 6. In rodents, αm is estimated at approximately 2 (Chang et al. 1994), and in goats at 3 to 4 (Lawson and Hewitt 2002), which may indicate a correlation between generation time and αm. Second, selection may favor a reduced mutation rate on the X chromosome, because recessive deleterious mutations are exposed in the hemizygous sex (McVean and Hurst 1997).
This theory was supported by an early finding that the X-linked synonymous substitution rate in the mouse–rat comparison was reduced more than a male mutation bias alone could explain (McVean and Hurst 1997). However, current evidence for an adaptive reduction in the X-linked mutation rate of mammals is weak. For example, in a human–chimpanzee comparison of genomic sequences, the reduction in X-linked rates can be explained by a male mutation bias plus high ancestral polymorphism. This is because the confidence intervals of the male mutation bias estimates from the X/A, Y/A, and Y/X comparisons overlap (Ebersberger et al. 2002). Malcom, Wyckoff, and Lahn (2003) recently reached a similar conclusion from extensive human–mouse and mouse–rat comparisons of synonymous substitution rates. It is unclear why these two later studies disagree with the observation of McVean and Hurst (1997).

Molecular Biology and Evolution vol. 21 no. 8 © Society for Molecular Biology and Evolution 2004; all rights reserved.

The sex chromosome system of birds (ZZ males and ZW females) offers an interesting contrast to that of mammals, not least because the avian sex chromosomes evolved independently of those in mammals; mammalian X chromosomes and avian Z chromosomes are not syntenic (Fridolfsson et al. 1998; Nanda et al. 1999, 2002). We expect the male mutation bias to elevate the mutation rate of the avian Z chromosome. The Z chromosome spends two-thirds of its time in males, where rates of germline cell division are high. Accordingly, Z should evolve faster than the female-specific W chromosome. This prediction is supported by analyses of substitution rates in gametologous introns shared between the Z and W chromosomes of various bird lineages (Ellegren and Fridolfsson 1997; Kahn and Quinn 1999; Carmichael et al.
2000; Fridolfsson and Ellegren 2000; Bartosch-Härlid et al. 2003); in all of these, the Z chromosome evolves faster than the W chromosome. Estimates of avian αm vary (1.7 to 6.5), but their confidence intervals are large. So far, all estimates have been based on molecular evolutionary analyses of a limited number of short introns or short coding sequences.

Predictions of an adaptive reduction in X-linked mutation rates apply in the same way to the avian Z chromosome. Deleterious mutations on the Z chromosome are exposed when it is hemizygous in females, so selection may favor a reduced mutation rate on this chromosome. In birds, adaptive mutation rates and the male mutation bias are thus expected to work in opposition, decreasing and increasing the Z rate, respectively. The avian W, like the mammalian Y, is always hemizygous. However, as discussed later, several lines of argument suggest that there is little potential for adaptive mutation-rate evolution on these chromosomes.

The evidence that Z evolves faster than W seems unambiguous. This does not rule out the possibility that the Z rate is lower than expected from male-biased mutation alone. Fortunately, comparing substitution rates in autosomes (A), the Z chromosome, and the W chromosome allows a quantitative assessment of the roles of the male mutation bias and of chromosome-specific mutation rates in birds. If the male mutation bias is the main factor governing the mean mutation rate of chromosomes, then estimates of αm should be similar from the Z/A, Z/W, and A/W comparisons (cf. Miyata et al. 1987; McVean and Hurst 1997; Malcom, Wyckoff, and Lahn 2003). If, on the other hand, the Z chromosome mutation rate is specifically reduced, αm estimated from A/W should be higher than αm estimated from the Z/A and Z/W comparisons.
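The sketch below makes this prediction concrete. It computes the expected mean substitution rates of A, Z, and W under male-driven evolution alone, using the share of time each chromosome spends in the male germline: half for autosomes, two-thirds for Z, and none for W. It can also apply a hypothetical Z-specific reduction. The function name and the 10% reduction are illustrative assumptions, not values from this study.

```python
def expected_rates(alpha_m, z_reduction=0.0):
    """Expected mean substitution rates of A, Z and W, in units of the female rate.

    Male-driven evolution only (after Miyata et al. 1987): W is always in
    females, an autosome spends half its time in males, and the Z spends
    two-thirds of its time in males (ZZ males, ZW females).
    z_reduction is a hypothetical extra fractional reduction applied to Z only.
    """
    a = (1.0 + alpha_m) / 2.0
    z = (1.0 + 2.0 * alpha_m) / 3.0 * (1.0 - z_reduction)
    w = 1.0
    return a, z, w


# Compare the expected divergence ratios with and without a Z-specific reduction.
for reduction in (0.0, 0.10):
    a, z, w = expected_rates(alpha_m=2.0, z_reduction=reduction)
    print(f"Z reduction {reduction:.0%}: "
          f"Z/A = {z / a:.3f}, Z/W = {z / w:.3f}, A/W = {a / w:.3f}")
```

With αm = 2, a 10% Z-specific reduction lowers the expected Z/W ratio from about 1.667 to 1.500 and Z/A from about 1.111 to 1.000. A/W stays at 1.500. Inverting these ratios would therefore give lower αm estimates from Z/W and Z/A than from A/W, which is the signature described above.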
To address this issue, we carried out a large-scale analysis of substitution-rate variation in the avian genome, studying divergence in roughly 43 kb of orthologous noncoding sequence from chicken (Gallus gallus) and turkey (Meleagris gallopavo). We obtained data from 74 different introns on the autosomes, the Z chromosome, and the W chromosome, and we contrast αm estimates from the Z/A, Z/W, and A/W comparisons.

Materials and Methods

Collection of Sequence Data

Chicken and turkey intron sequences were obtained for autosomal, Z-linked, or W-linked genes (map information from the ArkDB farm animal database at www.thearkdb.org or from Schmid et al. [2000]). We used only introns longer than 200 bp, to reduce stochastic variation in divergence estimates. This criterion was motivated by our novel bootstrapping method, which resamples both introns and sites. Excluding short introns may also reduce the effect of constraint on small introns caused by conservation of splice sites. GenBank does not give the exon-intron organization of most avian genes, so we first used BLAST to align chicken cDNA sequences against the draft human genome at NCBI (July 2003 build). Large gaps in the avian sequence in such BLAST alignments should mark the positions of putative introns. For all genes analyzed in this study, this approach revealed putative avian introns at precisely the same positions as in the orthologous human genes. We then designed exonic PCR primers for sequencing both chicken and turkey introns (Appendix 1–3). For a few genes, the full genomic sequence of chicken, including exons and introns, was available; in those cases, exonic primers were designed for amplification in turkey only. Chicken and turkey DNA was extracted from fresh muscle tissue by standard proteinase K digestion and phenol/chloroform purification, adapted from Hoelzel and Green (1998).
PCR reactions were carried out in 50-µl volumes containing 20 to 250 ng of DNA template, 1× PCR Gold buffer (Applied Biosystems), 0.2 µM of each primer, 2.0 mM MgCl2, 0.2 mM dNTPs (Amersham Pharmacia Biotech Inc), and 1 U AmpliTaq Gold (Applied Biosystems). Amplification began with an initial denaturation at 95°C for 5 min, followed by 33 to 40 cycles at 94°C for 30 s with intron-specific annealing and extension conditions. Amplified fragments were purified using the QIAquick purification protocol (Qiagen). They were sequenced with the original primers using BigDye™ Terminator Cycle Sequencing chemistry (Applied Biosystems), and sequences were recorded on an ABI377 semiautomated sequencing instrument (Applied Biosystems). Sites found to be heterozygous in direct sequencing were excluded from analysis. In some cases, sequencing was preceded by cloning with the pGEM-T vector kit, and sequencing reactions were then initiated with M13 vector primers. Sequence data from this article have been deposited with the GenBank Data Library under accession numbers AF006660, AF526055, AY139836 to AY139865, AY142943 to AY142944, AY144673 to AY144682, AY189754 to AY189777, AY194125 to AY194147, AY298959 to AY299013, AY380785 to AY380789, and AY426725 to AY426737.

Sequence Analysis

Orthologous chicken and turkey introns were aligned with ClustalW using default settings (Thompson, Higgins, and Gibson 1994). Some manual adjustment was required to improve the alignment of repetitive sequences. Pairwise distances were estimated with the baseml program in PAML version 3.11 (Yang 1997) under the Tamura-Nei model of sequence evolution (Tamura and Nei 1993). Distances were estimated assuming that all sites evolve at the same rate (i.e., no among-site rate variation). Confidence intervals and hypothesis tests were obtained by nonparametric bootstrapping.
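The following sketch is for readers who want to compute distances outside PAML. It implements the closed-form Tamura-Nei (1993) distance without among-site rate variation. It is an assumption that this moment estimator closely approximates the maximum-likelihood pairwise distance that baseml reports. It is not the PAML implementation, and the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def tamura_nei_distance(seq1, seq2):
    """Closed-form Tamura-Nei (1993) distance for two aligned DNA sequences.

    Columns with a gap or non-ACGT character in either sequence are skipped,
    base frequencies are pooled over both sequences, all sites are assumed to
    evolve at the same rate, and all four bases are assumed to occur.
    """
    pairs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Pooled base frequencies.
    counts = dict.fromkeys("ACGT", 0)
    for x, y in pairs:
        counts[x] += 1
        counts[y] += 1
    pi = {base: c / (2 * n) for base, c in counts.items()}
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    ag = pi["A"] * pi["G"]
    ct = pi["C"] * pi["T"]

    # Proportions of the three kinds of difference.
    p1 = sum({x, y} == PURINES for x, y in pairs) / n               # A<->G transitions
    p2 = sum({x, y} == PYRIMIDINES for x, y in pairs) / n           # C<->T transitions
    q = sum((x in PURINES) != (y in PURINES) for x, y in pairs) / n  # transversions

    d_purine = -(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
    d_pyrimidine = -(2 * ct / pi_y) * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
    coef = 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y)
    d_transversion = -coef * math.log(1 - q / (2 * pi_r * pi_y))
    return d_purine + d_pyrimidine + d_transversion
```

Consider two 100-bp sequences with equal base frequencies that differ only by two A↔G transitions. They give a distance of −0.25 ln(0.92) ≈ 0.0208, slightly above the raw proportion of differences (0.02).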
We developed a new bootstrapping procedure, termed double bootstrapping. For a given chromosome category (A, Z, or W), we first bootstrapped by introns: introns were sampled randomly with replacement to give the same total number of introns as in the original data set. Then, for each sampled intron alignment, we bootstrapped by sites: sites were sampled randomly with replacement to generate alignments of the same length as the originals. The first stage accounts for variation in substitution rates between introns, such as that caused by regional variation in mutation and/or selection (reviewed in Ellegren, Smith, and Webster [2003]). Our preliminary observations suggest that this variation may be significant in bird genomes, and that it is present in autosomes as well as in the sex chromosomes (S. Berlin, N. G. C. Smith, and H. Ellegren, unpublished data). For instance, point estimates of divergence in autosomal introns vary between 3.9% and 18.5% (table 1), although these estimates have large confidence intervals. We are currently investigating the causes of this rate heterogeneity. Preliminary analyses suggest that conserved sites or blocks explain part of the variation, and that underlying variation in the mutation rate contributes as well. One important implication of regional substitution-rate variation is that αm estimates based on individual introns can be heavily biased. The second stage of the bootstrapping accounts for noise generated during the estimation of divergence. After double bootstrapping, pairwise distances were calculated for each alignment, and the unweighted mean of these distances was taken as the output. The bootstrapping process was repeated 1,000 times. This gave 1,000 sets of W-linked, Z-linked, and autosomal divergences from which to estimate the male mutation bias (αm) and other rate statistics.
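The two-stage resampling can be sketched for one chromosome category as follows. This is a minimal illustration under stated assumptions. Alignments are supplied as pairs of equal-length sequences with gap columns already removed. The simple p-distance stands in for the Tamura-Nei distance used in the study, and any function taking two sequences can be passed instead. The function names are hypothetical.

```python
import random
import statistics


def p_distance(seq1, seq2):
    """Proportion of differing sites in two ungapped, equal-length sequences.

    A deliberately simple stand-in for the Tamura-Nei distance used in the study.
    """
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def double_bootstrap(alignments, n_reps=1000, distance=p_distance, seed=None):
    """Double bootstrap of the mean divergence of one chromosome category.

    alignments -- list of (seq1, seq2) pairs, one per intron, gap columns removed
    Returns n_reps bootstrap values of the unweighted mean distance.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_reps):
        # Stage 1: resample introns with replacement (between-intron variation).
        introns = rng.choices(alignments, k=len(alignments))
        dists = []
        for s1, s2 in introns:
            # Stage 2: resample sites within the intron with replacement
            # (noise in estimating divergence from a finite alignment).
            idx = rng.choices(range(len(s1)), k=len(s1))
            dists.append(distance("".join(s1[i] for i in idx),
                                  "".join(s2[i] for i in idx)))
        # Unweighted mean over introns, as described in the text.
        values.append(statistics.mean(dists))
    return values
```

Following the text, the median of the returned values serves as the point estimate, and their standard deviation (`statistics.stdev`) as the bootstrap standard error. Running the function separately for A, Z, and W with the same number of replicates gives the 1,000 sets of divergences from which αm and Zr are computed.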
The standard deviation of the bootstrap values estimates the standard error of the bootstrapped statistic (Sokal and Rohlf 1995). Hypothesis tests that required the comparison of rate statistics were performed by direct comparison of randomized bootstrap values.

Results

We sequenced and analyzed a set of autosomal, Z-linked, and W-linked introns in chicken and turkey. The two species diverged about 28 MYA, according to mitochondrial DNA–based molecular clocks (Dimcheff, Drovetski, and Mindell 2002). We obtained 33 orthologous autosomal alignments with a total ungapped length of 16,188 bp (table 1), 28 Z-linked alignments with a total length of 16,079 bp (table 2), and 14 W-linked alignments with a total length of 10,621 bp (table 3). Mean divergence differs markedly among autosomes and sex chromosomes (autosomes [A] = 10.06%, Z chromosome = 10.95%, and W chromosome = 5.71%). With the double-bootstrapping method (see Materials and Methods), the medians and standard errors for the concatenated alignments are A = 10.08% ± 0.67%, Z = 10.99% ± 0.48%, and W = 5.74% ± 0.49%. Thus, in the chicken–turkey comparison, both autosomal and Z-linked sequences seem to evolve at just under twice the rate of W-linked sequences. Figure 1 shows the distribution of double-bootstrap values for autosomes, the Z chromosome, and the W chromosome.

Divergence data from autosomes and sex chromosomes allow us to partition the effects of the male mutation bias from those of one other factor that affects substitution rates. Suppose substitution rates are determined solely by the male mutation bias. Then all three estimates of the male mutation bias (αm) from pairwise comparisons of divergence between chromosome categories should give the same value (equations 1 to 3, using the approach of Miyata et al. [1987]).
$$\alpha_m(Z/W) = \frac{3(Z/W) - 1}{2} \qquad (1)$$

$$\alpha_m(A/W) = 2(A/W) - 1 \qquad (2)$$

$$\alpha_m(Z/A) = \frac{3(Z/A) - 2}{4 - 3(Z/A)} \qquad (3)$$

Equations 1 to 3 give the following αm estimates: Z/A = 1.71 (95% confidence interval 0.62 to 7.16), Z/W = 2.37 (1.18 to 2.97), and A/W = 2.52 (1.88 to 3.34). Figure 2 shows the bootstrap distribution of αm estimates for each comparison. To quantify discrepancies among the three estimates of αm, we can rearrange equations 1 to 3 to predict divergence in one chromosome category from the observed divergences in the other two. Here, we predict divergence on the Z chromosome from the autosomal and W-linked data. Substituting αm from equation 2 into Z/W = (1 + 2αm)/3 gives an expected Z divergence of (4A − W)/3. Comparing the expected and observed Z-linked divergences then gives the percentage reduction in the Z-linked substitution rate, termed Zr (equation 4):

$$Z_r = 100\left(1 - \frac{Z}{(4A - W)/3}\right) \qquad (4)$$

With the double-bootstrapping method, the median value of Zr is 4.89%, with a large bootstrap standard error of 8.50%. This approach therefore gives no evidence of a significant discrepancy from the predictions of the male mutation bias (P = 0.286; one-tailed probability of Zr ≤ 0).
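Equations 1 to 4 can be checked numerically with a short sketch; the function names are hypothetical. Plugging in the divergences quoted in the Results gives point values of about 2.37, 2.51, and 1.74 for Z/W, A/W, and Z/A, and a Zr of about 4.66%. These differ slightly from the bootstrap medians reported above. The likely reason is that the median of a nonlinear function of bootstrap replicates need not equal the function of the medians.

```python
def alpha_m_estimates(a, z, w):
    """Point estimates of alpha_m from equations 1-3 (after Miyata et al. 1987)."""
    return {
        "Z/W": (3 * z / w - 1) / 2,                # equation 1
        "A/W": 2 * a / w - 1,                      # equation 2
        "Z/A": (3 * z / a - 2) / (4 - 3 * z / a),  # equation 3
    }


def z_reduction(a, z, w):
    """Equation 4: percentage by which observed Z divergence falls short of the
    value (4A - W)/3 predicted from A and W under male-driven evolution alone."""
    z_expected = (4 * a - w) / 3
    return 100 * (1 - z / z_expected)


# Divergences (%) quoted in the text (bootstrap medians of the concatenated data).
A, Z, W = 10.08, 10.99, 5.74
for pair, value in alpha_m_estimates(A, Z, W).items():
    print(f"alpha_m({pair}) = {value:.2f}")
print(f"Zr = {z_reduction(A, Z, W):.2f}%")
```

As a consistency check, divergences generated exactly from a single αm (for example A = 1.5, Z = 5/3, W = 1 for αm = 2) return the same αm from all three equations and Zr = 0.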
Hence, with the present data set, the male mutation bias is sufficient to explain the observed differences in divergence among autosomal, Z-linked, and W-linked sequences in the chicken–turkey comparison.

Table 1 Data for Autosomal Introns
Locus Name | Locus Symbol | Intron | Chicken Accession Number | Turkey Accession Number | Ungapped Alignment Length (bp) | Divergence (%)
Nicotinic acetylcholine receptor, γ-subunit | ACGR | F | K02904 | AY144678 | 691 | 7.1
Actin, β | ACTB | 2 | X00182 | AY298978 | 479 | 3.9
Actin, β | ACTB | 3 | X00182 | AY298979 | 297 | 12.6
Adenylate kinase 1 | AK1 | 3 | D00251 | AY298973 | 366 | 6.4
Annexin V | ANXA5 | 5 | U01671 | AY298974 | 387 | 6
Annexin V | ANXA5 | 7 | U01671 | AY298975 | 390 | 10.9
POU domain | BRN-3 | 1 | X91997 | AY298977 | 370 | 9
Neural secreted glycoprotein | CEPU | 1 | AJ225897 | AY142943 | 607 | 6.9
Creatine kinase, brain | CKB | 1 | AY139862 | AY139863 | 469 | 4.9
Clathrin heavy chain | CLTC | 5 | AJ429073 | AY142944 | 619 | 9.5
Crystallin, βA3 | CRYBA3 | 2 | M84460 | AY14467 | 776 | 8.3
Transcriptional repressor δEF1 | DEF1 | 3 | D76434 | AY144673 | 621 | 8.8
Fatty acid synthase | FASN | Y | J02839 | AY298980 | 417 | 7.6
Growth hormone 1 | GH1 | 2 | D10484 | AY144673 | 435 | 9.1
Glyceraldehyde-3-phosphate dehydrogenase | GAPD | 3 | M11213 | AY298981 | 271 | 10
Hemoglobin, αD | HBAD | 2 | X59989 | AY298976 | 247 | 13.3
Heat shock 108 kDa protein 1 | HSP108 | 1 | AY139864 | AY139865 | 662 | 17.5
Interleukin 8 | IL8 | 3 | AJ009800 | AY298982 | 539 | 15.2
Luteinizing hormone/choriogonadotropin receptor | LHCGR | 7 | AJ289775 | AY144675 | 498 | 9.1
Ribosomal protein–encoding gene L37A | RPL37A | 3 | D14167 | AY298987 | 827 | 9.9
Ribosomal protein–encoding gene L5 | RPL5 | 4 | D10737 | AY380788 | 592 | 8.3
Ribosomal protein–encoding gene L7A | RPL7A | 3 | D14522 | AY380789 | 341 | 12.9
Ribosomal protein–encoding gene L7A | RPL7A | 4 | D14522 | AY298988 | 345 | 7.1
Myosin light chain | MLC | 6 | X00460 | AY144675 | 460 | 13.1
Opsin | OPSIN | 1 | U87449 | AY298983 | 438 | 14.4
Opsin | OPSIN | 3 | U87449 | AY298984 | 204 | 10.1
Opsin | OPSIN | 4 | U87449 | AY298985 | 370 | 18.5
Ovomucoid | OVM | G | AF170979 | AF170984 | 431 | 4.8
Rhodopsin visual pigment | RHO | 1 | D00702 | AY144679 | 818 | 15.9
Rhodopsin visual pigment | RHO | 4 | D00702 | AY144680 | 356 | 11.8
Ribosomal protein–encoding gene L30 | RPL30 | 3 | D14521 | AY144681 | 731 | 10.2
Transforming growth factor–β2 | TGFB2 | 5 | X60091 | AY298986 | 552 | 9.7
Vimentin | VIM | E | M15852 | AY144682 | 582 | 9.4

Notably, the αm estimates from the three possible comparisons remain consistent despite significant differences in GC content between autosomes and sex chromosomes: autosomes = 46.7%, Z chromosome = 36.3%, and W chromosome = 33.1% (t-tests: W-A, P < 0.0001; Z-A, P < 0.0001; Z-W, P = 0.088). This suggests that GC content does not strongly affect the mean divergence of chromosomal classes, at least not enough to affect αm estimates.

The observed divergences can be used to estimate absolute substitution rates between chicken and turkey. Using a divergence time of 28 MYA (Dimcheff, Drovetski, and Mindell 2002), we obtain rates of A = 3.6 × 10⁻⁹, Z = 3.9 × 10⁻⁹, and W = 2.0 × 10⁻⁹ substitutions per site per year. These rate estimates may be useful for dating events in bird evolution from nuclear sequence data, although substitution rates may, of course, vary among avian lineages. In mammals, substitution rates in the human and mouse lineages since the split of primates and rodents have been estimated at 2.2 × 10⁻⁹ and 4.5 × 10⁻⁹, respectively (Waterston et al. 2002). These mammalian rates are averages since the time of divergence, and current rates may differ even more. Assuming that generation time affects substitution rates, the difference in generation time between humans and most rodents should be greater now than shortly after divergence. Interestingly, the substitution rate estimates for the chicken–turkey comparison are very similar to those obtained from restriction site analysis of
several galliform species (Helm-Bychowski and Wilson 1986). These authors mapped 161 restriction sites from three autosomal regions and combined the resulting divergence estimates with fossil evidence. Using different calibration points in galliform evolution, they arrived at rate estimates of 3.4 × 10⁻⁹ to 4.0 × 10⁻⁹ substitutions per site per year. The relatively early molecular evolutionary work of Helm-Bychowski and Wilson (1986) thus agrees well with data from large-scale DNA sequencing.

Table 2 Data for Z-Linked Introns
Locus Name | Locus Symbol | Intron | Chicken Accession Number | Turkey Accession Number | Ungapped Alignment Length (bp) | Divergence (%)
Aconitase 1 | ACO1 | 2 | AY139856 | AY139857 | 387 | 13.4
Aldolase B | ALDOB | 1 | AY380785 | AY380786 | 277 | 8.9
Aldolase B | ALDOB | 2 | AY139836 | AY139842 | 239 | 13.9
Aldolase B | ALDOB | 5 | AY139839 | AY139845 | 470 | 9.9
Aldolase B | ALDOB | 7 | AY139841 | AY139847 | 413 | 9.2
ATP synthase α-subunit on Z | ATP5A1Z | 3 | AF165969 | AF165971 | 789 | 12.3
brm | BRM | 15 | AY298991 | AY298992 | 278 | 10.1
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 6 | AY298997 | AY299005 | 719 | 12.2
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 7 | AY298998 | AY299006 | 234 | 10.2
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 9 | AY298999 | AY299007 | 482 | 14.8
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 10 | AY298993 | AY299000 | 605 | 12
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 11 | AY298994 | AY299001 | 771 | 12.4
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 12 | AY426729–30 | AY426733 | 996 | 13
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 13 | AY298995 | AY299002 | 324 | 11.2
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 15 | AF525980 | AF526014 | 419 | 14.2
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 22 | AY426728 | AY426734 | 1222 | 11.2
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 24 | AF526056 | AY299008 | 394 | 8.7
Chromo-helicase-DNA binding 1 on Z | CHD1Z | 25 | AY298996 | AY299004 | 702 | 11.9
Histidine triad nucleotide-binding protein on Z | HINTZ | 2 | AB026676 | AY194147 | 685 | 11.7
Patched | PTCH | 7 | AY299009 | AY299010 | 346 | 9.8
Patched | PTCH | 8 | AY299011 | AY299012 | 420 | 10.1
Purpurin | PURP | 3 | AY298989 | AY298990 | 259 | 6.9
Spindlin on Z | SPINZ | 2 | AY194142 | AY194143 | 768 | 11.3
Ubiquitin-associated protein 2 on Z | UBAP2Z | 1 | AY189761 | AY189760 | 1185 | 7.3
Ubiquitin-associated protein 2 on Z | UBAP2Z | 2 | AY189777 | AY189776 | 406 | 9.1
Ubiquitin-associated protein 2 on Z | UBAP2Z | 4 | AY426726 | AY426732 | 1442 | 9.6
Very-low-density lipoprotein receptor (95 kDa) | VLDLR | 7 | AY139848 | AY139851 | 460 | 11.1
Very-low-density lipoprotein receptor (95 kDa) | VLDLR | 9 | AY139850 | AY139853 | 387 | 10.4

Discussion

In this study, we addressed between-chromosome variation in mutation rates in birds. Our approach was to infer mutation-rate variation directly from variation in intronic divergence. How well justified is this inference? The most likely complication is selection on intronic sequences in birds. Little is yet known about this subject. However, comparative studies are uncovering numerous nongenic conserved regions in mammalian genomes (e.g., see Dermitzakis et al. [2002]), and bird genomes are smaller than mammalian genomes. It therefore seems possible that conserved regulatory sequences are relatively common in avian genomes.
However, we analyzed a large number of introns for each chromosome category, and there may be regional mutation-rate variation or local effects of selection. The present data set may therefore reflect the mean substitution rate of each chromosome category better than a similar amount of sequence from a single region per category would.

Table 3 Data for W-Linked Introns
Locus Name | Locus Symbol | Intron | Chicken Accession Number | Turkey Accession Number | Ungapped Alignment Length (bp) | Divergence (%)
ATP synthase α-subunit on W | ATP5A1W | 3 | AF165968 | AF165970 | 771 | 7.6
Chromo-helicase-DNA binding 1 on W | CHD1W | 7 | AY298971 | AY426735 | 250 | 4.4
Chromo-helicase-DNA binding 1 on W | CHD1W | 10 | AY298965 | AY298959 | 569 | 6.9
Chromo-helicase-DNA binding 1 on W | CHD1W | 11 | AY298966 | AY298960 | 582 | 6.7
Chromo-helicase-DNA binding 1 on W | CHD1W | 12 | AY426727 | AY426736–7 | 1066 | 6.4
Chromo-helicase-DNA binding 1 on W | CHD1W | 15 | AY298972 | AF526013 | 274 | 5.9
Chromo-helicase-DNA binding 1 on W | CHD1W | 17 | AY298970 | AY299013 | 677 | 5.4
Chromo-helicase-DNA binding 1 on W | CHD1W | 22 | AY298968 | AY298963 | 1249 | 5.4
Chromo-helicase-DNA binding 1 on W | CHD1W | 25 | AY298969 | AY298964 | 684 | 4.1
Chromo-helicase-DNA binding 1 on W | CHD1W | 24 | AF526055 | AF526077 | 405 | 8.5
Spindlin on W | SPINW | 2 | AY194125 | AY194126 | 611 | 4.1
Ubiquitin-associated protein 2 on W | UBAP2W | 1 | AY189754 | AY189755 | 1204 | 3.8
Ubiquitin-associated protein 2 on W | UBAP2W | 2 | AY189767 | AY189768 | 832 | 4.5
Ubiquitin-associated protein 2 on W | UBAP2W | 4 | AY426725 | AY426731 | 1447 | 6.1

The Male Mutation Bias

Our point estimates of αm for the chicken–turkey comparison are 1.71, 2.37, and 2.52, from the Z/A, Z/W, and A/W comparisons, respectively. Previous studies of male-biased mutation in birds have produced αm estimates of 1.7 to 6.5 (Ellegren and Fridolfsson 1997; Kahn and Quinn 1999; Carmichael et al. 2000; Fridolfsson and Ellegren 2000; Bartosch-Härlid et al. 2003), so our point estimates lie in the lower part of the reported range (table 4). However, most of the estimates have large confidence intervals. The confidence intervals obtained in this study (0.62 to 7.16, 1.18 to 2.97, and 1.88 to 3.34) generally overlap with those of previous studies.

Table 4 Estimates of αm in Birds
Gene | Type of Sequence | Length (bp) | αm | CI | Species | Reference
CHD1Z/CHD1W | Intron | 711 | 1.7 | 0.9–2.5 | Ten passeriforms | Bartosch-Härlid et al. (2003)
CHD1Z/CHD1W | Exon | 2,754 | 1.7 | ND | Three different orders | Fridolfsson and Ellegren (2000)
CHD1Z/CHD1W | Intron | 738 | 1.8 | 1.2–2.3 | One anseriform, two galliforms | Bartosch-Härlid et al. (2003)
ATP5A1Z/ATP5A1W | Intron | ~800 | 1.8 | 1.4–2.2 | Two galliforms | Carmichael et al. (2000)
ATP5A1Z/ATP5A1W | Intron | ~800 | 2.3 | 1.3–3.3 | Two anseriforms | Carmichael et al. (2000)
CHD1Z/CHD1W | Intron | 711 | 2.4 | 1.7–3.2 | Six passeriforms | Bartosch-Härlid et al. (2003)
74 different regions | Intron | 42,888 | 2.5 | 2.3–2.7 | Two galliforms | This study
CHD1Z/CHD1W | Intron | 711 | 3.5 | 2.4–4.6 | Four passeriforms | Bartosch-Härlid et al. (2003)
CHD1Z/CHD1W | Exon | 615 | 3.9 | 1.1–6.7 | Two passeriforms | Ellegren and Fridolfsson (1997)
CHD1Z/CHD1W | Intron | 428 | 4.1 | 3.1–5.1 | One galliform, three anseriforms | Kahn and Quinn (1999)
CHD1Z/CHD1W | Intron | 731 | 4.7 | 3.4–6.0 | Two procellariiforms, four charadriiforms | Bartosch-Härlid et al. (2003)
ATP5A1Z/ATP5A1W | Intron | ~800 | 5.0 | 3.3–6.6 | Two gulls | Carmichael et al. (2000)
CHD1Z/CHD1W | Intron | 230 | 6.5 | 2.8–10.2 | Four passeriforms | Ellegren and Fridolfsson (1997)
NOTE. Confidence interval (CI) is 95% (when available). The method for estimating CI has varied among studies. ND indicates not determined.

FIG. 1. Histogram of 1,000 bootstrap values of autosomal, W-linked, and Z-linked divergences. Bootstrap values were obtained using the double-bootstrap method (see Materials and Methods).

FIG. 2. Histogram of 1,000 bootstrap values of the male-to-female mutation-rate ratio αm(A/W), αm(Z/W), and αm(Z/A). Bootstrap values were obtained using the double-bootstrap method.

If we assume a single true value of αm, we can estimate it by combining data from the autosomes, the Z chromosome, and the W chromosome. For each intron, the observed number of substitutions, O(S), is the ungapped alignment length multiplied by the divergence. We refer to these data as "observed," although strictly speaking they are inferred from the alignments; here we assume perfect inference of divergence. Assuming a single αm, we can then calculate the expected number of substitutions, E(S), for each intron. E(S) is the product of the ungapped alignment length, a chromosome-specific scaling factor, and a normalizing factor that makes the sum of expected substitutions equal the sum of observed substitutions. The chromosome-specific scaling factors (K) reflect the proportion of time spent in the male and female germlines: K_W = 1, K_A = (1 + αm)/2, and K_Z = (1 + 2αm)/3. The log-likelihood of αm can be approximated by the G-test statistic (page 692 in Sokal and Rohlf [1995]), which uses the O(S) and E(S) values to give a maximum-likelihood estimate of αm = 2.47.
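A minimal sketch of this likelihood calculation follows. It rests on several assumptions. Divergences are given as proportions rather than percentages. The log-likelihood is taken as −G/2 up to a constant, with G = 2ΣO ln(O/E). αm is found by a simple grid search. The function names, grid bounds, and toy data are hypothetical, not taken from the study. The function also returns the range of αm whose log-likelihood lies within 2 units of the maximum, which is the interval rule used in this study.

```python
import math


def scaling_factor(chrom, alpha_m):
    """Relative mutation rate K of a chromosome class, in units of the female rate."""
    if chrom == "W":
        return 1.0
    if chrom == "A":
        return (1.0 + alpha_m) / 2.0
    if chrom == "Z":
        return (1.0 + 2.0 * alpha_m) / 3.0
    raise ValueError(f"unknown chromosome class: {chrom!r}")


def g_statistic(introns, alpha_m):
    """G = 2 * sum(O * ln(O / E)) for one candidate alpha_m.

    introns -- list of (chrom, ungapped_length, divergence) with divergence as a
               proportion, so O(S) = length * divergence.
    E(S) = length * K(chrom) * c, where c makes sum(E) equal sum(O).
    """
    observed = [length * d for _, length, d in introns]
    raw = [length * scaling_factor(chrom, alpha_m) for chrom, length, _ in introns]
    c = sum(observed) / sum(raw)
    return 2.0 * sum(o * math.log(o / (r * c))
                     for o, r in zip(observed, raw) if o > 0)


def ml_alpha_m(introns, lo=0.5, hi=10.0, step=0.001):
    """Grid-search ML estimate of alpha_m and the range within 2 log-units of it."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    # Treat -G/2 as the log-likelihood up to an additive constant.
    log_lik = [-0.5 * g_statistic(introns, a) for a in grid]
    best = max(range(len(grid)), key=log_lik.__getitem__)
    # Assumes a single peak, so the retained points form one interval.
    inside = [a for a, ll in zip(grid, log_lik) if log_lik[best] - ll <= 2.0]
    return grid[best], min(inside), max(inside)


# Toy data (hypothetical), constructed to match alpha_m = 2.5 exactly.
toy = [("A", 500, 0.0875), ("Z", 400, 0.10), ("W", 600, 0.05)]
print(ml_alpha_m(toy))
```

The toy data are constructed so that G is zero at αm = 2.5. With only three short "introns", the returned interval is much wider than the 2.27 to 2.68 obtained from the full data set.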
Approximate 95% confidence intervals can be estimated as the range of αm for which the log-likelihood is within 2 units of the maximum, which gives a range of 2.27 to 2.68. The present study contains one to two orders of magnitude more sequence than earlier avian studies, and its data come from autosomes as well as sex chromosomes. Our maximum-likelihood αm estimate may therefore be the most accurate yet obtained for birds. This contention is supported by the fact that our estimate is based on sequence data from a large number of regions in each chromosome category. Regional mutation-rate variation would make αm estimates sensitive to the particular regions chosen for molecular evolutionary analysis. Previous studies of male-biased mutation in birds (e.g., Ellegren and Fridolfsson [1997], Kahn and Quinn [1999], Carmichael et al. [2000], and Bartosch-Härlid et al. [2003]), as well as many studies in other organisms (e.g., Shimmin, Chang, and Li [1993], Bohossian, Skaletsky, and Page [2000], and Makova and Li [2002]), have been based on only one or a few genomic regions. The confidence intervals of the present chicken–turkey αm estimates overlap with those from studies of other bird species. It would therefore be premature to conclude that the male mutation bias is lower in galliforms than in other birds. On the other hand, a low point estimate of 1.8 was independently obtained for chicken and turkey from ATP5A1Z/ATP5A1W intron sequences (Carmichael et al. 2000). The same estimate came from a three-species comparison of CHD1Z/CHD1W introns that included one anseriform and two other galliform species (Bartosch-Härlid et al. 2003). We have recently found evidence that αm is higher in avian lineages with longer generation times and more intense sexual selection, which suggests a link between life-history characteristics and the male mutation bias (Bartosch-Härlid et al.
2003). Most galliforms breed at the age of 1 year, so a rather weak male mutation bias would be consistent with a generation time effect. A potential problem in estimation of the male mutation bias from sex-linked sequences is the effect of ancestral polymorphism, which can bias estimates of am when distances are low and lineage sorting is incomplete (Makova and Li 2002; Ellegren 2002a). However, the effect of ancestral polymorphism in our study is expected to be minimal because all pairwise distances between chicken and turkey are relatively high (5% to 11%), and with a divergence time of 28 MYA, lineage sorting should have been completed long ago. Makova and Li (2002) found ancient polymorphism to affect estimates of am in the human–chimpanzee comparison (;1% divergence) but not in comparisons of human and more distantly related primates. Is There a Reduction in the Z Chromosome Mutation Rate? Our results indicated that in birds, the Z-linked introns evolve slightly faster than autosomal introns, which, in turn, evolve much faster than W-linked introns. These qualitative findings are in keeping with the male mutation bias predictions of Miyata et al. (1987), and there is, thus, no apparent need to invoke factors additional to the male mutation bias to explain variation in divergence among chromosomal classes. Alternatively, we can view our results as indicating that such potential factors, if they exist, must be weak, which is opposite to some previous suggestions (McVean and Hurst 1997). There is the theoretical possibility that the increased efficacy of selection against slightly deleterious mutations Divergence in Autosomal, Z-Linked, and W-Linked Avian Introns 1545 on the avian Z chromosome relative to the autosomes (Charlesworth, Coyne, and Barton 1987) could reduce the mutation rate on the Z chromosome relative to null expectations. 
However, a quantitative analysis suggests that weak selection is unlikely to account for a reduction of, for example, 5% (the estimated value of Zr) in the Z chromosome mutation rate. Irrespective of the nature of dosage compensation in birds (see Ellegren [2002b]), the expected substitution rate of autosomal relative to Z-linked sequences, $R_{A/Z}$, is given by (see equations 8a and 9 in Charlesworth, Coyne, and Barton [1987]):

$$R_{A/Z} = 1 + \frac{1}{3}Ns\left(\frac{1}{2} - h\right) \qquad (5)$$

where h is the dominance coefficient of the mutations. Even if all slightly deleterious mutations are recessive, that is, h = 0 (McVean and Charlesworth 1999), a conservative assumption with respect to the strength of selection required, equation 5 shows that a 5% reduction in Z-linked divergence requires the magnitude of Ns, the product of the selection coefficient of mutations and the effective population size, to be unrealistically high. We conclude that some theoretical arguments as well as our empirical observations do not support a significantly reduced Z-chromosome mutation rate.

On the other hand, failure to demonstrate an effect does not mean that it does not exist. Our data set may simply have been too small to allow detection of a modest reduction in Z rate. Additional data are, thus, needed to settle the question firmly and should be accompanied by a power analysis to reveal the minimum reduction in the Z-chromosome mutation rate that would be detectable with the data. Such an analysis would require a deeper insight into the patterns and causes of substitution-rate heterogeneity among introns or other chromosomal regions. However, as a preliminary analysis of the power of our data set to determine the maximum value of a putative Z reduction, we took the 95th percentile of the 1,000 double-bootstrap estimates of Zr. This method indicated that Zr is significantly less than 18%.
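Equation 5, R_A/Z = 1 + (1/3)Ns(1/2 − h), can be evaluated directly and inverted to ask how much selection a given Z-linked reduction would require. Because the factor (1/2 − h) is largest at h = 0, fully recessive mutations need the smallest Ns for a given reduction, which is why assuming recessivity is the conservative case; at h = 1/2 the ratio is 1 whatever the value of Ns. The sketch below is illustrative only: the function names are hypothetical, and mapping a reduction Zr onto the ratio as R_A/Z = 1/(1 − Zr) is an assumption of this sketch, not a step taken from the text.

```python
def r_a_over_z(ns: float, h: float) -> float:
    """Equation 5: expected autosomal / Z-linked substitution-rate ratio.

    ns: product of effective population size and the (deleterious) selection coefficient.
    h:  dominance coefficient (0 = fully recessive, 0.5 = additive).
    """
    return 1.0 + (ns / 3.0) * (0.5 - h)


def ns_for_z_reduction(zr: float, h: float = 0.0) -> float:
    """Invert equation 5 for the Ns that would produce a Z-linked reduction zr (e.g. 0.05).

    Assumption of this sketch: the reduction maps onto the ratio as R_A/Z = 1 / (1 - zr).
    """
    if not 0.0 <= h < 0.5:
        raise ValueError("equation 5 predicts no Z-linked reduction unless h < 1/2")
    target_ratio = 1.0 / (1.0 - zr)
    return 3.0 * (target_ratio - 1.0) / (0.5 - h)


if __name__ == "__main__":
    # Required Ns grows as mutations become less recessive; h = 0 is the minimum.
    for h in (0.0, 0.1, 0.25):
        print(f"h = {h}: Ns needed for a 5% reduction = {ns_for_z_reduction(0.05, h):.2f}")
```

Raising h toward 1/2 only increases the Ns needed, so the h = 0 row is the most favourable case for a selective explanation.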
We note that this 18% upper bound is considerably less than the point estimate of Zr = 30% in rodents obtained from the X and A substitution data of McVean and Hurst (1997) combined with the assumption of a male-to-female mutation bias of 2 in rodents (Chang et al. 1994).

Why Not a Reduced W-Chromosome Mutation Rate?

It should be noted that we could equally well have predicted divergence on the W chromosome, given the observed autosomal and Z-linked rates. Similarly, in theory, we could have inferred the reduction in W-linked substitution rates, Wr, by comparing expected and observed divergences (table 4). With the present data set, the outcome of such an analysis would have been the same as that of the analysis of a possible reduction on Z: all comparisons give the same one-tailed probability (P = 0.286) and, thus, provide no statistical support for deviations from the expectations of the male mutation bias. Nevertheless, the question of which approach is correct is worth asking from a general perspective: should we rely on the observed W rate to calculate the expected Z rate (via comparison with autosomes), or should we use the observed Z rate to predict the rate on W? In other words, should potential discrepancies between the three ways of estimating αm be interpreted as a reduction in the substitution rate on the Z chromosome, as a reduction in the substitution rate on the W chromosome, or as some combination of the two? The corresponding question applies to studies of mammalian sex chromosomes: should the observed Y rate be used to predict the rate on X, or should the observed X rate be used to calculate the expected rate on Y? Previous work in this field has ignored the latter possibility and only addressed a possible reduction in the X-chromosome mutation rate (McVean and Hurst 1997; Nachman and Crowell 2000; Ebersberger et al. 2002; Malcom, Wyckoff, and Lahn 2003).
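The bookkeeping behind these comparisons can be sketched under the standard Miyata et al. (1987) expectations for a female-heterogametic system: relative to the female mutation rate, W evolves at 1, Z at (2αm + 1)/3 (it spends two-thirds of its time in males), and autosomes at (αm + 1)/2. Eliminating αm gives an expected Z divergence of (4A − W)/3 from the autosomal and W-linked divergences, or an expected W divergence of 4A − 3Z from the autosomal and Z-linked divergences. Defining Zr and Wr as the relative shortfall of observed below expected divergence is an assumption of this sketch, and all function names are hypothetical.

```python
from typing import Dict


def expected_rates(alpha: float) -> Dict[str, float]:
    """Miyata-style expectations for a ZW system, in units of the female mutation rate."""
    return {
        "A": (alpha + 1.0) / 2.0,        # half the time in each sex
        "Z": (2.0 * alpha + 1.0) / 3.0,  # two-thirds of the time in males
        "W": 1.0,                        # always in females
    }


def alpha_estimates(a: float, z: float, w: float) -> Dict[str, float]:
    """Male-to-female mutation-rate ratio implied by each pairwise comparison."""
    za, zw, aw = z / a, z / w, a / w
    return {
        "Z/A": (3.0 * za - 2.0) / (4.0 - 3.0 * za),
        "Z/W": (3.0 * zw - 1.0) / 2.0,
        "A/W": 2.0 * aw - 1.0,
    }


def z_reduction(a: float, z: float, w: float) -> float:
    """Assumed Zr: shortfall of observed Z below the Z expected from A and W, (4A - W) / 3."""
    return 1.0 - z / ((4.0 * a - w) / 3.0)


def w_reduction(a: float, z: float, w: float) -> float:
    """Assumed Wr: shortfall of observed W below the W expected from A and Z, 4A - 3Z."""
    return 1.0 - w / (4.0 * a - 3.0 * z)


if __name__ == "__main__":
    rates = expected_rates(2.0)  # hypothetical alpha = 2
    print(alpha_estimates(rates["A"], rates["Z"], rates["W"]))
    # Lower the observed Z by 5%: the discrepancy can be booked as a Z or a W reduction.
    print(z_reduction(rates["A"], 0.95 * rates["Z"], rates["W"]),
          w_reduction(rates["A"], 0.95 * rates["Z"], rates["W"]))
```

Rates generated from a known αm return that αm from all three comparisons and zero reductions. Lowering the observed Z produces a positive Zr, but the same data also yield a positive Wr, because a lower Z raises the W expected from A and Z; the arithmetic alone cannot say which chromosome carries the deviation.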
There is no way to differentiate between a Z reduction and a W reduction with the present data, because three chromosome categories provide only two degrees of freedom, and one of those is used to estimate the male mutation bias. If we had some way of knowing the "true" male mutation bias, we could use the extra degree of freedom to differentiate between a W reduction and a Z reduction by seeing which of the Z/A and A/W comparisons gave the αm estimate closest to the true value. However, evolutionary theory may help to resolve the issue. The argument for a reduction in the Z chromosome mutation rate (McVean and Hurst 1997) relies on cost-benefit considerations of adaptive mutation rates. Given that the Z chromosome contains many more genes than the tiny W chromosome (probably by at least two orders of magnitude [Ellegren 2000; Schmid et al. 2000]), selection for a reduced mutation rate should be much stronger on Z than on W. Moreover, the benefit of a reduced mutation rate is the avoidance of deleterious mutations being exposed on a hemizygous chromosome. Although W is always hemizygous, a majority of the avian W-linked genes characterized so far have highly similar, and likely functionally equivalent, homologs on Z (Ellegren 2002b). Recessive mutations in these W-linked genes are likely to be masked by the gametologous Z-linked genes, so a reduced mutation rate on W would bring little benefit.

The Double-Bootstrapping Method

Estimating the confidence intervals of divergence measures by bootstrapping individual sites from concatenated data sets is common practice in molecular evolutionary studies. However, the increasing support for regional mutation-rate variation within genomes (reviewed in Ellegren, Smith, and Webster [2003]), which includes our preliminary data for birds, necessitated an adjustment in the method for estimating the confidence intervals of chromosomal class divergences.
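Two resampling schemes are at issue here: a conventional bootstrap over the sites of a concatenated alignment, and a two-level bootstrap that first resamples introns with replacement, then resamples sites within each resampled intron, and takes the unweighted mean of the per-intron divergences. The sketch below contrasts them on simulated data with rate variation among introns. It is a minimal illustration, not the study's code: the 0/1 site encoding, the uncorrected p-distance (no multiple-hit correction), the simulated rates and lengths, the Zr definition (1 − observed Z / expected Z, with expected Z = (4A − W)/3), and computing the one-tailed P as the share of replicates with Zr ≤ 0 are all assumptions, and the function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence

# Hypothetical encoding: each intron is its ungapped alignment, one entry per site,
# 1 where chicken and turkey differ and 0 where they are identical.
Intron = Sequence[int]


def p_distance(sites: Sequence[int]) -> float:
    """Uncorrected proportion of differing sites (a simplification)."""
    return sum(sites) / len(sites)


def double_bootstrap(introns: List[Intron], reps: int, rng: random.Random) -> List[float]:
    """Resample introns, then sites within each sampled intron; average introns unweighted."""
    means = []
    for _ in range(reps):
        sampled = rng.choices(introns, k=len(introns))  # level 1: introns, with replacement
        per_intron = [p_distance(rng.choices(s, k=len(s))) for s in sampled]  # level 2: sites
        means.append(statistics.fmean(per_intron))  # unweighted mean over introns
    return means


def site_bootstrap(introns: List[Intron], reps: int, rng: random.Random) -> List[float]:
    """Conventional bootstrap: concatenate all introns and resample sites only."""
    concat = [site for intron in introns for site in intron]
    return [p_distance(rng.choices(concat, k=len(concat))) for _ in range(reps)]


def z_reduction(a: float, z: float, w: float) -> float:
    """Assumed Zr: 1 - observed Z / expected Z, with expected Z = (4A - W) / 3."""
    return 1.0 - 3.0 * z / (4.0 * a - w)


def simulate_class(n_introns: int, length: int, lo: float, hi: float,
                   rng: random.Random) -> List[List[int]]:
    """Toy data: each intron gets its own rate drawn uniformly from [lo, hi]."""
    introns = []
    for _ in range(n_introns):
        rate = rng.uniform(lo, hi)
        introns.append([1 if rng.random() < rate else 0 for _ in range(length)])
    return introns


if __name__ == "__main__":
    rng = random.Random(1)
    # Intron counts follow the study (33 A, 28 Z, 14 W); lengths and rates are invented.
    data = {
        "A": simulate_class(33, 200, 0.06, 0.14, rng),
        "Z": simulate_class(28, 200, 0.07, 0.15, rng),
        "W": simulate_class(14, 200, 0.03, 0.09, rng),
    }
    reps = 200  # the study used 1,000 replicates; fewer here to keep the demo fast
    double = {k: double_bootstrap(v, reps, rng) for k, v in data.items()}
    sites = {k: site_bootstrap(v, reps, rng) for k, v in data.items()}
    for k in data:
        print(k, "SE double:", round(statistics.stdev(double[k]), 4),
              "SE sites only:", round(statistics.stdev(sites[k]), 4))

    zr = [z_reduction(a, z, w) for a, z, w in zip(double["A"], double["Z"], double["W"])]
    p_one_tailed = sum(x <= 0.0 for x in zr) / reps  # assumed: share of replicates with no Z reduction
    upper_95 = statistics.quantiles(zr, n=20)[-1]  # 95th percentile of replicate Zr values
    print("median Zr:", round(statistics.median(zr), 4),
          "one-tailed P:", p_one_tailed, "95th percentile:", round(upper_95, 4))
```

With rates varying among introns, the site-only standard errors come out markedly smaller than the two-level ones, because resampling sites from one fixed set of introns ignores the uncertainty introduced by which introns happened to be sequenced.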
Simply concatenating all alignments and then bootstrapping by sampling with replacement from all sites is not sufficient, because it does not account for the variation generated by the choice of a limited number of introns. To account for regional variation, we developed the double-bootstrapping method, which bootstraps by introns and by sites within intronic alignments and then takes the unweighted mean of the bootstrapped alignments. With this method, we found no statistical support for a specific reduction of the Z-chromosome mutation rate. Had we only bootstrapped by sites, slightly different divergence medians and significantly different standard errors would have been obtained (A = 10.27% ± 0.28%, Z = 10.89% ± 0.28%, and W = 5.66% ± 0.26%). The much lower standard errors compared with the double-bootstrapping procedure are a consequence of not taking rate variation among introns into account. Importantly, with this approach, there would have been significant support for a reduction in the Z-chromosome mutation rate (Zr = 7.88%, P = 0.026). We believe this observation is of general significance and that it calls for careful statistical treatment of molecular evolutionary data sets in the presence of underlying rate heterogeneity.

Acknowledgment

Financial support was obtained from the Swedish Research Council. H.E. is a Royal Academy of Sciences Research Fellow supported by the Knut and Alice Wallenberg Foundation. We thank Scott Edwards and two anonymous reviewers for useful comments.

Literature Cited

Arndt, P. F., D. A. Petrov, and T. Hwa. 2003. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol. Biol. Evol. 20:1887–1896.
Bartosch-Härlid, A., S. Berlin, N. G. C. Smith, A. P. Møller, and H. Ellegren. 2003. Life history and the male mutation bias. Evolution 57:2398–2406.
Bohossian, H. B., H. Skaletsky, and D. C. Page. 2000. Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406:622–625.
Carmichael, A. N., A. K. Fridolfsson, J. Halverson, and H. Ellegren. 2000. Male-biased mutation rates revealed from Z and W chromosome-linked ATP synthase alpha-subunit (ATP5A1) sequences in birds. J. Mol. Evol. 50:443–447.
Chang, B. H. J., L. C. Shimmin, S. K. Shyue, D. Hewett-Emmett, and W. H. Li. 1994. Weak male-driven evolution in rodents. Proc. Natl. Acad. Sci. USA 91:827–831.
Charlesworth, B., J. A. Coyne, and N. H. Barton. 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130:113–146.
Dermitzakis, E. T., A. Reymond, R. Lyle et al. (11 co-authors). 2002. Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature 420:578–582.
Dimcheff, D. E., S. V. Drovetski, and D. P. Mindell. 2002. Phylogeny of Tetraoninae and other galliform birds using mitochondrial 12S and ND2 genes. Mol. Phyl. Evol. 24:203–215.
Ebersberger, I., D. Metzler, C. Schwarz, and S. Pääbo. 2002. Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70:1490–1497.
Ellegren, H. 2000. Evolution of the avian sex chromosomes and their role in sex determination. Trends Ecol. Evol. 15:188–192.
———. 2002a. Human mutation: blame (mostly) men. Nat. Genet. 31:9–10.
———. 2002b. Dosage compensation: do birds do it as well? Trends Genet. 18:25–28.
Ellegren, H., and A. K. Fridolfsson. 1997. Male-driven evolution of DNA sequences in birds. Nat. Genet. 17:182–184.
Ellegren, H., N. G. C. Smith, and M. T. Webster. 2003. Mutation rate variation in the mammalian genome. Curr. Opin. Genet. Dev. 13:562–568.
Eyre-Walker, A., and L. D. Hurst. 2001. The evolution of isochores. Nat. Rev. Genet. 2:549–555.
Fridolfsson, A. K., H. Cheng, N. G. Copeland et al. (10 co-authors). 1998. Evolution of the avian sex chromosomes from an ancestral pair of autosomes. Proc. Natl. Acad. Sci. USA 95:8147–8152.
Fridolfsson, A. K., and H. Ellegren. 2000. Molecular evolution of the avian CHD1 genes on the Z and W sex chromosomes. Genetics 155:1903–1912.
Hardison, R. C., K. M. Roskin, S. Yang et al. (18 co-authors). 2003. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 13:13–26.
Hellmann, I., I. Ebersberger, S. E. Ptak, S. Pääbo, and M. Przeworski. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72:1527–1535.
Helm-Bychowski, K. M., and A. C. Wilson. 1986. Rates of nuclear DNA evolution in pheasant-like birds: evidence from restriction maps. Proc. Natl. Acad. Sci. USA 83:688–692.
Hoelzel, A. R., and A. Green. 1998. PCR protocols and population analysis by direct DNA sequencing and PCR-based DNA fingerprinting. Pp. 201–233 in A. R. Hoelzel, ed. Molecular genetic analysis of populations: a practical approach. Oxford University Press, Oxford, UK.
Kahn, N. W., and T. W. Quinn. 1999. Male-driven evolution among Eoaves? A test of the replicative division hypothesis in a heterogametic female (ZW) system. J. Mol. Evol. 49:750–759.
Lawson, L. J., and G. M. Hewitt. 2002. Comparison of substitution rates in ZFX and ZFY introns of sheep and goat related species supports the hypothesis of male-biased mutation rates. J. Mol. Evol. 54:54–61.
Lercher, M. J., and L. D. Hurst. 2002. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18:337–340.
Lercher, M. J., E. J. B. Williams, and L. D. Hurst. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol. Biol. Evol. 18:2032–2039.
Li, W. H., S. J. Yi, and K. Makova. 2002. Male-driven evolution. Curr. Opin. Genet. Dev. 12:650–656.
Makova, K. D., and W. H. Li. 2002. Strong male-driven evolution of DNA sequences in humans and apes. Nature 416:624–626.
Malcom, C. M., G. J. Wyckoff, and B. T. Lahn. 2003. Genic mutation rates in mammals: local similarity, chromosomal heterogeneity, and X-versus-autosome disparity. Mol. Biol. Evol. 20:1633–1641.
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786–791.
McVean, G. A. T., and B. Charlesworth. 1999. A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet. Res. 74:145.
McVean, G. A. T., and L. D. Hurst. 1997. Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386:388–392.
Miyata, T., H. Hayashida, K. Kuma, K. Mitsuyasu, and T. Yasunaga. 1987. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harbor Symp. Quant. Biol. 52:863–867.
Nachman, M. W., and S. L. Crowell. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297–304.
Nanda, I., Z. Shan, M. Schartl et al. (15 co-authors). 1999. 300 million years of conserved synteny between chicken Z and human chromosome 9. Nat. Genet. 21:258–259.
Nanda, I., T. Haaf, M. Schartl, M. Schmid, and D. W. Burt. 2002. Comparative mapping of Z-orthologous genes in vertebrates: implications for the evolution of avian sex chromosomes. Cytogenet. Genome Res. 99:178–184.
Schmid, M., I. Nanda, M. Guttenbach et al. (34 co-authors). 2000. First report on chicken genes and chromosomes 2000. Cytogenet. Cell Genet. 90:169–218.
Shimmin, L. C., B. H. J. Chang, and W. H. Li. 1993. Male-driven evolution of DNA sequences. Nature 362:745–747.
Silva, J. C., and A. S. Kondrashov. 2002. Patterns in spontaneous mutation revealed by human-baboon sequence comparison. Trends Genet. 18:544–547.
Smith, N. G. C., M. T. Webster, and H. Ellegren. 2002. Deterministic mutation rate variation in the human genome. Genome Res. 12:1350–1356.
———. 2003. A low rate of simultaneous double-nucleotide mutations in primates. Mol. Biol. Evol. 20:47–53.
Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver. 2000. The evolution of mutation rates: separating causes from consequences. Bioessays 22:1057–1066.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry. W.H. Freeman, New York.
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900–903.
Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Zhao, Z., and E. Boerwinkle. 2002. Neighboring-nucleotide effects on single nucleotide polymorphisms: a study of 2.6 million polymorphisms across the human genome. Genome Res. 12:1679–1686.

Scott Edwards, Associate Editor
Accepted April 21, 2004