* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download The Evolution of SMC Proteins: Phylogenetic Analysis and Structural
Survey
Document related concepts
Endomembrane system wikipedia , lookup
Protein phosphorylation wikipedia , lookup
Magnesium transporter wikipedia , lookup
Signal transduction wikipedia , lookup
Bacterial microcompartment wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Protein moonlighting wikipedia , lookup
Protein–protein interaction wikipedia , lookup
List of types of proteins wikipedia , lookup
Transcript
The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications Neville Cobbe and Margarete M. S. Heck Wellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, United Kingdom The SMC proteins are found in nearly all living organisms examined, where they play crucial roles in mitotic chromosome dynamics, regulation of gene expression, and DNA repair. We have explored the phylogenetic relationships of SMC proteins from prokaryotes and eukaryotes, as well as their relationship to similar ABC ATPases, using maximum-likelihood analyses. We have also investigated the coevolution of different domains of eukaryotic SMC proteins and attempted to account for the evolutionary patterns we have observed in terms of available structural data. Based on our analyses, we propose that each of the six eukaryotic SMC subfamilies originated through a series of ancient gene duplication events, with the condensins evolving more rapidly than the cohesins. In addition, we show that the SMC5 and SMC6 subfamily members have evolved comparatively rapidly and suggest that these proteins may perform redundant functions in higher eukaryotes. Finally, we propose a possible structure for the SMC5/SMC6 heterodimer based on patterns of coevolution. Introduction DNA is compacted in length by a factor of up to 10,000 to form a metaphase chromosome in a typical metazoan cell. How this remarkable feat of packaging is achieved and how the chromosomes are then faithfully segregated to the daughter cells have been fundamental questions for over 100 years, ever since Walther Fleming first described the dynamic behavior of chromosomes during cell division. Within the past decade, a convergence of data from genetic and biochemical approaches has identified the SMC (structural maintenance of chromosomes) proteins as key players in both chromosome condensation and segregation (Cobbe and Heck 2000). These proteins are a highly conserved and ubiquitous family, found in all eukaryotes for which sufficient sequence data are available and in most of the currently sequenced prokaryotes. The function of these proteins in different aspects of chromosomal behavior is also conserved in these organisms, where they play vital roles in the packaging, repair, and segregation of the genetic material. No more than one smc gene is found in each of the prokaryotes examined, in which the protein apparently forms homodimers, whereas six paralogs are found in eukaryotes, in which they form at least three types of heterodimers. These heterodimers include the SMC1 and SMC3 proteins (part of the ‘‘cohesin’’ complex), the SMC2 and SMC4 proteins (part of the ‘‘condensin’’ complex), and the SMC5/SMC6 heterodimer, which forms part of a complex involved in DNA repair (Hirano 2002). Although no canonical SMC family members have been found in the enterobacteriaceae, similar phenotypes to smc mutants are displayed by mutations affecting mukB in Escherichia coli (Niki et al. 1992). This gene encodes a protein with a structure that is remarkably similar to that of SMC proteins (Melby et al. 1998), despite limited Key words: SMC, phylogeny, condensin, cohesin, ABC ATPases, maximum likelihood. E-mail: [email protected]. Mol. Biol. Evol. 21(2):332–347. 2004 DOI: 10.1093/molbev/msh023 Advance Access publication December 5, 2003 Molecular Biology and Evolution vol. 21 no. 2 Ó Society for Molecular Biology and Evolution 2004; all rights reserved. 332 sequence homology. In addition, although the crenarchaeota do not appear to have any SMC or MukB orthologs, they do contain the related Rad50 proteins. Furthermore, although no SMC, MukB, or Rad50 orthologs are apparent in the genome of Rickettsia, it does nonetheless have a related RecN protein. In fact, only four genera for which a complete genome sequence is currently available (Helicobacter, Buchnera, Chlamydia, and Chlamydophila) do not appear to have either an SMC or a related protein. Thus, it appears that SMC proteins have an extremely ancient origin, reflecting their fundamental role in chromosome dynamics. As shown in figure 1A, each SMC protein is characterized by a large globular domain at each terminus and a third globular domain forming a flexible hinge near the middle of the molecule, separated by extended coiled-coil structures (Saitoh et al. 1994; Melby et al. 1998) with short gaps in the coiled-coils at various positions (Beasley et al. 2002). Although MukB and Rad50 have also been shown to possess hinge regions (Melby et al. 1998; Anderson et al. 2001; Hopfner et al. 2001), the large globular hinge domain of SMC proteins is so far unique in both its sequence and structure (Haering et al. 2002). Recent work has shown that the hinge domain is the principal site at which these molecules dimerize (Haering et al. 2002; Hopfner et al. 2002). The terminal head domains are required for ATP hydrolysis and have been shown to be the region where non-SMC subunits of the condensin and cohesin complexes bind (Anderson et al. 2002; Yoshimura et al. 2002), similar to the binding of MukE and MukF to the head of MukB (Yamazoe et al. 1999) or the binding of Mre11 to Rad50 (Anderson et al. 2001; Hopfner et al. 2001). Although ATP is not required for DNA binding (Kimura and Hirano 1997; Hirano and Hirano 1998; Kimura et al. 1999), it appears necessary for preferential binding to positively supercoiled substrates (Kimura et al. 1999). Likewise, ATP binding (but not hydrolysis) is also required for the enhanced aggregation of the B. subtilis Smc with ssDNA (Hirano and Hirano 1998). Among the most conserved motifs in all SMC proteins are the N-terminal Walker A (P-loop) and C-terminal Walker B (DA box) ATPase motifs (Saitoh, Phylogenetic Analysis of SMC Proteins 333 FIG. 1.—(A) General structure of SMC proteins, showing the five distinct domains. (B) Alternative models of SMC heterodimer structure. Goldberg, and Earnshaw 1995), which are also conserved in MukB, Rad50, and RecN. Although both motifs have been shown to be essential for ATP binding and hydrolysis (Hirano et al. 2001), only the N-terminal globular domain directly binds ATP (Akhmedov et al. 1998). This ATPase activity appears to be required for the full function of SMC-containing complexes, as shown by mutagenesis of the ATP-binding domain (Chuang, Albertson, and Meyer 1994; Verkade et al. 1999; Fousteri and Lehmann 2000; Hirano et al. 2001) or the use of nonhydrolysable ATP analogs (Kimura and Hirano 1997). Based on the identification of these motifs at the extreme ends of SMC molecules, it was suggested that the functional ATPase domain would form by uniting these regions (Saitoh et al. 1994). This could occur in either of two ways, as illustrated schematically in figure 1B. Either the molecule could bend at the hinge to bring the terminal domains together (the intramolecular model) or it could do so by dimerizing as an antiparallel coiled-coil, bringing the N-terminal domain of one subunit next to the C-terminal domain of the other (the intermolecular model). Indeed, the formation of such an ATPase has been confirmed by a covalent fusion of the N-terminal and C-terminal domains alone of the single SMC in Thermotoga maritima, which is capable of ATP-dependent DNA aggregation (Löwe, Cordell, and van den Ent 2001). A combination of in vitro data from electron microscopy, crystal structures, and several elegant biochemical experiments now indicates that both Rad50 and SMC molecules bring these terminal domains together by bending at the hinge and forming intramolecular antiparallel coiled-coils (Hirano et al. 2001; Haering et al. 2002; Hirano and Hirano 2002; Hopfner et al. 2002). In addition, visualization of the whole condensin complex and its SMC subunits alone by atomic force microscopy has suggested that the SMC heterodimer may fold back along its coiled-coils, allowing the hinge to interact with the terminal ATPase domains (Yoshimura et al. 2002). However, the functional significance of this conformation remains to be explored. SMC proteins share another conserved motif (LSGG) upstream of the Walker B site, referred to as the ‘‘signature motif’’ of the ABC (ATP-binding cassette) ATPase family, which also includes the structurally similar RecN, Rad50, and MukB proteins (Löwe, Cordell, and van den Ent 2001). This motif has been shown to play a pivotal role in the dimerization of Rad50 head domains (Hopfner et al. 2000) and probably serves a similar function in other ABC ATPases such as the SMC, MukB, and RecN proteins. Given the striking level of conservation within SMC proteins and other ABC ATPases with extended coiledcoils, the origin of these proteins and their relationship to one another have been a subject of considerable interest. A number of different phylogenies for SMC proteins have been suggested to date that differ in their method of construction and resultant topology (Melby et al. 1998; Cobbe and Heck 2000; Jones and Sgouros 2001; Soppa 2001; Beasley et al. 2002), so the precise relationships between these and related proteins have been unclear. To resolve this issue, we constructed phylogenetic trees containing all six SMC subfamilies, as well as the related MukB, Rad50, and RecN proteins, from a large data set of protein sequences using the combined criteria of minimum evolution and maximum likelihood. In addition, we used the data generated in constructing these trees to test hypotheses about the structure, function, and evolution of these proteins. Materials and Methods Phylogenetic Analysis of SMC Proteins Sequences were assembled into subfamilies initially based on their BLAST (Altschul et al. 1997) scores, and the whole protein sequences were aligned with ClustalW (Thompson, Higgins, and Gibson 1994) and DIALIGN (Morgenstern et al. 1998). Alignments were then inspected manually with the aid of Mview (Brown, Leroy, and Sander 1998) and adjusted where necessary. Trees containing large numbers of OTUs were constructed using PAL (Drummond and Strimmer 2001) and Weighbor (Bruno, Socci, and Halpern 2000), and smaller maximumlikelihood trees of various subgroups were then calculated by quartet puzzling using the Tree-Puzzle program (Strimmer and von Haeseler 1996). These subgroups included members of the eukaryotic and prokaryotic SMC subfamilies shown in figure 2, subfamilies that branch together or closely related sequences within a subfamily (e.g., the archaeal, proteobacterial, or firmicute SMCs in the eubacterial subfamily). In the case of prokaryotic sequences such as archaeal or firmicute SMCs, the branches in these smaller trees were most easily resolved by removing coiledcoil sequences and calculating distances based on the conserved globular domains. Maximum parsimony was used to confirm the topology of smaller trees containing closely related sequences within SMC subfamilies, using the tree-searching options in PAUP* (Swofford 1999). However, this method was not used to reconstruct the topology of the whole tree, as it can become inconsistent with unequal rates of evolution (Felsenstein 1978; Kuhner and Felsenstein 1994; Tateno, Takezaki, and Nei 1994), which are observed in the different SMC subfamilies. The topology of the eukaryotic 334 Cobbe and Heck FIG. 2.—Phylogenetic tree of SMC proteins. The scale bar denotes the number of accepted substitutions in Whelan-Goldman distance units. Phylogenetic Analysis of SMC Proteins 335 SMCs differed both within subfamilies, when different conserved domains of the protein were used to calculate trees, and between subfamilies, even when the whole protein sequence was used (see Supplementary Material online). Horizontal gene transfer is a comparatively rare event in multicellular eukaryotes, and so it seems more likely that the conflict between subfamilies is the result of additional evolutionary constraints imposed by functional interactions between members of the subfamilies. Consequently, the phylogenetic signal leading to the most probable topology was recovered by calculating the maximum-likelihood tree from concatenated alignments of all the SMC subfamily members for which sequence was available. This widely used approach (Gouy and Li 1989; Baldauf et al. 2000; Hausdorf 2000; Brown et al. 2001; Nei, Xu, and Glazko 2001; Springer et al. 2001; Wolf et al. 2001; Brochier et al. 2002; Lang et al. 2002; Suzuki, Glazko, and Nei 2002) is employed here on the assumption that each of the individual eukaryotic SMC sequences shares the same vertical pattern of inheritance and therefore should follow the same branching order. As these trees using concatenated alignments had bootstrap values of 100% at all nodes (online supplementary figure) and in turn led to the best alignments when used as guide trees, it seems likely that the most probable topology is recovered this way. For those taxa in which gene duplications have occurred within a subfamily, it was found that the separate use of either paralog gave rise to the same topology, after testing all possible combinations of paralogs within subfamilies. In assembling the complete tree, the phylogenetic information of as many taxa as possible was used to break up long branches (Graybeal 1998; Zwickl and Hillis 2002). Additionally, tests were performed using the less divergent members of subfamilies to reconstruct trees in any cases where long-branch attraction (Lyons-Weiler and Hoelzer 1997) was suspected, by confirming that the same topology existed between other subfamily members. Any ambiguously assigned branches (typically, with a bootstrap probability less that 60%) were compared with all remaining competing tree topologies in which the suspect branch was placed elsewhere, to establish which position contributed least to the total length. The relationship between the subfamilies was established both by calculating maximum-likelihood trees with the least divergent representatives of the different subgroups and by determining the topology yielding the minimum-evolution tree when all sequences were included. Finally, the branch lengths of the overall tree were calculated using the Whelan-Goldman model of amino acid substitution (Whelan and Goldman 2001), with rate heterogeneity among sites modeled according to a gamma distribution with 16 rate categories. The gamma distribution shape parameter a was estimated from the data set using the Tree-Puzzle program (Strimmer and von Haeseler 1996). The phylogenetic tree of coiled-coil containing ABC ATPases was constructed using essentially the same methods as the SMC tree, in which smaller maximumlikelihood trees of each subfamily were calculated using Tree-Puzzle (Strimmer and von Haeseler 1996), and the relationship between the subfamilies was established both by calculating maximum-likelihood trees with the least divergent sequences and by determining the topology yielding the minimum-evolution tree when all sequences were included. Similarly, the final tree of coiled-coil– containing ABC ATPases was constructed by Tree-Puzzle, using the Whelan-Goldman model of amino acid substitution with a gamma distribution. However, branch lengths for this tree were calculated using distances estimated from the conserved globular domains of these proteins, excluding coiled-coils. Examples of maximum-likelihood trees used to construct the final trees presented in figure 2 and figure 5 are provided as Supplementary Material online, together with percentage bootstrap support values calculated using PAL (Drummond and Strimmer 2001) and Tree-Puzzle (Strimmer and von Haeseler 1996). Analysis of SMC Codon Usage The general pattern of codon usage in the genome of different bacteria was compiled using data from the Codon Usage Database (http://www.kazusa.or.jp/codon/ ). Grantham’s D statistic was calculated using the Wisconsin Package programs CodonFrequency and Correspond version 10.3 (Accelrys Inc) in the following species: Pseudomonas aeruginosa, Mycobacterium tuberculosis, Caulobacter crescentus, Deinococcus radiodurans, Treponema pallidum, Mesorhizobium loti, Thermotoga maritima, Borrelia burgdorferi, Listeria monocytogenes, Neisseria meningitidis, Clostridium acetobutylicum, Bacillus subtilis, Agrobacterium tumefaciens, Staphylococcus aureus, Xylella fastidiosa, Streptococcus pneumoniae, Aquifex aeolicus, Prochlorococcus marinus, Synechocystis, Nostoc, Thermosynechococcus elongatus, Synechococcus, Archaeoglobus fulgidus, Halobacterium, Pyrococcus furiosus, Methanosarcina acetivorans, and Methanococcus jannaschii. The codon bias index and the effective number of codons were calculated using the CodonW program (http://www.molbiol.ox.ac.uk/cu/ ). The bias in GC content at silent third codon positions was calculated by developing the bias(GC3) statistic. To correct for biases in GC content caused by a biased amino acid composition in the encoded protein, the GC content at third nucleotide positions was calculated using relative frequencies of codons. The statistic used is defined as follows: GC3 ¼ 21 X n 1 X ci 21 j¼1 i¼1 ÿj where n is the number of alternative codons ending in either a G or a C for the j th group of synonymous codons, c is the frequency with which such a codon occurs in the coding sequence, and ÿj is the sum of the frequencies for each alternative codon in the j th synonymous group, regardless of GC content. To correct for changes in composition resulting from mutational bias at the same locus, the GC content of the remaining first and second codon positions was calculated (in the same way as for GC3), as well as that of any introns. The arithmetic mean of these three values was taken to be the expected GC content for the locus, and the bias(GC3) statistic for a gene was then calculated as 336 Cobbe and Heck observedðGC3 Þ expectedðGC3 Þ biasðGC3 Þ ¼ 1 1 expectedðGC3 Þ where the species scaling factor 1 is the average AT content of the respective genome (57.3% in the case of D. melanogaster [Adams et al. 2000]), divided by the average GC content. The pairwise number of synonymous and nonsynonymous substitutions per site between aligned SMC and Rad50 sequences from D. melanogaster and D. pseudoobscura were estimated using the Diverge program in the Wisconsin package. The D. pseudoobscura nucleic acid sequences were obtained from the Drosophila Genome Project at the Baylor College of Medicine (http://hgsc. bcm.tmc.edu/projects/Drosophila/ ) SMC Evolutionary Rate Correlation The divergence of each eukaryotic SMC protein region was calculated by summing the maximum-likelihood branch lengths as far as the presumed root where the eukaryotic and prokaryotic branches meet. The extent of correlation between the estimated evolutionary rate in different parts of the proteins was then calculated using the Pearson product-moment correlation coefficient for each pair of SMC protein regions. Similar trends were also found using the Spearman rank correlation coefficient. Results Phylogenetic Analysis of SMC Proteins The final maximum-likelihood tree of 148 SMC sequences is presented in figure 2, with only one representative of each prokaryotic genus displayed. Among the eukaryotic SMC proteins, it was consistently observed that SMC1 and SMC4 branch together (these are the larger SMC subunits of the cohesin and condensin SMC heterodimers, respectively). Similarly, the smaller subunits, SMC3 and SMC2, appear to originate from the same gene duplication event, and SMC5 and SMC6 (which heterodimerize as part of a DNA repair complex [Taylor et al. 2001; Fujioka et al. 2002]) also branch together. All eukaryotes whose genomes have been substantially sequenced, from microsporidia to humans, contain these six subfamilies of SMC proteins, suggesting that the duplication events giving rise to each subfamily must have occurred either before or very soon after the origin of eukaryotes. It is also noteworthy that the rate of accepted amino acid substitution varies among different eukaryotic taxa within each subfamily. In particular, a number of SMC proteins in Caenorhabditis elegans seem to be exceptionally divergent. However, this can be explained by the observation that the more rapidly evolving SMCs are either those that have arisen from a gene duplication (such as SMC4a and DPY-27) or SMC proteins that form heterodimers with both of the duplicated SMCs (such as MIX-1) (Hagstrom et al. 2002). This is consistent with a correlation between the large number of duplicated genes in the C. elegans genome (Coghlan and Wolfe 2002; Gu et al. 2002) and its rapid evolutionary rate (Aguinaldo et al. 1997; Mushegian et al. 1998). Conversely, the rate of evolution appears relatively slow for the plant and vertebrate SMCs, with the exception of the SMC1b proteins, which are probably less evolutionarily constrained, as they appear to be required only during meiosis in males (Revenkova et al. 2001). Examining the tree, it appears that the branch lengths between more closely related organisms (such as the vertebrates or diptera) are longer for condensin SMC proteins than for the corresponding cohesins, suggesting either that condensins may be more ancient or may evolve more rapidly. However, the SMC proteins of both complexes appear to be similarly essential for viability in a range of organisms (Strunnikov, Larionov, and Koshland 1993; Saka et al. 1994; Strunnikov, Hogan, and Koshland 1995; Michaelis, Ciosk, and Nasmyth 1997; Lieb et al. 1998; Steffensen et al. 2001; Hagstrom et al. 2002; Liu et al. 2002; Siddiqui et al. 2003), so one might expect their rates of evolution to be broadly similar. Indeed, a significant rate difference is only evident between the more closely-related organisms for which all SMC protein sequences are available, whereas the mean pairwise distances for condensin and cohesin SMCs in more divergent groups seem to approach similar values (fig. 3A). Furthermore, the overall number of substitutions between condensin or cohesin SMC proteins and their presumed ancestor appear to be essentially the same (fig. 3B). It is therefore possible that the trend shown in figure 3A may reflect slightly more rapid evolution in condensin SMC proteins (detectable only by examining the smallest pairwise distances, in which the number of accepted substitutions and speciation events are minimized). Although condensin SMCs appear to show a higher substitution rate among closely related species than cohesin SMCs, the mean distances within subfamilies of these proteins (averaged across all condensin and cohesin SMCs for each pairwise comparisons between different organisms) are about half (0.54 6 0.134) the corresponding distances between SMC5 and SMC6 proteins. Similarly, it can be seen in figure 3B that the mean distance of SMC5 and SMC6 to the presumed root with prokaryotes is approximately twice as great as that of other eukaryotic SMC proteins. As discussed later, this elevated sequence divergence may reflect a reduced selection pressure on these proteins compared with the other SMCs, which are required during every cell division, and is thus consistent with the suggested general role for these proteins, specifically in response to DNA damage (Lehmann et al. 1995; Mengiste et al. 1999; Verkade et al. 1999; Fousteri and Lehmann 2000). Analysis of Prokaryotic SMC Codon Usage In general, the phylogeny of prokaryotic SMC proteins agrees well with their accepted taxonomic distribution, although the unusually short SMC protein in Borrelia burgdorferi results in this protein clustering with other outgroup species, away from other spirochaetes such as Treponema. However, as previously suggested (Soppa 2001), it would appear from the tree shown in figure 2 that at least two cases of horizontal gene transfer (HGT) have Phylogenetic Analysis of SMC Proteins 337 FIG. 3.—Divergence of SMC protein subfamilies. (A) Relative distances (mean pairwise distance within each subfamily divided by the mean distance of all subfamilies) of condensin and cohesin SMC proteins within different taxonomic groups. The difference in relative distances between condensin and cohesin SMC proteins becomes most obvious with progressively more closely related organisms, as shown from left to right. (B) Evolutionary distance between members of different eukaryotic SMC subfamilies and the presumed root with prokaryotic SMC proteins. The mean number of accepted substitutions (in Whelan-Goldman distance units) and standard deviation are displayed for organisms with substantially sequenced genomes in which all SMC protein sequences are available. occurred, resulting in the SMC proteins of the cyanobacteria (Prochlorococcus marinus, Thermosynechococcus elongatus, Nostoc, Synechocystis, and Synechococcus) as well as the eubacterial species Aquifex aeolicus, most closely resembling those of the archaea. On the other hand, phylogenetic trees based on concatenated alignments of protein or rRNA sequences (Wolf et al. 2001; Brochier et al. 2002) firmly place these species among the remaining eubacteria, as do our own trees based on sequences for 16S rRNA or proteins such as EF-Tu and RecA (data not shown). To investigate this HGT phenomenon further, codon usage analysis was conducted to examine whether the pattern of codon usage in the cyanobacterial SMC genes more closely resembled that of the archaeal SMCs or of other cyanobacterial genes. The differences between codon frequency tables were computed using Grantham’s D statistic (Grantham et al. 1981). This statistic was calculated to measure the difference in codon usage FIG. 4.—Codon usage in SMC genes. (A) Measures of Grantham’s D statistic in comparisons between codon usage for SMC, EF-Tu, and RecA in different prokaryotic species and general codon usage patterns in the same organism. (B) Codon usage in Drosophila SMC genes. Values of bias(GC3) displayed above have all been halved to facilitate comparison with trends in the other statistics. (C) The ratio of nonsynonymous (Ka) to synonymous substitutions (Ks) in SMC genes, based on pairwise comparisions between codons in D. melanogaster and D. pseudoobscura sequences. between the SMC of a particular prokaryote and the general pattern of codon usage in its genome. In addition, comparisons were made between genomic codon usage and the codon usage of recA orthologs or EF-Tu genes, which are often employed in phylogenetic tree reconstruction because of the apparent rarity with which they experience HGT (Eisen 1995; Baldauf, Palmer, and Doolittle 1996; Brendel et al. 1997; Lopez, Forterre, and Philippe 1999). However, as shown in figure 4A, the pattern of codon usage for SMC genes in A. aeolicus, the cyanobacteria, the remaining eubacteria and the archaea more closely match the general pattern for other loci in the genome than do EF-Tu and RecA, which have higher values of D. Therefore, we have found little evidence in favor of HGT for smc genes from patterns of codon usage. As the relative level of sequence similarity between the archaeal and cyanobacterial SMC proteins seems hard to account for in terms of convergent evolution, it would appear that their similarity results from their common ancestry (synapomorphy). This synapomorphy is pre- 338 Cobbe and Heck sumably associated with an ancient HGT event, allowing sufficient time for a transferred smc gene to adapt to the translational apparatus of its recipient genome by means of appropriate synonymous substitutions. This codon adaptation could occur without significantly affecting a gene’s identity at the amino acid level, as the rate of amino acid replacement is generally much slower than the rate of synonymous substitution (Li 1997). Indeed, a recent study (Koski, Morton, and Golding 2001) has shown that many E. coli genes with normal nucleotide composition have no apparent orthologs in a related enterobacterium, suggesting that only relatively recent HGT events may necessarily reveal themselves by differences in codon usage. Consequently, it is not clear from the codon usage data in which direction this transfer event occurred, whether from archaea to cyanobacteria or vice versa, and there is still debate as to which is actually the older group of prokaryotes (Hedges et al. 2001; Cavalier-Smith 2002). As the A. aeolicus SMC clusters with different archaeal species in the tree and its pattern of codon usage more closely resembles that of the archaea than the cyanobacteria, it would appear that a separate transfer event was involved in this lineage. This may reflect a general trend in this species to acquire and retain genes from similarly thermophilic archaea (Aravind et al. 1998). However, apart from the cases described above, the phylogeny of prokaryotic SMC proteins agrees well with trees constructed from 16S rRNA, EF-Tu, and RecA sequences. If the complexity hypothesis (Jain, Rivera, and Lake 1999) holds true that informational genes are less prone to successful HGT by virtue of the number of gene products they interact with, then the relative lack of HGT involving SMCs may reflect a potentially large number of interactors. So far, only two proteins have been found to form a complex biochemically with either a bacterial SMC (Soppa et al. 2002) or MukB (Yamazoe et al. 1999), but it seems likely that many more may interact either transiently or indirectly, as suggested by the large number of proteins that interact genetically with MukB (Hiraga 2000). Analysis of SMC Codon Usage in Drosophila The dipteran SMC5 and SMC6 coding sequences seem to have an exceptionally large number of substitutions, even allowing for the observation that a large number of genes in Drosophila are known to evolve at an unusually high rate (Sharp and Li 1989; Schmid and Tautz 1997; Fay, Wyckoff, and Wu 2002). The possibility that these genes might be nonessential (Jordan et al. 2002; Yang, Gu, and Li 2003) or required at much lower levels (Pál, Papp, and Hurst 2001) was therefore investigated by examining the pattern of codon usage in the Drosophila genes encoding SMC5 and SMC6, in comparison with the codon usage bias of the other four SMCs. In Drosophila and other organisms, there is a strong positive correlation between gene expression levels and codon usage bias (Duret and Mouchiroud 1999; Kanaya et al. 2001). However, it has also been suggested that synonymous codon usage in Drosophila may be biased to enhance the accuracy of translation, as the frequency of preferred codons is significantly higher at conserved amino acid positions (Akashi 1994). As the number of accepted nucleotide substitutions may be constrained by strong codon usage bias, this could explain the apparent correlation between the rates of synonymous and nonsynonymous substitutions in Drosophila (Comeron and Kreitman 1998). Therefore, if the higher rates of amino acid replacement in Drosophila SMC5 and SMC6 result from reduced selective pressures, this may be reflected by a lack of bias for optimal codons (consistent with reduced translational accuracy or efficiency.) The extent to which the genes encoding SMC proteins in Drosophila demonstrate directional bias in codon usage was first investigated by calculating the codon bias index (CBI) (Bennetzen and Hall 1982). As shown in figure 4B, the CBI for SMC5 or SMC6 is lower than the values for cohesin or condensin SMCs. Moreover, the especially low CBI value for SMC5 is closer to that of the negatively biased yEst-6 pseudogene (accession number AF526559) of Drosophila. The remaining SMC genes all have similar values, with the exception of Drosophila SMC3 (CAP), which has a comparatively high level of codon bias. As this gene is found on the X chromosome, whereas the other SMC genes are all autosomal, the increased bias and decreased number of substitutions in Drosophila SMC3 is presumably a reflection of the increased selection pressure on X-linked genes in the heterogametic sex (Montgomery, Charlesworth, and Langley 1987). To evaluate whether the reduced bias observed in SMC5 and SMC6 might be simply the result of lower expression levels of DNA repair proteins, the CBI for Drosophila Rad50 was also calculated. As this is another gene specifically involved in DNA repair (Gorski and Eeken 2002; Engels et al. 2003; Johnson-Schlitz et al. 2003) that also has strong sequence and structural similarities to SMC proteins, one might expect it to be expressed similarly to the SMC5 and SMC6 repair proteins. Interestingly, the CBI for Rad50 is less than most SMCs, which is consistent with a threefoldlower abundance of rad50 mRNA compared with SMC2 and SMC4 during early Drosophila embryogenesis (http:// genome.med.yale.edu/Lifecycle/ ), when most mitotic activity takes place. Nevertheless, the CBI for Rad50 is significantly greater than that of SMC5 and SMC6, suggesting that these proteins may not be required to the same extent as Rad50 to repair DNA damage. Secondly, the effective number of codons (NC) (Wright 1990) was also calculated. As NC can take values from 20 (in the cases of extreme bias) to 61 (when all synonymous codons are used equally), the bias in NC was evaluated as: 61 NC 41 This statistic ranges from 0, for completely random codon usage, to 1, if one codon is exclusively used for each amino acid. These values are also plotted in figure 4B and show that bias(NC) for SMC5 and particularly for SMC6 is most similar to that of the yEst-6 pseudogene. Thirdly, in addition to a correlation between tRNA copy number and favored codons, it has also been described that more highly expressed genes in Drosophila biasðNC Þ ¼ Phylogenetic Analysis of SMC Proteins 339 tend to have a high GC content at silent sites such as the third nucleotide position of their codons (Shields et al. 1988; Kanaya et al. 2001). On the other hand, this GC content is often reduced in sequences with weaker selective constraints, such as pseudogenes (Shields et al. 1988) (although this depends on their age [Echols et al. 2002]). The bias in GC content at silent third codon positions was evaluated by calculating the bias(GC3) statistic. As shown in figure 4B, the bias(GC3) for SMC5 and SMC6 is at least 20% smaller than that of most other SMC genes (but only about 11% less than that of Rad50), whereas SMC3 shows the highest bias. Finally, the selection pressures on Drosophila SMC proteins were examined by estimating the average ratio of nonsynonymous to synonymous substitutions (Ka/Ks) at each codon position. This ratio provides a measure of the intensity of purifying, or negative, selection exerted on genes, which results in decreased rates of amino acid replacement and thus Ka/Ks ratios less than one. The distribution of Ka/Ks varies greatly among different classes of genes, with values ranging from 0.02 to 0.306 and a mean ratio of 0.122 for various Drosophila genes (Li 1997) On the other hand, the rates of synonymous and nonsynonymous substitutions are approximately equal if most amino acid variation is neutral, as observed in many pseudogenes (Li, Gojobori, and Nei 1981; Bustamante, Nielsen, and Hartl 2002), whereas Ka/Ks ratios greater than 1 usually indicate adaptive or positive selection (Messier and Stewart 1997; Hughes and Yeager 1998). As shown in figure 4C, the Ka/Ks ratios for SMC2, SMC4, and Rad50 appear to be close to the previously described mean value in Drosophila (Li 1997), whereas SMC1 and SMC3 have considerably lower Ka/Ks ratios. This increased negative selection on SMC1 and SMC3 is consistent with the differences in amino acid substitution rates between cohesin and condensin SMCs in closely related organisms, as described earlier. By contrast, SMC5 and SMC6 display Ka/Ks ratios approximately twice as great as other SMC proteins. In particular, the ratio for SMC6 is similar to the highest Ka/Ks values previously described in Drosophila for chorion proteins (Li 1997). Based on these combined results, it would appear that the SMC5 and SMC6 proteins in Drosophila are under severely reduced selection for preferred codons in comparison with other SMCs and experience dramatically decreased negative selection, which could explain their higher amino acid substitution rate in terms of fewer selective constraints on their evolution in general. Phylogenetic Analysis of SMC Related Proteins To reveal how closely related SMC proteins are to similar ABC ATPases, a maximum-likelihood tree was constructed using sequences from these protein families, as shown in figure 5. It is clear from this tree that the closest relatives to the SMC proteins are the archaeal Rad50 proteins, followed by eukaryotic Rad50 and eubacterial SbcC proteins. Many of these Rad50 superfamily proteins have the conserved N-terminal FKS (or FRS) motif (located before the Walker A site), which is present in most of the SMC proteins. However, this motif is absent in the SMC5 and SMC6 subfamily members (in which only the phenylalanine is conserved). The FKS motif is also absent in the MukB proteins, which appear to possess universally conserved NWN and FART motifs instead. In addition, although it was previously suggested that all SMC and MukB proteins uniquely share a conserved Nterminal QG motif (Melby et al. 1998), this dipeptide was found to occur at different locations in SMC and MukB sequences in the alignments generated in this study. Instead, this motif was found at a conserved position in SMC and Rad50 proteins, in agreement with previously published structure-based alignments (van den Ent et al. 1999). Therefore, although it was previously proposed that MukB should be included among the SMC proteins as they adopt a similar structure (Melby et al. 1998), the Rad50 proteins seem to be more closely related at the sequence level. The high level of sequence conservation between MukB proteins argues against the notion that they are a highly divergent group of SMC proteins. Instead, it is possible that their structural similarity may be partly caused by convergent evolution, which is consistent with the similar functions of MukB and SMC proteins in prokaryotes (Cobbe and Heck 2000). Correlated Rates of Evolution in Different Regions of SMC Proteins Contrary to previous reports (Melby et al. 1998), we observed that eukaryotic SMC trees reconstructed from individual conserved hinge or N-terminal and C-terminal globular domains often differed both from each other and from phylogenies based on the whole protein sequence. For this reason, we attempted to dissect the different selective constraints on the sequence divergence of these regions by examining how they might coevolve. As interacting proteins tend to evolve at similar rates (Fraser et al. 2002), it was expected that regions of SMC proteins that interact with each other would also coevolve and show correlated rates of substitution. Correlation coefficients for different regions of SMC proteins were calculated after sorting the maximumlikelihood distances according to the functional complexes within which different subfamily members are found. The Pearson correlation coefficient for each pair of SMC protein regions is plotted in figure 6, together with Spearman’s coefficient of rank correlation. As shown in figure 6B, a clear positive correlation was observed between the evolutionary rates of the b-strands that constitute the interface of the N-termini and C-termini in the same molecule (Löwe, Cordell, and, van den Ent 2001). However, no significant positive correlation was found between the corresponding N-terminal and C-terminal regions of any SMC and its partner (fig. 6A), as would have been expected if the heterodimer existed naturally in an antiparallel intermolecular conformation. This strongly suggests that the intramolecular structure (fig. 1B) indicated by available in vitro data is also the structure found in vivo. By contrast, a strong positive correlation was observed between the evolutionary rates of the exposed outer portions of the N-terminal and C-terminal domains in SMC2 and SMC4 but not in the corresponding regions 340 Cobbe and Heck FIG. 5.—Phylogenetic tree of SMC proteins and other coiled-coil ABC ATPases. The scale bar denotes the number of accepted substitutions in Whelan-Goldman distance units. Phylogenetic Analysis of SMC Proteins 341 FIG. 6.—Correlated rates of evolution in different regions of SMC proteins, calculated using maximum-likelihood branch lengths. Correlation coefficients associated with postulated intermolecular associations are denoted ‘‘INTER,’’ whereas ‘‘INTRA’’ indicates intramolecular correlations. The Pearson correlation coefficients are shown above and nonparametric Spearman rank correlations are shown below. of the same molecule (fig. 6C and D). These stronger intermolecular correlations are nonetheless consistent with observed associations between the termini of SMC proteins (Melby et al. 1998; Anderson et al. 2002). Moreover, a significant correlation at the 1% level was observed for intermolecular associations between the coiled-coils of the condensin SMC proteins (fig. 6E ). This elevated intermolecular correlation appears to reflect differences between the conformation of vertebrate condensin and cohesin SMC proteins previously observed by rotary shadowing, whereby the coiled-coils of condensin SMCs are intimately associated along most of their length but those of the cohesin complex are well separated (Anderson et al. 2002). Interestingly, the SMC5 and SMC6 proteins appear to show similarly weak associations between their arms (fig. 6C and E) as SMC1 and SMC3. This suggests that the SMC proteins involved specifically in DNA repair may adopt a similarly open conformation to that of the cohesin SMC proteins (Haering et al. 2002). By contrast, although a significant intermolecular correlation was not found consistently for the rate at which the hinge domains of cohesin SMC proteins evolve, the SMC5 and SMC6 proteins nonetheless showed intermolecular rate correla- 342 Cobbe and Heck tions greater than 60% at the dimer interface of their hinge domains. This may possibly indicate that SMC5 and SMC6 have a stronger hinge association than other SMC proteins such as SMC1 and SMC3. Although an even stronger positive correlation can be seen for the dimer interface of condensin hinge domains using the Pearson correlation coefficient, this particular result is not supported by nonparametric analyses. Discussion The Phylogeny of SMC Proteins We have investigated the evolutionary radiation of SMC proteins and determined that the closest relatives of these proteins are members of the Rad50 superfamily, as revealed by phylogenetic analysis and the conservation of particular residues. Apart from the suspected cases of HGT described in this study, the tree topology described here generally agrees with the currently accepted taxonomic distribution of the organisms whose SMC proteins have been examined. Interestingly, our consensus tree shows that plant and animal SMC proteins cluster together with fungi as the outgroup, whereas other recent studies based on 18S rRNA and a panoply of proteins have reported that fungi and animals are most closely related to each other (Baldauf and Palmer 1993; Baldauf et al. 2000; Van de Peer et al. 2000). However, the resolution of this trichotomy has been a longstanding controversy, with different studies suggesting that animals are closer relatives to either plants or fungi (Gupta 1995; Wang, Kumar, and Hedges 1999). The tree presented here is more consistent with previous analyses using ribosomal proteins (Veuthey and Bittar 1998) and other concatenated sequences (Gouy and Li 1989), which place plants closer to animals. Alternatively, the phylogeny revealed in the SMC tree may reflect higher rates of substitution in the fungal proteins because of reduced constraints in their interactions with other proteins, as has been suggested in the case of histones H3 and H4 (Thatcher and Gorovsky 1994). For example, at least one protein (AKAP95) with no clear orthologs in other animals, fungi, or plants is thought to interact with SMC proteins in vertebrates (Collas, Le Guellec, and Tasken 1999; Steen et al. 2000), which may influence the relatively slow rates of SMC evolution in these animals. In the future, it will be interesting to see how the use of SMC sequences, in combination with larger data sets of similarly conserved protein sequences (Brown et al. 2001), might contribute to resolving the relationships among these eukaryotic kingdoms. Evolution of Eukaryotic SMC Subfamilies According to the tree in figure 2, it appears that a symmetric duplication of genes encoding the larger and smaller eukaryotic SMCs gave rise to both cohesin and condensin SMC proteins in all eukaryotes. The relatively short branch lengths connecting either the roots of the SMC1/SMC4 or the SMC2/SMC3 lineages to the prokaryotic SMC root also suggest that the first duplication event, giving rise to the primordial eukaryotic SMC heterodimer, occurred very early in eukaryotic evolution. However, it seems that the first SMC heterodimer to arise in eukaryotes may have had more functions in common with the condensin heterodimer, based on the assertion that the primary phenotype in a prokaryotic smc null mutant is probably a condensation defect (Britton and Grossman 1999) and the observation that DNA reannealing (Sutani and Yanagida 1997; Hirano and Hirano 1998) and supercoiling (Kimura and Hirano 1997; Lindow, Britton, and Grossman 2002) activities are common features of these proteins but apparently not of cohesins in general (Jessberger et al. 1996; Losada and Hirano 2001; Sakai et al. 2003). Given these functional similarities between condensin and prokaryotic SMCs, it might seem odd that the primary sequences of SMC2 and SMC4 do not appear to be more closely related to prokaryotic SMC proteins than the functionally distinct SMC1 and SMC3 proteins (fig. 3B). However, if condensin SMCs had evolved slightly more rapidly since the divergence of eukaryotes, then this could account for their lack of greater sequence identity with the functionally related prokaryotic SMC proteins. Indeed, a higher substitution rate among condensin SMC proteins than cohesins can be seen in pairwise comparisons between sequences from more closely related organisms, such as vertebrate or dipteran SMCs. This suggests that SMC2 and SMC4 may evolve more rapidly than the cohesin SMCs but that the difference in rate is small enough to be masked by mutational saturation in comparisons between more distantly related species. If so, it is anticipated that additional SMC protein sequences from closely related organisms in other phyla might also reveal more accepted substitutions in pairwise comparisons between condensin SMC proteins than in cohesins. One might speculate that the later evolution of SMC2 and SMC4 involved some level of positive selection for proteins with enhanced condensing activities, given the larger size of eukaryotic chromosomes in general (Bendich and Drlica 2000), whereas SMC1 and SMC3 were recruited to specific roles in maintaining sister chromatid cohesion. Although speculative, a prediction of this hypothesis could be tested by directly comparing the supercoiling activities of condensin and prokaryotic SMCs to see if the condensing activities of the former are indeed more efficient. Moreover, as additional SMC sequences from closely related species become available, this may facilitate the detection of positive selection at particular amino acid sites of condensin SMC proteins by maximum-likelihood approaches (Anisimova, Bielawski, and Yang 2001). Whereas, differences in the substitution rate of cohesin and condensin SMC proteins become more pronounced in comparisons between closely related organisms, SMC5 and SMC6 consistently show pairwise distances about twice that of the average for the other SMCs. Based on the pattern of codon usage and Ka/Ks ratios in Drosophila, it seems that the large numbers of substitutions in the SMC5 and SMC6 proteins are best explained by an elevated rate of evolution, rather than a more ancient origin. Similar codon usage analyses in other organisms failed to clearly show the same pattern, often because of less pronounced codon bias patterns or Phylogenetic Analysis of SMC Proteins 343 FIG. 7.—Models of SMC heterodimers, based on available structural data and the patterns of coevolution described in figure 5. The coiled-coil arms of SMC4 and SMC2 have positively correlated substitution rates and appear to be associated with each other in electron microscope images of condensin. By contrast, cohesin is believed to form an annular structure (Gruber, Haering, and Nasmyth 2003) in which the arms of SMC1 and SMC3 are separated but their termini are bridged by the SCC1/RAD21 cohesin subunit, which might account for the lack of any significant correlation between substitution rates in these regions. As there is also little evidence of coevolution between the arms and exposed termini of SMC5 and SMC6, it is possible that these regions do not form significant contacts between these proteins either. the complicating influence of SMC gene duplications on evolutionary rate. However, as SMC5 and SMC6 are only known to be essential for proliferation in yeasts (Lehmann et al. 1995; Verkade et al. 1999) but not in Arabidopsis (Mengiste et al. 1999), it seems likely that their reduced sequence conservation may reflect reduced selective pressures on these proteins in other species as well. Moreover, as some of the properties of SMC5 and SMC6 (Taylor et al. 2001) are mirrored by the association of SMC1 (or SMC1b in vertebrates) and SMC3 with male meiotic chromosomes (Klein et al. 1999; Eijpe et al. 2000; Revenkova et al. 2001) and the role of SMC1/SMC3 in recombination repair (Jessberger et al. 1996; Sjögren and Nasmyth 2001; Kim, Xu, and Kastan 2002; Yazdi et al. 2002), it is possible that these proteins have evolved more rapidly since their function can be partially complemented by the cohesin SMCs. Therefore, the long branch lengths within this subfamily seem to better reflect their lessconstrained substitution rate, rather than their greater antiquity as previously suggested (Jones and Sgouros 2001). Although the other coiled-coil ABC ATPases appear to place the root of SMC proteins in the SMC5/ SMC6 branch, this does not necessarily imply that these are the ancestral SMC proteins. Instead, it seems more likely that the placement of the root here in the maximumlikelihood tree reflects the average similarity of other coiled-coil ABC ATPases to SMC proteins, regardless of how the SMCs have diverged from each other. The Conformations of SMC Heterodimers In agreement with the results obtained previously by in vitro analyses (Hirano et al. 2001; Haering et al. 2002; Hirano and Hirano 2002), our statistical analysis of coevolution in eukaryotic SMC heterodimers strongly favors the intramolecular model of dimer formation (fig. 6A and B). We believe our findings are complementary to the available in vitro data as the observed correlations between substitution rates would be consistent with formation of such intramolecular heterodimers in vivo. The low correlations between the evolutionary rates at the coiled-coils and termini of SMC5/SMC6 heterodimers suggest that their conformation may more closely resemble cohesin SMC proteins, consistent with their overlapping functions in DNA repair and meiosis. The similarly weak intermolecular correlations for the arms and termini of cohesin SMCs may represent a weaker association between these regions (fig. 7), facilitating the dissolution of sister chromatid cohesion mediated by a proposed annular cohesin complex (Gruber, Haering, and Nasmyth 2003). The weaker association between the arms suggested by our analysis is consistent with the appearance of both the SMC1/SMC3 heterodimer and the cohesin complex when observed by electron microscopy (Anderson et al. 2002; Haering et al. 2002). By contrast, the strong intermolecular correlation for the coiled-coils and the termini of condensin SMC proteins are consistent both with electron microscope images of these proteins (Anderson et al. 2002; Yoshimura et al. 2002) and the high specificity of their coiled-coil interactions (Hirano and Hirano 2002). In the future it will be interesting to see if SMC5 and SMC6 also adopt a similarly open conformation to that of the cohesin SMC proteins when viewed by microscopy. On the other hand, if SMC5 and SMC6 prove to have a stronger hinge association than other SMC proteins, this may reflect a role for this domain in tethering such SMC proteins bound to separate DNA strands during the repair process, as has been suggested similarly for Rad50 (de Jager et al. 2001; Hopfner et al. 2002). Supplementary Material Maximum-likelihood trees of selected SMC proteins from eukaryotes in which all SMC protein sequences are available, are presented in an online supplementary figure. Separate trees of the six different subfamilies are shown together with a consensus tree constructed from concatenated alignments of the other sequences (SMC1 to SMC6). Scale bars denote the number of accepted substitutions in Whelan-Goldman distance units and bootstrap percentage support for the nodes are indicated. Web sites where additional data are available include http://www.kazusa.or.jp/codon/ Codon Usage Database, http://www.molbiol.ox.ac.uk/cu/ CodonW Program, http:// genome.med.yale.edu/Lifecycle/ Drosophila Gene Expression Levels, and http://www.hgsc.bcm.tmc.edu/blast/? organism¼Dpseudoobscura D. pseudoobscura Genome Acknowledgments We would like to thank Andrew Lloyd of INCBI (Irish National Centre for Bioinformatics) and the UK Human Genome Mapping Project Resource Centre for use 344 Cobbe and Heck of computing facilities. In addition, we are grateful to Dietlind Gerloff, Paul McLaughlin, Ken Wolfe, and an anonymous referee for helpful discussions and critical reading of the manuscript. M.H. is a Senior Research Fellow in the Basic Biomedical Sciences, funded by the Wellcome Trust. N.C. has been supported by a Darwin Trust Prize Studentship. Literature Cited Adams, M. D., S. E. Celniker, R. A. Holt et al. 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195. Aguinaldo, A. M., J. M. Turbeville, L. S. Linford, M. C. Rivera, J. R. Garey, R. A. Raff, and J. A. Lake. 1997. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387:489–493. Akashi, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935. Akhmedov, A. T., C. Frei, M. Tsai-Pflugfelder, B. Kemper, S. M. Gasser, and R. Jessberger. 1998. Structural maintenance of chromosomes protein C-terminal domains bind preferentially to DNA with secondary structure. J. Biol. Chem. 273:24088– 24094. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402. Anderson, D. E., A. Losada, H. P. Erickson, and T. Hirano. 2002. Condensin and cohesin display different arm conformations with characteristic hinge angles. J. Cell Biol. 156:419–424. Anderson, D. E., K. M. Trujillo, P. Sung, and H. P. Erickson. 2001. Structure of the Rad50Mre11 DNA repair complex from Saccharomyces cerevisiae by electron microscopy. J. Biol. Chem. 276:37027–37033. Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18:1585–1592. Aravind, L., R. L. Tatusov, Y. I. Wolf, D. R. Walker, and E. V. Koonin. 1998. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends. Genet. 14:442–444. Baldauf, S. L., and J. D. Palmer. 1993. Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558– 11562. Baldauf, S. L., J. D. Palmer, and W. F. Doolittle. 1996. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl. Acad. Sci. USA 93:7749–7754. Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977. Beasley, M., H. Xu, W. Warren, and M. McKay. 2002. Conserved disruptions in the predicted coiled-coil domains of eukaryotic SMC complexes: implications for structure and function. Genome Res. 12:1201–1209. Bendich, A. J., and K. Drlica. 2000. Prokaryotic and eukaryotic chromosomes: what’s the difference? Bioessays 22:481–486. Bennetzen, J. L. and B. D. Hall. 1982. Codon selection in yeast. J. Biol. Chem. 257:3026–3031. Brendel, V., L. Brocchieri, S. J. Sandler, A. J. Clark, and S. Karlin. 1997. Evolutionary comparisons of RecA-like proteins across all major kingdoms of living organisms. J. Mol. Evol. 44:528–541. Britton, R. A., and A. D. Grossman. 1999. Synthetic lethal phenotypes caused by mutations affecting chromosome partitioning in Bacillus subtilis. J. Bacteriol. 181:5860–5864. Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18:1–5. Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J. Stanhope. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet. 28:281–285. Brown, N. P., C. Leroy, and C. Sander. 1998. MView: A Web compatible database search or multiple alignment viewer. Bioinformatics 14:380–381. Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted neighbor joining: a likelihood-based approach to distancebased phylogeny reconstruction. Mol. Biol. Evol. 17: 189–197. Bustamante, C. D., R. Nielsen, and D. L. Hartl. 2002. A maximum likelihood method for analyzing pseudogene evolution: implications for silent site evolution in humans and rodents. Mol. Biol. Evol. 19:110–117. Cavalier-Smith, T. 2002. The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int. J. Syst. Evol. Microbiol. 52:7–76. Chuang, P. T., D. G. Albertson, and B. J. Meyer. 1994. DPY-27: a chromosome condensation protein homolog that regulates C. elegans dosage compensation through association with the X chromosome. Cell 79:459–474. Cobbe, N. and M. M. Heck. 2000. SMCs in the world of chromosome biology- from prokaryotes to higher eukaryotes. J. Struct. Biol. 129:123–143. Coghlan, A., and K. H. Wolfe. 2002. Fourfold faster rate of genome rearrangement in nematodes than in Drosophila. Genome Res. 12:857–867. Collas, P., K. Le Guellec, and K. Tasken. 1999. The A-kinaseanchoring protein AKAP95 is a multivalent protein with a key role in chromatin condensation at mitosis. J. Cell Biol. 147:1167–1180. Comeron, J. M. and M. Kreitman. 1998. The correlation between synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics 150:767–775. de Jager, M., J. van Noort, D. C. van Gent, C. Dekker, R. Kanaar, and C. Wyman. 2001. Human Rad50/Mre11 is a flexible complex that can tether DNA ends. Mol. Cell. 8:1129–1135. Drummond, A., and K. Strimmer. 2001. PAL: An object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 17:662–663. Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482–4487. Echols, N., P. Harrison, S. Balasubramanian, N. M. Luscombe, P. Bertone, Z. Zhang, and M. Gerstein. 2002. Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes. Nucleic Acids Res. 30:2515–2523. Eijpe, M., C. Heyting, B. Gross, and R. Jessberger. 2000. Association of mammalian SMC1 and SMC3 proteins with meiotic chromosomes and synaptonemal complexes. J. Cell Sci. 113:673–682. Eisen, J. A. 1995. The RecA protein as a model molecule for molecular systematic studies of bacteria: comparison of trees of RecAs and 16S rRNAs from the same species. J. Mol. Evol. 41:1105–1123. Engels, W. R., J. Ducau, C. Flores, D. Johnson-Schlitz, and C. R. Preston. 2003. Double strand break repair: four pathways, many genes. A. Dros. Res. Conf. 44:1. Phylogenetic Analysis of SMC Proteins 345 Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024–1026. Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401–410. Fousteri, M. I., and A. R. Lehmann. 2000. A novel SMC protein complex in Schizosaccharomyces pombe contains the Rad18 DNA repair protein. EMBO J. 19:1691–1702. Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, and M. W. Feldman. 2002. Evolutionary rate in the protein interaction network. Science 296:750–752. Fujioka, Y., Y. Kimata, K. Nomaguchi, K. Watanabe, and K. Kohno. 2002. Identification of a novel non-SMC component of the SMC5/SMC6 complex involved in DNA repair. J. Biol. Chem. 277:21585–21591. Gorski, M. M., and J. C. J. Eeken. 2002. The Drosophila rad50 mutants are pupal lethal, however the third instar larvae show elevated levels of anaphase bridges in dividing cells. A. Dros. Res. Conf. 43:201C. Gouy, M., and W. H. Li. 1989. Molecular phylogeny of the kingdoms Animalia, Plantae, and Fungi. Mol. Biol. Evol. 6:109–122. Grantham, R., C. Gautier, M. Gouy, M. Jacobzone, and R. Mercier. 1981. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 9:r43–r74. Graybeal, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9–17. Gruber, S., C. H. Haering, and K. Nasmyth. 2003. Chromosomal cohesin forms a ring. Cell 112:765–777. Gu, Z., A. Cavalcanti, F. C. Chen, P. Bouman, and W. H. Li. 2002. Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol. 19: 256–262. Gupta, R. S. 1995. Phylogenetic analysis of the 90 kD heat shock family of protein sequences and an examination of the relationship among animals, plants, and fungi species. Mol. Biol. Evol. 12:1063–1073. Haering, C. H., J. Löwe, A. Hochwagen, and K. Nasmyth. 2002. Molecular architecture of SMC proteins and the yeast cohesin complex. Mol. Cell. 9:773–788. Hagstrom, K. A., V. F. Holmes, N. R. Cozzarelli, and B. J. Meyer. 2002. C. elegans condensin promotes mitotic chromosome architecture, centromere organization, and sister chromatid segregation during mitosis and meiosis. Genes Dev. 16:729–742. Hausdorf, B. 2000. Early evolution of the bilateria. Syst. Biol. 49:130–142. Hedges, S. B., H. Chen, S. Kumar, D. Y. Wang, A. S. Thompson, and H. Watanabe. 2001. A genomic timescale for the origin of eukaryotes. BMC Evol. Biol. 1:4. Hiraga, S. 2000. Dynamic localization of bacterial and plasmid chromosomes. Annu. Rev. Genet. 34:21–59. Hirano, T. 2002. The ABCs of SMC proteins: two-armed ATPases for chromosome condensation, cohesion, and repair. Genes Dev. 16:399–414. Hirano, M., D. E. Anderson, H. P. Erickson, and T. Hirano. 2001. Bimodal activation of SMC ATPase by intra- and intermolecular interactions. EMBO J. 20:3238–3250. Hirano, M., and T. Hirano. 1998. ATP-dependent aggregation of single-stranded DNA by a bacterial SMC homodimer. EMBO J. 17:7139–7148. ———. 2002. Hinge-mediated dimerization of SMC protein is essential for its dynamic interaction with DNA. EMBO J. 21:5733–5744. Hopfner, K. P., L. Craig, G. Moncalian et al (12 co-authors). 2002. The Rad50 zinc-hook is a structure joining Mre11 complexes in DNA recombination and repair. Nature 418:562–566. Hopfner, K. P., A. Karcher, L. Craig, T. T. Woo, J. P. Carney, and J. A. Tainer. 2001. Structural biochemistry and interaction architecture of the DNA double-strand break repair Mre11 nuclease and Rad50-ATPase. Cell 105:473–485. Hopfner, K. P., A. Karcher, D. S. Shin, L. Craig, L. M. Arthur, J. P. Carney, and J. A. Tainer. 2000. Structural biology of Rad50 ATPase: ATP-driven conformational control in DNA double-strand break repair and the ABC-ATPase superfamily. Cell 101:789–800. Hughes, A. L., and M. Yeager. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu. Rev. Genet. 32:415–435. Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96:3801–3806. Jessberger, R., B. Riwar, H. Baechtold, and A. T. Akhmedov. 1996. SMC proteins constitute two subunits of the mammalian recombination complex RC-1. EMBO J. 15:4061–4068. Johnson-Schlitz, D. M., J. Ducau, C. Flores, and W. R. Engels. 2003. Evidence of DNA repair defects in mre11 and rad50 mutants. A. Dros. Res. Conf. 44:323B. Jones, S., and J. Sgouros. 2001. The cohesin complex: sequence homologies, interaction networks and shared motifs. Genome Biol. 2:RESEARCH0009. Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962–968. Kanaya, S., Y. Yamada, M. Kinouchi, Y. Kudo, and T. Ikemura. 2001. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J. Mol. Evol. 53:290–298. Kim, S. T., B. Xu, and M. B. Kastan. 2002. Involvement of the cohesin protein, Smc1, in Atm-dependent and independent responses to DNA damage. Genes Dev. 16:560–570. Kimura, K., and T. Hirano. 1997. ATP-dependent positive supercoiling of DNA by 13S condensin: a biochemical implication for chromosome condensation. Cell 90:625–634. Kimura, K., V. V. Rybenkov, N. J. Crisona, T. Hirano, and N. R. Cozzarelli. 1999. 13S condensin actively reconfigures DNA by introducing global positive writhe: implications for chromosome condensation. Cell 98:239–248. Klein, F., P. Mahr, M. Galova, S. B. Buonomo, C. Michaelis, K. Nairz, and K. Nasmyth. 1999. A central role for cohesins in sister chromatid cohesion, formation of axial elements, and recombination during yeast meiosis. Cell 98:91–103. Koski, L. B., R. A. Morton, and G. B. Golding. 2001. Codon bias and base composition are poor indicators of horizontally transferred genes. Mol. Biol. Evol. 18:404–412. Kuhner, M. K., and J. Felsenstein. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459–468. Lang, B. F., C. O’Kelly, T. Nerad, M. W. Gray, and G. Burger. 2002. The closest unicellular relatives of animals. Curr. Biol. 12:1773–1778. Lehmann, A. R., M. Walicka, D. J. Griffiths, J. M. Murray, F. Z. Watts, S. McCready, and A. M. Carr. 1995. The rad18 gene of Schizosaccharomyces pombe defines a new subgroup of the SMC superfamily involved in DNA repair. Mol. Cell. Biol. 15:7067–7080. Li, W.-H. 1997. Molecular Evolution. Sinauer Associates, Sunderland, Mass. Li, W. H., T. Gojobori, and M. Nei. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239. 346 Cobbe and Heck Lieb, J. D., M. R. Albrecht, P. T. Chuang, and B. J. Meyer. 1998. MIX-1: an essential component of the C. elegans mitotic machinery executes X chromosome dosage compensation. Cell 92:265–277. Lindow, J. C., R. A. Britton, and A. D. Grossman. 2002. Structural maintenance of chromosomes protein of Bacillus subtilis affects supercoiling in vivo. J. Bacteriol. 184:5317–5322. Liu, C. C., J. McElver, I. Tzafrir, R. Joosen, P. Wittich, D. Patton, A. A. Van Lammeren, and D. Meinke. 2002. Condensin and cohesin knockouts in Arabidopsis exhibit a titan seed phenotype. Plant J. 29:405–415. Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496–508. Losada, A., and T. Hirano. 2001. Intermolecular DNA interactions stimulated by the cohesin complex in vitro. Implications for sister chromatid cohesion. Curr. Biol. 11:268–272. Löwe, J., S. C. Cordell, and F. van den Ent. 2001. Crystal structure of the SMC head domain: an ABC ATPase with 900 residues antiparallel coiled-coil inserted. J. Mol. Biol. 306:25–35. Lyons-Weiler, J., and G. A. Hoelzer. 1997. Escaping from the Felsenstein zone by detecting long branches in phylogenetic data. Mol. Phylogenet. Evol. 8:375–384. Melby, T. E., C. N. Ciampaglio, G. Briscoe, and H. P. Erickson. 1998. The symmetrical structure of structural maintenance of chromosomes (SMC) and MukB proteins: long, antiparallel coiled coils, folded at a flexible hinge. J. Cell Biol. 142:1595– 1604. Mengiste, T., E. Revenkova, N. Bechtold, and J. Paszkowski. 1999. An SMC-like protein is required for efficient homologous recombination in Arabidopsis. EMBO J. 18:4505– 4512. Messier, W., and C. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151–154. Michaelis, C., R. Ciosk, and K. Nasmyth. 1997. Cohesins: chromosomal proteins that prevent premature separation of sister chromatids. Cell 91:35–45. Montgomery, E. A., B. Charlesworth, and C. H. Langley. 1987. A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet. Res. 49:31–41. Morgenstern, B., K. Frech, A. Dress, and T. Werner. 1998. DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14:290–294. Mushegian, A. R., J. R. Garey, J. Martin, and L. X. Liu. 1998. Large-scale taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes. Genome Res. 8:590–598. Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms. Proc. Natl. Acad. Sci. USA 98:2497–2502. Niki, H., R. Imamura, M. Kitaoka, K. Yamanaka, T. Ogura, and S. Hiraga. 1992. E. coli MukB protein involved in chromosome partition forms a homodimer with a rod-andhinge structure having DNA binding and ATP/GTP binding activities. EMBO J. 11:5101–5109. Pál, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931. Revenkova, E., M. Eijpe, C. Heyting, B. Gross, and R. Jessberger. 2001. Novel meiosis-specific isoform of mammalian SMC1. Mol. Cell Biol. 21:6984–6998. Saitoh, N., I. Goldberg, and W. C. Earnshaw. 1995. The SMC proteins and the coming of age of the chromosome scaffold hypothesis. BioEssays 17:759–766. Saitoh, N., I. G. Goldberg, E. R. Wood, and W. C. Earnshaw. 1994. ScII: an abundant chromosome scaffold protein is a member of a family of putative ATPases with an unusual predicted tertiary structure. J. Cell Biol. 127:303–318. Saka, Y., T. Sutani, Y. Yamashita, S. Saitoh, M. Takeuchi, Y. Nakaseko, and M. Yanagida. 1994. Fission yeast cut3 and cut14, members of a ubiquitous protein family, are required for chromosome condensation and segregation in mitosis. EMBO J. 13:4938–4952. Sakai, A., K. Hizume, T. Sutani, K. Takeyasu, and M. Yanagida. 2003. Condensin but not cohesin SMC heterodimer induces DNA reannealing through protein-protein assembly. EMBO J. 22:2764–2775. Schmid, K. J., and D. Tautz. 1997. A screen for fast evolving genes from Drosophila. Proc. Natl. Acad. Sci. USA 94:9746–9750. Sharp, P. M., and W. H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398–402. Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. ‘‘Silent’’ sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716. Siddiqui, N. U., P. E. Stronghill, R. E. Dengler, C. A. Hasenkampf, and C. D. Riggs. 2003. Mutations in Arabidopsis condensin genes disrupt embryogenesis, meristem organization and segregation of homologous chromosomes during meiosis. Development 130:3283–3295. Sjögren, C., and K. Nasmyth. 2001. Sister chromatid cohesion is required for postreplicative double-strand break repair in Saccharomyces cerevisiae. Curr. Biol. 11:991–995. Soppa, J. 2001. Prokaryotic structural maintenance of chromosomes (SMC) proteins: distribution, phylogeny, and comparison with MukBs and additional prokaryotic and eukaryotic coiled-coil proteins. Gene 278:253–264. Soppa, J., K. Kobayashi, M. F. Noirot-Gros, D. Oesterhelt, S. D. Ehrlich, E. Dervyn, N. Ogasawara, and S. Moriya. 2002. Discovery of two novel families of proteins that are proposed to interact with prokaryotic SMC proteins, and characterization of the Bacillus subtilis family members ScpA and ScpB. Mol. Microbiol. 45:59–71. Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18:132–143. Steen, R. L., F. Cubizolles, K. Le Guellec, and P. Collas. 2000. A kinase-anchoring protein (AKAP)95 recruits human chromosome-associated protein (hCAP)-D2/Eg7 for chromosome condensation in mitotic extract. J. Cell Biol. 149:531–536. Steffensen, S., P. A. Coelho, N. Cobbe, S. Vass, M. Costa, B. Hassan, S. N. Prokopenko, H. Bellen, M. M. Heck, and C. E. Sunkel. 2001. A role for Drosophila SMC4 in the resolution of sister chromatids in mitosis. Curr. Biol. 11:295–307. Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969. Strunnikov, A. V., E. Hogan, and D. Koshland. 1995. SMC2, a Saccharomyces cerevisiae gene essential for chromosome segregation and condensation, defines a subgroup within the SMC family. Genes Dev. 9:587–599. Strunnikov, A. V., V. L. Larionov, and D. Koshland. 1993. SMC1: an essential yeast gene encoding a putative head-rodtail protein is required for nuclear division and defines a new ubiquitous protein family. J. Cell Biol. 123:1635–1648. Sutani, T., and M. Yanagida. 1997. DNA renaturation activity of the SMC complex implicated in chromosome condensation. Nature 388:798–801. Phylogenetic Analysis of SMC Proteins 347 Suzuki, Y., G. V. Glazko, and M. Nei. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA 99:16138–16143. Swofford, D. L. 1999. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0. Sinauer Associates, Sunderland, Mass. Tateno, Y., N. Takezaki, and M. Nei. 1994. Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximumparsimony methods when substitution rate varies with site. Mol. Biol. Evol. 11:261–277. Taylor, E. M., J. S. Moghraby, J. H. Lees, B. Smit, P. B. Moens, and A. R. Lehmann. 2001. Characterization of a novel human SMC heterodimer homologous to the Schizosaccharomyces pombe Rad18/Spr18 complex. Mol. Biol. Cell. 12:1583–1594. Thatcher, T. H., and M. A. Gorovsky. 1994. Phylogenetic analysis of the core histones H2A, H2B, H3, and H4. Nucleic Acids Res. 22:174–179. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680. Van de Peer, Y., S. L. Baldauf, W. F. Doolittle, and A. Meyer. 2000. An updated and comprehensive rRNA phylogeny of (crown) eukaryotes based on rate-calibrated evolutionary distances. J. Mol. Evol. 51:565–576. van den Ent, F., A. Lockhart, J. Kendrick-Jones, and J. Löwe. 1999. Crystal structure of the N-terminal domain of MukB: a protein involved in chromosome partitioning. Struct. Fold. Des. 7:1181–1187. Verkade, H. M., S. J. Bugg, H. D. Lindsay, A. M. Carr, and M. J. O’Connell. 1999. Rad18 is required for DNA repair and checkpoint responses in fission yeast. Mol. Biol. Cell. 10:2905–2918. Veuthey, A. L., and G. Bittar. 1998. Phylogenetic relationships of fungi, plantae, and animalia inferred from homologous comparison of ribosomal proteins. J. Mol. Evol. 47:81–92. Wang, D. Y., S. Kumar, and S. B. Hedges. 1999. Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc. R. Soc. Lond. B Biol. Sci. 266:163–171. Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18: 691–699. Wolf, Y. I., I. B. Rogozin, N. V. Grishin, R. L. Tatusov, and E. V. Koonin. 2001. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1:8. Wright, F. 1990. The effective number of codons used in a gene. Gene 87:23–29. Yamazoe, M., T. Onogi, Y. Sunako, H. Niki, K. Yamanaka, T. Ichimura, and S. Hiraga. 1999. Complex formation of MukB, MukE and MukF proteins involved in chromosome partitioning in Escherichia coli. EMBO J. 18:5873–5884. Yang, J., Z. Gu, and W. H. Li. 2003. Rate of protein evolution versus fitness effect of gene deletion. Mol. Biol. Evol. 20:772–724. Yazdi, P. T., Y. Wang, S. Zhao, N. Patel, E. Y. Lee, and J. Qin. 2002. SMC1 is a downstream effector in the ATM/NBS1 branch of the human S-phase checkpoint. Genes Dev. 16: 571–582. Yoshimura, S. H., K. Hizume, A. Murakami, T. Sutani, K. Takeyasu, and M. Yanagida. 2002. Condensin architecture and interaction with DNA: regulatory non-SMC subunits bind to the head of SMC heterodimer. Curr Biol. 12:508–513. Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51:588–598. Michele Vendruscolo, Associate Editor Accepted September 29, 2003