Download The Evolution of SMC Proteins: Phylogenetic Analysis and Structural

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Endomembrane system wikipedia , lookup

Proteasome wikipedia , lookup

Protein phosphorylation wikipedia , lookup

Magnesium transporter wikipedia , lookup

Signal transduction wikipedia , lookup

Bacterial microcompartment wikipedia , lookup

Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup

SR protein wikipedia , lookup

Protein wikipedia , lookup

Protein moonlighting wikipedia , lookup

Cyclol wikipedia , lookup

Protein–protein interaction wikipedia , lookup

List of types of proteins wikipedia , lookup

Intrinsically disordered proteins wikipedia , lookup

Proteolysis wikipedia , lookup

Transcript
The Evolution of SMC Proteins: Phylogenetic Analysis and
Structural Implications
Neville Cobbe and Margarete M. S. Heck
Wellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, United Kingdom
The SMC proteins are found in nearly all living organisms examined, where they play crucial roles in mitotic
chromosome dynamics, regulation of gene expression, and DNA repair. We have explored the phylogenetic relationships
of SMC proteins from prokaryotes and eukaryotes, as well as their relationship to similar ABC ATPases, using
maximum-likelihood analyses. We have also investigated the coevolution of different domains of eukaryotic SMC
proteins and attempted to account for the evolutionary patterns we have observed in terms of available structural data.
Based on our analyses, we propose that each of the six eukaryotic SMC subfamilies originated through a series of ancient
gene duplication events, with the condensins evolving more rapidly than the cohesins. In addition, we show that the
SMC5 and SMC6 subfamily members have evolved comparatively rapidly and suggest that these proteins may perform
redundant functions in higher eukaryotes. Finally, we propose a possible structure for the SMC5/SMC6 heterodimer
based on patterns of coevolution.
Introduction
DNA is compacted in length by a factor of up to
10,000 to form a metaphase chromosome in a typical
metazoan cell. How this remarkable feat of packaging is
achieved and how the chromosomes are then faithfully
segregated to the daughter cells have been fundamental
questions for over 100 years, ever since Walther Fleming
first described the dynamic behavior of chromosomes
during cell division. Within the past decade, a convergence
of data from genetic and biochemical approaches has
identified the SMC (structural maintenance of chromosomes) proteins as key players in both chromosome condensation and segregation (Cobbe and Heck 2000). These
proteins are a highly conserved and ubiquitous family,
found in all eukaryotes for which sufficient sequence data
are available and in most of the currently sequenced prokaryotes. The function of these proteins in different aspects
of chromosomal behavior is also conserved in these
organisms, where they play vital roles in the packaging,
repair, and segregation of the genetic material.
No more than one smc gene is found in each of the
prokaryotes examined, in which the protein apparently
forms homodimers, whereas six paralogs are found in
eukaryotes, in which they form at least three types of
heterodimers. These heterodimers include the SMC1 and
SMC3 proteins (part of the ‘‘cohesin’’ complex), the
SMC2 and SMC4 proteins (part of the ‘‘condensin’’
complex), and the SMC5/SMC6 heterodimer, which forms
part of a complex involved in DNA repair (Hirano 2002).
Although no canonical SMC family members have been
found in the enterobacteriaceae, similar phenotypes to smc
mutants are displayed by mutations affecting mukB in
Escherichia coli (Niki et al. 1992). This gene encodes
a protein with a structure that is remarkably similar to that
of SMC proteins (Melby et al. 1998), despite limited
Key words: SMC, phylogeny, condensin, cohesin, ABC ATPases,
maximum likelihood.
E-mail: [email protected].
Mol. Biol. Evol. 21(2):332–347. 2004
DOI: 10.1093/molbev/msh023
Advance Access publication December 5, 2003
Molecular Biology and Evolution vol. 21 no. 2
Ó Society for Molecular Biology and Evolution 2004; all rights reserved.
332
sequence homology. In addition, although the crenarchaeota do not appear to have any SMC or MukB orthologs,
they do contain the related Rad50 proteins. Furthermore,
although no SMC, MukB, or Rad50 orthologs are apparent
in the genome of Rickettsia, it does nonetheless have
a related RecN protein. In fact, only four genera for which
a complete genome sequence is currently available
(Helicobacter, Buchnera, Chlamydia, and Chlamydophila)
do not appear to have either an SMC or a related protein.
Thus, it appears that SMC proteins have an extremely
ancient origin, reflecting their fundamental role in
chromosome dynamics.
As shown in figure 1A, each SMC protein is characterized by a large globular domain at each terminus and
a third globular domain forming a flexible hinge near the
middle of the molecule, separated by extended coiled-coil
structures (Saitoh et al. 1994; Melby et al. 1998) with short
gaps in the coiled-coils at various positions (Beasley et al.
2002). Although MukB and Rad50 have also been shown
to possess hinge regions (Melby et al. 1998; Anderson et
al. 2001; Hopfner et al. 2001), the large globular hinge
domain of SMC proteins is so far unique in both its
sequence and structure (Haering et al. 2002). Recent work
has shown that the hinge domain is the principal site at
which these molecules dimerize (Haering et al. 2002;
Hopfner et al. 2002). The terminal head domains are
required for ATP hydrolysis and have been shown to be
the region where non-SMC subunits of the condensin and
cohesin complexes bind (Anderson et al. 2002; Yoshimura
et al. 2002), similar to the binding of MukE and MukF to
the head of MukB (Yamazoe et al. 1999) or the binding of
Mre11 to Rad50 (Anderson et al. 2001; Hopfner et al.
2001). Although ATP is not required for DNA binding
(Kimura and Hirano 1997; Hirano and Hirano 1998;
Kimura et al. 1999), it appears necessary for preferential
binding to positively supercoiled substrates (Kimura et al.
1999). Likewise, ATP binding (but not hydrolysis) is also
required for the enhanced aggregation of the B. subtilis
Smc with ssDNA (Hirano and Hirano 1998).
Among the most conserved motifs in all SMC
proteins are the N-terminal Walker A (P-loop) and
C-terminal Walker B (DA box) ATPase motifs (Saitoh,
Phylogenetic Analysis of SMC Proteins 333
FIG. 1.—(A) General structure of SMC proteins, showing the five
distinct domains. (B) Alternative models of SMC heterodimer structure.
Goldberg, and Earnshaw 1995), which are also conserved
in MukB, Rad50, and RecN. Although both motifs have
been shown to be essential for ATP binding and hydrolysis
(Hirano et al. 2001), only the N-terminal globular domain
directly binds ATP (Akhmedov et al. 1998). This ATPase
activity appears to be required for the full function of
SMC-containing complexes, as shown by mutagenesis of
the ATP-binding domain (Chuang, Albertson, and Meyer
1994; Verkade et al. 1999; Fousteri and Lehmann 2000;
Hirano et al. 2001) or the use of nonhydrolysable ATP
analogs (Kimura and Hirano 1997). Based on the identification of these motifs at the extreme ends of SMC
molecules, it was suggested that the functional ATPase
domain would form by uniting these regions (Saitoh et al.
1994). This could occur in either of two ways, as
illustrated schematically in figure 1B. Either the molecule
could bend at the hinge to bring the terminal domains
together (the intramolecular model) or it could do so by
dimerizing as an antiparallel coiled-coil, bringing the
N-terminal domain of one subunit next to the C-terminal
domain of the other (the intermolecular model). Indeed,
the formation of such an ATPase has been confirmed
by a covalent fusion of the N-terminal and C-terminal
domains alone of the single SMC in Thermotoga
maritima, which is capable of ATP-dependent DNA aggregation (Löwe, Cordell, and van den Ent 2001). A
combination of in vitro data from electron microscopy,
crystal structures, and several elegant biochemical experiments now indicates that both Rad50 and SMC molecules
bring these terminal domains together by bending at the
hinge and forming intramolecular antiparallel coiled-coils
(Hirano et al. 2001; Haering et al. 2002; Hirano and
Hirano 2002; Hopfner et al. 2002). In addition, visualization of the whole condensin complex and its SMC subunits
alone by atomic force microscopy has suggested that the
SMC heterodimer may fold back along its coiled-coils,
allowing the hinge to interact with the terminal ATPase
domains (Yoshimura et al. 2002). However, the functional
significance of this conformation remains to be explored.
SMC proteins share another conserved motif (LSGG)
upstream of the Walker B site, referred to as the ‘‘signature
motif’’ of the ABC (ATP-binding cassette) ATPase family,
which also includes the structurally similar RecN, Rad50,
and MukB proteins (Löwe, Cordell, and van den Ent 2001).
This motif has been shown to play a pivotal role in the
dimerization of Rad50 head domains (Hopfner et al. 2000)
and probably serves a similar function in other ABC
ATPases such as the SMC, MukB, and RecN proteins.
Given the striking level of conservation within SMC
proteins and other ABC ATPases with extended coiledcoils, the origin of these proteins and their relationship to one
another have been a subject of considerable interest. A
number of different phylogenies for SMC proteins have
been suggested to date that differ in their method of
construction and resultant topology (Melby et al. 1998;
Cobbe and Heck 2000; Jones and Sgouros 2001; Soppa
2001; Beasley et al. 2002), so the precise relationships
between these and related proteins have been unclear. To
resolve this issue, we constructed phylogenetic trees
containing all six SMC subfamilies, as well as the related
MukB, Rad50, and RecN proteins, from a large data set of
protein sequences using the combined criteria of minimum
evolution and maximum likelihood. In addition, we used the
data generated in constructing these trees to test hypotheses
about the structure, function, and evolution of these proteins.
Materials and Methods
Phylogenetic Analysis of SMC Proteins
Sequences were assembled into subfamilies initially
based on their BLAST (Altschul et al. 1997) scores, and the
whole protein sequences were aligned with ClustalW
(Thompson, Higgins, and Gibson 1994) and DIALIGN
(Morgenstern et al. 1998). Alignments were then inspected
manually with the aid of Mview (Brown, Leroy, and
Sander 1998) and adjusted where necessary. Trees
containing large numbers of OTUs were constructed using
PAL (Drummond and Strimmer 2001) and Weighbor
(Bruno, Socci, and Halpern 2000), and smaller maximumlikelihood trees of various subgroups were then calculated
by quartet puzzling using the Tree-Puzzle program
(Strimmer and von Haeseler 1996). These subgroups
included members of the eukaryotic and prokaryotic SMC
subfamilies shown in figure 2, subfamilies that branch
together or closely related sequences within a subfamily
(e.g., the archaeal, proteobacterial, or firmicute SMCs in the
eubacterial subfamily). In the case of prokaryotic sequences
such as archaeal or firmicute SMCs, the branches in these
smaller trees were most easily resolved by removing coiledcoil sequences and calculating distances based on the
conserved globular domains.
Maximum parsimony was used to confirm the topology of smaller trees containing closely related sequences
within SMC subfamilies, using the tree-searching options
in PAUP* (Swofford 1999). However, this method was
not used to reconstruct the topology of the whole tree, as it
can become inconsistent with unequal rates of evolution
(Felsenstein 1978; Kuhner and Felsenstein 1994; Tateno,
Takezaki, and Nei 1994), which are observed in the
different SMC subfamilies. The topology of the eukaryotic
334 Cobbe and Heck
FIG. 2.—Phylogenetic tree of SMC proteins. The scale bar denotes the number of accepted substitutions in Whelan-Goldman distance units.
Phylogenetic Analysis of SMC Proteins 335
SMCs differed both within subfamilies, when different
conserved domains of the protein were used to calculate
trees, and between subfamilies, even when the whole
protein sequence was used (see Supplementary Material
online). Horizontal gene transfer is a comparatively rare
event in multicellular eukaryotes, and so it seems more
likely that the conflict between subfamilies is the result of
additional evolutionary constraints imposed by functional
interactions between members of the subfamilies. Consequently, the phylogenetic signal leading to the most
probable topology was recovered by calculating the
maximum-likelihood tree from concatenated alignments
of all the SMC subfamily members for which sequence
was available. This widely used approach (Gouy and Li
1989; Baldauf et al. 2000; Hausdorf 2000; Brown et al.
2001; Nei, Xu, and Glazko 2001; Springer et al. 2001;
Wolf et al. 2001; Brochier et al. 2002; Lang et al. 2002;
Suzuki, Glazko, and Nei 2002) is employed here on the
assumption that each of the individual eukaryotic SMC
sequences shares the same vertical pattern of inheritance
and therefore should follow the same branching order. As
these trees using concatenated alignments had bootstrap
values of 100% at all nodes (online supplementary figure)
and in turn led to the best alignments when used as guide
trees, it seems likely that the most probable topology is
recovered this way. For those taxa in which gene
duplications have occurred within a subfamily, it was
found that the separate use of either paralog gave rise to
the same topology, after testing all possible combinations
of paralogs within subfamilies.
In assembling the complete tree, the phylogenetic
information of as many taxa as possible was used to break
up long branches (Graybeal 1998; Zwickl and Hillis 2002).
Additionally, tests were performed using the less divergent
members of subfamilies to reconstruct trees in any cases
where long-branch attraction (Lyons-Weiler and Hoelzer
1997) was suspected, by confirming that the same
topology existed between other subfamily members. Any
ambiguously assigned branches (typically, with a bootstrap
probability less that 60%) were compared with all remaining competing tree topologies in which the suspect
branch was placed elsewhere, to establish which position
contributed least to the total length. The relationship
between the subfamilies was established both by calculating maximum-likelihood trees with the least divergent
representatives of the different subgroups and by determining the topology yielding the minimum-evolution
tree when all sequences were included. Finally, the branch
lengths of the overall tree were calculated using the
Whelan-Goldman model of amino acid substitution
(Whelan and Goldman 2001), with rate heterogeneity
among sites modeled according to a gamma distribution
with 16 rate categories. The gamma distribution shape
parameter a was estimated from the data set using the
Tree-Puzzle program (Strimmer and von Haeseler 1996).
The phylogenetic tree of coiled-coil containing ABC
ATPases was constructed using essentially the same
methods as the SMC tree, in which smaller maximumlikelihood trees of each subfamily were calculated using
Tree-Puzzle (Strimmer and von Haeseler 1996), and the
relationship between the subfamilies was established both
by calculating maximum-likelihood trees with the least
divergent sequences and by determining the topology
yielding the minimum-evolution tree when all sequences
were included. Similarly, the final tree of coiled-coil–
containing ABC ATPases was constructed by Tree-Puzzle,
using the Whelan-Goldman model of amino acid substitution with a gamma distribution. However, branch
lengths for this tree were calculated using distances estimated from the conserved globular domains of these
proteins, excluding coiled-coils.
Examples of maximum-likelihood trees used to
construct the final trees presented in figure 2 and figure 5
are provided as Supplementary Material online, together
with percentage bootstrap support values calculated using
PAL (Drummond and Strimmer 2001) and Tree-Puzzle
(Strimmer and von Haeseler 1996).
Analysis of SMC Codon Usage
The general pattern of codon usage in the genome of
different bacteria was compiled using data from the Codon
Usage Database (http://www.kazusa.or.jp/codon/ ). Grantham’s D statistic was calculated using the Wisconsin
Package programs CodonFrequency and Correspond
version 10.3 (Accelrys Inc) in the following species:
Pseudomonas aeruginosa, Mycobacterium tuberculosis,
Caulobacter crescentus, Deinococcus radiodurans, Treponema pallidum, Mesorhizobium loti, Thermotoga maritima, Borrelia burgdorferi, Listeria monocytogenes,
Neisseria meningitidis, Clostridium acetobutylicum, Bacillus subtilis, Agrobacterium tumefaciens, Staphylococcus
aureus, Xylella fastidiosa, Streptococcus pneumoniae,
Aquifex aeolicus, Prochlorococcus marinus, Synechocystis, Nostoc, Thermosynechococcus elongatus, Synechococcus, Archaeoglobus fulgidus, Halobacterium, Pyrococcus
furiosus, Methanosarcina acetivorans, and Methanococcus jannaschii. The codon bias index and the effective
number of codons were calculated using the CodonW
program (http://www.molbiol.ox.ac.uk/cu/ ).
The bias in GC content at silent third codon positions
was calculated by developing the bias(GC3) statistic. To
correct for biases in GC content caused by a biased amino
acid composition in the encoded protein, the GC content at
third nucleotide positions was calculated using relative
frequencies of codons. The statistic used is defined as
follows:
GC3 ¼
21 X
n
1 X
ci
21 j¼1 i¼1 ÿj
where n is the number of alternative codons ending in
either a G or a C for the j th group of synonymous codons,
c is the frequency with which such a codon occurs in the
coding sequence, and ÿj is the sum of the frequencies for
each alternative codon in the j th synonymous group,
regardless of GC content. To correct for changes in composition resulting from mutational bias at the same locus,
the GC content of the remaining first and second codon
positions was calculated (in the same way as for GC3), as
well as that of any introns. The arithmetic mean of these
three values was taken to be the expected GC content for
the locus, and the bias(GC3) statistic for a gene was then
calculated as
336 Cobbe and Heck
observedðGC3 Þ expectedðGC3 Þ
biasðGC3 Þ ¼ 1
1 expectedðGC3 Þ
where the species scaling factor 1 is the average AT
content of the respective genome (57.3% in the case of
D. melanogaster [Adams et al. 2000]), divided by the
average GC content.
The pairwise number of synonymous and nonsynonymous substitutions per site between aligned SMC and
Rad50 sequences from D. melanogaster and D. pseudoobscura were estimated using the Diverge program in the
Wisconsin package. The D. pseudoobscura nucleic acid
sequences were obtained from the Drosophila Genome
Project at the Baylor College of Medicine (http://hgsc.
bcm.tmc.edu/projects/Drosophila/ )
SMC Evolutionary Rate Correlation
The divergence of each eukaryotic SMC protein
region was calculated by summing the maximum-likelihood branch lengths as far as the presumed root where the
eukaryotic and prokaryotic branches meet. The extent of
correlation between the estimated evolutionary rate in different parts of the proteins was then calculated using the
Pearson product-moment correlation coefficient for each
pair of SMC protein regions. Similar trends were also
found using the Spearman rank correlation coefficient.
Results
Phylogenetic Analysis of SMC Proteins
The final maximum-likelihood tree of 148 SMC sequences is presented in figure 2, with only one representative of each prokaryotic genus displayed. Among the
eukaryotic SMC proteins, it was consistently observed that
SMC1 and SMC4 branch together (these are the larger
SMC subunits of the cohesin and condensin SMC
heterodimers, respectively). Similarly, the smaller subunits, SMC3 and SMC2, appear to originate from the same
gene duplication event, and SMC5 and SMC6 (which
heterodimerize as part of a DNA repair complex [Taylor
et al. 2001; Fujioka et al. 2002]) also branch together. All
eukaryotes whose genomes have been substantially
sequenced, from microsporidia to humans, contain these
six subfamilies of SMC proteins, suggesting that the
duplication events giving rise to each subfamily must have
occurred either before or very soon after the origin of
eukaryotes.
It is also noteworthy that the rate of accepted amino
acid substitution varies among different eukaryotic taxa
within each subfamily. In particular, a number of SMC
proteins in Caenorhabditis elegans seem to be exceptionally divergent. However, this can be explained by the
observation that the more rapidly evolving SMCs are
either those that have arisen from a gene duplication (such
as SMC4a and DPY-27) or SMC proteins that form
heterodimers with both of the duplicated SMCs (such as
MIX-1) (Hagstrom et al. 2002). This is consistent with
a correlation between the large number of duplicated genes
in the C. elegans genome (Coghlan and Wolfe 2002; Gu
et al. 2002) and its rapid evolutionary rate (Aguinaldo et al.
1997; Mushegian et al. 1998). Conversely, the rate of
evolution appears relatively slow for the plant and vertebrate SMCs, with the exception of the SMC1b proteins,
which are probably less evolutionarily constrained, as they
appear to be required only during meiosis in males
(Revenkova et al. 2001).
Examining the tree, it appears that the branch lengths
between more closely related organisms (such as the
vertebrates or diptera) are longer for condensin SMC
proteins than for the corresponding cohesins, suggesting
either that condensins may be more ancient or may evolve
more rapidly. However, the SMC proteins of both
complexes appear to be similarly essential for viability in
a range of organisms (Strunnikov, Larionov, and Koshland
1993; Saka et al. 1994; Strunnikov, Hogan, and Koshland
1995; Michaelis, Ciosk, and Nasmyth 1997; Lieb et al.
1998; Steffensen et al. 2001; Hagstrom et al. 2002; Liu
et al. 2002; Siddiqui et al. 2003), so one might expect their
rates of evolution to be broadly similar. Indeed, a significant rate difference is only evident between the more
closely-related organisms for which all SMC protein sequences are available, whereas the mean pairwise distances for condensin and cohesin SMCs in more divergent
groups seem to approach similar values (fig. 3A). Furthermore, the overall number of substitutions between condensin or cohesin SMC proteins and their presumed
ancestor appear to be essentially the same (fig. 3B). It is
therefore possible that the trend shown in figure 3A may
reflect slightly more rapid evolution in condensin SMC
proteins (detectable only by examining the smallest
pairwise distances, in which the number of accepted
substitutions and speciation events are minimized).
Although condensin SMCs appear to show a higher
substitution rate among closely related species than
cohesin SMCs, the mean distances within subfamilies of
these proteins (averaged across all condensin and cohesin
SMCs for each pairwise comparisons between different
organisms) are about half (0.54 6 0.134) the corresponding distances between SMC5 and SMC6 proteins. Similarly, it can be seen in figure 3B that the mean distance of
SMC5 and SMC6 to the presumed root with prokaryotes is
approximately twice as great as that of other eukaryotic
SMC proteins. As discussed later, this elevated sequence
divergence may reflect a reduced selection pressure on
these proteins compared with the other SMCs, which are
required during every cell division, and is thus consistent
with the suggested general role for these proteins,
specifically in response to DNA damage (Lehmann et al.
1995; Mengiste et al. 1999; Verkade et al. 1999; Fousteri
and Lehmann 2000).
Analysis of Prokaryotic SMC Codon Usage
In general, the phylogeny of prokaryotic SMC
proteins agrees well with their accepted taxonomic
distribution, although the unusually short SMC protein in
Borrelia burgdorferi results in this protein clustering with
other outgroup species, away from other spirochaetes such
as Treponema. However, as previously suggested (Soppa
2001), it would appear from the tree shown in figure 2 that
at least two cases of horizontal gene transfer (HGT) have
Phylogenetic Analysis of SMC Proteins 337
FIG. 3.—Divergence of SMC protein subfamilies. (A) Relative
distances (mean pairwise distance within each subfamily divided by the
mean distance of all subfamilies) of condensin and cohesin SMC proteins
within different taxonomic groups. The difference in relative distances
between condensin and cohesin SMC proteins becomes most obvious
with progressively more closely related organisms, as shown from left to
right. (B) Evolutionary distance between members of different eukaryotic
SMC subfamilies and the presumed root with prokaryotic SMC proteins.
The mean number of accepted substitutions (in Whelan-Goldman distance
units) and standard deviation are displayed for organisms with substantially sequenced genomes in which all SMC protein sequences are
available.
occurred, resulting in the SMC proteins of the cyanobacteria (Prochlorococcus marinus, Thermosynechococcus
elongatus, Nostoc, Synechocystis, and Synechococcus) as
well as the eubacterial species Aquifex aeolicus, most
closely resembling those of the archaea. On the other hand,
phylogenetic trees based on concatenated alignments of
protein or rRNA sequences (Wolf et al. 2001; Brochier
et al. 2002) firmly place these species among the remaining
eubacteria, as do our own trees based on sequences for 16S
rRNA or proteins such as EF-Tu and RecA (data not
shown).
To investigate this HGT phenomenon further, codon
usage analysis was conducted to examine whether the
pattern of codon usage in the cyanobacterial SMC genes
more closely resembled that of the archaeal SMCs or of
other cyanobacterial genes. The differences between codon
frequency tables were computed using Grantham’s D
statistic (Grantham et al. 1981). This statistic was
calculated to measure the difference in codon usage
FIG. 4.—Codon usage in SMC genes. (A) Measures of Grantham’s D
statistic in comparisons between codon usage for SMC, EF-Tu, and RecA
in different prokaryotic species and general codon usage patterns in the
same organism. (B) Codon usage in Drosophila SMC genes. Values of
bias(GC3) displayed above have all been halved to facilitate comparison
with trends in the other statistics. (C) The ratio of nonsynonymous (Ka) to
synonymous substitutions (Ks) in SMC genes, based on pairwise
comparisions between codons in D. melanogaster and D. pseudoobscura
sequences.
between the SMC of a particular prokaryote and the
general pattern of codon usage in its genome. In addition,
comparisons were made between genomic codon usage
and the codon usage of recA orthologs or EF-Tu genes,
which are often employed in phylogenetic tree reconstruction because of the apparent rarity with which they
experience HGT (Eisen 1995; Baldauf, Palmer, and
Doolittle 1996; Brendel et al. 1997; Lopez, Forterre, and
Philippe 1999). However, as shown in figure 4A, the
pattern of codon usage for SMC genes in A. aeolicus, the
cyanobacteria, the remaining eubacteria and the archaea
more closely match the general pattern for other loci in the
genome than do EF-Tu and RecA, which have higher
values of D. Therefore, we have found little evidence in
favor of HGT for smc genes from patterns of codon usage.
As the relative level of sequence similarity between
the archaeal and cyanobacterial SMC proteins seems hard
to account for in terms of convergent evolution, it would
appear that their similarity results from their common
ancestry (synapomorphy). This synapomorphy is pre-
338 Cobbe and Heck
sumably associated with an ancient HGT event, allowing
sufficient time for a transferred smc gene to adapt to the
translational apparatus of its recipient genome by means
of appropriate synonymous substitutions. This codon
adaptation could occur without significantly affecting
a gene’s identity at the amino acid level, as the rate of
amino acid replacement is generally much slower than the
rate of synonymous substitution (Li 1997). Indeed, a recent
study (Koski, Morton, and Golding 2001) has shown that
many E. coli genes with normal nucleotide composition
have no apparent orthologs in a related enterobacterium,
suggesting that only relatively recent HGT events may
necessarily reveal themselves by differences in codon
usage. Consequently, it is not clear from the codon usage
data in which direction this transfer event occurred,
whether from archaea to cyanobacteria or vice versa, and
there is still debate as to which is actually the older group
of prokaryotes (Hedges et al. 2001; Cavalier-Smith 2002).
As the A. aeolicus SMC clusters with different archaeal
species in the tree and its pattern of codon usage more
closely resembles that of the archaea than the cyanobacteria, it would appear that a separate transfer event was
involved in this lineage. This may reflect a general trend in
this species to acquire and retain genes from similarly
thermophilic archaea (Aravind et al. 1998).
However, apart from the cases described above, the
phylogeny of prokaryotic SMC proteins agrees well with
trees constructed from 16S rRNA, EF-Tu, and RecA
sequences. If the complexity hypothesis (Jain, Rivera, and
Lake 1999) holds true that informational genes are less
prone to successful HGT by virtue of the number of gene
products they interact with, then the relative lack of HGT
involving SMCs may reflect a potentially large number of
interactors. So far, only two proteins have been found to
form a complex biochemically with either a bacterial SMC
(Soppa et al. 2002) or MukB (Yamazoe et al. 1999), but it
seems likely that many more may interact either transiently
or indirectly, as suggested by the large number of proteins
that interact genetically with MukB (Hiraga 2000).
Analysis of SMC Codon Usage in Drosophila
The dipteran SMC5 and SMC6 coding sequences
seem to have an exceptionally large number of substitutions, even allowing for the observation that a large
number of genes in Drosophila are known to evolve at an
unusually high rate (Sharp and Li 1989; Schmid and Tautz
1997; Fay, Wyckoff, and Wu 2002). The possibility that
these genes might be nonessential (Jordan et al. 2002;
Yang, Gu, and Li 2003) or required at much lower levels
(Pál, Papp, and Hurst 2001) was therefore investigated by
examining the pattern of codon usage in the Drosophila
genes encoding SMC5 and SMC6, in comparison with the
codon usage bias of the other four SMCs. In Drosophila
and other organisms, there is a strong positive correlation
between gene expression levels and codon usage bias
(Duret and Mouchiroud 1999; Kanaya et al. 2001).
However, it has also been suggested that synonymous
codon usage in Drosophila may be biased to enhance the
accuracy of translation, as the frequency of preferred
codons is significantly higher at conserved amino acid
positions (Akashi 1994). As the number of accepted
nucleotide substitutions may be constrained by strong
codon usage bias, this could explain the apparent
correlation between the rates of synonymous and nonsynonymous substitutions in Drosophila (Comeron and
Kreitman 1998). Therefore, if the higher rates of amino
acid replacement in Drosophila SMC5 and SMC6 result
from reduced selective pressures, this may be reflected by
a lack of bias for optimal codons (consistent with reduced
translational accuracy or efficiency.)
The extent to which the genes encoding SMC
proteins in Drosophila demonstrate directional bias in
codon usage was first investigated by calculating the
codon bias index (CBI) (Bennetzen and Hall 1982). As
shown in figure 4B, the CBI for SMC5 or SMC6 is lower
than the values for cohesin or condensin SMCs. Moreover,
the especially low CBI value for SMC5 is closer to that of
the negatively biased yEst-6 pseudogene (accession
number AF526559) of Drosophila. The remaining SMC
genes all have similar values, with the exception of
Drosophila SMC3 (CAP), which has a comparatively high
level of codon bias. As this gene is found on the X
chromosome, whereas the other SMC genes are all
autosomal, the increased bias and decreased number of
substitutions in Drosophila SMC3 is presumably a reflection of the increased selection pressure on X-linked
genes in the heterogametic sex (Montgomery, Charlesworth, and Langley 1987). To evaluate whether the
reduced bias observed in SMC5 and SMC6 might be
simply the result of lower expression levels of DNA repair
proteins, the CBI for Drosophila Rad50 was also
calculated. As this is another gene specifically involved
in DNA repair (Gorski and Eeken 2002; Engels et al.
2003; Johnson-Schlitz et al. 2003) that also has strong
sequence and structural similarities to SMC proteins, one
might expect it to be expressed similarly to the SMC5 and
SMC6 repair proteins. Interestingly, the CBI for Rad50 is
less than most SMCs, which is consistent with a threefoldlower abundance of rad50 mRNA compared with SMC2
and SMC4 during early Drosophila embryogenesis (http://
genome.med.yale.edu/Lifecycle/ ), when most mitotic
activity takes place. Nevertheless, the CBI for Rad50 is
significantly greater than that of SMC5 and SMC6,
suggesting that these proteins may not be required to the
same extent as Rad50 to repair DNA damage.
Secondly, the effective number of codons (NC)
(Wright 1990) was also calculated. As NC can take values
from 20 (in the cases of extreme bias) to 61 (when all
synonymous codons are used equally), the bias in NC was
evaluated as:
61 NC
41
This statistic ranges from 0, for completely random codon
usage, to 1, if one codon is exclusively used for each
amino acid. These values are also plotted in figure 4B and
show that bias(NC) for SMC5 and particularly for SMC6 is
most similar to that of the yEst-6 pseudogene.
Thirdly, in addition to a correlation between tRNA
copy number and favored codons, it has also been
described that more highly expressed genes in Drosophila
biasðNC Þ ¼
Phylogenetic Analysis of SMC Proteins 339
tend to have a high GC content at silent sites such as the
third nucleotide position of their codons (Shields et al.
1988; Kanaya et al. 2001). On the other hand, this GC
content is often reduced in sequences with weaker
selective constraints, such as pseudogenes (Shields et al.
1988) (although this depends on their age [Echols et al.
2002]). The bias in GC content at silent third codon
positions was evaluated by calculating the bias(GC3)
statistic. As shown in figure 4B, the bias(GC3) for SMC5
and SMC6 is at least 20% smaller than that of most other
SMC genes (but only about 11% less than that of Rad50),
whereas SMC3 shows the highest bias.
Finally, the selection pressures on Drosophila SMC
proteins were examined by estimating the average ratio of
nonsynonymous to synonymous substitutions (Ka/Ks) at
each codon position. This ratio provides a measure of the
intensity of purifying, or negative, selection exerted on
genes, which results in decreased rates of amino acid
replacement and thus Ka/Ks ratios less than one. The
distribution of Ka/Ks varies greatly among different classes
of genes, with values ranging from 0.02 to 0.306 and a mean
ratio of 0.122 for various Drosophila genes (Li 1997) On
the other hand, the rates of synonymous and nonsynonymous substitutions are approximately equal if most amino
acid variation is neutral, as observed in many pseudogenes
(Li, Gojobori, and Nei 1981; Bustamante, Nielsen, and
Hartl 2002), whereas Ka/Ks ratios greater than 1 usually
indicate adaptive or positive selection (Messier and Stewart
1997; Hughes and Yeager 1998). As shown in figure 4C,
the Ka/Ks ratios for SMC2, SMC4, and Rad50 appear to be
close to the previously described mean value in Drosophila
(Li 1997), whereas SMC1 and SMC3 have considerably
lower Ka/Ks ratios. This increased negative selection on
SMC1 and SMC3 is consistent with the differences in
amino acid substitution rates between cohesin and condensin SMCs in closely related organisms, as described
earlier. By contrast, SMC5 and SMC6 display Ka/Ks ratios
approximately twice as great as other SMC proteins. In
particular, the ratio for SMC6 is similar to the highest Ka/Ks
values previously described in Drosophila for chorion
proteins (Li 1997). Based on these combined results, it
would appear that the SMC5 and SMC6 proteins in
Drosophila are under severely reduced selection for
preferred codons in comparison with other SMCs and
experience dramatically decreased negative selection,
which could explain their higher amino acid substitution
rate in terms of fewer selective constraints on their
evolution in general.
Phylogenetic Analysis of SMC Related Proteins
To reveal how closely related SMC proteins are to
similar ABC ATPases, a maximum-likelihood tree was
constructed using sequences from these protein families, as
shown in figure 5. It is clear from this tree that the closest
relatives to the SMC proteins are the archaeal Rad50
proteins, followed by eukaryotic Rad50 and eubacterial
SbcC proteins. Many of these Rad50 superfamily proteins
have the conserved N-terminal FKS (or FRS) motif
(located before the Walker A site), which is present in
most of the SMC proteins. However, this motif is absent in
the SMC5 and SMC6 subfamily members (in which only
the phenylalanine is conserved). The FKS motif is also
absent in the MukB proteins, which appear to possess
universally conserved NWN and FART motifs instead. In
addition, although it was previously suggested that all
SMC and MukB proteins uniquely share a conserved Nterminal QG motif (Melby et al. 1998), this dipeptide was
found to occur at different locations in SMC and MukB
sequences in the alignments generated in this study.
Instead, this motif was found at a conserved position in
SMC and Rad50 proteins, in agreement with previously
published structure-based alignments (van den Ent et al.
1999). Therefore, although it was previously proposed that
MukB should be included among the SMC proteins as
they adopt a similar structure (Melby et al. 1998), the
Rad50 proteins seem to be more closely related at the
sequence level. The high level of sequence conservation
between MukB proteins argues against the notion that they
are a highly divergent group of SMC proteins. Instead, it is
possible that their structural similarity may be partly
caused by convergent evolution, which is consistent with
the similar functions of MukB and SMC proteins in
prokaryotes (Cobbe and Heck 2000).
Correlated Rates of Evolution in Different Regions of
SMC Proteins
Contrary to previous reports (Melby et al. 1998), we
observed that eukaryotic SMC trees reconstructed from
individual conserved hinge or N-terminal and C-terminal
globular domains often differed both from each other and
from phylogenies based on the whole protein sequence.
For this reason, we attempted to dissect the different
selective constraints on the sequence divergence of these
regions by examining how they might coevolve. As interacting proteins tend to evolve at similar rates (Fraser et al.
2002), it was expected that regions of SMC proteins that
interact with each other would also coevolve and show
correlated rates of substitution.
Correlation coefficients for different regions of SMC
proteins were calculated after sorting the maximumlikelihood distances according to the functional complexes
within which different subfamily members are found. The
Pearson correlation coefficient for each pair of SMC protein regions is plotted in figure 6, together with Spearman’s coefficient of rank correlation. As shown in figure
6B, a clear positive correlation was observed between the
evolutionary rates of the b-strands that constitute the
interface of the N-termini and C-termini in the same
molecule (Löwe, Cordell, and, van den Ent 2001). However, no significant positive correlation was found between
the corresponding N-terminal and C-terminal regions of
any SMC and its partner (fig. 6A), as would have been
expected if the heterodimer existed naturally in an antiparallel intermolecular conformation. This strongly suggests that the intramolecular structure (fig. 1B) indicated
by available in vitro data is also the structure found in
vivo. By contrast, a strong positive correlation was observed between the evolutionary rates of the exposed outer
portions of the N-terminal and C-terminal domains in
SMC2 and SMC4 but not in the corresponding regions
340 Cobbe and Heck
FIG. 5.—Phylogenetic tree of SMC proteins and other coiled-coil ABC ATPases. The scale bar denotes the number of accepted substitutions in
Whelan-Goldman distance units.
Phylogenetic Analysis of SMC Proteins 341
FIG. 6.—Correlated rates of evolution in different regions of SMC proteins, calculated using maximum-likelihood branch lengths. Correlation
coefficients associated with postulated intermolecular associations are denoted ‘‘INTER,’’ whereas ‘‘INTRA’’ indicates intramolecular correlations.
The Pearson correlation coefficients are shown above and nonparametric Spearman rank correlations are shown below.
of the same molecule (fig. 6C and D). These stronger
intermolecular correlations are nonetheless consistent with
observed associations between the termini of SMC proteins (Melby et al. 1998; Anderson et al. 2002). Moreover,
a significant correlation at the 1% level was observed for
intermolecular associations between the coiled-coils of
the condensin SMC proteins (fig. 6E ). This elevated intermolecular correlation appears to reflect differences
between the conformation of vertebrate condensin and
cohesin SMC proteins previously observed by rotary
shadowing, whereby the coiled-coils of condensin SMCs
are intimately associated along most of their length but
those of the cohesin complex are well separated (Anderson
et al. 2002).
Interestingly, the SMC5 and SMC6 proteins appear
to show similarly weak associations between their arms
(fig. 6C and E) as SMC1 and SMC3. This suggests that the
SMC proteins involved specifically in DNA repair may
adopt a similarly open conformation to that of the cohesin
SMC proteins (Haering et al. 2002). By contrast, although
a significant intermolecular correlation was not found
consistently for the rate at which the hinge domains of
cohesin SMC proteins evolve, the SMC5 and SMC6
proteins nonetheless showed intermolecular rate correla-
342 Cobbe and Heck
tions greater than 60% at the dimer interface of their hinge
domains. This may possibly indicate that SMC5 and
SMC6 have a stronger hinge association than other SMC
proteins such as SMC1 and SMC3. Although an even
stronger positive correlation can be seen for the dimer
interface of condensin hinge domains using the Pearson
correlation coefficient, this particular result is not supported by nonparametric analyses.
Discussion
The Phylogeny of SMC Proteins
We have investigated the evolutionary radiation of
SMC proteins and determined that the closest relatives of
these proteins are members of the Rad50 superfamily, as
revealed by phylogenetic analysis and the conservation of
particular residues. Apart from the suspected cases of HGT
described in this study, the tree topology described here
generally agrees with the currently accepted taxonomic
distribution of the organisms whose SMC proteins have
been examined. Interestingly, our consensus tree shows
that plant and animal SMC proteins cluster together with
fungi as the outgroup, whereas other recent studies based
on 18S rRNA and a panoply of proteins have reported that
fungi and animals are most closely related to each
other (Baldauf and Palmer 1993; Baldauf et al. 2000; Van
de Peer et al. 2000). However, the resolution of this
trichotomy has been a longstanding controversy, with
different studies suggesting that animals are closer
relatives to either plants or fungi (Gupta 1995; Wang,
Kumar, and Hedges 1999). The tree presented here is more
consistent with previous analyses using ribosomal proteins
(Veuthey and Bittar 1998) and other concatenated sequences (Gouy and Li 1989), which place plants closer
to animals. Alternatively, the phylogeny revealed in the
SMC tree may reflect higher rates of substitution in the
fungal proteins because of reduced constraints in their
interactions with other proteins, as has been suggested in
the case of histones H3 and H4 (Thatcher and Gorovsky
1994). For example, at least one protein (AKAP95) with
no clear orthologs in other animals, fungi, or plants is
thought to interact with SMC proteins in vertebrates
(Collas, Le Guellec, and Tasken 1999; Steen et al. 2000),
which may influence the relatively slow rates of SMC
evolution in these animals. In the future, it will be
interesting to see how the use of SMC sequences, in
combination with larger data sets of similarly conserved
protein sequences (Brown et al. 2001), might contribute
to resolving the relationships among these eukaryotic
kingdoms.
Evolution of Eukaryotic SMC Subfamilies
According to the tree in figure 2, it appears that
a symmetric duplication of genes encoding the larger and
smaller eukaryotic SMCs gave rise to both cohesin and
condensin SMC proteins in all eukaryotes. The relatively
short branch lengths connecting either the roots of the
SMC1/SMC4 or the SMC2/SMC3 lineages to the prokaryotic SMC root also suggest that the first duplication
event, giving rise to the primordial eukaryotic SMC
heterodimer, occurred very early in eukaryotic evolution.
However, it seems that the first SMC heterodimer to arise
in eukaryotes may have had more functions in common
with the condensin heterodimer, based on the assertion that
the primary phenotype in a prokaryotic smc null mutant is
probably a condensation defect (Britton and Grossman
1999) and the observation that DNA reannealing (Sutani
and Yanagida 1997; Hirano and Hirano 1998) and
supercoiling (Kimura and Hirano 1997; Lindow, Britton,
and Grossman 2002) activities are common features of
these proteins but apparently not of cohesins in general
(Jessberger et al. 1996; Losada and Hirano 2001; Sakai
et al. 2003). Given these functional similarities between
condensin and prokaryotic SMCs, it might seem odd that
the primary sequences of SMC2 and SMC4 do not appear
to be more closely related to prokaryotic SMC proteins
than the functionally distinct SMC1 and SMC3 proteins
(fig. 3B). However, if condensin SMCs had evolved
slightly more rapidly since the divergence of eukaryotes,
then this could account for their lack of greater sequence
identity with the functionally related prokaryotic SMC
proteins. Indeed, a higher substitution rate among
condensin SMC proteins than cohesins can be seen in
pairwise comparisons between sequences from more
closely related organisms, such as vertebrate or dipteran
SMCs. This suggests that SMC2 and SMC4 may evolve
more rapidly than the cohesin SMCs but that the difference
in rate is small enough to be masked by mutational
saturation in comparisons between more distantly related
species. If so, it is anticipated that additional SMC protein
sequences from closely related organisms in other phyla
might also reveal more accepted substitutions in pairwise
comparisons between condensin SMC proteins than in
cohesins. One might speculate that the later evolution of
SMC2 and SMC4 involved some level of positive
selection for proteins with enhanced condensing activities,
given the larger size of eukaryotic chromosomes in general
(Bendich and Drlica 2000), whereas SMC1 and SMC3
were recruited to specific roles in maintaining sister
chromatid cohesion. Although speculative, a prediction
of this hypothesis could be tested by directly comparing
the supercoiling activities of condensin and prokaryotic
SMCs to see if the condensing activities of the former are
indeed more efficient. Moreover, as additional SMC
sequences from closely related species become available,
this may facilitate the detection of positive selection at
particular amino acid sites of condensin SMC proteins by
maximum-likelihood approaches (Anisimova, Bielawski,
and Yang 2001).
Whereas, differences in the substitution rate of
cohesin and condensin SMC proteins become more
pronounced in comparisons between closely related
organisms, SMC5 and SMC6 consistently show pairwise
distances about twice that of the average for the other
SMCs. Based on the pattern of codon usage and Ka/Ks
ratios in Drosophila, it seems that the large numbers of
substitutions in the SMC5 and SMC6 proteins are best
explained by an elevated rate of evolution, rather than
a more ancient origin. Similar codon usage analyses in
other organisms failed to clearly show the same pattern,
often because of less pronounced codon bias patterns or
Phylogenetic Analysis of SMC Proteins 343
FIG. 7.—Models of SMC heterodimers, based on available structural data and the patterns of coevolution described in figure 5. The coiled-coil
arms of SMC4 and SMC2 have positively correlated substitution rates and appear to be associated with each other in electron microscope images of
condensin. By contrast, cohesin is believed to form an annular structure (Gruber, Haering, and Nasmyth 2003) in which the arms of SMC1 and SMC3
are separated but their termini are bridged by the SCC1/RAD21 cohesin subunit, which might account for the lack of any significant correlation
between substitution rates in these regions. As there is also little evidence of coevolution between the arms and exposed termini of SMC5 and SMC6,
it is possible that these regions do not form significant contacts between these proteins either.
the complicating influence of SMC gene duplications on
evolutionary rate. However, as SMC5 and SMC6 are only
known to be essential for proliferation in yeasts (Lehmann
et al. 1995; Verkade et al. 1999) but not in Arabidopsis
(Mengiste et al. 1999), it seems likely that their reduced
sequence conservation may reflect reduced selective
pressures on these proteins in other species as well.
Moreover, as some of the properties of SMC5 and SMC6
(Taylor et al. 2001) are mirrored by the association of
SMC1 (or SMC1b in vertebrates) and SMC3 with male
meiotic chromosomes (Klein et al. 1999; Eijpe et al. 2000;
Revenkova et al. 2001) and the role of SMC1/SMC3 in
recombination repair (Jessberger et al. 1996; Sjögren and
Nasmyth 2001; Kim, Xu, and Kastan 2002; Yazdi et al.
2002), it is possible that these proteins have evolved more
rapidly since their function can be partially complemented
by the cohesin SMCs. Therefore, the long branch lengths
within this subfamily seem to better reflect their lessconstrained substitution rate, rather than their greater
antiquity as previously suggested (Jones and Sgouros
2001). Although the other coiled-coil ABC ATPases
appear to place the root of SMC proteins in the SMC5/
SMC6 branch, this does not necessarily imply that these
are the ancestral SMC proteins. Instead, it seems more
likely that the placement of the root here in the maximumlikelihood tree reflects the average similarity of other
coiled-coil ABC ATPases to SMC proteins, regardless of
how the SMCs have diverged from each other.
The Conformations of SMC Heterodimers
In agreement with the results obtained previously by
in vitro analyses (Hirano et al. 2001; Haering et al. 2002;
Hirano and Hirano 2002), our statistical analysis of
coevolution in eukaryotic SMC heterodimers strongly
favors the intramolecular model of dimer formation
(fig. 6A and B). We believe our findings are complementary to the available in vitro data as the observed
correlations between substitution rates would be consistent
with formation of such intramolecular heterodimers in
vivo.
The low correlations between the evolutionary rates
at the coiled-coils and termini of SMC5/SMC6 heterodimers suggest that their conformation may more closely
resemble cohesin SMC proteins, consistent with their
overlapping functions in DNA repair and meiosis. The
similarly weak intermolecular correlations for the arms and
termini of cohesin SMCs may represent a weaker
association between these regions (fig. 7), facilitating the
dissolution of sister chromatid cohesion mediated by
a proposed annular cohesin complex (Gruber, Haering,
and Nasmyth 2003). The weaker association between the
arms suggested by our analysis is consistent with the
appearance of both the SMC1/SMC3 heterodimer and the
cohesin complex when observed by electron microscopy
(Anderson et al. 2002; Haering et al. 2002). By contrast,
the strong intermolecular correlation for the coiled-coils
and the termini of condensin SMC proteins are consistent
both with electron microscope images of these proteins
(Anderson et al. 2002; Yoshimura et al. 2002) and the high
specificity of their coiled-coil interactions (Hirano and
Hirano 2002). In the future it will be interesting to see if
SMC5 and SMC6 also adopt a similarly open conformation to that of the cohesin SMC proteins when viewed by
microscopy. On the other hand, if SMC5 and SMC6 prove
to have a stronger hinge association than other SMC
proteins, this may reflect a role for this domain in tethering
such SMC proteins bound to separate DNA strands during
the repair process, as has been suggested similarly for
Rad50 (de Jager et al. 2001; Hopfner et al. 2002).
Supplementary Material
Maximum-likelihood trees of selected SMC proteins
from eukaryotes in which all SMC protein sequences are
available, are presented in an online supplementary figure.
Separate trees of the six different subfamilies are shown
together with a consensus tree constructed from concatenated alignments of the other sequences (SMC1 to SMC6).
Scale bars denote the number of accepted substitutions in
Whelan-Goldman distance units and bootstrap percentage
support for the nodes are indicated.
Web sites where additional data are available include
http://www.kazusa.or.jp/codon/ Codon Usage Database,
http://www.molbiol.ox.ac.uk/cu/ CodonW Program, http://
genome.med.yale.edu/Lifecycle/ Drosophila Gene Expression Levels, and http://www.hgsc.bcm.tmc.edu/blast/?
organism¼Dpseudoobscura D. pseudoobscura Genome
Acknowledgments
We would like to thank Andrew Lloyd of INCBI
(Irish National Centre for Bioinformatics) and the UK
Human Genome Mapping Project Resource Centre for use
344 Cobbe and Heck
of computing facilities. In addition, we are grateful to
Dietlind Gerloff, Paul McLaughlin, Ken Wolfe, and an
anonymous referee for helpful discussions and critical
reading of the manuscript. M.H. is a Senior Research
Fellow in the Basic Biomedical Sciences, funded by the
Wellcome Trust. N.C. has been supported by a Darwin
Trust Prize Studentship.
Literature Cited
Adams, M. D., S. E. Celniker, R. A. Holt et al. 2000. The
genome sequence of Drosophila melanogaster. Science
287:2185–2195.
Aguinaldo, A. M., J. M. Turbeville, L. S. Linford, M. C. Rivera,
J. R. Garey, R. A. Raff, and J. A. Lake. 1997. Evidence for
a clade of nematodes, arthropods and other moulting animals.
Nature 387:489–493.
Akashi, H. 1994. Synonymous codon usage in Drosophila
melanogaster: natural selection and translational accuracy.
Genetics 136:927–935.
Akhmedov, A. T., C. Frei, M. Tsai-Pflugfelder, B. Kemper, S. M.
Gasser, and R. Jessberger. 1998. Structural maintenance of
chromosomes protein C-terminal domains bind preferentially
to DNA with secondary structure. J. Biol. Chem. 273:24088–
24094.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z.
Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST
and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Anderson, D. E., A. Losada, H. P. Erickson, and T. Hirano. 2002.
Condensin and cohesin display different arm conformations
with characteristic hinge angles. J. Cell Biol. 156:419–424.
Anderson, D. E., K. M. Trujillo, P. Sung, and H. P. Erickson.
2001. Structure of the Rad50Mre11 DNA repair complex
from Saccharomyces cerevisiae by electron microscopy.
J. Biol. Chem. 276:37027–37033.
Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy
and power of the likelihood ratio test in detecting adaptive
molecular evolution. Mol. Biol. Evol. 18:1585–1592.
Aravind, L., R. L. Tatusov, Y. I. Wolf, D. R. Walker, and E. V.
Koonin. 1998. Evidence for massive gene exchange between
archaeal and bacterial hyperthermophiles. Trends. Genet.
14:442–444.
Baldauf, S. L., and J. D. Palmer. 1993. Animals and fungi are
each other’s closest relatives: congruent evidence from
multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558–
11562.
Baldauf, S. L., J. D. Palmer, and W. F. Doolittle. 1996. The root
of the universal tree and the origin of eukaryotes based on
elongation factor phylogeny. Proc. Natl. Acad. Sci. USA
93:7749–7754.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle.
2000. A kingdom-level phylogeny of eukaryotes based on
combined protein data. Science 290:972–977.
Beasley, M., H. Xu, W. Warren, and M. McKay. 2002.
Conserved disruptions in the predicted coiled-coil domains
of eukaryotic SMC complexes: implications for structure and
function. Genome Res. 12:1201–1209.
Bendich, A. J., and K. Drlica. 2000. Prokaryotic and eukaryotic
chromosomes: what’s the difference? Bioessays 22:481–486.
Bennetzen, J. L. and B. D. Hall. 1982. Codon selection in yeast.
J. Biol. Chem. 257:3026–3031.
Brendel, V., L. Brocchieri, S. J. Sandler, A. J. Clark, and S.
Karlin. 1997. Evolutionary comparisons of RecA-like proteins
across all major kingdoms of living organisms. J. Mol. Evol.
44:528–541.
Britton, R. A., and A. D. Grossman. 1999. Synthetic lethal
phenotypes caused by mutations affecting chromosome
partitioning in Bacillus subtilis. J. Bacteriol. 181:5860–5864.
Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002.
Eubacterial phylogeny based on translational apparatus
proteins. Trends Genet. 18:1–5.
Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J.
Stanhope. 2001. Universal trees based on large combined
protein sequence data sets. Nat. Genet. 28:281–285.
Brown, N. P., C. Leroy, and C. Sander. 1998. MView: A Web
compatible database search or multiple alignment viewer.
Bioinformatics 14:380–381.
Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted
neighbor joining: a likelihood-based approach to distancebased phylogeny reconstruction. Mol. Biol. Evol. 17:
189–197.
Bustamante, C. D., R. Nielsen, and D. L. Hartl. 2002. A
maximum likelihood method for analyzing pseudogene
evolution: implications for silent site evolution in humans
and rodents. Mol. Biol. Evol. 19:110–117.
Cavalier-Smith, T. 2002. The neomuran origin of archaebacteria,
the negibacterial root of the universal tree and bacterial
megaclassification. Int. J. Syst. Evol. Microbiol. 52:7–76.
Chuang, P. T., D. G. Albertson, and B. J. Meyer. 1994. DPY-27:
a chromosome condensation protein homolog that regulates
C. elegans dosage compensation through association with the
X chromosome. Cell 79:459–474.
Cobbe, N. and M. M. Heck. 2000. SMCs in the world of
chromosome biology- from prokaryotes to higher eukaryotes.
J. Struct. Biol. 129:123–143.
Coghlan, A., and K. H. Wolfe. 2002. Fourfold faster rate of
genome rearrangement in nematodes than in Drosophila.
Genome Res. 12:857–867.
Collas, P., K. Le Guellec, and K. Tasken. 1999. The A-kinaseanchoring protein AKAP95 is a multivalent protein with a key
role in chromatin condensation at mitosis. J. Cell Biol.
147:1167–1180.
Comeron, J. M. and M. Kreitman. 1998. The correlation between
synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics
150:767–775.
de Jager, M., J. van Noort, D. C. van Gent, C. Dekker, R. Kanaar,
and C. Wyman. 2001. Human Rad50/Mre11 is a flexible
complex that can tether DNA ends. Mol. Cell. 8:1129–1135.
Drummond, A., and K. Strimmer. 2001. PAL: An object-oriented
programming library for molecular evolution and phylogenetics. Bioinformatics 17:662–663.
Duret, L., and D. Mouchiroud. 1999. Expression pattern and,
surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA
96:4482–4487.
Echols, N., P. Harrison, S. Balasubramanian, N. M. Luscombe, P.
Bertone, Z. Zhang, and M. Gerstein. 2002. Comprehensive
analysis of amino acid and nucleotide composition in
eukaryotic genomes, comparing genes and pseudogenes.
Nucleic Acids Res. 30:2515–2523.
Eijpe, M., C. Heyting, B. Gross, and R. Jessberger. 2000.
Association of mammalian SMC1 and SMC3 proteins with
meiotic chromosomes and synaptonemal complexes. J. Cell
Sci. 113:673–682.
Eisen, J. A. 1995. The RecA protein as a model molecule for
molecular systematic studies of bacteria: comparison of trees
of RecAs and 16S rRNAs from the same species. J. Mol.
Evol. 41:1105–1123.
Engels, W. R., J. Ducau, C. Flores, D. Johnson-Schlitz, and C. R.
Preston. 2003. Double strand break repair: four pathways,
many genes. A. Dros. Res. Conf. 44:1.
Phylogenetic Analysis of SMC Proteins 345
Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral
theory of molecular evolution with genomic data from
Drosophila. Nature 415:1024–1026.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:
401–410.
Fousteri, M. I., and A. R. Lehmann. 2000. A novel SMC protein
complex in Schizosaccharomyces pombe contains the Rad18
DNA repair protein. EMBO J. 19:1691–1702.
Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, and
M. W. Feldman. 2002. Evolutionary rate in the protein
interaction network. Science 296:750–752.
Fujioka, Y., Y. Kimata, K. Nomaguchi, K. Watanabe, and K.
Kohno. 2002. Identification of a novel non-SMC component
of the SMC5/SMC6 complex involved in DNA repair. J. Biol.
Chem. 277:21585–21591.
Gorski, M. M., and J. C. J. Eeken. 2002. The Drosophila rad50
mutants are pupal lethal, however the third instar larvae show
elevated levels of anaphase bridges in dividing cells. A. Dros.
Res. Conf. 43:201C.
Gouy, M., and W. H. Li. 1989. Molecular phylogeny of the
kingdoms Animalia, Plantae, and Fungi. Mol. Biol. Evol.
6:109–122.
Grantham, R., C. Gautier, M. Gouy, M. Jacobzone, and R. Mercier.
1981. Codon catalog usage is a genome strategy modulated for
gene expressivity. Nucleic Acids Res. 9:r43–r74.
Graybeal, A. 1998. Is it better to add taxa or characters to a
difficult phylogenetic problem? Syst. Biol. 47:9–17.
Gruber, S., C. H. Haering, and K. Nasmyth. 2003. Chromosomal
cohesin forms a ring. Cell 112:765–777.
Gu, Z., A. Cavalcanti, F. C. Chen, P. Bouman, and W. H. Li.
2002. Extent of gene duplication in the genomes of
Drosophila, nematode, and yeast. Mol. Biol. Evol. 19:
256–262.
Gupta, R. S. 1995. Phylogenetic analysis of the 90 kD heat shock
family of protein sequences and an examination of the
relationship among animals, plants, and fungi species. Mol.
Biol. Evol. 12:1063–1073.
Haering, C. H., J. Löwe, A. Hochwagen, and K. Nasmyth. 2002.
Molecular architecture of SMC proteins and the yeast cohesin
complex. Mol. Cell. 9:773–788.
Hagstrom, K. A., V. F. Holmes, N. R. Cozzarelli, and B. J.
Meyer. 2002. C. elegans condensin promotes mitotic
chromosome architecture, centromere organization, and sister
chromatid segregation during mitosis and meiosis. Genes
Dev. 16:729–742.
Hausdorf, B. 2000. Early evolution of the bilateria. Syst. Biol.
49:130–142.
Hedges, S. B., H. Chen, S. Kumar, D. Y. Wang, A. S. Thompson,
and H. Watanabe. 2001. A genomic timescale for the origin of
eukaryotes. BMC Evol. Biol. 1:4.
Hiraga, S. 2000. Dynamic localization of bacterial and plasmid
chromosomes. Annu. Rev. Genet. 34:21–59.
Hirano, T. 2002. The ABCs of SMC proteins: two-armed
ATPases for chromosome condensation, cohesion, and repair.
Genes Dev. 16:399–414.
Hirano, M., D. E. Anderson, H. P. Erickson, and T. Hirano. 2001.
Bimodal activation of SMC ATPase by intra- and intermolecular interactions. EMBO J. 20:3238–3250.
Hirano, M., and T. Hirano. 1998. ATP-dependent aggregation of
single-stranded DNA by a bacterial SMC homodimer. EMBO
J. 17:7139–7148.
———. 2002. Hinge-mediated dimerization of SMC protein is
essential for its dynamic interaction with DNA. EMBO J.
21:5733–5744.
Hopfner, K. P., L. Craig, G. Moncalian et al (12 co-authors).
2002. The Rad50 zinc-hook is a structure joining Mre11
complexes in DNA recombination and repair. Nature
418:562–566.
Hopfner, K. P., A. Karcher, L. Craig, T. T. Woo, J. P. Carney,
and J. A. Tainer. 2001. Structural biochemistry and interaction
architecture of the DNA double-strand break repair Mre11
nuclease and Rad50-ATPase. Cell 105:473–485.
Hopfner, K. P., A. Karcher, D. S. Shin, L. Craig, L. M. Arthur,
J. P. Carney, and J. A. Tainer. 2000. Structural biology of
Rad50 ATPase: ATP-driven conformational control in DNA
double-strand break repair and the ABC-ATPase superfamily.
Cell 101:789–800.
Hughes, A. L., and M. Yeager. 1998. Natural selection at major
histocompatibility complex loci of vertebrates. Annu. Rev.
Genet. 32:415–435.
Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene
transfer among genomes: the complexity hypothesis. Proc.
Natl. Acad. Sci. USA 96:3801–3806.
Jessberger, R., B. Riwar, H. Baechtold, and A. T. Akhmedov.
1996. SMC proteins constitute two subunits of the mammalian recombination complex RC-1. EMBO J. 15:4061–4068.
Johnson-Schlitz, D. M., J. Ducau, C. Flores, and W. R. Engels.
2003. Evidence of DNA repair defects in mre11 and rad50
mutants. A. Dros. Res. Conf. 44:323B.
Jones, S., and J. Sgouros. 2001. The cohesin complex: sequence
homologies, interaction networks and shared motifs. Genome
Biol. 2:RESEARCH0009.
Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002.
Essential genes are more evolutionarily conserved than are
nonessential genes in bacteria. Genome Res. 12:962–968.
Kanaya, S., Y. Yamada, M. Kinouchi, Y. Kudo, and T. Ikemura.
2001. Codon usage and tRNA genes in eukaryotes: correlation
of codon usage diversity with translation efficiency and with
CG-dinucleotide usage as assessed by multivariate analysis.
J. Mol. Evol. 53:290–298.
Kim, S. T., B. Xu, and M. B. Kastan. 2002. Involvement of the
cohesin protein, Smc1, in Atm-dependent and independent
responses to DNA damage. Genes Dev. 16:560–570.
Kimura, K., and T. Hirano. 1997. ATP-dependent positive
supercoiling of DNA by 13S condensin: a biochemical
implication for chromosome condensation. Cell 90:625–634.
Kimura, K., V. V. Rybenkov, N. J. Crisona, T. Hirano, and N. R.
Cozzarelli. 1999. 13S condensin actively reconfigures DNA
by introducing global positive writhe: implications for
chromosome condensation. Cell 98:239–248.
Klein, F., P. Mahr, M. Galova, S. B. Buonomo, C. Michaelis, K.
Nairz, and K. Nasmyth. 1999. A central role for cohesins in
sister chromatid cohesion, formation of axial elements, and
recombination during yeast meiosis. Cell 98:91–103.
Koski, L. B., R. A. Morton, and G. B. Golding. 2001. Codon bias
and base composition are poor indicators of horizontally
transferred genes. Mol. Biol. Evol. 18:404–412.
Kuhner, M. K., and J. Felsenstein. 1994. A simulation
comparison of phylogeny algorithms under equal and unequal
evolutionary rates. Mol. Biol. Evol. 11:459–468.
Lang, B. F., C. O’Kelly, T. Nerad, M. W. Gray, and G. Burger.
2002. The closest unicellular relatives of animals. Curr. Biol.
12:1773–1778.
Lehmann, A. R., M. Walicka, D. J. Griffiths, J. M. Murray, F. Z.
Watts, S. McCready, and A. M. Carr. 1995. The rad18 gene of
Schizosaccharomyces pombe defines a new subgroup of the
SMC superfamily involved in DNA repair. Mol. Cell. Biol.
15:7067–7080.
Li, W.-H. 1997. Molecular Evolution. Sinauer Associates,
Sunderland, Mass.
Li, W. H., T. Gojobori, and M. Nei. 1981. Pseudogenes as
a paradigm of neutral evolution. Nature 292:237–239.
346 Cobbe and Heck
Lieb, J. D., M. R. Albrecht, P. T. Chuang, and B. J. Meyer. 1998.
MIX-1: an essential component of the C. elegans mitotic
machinery executes X chromosome dosage compensation.
Cell 92:265–277.
Lindow, J. C., R. A. Britton, and A. D. Grossman. 2002. Structural
maintenance of chromosomes protein of Bacillus subtilis
affects supercoiling in vivo. J. Bacteriol. 184:5317–5322.
Liu, C. C., J. McElver, I. Tzafrir, R. Joosen, P. Wittich, D.
Patton, A. A. Van Lammeren, and D. Meinke. 2002.
Condensin and cohesin knockouts in Arabidopsis exhibit
a titan seed phenotype. Plant J. 29:405–415.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree
of life in the light of the covarion model. J. Mol. Evol.
49:496–508.
Losada, A., and T. Hirano. 2001. Intermolecular DNA interactions stimulated by the cohesin complex in vitro. Implications for sister chromatid cohesion. Curr. Biol. 11:268–272.
Löwe, J., S. C. Cordell, and F. van den Ent. 2001. Crystal structure
of the SMC head domain: an ABC ATPase with 900 residues
antiparallel coiled-coil inserted. J. Mol. Biol. 306:25–35.
Lyons-Weiler, J., and G. A. Hoelzer. 1997. Escaping from the
Felsenstein zone by detecting long branches in phylogenetic
data. Mol. Phylogenet. Evol. 8:375–384.
Melby, T. E., C. N. Ciampaglio, G. Briscoe, and H. P. Erickson.
1998. The symmetrical structure of structural maintenance of
chromosomes (SMC) and MukB proteins: long, antiparallel
coiled coils, folded at a flexible hinge. J. Cell Biol. 142:1595–
1604.
Mengiste, T., E. Revenkova, N. Bechtold, and J. Paszkowski.
1999. An SMC-like protein is required for efficient homologous recombination in Arabidopsis. EMBO J. 18:4505–
4512.
Messier, W., and C. Stewart. 1997. Episodic adaptive evolution
of primate lysozymes. Nature 385:151–154.
Michaelis, C., R. Ciosk, and K. Nasmyth. 1997. Cohesins:
chromosomal proteins that prevent premature separation of
sister chromatids. Cell 91:35–45.
Montgomery, E. A., B. Charlesworth, and C. H. Langley. 1987.
A test for the role of natural selection in the stabilization of
transposable element copy number in a population of
Drosophila melanogaster. Genet. Res. 49:31–41.
Morgenstern, B., K. Frech, A. Dress, and T. Werner. 1998.
DIALIGN: Finding local similarities by multiple sequence
alignment. Bioinformatics 14:290–294.
Mushegian, A. R., J. R. Garey, J. Martin, and L. X. Liu. 1998.
Large-scale taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the
human, fly, nematode, and yeast genomes. Genome Res.
8:590–598.
Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergence
times from multiprotein sequences for a few mammalian
species and several distantly related organisms. Proc. Natl.
Acad. Sci. USA 98:2497–2502.
Niki, H., R. Imamura, M. Kitaoka, K. Yamanaka, T. Ogura, and
S. Hiraga. 1992. E. coli MukB protein involved in
chromosome partition forms a homodimer with a rod-andhinge structure having DNA binding and ATP/GTP binding
activities. EMBO J. 11:5101–5109.
Pál, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes
in yeast evolve slowly. Genetics 158:927–931.
Revenkova, E., M. Eijpe, C. Heyting, B. Gross, and R.
Jessberger. 2001. Novel meiosis-specific isoform of mammalian SMC1. Mol. Cell Biol. 21:6984–6998.
Saitoh, N., I. Goldberg, and W. C. Earnshaw. 1995. The SMC
proteins and the coming of age of the chromosome scaffold
hypothesis. BioEssays 17:759–766.
Saitoh, N., I. G. Goldberg, E. R. Wood, and W. C. Earnshaw.
1994. ScII: an abundant chromosome scaffold protein is
a member of a family of putative ATPases with an unusual
predicted tertiary structure. J. Cell Biol. 127:303–318.
Saka, Y., T. Sutani, Y. Yamashita, S. Saitoh, M. Takeuchi, Y.
Nakaseko, and M. Yanagida. 1994. Fission yeast cut3 and
cut14, members of a ubiquitous protein family, are required
for chromosome condensation and segregation in mitosis.
EMBO J. 13:4938–4952.
Sakai, A., K. Hizume, T. Sutani, K. Takeyasu, and M. Yanagida.
2003. Condensin but not cohesin SMC heterodimer induces
DNA reannealing through protein-protein assembly. EMBO J.
22:2764–2775.
Schmid, K. J., and D. Tautz. 1997. A screen for fast evolving genes
from Drosophila. Proc. Natl. Acad. Sci. USA 94:9746–9750.
Sharp, P. M., and W. H. Li. 1989. On the rate of DNA sequence
evolution in Drosophila. J. Mol. Evol. 28:398–402.
Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988.
‘‘Silent’’ sites in Drosophila genes are not neutral: evidence
of selection among synonymous codons. Mol. Biol. Evol.
5:704–716.
Siddiqui, N. U., P. E. Stronghill, R. E. Dengler, C. A.
Hasenkampf, and C. D. Riggs. 2003. Mutations in Arabidopsis condensin genes disrupt embryogenesis, meristem
organization and segregation of homologous chromosomes
during meiosis. Development 130:3283–3295.
Sjögren, C., and K. Nasmyth. 2001. Sister chromatid cohesion is
required for postreplicative double-strand break repair in
Saccharomyces cerevisiae. Curr. Biol. 11:991–995.
Soppa, J. 2001. Prokaryotic structural maintenance of chromosomes (SMC) proteins: distribution, phylogeny, and comparison with MukBs and additional prokaryotic and eukaryotic
coiled-coil proteins. Gene 278:253–264.
Soppa, J., K. Kobayashi, M. F. Noirot-Gros, D. Oesterhelt, S. D.
Ehrlich, E. Dervyn, N. Ogasawara, and S. Moriya. 2002.
Discovery of two novel families of proteins that are proposed
to interact with prokaryotic SMC proteins, and characterization of the Bacillus subtilis family members ScpA and ScpB.
Mol. Microbiol. 45:59–71.
Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O.
Madsen, W. W. de Jong, and M. J. Stanhope. 2001.
Mitochondrial versus nuclear gene sequences in deep-level
mammalian phylogeny reconstruction. Mol. Biol. Evol.
18:132–143.
Steen, R. L., F. Cubizolles, K. Le Guellec, and P. Collas. 2000. A
kinase-anchoring protein (AKAP)95 recruits human chromosome-associated protein (hCAP)-D2/Eg7 for chromosome
condensation in mitotic extract. J. Cell Biol. 149:531–536.
Steffensen, S., P. A. Coelho, N. Cobbe, S. Vass, M. Costa, B.
Hassan, S. N. Prokopenko, H. Bellen, M. M. Heck, and C. E.
Sunkel. 2001. A role for Drosophila SMC4 in the resolution
of sister chromatids in mitosis. Curr. Biol. 11:295–307.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling:
a quartet maximum likelihood method for reconstructing tree
topologies. Mol. Biol. Evol. 13:964–969.
Strunnikov, A. V., E. Hogan, and D. Koshland. 1995. SMC2,
a Saccharomyces cerevisiae gene essential for chromosome
segregation and condensation, defines a subgroup within the
SMC family. Genes Dev. 9:587–599.
Strunnikov, A. V., V. L. Larionov, and D. Koshland. 1993.
SMC1: an essential yeast gene encoding a putative head-rodtail protein is required for nuclear division and defines a new
ubiquitous protein family. J. Cell Biol. 123:1635–1648.
Sutani, T., and M. Yanagida. 1997. DNA renaturation activity of
the SMC complex implicated in chromosome condensation.
Nature 388:798–801.
Phylogenetic Analysis of SMC Proteins 347
Suzuki, Y., G. V. Glazko, and M. Nei. 2002. Overcredibility of
molecular phylogenies obtained by Bayesian phylogenetics.
Proc. Natl. Acad. Sci. USA 99:16138–16143.
Swofford, D. L. 1999. PAUP*: phylogenetic analysis using
parsimony (*and other methods). Version 4.0. Sinauer
Associates, Sunderland, Mass.
Tateno, Y., N. Takezaki, and M. Nei. 1994. Relative efficiencies
of the maximum-likelihood, neighbor-joining, and maximumparsimony methods when substitution rate varies with site.
Mol. Biol. Evol. 11:261–277.
Taylor, E. M., J. S. Moghraby, J. H. Lees, B. Smit, P. B. Moens,
and A. R. Lehmann. 2001. Characterization of a novel human
SMC heterodimer homologous to the Schizosaccharomyces
pombe Rad18/Spr18 complex. Mol. Biol. Cell. 12:1583–1594.
Thatcher, T. H., and M. A. Gorovsky. 1994. Phylogenetic
analysis of the core histones H2A, H2B, H3, and H4. Nucleic
Acids Res. 22:174–179.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994.
CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673–4680.
Van de Peer, Y., S. L. Baldauf, W. F. Doolittle, and A. Meyer.
2000. An updated and comprehensive rRNA phylogeny of
(crown) eukaryotes based on rate-calibrated evolutionary
distances. J. Mol. Evol. 51:565–576.
van den Ent, F., A. Lockhart, J. Kendrick-Jones, and J. Löwe.
1999. Crystal structure of the N-terminal domain of MukB:
a protein involved in chromosome partitioning. Struct. Fold.
Des. 7:1181–1187.
Verkade, H. M., S. J. Bugg, H. D. Lindsay, A. M. Carr, and M. J.
O’Connell. 1999. Rad18 is required for DNA repair and
checkpoint responses in fission yeast. Mol. Biol. Cell.
10:2905–2918.
Veuthey, A. L., and G. Bittar. 1998. Phylogenetic relationships of
fungi, plantae, and animalia inferred from homologous
comparison of ribosomal proteins. J. Mol. Evol. 47:81–92.
Wang, D. Y., S. Kumar, and S. B. Hedges. 1999. Divergence
time estimates for the early history of animal phyla and the
origin of plants, animals and fungi. Proc. R. Soc. Lond. B
Biol. Sci. 266:163–171.
Whelan, S., and N. Goldman. 2001. A general empirical model
of protein evolution derived from multiple protein families
using a maximum-likelihood approach. Mol. Biol. Evol. 18:
691–699.
Wolf, Y. I., I. B. Rogozin, N. V. Grishin, R. L. Tatusov, and E.
V. Koonin. 2001. Genome trees constructed using five
different approaches suggest new major bacterial clades.
BMC Evol. Biol. 1:8.
Wright, F. 1990. The effective number of codons used in a gene.
Gene 87:23–29.
Yamazoe, M., T. Onogi, Y. Sunako, H. Niki, K. Yamanaka, T.
Ichimura, and S. Hiraga. 1999. Complex formation of MukB,
MukE and MukF proteins involved in chromosome partitioning in Escherichia coli. EMBO J. 18:5873–5884.
Yang, J., Z. Gu, and W. H. Li. 2003. Rate of protein evolution
versus fitness effect of gene deletion. Mol. Biol. Evol.
20:772–724.
Yazdi, P. T., Y. Wang, S. Zhao, N. Patel, E. Y. Lee, and J. Qin.
2002. SMC1 is a downstream effector in the ATM/NBS1
branch of the human S-phase checkpoint. Genes Dev. 16:
571–582.
Yoshimura, S. H., K. Hizume, A. Murakami, T. Sutani, K.
Takeyasu, and M. Yanagida. 2002. Condensin architecture
and interaction with DNA: regulatory non-SMC subunits bind
to the head of SMC heterodimer. Curr Biol. 12:508–513.
Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling
greatly reduces phylogenetic error. Syst. Biol. 51:588–598.
Michele Vendruscolo, Associate Editor
Accepted September 29, 2003