Evolution of Genes, Evolution of Species: The Case of Aminoacyl-tRNA Synthetases

Y. Diaz-Lazcoz,* J.-C. Aude,† P. Nitschké,* H. Chiapello,‡ C. Landès-Devauchelle,* and J.-L. Risler*

*Université de Versailles, Génome et Informatique, Bâtiment Buffon, Versailles, France; †Institut National de Recherche en Informatique et en Automatique, Le Chesnay, France; and ‡Laboratoire de Biologie Cellulaire, Institut National de la Recherche Agronomique, Versailles, France

All of the aminoacyl-tRNA synthetase (aaRS) sequences currently available in the data banks have been subjected to a systematic analysis aimed at finding gene duplications, genetic recombinations, and horizontal transfers. Evidence is provided for the occurrence (or probable occurrence) of such phenomena within this class of enzymes. In particular, it is suggested that the monomeric PheRS from the yeast mitochondrion is a chimera of the α and β chains of the standard tetrameric protein. In addition, it is proposed that the dimeric and tetrameric forms of GlyRS are the result of a double and independent acquisition of the same specificity within two different subclasses of aaRS. The phylogenetic reconstructions of the evolutionary histories of the genes encoding aaRS are shown to be extremely diverse. While large segments of the population are consistent with the broad grouping into the three Woesian domains, some phylogenetic reconstructions do not place the Archaea and the Eucarya as sister groups but, rather, show a Gram-negative bacteria/eukaryote clustering. In addition, many individual genes pose difficulties that preclude any simple evolutionary scheme. Thus, aaRS’s are clearly a paradigm of F. Jacob’s “odd jobs of evolution” but, on the whole, do not call into question the evolutionary scenario originally proposed by Woese and subsequently refined by others.
Key words: evolution, molecular phylogeny, aminoacyl-tRNA synthetases, pyramidal classification, codon usage, gene duplication.

Address for correspondence and reprints: J.-L. Risler, Université de Versailles, Génome et Informatique, Bâtiment Buffon, 45 Avenue des Etats-Unis, 78035 Versailles Cedex, France. E-mail: [email protected].

Mol. Biol. Evol. 15(11):1548–1561. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Molecular phylogeny was born about 30 years ago (e.g., Fitch and Margoliash 1967) and, by necessity, was limited at that time to the study of chemically determined protein sequences. Since DNA sequences can now be easily and massively determined, phylogenetic studies based on molecular data (gene sequences or conceptually translated amino acid sequences, ribosomal RNA sequences) are routinely performed in many laboratories. Some analyses are even based on entire genomic sequences (Sankoff, Ferretti, and Nadeau 1997).

The most important achievement in the field, which Doolittle and Brown (1994) called “the Woesian revolution,” was probably the delineation of the three kingdoms of life (Eukaryotes, Eubacteria, and Archaebacteria) based on the analysis of ribosomal RNA sequences (Fox et al. 1977; Woese and Fox 1977). Later, Iwabe et al. (1989) and Gogarten et al. (1989) suggested that archaebacterial and eukaryotic nuclear genomes are sister groups, indicating that Eubacteria were the first to diverge from the universal tree. This rooting was accepted by Woese, Kandler, and Wheelis (1990), who renamed the three kingdoms as three domains (Eucarya, Bacteria, and Archaea).

Although generally accepted, the scenario proposed by Woese, Iwabe, Gogarten and their co-workers has been (and is still) the subject of fierce debates (see a discussion by Doolittle and Brown 1994). A number of phylogenetic analyses, essentially based on protein sequences, have produced contradictory and apparently irreconcilable results. These studies have been summarized and criticized by Forterre et al. (1992), Doolittle and Brown (1994), Roger and Brown (1996), Tourasse (1998), and others.

Obviously, the reconstruction of the universal tree by molecular phylogenetic inference must be based on the sequences of ubiquitous and ancient macromolecules. Thus, aminoacyl-tRNA synthetases (aaRS’s) are candidates of choice, since they fulfill the above two conditions, and their key role in the fidelity of translation makes it probable that they acquired their evolutionary maturity very early (however, see Ibba et al. 1997a, 1997b). Not surprisingly, they have already been subjected to phylogenetic studies (Brown and Doolittle 1995; Nagel and Doolittle 1995; Brown et al. 1997).

What prompted us to embark on the present (and apparently redundant) evolutionary study of aaRS is the fact that a number of sequences from a great variety of organisms have recently appeared in the data banks, making it possible to perform sound analyses on each of the 20 aaRS’s. Our aim was twofold: (1) to describe the evolution of the genes of each aaRS and (2) to examine whether the evolution of the genes could be used to infer that of the species. In other words, we wanted to determine whether the reconstructed trees for the 20 proteins gave a coherent picture of evolution, and check for possible gene duplications and horizontal transfers that would complicate the task of reconstructing a universal tree with these proteins.

Hereinafter, we shall refer to all the aminoacyl-tRNA synthetases that share the same specificity for the same amino acid as a group of aaRS’s. Thus, for example, all the arginyl-tRNA synthetases belong to the same group. This is not to be confused with one of the two unrelated classes that partition the aaRS’s into the two equally populated classes I and II, each containing 10 groups (e.g., Carter 1993).
Materials and Methods

All the protein sequences were extracted from the latest releases of the SwissProt and SpTrembl data banks. Multiple-sequence alignments within each aaRS group were performed with CLUSTAL W (Thompson, Higgins, and Gibson 1994) and DIALIGN (Morgenstern et al. 1998) and then manually edited to remove regions with too many variable gap positions and regions exhibiting conspicuous saturation, i.e., many different amino acids at the same position without discernible patterns. This last step is, admittedly, rather subjective. For one given group of aaRS’s, the number of overall conserved regions was generally sufficient to ensure an essentially correct multiple alignment. Whenever possible, the CLUSTAL W or DIALIGN outputs were checked against previously published alignments (Cusack 1993; Landès et al. 1995) and those from the PRODOM database (Corpet, Gouzy, and Kahn 1998). In those few cases in which large insertions could cause a problem, as in the case of prolyl-tRNA synthetase (ProRS), we resorted to Smith-Waterman pairwise alignments to check or adjust the multiple alignment.

One intergroup multiple alignment was performed with the TrpRS and TyrRS sequences. For this purpose, the TrpRS sequences and the TyrRS sequences were first aligned independently with CLUSTAL W. Then these two independent intragroup alignments were merged with a cut-and-paste operation. The resulting intergroup file (not yet properly aligned) was then edited such that the alignment of the TrpRS and TyrRS sequences from Bacillus stearothermophilus became strictly identical to the structural alignment published by Doublié et al. (1995), any modification (insertion of gaps, for example) in one of these two sequences being immediately mirrored in all the sequences from the same group.
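The "mirroring" of gap insertions described above can be illustrated with a minimal sketch. It assumes an aligned group is held as a dictionary mapping sequence names to equal-length strings with `-` as the gap character; the function name `insert_gap_column`, the label `W-ECOLI`, and the toy residues are hypothetical, and the original editing was done manually rather than with code like this.

```python
def insert_gap_column(group, position, length=1, gap="-"):
    """Insert `length` gap characters at column `position` in every
    sequence of an aligned group, so that the intragroup alignment
    is preserved while the reference sequence is shifted."""
    widths = {len(seq) for seq in group.values()}
    if len(widths) != 1:
        raise ValueError("all sequences in a group must have the same aligned length")
    return {
        name: seq[:position] + gap * length + seq[position:]
        for name, seq in group.items()
    }


# Toy TrpRS group (made-up residues): a gap opened in the reference
# sequence is propagated to every other member of the group.
trp_group = {"W-BACST": "MKTIF", "W-ECOLI": "MKT-F"}
edited = insert_gap_column(trp_group, position=2, length=2)
```

Because every sequence in the group receives the same columns, residues that were aligned before the edit remain aligned after it; only the position of the group relative to the other group changes.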
We believe that this three-step operation makes it probable that (1) the intragroup alignments (TrpRS and TyrRS) are essentially correct and (2) the intergroup multiple alignment closely mirrors the structural similarities of the TrpRS and TyrRS from B. stearothermophilus.

Phylogenetic analyses were done using the PHYLIP 3.57c suite of programs (Felsenstein 1993). More specifically, the pairwise distance matrices were calculated by PROTDIST with the “Dayhoff” option to estimate the expected amino acid replacements per position, and neighbor-joining (NJ) trees were obtained with NEIGHBOR. The most-parsimonious trees were determined with PROTPARS. For each group of aaRS’s, we performed 1,000 bootstrap resamplings with SEQBOOT, and the consensus tree was obtained by CONSENSE. Finally, the trees were drawn with the program TreeView (Page 1996). All the multiple alignments used in this study (in PHYLIP format) and the resulting trees are available by anonymous ftp from ftp://genome.genetique.uvsq.fr/pub/synthetases.

A search for those aaRS genes that could have been acquired by horizontal transfer was performed by an analysis of their codon usage. For this purpose, the codon usages of all the genes in a given species were calculated and submitted to a factorial correspondence analysis. For each species, this resulted in a cloud of points in which each point corresponds to a gene in the codon space and, in some cases, the particular position of one given gene in the cloud can point to a probable horizontal transfer (see Médigue et al. 1991). All of the necessary programs were written locally and, among others, implemented in a server that allows for interactive queries of genomic databases with particular focus on codon usage, metabolic pathways, and neighboring relationships (see http://indigo.genetique.uvsq.fr).
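The idea behind the codon-usage screen can be sketched in a few lines. The sketch below is not a factorial correspondence analysis and is not the authors' locally written software: as a simplifying assumption, it only computes each gene's relative codon frequencies and a chi-square-style distance from the unweighted mean profile of the genome, which is enough to show how a gene with atypical codon usage stands apart from the cloud. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# The 64 codons in a fixed order, so that profiles are comparable vectors.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_profile(orf):
    """Relative frequency of each of the 64 codons in an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    counts = Counter(
        orf[i:i + 3] for i in range(0, usable, 3) if orf[i:i + 3] in CODON_SET
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codon found in ORF")
    return [counts[c] / total for c in CODONS]


def chi_square_distance(profile, mean_profile):
    """Chi-square-style distance between a gene profile and the mean profile.
    Codons unused by every gene (mean 0) contribute nothing and are skipped."""
    return sum((p - m) ** 2 / m for p, m in zip(profile, mean_profile) if m > 0)


def rank_outliers(orfs):
    """Return (gene, distance) pairs sorted from most to least atypical."""
    profiles = {name: codon_profile(seq) for name, seq in orfs.items()}
    n = len(profiles)
    mean = [sum(p[i] for p in profiles.values()) / n for i in range(len(CODONS))]
    distances = {name: chi_square_distance(p, mean) for name, p in profiles.items()}
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)


# Toy genome (made-up sequences): three genes share a codon preference,
# the fourth uses synonymous codons the others never use.
toy_orfs = {
    "g1": "GCTGCTGCTAAA",
    "g2": "GCTGCTAAAAAA",
    "g3": "GCTAAAGCTAAA",
    "x": "GCGGCGGCGAAG",
}
ranking = rank_outliers(toy_orfs)
```

In this toy case the gene `x` is ranked first. A real analysis would project the gene profiles onto the principal axes of the correspondence analysis and inspect the position of each gene in the resulting plane, as described above.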
Cluster analyses and pyramidal classifications were also used in a first pass to search for anonymous sequences that could be related to an aaRS and for unexpected groupings of aaRS sequences. First, we performed an all-by-all comparison of the 465 currently available aaRS sequences using the Smith-Waterman algorithm as implemented in LASSAP (Glémet and Codani 1997). Then, for each pair of sequences showing an alignment score greater than a given threshold, we calculated the associated Z value (Lipman et al. 1984) after 100 sequence shufflings. The pairwise Z values were finally used to build a set of connected classes, or clusters, by a single linkage analysis procedure such that in any cluster, each sequence is linked to at least one other sequence by a Z value ≥ 14. This step resulted in 10 different clusters.

Each cluster was then submitted to a pyramidal classification (Bertrand and Diday 1985; unpublished data; see http://alize.inria.fr/Pyramids). This hierarchical clustering method has the property of allowing any element to belong simultaneously to two classes and therefore enables a better understanding of the relationships between them. A key property of the pyramidal classification is that it provides one (or a set) of compatible order(s), which means that the relative positions of the classified objects within the tree mirror their relative distances. This is particularly useful in determining which objects make the link between two adjacent groups. A detailed example of a pyramidal classification is given in http://alize.inria.fr/Pyramids. A set of programs aimed at calculating pyramidal classifications and drawing pyramids is also available from this URL.

Results and Discussion

It has been known for a long time that the sequences of aaRS can be extremely different, even for proteins within the same class.
For example, while the sequences of IleRS, LeuRS, ValRS, and MetRS (all belonging to class I) are rather similar (see review by Carter 1993), it proves to be much more difficult to align the sequences of MetRS and TyrRS, even though TyrRS is also a class I synthetase. Under such circumstances, the safest way to align poorly related sequences is to take into account the structures of the proteins. Indeed, the three-dimensional structures of at least 14 of the 20 synthetases have been determined (Cusack 1995, 1997). Unfortunately, most of the atomic coordinates have not been deposited with the Protein Data Bank. This is the main reason we did not attempt to perform intergroup multiple alignments, except for TrpRS and TyrRS, for which a structural alignment has been published (Doublié et al. 1995).

Table 1
Clusters Obtained from the Single Linkage Analysis of Aminoacyl-tRNA Synthetases

Cluster   No. of Sequences   Amino Acid Specificity
1         220                Cys, Ile, Leu, Met, Val
2         220                Gly, His, Ser, Pro, Thr
3         63                 Asn, Asp, Lys
4         49                 Trp, Tyr
5         35                 Phe
6         32                 Gln, Glu
7         24                 Arg
8         23                 Ala
9         15                 Gly
10        4                  Lys

NOTE: For each cluster, the second column indicates the total number of sequences it contains, and the third column lists the cognate amino acids of the aaRS’s that have been grouped in the cluster. The GlyRS’s in cluster 2 are of the dimeric form, while those in cluster 9 are tetrameric. The last cluster (cluster 10) contains the LysRS from the Archaea Methanobacterium thermoautotrophicum, Methanococcus maripaludis, and Archaeoglobus fulgidus and that from the spirochete Borrelia burgdorferi.

Thus, the phylogenetic analyses reported below are based on multiple alignments of aaRS’s belonging to the same group (having the same specificity) for which the sequences are generally better conserved. The pyramidal analyses, for their part, rely on pairwise intra- and intergroup alignments.
As stated above, this can be criticized. It should be noted, however, that the Smith-Waterman algorithm is a local alignment procedure and that the threshold (Z ≥ 14) used in the single linkage analysis is rather high. In other words, the clustering of the aaRS sequences is based on their most similar regions. While this analysis is certainly less sensitive than the analysis of global multiple alignments, the pyramidal classifications prove to be in excellent agreement with the phylogenetic reconstructions (vide infra).

Clusters and Pyramidal Classifications

In the present work, clusters of sequences were built according to their similarities based on the pairwise Z values (see Materials and Methods). It is clear that other similarity indices can be used for the same goal (e.g., Tatusov, Koonin, and Lipman 1997). Whatever the method used for grouping the sequences, it is expected that any given cluster will mainly contain orthologous proteins of similar functions, as well as paralogous sequences in the case of gene duplications. In some cases, a cluster will also contain proteins that have been acquired by horizontal transfers and mosaic or multidomain sequences that are not necessarily functionally related.

Table 1 summarizes the results obtained with the 465 aaRS sequences currently available (as of July 1998) and shows that the sequences partition into 10 different clusters. All the clusters thus delineated are totally consistent with our present knowledge on aaRS. For example, the class I IleRS, LeuRS, MetRS, and ValRS have long been recognized as being closely related, as well as the class IIa GlyRS, HisRS, ProRS, SerRS, and ThrRS (see review by Carter 1993; Cusack 1993).
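The two steps on which these clusters rest, the Z value obtained by sequence shuffling and the single linkage grouping at Z ≥ 14, can be sketched as follows. This is a minimal sketch under stated assumptions, not the LASSAP implementation: `score` stands for any pairwise alignment scoring function supplied by the caller (the study used Smith-Waterman scores), only the second sequence is shuffled, Z is taken as the observed score minus the mean shuffled score divided by their standard deviation, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def z_value(seq_a, seq_b, score, n_shuffles=100, rng=None):
    """Z value of an alignment score against scores of shuffled sequences.
    `score` is a caller-supplied function (seq, seq) -> number."""
    rng = rng or random.Random(0)
    observed = score(seq_a, seq_b)
    letters = list(seq_b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        shuffled_scores.append(score(seq_a, "".join(letters)))
    spread = stdev(shuffled_scores)
    if spread == 0:
        return 0.0  # shuffling never changes the score: no evidence either way
    return (observed - mean(shuffled_scores)) / spread


def single_linkage_clusters(names, z_values, threshold=14.0):
    """Group sequences so that each one is linked to at least one other
    member of its cluster by a Z value >= threshold.
    `z_values` maps (name_a, name_b) tuples to Z values."""
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), z in z_values.items():
        if z >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy example: A-B and B-C exceed the threshold, so A, B, and C form one
# cluster even though A and C were never directly linked; D stays alone.
toy_z = {("A", "B"): 20.0, ("B", "C"): 15.0, ("C", "D"): 3.0}
toy_clusters = single_linkage_clusters(["A", "B", "C", "D"], toy_z)
```

The toy example shows the chaining behavior of single linkage: one sequence similar to two otherwise unrelated families is enough to merge them into the same cluster.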
Conversely, it is only recently that the archaebacterial LysRS’s from Methanococcus maripaludis, Methanobacterium thermoautotrophicum, and Methanococcus jannaschii (but not that from Sulfolobus solfataricus) and the LysRS from the spirochete Borrelia burgdorferi have been shown to be radically different from all the other LysRS’s (Ibba et al. 1997a, 1997b). These LysRS’s are grouped in cluster 10 (table 1), and their case will be discussed later.

The pyramidal classification of the phenylalanyl-tRNA synthetases is shown in figure 1a and clearly shows two groups built by the α and β chains, respectively. PheRS is normally a heterotetramer with an α2β2 quaternary structure. In any given PheRS, the short (α) and long (β) chains present very few sequence similarities. However, the X-ray structure of the tetrameric PheRS from Thermus thermophilus (Mosyak and Safro 1993) revealed an unexpected structural similarity between its α chain and the N-terminal part of its β chain. This resemblance, together with a comparison of the structure with those of other class II synthetases, enabled Mosyak and Safro (1993) to locate in both chains the three motifs characteristic of class II synthetases (Eriani et al. 1990) as depicted in figure 1b, and led to the conclusion that the large β subunit probably arose from the duplication of an ancestral catalytic domain of class II synthetases followed by subsequent insertions and deletions of polypeptides.

However, as already stated, there is practically no sequence similarity between the α and β chains in any PheRS. Hence, the grouping of sequences from both chains into the same cluster came as a surprise. Examination of the cluster shows that it is the mitochondrial PheRS from yeast that makes the link between the α chains and the β chains of the other PheRS. The point here is that the yeast mitochondrial PheRS is monomeric (Sanni et al. 1991) and is so far the only PheRS of its kind. Sanni et al.
(1991) noticed strong sequence similarities between the mitochondrial PheRS and the Escherichia coli α subunit, as well as 30% identities between its C-terminal part and that from the E. coli β chain. These observations are now reinforced by comparisons with more recently sequenced PheRS’s. A search against SwissProt with WU-BLAST (http://blast.wustl.edu) confirms the occurrence of two segments of high similarity between the yeast mitochondrial PheRS and the α subunit from Haemophilus influenzae, as well as an indisputable similarity between its C-terminus and that of the β chain from Synechococcus sp. (fig. 1c). It is therefore clear that the unique chain of the yeast mitochondrial PheRS is strongly linked to both the α and the β chains of other bacterial PheRS’s. This leads us to suggest that the yeast mitochondrial PheRS is a chimera of the α and β chains of a former “standard” bacterial α2β2 enzyme, resulting from genetic recombination. Nothing can tell us, however, whether the recombination occurred before or after the yeast nucleus appropriated the former mitochondrial gene. It is probably relevant to note at this point a somewhat similar situation with the GlyRS from Chlamydia trachomatis (Wagar et al. 1995) in which the α and β subunits are encoded as a single chain by a single gene.

FIG. 1. a, Pyramidal classification of the α and β chains of PheRS. The names of the archaebacteria are italicized. cyto stands for cytoplasmic, and mito stands for mitochondrial. b, Schematic representation of the similarities between the α and β chains of PheRS from T. thermophilus. The motifs 1, 2, and 3, characteristic of class II synthetases, were located in the β chain on the basis of structural similarities. c, Schematic representation of the similarities between the yeast mitochondrial PheRS (middle) and the α and β chains of the PheRS’s from H. influenzae (upper) and Synechococcus (lower), respectively. Each segment of similarity is represented with a different shading. The figures under the segments are the percentages of identities for each segment and the corresponding Z values (in brackets).

Two more points in the pyramidal classification are worthy of note: (1) For both the α and β chains, the archaeal and eukaryotic sequences are grouped together, while the bacterial sequences make another group. This is consistent with the Woesian view of evolution. However, the sequences from B. burgdorferi are much closer to their archaeal/eucaryal counterparts than to the bacterial sequences. (2) Two sequences, one from M. jannaschii (MJ1660) and the other from M. thermoautotrophicum (MTH1501), suggest a duplication of the PheRS α gene.

ArgRS provides quite a different example (fig. 2), in which we see that Gram-negative bacteria such as E. coli and H. influenzae are grouped with the Eucarya, while another proteobacterium (Helicobacter pylori) is clustered with the Gram-positive bacteria. In addition, the Archaea make up a subcluster of their own that is not directly linked to the Eucarya. Obviously, not all the aaRS followed identical evolutionary paths, and the case of ArgRS will be further discussed later.

FIG. 2. Pyramidal classification of the currently available ArgRS sequences. The names of the archaebacteria are italicized.

Duplication of Genes Within the aaRS Family

Clusters of protein sequences such as those presented above, as well as powerful sequence retrieval systems such as SRS (Etzold, Ulyanov, and Argos 1996) or ACNUC (Gouy et al. 1985) and comparison programs like BLAST (Altschul et al. 1990) make it easy to search for gene duplications in entire genomic sequences or in data banks, provided, of course, that the sequences have not diverged too much.
In table 2, we give a list (probably not exhaustive) of duplications and even triplications among genes encoding aaRS (for the moment, we do not make any distinction between a gene duplication and a probable horizontal transfer; see next section). In some cases, the sequence similarity between an authenticated aaRS and a putative duplicated gene does not span the entirety of the aaRS sequence, as is the case for the example of YOR335C (AlaRS in yeast) and YNL040W, in which the similarity is limited to the N-terminus of the AlaRS sequence. Nevertheless, table 2 shows that duplication events are far from rare within the aaRS family, a fact that must be remembered when attempting to perform phylogenetic reconstructions. It is expected that more and more duplications within the aaRS family will be discovered as time passes and sequences accumulate.

Table 2
List of Duplicated Genes Encoding Aminoacyl-tRNA Synthetases

Amino Acid   Species                     Genes
Lys          Escherichia coli            lysS, lysU, yjeA or genX*
             H. influenzae               HI1211 (lysU), HI0836 (genX)*
Tyr          Bacillus subtilis           tyrS, tyrZ
             Synechocystis               SLR1031 (tyrS), SSR1720*
Thr          B. subtilis                 thrS, thrZ
His          B. subtilis                 hisS, hisZ
             Synechocystis               SLR0357 (hisS), SLR1560 (hisZ)
             Human                       hars, harsr
Glu          E. coli                     gltX, yadB*
             Human                       glnS, EMBL: AC002400*
Phe          B. subtilis                 pheT, ytpR*
             M. jannaschii               MJ0487 (pheS), MJ1660*
             M. thermoautotrophicum      MTH742, MTH1501*
Ala          Saccharomyces cerevisiae    YOR335C, YNL040W*
Arg          Arabidopsis thaliana        EMBL: ATARGSG7, EMBL: ATARGSG6
             Caenorhabditis elegans      SP: Q19825, TR: Q18316*

NOTE: An asterisk following a gene name indicates that the similarity does not span the entirety of one of the duplicated genes. SP and TR stand for the SwissProt and Trembl data banks, respectively. Not all the genes have been experimentally shown to encode a functional aminoacyl-tRNA synthetase.

Horizontal Transfer of aaRS Genes (LysRS, GluRS)

As shown by Médigue et al. (1991), a multivariate analysis of the codon usage of the genes from one species can point to probable horizontal transfers (see also Guerdoux-Jamet et al. 1997). We have therefore undertaken such a study on the aaRS genes from several organisms, using mainly the server accessible from http://indigo.genetique.uvsq.fr (unpublished data) that implements the factorial correspondence analysis (FCA) of the codon usage in E. coli, Bacillus subtilis, and Arabidopsis thaliana. The simple underlying idea is that a xenologous gene will likely have a codon usage different from that of similar genes in the host, and, since the establishment of the codon usage is a slow process (Diaz-Lazcoz et al. 1995), this should be detectable even if the transfer occurred a long time ago.

The FCA of the codon usage of 3,911 ORFs longer than 300 bp from E. coli is shown in figure 3a. All the genes encoding an aaRS, except three, cluster in one single group, including lysS, which encodes the constitutive LysRS, indicating that they all have a similar codon usage. One of the genes that do not cluster with the others is lysU, a nonconstitutive heat-inducible LysRS. The second is genX (yjeA), which encodes yet a third protein similar to LysRS. The third and last gene that lies outside the cluster is yadB, encoding a protein similar to GluRS. The situation is different for H. influenzae (fig. 3b), a bacterium close to E. coli. Here, we have two (not three) genes encoding a LysRS. One is lysU, which in the FCA lies within the cluster made by the other aaRS genes. The second is genX, which lies far away in the FCA.

These observations can be explained in several nonexclusive ways. It is well known that the codon bias in E. coli correlates with the level of expression of the genes (Gouy and Gautier 1982). Indeed, the positions of the constitutive aaRS genes in the FCA map of E. coli in the so-called class II zone reflect their high expressivity (Médigue et al. 1991). Hence, the positions of lysU, genX, and yadB in this map and of genX in that of H. influenzae could simply mirror a lower expressivity. This does not tell us, however, whether these genes (or some of them) are paralogous or have been acquired by horizontal transfers. The latter possibility cannot be excluded since, obviously, exogenicity is not incompatible with a lower expressivity. This is particularly true for genX, the sequences of which in E. coli and H. influenzae resemble that of the LysRS from Campylobacter jejuni more closely than that of lysS or lysU in their respective hosts. An alternative explanation could be that the genes in question are in fact pseudogenes, the differences in their codon usages being the result of genetic drift. This is certainly not the case for lysU in E. coli, which is expressed in conditions of stress (Brevet et al. 1995). As for genX, its very high sequence similarities with other LysRS’s (see above) make this improbable, but the question remains open for yadB in E. coli.

FIG. 3. Factorial correspondence analysis of the codon usage of all the ORFs in (a) E. coli and (b) H. influenzae. Each small dot represents one gene projected onto the plane of maximum inertia. Each gene encoding an aminoacyl-tRNA synthetase is represented by a solid diamond, except genX, lysU, and yadB, which are marked with distinct symbols.

Phylogenetic Reconstructions

Phylogenetic reconstructions have been performed on the 20 different aaRS’s as described in Materials and Methods. As judged from the bootstrap values, 13 of them gave (at least partly) reliable phylogenies, namely AspRS, ArgRS, GluRS, GlyRS, HisRS, IleRS, LeuRS, MetRS, PheRS, ProRS, ThrRS, TrpRS, and TyrRS.
For all of these aaRS’s, the multiple alignments that were used in the reconstructions, as well as the resulting unrooted trees and the bootstrap values, are available by anonymous ftp from ftp://genome.genetique.uvsq.fr/pub/synthetases. It must be stressed here that our aim was not so much to obtain the universal tree of life but, rather, to test whether the aaRS’s (or which aaRS’s) are good candidates for this purpose and whether they underwent similar or contrasting evolutionary paths. In addition, we are well aware that trying to resolve in detail the phylogeny of the different kingdoms would require more sophisticated methods (e.g., Tourasse and Gouy 1997). Indeed, the topologies of the bacterial branches are generally poorly resolved in our trees.

Trees with Woesian Topologies

Among the 13 aforementioned aaRS, 9 (AspRS, GluRS, IleRS, LeuRS, MetRS, PheRS, ProRS, TrpRS, and TyrRS) provided reconstructions with the currently accepted topology delimiting the three domains of life. Two examples are given in figure 4, namely AspRS (a class II enzyme) and GluRS (class I). Note, however, the poor resolution of the bacterial branches and the separation of Halobacterium salinarum from the other Archaea in the most-parsimonious tree of AspRS. In the case of ProRS (not shown), Mycoplasma genitalium and Mycoplasma pneumoniae are placed within the Archaea and Eucarya. But since mycoplasmas are parasites for a wide range of hosts including animals and plants, the occurrence of horizontal gene transfers would not be surprising. Similarly, B. burgdorferi is classified together with the Archaea and the Eucarya in the case of MetRS, ProRS, and PheRS. These anomalies most probably point to the idiosyncratic evolution of particular genes in particular species and do not call into question the overall topologies of the trees.

FIG. 4. Reconstructed unrooted phylogenetic trees for AspRS and GluRS. The trees in a and c are the consensus neighbor-joining trees, and those in b and d are the most-parsimonious consensus trees. The bootstrap scores after 1,000 bootstrap resamplings are given for some key branches. The names of the archaebacteria are italicized.

FIG. 5. Reconstructed unrooted phylogenetic trees for ArgRS and ThrRS. The trees in a and c are the consensus neighbor-joining trees, and those in b and d are the most-parsimonious consensus trees. The bootstrap scores after 1,000 bootstrap resamplings are given for some key branches. The names of the archaebacteria are italicized.

Trees with Nonconventional Topologies

The trees obtained with the other four aaRS’s are, in one way or another, nonconventional. For example, the present analyses on ArgRS (fig. 5) do not link the Archaea and the Eucarya as sister groups. In addition, H. pylori, a Gram-negative proteobacterium, is classified together with the Gram-positive bacteria. Both of these observations could be made from the pyramidal classification shown in figure 2. It is probably relevant to recall here that the ArgRS gene is duplicated in A. thaliana. The two copies (see table 2) are so similar (88% identity) that they cluster together. In addition, two genes resembling ArgRS have been found so far in Caenorhabditis elegans. Their amino acid sequences share 27% identity, and they are well separated in the reconstructed tree. In this case, it seems possible that one gene (Trembl: Q18316) would in fact encode the mitochondrial ArgRS. Finally, figure 5 shows that the yeast cytoplasmic ArgRS clusters with its mitochondrial counterpart. All of these observations, as well as the long branch between the two main groups, suggest an ancient paralogy. The situation is similar in the case of ThrRS (fig. 5), in which the Archaea are also separated from the Eucarya and clearly clustered with Gram-positive bacteria.
Such a situation has been observed with, e.g., the HSP70 chaperones (Gupta et al. 1997) and glutamine synthetase (Brown et al. 1994). Here again, the yeast cytoplasmic and mitochondrial enzymes belong to the same clade, and H. pylori is clustered with the Gram-positive bacteria. We note that there exist two ThrRS’s in B. subtilis, namely thrS and thrZ. The unusual topology of the tree can be explained parsimoniously by an early duplication of the primitive ThrRS gene, leading to what are now thrS and thrZ. During the course of evolution, the Archaea and the Gram-positive bacteria would have lost thrZ, while the Eucarya and the Gram-negative bacteria would have lost thrS.

An alternative and more general explanation for the nonconventional topologies of the ArgRS and ThrRS trees is that eukaryotes obtained a copy from a proteobacterial endosymbiont while their original genes were lost. This would explain the similarity between eukaryotic and bacterial Gram-negative ArgRS and ThrRS. Indeed, such an evolutionary scenario has been proposed for other gene phylogenies showing a Gram-negative bacteria/eukaryote clustering (see Martin 1996 for a summary).

The Case of Tryptophanyl- and Tyrosyl-tRNA Synthetases

TrpRS and TyrRS are quite controversial in the literature. A phylogenetic analysis of a multiple alignment comprising 16 different TrpRS and TyrRS sequences led Ribas de Pouplana et al. (1996) to the surprising conclusion that the eukaryotic TrpRS and TyrRS are more related to each other than to their respective prokaryotic counterparts. Their most-parsimonious tree clearly classified the eukaryotic proteins within the same clade, thus suggesting an origin for the eucaryal genes separate from that of their bacterial counterparts. The high degree of structural similarity between TrpRS and TyrRS from B. stearothermophilus (Doublié et al. 1995) was taken as supporting evidence for a surprisingly recent common ancestor, in spite of the much poorer sequence similarity.
Later, Brown et al. (1997) undertook a similar study, but with a larger number of sequences and a broader range of taxa, and came to a different conclusion: all of their phylogenetic reconstructions supported the separate monophyly of TrpRS and TyrRS.

We therefore undertook one more such study, based on 49 TrpRS and TyrRS sequences, with particular attention being paid to the structural alignment of Doublié et al. (1995) in the multiple-alignment step (see Materials and Methods). Part of our multiple alignment, from the end of the first crossover connection to the middle of the dimer interface, is shown in figure 6. Clearly, a number of residues (identical or similar) are conserved within the two groups of sequences. In addition, there are positions where a residue is highly conserved within only one group and is characteristic of this group. Although Brown et al. (1997) also took full account of the structural alignment, their published multiple alignment of TrpRS and TyrRS sequences differs from ours in some places (but is identical in the most conserved regions). The differences probably come from the use of different alignment programs and/or from the fact that these authors aligned the TrpRS and TyrRS sequences together, while we treated them separately.

Figure 7 shows the pyramidal classification of the TrpRS and TyrRS sequences. It is clear that in each group, the archaeal and eucaryal synthetases are classified together and that the bacterial enzymes make another subgroup. One prominent feature of this classification is that the archaeal/eucaryal TyrRS’s are closer to the archaeal/eucaryal TrpRS’s than to the bacterial TyrRS’s. This is in agreement with the observations of Ribas de Pouplana et al. (1996). However, the archaeal/eucaryal TrpRS’s are (slightly but clearly) closer to their bacterial counterparts than to the archaeal/eucaryal TyrRS’s, and this is at variance with the classification of Ribas de Pouplana et al. (1996).
In addition, the bacterial TrpRS’s and TyrRS’s are quite far from one another and cannot be considered as forming a group. The simplest tentative explanation for these observations is that, for one reason or another, the rate of mutation of the TyrRS genes within the Archaea and Eucarya is different from that of the TyrRS genes within the Bacteria.

Let us now turn to the phylogenetic reconstructions based on the structural multiple alignment, shown in figure 8a and b. It appears that the TrpRS sequences and the TyrRS sequences partition into two separate monophyletic groups. Within each group, the archaeal and eucaryal synthetases form one clade. The eukaryotic TrpRS’s and TyrRS’s are definitely not clustered together. In many respects, the trees in figure 8a and b are similar to those published by Brown et al. (1997) and therefore do not support the results of Ribas de Pouplana et al. (1996).

Since Brown et al. (1997) suggested that these conflicting results could arise from too limited a sampling of the taxa in the work by Ribas de Pouplana et al. (1996), we removed from our multiple alignment all those sequences that were not present in this last study and reconstructed the phylogenetic tree with the same set of sequences (fig. 8c and d). Again, the eukaryotic TrpRS’s and TyrRS’s do not cluster into one monophyletic group. It is thus probable that the multiple alignment used by Ribas de Pouplana et al. (1996) was markedly different from ours and from that of Brown et al. (1997).

FIG. 6. Part of the multiple alignment between TrpRS and TyrRS. The alignment between the TrpRS (W-BACST) and TyrRS (Y-BACST) from B. stearothermophilus is identical to the structural alignment published by Doublié et al. (1995). Some residues (identical or similar) that are conserved in both groups are boxed and shaded. Some residues that are conserved within only one group are boxed unshaded. The names of the TrpRS and TyrRS sequences begin with W and Y, respectively. The names of the species follow the SwissProt convention, e.g., SCHPO stands for Schizosaccharomyces pombe, METJA for M. jannaschii, etc.

The Case of Lysyl-tRNA Synthetase

We have previously seen that the gene encoding LysRS is duplicated in several species and that there is some evidence of horizontal transfer in E. coli and H. influenzae. Until recently, no LysRS’s had been identified among the Archaea except in S. solfataricus. Unfortunately, the reconstruction of the LysRS phylogeny results in rather low bootstrap values (not shown) that prevent a more detailed study. This synthetase, however, is absolutely unique, since it has been demonstrated recently (Ibba et al. 1997a, 1997b) that the LysRS’s in four Archaea (M. jannaschii, A. fulgidus, M. maripaludis, and M. thermoautotrophicum) and one spirochete (B. burgdorferi) display sequence features that are characteristic of the class I–type synthetases, while all the LysRS’s from other bacteria and eukaryotes and that from S. solfataricus belong to class II. It is not clear at the moment whether the class I–type LysRS’s appeared before or after the universal ancestor gave rise to the three kingdoms of life, but the results of Ibba et al. (1997a, 1997b) demonstrate unambiguously that the tRNA(Lys) aminoacylation function appeared independently at least twice during the course of evolution.

The Case of Glycyl-tRNA Synthetase

GlyRS has been found to exist in two different oligomeric forms, namely an α2β2 heterotetramer and an α2 homodimer. The two forms do not share any convincing sequence similarity (Shiba et al. 1994). Until recently, the tetrameric GlyRS had been found only within Bacteria and the dimeric enzyme only within Eucarya, which suggested that the oligomeric structure of GlyRS distinguishes prokaryotes and eukaryotes (see Mazauric et al. 1996 and references therein).
In addition, it had been suggested that the discriminator base 73 in tRNA(Gly) could be correlated with both subunit structure and the prokaryote/eukaryote division, i.e., α2β2/U73 and α2/A73 (Shiba et al. 1994). Finally, sequence similarities clearly placed the dimeric GlyRS within subclass IIa together with HisRS, ProRS, SerRS, and ThrRS, and the tetrameric GlyRS within subclass IIc together with the α4 AlaRS and the α2β2 PheRS (Cusack 1993, 1995).

The picture today is somewhat different, since a dimeric GlyRS was found in T. thermophilus (Logan et al. 1995; Mazauric et al. 1996) while genomic sequencing projects proved that the α2 GlyRS is the form that exists in the Archaea, Mycoplasma sp., and other bacteria. In contrast, the tetrameric GlyRS has been observed only among the Bacteria. In addition, GlyRS from T. thermophilus is able to aminoacylate heterologous E. coli and yeast tRNA(Gly) (Mazauric et al. 1996), which rules out any relationship between the oligomeric state of the enzyme and the nature of its host. Finally, the availability of new sequences provides compelling evidence that the closest relatives to the α2 GlyRS are ThrRS and ProRS, reinforcing its assignment to subclass IIa.

FIG. 7. Pyramidal classification of all the currently available TrpRS and TyrRS sequences. The sequence labeled Synechocystis(2) is fragmentary. The names of the archaebacteria are italicized.

We are therefore led to the following conclusions: (1) Since the dimeric GlyRS is found among the three kingdoms of life, this is the glycyl-tRNA synthetase that existed originally, sharing a common subclass IIa ancestor with HisRS, ProRS, SerRS, and ThrRS. (2) The existence of another GlyRS with the same specificity, yet markedly different in both its quaternary structure and amino acid sequence, suggests that this very specificity was acquired a second time during the course of evolution among a different subclass of tetrameric aaRS’s comprising PheRS and AlaRS.
This may seem unlikely at first, but that the same specificity could appear independently within two subclasses of the same family is, after all, less surprising than the convergent evolution of trypsin and subtilisin (Kraut et al. 1972). Note also that the tRNA-aminoacylation function itself clearly appeared independently at least twice, once within the class I family, and a second time within the class II family. Finally, the existence of both class I– and class II–type LysRS’s (vide supra) shows that this last synthetase was invented twice. Hence, the existence of two different nonorthologous GlyRS’s is not that improbable. (3) Since the tetrameric GlyRS has been found only within the Bacteria, it is probable that it appeared on this branch after the Bacteria separated from the other kingdoms.

A special case should be noted for A. thaliana, in which a GlyRS of plastid origin (EMBL: ATAJ3069) appears to consist of a single chain, encoded by a single gene, that contains both an α-chain at its N-terminus and a β-chain at its C-terminus. As indicated before, this is also true for the GlyRS from C. trachomatis (Wagar et al. 1995).

FIG. 8. Reconstructed phylogeny based on the multiple alignment between TrpRS and TyrRS. All the TrpRS names begin with W, and all the TyrRS names begin with Y. Yc stands for yeast cytoplasm, Ym for yeast mitochondrion, and Ce for Caenorhabditis elegans. The trees in a and b are the consensus neighbor-joining and most-parsimonious trees, respectively, for the 49 currently available sequences. The trees in c and d are the consensus neighbor-joining and most-parsimonious trees, respectively, reconstructed with only those sequences that were used in the study of Ribas de Pouplana et al. (1996). Each tree was obtained after 1,000 bootstrap resamplings. The bootstrap scores are given for some key branches.
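The bootstrap scores quoted for the trees come from resampling the columns of the multiple alignment with replacement and rebuilding a tree from each pseudo-alignment. The sketch below illustrates only the resampling step, under stated assumptions: the alignment is held as a dictionary of equal-length gapped strings, the function names `bootstrap_columns` and `bootstrap_replicates` are hypothetical, and the toy sequences are invented. It is not the code used in this study.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned (gapped) string.
    Columns are drawn with replacement until the replicate has the same
    number of columns as the original; each drawn column is kept intact
    across all sequences.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates pseudo-alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_columns(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with made-up sequences (not data from this study).
toy = {"W-seq1": "MKT-LE", "W-seq2": "MRTALE", "Y-seq1": "MKSALD"}
replicates = bootstrap_replicates(toy, n_replicates=1000, seed=42)
print(len(replicates), len(replicates[0]["W-seq1"]))
```

In a full analysis, a tree would then be reconstructed from each replicate, and the score on a branch would be the proportion of replicate trees that contain the corresponding grouping.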
Conclusions

Aminoacyl-tRNA synthetases have long been considered suitable for case studies because of their extreme diversity in terms of amino acid sequences, quaternary structures, and modes of action, while sharing basically the same function. In the present work, we have shown that the evolutionary paths of their genes are also very diverse due to numerous events such as gene fusions, gene duplications, genetic recombinations, probable horizontal transfers, and even reacquisition of the same specificity. These proteins are a paradigm of F. Jacob’s “odd jobs of evolution.” Although the majority of them show phylogenetic reconstructions that are consistent with the broad grouping into three domains, many individual genes pose difficulties that preclude any simple evolutionary scheme. To make the picture coherent, however, we do not think it necessary to invoke, as did Shiba, Motegi, and Schimmel (1997), a confused notion of “crossover between domains,” even less a non-Darwinian “necessity for adaptation.” As time passes, aaRS’s become even more fascinating as their study brings new and unexpected insights into the processes of evolution.

Acknowledgments

We wish to thank Prof. P. Slonimski for his continuous support and interest. The all-by-all pairwise alignments, the calculations of the pairwise Z values, the delineation of the clusters of sequences, their pyramidal classifications, and the corresponding Web pages were done with the help of J.-J. Codani and E. Glemet from INRIA. We also thank the two anonymous referees for their constructive suggestions and criticisms.

LITERATURE CITED

ALTSCHUL, S. F., W. GISH, W. MILLER, E. W. MYERS, and D. J. LIPMAN. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
BERTRAND, P., and E. DIDAY. 1985. A visual representation of the compatibility between an order and a dissimilarity index: the pyramids. Comput. Stat. Q. 2:31–42.
BREVET, A., J. CHEN, F. LEVEQUE, S. BLANQUET, and P. PLATEAU. 1995.
Comparison of the enzymatic properties of the two Escherichia coli lysyl-tRNA synthetases species. J. Biol. Chem. 270:14439–14444.
BROWN, J. R., and W. F. DOOLITTLE. 1995. Root of the universal tree based on ancient aminoacyl-tRNA synthetase gene duplications. Proc. Natl. Acad. Sci. USA 92:2441–2445.
BROWN, J. R., Y. MASUCHI, F. T. ROBB, and W. F. DOOLITTLE. 1994. Evolutionary relationships of bacterial and archaeal glutamine synthetase genes. J. Mol. Evol. 38:566–576.
BROWN, J. R., F. T. ROBB, R. WEISS, and W. F. DOOLITTLE. 1997. Evidence for the early divergence of tryptophanyl- and tyrosyl-tRNA synthetases. J. Mol. Evol. 45:9–16.
CARTER, C. W. 1993. Cognition, mechanism, and evolutionary relationships in aminoacyl-tRNA synthetases. Annu. Rev. Biochem. 62:715–748.
CORPET, F., J. GOUZY, and D. KAHN. 1998. The ProDom database of protein domain families. Nucleic Acids Res. 26:325–328.
CUSACK, S. 1993. Sequence, structure and evolutionary relationships between class 2 aminoacyl-tRNA synthetases: an update. Biochimie 75:1077–1081.
CUSACK, S. 1995. Eleven done and nine to go. Nat. Struct. Biol. 2:824–831.
CUSACK, S. 1997. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 7:881–889.
DIAZ-LAZCOZ, Y., A. HENAUT, P. VIGIER, and J.-L. RISLER. 1995. Differential codon usage for conserved amino acids: evidence that the serine codons TCN were primordial. J. Mol. Biol. 250:123–127.
DOOLITTLE, W. F., and J. R. BROWN. 1994. Tempo, mode, the progenote, and the universal root. Proc. Natl. Acad. Sci. USA 91:6721–6728.
DOUBLIÉ, S., G. BRICOGNE, C. GILMORE, and C. W. CARTER. 1995. Tryptophanyl-tRNA synthetase crystal structure reveals an unexpected homology to tyrosyl-tRNA synthetase. Structure 3:17–31.
ERIANI, G., M. DELARUE, O. POCH, J. GANGLOFF, and D. MORAS. 1990. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347:203–206.
ETZOLD, T., A. ULYANOV, and P. ARGOS. 1996. SRS: information retrieval system for molecular biology data banks.
Methods Enzymol. 266:114–128.
FELSENSTEIN, J. 1993. PHYLIP (phylogeny inference package). Version 3.57c. Distributed by the author, Department of Genetics, University of Washington, Seattle.
FITCH, W. M., and E. E. MARGOLIASH. 1967. Construction of phylogenetic trees. Science 155:279–284.
FORTERRE, P., N. BENACHENHOU-LAHFA, F. CONFALONIERI, M. DUGUET, C. ELIE, and B. LABEDAN. 1992. The nature of the last universal ancestor and the root of the tree of life, still open questions. Biosystems 28:15–32.
FOX, G. E., L. J. MAGRUM, W. E. BALCH, R. S. WOLFE, and C. R. WOESE. 1977. Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc. Natl. Acad. Sci. USA 74:4537–4541.
GLÉMET, E., and J. J. CODANI. 1997. LASSAP, a large scale sequence comparison package. Comput. Appl. Biosci. 13:137–143.
GOGARTEN, J. P., H. KIBAK, P. DITTRICH et al. (13 co-authors). 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86:6661–6665.
GOUY, M., and C. GAUTIER. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055–7074.
GOUY, M., C. GAUTIER, M. ATTIMONELLI, C. LANAVE, and G. DI PAOLA. 1985. ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1:167–172.
GUERDOUX-JAMET, P., A. HENAUT, P. NITSCHKE, J.-L. RISLER, and A. DANCHIN. 1997. Using codon usage to predict genes origin: is the Escherichia coli outer membrane a patchwork of products from different genomes? DNA Res. 4:257–265.
GUPTA, R. S., K. BUSTARD, M. FALAH, and D. SINGH. 1997. Sequencing of heat shock protein 70 (DnaK) homologs from Deinococcus proteolyticus and Thermomicrobium roseum and their integration in a protein-based phylogeny of prokaryotes. J. Bacteriol. 179:345–357.
IBBA, M., J. L. BONO, P. A. ROSA, and D. SOLL. 1997a. Archaeal-type lysyl-tRNA synthetase in the Lyme disease spirochete Borrelia burgdorferi. Proc. Natl.
Acad. Sci. USA 94:14383–14388.
IBBA, M., S. MORGAN, A. W. CURNOW, D. R. PRIDMORE, U. C. VOTHKNECHT, W. GARDNER, W. LIN, C. R. WOESE, and D. SOLL. 1997b. A euryarchaeal lysyl-tRNA synthetase: resemblance to class I synthetases. Science 278:1119–1122.
IWABE, N., K. KUMA, M. HASEGAWA, S. OSAWA, and T. MIYATA. 1989. Evolutionary relationship of archaebacteria, eubacteria and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86:9355–9359.
KRAUT, J., J. D. ROBERTUS, J. J. BIRKTOFT, R. A. ALDEN, P. E. WILCOX, and J. C. POWERS. 1972. The aromatic substrate binding site in subtilisin BPN' and its resemblance to chymotrypsin. Cold Spring Harb. Symp. Quant. Biol. 36:117–123.
LANDÈS, C., J. J. PERONA, S. BRUNIE, M. A. ROULD, C. ZELWER, T. A. STEITZ, and J.-L. RISLER. 1995. A structure-based multiple sequence alignment of all class I aminoacyl-tRNA synthetases. Biochimie 77:194–203.
LIPMAN, D. J., W. J. WILBUR, T. F. SMITH, and M. S. WATERMAN. 1984. On the statistical significance of nucleic acid similarities. Nucleic Acids Res. 12:215–226.
LOGAN, D. T., M.-H. MAZAURIC, D. KERN, and D. MORAS. 1995. Crystal structure of glycyl-tRNA synthetase from Thermus thermophilus. EMBO J. 14:4156–4167.
MARTIN, W. F. 1996. Is something wrong with the tree of life? Bioessays 18:523–527.
MAZAURIC, M.-H., J. REINBOLT, B. LORBER, C. EBEL, G. KEITH, R. GIEGE, and D. KERN. 1996. An example of nonconservation of oligomeric structure in prokaryotic aminoacyl-tRNA synthetases. Eur. J. Biochem. 241:814–826.
MÉDIGUE, C., T. ROUXEL, P. VIGIER, A. HENAUT, and A. DANCHIN. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851–856.
MORGENSTERN, B., K. FRECH, A. DRESS, and T. WERNER. 1998. DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14:290–294.
MOSYAK, L., and M. SAFRO. 1993. Phenylalanyl-tRNA synthetase from T.
thermophilus has four antiparallel folds of which only two are catalytically functional. Biochimie 75:1091–1098.
NAGEL, G. M., and R. F. DOOLITTLE. 1995. Phylogenetic analysis of the aminoacyl-tRNA synthetases. J. Mol. Evol. 40:487–498.
PAGE, R. D. M. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357–358.
RIBAS DE POUPLANA, L., M. FRUGIER, C. L. QUINN, and P. SCHIMMEL. 1996. Evidence that two present-day components needed for the genetic code appeared after nucleated cells separated from eubacteria. Proc. Natl. Acad. Sci. USA 93:166–170.
ROGER, A. J., and J. R. BROWN. 1996. A chimeric origin for eukaryotes re-examined. Trends Biochem. Sci. 21:370–371.
SANKOFF, D., V. FERRETTI, and J. H. NADEAU. 1997. Conserved segment identification. J. Comput. Biol. 4:559–565.
SANNI, A., P. WALTER, Y. BOULANGER, J. P. EBEL, and F. FASIOLO. 1991. Evolution of aminoacyl-tRNA synthetase quaternary structure and activity: Saccharomyces cerevisiae mitochondrial phenylalanyl-tRNA synthetase. Proc. Natl. Acad. Sci. USA 88:8387–8391.
SHIBA, K., H. MOTEGI, and P. SCHIMMEL. 1997. Maintaining genetic code through adaptations of tRNA synthetases to taxonomic domains. Trends Biochem. Sci. 22:453–457.
SHIBA, K., P. SCHIMMEL, H. MOTEGI, and T. NODA. 1994. Human glycyl-tRNA synthetase. Wide divergence of primary structure from bacterial counterpart and species-specific aminoacylation. J. Biol. Chem. 269:30049–30055.
TATUSOV, R. L., E. V. KOONIN, and D. J. LIPMAN. 1997. A genomic perspective on protein families. Science 278:631–637.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
TOURASSE, N. J. 1998.
Développement d’une distance évolutive entre séquences prenant en compte la variabilité du taux de substitution entre sites et application à la reconstruction de phylogénies moléculaires anciennes. Ph.D. dissertation, Université Lyon-1.
TOURASSE, N. J., and M. GOUY. 1997. Evolutionary distances between nucleotide sequences based on the distribution of substitution rates among sites as estimated by parsimony. Mol. Biol. Evol. 14:287–298.
WAGAR, E. A., M. J. GIESE, B. YASIN, and M. PANG. 1995. The glycyl-tRNA synthetase of Chlamydia trachomatis. J. Bacteriol. 177:5179–5185.
WOESE, C. R., and G. E. FOX. 1977. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. USA 74:5088–5090.
WOESE, C. R., O. KANDLER, and M. L. WHEELIS. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eukarya. Proc. Natl. Acad. Sci. USA 87:4576–4579.

MANOLO GOUY, reviewing editor
Accepted August 6, 1998