Phylogenetic Place of Guinea Pigs: No Support of the Rodent-Polyphyly Hypothesis from Maximum-Likelihood Analyses of Multiple Protein Sequences

Ying Cao,* Jun Adachi,*† Taka-aki Yano,‡ and Masami Hasegawa*†
*The Institute of Statistical Mathematics; †Department of Statistical Science, The Graduate University for Advanced Studies; and ‡College of Arts and Sciences, Showa University

Graur et al.'s (1991) hypothesis that the guinea pig-like rodents have an evolutionary origin within mammals that is separate from that of other rodents (the rodent-polyphyly hypothesis) was reexamined by the maximum-likelihood method for protein phylogeny, as well as by the maximum-parsimony and neighbor-joining methods. The overall evidence does not support Graur et al.'s hypothesis, which radically contradicts the traditional view of rodent monophyly. This work demonstrates that we must be careful in choosing a proper method for phylogenetic inference and that an argument based on a small data set (with respect to the length of the sequences and especially the number of species) may be unstable.

Introduction

On the basis of molecular phylogenetic analyses of proteins, Graur et al. (1991, 1992) and Li et al. (1992b) suggested that the order Rodentia may not be monophyletic and that, within mammals, the guinea pig-like rodents (Caviomorpha, comprising the New World families, or Hystricomorpha, including the Old World families as well as the New World ones) may have an evolutionary origin separate from that of the ratlike rodents (Myomorpha) and the squirrel-like rodents (Sciuromorpha). They further suggested that the Caviomorpha separated from other rodents before the divergence among Rodentia, Primates, and Artiodactyla (tree III in fig. 1). Their suggestion contradicts the traditional view of rodent monophyly (tree I in fig. 1), which is based mainly on comparative morphology (Luckett and Hartenberger 1985; Novacek 1992).
Key words: guinea pig, Caviomorpha, rodents, mammalian evolution, maximum-likelihood tree of proteins.
Address for correspondence and reprints: Masami Hasegawa, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan.
Mol. Biol. Evol. 11(4):593-604. 1994.
© 1994 by The University of Chicago. All rights reserved. 0737-4038/94/1104-0003$02.00

They used a maximum-parsimony (MP) method to estimate the tree, but the MP method is known to be sometimes misleading, particularly when the evolutionary rate differs among lineages (Felsenstein 1978). Therefore, we reexamined the data by a maximum-likelihood (ML) method for protein sequences (Kishino et al. 1990; Adachi and Hasegawa 1992) that is robust against violation of rate constancy (Hasegawa et al. 1991; Hasegawa and Fujiwara 1993), and a preliminary result has been presented in a note by Hasegawa et al. (1992). Although the MP analysis supported Graur et al.'s tree of rodent polyphyly with a bootstrap probability as high as 96%, the result of the ML analysis was equivocal; hence we suggested that Graur et al.'s tree may be an example of unequal evolutionary rates misleading MP analysis (Felsenstein 1978; Hasegawa et al. 1991). Li et al. (1992a) responded to our note. They noted that the proteins used by Hasegawa et al. (1992) can be classified into two groups, conservative and nonconservative proteins. It is known that the unequal rate effect on the MP method is stronger for divergent sequences than for well-conserved ones. Li et al. (1992a) found that, for most of the conservative proteins, the ML and MP methods are congruent, favoring the rodent polyphyly tree of Graur et al. (1991). From our point of view, however, because the support of the rodent polyphyly tree by ML is not statistically significant, we cannot accept this tree, which radically contradicts the traditional tree. More recently, Ma et al.
(1993) sequenced the mitochondrial cytochrome b gene from the guinea pig, the African porcupine (Hystrix africaeaustralis), and the South American opossum (Monodelphis domestica) and, using the neighbor-joining (NJ) method (Saitou and Nei 1987), suggested another rodent polyphyly tree, in which Hystricomorpha (the guinea pig and the African porcupine) and Primates form a clade excluding Myomorpha as an outgroup (tree II in fig. 1). This tree is at odds with Graur et al.'s (1991) tree, as well as with the traditional tree of rodent monophyly.

[Figure 1: three rooted trees relating Caviomorpha, Myomorpha, and Primates. Tree I: ((Caviomorpha, Myomorpha), Primates); tree II: ((Caviomorpha, Primates), Myomorpha); tree III: ((Myomorpha, Primates), Caviomorpha); each is rooted by the outgroup.]
FIG. 1. Three possible phylogenetic trees among Caviomorpha (Hystricomorpha), Myomorpha, and Primates. Tree I represents the traditional view of Caviomorpha and Myomorpha forming a clade as rodents, while tree III represents the view of Graur et al. (1991) that Myomorpha and Primates form a clade excluding Caviomorpha as an outgroup.

Thus, many uncertainties remain regarding the phylogenetic position of the guinea pig. Since this issue is critical to the methodology of molecular phylogenetics, as well as to the specific problem of mammalian phylogeny, we here present a detailed analysis of it.

Material and Methods

We examine the relationship among Caviomorpha (Hystricomorpha), Myomorpha, and Primates. The protein sequence data used in this work are shown in table 1. Positions with gaps, as well as regions where the alignment was ambiguous, were excluded from the analyses. In the first analysis, we use the same data set as that used by Hasegawa et al. (1992). The data are mostly from the SWISS-PROT data library and are mostly the same as those used by Graur et al. (1991). We additionally use factor IX sequences (Bottema et al. 1991), with factor X as an outgroup. Following Li et al.
(1992a), we take α-crystallin A, α-globin, β-globin, lipoprotein lipase, and lipocortin I as "conservative" proteins and α-lactalbumin, β-nerve growth factor, factor IX, pancreatic ribonuclease, and proinsulin as "nonconservative" proteins. These two groups are analyzed separately.

Table 1
List of Data Used in the Analyses
Protein (Abbreviation) | No. of Sites | Primates | Myomorpha | Caviomorpha (Hystricomorpha) | Outgroup
1. α-Crystallin A (Crys) | 172 | Hsaᵃ (P02489) | Rnoᵃ (P02490) | Cpoᵃ (P02491) | Mruᵃ (P02502)
2. α-Globin (Hbα) | 141 | Hsaᵃ (P01922), Cae (P01926), Ageᵃ (P01927), Lta (P01938), Tgeᵃ,ᵇ (P01932) | Mmuᵃ (P01942), Mauᵃ (P01945) | Cpoᵃ (P01947) | Oan (P01979), Tat (P01977), Dmaᵃ (P01976), Mgi (P01975), Dvi (P07419)
3. β-Globin (Hbβ) | 146 | Hsaᵃ (P02023), Cae (P02028), Ageᵃ (P02034), Lta (P02048), Mmulᵃ,ᵇ (P02026) | Mmu (P02088), Rnoᵃ (P02091), Mauᵃ (P02094) | Cpoᵃ (P02095) | Oan (P02111), Tat (P02110), Dmaᵃ (P02109), Mgiᵃ (P02106)
4. Lipoprotein lipase (Lipa) | 443 | Hsaᵃ (P06858) | Mmuᵃ (P11152) | Cpoᵃ (P11153) | Ggaᵃ (P11602)
5. Lipocortin I (Cort) | 339 | Hsaᵃ (P04083) | Mmuᵃ (P10107), Rnoᵃ (P07150) | Ccuᵃ (P14087) | Cliᵃ (P14950)
6. α-Lactalbumin (Lact) | 119 | Hsaᵃ (P00709) | Rnoᵃ (P00715) | Cpoᵃ (P00713) | Mrufoᵃ (P07458)
7. β-Nerve growth factor (NGF) | 235 | Hsaᵃ (P01138) | Mmuᵃ (P01139) | Cpoᵃ (P19093) | Ggaᵃ (P05200)
8. Factor IX (Fac9) | 238 | Hsaᵃ (Bottema91) | Mmuᵃ (Bottema91), Rnoᵃ (Bottema91) | Cpoᵃ (Bottema91) | HsaXᵃ (Bottema91), BtaXᵃ (Bottema91)
9. Pancreatic ribonuclease (Ribo) | 122 | Hsaᵃ (P07998), Pen (P19644) | Mmuᵃ (P00683), Rnoᵃ,ᵇ (P00684), Mauᵃ (P00682), Ozi (P00681) | CpoA (P00678), CpoB (P00679), Hhyᵃ (P00677), Cbr (P00675), Mco (P00676), Pgu (P04059) | Mruᵃ (P00686)
10. Proinsulin (Insu) | 81 | Hsaᵃ (P01308) | Rno1ᵃ (P01322), Rno2ᵃ (P01323) | Cpoᵃ (P01329) | Ggaᵃ (P01332)
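The exclusion of gapped positions described under Material and Methods can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the authors actually used: it assumes the alignment is held as a dictionary of equal-length strings with "-" marking gaps, and the optional `ambiguous` set (a hypothetical parameter) stands in for the regions the authors judged to be ambiguously aligned.

```python
def drop_gap_columns(alignment, gap="-", ambiguous=frozenset()):
    """Return a copy of `alignment` without the columns that contain a gap in
    any sequence, and without the column indices listed in `ambiguous`."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*...) walks the alignment column by column
    columns = zip(*alignment.values())
    keep = [i for i, col in enumerate(columns) if gap not in col and i not in ambiguous]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"Hsa": "MK-LV", "Cpo": "MKALV", "Rno": "MRAL-"}  # made-up sequences
    print(drop_gap_columns(toy))  # {'Hsa': 'MKL', 'Cpo': 'MKL', 'Rno': 'MRL'}
    print(drop_gap_columns(toy, ambiguous={0}))  # {'Hsa': 'KL', 'Cpo': 'KL', 'Rno': 'RL'}
```

Dropping a whole column whenever any sequence has a gap is the most conservative choice: every retained site is comparable across all taxa, at the cost of discarding some information.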
[…] between the log likelihood of alternative trees and that of the ML tree by the formula of Kishino and Hasegawa (1989), and estimated the bootstrap probability by the RELL (resampling of the estimated log likelihood) method of Kishino et al. (1990). The RELL method is a good approximation to the computationally intensive bootstrap method (Felsenstein 1985) in estimating bootstrap probability (Hasegawa and Kishino 1994). The results of the ML analyses are compared with those of the MP analysis (using the PROTPARS program of Felsenstein's [1993] PHYLIP package) and with those of the NJ analysis (Saitou and Nei 1987). (The aligned data used in this work, as well as the PROTML program, are available by anonymous ftp from sunmh.ism.ac.jp (133.58.12.20): /pub/data/guinea-pig and /pub/molphy*.)

Results and Discussion

Using the Previous Data Set

Table 2 gives the results of the analyses of the same data set as in Hasegawa et al. (1992). Since the revised version of PROTML calculates the transition-probability matrix by a more precise numerical method than the previous version did, the estimated likelihoods differ very slightly from the previous ones (Hasegawa et al. 1992), although our argument remains exactly the same. Likelihoods were evaluated for the three candidate trees shown in figure 1: tree I, in which Caviomorpha and Myomorpha form a monophyletic group (the traditional view); tree II, in which Caviomorpha and Primates form a clade excluding Myomorpha as an outgroup; and tree III, in which Myomorpha and Primates form a clade excluding Caviomorpha as an outgroup (Graur et al.'s [1991] hypothesis). For all the conservative proteins in the table, the ML method is congruent with the MP method under every model, except for lipoprotein lipase and hemoglobin β, whose ML trees under some of the models differ from the MP trees, with very minor log-likelihood differences.
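The two statistics named above, the standard error of a log-likelihood difference (Kishino and Hasegawa 1989) and the RELL bootstrap probability (Kishino et al. 1990), can both be computed from per-site log likelihoods. The sketch below assumes those per-site values have already been produced by an ML program (PROTML in this work); the function names are hypothetical, only the Python standard library is used, and this is not the PROTML implementation.

```python
import math
import random


def kh_difference(site_lnl_a, site_lnl_b):
    """Total log-likelihood difference (tree A minus tree B) and its standard
    error, estimated from the variance of the per-site differences."""
    d = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(d)
    mean = sum(d) / n
    variance = n / (n - 1) * sum((x - mean) ** 2 for x in d)
    return sum(d), math.sqrt(variance)


def rell_bootstrap(site_lnls, replicates=10_000, seed=1):
    """RELL bootstrap probabilities: resample sites with replacement, re-sum
    each tree's fixed per-site log likelihoods (no re-optimization of branch
    lengths), and count how often each tree has the highest total."""
    rng = random.Random(seed)
    n_sites = len(site_lnls[0])
    wins = [0] * len(site_lnls)
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sample) for tree in site_lnls]
        wins[totals.index(max(totals))] += 1  # ties go to the first tree listed
    return [w / replicates for w in wins]
```

In use, `site_lnls` would hold one list per candidate tree (trees I, II, and III here), each with one log likelihood per aligned site. Skipping re-optimization in every replicate is what makes RELL much cheaper than the full bootstrap.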
Bootstrap probabilities of tree III for the conservative proteins as a whole are 0.77, 0.78, 0.72, and 0.68 for the JTT, Dayhoff, proportional, and Poisson models, respectively, and 0.99 for the MP method. Table 2 also gives bootstrap probabilities for individual proteins, estimated by the NJ method. We cannot combine the results of the NJ analyses over diverse proteins as we did for the ML and MP analyses, but the results of the NJ analyses are consistent with those of the ML analyses. For the nonconservative proteins, on the other hand, the ML and MP methods often do not support tree III, and the NJ method never supports that tree. When all the nonconservative proteins in the table are considered together, the ML method prefers tree I of rodent monophyly, irrespective of the model, while the MP method prefers tree II (another rodent polyphyly tree), but neither result is statistically significant. Since the unequal rate effect on the MP method is known to be stronger for divergent sequences than for conservative ones, this analysis might be regarded as in accord with tree III of rodent polyphyly, as claimed by Graur et al. (1991, 1992) and Li et al. (1992a, 1992b). However, tree I, which has long been accepted by morphologists, cannot be ruled out with any statistical significance by the ML analysis, because its bootstrap probability for the conservative proteins is estimated to be as high as 0.21, 0.19, 0.27, and 0.30 under the JTT, Dayhoff, proportional, and Poisson models, respectively. We used four alternative models of amino acid substitution in the ML analysis, and the ML tree sometimes differs among models. However, the dependence of the ML tree on the model does not necessarily mean that the ML method is sensitive to violation of the assumed underlying model, as Li et al. (1992a) claim.
It merely indicates that the data are not sufficient to discriminate among the candidate trees, because none of the differences is statistically significant. Nevertheless, it is of course desirable to use a model that approximates the data as closely as possible. Using Akaike's information criterion (AIC) (Akaike 1973; for a review, see Sakamoto et al. 1986), defined as AIC = -2 × (log likelihood) + 2 × (no. of parameters), we can compare how adequately the four models of amino acid substitution approximate the data; the model with the minimum AIC is considered the most appropriate to represent the data. Table 3 shows that, for all the proteins analyzed in this work, the JTT and Dayhoff models are by far more appropriate than the proportional and Poisson models. Furthermore, the JTT model is better than the Dayhoff model, except for hemoglobins α and β. These exceptions may arise because the hemoglobins carry a large weight in the estimation of Dayhoff et al.'s matrix; the JTT model seems to approximate the evolution of a wider range of proteins than the Dayhoff model does.

Problems in the Previous Data Set

The analyses mentioned above are mostly the same as those of Hasegawa et al. (1992), except for (1) the introduction of Li et al.'s (1992a) classification of proteins into conservative and nonconservative and (2) the additional model of amino acid substitution (the JTT model). However, the data set contains several problems, as follows.

Table 2
ML, MP, and NJ Analyses of the Relationship among Caviomorpha, Myomorpha, and Primates, as Shown in Figure 1
CONSERVATIVE PROTEINS TREE Crys HbU ML method with JTT model: I ML -6.2 + 8.0 II -0.5 + 7.0 -10.9 f 6.3 III -5.6 + 4.5 ML ML method with Dayhoff model: I ... .. ML -5.6 + 7.9 II . . . . -0.5 f 7.1 -10.5 & 6.0 II1 -5.6 ? 4.5 ML ML method with proportional model: I .
ML -6.6 & 10.2 I1 -6.3 + 9.1 -13.4 + 7.8 III -11.0 +- 7.6 ML ML method with Poisson model: I ML -3.3 -+ 9.4 II -5.9 f 9.5 -10.9 + 6.5 III -11.2 + 7.7 ML MP method: I ..... MP +4 II +2 +8 III +3 MP MJ methods: I .83 .04 II II .OO III .OO L96 NONCONSERVATIVE PROTEINS HbP Lipa colt Overall” Pb Lact ML -3.8 + 3.7 +-O/I’* 5.6 -1.7 * 3.3 -0.2 + 4.0 ML -8.2 + 7.1 -10.8 f 6.2 ML -10.1 f 13.3 -20.2 f 11.4 ML .2147 .Ol66 .7687 ML -1.1 f 4.9 -3.6 -t 3.7 ML -3.5 + 3.6 -0.2 * 5.5 -3.0 + 3.5 ML -0.2 f 4.6 -8.3 & 6.9 -10.6 & 6.1 ML -10.9 + 13.1 -19.0 f 11.5 ML .1938 .0237 .7825 -1.9 f 5.4 -4.3 f 4.1 ML -3.8 f 6.2 -0.7 f 7.3 ML -8.8 f 7.9 -10.5 f 7.3 ML -10.2 * 17.0 -24.2 f 14.2 ML -2.8 +- 5.5 -4.8 f 4.4 ML -4.9 + 6.6 -1.5 f 7.8 ML -8.5 f 8.5 -10.6 -+ 7.7 ML +1 +5 MP +9 +6 MP +4 +5 MP .61 zl .39 .oo .49 .04 -47 .16 : 84 Pb Fac9 Ribo IIISU Overall 8 Pb ML -9.2 + 5.9 -3.4 -+ 7.8 -2.7 f 2.7 ML -2.3 ? 3.0 -3.1 + 2.8 ML -2.6 -+ 3.2 -2.0 + 2.7 ML -2.0 -+ 2.7 ML -2.5 + 8.9 -6.1 _+ 8.8 .5482 .3238 .I280 -4.0 f 15.9 -16.5 + 13.6 ML .3885 .0423 .5692 ML -4.1 + 5.3 -5.4 + 4.5 ML -9.2 + 6.1 -2.3 ? 8.3 -3.8 & 3.4 ML -3.6 f 3.5 -3.4 2 3.0 ML -2.9 f 3.4 -1.6 f 2.2 ML -1.6 + 2.2 ML -4.6 + 9.4 -7.0 f 9.6 .6036 .2480 .1484 -3.8 + 16.2 -16.6 f 13.9 ML .3886 .0444 .5670 .2662 .0165 .7173 ML -5.8 +- 4.8 -5.8 + 4.6 ML -12.1 f 7.1 -5.6 + 9.2 -2.9 f 4.0 ML -4.0 f 3.5 -3.8 f 3.9 ML -2.9 2 4.5 ML -1.7 f 2.5 -1.2 f 2.8 ML -12.9 f 10.4 -12.7 f IO.9 .8197 .0827 .0976 ML -26.9 f 16.8 -2.5 + 20.2 .544 1 .0118 ,444 1 -8.2 * 17.0 -22.4 + 14.4 ML .3029 .0217 .6754 ML -6.3 ? 5.1 -5.9 f 5.2 ML -11.6 f 7.1 -5.0 -t 9.1 -3.2 * 4.1 ML -4.3 f 3.6 -3.2 ? 3.5 ML -2.0 + 4.3 ML -1.9 f 2.7 -1.5 + 2.9 ML -13.3 f 10.5 -12.2 + 11.1 .8162 .0688 .I150 ML -27.5 f 17.5 -4.0 ? 20.4 .5674 .0124 .4202 +15 +23 MP .0148 .OOOl .985 1 +5 MP +4 +5 MP +1 MP +3 +3 MP +2 +1 +1 MP +5 MP +3 .1404 .5556 .3040 +17 +20 MP .0303 .OlO8 .9589 .68 Y5 .07 .77 K .22 .33 .46 5 .12 .78 z .47 z .12 NGF +I OVERALL’ SOURCE.-Hasegawa et al. ( 1992) data set. 
NOTE. For the ML analyses, the highest-likelihood tree for each protein is indicated as "ML," and the differences of the log likelihoods of alternative trees from that of the ML tree are shown with their standard errors. For the MP analyses, the most parsimonious tree is indicated as "MP," and the differences in the numbers of substitutions of alternative trees from that of the MP tree are shown. For the NJ analyses, the bootstrap probabilities of each of the three alternative trees being the NJ tree during bootstrap resampling are given for individual proteins (with 100 replications), and those of the NJ trees for the real data are underlined. Abbreviations of protein names are as in table 1.
a Summation of log-likelihood differences from the diverse proteins.
b Bootstrap probability of being the ML or MP tree among alternatives during bootstrap resampling, estimated by the RELL method with 10’ replications.

Table 3
Difference of AIC from That of the Minimum AIC Model for the Analyses Presented in Table 2
Protein | JTT | Dayhoff | Proportional | Poisson
Cryst | 0 | 13.5 | 44.7 | 82.1
Hbα | 31.6 | 0 | 123.5 | 181.7
Hbβ | 33.6 | 0 | 237.6 | 269.7
Lipa | 0 | 8.0 | 260.7 | 327.3
Cort | 0 | 36.0 | 214.2 | 308.0
Lact | 0 | 8.6 | 92.9 | 110.5
NGF | 0 | 26.4 | 112.4 | 168.5
Fac9 | 0 | 30.6 | 247.6 | 297.7
Ribo | 0 | 4.0 | 135.6 | 172.0
Insu | 0 | 8.4 | 65.9 | 66.3
Overall | 0 | 70.3 | 1,469.9 | 1,918.6

The α-crystallin A data set may contain paralogous sequences. The human (Homo sapiens [Hsa]) differs from the guinea pig (Cavia porcellus [Cpo]) and the rat (Rattus norvegicus [Rno]) at seven and eight sites, respectively, in the 172 amino acids of α-crystallin A, whereas the guinea pig differs from the rat at only one site. On the other hand, two strepsirhines, the potto (Perodicticus potto; SWISS-PROT P02495) and the brown lemur (Lemur fulvus; P02494), differ from the guinea pig at three and two sites, respectively, and differ from the human, which must be closer to the strepsirhines, at as many as nine and eight sites, respectively.
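The site counts quoted above are simple pairwise differences, and the relative rate test that the text applies to these α-crystallin A sequences compares how many changes fall on each of two lineages relative to an outgroup. The text does not say which relative rate test was used; the sketch below shows one simple, well-known option, Tajima's (1993) one-degree-of-freedom test, purely as an illustration. It assumes gap-free, pre-aligned sequences, and the toy sequences in the example are made up.

```python
def count_differences(seq1, seq2):
    """Number of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) 1D relative rate test.

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup);
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    Under equal rates, chi2 = (m1 - m2)**2 / (m1 + m2) is approximately
    chi-square distributed with 1 degree of freedom (5% critical value ~3.84).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = sum(1 for a, b, o in zip(seq1, seq2, outgroup) if a != b and b == o)
    m2 = sum(1 for a, b, o in zip(seq1, seq2, outgroup) if a != b and a == o)
    chi2 = (m1 - m2) ** 2 / (m1 + m2) if m1 + m2 else 0.0
    return m1, m2, chi2


if __name__ == "__main__":
    print(count_differences("MKTAYIAKQR", "MKTAYLAKQR"))  # 1
    print(tajima_relative_rate("MKTAYIAKQR", "MKTAYLAKQR", "MKTAYLAKER"))  # (1, 0, 1.0)
```

Sites where the two ingroup sequences agree with each other but not with the outgroup are ignored, because they carry no information about which of the two lineages evolved faster.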
There is no indication of significant variation in the evolutionary rate of α-crystallin A among these species, on the basis of a relative rate test using the red kangaroo (Macropus rufus [Mru]) as an outgroup. This suggests that, while the strepsirhine sequences are orthologous to those of the guinea pig and the rat, the human sequence used in the previous analysis may be paralogous to these sequences and hence does not provide information on the phylogenetic relationship among species. For this reason, and because the strepsirhine sequences differ at too few sites from the guinea pig and rat sequences to carry much phylogenetic information, we do not use the α-crystallin A sequences in the following analyses.

In the previous analysis of hemoglobin α, the human (Hsa), the gelada baboon (Theropithecus gelada [Tge], an Old World monkey), and the spider monkey (Ateles geoffroyi [Age], a New World monkey) were used to represent Primates, and the relationship ((Hsa,Tge),Age) was assumed. The hemoglobin α of the gelada baboon differs from those of the human and the spider monkey at 12 and 14 sites, respectively, in 141 amino acids, while the human differs from the spider monkey at only 5 sites. At first glance, this may seem to indicate that the relationship ((Hsa,Age),Tge) holds for the hemoglobin α sequences and that the gelada baboon sequence is paralogous to those of the human and the spider monkey. However, an ML analysis of these sequences favors ((Hsa,Tge),Age) over ((Hsa,Age),Tge), with a log-likelihood difference of 3.8 ± 4.6 (Dayhoff model; ± refers to SE). Although the relationship ((Hsa,Tge),Age) used in the previous analysis is justified in this way, it implies exceptionally rapid evolution in the gelada baboon lineage relative to that of other primates. For this reason, we use the green monkey (Cercopithecus aethiops [Cae]) instead of the gelada baboon to represent the Old World monkeys in the following analysis.
Furthermore, we also use the slender loris (Loris tardigradus [Lta], a strepsirhine). Another problem in the previous analysis of hemoglobin α is that only the North American opossum (Didelphis marsupialis [Dma]) was used as an outgroup. It is known, however, that the opossum α chain has apparently evolved more rapidly than other α chains and that its mode of evolution differs from those of other species (Stenzel 1974). The opossum hemoglobin α has unusual features, and it seems to have lost some selective constraints for unknown reasons (Kimura 1983). Therefore, it is desirable to use more sequences as an outgroup, to increase the stability of the results. Hence, the duckbill platypus (Ornithorhynchus anatinus [Oan]), the Australian echidna (Tachyglossus aculeatus [Tat]), the eastern gray kangaroo (Macropus giganteus [Mgi]), and the eastern quoll (Dasyurus viverrinus [Dvi]) are used as outgroups, in addition to Didelphis marsupialis (table 1).

Hemoglobin α and β

For this modified data set of hemoglobin α, the ML relationships within the outgroup and within Primates are ((Oan,Tac),(((Mgi,Dvi),Dma),ingroup)) and (((Hsa,Cae),Age),Lta), respectively; differences of log likelihood among trees I, II, and III are shown in table 4. This data set favors tree I over the alternatives, although the difference is not statistically significant, irrespective of the model used for the ML analysis. The MP analysis is consistent with the ML analysis. Since tree III was preferred for the previous data set of hemoglobin α, this analysis indicates that the preferred tree depends on the species set used (Lecointre et al. 1993). The second set of species should be preferred, since it is more complete (hence, it contains more information and allows a better polarization of characters). For hemoglobin β, more species were also used than in the previous analysis (table 1).
The ML relationships within the outgroup, within Primates, and within Myomorpha are ((Oan,Tac),((Mgi,Dma),ingroup)), (((Hsa,Cae),Age),Lta), and (((Mmu,Rno),Mau),Mbr), respectively. This data set also favors tree I, irrespective of the model used for the ML analysis, although the MP analysis supports trees I and III equally (table 4).

Myoglobin

The guinea pig myoglobin sequence has not yet been published, and hence we instead use myoglobins from other members of the New World Hystricomorpha: the casiragua (Proechimys guairae [Pgu]) and the viscacha (Lagostomus maximus [Lma]). This data set strongly supports tree I (fig. 2), irrespective of the model used for the ML analysis (table 4).

[Figure 2: ML tree of myoglobin; leaf labels surviving extraction include Callithrix jacchus, Macaca fascicularis, Homo sapiens, Ornithorhynchus anatinus, and Tachyglossus aculeatus; scale bar 0.1 substitutions/site]
FIG. 2. ML tree of myoglobin (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions. The root is arbitrarily placed on the branch leading to Monotremata. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.

To confirm tree I of rodent monophyly, we must evaluate whether the casiragua and the viscacha are closely related to the guinea pig. We can use pancreatic ribonuclease for this purpose, because this protein has been sequenced both from the casiragua and from the guinea pig (but not from the viscacha). To evaluate the monophyly of Hystricomorpha, all 105 possible trees among CpoA, CpoB, (Cbr,Pgu,Mco), Hhy, ((Mmu,Rno),Mau), and Hsa were investigated. The ML tree, shown in figure 3, suggests the monophyly of Hystricomorpha. The subtotal of the bootstrap probabilities of the trees among the 105 that show hystricomorph monophyly is as high as 0.9505 (JTT model). Our ML tree of ribonuclease is consistent with the Hystricomorpha part of the MP tree (Beintema 1985; Fitch and Beintema 1990). Therefore, using the casiragua and the viscacha (whose close relationship is evident from the myoglobin data) instead of the guinea pig in analyzing the myoglobins is justified.

[Figure 3: ML tree of pancreatic ribonuclease; leaf labels surviving extraction include Chinchilla brevicaudata, Myocastor coypus, Mus musculus, Mesocricetus auratus, Ondatra zibethicus, and Presbytis entellus, with the Myomorpha and Primates groups marked]
FIG. 3. ML tree of pancreatic ribonuclease (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions. This is an unrooted tree, but, from the overall evidence presented in this paper, the root is likely to be located on the branch leading to the primate lineage.

Cytochrome b

One of the problems in molecular phylogenetic analyses using proteins encoded by nuclear DNA is that we may be comparing paralogous rather than orthologous proteins. Since, when matched for homology, mitochondrial DNA sequences from different species always come from orthologous genes, mitochondrial DNA sequence data from a guinea pig would be of great help in resolving the riddle of the phylogenetic placement of this group. Ma et al.'s (1993) data on cytochrome b are important in this respect. On the basis of an NJ analysis of cytochrome b sequences from 26 species, including the guinea pig, the African porcupine (Hystrix africaeaustralis [Haf]), myomorphs (mouse Mus musculus and rat Rattus norvegicus), 11 artiodactyls, dolphins, perissodactyls (zebra Equus grevyi and rhinoceros Diceros bicornis), human (Homo sapiens), elephant (Loxodonta africana), the South American opossum (Monodelphis domestica [Mdo]), chicken (Gallus gallus [Gga]), frog (Xenopus laevis [Xla]), and sturgeon (Acipenser transmontanus), they suggested that Hystricomorpha, including the guinea pig and the African porcupine, are closer to Artiodactyla, Perissodactyla, Primates, and Proboscidea than to Myomorpha (tree II in fig.
1), although they admitted that their suggestion is not strongly supported.

We reanalyzed the cytochrome b data by the ML method. The data from the human, mouse, rat, guinea pig, and African porcupine were analyzed by using the opossum (Mdo), chicken (Gga), frog (Xla), carp (Cyprinus carpio [Cca]), and loach (Crossostoma lacustre [Cla]) as outgroups (table 1), with the relationship ((Cca,Cla),(Xla,(Gga,(Mdo,ingroup)))). The artiodactyls, perissodactyls, dolphins, human, and elephant used in Ma et al.'s analysis were excluded, so as to restrict our analysis to the relationships among the relevant taxa, and the 15 possible trees linking Primates (human), Myomorpha (mouse and rat), guinea pig, porcupine, and the outgroup were examined. Consistent with Ma et al.'s NJ analysis, the guinea pig/porcupine clade is strongly supported, with bootstrap probabilities of 0.90-0.96 under the four models of the ML analysis. In addition, our ML analysis favors tree I over tree II, irrespective of the assumed model, although without statistical significance (fig. 4 and table 4). The MP analysis is equivocal for these data. The close relationship between the South American and the African Hystricomorpha is in accord with the hypothesis that the South American ones originated in Africa (Wyss et al. 1993).

Da Silva and Patton (1993) sequenced a partial cytochrome b gene from several arboreal species of the caviomorph family Echimyidae from the Amazon Basin.
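The numbers of candidate trees examined in these analyses (15 here, and 105 in the ribonuclease and extended cytochrome b analyses) follow from a standard combinatorial result: n labeled units admit (2n - 5)!! unrooted binary trees, which equals the number of rooted binary trees on the n - 1 ingroup units when one unit serves as the outgroup. The sketch below checks these counts and enumerates the rooted topologies by stepwise addition; representing trees as nested tuples and the unit labels in the example are assumptions made for illustration, not the authors' software.

```python
def num_unrooted_trees(n_units):
    """(2n - 5)!!: number of unrooted binary topologies for n labeled units."""
    if n_units < 3:
        raise ValueError("need at least 3 units")
    count = 1
    for k in range(3, 2 * n_units - 4, 2):
        count *= k
    return count


def rooted_topologies(units):
    """Yield every rooted binary tree on `units` as nested 2-tuples, built by
    adding each new unit to every branch (including one above the root)."""
    if len(units) == 1:
        yield units[0]
        return
    for tree in rooted_topologies(units[:-1]):
        yield from _attach_everywhere(tree, units[-1])


def _attach_everywhere(tree, leaf):
    yield (tree, leaf)  # the new leaf joins on the branch above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in _attach_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in _attach_everywhere(right, leaf):
            yield (left, new_right)


if __name__ == "__main__":
    print(num_unrooted_trees(5), num_unrooted_trees(6))  # 15 105
    ingroup = ["Primates", "Myomorpha", "Cavia", "Hystrix"]  # rooted by the outgroup
    print(len(list(rooted_topologies(ingroup))))  # 15
```

Because the count grows as a double factorial, exhaustive evaluation of every tree, as done here, is practical only when the problem is reduced to a handful of units with fixed internal relationships.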
Although including these data reduces the number of commonly compared sites to 265, we additionally analyzed the cytochrome b data with five sequences of Echimyidae species, to examine the effect of including further species: Dactylomys boliviensis (Dbo; EMBL/GenBank/DDBJ L23339), Dactylomys dactylinus (Dda; L23336), Echimys chrysurus (Ech; L23340), and Makalata didelphoides JLP15214 and LHE554 (Mdi1 and Mdi2; L23356 and L23361), among which the phylogenetic relationship is unambiguously assigned as ((Mdi1,Mdi2),(Ech,(Dbo,Dda))). The 105 possible trees linking Echimyidae, Caviidae (Cavia porcellus), Hystricidae (Hystrix africaeaustralis), Myomorpha (Mus musculus and Rattus norvegicus), and Primates (Homo sapiens), with the outgroup used in the preceding analysis, were examined by the ML method based on the JTT model. A Caviidae/Echimyidae clade excluding all the others is strongly supported, with a bootstrap probability of 0.92. Furthermore, although without statistical significance, a clade formed by Caviidae, Echimyidae, and Hystricidae is favored, with a bootstrap probability of 0.52. Thus, the most likely relationship within Hystricomorpha is ((Caviidae,Echimyidae),Hystricidae). Under this relationship within Hystricomorpha, tree I among Primates, Myomorpha, and Hystricomorpha is best supported. Trees II and III have log likelihoods lower than that of tree I by 7.5 ± 5.4 and 6.9 ± 5.7, respectively, again not in accord with Graur et al.'s hypothesis.

[Figure 4: ML tree of cytochrome b; leaf labels surviving extraction include Homo sapiens, Monodelphis domestica, Gallus gallus, Xenopus laevis, Crossostoma lacustre, and Cyprinus carpio; scale bar 0.1 substitutions/site]
FIG. 4. ML tree of cytochrome b (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions. The root is arbitrarily placed on the branch leading to Actinopterygii. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.
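The clade supports quoted in these analyses (for example, 0.92 for Caviidae/Echimyidae and 0.52 for Caviidae plus Echimyidae plus Hystricidae) are subtotals: the RELL bootstrap probabilities of all candidate trees that contain the clade are summed. A minimal sketch, assuming each candidate tree is rooted by the outgroup and written as nested tuples; the three trees and their probabilities in the example are invented and are not results from this study.

```python
def clades(tree):
    """Set of leaf sets (frozensets) found below the internal nodes of a rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def clade_support(tree_probs, clade):
    """Sum the probabilities of all (tree, probability) pairs whose tree contains `clade`."""
    target = frozenset(clade)
    return sum(p for tree, p in tree_probs if target in clades(tree))


if __name__ == "__main__":
    # Hypothetical trees (outgroup omitted) with made-up bootstrap probabilities
    candidates = [
        ((("Cav", "Ech"), "Hys"), "Myo"),
        (("Cav", "Ech"), ("Hys", "Myo")),
        ((("Cav", "Hys"), "Ech"), "Myo"),
    ]
    pairs = list(zip(candidates, [0.5, 0.3, 0.2]))
    print(round(clade_support(pairs, {"Cav", "Ech"}), 6))  # 0.8
    print(round(clade_support(pairs, {"Cav", "Ech", "Hys"}), 6))  # 0.7
```

Clades are compared as unordered leaf sets, so ((Cav,Hys),Ech) and ((Cav,Ech),Hys) both count toward the three-family clade even though they resolve it differently.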
Cytochrome Oxidase Subunit II (COII)

Another mitochondrial element relevant to the guinea pig problem is COII. Although COII from the guinea pig has not yet been published, we can instead use the data from the Cape mole rat (Georychus capensis) (Adkins and Honeycutt 1993), which belongs to the African hystricomorph family Bathyergidae and is probably closely related to the guinea pig, as suggested by the cytochrome b analysis. Since including many species from a diverse group should stabilize the result of the analysis, we used human (Hsa), pygmy chimpanzee (Pan paniscus [Ppa]), gorilla (Gorilla gorilla [Ggo]), siamang (Hylobates syndactylus [Hsy]), crab-eating macaque (Macaca fascicularis [Mfa]), rhesus macaque (Macaca mulatta [Mmul]), mantled howler monkey (Alouatta palliata [Apa]), Humboldt's woolly monkey (Lagothrix lagotricha [Lla]), lesser bushbaby (Galago senegalensis [Gse]), and ring-tailed lemur (Lemur catta [Lca]), among which the phylogenetic relationship is unambiguously assigned as ((((((Hsa,Ppa),Ggo),Hsy),(Mfa,Mmul)),(Apa,Lla)),(Gse,Lca)) (Ruvolo et al. 1991; Adkins and Honeycutt 1993). Tree I is supported by all the models of ML, as well as by the MP analysis (table 4). Figure 5 gives the ML tree of COII, based on the JTT model.

[Figure 5: ML tree of COII; leaf labels surviving extraction include Homo sapiens, Pan paniscus, Gorilla gorilla, Hylobates syndactylus, Macaca fascicularis, Macaca mulatta, Alouatta palliata, Lagothrix lagotricha, Didelphis virginiana, Gallus gallus, Xenopus laevis, Crossostoma lacustre, and Cyprinus carpio; scale bar 0.1 substitutions/site]
FIG. 5. ML tree of COII (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions. The root is arbitrarily placed on the branch leading to Actinopterygii. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.

The close relationship between a hystricomorph, Georychus capensis,
and Myomorpha is in accord with the MP analysis by Adkins and Honeycutt (1993).

Other Protein Data

The Gi3 protein α sequences from human, rat, and guinea pig, with both that from Xenopus and Gi1 protein α from guinea pig as outgroups, support tree I by either method (table 4). Integrin β1 and NADPH-cytochrome P450 reductase do not necessarily favor tree I, but their preferences are not statistically significant (table 4). The ML and MP analyses of myelin and glucagon do not show any difference among the three possible trees, and hence the analyses of these genes are not given in table 4.

Overall Evidence of Conservative Proteins

Table 4 summarizes the ML and MP analyses of the revised data set for the conservative proteins, under Li et al.'s (1992a) criterion. Although the MP analysis still very weakly favors tree III, the overall evidence of the ML analysis supports the traditional tree I, irrespective of the assumed model, and the JTT model, which best approximates the data among the alternative models, supports tree I with a bootstrap probability as high as 0.92.

The strongest support for tree III from the MP analysis is provided by lipoprotein lipase (table 4). For this protein, the number of substitutions in tree I is greater than that in tree III by as many as 9 ± 3.6. Tree III is also preferred to tree I by every ML model, but the log-likelihood difference is very small (1.7 ± 3.3 for the JTT model). In tree I in particular, the branch leading to the guinea pig is much longer than the other branches (data not shown). This may be an example of the MP analysis being misled by the unequal-rate effect. Of course, it is also possible that we are comparing paralogous genes for lipoprotein lipase. If we exclude this protein, the MP analysis in table 4 no longer prefers tree III to tree I.

For the revised data set, even when we exclude nonconservative proteins, tree I is still supported by the ML analysis (table 4). There seems to be no reason to believe that the ML analyses of nonconservative proteins are less reliable than those of conservative ones. Rather, nonconservative proteins may carry more information than conservative ones, especially when PROTML is used. Since the overall evidence of conservative and nonconservative proteins from the ML analyses supports tree I, the rodent polyphyly hypothesis claimed by Graur et al. (1991) seems unlikely. Recently, Martignetti and Brosius (1993) demonstrated that BC1 RNA, the product of a retropositionally generated gene, is present in Sciurognathi and the guinea pig but not in other mammalian orders, including Primates, Artiodactyla, and Lagomorpha. Their finding is in accord with our analyses and is at odds with the rodent polyphyly hypothesis.

Conclusions

Li et al. (1992a) claim that the ML method that we used is model dependent, while the MP method used by Graur et al. is not. From our viewpoint, any method of data analysis is based on some assumption (model), explicitly or implicitly. The ML method has a solid statistical background and is based on an explicit model, and hence its assumptions are clear. Of course, if the assumptions deviate too much from the real underlying process, the inferred tree may be erroneous. Therefore, we must continue to improve the models used in the ML method by taking into account newly obtained knowledge of the molecular evolutionary process. In this sense, the ML method is flexible enough to incorporate new evidence about the underlying process. In contrast, the assumptions of the MP method are not necessarily clear. It is known that the MP method behaves badly in some situations (Felsenstein 1978; Hendy and Penny 1989; Hasegawa et al. 1991; Hasegawa and Fujiwara 1993), and hence, although the MP method is the most popular in molecular phylogenetics (e.g., see Stewart 1993), we must be careful in using it. Furthermore, there are good indications that the ML method is robust against violations of the assumed model when inferring a branching order (Fukami-Kobayashi and Tateno 1991; Hasegawa and Fujiwara 1993), and the trees inferred from real data are consistent across different models if the SEs of the estimates are taken into account (Hasegawa et al. 1993; Hashimoto et al. 1993; present study; authors' unpublished data). So far, we have not found any case in which ML under different models yields trees that are inconsistent with statistical significance.

Graur et al.'s hypothesis was at first supported by the MP method, with high statistical significance. However, the overall evidence of the ML analysis of the revised data set does not support this hypothesis, which radically contradicts the traditional view. This work demonstrates that we must be careful in choosing, among alternative methods, a proper method for phylogenetic inference and that an argument based on a small data set (with respect to both the length of the sequences and, especially, the number of species) may be unstable. It must be noted that this work does not necessarily exclude the rodent polyphyly hypothesis, and we think that more data are needed to settle the issue, as Graur et al. (1991) and Li et al. (1992a) admit. It must also be noted that what we have studied in this work is limited to the relation among Hystricomorpha (Caviomorpha), Myomorpha, and Primates.
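The ± values quoted in the results (for example, a log-likelihood difference of 1.7 ± 3.3 for lipoprotein lipase) and the bootstrap probability of 0.92 for tree I are statistics computed from the per-site scores of competing topologies. The Python sketch below illustrates two such calculations under stated assumptions: a Kishino-Hasegawa-style standard error for the difference in total log-likelihood between two trees (Kishino and Hasegawa 1989), and a RELL-style bootstrap that resamples fixed per-site log-likelihoods without re-estimating branch lengths (Kishino, Miyata, and Hasegawa 1990). We assume, but this section does not state, that the reported values were obtained in this way; the function names and the per-site numbers are hypothetical and are not PROTML's interface or data from this study.

```python
# Sketch under stated assumptions: function names are hypothetical,
# not PROTML's interface, and the input numbers below are made up.
import math
import random


def kh_difference(site_lnl_a, site_lnl_b):
    """Total log-likelihood difference lnL(A) - lnL(B) and its standard
    error, estimated from the variance of the per-site differences
    (a Kishino-Hasegawa-style calculation)."""
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees must be scored on the same sites")
    n = len(site_lnl_a)
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of a sum of n per-site terms: n / (n - 1) * sum of squared deviations.
    var = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return delta, math.sqrt(var)


def rell_bootstrap(site_lnl_by_tree, replicates=10000, seed=0):
    """RELL-style bootstrap: resample sites with replacement, re-sum each
    tree's fixed per-site log-likelihoods, and return, for each tree, the
    fraction of replicates in which it has the highest total (ties go to
    the tree listed first)."""
    rng = random.Random(seed)
    n_sites = len(site_lnl_by_tree[0])
    wins = [0] * len(site_lnl_by_tree)
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sample) for tree in site_lnl_by_tree]
        wins[totals.index(max(totals))] += 1
    return [w / replicates for w in wins]


# Hypothetical per-site log-likelihoods for trees I, II, and III
# (six sites only, for illustration; not values from this study).
tree_I = [-5.1, -3.2, -7.8, -2.4, -6.0, -4.4]
tree_II = [-5.3, -3.2, -8.1, -2.6, -5.9, -4.7]
tree_III = [-5.2, -3.4, -7.9, -2.4, -6.3, -4.5]

delta, se = kh_difference(tree_I, tree_III)
print(f"lnL(I) - lnL(III) = {delta:.2f} +/- {se:.2f}")
print("RELL bootstrap proportions:", rell_bootstrap([tree_I, tree_II, tree_III]))
```

Resampling fixed per-site scores, rather than re-fitting every tree to each bootstrap replicate, is what makes the RELL shortcut cheap; Hasegawa and Kishino (1994) examine how accurate such simple estimates of the bootstrap probability are.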
Even if the close relationship between Hystricomorpha and Myomorpha, with Primates as an outgroup, is established, some of the other eutherian orders may be closer to either Hystricomorpha or Myomorpha than these two groups are to each other. The difficulty in resolving the branching order among Hystricomorpha, Myomorpha, and Primates, as well as the outgroup status of Myomorpha with respect to Primates, Carnivora, Artiodactyla, and Cetacea shown by recent works (Easteal 1988, 1990; Li et al. 1990; Janke et al., in press; authors' unpublished data), may suggest a high taxonomic rank for Hystricomorpha, even if the traditional view of rodent monophyly holds.

Acknowledgments

We thank H. Kishino, M. Milinkovitch, M. Novacek, and J. Powell for valuable comments on the manuscript and for discussions. Thanks are also due to two anonymous reviewers for their valuable comments on an earlier version of the manuscript. This work was carried out under the Institute of Statistical Mathematics Cooperative Research Program (grants 92-ISM-CRP-81 and 93-ISM-CRP-C2) and was supported by grants from the Ministry of Education, Science, and Culture of Japan.

LITERATURE CITED

ADACHI, J., and M. HASEGAWA. 1992. MOLPHY: programs for molecular phylogenetics. I. PROTML: maximum likelihood inference of protein phylogeny. Computer Science Monographs, no. 27. Institute of Statistical Mathematics, Tokyo.
ADKINS, R. M., and R. L. HONEYCUTT. 1993. A molecular examination of Archontan and Chiropteran monophyly. Pp. 227-249 in R. D. E. MACPHEE, ed. Primates and their relatives in phylogenetic perspective. Plenum, New York.
AKAIKE, H. 1973. Information theory and an extension of the maximum likelihood principle. Pp. 267-281 in B. N. PETROV and F. CSAKI, eds. Second International Symposium on Information Theory. Akademiai Kiado, Budapest.
BEINTEMA, J. J. 1985. Amino acid sequence data and evolutionary relationships among hystricognaths and other rodents. Pp. 549-565 in W. LUCKETT and J.-L. HARTENBERGER, eds. Evolutionary relationships among rodents: a multidisciplinary analysis. Plenum, New York.
BOTTEMA, C. D. K., R. P. KETTERLING, S. LI, H.-S. YOON, J. A. PHILLIPS III, and S. S. SOMMER. 1991. Missense mutations and evolutionary conservation of amino acids: evidence that many of the amino acids in factor IX function as "spacer" elements. Am. J. Hum. Genet. 49:820-838.
DA SILVA, M. N. F., and J. L. PATTON. 1993. Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha). Mol. Phylogenet. Evol. 2:243-255.
DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT. 1978. A model of evolutionary change in proteins. Pp. 345-352 in M. O. DAYHOFF, ed. Atlas of protein sequence and structure. Vol. 5, suppl. 3. National Biomedical Research Foundation, Washington, D.C.
EASTEAL, S. 1988. Rate constancy of globin gene evolution in placental mammals. Proc. Natl. Acad. Sci. USA 85:7622-7626.
EASTEAL, S. 1990. The pattern of mammalian evolution and the relative rate of molecular evolution. Genetics 124:165-173.
FELSENSTEIN, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401-410.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
FELSENSTEIN, J. 1993. PHYLIP, version 3.5. University of Washington, Seattle.
FITCH, W. M., and J. J. BEINTEMA. 1990. Correcting parsimonious trees for unseen nucleotide substitutions: the effect of dense branching as exemplified by ribonuclease. Mol. Biol. Evol. 7:438-443.
FUKAMI-KOBAYASHI, K., and Y. TATENO. 1991. Robustness of maximum likelihood tree estimation against different patterns of base substitutions. J. Mol. Evol. 32:79-91.
GRAUR, D., W. A. HIDE, and W.-H. LI. 1991. Is the guinea-pig a rodent? Nature 351:649-652.
GRAUR, D., W. A. HIDE, A. ZHARKIKH, and W.-H. LI. 1992. The biochemical phylogeny of guinea-pigs and gundis, and the paraphyly of the order rodentia. Comp. Biochem. Physiol. [B] 101:495-498.
HASEGAWA, M., Y. CAO, J. ADACHI, and T. YANO. 1992. Rodent polyphyly? Nature 355:595.
HASEGAWA, M., and M. FUJIWARA. 1993. Relative efficiencies of the maximum likelihood, maximum parsimony, and neighbor joining methods for estimating protein phylogeny. Mol. Phylogenet. Evol. 2:1-5.
HASEGAWA, M., T. HASHIMOTO, J. ADACHI, N. IWABE, and T. MIYATA. 1993. Early divergences in the evolution of eukaryotes: ancient divergence of Entamoeba that lacks mitochondria revealed by protein sequence data. J. Mol. Evol. 36:380-388.
HASEGAWA, M., and H. KISHINO. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Mol. Biol. Evol. 11:142-145.
HASEGAWA, M., H. KISHINO, and N. SAITOU. 1991. On the maximum likelihood method in molecular phylogenetics. J. Mol. Evol. 32:443-445.
HASHIMOTO, T., E. OTAKA, J. ADACHI, K. MIZUTA, and M. HASEGAWA. 1993. The giant panda is most close to a bear, judged by α- and β-hemoglobin sequences. J. Mol. Evol. 36:282-289.
HENDY, M. D., and D. PENNY. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297-309.
JANKE, A., G. FELDMAIER-FUCHS, W. K. THOMAS, A. VON HAESELER, and S. PÄÄBO. The marsupial mitochondrial genome and the evolution of placental mammals. Genetics (in press).
JONES, D. T., W. R. TAYLOR, and J. M. THORNTON. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
KIMURA, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
KISHINO, H., and M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
KISHINO, H., T. MIYATA, and M. HASEGAWA. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 30:151-160.
LECOINTRE, G., H. PHILIPPE, H. L. V. LÊ, and H. LE GUYADER. 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phylogenet. Evol. 2:205-224.
LI, W.-H., M. GOUY, P. M. SHARP, C. O'HUIGIN, and Y.-W. YANG. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc. Natl. Acad. Sci. USA 87:6703-6707.
LI, W.-H., W. A. HIDE, and D. GRAUR. 1992a. Origin of rodents and guinea-pigs. Nature 359:277-278.
LI, W.-H., W. A. HIDE, A. ZHARKIKH, D.-P. MA, and D. GRAUR. 1992b. The molecular taxonomy and evolution of the guinea pig. J. Hered. 83:174-181.
LUCKETT, W. P., and J.-L. HARTENBERGER. 1985. Evolutionary relationships among rodents: comments and conclusions. Pp. 685-712 in W. LUCKETT and J.-L. HARTENBERGER, eds. Evolutionary relationships among rodents: a multidisciplinary analysis. Plenum, New York.
MA, D.-P., A. ZHARKIKH, D. GRAUR, J. L. VANDEBERG, and W.-H. LI. 1993. Structure and evolution of opossum, guinea pig, and porcupine cytochrome b genes. J. Mol. Evol. 36:327-334.
MARTIGNETTI, J. A., and J. BROSIUS. 1993. Neural BC1 RNA as an evolutionary marker: guinea pig remains a rodent. Proc. Natl. Acad. Sci. USA 90:9698-9702.
NOVACEK, M. J. 1992. Mammalian phylogeny: shaking the tree. Nature 356:121-125.
RUVOLO, M., T. R. DISOTELL, M. W. ALLARD, W. M. BROWN, and R. L. HONEYCUTT. 1991. Resolution of the African hominoid trichotomy by use of a mitochondrial gene sequence. Proc. Natl. Acad. Sci. USA 88:1570-1574.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SAKAMOTO, Y., M. ISHIGURO, and G. KITAGAWA. 1986. Akaike information criterion statistics. Reidel, Dordrecht.
SHEPPARD, D., C. ROZZO, L. STARR, V. QUARANTA, D. J. ERLE, and R. PYTELA. 1990. Complete amino acid sequence of a novel integrin β subunit (β6) identified in epithelial cells using the polymerase chain reaction. J. Biol. Chem. 265:11502-11507.
STENZEL, P. 1974. Opossum Hb chain sequence and neutral mutation theory. Nature 252:62-63.
STEWART, C.-B. 1993. The powers and pitfalls of parsimony. Nature 361:603-607.
WYSS, A. R., J. J. FLYNN, M. A. NORELL, C. C. SWISHER III, R. CHARRIER, M. J. NOVACEK, and M. C. MCKENNA. 1993. South America's earliest rodent and recognition of a new interval of mammalian evolution. Nature 365:434-437.

TAKASHI GOJOBORI, reviewing editor

Received November 4, 1993; accepted March 16, 1994