Phylogenetic Place of Guinea Pigs: No Support of the Rodent-Polyphyly
Hypothesis from Maximum-Likelihood Analyses of Multiple
Protein Sequences
Ying Cao,* Jun Adachi,† Taka-aki Yano,‡ and Masami Hasegawa*,†
*The Institute of Statistical Mathematics; †Department of Statistical Science, The Graduate University for Advanced Studies; and ‡College of Arts and Sciences, Showa University
Graur et al.'s (1991) hypothesis that the guinea pig-like rodents have an evolutionary origin within mammals that is separate from that of other rodents (the rodent-polyphyly hypothesis) was reexamined by the maximum-likelihood method for protein phylogeny, as well as by the maximum-parsimony and neighbor-joining methods. The overall evidence does not support Graur et al.'s hypothesis, which radically contradicts the traditional view of rodent monophyly. This work demonstrates that we must be careful in choosing a proper method for phylogenetic inference and that an argument based on a small data set (with respect to the length of the sequence and especially the number of species) may be unstable.
Introduction
On the basis of molecular phylogenetic analyses of proteins, Graur et al. (1991, 1992) and Li et al. (1992b) suggested that the order Rodentia may not be monophyletic and that, within mammals, the guinea pig-like rodents (Caviomorpha with the New World families, or Hystricomorpha including the Old World families as well as the New World ones) may have an evolutionary origin separate from that of the ratlike rodents (Myomorpha) and the squirrel-like rodents (Sciuromorpha). They further suggested that the Caviomorpha separated from other rodents before the divergence among Rodentia, Primates, and Artiodactyla (tree III in fig. 1). Their suggestion contradicts the traditional view of rodent monophyly (tree I in fig. 1), which is based mainly on comparative morphology (Luckett and Hartenberger 1985; Novacek 1992).
They used a maximum-parsimony (MP) method in estimating the tree, but it is known that the MP method is sometimes misleading, particularly when the evolutionary rate differs among lineages (Felsenstein 1978). Therefore, we reexamined the data by a maximum-likelihood (ML) method for protein sequences (Kishino et al. 1990; Adachi and Hasegawa 1992) that is robust against the violation of rate constancy (Hasegawa et al. 1991; Hasegawa and Fujiwara 1993), and a preliminary result has been presented in a note by Hasegawa et al. (1992). Although the MP analysis supported Graur et al.'s tree of the rodent polyphyly, with a bootstrap probability as high as 96%, the result of the ML analysis was equivocal, and hence we suggested the possibility that Graur et al.'s tree represents an example of the fact that unequal evolutionary rates can mislead MP analysis (Felsenstein 1978; Hasegawa et al. 1991).
Key words: guinea pig, Caviomorpha, rodents, mammalian evolution, maximum-likelihood tree of proteins.
Address for correspondence and reprints: Masami Hasegawa, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan.
Mol. Biol. Evol. 11(4):593-604. 1994.
© 1994 by The University of Chicago. All rights reserved.
0737-4038/94/1104-0003$02.00
Li et al. (1992a) responded to our note. They noted that the proteins used by Hasegawa et al. (1992) can be classified into two groups, that is, conservative and nonconservative proteins. It is known that the unequal rate effect on the MP method is stronger for divergent sequences than for well-conserved ones. Li et al. (1992a) found that, for most of the conservative proteins, the ML and MP methods are congruent, favoring the rodent polyphyly tree of Graur et al. (1991). From our point of view, however, because the support of the rodent polyphyly tree by ML is not statistically significant, we cannot accept this tree, which radically contradicts the traditional tree.
More recently, Ma et al. (1993) sequenced the mitochondrial cytochrome b gene from the guinea pig, the African porcupine (Hystrix africaeaustralis), and the South American opossum (Monodelphis domestica) and suggested, by using the neighbor-joining (NJ) method (Saitou and Nei 1987), another rodent polyphyly tree, where Hystricomorpha (the guinea pig and the African porcupine) and Primates form a clade excluding Myomorpha as an outgroup (tree II in fig. 1). This tree is at odds with Graur et al.'s (1991) tree, as well as with the traditional tree of the rodent monophyly.
Tree I: (((Caviomorpha,Myomorpha),Primates),outgroup)
Tree II: (((Caviomorpha,Primates),Myomorpha),outgroup)
Tree III: (((Myomorpha,Primates),Caviomorpha),outgroup)
FIG. 1.-Three possible phylogenetic trees among Caviomorpha (Hystricomorpha), Myomorpha, and Primates. Tree I represents the traditional view of Caviomorpha and Myomorpha forming a clade as rodents, while tree III represents the view of Graur et al. (1991) that Myomorpha and Primates form a clade excluding Caviomorpha as an outgroup.
In this way, there remain many uncertainties with
regard to the phylogenetic position of the guinea pig.
Since this issue is critical to the methodology of molecular phylogenetics, as well as to the specific problem of
mammalian phylogeny, we here present a detailed analysis of this problem.
Material and Methods
We examine the relationship among Caviomorpha (Hystricomorpha), Myomorpha, and Primates. The protein sequence data used in this work are shown in table 1. Positions with gaps, as well as areas where alignment was ambiguous, were excluded from the analyses.
In the first analysis, we use the same data set as that used by Hasegawa et al. (1992). The data are mostly from the SWISS-PROT data library and are mostly the same as those used by Graur et al. (1991). We additionally use factor IX sequences (Bottema et al. 1991), with factor X as an outgroup. Following Li et al. (1992a), we take α-crystallin A, α-globin, β-globin, lipoprotein lipase, and lipocortin I as “conservative” proteins and take α-lactalbumin, β-nerve growth factor, factor IX, pancreatic ribonuclease, and proinsulin as “nonconservative” proteins. These two groups are analyzed separately.
Table 1
List of Data Used in the Analyses

1. α-Crystallin A (Crys); no. of sites: 172
   Primates: Hsaᵃ (P02489)
   Myomorpha: Rnoᵃ (P02490)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P02491)
   Outgroup: Mruᵃ (P02502)
2. α-Globin (Hbα); no. of sites: 141
   Primates: Hsaᵃ (P01922), Cae (P01926), Ageᵃ (P01927), Lta (P01938), Tgeᵃ,ᵇ (P01932)
   Myomorpha: Mmuᵃ (P01942), Mauᵃ (P01945)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P01947)
   Outgroup: Oan (P01979), Tac (P01977), Dmaᵃ (P01976), Mgi (P01975), Dvi (P07419)
3. β-Globin (Hbβ); no. of sites: 146
   Primates: Hsaᵃ (P02023), Cae (P02028), Ageᵃ (P02034), Lta (P02048), Mmulᵃ,ᵇ (P02026)
   Myomorpha: Mmu (P02088), Rnoᵃ (P02091), Mauᵃ (P02094)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P02095)
   Outgroup: Oan (P02111), Tac (P02110), Dmaᵃ (P02109), Mgiᵃ (P02106)
4. Lipoprotein lipase (Lipa); no. of sites: 443
   Primates: Hsaᵃ (P06858)
   Myomorpha: Mmuᵃ (P11152)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P11153)
   Outgroup: Ggaᵃ (P11602)
5. Lipocortin I (Cort); no. of sites: 339
   Primates: Hsaᵃ (P04083)
   Myomorpha: Mmuᵃ (P10107), Rnoᵃ (P07150)
   Caviomorpha (Hystricomorpha): Ccuᵃ (P14087)
   Outgroup: Cliᵃ (P14950)
6. α-Lactalbumin (Lact); no. of sites: 119
   Primates: Hsaᵃ (P00709)
   Myomorpha: Rnoᵃ (P00715)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P00713)
   Outgroup: Mrufoᵃ (P07458)
7. β-Nerve growth factor (NGF); no. of sites: 235
   Primates: Hsaᵃ (P01138)
   Myomorpha: Mmuᵃ (P01139)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P19093)
   Outgroup: Ggaᵃ (P05200)
8. Factor IX (Fac9); no. of sites: 238
   Primates: Hsaᵃ (Bottema91)
   Myomorpha: Mmuᵃ (Bottema91), Rnoᵃ (Bottema91)
   Caviomorpha (Hystricomorpha): Cpoᵃ (Bottema91)
   Outgroup: HsaXᵃ (Bottema91), BtaXᵃ (Bottema91)
9. Pancreatic ribonuclease (Ribo); no. of sites: 122
   Primates: Hsaᵃ (P07998), Pen (P19644)
   Myomorpha: Mmuᵃ (P00683), Rnoᵃ,ᵇ (P00684), Mauᵃ (P00682), Ozi (P00681)
   Caviomorpha (Hystricomorpha): CpoA (P00678), CpoBᵃ (P00679), Hhyᵃ (P00677), Cbr (P00675), Mco (P00676), Pgu (P04059)
   Outgroup: Mruᵃ (P00686)
10. Proinsulin (Insu); no. of sites: 81
   Primates: Hsaᵃ (P01308)
   Myomorpha: Rno1ᵃ (P01322), Rno2ᵃ (P01323)
   Caviomorpha (Hystricomorpha): Cpoᵃ (P01329)
   Outgroup: Ggaᵃ (P01332)
tween the log likelihood of alternative trees and that of
the ML tree, by the formula of Kishino and Hasegawa (1989), and estimated the bootstrap probability by the RELL (resampling of the estimated log likelihood) method given by Kishino et al. (1990). The RELL method is a good approximation to the computationally intensive bootstrap method (Felsenstein 1985) in estimating bootstrap probability (Hasegawa and Kishino 1994). The results of the ML analyses are compared with those of the MP analysis (using the PROTPARS program of Felsenstein's [1993] PHYLIP package) and with those of the NJ (Saitou and Nei 1987) analysis. (The aligned data used in this work, as well as the PROTML program, are available by anonymous ftp in sunmh.ism.ac.jp (133.58.12.20):/pub/data/guinea-pig and /pub/molphy*.)
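As a rough illustration of the two quantities described above, the following Python sketch computes the log-likelihood difference between two trees together with its standard error, using the per-site variance estimate generally attributed to Kishino and Hasegawa (1989), and approximates bootstrap probabilities by resampling per-site log likelihoods in the spirit of the RELL method. It is not the PROTML implementation: the function names, the default number of replicates, and any input values are hypothetical, and per-site log likelihoods for each candidate tree are assumed to be available from a separate likelihood program.

```python
import math
import random


def kh_difference(site_lnl_a, site_lnl_b):
    """Return (difference, SE) of the total log likelihood of tree A minus tree B.

    The variance is estimated from the per-site differences d_i as
    n / (n - 1) * sum((d_i - mean(d))**2), where n is the number of sites.
    """
    n = len(site_lnl_a)
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    total = sum(diffs)
    mean = total / n
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return total, math.sqrt(variance)


def rell_bootstrap(site_lnl, replicates=10_000, seed=0):
    """Approximate bootstrap probabilities by resampling per-site log likelihoods.

    site_lnl maps a tree label to its list of per-site log likelihoods.
    Sites are resampled with replacement; no likelihood is re-optimized.
    """
    rng = random.Random(seed)
    trees = list(site_lnl)
    n = len(site_lnl[trees[0]])
    wins = dict.fromkeys(trees, 0)
    for _ in range(replicates):
        sample = [rng.randrange(n) for _ in range(n)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in trees}
```

Because only the stored per-site values are resampled, the cost per replicate is a few sums, which is why the approach is much cheaper than a full bootstrap.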
Results and Discussion
Using the Previous Data Set
Table 2 gives results of the analyses of the same data set as in Hasegawa et al. (1992). Since, in calculating the transition-probability matrix, the revised version of PROTML adopts a numerical method more precise than that used in the previous version, the estimated likelihoods differ very slightly from the previous ones (Hasegawa et al. 1992), although our argument remains exactly the same.
Likelihoods of the three candidate trees shown in figure 1 were evaluated: tree I, in which Caviomorpha and Myomorpha form a monophyletic group representing the traditional view; tree II, in which Caviomorpha and Primates form a clade excluding Myomorpha as an outgroup; and tree III, in which Myomorpha and Primates form a clade excluding Caviomorpha as an outgroup, representing Graur et al.'s (1991) hypothesis. For all the conservative proteins in the table, the ML method for all the models is congruent with the MP method, except for the ML trees of lipoprotein lipase and hemoglobin β, which, by some of the models, differ from the MP trees, with very minor log-likelihood differences. Bootstrap probabilities of tree III for the conservative proteins as a whole are 0.77, 0.78, 0.72, and 0.68, for the JTT, Dayhoff, proportional, and Poisson models, respectively, and 0.99 for the MP method.
Table 2 also gives bootstrap probabilities for individual proteins, estimated by the NJ method. We cannot
synthesize the results of the NJ analyses for diverse proteins as we did for the ML and MP analyses, but the
results of the NJ analyses are consistent with those of
the ML.
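The way results from diverse proteins are combined for the MP analysis can be sketched in a few lines of Python. The sketch assumes that the overall score of a tree is simply the sum of its per-protein scores, re-expressed relative to the best tree; the numbers of extra substitutions are those we read from the MP rows of table 2 for the conservative proteins, and the names `EXTRA_STEPS` and `combine_extra_steps` are hypothetical.

```python
TREES = ("I", "II", "III")

# Extra substitutions relative to each protein's most parsimonious tree
# (0 marks the MP tree), as read from table 2 for the conservative proteins.
EXTRA_STEPS = {
    "Crys": {"I": 0, "II": 2, "III": 3},
    "Hba": {"I": 4, "II": 8, "III": 0},
    "Hbb": {"I": 1, "II": 5, "III": 0},
    "Lipa": {"I": 9, "II": 6, "III": 0},
    "Cort": {"I": 4, "II": 5, "III": 0},
}


def combine_extra_steps(per_protein):
    """Sum extra steps over proteins and re-reference them to the best tree."""
    totals = {t: sum(p[t] for p in per_protein.values()) for t in TREES}
    best = min(totals.values())
    return {t: totals[t] - best for t in TREES}


print(combine_extra_steps(EXTRA_STEPS))
```

With these inputs the combined differences are 15, 23, and 0 for trees I, II, and III, which agrees with the overall MP entries for the conservative proteins. The same summation applies to log-likelihood differences, with the best tree being the one with the highest total.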
For the nonconservative proteins, on the other hand, the ML and MP methods often do not support tree III, and the NJ method never supports that tree. When all the nonconservative proteins in the table are considered together, the ML method prefers tree I of rodent monophyly, irrespective of the model, while the MP method prefers tree II (another rodent polyphyly tree), but neither result is statistically significant.
Since the unequal rate effect on the MP method is known to be stronger for divergent sequences than for conservative ones, this analysis might be regarded as in accord with tree III of the rodent polyphyly, as claimed by Graur et al. (1991, 1992) and Li et al. (1992a, 1992b). However, tree I, which has long been accepted by morphologists, cannot, with any statistical significance, be ruled out by the ML analysis, because the bootstrap probability of this tree for the conservative proteins is estimated to be as high as 0.21, 0.19, 0.27, and 0.30, by the JTT, Dayhoff, proportional, and Poisson models, respectively.
We used four alternative models for amino acid
substitutions in the ML analysis, and the ML tree sometimes differs among models. However, the dependence
of the ML tree on the model does not necessarily mean
that the ML method is sensitive to violation of the assumed underlying model, as Li et al. ( 1992a) claim. It
merely indicates that the data are not sufficient to discriminate among the candidate trees, because none of
the differences are statistically significant. Nevertheless,
it is of course desirable to use a model that approximates
the data as closely as possible.
By using Akaike's information criterion (AIC) (Akaike 1973; for review, see Sakamoto et al. 1986), defined by AIC = -2 × (log likelihood) + 2 × (no. of parameters), we can compare the adequacy of the approximation of the four models for amino acid substitutions. The minimum AIC model is considered to be the most appropriate to represent the data. Table 3 shows that, for all the proteins analyzed in this work, the JTT and Dayhoff models are more appropriate by far than are the proportional and Poisson models. Furthermore, the JTT model is better than the Dayhoff model, except for hemoglobins α and β. These exceptions may be due to the hemoglobins contributing a large weight in the estimation of Dayhoff et al.'s matrix, and the JTT model seems to be appropriate in approximating the evolution of a wider range of proteins than is approximated by the Dayhoff model.
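The AIC comparison can be illustrated with a short Python sketch. It assumes that the proteins are treated as independent, so that log likelihoods and parameter counts (and hence AIC values) add across proteins; under that assumption the overall row of table 3 should follow from the per-protein rows. The per-protein differences below are transcribed from table 3 in the order Crys, Hbα, Hbβ, Lipa, Cort, Lact, NGF, Fac9, Ribo, Insu; the function names are hypothetical.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: -2 * log likelihood + 2 * (no. of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


# Per-protein differences of AIC from the minimum-AIC model (table 3).
DELTA_AIC = {
    "JTT": [0, 31.6, 33.6, 0, 0, 0, 0, 0, 0, 0],
    "Dayhoff": [13.5, 0, 0, 8.0, 36.0, 8.6, 26.4, 30.6, 4.0, 8.4],
    "Proportional": [44.7, 123.5, 237.6, 260.7, 214.2, 92.9, 112.4, 247.6, 135.6, 65.9],
    "Poisson": [82.1, 181.7, 269.7, 327.3, 308.0, 110.5, 168.5, 297.7, 172.0, 66.3],
}


def overall_delta_aic(per_protein):
    """Sum the per-protein differences and re-reference them to the best model."""
    totals = {model: sum(values) for model, values in per_protein.items()}
    best = min(totals.values())
    return {model: round(total - best, 1) for model, total in totals.items()}


print(overall_delta_aic(DELTA_AIC))
```

The computed overall differences (0, 70.3, 1,469.9, and 1,918.6 for the JTT, Dayhoff, proportional, and Poisson models) agree with the overall row of table 3.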
Problems in the Previous Data Set
The analyses mentioned above are mostly the same as those of Hasegawa et al. (1992), except for (1) the introduction of Li et al.'s (1992a) classification of proteins into conservative and nonconservative and (2) the additional model (the JTT model) of amino acid substitutions. However, the data set contains several problems, as follows:
Table 2
ML, MP, and NJ Analyses of the Relationship among Caviomorpha, Myomorpha, and Primates, as Shown in Figure 1

Protein               Tree I          Tree II         Tree III

ML method with JTT model:
 Conservative proteins
  Crys                ML              -0.5 ± 7.0      -5.6 ± 4.5
  Hbα                 -6.2 ± 8.0      -10.9 ± 6.3     ML
  Hbβ                 ML              -3.8 ± 3.7      -0.4 ± 5.6
  Lipa                -1.7 ± 3.3      -0.2 ± 4.0      ML
  Cort                -8.2 ± 7.1      -10.8 ± 6.2     ML
  Overallᵃ            -10.1 ± 13.3    -20.2 ± 11.4    ML
  Pᵇ                  .2147           .0166           .7687
 Nonconservative proteins
  Lact                ML              -1.1 ± 4.9      -3.6 ± 3.7
  NGF                 ML              -9.2 ± 5.9      -3.4 ± 7.8
  Fac9                -2.7 ± 2.7      ML              -2.3 ± 3.0
  Ribo                -3.1 ± 2.8      ML              -2.6 ± 3.2
  Insu                -2.0 ± 2.7      ML              -2.0 ± 2.7
  Overallᵃ            ML              -2.5 ± 8.9      -6.1 ± 8.8
  Pᵇ                  .5482           .3238           .1280
 All proteins
  Overallᵃ            -4.0 ± 15.9     -16.5 ± 13.6    ML
  Pᵇ                  .3885           .0423           .5692

ML method with Dayhoff model:
 Conservative proteins
  Crys                ML              -0.5 ± 7.1      -5.6 ± 4.5
  Hbα                 -5.6 ± 7.9      -10.5 ± 6.0     ML
  Hbβ                 ML              -3.5 ± 3.6      -0.2 ± 5.5
  Lipa                -3.0 ± 3.5      ML              -0.2 ± 4.6
  Cort                -8.3 ± 6.9      -10.6 ± 6.1     ML
  Overallᵃ            -10.9 ± 13.1    -19.0 ± 11.5    ML
  Pᵇ                  .1938           .0237           .7825
 Nonconservative proteins
  Lact                ML              -4.1 ± 5.3      -5.4 ± 4.5
  NGF                 ML              -9.2 ± 6.1      -2.3 ± 8.3
  Fac9                -3.8 ± 3.4      ML              -3.6 ± 3.5
  Ribo                -3.4 ± 3.0      ML              -2.9 ± 3.4
  Insu                -1.6 ± 2.2      ML              -1.6 ± 2.2
  Overallᵃ            ML              -4.6 ± 9.4      -7.0 ± 9.6
  Pᵇ                  .6036           .2480           .1484
 All proteins
  Overallᵃ            -3.8 ± 16.2     -16.6 ± 13.9    ML
  Pᵇ                  .3886           .0444           .5670

ML method with proportional model:
 Conservative proteins
  Crys                ML              -6.3 ± 9.1      -11.0 ± 7.6
  Hbα                 -6.6 ± 10.2     -13.4 ± 7.8     ML
  Hbβ                 -1.9 ± 5.4      -4.3 ± 4.1      ML
  Lipa                -3.8 ± 6.2      -0.7 ± 7.3      ML
  Cort                -8.8 ± 7.9      -10.5 ± 7.3     ML
  Overallᵃ            -10.2 ± 17.0    -24.2 ± 14.2    ML
  Pᵇ                  .2662           .0165           .7173
 Nonconservative proteins
  Lact                ML              -5.8 ± 4.8      -5.8 ± 4.6
  NGF                 ML              -12.1 ± 7.1     -5.6 ± 9.2
  Fac9                -2.9 ± 4.0      ML              -4.0 ± 3.5
  Ribo                -3.8 ± 3.9      ML              -2.9 ± 4.5
  Insu                ML              -1.7 ± 2.5      -1.2 ± 2.8
  Overallᵃ            ML              -12.9 ± 10.4    -12.7 ± 10.9
  Pᵇ                  .8197           .0827           .0976
 All proteins
  Overallᵃ            ML              -26.9 ± 16.8    -2.5 ± 20.2
  Pᵇ                  .5441           .0118           .4441

ML method with Poisson model:
 Conservative proteins
  Crys                ML              -5.9 ± 9.5      -11.2 ± 7.7
  Hbα                 -3.3 ± 9.4      -10.9 ± 6.5     ML
  Hbβ                 -2.8 ± 5.5      -4.8 ± 4.4      ML
  Lipa                -4.9 ± 6.6      -1.5 ± 7.8      ML
  Cort                -8.5 ± 8.5      -10.6 ± 7.7     ML
  Overallᵃ            -8.2 ± 17.0     -22.4 ± 14.4    ML
  Pᵇ                  .3029           .0217           .6754
 Nonconservative proteins
  Lact                ML              -6.3 ± 5.1      -5.9 ± 5.2
  NGF                 ML              -11.6 ± 7.1     -5.0 ± 9.1
  Fac9                -3.2 ± 4.1      ML              -4.3 ± 3.6
  Ribo                -3.2 ± 3.5      ML              -2.0 ± 4.3
  Insu                ML              -1.9 ± 2.7      -1.5 ± 2.9
  Overallᵃ            ML              -13.3 ± 10.5    -12.2 ± 11.1
  Pᵇ                  .8162           .0688           .1150
 All proteins
  Overallᵃ            ML              -27.5 ± 17.5    -4.0 ± 20.4
  Pᵇ                  .5674           .0124           .4202

MP method:
 Conservative proteins
  Crys                MP              +2              +3
  Hbα                 +4              +8              MP
  Hbβ                 +1              +5              MP
  Lipa                +9              +6              MP
  Cort                +4              +5              MP
  Overallᵃ            +15             +23             MP
  Pᵇ                  .0148           .0001           .9851
 Nonconservative proteins
  Lact                +5              MP              +4
  NGF                 +1              +5              MP
  Fac9                +1              MP              +3
  Ribo                +3              MP              +2
  Insu                +1              +1              MP
  Overallᵃ            +5              MP              +3
  Pᵇ                  .1404           .5556           .3040
 All proteins
  Overallᵃ            +17             +20             MP
  Pᵇ                  .0303           .0108           .9589

NJ method:
 Conservative proteins
  Crys                .83             .17             .00
  Hbα                 .04             .00             .96
  Hbβ                 .61             .00             .39
  Lipa                .49             .04             .47
  Cort                .16             .00             .84
 Nonconservative proteins
  Lact                .68             .25             .07
  NGF                 .77             .01             .22
  Fac9                .33             .46             .21
  Ribo                .12             .78             .10
  Insu                .47             .41             .12

SOURCE.-Hasegawa et al. (1992) data set.
NOTE.-For the ML analyses, the highest likelihood tree for each protein is indicated as “ML,” and the differences of log likelihoods of alternative trees from that of the ML tree are shown with their standard errors. For the MP analyses, the most parsimonious tree is indicated as “MP,” and the differences of numbers of substitutions of alternative trees from that of the MP tree are shown. For the NJ analyses, bootstrap probabilities of three alternative trees of being an NJ tree during bootstrap resamplings are given for individual proteins (with 100 replications), and those of NJ trees for the real data are underlined. Abbreviations of protein names are as in table 1.
ᵃ Summation of log-likelihood differences from the diverse proteins.
ᵇ Bootstrap probability of being the ML or MP tree, among alternatives during bootstrap resampling, estimated by the RELL method with 10⁴ replications.
Table 3
Difference of AIC from That of the Minimum AIC Model for the Analyses Presented in Table 2

Protein     JTT       Dayhoff   Proportional   Poisson
Crys        0         13.5      44.7           82.1
Hbα         31.6      0         123.5          181.7
Hbβ         33.6      0         237.6          269.7
Lipa        0         8.0       260.7          327.3
Cort        0         36.0      214.2          308.0
Lact        0         8.6       92.9           110.5
NGF         0         26.4      112.4          168.5
Fac9        0         30.6      247.6          297.7
Ribo        0         4.0       135.6          172.0
Insu        0         8.4       65.9           66.3
Overall     0         70.3      1,469.9        1,918.6
The α-crystallin A data set may contain paralogous sequences. The human (Homo sapiens [Hsa]) sequence differs from those of the guinea pig (Cavia porcellus [Cpo]) and the rat (Rattus norvegicus [Rno]) at seven and eight sites, respectively, in 172 amino acids of α-crystallin A, and the guinea pig differs from the rat at only one site. On the other hand, two strepsirhines, the potto (Perodicticus potto; SWISS-PROT P02495) and the brown lemur (Lemur fulvus; P02494), differ from the guinea pig at three and two sites, respectively, and differ from the human, which must be closer to the strepsirhines, at as many as nine and eight sites, respectively. There is no indication of significant variation of the evolutionary rate of α-crystallin A among these species, on the basis of a relative rate test using the red kangaroo (Macropus rufus [Mru]) as an outgroup. This suggests that, while the strepsirhine sequences are orthologous to those of the guinea pig and the rat, the human sequence used in the previous analysis may be paralogous to these sequences and, hence, does not provide us with information on the phylogenetic relationship among species. For this reason, and since the strepsirhine sequences differ at too few sites from the guinea pig and rat sequences to have much phylogenetic information, we do not use the α-crystallin A sequences in the following analyses.
In the previous analysis of hemoglobin α, the human (Hsa), the gelada baboon (Theropithecus gelada [Tge], an Old World monkey), and the spider monkey (Ateles geoffroyi [Age], a New World monkey) were used from Primates, and the relationship ((Hsa,Tge),Age) was used. The hemoglobin α of the gelada baboon differs from those of the human and the spider monkey at 12 and 14 sites, respectively, in 141 amino acids, while the human differs from the spider monkey at only 5 sites. At first glance, this may seem to indicate that the relationship ((Hsa,Age),Tge) holds for the hemoglobin α sequences and that the gelada baboon sequence is paralogous to those of the human and the spider monkey. However, an ML analysis of these sequences favors ((Hsa,Tge),Age) over ((Hsa,Age),Tge), with a log-likelihood difference of 3.8 ± 4.6 (the Dayhoff model; ± refers to SE). Although the relationship ((Hsa,Tge),Age), which was used in the previous analysis, is justified in this way, this relationship indicates exceptionally rapid evolution in the gelada baboon lineage, relative to that of other primates. For this reason, we use the green monkey (Cercopithecus aethiops [Cae]) for the Old World monkey, instead of gelada baboon, in the following analysis. Furthermore, we also use the slender loris (Loris tardigradus [Lta], a strepsirhine).
Another problem in the previous analysis of hemoglobin α is that only the North American opossum (Didelphis marsupialis [Dma]) was used as an outgroup. It is known, however, that the opossum α chain has apparently evolved more rapidly than have other α chains and that its mode of evolution differs from those of other species (Stenzel 1974). The opossum hemoglobin α has unusual features, and it seems to have lost some selective constraints, because of unknown causes (Kimura 1983). Therefore, it may be desirable to use more sequence data as an outgroup, to increase the stability of the results. Hence, the duckbill platypus (Ornithorhynchus anatinus [Oan]), the Australian echidna (Tachyglossus aculeatus [Tac]), the eastern gray kangaroo (Macropus giganteus [Mgi]), and the southeastern quoll (Dasyurus viverrinus [Dvi]) are used as outgroups, in addition to Didelphis marsupialis (table 1).
Hemoglobin α and β
For this modified data set of hemoglobin α, the ML relationships within the outgroup and within Primates are ((Oan,Tac),(((Mgi,Dvi),Dma),ingroup)) and (((Hsa,Cae),Age),Lta); and differences of log likelihood among trees I, II, and III are shown in table 4. It turned out that this data set favors tree I over the alternatives, although the difference is not statistically significant, irrespective of the model used for ML analysis. The MP analysis is consistent with the ML analysis. Since tree III was preferred for the previous data set of hemoglobin α, this analysis indicates that the preferred tree depends on the species set that we use (Lecointre et al. 1993). The second set of species should be preferred, since it is more complete (hence, it contains more information and allows a better polarization of characters).
For hemoglobin β, more species were also used than were used in the previous analysis (table 1). The ML relationships within the outgroup, within Primates, and within Myomorpha are ((Oan,Tac),((Mgi,Dma),ingroup)), (((Hsa,Cae),Age),Lta), and (((Mmu,Rno),Mau),Mbr), respectively. This data set also favors tree I, irrespective of the model for the ML analysis, although the MP analysis supports trees I and III equally (table 4).
Myoglobin
The guinea pig myoglobin sequence has not been published yet, and hence we instead use myoglobins from other members of the New World Hystricomorpha: the casiragua (Proechimys guairae [Pgu]) and the viscacha (Lagostomus maximus [Lma]). This data set strongly supports tree I (fig. 2), irrespective of the model for ML analysis (table 4).
FIG. 2.-ML tree of myoglobin (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions (scale bar, 0.1 substitutions/site). The root is arbitrarily placed on the branch leading to Monotremata. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.
FIG. 3.-ML tree of pancreatic ribonuclease (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions (scale bar, 0.7 substitutions/site). This is an unrooted tree, but, from the overall evidence presented in this paper, the root is likely to be located on the branch leading to the primate lineage.
To confirm tree I of the rodent monophyly, we must evaluate whether casiragua and viscacha are closely related to guinea pig. We can use pancreatic ribonuclease for this purpose, because this protein has been sequenced both from the casiragua and from the guinea pig (but not from the viscacha). To evaluate the monophyletic relationship among Hystricomorpha, all 105 trees possible among CpoA, CpoB, (Cbr,Pgu,Mco), Hhy, ((Mmu,Rno),Mau), and Hsa were investigated. The ML tree is shown in figure 3, suggesting the monophyly of Hystricomorpha. The subtotal of bootstrap probabilities for the monophyly of Hystricomorpha among the 105 trees is as high as 0.9505 (the JTT model). Our ML tree of ribonuclease is consistent with the Hystricomorpha part of the MP tree (Beintema 1985; Fitch and Beintema 1990). Therefore, the use of the casiragua and the viscacha (the close relationship between these two is evident from the myoglobin data), instead of guinea pig, in analyzing the myoglobins is justified.
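The figure of 105 candidate trees in the ribonuclease analysis is consistent with the standard count of distinct unrooted bifurcating topologies for n labeled tips, (2n - 5)!! = 3 × 5 × ... × (2n - 5), if each pre-grouped clade such as (Cbr,Pgu,Mco) is treated as a single tip. The following Python sketch evaluates that count; the function name is hypothetical, and the interpretation of the six units as six tips is our reading of the text.

```python
def count_unrooted_trees(n_tips):
    """Number of unrooted bifurcating topologies for n labeled tips: (2n - 5)!!."""
    if n_tips < 3:
        raise ValueError("at least three tips are needed")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


for n in (4, 5, 6):
    print(n, count_unrooted_trees(n))
```

Six tips give 105 topologies and five tips give 15, so an exhaustive evaluation of every candidate tree remains cheap at this scale.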
Cytochrome b
One of the problems in molecular phylogenetic analyses using proteins encoded by nuclear DNA is that we may be comparing paralogously related proteins rather than orthologously related ones. Since, when matched for homology, sequences of mitochondrial DNA from different species will always be from orthologous genes, mitochondrial DNA sequence data from a guinea pig would be of great help in resolving the riddle of the phylogenetic placement of this group. Ma et al.'s (1993) data on cytochrome b are important in this respect.
On the basis of the NJ analysis of cytochrome b sequences from 26 species including the guinea pig, the African porcupine (Hystrix africaeaustralis [Haf]), myomorphs (mouse Mus musculus and rat Rattus norvegicus), 11 artiodactyls, dolphins, perissodactyls (zebra Equus grevyi and rhinoceros Diceros bicornis), human (Homo sapiens), elephant (Loxodonta africana), the South American opossum (Monodelphis domestica [Mdo]), chicken (Gallus gallus [Gga]), frog (Xenopus laevis [Xla]), and sturgeon (Acipenser transmontanus), they suggested that Hystricomorpha, including the guinea pig and the African porcupine, are closer to Artiodactyla, Perissodactyla, Primates, and Proboscidea than to Myomorpha (tree II in fig. 1), although they admitted that their suggestion is not strongly based.
We reanalyzed the cytochrome b data by the ML analysis. The data from human, mouse, rat, guinea pig, and African porcupine were analyzed by using opossum (Mdo), chicken (Gga), frog (Xla), carp (Cyprinus carpio [Cca]), and loach (Crossostoma lacustre [Cla]) as an outgroup (table 1), with the relation ((Cca,Cla),(Xla,(Gga,(Mdo,ingroup)))). The artiodactyls, perissodactyls, dolphins, human, and elephant used in Ma et al.'s analysis were excluded, so as to restrict our analysis to the relationships among relevant taxa, and the 15 possible trees linking Primates (human), Myomorpha (mouse and rat), guinea pig, porcupine, and the outgroup were examined. Consistent with Ma et al.'s NJ analysis, the guinea pig/porcupine clade is strongly supported, with bootstrap probabilities of 0.90-0.96 for the four models of the ML analyses. In addition, our ML analysis favors tree I rather than tree II, irrespective of the assumed model, although with no statistical significance (fig. 4 and table 4). The MP analysis is equivocal for these data. The close relationship between the South American and the African Hystricomorpha is in accord with the hypothesis that the South American ones originated in Africa (Wyss et al. 1993).
FIG. 4.-ML tree of cytochrome b (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions (scale bar, 0.1 substitutions/site). The root is arbitrarily placed on the branch leading to Actinopterygii. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.
Da Silva and Patton (1993) sequenced a partial cytochrome b gene from several arboreal species of the Caviomorph family Echimyidae from the Amazon Basin. Although inclusion of these data reduces the number of commonly compared sites to 265, to examine the effect of inclusion of further species, we additionally analyzed the cytochrome b data, including five sequences of Echimyidae species, Dactylomys boliviensis (Dbo; EMBL/GenBank/DDBJ L23339), Dactylomys dactylinus (Dda; L23336), Echimys chrysurus (Ech; L23340), and Makalata didelphoides JLP15214 and LHE554 (Mdi1 and Mdi2; L23356 and L23361), among which the phylogenetic relationship is unambiguously assigned as ((Mdi1,Mdi2),(Ech,(Dbo,Dda))). The 105 possible trees, linking Echimyidae, Caviidae (Cavia porcellus), Hystricidae (Hystrix africaeaustralis), Myomorpha (Mus musculus and Rattus norvegicus), and Primates (Homo sapiens) with the outgroup used in the preceding analysis, were examined by the ML method based on the JTT model. A Caviidae/Echimyidae clade excluding all the others is strongly supported, with a bootstrap probability of 0.92. Furthermore, although with no statistical significance, a clade formed by Caviidae, Echimyidae, and Hystricidae is favored, with a bootstrap probability of 0.52. Thus, the most likely relationship in Hystricomorpha is ((Caviidae,Echimyidae),Hystricidae). Under the assumption of this relationship within Hystricomorpha, tree I among Primates, Myomorpha, and Hystricomorpha is best supported. Trees II and III have log likelihood lower by 7.5 ± 5.4 and 6.9 ± 5.7, respectively, than that of tree I, again not in accord with Graur et al.'s hypothesis.
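To see how far these log-likelihood differences are from conventional significance, each difference can be expressed in units of its standard error. The Python sketch below does this for the two values quoted above (7.5 ± 5.4 and 6.9 ± 5.7). The cutoff of 1.96 is an assumption introduced here for illustration (a two-sided 5% level under a normal approximation); the criterion actually applied in the analysis is not stated in this passage, and the function name is hypothetical.

```python
ASSUMED_CUTOFF = 1.96  # assumption: two-sided 5% level under a normal approximation


def difference_in_se_units(delta_lnl, standard_error):
    """Express a log-likelihood difference as a multiple of its standard error."""
    return delta_lnl / standard_error


for tree, delta, se in (("II", 7.5, 5.4), ("III", 6.9, 5.7)):
    ratio = difference_in_se_units(delta, se)
    verdict = "exceeds" if ratio > ASSUMED_CUTOFF else "does not exceed"
    print(f"tree {tree}: {ratio:.2f} SE, {verdict} the assumed cutoff")
```

Both ratios are below 1.5, so under this assumed cutoff neither alternative tree can be rejected, in line with the statement that tree I is favored without statistical significance.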
Cytochrome Oxidase Subunit II (COII)
Another mitochondrial element relevant to the guinea pig problem is COII. Although COII from the guinea pig has not been published yet, we can use instead the data from the Cape mole rat (Georychus capensis) (Adkins and Honeycutt 1993), belonging to the African Hystricomorph family Bathyergidae, which is probably closely related to the guinea pig, as suggested by the cytochrome b analysis. Since the inclusion of many species from the diverse group would stabilize the result of analysis, we used human (Hsa), pygmy chimpanzee (Pan paniscus [Ppa]), gorilla (Gorilla gorilla [Ggo]), siamang (Hylobates syndactylus [Hsy]), crab-eating macaque (Macaca fascicularis [Mfa]), rhesus macaque (Macaca mulatta [Mmul]), mantled howler monkey (Alouatta palliata [Apa]), Humboldt's woolly monkey (Lagothrix lagotricha [Lla]), lesser bushbaby (Galago senegalensis [Gse]), and ring-tailed lemur (Lemur catta [Lca]), among which the phylogenetic relationship is unambiguously assigned as ((((((Hsa,Ppa),Ggo),Hsy),(Mfa,Mmul)),(Apa,Lla)),(Gse,Lca)) (Ruvolo et al. 1991; Adkins and Honeycutt 1993).
Tree I is supported by all the models of ML, as well as by the MP analysis (table 4). Figure 5 gives the ML tree of COII, based on the JTT model. The close relationship between a Hystricomorph, Georychus capensis, and Myomorpha is in accord with the MP analysis by Adkins and Honeycutt (1993).
FIG. 5.-ML tree of COII (the JTT model). The horizontal length of each branch is proportional to the estimated number of amino acid substitutions (scale bar, 0.1 substitutions/site). The root is arbitrarily placed on the branch leading to Actinopterygii. The Hystricomorpha/Myomorpha clade suggested by this analysis is shaded.
els, supports tree I, with a bootstrap probability of as
high as 0.92.
The strongest support for tree III, from the MP analysis, is provided by lipoprotein lipase (table 4). The number of substitutions in tree I is greater than that in tree III, by as many as 9 ± 3.6 for this protein. Tree III is also preferred to tree I, by any of the models of ML, but the log-likelihood difference is very small (1.7 ± 3.3 for the JTT model). Particularly in tree I, the length of the branch leading to guinea pig is much greater than the lengths of the other branches (data not shown). This may be an example of misleading MP analysis, because of the unequal rate effect. Of course, it is possible that we are comparing paralogous genes for lipoprotein lipase. If we exclude this protein, the MP analysis in table 4 no longer prefers tree III to tree I.
For the revised data set, even when we exclude
nonconservative proteins, tree I is still supported by the
ML analysis (table 4). There seems to be no reason to
believe that the ML analyses of nonconservative proteins
are less reliable than those of conservative ones. Rather,
we could consider that nonconservative proteins have
more information than do conservative ones, especially
when PROTML is used. Since the overall evidence of
conservative and nonconservative proteins from the ML
analyses supports tree I, the rodent polyphyly hypothesis
claimed by Graur et al. ( 199 1) seems unlikely.
Recently, Martignetti and Brosius ( 1993) demonstrated that BC 1 RNA, the product of a retropositionally
generated gene, is present in Sciurognathi and guinea
pig but not in other mammalian orders, including Primates, Artiodactyla, and Lagomorpha. Their finding is
in accord with our analyses and is at odds with the rodent
polyphyly hypothesis.
Other Protein Data
Conclusions
Gi3 protein c1 from human, rat, and guinea pig,
with both that from Xenopus and Gil protein a from
guinea pig as outgroups, support tree I, by either method
(table 4). Integrin ,f31and NADPH-cytochrome
P450
reductase do not necessarily favor tree I, but there is no
statistical significance (table 4). The ML and MP analyses of myelin and glucagon do not show any difference
among the three possible trees, and hence the analyses
of these genes are not given in table 4.
Li et al. ( 1992a) claim that the ML method that
we used is model dependent, while the MP method used
by Graur et al. is not. From our viewpoint, any method
of data analysis is based on some assumption (model),
explicitly or implicitly. The ML method has a solid statistical background and is based on an explicit model,
and hence the assumption is clear. Of course, if the assumption deviates too much from the real underlying
process, the inferred tree may be erroneous. Therefore,
we must continue to improve the model used in the ML
method, by taking into account newly obtained knowledge on molecular evolutionary process. In this sense,
the ML method is flexible enough in including increasing
new evidence of the underlying process. In contrast, the
assumption of the MP method is not necessarily clear.
Up to now, it has been known that the MP method
behaves badly in some situations (Felsenstein 1978;
Hendy and Penny 1989; Hasegawa et al. 199 1; Hasegawa
Overall Evidence of Conservative Proteins
Table 4 summarizes the ML and MP analyses of
the revised data set for the conservative proteins, under
Li et al.‘s ( 1992~) criterion. Although the MP analysis
still very weakly favors tree III, the overall evidence of
the ML analysis supports the traditional tree I, irrespective of the assumed model, and the JTT model, which
best approximates the data among the alternative mod-
Phylogenetic Place of Guinea Pigs 603
and Fujiwara 1993), and hence, although the MP
method is the most popular in molecular phylogenetics
(e.g., see Stewart 1993)) we must be careful in using it.
Furthermore, there is good indication of robustness of
the ML method against the violation of the assumed
model in inferring a branching order (Fukami-Kobayashi and Tateno 1991; Hasegawa and Fujiwara 1993),
and the inferred trees from the real data are consistent
for different models if the SEs of the estimates are taken
into account (Hasegawa et al. 1993; Hashimoto et al.
1993; present study; authors’ unpublished data). Up to
now, we have no experience of finding inconsistent trees,
of statistical significance, among different models for
the ML.
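To make the role of the SEs concrete, the sketch below expresses the two lipoprotein lipase differences quoted earlier (9 ± 3.6 substitutions under MP, and a log-likelihood difference of 1.7 ± 3.3 under the JTT model) as multiples of their SEs. The helper name `difference_in_se_units` and the 1.96 cutoff (a two-sided 5% level under a normal approximation) are assumptions made for illustration only; they are not stated in the text as the authors' test.

```python
def difference_in_se_units(difference, standard_error):
    """Express an estimated difference as a multiple of its standard error."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return difference / standard_error


# Assumed cutoff: two-sided 5% level under a normal approximation.
THRESHOLD = 1.96

# Figures quoted in the text for lipoprotein lipase (tree I versus tree III).
mp_ratio = difference_in_se_units(9.0, 3.6)  # extra substitutions in tree I (MP)
ml_ratio = difference_in_se_units(1.7, 3.3)  # log-likelihood difference (ML, JTT)

for label, ratio in (("MP", mp_ratio), ("ML, JTT", ml_ratio)):
    verdict = "exceeds" if abs(ratio) > THRESHOLD else "does not exceed"
    print(f"{label}: {ratio:.2f} SE, {verdict} {THRESHOLD} SE")
```

Under these assumptions the MP difference is about 2.5 SE, whereas the ML difference is about 0.5 SE, in line with the description of the log-likelihood difference as very small.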
Graur et al.'s hypothesis was at first supported by
the MP method, with a high statistical significance.
However, the overall evidence of the ML analysis of the
revised data set does not support the hypothesis that
radically contradicts the traditional view. This work
demonstrates that we must be careful in choosing, among
alternative methods, a proper method for phylogenetic
inference and that an argument based on a small data
set (with respect to both the length of the sequences and,
especially, the number of species) may be unstable.
It must be noted that this work does not necessarily
exclude the rodent polyphyly hypothesis, and we think
that more data are needed to settle the issue, as Graur
et al. (1991) and Li et al. (1992a) admit. It must also
be noted that what we have studied in this work is limited
to the relation among Hystricomorpha (Caviomorpha),
Myomorpha, and Primates. Even if the close relationship
between Hystricomorpha and Myomorpha, excluding
Primates as an outgroup, is established, some of the other
eutherian orders can be close to either Hystricomorpha
or Myomorpha, rather than between these two groups.
The difficulty in resolving the branching order among
Hystricomorpha, Myomorpha, and Primates, as well
as the outgroup status of Myomorpha with respect to
Primates, Carnivora, Artiodactyla, and Cetacea shown
by recent works (Easteal 1988, 1990; Li et al. 1990; Janke
et al., in press; authors' unpublished data), may suggest
a high taxonomic rank for Hystricomorpha, even though
the traditional view of the rodent monophyly holds.
Acknowledgments
We thank H. Kishino, M. Milinkovitch, M. Novacek, and J. Powell for valuable comments on the
manuscript and for discussions. Thanks are also due to
two anonymous reviewers for their valuable comments
on an earlier version of the manuscript. This work was
carried out under the Institute of Statistical Mathematics Cooperative Research Program (grants 92-ISM-CRP-81 and 93-ISM-CRP-C2) and was supported by
grants from the Ministry of Education, Science, and
Culture of Japan.
LITERATURE CITED
ADACHI, J., and M. HASEGAWA. 1992. MOLPHY: programs
for molecular phylogenetics. I. PROTML: maximum likelihood inference of protein phylogeny. Computer Science
Monographs, no. 27. Institute of Statistical Mathematics,
Tokyo.
ADKINS, R. M., and R. L. HONEYCUTT. 1993. A molecular
examination of Archontan and Chiropteran monophyly.
Pp. 227-249 in R. D. E. MACPHEE, ed. Primates and their
relatives in phylogenetic perspective. Plenum, New York.
AKAIKE, H. 1973. Information theory and an extension of the
maximum likelihood principle. Pp. 267-281 in B. N.
PETROV and F. CSAKI, eds. Second International Symposium on Information Theory. Akademiai Kiado, Budapest.
BEINTEMA, J. J. 1985. Amino acid sequence data and evolutionary relationships among hystricognaths and other rodents. Pp. 549-565 in W. LUCKETT and J.-L. HARTENBERGER, eds. Evolutionary relationships among rodents: a
multidisciplinary analysis. Plenum, New York.
BOTTEMA, C. D. K., R. P. KETTERLING, S. II, H.-S. YOON,
J. A. PHILLIPS III, and S. S. SOMMER. 1991. Missense mutations and evolutionary conservation of amino acids: evidence that many of the amino acids in factor IX function
as “spacer” elements. Am. J. Hum. Genet. 49:820-838.
DA SILVA, M. N. F., and J. L. PATTON. 1993. Amazonian
phylogeography: mtDNA sequence variation in arboreal
echimyid rodents (Caviomorpha). Mol. Phylogenet. Evol.
2:243-255.
DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT. 1978.
A model of evolutionary change in proteins. Pp. 345-352
in M. O. DAYHOFF, ed. Atlas of protein sequence and structure. Vol. 5, suppl. 3. National Biomedical Research Foundation, Washington, D.C.
EASTEAL, S. 1988. Rate constancy of globin gene evolution in
placental mammals. Proc. Natl. Acad. Sci. USA 85:7622-7626.
EASTEAL, S. 1990. The pattern of mammalian evolution and the
relative rate of molecular evolution. Genetics 124:165-173.
FELSENSTEIN, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool.
27:401-410.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach
using the bootstrap. Evolution 39:783-791.
FELSENSTEIN, J. 1993. PHYLIP, version 3.5. University of Washington,
Seattle.
FITCH, W. M., and J. J. BEINTEMA. 1990. Correcting parsimonious trees for unseen nucleotide substitutions: the effect
of dense branching as exemplified by ribonuclease. Mol.
Biol. Evol. 7:438-443.
FUKAMI-KOBAYASHI, K., and Y. TATENO. 1991. Robustness
of maximum likelihood tree estimation against different
patterns of base substitutions. J. Mol. Evol. 32:79-91.
GRAUR, D., W. A. HIDE, and W.-H. LI. 1991. Is the guinea-pig
a rodent? Nature 351:649-652.
GRAUR, D., W. A. HIDE, A. ZHARKIKH, and W.-H. LI. 1992.
The biochemical phylogeny of guinea-pigs and gundis, and
the paraphyly of the order Rodentia. Comp. Biochem. Physiol. [B] 101:495-498.
HASEGAWA, M., Y. CAO, J. ADACHI, and T. YANO. 1992.
Rodent polyphyly? Nature 355:595-595.
HASEGAWA, M., and M. FUJIWARA. 1993. Relative efficiencies
of the maximum likelihood, maximum parsimony, and
neighbor joining methods for estimating protein phylogeny.
Mol. Phylogenet. Evol. 2:1-5.
HASEGAWA, M., T. HASHIMOTO, J. ADACHI, N. IWABE, and
T. MIYATA. 1993. Early divergences in the evolution of
eukaryotes: ancient divergence of Entamoeba that lacks
mitochondria revealed by protein sequence data. J. Mol.
Evol. 36:380-388.
HASEGAWA, M., and H. KISHINO. 1994. Accuracies of the
simple methods for estimating the bootstrap probability of
a maximum-likelihood tree. Mol. Biol. Evol. 11:142-145.
HASEGAWA, M., H. KISHINO, and N. SAITOU. 1991. On the
maximum likelihood method in molecular phylogenetics.
J. Mol. Evol. 32:443-445.
HASHIMOTO, T., E. OTAKA, J. ADACHI, K. MIZUTA, and M.
HASEGAWA. 1993. The giant panda is most close to a bear,
judged by α- and β-hemoglobin sequences. J. Mol. Evol.
36:282-289.
HENDY, M. D., and D. PENNY. 1989. A framework for the
quantitative study of evolutionary trees. Syst. Zool. 38:297-309.
JANKE, A., G. FELDMAIER-FUCHS, W. K. THOMAS, A. VON
HAESELER, and S. PÄÄBO. The marsupial mitochondrial
genome and the evolution of placental mammals. Genetics
(in press).
JONES, D. T., W. R. TAYLOR, and J. M. THORNTON. 1992.
The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
KIMURA, M. 1983. The neutral theory of molecular evolution.
Cambridge University Press, Cambridge.
KISHINO, H., and M. HASEGAWA. 1989. Evaluation of the
maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order
in Hominoidea. J. Mol. Evol. 29:170-179.
KISHINO, H., T. MIYATA, and M. HASEGAWA. 1990. Maximum likelihood inference of protein phylogeny and the
origin of chloroplasts. J. Mol. Evol. 30:151-160.
LECOINTRE, G., H. PHILIPPE, H. L. V. LÊ, and H. LE GUYADER. 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phylogenet. Evol. 2:205-224.
LI, W.-H., M. GOUY, P. M. SHARP, C. O’HUIGIN, and Y.-W.
YANG. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular
clocks. Proc. Natl. Acad. Sci. USA 87:6703-6707.
LI, W.-H., W. A. HIDE, and D. GRAUR. 1992a. Origin of rodents and guinea-pigs. Nature 359:277-278.
LI, W.-H., W. A. HIDE, A. ZHARKIKH, D.-P. MA, and D.
GRAUR. 1992b. The molecular taxonomy and evolution
of the guinea pig. J. Hered. 83:174-181.
LUCKETT, W. P., and J.-L. HARTENBERGER. 1985. Evolutionary relationships among rodents: comments and conclusions. Pp. 685-712 in W. LUCKETT and J.-L. HARTENBERGER, eds. Evolutionary relationships among rodents: a
multidisciplinary analysis. Plenum, New York.
MA, D.-P., A. ZHARKIKH, D. GRAUR, J. L. VANDEBERG, and
W.-H. LI. 1993. Structure and evolution of opossum, guinea
pig, and porcupine cytochrome b genes. J. Mol. Evol.
36:327-334.
MARTIGNETTI, J. A., and J. BROSIUS. 1993. Neural BC1 RNA
as an evolutionary marker: guinea pig remains a rodent.
Proc. Natl. Acad. Sci. USA 90:9698-9702.
NOVACEK, M. J. 1992. Mammalian phylogeny: shaking the
tree. Nature 356:121-125.
RUVOLO, M., T. R. DISOTELL, M. W. ALLARD, W. M. BROWN,
and R. L. HONEYCUTT. 1991. Resolution of the African
hominoid trichotomy by use of a mitochondrial gene sequence. Proc. Natl. Acad. Sci. USA 88:1570-1574.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method:
a new method for reconstructing phylogenetic trees. Mol.
Biol. Evol. 4:406-425.
SAKAMOTO, Y., M. ISHIGURO, and G. KITAGAWA. 1986.
Akaike information criterion statistics. Reidel, Dordrecht.
SHEPPARD, D., C. ROZZO, L. STARR, V. QUARANTA, D. J.
ERLE, and R. PYTELA. 1990. Complete amino acid sequence
of a novel integrin β subunit (β6) identified in epithelial
cells using the polymerase chain reaction. J. Biol. Chem.
265:11502-11507.
STENZEL, P. 1974. Opossum Hb chain sequence and neutral
mutation theory. Nature 252:62-63.
STEWART, C.-B. 1993. The powers and pitfalls of parsimony.
Nature 361:603-607.
WYSS, A. R., J. J. FLYNN, M. A. NORELL, C. C. SWISHER III,
R. CHARRIER, M. J. NOVACEK, and M. C. MCKENNA. 1993.
South America's earliest rodent and recognition of a new
interval of mammalian evolution. Nature 365:434-437.
TAKASHI GOJOBORI, reviewing editor
Received November 4, 1993
Accepted March 16, 1994