Zoological Journal of the Linnean Society (1982) 74: 337-344. With 2 figures

Phylogenetic reconstruction and phenetic taxonomy

J. McNEILL

Biosystematics Research Institute, Agriculture Canada, Ottawa, Canada K1A 0C6

Accepted for publication June 1981

Cladistic analysis should not be equated with phylogenetic reconstruction. Instead it is a means of describing character-state distributions among organisms and in this it resembles phenetic analysis. However, the claim that cladistic methods meet phenetic (‘Gilmour-natural’) criteria for classification as well as or better than traditional phenetic ones is shown to be based on an inadequate interpretation of these criteria. Instead, a new measure of naturalness is proposed in which the most natural classification is that which describes the distribution of all character states by the smallest number of statements. The possibility of extending this measure to provide a criterion for an optimally simple classification is noted. It is concluded that phylogenetic reconstruction must not only reflect the branching patterns suggested by cladistic analysis but also take account of the evolutionary history that is reflected in an optimal phenogram.

KEY WORDS:- Cladistics - phenetics - natural taxa - natural classification.

CONTENTS

Cladistic analysis differs from phylogenetic reconstruction
Cladistics resembles phenetics
Cladistics, phenetics and Gilmour-natural classifications
    Farris’s assessment of naturalness
    A new measure of naturalness
Phenetics and phylogeny
References

CLADISTIC ANALYSIS DIFFERS FROM PHYLOGENETIC RECONSTRUCTION

I am very glad to have found during this Symposium that I am not alone in refusing to equate phylogenetic reconstruction with cladistic analysis.
Forey (1982), for example, makes a clear distinction between cladograms and evolutionary trees, the one the result of cladistic analysis and the other claiming to be a best attempt at phylogenetic reconstruction. Forey, of course, expresses his preference for a cladogram as a medium for expressing evolutionary events, but both he and others have made it clear that cladistic analysis and phylogenetic reconstruction are two very different things.

*Present address: Department of Botany, University of Ottawa, Ottawa, Canada K1N 6N5.

0024-4082/82/030337+08 $02.00/0 © 1982 The Linnean Society of London

Cladistic analyses produce cladograms that conform to certain defined rules and, within the constraints of these rules, best summarize the data that have gone into their construction. Phylogenetic reconstruction, on the other hand, is the attempt to deduce, by whatever means seems appropriate, what actually happened, i.e. the real evolutionary history of the group under investigation. Because very many investigators (e.g. Hill & Crane, 1982) have used cladistic techniques as a basis for phylogenetic reconstruction and because its product (the cladogram) is a tree diagram, it is tempting to view cladistics simply as a method, or perhaps even as the method, of reconstructing the course of organic evolution. I believe that this is a misleading view of cladistics and my belief is supported by the most recent cladistic literature.

CLADISTICS RESEMBLES PHENETICS

It was Hull (1980) who pointed out, in his contribution to the Hennig Memorial Symposium held in December 1977, that, whereas early cladistic analysis was concerned primarily with species, and the nodes in a cladogram were interpreted as real or ‘hypothetical’ ancestral species, more recently emphasis has been on characters, with cladograms representing the order of emergence of uniquely derived characters, so that the nodes represent “minimum sets of synapomorphic characters” (Platnick, 1977).
This idea has been developed further by Platnick (1980), who notes that “cladistic methods obviously do not depend on the recognition of historically primitive or historically derived character states (i.e., they do not depend on the actual reconstruction of evolutionary history)” but “merely attempt to discriminate more general from less general characters.” Patterson (1980) spells out the implications of this when he says “cladistics . . . is not necessarily about evolution . . . It is about a simpler and more basic matter, the pattern in nature.” There is a striking parallel here to McNeill’s (1980) account of phenetics as being dependent on evolution for its success but otherwise making “no attempt to reflect evolution” but seeking “to describe the distribution among organisms of as many of their character-states as possible.”

I would suggest that cladistic analysis and phenetic analysis are alike in that both are made possible by evolution (or by some analogous “external influence”) but that neither provides evolutionary trees per se and neither, by itself, permits phylogenetic reconstruction. The phenograms and the derived ‘natural’ classifications of a pheneticist and the cladograms and the identically mapped classifications of a cladist are both in the form of rooted trees, but both are produced not to reconstruct phylogeny but to describe character-state distributions among organisms.

Of course, there are differences between phenetics and cladistics in addition to the basic one that phenetic methods start from overall similarity (though subsequent character selection may be made on the basis of initial group recognition, cf. McNeill, 1980: 475-477), whereas cladistic techniques use only character states considered to show derived patristic similarity (synapomorphies).
Amongst these secondary differences is the fact that pheneticists usually distinguish between a phenogram and a classification, adopting, usually intuitively, some simplifying process in going from the one to the other (cf. McNeill, 1979). But these differences are secondary and I would suggest that there is, in fact, a very close relationship between modern cladistic analysis and genuine phenetic analysis (as described in e.g. Jardine & Sibson, 1971: 134-138; McNeill, 1980).

Figure 1. Diagram of two possible evolutionary histories for the terminal taxa A, B, C, D described by the characters and character-states listed in Table 1. The nodes E, F and G (A) and E', F' and G (B) represent points of divergence of lineages and the numbers attached to the internodes represent the number of character-state changes postulated as having occurred in each lineage.

Much of the time the two approaches will give the same answer and, what is more, neither need correspond to what really happened in the evolutionary history of the organisms concerned. Consider, for example, the four organisms A, B, C and D (Fig. 1). Figures 1A and 1B show two possible evolutionary histories for these organisms but the characters and character states are the same in both. Let us suppose that they are those listed in Table 1. Even if the true evolutionary history were that depicted in Fig. 1B, both phenetic and cladistic techniques would generate a tree diagram with the topology of Fig. 1A. Moreover, although neither corresponds to a correct phylogenetic reconstruction, both the phenogram and the cladogram are correct according to their own criteria, in that both accurately describe the distribution of the character states amongst these four organisms: the members of the pairs A, B and C, D are phenetically more similar to each other than either is to a member of the other pair; the nodes E and F provide a more parsimonious nesting of the synapomorphies than do the nodes E' and F'.

Table 1. Characters (a-n) and character-states (1/0) for the terminal taxa (A, B, C, D) and nodes (E, F, G and E', F', G) in Fig. 1

          a  b  c  d  e  f  g  h  i  j  k  l  m  n
    A     1  1  1  1  1  1  1  0  0  0  0  0  0  0
    B     1  1  1  1  1  1  0  1  0  0  0  0  0  0
    C     0  0  0  0  0  1  0  1  1  1  1  1  1  0
    D     0  0  0  0  0  0  0  1  1  1  1  1  1  1
    E     1  1  1  1  1  1  0  0  0  0  0  0  0  0
    F     0  0  0  0  0  0  0  1  1  1  1  1  1  0
    E'    0  0  0  0  0  1  0  1  0  0  0  0  0  0
    F'    0  0  0  0  0  0  0  1  0  0  0  0  0  0
    G     0  0  0  0  0  0  0  0  0  0  0  0  0  0

CLADISTICS, PHENETICS AND GILMOUR-NATURAL CLASSIFICATIONS

Farris’s assessment of naturalness

Of course, in other situations phenetic and cladistic methods can give different results with the same data set. Farris (1980) presents one such data set for four taxa and 12 characters; this is reproduced in Table 2. Most phenetic methods will give a set of pairwise similarities and a phenogram such as is shown in Fig. 2A, whereas cladistic methods, including Farris’s ‘special similarity’ measure, will give pairwise similarities and a cladogram like that in Fig. 2B. Farris (1980) has argued that even by phenetic criteria the cladogram is to be preferred to the phenogram and suggests that pheneticists should abandon the use of what he calls ‘raw similarity’ (i.e. overall similarity) in favour of his ‘special similarity’ (i.e. similarity assessed on the basis of synapomorphies only).

Table 2. Characters and character-states for the four hypothetical taxa forming Farris’s (1980) Data Set 2. Alternative phenetic and cladistic groupings of these taxa appear in Fig. 2

    Characters   A  B  C  D
    1a           1  0  0  0
    1b           1  0  0  0
    1c           1  0  0  0
    2            0  1  0  0
    3            0  0  1  0
    4a           0  0  0  1
    4b           0  0  0  1
    4c           0  0  0  1
    4d           0  0  0  1
    4e           0  0  0  1
    5            1  1  0  0
    6            1  1  1  0

As Farris (1977) and McNeill (1980) point out, one of the criteria of a phenetic classification is that it should be natural in the sense in which that term was used by Gilmour and others in the formative period of phenetic philosophy (cf. Gilmour, 1937, 1940, 1961; Gilmour & Walters, 1963; Sneath, 1957; Cain & Harrison, 1958). Farris (1977) describes a ‘Gilmour-natural’ classification as one “whose constituent groups describe the distribution among organisms of as many features as possible.” In analysing the data set reproduced in Table 2, Farris (1980) interprets this as meaning that each group recognized in a classification should be characterized by at least one character state unique to that group, i.e. a monothetic criterion. The group (B, C) recognized in the phenogram does not meet this criterion, for, although nine out of 12 character states are shared by B and C, all are possessed also by either A or D. Farris argues from this that (B, C) is not a natural group, whereas the alternative group (A, B), recognized in the cladogram, is Gilmour-natural, because it uniquely describes the 1 state of character 5. Farris concludes from this that overall similarity may produce classifications less desirable phenetically than those produced by cladistic techniques.

Figure 2. Similarity matrices and dendrograms derived from the four taxon character-state matrix in Table 2. A, Derived by a phenetic method such as the simple matching coefficient with group average clustering (UPGMA); B, derived by a cladistic method such as Farris’s (1980) ‘special similarity’ measure.
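The two kinds of pairwise similarity contrasted in Fig. 2, and the unique-state test that Farris applies to each group, can be recomputed directly from Table 2. The following is a minimal sketch rather than Farris's own procedure: it assumes that state 1 is the derived state in every character, so that 'special similarity' reduces to a count of shared 1 states, and the function names (`simple_matching`, `special_similarity`, `unique_states`) are hypothetical labels introduced only for this illustration.

```python
from itertools import combinations

# Table 2 (Farris's 1980 Data Set 2): one entry per character,
# states listed in taxon order A, B, C, D.
TAXA = ("A", "B", "C", "D")
TABLE_2 = {
    "1a": (1, 0, 0, 0), "1b": (1, 0, 0, 0), "1c": (1, 0, 0, 0),
    "2": (0, 1, 0, 0), "3": (0, 0, 1, 0),
    "4a": (0, 0, 0, 1), "4b": (0, 0, 0, 1), "4c": (0, 0, 0, 1),
    "4d": (0, 0, 0, 1), "4e": (0, 0, 0, 1),
    "5": (1, 1, 0, 0), "6": (1, 1, 1, 0),
}


def simple_matching(x, y):
    """Overall ('raw') similarity: characters on which x and y share a state."""
    i, j = TAXA.index(x), TAXA.index(y)
    return sum(states[i] == states[j] for states in TABLE_2.values())


def special_similarity(x, y):
    """Shared 1 states only (assumption: 1 is the derived state throughout)."""
    i, j = TAXA.index(x), TAXA.index(y)
    return sum(states[i] == 1 and states[j] == 1 for states in TABLE_2.values())


def unique_states(group):
    """States held by every member of the group and by no taxon outside it."""
    inside = [TAXA.index(t) for t in group]
    outside = [k for k in range(len(TAXA)) if k not in inside]
    found = []
    for name, states in TABLE_2.items():
        values = {states[k] for k in inside}
        if len(values) == 1:
            value = values.pop()
            if all(states[k] != value for k in outside):
                found.append((name, value))
    return found


if __name__ == "__main__":
    for x, y in combinations(TAXA, 2):
        print(x, y, simple_matching(x, y), special_similarity(x, y))
    print("(B, C):", unique_states(("B", "C")))  # no unique state
    print("(A, B):", unique_states(("A", "B")))  # the 1 state of character 5
```

On these data B and C match on nine of the 12 characters, the highest overall similarity of any pair, while A and B share the most 1 states (two); (B, C) has no state of its own and (A, B) has only the 1 state of character 5, which is the contrast the text describes.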
It appears to me that Farris’s requirement for naturalness, namely that each group be characterized by at least one unique character state, is not a necessary corollary of the requirement that Gilmour-natural classifications describe the distribution among organisms of as many features as possible. Gilmour-natural classifications are concerned with the distribution of all character-states, whether ‘primitive’ or ‘derived’, and a classification should, therefore, be judged on the efficiency of the classification as a whole in communicating information on such character-state distribution. Farris (1980) does discuss information transmission as a criterion of naturalness but uses a character-type by character-type approach for each node in the phenogram. Not altogether surprisingly, with what is really a cladistic approach, the cladogram in Fig. 2B is shown to have greater ‘information content’ than the phenogram.

A new measure of naturalness

For a classification to describe the distribution of character-states among organisms most efficiently, it should permit the character states to be described in as few statements as possible. The data matrix in Table 2 with four taxa and 12 characters requires 48 statements, if for the present discussion we exclude, as Farris also does, consideration of taxon labels. Table 3 summarizes the numbers of statements needed if the cladogram and the phenogram are each treated as classifications. A ‘compromise’ classification of only two ranks is also assessed. In all cases, the first grouping is (D) and (A, B, C). The character-states of (D) must be specified in full (12 statements) if no information is to be lost. Likewise at least six must be specified for (A, B, C).
The remaining six are variable (marked by asterisks in Table 3) and three alternative ways of making statements about them might be considered: I, reference to the character-state could be omitted (0 statements); II, the character-state could be specified as variable (6 statements); or III, for each character the most common state and the exceptions could be specified (12 statements). For all three classifications the number of statements will be the same: 18, 24 or 30, depending upon whether Method I, II or III is adopted.

Table 3. Numbers of statements required to describe completely, with three different classifications, the character-state distributions among the four taxa listed in Table 2. The uppermost set uses the cladogram (Fig. 2B), the middle, the phenogram (Fig. 2A), and the lowermost a compromise classification with only two ranks. I, II and III represent three alternative methods of computing the number of statements required (for details see text). An asterisk marks a character that varies within the group; a dash marks a state already specified for a more inclusive group.

                        Character states about which statements must be made      Statements
    Group        1a 1b 1c  2  3 4a 4b 4c 4d 4e  5  6                               I   II  III

    Cladogram
    (D)           0  0  0  0  0  1  1  1  1  1  0  0                              12   12   12
    (A, B, C)     *  *  *  *  *  0  0  0  0  0  *  1                               6   12   18
    (C)           0  0  0  0  1  -  -  -  -  -  0  -                               6    6    6
    (A, B)        *  *  *  *  0  -  -  -  -  -  1  -                               2    6   10
    (A)           1  1  1  0  -  -  -  -  -  -  -  -                               4    4    4
    (B)           0  0  0  1  -  -  -  -  -  -  -  -                               4    4    4
    Totals                                                                        34   44   54

    Phenogram
    (D)           0  0  0  0  0  1  1  1  1  1  0  0                              12   12   12
    (A, B, C)     *  *  *  *  *  0  0  0  0  0  *  1                               6   12   18
    (A)           1  1  1  0  0  -  -  -  -  -  1  -                               6    6    6
    (B, C)        0  0  0  *  *  -  -  -  -  -  *  -                               3    6    9
    (B)           -  -  -  1  0  -  -  -  -  -  1  -                               3    3    3
    (C)           -  -  -  0  1  -  -  -  -  -  0  -                               3    3    3
    Totals                                                                        33   42   51

    Compromise
    (D)           0  0  0  0  0  1  1  1  1  1  0  0                              12   12   12
    (A, B, C)     *  *  *  *  *  0  0  0  0  0  *  1                               6   12   18
    (A)           1  1  1  0  0  -  -  -  -  -  1  -                               6    6    6
    (B)           0  0  0  1  0  -  -  -  -  -  1  -                               6    6    6
    (C)           0  0  0  0  1  -  -  -  -  -  0  -                               6    6    6
    Totals                                                                        36   42   48

At each of the subordinate levels no specification of the character-states that were constant in the more inclusive group is necessary; this is part of the information storage component of a classification. Consequently, for the cladogram, six character-states must be specified for (C) and two, plus four variable ones, for (A, B), giving 8, 12 or 16 statements depending on the method. By comparison (A) and (B, C) require 9, 12 or 15 statements.
The main difference between the cladogram and the phenogram comes, however, in the eight statements needed to specify A and B within (A, B) as against the six required for B and C within (B, C). The totals in Table 3 show that whether Method I, II or III is chosen, the phenogram is more efficient at summarizing all the character-state distributions than is the cladogram. Method III bears some relationship to Gower’s (1974) ‘maximally predictive’ procedure of summing all the matches with the ‘predictor state’ for each group, a procedure which Farris (1980) has criticized in the hierarchical context as involving repetition of certain character states (in this case those associated with A, B, C) and not others (those associated with D). This criticism seems valid and certainly Method III does involve unnecessarily repetitive statements about character-states: consequently I do not regard it as an appropriate method in this context, and will not consider it further. The choice between Method I and Method II is less clear; taxonomic practice, as reflected in descriptions of taxa, tends to favour Method I, but keys and diagnoses that are to be strictly comparable at each hierarchical level require Method II.

The ‘compromise’ classification of only two ranks (i.e. with (A, B, C) not further divided) cannot, strictly speaking, be compared with the other two unless statement lengths for taxon labels and ranks are determined and included. Purely from the number of character statements, however, it is never superior to the phenogram and is only more concise than the cladogram if statements must be made about characters that vary within groups (i.e. Methods II & III).
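The statement counts discussed above follow mechanically from Table 2 once a classification is written as a nested grouping. The sketch below is one possible reading of the counting rule, not a published algorithm: a character that is constant within a group costs one statement and is never restated lower down, while a character that varies costs 0, 1 or 2 statements under Methods I, II and III respectively and is carried down to the subgroups. The names `count_statements`, `classification_total` and the nested-tuple encoding are assumptions made for this illustration.

```python
# Table 2: one entry per character, states in taxon order A, B, C, D.
TAXA = ("A", "B", "C", "D")
TABLE_2 = {
    "1a": (1, 0, 0, 0), "1b": (1, 0, 0, 0), "1c": (1, 0, 0, 0),
    "2": (0, 1, 0, 0), "3": (0, 0, 1, 0),
    "4a": (0, 0, 0, 1), "4b": (0, 0, 0, 1), "4c": (0, 0, 0, 1),
    "4d": (0, 0, 0, 1), "4e": (0, 0, 0, 1),
    "5": (1, 1, 0, 0), "6": (1, 1, 1, 0),
}

# Statements spent on a character that varies within a group.
VARIABLE_COST = {"I": 0, "II": 1, "III": 2}


def members(group):
    """Flatten a nested group (str or tuple of groups) into its terminal taxa."""
    if isinstance(group, str):
        return [group]
    return [taxon for sub in group for taxon in members(sub)]


def count_statements(group, method, pending=None):
    """Statements needed for a group and all its subgroups.

    `pending` lists the characters not yet fixed by a more inclusive group.
    """
    if pending is None:
        pending = list(TABLE_2)
    rows = [TAXA.index(t) for t in members(group)]
    total, still_pending = 0, []
    for char in pending:
        if len({TABLE_2[char][k] for k in rows}) == 1:
            total += 1                      # constant: state it once, here
        else:
            total += VARIABLE_COST[method]  # variable: cost depends on method
            still_pending.append(char)
    if not isinstance(group, str):
        for sub in group:
            total += count_statements(sub, method, still_pending)
    return total


def classification_total(top_groups, method):
    """Sum over the first-level groups of a classification."""
    return sum(count_statements(g, method) for g in top_groups)


CLADOGRAM = ["D", ("C", ("A", "B"))]
PHENOGRAM = ["D", ("A", ("B", "C"))]
COMPROMISE = ["D", ("A", "B", "C")]

if __name__ == "__main__":
    for name, cls in [("cladogram", CLADOGRAM), ("phenogram", PHENOGRAM),
                      ("compromise", COMPROMISE)]:
        print(name, [classification_total(cls, m) for m in ("I", "II", "III")])
```

Under this reading the totals for Methods I, II and III are 34, 44 and 54 for the cladogram, 33, 42 and 51 for the phenogram, and 36, 42 and 48 for the two-rank classification. Taxon labels and ranks are left out of the count, as in the text.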
In a real classificatory situation, the two-rank classification might be preferred to either three-rank one, because the statement length (or set of symbols) to be associated with a taxon name and specification of rank might be adjudged to be so great as to outweigh the advantage of the slightly more concise character statements. This suggests an alternative strategy to the structural value criterion proposed by McNeill (1979) for the simplification of dendrograms to ‘practical classifications’, but further consideration of this is outside the scope of the present study.

PHENETICS AND PHYLOGENY

What, then, are the implications of this for phenetic taxonomy and phylogenetic reconstruction? Firstly, it establishes that there is a component to Gilmour-naturalness that is not reflected in the nested synapomorphies of a cladogram. The description of character-state distributions among organisms provided by a phenogram is as good as or better than that of a cladogram, if the criterion used is that of minimizing the number of statements that need be made about all character-states. (If, for some reason, certain character-states are considered inappropriate for description in the classification, as, for example, species absence usually is in ecological classification, then account would be taken of this in the selection of the appropriate phenetic method, e.g. by using the Jaccard coefficient for calculating similarity; the phenogram would remain the best descriptor of those character-states being considered.) Phenetic taxonomy, thus, seeks to express in communicable, and hence simplified, form (see McNeill, 1980) information about character-state distributions among organisms that is lacking in a cladogram. This information is one of the products of the evolutionary history of the organisms.
Even though it is likely to be more a product of anagenesis than cladogenesis, it remains a component that phylogenetic reconstruction must explain if it is to attempt to describe the actual course of evolution.

REFERENCES

CAIN, A. J. & HARRISON, G. A., 1958. An analysis of the taxonomist’s judgment of affinity. Proceedings of the Zoological Society of London, 131: 85-98.
FARRIS, J. S., 1977. On the phenetic approach to vertebrate classification. In M. K. Hecht, P. C. Goody & B. M. Hecht (Eds), Major Patterns in Vertebrate Evolution: 823-850. New York: Plenum Press.
FARRIS, J. S., 1980. The information content of the phylogenetic system. Systematic Zoology, 28: 483-519.
FOREY, P. L., 1982. Palaeontological stories versus neontological analysis. In K. A. Joysey & A. E. Friday (Eds), Problems of Phylogenetic Reconstruction: 114-157. London: Academic Press.
GILMOUR, J. S. L., 1937. A taxonomic problem. Nature, London, 139: 1040-1042.
GILMOUR, J. S. L., 1940. Taxonomy and philosophy. In J. Huxley (Ed.), The New Systematics: 461-474. Oxford: Clarendon Press.
GILMOUR, J. S. L., 1961. Taxonomy. In A. M. MacLeod & L. S. Cobley (Eds), Contemporary Botanical Thought: 27-45. Edinburgh: Oliver & Boyd.
GILMOUR, J. S. L. & WALTERS, S. M., 1963. Philosophy and classification. In W. B. Turrill (Ed.), Vistas in Botany. IV. Recent Researches in Plant Taxonomy: 1-22. Oxford: Pergamon Press.
GOWER, J. C., 1974. Maximal predictive classification. Biometrics, 30: 643-654.
HILL, C. R. & CRANE, P. R., 1982. Cladistics and the origin of angiosperms. In K. A. Joysey & A. E. Friday (Eds), Problems of Phylogenetic Reconstruction: 269-361. London: Academic Press.
HULL, D. L., 1980. The limits of cladism. Systematic Zoology, 28: 416-440.
JARDINE, N. & SIBSON, R., 1971. Mathematical Taxonomy. London: Wiley.
McNEILL, J., 1979. Structural value: a concept used in the construction of taxonomic classifications. Taxon, 28: 481-504.
McNEILL, J., 1980. Purposeful phenetics. Systematic Zoology, 28: 465-482.
PATTERSON, C., 1980. Cladistics. Biologist, London, 27: 234-240.
PLATNICK, N. I., 1977. Monotypy and the origin of higher taxa: a reply to E. O. Wiley. Systematic Zoology, 26: 355-357.
PLATNICK, N. I., 1980. Philosophy and the transformation of cladistics. Systematic Zoology, 28: 537-546.
SNEATH, P. H. A., 1957. Some thoughts on bacterial classification. Journal of General Microbiology, 17: 184-200.