Sankhyā: The Indian Journal of Statistics
2010, Volume 72-A, Part 1, pp. 70-80
© 2010, Indian Statistical Institute

Quadratic Entropy and Analysis of Diversity

C. R. Rao
CRRao AIMSCS, Hyderabad, India

Abstract

In this paper some general postulates are laid down for the construction of diversity measures, and conditions for ANOVA-type analysis are investigated. It is shown that a diversity measure called quadratic entropy, introduced by the author in 1982 and applicable to both qualitative and quantitative data, provides a general solution to both of the problems posed above.

AMS (2000) subject classification. Primary 62A01, 62H30, 62B10, 94A17.
Keywords and phrases. Diversity measure, quadratic entropy, convexity.

1 Introduction

R.A. Fisher introduced the method of Analysis of Variance (ANOVA) for partitioning the variance of a measurement in a sample of observations into several components with assignable causes. ANOVA has been used in a variety of situations, from regression analysis to testing main effects and interactions based on data from multifactorial experiments. Two questions arise. Can ANOVA-type analysis be done using measures of dispersion other than variance, such as the mean deviation, the range and the Gini coefficient of concentration? Can it be done with the various types of entropy functions, based on qualitative data, that ecologists use to measure diversity?

In this paper, some general postulates are laid down for the construction of diversity measures and conditions for ANOVA-type analysis are investigated. It is shown that a diversity measure called quadratic entropy (QE), introduced by the author (Rao, 1982a, 1982b, 1982c, 1982d) and applicable to both qualitative and quantitative data, provides a general solution to both of the problems posed above. We use the acronym ANODIV (Analysis of Diversity) for the decomposition of a diversity measure into a number of components with assignable causes, as in ANOVA.

The paper is dedicated to the memory of Dr. A. Maitra in appreciation of the fundamental contributions he has made to probability theory and the devotion with which he served the Indian Statistical Institute in various capacities, including the Directorship.

2 Characterization of a diversity measure

Let P be a convex set of probability measures defined on a measurable space (X, B), and H(·) a real valued function defined on P. The function H will be called a diversity measure or entropy if it satisfies the following postulates.

    C0: H(P) = −J0(P) ≥ 0 for all P ∈ P, with equality iff P is degenerate.

    C1: H(λP1 + µP2) − λH(P1) − µH(P2) = J1(P1, P2 : λ, µ) ≥ 0
        for P1, P2 ∈ P and λ ≥ 0, µ ≥ 0, λ + µ = 1.

The postulate C0 requires H to be a non-negative function. C1 implies that H is a concave function, or that J0 = −H is a convex function. It meets the intuitive requirement that the diversity within a mixture of two different populations (probability measures) is greater than the average of the diversities within the two populations. The function J1 defined on P² is called the first Jensen difference (Rao, 1982a). More generally, C1 implies

    H(Σ λi Pi) − Σ λi H(Pi) = J1({Pi} : {λi}) ≥ 0    (2.1)

where Pi ∈ P, i = 1, ..., k, and λi ≥ 0, λ1 + ... + λk = 1.
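As a small illustration (not part of the original paper), the following Python sketch evaluates the first Jensen difference (2.1) for multinomial populations, using the Gini-Simpson index (one of the entropy functions listed later in the paper) as the concave diversity measure H. The function names and the toy populations are hypothetical.

```python
import numpy as np

def gini_simpson(p):
    # H(P) = 1 - sum(p_i^2), a concave diversity measure satisfying C0 and C1
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def jensen_difference(populations, weights, H):
    # J1({P_i} : {lambda_i}) = H(sum_i lambda_i P_i) - sum_i lambda_i H(P_i),
    # as in (2.1); non-negative whenever H is concave
    populations = np.asarray(populations, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ populations            # P. = sum_i lambda_i P_i
    return H(mixture) - np.sum(weights * np.array([H(p) for p in populations]))

# three hypothetical populations over four classes, with equal weights
pops = [[0.7, 0.1, 0.1, 0.1],
        [0.2, 0.4, 0.3, 0.1],
        [0.1, 0.1, 0.2, 0.6]]
print(jensen_difference(pops, [1/3, 1/3, 1/3], gini_simpson))   # >= 0 by C1
```

The printed value is the "between populations" component of the one-way ANODIV described next.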
Note 1. Ecologists specify another requirement of an entropy function: it should attain its maximum value when the probability distribution is uniform. For instance, in the case of a multinomial distribution with m classes, H(P) should attain its maximum value when the probability of each class is 1/m. We have shown in Rao (2010) that this condition is implied by the concavity condition if H(P) is a symmetric function of the class probabilities. But this postulate restricts the diversity function to cases where equal importance is given to differences between all pairs of classes. For instance, if some species are considered to be more similar than others based on some measurement, we may give less weight to their difference in constructing a diversity measure.

Note 2. If C1 holds with J1 > 0 when P1 ≠ P2 and J1 = 0 iff P1 = P2, then H is a strictly concave function. In some applications, it may be necessary to have a strictly concave measure.

Using (2.1), based on the concavity of H(P), we have a one-way ANODIV as in Table 1, similar to ANOVA.

Table 1. One-way ANODIV for k populations

    Due to                      Diversity
    Between populations (B)     J1({Pi} : {λi})
    Within populations  (W)     Σ λi H(Pi)
    Total               (T)     H(P·)

In Table 1, J1 is as defined in (2.1) and P· = Σ λi Pi. In practice, we estimate H(P) based on observed data and choose λi proportional to the sample size ni of observations from Pi, or as λi = 1/k, i = 1, ..., k, depending on the problem under study. A decomposition as in Table 1 with all λi equal has been used to compute G = B/T, called the genetic index of diversity, in population studies by Lewontin (1972), Nei (1973), Rao (1982b) and Rao and Boudreau (1982). Reference may also be made to papers by Light and Margolin (1971) and Anderson and Landis (1980) for some special cases.

Now, we investigate the condition on H(P) for carrying out a two-way ANODIV. Let us consider populations denoted by Pij, i = 1, ..., r and j = 1, ..., s, where the first index i refers to a locality and the second index j to a specific community. Further, let λi µj be the relative strength of individuals in locality i and community j, where the λi and µj are all non-negative and Σ λi = Σ µj = 1. We may ask: How different are the genetic contributions of individuals between localities and between communities as a whole? Is the magnitude of genetic differences between the communities different in different localities? Such questions concerning the rs populations can be answered by a two-way ANODIV as in Table 2.

Table 2. Two-way ANODIV

    Due to                      Diversity
    Localities          (L)     H(P··) − Σ λi H(Pi·)
    Communities         (C)     H(P··) − Σ µj H(P·j)
    Interaction         (LC)    by subtraction (LC = B − L − C)
    Between populations (B)     H(P··) − ΣΣ λi µj H(Pij)
    Within populations  (W)     ΣΣ λi µj H(Pij)
    Total               (T)     H(P··)

In Table 2, P·· = ΣΣ λi µj Pij, Pi· = Σ µj Pij and P·j = Σ λi Pij. What are the conditions under which the entries in Table 2 are non-negative? For B, L and C to be non-negative, the function H(·) defined on P should be strictly concave. For the interaction LC to be non-negative,

    (LC) = J1({Pij} : {λi µj}) − J1({Pi·} : {λi}) − J1({P·j} : {µj})
         = −[H(P··) − Σ λi H(Pi·)] + Σ µj [H(P·j) − Σ λi H(Pij)]
         = −J1({Pi·} : {λi}) + Σ µj J1({Pij} : {λi})
         = J2({Pij} : {λi µj}) ≥ 0,

i.e., J1 as defined in (2.1) is convex on P^r. We represent this condition as C2 on a diversity measure. If C1 and C2 hold, we call such a diversity measure one of order 2. Note that (LC) can also be expressed as

    −[H(P··) − Σ µj H(P·j)] + Σ λi [H(Pi·) − Σ µj H(Pij)]
    = −J1({P·j} : {µj}) + Σ λi J1({Pij} : {µj}),

i.e., J1 is convex as a function on P^s.
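To make the decomposition of Table 2 concrete, here is a numerical sketch in Python (not from the paper). It takes an r × s array of multinomial populations Pij with weights λi and µj and a concave diversity function H, and returns the six entries of Table 2. The function names (anodiv2, gini_simpson) and the toy data are hypothetical.

```python
import numpy as np

def gini_simpson(p):
    # Gini-Simpson index: H(P) = 1 - sum(p_i^2), which satisfies C0, C1 and C2
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def anodiv2(P, lam, mu, H):
    """Two-way ANODIV (Table 2).

    P   : array of shape (r, s, m); P[i, j] is the multinomial distribution
          of population (i, j) over m classes.
    lam : locality weights (length r); mu : community weights (length s).
    H   : concave diversity function acting on a probability vector.
    """
    P = np.asarray(P, dtype=float)
    lam, mu = np.asarray(lam, float), np.asarray(mu, float)
    w = lam[:, None] * mu[None, :]                         # weights lambda_i * mu_j
    P_dotdot = np.tensordot(w, P, axes=([0, 1], [0, 1]))   # P.. = sum_ij w_ij P_ij
    P_i_dot = np.tensordot(mu, P, axes=(0, 1))             # P_i. = sum_j mu_j P_ij
    P_dot_j = np.tensordot(lam, P, axes=(0, 0))            # P_.j = sum_i lam_i P_ij
    T = H(P_dotdot)
    W = np.sum(w * np.array([[H(P[i, j]) for j in range(P.shape[1])]
                             for i in range(P.shape[0])]))
    B = T - W
    L = T - np.sum(lam * np.array([H(p) for p in P_i_dot]))
    C = T - np.sum(mu * np.array([H(p) for p in P_dot_j]))
    LC = B - L - C                                         # interaction, by subtraction
    return {"L": L, "C": C, "LC": LC, "B": B, "W": W, "T": T}

# toy example: r = 2 localities, s = 2 communities, m = 3 classes
P = np.array([[[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]],
              [[0.2, 0.5, 0.3], [0.1, 0.4, 0.5]]])
print(anodiv2(P, lam=[0.5, 0.5], mu=[0.5, 0.5], H=gini_simpson))
```

With the Gini-Simpson index, which satisfies C0, C1 and C2, all six entries come out non-negative; for a diversity measure that is merely concave, the interaction term LC may be negative.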
We can recursively define higher order Jensen differences, J3 from J2, J4 from J3, and so on, and call an H(·) for which J0, J1, ..., Ji−1 are convex an i-th order diversity measure. With such a measure we can carry out ANODIV for i-way classified data. A diversity measure for which Jensen differences of all orders are convex is called a perfect diversity measure.

Some examples of entropy functions for multinomial distributions in k classes, with Pi representing the probability of class i, used as diversity measures in ecological applications, are as follows:

1. −Σ Pi log Pi, Shannon entropy
2. −Σ Pi log Pi − Σ (1 − Pi) log(1 − Pi), paired Shannon entropy
3. 1 − Σ Pi², Gini-Simpson index
4. (1 − α)⁻¹ log Σ Pi^α, 0 < α < 1, Rényi α-entropy
5. (α − 1)⁻¹ (1 − Σ Pi^α), Havrda and Charvát entropy
6. ΣΣ dij Pi Pj, Rao's quadratic entropy

For a general discussion of the measurement of diversity, reference may be made to Patil and Taillie (1982), Mathai and Rathie (1974), and Rao (1984, 1986). Burbea and Rao (1982a, 1982b, 1982c) have shown that the Shannon entropy (1) and the paired Shannon entropy (2) satisfy the conditions C0, C1 and C2 but not C3, C4, .... The Havrda and Charvát entropy (5) satisfies C0, C1, C2 for α in the range (1, 2] when k ≥ 3 and for α in the range [1, 2] ∪ (3, 11/3) when k = 2, and C3, C4, ... do not hold except when α = 2, in which case it reduces to the Gini-Simpson index (3). Rényi's entropy (4) satisfies C0, C1, C2 only for α in (0, 1). Most of the well known entropy functions can thus be used only for two-way ANODIV but not for higher order ANODIV. It will be shown in the next section that the quadratic entropy introduced by Rao (1981, 1982a) is a perfect diversity measure which can be used to carry out ANODIV for data classified to any order.

3 Quadratic entropy

3.1 Definition. Quadratic entropy (QE) as a measure of the diversity of individuals in a population is based on a quantitative assessment of the difference between any two individuals of the population, identified by certain measurements which may be qualitative or quantitative. Let x1 and x2 be the measurements on two individuals drawn from a population, and let the difference between them be assessed through a kernel function k(x1, x2) = k(x2, x1) ≥ 0, with k(x1, x2) = 0 if x1 = x2. We define the quadratic entropy of a population with probability measure P by the expression

    Q(P) = E_{PP} k(x1, x2)    (3.1)

where we use the general notation E_{P1P2} for expectation with respect to the product measure P1 × P2. If x1 is an observation from P1 and x2 from P2, we expect

    [E_{P1P2} − (1/2) E_{P1P1} − (1/2) E_{P2P2}] k(x1, x2)
    = E_{P1P2} k(x1, x2) − (1/2) E_{P1P1} k(x1, x2) − (1/2) E_{P2P2} k(x1, x2) ≥ 0    (3.2)

which implies that two individuals drawn from different populations are, on average, more dissimilar than those drawn from the same population.

An alternative requirement on the choice of k(x1, x2) is the concavity condition C1 on a diversity measure as described in Section 2. Considering a mixture of distributions P1 and P2 with prior probabilities λ and µ = 1 − λ, concavity of Q(P) defined in (3.1) means that

    Q(λP1 + µP2) − λQ(P1) − µQ(P2)
    = 2λµ [E_{P1P2} − (1/2) E_{P1P1} − (1/2) E_{P2P2}] k(x1, x2) ≥ 0    (3.3)

which is the same as the condition (3.2). If in (3.3) equality is attained if and only if P1 = P2, Q(·) is said to be strictly concave, which may be needed in certain investigations.
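As an illustration (not from the paper, and anticipating the multinomial form Q(P) = P′KP of Section 3.2), the following Python sketch computes Q(P) for a multinomial distribution and checks the identity in (3.3) numerically. The trait-based kernel matrix and variable names are hypothetical.

```python
import numpy as np

def quadratic_entropy(p, K):
    # Q(P) = sum_i sum_j k_ij p_i p_j, with k_ij the assessed difference
    # between classes i and j (k_ij >= 0, k_ii = 0)
    p = np.asarray(p, dtype=float)
    return float(p @ K @ p)

# hypothetical kernel: squared differences of a trait value attached to each class
traits = np.array([0.0, 1.0, 3.0])
K = (traits[:, None] - traits[None, :]) ** 2

P1 = np.array([0.7, 0.2, 0.1])
P2 = np.array([0.1, 0.3, 0.6])
lam, mu = 0.4, 0.6

lhs = (quadratic_entropy(lam * P1 + mu * P2, K)
       - lam * quadratic_entropy(P1, K) - mu * quadratic_entropy(P2, K))
# right-hand side of (3.3): 2*lam*mu * [E12 - E11/2 - E22/2] of k
E12 = float(P1 @ K @ P2)
E11 = float(P1 @ K @ P1)
E22 = float(P2 @ K @ P2)
rhs = 2 * lam * mu * (E12 - 0.5 * E11 - 0.5 * E22)
print(lhs, rhs)   # the two values agree, and both are non-negative here
```

For this squared-difference kernel the bracketed term in (3.2) is non-negative for any pair of distributions, so the first Jensen difference of Q is non-negative, as the printed values confirm.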
For instance, in the univariate case the choice

    k(x1, x2) = (1/2)(x1 − x2)²  ⇒  Q(P) = Variance of P = σ_P²

leads to

    J1 = [E_{P1P2} − (1/2) E_{P1P1} − (1/2) E_{P2P2}] (x1 − x2)² = (µ1 − µ2)²,
    E_{P1}(x) = µ1,  E_{P2}(x) = µ2,

so that the diversity between P1 and P2 in a one-way ANODIV does not reflect the differences in the variances of P1 and P2. Thus ANOVA is essentially useful in detecting differences in the mean values of distributions. Q(P) as defined in (3.1) is not strictly concave if the populations under study include distributions with mean zero and different variances.

3.2 Choice of the k-function for multinomial distributions. Let P = (P1, ..., Pm) be a multinomial distribution in m classes, where Pi is the probability of class i. Then the quadratic entropy of P is

    Q(P) = P′KP,  K = (kij)    (3.4)

where, in the m × m matrix (kij), kij ≥ 0 represents the assessed difference between the classes i and j. Using the expression (3.3) without the factor λµ,

    J1(P1, P2) = −(P1 − P2)′K(P1 − P2) = (P1* − P2*)′K*(P1* − P2*)    (3.5)

where P1 and P2 are two distributions, P1* and P2* are the vectors of the first m − 1 components of P1 and P2 respectively, and

    K* = (−kij + kim + kmj − kmm).    (3.6)

Concavity of Q(P) requires the first Jensen difference (3.5) to be non-negative, which holds if K* is a non-negative definite matrix. We may state the desired result as follows: Q(P) as defined in (3.4) satisfies the requirements of a diversity measure if

    (i) kij ≥ 0, kii = 0;  (ii) the matrix K* is non-negative definite.    (3.7)

J1(P1, P2) is defined on P². Let us consider two pairs of distributions P1, P2 and R1, R2 and compute the second order Jensen difference

    J2({P1, P2}, {R1, R2} : λ, µ)
    = λ J1(P1, P2) + µ J1(R1, R2) − J1(λP1 + µR1, λP2 + µR2)
    = [λ(P1 − P2) + µ(R1 − R2)]′K[λ(P1 − P2) + µ(R1 − R2)]
      − λ(P1 − P2)′K(P1 − P2) − µ(R1 − R2)′K(R1 − R2)
    = −λµ [(P1 − P2) − (R1 − R2)]′K[(P1 − P2) − (R1 − R2)].    (3.8)

The expression (3.8) is non-negative if K* as defined in (3.6) is non-negative definite. Thus, under the same conditions as in (3.7), J0 = −Q(P) and J1 are convex. Similarly, Jensen differences of all orders are convex under the conditions (3.7). For some applications of Q(P) as in (3.4), reference may be made to Izsak and Szeidl (2002), Pavoine, Ollier and Pontier (2005), Rao (1984), Ricotta and Szeidl (2006) and Zolton (2005).

3.3 Choice of the k-function: Continuous case. Let P1(x) and P2(x) be two probability measures, in which case the expression (3.3) takes the form, omitting the constant factor,

    ∫∫ k(x1, x2) [2 dP1(x1) dP2(x2) − dP1(x1) dP1(x2) − dP2(x1) dP2(x2)]
    = −∫∫ k(x1, x2) dµ(x1) dµ(x2) ≥ 0    (3.9)

where µ = P1 − P2 and ∫ dµ = 0. The conditions under which (3.9) holds are provided by the following Lemma 3.1, due to Lau (1985).

Lemma 3.1. Let k(x1, x2) be continuous and such that

    k(x1, x2) ≥ 0,  k(x, x) = 0  and  k(x1, x2) = k(x2, x1).    (3.10)

Then a necessary and sufficient condition for (3.9) to hold is that k(x1, x2) is a conditionally negative definite (CND) function.

A function k(x1, x2) satisfying (3.10) is CND if

    Σ_{i=1}^{n} Σ_{j=1}^{n} k(xi, xj) ai aj ≤ 0

for all x1, ..., xn and all a1, a2, ..., an such that a1 + ... + an = 0, and for any value of n. For some properties of such functions and results based on them, reference may be made to Schoenberg (1938), Parthasarathy and Schmidt (1972) and Wells and Williams (1975).
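A small numerical check of condition (3.7), not part of the paper: given a kernel matrix K, form K* as in (3.6) and test non-negative definiteness through its eigenvalues. The helper names and the trait-based kernels are hypothetical; squared differences of a real-valued trait give a CND kernel (in line with Lemma 3.1 and Schoenberg, 1938), while fourth powers generally do not.

```python
import numpy as np

def k_star(K):
    # K* = (-k_ij + k_im + k_mj - k_mm) over the first m-1 classes, as in (3.6)
    m = K.shape[0]
    return (-K[:m-1, :m-1] + K[:m-1, [m-1]] + K[[m-1], :m-1] - K[m-1, m-1])

def satisfies_37(K, tol=1e-10):
    # condition (3.7): (i) k_ij >= 0 and k_ii = 0; (ii) K* non-negative definite
    cond_i = bool(np.all(K >= 0) and np.allclose(np.diag(K), 0.0))
    S = k_star(K)
    eigvals = np.linalg.eigvalsh((S + S.T) / 2)   # symmetrize before eigendecomposition
    cond_ii = bool(np.all(eigvals >= -tol))
    return cond_i and cond_ii

traits = np.array([0.0, 1.0, 2.5, 4.0])
K_good = (traits[:, None] - traits[None, :]) ** 2   # squared differences: CND
K_bad = (traits[:, None] - traits[None, :]) ** 4    # fourth powers: generally not CND
print(satisfies_37(K_good), satisfies_37(K_bad))    # expected: True False
```

For the squared-difference kernel, K* works out to the rank-one matrix 2(ti − tm)(tj − tm), which is non-negative definite, so condition (3.7) holds.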
Further, using the same lemma, convexity of Jensen differences of all orders can be proved in the same manner as in the case of multinomial distributions.

Lemma 3.2. A converse of Lemma 3.1 was proved by Lau (1985): under some topological restrictions, quadratic entropy is the only entropy function for which Jensen differences of all orders are convex.

4 A metric on the space of probability measures induced by QE

4.1 Dissimilarity measure and cross entropy. Let P1 and P2 be two probability measures, and define the dissimilarity between them induced by QE as

    D_Q(P1, P2) = 4 [Q((P1 + P2)/2) − (1/2) Q(P1) − (1/2) Q(P2)]
                = −∫∫ k(x1, x2) dµ(x1) dµ(x2),  µ(x) = P1(x) − P2(x),    (4.1)

which is, apart from a constant, the same as J1(P1, P2) as defined in (3.9). Rao and Nayak (1985) introduced the concept of cross entropy induced by quadratic entropy as a measure of closeness of P2 to P1. It is defined as

    C_Q(P2 | P1) = lim_{λ→0} [Q(λP1 + µP2) − Q(P2)] / λ + Q(P2) − Q(P1)
                 = −∫∫ k(x1, x2) dµ(x1) dµ(x2) = D_Q(P1, P2),    (4.2)

where µ = 1 − λ. The cross entropy (4.2) induced by QE is symmetric and is the same as the dissimilarity measure (4.1). In general, a cross entropy based on an entropy function is neither symmetric nor equal to the dissimilarity measure.

4.2 Metric on the space P induced by QE. The following lemmas, which are of independent interest, are needed to show that √D_Q(·, ·) is a metric on P.

Lemma 4.1 (Cauchy-Schwarz type inequality). Let k(x1, x2) be a CND function as defined in Section 3.3, let G(x1, x2) = −k(x1, x2), and let µ1, µ2 ∈ M = {µ : ∫ dµ(x) = 0}. Then

    [∫∫ G(x1, x2) dµ1(x1) dµ2(x2)]²
    ≤ [∫∫ G(x1, x2) dµ1(x1) dµ1(x2)] [∫∫ G(x1, x2) dµ2(x1) dµ2(x2)].    (4.3)

Proof. Using Lemma 3.1, we may define

    f²(µ) = ∫∫ G(x1, x2) dµ(x1) dµ(x2).    (4.4)

Let f(µ1) = 0. Considering µ = µ1 ± θµ2, we have

    θ² f²(µ2) ± 2θ ∫∫ G(x1, x2) dµ1(x1) dµ2(x2) ≥ 0  for all θ > 0,

which implies that ∫∫ G(x1, x2) dµ1(x1) dµ2(x2) = 0, and the result (4.3) holds. Let f(µ1) ≠ 0 and f(µ2) ≠ 0, both finite. Considering µ = µ1/f(µ1) ± µ2/f(µ2), we have from (3.9) and (3.10)

    f²(µ1)/f²(µ1) + f²(µ2)/f²(µ2) ± 2 ∫∫ G(x1, x2) dµ1(x1) dµ2(x2) / [f(µ1) f(µ2)] ≥ 0,

which establishes (4.3). Lemma 4.1 is proved.

Lemma 4.2 (Subadditivity). With f(µ) as defined in (4.4),

    f(µ1 + µ2) ≤ f(µ1) + f(µ2).    (4.5)

Proof. We have

    f²(µ1 + µ2) = f²(µ1) + f²(µ2) + 2 ∫∫ G(x1, x2) dµ1(x1) dµ2(x2)
                ≤ f²(µ1) + f²(µ2) + 2 f(µ1) f(µ2) = [f(µ1) + f(µ2)]²,

which establishes (4.5).

Theorem 4.1. If k(x1, x2) ≠ 0 when x1 ≠ x2, then

    ρ_Q(P1, P2) = √(D_Q(P1, P2))

is a metric on P, i.e.,

    (i)  ρ_Q(P1, P2) > 0 if P1 ≠ P2, and ρ_Q(P1, P2) = 0 iff P1 = P2;
    (ii) ρ_Q(P1, P2) + ρ_Q(P2, P3) ≥ ρ_Q(P1, P3).

Proof. Choose µ1 = P1 − P3 and µ2 = P3 − P2. Then an application of Lemma 4.2 proves the result.
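For discrete distributions the integrals in (4.1) reduce to quadratic forms, so ρ_Q is easy to compute. The following sketch (not from the paper; function names hypothetical) evaluates ρ_Q(P1, P2) = √(−(P1 − P2)′K(P1 − P2)) for multinomial distributions under the kernel kij = 1 for i ≠ j, kii = 0 (the Gini-Simpson case of QE), and spot-checks the triangle inequality of Theorem 4.1 on random probability vectors.

```python
import numpy as np

def rho_q(p1, p2, K):
    # rho_Q(P1, P2) = sqrt(D_Q(P1, P2)), with D_Q = -(P1 - P2)' K (P1 - P2):
    # the discrete form of (4.1), equal to J1(P1, P2) in (3.5)
    d = np.asarray(p1, float) - np.asarray(p2, float)
    return np.sqrt(max(-float(d @ K @ d), 0.0))   # clip tiny rounding errors

# kernel k_ij = 1 for i != j, k_ii = 0, i.e. the Gini-Simpson case of QE
m = 4
K = np.ones((m, m)) - np.eye(m)

rng = np.random.default_rng(0)
for _ in range(1000):
    P1, P2, P3 = rng.dirichlet(np.ones(m), size=3)
    assert rho_q(P1, P2, K) + rho_q(P2, P3, K) >= rho_q(P1, P3, K) - 1e-12
print("triangle inequality held in all sampled trials")
```

With this particular kernel, ρ_Q reduces to the Euclidean distance between the probability vectors, so properties (i) and (ii) of Theorem 4.1 are immediate; other CND kernels induce other metrics on P.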
References

Anderson, R.J. and Landis, J.R. (1980). When does the βth percentile residual life function determine the distribution? Oper. Res., 31, 391-396.

Burbea, J. and Rao, C.R. (1982a). On the convexity of divergence measures based on entropy functions. IEEE Trans. Inform. Theory, 28, 489-495.

Burbea, J. and Rao, C.R. (1982b). On the convexity of higher order Jensen differences based on entropy functions. IEEE Trans. Inform. Theory, 28, 961-963.

Burbea, J. and Rao, C.R. (1982c). Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivariate Anal., 12, 575-596.

Izsak, J. and Szeidl, L. (2002). Quadratic diversity: its maximization can reduce the richness of species. Environ. Ecol. Stat., 9, 423-430.

Lau, K.S. (1985). Characterization of Rao's quadratic entropy. Sankhyā, Ser. A, 47, 295-309.

Lewontin, R.C. (1972). The apportionment of human diversity. Evol. Biol., 6, 381-398.

Light, R.J. and Margolin, B.H. (1971). An analysis of variance for categorical data. J. Amer. Statist. Assoc., 66, 534-544.

Mathai, A. and Rathie, P.N. (1974). Basic Concepts in Information Theory and Statistics. Wiley (Halsted Press), New York.

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA, 70, 3321-3323.

Parthasarathy, K.R. and Schmidt, K. (1972). Positive Definite Kernels, Continuous Tensor Products and Central Limit Theorems of Probability Theory. Lecture Notes in Mathematics, 272. Springer, New York.

Patil, G.P. and Taillie, C. (1982). Diversity as a concept and its measurements. J. Amer. Statist. Assoc., 77, 548-567.

Pavoine, S., Ollier, S. and Pontier, D. (2005). Measuring diversity with Rao's quadratic entropy: Are any dissimilarities suitable? Theor. Popul. Biol., 67, 231-239.

Rao, C.R. (1981). Measures of diversity and applications. In Topics in Applied Statistics. Proc. Montreal Conf.

Rao, C.R. (1982a). Analysis of diversity: A unified approach. In Statistical Decision Theory and Related Topics, III, Vol. 2, (S.S. Gupta and J.O. Berger, eds.). Academic Press, New York, 235-250.

Rao, C.R. (1982b). Diversity and dissimilarity measurements: A unified approach. Theor. Popul. Biol., 21, 24-43.

Rao, C.R. (1982c). Diversity, its measurement, decomposition, apportionment and analysis. Sankhyā, 44, 1-21.

Rao, C.R. (1982d). Gini-Simpson index of diversity: A characterization, generalization and applications. Util. Math., 21, 273-282.

Rao, C.R. (1984). Convexity properties of entropy functions and analysis of diversity. In Inequalities in Statistics and Probability, (Y.L. Tong, ed.). IMS Lecture Notes Monogr. Ser., 5. IMS, Hayward, CA, 68-77.

Rao, C.R. (1986). Rao's axiomatization of diversity measures. In Encyclopedia of Statistical Sciences, 7. Wiley, 614-617.

Rao, C.R. (2010). Entropy and cross entropy: Characterizations and applications. In Alladi Ramakrishnan Memorial Volume, to appear.

Rao, C.R. and Boudreau, R. (1982). Diversity and cluster analysis of blood group data on some human populations. In Human Population Genetics: The Pittsburgh Symposium, (A. Chakravarti, ed.). Van Nostrand Reinhold, 331-362.

Rao, C.R. and Nayak, T.K. (1985). Cross entropy, dissimilarity measures and characterization of quadratic entropy. IEEE Trans. Inform. Theory, 31, 589-593.

Ricotta, C. and Szeidl, L. (2006). Towards a unifying approach to diversity measures: Bridging the gap between Shannon entropy and Rao's quadratic index. Theor. Popul. Biol., 70, 237-243.

Schoenberg, I.J. (1938). Metric spaces and positive definite functions. Trans. Amer. Math. Soc., 44, 522-536.

Wells, J. and Williams, L. (1975). Embeddings and Extensions in Analysis. Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 84. Springer-Verlag, New York.

Zolton, B.-D. (2005). Rao's quadratic entropy as a measure of functional diversity based on multiple traits. J. Vegetation Sci., 16, 533-540.

C. R. Rao
C. R. Rao Advanced Institute of Mathematics, Statistics and Computer Science
Prof. C. R. Rao Road
Hyderabad 500046, India
E-mail: [email protected]

Paper received July 2009; revised September 2009.