Sankhyā: The Indian Journal of Statistics
2010, Volume 72-A, Part 1, pp. 70-80
© 2010, Indian Statistical Institute
Quadratic Entropy and Analysis of Diversity
C. R. Rao
CRRao AIMSCS, Hyderabad, India
Abstract
In this paper, some general postulates are laid down for the construction of diversity measures, and conditions for an ANOVA-type analysis are investigated. It is shown that a diversity measure called quadratic entropy, introduced by the author in 1982 and applicable to both qualitative and quantitative data, provides a general solution to both of the problems posed above.
AMS (2000) subject classification. Primary 62A01, 62H30, 62B10, 94A17.
Keywords and phrases. Diversity measure, quadratic entropy, convexity.
1. Introduction
R.A. Fisher introduced the method of Analysis of Variance (ANOVA)
for partitioning the variance of a measurement in a sample of observations
into several components with assignable causes. ANOVA has been used in
a variety of situations from regression analysis to testing main effects and
interactions based on data from multifactorial experiments.
Two questions arise. Can an ANOVA-type analysis be carried out using measures of dispersion other than variance, such as the mean deviation, the range, the Gini coefficient of concentration, and the various entropy functions based on qualitative data used by ecologists to measure diversity? And what conditions must a diversity measure satisfy for such an analysis to be possible? In this paper, some general postulates are laid down for the construction of diversity measures, and conditions for an ANOVA-type analysis are investigated. It is shown that a diversity measure called quadratic entropy (QE), introduced by the author (Rao, 1982a, 1982b, 1982c, 1982d) and applicable to both qualitative and quantitative data, provides a general solution to both of the problems posed above.
We use the acronym ANODIV (Analysis of Diversity) for the decomposition
of a diversity measure into a number of components with assignable causes
as in ANOVA.
The paper is dedicated to the memory of Dr. A. Maitra in appreciation
of the fundamental contributions he has made to probability theory and
the devotion with which he served the Indian Statistical Institute in various
capacities including Directorship.
2. Characterization of a diversity measure
Let $\mathcal{P}$ be a convex set of probability measures defined on a measurable space $(\mathcal{X}, \mathcal{B})$, and $H(\cdot)$ a real-valued function defined on $\mathcal{P}$. The function $H$ will be called a diversity measure or entropy if it satisfies the following postulates.
$$C_0:\ H(P) = -J_0(P) \ge 0 \quad \forall\, P \in \mathcal{P}, \text{ and } H(P) = 0 \text{ iff } P \text{ is degenerate};$$
$$C_1:\ H(\lambda P_1 + \mu P_2) - \lambda H(P_1) - \mu H(P_2) = J_1(P_1, P_2 : \lambda, \mu) \ge 0,$$
$$\qquad P_1, P_2 \in \mathcal{P},\ \lambda \ge 0,\ \mu \ge 0,\ \lambda + \mu = 1.$$
The postulate $C_0$ requires $H$ to be a non-negative function. $C_1$ implies that $H$ is a concave function, or that $J_0 = -H$ is a convex function. It meets the intuitive requirement that the diversity within a mixture of two different populations (probability measures) is at least as great as the average of the diversities within the two populations. The function $J_1$ defined on $\mathcal{P}^2$ is called the first Jensen difference (Rao, 1982a). More generally, $C_1$ implies
$$H\Big(\sum_{i=1}^{k} \lambda_i P_i\Big) - \sum_{i=1}^{k} \lambda_i H(P_i) = J_1(\{P_i\} : \{\lambda_i\}) \ge 0, \qquad (2.1)$$
where $P_i \in \mathcal{P}$, $i = 1, \dots, k$, $\lambda_i \ge 0$, and $\lambda_1 + \dots + \lambda_k = 1$.
Note 1. Ecologists impose a further requirement on an entropy function: it should attain its maximum value when the probability distribution is uniform. For instance, in the case of a multinomial distribution with $m$ classes, $H(P)$ should attain its maximum when the probability of each class is $1/m$. We have shown in Rao (2010) that this condition is implied by the concavity condition if $H(P)$ is a symmetric function of the class probabilities. This postulate, however, restricts the diversity function to cases where equal importance is given to the differences between all pairs of classes. For instance, if some species are considered more similar to one another than others, based on some measurement, we may give less weight to their difference in constructing a diversity measure.
Note 2. If $C_1$ holds with $J_1 > 0$ when $P_1 \ne P_2$ and $J_1 = 0$ iff $P_1 = P_2$, then $H$ is a strictly concave function. In some applications, it may be necessary to have a strictly concave measure.
Using (2.1), based on the concavity of $H(P)$, we have the one-way ANODIV in Table 1, similar to ANOVA.
Table 1. One-way ANODIV for k populations

    Due to                           Diversity
    Between populations   (B)        $J_1(\{P_i\} : \{\lambda_i\})$
    Within populations    (W)        $\sum_i \lambda_i H(P_i)$
    Total                 (T)        $H(P_\cdot)$

In Table 1, $J_1$ is as defined in (2.1) and $P_\cdot = \sum_i \lambda_i P_i$. In practice, we estimate $H(P)$ based on observed data and choose $\lambda_i$ proportional to the sample size $n_i$ of observations from $P_i$, or set $\lambda_i = 1/k$, $i = 1, \dots, k$, depending on the problem under study. A decomposition as in Table 1 with all $\lambda_i$ equal has been used to compute $G = B/T$, called the genetic index of diversity, in population studies by Lewontin (1972), Nei (1973), Rao (1982b) and Rao and Boudreau (1982). Reference may also be made to papers by Light and Margolin (1971) and Anderson and Landis (1980) for some special cases.
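As a hedged illustration of the one-way decomposition in Table 1 (the function names, the choice of the Gini-Simpson index for $H$, and the example data are assumptions made here for illustration, not part of the paper), the following Python sketch computes $T = B + W$ and the ratio $G = B/T$ for $k$ multinomial populations.

```python
import numpy as np

def gini_simpson(p):
    """Gini-Simpson index H(P) = 1 - sum(p_i^2); a concave diversity measure."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def anodiv_one_way(P, lam):
    """One-way ANODIV (Table 1): T = H(P.), W = sum_i lam_i H(P_i), B = T - W."""
    P = np.asarray(P, dtype=float)        # k x m matrix of class probabilities
    lam = np.asarray(lam, dtype=float)    # mixing weights lam_i, summing to 1
    P_dot = lam @ P                       # pooled distribution P. = sum_i lam_i P_i
    T = gini_simpson(P_dot)               # total diversity
    W = sum(l * gini_simpson(p) for l, p in zip(lam, P))   # within populations
    B = T - W                             # between populations (Jensen difference)
    return {"B": B, "W": W, "T": T, "G": B / T}

# Example: k = 3 populations over m = 4 classes, weights proportional to sample sizes.
P = [[0.7, 0.1, 0.1, 0.1],
     [0.2, 0.5, 0.2, 0.1],
     [0.1, 0.1, 0.4, 0.4]]
lam = np.array([30, 50, 20]) / 100.0
print(anodiv_one_way(P, lam))             # B >= 0 by the concavity postulate C1
```

Because the Gini-Simpson index is concave, the between-populations component $B$ is a Jensen difference and comes out non-negative, as required by $C_1$.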
Now, we investigate the condition on H(P) for carrying out a two-way
ANODIV. Let us consider populations denoted by $P_{ij}$, $i = 1, \dots, r$ and $j = 1, \dots, s$, where the first index $i$ refers to a locality and the second index $j$ to a specific community. Further, let $\lambda_i \mu_j$ be the relative strength of individuals in locality $i$ and community $j$, where the $\lambda_i$ and $\mu_j$ are all non-negative and $\sum \lambda_i = \sum \mu_j = 1$. We may ask: How do the genetic contributions of individuals differ between localities and between communities as a whole? Is the magnitude of genetic differences between the communities different in different localities? Such questions concerning the $rs$ populations can be
answered by a two-way ANODIV as in Table 2.
Table 2. Two-way ANODIV

    Due to                           Diversity
    Localities            (L)        $H(P_{\cdot\cdot}) - \sum_{i=1}^{r} \lambda_i H(P_{i\cdot})$
    Communities           (C)        $H(P_{\cdot\cdot}) - \sum_{j=1}^{s} \mu_j H(P_{\cdot j})$
    Interaction           (LC)       * by subtraction
    Between populations   (B)        $H(P_{\cdot\cdot}) - \sum_i \sum_j \lambda_i \mu_j H(P_{ij})$
    Within populations    (W)        $\sum_i \sum_j \lambda_i \mu_j H(P_{ij})$
    Total                 (T)        $H(P_{\cdot\cdot})$
In Table 2, $P_{\cdot\cdot} = \sum_i \sum_j \lambda_i \mu_j P_{ij}$, $P_{i\cdot} = \sum_j \mu_j P_{ij}$, and $P_{\cdot j} = \sum_i \lambda_i P_{ij}$. What are the conditions under which the entries in Table 2 are non-negative? For B, L and C to be non-negative, the function $H(\cdot)$ defined on $\mathcal{P}$ should be strictly concave. For the interaction LC to be non-negative,
$$
\begin{aligned}
(LC) &= J_1(\{P_{ij}\} : \{\lambda_i \mu_j\}) - J_1(\{P_{i\cdot}\} : \{\lambda_i\}) - J_1(\{P_{\cdot j}\} : \{\mu_j\}) \\
     &= -\Big[H(P_{\cdot\cdot}) - \sum_i \lambda_i H(P_{i\cdot})\Big] + \sum_j \mu_j \Big[H(P_{\cdot j}) - \sum_i \lambda_i H(P_{ij})\Big] \\
     &= -J_1(\{P_{i\cdot}\} : \{\lambda_i\}) + \sum_j \mu_j\, J_1(\{P_{ij}\} : \{\lambda_i\}) \\
     &= J_2(\{P_{ij}\} : \{\lambda_i \mu_j\}) \ge 0,
\end{aligned}
$$
that is, LC is non-negative if $J_1$, as defined in (2.1), is convex on $\mathcal{P}^r$. We represent this condition on a diversity measure as $C_2$. If $C_1$ and $C_2$ hold, we call such a diversity measure one of order 2. Note that (LC) can also be expressed as
$$
-\Big[H(P_{\cdot\cdot}) - \sum_j \mu_j H(P_{\cdot j})\Big] + \sum_i \lambda_i \Big[H(P_{i\cdot}) - \sum_j \mu_j H(P_{ij})\Big]
= -J_1(\{P_{\cdot j}\} : \{\mu_j\}) + \sum_i \lambda_i\, J_1(\{P_{ij}\} : \{\mu_j\}),
$$
so that (LC) is also non-negative if $J_1$, as a function on $\mathcal{P}^s$, is convex.
We can recursively define higher-order Jensen differences, $J_3$ from $J_2$, $J_4$ from $J_3$, and so on, and call an $H(\cdot)$ for which $J_0, J_1, \dots, J_{i-1}$ are convex a diversity measure of order $i$. With such a measure we can carry out ANODIV for $i$-way classified data. A diversity measure for which the Jensen differences of all orders are convex is called a perfect diversity measure.
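The two-way layout of Table 2 can be sketched computationally in the same style as the one-way case (again an illustrative sketch; the helper names and the example data are assumptions). With the Gini-Simpson index as $H$, which satisfies $C_2$, the interaction term LC obtained by subtraction comes out non-negative.

```python
import numpy as np

def gini_simpson(p):
    """H(P) = 1 - sum(p_c^2); a concave measure that also satisfies C2."""
    return 1.0 - np.sum(np.asarray(p, dtype=float) ** 2)

def anodiv_two_way(P, lam, mu):
    """Two-way ANODIV (Table 2). P has shape (r, s, m): P[i, j] is population (i, j)."""
    P = np.asarray(P, dtype=float)
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    r, s, _ = P.shape
    w = np.outer(lam, mu)                      # cell weights lambda_i * mu_j
    P_dd = np.einsum("ij,ijc->c", w, P)        # pooled distribution P..
    P_id = np.einsum("j,ijc->ic", mu, P)       # locality margins  P_i.
    P_dj = np.einsum("i,ijc->jc", lam, P)      # community margins P_.j
    T = gini_simpson(P_dd)                     # total diversity H(P..)
    W = sum(w[i, j] * gini_simpson(P[i, j]) for i in range(r) for j in range(s))
    B = T - W                                  # between the r*s populations
    L = T - sum(lam[i] * gini_simpson(P_id[i]) for i in range(r))   # localities
    C = T - sum(mu[j] * gini_simpson(P_dj[j]) for j in range(s))    # communities
    LC = B - L - C                             # interaction, by subtraction
    return {"L": L, "C": C, "LC": LC, "B": B, "W": W, "T": T}

# Example: r = 2 localities, s = 2 communities, m = 3 classes.
P = [[[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]],
     [[0.2, 0.3, 0.5], [0.1, 0.4, 0.5]]]
print(anodiv_two_way(P, lam=[0.5, 0.5], mu=[0.4, 0.6]))   # LC >= 0 since J1 is convex here
```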
Some examples of entropy functions for multinomial distributions in $k$ classes, with $P_i$ representing the probability of class $i$, used as diversity measures in ecological applications, are as follows (a short computational sketch of these measures follows the list):

1. $-\sum P_i \log P_i$, Shannon entropy
2. $-\sum P_i \log P_i - \sum (1 - P_i) \log(1 - P_i)$, Paired Shannon entropy
3. $1 - \sum P_i^2$, Gini-Simpson index
4. $(1 - \alpha)^{-1} \log \sum P_i^{\alpha}$, $0 < \alpha < 1$, Rényi $\alpha$-entropy
5. $(\alpha - 1)^{-1}\big(1 - \sum P_i^{\alpha}\big)$, Havrda and Charvát entropy
6. $\sum\sum d_{ij} P_i P_j$, Rao's quadratic entropy
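For concreteness, the following Python sketch gives minimal implementations of the six measures listed above (the function names are illustrative, and the difference matrix $D$ passed to the quadratic entropy is an assumed example, not taken from the paper).

```python
import numpy as np

def shannon(p):
    """1. Shannon entropy: -sum p_i log p_i (0 log 0 taken as 0)."""
    p = np.asarray(p, dtype=float)
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def paired_shannon(p):
    """2. Paired Shannon entropy: -sum p_i log p_i - sum (1-p_i) log(1-p_i)."""
    p = np.asarray(p, dtype=float)
    q = p[(p > 0) & (p < 1)]
    return -np.sum(q * np.log(q)) - np.sum((1 - q) * np.log(1 - q))

def gini_simpson(p):
    """3. Gini-Simpson index: 1 - sum p_i^2."""
    return 1.0 - np.sum(np.asarray(p, dtype=float) ** 2)

def renyi(p, alpha):
    """4. Rényi alpha-entropy, 0 < alpha < 1."""
    return np.log(np.sum(np.asarray(p, dtype=float) ** alpha)) / (1.0 - alpha)

def havrda_charvat(p, alpha):
    """5. Havrda and Charvát entropy: (alpha - 1)^{-1} (1 - sum p_i^alpha)."""
    return (1.0 - np.sum(np.asarray(p, dtype=float) ** alpha)) / (alpha - 1.0)

def quadratic_entropy(p, D):
    """6. Rao's quadratic entropy: sum_ij d_ij p_i p_j for a difference matrix D."""
    p = np.asarray(p, dtype=float)
    return p @ np.asarray(D, dtype=float) @ p

p = [0.5, 0.3, 0.2]
D = 1.0 - np.eye(3)   # d_ij = 1 for i != j reduces QE to the Gini-Simpson index
print(shannon(p), gini_simpson(p), quadratic_entropy(p, D))
```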
For a general discussion of the measurement of diversity, reference may be made to Patil and Taillie (1982), Mathai and Rathie (1974), and Rao (1984, 1986). Burbea and Rao (1982a, 1982b, 1982c) have shown that the Shannon entropy (1) and the Paired Shannon entropy (2) satisfy the conditions $C_0$, $C_1$ and $C_2$ but not $C_3, C_4, \dots$. The Havrda and Charvát entropy (5) satisfies $C_0$, $C_1$, $C_2$ for $\alpha$ in the range $(1, 2]$ when $k \ge 3$ and for $\alpha$ in the range $[1, 2] \cup (3, 11/3)$ when $k = 2$, while $C_3, C_4, \dots$ do not hold except when $\alpha = 2$, in which case it reduces to the Gini-Simpson index (3). Rényi's entropy (4) satisfies $C_0$, $C_1$, $C_2$ only for $\alpha$ in $(0, 1)$. Most of the well-known entropy functions can therefore be used only for a two-way ANODIV, but not for higher-order ANODIV. It will be shown in the next section that the quadratic entropy introduced by Rao (1981, 1982a) is a perfect diversity measure which can be used to carry out ANODIV for classified data of any order.
3. Quadratic entropy
3.1 Definition. Quadratic entropy (QE) as a measure of diversity of
individuals in a population is based on a quantitative assessment of the
difference between any two individuals of the population identified by certain
measurements which may be qualitative or quantitative. Let x1 and x2 be
the measurements on two individuals drawn from a population, and the
difference between them be assessed through a kernel function
$$k(x_1, x_2) = k(x_2, x_1) \ge 0, \quad \text{and } k(x_1, x_2) = 0 \text{ if } x_1 = x_2.$$
We define the quadratic entropy of a population with a probability measure $P$ by the expression
$$Q(P) = E_{PP}\, k(x_1, x_2), \qquad (3.1)$$
where we use the general notation $E_{P_1 P_2}$ for expectation with respect to the product measure $P_1 P_2$. If $x_1$ is an observation from $P_1$ and $x_2$ from $P_2$, we expect
$$\Big(E_{P_1 P_2} - \tfrac{1}{2} E_{P_1 P_1} - \tfrac{1}{2} E_{P_2 P_2}\Big) k(x_1, x_2) = E_{P_1 P_2} k(x_1, x_2) - \tfrac{1}{2}\big[E_{P_1 P_1} k(x_1, x_2)\big] - \tfrac{1}{2}\big[E_{P_2 P_2} k(x_1, x_2)\big] \ge 0, \qquad (3.2)$$
which implies that two individuals drawn from different populations are more
dissimilar than those drawn from the same population.
An alternative requirement on the choice of $k(x_1, x_2)$ is the condition $C_1$ of concavity of a diversity measure as described in Section 2. Considering a mixture of distributions $P_1$ and $P_2$ with prior probabilities $\lambda$ and $\mu = 1 - \lambda$, concavity of $Q(P)$ as defined in (3.1) implies
$$Q(\lambda P_1 + \mu P_2) - \lambda Q(P_1) - \mu Q(P_2) = 2\lambda\mu \Big(E_{P_1 P_2} - \tfrac{1}{2} E_{P_1 P_1} - \tfrac{1}{2} E_{P_2 P_2}\Big) k(x_1, x_2) \ge 0, \qquad (3.3)$$
which is the same as the condition (3.2). If in (3.3) equality is attained if and only if $P_1 = P_2$, $Q(\cdot)$ is said to be strictly concave, which may be needed in
certain investigations. For instance, in the univariate case the choice
$$k(x_1, x_2) = \tfrac{1}{2}(x_1 - x_2)^2 \;\Rightarrow\; Q(P) = \text{Variance of } P = \sigma_P^2$$
leads to
$$J_1 = \Big(E_{P_1 P_2} - \tfrac{1}{2} E_{P_1 P_1} - \tfrac{1}{2} E_{P_2 P_2}\Big)(x_1 - x_2)^2 = (\mu_1 - \mu_2)^2, \qquad E_{P_1}(x) = \mu_1, \quad E_{P_2}(x) = \mu_2,$$
so that the diversity between $P_1$ and $P_2$ in one-way ANODIV does not reflect the differences in the variances of $P_1$ and $P_2$. Thus ANOVA is essentially useful in detecting differences in the mean values of distributions. $Q(P)$ as defined in (3.1) is not strictly concave if the populations under study include distributions with mean zero and different variances.
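A quick numerical check of this special case may help (a sketch only; the sampling setup is an assumption): with $k(x_1, x_2) = \tfrac{1}{2}(x_1 - x_2)^2$, a Monte Carlo estimate of $Q(P)$ recovers the variance, and the Jensen difference between two populations with equal means but unequal variances is approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def k(x1, x2):
    """Kernel choice k(x1, x2) = (x1 - x2)^2 / 2."""
    return 0.5 * (x1 - x2) ** 2

def E_kernel(xs, ys):
    """Monte Carlo estimate of E_{P1 P2} k(x1, x2) from independent samples."""
    return np.mean(k(xs[:, None], ys[None, :]))

n = 2000
x1 = rng.normal(loc=0.0, scale=1.0, size=n)   # P1: mean 0, variance 1
x2 = rng.normal(loc=0.0, scale=2.0, size=n)   # P2: mean 0, variance 4

Q1 = E_kernel(x1, x1)                          # ~ Var(P1) = 1
Q2 = E_kernel(x2, x2)                          # ~ Var(P2) = 4
J1 = E_kernel(x1, x2) - 0.5 * Q1 - 0.5 * Q2    # ~ 0: equal means, Jensen difference vanishes
print(round(Q1, 2), round(Q2, 2), round(J1, 3))
```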
3.2 Choice of the k-function for multinomial distributions. Let P =
(P1 , . . . , Pm ) be a multinomial distribution in m classes, where Pi is the
probability of class i. Then the quadratic entropy of P is
$$Q(P) = P' K P, \quad K = (k_{ij}), \qquad (3.4)$$
where in the $m \times m$ matrix $(k_{ij})$, $k_{ij} \ge 0$ represents the assessed difference between the classes $i$ and $j$. Using the expression (3.3) without the term $\lambda\mu$,
$$J_1(P_1, P_2) = -(P_1 - P_2)' K (P_1 - P_2) = (P_1^* - P_2^*)' K^* (P_1^* - P_2^*), \qquad (3.5)$$
where $P_1$ and $P_2$ are two distributions, $P_1^*$ and $P_2^*$ are the vectors of the first $m - 1$ components of $P_1$ and $P_2$ respectively, and
$$K^* = (-k_{ij} + k_{im} + k_{mj} - k_{mm}). \qquad (3.6)$$
Concavity of $Q(P)$ requires the first Jensen difference (3.5) to be non-negative, which holds if $K^*$ is a non-negative definite matrix. We may state the desired result as follows: $Q(P)$ as defined in (3.4) satisfies the requirements of a diversity measure if
$$\text{(i) } k_{ij} \ge 0,\ k_{ii} = 0; \qquad \text{(ii) the matrix } K^* \text{ is non-negative definite.} \qquad (3.7)$$
$J_1(P_1, P_2)$ is defined on $\mathcal{P}^2$. Let us consider two pairs of distributions $P_1, P_2$ and $R_1, R_2$ and compute the second-order Jensen difference
$$
\begin{aligned}
J_2(\{P_1, P_2\}, \{R_1, R_2\} : \lambda, \mu)
&= \lambda J_1(P_1, P_2) + \mu J_1(R_1, R_2) - J_1(\lambda P_1 + \mu R_1,\ \lambda P_2 + \mu R_2) \\
&= [\lambda(P_1 - P_2) + \mu(R_1 - R_2)]' K [\lambda(P_1 - P_2) + \mu(R_1 - R_2)] \\
&\qquad - \lambda(P_1 - P_2)' K (P_1 - P_2) - \mu(R_1 - R_2)' K (R_1 - R_2) \\
&= -\lambda\mu\, [(P_1 - P_2) - (R_1 - R_2)]' K [(P_1 - P_2) - (R_1 - R_2)].
\end{aligned} \qquad (3.8)
$$
The expression (3.8) is non-negative if $K^*$ as defined in (3.6) is non-negative definite. Thus, with the same conditions as in (3.7), $J_0 = -Q(P)$ and $J_1$ are convex. Similarly, the Jensen differences of all orders are convex under the conditions (3.7).
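The conditions (3.6)-(3.7) can be checked mechanically; the sketch below (function names and the example matrix $K$ are assumptions chosen to satisfy $k_{ij} \ge 0$, $k_{ii} = 0$) builds $K^*$, verifies that it is non-negative definite, and confirms that the first Jensen difference (3.5) is non-negative for a pair of distributions.

```python
import numpy as np

def K_star(K):
    """K* from (3.6): K*_{ij} = -k_{ij} + k_{im} + k_{mj} - k_{mm}, i, j = 1..m-1."""
    K = np.asarray(K, dtype=float)
    m = K.shape[0]
    return -K[:m-1, :m-1] + K[:m-1, [m-1]] + K[[m-1], :m-1] - K[m-1, m-1]

def quadratic_entropy(p, K):
    """Q(P) = P' K P as in (3.4)."""
    p = np.asarray(p, dtype=float)
    return p @ np.asarray(K, dtype=float) @ p

def jensen_J1(p1, p2, K):
    """First Jensen difference (3.5): -(P1 - P2)' K (P1 - P2)."""
    d = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return -d @ np.asarray(K, dtype=float) @ d

# An arbitrary symmetric difference matrix with k_ii = 0 and k_ij >= 0.
K = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])

Ks = K_star(K)
print("K* eigenvalues:", np.linalg.eigvalsh(Ks))   # all >= 0 => condition (ii) of (3.7)

p1, p2 = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
print("Q(p1)      =", quadratic_entropy(p1, K))
print("J1(p1, p2) =", jensen_J1(p1, p2, K))        # non-negative when K* is nnd
```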
For some applications of Q(P ) as in (3.4) reference may be made to Izsak
and Szeidl (2002), Pavoine, Ollier and Pontier (2005), Rao (1984), Ricotta
and Szeidl (2006) and Zolton (2005).
3.3 Choice of the k-function: Continuous case. Let P1 (x) and P2 (x) be
two probability measures, in which case the expression (3.3) takes the form,
omitting the constant factor
$$\int\!\!\int k(x_1, x_2)\,\big[2\, dP_1(x_1)\, dP_2(x_2) - dP_1(x_1)\, dP_1(x_2) - dP_2(x_1)\, dP_2(x_2)\big] = -\int\!\!\int k(x_1, x_2)\, d\mu(x_1)\, d\mu(x_2) \ge 0, \qquad (3.9)$$
where $\mu = P_1 - P_2$ and $\int d\mu = 0$. The conditions under which (3.9) holds
are provided by the following Lemma 3.1 due to Lau (1985).
Lemma 3.1. Let $k(x_1, x_2)$ be continuous and such that
$$k(x_1, x_2) \ge 0, \quad k(x, x) = 0, \quad k(x_1, x_2) = k(x_2, x_1). \qquad (3.10)$$
Then a necessary and sufficient condition for (3.9) to be non-negative is that $k(x_1, x_2)$ is a conditionally negative definite (CND) function.
A function $k(x_1, x_2)$ satisfying (3.10) is CND if
$$\sum_{i=1}^{n} \sum_{j=1}^{n} k(x_i, x_j)\, a_i a_j \le 0$$
for all $x_1, \dots, x_n$ and $a_1, a_2, \dots, a_n$ such that $a_1 + \dots + a_n = 0$, and for any value of $n$. For some properties of such functions and the results based on them, reference may be made to Schoenberg (1938), Parthasarathy and Schmidt (1972) and Wells and Williams (1975).
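As an illustrative numerical check of this definition (an assumption-laden sketch, not taken from the paper), the kernel $k(x_1, x_2) = |x_1 - x_2|$, a standard example of a CND function on the real line, can be tested on random points and random weights summing to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def cnd_quadratic_form(k, xs, a):
    """Evaluate sum_ij k(x_i, x_j) a_i a_j for weights a summing to zero."""
    Kmat = k(xs[:, None], xs[None, :])
    return a @ Kmat @ a

k_abs = lambda x1, x2: np.abs(x1 - x2)      # |x1 - x2| is a CND kernel on the real line

worst = -np.inf
for _ in range(1000):
    n = rng.integers(2, 8)
    xs = rng.normal(size=n)
    a = rng.normal(size=n)
    a -= a.mean()                            # enforce a_1 + ... + a_n = 0
    worst = max(worst, cnd_quadratic_form(k_abs, xs, a))

print("largest quadratic form found:", worst)   # stays <= 0 (up to rounding error)
```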
Further, using the same lemma, convexity of the Jensen differences of all orders can be proved in the same manner as in the case of multinomial distributions.

Lemma 3.2. A converse of Lemma 3.1 was proved by Lau (1985): under some topological restrictions, quadratic entropy is the only entropy function for which the Jensen differences of all orders are convex.
4. A metric on the space of probability measures induced by QE
4.1 Dissimilarity measure and cross entropy. Let $P_1$ and $P_2$ be two probability measures and define the dissimilarity between them induced by QE as
$$D_Q(P_1, P_2) = 4\Big[Q\Big(\frac{P_1 + P_2}{2}\Big) - \frac{1}{2} Q(P_1) - \frac{1}{2} Q(P_2)\Big] = -\int\!\!\int k(x_1, x_2)\, d\mu(x_1)\, d\mu(x_2), \quad \mu(x) = P_1(x) - P_2(x), \qquad (4.1)$$
which, apart from a constant factor, is the same as $J_1(P_1, P_2)$ as given in (3.9).
Rao and Nayak (1985) introduced the concept of cross entropy induced by
quadratic entropy as a measure of closeness of P2 to P1 . It is defined as:
$$C_Q(P_2 \mid P_1) = \lim_{\lambda \to 0} \frac{Q(\lambda P_1 + \mu P_2) - Q(P_2)}{\lambda} + Q(P_2) - Q(P_1) = -\int\!\!\int k(x_1, x_2)\, d\mu(x_1)\, d\mu(x_2) = D_Q(P_1, P_2). \qquad (4.2)$$
The cross entropy (4.2) induced by QE is symmetric and is the same as the dissimilarity measure (4.1). In general, a cross entropy based on an entropy function is neither symmetric nor equal to the dissimilarity measure.
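The identity $C_Q(P_2 \mid P_1) = D_Q(P_1, P_2)$ can be illustrated numerically (a sketch under assumed inputs: a multinomial quadratic entropy with an example matrix $K$, and a small $\lambda$ in place of the limit).

```python
import numpy as np

def Q(p, K):
    """Quadratic entropy Q(P) = P' K P for a multinomial distribution."""
    p = np.asarray(p, dtype=float)
    return p @ K @ p

K = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.2, 0.5, 0.3])

# Dissimilarity (4.1): D_Q = 4 [Q((P1 + P2)/2) - Q(P1)/2 - Q(P2)/2].
D = 4 * (Q((p1 + p2) / 2, K) - 0.5 * Q(p1, K) - 0.5 * Q(p2, K))

# Cross entropy (4.2), approximating the limit with a small lambda.
lam = 1e-6
C = (Q(lam * p1 + (1 - lam) * p2, K) - Q(p2, K)) / lam + Q(p2, K) - Q(p1, K)

print(D, C)   # the two values agree up to the finite-difference error
```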
4.2 Metric on the space $\mathcal{P}$ induced by QE. The following lemmas, which are of independent interest, are needed to show that $\sqrt{D_Q(\cdot, \cdot)}$ is a metric on $\mathcal{P}$.
Lemma 4.1 (Cauchy-Schwarz type inequality). Let $k(x_1, x_2)$ be a CND function as defined in Section 3.3, let $G(x_1, x_2) = -k(x_1, x_2)$, and let $\mu_1, \mu_2 \in M = \{\mu : \int d\mu(x) = 0\}$. Then
$$\Big(\int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2)\Big)^2 \le \Big(\int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_1(x_2)\Big)\Big(\int\!\!\int G(x_1, x_2)\, d\mu_2(x_1)\, d\mu_2(x_2)\Big). \qquad (4.3)$$
Proof. Using Lemma 3.1, we define
$$f^2(\mu) = \int\!\!\int G(x_1, x_2)\, d\mu(x_1)\, d\mu(x_2). \qquad (4.4)$$
Let $f(\mu_1) = 0$. Considering $\mu = \mu_1 \pm \theta \mu_2$, we have
$$\theta^2 f^2(\mu_2) \pm 2\theta \int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2) \ge 0 \quad \forall\, \theta > 0,$$
which implies that $\int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2) = 0$ and the result (4.3) holds.

Now let $f(\mu_1) \ne 0$ and $f(\mu_2) \ne 0$, both finite. Considering $\mu = \mu_1/f(\mu_1) \pm \mu_2/f(\mu_2)$, we have from (3.9) and (3.10)
$$\frac{f^2(\mu_1)}{f^2(\mu_1)} + \frac{f^2(\mu_2)}{f^2(\mu_2)} \pm \frac{2\int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2)}{f(\mu_1)\, f(\mu_2)} \ge 0,$$
which establishes (4.3). Lemma 4.1 is proved. □
Lemma 4.2 (Subadditivity). With $f(\mu)$ as defined in (4.4),
$$f(\mu_1 + \mu_2) \le f(\mu_1) + f(\mu_2). \qquad (4.5)$$

Proof. We have
$$f^2(\mu_1 + \mu_2) = f^2(\mu_1) + f^2(\mu_2) + 2\int\!\!\int G(x_1, x_2)\, d\mu_1(x_1)\, d\mu_2(x_2) \le f^2(\mu_1) + f^2(\mu_2) + 2 f(\mu_1) f(\mu_2) = [f(\mu_1) + f(\mu_2)]^2,$$
which establishes (4.5). □
Theorem 4.1. If $k(x_1, x_2) \ne 0$ when $x_1 \ne x_2$, then
$$\rho_Q(P_1, P_2) = \sqrt{D_Q(P_1, P_2)}$$
is a metric on $\mathcal{P}$, i.e.,

(i) $\rho_Q(P_1, P_2) > 0$ if $P_1 \ne P_2$, and $\rho_Q(P_1, P_2) = 0$ iff $P_1 = P_2$;
(ii) $\rho_Q(P_1, P_2) + \rho_Q(P_2, P_3) \ge \rho_Q(P_1, P_3)$.

Proof. Choose $\mu_1 = P_1 - P_3$ and $\mu_2 = P_3 - P_2$. Then an application of Lemma 4.2 proves the result. □
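As a final sketch (assumptions: multinomial distributions and an example matrix $K$ whose $K^*$ is non-negative definite, as checked above), $\rho_Q = \sqrt{D_Q}$ can be tested numerically against the triangle inequality.

```python
import numpy as np

def rho_Q(p1, p2, K):
    """Metric induced by QE: sqrt(D_Q) with D_Q = -(P1 - P2)' K (P1 - P2)."""
    d = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return np.sqrt(-d @ K @ d)

# Example difference matrix (k_ij >= 0, k_ii = 0, K* non-negative definite).
K = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])

rng = np.random.default_rng(2)
for _ in range(1000):
    p1, p2, p3 = rng.dirichlet(np.ones(3), size=3)
    assert rho_Q(p1, p2, K) + rho_Q(p2, p3, K) >= rho_Q(p1, p3, K) - 1e-12
print("triangle inequality held in all sampled cases")
```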
References
Anderson, R.J. and Landis, J.R. (1980). When does the βth percentile residual life
function determine the distribution? Oper. Res., 31, 391–396.
Burbea, J. and Rao, C.R. (1982a). On the convexity of divergence measures based on
entropy functions. IEEE Trans. Inform. Theory, 28, 489–495.
Burbea, J. and Rao, C.R. (1982b). On the convexity of higher order Jensen differences
based on entropy functions. IEEE Trans. Inform. Theory, 28, 961–963.
Burbea, J. and Rao, C.R. (1982c). Entropy differential metric distance and divergence
measures in probability spaces: A unified approach. J. Multivariate Anal., 12,
575–596.
Izsak, J. and Szeidl, L. (2002). Quadratic diversity: its maximization can reduce the
richness of species. Environ. Ecol. Stat., 9, 423–430.
Lau, K.S. (1985). Characterization of Rao’s quadratic entropy. Sankhyā, Ser. A, 47,
295–309.
Lewontin, R.C. (1972). The apportionment of human diversity. Evol. Biol., 6, 381–398.
Light, R.J. and Margolin, B.H. (1971). An analysis of variance for categorical data.
J. Amer. Statist. Assoc., 66, 534–544.
Mathai, A. and Rathie, P.N. (1974). Basic Concepts in Information Theory and
Statistics. Wiley (Halsted Press), New York.
Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proc. Natl. Acad.
Sci. USA, 70, 3321–3323.
Parthasarathy, K.R. and Schmidt, K. (1972). Positive Definite Kernels, Continuous
Tensor Products and Central Limit Theorems of Probability Theory. Lecture Notes
in Mathematics, 272. Springer, NY.
Patil, G.P. and Taillie, C. (1982). Diversity as a concept and its measurements. J.
Amer. Statist. Assoc., 77, 548–567.
Pavoine, S., Ollier, S. and Pontier, D. (2005). Measuring diversity with Rao’s
quadratic entropy: Are any dissimilarities suitable? Theor. Popul. Biol., 67, 231–
239.
Rao, C.R. (1981). Measures of diversity and applications. In Topics in Applied Statistics. Proc. Montreal Conf.
Rao, C.R. (1982a). Analysis of diversity: A unified approach. In Statistical Decision
Theory and Related Topics, III, Vol. 2, (S.S. Gupta and J.O. Berger, eds.). Academic Press, New York, 235–250.
Rao, C.R. (1982b). Diversity and dissimilarity measurements: A unified approach.
Theor. Popul. Biol., 21, 24–43.
Rao, C.R. (1982c). Diversity, its measurement, decomposition, apportionment and analysis. Sankhyā, 44, 1–21.
Rao, C.R. (1982d). Gini-Simpson index of diversity: A characterization, generalization
and applications. Util. Math., 21, 273–282.
Rao, C.R. (1984). Convexity properties of entropy functions and analysis of diversity.
In Inequalities in Statistics and Probability, (Y.L. Tong, ed.). IMS Lecture Notes
Monogr. Ser., 5. IMS, Hayward, CA, 68–77.
Rao, C.R. (1986). Rao’s axiomatization of diversity measures. In Encyclopedia of
Statistical Sciences, 7. Wiley, 614–617.
Rao, C.R. (2010). Entropy and cross entropy: Characterizations and applications. In
Alladi Ramakrishnan Memorial Volume, to appear.
Rao, C.R. and Boudreau, R. (1982). Diversity and cluster analysis of blood group
data on some human populations. In Human Population Genetics: The Pittsburgh
Symposium, (A. Chakravarti, ed.). Van Nostrand Reinhold, 331–362.
Rao, C.R. and Nayak, T.K. (1985). Cross entropy, dissimilarity measures and characterization of quadratic entropy. IEEE Trans. Inform. Theory, 31, 589–593.
Ricotta, C. and Szeidl, L. (2006). Towards a unifying approach to diversity measures:
Bridging the gap between Shannon entropy and Rao’s quadratic index. Theor.
Popul. Biol., 70, 237–243.
Schoenberg, I.J. (1938). Metric spaces and positive definite functions. Trans. Amer.
Math. Soc., 44, 522–536.
Wells, J. and Williams, L. (1975). Embeddings and Extensions in Analysis. Ergebnisse
der Mathematik und ihrer Grenzgebiete, Band 84. Springer-Verlag, NY.
Zolton, B.-D. (2005). Rao’s quadratic entropy as a measure of functional diversity
based on multiple traits. J. Vegetation Sci., 16, 533–540.
C. R. Rao
C. R. Rao Advanced Institute of
Mathematics, Statistics and
Computer Science
Prof. C. R. Rao Road
Hyderabad 500046, India
E-mail: [email protected]
Paper received July 2009; revised September 2009.