BAYESIAN ESTIMATION FOR THE PROPORTIONS IN A MIXTURE OF DISTRIBUTIONS

Javad Behboodian
Department of Statistics, University of North Carolina at Chapel Hill,
and Pahlavi University, Shiraz, Iran

Institute of Statistics Mimeo Series No. 733
January 1971

ABSTRACT

The joint density of a random sample from a mixture of two distributions is expressed as a binomial mixture of conditional densities. Then the posterior distribution of the proportion, in a mixture of two known and distinct distributions, relative to a beta prior distribution is explored in detail, and a conjugate family of prior distributions is introduced. The results are generalized for estimating the proportions in a finite mixture of known distributions.

1. INTRODUCTION

Consider

    m(x) = p f(x) + q g(x),                                                (1.1)

where 0 < p < 1, q = 1 - p, and f(x) and g(x) are two known and distinct probability density functions. Since f(x) and g(x) are distinct from each other, m(x) is identifiable, i.e., corresponding to two different values of p we have two different mixtures.

The estimation problem of proportions in a finite mixture of distributions has already been investigated by Boes. In [1] he suggests a class of unbiased estimators which converge with probability 1, and in [2] he gives minimax unbiased estimators for proportions. Malceva [4] introduces moment estimators for proportions, and shows that they are unbiased and asymptotically normal. As far as I know, a Bayesian analysis of this problem has not so far been explored.

Here we look at the problem from a subjectivistic Bayesian point of view by considering the experimenter's information about the proportion p prior to taking a sample. Such information is usually expressed by a prior distribution for p, which reflects subjective beliefs or knowledge about p. The prior distribution is modified, by using the sample information, in accordance with Bayes' theorem to yield a posterior distribution of p. A Bayesian analysis of p would consist in the exploration and interpretation of the posterior distribution. Examples of such analyses are available in many of the references provided by L. J. Savage [5].

It is known that the posterior distribution is proportional to the product of the likelihood function and the prior density. Before introducing a prior distribution for p, we first write its likelihood function in a convenient form.

2. THE LIKELIHOOD FUNCTION

Let X_1, X_2, ..., X_n be a random sample from a population with probability density function (1.1). The likelihood function of p, for 0 < p < 1 and with x_1, x_2, ..., x_n as the experimental values of the random sample, is

    L(p) = \prod_{i=1}^{n} \bigl[ p\, f(x_i) + q\, g(x_i) \bigr].          (2.1)

The contribution to L(p) of any x_i for which f(x_i) = g(x_i) does not depend on p, and we can eliminate such non-informative x_i's from our observed sample without any loss. It is quite possible to have f(x_i) = g(x_i) for i = 1, 2, ..., n. For example, this may happen in the case of a mixture of two uniform distributions defined on two different overlapping intervals of equal length. However, due to the distinctness of f(x) and g(x), the chance of such an event becomes small as n becomes large. Therefore, to avoid exceptional cases, we assume from the beginning that f(x_i) ≠ g(x_i) for all x_i's. The logarithm of L(p) is then a concave function of p with at most one local maximum.
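The likelihood (2.1) is easy to explore numerically. The following sketch is ours, not part of the original paper: it takes two illustrative known components (normal densities N(0,1) and N(2,1), the sample size, and the seed are assumptions made purely for the example), draws a sample from the mixture, and locates the maximizing value of p by bounded one-dimensional search.

```python
# Minimal illustrative sketch only: the paper assumes arbitrary known,
# distinct densities f and g; the normal components are assumptions here.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

f = norm(0.0, 1.0).pdf            # known component density f(x)  (assumed N(0,1))
g = norm(2.0, 1.0).pdf            # known component density g(x)  (assumed N(2,1))

rng = np.random.default_rng(seed=1)
n, p_true = 200, 0.3
# Draw from m(x) = p f(x) + q g(x) by first choosing the component.
x = np.where(rng.random(n) < p_true,
             rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

def neg_log_L(p):
    # -log L(p), with L(p) as in (2.1) and q = 1 - p
    return -np.sum(np.log(p * f(x) + (1.0 - p) * g(x)))

res = minimize_scalar(neg_log_L, bounds=(1e-9, 1.0 - 1e-9), method="bounded")
print(f"p_hat = {res.x:.4f}  (sample drawn with p = {p_true})")
```

Since log L(p) is a sum of logarithms of linear functions of p, it is concave, and the bounded search returns its unique interior maximum whenever one exists.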
It can be shown that, with only light regularity conditions on the mixed densities f(x) and g(x), L(p) has a unique maximum at p̂ in the interval (0,1) when n is sufficiently large, with the usual desirable large-sample properties. However, for small samples p̂ may be a poor estimate.

To write L(p) in a suitable form, we can expand the right side of (2.1). But it is more useful to apply the following probabilistic argument. Let us denote the right side of (2.1), which is in fact the joint density of the random sample, by d(x_1, x_2, ..., x_n). Then

    d(x_1, x_2, \ldots, x_n) = \sum_{k=0}^{n} P(E_k)\, S_k(x_1, x_2, \ldots, x_n),    (2.2)

where E_k is the event that exactly k of the x_i's have density f(x) and the rest have density g(x), and S_k is the conditional joint density given E_k. It is clear that

    P(E_k) = \binom{n}{k} p^k q^{n-k}.                                     (2.3)

The event E_k can happen in \binom{n}{k} equally likely ways, depending on the partition of the random sample. The joint density of X_1, X_2, ..., X_n corresponding to a particular partition is

    h_{t_k}(x_1, x_2, \ldots, x_n) = \prod_{a \in A_k} f(x_a) \prod_{b \in B_k} g(x_b),    (2.4)

where t_k = (A_k, B_k) is a partition of the set {1, 2, ..., n} into two sets A_k and B_k, with k elements in A_k. Denoting the set of all such partitions by T_k and using conditional density once more, we obtain

    S_k(x_1, x_2, \ldots, x_n) = \sum_{t_k \in T_k} h_{t_k}(x_1, x_2, \ldots, x_n) \Big/ \binom{n}{k},    (2.5)

where S_k(x_1, x_2, ..., x_n) is a symmetric n-variate density. Now, from (2.2), (2.3), and (2.5), we have

    L(p) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} S_k(x_1, x_2, \ldots, x_n).    (2.6)

Therefore, the joint density of the random sample is a binomial mixture of the densities S_k(x_1, x_2, ..., x_n) defined by (2.4) and (2.5). For example, if n = 3, then we have four 3-variate densities:

    S_0(x_1, x_2, x_3) = g(x_1) g(x_2) g(x_3),
    S_1(x_1, x_2, x_3) = [f(x_1)g(x_2)g(x_3) + f(x_2)g(x_1)g(x_3) + f(x_3)g(x_1)g(x_2)]/3,
    S_2(x_1, x_2, x_3) = [f(x_1)f(x_2)g(x_3) + f(x_1)f(x_3)g(x_2) + f(x_2)f(x_3)g(x_1)]/3,
    S_3(x_1, x_2, x_3) = f(x_1) f(x_2) f(x_3).

3. BAYESIAN ESTIMATION FOR p

Theoretically p can assume any real number in the interval (0,1). Suppose the experimenter expresses his information and belief about p, which is now considered as a random variable, by a beta density of the form

    \beta(p;\, u, v) = \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)}\, p^{u-1} (1-p)^{v-1}    (3.1)

for 0 < p < 1, and zero elsewhere, with parameters u > 0 and v > 0. Now, the posterior density of p becomes

    \pi(p \mid x_1, x_2, \ldots, x_n) \propto L(p)\, \beta(p;\, u, v),      (3.2)

where \propto denotes proportionality. By using (2.6) and (3.1) and omitting the multiplier of (3.1) which does not involve p, we obtain

    \pi(p \mid x_1, x_2, \ldots, x_n) \propto \sum_{k=0}^{n} \binom{n}{k} S_k(x_1, \ldots, x_n)\, p^{k+u-1} (1-p)^{n-k+v-1}.    (3.3)

The missing constant of proportionality can easily be found from the fact that the posterior density must integrate to one. Simple calculation shows that

    \pi(p \mid x_1, x_2, \ldots, x_n) = \sum_{k=0}^{n} w_k\, \beta(p;\, k+u,\, n-k+v),    (3.4)

where \beta(p; k+u, n-k+v) is a beta density with parameters k+u and n-k+v, and the weights w_k are obtained from

    w_k = y_k\, S_k(x_1, \ldots, x_n) \Big/ \sum_{j=0}^{n} y_j\, S_j(x_1, \ldots, x_n),    (3.5)

with

    y_k = \binom{n}{k} B(k+u,\, n-k+v),                                    (3.6)

where B denotes the beta function. The weights w_k depend on the parameters of the prior and on the observed sample; the contribution of the prior is reflected in y_k, and the contribution of the data in S_k(x_1, x_2, ..., x_n). Thus, we have the following result:

Let x_1, x_2, ..., x_n be the experimental values of a random sample from a mixture of two known and distinct distributions with proportion parameter p, where the prior distribution of p is a beta distribution with parameters u and v. Then the posterior distribution of p is a mixture of n+1 beta distributions with parameters k+u and n-k+v for k = 0, 1, ..., n.
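Computing the weights (3.5) by enumerating the \binom{n}{k} partitions behind S_k in (2.5) would be prohibitive, but expanding the product (2.1) in powers of p shows that \binom{n}{k} S_k(x_1, ..., x_n) is exactly the coefficient of z^k in \prod_{i=1}^{n} [f(x_i) z + g(x_i)]. The following sketch, again ours and not the paper's, exploits this identity; it repeats the illustrative setup of the previous sketch, and the helper name posterior_weights is an assumption for the example.

```python
# Sketch of the posterior (3.4): a mixture of n+1 beta densities.
import numpy as np
from scipy.special import betaln
from scipy.stats import norm

# Same illustrative setup as the previous sketch (assumed normal components).
f, g = norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf
rng = np.random.default_rng(seed=1)
n, p_true = 200, 0.3
x = np.where(rng.random(n) < p_true,
             rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

def posterior_weights(x, f, g, u, v):
    n = len(x)
    # coeffs[k] tracks C(n,k)*S_k: multiply out prod_i [f(x_i) z + g(x_i)];
    # each factor is rescaled by 1/(f_i+g_i), a constant that cancels below.
    coeffs = np.array([1.0])
    for fi, gi in zip(f(x), g(x)):
        coeffs = np.convolve(coeffs, np.array([gi, fi]) / (fi + gi))
    k = np.arange(n + 1)
    with np.errstate(divide="ignore"):      # a coefficient may underflow to 0
        log_w = np.log(coeffs) + betaln(k + u, n - k + v)  # log y_k S_k + const
    w = np.exp(log_w - log_w.max())
    return w / w.sum()                      # the weights w_k of (3.5)

u, v = 1.0, 1.0                             # uniform (diffuse) prior
w = posterior_weights(x, f, g, u, v)
k = np.arange(n + 1)
print(f"posterior mean = {np.sum(w * (k + u) / (n + u + v)):.4f}")  # eq. (3.7)
```

The rescaling inside the loop avoids numerical underflow for moderate n; since the same constant multiplies every y_k S_k, the normalized weights are unchanged.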
Simple calculation, by using (3.4), the mean and variance of a beta distribution, and the formulae for the mean and variance of a mixture [3], shows that the posterior mean and the posterior variance of p are

    E(p \mid x_1, \ldots, x_n) = \sum_{k=0}^{n} w_k\, \frac{k+u}{n+u+v},    (3.7)

    \mathrm{Var}(p \mid x_1, \ldots, x_n) = \sum_{k=0}^{n} w_k\, \frac{(k+u)(n-k+v)}{(n+u+v)^2 (n+u+v+1)} + \sum_{k=0}^{n} w_k \left( \frac{k+u}{n+u+v} \right)^2 - \bigl[ E(p \mid x_1, \ldots, x_n) \bigr]^2.    (3.8)

These results show that the posterior mean takes values in (0,1), and the posterior variance becomes small as n becomes large.

We know that when u = v = 1, the beta distribution is a uniform distribution over the unit interval. This might be used to represent a diffuse state of prior knowledge about p. In this case, the maximum likelihood estimate p̂ is the posterior mode, and in large samples it is usually near to the posterior mean \sum_{k=0}^{n} w_k (k+1)/(n+2).

On the other hand, if the experimenter believes that the population with density f(x) is slightly contaminated by the population with density g(x), i.e., p is close to 1, then he can express his view by a beta prior with large u and small v. It follows from (3.6) that for sufficiently large u and small v the coefficient y_k is an increasing function of k, since

    y_{k+1} / y_k = \frac{(n-k)(k+u)}{(k+1)(n-k+v-1)}.                      (3.9)

The meaning of the above result can be expressed in the following manner: when we have a strong opinion about the closeness of p to 1, the factor y_k of the weight w_k, associated with the conditional density S_k(x_1, x_2, ..., x_n) that exactly k of the X_i's come from the population with density f(x), becomes larger as k increases. However, the effect of the data and the mixed densities, which is reflected in S_k(x_1, x_2, ..., x_n), may confirm or reject our prior opinion about p.

Actually, the beta family of distributions includes symmetric, skewed, unimodal, U-shaped, and J-shaped distributions. But we can have a wider variety of distributions for representing a person's knowledge about p by using a family of finite mixtures of beta distributions. It is interesting to observe, by an analysis similar to that of Section 3, that if we take a member of this family for the prior density of p, then the posterior density will again be a member of the same family. Therefore, a family of finite mixtures of beta distributions is conjugate with respect to the likelihood function (2.6).
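To make this conjugacy concrete, the sketch below (again ours; the function name update_beta_mixture and the two-component prior are assumptions for the example) updates a prior of the form \sum_j a_j \beta(p; u_j, v_j) by applying the Section 3 computation to each prior component, and returns the posterior as a mixture of beta densities with components \beta(p; k+u_j, n-k+v_j).

```python
# Sketch of the conjugate update for a finite-mixture-of-betas prior.
# Each (j, k) pair contributes a posterior component beta(p; k+u_j, n-k+v_j)
# with unnormalized log weight
#   log a_j + log[C(n,k) S_k] + log B(k+u_j, n-k+v_j) - log B(u_j, v_j).
import numpy as np
from scipy.special import betaln
from scipy.stats import norm

# Same illustrative setup as the previous sketches (assumed normal components).
f, g = norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf
rng = np.random.default_rng(seed=1)
x = np.where(rng.random(200) < 0.3,
             rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200))

def update_beta_mixture(x, f, g, prior):
    """prior: list of (a_j, u_j, v_j); returns list of (weight, (u, v))."""
    n = len(x)
    coeffs = np.array([1.0])                  # coeffs[k] tracks C(n,k)*S_k
    for fi, gi in zip(f(x), g(x)):
        coeffs = np.convolve(coeffs, np.array([gi, fi]) / (fi + gi))
    k = np.arange(n + 1)
    log_w, params = [], []
    for a, uj, vj in prior:
        with np.errstate(divide="ignore"):    # underflowed coefficients -> -inf
            log_w.append(np.log(a) + np.log(coeffs)
                         + betaln(k + uj, n - k + vj) - betaln(uj, vj))
        params.append(np.column_stack([k + uj, n - k + vj]))
    log_w = np.concatenate(log_w)
    w = np.exp(log_w - log_w.max())
    return list(zip(w / w.sum(), np.vstack(params)))   # again a beta mixture

# Fifty-fifty mix of a flat Beta(1,1) and a 'slight contamination' Beta(8,1).
posterior = update_beta_mixture(x, f, g, [(0.5, 1.0, 1.0), (0.5, 8.0, 1.0)])
post_mean = sum(wi * uv[0] / (uv[0] + uv[1]) for wi, uv in posterior)
print(f"posterior mean of p = {post_mean:.4f}")
```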
4. ESTIMATION IN A FINITE MIXTURE OF DISTRIBUTIONS

Consider, for N ≥ 2,

    m(x) = \sum_{i=1}^{N+1} p_i f_i(x),                                    (4.1)

where f_1(x), f_2(x), ..., f_{N+1}(x) are N+1 known and distinct probability density functions, with p_i > 0 for i = 1, 2, ..., N+1 and \sum_{i=1}^{N+1} p_i = 1. Since the p_i's are linearly dependent, the unknown parameters can be taken to be p_1, p_2, ..., p_N. Moreover, it is assumed that m(x) is identifiable, i.e., corresponding to different vectors (p_1, p_2, ..., p_N) we obtain different mixtures.

As before, let x_1, x_2, ..., x_n be the experimental values of a random sample from a population with probability density function (4.1), and L(p_1, p_2, ..., p_N) the likelihood function of the unknown parameters p_1, p_2, ..., p_N. As an N-variate analogue of (2.6), we can easily show that the joint density of the random sample is a multinomial mixture of the densities S_{k_1, k_2, \ldots, k_N}(x_1, x_2, \ldots, x_n), which is the conditional density given that exactly k_1 of the X_i's have density f_1(x), k_2 of the X_i's have density f_2(x), and so on. For the joint prior density of p_1, p_2, ..., p_N we can take the N-variate analogue of the beta density (3.1), which is the N-variate Dirichlet density [6]. Simple calculation shows that the joint posterior density of p_1, p_2, ..., p_N becomes a finite mixture of Dirichlet densities. Similarly, we observe that a family of finite mixtures of Dirichlet distributions is conjugate with respect to the likelihood function.

REFERENCES

[1] Boes, D. C., "On the Estimation of Mixing Distributions," Annals of Mathematical Statistics, 37 (1966), 177-188.

[2] Boes, D. C., "Minimax unbiased estimator of mixing distribution for finite mixtures," Sankhyā, Series A, 29 (1967), 417-420.

[3] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley & Sons, Inc., New York, 1966, p. 164.

[4] Malceva, N. I., "Estimation of the parameters of a mixture of measures by the method of moments," Mathematical Reviews, 40 (Dec. 1970), p. 1479.

[5] Savage, L. J., "Reading suggestions for the foundations of statistics," The American Statistician, 24 (Oct. 1970), 23-27.

[6] Wilks, S. S., Mathematical Statistics, John Wiley & Sons, Inc., New York, 1962, 173-182.