BAYESIAN ESTIMATION FOR THE PROPORTIONS
IN A MIXTURE OF DISTRIBUTIONS

Javad Behboodian
Department of Statistics
University of North Carolina at Chapel Hill
and Pahlavi University, Shiraz, Iran

Institute of Statistics Mimeo Series No. 733
January 1971
ABSTRACT

The joint density of a random sample from a mixture of two distributions is
expressed as a binomial mixture of conditional densities. Then, the posterior
distribution of the proportion, in a mixture of two known and distinct
distributions, relative to a beta prior distribution is explored in detail,
and a conjugate family of prior distributions is introduced. The results are
generalized for estimating the proportions in a finite mixture of known
distributions.
1. INTRODUCTION

Consider

    m(x) = p f(x) + q g(x),                                            (1.1)

where f(x) and g(x) are two known and distinct probability density functions,
0 < p < 1, and q = 1 - p. Since f(x) and g(x) are distinct from each other,
m(x) is identifiable, i.e., corresponding to two different values of p we
have two different mixtures.
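For illustration, here is a minimal Python sketch of the mixture (1.1); the
choice f = N(0,1) and g = N(2,1) is an assumption made only for concreteness
and is used in the later sketches as well.

```python
# Illustrative sketch of (1.1), with hypothetical normal components.
from scipy.stats import norm

f = lambda x: norm.pdf(x, loc=0.0, scale=1.0)   # known component density f
g = lambda x: norm.pdf(x, loc=2.0, scale=1.0)   # known, distinct density g

def m(x, p):
    """Mixture density (1.1): m(x) = p f(x) + q g(x), with q = 1 - p."""
    return p * f(x) + (1.0 - p) * g(x)

# Identifiability: distinct values of p give distinct mixtures,
# since f(x) != g(x) for almost all x.
print(m(1.0, 0.3), m(1.0, 0.7))
```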
The estimation problem of proportions in a finite mixture of distributions
has already been investigated by Boes. In [1], he suggests a class of
unbiased estimators which converge with probability 1, and in [2] he gives
minimax unbiased estimators for proportions. Malceva [4] introduces moment
estimators for proportions and shows that they are unbiased and
asymptotically normal. As far as I know, a Bayesian analysis of this problem
has not so far been explored.
Here we look at the problem from a subjectivistic Bayesian point of view by
considering the experimenter's information about the proportion p prior to
taking a sample. Such information is usually expressed by a prior
distribution for p, which reflects subjective beliefs or knowledge about p.
The prior distribution is modified, by using the sample information, in
accordance with Bayes' theorem to yield a posterior distribution of p. A
Bayesian analysis of p would consist in the exploration and interpretation of
the posterior distribution. Examples of such analyses are available in many
of the references provided by L. J. Savage [5].

It is known that the posterior distribution is proportional to the product of
the likelihood function and the prior density. Before introducing a prior
distribution for p, we first write its likelihood function in a convenient
form.

2. THE LIKELIHOOD FUNCTION

Let X_1, X_2, ..., X_n be a random sample from a population with probability
density function (1.1). The likelihood function of p, for 0 < p < 1 and with
x_1, x_2, ..., x_n as the experimental values of the random sample, is

    L(p) = \prod_{i=1}^{n} [p f(x_i) + q g(x_i)].                      (2.1)

The contribution to L(p) of any x_i for which f(x_i) = g(x_i) does not depend
on p, and we can eliminate such non-informative x_i's from our observed
sample without any loss. It is quite possible to have f(x_i) = g(x_i),
i = 1, 2, ..., n.
For example, this may happen in the case of a mixture of two uniform
distributions defined on two different overlapping intervals of equal length.
However, due to the distinction of f(x) and g(x), the chance of such an event
becomes small as n becomes large. Therefore, to avoid exceptional cases, we
assume from the beginning that f(x_i) != g(x_i) for all x_i's. The logarithm
of L(p) is a concave function of p with at most one local maximum. It can be
shown that, with only light regularity conditions on the mixed densities f(x)
and g(x), L(p) has a unique maximum at \hat{p} in the interval (0,1) when n
is sufficiently large, with the usual desirable large sample properties.
However, for small samples \hat{p} may be a poor estimate.
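A minimal numerical sketch of this maximization, under the same hypothetical
normal components as before (the sample size, seed, and p are illustrative
choices, not the paper's):

```python
# Sketch: numerical maximum likelihood for p under model (1.1).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p_true, n = 0.3, 50
labels = rng.random(n) < p_true                      # True -> draw from f
x = np.where(labels, rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

fx, gx = norm.pdf(x, 0.0, 1.0), norm.pdf(x, 2.0, 1.0)

def neg_log_lik(p):
    # log L(p) = sum_i log[p f(x_i) + q g(x_i)]; concave in p as noted above.
    return -np.sum(np.log(p * fx + (1.0 - p) * gx))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("p_hat =", res.x)
```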
To write L(p) in a suitable form, we can expand the right side of (2.1). But
it is more useful to apply the following probabilistic argument. Let us
denote the right side of (2.1), which is in fact the joint density of the
random sample, by h(x_1, x_2, ..., x_n). Then

    h(x_1, x_2, ..., x_n) = \sum_{k=0}^{n} P(E_k) h(x_1, ..., x_n | E_k),  (2.2)

where E_k is the event that exactly k of the x_i's have density f(x) and the
rest have density g(x). It is clear that

    P(E_k) = \binom{n}{k} p^k q^{n-k}.                                 (2.3)

The event E_k can happen in \binom{n}{k} equally likely ways depending on the
partition of the random sample. The joint density of X_1, X_2, ..., X_n
corresponding to a particular partition is

    h_{t_k}(x_1, x_2, ..., x_n) = \prod_{a \in A_k} f(x_a) \prod_{b \in B_k} g(x_b),  (2.4)

where t_k is a partition of the set {1, 2, ..., n} into two sets A_k and B_k
with k elements in A_k. Denoting the set of all such partitions by T_k and
using conditional density once more, we obtain

    S_k(x_1, x_2, ..., x_n) = \sum_{t_k \in T_k} h_{t_k}(x_1, x_2, ..., x_n) / \binom{n}{k},  (2.5)

where S_k(x_1, x_2, ..., x_n) is a symmetric n-variate density. Now, from
(2.2), (2.3) and (2.5), we have

    L(p) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} S_k(x_1, x_2, ..., x_n).  (2.6)

Therefore, the joint density of the random sample X_1, X_2, ..., X_n is a
binomial mixture of the densities S_k(x_1, x_2, ..., x_n) defined by (2.4)
and (2.5). For example, if n = 3, then we have four 3-variate densities:

    S_0(x_1, x_2, x_3) = g(x_1) g(x_2) g(x_3),
    S_1(x_1, x_2, x_3) = [f(x_1)g(x_2)g(x_3) + f(x_2)g(x_1)g(x_3) + f(x_3)g(x_1)g(x_2)]/3,
    S_2(x_1, x_2, x_3) = [f(x_1)f(x_2)g(x_3) + f(x_1)f(x_3)g(x_2) + f(x_2)f(x_3)g(x_1)]/3,
    S_3(x_1, x_2, x_3) = f(x_1) f(x_2) f(x_3).
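A brief numerical check of the identity (2.6), again under the hypothetical
normal components; the observed values are arbitrary:

```python
# Sketch: verify L(p) in form (2.1) equals the binomial mixture (2.6).
from itertools import combinations
from math import comb, prod
from scipy.stats import norm

f = lambda t: norm.pdf(t, 0.0, 1.0)
g = lambda t: norm.pdf(t, 2.0, 1.0)

x = [0.1, 1.7, 2.4]          # observed values, n = 3
n, p = len(x), 0.3
q = 1.0 - p

def S(k):
    # Symmetric density (2.5): average of h_{t_k} over all C(n,k) partitions
    # t_k assigning k indices to f and the rest to g.
    total = 0.0
    for A in combinations(range(n), k):
        total += prod(f(x[i]) if i in A else g(x[i]) for i in range(n))
    return total / comb(n, k)

lhs = prod(p * f(xi) + q * g(xi) for xi in x)                          # (2.1)
rhs = sum(comb(n, k) * p**k * q**(n-k) * S(k) for k in range(n + 1))   # (2.6)
print(lhs, rhs)   # agree up to floating-point error
```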
3. BAYESIAN ESTIMATION FOR p

Theoretically p can assume any real number in the interval (0,1). Suppose the
experimenter expresses his information and belief about p, which is now
considered as a random variable, by a beta density of the form

    \beta(p; u, v) = \frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)} p^{u-1} (1-p)^{v-1}  (3.1)

for 0 < p < 1, and zero elsewhere, with parameters u > 0 and v > 0. Now, the
posterior density of p becomes

    \pi(p | x_1, ..., x_n) \propto L(p) \beta(p; u, v),                (3.2)

where \propto denotes proportionality. By using (2.6) and (3.1) and omitting
the multiplier of (3.1) which does not involve p, we obtain

    \pi(p | x_1, ..., x_n) \propto \sum_{k=0}^{n} \binom{n}{k} S_k(x_1, ..., x_n) p^{k+u-1} (1-p)^{n-k+v-1}.  (3.3)

The missing constant of proportionality can easily be found from the fact
that the posterior density must integrate to one. Simple calculation shows
that

    \pi(p | x_1, ..., x_n) = \sum_{k=0}^{n} w_k \beta(p; k+u, n-k+v),  (3.4)

where \beta(p; k+u, n-k+v) is a beta density with parameters k+u and n-k+v,
and the weights w_k are obtained from

    w_k = \gamma_k S_k(x_1, ..., x_n) / \sum_{j=0}^{n} \gamma_j S_j(x_1, ..., x_n),  (3.5)

with

    \gamma_k = \binom{n}{k} \Gamma(k+u) \Gamma(n-k+v).                 (3.6)

The weights w_k depend on the parameters of the prior and on the observed
sample; the contribution of the prior is reflected in \gamma_k and the
contribution of the data in S_k(x_1, x_2, ..., x_n). Thus, we have the
following result:

Let x_1, x_2, ..., x_n be the experimental values of a random sample from a
mixture of two known and distinct distributions with proportion parameter p,
where the prior distribution of p is a beta distribution with parameters u
and v. Then, the posterior distribution of p is a mixture of n+1 beta
distributions with parameters k+u and n-k+v for k = 0, 1, ..., n.
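A minimal sketch of the weight computation (3.4)-(3.6), done on the log scale
for numerical stability; the components f, g and the prior parameters u, v
are hypothetical choices:

```python
# Sketch: posterior weights w_k of the beta mixture (3.4)-(3.6).
from itertools import combinations
from math import comb, prod
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

def S_k(x, k, f, g):
    # Symmetric conditional density (2.5).
    n = len(x)
    total = 0.0
    for A in combinations(range(n), k):
        total += prod(f(x[i]) if i in A else g(x[i]) for i in range(n))
    return total / comb(n, k)

def posterior_weights(x, u, v, f, g):
    # log gamma_k = log C(n,k) + log Gamma(k+u) + log Gamma(n-k+v)    (3.6)
    n = len(x)
    k = np.arange(n + 1)
    log_gamma = (gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
                 + gammaln(k + u) + gammaln(n - k + v))
    log_S = np.log([S_k(x, kk, f, g) for kk in k])
    log_w = log_gamma + log_S
    log_w -= log_w.max()                      # normalize as in (3.5)
    w = np.exp(log_w)
    return w / w.sum()

f = lambda t: norm.pdf(t, 0.0, 1.0)
g = lambda t: norm.pdf(t, 2.0, 1.0)
w = posterior_weights([0.1, 1.7, 2.4], u=2.0, v=2.0, f=f, g=g)
print(w)   # weights of the n+1 components beta(p; k+u, n-k+v)
```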
Simple calculation, by using (3.4), the mean and variance of a beta
distribution, and the formulae for the mean and variance of a mixture [3],
shows that the posterior mean and the posterior variance of p are

    E(p | x_1, ..., x_n) = \sum_{k=0}^{n} w_k \frac{k+u}{n+u+v},       (3.7)

    Var(p | x_1, ..., x_n) = \sum_{k=0}^{n} w_k \frac{(k+u)(n-k+v)}{(n+u+v)^2 (n+u+v+1)}
        + \sum_{k=0}^{n} w_k \left( \frac{k+u}{n+u+v} \right)^2 - [E(p | x_1, ..., x_n)]^2.  (3.8)

These results show that the posterior mean takes values in (0,1), and the
posterior variance becomes small as n becomes large.

We know that when u = v = 1, the beta distribution is a uniform distribution
over the unit interval. This might be used to represent a diffuse state of
prior knowledge about p. In this case, the maximum likelihood estimate
\hat{p} is the posterior mode, and in large samples it is usually near the
posterior mean \sum_{k=0}^{n} w_k (k+1)/(n+2).
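Continuing the sketch above (it reuses the weights w computed there), the
posterior moments (3.7)-(3.8) follow directly:

```python
# Sketch: posterior mean (3.7) and variance (3.8) from the mixture weights.
import numpy as np

def posterior_moments(w, n, u, v):
    k = np.arange(n + 1)
    a, b = k + u, n - k + v                 # beta parameters of component k
    comp_mean = a / (n + u + v)
    comp_var = a * b / ((n + u + v) ** 2 * (n + u + v + 1))
    mean = np.sum(w * comp_mean)                                       # (3.7)
    var = np.sum(w * comp_var) + np.sum(w * comp_mean**2) - mean**2    # (3.8)
    return mean, var

mean, var = posterior_moments(w, n=3, u=2.0, v=2.0)
print(mean, var)   # mean lies in (0,1); variance shrinks as n grows
```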
On the other hand, if the experimenter believes that the population with
density f(x) is slightly contaminated by the population with density g(x),
i.e., p is close to 1, then he can express his view by a beta prior with
large u and small v. But it follows from (3.6) that for sufficiently large u
and small v the coefficient \gamma_k is an increasing function of k, since

    \gamma_{k+1} / \gamma_k = \frac{(n-k)(k+u)}{(k+1)(n-k-1+v)}.       (3.9)

The meaning of the above result can be expressed in the following manner:
when we have a strong opinion about the closeness of p to 1, the weight
associated with S_k(x_1, x_2, ..., x_n), the conditional density that exactly
k of the X_i's come from the population with density f(x), becomes larger as
k increases. However, the effect of the data and the mixed densities, which
is reflected in S_k(x_1, x_2, ..., x_n), a factor of the weight w_k, may
confirm or reject our prior opinion about p.
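A short numerical check of this monotonicity, with illustrative values of n,
u and v chosen to mimic a contamination-type prior:

```python
# Sketch: the prior coefficients gamma_k of (3.6) increase with k when u is
# large and v is small, as the ratio (3.9) then exceeds one.
from math import comb, gamma

n, u, v = 10, 8.0, 0.5
gam = [comb(n, k) * gamma(k + u) * gamma(n - k + v) for k in range(n + 1)]
ratios = [gam[k + 1] / gam[k] for k in range(n)]
print(all(r > 1 for r in ratios))   # True: gamma_k is increasing in k
```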
Actually, the beta family of distributions includes symmetric, skewed,
unimodal, U-shaped and J-shaped distributions. But we can have a wider
variety of distributions for representing a person's knowledge about p by
using a family of finite mixtures of beta distributions. It is interesting to
observe, by an analysis similar to that of Section 3, that if we take a
member of this family for the prior density of p, then the posterior density
will again be a member of the same family. Therefore, a family of finite
mixtures of beta distributions is conjugate with respect to the likelihood
function (2.6).
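A sketch of this conjugate update for a finite beta-mixture prior; it reuses
S_k and the hypothetical components f, g from the earlier sketch, and the
prior components here are illustrative:

```python
# Sketch: updating a beta-mixture prior sum_j c_j beta(p; u_j, v_j).
# Each pair (j, k) yields a posterior component beta(p; k+u_j, n-k+v_j)
# with weight proportional to c_j * C(n,k) * S_k * B(k+u_j, n-k+v_j) / B(u_j, v_j).
import numpy as np
from math import comb
from scipy.special import betaln

def update_beta_mixture(x, prior, f, g):
    # prior: list of (c_j, u_j, v_j); returns [(weight, a, b), ...].
    n = len(x)
    S = [S_k(x, k, f, g) for k in range(n + 1)]
    comps = []
    for (c, u, v) in prior:
        for k in range(n + 1):
            logw = (np.log(c) + np.log(comb(n, k)) + np.log(S[k])
                    + betaln(k + u, n - k + v) - betaln(u, v))
            comps.append((logw, k + u, n - k + v))
    logs = np.array([lw for lw, _, _ in comps])
    w = np.exp(logs - logs.max())
    w /= w.sum()
    return [(wi, a, b) for wi, (_, a, b) in zip(w, comps)]

post = update_beta_mixture([0.1, 1.7, 2.4],
                           prior=[(0.5, 2.0, 8.0), (0.5, 8.0, 2.0)], f=f, g=g)
print(len(post))   # (n+1) * (number of prior components) beta components
```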
4. ESTIMATION IN A FINITE MIXTURE OF DISTRIBUTIONS

Consider, for N >= 2,

    m(x) = \sum_{i=1}^{N+1} p_i f_i(x),                                (4.1)

where f_1(x), f_2(x), ..., f_{N+1}(x) are N+1 known and distinct probability
density functions with p_i > 0, i = 1, 2, ..., N+1, and
\sum_{i=1}^{N+1} p_i = 1, and m(x) is the probability density function of a
mixture of N+1 distributions. Since the p_i's are linearly dependent, the
unknown parameters can be assumed to be p_1, p_2, ..., p_N. Moreover, it is
assumed that m(x) is identifiable, i.e., corresponding to different vectors
(p_1, p_2, ..., p_N) we obtain different mixtures.

As before, let x_1, x_2, ..., x_n be the experimental values of a random
sample from a population with probability density function (4.1) and
L(p_1, p_2, ..., p_N) the likelihood function of the unknown parameters
p_1, p_2, ..., p_N.

As an N-variate analogue of (2.6), we can easily show that the joint density
of the random sample is a multinomial mixture of the densities
S_{k_1, k_2, ..., k_N}(x_1, x_2, ..., x_n), where S_{k_1, k_2, ..., k_N} is
the conditional density that k_1 of the X_i's have density f_1(x), k_2 of the
X_i's have density f_2(x), and so on. For the joint prior density of
p_1, p_2, ..., p_N, we can take the N-variate analogue of the beta density
(3.1), which is the N-variate Dirichlet density [6]. Simple calculation shows
that the joint posterior density of p_1, p_2, ..., p_N becomes a finite
mixture of Dirichlet densities. Similarly, we observe that a family of finite
mixtures of Dirichlet distributions is conjugate with respect to the
likelihood function.
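A brute-force sketch of this N-variate analogue; the three normal components,
the Dirichlet parameters, and the data are illustrative assumptions:

```python
# Sketch: with a Dirichlet(alpha) prior on the proportions of (4.1), the
# posterior is a finite mixture of Dirichlet(alpha + counts) densities.
# Enumerating component assignments groups terms by their count vectors.
from itertools import product
from collections import defaultdict
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

comps = [lambda t: norm.pdf(t, 0.0, 1.0),
         lambda t: norm.pdf(t, 2.0, 1.0),
         lambda t: norm.pdf(t, 4.0, 1.0)]      # N+1 = 3 known densities
alpha = np.array([1.0, 1.0, 1.0])              # Dirichlet prior parameters
x = [0.2, 1.9, 4.1]

def log_dir_norm(a):
    # log normalizing constant of a Dirichlet(a) density
    return gammaln(a).sum() - gammaln(a.sum())

post = defaultdict(float)                      # count vector -> weight
for z in product(range(len(comps)), repeat=len(x)):
    counts = np.bincount(z, minlength=len(comps))
    like = np.prod([comps[zi](xi) for zi, xi in zip(z, x)])
    # marginal weight of the Dirichlet(alpha + counts) posterior component
    post[tuple(counts)] += like * np.exp(log_dir_norm(alpha + counts)
                                         - log_dir_norm(alpha))
total = sum(post.values())
for counts, w in sorted(post.items()):
    print(counts, w / total)   # mixture of Dirichlet(alpha+counts) densities
```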
REFERENCES

[1] Boes, D. C., "On the estimation of mixing distributions," Annals of
    Mathematical Statistics, 37 (1966), 177-188.

[2] Boes, D. C., "Minimax unbiased estimator of mixing distribution for
    finite mixtures," Sankhya, Series A, 29 (1967), 417-420.

[3] Feller, W., An Introduction to Probability Theory and Its Applications,
    Vol. II, John Wiley & Sons, Inc., New York, 1966, p. 164.

[4] Malceva, N. I., "Estimation of the parameters of a mixture of measures
    by the method of moments," Mathematical Reviews, 40 (Dec. 1970), p. 1479.

[5] Savage, L. J., "Reading suggestions for the foundations of statistics,"
    The American Statistician, 24 (Oct. 1970), 23-27.

[6] Wilks, S. S., Mathematical Statistics, John Wiley & Sons, Inc., New York,
    1962, 173-182.