ON UNBIASED ESTIMATION OF DENSITY FUNCTIONS
by
Allan Henry Seheult
Institute of Statistics
Mimeograph Series No. 649
Ph.D. Thesis - January 1970

TABLE OF CONTENTS

                                                                Page
LIST OF FIGURES  . . . . . . . . . . . . . . . . . . . . . . . .   v
1.  INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . .   1
2.  REVIEW OF THE LITERATURE . . . . . . . . . . . . . . . . . .   6
3.  GENERAL THEORY . . . . . . . . . . . . . . . . . . . . . . .  13
    3.1  Introduction  . . . . . . . . . . . . . . . . . . . . .  13
    3.2  The Model . . . . . . . . . . . . . . . . . . . . . . .  13
    3.3  The General Problem and Definitions . . . . . . . . . .  15
    3.4  Existence Theorems  . . . . . . . . . . . . . . . . . .  20
4.  APPLICATIONS . . . . . . . . . . . . . . . . . . . . . . . .  26
    4.1  Introduction  . . . . . . . . . . . . . . . . . . . . .  26
    4.2  Methods . . . . . . . . . . . . . . . . . . . . . . . .  27
    4.3  Discrete Distributions  . . . . . . . . . . . . . . . .  34
    4.4  Absolutely Continuous Distributions . . . . . . . . . .  38
5.  SUMMARY  . . . . . . . . . . . . . . . . . . . . . . . . . .  49
6.  LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . .  53

LIST OF FIGURES

                                                                Page
4.1  The U.M.V.U.E. of (x/θ)I(0,θ) . . . . . . . . . . . . . . .  46
1.  INTRODUCTION

The problem of estimating probability density functions is of fundamental importance to statistical theory and its applications. Parzen (1962) states: "The problem of estimation of a probability density function f(x) is interesting for many reasons. As one possible application, we mention the problem of estimating the hazard, or conditional rate of failure, function f(x)/(1-F(x)), where F(x) is the distribution function corresponding to f(x)." Hájek and Šídák (1967), in the introduction to chapter one of their book, make the statement: "We have tried to organize the multitude of rank tests into a compact system. However, we need to have some knowledge of the form of the unknown density in order to make a rational selection from this system. The fact that no workable procedure for the estimation of the form of the density is yet available is a serious gap in the theory of rank tests." It is implicitly understood in both of the statements cited above that the unknown density function corresponds to a member of the class of all probability distributions on the real line that are absolutely continuous with respect to Lebesgue measure. Moreover, the obvious further implication is that the problem falls within the domain of non-parametric statistics.
It is not surprising, therefore, that most of the research effort on this problem in the last fifteen years or so has been directed towards developing estimation procedures that are to some extent non-parametric in character. However, like many non-parametric procedures, those that have been developed for estimating density functions are somewhat heuristic. Any reasonable heuristic procedure would be expected to have good large sample properties, and this is about all that could be hoped for in this particular non-parametric setting.
The problem is that any attempt to search for estimators which in some sense optimize a predetermined criterion invariably leads one to the untenable situation of considering estimators that are functions of the density function to be estimated. Hence, within this non-parametric framework, the effort, for example, of Hájek and Šídák "... to organize the multitude of rank tests into a compact system ..." is, indeed, a difficult task. The work in this thesis is partly an attempt to resolve some of the difficulties alluded to above.
The approach adopted is classical in that attention is restricted to unbiased estimators of density functions. However, the question of the existence of unbiased estimators is of fundamental importance and is one of the primary considerations of the research herein.
The general setting is that of a family of probability measures 𝒫, dominated by a σ-finite measure μ on a measurable space (𝒳, 𝒜), with 𝒟 the family of densities (with respect to μ) that correspond to 𝒫. It is assumed that there are available n independent observations X^(n) = (X₁, X₂, ..., Xₙ) on a random variable X with unknown distribution P, a member of 𝒫, and density p ∈ 𝒟. The problem is, therefore, to find an estimator p̂(x) = p̂(x; X^(n)) such that, for each x ∈ 𝒳,

    E_P[p̂(x)] = p(x),   for all P ∈ 𝒫.                       (1.1)

The symbol E_P denotes mathematical expectation under the assumption that P is the true distribution of X.
Thus, with the exception of one simple but important difference, the problem resembles the usual parametric estimation problem. In the usual context an estimator is required of a function φ(P) of the distribution P. Here, φ depends on x as well as P. Thus a certain global notion has been introduced which is not usually present within the classical framework of point estimation; one might use the phrase 'uniformly unbiased' for estimators p̂ satisfying (1.1). At this point it is pertinent to make a few remarks about the classical estimation problem.
Very often the statistician claims he wants to estimate a function φ(P) using the sample x^(n). However, it is not always clear for what purpose the estimate will be used. If all that is required is to have an estimate of φ, then the classical procedures, within their own limitations, are adequate for this purpose. It is often the case, however, that P (and hence p) is an explicit function of φ alone; that is, for each A ∈ 𝒜,

    P(A) = P(A; φ),   for all P ∈ 𝒫.                          (1.2)

For example, if 𝒫 is the family of normal distributions on the real line with unit variance and unknown mean μ, it is common to identify an unknown member of this family by the symbol P_μ, and a typical function φ is

    φ(P_μ) = μ,   for all μ ∈ (-∞, ∞).                        (1.3)

Once an estimate φ̂ of φ has been obtained, an estimate P̂ of P is arrived at by the 'method of substitution'; that is, P̂ is given by

    P̂(A) = P(A; φ̂),   for all A ∈ 𝒜.                         (1.4)
This procedure implies that the ultimate objective is to estimate P (or p) rather than φ, and, therefore, desirable criteria should refer to the estimation of P (or p) rather than φ. This substitution method is perfectly valid if maximum likelihood estimators are required, provided that very simple regularity conditions are satisfied. In the above example the mean X̄ is the unique minimum variance unbiased estimator of μ, but it is not the case that P_X̄ and p_X̄ are the unique minimum variance unbiased estimators of P_μ and p_μ. It is shown in chapter four that the unique minimum variance unbiased estimator P̂ of P_μ is a normal distribution with mean X̄ but with variance (n-1)/n. This should be compared with the maximum likelihood estimator P_X̄.
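The claim for the normal family can be checked numerically. The following sketch (an illustration, not part of the thesis) averages the N(X̄, (n-1)/n) density at a fixed point over many samples and compares the result with the true N(μ, 1) density; the agreement reflects exact unbiasedness, since convolving the N(μ, 1/n) sampling law of X̄ with a N(0, (n-1)/n) density returns unit variance.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def umvue_density(x, sample):
    """Candidate UMVUE of the N(mu, 1) density at x: the N(xbar, (n-1)/n) density."""
    n = len(sample)
    xbar = sum(sample) / n
    return normal_pdf(x, xbar, (n - 1) / n)

random.seed(0)
mu, n, reps, x0 = 0.3, 10, 50_000, 1.0
# Monte Carlo average of the estimator over repeated samples of size n.
est = sum(umvue_density(x0, [random.gauss(mu, 1) for _ in range(n)])
          for _ in range(reps)) / reps
true = normal_pdf(x0, mu, 1.0)
print(est, true)  # the two values should nearly coincide
```

The parameter values (mu = 0.3, n = 10, x0 = 1.0) are arbitrary illustrative choices.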
It transpires that, in the search for estimators satisfying (1.1), it is necessary to consider unbiased estimators P̂(A) = P̂(A; X^(n)) of the corresponding probability measure P; that is, P̂ is an unbiased estimator of P if, for each A ∈ 𝒜,

    E_P[P̂(A)] = P(A),   for all P ∈ 𝒫.                       (1.5)

It is easy to see that if p̂ is an unbiased estimator of p then ∫_A p̂ dμ is an unbiased estimator of P(A), but in general the converse is not true. In chapter three of this thesis conditions for the converse to hold are established. It turns out that if an unbiased estimator p̂ of p exists, then it can be found immediately. The existence or non-existence of the unbiased estimator p̂ depends on whether or not there exists an unbiased estimator P̂ of the corresponding probability measure that is absolutely continuous with respect to the original dominating measure μ.
If a μ-continuous P̂ exists, an unbiased estimator of p is given by the Radon-Nikodym derivative dP̂/dμ. Thus, if 𝒫 is a sub-family of Lebesgue-continuous distributions on the real line and an unbiased density estimator exists, it is obtained by differentiation of the distribution function estimator corresponding to P̂.
Chapter three also includes a more precise description of the general theoretical framework and analogous theorems on the existence of unique minimum variance unbiased estimators of density functions. The interesting result here is that if 𝒫 admits a complete and sufficient statistic then a unique minimum variance unbiased estimator P̂ of P exists. Moreover, if this P̂ is absolutely continuous with respect to μ, then dP̂/dμ is the unique minimum variance unbiased estimator of p.

The application of the general theory developed in chapter three to particular families of distributions on the real line forms the main content of chapter four. The dominating measure μ is taken to be either ordinary Lebesgue measure or counting measure on some sequence of real numbers. In particular, the result of Rosenblatt (1956), that there exists no unbiased estimator of p if 𝒫 is the family of all Lebesgue-continuous distributions on the real line, is re-established as a simple corollary to the theorems of chapter three. On the other hand, it is also true that the theorems of chapter three are generalizations of Rosenblatt's result.

Finally, the thesis is concluded with a summary and suggestions for further research.
2.  REVIEW OF THE LITERATURE
Most of the research on density estimation that appears in the literature is concerned with the following problem. Let X^(n) = (X₁, X₂, ..., Xₙ) be n independent observations on a real-valued random variable X. The only knowledge of the distribution of X is that its distribution function F is absolutely continuous. F and, hence, the corresponding probability density function f are otherwise assumed to be completely unknown. By making use of the observation x^(n) the objective is to develop a procedure for estimating f(x) at each point x. For any such estimation procedure let f̂ₙ(x) = f̂ₙ(x; X^(n)) denote the estimator of f at the point x. Some researchers have also considered the case where X takes on values in higher-dimensional Euclidean spaces. However, for the purpose of this review, no separate notation will be developed for these multivariate extensions.
At the outset it should be remarked that, within the context outlined above, a result of Rosenblatt (1956) has a direct bearing on the work of this thesis and the relevance of certain sections of the existing literature on the subject of density estimation. He showed that there exists no unbiased estimator of f. As a result much of the emphasis has been on finding estimators with good large sample properties, such as consistency and asymptotic normality. Since these methods and procedures do not have a direct bearing on the present work, only a brief review of the literature on these methods and procedures will be presented here.
Very basic approaches have been adopted by Fix and Hodges (1951), Loftsgaarden and Quesenberry (1965), and Elkins (1968). In connection with non-parametric discrimination, Fix and Hodges estimate a k-dimensional density f by counting the number Nₙ of sample points in k-dimensional Borel sets Δₙ. They showed that if f is continuous at x, then Nₙ/nλ(Δₙ) is a consistent estimator of f(x), provided that

    lim_{n→∞} sup_{d∈Δₙ} |x-d| = 0   and   lim_{n→∞} nλ(Δₙ) = ∞.

Here, λ is k-dimensional Lebesgue measure and |x-y| is the usual Euclidean metric. Loftsgaarden and Quesenberry obtain the same consistency result with hyperspheres about x, by fixing the number of sample points to be captured and then finding the radius of the Δₙ which contains this number of points. In the two-dimensional case Elkins compares the effects of choosing the Δₙ (centered about x) in the Fix and Hodges type of estimator to be spheres or squares, the criterion of comparison being mean integrated square error.
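In one dimension the Fix and Hodges counting estimator reduces to counting sample points in a shrinking interval about x. A minimal sketch (illustrative only; the interval-width sequence h = n^(-1/3) is one admissible choice, not one prescribed by the papers cited):

```python
import random

def fix_hodges_estimate(x, sample, h):
    """Count sample points within distance h of x and divide by n * 2h,
    the one-dimensional Lebesgue measure of the interval (x-h, x+h)."""
    n = len(sample)
    count = sum(1 for xi in sample if abs(xi - x) < h)
    return count / (n * 2 * h)

random.seed(1)
sample = [random.uniform(0.0, 1.0) for _ in range(10_000)]
# Consistency requires h -> 0 while n*h -> infinity; h = n**(-1/3) satisfies both.
h = len(sample) ** (-1 / 3)
density_at_half = fix_hodges_estimate(0.5, sample, h)
print(density_at_half)  # near 1.0, the Uniform(0, 1) density
```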
Apart from the simple estimators discussed above, two other essentially different approaches have been adopted in this complete non-parametric setting; namely, weighting function type estimators and series expansion type estimators.
Kronmal and Tarter (1968), referring to papers on density estimation by Rosenblatt (1956), Whittle (1958) and Parzen (1962), state: "The density estimation problem is considered in these papers to be that of finding the focusing function δₙ(·) such that, using the criteria of Mean Integrated Square Error, M.I.S.E.,

    f̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ δₙ(x - Xᵢ)

would be the best estimator of the density f." It was shown by Watson and Leadbetter (1963) that the solution to this problem could be obtained by inverting an expression involving the characteristic function of the density f. Papers by Bartlett (1963), Murthy (1965, 1966), Nadaraya (1965), Cacoullos (1966), and Woodroofe (1967) are all concerned with the properties and extensions to higher dimensions of estimators of the type described above. Anderson (1969) compares some of the above estimators in terms of the M.I.S.E. criterion.
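The weighting-function form displayed above can be sketched directly. Here δₙ is taken to be a scaled Gaussian kernel, an illustrative choice only; the papers cited study how δₙ should be chosen to be optimal under M.I.S.E.

```python
import math
import random

def kernel_estimate(x, sample, h):
    """Weighting-function estimator (1/n) * sum_i delta_n(x - X_i),
    taking delta_n to be a Gaussian density with bandwidth h."""
    n = len(sample)
    return sum(math.exp(-((x - xi) / h) ** 2 / 2) / (h * math.sqrt(2 * math.pi))
               for xi in sample) / n

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(5_000)]
density_at_zero = kernel_estimate(0.0, sample, h=0.3)
print(density_at_zero)  # near the true N(0,1) density 0.399, with some smoothing bias
```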
A different approach has been exploited by Čencov (1962), Schwartz (1962) and Kronmal and Tarter (1968). The basic idea in the one-dimensional case is to use a density estimator

    f̂ₙ(x) = Σⱼ₌₁^q aⱼₙ φⱼ(x),                                 (2.1)

where

    aⱼₙ = (1/n) Σᵢ₌₁ⁿ r(Xᵢ) φⱼ(Xᵢ).                           (2.2)

Here, q is an integer depending on n, and the φⱼ form an orthonormal system of functions with respect to a weight function r; that is,

    ∫₋∞^∞ r(x) φᵢ(x) φⱼ(x) dx = δᵢⱼ,                          (2.3)

where δᵢⱼ is the Kronecker delta. Čencov studied the general case and its k-dimensional extension, Schwartz specialized to Hermite functions with r(x) = 1, and Kronmal and Tarter specialized to trigonometric functions with r(x) = 1.
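A sketch of the series estimator (2.1)-(2.2) for the trigonometric specialization with r(x) = 1 follows. The cosine system on [0, 1] used here is one concrete orthonormal basis of that kind; the truncation point q = 6 and the Beta(2, 2) test density are illustrative choices, not taken from the papers cited.

```python
import math
import random

def phi(j, x):
    """Cosine basis on [0, 1]: phi_1 = 1, phi_j(x) = sqrt(2) cos((j-1) pi x).
    Orthonormal with respect to the weight r(x) = 1 on [0, 1]."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos((j - 1) * math.pi * x)

def series_estimate(x, sample, q):
    """f_n(x) = sum_{j=1}^{q} a_jn phi_j(x), with a_jn = (1/n) sum_i phi_j(X_i)."""
    n = len(sample)
    return sum((sum(phi(j, xi) for xi in sample) / n) * phi(j, x)
               for j in range(1, q + 1))

random.seed(3)
sample = [random.betavariate(2, 2) for _ in range(5_000)]  # density 6x(1-x) on [0, 1]
density_at_half = series_estimate(0.5, sample, q=6)
print(density_at_half)  # near 6 * 0.5 * 0.5 = 1.5
```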
Brunk (1965), Robertson (1966), and Wegman (1968) restrict their attention to estimating unimodal densities. The maximum likelihood method is used and the solution can be represented as a conditional expectation given a σ-lattice. Brunk (1965) discusses such conditional expectations and their application to various optimization problems.
The research which turns out to be most relevant to the present work is to be found in papers concerned with the estimation of a distribution function F_θ at a point x on the real line. Here, θ is a parameter which varies over some index set Θ, which is usually a Borel subset of a Euclidean space. With one exception, all of the writers of these papers consider particular families of distributions on the real line such that the family admits a complete and sufficient statistic T, say. Their objective is then to find unique minimum variance unbiased estimators of F_θ(x) for these particular families. Their approach is the usual Rao-Blackwell-Lehmann-Scheffé method of conditioning a simple estimator of F_θ(x) by the complete and sufficient statistic T. In fact, the usual estimator is given by

    F̂(x) = P[X₁ ≤ x | T].                                     (2.4)

Barton (1961) obtained the estimators for the binomial, the Poisson and the normal distribution functions.
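For the normal family with unit variance, the conditional probability in (2.4) has a closed form: given X̄, X₁ is normal with mean X̄ and variance (n-1)/n, so F̂(x) = Φ((x - X̄)/√((n-1)/n)). A Monte Carlo sketch of its unbiasedness (an illustration, not from the thesis; the parameter values are arbitrary):

```python
import math
import random

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def umvue_cdf(x, sample):
    """Rao-Blackwellized estimator P[X1 <= x | Xbar] for the N(theta, 1) family:
    given Xbar, X1 is N(Xbar, (n-1)/n)."""
    n = len(sample)
    xbar = sum(sample) / n
    return norm_cdf((x - xbar) / math.sqrt((n - 1) / n))

random.seed(4)
theta, n, reps, x0 = 0.5, 8, 50_000, 1.0
est = sum(umvue_cdf(x0, [random.gauss(theta, 1) for _ in range(n)])
          for _ in range(reps)) / reps
target = norm_cdf(x0 - theta)  # the estimand F_theta(x0) = Phi(x0 - theta)
print(est, target)  # the Monte Carlo average should be close to the estimand
```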
Pugh (1963) obtained the estimator for the one-parameter exponential distribution function. Laurent (1963) and Tate (1959) have considered the gamma and two-parameter exponential distribution functions. Folks et al. (1965) find the estimator for the normal distribution function with known mean but unknown variance. Basu (1964) has given a summary of all these methods. Sathe and Varde (1969) consider the estimation of the so-called reliability function, 1 - F_θ(x). Their method is to find a statistic which is stochastically independent of the complete and sufficient statistic and whose distribution can be easily obtained. The unique minimum variance unbiased estimator is based on this distribution. For example, if F_θ(x) is the normal distribution function with mean θ and variance one, then X₁ - X̄, say, is independent of the complete and sufficient statistic X̄, and has a normal distribution with mean zero and variance (n-1)/n. They give as their estimator of 1 - F_θ(t) an expression involving the function G, where

    G(w) = ∫_w^∞ exp[-x²/2] dx.                               (2.6)

They also apply their method to most of the distributions considered by the previous authors mentioned above.
As an example of more general functions of single location and scale parameters, Tate considered the problem of estimating P_θ[X ∈ A]; that is, θ is the parameter in distributions given by densities

    θ p(θx)       (scale parameter)                           (2.7)

and

    p(x - θ)      (location parameter),                       (2.8)

where p(x) is a density function on the real line, and θ > 0 in (2.7) and -∞ < θ < ∞ in (2.8). Tate relies heavily on integral transform theory. For example, he obtains the following result for the family (2.7), under the assumption that p(x) vanishes for negative x. Let H(X^(n)) be a non-negative homogeneous statistic of degree a ≠ 0 with density θ^a g(θ^a x), and suppose g(x) and θp(θz) (for some fixed positive number z) both have Mellin transforms. Then, if there exists an unbiased estimator p̂ of θp(θz) with a Mellin transform, it will be given by (2.9), where M is the Mellin transform defined, for a function ζ(x), when it exists, by

    M[ζ(x); s] = ∫₀^∞ x^(s-1) ζ(x) dx,                        (2.10)

and M⁻¹ denotes the inverse transform. Tate also obtained many of the previously mentioned results. Washio et al. (1956), again using integral transform methods, study the problem of estimating functions of the parameter in the Koopman-Pitman family of densities (cf. Koopman (1936) and Pitman (1936)). To illustrate their method they consider, as one example, the problem of estimating P_θ[X ∈ A] for the normal distribution with unknown mean and variance. In an earlier paper Kolmogorov (1950) derived an estimator of P_θ[X ∈ A] for the normal distribution with unknown mean and known variance. His approach was to first obtain a unique minimum variance unbiased estimator of the density function (which he assumed existed) and then to integrate the solution on the set A. In his derivation he used the 'source solution' of the heat equation,

    φ(z,t) = (2πt)^(-1/2) exp[-z²/4t],  0 < t < ∞,  -∞ < z < ∞,   (2.11)

to solve the integral equation that resulted directly from the estimation problem.
3.  GENERAL THEORY

3.1 Introduction

In this chapter the general set-up of the basic statistical model is described, definitions of unbiasedness for probability measures and densities are given, and two basic theorems are established. The first of these two theorems gives necessary and sufficient conditions for the existence of an unbiased estimator of a density function, and, under the additional assumptions of sufficiency and completeness, the second theorem gives similar necessary and sufficient conditions for the existence of a unique minimum variance unbiased estimator of a density function. Moreover, these theorems also show that if, in any particular example, an unbiased estimator of the density function exists, then the estimator can be computed immediately.
3.2 The Model

In this section the set-theoretical details of the general statistical model are described. Let (𝒳, 𝒜, μ) be a σ-finite measure space; that is, 𝒳, with generic element x, is a Borel set in a Euclidean space, 𝒜 is the class of Borel subsets of 𝒳, and μ is a σ-finite measure on 𝒜. Furthermore, suppose that there is given a family 𝒫 of probability measures P on 𝒜 which is dominated by μ; that is, each P ∈ 𝒫 is absolutely continuous with respect to μ. Then, by the Radon-Nikodym theorem, there exists, for each P, a μ-unique, finite-valued, non-negative, 𝒜-measurable function p such that

    P(A) = ∫_A p(x) dμ(x),   for all A ∈ 𝒜.                   (3.1)

The family 𝒟 of functions p will be referred to as the family of density functions (or densities) corresponding to 𝒫.
Let X be a random variable over the space (𝒳, 𝒜). It will be assumed that the probability distribution of X identifies with an unknown member P of 𝒫. Let X^(n) = (X₁, X₂, ..., Xₙ) be n independent random variables that are identically distributed as X, and denote by x^(n) = (x₁, x₂, ..., xₙ) an observation on X^(n). Denote by 𝒳^(n) the sample space of observations x^(n), and by 𝒜^(n) the product σ-algebra of subsets of 𝒳^(n) that is determined by 𝒜 in the usual way. For any measure Q on 𝒜, the corresponding product measure on 𝒜^(n) will be denoted by Q^(n).
All statistics to be considered are 𝒜^(n)-measurable functions on 𝒳^(n) to a Euclidean measurable space (𝒯, ℬ). If T = T(X^(n)) is such a statistic, denote by 𝒫_T that family of probability measures P_T on ℬ induced by 𝒫; that is, for each B ∈ ℬ,

    P_T(B) = P^(n)[T⁻¹(B)],   for all P ∈ 𝒫.                  (3.2)

Since both of the spaces (𝒳, 𝒜) and (𝒯, ℬ) are Euclidean, it will be assumed that all conditional probability distributions are regular; that is, if T is a statistic and Q an arbitrary probability measure on 𝒜^(n), then there is a conditional probability function Q^T defined, for each A ∈ 𝒜^(n) and T = t, by

    Q[A ∩ T⁻¹(B)] = ∫_B Q^T(A; t) dQ_T(t),   for all B ∈ ℬ,   (3.3)

which is, for each fixed t, a probability measure on 𝒜^(n). Theorems 4 and 7 in chapter two of Lehmann (1959), for example, validate this assumption. Actually, the assumption of Euclidean spaces is unnecessarily restrictive. As long as there exist regular conditional probability measures, all the results of this chapter will still be valid. Of course, from a practical viewpoint, spaces other than Euclidean spaces are of little interest. This will be true in the next chapter, where 𝒳 will be taken to be a Borel subset of the real line.

Finally, it will be assumed that the definitions of completeness and sufficiency for a family of probability measures are known, and hence no separate definitions of these notions will be given.
3.3 The General Problem and Definitions

In this section the general problem of density estimation is discussed within the framework of decision theory, and definitions of unbiased estimators for probability measures and probability density functions are given to prepare the way for the theorems of the next section.

The problem to be considered here is typical of a large class of statistical problems. A set of observations x^(n) is available that is assumed to have been generated from a distribution P with corresponding density p. All that is assumed about P (p) is that it is a member of some particular class of probability measures (probability density functions) 𝒫 (𝒟). Hence, given the observations x^(n), the problem here is to estimate p in some optimal way. Viewed from the standpoint of decision theory, an estimator p̂ = p̂(x; x^(n)) of p at a point x ∈ 𝒳 is required that minimizes the risk

    R(x, P) = E_P[L(p̂(x), p(x))],   for all P ∈ 𝒫.            (3.4)

The symbol E_P represents mathematical expectation under the assumption that P^(n) is the true distribution of X^(n), and L(·,·) is a loss function which expresses the loss incurred (monetary or otherwise) when estimating the density as p̂(x) when, in fact, p(x) is the true density at x. Rather than think in terms of estimating p at some fixed point x, it is often desirable to think in terms of estimating the entire density function. Hence the problem becomes one of selecting a p̂ which not only minimizes the risk in (3.4) for all P ∈ 𝒫, but also, simultaneously, minimizes it for all x ∈ 𝒳. Minimum risk estimators are rarely obtainable, so that the additional complication introduced by this global notion increases the difficulty in general. This problem can be circumvented in the following way. Let w be a measure on 𝒜 which in some way expresses the intensity of interest of the statistician in the points of 𝒳 with reference to the estimation of p. For example, if 𝒳 is the real line and w is Lebesgue measure, the intensity of interest is uniform over the points of 𝒳. The problem becomes one of finding a p̂ which minimizes

    R_w(P) = ∫_𝒳 R(x, P) dw(x),   for all P ∈ 𝒫.              (3.5)
To overcome the difficulty of finding estimators which minimize, for all P ∈ 𝒫, the risk function in (3.4) and the integrated risk function in (3.5), the notions of minimax estimators and Bayes estimators have been introduced. The Bayes procedure in the context of integrated risk, for example, introduces yet another averaging process: a prior distribution Π on 𝒫. In this case the Bayes estimator would be the one that minimizes the Bayes integrated risk

    R_Π = ∫_𝒫 R_w(P) dΠ(P).                                   (3.6)

The approach to this problem of density estimation adopted in this thesis is classical. The loss function is squared error loss; that is,

    L(p̂(x), p(x)) = (p̂(x) - p(x))².                           (3.7)

The object will be to find unbiased estimators p̂ = p̂(x; X^(n)), if they exist, that minimize the risk function in (3.4) for all (x, P) ∈ 𝒳 × 𝒫. Note that, with the loss function in (3.7) and the restriction to unbiased estimators, the risk function in (3.4) becomes the variance

    R(x, P) = E_P[(p̂(x) - p(x))²] = Var_P[p̂(x)]               (3.8)

of p̂ at the point x.

It is convenient at this point to give a definition of an unbiased estimator of a density function. In general the estimator will depend on a statistic T = T(X^(n)). The notation developed in the previous section concerning such statistics will be adhered to in the definition below and the subsequent ones in the remainder of this section.
Definition 3.1  A real-valued function p̂ = p̂(x; t) on 𝒳 × 𝒯 is an unbiased estimator of a density p ∈ 𝒟 if it is 𝒜 × ℬ-measurable and satisfies

    E_T[p̂(x; T)] = p(x),   a.e. μ, for all P ∈ 𝒫.             (3.9)

Note, E_T is an abbreviation for E_{P_T}.

Notice that this definition does not require that an unbiased estimator have the properties of a density, such as being non-negative and integrating to unity. Notice also that the definition allows for the estimator to be biased on a subset of 𝒳 with μ-measure zero. This is a technical necessity, the reason for which will be made clear in the proofs of the theorems in the next section. Apart from this technical detail the above definition has, therefore, a certain global content; an unbiased estimator of the entire function is required.
Hence, with minor modifications, the problem is not new. The search is for minimum variance unbiased estimators of density functions. The solution to the problem has also not changed; if 𝒫 admits a complete and sufficient statistic then unbiased estimators based on this statistic will be unique minimum variance unbiased. However, the question of existence is the fundamental problem that arises. In this connection it is convenient to consider unbiased estimators of the corresponding probability measure P. Therefore, the following definition will formalize this notion.
Definition 3.2  A real-valued function P̂ = P̂(A; t) on 𝒜 × 𝒯 is an unbiased estimator of P ∈ 𝒫 if, for each A ∈ 𝒜, it is a ℬ-measurable function and satisfies

    E_T[P̂(A; T)] = P(A),   for all P ∈ 𝒫.                     (3.10)

Notice that this definition does not require P̂ to be a probability measure or, in fact, a measure on 𝒜 for each fixed t. However, in what follows, it will be assumed that, for each t ∈ 𝒯, such estimators are, in fact, measures on 𝒜; and only such estimators will be considered in the sequel. In the following development it will become clear that it is desirable to restrict attention to estimators P̂ which are measures.
Definition 3.3  An estimator P̂ = P̂(A; T) of P ∈ 𝒫, which is a measure on 𝒜 for each t ∈ 𝒯, is said to be absolutely continuous with respect to a σ-finite measure μ on 𝒜 if, for every A ∈ 𝒜 for which μ(A) = 0,

    P̂(A; t) = 0,   a.e. 𝒫_T;

that is, except for a set of t-values with P_T-measure zero, for all P_T ∈ 𝒫_T.
A consequence of this definition is that if P̂ is such an estimator then, by the Radon-Nikodym theorem, there exists, almost everywhere 𝒫_T, an 𝒜-measurable, μ-unique, non-negative function f(x; t) such that

    P̂(A; t) = ∫_A f(x; t) dμ(x),   for every A ∈ 𝒜.           (3.11)

Although it is 𝒜-measurable, f(x; t), in general, will not be 𝒜 × ℬ-measurable. The 𝒜 × ℬ-measurability of such functions f(x; t) is necessary for the application of Fubini's theorem in the proofs of the theorems in the next section. Hence, it will be assumed, in the sequel, that all such μ-continuous estimators P̂ of P have 𝒜 × ℬ-measurable Radon-Nikodym derivatives.
3.4 Existence Theorems

In this section the problems concerning the existence of an unbiased estimator and a unique minimum variance unbiased estimator of a density function are solved. A desirable property of these solutions is that they also provide a method for computing such an estimator, should it exist.

The terminology, notations, and conventions introduced in the previous sections of this chapter will be adhered to throughout the remainder of the present section.

Theorem 3.1, below, is concerned with the existence of an unbiased estimator of a density p which is assumed to be a member of a family of densities 𝒟 corresponding (with respect to a σ-finite dominating measure μ) to a family 𝒫 of probability measures P.
Theorem 3.1  An unbiased estimator of p exists if and only if there exists an unbiased estimator P̂ of P that is absolutely continuous with respect to the original dominating measure μ. Moreover, when such a P̂ exists, an unbiased estimator of p is given by the Radon-Nikodym derivative dP̂/dμ.

Proof:  Let P̂ = P̂(·; T) be an unbiased estimator of P that is absolutely continuous with respect to μ. Then, by the unbiasedness property, for each A ∈ 𝒜,

    E_T[P̂(A; T)] = P(A) = ∫_A p(x) dμ(x),   for all P ∈ 𝒫.    (3.12)

Also, by the Radon-Nikodym theorem, there exists a non-negative function p̂ = p̂(x; t) such that, for each A ∈ 𝒜,

    P̂(A; t) = ∫_A p̂(x; t) dμ(x).                              (3.13)

It follows, on taking expectations (with respect to P_T) in (3.13), and applying Fubini's theorem to the third member of (3.14) below, that, for each A ∈ 𝒜,

    P(A) = E_T[P̂(A; T)] = E_T[∫_A p̂(x; T) dμ(x)]
         = ∫_A E_T[p̂(x; T)] dμ(x),   for all P ∈ 𝒫.           (3.14)

Then, by the uniqueness part of the Radon-Nikodym theorem, the desired result follows from (3.12) and (3.14); that is,

    E_T[p̂(x; T)] = p(x),   a.e. μ, for all P ∈ 𝒫.             (3.15)

Now suppose that p̂ = p̂(x; T) is an unbiased estimator of p; that is, (3.15) is true. It then follows, on applying Fubini's theorem to the second member of (3.16) below, that, for each A ∈ 𝒜,

    P(A) = ∫_A E_T[p̂(x; T)] dμ(x)
         = E_T[∫_A p̂(x; T) dμ(x)],   for all P ∈ 𝒫.           (3.16)

Hence, from (3.16), ∫_A p̂ dμ is an unbiased estimator of P(A) for every A ∈ 𝒜, and is clearly absolutely continuous with respect to μ. This completes the proof.
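Theorem 3.1 can be checked concretely in a discrete setting (an illustration, not from the thesis): take μ to be counting measure on {0, 1} and 𝒫 the Bernoulli family. The estimator P̂(A; T) = P(X₁ ∈ A | T), with T = ΣXᵢ, is unbiased and trivially μ-continuous, and its μ-derivative p̂(x; T) = P(X₁ = x | T) is an unbiased estimator of the probability mass function, as direct enumeration over the binomial law of T confirms.

```python
from math import comb

def p_hat(x, t, n):
    """p_hat(x; T) = P(X1 = x | T = t) for a Bernoulli sample of size n:
    t/n if x = 1, and 1 - t/n if x = 0."""
    return t / n if x == 1 else 1 - t / n

def expected_p_hat(x, theta, n):
    """E_T[p_hat(x; T)] computed exactly over the Binomial(n, theta) law of T."""
    return sum(comb(n, t) * theta**t * (1 - theta)**(n - t) * p_hat(x, t, n)
               for t in range(n + 1))

n, theta = 7, 0.3
print(expected_p_hat(1, theta, n))  # recovers p(1) = theta = 0.3 exactly
print(expected_p_hat(0, theta, n))  # recovers p(0) = 1 - theta = 0.7 exactly
```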
Thus, if an unbiased estimator of P is available that happens to be absolutely continuous with respect to μ, it can be immediately concluded that an unbiased estimator of p exists, and such an estimator is given by the μ-derivative of the estimator of P. Furthermore, if there exists a complete and sufficient statistic for 𝒫, the usual methods of the Rao-Blackwell-Lehmann-Scheffé theory can be used to obtain a unique minimum variance unbiased estimator of p at every point x ∈ 𝒳. However, the following lemma and theorem give another method of achieving the same end.

Denote by ℰ the class of subsets E of the form A × 𝒳^(n-1) with A an element of 𝒜. Thus, ℰ is a class of cylinder sets with bases in the first coordinate space of 𝒳^(n). Clearly, ℰ is a sub-σ-algebra of 𝒜^(n). If T is a complete and sufficient statistic for 𝒫, the conditional probability measure P^T on 𝒜^(n) is, therefore, constant in P ∈ 𝒫. Denote by P^T_ℰ the restriction of P^T to the class of sets ℰ, and define an estimator P̂ of P by

    P̂(A; T) = P^T_ℰ(E),   for all E ∈ ℰ,                      (3.17)

where E = A × 𝒳^(n-1). For each t ∈ 𝒯, P^T_ℰ is a probability measure on ℰ and, therefore, P̂, defined in (3.17), is a probability measure on 𝒜 for each such t. The following lemma is now easy to prove.
Lemma 3.1  If T is a complete and sufficient statistic for a family 𝒫 of probability measures P, then P̂ = P^T_ℰ is the unique minimum variance unbiased estimator of P.

Proof:  Put B = 𝒯 and identify P^(n) with Q in (3.3). Then, for every E (= A × 𝒳^(n-1)) ∈ ℰ, it follows that

    E_T[P̂(A; T)] = P^(n)(E) = P(A),   for all P ∈ 𝒫.          (3.18)

Hence P̂ = P^T_ℰ is an unbiased estimator of P and, being a function of T alone, is, moreover, the unique minimum variance unbiased estimator of P; and this completes the proof of the lemma.

It should be noted that the P̂ defined above is, in fact, a probability measure on 𝒜 for each t ∈ 𝒯. This property has an important
implication in the next theorem, which is concerned with both the existence and computation of unique minimum variance unbiased estimators of density functions. Note also that ℰ could have been defined with the bases of its sets in any one of the coordinate spaces of 𝒳^(n). The notation of the previous lemma will be used without further comment.

Theorem 3.2  Let T be a complete and sufficient statistic for 𝒫. A unique minimum variance unbiased estimator p̂ of p ∈ 𝒟 exists if and only if P̂ = P^T_ℰ is absolutely continuous with respect to the original dominating measure μ. Moreover, if such a p̂ exists, it is given by the Radon-Nikodym derivative dP̂/dμ.

Proof:  If P̂ = P^T_ℰ is absolutely continuous with respect to μ, then, by the above lemma and theorem 3.1, an unbiased estimator of p is given by dP̂/dμ. But dP̂/dμ is a function of T alone and hence is the unique minimum variance unbiased estimator of p.
On the other hand, if p is the unique minimum variance unbiased
estimator of p, it must be a
each A
E
~ction,
p(x;T), of T alone.
Hence, for
U,
,..
P(A) = SETCP(X;T)J~ (x)
,..
= ETCSP(X;T)@ (x)J,
A
for all FE 'P..
A
,..
But, by the lemma P = P~ is an unbiased estimator of P and, therefore,
for each A E U,
,.
,.
ETCP(A;T)-S p(x;T)~(x)J
A
= 0,
~or
all PEP.
(3.20)
But, T is complete, and hence
,.
P(A;t)
,.
Hence, P
T.
= Pe
:LS
,.
=
JAp(x;t)dg(x),
absolutely continuous with respect to
IJ..
Finally, it
should be noted that if P~ is absolutely continuous with respect to
IJ.,
T
dP
then ~ is, in fact, a proper density function.
This completes the
proof.
Hence, once P^T_𝒞 has been computed, it is only necessary to check whether or not it is absolutely continuous with respect to μ; and if it is, the unique minimum variance unbiased estimator of p is obtained by differentiating P^T_𝒞 with respect to μ. For example, if 𝒫 is a family of probability measures on the real line with ℱ the corresponding family of distribution functions, and μ is ordinary Lebesgue measure, then one needs only to check whether F̂, the distribution function corresponding to P̂, is an absolutely continuous function of x. If it is, the unbiased estimator, at x, is given by the ordinary derivative dF̂(x)/dx. Such families of distributions on the real line will be the object of study in the next chapter.
4. APPLICATIONS

4.1 Introduction
In this chapter applications of the general theory presented in
the previous chapter are discussed for families of distributions on
the real line; that is, the basic space is a Borel set of the real
line together with its Borel subsets, and the families of densities
and corresponding families of distribution functions are taken to be
Borel-measurable functions.
Without exception, the dominating measure
will be either ordinary Lebesgue measure or counting measure.
In the search for an unbiased estimator of a density function the general method will be to first find an unbiased estimator of the corresponding distribution function and then test it for absolute continuity. If, then, the distribution function estimator is absolutely continuous, an unbiased estimator of the density function is obtained by differentiating this distribution function estimator.
Most of the families of distributions that are considered have the
feature that they admit a complete and sufficient statistic.
In these
cases the search for estimators will be restricted to those which are
functions of the complete and sufficient statistic, and then the
unbiased estimators, if they exist, will also be unique and have uniform minimum variance.
Before proceeding to particular examples,
three different methods of finding unbiased estimators of distribution
functions will be discussed.
The choice of method is sometimes
dictated by the structure of the particular family of distributions
under consideration, whereas in some situations computational ease is
the only criterion.
Essentially, the remainder of the chapter will be divided into two main parts; one part dealing with discrete distributions and the other with families of absolutely continuous distributions. In the discrete case a solution is given for a one-parameter, power series type family which includes several of the well-known discrete distributions with possible values on the set {0, 1, 2, ...}. In the absolutely continuous case many of the well-known families of distributions are discussed, including a general truncation-parameter family.
Rosenblatt's result, on the non-existence of an unbiased
estimator of a density function that corresponds to an unknown
absolutely continuous distribution function on the real line, is
also re-established as a simple corollary to Theorem 3.2 of the previous chapter.
4.2 Methods
The literature, to date, contains essentially three different
methods for finding unbiased estimators of distribution
functions.
Actually, all three methods can be used to find unbiased estimators
of more general parametric functions.
These methods are given below.
However, before giving a formal statement of each method, a short
discussion will first be given comparing them on the basis of their
scope of applicability and computational ease.
The first method, due to Tate (1959) and Washio, et al. (1956), makes use of integral transform theory. In general, the application of this theory does not depend on the existence of sufficient statistics, although Washio, et al. do restrict their study to the Koopman-Pitman family of densities (cf. Koopman (1936) and Pitman (1936)). Recall that this family, by definition, admits a sufficient statistic and has range independent of the parameter. On the other hand, Tate restricts his attention to finding estimators which are functions of homogeneous statistics and statistics with the translation property for the scale and location parameter families, respectively.
In what follows interest will be confined to finding unbiased
estimators of distribution functions and in most of the examples
treated these integral transform methods would be somewhat heavy-handed, and the other two methods, described below, are usually easier to apply.
The second and third methods depend, for their application, on
the existence of sufficient or complete and sufficient statistics.
The second method is a straightforward application of the well-known Rao-Blackwell and Lehmann-Scheffe theory, and the theorem due to Basu (1955, 1958) is the connecting link between these two methods. Whereas the third method is generally the easiest to apply, its application
depends, somewhat artificially, on one being able to find a statistic
with distribution independent of the index parameter of the family of
distributions under consideration.
On the other hand, the distribution
problems encountered in applying the Rao-Blackwell-Lehmann-Scheffe
theory are often difficult and laborious.
Throughout the remainder of this chapter the following conventions, notation, and terminology will be adopted. I will denote a Borel set of the real line, R; 𝒜, the Borel subsets of I; λ, ordinary Lebesgue measure; ν, counting measure when I is a finite or infinite sequence of points, {a_n: n ≥ 1}; P_θ, F_θ, and p_θ, the unknown probability measure, distribution function, and density function, respectively; with θ an element of a parameter space Θ. In each example it will be assumed that the estimation procedure will be based on a statistic T which is a function of n independent observations, X^(n) = (X_1, X_2, ..., X_n). In all the examples considered, T will be a complete and sufficient statistic. Each example will be characterized by a vector, {p_θ, I, Θ; T}.

Finally, F̂(x) = F̂(x;T) and p̂(x) = p̂(x;T) will denote, respectively, the estimators of F_θ and p_θ; and P̂(A) = P̂(A;T), the estimator of P_θ, will be given by

    P̂(A) = ∫_A dF̂(x), for all A ∈ 𝒜.    (4.1)

The integral in (4.1) is, for each t ∈ 𝒯, a Lebesgue-Stieltjes integral but in all examples can be evaluated as a Riemann-Stieltjes integral.
Method 1. The transform method considered here will not be discussed in any generality but will be illustrated by the work of Tate in section four of his paper, where he applies the Mellin transform to find unbiased estimators of arbitrary functions of scale parameters. The Mellin transform of a function q(x), when it exists, is defined by

    M[q(x);s] = ∫_0^∞ x^{s−1} q(x) dx.    (4.2)

If φ(s) is such a transform, the inverse transform will be denoted by

    M^{−1}[φ(s);x].    (4.3)

Let p(x) be a known, piecewise continuous density that vanishes for negative x. Consider the following scale parameter family, defined by

    p_θ(x) = θ p(θx), 0 < θ < ∞.    (4.4)

Tate considers the problem of estimating θp(θz) for a fixed value z. He considers only those estimators that are functions of non-negative homogeneous statistics; that is, H is such a homogeneous statistic of degree α ≠ 0 if

    (i)  H: [0,∞)^(n) → [0,∞),
    (ii) H(θx^(n)) = θ^α H(x^(n)), 0 < θ < ∞.    (4.5)

When X^(n) is a simple random sample from a member of the family defined in (4.4), H(X^(n)) has a density of the form θ^α g(θ^α x). Thus an unbiased estimator p̂(z;H), if it exists, is a solution of the integral equation

    ∫_0^∞ p̂(z;h) θ^α g(θ^α h) dh = θ p(θz).    (4.6)

Assume that both g(x) and θp(θz) have Mellin transforms. Then, if there exists an unbiased estimator of θp(θz) with a Mellin transform, it will be given by

    p̂(z;H) = M^{−1}[ M[p(x); α(s−1)+1] / ( z^{α(s−1)+1} M[g(x);s] ); H ].    (4.7)

Tate uses the Laplace and bilateral Laplace transforms to find unbiased estimators of functions of location parameter families.
Method 2. This method, as noted in the discussion above, is a straightforward application of the Rao-Blackwell-Lehmann-Scheffe theory. To find the unique minimum variance unbiased estimator of F_θ at a fixed point x, a simple unbiased estimator is first found which, when conditioned on the complete and sufficient statistic, yields the required result. Usually, the simple estimator is

    U = I_A(X_1),    (4.8)

where I_A denotes the indicator function of the set A = (−∞, x] and X_1 is the first observation from a sample of n. Thus, if T is a complete and sufficient statistic for θ, the minimum variance unbiased estimator of F_θ(x) is given uniquely by

    F̂(x;T) = P[X_1 ≤ x | T].    (4.9)

Clearly, in the discrete case the minimum variance unbiased estimator of the probability mass function can be obtained directly; absolute continuity of the distribution function estimator with respect to counting measure is guaranteed. Hence, the required density estimator in the discrete case is given by

    p̂(x;T) = P[X_1 = x | T],    (4.10)

at the point x.
Method 3. This method, due to Sathe and Varde (1969), depends, for its application, on a theorem of Basu (1955, 1958). The original result will be specialized as follows. Denote by U the simple estimator given in (4.8) and suppose T is a complete and sufficient statistic for θ. If there exists a function V = V(X_1,T) such that

    (i)  it is stochastically independent of T,
    (ii) it is a strictly increasing function of X_1 for fixed T,

then the minimum variance unbiased estimator of F_θ at x is given uniquely by

    F̂(x;T) = 1, if V(x,T) > b;
            = ∫_a^{V(x,T)} dH(u), if a < V(x,T) ≤ b;    (4.11)
            = 0, if V(x,T) ≤ a,

where H is the distribution function of V and satisfies

    H(x) = 0 for x ≤ a; H(x) = 1 for x > b.    (4.12)

The proof is immediate by noting

    F̂(x;T) = P[X_1 ≤ x | T] = P[V ≤ V(x,T) | T], by (ii),
            = P[V ≤ V(x,T)], by (i).

The important step in the application of this theorem is the selection of a suitable statistic V. The following theorem, due to Basu (1955, 1958), gives sufficient conditions for V to satisfy (i).

Theorem 4.1. If T is a boundedly complete and sufficient statistic for θ and V is a statistic (which is not a function of T alone) that has distribution independent of θ, then V is stochastically independent of T.

Therefore, once V is found and its distribution has been derived, the required estimator is given by (4.11).
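As a concrete sketch of method 3, the following Python fragment (an illustration added here, not part of the thesis; the function names are the editor's) applies (4.11) to the uniform family on (0,θ), treated later in Example 4.5. There V(x,T) = x/X_(n) is independent of T = X_(n) by Theorem 4.1, and its distribution function is H(v) = (n−1)v/n on [0,1) with a jump of 1/n at v = 1:

```python
def mvue_cdf(x, t, V, H, a=0.0, b=1.0):
    # The construction (4.11): F_hat(x;T) = H(V(x,T)),
    # clipped to 0 below a and to 1 above b.
    v = V(x, t)
    if v <= a:
        return 0.0
    if v > b:
        return 1.0
    return H(v)

# Uniform(0, theta) example: T = X_(n), V(x, t) = x/t.
n = 5
V = lambda x, t: x / t
H = lambda v: (n - 1) * v / n if v < 1 else 1.0

x_max = 2.0                            # an observed value of X_(n)
fhat = mvue_cdf(0.5, x_max, V, H)      # should equal (n-1)x/(n X_(n))
assert abs(fhat - (n - 1) * 0.5 / (n * x_max)) < 1e-12
assert mvue_cdf(3.0, x_max, V, H) == 1.0
```

The returned values agree with the estimator (4.46) derived for the uniform family below.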
4.3 Discrete Distributions
In this section method 2 will be used to find unique minimum variance unbiased estimators of discrete-type density functions. Included is a general expression for an estimator for a wide class of discrete distributions. These distributions, first introduced by Noak (1950) and later studied in detail by Roy and Mitra (1957), include many of the well-known discrete distributions.
Example 4.1: The Binomial Distribution. This class of distributions, in the notation introduced in section 4.2, is characterized by the vector

    {\binom{k}{x} θ^x (1−θ)^{k−x}, {0, 1, ..., k}, (0,1); Σ_{i=1}^n X_i},    (4.13)

where k is a known positive integer. The distribution of T is given by

    \binom{nk}{t} θ^t (1−θ)^{nk−t}, t ∈ {0, 1, ..., nk};    (4.14)

and the conditional distribution of X^(n) given T = t is

    ∏_{i=1}^n \binom{k}{x_i} / \binom{nk}{t}, x^(n) ∈ S_n(t),    (4.15)

where S_n(t) = {x^(n) ∈ I^(n): Σ_{i=1}^n x_i = t}. Hence,

    p̂(x;t) = Σ [∏_{i=1}^n \binom{k}{x_i} / \binom{nk}{t}], x ∈ {0, 1, ..., min(t,k)},    (4.16)

where the summation in (4.16) is over the set {x^(n): x^(n−1) ∈ S_{n−1}(t−x), x_n = x}. This expression simplifies to give the required estimator

    p̂(x;T) = \binom{k}{x}\binom{(n−1)k}{T−x} / \binom{nk}{T}, x ∈ {0, 1, ..., min(T,k)}.    (4.17)

It is interesting to note that the binomial distribution is estimated by a hypergeometric distribution with a range that has random end-point, min(T,k).
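The unbiasedness of the hypergeometric estimator (4.17) can be checked numerically; the Python sketch below (an illustration added by the editor, not from the thesis) averages it over the Binomial(nk, θ) distribution of T and recovers the Binomial(k, θ) mass function exactly:

```python
from math import comb

def p_hat(x, t, k, n):
    # Hypergeometric estimator (4.17); zero outside its support.
    if x < 0 or x > min(t, k) or t - x > (n - 1) * k:
        return 0.0
    return comb(k, x) * comb((n - 1) * k, t - x) / comb(n * k, t)

def binom_pmf(x, m, theta):
    return comb(m, x) * theta**x * (1 - theta)**(m - x)

n, k, theta, x = 4, 3, 0.37, 2
# E_theta[p_hat(x; T)] with T ~ Binomial(nk, theta)
expectation = sum(p_hat(x, t, k, n) * binom_pmf(t, n * k, theta)
                  for t in range(n * k + 1))
assert abs(expectation - binom_pmf(x, k, theta)) < 1e-12
```

By Vandermonde's identity the estimator also sums to one over x for each fixed t, so it is a proper mass function, as Theorem 3.2 requires.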
Example 4.2: The Noak Distributions. These distributions are characterized by the vector

    {a(x)θ^x / f(θ), {0, 1, ...}, (0, c); Σ_{i=1}^n X_i},    (4.18)

where c is finite or infinite, a(x) > 0, a(0) = 1 and

    f(θ) = Σ_{x=0}^∞ a(x)θ^x.    (4.19)

The Poisson, negative binomial and logarithmic distributions occur as special cases of the above. They will be considered separately to illustrate Theorem 4.2 below.

Roy and Mitra proved that T = Σ_{i=1}^n X_i is a complete and sufficient statistic for θ and has distribution

    C(t,n)θ^t / f^n(θ), t ∈ {0, 1, ...},    (4.20)

where

    C(t,n) = Σ ∏_{i=1}^n a(x_i).    (4.21)

The summation in (4.21) is over the set S_n(t) = {x^(n) ∈ I^(n): Σ_{i=1}^n x_i = t}. The above results and notation will be used in the statement and proof of the following theorem.
Theorem 4.2. The minimum variance unbiased estimator of a Noak density function is given by

    p̂(x;T) = a(x) C(T−x, n−1) / C(T,n), x ∈ {0, 1, ..., T}.    (4.22)

Proof: The conditional distribution of X^(n) given T = t is given by

    ∏_{i=1}^n a(x_i) / C(t,n), x^(n) ∈ S_n(t).    (4.23)

Hence,

    p̂(x;t) = Σ ∏_{i=1}^n a(x_i) / C(t,n), x ∈ {0, 1, ..., t},    (4.24)

where the summation is over the set {x^(n): x^(n−1) ∈ S_{n−1}(t−x), x_n = x}, and the expression in (4.24) simplifies to give the required estimator.
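The coefficients C(t,n) of (4.21) satisfy the convolution recursion C(t,n) = Σ_{x=0}^t a(x)C(t−x, n−1), with C(t,1) = a(t), which gives a direct way to compute the estimator (4.22). The Python sketch below (added for illustration; not part of the thesis) does this for a generic coefficient sequence and checks, for the Poisson coefficients a(x) = 1/x!, both the closed form C(t,n) = n^t/t! and that the estimator is a proper mass function:

```python
from math import factorial

def conv_coeff(t, n, a):
    # C(t, n) of (4.21) via the recursion C(t, n) = sum_x a(x) C(t-x, n-1).
    if n == 1:
        return a(t)
    return sum(a(x) * conv_coeff(t - x, n - 1, a) for x in range(t + 1))

def noak_mvue(x, t, n, a):
    # Estimator (4.22): a(x) C(t-x, n-1) / C(t, n).
    return a(x) * conv_coeff(t - x, n - 1, a) / conv_coeff(t, n, a)

a_poisson = lambda x: 1.0 / factorial(x)   # Poisson coefficients
n, t = 3, 6
# C(t, n) = n^t / t! in the Poisson case
assert abs(conv_coeff(t, n, a_poisson) - n**t / factorial(t)) < 1e-12
# the estimator is a proper probability mass function on {0, ..., t}
total = sum(noak_mvue(x, t, n, a_poisson) for x in range(t + 1))
assert abs(total - 1.0) < 1e-12
```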
The above result will now be illustrated for the Poisson, negative binomial and logarithmic distributions.

The Poisson Distribution. For this distribution,

    a(x) = 1/x!; f(θ) = e^θ; c = ∞; C(t,n) = n^t/t!.    (4.25)

Therefore,

    p̂(x;T) = \binom{T}{x}(1/n)^x(1 − 1/n)^{T−x}, x ∈ {0, 1, ..., T}.    (4.26)

Hence, the Poisson distribution is estimated by a binomial distribution with parameters 1/n and T.
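Unbiasedness of (4.26) is the classical thinning identity: if T ~ Poisson(nθ) and X | T ~ Binomial(T, 1/n), then X ~ Poisson(θ). A quick numerical confirmation (an illustrative sketch added here, with the infinite sum over t truncated):

```python
from math import comb, exp, factorial

def binom_pmf(x, t, p):
    return comb(t, x) * p**x * (1 - p)**(t - x) if 0 <= x <= t else 0.0

n, theta, x = 4, 1.3, 2
rate = n * theta                       # T ~ Poisson(n*theta)
# E[p_hat(x;T)] = sum_t Binomial(t, 1/n) pmf at x times Poisson(rate) pmf at t
expectation = sum(binom_pmf(x, t, 1.0 / n) * exp(-rate) * rate**t / factorial(t)
                  for t in range(100))
poisson_pmf = exp(-theta) * theta**x / factorial(x)
assert abs(expectation - poisson_pmf) < 1e-9
```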
The Negative Binomial Distribution. For this distribution,

    a(x) = \binom{k+x−1}{x}; f(θ) = (1−θ)^{−k}; c = 1; C(t,n) = \binom{nk+t−1}{t},    (4.27)

where k is a positive integer. Therefore,

    p̂(x;T) = \binom{k+x−1}{x}\binom{(n−1)k+T−x−1}{T−x} / \binom{nk+T−1}{T}, x ∈ {0, 1, ..., T}.    (4.28)

Note that the minimum variance unbiased estimator for the geometric distribution is obtained by putting k = 1 in (4.28). For fixed T = t, p̂(x;t) has the following interpretation. Suppose there are nk cells into which t indistinguishable balls are placed at random. If it is assumed that all of the \binom{nk+t−1}{t} distinguishable arrangements have equal probabilities, then p̂(x;t) is the probability that a group of k prescribed cells contains a total of exactly x balls.
The Logarithmic Distribution. For this distribution,

    a(x) = 1/(x+1); f(θ) = −log(1−θ)/θ; c = 1; C(t,n) = Σ ∏_{i=1}^n 1/(x_i+1),    (4.29)

where the summation is over the set S_n(t). No closed expression for C(t,n) is known by the author and, hence, p̂(x;t) does not have an obvious probabilistic interpretation.
4.4 Absolutely Continuous Distributions

In this section some of the well-known families of Lebesgue-continuous distributions will be discussed. All of these families admit a complete and sufficient statistic so that only unique minimum variance unbiased estimators of the distribution functions will be considered. The estimators of these distribution functions have all previously been derived elsewhere and, therefore, only a few of these will be re-established to illustrate the methods of section two. The remainder of the results will be stated and checked for absolute continuity to decide whether or not unbiased estimators of the corresponding density functions exist. Also, Rosenblatt's result is re-established as a simple corollary to Theorem 3.2.
Example 4.3: The Class of All Absolutely Continuous Distributions. In this example F(x) is an arbitrary absolutely continuous distribution function with corresponding density p(x). It is well known that the order statistic T = (X_(1), X_(2), ..., X_(n)) is complete and sufficient for this family and that the minimum variance unbiased estimator of F is given uniquely by the empirical distribution function

    F̂(x;T) = Σ_{i=1}^n I_{A_i}(X_i)/n, x ∈ R,    (4.30)

where I_{A_i} is the indicator function of the set A_i = [X_i ≤ x]. Clearly, for each fixed T = t, F̂(x;t) is not absolutely continuous and, hence, by Theorem 3.2, there is no unbiased estimator of p(x) for any sample size n.
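The failure of absolute continuity is concrete: the empirical distribution function (4.30) is a step function whose entire mass sits in jumps of size 1/n at the observations, as the short sketch below (illustrative, not from the thesis) verifies:

```python
def edf(x, sample):
    # Empirical distribution function (4.30).
    return sum(1 for xi in sample if xi <= x) / len(sample)

sample = [0.8, 0.1, 2.5, 1.4]
eps = 1e-9
# A jump of size 1/n at each (distinct) observation; no Lebesgue density
# can reproduce these point masses.
for xi in sample:
    assert abs(edf(xi, sample) - edf(xi - eps, sample) - 1 / len(sample)) < 1e-12
```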
Example 4.4: The Normal Distribution. This distribution is characterized by the vector

    {(2πθ_2)^{−1/2} exp[−(x−θ_1)²/2θ_2], R, R × (0,∞); (Σ_{i=1}^n X_i, Σ_{i=1}^n (X_i − X̄)²)}.    (4.31)

As well as (c), the complete family, two sub-families will be discussed; namely, (a) θ_2 = 1, and (b) θ_1 = 0. For simplicity, write T/n = (U,W); that is,

    U = Σ_{i=1}^n X_i/n,    (4.32)

and

    W = Σ_{i=1}^n (X_i − U)²/n.    (4.33)
(a) θ_1 unknown, θ_2 = 1. Here, U is the complete and sufficient statistic. Moreover, V = X_1 − U has a normal distribution with (θ_1, θ_2) = (0, (n−1)/n), and the function x − u is monotonic increasing in x for fixed u. Hence, method 3 applies and the required minimum variance unbiased estimator of F is given uniquely by

    F̂(x;U) = ∫_{−∞}^{x−U} (2π(n−1)/n)^{−1/2} exp[−ns²/2(n−1)] ds.    (4.34)

It should be noted that, for n ≥ 2, F̂ is absolutely continuous and, therefore, the minimum variance unbiased estimator of the density function exists for n ≥ 2 and is given by

    p̂(x;U) = [2π(n−1)/n]^{−1/2} exp[−n(x−U)²/2(n−1)], x ∈ R.    (4.35)

When n = 1, F̂ is not absolutely continuous and is, in fact, the empirical distribution function. It will become apparent in several of the examples that the existence or non-existence of unbiased estimators depends on sample size. The smallest sample size m for which an unbiased estimator exists will be called the degree of estimability. In Example 4.3, m = ∞ and in the present example, m = 2. The value m = 2 in this example could be given the following interpretation: one 'degree of freedom' for estimating the empirical distribution function and one 'degree of freedom' for estimating θ_1. Actually, θ_1 does have one degree of estimability. Finally, it should be observed that the unbiasedness of p̂(x;U) in (4.35) can easily be verified directly by integration.
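That direct verification amounts to the convolution identity N(0, 1/n) * N(0, (n−1)/n) = N(0, 1): averaging (4.35) over the N(θ_1, 1/n) distribution of U returns the N(θ_1, 1) density. The sketch below (an illustration added here; crude midpoint-rule integration) confirms this numerically:

```python
import math

def p_hat(x, u, n):
    # Estimator (4.35): normal density with mean u and variance (n-1)/n.
    v = (n - 1) / n
    return math.exp(-(x - u) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

n, theta1, x = 5, 0.3, 1.1
var_u = 1.0 / n                        # U ~ N(theta1, 1/n)
du = 1e-3
lo, hi = theta1 - 8.0, theta1 + 8.0
grid = [lo + (i + 0.5) * du for i in range(int((hi - lo) / du))]
expectation = sum(
    p_hat(x, u, n)
    * math.exp(-(u - theta1) ** 2 / (2 * var_u)) / math.sqrt(2 * math.pi * var_u)
    for u in grid) * du
target = math.exp(-(x - theta1) ** 2 / 2) / math.sqrt(2 * math.pi)
assert abs(expectation - target) < 1e-6
```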
(b) θ_1 = 0, θ_2 unknown. Here, the complete and sufficient statistic is

    S = Σ_{i=1}^n X_i².    (4.36)

Folks, et al., using method two, obtained the following estimator:

    F̂(x;S) = 0, x ≤ −S^{1/2};
            = G_{n−1}[(n−1)^{1/2} x/(S − x²)^{1/2}], −S^{1/2} < x ≤ S^{1/2};    (4.37)
            = 1, x > S^{1/2},

where G_{n−1}(u) is Student's t-distribution function with (n−1) degrees of freedom. Clearly, F̂ is absolutely continuous for n ≥ 2 and, hence, the required minimum variance unbiased estimator of the density function is given uniquely by

    p̂(x;S) = [B(1/2, (n−1)/2)]^{−1} S^{−1/2} (1 − x²/S)^{(n−3)/2}, x² < S;
            = 0, elsewhere,    (4.38)

where B(a,b) is the complete Beta function.
It is interesting to compare the estimators in cases (a) and (b). The estimator in (4.35) is strictly positive for all values of x and any fixed T = t, whereas the estimator in (4.38) is zero outside the interval [−t^{1/2}, t^{1/2}]. In both cases, of course, the original density functions are positive everywhere for all values of the unknown parameters. In case (b), m = 2 and in case (a), m = 2. The value, m = 2, in case (b) can be decomposed into one 'degree of freedom' for θ_2 and one degree for the empirical distribution function.
(c) θ_1 unknown, θ_2 unknown. In this case the complete and sufficient statistic is T = (nU, nW). Using method 3 of section two and the fact that V = (X_1 − U)/W^{1/2} has distribution with density

    g(v) = [B(1/2, (n−2)/2)]^{−1} (n−1)^{−1/2} (1 − v²/(n−1))^{(n−4)/2}, v² < n−1;
         = 0, elsewhere,    (4.39)

Sathe and Varde obtained, in a slightly different form, the estimator

    F̂(x;T) = 0, x < U − [(n−1)W]^{1/2};
            = ∫_{−(n−1)^{1/2}}^{(x−U)/W^{1/2}} g(v) dv, U − [(n−1)W]^{1/2} ≤ x ≤ U + [(n−1)W]^{1/2};    (4.40)
            = 1, x > U + [(n−1)W]^{1/2}.

Clearly, F̂ is absolutely continuous for n ≥ 3 and the unique minimum variance unbiased estimator p̂ of the density function is obtained by replacing x with (x−U)/W^{1/2} in (4.39). It should be noted that, once again, the estimator p̂ is zero outside a finite interval for any fixed value of T. The degree of estimability m = 3; two 'degrees of freedom' for (θ_1, θ_2) and one for the empirical distribution function. It appears that the degree of estimability equals the degree of estimability of the usual indexing parameter plus one degree for the empirical distribution function.
Example 4.5: The Truncation Distributions. This family of distributions includes two types, and the characterizing vector for each type is given by

    Type I:  {k_1(θ)h_1(x), (θ < x < b), (a,b); X_(1)},    (4.41)
    Type II: {k_2(θ)h_2(x), (a < x < θ), (a,b); X_(n)},    (4.42)

where the interval (a,b) is either finite, semi-infinite or infinite and k_1 and h_1 and k_2 and h_2 have the obvious relations

    1/k_1(θ) = ∫_θ^b h_1(x) dx, for all θ ∈ (a,b);    (4.43)

    1/k_2(θ) = ∫_a^θ h_2(x) dx, for all θ ∈ (a,b).    (4.44)

The statistics X_(1) and X_(n) are the smallest and largest order statistics, respectively. Tate used essentially method 3 to obtain the following distribution function estimators for the Type I and Type II families defined above:
Type I:

    F̂(x;X_(1)) = 0, a < x < X_(1) < b;
                = 1/n + (1 − 1/n)[∫_{X_(1)}^x h_1(u) du / ∫_{X_(1)}^b h_1(u) du], a < X_(1) ≤ x < b.    (4.45)

Type II:

    F̂(x;X_(n)) = (1 − 1/n)[∫_a^x h_2(u) du / ∫_a^{X_(n)} h_2(u) du], a < x < X_(n) < b;
                = 1, a < X_(n) ≤ x < b.    (4.46)

Both of the above estimators are mixed distributions with a jump of 1/n at x = X_(1) in the first case and at x = X_(n) in the second case. Hence, by Theorem 3.2, no minimum variance unbiased estimator of the density function exists for either type. Two examples of these distributions will now be given to illustrate the above estimators.
The Pareto Distribution. This distribution has a Type I density with k_1(θ) = θ, h_1(x) = 1/x², a = 0 and b = ∞. The estimator in (4.45) takes the form

    F̂(x;X_(1)) = 0, 0 < x < X_(1) < ∞;
                = 1/n + (1 − 1/n)(1 − X_(1)/x), 0 < X_(1) ≤ x < ∞.    (4.47)

Note that if n = 1, F̂ in (4.47) reduces to the empirical distribution function.
The Uniform Distribution. This is a Type II distribution with k_2(θ) = 1/θ, h_2(x) = 1, a = 0 and b = +∞. The estimator in (4.46) takes the form

    F̂(x;X_(n)) = (n−1)x/nX_(n), 0 < x < X_(n) < ∞;
                = 1, 0 < X_(n) ≤ x < ∞.    (4.48)

The following figure illustrates the estimator in (4.48).
Figure 4.1  The U.M.V.U.E. of (x/θ)I_(0,θ).
This example serves to illustrate a point that emphasizes the subtle difference between the work in this thesis and the usual point estimation problem. It is simple to show that if n ≥ 2, (n−1)/nX_(n) is the unique minimum variance unbiased estimator of 1/θ, and yet there exists no such estimator for the density

    p_θ(x) = 1/θ, 0 < x < θ;
           = 0, elsewhere.    (4.49)

The point is that in the first instance all that is required is an unbiased estimator of the parametric function 1/θ; and in the second case an unbiased estimator is required of the function specified in (4.49). Thus, the notion of a uniform (in x) unbiased estimator does create a different estimation problem.
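The unbiasedness of (4.48) at a fixed x can be checked by integrating against the density ny^{n−1}/θ^n of X_(n); the Python sketch below (added here for illustration only) does the integral numerically and recovers F_θ(x) = x/θ:

```python
n, theta, x = 4, 2.0, 0.7

def f_hat(x, xmax, n):
    # Estimator (4.48): (n-1)x/(n X_(n)) below the maximum, 1 above it.
    return 1.0 if xmax <= x else (n - 1) * x / (n * xmax)

# E[f_hat(x; X_(n))] with X_(n) having density n y^(n-1)/theta^n on (0, theta)
dy = 1e-4
steps = int(theta / dy)
expectation = sum(
    f_hat(x, (i + 0.5) * dy, n) * n * ((i + 0.5) * dy) ** (n - 1) / theta ** n
    for i in range(steps)) * dy
assert abs(expectation - x / theta) < 1e-6
```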
The Two-Parameter Exponential Distribution. The characterizing vector for this family of distributions is

    {θ_2^{−1} exp[−(x−θ_1)/θ_2], (θ_1 < x < ∞), R × (0,∞); (X_(1), Σ_{i=1}^n X_i)}.    (4.50)

Tate obtained the following distribution function estimator using method 1:

    F̂(x;T) = 0, x < X_(1);
            = 1 − (1 − 1/n)(1 − (x − X_(1))/Y)^{n−2}, X_(1) ≤ x < X_(1) + Y;    (4.51)
            = 1, x ≥ X_(1) + Y,

where Y = Σ_{i=1}^n X_i − nX_(1).

Clearly, F̂ is not absolutely continuous but is a mixed distribution with mass 1/n at x = X_(1). Hence, no unbiased estimator of the density function exists for this family. However, if θ_1 is known to be equal to zero (say), then the estimator for this sub-family is obtained by putting X_(1) = 0 and changing the exponent to n−1 and (1 − 1/n) to 1 in (4.51). This estimator is absolutely continuous and, therefore, the required density estimator is given by

    p̂(x;T) = (n−1)(1 − x/T)^{n−2}/T, 0 < x < T;
            = 0, elsewhere,    (4.52)

where T = Σ_{i=1}^n X_i. The degree of estimability in this case is two; one degree for θ_2 and one degree for the empirical distribution function.
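For the θ_1 = 0 sub-family, unbiasedness of (4.52) can be verified against the Gamma(n, θ_2) distribution of T = ΣX_i: averaging (n−1)(1−x/T)^{n−2}/T over T > x returns θ_2^{−1}e^{−x/θ_2}. A numerical check (an illustrative sketch, not from the thesis, using crude midpoint integration):

```python
import math

n, theta2, x = 5, 1.7, 0.9

def p_hat(x, t, n):
    # Estimator (4.52); zero unless 0 < x < t.
    return (n - 1) * (1 - x / t) ** (n - 2) / t if 0 < x < t else 0.0

# T ~ Gamma(n, theta2): density t^(n-1) e^(-t/theta2) / ((n-1)! theta2^n)
dt = 1e-3
steps = int(60.0 / dt)
expectation = sum(
    p_hat(x, (i + 0.5) * dt, n)
    * ((i + 0.5) * dt) ** (n - 1) * math.exp(-(i + 0.5) * dt / theta2)
    / (math.factorial(n - 1) * theta2 ** n)
    for i in range(steps)) * dt
assert abs(expectation - math.exp(-x / theta2) / theta2) < 1e-5
```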
5. SUMMARY

The work in this thesis is concerned with the problem of unbiased estimation of probability density functions. The general model considered is that of a μ-dominated family 𝒫 of probability measures P on a σ-finite measure space (I, 𝒜, μ). Given n observations X^(n) = (X_1, X_2, ..., X_n) on a random variable X (with values in I), the object is to find estimators p̂ of p, the probability density function corresponding to P, such that

    E_P[p̂(x)] = p(x), for all P ∈ 𝒫 and all x ∈ I.

The uniform in x property of such estimators p̂ is the feature that distinguishes this problem from the usual problem of point estimation. Here, an estimator of a function is sought.

In chapter three, definitions of unbiasedness for both density function and probability measure estimators are given. After giving a suitable definition of μ-continuity for an estimator P̂ of P, two theorems are established that give necessary and sufficient conditions that guarantee the existence of unbiased estimators p̂ of p. It is shown in Theorem 3.1 that such a p̂ exists if and only if there exists an unbiased estimator P̂ of P that is absolutely continuous with respect to the original dominating measure μ. Moreover, if such a P̂ exists, then it is shown that the Radon-Nikodym derivative dP̂/dμ is an unbiased estimator of p. Theorem 3.2 gives a stronger result if there exists a complete and sufficient statistic T for 𝒫: a unique minimum variance unbiased estimator
of p exists if and only if P^T_𝒞 is absolutely continuous with respect to μ. Moreover, if such a p̂ exists, it is given by dP^T_𝒞/dμ, which is also a probability density function for almost all (P_T) t. (Here, P^T_𝒞 is the conditional probability measure P^T on 𝒜^(n), given T, restricted to the sub-σ-algebra 𝒞 of 𝒜^(n) whose sets are cylinders with bases in a fixed but arbitrary coordinate space of X^(n).) Hence, for a particular family of distributions, if it is possible to derive an estimator of P which is absolutely continuous with respect to μ, then an unbiased estimator of p exists and is obtained by differentiating the probability measure estimator with respect to μ.

In chapter four the general theory is specialized to well-known families of distributions on the real line that are either absolutely continuous with respect to ordinary Lebesgue measure or absolutely continuous with respect to counting measure on a sequence of points. In the Lebesgue-continuous case it is shown, for example, that for the normal distribution with unknown mean and variance, if the sample size equals or exceeds three, then the minimum variance unbiased estimator of the normal density function exists. This estimator has some interesting properties when compared with the corresponding maximum likelihood estimator obtained by substituting the maximum likelihood estimators of the parameters in the functional form of the normal density. For example, for any fixed value of the sufficient statistic the estimator is zero outside a finite interval (cf. the maximum likelihood estimator). It is also interesting to note that the normal density function is estimated by a Student's t-distribution,
whereas, the likelihood principle demands that the estimator should also
be a normal distribution.
Examples of minimum variance unbiased estimators of distribution functions which are not absolutely continuous are the family of all Lebesgue-continuous distributions on the real line and the uniform distribution on (0,θ). Estimators for the binomial distribution and a general power series type distribution are obtained to give examples of the procedures for discrete distributions.
Interesting sample size considerations are observed in many of the
examples illustrating the absolutely continuous case.
For example, in
the normal distribution it has already been noted that at least three
observations are required to obtain an unbiased estimator.
The maximum
likelihood procedure, on the other hand, requires a minimum of only two
observations.
It is apparent in several of the examples that the degree
of estimability can be decomposed into one degree for the empirical distribution function and the degree for the index parameter in the functional form of the density.
The degree for the parameter is the smallest
number of observations that are required to have an unbiased estimator
of that parameter.
As in any area of research some problems still remain to be solved. Some of these are:

1. Find conditions that guarantee the existence of Bayes, minimax and invariant estimators of density functions.
2. Find equivalent conditions on 𝒫 that guarantee P^T_𝒞 to be absolutely continuous with respect to μ.

3. Compare, for example, maximum likelihood estimators with
unbiased estimators of density functions, when the latter exist.
4. The likelihood ratio is often used to partition the sample space in the construction of decision rules. In discrimination procedures, for example, it is sometimes necessary to estimate the likelihood ratio and hence the resultant partition of the sample space. A study of the properties of such procedures when unbiased estimators of density functions are used to estimate these partitions would be interesting.
6. LIST OF REFERENCES
Anderson, G. D. 1969. A comparison of methods for estimating a probability density function. Ph.D. thesis submitted to the graduate faculty of the University of Washington.

Bartlett, M. S. 1963. Statistical estimation of density functions. Sankhya 25:245-254.

Barton, D. E. 1961. Unbiased estimation of a set of probabilities. Biometrika 48:227-229.

Basu, A. P. 1964. Estimates of reliability for some distributions useful in life testing. Technometrics 6:215-219.

Basu, D. 1955, 1958. On statistics independent of a complete sufficient statistic. Sankhya 15:377-380; 20:223-226.

Brunk, H. D. 1965. Conditional expectation given a σ-lattice and applications. Ann. Math. Stat. 36:1339-1350.

Cacoullos, T. 1966. Estimation of a multivariate density. Ann. Inst. Stat. Math. 18:178-189.

Cencov, N. N. 1962. Evaluation of an unknown distribution density from observations. Soviet Mathematics 3:1559-1562.

Elkins, A. T. 1968. Cubical and spherical estimation of multivariate probability density. J. Am. Stat. Assoc. 63:1495-1513.

Fix, E., and Hodges, J. L. Jr. 1951. Nonparametric discrimination: consistency properties. Report Number 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, February, 1951.

Folks, J. L., Pierce, D. A., and Stewart, C. 1965. Estimating the fraction of acceptable product. Technometrics 7:43-50.

Kolmogorov, A. N. 1950. Unbiased estimates. Izvestia Akad. Nauk SSSR, Seriya Matematiceskaya 14:303-326. Amer. Math. Soc. Translation No. 98.

Koopman, B. O. 1936. On distributions admitting sufficient statistics. Trans. Amer. Math. Soc. 39:399-409.

Kronmal, R., and Tarter, M. 1968. The estimation of probability densities and cumulatives by Fourier series methods. J. Am. Stat. Assoc. 63:925-952.

Laurent, A. G. 1963. Conditional distribution of order statistics and distribution of the reduced i-th order statistic of the exponential model. Ann. Math. Stat. 34:652-657.

Lehmann, E. L. 1959. Testing Statistical Hypotheses. Wiley, New York.

Loftsgaarden, D. O., and Quesenberry, C. P. 1965. A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36:1049-1051.

Manija, G. M. 1960. The mean square error of an estimator of a normal probability density. Trudy Vycisl. Centra Akad. Nauk Gruzin. S.S.R. 1:75-96.

Murthy, V. K. 1965. Estimation of probability density. Ann. Math. Stat. 36:1027-1031.

Murthy, V. K. 1966. Nonparametric estimation of multivariate densities with applications. Proceedings of an International Symposium on Multivariate Analysis. Academic Press, New York.

Nadaraya, E. A. 1965. On non-parametric estimates of density functions and regression curves. Theory of Probability and Its Applications 10:186-190.

Noak, A. 1950. A class of random variables with discrete distributions. Ann. Math. Stat. 21:127-132.

Parzen, E. 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33:1065-1076.

Pitman, E. J. G. 1936. Sufficient statistics and intrinsic accuracy. Proc. Camb. Philos. Soc. 32:567-579.

Pugh, E. L. 1963. The best estimate of reliability in the exponential case. Opns. Res. 11:57-61.

Robertson, T. 1967. On estimating a density which is measurable with respect to a σ-lattice. Ann. Math. Stat. 38:482-493.

Rosenblatt, M. 1956. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27:832-837.

Roy, J., and Mitra, S. 1957. Unbiased minimum variance estimation in a class of discrete distributions. Sankhya 18:371-378.

Sathe, Y. S., and Varde, S. D. 1969. On minimum variance unbiased estimation of reliability. Ann. Math. Stat. 40:710-714.

Schwartz, S. C. 1967. Estimation of probability density by an orthogonal series. Ann. Math. Stat. 38:1261-1265.

Tate, R. F. 1959. Unbiased estimation: functions of location and scale parameters. Ann. Math. Stat. 30:341-366.

Washio, Y., Morimoto, H., and Ikeda, N. 1956. Unbiased estimation based on sufficient statistics. Bull. Math. Stat. 6:69-94.

Watson, G. S., and Leadbetter, M. R. 1963. On the estimation of the probability density. I. Ann. Math. Stat. 34:480-491.

Wegman, E. J. 1968. A note on estimating a unimodal density. Institute of Statistics Mimeo Series, No. 599.

Whittle, P. 1958. On the smoothing of probability density functions. J. R. Stat. Soc. (B) 20:334-343.

Woodroofe, M. 1967. On the maximum deviation of the sample density. Ann. Math. Stat. 38:475-481.