Unit D
Properties of point estimators
Introduction
This Unit establishes some of the properties a good point
estimator should have and evaluates which of these properties
(if any) are guaranteed by the general methods introduced in
Unit C, with particular attention to the ML estimators.
Equivariance Property
• In Example B.2, suppose that the quantity of interest that we wish to infer is not λ, but the mean waiting time per call, denoted by ψ = E(Xi) = 1/λ.
• To obtain the ML estimate of the true value of ψ, we should write the likelihood function for ψ,
L(ψ) = ∏_{i=1}^n (1/ψ) e^{−xiobs/ψ} = ψ^{−n} e^{−(1/ψ) Σ_{i=1}^n xiobs},
and maximize it through the standard procedure. It can be easily verified that the ML estimate is ψ̂obs = x̄obs.
• Recall from Example B.2 that the ML estimate for λ is λ̂obs = 1/x̄obs. Thus, ψ̂obs = 1/λ̂obs, just like ψ = 1/λ.
• Note that ψ̂obs is obtained from λ̂obs through the same transformation that links ψ to λ.
• This is no coincidence, but results from a general property
of ML estimation known as the equivariance property
(many authors call it the invariance property).
In general, the equivariance property ensures that
if a parameter is transformed, the corresponding estimate and
estimator are changed by the same transformation.
More formally, a method of estimation is said to be equivariant if it is such that
if T is the estimator for θ, producing the estimate tobs, then, for any one-to-one transformation ψ(θ), the corresponding estimator and estimate for ψ(θ) are ψ(T) and ψ(tobs), respectively.
The MM and the ML satisfy the equivariance property:
ψ̃ = ψ(θ̃) and ψ̃obs = ψ(θ̃obs);   ψ̂ = ψ(θ̂) and ψ̂obs = ψ(θ̂obs).
(Proof not given; refer to text books)
• In Example B.2, the MM and the ML estimators of the true value of ψ = E(Xi), by the equivariance property, are, respectively,
ψ̃ = 1/λ̃ = X̄,   ψ̂ = 1/λ̂ = X̄.
(In both cases we estimate the mean waiting time with the corresponding sample mean; a numerical check of this equivariance is sketched after these examples.)
• In Example B.3, suppose we are interested in estimating, not the population variance σ² of the IQ scores as before, but the population standard deviation σ = √Var(Xi) = √σ².
As σ > 0, σ = √σ² is a one-to-one transformation of σ².
We know that the ML estimate for σ² is σ̂²obs = S²obs and it can be shown that the MM estimate is also σ̃²obs = S²obs (prove it as a homework). By the equivariance property of the two methods, we then have
σ̃obs = √(σ̃²obs) = Sobs,   σ̂obs = √(σ̂²obs) = Sobs.
(In both cases, the population standard deviation is estimated with the sample standard deviation.)
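The equivariance property can also be checked numerically. Below is a minimal Python sketch for the setting of Example B.2; the data are simulated (the original sample of the example is not reproduced here), so the sample size of 150 and the true mean waiting time of 1.5 are purely illustrative. Maximizing the likelihood written directly in terms of ψ returns the same value as transforming the ML estimate of λ.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.5, size=150)      # illustrative waiting times (true mean 1.5)

lam_hat = 1 / x.mean()                        # ML estimate of the rate lambda

# Negative log-likelihood written directly in terms of psi = 1/lambda (mean waiting time)
def neg_loglik_psi(psi):
    return len(x) * np.log(psi) + x.sum() / psi

psi_hat = minimize_scalar(neg_loglik_psi, bounds=(1e-6, 100), method="bounded").x

print(psi_hat, x.mean(), 1 / lam_hat)         # the three values coincide (up to tolerance)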
Principle of repeated sampling
The fundamental issue is that we are dealing with estimates, that is, with approximations of the true value, so what we would like to know is:
how good is the estimate? How large is the error of the approximation?
We will answer these questions by recalling that an estimate is
a realization of a random variable (or vector), the estimator,
which has its own probability distribution called the sampling
distribution. The properties of the estimator can then be
studied by analyzing the features of its sampling distribution,
such as the expected value, the variance and so on.
This is the principle of repeated sampling, by which we evaluate the properties of an estimator by thinking of a hypothetical replication of the experiment: as if we extracted a new sample of size n from the population many, many times and computed a new estimate each time. We then evaluate the behaviour of all the estimates obtained, which are realizations of the same random variable (or vector), the estimator. In most situations, such replication of the experiment is not actually carried out, but we reason as if it took place.
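A hypothetical replication of this kind is easy to mimic on a computer. The Python sketch below assumes an exponential model with an illustrative true rate λ0 = 2 and sample size n = 50 (both made up for the illustration), and approximates the sampling distribution of the estimator 1/X̄ by drawing many samples and recomputing the estimate each time.

import numpy as np

rng = np.random.default_rng(0)
lam0, n, n_rep = 2.0, 50, 10_000        # illustrative true rate, sample size, replications

# "Repeated sampling": draw n_rep independent samples and recompute the estimate each time
samples = rng.exponential(scale=1 / lam0, size=(n_rep, n))
estimates = 1 / samples.mean(axis=1)    # realizations of the estimator 1/X-bar

# Features of the (approximate) sampling distribution of the estimator
print(estimates.mean(), estimates.var(), np.quantile(estimates, [0.025, 0.975]))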
Notation
• As in the previous Units, we will denote the observed sample by xobs = (x1obs, . . . , xnobs) and the generating random vector by X = (X1, . . . , Xn), whose j.d.f. (or j.p.f.) belongs to the parametric family f(x; θ), θ ∈ Θ.
• For clarity, we will denote the true and unknown value of θ by θ0, so that f(x; θ0) is the true j.d.f. (or j.p.f.) of X.
• We will also denote by T = T(X) a generic estimator for θ, and the corresponding estimate by tobs = T(xobs).
• In the following, we will assume that θ is unidimensional, but all the results presented below can be extended to the multi-parameter case. When appropriate, this extension will be specified.
Mean Squared Error
• Though θ0 is unknown, ideally, we would like the probability distribution of T − θ0 to be concentrated around 0. In fact, this would ensure that for any sample extracted, and hence for any sample data xobs, the estimate tobs is close to the true value θ0.
• For example, in the very unrealistic case T = θ0 with probability 1, we would have that for any xobs, tobs = θ0.
• A quantity which is used to measure the concentration of the distribution of T − θ0 around 0 is the Mean Squared Error (MSE), which is defined as
MSE(T) = E[(T − θ0)²],
where the expected value is computed with respect to the true j.d.f. (or j.p.f.) f(x; θ0).
• As (T − θ0)² can be interpreted as the squared distance of the estimator from the true value of θ, the MSE is the average squared distance from θ0.
• A small value of MSE implies that for any sample, with high probability, the estimate tobs is close to the true value θ0.
• The MSE of T can be decomposed as follows:
MSE(T) = E[(T − θ0)²] = Var(T) + [E(T) − θ0]².
Proof:
E[(T − θ0)²] = E(T² + θ0² − 2θ0T) = E(T²) + θ0² − 2θ0E(T) =
= E(T²) − [E(T)]² + [E(T)]² + θ0² − 2θ0E(T) =
= Var(T) + [E(T) − θ0]².
• The term
E(T) − θ0
is called the bias of the estimator T. It measures the distance of the location of the distribution of T from θ0.
• Hence, the MSE of the estimator T can be written as
MSE(T) = [Bias(T)]² + Var(T),
which shows that the MSE has two components:
1. the bias of the estimator, which is related to the location of the distribution of T,
2. the variability of T.
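The decomposition MSE(T) = [Bias(T)]² + Var(T) can be verified by simulation. The sketch below uses the estimator T = 1/X̄ of an exponential rate, with illustrative values λ0 = 2 and n = 30 (not tied to any of the examples); the Monte Carlo estimates of the two sides agree up to simulation error.

import numpy as np

rng = np.random.default_rng(1)
lam0, n, n_rep = 2.0, 30, 100_000              # illustrative values

# Monte Carlo approximation of the sampling distribution of T = 1/X-bar
samples = rng.exponential(scale=1 / lam0, size=(n_rep, n))
T = 1 / samples.mean(axis=1)

mse      = np.mean((T - lam0) ** 2)            # E[(T - theta0)^2]
bias_sq  = (T.mean() - lam0) ** 2              # [E(T) - theta0]^2
variance = T.var()                             # Var(T)

print(mse, bias_sq + variance)                 # agree up to Monte Carlo error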
Bias
• An estimator whose bias is 0 is said to be unbiased. In other words, T is an unbiased estimator of θ0 if
E(T) = θ0,
where, as before, the expectation is computed with respect to f(x; θ0), i.e. the distribution of T is centred at θ0.
• By the repeated sampling principle, unbiasedness means that on average we estimate θ0 correctly: if we repeat the sampling procedure many times, for each sample we compute tobs and then we average across all estimates, we obtain exactly θ0.
• If θ has more than one component, the property of unbiasedness requires that for each element of θ the corresponding estimator is unbiased.
In general, neither the MM nor the ML guarantees the property of unbiasedness.
This means that the MM and the ML estimators might or might not be unbiased and the property, if required, has to be verified on a case-by-case basis.
1. In Example B.1, MM and ML lead to the same estimator of the true value of p, p0:
p̃ = p̂ = X/n.
This is an unbiased estimator of p0, since
E(X/n) = E(X)/n = np0/n = p0.
2. In Example B.2, MM and ML lead to the same estimator of the true value of λ, λ0:
λ̃ = λ̂ = 1/X̄.
This is a biased estimator of λ0, since
E(X̄) = E((X1 + . . . + Xn)/n) = (E(X1) + . . . + E(Xn))/n = (n/λ0)/n = 1/λ0,
but
E(1/X̄) ≠ 1/E(X̄) = λ0.
3. In Example B.3, θ has two components: μ and σ². For each component, MM and ML produce the same estimator of the true value, μ0 and σ0², respectively:
μ̃ = μ̂ = X̄,   σ̃² = σ̂² = S².
To verify unbiasedness we have to assess whether each estimator is unbiased.
• X̄ is an unbiased estimator of μ0.
Proof:
E(X̄) = E((X1 + . . . + Xn)/n) = (E(X1) + . . . + E(Xn))/n = μ0.
• S² is a biased estimator of σ0².
Proof:
E(S²) = E((1/n) Σ_{i=1}^n Xi² − X̄²) = (1/n) Σ_{i=1}^n E(Xi²) − E(X̄²) =
= (σ0² + μ0²) − {Var(X̄) + [E(X̄)]²} = σ0² + μ0² − (σ0²/n + μ0²) = ((n − 1)/n) σ0².
It is easy to correct S² to derive an unbiased estimator of σ0². It follows from the previous steps that
S′² = (n/(n − 1)) S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²
is such that E(S′²) = σ0².
For any simple random sample, the sample mean X̄ and the "corrected" sample variance S′² are unbiased estimators of the true population mean and variance, respectively.
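The bias of S² and the unbiasedness of S′² can also be seen by simulation. A small Python sketch, with illustrative IQ-like values μ0 = 100, σ0 = 15 and n = 10 (not the actual data of Example B.3):

import numpy as np

rng = np.random.default_rng(2)
mu0, sigma0, n, n_rep = 100.0, 15.0, 10, 100_000   # illustrative values

x = rng.normal(mu0, sigma0, size=(n_rep, n))
s2      = x.var(axis=1)             # S^2  = (1/n)     * sum (Xi - Xbar)^2
s2_corr = x.var(axis=1, ddof=1)     # S'^2 = (1/(n-1)) * sum (Xi - Xbar)^2

print(s2.mean(), (n - 1) / n * sigma0**2)   # close to ((n-1)/n) * sigma0^2: biased
print(s2_corr.mean(), sigma0**2)            # close to sigma0^2: unbiased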
Expected Fisher Information
For an unbiased estimator T of θ0 the MSE simplifies into
MSE(T) = Var(T).
So the MSE will be smaller the less variable the estimator T is.
To study Var(T) we need to introduce a new quantity: the Fisher Information.
• Recall that the score function is
u(θ) = (d/dθ) l(θ) = (d/dθ) log f(xobs; θ).
So, u(θ) is also a function of xobs and as such it can be seen as a realization of a r.v. that we will denote by u(θ; X) to underline the dependence on the random vector X.
• The Expected Fisher Information, denoted by I(θ), is (almost always) given by
I(θ) = −E((d/dθ) u(θ; X)),
where the expectation is computed with respect to the generic j.d.f. (or j.p.f.) of X, f(x; θ).
• To give an example of expected information, recall that in Example B.1,
u(p) = xobs/p − (n − xobs)/(1 − p),
where xobs is a realization of X ∼ Bin(n = 20, p), so that the corresponding r.v. is
u(p; X) = X/p − (n − X)/(1 − p) = (X − np)/(p(1 − p)).
The expected information is given by
I(p) = −E((d/dp) u(p; X)) = E(X/p² + (n − X)/(1 − p)²) =
= E[X(1 − p)² + p²(n − X)]/{p(1 − p)}² = E[X − 2pX + np²]/{p(1 − p)}² =
= np(1 − p)/{p(1 − p)}² = n/(p(1 − p)).
• Why is I(θ) called the expected information? The score function is the first derivative of the log-likelihood, so its derivative is the second derivative of the log-likelihood and I(θ) is the expected curvature of the log-likelihood (the change of sign does not alter this interpretation). Greater values of I(θ) imply a greater curvature of the log-likelihood, hence greater information about the true value of θ.
• Note that well-behaved log-likelihoods are concave and the change of sign ensures a positive Fisher information.
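The closed form I(p) = n/(p(1 − p)) can be checked by Monte Carlo. The sketch below uses illustrative values n = 20 and p0 = 0.3; it also computes Var(u(p0; X)), which under the usual regularity conditions equals I(p0) as well (the so-called information identity, not discussed above).

import numpy as np

rng = np.random.default_rng(3)
n, p0, n_rep = 20, 0.3, 200_000                   # illustrative values

X = rng.binomial(n, p0, size=n_rep)

# Score and its derivative for the Bin(n, p) log-likelihood, evaluated at p0
score       = X / p0 - (n - X) / (1 - p0)
score_deriv = -X / p0**2 - (n - X) / (1 - p0)**2

print(np.mean(-score_deriv))      # Monte Carlo estimate of I(p0) = -E[du/dp]
print(np.var(score))              # also approximates I(p0) (information identity)
print(n / (p0 * (1 - p0)))        # closed form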
The Cramer-Rao inequality and efficiency
The Cramer-Rao inequality states that if T = T(X) is an unbiased estimator of the true value of θ, θ0, then
Var(T) ≥ 1/I(θ0),
where Var(T) is evaluated with respect to the true j.d.f. (or j.p.f.) f(x; θ0) of X.
Proof: We do not prove this result, but for anyone interested
the proof can be found, for example, in Azzalini.
• The Cramer-Rao inequality gives a lower bound for the MSE
of an unbiased estimator.
• An unbiased estimator whose variance is equal to 1/I(θ0) is called an efficient estimator.
• In Example B.1, we had
p̃ = p̂ = X/n,
where X ∼ Bin(n = 20, p), which are unbiased estimators of the true value of p, p0. We want to verify whether they are efficient estimators of p0, i.e. whether
Var(p̃) = Var(p̂) = 1/I(p0).
We know from previous computations that
I(p) = n/(p(1 − p)),   so that   I(p0) = n/(p0(1 − p0)).
In addition,
Var(p̃) = Var(p̂) = Var(X)/n² = np0(1 − p0)/n² = p0(1 − p0)/n.
Hence,
Var(p̃) = Var(p̂) = 1/I(p0),
which proves that the Cramer-Rao lower bound is reached and that p̃ and p̂ are efficient estimators of p0.
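The same conclusion can be reached by simulation: the variance of p̂ = X/n over repeated samples matches the Cramer-Rao bound 1/I(p0). A minimal sketch with illustrative values n = 20 and p0 = 0.2:

import numpy as np

rng = np.random.default_rng(4)
n, p0, n_rep = 20, 0.2, 500_000                 # illustrative values

p_hat = rng.binomial(n, p0, size=n_rep) / n     # repeated-sampling copies of p-hat = X/n

print(p_hat.var())                              # simulated Var(p-hat)
print(p0 * (1 - p0) / n)                        # Cramer-Rao bound 1/I(p0)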
Asymptotic properties
• Except in very simple estimation problems, finding "optimal" estimators, i.e. unbiased and efficient ones, is an impossible task, sometimes because they are not easy to identify, but mostly because such estimators do not exist.
• Statistical theory overcomes this obstacle by looking at the asymptotic properties of the estimators, that is, at the behaviour of the estimators when the sample size becomes infinite.
• This is not a merely theoretical exercise, as the underlying idea is that these asymptotic properties will be at least approximately satisfied in large samples.
• In Example B.1, we have seen that
MSE(p̃) = MSE(p̂) = Var(p̃) = Var(p̂) = p0(1 − p0)/n.
So, as n → ∞, the MSE converges to 0, which implies that the distributions of p̂ and p̃ become more and more concentrated around the true value p0.
• This is what we expect, not just from efficient estimators, but from any "reasonable" estimator. That is, we expect that as the sample size n increases (and the sample information becomes greater and greater), the estimator behaves better and better: its sampling distribution becomes more concentrated around the true parameter value and the MSE converges to 0.
• In the following pages we will review the asymptotic
properties most commonly used to judge the estimator
behaviour in large samples.
Asymptotic unbiasedness and asymptotic efficiency
Asymptotic unbiasedness
An estimator T = T(X) = T(X1, X2, . . . , Xn) is said to be asymptotically unbiased if
lim_{n→∞} E(T) = θ0,
where the expectation is computed with respect to the true j.d.f. (or j.p.f.) f(x; θ0).
Asymptotic efficiency
Extending the Cramer-Rao inequality to infinite samples, we say that an asymptotically unbiased estimator T is asymptotically efficient if
lim_{n→∞} Var(T) I(θ0) = 1,
where the variance is computed with respect to the true j.d.f. (or j.p.f.) f(x; θ0).
Consistency
• The consistency property of an estimator requires that the estimator "converges" to the true parameter value θ0 as the sample size n becomes infinite. It is such a fundamental property that inconsistent estimators are generally not even taken into account.
• As we are dealing with estimators, i.e. with r.v.'s, we need a specific definition of limit which slightly differs from the one used for deterministic sequences, though the basic idea remains the same.
• An estimator T = T(X) = T(X1, X2, . . . , Xn) is a consistent estimator of the true parameter value θ0 if, for every ε > 0,
lim_{n→∞} P(|T(X1, . . . , Xn) − θ0| < ε) = 1,
where the probability is computed with respect to the true j.d.f. (or j.p.f.) f(x; θ0).
• Roughly speaking, this means that as the sample size becomes larger and larger, the estimator will become arbitrarily close to the true parameter value with high probability.
• A sufficient (but not necessary) condition for an estimator to be consistent is that
1. E(T) → θ0, as n → ∞ (i.e. T is asymptotically unbiased), and
2. Var(T) → 0, as n → ∞.
• In Example B.1, we have
E(p̃) = E(p̂) = p0,
and
Var(p̃) = Var(p̂) = p0(1 − p0)/n → 0,   as n → ∞.
So, they are consistent estimators of p0.
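Consistency can be visualized by letting n grow. The sketch below, with illustrative values p0 = 0.2 and tolerance ε = 0.05, shows P(|p̂ − p0| < ε) climbing towards 1 as n increases.

import numpy as np

rng = np.random.default_rng(5)
p0, eps, n_rep = 0.2, 0.05, 20_000               # illustrative true value and tolerance

for n in (20, 100, 1_000, 10_000):
    p_hat = rng.binomial(n, p0, size=n_rep) / n
    print(n, np.mean(np.abs(p_hat - p0) < eps))  # estimate of P(|p-hat - p0| < eps)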
Asymptotic behaviour of ML estimators
• We now concentrate on the asymptotic properties of ML estimators, which largely explain the dominant role that ML has in point estimation.
• We will limit attention to simple random samples, but the results below can be extended to some non-i.i.d. cases.
• If X = (X1, . . . , Xn) has i.i.d. components, under regularity conditions, the ML estimator for θ, θ̂, has the following asymptotic properties.
1. Property 1: θ̂ is a consistent estimator of the true parameter value θ0.
2. Property 2: θ̂ is such that
lim_{n→∞} P( (θ̂ − θ0)/√(I⁻¹(θ0)) ≤ z ) = Φ(z),
where Φ(z) denotes the distribution function of N(0, 1). This means that the distribution function of (θ̂ − θ0)/√(I⁻¹(θ0)) converges to that of a N(0, 1).
Proof: We are not proving Properties 1 and 2. Anyone interested can find the proofs, for example, in Azzalini (pages 80 and 82).
Asymptotic Normality of ML estimators and
its implications
• The practical implication of Property 2 is that, for n finite, but large,
(θ̂ − θ0)/√(I⁻¹(θ0)) ∼a N(0, 1),
or, equivalently,
θ̂ ∼a N(θ0, I⁻¹(θ0)),
where ∼a stands for "approximately distributed as".
• From Property 2, we also see that
E(θ̂) → θ0,   as n → ∞
(i.e. θ̂ is asymptotically unbiased) and
Var(θ̂) I(θ0) → 1
(i.e. θ̂ is asymptotically efficient).
• For all of the previous properties, the ML estimators are said to be best asymptotically normal.
• For simple random samples, under regularity conditions, the MM estimators are
1. consistent,
2. asymptotically normal,
3. asymptotically unbiased,
4. but in general they are not asymptotically efficient.
• This explains the preference for ML over MM.
Examples of application of the asymptotic
properties of ML estimators
1. Example B.2
• We know that, for n large enough,
λ̂ ∼a N(λ0, I⁻¹(λ0)).
• We need to derive I(λ). We have
u(λ) = n/λ − Σ_{i=1}^n xiobs,
which is generated from the r.v.
u(λ; X) = n/λ − Σ_{i=1}^n Xi.
Then,
I(λ) = −E((d/dλ) u(λ; X)) = −E(−n/λ²) = n/λ².
• So,
λ̂ ∼a N(λ0, λ0²/n).
• The figure below shows the d.f. of λ̂ for different values of n (n = 5, 15, 30, 100) with λ0 = 1.
[Figure: densities f(λ̂) of λ̂ for n = 5, 15, 30, 100, plotted over λ̂ from 0 to 3.]
What we can see is that, as n increases,
– the density of λ̂ resembles more and more that of a Normal r.v.;
– the density of λ̂ becomes more and more concentrated around λ0 as a consequence of the fact that Var(λ̂) → 0.
• Recall that in Example B.2 the number of observations is 150. From the figure above, we can deduce that the Normal distribution provides a good approximation of the true distribution of λ̂ in this case study. In other words, n = 150 seems large enough to allow the use of asymptotic approximations.
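A figure of this kind can be reproduced by simulation. A minimal Python sketch (λ0 = 1 as above; plotting code omitted): for each n it compares the simulated mean and standard deviation of λ̂ with the asymptotic values λ0 and λ0/√n.

import numpy as np

rng = np.random.default_rng(6)
lam0, n_rep = 1.0, 50_000                       # lambda0 = 1, as in the figure

for n in (5, 15, 30, 100):
    lam_hat = 1 / rng.exponential(scale=1 / lam0, size=(n_rep, n)).mean(axis=1)
    # Compare the simulated sampling distribution with the asymptotic N(lam0, lam0^2/n)
    print(n, lam_hat.mean(), lam_hat.std(), lam0, lam0 / np.sqrt(n))
    # A histogram of lam_hat against the N(lam0, lam0^2/n) density reproduces the figure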
2. Example B.1
• The ML estimator for the settings of Example B.1 is p̂ = X/n. We have already shown that this is an unbiased and efficient estimator of p0. If it is unbiased and efficient, it must also be asymptotically unbiased and efficient, but the latter properties come also from the general results on ML estimators, together with the consistency property.
• The general results for ML estimators also ensure that, in large samples, the distribution of p̂ can be approximated by
p̂ ∼a N(p0, I⁻¹(p0)),
that is
p̂ ∼a N(p0, p0(1 − p0)/n).
• The figure below shows the p.f. of p̂ for different values of n (n = 5, 20, 50, 100) and p0 = 0.2.
[Figure: four panels showing the p.f. f(p̂) of p̂ for n = 5, 20, 50, 100, plotted over p̂ from 0 to 1.]
• Note that, as X is discrete, p̂ is also discrete, but as n
increases, p̂ resembles more and more a continuous r.v.
(the Normal r.v.).
• Recall that in Example B.1 we have n = 20. From the
figure above, for n = 20 the distribution of p̂ shows
some departure from Gaussianity, so there is some doubt
on whether the Normal distribution is giving a good
approximation of the distribution of p̂ for this case study.
In other words, n = 20 might not be large enough to
allow the use of asymptotic approximations.
General remarks
• The issue that could be raised is how large n has to be for the asymptotic Normal distribution to be a good approximation of the true distribution of θ̂. There is unfortunately no unique answer, as it depends on the shape of the distribution of each Xi and on the true parameter value θ0 which is, of course, unknown.
• In both the examples above (and others) MM and ML lead to
the same estimator. This is generally not the case, especially
in more complicated estimation problems.
• In Example C.1, where f1(x; θ) > 0 for x > θ, i.e. the Xi's take values in a region that depends on θ, the regularity conditions required by the asymptotic results on ML estimators do not hold. In this context we cannot rely on any of the asymptotic properties seen above and the behaviour of the estimator θ̂ = min(X1, . . . , Xn) has to be studied directly.
• Examples B.3 and B.4 fall outside the scope of this Unit,
being multi-parameter problems. However, asymptotic
properties of ML estimators similar to those of the single
parameter case hold for higher dimensional problems with
some adjustments.
Observed Information and Standard Errors
• From
θ̂ ∼a N(θ0, I⁻¹(θ0)),
we derive that for large samples
Var(θ̂) =a I⁻¹(θ0),
where =a stands for "approximately equal to". This implies that if n is large we do not need to compute the variance of θ̂ directly, but we can more simply use I⁻¹(θ0).
• The problem with the use of I⁻¹(θ0) as a measure of variability of the ML estimator is that I⁻¹(θ0) is unknown, since θ0 is unknown.
• One solution is to replace the expected information evaluated at θ0 with the so-called observed information, defined as
I(θ̂obs) = −(d/dθ) u(θ) |θ=θ̂obs.
• This solution "works" because it can be shown that I(θ̂obs) is a consistent estimate of I(θ0) (not proven).
• We then estimate the variance of θ̂ in large samples by I⁻¹(θ̂obs).
• The corresponding estimate of the standard deviation of θ̂ is √(I⁻¹(θ̂obs)). This is called the standard error of θ̂ and denoted by s.e.(θ̂).
Examples of standard errors
1. In Example B.2, a measure of the variability of λ̂ can be computed as follows.
• We know that
u(λ) = n/λ − Σ_{i=1}^n xiobs,
so that
(d/dλ) u(λ) = −n/λ²,
from which we can compute the observed information
I(λ̂obs) = n/λ̂²obs.
• Recalling that λ̂obs = 1/0.68 and n = 150,
s.e.(λ̂) = √(λ̂²obs/n) = √(1/(0.68² × 150)) ≈ 0.12.
2. In Example B.1, a measure of the variability of p̂ can be computed as follows.
• We know that
u(p) = xobs/p − (n − xobs)/(1 − p),
so that
(d/dp) u(p) = −xobs/p² − (n − xobs)/(1 − p)²,
from which we can compute the observed information
I(p̂obs) = xobs/p̂²obs + (n − xobs)/(1 − p̂obs)².
• Recalling that xobs = 4, n = 20 and p̂obs = 4/20 = 0.2, we have
s.e.(p̂) = √( 1 / ( 4/0.2² + (20 − 4)/(1 − 0.2)² ) ) = 0.089.
• There might be some doubt about the quality of 0.089 as a measure of variability of p̂, since, as noticed before, n = 20 might not be large enough to ensure good approximations.
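Both standard errors follow directly from the observed-information formulas above; a short Python sketch that reproduces them:

import numpy as np

# Example B.2 (exponential): I(lam_hat_obs) = n / lam_hat_obs^2
n_exp, lam_hat = 150, 1 / 0.68
se_lam = np.sqrt(lam_hat**2 / n_exp)
print(se_lam)                                   # about 0.12

# Example B.1 (binomial): I(p_hat_obs) = x/p_hat^2 + (n - x)/(1 - p_hat)^2
x, n_bin = 4, 20
p_hat = x / n_bin
obs_info = x / p_hat**2 + (n_bin - x) / (1 - p_hat)**2
se_p = np.sqrt(1 / obs_info)
print(se_p)                                     # about 0.089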
Approximate confidence intervals based on
the asymptotic distribution of ML estimators
• Though interval estimation is outside the scope of this
course, the asymptotic properties of ML estimators have an
important implication for building confidence intervals that
is worth mentioning.
• Recall that a confidence interval of level 1 − α (for α fixed) for an unknown parameter θ is given by the interval (T1, T2) where T1 and T2 are two statistics such that
T1 = T1(X1, . . . , Xn),   T2 = T2(X1, . . . , Xn)
and
P(T1 ≤ θ0 ≤ T2) = 1 − α,
with θ0 being, as before, the true and unknown value of θ.
• If we let
T1 = θ̂ − z_{1−α/2} √(I⁻¹(θ0))   and   T2 = θ̂ + z_{1−α/2} √(I⁻¹(θ0)),
where z_{1−α/2} is the 1 − α/2 quantile of N(0, 1), the interval (T1, T2) is a confidence interval of approximate level 1 − α for large samples.
• This follows from
θ̂ ∼a N(θ0, I⁻¹(θ0)),
which is equivalent to
(θ̂ − θ0)/√(I⁻¹(θ0)) ∼a N(0, 1),
so that, for large samples,
P( −z_{1−α/2} ≤ (θ̂ − θ0)/√(I⁻¹(θ0)) ≤ z_{1−α/2} ) =a 1 − α.
Thus,
P(T1 ≤ θ0 ≤ T2) =a 1 − α.
• (T1, T2) as defined above is a random interval, meaning that T1 and T2 are r.v.'s. The corresponding observed interval that is built on a specific sample xobs will be (t1obs, t2obs), where
t1obs = θ̂obs − z_{1−α/2} √(I⁻¹(θ0))   and   t2obs = θ̂obs + z_{1−α/2} √(I⁻¹(θ0)).
• The problem with the confidence interval specified above is that t1obs and t2obs depend on I(θ0), which is unknown.
• Just like when we derived a measure of variability for θ̂, the solution is to replace I(θ0) with the observed information I(θ̂obs). Under this choice, the numeric confidence interval of approximate level 1 − α becomes
( θ̂obs − z_{1−α/2} × s.e.(θ̂);   θ̂obs + z_{1−α/2} × s.e.(θ̂) ).
Examples of construction of approximate
confidence intervals
1. Example B.2
• Suppose we are interested in building a 95% confidence interval for λ.
• We have seen that s.e.(λ̂) ≈ 0.12.
• The confidence interval is then
(1/0.68 − 1.96 × 0.12; 1/0.68 + 1.96 × 0.12) = (1.24; 1.71),
where 1.96 corresponds to z0.975.
• As n = 150 seemed to ensure a good Normal approximation of the true distribution of λ̂, the actual confidence level of this interval will be very close to the nominal 0.95 level.
2. Example B.1
• Suppose we are interested in building a 90% confidence interval for p.
• We have seen that s.e.(p̂) = 0.089.
• The confidence interval is then
(0.2 − 1.645 × 0.089; 0.2 + 1.645 × 0.089) = (0.054; 0.346),
where 1.645 corresponds to z0.95.
• As n = 20 might not be large enough for a good Normal approximation of the true distribution of p̂, the actual confidence level of this interval might not be very close to the nominal 0.9 level.
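Both intervals come from the same recipe, θ̂obs ± z_{1−α/2} × s.e.(θ̂). A minimal Python sketch (the function name wald_ci is mine, not from the notes):

from scipy.stats import norm

def wald_ci(theta_hat, se, level):
    """Approximate confidence interval based on the asymptotic normality of the ML estimator."""
    z = norm.ppf(1 - (1 - level) / 2)           # quantile z_{1 - alpha/2}
    return theta_hat - z * se, theta_hat + z * se

print(wald_ci(1 / 0.68, 0.12, 0.95))            # Example B.2: roughly (1.24, 1.71)
print(wald_ci(4 / 20, 0.089, 0.90))             # Example B.1: roughly (0.054, 0.346)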
Exercises
Exercise D.1 Use the equivariance property of ML to answer the following
questions.
1. From the answer to question 3 of Exercise C.3 construct the ML
estimator of the probability that no one uses the cash point in a
single day.
2. From the answer to question 4 of Exercise C.3 compute the ML
estimate of the probability that no one uses the cash point in a single
day.
Exercise D.2 Use the answer to part 3 of Exercise C.3 to solve the following problems.
1. Show that λ̂ is an unbiased estimator of the true value of λ.
2. Compute the MSE of λ̂.
3. Show that λ̂ is an efficient estimator of the true value of λ.
4. Write the approximate distribution of λ̂ in large samples.
Exercise D.3 Use the answer to part 4 of Exercise C.3 and the data specified in part 3 of Exercise B.2 to solve the following problems.
1. Compute the standard error of λ̂.
2. Build an approximate 95% confidence interval for the true value of λ.
Exercise D.4 Use the answer to part 3 of Exercise C.2 to solve the following problems. Let θ denote the mean number of trials till failure, i.e. θ = E(Xi) = 1/p.
1. Using the equivariance property, derive the ML estimator for θ from the ML estimator of p.
2. Show that the ML estimator for θ is unbiased.
3. Compute the MSE of the ML estimator for θ.
4. Show that the ML estimator for θ is efficient.
Exercise D.5 Use the answer to part 4 of Exercise C.2 and the data
specified in part 3 of Exercise B.1 to solve the following problems.
1. Compute the standard error of the ML estimator for p.
2. Build an approximate 90% confidence interval for the true value of
p.
Exercise D.6 Using the solution of part 3 of Exercise C.4 and fixing σ² = 60,
1. show that the ML estimator of the true value of μ is unbiased;
2. compute the MSE of the ML estimator of the true value of μ;
3. show that the ML estimator of the true value of μ is efficient.
Using the solution of part 4 of Exercise C.4,
4. compute the standard error of the ML estimator for μ;
5. construct an approximate 95% confidence interval for the true value of μ.