Maxwell School of Citizenship and Public Affairs
Department of Public Administration and International Affairs
Syracuse University
Fall, 2015
FUNDAMENTALS OF MATHEMATICAL STATISTICS
(Note: This Lecture Note does not correspond to any of the chapters in our textbook. It draws heavily from Introductory Econometrics: A Modern Approach, by Jeffrey Wooldridge.)
I. Introduction
In this Lecture Note, we will go over some fundamental concepts of mathematical statistics that are especially useful for the Econometrics course you will take next semester. We have already studied most of the material covered in this Note; we go over it again here with somewhat more rigor. The purpose of this Lecture Note is to understand the "exact" definitions of some of the terminology often used in Econometrics.
II. Finite Sample Properties of Estimators
The first thing we will learn in this Note is the finite sample properties of estimators. The term "finite sample" comes from the fact that these properties hold for a sample of any size. To put it differently, these properties hold no matter how small or large the sample size is. Sometimes these are called small sample properties, even though the properties hold for large samples as well. The bottom line to keep in mind is that finite sample properties hold for any sample size n.
A. Estimators and Estimates
We will first define what we mean by an estimator in order to study properties of estimators. Suppose we have an SRS of size n, {X1 , X2 , ..., Xn }, that is drawn from a population
distribution that depends on an unknown parameter θ. Note that each Xi is a random variable. An estimator of θ is a “rule” that assigns each possible outcome of the sample a value
of θ. Note that the rule is the same regardless of the data actually obtained. In this course,
we frequently used one estimator; i.e., the sample mean X̄. Let {X1 , X2 , ..., Xn } be an SRS
from a population with mean µ. One estimator of µ we learned is

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .
X̄ is called the sample mean, and it is an estimator of the population mean µ. We learned that X̄ can be viewed as a random variable, as it is a combination of the random variables X1, X2, ..., Xn. Of course, given any outcome of the random variables X1, X2, ..., Xn, we use the same rule mentioned above to estimate µ.
An estimate refers to the actual value we obtain from applying the estimator to the "actual data," x1, x2, ..., xn. We will distinguish the random variables, X1, X2, ..., Xn, from the actual data, x1, x2, ..., xn.
More generally, an estimator W of a parameter θ can be expressed as an abstract mathematical formula:
W = h(X1, X2, ..., Xn),
for some known function h of the random variables X1, X2, ..., Xn. W, of course, is a random variable, and the value of W changes depending on the sample you obtain.
When a particular set of data, x1, x2, ..., xn, is plugged into the function h, we obtain
an estimate of θ, denoted w = h(x1 , x2 , ..., xn ). We use an upper-case letter to denote an
estimator (e.g., X̄), and a lower-case letter for an estimate (e.g., x̄). Note further that an
estimator W is sometimes called a point estimator and w a point estimate to distinguish
these from interval estimators and estimates.
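
To make the distinction between an estimator and an estimate concrete, here is a minimal sketch in Python (added for illustration; it is not part of the original note and assumes the NumPy library and a made-up population with µ = 5). The function sample_mean plays the role of the rule h, and the number it returns for one particular data set is the estimate.

    import numpy as np

    # The estimator is a rule h: it maps any sample (x1, ..., xn) to a single number.
    def sample_mean(sample):
        return sum(sample) / len(sample)

    # Draw one sample of size n = 10 from a hypothetical population with mean mu = 5.
    rng = np.random.default_rng(seed=0)
    data = rng.normal(loc=5.0, scale=2.0, size=10)   # the "actual data" x1, ..., xn

    x_bar = sample_mean(data)   # the estimate: one realized value of the estimator X-bar
    print(x_bar)

A different random sample would give a different number, which is exactly the sense in which the estimator X̄ is a random variable while any particular x̄ is just a number.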
For evaluating estimation procedures, we study various properties of the probability
distribution of the random variable W . The distribution of an estimator is called a sampling
distribution, because this distribution describes the likelihood of various outcomes of W
across different random samples.
There are an unlimited number of rules for combining data to estimate parameters. As you can imagine, we need some desirable criteria for choosing among many estimators, or at least for eliminating some estimators from consideration. The criteria that we use are based on the characteristics of the sampling distribution of an estimator, and this is the focus of mathematical statistics. In this section, we study the following characteristics of an estimator: unbiasedness, efficiency, and the mean squared error.
B. Unbiasedness
In statistics, we focus on a few features of the distribution of W in evaluating it as an
estimator of θ. The first important property of an estimator makes use of its expected value.
The property is unbiasedness.
Definition 1. An estimator W of θ is an unbiased estimator if
E(W ) = θ.
If an estimator is unbiased, then its sampling distribution has an expected value equal to the parameter it is supposed to be estimating. Unbiasedness does not imply that the "estimate" we get with any one particular sample is equal to θ, or even very close to θ. Rather, if we could draw indefinitely many random samples on X from the population, compute an estimate each time, and then average these estimates over all random samples, we would obtain θ. This thought experiment is abstract because, in most applications, we have just "one" random sample to work with.
Given that we have defined unbiasedness, we can define the bias of an estimator:
Definition 2. If W is a biased estimator of θ, its bias is defined as
Bias(W ) ≡ E(W ) − θ.
In Definition 2, "≡" is used to denote equivalence. If the bias of an estimator is zero, i.e., E(W) − θ = 0, then E(W) = θ, which implies that the estimator W is unbiased. Figure 1 shows three estimators. As you can see from the figure, W1 is an unbiased estimator of θ, and W2 is a biased estimator of θ, but W2 has a smaller bias than the biased estimator W3.

Figure 1: An Illustration of Unbiased and Biased Estimators
As you can see from the two definitions above, the unbiasedness of an estimator W and the size of the bias of an estimator W depend on the distribution of X and on the function h, because W = h(X1, X2, ..., Xn). The distribution of X is usually beyond our control; it may
be determined by nature or social forces. But the choice of the rule h is ours, and if we want
an unbiased estimator, then we must choose h accordingly.
Some estimators can be shown to be unbiased quite generally. We learned that X̄, the estimator for a population mean µ, is unbiased because E(X̄) = µ (we proved this in Lecture Note 5). We also learned that the sample variance

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

is an unbiased estimator for the population variance σ² (we did not prove this). And that is why we replaced σ² with S² when conducting a hypothesis test.
Note that for the unbiasedness of the estimator X̄, we did not require any conditions other than random sampling. That is, the estimator X̄ is an unbiased estimator for µ as long as we have a random sample. For the unbiasedness of other estimators, such as the β̂ that you will learn about in the Econometrics course, more conditions beyond random sampling are required.
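
The unbiasedness of X̄ and S² can be illustrated with a small simulation. The sketch below (a Python/NumPy illustration added here, not from the original note, with made-up values µ = 10 and σ² = 4) draws many random samples, computes X̄ and S² for each, and averages the estimates across samples; the averages land close to µ and σ². It also shows that dividing by n instead of n − 1 produces a downward-biased estimator of σ².

    import numpy as np

    rng = np.random.default_rng(seed=1)
    mu, sigma2, n, reps = 10.0, 4.0, 5, 100_000

    xbar = np.empty(reps)
    s2_unbiased = np.empty(reps)   # divides by n - 1
    s2_biased = np.empty(reps)     # divides by n

    for r in range(reps):
        x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=n)
        xbar[r] = x.mean()
        s2_unbiased[r] = x.var(ddof=1)
        s2_biased[r] = x.var(ddof=0)

    print("average of X-bar over samples:", xbar.mean())        # close to mu = 10
    print("average of S^2 over samples:  ", s2_unbiased.mean()) # close to sigma^2 = 4
    print("average of the /n estimator:  ", s2_biased.mean())   # close to 4*(n-1)/n = 3.2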
Unbiasedness as a criterion for choosing an estimator has two weaknesses. First, some reasonable, and even some very good, estimators are not unbiased. We will see an example of this weakness later. Second, some estimators are poor even though they are unbiased. Consider estimating the mean µ of a population. Rather than using the sample mean X̄ to estimate µ, suppose that, after collecting a sample of size n, we discard all of the observations except the first. That is, our estimator of µ is simply
W = X1 . This estimator is unbiased because
E(X1 ) = µ.
As you can imagine, ignoring all but the first observation is not a prudent approach to
estimation; it throws out most of the information in the sample. For example, with n = 100,
we obtain 100 outcomes of the random variables X1, X2, ..., X100, but then we use only the
first of these, X1 , to estimate µ. This leads us to another criterion for choosing an estimator;
i.e., efficiency.
C. Efficiency
Unbiasedness only ensures that the sampling distribution of an estimator has a mean value
equal to the population parameter it is supposed to be estimating. This is quite useful, but
we also need to know how spread out the distribution of an estimator is. An estimator can be centered at θ on average and yet, in any one sample, fall very far from θ with high probability. In Figure 2, two unbiased estimators are shown.
Figure 2: Two Unbiased Estimators with Different Spread
As you can see from the figure, W1 and W2 are both unbiased estimators of θ. But
the sampling distribution of W1 is more tightly centered about θ. This implies that the
probability that W1 is more than any given distance away from θ is less than the probability that W2 is more than that same distance away from θ. To put it differently, using W1 as our estimator, we are more likely to obtain a random sample that yields an estimate very close to θ. Therefore, if we were to choose between these two estimators, we should definitely choose W1 to estimate θ.
How do we know whether W1 is less spread out than W2? We rely on the variance of an estimator. The variance of an estimator is also called the sampling variance because it is the variance associated with a sampling distribution. We already learned one sampling variance, namely the sampling variance of the estimator X̄:

Var(X̄) = σ²/n.
As suggested by Figure 2, “among unbiased estimators,” we prefer the estimator with
the smallest variance. Knowing the sampling variance of estimators allows us to eliminate
certain unbiased estimators from consideration.
For a random sample from a population with mean µ and variance σ², we know that X̄ is an unbiased estimator for µ. We also learned that X1 is an unbiased estimator for µ. Which one should we choose as our estimator? Let's consider the variances of these two unbiased estimators. We know that Var(X̄) = σ²/n, and we also learned that Var(X1) = σ². Now, let's compare these two variances. The difference between Var(X̄) and Var(X1) can be large even for small sample sizes. If n = 5, then Var(X1) is five times as large as Var(X̄). This gives us a formal way of excluding X1 as an estimator for µ.
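
As a rough numerical check of these formulas, the sketch below (Python/NumPy, added for illustration with made-up values µ = 0, σ = 3, n = 5) simulates the sampling distributions of X̄ and X1 and compares their variances; the ratio comes out close to n.

    import numpy as np

    rng = np.random.default_rng(seed=2)
    mu, sigma, n, reps = 0.0, 3.0, 5, 200_000

    samples = rng.normal(loc=mu, scale=sigma, size=(reps, n))
    xbar = samples.mean(axis=1)   # the estimator X-bar applied to each sample
    x1 = samples[:, 0]            # the estimator W = X1 (first observation only)

    print("Var(X-bar):", xbar.var())            # about sigma^2 / n = 9 / 5 = 1.8
    print("Var(X1):   ", x1.var())              # about sigma^2 = 9
    print("ratio:     ", x1.var() / xbar.var()) # about n = 5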
Comparing the variances of estimators is a general approach to comparing different unbiased estimators.
Definition 3. If W1 and W2 are two “unbiased” estimators of θ, W1 is efficient relative to
W2 when V ar(W1 ) ≤ V ar(W2 ).
In the example above, we say that the estimator X̄ is efficient relative to X1 when n > 1.
Note that the criterion “efficiency” can only be used for “unbiased” estimators. If we do
not restrict our attention to unbiased estimators, then comparing variances is meaningless.
One way to compare estimators that are not necessarily unbiased is to compute the mean
squared error (MSE) of the estimators.
Definition 4. If W is an estimator of θ, then the MSE of W is defined as

MSE(W) = E[(W − θ)²].
The MSE measures how far, on average, the estimator is away from θ. Observe that

E[(W − θ)²] = E[W² − 2θW + θ²]
            = E(W²) − 2θE(W) + θ²
            = E(W²) − [E(W)]² + [E(W)]² − 2θE(W) + θ²
            = E(W²) − [E(W)]² + [E(W) − θ]²
            = Var(W) + [Bias(W)]².
Hence, MSE(W) depends on both the variance and the bias. This criterion allows us to compare two
estimators when one or both are biased.
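
The decomposition MSE(W) = Var(W) + [Bias(W)]² can also be checked numerically. The sketch below (Python/NumPy, added for illustration) uses a deliberately biased estimator of µ, namely W = X̄ + 1 (a hypothetical choice made up for this example, with bias equal to 1), and confirms that the simulated MSE equals the simulated variance plus the squared bias.

    import numpy as np

    rng = np.random.default_rng(seed=3)
    mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

    samples = rng.normal(loc=mu, scale=sigma, size=(reps, n))
    w = samples.mean(axis=1) + 1.0        # a deliberately biased estimator of mu

    mse = np.mean((w - mu) ** 2)          # E[(W - theta)^2], approximated by simulation
    var = w.var()                         # Var(W), approximated by simulation
    bias = w.mean() - mu                  # Bias(W), approximated by simulation

    print("MSE:         ", mse)               # about sigma^2/n + 1 = 0.4 + 1 = 1.4
    print("Var + Bias^2:", var + bias ** 2)   # the same number (up to rounding)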
III. Large Sample Properties of Estimators
In the previous section, we learned that the estimator X1 is unbiased, but it is a poor estimator
because its variance can be much larger than that of the sample mean X̄. Note that X1
has the same variance for any sample size. It seems reasonable to require any estimation
procedure to improve as the sample size increases. For estimating a population mean µ, X̄
improves in the sense that its variance gets smaller as n gets larger; X1 does not improve in
this sense.
We can rule out certain silly estimators by studying the large sample or asymptotic
properties of estimators. One of the large sample properties of estimators we learn in this
Note is consistency. This large sample property concerns how far the estimator is likely
to be from the parameter it is supposed to be estimating as we let the sample size increase
indefinitely.
Definition 5. Let Wn be an estimator of θ based on a sample X1, X2, ..., Xn of size n. Then Wn is a consistent estimator of θ if Wn converges in probability to θ as n → ∞; that is, if for every ε > 0, P(|Wn − θ| > ε) → 0 as n → ∞. If Wn is not consistent for θ, then we say it is inconsistent.
Unlike unbiasedness—which is a feature of an estimator for a given sample size—consistency
involves the behavior of the sampling distribution of the estimator as the sample size n gets
large. And that is why this property is called the large sample property. To emphasize this,
we have indexed the estimator Wn by the sample size n in stating this definition.
The intuition behind consistency is clear. It implies that the distribution of Wn becomes more and more concentrated about θ, which roughly means that for larger sample sizes, Wn is less and less likely to be very far from θ. This is illustrated in Figure 3.
Figure 3: The Sampling Distributions of a Consistent Estimator for Four Sample Sizes
Note that if an estimator is not consistent, then it does not help us to learn about θ, even
with an unlimited amount of data. For this reason, consistency is a minimal requirement of
an estimator used in statistics or econometrics.
Also, unbiased estimators are not necessarily consistent, but those whose variances shrink
to zero as the sample size grows are consistent. This can be stated formally: If Wn is an
unbiased estimator of θ and V ar(Wn ) → 0 as n → ∞, then Wn is consistent.
A good example of a consistent estimator is the sample mean X̄. We have already shown that X̄ is unbiased because E(X̄) = µ. Also, Var(X̄) = σ²/n, which in turn implies that
Var(X̄) → 0 as n → ∞. Therefore, X̄ is a consistent estimator for the population mean µ. And that is why we use the estimator X̄ to estimate the population mean µ.
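
The sketch below (Python/NumPy, added for illustration with made-up values µ = 5 and σ = 2) mimics Figure 3: for each of several sample sizes it simulates the sampling distribution of X̄ and reports how often the estimate falls within 0.1 of µ. That frequency rises toward 1 as n grows, which is consistency in action.

    import numpy as np

    rng = np.random.default_rng(seed=4)
    mu, sigma, reps, eps = 5.0, 2.0, 2_000, 0.1

    for n in (5, 50, 500, 5_000):
        xbar = np.array([rng.normal(mu, sigma, size=n).mean() for _ in range(reps)])
        close = np.mean(np.abs(xbar - mu) <= eps)   # approximates P(|X-bar - mu| <= 0.1)
        print(f"n = {n:5d}:  Var(X-bar) ~ {xbar.var():.5f},  P(within 0.1 of mu) ~ {close:.3f}")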
Earlier, in the discussion of unbiasedness, we learned that unbiasedness as a criterion for choosing an estimator has two weaknesses. One of the weaknesses is that some reasonable, and even some very good, estimators are not unbiased. One example of a reasonable as well as good estimator is the sample standard deviation Sn. We learned that the sample variance Sn² is an "unbiased" estimator for the population variance σ². Note, however, that the sample standard deviation Sn is not unbiased. Nevertheless, Sn is a consistent estimator for the population standard deviation, and that is why we use the sample standard deviation Sn to estimate the population standard deviation σ.

As the sample standard deviation case shows, therefore, if we used the "unbiasedness" criterion alone to choose an estimator, we would end up not using Sn to estimate σ, which is not desirable, because Sn has a very good property, namely consistency.
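
A small simulation illustrates both claims about Sn (again a Python/NumPy sketch added for illustration, using a made-up normal population with σ = 2): for small n the average of Sn over many samples falls noticeably below σ, showing the bias, yet as n grows the distribution of Sn collapses onto σ, showing consistency.

    import numpy as np

    rng = np.random.default_rng(seed=5)
    sigma, reps = 2.0, 20_000

    for n in (3, 10, 100, 1_000):
        s = np.array([rng.normal(0.0, sigma, size=n).std(ddof=1) for _ in range(reps)])
        # s.mean() approximates E(S_n); s.std() measures how spread out S_n is.
        print(f"n = {n:5d}:  average S_n ~ {s.mean():.4f}  (sigma = {sigma}),  spread of S_n ~ {s.std():.4f}")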