References
Chapter 8, Sections 1, 2, and 4: Morris H. DeGroot and Mark J. Schervish, Probability and
Statistics, 3rd Edition, Addison-Wesley, Boston.
Chapter 6, Section 1: Bernard W. Lindgren, Statistical Theory, 3rd Edition, Macmillan, New
York.
An Overview of Hypothesis Testing
Assume we have an estimator t(X) of an unknown parameter θ. A null hypothesis is
chosen. This is simply a statement about the numerical value of the unknown parameter. The
objective is to test the validity of the null hypothesis. Since the estimator is itself a random
variable, its value will generally differ from the hypothesized value. This is to be expected. But
when is a difference "large enough" to be construed as statistical evidence against the
hypothesized value? The ex-ante probability of the observed sample statistic, t(X), is computed
for the hypothesized value of θ. If this probability is “unusual” or “extreme” in the sense of
falling below some threshold, then the sample value of t(X) is considered to be evidence against
the null hypothesis, and the null is rejected. Otherwise, we accept the null hypothesis. Some
authors replace the word “accept” in the previous sentence with the phrase “fail to reject.” I do
not have strong preferences regarding this terminological dispute, provided that you understand
that acceptance of the null hypothesis is not proof of its validity, in the same way that rejection
of the null hypothesis is not proof of its invalidity. You cannot generate proof from statistics
(except for trivial problems).
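The logic above can be made concrete with a short numerical sketch in Python. Everything in it, the normal model, the sample size, and the 5% threshold, is an assumption chosen for illustration rather than part of the theory:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(seed=0)

    # Assumed setup: X_1, ..., X_n drawn from N(theta, 1) with theta unknown.
    theta_0 = 0.0                # hypothesized value of theta
    sigma, n = 1.0, 25           # known standard deviation and sample size
    x = rng.normal(loc=0.4, scale=sigma, size=n)   # data generated with theta = 0.4

    # Standardized sample statistic: under the null it is N(0, 1).
    t_stat = (x.mean() - theta_0) / (sigma / np.sqrt(n))

    # Ex-ante probability of a statistic at least this extreme under the null.
    p_value = 2 * (1 - norm.cdf(abs(t_stat)))

    if p_value < 0.05:           # "unusual" threshold, chosen by the analyst
        print(f"p = {p_value:.4f}: reject the null")
    else:
        print(f"p = {p_value:.4f}: accept (fail to reject) the null")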
The Null Hypothesis
In application, the null hypothesis is often of simple form. A simple hypothesis is one
that allows only a single value of the unknown parameter. This is typically written as H₀: θ = θ₀,
where θ₀ denotes some fixed value of the unknown parameter θ. In the context of classical
regression, the most common simple null hypothesis is of the form β_j = 0. This hypothesis states
that the regressor X_j may be omitted from the regression. If we reject the null hypothesis, we say
that β_j is “significantly different than zero.” Many authors will truncate this phrase to
“significant.” In some cases, the simple null hypothesis will be of the form β_j = c for some
constant c. For example, if we are estimating a demand equation, we might want to test whether
the price elasticity of demand is unitary. In this example, if we reject the null hypothesis, we say
that the price elasticity of demand is significantly different than one.
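As a sketch of the mechanics (the estimate and standard error below are hypothetical numbers, not output from any actual regression), the simple null β_j = c is tested by standardizing the distance between the estimate and c:

    # Hypothetical output: estimated price elasticity of demand and standard error.
    beta_hat, se = -1.27, 0.11

    # Test the simple null H0: beta_j = c, here c = -1 (unitary price elasticity).
    c = -1.0
    t_stat = (beta_hat - c) / se
    print(f"t-statistic = {t_stat:.2f}")   # compare |t| with a critical value, e.g. 1.96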
There are cases where the null hypothesis is composite. A composite hypothesis allows
more than one value of the unknown parameter (typically an interval). There are many examples
we could consider, but one such example is H₀: θ < θ₀. In the context of classical regression, a
composite null hypothesis is often of the form β_j < 0 or β_j > 0. For example, if we are estimating a
demand equation, we might want to test whether the income elasticity is positive or negative. If
the null hypothesis β_j < 0 is rejected, we say that β_j is “significantly positive.” If the null
hypothesis β_j > 0 is rejected, we say that β_j is “significantly negative.” Note that “significantly
positive” is different than “positive and significant(ly different than zero),” although there are
many authors who do not understand this distinction and use the phrases interchangeably.
Likewise, “significantly negative” is different than “negative and significant(ly different than
zero).”
Finally, the term “null space” refers to the subset of the parameter space that is specified
by the null hypothesis. The “alternative space” is best defined as the compliment of the null
space. In this way, the null and alternative spaces partition the parameter space into disjoint
subsets. Absent specific knowledge that allows us to eliminate portions of the parameter space
from consideration, failure to define the alternative space in the prescribed manner can overstate
the reported power of the test.
The Critical Region
The critical region, denoted C, is defined as the subset of the sample space for which the
null hypothesis is rejected. That is,
C = { t(X) | reject null }
These are values of the estimator that are judged to be "so unlikely" under the null hypothesis,
that we feel compelled to reject the null. The subjectivity of this statement is clear. In this
course, we will not consider the question of how to find an optimal critical region. This material
is available in many mathematical statistics textbooks under the topic “uniformly most powerful
tests.” Instead, we will focus on the intuitive relationship between the form of the hypothesis
pair and the form of the corresponding critical region. Eventually, we will discuss the
Likelihood Ratio critical region, which provides an intuitive method of finding a critical region
for many different structures. The LR critical region typically has desirable characteristics and is
often of optimal form.
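As a concrete illustration (the two-sided form and the 5% critical value of 1.96 are assumptions of the running normal-mean sketch, not a claim of optimality):

    def in_critical_region(t_stat: float, critical_value: float = 1.96) -> bool:
        """C = { t(X) : |t(X)| > critical_value }, a two-sided region (illustrative)."""
        return abs(t_stat) > critical_value

    print(in_critical_region(2.30))   # True  -> t(X) lands in C, reject the null
    print(in_critical_region(0.80))   # False -> t(X) outside C, accept the null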
The Power Function
In order to measure the performance of a critical region (or test), we must introduce the
concept of the power function. The power function, π(C:θ), gives the probability that the sample
statistic falls in the critical region, C, as a function of the unknown parameter, θ. That is,
π(C:θ) = P_θ[t(X) ∈ C]
The power function gives the probability of rejecting the null hypothesis for alternative values of
the unknown parameter. Note that the form of the power function depends on the form of the
critical region C and the distribution of t(X), while the numerical value of the power function is
dependent on the value of the unknown parameter θ.
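For the running two-sided normal-mean sketch, the power function has a closed form: under θ the standardized statistic is normal with mean (θ - θ₀)/(σ/√n) and unit variance, so the probability of landing in C can be computed directly. The model and constants are, as before, assumptions of the sketch:

    import numpy as np
    from scipy.stats import norm

    def power(theta, theta_0=0.0, sigma=1.0, n=25, c=1.96):
        """pi(C:theta) = P_theta[t(X) in C] for C = {|t| > c} (illustrative model)."""
        shift = (theta - theta_0) / (sigma / np.sqrt(n))   # mean of t(X) under theta
        return norm.cdf(-c - shift) + 1.0 - norm.cdf(c - shift)

    for theta in (0.0, 0.2, 0.4, 0.6):
        print(f"theta = {theta:.1f}: pi(C:theta) = {power(theta):.3f}")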
The process of hypothesis testing admits two types of errors. Type I error occurs when
the null hypothesis is rejected when it is true. Type II error occurs when the null hypothesis
cannot be rejected when it is false. The ideal power function would take the value 1 for all θ in
the alternative space, and the value 0 for all θ in the null space. In this way, the probabilities of
Type I and Type II errors would always be zero. Obviously, the ideal power function cannot be
attained by any reasonable statistical test, since it implies a random process yielding valid
conclusions with certainty.
The power function may be used to define two summary measures of the performance of a
test. The α level of a test is the supremum (the least upper bound or maximum when it exists) of
the power function over the null space. That is,
α = sup_{θ ∈ H₀} π(C:θ)
The " level is the largest probability of Type I error. The $ level of a test is one minus the
infimum (greatest lower bound or minimum when it exists) of the power function over the
alternative space. That is,
β = 1−
inf
π (C:θ )
θ ∈ HA
The β level is the largest probability of Type II error.
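Both levels can be evaluated for the running sketch. One consequence of taking the alternative space to be everything except θ₀ is that the infimum of the power function is approached at θ₀ itself, so the β level of the two-sided test is essentially 1 − α; the grid below is an illustrative approximation (the power function is redefined so the snippet runs on its own):

    import numpy as np
    from scipy.stats import norm

    def power(theta, theta_0=0.0, sigma=1.0, n=25, c=1.96):
        shift = (theta - theta_0) / (sigma / np.sqrt(n))
        return norm.cdf(-c - shift) + 1.0 - norm.cdf(c - shift)

    # alpha level: supremum of power over the null space; for the simple null
    # H0: theta = 0 the null space is a single point, so this is power(0).
    alpha = power(0.0)

    # beta level: one minus the infimum of power over the alternative space,
    # approximated on a grid of alternatives near (but excluding) theta_0.
    grid = np.concatenate([np.linspace(-2.0, -0.01, 200), np.linspace(0.01, 2.0, 200)])
    beta = 1.0 - power(grid).min()

    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")   # beta is close to 1 - alpha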
The relationship between the states of nature and statistical decisions is summarized in
the following table.
                 H₀ True                H₀ False
Reject H₀        Type I Error (α)       No Error (1-β)
Accept H₀        No Error (1-α)         Type II Error (β)
The power function may be used to define another important property of statistical tests.
A test is unbiased if
sup_{θ ∈ H₀} π(C:θ) ≤ inf_{θ ∈ H_A} π(C:θ)
This property can be stated less precisely as follows: a test is unbiased if its power on the null
space is always less than its power on the alternative space. An unbiased critical region has the
intuitively pleasing property that the probability of rejecting the null is always greater when the
null is false than when it is true.
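As a quick numerical check of this property in the running sketch (same assumed model, illustrative grid; power() is redefined so the snippet stands alone):

    import numpy as np
    from scipy.stats import norm

    def power(theta, theta_0=0.0, sigma=1.0, n=25, c=1.96):
        shift = (theta - theta_0) / (sigma / np.sqrt(n))
        return norm.cdf(-c - shift) + 1.0 - norm.cdf(c - shift)

    # sup over the null space {0} versus inf over a grid of alternatives: the
    # two-sided region's power is smallest at theta_0, so the test is unbiased.
    thetas = np.linspace(-3.0, 3.0, 601)
    assert power(0.0) <= power(thetas).min() + 1e-12
    print("sup(null) <= inf(alternatives) on this grid: unbiased")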