Download Lecture 9 Testing Concepts. 1 Hypotheses

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
Lecture 9
Testing Concepts.
1 Hypotheses
Hypotheses are some statements about population distribution, which are either true or untrue for the given
population.
Example For example, let X1 , ..., Xn be a random sample from distribution N (µ, σ 2 ) with σ 2 known and
µ ∈ M. Suppose our hypothesis is that µ ∈ M1 for some M1 ⊂ M, i.e. M1 is some subset of M. It is
called the null hypothesis. It is denoted as H0 : µ ∈ M1 . Then the alternative hypothesis is that µ ∈
/ M1 ,
i.e. µ ∈ M\M1 . It is denoted as Ha : µ ∈
/ M\M1 . For example, if M1 = µ0 and M = {µ : µ ≥ µ0 }, then
H0 : µ = µ0 and Ha : µ > µ0 . Or, as another example, if M1 = µ0 and M = {µ : µ ∈ R}, then H0 : µ = µ0
6 µ0 .
and Ha : µ =
If a hypothesis includes only one parameter value, it is called simple. Otherwise, the hypothesis is
called complex. In the examples above, the null hypothesis was simple while the alternative was complex. In
principle, we can also allow for complex null hypotheses. For example, if M1 = µ1 ∪µ2 and M = {µ : µ ∈ R},
6 µ1 and µ =
6 µ2 . It is customary to mention both the null and the
then H0 : µ = µ1 or µ = µ2 and Ha : µ =
alternative hypotheses since the full parameter space M is often unspecied.
2 Testing
We observe a sample from a population and, based on this sample, create a test. Our test is intended to
decide whether we accept the null hypothesis or reject it in favor of the alternative. Some people argue that
instead of word accept it is more appropriate to say do not reject. We are not going to emphasize this
dierence here.
2.1 Critical region
Let X denote our data. Then any test consists of the critical region C , which is a function of our null and
alternative hypotheses, such that we accept the null hypothesis if X ∈ C and reject it if X ∈
/ C . For example,
Pn
if our data is X = (X1 , ..., Xn ), then the critical region might be C = { i=1 Xi < δ} for some δ ∈ R. The
value δ in this example might depend both on the null and the alternative.
Cite as: Anna Mikusheva, course materials for 14.381 Statistical Methods in Economics, Fall 2011. MIT OpenCourseWare
(http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
In testing, four situations are possible. If H0 is true and we accept it, then it is a correct decision. If H0
is true but we reject it, then it is a type 1 error. If H0 is false but we accept it, then it is a type 2 error. If
H0 is false and we reject it, then it is a correct decision again. So, in addition to correct decisions, there are
errors of two types.
2.2 Size and power trade-o
The probability of a type 1 error is called the size of the test.
Example (cont.) In the example above, suppose our null hypothesis is H0 : µ = µ0 and our alternative is
Ha : µ > µ0 . Then the natural test is to accept the null hypothesis if the data belongs to the critical region
Pn
C = { i=1 Xi < δ}. Then
Pµ0 {
n
X
√
√
√
Xi ≥ δ } = Pµ0 { n(X n − µ0 )/σ ≥ n(δ/n − µ0 )/σ } = 1 − Φ( n(δ/n − µ0 )/σ),
i=1
which is a decreasing function of δ . If δ is large, then size of the test is small, which is good. Please, note,
that the size is calculated at the null value (often called under the null).
What is the probability of type-2 error? If true parameter value µ > µ0 , then
n
X
√
Xi < δ) = Φ( n(δ/n − µ)/σ)
Pµ (
i=1
First, notice that it is a function of true µ. Second, if δ is large, then the probability of a type 2 error is
large as well, which is bad.
Thus, there is a trade-o between the probability of a type-1 error and the probability of a type-2 error.
This trade-o exists in most practically relevant situations. Before we consider how one should choose the
test in light of this trade-o, some additional concepts are necessary.
The Power of the test is dened as the probability of correctly rejecting the null hypothesis. Thus, the
power of the test is dened as 1 minus the probability of a type-2 error. Apparently, the power of the test
depends on the true parameter value. So, power is usually considered as a function of the true parameter
value on the set of alternatives.
The size of the test also depends on the true parameter value when the null hypothesis is composite. But,
instead of considering the size of the test as a function of the true parameter value, the concept of the level
of the test is used. We say that the test has level α if for any true parameter value in the null hypothesis,
the size is not greater than α. The level of the test is dened as the maximum of the size over all possible
true parameter values in the null hypothesis. In the example above, level of the test is supµ∈M1 size(µ).
Once we have the notion of power of the test and its level, let us consider how to choose the test. Common
practice is to x the level of the test (usually, it is 1, 5, or 10%) and then to choose a test with as much
power as possible among all tests with a given level. In this sense the null and the alternative are not treated
equally.
2
Example (cont.) Let us return to our example where H0 : µ = µ0 and Ha : µ > µ0 . Suppose we
Pn
want a test with level 5%. All tests based on the critical region C = { i=1 Xi < δ} with k such that
√
1 − Φ( n(δ/n − µ0 )/σ) < 0.05 has level 5%. Since the power is a decreasing function of δ in this example,
√
one should choose δ such that 1 − Φ( n(δ/n − µ0 )/σ) = 0.05. Let Z0.95 denote 95%-quantile of standard
√
√
normal distribution. Then n(δ/n − µ0 )/σ = Z0.95 or, equivalently, δ = n(σZ0,95 / n + µ0 ). So, our test is
Pn
√
to accept the null if i=1 Xi < n(σZ0,95 / n + µ0 ).
Since the power of the test depends on the true parameter value, it is possible that one test has maximal
power among all tests with a given level at one parameter value while another test has maximal power at
some other parameter value. So it is possible that there is no uniformly most powerful test. In this situation
the researcher should use some additional criteria to choose a test. This observation explains a wide variety
of tests suggested in the statistical and econometric literature. However, we should note that there is an
important class of problems where uniformly most powerful tests exist. We will discuss it next time.
2.3 P-value
The result of any test is either acceptance or rejection of the null hypothesis. At the same time, it would be
interesting to know to what extent we are sure about the result of the test. The concept of the p-value gives
us such a measure. The p-value is the probability (calculated under the null) of obtaining a sample at least
as adverse to the null hypothesis as given. Notice, that the p-value is a random variable.
Example (cont.) We observe data X = (X1 , ...Xn ) from N (µ, σ 2 ) with known σ 2 . H0 : µ = µ0 and
Ha : µ > µ0 . Assume you have a realized sample (x1 , ..., xn ), denote realized value of
Pn
i=1
xi by, say, A.
Since large values of A is a sign in favor of the alternative, we should reject the null if A is large. In the
√
previous section, we showed that the test of level 5% rejects the null if A ≥ n(σZ0,95 / n + µ0 ). The quantity
Pn
√
on the right hand side of this inequality satises Pµ0 { i=1 Xi ≥ n(σZ0,95 / n + µ0 )} = 0.05. The samples
Pn
(X1 , ..., Xn ) which are more adverse to the null than our sample are those for which i=1 Xi > A. So, the
n
X
p − value = Pµ0 {
Xi > A} = 1 − Φ
i=1
µ
¶
√ A/n − µ0
n
σ
Note again, that it is a function of A, and thus is a random variable. By construction, the p-value is smaller
√
than 0.05 if and only if A ≥ n(σZ0,95 / n + µ0 ). So, if we have a test at the level 0.05, then our test rejects
the null if the p-value is smaller than 0.05.
If the p-value is much smaller than 0.05, then we are quite sure that the null hypothesis does not hold.
If the p-value is close to 0.05, then we are not so sure. Moreover, reporting the p-value has an advantage
that, once the p-value is reported, any researcher can decide for himself whether he or she accepts or rejects
the null hypothesis depending on his/her own favorite level of the test.
Let us now emphasize some frequent misunderstandings of the concept of the p-value. First, a p-value
is not the probability that the null is true. There is no such probability at all since parameters are not
random according to the frequentist (classical) approach. Second, the p-value is not the probability of falsely
rejecting the null. This probability is measured by the size of the test. Third, one minus p-value is not
3
the probability of the alternative being true. Again, there is no such probability since parameters are not
random. Finally, the level of the test is not determined by a p-value. Instead, once we know the p-value of
the test, the level of the test determines whether we accept or reject the null hypothesis.
Example. Let X1 , ...Xn be a random sample from N (µ, σ 2 ) distribution. The null hypothesis, H0 , is that
σ 2 = σ02 . The alternative hypothesis, Ha , is that σ 2 < σ02 . Note that both hypotheses are complex since
both of them contain all possible values of µ. Let us construct a test based on sample variance s2 . We know
that (n − 1)s2 /σ 2 ∼ χ2 (n − 1). Since small values of (n − 1)s2 /σ02 are a sign in favor of the alternative, our
critical region should take the form C = {(n − 1)s2 /σ02 > k}. Under H0 , (n − 1)s2 /σ02 ∼ χ2 (n − 1). Then a
2
test with level, say, 5%, accepts the null hypothesis if (n − 1)s2 /σ02 > χ20.05 (n − 1) where χ0.05
(n − 1) denotes
the 5%-quantile of χ2 (n − 1). What is the power of this test? Let σ 2 < σ02 . Then
2
2
} = Fχ2 (n−1) ((σ02 /σ 2 )χ0.05
),
Pσ2 {(n − 1)s2 /σ02 ≤ χ02.05 } = Pσ2 {(n − 1)s2 /σ 2 ≤ (σ02 /σ 2 )χ0.05
where Fχ2 (n−1) denotes the cdf of χ2 (n − 1). So the power of the test increases as σ 2 decreases. Suppose
n = 101, σ02 = 1, and we observe s2 = 0.9. What is the p-value of our test? Let A ∼ χ2 (n − 1). Then the
p-value equals
P {A ≤ (n − 1)s2 /σ02 } = Fχ2 (n−1) ((n − 1)s2 /σ02 ) = Fχ2 (100) (100 · 0.9/1) = Fχ2 (100) (90) ≈ 0.25.
Thus, the test with level 5% does not reject the null hypothesis.
3 Pivotal Statistics
By denition, a statistic is called pivotal if its distribution is independent of unknown parameters. Pivotal
statistics are useful in testing because one can calculate quantiles of their distributions and, thus, critical
values for tests based on these statistics. For example, (n − 1)s2 /σ02 from the example above is pivotal under
the null since its distribution does not depend on µ.
Example As another example, let X1 , ..., Xn be a random sample from distribution N (µ, σ 2 ) with unknown
6 µ0 . Again, both hypotheses
σ 2 . The null hypothesis is that H0 : µ = µ0 . The alternative is that Ha : µ =
are complex since both of them contain all possible values of σ 2 . Let us construct a test based on |X n − µ0 |.
Large values of |X n − µ0 | are a sign in favor of the alternative. Thus, our critical region should take the
form C = {|X n − µ0 ≤ δ} for some δ > 0. Under the null, |X n − µ0 | ∼ N (0, σ 2 /n). Since we do not
know σ 2 we cannot choose δ so that Pµ0 {|X n − µ0 | > δ} = 0.05. It is because statistic |X n − µ0 | is not
pivotal in this case. One way to proceed is to estimate σ 2 by, say s2 , and then use a pivotal statistic.
p
We know that, under the null, (X n − µ0 )/ s2 /n ∼ t(n − 1). Again, a reasonable test should be based
p
on the critical region C = {|X n − µ0 |/ s2 /n ≤ δ}. Since the pdf of t-distribution is symmetric around
zero, in particular t0,975 (n − 1) = −t0.025 (n − 1), a test with level, say, 5% accepts the null hypothesis if
4
|X n − µ0 |/
(
Pµ0
p
s2 /n ≤ t0.975 (n − 1). Indeed,
)
(
)
(
)
|X n − µ0 |
X n − µ0
X n − µ0
p
p
p
≤ t0.975 (n − 1)
= Pµ0
≤ t0.975 (n − 1) − Pµ0
≤ t0.025 (n − 1)
s2 /n
s2 /n
s2 /n
= 0.975 − 0.025
=
0.95
Example As another example, let X1 , ..., Xm and Y1 , ..., Yn be independent random samples from N (µx , σx2 )
and N (µy , σy2 ) distributions correspondingly with unknown σx and σy . We want to test null hypothesis, H0 ,
that µx = µy . against the alternative, Ha , that µx > µy . A natural place to start is to note that if the null
hypothesis is true, then X n should be close to Y n with high probability. But X n − Y n ∼ N (0, σx2 /m + σy2 /n)
with σx2 and σy2 unknown. So consider
Xn − Y n
t= q
s2x /m + s2y /n
where s2x and s2y are sample variances. Exact distribution of s2x /m + s2y /n is not pleasant. Instead, let us use
asymptotic theory. By the Law of large numbers
sx2 /m + sy2 /n
σx2 /m + σy2 /n
=
(σx2 /m)(sx2 /σx2 ) + (σy2 /n)(s2y /σy2 )
σx2 /m + σy2 /n
=
(σx2 /m)χ2m−1 /(m − 1) + (σy2 /n)χ2n−1 /(n − 1)
σx2 /m + σy2 /n
→p
=
σx2 /m + σy2 /n
σx2 /m + σy2 /n
1
Thus, by the Slutsky theorem, t ⇒ N (0, 1). So we can use quantiles of standard normal distribution to form
a test with size approximately equal to the required level of the test. This gives us a test with asymptotically
correct size.
5
MIT OpenCourseWare
http://ocw.mit.edu
14.381 Statistical Method in Economics
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.