IE241: Introduction to
Hypothesis Testing
We said before that estimation of parameters
was one of the two major areas of statistics.
Now let’s turn to the second major area of
statistics, hypothesis testing.
What is a statistical hypothesis? A statistical
hypothesis is an assumption about f(X) if X is
continuous or p(X) if X is discrete.
A test of a statistical hypothesis is a
procedure for deciding whether or not to
reject the hypothesis.
Let’s look at an example.
A buyer of light bulbs bought 50 bulbs
of each of two brands. When he tested
them, Brand A had an average life of
1208 hours with a standard deviation of
94 hours. Brand B had a mean life of
1282 hours with a standard deviation of
80 hours. Are brands A and B really
different in quality?
We set up two hypotheses.
The first, called the null hypothesis Ho,
is the hypothesis of no difference.
Ho: μA = μB
The second, called the alternative
hypothesis Ha, is the hypothesis that
there is a difference.
Ha: μA ≠ μB
On the basis of the sample of 50 from
each of the two populations of light
bulbs, we shall either reject or not reject
the hypothesis of no difference.
In statistics, we always test the null
hypothesis. The alternative hypothesis
is the default winner if the null
hypothesis is rejected.
We never really accept the null
hypothesis; we simply fail to reject it on
the basis of the evidence in hand.
Now we need a procedure to test the
null hypothesis. A test of a statistical
hypothesis is a procedure for deciding
whether or not to reject the null
hypothesis.
There are two possible decisions, reject
or not reject. This means there are also
two kinds of error we could make.
The two types of error are shown in the table
below.
                      True state
Decision              Ho true              Ho false
Reject Ho             Type 1 error (α)     Correct decision
Do not reject Ho      Correct decision     Type 2 error (β)
If we reject Ho when Ho is in fact true,
then we make a type 1 error. The
probability of type 1 error is α.
If we do not reject Ho when Ho is really
false, then we make a type 2 error. The
probability of a type 2 error is β.
Now we need a decision rule that will
make the probability of the two types of
error very small. The problem is that
the rule cannot make both of them small
simultaneously.
Because in science we have to take the
conservative route and never claim that
we have found a new result unless we
are really convinced that it is true, we
choose a very small α, the probability of
type 1 error.
Then among all possible decision rules given α, we
choose the one that makes β as small as possible.
The decision rule consists of a test statistic and a
critical region where the test statistic may fall. For
means from a normal population, the test statistic is
\[ t = \frac{\bar{X}_A - \bar{X}_B}{s_{diff}} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \]
where the denominator is the standard deviation of
the difference between two independent means.
The critical region is a tail of the distribution
of the test statistic. If the test statistic falls in
the critical region, Ho is rejected.
Now, how much of the tail should be in the
critical region? That depends on just how
small you want α to be. The usual choice is
α = .05, but in some very critical cases, α is
set at .01.
Here we have just a non-critical choice of
light bulbs, so we’ll choose α = .05. This
means that the critical region has probability
= .025 in each tail of the t distribution.
For a t distribution with .025 in each tail,
the critical value is approximately t = 1.96,
essentially the same as z because the sample
size is greater than 30. The critical region
then is |t| > 1.96.
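In our light bulb example, the test
statistic is
\[ t = \frac{1282 - 1208}{\sqrt{\frac{80^2}{50} + \frac{94^2}{50}}} = \frac{74}{17.5} = 4.23 \]
Here the difference is taken as B minus A so
that t is positive; the sign does not matter
for a two-sided test.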
Now 4.23 is much greater than 1.96 so
we reject the null hypothesis of no
difference and declare that the average
life of the B bulbs is longer than that of
the A bulbs.
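The arithmetic above can be checked with a short Python sketch. The function name `two_sample_t` and its argument order are illustrative choices, not part of the course material; the inputs are the sample figures from the light bulb example. Carried at full precision, the denominator is about 17.46 rather than the rounded 17.5, so the statistic comes out near 4.24 instead of 4.23. The decision is unchanged.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, s_diff) for the difference mean_b - mean_a
    between two independent sample means."""
    # Standard deviation of the difference between two independent means.
    s_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    t = (mean_b - mean_a) / s_diff
    return t, s_diff

# Brand A: mean 1208, sd 94, n 50. Brand B: mean 1282, sd 80, n 50.
t_stat, s_diff = two_sample_t(1208, 94, 50, 1282, 80, 50)

# Two-sided critical region |t| > 1.96 for alpha = .05.
reject_h0 = abs(t_stat) > 1.96

print(f"s_diff = {s_diff:.2f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```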
Because α = .05, we have 95%
confidence in the decision we made.
We cannot say that there is a 95% probability
that we are right because we are either right
or wrong and we don’t know which.
But there is such a small probability that t will
land in the critical region if Ho is true that if it
does get there, we choose to believe that Ho
is not true.
If we had chosen α = .01, the critical value of
t would be 2.58 and because 4.23 is greater
than 2.58, we would still reject Ho. This time
it would be with 99% confidence.
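The two critical values quoted so far, 1.96 and 2.58, can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library and relies on the same large-sample approximation as the slides (t treated as z); the helper name `two_sided_z_critical` is made up for this illustration.

```python
from statistics import NormalDist

def two_sided_z_critical(alpha):
    """Critical value c such that P(|Z| > c) = alpha for standard normal Z."""
    # Put alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

t_observed = 4.23  # test statistic from the light bulb example

for alpha in (0.05, 0.01):
    critical = two_sided_z_critical(alpha)
    print(f"alpha = {alpha}: critical value = {critical:.2f}, "
          f"reject Ho: {abs(t_observed) > critical}")
```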
How do we know that the test we used
is the best test possible?
We have controlled the probability of
Type 1 error. But what is the probability
of Type 2 error in this test? Does this
test minimize it subject to the value of α?
To answer this question, we need to
consider the concept of test power.
The power of a statistical test is the
probability of rejecting Ho when Ho is
really false. Thus power = 1-β.
Clearly if the test maximizes power, it
minimizes the probability of Type 2 error
β. If a test maximizes power for given
α, it is called an admissible testing
strategy.
Before going further, we need to distinguish
between two types of hypotheses.
A simple hypothesis is one where the value of
the parameter under Ho is a specified
constant and the value of the parameter
under Ha is a different specified constant.
For example, if you test
Ho: μ = 0
vs
Ha: μ = 10
then you have a simple hypothesis test.
Here you have a particular value for Ho and a
different particular value for Ha.
For testing one simple hypothesis Ha against
the simple hypothesis Ho, a ground-breaking
result called the Neyman-Pearson lemma
provides the most powerful test.
\[ \lambda = \frac{L(\hat{\theta}_a)}{L(\hat{\theta}_0)} \]
λ is a likelihood ratio with the Ha parameter
MLE in the numerator and the Ho parameter
MLE in the denominator. Clearly, any value of
λ > 1 would favor the alternative hypothesis,
while values less than 1 would favor the null
hypothesis.
Consider the following example of a
test of two simple hypotheses.
A coin is either fair or has P(H) = 2/3.
Under Ho, P(H) = ½ and under Ha, P(H)
= 2/3.
The coin will be tossed 3 times and a
decision will be made between the two
hypotheses. Thus X = number of heads
= 0, 1, 2, or 3. Now let’s look at how
the decision will be made.
First, let’s look at the probability of Type 1
error α. In the table below, Ho⇒ P(H) =1/2
and Ha⇒ P(H) = 2/3.
X     P(X|Ho)    P(X|Ha)
0     1/8        1/27
1     3/8        6/27
2     3/8        12/27
3     1/8        8/27
Now what should the critical region be?
Under Ho, if X = 0, α = 1/8. Under Ho, if X = 3, α =
1/8. So if either of these two values is chosen as the
critical region, the probability of Type 1 error would
be the same.
Now what if Ha is true? If X = 0 is chosen as the
critical region, the value of β = 26/27 because that is
the probability that X ≠ 0. On the other hand, if X =
3 is chosen as the critical region, the value of β =
19/27 because that is the probability that X ≠ 3.
Clearly, the better choice for the critical region is X=3
because that is the region that minimizes β for fixed
α. So this critical region provides the more powerful
test.
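The coin comparison can be reproduced with exact fractions. This is a minimal sketch: the helper `binom_pmf` and the variable names are illustrative, and each candidate critical region is a single value of X, as in the discussion above. The last column is the likelihood ratio P(X|Ha)/P(X|Ho), which is largest at X = 3, the same region picked out by comparing β.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Probability of x heads in n independent tosses with P(H) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 3
p_h0 = Fraction(1, 2)  # fair coin under Ho
p_ha = Fraction(2, 3)  # biased coin under Ha

# table[x] = (P(X = x | Ho), P(X = x | Ha))
table = {x: (binom_pmf(x, n, p_h0), binom_pmf(x, n, p_ha))
         for x in range(n + 1)}

print("X  alpha  beta   ratio")
for x, (prob_h0, prob_ha) in table.items():
    alpha = prob_h0        # P(reject Ho | Ho true) if the region is {x}
    beta = 1 - prob_ha     # P(do not reject Ho | Ha true)
    ratio = prob_ha / prob_h0
    print(x, alpha, beta, ratio)
```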
In discrete variable problems like this, it
may not be possible to choose a critical
region of the desired α. In this
illustration, you simply cannot find a
critical region where α = .05 or .01.
This is seldom a problem in real-life
experimentation because n is usually
sufficiently large so that there is a wide
variety of choices for critical regions.
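To see why α = .05 or .01 is out of reach with three tosses, one can list the size of every possible critical region under Ho. The sketch below enumerates all non-empty subsets of {0, 1, 2, 3}; the variable names are illustrative only. Every achievable size is a multiple of 1/8, so the smallest non-trivial α is .125.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

n = 3
# P(X = x | Ho) for a fair coin tossed n times.
p_h0 = [Fraction(comb(n, x), 2**n) for x in range(n + 1)]

# Collect the Type 1 error probability of every non-empty critical region.
sizes = set()
for k in range(1, n + 2):
    for region in combinations(range(n + 1), k):
        sizes.add(sum(p_h0[x] for x in region))

achievable = sorted(sizes)
print([str(a) for a in achievable])
print("smallest achievable alpha:", float(achievable[0]))
```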
This problem, used to illustrate the general
method for selecting the best test, was
easy to discuss because there was only
a single alternative to Ho.
Most problems involve more than a
single alternative. Such hypotheses are
called composite hypotheses.
Examples of composite hypotheses:
Ho: μ = 0
vs
Ha: μ ≠ 0
which is a two-sided Ha.
A one-sided Ha can be written as
Ho: μ = 0
vs
Ha: μ > 0
or
Ho: μ = 0
vs
Ha: μ < 0
All of these hypotheses are composite
because they include more than one value
for Ha. And unfortunately, the size of β here
depends on the particular alternative value of
μ being considered.
In the composite case, it is necessary
to compare Type 2 errors for all
possible alternative values under Ha.
So now the size of Type 2 error is a
function of the alternative parameter
value θ.
So β(θ) is the probability that the
sample point will fall in the noncritical
region when θ is the true value of the
parameter.
Because it is more convenient to work
with the critical region, the power
function 1-β(θ) is usually used.
The power function is the probability
that the sample point will fall in the
critical region when θ is the true value
of the parameter.
As an illustration of these points,
consider the following continuous
example.
Let X = the time that elapses between
two successive trippings of a Geiger
counter in studying cosmic radiation. It
is assumed that the density function is
\[ f(x;\theta) = \theta e^{-\theta x} \]
where θ is a parameter which depends
on experimental conditions.
Under Ho, θ = 2. Now a physicist
believes that θ < 2. So under Ha, θ < 2.
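Now one choice for the critical region is X ≥ 1, for which
\[ \alpha = \int_1^{\infty} 2e^{-2x}\,dx = e^{-2} \approx .135 \]
Another choice is the left tail, X ≤ .07, for which α is
approximately the same (the cutoff .07 is rounded; a cutoff
of about .073 gives α = .135). That is,
\[ \alpha = \int_0^{.07} 2e^{-2x}\,dx = 1 - e^{-.14} \approx .13 \]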
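Both tail areas have closed forms under the exponential density, so they are easy to verify numerically. The sketch below assumes θ = 2 under Ho, as stated above; the variable names are illustrative. It also solves for the left-tail cutoff that would match the right-tail α exactly, which shows that .07 is a rounded value.

```python
import math

theta0 = 2.0  # value of theta under Ho

# Right-tail region X >= 1: P(X >= 1) = exp(-theta * 1).
alpha_right = math.exp(-theta0 * 1.0)

# Left-tail region X <= .07: P(X <= c) = 1 - exp(-theta * c).
alpha_left = 1 - math.exp(-theta0 * 0.07)

# Cutoff c that makes the left tail match alpha_right exactly:
# 1 - exp(-theta0 * c) = alpha_right  =>  c = -ln(1 - alpha_right) / theta0
exact_cutoff = -math.log(1 - alpha_right) / theta0

print(f"alpha for X >= 1:   {alpha_right:.4f}")
print(f"alpha for X <= .07: {alpha_left:.4f}")
print(f"cutoff giving equal alpha: {exact_cutoff:.4f}")
```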
Now let’s examine the power functions for the two
competing critical regions.
For the critical region X > 1,
\[ 1 - \beta_1(\theta) = \int_1^{\infty} \theta e^{-\theta x}\,dx = e^{-\theta} \]
and for the critical region X < .07,
\[ 1 - \beta_2(\theta) = \int_0^{.07} \theta e^{-\theta x}\,dx = 1 - e^{-.07\theta} \]
The graphs of these two functions are called
the power curves for the two critical regions.
These two power functions are plotted in the figure below.
[Figure: Power functions for two critical regions. Vertical axis: Power. Horizontal axis: Theta, from 0 to 4. Curves: critical region X>1 and critical region X<.07.]
Note that the power function for the X>1 region is always higher than
the power function for the X<.07 region before they cross at about θ = 2.
Since the alternative θ values in the problem are all θ<2, clearly
the region X>1 is superior.
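The comparison of the two power curves can be tabulated directly from their closed forms, e^(-θ) for the region X > 1 and 1 - e^(-.07θ) for the region X < .07. In this sketch the function names `power_right` and `power_left` and the grid of θ values are arbitrary choices for illustration.

```python
import math

def power_right(theta):
    """Power of the critical region X > 1: P(X > 1 | theta)."""
    return math.exp(-theta)

def power_left(theta):
    """Power of the critical region X < .07: P(X < .07 | theta)."""
    return 1 - math.exp(-0.07 * theta)

print("theta  X>1     X<.07")
for theta in (0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
    print(f"{theta:<6} {power_right(theta):.4f}  {power_left(theta):.4f}")

# Under Ha every theta is below 2, where the X > 1 region has higher power.
alternatives = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
right_is_better = all(power_right(t) > power_left(t) for t in alternatives)
print("X > 1 more powerful for all listed theta < 2:", right_is_better)
```

Past the crossing point the ordering reverses, which is why the left-tail region would be preferred if the physicist instead believed θ > 2.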