AP Statistics
Mr. Coppock
Errors in Hypothesis Testing
Like any decision-making process, hypothesis testing is subject to error. There are two
errors that one can make while doing hypothesis testing. One can reject the null
hypothesis when in fact it is true, or one can fail to reject the null hypothesis
when in fact it is false. The two errors can best be depicted using the 2 x 2
table below:
                               | Reality                 |
Test Decision                  | Null hypothesis is true | Alternative hypothesis is true
Reject Null Hypothesis         | Type I Error            | Correct Decision
Fail to Reject Null Hypothesis | Correct Decision        | Type II Error
Example: Let’s say Consumer Reports magazine doesn’t believe that McDonald’s Quarter
Pounders really contain a quarter pound of beef (4 oz.). They perform the following
hypothesis test using an SRS of 50 burgers with an α level of 0.05 (the standard
deviation of Quarter Pounder weights is 1.5 oz.):
Ho: μ = 4
Ha: μ < 4
A) Let’s assume that the burgers are in fact 4 oz. (i.e., McDonald’s is telling the truth).
1) If we got a sample mean x̄ of 3.5 oz. from our SRS, we would reject Ho because:
$$Z_{3.5} = \frac{3.5 - 4.0}{1.5 / \sqrt{50}} = -2.357$$
P(Z < -2.357) = .009
The p-value for this x̄ is below .05 (it is 0.009).
This, however, would be a wrong decision. The reason we made this error is
that our sample was a fluke and happened to give us an extremely low x̄. We
call this a Type I error: rejecting the null hypothesis when, in fact, it is true. The
probability of this error is the probability of rejecting a true null hypothesis
based on such a “fluke”. Since our α level is .05, the probability of rejecting Ho
based on a fluke is .05.
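The claim that a true Ho gets rejected about 5% of the time can be checked with a small simulation. The sketch below is illustrative only: it assumes individual burger weights are normally distributed (the handout does not say so), and the function names, trial count and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist

MU_0 = 4.0      # weight claimed under Ho (oz.)
SIGMA = 1.5     # population standard deviation (oz.)
N = 50          # burgers per SRS
ALPHA = 0.05    # significance level


def rejects_null(sample):
    """Lower-tail z-test of Ho: mu = 4 against Ha: mu < 4."""
    x_bar = sum(sample) / len(sample)
    z = (x_bar - MU_0) / (SIGMA / sqrt(N))
    p_value = NormalDist().cdf(z)          # P(Z < z)
    return p_value < ALPHA


def type_one_error_rate(trials=5000, seed=1):
    """Fraction of samples that reject Ho when Ho is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Assumption: weights are normal with the true mean equal to 4 oz.
        sample = [rng.gauss(MU_0, SIGMA) for _ in range(N)]
        rejections += rejects_null(sample)
    return rejections / trials


print(f"simulated Type I error rate: {type_one_error_rate():.3f}")
```

The printed rate should land close to the chosen α level, since every rejection in this simulation is a "fluke" by construction.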
2) If we got an x̄ of 3.97 oz., we would not reject Ho because:
$$Z_{3.97} = \frac{3.97 - 4.0}{1.5 / \sqrt{50}} = -0.141$$
P(Z < -0.141) = 0.444
The p-value for this x̄ is above .05 (it is .444).
This would be a correct decision since Ho is correct. This correct decision would
occur with a probability of 0.95 (1 - α).
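The two calculations in part A follow the same recipe, so they can be reproduced with one short function. This is a minimal sketch; the function name `lower_tail_z_test` and its parameter names are made up for illustration, and the defaults are the numbers from the example.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(x_bar, mu_0=4.0, sigma=1.5, n=50, alpha=0.05):
    """Return (z, p_value, reject) for Ho: mu = mu_0 vs Ha: mu < mu_0."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z)          # area to the left of z
    return z, p_value, p_value < alpha


for x_bar in (3.5, 3.97):
    z, p, reject = lower_tail_z_test(x_bar)
    decision = "reject Ho" if reject else "fail to reject Ho"
    print(f"x-bar = {x_bar}: z = {z:.3f}, p-value = {p:.3f}, {decision}")
```

Run as written, this gives z = -2.357 with a p-value of 0.009 for a sample mean of 3.5, and z = -0.141 with a p-value of 0.444 for a sample mean of 3.97.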
B) Let’s now assume that Quarter Pounders in fact weigh less than 4 oz. (say they weigh
only 3.7 oz., which means McDonald’s is lying).
1) If we got an x̄ of 3.5 oz., we would reject Ho because:
$$Z_{3.5} = \frac{3.5 - 4.0}{1.5 / \sqrt{50}} = -2.357$$
P(Z < -2.357) = .009
The p-value for this x̄ is below .05 (it is 0.009).
This would be a correct decision, since the alternative hypothesis is true (μ < 4).
The probability of this decision is called the power of the test. We’ll learn how to
calculate it in the next example.
2) If we got an x̄ of 3.97 oz., we would not reject Ho because:
$$Z_{3.97} = \frac{3.97 - 4.0}{1.5 / \sqrt{50}} = -0.141$$
P(Z < -0.141) = 0.444
The p-value for this x̄ is above .05 (it is .444).
This would be a wrong decision because Ho is not true (the true weight of Quarter
Pounders we assumed was 3.7 oz.) and we failed to reject Ho. This is called a
Type II error, and its probability is usually denoted by β.
A good way to picture β and to calculate it is to construct the normal curve that
represents the true weight (μ = 3.7 oz.) next to the normal curve of our null
hypothesis assumption (μ = 4.0 oz.) and look at the overlap areas:
[Figure: two overlapping normal curves on a horizontal axis running from 3.1 to 4.9 oz.
One is the normal curve based on the true value of μ (μ = 3.7 oz.), labeled Ha: μ = 3.7.
The other is the normal curve used to calculate the test statistic assuming the null
hypothesis is true (μ = 4.0 oz.), labeled Ho: μ = 4.]
Type I error (α): Dark shaded region (discussed earlier).
Type II error (β): Light shaded region. This is the probability that we would not
reject the null hypothesis that μ = 4.0 oz. when in fact the true μ is only 3.7 oz.
(i.e., the alternative hypothesis is true).
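The light shaded area can also be computed directly. The sketch below assumes the usual approach for a lower-tail z-test: find the cutoff sample mean below which Ho is rejected at α = .05, then ask how often x̄ lands above that cutoff when the true mean is 3.7 oz. The variable names are illustrative, not from the handout.

```python
from math import sqrt
from statistics import NormalDist

MU_0, MU_TRUE = 4.0, 3.7       # null value and assumed true mean (oz.)
SIGMA, N, ALPHA = 1.5, 50, 0.05

std_error = SIGMA / sqrt(N)    # standard deviation of x-bar

# Sample means below this cutoff lead to rejecting Ho (lower-tail test).
critical_mean = NormalDist(MU_0, std_error).inv_cdf(ALPHA)

# beta: chance that x-bar falls above the cutoff when mu is really 3.7,
# so we fail to reject a false Ho.
beta = 1 - NormalDist(MU_TRUE, std_error).cdf(critical_mean)
power = 1 - beta

print(f"cutoff = {critical_mean:.3f} oz.")
print(f"beta   = {beta:.3f}")
print(f"power  = {power:.3f}")
```

Under these assumptions β comes out near 0.59, so the test would miss a true mean of 3.7 oz. more often than it would detect it.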
Power: A high probability of a Type II error (failing to reject a null hypothesis that
really is false) means that the test is not sensitive enough to usually detect the alternative.
The sensitivity of the test to detect the alternative is called the power of the test. The
power is simply the probability of NOT making a Type II error. So:
Power = 1 - β
Things to remember:
(1) A Type I error can only occur when a null hypothesis is true.
(You incorrectly reject a true null hypothesis.)
(2) A Type II error can only occur when a null hypothesis is false.
(You incorrectly fail to reject a false null hypothesis.)
(3) The Power of a test is 1 - probability (Type II error).
(This is the probability that you correctly reject a false null hypothesis.)
(4) One needs an alternative to the null hypothesis in order to calculate a
Type II error.
Important Properties of Type I and Type II Errors:
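A) The probabilities of a Type I and a Type II error are inversely related. For a fixed
sample size, if we decrease the α level we will increase β, the probability of a Type II error.
[Figure: two panels, each showing the normal curves centered at μa and μo with the
α level and β regions marked.]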
The tradeoff between Type I and Type II errors has serious implications for choosing an
α level when doing hypothesis testing. Choosing the smallest possible α level might not
be such a good idea if the consequences of a Type II error are grave.
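To see the tradeoff numerically, one can reuse the Quarter Pounder setup (Ho: μ = 4, true mean 3.7 oz., σ = 1.5, n = 50) and compute β at a few α levels. This is a sketch under those assumptions; `beta_for_alpha` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def beta_for_alpha(alpha, mu_0=4.0, mu_true=3.7, sigma=1.5, n=50):
    """Type II error probability of the lower-tail z-test at level alpha."""
    std_error = sigma / sqrt(n)
    # Reject Ho when x-bar falls below this cutoff.
    cutoff = NormalDist(mu_0, std_error).inv_cdf(alpha)
    # Probability of landing above the cutoff when mu_true is the real mean.
    return 1 - NormalDist(mu_true, std_error).cdf(cutoff)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {beta_for_alpha(alpha):.3f}")
```

Each step down in α moves the cutoff further from 4.0 oz., which leaves more of the true distribution in the "fail to reject" region and so raises β.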
Example 1: Suppose you are deciding whether or not to reject a parachute before placing
it in service.
Ho: The parachute works properly
Ha: The parachute does not work properly
A Type I error would be to reject the null hypothesis when in fact it is true. This would
mean rejecting a working parachute. The cost of such an error is the amount of money
it took to make the parachute.
A Type II error would be not to reject the null hypothesis when in fact it should be
rejected (the alternative hypothesis is true). This would mean that the inspector would
accept a defective parachute and put it in commission. The cost of such an error would
obviously be the life of the individual using the parachute (the parachute not working
means the person jumping off the plane is going to die).
In such an example the cost of a Type II error is so much greater than a Type I error that
we would choose a high alpha level so as to minimize β.
Example 2: Suppose you are deciding whether to give the death penalty to a murder
suspect.
Ho: The suspect is innocent
Ha: The suspect is guilty
A Type I error in this case would entail rejecting the presumption of innocence when in
fact the suspect is innocent. The cost of this error would be executing an innocent person.
A Type II error would mean you do not reject the null hypothesis and decide the person is
not guilty when in fact he is guilty. The cost in this case would be the person would either
go free or perhaps serve a little jail time (if convicted of a lesser crime).
In this case a Type I error seems more costly, and we would want to choose an alpha
level that is extremely small so as to minimize the probability of a Type I error.
B) Both Type I and Type II errors can be reduced by increasing the sample size and thus
reducing the standard deviation of the sampling distributions:
[Figure: two panels, each showing the normal curves centered at μa and μo with the
α level and β regions marked.]
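The effect of sample size can be illustrated with the Quarter Pounder numbers (Ho: μ = 4, true mean 3.7 oz., σ = 1.5). The sketch below compares the original n = 50 test with hypothetical samples of 200 burgers; the function name and the choice of 200 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(n, alpha, mu_0=4.0, mu_true=3.7, sigma=1.5):
    """beta for the lower-tail z-test with sample size n at level alpha."""
    std_error = sigma / sqrt(n)            # shrinks as n grows
    cutoff = NormalDist(mu_0, std_error).inv_cdf(alpha)
    return 1 - NormalDist(mu_true, std_error).cdf(cutoff)


for n, alpha in ((50, 0.05), (200, 0.05), (200, 0.01)):
    beta = type_two_error(n, alpha)
    print(f"n = {n:3d}, alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

With the larger sample, β drops even when α is held at .05, and the last row shows that α can be lowered to .01 while β still stays below its n = 50 value, which is the sense in which both errors shrink together.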
C) The probability of a Type II error decreases the farther the alternative mean is
from μo:
[Figure: three panels, each showing the normal curves centered at μa and μo with the
α level and β regions marked.]
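This property can be checked with the Quarter Pounder test (Ho: μ = 4, σ = 1.5, n = 50, α = .05) by trying several true means. The alternative values 3.9, 3.7 and 3.5 oz. and the helper name below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def beta_at(mu_true, mu_0=4.0, sigma=1.5, n=50, alpha=0.05):
    """beta for the lower-tail z-test when the real mean is mu_true."""
    std_error = sigma / sqrt(n)
    # The cutoff depends only on Ho and alpha, not on the true mean.
    cutoff = NormalDist(mu_0, std_error).inv_cdf(alpha)
    return 1 - NormalDist(mu_true, std_error).cdf(cutoff)


for mu_true in (3.9, 3.7, 3.5):
    print(f"true mean = {mu_true} oz. -> beta = {beta_at(mu_true):.3f}")
```

Because the rejection cutoff stays fixed, sliding the true curve further below 4.0 oz. leaves less of it above the cutoff, so β shrinks and the power grows.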
A good analogy to errors in hypothesis testing would, again, be a comparison to our
judicial system:
                                                             | Reality               |
Jury Decision                                                | Defendant is Innocent | Defendant is Guilty
Reject Presumption of Innocence (Guilty Verdict)             | Type I Error          | Correct Decision
Fail to Reject Presumption of Innocence (Not Guilty Verdict) | Correct Decision      | Type II Error