Chapter 9
Hypothesis Testing
9.1 Testing Hypotheses
• With our knowledge of interval estimation, we can consider hypothesis tests.
• An Example of an Hypothesis Test: Statisticians at Employment and Immigration
Canada believe that the average duration of unemployment in Alberta is less than
6 weeks.
They want to test:
\[H_0: \mu \le 6\]
\[H_1: \mu > 6\]
NOTE:
1. \(H_0\) is called the null hypothesis. It is typically what we are interested in. It is
a maintained hypothesis that is held to be true unless sufficient evidence to the
contrary is obtained.
2. \(H_1\) is called the alternative hypothesis. It is a hypothesis against which the null
hypothesis is tested and which will be held to be true if the null is held false.
Figure 9.1: The Null Hypothesis, \(H_0\). It states the (numerical) assumption to be
tested. Example: the average number of TV sets in U.S. homes is equal to three
(\(H_0: \mu = 3\)). It is always about a population parameter (\(H_0: \mu = 3\)), not
about a sample statistic (not \(H_0: \bar{X} = 3\)).
(Source: Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.)
9.2 A Two-Sided Hypothesis Test Example
A firm's sales records show that customers spend on average $550 per month on their
product. They wish to know whether this has changed this year, using a significance
level of .01. They survey 30 customers and find that the mean expenditure is $510 with a
sample standard deviation of $90. The calculations below work in units of $100, so
\(\mu_0 = 5.5\), \(\bar{x} = 5.1\) and \(s = 0.9\).
We can follow the steps outlined earlier:
1. Formulate the null and alternative hypotheses
\[H_0: \mu = 5.5 \quad \text{(null hypothesis)}\]
\[H_1: \mu \ne 5.5 \quad \text{(alternative hypothesis)}\]
2. Level of significance of the test, say \(\alpha = .01\)
3. Calculate the test statistic as
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.1 - 5.5}{0.9/\sqrt{30}} = -2.434\]
4. Critical region (rejection rule): reject \(H_0\) if
\[|t| > t_{\alpha/2,\,n-1}\]
The critical value is \(t_{.01/2,\,29} = 2.756\) (\(n = 30\)).
We do not reject \(H_0\) since \(|t| = 2.434 < t_{.01/2,\,29} = 2.756\).
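The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value 2.756 is copied from the text rather than computed.

```python
import math

# Values from the sales example, in units of $100.
mu0 = 5.5       # mean under the null hypothesis
xbar = 5.1      # sample mean
s = 0.9         # sample standard deviation
n = 30          # sample size
t_crit = 2.756  # t_{.01/2, 29}, taken from the text (not computed here)

std_err = s / math.sqrt(n)        # estimated standard deviation of the sample mean
t_stat = (xbar - mu0) / std_err   # test statistic
reject = abs(t_stat) > t_crit     # two-sided rejection rule

print(f"t = {t_stat:.3f}, reject H0: {reject}")
```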
9.3 Interpretation and Notes
1. We say the null hypothesis that the population mean is equal to 5.5 is not rejected
at the 1% level of significance. This is to make it clear that it might be rejected if we
were to choose a higher level of significance (try \(\alpha = .10\)). Notice we never
“accept” a null hypothesis.
2. The idea is that the difference between \(\bar{x}\) and \(\mu_0\) is not significant, since
it could arise from sampling variability under the null. Even if \(H_0\) were true we would
expect to see some samples with \(\bar{x} < 5.5\).
3. If we reject \(H_0\) it implies the difference between \(\bar{x}\) and \(\mu_0\) is too large
to be attributed to ordinary sampling variability.
4. The whole trick in hypothesis testing is to figure out the correct statistic to construct
(one with a known sample distribution) and then to find a rejection region. Drawing
a picture often helps one avoid mistakes with rejection regions.
5. We know that if the \(X_i\) are normal then we can use the t-distribution in situations
where we are estimating the variance.
6. On the other hand, if the \(X_i\) are not normally distributed we can appeal to the central
limit theorem so that \(\bar{X}\) is approximately normally distributed as \(n\) gets large, so
that we can still use the t-distribution. Also, as we have seen, for \(n > 30\) the
t-distribution is close to the normal and often critical values are calculated directly
from the normal tables.
9.4 Confidence Intervals and Hypothesis Tests
• A final relation between hypothesis tests and confidence intervals can be stated.
• If one calculates a 95% confidence interval for \(\mu\) and finds that the value \(\mu_0\) is
contained in the interval, then we know that the null hypothesis (\(H_0: \mu = \mu_0\))
would not be rejected at the 5% level of significance.
• Similarly if the confidence interval did not contain the value \(\mu_0\), then the null
hypothesis is rejected.
• Once we have calculated the confidence interval, we have in fact obtained all the
possible null hypotheses that would be retained at the chosen significance level
for this particular sample.
• For our example at the 1% level, the two-sided (99%) confidence interval is:
\[\bar{x} \pm t_{.01/2,\,29} \times \frac{s}{\sqrt{n}}\]
• The calculation gives (4.65, 5.55), which contains 5.5, the value under the null hypothesis.
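The interval can be recomputed directly from the sample values. The sketch below assumes the same inputs as the sales example (in units of $100) and again takes the critical value 2.756 from the text; the variable names are illustrative.

```python
import math

xbar, s, n = 5.1, 0.9, 30   # sample mean, sample standard deviation, sample size
mu0 = 5.5                   # value under the null hypothesis
t_crit = 2.756              # t_{.01/2, 29}, taken from the text

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

# The null value is retained at the 1% level exactly when it lies inside the 99% interval.
contains_null = lower <= mu0 <= upper

print(f"99% CI: ({lower:.3f}, {upper:.3f}); contains mu0: {contains_null}")
```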
9.5
9.6 Definitions and Terms
1. Test Statistic: A test statistic is a random variable whose value determines
whether we reject or do not reject the null hypothesis.
2. Decision or Rejection Rule: A decision rule specifies the set of values for the
test statistic for which the null hypothesis \(H_0\) will be rejected and the set of values
for which \(H_0\) will not be rejected.
3. Critical Region or Rejection Region: The critical region of a test consists of
all the values of the test statistic for which \(H_0\) will be rejected.
4. Non-Rejection Region: The non-rejection region of a test consists of all the
values of the test statistic for which \(H_0\) will not be rejected.
5. Critical Values: Critical values of a test statistic separate the critical region from
the non-rejection region.
6. Level of Significance (usually denoted as \(\alpha\)): The level of significance of a test
is the probability that the test statistic lies in the critical region or rejection region
when \(H_0\) is true.
7. Two-Sided Alternative: An alternative hypothesis involving all possible values of
a population parameter other than the value specified by a simple null hypothesis.
8. One-Sided Alternative: An alternative hypothesis involving all possible values
of a population parameter on either one side or the other of (that is, either greater
than or less than) the value specified by a simple null hypothesis.
9.7 Summary of Concepts of a Hypothesis Test
• Judgments in the form of hypothesis testing involve an a priori assumption
about the value of an unknown parameter.
• If the sample information provides evidence against the null hypothesis we reject
it, otherwise we do not reject it.
• The evidence from a sample is summarized in the form of a test statistic which is
used in arriving at a verdict concerning the hypothesis.
9.8 Steps in Conducting a Hypothesis Test
1. Formulate the null and alternative hypotheses (\(H_0\), \(H_1\)).
2. Choose the level of significance \(\alpha\) and hence define the critical value (i.e. divide the
region into rejection and non-rejection regions).
3. Calculate the test statistic using sample information.
4. If the calculated statistic falls within the rejection region, reject the null hypothesis;
if it is in the non-rejection region, do not reject the null hypothesis.
9.9 A Generic Example for a Two-Sided Alternative [Transparency 9.4]
1. Formulate Null and Alternative Hypotheses
\[H_0: \mu = \mu_0\]
\[H_1: \mu \ne \mu_0\]
2. Choose level of significance \(\alpha\) (say 5% level)
3. Calculate test statistic (with data)
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]
We are testing on the basis of sample information whether \(\mu = \mu_0\) (where \(\mu_0\) is
a specified (known) number). In this case the test statistic is a t-statistic. If the null
hypothesis is true (\(\mu = \mu_0\)) then this statistic is distributed as \(t_{n-1}\).
4. Critical Region for decision rule: Reject \(H_0\) if
\[|t| > t_{\alpha/2,\,n-1}\]
Do not reject \(H_0\) if:
\[-t_{\alpha/2,\,n-1} < t < t_{\alpha/2,\,n-1}\]
• Notice carefully that there are really two critical values for two-sided tests:
\[\pm t_{\alpha/2,\,n-1}\]
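The generic recipe above can be written as two small functions. This is a sketch only: the function names are hypothetical, and the critical value is passed in by the caller (read from a t-table) rather than computed. The value 2.045 used below for \(t_{.05/2,\,29}\) is a standard table entry that does not appear in these notes.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """Step 3: standardized distance of the sample mean from the null value."""
    return (xbar - mu0) / (s / math.sqrt(n))

def two_sided_decision(t_stat, t_crit):
    """Step 4: reject H0 when |t| exceeds the critical value t_{alpha/2, n-1}."""
    return "reject H0" if abs(t_stat) > t_crit else "do not reject H0"

# Sales example (units of $100) at two significance levels.
t = t_statistic(xbar=5.1, mu0=5.5, s=0.9, n=30)
print(two_sided_decision(t, t_crit=2.756))  # alpha = .01, critical value from the text
print(two_sided_decision(t, t_crit=2.045))  # alpha = .05, critical value from a t-table
```

Keeping the critical value as an argument makes it explicit that the same statistic can lead to different decisions at different significance levels.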
9.10 One-Sided Alternatives
• The hypothesis test above was for a two-sided alternative:
\[H_1: \mu \ne \mu_0\]
• Suppose in the previous example it was thought that sales probably had fallen from
5.5 (everyone was confident that they could rule out a rise in sales).
• We might wish to incorporate this belief right into the hypothesis test.
• This is accomplished by a one-sided alternative.
• Redo the example for this:
1. Formulate the hypotheses (for some specified value of \(\mu_0\))
\[H_0: \mu \ge 5.5\]
\[H_1: \mu < 5.5\]
Notice that the alternative is narrowed in the direction where we think sales
are (in the event that the null is false).
2. The level of significance is still \(\alpha\).
3. The test statistic is unchanged.
4. The decision rule is based on the critical value \(t_{\alpha,\,n-1}\) (not \(t_{\alpha/2,\,n-1}\)) so
that we reject the null hypothesis if
\[t < -t_{\alpha,\,n-1}\]
• Otherwise we retain or do not reject \(H_0\).
• The calculated value is unchanged at -2.434 but \(-t_{.01,\,29} = -2.462\), which
means that we barely retain the null hypothesis for the one-sided alternative.
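A short sketch of the one-sided rule follows. The function name and the `tail` argument are hypothetical; the critical value 2.462 is taken from the text. The "lower" branch is the rule used here (alternative \(\mu < \mu_0\)); the "upper" branch is its mirror image for an alternative of the form \(\mu > \mu_0\).

```python
import math

def one_sided_reject(t_stat, t_crit, tail):
    """Return True if H0 is rejected.

    tail="lower": H1 is mu < mu0, reject when t < -t_crit.
    tail="upper": H1 is mu > mu0, reject when t > +t_crit.
    """
    if tail == "lower":
        return t_stat < -t_crit
    if tail == "upper":
        return t_stat > t_crit
    raise ValueError("tail must be 'lower' or 'upper'")

# Sales example (units of $100): same statistic as in the two-sided test.
t_stat = (5.1 - 5.5) / (0.9 / math.sqrt(30))
rejected = one_sided_reject(t_stat, t_crit=2.462, tail="lower")  # t_{.01, 29} from the text

print(f"t = {t_stat:.3f}, reject H0: {rejected}")
```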
9.11 Notes on One-Sided Alternatives
• This is an example of a one-sided test, since the alternative hypothesis includes
either the less than “<” or the greater than “>” condition.
\[H_0: \mu \ge \mu_0\]
\[H_1: \mu < \mu_0\]
• We could change the nature of the critical value (and hence the rejection and
non-rejection region) by changing the hypothesis test to:
\[H_0: \mu \le \mu_0\]
\[H_1: \mu > \mu_0\]
[Transparency 9.3]
• In this case we would calculate the same test statistic as above but the rejection
rule would be
\[t > t_{\alpha,\,n-1}\]
• The inequality in \(H_1\) is a useful memory aid to decide whether you want to use the
positive critical value (>) or the negative critical value (<).
• Of course for the two-sided alternative (≠) you use ± the critical value.
9.11.1 Reason for a One-Sided Alternative
• We note that since \(t_{\alpha,\,n-1} < t_{\alpha/2,\,n-1}\), for the same calculated value it is
possible to retain the null hypothesis for the two-sided alternative while rejecting
for the one-sided alternative.
• Whether we want to reject the null or not depends on whether it is true or not.
• We never know whether the null hypothesis is true (after all, why would we test it if
we knew?).
9.12 Type I & Type II Errors
• It is very easy to lose sight of the fact that we DO NOT KNOW whether the
null is true or not (if we did why do we need to do any test).
There are 2 kinds of errors we can make:
1. Type I Error: Rejecting \(H_0\) when \(H_0\) is true
2. Type II Error: Not Rejecting \(H_0\) when \(H_0\) is false

                      H0 is true                        H0 is false
Do not Reject H0      Correct decision (1 − α)          Type II Error (β)
Reject H0             Type I Error (α)                  Correct decision (1 − β, called power)
9.13 Probability of Type I and II Errors
\[\alpha = P(\text{Type I Error}) = P(\text{we reject } H_0 \mid H_0 \text{ is true})\]
\[= P(\text{Test statistic lies in the rejection region} \mid H_0 \text{ is true})\]
\[\beta = P(\text{Type II Error}) = P(\text{we do not reject } H_0 \mid H_0 \text{ is false})\]
\[\text{Power} = 1 - \beta\]
• Power measures the probability of correctly rejecting \(H_0\) when \(H_0\) is false.
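The definition of \(\alpha\) as a rejection frequency under a true null can be illustrated by simulation. The sketch below is not part of the original notes: it assumes normally distributed data with mean 5.5 and standard deviation 0.9 (so \(H_0: \mu = 5.5\) is true by construction), draws repeated samples of size 30, and counts how often the two-sided rule \(|t| > 2.756\) rejects. The function names, trial count and seed are arbitrary choices.

```python
import math
import random

def rejects_null(sample, mu0, t_crit):
    """Apply the two-sided t rule to one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    return abs(t_stat) > t_crit

def estimate_type1_rate(trials=10000, n=30, mu0=5.5, sigma=0.9, t_crit=2.756, seed=0):
    """Fraction of samples, drawn with H0 true, for which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if rejects_null(sample, mu0, t_crit):
            rejections += 1
    return rejections / trials

rate = estimate_type1_rate()
print(f"estimated P(Type I error) = {rate:.4f}")
```

With a critical value chosen for \(\alpha = .01\), the estimated rate should be close to .01, up to simulation noise.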
9.13.1 Example of Probability of Type I and II
1. What is P(Type I Error)? Answer: \(\alpha\), which for the above example is \(\alpha = .01\).
2. What is P(Type II Error) and Power?
• To answer this question we must consider values for \(\mu\) that are in \(H_1\) and
calculate the probabilities of retaining \(H_0\) under various values in the alternative.
• The null can be false in MANY ways under the alternative.
9.14 Power Calculation
• Let us calculate the probability of retaining \(H_0: \mu = 5.5\) when the true \(\mu = 5.1\)
(which also happens to be the sample mean, but other values could also be chosen).
• Suppose Truth: \(\mu = 5.1\)
• Test Null at: \(\mu = 5.5\)
• What is our Decision Rule? Rejection rule: reject \(H_0\) if \(|t| > 2.756\)
9.14.1 Calculating P(Type II Error) and Power
1. We assume that the variance is unchanged under \(H_0\) and \(H_1\) and use the estimate
\(s/\sqrt{n}\).
2. We want to calculate what the critical values are in terms of \(\bar{x}\).
3. We know that we retain \(H_0: \mu = 5.5\) whenever our calculated t-statistic satisfies
\(|t| < 2.756\):
\[P\left\{-2.756 < \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < 2.756\right\} = .99\]
which after some manipulation can be written
\[P\left\{\mu_0 - 2.756 \times \frac{s}{\sqrt{n}} < \bar{x} < \mu_0 + 2.756 \times \frac{s}{\sqrt{n}}\right\} = .99\]
Substituting \(\mu_0 = 5.5\) and the estimated standard deviation gives:
\[P\left\{5.5 - 2.756 \times \frac{.9}{\sqrt{30}} < \bar{x} < 5.5 + 2.756 \times \frac{.9}{\sqrt{30}}\right\} = .99\]
4. This leads to 99% critical values in terms of the sample mean \(\bar{x}\):
\[(5.047,\ 5.953) \tag{9.1}\]
This gives all the values for the sample mean that would not be rejected for
\(H_0: \mu = 5.5\) at the \(\alpha = .01\) level of significance. Note that our sample mean
\(\bar{x} = 5.1\) is in the interval and hence we did not reject the null hypothesis. We want
to find out the probability of being in the interval (5.047, 5.953) for various values
in the alternative; we start with \(\mu = 5.1\).
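5. Calculate the probability of a Type II Error
\[\beta = P\{\text{Type II Error}\}\]
\[= P\{\text{do not reject } H_0 \mid H_0 \text{ is false}\}\]
\[= P\{\bar{x} \text{ lies in the non-rejection region} \mid H_0 \text{ is false}\}\]
\[= P\{5.047 < \bar{x} < 5.953 \mid \mu = 5.1\}\]
\[= P\left\{\frac{5.047 - 5.1}{.9/\sqrt{30}} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < \frac{5.953 - 5.1}{.9/\sqrt{30}}\right\}\]
\[= P\{-.3225 < t < 5.1911\}\]
\[\approx P\{t < .3225\}\]
\[= .626\]
\[\text{Power} = 1 - \beta = 1 - .626 = .374\]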
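9.15 Notes on Power
• The calculation above gives the probability of a Type II error, \(\beta = .626\), when
testing \(H_0: \mu = 5.5\) when the true value is \(\mu = 5.1\).
• Note the interval (5.047, 5.953) is not the same as the confidence interval: the former
is centered on \(\mu_0 = 5.5\), the latter on \(\bar{x} = 5.1\).
• The confidence interval for the population mean \(\mu\) is
\[\bar{x} \pm t_{.01/2,\,29} \times \frac{s}{\sqrt{n}}\]
which gives (4.65, 5.55).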
• We can repeat this calculation for all possible alternatives under \(H_1: \mu \ne 5.5\).
• For example let us do another calculation on the other side of the null, say \(\mu = 5.7\):
\[\beta = P(5.047 < \bar{x} < 5.953 \mid \mu = 5.7)\]
\[= P\left(\frac{5.047 - 5.7}{.9/\sqrt{30}} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < \frac{5.953 - 5.7}{.9/\sqrt{30}}\right)\]
\[= P(-3.974 < t < 1.539) \approx P(t < 1.539) = .938\]
• Now we can calculate the Probability of Type II Error and Power for a variety of values
under \(H_1\).

Power Calculations for Testing \(H_0: \mu = 5.5\)

Value under H1     Probability of Type II Error (β)     Power (1 − β)
μ = 4.4            0                                    1
μ = 4.5            .001                                 .9999
μ = 4.6            .003                                 .997
μ = 4.7            .017                                 .98
μ = 4.8            .067                                 .93
μ = 5.0            .386                                 .614
μ = 5.1            .626                                 .374
μ = 5.49999        .99                                  .01
⋮                  ⋮                                    ⋮
μ = 5.7            .938                                 .062

• Power Curve: Plot of power on the y-axis and value of \(\mu\) on the x-axis.
9.15.1 Notes on Type I and II Error
• We can make a Type I error only when we reject \(H_0\) and a Type II error only when
we do not.
• We want both \(\alpha\) and \(\beta\) to be small.
• While we would like both Type I and Type II Errors to be as small as possible,
there is in fact a trade-off.
• Suppose the null hypothesis is that those charged with crimes are innocent.
• Then a legal test which never convicts the innocent (has \(\alpha = 0\)) would free many
who are guilty (large \(\beta\)).
• Lowering \(\alpha\) will result in a wider non-rejection region which makes it more likely
that a false null hypothesis will be retained.
• To see this redo the above exercise with \(\alpha = .005\) and \(.05\).
• Since our interest is usually centered on the null hypothesis, \(\alpha\) is usually chosen to
be small; 10 percent or less.
• The null and alternative are not treated symmetrically; rejecting the null
does not imply that the alternative is true.
• Alternative is not under test.
• Do not say “we accept the alternative”.
• We have seen that the closer the true value of \(\mu\) in \(H_1\) is to the value under
\(H_0\), the larger is the probability of Type II error and hence the lower is power.
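Both effects described in these notes (the trade-off between \(\alpha\) and \(\beta\), and the loss of power as the true mean approaches the null value) can be seen numerically. The sketch below reuses the sales example and makes the same assumptions as a normal-approximation power calculation: probabilities come from the standard normal CDF, the helper name is hypothetical, and the critical value 2.045 for \(\alpha = .05\) is a standard t-table entry for 29 degrees of freedom that does not appear in the text.

```python
import math
from statistics import NormalDist

def type2_error(mu_true, t_crit, mu0=5.5, s=0.9, n=30):
    """Normal-approximation probability of retaining H0: mu = mu0 when mu = mu_true."""
    se = s / math.sqrt(n)
    lower, upper = mu0 - t_crit * se, mu0 + t_crit * se
    z = NormalDist()
    return z.cdf((upper - mu_true) / se) - z.cdf((lower - mu_true) / se)

# Trade-off: a smaller alpha means a larger critical value and a larger beta.
beta_alpha_01 = type2_error(5.1, t_crit=2.756)  # alpha = .01 (critical value from the text)
beta_alpha_05 = type2_error(5.1, t_crit=2.045)  # alpha = .05 (critical value from a t-table)
print(f"beta at alpha=.01: {beta_alpha_01:.3f}, beta at alpha=.05: {beta_alpha_05:.3f}")

# Closeness to the null: beta rises as the true mean approaches mu0 = 5.5.
for mu in (4.8, 5.0, 5.1, 5.3, 5.4):
    print(f"mu = {mu}: beta = {type2_error(mu, t_crit=2.756):.3f}")
```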
9.16 Prob- or p-Values [Transparency 9.10 and 9.11]
• It is arbitrary that the rejection/non-rejection of a test depends on the choice of
\(\alpha\).
• An alternative way to report one’s results is to quote the test statistic with a
p-value, or prob-value.
• This allows the user to choose a particular \(\alpha\) and make their own decision using the
reported p-value.
• A p-value is the probability, calculated under the null hypothesis, that the test
statistic is at least as large (in absolute value) as the value actually calculated.
• It is simply the area in the statistic’s density beyond the point actually observed.
9.16.1 Example of a p-value
In our example of customer expenditure our test statistic was -2.43. This
has a p-value of:
\[p = P(t < -2.43) = .011\]
• For two-sided alternatives you will see authors report the p-value as .011 × 2
= .022 (reflecting that both large positives and negatives of the statistic are possible).
• If the null hypothesis \(H_0: \mu = 5.5\) (against \(H_1: \mu \ne 5.5\)) is true, there is only
about a 2% chance of observing a sample mean at least as far from 5.5 as 5.1.
• p-values can be found from tables in the textbook or from the tables built into
computer packages like STATA.
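As an alternative to tables, the tail area can be obtained by integrating the t density numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and Simpson's rule is a generic numerical technique chosen here for illustration, not the method used by any particular statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_abs, df, steps=2000):
    """P(T > t_abs) for t_abs >= 0, using Simpson's rule on [0, t_abs].

    steps must be even. By symmetry, P(T > t_abs) = 0.5 - P(0 < T < t_abs).
    """
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

one_sided = t_upper_tail(2.434, df=29)   # equals P(t < -2.434) by symmetry
two_sided = 2 * one_sided

print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```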
9.16.2 Interpretation and Use of P-Values
• If the p-value is less than a chosen level of the test \(\alpha\), one rejects the null
hypothesis at the \(\alpha\) level.
\[p\text{-value} < \alpha \Rightarrow \text{Reject } H_0 \tag{9.2}\]
\[p\text{-value} \ge \alpha \Rightarrow \text{Do not Reject } H_0 \tag{9.3}\]
In the above example
\[p\text{-value} = 0.022\]
Therefore for \(\alpha = .05\) we would reject \(H_0\) but for \(\alpha = .01\) we would retain \(H_0\).
• One can also report p-values and let readers decide on their own significance levels.
• We now have all the tools necessary to do any hypothesis test.
• In the rest of this chapter we will consider other applications.
9.17 Testing Proportions
• We can test hypotheses about the number of successes in \(n\) trials, or about the
proportion of successes.
• In the binomial distribution we know the standard deviation of the number of
successes \(X\) and of the sample proportion \(\hat{p}\) under the null, which we can use
together with the standard normal tables (in fact better approximation results can
often be obtained by using t-distributions) for one- or two-sided tests.
\[H_0: \pi = \pi_0\]
\[H_1: \pi \ne \pi_0\]
• Form test statistic (either a Z statistic or t depending on the degrees of freedom):
\[t = \frac{\hat{p} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}\]
9.17.1 Example of Test for Population Proportion
Let us return to the mini survey conducted by Employment and Immigration Canada.
They survey \(n = 9\) unemployed persons. We have seen how they tested an hypothesis
about the mean duration of unemployment. Now suppose they want to learn the
proportion of searchers who receive a job offer within the first six weeks of
unemployment. Of the 9 people surveyed, 2 receive such offers. Suppose that:
\[H_0: \pi \ge .5\]
\[H_1: \pi < .5\]
Let \(\alpha = .05\).
• Then the rejection rule is: reject \(H_0\) if \(t < -1.860\) (\(t_{.05,\,8} = 1.860\)).
• The test statistic is:
\[t = \frac{.222 - .5}{\sqrt{.5(.5)/9}} = -1.667\]
• The null hypothesis is not rejected at the 5 percent level.
• You can see in this example that the null is not rejected, even though there seems
to be a large gap between the agency’s hypothesis and the sample proportion.
• The null is not easily rejected because the sample is so small, so that sampling
variability is large.
• The p-value for this problem is \(P(t < -1.667) = .067\), again showing that the
hypothesis would not be rejected at the 5 percent level.
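The proportion example can be reproduced in a few lines. This is a sketch with illustrative variable names; the critical value 1.860 is copied from the text. Note that the standard error uses the null value \(\pi_0\), not the sample proportion.

```python
import math

n = 9             # number of people surveyed
successes = 2     # number who received an offer within six weeks
pi0 = 0.5         # proportion under the null hypothesis
t_crit = 1.860    # t_{.05, 8}, taken from the text

p_hat = successes / n
std_err = math.sqrt(pi0 * (1 - pi0) / n)   # standard deviation of p_hat under H0
t_stat = (p_hat - pi0) / std_err
reject = t_stat < -t_crit                  # lower-tail rule, since H1 is pi < .5

print(f"p_hat = {p_hat:.3f}, t = {t_stat:.3f}, reject H0: {reject}")
```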
Chapter 10
Hypothesis Testing: Additional Topics
10.1 Tests of Differences of Population Means
• Suppose that we have two samples with \(n_1\) observations, a mean \(\bar{x}_1\), and sample
standard deviation \(s_1\) in the first, and \(n_2\) observations, a mean \(\bar{x}_2\), and sample
standard deviation \(s_2\) in the second.
• Data with this property are most likely to arise from experiments in which two
treatments are applied. The testing problem is:
\[H_0: \mu_1 = \mu_2\]
which can be written as:
\[H_0: \mu_1 - \mu_2 = 0\]
• A general hypothesis test of differences is:
\[H_0: \mu_1 - \mu_2 = D_0\]
• where \(D_0\) is the hypothesized difference (usually \(D_0 = 0\)).
10.2 Testing Differences when Variances are the Same
• We assume that the two populations have the same variance (see Chapters 7 and 8),
\[\sigma_1 = \sigma_2 = \sigma\]
• The population standard deviation of the difference (assuming independence) is
\[\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
• A pooled estimate of \(\sigma^2\) is
\[s_p^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]
• We then multiply \(s_p^2\) by \(\left[\frac{1}{n_1} + \frac{1}{n_2}\right]\). Taking the square root gives us the
estimated standard deviation of the difference in sample means:
\[s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \times \left[\frac{1}{n_1} + \frac{1}{n_2}\right]}\]
• The test statistic is:
\[t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}\]
• As before we have three different rejection rules (all use the same test statistic)
depending on the alternative:
\[H_0: \mu_1 = \mu_2\]
\[H_1: \mu_1 \ne \mu_2\]
• Rejection Rule: if \(|t| > t_{\alpha/2,\,n_1+n_2-2}\) then reject \(H_0\)
\[H_0: \mu_1 \le \mu_2\]
\[H_1: \mu_1 > \mu_2\]
• Rejection Rule: if \(t > t_{\alpha,\,n_1+n_2-2}\) then reject \(H_0\)
\[H_0: \mu_1 \ge \mu_2\]
\[H_1: \mu_1 < \mu_2\]
• Rejection Rule: if \(t < -t_{\alpha,\,n_1+n_2-2}\) then reject \(H_0\)
10.2.1 Example of Testing Differences in Population Means
A market research firm wishes to know if the mean number of hours of TV watching per
week is the same for teenage boys as for teenage girls. The following data were obtained:
Boys: \(n_1 = 20\), \(\bar{x}_1 = 24.5\), \(s_1^2 = 64\)
Girls: \(n_2 = 12\), \(\bar{x}_2 = 28.7\), \(s_2^2 = 71\)
Carry out a hypothesis test that boys and girls watch the same number of hours of
TV at the 5% level of significance.
\[H_0: \mu_1 = \mu_2\]
\[H_1: \mu_1 \ne \mu_2\]
• Why two-sided? We do not have any reason to believe that girls and boys differ in a
particular direction.
245 − 287 − 0
= −141
 = q
6657
6657
[ 20 + 12 ]
since
2 =
(19)64 + (11)71
(1 − 1)21 + (2 − 1)22
=
= 6657
1 + 2 − 2
30
• As  = −141 lies in the non-rejection region (05230 = ±2042) we do not reject
the hypothesis at the 5% level of significance that boys and girls watch the same
number of TV hours.
• Note that  (  −141) = 084 and therefore the p-value = 2 × 084 =.168.
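The pooled-variance calculation for the TV example can be checked as follows. This is a sketch with illustrative variable names; the critical value 2.042 is taken from the text.

```python
import math

# Summary statistics from the TV-watching example.
n1, xbar1, var1 = 20, 24.5, 64.0   # boys: size, mean, sample variance
n2, xbar2, var2 = 12, 28.7, 71.0   # girls: size, mean, sample variance
t_crit = 2.042                     # t_{.05/2, 30}, taken from the text

# Pooled estimate of the common variance, weighted by degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Estimated standard deviation of the difference in sample means.
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (xbar1 - xbar2 - 0) / std_err   # hypothesized difference is 0
reject = abs(t_stat) > t_crit

print(f"pooled variance = {pooled_var:.2f}, t = {t_stat:.2f}, reject H0: {reject}")
```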
10.3 Testing Differences when Variances are Different
• Tests are conducted exactly the same way as before except the formula for \(s_{\bar{x}_1 - \bar{x}_2}\)
and the degrees of freedom are different.
\[s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
and the degrees of freedom is a crazy formula that can be found in the book:
\[\nu = \frac{\left[\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right]^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}\]
• For convenience I have always used \(\nu = n_1 + n_2 - 2\) and hoped I was not too far
off.
• Redo the above TV example by not assuming the variances are the same. Is there
any difference to your conclusions?
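One way to approach the exercise is sketched below for the TV example, using the unequal-variance standard error and degrees-of-freedom formula. The variable names are illustrative. The critical value 2.074 for \(t_{.05/2,\,22}\) is a standard t-table entry that does not appear in the text, and rounding the degrees of freedom down to a whole number is a conventional (conservative) choice assumed here.

```python
import math

n1, xbar1, var1 = 20, 24.5, 64.0   # boys: size, mean, sample variance
n2, xbar2, var2 = 12, 28.7, 71.0   # girls: size, mean, sample variance

v1, v2 = var1 / n1, var2 / n2      # per-sample variance of each sample mean

std_err = math.sqrt(v1 + v2)       # no pooling of the two variances
t_stat = (xbar1 - xbar2) / std_err

# Degrees of freedom from the formula in the text.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

t_crit = 2.074                     # t_{.05/2, 22} from a standard t-table (assumption)
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df:.2f} (use {math.floor(df)}), reject H0: {reject}")
```

The resulting degrees of freedom fall below \(n_1 + n_2 - 2 = 30\), so the shortcut mentioned above is slightly less conservative than the formula.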
10.4 Testing Differences of Population Proportions
• Recall that the variance of a sample proportion \(\hat{p}\) is estimated by \(\hat{p}(1 - \hat{p})/n\).
• Then if we have two independent samples, for hypotheses about the difference
between the two population proportions (and hypotheses are always about populations)
\[H_0: \pi_1 - \pi_2 = D_0\]
\[H_1: \pi_1 - \pi_2 \ne D_0\]
• the test statistic is:
\[z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}}\]
• Often the null hypothesis will be that the two proportions are equal:
\[H_0: \pi_1 = \pi_2\]
• In the equality of proportion case the formula simplifies to:
\[z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0(1 - \hat{p}_0)(1/n_1 + 1/n_2)}}\]
where
\[\hat{p}_0 = \frac{x_1 + x_2}{n_1 + n_2}\]
and \(x_1\), \(x_2\) are the numbers of successes in the two samples.
10.4.1 Example of Testing Differences in Population Proportions
In a sample of 400 products produced by Machine 1, 23 were defective and in a sample
of 400 products produced by Machine 2, 17 were defective.
Test:
\[H_0: \pi_1 - \pi_2 = 0\]
against
\[H_1: \pi_1 - \pi_2 \ne 0\]
using a 5% level of significance.
Answer
From the question we know
\[\hat{p}_1 = \frac{23}{400}, \qquad \hat{p}_2 = \frac{17}{400}\]
The pooled estimate \(\hat{p}_0\) is:
\[\hat{p}_0 = \frac{x_1 + x_2}{n_1 + n_2} = \frac{23 + 17}{400 + 400} = .05\]
Therefore
\[z = \frac{\frac{23}{400} - \frac{17}{400}}{\sqrt{(.05)(.95)(1/400 + 1/400)}} = \frac{.015}{.01541} = 0.973\]
• Since \(z\) is in the non-rejection region (-1.96, 1.96) we do not reject \(H_0\) at the 5%
level of significance.
• Note that since \(P(z > .973) = .165\), the p-value = .165 × 2 = 0.33.
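The statistic and its two-sided p-value can be recomputed directly from the defect counts. This is a sketch with illustrative variable names; the p-value uses the standard normal distribution from Python's `statistics.NormalDist`, and 1.96 is the 5% two-sided critical value quoted in the text.

```python
import math
from statistics import NormalDist

x1, n1 = 23, 400   # defectives and sample size, Machine 1
x2, n2 = 17, 400   # defectives and sample size, Machine 2

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: pi1 = pi2

std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / std_err

p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
reject = abs(z) > 1.96                         # 5% two-sided rule

print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```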
10.5 Testing the Hypothesis \(\mu_1 = \mu_2\) with Paired Data
• In testing whether the difference between the two means differed from 0 or \(D_0\) we
assumed that the two samples were independent.
• Hence the estimated variance of the difference was the sum of the two variances.
• We could do the testing under the assumption that the variances were the same
(pooled variance) or different.
• On occasion, we may have paired data.
• This is data that is grouped or paired so that the variation in responses between
the members of any pair is less than the variation between members of different
pairs.
• We can improve the efficiency (lower the variance) of the experiment by randomizing
the two treatments over the two members of each pair.
• We restrict the randomization so that the treatment is given to one member of each
pair, and obtain a separate estimate of the difference between the treatment effects
for each pair.
• The variation among the pairs is not included in our estimate of the variance.
• Hence if this variation is large relative to the variation within pairs, the variance
from paired tests will be smaller than that from a completely randomized (independent) sampling experiment.
• This motivates the use of twins in some experiments.
• In economics we seldom (ever?) have paired data and so we will not pursue this
matter.
• In hospital, clinical and drug testing settings, paired data are often available.