Chapter 9 Hypothesis Testing

9.1 Testing Hypotheses

• With our knowledge of interval estimation, we can consider hypothesis tests.
• An Example of an Hypothesis Test: Statisticians at Employment and Immigration Canada believe that the average duration of unemployment in Alberta is less than 6 weeks. They want to test:

\[ H_0: \mu \le 6 \qquad H_1: \mu > 6 \]

NOTE:
1. \(H_0\) is called the null hypothesis. It is typically what we are interested in. It is a maintained hypothesis that is held to be true unless sufficient evidence to the contrary is obtained.
2. \(H_1\) is called the alternative hypothesis. It is a hypothesis against which the null hypothesis is tested and which will be held to be true if the null is held false.

[Figure 9.1: slide "The Null Hypothesis, H0" (Statistics for Business and Economics, 6e, © 2007 Pearson Education, Inc., Chap 10-4). The null hypothesis states the (numerical) assumption to be tested. Example: the average number of TV sets in U.S. homes is equal to three (\(H_0: \mu = 3\)). It is always about a population parameter (\(H_0: \mu = 3\)), not about a sample statistic (\(H_0: \bar{X} = 3\)).]

9.2 A Two-Sided Hypothesis Test Example

A firm's sales records show that customers spend on average $550 per month on their product. They wish to know whether this has changed this year, using a significance level of .01. They survey 30 customers and find that the mean expenditure is $510 with a sample standard deviation of $90. We can follow the steps of a hypothesis test, working in hundreds of dollars (so \(\bar{x} = 5.1\) and \(s = .9\)):

1. Formulate the null and alternative hypotheses:
\[ H_0: \mu = 5.5 \ \text{(null hypothesis)} \qquad H_1: \mu \ne 5.5 \ \text{(alternative hypothesis)} \]
2. Level of significance of the test: say \(\alpha = .01\).
3. Calculate the test statistic as
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.1 - 5.5}{.9/\sqrt{30}} = -2.434 \]
4. Critical region. Rejection rule: reject \(H_0\) if \(|t| > t_{\alpha/2,\,n-1}\). The critical value is \(t_{.01/2,\,29} = 2.756\) (\(n = 30\)). We do not reject \(H_0\) since \(|t| < t_{.01/2,\,29} = 2.756\).

9.3 Interpretation and Notes

1. We say the null hypothesis that the population mean is equal to 5.5 is not rejected at the 1% level of significance.
This wording makes it clear that the null might be rejected if we were to choose a higher level of significance (try \(\alpha = .10\)). Notice we never "accept" a null hypothesis.
2. The idea is that the difference between \(\bar{x}\) and \(\mu_0\) is not significant, since it could arise from sampling variability under the null. Even if \(H_0\) were true we would expect to see some samples with \(\bar{x} \ne 5.5\).
3. If we reject \(H_0\) it implies the difference between \(\bar{x}\) and \(\mu_0\) is too large to be attributed to ordinary sampling variability.
4. The whole trick in hypothesis testing is to figure out the correct statistic to construct (one with a known sampling distribution) and then to find a rejection region. Drawing a picture often helps one avoid mistakes with rejection regions.
5. We know that if the \(X_i\) are normal then we can use the t-distribution in situations where we are estimating the variance.
6. On the other hand, if the \(X_i\) are not normally distributed we can appeal to the central limit theorem: \(\bar{x}\) is approximately normally distributed as \(n\) gets large, so we can still use the t-distribution. Also, as we have seen, for \(n > 30\) the t-distribution is close to the normal and often critical values are calculated directly from the normal tables.

9.4 Confidence Intervals and Hypothesis Tests

• A final relation between hypothesis tests and confidence intervals can be stated.
• If one calculates a 95% confidence interval for \(\mu\) and finds that the value \(\mu_0\) is contained in the interval, then we know that the null hypothesis (\(H_0: \mu = \mu_0\)) would not be rejected at the 5% level of significance.
• Similarly, if the confidence interval did not contain the value \(\mu_0\), then the null hypothesis is rejected.
• Once we have calculated the confidence interval, we have in fact obtained all the possible null hypotheses that would be retained at the chosen significance level for this particular sample.
• For the sales example (a 99% interval, matching \(\alpha = .01\)) the two-sided confidence interval is:
\[ \bar{x} \pm t_{.01/2,\,29} \times \frac{s}{\sqrt{n}} \]
• The calculation gives (4.65, 5.55), which contains 5.5, the value under the null hypothesis.

9.6 Definitions and Terms

1. Test Statistic: A test statistic is a random variable whose value determines whether we reject or do not reject the null hypothesis.
2. Decision or Rejection Rule: A decision rule specifies the set of values for the test statistic for which the null hypothesis \(H_0\) will be rejected and the set of values for which \(H_0\) will not be rejected.
3. Critical Region or Rejection Region: The critical region of a test consists of all the values of the test statistic for which \(H_0\) will be rejected.
4. Non-Rejection Region: The non-rejection region of a test consists of all the values of the test statistic for which \(H_0\) will not be rejected.
5. Critical Values: Critical values of a test statistic separate the critical region from the non-rejection region.
6. Level of Significance (usually denoted \(\alpha\)): The level of significance of a test is the probability that the test statistic lies in the critical region or rejection region when \(H_0\) is true.
7. Two-Sided Alternative: An alternative hypothesis involving all possible values of a population parameter other than the value specified by a simple null hypothesis.
8. One-Sided Alternative: An alternative hypothesis involving all possible values of a population parameter on either one side or the other of (that is, either greater than or less than) the value specified by a simple null hypothesis.

9.7 Summary of Concepts of a Hypothesis Test

• Judgments in the form of hypothesis testing involve an a priori assumption about the value of an unknown parameter.
• If the sample information provides evidence against the null hypothesis we reject it; otherwise we do not reject it.
• The evidence from a sample is summarized in the form of a test statistic, which is used in arriving at a verdict concerning the hypothesis.

9.8 Steps in Conducting an Hypothesis Test

1. Formulate the null and alternative hypotheses (\(H_0\), \(H_1\)).
2. Choose the level of significance \(\alpha\) and hence define the critical value (i.e. divide the region into rejection and non-rejection regions).
3. Calculate the test statistic using sample information.
4. If the calculated statistic falls within the rejection region, reject the null hypothesis; if it is in the non-rejection region, do not reject the null hypothesis.

9.9 A Generic Example for Two-Sided Alternative: [Transparency 9.4]

1. Formulate the null and alternative hypotheses:
\[ H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0 \]
2. Choose the level of significance \(\alpha\) (say the 5% level).
3. Calculate the test statistic (with data):
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
We are testing on the basis of sample information whether \(\mu = \mu_0\) (where \(\mu_0\) is a specified (known) number). In this case the test statistic is a t-statistic. If the null hypothesis is true (\(\mu = \mu_0\)) then this statistic is distributed as \(t_{n-1}\).
4. Critical region for the decision rule:
• Reject \(H_0\) if \(|t| > t_{\alpha/2,\,n-1}\)
• Do not reject \(H_0\) if \(-t_{\alpha/2,\,n-1} \le t \le t_{\alpha/2,\,n-1}\)
• Notice carefully that there are really two critical values for two-sided tests: \(\pm t_{\alpha/2,\,n-1}\)

9.10 One-Sided Alternatives

• The hypothesis test above was for a two-sided alternative, \(H_1: \mu \ne \mu_0\).
• Suppose in the previous example it was thought that sales probably had fallen from 5.5 (everyone was confident that they could rule out a rise in sales).
• We might wish to incorporate this belief right into the hypothesis test.
• This is accomplished by a one-sided alternative.
• Redo the example for this:
1. Formulate the hypotheses (for some specified value of \(\mu_0\)):
\[ H_0: \mu \ge 5.5 \qquad H_1: \mu < 5.5 \]
Notice that the alternative is narrowed in the direction where we think sales are (in the event that the null is false).
2. The level of significance is still \(\alpha\).
3. The test statistic is unchanged.
4. The decision rule is based on the critical value \(t_{\alpha,\,n-1}\) (not \(t_{\alpha/2,\,n-1}\)), so that we reject the null hypothesis if
• \(t < -t_{\alpha,\,n-1}\)
• Otherwise we retain or do not reject \(H_0\).
• The calculated value is unchanged at -2.434 but \(-t_{.01,\,29} = -2.462\), which means that we barely retain the null hypothesis for the one-sided alternative.

9.11 Notes on One-Sided Alternatives

• This is an example of a one-sided test, since the alternative hypothesis includes either the less than "<" or the greater than ">" condition:
\[ H_0: \mu \ge \mu_0 \qquad H_1: \mu < \mu_0 \]
• We could change the nature of the critical value (and hence the rejection and non-rejection region) by changing the hypothesis test to: [Transparency 9.3]
\[ H_0: \mu \le \mu_0 \qquad H_1: \mu > \mu_0 \]
• In this case we would calculate the same test statistic as above but the rejection rule would be \(t > t_{\alpha,\,n-1}\).
• The inequality for \(H_1\) is a useful memory aid to decide whether you want to use the positive critical value (>) or the negative critical value (<).
• Of course for the two-sided alternative you use the ± critical value (≠).

9.11.1 Reason for a One-Sided Alternative

• We note that since \(t_{\alpha,\,n-1} < t_{\alpha/2,\,n-1}\), for the same calculated value it is possible to retain the null hypothesis for the two-sided alternative while rejecting it for the one-sided alternative.
• Whether we want to reject the null or not depends on whether it is true or not.
• We never know whether the null hypothesis is true (after all, why would we test it if we knew?).

9.12 Type I & Type II Errors

• It is very easy to lose sight of the fact that we DO NOT KNOW whether the null is true or not (if we did, why would we need to do any test?). There are 2 kinds of errors we can make:
1. Type I Error: Rejecting \(H_0\) when \(H_0\) is true
2. Type II Error: Not rejecting \(H_0\) when \(H_0\) is false

                      H0 is true                      H0 is false
Do not reject H0      correct, \((1-\alpha)\)         Type II error, \((\beta)\)
Reject H0             Type I error, \((\alpha)\)      correct, \((1-\beta)\), called power

9.13 Probability of Type I and II Errors

\[ \alpha = P(\text{Type I Error}) = P(\text{we reject } H_0 \mid H_0 \text{ is true}) = P(\text{test statistic lies in the rejection region} \mid H_0 \text{ is true}) \]
\[ \beta = P(\text{Type II Error}) = P(\text{we do not reject } H_0 \mid H_0 \text{ is false}) \]
\[ \text{Power} = 1 - \beta \]

• Power measures the probability of correctly rejecting \(H_0\) when \(H_0\) is false.
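
To make the definitions of \(\beta\) and power concrete, here is a minimal sketch for the sales example of Section 9.2 (\(\mu_0 = 5.5\), \(s = .9\), \(n = 30\), critical value 2.756, all in hundreds of dollars). It finds the range of sample means for which \(H_0\) is retained and then the probability of landing in that range for a chosen true mean. The function name `type_ii_error` is made up for illustration, and the standard normal distribution is assumed as an approximation to the sampling distribution of the standardized mean; with a t-distribution the figures would differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_true, mu0=5.5, s=0.9, n=30, crit=2.756):
    """P(do not reject H0 | true mean is mu_true) for the two-sided test.

    Assumption: the standardized sample mean is treated as standard normal.
    """
    se = s / sqrt(n)              # estimated standard error of the mean
    lower = mu0 - crit * se       # smallest sample mean that retains H0
    upper = mu0 + crit * se       # largest sample mean that retains H0
    z = NormalDist()              # standard normal (mean 0, sd 1)
    return z.cdf((upper - mu_true) / se) - z.cdf((lower - mu_true) / se)


# Type II error probability and power at a few true means
results = {}
for mu in (4.8, 5.0, 5.1, 5.7):
    beta = type_ii_error(mu)
    results[mu] = (beta, 1 - beta)
    print(f"true mu = {mu}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Evaluating the function over a grid of true means and plotting \(1 - \beta\) against \(\mu\) would show how power rises as the true mean moves away from the value under the null.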
9.13.1 Example of Probability of Type I and II Errors

1. What is P(Type I Error)? Answer: \(\alpha\), which for the above example is .01.
2. What is P(Type II Error) and power?
• To answer this question we must consider values for \(\mu\) that are in \(H_1\) and calculate the probabilities of retaining \(H_0\) under various values in the alternative.
• The null can be false in MANY ways under the alternative.

9.14 Power Calculation

• Let us calculate the probability of retaining \(H_0: \mu = 5.5\) when the true \(\mu = 5.1\) (which also happens to be the sample mean, but other values could also be chosen).
• Suppose Truth: \(\mu = 5.1\)
• Test Null at: \(\mu = 5.5\)
• What is our decision rule? Rejection rule: reject \(H_0\) if \(|t| > 2.756\)

9.14.1 Calculating P(Type II Error) and Power

1. We assume that the variance is unchanged under \(H_0\) and \(H_1\) and use the estimate \(s/\sqrt{n}\).
2. We want to calculate what the critical values are in terms of \(\bar{x}\).
3. We know that we retain \(H_0: \mu = 5.5\) whenever our calculated t-statistic satisfies \(|t| < 2.756\):
\[ P\left\{ -2.756 < \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < 2.756 \right\} = .99 \]
which after some manipulation can be written
\[ P\left\{ \mu_0 - 2.756 \times \frac{s}{\sqrt{n}} < \bar{x} < \mu_0 + 2.756 \times \frac{s}{\sqrt{n}} \right\} = .99 \]
Substituting \(\mu_0 = 5.5\) and the estimated standard deviation gives:
\[ P\left\{ 5.5 - 2.756 \times \frac{.9}{\sqrt{30}} < \bar{x} < 5.5 + 2.756 \times \frac{.9}{\sqrt{30}} \right\} = .99 \]
4. This leads to 99% critical values in terms of the sample mean \(\bar{x}\):
\[ (5.047,\ 5.953) \tag{9.1} \]
This gives all the values of the sample mean for which \(H_0: \mu = 5.5\) would not be rejected at the \(\alpha = .01\) level of significance. Note that our sample mean \(\bar{x} = 5.1\) is in the interval and hence we did not reject the null hypothesis. We want to find the probability of \(\bar{x}\) being in the interval (5.047, 5.953) for various values of \(\mu\) in the alternative; we start with \(\mu = 5.1\).
5. Calculate the probability of a Type II Error:
\[ \beta = P\{\text{Type II Error}\} = P\{\text{do not reject } H_0 \mid H_0 \text{ is false}\} = P\{5.047 < \bar{x} < 5.953 \mid \mu = 5.1\} \]
\[ = P\left\{ \frac{5.047 - 5.1}{.9/\sqrt{30}} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < \frac{5.953 - 5.1}{.9/\sqrt{30}} \right\} = P\{-.3225 < t < 5.1911\} \approx P\{t > -.3225\} = .626 \]
\[ \text{Power} = 1 - \beta = 1 - .626 = .374 \]

9.15 Notes on Power

• The calculation above gives the probability of a Type II error when testing \(H_0: \mu = 5.5\) when the true value is \(\mu = 5.1\).
• Note the interval (5.047, 5.953) is not the same as the confidence interval.
• The confidence interval for the population mean is \(\bar{x} \pm t_{.01/2,\,29} \times s/\sqrt{n}\), which gives (4.65, 5.55).
• We can repeat this calculation for all possible alternatives under \(H_1: \mu \ne 5.5\).
• For example, let us do another calculation on the other side of the null, say \(\mu = 5.7\):
\[ \beta = P(5.047 < \bar{x} < 5.953 \mid \mu = 5.7) = P\left( \frac{5.047 - 5.7}{.9/\sqrt{30}} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < \frac{5.953 - 5.7}{.9/\sqrt{30}} \right) = P(-3.974 < t < 1.540) \approx P(t < 1.540) = .938 \]
• Now we can calculate the probability of Type II error and power for a variety of values under \(H_1\):

Power Calculations for Testing \(H_0: \mu = 5.5\)

Value under H1        P(Type II Error) = β      Power = 1 − β
μ = 4.4               0                         1
μ = 4.5               .001                      .999
μ = 4.6               .003                      .997
μ = 4.7               .017                      .98
μ = 4.8               .067                      .93
μ = 5.0               .386                      .614
μ = 5.1               .626                      .374
μ = 5.49999           .99                       .01
...                   ...                       ...
μ = 5.7               .938                      .062

• Power Curve: Plot of power on the y-axis and the value of \(\mu\) on the x-axis.

9.15.1 Notes on Type I and II Error

• We can make a Type I error only when we reject \(H_0\) and a Type II error only when we do not.
• We want both \(\alpha\) and \(\beta\) to be small.
• While we would like both Type I and Type II errors to be as small as possible, there is in fact a trade-off.
• Suppose the null hypothesis is that those charged with crimes are innocent.
• Then a legal test which never convicts the innocent (has \(\alpha = 0\)) would free many who are guilty (large \(\beta\)).
• Lowering \(\alpha\) will result in a wider non-rejection region, which makes it more likely that a false null hypothesis will be retained.
• To see this, redo the above exercise with \(\alpha = .005\) and \(\alpha = .05\).
• Since our interest is usually centered on the null hypothesis, \(\alpha\) is usually chosen to be small; 10 percent or less.
• The null and alternative are not treated symmetrically; rejecting the null does not imply that the alternative is true.
• The alternative is not under test.
• Do not say "we accept the alternative".
• We have seen that the closer the true value of \(\mu\) in \(H_1\) is to the value under \(H_0\), the larger is the probability of Type II error and hence the lower is the power.

9.16 Prob- or p-Values [Transparency 9.10 and 9.11]

• It is arbitrary that the rejection/non-rejection of a test depends on the choice of \(\alpha\).
• An alternative way to report one's results is to quote the test statistic with a p-value, or prob-value.
• This allows the user to choose a particular \(\alpha\) and make their own decision using the reported p-value.
• A p-value is the probability, calculated under the null hypothesis, of a test statistic at least as extreme (in absolute value, for a two-sided test) as the one actually calculated.
• It is simply the area in the statistic's density beyond the point actually observed.

9.16.1 Example of a p-value

In our example of customer expenditure our test statistic was -2.43. This has a p-value of:
\[ p\text{-value} = P(t < -2.43) = .011 \]
• For two-sided alternatives you will see authors report the p-value as .011 × 2 = .022 (reflecting that both large positive and large negative values of the statistic are possible).
• There is only about a 2% chance of observing a sample mean at least as far from 5.5 as 5.1 if the null hypothesis \(H_0: \mu = 5.5\) (tested against \(H_1: \mu \ne 5.5\)) is true.
• p-values can be found from tables in the textbook or from the tables built into computer packages like STATA.

9.16.2 Interpretation and Use of P-Values

• If the p-value is less than the chosen level \(\alpha\) of the test, one rejects the null hypothesis at the \(\alpha\) level:
\[ p\text{-value} < \alpha \Rightarrow \text{Reject } H_0 \tag{9.2} \]
\[ p\text{-value} \ge \alpha \Rightarrow \text{Do not reject } H_0 \tag{9.3} \]
In the above example the p-value is .022. Therefore for \(\alpha = .05\) we would reject \(H_0\), but for \(\alpha = .01\) we would retain \(H_0\).
• One can also report p-values and let readers decide on their own significance levels.
• We now have all the tools necessary to do any hypothesis test.
• In the rest of this chapter we will consider other applications.
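
The p-value calculation of Section 9.16.1 can be sketched in a few lines. This is an illustration under stated assumptions rather than the way the tables or STATA compute it: the helper names `t_pdf` and `t_cdf` are made up here, and the t-distribution CDF is obtained by Simpson's rule so that only the Python standard library is needed. A statistics package would normally supply that function directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


# Sales example, in hundreds of dollars
x_bar, mu0, s, n = 5.1, 5.5, 0.9, 30
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

p_one_sided = t_cdf(t_stat, n - 1)             # P(T < t), for H1: mu < mu0
p_two_sided = 2 * t_cdf(-abs(t_stat), n - 1)   # both tails, for H1: mu != mu0

print(f"t = {t_stat:.3f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_two_sided < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

Reporting the p-value alongside the statistic, as the loop does for two choices of \(\alpha\), lets each reader apply rules (9.2) and (9.3) at their own significance level.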
9.17 Testing Proportions

• We can test hypotheses about the number of successes \(X\) in \(n\) trials, or about the proportion of successes \(p\).
• In the binomial distribution we know the standard deviation of \(X\) and of \(\hat{p}\) under the null, which we can use together with the standard normal tables (in fact better approximation results can often be obtained by using t-distributions) for one- or two-sided tests:
\[ H_0: p = p_0 \qquad H_1: p \ne p_0 \]
• Form the test statistic (either a Z statistic or t depending on the degrees of freedom):
\[ t = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} \]

9.17.1 Example of Test for Population Proportion

Let us return to the mini survey conducted by Employment and Immigration Canada. They survey \(n = 9\) unemployed persons. We have seen how they tested an hypothesis about the mean duration of unemployment. Now suppose they want to learn the proportion of searchers who receive a job offer within the first six weeks of unemployment. Of the 9 people surveyed, 2 receive such offers. Suppose that:
\[ H_0: p \ge .5 \qquad H_1: p < .5 \]
Let \(\alpha = .05\).
• Then the rejection rule is: reject \(H_0\) if \(t < -1.860\) (\(t_{.05,\,8} = 1.860\)).
• The test statistic is:
\[ t = \frac{.222 - .5}{\sqrt{.5(.5)/9}} = -1.667 \]
• The null hypothesis is not rejected at the 5 percent level.
• You can see in this example that the null is not rejected, even though there seems to be a large gap between the agency's hypothesis and the sample proportion.
• The null is not easily rejected because the sample is so small, so that sampling variability is large.
• The p-value for this problem is \(P(t < -1.667) = .067\), again showing that the hypothesis would not be rejected at the 5 percent level.

Chapter 10 Hypothesis Testing: Additional Topics

10.1 Tests of Differences of Population Means

• Suppose that we have two samples with \(n_1\) observations, a mean \(\bar{x}_1\), and sample standard deviation \(s_1\) in the first, and \(n_2\) observations, a mean \(\bar{x}_2\), and sample standard deviation \(s_2\) in the second.
• Data with this property are most likely to arise from experiments in which two treatments are applied.
The testing problem is:
\[ H_0: \mu_1 = \mu_2 \]
which can be written as:
\[ H_0: \mu_1 - \mu_2 = 0 \]
• A general hypothesis test of differences is:
\[ H_0: \mu_1 - \mu_2 = D_0 \]
• where \(D_0\) is the hypothesized difference (usually \(D_0 = 0\)).

10.2 Testing Differences when Variances are the Same

• We assume that the two populations have the same variance (see Chapters 7 and 8), \(\sigma_1 = \sigma_2 = \sigma\).
• The population standard deviation of the difference \(\bar{x}_1 - \bar{x}_2\) (assuming independence) is
\[ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
• A pooled estimate of \(\sigma^2\) is
\[ s_p^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \]
• Then we multiply the root of this, \(s_p\), by \(\sqrt{1/n_1 + 1/n_2}\).
• This gives us the estimated standard deviation of the difference:
\[ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \times \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]} \]
• The test statistic is:
\[ t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \]
• As before we have three different rejection rules (all use the same test statistic) depending on the alternative:
\[ H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \ne \mu_2 \]
• Rejection Rule: if \(|t| > t_{\alpha/2,\,n_1 + n_2 - 2}\) then reject \(H_0\)
\[ H_0: \mu_1 \le \mu_2 \qquad H_1: \mu_1 > \mu_2 \]
• Rejection Rule: if \(t > t_{\alpha,\,n_1 + n_2 - 2}\) then reject \(H_0\)
\[ H_0: \mu_1 \ge \mu_2 \qquad H_1: \mu_1 < \mu_2 \]
• Rejection Rule: if \(t < -t_{\alpha,\,n_1 + n_2 - 2}\) then reject \(H_0\)

10.2.1 Example of Testing Differences in Population Means

A market research firm wishes to know if the mean number of hours of TV watching per week is the same for teenage boys as for teenage girls. The following data were obtained:

Boys:  \(n_1 = 20\), \(\bar{x}_1 = 24.5\), \(s_1^2 = 64\)
Girls: \(n_2 = 12\), \(\bar{x}_2 = 28.7\), \(s_2^2 = 71\)

Carry out a hypothesis test that boys and girls watch the same number of hours of TV at the 5% level of significance.
\[ H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \ne \mu_2 \]
• Why two-sided? We do not have any prior reason to believe that girls and boys differ in a particular direction.
\[ t = \frac{24.5 - 28.7 - 0}{\sqrt{\frac{66.57}{20} + \frac{66.57}{12}}} = -1.41 \]
since
\[ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(19)64 + (11)71}{30} = 66.57 \]
• As \(t = -1.41\) lies in the non-rejection region (\(t_{.05/2,\,30} = \pm 2.042\)) we do not reject the hypothesis at the 5% level of significance that boys and girls watch the same number of TV hours.
• Note that \(P(t < -1.41) = .084\) and therefore the p-value = 2 × .084 = .168.

10.3 Testing Differences when Variances are Different

• Tests are conducted exactly the same way as before except the formula for \(s_{\bar{x}_1 - \bar{x}_2}\) and the degrees of freedom are different:
\[ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
and the degrees of freedom is a crazy formula that can be found in the book:
\[ \nu = \frac{\left[ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right]^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]
• For convenience I have always used \(\nu = n_1 + n_2 - 2\) and hoped I was not too far off.
• Redo the above TV example by not assuming the variances are the same. Is there any difference to your conclusions?

10.4 Testing Differences of Population Proportions

• Recall that the variance of a sample proportion is estimated by \(\hat{p}(1 - \hat{p})/n\).
• Then if we have two independent samples, for hypotheses about the difference between the two population proportions (and hypotheses are always about populations)
\[ H_0: p_1 - p_2 = D_0 \qquad H_1: p_1 - p_2 \ne D_0 \]
• the test statistic is:
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\hat{p}_1 (1 - \hat{p}_1)/n_1 + \hat{p}_2 (1 - \hat{p}_2)/n_2}} \]
• Often the null hypothesis will be that the two proportions are equal:
\[ H_0: p_1 = p_2 \]
• In the equality of proportions case the formula simplifies to:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)(1/n_1 + 1/n_2)}} \qquad \text{where} \qquad \hat{p}_0 = \frac{X_1 + X_2}{n_1 + n_2} \]
and \(X_i\) is the number of successes in sample \(i\).

10.4.1 Example of Testing Differences in Population Proportions

In a sample of 400 products produced by Machine 1, 23 were defective and in a sample of 400 products produced by Machine 2, 17 were defective. Test:
\[ H_0: p_1 - p_2 = 0 \]
against
\[ H_1: p_1 - p_2 \ne 0 \]
using a 5% level of significance.
Answer: From the question we know
\[ \hat{p}_1 = \frac{23}{400} \qquad \hat{p}_2 = \frac{17}{400} \]
The pooled estimate is:
\[ \hat{p}_0 = \frac{X_1 + X_2}{n_1 + n_2} = \frac{23 + 17}{400 + 400} = .05 \]
Therefore
\[ z = \frac{\frac{23}{400} - \frac{17}{400}}{\sqrt{(.05)(.95)(1/400 + 1/400)}} = .973 \]
• Since \(z\) is in the non-rejection region (-1.96, 1.96) we do not reject \(H_0\) at the 5% level of significance.
• Note that since \(P(z > .973) = .165\), the p-value = .165 × 2 = .33.

10.5 Testing the Hypothesis \(\mu_1 = \mu_2\) with Paired Data

• In testing whether the difference between the two means was 0 or \(D_0\) we assumed that the two samples were independent.
• Hence the estimated variance of the difference was the sum of the two variances.
• We could do the testing under the assumption that the variances were the same (pooled variance) or different.
• On occasion, we may have paired data.
• This is data that is grouped or paired so that the variation in responses between the members of any pair is less than the variation between members of different pairs.
• We can improve the efficiency (lower the variance) of the experiment by randomizing the two treatments over the two members of each pair.
• We restrict the randomization so that each treatment is given to one member of each pair, and obtain a separate estimate of the difference between the treatment effects for each pair.
• The variation among the pairs is not included in our estimate of the variance.
• Hence if this variation is large relative to the variation within pairs, the variance from paired tests will be smaller than that from a completely randomized (independent) sampling experiment.
• This motivates the use of twins in some experiments.
• In economics we seldom (ever?) have paired data and so we will not pursue this matter.
• In hospital, clinical and drug testing settings, paired data are often available.
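
The two worked examples of this chapter (the TV-hours comparison of 10.2.1 and the defective-products comparison of 10.4.1) can be checked with a short script. This is a sketch only: the function names `pooled_t` and `two_proportion_z` are hypothetical, the critical values 2.042 and 1.96 are taken from the text rather than computed, and the p-value for the proportions test assumes the standard normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(n1, mean1, var1, n2, mean2, var2, d0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = d0.

    var1 and var2 are the sample variances (s squared), not standard deviations.
    Returns the statistic and the pooled variance estimate.
    """
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))    # std. error of the mean difference
    return (mean1 - mean2 - d0) / se, sp2


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 (pooled proportion)."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)            # pooled estimate under H0
    se = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Section 10.2.1: boys (n=20, mean 24.5, s^2=64) vs girls (n=12, mean 28.7, s^2=71)
t_stat, sp2 = pooled_t(20, 24.5, 64, 12, 28.7, 71)
print(f"pooled variance = {sp2:.2f}, t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > 2.042 else "do not reject H0")

# Section 10.4.1: 23 of 400 defective vs 17 of 400 defective
z_stat, p_val = two_proportion_z(23, 400, 17, 400)
print(f"z = {z_stat:.3f}, two-sided p-value = {p_val:.2f}")
print("reject H0" if abs(z_stat) > 1.96 else "do not reject H0")
```

Passing a nonzero `d0` to `pooled_t` covers the general hypothesis \(H_0: \mu_1 - \mu_2 = D_0\) of Section 10.1.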