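Statistics 512 Notes 7

Hypothesis Testing Continued

Testing a normal mean

Example: A highway patrol officer believes that the average speed of cars traveling over a certain stretch of highway exceeds the posted limit of 55 mph. The speeds of a random sample of 200 cars were recorded. The standard deviation of speeds is known to be 5 and it is assumed that the distribution of speeds is normal. Test the patrol officer's claim.

Distributions: Speeds (histogram omitted; horizontal axis from 40 to 65)

Moments
Mean             55.8
Std Dev          4.4676391
Std Err Mean     0.3159098
upper 95% Mean   56.42296
lower 95% Mean   55.17704
N                200

Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ with the variance $\sigma^2$ known. We want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$. Consider the test statistic
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
and critical region $C = \{z : z \ge c\}$. What do we need to choose $c$ to be so that the size of the test is 0.05?
$$P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge c\right) = P(Z \ge c),$$
where $Z$ is a standard normal random variable. Thus, we want to choose $c$ to be the 0.95 quantile of the standard normal distribution, which equals 1.645.

For the speed limit data,
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{55.8 - 55}{5/\sqrt{200}} = 2.26.$$
Since $z > 1.645$, we reject the null hypothesis; there is strong evidence that the average speed is above 55 mph.

Suppose we wanted to test $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$. The size of the test with test statistic $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and critical region $C = \{z : z \ge c\}$ is
$$\max_{\mu \le \mu_0} P_\mu\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge c\right).$$
We have
$$P_\mu\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge c\right) = P_\mu\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \ge c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right).$$
Because $1 - \Phi\left(c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$ is an increasing function of $\mu$, the size of the test is
$$P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge c\right).$$
Thus a test of size 0.05 for testing $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$ is the same as the test of size 0.05 for testing $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$: the critical region is $C = \{z : z \ge 1.645\}$ where $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.

Power function: The power function of the test with critical region $C = \{z : z \ge 1.645\}$ is the following:
$$\gamma_C(\mu) = P_\mu\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge 1.645\right) = P_\mu\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} + \frac{\mu - \mu_0}{\sigma/\sqrt{n}} \ge 1.645\right) = P_\mu\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge 1.645 - \frac{\mu - \mu_0}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(1.645 - \frac{\mu - \mu_0}{\sigma/\sqrt{n}}\right).$$
For testing whether $\mu$ exceeds $\mu_0 = 0$ with $\sigma = 1$, the power function is shown below for n=10 and n=100.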
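The z-test for the speed data and the power function derived above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `z_statistic` and `power` are illustrative choices, not part of the notes, and the exact quantile from `NormalDist().inv_cdf` is used in place of the rounded value 1.645.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_statistic(xbar, mu0, sigma, n):
    """Test statistic z = (xbar - mu0) / (sigma / sqrt(n)), sigma known."""
    return (xbar - mu0) / (sigma / sqrt(n))


def power(mu, mu0, sigma, n, alpha=0.05):
    """Power of the one-sided test that rejects when z >= z_alpha.

    Implements 1 - Phi(z_alpha - (mu - mu0) / (sigma / sqrt(n))).
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - (mu - mu0) / (sigma / sqrt(n)))


# Speed data from the example: xbar = 55.8, mu0 = 55, sigma = 5, n = 200.
z = z_statistic(xbar=55.8, mu0=55, sigma=5, n=200)
critical_value = STD_NORMAL.inv_cdf(0.95)
reject = z >= critical_value
print(f"z = {z:.2f}, critical value = {critical_value:.3f}, reject H0: {reject}")

# Power with mu0 = 0 and sigma = 1 for the two sample sizes in the notes.
for n in (10, 100):
    values = [round(power(mu, 0.0, 1.0, n), 3) for mu in (-0.5, 0.0, 0.5, 1.0)]
    print(f"n = {n}: power at mu = -0.5, 0, 0.5, 1 -> {values}")
```

At $\mu = \mu_0$ the power equals the size 0.05 for every n, and for any $\mu > \mu_0$ the power is larger with n=100 than with n=10.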
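[Figure: two panels, "Power when n=10" and "Power when n=100"; horizontal axis mu from -1.0 to 2.0, vertical axis Power from 0.0 to 1.0.]

Two sided tests: Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$. Using the test statistic $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ still seems reasonable, but now it makes sense to reject for both very large and very small values of $z$. We can use a critical region of the form $C = \{z : |z| \ge c\}$. A test of size 0.05 has critical region $C = \{z : |z| \ge 1.96\}$ because
$$P_{\mu_0}\left(\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge c\right) = P(|Z| \ge c),$$
which equals 0.05 when $c$ is the 0.975 quantile of the standard normal distribution, 1.96.

Duality between tests and confidence intervals

Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$ and use the rejection region $C = \{z : |z| \ge 1.96\}$. Then the set of $\mu_0$ for which $H_0: \mu = \mu_0$ is not rejected is
$$\left\{\mu_0 : \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| < 1.96\right\} = \left\{\mu_0 : -1.96 < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < 1.96\right\} = \left\{\mu_0 : \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu_0 < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right\},$$
which is the 95% confidence interval for $\mu$ that we have used.

In general, there is a duality between tests and confidence intervals. Suppose we have a family of tests of size $\alpha$ of $H_0: \theta = \theta_0$ vs. $H_a: \theta \ne \theta_0$ for each $\theta_0$. Then $\{\theta_0 : \text{test of } H_0: \theta = \theta_0 \text{ vs. } H_1: \theta \ne \theta_0 \text{ is not rejected}\}$ is a $(1 - \alpha)$ confidence interval for $\theta$.

Proof: Let $CI(X_1, \ldots, X_n) = \{\theta_0 : \text{test of } H_0: \theta = \theta_0 \text{ vs. } H_1: \theta \ne \theta_0 \text{ is not rejected}\}$. Then
$$P_{\theta_0}[\theta_0 \in CI(X_1, \ldots, X_n)] = 1 - \text{size}(\text{test of } H_0: \theta = \theta_0 \text{ vs. } H_a: \theta \ne \theta_0) = 1 - \alpha.$$

Conversely, suppose we have a $(1 - \alpha)$ confidence interval $CI(X_1, \ldots, X_n)$ for $\theta$. Then a test of size at most $\alpha$ of $H_0: \theta = \theta_0$ vs. $H_a: \theta \ne \theta_0$ is to reject the null hypothesis if and only if $\theta_0$ does not belong to the confidence region.

Proof: We have $P_{\theta_0}[\theta_0 \notin CI(X_1, \ldots, X_n)] \le \alpha$ because $CI(X_1, \ldots, X_n)$ is a $(1 - \alpha)$ confidence interval. Thus, the test is of size at most $\alpha$.

Large sample tests for mean

One of the issues that came up in a recent municipal election was the high cost of housing. A candidate seeking to unseat an incumbent claimed that the average family spends more than 30% of its annual income on housing. A housing expert was asked to investigate the claim.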
A random sample of 125 households was drawn, and each household was asked to report the percentage of household income spent on housing costs. Is there strong evidence in favor of the candidate's claim?

Distributions: Costs (histogram omitted; horizontal axis from 15 to 50)

Moments
Mean             31.952
Std Dev          7.1907826
Std Err Mean     0.6431632
upper 95% Mean   33.225
lower 95% Mean   30.679
N                125

We want to test $H_0: \mu \le 30$ vs. $H_a: \mu > 30$. More generally, test $H_0: \mu \le \mu_0$ vs. $H_a: \mu > \mu_0$.

Test statistic:
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \quad \text{where } S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1} \text{ is the sample variance.}$$
Consider the test with critical region $\{t : t \ge c\}$. By the central limit theorem,
$$P_\mu\left(\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge c\right) = P_\mu\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} + \frac{\mu - \mu_0}{S/\sqrt{n}} \ge c\right) = P_\mu\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} \ge c - \frac{\mu - \mu_0}{S/\sqrt{n}}\right) \approx 1 - \Phi\left(c - \frac{\mu - \mu_0}{S/\sqrt{n}}\right).$$
Note that the approximate probability of rejecting the null hypothesis is an increasing function of $\mu$, so that the size is equal to the probability of rejecting the null hypothesis when $\mu = \mu_0$. Thus, if we choose $c = \Phi^{-1}(1 - \alpha) = z_\alpha$, where $\Phi$ is the standard normal CDF, then the approximate size of the test that has critical region $\{t : t \ge c\}$ is $1 - \Phi(\Phi^{-1}(1 - \alpha)) = \alpha$.

For the data on family spending on annual housing,
$$t = \frac{31.952 - 30}{7.191/\sqrt{125}} = 3.03.$$
Since $t > 1.645$, we reject the null hypothesis at the 0.05 significance level; there is strong evidence for the candidate's claim that the average family spends more than 30% of its annual income on housing.

t-test for normal mean

Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ with the variance unknown. Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$. Consider the test statistic
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$
The test with rejection region $\{t : t \ge t_{\alpha, n-1}\}$ [where $t_{\alpha, n-1}$ is the $(1 - \alpha)$ quantile of the t-distribution with n-1 degrees of freedom, i.e., $P(T > t_{\alpha, n-1}) = \alpha$] has exact size $\alpha$ because $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, when $\mu = \mu_0$, has a t-distribution with n-1 degrees of freedom.

Note the difference between the rejection rules $\{t : t \ge t_{\alpha, n-1}\}$ and $\{t : t \ge z_\alpha\}$. The large sample rule $\{t : t \ge z_\alpha\}$ has approximate size $\alpha$, while $\{t : t \ge t_{\alpha, n-1}\}$ has exact size $\alpha$. Of course, we now have to assume that $X_i$ has a normal distribution.
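The large-sample computation for the housing data can be reproduced from the summary statistics alone. This is a minimal sketch with the Python standard library; `large_sample_test` is a hypothetical helper name. It uses the normal critical value $z_\alpha$ only, because the standard library has no t-distribution quantile; the exact t-test would replace `z_alpha` with $t_{\alpha, n-1}$ taken from a table or a statistics package.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_test(xbar, s, n, mu0, alpha=0.05):
    """One-sided large-sample test of H0: mu <= mu0 vs. Ha: mu > mu0.

    Returns the statistic t = (xbar - mu0) / (s / sqrt(n)), the normal
    critical value z_alpha, and whether t falls in the critical region.
    """
    t = (xbar - mu0) / (s / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return t, z_alpha, t >= z_alpha


# Summary statistics from the housing cost example.
t_stat, z_alpha, reject_h0 = large_sample_test(
    xbar=31.952, s=7.1907826, n=125, mu0=30
)
print(f"t = {t_stat:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

Working from the sample mean, sample standard deviation, and n keeps the sketch independent of the raw data, which the notes do not list.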
In practice, we may not be willing to assume that the population is normal. In general, t critical values are larger than z critical values (i.e., $t_{\alpha, n-1} > z_\alpha$), so the t-test is conservative relative to the large sample test. So in practice, many statisticians use the t-test even if they do not believe the data is normally distributed. Note that $\lim_{n \to \infty} t_{\alpha, n-1} = z_\alpha$.

How well does the t-test work in moderate-sized samples when the data is not normal, i.e., what is its true size in moderate-sized samples? We will look at this question using the Monte Carlo method (Section 5.8) on Thursday.

Review of hypothesis testing

Goal: Decide between two hypotheses about a parameter of interest $\theta$: $H_0: \theta \in \Omega_0$ vs. $H_1: \theta \in \Omega_1$, where $\Omega_0 \cap \Omega_1 = \emptyset$.

Null vs. Alternative Hypothesis: The alternative hypothesis is the hypothesis for which we are trying to see if there is strong evidence. The null hypothesis is the default hypothesis that we will retain unless there is strong evidence for the alternative hypothesis.

Test statistic and critical region: A test is defined by a test statistic and a critical region. The critical region is the region of values of the test statistic for which we will reject the null hypothesis.

Errors in hypothesis testing: Type I and Type II errors.

Size of test, power of test:
Power function of test = $\gamma_C(\theta) = P_\theta(W(X_1, \ldots, X_n) \in C)$ = probability of rejecting the null hypothesis when the true parameter is $\theta$.
Size of test = $\max_{\theta \in \Omega_0} \gamma_C(\theta)$.
Power at an alternative $\theta_1 \in \Omega_1$ = $\gamma_C(\theta_1)$.

Neyman-Pearson paradigm: Choose the size of the test to be reasonably small to protect against Type I error, typically 0.05 or 0.01. Among tests which have the prescribed size, choose the most powerful test.

P-values: Measure of evidence against the null hypothesis. The p-value is the size of the smallest sized test in a family of tests for which we would reject the null hypothesis.

In Chapter 8, we will discuss how to choose most powerful tests.
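To make the p-value definition in the review concrete: for the one-sided tests in these notes, the test of size $\alpha$ rejects when the statistic is at least $z_\alpha$, so the smallest size at which we would reject is $1 - \Phi(z_{obs})$. The sketch below applies this to the two examples; `one_sided_p_value` is a hypothetical helper name. For the housing data the sample standard deviation stands in for $\sigma$, so that p-value is a large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(xbar, mu0, sd, n):
    """P-value 1 - Phi(z_obs) for the test that rejects for large z.

    sd is the known sigma (z-test) or the sample standard deviation
    (large-sample test, in which case the result is approximate).
    """
    z_obs = (xbar - mu0) / (sd / sqrt(n))
    return 1 - NormalDist().cdf(z_obs)


# Speed example: sigma = 5 is known, so this is the exact z-test p-value.
p_speed = one_sided_p_value(xbar=55.8, mu0=55, sd=5, n=200)

# Housing example: sd is the sample standard deviation (approximate).
p_housing = one_sided_p_value(xbar=31.952, mu0=30, sd=7.1907826, n=125)

print(f"speed p-value = {p_speed:.4f}, housing p-value = {p_housing:.4f}")
```

Both p-values fall below 0.05, which agrees with the rejections reached in the two examples by comparing the statistic with 1.645.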