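AMS 572 Lecture Notes #6
September 24, 2013

<Today’s Topics>
1. Power Calculation (Inference on one population mean)
2. Sample Size Determination
3. How to do it in SAS & R
4. Inference on one population variance

⑧ Sample Size Determination

Sample size determination in the hypothesis test scenario

1. $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$ (with $\mu = \mu_a > \mu_0$)

1st scenario: Normal population, $\sigma^2$ is known.

$$1 - \beta = P\left( Z \ge Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right), \quad Z \sim N(0,1)$$

$$\Rightarrow\ -Z_\beta = Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \ \Rightarrow\ \sqrt{n} = \frac{(Z_\alpha + Z_\beta)\,\sigma}{\mu_a - \mu_0}$$

$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$$

2. $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$ (with $\mu = \mu_a < \mu_0$)

1st scenario: Normal population, $\sigma^2$ is known.

$$1 - \beta = P\left( Z \le -Z_\alpha + \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right)$$

$$\Rightarrow\ Z_\beta = -Z_\alpha + \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} \ \Rightarrow\ \sqrt{n} = \frac{(Z_\alpha + Z_\beta)\,\sigma}{\mu_0 - \mu_a}$$

$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$$

(* This is the same result as in case 1. In summary, the formula is identical for the two one-sided tests.)

3. $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$ (with $\mu = \mu_a \ne \mu_0$)

1st scenario: Normal population, $\sigma^2$ is known.

$$1 - \beta = P\left( Z \ge Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right) + P\left( Z \le -Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right)$$

(Assume $\mu_a > \mu_0$. Then $P\left( Z \le -Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right) \approx 0$, so we can neglect it.)

$$1 - \beta \approx P\left( Z \ge Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right)$$

$$\Rightarrow\ -Z_\beta \approx Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \ \Rightarrow\ n \approx \frac{(Z_{\alpha/2} + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}$$

(* In summary, this hand-calculated formula for the two-sided test differs from that for the one-sided tests in two aspects: (1) α is replaced by α/2; (2) it is an approximate formula, not an exact formula.)

Sample size determination in the CI scenario

1st scenario: Normal population, $\sigma^2$ is known.

P.Q. (pivotal quantity): $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

100(1-α)% CI for $\mu$: $\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

The length of the CI is

$$L = \left( \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) - \left( \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 2 Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \ \Rightarrow\ n = \left( \frac{2 Z_{\alpha/2}\, \sigma}{L} \right)^2$$

Sample size determination based on the maximum error E

1st scenario: Normal population, $\sigma^2$ is known.

$$P(|\bar{X} - \mu| \le E) = 1 - \alpha$$

$$P(-E \le \bar{X} - \mu \le E) = 1 - \alpha$$

$$P\left( -\frac{E}{\sigma/\sqrt{n}} \le Z \le \frac{E}{\sigma/\sqrt{n}} \right) = 1 - \alpha \ \Rightarrow\ \frac{E}{\sigma/\sqrt{n}} = Z_{\alpha/2} \ \Rightarrow\ n = \left( \frac{Z_{\alpha/2}\, \sigma}{E} \right)^2$$

Compare the above formula to $n = \left( \dfrac{2 Z_{\alpha/2}\, \sigma}{L} \right)^2$. For a given α, $L = 2E$; one can prove this easily.

⑨ Do it in SAS

For the inference on one population mean, three procedures are most relevant:
Proc means;
Proc univariate; (we studied this in lecture 5)
Proc ttest; (we will study how to use this today, as it is a more recent SAS procedure)

We start, however, by reviewing how to enter data into SAS from the keyboard.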
    data one ;
      input ID $ weight ;
      X = weight - 100 ;
    datalines ;
    P1 100
    P2 93
    P3 88
    …
    P37 105
    ;
    run ;

Alternative data entry procedures in SAS:

    data two ;
      input ID $ weight @@ ;
    datalines ;
    P1 20 p2 37 p3 47 P4 34 … …
    ;
    run ;

*** infile ; (used to read data already stored in other files, e.g. data exported from Excel)

    proc univariate data=one normal ;
      var X ;
    run ;

Normality test: Shapiro-Wilk Test
$H_0$: population is normal / $H_a$: population is NOT normal
If normal: t-test / z-test
If not normal: non-parametric test: Sign Test / Signed Rank Test

Alternative test procedures for one population mean in SAS:

    proc means data=one t prt ;
      var X ;
    run ;

Here the test is of $H_0: \mu = 0$, and prt is the p-value for $H_a: \mu \ne 0$.

proc ttest ;
ttest: 1 population t-test / 2 populations t-test (paired and independent)
Proc ttest can directly test $H_0: \mu = 100$ vs. $H_a: \mu \ne 100$.

Ex1) The seven scores listed below are axial loads (in pounds) for a random sample of 7 12-oz aluminum cans manufactured by ALUMCO. An axial load of a can is the maximum weight supported by its sides, and it must be greater than 165 pounds, because that is the maximum pressure applied when the top lid is pressed into place.

270, 273, 258, 204, 254, 228, 282

(1) As the quality control manager, please test the claim of the engineering supervisor that the average axial load is greater than 165 pounds. Use $\alpha = 0.05$. What assumptions are needed for your test?
(2) Please write a SAS program to do part (1).

Sol) $H_0: \mu = 165$ vs. $H_a: \mu > 165$

(1) Assume the distribution is normal.
Test statistic:

$$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$$

At the significance level of $\alpha = 0.05$, we reject $H_0$ in favor of $H_a$ if $T_0 \ge t_{n-1,\alpha}$.

$$T_0 = \frac{87.7}{27.6/\sqrt{7}} \approx 8.4 > t_{6,\,0.05} = 1.943$$

Therefore, we reject $H_0$.

CI: $\bar{X} \pm t_{n-1,\alpha/2} \dfrac{S}{\sqrt{n}}$

(2)

    data cans ;
      input pressure @@ ;
      newvar = pressure - 165 ;
    datalines ;
    270 273 258 204 254 228 282
    ;
    run ;

    proc univariate data=cans normal ;
      var newvar ;
    run ;

Alternatively, we can use the proc ttest procedure as follows:

    proc ttest data=cans h0=165 sides=u alpha=0.05 ;
      var pressure ;
    run ;

*** Note: Here we can also perform the one-sided test.***

Please see the following site for more examples and explanations:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_a0000000115.htm

Ex2) To determine whether glaucoma affects the corneal thickness, measurements were made in 8 people affected by glaucoma in one eye but not in the other. The corneal thicknesses (in microns) were as follows:

    Person   Eye affected   Eye not affected
    1        488            484
    2        478            478
    3        480            492
    4        426            444
    5        440            436
    6        410            398
    7        458            464
    8        460            476

(a) According to the data, can you conclude, at the significance level of 0.10, that the corneal thickness is not equal for affected versus unaffected eyes? Please first derive the general formula for the test for the given scenario based on a sample size of n and a significance level of α.
(b) Calculate a 90% confidence interval for the mean difference in thickness. Please first derive the general formula for the confidence interval for the given scenario based on a sample size of n and a confidence level of 1-α.
(c) Please write the entire SAS code to check the assumptions necessary in (a) and to perform the test asked for in (a). (*Part (c) was not given as part of the quiz today.)

Note: For the general derivation of (a) and (b), please refer to your lecture notes #4.

Solution:

(a) Using $\bar{d} = -4$ and $s_d = 10.744$, the test statistic is

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-4 - 0}{10.744/\sqrt{8}} = -1.053$$

Since $|t| = 1.053 < t_{8-1,\,0.05} = 1.895$, we can NOT reject $H_0$ at $\alpha = 0.10$.
That is, we do NOT have enough evidence to support the claim that the average corneal thickness is affected by glaucoma.

(b) A 90% CI for $\mu_1 - \mu_2$ is given by

$$\bar{d} \pm t_{n-1,\alpha/2} \frac{s_d}{\sqrt{n}} = -4 \pm 1.895 \times \frac{10.744}{\sqrt{8}}$$

That is, [-11.198, 3.198].

(c) The SAS code is as follows.

    data eyes ;
      input bad good ;
      diff = bad - good ;
    datalines ;
    488 484
    478 478
    480 492
    426 444
    440 436
    410 398
    458 464
    460 476
    ;
    run ;

    proc univariate data=eyes normal ;
      var diff ;
    run ;

Alternatively, we can use the proc ttest procedure as follows:

    proc ttest data=eyes alpha=0.1 ;
      paired bad*good ;
    run ;

*** Note: Here we can also obtain the CI and choose the confidence level.***

Please see the following site for more examples & options (see especially the plots):
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_sect011.htm

Ex3) Jerry is planning to purchase a sports goods store. He calculated that in order to make a profit, the average daily sales must be $\mu > \$525$. He randomly sampled 36 days and found $\bar{X} = \$565$ and $S = \$150$.

1) In order to estimate the average daily sales to within $20 with 95% reliability, how many days should Jerry sample?
2) If the true average daily sales is $530, what is the power of Jerry’s test at the significance level of 0.05?
3) Suppose $\mu = \$530$. In order to guarantee $\alpha = 0.05$ and $\beta = 0.2$, how many days should Jerry sample?

Sol)

1) $P(|\bar{X} - \mu| \le E) = 1 - \alpha$, so

$$n = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^2 = \left( \frac{1.96 \times 150}{20} \right)^2 = 216.09 \ \Rightarrow\ n = 217$$

2) $H_0: \mu = \mu_0 = 525$ vs. $H_a: \mu = \mu_a = 530 > \mu_0$

$$\text{Power} = P(\text{Reject } H_0 \mid H_a) = P(Z_0 \ge Z_\alpha \mid \mu = \mu_a) = P\left( \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge Z_\alpha \,\middle|\, \mu = \mu_a \right)$$

$$= P\left( \frac{\bar{X} - \mu_a}{S/\sqrt{n}} \ge Z_\alpha - \frac{\mu_a - \mu_0}{S/\sqrt{n}} \,\middle|\, \mu = \mu_a \right) = P\left( Z \ge 1.645 - \frac{530 - 525}{150/\sqrt{36}} \right)$$

$$= P(Z \ge 1.445) \approx \frac{0.0749 + 0.0735}{2} = 0.0742$$

3) $\alpha = 0.05$, $\beta = 0.2$, $\mu_a = 530$

$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.845)^2 \times 150^2}{(530 - 525)^2} = 5580.09 \ \Rightarrow\ n = 5581$$

Ex4) John Pauzke, president of Cereal’s Unlimited Inc, wants to be very certain that the mean weight $\mu$ of packages satisfies the package label weight of 16 ounces. The packages are filled by a machine that is set to fill each package to a specified weight. However, the machine has random variability measured by $\sigma^2$.
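John would like to have strong evidence that the mean package weight is above 16 oz. George Williams, quality control manager, advises him to examine a random sample of 25 packages of cereal. From his past experience, George knew that the weight of the packages follows a normal distribution with standard deviation 0.4 oz. At the significance level $\alpha = 0.05$,

1) What is the decision rule (rejection region) in terms of the sample mean $\bar{X}$?
2) What is the power of the test when $\mu = 16.13$ oz?
3) How many packages of cereal should be sampled if we wish to achieve a power of 80% when $\mu = 16.13$ oz?

Sol) Let X denote the weight of a randomly selected package of cereal; then $X \sim N(\mu, \sigma^2)$ with $\sigma = 0.4$.

$H_0: \mu = 16$ vs. $H_a: \mu > 16$, or equivalently $H_0: \mu \le 16$ vs. $H_a: \mu > 16$.

1) Test statistic:

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{H_0}{\sim} N(0,1), \quad \mu_0 = 16$$

$$P(Z_0 \ge c \mid H_0) = \alpha \ \Rightarrow\ c = Z_\alpha$$

We reject $H_0$ at $\alpha = 0.05$ if

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha \iff \bar{X} \ge \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}} = 16 + 1.645 \times \frac{0.4}{\sqrt{25}} = 16.1316 \text{ (oz)}$$

2) $H_0: \mu = \mu_0 = 16$ vs. $H_a: \mu = \mu_a = 16.13 > \mu_0$ (n = 25)

$$\text{Power} = P(\text{Reject } H_0 \mid H_a) = P(Z_0 \ge Z_\alpha \mid \mu = \mu_a) = P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha \,\middle|\, \mu = \mu_a \right)$$

$$= P\left( \frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \ge Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a \right) = P\left( Z \ge 1.645 - \frac{16.13 - 16}{0.4/\sqrt{25}} \right) = P(Z \ge 0.02) \approx 0.49$$

3) $\alpha = 0.05$, $\beta = 0.2$, $\mu_0 = 16$, $\mu_a = 16.13$, $\sigma = 0.4$

$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.845)^2 (0.4)^2}{(16.13 - 16)^2} \approx 58.7 \ \Rightarrow\ n = 59$$

⑩ Inference on one population variance

When the population is normal:

Data: $X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$

$$W = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}$$

W is the pivotal quantity for the inference on $\sigma^2$.

1. CI for $\sigma^2$

$$P\left( \chi^2_{n-1,\alpha/2,L} \le \frac{(n-1) S^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2,U} \right) = 1 - \alpha$$

$$P\left( \frac{(n-1) S^2}{\chi^2_{n-1,\alpha/2,U}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{n-1,\alpha/2,L}} \right) = 1 - \alpha$$

$$\left( \frac{(n-1) S^2}{\chi^2_{n-1,\alpha/2,U}},\ \frac{(n-1) S^2}{\chi^2_{n-1,\alpha/2,L}} \right): \text{ 100}(1-\alpha)\% \text{ CI for } \sigma^2$$

2. Test on $\sigma^2$

$H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 > \sigma_0^2$

Note that $E(S^2) = \sigma^2$.

Test statistic:

$$W_0 = \frac{(n-1) S^2}{\sigma_0^2} \overset{H_0}{\sim} \chi^2_{n-1}$$

At the significance level α, we reject $H_0$ if $W_0 \ge \chi^2_{n-1,\alpha,U}$.

Question: What if the population is NOT normal?
Answer: If you know the population distribution, you can do the LR test (likelihood ratio test). If the population distribution is unknown, you can try the Box-Cox normal transformation, or apply non-parametric procedures such as Bootstrap resampling.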
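
As a numerical cross-check on the sample-size and power formulas from section ⑧, which are used in Ex3 and Ex4, the following is a minimal Python sketch. It is not part of the original SAS material. It uses only the standard library (`statistics.NormalDist`), and the function names `upper_quantile`, `n_for_margin`, `n_for_test`, and `power_upper_tailed` are hypothetical helpers introduced here for illustration. It assumes a normal population with known σ (for Ex3, S = 150 stands in for σ, as in the notes).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def upper_quantile(p):
    """Return Z_p, the point with upper-tail probability p under N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def n_for_margin(sigma, max_error, alpha=0.05):
    """Sample size from n = (Z_{alpha/2} * sigma / E)^2, rounded up."""
    z = upper_quantile(alpha / 2)
    return math.ceil((z * sigma / max_error) ** 2)


def n_for_test(sigma, mu0, mua, alpha=0.05, beta=0.2, two_sided=False):
    """Sample size from n = (Z_alpha + Z_beta)^2 sigma^2 / (mua - mu0)^2.

    For the two-sided test, alpha is replaced by alpha/2 and the result
    is the approximate formula (the negligible tail is dropped).
    """
    z_alpha = upper_quantile(alpha / 2 if two_sided else alpha)
    z_beta = upper_quantile(beta)
    return math.ceil(((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2)


def power_upper_tailed(sigma, n, mu0, mua, alpha=0.05):
    """Power of the upper-tailed z-test: P(Z >= Z_alpha - (mua - mu0)/(sigma/sqrt(n)))."""
    shift = (mua - mu0) / (sigma / math.sqrt(n))
    return 1.0 - STD_NORMAL.cdf(upper_quantile(alpha) - shift)


# Ex3: sigma taken as S = 150, mu0 = 525, mua = 530, n = 36
print(n_for_margin(150, 20, alpha=0.05))
print(power_upper_tailed(150, 36, 525, 530, alpha=0.05))
print(n_for_test(150, 525, 530, alpha=0.05, beta=0.2))

# Ex4: sigma = 0.4, mu0 = 16, mua = 16.13, n = 25
print(power_upper_tailed(0.4, 25, 16, 16.13, alpha=0.05))
print(n_for_test(0.4, 16, 16.13, alpha=0.05, beta=0.2))
```

Because the sketch uses unrounded normal quantiles (for example, Z_0.2 is about 0.8416 rather than the table value 0.845 used in the hand calculation), the required sample size in Ex3 part 3 comes out as 5565 instead of 5581. The other answers (217 days, power of about 0.074, power of about 0.49, and 59 packages) match the hand calculations.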