AMS312.01 Lecture notes, April 14, 2008, Prof. Wei Zhu

1. Jerry is planning to purchase a sporting goods store. He calculated that, in order to cover basic expenses, the average daily sales must be at least $525. He checked the daily sales of 36 randomly selected business days and found that the average daily sales for these days was $565, with a standard deviation of $150.
(a) At the significance level \(\alpha = 0.05\), can Jerry conclude that the average daily sales are higher than $525? What is the p-value?
(b) In order to estimate the average daily sales of the store to within $20 with 95% reliability, how many days should Jerry sample? Please derive the general formula first.
(c) Suppose Jerry could only check the daily sales of 9 randomly selected business days (instead of 36), and that the daily sales for these 9 days are 510, 537, 548, 592, 503, 490, 601, 499 and 640, respectively. At the significance level \(\alpha = 0.05\), can Jerry conclude that the average daily sales are higher than $525? What assumptions are necessary for your test? What is the p-value of your test?

Solution. Inference on one population mean. \(n = 36\). The population variance \(\sigma^2\) is unknown. If you have the data, you can run a normality test (e.g., the Shapiro-Wilk test) to see whether the sample comes from a normal distribution. If the population is normal, we use the t-distribution. If it is not normal but the sample size is large (\(n \ge 30\)), we use the pivotal quantity
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \;\dot\sim\; N(0,1) \]
(by the Central Limit Theorem and Slutsky's theorem).

(a) \(\bar{X} = 565\), \(S = 150\).
\(H_0: \mu = \mu_0 = 525\) versus \(H_a: \mu > \mu_0\)
Note: if we had observed \(\bar{X} = 505\) (< 525), you should notice that this \(H_a\) is not suitable.
Test statistic:
\[ Z_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{565 - 525}{150/6} = 1.6 \]
At the significance level \(\alpha\), we reject \(H_0\) if \(Z_0 \ge z_\alpha\). Here \(z_\alpha = z_{0.05} = 1.645\). Since \(z_0 = 1.6 < 1.645 = z_\alpha\), we cannot reject \(H_0\).
P-value \(= P(Z \ge z_0 \mid H_0) = P(Z \ge 1.6) = 0.0548\). Since the p-value exceeds \(\alpha = 0.05\), we cannot reject \(H_0\).
Note: P-value \(= P(\bar{X} \ge 565 \mid H_0) = P(Z_0 \ge 1.6 \mid H_0)\).

(b) First we derive the general formula.
Let \(E\) denote the maximum error of estimation. We require
\[ P(|\bar{X} - \mu| \le E) = 1 - \alpha \]
\[ P(-E \le \bar{X} - \mu \le E) = 1 - \alpha \]
\[ P\left( -\frac{E}{\sigma/\sqrt{n}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{E}{\sigma/\sqrt{n}} \right) = 1 - \alpha \]
\[ \frac{E}{\sigma/\sqrt{n}} = z_{\alpha/2} \quad\Rightarrow\quad n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 \]
Next we plug in the values for the given problem, using \(S = 150\) as the estimate of \(\sigma\):
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 = \left( \frac{1.96 \times 150}{20} \right)^2 = 216.09, \]
which we round up to \(n = 217\).

(c) Suppose that, from the Shapiro-Wilk test, we know the sample comes from a normal population.
\(H_0: \mu = \mu_0 = 525\) versus \(H_a: \mu > \mu_0\)
\(\bar{X} = 546.67\), \(S = 53.09\).
Test statistic:
\[ T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{546.67 - 525}{53.09/3} = 1.22 \]
At the significance level \(\alpha\), we reject \(H_0\) if \(T_0 \ge t_{n-1,\alpha} = t_{8,0.05} = 1.860\). Since \(T_0 = 1.22 < t_{8,0.05} = 1.860\), we cannot reject \(H_0\).
P-value \(= P(T \ge t_0 \mid H_0) = P(T \ge 1.22) > 0.1\) (from the t-table, since \(1.22 < t_{8,0.10} = 1.397\)). Since the p-value exceeds \(\alpha = 0.05\), we cannot reject \(H_0\).

2. John Pauzke, president of Cereals Unlimited, Inc., wants to be very certain that the mean weight \(\mu\) of packages satisfies the package label weight of 16 ounces. The packages are filled by a machine that is set to fill each package to a specified weight. However, the machine has random variability measured by \(\sigma^2\). John would like to have strong evidence that the mean package weight is above 16 ounces. George Williams, quality control manager, advises him to examine a random sample of 25 packages of cereal. From his past experience, George knows that the weight of the cereal packages follows a normal distribution with standard deviation 0.4 ounce. At the significance level \(\alpha = 0.05\),
(a) What is the decision rule (rejection region) in terms of the sample mean \(\bar{x}\)? Please derive the general formula using the concept of Type I error rate.
(b) What is the power of the test when \(\mu = 16.2\) ounces? Please derive the general formula for power calculation first.
(c) What is the sample size necessary to ensure a power of 80% when \(\mu = 16.2\) ounces? Please derive the general formula for sample size calculation based on the Type I and II error rates first.
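Solution:
(a) Given that the population is normal and the population variance is known, we first derive the general formula for the hypotheses \(H_0: \mu = \mu_0\) versus \(H_a: \mu > \mu_0\).
* The pivotal quantity for \(\mu\):
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
* Test statistic (plug \(H_0: \mu = \mu_0\) into \(Z\)):
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{under } H_0 \]
* Now we derive the decision threshold (rejection region) for our test based on the Type I error rate, that is, the significance level \(\alpha\):
\[ \alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0) = P(Z_0 \ge c \mid H_0: \mu = \mu_0) \]
Given \(H_0\), \(Z_0\) follows exactly \(N(0,1)\); therefore \(c = z_\alpha\). Thus we reject \(H_0\) (\(\mu = \mu_0\)) in favor of \(H_a\) (\(\mu > \mu_0\)) if \(Z_0 \ge z_\alpha\); that is, if
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge z_\alpha \quad\Leftrightarrow\quad \bar{X} \ge \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} \]
For the given problem, we have \(\mu_0 = 16\), \(\sigma = 0.4\), \(n = 25\), \(\alpha = 0.05\). Thus we reject \(H_0\) (\(\mu = 16\)) in favor of \(H_a\) (\(\mu > 16\)) if
\[ \bar{X} \ge 16 + z_{0.05} \cdot \frac{0.4}{5} = 16 + 1.645 \cdot \frac{0.4}{5} = 16.1316 \]

(b) \(H_0: \mu = 16 = \mu_0\) versus \(H_a: \mu = 16.2 = \mu_a > 16\)
If \(\mu = \mu_a\), then
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N\left( \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}},\, 1 \right). \]
Therefore
\[ \begin{aligned}
\text{Power} = 1 - \beta &= P(\text{reject } H_0 \mid H_a) = P(Z_0 \ge z_\alpha \mid \mu = \mu_a) \\
&= P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge z_\alpha \,\Big|\, \mu = \mu_a \right) \\
&= P\left( \frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \ge z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_a \right) \\
&= P\left( Z \ge z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \right), \quad Z \sim N(0,1)
\end{aligned} \]
Please see Figure 1 for an illustration.

[Figure 1. Power calculation and test related sample size calculation.]

For the given problem,
\[ z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} = \frac{16.1316 - 16.2}{0.4/5} = -0.855 \]
Thus Power \(= P(Z \ge -0.855) \approx 0.804\).

(c) \(H_0: \mu = 16 = \mu_0\) versus \(H_a: \mu = 16.2 = \mu_a > 16\)
Following Figure 1, we derive the necessary sample size by solving the equation
\[ z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} = -z_\beta \]
The resulting general formula is
\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} \]
For the given problem, power \(= 0.8\) gives \(\beta = 0.2\) and \(z_\beta = z_{0.2} = 0.842\); with \(\sigma = 0.4\) and \(\alpha = 0.05\) we have
\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.842)^2 \times 0.4^2}{(16.2 - 16)^2} = \frac{6.1852 \times 0.16}{0.04} = \frac{0.9896}{0.04} \approx 24.7, \]
which we round up to 25. Hence, about 25 packages of cereal should be sampled to achieve a power of 80% when \(\mu = 16.2\) ounces.

3. For a random sample from any population for which the mean and variance exist, please prove that
(a) The sample mean and sample variance are unbiased estimators of the population mean and variance, respectively.
(b) When the population is normal, we have learned that the sample mean and the sample variance are, indeed, independent. Please prove this for n = 2.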
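That is, for a random sample of size 2 only.

Solution:
(a) For the sample mean,
\[ E(\bar{X}) = E\left( \frac{\sum_{i=1}^n X_i}{n} \right) = \frac{1}{n} \sum_{i=1}^n E(X_i) = \frac{1}{n} \cdot n\mu = \mu \]
For the sample variance,
\[ \begin{aligned}
E(S^2) &= E\left( \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1} \right) = \frac{1}{n-1} \left[ \sum_{i=1}^n E(X_i^2) - n E(\bar{X}^2) \right] \\
&= \frac{1}{n-1} \left[ n(\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) \right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2
\end{aligned} \]

(b) When \(n = 2\), \(\bar{X} = \dfrac{X_1 + X_2}{2}\) and
\[ \begin{aligned}
S^2 &= \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2}{2 - 1} = \left( X_1 - \frac{X_1 + X_2}{2} \right)^2 + \left( X_2 - \frac{X_1 + X_2}{2} \right)^2 \\
&= \left( \frac{X_1 - X_2}{2} \right)^2 + \left( \frac{X_2 - X_1}{2} \right)^2 = \frac{(X_1 - X_2)^2}{2}
\end{aligned} \]
Since \(\bar{X}\) is a function of \(X_1 + X_2\) only and \(S^2\) is a function of \(X_1 - X_2\) only, if we can show that \(X_1 + X_2\) and \(X_1 - X_2\) are independent, then \(\bar{X}\) and \(S^2\) are independent. This can be done easily using the mgf technique, by showing that the joint mgf factors:
\[ M_{(X_1 + X_2),(X_1 - X_2)}(t_1, t_2) = M_{X_1 + X_2}(t_1)\, M_{X_1 - X_2}(t_2) \]
Indeed, for \(X_1, X_2\) i.i.d. \(N(\mu, \sigma^2)\),
\[ \begin{aligned}
E\left[ e^{t_1(X_1 + X_2) + t_2(X_1 - X_2)} \right] &= E\left[ e^{(t_1 + t_2)X_1} \right] E\left[ e^{(t_1 - t_2)X_2} \right] \\
&= e^{\mu(t_1 + t_2) + \sigma^2 (t_1 + t_2)^2/2}\; e^{\mu(t_1 - t_2) + \sigma^2 (t_1 - t_2)^2/2} \\
&= e^{2\mu t_1 + \sigma^2 t_1^2} \cdot e^{\sigma^2 t_2^2},
\end{aligned} \]
which is the product of the mgf of \(N(2\mu, 2\sigma^2)\) evaluated at \(t_1\) and the mgf of \(N(0, 2\sigma^2)\) evaluated at \(t_2\).

4. It has been claimed that more than 70 percent of the students attending a large university are opposed to a plan to increase student fees in order to build new parking facilities. If 25 out of 30 students selected at random at that university are opposed to the plan, please
(a) Test the claim at the 0.05 level of significance. Please derive the general formula for the test at the significance level \(\alpha\) first.
(b) Construct a 95% confidence interval for the proportion of students opposed to the plan. Please derive the general formula for the \(100(1-\alpha)\%\) confidence interval first.

Solution: Inference on one population proportion. \(n = 30\), \(x = 25\), \(\hat{p} = \dfrac{x}{n} = \dfrac{25}{30}\).

(a) By the central limit theorem and Slutsky's theorem, we have the pivotal quantity
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \;\dot\sim\; N(0,1) \quad \text{for large samples} \]
In general, for \(H_0: p = p_0\) versus \(H_a: p > p_0\), the test statistic is
\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \;\dot\sim\; N(0,1) \quad \text{when the null hypothesis is true} \]
By the definition of Type I error, we have
\[ \alpha = P(\text{reject } H_0 \mid H_0) = P(Z_0 \ge c \mid H_0: p = p_0) \]
Given \(H_0\), \(Z_0\) follows approximately \(N(0,1)\); therefore \(c = z_\alpha\). That is, we reject \(H_0\) if \(Z_0 \ge z_\alpha\).
In the given problem, we have \(H_0: p = 0.7\), \(H_a: p > 0.7\), \(\alpha = 0.05\).
Check: \(np_0 = 21 \ge 5\) and \(n(1-p_0) = 9 \ge 5\), so the normal approximation is reasonable.
Test statistic:
\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{25/30 - 0.7}{\sqrt{0.7(1-0.7)/30}} = 1.59 \]
Since \(Z_0 = 1.59 < z_{0.05} = 1.645\), we cannot reject \(H_0\).

(b)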
By the central limit theorem and Slutsky's theorem, we have the pivotal quantity
\[ Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \;\dot\sim\; N(0,1) \quad \text{for large samples} \]
From the pivotal quantity and its distribution:
\[ P\left( -z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2} \right) = 1 - \alpha \]
so the \(100(1-\alpha)\%\) confidence interval for \(p\) is
\[ \left[ \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n},\;\; \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \right] \]
Since \(x = 25 \ge 5\) and \(n - x = 30 - 25 = 5 \ge 5\), we can use this formula to construct the CI for \(p\):
\[ \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = [0.70,\ 0.97] \]
That is, we are 95% confident that between 70% and 97% of the students are against the plan.

5. To study the effectiveness of wall insulation in saving energy for home heating, the energy consumption (in MWh) for 5 houses in Bristol, England, was recorded for two winters; the first winter was before insulation and the second winter was after insulation.

House     1      2      3      4      5
Before   12.1   10.6   13.4   13.8   15.5
After    12.0   11.0   14.1   11.2   15.3

(a) Please provide a 95% confidence interval for the difference between the mean energy consumption before and after the wall insulation is installed. What assumptions are necessary for your inference?
(b) Can you conclude that there is a difference in mean energy consumption before and after the wall insulation is installed at the significance level 0.05? Please test it and evaluate the p-value of your test. What assumptions are necessary for your inference?

Solution: Inference on two dependent population means; paired-samples t-test (that is, a one-sample t-test on the paired differences, taken here as before minus after).
Assumption for (a) and (b): the paired differences follow a normal distribution.

(a) \(\bar{d} = 0.36\), \(s_d = 1.30\).
CI:
\[ \bar{d} \pm t_{n-1,\alpha/2} \frac{s_d}{\sqrt{n}} = 0.36 \pm 2.776 \cdot \frac{1.30}{\sqrt{5}} = (-1.25,\ 1.97) \]

(b) \(H_0: \mu_d = 0\) versus \(H_a: \mu_d \ne 0\)
(1) Test statistic:
\[ t_0 = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{0.36 - 0}{1.30/\sqrt{5}} = 0.619 \]
Since \(|t_0| = 0.619 < t_{n-1,\alpha/2} = t_{4,0.025} = 2.776\), we fail to reject \(H_0\).
(2) p-value \(= 2 \cdot P(T \ge 0.619) > 0.2\) (from the t-table, since \(0.619 < t_{4,0.10} = 1.533\)).
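
The Problem 5 numbers can be checked with a short script. The following is a minimal sketch in Python using only the standard library; it is not part of the original notes. The variable names are illustrative, the critical value 2.776 is hard-coded from the t-table value quoted in the solution, and the helper `t_cdf_df4` relies on the closed-form CDF of Student's t that holds only for 4 degrees of freedom (an assumption specific to this sample size).

```python
import math
import statistics

# Energy consumption (MWh) for the 5 houses, from the table in Problem 5.
before = [12.1, 10.6, 13.4, 13.8, 15.5]
after = [12.0, 11.0, 14.1, 11.2, 15.3]

# Paired differences (before - after), one per house.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)          # standard error of the mean difference

# t_{4, 0.025}: table value quoted in the notes, hard-coded here.
T_CRIT = 2.776

# 95% confidence interval for the mean difference.
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)

# Test statistic for H0: mu_d = 0.
t0 = d_bar / se


def t_cdf_df4(t):
    """CDF of Student's t with 4 degrees of freedom (closed form, df = 4 only)."""
    w = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(w)) * (1.0 - (t * t / w) / 12.0)


# Two-sided p-value.
p_value = 2.0 * (1.0 - t_cdf_df4(abs(t0)))

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t0 = {t0:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p-value = {p_value:.2f}")
```

Because the script keeps the unrounded \(s_d \approx 1.305\) instead of 1.30, its interval and test statistic differ from the hand calculation in the second or third decimal place. The p-value it reports is well above 0.2, consistent with the table-based bound in part (b).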