Chapter Twelve: Analysis of Variance
McGraw-Hill/Irwin © 2005 The McGraw-Hill Companies, Inc., All Rights Reserved.

Goals

When you have completed this chapter, you will be able to:
ONE List the characteristics of the F distribution.
TWO Conduct a test of hypothesis to determine whether the variances of two populations are equal.
THREE Discuss the general idea of analysis of variance.
FOUR Organize data into a one-way and a two-way ANOVA table.
FIVE Define and understand the terms treatments and blocks.
SIX Conduct a test of hypothesis among three or more treatment means.
SEVEN Develop confidence intervals for the difference between treatment means.
EIGHT Conduct a test of hypothesis to determine if there is a difference among block means.

Characteristics of the F Distribution

There is a "family" of F distributions. Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
The F distribution is continuous.
F cannot be negative, and the distribution is positively skewed.
Its values range from 0 to $\infty$. As $F \to \infty$, the curve approaches the X-axis but never touches it.

Test for Equal Variances of Two Populations

For the two-tailed test, the test statistic is given by
$F = \dfrac{s_1^2}{s_2^2}$
The degrees of freedom are $n_1 - 1$ for the numerator and $n_2 - 1$ for the denominator.
$s_1^2$ and $s_2^2$ are the sample variances for the two samples. The larger sample variance is placed in the numerator.
The null hypothesis is rejected if the computed value of the test statistic is greater than the critical value.

Example 1

Colin, a stockbroker at Critical Securities, reported that the mean rate of return on a sample of 10 internet stocks was 12.6 percent with a standard deviation of 3.9 percent.
The mean rate of return on a sample of 8 utility stocks was 10.9 percent with a standard deviation of 3.5 percent. At the .05 significance level, can Colin conclude that there is more variation in the internet stocks?

Step 1: The hypotheses are
$H_0: \sigma_I^2 \le \sigma_U^2$
$H_1: \sigma_I^2 > \sigma_U^2$
Step 2: The significance level is .05.
Step 3: The test statistic follows the F distribution.
Step 4: $H_0$ is rejected if F > 3.68 or if p < .05. The degrees of freedom are $n_1 - 1 = 9$ in the numerator and $n_2 - 1 = 7$ in the denominator.
Step 5: The value of F is computed as follows.
$F = \dfrac{(3.9)^2}{(3.5)^2} = 1.2416$
P(F > 1.2416) is .3965. Because 1.2416 is less than 3.68, $H_0$ is not rejected. There is insufficient evidence to show more variation in the internet stocks.

The ANOVA Test of Means

The F distribution is also used for testing whether two or more sample means came from the same or equal populations. This technique is called analysis of variance, or ANOVA.
The null and alternate hypotheses for four sample means are given as:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: Not all of the means are equal.

Underlying assumptions for ANOVA

ANOVA requires the following conditions:
The sampled populations follow the normal distribution.
The samples are independent.
The populations have equal standard deviations.

The test statistic is
F = (estimate of the population variance based on the differences among the sample means) / (estimate of the population variance based on the variation within the samples)

Degrees of freedom for the F statistic in ANOVA:
If there are k populations being sampled, the numerator degrees of freedom is k - 1.
If there are a total of n observations, the denominator degrees of freedom is n - k.

ANOVA divides the total variation into the variation due to the treatment (treatment variation) and the variation due to the error component (random variation).
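In symbols, this partition can be sketched as follows, writing TSS for the total variation, SST for the treatment variation, and SSE for the random (error) variation, which are the labels used in the ANOVA table. The degrees of freedom split in the same way, which is where the k - 1 and n - k quoted above come from. This is a summary of the standard identities, offered as a reading aid rather than as part of the original slides.

```latex
\begin{align*}
\text{TSS} &= \text{SST} + \text{SSE} \\
n - 1 &= (k - 1) + (n - k) \\
F &= \frac{\text{SST}/(k - 1)}{\text{SSE}/(n - k)}
\end{align*}
```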
In the following table, i stands for the ith observation, $\bar{X}_G$ is the overall or grand mean, k is the number of treatment groups, $n_k$ is the number of observations in treatment k, and $\bar{X}_k$ is the mean of treatment k.

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Treatments            SST              k - 1                MST = SST/(k - 1)   MST/MSE
Error                 SSE              n - k                MSE = SSE/(n - k)
Total                 TSS              n - 1

$SST = \sum_k n_k (\bar{X}_k - \bar{X}_G)^2$ (treatment variation)
$SSE = \sum_k \sum_i (X_{i,k} - \bar{X}_k)^2$ (random variation)
$TSS = \sum_i (X_i - \bar{X}_G)^2$ (total variation)

Example 2

Rosenbaum Restaurants specialize in meals for families. Katy Polsby, President, recently developed a new meat loaf dinner. Before making it a part of the regular menu she decides to test it in several of her restaurants. She would like to know if there is a difference in the mean number of dinners sold per day at the Aynor, Loris, and Lander restaurants. Use the .05 significance level.

Number of Dinners Sold by Restaurant

Day     Aynor   Loris   Lander
Day 1   13      10      18
Day 2   12      12      16
Day 3   14      13      17
Day 4   12      11      17
Day 5   -       -       17

Step One: State the null hypothesis and the alternate hypothesis.
$H_0: \mu_{Aynor} = \mu_{Loris} = \mu_{Lander}$
$H_1$: Not all of the means are equal.
Step Two: Select the level of significance. This is given in the problem statement as .05.
Step Three: Determine the test statistic. The test statistic follows the F distribution.
Step Four: Formulate the decision rule. The numerator degrees of freedom, k - 1, equal 3 - 1 or 2. The denominator degrees of freedom, n - k, equal 13 - 3 or 10. The critical value of F at 2 and 10 degrees of freedom is 4.10. Thus, $H_0$ is rejected if F > 4.10 or if p < $\alpha$ = .05.
Step Five: Select the sample, perform the calculations, and make a decision. Using the data provided, the ANOVA calculations follow.
Computation of SSE: $\sum_k \sum_i (X_{i,k} - \bar{X}_k)^2$

Aynor                      Loris                      Lander
# sold   SS(Aynor)         # sold   SS(Loris)         # sold   SS(Lander)
13       (13 - 12.75)^2    10       (10 - 11.5)^2     18       (18 - 17)^2
12       (12 - 12.75)^2    12       (12 - 11.5)^2     16       (16 - 17)^2
14       (14 - 12.75)^2    13       (13 - 11.5)^2     17       (17 - 17)^2
12       (12 - 12.75)^2    11       (11 - 11.5)^2     17       (17 - 17)^2
                                                      17       (17 - 17)^2
Mean     12.75             Mean     11.5              Mean     17
SS       2.75              SS       5                 SS       2

SSE: 2.75 + 5 + 2 = 9.75
Grand mean $\bar{X}_G$: 14.00

Computation of TSS: $\sum_i (X_i - \bar{X}_G)^2$

Aynor                      Loris                      Lander
# sold   TSS(Aynor)        # sold   TSS(Loris)        # sold   TSS(Lander)
13       (13 - 14)^2       10       (10 - 14)^2       18       (18 - 14)^2
12       (12 - 14)^2       12       (12 - 14)^2       16       (16 - 14)^2
14       (14 - 14)^2       13       (13 - 14)^2       17       (17 - 14)^2
12       (12 - 14)^2       11       (11 - 14)^2       17       (17 - 14)^2
                                                      17       (17 - 14)^2
Sum      9.00              Sum      30                Sum      47

TSS: 9.00 + 30 + 47 = 86.00

Computation of SST: $\sum_k n_k (\bar{X}_k - \bar{X}_G)^2$

Restaurant   Mean    Contribution to SST
Aynor        12.75   4(12.75 - 14)^2 = 6.25
Loris        11.50   4(11.50 - 14)^2 = 25.00
Lander       17.00   5(17.00 - 14)^2 = 45.00
SST                  76.25

Shortcut: SST = TSS - SSE = 86 - 9.75 = 76.25

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square        F
Treatments            76.25            3 - 1 = 2            76.25/2 = 38.125   38.125/.975 = 39.103
Error                  9.75            13 - 3 = 10          9.75/10 = .975
Total                 86.00            13 - 1 = 12

P(F > 39.103) is .000018. Since the computed F of 39.103 exceeds the critical F of 4.10, and the p-value of .000018 is less than $\alpha$ = .05, the decision is to reject the null hypothesis and conclude that at least two of the treatment means are not the same. The mean number of meals sold at the three locations is not the same.
The ANOVA tables that follow are from the Minitab and Excel systems.
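The sums of squares in Example 2 can be checked with a short script. What follows is a minimal sketch in plain Python, not the method used to produce the slides: the function name `one_way_anova` and the dictionary layout are illustrative choices, and the critical value 4.10 is copied from Step Four rather than computed from the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a dict of group -> observations."""
    observations = [x for obs in groups.values() for x in obs]
    n = len(observations)   # total number of observations
    k = len(groups)         # number of treatment groups
    grand_mean = mean(observations)

    # Treatment variation: group sizes times squared distance of group mean from grand mean
    sst = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
    # Random variation: squared distance of each observation from its own group mean
    sse = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
    # Total variation: squared distance of each observation from the grand mean
    tss = sum((x - grand_mean) ** 2 for x in observations)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "TSS": tss, "MST": mst, "MSE": mse, "F": mst / mse}


# Data from Example 2 (dinners sold per day at each restaurant)
dinners = {
    "Aynor": [13, 12, 14, 12],
    "Loris": [10, 12, 13, 11],
    "Lander": [18, 16, 17, 17, 17],
}

result = one_way_anova(dinners)
F_CRITICAL = 4.10  # assumption: taken from Step Four (2 and 10 df, .05 level)
reject_h0 = result["F"] > F_CRITICAL

for name, value in result.items():
    print(f"{name}: {value:.3f}")
print("Reject H0:", reject_h0)
```

The three sums of squares are computed independently here, so SST + SSE = TSS serves as a check on the arithmetic rather than as a shortcut.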
Minitab output (Example 2)

Analysis of Variance
Source   DF       SS       MS       F       P
Factor    2   76.250   38.125   39.10   0.000
Error    10    9.750    0.975
Total    12   86.000

Level     N     Mean   StDev
Aynor     4   12.750   0.957
Loris     4   11.500   1.291
Lander    5   17.000   0.707

Pooled StDev = 0.987

[Figure: individual 95% CIs for each mean, based on pooled StDev, plotted on an axis marked 12.5, 15.0, 17.5]

Excel output (Example 2)

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
Aynor        4    51     12.75       0.92
Loris        4    46     11.50       1.67
Lander       5    85     17.00       0.50

ANOVA
Source of Variation      SS   df      MS       F   P-value   F crit
Between Groups        76.25    2   38.13   39.10     2E-05     4.10
Within Groups          9.75   10    0.98
Total                 86.00   12

Inferences About Treatment Means

When the null hypothesis that the means are equal is rejected, we want to know which treatment means differ. One of the simplest procedures is to use confidence intervals around the difference in treatment means.

Confidence Interval for the Difference Between Two Means

$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
t is obtained from the t table with (n - k) degrees of freedom.
MSE = SSE/(n - k)
If the confidence interval around the difference in treatment means includes zero, there is not a difference between the treatment means.

Example 3

Develop a 95% confidence interval for the difference in the mean number of meat loaf dinners sold in Lander and Aynor. Can Katy conclude that there is a difference between the two restaurants?
$(17 - 12.75) \pm 2.228 \sqrt{.975 \left( \dfrac{1}{5} + \dfrac{1}{4} \right)} = 4.25 \pm 1.48 = (2.77, 5.73)$
Because zero is not in the interval, we conclude that this pair of means differs. The mean number of meals sold in Aynor is different from the mean number sold in Lander.

Two-Factor ANOVA

Sometimes there are other causes of variation. In the two-factor ANOVA we test whether there is a significant difference among the treatment means and whether there is a difference among the block means (the blocking variable acts as a second treatment variable).
$SSB = r \sum_b (\bar{X}_b - \bar{X}_G)^2$
where
r is the number of observations in each block (the number of treatments),
$\bar{X}_b$ is the sample mean of block b, and
$\bar{X}_G$ is the overall or grand mean.
In the following ANOVA table, all sums of squares are computed as before, with the addition of the SSB.

Two-Factor ANOVA Table

Source of Variation   Sum of Squares          Degrees of Freedom   Mean Square                   F
Treatments (k)        SST                     k - 1                MST = SST/(k - 1)             MST/MSE
Blocks (b)            SSB                     b - 1                MSB = SSB/(b - 1)             MSB/MSE
Error                 SSE = TSS - SST - SSB   (k - 1)(b - 1)       MSE = SSE/[(k - 1)(b - 1)]
Total                 TSS                     n - 1

Example 4

The Bieber Manufacturing Co. operates 24 hours a day, five days a week. The workers rotate shifts each week. Todd Bieber, the owner, is interested in whether there is a difference in the number of units produced when the employees work on various shifts. A sample of five workers is selected and their output recorded on each shift. At the .05 significance level, can we conclude there is a difference in the mean production by shift and in the mean production by employee?

Employee    Day Output   Evening Output   Night Output
McCartney   31           25               35
Neary       33           26               33
Schoen      28           24               30
Thompson    30           29               28
Wagner      28           26               27

Treatment Effect

Step 1: State the null hypothesis and the alternate hypothesis.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all means are equal.
Step 2: Select the level of significance. Given as .05.
Step 3: Determine the test statistic. The test statistic follows the F distribution.
Step 4: Formulate the decision rule. $H_0$ is rejected if F > 4.46 (the degrees of freedom are 2 and 8) or if p < .05.
Step 5: Perform the calculations and make a decision.

Block Effect

Step 1: State the null hypothesis and the alternate hypothesis.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_1$: Not all means are equal.
Step 2: Select the level of significance. Given as $\alpha$ = .05.
Step 3: Determine the test statistic. The test statistic follows the F distribution.
Step 4: Formulate the decision rule. $H_0$ is rejected if F > 3.84 (df = 4 and 8) or if p < .05.
Step 5: Perform the calculations and make a decision.

Block Sums of Squares
Effects of time of day and worker on productivity
Note: $\bar{X}_G$ = 28.87

Employee    Day   Evening   Night   Mean    Contribution to SSB
McCartney   31    25        35      30.33   3(30.33 - 28.87)^2 = 6.42
Neary       33    26        33      30.67   3(30.67 - 28.87)^2 = 9.68
Schoen      28    24        30      27.33   3(27.33 - 28.87)^2 = 7.08
Thompson    30    29        28      29.00   3(29.00 - 28.87)^2 = .05
Wagner      28    26        27      27.00   3(27.00 - 28.87)^2 = 10.49

SSB = 6.42 + 9.68 + 7.08 + .05 + 10.49 = 33.73 (the individual terms are rounded; the total computed from unrounded means is 33.73)

Compute the remaining sums of squares as before:
TSS = 139.73
SST = 62.53
SSE = 139.73 - 62.53 - 33.73 = 43.47
df(block) = b - 1 = 4
df(treatment) = k - 1 = 2
df(error) = (k - 1)(b - 1) = 8

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square       F
Treatments (k)         62.53            2                   62.53/2 = 31.27   31.27/5.43 = 5.75
Blocks (b)             33.73            4                   33.73/4 = 8.43    8.43/5.43 = 1.55
Error                  43.47            8                   43.47/8 = 5.43
Total                 139.73           14

Treatment Effect
Since the computed F of 5.75 exceeds the critical F of 4.46, and the p-value of .03 is less than $\alpha$ = .05, $H_0$ is rejected. There is a difference in the mean number of units produced for the different time periods.

Block Effect
Since the computed F of 1.55 is less than the critical F of 3.84, and the p-value of .28 is greater than $\alpha$ = .05, $H_0$ is not rejected. There is no significant difference in the mean number of units produced by the different employees.
Minitab output (Example 4)

Two-way ANOVA: Units versus Worker, Shift

Analysis of Variance for Units
Source   DF       SS      MS      F       P
Worker    4    33.73    8.43   1.55   0.276
Shift     2    62.53   31.27   5.75   0.028
Error     8    43.47    5.43
Total    14   139.73

ANOVA Output Using Excel (Example 4)

Anova: Two-Factor Without Replication

SUMMARY     Count   Sum   Average   Variance
Day             5   150      30.0        4.5
Evening         5   130      26.0        3.5
Night           5   153      30.6       11.3

McCartney       3    91     30.33      25.33
Neary           3    92     30.67      16.33
Schoen          3    82     27.33       9.33
Thompson        3    87     29.00          1
Wagner          3    81     27.00          1

ANOVA
Source of Variation       SS   df      MS      F   P-value   F crit
Rows                   62.53    2   31.27   5.75      0.03     4.46
Columns                33.73    4    8.43   1.55      0.28     3.84
Error                  43.47    8    5.43
Total                 139.73   14
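The two-factor sums of squares in Example 4 can be reproduced the same way. The following is a minimal sketch in plain Python under stated assumptions: the function name `two_way_anova` and the data layout (one list per employee, shifts in the order day, evening, night) are illustrative and not part of the original slides. It computes only the F ratios; the critical values 4.46 and 3.84 would still come from an F table, as in the slides.

```python
from statistics import mean


def two_way_anova(rows):
    """Two-factor ANOVA without replication.

    rows maps each block label to its observations, one per treatment,
    with the treatments in the same order in every block.
    """
    blocks = list(rows.values())
    b = len(blocks)      # number of blocks (employees)
    k = len(blocks[0])   # number of treatments (shifts)
    observations = [x for block in blocks for x in block]
    grand_mean = mean(observations)

    tss = sum((x - grand_mean) ** 2 for x in observations)
    # Each treatment mean is based on b observations
    treatment_means = [mean([block[j] for block in blocks]) for j in range(k)]
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    # Each block mean is based on k observations
    ssb = k * sum((mean(block) - grand_mean) ** 2 for block in blocks)
    sse = tss - sst - ssb

    mse = sse / ((k - 1) * (b - 1))
    return {
        "TSS": tss,
        "SST": sst,
        "SSB": ssb,
        "SSE": sse,
        "F_treatment": (sst / (k - 1)) / mse,
        "F_block": (ssb / (b - 1)) / mse,
    }


# Data from Example 4: units produced on the day, evening, and night shifts
output = {
    "McCartney": [31, 25, 35],
    "Neary": [33, 26, 33],
    "Schoen": [28, 24, 30],
    "Thompson": [30, 29, 28],
    "Wagner": [28, 26, 27],
}

anova = two_way_anova(output)
for name, value in anova.items():
    print(f"{name}: {value:.2f}")
```

Working from unrounded means, the script gives SSB = 33.73 and the F ratios 5.75 and 1.55, in agreement with the Minitab and Excel output above.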