One-Way ANOVA

Copyright © 1982, 2008, 2012 Golde I. Holtzman, all rights reserved. 11/13/2012. Virginia Tech Department of Statistics.

One-way (or one-factor) analysis of variance (ANOVA) is covered in Sokal and Rohlf (1995) Chapters 8 and 9 (computations shown in Box 9.4, pp. 218-219), in Ott Chapters 10 and 11, in Koopmans (1987) Chapter 11, in Zar (1984) Chapter 11, Sections 11.1-11.3, and in Zar (1999) Chapters 10 and 11, Sections 10.1 and 11.1-11.3.

Contents

Breakfast Cereal Example
   o The Six-Step Method
Sums of Squares
   o Sum of Squares Exposed
Electronic Computation
   o SAS Implementation
Trivial Examples
Exercises
   o Sums-of-Squares-Exposed Exercise
   o Trivial Exercises
   o The Doughnut Problem

Breakfast Cereal Example

Sixteen randomly selected mice of the same age and strain were randomly assigned to one of four treatment groups. The treatments were four diets: Cheerios, Corn Flakes, Frostyos, and Frosted Flakes. The weight gain after 1 week on the diet was recorded for each mouse. Is there evidence that the mean weight gain is affected by these treatments?

Table 1. Breakfast Cereal Study: Raw Data, "two-variable format"

   Diet              Weight Gain (g)
   Cheerios                 1
   Cheerios                 2
   Cheerios                 3
   Cheerios                 2
   Corn Flakes              4
   Corn Flakes              2
   Corn Flakes              3
   Corn Flakes              3
   Frostyos                 5
   Frostyos                 5
   Frostyos                 4
   Frostyos                 6
   Frosted Flakes           5
   Frosted Flakes           7
   Frosted Flakes           6
   Frosted Flakes           6

Table 2. Breakfast Cereal Study: Raw Data, "four-sample format"

   Cheerios   Corn Flakes   Frostyos   Frosted Flakes
       1           4            5             5
       2           2            5             7
       3           3            4             6
       2           3            6             6

The sample, i.e., the raw data, can be viewed, as in Table 1, as a relationship between two variables: one qualitative variable, diet (Cheerios, Corn Flakes, Frostyos, Frosted Flakes), and one quantitative variable, weight gain (g). Alternatively, the raw data can be viewed, as in Table 2, as four samples of the same quantitative variable, weight gain (g), from four independent populations, e.g., the "populations" of mice who eat, respectively, the four different diets. The data also can be viewed graphically as the relationship between two variables, as in Figure 1, which corresponds to the view of Table 1.

[Figure 1. Graph of raw data: weight gain as a function of diet. Vertical axis: $Y_{ij}$, Weight Gain (g); horizontal axis: treatment group (diet) $i$ = 1 (Cheerios), 2 (Corn Flakes), 3 (Frostyos), 4 (Frosted Flakes); a second panel shows the four groups combined.]

The Six-Step Method

1. Model:

   $$Y_{ij} = \mu_i + e_{ij} = \mu + \tau_i + e_{ij}$$

   (obs response)ij = (pop mean response to trt)i + (error)ij
   (obs response)ij = (pop mean response) + (pop effect of trt)i + (error)ij

   where

   a. $Y_{ij}$ denotes the weight gain (g) of the j-th randomly selected mouse from the i-th treatment group, $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, k$.

   b. $\mu_i$ denotes the mean weight gain, or expected weight gain, of mice on diet i.
      $\mu$ denotes the grand mean, i.e., the mean weight gain, or expected weight gain, averaged over all k diets.
      $\tau_i = (\mu_i - \mu)$ is the effect of treatment i.
      $e_{ij} = (Y_{ij} - \mu_i)$ is the residual or error of the j-th observation in the i-th group.

   c. Assume:
      i.   $e_{ij}$ are independent (random sampling and careful experimentation achieve this),
      ii.  $\sigma_i = \sigma$ for all i (homogeneity of variance),
      iii. $e_{ij}$ are normally distributed with mean 0 and standard deviation $\sigma_i$.

      These assumptions imply, and are equivalent to assuming, that
      i.   $Y_{ij}$ are independent (random sampling and careful experimentation achieve this),
      ii.  $\sigma_i = \sigma$ for all i (homogeneity of variance),
      iii. $Y_{ij}$ are normally distributed with mean $\mu_i$ and standard deviation $\sigma_i$.

2. Hypotheses: $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ vs. $H_A$: not $H_0$; or, equivalently, $H_0$: $\tau_i = 0$ for all i vs. $H_A$: $\tau_i \neq 0$ for some i.

3. Test Criterion: $F = MSB/MSW$, with $(k - 1)$ and $(N - k)$ degrees of freedom.

4. Design: $\alpha = 0.05$, $k = 4$ groups, $n_1 = n_2 = n_3 = n_4 = 4$, $N = 16$.

   Note: When all treatments have equal sample sizes, the design is called balanced. Otherwise the design is called unbalanced. Unbalanced designs require more complicated (i.e., messy) computations, but the concepts are the same (with a few exceptions). We will stick to balanced designs for the most part.

5. Computations:

                                            Cheerios   Corn Flakes   Frostyos   Frosted Flakes
        Group i →                               1           2            3             4          Combined
        $Y_{ij}$, j = (1)                       1           4            5             6
                  j = (2)                       2           2            4             5
                  j = (3)                       2           3            5             6
                  j = (4)                       3           3            6             7
   (a)  $n_i$                                   4           4            4             4          N = 16
   (b)  $\bar{Y}_i$                             2           3            5             6          $\bar{Y}$ = 4
   (c)  $\hat{\tau}_i = \bar{Y}_i - \bar{Y}$   −2          −1            1             2          Check: sum = 0
        (effect)i
   (d)  $n_i \hat{\tau}_i^2$                   16           4            4            16          SSB = 40
   (e)  $s_i$                                0.816       0.816        0.816         0.816
   (f)  $s_i^2$                              0.667       0.667        0.667         0.667
   (g)  $SS_i$                                  2           2            2             2          SSW = 8

   ANOVA (k = 4, N = 16)

   Source of Variation   df               SS    MS       F      P-value     R^2
   Between groups        (k − 1) = 3      40    13.333   20.0   P < 0.005   0.83
   Within groups         (N − k) = 12      8     0.667                      0.17
   Total                 (N − 1) = 15     48                                1.00

6. Report

   Method: The effects of diet (Cheerios, Corn Flakes, Frostyos, Frosted Flakes) on mean weight gain (g) of mice are studied by one-factor analysis of variance of a balanced design with 4 mice per diet, i.e., 16 mice in all.

   Conclusion: There is highly significant statistical evidence that the mean weight gains (g) of mice under the various diets (Cheerios, Corn Flakes, Frostyos, Frosted Flakes) are not all equal (P < 0.005), and $R^2$ = 83% of the variation in weight gain is explained by diet.

Sums of Squares

Definitional Formulas

Lines (a), (b), and (e), containing the i-th sample size $n_i$, the i-th sample mean $\bar{Y}_i$, and the i-th sample standard deviation $s_i$, respectively, for each treatment group are easily obtained with a pocket calculator that has statistical-function keys.
Then the following formulas can be used to compute the ANOVA sums of squares.

$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{k} SS_i = \sum_{i=1}^{k} (n_i - 1)\, s_i^2 = 2 + 2 + 2 + 2 = 8$$

$$SSB = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} n_i \hat{\tau}_i^2 = \sum_{i=1}^{k} n_i\, (\text{effect})_i^2 = 16 + 4 + 4 + 16 = 40$$

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = SSB + SSW = 40 + 8 = 48$$

Sum of Squares Exposed

Determination of the effects of treatments by analysis of variance works by decomposing the deviation of each observed response from the grand mean.

[Figure: Wt Gain (g), 0 to 6, plotted against Diet (Cheerios, Corn Flakes, Frosted Flakes, Frostyos).]

Deviations

For each observed response, there are three deviations. The total deviation is the deviation of the response from the grand mean. The total deviation is the sum of two parts: the effect, which is the deviation of the treatment mean from the grand mean, and the error or residual, which is the deviation of the response from the observed treatment mean. Can you see the three deviations for each point in the breakfast cereal data shown in the figure?

Additivity

Analysis of variance works because of the additivity of the sums of squares and of the degrees of freedom: in each column below, Between + Within = Total.

   Source of Variation   Deviation                                      Sum of Squares                            Degrees of Freedom
   Between               $\hat{\tau}_i = \bar{Y}_i - \bar{Y}$            $\sum_{ij} \hat{\tau}_i^2$                $k - 1$
   Within                $\hat{e}_{ij} = Y_{ij} - \bar{Y}_i$             $\sum_{ij} \hat{e}_{ij}^2$                $N - k$
   Total                 total deviation $= Y_{ij} - \bar{Y}$            $\sum_{ij} (Y_{ij} - \bar{Y})^2$          $N - 1$

Algebra

$$Y_{ij} = \mu + \tau_i + e_{ij} = \mu + (\mu_i - \mu) + (Y_{ij} - \mu_i), \qquad e_{ij} = Y_{ij} - \mu_i$$

Replacing the parameters by their estimates gives the decomposition of each total deviation:

$$Y_{ij} - \bar{Y} = (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i) = \hat{\tau}_i + \hat{e}_{ij}$$

In the table below, the effects, errors, total deviations, and their squares are displayed for each observation in the cereal data. The sums of squares appear at the bottom.
Sum-of-Squares Exposed

Column definitions: Effect $= \hat{\tau}_i = \bar{Y}_i - \bar{Y}$ (between-groups deviation); Residual or "error" $= \hat{e}_{ij} = Y_{ij} - \bar{Y}_i$ (within-groups deviation); Total Deviation (TD) $= Y_{ij} - \bar{Y}$.

   Group    Observed   Group         Grand                            Residual               Total
   or Trt   Response   Mean          Mean        Effect    Effect^2   or "error"   error^2   Deviation   TD^2
   i        $Y_{ij}$   $\bar{Y}_i$   $\bar{Y}$
   1           1          2            4           −2         4          −1           1         −3         9
   1           2          2            4           −2         4           0           0         −2         4
   1           3          2            4           −2         4           1           1         −1         1
   1           2          2            4           −2         4           0           0         −2         4
   2           4          3            4           −1         1           1           1          0         0
   2           2          3            4           −1         1          −1           1         −2         4
   2           3          3            4           −1         1           0           0         −1         1
   2           3          3            4           −1         1           0           0         −1         1
   3           5          5            4            1         1           0           0          1         1
   3           5          5            4            1         1           0           0          1         1
   3           4          5            4            1         1          −1           1          0         0
   3           6          5            4            1         1           1           1          2         4
   4           5          6            4            2         4          −1           1          1         1
   4           7          6            4            2         4           1           1          3         9
   4           6          6            4            2         4           0           0          2         4
   4           6          6            4            2         4           0           0          2         4
   Total                                            0        40           0           8          0        48

   Check: the deviations in each block sum to 0.
   Sum of Effect^2 = 40 = SSB = SSM ("explained").
   Sum of error^2  =  8 = SSW = SSE ("unexplained").
   Sum of TD^2     = 48 = SST ("total").

Electronic Computation

JMP Implementation

Download the data from Holtzman's class website and open it with JMP by performing the following steps.

   Holtzman Website > Data > JMP Examples > Cheerios.jmp

Then, in the JMP data table Cheerios.jmp,

   JMP > Analyze > Fit Y by X >

In the Fit Y by X dialog box,

   Y, Response = Wt Gain (g)
   X, Factor = Diet
   OK >

In the Fit Y by X platform, click on the red drop-down menu, and select

   > Means / Anova
   > Means and Std Dev
   > Compare Means > All Pairs, Tukey HSD

SAS Implementation

All of the computations of the previous example can be done by the Statistical Analysis System (SAS).
    *---------------------------------------------------------------------*
    | VM1 ST301200 1WAY GLM A1                                            |
    *---------------------------------------------------------------------;
    TITLE1 '1WAY ANOVA EXAMPLE (VM1 ST301200 1WAY GLM)';
    * Read the diet label (up to 14 characters) and the weight gain (g);
    DATA RAW;
       INPUT CEREAL :$14. GAIN;
       CARDS;
    1CHEERIOS 1
    1CHEERIOS 2
    1CHEERIOS 3
    1CHEERIOS 2
    2CORNFLAKES 4
    2CORNFLAKES 2
    2CORNFLAKES 3
    2CORNFLAKES 3
    3FROSTYOS 5
    3FROSTYOS 5
    3FROSTYOS 4
    3FROSTYOS 6
    4FROSTEDFLAKES 5
    4FROSTEDFLAKES 7
    4FROSTEDFLAKES 6
    4FROSTEDFLAKES 6
    ;
    * List the data and plot weight gain against diet;
    PROC PRINT;
    PROC PLOT;
       PLOT GAIN*CEREAL;
    * One-way ANOVA with LSD and Tukey multiple comparisons;
    PROC GLM;
       CLASS CEREAL;
       MODEL GAIN = CEREAL;
       MEANS CEREAL / LSD TUKEY;
    RUN;

Trivial Examples

         Example A    Example B     Example C    Example D
   Trt   Responses    Responses     Responses    Responses
   1     −1, 0, 1     −3, −2, −1    −1, 0, 1     −3, −1, 1
   2     −1, 0, 1     −1, 0, 1      −1, 0, 1     −2, 0, 2
   3     −1, 0, 1      1, 2, 3       1, 2, 3     −1, 1, 3

[Scatterplots of $Y_{ij}$ versus treatment $i$ = 1, 2, 3 for Examples A-D, drawn side by side with uniform vertical axes.]

                              Example A    Example B          Example C          Example D
   $\bar{Y}_i$                0, 0, 0      −2, 0, 2           0, 0, 2            −1, 0, 1
   $\bar{Y}$                  0            0                  2/3                0
   $\bar{Y}_i - \bar{Y}$      0, 0, 0      −2, 0, 2           −.7, −.7, 1.3      −1, 0, 1
   MSB                        0            12                 4                  3
   MSW                        1            1                  1                  4
   F                          0            12                 4                  3/4
   P                          P = 1        0.005 < P < 0.01   0.05 < P < 0.10    P > 0.5
   Result                     N.S.         Significant        Slightly Sig.      N.S.

Conclusions

Example A. There is not significant evidence that the group means differ (P = 1).
Example B. There is significant evidence that the group means differ (0.005 < P < 0.01).
Example C. There is slightly significant evidence that the group means differ (0.05 < P < 0.10).
Example D. There is not significant evidence that the group means differ (P > 0.50).
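The arithmetic behind Examples A-D can be checked with a few lines of code. The sketch below is not part of the original handout: it uses only the Python standard library, and the function name `one_way_anova` and the dictionary `examples` are hypothetical names chosen for illustration. It applies the definitional formulas for SSB and SSW and then forms MSB, MSW, and F. It does not compute P-values, which would still be read from an F table with (k − 1) and (N − k) degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples, one per treatment.

    Illustrative helper, assuming every group has at least one observation
    and that the within-group variation is not exactly zero.
    """
    k = len(groups)                      # number of treatment groups
    n_total = sum(len(g) for g in groups)  # N, total number of observations
    grand_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]

    # Between-groups sum of squares: n_i * (effect_i)^2, summed over groups.
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-groups sum of squares: squared residuals from each group mean.
    ssw = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return ssb, ssw, msb, msw, msb / msw


# Data of Trivial Examples A-D, three treatments with three responses each.
examples = {
    "A": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "B": [[-3, -2, -1], [-1, 0, 1], [1, 2, 3]],
    "C": [[-1, 0, 1], [-1, 0, 1], [1, 2, 3]],
    "D": [[-3, -1, 1], [-2, 0, 2], [-1, 1, 3]],
}

for name, groups in examples.items():
    ssb, ssw, msb, msw, f = one_way_anova(groups)
    print(f"Example {name}: MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f:.2f}")
```

Calling the same helper on the four cereal samples of Table 2 should reproduce SSB = 40, SSW = 8, and F = 20.0 from Step 5 of the breakfast cereal example.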
Exercises

Sums-of-Squares-Exposed Exercise

1. For Trivial Exercise 1 (below), construct a "Sum-of-Squares Exposed" table as I have done for the Breakfast Cereal Example.
2. For Trivial Exercise 2 (below), construct a "Sum-of-Squares Exposed" table as I have done for the Breakfast Cereal Example.

Trivial Exercises

To gain understanding of how ANOVA works, you will do four contrived trivial exercises (below). For each exercise there are three treatments with three observations per treatment. Do Exercises 1-4 (below) as I have done Examples A-D (above). There is no need to use a computer. You will learn more by doing this by hand.

a. On one page, draw side-by-side scatterplots for each exercise, as I did for the trivial examples above. Please draw them side by side with uniform vertical axes.
b. Estimate the treatment means, the grand mean, and the treatment effects. (Don't show or turn in details of the computations, just tabulate the correct answers in Step d below.)
c. Compute MSB, MSW, F, and P. (Feel free to check your computations with JMP, SAS, Minitab, etc. Please do not turn in your calculations, just tabulate the correct answers in Step d below.)
d. Now tabulate the treatment-group means, grand mean, effects, MSB, MSW, F, P, and the characterization of the result (N.S., sig., etc.) below the scatterplots as in the examples.
e. State the conclusion verbally.

         Trivial Exercise 1   Trivial Exercise 2   Trivial Exercise 3   Trivial Exercise 4
   Trt   Responses            Responses            Responses            Responses
   1     3, 4, 5              2, 3, 4              1, 3, 5              −1, 1, 3
   2     4, 5, 6              4, 5, 6              3, 5, 7              3, 5, 7
   3     5, 6, 7              6, 7, 8              5, 7, 9              5, 7, 9

One-Way ANOVA and Multiple-Comparison Exercise

The Doughnut Problem (Snedecor & Cochran (1967, 6th edition) pp. 258-259)

During cooking, doughnuts absorb fat in various amounts. A nutritionist wished to learn if the amount absorbed depends on the type of fat used. For each of four fats, six batches of doughnuts were prepared. The amount of fat absorbed for each batch is recorded in the table below.

   Fat absorbed per batch (g)
          Type of fat
      1     2     3     4
     64    78    75    55
     72    91    93    66
     68    97    78    49
     77    82    71    64
     56    85    63    70
     95    77    76    68

1. Use the 6-step method to determine whether there is statistically significant evidence that the average amount of fat absorbed differs according to the type of fat used. But modify the analysis step, Step 5, as explained next.

2. For Step 5, you may use software for the calculations, but show the following in this order, cutting and pasting (actually or virtually) as necessary.

   a. Draw a scatterplot of the doughnut data, by hand or with software. Notice from the scatterplot that there is a clear separation between the batches for Fats 4 and 2, but that for every other pair of samples, there is some overlap.

   b. Make a summary table of the doughnut results (treatments, treatment sample sizes, treatment means, and treatment standard errors), as often reported in published research, as shown in Table 3 for the Cheerios data. Notice that the table
      i.   is sorted by the mean,
      ii.  has means and standard errors rounded to two significant figures,
      iii. shows the units of measure explicitly.
      iv.  Warning: Recall that the Cheerios data are contrived in such a way that the standard errors are the same for each treatment group. That would be very unlikely to occur in real data, and it doesn't happen for the doughnut data. Be sure to show the "raw" standard errors, which are computed in JMP by Fit Y by X > Means and Std Dev, rather than the "pooled" standard errors, which are computed in JMP by Fit Y by X > Means/Anova.

      Table 3. Weekly Weight Gain by Diet.

         Diet              n    Mean gain (g)   SE (g)
         Frosted Flakes    4        6.0          0.41
         Frostyos          4        5.0          0.41
         Corn Flakes       4        3.0          0.41
         Cheerios          4        2.0          0.41

   c. Calculate and tabulate the effect of each type of fat as shown in Table 4 for the Cheerios data. Notice that a row is added for the overall mean. Notice also that the effect of each treatment is simply the treatment mean minus the overall mean.

      Table 4. Effects of Diet on Weekly Weight Gain.

         Diet              n    Mean gain (g)   Effect (g)
         Frosted Flakes    4        6.0            2.0
         Frostyos          4        5.0            1.0
         Corn Flakes       4        3.0           −1.0
         Cheerios          4        2.0           −2.0
         Overall          16        4.0            0.0

   d. Show the ANOVA table.

3. Follow the ANOVA Conclusion, Step 6, with a written full report, including, if necessary, Tukey's HSD procedure, and any relevant table(s) of results. For the Cheerios data, that table would be Table 5.

   Table 5. Weekly Weight Gain by Diet.

      Diet              n    Mean gain (g)*   SE (g)
      Frosted Flakes    4       6.0 a          0.41
      Frostyos          4       5.0 ab         0.41
      Corn Flakes       4       3.0 bc         0.41
      Cheerios          4       2.0 c          0.41

   * Means followed by the same letter are not significantly different at the 0.01 level experimentwise using the Tukey-Kramer HSD (Sall, Creighton, and Lehman 2005, Zar 1999).
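The summary quantities in Tables 3 and 4 (sample size, mean, "raw" standard error, and effect for each diet) can be reproduced with a short script. The following is only a sketch under stated assumptions, not the handout's method: it uses the Python standard library rather than JMP or SAS, the names `cereal` and `summarize` are hypothetical, and the "raw" standard error is taken to be each group's own $s_i / \sqrt{n_i}$ rather than a pooled value.

```python
from math import sqrt
from statistics import mean, stdev

# Cereal data from Table 2 (weight gain in g), keyed by diet.
cereal = {
    "Cheerios": [1, 2, 3, 2],
    "Corn Flakes": [4, 2, 3, 3],
    "Frostyos": [5, 5, 4, 6],
    "Frosted Flakes": [5, 7, 6, 6],
}


def summarize(data):
    """Return rows (diet, n, mean, raw SE, effect), sorted by mean, largest first."""
    grand_mean = mean(y for ys in data.values() for y in ys)
    rows = []
    for diet, ys in data.items():
        n = len(ys)
        group_mean = mean(ys)
        raw_se = stdev(ys) / sqrt(n)       # uses this group's own s_i, not a pooled s
        effect = group_mean - grand_mean   # treatment mean minus overall mean
        rows.append((diet, n, group_mean, raw_se, effect))
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(f"{'Diet':<15}{'n':>3}{'Mean':>6}{'SE':>7}{'Effect':>8}")
for diet, n, group_mean, raw_se, effect in summarize(cereal):
    print(f"{diet:<15}{n:>3}{group_mean:>6.1f}{raw_se:>7.2f}{effect:>+8.1f}")
```

Passing the doughnut data in the same dictionary form (one list of six batches per type of fat) would give the corresponding rows for Steps 2b and 2c; rounding to two significant figures for the report is left to the reader.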