One-Way ANOVA
One-way (or one-factor) analysis of variance (ANOVA) is covered in Sokal and Rohlf
(1995), Chapters 8 and 9 (computations shown in Box 9.4, pp. 218-219); in Ott, Chapters
10 and 11; in Koopmans (1987), Chapter 11; in Zar (1984), Chapter 11, Sections 11.1-11.3;
and in Zar (1999), Chapters 10 and 11, Sections 10.1 and 11.1-11.3.
Contents
Breakfast Cereal Example
   o The Six-Step Method
Sums of Squares
   o Sum of Squares Exposed
Electronic Computation
   o SAS Implementation
Trivial Examples
Exercises
   o Sums-of-Squares-Exposed Exercise
   o Trivial Exercises
   o The Doughnut Problem
Breakfast Cereal Example
Sixteen randomly selected mice of the same age and strain
were randomly assigned to one of four treatment groups.
The treatments were four diets: Cheerios, Corn
Flakes, Frostyos, and Frosted Flakes. The weight gain
after 1 week on the diet was recorded for each mouse. Is
there evidence that the mean weight gain is affected by
these treatments?

Table 1. Breakfast Cereal Study: Raw Data, “two-variable format”

Diet              Weight Gain (g)
Cheerios          1
Cheerios          2
Cheerios          3
Cheerios          2
Corn Flakes       4
Corn Flakes       2
Corn Flakes       3
Corn Flakes       3
Frostyos          5
Frostyos          5
Frostyos          4
Frostyos          6
Frosted Flakes    5
Frosted Flakes    7
Frosted Flakes    6
Frosted Flakes    6

Table 2. Breakfast Cereal Study, Raw Data, “four-sample format”

Cheerios    Corn Flakes    Frostyos    Frosted Flakes
1           4              5           5
2           2              5           7
3           3              4           6
2           3              6           6

Copyright ©1982, 2008, 2012 Golde I. Holtzman, all rights reserved. 11/13/2012. Virginia Tech, Department of Statistics.

The sample, i.e., raw data, can be viewed, as in Table 1, as a relationship between two
variables: one qualitative variable, e.g., diet (Cheerios, Corn Flakes, Frostyos, Frosted
Flakes), and one quantitative variable, e.g., weight gain (g). Alternatively, the raw data
can be viewed, as in Table 2, as four samples of the same quantitative variable, e.g., weight
gain (g), from four independent populations, e.g., the “populations” of mice who eat,
respectively, the four different diets.
The data also can be viewed graphically as the relationship between two variables, as in
Figure 1, which corresponds to the view of Table 1.
[Plot: weight gain Yij (g) on the vertical axis against treatment group (diet) i = 1 Cheerios,
2 Corn Flakes, 3 Frostyos, 4 Frosted Flakes, with the combined sample shown alongside.]
Figure 1. Graph of raw data: weight gain as a function of diet
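The two tabular views of the raw data (Tables 1 and 2) hold the same information, and converting between them is mechanical. The following is a minimal sketch in Python; the names `long_format`, `to_samples`, and `to_pairs` are illustrative choices, not part of the handout.

```python
from collections import defaultdict

# Table 1 ("two-variable format"): one (diet, weight gain in g) pair per mouse.
long_format = [
    ("Cheerios", 1), ("Cheerios", 2), ("Cheerios", 3), ("Cheerios", 2),
    ("Corn Flakes", 4), ("Corn Flakes", 2), ("Corn Flakes", 3), ("Corn Flakes", 3),
    ("Frostyos", 5), ("Frostyos", 5), ("Frostyos", 4), ("Frostyos", 6),
    ("Frosted Flakes", 5), ("Frosted Flakes", 7),
    ("Frosted Flakes", 6), ("Frosted Flakes", 6),
]


def to_samples(pairs):
    """Group (treatment, response) pairs into one sample per treatment."""
    samples = defaultdict(list)
    for diet, gain in pairs:
        samples[diet].append(gain)
    return dict(samples)


def to_pairs(samples):
    """Flatten per-treatment samples back into (treatment, response) pairs."""
    return [(diet, gain) for diet, gains in samples.items() for gain in gains]


# Table 2 ("four-sample format"): one list of weight gains per diet.
four_sample_format = to_samples(long_format)
for diet, gains in four_sample_format.items():
    print(f"{diet:15s}", gains)
```

Statistical packages usually expect the two-variable layout (one row per mouse), while hand computation is easier from the four-sample layout.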
The Six-Step Method
1. Model:
$$Y_{ij} = \mu_i + e_{ij} = \mu + \tau_i + e_{ij}$$
(obs response)ij = (pop mean response to trt)i + (error)ij
(obs response)ij = (pop mean response) + (pop effect of trt)i + (error)ij
where
a. Yij denotes the weight gain (g) of the j-th randomly selected mouse from the i-th
treatment group, j = 1, 2, ..., ni, i = 1, 2, ..., k.
b. μi denotes the mean weight gain, or expected weight gain, of mice on diet i.
μ denotes the grand mean, i.e., the mean weight gain, or expected weight
gain, over all diets.
τi = (μi – μ) is the effect of treatment i.
eij = (Yij − μi) is the residual or error of the j-th observation in the i-th
group.
c. Assume:
i. eij are independent (random sampling and careful experimentation achieve
this),
ii. σi = σ for all i (Homogeneity of Variance),
iii. eij are normally distributed with mean 0 and standard deviation σi.
These assumptions imply, and are equivalent to assuming, that
i. Yij are independent (random sampling and careful experimentation achieve
this),
ii. σi = σ for all i (Homogeneity of Variance),
iii. Yij are normally distributed with mean μi and standard deviation σi.
2. Hypotheses:
H0: μ1 = μ2 = μ3 = μ4 vs. HA: not H0,
or,
H0: τi = 0 for all i, vs. HA: τi ≠ 0 for some i.
3. Test Criterion: F = MSB/MSW, with (k – 1) and (N – k) degrees of freedom.
4. Design: α = 0.05, k = 4 groups, n1 = n2 = n3 = n4 = 4, N = 16
Note: When all treatments have equal sample sizes, the design is called balanced.
Otherwise the design is called unbalanced. Unbalanced designs require more
complicated (i.e., messy) computations, but the concepts are the same (with a few
exceptions), so we will stick to balanced designs for the most part.
5. Computations:

                              Cheerios   Corn Flakes   Frostyos   Frosted Flakes
     Group i →                1          2             3          4                Combined
     Yij   j = (1)            1          4             5          6
               (2)            2          2             4          5
               (3)            2          3             5          6
               (4)            3          3             6          7
(a)  ni                       4          4             4          4                N = 16
(b)  Ȳi                       2          3             5          6                Ȳ = 4
(c)  (effect)i τ̂i = Ȳi − Ȳ    −2         −1            1          2                Check = 0
(d)  ni τ̂i²                   16         4             4          16               SSB = 40
(e)  si                       0.816      0.816         0.816      0.816
(f)  si²                      0.667      0.667         0.667      0.667
(g)  SSi                      2          2             2          2                SSW = 8

ANOVA (k = 4, N = 16)

Source of Variation    df              SS    MS       F      P-value      R2
Between groups         (k – 1) = 3     40    13.333   20.0   P < 0.005    0.83
Within groups          (N – k) = 12    8     0.667                        0.17
Total                  (N – 1) = 15    48                                 1.00
6. Report
Method: The effects of diet (Cheerios, Corn Flakes, Frostyos, Frosted Flakes) on mean
weight gain (g) of mice are studied by one-factor analysis of variance of a balanced
design with 4 mice per diet, i.e., 16 mice in all.
Conclusion: There is highly significant statistical evidence that the mean weight gains
(g) of mice under the various diets (Cheerios, Corn Flakes, Frostyos, Frosted Flakes) are
not all equal (P < 0.005), and 83% of the variation in weight gain is explained by
diet (R2 = 0.83).
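The computations of Step 5 can be reproduced in a few lines of code. The sketch below is one possible Python rendering using only the standard library; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the P-value is left to an F table or a statistics package because the standard library has no F distribution.

```python
from statistics import mean

# Weight gains (g) from Table 2, keyed by diet.
cereal = {
    "Cheerios": [1, 2, 3, 2],
    "Corn Flakes": [4, 2, 3, 3],
    "Frostyos": [5, 5, 4, 6],
    "Frosted Flakes": [5, 7, 6, 6],
}


def one_way_anova(samples):
    """Return the ANOVA-table quantities for a dict of treatment samples."""
    k = len(samples)                                   # number of groups
    n_total = sum(len(y) for y in samples.values())    # N
    grand_mean = sum(sum(y) for y in samples.values()) / n_total
    # Between groups: n_i * (group mean - grand mean)^2, summed over groups.
    ssb = sum(len(y) * (mean(y) - grand_mean) ** 2 for y in samples.values())
    # Within groups: squared residuals about each group's own mean.
    ssw = sum((v - mean(y)) ** 2 for y in samples.values() for v in y)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {
        "df_between": k - 1, "df_within": n_total - k,
        "SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
        "MSB": msb, "MSW": msw, "F": msb / msw,
        "R2": ssb / (ssb + ssw),
    }


table = one_way_anova(cereal)
for name, value in table.items():
    print(f"{name:10s} {value:8.3f}")
```

For the cereal data this gives SSB = 40, SSW = 8, and F = 20, matching the ANOVA table in Step 5.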
Sums of Squares
Definitional Formulas
Lines (a), (b), and (e), containing the i-th sample size ni, the i-th sample mean Ȳi,
and the i-th sample standard deviation si, respectively, for each treatment group, are
easily obtained with a pocket calculator that has statistical-function keys. Then the
following formulas can be used to compute the ANOVA sums of squares.
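$$\mathrm{SSW} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{k} \mathrm{SS}_i = \sum_{i=1}^{k} (n_i - 1)\, s_i^2 = 2 + 2 + 2 + 2 = 8$$

$$\mathrm{SSB} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{k} n_i \hat{\tau}_i^2 = \sum_{i=1}^{k} n_i\, (\text{effect})_i^2 = 16 + 4 + 4 + 16 = 40$$

$$\mathrm{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \mathrm{SSB} + \mathrm{SSW}$$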
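The "pocket calculator" route described above starts from the per-group summary statistics rather than the individual residuals. A minimal Python sketch of that route follows; the variable names are illustrative, and the last step recomputes SST directly from the raw data only to confirm that SSB and SSW add up to it.

```python
from statistics import mean, stdev

samples = {
    "Cheerios": [1, 2, 3, 2],
    "Corn Flakes": [4, 2, 3, 3],
    "Frostyos": [5, 5, 4, 6],
    "Frosted Flakes": [5, 7, 6, 6],
}

# Lines (a), (b), (e): sample size, sample mean, sample standard deviation.
summary = {diet: (len(y), mean(y), stdev(y)) for diet, y in samples.items()}

n_total = sum(n for n, _, _ in summary.values())
grand_mean = sum(n * ybar for n, ybar, _ in summary.values()) / n_total

# SSW from the standard deviations: sum of (n_i - 1) * s_i^2.
ssw = sum((n - 1) * s ** 2 for n, _, s in summary.values())
# SSB from the means: sum of n_i * (mean_i - grand mean)^2.
ssb = sum(n * (ybar - grand_mean) ** 2 for n, ybar, _ in summary.values())

# SST straight from its definition, as a check on additivity.
sst = sum((v - grand_mean) ** 2 for y in samples.values() for v in y)

print(f"SSW = {ssw:.3f}, SSB = {ssb:.3f}, SSB + SSW = {ssb + ssw:.3f}, SST = {sst:.3f}")
```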
Sum of Squares Exposed
Determination of the effects of treatments by analysis of variance works
by decomposing the deviations of the treatment group means from the grand
mean.
[Plot: Wt Gain (g) on the vertical axis against Diet (Cheerios, Corn Flakes, Frosted Flakes,
Frostyos).]
Deviations
For each observed response, there are three deviations. The total deviation is the
deviation of the response from the grand mean. The total deviation is the sum of two
parts: the effect, which is the deviation of the treatment mean from the grand mean, and
the error or residual, which is the deviation of the response from the observed treatment
mean. Can you see the three deviations for each point in the plot of the breakfast cereal
data above?
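As a worked instance, take the first Cheerios mouse in Table 1, which gained 1 g. Its group mean is 2 and the grand mean is 4 (both from the computations in Step 5), so the three deviations are as follows, with the subscript 11 denoting group 1, observation 1:

```latex
\begin{align*}
(Y_{11} - \bar{Y}) &= (\bar{Y}_1 - \bar{Y}) + (Y_{11} - \bar{Y}_1) \\
(1 - 4) &= (2 - 4) + (1 - 2) \\
-3 &= -2 + (-1)
\end{align*}
```

The total deviation of −3 splits into an effect of −2 and a residual of −1.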
Additivity
Analysis of variance works because of the additivity of the sums of squares and degrees
of freedom.
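Source of Variation:    Within + Between = Total
Deviation:              $e_{ij} + \tau_i = (Y_{ij} - \mu)$
                        $\hat{e}_{ij} + \hat{\tau}_i$ = total deviation
                        $(Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}) = (Y_{ij} - \bar{Y})$
Sum-of-Squares:         $\sum_{ij} \hat{e}_{ij}^2 + \sum_{ij} \hat{\tau}_i^2 = \sum_{ij} (Y_{ij} - \bar{Y})^2$
                        $\sum_{ij} (Y_{ij} - \bar{Y}_i)^2 + \sum_{ij} (\bar{Y}_i - \bar{Y})^2 = \sum_{ij} (Y_{ij} - \bar{Y})^2$
degrees of freedom:     $(N - k) + (k - 1) = (N - 1)$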
Algebra
$$Y_{ij} = \mu + \tau_i + e_{ij}$$
$$(Y_{ij} - \mu) = \tau_i + e_{ij}$$
$$(Y_{ij} - \mu) = (\mu_i - \mu) + (Y_{ij} - \mu_i)$$
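The sample analogue of this identity explains why the sums of squares add even though squaring a sum normally produces a cross term. The following is a standard derivation sketch, not taken from the handout, written with the sample means in place of the population means:

```latex
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}
     \bigl[(\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)\bigr]^2 \\
  &= \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2
   + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2
   + 2 \sum_{i=1}^{k} (\bar{Y}_i - \bar{Y}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i) \\
  &= \mathrm{SSB} + \mathrm{SSW} + 0
\end{align*}
```

The cross term vanishes because the residuals within each group sum to zero, i.e., $\sum_{j} (Y_{ij} - \bar{Y}_i) = 0$ for every i.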
In the table below, the effects, errors, total deviations, and squares are displayed for each
observation in the cereal data. The sums of squares appear at the bottom.
Sum-of-Squares Exposed

Sample            Sample        Between-Groups   Within-Groups       Total
(Raw Data)        Means         Deviations       Deviations          Deviations
Group   Observed  Group  Grand  Group            Residual            Total
or Trt  Response  Mean   Mean   Effect  Effect²  or "error"  error²  Deviation  TD²
i       Yij       Ȳi     Ȳ      τ̂i      τ̂i²      êij         êij²    Yij − Ȳ    (Yij − Ȳ)²
1       1         2      4      −2      4        −1          1       −3         9
1       2         2      4      −2      4        0           0       −2         4
1       3         2      4      −2      4        1           1       −1         1
1       2         2      4      −2      4        0           0       −2         4
2       4         3      4      −1      1        1           1       0          0
2       2         3      4      −1      1        −1          1       −2         4
2       3         3      4      −1      1        0           0       −1         1
2       3         3      4      −1      1        0           0       −1         1
3       5         5      4      1       1        0           0       1          1
3       5         5      4      1       1        0           0       1          1
3       4         5      4      1       1        −1          1       0          0
3       6         5      4      1       1        1           1       2          4
4       5         6      4      2       4        −1          1       1          1
4       7         6      4      2       4        1           1       3          9
4       6         6      4      2       4        0           0       2          4
4       6         6      4      2       4        0           0       2          4
Total                           0       40       0           8       0          48

Here τ̂i = (Ȳi − Ȳ) and êij = (Yij − Ȳi).
Column totals: Effect = 0 (Check); Effect² = 40 = SSB = SSM (“explained”);
Residual = 0 (Check); error² = 8 = SSW = SSE (“unexplained”);
Total Deviation = 0 (Check); TD² = 48 = SST (“total”).
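The "exposed" table can be regenerated row by row, which is a convenient way to check a hand-built table for the exercises. The following Python sketch is one way to do it; the tuple layout of `rows` is an illustrative choice that mirrors the table's columns.

```python
from statistics import mean

# Cereal data in Table 1 order; the list position gives the group number i.
groups = [[1, 2, 3, 2], [4, 2, 3, 3], [5, 5, 4, 6], [5, 7, 6, 6]]

grand_mean = mean(y for g in groups for y in g)

rows = []
for i, g in enumerate(groups, start=1):
    group_mean = mean(g)
    for y in g:
        effect = group_mean - grand_mean   # between-groups deviation
        residual = y - group_mean          # within-groups deviation
        total_dev = y - grand_mean         # total deviation
        rows.append((i, y, group_mean, grand_mean,
                     effect, effect ** 2,
                     residual, residual ** 2,
                     total_dev, total_dev ** 2))

# Column totals for the six deviation columns (tuple positions 4 to 9).
totals = [sum(r[c] for r in rows) for c in range(4, 10)]

for r in rows:
    print(*r, sep="\t")
print("Total", "", "", "", *totals, sep="\t")
```

The totals come out as 0, 40, 0, 8, 0, 48, i.e., the three check columns sum to zero and the squared columns give SSB, SSW, and SST.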
Electronic Computation
JMP Implementation
Download the Data from Holtzman’s class website and open with JMP by performing the
following steps.
Holtzman Website > Data > JMP Examples >Cheerios.jmp
Then, in the JMP data table Cheerios.jmp,
JMP > Analyze > Fit Y by X >
In the Fit Y by X dialog box
Y, Response = Wt Gain (g)
X, Factor = Diet
OK >
In the Fit Y by X platform, click on the red drop-down menu, and select
> Means / Anova
> Means and Std Dev
> Compare Means > All Pairs, Tukey HSD
SAS Implementation
All of the computations of the previous example can be done by the Statistical Analysis
System (SAS).
*---------------------------------------------------------------------*
|  VM1 ST301200 1WAY GLM  A1                                          |
*---------------------------------------------------------------------;
TITLE1 '1WAY ANOVA EXAMPLE (VM1 ST301200 1WAY GLM)';
DATA RAW;
  * Read CEREAL with length 14 so the longest label is not truncated;
  INPUT CEREAL :$14. GAIN;
  CARDS;
1CHEERIOS 1
1CHEERIOS 2
1CHEERIOS 3
1CHEERIOS 2
2CORNFLAKES 4
2CORNFLAKES 2
2CORNFLAKES 3
2CORNFLAKES 3
3FROSTYOS 5
3FROSTYOS 5
3FROSTYOS 4
3FROSTYOS 6
4FROSTEDFLAKES 5
4FROSTEDFLAKES 7
4FROSTEDFLAKES 6
4FROSTEDFLAKES 6
;
* List the raw data;
PROC PRINT;
RUN;
* Scatterplot of weight gain by cereal;
PROC PLOT; PLOT GAIN*CEREAL;
RUN;
* One-way ANOVA with LSD and Tukey multiple comparisons;
PROC GLM;
  CLASS CEREAL;
  MODEL GAIN = CEREAL;
  MEANS CEREAL / LSD TUKEY;
RUN;
QUIT;
Trivial Examples

       Example A     Example B     Example C     Example D
Trt    Responses     Responses     Responses     Responses
1      -1, 0, 1      -3, -2, -1    -1, 0, 1      -3, -1, 1
2      -1, 0, 1      -1, 0, 1      -1, 0, 1      -2, 0, 2
3      -1, 0, 1      1, 2, 3       1, 2, 3       -1, 1, 3

Example A          Example B          Example C          Example D
   Yij                Yij                Yij                Yij
   |                  |        X         |        X         |        X
 2 +                2 +        X       2 +        X       2 +     X
   |  X  X  X         |     X  X         |  X  X  X         |  X     X
 0 +  X  X  X       0 +     X          0 +  X  X          0 +     X
   |  X  X  X         |  X  X            |  X  X            |  X     X
-2 +               -2 +  X            -2 +               -2 +     X
   |                  |  X               |                  |  X
   +--+--+--+         +--+--+--+         +--+--+--+         +--+--+--+
 i =  1  2  3       i =  1  2  3       i =  1  2  3       i =  1  2  3

            Example A    Example B         Example C         Example D
Ȳi          0, 0, 0      -2, 0, 2          0, 0, 2           -1, 0, 1
Ȳ           0            0                 2/3               0
Ȳi − Ȳ      0, 0, 0      -2, 0, 2          -.7, -.7, 1.3     -1, 0, 1
MSB         0            12                4                 3
MSW         1            1                 1                 4
F           0            12                4                 3/4
P           (P = 1)      (0.005<P<0.01)    (0.05<P<0.10)     (P>0.5)
Result      N.S.         Significant       Slightly Sig.     N.S.

Conclusions
Example A.
There is not significant evidence that the group means differ (P = 1).
Example B.
There is significant evidence that the group means differ
(0.005 < P < 0.01).
Example C.
There is slightly significant evidence that the group means differ
(0.05 < P < 0.10).
Example D.
There is not significant evidence that the group means differ (P > 0.50).
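The mean squares and F ratios quoted for Examples A to D can be checked with a short script. The sketch below uses only the Python standard library; the helper name `f_ratio` is an illustrative choice, and P-values are not computed here (they come from an F table with 2 and 6 degrees of freedom).

```python
from statistics import mean

# Responses for the three treatments in each trivial example.
examples = {
    "A": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "B": [[-3, -2, -1], [-1, 0, 1], [1, 2, 3]],
    "C": [[-1, 0, 1], [-1, 0, 1], [1, 2, 3]],
    "D": [[-3, -1, 1], [-2, 0, 2], [-1, 1, 3]],
}


def f_ratio(groups):
    """Return (MSB, MSW, F) for a list of treatment samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Mean square between: spread of the group means about the grand mean.
    msb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Mean square within: spread of responses about their own group mean.
    msw = sum((y - mean(g)) ** 2 for g in groups for y in g) / (n_total - k)
    return msb, msw, msb / msw


results = {name: f_ratio(groups) for name, groups in examples.items()}
for name, (msb, msw, f) in results.items():
    print(f"Example {name}: MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f:.2f}")
```

Comparing A with B shows F growing as the group means separate while MSW stays fixed; comparing B-like separation with D shows F shrinking as the within-group spread grows.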
Exercises
Sums-of-Squares-Exposed Exercise
1. For Trivial Exercises 1 (below), construct "Sum-of-Squares Exposed" tables as I have
done for the Breakfast-Cereal Example.
2. For Trivial Exercises 2 (below), construct "Sum-of-Squares Exposed" tables as I have
done for the Breakfast-Cereal Example.
Trivial Exercises
To gain understanding of how ANOVA works, you will do four contrived trivial
exercises (below). For each exercise there are three treatments with three observations
per treatment. You do Exercises 1-4 (below) as I have done Examples A-D (above).
There is no need to use a computer. You will learn more by doing this by hand.
a. On one page, draw side-by-side scatterplots for each exercise, as I did for the
trivial examples above. Please draw them side-by-side with uniform vertical axes.
b. Estimate the treatment means, the grand mean, and the treatment effects. (Don't
show or turn in details of the computations, just tabulate the correct answers in
Step d below.)
c. Compute MSB, MSW, F, and P. (Feel free to check your computations with JMP,
SAS, Minitab, etc. Please do not turn in your calculations, just tabulate the correct
answers in Step d below.)
d. Now tabulate the treatment-group means, grand mean, effects, MSB, MSW, F, P,
and the characterization of the result (N.S., sig., etc.) below the scatterplots as in
the examples.
e. State the conclusion verbally.
       Trivial       Trivial       Trivial       Trivial
       Exercise 1    Exercise 2    Exercise 3    Exercise 4
Trt    Responses     Responses     Responses     Responses
1      3, 4, 5       2, 3, 4       1, 3, 5       −1, 1, 3
2      4, 5, 6       4, 5, 6       3, 5, 7       3, 5, 7
3      5, 6, 7       6, 7, 8       5, 7, 9       5, 7, 9
One-way Anova and Multiple-Comparison Exercise
The Doughnut Problem
(Snedecor & Cochran (1967, 6th edition) pp. 258-259) During cooking, doughnuts absorb fat in
various amounts. A nutritionist wished to learn if the amount absorbed depends on the type
of fat used. For each of four fats, six batches of doughnuts were prepared. The amount of fat
absorbed for each batch is recorded in the table below.

Fat absorbed per batch (g)
Type of fat
1      2      3      4
64     78     75     55
72     91     93     66
68     97     78     49
77     82     71     64
56     85     63     70
95     77     76     68

1. Use the 6-step method to determine whether there is statistically significant evidence
that the average amount of fat absorbed differs according to the type of fat used. But
modify the analysis step, Step 5, as explained next.
2. For Step 5, you may use software for the calculations, but show the following in this
order, cutting and pasting (actually or virtually) as necessary.
a. Draw a scatterplot of the doughnut data, by hand or with software.
Notice from the scatterplot that there is a clear separation between the batches for
Fats 4 and 2, but that for every other pair of samples, there is some overlap.
b. Make a summary table of the doughnut results (treatments, treatment sample
sizes, treatment means, and treatment standard errors), as often reported in
published research, as shown in Table 3 for the Cheerios data. Notice that the
table
i. is sorted by the mean,
ii. has means and standard errors rounded to two significant figures,
iii. shows the units of measure explicitly.
iv. Warning: Recall that the Cheerios data are contrived in such a way that the
standard errors are the same for each treatment group. That would be very
unlikely to occur in real data, and it doesn’t happen for the doughnut data. Be sure to
show the “raw” standard errors, which are computed in JMP by Fit Y by X >
Means and Std Dev, rather than the “pooled” standard errors, which are
computed in JMP by Fit Y by X > Means/Anova.

Table 3. Weekly Weight Gain by Diet.
Diet              n     Mean gain (g)    SE (g)
Frosted Flakes    4     6.0              0.41
Frostyos          4     5.0              0.41
Corn Flakes       4     3.0              0.41
Cheerios          4     2.0              0.41

Table 4. Effects of Diet on Weekly Weight Gain.
Diet              n     Mean gain (g)    Effect (g)
Frosted Flakes    4     6.0              2.0
Frostyos          4     5.0              1.0
Corn Flakes       4     3.0              −1.0
Cheerios          4     2.0              −2.0
                  16    4.0              0.0

c. Calculate and tabulate the effect of each type of fat as shown in Table 4 for the
Cheerios data. Notice that a row is added for the overall mean. Notice also that
the effect of each treatment is simply the treatment mean minus the overall mean.
d. Show the ANOVA Table.
3. Follow the ANOVA Conclusion, Step 6, with a written full report, including, if
necessary, Tukey’s HSD procedure, and any relevant table(s) of results. For the
Cheerios data, that table would be Table 5.
Table 5. Weekly Weight Gain by Diet.
Diet              n     Mean gain (g)*    SE (g)
Frosted Flakes    4     6.0^a             0.41
Frostyos          4     5.0^ab            0.41
Corn Flakes       4     3.0^bc            0.41
Cheerios          4     2.0^c             0.41
* Means followed by the same superscript are not
significantly different at the 0.01 level experimentwise
using the Tukey-Kramer HSD (Sall, Creighton, and
Lehman 2005, Zar 1999).
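The superscripts in Table 5 can be checked by hand with the usual balanced-design form of Tukey's HSD, in which two means are declared different when they differ by more than q × sqrt(MSW / n). The Python sketch below is illustrative only: the critical value `Q_CRIT = 5.50` is an assumption read from a standard studentized-range table for α = 0.01, k = 4 means, and 12 error degrees of freedom, and JMP or SAS would compute it internally.

```python
from itertools import combinations
from math import sqrt

# Group means (g) from Table 5; n mice per diet; MSW from the ANOVA table.
means = {"Frosted Flakes": 6.0, "Frostyos": 5.0, "Corn Flakes": 3.0, "Cheerios": 2.0}
n = 4
msw = 8 / 12

# Assumption: tabled studentized-range value q(0.01; k = 4, df = 12).
Q_CRIT = 5.50

# Honestly significant difference for a balanced design.
hsd = Q_CRIT * sqrt(msw / n)

# Pairs whose means differ by less than the HSD share a superscript.
not_different = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) < hsd
]

print(f"HSD = {hsd:.2f} g")
for a, b in not_different:
    print(f"{a} vs {b}: not significantly different")
```

Under that assumed critical value the HSD is about 2.25 g, so only adjacent diets in the sorted table are not significantly different, which is the pattern a, ab, bc, c shown in Table 5.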