TWO WAY ANOVA
(Because this topic isn’t covered in the recommended text, I’ve written a proper set of
notes for it.)
A firm wants to test four different kinds of new machine. They have five workers, and each
of them uses each of the machines for a week. The results are as follows:
               Machine
Worker      1     2     3     4
   1       44    38    47    36
   2       46    40    52    43
   3       34    36    44    32
   4       43    38    46    33
   5       38    42    49    39
We begin by making the null hypothesis that there is no variation either among the
workers or among the types of machine. We then proceed much as before, except that
now we assume that each observation differs from the overall mean because of (a) factor 1
(which worker) (b) factor 2 (which machine) and (c) error. So our model is
X = µ + F1 + F2 + ϵ
Suppose there are r levels of factor 1 and c levels of factor 2. The word ‘levels’ is the
standard terminology. We would normally put the data into an r × c matrix, with the
rows representing the levels of factor 1 and the columns the levels of factor 2. Here, for
example, factor 1 is workers, factor 2 is machines, r = 5 and c = 4.
We then define
\[
\bar{X}_{i.} = \frac{1}{c}\sum_{j=1}^{c} X_{ij},
\qquad
\bar{X}_{.j} = \frac{1}{r}\sum_{i=1}^{r} X_{ij}
\]
These are the row and column means; so for example X̄2. is the mean of the entries in
the second row. Note that there is a c in the formula for the row means and an r in the
formula for the column means. This is simply because an r × c matrix has c entries in each
of its r rows.
The dot replaces the index we have summed over so that there is no confusion about
which one it was.
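The row and column means can be computed for the example data with a few lines of code. The following is a minimal sketch in plain Python, assuming the table is stored as a list of rows (workers) with one entry per machine; the names `data` and `row_col_means` are illustrative and do not come from the notes.

```python
# Example data from the notes: rows are workers (factor 1),
# columns are machines (factor 2).
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]


def row_col_means(x):
    """Return (row means, column means, overall mean) of an r x c table."""
    r, c = len(x), len(x[0])
    # Each row has c entries, so a row mean divides by c.
    row_means = [sum(row) / c for row in x]
    # Each column has r entries, so a column mean divides by r.
    col_means = [sum(x[i][j] for i in range(r)) / r for j in range(c)]
    overall_mean = sum(sum(row) for row in x) / (r * c)
    return row_means, col_means, overall_mean


row_means, col_means, overall_mean = row_col_means(data)
print("worker means: ", row_means)
print("machine means:", col_means)
print("overall mean: ", overall_mean)
```

The comments mark where c and r enter, which mirrors the remark above about which letter appears in which formula.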
To split the total sum of squares into the components we need, we write:
\[
\sum_i \sum_j (X_{ij} - \bar{X})^2
= \sum_i \sum_j \left[ (\bar{X}_{i.} - \bar{X}) + (\bar{X}_{.j} - \bar{X})
+ (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}) \right]^2
\]
When we expand the right hand side we find that all the cross products vanish:
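\[
\sum_i \sum_j (\bar{X}_{i.} - \bar{X})(\bar{X}_{.j} - \bar{X})
= \sum_i (\bar{X}_{i.} - \bar{X}) \sum_j (\bar{X}_{.j} - \bar{X}) = 0
\]
since ∑j (X̄.j − X̄) = 0. Then also
\[
\begin{aligned}
\sum_i \sum_j (\bar{X}_{i.} - \bar{X})(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})
&= \sum_i (\bar{X}_{i.} - \bar{X}) \sum_j (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}) \\
&= \sum_i (\bar{X}_{i.} - \bar{X})(c\bar{X}_{i.} - c\bar{X}_{i.} - c\bar{X} + c\bar{X}) = 0
\end{aligned}
\]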
and similarly for the remaining term. Hence
\[
\sum_i \sum_j (X_{ij} - \bar{X})^2
= \sum_i \sum_j (\bar{X}_{i.} - \bar{X})^2
+ \sum_i \sum_j (\bar{X}_{.j} - \bar{X})^2
+ \sum_i \sum_j (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2
\]
which we can write as
\[
SS_{\text{total}} = SS_{\text{factor 1}} + SS_{\text{factor 2}} + SS_{\text{error}}
\]
Clearly we can write
\[
SS_{\text{factor 1}} = \sum_i c(\bar{X}_{i.} - \bar{X})^2,
\qquad
SS_{\text{factor 2}} = \sum_j r(\bar{X}_{.j} - \bar{X})^2
\]
As before, there is a c in the formula for factor 1, i.e. for the rows, and an r in the
formula for factor 2, i.e. for the columns.
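The decomposition can be checked numerically on the example data. The sketch below, in plain Python with illustrative variable names that are not from the notes, evaluates each double sum term by term and confirms both that the three parts add up to the total and that the double sums for the two factors agree with the shorter single-sum forms.

```python
# Example data from the notes: rows are workers, columns are machines.
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
r, c = len(data), len(data[0])
cells = [(i, j) for i in range(r) for j in range(c)]

row_means = [sum(row) / c for row in data]
col_means = [sum(data[i][j] for i in range(r)) / r for j in range(c)]
overall = sum(sum(row) for row in data) / (r * c)

# The four double sums, written out over every cell (i, j).
ss_total = sum((data[i][j] - overall) ** 2 for i, j in cells)
ss_factor1 = sum((row_means[i] - overall) ** 2 for i, j in cells)
ss_factor2 = sum((col_means[j] - overall) ** 2 for i, j in cells)
ss_error = sum(
    (data[i][j] - row_means[i] - col_means[j] + overall) ** 2 for i, j in cells
)

# The single-sum forms: c copies of each row term, r copies of each column term.
ss_factor1_short = c * sum((m - overall) ** 2 for m in row_means)
ss_factor2_short = r * sum((m - overall) ** 2 for m in col_means)

tol = 1e-9  # tolerance for floating-point rounding
assert abs(ss_total - (ss_factor1 + ss_factor2 + ss_error)) < tol
assert abs(ss_factor1 - ss_factor1_short) < tol
assert abs(ss_factor2 - ss_factor2_short) < tol

print(round(ss_total, 4), round(ss_factor1, 4),
      round(ss_factor2, 4), round(ss_error, 4))
```

Here the error sum of squares is computed directly from the residuals rather than by subtraction, so the first assertion is a genuine check of the identity.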
It can be shown that if we divide the sums of squares by the respective number of
degrees of freedom we obtain the mean squares, and that these are all estimates of the
(again supposed common) variance under the null hypothesis that there are no systematic
effects:
\[
H_0 : \quad E(\bar{X}_{i.}) = E(\bar{X}_{.j}) = E(\bar{X}) = \mu
\]
The correct number of degrees of freedom is r − 1 for factor 1, c − 1 for factor 2, and
(r − 1)(c − 1) for error. This makes the total rc − 1 which is correct because there is one
lost in estimating the overall mean.
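The total can be confirmed by expanding the product. A short worked version, with the example's values r = 5 and c = 4 substituted in the second line:

```latex
\begin{align*}
(r-1) + (c-1) + (r-1)(c-1) &= (r-1) + (c-1) + (rc - r - c + 1) = rc - 1, \\
4 + 3 + 12 &= 19 = 5 \times 4 - 1.
\end{align*}
```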
We can then go on to show that the appropriate tests are based on
\[
F = \frac{MS_{\text{factor}}}{MS_{\text{error}}}
\]
In the example the five rows correspond to different workers and the four columns to
different machines, so factor 1 is workers, factor 2 is machines, r = 5 and c = 4.
We write down the data as before, together with the various different sums and means:

   i \ j          1       2       3       4      ∑j     X̄i.    X̄i. − X̄   (X̄i. − X̄)²
     1           44      38      47      36     165    41.25     0.25       0.0625
     2           46      40      52      43     181    45.25     4.25      18.0625
     3           34      36      44      32     146    36.5     -4.5       20.25
     4           43      38      46      33     160    40       -1          1
     5           38      42      49      39     168    42        1          1
    ∑i          205     194     238     183                          sum:  40.375
    X̄.j          41      38.8    47.6    36.6
  X̄.j − X̄         0      -2.2     6.6    -4.4
 (X̄.j − X̄)²       0       4.84   43.56   19.36   sum: 67.76
The overall mean is 41, which for a complete rectangular array like this one is the mean
of both the X̄i. and the X̄.j. The summed squared deviations now have to be multiplied by
c = 4 and r = 5, respectively, and this gives
\[
SS_{\text{workers}} = \sum_i c(\bar{X}_{i.} - \bar{X})^2 = 4 \times 40.375 = 161.5
\]
\[
SS_{\text{machines}} = \sum_j r(\bar{X}_{.j} - \bar{X})^2 = 5 \times 67.76 = 338.8
\]
We can work out the sum of squares for error directly, but it is usually easier to work out
the total sum of squares
\[
SS_{\text{total}} = \sum_i \sum_j (X_{ij} - \bar{X})^2 = 574
\]
which gives us, by subtraction,
\[
SS_{\text{error}} = 574 - 338.8 - 161.5 = 73.7
\]
We now compute the mean squares
\[
MS_{\text{workers}} = \frac{161.5}{5 - 1} = 40.375,
\qquad
MS_{\text{machines}} = \frac{338.8}{4 - 1} \approx 112.93
\]
Note that although the number 40.375 has turned up again, that is a bit of a coincidence.
We multiplied by 4 because there are 4 columns, and we have now divided by 4 because there
are 5 rows, and 4 is 5 − 1.
\[
MS_{\text{error}} = \frac{73.7}{(5 - 1)(4 - 1)} \approx 6.1
\]
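and then the two F-statistics

workers:
\[
F = \frac{40.38}{6.1} = 6.62,
\qquad F_{95\%,\,(4,12)\,\text{df}} = 3.26,
\qquad F_{99\%,\,(4,12)\,\text{df}} = 5.41
\]
machines:
\[
F = \frac{112.93}{6.1} = 18.5,
\qquad F_{95\%,\,(3,12)\,\text{df}} = 3.49,
\qquad F_{99\%,\,(3,12)\,\text{df}} = 5.95
\]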
so there is a highly significant variation among both workers and machines.
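The whole calculation can be reproduced in a few lines. The following is a rough sketch in plain Python, not a general-purpose routine: the function name `two_way_anova` and the dictionary keys are made up for illustration, it assumes one observation per cell, and the 99% critical values are simply copied from the figures quoted above rather than computed. Because the code does not round the error mean square to 6.1 before dividing, its F-ratios (roughly 6.57 and 18.39) come out slightly below the hand-calculated 6.62 and 18.5; the conclusion is the same.

```python
def two_way_anova(x):
    """Two-way ANOVA, one observation per cell; rows = factor 1, columns = factor 2."""
    r, c = len(x), len(x[0])
    overall = sum(sum(row) for row in x) / (r * c)
    row_means = [sum(row) / c for row in x]
    col_means = [sum(x[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - overall) ** 2 for m in row_means)
    ss_cols = r * sum((m - overall) ** 2 for m in col_means)
    ss_total = sum((v - overall) ** 2 for row in x for v in row)
    ss_error = ss_total - ss_rows - ss_cols  # by subtraction, as in the notes

    ms_rows = ss_rows / (r - 1)
    ms_cols = ss_cols / (c - 1)
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_error": ss_error, "ss_total": ss_total,
        "ms_rows": ms_rows, "ms_cols": ms_cols, "ms_error": ms_error,
        "f_rows": ms_rows / ms_error, "f_cols": ms_cols / ms_error,
    }


# Example data from the notes: rows are workers, columns are machines.
data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]

# 99% critical values as quoted in the notes (taken from tables, not computed here).
F99_WORKERS = 5.41   # (4, 12) degrees of freedom
F99_MACHINES = 5.95  # (3, 12) degrees of freedom

result = two_way_anova(data)
print("workers:  F = %.2f, exceeds 99%% value: %s"
      % (result["f_rows"], result["f_rows"] > F99_WORKERS))
print("machines: F = %.2f, exceeds 99%% value: %s"
      % (result["f_cols"], result["f_cols"] > F99_MACHINES))
```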