TWO WAY ANOVA

(Because this topic isn't covered in the recommended text, I've written a proper set of notes for it.)

A firm wants to test four different kinds of new machine. They have five workers, and each of them uses each of the machines for a week. The results are as follows:

              Machine
Worker       1     2     3     4
   1        44    38    47    36
   2        46    40    52    43
   3        34    36    44    32
   4        43    38    46    33
   5        38    42    49    39

We begin with the null hypothesis that there is no variation either among the workers or among the types of machine. We then proceed much as before, except that now we assume that each observation differs from the overall mean because of (a) factor 1 (which worker), (b) factor 2 (which machine) and (c) error. So our model is

$$X = \mu + F_1 + F_2 + \epsilon.$$

Suppose there are $r$ levels of factor 1 and $c$ levels of factor 2 ("levels" is the standard terminology). We would normally put the data into an $r \times c$ matrix, with the rows representing the levels of factor 1 and the columns the levels of factor 2. Here, for example, factor 1 is workers, factor 2 is machines, $r = 5$ and $c = 4$. We then define

$$\bar X_{i\cdot} = \frac{1}{c}\sum_{j=1}^{c} X_{ij}, \qquad \bar X_{\cdot j} = \frac{1}{r}\sum_{i=1}^{r} X_{ij}.$$

These are the row and column means; so, for example, $\bar X_{2\cdot}$ is the mean of the entries in the second row. Note that there is a $c$ in the formula for the row means and an $r$ in the formula for the column means. This is simply because an $r \times c$ matrix has $c$ entries in each of its $r$ rows (and $r$ entries in each of its $c$ columns). The dot replaces the index we have summed over, so that there is no confusion about which one it was. $\bar X$, with no subscripts, is the overall mean of all $rc$ observations.

To break the total sum of squares up into the right sorts of sums of squares, we write

$$\sum_i\sum_j (X_{ij}-\bar X)^2 = \sum_i\sum_j \left[(\bar X_{i\cdot}-\bar X) + (\bar X_{\cdot j}-\bar X) + (X_{ij}-\bar X_{i\cdot}-\bar X_{\cdot j}+\bar X)\right]^2.$$

When we expand the right-hand side we find that all the cross products vanish:

$$\sum_i\sum_j (\bar X_{i\cdot}-\bar X)(\bar X_{\cdot j}-\bar X) = \sum_i (\bar X_{i\cdot}-\bar X)\sum_j (\bar X_{\cdot j}-\bar X) = 0,$$

since $\sum_j (\bar X_{\cdot j}-\bar X) = 0$ (the column means average to the overall mean). Then also

$$\sum_i\sum_j (\bar X_{i\cdot}-\bar X)(X_{ij}-\bar X_{i\cdot}-\bar X_{\cdot j}+\bar X) = \sum_i (\bar X_{i\cdot}-\bar X)\sum_j (X_{ij}-\bar X_{i\cdot}-\bar X_{\cdot j}+\bar X) = \sum_i (\bar X_{i\cdot}-\bar X)(c\bar X_{i\cdot}-c\bar X_{i\cdot}-c\bar X+c\bar X) = 0,$$

and similarly for the remaining term. Hence

$$\sum_i\sum_j (X_{ij}-\bar X)^2 = \sum_i\sum_j (\bar X_{i\cdot}-\bar X)^2 + \sum_i\sum_j (\bar X_{\cdot j}-\bar X)^2 + \sum_i\sum_j (X_{ij}-\bar X_{i\cdot}-\bar X_{\cdot j}+\bar X)^2,$$

which we can write as

$$SS_{\text{total}} = SS_{\text{factor 1}} + SS_{\text{factor 2}} + SS_{\text{error}}.$$

Since the first summand does not depend on $j$ and the second does not depend on $i$, we can write

$$SS_{\text{factor 1}} = c\sum_i (\bar X_{i\cdot}-\bar X)^2, \qquad SS_{\text{factor 2}} = r\sum_j (\bar X_{\cdot j}-\bar X)^2.$$

As before, there is a $c$ in the formula for factor 1, i.e. for the rows, and an $r$ in the formula for factor 2, i.e. for the columns.

It can be shown that if we divide the sums of squares by their respective numbers of degrees of freedom we obtain the mean squares, and that these are all estimates of the (again supposed common) variance under the null hypothesis that there are no systematic effects:

$$H_0: E(\bar X_{i\cdot}) = E(\bar X_{\cdot j}) = E(\bar X) = \mu \quad \text{for all } i, j.$$

The correct number of degrees of freedom is $r-1$ for factor 1, $c-1$ for factor 2, and $(r-1)(c-1)$ for error. These add up to $rc-1$, which is correct because one degree of freedom is lost in estimating the overall mean. We can then go on to show that the appropriate test for each factor is based on

$$F = \frac{MS_{\text{factor}}}{MS_{\text{error}}},$$

which under $H_0$ has an $F$ distribution with $(r-1,\,(r-1)(c-1))$ degrees of freedom for factor 1 and $(c-1,\,(r-1)(c-1))$ for factor 2.

Returning to the example (rows are workers, columns are machines, $r = 5$, $c = 4$), we write down the data as before, together with the various sums and means:

                  Machine j
Worker i           1       2       3       4     Sum    Mean    Dev.    Dev.²
   1              44      38      47      36     165   41.25    0.25   0.0625
   2              46      40      52      43     181   45.25    4.25  18.0625
   3              34      36      44      32     146    36.5    −4.5    20.25
   4              43      38      46      33     160      40      −1        1
   5              38      42      49      39     168      42       1        1
Sum              205     194     238     183     820                    40.375
Mean              41    38.8    47.6    36.6
Dev.               0    −2.2     6.6    −4.4
Dev.²              0    4.84   43.56   19.36   67.76

In the right-hand columns "Mean" is the row mean $\bar X_{i\cdot}$; in the bottom rows it is the column mean $\bar X_{\cdot j}$. "Dev." is the deviation of that mean from the overall mean, and 40.375 and 67.76 are the sums of the squared row and column deviations.

The overall mean is $\bar X = 820/20 = 41$, which, because every cell of the array holds exactly one observation, is also the mean of the row means $\bar X_{i\cdot}$ and of the column means $\bar X_{\cdot j}$. The summed squared deviations now have to be multiplied by $c = 4$ and $r = 5$, respectively, and this gives

$$SS_{\text{workers}} = c\sum_i (\bar X_{i\cdot}-\bar X)^2 = 4 \times 40.375 = 161.5,$$

$$SS_{\text{machines}} = r\sum_j (\bar X_{\cdot j}-\bar X)^2 = 5 \times 67.76 = 338.8.$$

We can work out the sum of squares for error directly, but it is usually easier to work out the total sum of squares

$$SS_{\text{total}} = \sum_i\sum_j (X_{ij}-\bar X)^2 = 574,$$

which gives us, by subtraction,

$$SS_{\text{error}} = 574 - 338.8 - 161.5 = 73.7.$$

We now compute the mean squares:

$$MS_{\text{workers}} = \frac{161.5}{5-1} = 40.375, \qquad MS_{\text{machines}} = \frac{338.8}{4-1} = 112.93, \qquad MS_{\text{error}} = \frac{73.7}{(5-1)(4-1)} \approx 6.142.$$

Note that although the number 40.375 has turned up again, that's a bit of a coincidence: we multiplied by 4 because there are 4 columns, and we've now divided by 4 because there are 5 rows, and 4 is 5 − 1.

The two $F$-statistics are then

$$\text{workers: } F = \frac{40.375}{6.142} \approx 6.57, \qquad F_{95\%,(4,12)} = 3.26, \quad F_{99\%,(4,12)} = 5.41,$$

$$\text{machines: } F = \frac{112.93}{6.142} \approx 18.39, \qquad F_{95\%,(3,12)} = 3.49, \quad F_{99\%,(3,12)} = 5.95.$$

Both statistics exceed their 99% critical values, so there is highly significant variation among both workers and machines.
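As a check on the hand calculation, here is a minimal sketch in Python (standard library only) that recomputes the example. The function name `two_way_anova` and its result keys are hypothetical, chosen for illustration; it assumes one observation per cell, as in the notes, and it computes $SS_{\text{error}}$ directly from the residuals $X_{ij}-\bar X_{i\cdot}-\bar X_{\cdot j}+\bar X$ rather than by subtraction, so that the decomposition itself can be checked. The critical values 5.41 and 5.95 are the tabulated ones quoted in the notes, not computed.

```python
# Two-way ANOVA without replication (one observation per cell), stdlib only.
# Rows are levels of factor 1 (workers), columns are levels of factor 2 (machines).
# `two_way_anova` is a hypothetical helper written for this illustration.

def two_way_anova(data):
    """Sums of squares, degrees of freedom, mean squares and F-statistics
    for an r x c table with exactly one observation per cell."""
    r = len(data)       # levels of factor 1 (rows)
    c = len(data[0])    # levels of factor 2 (columns)
    grand = sum(sum(row) for row in data) / (r * c)

    row_means = [sum(row) / c for row in data]
    col_means = [sum(data[i][j] for i in range(r)) / r for j in range(c)]

    # Each row mean appears c times in the double sum, each column mean r times.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error SS from the residuals, so it can be compared with the subtraction.
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_rows = ss_rows / df_rows
    ms_cols = ss_cols / df_cols
    ms_error = ss_error / df_error
    return {
        "grand_mean": grand,
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df": (df_rows, df_cols, df_error),
        "ms_rows": ms_rows,
        "ms_cols": ms_cols,
        "ms_error": ms_error,
        "f_rows": ms_rows / ms_error,
        "f_cols": ms_cols / ms_error,
    }


# Workers (rows) by machines (columns), from the table in the notes.
machines_data = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]

res = two_way_anova(machines_data)
for key in ("ss_rows", "ss_cols", "ss_error", "ss_total", "ms_error", "f_rows", "f_cols"):
    print(f"{key:>9}: {res[key]:.4f}")

# 99% critical values taken from the F tables quoted in the notes.
print("workers significant at 1%:", res["f_rows"] > 5.41)   # F(4, 12)
print("machines significant at 1%:", res["f_cols"] > 5.95)  # F(3, 12)
```

Running it should reproduce the sums of squares from the hand calculation (161.5, 338.8, 73.7 and 574) and $F$-statistics of about 6.57 and 18.39. Getting the same 73.7 from the residuals as from subtraction is a direct numerical check that the cross products vanish.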