STAT 211 Handout 10 (Chapter 10): The Analysis of Variance

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor under investigation.

Single Factor ANOVA:

Model: $X_{ij} = \mu_i + \epsilon_{ij}$, where $i = 1, \ldots, I$ (number of treatments) and $j = 1, \ldots, J$ (number of observations in each treatment).
  $X_{ij}$: observations
  $\mu_i$: $i$th treatment mean
  $\epsilon_{ij}$: errors, which are normally distributed with mean 0 and constant variance $\sigma^2$.

Alternative model: $X_{ij} = \mu + \alpha_i + \epsilon_{ij}$, where $i = 1, \ldots, I$ (number of treatments) and $j = 1, \ldots, J$ (number of observations in each treatment).
  $X_{ij}$: observations
  $\mu$: overall mean
  $\alpha_i = \mu_i - \mu$: $i$th treatment effect
  $\epsilon_{ij}$: errors, which are normally distributed with mean 0 and constant variance $\sigma^2$.

Assumptions:
(i) The $X_{ij}$'s are independent (equivalently, the $\epsilon_{ij}$'s are independent).
(ii) The $\epsilon_{ij}$'s are normally distributed with mean 0 and constant variance $\sigma^2$.
(iii) The $X_{ij}$'s are normally distributed with mean $\mu_i$ and constant variance $\sigma^2$.

Hypotheses: $H_0: \mu_1 = \cdots = \mu_{I-1} = \mu_I$ versus $H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, where the $\mu_i$'s are the treatment means.
Or: $H_0: \alpha_i = 0$ for all $i$ versus $H_a: \alpha_i \neq 0$ for at least one $i$, where $\alpha_i$ is the $i$th treatment effect.

Analysis of Variance Table:
Source       df       SS        MS                    F          Prob > F
Treatments   I-1      SSTr      MSTr = SSTr/(I-1)     MSTr/MSE   P-value
Error        I(J-1)   SSE       MSE = SSE/[I(J-1)]
Total        IJ-1     SSTotal

where df is the degrees of freedom, SS is the sum of squares, and MS is the mean square.

Reject $H_0$ if the P-value $\le \alpha$, or equivalently if the test statistic $F > F_{\alpha; I-1, I(J-1)}$. If you reject the null hypothesis, you need to use a multiple comparison test such as Tukey-Kramer (page 414) to see which means are different. The $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_i - \mu_j$ is

$(\bar{x}_i - \bar{x}_j) \pm Q_{\alpha, I, I(J-1)} \sqrt{MSE / J}$.

Alternatively, write the sample means in increasing order and look at their pairwise differences: in the multiple comparison test, reject $H_0: \mu_i = \mu_j$ if $|\bar{x}_i - \bar{x}_j| > Q_{\alpha, I, I(J-1)} \sqrt{MSE / J}$. Here $Q_{\alpha, I, I(J-1)}$ is the critical value of the studentized range distribution (Table A10).

Example 1: The calcium content of wheat is observed for four different storage times.

Storage Period   Observations
0 months         58.75  57.94  58.91  56.85  55.21  57.30
1 month          58.87  56.43  56.51  57.67  59.75  58.48
2 months         59.13  60.38  58.01  59.95  59.51  60.34
4 months         62.32  58.76  60.03  59.36  59.61  61.95

Is there sufficient evidence to conclude that the mean calcium content is not the same for the four different storage times? Use $\alpha = 0.05$.

You are testing $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ versus $H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, $i, j = 1, 2, 3, 4$.

Assumptions:
(i) Each month's distribution is normal.
(ii) Each month's distribution has the same standard deviation.
(iii) The observations selected for each month are independent of one another.
(iv) The samples selected for the different months are independent of one another.

Analysis of Variance
Source   DF   SS           MS           F      P
Factor    3   32.1381669   10.7127223   6.51   0.0030
Error    20   32.9010529    1.64505264
Total    23   65.0392198

Level    N    Mean        StDev
Month0    6   57.493333   1.3748118
Month1    6   57.951666   1.3288857
Month2    6   59.553333   0.89694374
Month4    6   60.338333   1.4559044
Total    24   58.834166   1.681604

Bartlett's test for equal variances (normal distribution): Test Statistic = 1.1633, P-Value = 0.762

The Tukey-Kramer multiple comparison test gives the following 95% simultaneous confidence intervals:
$\mu_1 - \mu_2$: (-2.5319, 1.6152)
$\mu_1 - \mu_3$: (-4.1335, 0.0135)
$\mu_1 - \mu_4$: (-4.9185, -0.7715)
$\mu_2 - \mu_3$: (-3.6752, 0.4719)
$\mu_2 - \mu_4$: (-4.4602, -0.3131)
$\mu_3 - \mu_4$: (-2.8585, 1.2885)

Example 2: An engineer conducted a study of the factors influencing the lengths of steel bars. The lengths of twelve bars were taken from a screw machine, 4 being subjected to W heat treatment, 4 to L heat treatment, and 4 to D heat treatment.
The lengths (less 438) were as follows:

Heat Treatment
W    L    D
6    4    7
7    6    9
1   -1   10
6    4    6

[Figure: Boxplots of W, L, D (means are indicated by solid circles)]

Analysis of Variance
Source   DF   SS       MS      F      P
Factor    2    46.17   23.08   3.54   0.074
Error     9    58.75    6.53
Total    11   104.92

Level   N   Mean    StDev
W       4   5.000   2.708
L       4   3.250   2.986
D       4   8.000   1.826

Bartlett's Test (normal distribution): Test Statistic = 0.637, P-Value = 0.727
Levene's Test (any continuous distribution): Test Statistic = 0.022, P-Value = 0.979

We will answer the question using the output above.

[Figure: Normal probability plot for x (ML estimates, 95% CI): Mean 5.41667, StDev 2.95687, goodness of fit AD* = 1.319]
[Figure: Normal probability plots for W, L, D (ML estimates, 95% CI): goodness of fit AD* = 2.953, 2.802, 2.619]

Example 3: A tire manufacturer wants to test whether the mean diameters of tires produced at its three plants (New York, Illinois, and California) are equal. Last month, he took a random sample of tires at each plant, and their diameters (in inches) were as follows:

New York, Illinois, California:
24.2 24.2 24.1 24.4 24.2 24.3 24.4 24.3 24.2 24.1 24.2 24.4 24.3 24.1 24.4 24.5 24.4 24.2 24.3 24.3 24.4 24.3 24.4 24.4 24.4 24.5 24.4

[Figure: Boxplots of NY, IL, CA (means are indicated by solid circles)]

Analysis of Variance
Source   DF   SS       MS       F      P
Factor    2   0.0674   0.0337   2.78   0.082
Error    24   0.2911   0.0121
Total    26   0.3585

Level   N   Mean     StDev
NY      9   24.244   0.113
IL      9   24.311   0.105
CA      9   24.367   0.112

Bartlett's Test (normal distribution): Test Statistic = 0.042, P-Value = 0.979
Levene's Test (any continuous distribution): Test Statistic = 0.063, P-Value = 0.939

The confidence interval for a linear combination of means $\sum_i c_i \mu_i$ with equal sample sizes, $\sum_i c_i \bar{x}_i \pm t_{\alpha/2; I(J-1)} \sqrt{MSE \sum_i c_i^2 / J}$, will be discussed in class.

If we go back to Example 2, I do not approve of the L treatment.
I would really like to see the difference between the average for L and the combined average of W and D. Let $\theta = \mu_L - \frac{1}{2}(\mu_W + \mu_D)$. Then $\hat{\theta} = \bar{x}_L - \frac{1}{2}(\bar{x}_W + \bar{x}_D) = 3.250 - \frac{1}{2}(5 + 8) = -3.25$, and

$\hat{\theta} \pm t_{0.025;9} \sqrt{MSE \sum_i c_i^2 / J} = -3.25 \pm 2.262 \sqrt{6.53\,(1^2 + (-0.5)^2 + (-0.5)^2)/4} = (-6.79, 0.29)$

is the 95% C.I. for $\theta$, where $t_{0.025;9} = 2.262$.

For the case of unequal sample sizes, let $J_i$ be the number of observations in treatment $i$ (so $j = 1, \ldots, J_i$) and let $n = \sum_{i=1}^{I} J_i$. The analysis of variance table and the multiple comparison test then change as follows.

Source       df     SS        MS                  F          Prob > F
Treatments   I-1    SSTr      MSTr = SSTr/(I-1)   MSTr/MSE   P-value
Error        n-I    SSE       MSE = SSE/(n-I)
Total        n-1    SSTotal

where df is the degrees of freedom, SS is the sum of squares, and MS is the mean square. Reject $H_0$ if the P-value $\le \alpha$, or equivalently if the test statistic $F > F_{\alpha; I-1, n-I}$. If you reject the null hypothesis, you need to use a multiple comparison test such as Tukey-Kramer to see which means are different. The $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_i - \mu_j$ is

$(\bar{x}_i - \bar{x}_j) \pm Q_{\alpha, I, n-I} \sqrt{\frac{MSE}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$.

Alternatively, write the sample means in increasing order and look at their pairwise differences: for the Tukey-Kramer test, reject $H_0: \mu_i = \mu_j$ if $|\bar{x}_i - \bar{x}_j| > Q_{\alpha, I, n-I} \sqrt{\frac{MSE}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$. Here $Q_{\alpha, I, n-I}$ is the critical value of the studentized range distribution (Table A10).

Example 4 (Exercise 10.22): The data are the yields of tomatoes at four different levels of salinity. Using $\alpha = 0.05$, test for any differences in true average yield due to the different salinity levels.
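As a sketch of where the entries of the unequal-sample-size table come from, the Python function below rebuilds them from each treatment's size $J_i$, mean $\bar{x}_i$, and standard deviation $s_i$, using $SSTr = \sum_i J_i(\bar{x}_i - \bar{x})^2$ (with $\bar{x} = \sum_i J_i \bar{x}_i / n$) and $SSE = \sum_i (J_i - 1)s_i^2$. The inputs are the Example 4 summary statistics reported in the output that follows; because those standard deviations are rounded, SSE comes out near 124.50 rather than exactly equal to it. The name `anova_from_summary` is a hypothetical helper, not part of any library.

```python
from math import fsum


def anova_from_summary(sizes, means, sds):
    """Single-factor ANOVA quantities from summary statistics (hypothetical helper).

    sizes: J_i (observations per treatment), means: xbar_i, sds: s_i.
    Works for equal or unequal J_i; degrees of freedom are (I - 1, n - I).
    """
    n = sum(sizes)                     # total number of observations
    k = len(sizes)                     # number of treatments, I
    grand = fsum(j * m for j, m in zip(sizes, means)) / n  # overall mean
    sstr = fsum(j * (m - grand) ** 2 for j, m in zip(sizes, means))
    sse = fsum((j - 1) * s ** 2 for j, s in zip(sizes, sds))  # within-treatment SS
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse,
            "F": mstr / mse, "df": (k - 1, n - k)}


# Example 4: salinity levels 1.6, 3.8, 6.0, 10.2 (sizes, means, SDs from the output)
table = anova_from_summary([5, 4, 4, 5],
                           [58.280, 55.400, 50.850, 45.500],
                           [3.602, 2.665, 2.426, 2.901])
df1, df2 = table["df"]
print(f"Treatments df={df1} SS={table['SSTr']:.2f} MS={table['MSTr']:.2f} F={table['F']:.2f}")
print(f"Error      df={df2} SS={table['SSE']:.2f} MS={table['MSE']:.2f}")
```

The resulting $F \approx 17.1$ is far above $F_{0.05;3,14} \approx 3.34$, which agrees with the reported P-value of 0.000.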
Analysis of Variance
Source   DF   SS       MS       F       P
Factor    3   456.50   152.17   17.11   0.000
Error    14   124.50     8.89
Total    17   581.00

Level   N   Mean     StDev
1.6     5   58.280   3.602
3.8     4   55.400   2.665
6.0     4   50.850   2.426
10.2    5   45.500   2.901
Pooled StDev = 2.982

[Figure: Individual 95% CIs for each level mean, based on the pooled StDev (axis from 48.0 to 60.0)]

Bartlett's Test (normal distribution): Test Statistic = 0.558, P-Value = 0.906
Levene's Test (any continuous distribution): Test Statistic = 0.130, P-Value = 0.940

Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.0115
Critical value = 4.11

Intervals for (column level mean) - (row level mean)
          1.6       3.8       6.0
3.8    -2.934
        8.694
6.0     1.616    -1.578
       13.244    10.678
10.2    7.299     4.086    -0.464
       18.261    15.714    11.164

Differences between fixed effect and random effect models will be discussed in class.
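The Tukey-Kramer intervals in the Example 4 output can be reproduced from $(\bar{x}_i - \bar{x}_j) \pm Q_{\alpha, I, n-I} \sqrt{\frac{MSE}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$. The sketch below assumes $MSE = 124.50/14$ from the ANOVA table and takes $Q_{0.05,4,14} = 4.11$ from the "Critical value" line of the output; it does not compute Q itself, which still comes from Table A10. The helper name `tukey_kramer` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer(labels, sizes, means, mse, q):
    """Simultaneous intervals for mu_i - mu_j (hypothetical helper).

    Interval: (xbar_i - xbar_j) +/- q * sqrt(MSE/2 * (1/J_i + 1/J_j)),
    where q is the studentized range critical value Q_{alpha, I, n-I}.
    """
    out = {}
    for i, j in combinations(range(len(labels)), 2):
        diff = means[i] - means[j]
        half = q * sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))
        out[(labels[i], labels[j])] = (diff - half, diff + half)
    return out


# Example 4: MSE = SSE/(n - I) = 124.50/14; Q taken from the output's critical value
labels = ["1.6", "3.8", "6.0", "10.2"]
intervals = tukey_kramer(labels, [5, 4, 4, 5],
                         [58.280, 55.400, 50.850, 45.500],
                         mse=124.50 / 14, q=4.11)
for (a, b), (lo, hi) in intervals.items():
    # An interval excluding 0 means H0: mu_a = mu_b is rejected
    verdict = "differ" if lo > 0 or hi < 0 else "no significant difference"
    print(f"mu_{a} - mu_{b}: ({lo:.3f}, {hi:.3f})  {verdict}")
```

Only the intervals for levels 1.6 vs 6.0, 1.6 vs 10.2, and 3.8 vs 10.2 exclude 0, so those are the pairs of mean yields declared different at a family error rate of 0.05.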