One-way ANOVA and Block Designs
14 – Comparing Several Population Means

14.1 - One-way Analysis of Variance (ANOVA)

Suppose we wish to compare k population means (k ≥ 2). This situation can arise in two ways. If the study is observational, we are obtaining independently drawn samples from k distinct populations and we wish to compare the population means for some numerical response of interest. If the study is experimental, then we are using a completely randomized design to obtain our data from k distinct treatment groups. In a completely randomized design the experimental units are randomly assigned to one of k treatments and the response value from each unit is obtained. The mean of the numerical response of interest is then compared across the different treatment groups.

Assumptions
1) The samples are drawn independently, or come from using a completely randomized design.
2) The variable of interest is normally distributed for each population.
3) The population variances are equal across groups.

There are two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ, and by how much do they differ?

More formally:
H_o: μ1 = μ2 = ... = μk, i.e. all the population means are equal and have a common mean (μ)
H_a: μi ≠ μj for some i ≠ j, i.e. at least two population means differ.

If we reject the null then we use comparative methods to answer question 2 above.

Basic Idea: The test procedure compares the variation in observations between samples to the variation within samples. If the variation between samples is large relative to the variation within samples we are likely to conclude that the population means are not all equal. The diagrams below illustrate this idea...

Between Group Variation >> Within Group Variation (Conclude population means differ)
Between Group Variation ≈ Within Group Variation (Fail to conclude the population means differ)

The procedure gets the name analysis of variance because we are using variation to decide whether the population means differ.
Computational Details for the Equal Sample Size Case (n1 = n2 = ... = nk = n)

Standard statistical notation:
X_ij = jth observation from population i
X̄_i = sample mean from population i = (Σ_{j=1}^{n} X_ij)/n
X̄ = grand mean of all observations = (Σ_{i=1}^{k} Σ_{j=1}^{n} X_ij)/N = (Σ_{i=1}^{k} X̄_i)/k

TWO ESTIMATES OF THE COMMON VARIANCE (σ²)

Estimate 1: Using Between Group Variation
If the null hypothesis is true, each of the X̄_i's represents a randomly selected observation from a normal distribution with mean μ (the common mean) and std. deviation σ/√n, i.e. the standard error of the mean. Thus if we find the sample standard deviation of the X̄_i's we get an estimate of σ/√n, or in terms of the sample variance we have,

Sample variance of the X̄_i's = Σ_{i=1}^{k} (X̄_i − X̄)²/(k − 1), which is an estimate of σ²/n.

Therefore,

n Σ_{i=1}^{k} (X̄_i − X̄)²/(k − 1)

is an estimate of σ², if and only if the null hypothesis is true. However, if the alternative hypothesis is true, this estimate of σ² will be too BIG. This formula will be large when there is substantial between group variation. This measure of between group variation is called the Mean Square for Treatments and is denoted MS_Treat.

Estimate 2: Using Within Group Variation
Another estimate of the common variance (σ²) can be found by looking at the variation of the observations within each of the k treatment groups. By extending the pooled variance from the two-population case we have the following:

Pooled estimate of σ² = [(n1 − 1)s1² + (n2 − 1)s2² + ... + (nk − 1)sk²]/(n1 + n2 + ... + nk − k)

which simplifies to the mean of the k sample variances when the sample sizes are all equal. Another way to write this is as follows:

Pooled estimate of σ² = Σ_{i=1}^{k} Σ_{j=1}^{ni} (X_ij − X̄_i)²/(N − k) = MS_Error

This will be an estimate of the common population variance (σ²) regardless of whether the null hypothesis is true or not. This is called the Mean Square for Error (MS_Error).
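The two variance estimates above can be checked numerically. Below is a minimal sketch in Python; the data are made-up numbers for illustration, not from any example in this handout:

```python
# Two estimates of the common variance in the equal sample size case.
# The three groups below are invented numbers for illustration only.
from statistics import mean, variance

groups = [[10.0, 12.0, 11.0, 13.0],
          [14.0, 15.0, 13.0, 16.0],
          [9.0, 10.0, 8.0, 11.0]]
k, n = len(groups), len(groups[0])

group_means = [mean(g) for g in groups]

# Estimate 1 (between groups): n times the sample variance of the k means.
ms_treat = n * variance(group_means)

# Estimate 2 (within groups): with equal sample sizes, the pooled variance
# is simply the mean of the k sample variances.
ms_error = mean(variance(g) for g in groups)

print(ms_treat, ms_error)
```

Here the between-group estimate (about 25.3) is far larger than the within-group estimate (about 1.7), because these made-up group means genuinely differ; under H_o the two values would be comparable.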
Thus if the null hypothesis is true we have two estimates of the common variance (σ²), namely the mean square for treatments (MS_Treat) and the mean square for error (MS_Error). If MS_Treat >> MS_Error we reject H_o, i.e. the between group variation is large relative to the within group variation. If MS_Treat ≈ MS_Error we fail to reject H_o, i.e. the between group variation is NOT large relative to the within group variation.

Test Statistic
The test statistic compares the mean squares for treatment and error in the form of a ratio,

F = MS_Treat/MS_Error ~ F-distribution with numerator df = k − 1 and denominator df = N − k

Large values of F give small p-values and lead to rejection of the null hypothesis.

EXAMPLE 1 - Weight Gain in Anorexia Patients
Data File: Anorexia.JMP in the ANOVA/t-test JMP folder
Keywords: One-way ANOVA
Topics: Medicine

These data give the pre- and post-weights of patients being treated for anorexia nervosa. There are actually three different treatment plans being used in this study, and we wish to compare their performance. The variables in the data file are:
Treatment – Family, Standard, Behavioral
prewt - weight at the beginning of treatment
postwt - weight at the end of the study period
Weight Gain - weight gained (or lost) during treatment (postwt − prewt)

We begin our analysis by examining comparative displays for the weight gained across the three treatment methods. To do this select Fit Y by X from the Analyze menu, place the grouping variable, Treatment, in the X box and the response, Weight Gain, in the Y box, and click OK. Here boxplots, mean diamonds, normal quantile plots, and comparison circles have been added.

Things to consider from this graphical display:
Do there appear to be differences in the mean weight gain?
Are the weight changes normally distributed?
Is the variation in weight gain equal across therapies?
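As a numerical check of the F computation, the mean squares JMP reports for this example (MS_Treat = 307.22, MS_Error = 56.68; see the ANOVA results in the next section) reproduce the F ratio and p-value. The total sample size N = 72 is an assumption taken from the classic anorexia data set — it is not stated in this excerpt. With k = 3 the numerator df is 2, a special case in which the upper-tail F probability has a simple closed form:

```python
# Sketch: F statistic and p-value from the reported mean squares.
# N = 72 is an assumption (classic anorexia data set); k = 3 treatments.
ms_treat, ms_error = 307.22, 56.68
k, N = 3, 72
df1, df2 = k - 1, N - k              # 2 and 69

F = ms_treat / ms_error              # ratio of the two variance estimates

# For numerator df = 2 only, the F survival function has the closed form
# P(F(2, df2) > f) = (1 + 2f/df2) ** (-df2/2).
p = (1 + 2 * F / df2) ** (-df2 / 2)

print(round(F, 2), round(p, 4))
```

This reproduces the F ratio of about 5.42 and the p-value of .0065 quoted in the example.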
Checking the Equality of Variance Assumption
To test whether it is reasonable to assume the population variances are equal for these three therapies select UnEqual Variances from the Oneway Analysis pull-down menu.

Equality of Variance Test
H_o: σ1² = σ2² = ... = σk²
H_a: the population variances are not all equal

We have no evidence to conclude that the variances/standard deviations of the weight gains for the different treatment programs differ (p >> .05).

ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS
To test the null hypothesis that the mean weight gain is the same for each of the therapy methods we will perform the standard one-way ANOVA test. To do this in JMP select Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the test are shown in the Analysis of Variance box.

MS_Treat = 307.22
MS_Error = 56.68

The mean square for treatments is 5.42 times larger than the mean square for error! This provides strong evidence against the null hypothesis. The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at the .05 level and conclude that statistically significant differences exist in the mean weight gain experienced by patients in the different therapy groups.

14.2 - MULTIPLE COMPARISONS
Because we have concluded that the mean weight gains across treatment methods are not all equal, it is natural to ask the secondary questions: Which means are significantly different from one another? How large are the differences?

We could consider performing a series of two-sample t-tests and constructing confidence intervals for independent samples to compare all possible pairs of means; however, if the number of treatment groups is large we will almost certainly find two treatment means as being significantly different. Why? Consider a situation where we have k = 7 different treatments that we wish to compare. Comparing all possible pairs of means (1 vs. 2, 1 vs. 3, ..., 6 vs. 7) would require performing a total of k(k − 1)/2 = 21 two-sample t-tests. If we used P(Type I Error) = .05 for each test we expect to make 21(.05) ≈ 1 Type I Error, i.e. we expect to find one pair of means as being significantly different when in fact they are not. This problem only becomes worse as the number of groups, k, gets larger.

Experiment-wise Error Rate
Another way to think about this is to consider the probability of making no Type I Errors when making our pair-wise comparisons. When k = 7, for example, the probability of making no Type I Errors is (.95)^21 = .3406, so the probability that we make at least one Type I Error is 1 − .3406 = .6594, or a 65.94% chance. Certainly this is unacceptable! Why would we conduct a statistical analysis when we know that we have a 66% chance of making an error in our conclusions? This probability is called the experiment-wise error rate.

Bonferroni Correction
There are several different ways to control the experiment-wise error rate. One of the easiest is the Bonferroni correction. If we plan on making m comparisons or conducting m significance tests, the Bonferroni correction is to simply use α/m as our significance level rather than α. This simple correction guarantees that our experiment-wise error rate will be no larger than α. This correction implies that our p-values will have to be less than α/m, rather than α, to be considered statistically significant.

Multiple Comparison Procedures for Pair-wise Comparisons of k Population Means
When performing pair-wise comparisons of population means in ANOVA there are several different methods that can be employed. These methods depend on the types of pair-wise comparisons we wish to perform. The different types available in JMP are summarized briefly below:

Compare each pair using the usual two-sample t-test for independent samples. This choice does not provide any experiment-wise error rate protection!
(DON'T USE)

Compare all pairs using Tukey's Honest Significant Difference (HSD) approach. This is the best choice if you are interested in comparing each possible pair of treatments.

Compare the means to the "Best" using Hsu's method. The best mean can either be the minimum (if smaller is better for the response) or the maximum (if bigger is better for the response).

Compare each mean to a control group using Dunnett's method. This compares each treatment mean to a control group only. You must identify the control group in JMP by clicking on an observation in your comparative plot corresponding to the control group before selecting this option.

Multiple Comparison Options in JMP

EXAMPLE 1 - Weight Gain in Anorexia Patients (cont'd)
For these data we are probably interested in comparing each of the treatments to one another. For this we will use Tukey's multiple comparison procedure for comparing all pairs of population means. Select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option.

Beside the graph you will now notice there are circles plotted. There is one circle for each group, and each circle is centered at the mean for the corresponding group. The size of each circle is inversely proportional to the sample size, thus larger circles will be drawn for groups with smaller sample sizes. These circles are called comparison circles and can be used to see which pairs of means are significantly different from each other. To do this, click on the circle for one of the treatments. Notice that the treatment group selected will appear in the plot window and its circle will become red & bold. The means that are significantly different from the treatment group selected will have circles that are gray. These color differences are also conveyed in the group labels on the horizontal axis. In the plot below we have selected the Standard treatment group.
The results of the pair-wise comparisons are also contained in the output window (shown below). The matrix labeled Comparisons for all pairs using Tukey-Kramer HSD identifies pairs of means that are significantly different using positive entries in the matrix. Here we see that only behavioral cognitive therapy and standard therapy significantly differ in terms of mean weight change. The next table conveys the same information by using different letters to represent populations that have significantly different means. Notice behavioral and standard therapies are not connected by the same letter, so they are significantly different. Finally, the CIs in the Ordered Differences section give estimates for the differences in the population means. Here we see that the mean weight gain for patients receiving behavioral therapy is estimated to be between 2.09 lbs. and 13.34 lbs. larger than the mean weight gain for patients receiving standard treatment (see highlighted section below).

EXAMPLE 2 – Butter Fat Content in Cow Milk
Data File: Butterfat-cows.JMP

This data set contains the butter fat content of milk samples from five breeds of dairy cows. The variables in the data file are:
Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey
Age Group – two different age classes of cows (1 = younger, 2 = older)
Butterfat - % butter fat found in the milk sample

We begin our analysis by examining comparative displays for the butter fat content across the five breeds. To do this select Fit Y by X from the Analyze menu, place the grouping variable, Breed, in the X box and the response, Butterfat, in the Y box, and click OK. The resulting plot simply shows a scatter plot of butter fat content versus breed. In many cases there will be numerous data points on top of each other in such a display, making the plot harder to read. To help alleviate this problem we can stagger or jitter the points a bit from vertical by selecting Jitter Points from the Display Options menu.
To help visualize breed differences we could add quantile boxplots, mean diamonds, or means with standard error bars to the graph. To do any or all of these select the appropriate options from the Display Options menu. The plot at the top of the next page has both quantile boxes and means with error bars added. We can clearly see the butter fat content is largest for Jersey and Guernsey cows and lowest for Holstein-Fresian. There also appears to be some difference in the variation of butter fat content as well. The summary statistics on the following page confirm our findings based on the graph above. To obtain these summary statistics select the Quantiles and Means, Std Dev, Std Err options from the Oneway Analysis menu at the top of the window.

Summary Statistics for Butter Fat Content by Breed

To test the null hypothesis that the butter fat content is the same for each of the breeds we can perform a one-way ANOVA test.

H_o: μ_Ayshire = μ_Canadian = ... = μ_Jersey
H_a: at least two breeds have different mean butter fat content in their milk

Again, the assumptions for one-way ANOVA are as follows:
1.) The samples are drawn independently, or come from using a completely randomized design. Here we can assume the cows sampled from each breed were independently sampled.
2.) The variable of interest is normally distributed for each population.
3.) The population variances are equal across groups. Here this means:
σ²_Ayshire = σ²_Canadian = ... = σ²_Jersey
If this assumption is violated we can use Welch's ANOVA, which allows for inequality of the population variances, or we can transform the response (see below).

To check these assumptions in JMP select UnEqual Variances and Normal Quantile Plot > Actual by Quantile (the 1st option). To conduct the one-way ANOVA test select Means, Anova/t-Test from the Oneway Analysis pull-down menu. The graphical results and the results of the test are shown on the following page.
Butter Fat Content by Breed

Normality appears to be satisfied. The equality of variance test results are shown below. We have strong evidence against the equality of the population variances/standard deviations. (Use Welch's ANOVA)

We conclude at least two means differ. Because we have strong evidence against equality of the population variances we use Welch's ANOVA to test the equality of the means. This test allows the population variances/standard deviations to differ when comparing the population means. For Welch's ANOVA we see that the p-value < .0001, therefore we have strong evidence against the equality of the means and we conclude these dairy breeds differ in the mean butter fat content of their milk.

Variance Stabilizing Transformations
Another approach that is often used when we encounter evidence against the equality of variance assumption is to transform the response. Often we find that the variation appears to increase as the mean increases. For these data this is the case: the estimated standard deviations increase as the estimated means increase. To correct this we generally consider taking the log of the response. Sometimes the square root and reciprocal are used, but these transformations do not allow for any meaningful interpretation in the original scale, whereas the log transform does (see Handout 14). As the sample means for the groups increase so do the sample standard deviations, so to stabilize the variances/standard deviations we can try a log transformation. The results of the analysis using the log response are shown below.

Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds
Normality is still satisfied, there is no evidence against the equality of the variances, and we have strong evidence that at least two means differ.
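The variance-stabilizing effect of the log can be seen with a tiny sketch: if one group is simply a rescaled copy of another (so its standard deviation grows in proportion to its mean), logging makes the spreads exactly equal. The numbers below are invented for illustration, not the butterfat data:

```python
# Sketch of why a log transform stabilizes variance when the spread grows
# in proportion to the mean (made-up numbers, not the butterfat data).
import math
from statistics import stdev

g1 = [3.8, 4.0, 4.2, 4.5]
g2 = [x * 2 for x in g1]          # mean and sd both double

sd_raw = (stdev(g1), stdev(g2))
sd_log = (stdev([math.log(x) for x in g1]),
          stdev([math.log(x) for x in g2]))

print(sd_raw)   # second sd is twice the first
print(sd_log)   # essentially identical: log(2x) = log 2 + log x
```

On the raw scale the second group's standard deviation is exactly double the first's; after logging, the multiplicative factor becomes an additive shift, which the standard deviation ignores.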
Because we have concluded that the mean/median log butter fat content differs across breeds (p < .0001), it is natural to ask the secondary question: Which breeds have typical values that are significantly different from one another? To answer this we again use multiple comparison procedures. In JMP select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. The results of Tukey's HSD are shown below. Below is the plot showing the results when clicking on the circle for the Guernseys. We can see that the Guernseys and Jerseys do not significantly differ from each other, but they both differ significantly from the other three breeds.

The results of all pair-wise comparisons are contained in the matrix above. All pairs of means that are significantly different will have positive entries in this matrix. Here we see only Guernsey and Jersey do not significantly differ. The table using letters conveys the same information, using different letters to represent populations that have significantly different means. The CIs in the Ordered Differences section give estimates for the differences in the population means/medians in the log scale. As was the case when using the log transformation with two populations, we interpret the results in the original scale in terms of ratios of medians. For example, back-transforming the interval comparing Jersey to Holstein-Fresian, we estimate that the median butter fat content found in Jersey milk is between 1.33 and 1.55 times larger than the median butter fat content in Holstein-Fresian milk. We could also state this in terms of percentages: we estimate that the typical butter fat content found in Jersey milk is between 33% and 55% higher than that found in Holstein-Fresian milk.
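The back-transformation step can be sketched as follows. The log-scale endpoints used below are hypothetical values chosen only to reproduce the reported 1.33–1.55 ratio, assuming natural logs were used; they are not taken from the JMP output:

```python
# Back-transforming a log-scale CI into a ratio of medians. The log-scale
# endpoints here are hypothetical (chosen to match the reported ratio) and
# assume natural logs; they are illustrative, not actual JMP output.
import math

lo_log, hi_log = 0.285, 0.438      # hypothetical CI for the difference of log medians
lo_ratio = math.exp(lo_log)        # ~1.33: median at least 33% larger
hi_ratio = math.exp(hi_log)        # ~1.55: median at most 55% larger
print(round(lo_ratio, 2), round(hi_ratio, 2))
```

Exponentiating each endpoint of a CI for a difference of log medians yields a CI for the ratio of medians in the original scale, which is what makes the log transform interpretable.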
15.3 - Randomized Complete Block (RCB) Designs

EXAMPLE 1 – Comparing Methods of Determining Blood Serum Level
Data File: Serum-Meth.JMP

The goal of this study was to determine if four different methods for determining blood serum levels significantly differ in terms of the readings they give. Suppose we plan to have 6 readings from each method which we will then use to make our comparisons. One approach we could take would be to find 24 volunteers, randomly allocate six subjects to each method, and compare the readings obtained using the four methods. (Note: this is called a completely randomized design.) There is one major problem with this approach; what is it?

Instead of taking this approach it would clearly be better to use each method on the same subject. This removes subject-to-subject variation from the results and will allow us to get a clearer picture of the actual differences in the methods. Also, if we truly only wish to have 6 readings for each method, this approach will only require the use of 6 subjects versus the 24 subjects the completely randomized approach discussed above requires, thus reducing the "cost" of the experiment.

The experimental design where each patient's serum level is determined using each method is called a randomized complete block (RCB) design. Here the patients serve as the blocks; the term randomized refers to the fact that the methods will be applied to each patient's blood sample in a random order, and complete refers to the fact that each method is used on each patient's blood sample. In some experiments where blocking is used it is not possible to apply each treatment to each block, resulting in what is called an incomplete block design. These are less common and we will not discuss them in this class. The table below contains the raw data from the RCB experiment to compare the serum determination methods.
                 Method
Subject      1       2       3       4
   1        360     435     391     502
   2       1035    1152    1002    1230
   3        632     750     591     804
   4        581     703     583     790
   5        463     520     471     502
   6       1131    1340    1144    1300

Visualizing the Need for Blocking
Select Fit Y by X from the Analyze menu and place Serum Level in the Y, Response box and Method in the X, Factor box. The resulting comparative plot is shown below. Does there appear to be any difference in the serum levels obtained from the four methods? This plot completely ignores the fact that the same six blood samples were used for each method. We can incorporate this fact visually by selecting Oneway Analysis > Matching Column... > then highlighting Patient in the list. This will have the following effect on the plot. Now we can clearly see that ignoring the fact that the same six blood samples were used for each method is a big mistake! On the next page we will show how to correctly analyze these data.

Correct Analysis of RCB Design Data in JMP
First select Fit Y by X from the Analyze menu and place Serum Level in the Y, Response box, Method in the X, Factor box, and Patient in the Block box. The results from JMP are shown below. Notice the Y axis is "Serum Level – Block Centered". This means that what we are seeing in the display is the differences in the serum level readings adjusting for the fact that the readings for each method came from the same 6 patients. Examining the data in this way we can clearly see that the methods differ in the serum level reported when measuring blood samples from the same patient.

The results of the ANOVA clearly show we have strong evidence that the four methods do not give the same readings when measuring the same blood sample (p < .0001). The tables below give the block-corrected mean for each method and the block means used to make the adjustment. As was the case with one-way ANOVA (completely randomized), we may still wish to determine which methods have significantly different mean serum level readings when measuring the same blood sample.
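A minimal sketch of the RCB computation behind JMP's output, using the serum table above (treatments = methods, blocks = subjects). The sums of squares are computed directly from their definitions:

```python
# Sketch of the RCB ANOVA for the serum data above: blocks = subjects,
# treatments = methods. Computed from the SS definitions, stdlib only.
rows = [  # one row per subject; columns are methods 1-4
    [360, 435, 391, 502],
    [1035, 1152, 1002, 1230],
    [632, 750, 591, 804],
    [581, 703, 583, 790],
    [463, 520, 471, 502],
    [1131, 1340, 1144, 1300],
]
b, k = len(rows), len(rows[0])           # 6 blocks, 4 treatments
N = b * k
grand = sum(sum(r) for r in rows) / N

col_means = [sum(r[j] for r in rows) / b for j in range(k)]   # method means
row_means = [sum(r) / k for r in rows]                        # subject means

ss_treat = b * sum((m - grand) ** 2 for m in col_means)
ss_block = k * sum((m - grand) ** 2 for m in row_means)
ss_total = sum((y - grand) ** 2 for r in rows for y in r)
ss_error = ss_total - ss_treat - ss_block

ms_treat = ss_treat / (k - 1)
ms_error = ss_error / ((k - 1) * (b - 1))
F = ms_treat / ms_error                  # compare to F(3, 15)
print(round(F, 2))
```

This gives F ≈ 23.8 on (3, 15) df, consistent with the p < .0001 JMP reports. Note also how large SS_Block is relative to SS_Error, confirming that blocking on patients removed a great deal of nuisance variation.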
We can again use multiple comparison procedures. Select Compare Means... > All Pairs, Tukey's HSD. We can see that methods 4 & 2 differ significantly from methods 1 & 3, but not from each other. The same can be said for methods 1 & 3 when compared to methods 4 & 2. The confidence intervals quantify the size of the difference we can expect on average when measuring the same blood samples. For example, we see that method 4 will give serum levels between 90.28 and 255.05 higher than method 3 on average when measuring the same blood sample. Other comparisons can be interpreted in similar fashion.

EXAMPLE 2: Exercise Performance and Ipratropium Bromide Aerosol Dose
A study by Ikeda et al. (Thorax, 1996) was designed to determine the dose of ipratropium bromide aerosol that improves exercise performance using progressive cycle ergometry in patients with stable chronic obstructive pulmonary disease. The mean age of the 20 male subjects was 69.2 years with a standard deviation of 4.6 years. Among the data collected were the maximum ventilation (VEmax, L/min.) values at maximum achieved exercise for different ipratropium bromide dosage levels (μg).

Data File: VE max IBA
A portion of the data table in JMP

Each dose (0 or placebo, 40, 80, 160, and 240 μg) was given to the subjects in a random order. Subjects are the blocking factor, Treatment (i.e. dose) is the factor of interest, and VE max is the response.

Research Questions:
a) Determine if the mean maximum ventilation differs significantly as a function of dose.
b) Which dosage levels, if any, significantly differ from placebo?
c) Assuming that larger VE values are best, what dosage would you recommend?

To analyze these data begin by selecting Analyze > Fit Y by X and place VE max in the Y box, Treatment (dose) in the X box, and Subject (i.e. blocks) in the Block box as shown below. The resulting output is shown on the following page.
Analysis of Block Design
There is a significant treatment (dose) effect on the mean VE max (p = .0022). The block effect also appears to be quite large. To compare the different dosages to placebo we can use Dunnett's procedure. First click on a point in the placebo group to identify that group as the control, and then select Compare Means > With Control, Dunnett's, which will produce the output shown below.

Comparisons of different doses vs. placebo
Here the results from Dunnett's method show that two doses, 160 μg and 240 μg, are significantly different from placebo.

By comparing the means in a pair-wise fashion using Tukey's HSD we see that the two highest dose levels do not significantly differ, so we could argue that the "optimal" dose would be 160 μg because it would be "cheaper" in some sense than the higher dose.

Tukey's HSD Results

15.4 - Nonparametric Approach: Kruskal-Wallis Test
If the normality assumption is suspect, or the sample sizes from each of the k populations are too small to assess normality, we can use the Kruskal-Wallis Test to compare the size of the values drawn from the different populations. There are two basic assumptions for the Kruskal-Wallis test:
1) The samples from the k populations are independently drawn.
2) Under the null hypothesis, all k populations are identical in shape, with the only potential difference being in the location of the typical values (medians).

Hypotheses:
H_o: All k populations have the same median or location.
H_a: At least one of the populations has a median different from the others, or at least one population is shifted away from the others.

To perform the test we rank all of the data from smallest to largest and compute the rank sum R_i for each of the k samples. The test statistic looks at the difference between the average rank for each group, R̄_i = R_i/n_i, and the average rank for all observations, (N + 1)/2. If there are differences in the populations we expect some groups will have an average rank much larger than the average rank for all observations and some to have smaller average ranks.

H = [12/(N(N + 1))] Σ_{i=1}^{k} n_i (R̄_i − (N + 1)/2)² ~ χ²_{k−1} (chi-square distribution with df = k − 1)

The larger H is, the stronger the evidence we have against the null hypothesis that the populations have the same location/median. Large values of H lead to small p-values!

Example: Movement of Gastropods (Austrocochlea obtusa)
Preliminary observations on North Stradbroke Island indicated that the gastropod Austrocochlea obtusa preferred the zone just below the mean tide line. In an experiment to test this, A. obtusa were collected, marked, and placed either 7.5 m above this zone (Upper Shore), 7.5 m below this zone (Lower Shore), or back in the original area (Control). After two tidal cycles, the snails were recaptured. The distance each had moved (in cm) from where it had been placed was recorded. Is there a significant difference among the median distances moved by the three groups? Enter these data into two columns, one denoting the group and the other containing the recorded movement for each snail.

R1 = 84, R2 = 79, R3 = 162 and H = 7.25 (p-value = .0287). We have evidence to suggest that the movement distances significantly differ between the groups.

Multiple Comparisons
To determine which groups significantly differ we can use the procedure outlined below. To determine if group i significantly differs from group j we compute

z_ij = |R̄_i − R̄_j| / sqrt[ (N(N + 1)/12)(1/n_i + 1/n_j) ]

and then compute p-value = P(Z > z_ij). If the p-value is less than α/(2m), where m = # of pair-wise comparisons to be made, which would typically be m = k(k − 1)/2 if all pair-wise comparisons are of interest, we conclude those two groups significantly differ. For this example we can make a total of m = 3(3 − 1)/2 = 3 pair-wise comparisons, so we compare our p-values to .05/(2 × 3) = .00833.

Comparing Upper Shore vs.
Control: z13 = (18.0 − 12.0)/sqrt[(25(26)/12)(1/7 + 1/9)] = 1.618. P(Z > 1.62) = .0526 > .00833, so we fail to conclude these locations significantly differ in terms of gastropod movement.

Comparing Upper Shore vs. Lower Shore: z23 = |18.0 − 8.78|/sqrt[(25(26)/12)(1/9 + 1/9)] = 2.657. P(Z > 2.66) = .0039 < .00833, so we conclude that these locations significantly differ in terms of gastropod movement.

A similar comparison could be made to compare lower shore to control; however, it clearly will not be significant given the previous results.

ONE-WAY ANOVA MODEL (FYI only)
In statistics we often use models to represent the mechanism that generates the observed response values. Before we introduce the one-way ANOVA model we first must introduce some notation. Let,

n_i = sample size from group i, (i = 1, ..., k)
X_ij = the jth response value for the ith group, (j = 1, ..., n_i)
μ = mean common to all groups, assuming the null hypothesis is true
τ_i = shift in the mean due to the fact the observation came from group i
ε_ij = random error for the jth observation from the ith group

Note: The test procedure we use requires that the random errors are normally distributed with mean 0 and variance σ². Equivalently, the test procedure requires that the response is normally distributed with a common standard deviation for all k groups.

One-way ANOVA Model: X_ij = μ + τ_i + ε_ij, i = 1, ..., k and j = 1, ..., n_i

The null hypothesis is equivalent to saying that the τ_i are all 0, and the alternative says that at least one τ_i ≠ 0, i.e. H_o: τ_i = 0 for all i vs. H_a: τ_i ≠ 0 for some i. We must decide on the basis of the data whether we have evidence against the null. We display this graphically as follows:

Estimates of the Model Parameters:
μ̂ = X̄ ~ the grand mean
τ̂_i = X̄_i − X̄ ~ the estimated ith treatment effect
X̂_ij = μ̂ + τ̂_i = X̄_i ~ these are called the fitted values
ε̂_ij = X_ij − X̂_ij = X_ij − X̄_i ~ these are called the residuals

Note: X_ij = μ̂ + τ̂_i + ε̂_ij = X̄ + (X̄_i − X̄) + (X_ij − X̄_i)
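Returning to the gastropod example above, the Kruskal-Wallis statistic and the pairwise z comparisons can be reproduced numerically. The group sizes (7, 9, 9) are inferred from the average ranks quoted in the text (84/7 = 12.0, 79/9 = 8.78, 162/9 = 18.0), and the ordering Control, Lower Shore, Upper Shore is an assumption:

```python
# Numeric check of the Kruskal-Wallis H statistic and pairwise z
# comparisons from the gastropod example. Group order (control, lower
# shore, upper shore) and sizes are inferred from the text, not given.
import math

R = [84, 79, 162]          # rank sums
n = [7, 9, 9]              # group sizes implied by the quoted average ranks
N = sum(n)

# H as defined in the text: based on each group's average rank vs (N+1)/2.
H = 12 / (N * (N + 1)) * sum(
    m * (r / m - (N + 1) / 2) ** 2 for r, m in zip(R, n))

def z_pair(i, j):
    """z statistic comparing the average ranks of groups i and j."""
    num = abs(R[i] / n[i] - R[j] / n[j])
    den = math.sqrt(N * (N + 1) / 12 * (1 / n[i] + 1 / n[j]))
    return num / den

def upper_p(z):
    """P(Z > z) via the stdlib complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(H, 2))
print(round(z_pair(0, 2), 2), round(z_pair(1, 2), 2))
```

This reproduces H ≈ 7.25 and the two z statistics (about 1.62 and 2.66); only the upper-vs-lower comparison falls below the .00833 cutoff.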
An Alternative Derivation of the Test Statistic Based on the Model
Starting with

X_ij = μ̂ + τ̂_i + ε̂_ij = X̄ + (X̄_i − X̄) + (X_ij − X̄_i)

we obtain

(X_ij − X̄) = (X̄_i − X̄) + (X_ij − X̄_i)

After squaring both sides we have,

(X_ij − X̄)² = (X̄_i − X̄)² + (X_ij − X̄_i)² + 2(X̄_i − X̄)(X_ij − X̄_i)

Finally, after summing over all of the observations and simplifying (the cross-product term sums to zero), we obtain

Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄)² = Σ_{i=1}^{k} n_i (X̄_i − X̄)² + Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄_i)²

These measures of variation are called "Sums of Squares" and are denoted SS. The above expression can be written

SS_Total = SS_Treat + SS_Error

The degrees of freedom associated with each of these SS's have the same relationship and are given below:

df total = df treatment + df error
(N − 1) = (k − 1) + (N − k)

The mean squares (i.e. variance estimates) discussed above are found by taking the sum of squares and dividing by their associated degrees of freedom, i.e.

MS_Treat = SS_Treat/(k − 1) = Σ_{i=1}^{k} n_i τ̂_i²/(k − 1), which is an estimate of the common pop. variance (σ²) only if Ho is true

and

MS_Error = SS_Error/(N − k) = σ̂², which is an estimate of the common population variance (σ²) regardless of Ho.

Thus when testing
H_o: τ_i = 0 for all i
H_a: τ_i ≠ 0 for some i
we reject Ho when MS_Treat >> MS_Error, i.e. when the estimated treatment effects are large!
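The sum-of-squares identity above can be verified numerically on any data set; here is a quick stdlib-only check using made-up groups with unequal sizes:

```python
# Numeric check of the identity SS_Total = SS_Treat + SS_Error on
# made-up data (unequal group sizes allowed).
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0, 16.0], [9.0, 10.0]]
all_y = [y for g in groups for y in g]
grand = sum(all_y) / len(all_y)
means = [sum(g) / len(g) for g in groups]

ss_total = sum((y - grand) ** 2 for y in all_y)
ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

print(round(ss_total, 6), round(ss_treat + ss_error, 6))
```

The two printed numbers agree (up to floating-point noise), confirming that the cross-product term in the derivation really does vanish.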