15.4 ONE-WAY ANALYSIS OF VARIANCE

One-way analysis of variance refers to the case in which we are testing the equality of the theoretical means of multiple populations distinguished by differing levels or categories of one variable (or factor), such as various teaching methods, for example. Every ANOVA example we have considered so far has been one-way. For example, in the vehicles data, the populations were distinguished by a single variable, namely the type of vehicle. This kind of nonnumerical variable is called a nominal variable. Nominal variables occur often in certain areas of study, such as sociology, in which people are categorized in many ways to better understand societal functioning. In the cancer patient data of Section 15.2, the populations were distinguished by type of cancer. We will later consider two-way ANOVA, where populations will be distinguished by two variables, such as gender and highest educational degree attained.

When legitimate to do so, we would like to test a null hypothesis using a statistic with a specific probability distribution that we can look up in a table, rather than use the bootstrap approach. Recall that when we tested hypotheses previously, we sometimes used the normal distribution to test for significance. When our assumptions about the real-world setting of our hypothesis-testing problem allow it, such a distribution-based procedure is simpler to use than the bootstrap procedure and will often produce more reliable and powerful results. But if we do not believe we can justify the required assumptions, we have to fall back on bootstrapping, which indeed usually works well. Thus the bootstrap approach is an extremely valuable tool for the statistician to have in his or her toolbox of statistical procedures.
We will develop the F distribution to provide the probability distribution of the ratio of the between-samples mean square over the within-samples mean square under the assumption that the null hypothesis of equal population means is true. For use of the F distribution to be justified, however, some assumptions must hold (there is always some tradeoff for simpler methods; for example, they are less widely applicable):

1. Each observation must be independent of the others. Practically speaking, this means that the data should be collected in such a way that each observation is not influenced by any of the previously collected observations. This assumption of independence between observations is needed for the bootstrapping approach as well, because its random sampling from the sample is justified only if the observations of the original sample were independent of each other. Indeed, almost all statistical approaches taught in a first course in statistics presume independence of observations. Why good statistical experimental design and statistical survey sampling procedures make independence of the observations in the sample a reasonable assumption was discussed thoroughly in Chapter 10.

2. When the sample sizes are small for the populations being sampled from, the user must have reasons to assume, for each sampled population, that the observations come from a roughly normal distribution (less than 20 individuals per sample is our rule-of-thumb definition of a small sample size). However, if the sample sizes are large (greater than 20), there is no restriction on the population shapes, because the sample mean for each such population is approximately normally distributed by the central limit theorem regardless of the shape of the population being sampled from. Moreover, this approximate normality of sample means suffices to justify the F-distribution approach even when the populations do not have a normal distribution shape.

3. The within-sample variability should be about the same across all samples. Assumption 2 above refers to the shape required of the distribution of the population that a sample came from when sample sizes are small. By contrast, assumption 3 specifies that these population shapes should have approximately the same spread, which we assess using the various sample standard deviations for convenience. This restrictive assumption is not always true and is important to assess, the usual method being to compare the standard deviations of the samples.

The third assumption can be violated in many settings. Consider, for example, the cancer data from Section 15.2. The data show that the variability of survival time for those with breast cancer is clearly much larger than the variability for the other two types of cancer (compare the three standard deviations as an informative exercise). So, although using the bootstrap approach was justified and worked well, the F-distribution approach is not appropriate, even though assumptions 1 and 2 may be argued to be satisfied. There is no clear line as to when this last assumption is violated. However, a rule of thumb sometimes suggested is to take the ratio of the largest to the smallest sample standard deviation and require this ratio to be less than two. We will adopt this rule of thumb.

If the above three assumptions hold and the null hypothesis of equal population means holds, then the ratio of the between-samples mean square divided by the within-samples mean square has the F distribution of Table H.

The F Distribution

We now discuss the F distribution explicitly. The mean squares are computed from the samples, and thus are random; hence they will vary from one set of samples to the next, even when the null hypothesis is true.
Thus, even when the populations have the same theoretical mean, when we randomly sample from these populations we do not expect the resulting ratio of mean squares to be exactly one, even though a theoretical analysis shows that the expected value of the numerator mean square divided by the expected value of the denominator mean square is 1. The point is that the ratio will fluctuate around 1 when the null hypothesis of equal population means is true; variation in the data arising from random sampling will almost always make the ratio differ somewhat from 1 even then. Assuming the null hypothesis is true, the F distribution tells us how large the ratio must be to fall outside the range of its typical random fluctuation, so that for ratio values sufficiently distant from 1 we conclude that the sampled populations do not all have the same theoretical mean. It is perhaps evident that the numerator (the between-samples mean square) will tend to be larger than the denominator (the within-samples mean square) when the theoretical population means differ. The reason is that a difference between population means will tend to increase the between-samples mean square while not influencing the within-samples mean square. It is because of this that we consider large values of the ratio as persuasive evidence that the null hypothesis of equal population means is false.

The F distribution has two parameters that characterize it completely, just as the normal distribution is characterized by its population mean and variance. These are the degrees of freedom associated with the numerator mean square and the degrees of freedom associated with the denominator mean square. In Figure 15.4 a typical F distribution, with 4 numerator degrees of freedom and 10 denominator degrees of freedom, is shown.
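The fluctuation of the mean-square ratio around 1 under the null hypothesis is easy to see in a small simulation. The sketch below is a hypothetical setup, not one of the chapter's data sets: it repeatedly draws three samples of size 17 from a single normal population (so the null hypothesis of equal means holds by construction) and computes the between/within mean-square ratio each time.

```python
import random

random.seed(1)

def ms_ratio(k=3, n=17, mu=100.0, sigma=15.0):
    """Between/within mean-square ratio for k samples of size n drawn
    from one normal population (so the null hypothesis is true)."""
    samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(k)]
    means = [sum(s) / n for s in samples]
    grand = sum(means) / k  # equal sample sizes, so this is the overall mean
    between_ms = sum(n * (m - grand) ** 2 for m in means) / (k - 1)
    within_ms = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s) / (k * n - k)
    return between_ms / within_ms

ratios = [ms_ratio() for _ in range(2000)]
print(sum(ratios) / len(ratios))  # fluctuates around 1 across repetitions
```

Individual ratios vary widely even though the population means are identical; only unusually large values count as evidence against the null hypothesis.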
Increasing the denominator degrees of freedom tends to conserve the general shape but shrink the curve somewhat toward 0. Increasing the numerator degrees of freedom slowly moves the shape toward a more normal shape and expands the curve away from 0.

The F Distribution Tables

Appendix H in the back of the book gives the upper 5% (Table H.1) and 1% (Table H.2) points for the F distribution for various combinations of numerator and denominator degrees of freedom. That is, the upper 0.05 and 0.01 probability points are given. Part of Table H.1 is reproduced as Table 15.2 for convenience. The row across the top gives the numerator degrees of freedom, and the first column gives the denominator degrees of freedom. To use Table H.1, find the desired row and column. The corresponding number in the interior of the table is the 5% point (that is, the point for which the probability of being to its right is 0.05) in the corresponding F distribution.

[Figure 15.4: F distribution with 4 numerator degrees of freedom and 10 denominator degrees of freedom; the upper 5% point, 3.48, is marked on the horizontal axis.]

Table 15.2 Portion of Upper 5% F Table

                              Numerator degrees of freedom
Denominator df       1       2       3       4       5       6       7       8       9      10
      1         161.45  199.50  215.71  224.58  230.16  233.99  236.77  238.88  240.54  241.88
      2          18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40
      3          10.13    9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79
      4           7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96
      5           6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74
      6           5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06
      7           5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64
      8           5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35
      9           5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14
     10           4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02    2.98

For example, the 5% point for an F distribution with 4 numerator degrees of freedom and 10 denominator degrees of freedom is 3.48. Observe this value for the curve in Figure 15.4.
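Where statistical software is available, such table lookups can be reproduced directly rather than read from Appendix H. A minimal sketch, assuming SciPy is installed (scipy.stats.f is SciPy's F distribution object; the ppf method is the inverse cumulative distribution function):

```python
from scipy.stats import f

# Upper 5% point of the F distribution with 4 numerator and
# 10 denominator degrees of freedom: the value with probability
# 0.95 to its left, hence 0.05 to its right.
crit_05 = f.ppf(0.95, dfn=4, dfd=10)
print(round(crit_05, 2))  # matches the tabled value 3.48
```

The same call with 0.99 in place of 0.95 reproduces the 1% points of Table H.2.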
This means that if we observed a ratio of mean squares greater than 3.48, with 4 and 10 degrees of freedom in the numerator and denominator, respectively, we would have statistically significant evidence to conclude that the population means are not equal. In other words, under the null hypothesis of equal population means, a ratio greater than 3.48 is sufficiently unlikely that observing one should lead us to conclude the theoretical means are not equal.

One-Way ANOVA Using the F Distribution

We are now prepared to use the F distribution to test for equality of means in a one-way analysis of variance. Let's consider the Key Problem presented at the beginning of this chapter. Before we perform an ANOVA, however, certain aspects of the hot dog data collection are instructive to think about, in view of our emphasis in Chapter 10 on the use of randomization to obtain data useful for statistical analyses.

First, let's suppose that each of the 51 measurements in the Key Problem is the result of some sort of chemical analysis of one randomly chosen hot dog of the particular type (e.g., Oscar Mayer "All Beef") being analyzed. Unlike an elementary physics experiment, such as observing the distance a constant-speed object moves in 10 seconds, where random aspects of the collected data cause little error, here there are many reasons why the 51 observations have sizable random variability. First, measurement error, discussed in Chapter 9, is likely a major contributor to the variability in the data, because measuring the number of calories in a hot dog is a complex process. Second, there will be random variation in the number of calories from hot dog to hot dog in the same package, and third, variation will occur from package to package of the same brand (for example, one package may have been made from a batch of leaner turkey meat than another).
Fourth, there will be variation from brand to brand within the same meat type population. So there are four major sources of variation in the observed number of calories for a particular meat type of hot dog. We model each set of 17 observations of one meat type as a simple random sample from its meat type population, each such population likely having a sizable variance. This assumption of simple random sampling presumes that each hot dog of the population of, for example, all beef hot dogs has an equal chance of being chosen. However, as was also the case for the multistage stratified National Opinion Research Center General Social Survey data of Section 10.3, the random sampling was stratified and done in stages. Stage 1 is the choice of 17 all beef hot dog brands from the stratum of all beef hot dog brands of interest (which may be just these 17 or many more). Stage 2 is the random choice of one package of hot dogs from each of the 17 brands. Stage 3 is the random choice of one hot dog from each of the sampled packages. This is one way the random sampling of the 51 hot dogs could have been carried out (other stratified random sampling plans being possible). For our purposes, we intentionally ignore this sampling complexity and act as if the sampling was simple random sampling of hot dogs from each of the three meat type populations. Thus, for example, each manufactured all beef hot dog is presumed to be equally likely to have been among the 17 actually sampled all beef hot dogs.

Example 15.7

We want to perform the one-way ANOVA procedure on the hot dog data presented in the Key Problem. To start, we will attempt to justify that the three assumptions hold in this case, for if so we can take an F-distribution approach.
First, it is reasonable to assume that the number of calories in one sampled hot dog is independent of the number of calories in the other sampled hot dogs, because the hot dogs have been randomly sampled from the different populations. Second, for each of the three populations, the observations should have been sampled from a population that is at least roughly normally distributed. To informally investigate this, three histograms can be created from the three samples. We omit the details, but the histograms do not provide evidence against the assumption that each population is roughly normally distributed, especially since the F test of a one-way ANOVA is somewhat robust against departures from population normality that are not too extreme. By robust, we mean that moderate departures from population normality do not make the procedure behave very differently from what the F table tells us. Finally, the last assumption is that the population standard deviations corresponding to the three samples are about the same. This assumption is certainly reasonable, since the sample standard deviations of the beef, poultry, and meat combination samples are 22.24, 21.87, and 24.48, respectively, producing a ratio of largest over smallest sample standard deviation (24.48/21.87 = 1.12) much less than the criterion value of two. We therefore proceed with testing the hypothesis

H0: All three types of hot dogs have the same population average

using the standard F-distribution-based ANOVA procedure. First, we want to calculate the between-samples sum of squares. The sample averages for the beef, poultry, and meat combination samples are 160.1, 118.8, and 158.7, respectively. The overall average of the 51 hot dogs is 145.87, obtained easily by averaging the three sample averages; this shortcut is permitted here because each sample is the same size (17).
So,

Between-samples sum of squares = 17(160.1 - 145.87)² + 17(158.7 - 145.87)² + 17(118.8 - 145.87)² = 18,698.07

Next, the within-samples sum of squares is calculated as follows:

Within-samples sum of squares
  = (186 - 160.1)² + (181 - 160.1)² + ··· + (131 - 160.1)²
  + (129 - 118.8)² + (132 - 118.8)² + ··· + (144 - 118.8)²
  + (173 - 158.7)² + (191 - 158.7)² + ··· + (138 - 158.7)²
  = 26,735.53

The degrees of freedom associated with the between-samples sum of squares is 3 - 1 = 2, since there are three samples. The degrees of freedom associated with the within-samples sum of squares is 51 - 3 = 48, since there are 51 total observations and 3 populations being sampled from. The degrees of freedom associated with the total sum of squares is 51 - 1 = 50, since there are 51 observations. The following table shows all the results, including the corresponding mean squares and the F statistic, the ratio between the mean squares.

Source            Sum of squares   Degrees of freedom   Mean square       F
Between samples        18,698.07                    2       9349.04   16.78
Within samples         26,735.53                   48        556.99
Total                  45,433.60                   50

This table is called the analysis of variance table or ANOVA table. It is the form commonly provided by statistics software packages, such as SPSS, SAS, or Minitab, to summarize the results of the ANOVA procedure. Note that, as we have stated previously, the total sum of squares equals the between-samples sum of squares plus the within-samples sum of squares. Likewise, the total degrees of freedom equals the between-samples degrees of freedom plus the within-samples degrees of freedom. The mean square associated with the total sum of squares is not included in the table, since it is of no significance. Likewise, the only entry in the F column that has any meaning is the ratio of the between-samples over the within-samples mean square, which is placed in the first row. That is because it is this ratio that obeys the F distribution of Table H when the null hypothesis of equal population means holds.
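The arithmetic above can be checked with a short sketch. The sample means and sizes come from the text; the within-samples sum of squares requires the raw data, so it is entered as given rather than recomputed:

```python
# Hot dog one-way ANOVA from the summary values in the text.
means = [160.1, 118.8, 158.7]  # beef, poultry, meat combination averages
n, k = 17, 3                   # 17 hot dogs per sample, 3 samples
N = n * k                      # 51 observations in all

grand_mean = sum(means) / k    # shortcut valid because sample sizes are equal
between_ss = sum(n * (m - grand_mean) ** 2 for m in means)
within_ss = 26735.53           # as computed from the raw data in the text

between_ms = between_ss / (k - 1)  # 2 degrees of freedom
within_ms = within_ss / (N - k)    # 48 degrees of freedom
f_stat = between_ms / within_ms

print(round(between_ss, 2))  # 18698.07, matching the ANOVA table
print(round(f_stat, 2))      # 16.78
```

The same pattern extends to any number of equal-sized samples; with unequal sizes, the grand mean must be the average of all observations rather than the average of the sample means.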
Finally we have reached the point where we want to compare the computed F statistic, 16.78, to the F distribution having 2 numerator and 48 denominator degrees of freedom. Looking at Table H.1, we see that the point with 5% of the area above it for that curve is 3.19. Similarly, Table H.2 shows that the 1% point for this F distribution is 5.10. Thus, we reject the null hypothesis, since the observed ratio of 16.78 is so much larger than what we would expect the ratio to be if the null hypothesis were in fact true. Indeed, the probability of an F ratio higher than the observed 16.78 is extremely close to 0, and hence the statistical evidence is extremely strong that the null hypothesis of equal population means for the three meat types is false.

Designing Experiments

In many cases the between-samples sum of squares is instead called the between-treatments sum of squares. The term treatments comes from the fact that the ANOVA method is often used in cases where we have designed and performed a statistical experiment involving different treatments using randomization, as discussed in Chapter 10. For example, a group of subjects may be gathered for a medical study. Some are randomly given treatment 1, some are given treatment 2, and the rest (the control group) receive a no-treatment placebo (treatment 3). Then all of the patients are assessed to see how they responded to the treatments, and the ANOVA procedure is used to test whether the different treatments (three in number, including the control) produced significantly different results. If we are designing an experiment in which the results will be analyzed using ANOVA, one thing that should be done at the design stage is to randomly assign the treatments to the different units. For example, in the medical study, we should take the randomly selected group of patients and randomly choose one third of them to receive treatment 1, and so on.
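The random assignment just described is simple to carry out in software. A minimal sketch, using a hypothetical list of 12 subject IDs in place of real patients:

```python
import random

# Hypothetical subject IDs; in practice these would identify the patients.
subjects = list(range(1, 13))

random.shuffle(subjects)          # random order removes any systematic pattern
third = len(subjects) // 3
groups = [subjects[i * third:(i + 1) * third] for i in range(3)]
# groups[0] receives treatment 1, groups[1] treatment 2,
# and groups[2] the no-treatment placebo (the control group)
```

Shuffling once and slicing guarantees equal group sizes, which a per-subject coin flip would not.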
As discussed in Chapter 10, randomization is essential because there are many differences between units that affect the medical outcome being evaluated that we may never be aware of, and we could never adequately account for them in our analysis. The only major difference we want between units (after averaging over all units receiving the same treatment) in different treatment groups is the possible influence of having received a different treatment. By randomly assigning treatments to people, we make negligible the possibility that any variable not of interest, known or unknown, is at widely differing levels in the multiple treatment groups.

SECTION 15.4 EXERCISES

1. For each of the following F distributions, what is the point above which lies 5% of the area?
   a. Numerator df = 5, denominator df = 10
   b. Numerator df = 3, denominator df = 12
   c. Numerator df = 10, denominator df = 4

2. a. Complete the ANOVA table labeled Table E1.
   b. Would you reject the null hypothesis that there are no significant differences among the population means in this case?

Table E1

Source            Sum of squares   Degrees of freedom   Mean square   F
Between samples          4823.45                    ?             ?   ?
Within samples         26,591.01                   22             ?
Total                          ?                   26

3. A consumer group wanted to test for differences in the lives of three different brands of batteries. Six batteries of each brand were obtained and were then used in the same electronic device. The length of life in hours was measured. The results were as follows:

   Brand A: 115.76, 107.92, 103.73, 114.14, 113.51, 110.87
   Brand B: 121.82, 127.45, 122.24, 125.74, 124.02, 113.39
   Brand C: 106.99, 107.78, 103.78, 112.32, 106.46, 120.77

   a. State the null hypothesis of the researcher.
   b. Does the assumption of equal variances among samples seem to be valid in this case?
   c. Form the ANOVA table for this case.
   d. Should the null hypothesis be rejected?

4. Four methods of weight loss were being compared. Subjects were chosen and randomly assigned to each of the four groups.
After two weeks of treatment, the weight loss was measured. The results in pounds lost were as follows:

   Method A: 3.67, 2.52, 4.88, 2.73, 4.63
   Method B: 2.65, 3.51, 5.81, 2.24, 4.47
   Method C: 2.59, 1.91, 0.96, 2.27, 2.80
   Method D: 4.45, 3.34, 2.52, 3.72, 3.68

   a. State the null hypothesis of the researcher.
   b. Does the assumption of equal variances among samples seem to be valid in this case?
   c. Form the ANOVA table for this case.
   d. Should the null hypothesis be rejected?

5. A manufacturer was testing four different materials for the construction of yarn, comparing the strength of the yarn made from each. A sample of seven pieces of yarn for each material was obtained, and the strength was measured by seeing how much weight it could hold before breaking. The results were as follows:

   Material A: 9.84, 9.39, 9.70, 9.54, 9.78, 8.82, 10.23
   Material B: 9.94, 9.56, 9.84, 8.98, 9.23, 8.83, 9.58
   Material C: 8.10, 8.62, 8.31, 8.66, 8.87, 8.32, 8.95
   Material D: 8.38, 7.79, 8.37, 8.23, 8.38, 9.11, 8.09

   a. State the null hypothesis of the researcher.
   b. Does the assumption of equal variances among samples seem to be valid in this case?
   c. Form the ANOVA table for this case.
   d. Should the null hypothesis be rejected?

6. Explain why it is important that subjects be randomly assigned to different treatment groups when an ANOVA experiment is being designed.

7. a. Create the ANOVA table for the data in Exercise 5 of Section 15.3.
   b. Find the value from the F table for level of significance = .05.
   c. What is the null hypothesis tested by the F test?
   d. Do you accept or reject the null hypothesis?
   e. Compare your answer in (d) to that in (d) of Exercise 4 in Section 15.2.