17 Asking and Answering Questions about More Than Two Means

Preview
Chapter Learning Objectives
17.1 The Analysis of Variance: Single-Factor ANOVA and the F Test
17.2 Multiple Comparisons
Appendix: ANOVA Computations
Are You Ready to Move On? Chapter Review Exercises
Technology Notes
Appendix Tables
Table 7: Values That Capture Specified Upper-Tail F Curve Areas
Table 8: Critical Values of q for the Studentized Range Distribution
Answers to Selected Exercises

James Woodson/Digital Vision/Getty Images
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Preview

In Chapters 13 and 14, you learned methods for testing $H_0\colon \mu_1 - \mu_2 = 0$ (or equivalently, $\mu_1 = \mu_2$), where $\mu_1$ and $\mu_2$ are the means of two different populations or the mean responses when two different treatments are applied. However, many investigations involve comparing more than two population or treatment means, as illustrated in the following example.

Chapter Learning Objectives

Conceptual Understanding
After completing this chapter, you should be able to
C1 Understand how a research question about differences between three or more population or treatment means is translated into hypotheses.

Mastering the Mechanics
After completing this chapter, you should be able to
M1 Translate a research question or claim about differences between three or more population or treatment means into null and alternative hypotheses.
M2 Know the conditions for appropriate use of the ANOVA F test.
M3 Carry out an ANOVA F test.
M4 Use a multiple comparison procedure to identify differences in population or treatment means.

Putting It into Practice
After completing this chapter, you should be able to
P1 Recognize when a situation calls for testing hypotheses about differences between three or more population or treatment means.
P2 Carry out an ANOVA F test and interpret the conclusion in context.

Preview Example Risky Soccer

In a study to see if the high incidence of head injuries among soccer players might be related to memory recall, researchers collected data from three samples of college students (“No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players,” The American Journal of Sports Medicine [2002]: 157–162). One sample consisted of soccer athletes, one sample consisted of athletes whose sport was not soccer, and one sample was a comparison group consisting of students who did not participate in sports. The following information on scores from the Hopkins Verbal Learning Test (which measures memory recall) was given in the paper.

Group                 Sample Size   Sample Mean Score   Sample Standard Deviation
Soccer Athletes       86            29.90               3.73
Nonsoccer Athletes    95            30.94               5.14
Comparison Group      53            29.32               3.78

Notice that the three sample means are different. But even when the population means are equal, you would not expect the three sample means to be exactly equal. Are the differences in sample means consistent with what is expected simply due to chance differences from one sample to another when the population means are equal, or are the differences large enough that you should conclude that the three population means are not all equal? This is the type of problem considered in this chapter.

Section 17.1 The Analysis of Variance: Single-Factor ANOVA and the F Test

When more than two populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor under investigation.
For example, an experiment might be carried out to compare three different methods for teaching reading (three different treatments), in which case the factor of interest would be teaching method, a qualitative factor. If the growth of fish raised in waters having different salinity levels (0%, 10%, 20%, and 30%) is of interest, the factor salinity level is quantitative.

A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means $\mu_1, \mu_2, \ldots, \mu_k$. The objective is to test

$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$

against

$H_a$: At least two of the $\mu$'s are different

When comparing populations, the analysis is based on independently selected random samples, one from each population. When comparing treatment means, the data are from an experiment, and the analysis assumes random assignment of the experimental units (subjects or objects) to treatments. If, in addition, the experimental units are chosen at random from a population of interest, it is also possible to generalize the results of the analysis to this population.

Whether the null hypothesis in a single-factor ANOVA should be rejected depends on how much the samples from the different populations or treatments differ from one another. Figure 17.1 displays two sets of observations that might result when random samples are selected from each of three populations. Each dotplot displays five observations from the first population, four observations from the second population, and six observations from the third population. For both displays, the three sample means are located by arrows. The means of the two samples from Population 1 are equal, as are the means of the two samples from Population 2 and of the two samples from Population 3.
Figure 17.1 Two possible ANOVA data sets when three populations are compared, panels (a) and (b), with the means of Samples 1, 2, and 3 marked by arrows in each panel: green circle = observation from Population 1; orange circle = observation from Population 2; blue circle = observation from Population 3

After looking at the data in Figure 17.1(a), you would probably think that the claim $\mu_1 = \mu_2 = \mu_3$ appears to be false. Not only are the three sample means different, but also the three samples are clearly separated. In other words, differences between the three sample means are quite large relative to the variability within each sample. The situation pictured in Figure 17.1(b) is much less clear-cut. The sample means are as different as they were in the first data set, but now there is considerable overlap among the three samples. The separation between sample means might be due to the substantial variability in the populations (and therefore the samples) rather than to differences between $\mu_1$, $\mu_2$, and $\mu_3$.

The phrase analysis of variance comes from the idea of analyzing variability in the data to see how much can be attributed to differences in the $\mu$'s and how much is due to variability in the individual populations. In Figure 17.1(a), the within-sample variability is small relative to the between-sample variability, whereas in Figure 17.1(b), a great deal more of the total variability is due to variation within each sample. If differences between the sample means can be explained entirely by within-sample variability, there is no compelling reason to reject $H_0\colon \mu_1 = \mu_2 = \mu_3$.

Notations and Assumptions

Notation in single-factor ANOVA is a natural extension of the notation used in earlier chapters for comparing two population or treatment means.

ANOVA Notation

k = number of populations or treatments being compared

Population or treatment:            1, 2, ..., k
Population or treatment mean:       $\mu_1, \mu_2, \ldots, \mu_k$
Population or treatment variance:   $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$
Sample size:                        $n_1, n_2, \ldots, n_k$
Sample mean:                        $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
Sample variance:                    $s_1^2, s_2^2, \ldots, s_k^2$

$N = n_1 + n_2 + \cdots + n_k$ (the total number of observations in the data set)

$T$ = grand total = sum of all N observations in the data set = $n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k$

$\bar{\bar{x}}$ = grand mean = $T/N$

A decision between $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$ and $H_a$: At least two of the $\mu$'s are different is based on examining the $\bar{x}$ values to see whether observed differences are small enough to be explained by sampling variability alone or whether an alternative explanation for the differences is more plausible.

Example 17.1 An Indicator of Heart Attack Risk

The article “Could Mean Platelet Volume Be a Predictive Marker for Acute Myocardial Infarction?” (Medical Science Monitor [2005]: 387–392) described a study in which four groups of patients seeking treatment for chest pain were compared with respect to the mean platelet volume (MPV, measured in fL). The four groups considered were based on the clinical diagnosis: (1) noncardiac chest pain, (2) stable angina, (3) unstable angina, and (4) heart attack. The purpose of the study was to determine if the mean MPV differed for the four groups, and in particular if the mean MPV was different for the heart attack group, because then MPV could be used as an indicator of heart attack risk.

To carry out this study, patients seen for chest pain were divided into groups according to diagnosis. The researchers then selected a random sample of 35 from each of the resulting k = 4 groups. The researchers believed that this sampling process would result in samples that were representative of the four populations of interest and that could be regarded as if they were random samples from these four populations. Table 17.1 presents summary values given in the paper.

Table 17.1 Summary Values for MPV Data of Example 17.1

Group Number   Group Description       Sample Size   Sample Mean   Sample Standard Deviation
1              Noncardiac chest pain   35            10.89         0.69
2              Stable angina           35            11.25         0.74
3              Unstable angina         35            11.37         0.91
4              Heart attack            35            11.75         1.07

With $\mu_i$ denoting the true mean MPV for group i (i = 1, 2, 3, 4), consider the null hypothesis $H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4$. Figure 17.2 shows a comparative boxplot of the four samples (based on data consistent with summary values given in the paper). The mean MPV for the heart attack sample is larger than for the other three samples, and the boxplot for the heart attack sample appears to be shifted a bit higher than the boxplots for the other three samples. However, because the four boxplots show substantial overlap, it is not obvious whether $H_0$ is plausible or should be rejected. In situations like this, a formal test procedure is helpful.

Figure 17.2 Boxplots for Example 17.1 (groups: noncardiac, stable angina, unstable angina, heart attack; horizontal axis: MPV)

As with the inferential methods of previous chapters, the validity of the ANOVA test for $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$ requires that some conditions be met.

Conditions for ANOVA
1. Each of the k population or treatment response distributions is normal.
2. $\sigma_1 = \sigma_2 = \cdots = \sigma_k$ (The k normal distributions have equal standard deviations.)
3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.
4. When comparing population means, the k random samples are selected independently of one another. When comparing treatment means, experimental units are assigned at random to treatments.

In practice, the test based on these assumptions works well as long as the conditions are not too badly violated. If the sample sizes are reasonably large, normal probability plots or boxplots of the data in each sample are helpful in checking the condition of normality. Often, however, sample sizes are so small that a separate normal probability plot or boxplot for each sample is of little value in checking normality. In this case, a single combined plot can be constructed by first subtracting $\bar{x}_1$ from each observation in the first sample, $\bar{x}_2$ from each value in the second sample, and so on, and then constructing a normal probability plot or boxplot of all N deviations from their respective means. Figure 17.3 shows such a normal probability plot for the data of Example 17.1.

Figure 17.3 A normal probability plot using the combined data of Example 17.1 (vertical axis: deviation; horizontal axis: normal score)

There is a formal procedure for testing the equality of population standard deviations. Unfortunately, it is quite sensitive to even a small violation of the normality condition. However, the equal population or treatment standard deviation condition can be considered reasonably met if the largest of the sample standard deviations is at most twice the smallest one. For example, the largest standard deviation in Example 17.1 is $s_4 = 1.07$, which is only about 1.5 times the smallest standard deviation ($s_1 = 0.69$).

The analysis of variance test procedure is based on the following measures of variation in the data.

Definition

A measure of differences among the sample means is the treatment sum of squares, denoted by SSTr and given by

$$\text{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

A measure of variation within the k samples, called error sum of squares and denoted by SSE, is

$$\text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

Each sum of squares has an associated df:

treatment df = $k - 1$
error df = $N - k$

A mean square is a sum of squares divided by its df. In particular,

mean square for treatments = $\text{MSTr} = \dfrac{\text{SSTr}}{k - 1}$

mean square for error = $\text{MSE} = \dfrac{\text{SSE}}{N - k}$

The number of error degrees of freedom comes from adding the number of degrees of freedom associated with each of the sample variances:

$$(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = n_1 + n_2 + \cdots + n_k - 1 - 1 - \cdots - 1 = N - k$$

Example 17.2 Heart Attack Calculations

Let’s return to the mean platelet volume (MPV) data of Example 17.1. The grand mean $\bar{\bar{x}}$ was calculated to be 11.315. Notice that because the sample sizes are all equal, the grand mean is just the average of the four sample means (this will not usually be the case when the sample sizes are unequal). With $\bar{x}_1 = 10.89$, $\bar{x}_2 = 11.25$, $\bar{x}_3 = 11.37$, $\bar{x}_4 = 11.75$, and $n_1 = n_2 = n_3 = n_4 = 35$,

$$\text{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
$$= 35(10.89 - 11.315)^2 + 35(11.25 - 11.315)^2 + 35(11.37 - 11.315)^2 + 35(11.75 - 11.315)^2$$
$$= 6.322 + 0.148 + 0.106 + 6.623 = 13.199$$

Because $s_1 = 0.69$, $s_2 = 0.74$, $s_3 = 0.91$, $s_4 = 1.07$,

$$\text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$
$$= (35 - 1)(0.69)^2 + (35 - 1)(0.74)^2 + (35 - 1)(0.91)^2 + (35 - 1)(1.07)^2 = 101.888$$

The numbers of degrees of freedom are

treatment df = $k - 1 = 3$
error df = $N - k = 35 + 35 + 35 + 35 - 4 = 136$

from which

$$\text{MSTr} = \frac{\text{SSTr}}{k - 1} = \frac{13.199}{3} = 4.400$$
$$\text{MSE} = \frac{\text{SSE}}{N - k} = \frac{101.888}{136} = 0.749$$

Both MSTr and MSE are quantities whose values can be calculated once sample data are available (they are statistics). Each of these statistics varies in value from data set to data set. Both statistics MSTr and MSE have sampling distributions, and these sampling distributions have mean values. The following box describes the relationship between the mean values of MSTr and MSE.

When $H_0$ is true ($\mu_1 = \mu_2 = \cdots = \mu_k$),

$\mu_{\text{MSTr}} = \mu_{\text{MSE}}$

However, when $H_0$ is false,

$\mu_{\text{MSTr}} > \mu_{\text{MSE}}$

and the greater the differences among the $\mu$'s, the larger $\mu_{\text{MSTr}}$ will be relative to $\mu_{\text{MSE}}$.

According to this result, when $H_0$ is true, you would expect the values of the two mean squares to be close. However, you would expect MSTr to be substantially greater than MSE when some $\mu$'s differ greatly from others. Thus, a calculated MSTr that is much larger than MSE is inconsistent with the null hypothesis. In Example 17.2, MSTr = 4.400 and MSE = 0.749, so MSTr is about six times as large as MSE. Can this be attributed solely to sampling variability, or is the ratio MSTr/MSE large enough to suggest that the null hypothesis is false?

Before a formal test procedure can be described, you have to learn about a new family of probability distributions called F distributions. An F distribution always arises in connection with a ratio.
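The ratio of interest here is MSTr/MSE. As a minimal sketch (not part of the original text), the Example 17.2 calculations can be reproduced from the summary values in Table 17.1. The function name `mean_squares` and the dictionary keys are hypothetical choices made for this illustration; only the Python standard library is assumed.

```python
def mean_squares(sizes, means, sds):
    """Compute SSTr, SSE, MSTr, MSE and their ratio from per-sample summaries.

    sizes, means, sds are parallel lists: one entry per sample.
    (Hypothetical helper written for this illustration.)
    """
    k = len(sizes)                      # number of populations or treatments
    n_total = sum(sizes)                # N, total number of observations
    # Grand mean = T / N, where T = n1*xbar1 + ... + nk*xbark
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Treatment sum of squares: variation of sample means about the grand mean
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Error sum of squares: pooled within-sample variation
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    mstr = sstr / (k - 1)               # treatment df = k - 1
    mse = sse / (n_total - k)           # error df = N - k
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse,
            "ratio": mstr / mse}


# Summary values from Table 17.1 (Example 17.1)
result = mean_squares(
    sizes=[35, 35, 35, 35],
    means=[10.89, 11.25, 11.37, 11.75],
    sds=[0.69, 0.74, 0.91, 1.07],
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Up to rounding, the printed values agree with SSTr = 13.199, SSE = 101.888, MSTr = 4.400, and MSE = 0.749, and the ratio is a little under 6.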
A particular F distribution is obtained by specifying both numerator degrees of freedom ($df_1$) and denominator degrees of freedom ($df_2$). Figure 17.4 shows an F curve for a particular choice of $df_1$ and $df_2$. The ANOVA test of this section is an upper-tailed test, so a P-value is the area under an appropriate F curve to the right of the calculated value of the test statistic.

Figure 17.4 An F curve and P-value for an upper-tailed test (F curve for particular $df_1$, $df_2$; shaded area to the right of the calculated F = P-value for upper-tailed F test)

Constructing tables of these upper-tail areas is cumbersome, because there are two degrees of freedom rather than just one (as in the case of t distributions). For selected ($df_1$, $df_2$) pairs, the F table (Appendix Table 7) gives only the four numbers that capture tail areas 0.10, 0.05, 0.01, and 0.001, respectively. Here are the four numbers for $df_1 = 4$, $df_2 = 10$ along with the statements that can be made about the P-value:

Tail area   0.10   0.05   0.01   0.001
Value       2.61   3.48   5.99   11.28

The four values divide the F axis into five regions, labeled a through e from left to right:

a. $F < 2.61$ → tail area = P-value > 0.10
b. $2.61 < F < 3.48$ → 0.05 < P-value < 0.10
c. $3.48 < F < 5.99$ → 0.01 < P-value < 0.05
d. $5.99 < F < 11.28$ → 0.001 < P-value < 0.01
e. $F > 11.28$ → P-value < 0.001

For example, if F = 7.12, then 0.001 < P-value < 0.01. If a test with $\alpha = 0.05$ is used, $H_0$ should be rejected, because P-value $\le \alpha$. The most frequently used statistical computer packages can provide exact P-values for F tests.

Single-Factor ANOVA F Test for Equality of Three or More Population Means

Appropriate when the following conditions are met:
1. Each of the k population or treatment response distributions is normal.
2. $\sigma_1 = \sigma_2 = \cdots = \sigma_k$ (The k normal distributions have equal standard deviations.)
3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.
4. When comparing population means, the k random samples are selected independently of one another. When comparing treatment means, experimental units are assigned at random to treatments.

When these conditions are met, the following test statistic can be used:

$$F = \frac{\text{MSTr}}{\text{MSE}}$$

When the conditions above are met and the null hypothesis is true, the F statistic has an approximate F distribution with

$df_1 = k - 1$ and $df_2 = N - k$

Form of the null hypothesis: $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$
Form of the alternative hypothesis: $H_a$: At least two of the $\mu$'s are different
The P-value is: Area under the F curve to the right of the calculated value of the test statistic

Example 17.3 Heart Attacks Revisited

Recall that the two mean squares for the MPV data given in Example 17.1 were calculated in Example 17.2 to be

MSTr = 4.400    MSE = 0.749

You can now use the five-step process for hypothesis testing problems (HMC³) to test the hypotheses of interest.

H Hypotheses
The question of interest is whether there are differences in mean MPV for the four different diagnosis groups.
Population characteristics of interest:
$\mu_1$ = mean MPV for the noncardiac chest pain group
$\mu_2$ = mean MPV for the stable angina group
$\mu_3$ = mean MPV for the unstable angina group
$\mu_4$ = mean MPV for the heart attack group
Hypotheses:
Null hypothesis: $H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4$
Alternative hypothesis: $H_a$: At least two of the $\mu$'s are different

M Method
Because the answers to the four key questions are hypothesis testing, sample data, one numerical variable, and four independently selected samples, consider an ANOVA F test.
Potential method: ANOVA F test. The test statistic for this test is
$$F = \frac{\text{MSTr}}{\text{MSE}}$$
When the null hypothesis is true, this statistic has approximately an F distribution with $df_1 = k - 1$ and $df_2 = N - k$.
Once you have decided to proceed with the test, you need to select a significance level for the test. In this example, you might choose a value for $\alpha$ of 0.05.
Significance level: $\alpha = 0.05$

C Check
The samples were independently selected. The largest sample standard deviation (from Table 17.1, $s_4 = 1.07$) is not more than twice as large as the smallest sample standard deviation ($s_1 = 0.69$), so the equal population standard deviations condition is reasonably met. A normal probability plot (see Figure 17.3) indicates that the normality condition is also reasonably met.

C Calculate
MSTr = 4.400    MSE = 0.749 (from Example 17.2)
Test statistic:
$$F = \frac{\text{MSTr}}{\text{MSE}} = \frac{4.400}{0.749} = 5.87$$
Degrees of freedom:
$df_1 = k - 1 = 4 - 1 = 3$
$df_2 = N - k = 140 - 4 = 136$
Associated P-value:
P-value = area under F curve to the right of 5.87
Using $df_1 = 3$ and $df_2 = 120$ (the closest value to 136 that appears in the table), Appendix Table 7 shows that the area to the right of 5.78 is 0.001. Since 5.87 > 5.78, it follows that the P-value is less than 0.001.

C Communicate results
Because the P-value is less than the selected significance level, you reject the null hypothesis.
Decision: Reject $H_0$.
The final conclusion for the test should be stated in context and answer the question posed.
Conclusion: You can conclude that the mean MPV is not the same for all four patient groups. Techniques for determining which means differ are introduced in Section 17.2.

Example 17.4 Hormones and Body Fat

The article “Growth Hormone and Sex Steroid Administration in Healthy Aged Women and Men” (Journal of the American Medical Association [2002]: 2282–2292) described an experiment to investigate the effect of four treatments on various body characteristics. In this double-blind experiment, each of 57 female subjects age 65 or older was assigned at random to one of the following four treatments: (1) placebo “growth hormone” and placebo “steroid” (denoted by P + P); (2) placebo “growth hormone” and the steroid estradiol (denoted by P + S); (3) growth hormone and placebo “steroid” (denoted by G + P); and (4) growth hormone and the steroid estradiol (denoted by G + S).

The following table lists data on change in body fat mass over the 26-week period following the treatments that are consistent with summary quantities given in the article.

Change in Body Fat Mass (kg)

Treatment   Observations
P + P       0.1  0.6  2.2  0.7  -2.0  0.7  0.0  -2.6  -1.4  1.5  2.8  0.3  -1.0  -1.0
P + S       -0.1  0.2  0.0  -0.4  -0.9  -1.1  1.2  0.1  0.7  -2.0  -0.9  -3.0  1.0  1.2
G + P       -1.6  -0.4  0.4  -2.0  -3.4  -2.8  -2.2  -1.8  -3.3  -2.1  -3.6  -0.4  -3.1
G + S       -3.1  -3.2  -2.0  -2.0  -3.3  -0.5  -4.5  -0.7  -1.8  -2.3  -1.3  -1.0  -5.6  -2.9  -1.6  -0.2

Treatment   n    Sample mean   s       s²
P + P       14   0.064         1.545   2.387
P + S       14   -0.286        1.218   1.484
G + P       13   -2.023        1.264   1.598
G + S       16   -2.250        1.468   2.155

For this example, $N = 57$, grand total $= -65.4$, and $\bar{\bar{x}} = \dfrac{-65.4}{57} = -1.15$.
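The summary rows of the Example 17.4 data table and the grand mean can be checked with a few lines of code. This is only a sketch: the names `data` and `summary` are hypothetical, and the observations are entered as read from the table, with minus signs restored and the twelfth P + S value taken as -3.0, an assumption made because that is the reading consistent with the reported mean of -0.286 and grand total of -65.4.

```python
from statistics import mean, stdev

# Change in body fat mass (kg), as read from the Example 17.4 table.
# Assumption: the 12th P+S value is -3.0 (matches the reported summaries).
data = {
    "P+P": [0.1, 0.6, 2.2, 0.7, -2.0, 0.7, 0.0, -2.6, -1.4, 1.5, 2.8, 0.3,
            -1.0, -1.0],
    "P+S": [-0.1, 0.2, 0.0, -0.4, -0.9, -1.1, 1.2, 0.1, 0.7, -2.0, -0.9,
            -3.0, 1.0, 1.2],
    "G+P": [-1.6, -0.4, 0.4, -2.0, -3.4, -2.8, -2.2, -1.8, -3.3, -2.1, -3.6,
            -0.4, -3.1],
    "G+S": [-3.1, -3.2, -2.0, -2.0, -3.3, -0.5, -4.5, -0.7, -1.8, -2.3, -1.3,
            -1.0, -5.6, -2.9, -1.6, -0.2],
}

# Per-treatment sample size, sample mean, and sample standard deviation
summary = {
    name: {"n": len(obs), "mean": mean(obs), "s": stdev(obs)}
    for name, obs in data.items()
}

n_total = sum(len(obs) for obs in data.values())      # N
grand_total = sum(sum(obs) for obs in data.values())  # T
grand_mean = grand_total / n_total                    # T / N

for name, row in summary.items():
    print(f"{name}: n={row['n']}, mean={row['mean']:.3f}, s={row['s']:.3f}")
print(f"N={n_total}, grand total={grand_total:.1f}, grand mean={grand_mean:.2f}")
```

The sample variances in the table appear to be the squares of the rounded standard deviations, so squaring the unrounded `s` values may differ from them in the third decimal place.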
H Hypotheses
The question of interest is whether there are differences in mean change in body fat mass for the four treatments.
Population characteristics of interest:
$\mu_1$ = mean change in body fat mass for the P + P treatment
$\mu_2$ = mean change in body fat mass for the P + S treatment
$\mu_3$ = mean change in body fat mass for the G + P treatment
$\mu_4$ = mean change in body fat mass for the G + S treatment
Hypotheses:
Null hypothesis: $H_0\colon \mu_1 = \mu_2 = \mu_3 = \mu_4$
Alternative hypothesis: $H_a$: At least two of the $\mu$'s are different

M Method
Because the answers to the four key questions are hypothesis testing, sample data, one numerical variable, and four independently selected samples, consider an ANOVA F test.
Potential method: ANOVA F test. The test statistic for this test is
$$F = \frac{\text{MSTr}}{\text{MSE}}$$
When the null hypothesis is true, this statistic has approximately an F distribution with $df_1 = k - 1$ and $df_2 = N - k$.
Once you have decided to proceed with the test, you need to select a significance level for the test. For this example, a significance level of 0.01 will be used.
Significance level: $\alpha = 0.01$

C Check
The subjects in the experiment were randomly assigned to treatments. The largest sample standard deviation ($s_1 = 1.545$) is not more than twice as large as the smallest sample standard deviation ($s_2 = 1.218$), so the equal population standard deviations condition is reasonably met. Boxplots of the data from each of the four samples are shown in Figure 17.5. The boxplots are roughly symmetric, and there are no outliers, so the normality condition is also reasonably met.

Figure 17.5 Boxplots for the data of Example 17.4 (treatments: P + P, P + S, G + P, G + S; horizontal axis: change in body fat mass)

C Calculate
$$\text{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
$$= 14(0.064 - (-1.15))^2 + 14(-0.286 - (-1.15))^2 + 13(-2.023 - (-1.15))^2 + 16(-2.250 - (-1.15))^2 = 60.37$$
treatment df = $k - 1 = 3$
$$\text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$
$$= 13(2.387) + 13(1.484) + 12(1.598) + 15(2.155) = 101.81$$
Test statistic:
$$F = \frac{\text{MSTr}}{\text{MSE}} = \frac{\text{SSTr}/\text{treatment df}}{\text{SSE}/\text{error df}} = \frac{60.37/3}{101.81/53} = \frac{20.12}{1.92} = 10.48$$
Degrees of freedom:
$df_1 = k - 1 = 4 - 1 = 3$
$df_2 = N - k = 57 - 4 = 53$
Associated P-value:
P-value = area under F curve to the right of 10.48
Using $df_1 = 3$ and $df_2 = 60$ (the closest value to 53 that appears in the table), Appendix Table 7 shows that the area to the right of 6.17 is 0.001. Since 10.48 > 6.17, it follows that the P-value is less than 0.001.

C Communicate results
Because the P-value is less than the selected significance level, reject the null hypothesis.
Decision: Reject $H_0$.
Conclusion: You can conclude that the mean change in body fat mass is not the same for all four treatments.

Summarizing an ANOVA

ANOVA calculations are often summarized in a tabular format called an ANOVA table. To understand such a table, one more sum of squares must be defined.

Total sum of squares, denoted by SSTo, is given by

$$\text{SSTo} = \sum_{\text{all } N \text{ obs.}} (x - \bar{\bar{x}})^2$$

with associated df = $N - 1$.

The relationship between the three sums of squares SSTo, SSTr, and SSE is

$$\text{SSTo} = \text{SSTr} + \text{SSE}$$

which is called the fundamental identity for single-factor ANOVA.

The quantity SSTo, the sum of squared deviations about the grand mean, is a measure of total variability in the data set consisting of all k samples. The quantity SSE results from measuring variability separately within each sample and then combining. Such within-sample variability is present regardless of whether or not $H_0$ is true. The magnitude of SSTr, on the other hand, depends on whether the null hypothesis is true or false. The more the $\mu$'s differ from one another, the larger SSTr will tend to be. SSTr represents variation that can (at least to some extent) be explained by any differences between means. An informal paraphrase of the fundamental identity for single-factor ANOVA is

total variation = explained variation + unexplained variation

Once any two of the sums of squares have been calculated, the remaining one is easily obtained from the fundamental identity. Often SSTo and SSTr are calculated first (using computational formulas given in the appendix to this chapter), and then SSE is obtained by subtraction: SSE = SSTo − SSTr. All the degrees of freedom, sums of squares, and mean squares are entered in an ANOVA table, as displayed in Table 17.2. The P-value usually appears to the right of F when the analysis is done by a statistical software package.

Table 17.2 General Format for a Single-Factor ANOVA Table

Source of Variation   df      Sum of Squares   Mean Square           F
Treatments            k − 1   SSTr             MSTr = SSTr/(k − 1)   F = MSTr/MSE
Error                 N − k   SSE              MSE = SSE/(N − k)
Total                 N − 1   SSTo

An ANOVA table from Minitab for the change in body fat mass data of Example 17.4 is shown in Table 17.3. The reported P-value is 0.000, consistent with the previous conclusion that P-value < 0.001.
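The entries of a single-factor ANOVA table, and the fundamental identity SSTo = SSTr + SSE, can also be checked directly from raw data. The sketch below does this for the Example 17.4 data. The function name `anova_table` and its returned keys are hypothetical, the data are entered with minus signs restored, and the twelfth P + S value is assumed to be -3.0 (the reading consistent with the reported summary values). Here SSE is computed from the deviations within each sample rather than by subtraction, so the identity is a genuine check.

```python
def anova_table(groups):
    """Return df, sums of squares, mean squares and F for a one-way layout.

    groups maps a treatment label to its list of observations.
    (Hypothetical helper written for this illustration.)
    """
    all_obs = [x for obs in groups.values() for x in obs]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total

    # Total variation: squared deviations of every observation from the grand mean
    ssto = sum((x - grand_mean) ** 2 for x in all_obs)
    sstr = 0.0
    sse = 0.0
    for obs in groups.values():
        group_mean = sum(obs) / len(obs)
        sstr += len(obs) * (group_mean - grand_mean) ** 2   # between samples
        sse += sum((x - group_mean) ** 2 for x in obs)      # within samples

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {
        "df": (k - 1, n_total - k, n_total - 1),
        "SSTr": sstr, "SSE": sse, "SSTo": ssto,
        "MSTr": mstr, "MSE": mse, "F": mstr / mse,
    }


# Example 17.4 data; assumption: the 12th P+S value is -3.0.
body_fat = {
    "P+P": [0.1, 0.6, 2.2, 0.7, -2.0, 0.7, 0.0, -2.6, -1.4, 1.5, 2.8, 0.3,
            -1.0, -1.0],
    "P+S": [-0.1, 0.2, 0.0, -0.4, -0.9, -1.1, 1.2, 0.1, 0.7, -2.0, -0.9,
            -3.0, 1.0, 1.2],
    "G+P": [-1.6, -0.4, 0.4, -2.0, -3.4, -2.8, -2.2, -1.8, -3.3, -2.1, -3.6,
            -0.4, -3.1],
    "G+S": [-3.1, -3.2, -2.0, -2.0, -3.3, -0.5, -4.5, -0.7, -1.8, -2.3, -1.3,
            -1.0, -5.6, -2.9, -1.6, -0.2],
}

table = anova_table(body_fat)
print("df (treatments, error, total):", table["df"])
for key in ("SSTr", "SSE", "SSTo", "MSTr", "MSE", "F"):
    print(f"{key}: {table[key]:.2f}")
# Fundamental identity, up to floating-point rounding
print("identity holds:", abs(table["SSTo"] - (table["SSTr"] + table["SSE"])) < 1e-9)
```

Since the computed F exceeds 6.17, the Appendix Table 7 value quoted in Example 17.4 for tail area 0.001, the sketch is consistent with the conclusion that the P-value is less than 0.001.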
Table 17.3 An ANOVA Table from Minitab for the Data of Example 17.4 One-way ANOVA Source section DF SS MS F Factor 3 60.37 20.12 10.48 Error 53 101.81 1.92 Total 56 162.18 P 0.000 17.1 Exercises Each Exercise Set assesses the following chapter learning objectives: C1, M1, M2, M3, P1, P2 Section 17.1 Exercise Set 1 17.1 Give as much information as you can about the P-value for an upper-tailed F test in each of the following situations. a. df1 5 4, df2 5 15, F 5 5.37 b. df1 5 4, df2 5 15, F 5 1.90 c. df1 5 4, df2 5 15, F 5 4.89 d. df1 5 3, df2 5 20, F 5 14.48 e. df1 5 3, df2 5 20, F 5 2.69 f. df1 5 4, df2 5 50, F 5 3.24 17.2 Employees of a certain state university system can choose from among four different health plans. Each plan differs somewhat from the others in terms of hospitalization coverage. Four samples of recently hospitalized individuals were selected, each sample consisting of people covered by a different health plan. The length of the hospital stay (number of days) was determined for each individual selected. a. What hypotheses would you test to decide whether the mean lengths of stay are not the same for all four health plans? b. If each sample consisted of eight individuals and the value of the ANOVA F statistic was F 5 4.37, what conclusion would be appropriate for a test with a 5 0.01? c. A nswer the question posed in Part (b) if the F value given there resulted from sample sizes n1 5 9, n2 5 8, n3 5 7, and n4 5 8. 17.3 The authors of the paper “Age and Violent Content Labels Make Video Games Forbidden Fruits for Youth” (Pediatrics [2009]: 870–876) carried out an experiment to determine if restrictive labels on video games actually increased the attractiveness of the game for young game players. Participants read a description of a new video game and were asked how much they wanted to play the game. The description also included an age rating. 
Some participants read the description with an age restrictive label of 7+, indicating that the game was not appropriate for children under the age of 7. Others read the same description, but with an age restrictive label of 12+, 16+, or 18+. The following data for 12- to 13-year-old boys are fictitious but are consistent with summary statistics given in the paper. (The sample sizes in the actual experiment were larger.) For purposes of this exercise, you can assume that the boys were assigned at random to one of the four age label treatments (7+, 12+, 16+, and 18+). Data shown are the boys’ ratings of how much they wanted to play the game on a scale of 1 to 10. Do the data provide convincing evidence that the mean rating associated with the game description by 12- to 13-year-old boys is not the same for all four restrictive rating labels? Test the appropriate hypotheses using a significance level of 0.05.

7+ label   12+ label   16+ label   18+ label
6          8           7           10
6          7           9            9
6          8           8            6
5          5           6            8
4          7           7            7
8          9           4            6
6          5           8            8
1          8           9            9
2          4           6           10
4          7           7            8

17.4 The accompanying data on calcium content of wheat are consistent with summary quantities that appeared in the article “Mineral Contents of Cereal Grains as Affected by Storage and Insect Infestation” (Journal of Stored Products Research [1992]: 147–151). Four different storage times were considered. Partial output from the SAS computer package is also shown.

Storage Period   Observations
0 months         58.75   57.94   58.91   56.85   55.21   57.30
1 month          58.87   56.43   56.51   57.67   59.75   58.48
2 months         59.13   60.38   58.01   59.95   59.51   60.34
4 months         62.32   58.76   60.03   59.36   59.61   61.95

Dependent Variable: CALCIUM
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3   32.13815000      10.71271667   6.51      0.0030
Error             20   32.90103333       1.64505167
Corrected Total   23   65.03918333

R-Square   C.V.       Root MSE   CALCIUM Mean
0.494135   2.180018   1.282596   58.8341667

a. Verify that the sums of squares and df’s are as given in the ANOVA table.
b. Is there sufficient evidence to conclude that the mean calcium content is not the same for the four different storage times? Use the value of F from the ANOVA table to test the appropriate hypotheses at significance level 0.05.

Section 17.1 Exercise Set 2

17.5 Give as much information as you can about the P-value of the single-factor ANOVA F test in each of the following situations.
a. k = 5, n1 = n2 = n3 = n4 = n5 = 4, F = 5.37
b. k = 5, n1 = n2 = n3 = 5, n4 = n5 = 4, F = 2.83
c. k = 3, n1 = 4, n2 = 5, n3 = 6, F = 5.02
d. k = 3, n1 = n2 = 4, n3 = 6, F = 15.90
e. k = 4, n1 = n2 = 15, n3 = 12, n4 = 10, F = 1.75

17.6 The paper referenced in Exercise 17.3 also gave data for 12- to 13-year-old girls. Data consistent with summary values in the paper are shown below. Do the data provide convincing evidence that the mean rating associated with the game description for 12- to 13-year-old girls is not the same for all four age restrictive rating labels? Test the appropriate hypotheses using α = 0.05.

7+ label   12+ label   16+ label   18+ label
 4         4            6          8
 7         5            4          6
 6         4            8          6
 5         6            6          5
 3         3           10          7
 6         5            8          4
 3         6           10          4
 5         8            6          6
10         5            8          8
 5         9            5          7

17.7 The experiment described in Example 17.4 also gave data on change in body fat mass for men (“Growth Hormone and Sex Steroid Administration in Healthy Aged Women and Men,” Journal of the American Medical Association [2002]: 2282–2292). Each of 74 male subjects who were over age 65 was assigned at random to one of the following four treatments: (1) placebo “growth hormone” and placebo “steroid” (denoted by P + P); (2) placebo “growth hormone” and the steroid testosterone (denoted by P + S); (3) growth hormone and placebo “steroid” (denoted by G + P); and (4) growth hormone and the steroid testosterone (denoted by G + S). The accompanying table lists data on change in body fat mass over the 26-week period following the treatment that are consistent with summary quantities given in the article.

Change in Body Fat Mass (kg)

Treatment   P + P     P + S     G + P     G + S
             0.3      −3.7      −3.8      −5.0
             0.4      −1.0      −3.2      −5.0
            −1.7       0.2      −4.9      −3.0
            −0.5      −2.3      −5.2      −2.6
            −2.1       1.5      −2.2      −6.2
             1.3      −1.4      −3.5      −7.0
             0.8       1.2      −4.4      −4.5
             1.5      −2.5      −0.8      −4.2
            −1.2      −3.3      −1.8      −5.2
            −0.2       0.2      −4.0      −6.2
             1.7       0.6      −1.9      −4.0
             1.2      −0.7      −3.0      −3.9
             0.6      −0.1      −1.8      −3.3
             0.4      −3.1      −2.9      −5.7
            −1.3       0.3      −2.9      −4.5
            −0.2      −0.5      −2.9      −4.3
             0.7      −0.8      −3.7      −4.0
                      −0.7                −4.2
                      −0.9                −4.7
                      −0.6
                      −2.0
n           17        21        17        19
x̄            0.100    −0.933    −3.112    −4.605
s            1.139     1.443     1.178     1.122
s²           1.297     2.082     1.388     1.259

Also, N = 74, grand total = −158.3, and the grand mean is x̿ = −158.3/74 = −2.139. Carry out an F test to see whether mean change in body fat mass differs for the four treatments.

17.8 In an experiment to investigate the performance of four different brands of spark plugs intended for use on a 125-cc motorcycle, five plugs of each brand were tested, and the number of miles (at a constant speed) until failure was observed. A partially completed ANOVA table is given. Fill in the missing entries, and test the relevant hypotheses using a 0.05 level of significance.

Source of Variation   df   Sum of Squares   Mean Square   F
Treatments
Error                      235,419.04
Total                      310,500.76

Additional Exercises for Section 17.1

17.9 The article “Compression of Single-Wall Corrugated Shipping Containers Using Fixed and Floating Text Platens” (Journal of Testing and Evaluation [1992]: 318–320) described an experiment in which several different types of boxes were compared with respect to compression strength (in pounds). The accompanying data resulted from a single-factor experiment involving k = 4 types of boxes (the sample means and standard deviations are in close agreement with values given in the paper). Do these data provide evidence to support the claim that the mean compression strength is not the same for all four box types? Test the relevant hypothesis using a significance level of 0.01.

Table for Exercise 17.9
Type of Box   Compression Strength (lb)                   Sample Mean   Sample SD
1             655.5   788.3   734.3   721.4   679.1   699.4   713.00        46.55
2             789.2   772.5   786.9   686.1   732.1   774.8   756.93        40.34
3             737.1   639.0   696.3   671.7   717.2   727.1   698.07        37.20
4             535.1   628.7   542.4   559.0   586.9   520.0   562.02        39.87
Grand mean: x̿ = 682.50

17.10 The accompanying summary statistics for a measure of social marginality for samples of youths, young adults, adults, and seniors appeared in the paper “Perceived Causes of Loneliness in Adulthood” (Journal of Social Behavior and Personality [2000]: 67–84). The social marginality score measured actual and perceived social rejection, with higher scores indicating greater social rejection. For purposes of this exercise, assume that it is reasonable to regard the four samples as representative of the U.S. population in the corresponding age groups and that the distributions of social marginality scores for these four groups are approximately normal with the same standard deviation. Is there evidence that the mean social marginality score is not the same for all four age groups? Test the relevant hypotheses using α = 0.05.

Age Group     Youths   Young Adults   Adults   Seniors
Sample Size   106      255            314      36
x̄             2.00     3.40           3.07     2.84
s             1.56     1.68           1.66     1.89

17.11 The chapter Preview Example described a study comparing three groups of college students (soccer athletes, non–soccer athletes, and a comparison group consisting of students who did not participate in intercollegiate sports). The following is information on scores from the Hopkins Verbal Learning Test (which measures immediate memory recall).

Group                       Soccer Athletes   Nonsoccer Athletes   Comparison Group
Sample size                 86                95                   53
Sample mean score           29.90             30.94                29.32
Sample standard deviation   3.73              5.14                 3.78

In addition, the grand mean is x̿ = 30.19. Suppose that it is reasonable to regard these three samples as random samples from the three student populations of interest. Is there sufficient evidence to conclude that the mean Hopkins score is not the same for the three student populations? Use α = 0.05.

17.12 Suppose that a random sample of size n = 5 was selected from the vineyard properties for sale in Sonoma County, California, in each of 3 years. The following data are consistent with summary information on price per acre (in dollars, rounded to the nearest thousand) for disease-resistant grape vineyards in Sonoma County (Wines and Vines, November 1999).

1996   30,000   34,000   36,000   38,000   40,000
1997   30,000   35,000   37,000   38,000   40,000
1998   40,000   41,000   43,000   44,000   50,000

a. Construct boxplots for each of the 3 years on a common axis, and label each by year. Comment on the similarities and differences.
b. Carry out an ANOVA to determine whether there is evidence to support the claim that the mean price per acre for vineyard land in Sonoma County was not the same for the 3 years considered. Use a significance level of 0.05 for your test.

17.13 Parents are frequently concerned when their child seems slow to begin walking (although when the child finally walks, the resulting havoc sometimes has the parents wishing they could turn back the clock!). The article “Walking in the Newborn” (Science, 176 [1972]: 314–315) reported on an experiment in which the effects of several different treatments on the age at which a child first walks were compared. Children in the first group were given special walking exercises for 12 minutes per day beginning at age 1 week and lasting 7 weeks. The second group of children received daily exercises but not the walking exercises administered to the first group. The third and fourth groups were control groups. They received no special treatment and differed only in that the third group’s progress was checked weekly, whereas the fourth group’s progress was checked just once at the end of the study. Observations on age (in months) when the children first walked are shown in the accompanying table. Also given is the ANOVA table, obtained from the SPSS computer package.

              Age                                               n   Total
Treatment 1    9.00    9.50    9.75   10.00   13.00    9.50   6   60.75
Treatment 2   11.00   10.00   10.00   11.75   10.50   15.00   6   68.25
Treatment 3   11.50   12.00    9.00   11.50   13.25   13.00   6   70.25
Treatment 4   13.25   11.50   12.00   13.50   11.50           5   61.75

Analysis of Variance
Source           df   Sum of Sq.   Mean Sq.   F Ratio   F Prob
Between Groups    3   14.779       4.926      2.142     .129
Within Groups    19   43.690       2.299
Total            22   58.467

a. Verify the entries in the ANOVA table.
b. State and test the relevant hypotheses using a significance level of 0.05.

Section 17.2 Multiple Comparisons

When H0: μ1 = μ2 = . . . = μk is rejected by the F test, you believe that there are differences among the k population or treatment means. A natural question to ask at this point is, which means differ? For example, with k = 4, it might be the case that μ1 = μ2 = μ4, with μ3 different from the other three means. Another possibility is that μ1 = μ4 and μ2 = μ3. Still another possibility is that all four means are different from one another.
A multiple comparisons procedure is a method for identifying differences among the μ’s once the hypothesis that all of the means are equal has been rejected. The Tukey–Kramer (T-K) multiple comparisons procedure is one method that can be used to identify differences.

The T-K procedure is based on computing confidence intervals for the difference between each possible pair of μ’s. For example, for k = 3, there are three differences to consider:

μ1 − μ2     μ1 − μ3     μ2 − μ3

(The difference μ2 − μ1 is not considered, because the interval for μ1 − μ2 provides the same information. Similarly, intervals for μ3 − μ1 and μ3 − μ2 are not necessary.) Once all confidence intervals have been computed, each is examined to determine whether the interval includes 0. If a particular interval does not include 0, the two means are declared “significantly different” from one another. If an interval includes 0, there is no evidence of a significant difference between the means involved.

Suppose, for example, that k = 3 and that the three confidence intervals are

Difference   T-K Confidence Interval
μ1 − μ2      (−0.9, 3.5)
μ1 − μ3      (2.6, 7.0)
μ2 − μ3      (1.2, 5.7)

Because the interval for μ1 − μ2 includes 0, you would say that μ1 and μ2 do not differ significantly. The other two intervals do not include 0, so you would conclude that μ1 ≠ μ3 and μ2 ≠ μ3.

The T-K intervals are based on critical values for a probability distribution called the Studentized range distribution. These critical values appear in Appendix Table 8. To find a critical value, enter the table at the column corresponding to the number of populations or treatments being compared, move down to the rows corresponding to the number of error degrees of freedom (N − k), and select either the value for a 95% confidence level or the one for a 99% level.

The Tukey–Kramer Multiple Comparison Procedure

When there are k populations or treatments being compared, k(k − 1)/2 confidence intervals must be computed.
Denoting the relevant Studentized range critical value (from Appendix Table 8) by q, the intervals are as follows:

For μi − μj:   (x̄i − x̄j) ± q √( (MSE/2)(1/ni + 1/nj) )

Two means are judged to differ significantly if the corresponding interval does not include zero.

If the sample sizes are all the same, you can use n to denote the common value of n1, n2, . . ., nk. In this case, the ± term for each interval is the same quantity

q √( MSE/n )

Example 17.5 Hormones and Body Fat Revisited

Example 17.4 introduced the accompanying data on change in body fat mass resulting from a double-blind experiment designed to compare the following four treatments: (1) placebo “growth hormone” and placebo “steroid” (denoted by P + P); (2) placebo “growth hormone” and the steroid estradiol (denoted by P + S); (3) growth hormone and placebo “steroid” (denoted by G + P); and (4) growth hormone and the steroid estradiol (denoted by G + S). From Example 17.4, MSTr = 20.12, MSE = 1.92, and F = 10.48 with an associated P-value < 0.001. It was concluded that the mean change in body fat mass is not the same for all four treatments.

Change in Body Fat Mass (kg)

P + P:   0.1   0.6   2.2   0.7  −2.0   0.7   0.0  −2.6  −1.4   1.5   2.8   0.3  −1.0  −1.0
         n = 14, x̄ = 0.064, s = 1.545, s² = 2.387
P + S:  −0.1   0.2   0.0  −0.4  −0.9  −1.1   1.2   0.1   0.7  −2.0  −0.9  −3.0   1.0   1.2
         n = 14, x̄ = −0.286, s = 1.218, s² = 1.484
G + P:  −1.6  −0.4   0.4  −2.0  −3.4  −2.8  −2.2  −1.8  −3.3  −2.1  −3.6  −0.4  −3.1
         n = 13, x̄ = −2.023, s = 1.264, s² = 1.598
G + S:  −3.1  −3.2  −2.0  −2.0  −3.3  −0.5  −4.5  −0.7  −1.8  −2.3  −1.3  −1.0  −5.6  −2.9  −1.6  −0.2
         n = 16, x̄ = −2.250, s = 1.468, s² = 2.155

Appendix Table 8 gives the 95% Studentized range critical value q = 3.74 (using k = 4 and error df = 60, the closest tabled value to df = N − k = 53).
The first two T-K intervals are

μ1 − μ2:   (0.064 − (−0.286)) ± 3.74 √( (1.92/2)(1/14 + 1/14) ) = 0.35 ± 1.39 = (−1.04, 1.74)     Includes 0
μ1 − μ3:   (0.064 − (−2.023)) ± 3.74 √( (1.92/2)(1/14 + 1/13) ) = 2.09 ± 1.41 = (0.68, 3.50)      Does not include 0

The remaining intervals are

μ1 − μ4   (0.97, 3.66)     Does not include 0
μ2 − μ3   (0.32, 3.15)     Does not include 0
μ2 − μ4   (0.62, 3.31)     Does not include 0
μ3 − μ4   (−1.14, 1.60)    Includes 0

You would conclude that μ1 is not significantly different from μ2 and that μ3 is not significantly different from μ4. You would also conclude that μ1 and μ2 are significantly different from both μ3 and μ4. Note that Treatments 1 and 2 were treatments that administered a placebo in place of the growth hormone and Treatments 3 and 4 were treatments that included the growth hormone. This analysis was the basis of the researchers’ conclusion that growth hormone, with or without steroids, decreased body fat mass.

Minitab can be used to construct T-K intervals if raw data are available. Typical output (based on Example 17.5) is shown in Figure 17.6. From the output, you can see that the confidence interval for μ1 (P + P) − μ2 (P + S) is (−1.039, 1.739), that for μ2 (P + S) − μ4 (G + S) is (0.619, 3.309), and so on.
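The six intervals of Example 17.5 can be reproduced with a short Python sketch. The function name `tukey_kramer_intervals` is hypothetical, and the critical value q = 3.74 is simply typed in from Appendix Table 8 rather than computed. Because the text rounds intermediate results and Minitab uses an exact critical value, endpoints may differ from the printed ones in the last digit.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(means, sizes, mse, q):
    """Return {(i, j): (lower, upper)} for every pair i < j (1-based).

    Each interval is (xbar_i - xbar_j) +/- q * sqrt((MSE/2)(1/n_i + 1/n_j)).
    """
    intervals = {}
    for i, j in combinations(range(len(means)), 2):
        center = means[i] - means[j]
        half_width = q * sqrt((mse / 2) * (1 / sizes[i] + 1 / sizes[j]))
        intervals[(i + 1, j + 1)] = (center - half_width, center + half_width)
    return intervals


# Summary values from Example 17.5 (order: P+P, P+S, G+P, G+S).
means = [0.064, -0.286, -2.023, -2.250]
sizes = [14, 14, 13, 16]
intervals = tukey_kramer_intervals(means, sizes, mse=1.92, q=3.74)

for (i, j), (lower, upper) in intervals.items():
    verdict = "includes 0" if lower < 0 < upper else "does not include 0"
    print(f"mu{i} - mu{j}: ({lower:.2f}, {upper:.2f})  {verdict}")
```

With k = 4 the loop produces k(k − 1)/2 = 6 intervals, and only the intervals for μ1 − μ2 and μ3 − μ4 contain 0, matching the conclusions in the example.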
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.95%

G + S subtracted from:
        Lower    Center   Upper
G+P    −1.145    0.227    1.599
P+S     0.619    1.964    3.309
P+P     0.969    2.314    3.659

G + P subtracted from:
        Lower    Center   Upper
P+S     0.322    1.737    3.153
P+P     0.672    2.087    3.503

P + S subtracted from:
        Lower    Center   Upper
P+P    −1.039    0.350    1.739

Figure 17.6 The T-K intervals for Example 17.5 (from Minitab)

Why calculate the T-K intervals rather than use the t confidence interval for a difference between μ’s from Chapter 13? The answer is that the T-K intervals control the simultaneous confidence level at approximately 95% (or 99%). That is, if the procedure is used repeatedly on many different data sets, in the long run only about 5% (or 1%) of the time would at least one of the intervals not include the value of what the interval is estimating. Consider using separate 95% t intervals, each one having a 5% error rate. In those instances, the chance that at least one interval would make an incorrect statement about a difference in μ’s increases dramatically with the number of intervals calculated. The Minitab output in Figure 17.6 shows that to achieve a simultaneous confidence level of about 95% (experimentwise or “family” error rate of 5%) when k = 4 and error df = 53, the individual interval confidence levels must be 98.95% (individual error rate 1.05%).

An effective display for summarizing the results of any multiple comparisons procedure involves listing the x̄’s and underscoring pairs judged to be not significantly different.
The process for constructing such a display is described in the following box.

Summarizing the Results of the Tukey–Kramer Procedure

1. List the sample means in increasing order, identifying the corresponding population or treatment just above the value of each x̄.

2. Use the T-K intervals to determine the group of means that do not differ significantly from the first in the list. Draw a horizontal line extending from the smallest mean to the last mean in the group identified. For example, if there are five means, arranged in order,

Population     3     2     1     4     5
Sample mean    x̄3    x̄2    x̄1    x̄4    x̄5

and μ3 is judged to be not significantly different from μ2 or μ1, but is judged to be significantly different from μ4 and μ5, draw the following line:

Population     3     2     1     4     5
Sample mean    x̄3    x̄2    x̄1    x̄4    x̄5
               --------------

3. Use the T-K intervals to determine the group of means that are not significantly different from the second smallest. (You need consider only means that appear to the right of the mean under consideration.) If there is already a line connecting the second smallest mean with all means in the new group identified, no new line need be drawn. If this entire group of means is not underscored with a single line, draw a line extending from the second smallest to the last mean in the new group. Continuing with our example, if μ2 is not significantly different from μ1 but is significantly different from μ4 and μ5, no new line need be drawn. However, if μ2 is not significantly different from either μ1 or μ4 but is judged to be different from μ5, a second line is drawn as shown:

Population     3     2     1     4     5
Sample mean    x̄3    x̄2    x̄1    x̄4    x̄5
               --------------
                     --------------

4. Continue considering the means in the order listed, adding new lines as needed.
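The four steps in the box can be sketched as a small Python function. This is an illustrative reading of the procedure, not a standard library routine: the name `underscore_groups` and the way significant pairs are passed in are assumptions, and the sketch assumes that the means not significantly different from a given mean sit next to it in the sorted list (which holds when all sample sizes are equal). The usage example takes its input from the conclusions of Example 17.5.

```python
def underscore_groups(means, significant_pairs):
    """Return the underscoring lines as lists of population labels.

    means: dict mapping population label -> sample mean.
    significant_pairs: set of frozensets {a, b} for pairs judged
    significantly different by the T-K intervals.
    """
    order = sorted(means, key=means.get)      # step 1: increasing order
    lines = []                                # (start, end) positions
    for start in range(len(order)):
        end = start
        # Extend to the right while the next mean is not significantly
        # different from the mean at position `start`.
        while (end + 1 < len(order)
               and frozenset((order[start], order[end + 1]))
               not in significant_pairs):
            end += 1
        # Skip the line if an earlier line already covers this group.
        already_drawn = any(s <= start and end <= e for s, e in lines)
        if end > start and not already_drawn:
            lines.append((start, end))
    return [order[s:e + 1] for s, e in lines]


# Example 17.5: P+P and P+S each differ significantly from G+P and G+S.
means = {"P+P": 0.064, "P+S": -0.286, "G+P": -2.023, "G+S": -2.250}
significant = {frozenset(p) for p in [("P+P", "G+P"), ("P+P", "G+S"),
                                      ("P+S", "G+P"), ("P+S", "G+S")]}
groups = underscore_groups(means, significant)
print(groups)  # [['G+S', 'G+P'], ['P+S', 'P+P']]
```

Each inner list corresponds to one horizontal line in the display: one line under G + S and G + P, and a second line under P + S and P + P.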
To illustrate this summary procedure, suppose that four samples with x̄1 = 19, x̄2 = 27, x̄3 = 24, and x̄4 = 10 are used to test H0: μ1 = μ2 = μ3 = μ4 and that this hypothesis is rejected. Suppose the T-K confidence intervals indicate that μ2 is significantly different from both μ1 and μ4, and that there are no other significant differences. The resulting summary display would then be

Population     4     1     3     2
Sample mean    10    19    24    27
               --------------
                           --------

Example 17.6 Sleep Time

A biologist studied the effects of ethanol on sleep time. A sample of 20 rats, matched for age and other characteristics, was selected, and each rat was given an oral injection having a particular concentration of ethanol per body weight. The rapid eye movement (REM) sleep time for each rat was then recorded for a 24-hour period, with the results shown in the following table:

Treatment        Observations                        x̄
1. 0 (control)   88.6   73.2   91.4   68.0   75.2   79.28
2. 1 g/kg        63.0   53.9   69.2   50.1   71.5   61.54
3. 2 g/kg        44.9   59.5   40.2   56.3   38.7   47.92
4. 4 g/kg        31.0   39.6   45.3   25.2   22.7   32.76

Table 17.4 (an ANOVA table from SAS) leads to the conclusion that actual mean REM sleep time is not the same for all four treatments (the P-value for the F test is 0.0001).

Table 17.4 SAS ANOVA Table for Example 17.6

Analysis of Variance Procedure
Dependent Variable: TIME
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     3   5882.35750       1960.78583    21.09     0.0001
Error    16   1487.40000         92.96250
Total    19   7369.75750

The T-K intervals are

Difference   Interval          Includes 0?
μ1 − μ2      17.74 ± 17.446    no
μ1 − μ3      31.36 ± 17.446    no
μ1 − μ4      46.52 ± 17.446    no
μ2 − μ3      13.62 ± 17.446    yes
μ2 − μ4      28.78 ± 17.446    no
μ3 − μ4      15.16 ± 17.446    yes

The only T-K intervals that include zero are those for μ2 − μ3 and μ3 − μ4. The corresponding underscoring pattern is

x̄4       x̄3       x̄2       x̄1
32.76    47.92    61.54    79.28
--------------
         --------------

Figure 17.7 displays the SAS output that agrees with our underscoring; letters are used to indicate groupings in place of the underscoring.

Alpha = 0.05   df = 16   MSE = 92.9625
Critical Value of Studentized Range = 4.046
Minimum Significant Difference = 17.446
Means with the same letter are not significantly different.

Tukey Grouping   Mean     N   Treatment
A                79.280   5   0 (control)
B                61.540   5   1 g/kg
C B              47.920   5   2 g/kg
C                32.760   5   4 g/kg

Figure 17.7 SAS output for Example 17.6

Example 17.7 Roommate Satisfaction

How satisfied are college students with dormitory roommates? The article “Roommate Satisfaction and Ethnic Identity in Mixed-Race and White University Roommate Dyads” (Journal of College Student Development [1998]: 194–199) investigated differences among randomly assigned African American/white, Asian/white, Hispanic/white, and white/white roommate pairs. The researchers used a one-way ANOVA to analyze scores on the Roommate Relationship Inventory to see whether a difference in mean score existed for the four types of roommate pairs. They reported “significant differences among the means (P < 0.01). Follow-up Tukey [intervals] . . . indicated differences between White dyads (M = 77.49) and African American/White dyads (M = 71.27). No other significant differences were found.”

Although the mean satisfaction scores for the Asian/white and Hispanic/white groups were not given, they must have been between 77.49 (the mean for the white/white pairs) and 71.27 (the mean for the African American/white pairs).
(If they had been larger than 77.49, they would have been significantly different from the African American/white pairs mean, and if they had been smaller than 71.27, they would have been significantly different from the white/white pairs mean.) An underscoring consistent with the reported information is

African-American/White     Hispanic/White and Asian/White     White/White
---------------------------------------------------------
                           ----------------------------------------------

Section 17.2 Exercises
Each Exercise Set assesses the following chapter learning objectives: M4, P1

Section 17.2 Exercise Set 1

17.14 Leaf surface area is an important variable in plant gas-exchange rates. Dry matter per unit surface area (mg/cm³) was measured for trees raised under three different growing conditions. Let μ1, μ2, and μ3 represent the mean dry matter per unit surface area for the growing conditions 1, 2, and 3, respectively. The given 95% simultaneous confidence intervals are:

Difference   Interval
μ1 − μ2      (−3.11, −1.11)
μ1 − μ3      (−4.06, −2.06)
μ2 − μ3      (−1.95, 0.05)

Which of the following four statements do you think describes the relationship between μ1, μ2, and μ3? Explain your choice.
a. μ1 = μ2, and μ3 differs from μ1 and μ2.
b. μ1 = μ3, and μ2 differs from μ1 and μ3.
c. μ2 = μ3, and μ1 differs from μ2 and μ3.
d. All three μ’s are different from one another.

17.15 The accompanying underscoring pattern appears in the article “Women’s and Men’s Eating Behavior Following Exposure to Ideal-Body Images and Text” (Communications Research [2006]: 507–529). Women either viewed slides depicting images of thin female models with no text (treatment 1); viewed the same slides accompanied by diet and exercise-related text (treatment 2); or viewed the same slides accompanied by text that was unrelated to diet and exercise (treatment 3). A fourth group of women did not view any slides (treatment 4). Participants were assigned at random to the four treatments. Participants were then asked to complete a questionnaire in a room where pretzels were set out on the tables. An observer recorded how many pretzels participants ate while completing the questionnaire. Write a few sentences interpreting this underscoring pattern.

Treatment:                           2      1      4      3
Mean number of pretzels consumed:    0.97   1.03   2.20   2.65

17.16 The accompanying data resulted from a flammability study in which specimens of five different fabrics were tested to determine burn times.

Fabric   1      2      3      4      5
         17.8   13.2   11.8   16.5   13.9
         16.2   10.4   11.0   15.3   10.8
         15.9   11.3    9.2   14.1   12.8
         15.5          10.0   15.0   11.7
                              13.9

MSTr = 23.67   MSE = 1.39   F = 17.08   P-value = 0.000

The accompanying output gives the T-K intervals as calculated by Minitab. Identify significant differences and give the underscoring pattern.

Individual error rate = 0.00750
Critical value = 4.37
Intervals for (column level mean) − (row level mean)

      1                  2                   3                   4
2     (1.938, 7.495)
3     (3.278, 8.422)     (−1.645, 3.912)
4     (−1.050, 3.830)    (−5.983, −0.670)    (−6.900, −2.020)
5     (1.478, 6.622)     (−3.445, 2.112)     (−4.372, 0.772)     (0.220, 5.100)

Section 17.2 Exercise Set 2

17.17 The paper “Trends in Blood Lead Levels and Blood Lead Testing among U.S. Children Aged 1 to 5 Years” (Pediatrics [2009]: e376–e385) gave data on blood lead levels (in μg/dL) for samples of children living in homes that had been classified either at low, medium, or high risk of lead exposure, based on when the home was constructed. After using a multiple comparison procedure, the authors reported the following:
1. The difference in mean blood lead level between low-risk housing and medium-risk housing was significant.
2. The difference in mean blood lead level between low-risk housing and high-risk housing was significant.
3. The difference in mean blood lead level between medium-risk housing and high-risk housing was significant.
Which of the following sets of T-K intervals (Set 1, 2, or 3) is consistent with the authors’ conclusions? Explain your choice.
μL = mean blood lead level for children living in low-risk housing
μM = mean blood lead level for children living in medium-risk housing
μH = mean blood lead level for children living in high-risk housing

Difference   Set 1            Set 2            Set 3
μL − μM      (−0.6, 0.1)      (−0.6, −0.1)     (−0.6, −0.1)
μL − μH      (−1.5, −0.6)     (−1.5, −0.6)     (−1.5, −0.6)
μM − μH      (−0.9, −0.3)     (−0.9, 0.3)      (−0.9, −0.3)

17.18 The paper referenced in Exercise 17.15 also gave the following underscoring pattern for men.

Treatment:                           2      1      3      4
Mean number of pretzels consumed:    6.61   5.96   3.38   2.70

a. Write a few sentences interpreting this underscoring pattern.
b. Using your answers from Part (a) and from Exercise 17.15, write a few sentences describing the differences between how men and women respond to the treatments.

17.19 Do lizards play a role in spreading plant seeds? Some research carried out in South Africa would suggest so (“Dispersal of Namaqua Fig [Ficus cordata cordata] Seeds by the Augrabies Flat Lizard [Platysaurus broadleyi],” Journal of Herpetology [1999]: 328–330). The researchers collected 400 seeds of a particular type of fig, 100 of which were from each treatment: lizard dung, bird dung, rock hyrax dung, and uneaten figs. They planted these seeds in batches of 5, and for each group of 5 they recorded how many of the seeds germinated. This resulted in 20 observations for each treatment. The treatment means and standard deviations are given in the accompanying table.

Treatment      n    x̄      s
Uneaten figs   20   2.40   0.30
Lizard dung    20   2.35   0.33
Bird dung      20   1.70   0.34
Hyrax dung     20   1.45   0.28

a. Construct the appropriate ANOVA table, and test the hypothesis that there is no difference between mean number of seeds germinating for the four treatments.
b. Is there evidence that seeds eaten and then excreted by lizards germinate at a higher rate than those eaten and then excreted by birds? Give statistical evidence to support your answer.

Additional Exercises for Section 17.2

17.20 Samples of six different brands of diet or imitation margarine were analyzed to determine the level of physiologically active polyunsaturated fatty acids (PAPUFA, in percent), resulting in the data shown in the accompanying table. (The data are fictitious, but the sample means agree with data reported in Consumer Reports.)

Imperial        14.1   13.6   14.4   14.3
Parkay          12.8   12.5   13.4   13.0   12.3
Blue Bonnet     13.5   13.4   14.1   14.3
Chiffon         13.2   12.7   12.6   13.9
Mazola          16.8   17.2   16.4   17.3   18.0
Fleischmann’s   18.1   17.2   18.7   18.4

a. Test for differences among the true mean PAPUFA percentages for the different brands. Use α = 0.05.
b. Use the T-K procedure to compute 95% simultaneous confidence intervals for all differences between means and give the corresponding underscoring pattern.

17.21 The nutritional quality of shrubs commonly used for feed by rabbits was the focus of a study summarized in the article “Estimation of Browse by Size Classes for Snowshoe Hare” (Journal of Wildlife Management [1980]: 34–40). The energy contents (cal/g) of three sizes (4 mm or less, 5–7 mm, and 8–10 mm) of serviceberries were studied. Let μ1, μ2, and μ3 denote the true mean energy content for the three size classes. Suppose that 95% simultaneous confidence intervals for μ1 − μ2, μ1 − μ3, and μ2 − μ3 are (−10, 290), (150, 450), and (10, 310), respectively. How would you interpret these intervals?

17.22 Consider the accompanying data on plant growth after the application of five different types of growth hormone.

Hormone
1   13   17    7   14
2   21   13   20   17
3   18   14   17   21
4    7   11   18   10
5    6   11   15    8

a. Carry out the F test at level α = 0.05.
b. What happens when the T-K procedure is applied? (Note: This “contradiction” can occur when H0 is “barely” rejected.
It happens because the test and the multiple comparison method are based on different distributions. Consult your friendly neighborhood statistician for more information.)

Appendix: ANOVA Computations

Let $T_1$ denote the sum of the observations in the sample from the first population or treatment, and let $T_2, \ldots, T_k$ denote the other sample totals. Also let $T$ represent the sum of all $N$ observations (the grand total) and

$$CF = \text{correction factor} = \frac{T^2}{N}$$

Then

$$SSTo = \sum_{\text{all } N \text{ obs.}} x^2 - CF$$

$$SSTr = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k} - CF$$

$$SSE = SSTo - SSTr$$

Example 15A.1

Treatment 1   4.2   3.7   5.0   4.8   $T_1 = 17.7$   $n_1 = 4$
Treatment 2   5.7   6.2   6.4         $T_2 = 18.3$   $n_2 = 3$
Treatment 3   4.6   3.2   3.5   3.9   $T_3 = 15.2$   $n_3 = 4$
                                      $T = 51.2$     $N = 11$

$$CF = \text{correction factor} = \frac{T^2}{N} = \frac{(51.2)^2}{11} = 238.31$$

$$SSTr = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k} - CF = \frac{(17.7)^2}{4} + \frac{(18.3)^2}{3} + \frac{(15.2)^2}{4} - 238.31 = 9.40$$

$$SSTo = \sum_{\text{all } N \text{ obs.}} x^2 - CF = (4.2)^2 + (3.7)^2 + \cdots + (3.9)^2 - 238.31 = 11.81$$

$$SSE = SSTo - SSTr = 11.81 - 9.40 = 2.41$$

Are You Ready to Move On? Chapter 17 Review Exercises

All chapter learning objectives are assessed in these exercises. The learning objectives assessed in each exercise are given in parentheses for each exercise.

17.23 (C1, M1, M2, M3) The paper “Women’s and Men’s Eating Behavior Following Exposure to Ideal-Body Images and Text” (Communication Research [2006]: 507–529) describes an experiment in which 74 men were assigned at random to one of four treatments:
1. Viewed slides of fit, muscular men
2. Viewed slides of fit, muscular men accompanied by diet and fitness-related text
3. Viewed slides of fit, muscular men accompanied by text not related to diet and fitness
4. Did not view any slides
The participants then went to a room to complete a questionnaire. In this room, bowls of pretzels were set out on the tables.
A research assistant noted how many pretzels were consumed by each participant while completing the questionnaire. Data consistent with summary quantities given in the paper are given in the accompanying table. Do these data provide convincing evidence that the mean number of pretzels consumed is not the same for all four treatments? Test the relevant hypotheses using a significance level of 0.05.

Treatment 1
8 7 4 13 2 1 5 8 11 5 1 0 6 4 10 7 0 12
Treatment 2 Treatment 3 Treatment 4
1 5 2 0 3 0 3 4 4 5 5 7 8 4 0 6 3 5 2 5 7 5 2 0 0 3 4 2 4 1 1 6 8 0 4 9 8 6 2 7 8 8 5 14 9 0 6 3 12 5 6 10 8 6 2 10

17.24 (P1, P2) Can use of an online plagiarism-detection system reduce plagiarism in student research papers? The paper “Plagiarism and Technology: A Tool for Coping with Plagiarism” (Journal of Education for Business [2005]: 149–152) describes a study in which randomly selected research papers submitted by students during five semesters were analyzed for plagiarism. For each paper, the percentage of plagiarized words in the paper was determined by an online analysis. In each of the five semesters, students were told during the first two class meetings that they would have to submit an electronic version of their research papers and that the papers would be reviewed for plagiarism. Suppose that the number of papers sampled in each of the five semesters and the means and standard deviations for percentage of plagiarized words are as given in the accompanying table. For purposes of this exercise, assume that the conditions necessary for the ANOVA F test are reasonable.
Do these data provide evidence to support the claim that mean percentage of plagiarized words is not the same for all five semesters? Test the appropriate hypotheses using $\alpha = 0.05$.

Semester   n    Mean   Standard deviation
1          39   6.31   3.75
2          42   3.31   3.06
3          32   1.79   3.25
4          32   1.83   3.13
5          34   1.50   2.37

17.25 (M4, P2) The paper referenced in Exercise 17.3 described an experiment to determine if restrictive age labeling on video games increased the attractiveness of the game for boys ages 12 to 13. In that exercise, the null hypothesis was $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_1$ is the population mean attractiveness rating for the game with the 7+ age label, and $\mu_2$, $\mu_3$, and $\mu_4$ are the population mean attractiveness scores for the 12+, 16+, and 18+ age labels, respectively. The sample data are given in the accompanying table.

7+ label     6   6   6   5   4   8   6   1   2   4
12+ label    8   7   8   5   7   9   5   8   4   7
16+ label    7   9   8   6   7   4   8   9   6   7
18+ label   10   9   6   8   7   6   8   9  10   8

a. Compute the 95% T-K intervals and then use the underscoring procedure described in this section to identify significant differences among the age labels.
b. Based on your answer to Part (a), write a few sentences commenting on the theory that the more restrictive the age label on a video game, the more attractive the game is to 12- to 13-year-old boys.

17.26 (M4) The authors of the paper “Beyond the Shooter Game: Examining Presence and Hostile Outcomes among Male Game Players” (Communication Research [2006]: 448–466) studied how video game content might influence attitudes and behavior. Male students at a large Midwestern university were assigned at random to play one of three action-oriented video games. Two of the games involved some violence: one was a shooting game and one was a fighting game. The third game was a nonviolent race car driving game. After playing a game for 20 minutes, participants answered a set of questions.
The responses were used to determine values of three measures of aggression: (1) a measure of aggressive behavior; (2) a measure of aggressive thoughts; and (3) a measure of aggressive feelings. The authors hypothesized that the means for the three measures of aggression would be greatest for the fighting game and lowest for the driving game.

a. For the measure of aggressive behavior, the paper reports that the mean score for the fighting game was significantly higher than the mean scores for the shooting and driving games, but that the mean scores for the shooting and driving games were not significantly different. The three sample means were:

              Driving   Shooting   Fighting
Sample mean   3.42      4.00       5.30

Use the underscoring procedure of this section to construct a display that shows any significant differences in mean aggressive behavior score among the three games.

b. For the measure of aggressive thoughts, the three sample means were:

              Driving   Shooting   Fighting
Sample mean   2.81      3.44       4.01

The paper states that the mean score for the fighting game differed significantly only from the mean score for the driving game, and that the mean score for the shooting game did not differ significantly from that of either the fighting or driving game. Use the underscoring procedure of this section to construct a display that shows any significant differences in mean aggressive thoughts score among the three games.

Technology Notes

ANOVA

JMP
1. Input the raw data into the first column
2. Input the group information into the second column
3. Click Analyze then select Fit Y by X
4. Click and drag the first column name from the box under Select Columns to the box next to Y, Response
5. Click and drag the second column name from the box under Select Columns to the box next to X, Factor
6. Click OK
7. Click the red arrow next to Oneway Analysis of… and select Means/ANOVA

MINITAB
Data stored in separate columns
1. Input each group’s data in a separate column
2. Click Stat then ANOVA then One-Way (Unstacked)…
3. Click in the box under Responses (in separate columns):
4. Double-click the column name containing each group’s data
5. Click OK
Data stored in one column
1. Input the data into one column
2. Input the group information into a second column
3. Click Stat then ANOVA then One-Way…
4. Click in the box next to Response: and double-click the column name containing the raw data values
5. Click in the box next to Factor: and double-click the column name containing the group information
6. Click OK

SPSS
1. Input the raw data for all groups into one column
2. Input the group information into a second column (use group numbers)
3. Click Analyze then click Compare Means then click One-Way ANOVA…
4. Click the name of the column containing the raw data and click the arrow to move it to the box under Dependent List:
5. Click the name of the column containing the group data and click the arrow to move it to the box under Factor:
6. Click OK

Excel
1. Input the raw data for each group into a separate column
2. Click the Data ribbon
3. Click Data Analysis in the Analysis group
Note: If you do not see Data Analysis listed on the Ribbon, see the Technology Notes for Chapter 2 for instructions on installing this add-on.
4. Select Anova: Single Factor and click OK
5. Click on the box next to Input Range and select ALL columns of data (if you typed and selected column titles, click the box next to Labels in First Row)
6. Click in the box next to Alpha and type the significance level
7. Click OK
Note: The test statistic and p-value can be found in the first row of the table under F and P-value, respectively.

TI-83/84
1. Enter the data for each group into a separate list starting with L1 (In order to access lists press the STAT key, highlight the option called Edit… then press ENTER)
2. Press STAT
3. Highlight TESTS
4. Highlight ANOVA and press ENTER
5. Press 2nd then 1
6. Press ,
7. Press 2nd then 2
8. Press ,
9. Continue to input lists where data is stored separated by commas until you input the final list
10. When you are finished entering all lists, press )
11. Press ENTER

TI-Nspire
Summarized Data
1. Enter the summary information for the first group in a list in the following order: the value for n followed by a comma then the value of $\bar{x}$ followed by a comma then the value of s (In order to access data lists select the spreadsheet option and press enter)
Note: Be sure to title the lists by selecting the top row of the column and typing a title.
2. Enter the summary information for the second group in a list in the following order: the value for n followed by a comma then the value of $\bar{x}$ followed by a comma then the value of s
3. Continue to enter summary information for each group in this manner
4. When you are finished entering data for each group, press menu then 4:Statistics then 4:Stat Tests then C:ANOVA… then press enter
5. For Data Input Method choose Stats from the drop-down menu
6. For Number of Groups enter the number of groups, k
7. In the box next to Group 1 Stats select the list containing group one’s summary statistics
8. In the box next to Group 2 Stats select the list containing group two’s summary statistics
9. Continue entering summary statistics in this manner for all groups
10. Press OK
Raw data
1. Enter each group’s data into separate data lists (In order to access data lists, select the spreadsheet option and press enter)
Note: Be sure to title the lists by selecting the top row of the column and typing a title.
2. Press the menu key and select 4:Statistics then 4:Stat Tests then C:ANOVA… and press enter
3. For Data Input Method choose Data from the drop-down menu
4. For Number of Groups input the number of groups, k
5. Press OK
6. For List 1 select the list title that contains group one’s data from the drop-down menu
7. For List 2 select the list title that contains group two’s data from the drop-down menu
8. Continue to select the appropriate lists for all groups
9. When you are finished inputting lists press OK

Appendix Tables
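The following is a minimal Python sketch, not part of the text's technology notes, of how the computational formulas in the ANOVA Computations appendix combine with a Table 7 lookup. It uses the three samples of Example 15A.1 and the Table 7 entries for df1 = 2 and df2 = 8. The function names (`anova_sums`, `f_statistic`, `bracket_p_value`) are illustrative assumptions, and the F statistic it prints is computed here rather than quoted from the example.

```python
# Sketch: single-factor ANOVA sums of squares via the correction factor,
# followed by bracketing the P-value with critical values read from Table 7.
# All function names are hypothetical, chosen for illustration only.

def anova_sums(groups):
    """Return (SSTr, SSE, SSTo) using the correction-factor formulas."""
    all_obs = [x for g in groups for x in g]
    cf = sum(all_obs) ** 2 / len(all_obs)                    # CF = T^2 / N
    ss_to = sum(x * x for x in all_obs) - cf                 # SSTo
    ss_tr = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # SSTr
    return ss_tr, ss_to - ss_tr, ss_to                       # SSE = SSTo - SSTr


def f_statistic(groups):
    """Return (F, df1, df2), where F = MSTr / MSE."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    ss_tr, ss_e, _ = anova_sums(groups)
    df1, df2 = k - 1, n_total - k
    return (ss_tr / df1) / (ss_e / df2), df1, df2


def bracket_p_value(f, critical_values):
    """Bracket the P-value given {upper-tail area: critical value} for one (df1, df2)."""
    lower, upper = 0.0, 1.0
    for area, crit in critical_values.items():
        if f >= crit:
            upper = min(upper, area)   # F at or beyond this value: P-value <= area
        else:
            lower = max(lower, area)   # F short of this value: P-value > area
    return lower, upper


# Data from Example 15A.1
groups = [[4.2, 3.7, 5.0, 4.8], [5.7, 6.2, 6.4], [4.6, 3.2, 3.5, 3.9]]
f, df1, df2 = f_statistic(groups)

# Table 7 entries for df1 = 2, df2 = 8
table7_row = {0.10: 3.11, 0.05: 4.46, 0.01: 8.65, 0.001: 18.49}

print(df1, df2, round(f, 2))
print(bracket_p_value(f, table7_row))  # (0.001, 0.01)
```

The bracket reads as 0.001 < P-value < 0.01, the same form used in the answers to Exercise 17.1. Software such as the packages listed in the Technology Notes reports an exact P-value instead of a bracket.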
Table 7 Values That Capture Specified Upper-Tail F Curve Areas

                                              df1
df2  Area        1       2       3       4       5       6       7       8       9      10
1    .10     39.86   49.50   53.59   55.83   57.24   58.20   58.91   59.44   59.86   60.19
     .05    161.40  199.50  215.70  224.60  230.20  234.00  236.80  238.90  240.50  241.90
     .01   4052.00 5000.00 5403.00 5625.00 5764.00 5859.00 5928.00 5981.00 6022.00 6056.00
2    .10      8.53    9.00    9.16    9.24    9.29    9.33    9.35    9.37    9.38    9.39
     .05     18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40
     .01     98.50   99.00   99.17   99.25   99.30   99.33   99.36   99.37   99.39   99.40
     .001   998.50  999.00  999.20  999.20  999.30  999.30  999.40  999.40  999.40  999.40
3    .10      5.54    5.46    5.39    5.34    5.31    5.28    5.27    5.25    5.24    5.23
     .05     10.13    9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79
     .01     34.12   30.82   29.46   28.71   28.24   27.91   27.67   27.49   27.35   27.23
     .001   167.00  148.50  141.10  137.10  134.60  132.80  131.60  130.60  129.90  129.20
4    .10      4.54    4.32    4.19    4.11    4.05    4.01    3.98    3.95    3.94    3.92
     .05      7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96
     .01     21.20   18.00   16.69   15.98   15.52   15.21   14.98   14.80   14.66   14.55
     .001    74.14   61.25   56.18   53.44   51.71   50.53   49.66   49.00   48.47   48.05
5    .10      4.06    3.78    3.62    3.52    3.45    3.40    3.37    3.34    3.32    3.30
     .05      6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74
     .01     16.26   13.27   12.06   11.39   10.97   10.67   10.46   10.29   10.16   10.05
     .001    47.18   37.12   33.20   31.09   29.75   28.83   28.16   27.65   27.24   26.92
6    .10      3.78    3.46    3.29    3.18    3.11    3.05    3.01    2.98    2.96    2.94
     .05      5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06
     .01     13.75   10.92    9.78    9.15    8.75    8.47    8.26    8.10    7.98    7.87
     .001    35.51   27.00   23.70   21.92   20.80   20.03   19.46   19.03   18.69   18.41
7    .10      3.59    3.26    3.07    2.96    2.88    2.83    2.78    2.75    2.72    2.70
     .05      5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64
     .01     12.25    9.55    8.45    7.85    7.46    7.19    6.99    6.84    6.72    6.62
     .001    29.25   21.69   18.77   17.20   16.21   15.52   15.02   14.63   14.33   14.08
8    .10      3.46    3.11    2.92    2.81    2.73    2.67    2.62    2.59    2.56    2.54
     .05      5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35
     .01     11.26    8.65    7.59    7.01    6.63    6.37    6.18    6.03    5.91    5.81
     .001    25.41   18.49   15.83   14.39   13.48   12.86   12.40   12.05   11.77   11.54
9    .10      3.36    3.01    2.81    2.69    2.61    2.55    2.51    2.47    2.44    2.42
     .05      5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14
     .01     10.56    8.02    6.99    6.42    6.06    5.80    5.61    5.47    5.35    5.26
     .001    22.86   16.39   13.90   12.56   11.71   11.13   10.70   10.37   10.11    9.89
10   .10      3.29    2.92    2.73    2.61    2.52    2.46    2.41    2.38    2.35    2.32
     .05      4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02    2.98
     .01     10.04    7.56    6.55    5.99    5.64    5.39    5.20    5.06    4.94    4.85
     .001    21.04   14.91   12.55   11.28   10.48    9.93    9.52    9.20    8.96    8.75
11   .10      3.23    2.86    2.66    2.54    2.45    2.39    2.34    2.30    2.27    2.25
     .05      4.84    3.98    3.59    3.36    3.20    3.09    3.01    2.95    2.90    2.85
     .01      9.65    7.21    6.22    5.67    5.32    5.07    4.89    4.74    4.63    4.54
     .001    19.69   13.81   11.56   10.35    9.58    9.05    8.66    8.35    8.12    7.92
12   .10      3.18    2.81    2.61    2.48    2.39    2.33    2.28    2.24    2.21    2.19
     .05      4.75    3.89    3.49    3.26    3.11    3.00    2.91    2.85    2.80    2.75
     .01      9.33    6.93    5.95    5.41    5.06    4.82    4.64    4.50    4.39    4.30
     .001    18.64   12.97   10.80    9.63    8.89    8.38    8.00    7.71    7.48    7.29
13   .10      3.14    2.76    2.56    2.43    2.35    2.28    2.23    2.20    2.16    2.14
     .05      4.67    3.81    3.41    3.18    3.03    2.92    2.83    2.77    2.71    2.67
     .01      9.07    6.70    5.74    5.21    4.86    4.62    4.44    4.30    4.19    4.10
     .001    17.82   12.31   10.21    9.07    8.35    7.86    7.49    7.21    6.98    6.80
14   .10      3.10    2.73    2.52    2.39    2.31    2.24    2.19    2.15    2.12    2.10
     .05      4.60    3.74    3.34    3.11    2.96    2.85    2.76    2.70    2.65    2.60
     .01      8.86    6.51    5.56    5.04    4.69    4.46    4.28    4.14    4.03    3.94
     .001    17.14   11.78    9.73    8.62    7.92    7.44    7.08    6.80    6.58    6.40
15   .10      3.07    2.70    2.49    2.36    2.27    2.21    2.16    2.12    2.09    2.06
     .05      4.54    3.68    3.29    3.06    2.90    2.79    2.71    2.64    2.59    2.54
     .01      8.68    6.36    5.42    4.89    4.56    4.32    4.14    4.00    3.89    3.80
     .001    16.59   11.34    9.34    8.25    7.57    7.09    6.74    6.47    6.26    6.08
16   .10      3.05    2.67    2.46    2.33    2.24    2.18    2.13    2.09    2.06    2.03
     .05      4.49    3.63    3.24    3.01    2.85    2.74    2.66    2.59    2.54    2.49
     .01      8.53    6.23    5.29    4.77    4.44    4.20    4.03    3.89    3.78    3.69
     .001    16.12   10.97    9.01    7.94    7.27    6.80    6.46    6.19    5.98    5.81
17   .10      3.03    2.64    2.44    2.31    2.22    2.15    2.10    2.06    2.03    2.00
     .05      4.45    3.59    3.20    2.96    2.81    2.70    2.61    2.55    2.49    2.45
     .01      8.40    6.11    5.18    4.67    4.34    4.10    3.93    3.79    3.68    3.59
     .001    15.72   10.66    8.73    7.68    7.02    6.56    6.22    5.96    5.75    5.58
18   .10      3.01    2.62    2.42    2.29    2.20    2.13    2.08    2.04    2.00    1.98
     .05      4.41    3.55    3.16    2.93    2.77    2.66    2.58    2.51    2.46    2.41
     .01      8.29    6.01    5.09    4.58    4.25    4.01    3.84    3.71    3.60    3.51
     .001    15.38   10.39    8.49    7.46    6.81    6.35    6.02    5.76    5.56    5.39
19   .10      2.99    2.61    2.40    2.27    2.18    2.11    2.06    2.02    1.98    1.96
     .05      4.38    3.52    3.13    2.90    2.74    2.63    2.54    2.48    2.42    2.38
     .01      8.18    5.93    5.01    4.50    4.17    3.94    3.77    3.63    3.52    3.43
     .001    15.08   10.16    8.28    7.27    6.62    6.18    5.85    5.59    5.39    5.22
20   .10      2.97    2.59    2.38    2.25    2.16    2.09    2.04    2.00    1.96    1.94
     .05      4.35    3.49    3.10    2.87    2.71    2.60    2.51    2.45    2.39    2.35
     .01      8.10    5.85    4.94    4.43    4.10    3.87    3.70    3.56    3.46    3.37
     .001    14.82    9.95    8.10    7.10    6.46    6.02    5.69    5.44    5.24    5.08
21   .10      2.96    2.57    2.36    2.23    2.14    2.08    2.02    1.98    1.95    1.92
     .05      4.32    3.47    3.07    2.84    2.68    2.57    2.49    2.42    2.37    2.32
     .01      8.02    5.78    4.87    4.37    4.04    3.81    3.64    3.51    3.40    3.31
     .001    14.59    9.77    7.94    6.95    6.32    5.88    5.56    5.31    5.11    4.95
22   .10      2.95    2.56    2.35    2.22    2.13    2.06    2.01    1.97    1.93    1.90
     .05      4.30    3.44    3.05    2.82    2.66    2.55    2.46    2.40    2.34    2.30
     .01      7.95    5.72    4.82    4.31    3.99    3.76    3.59    3.45    3.35    3.26
     .001    14.38    9.61    7.80    6.81    6.19    5.76    5.44    5.19    4.99    4.83
23   .10      2.94    2.55    2.34    2.21    2.11    2.05    1.99    1.95    1.92    1.89
     .05      4.28    3.42    3.03    2.80    2.64    2.53    2.44    2.37    2.32    2.27
     .01      7.88    5.66    4.76    4.26    3.94    3.71    3.54    3.41    3.30    3.21
     .001    14.20    9.47    7.67    6.70    6.08    5.65    5.33    5.09    4.89    4.73
24   .10      2.93    2.54    2.33    2.19    2.10    2.04    1.98    1.94    1.91    1.88
     .05      4.26    3.40    3.01    2.78    2.62    2.51    2.42    2.36    2.30    2.25
     .01      7.82    5.61    4.72    4.22    3.90    3.67    3.50    3.36    3.26    3.17
     .001    14.03    9.34    7.55    6.59    5.98    5.55    5.23    4.99    4.80    4.64
25   .10      2.92    2.53    2.32    2.18    2.09    2.02    1.97    1.93    1.89    1.87
     .05      4.24    3.39    2.99    2.76    2.60    2.49    2.40    2.34    2.28    2.24
     .01      7.77    5.57    4.68    4.18    3.85    3.63    3.46    3.32    3.22    3.13
     .001    13.88    9.22    7.45    6.49    5.89    5.46    5.15    4.91    4.71    4.56
26   .10      2.91    2.52    2.31    2.17    2.08    2.01    1.96    1.92    1.88    1.86
     .05      4.23    3.37    2.98    2.74    2.59    2.47    2.39    2.32    2.27    2.22
     .01      7.72    5.53    4.64    4.14    3.82    3.59    3.42    3.29    3.18    3.09
     .001    13.74    9.12    7.36    6.41    5.80    5.38    5.07    4.83    4.64    4.48
27   .10      2.90    2.51    2.30    2.17    2.07    2.00    1.95    1.91    1.87    1.85
     .05      4.21    3.35    2.96    2.73    2.57    2.46    2.37    2.31    2.25    2.20
     .01      7.68    5.49    4.60    4.11    3.78    3.56    3.39    3.26    3.15    3.06
     .001    13.61    9.02    7.27    6.33    5.73    5.31    5.00    4.76    4.57    4.41
28   .10      2.89    2.50    2.29    2.16    2.06    2.00    1.94    1.90    1.87    1.84
     .05      4.20    3.34    2.95    2.71    2.56    2.45    2.36    2.29    2.24    2.19
     .01      7.64    5.45    4.57    4.07    3.75    3.53    3.36    3.23    3.12    3.03
     .001    13.50    8.93    7.19    6.25    5.66    5.24    4.93    4.69    4.50    4.35
29   .10      2.89    2.50    2.28    2.15    2.06    1.99    1.93    1.89    1.86    1.83
     .05      4.18    3.33    2.93    2.70    2.55    2.43    2.35    2.28    2.22    2.18
     .01      7.60    5.42    4.54    4.04    3.73    3.50    3.33    3.20    3.09    3.00
     .001    13.39    8.85    7.12    6.19    5.59    5.18    4.87    4.64    4.45    4.29
30   .10      2.88    2.49    2.28    2.14    2.05    1.98    1.93    1.88    1.85    1.82
     .05      4.17    3.32    2.92    2.69    2.53    2.42    2.33    2.27    2.21    2.16
     .01      7.56    5.39    4.51    4.02    3.70    3.47    3.30    3.17    3.07    2.98
     .001    13.29    8.77    7.05    6.12    5.53    5.12    4.82    4.58    4.39    4.24
40   .10      2.84    2.44    2.23    2.09    2.00    1.93    1.87    1.83    1.79    1.76
     .05      4.08    3.23    2.84    2.61    2.45    2.34    2.25    2.18    2.12    2.08
     .01      7.31    5.18    4.31    3.83    3.51    3.29    3.12    2.99    2.89    2.80
     .001    12.61    8.25    6.59    5.70    5.13    4.73    4.44    4.21    4.02    3.87
60   .10      2.79    2.39    2.18    2.04    1.95    1.87    1.82    1.77    1.74    1.71
     .05      4.00    3.15    2.76    2.53    2.37    2.25    2.17    2.10    2.04    1.99
     .01      7.08    4.98    4.13    3.65    3.34    3.12    2.95    2.82    2.72    2.63
     .001    11.97    7.77    6.17    5.31    4.76    4.37    4.09    3.86    3.69    3.54
90   .10      2.76    2.36    2.15    2.01    1.91    1.84    1.78    1.74    1.70    1.67
     .05      3.95    3.10    2.71    2.47    2.32    2.20    2.11    2.04    1.99    1.94
     .01      6.93    4.85    4.01    3.53    3.23    3.01    2.84    2.72    2.61    2.52
     .001    11.57    7.47    5.91    5.06    4.53    4.15    3.87    3.65    3.48    3.34
120  .10      2.75    2.35    2.13    1.99    1.90    1.82    1.77    1.72    1.68    1.65
     .05      3.92    3.07    2.68    2.45    2.29    2.18    2.09    2.02    1.96    1.91
     .01      6.85    4.79    3.95    3.48    3.17    2.96    2.79    2.66    2.56    2.47
     .001    11.38    7.32    5.78    4.95    4.42    4.04    3.77    3.55    3.38    3.24
240  .10      2.73    2.32    2.10    1.97    1.87    1.80    1.74    1.70    1.65    1.63
     .05      3.88    3.03    2.64    2.41    2.25    2.14    2.04    1.98    1.92    1.87
     .01      6.74    4.69    3.86    3.40    3.09    2.88    2.71    2.59    2.48    2.40
     .001    11.10    7.11    5.60    4.78    4.25    3.89    3.62    3.41    3.24    3.09
∞    .10      2.71    2.30    2.08    1.94    1.85    1.77    1.72    1.67    1.63    1.60
     .05      3.84    3.00    2.60    2.37    2.21    2.10    2.01    1.94    1.88    1.83
     .01      6.63    4.61    3.78    3.32    3.02    2.80    2.64    2.51    2.41    2.32
     .001    10.83    6.91    5.42    4.62    4.10    3.74    3.47    3.27    3.10    2.96
Table 8 Critical Values of q for the Studentized Range Distribution

         Confidence   Number of populations, treatments, or levels being compared
Error df level            3      4      5      6      7      8      9     10
5        95%           4.60   5.22   5.67   6.03   6.33   6.58   6.80   6.99
         99%           6.98   7.80   8.42   8.91   9.32   9.67   9.97  10.24
6        95%           4.34   4.90   5.30   5.63   5.90   6.12   6.32   6.49
         99%           6.33   7.03   7.56   7.97   8.32   8.61   8.87   9.10
7        95%           4.16   4.68   5.06   5.36   5.61   5.82   6.00   6.16
         99%           5.92   6.54   7.01   7.37   7.68   7.94   8.17   8.37
8        95%           4.04   4.53   4.89   5.17   5.40   5.60   5.77   5.92
         99%           5.64   6.20   6.62   6.96   7.24   7.47   7.68   7.86
9        95%           3.95   4.41   4.76   5.02   5.24   5.43   5.59   5.74
         99%           5.43   5.96   6.35   6.66   6.91   7.13   7.33   7.49
10       95%           3.88   4.33   4.65   4.91   5.12   5.30   5.46   5.60
         99%           5.27   5.77   6.14   6.43   6.67   6.87   7.05   7.21
11       95%           3.82   4.26   4.57   4.82   5.03   5.20   5.35   5.49
         99%           5.15   5.62   5.97   6.25   6.48   6.67   6.84   6.99
12       95%           3.77   4.20   4.51   4.75   4.95   5.12   5.27   5.39
         99%           5.05   5.50   5.84   6.10   6.32   6.51   6.67   6.81
13       95%           3.73   4.15   4.45   4.69   4.88   5.05   5.19   5.32
         99%           4.96   5.40   5.73   5.98   6.19   6.37   6.53   6.67
14       95%           3.70   4.11   4.41   4.64   4.83   4.99   5.13   5.25
         99%           4.89   5.32   5.63   5.88   6.08   6.26   6.41   6.54
15       95%           3.67   4.08   4.37   4.59   4.78   4.94   5.08   5.20
         99%           4.84   5.25   5.56   5.80   5.99   6.16   6.31   6.44
16       95%           3.65   4.05   4.33   4.56   4.74   4.90   5.03   5.15
         99%           4.79   5.19   5.49   5.72   5.92   6.08   6.22   6.35
17       95%           3.63   4.02   4.30   4.52   4.70   4.86   4.99   5.11
         99%           4.74   5.14   5.43   5.66   5.85   6.01   6.15   6.27
18       95%           3.61   4.00   4.28   4.49   4.67   4.82   4.96   5.07
         99%           4.70   5.09   5.38   5.60   5.79   5.94   6.08   6.20
19       95%           3.59   3.98   4.25   4.47   4.65   4.79   4.92   5.04
         99%           4.67   5.05   5.33   5.55   5.73   5.89   6.02   6.14
20       95%           3.58   3.96   4.23   4.45   4.62   4.77   4.90   5.01
         99%           4.64   5.02   5.29   5.51   5.69   5.84   5.97   6.09
24       95%           3.53   3.90   4.17   4.37   4.54   4.68   4.81   4.92
         99%           4.55   4.91   5.17   5.37   5.54   5.69   5.81   5.92
30       95%           3.49   3.85   4.10   4.30   4.46   4.60   4.72   4.82
         99%           4.45   4.80   5.05   5.24   5.40   5.54   5.65   5.76
40       95%           3.44   3.79   4.04   4.23   4.39   4.52   4.63   4.73
         99%           4.37   4.70   4.93   5.11   5.26   5.39   5.50   5.60
60       95%           3.40   3.74   3.98   4.16   4.31   4.44   4.55   4.65
         99%           4.28   4.59   4.82   4.99   5.13   5.25   5.36   5.45
120      95%           3.36   3.68   3.92   4.10   4.24   4.36   4.47   4.56
         99%           4.20   4.50   4.71   4.87   5.01   5.12   5.21   5.30
∞        95%           3.31   3.63   3.86   4.03   4.17   4.29   4.39   4.47
         99%           4.12   4.40   4.60   4.76   4.88   4.99   5.08   5.16

Answers to Selected Exercises

Section 17.1 Exercise Set 1

17.1 (a) 0.001 < P-value < 0.01 (b) P-value > 0.10 (c) P-value = 0.01 (d) P-value < 0.001 (e) 0.05 < P-value < 0.10 (f) 0.01 < P-value < 0.05 (using df1 = 4 and df2 = 60)

17.2 (a) $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_a$: At least two of the four $\mu_i$’s are different. (b) P-value = 0.012, fail to reject $H_0$ (c) P-value = 0.012, fail to reject $H_0$

17.3 F = 6.687, P-value = 0.001, reject $H_0$

17.4 (a) $SSTr = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2 + n_4(\bar{x}_4 - \bar{\bar{x}})^2 = 32.13815000$; Treatment df $= k - 1 = 3$; $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + (n_4 - 1)s_4^2 = 32.90103333$; Error df $= N - k = 20$ (b) $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_a$: At least two among $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ are different; F = 6.51, P-value = 0.003; reject $H_0$.

Additional Exercises

17.9 $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_a$: At least two among $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ are different; F = 25.094, P-value ≈ 0; reject $H_0$.

17.11 F = 2.62, 0.05 < P-value < 0.10, fail to reject $H_0$

17.13 (a) See solutions manual for detailed computations. (b) F = 2.142, P-value > 0.10, fail to reject $H_0$

Section 17.2 Exercise Set 1

17.14 Since the interval for $\mu_2 - \mu_3$ is the only one that contains zero, we have evidence of a difference between $\mu_1$ and $\mu_2$, and between $\mu_1$ and $\mu_3$, but not between $\mu_2$ and $\mu_3$. Thus, statement c is the correct choice.

17.15 In increasing order of the resulting mean numbers of pretzels eaten, the treatments were: slides with related text, slides with no text, no slides, and slides with unrelated text. There were no significant differences between the results for slides with related text and slides with no text, or for no slides and slides with unrelated text. However, there was a significant difference between the mean numbers of pretzels eaten for no slides and slides with no text (and also between the results for no slides and slides with related text). Likewise, there was a significant difference between the mean numbers of pretzels eaten for slides with unrelated text and slides with no text (and also between the results for slides with unrelated text and slides with related text).

17.16
              Fabric 3   Fabric 2   Fabric 5   Fabric 4   Fabric 1
Sample mean   10.5       11.633     12.3       14.96      16.35

Additional Exercises

17.21 The interval for $\mu_1 - \mu_2$ contains zero, and hence $\mu_1$ and $\mu_2$ are judged not different. The intervals for $\mu_1 - \mu_3$ and $\mu_2 - \mu_3$ do not contain zero, so $\mu_1$ and $\mu_3$ are judged to be different, and $\mu_2$ and $\mu_3$ are judged to be different. There is evidence that $\mu_3$ is different from the other two means.

Are You Ready to Move On? Chapter 17 Review Exercises

17.23 F = 5.273, P-value = 0.002, reject $H_0$

17.25 (a)
Difference         Interval            Includes 0?
$\mu_1 - \mu_2$    (−4.027, 0.027)     Yes
$\mu_1 - \mu_3$    (−4.327, −0.273)    No
$\mu_1 - \mu_4$    (−5.327, −1.273)    No
$\mu_2 - \mu_3$    (−2.327, 1.727)     Yes
$\mu_2 - \mu_4$    (−3.327, 0.727)     Yes
$\mu_3 - \mu_4$    (−3.027, 1.027)     Yes

              7+ label   12+ label   16+ label   18+ label
Sample mean   4.8        6.8         7.1         8.1
              ---------------------
                         ---------------------------------

(b) The more restrictive the age label on the video game, the higher the sample mean rating given by the boys used in the experiment. However, according to the T-K intervals, the only significant differences were between the means for the 7+ label and the 16+ label and between the means for the 7+ label and the 18+ label.
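The T-K intervals listed in the answer to Exercise 17.25(a) can be checked with a short Python sketch. It assumes the usual T-K form, $(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, and takes q = 3.79 from Table 8 (95%, 4 groups, error df 40). That row is an assumption made here because the actual error df, 36, is not tabled. The function name `tukey_kramer_intervals` is illustrative, not from the text.

```python
# Sketch: T-K simultaneous intervals for all pairwise differences in means.
# q must be supplied from Table 8; nothing here looks it up automatically.
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(groups, q):
    """groups: dict mapping label -> list of observations; returns {(a, b): (low, high)}."""
    k = len(groups)
    n_total = sum(len(obs) for obs in groups.values())
    means = {name: sum(obs) / len(obs) for name, obs in groups.items()}
    # SSE pools the squared deviations of each observation from its own group mean
    sse = sum((x - means[name]) ** 2 for name, obs in groups.items() for x in obs)
    mse = sse / (n_total - k)
    intervals = {}
    for a, b in combinations(groups, 2):
        half_width = q * sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


# Attractiveness ratings from Exercise 17.25
labels = {
    "7+": [6, 6, 6, 5, 4, 8, 6, 1, 2, 4],
    "12+": [8, 7, 8, 5, 7, 9, 5, 8, 4, 7],
    "16+": [7, 9, 8, 6, 7, 4, 8, 9, 6, 7],
    "18+": [10, 9, 6, 8, 7, 6, 8, 9, 10, 8],
}

for pair, (low, high) in tukey_kramer_intervals(labels, q=3.79).items():
    verdict = "includes 0" if low < 0 < high else "excludes 0"
    print(pair, round(low, 3), round(high, 3), verdict)
```

An interval that excludes 0 marks a pair of means judged significantly different, so only the pairs (7+, 16+) and (7+, 18+) are flagged, in line with Part (b) of the answer.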