Nonparametric Tests
IPS Chapter 15
15.1: The Wilcoxon Rank Sum Test
15.2: The Wilcoxon Signed Rank Test
15.3: The Kruskal-Wallis Test
© 2012 W.H. Freeman and Company

Nonparametric Tests
15.1, 15.2, and 15.3: The Wilcoxon and Kruskal-Wallis Tests

Objectives
15.1-15.3
The Wilcoxon rank sum test
The Normal approximation for W
What hypotheses does Wilcoxon test?
The Wilcoxon signed rank test
The Normal approximation for W+
Dealing with ties
The Kruskal-Wallis test

Assumptions for inference
For the inference methods for means we have already studied, we assumed that the variables have Normal distributions in the population(s) from which we draw our data.
Robustness: some skewness was acceptable, especially if the sample size was large.
What happens if plots suggest the data are clearly not Normal, especially if the sample size is small?

Options for non-Normal data and small n
Is lack of Normality due to outliers? If an outlier appears to be “real data,” you have to leave it in, but if you have reason to think there is an error in that data, you may be able to remove it.
Try transforming the data. For example, use a logarithm for right-skewed data.
Try another standard distribution. Other procedures can replace the t procedures if data (especially right-skewed data) fit another distribution.
Use modern bootstrap methods and permutation tests. Heavy computing avoids requiring Normality or any other specific form of sampling distribution.
Use other nonparametric methods. These are discussed in this chapter.

Ranks
Hypotheses for rank tests just replace the mean with the median. For strongly skewed data, we prefer the median to the mean for describing the center of the data.
To rank observations, first arrange them in order from smallest to largest. The rank of each observation is its position in this ordered list, starting with rank 1 for the smallest observation.
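The ranking rule just described can be sketched in a few lines of Python. This is a minimal illustration, assuming no tied values; the function name `rank_observations` and the observations are hypothetical and are not data from this chapter.

```python
def rank_observations(values):
    """Return the rank (1 = smallest) of each value, assuming no ties."""
    # Indices of the values, arranged from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    # The rank of an observation is its position in the ordered list.
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical observations (not data from this chapter).
observations = [12.4, 7.1, 19.8, 9.3, 15.0]
ranks = rank_observations(observations)
print(ranks)  # [3, 1, 5, 2, 4]
```

The ranks are returned in the original order of the observations, so each rank stays attached to the observation (and group) it came from.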
Example: Weeds among the corn
Does the presence of small numbers of weeds reduce the yield of corn? Lamb’s-quarter is a common weed in corn fields. A researcher planted corn at the same rate in 8 small plots of ground, then weeded the corn rows by hand to allow no weeds in 4 randomly selected plots and exactly 3 lamb’s-quarter plants per meter of row in the other 4 plots.
Here are the yields of corn (bushels per acre) in each of the plots.
A back-to-back stemplot shows non-Normality, possible outliers, and small sample sizes.

Example: Weeds among the corn
First rank all 8 observations together. Arrange them in order from smallest to largest. The shaded numbers are those with no weeds.
Note that 4 of the 5 highest yields are from the no-weeds group.
The idea of rank tests is to look just at position in this list. Working with ranks allows us to dispense with the numerical values of the data and the specific conditions on the shape of the distribution such as Normality.

Example: Weeds among the corn
If the presence of weeds reduces corn yields, we expect the ranks of the yields from plots without weeds to be larger as a group than the ranks from plots with weeds.
Compare the sums of the ranks from the two treatments.
If the weeds have no effect, we would expect the sum of the ranks in either group to be 18. Why?

Wilcoxon Rank Sum Test
Draw an SRS of size n1 from one population and draw an independent SRS of size n2 from a second population. There are N observations in all, where N = n1 + n2. Rank all N observations. The sum W of the ranks for the first sample is the Wilcoxon rank sum statistic. If the two populations have the same continuous distribution, then W has mean
$\mu_W = \frac{n_1 (N + 1)}{2}$
and standard deviation
$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}$
The Wilcoxon rank sum test rejects the hypothesis that the two populations have identical distributions when the rank sum W is far from its mean.
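As a quick check on these formulas, the sketch below computes the mean and standard deviation of W for the corn example's group sizes (n1 = n2 = 4) and compares them with a brute-force enumeration of every way 4 of the 8 ranks could fall in the first group. The function names are illustrative and do not come from any statistics package, and the enumeration assumes no ties. It also suggests why the expected rank sum is 18: the ranks 1 through 8 total 36, and when the two distributions are the same each group of 4 expects half of that total.

```python
from itertools import combinations
from math import sqrt


def rank_sum_mean_sd(n1, n2):
    """Mean and standard deviation of W when both populations are identical (no ties)."""
    N = n1 + n2
    mean = n1 * (N + 1) / 2
    sd = sqrt(n1 * n2 * (N + 1) / 12)
    return mean, sd


def enumerate_rank_sums(n1, n2):
    """All equally likely values of W: one per choice of n1 ranks out of 1..N."""
    N = n1 + n2
    return [sum(chosen) for chosen in combinations(range(1, N + 1), n1)]


n1, n2 = 4, 4
mean, sd = rank_sum_mean_sd(n1, n2)

sums = enumerate_rank_sums(n1, n2)
brute_mean = sum(sums) / len(sums)
brute_sd = sqrt(sum((w - brute_mean) ** 2 for w in sums) / len(sums))

print(len(sums))                       # 70 equally likely assignments
print(mean, round(sd, 3))              # 18.0 3.464 (formulas)
print(brute_mean, round(brute_sd, 3))  # 18.0 3.464 (enumeration)
```

Enumeration is only practical for small samples like these; for larger samples the formulas are the usual route.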
Example: Weeds among the corn
In this study, we want to test the hypotheses
H0: No difference in distribution of yields.
Ha: Yields are systematically higher in weed-free plots.
The test statistic is the rank sum W = 23 for the weed-free plots.
Conditions for the Wilcoxon test are met:
The data come from a randomized comparative experiment.
The yield of corn in bushels per acre has a continuous distribution.

Example: Weeds among the corn
N = 8, n1 (no weeds) = 4, and n2 (3 weeds per meter) = 4.
The sum of ranks for the weed-free plots has mean and standard deviation:
$\mu_W = \frac{n_1 (N + 1)}{2} = \frac{4(9)}{2} = 18$
$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}} = \sqrt{\frac{(4)(4)(9)}{12}} = 3.464$
The observed rank sum W = 23 is only 1.4 standard deviations above the mean. Software tells us that the P-value, $P(W \ge 23)$, is 0.1.
We cannot reject the null hypothesis. We do not have enough evidence to say that yields are systematically higher in weed-free plots. A larger sample size might clarify the effect of weeds on corn yield.

The Normal Approximation for W
To calculate the P-value for the Wilcoxon rank sum test, we need to know the sampling distribution of W when the null hypothesis is true.
With or without software, P-values for the Wilcoxon test are often based on the fact that the rank sum statistic W becomes approximately Normal as the two sample sizes increase.
Test statistic:
$z = \frac{W - \mu_W}{\sigma_W} = \frac{W - n_1 (N + 1)/2}{\sqrt{n_1 n_2 (N + 1)/12}}$

Example: Weeds among the corn
$z = \frac{W - \mu_W}{\sigma_W} = \frac{23 - 18}{3.464} = 1.44$
P-value: $P(Z \ge 1.44) = 0.0749$
We can improve this approximation by using the continuity correction. Use it for a variable that takes only whole-number values, like W: act as if each whole number occupies the entire interval from 0.5 below the number to 0.5 above it.
$z = \frac{W - \mu_W}{\sigma_W} = \frac{22.5 - 18}{3.464} = 1.30$
P-value: $P(Z \ge 1.30) = 0.0968$
Software tells us that the exact P-value, $P(W \ge 23)$, is 0.1.

Using technology
We prefer software that gives the exact P-value for the Wilcoxon test rather than the Normal approximation.
Neither Excel nor the TI-83 calculator has menu entries for rank tests.
Minitab offers only the Normal approximation.
Here is output from CrunchIt! for results of three tests that could be used to compare yields for the two groups of corn plots.

What hypotheses does Wilcoxon test?
If we assume that the populations are Normally distributed, we can use the two-sample t test for means.
H0: $\mu_1 = \mu_2$
Ha: $\mu_1 > \mu_2$
When the distribution may not be Normal, we might restate the hypotheses in terms of population medians rather than means.
H0: median1 = median2
Ha: median1 > median2
The Wilcoxon rank sum test will test the hypotheses above only if an additional condition is met: both populations must have distributions of the same shape.

What hypotheses does Wilcoxon test?
The same-shape condition is too strict to be reasonable in practice. A more useful statement of the hypotheses compares two continuous distributions, whether or not they have the same shape.
H0: the two distributions are the same
Ha: one has values that are systematically larger
These hypotheses are considered “nonparametric” because they do not include a parameter. They are just stated in words.

Dealing with ties in rank tests
Up until now, no two values in our data have been exactly the same. However, we often find observations tied at the same value.
The usual practice is to assign all tied values the average of the ranks they occupy.
In practice, software is required to use rank tests when the data contain tied values.

Matched pairs: the Wilcoxon signed rank test
Example: A study of early childhood education asked kindergarten students to tell fairy tales that had been read to them earlier in the week. Each child told two stories. The first had been read to them and the second had been read but also illustrated with pictures. An expert listened to a recording of the children and assigned a score for certain uses of language.
Here are the data for five low-progress readers in a pilot study:
Compare absolute values of the differences between before and after results.
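As a rough sketch of how absolute differences might be ranked, including the average-rank rule for ties described earlier, consider the following. The function name `average_ranks` and the differences are hypothetical; they are not the scores from the storytelling study.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end (0-based) occupy ranks start+1..end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical within-pair differences (not the study's data).
differences = [3, -1, 5, -1, 2]
abs_ranks = average_ranks([abs(d) for d in differences])
print(abs_ranks)  # [4.0, 1.5, 5.0, 1.5, 3.0]
```

The two differences with absolute value 1 would occupy ranks 1 and 2, so each receives the average rank 1.5. The signs of the differences are kept in `differences` so that each rank can still be matched to a positive or negative difference.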
Matched pairs: the Wilcoxon signed rank test
The test statistic is the sum of the ranks of the positive differences (highlighted in blue). This is the Wilcoxon signed rank statistic. Its value here is W+ = 4 + 5 = 9.

Matched pairs: the Wilcoxon signed rank test
Draw an SRS of size n from a population for a matched pairs study and take the differences in responses within pairs. Rank the absolute values of these differences. The sum W+ of the ranks for the positive differences is the Wilcoxon signed rank statistic. If the distribution of the responses is not affected by the different treatments within pairs, then W+ has mean and standard deviation:
$\mu_{W^+} = \frac{n(n + 1)}{4}$
$\sigma_{W^+} = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$
The Wilcoxon signed rank test rejects the hypothesis that there are no systematic differences within pairs when the rank sum W+ is far from its mean.

Matched pairs: the Wilcoxon signed rank test
For the storytelling example, W+ = 9 and n = 5, so the mean and standard deviation are:
$\mu_{W^+} = \frac{n(n + 1)}{4} = \frac{5(5 + 1)}{4} = 7.5$
$\sigma_{W^+} = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{\frac{5(6)(11)}{24}} = 3.708$
The observed value of W+ = 9 is only slightly larger than the mean. We now expect that the data are not statistically significant. The data show a small effect but not a significant one. A larger sample size may show a larger effect.
The P-value from software is 0.4062, which agrees with this conclusion.

The Normal approximation for W+
The distribution of the signed rank statistic when the null hypothesis (no difference) is true becomes approximately Normal as the sample size becomes large.
We can then use Normal probability calculations (with the continuity correction) to obtain approximate P-values for W+.
For the storytelling example (although n = 5 is not a large sample), our P-value is really $P(W^+ \ge 9)$, but with the continuity correction we change it to:
$P(W^+ \ge 8.5) = P\left(Z \ge \frac{8.5 - 7.5}{3.708}\right) = P(Z \ge 0.27) = 0.394$

Using technology
Minitab offers only the Normal approximation P-value.
Here is output from CrunchIt!
for results using the exact one-sided P-value for the Wilcoxon signed rank test and for the matched pairs t test.

Dealing with ties in the signed rank test
Ties among absolute differences:
Handle these just as in the rank sum test: assign average ranks.
This makes finding the P-value more complicated. There is no longer an exact distribution for W+.
The standard deviation needs to be adjusted before the Normal approximation can be used.
Ties within a pair:
These create a difference of 0 (before = after).
Because these differences are neither positive nor negative, we drop these pairs from our sample.
This only reduces the number of observations, n.

Comparing several samples: the Kruskal-Wallis test
ANOVA hypotheses: Data should come from independent random samples, all Normally distributed with the same standard deviation.
Kruskal-Wallis hypotheses:
1. Data should come from independent random samples; the response has a continuous (but not necessarily Normal) distribution.
2. Data should come from independent random samples, the response has a continuous (but not necessarily Normal) distribution, and the samples come from population distributions of the same shape (not necessarily Normal).
H0: $M_0 = M_1 = M_3 = M_9$
Ha: not all four medians are equal

Example: Weeds among the corn
Lamb’s-quarter is a common weed in corn fields. A researcher planted corn at the same rate in 16 small plots of ground, then randomly assigned the plots to 4 groups. He weeded the corn rows by hand to allow a fixed number of lamb’s-quarter plants to grow in each meter of corn row. These numbers were 0, 1, 3, and 9 in the four groups of plots. No other weeds were allowed to grow, and all plots received identical treatment except for the weeds.
Here are the yields of corn (bushels per acre) in each of the plots.

Example: Weeds among the corn
Here are the summary statistics for the corn yield.
Can we safely use ANOVA?
The standard deviations don’t pass the “largest s < 2 × (smallest s)” test, and there were outliers in the original data that cannot be removed.
Can we use the Kruskal-Wallis test for medians?
The different standard deviations suggest that the distributions do not all have the same shape.

Example: Weeds among the corn
Rank all 16 observations in order from smallest to largest.
Note the tied observations.

Kruskal-Wallis test statistic

Example: Weeds among the corn
Kruskal-Wallis test statistic:
Using Table F with df = 3, the P-value satisfies 0.10 < P < 0.15. We do not reject the null hypothesis.

Using technology
Minitab output for the Kruskal-Wallis test and ANOVA.

Alternate Slides
The following slides offer alternate software output data and examples for this presentation.

Using technology
We prefer software that gives the exact P-value for the Wilcoxon test rather than the Normal approximation.
Neither Excel nor the TI-83 calculator has menu entries for rank tests.
Minitab and JMP offer only the Normal approximation.
Here is output from CrunchIt! for results of three tests that could be used to compare yields for the two groups of corn plots.

Using technology
Minitab offers only the Normal approximation P-value.
Here is output from CrunchIt! for results using the exact one-sided P-value for the Wilcoxon signed rank test and for the matched pairs t test.

Using technology
JMP output for ANOVA and for the Kruskal-Wallis test.
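For readers without access to the packages named above, the Kruskal-Wallis calculation can be sketched directly. The code below assumes the standard form of the statistic, H = 12 / (N(N + 1)) × Σ(R_i² / n_i) − 3(N + 1), where R_i is the sum of the ranks in group i; it applies average ranks to ties but makes no further tie correction. The function names and the data are hypothetical and are not the corn yields from this chapter.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of the ranks occupied
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (standard form, no tie correction) for a list of samples."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)  # rank all N observations together
    N = len(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])  # R_i for this group
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(round(h, 3))  # 7.2
```

With I groups, H is usually compared with a chi-square distribution with I − 1 degrees of freedom, which matches the df = 3 used for the four weed groups.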