Chapter 12
Nonparametric Statistics
Copyright © 2004 Pearson Education, Inc.

12-1 Overview
12-2 Sign Test
12-3 Wilcoxon Signed-Ranks Test for Matched Pairs
12-4 Wilcoxon Rank-Sum Test for Two Independent Samples
12-5 Kruskal-Wallis Test
12-6 Rank Correlation
12-7 Runs Test for Randomness

Section 12-1 & 12-2
Overview and Sign Test
Created by Erin Hodgess, Houston, Texas

Overview: Definitions
Parametric tests require assumptions about the nature or shape of the populations involved.
Nonparametric tests do not require such assumptions. Consequently, they are called distribution-free tests.

Advantages of Nonparametric Methods
1. Nonparametric methods can be applied to a wide variety of situations because they do not have the more rigid requirements of the corresponding parametric methods. In particular, nonparametric methods do not require normally distributed populations.
2. Unlike parametric methods, nonparametric methods can often be applied to nonnumerical data, such as the genders of survey respondents.
3. Nonparametric methods usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply.

Disadvantages of Nonparametric Methods
1. Nonparametric methods tend to waste information because exact numerical data are often reduced to a qualitative form.
2. Nonparametric tests are not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis.

Efficiency of Nonparametric Methods
[Slide content not reproduced.]
Definition
Data are sorted when they are arranged according to some criterion, such as smallest to largest or best to worst. A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned a rank of 1, the second a rank of 2, and so on.

Example
Original scores:           5    3    40   10   12
Scores arranged in order:  3    5    10   12   40
Ranks:                     1    2    3    4    5

Handling Ties in Ranks
Find the mean of the ranks involved and assign this mean rank to each of the tied items.
Original scores:  3    5     5     10   12
Ranks:            1    2.5   2.5   4    5
(The two scores of 5 are tied for ranks 2 and 3, so each receives the mean rank 2.5.)

Sign Test: Definition
The sign test is a nonparametric (distribution-free) test that uses plus and minus signs to test different claims, including:
1) Claims involving matched pairs of sample data;
2) Claims involving nominal data;
3) Claims about the median of a single population.

Figure 12-1 Sign Test Procedure
[Flowchart not reproduced.]

Assumptions
1. The sample data have been randomly selected.
2. There is no requirement that the sample data come from a population with a particular distribution, such as a normal distribution.

Notation for Sign Test
x = the number of times the less frequent sign occurs
n = the total number of positive and negative signs combined
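The ranking rule described earlier in this section, including the mean-rank treatment of ties, can be sketched in a few lines of Python. The function name `mean_ranks` is an illustrative choice, not something taken from the slides; the two inputs are the score lists from the ranking examples.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover every item tied with the item at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; assign their mean.
        shared_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


no_ties = mean_ranks([5, 3, 40, 10, 12])    # ranks in the original score order
with_ties = mean_ranks([3, 5, 5, 10, 12])   # the two 5s share ranks 2 and 3
```

The ranks are returned in the order of the original scores, so the first list gives rank 2 to the score 5 and rank 1 to the score 3.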
Test Statistic for the Sign Test
For n ≤ 25: x (the number of times the less frequent sign occurs)
For n > 25:
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
Critical values:
For n ≤ 25, critical x values are in Table A-7.
For n > 25, critical z values are in Table A-2.

Claims Involving Matched Pairs
Convert the raw data to plus and minus signs as follows:
1. Subtract each value of the second variable from the corresponding value of the first variable.
2. Record only the sign of the difference found in step 1. Exclude ties: that is, any matched pairs in which both values are equal.

Key Principle of Sign Test
If the two sets of data have equal medians, the number of positive signs should be approximately equal to the number of negative signs.

Example: Intelligence in Children
Use the data in Table 12-2 with a 0.05 significance level to test the claim that there is no difference between the times of the first and second trials.
H0: The median of the differences is equal to 0.
H1: The median of the differences is not equal to 0.
α = 0.05
x = minimum(12, 2) = 2, with n = 12 + 2 = 14
Critical value = 2
Because x = 2 is less than or equal to the critical value of 2, we reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim of no difference between the times; that is, we reject the claim that the median of the differences is equal to 0.
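The conversion of matched pairs to signs can be sketched as follows. This is a minimal illustration, not the slides' own procedure in code form: the function name is an assumption, and the two short lists are hypothetical values made up for the sketch (Table 12-2 is not reproduced in this transcript).

```python
def sign_test_counts(first, second):
    """Return (x, n) for the sign test on matched pairs.

    x = number of times the less frequent sign occurs
    n = number of nonzero differences (ties are excluded)
    """
    differences = [a - b for a, b in zip(first, second)]
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    return min(positives, negatives), positives + negatives


# Hypothetical matched pairs, for illustration only.
first_trial = [30, 19, 19, 23, 29, 44]
second_trial = [30, 6, 14, 8, 14, 52]
x, n = sign_test_counts(first_trial, second_trial)
```

Here the first pair is a tie and is dropped, leaving four positive signs and one negative sign. For n ≤ 25 the resulting x would be compared with the critical value from Table A-7.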
Example: Gender Discrimination
Hatters Restaurant Chain hired 30 men and 70 women. Use the sign test and a 0.05 significance level to test the null hypothesis that men and women are hired equally by this company.
H0: p = 0.5
H1: p ≠ 0.5
x = minimum(30, 70) = 30, with n = 100
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(30 + 0.5) - \frac{100}{2}}{\frac{\sqrt{100}}{2}} = -3.90$$
With α = 0.05, the critical values are z = ±1.96. The test statistic z = -3.90 falls in the critical region, so we reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that hiring practices are fair.

Example: Body Temperature
Use the 106 temperatures in Data Set 4 on Day 2 with the sign test to test the claim that the median is less than 98.6°F. There are 68 subjects with temperatures less than 98.6°F, 23 subjects with temperatures greater than 98.6°F, and 15 subjects with temperatures equal to 98.6°F.
H0: Median is equal to 98.6°F.
H1: Median is less than 98.6°F.
The 15 ties are excluded, so n = 68 + 23 = 91 and x = 23.
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(23 + 0.5) - \frac{91}{2}}{\frac{\sqrt{91}}{2}} = -4.61$$
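As a check on the arithmetic, the large-sample (n > 25) test statistic can be evaluated directly. The function name `sign_test_z` is an illustrative choice; the inputs are the counts quoted in the two examples above.

```python
from math import sqrt


def sign_test_z(x, n):
    """Large-sample sign test statistic: z = ((x + 0.5) - n/2) / (sqrt(n)/2)."""
    return ((x + 0.5) - n / 2) / (sqrt(n) / 2)


# Gender discrimination example: 30 men and 70 women hired.
z_hiring = sign_test_z(min(30, 70), 30 + 70)

# Body temperature example: x = 23 is the less frequent sign, n = 91 after dropping 15 ties.
z_temperature = sign_test_z(23, 91)
```

Rounded to two decimals, these reproduce the slide values of -3.90 and -4.61.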
We use Table A-2 to get the critical z value of -1.645. The test statistic of z = -4.61 falls into the critical region, so we reject the null hypothesis. We support the claim that the median body temperature of healthy adults is less than 98.6°F.

Section 12-3
Wilcoxon Signed-Ranks Test for Matched Pairs

Definition
The Wilcoxon signed-ranks test is a nonparametric test that uses ranks of sample data consisting of matched pairs. It is used to test for differences in the population distributions.

Wilcoxon Signed-Ranks Test: Hypotheses
H0: The two samples come from populations with the same distribution.
H1: The two samples come from populations with different distributions.

Procedure for Finding the Value of the Test Statistic
Step 1: For each pair of data, find the difference d by subtracting the second score from the first. Keep the signs, but discard any pairs for which d = 0.
Step 2: Ignore the signs of the differences, then sort the differences from lowest to highest and replace the differences by the corresponding rank values. When differences have the same numerical value, assign to them the mean of the ranks involved in the tie.
Step 3: Attach to each rank the sign of the difference from which it came. That is, insert those signs that were ignored in step 2.
Step 4: Find the sum of the absolute values of the negative ranks. Also find the sum of the positive ranks.
Step 5: Let T be the smaller of the two sums found in step 4. Either sum could be used, but for a simplified procedure we arbitrarily select the smaller of the two sums.
Step 6: Let n be the number of pairs of data for which the difference d is not 0.
Step 7: Determine the test statistic and critical values based on the sample size, as shown below.
Step 8: When forming the conclusion, reject the null hypothesis if the sample data lead to a test statistic that is in the critical region, that is, if the test statistic is less than or equal to the critical value(s). Otherwise, fail to reject the null hypothesis.

Wilcoxon Signed-Ranks Test: Assumptions
1. The sample data have been randomly selected.
2. The population of differences (found from the pairs of data) has a distribution that is approximately symmetric, meaning that the left half of its histogram is roughly a mirror image of its right half. (There is no requirement that the data have a normal distribution.)

Notation
T = the smaller of the following two sums:
1. The sum of the absolute values of the negative ranks
2. The sum of the positive ranks

Test Statistic for the Wilcoxon Signed-Ranks Test for Matched Pairs
For n ≤ 30: T
For n > 30:
$$z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
Critical values:
For n ≤ 30, critical T values are in Table A-8.
For n > 30, critical z values are in Table A-2.

Example: Intelligence in Children
Use the data in Table 12-3 with the Wilcoxon signed-ranks test and a 0.05 significance level to test the claim that there is no difference between the times of the first and second trials.
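Steps 1 through 6 of the procedure, together with the large-sample z formula, can be sketched as below. This is a minimal illustration under stated assumptions: the function names are invented for the sketch, and the paired values are hypothetical (they are not the Table 12-3 data, which this transcript does not include).

```python
from math import sqrt


def mean_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + 1 + j + 1) / 2
        i = j + 1
    return ranks


def signed_ranks_T(first, second):
    """Return (T, n): the smaller signed-rank sum and the number of nonzero differences."""
    d = [a - b for a, b in zip(first, second) if a != b]        # Step 1
    ranks = mean_ranks([abs(x) for x in d])                     # Step 2
    negative_sum = sum(r for r, x in zip(ranks, d) if x < 0)    # Steps 3-4
    positive_sum = sum(r for r, x in zip(ranks, d) if x > 0)
    return min(negative_sum, positive_sum), len(d)              # Steps 5-6


def signed_ranks_z(T, n):
    """Test statistic for n > 30."""
    return (T - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Hypothetical matched pairs, for illustration only.
T, n = signed_ranks_T([10, 12, 9, 15, 8, 11], [7, 12, 10, 11, 6, 9])
z_large = signed_ranks_z(200, 40)   # hypothetical T = 200 with n = 40 pairs
```

In the hypothetical data the differences are 3, -1, 4, 2, 2 after the tie is discarded, so the only negative rank is 1 and T = 1 with n = 5. A useful sanity check is that the two rank sums always add up to n(n + 1)/2.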
H0: There is no difference between the times of the first and second trials.
H1: There is a difference between the times of the first and second trials.
The differences in row three of the table are found by computing first time - second time. The ranks of differences in row four of the table are found by ranking the absolute differences, handling ties by assigning the mean of the ranks. The signed ranks in row five of the table are found by attaching the sign of the differences to the ranks.
Sum of the absolute values of the negative ranks: 5.5
Sum of the positive ranks: 99.5
T = 5.5 (the smaller of the two sums)
Let n be the number of pairs where d ≠ 0, so n = 14.
Since n ≤ 30, T = 5.5 is the test statistic. Using Table A-8, the critical value is 21.
Since the test statistic (T = 5.5) is less than the critical value of 21, we reject the null hypothesis (Step 8 of the procedure). It appears that there is a difference between the times of the first and second trials.

Section 12-4
Wilcoxon Rank-Sum Test for Two Independent Samples

Definition
The Wilcoxon rank-sum test is a nonparametric test that uses ranks of sample data from two independent populations. It is used to test the null hypothesis that the two independent samples come from populations with the same distribution. (That is, the two populations are identical.)
Key Idea
If two samples are drawn from identical populations and the individual values are all ranked as one combined collection of values, then the high and low ranks should fall evenly between the two samples.

Assumptions
1. There are two independent samples that were randomly selected.
2. Each of the two samples has more than 10 values.
3. There is no requirement that the two populations have a normal distribution or any other particular distribution.

Procedure for Finding the Value of the Test Statistic
1. Temporarily combine the two samples into one big sample, then replace each sample value with its rank.
2. Find the sum of the ranks for either one of the two samples.
3. Calculate the value of the z test statistic as shown next, where either sample can be used as "sample 1".

Notation for the Wilcoxon Rank-Sum Test
n1 = size of sample 1
n2 = size of sample 2
R1 = sum of ranks for sample 1
R2 = sum of ranks for sample 2
R = same as R1 (sum of ranks for sample 1)
μR = mean of the sample R values that is expected when the two populations are identical
σR = standard deviation of the sample R values that is expected when the two populations are identical

Test Statistic for the Wilcoxon Rank-Sum Test for Two Independent Samples
$$z = \frac{R - \mu_R}{\sigma_R}$$
where
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
n1 = size of the sample from which the rank sum R is found
n2 = size of the other sample
R = sum of ranks of the sample with size n1

Critical Values
Critical values can be found in Table A-2 (because the test statistic is based on the normal distribution).
Example: Rowling and Tolstoy
Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution.
H0: The Rowling and Tolstoy books have Flesch Reading Ease scores with the same distribution.
H1: The Rowling and Tolstoy books have distributions of Flesch Reading Ease scores that are different in some way.
R = 24 + 22 + 18 + ... + 9.5 = 236.5
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{13(13 + 12 + 1)}{2} = 169$$
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(13)(12)(13 + 12 + 1)}{12}} = 18.385$$
$$z = \frac{R - \mu_R}{\sigma_R} = \frac{236.5 - 169}{18.385} = 3.67$$
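The three quantities in this example can be recomputed with a short script. The function name `rank_sum_z` is an assumption made for the sketch; the inputs (R = 236.5, n1 = 13, n2 = 12) are the values quoted above.

```python
from math import sqrt


def rank_sum_z(R, n1, n2):
    """Return (mu_R, sigma_R, z) for the Wilcoxon rank-sum test.

    R is the rank sum of the sample of size n1; n2 is the size of the other sample.
    """
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mu_R, sigma_R, (R - mu_R) / sigma_R


mu_R, sigma_R, z = rank_sum_z(236.5, 13, 12)
```

Rounding gives μR = 169, σR = 18.385, and z = 3.67, matching the hand calculation.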
We have a two-tailed test with α = 0.05, so the critical values are z = 1.96 and z = -1.96. The test statistic of 3.67 falls in the critical region, so we reject the null hypothesis that the Rowling and Tolstoy books have reading scores with the same distribution.

Example: Wednesday and Saturday Rain
Use the data from the Chapter Problem (shown in the Minitab printout) with the Wilcoxon rank-sum test to test the claim that the rainfall amounts for Wednesdays and Saturdays have the same distribution.
H0: The Wednesday and Saturday rainfall amounts come from populations with the same distribution.
H1: The two distributions are different in some way.
The rank sum is W = 2639.0, and the P-value is 0.2773 (or 0.1992 after adjustment for ties). We fail to reject the null hypothesis. The differences between Wednesday and Saturday are not significant.

Section 12-5
Kruskal-Wallis Test

Kruskal-Wallis Test (also called the H test)
Definition
The Kruskal-Wallis test is a nonparametric test that uses ranks of sample data from three or more independent populations. It is used to test the null hypothesis that the independent samples come from populations with the same distribution.
Hypotheses
H0: The samples come from populations with the same distribution.
H1: The samples come from populations with different distributions.

We compute the test statistic H, which has a distribution that can be approximated by the chi-square (χ²) distribution as long as each sample has at least 5 observations.

Procedure for Finding the Value of the Test Statistic
1. Temporarily combine all samples into one big sample and assign a rank to each sample value. (Sort from lowest to highest, and in cases of ties, assign each observation the mean of the ranks involved.)
2. For each sample, find the sum of the ranks and find the sample size.
3. Calculate H by using the results of step 2 and the test statistic formula given below.

Assumptions
1. We have at least three independent samples, all of which are randomly selected.
2. Each sample has at least 5 observations.
3. There is no requirement that the populations have a normal distribution or any other particular distribution.

Notation for the Kruskal-Wallis Test
• N = total number of observations combined
• k = number of samples
• R1 = sum of ranks for sample 1
• n1 = number of observations in sample 1
• For sample 2, the sum of ranks is R2 and the number of observations is n2, and similar notation is used for the other samples.

Test Statistic for the Kruskal-Wallis Test
$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$
• where degrees of freedom = k - 1

Critical Values
1. The test is right-tailed.
2. Use Table A-4 (because the H test statistic can be approximated by the χ² distribution).
3. Degrees of freedom = k - 1

Example: Clancy, Rowling and Tolstoy
Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution.
H0: The populations of the readability scores for pages from the three books are identical.
H1: The three populations are not identical.
n1 = 12, n2 = 12, n3 = 12, N = 36
R1 = 201.5, R2 = 337, R3 = 127.5
$$H = \frac{12}{36(36+1)}\left(\frac{201.5^2}{12} + \frac{337^2}{12} + \frac{127.5^2}{12}\right) - 3(36+1) = 16.949$$
The critical value is χ² = 5.991, which corresponds to 2 degrees of freedom and a 0.05 level of significance. Because H = 16.949 exceeds the critical value, we reject the null hypothesis that the three populations are identical.

Example: Rains More on Weekends?
Use Data Set 11 in Appendix B to test the claim that the seven weekdays have distributions that are not all the same.
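The H statistic from the Clancy, Rowling and Tolstoy example can be reproduced from the rank sums and sample sizes alone. The function name below is an illustrative assumption; the numbers are those quoted in that example.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with N the combined sample size."""
    N = sum(sizes)
    weighted_sum = sum(R ** 2 / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * weighted_sum - 3 * (N + 1)


rank_sums = [201.5, 337, 127.5]
sizes = [12, 12, 12]
H = kruskal_wallis_h(rank_sums, sizes)
degrees_of_freedom = len(sizes) - 1   # k - 1
```

A quick consistency check before computing H is that the rank sums add up to N(N + 1)/2, which is 666 for N = 36.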
H0: The populations of the weekday rainfall data are identical.
H1: The populations of the weekday rainfall data are not identical.
The test statistic is H = 3.85 (adjusted for ties), and the P-value is 0.697. We fail to reject the null hypothesis. There is not enough evidence to support a claim that the rainfall amounts on the seven weekdays have distributions that are not all the same.

Section 12-6
Rank Correlation

Rank Correlation: Definition
Rank correlation uses the ranks of sample data consisting of matched pairs. The rank correlation test is used to test for an association between two variables.
H0: ρs = 0 (There is no correlation between the two variables.)
H1: ρs ≠ 0 (There is a correlation between the two variables.)

Advantages
1. The nonparametric method of rank correlation can be used in a wider variety of circumstances than the parametric method of linear correlation. With rank correlation, we can analyze paired data that are ranks or can be converted to ranks.
2. Rank correlation can be used to detect some (not all) relationships that are not linear.
3. The computations for rank correlation are much simpler than the computations for linear correlation, as can be readily seen by comparing the formulas used to compute these statistics.

Disadvantages
A disadvantage of rank correlation is its efficiency rating of 0.91, as described in Section 12-1.
This efficiency rating shows that, with all other circumstances being equal, the nonparametric approach of rank correlation requires 100 pairs of sample data to achieve the same results as only 91 pairs of sample observations analyzed through parametric methods.

Assumptions
1. The sample data have been randomly selected.
2. Unlike the parametric methods of Section 9-2, there is no requirement that the sample pairs of data have a bivariate normal distribution. There is no requirement of a normal distribution for any population.

Notation
rs = rank correlation coefficient for sample paired data (rs is a sample statistic)
ρs = rank correlation coefficient for all the population data (ρs is a population parameter)
n = number of pairs of data
d = difference between ranks for the two values within a pair
rs is often called Spearman's rank correlation coefficient.

Test Statistic for the Rank Correlation Coefficient
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where each value of d is a difference between the ranks for a pair of sample data.
Critical values:
If n ≤ 30, refer to Table A-9.
If n > 30, use Formula 12-1.

Formula 12-1
$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$
(critical values when n > 30), where the value of z corresponds to the significance level.

Figure 12-4 Rank Correlation for Testing H0: ρs = 0
1. Are the n pairs of data in the form of ranks? If not, convert the data of the first sample to ranks from 1 to n and then do the same for the second sample.
2. Calculate the difference d for each pair of ranks by subtracting the lower rank from the higher rank.
3. Square each difference d and then find the sum of those squares to get Σd².
4. Let n equal the number of pairs of data, and complete the computation of $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ to get the sample statistic.
5. Find the critical values. If n ≤ 30, find the critical values of rs in Table A-9. If n > 30, calculate them from $r_s = \frac{\pm z}{\sqrt{n - 1}}$, where z corresponds to the significance level.
6. If the sample statistic rs is positive and exceeds the positive critical value, there is a correlation. If the sample statistic rs is negative and is less than the negative critical value, there is a correlation. If the sample statistic rs is between the positive and negative critical values, there is no correlation.

Example: Perceptions of Beauty
Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of α = 0.05.
H0: ρs = 0
H1: ρs ≠ 0
n = 10
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(74)}{10(10^2 - 1)} = 0.552$$
We refer to Table A-9 to determine that the critical values are ±0.648. Because the test statistic of rs = 0.552 does not exceed the critical value of 0.648, we fail to reject the null hypothesis. There is not sufficient evidence to support a claim of a correlation between the rankings of men and women.
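The rank correlation computation and Formula 12-1 can be sketched as follows. The function names are illustrative assumptions, and the two short rank lists are hypothetical (Table 12-6 is not reproduced here); the value Σd² = 74 with n = 10 is taken from the example above.

```python
from math import sqrt


def sum_squared_rank_differences(ranks_x, ranks_y):
    """Sum of d^2, where d is the difference between the two ranks in each pair."""
    return sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))


def rank_correlation(sum_d_squared, n):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def critical_rs(z, n):
    """Formula 12-1: positive critical value z / sqrt(n - 1), intended for n > 30."""
    return z / sqrt(n - 1)


# Value from the Perceptions of Beauty example.
rs_beauty = rank_correlation(74, 10)

# Hypothetical ranks for five pairs, for illustration only.
d2 = sum_squared_rank_differences([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
rs_hypothetical = rank_correlation(d2, 5)
```

Because d is squared, subtracting the lower rank from the higher rank (as the flowchart describes) gives the same Σd² as subtracting in a fixed order. The formula as written assumes there are no tied ranks within either variable.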
Example: Perceptions of Beauty with Large Samples
Assume that the preceding example is expanded by including a total of 40 women and that the test statistic rs is found to be 0.291. If the significance level is α = 0.05, what do you conclude about the correlation?
$$r_s = \frac{\pm z}{\sqrt{n - 1}} = \frac{\pm 1.96}{\sqrt{40 - 1}} = \pm 0.314$$
These are the critical values.
The test statistic of rs = 0.291 does not exceed the critical value of 0.314, so we fail to reject the null hypothesis. There is not sufficient evidence to support the claim of a correlation between the rankings of men and women.

Example: Detecting a Nonlinear Pattern
The data in Table 12-7 are the numbers of games played and the last scores (in millions) of a Raiders of the Lost Ark pinball game. We expect that there should be an association between the number of games played and the pinball score. Is there sufficient evidence to support the claim that there is such an association?
H0: ρs = 0
H1: ρs ≠ 0
n = 9
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{9(9^2 - 1)} = 0.950$$
We use Table A-9 to get the critical values of ±0.683. The sample statistic of 0.950 exceeds the critical value of 0.683, so we conclude that there is a significant correlation. Higher numbers of games played appear to be associated with higher scores.

Section 12-7
Runs Test for Randomness
Runs Test for Randomness: Definitions
Run: A run is a sequence of data having the same characteristic; the sequence is preceded and followed by data with a different characteristic or by no data at all.
Runs test: The runs test uses the number of runs in a sequence of sample data to test for randomness in the order of the data.

Fundamental Principle of the Runs Test
Reject randomness if the number of runs is very low or very high.

Examples
DDDDRRDDDR has 4 runs:
DDDD (1st run), RR (2nd run), DDD (3rd run), R (4th run)

DDDDDRRRRR has only 2 runs. If the number of runs is very low, randomness is lacking.
DRDRDRDRDR has 10 runs. If the number of runs is very high, randomness is lacking.

Assumptions
1. The sample data are arranged according to some ordering scheme, such as the order in which the sample values were obtained.
2. Each data value can be categorized into one of two separate categories.
3. The runs test for randomness is based on the order in which the data occur; it is not based on the frequency of the data.

Notation
n1 = number of elements in the sequence that have one particular characteristic (The characteristic chosen for n1 is arbitrary.)
n2 = number of elements in the sequence that have the other characteristic
G = number of runs

Small Sample Cases
Table A-10 applies when:
1. We are using 5% as the cutoff for sequences that have too few or too many runs,
2. n1 ≤ 20, and
3. n2 ≤ 20
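Counting runs by hand is error-prone for long sequences, so a small helper can be useful. This is a minimal sketch using `itertools.groupby` from the Python standard library; the function name `summarize_runs` is an assumption, and the input strings are the three sequences from the examples above.

```python
from collections import Counter
from itertools import groupby


def summarize_runs(sequence):
    """Return (runs, G, counts) for a sequence of two-category symbols.

    runs   = list of consecutive blocks of identical symbols
    G      = number of runs
    counts = how many times each symbol occurs (these give n1 and n2)
    """
    runs = ["".join(block) for _, block in groupby(sequence)]
    return runs, len(runs), Counter(sequence)


runs, G, counts = summarize_runs("DDDDRRDDDR")
_, G_low, _ = summarize_runs("DDDDDRRRRR")
_, G_high, _ = summarize_runs("DRDRDRDRDR")
```

For the first sequence the helper splits the data into DDDD, RR, DDD, R, with seven D symbols and three R symbols.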
Large Sample Cases
Formula 12-2
$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
Formula 12-3
$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
where μG = mean of the number of runs G, σG = standard deviation of the number of runs G, and the distribution of the number of runs G is approximately normal.

Test Statistic for the Runs Test for Randomness
If α = 0.05 and n1 ≤ 20 and n2 ≤ 20, the test statistic is G.
If α ≠ 0.05 or n1 > 20 or n2 > 20, the test statistic is
$$z = \frac{G - \mu_G}{\sigma_G}$$
Critical values:
If the test statistic is G, critical values are found in Table A-10.
If the test statistic is z, critical values are found in Table A-2 by using the same procedures introduced in Chapter 6.

Figure 12-5 Runs Test for Randomness
[Flowchart not reproduced.]

Example: Basketball Foul Shots
In the course of a game, WNBA player Cynthia Cooper shoots 12 free throws. Denoting shots made by "H" and shots missed by "M", her results are as follows: H, H, H, M, H, H, H, H, M, M, M, H. Use a 0.05 significance level to test for randomness in the sequence of hits and misses.
There are 8 hits, 4 misses, and 5 runs, so we have n1 = 8, n2 = 4, and G = 5. The test statistic is G = 5, and we refer to Table A-10 to find the critical values of 3 and 10. Because G = 5 falls between the critical values, we do not reject randomness. There is not sufficient evidence to warrant rejection of the claim that the hits and misses occur randomly.

Example: Boston Rainfall on Mondays
Refer to the rainfall amounts for Boston as listed in Data Set 11 in Appendix B. Is there sufficient evidence to support the claim that rain on Mondays is not random?
H0: The sequence is random.
H1: The sequence is not random.
n1 = 33, n2 = 19, G = 30
$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(33)(19)}{33 + 19} + 1 = 25.115$$
$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{2(33)(19)\,[2(33)(19) - 33 - 19]}{(33 + 19)^2 (33 + 19 - 1)}} = 3.306$$
$$z = \frac{G - \mu_G}{\sigma_G} = \frac{30 - 25.115}{3.306} = 1.48$$
The critical values are z = ±1.96, since α = 0.05 and the test is two-tailed. The test statistic of 1.48 does not fall within the critical region. We fail to reject the null hypothesis of randomness. The given sequence does appear to be random.
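The Boston rainfall arithmetic can be verified with a direct implementation of Formulas 12-2 and 12-3. The function name `runs_test_z` is an illustrative assumption; the inputs are the values n1 = 33, n2 = 19, and G = 30 from the example.

```python
from math import sqrt


def runs_test_z(G, n1, n2):
    """Return (mu_G, sigma_G, z) for the large-sample runs test."""
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return mu_G, sigma_G, (G - mu_G) / sigma_G


mu_G, sigma_G, z = runs_test_z(30, 33, 19)
reject_randomness = abs(z) > 1.96   # two-tailed test at the 0.05 level
```

Rounded, the script gives μG = 25.115, σG = 3.306, and z = 1.48, so the test statistic stays inside the critical values of ±1.96.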