Chapter Seven: Multi-Sample Methods
1/52

Introduction
The independent samples t test and the independent samples Z test for a difference between proportions are designed to analyze data from research designs that employ two groups of subjects. You will now study methods that can be used to analyze data from two or more groups. We will also consider a few multiple comparison procedures (MCPs).
7.1 Introduction 2/52

One-Way ANOVA: Hypotheses
You will recall that the null hypothesis tested by the independent samples t test is
$$H_0: \mu_1 = \mu_2$$
This can be interpreted as asserting that the treatments afforded two groups of subjects were equal in their effect. The null hypothesis tested by the One-Way ANOVA F test is
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
which can be interpreted as asserting that the treatments afforded a specified number of groups ($k$) of subjects were equal in their effect.
7.2 One-Way ANOVA F Test 3/52

Hypotheses (continued)
For a three group design the null hypothesis is
$$H_0: \mu_1 = \mu_2 = \mu_3$$
The alternative is any condition that makes the null hypothesis false, which for the three group design would be
1. $\mu_1 = \mu_2 \neq \mu_3$
2. $\mu_1 \neq \mu_2 = \mu_3$
3. $\mu_1 = \mu_3 \neq \mu_2$
4. $\mu_1 \neq \mu_2 \neq \mu_3$
4/52

Obtained F
The test statistic for the One-Way ANOVA F test is a ratio given by
$$F = \frac{MS_b}{MS_w}$$
The numerator is termed the mean square between ($MS_b$) while the denominator is termed the mean square within ($MS_w$).
5/52

The Mean Square Within ($MS_w$)
The mean square within ($MS_w$) is also a ratio and is defined as
$$MS_w = \frac{SS_w}{N - k}$$
Here $SS_w$ is the sum of squares within, $N$ is the total number of observations, and $k$ is the number of groups. The quantity $N - k$ is termed the denominator degrees of freedom. For example, if there are three groups with five subjects in each, $N = 15$, $k = 3$ and the denominator degrees of freedom is $15 - 3 = 12$.
6/52

The Sum of Squares Within ($SS_w$)
The sum of squares within is the sum of the sums of squares for the individual groups, or
$$SS_w = SS_1 + SS_2 + \cdots + SS_k$$
The sum of squares for a given group can be calculated by
$$SS = \sum (x - \bar{x})^2$$
or equivalently
$$SS = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
7/52

The Sum of Squares Within ($SS_w$) (continued)
Thus, $SS_w$ can be calculated by
$$SS_w = \left[\sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1}\right] + \left[\sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2}\right] + \cdots + \left[\sum x_k^2 - \frac{\left(\sum x_k\right)^2}{n_k}\right]$$
8/52

Example
The (fictitious) data in the accompanying table represent the weights of subjects who have been engaged in three different dieting regimens. Use these data to calculate $MS_w$.

Diet One    Diet Two    Diet Three
198         214         174
211         200         176
240         259         213
189         194         201
178         188         158
9/52

Solution
The sums of squares for the three individual groups are as follows.
$$SS_1 = \sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1} = 208730 - \frac{(1016)^2}{5} = 2278.8$$
$$SS_2 = \sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2} = 225857 - \frac{(1055)^2}{5} = 3252.0$$
$$SS_3 = \sum x_3^2 - \frac{\left(\sum x_3\right)^2}{n_3} = 171986 - \frac{(922)^2}{5} = 1969.2$$
10/52

Solution (continued)
$SS_w$ is then by Equation 7.4
$$SS_w = SS_1 + SS_2 + SS_3 = 2278.8 + 3252.0 + 1969.2 = 7500.0$$
Then by Equation 7.3
$$MS_w = \frac{SS_w}{N - k} = \frac{7500.0}{15 - 3} = 625$$
11/52

The Mean Square Between ($MS_b$)
As with the mean square within, the mean square between is a ratio of a sum of squares to a degrees of freedom. More precisely,
$$MS_b = \frac{SS_b}{k - 1}$$
where $SS_b$ is the sum of squares between and $k$ is the number of groups. The quantity $k - 1$ is termed the numerator degrees of freedom. For example, if there are three groups the numerator degrees of freedom is $3 - 1 = 2$.
12/52

The Sum of Squares Between ($SS_b$)
$$SS_b = \frac{\left(\sum_{i=1}^{n_1} x_{i1}\right)^2}{n_1} + \frac{\left(\sum_{i=1}^{n_2} x_{i2}\right)^2}{n_2} + \cdots + \frac{\left(\sum_{i=1}^{n_k} x_{ik}\right)^2}{n_k} - \frac{\left(\sum_{\text{All}} x_{..}\right)^2}{N}$$
The terms before the minus sign indicate that the observations in each group are to be summed, the sum squared, and the result divided by the number of observations in the group. This calculation is carried out for each group and the results are then summed. The term after the minus sign indicates that all observations are to be summed and the result squared. This term is divided by $N$, which represents the total number of observations, i.e., $n_1 + n_2 + \cdots + n_k$.
13/52

Example
Use the data in the table on slide #9 to calculate $MS_b$. Then calculate the One-Way ANOVA F statistic.
14/52

Solution
By Equation 7.8
$$SS_b = \frac{\left(\sum_{i=1}^{n_1} x_{i1}\right)^2}{n_1} + \frac{\left(\sum_{i=1}^{n_2} x_{i2}\right)^2}{n_2} + \frac{\left(\sum_{i=1}^{n_3} x_{i3}\right)^2}{n_3} - \frac{\left(\sum_{\text{All}} x_{..}\right)^2}{N}$$
$$= \frac{(1016)^2}{5} + \frac{(1055)^2}{5} + \frac{(922)^2}{5} - \frac{(2993)^2}{15}$$
$$= 599073 - 597203.267 = 1869.73$$
15/52

Solution (continued)
Dividing $SS_b$ by the numerator degrees of freedom of $k - 1 = 3 - 1 = 2$ yields
$$MS_b = \frac{1869.73}{2} = 934.87 \; [1]$$
Using the mean square within calculation from slide #11
$$F = \frac{MS_b}{MS_w} = \frac{934.87}{625.00} = 1.50$$
Thus, obtained F for the test of significance is 1.50.
[1] The result of 934.88 provided on page 269 of the text is based on a different calculation in which rounding was done a bit differently.
16/52

The Test of Significance
The test of significance is conducted by comparing obtained F to critical F. If the former is greater than or equal to the latter, the null hypothesis is rejected. Otherwise, the null hypothesis is not rejected. Critical F is obtained by first noting that the numerator degrees of freedom for the analysis are $k - 1 = 3 - 1 = 2$ and the denominator degrees of freedom are $N - k = 15 - 3 = 12$. To use Appendix C, the numerator degrees of freedom are located across the top of the table and the denominator degrees of freedom down the side. For $\alpha = .05$ with 2 and 12 degrees of freedom, Appendix C shows that critical F is 3.89.
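As a check on the hand calculations, the following Python sketch recomputes $SS_w$, $SS_b$, both mean squares and obtained F from the slide #9 data using the computational formulas given earlier. The function name `one_way_anova` and the column-by-column reading of the data table (which matches the group sums 1016, 1055 and 922) are assumptions of this sketch, and the critical value is simply copied from the Appendix C lookup quoted above rather than computed.

```python
# Weights from the table on slide #9, read column by column (assumption:
# this ordering reproduces the group sums 1016, 1055 and 922 used in the text).
diets = {
    "Diet One": [198, 211, 240, 189, 178],
    "Diet Two": [214, 200, 259, 194, 188],
    "Diet Three": [174, 176, 213, 201, 158],
}


def one_way_anova(groups):
    """Return (SSw, SSb, MSw, MSb, F) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)
    # SSw: sum over groups of [sum(x^2) - (sum x)^2 / n]
    ss_within = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups)
    # SSb: sum over groups of (sum x)^2 / n, minus (grand sum)^2 / N
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - grand_sum ** 2 / n_total
    ms_within = ss_within / (n_total - k)   # denominator df = N - k
    ms_between = ss_between / (k - 1)       # numerator df = k - 1
    return ss_within, ss_between, ms_within, ms_between, ms_between / ms_within


ss_w, ss_b, ms_w, ms_b, f_obtained = one_way_anova(list(diets.values()))

CRITICAL_F = 3.89  # alpha = .05 with 2 and 12 df, as read from Appendix C in the text
reject_null = f_obtained >= CRITICAL_F

print(f"SSw = {ss_w:.1f}, SSb = {ss_b:.2f}")
print(f"MSw = {ms_w:.2f}, MSb = {ms_b:.2f}, F = {f_obtained:.2f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

Under these assumptions the script should agree with the slide values to rounding, which makes it a convenient way to catch arithmetic slips when the data change.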
In this case 1.50 is not greater than or equal to 3.89, so the null hypothesis is not rejected.
17/52

Example
Suppose a study is conducted with treatments being administered to four independent groups of subjects. Suppose further that $n_1 = 5$, $n_2 = 7$, $n_3 = 8$ and $n_4 = 4$. Obtained F is calculated to be 4.19. Use this information to conduct a One-Way ANOVA F test at $\alpha = .05$. What is the null hypothesis being tested? What is your decision regarding the null hypothesis?
18/52

Solution
The numerator degrees of freedom are $k - 1 = 4 - 1 = 3$. $N = 5 + 7 + 8 + 4 = 24$ so that the denominator degrees of freedom are $N - k = 24 - 4 = 20$. Reference to Appendix C gives critical F as 3.10. Because obtained F of 4.19 is greater than critical F of 3.10, the null hypothesis
$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
is rejected.
19/52

The ANOVA Table
The results of a One-Way ANOVA analysis are traditionally reported in a table similar to the one shown here.

Source of Variation   Sum of Squares   df      Mean Squares    F Ratio       Critical F   p-value
Between               SS_b             k - 1   SS_b / (k - 1)  MS_b / MS_w   (table)      (computer)
Within                SS_w             N - k   SS_w / (N - k)
Total                 SS_t             N - 1
20/52

Assumptions
The assumptions underlying the ANOVA F test are the same as those underlying the independent samples t test, namely
1. Population normality
2. Homogeneous variances
3. Independence of observations
21/52

The 2 By k Chi-Square Test: Hypotheses
In Chapter 6 on page 230 you learned to test the null hypothesis
$$H_0: \pi_1 = \pi_2$$
by means of an independent samples Z test. The 2 by k chi-square test extends this concept to test for equality of any number of proportions. This null hypothesis is stated as
$$H_0: \pi_1 = \pi_2 = \cdots = \pi_k$$
which asserts that all population proportions are equal. The notation indicates that the equality extends to any number of groups with the last group characterized as group $k$.
7.3 The 2 By k Chi-Square Test 22/52

The Alternative Hypothesis
The alternative hypothesis is any condition that renders the null hypothesis false. Thus, given three groups, any of the following conditions, barring a Type II error, would cause rejection of the null hypothesis.
1. $\pi_1 = \pi_2 \neq \pi_3$
2. $\pi_1 \neq \pi_2 = \pi_3$
3. $\pi_1 = \pi_3 \neq \pi_2$
4. $\pi_1 \neq \pi_2 \neq \pi_3$
When the null hypothesis is rejected, there is no way to know which of the four conditions listed above caused the rejection.
23/52

Obtained $\chi^2$
As with other statistics with which you are now familiar, the hypothesis test is carried out by calculating an obtained value and then comparing it to a critical value. For the chi-square test the obtained value is calculated by
$$\chi^2 = \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right]$$
where $f_o$ and $f_e$ are referred to respectively as the observed and expected frequencies. The observed frequency is simply the number of outcomes occurring in the given cell as shown in the table on the next slide (slide #25).
24/52

2 by 3 Chi-Square Table
In this table we have used double subscripts to indicate the row and column of each cell entry.

             Group One    Group Two    Group Three
Outcome 1    $f_{o11}$    $f_{o12}$    $f_{o13}$
             $f_{e11}$    $f_{e12}$    $f_{e13}$
Outcome 2    $f_{o21}$    $f_{o22}$    $f_{o23}$
             $f_{e21}$    $f_{e22}$    $f_{e23}$
25/52

Observed ($f_o$) and expected ($f_e$) frequencies
The observed frequency is simply the number of outcomes falling into a given cell of the chi-square table. The expected frequency represents the expected number of outcomes to be found in each cell if the null hypothesis is true. The expected frequency is calculated as follows.
$$f_e = \frac{(N_R)(N_C)}{N}$$
where $N_R$ is the row total for the cell whose expected frequency is being calculated, $N_C$ is the column total for the same cell, and $N$ is the total number of observations.
26/52

Example
Suppose that in the treatment of a terminal illness, the following results are obtained.
Of the patients receiving treatment one, 17 are dead at the end of five years while 52 are still alive. For treatment two, 29 are dead while 54 remain alive, and for treatment three 11 are dead and 26 remain alive. Use these data to construct a chi-square table, then test the hypothesis $H_0: \pi_1 = \pi_2 = \pi_3$.
27/52

Solution
We begin by placing the observed frequency of each cell into a chi-square table as shown on the next slide (#29). We then calculate the expected frequency for each cell as follows.
$$f_{e11} = \frac{(N_D)(N_{G1})}{N} = \frac{(57)(69)}{189} = 20.81$$
$$f_{e12} = \frac{(N_D)(N_{G2})}{N} = \frac{(57)(83)}{189} = 25.03$$
$$f_{e13} = \frac{(N_D)(N_{G3})}{N} = \frac{(57)(37)}{189} = 11.16$$
$$f_{e21} = \frac{(N_A)(N_{G1})}{N} = \frac{(132)(69)}{189} = 48.19$$
$$f_{e22} = \frac{(N_A)(N_{G2})}{N} = \frac{(132)(83)}{189} = 57.97$$
$$f_{e23} = \frac{(N_A)(N_{G3})}{N} = \frac{(132)(37)}{189} = 25.84$$
28/52

Solution (continued)

          Group One   Group Two   Group Three   Total
Dead      [17]        [29]        [11]          57
          (20.81)     (25.03)     (11.16)
Alive     [52]        [54]        [26]          132
          (48.19)     (57.97)     (25.84)
29/52

Solution (continued)
Obtained chi-square is then
$$\chi^2 = \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right]$$
$$= \frac{(17 - 20.81)^2}{20.81} + \frac{(29 - 25.03)^2}{25.03} + \frac{(11 - 11.16)^2}{11.16} + \frac{(52 - 48.19)^2}{48.19} + \frac{(54 - 57.97)^2}{57.97} + \frac{(26 - 25.84)^2}{25.84}$$
$$= .70 + .63 + .00 + .30 + .27 + .00 = 1.9$$
30/52

Solution (continued)
The critical value is obtained by entering Appendix D with $k - 1$ degrees of freedom where $k$ is the number of groups. For $\alpha = .05$ and $3 - 1 = 2$ degrees of freedom, critical $\chi^2$ is 5.991. The null hypothesis is rejected when obtained chi-square is greater than or equal to critical chi-square. Because 1.9 is less than 5.991, the null hypothesis is not rejected. We conclude, therefore, that a difference between population proportions cannot be demonstrated. In research terms, we conclude that we could not show a difference in the effectiveness of the three treatments.
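The expected-frequency and chi-square arithmetic for this example can be reproduced with a short Python sketch. The function name `chi_square_2_by_k` and the dictionary layout are illustrative assumptions, and the critical value 5.991 is taken from the text rather than computed. Because the script keeps the expected frequencies unrounded, its result may differ from the hand calculation in the third decimal place.

```python
# Observed frequencies for treatments one, two and three (from the example).
observed = {
    "Dead": [17, 29, 11],
    "Alive": [52, 54, 26],
}


def chi_square_2_by_k(table):
    """Return (expected frequencies, obtained chi-square) for a 2 by k table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    # fe = (row total)(column total) / N for every cell
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # chi-square = sum over all cells of (fo - fe)^2 / fe
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(rows, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return expected, chi2


expected, chi2_obtained = chi_square_2_by_k(observed)

CRITICAL_CHI2 = 5.991  # alpha = .05 with k - 1 = 2 df, as read from Appendix D in the text
reject_null = chi2_obtained >= CRITICAL_CHI2

for label, row in zip(observed, expected):
    print(label, [round(fe, 2) for fe in row])
print(f"chi-square = {chi2_obtained:.2f}")
print("Reject H0" if reject_null else "Do not reject H0")
```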
31/52

Multiple Comparison Procedures: Introduction
You have learned that rejecting a true null hypothesis when conducting a significance test results in a Type I error and that the probability of such is $\alpha$. For the purposes that follow we will term this type of rejection a Per Comparison Error (PCE) and will symbolize the probability of such as $\alpha_{PCE}$. A Familywise Error (FWE) occurs when one or more true null hypotheses are rejected in a series of tests. The probability of such is symbolized $\alpha_{FWE}$.
7.4 Multiple Comparison Procedures 32/52

Introduction (continued)
Familywise errors occur in two broad contexts.
1. Multiple comparison analysis refers to the situation where multiple groups are being compared on a single outcome variable.
2. Multiple endpoint analysis refers to the situation where two groups are being compared on multiple outcome measures.
33/52

Determinants of Familywise Error
The following observations are demonstrated in the table on the following slide (#35).
Other factors remaining fixed, as the number of comparisons (i.e., significance tests) increases, $\alpha_{FWE}$ increases.
Other factors remaining fixed, as $\alpha_{PCE}$ decreases (increases), $\alpha_{FWE}$ decreases (increases).
34/52

Relationship Between $\alpha_{PCE}$ and $\alpha_{FWE}$

$\alpha_{PCE}$   Number of Groups   Number of Comparisons   $\alpha_{FWE}$
.05              3                  3                       .122
.05              5                  10                      .286
.05              10                 45                      .630
.05              20                 190                     .920
.01              3                  3                       .027
.01              5                  10                      .075
.01              10                 45                      .231
.01              20                 190                     .528
35/52

Controlling Familywise Errors
When you reject a single null hypothesis the interpretation is clear. You have an $\alpha_{PCE}$ probability that you did so incorrectly. When you perform a series of tests and reject one or more null hypotheses, the interpretation is not so clear. Did you reject these hypotheses because they are false or because the familywise Type I error rate is so high that rejections were highly likely even in the face of true null hypotheses?
You were confident in your result for the single test because you were able to control the probability of a false rejection at $\alpha_{PCE}$. You could gain this same confidence in your results for multiple tests if you could control $\alpha_{FWE}$ to some specified level (.05, for example).
36/52

The Bonferroni Method Of Controlling Familywise Error
As shown in the table on slide #35, $\alpha_{FWE}$ can be reduced by reducing $\alpha_{PCE}$. But suppose you wish to establish $\alpha_{FWE}$ at some specified value, for example .05. How low must you set $\alpha_{PCE}$ in order to have $\alpha_{FWE}$ be .05? One of the oldest, simplest, and most widely used methods for finding this level is known as the Bonferroni adjustment.
37/52

The Bonferroni Method (continued)
The adjustment is made by the following equation.
$$\alpha_{PCE} = \frac{\alpha_{FWE}}{NT}$$
where $NT$ represents the number of tests to be performed. Thus, for example, if we wish to control $\alpha_{FWE}$ at .05 while we perform three tests, each test would be carried out at the $\frac{.05}{3} = .017$ level of significance.
38/52

The Step-Down Bonferroni Method
In 1979, Holm proposed a modification to the Bonferroni procedure that is usually more powerful than, is never less powerful than, and maintains familywise error at the same level as, the classical procedure. This modified Bonferroni, or more properly, step-down Bonferroni procedure is illustrated on slide #41 and is carried out as follows.
39/52

The Step-Down Bonferroni Method (continued)
1. The multiple test statistics are calculated.
2. The p-value for each statistic calculated in 1 is obtained.
3. The p-values are ordered from smallest to largest with the smallest being designated $p_{(1)}$, the second smallest $p_{(2)}$ and so forth with the largest being $p_{(NT)}$ where $NT$ is the number of tests.
4. At the first step, $p_{(1)}$ is compared to $\frac{\alpha_{FWE}}{NT}$. If $p_{(1)} \leq \frac{\alpha_{FWE}}{NT}$, the test is declared significant and the second step is carried out. If $p_{(1)} > \frac{\alpha_{FWE}}{NT}$, the test is declared nonsignificant and testing ceases with all remaining comparisons being declared not significant.
5. If the first step is significant, step two is carried out by comparing $p_{(2)}$ with $\frac{\alpha_{FWE}}{NT - 1}$. If $p_{(2)} \leq \frac{\alpha_{FWE}}{NT - 1}$, the result is declared significant and testing continues to the next step. Otherwise, the test is declared nonsignificant and testing ceases with all remaining tests being declared nonsignificant.
6. The steps are continued as shown in the figure on slide #41 until a nonsignificant result is obtained or until the last step is completed.
40/52

The Step-Down Bonferroni Method (continued)
Figure: An illustration of the step-down Bonferroni multiple comparison procedure.

             Step One                    Step Two                        Step Three                      ...   Step NT
P-value      $p_{(1)}$                   $p_{(2)}$                       $p_{(3)}$                       ...   $p_{(NT)}$
Step-down    $\frac{\alpha_{FWE}}{NT}$   $\frac{\alpha_{FWE}}{NT - 1}$   $\frac{\alpha_{FWE}}{NT - 2}$   ...   $\frac{\alpha_{FWE}}{1}$
Classical    $\frac{\alpha_{FWE}}{NT}$   $\frac{\alpha_{FWE}}{NT}$       $\frac{\alpha_{FWE}}{NT}$       ...   $\frac{\alpha_{FWE}}{NT}$
41/52

Example
A researcher involved in a study employing multiple groups of subjects wishes to test a series of null hypotheses by means of independent samples t tests. The null hypotheses with accompanying p-values associated with each test are given below. Use these results to perform a step-down Bonferroni procedure with $\alpha_{FWE}$ not to exceed .05. How do these results compare to results that would be obtained from classical Bonferroni tests?

$H_0$               p-value
$\mu_1 = \mu_3$     .0111
$\mu_2 = \mu_4$     .0419
$\mu_2 = \mu_5$     .0090
$\mu_3 = \mu_4$     .0200
$\mu_4 = \mu_5$     .0181
42/52

Solution
The five p-values, along with the hypothesis test from which each was derived, are listed in ascending order below. Also shown are the step-down values of $\alpha_{PCE}$ (S-D) and the classical Bonferroni values of $\alpha_{PCE}$ (CB) for each test of significance. As may be seen, the tests of $\mu_2 - \mu_5$ and $\mu_1 - \mu_3$ are significant while the remaining tests are not.
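The stepping and stopping rule can be made concrete with a short Python sketch applied to the five p-values of this example. The function names `step_down_bonferroni` and `classical_bonferroni` and the text labels used for the hypotheses are illustrative assumptions, not part of the original material.

```python
# p-values from the example, keyed by the null hypothesis tested.
p_values = {
    "mu1 = mu3": 0.0111,
    "mu2 = mu4": 0.0419,
    "mu2 = mu5": 0.0090,
    "mu3 = mu4": 0.0200,
    "mu4 = mu5": 0.0181,
}


def step_down_bonferroni(p_values, alpha_fwe=0.05):
    """Return {label: significant?} using the step-down (Holm) rule."""
    n_tests = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])  # smallest p first
    decisions = {}
    still_testing = True
    for step, (label, p) in enumerate(ordered):
        alpha_pce = alpha_fwe / (n_tests - step)  # alpha/NT, alpha/(NT-1), ..., alpha/1
        # Once one test fails, every remaining test is declared nonsignificant.
        still_testing = still_testing and p <= alpha_pce
        decisions[label] = still_testing
    return decisions


def classical_bonferroni(p_values, alpha_fwe=0.05):
    """Return {label: significant?} with every test run at alpha/NT."""
    alpha_pce = alpha_fwe / len(p_values)
    return {label: p <= alpha_pce for label, p in p_values.items()}


holm = step_down_bonferroni(p_values)
classical = classical_bonferroni(p_values)

print("Step-down significant:", [h for h, sig in holm.items() if sig])
print("Classical significant:", [h for h, sig in classical.items() if sig])
```

Tracking a single `still_testing` flag is one simple way to encode the stopping rule: a later p-value that happens to fall below its own threshold is still reported as nonsignificant once an earlier step has failed.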
It is important to understand that the tests of $\mu_3 - \mu_4$ and $\mu_2 - \mu_4$ are automatically declared nonsignificant at this point due to the stopping rule. Notice that had the researcher employed the classical Bonferroni method, which unfortunately is still common practice, only $\mu_2 - \mu_5$ would have been significant.
43/52

Solution (continued)

Test                p-value   S-D $\alpha_{PCE}$   CB $\alpha_{PCE}$   S-D result
$\mu_2 - \mu_5$     .0090     .0100                .0100               S
$\mu_1 - \mu_3$     .0111     .0125                .0100               S
$\mu_4 - \mu_5$     .0181     .0167                .0100               NS
$\mu_3 - \mu_4$     .0200     .0250                .0100               NS
$\mu_2 - \mu_4$     .0419     .0500                .0100               NS
(S = significant, NS = nonsignificant)
44/52

Tukey's HSD Method
Tukey's HSD (Honestly Significant Difference) test is designed for use in multiple comparison settings where all pairwise comparisons of group means are to be carried out. These tests are conducted by computing the test statistic, commonly symbolized as $q$, for each of the $\frac{k(k-1)}{2}$ comparisons, with the resultant $q$ statistics then being referenced to an appropriate table of critical values.
45/52

Tukey's HSD Method (continued)
The test statistic is defined as follows.
$$q_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\frac{MS_w}{n_h}}}$$
The subscripts $i$ and $j$ denote the two groups being compared so that $\bar{x}_i$ and $\bar{x}_j$ are the means of groups $i$ and $j$ respectively. $MS_w$ is the mean square within as computed for a one-way ANOVA via Equations 7.3 and 7.4.
46/52

Tukey's HSD Method (continued)
The symbol $n_h$ represents the harmonic mean of the two sample sizes and is computed as
$$n_h = \frac{2}{\frac{1}{n_i} + \frac{1}{n_j}}$$
When $n_i = n_j$, $n_h = n$, which is the sample size of either group.
47/52

Example
Use the data from the dieting study depicted on slide #9 to perform Tukey's HSD test. Begin by stating the null hypotheses to be tested, then perform the tests and finally, state your conclusions. Maintain $\alpha_{FWE}$ at .05.
48/52

Solution
Because there are three groups and we wish to make all pairwise comparisons, we will have $\frac{3(2)}{2} = 3$ hypotheses to test. They are
$$H_0: \mu_1 = \mu_2$$
$$H_0: \mu_1 = \mu_3$$
$$H_0: \mu_2 = \mu_3$$
49/52

Solution (continued)
The means of the three groups are as follows.
$$\bar{x}_1 = 203.2 \qquad \bar{x}_2 = 211.0 \qquad \bar{x}_3 = 184.4$$
Previous calculations (see slides #10 and 11) obtained when performing a one-way ANOVA on these data provide the following.
$$MS_w = 625$$
Because sample sizes are the same for all groups, $n_h$ will be
$$n_h = \frac{2}{\frac{1}{n_i} + \frac{1}{n_j}} = \frac{2}{\frac{1}{5} + \frac{1}{5}} = 5$$
for all comparisons.
50/52

Solution (continued)
The test statistics for the three comparisons are by Equation 7.13
$$q_{12} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{MS_w}{n_h}}} = \frac{203.2 - 211.0}{\sqrt{\frac{625}{5}}} = -.698$$
$$q_{13} = \frac{\bar{x}_1 - \bar{x}_3}{\sqrt{\frac{MS_w}{n_h}}} = \frac{203.2 - 184.4}{\sqrt{\frac{625}{5}}} = 1.682$$
$$q_{23} = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{\frac{MS_w}{n_h}}} = \frac{211.0 - 184.4}{\sqrt{\frac{625}{5}}} = 2.379$$
51/52

Solution (continued)
Critical values of $q$ are obtained from Appendix E. The table is entered with the number of means in the analysis and the appropriate degrees of freedom, which are $N - k$. Referencing Appendix E for 3 means and 12 degrees of freedom yields a critical value of 3.773. As may be seen, none of the hypotheses are rejected so that no differences between group means can be demonstrated.
52/52
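The three $q$ statistics can be recomputed with the Python sketch below, which takes the group means, $MS_w = 625$ and the critical value 3.773 directly from the slides. The helper names `harmonic_n` and `tukey_q` are illustrative assumptions, as is the choice to compare the absolute value of each $q$ to the critical value so that the sign of the difference does not matter.

```python
from itertools import combinations
from math import sqrt

# Group means and sample sizes for the dieting study (from the slides).
means = {1: 203.2, 2: 211.0, 3: 184.4}
sizes = {1: 5, 2: 5, 3: 5}
MS_WITHIN = 625.0    # from the one-way ANOVA on the same data
CRITICAL_Q = 3.773   # 3 means, 12 df, alpha_FWE = .05, as read from Appendix E in the text


def harmonic_n(n_i, n_j):
    """Harmonic mean of two sample sizes: 2 / (1/n_i + 1/n_j)."""
    return 2 / (1 / n_i + 1 / n_j)


def tukey_q(mean_i, mean_j, n_i, n_j, ms_within):
    """q_ij = (mean_i - mean_j) / sqrt(MS_w / n_h)."""
    return (mean_i - mean_j) / sqrt(ms_within / harmonic_n(n_i, n_j))


# One q statistic for each of the k(k - 1)/2 pairwise comparisons.
q_stats = {
    (i, j): tukey_q(means[i], means[j], sizes[i], sizes[j], MS_WITHIN)
    for i, j in combinations(sorted(means), 2)
}

for (i, j), q in q_stats.items():
    # Assumption of this sketch: the magnitude of q is compared to the critical value.
    decision = "reject" if abs(q) >= CRITICAL_Q else "do not reject"
    print(f"q{i}{j} = {q:.3f}: {decision} H0: mu{i} = mu{j}")
```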