Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Statistics WHEN cm-SQUARE DOES NOT HOLD: HOW TO ANALYZE TABLES WITH LOW CELL FREQUENCIES WITH SAS Lothar Tremmel, Ph.D. Bio-Pharm Clinical Senices Example 1: One 2x2 table Abstract Detennining if two proportions are significantly different is a common problem in statistics. For low sample sizes, the common chi-square tests may give the wrong answer, because normal approximation is no longer valid. Exact tests such as Fisher's exact tests need to be used instead. The randomization model for such a test is developed, and it is demonstrated how to obtain this test with SAS procedures and functions. failure success % success Placebo 10 I 10% Active 7 7 50% The question is: Are the two success rates 1()O/O and 50% "significantly" different? - "Significantly" meaning that the difference cannot reasonably be explained by random chance. To answer this question, we will develop an idea how a random process could have generated the data, and we will investigate the question: How likely is a difference as the observed one to emerge just by random chance? Oftentimes, one has a set of pairs of proportions, such as the proportion of responders under Active treatment vs. the proportion under Placebo for males, and the corresponding proportions for females. 'Gender' is the stratification variable, and 'male' and 'female' are the 'strata'. It is often desirable not to pool these data into a single 2x2 table, but to estimate the treatment effect in each stratum separately and to combine the estimates. An exact test is developed, and it is shown how to calculate the test with the new SAS procedure MUL1TEST. The fIXed-margin null model: A statistical model is an idea about how the data were generated. We will build a "null modef', i.e., a model which assumes no treatment effect, and we will try to explain the data with it. For 2x2 tables, a common null model isfixed-margin null model: It assumes that all the marginal sums of the table are fixed and given before the experiment: Extensions for sets ofk proportions are also discussed. One 2x2 table Suppose 25 patients were emoled in a clinical trial. Of these, 14 were randomly chosen to receive active treatment; the other 11 patients received placebo (sham treatment). At the end of the trial, 7 out of the 14 actively treated patients were cured, as compared to only one placebo patient This result can be summarized in the 2x2 table shown in Example I: 871 Failure Success Sum Placebo 0 11-0 n.1=11 Active 17-0 14-(17-0) n.2=14 Sum n1.=17 n2.=8 n.. =25 Statistics This randomization model can be described as follows: There are 25 subjects of which 8 will be successes. We randomly assign 11 of them to Placebo. This is like fishing II fishes from a lake with 25 fishes of two different kinds: 17 blue and 8 red, in the example. Sample Space: The fIXed-margin null model imposes a rigid structure on the table: If one cell is known, the whole table is determined. For instance, if there are 0=10 placebo failures, there must be 11-10=1 placebo success for the row to add up to II, the placebo sample size. Likewise. there must be 17-10=7 treatment failures for the column total to be 17, and 8 trealment successes for the second row and column to add up to their fixed margins. The set of all possible tables is called the sample space of the model; it is the set of all possible outcomes. An example of another element of the samaple space is the table with 9 placebo failures (and hence 11-9=2 placebo successes, 17-9=8 active successes, and 14-8=6 active failures). On the other hand, a table with 12 placebo failures is not an element of the sample space because there are only 11 placebo patients. . ~" ~ 0.02 0.00 3458'II,OU Figure 1: Exact probability distribution for o=the number of placebo failures from Example I, under the null model The formula for p(0) can be found in all basic statistical textbooks, such as Hoel (1084, p. 69). The hypergeometric distribution has the following properties: Expected and Observed Value: The overall failure rate is 17/25=0.68. Based on the null model of no treatmeilt effect, we would expect a placebo failure rate of about 0.68 as well. This would be the case if the number of placebo failures were 7.5, which is called the expected value E for the number of placebo failures. However, our observed value 0 is 10, which is considerably larger. • • Point Probability: Under the null mode~ the table with E placebo failures is the most likely table. The Null model determines precisely how likely this outcome is. It also determines the probability of each other possible table. These outcome probabilities are called point probabilities. Since one cell determines the whole table, the point probabilities are a function of a single cell, say o=Placebo failures. The distribution of the point probabilities over the values of 0 is called the hypergeometric distribution of o. It is shown in Figure I. It shows that we would expect to observe 7 or 8 placebo failures: p(7)=P(8)=O.305. Observing 10 placebo failures is a rather unlikely event p(IO)=O.035. In gen~ the distribution is not symmetric Writing 0 for the random variable of placebo failures, E(o)=nl. * (n.lIn), and Var(o)=(nl.*n2.*n.1 *n.2)/(n2(n-I» , Tail probability or 'p value': To answer the question if the null model is good enough to explain the data, the point probability of exactly the table obtained is of no interest at all. What we rea1ly want to know is: What is the probability under the null model to get a resuh as extreme as the one we got, or even more extreme? This will tell us how often we would be wrong if we considered the given evidence as sufficient to reject the null model although it is true. This probability is the p value, or tail probability. It is the sum of the point probabilities of the table obtained and all more extreme tables. In our example, the p value is p(IO placebo failures) = 0.0349 + p(11 placebo failures) = 0.00278, which adds up to 0.03768. This P value is called Fisher's exact p value. 872 Statistics One tailed vs. two tailed p values: Since we looked only in one direction to sum up the more extreme tables, the calculated p value is a one-sided p value. Two-sided p values are obtained by defming 'more extreme' in a nondirectional way, and summing up both tails of the distribution. Since in general the hypergeometric distribution is not symmetric, there are different possibilities of defining 'more extreme' nondirectionally with different resulting two-tailed p values. Agresti (1992, p. 134) gives an overview and references. given in the PROC FREQ output, simply labeled 'Chi Square'. For the example, its value is 4.738, for which the Chi-Square distribution with 1 d.f. returns a p value of 0.030. Exact tests based on the exact product binomial distribution can also be constructed (Rice 1988), but they are rarely used in practice. The Mantel Haenzel statistic and Pearsons' statistic differ only in a factor nI(n-I); hence, for large sample sizes, both null models yield almost identical p values. Both the one-tailed and a two-tailed Fisher's exact p value can be obtained with PROC FREQ, using the ICHISQ or !EXACT option of the TABLES statement More directly, the SAS-function PROBHYPR can be used to get tail probabilities from the hypergeometric distribution: Analysis of Example 1 with PROC FREQ: proc freq data= one; weight count; tables drug*improv I chisq; p=PROBHYPR(n, ni-, n_l, x) p=1-PROBHYPR(25, 11, 17, 10-1) run; for the example. Note that PROBHYPR always gives the lower-tail. Therefore, PROBHYPR(25, II, 17, 10) is p(o S 10), or the probability, to obtain 10 or fewer failures in Placebo. The uppertail p to obtain 10 or more is 1 minus PROBHYPR(25, 11, 17, 10-1). The subtraction of 1 from lOis necessary to include the point probability of the given table in the tail. ------------ OUTPUT: ---------------STATISTICS FOR TABLE OF DRUG BY IMPROV Approximate tests for 2x2 tables: For large cell sizes, the hypergeometric distribution approximates the bell curve. This means that Z = (0 - E(o»)/ ..J(VAR(0» follows approximately the standard normal distribution, and hence Z2 the chi-square distribution with 1 d.f. z2 is known as the Mante1-Haenszel Chi-Square 0.m- It can also be obtained by PROC FREQ using the TABLES statement with the ICHISQ option. For the example, the value of QMH is 4.548, for which the Chi-Square distribution with 1 d.f. returns a p value of 0.033. Statistic Value Prob Chi-Square Likelihood Ratio Chi-Square continuity Adj. Chi-Square 4.738 5.233 3.044 4.548 0.030 0.022 Kantel-Baenszel Chi-Square Fisher's Exact Test (Leftl (Rightl (2-Taill Notes and Extensions: Besides the fIXed margin model descnoed so far, a different data model is also common: The product-binomial model: Each sample is considered an independent binomial sample. Hence, only the two sample sizes n.1 and n.2 are considered fixed; the total number of successes n2. is not precpecified and can vary freely. In the example, this is like fIrSt fishing 11 fishes from the sea, and then fishing 14 fishes from the same sea - with no one knowing the true proportion of blue and red fish in the ocean. The nonnal-approximation test based on this idea is the well-known Pearson-Chi-square test, which is also 873 O.OBl 0.033 0.997 0.038 0.042 Statistics However, there are two objections to this approach: Several 2x2 tables Bias: Unbalanced randomization can lead to bias when data are pooled. In the example: Females respond more easily; a higher proportion of females was randomized to Placebo than males. In the combined table, the Placebo group has a higher proportion of female patients than the Treatment group: 321 (32+11) vs. 27/(27+14). This introduces a bias in favor of Placebo. So far, we have investigated only a part of the sample dataset in Koch & Edwards (1988). The complete dataset looks as follows: Example 2: Stratified 2x2 tables Power: Even if randomization is balanced, there is still a case against pooling: If the stratification variable has a main effect, pooling leads to a power loss (For a simulation study, see Gansky and Koch 1994). Stratum 1: Males failure success Placebo 10 1 Treatment 7 7 The alternative to pooling is to estimate the treatment effect in the separate tables separately, to combine the estimates into a combined test statistic to measure the overall effect., and to derive the distribution of this test statistic. This is called stratifYing. Stratum 2: Females failure success Placebo 19 13 Treatment 6 21 Approximate Stratified Tests. A straightforward combined test statistic is the sum !. 0. of all single observed values 0, of the single tables. Assuming independent normal distributions for the 0, in the different strata, is easy to derive the distribution of !, 0, : In the examples of SAS code to follow, we will assume that these data were read in as follows: z = DATA ONE; T T 0 6 1 2l P P 0 19 1 l3 M M M M T T 0 1 7 7 P P 0 lO 1 1 (!, 0, - E(!, 0,) 1 ..Jvar(!. 0,) (!, 0, - !. E,) 1 ..J(!, Var(o,» Again, Z2 is the Mantel-Haenszel Chi-Square Statistic QMH' It can easily be obtained by PROC FREQ as shown below. For the example, the value of OMH is 12.59, which yields a p value of < 0.001. INPUT SEX $ DRUG $ IMRPOV COUNT; CARDS; F F F F = RUN; Pooling vs. Stratifying. How to derive a combined assessment of the treatment effect? A straightforward idea is Pooling. which simply means throwing the data together, regardless of gender, and to analyze the resulting single 2x2 table as above. 874 Statistics of the point probabilities from the cells of Figure 2 over the two-dimensional sample space. Analysis of Example 2 with PROC FREQ: proc freq data= one; weight count; tables sex*drug*improv / run; u cmh; ...... OUTPUT: ---------------0.0413 Cochran-Mantel-Haenszel Statistics (Based on Table Scores) Statistic 1 2 3 Alternative DF Nonzero 1 correlation Row Mean 1 Scores differ General 1 AssoeiatioD Value PrOD 12.590 0.000 12.590 12.590 ~ s:b. &, 0.000 .. 0.000 O.GOOO k C£ ~ - ~ -.... <..0<000. iI=t: r=b, EI:, ,=>, '=>. ~7 11 -:-r 7' ~Oo~;;;r -,: o?~ -o==.:,~ 000'00.0. Qo. lC2 :.:::70.000 X1 -'"'9, Figure 3: Exal:t probability distribution fOT x2=number of placebo failures (males) and xl=number of placebo failures (females) Problem: If some cell sizes are low « 5), the nonnal approximation is no longer valid. Hence, the p value of the Mantel-Haenszel test will be wrong and exact tests are needed instead. Proc FREQ does not calculate exact p values for stratified tests. In the next section, we will show how to obtain exact tests with the new procedure MULTIEST. The p value is the sum of all point probabilities of outcomes which are at least as extreme as the observed one - which means all pairs of tables which score at least as extreme as the observed table on the combined test statistic ~ 0 .. For the general case ofk strata, this test is known as Birch's exact test of conditional independence (Agresti Exact Stratified Tests. Exact tests for several 2x2 tables can be developed exactly as for a single 2x2 table: The Null Model: Every single table is generated by a separate, independent tixed-margin-null model as described above. With this, the sample space consists of all possible combinations of single-table outcomes, and because of the independence between the tables, the point probability of each outcome is the product of single table probabilities. The resulting exact distribution is the Product 1992, p. 128). Hypergeometric Distribution. For Example 2, the sample space is the set of all pairs of possible outcome tables, and the point probabilities are just the product of the point probabilities of the single tables, as shown in Figure 2 on the next page. Figure 3 shows a plot 875 Statistics Figure 2: Schematic of the sample space of the two-tables example, with point probabilities and tail probability. Males: Failures in Placebo Females Xl P x2=3 p=0 4 .004 5 .039 6 .16 7 .31 8 .31 9 .16 10 .031 II .003 0 1 2 3 4 5 6 7 ... .... ... 16 .093 17 .041 18 .014 19 .003 20 .001 21 ... ... .... .0016 .... ... x 0 t X t t x t t t x t t t t X t t t t t x t t t t t t t t t t t t t 22 23 24 25 x Note: The '0' denotes the observed result; x and t designate the tail area to derive the exact p value. 876 Statistics Exact stratified p values with PROC MULITEST: PROC MULTrEST (SAS Institute, 1992) calculates approximate and exact p values for the COCHRANARMITAGE linear trend test which is based on data schemes like the following: Analysis of Example 2 with PROC MULTrEST: PROC MULTTEST DATA=ONE OUT=PVALSj CLASS DRUGj CONTRAST 'TMT' -0.5 o.sj FREQ COUNTj STRATA SEXj TEST CA (IMPROV / PERM=lOOO UPPERTAILED) j RUNj ------------ OUTPUT: ---------------- Stratum s: Groupg Trend Score Sg Successes Ogo g=1 SI 0 1, g=2 S2 ~ g=3 S3 0 3, ... g=k MULTTEST TABLES St Class <\. Variable Strata IMPROV f Statistic Count N Linear trend = ~ , ~ g Sg * (Ogs - Ego) Percent =Linear trend 1..J(Variance estimate) m Count N Percent which is approximately nonnally distributed. If one has only two groups and takes +0.5 and -0.5 as the trend scores, the linear trend turns out to be the numerator of the Mantel-Haenszel statistic 0-. The denominator turns out to be the estimate based on the product binomial null model, which is (n-l)/n times the hypergeometric variance, for each stratum. Therefore, the p values from the approximate tests will slightly differ from PROC FREQ's Mantel-Haenszel test which is based on the hypergeometric variance. For a single 2x2 table, the resuh will be identical with the p value from Pearson's chi-square test. 0.0003 21.00 27.00 77.78 1.00 11.00 !1. 09 7.00 14.00 50.00 The exact stratified upper-tailed p value agrees with the result given in Koch and Edwards (1988, p. 417). p(exact two-tailed)= ~ tmt diff 13.00 32.00 40.63 Uppertailed keyword: If neither UPPERTAILED nor LOWERTAILED is specified, two-tailed tests are calculated. p(exact uppertailed) =P(r. 0, 2: r. OJ Us Os) + P<r. 0. RawJ> t Perm= parameter: If in any stratum the marginal sum for success or failure exceeds the value of the PERM= parameter, normal approximation will be used. Hence, set PARM= to a high value if you want an exact test. Exact p values: Much more interestingly, MULTIEST is also able to calculate exact p values as follows: P<r. 0, Test p E ~s 0, - E» which can be shown to be the exact p values both for the CA statistic and for the ~H statistic (See appendix). The SAS code to obtain the exact p values is as follows: 877 Statistics More than two proportions: kx2 tables Analysis ofElC81tlple 3: Exact test of global hypothesis with PROC FREQ: You may have more than just two proportions, such as adverse event rates in three treatment groups with rising doses: proc freq data=one; weight count; tables drug*ae / exact; run; Example 3: One k*2 table Groupg Trend Score Headache % ------------ OUTPUT: ---------------- Placebo (n=27) 0 14/27 52% STATISTICS FOR TASLE OF DRUG BY AE 5mg(no:7) 5 5/7 71% IOmg(n=7) 10 6/7 86% Statistic DF Value Prob 2 3.066 0.216 2 3.336 0.18~ 1 2.976 0.084 Chi-Square Likelihood Ratio Chi-Square Mantel-Haenszel Chi-Square Fisher's Bxact ~est In the examples on the right hand side, we assume that these data were read as follows: 0.263 (2-~ail) data one; input drug ae cards; 0 1. 1.4 0 1.3 0 5 5 1. 2 5 0 6 10 1. 10 1 0 count; Analysis ofElC81tlple 3: Exact test of trend hypothesis with PROe MULTTEST: proc rnulttest data=one outo:pvals; class drug; contrast 'trend' 0 5 10; freq count; test CA(ae/perrn=1.0000); Two different hypotheses can be investigated: (1) Global hypothesis: Axe there any differences among the k proportions? (2) Trend hypothesis: Is there a dose-related trend among the k proportions? PROC FREQ provides large-sample tests for both questions, but an exact test only for (I). The second analysis example shows how an exact trend test (2) can be obtained with PROe MULnEST. run; ------------ OUTPUT: ---------------MULTTEST TABLBS Class Variable Problems AS Statistic Count N Problems with the new procedure MULnEST exist concerning time and mernory. Both can be prohibitive when analyzing datasets with moderate or large cell sizes. It is important to set the PERM= parameter judiciously so that MULTTEST switches to the far less demanding normal approximation whenever this makes sense. Percent 0 5 14.00 27.00 51.85 5.00 7.00 71.43 Exact p values for the table 878 Test Rawy trend 0.1011 10 6.00 7.00 85.71 Statistics Technical comments References Note 1: The uppertailed exact p value is the exact p value for the CA statistic because ZCA is nothing but a linear transfonnation of L. O. - Hence both statistics ~ and L. O. order the sample space in the same way. This means the probability for L. O. to be ~ observed is the same as the probability of the CA statistic to be greater than observed. Agresti, A:A Survey of Exact Inference for Contingency Tables. Statistical Science, 7, No.1, 131-177, 1992 Gansky, SA, Koch, GG: Evaluation of Alternative Strategies for Managing Stratification Factors in the Analysis of Randomized Clinical Trials. ASA 1994 Proceedings of the Biopharmaceutical Section Note 2: The two-sided p value is the exact p value for the QMH statistic because the unsquared QMH statistic is also a linear transfonnation of :E. 0.: Writing Z for the unsquared QMH statistic, 0 for observed L. 0., and 0 for the corresponding random variable: P (QMH :.!: (D-EiNar) = p (/ZI :.!: ID-EI/War) Hoe~ Paul G.: Introduction to Mathematical Statistics. Fifth ed., Wiley 1984 Koch GG, Edwards S: Clinical Efficacy Trials with Categorical Data. In Peace KE: Biopharmaceutical Statistics for Drug Development NY and Basel 1988 = p (Z :.!: (O-E)/War) + p(Z ~O-E)/War)= p(o-E:.!:O-E) + p(Zo-E ~(O-E)}= p(o:.!:O) + p(o~ Rice, W. R.: A New Probability Model for Detennining Exact P-Values for 2x2 Contingency Tables When Comparing Binomial Proportions. Biometrics ~ 1-22, March 1988 SAS Institute: The MULTTEST procedure. In SAS Technical Report, Release 6.07, SAS Institute 1992 2E-0) which is what MUL'ITEST calculates (SAS 1992, The MUL'ITEST procedure, p. 385). Author's address: Lothar Tremmel, Ph.D., Assistant Director, Biostatistics, Bio-Pbarm Clinical Services, 4 Valley Square, Blue Bell, PA 19422. Tel.: (215) 283-0770, Fax: (215) 283-0733. Acknowledgments: The author would like to thank John Cohen for his helpful comments to an earlier draft of this paper. 879