УДК 57.087.2:519.233.3

ON THE POWER OF SOME BINOMIAL MODIFICATIONS OF THE BONFERRONI MULTIPLE TEST

© 2007 A. T. Teriokhin, T. de Meeûs, J.-F. Guégan

Génétique et Évolution des Maladies Infectieuses, UMR 2724 IRD-CNRS, Centre IRD, 911 Avenue Agropolis, BP 64501, 34394 Montpellier cedex 5, France
Faculty of Biology, Moscow Lomonosov State University, Leninskie Gory 1, Moscow 119992, Russia
e-mail: [email protected]

The Bonferroni multiple test, widely used in testing statistical hypotheses, has a rather low power, which entails a high risk of falsely accepting the overall null hypothesis and therefore of failing to detect really existing effects. We show that, when the partial test statistics are statistically independent, this risk can be reduced by using binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of n partial null hypotheses is rejected at a very stringent significance level (say, 0.005 in the case of n = 10), as prescribed by the Bonferroni test, the binomial tests reject the null hypothesis when at least k partial null hypotheses (say, k = [n/2]) are rejected at a much less stringent level (up to 30-50%). We show that the power of such binomial tests is substantially higher than that of the original Bonferroni test and of some of its modifications. In addition, this approach allows us to combine tests whose results are known only for a fixed significance level. The paper contains tables and a computer program which allow one to determine (retrieve from a table or compute) the necessary parameters of a binomial test, i.e., either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).

An environmental factor x can often influence independently n variables y1, y2, ..., yn that describe the state of a population. For example, the presence of a pollutant may increase the frequencies of several diseases. Suppose that we know, in the form of p-values¹ p1, p2, ..., pn, the results of testing n partial null hypotheses H01, H02, ..., H0n postulating that the factor x has no effect on the variables y1, y2, ..., yn, respectively. The problem then arises of how to combine these results in order to verify the overall null hypothesis H0 that the factor has no effect. The simplest way is to reject H0 if at least one partial hypothesis is rejected at some given significance level α', i.e., when the probability of a partial mistake is not greater than α'. But this procedure is misleading because its overall significance level α_n, equal to 1 − (1 − α')^n when the partial tests are statistically independent, may be much greater than α'. For example, for α' = 0.05 and n = 10 we would obtain the unacceptably large value α_10 = 1 − (1 − 0.05)^10 ≈ 0.40. To avoid such a high risk of falsely rejecting the overall null hypothesis H0, a number of procedures (multiple tests) have been proposed for combining the results of partial tests in such a way that the overall significance α_n is not greater than a given significance level α, say, α = 0.05.

¹ A p-value is the probability that a test statistic will be equal to or greater than the currently observed statistic under the assumption that the null hypothesis, i.e., the hypothesis being tested, is true. The smaller the p-value, the greater the confidence with which the test rejects the null hypothesis.
The best known multiple test is based on the Bonferroni inequality α_n ≤ nα', where α' is the significance level of each partial test (Morrison, 2004; Cupples et al., 1984; Meinert, 1986; Hochberg, Tamhane, 1987; Westfall, Young, 1993; Bland, Altman, 1995). The inequality expresses a simple tenet of probability theory: the probability that at least one of several events occurs cannot exceed the sum of the probabilities of all those events. It follows from this inequality that if we use for the partial tests the significance level α' = α/n (the Bonferroni correction for multiplicity²), then the overall significance α_n will be not greater than the required α. However, the power (i.e., the probability of detecting a really existing effect by rejecting the false null hypothesis) of such a Bonferroni multiple test, as well as that of some of its modifications (Holm, 1979; Simes, 1986; Hochberg, 1988; Rom, 1990; Zhang et al., 1997; Roth, 1999), is rather low (Ryman, Jorde, 2001; Legendre P., Legendre L., 1998; Morikawa et al., 1997; Blair et al., 1996). This is why some researchers (Rothman, 1990; Perneger, 1998; Bender, Lange, 1999) suggest combining the results of partial tests at an informal level instead of applying a multiplicity correction. Another way to improve the situation, and to rescue some empirically discovered effects falsely rejected by the Bonferroni test, is to use more powerful multiple tests.

² The Bonferroni correction was proposed by Carlo Bonferroni (Bonferroni, 1935) for the case when several dependent or independent statistical tests are performed simultaneously (because it does not follow from a given significance level holding for each individual comparison that the same level holds for the set of all comparisons).

The intuitive idea underlying our approach is the following. When a really existing effect is expressed rather weakly but in all partial tests, the power of the Bonferroni test, β_n, equal to the probability of obtaining at least one partial p-value less than α/n, may be very low; on the contrary, in this case the probability that at least some number k (k > 1) of the n tests are significant at a level α' (greater than α/n, or even greater than α) may be much higher. Taking k and α' such that the overall probability α_n is not greater than the desired overall significance α, and rejecting the null hypothesis whenever at least k tests are significant at the level α', one obtains a multiple test with overall significance not greater than α and with power greater than that of the Bonferroni multiple test. It is natural to consider multiple tests of this type as binomial modifications of the Bonferroni test, because the values of k and α' ensuring the desired overall significance α can easily be found (under the assumption of independence of the partial tests) by means of the well-known formula for binomial probabilities. We will see that, indeed, the binomial multiple tests may have a power notably exceeding the power of the Bonferroni test and of its earlier modifications.

It is essential that we assume independence of the partial tests when constructing the binomial tests. In practice, however, the partial tests may be either independent (when they are based on different sets of data) or more or less dependent (when they are based on the same set of data, say, when performing multiple comparisons). We therefore also illustrate the consequences of relaxing the restriction of independence.
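As a worked instance of this binomial calculation (assuming independent partial tests; the numbers can be checked against Table 1 below), take n = 10, α = 0.05 and k = [n/2] = 5. The largest partial level α' keeping the overall significance below 0.05 is about 0.22, since

\[
\sum_{i=5}^{10}\binom{10}{i}(0.222)^{i}(1-0.222)^{10-i}\;\approx\;0.0496\;\le\;0.05 ,
\]

so the overall null hypothesis may be rejected whenever at least five of the ten partial tests are significant at the 22% level, rather than requiring a single partial test significant at the 0.5% level.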
There are at least two principles for determining the values of k and α' for binomial multiple tests. First, we may fix the value of k arbitrarily and search for the largest value of α' that keeps the overall significance α_n not greater than the required value α. We may set k equal, say, to 2 and then calculate the corresponding value of α'. Also, for any given n, we may set k equal to some fraction of n, for example k = [n/2] (the integer part of n/2). Second, we may fix the value of α' arbitrarily and search for the smallest value of k for which the overall significance α_n is not greater than the given level α, say, α = 0.05. In particular, α' can be set equal to α, i.e. we can set α' = 0.05 (Prugnolle et al., 2002), and then calculate the corresponding value of k which, evidently, depends on the required overall significance α, on the chosen partial significance α', and on the total number of partial tests n. But we may also fix α' at any other level, say at 0.10, 0.25 or even 0.50, and calculate the corresponding value of k for the chosen level of significance.

There are other modifications of the standard Bonferroni multiple test, mainly based on ranking the partial p-values. Holm (1979) proposed a sequential multiple testing procedure (see also Rice, 1989). The procedure consists in a stepwise comparison of the successively increasing ordered partial p-values p(1) ≤ p(2) ≤ ... ≤ p(n) with the successively larger partial significance levels α/n, α/(n−1), α/(n−2), ..., α/1. If p(1) > α/n, the overall null hypothesis H0 is not rejected and the procedure stops; otherwise H0 is rejected. The inequality p(1) ≤ α/n also means that the corresponding partial null hypothesis should be rejected, and we may pass to the next comparison. This stepwise process continues until the step i at which the inequality p(i) > α/(n − i + 1) is fulfilled. In fact, only the first step of this procedure, concerning the overall null hypothesis, is of interest to us here. The binomial multiple tests we will consider do not test partial hypotheses and, in this sense, they give less information than sequential tests. In principle, it is not even necessary that the overall alternative hypothesis be formulated as the falsity of all partial null hypotheses. But even when the alternative hypothesis is formulated as the falsity of only a part of the partial null hypotheses, it is very important to have a powerful test for the overall null hypothesis, because falsely accepting the overall null hypothesis automatically prevents any further testing of partial hypotheses. Note also that Holm's procedure has the same power for the overall null hypothesis as the simple Bonferroni test, because its first step is the same as in the Bonferroni test. We will therefore use for comparison another sequential modification of the Bonferroni test, developed by Simes (1986), in which the overall null hypothesis is rejected if at least one of the inequalities p(i) ≤ iα/n, i = 1, 2, ..., n, holds. Though the Simes procedure does not guarantee universally that its really attained level of significance α_n is always less than the required significance α, it does so for a wide class of multivariate distributions, in particular for the case of independent partial test statistics (Simes, 1986; Hochberg, Rom, 1995; Samuel-Cahn, 1996).
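To make the two decision rules concrete, the following short fragment (an illustrative sketch written in the same QuickBasic style as the program given in the Appendix; the p-values in the DATA statement are hypothetical and assumed already sorted in increasing order) applies the first Bonferroni/Holm step and the Simes rule to a set of ordered p-values.

DEFINT I-N
DIM p(1 TO 5)
DATA 0.011, 0.018, 0.025, 0.200, 0.700
n = 5: alpha = .05
FOR i = 1 TO n: READ p(i): NEXT i
bonf = 0: simes = 0
IF p(1) <= alpha / n THEN bonf = 1          ' Bonferroni criterion (first Holm step)
FOR i = 1 TO n
  IF p(i) <= i * alpha / n THEN simes = 1   ' Simes rule: compare p(i) with i*alpha/n
NEXT i
IF bonf = 1 THEN PRINT "Bonferroni: reject H0" ELSE PRINT "Bonferroni: do not reject H0"
IF simes = 1 THEN PRINT "Simes: reject H0" ELSE PRINT "Simes: do not reject H0"

With these illustrative values the Simes rule rejects H0 (p(2) = 0.018 ≤ 2·0.05/5 = 0.02), whereas the plain Bonferroni criterion does not (p(1) = 0.011 > 0.05/5 = 0.01), which shows how the ranked comparisons can be slightly less conservative.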
Another approach to combining independent test results was proposed by Fisher (Fisher, 1970; see also Manly, 1985). It is based on the fact that if H0 is true, then p1, p2, ..., pn are uniformly distributed over the interval [0, 1] and, consequently, the statistic

\[
X^2_{2n} \;=\; -2\sum_{i=1}^{n}\log(p_i)
\]

is approximately distributed as a chi-square random variable with 2n degrees of freedom (the greater n, the better the approximation).

Here we compare the power of different binomial multiple tests with the power of the Bonferroni, Simes and Fisher tests for three types of the alternative hypothesis: (1) all partial null hypotheses are false; (2) about half of the partial null hypotheses are false; (3) only one of the partial null hypotheses is false. It will be shown that the binomial multiple tests, especially the one with k = [n/2], are very suitable for testing the null hypothesis against alternatives (1) and (2), but not against (3).

Another problem with multiple tests is a possible correlation between partial tests. The parameters k and α' of the binomial tests are calculated under the assumption that the partial tests are independent, and only in this case does their really attained significance α_n not exceed the required level α. Unfortunately, this property does not remain valid when the partial tests are dependent, and we will see that, in this case, the overall significance α_n can become notably greater than the desired level α, especially if the intercorrelations between the partial tests are high. So, some corrections for non-independence are needed in such cases. For the Bonferroni test, for example, when the partial tests are highly correlated, it has been proposed to calculate the partial significance by the formula α' = 1 − (1 − α)^(1/√n) (Tukey et al., 1985; see also Curtin, Schulz, 1998). We will also consider how the lack of independence changes the properties of the binomial tests.

METHODS

We consider the situation where the results of n independent tests of n partial null hypotheses Hi0, i = 1, 2, ..., n, are given in the form of their sample tail probabilities (p-values) p1, p2, ..., pn, and we wish to combine these results into a single test procedure (multiple test) with a given significance level α for verifying the overall null hypothesis H0 affirming that all partial null hypotheses Hi0 are true. One way to do this is to reject H0 whenever at least k of the n p-values p1, p2, ..., pn are less than some level α', where k and α' are chosen in such a way that the significance level of this procedure is not greater than a given value α, say, α = 0.05. As was already noted, we can fix k and search for the greatest α' providing the desired level of significance α, or fix α' and search for the least k providing the significance level α.

In the case of fixed α', the necessary value of k depends on n, α' and α, and can be calculated by means of the Bernoulli formula for binomial probabilities. More precisely, to find the value of k that provides the level of significance closest to α, it is sufficient to find the least value of k for which the inequality

\[
\sum_{i=k}^{n}\frac{n!}{i!\,(n-i)!}\,(\alpha')^{i}(1-\alpha')^{\,n-i}\;\le\;\alpha
\]

is still satisfied. Note that the left-hand side of the inequality decreases with increasing k and increases with increasing α'. In the case of fixed k, we use the same inequality but vary the value of α' instead of k: to find the α' providing the level of significance closest to α, it is sufficient to find the greatest value of α' for which this inequality still holds. In the Appendix we give a computer program for calculating k or α' for any given values of n and α.
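For instance (a worked check against Table 1, assuming independent partial tests), let n = 10, α = 0.05 and fix α' = 0.05. Then

\[
\sum_{i=2}^{10}\binom{10}{i}(0.05)^{i}(0.95)^{10-i}\approx 0.086>0.05,
\qquad
\sum_{i=3}^{10}\binom{10}{i}(0.05)^{i}(0.95)^{10-i}\approx 0.012\le 0.05,
\]

so the least admissible value is k = 3: the overall null hypothesis is rejected when at least three of the ten partial tests are significant at the 5% level, in agreement with the column α' = 0.05 of Table 1.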
To find the parameters k or α' of the binomial multiple tests for any n and α, we use only the assumption of independence among the partial tests and need no assumption about the probability distribution of the partial test statistics. However, to estimate the power of these tests we do need to know these distributions. Hence we make additional assumptions concerning the distributions of the test statistics in order to be able to compare the powers of different tests. Certainly, the conclusions drawn from these particular comparisons cannot be fully general, but they can give sound guidelines for choosing a suitable multiple test in real situations.

To compare the powers of different multiple tests, we use the following partial tests, further referred to as "standard partial tests". It is assumed that their test statistics Zi have the standard normal distribution, Zi ~ N(0, 1), under the partial null hypotheses Hi0, and that they have the same distribution shifted to the right by 1, i.e. Zi ~ N(1, 1), under the partial alternative hypotheses Hi1, i = 1, 2, ..., n. Fig. 1 illustrates this testing situation graphically. In each partial test we reject the null hypothesis Hi0 (say, "absence of effect") and accept the alternative hypothesis Hi1 (say, "presence of effect") if the sample test statistic falls into the critical region [Z_{1−α'}, ∞), where Z_{1−α'} is the value of Z satisfying the equation P_{Hi0}(Z ≥ Z_{1−α'}) = α'. The power of this partial test is equal to β' = P_{Hi1}(Z ≥ Z_{1−α'}). Fig. 1 illustrates the case α' = 0.05, Z_{1−α'} = 1.645, β' = 0.26.

If the overall alternative hypothesis is formulated as the falsity of all partial null hypotheses, the power of the binomial multiple test can be calculated by the Bernoulli formula

\[
\beta_n=\sum_{i=k}^{n}\frac{n!}{i!\,(n-i)!}\,(\beta')^{i}(1-\beta')^{\,n-i},
\qquad
\beta'=P_{H_{i1}}(Z\ge Z_{1-\alpha'}) .
\]

If the alternative consists in the falsity of only m of the n partial null hypotheses, with m equal, for example, to 1 or to [n/2], the power can be calculated as

\[
\beta_n=\sum_{i=k}^{n}\;\sum_{j=j_1}^{j_2}
\frac{m!}{j!\,(m-j)!}\,(\beta')^{j}(1-\beta')^{\,m-j}\,
\frac{l!}{h!\,(l-h)!}\,(\alpha')^{h}(1-\alpha')^{\,l-h},
\]

where l = n − m, h = i − j, j1 = max(0, i − l), j2 = min(i, m).

Another method we used to evaluate the power of a test consists in generating a large number N, say N = 1000, of sets of random values in accordance with the probability distribution of the test statistics under the alternative hypothesis and applying the multiple test to these data. The fraction of cases in which the alternative hypothesis is accepted estimates the power of the multiple test.
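As a worked instance of these formulas (a check against Tables 1 and 2, assuming the standard partial tests and independence), consider the [n/2]-binomial test for n = 10, for which Table 1 gives α' = 0.222 and k = 5. Here Z_{1−α'} = Z_{0.778} ≈ 0.765, so that

\[
\beta'=\Phi(1-0.765)\approx 0.59,
\qquad
\beta_{10}=\sum_{i=5}^{10}\binom{10}{i}(\beta')^{i}(1-\beta')^{10-i}\approx 0.82,
\]

in agreement with the corresponding entry of Table 2. For comparison, the Bonferroni test for n = 10 uses α' = 0.0051, giving β' ≈ 0.058 and β_10 = 1 − (1 − 0.058)^10 ≈ 0.45.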
RESULTS

We have computed the parameters of some binomial tests for n from 1 to 30 and α = 0.05 using the program given in the Appendix (Table 1). To compare these tests, we have also computed their powers for the case of standard partial tests under the alternative hypothesis that all partial null hypotheses are false (Table 2). Figs. 2 and 3 illustrate how the power of these binomial multiple tests varies with the number of standard partial tests n: Fig. 2 does so for binomial tests with fixed k, and Fig. 3 for binomial tests with fixed α'.

Example. Ryman and Jorde (2001) tested the allele frequency difference at 12 loci between two consecutive year classes of brown trout (Salmo trutta) using, for each locus, the chi-square statistic computed from a 2×2 contingency table. The following twelve p-values were obtained: 0.007; 0.611; 0.009; 0.228; 0.110; 0.097; 0.651; 0.053; 0.851; 0.651; 0.058; 0.743. The Bonferroni test fails to reveal any significant difference between the two year classes at the level α = 0.05 because none of the 12 partial p-values is less than α' = 0.05/12 ≈ 0.004. The authors argue for using the sum of partial chi-squares for testing the overall null hypothesis of no difference and, indeed, they succeed in discovering a significant difference in allele frequency by this method. Instead, we could use the [n/2]-binomial multiple test, which is more universally applicable than the sum of chi-squares. According to Table 1, the null hypothesis should be rejected if at least 12/2 = 6 partial tests are significant at the level α' = 0.245. We find seven p-values significant at this level (0.007; 0.009; 0.228; 0.110; 0.097; 0.053; 0.058), from which it follows that the null hypothesis should be rejected.

In Table 3 we compare, for some values of n, the [n/2]-binomial test not only with the Bonferroni test but also with the Simes (1986) and Fisher (1970) tests mentioned in the Introduction. We see that the [n/2]-binomial test is more powerful than the Simes test, but less powerful than the Fisher test, especially for small n.

Until now we have compared the powers of the different tests under the alternative hypothesis that all partial null hypotheses are false. In practice, however, this is not always so: sometimes the falsity of the overall null hypothesis may mean that only a few, even one, of the partial null hypotheses are not true. In Table 4 we compare the power of the Bonferroni, Simes, Fisher and [n/2]-binomial multiple tests for α = 0.05 under two alternative hypotheses in which not all partial null hypotheses are false: (1) half of the n partial null hypotheses are false, and (2) only one partial null hypothesis is false. We see that, under the alternative in which half of the partial null hypotheses are false, the conclusions concerning the comparative properties of the four tests are nearly the same as in the case where the alternative falsifies all the partial null hypotheses. However, under the alternative in which only one partial null hypothesis is false, the [n/2]-binomial test has no advantages and even has a slightly lower power compared to the Bonferroni, Simes and Fisher tests.

In the previous considerations we assumed that the partial tests are independent. In practice, however, this is not always so either. To see what follows from the failure of this assumption, we compared, for several values of n, the significance and power of the different tests, under the alternative hypothesis that all partial null hypotheses are false, in a situation of correlated partial tests. For each n = 4, 10, 20, 30, we simulated 10,000 sets of n values of the test statistic under the alternative hypothesis, correlated at the level of about 0.5, and applied the Bonferroni, Simes, Fisher and [n/2]-binomial multiple tests to these data. The results are presented in Table 5: though the power of the [n/2]-binomial test remains much higher than that of the Bonferroni and Simes tests, its overall significance α_n is greater than the required level α = 0.05, especially for large n. In this situation the [n/2]-binomial test behaves similarly to the Fisher test: the significance levels of both tests become considerably greater than 0.05. Note, however, that the correlation of partial tests affects the [n/2]-binomial test less: the increase in its significance level and the decrease in its power are smaller than those of the Fisher test.
On the contrary, as can be seen from Table 5, the significance level of the Bonferroni and Simes tests is then less than the required level α = 0.05. We have therefore corrected all four tests in such a way that their overall significance becomes equal to 0.05 (simply by adjusting their partial significance levels directly). In particular, as can be seen from Table 6, it was sufficient for this to increase α' for the Bonferroni test by 20-30% (these corrections are considerably smaller than those we would obtain using the equation α' = 1 − (1 − α)^(1/√n) proposed by Tukey et al. (1985) for the case of correlated tests), to increase α' for the Simes test by 10-20%, and to decrease α' for the [n/2]-binomial test nearly twofold. We see that, after these corrections, the power of the [n/2]-binomial test still remains superior to the power of the Bonferroni and Simes tests, especially for large n (the power is about one half for the [n/2]-binomial test versus one third for the Bonferroni and Simes tests for n greater than 15), and close to (and, for large n, even greater than) the power of the Fisher test.

DISCUSSION AND CONCLUSION

Our findings show that the binomial multiple tests have considerably higher power than the Bonferroni test and some of its sequential modifications when the partial tests are independent and the alternative hypothesis consists in the falsity of all partial null hypotheses. Though the powers were computed for a particular testing situation, we believe that this conclusion may be general enough. Hence, if the assumptions of independence of the partial tests and of multiplicity of the violated partial null hypotheses hold, it follows from our comparisons that the binomial tests, and especially the [n/2]-binomial test, are more powerful than the Bonferroni and Simes tests, though less powerful than the Fisher test.

We do not, however, recommend using the Fisher test unreservedly. An obstacle to its use is the necessity of calculating logarithms of partial p-values, which may often be equal to 0, especially when the partial tests are based on discrete statistics. We also see that the Fisher test is more sensitive to correlations among the partial tests.

Of the two kinds of binomial multiple test, that with fixed k and that with fixed α', the tests with fixed k are preferable because, by virtue of the continuity of α' (as opposed to the discreteness of k), the dependence of their power on the number of tests n is more regular and the power values are mostly higher. In particular, the test with k = [n/2] behaves quite well: its power reaches almost double the power of the Bonferroni test for our standard partial tests. Though binomial tests with fixed α' are, in general, less preferable, they may be useful in some cases. For example, if the only piece of information we have about each partial test is whether or not it is significant at the level 0.05, then it is natural to combine these partial tests by using the binomial multiple test with the partial significance level α' = 0.05.

In practice, however, we are not always sure that our standard assumptions about the testing situation (independence of the partial tests and simultaneous violation of all partial null hypotheses) are really true. In particular, the test statistics may be correlated. We saw that, in this case, the Bonferroni test (k = 1) and the [n/2]-binomial test behave quite differently.
The values of the significance and power of the Bonferroni test decrease as compared with their values computed under the assumption of independence, whereas the values of the significance and power of the [n/2]-binomial and Fisher tests increase (especially those of the latter). This means that, in order to provide the required level of significance with no loss in power, we should increase the partial significance levels for the Bonferroni and Simes tests and decrease them for the [n/2]-binomial and Fisher tests. In our example, with partial tests correlated at the level of 0.5, we had to increase the partial significance level of the Bonferroni and Simes tests by about 25% and to decrease the partial significance level of the [n/2]-binomial test by about 50%. After these adjustments, the [n/2]-binomial test still remained more powerful than the Bonferroni and Simes tests, though not as drastically as in the case of independence.

Another deviation from our basic assumptions may consist in only a few of the partial null hypotheses being really false when the overall null hypothesis is false. In the extreme case, only one of the n partial null hypotheses is violated, and we saw that in this case it is rather the Bonferroni or, better, the Simes test that is preferable. But if the violated partial null hypotheses are numerous, say about half of all of them, the [n/2]-binomial and Fisher tests remain considerably more powerful.

The authors are thankful to the anonymous referees for many helpful suggestions. The work was supported by a CNRS senior research fellowship and by a grant from RFBR (07-04-00521) to A.T.

References

Bender R., Lange S., 1999. What's wrong with arguments against multiplicity adjustments // BMJ. V. 318(7183). P. 600.

Blair R.C., Troendle J.F., Beck R.W., 1996. Control of familywise errors in multiple endpoint assessments via stepwise permutation tests // Stat. Med. V. 15(11). P. 1107-1121.

Bland J.M., Altman D.G., 1995. Multiple significance tests: the Bonferroni method // BMJ. V. 310(6973). P. 170.

Bonferroni C.E., 1935. Il calcolo delle assicurazioni su gruppi di teste // Studi in Onore del Professore Salvatore Ortu Carboni. Rome: Italy. P. 13-60.

Cupples L.A., Heeren T., Schatzkin A., Colton T., 1984. Multiple testing of hypotheses in comparing two groups // Ann. Intern. Med. V. 100(1). P. 122-129.

Curtin F., Schulz P., 1998. Multiple correlations and Bonferroni's correction // Biol. Psychiatry. V. 44(8). P. 775-777.

Fisher R.A., 1970. Statistical Methods for Research Workers. 14th ed. London: Oliver and Boyd. 362 p.

Hochberg Y., 1988. A sharper Bonferroni procedure for multiple tests of significance // Biometrika. V. 75(4). P. 800-812.

Hochberg Y., Rom D., 1995. Extensions of multiple testing procedures based on Simes' test // J. Statist. Plan. Infer. V. 48(1). P. 141-152.

Hochberg Y., Tamhane A.C., 1987. Multiple Comparison Procedures. N. Y.: Wiley. 950 p.

Holm S., 1979. A simple sequentially rejective multiple test procedure // Scand. J. Stat. V. 6(1). P. 65-70.

Legendre P., Legendre L., 1998. Numerical Ecology, 2nd ed. Amsterdam: Elsevier. 853 p.

Manly B.F.J., 1985. The Statistics of Natural Selection on Animal Populations. London: Chapman and Hall. 484 p.

Morikawa T., Terao A., Iwasaki M., 1997. Power evaluation of various modified Bonferroni procedures by a Monte Carlo study // J. Biopharm. Stat. V. 7(3). P. 473-477.

Morrison D.F., 2004. Multivariate Statistical Methods, 4th ed. N. Y.: McGraw-Hill. 480 p.

Meinert C.L., 1986.
Clinical Trials: Design, Conduct and Analysis. N. Y.: Oxford Univ. Press. 512 p.

Perneger T.V., 1998. What's wrong with Bonferroni adjustments // BMJ. V. 316(7139). P. 1236-1238.

Prugnolle F., de Meeûs T., Durand P., Sire C., Théron A., 2002. Sex-specific genetic structure in Schistosoma mansoni: evolutionary and epidemiological implications // Mol. Ecol. V. 11(7). P. 1231-1238.

Rice W.R., 1989. Analyzing tables of statistical tests // Evolution. V. 43(1). P. 223-225.

Rom D.M., 1990. A sequentially rejective test procedure based on a modified Bonferroni inequality // Biometrika. V. 77(3). P. 663-665.

Roth A.J., 1999. Multiple comparison procedures for discrete test statistics // J. Stat. Plan. Infer. V. 82(1-2). P. 101-117.

Rothman K.J., 1990. No adjustments are needed for multiple comparisons // Epidemiology. V. 1(1). P. 43-46.

Ryman N., Jorde P.E., 2001. Statistical power when testing for genetic differentiation // Mol. Ecol. V. 10(10). P. 2361-2373.

Samuel-Cahn E., 1996. Is the Simes improved Bonferroni procedure conservative? // Biometrika. V. 83(4). P. 928-933.

Simes R.J., 1986. An improved Bonferroni procedure for multiple tests of significance // Biometrika. V. 73(3). P. 751-754.

Tukey J.W., Ciminera J.L., Heyse J.F., 1985. Testing the statistical certainty of a response to increasing doses of a drug // Biometrics. V. 41(1). P. 295-301.

Westfall P.H., Young S.S., 1993. Resampling-based Multiple Testing. N. Y.: Wiley. 360 p.

Zhang J., Quan H., Ng J., Stepanavage M.E., 1997. Some statistical methods for multiple endpoints in clinical trials // Contr. Clin. Trials. V. 18(3). P. 204-221.

Appendix

Below we give the program which can be used for computing the parameters of the binomial tests. It requires four input values: (1) the number of partial tests, n; (2) the desired significance of the combined test, alpha; (3) the desired significance of the partial tests, alpha1fix (set to 0 if the partial significance is not fixed); (4) the desired number of significant partial tests necessary for rejecting the overall null hypothesis, kfix (set to 0 if this number is not fixed). The output consists of two values: the optimal significance of the partial tests, alpha1opt, and the optimal number of significant partial tests, kopt. If kfix = 0 and alpha1fix > 0, the program searches for the number of significant partial tests, kopt, which provides the minimal difference between the desired significance, alpha, and the really attained significance, an, always keeping an not greater than alpha. If kfix > 0 and alpha1fix = 0, the program searches for the value of the partial significance, alpha1opt, which provides the minimal difference between the desired and attained significances, again keeping an not greater than alpha. We recommend setting kfix = [n/2] and alpha1fix = 0 (see details in the text).

The text of the program in QuickBasic:

DEFINT I-N
CLS
INPUT "Enter n: ", n
INPUT "Enter alpha: ", alpha
INPUT "Enter kfix: ", kfix
INPUT "Enter alpha1fix: ", alpha1fix
da = .0001
IF kfix = 0 THEN
  k1 = 1
  k2 = n
ELSE
  k1 = kfix
  k2 = kfix
END IF
IF alpha1fix = 0 THEN
  a11 = da
  a12 = .5
ELSE
  a11 = alpha1fix
  a12 = alpha1fix
END IF
dmin = 1
FOR k = k2 TO k1 STEP -1
  FOR a1 = a11 TO a12 STEP da
    an = 0
    FOR i = k TO n
      cni = 1
      FOR j = 1 TO i
        cni = cni * (n - j + 1) / j
      NEXT j
      an = an + cni * a1 ^ i * (1 - a1) ^ (n - i)
    NEXT i
    dn = alpha - an
    IF dn >= 0 AND dn <= dmin THEN
      kopt = k
      alpha1opt = a1
      dmin = dn
    END IF
  NEXT a1
NEXT k
PRINT
PRINT "kopt="; kopt
PRINT "alpha1opt=";
PRINT USING "##.####"; alpha1opt
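As a usage illustration (not part of the original appendix), the following short fragment in the same QuickBasic style applies the [n/2]-binomial decision rule to the twelve p-values of the Ryman-Jorde example from the Results section, with the critical partial level alpha1 = 0.245 taken from Table 1; it reports seven significant partial tests and therefore rejects the overall null hypothesis.

DEFINT I-N
DIM p(1 TO 12)
DATA 0.007, 0.611, 0.009, 0.228, 0.110, 0.097
DATA 0.651, 0.053, 0.851, 0.651, 0.058, 0.743
n = 12: k = n \ 2: alpha1 = .245   ' k = [n/2]; alpha1 from Table 1 for n = 12
FOR i = 1 TO n: READ p(i): NEXT i
nsig = 0
FOR i = 1 TO n
  IF p(i) <= alpha1 THEN nsig = nsig + 1
NEXT i
PRINT "Significant partial tests:"; nsig; "of"; n
IF nsig >= k THEN PRINT "Reject H0" ELSE PRINT "Do not reject H0"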
Captions for figures

Fig. 1. Distribution of the test statistic in the "standard partial test" (see text) under the null and alternative partial hypotheses.

Fig. 2. The power of some binomial multiple tests with fixed k as a function of the number of standard partial tests n for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false. The power of the Bonferroni test is also given for comparison.

Fig. 3. The power of some binomial multiple tests with fixed α' as a function of the number of standard partial tests n for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false. The power of the Bonferroni test is also given for comparison.

Fig. 1. [Figure: two normal density curves, Hi0: N(0, 1) and Hi1: N(1, 1); x-axis: test statistic value, y-axis: density.]

Fig. 2. [Figure: overall power versus number of tests (0-30) for the curves k = [n/2], k = 3, k = 2 and k = 1 (Bonferroni).]

Fig. 3. [Figure: overall power versus number of tests (0-30) for binomial tests with fixed α'; the k = 1 (Bonferroni) curve is labelled.]

Table 1. Parameters of some binomial multiple tests as a function of the number of standard partial tests n for α = 0.05

        Required partial significance level α'         Required number k of significant partial tests
 n   k=1 (Bonferroni)   k=2     k=3     k=[n/2]        α'=0.05   α'=0.10   α'=0.25
 1   0.0500             –       –       –              1         –         –
 2   0.0253             0.223   –       0.025          2         2         –
 3   0.0170             0.135   0.368   0.016          2         2         3
 4   0.0127             0.097   0.248   0.097          2         3         4
 5   0.0102             0.076   0.189   0.076          2         3         4
 6   0.0085             0.062   0.153   0.153          2         3         4
 7   0.0073             0.053   0.128   0.128          2         3         5
 8   0.0064             0.046   0.111   0.192          3         3         5
 9   0.0057             0.041   0.097   0.168          3         4         5
10   0.0051             0.036   0.087   0.222          3         4         6
11   0.0047             0.033   0.078   0.199          3         4         6
12   0.0043             0.030   0.071   0.245          3         4         7
13   0.0039             0.028   0.066   0.223          3         4         7
14   0.0037             0.025   0.061   0.263          3         4         7
15   0.0034             0.024   0.056   0.243          3         5         8
16   0.0032             0.022   0.053   0.278          3         5         8
17   0.0030             0.021   0.049   0.260          4         5         8
18   0.0028             0.020   0.047   0.291          4         5         9
19   0.0027             0.019   0.044   0.273          4         5         9
20   0.0026             0.018   0.042   0.301          4         5         9
21   0.0024             0.017   0.040   0.285          4         6        10
22   0.0023             0.016   0.038   0.311          4         6        10
23   0.0022             0.015   0.036   0.296          4         6        10
24   0.0021             0.015   0.034   0.319          4         6        11
25   0.0020             0.014   0.033   0.305          4         6        11
26   0.0020             0.013   0.032   0.326          4         6        11
27   0.0019             0.013   0.030   0.313          4         6        12
28   0.0018             0.012   0.029   0.333          4         7        12
29   0.0018             0.012   0.028   0.320          5         7        12
30   0.0017             0.011   0.027   0.338          5         7        13
Table 2. The power of some binomial multiple tests as a function of the number of standard partial tests n for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false

        Power for fixed k                              Power for fixed α'
 n   k=1 (Bonferroni)   k=2    k=3    k=[n/2]        α'=0.05   α'=0.10   α'=0.25
 1   0.26               –      –      –              0.26      –         –
 2   0.31               0.35   –      0.31           0.07      0.15      –
 3   0.34               0.44   0.42   0.33           0.17      0.34      0.25
 4   0.36               0.49   0.52   0.49           0.28      0.17      0.16
 5   0.38               0.54   0.59   0.54           0.39      0.30      0.39
 6   0.40               0.57   0.64   0.64           0.49      0.43      0.60
 7   0.41               0.60   0.68   0.68           0.58      0.56      0.48
 8   0.43               0.62   0.71   0.74           0.35      0.66      0.66
 9   0.44               0.65   0.73   0.77           0.43      0.49      0.79
10   0.45               0.66   0.76   0.82           0.50      0.59      0.70
11   0.46               0.68   0.78   0.85           0.57      0.68      0.81
12   0.46               0.69   0.79   0.88           0.64      0.75      0.74
13   0.47               0.71   0.81   0.89           0.70      0.81      0.83
14   0.48               0.71   0.82   0.92           0.75      0.86      0.90
15   0.49               0.73   0.83   0.93           0.79      0.76      0.85
16   0.49               0.74   0.85   0.94           0.83      0.81      0.90
17   0.50               0.75   0.85   0.95           0.68      0.85      0.94
18   0.50               0.76   0.86   0.96           0.73      0.89      0.91
19   0.51               0.77   0.87   0.97           0.77      0.92      0.95
20   0.51               0.78   0.88   0.97           0.80      0.94      0.97
21   0.52               0.78   0.89   0.98           0.83      0.89      0.95
22   0.52               0.79   0.89   0.98           0.86      0.91      0.97
23   0.53               0.79   0.89   0.99           0.88      0.93      0.98
24   0.53               0.81   0.90   0.99           0.90      0.95      0.97
25   0.54               0.80   0.90   0.99           0.92      0.96      0.98
26   0.54               0.80   0.91   0.99           0.93      0.97      0.99
27   0.54               0.81   0.91   0.99           0.95      0.98      0.98
28   0.55               0.81   0.91   1.00           0.96      0.96      0.99
29   0.55               0.82   0.92   1.00           0.91      0.97      0.99
30   0.55               0.81   0.92   1.00           0.92      0.98      0.99

Table 3. The power of the tests for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false

 n    Bonferroni   Simes   Fisher   Binomial, k=[n/2]
 4    0.36         0.39    0.60     0.49
10    0.45         0.48    0.90     0.82
20    0.51         0.57    0.99     0.97
30    0.55         0.61    1.00     1.00

Table 4. The power of the tests for α = 0.05 under two alternative hypotheses in which not all partial null hypotheses are false

        Half of the partial null hypotheses are false       Only one partial null hypothesis is false
 n    Bonferroni  Simes  Fisher  Binomial, k=[n/2]       Bonferroni  Simes  Fisher  Binomial, k=[n/2]
 4    0.23        0.23   0.29    0.23                     0.14        0.14   0.15    0.12
10    0.28        0.30   0.51    0.37                     0.10        0.10   0.11    0.08
20    0.33        0.34   0.75    0.57                     0.08        0.09   0.08    0.07
30    0.34        0.37   0.88    0.67                     0.07        0.08   0.08    0.06

Table 5. The significance and power of the tests for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false, for partial tests correlated at the level of about 0.5

        Bonferroni               Simes                    Fisher           Binomial, k=[n/2]
 n    α'      α_n    power     α'      α_n    power     α_n    power     α'     α_n    power
 4    0.0125  0.043  0.27      0.0125  0.046  0.30      0.11   0.51      0.097  0.09   0.45
10    0.0050  0.040  0.27      0.0050  0.045  0.31      0.18   0.66      0.222  0.18   0.67
20    0.0025  0.040  0.27      0.0025  0.045  0.30      0.23   0.73      0.301  0.20   0.77
30    0.0017  0.038  0.27      0.0017  0.042  0.31      0.36   0.76      0.338  0.25   0.81

Table 6. The significance and power of the tests for α = 0.05 under the alternative hypothesis that all partial null hypotheses are false, for partial tests correlated at the level of about 0.5, after correction of the partial significance levels

        Bonferroni               Simes                    Fisher           Binomial, k=[n/2]
 n    α'      α_n    power     α'      α_n    power     α_n    power     α'     α_n    power
 4    0.0148  0.05   0.30      0.0136  0.05   0.31      0.05   0.34      0.058  0.05   0.33
10    0.0068  0.05   0.32      0.0058  0.05   0.33      0.05   0.39      0.091  0.05   0.37
20    0.0032  0.05   0.30      0.0028  0.05   0.32      0.05   0.39      0.164  0.05   0.54
30    0.0023  0.05   0.31      0.0020  0.05   0.33      0.05   0.39      0.158  0.05   0.52
ON THE POWER OF SOME BINOMIAL MODIFICATIONS OF THE BONFERRONI MULTIPLE TEST

A. T. Teriokhin, T. de Meeûs, J.-F. Guégan

Centre for the Genetics and Evolution of Infectious Diseases, IRD, 911 Avenue Agropolis, Montpellier 34394, France; Moscow Lomonosov State University, Faculty of Biology, Leninskie Gory 1, Moscow 119992; e-mail: [email protected]

The Bonferroni multiple test, widely used in testing statistical hypotheses, has a rather low power, which makes high the risk of erroneously accepting the overall null hypothesis, i.e., of failing to detect a really existing effect. We show that if the statistics of the partial tests are independent, this risk can be reduced by using certain binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of the n partial null hypotheses is rejected at a rather stringent significance level (for example, at the level 0.005 in the case n = 10), as prescribed by the Bonferroni test, binomial tests are proposed in which the null hypothesis is rejected when at least k partial null hypotheses (for example, k = n/2) are rejected at a much less stringent significance level (up to 30-50%). It is shown that the power of such binomial tests is substantially higher than the power of the Bonferroni test and of some of its variants. Moreover, this approach makes it possible to combine tests whose results are known only for a fixed significance level. The paper provides tables and a computer program which allow one to find (from the tables or by means of the program) the necessary parameters of a binomial test, i.e., either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).