Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
STATISTICS IN MEDICINE Statist. Med. 2001; 20:3009–3021 (DOI: 10.1002/sim.980) A three-stage clinical trial design for rare disorders Visa E. A. Honkanen1;2;3 , Andrew F. Siegel4 , John Paul Szalai5 , Vance Berger6 , Brian M. Feldman1;2;7;∗; † and Je7rey N. Siegel8 1 Division of Rheumatology; The Hospital for Sick Children; Toronto; Canada 2 University of Toronto; Toronto; Ontario; Canada 3 Children’s Hospital; University of Helsinki; Helsinki; Finland 4 Departments of Management Science; Finance; Statistics and Molecular Biotechnology; University of Washington at Seattle and the Institute for Systems Biology; Seattle; U.S.A. 5 Department of Research Design and Biostatistics; Sunnybrook Health Science Center; Toronto; Canada 6 Food and Drug Administration; Center for Biologics Evaluation and Research; O'ce of Establishment Licensing and Product Surveillance; Division of Biostatistics and Epidemiology; Rockville; U.S.A. 7 Department of Paediatrics and Community Medicine; University of Toronto; Toronto; Canada 8 Food and Drug Administration; Center for Biologics Evaluation and Research; O'ce of Therapeutics Research and Review; Division of Clinical Trial Design and Analysis; Rockville; U.S.A. SUMMARY Many clinical trials of uncommon diseases are underpowered because of the di>culty of recruiting adequate numbers of subjects. We propose a clinical trial design with improved statistical power compared to the traditional randomized trial for use in clinical trials of rare diseases. The three-stage clinical trial design consists of an initial randomized placebo-controlled stage, a randomized withdrawal stage for subjects who responded, and a third randomized stage for placebo non-responders who subsequently respond to treatment. Test level and power were assessed by computer-intensive exact calculations. The three-stage clinical trial design was found to be consistently superior to the traditional randomized trial design in all cases examined, with sample sizes typically reduced by 20 per cent to 30 per cent while maintaining comparable power. When a treatment clearly superior to placebo was considered, our design reached a power of 75 per cent with a sample of 21 patients compared with the 52 needed to attain this power when only a randomized controlled trial was used. In situations where patient numbers are limited, a three-stage clinical trial design may be a more powerful design than the traditional randomized trial for detecting clinical beneDts. Copyright ? 2001 John Wiley & Sons, Ltd. INTRODUCTION To assure that a clinical trial will produce interpretable results, the combination of sample size and design must provide adequate statistical power to detect the therapeutic beneDt of ∗ Correspondence to: Brian Feldman, Division of Rheumatology, Hospital for Sick Children, 555 University Avenue, Toronto, ON M5G 1X8, Canada [email protected] † E-mail: Copyright ? 2001 John Wiley & Sons, Ltd. Received September 2000 Accepted March 2001 3010 V. E. A. HONKANEN ET AL. a new therapy. When studying rare disorders, the number of patients available to study is usually small, making it di>cult to enrol enough patients to power a randomized clinical trial adequately. Another strategy for increasing statistical power is to design the trial to collect more information from each patient, for example, with cross-over designs. Crossover designs, however, rely on every patient taking part in a placebo stage equal in length to the treatment stage and on a total washout of the treatment e7ect, without which the design may yield invalid results [1; 2]. In addition, when expectations for the drug are high and equipoise is lacking [3], the placebo stage may pose a recruitment problem [4]. These problems become especially evident in studies of rare diseases in paediatric populations, such as juvenile rheumatoid arthritis or juvenile dermatomyositis [5; 6], when the treatment may have been tested in adults with promising results, thus raising expectations for its e>cacy in children. Another alternative to the simple randomized trial is the randomized withdrawal design [7]. A randomized withdrawal trial is an example of an enrichment design in which subjects previously demonstrated to respond to treatment are randomly withdrawn to placebo or maintained on therapy. Recently, the withdrawal design has been used successfully in studies of drugs with long or uncertain wash-out times in chronic disorders [8; 9] and was provided as evidence of e>cacy for the approval of etanercept for juvenile rheumatoid arthritis [10]. We asked whether by combining a simple randomized trial with a randomized withdrawal trial in the same group of subjects, thereby getting more information from each one, we could study fewer patients than in a traditional randomized control trial design. We used this approach to develop a three-stage clinical trial design, consisting of an initial randomized placebo-controlled stage, a randomized withdrawal stage for subjects who responded to drug, and a third randomized stage for initial non-responders to placebo who subsequently responded to drug treatment given open-label. We then assessed the power of this design using computationally-intense exact calculations to ascertain whether it worked, whether it o7ered practical advantages over more conventional designs, and whether studying the same subjects twice led to false positive results. METHODS Three-stage clinical trial design The design was based on classifying the treated subjects as responders or non-responders, a method now recommended in analysing the e>cacy of treatment in a number of chronic rheumatic conditions. The American College of Rheumatology has published standardized criteria for a signiDcant clinical response to a drug therapy in adults [11], and proposed a similar type of criteria for children. To assess responses in the randomized withdrawal stage of the trial, criteria for Mare of disease activity are used. Examples of criteria for Mare include a published trial of randomized withdrawal of hydroxychloroquine in systemic lupus erythematosus [12] or a trial utilized to determine the e>cacy of etanercept in juvenile rheumatoid arthritis [10]. In the discussion which follows, a responder in the randomized withdrawal stage of the trial is deDned as a subject who does not Mare. Our design is illustrated in Figure 1. In the Drst stage, patients are enrolled into a standard two-group parallel-arm randomized clinical trial. Each patient is randomized to receive either Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3011 Figure 1. Design of the three-stage design clinical trial. Stage I of the three-stage design consists of an ordinary randomized placebo-controlled trial, which yields the Drst two-by-two contingency table with p1 as its one-sided chi-squared p-value. In the second stage, the patients who responded to treatment in stage I are randomly assigned to treatment or placebo, resulting in the stage II two-by-two table with p2 as its one-sided chi-squared p-value. In the third stage, the patients who did not respond to placebo in stage I are placed on active treatment, and responders are randomly assigned to treatment or placebo, resulting in the stage III two-by-two table with p3 as its one-sided chi-squared p-value. Variable names in each circle indicate the number of subjects, for example, nCNTR is the number of control (placebo) non-responders subsequently assigned to treatment who then responded. Probabilities of responding are indicated along branches, for example, pTC is the probability of responding by a subject initially assigned to the treatment group but subsequently reassigned to the control (placebo) group. We assume that each subject responds independently at the appropriate probability at each stage so that, for example, nT R follows a binomial distribution with parameters nT and pT . Assignments are made in a balanced way so that, for example, nT = int(n=2) and nC = n − nT . the drug studied (active therapy) or a placebo. Subjects may either respond or not; the results produce the stage I two-by-two contingency table whose one-sided chi-squared p-value∗ is denoted p1 . At this point, the subjects who did not respond to active treatment end the study ∗ The one-sided chi-squared p-value, for which smaller values correspond to more e7ective treatment, may be computed from the conventional two-sided chi-squared p-value as follows. If subjects assigned to treatment respond at a higher rate than those assigned to control, then set pone-sided = ptwo-sided =2. If they respond at a lower rate, then set pone-sided = 1 − ptwo-sided =2. If they respond at an equal rate, then set pone-sided = 1=2. In addition, if computation of the chi-squared statistic would involve division by zero, then set pone-sided = 1=2. Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 3012 V. E. A. HONKANEN ET AL. (in a cross-over trial they would start on placebo after a wash-out period.) Similarly, subjects who respond favourably to placebo also end the study (in a cross-over trial they would start on active treatment after a wash-out period). In the second stage, the patients who did respond to treatment in stage I are randomized either to stay on active treatment or withdraw and be placed on placebo. In these patients the disease either recurs or stays in remission. The original treatment arm ends with the stage II two-by-two contingency table whose one-sided chi-squared p-value is denoted p2 . In the third stage, the patients who did not respond to placebo in stage I are placed on active treatment: those who do not respond exit the study, and responders are randomly assigned either to stay on active treatment or return to placebo. In these patients, the disease either recurs or stays in remission. The original placebo arm ends with the stage III two-by-two contingency table whose one-sided chi-squared p-value is denoted p3 . The p-values from the three stages (p1 ; p2 and p3 ) are asymptotically independent and asymptotically distributed according to a uniform distribution on (0; 1) under the hypothesis of no di7erences between treatment and placebo in any of the three stages, so that pT = pC; pTT = pTC, and pCTT = pCTC, using the notation in Figure 1. This distribution follows from the fact that independent random assignments to treatment and placebo are made at each stage and from asymptotic properties of chi-squared tests. It is worth noting that these results remain true even when we re-use the subjects because each two-by-two table is independently randomized. These three p-values are then combined using Fisher’s method, which rejects the null hypothesis if the product p1 p2 p3 is less than 0.001844. (This may be derived from the fact that −2 ln(p1 p2 p3 ) follows, asymptotically, a chi-squared distribution with 6 degrees of freedom.) If the two-by-two table at either stage II or stage III would involve fewer than four people, it is not used, and the result of the three-stage design would be computed either using Fisher’s method with the two available p-values (rejecting the null hypothesis if the appropriate product p1 p2 or p1 p3 is less than 0.008705; this may be derived from the fact that −2 ln(p1 p2 ) follows, asymptotically, a chi-squared distribution with 4 degrees of freedom) or using just p1 if it is the only p-value available. Fisher’s method was chosen over two alternative methods for combining p-values that were considered. The Drst alternative, the minimum p-value (using critical value (1 − 0:95)k when p-values from k stages are combined), was found to have uniformly lower power than Fisher’s method in all scenarios considered. The second alternative, a weighted Fisher’s method (using −2 ln(p1n1 p2n2 p3n3 ) =−2n1 ln(p1 ) − 2n2 ln(p2 ) − 2n3 ln(p3 ) as the test statistic, where ni is the number of subjects enrolled in stage i), gave similar results to Fisher’s method in many cases, but was considerably worse than Fisher’s method in the mixture model (for example, having 67 per cent power as compared to 84 per cent for Fisher’s method in the case of initial sample size 25). We therefore chose the unweighted Fisher’s method as a simple, e7ective and well-established solution. Please note that we are not claiming that the statistical data are independent from one stage of the trial to another, and such a claim is not necessary for the success of the three-stage design. Instead, we assert the weaker claim that the p-values are asymptotically independent across the three stages (even though the response rates, for example, might not be independent). For example, if the stage I treatment group happens randomly to show a high response rate, we may well see higher response rates in stage II (consistent with statistical dependence between the results of the two stages); none the less, under the hypothesis of no treatment e7ect, the full and complete re-randomization of all subjects who become part of stage II Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3013 will cause the p-value for stage II to be asymptotically uniformly distributed regardless of the results of stage I. Since the stage II p-value has the same asymptotic conditional probability distribution (uniform [0; 1]) regardless of the result of stage I, it follows that the stage II p-value is asymptotically statistically independent of the stage I p-value. This is all that is needed for the asymptotic validity of Fisher’s method for combining independent p-values from the three stages. Note that the analysis proposed here assumes random marginals in the tables, that is, the total number of subjects responding at a given stage is taken to be random. While this is a well-established approach, one alternative is to assume that the total number responding is Dxed, and the randomness arises only from the randomized assignments [13]. This could be accomplished here by substituting one-sided p-values from Fisher’s exact test in place of p-values from the chi-squared test while leaving all other aspects of the design unchanged. Evaluating the design We developed software to evaluate the performance of the full three-stage design and of the traditional randomized clinical trial design (which uses only the Drst stage) by computationallyintensive enumeration of all possible outcomes that could occur, given the initial sample size n and values for the seven probabilities of responding (pT, pC, pTT, pTC, pCT, pCTT and pCTC) as shown in Figure 1 (for example, with n = 40 there are 5473556 possible outcomes). For each possible observable outcome (deDned as the seven numbers nTR, nCR, nTRTR, nTRCR, nCNTR, nCNTRTR and nCNTRCR each specifying a number of subjects responding) we computed the stage I (randomized clinical trial) p-value p1 and the combined result for all three stages to see if the null hypothesis was rejected at the 5 per cent level. We also, for each possible outcome, computed its probability of occurrence as the product of the seven binomial probabilities, one corresponding to each probability of responding. By summing these probabilities of occurrence for the observed outcomes for which the null hypothesis was rejected, we were able to compute the exact probability of rejecting the null hypothesis for each design. Here is the algorithm we used to enumerate the sample space and compute the exact power of a test: 1. Set the initial number assigned to treatment: nT = Int(n=2), where Int is the integer part function. 2. Set the initial number assigned to control: nC = n − nT . 3. Set the number of stage I treatment responders using a control loop: FOR nTR = 0 TO nT. 4. Set the number assigned to treatment in stage II: nTRT = Int(nTR=2). 5. Set the number assigned to control in stage II: nTRC = nTR − nTRT . 6. Set the number of stage I control responders using a control loop: FOR nCR = 0 TO nC. 7. Assign all control non-responders to treatment in preparation for stage III: nCNT = nC − nCR. 8. Set the number of treatment responders in stage II using a control loop: FOR nTRTR = 0 TO nTRT. 9. Set the number of control responders in stage II using a control loop: FOR nTRCR = 0 TO nTRC. Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 3014 V. E. A. HONKANEN ET AL. 10. Set the number of subjects who will participate in stage III (these are stage I control nonresponders subsequently responding to treatment) using a control loop: FOR nCNTR = 0 TO nCNT. 11. Set the number assigned to treatment in stage III: nCNTRT = Int(nCNTR=2). 12. Set the number assigned to control in stage III: nCNTRC = nCNTR − nCNTRT . 13. Set the number of treatment responders in stage III using a control loop: FOR nCNTRTR = 0 TO nCNTRT. 14. Set the number of control responders in stage III using a control loop: FOR nCNTRCR = 0 TO nCNTRC. 15. Within this innermost loop, we see every possible conDguration (for example, the stage I table counts are nTr, nT − nTR, nCR, nC − nCR). We can therefore compute all appropriate p-values and accumulate the probability if the null hypothesis is rejected. The probability of the current conDguration (reMecting all three stages) may be computed as the following product of binomials and conditional binomials (although we optimized for e>ciency by precomputing binomial probabilities and moving multiplications outside control loops wherever possible): nT nTR pT nTR (1 − pT )nT −nTR nC nCR pC nCR (1 − pC)nC−nCR nTRT × pTT nTRTR (1 − pTT )nTRT −nTRTR nTRTR nTRC nCNT pTC nTRCR (1 − pTC)nTRC−nTRCR pCT nCNTR (1 − pCT )nCNT −nCNTR nTRCR nCNTR nCNTRT pCTT nCNTRTR (1 − pCTT )nCNTRT −nCNTRTR nCNTRTR nCNTRC × pCTC nCNTRCR (1 − pCTC)nCNTRC−nCNTRCR nCNTRCR 16. All control loops end here: NEXT nCNTRCR, NEXT nCNTRTR, NEXT nCNTR, NEXT nTRCR, NEXT nTRTR, NEXT nCR, NEXT nTR. RESULTS In a standard randomized clinical trial, e>cacy of a therapeutic agent is inferred from the initial response of patients to treatment compared to control. In the three-stage design (Figure 1), the trial begins with an initial exposure to drug or control (stage I) identical to a standard randomized clinical trial. Then responders to drug are randomized to remain on drug or be withdrawn to placebo (stage II). In addition, subjects who did not respond to placebo in the initial stage are given drug open-label. Those who respond are also randomized to remain on drug or be withdrawn to placebo (stage III). Thus, in contrast to the standard randomized controlled trial which infers e>cacy of a new therapy only based on the initial exposure Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3015 Figure 2. Type I error associated with the three-stage design clinical trial. Exact probability of type I error when the nominal test level is 5 per cent. Results for each of the two designs are shown for a uniform response rate of 20 per cent at sample sizes from 20 to 65 by 5, together with results for a uniform response rate of 40 per cent at each sample size. Some deviation from the nominal 5 per cent level is to be expected due to Dnite sample size. to drug or placebo, the three-stage design also uses information from the two withdrawal stages (stages II and III) to reach inferences about the e>cacy of the therapy. We studied various scenarios in which this new design might be used. Response rates for the active treatments were based on the expectation that the response rates of active therapy would be 0.4 and 0.5, that is, 40 per cent or 50 per cent of the patients would be responders. These are approximately the response rates achieved with some of the new biologic therapies being studied in rheumatic diseases. Since our design is intended for studies of chronic, severe diseases where the response rate to placebo is usually low, we set the response rate to placebo at 20 per cent when the active treatment was superior to placebo. Type I error measures the chance that a clinical trial will show that a therapy is e>cacious even though there is actually no di7erence between treatment and control. Figure 2 shows the exact probability of type I error when the nominal test level is 5 per cent. Results are shown for each method (stage I RCT and three-stage design) for a uniform response rate of 20 per cent at each sample size (20, 25, 30, 35, 40, 45, 50, 55, 60 and 65) together with results for uniform response rate 40 per cent at each sample size. Some deviation from the nominal level is to be expected due to Dnite sample size. The results indicate that the three-stage design is not associated with a statistically higher probability of type I error than the standard stage I randomized trial. The statistical power of a clinical trial is deDned as the ability of a trial to detect a treatment e7ect if the treatment is e7ective, given a speciDed sample size and certain assumptions about the e7ects of therapy. Figure 3 shows the statistical power of the three-stage design and the randomized clinical trial design when the response rate to treatment is 40 per cent and the response rate to placebo is 20 per cent, for sample sizes between 20 and 65 (20, 25, 30, 35, 40, 45, 50, 55, 60 and 65). The three-stage design was consistently more powerful than a standard randomized clinical trial for all sample sizes studied. Under these assumptions, 50 per cent power is achieved with a sample size of 49 subjects with the three-stage design Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 3016 V. E. A. HONKANEN ET AL. Figure 3. Statistical power of the three-stage design assuming a response rate of 40 per cent. Exact statistical power of the three-stage design and the randomized clinical trial design when the response rate to treatment is 40 per cent and the response rate to placebo is 20 per cent, for sample sizes 20, 25, 30, 35, 40, 45, 50, 55, 60 and 65. Figure 4. Statistical power of the three-stage design assuming a response rate of 50 per cent. Exact statistical power with response rate to treatment set at 50 per cent while the response rate to placebo is 20 per cent. compared to 54 subjects using a standard randomized controlled trial. Figure 4 shows how these results change when the treatment response rate is increased to 50 per cent. The threestage design was again consistently more powerful than the standard randomized clinical trial at all sample sizes. Under these assumptions, 75 per cent power is achieved with a sample size of 42 subjects with the three-stage design compared to 52 subjects using a standard, randomized controlled trial. Since the subjects in the randomized withdrawal stages of the trial (stages II and III) are previously selected for having responded to drug (in stage I), the proportion of the subjects who maintain a response to drug may be higher than the proportion who respond to drug in the initial stage of the trial. This inequality in expected response rates can be modelled using Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3017 Figure 5. Statistical power of the three-stage design using a non-uniform response rate model. Exact statistical power for a non-uniform response rate model in which response rates are allowed to di7er in each stage. The population is a mixture model in which 30 per cent of subjects always respond to treatment and never respond to placebo, while the remaining 70 per cent respond randomly and independently at a constant rate regardless of assignment to treatment or placebo. Stage I response rates are set at 50 per cent for treatment and 20 per cent for placebo. Selection e7ects increase the probability of responding to treatment and decrease the probability of responding to placebo in stages II and III, yielding increased power. a non-uniform response rate model. Figure 5 shows the statistical power for a non-uniform response rate model in which response rates are allowed to di7er in each stage. For this purpose, we assumed a mixture model for the population, in which 30 per cent of subjects always respond to treatment and never respond to placebo, while the remaining 70 per cent respond randomly and independently at a constant rate regardless of assignment (treatment or placebo). To achieve responses in stage I comparable to those of Figure 4 (that is, 50 per cent for treatment and 20 per cent for placebo) this constant rate for random responders was set at 2=7 = 28:6 per cent. For this model as speciDed, the resulting probabilities (see notation in Figure 1) are pT = 50 per cent, pC = 20 per cent, pTT = 71:4 per cent, pTC = 11:4 per cent, pCT = 55:4 per cent, pCTT = 77:0 per cent and pCTC = 9:2 per cent. Note how these natural selection e7ects increase the probability of responding to treatment and decrease the probability of responding to placebo in stages II and III as, for example, stage II keeps all of the consistent responders but only a fraction of the random responders. Using the assumptions of the mixture model, 75 per cent power is achieved with the three-stage design with a sample size of 21 subjects, compared to 52 subjects with the standard randomized controlled trial. DISCUSSION In a clinical trial conducted using the three-stage design, the results of treatment with study drug (stage I) and randomized withdrawal (stages II and III) give rise to three separate assessments of e>cacy, in contrast to the single measure of e>cacy used to calculate overall statistical signiDcance in a conventional randomized clinical trial. The individual measures of Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 3018 V. E. A. HONKANEN ET AL. statistical signiDcance are then combined using Fisher’s method to derive a single overall pvalue. The three-stage design was found to be uniformly more powerful than the single-stage randomized clinical trial in all cases studied, especially in trials with low patient numbers and highly e7ective treatment. Despite the increase in power, the three-stage design is not associated with a higher probability of type I error than the conventional parallel-arm randomized clinical trial design. As an example of the relative power of the three-stage design, when the response rates were set uniformly at 50 per cent for active treatment and 20 per cent for placebo, statistical power of 75 per cent is attained at 42 patients by using the three-stage design. In a single-stage design we would need 52 patients to achieve the same statistical power (Figure 4). If the initial response rates are 50 per cent for active treatment and 20 per cent for placebo, but the population is a mixture of consistent and random responders, a power of 75 per cent is achieved by the three-stage design at 21 patients, less than half of the 52 necessary with a conventional randomized clinical trial (Figure 5). Several caveats should be considered in the interpretation of the second and third stages of the trial that involve drug withdrawal. First, before subjects are enrolled in the withdrawal stage, they must Drst have responded to study drug, so that the frequency of Mare in the withdrawal study must be understood to pertain to the responding subpopulation and not to the entire patient population. However, one of the advantages of the three-stage design over some withdrawal designs is that the population of patients randomized in the withdrawal stages consists of all subjects who initially responded. Second, in withdrawal studies, there is the possibility that drug withdrawal itself may induce a Mare of disease that is greater than the patient had at baseline. The three-stage design should be avoided in the assessment of drugs known to cause a rebound Mare of disease when they are withdrawn. To conclude that a positive result in a three-stage clinical trial implies e>cacy of the study drug, it would be important to observe trends indicating an e7ect of drug in the induction stage of the trial (stage I) as well as in the withdrawal stages (stages II and III). Similar to other clinical trials, a clinical trial using the three-stage design could incorporate an interim analysis with early stopping rules if e>cacy is seen after stage I has been completed. If the p-value for stage I (p1 ) is small enough, there would be no reason to complete stages II and III for purposes of hypothesis testing. In addition, a penalty would not be necessary because one cannot get signiDcance after stage I and lose it later, by virtue of the fact that p1 p2 p3 ¡p1 p2 ¡p1 . However, investigators may wish to complete the trial regardless, since the withdrawal stages of the trial provide additional information about whether continued treatment with the drug is necessary to maintain the clinical e7ects. An interim analysis for futility could also be incorporated after stage I so that if the treatment is found to lack e>cacy subjects can be spared moving into stages II and III unnecessarily. The three-stage design o7ers several features which may address the concerns of subjects and their physicians about the unnecessary exposure to placebo. In our design, all the patients entered in the study have a chance to derive a therapeutic beneDt from their participation in the trial in that all subjects assigned to the placebo arm who do not improve are subsequently o7ered active treatment in the later stages of the trial. In addition, the three-stage design avoids treating patients unnecessarily in that the subjects who respond to placebo in the Drst stage do not undergo active treatment. Similarly, those who do not respond to active treatment in the Drst stage do not have to go on placebo. In addition, only half of the patients who respond to active treatment in the Drst stage have to change from treatment to placebo. As an example, if the response rate to treatment is 50 per cent and to Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3019 placebo 20 per cent, and the number of patients is 50, then instead of 25 (cross-over design), only 11–12 patients on average have to start on placebo after the Drst stage, and 5 patients are saved from active treatment. (This average of 11–12 may be found by adding the expected number of patients enrolled in placebo in stages II and III as follows: (n=2)(pT )=2 +(n=2)(1 − pC)(pCT )=2, using n = 50; pT = pCT = 0:5 and pC = 0:2.) These features in our design may make it more acceptable to informed patients without compromising the statistical rigour of the trial. The challenge in conducting clinical trials of rare disorders is to enrol enough patients to reach acceptable statistical power. Sometimes even a multi-centre collaboration does not provide a su>cient number. Too often, researchers try to solve this problem by overestimating the number of patients to be enrolled or simply go ahead, hoping that despite low statistical power they will attain a signiDcant result. Thus, a study with a small number of patients runs the risk of failing to detect even a highly e7ective treatment [14]. Conducting a trial with low power or without an adequate power analysis could be considered unethical. A cross-over design gives the best possible power when all the conditions are met [15]. Often this is not the case, since to obtain unbiased results, the intervention must have a completely reversible e7ect [2]. With many interventions, however, we do not know whether this is true. In addition, when a priori expectations of the intervention are strongly positive, recruitment of patients into a full-scale cross-over trial becomes di>cult [2]. The sparseness of randomized clinical trials in paediatric rheumatological disorders, such as juvenile dermatomyositis, illustrates these problems [16]. Our design does not solve all these problems but can provide substantial advantages in certain situations. First, the patients changed from placebo to treatment are not included in the same analysis with those going from treatment to placebo. Hence, the potential carry-over e7ect from treatment does not compromise the results. Second, all the patients entered in the study have a chance to receive the active treatment. This factor, together with saving some patients from an unnecessary treatment or placebo stage, may make the design more attractive to subjects, especially in a situation where preliminary data suggest that the treatment is e7ective. This is often the case in paediatric medicine. When the treatment e7ect was clearly superior to placebo, it is possible to reach an acceptable power with 21 patients instead of the 52 who would have been needed with the classic randomized controlled trial design. Di7erences of this magnitude are important when dealing with rare disorders. Our design is more complicated to work through than a simple parallel trial, but this complexity must be weighed against the power advantage it o7ers. It should be noted that the three-stage design is not optimal for all clinical situations and does not replace the traditional randomized trial design. First, it is applicable only to chronic conditions where both response to therapy and Mare upon withdrawal of therapy can be assessed. So, for example, treatments with clinical e7ects that do not wash out within a reasonable time frame would probably not be good candidates for this type of design. In designing a three-stage clinical trial, care should be taken to allow a duration of the withdrawal phase long enough so that the drug would be completely washed out and the clinical e7ects of therapy reversed, otherwise the Mare rates in the withdrawal phase may not approach the response rate in stage I. However, unlike the cross-over design, there is no expectation that subjects return to their pre-treatment clinical status during stages II and III, since a Mare can be deDned in a way that represents just a partial return toward baseline. Second, some placebo subjects may barely meet criteria for being a responder and would consequently forgo active Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 3020 V. E. A. HONKANEN ET AL. treatment even though they may have beneDted from it. To minimize the number of spurious placebo responses, care should be taken in selecting clinically relevant response criteria that minimize the frequency of placebo responders. Third, since fewer patients may be available in the initial stage of the trial, the ability to precisely determine initial response rates may be less than with a traditional randomized trial design. Fourth, the three-stage design may be less suited for controlled assessment of safety in that the initial stage of randomized assignment to active treatment or control is limited in patient number and in duration. None the less, because of its advantages in power, the three-stage design may be particularly helpful in studies of rare diseases where subject number is limited. Another situation where this study design may o7er advantages is in the e>cacy of a therapeutic agent in a particular patient subpopulation when e>cacy in the general patient population has already been established. Finally, the three-stage design may be helpful in early stages of drug testing where small cohorts of patients are tested but where decisions must be made about the choice of dosing, about choosing between alternative candidate compounds or about whether to proceed with drug development. In recent years, as patients have become more aware of the methods used in clinical trials, randomized studies have been criticized on ethical grounds [17; 18]. This perception may a7ect both recruitment and compliance. We can address this problem in two ways. First, we can emphasize to patients and the general public the importance of objective studies, even of promising therapies; the double-blind randomized design remains the gold standard for clinical trials. Second, we can try to tailor designs to each situation and test the validity, power and robustness of each one before using the model in a real trial. Our design can be used in situations in which the number of patients is less than 60, and preliminary data indicate that the treatment is at least moderately e7ective. In these situations, the same statistical power can be achieved with fewer patients, cutting required sample size by about 20 per cent to 30 per cent in many cases and up to 60 per cent in others. ACKNOWLEDGEMENTS Andrew F. Siegel holds the Grant I. Butterbaugh Professorship at the University of Washington. Brian M. Feldman is an Ontario University of Health Career Scientist. REFERENCES 1. Hills M, Armitage P. The two-period crossover trial. British Journal of Clinical Pharmacology 1979; 8:7–20. 2. Piantadosi S. Cross-over designs. In Clinical Trials, A Methodologic Perspective. Wiley: Toronto, 1997; 404 – 418. 3. Schulz KF. Unbiased research and the human spirit: the challenges of randomized controlled trials. Canadian Medical Association Journal 1995; 153(6):783–786. 4. Gotay CC. Accrual to cancer clinical trials: directions from the research literature. Social Science Medicine 1991; 33(5):569 –577. 5. Feldman BM, Giannini EH. Where’s the evidence? Putting clinical science into pediatric rheumatology. Journal of Rheumatology 1996; 23:1502–1504. 6. Hanissian AS, Masti AT, Pitner SE, Cape CC, Medsger TA. Polymyositis and dermatomyositis in children: an epidemiologic and clinical comparative analysis. Journal of Rheumatology 1982; 9:390 –394. 7. Temple RJ. Special study designs: early escape, enrichment, studies in non- responders. Communications in Statistics Theory and Methods 1994; 23(2):499 –530. 8. Chirac C, Dulac O, Gram L. Vigabatrin withdrawal randomized study in children. Epilepsy Research 1996; 25(3):209 –215. Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021 THREE-STAGE TRIAL DESIGN FOR RARE DISORDERS 3021 9. Caldwell JR, Furst DE, Smith AL, Clark JA, Bonebrake RA, Gruhn WB, McIlwain HH, Logne CM. Flare during drug withdrawal as a method to support e>cacy in rheumatoid arthritis: Amiprilose hydrochloride as an example in a double blind randomized study. Journal of Rheumatology 1998; 25(1):30 –35. 10. Lovell DJ, Giannini EH, Rei7 A, Cawkwell GD, Silverman ED, Nocton JJ, Stein LD, Gedalia A, Ilowite NT, Wallace CA, Whitmore J, Finck BK. Etanercept in children with polyarticular juvenile rheumatoid arthritis. Pediatric Rheumatology Collaborative Study Group. New England Journal of Medicine 2000; 342(11): 763–769. 11. Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C, Katz LM, Lightfoot R, Paulus H, Strand V. American College of Rheumatology. Preliminary deDnition of improvement in rheumatoid arthritis. Arthritis and Rheumatism 1995; 38(6):727–735. 12. The Canadian Hydroxychloroquine Study Group. A randomized study of the e7ect of withdrawing hydroxychloroquine sulfate in systemic lupus erythematosus. New England Journal of Medicine 1991; 324:150 –154. 13. Ludbrook J, Dudley H. Why permutation tests are superior to t and F tests in biomedical research. American Statistician 1998; 52(2):127–132. 14. Lilford RJ, Thornton JG, Braunholtz D. Clinical trials in rare diseases, a way out of a conundrum. British Medical Journal 1995; 311(7020):1621–1625. 15. Packer M. Combined beta-adrenergic and calcium entry blockade in angina pectoris. New England Journal of Medicine 1989; 320:709 –718. 16. Rider LG, Miller FU. ClassiDcation and treatment of juvenile idiopathic inMammatory myopathies. Rheumatic Disease Clinics of North America 1997; 23(3):619 – 655. 17. Schafer A. The ethics of the randomized clinical trial. New England Journal of Medicine 1982; 307(12):719 –724. 18. Kotwall CA, Mahoney LJ, Myers RE, Decoste L. Reasons for non-entry in randomized clinical trials for breast cancer: a single institutional study. Journal of Surgical Oncology 1992; 50(2):125–129. Copyright ? 2001 John Wiley & Sons, Ltd. Statist. Med. 2001; 20:3009–3021