Controlled Clinical Trials 25 (2004) 143–156
www.elsevier.com/locate/conclintrial

Assessment of blinding in clinical trials

Heejung Bang a,*, Liyun Ni b, Clarence E. Davis a

a Department of Biostatistics, University of North Carolina at Chapel Hill, 137 E. Franklin Street, Chapel Hill, NC 27599, USA
b Amgen, Inc., Thousand Oaks, CA, USA

Received 18 February 2003; accepted 17 October 2003

Abstract

Success of blinding is a fundamental issue in many clinical trials. The validity of a trial may be questioned if this important assumption is violated. Although thousands of ostensibly double-blind trials are conducted annually and investigators acknowledge the importance of blinding, attempts to measure the effectiveness of blinding are rarely discussed. Several published papers proposed ways to evaluate the success of blinding, but none of the methods are commonly used or regarded as standard. This paper investigates a new approach to assess the success of blinding in clinical trials. The blinding index proposed is scaled to an interval of −1 to 1, 1 being complete lack of blinding, 0 being consistent with perfect blinding and −1 indicating opposite guessing which may be related to unblinding. It has the ability to detect a relatively low degree of blinding, response bias and different behaviors in two arms. The proposed method is applied to a clinical trial of cholesterol-lowering medication in a group of elderly people.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Blinding; Blinding index; CRISP; Clinical trial; Masking; Multinomial test

1. Introduction

Blinding embodies a rich history spanning a couple of centuries and represents an important, distinct aspect of randomized controlled trials. The double-blind procedure has been regarded as an important design feature in clinical trials.
Although most researchers appreciate its meaning, there is some confusion in the definition of blinding (e.g., single-, double- and triple-blind, masking and allocation concealment) [1]. The term double-blind usually refers to keeping study participants, investigators and data assessors unaware of the allocated treatment or therapy, so that they are not influenced psychologically or physically by that knowledge. Single-blind normally means blinding of the participants only [2,3]. Blinding not only can prevent selection or ascertainment (i.e., information) biases, but also can improve compliance and retention of trial participants [1]. In well-blinded trials one can be confident that any differential effects between groups stem from the treatment rather than the subjects' or researchers' biases [4].

Blinding is difficult to achieve in some situations, such as comparison of surgical vs. medical interventions, or of psychological vs. no treatments. In drug trials comparing active treatment with placebo or a competing active treatment, however, researchers have established many methods to assure the achievement of blinding, such as making the appearance, smell and taste of active drug and placebo the same to disguise their dissimilarity. When two active drugs are compared, the double-dummy method using two placebos is often used [5]. The degree of blinding is ascertained by directly asking participants, health-care providers or outcome assessors which treatment they think was administered, at several stages during or at the end of a trial.

* Corresponding author. Tel.: +1-919-962-3231; fax: +1-919-962-3265. E-mail address: [email protected] (H. Bang).
0197-2456/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.cct.2003.10.016
Even with these technical efforts, beneficial therapeutic efficacy, side effects or even internal conversation are frequently cited as clues to treatment identity, and in consequence allow the patients and caregivers to become unblinded during the trial. For example, in clinical trials of psychoactive drugs, it was often found that both patients and physicians were able to figure out treatment allocation beyond chance levels [5].

Even though many trials are designed as double-blind and most researchers acknowledge that successful blinding is essential, very few studies actually evaluate or report the extent to which the blinding was maintained during the course of the study. A meta-analysis reported that among trials claimed as double-blind, only 45% described similarity of the treatment and control regimens, and only 26% provided information regarding the protection of the allocation schedule [6]. Most publications provide no information on attempts to maintain and/or evaluate blinding.

Although several methods to assess blinding in clinical trials have been published, none of these are widely used or considered standard. Moreover, the methods have not been thoroughly studied from a statistical point of view. Most previous methods were based on an exploratory analysis, often with no or even incorrect statistical properties, and excluded nonrespondents (i.e., responders with indefinite answers) [5,7–11]. Recently, James et al. [4] proposed a method to assess blinding by constructing an index along with its asymptotic theory, which incorporates the nonrespondents. In this paper, we propose a set of hypothesis tests to evaluate the effectiveness of blinding in clinical trials.

The remainder of this paper is organized as follows. In the next section, we introduce typical data structures commonly available in clinical trials. Then, we compare the proposed approach with the best (statistically) established method and explore their advantages and limitations.
Next we present the results of a simulation study. These methods are illustrated with the data from the Cholesterol Reduction in Seniors Program (CRISP). We end with some discussion and additional remarks.

2. Data structure

The typical procedure to elicit blinding information from participants is to ask, during or at the end of the study, which treatment they think they were assigned, and the answer can take various forms. In this section, we present the two most common structures for the response data. One is presented in Table 1 with three responses of "drug", "placebo" or "DK", where we will use "DK" as an abbreviation of "Don't know." The other is shown in Table 2, which presents the level of self-identification in five categories of "Strongly believe the treatment is drug" (coded as 1), "Somewhat believe the treatment is drug" (2), "Somewhat believe the treatment is placebo" (3), "Strongly believe the treatment is placebo" (4) and "DK" (5).

Table 1
Number of subjects by treatment assignment and guess in 2×3 format

Response    Assignment
            Drug                    Placebo                 Total
Drug        $n_{11}$ ($P_{1|1}$)    $n_{21}$ ($P_{1|2}$)    $n_{.1}$
Placebo     $n_{12}$ ($P_{2|1}$)    $n_{22}$ ($P_{2|2}$)    $n_{.2}$
DK          $n_{13}$ ($P_{3|1}$)    $n_{23}$ ($P_{3|2}$)    $n_{.3}$
Total       $n_{1.}$                $n_{2.}$                $N$

$P_{j|i}$ = P(guess j | assigned treatment i) for i = 1 (drug), 2 (placebo) and j = 1 (drug), 2 (placebo), 3 (DK), where DK denotes "Don't know". $N$ is the total number of participants.

Table 2
Number of subjects by treatment assignment and guess in 2×5 format

Response    Assignment
            Drug                    Placebo                 Total
1           $n_{11}$ ($P_{1|1}$)    $n_{21}$ ($P_{1|2}$)    $n_{.1}$
2           $n_{12}$ ($P_{2|1}$)    $n_{22}$ ($P_{2|2}$)    $n_{.2}$
3           $n_{13}$ ($P_{3|1}$)    $n_{23}$ ($P_{3|2}$)    $n_{.3}$
4           $n_{14}$ ($P_{4|1}$)    $n_{24}$ ($P_{4|2}$)    $n_{.4}$
5 (DK)      $n_{15}$ ($P_{5|1}$)    $n_{25}$ ($P_{5|2}$)    $n_{.5}$
Total       $n_{1.}$                $n_{2.}$                $N$

1: Strongly believe the treatment is drug, 2: somewhat believe the treatment is drug, 3: somewhat believe the treatment is placebo, 4: strongly believe the treatment is placebo and 5: DK.
The probability in each cell denotes the conditional probability $P_{j|i}$ = P(guess j | assigned treatment i) for i = 1 (drug), 2 (placebo) and j = 1 (drug), 2 (placebo), 3 (DK) for Table 1 and j = 1,...,5 for Table 2. In addition, those individuals who originally declined to venture an opinion (i.e., answered DK) may be asked to choose a treatment allocation anyway, as was done in the CRISP study. Such ancillary data can be displayed as in Table 3. If there are no missing data in Table 3, $\tilde n$ (from Table 3) is expected to be equal to $n_{.3}$ from Table 1 or $n_{.5}$ from Table 2. Each probability in Table 3 can be defined similarly to the conditional probability above. We will focus on Table 1 alone and then explain how to expand the idea to Table 2 (and Table 3).

Table 3
Ancillary data from the subjects who answered DK initially

Response    Assignment
            Drug                                Placebo                             Total
Drug        $\tilde n_{11}$ ($\tilde P_{1|1}$)  $\tilde n_{21}$ ($\tilde P_{1|2}$)  $\tilde n_{.1}$
Placebo     $\tilde n_{12}$ ($\tilde P_{2|1}$)  $\tilde n_{22}$ ($\tilde P_{2|2}$)  $\tilde n_{.2}$
Total       $\tilde n_{1.}$                     $\tilde n_{2.}$                     $\tilde n$

The study design for blinding assessment and the data collection process in clinical trials are not technically complicated; however, no standard method is in common use for this purpose. In Section 3, we summarize an existing method and then introduce a new approach based on a multinomial test.

3. Statistical methods

3.1. James' blinding index

The standard κ coefficient, ignoring DK responses, measures the degree of agreement. However, in clinical trials, disagreement is a more favorable result since it indicates a high degree of blinding, and DK may be the most indicative response for that. Therefore, James et al. proposed a blinding index (BI), a variation of the kappa coefficient, that is sensitive not to the degree of agreement but to the degree of disagreement by placing the highest weight on DK responses [4].
The BI score for the data in Table 1 is defined as

$$\mathrm{BI} = \{1 + P_{DK} + (1 - P_{DK})\,K_D\}/2,$$

where $P_{DK}$ is the proportion of DK responses and $K_D = (P_{Do} - P_{De})/P_{De}$ with

$$P_{Do} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{w_{ij} P_{ij}}{1 - P_{DK}} \quad \text{for } P_{DK} \neq 1, \qquad P_{De} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{w_{ij} P_{.j}\,(P_{i.} - P_{i3})}{(1 - P_{DK})^{2}}.$$

Note that this statistic is applicable to a symmetric data structure plus DK answers, and $P_{ij}$ and $w_{ij}$ are the expected "relative" (not conditional) probability and the weight, respectively, for the (i, j)th cell, where i indexes treatment assignment and j indexes treatment guess (i, j = 1, 2). Thus, BI can be estimated by $\widehat{\mathrm{BI}} = \{1 + \hat P_{DK} + (1 - \hat P_{DK})\,\hat K_D\}/2$, where $\hat P_{ij} = n_{ij}/N$, $\hat P_{i.} = n_{i.}/N$, $\hat P_{.j} = n_{.j}/N$, $\hat P_{i3} = n_{i3}/N$ and $\hat P_{DK} = n_{.3}/N$ with the total sample size of $N$. Its asymptotic variance and an extension to studies with multiple arms can be found in the same article.

According to James et al., correct guesses are least supportive of blinding and are assigned a weight of 0, incorrect guesses are moderately supportive with an intermediate weight of 0.5 or 0.75, while DK responses are implicitly assigned a weight of 1. When all responses are correct, BI = 0. When all responses are DK (i.e., $P_{DK}$ = 1), BI = 1. Therefore, this index increases as the success of blinding increases, ranging from 0 to 1, 0 being total lack of blinding, 1 being complete blinding and 0.5 being completely random. If the upper bound of the confidence interval (CI) of $\widehat{\mathrm{BI}}$ is below 0.5 (i.e., the CI does not cover the null value), the study is regarded as lacking blinding. Otherwise, we conclude that there is insufficient evidence to show unblinding, or that blinding is achieved. The former statement is the correct one, but we will use these statements interchangeably to simplify communication.

A major advantage of James' BI is its adaptability to various data structures. Yet, it should be noted that this method is critically dominated by DK responses, which will be demonstrated later by a numerical experiment.
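The definition above can be sketched in a few lines of Python. This is a minimal sketch for the 2×3 layout of Table 1 only; the function name `james_bi`, the argument layout and the example counts are illustrative assumptions, and the asymptotic variance (given in James et al. [4]) is not implemented.

```python
def james_bi(counts, w_correct=0.0, w_incorrect=0.5):
    """Point estimate of James' BI for a 2x3 table (sketch, no variance).

    counts[i] = (guess drug, guess placebo, DK) for arm i,
    with i = 0 for the drug arm and i = 1 for the placebo arm.
    """
    n_total = sum(sum(row) for row in counts)
    n_dk = counts[0][2] + counts[1][2]
    if n_dk == n_total:
        return 1.0  # all responses DK: BI = 1, as stated in the text

    p = [[c / n_total for c in row] for row in counts]   # relative P_ij
    p_dk = n_dk / n_total
    # Weights: 0 on the diagonal (correct guess), w_incorrect off it.
    w = [[w_correct, w_incorrect], [w_incorrect, w_correct]]
    col = [p[0][j] + p[1][j] for j in range(2)]           # P_.j
    row_non_dk = [p[i][0] + p[i][1] for i in range(2)]    # P_i. - P_i3

    p_do = sum(w[i][j] * p[i][j]
               for i in range(2) for j in range(2)) / (1 - p_dk)
    p_de = sum(w[i][j] * col[j] * row_non_dk[i]
               for i in range(2) for j in range(2)) / (1 - p_dk) ** 2
    k_d = (p_do - p_de) / p_de
    return (1 + p_dk + (1 - p_dk) * k_d) / 2


# Hypothetical counts: 250 per arm, 60% correct, 10% incorrect, 30% DK.
example = [(150, 25, 75), (25, 150, 75)]
print(round(james_bi(example), 3))
```

With these hypothetical counts the estimate falls below 0.5, on the "lacking blinding" side of the scale, while purely random guessing with no DK responses gives 0.5.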
A fundamental assumption for this method to be valid is that participants who answer DK are truly uncertain about their treatment assignment, not just giving a socially acceptable answer. This assumption cannot be verified in the absence of any supporting information from the people who answer DK. Some evidence may be provided by the ancillary data in Table 3, although this is not necessarily the case.

In practice, the drug and placebo arms of a clinical trial can exhibit distinct blinding behaviors not only in magnitude but also in direction. Since James' BI is a single index value obtained from all arms together, it cannot distinguish this difference, if any, and can even lead us to a totally opposite conclusion. Moreover, this and other previous methods do not provide an estimate of the proportion of unblinded participants beyond the random chance level. Investigators may want to know how many participants are unmasked regardless of the validity and effectiveness of masking. Motivated by these limitations, we propose a new BI based on the multinomial distribution.

3.2. New blinding index

Taking into account the effect of uncertain responses and the fact that two arms often exhibit different blinding characteristics, a new BI based on a trinomial test is developed in this section. We introduce a new notation $r_{i|i}$ ($= P_{i|i}/(P_{1|i} + P_{2|i})$) for the proportion of correct guesses among individuals with certain identification on the ith arm, where i = 1 (drug) and 2 (placebo) for the data in Table 1. When there is no DK response, if the treatment is successfully concealed, it is likely that $r_{1|1} = r_{2|2} = 0.5$ (i.e., half of the treatment group will guess treatment and half of the placebo group will guess placebo).
It is easy to show that $2\hat r_{i|i} - 1$ is the sample proportion of individuals who guess their treatment correctly on the ith arm beyond random balance, where $\hat r_{i|i} = n_{ii}/(n_{i1} + n_{i2})$ estimates the population quantity $r_{i|i}$ consistently. In the presence of DK responses, we define a new treatment-specific BI, $\mathrm{newBI}_i = (2 r_{i|i} - 1)(P_{1|i} + P_{2|i})$, and estimate it by

$$\widehat{\mathrm{newBI}}_i = (2\hat r_{i|i} - 1)\,\frac{n_{i1} + n_{i2}}{n_{i1} + n_{i2} + n_{i3}} \qquad (1)$$

for i = 1, 2. Intuitively, the numerator $(2\hat r_{i|i} - 1)(n_{i1} + n_{i2})$ estimates the number of people who guess the treatment correctly beyond chance level. This test is generally carried out separately to answer the questions: (1) Was the treatment arm blinded? and (2) Was the placebo arm blinded?

A routine calculation leads to the following relationships: $\mathrm{newBI}_1 = P_{1|1} - P_{2|1}$ for the drug arm and $\mathrm{newBI}_2 = P_{2|2} - P_{1|2}$ for the placebo arm, where $P_{j|i}$ (i, j = 1, 2) is the "conditional" probability and can be estimated by $\hat P_{j|i} = n_{ij}/n_{i.}$ (see Table 1 for details). The exact distribution of $\widehat{\mathrm{newBI}}_i$ can be derived from a trinomial distribution and gives the variance

$$\mathrm{Var}(\widehat{\mathrm{newBI}}_i) = \{P_{1|i}(1 - P_{1|i}) + P_{2|i}(1 - P_{2|i}) + 2 P_{1|i} P_{2|i}\}/n_{i.}$$

for i = 1, 2. In other words, inference based on the new index score is equivalent to statistical testing of $H_0$: $P_{1|1} = P_{2|1}$ vs. $H_a$: $P_{1|1} > P_{2|1}$ for the treatment arm and $H_0$: $P_{2|2} = P_{1|2}$ vs. $H_a$: $P_{2|2} > P_{1|2}$ for the placebo arm.

NewBI calculates the difference between the proportions of correct and incorrect guesses by excluding DK responses in the numerator but including them in the denominator, and can be straightforwardly interpreted as a treatment-specific proportion of the unblinded participants, assuming implicitly that DK indicates blindness. It takes a value between −1 and 1, with 0 as the null value, which indicates the most desirable situation under successful blinding.
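Eq. (1) and the trinomial variance translate directly into code. The following is a minimal sketch, assuming the normal approximation and a one-sided 95% lower limit; the function name `new_bi` and the example counts are illustrative and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def new_bi(n_correct, n_incorrect, n_dk, level=0.95):
    """Arm-specific new BI for one arm of a 2x3 table (sketch).

    n_correct   -- participants in this arm who guessed their own treatment
    n_incorrect -- participants who guessed the other treatment
    n_dk        -- participants who answered "Don't know"
    Returns (estimate, standard error, one-sided lower confidence limit).
    """
    n = n_correct + n_incorrect + n_dk
    p_c, p_i = n_correct / n, n_incorrect / n
    est = p_c - p_i                       # equals Eq. (1)
    var = (p_c * (1 - p_c) + p_i * (1 - p_i) + 2 * p_c * p_i) / n
    se = sqrt(var)
    z = NormalDist().inv_cdf(level)       # about 1.645 for level = 0.95
    return est, se, est - z * se


# Hypothetical drug arm: 150 guess drug, 25 guess placebo, 75 DK.
est, se, lower = new_bi(150, 25, 75)
print(f"newBI = {est:.2f}, SE = {se:.3f}, lower limit = {lower:.2f}")
```

For the placebo arm the same function applies with `n_correct` set to the number who guessed placebo. A lower limit above 0 corresponds to rejecting the null hypothesis for that arm.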
A positive value implies failure in masking above random accounting (i.e., a majority of participants guess their treatment allocation correctly), and a negative value suggests success of masking or failure of masking in the other direction (i.e., more individuals mistakenly name the alternative treatment), which will be discussed below. If we do not have DK answers, this test is equivalent to a one-sample binomial test (i.e., $H_0$: $r_{i|i} = 0.5$ vs. $H_a$: $r_{i|i} > 0.5$). The test can be conducted easily using the normal approximation to a binomial distribution. The normal approximation is expected to perform well because a large sample is commonly available in clinical trials. If a one-sided 95% confidence limit of the BI does not include the null value (equivalently, the lower limit is above 0), this treatment arm is deemed to fail to achieve blinding. Otherwise, blinding is maintained or, to speak more rigorously, there is insufficient evidence to show unblinding.

When the data in Tables 2 and 3 are available, we can use a linear combination derived from a multinomial distribution with weights specified a priori. Now we are testing

$$H_0:\; P_{1|i} + w_{2|i} P_{2|i} + \tilde w_{1|i} \tilde P_{1|i} = P_{4|i} + w_{3|i} P_{3|i} + \tilde w_{2|i} \tilde P_{2|i}$$

vs.

$$H_a:\; P_{1|i} + w_{2|i} P_{2|i} + \tilde w_{1|i} \tilde P_{1|i} > P_{4|i} + w_{3|i} P_{3|i} + \tilde w_{2|i} \tilde P_{2|i} \quad \text{for } i = 1,$$

or

$$H_a:\; P_{1|i} + w_{2|i} P_{2|i} + \tilde w_{1|i} \tilde P_{1|i} < P_{4|i} + w_{3|i} P_{3|i} + \tilde w_{2|i} \tilde P_{2|i} \quad \text{for } i = 2,$$

where $w_{j|i}$ and $\tilde w_{j|i}$ are the weights for $P_{j|i}$ and $\tilde P_{j|i}$, respectively. The new BI in this situation is defined as $\mathrm{newBI}_i = l_i^T P_i$, where

$$l_i = [1,\; w_{2|i},\; \tilde w_{1|i},\; -\tilde w_{2|i},\; -w_{3|i},\; -1]^T$$

and the cells are ordered so that the guesses matching the assigned arm come first,

$$P_i = [P_{1|i},\; P_{2|i},\; \tilde P_{1|i},\; \tilde P_{2|i},\; P_{3|i},\; P_{4|i}]^T \text{ for } i = 1, \qquad P_i = [P_{4|i},\; P_{3|i},\; \tilde P_{2|i},\; \tilde P_{1|i},\; P_{2|i},\; P_{1|i}]^T \text{ for } i = 2,$$

subject to the conditions: (1) $\sum_{j=1}^{4} P_{j|i} + \sum_{j=1}^{2} \tilde P_{j|i} = 1$, and (2) $0 \le \tilde w_{1|i} = \tilde w_{2|i} \le w_{2|i} = w_{3|i} \le 1$ for i = 1, 2. With this ordering, $\mathrm{newBI}_i$ is the weighted proportion of correct guesses minus the weighted proportion of incorrect guesses on arm i, in agreement with the hypotheses above. The variance of the new BI estimator is $\mathrm{var}(\widehat{\mathrm{newBI}}_i) = l_i^T\,\mathrm{cov}(\hat P_i)\,l_i$, where $\mathrm{cov}(\hat P_i)$ is a 6×6 covariance matrix.
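The quadratic form for the variance can be written out explicitly. The short derivation below assumes that $\mathrm{cov}(\hat P_i)$ is the usual multinomial covariance of cell proportions for an arm of size $n_i$ (the text does not spell this out), and writes $l_{ij}$ and $P_{ij}$ for the jth elements of $l_i$ and $P_i$. It also checks that the trinomial variance given for Eq. (1) is recovered as a special case.

```latex
% Multinomial covariance of the estimated cell proportions on arm i
\mathrm{cov}(\hat P_{ij}, \hat P_{ik}) = \frac{\delta_{jk} P_{ij} - P_{ij} P_{ik}}{n_i}
\quad\Longrightarrow\quad
l_i^T \,\mathrm{cov}(\hat P_i)\, l_i
  = \frac{1}{n_i}\Bigl\{ \sum_j l_{ij}^2 P_{ij} - \Bigl(\sum_j l_{ij} P_{ij}\Bigr)^2 \Bigr\}.

% Special case of the 2x3 table: coefficients (1, -1) on the correct and
% incorrect guesses (probabilities P_c and P_w) and 0 on DK
\frac{1}{n_i}\bigl\{ P_c + P_w - (P_c - P_w)^2 \bigr\}
  = \frac{1}{n_i}\bigl\{ P_c(1 - P_c) + P_w(1 - P_w) + 2 P_c P_w \bigr\}.
```

The second line matches the trinomial variance stated for the 2×3 data structure, so the weighted index needs no separate variance formula.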
By using the data in both Tables 2 and 3, not only the ordinal scores for the response (rather than a binary response), but also the ancillary data can be incorporated along with proper weight assignment in these hypothesis tests. We suggest $w_{2|i} = w_{3|i} = 0.5$ and $\tilde w_{1|i} = \tilde w_{2|i} = 0.25$ for i = 1, 2, but other weight specifications can be employed, especially for sensitivity analysis. It is easy to see that the test statistic in Eq. (1) is a special case with $w_{2|i} = w_{3|i} = 1$ and $\tilde w_{1|i} = \tilde w_{2|i} = 0$, and we simply set $\tilde w_{1|i} = \tilde w_{2|i} = 0$ in the absence of the auxiliary data.

If we intend to conduct the two hypothesis tests simultaneously, we recommend a Bonferroni-type multiple testing procedure. Since the two test statistics are independent and a small number of tests (i.e., at most two or three) are generally involved, the conservatism of the Bonferroni approach will not be a problem [12]. A joint test can be an alternative, but the statistical tests proposed by Silvapulle should be adopted (instead of a standard chi-square test used for two-sided tests) in order to account for the one-sided nature of the alternative [13]. If statistical significance is near the nominal level, statistical inference may entail extensive computing efforts.

Although the testing procedure in the new index method looks straightforward, more caution should be exercised before we reach the final conclusion, especially regarding response bias and the potential interpretation of a negative value of the BI [7,14]. A typical example of response bias is that all subjects tend to believe that they are on active treatment. This may occur when both taking an active treatment and being untreated cause adverse reactions or favorable effects, or accompany wishful thinking. Alternatively, participants may be inclined to believe they receive a placebo in the absence of any therapeutic effects, because participants who presume they are on active drug anticipate definitive effects.
No previous literature has proposed a statistically justifiable way to evaluate this type of potential bias. For our test, we suggest conducting a formal test and reporting the finding instead of concluding successful blinding on one arm and unsuccessful blinding on the other arm. If one arm suggests successful blinding and the other arm does not, based on the method outlined above (i.e., accept $\mathrm{newBI}_1 < 0$ and $\mathrm{newBI}_2 > 0$, or vice versa), then test $H_0$: $\mathrm{newBI}_1 + \mathrm{newBI}_2 = 0$ in order to confirm the existence of response bias. If $H_0$ cannot be rejected, there is some evidence that this trial may experience response bias.

Finally, let us consider a very rare but still plausible example in which newBI = −1 (i.e., everyone picked the treatment that they did not receive). In this case, James' BI assigns the best score of 1 and concludes that the trial is perfectly blinded. Unlike James' BI, our method can still detect this departure by giving a negative estimate. If the investigator believes that masking is suspect when unblinding is significant in either direction, adopting a two-sided hypothesis can resolve this issue. We note that this situation would seldom occur and causes little concern relative to the case of newBI ≥ 0; consequently, we propose to use a one-sided test in general.

Our suggestion in practice is that the result should be reported or discussed if at least one of the following conditions is met: (1) $\widehat{\mathrm{newBI}}$ is greater than 20%, (2) the null hypothesis (i.e., successful or random blinding) is rejected based on our principle or (3) the James method and our method differ in conclusion.

4. Simulation study

We conducted a numerical experiment under various scenarios and compared the performance of our method with the James method. We chose the James method as the only one for which a large sample theory has been established.
For illustrative purposes, we simulated the data according to the data structure in Table 1. We assumed equal allocations of 250 patients to intervention and placebo arms (i.e., a balanced clinical trial). We considered three scenarios: "random", "opposite" and "unblinded". Random indicates that the probabilities of correct and wrong guesses are equal, so that each participant randomly guesses the treatment assignment. Opposite represents a higher probability of guessing the opposite intervention, i.e., more participants in the intervention arm guess placebo, and more participants in the placebo arm believe they receive an active drug. Lastly, unblinded indicates that most participants identify the assignment correctly. Six distinct combinations of these scenarios across the two arms are considered, and they are listed in Table 4.

We briefly explain the motivation of each case. In case 1, all participants randomly guess their assignment; therefore, theoretically, James' BI = 0.5 and newBI = 0 for each arm. This is the most ideal scenario in reality. Case 2 is the scenario that gives the best score for each BI (James' BI is close to 1 and our newBI is nearly −1 for each arm), but it seldom occurs and needs some caution in interpretation, as outlined in the previous section. Case 3 is the opposite of case 2, where most participants figure out their treatment assignment. This case certainly indicates that treatments are revealed, so the validity of the accompanying statistical analyses should be closely examined. In case 4, most participants guess the active drug, suggesting that response bias may exist. Case 5 indicates random guessing in the placebo arm and mostly incorrect guesses in the drug arm, which is also a moderately rare scenario. In case 6, participants in the placebo arm randomly guess their treatment, while most participants in the drug arm correctly identify it.
Within each case, we varied the proportion of DK response: $P_{DK}$ = 0, 0.3 and 0.7, corresponding to no, moderate and high percentage of DK, respectively. The simulation study was performed with SAS 8.2. For case 1 with moderate DK probability, a random number Q was generated from a uniform distribution on [0, 1] using the ranuni function. If Q ≤ 0.35, the response was assigned to drug; if Q ≥ 0.65, the response was assigned to placebo; and the response was assigned to DK otherwise. The data were generated similarly for the other conditions. Sample means of BI estimates and of standard error estimates, and the proportion of unblinding judged by the one-sided 95% CI from 500 data sets for each scenario, were computed, and the results are summarized in Table 5. The simulation study showed precise coverage probabilities for most cases within the 2% range, where the accuracy for each empirical coverage probability is given by $2(0.95 \times 0.05/500)^{1/2} = 0.02$ with 500 runs (results not shown).

Table 4
Simulation setting

Case                              DK (%)  Assignment  Response (%)
                                                      T     P
1 (T: random, P: random)          0       T           50    50
                                          P           50    50
                                  30      T           35    35
                                          P           35    35
                                  70      T           15    15
                                          P           15    15
2 (T: opposite, P: opposite)      0       T           10    90
                                          P           90    10
                                  30      T           10    60
                                          P           60    10
                                  70      T           10    20
                                          P           20    10
3 (T: unblinded, P: unblinded)    0       T           90    10
                                          P           10    90
                                  30      T           60    10
                                          P           10    60
                                  70      T           20    10
                                          P           10    20
4 (T: unblinded, P: opposite)     0       T           90    10
                                          P           90    10
                                  30      T           60    10
                                          P           60    10
                                  70      T           20    10
                                          P           20    10
5 (T: opposite, P: random)        0       T           10    90
                                          P           50    50
                                  30      T           10    60
                                          P           35    35
                                  70      T           10    20
                                          P           15    15
6 (T: unblinded, P: random)       0       T           90    10
                                          P           50    50
                                  30      T           60    10
                                          P           35    35
                                  70      T           20    10
                                          P           15    15

T and P denote an active treatment and placebo, respectively. "Random", "opposite" and "unblinded" indicate random, incorrect and correct guesses for the given treatment assignment, respectively.
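The data-generating step described above is easy to mimic outside SAS. The following Python sketch simulates the drug arm of case 1 with moderate DK (35% drug, 35% placebo, 30% DK) and records how often the one-sided 95% lower limit of the new BI exceeds 0. It is an illustration under stated assumptions, not the authors' program: the function names, the seed and the use of Python's `random` module in place of `ranuni` are all choices made here.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_arm(n, p_drug, p_placebo, rng):
    """Draw n guesses: Q <= p_drug -> drug, Q >= 1 - p_placebo -> placebo, else DK."""
    n_drug = n_placebo = 0
    for _ in range(n):
        q = rng.random()
        if q <= p_drug:
            n_drug += 1
        elif q >= 1 - p_placebo:
            n_placebo += 1
    return n_drug, n_placebo, n - n_drug - n_placebo


def new_bi(n_correct, n_incorrect, n):
    """New BI estimate and its standard error from the trinomial variance."""
    p_c, p_i = n_correct / n, n_incorrect / n
    var = (p_c * (1 - p_c) + p_i * (1 - p_i) + 2 * p_c * p_i) / n
    return p_c - p_i, sqrt(var)


def run(n_sims=500, n_arm=250, p_drug=0.35, p_placebo=0.35, seed=2004):
    """Return (mean estimate, rejection rate) for the drug arm."""
    rng = random.Random(seed)             # arbitrary seed, for repeatability
    z = NormalDist().inv_cdf(0.95)
    estimates, rejections = [], 0
    for _ in range(n_sims):
        n_drug, n_placebo, _ = simulate_arm(n_arm, p_drug, p_placebo, rng)
        est, se = new_bi(n_drug, n_placebo, n_arm)   # drug arm: "drug" is correct
        estimates.append(est)
        if est - z * se > 0:              # lower limit above 0: reject blinding
            rejections += 1
    return mean(estimates), rejections / n_sims


if __name__ == "__main__":
    mean_est, rejection_rate = run()
    print(f"mean newBI = {mean_est:.4f}, rejection rate = {rejection_rate:.3f}")
```

Under random guessing the mean estimate should sit near 0 and the rejection rate near the nominal 5%, within the Monte Carlo accuracy of about 0.02 quoted above. Other rows of Table 4 can be explored by changing `p_drug` and `p_placebo`.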
The empirical type I error in case 1 without uncertain responses, where all participants in both arms randomly guessed their treatments, was close to 0.05, the prespecified significance level.

Table 5
Simulation results with 5% type I error

Case  DK (%)  James' BI                     Assignment  New BI
              BI (SEE)       Rejection (%)              BI (SEE)          Rejection (%)
1     0       0.50 (0.022)   5              T           0.0017 (0.063)    5.2
                                            P           0.0002 (0.063)    7.2
      30      0.65 (0.021)   0              T           0.0023 (0.053)    3.2
                                            P           0.0001 (0.053)    5.8
      70      0.85 (0.016)   0              T           0.0008 (0.034)    4.4
                                            P           0.0007 (0.034)    4.4
2     0       0.90 (0.013)   0              T           −0.80 (0.038)     0
                                            P           −0.80 (0.037)     0
      30      0.90 (0.013)   0              T           −0.50 (0.042)     0
                                            P           −0.50 (0.042)     0
      70      0.90 (0.013)   0              T           −0.10 (0.034)     0
                                            P           −0.10 (0.034)     0
3     0       0.10 (0.013)   100            T           0.80 (0.038)      100
                                            P           0.80 (0.037)      100
      30      0.40 (0.022)   99.4           T           0.50 (0.042)      100
                                            P           0.50 (0.042)      100
      70      0.80 (0.018)   0              T           0.10 (0.034)      92
                                            P           0.10 (0.034)      92.4
4     0       0.50 (0.013)   6.6            T           0.80 (0.038)      100
                                            P           −0.80 (0.037)     0
      30      0.65 (0.017)   0              T           0.50 (0.042)      100
                                            P           −0.50 (0.042)     0
      70      0.85 (0.015)   0              T           0.10 (0.034)      92
                                            P           −0.10 (0.034)     0
5     0       0.70 (0.018)   0              T           −0.80 (0.038)     0
                                            P           0.0002 (0.063)    7.2
      30      0.77 (0.015)   0              T           −0.50 (0.042)     0
                                            P           0.0001 (0.053)    5.8
      70      0.88 (0.019)   0              T           −0.10 (0.034)     0
                                            P           0.0007 (0.034)    4.4
6     0       0.30 (0.019)   100            T           0.80 (0.038)      100
                                            P           0.0002 (0.063)    7.2
      30      0.53 (0.021)   0              T           0.50 (0.042)      100
                                            P           0.0001 (0.053)    5.8
      70      0.83 (0.016)   0              T           0.10 (0.034)      92
                                            P           0.0007 (0.034)    4.4

T and P denote an active treatment and placebo, respectively. BI denotes the sample mean of BI estimates and SEE is the average of standard error estimates. Rejection (%) represents the proportion of {$H_0$: blinding is maintained} being rejected. Two hundred and fifty subjects are allocated to each treatment arm. The results are based on 500 simulations.

Throughout all the scenarios, when the proportion of individuals who decline to express their belief is increased, the value of James' BI is also increased. Since the James method emphasizes the contribution of uncertain responses, when $P_{DK}$ is high this method almost always concludes that blinding is successfully fulfilled. As seen in the simulation study with $P_{DK}$ = 0.7, the James method never rejects $H_0$ (i.e., the rejection probability is 0). Even with $P_{DK}$ = 0.3, $H_0$ is never rejected except in case 3, where the entire study is severely unmasked, and the conclusion could be controversial. The most important issue here is how much weight should be assigned to those uncertain responses, which depends on how uncertain those DK responses really are. In the CRISP study, which will be addressed later, those participants who originally answered DK were asked to choose a treatment assignment a second time. This kind of ancillary data with a small number of non-responses may provide some assessment of the validity of DK answers in the original data.

In addition, when a response bias exists, such as in case 4, the James method could not detect this and would conclude that the study was successfully blinded. The final decision can be highly subjective, but it is important to identify the cause of any bias. For the cases where participants in one arm randomly guess their assignment and participants in the other arm correctly guess their assignment, such as case 6, the James method always concludes (un)blinding without differentiating that one arm is actually blinded. Note that, for all opposite cases in either arm, James' BI always gives an estimate greater than 0.5, whereas our method diagnoses the situation properly by providing negative values. In particular, it is interesting to note through case 2 that increasing $P_{DK}$ and more opposite selection have similar effects on the final estimate of the James method.
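The response-bias check proposed in Section 3.2 can be illustrated at the expected cell counts of case 4 with no DK responses (Table 4). The sketch below assumes a two-sided z-test for $H_0$: $\mathrm{newBI}_1 + \mathrm{newBI}_2 = 0$ whose variance is the sum of the two arm-specific trinomial variances, which relies on the independence of the two arms noted earlier. The exact form of the test and the function names are assumptions made here, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def arm_index(n_correct, n_incorrect, n_dk):
    """New BI estimate and variance for one arm (trinomial formula)."""
    n = n_correct + n_incorrect + n_dk
    p_c, p_i = n_correct / n, n_incorrect / n
    var = (p_c * (1 - p_c) + p_i * (1 - p_i) + 2 * p_c * p_i) / n
    return p_c - p_i, var


def response_bias_test(drug_arm, placebo_arm):
    """Two-sided z-test of H0: newBI_1 + newBI_2 = 0 (assumed form)."""
    est1, var1 = arm_index(*drug_arm)
    est2, var2 = arm_index(*placebo_arm)
    z = (est1 + est2) / sqrt(var1 + var2)   # arms are independent
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return est1, est2, z, p_value


# Case 4, DK = 0%, expected counts with 250 per arm:
# both arms guess "drug" 90% of the time.
drug_arm = (225, 25, 0)      # (correct, incorrect, DK)
placebo_arm = (25, 225, 0)   # only 25 guess placebo, which is correct here
est1, est2, z, p_value = response_bias_test(drug_arm, placebo_arm)
print(f"newBI_1 = {est1:.2f}, newBI_2 = {est2:.2f}, z = {z:.2f}, p = {p_value:.2f}")
```

At these counts the arm-specific indices are equal in size and opposite in sign, so their sum gives no evidence against $H_0$, the pattern the text reads as possible response bias. Table 5 reports a James' BI of 0.50 for the same scenario.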
In general, the James method works well, given that DK respondents are truly uncertain about the treatment assignment, not just providing a pleasing response to the study physician or program coordinator. However, response bias, potential unblinding in the other direction or differential behaviors for different arms cannot be detected by the James method, since it combines all the information from all intervention arms into a univariate index value and some cancelling effect may occur erroneously.

In contrast, the new BI method assesses blinding in each arm separately, weighs identified responses and detects unblinding in the opposite direction. Uncertain response comes into play indirectly in the denominator. When $P_{DK}$ increases, the new BI is generally attenuated toward randomness, i.e., uncertain response has a similar effect to random guessing. In the example of case 1, when $P_{DK}$ increases from 0 to 0.7, the new BI is attenuated toward 0, but the final result (i.e., rejection rate) is much the same in all circumstances. Our new BI shows higher power to detect a small absolute difference and a relatively low degree of unblinding. In the example of case 2 with 70% uncertain response, the new BI method gives a value of −0.1, indicating it is close to being random instead of blind (James' BI = 0.9). Of course, a 10% difference for determining (un)blindness may be subjective or even controversial, but it would be of scientific interest to detect whether any significant discrepancy exists beyond balance.

When blinding behaviors in the two arms are not similar, the new BI method can detect it and utilizes more information conveyed by the data for the arm-specific assessment of blinding. Taking case 4 without uncertain response as an example, adjusted by Bonferroni correction at an overall 5% significance level, the BI estimate for the drug arm is significantly greater than 0 and that for the placebo arm is significantly less than 0 in each of 500 simulations.
Further, the sum of BI estimates from both arms in 99% of the 500 runs is not significantly away from zero, thus we report that response bias may exist. Lastly, in those scenarios where the participants in one arm randomly guess treatment and those in the other arm incorrectly or correctly guess treatment (cases 5 and 6), the new BI distinguishes the arm-specific behaviors by yielding distinct index values. In general, due to the separate evaluation of blindness for each arm, the new BI is more realistic and accurate, especially in those circumstances where there exists response bias or a significant difference in the nature of blinding between the two arms, but it is still very simple to use.

5. An example: the CRISP study

Total and lipoprotein cholesterol levels remain among the greatest risk factors for coronary heart disease in people over 65 years old, but clinical trials on this issue are very sparse. The CRISP study was a five-center pilot study to assess feasibility of recruitment and efficacy of cholesterol lowering in this age group. It was conducted from July 1990 to June 1994 [15]. The five centers included Wake Forest School of Medicine, NC; George Washington University Medical Center, Washington, DC; University of Minnesota, MN; University of Tennessee, TN; and University of Washington, WA. Four hundred thirty-one subjects with low-density lipoprotein cholesterol levels greater than 4.1 and less than 5.7 mmol/l were randomized into the study, of whom 71% were women, 24% were minorities and the mean age was 71 years.

Table 6
CRISP study data in 2×5 format

Response    Assignment
            Lovastatin    Placebo    Total
1           38            11         49
2           44            16         60
3           21            21         42
4           4             8          12
5 (DK)      170           83         253
Total       277           139        416

1: Strongly believe the treatment is Lovastatin, 2: somewhat believe the treatment is Lovastatin, 3: somewhat believe the treatment is placebo, 4: strongly believe the treatment is placebo and 5: DK.
Participants were followed for 1 year while on a cholesterol-lowering diet plus either placebo or one of two doses of the study drug, Lovastatin (20 or 40 mg/day). The primary endpoint was change in blood lipid level. Although change in blood lipid level was an objective measurement, whether or not the patients found out their treatment assignment may have affected their compliance or attitude toward participation in the study, which makes the assessment of blinding an important issue. At the end of the trial, all participants were asked to rate the extent to which they knew their medication on a five-point scale: (1) strongly believe the treatment is Lovastatin, (2) somewhat believe the treatment is Lovastatin, (3) somewhat believe the treatment is placebo, (4) strongly believe the treatment is placebo and (5) DK. Of the 420 people who participated in the post-trial data collection process, 4 participants did not answer this question and were deleted from the blinding assessment. The distribution of the participants’ response in each treatment group (with two doses combined) is displayed in Table 6. To simplify further, we also created a simpler table according to the 2×3 data structure (presented in Table 7). The additional data from those who answered DK originally and were asked a second time are also available in Table 8. James’ BI using the weights of 0, 0.5 and 1 for correct, wrong and uncertain response, respectively, gave the estimate of 0.75 (95% CI: 0.71, 0.78), implying that the CRISP study was well-blinded. The point estimates from different data configurations and weights varied between 0.74 and 0.78, always reaching the conclusion of effective masking.

Table 7
CRISP study data in 2×3 format

Assignment    Response
              Lovastatin   Placebo   DK     Total
Lovastatin    82           25        170    277
Placebo       27           29        83     139
Total         109          54        253    416

Table 8
Ancillary data from DK in CRISP study

Assignment    Response
              Lovastatin   Placebo   No answer   Total
Lovastatin    79           86        5           170
Placebo       36           45        2           83
Total         115          131       7           253

The new BI gave an estimate of 0.21 (95% CI: 0.15, 0.26) for the Lovastatin arm, meaning a significant excess of correct guesses, while it gave a value of 0.01 (95% CI: −0.07, 0.10) for the placebo arm, showing a pattern consistent with a random distribution of responses. These results can be directly interpreted as meaning that 21% of participants in the Lovastatin arm guessed their treatment correctly beyond chance, whereas only 1% of participants in the placebo arm did. When we use Tables 6 and 8 with the symmetric weights of 1, 0.5 and 0.25 for strongly believe, somewhat believe and believe but originally uncertain, respectively, the new BI estimate is 0.16 (95% CI: 0.11, 0.21) for the Lovastatin arm and 0.01 (95% CI: −0.06, 0.08) for the placebo arm (excluding seven participants who refused to answer again). Therefore, we attain the same conclusion for either data structure. Note that these estimates are slightly attenuated toward the null since the ancillary data strongly indicate randomness. We also find that the result is highly robust to different weight specifications. In particular, more patients who took the high dose tended to guess correctly compared to those who took the low dose (data not shown).

6. Discussion

Many double-blind trials are conducted worldwide every year and there is substantial literature on the technology for and potential effects of successful blinding. Nonetheless, only a very small number of studies have systematically assessed the views of trial subjects regarding the identity of the assigned treatment. This is partly due to the absence of a standard method. In this paper, we investigate a new BI and compare it with the method proposed by James et al. [4]. 
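
As a concrete check on the CRISP estimates quoted in Section 5, the following minimal sketch recomputes both indices from the 2×3 counts in Table 7. It rests on assumptions that are not spelled out in this part of the paper: the arm-specific index is taken to be the proportion of correct guesses minus the proportion of incorrect guesses within an arm, with the variance of a difference of two multinomial proportions; the multiplier z = 1.645 is a guess about how the intervals were formed; and the form used for James' index (a kappa-type disagreement measure among non-DK respondents, combined with the DK proportion) is our reading of James et al. [4]. The function names are hypothetical.

```python
import math

# CRISP 2x3 counts (Table 7): for each arm, the number of participants
# who guessed "Lovastatin", guessed "Placebo", or answered DK.
TABLE_7 = {
    "Lovastatin": {"Lovastatin": 82, "Placebo": 25, "DK": 170},
    "Placebo": {"Lovastatin": 27, "Placebo": 29, "DK": 83},
}


def arm_blinding_index(correct, incorrect, dont_know, z=1.645):
    """Arm-specific index: proportion correct minus proportion incorrect.

    Returns (estimate, lower, upper). The variance is that of a difference
    of two proportions from one multinomial sample. z = 1.645 is an
    assumption, not a value stated in the text.
    """
    n = correct + incorrect + dont_know
    p_c = correct / n
    p_i = incorrect / n
    estimate = p_c - p_i
    variance = (p_c * (1 - p_c) + p_i * (1 - p_i) + 2 * p_c * p_i) / n
    half_width = z * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


def james_index(table, w_correct=0.0, w_incorrect=0.5):
    """Single pooled index in the style of James et al. (assumed form).

    DK answers implicitly carry weight 1: the index is
    (1 + P_DK + (1 - P_DK) * K) / 2, where K compares observed with
    expected weighted disagreement among participants who made a guess.
    """
    arms = list(table)
    total = sum(sum(row.values()) for row in table.values())
    dk = sum(row["DK"] for row in table.values())
    known = total - dk  # participants who named a treatment
    p_dk = dk / total
    observed = 0.0
    expected = 0.0
    for arm in arms:
        arm_known = sum(table[arm][g] for g in arms)
        for guess in arms:
            weight = w_correct if guess == arm else w_incorrect
            guess_known = sum(table[a][guess] for a in arms)
            observed += weight * table[arm][guess] / known
            expected += weight * guess_known * arm_known / known**2
    kappa = (observed - expected) / expected
    return (1 + p_dk + (1 - p_dk) * kappa) / 2


def new_bi_for_arm(table, arm):
    """Apply arm_blinding_index to one row of a 2x3 table."""
    row = table[arm]
    incorrect = sum(v for k, v in row.items() if k not in (arm, "DK"))
    return arm_blinding_index(row[arm], incorrect, row["DK"])


if __name__ == "__main__":
    for arm in TABLE_7:
        est, lo, hi = new_bi_for_arm(TABLE_7, arm)
        print(f"{arm}: BI = {est:.2f} (interval: {lo:.2f}, {hi:.2f})")
    print(f"James-type index: {james_index(TABLE_7):.2f}")
```

Under these assumptions the sketch gives point estimates of 0.21 for the Lovastatin arm and 0.01 for the placebo arm, and a pooled James-type value of 0.75, in line with the figures reported for the CRISP data. The contrast illustrates the point made above: the pooled value suggests effective masking, while the arm-specific values separate an excess of correct guesses in one arm from near-random guessing in the other.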
Most methods that preceded James’ simply ignore unknown responses and/or were developed for descriptive purposes rather than for statistical inference. If there is a high percentage of nonrespondents, these methods might result in misleading conclusions. James’ method gives a single index value to assess the blindness of the entire clinical trial, emphasizing uncertain responses. If there is a moderate to high percentage of those responses in the data (PDK ≥ 0.3), this method rarely discovers a moderate to low level of unblinding, as demonstrated in the simulation study. Moreover, this method combines all treatment arms, so some cancelling or mixing effects may result in wrong conclusions when there is a significant difference in blinding between the two arms or when there exists response bias. A fundamental assumption for James’ method is that the participants with uncertain responses truly do not know their assignment, but this is sometimes hard to ascertain. In this article, we propose a new BI based on a multinomial test. Because this is evaluated in each arm separately, it utilizes detailed information from all available data including DK, with more weight given to known answers. The new BI can be used not only for hypothesis testing but can also be directly interpreted as the proportion of the unblind beyond balance, where more emphasis is placed on the latter property due to the nature of the study. It has the ability to detect a relatively low degree of unblinding, response bias and different behaviors, including unblinding in the opposite direction. Since the index is computed separately for each treatment arm, generalization to more than two arms is straightforward, although one may need to adjust for multiple tests using the Bonferroni correction. It is possible that unsuccessful blindness may not influence statistical analyses in the end. 
For example, in a nicotine gum study, even if the trial was regarded as partially unblinded, a drug effect was preserved among subjects who correctly identified their drug, among subjects who incorrectly identified their drug and among subjects who could not tell which drug they received, implying that failure to maintain blindness did not alter the results of the study for different subgroups [7]. On the other hand, in a well-known trial of vitamin C, the perceptions affected the endpoint concerning cold symptoms [16]. It has also been suggested that adequacy of the blind may strongly influence the adherence of subjects to their regimens. In this context, subjects’ efforts to unblind their assignment should be minimized. It may be an issue even in trials where more objective outcomes such as mortality are used. Knowledge about the treatment condition could lead to better or worse ancillary care and more or less attention. These factors could have an impact even on the survival time. While studying this topic, we realized that assessing blindness is not a simple process. Statistically it is straightforward, but the final conclusion is hard to summarize in one decision criterion. It relies heavily on the investigators’ subjectivity and the nature of the study. For example, Howard et al. claimed that patient blinding worked effectively even though about 67% of participants correctly identified treatment assignment [17]. Clearly, 67% might be high enough for a majority of researchers to reject the claim that blinding was implemented properly. The patients’ conception of their response options can also vary. An incorrect guess may actually imply success of blinding, failure of blinding or even failure by the patients to understand the pros and cons of each treatment. This may be particularly pertinent in studies comparing two active treatments. However, under a random selection process, we can hypothesize statistically that it would occur in a balanced manner. 
We should make every effort to maximize the credibility and quality of the data. Of course, there is no way of knowing whether the respondents are completely honest in discussing their guesses and how they were determined. Some people may underplay the extent of certainty or hide relevant clues. The most important contribution to the blind is the respondents’ commitment to the purpose of the study. Toward this end, we should encourage all participants to make their best guess of the assigned treatment. Although responses other than DK are preferred from a statistical point of view, inclusion of an extra question for nonrespondents should be carefully discussed in the design stage of the trial. If unblinding is detected, we recommend that additional subgroup analyses be conducted to adjust for unblinding and extraordinary efforts be made to identify the unblinding factors, which would offer constructive suggestions for the design and conduct of similar studies in the future. The new BI appears to have several desirable properties for assessing blinding success. In practice, a combination of our BI, statistical testing, classification of case diagnosis (according to Table 4), careful interpretation and potential cause identification can provide a comprehensive evaluation of the blindness of clinical trials.

Acknowledgements

We thank participants for agreeing to be interviewed for the CRISP project. We also want to thank both referees and the editor-in-chief for very helpful comments, which remarkably improved this paper.

References

[1] Schulz KF, Grimes DA. Epidemiology series: blinding in randomized trials: hiding who got what. Lancet 2002;359(9307):696–700.
[2] Day SJ. Dictionary for clinical trials. New York: Wiley; 1999.
[3] Day SJ, Altman DG. Blinding in clinical trials and other studies. BMJ 2000;321:504.
[4] James KE, Lee KK, Kraemer HC, Fuller RK. An index for assessing blindness in a multicenter clinical trial: disulfiram for alcohol cessation—a VA cooperative study. Stat Med 1990;15:1421–1434.
[5] Morin CM, Colecchi C, Brink D, et al. How "blind" are double-blind placebo-controlled trials of benzodiazepine hypnotics? Sleep 1995;18(4):240–245.
[6] Schulz KF, Grimes DA, Altman DG, Hayes RJ. Blinding and exclusions after allocation in randomized controlled trials: survey of published parallel group trials in obstetrics and gynecology. BMJ 1996;312:742–744.
[7] Hughes JR, Krahn D. Blindness and the validity of the double-blind procedure. J Clin Psychopharmacol 1985;5(3):138–142.
[8] Deyo RA, Walsh NE, Schoenfeld LS, Ramamurthy S. Can trials of physical treatments be blinded? Am J Phys Med Rehabil 1990;69(1):6–10.
[9] Jespersen CM, the Danish Study Group on Verapamil in Myocardial Infarction. Assessment of blindness in the Danish Verapamil Infarction trial II (DAVIT II). Eur J Clin Pharmacol 1990;39:75–76.
[10] Marini JL, Sheard MH, Bridges CI, Wagner Jr E. An evaluation of the double-blind design in a study comparing lithium carbonate with placebo. Acta Psychiatr Scand 1976;53:343–354.
[11] Rabkin JG, Markowitz JS, Stewart J, et al. How blind is blind? Assessment of patient and doctor medication guesses in a placebo-controlled trial of imipramine and phenelzine. Psychiatry Res 1986;19:75–86.
[12] Reitmeir P, Wassmer G. Resampling-based methods for the analysis of multiple endpoints in clinical trials. Stat Med 1999;18:3453–3462.
[13] Silvapulle MJ. On tests against one-sided hypotheses in some generalized linear models. Biometrics 1994;50:853–858.
[14] Anderson TW, Reid DBW, Beaton GH. Vitamin C and the common cold: a double-blinded trial. Can Med Assoc J 1972;107:503–508.
[15] LaRosa JC, Applegate W, Crouse JR, et al. Cholesterol lowering in the elderly. Arch Intern Med 1994;154:529–539.
[16] Karlowski TR, Chalmers TC, Frenkel LD, et al. Ascorbic acid for the common cold: a prophylactic and therapeutic trial. JAMA 1975;231:1038–1042.
[17] Howard J, Whittemore AS, Hoover JJ, Panos M, the Aspirin Myocardial Infarction Study Research Group. How blind was the patient blind in AMIS. Clin Pharmacol Ther 1982;32:543–553.