Interim Analysis in Clinical Trials and Early Stopping for Futility
John M. Lachin
The Biostatistics Center, The George Washington University

Interim Analysis in Clinical Trials
Effectiveness Monitoring:
  Group sequential upper one-sided, or outer two-sided, bounds for rejecting H0.
  The Lan-DeMets α-spending function preserves the overall type I error probability α.
  Allows early termination due to a beneficial effect.
  Fixed-sample-size power is preserved with an O'Brien-Fleming bound, but not with a Pocock bound.
Safety Monitoring:
  No formal sequential procedures are usually applied.
Futility Monitoring:
  Formal lower one-sided, or inner two-sided, bounds for accepting H0,
  OR computation of conditional power.

Futility Monitoring
Conditional Power (CP): the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the future data.
Stochastic Curtailing refers to a decision to terminate a trial based on CP.
  Halperin, Lan, Ware, Johnson and DeMets. Controlled Clinical Trials 1982.
  Lan, Simon and Halperin. Comm. Statist. 1982.
Futility Monitoring refers to monitoring for lack of effectiveness based on low CP.
  Ware, Muller and Braunwald. Am. J. Medicine 1985.

Sequential Monitoring
Bounds on the type I and II error probabilities α and β:
  Lan, Simon and Halperin (1982) under continuous monitoring.
  Davis and Hardy (Comm. Statist., 1990, 1992) when monitored at discrete points in time.

Single Interim Futility Assessment at a Pre-Specified Time
Industry trials with a "mid-study" futility assessment.
Sample size re-estimation procedures with a simultaneous futility assessment at a pre-specified time τ, e.g. Lan-Trost (1997).

Simple Example: The Lupus Nephritis Collaborative Study
Lachin and Lan, Controlled Clinical Trials, 1992.
Effects of plasmapheresis versus standard therapy on renal failure in lupus nephritis.
With two years remaining in the study, the following data had been observed:

  Response     P     S     Total
  +           11    10      21
  −           29    36      65
  Total       40    46      86

$\hat\pi_P = 0.275$ (11/40), $\hat\pi_S = 0.2174$ (10/46), p = 0.66 (two-sided).
Recruitment closed.

Question: Conditional Power
What is the probability that the study would yield a significant result (one-sided) in favor of plasmapheresis if the study were to be continued for two years, given what has been observed to date?

Possible Future Data
All 2 × 2 tables with the following values:

  Response     P          S          Total
  +           11 + a     10 + b     21 + a + b
  −           29 − a     36 − b     65 − a − b
  Total       40         46         86

  0 ≤ a ≤ 29, 0 ≤ b ≤ 36

To compute the conditional power (one-sided):
1. Identify those tables where $\hat\pi_P < \hat\pi_S$ and p ≤ 0.05 (one-sided).
2. Specify the probabilities to apply to the future data: $\pi_P$ and $\pi_S$.
3. Compute the probability of each table with values (a, b) given $(\pi_P, \pi_S)$:
   $P(\text{table}) = P(a \mid \pi_P) \times P(b \mid \pi_S) = B(a \mid \pi_P, 29) \times B(b \mid \pi_S, 36)$
4. Sum these probabilities:
   $CP(\pi_P, \pi_S) = \sum_{a=0}^{29} \sum_{b=0}^{36} I[\hat\pi_P < \hat\pi_S] \, I[p \le 0.05] \, B(a \mid \pi_P, 29) \, B(b \mid \pi_S, 36)$

The Lupus Nephritis Collaborative Study
Current Trend:
  Control group: $\pi_S = p_S = 0.22$
  Plasmapheresis group: $\pi_P = p_P = 0.28$
  CP < 0.01 (one-sided)
Design Effect Size: 50% reduction
  Control group: $\pi_S = p_S = 0.22$
  Plasmapheresis group: $\pi_P = 0.11$
  CP = 0.01
Study stopped for futility.

Generalizations
Exact CP for the test for proportions with additional recruitment (Lachin and Lan, 1992).
Large-sample CP for tests for proportions with or without additional recruitment (Halperin et al., 1982).
Lan and Wittes (Biometrics, 1988): B-value (Brownian motion).
  Applicable to any test with "independent increments" in information.
  Any test based on an efficient estimator of the parameter of interest.
Lachin (Statistics in Medicine, 2005): Operating characteristics.
  Probability of stopping.
  Type I and II error probabilities.

Information Time (Fraction)
Test H0: θ = 0 versus H1: θ ≠ 0.
$Z_t$ = Z-test value at "information time" t, $Z_t = \hat\theta_t / \sqrt{V(\hat\theta_t)}$.
Simple cases (means or proportions):
  Planned total sample sizes: $N_E$ and $N_C$ in groups E and C.
  Variance $\propto [N_E^{-1} + N_C^{-1}]$, e.g. $V[\bar X_1 - \bar X_2] = \sigma^2 [1/N_E + 1/N_C]$.
  As the variance decreases, information increases: Information $\propto [N_E^{-1} + N_C^{-1}]^{-1}$.
  Accrued sample sizes: $n_E$ and $n_C$.
  $t = \dfrac{[n_E^{-1} + n_C^{-1}]^{-1}}{[N_E^{-1} + N_C^{-1}]^{-1}} = n/N$ if $n_E = n_C$ and $N_E = N_C$.
Logrank test:
  Expected number of events: $E(D_E)$ and $E(D_C)$.
  Information $\propto [E(D_E)^{-1} + E(D_C)^{-1}]^{-1}$.
  Accrued events: $d_E$ and $d_C$.
  $t = \dfrac{[d_E^{-1} + d_C^{-1}]^{-1}}{[E(D_E)^{-1} + E(D_C)^{-1}]^{-1}} = \dfrac{d_C}{E(D_C)}$ under H0 for groups of equal size.

Drift Parameter
Statistic S:
  H0: $S \sim N(0, \sigma^2/N)$
  H1: $S \sim N(\phi, \sigma^2/N)$
  $Z = S\sqrt{N}/\sigma$
$\alpha_D$ = design type I error probability.
$\beta_D$ = design type II error probability, $1 - \beta_D = \Phi(Z_{1-\beta_D})$ = power.
$N = \left[ \dfrac{(Z_{1-\alpha_D} + Z_{1-\beta})\,\sigma}{\phi} \right]^2$
The drift parameter θ is the non-centrality parameter:
$\theta = E[Z \mid H_1] = \dfrac{\sqrt{N}\,|\phi|}{\sigma} = Z_{1-\alpha_D} + Z_{1-\beta}$
For α = 0.05 (two-sided) and β = 0.15, θ = 1.96 + 1.04 = 3.

The B-value
A transform of the Z-test value that facilitates computation of CP.
  Lan and Wittes. Biometrics 1988.
  Lan and Zucker. Statistics in Medicine 1993.
B-value: $B_t = Z_t \sqrt{t}$, 0 < t ≤ 1.
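As a small numerical check of the definitions so far, the following Python sketch computes the drift parameter, the information fraction and the B-value. The function names (`drift_parameter`, `information_time`, `b_value`) are illustrative choices, not part of the original slides, and the code assumes a two-sided design level as in the θ = 3 example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def drift_parameter(alpha_two_sided, beta):
    """theta = Z_{1-alpha/2} + Z_{1-beta} for a two-sided level-alpha design."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    return z_alpha + z_beta


def information_time(n_e, n_c, planned_e, planned_c):
    """Information fraction t = [1/n_E + 1/n_C]^-1 / [1/N_E + 1/N_C]^-1.

    For the logrank test, pass accrued and expected event counts instead of
    accrued and planned sample sizes.
    """
    accrued = 1.0 / (1.0 / n_e + 1.0 / n_c)
    planned = 1.0 / (1.0 / planned_e + 1.0 / planned_c)
    return accrued / planned


def b_value(z_t, t):
    """B_t = Z_t * sqrt(t) for an information fraction 0 < t <= 1."""
    if not 0 < t <= 1:
        raise ValueError("information time t must satisfy 0 < t <= 1")
    return z_t * sqrt(t)


if __name__ == "__main__":
    print(round(drift_parameter(0.05, 0.15), 3))        # close to 3
    print(information_time(50, 50, 100, 100))            # 0.5, i.e. n/N
    print(b_value(2.0, 0.25))                            # 1.0
```

With equal allocation the information fraction reduces to n/N, which is why `information_time(50, 50, 100, 100)` gives 0.5; with unequal accrual the harmonic form matters.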
Asymptotically, for 0 < t ≤ 1:
  $B_t \sim N(t\theta, t)$
  $Z_t \sim N(\theta\sqrt{t}, 1)$
  $\hat\theta_t = B_t / t \sim N(\theta, 1/t)$

The B-value Shows the Trend in the Data
$B_t \sim N(t\theta, t)$, $\theta = E[Z \mid H_1] = \dfrac{\sqrt{N}\,|\phi|}{\sigma}$

Example: Mantel logrank test
  $\alpha_D = 0.05$ (two-sided), $1 - \beta_D = 0.85$
  Relative hazard $\phi_D = 0.6$
  Control group hazard rate of 0.35 per year
  5% losses to follow-up per year
  1 year recruitment and 2.5 year total duration
  N = 355, Lachin and Foulkes (Biometrics, 1986)
  $\theta_D = Z_{0.975} + Z_{0.85} = 1.96 + 1.04 = 3$
  $E(D_C) = 85$ events in the control group
  $t = d_C / 85$

Conditional Power
$\theta_F$ = drift parameter for the future data, where
  $B_1 \mid (B_t, \theta_F) \sim N(\tilde\theta, 1 - t)$
  $\tilde\theta = t\hat\theta_t + \theta_F (1 - t) = B_t + \theta_F (1 - t)$
$CP(t, \theta_F) = \Phi[Z_{1-\beta}(t, \theta_F)]$, where
  $Z_{1-\beta}(t, \theta_F) = \dfrac{\tilde\theta - Z_{1-\alpha_D}}{\sqrt{1 - t}}$
$CP_D$ under the original design, $\theta_F = \theta_D$
$CP_T$ under the current trend, $\theta_F = \hat\theta_t$
$CP_N$ under H0, $\theta_F = 0$

Conditional Power - Distribution
$\theta_I$ = drift parameter assumed for the initial data up to the interim assessment, $\theta_I = E(\hat\theta_t) = E(B_t / t)$.
Since $Z_{1-\beta}$ is a function of $\hat\theta_t$, or of $B_t$ or $Z_t$, then
  $Z_{1-\beta}(t, \theta_F) \sim N\left[ \dfrac{t\theta_I + (1 - t)\theta_F - Z_{1-\alpha_D}}{\sqrt{1 - t}}, \; \dfrac{t}{1 - t} \right]$
This provides the distribution of $Z_{1-\beta}$, and of CP, for given α, $\theta_I$, $\theta_F$, and t.
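The conditional power formula can be evaluated directly from the B-value. The Python sketch below is an illustration only: the function names are hypothetical, and it assumes a one-sided final critical value $Z_{1-\alpha_D}$ with $\alpha_D = 0.025$ (equivalently 0.05 two-sided) unless another level is passed.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(b_t, t, theta_future, alpha_one_sided=0.025):
    """CP(t, theta_F) = Phi[(B_t + theta_F (1 - t) - Z_{1-alpha}) / sqrt(1 - t)]."""
    if not 0 < t < 1:
        raise ValueError("interim information time must satisfy 0 < t < 1")
    z_crit = STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    theta_tilde = b_t + theta_future * (1 - t)  # conditional mean of B_1
    return STD_NORMAL.cdf((theta_tilde - z_crit) / sqrt(1 - t))


def cp_design_trend_null(z_t, t, theta_design, alpha_one_sided=0.025):
    """Return CP under the design, the current trend and the null."""
    b_t = z_t * sqrt(t)      # B-value
    theta_hat = b_t / t      # current-trend estimate of the drift
    return {
        "design": conditional_power(b_t, t, theta_design, alpha_one_sided),
        "trend": conditional_power(b_t, t, theta_hat, alpha_one_sided),
        "null": conditional_power(b_t, t, 0.0, alpha_one_sided),
    }


if __name__ == "__main__":
    # Hypothetical interim result: Z = 1.0 at half the information.
    for name, cp in cp_design_trend_null(1.0, 0.5, 3.0).items():
        print(f"CP under {name}: {cp:.3f}")
```

When the current-trend estimate lies between 0 and $\theta_D$, the three values are ordered $CP_N < CP_T < CP_D$. Evaluating `conditional_power(0.08916, 0.5, 3.0)` gives approximately 0.30.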
[Figure: Low conditional power (< 0.3) at t = 0.5 under the design, current trend and null, as a function of the B-value $B_{0.5}$ and $\hat\theta_{0.5}$. Vertical axis: conditional power. Curves: Design, Trend, Null.]

Futility Stopping
Pre-specified time t = τ.
Stop for futility if $CP \le C_L$.
P(Stop):
  $P_L = P[CP(\tau, \theta_F) \le C_L] = P[B_\tau \le B_L] = P[\hat\theta_\tau \le \theta_L]$

[Figure: Probability of stopping at τ = 0.5 and 0.8 when $\hat\theta_\tau \le \theta_L$, under H0 and H1. Horizontal axis: $\theta_L$. Vertical axis: Pr(stop). Annotation: 0.05 ≤ CPD ≤ CPN ≤ 0.3.]

Type II Error Probability
$\beta = \beta_1 + \beta_2$
$\beta_1 = P_{L1} = P(\text{stop for futility} \mid H_1)$
$\beta_2$ = probability of continuation and the final result is not significant:
  $\beta_2 = P(B_1 < Z_{1-\alpha_D} \cap B_\tau > B_L \mid H_1)$
$\beta \le \beta_1 + \beta_D$

Example
Design power = 0.85, $\beta_D = 0.15$ when φ = 0.6.
Stop for futility at τ = 0.5 if $CP_D(0.5) \le 0.3$:
  $B_{0.5} \le B_L = 0.08916$
  $\hat\theta_{0.5} \le \theta_L = 0.17831$
Under H1: $\theta = \theta_D$, $P(\text{stop}) = P(CP_D(0.5) \le 0.3) = 0.023$.
Total type II error probability given $\theta = \theta_D$:
  $\beta = P(CP_D(0.5) \le 0.3) + P[(CP_D(0.5) > 0.3) \cap (|Z_1| < 1.96)] = 0.023 + 0.131 = 0.154$.

[Figure: Type II error probability β for τ = 0.5 and 0.8, with $\theta_F = \theta_D = 3$. Horizontal axis: $\theta(\tau)_L$. Annotation: 0.05 ≤ CPD ≤ CPN ≤ 0.3.]

Type I Error Probability
$\alpha = \alpha_1 + \alpha_2$
$\alpha_1 = P(\text{reject when stop for futility} \mid H_0) = 0$.
$\alpha_2$ is the probability of continuation and significance at the final analysis under H0: θ = 0.
For a one-sided test at level $\alpha_D$:
  $\alpha_2 = P(B_\tau > B_L \cap B_1 \ge Z_{1-\alpha_D} \mid H_0)$

[Figure: Type I error probability α for τ = 0.5 and 0.8. Horizontal axis: $\theta(\tau)_L$. Annotation: 0.05 ≤ CPD ≤ CPN ≤ 0.3.]

Fixing The Error Probabilities
As P(Stop) increases:
  β increases from the design level $\beta_D$
  α decreases from the design level $\alpha_D$
For a given boundary $B_L$, one can iteratively determine a final critical value $Z_F < Z_{1-\alpha_D}$ such that $\alpha_2 = \alpha_D$.
Example: $B_L = 0.9$ ($\theta_L = 1.8$) at τ = 0.5, $Z_F = 1.7535$ provides α = 0.05 two-sided.
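The stopping probability and the error probabilities with a single futility look can be approximated numerically. The Python sketch below is one possible approach, not the method used for the slides: it assumes the Brownian-motion model $B_\tau \sim N(\tau\theta, \tau)$ with $B_1 \mid B_\tau \sim N(B_\tau + \theta(1-\tau), 1-\tau)$, integrates over $B_\tau$ with Simpson's rule, and uses hypothetical function names.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def prob_stop(b_l, tau, theta):
    """P(B_tau <= B_L) when B_tau ~ N(tau * theta, tau)."""
    return STD.cdf((b_l - tau * theta) / sqrt(tau))


def prob_continue_and_final_below(b_l, tau, theta, z_final, steps=2000):
    """P(B_tau > B_L and B_1 < z_final) by Simpson's rule over B_tau."""
    interim = NormalDist(tau * theta, sqrt(tau))
    lower = max(b_l, interim.mean - 8 * interim.stdev)
    upper = interim.mean + 8 * interim.stdev
    if upper <= lower:
        return 0.0

    def integrand(b):
        # density of B_tau times P(B_1 < z_final | B_tau = b)
        cond = STD.cdf((z_final - b - theta * (1 - tau)) / sqrt(1 - tau))
        return interim.pdf(b) * cond

    h = (upper - lower) / steps  # steps must be even for Simpson's rule
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3


def type_ii_error(b_l, tau, theta, z_final):
    """beta = P(stop | theta) + P(continue and not significant | theta)."""
    return prob_stop(b_l, tau, theta) + prob_continue_and_final_below(
        b_l, tau, theta, z_final)


def type_i_error(b_l, tau, z_final):
    """alpha_2 = P(continue and B_1 >= z_final | theta = 0), one-sided."""
    p_continue = 1 - prob_stop(b_l, tau, 0.0)
    return p_continue - prob_continue_and_final_below(b_l, tau, 0.0, z_final)


if __name__ == "__main__":
    z = STD.inv_cdf(0.975)
    print(f"P(stop | theta=3) = {prob_stop(0.08916, 0.5, 3.0):.3f}")
    print(f"beta              = {type_ii_error(0.08916, 0.5, 3.0, z):.3f}")
    print(f"alpha (one-sided) = {type_i_error(0.08916, 0.5, z):.4f}")
```

For the example with $B_L = 0.08916$ at τ = 0.5 and θ = 3, this gives a stopping probability of about 0.023 and a total β of about 0.154. The sketch ignores the negligible lower tail of the two-sided final test, so it treats $|Z_1| < 1.96$ as $Z_1 < 1.96$.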
For a given $\beta \ge \beta_D$, one can iteratively determine $B_L$ and $Z_F$ such that $\alpha_2 = \alpha_D$.
Example: Slight inflation in P(Type II error)
  $\theta_D = 3$, $\alpha_D = 0.05$ (two-sided), $\beta_D = 0.15$ and β = 0.175, at τ = 0.5:
  $Z_F = 1.8954$ and $B_L = 0.5673$, for which $\theta_L = 1.135$.
  P(stop | H0) = 0.789, P(stop | θ = 3) = 0.094.

Fixing The Error Probabilities
Example: No inflation in P(Type II error)
  $\theta_D = 3$, $\alpha_D = 0.05$ (two-sided), $\beta_D = \beta = 0.15$, at τ = 0.5:
  $Z_F = 1.95996$ and $B_L = -1.1794$, for which $\theta_L = -2.359$.
  P(stop | H0) = 0.047, P(stop | θ = 3) = 0.000076.

Conclusions
While CP is a useful construct, the properties of futility monitoring depend on the bounds on the interim statistics: either $Z_t$, $B_t$, or $\hat\theta_t$.
It is conservative to employ CP under the design.
CP under the current trend has a higher probability of stopping for futility and greater inflation in β.
Regardless of how CP is computed, the probability of stopping is a function of:
  the true drift parameter θ
  the critical value $B_L$ or the corresponding $\theta_L$
The greater the probability of stopping, the greater the potential inflation in β.
Inflation in β can be reduced by adjustment of the final critical value $Z_F$.
As the time of the futility assessment increases:
  The probability of stopping under H1 decreases
  Inflation in β decreases

Futility Assessment and Monitoring for Effectiveness
Futility assessment can be embedded in a group sequential α-spending function.
Assume an O'Brien-Fleming-like boundary starting at t = 0.25 and at increments of 0.125.
Assume a futility assessment at t = 0.5.

1. Compute the upper boundary for total α = 0.025 (one-sided) for the looks prior to the futility assessment.

  t     0.25     0.375    0.5      0.625    0.75     0.875    1
  ZU    4.3326   3.4814
  ZL

   At t = 0.375, α(0.375) = 0.00025 and the remaining α to be spent is 0.02475.

2. Compute the futility bound and critical value for $\alpha_D = 0.02475$, $\beta_D = 0.15$, and total β = 0.20 for θ = 3.

  t     0.25     0.375    0.5      0.625    0.75     0.875    1
  ZU    4.3326   3.4814
  ZL                      1.06427

   $Z_F = 1.84009$ with nominal $\alpha_F = 0.032877$.

3. Compute the remaining O'Brien-Fleming boundary using α = 0.032877, starting at t = 0.5.

  t     0.25     0.375    0.5      0.625    0.75     0.875    1
  ZU    4.3326   3.4814   2.8006   2.5013   2.2725   2.0963   1.9554
  ZL                      1.06427

4. Check the actual probabilities using the Lan-DeMets program: α = 0.0266 one-sided (0.0532 two-sided) and 1 − β = 0.7957.
   If the futility assessment is at τ = 0.75, then α = 0.02461 and 1 − β = 0.7817.
   An exact calculation could be done using successive multivariate integration for the sequence of interim analyses.

Related Work
Snapinn (Stat in Med, 1992): reject H0 based on CP(Trend), accept H0 based on CP(Design).
Pepe and Anderson (Appl Stat, 1992): similar, for a conservative estimate under the current trend for survival data.
Ellenberg and Eisenberg (Cancer Treat Rep, 1985), Wieand et al. (Stat in Med, 1994):
  Stop for futility in the PH model if $\hat\beta < 0$.
  P(stop | H1) very small, minimal impact on α, β.
Pampallona and Tsiatis (J Stat Plan and Inf, 1994):
  α and β spending outer and inner bounds for effectiveness and futility (EAST).
  Other related methods.
  Exact α, β only for a fixed pre-specified sequence of looks.
Kittelson and Emerson (Biometrics, 1999):
  Inner and outer boundaries for α = 0.05 (2-sided), β = 0.20.
  Inner boundary corresponds to $C_L = CP_D = 0.50$.

Many thanks!
