Interim Analysis in Clinical Trials
Professor Bikas K Sinha [ISI, Kolkata]
Courtesy : Dr Gajendra Viswakarma
Visiting Scientist
Indian Statistical Institute
Tezpur Centre
e-mail: [email protected]
1
What is a clinical trial?
A clinical trial is defined as a prospective study
comparing the effect and value of intervention(s)
against a control in human beings.
A test of a new intervention or treatment on people for
detecting
-Tolerability
-Safety
-Efficacy
2
Types of clinical trials
Superiority
Non-inferiority
Equivalence
It can be a Phase I, Phase II or Phase III Trial
3
Diagrammatical Presentation of
Clinical Trials
[Diagram: equivalence, non-inferior and superior outcomes positioned on an
axis running from "Control better" through 0 to "Test better".]
4
Clinical Trial Stages
Phase I: Clinical Pharmacology and Toxicity
Objective: To determine a safe drug dose for further
studies of therapeutic efficacy of the drug
Design: Dose-escalation to establish a maximum
tolerated dose (MTD) for a new drug
Subjects: 1-10 normal volunteers or patients with
disease
5
Clinical Trial Stages
Phase II: Initial Clinical Investigation for
Treatment Effect
A fairly small-scale investigation.
Objective: To get preliminary information on
effectiveness and safety of the drug
Design: Often single arm (no control group)
Subjects: 100-500 patients with disease (or depends on
Therapeutic Area [TA])
6
Clinical Trial Stages
Phase III: Full-Scale Evaluation of the Treatment
(Comparative clinical trial): planned experiment on
human subjects. To some people the term “Clinical
trial” is synonymous with such a full-scale Phase III
trial.
A Phase III trial is the most rigorous and extensive type of
scientific clinical investigation of a new treatment.
Objective: To compare efficacy of the new
treatment with the standard regimen
Design: Randomized controlled
Subjects: Patients with disease; the number depends on
the Phase II trial
7
Clinical Trial Stages
Phase IV: Post-Marketing
After the research program leading to a drug being
approved for marketing, there remain substantial
inquiries still to be undertaken as regards monitoring
for adverse effects and additional large-scale, long-term
studies of morbidity and mortality.
Objective: To get more information (long-term side
effects)
Design: no control group
Subjects: Patients with disease using the treatment
8
The Big Picture
[Diagram: Drug A and Drug B compared through a test statistic.]
9
… So What is Different?
Ethics: Experiment involving human subjects
brings up new ethical issues
Bias: Experiment on intelligent subjects
requires new measures of control
We will also study the additional
considerations in clinical trials
to address the above requirements.
10
Interim Analysis
Analysis comparing intervention groups at any
time before the formal completion of the trial,
usually before recruitment is complete.
Often used with "stopping rules" so that a trial
can be stopped if participants are being put at
risk unnecessarily.
Timing and frequency of interim analyses
should be specified in the protocol.
11
Interim Analyses
Interim analysis is a tool to protect the welfare of
subjects:
By stopping enrollment/treatment as soon as a drug
is determined to be harmful
By stopping enrollment as soon as a drug is
determined to be highly beneficial
By stopping trials which will yield little additional
useful information (or which have negligible chance
of demonstrating efficacy if fully enrolled, given
results to date)
The associated statistical methods are generally
referred to as group sequential methods
12
Flowchart of the Study
[Flowchart: screening (15 days to 4 weeks), then enrolment at Visit 1.
Treatment period: visits every 4 weeks from Visit 1 to Visit 5 (end of
treatment), in three arms: Control, and Test T1 and T2 (safe dose determined).
Treatment-free follow up: Visit 6 and Visit 7, again at 4-week intervals.]
Required sample size of the study is 330
(110 subjects are required in each of the three arms)
13
Disposition Table: Ongoing Study
Patients Screened: 129
Screening Failures: 23
                             Drug C   Drug T1   Drug T2   Total
Patients Randomized          36       36        34        106
Study Incomplete + Ongoing   9+5      8+5       10+3      28+12
Completed Visits 5+          22       23        21        66
14
Mean PASI Change at Visits in Different Treatment Groups
[Figure: line plot of Mean PASI (vertical axis, 0.00 to 16.00) against Visit
(V1 to V5) for Drug A, Drug B and Drug C.]
15
Some Examples of Why a Trial
May Be Terminated
Treatments found to be convincingly different
Treatments found to be convincingly not different
Side effects or toxicities are too severe
Data quality is poor
Accrual is slow
Definitive information becomes available from an outside
source making trial unnecessary or unethical
Scientific question is no longer important
Adherence to treatment is unacceptably low
Resources to perform study are lost or diminished
Study integrity has been undermined by fraud or
misconduct
16
Opposing Pressures in Interim Analyses
To Terminate:
minimize size of trial
minimize number of
patients on inferior
arm
costs and economics
timeliness of results
To Continue:
increase precision
reduce errors
increase power
increase ability to
look at subgroups
gather information
on secondary
endpoints
17
The pitfalls of interim analyses
RCTs [Randomized Clinical Trials] with interim
analysis
1. Calculate sample size
2. Carry out the clinical trial
3. Employ statistical test of efficacy at pre-planned
stages in the interim until sample size has been
reached*
*One treatment declared significantly better than
the other if we get a p-value less than 5%.....
18
Statistical Considerations in Interim
Analyses
Consider a safety/efficacy study (phase II)
“At this point in time, is there statistical
evidence that….”
The treatment will not be as efficacious as we
would hope/need it to be?
The treatment is clearly dangerous/unsafe?
The treatment is very efficacious and we
should proceed to a comparative trial?
19
Statistical Considerations in Interim
Analyses
Consider a comparative study (phase III)
“At this point in time, is there statistical
evidence that….”
One arm is clearly more effective than the
other?
One arm is clearly dangerous/unsafe?
The two treatments have such similar
responses that there is no possibility that we
will see a significant difference by the end of
the trial?
20
Statistical Considerations in Interim
Analyses
We use interim statistical analyses to determine
the answers to these questions.
It is a tricky business:
interim analyses involve relatively few data
points
inferences can be inexact
we increase chance of errors.
if interim results are conveyed to
investigators, a bias may be introduced
in general, we look for strong evidence in one
or another direction.
21
Example: ECMO trial
Extra-corporeal membrane oxygenation (ECMO)
versus standard treatment for newborn infants with
persistent pulmonary hypertension.
N = 39 infants enrolled in study
Trial terminated after interim analysis
4/10 deaths in standard therapy arm
0/9 deaths in ECMO arm
p = 0.054 (one-sided)
Questions:
Is this result sufficient evidence on which to change
routine practice?
Is the evidence in favor of ECMO very strong?
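The slide does not say which test produced p = 0.054. As a sketch, assuming a one-sided Fisher exact test on the 2 x 2 table of deaths (an assumption, not stated in the source), the quoted value can be reproduced from the counts alone. The function name `fisher_one_sided` is illustrative.

```python
from math import comb

def fisher_one_sided(deaths_a, n_a, deaths_b, n_b):
    """One-sided Fisher exact p-value.

    Probability, with all margins fixed, that arm A has at least
    `deaths_a` of the total deaths (hypergeometric tail).
    """
    total_deaths = deaths_a + deaths_b
    denominator = comb(n_a + n_b, total_deaths)
    p_value = 0.0
    for k in range(deaths_a, min(n_a, total_deaths) + 1):
        deaths_in_b = total_deaths - k
        if deaths_in_b <= n_b:
            p_value += comb(n_a, k) * comb(n_b, deaths_in_b) / denominator
    return p_value

# Standard therapy: 4 deaths out of 10; ECMO: 0 deaths out of 9.
p_value = fisher_one_sided(4, 10, 0, 9)
print(round(p_value, 3))  # 0.054
```

With only 19 randomized infants, a single additional death in either arm would move this p-value substantially, which is part of why the questions above are hard to answer.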
22
Example: ISIS trial
The Second International Study of Infarct Survival (ISIS-2)
Five week study of streptokinase versus placebo based on
17,187 patients with myocardial infarction.
Trial continued until
12% death rate in placebo group
9.2% death rate in streptokinase group
p < 0.000001
Issues:
strong evidence in favor of streptokinase was available
early on
impact would be greater with better precision on death
rate, which would not be possible if trial stopped early
earlier trials of streptokinase had similar results, yet little
impact.
23
Statistical Approaches for Interim
Analysis
Three main philosophic approaches
Frequentist approach:
Multiple Looks
Group Sequential Designs
Stopping Boundaries
Alpha Spending Functions
Two Stage Designs
Likelihood approach
Bayesian approach
All differ in their approaches
The frequentist approach (multiple looks) is the most commonly seen
(but not necessarily the best!)
24
An Example of “Multiple Looks:”
RCT (Randomized Clinical Trial) with Trt A vs Trt
B: Required Sample Size: 200
TRT A: 100
TRT B: 100
25
An Example of “Multiple Looks:”
Four interim looks (50, 100, 150, and 200)
TRT A: 100; TRT B: 100
1st interim look: P = 0.028
26
An Example of “Multiple Looks:”
Four interim looks (50, 100, 150, and 200)
TRT A: 100; TRT B: 100
2nd interim look: P = 0.38
27
An Example of “Multiple Looks:”
Four interim looks (50, 100, 150, and 200)
TRT A: 100; TRT B: 100
1st look: P = 0.028
2nd look: P = 0.38
3rd look: P = 0.62
4th look: P = 1.00
28
An Example of “Multiple Looks:”
Consider planning a comparative trial in which two
treatments are being compared for efficacy (response
rate).
H0: p2 = p1
H1: p2 > p1
A standard design says that for 80% power and with
alpha of 0.05, you need about 100 patients per arm
based on the assumption p2 = 0.50, p1= 0.30 which
results in 0.20 for the difference.
So what happens if we find p < 0.05 before all patients
are enrolled ?
Why can’t we look at the data a few times in the
middle of the trial and conclude that one treatment is
better if we see p < 0.05?
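The question above can be explored by simulation. The sketch below is not the author's simulation: it assumes two arms with the same true response rate (so every "significant" result is a false positive), alternating allocation, a two-sided pooled z-test at each look, and a hypothetical helper name `ever_significant_rate`. The normal approximation is crude at very small looks, so the figures are only indicative.

```python
import random
from math import sqrt
from statistics import NormalDist

def ever_significant_rate(step, n_total=200, p_true=0.4, alpha=0.05,
                          n_sims=2000, seed=7):
    """Fraction of null trials that reach p < alpha at ANY look.

    A look is taken after every `step` patients (both arms combined).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    stopped = 0
    for _ in range(n_sims):
        successes = [0, 0]
        counts = [0, 0]
        for patient in range(1, n_total + 1):
            arm = patient % 2                 # alternate between the arms
            counts[arm] += 1
            if rng.random() < p_true:         # same response rate in both arms
                successes[arm] += 1
            if patient % step != 0 or 0 in counts:
                continue
            pooled = (successes[0] + successes[1]) / patient
            variance = pooled * (1.0 - pooled) * (1.0 / counts[0] + 1.0 / counts[1])
            if variance == 0.0:
                continue                      # no variation yet, cannot test
            z = (successes[0] / counts[0] - successes[1] / counts[1]) / sqrt(variance)
            if abs(z) > z_crit:
                stopped += 1
                break
    return stopped / n_sims

rate_every_4 = ever_significant_rate(step=4)
rate_every_40 = ever_significant_rate(step=40)
print(rate_every_4, rate_every_40)
```

Under these assumptions, both rates come out well above the nominal 5%, and looking more often makes the inflation worse.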
29
[Figure: two panels showing the Risk Ratio and the p-value against
Number of Patients (0 to 200); panel label: H1.]
The plots show simulated data where p1 = 0.40 and p2 = 0.50.
In our trial, looking to find a difference between 0.30 and 0.50,
we would not expect to conclude that there is evidence for a
difference.
However, if we look after every 4 patients, we get the scenario
where we would stop at 96 patients and conclude that there
is a significant difference.
30
[Figure: two panels showing the Risk Ratio and the p-value against
Number of Patients (50 to 200); panel label: H1.]
If we look after every 10 patients, we get the scenario where
we would not stop until all 200 patients were observed and
would conclude that there is not a significant difference
(p = 0.40).
31
[Figure: two panels showing the Risk Ratio and the p-value against
Number of Patients (50 to 200); panel label: H1.]
If we look after every 40 patients, we get the scenario where
we would not stop either.
If we wait until the END of the trial (N = 200), then we
estimate p1 to be 0.45 and p2 to be 0.52. The p-value for
testing that there is a significant difference is 0.40.
32
Would we have messed up if we looked early on?
Every time we look at the data and consider stopping, we
introduce the chance of falsely rejecting the null
hypothesis.
In other words, every time we look at the data, we have the
chance of a type 1 error.
If we look at the data multiple times, and we use alpha of
0.05 as our criterion for significance, then we have a 5%
chance of stopping each time.
Under the true null hypothesis and just 2 looks at the data,
then we “approximate” the error rates as:
Probability stop at first look: 0.05
Probability stop at second look: 0.95*0.05 = 0.0475
Total probability of stopping is 0.0975
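The arithmetic above generalizes to any number of looks. The sketch below keeps the slide's simplification that the looks are independent. In a real trial the looks share data, so the true inflation is somewhat smaller than this approximation. The function name is illustrative.

```python
def approx_overall_error(alpha, n_looks):
    """Chance of at least one false rejection in n_looks independent looks."""
    return 1.0 - (1.0 - alpha) ** n_looks

for n_looks in (1, 2, 4, 5, 10):
    print(n_looks, round(approx_overall_error(0.05, n_looks), 4))
```

For two looks this gives 0.05 + 0.95 x 0.05 = 0.0975, matching the total above.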
33
Effect of Sample Size on a
True Proportion
n \ p^   0.20       0.30      0.40       0.50       0.60
10       0, .45     0, .60    .1, .7     .18, .82   .3, .9
20       .02, .38   .1, .5    .18, .62   .28, .72   .38, .82
30       .05, .35                                   .42, .78
40       .07, .33                                   .45, .75
50       .09, .31                                   .46, .74
100      .12, .28                                   .50, .70
200      .15, .25                                   .53, .67
300      .16, .24                                   .54, .66
p^ +/- 2 sqrt{p^(1-p^)/n} serve as both-sided limits to TRUE p
34
Effect of Sample Size on a
True Proportion
n \ p^   0.2          0.3   0.4   0.5   0.6
400      0.16, 0.24
500      0.17, 0.23
1000     .175, .225
1500     .18, .22
2000     .182, .218
3000     .185, .215
4000     .19, .21
5000     .19, .21
p^ +/- 2 sqrt{p^(1-p^)/n} serve as both-sided limits for TRUE p
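The table entries follow directly from the formula p^ +/- 2 sqrt{p^(1-p^)/n}. A minimal sketch for filling in any cell is given below. Clipping the limits to the interval [0, 1] is an assumption, made so that small-sample limits such as the "0" entries for n = 10 come out as in the table.

```python
from math import sqrt

def two_sigma_limits(p_hat, n):
    """Approximate both-sided limits p_hat +/- 2*sqrt(p_hat*(1-p_hat)/n)."""
    half_width = 2.0 * sqrt(p_hat * (1.0 - p_hat) / n)
    lower = max(0.0, p_hat - half_width)   # a proportion cannot fall below 0
    upper = min(1.0, p_hat + half_width)   # or rise above 1
    return lower, upper

for n in (10, 100, 400, 5000):
    lower, upper = two_sigma_limits(0.2, n)
    print(n, round(lower, 3), round(upper, 3))
```

The half-width shrinks like 1/sqrt(n), so quadrupling the sample size only halves the width of the limits.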
35
Illustrative Examples :Interim Analysis
Example 1. It is desired to carry out an experiment
to examine the superiority, or otherwise, of a therapeutic
drug over a standard drug at the 5% level, with
90% power for detection of a 10% difference in the
proportions ‘cured’.
‘C’ : Standard Drug
‘T’ : Therapeutic Drug
H_0 : P_C - P_T = 0
H_1 : P_C ≠ P_T
Size alpha = 0.05, Power = 0.90 for delta = P_T – P_C = 0.10.
IT IS A BOTH-SIDED TEST.
36
Determination of Sample Size for
Full Analysis
Two-sided Test
alpha = 0.05; Z_{alpha/2} = 1.96
Power = 0.90; beta = 0.10, Z_beta = 1.282, delta = 0.10
N = 2(Z_{alpha/2} + Z_beta)^2 pbar(1-pbar)/delta^2
Assume pbar = 0.35 [suggestive cure rate]
N = 2(1.96 + 1.282)^2 (0.35)(0.65)/(0.10)^2
= 21.021128 x 22.75 = 478.23, rounded up to 480
Conclusion: Each arm involves 480 subjects.
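The sample-size formula can be evaluated with unrounded normal quantiles. This is a sketch under the same formula as above. The function name is illustrative, and the use of Python's `statistics.NormalDist` for the quantiles is a choice made here, not something the slides specify.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm_proportions(alpha, power, p_bar, delta):
    """N = 2 (Z_{alpha/2} + Z_beta)^2 pbar (1 - pbar) / delta^2, per arm."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)               # about 1.282 for power = 0.90
    return 2.0 * (z_alpha + z_beta) ** 2 * p_bar * (1.0 - p_bar) / delta ** 2

n_raw = n_per_arm_proportions(alpha=0.05, power=0.90, p_bar=0.35, delta=0.10)
print(round(n_raw, 2), ceil(n_raw))
```

The unrounded quantiles give a value marginally below the 478.23 obtained with 1.96 and 1.282. Rounding up to 480 per arm, as on the slide, also makes the sample divisible into equal segments for the interim looks.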
37
Full Experiment vs. Interim Analysis
For Full Experiment : Needed 480 subjects in
each ‘arm’.
At the end of the entire experiment, suppose
we observe :
‘C’ : # cured = 156 out of 480 i.e., 32.5%
‘T’ : # cured = 190 out of 480 i.e., 39.6%
Therefore, p^_C = 0.325 and p^_T = 0.396.
Hence, pbar = [p^_C + p^_T]/2 = 0.3605.
Finally, we compute the value of z given by
38
Full Analysis…..
Z_obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/N]
=[.325-.396]/sqrt[.36x.64x2/480]
= -[.071]/sqrt[0.00096] = -2.29
In absolute value, z_obs. is computed as 2.29
which is more than the ‘critical’ value of z
given by 1.96 [for a both-sided test with size
5%].
Hence, we conclude that the Null Hypothesis
is ‘not tenable’, given the experimental
outputs.
39
Interim Analysis : 2 ‘Looks’
First Look : use 50% of data
2nd Look : At the end, if continued after 1st.
Q. What is the size of the test at 1st look ?
Also, what is the size at the 2nd look so that
on the whole the size is 5 % ?
Ans. If we use 5% for the size at each of 1st and
2nd looks, then the over-all size becomes 8%.
Hence……both can NOT be taken at 5%.
Start with < 5% and then take > 5%.....
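The figure of about 8% can be checked by simulation. The sketch below assumes two equal, independent halves of the data under H_0, with the second look based on the combined statistic (z_I + z_II)/sqrt(2). The function name, seed and number of simulated trials are arbitrary choices.

```python
import random
from math import sqrt

def simulated_overall_size(z_crit=1.96, n_looks=2, n_sims=100_000, seed=1):
    """Monte Carlo estimate of the overall two-sided size under H_0.

    Each segment contributes an independent N(0, 1) statistic; the
    statistic at look k is the sum of the first k segments over sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z_cumulative = running_sum / sqrt(look)
            if abs(z_cumulative) > z_crit:   # nominal 5% test at every look
                rejections += 1
                break
    return rejections / n_sims

size_two_looks = simulated_overall_size()
print(round(size_two_looks, 3))
```

The estimate lands near 0.08 rather than 0.0975 because the two looks share the first half of the data and are therefore positively correlated.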
40
Interim Analysis : 2 Looks
Defining Equation :
alpha = P[ Z_I > z*] + P[ Z_I < z*, Z_{I,II} > z**]
where Z_I and Z_II are based on 50% data in
two identical and independent segments so
that their distributions are identical. Further,
Z_{I,II} = [z_I + z_II]/sqrt(2) is based on
combined evidence of I & II and hence Z_I and
Z_{I,II} are dependent.
Choices of z* and z** : intricate formulae.
41
Interim Analysis : 2 Looks
Z-computation….
z_I obs. is to be based on 50% data up to the
1st look for each of ‘C’ and ‘T’.
Data : C (90/240) & T(120/240) & n = 240.
p^_C = 90/240 = 0.375; p^_T = 120/240=0.50
pbar = (0.375 + 0.50)/2 = 0.4375.
z_I obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n]
= - [ 0.125 ]/sqrt{.4375x.5625x2/240}
= - (0.125)/sqrt{0.002050}
= - 2.76 implies ???
42
Interim Analysis : 2 Looks
Suggested cut-off points :Adopted for 2 Looks
z_c    Haybittle-Peto   Pocock   O’Brien-Fleming
z*     3.0              2.46     3.5
z**    2.0              2.46     2.0
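z_I obs. in absolute value = 2.76
Conclusion ?
Reject H_0 : suggested by Pocock’s Rule (2.76 > 2.46).
Continue : suggested by the other two (2.76 < 3.0 and 2.76 < 3.5).
Finally, z = - 2.29 at the 2nd look fails to reject
H_0 only by Pocock’s rule (2.29 < 2.46); the other two
rules reject (2.29 > 2.0).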
43
Interim Analysis : 4 Looks
Cut-off points : Suggested Rules
z_c     Haybittle-Peto   Pocock   O’Brien-Fleming
z*      3.0              2.42     4.00
z**     3.0              2.42     2.83
z***    3.0              2.42     2.32
z****   2.0              2.42     2.00
* : 1st look; ** : 2nd look; *** : 3rd look and
**** : last [4th] look
44
Interim Analysis : 4 Looks
Details of data sets :
C : 48/120; 42/120; 30/120; 36/120 …Total
156/480
T : 54/120; 66/120; 32/120; 38/120 …Total
190/480
Progressive proportions for ‘C’ :
48/120=0.40; (48+42)/240= 0.375;
(48+42+30)/360=0.333; 156/480=0.325
Progressive proportions for ‘T’ :
54/120=0.45; (54+66)/240= 0.50;
(54+ 66+32)/360=0.422; 190/480=0.396
45
Interim Analysis : 4 Looks
Progressive computations of pbar……
1st Look : pbar = (0.40 + 0.45)/2 = 0.425
2nd Look : pbar = (0.375 + 0.50)/2 = 0.4375
3rd Look : pbar = ( 0.333 + 0.422)/2 = 0.3775
4th Look : pbar = (0.325 + 0.396)/2 = 0.3605
46
Interim Analysis : 4 Looks
Progressive Computations of z-statistic
Generic Formula :
z-obs. for ‘Look # i’ is the ratio of
(a) [p^_C(i)– p^_T(i)] for i-th Look
(b) sqrt[pbar(i)(1-pbar(i))2/n(i)]
where pbar(i) corresponds to Look # i and
also ‘n(i) ’ corresponds to size of each arm
of Look # i for each i = 1, 2, 3,4.
Note : n(1)=120; n(2)=240; n(3)=360,
n(4)=480
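The generic formula can be applied to the cumulative data in one pass. The sketch below uses the per-segment counts of cured subjects listed for ‘C’ and ‘T’. The function name `look_z_statistics` is illustrative, and pbar is taken as the simple average of the two cumulative proportions, as on the slides.

```python
from math import sqrt

def look_z_statistics(cured_c, cured_t, n_per_segment):
    """z-statistic at each look, computed from cumulative counts."""
    z_values = []
    total_c = total_t = 0
    for look, (c, t) in enumerate(zip(cured_c, cured_t), start=1):
        total_c += c
        total_t += t
        n_i = look * n_per_segment          # n(i): subjects per arm so far
        p_c = total_c / n_i                 # progressive proportion for 'C'
        p_t = total_t / n_i                 # progressive proportion for 'T'
        p_bar = (p_c + p_t) / 2.0
        z_values.append((p_c - p_t) / sqrt(p_bar * (1.0 - p_bar) * 2.0 / n_i))
    return z_values

z_values = look_z_statistics([48, 42, 30, 36], [54, 66, 32, 38], 120)
for look, z in enumerate(z_values, start=1):
    print(look, round(z, 2))
```

Each value is then compared, in absolute value, with the cut-off that the chosen rule prescribes for that look.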
47
Interim Analysis : 1st Look
z_(Look I) obs.
= [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n*]
= [ 0.40-0.45 ]/sqrt{.425x.575x2/120}
= - (0.05)/sqrt{0.004073}
= -0.7835
Conclusion : All Rules are suggestive of
Continuation to 2nd Look
48
Interim Analysis : 2nd Look
z_(Look II) obs.
= [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n**]
= [0.375-0.50 ]/sqrt{.4375x.5625x2/240}
= - (0.125)/sqrt{0.002050}
= - 2.76
Conclusion : Reject H_0 by Pocock’s Rule
However, continue to 3rd Look according
to the other two rules.
49
Interim Analysis : 3rd Look and …
z_(Look III) obs.
= [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n***]
= [0.333-0.422 ]/sqrt{.3775x.6225x2/360}
= - (0.089)/sqrt{0.001306}
= - 2.46
Conclusion : Reject H_0 by Pocock & OBF
Rules but Continue by H-P Rule
Last Look : z_obs. = -2.29
Accept H_0 by Pocock’s Rule only
50
Data Analysis….Interpretations
Relative Merits of Decision Rules :
Pocock’s Rule : Maintains uniformity in
critical values, so it is comparatively ‘liberal’
at the start (easier to stop early) and
comparatively ‘conservative’ at the end.
Other Rules : Conservative at the start (large
critical values) and liberal at the end (critical
values close to the unadjusted one).
All Rules have to maintain the ‘averaging
principle’ to meet alpha at the end.
No Rule can be strict/liberal all through the
Looks.
51
Interim Analysis : Example 2
Continuous data : Testing for equality of
mean effects of two treatments : ’C’ & ’T’.
As before, we have Null and Alt. Hypotheses
and we have a specified value of
DELTA = Mean of T – Mean of C
and a specified power, say 90% to detect
this. Taking size equal to 5%, we solve for
the sample size in each arm.
This is routine computation and we take
sample size N = 525 in each arm.
Full Analysis : Sample Size Computation
Assume normal distribution with sigma = 5.
Two-sided Test
alpha = 0.05; Z_{alpha/2} = 1.96
Power = 0.90; beta = 0.10, Z_beta = 1.282,
delta = 0.20 times sigma = 20% of sigma = 1.0
N = 2(Z_{alpha/2} + Z_beta)^2 x sigma^2 / delta^2
= 2(1.96 + 1.282)^2 / 0.04
= 525 [approx.]
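As a cross-check on N = 525, the power of the single (no interim look) two-sided test can be computed for delta = 1.0 and sigma = 5. This is a sketch: the function name is illustrative, and the negligible chance of rejecting in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the z-statistic when the true difference is delta.
    shift = delta / (sigma * sqrt(2.0 / n_per_arm))
    return NormalDist().cdf(shift - z_alpha)

power = power_two_sided(delta=1.0, sigma=5.0, n_per_arm=525)
print(round(power, 3))
```

The result is very close to the targeted 90%, which confirms the sample size before any adjustment for interim looks.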
We can think of 5 Looks altogether…at equal
Steps…..each with approx. 105 observations.
Interim Analysis…Example contd.
Details of data sets : (mean, sample size)
C:
(30.5,105); (31.8, 105); (29.7, 105);
(30.2, 105); (31.3, 105)
T:
(31.7,105); (32.0, 105); (30.8, 105);
(33.7, 105); (32.8, 105)
Progressive sample means for ‘C’ :
30.5, 31.15, 30.67, 30.55, 30.70
Progressive sample means for ‘T’ :
31.7, 31.85, 30.83, 32.55, 32.60
Interim Analysis : Example contd….
Progressive Computations of z-statistic
Generic Formula :
z-obs. for ‘Look # i’ is the ratio of
(a) [mean_C(i)– mean_T(i)] for i-th Look
(b) sigma times sqrt[2/n(i)]
where mean_C(i) and mean_T(i) are the progressive
sample means up to Look # i and
‘n(i) ’ corresponds to size of each arm
of Look # i for each i = 1, 2, 3, 4, 5.
Note : n(1)=105; n(2)=210; n(3)=315,
n(4)=420 and n(5) = 525.
Interim Analysis : Example contd.
Cut-off points : Suggested Rules
z_c      Haybittle-Peto   Pocock   O’Brien-Fleming
z*       3.0              2.60     4.56
z**      3.0              2.60     3.23
z***     3.0              2.60     2.63
z****    3.0              2.60     2.28
z*****   2.0              2.60     2.00
* : 1st look; ** : 2nd look; *** : 3rd look;
**** : 4th look & ***** : Last [5th] look
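The five-look procedure can be sketched end to end. The code below takes the progressive sample means exactly as listed earlier for ‘C’ and ‘T’, sigma = 5 and 105 subjects per arm per look, computes the z-statistic at each look, and reports the first look at which each rule would stop. The helper names are illustrative. The cut-offs are copied from the table above.

```python
from math import sqrt

SIGMA = 5.0
N_PER_LOOK = 105
# Progressive sample means as listed on the slides.
MEAN_C = [30.5, 31.15, 30.67, 30.55, 30.70]
MEAN_T = [31.7, 31.85, 30.83, 32.55, 32.60]
CUTOFFS = {
    "Haybittle-Peto": [3.0, 3.0, 3.0, 3.0, 2.0],
    "Pocock": [2.60, 2.60, 2.60, 2.60, 2.60],
    "O'Brien-Fleming": [4.56, 3.23, 2.63, 2.28, 2.00],
}

def z_known_sigma(mean_c, mean_t, n_cumulative, sigma):
    """z = (mean_C - mean_T) / (sigma * sqrt(2 / n))."""
    return (mean_c - mean_t) / (sigma * sqrt(2.0 / n_cumulative))

def first_stopping_look(z_values, cutoffs):
    """First look (1-based) where |z| exceeds the cut-off, else None."""
    for look, (z, cutoff) in enumerate(zip(z_values, cutoffs), start=1):
        if abs(z) > cutoff:
            return look
    return None

z_values = [
    z_known_sigma(c, t, look * N_PER_LOOK, SIGMA)
    for look, (c, t) in enumerate(zip(MEAN_C, MEAN_T), start=1)
]
stops = {rule: first_stopping_look(z_values, cuts) for rule, cuts in CUTOFFS.items()}
print([round(z, 2) for z in z_values])
print(stops)
```

With these inputs, no rule stops before the 4th look, where the statistic exceeds every cut-off.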
Interim Analysis…Example contd.
z_(Look I) obs.
= [mean_C – mean_T]/[sigma x sqrt(2/n*)]
= - [1.2] / [5 x sqrt(2/105)]
= - 1.74
Conclusion : Continue to 2nd Look
Interim Analysis : Example contd.
z_(Look II) obs.
= [mean_C – mean_T]/[sigma x sqrt(2/n**)]
= - [0.7] / [5 x sqrt(2/210)]
= - 1.43
Conclusion : Continue to 3rd Look
Interim Analysis : Example contd.
z_(Look III) obs.
= [mean_C – mean_T]/[sigma x sqrt(2/n***)]
= - [0.16] / [5 x sqrt(2/315)]
= - 0.40
Conclusion : Continue to 4th Look
Interim Analysis : Example contd.
z_(Look IV) obs.
= [mean_C – mean_T]/[sigma x sqrt(2/n****)]
= - [2.0] / [5 x sqrt(2/420)]
= - 5.80
Conclusion : Stop and Reject H_0.
Strong evidence against H_0 and yet 105
observations per arm are left to be studied.
What if the expt was continued till the end
anyway ?
Interim Analysis : Example contd.
z_(Look V) obs.
= [mean_C – mean_T]/[sigma x sqrt(2/n*****)]
= - [1.90] / [5 x sqrt(2/525)]
= - 6.16
Conclusion : Reject H_0.
Quite a strong evidence against H_0