Statistical Methods in
Clinical Trials
Sample Size Determination
Ziad Taib
Biostatistics
AstraZeneca
March 12, 2012
• “The number of subjects in a clinical study
should always be large enough to provide
a reliable answer to the question(s)
addressed.”
• “The sample size is usually determined by
the primary objective of the trial.”
• “Sample size calculation should be
explicitly mentioned in the protocol.”
(from ICH-E9)
Power and sample size
• Suppose we want to test if a drug is better than a
placebo, or if a higher dose is better than a lower
dose.
• Sample size: How many patients should we
include in our clinical trial, to give ourselves a
good chance of detecting any effects of the drug?
• Power: Assuming that the drug has an effect,
what is the probability that our clinical trial will
give a significant result?
Sample Size and Power
• Sample size is contingent on design, analysis
plan, and outcome
• With the wrong sample size, you will either
– Not be able to make conclusions because the
study is “underpowered”
– Waste time and money because your study is
larger than it needed to be to answer the
question of interest
Sample Size and Power
• With the wrong sample size, you might have
problems interpreting your results:
– Did I not find a significant result because the
treatment does not work, or because my
sample size is too small?
– Did the treatment REALLY work, or is the effect
I saw too small to warrant further consideration
of this treatment?
– Issue of CLINICAL versus STATISTICAL
significance
Sample Size and Power
• Sample size ALWAYS requires the investigator
to make some assumptions
– How much better do you expect the
experimental therapy group to perform than the
standard therapy groups?
– How much variability do we expect in
measurements?
– What would be a clinically relevant
improvement?
• The statistician CANNOT tell what these
numbers should be
• It is the responsibility of the clinical
investigators to define these parameters
Errors with Hypothesis Testing
                Accept H0          Reject H0
E = C           OK                 Type I error
E ≠ C           Type II error      OK

H0: E = C (ineffective)    H1: E ≠ C (effective)
Type I Error
• Concluding in favour of the alternative hypothesis while the null hypothesis is true (false positive)
• Probability of Type I error = significance level or α level
• Value needs to be specified in the protocol and
should be small
• Explicit guidance from regulatory authorities, e.g.
0.05
• One-sided or two-sided ?
• Ho: ineffective vs. Ha: effective
Type II Error
• Concluding in favour of the null hypothesis while the alternative
hypothesis is true (false negative)
• Probability of Type II error = β level
• Value to be chosen by the sponsor, and should be
small
• No explicit guidance from regulatory authorities, e.g.
0.20, or 0.10
• Power = 1 − β
       = 1 − Type II error rate
       = P(reject H0 when Ha is true)
       = P(concluding that the drug is effective when it is)
• Typical values 0.80 or 0.90
Sample Size Calculation
• The following items should be specified
– a primary variable
– the statistical test method
– the null hypothesis; the alternative hypothesis;
the study design
– the Type I error
– the Type II error
– the way treatment withdrawals will be handled
Three common settings
1. Continuous outcome: e.g., number of units of blood transfused, CD4 cell counts, LDL-C.
2. Binary outcome: e.g., response vs. no response, disease vs. no disease.
3. Time-to-event outcome: e.g., time to progression, time to death.
Continuous outcomes
• Easiest to discuss
• Sample size depends on
– Δ: the difference to be detected (specified under the alternative hypothesis)
– α: Type I error
– β: Type II error
– σ: standard deviation
– r: ratio of number of patients in the two
groups (usually r = 1)
One-sample
• Assume that we wish to study a hypothesis related to a
sample of independent and identically distributed normal
random variables with mean μ and standard deviation σ.
Assuming σ is known, a (1 − α)100% confidence interval
for μ is given by

  Ȳ ± z_{α/2} σ/√n

• The maximum error in estimating μ is

  E = z_{α/2} σ/√n,

which is a function of n. If we want to pre-specify E, then n
can be chosen according to

  n = z_{α/2}² σ² / E²
• This takes care of the problem of determining the sample
size required to attain a certain precision in estimating μ
using a confidence interval at a certain confidence level.
Since a confidence interval is equivalent to a test, the
above procedure can also be used to test a null
hypothesis. Suppose thus that we wish to test the
hypothesis
– H0: μ = μ0
– Ha: μ > μ0
• with significance level α. For a specific alternative
H1: μ = μ0 + δ, where δ > 0, the power of the test is given by

  1 − β = Φ( δ√n/σ − z_α ),

which leads to n = (z_α + z_β)² σ² / δ².
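These one-sample formulas are easy to turn into code. The following is a minimal sketch, assuming known σ and normally distributed data; the function names and the numbers in the example are illustrative, not from the slides.

```python
# Minimal sketch of the one-sample formulas above; assumes sigma is known.
import math
from scipy.stats import norm

def n_for_precision(sigma, E, alpha=0.05):
    """n so that the (1 - alpha) CI for mu has half-width at most E."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((z * sigma / E) ** 2)

def n_one_sample(sigma, delta, alpha=0.05, power=0.90):
    """n for the one-sided test H0: mu = mu0 vs Ha: mu = mu0 + delta."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return math.ceil((z * sigma / delta) ** 2)

print(n_for_precision(sigma=10, E=2))      # 97
print(n_one_sample(sigma=10, delta=5))     # 35
```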
Two samples
• H0: μ1 − μ2 = 0
• H1: μ1 − μ2 ≠ 0
• With known variances σ1² and σ2², the power against a
specific alternative μ1 = μ2 + δ can be written down because
Ȳ1 − Ȳ2 is normal, so

  Power = P( Z ≤ −z_{α/2} + δ/σ_d ) + P( Z ≤ −z_{α/2} − δ/σ_d ),
  where σ_d = √( σ1²/n1 + σ2²/n2 )

Solving for equal group sizes gives, in the two-sided case,

  n = (z_{α/2} + z_β)² (σ1² + σ2²) / δ²   per group

• In the one-sided case we have

  n = (z_α + z_β)² (σ1² + σ2²) / δ²   per group
Variants of two-sample formula
For testing the difference in two means, with (equal) variance σ²
and equal allocation to each arm:

  n = 2 (z_{α/2} + z_β)² σ² / δ²   per arm

With unequal allocation to each arm, where n2 = r·n1:

  n1 = (r + 1)/r · (z_{α/2} + z_β)² σ² / δ²,   n2 = r·n1

If the variance is unknown, replace the normal quantiles by those of
the t-distribution.
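A hedged sketch of the two-sample formulas above, assuming a known common σ and a two-sided test; the function name, the allocation-ratio handling and the example numbers are mine, not from the slides.

```python
# Minimal sketch of the two-sample (equal-variance) sample size formulas.
import math
from scipy.stats import norm

def n_two_means(sigma, delta, alpha=0.05, power=0.80, r=1.0):
    """Return (n1, n2) for H0: mu1 - mu2 = 0 vs a two-sided alternative,
    with n2 = r * n1 and common known standard deviation sigma."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n1 = math.ceil((1 + 1 / r) * (z * sigma / delta) ** 2)
    return n1, math.ceil(r * n1)

print(n_two_means(sigma=10, delta=5))        # (63, 63) with equal allocation
print(n_two_means(sigma=10, delta=5, r=2))   # (48, 96) with 2:1 allocation
```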
One proportion
We want to test
• H0: P = P0
• Ha: P = P1 (> P0)
with a certain power (1 − β) and a certain significance
level (α). The required sample size is

  n = [ z_α √( P0(1 − P0) ) + z_β √( P1(1 − P1) ) ]² / (P1 − P0)²
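As an illustration, the one-proportion formula can be coded as below; this is a sketch using the normal approximation, and the example rates are arbitrary.

```python
# Minimal sketch of the one-sample proportion formula (one-sided test).
import math
from scipy.stats import norm

def n_one_proportion(p0, p1, alpha=0.05, power=0.80):
    """n for H0: P = p0 vs Ha: P = p1 > p0, normal approximation."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p1 - p0)) ** 2)

print(n_one_proportion(p0=0.30, p1=0.45))   # 62 under these assumptions
```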
Two proportions
We want to test
• H0: P1 = P2
• Ha: P1 ≠ P2
with a certain power (1 − β) and a certain significance
level (α). The required sample size per group is

  n = [ z_{α/2} √( 2 P̄(1 − P̄) ) + z_β √( P1(1 − P1) + P2(1 − P2) ) ]² / (P1 − P2)²,

  where P̄ = (P1 + P2)/2.
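A corresponding sketch for the two-proportion case (pooled normal approximation, equal arms; the example rates are arbitrary):

```python
# Minimal sketch of the two-proportion formula (two-sided, n per group).
import math
from scipy.stats import norm

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """n per group for H0: P1 = P2 vs Ha: P1 != P2."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((num / (p1 - p2)) ** 2)

print(n_two_proportions(0.30, 0.45))        # 163 per group here
```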
Analysis of variance
We want to test
• H0: μ1 = μ2 = … = μk
• Ha: not all μi are equal
Take:

  ν1 = k − 1,   ν2 = k(n − 1) = N − k,

and reject H0 when F* ≥ F(α; ν1, ν2), where F* is the usual ANOVA
F-statistic. The power depends on the noncentrality parameter

  λ = n Σ_{i=1}^{k} (μi − μ̄)² / σ²

and is given by

  Power = P( F(ν1, ν2; λ) > F(α; ν1, ν2) ),

where F(ν1, ν2; λ) denotes a noncentral F random variable. An
approximate expression for z_β can be derived, but the resulting
equation is difficult to solve explicitly; tables and software
(e.g. StudySize) are available.
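In code, the noncentral-F power calculation can be sketched as follows, using scipy's noncentral F distribution; the group means and σ in the example are assumptions, not from the slides.

```python
# Minimal sketch: one-way ANOVA power from the noncentral F distribution.
import numpy as np
from scipy.stats import f, ncf

def anova_power(means, sigma, n, alpha=0.05):
    """Power of the one-way ANOVA F-test with n subjects per group."""
    k = len(means)
    nu1, nu2 = k - 1, k * (n - 1)
    lam = n * np.sum((np.asarray(means) - np.mean(means)) ** 2) / sigma ** 2
    return ncf.sf(f.ppf(1 - alpha, nu1, nu2), nu1, nu2, lam)

# Smallest n per group giving at least 80% power for assumed means and sigma
means, sigma, n = [10, 12, 14], 4, 2
while anova_power(means, sigma, n) < 0.80:
    n += 1
print(n, round(anova_power(means, sigma, n), 3))
```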
Survival analysis
• Assume we plan an RCT aiming at comparing a new
treatment (drug) with an old one (control). Such a
comparison can be made in terms of the hazard ratio or
some function of it. Let d stand for drug and c for
control.
• In the sequel we describe the sample size required to
demonstrate that the drug is better than the control
treatment. This will be done under some assumptions,
in two steps.
• Step 1: specify the number of events needed
• Step 2: specify the number of patients needed to
obtain the “right” number of events.
Step 1
In what follows we will use the notation below to give a
formula specifying the number of events needed to
have a certain asymptotic power in many different
cases, such as the log-rank test, the score test, and
parametric tests based on the exponential distribution.
p = the proportion of individuals receiving the new drug
q = 1 − p = the proportion of individuals receiving the old drug
d = the required number of events
C = the critical value of the test
Z_power = the upper quantile of the standard normal distribution
          corresponding to the power

  d = (C + Z_power)² / ( p q Δ² ),

where Δ denotes the log hazard ratio between the two groups.
• It is not clear how Δ should be estimated.
The difficulty lies in the fact that this
depends on the exact situation at hand.
We illustrate this through some
examples.
• EXAMPLE 1
Assume that our observations follow exp(λ)
under the old drug (control) and exp(λ·e^Δ)
under the new drug. Then the ratio of the medians
of the two distributions will be

  M_d / M_c = [ ln 2 / (λ e^Δ) ] / [ ln 2 / λ ] = e^{−Δ}
• To be able to discover a 50% increase in
the median we take

  M_d / M_c = 3/2,   i.e.  e^{−Δ} = 3/2  (equivalently e^Δ = 2/3).
In general we can use a Taylor expansion to derive the
following formula:

  e^Δ = log S_b(t) / log S_c(t)
      = log( 1 − F_b(t) ) / log( 1 − F_c(t) )
      ≈ F_b(t) / F_c(t),

where S_b(t) = 1 − F_b(t) and S_c(t) = 1 − F_c(t) are the survival
functions in the new-drug (b) and control (c) groups.
We can thus interpret e^Δ in terms of how much more likely the event
A: (event before t in the drug group)
is compared to the event
B: (event before t in the control group).
COMMENTS
• To choose Δ one has to have an idea about
what it is possible to achieve, which depends on
the nature of the disease.
• With hypertension, values of e^Δ between 0.87
and 1.15 are assumed to be equivalent, while
with AIDS and cancer no therapy can do more
than reduce the risk by 50%.
• EXAMPLE 3
In a clinical trial we want to compare an old therapy with a 5-year
survival rate of 30% with a new one, and hope
that the survival rate of the new therapy is 45%. In
formulas:

  S_b(5) = 0.45,   S_c(5) = 0.30;

  e^Δ = log(0.45) / log(0.30) ≈ 0.663
• We perform a two sided test with =0.05 and want to
achieve 90% power of discovering a risk ratio of 0.663.
Moreover we assume that p=q=0.5.




C

Z
1
.
96

1
.
28
d



249
eve
1 2
pq
 1


0
.
663
2
0
.
05
0
.
10
2
2
2
2
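A quick numeric check of Example 3, written as a self-contained sketch (the function name is mine; assumptions as stated above):

```python
# Hazard ratio from the 5-year survival rates, then the Step 1 event formula.
import math
from scipy.stats import norm

def n_events(hazard_ratio, alpha=0.05, power=0.90, p=0.5):
    """Events for a two-sided test with allocation fractions p and q = 1 - p."""
    C, z_power = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (C + z_power) ** 2 / (p * (1 - p) * math.log(hazard_ratio) ** 2)

S_b, S_c = 0.45, 0.30                        # new vs old 5-year survival
hr = math.log(S_b) / math.log(S_c)           # e^Delta
print(round(hr, 3), round(n_events(hr)))     # 0.663 and about 249 events
```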
1. It is possible to draw up a table specifying the number of events d as
a function of the remaining parameters when
α = 0.05, p = q, and the test is two-sided.
2. Observe that e^Δ and e^{−Δ} lead to the same
number of events.
3. The largest power is obtained when p = q.
4. When p and q are not equal, divide the table
value by 4pq.
5. For a one-sided test, only C needs to be
changed from 1.96 to 1.64.
The number of events needed for a certain power under equal allocation
(two-sided test, α = 0.05, p = q):

  power \ e^Δ    1.15    1.25    1.50    1.75    2.00
  0.5             787     309      94      49      32
  0.7            1264     496     150      79      51
  0.8            1607     631     191     100      65
  0.9            2152     844     256     134      88
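The table can be recomputed from the Step 1 formula; the sketch below does this with exact normal quantiles, so a few entries may differ from the table by one event depending on the rounding convention.

```python
# Recompute the events table (two-sided alpha = 0.05, p = q = 0.5).
import math
from scipy.stats import norm

def n_events(hr, power, alpha=0.05, p=0.5):
    C, z_power = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return round((C + z_power) ** 2 / (p * (1 - p) * math.log(hr) ** 2))

for power in (0.5, 0.7, 0.8, 0.9):
    print(power, [n_events(hr, power) for hr in (1.15, 1.25, 1.50, 1.75, 2.00)])
# e.g. the 0.8 row prints [1607, 631, 191, 100, 65]
```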
Reminder
• Step 1: specify the number of events needed
• Step 2: specify the number of patients needed to obtain
the “right” number of events.
Step 2
In what follows we consider how we can specify the
number of patients needed to get the “right” number of
events needed to have a certain asymptotic power. In
general this will depend on
1. The length of the recruitment period T
2. The recruitment rate R
3. The follow up time F
4. The survival function for the control group
• d = 2N × P(event), hence 2N = d / P(event).
• P(event) is a function of the length of total follow-up at the time of analysis and
the average hazard rate.
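Step 2 can be sketched under simplifying assumptions that I am adding here: uniform accrual over [0, T], an average exponential hazard λ, and analysis after an additional follow-up time F. The expression for P(event) below follows from averaging the exponential survival function over the accrual period; it is an illustration, not the slides' method.

```python
# Minimal sketch: patients needed so that the expected number of events is d.
import math

def p_event(lam, T, F):
    """Average P(event by analysis) under uniform accrual and exponential survival."""
    return 1 - (math.exp(-lam * F) - math.exp(-lam * (T + F))) / (lam * T)

def n_patients(d, lam, T, F):
    """Total number of patients 2N such that 2N * P(event) = d."""
    return math.ceil(d / p_event(lam, T, F))

lam = math.log(2) / 3.0      # assumed average hazard: median survival 3 years
print(n_patients(d=191, lam=lam, T=2.0, F=3.0))   # about 319 patients in total
```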
This step is often performed on an ad hoc
basis, so caution is warranted, also in
written documents such as the statistical
part of a protocol. Formulations should be
in the spirit of the following example:
“For a power of 80% to detect a 50%
improvement in survival, using a two-sided
test at α = 0.05, the study needs 191 events.
We ANTICIPATE that this will require the
enrollment of ….”
Such formulations are not popular but one
has to be as close to the truth as possible.
Dose response
• In this type of study we are interested in whether some doses
exhibit some good properties vs. placebo. The null hypothesis is of
the form H0: μ0 = μ1 = … = μk.
• The alternative hypothesis can take many different forms, for
example Ha: μ0 < μ1 < … < μk. Different cases of interest can be
visualized as in the figures on the next slide. One possibility is to use
linear contrasts:
  Ha: Σ_{j=1}^{k} c_j μ_j > 0,   with Σ_{j=1}^{k} c_j = 0,

based on the estimated contrast

  Ĉ = Σ_{j=1}^{k} c_j Ȳ_j,   which estimates C = Σ_{j=1}^{k} c_j μ_j.
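For a single pre-specified contrast, a sample size sketch is given below; it is my own function, assuming known σ and a one-sided test of the contrast, and the coefficients and means in the example are arbitrary.

```python
# Minimal sketch: n per group to detect a linear contrast across dose groups.
import math
import numpy as np
from scipy.stats import norm

def n_per_group_contrast(c, mu, sigma, alpha=0.05, power=0.80):
    """n per group so that C = sum(c_j * mu_j) is detected with given power."""
    c, mu = np.asarray(c, float), np.asarray(mu, float)
    assert abs(c.sum()) < 1e-12, "contrast coefficients must sum to zero"
    effect = float(c @ mu)
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return math.ceil(z ** 2 * sigma ** 2 * float(c @ c) / effect ** 2)

# Linear trend contrast over placebo and three doses, with assumed means:
print(n_per_group_contrast(c=[-3, -1, 1, 3], mu=[0.0, 0.5, 1.0, 1.5], sigma=2.0))
```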
Crossover Designs
• In this type of study we are interested in
testing a null hypothesis of the form
– H0: μT = μP against
– Ha: μT ≠ μP.
[Figure: 2×2 crossover design. One arm receives Placebo in period j=1 and
Drug in period j=2; the other arm receives Drug in period j=1 and Placebo in
period j=2. The cell mean for arm k in period j is denoted T_kj
(T11, T12, T21, T22).]
A general model we can use for the response
of the ith individual in the kth arm at the jth period
is

  Y_ikj = T_kj + S_ik + e_ikj,

where S and e are independent random
effects with means zero and standard
deviations σ_S and σ_e. The above hypotheses
can be tested using the statistic T_d, which can
also be used to determine the power of the
test against an alternative of the form μT − μP = Δ.
Assuming equal sample sizes in both arms,
and that σ_d can be estimated from old studies,
we have the sample size formula (*),
where Δ stands for the clinically meaningful
difference. (*) can now be used to determine n.
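Since the formula (*) itself is not reproduced in this transcript, the sketch below uses a standard 2×2 crossover expression as a stand-in (my assumption, not necessarily the slide's exact form): with σ_e the within-subject error SD, n per sequence ≈ (z_{α/2} + z_β)² σ_e² / Δ².

```python
# Minimal sketch of a standard 2x2 crossover sample size (normal approximation).
import math
from scipy.stats import norm

def n_per_sequence_crossover(sigma_e, delta, alpha=0.05, power=0.80):
    """n per sequence to detect a difference delta with within-subject SD sigma_e."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil((z * sigma_e / delta) ** 2)

print(n_per_sequence_crossover(sigma_e=1.5, delta=1.0))   # 18 per sequence here
```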
Interval Hypothesis for equivalence
• If we reject the null hypothesis as formulated
above, we conclude that the treatments are non-equivalent. In case we cannot reject the null
hypothesis, can we say that the treatments are
equivalent? The answer is NO! To deal with this
we formulate non-equivalence as our null
hypothesis and try to reject it. This leads to
the following TWO ONE-SIDED TESTS (TOST)
formulation:
– H01: μT − μP ≤ θL
– against
– Ha1: μT − μP > θL
– H02: μT − μP ≥ θU
– against
– Ha2: μT − μP < θU
• where θL and θU are clinically meaningful
limits of equivalence.
  θL < μT − μP < θU   (equivalence region)
• Schuirmann’s test
• Reject H0 (non-equivalence) if both one-sided
tests reject at level α.
• This can be worked out by hand, but it is easier to use
some software.
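Schuirmann's exact rejection rule is not reproduced above, so the following is a generic TOST sketch using two one-sided t-tests; the data, limits, and names are illustrative only.

```python
# Minimal TOST sketch: declare equivalence if BOTH one-sided tests reject.
import numpy as np
from scipy import stats

def tost_two_sample(x_t, x_p, theta_l, theta_u, alpha=0.05):
    x_t, x_p = np.asarray(x_t, float), np.asarray(x_p, float)
    n_t, n_p = len(x_t), len(x_p)
    diff = x_t.mean() - x_p.mean()
    s2 = (((n_t - 1) * x_t.var(ddof=1) + (n_p - 1) * x_p.var(ddof=1))
          / (n_t + n_p - 2))                       # pooled variance
    se, df = np.sqrt(s2 * (1 / n_t + 1 / n_p)), n_t + n_p - 2
    p_low = stats.t.sf((diff - theta_l) / se, df)  # H0: diff <= theta_L
    p_up = stats.t.cdf((diff - theta_u) / se, df)  # H0: diff >= theta_U
    return max(p_low, p_up) < alpha                # True => conclude equivalence

rng = np.random.default_rng(1)
print(tost_two_sample(rng.normal(0.1, 1, 60), rng.normal(0.0, 1, 60),
                      theta_l=-0.5, theta_u=0.5))
```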
Software
• For common situations, software is available
• Software available for purchase
– NQuery
– StudySize
– PASS
– Power and Precision
– etc.
• Free software available online:
• http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
• http://calculators.stat.ucla.edu
• http://hedwig.mgh.harvard.edu/sample_size/size.html
Questions or Comments?