The Role of the Statistician in
Clinical Research Teams
Elizabeth Garrett-Mayer, Ph.D.
Division of Biostatistics
Department of Oncology
Statistics
• Statistics is the art/science of summarizing data
• Better yet…summarizing data so that non-statisticians
can understand it
• Clinical investigations usually involve collecting a lot of
data.
• But, at the end of your trial, what you really want is a
“punch-line:”
– Did the new treatment work?
– Are the two groups being compared the same or different?
– Is the new method more precise than the old method?
• Statistical inference is the answer!
Do you need a statistician as part
of your clinical research team?
• YES!
• Simplest reasons: s/he will help to
optimize
– Design
– Analysis
– Interpretation of results
– Conclusions
What if I already know how to calculate
sample size and perform a t-test?
• Statisticians might know a better approach
• Trained more formally in design options
• More “bang for your buck”
• Tend to be less biased
• Adds credibility to your application
• Use resources that are available to you
Different Roles
• Very collaborative
– Active co-investigator
– Helps develop aims and design
– Brought in early in planning
– Continues to provide input throughout trial planning and while the study continues
• Consultants
– Inactive co-investigator
– Often not brought in until either
• You need a sample size calculation several days before
submission
• Trial has been criticized/rejected for lack of statistical input
• You’ve finally collected all of the data and don’t know what to do
next.
– Only involved sparsely for planning or for analysis.
Find a statistician early
• Your trial can only benefit from inclusion of a
statistician
• Statisticians cannot rescue a poorly designed
trial after the trial has begun.
• “Statistical adjustment” in analysis plan does not
always work.
• Ignorance is not bliss:
– Some clinical investigators are trained in statistics
– But usually not all aspects!
– Despite inclination to choose a particular design or
analysis method, there might be better ways.
Example: Breast Cancer
Prevention in Mice
• NOT a clinical study (thank goodness!)
• Mouse study comparing 4 types of treatment: A intra-ductal, B intra-ductal, A intravenous, and no treatment
• Her-2/neu Transgenic Mouse
• Prevention: which treatment is associated with the longest time to tumor formation?
Example: Breast Cancer
Prevention in Mice
• Each oval represents a mouse and dots
represent ducts
• Lab researchers developed design
• Varying numbers of mice for each of four
treatment groups
• Multiple dose levels, not explicitly shown here
[Figure: schematic of ducts within each mouse for the four groups: A intra-ductal, B intra-ductal, no treatment, A intravenous]
How to analyze?
• Researchers had created their own
survival curves.
• Wanted “p-values”
• Some design issues:
– What is the unit of analysis?
– How do we handle the imbalance?
– Treatment B has no “control” side
Our approach
• Statistical adjustments used
• Treat “side of mouse” as unit of analysis
– Cannot use mouse: multiple treatments per mouse
in many cases. Too many categories (once dose was
accounted for)
– Cannot use duct: hard to model the “dependence”
within side and within mice.
– Still need to include dependence within mice.
• Need to adjust for treatment on “other side”
– Problems due to imbalance of doses
– Could only adjust for “active” versus “inactive”
treatment on other side.
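For concreteness, here is a minimal sketch of one plausible way to implement the approach just described, with “side of mouse” as the unit of analysis and robust standard errors clustered by mouse. The data, effect sizes, column names, and use of a Cox model via the lifelines package are all illustrative assumptions, not the analysis that was actually performed.

```python
# Minimal sketch (not the study's actual analysis) of a time-to-tumor model with
# "side of mouse" as the unit of analysis. Data, effect sizes, and column names
# are invented; the lifelines package is assumed to be installed.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
rows = []
for mouse in range(40):
    # each side of a mouse receives one of the four arms (a simplifying assumption)
    sides = rng.choice(["A intra-ductal", "B intra-ductal", "A intravenous", "none"],
                       size=2, replace=False)
    for s, trt in enumerate(sides):
        rate = 0.02 if trt == "none" else 0.01        # toy effect: active arms delay tumors
        t = rng.exponential(1.0 / rate)
        rows.append({
            "mouse_id": mouse,
            "time": min(t, 120.0),                    # follow-up capped at day 120
            "tumor": int(t <= 120.0),
            "active": int(trt != "none"),
            "other_active": int(sides[1 - s] != "none"),  # 'active vs inactive' on the other side
        })
df = pd.DataFrame(rows)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="tumor",
        cluster_col="mouse_id")  # robust (sandwich) SEs: the two sides of a mouse are correlated
cph.print_summary()
```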
Our approach
• We “rescued” this study!
• Not optimal, but “best” given the design
• Adjustments might not be appropriate
• We “modeled” the data: we made assumptions
• We could not implement all the adjustments we would have liked to.
• Better approach?
– Start with a good design!
– We would have suggested balance
– Would NOT have given multiple treatments within
mice.
Statisticians: Specific Responsibilities
• Design
– Choose most efficient design
– Consider all aims of the study
– Particular designs that might be useful
• Cross-over
• Pre-post
• Factorial
– Sample size considerations
– Interim monitoring plan
Statisticians: Specific Responsibilities
• Assistance in endpoint selection
– Subjective vs. objective
– Measurement issues
• Is there measurement error that should be
considered?
• What if you are measuring pain? QOL?
– Multiple endpoints (e.g. safety AND efficacy)
– Patient benefit versus biologic/PK endpoint
– Primary versus secondary
– Continuous versus categorical outcomes
Statisticians: Specific Responsibilities
• Analysis Plan
– Statistical method for EACH aim
– Account for type I and type II errors
– Stratifications or adjustments are included if
necessary
– Simpler is often better
– Loss to follow-up: missing data?
Sample Size and Power
• The most common reason we get contacted
• Sample size is contingent on design, analysis plan, and
outcome
• With the wrong sample size, you will either
– Not be able to make conclusions because the study is
“underpowered”
– Waste time and money because your study is larger than it
needed to be to answer the question of interest
• And, with wrong sample size, you might have problems
interpreting your result:
– Did I not find a significant result because the treatment does not
work, or because my sample size is too small?
– Did the treatment REALLY work, or is the effect I saw too small
to warrant further consideration of this treatment?
– This is an issue of CLINICAL versus STATISTICAL significance
Sample Size and Power
• Sample size ALWAYS requires the investigator
to make some assumptions
– How much better do you expect the experimental
therapy group to perform than the standard therapy
groups?
– How much variability do we expect in measurements?
– What would be a clinically relevant improvement?
• The statistician CANNOT tell you what these
numbers should be (unless you provide data)
• It is the responsibility of the clinical investigator
to define these parameters
Sample Size and Power
• Review of power
– Power = The probability of concluding that the new
treatment is effective if it truly is effective
– Type I error = The probability of concluding that the
new treatment is effective if it truly is NOT effective
– (Type I error = alpha level of the test)
– (Type II error = 1 – power)
• When your study is too small, it is hard to
conclude that your treatment is effective
Example: sample size in multiple
myeloma study
• Phase II study in multiple myeloma patients (Borello)
• Primary Aim: Assess clinical efficacy of activated marrow
infiltrating lymphocytes (aMILs) + GM-CSF-based tumor
vaccines in the autologous transplant setting for patients
with multiple myeloma.
• If the clinical response rate is ≤ 0.25, then the
investigators are not interested in pursuing aMILs +
vaccines further.
• If the clinical response rate is ≥ 0.40, they would be
interested in pursuing further.
Example: sample size in multiple
myeloma study
• H0: p = 0.25
• H1: p = 0.40
• We want to know what sample size we need to
have large power and small type I error.
– If the treatment DOES work, then we want to have a
high probability of concluding that H1 is “true.”
– If the treatment DOES NOT work, then we want a low
probability of concluding that H1 is “true.”
Sample size = 10; Power = 0.17
[Figure: binomial distributions of the proportion of responders under H0 (p = 0.25) and H1 (p = 0.40); the vertical line defines the “rejection region”]
Sample size = 30; Power = 0.48
[Figure: same distributions under H0 (p = 0.25) and H1 (p = 0.40) for n = 30; the vertical line defines the “rejection region”]
Sample size = 75; Power = 0.79
[Figure: same distributions under H0 (p = 0.25) and H1 (p = 0.40) for n = 75; the vertical line defines the “rejection region”]
Sample size = 120; Power = 0.92
[Figure: same distributions under H0 (p = 0.25) and H1 (p = 0.40) for n = 120; the vertical line defines the “rejection region”]
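As a rough illustration of how these power numbers can be obtained, here is a minimal sketch of an exact single-arm binomial power calculation for H0: p = 0.25 versus H1: p = 0.40. The slides do not state the alpha level used, so the one-sided 0.05 below is an assumption, and the resulting powers may differ slightly from those shown.

```python
# Exact binomial power sketch for the single-arm design above.
# Assumption: one-sided alpha = 0.05 (not stated on the slides).
from scipy.stats import binom

def exact_power(n, p0=0.25, p1=0.40, alpha=0.05):
    # smallest number of responders k such that P(X >= k | p0) <= alpha
    k = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
    type1 = binom.sf(k - 1, n, p0)   # achieved (exact) type I error
    power = binom.sf(k - 1, n, p1)   # P(X >= k | p1)
    return k, type1, power

for n in (10, 30, 75, 120):
    k, a, pw = exact_power(n)
    print(f"n={n:3d}: reject if >= {k} responses; alpha={a:.3f}, power={pw:.2f}")
```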
Not always so easy
• More complex designs require more
complex calculations
• Usually also require more assumptions
• Examples:
– Longitudinal studies
– Cross-over studies
– Correlation of outcomes
• Often, “simulations” are required to get a
sample size estimate.
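The simulation recipe itself is simple: generate many trials under the assumed alternative, run the planned analysis on each, and report the fraction that reject H0. The sketch below applies this to a toy paired comparison; the effect size, standard deviation, and within-subject correlation are illustrative assumptions, not values from the talk.

```python
# Simulation-based power sketch: simulate, analyze, count rejections.
# Effect size, SD, and correlation below are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_rel

def simulated_power(n_subjects, effect=0.5, sd=1.0, rho=0.6,
                    alpha=0.05, n_sim=2000, seed=1):
    rng = np.random.default_rng(seed)
    cov = sd**2 * np.array([[1, rho], [rho, 1]])   # correlated paired measurements
    rejections = 0
    for _ in range(n_sim):
        pairs = rng.multivariate_normal([0.0, effect], cov, size=n_subjects)
        # planned analysis: paired t-test on the two conditions
        p = ttest_rel(pairs[:, 1], pairs[:, 0]).pvalue
        rejections += p < alpha
    return rejections / n_sim

for n in (10, 20, 40):
    print(n, simulated_power(n))
```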
Most common problems seen in study
proposals when a statistician is not involved
• Outcomes are not clearly defined
• There is not an analysis plan for secondary aims
of the study
• Sample size calculation is too simplistic or
absent
• Assumptions of statistical methods are not
appropriate
Examples:
trials with additional statistical needs
• Major clinical trials (e.g. Phase III studies)
• Continual reassessment method (CRM) studies
• Longitudinal studies
• Study of natural history of disease/disorder
• Studies with ‘non-random’ missing data
Major trials
• Major trials usually are monitored periodically for safety
and ethical concerns.
• Monitoring board: Data (Safety and) Monitoring
Committee (DMC or DSMC)
• In these trials, ideally you would have three statisticians
(Pocock, 2004, Statistics in Medicine)
– Study statistician
– DMC statistician
– Independent statistician
• Why?
– Interim analyses require “unbiased” analysis and interpretation of
study data.
• Industry- versus investigator-initiated trials …differences?
Study Statistician
• Overall statistical responsibility
• Actively engaged in design, conduct, final
analysis
• Not involved in interim analyses
• Want them to remain ‘blinded’ until the
study is complete
DMC Statistician
• Experienced trialist
• Evaluate interim results
• Decide (along with rest of DMC) whether
trial continues
• No conflict of interest
Independent Statistician
• Performs interim analysis
• Writes report of interim analysis
• No conflict of interest
• Only person to have full access to “unblinded” data until trial completion
High-maintenance trials
• Some trials require statistical decision-making
during the trial
• Simon “two-stage” design:
– Stage 1: Treat about half the patients.
– Stage 2: If efficacy at stage 1 meets some standard,
then enroll the remainder of patients
• “Adaptive” and “Sequential” trials: final sample
size is determined somewhere in the middle of
the trial
• Continual Reassessment Method…
Continual Reassessment Method
• Phase I trial design
• “Standard” Phase I trials (in oncology) use what is often
called the ‘3+3’ design
Treat 3 patients at dose K
1. If 0 patients experience dose-limiting toxicity (DLT), escalate to dose K+1
2. If 2 or more patients experience DLT, de-escalate to level K-1
3. If 1 patient experiences DLT, treat 3 more patients at dose level K
A. If 1 of 6 experiences DLT, escalate to dose level K+1
B. If 2 or more of 6 experience DLT, de-escalate to level K-1
• Maximum tolerated dose (MTD) is considered highest
dose at which 1 or 0 out of six patients experiences DLT.
• Doses need to be pre-specified
• Confidence in MTD is usually poor.
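The escalation rule above is purely algorithmic, so it can be written down directly. Below is a small, hypothetical sketch of that decision logic; the function name and its return strings are mine, not part of any standard software.

```python
# Hypothetical sketch of the '3+3' escalation rule described above: given the
# number of DLTs seen among the 3 (or 6) patients at the current dose level,
# return the decision for the next cohort.
def three_plus_three(n_treated, n_dlt):
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate to the next dose level"
        if n_dlt == 1:
            return "treat 3 more patients at the same dose level"
        return "de-escalate one dose level"
    if n_treated == 6:
        # after expansion: at most 1 DLT of 6 allows escalation, otherwise de-escalate
        return ("escalate to the next dose level" if n_dlt <= 1
                else "de-escalate one dose level")
    raise ValueError("3+3 cohorts contain 3 or 6 patients")

print(three_plus_three(3, 0))   # escalate
print(three_plus_three(3, 1))   # expand cohort to 6
print(three_plus_three(6, 2))   # de-escalate
```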
Continual Reassessment Method
• Allows statistical modeling of optimal dose:
dose-response relationship is assumed to
behave in a certain way
• Can be based on “safety” or “efficacy” outcome
(or both).
• Design searches for best dose given a desired
toxicity or efficacy level and does so in an
efficient way.
• This design REALLY requires a statistician
throughout the trial.
• Example: Phase I/II trial of Samarium 153 in
High Risk Osteogenic Sarcoma (Schwartz)
CRM history in brief
• Originally devised by O’Quigley, Pepe and
Fisher (1990) where dose for next patient was
determined based on responses of patients
previously treated in the trial
• Due to safety concerns, several authors
developed variants
– Modified CRM (Goodman et al. 1995)
– Extended CRM [2 stage] (Moller, 1995)
– Restricted CRM (Moller, 1995)
– and others….
Basic Idea of CRM
$$p(\text{toxicity} \mid \text{dose} = d_i) \;=\; \frac{\exp(3 + \beta\, d_i)}{1 + \exp(3 + \beta\, d_i)}$$
Modified CRM
(Goodman, Zahurak, and Piantadosi, Statistics in Medicine, 1995)
• Carry-overs from standard
CRM
– Mathematical dose-toxicity
model must be assumed
– To do this, need to think about
the dose-response curve and
get preliminary model.
– We CHOOSE the level of
toxicity that we desire for
the MTD (p = 0.30)
– At end of trial, we can
estimate dose response curve.
– ‘prior distribution’
(mathematical subtlety)
$$p(\text{toxicity} \mid \text{dose} = d_i) \;=\; \frac{\exp(3 + \beta\, d_i)}{1 + \exp(3 + \beta\, d_i)}$$
Modified CRM by
Goodman, Zahurak, and Piantadosi
(Statistics in Medicine, 1995)
• Modifications by Goodman et al.
– Use ‘standard’ dose escalation model until first toxicity is
observed:
• Choose cohort sizes of 1, 2, or 3
• Use standard ‘3+3’ design (or, in this case, ‘2+2’)
– Upon first toxicity, fit the dose-response model using
observed data
• Estimate β
• Find dose that is closest to toxicity of 0.3.
– Does not allow escalation to increase by more than one dose
level.
– De-escalation can occur by more than one dose level.
– Dose levels are discrete: need to round to closest level
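To make the fitting step concrete, here is a minimal sketch of one CRM-style update under the dose-toxicity model above: estimate β from the data observed so far, compute the estimated toxicity at each pre-specified dose level, and choose the level closest to the target of 0.30, without escalating by more than one level. The standardized dose values are invented for illustration, and the fit uses plain maximum likelihood rather than the prior-based estimation of the published method, so it will not reproduce the numbers on the following slides.

```python
# Sketch of one dose-finding update under the assumed model
# p(tox | d_i) = exp(3 + b*d_i) / (1 + exp(3 + b*d_i)).
# Standardized doses and data below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

dose_levels = np.array([-6.0, -5.0, -4.0, -3.0])   # hypothetical standardized doses
target_tox = 0.30                                  # desired toxicity level at the MTD

def p_tox(b, d):
    return 1.0 / (1.0 + np.exp(-(3.0 + b * d)))

def neg_log_lik(b, doses, dlts):
    p = np.clip(p_tox(b, doses), 1e-12, 1 - 1e-12)
    return -np.sum(dlts * np.log(p) + (1 - dlts) * np.log(1 - p))

# data so far: (standardized dose given, DLT yes/no) for each patient
doses = np.array([-6.0, -6.0, -5.0, -5.0])
dlts  = np.array([0, 0, 0, 1])

fit = minimize_scalar(neg_log_lik, args=(doses, dlts),
                      bounds=(0.01, 5.0), method="bounded")
b_hat = fit.x
est_tox = p_tox(b_hat, dose_levels)
next_level = int(np.argmin(np.abs(est_tox - target_tox)))

# restrict escalation to at most one level above the highest level used so far
highest_used = int(np.max(np.searchsorted(dose_levels, doses)))
next_level = min(next_level, highest_used + 1)

print(f"b_hat = {b_hat:.2f}, estimated toxicities = {np.round(est_tox, 2)}, "
      f"next dose level = {next_level + 1}")
```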
Example Samarium study with cohorts of size 2:
2 patients treated at dose 1 with 0 toxicities
2 patients treated at dose 2 with 1 toxicity
Fit CRM using the equation below:
$$p(\text{toxicity} \mid \text{dose} = d_i) \;=\; \frac{\exp(3 + \beta\, d_i)}{1 + \exp(3 + \beta\, d_i)}$$
• Estimated β = 0.77
• Estimated dose for
next patient is 2.0
• Use dose level 2
for next cohort.
Example Samarium study with cohorts of size 2:
2 patients treated at dose 1 with no toxicities
4 patients treated at dose 2 with 1 toxicity
 Fit CRM using equation on earlier slide
• Estimated β = 0.88
• Estimated dose for next
patient is 2.54
• Round up to dose
level 3 for next cohort.
Example Samarium study with cohorts of size 2:
2 patients treated at dose 1 with no toxicities
4 patients treated at dose 2 with 1 toxicity
2 patients treated at dose 3 with 1 toxicity
 Fit CRM using equation on earlier slide
• Estimated β = 0.85
• Estimated dose for next
patient is 2.47
• Use dose level 2
for next cohort.
Example Samarium study with cohorts of size 2:
2 patients treated at dose 1 with no toxicities
6 patients treated at dose 2 with 1 toxicity
2 patients treated at dose 3 with 1 toxicity
 Fit CRM using equation on earlier slide
• Estimated β = 0.91
• Estimated dose for next
patient is 2.8
• Use dose level 3
for next cohort.
• Etc.
Longitudinal Studies
• Multiple observations per individual over
time
• Sample size calculations are HARD
• But, analysis is also complex
• Standard assumption of “basic” statistical
methods and models is that observations
are independent
• With longitudinal data, we have
“correlated” measures within individuals
Study of Autism in Young Children (Landa)
• Autism usually diagnosed at age 3.
• But, there is evidence that there are earlier symptoms
that are indicative of autism
• Children at high risk of autism (kids with older siblings with autism) and “controls” were assessed for symptoms at 6 months, 14 months, and 24 months (Mullen scales)
• Prospective study
• Children were diagnosed at 36 months and classified into three groups:
– ASD (autism-spectrum disorder)
– LD (learning disabled)
– Unaffected
• Earlier symptoms were compared to see if certain
symptoms could predict diagnosis.
Mullen Subscales              ASD (n=23)       LD (n=11)        Unaffected (n=53)
                              Mean (SD)        Mean (SD)        Mean (SD)
6 Months
  Gross Motor                 51.50 (7.69)     53.86 (6.31)     49.35 (10.35)
  Visual Reception            52.58 (8.94)     47.43 (10.75)    55.18 (10.66)
  Fine Motor                  46.92 (11.86)    36.86 (7.45)     49.54 (10.76)
  Receptive Language          50.75 (8.21)     44.86 (8.95)     50.18 (7.26)
  Expressive Language         47.25 (10.14)    44.57 (4.57)     43.98 (6.70)
  Early Learning Composite    100.67 (12.75)   87.29 (9.16)     99.59 (10.10)
14 Months
  Gross Motor                 46.91 (12.34)    52.91 (10.38)    58.16 (10.52)
  Visual Reception            48.39 (10.95)    51.00 (9.32)     54.73 (9.03)
  Fine Motor                  50.48 (10.44)    52.82 (9.40)     57.41 (7.28)
  Receptive Language          34.70 (13.38)    39.64 (5.95)     52.59 (12.26)
  Expressive Language         39.04 (15.14)    47.00 (7.64)     52.02 (11.33)
  Early Learning Composite    87.39 (19.97)    95.55 (10.68)    108.27 (13.89)
24 Months
  Gross Motor                 35.43 (8.69)     49.18 (11.04)    52.20 (10.97)
  Visual Reception            43.26 (10.98)    48.91 (11.59)    56.73 (10.48)
  Fine Motor                  36.04 (14.17)    48.91 (7.97)     52.78 (11.07)
  Receptive Language          35.74 (15.25)    42.73 (11.31)    59.22 (10.74)
  Expressive Language         36.65 (15.31)    45.27 (12.03)    60.14 (12.15)
  Early Learning Composite    78.43 (21.68)    93.73 (14.86)    114.98 (15.89)
Statistical modeling can help!
• Previous table was hard to make conclusions
from
• Each time point was analyzed separately, and
within time points, groups were compared.
• Use some reasonable assumptions to help
interpretation
– Kids have “growth trajectories” that are continuous
and smooth
– Observations from within the same child are
correlated.
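As one way to encode those assumptions, here is a minimal sketch of a mixed-effects (random-intercept) model on long-format data: group-specific trajectories over age, with a per-child random intercept so that repeated scores within a child are correlated. The column names and the synthetic data are invented for illustration; this is not the study's analysis code.

```python
# Sketch of a longitudinal model in the spirit described above, on synthetic data.
# Column names (child_id, age_months, group, score) are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
rows = []
for child in range(60):
    group = rng.choice(["ASD", "LD", "Unaffected"])
    level = 50 + rng.normal(0, 5)                                     # child-specific level
    slope = {"ASD": -0.5, "LD": 0.0, "Unaffected": 0.4}[group] + rng.normal(0, 0.2)
    for age in (6, 14, 24):                                           # the three study visits
        rows.append({"child_id": child, "age_months": age, "group": group,
                     "score": level + slope * age + rng.normal(0, 3)})
df = pd.DataFrame(rows)

model = smf.mixedlm("score ~ age_months * group",   # group-specific growth trajectories
                    data=df,
                    groups=df["child_id"])          # random intercept: correlation within child
print(model.fit().summary())
```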
Conclusions are much easier
• The models that were used to make the graphs are somewhat complicated
• But, they are “behind the scenes”
• The important information is presented clearly and succinctly
• ANOVA approach does not “summarize”
data
Concluding Remarks
• Get your statistician involved as soon as you
begin to plan your study
• Things statisticians do not like:
– Being contacted several days before
grant/protocol/proposal is due
– Rewriting inappropriate statistical sections
– Analyzing data that has arisen from a poorly designed
trial
• Statisticians have a lot to add
– “fresh” perspective on your study
– Study will be more efficient!