An Overview of Meta-analysis in
Drug Safety Assessments
Jesse A. Berlin, ScD
Johnson & Johnson Pharmaceutical Research
and Development
DIA – FDA – PhRMA Drug Safety Conference
October 2008
Arlington, VA / Oct 15, 2008
2
The Obligatory Disclaimer
The views expressed herein represent those of the
presenter and do not necessarily represent the
views or practices of the presenter’s employer or
any other party.
3
Outline
 Recommendations for the use of meta-analysis
for safety assessment during product
development: methodologic questions
 Case studies of the use of historical randomized
trial data to address potential safety concerns
(including observational studies)
 Emphasis on exploration of patient-level
characteristics as potential effect modifiers
– Some more methodologic “heads up”
4
What is Meta-analysis?
 An optional component of a systematic review
 Definition:
“the statistical analysis of a large collection of
analysis results from individual studies for the
purpose of integrating the findings”
Glass (1976)

meta: ‘after’, ‘above’, ‘transcending’
5
Is it sampling variability?
 Problem: How do we distinguish sampling
variability from “real” variability (possibly)
associated with different effects of treatment in
different subgroups of patients (or with different
dosing algorithms or other specific aspects of
treatment)?
6
Why do a meta-analysis?
 To increase power and precision
– detect effect as statistically significant;
narrower CIs
 To quantify effect sizes and their uncertainty
– reduce problems of interpretation due to
sampling variation
 To systematically assess the overall findings
from a body of literature
– Reduce the tendency to focus only on results
that support prior beliefs
 To answer questions not posed by the individual studies
– Study-level factors (e.g., double-blind vs. open-label)
– Patient-level factors
7
IV streptokinase for acute MI (3-month mortality)
[Figure: forest plot of estimates with 95% confidence intervals; risk ratio axis on a log scale from 0.01 to 10]
Combined: RR = 0.79 (95% CI 0.72, 0.87)
8
Estimated OR for IHD events by extent of serum
cholesterol reduction
(from Thompson, SMMR 1993; 2:173-192)
Guidelines from the Safety Planning,
Evaluation, and Reporting Team (SPERT)
white paper
 Multi-company PhRMA committee with goal of
recommending an industry-wide standard for
safety planning, data collection, evaluation, and
reporting
 A few selected items for your consideration for
what to do during development
9
10
SPERT Recommendations (1)
 PRINCIPLES:
 Safety questions can be investigated by
aggregating the cumulative safety data on an
ongoing basis to obtain a single estimate of
treatment effect for individual safety parameters
 We recommend that sponsors develop a Program
Safety Analysis Plan (PSAP) as a tool to
proactively plan for meta-analysis of the program
safety data.
11
SPERT Recommendations (2)
 Specify important adverse events prior to
commencing pivotal clinical trials
 This facilitates subsequent integration and
interpretation of data by collecting important data
in a standard fashion in all relevant studies.
12
Program Safety Analysis Plan (PSAP)
 Second section focuses on analyses
– Those to be analyzed using formal inferential
statistics (Tier 1 events: specified a priori)
– Statistical and graphical methodologies
– Should address missing values, multiplicity,
analysis population, etc., much like a single-trial statistical analysis plan does.
 POINT: Make safety analysis plans look more like
efficacy analysis plans than they have in the past
13
Analytical considerations
 Power and precision considerations for the contemplated
pooled/meta-analyses (including subgroup analyses)
– Traditional hypothesis testing versus “ruling out” an
increase in risk of a certain size (like “non-inferiority”)
 Are dedicated clinical safety studies needed to address
specific safety endpoints?
 The PSAP should be discussed with the regulatory
authorities at an agreed-upon milestone (e.g., end-of-Phase
II meeting)
– Therefore the first version of the analysis plan should be
completed prior to this meeting!
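The “ruling out” framing can be turned into a rough event-count target. The sketch below uses the standard normal approximation for a log relative risk. It assumes 1:1 allocation, a true relative risk of 1, a one-sided alpha of 0.025 and 90% power. The function name `events_to_rule_out` and the margins shown are illustrative and do not come from the SPERT paper.

```python
from math import ceil, log
from statistics import NormalDist


def events_to_rule_out(margin, power=0.90, one_sided_alpha=0.025):
    """Approximate total number of events needed so that the upper confidence
    limit for a relative risk falls below `margin` when the true relative
    risk is 1 (1:1 allocation, normal approximation on the log scale)."""
    z_alpha = NormalDist().inv_cdf(1 - one_sided_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(margin) ** 2)


# Smaller margins require far more events across the pooled program.
for margin in (1.3, 1.8, 2.0):
    print(f"rule out RR >= {margin}: about {events_to_rule_out(margin)} events")
```

The required number of events grows quickly as the margin tightens, which is why the power of the pooled analysis belongs in the plan from the start.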
14
Planning meta-analyses (you will hear this
again)
 ICH E9 guideline states that meta-analyses should
be prospectively planned with the clinical trials
program in the development of a new treatment
 Not just planning the logistics, but planning the
scientific questions to be addressed (Berlin and
Colditz, JAMA 1999; 281:830-834)
– Standardization of definitions of endpoints
– Standardization of data collection to allow
combination of results across all studies in the
development program.
– “Meta-design” considerations
15
Meta-experimental design
 Plan and control variation in the different factors in a
systematic manner.
– Like a factorial experiment or a single randomized
trial with stratified randomization
 Better to conduct 2 studies, each including both men
and women, and to stratify (either in the randomization
or post hoc) by sex, rather than to do one study in
men and a separate study in women.
– Separate studies by sex confound sex with “study”
– Might be further confounded by different doses
 POINT: Think about what the “meta design space” will
look like when you’re done
16
What Happens in Practice?
 Whether we’re doing the meta-analysis before or
after approval, we need to think about how to
address “heterogeneity”
 (a recurring theme for today)
17
What is heterogeneity?
Clinical heterogeneity
 Participants
– e.g., conditions under investigation, eligibility criteria
for trials, geographical variation
 Interventions
– e.g., intensity / dose / duration, sub-type of drug,
mode of administration, experience of practitioners,
nature of the control (placebo/none/standard care)
 Outcomes
– e.g., follow-up duration, ways of measuring,
cut-off points on scales
18
What is heterogeneity?
Methodologic heterogeneity
 Design
– e.g., randomized vs. non-randomized,
crossover vs. parallel group vs. cluster randomized,
length
 Conduct
– e.g., allocation concealment, blinding (masking) of
subjects, treating physicians, outcome evaluation, etc.,
approach to analysis (intent-to-treat vs. “completers”)
19
What is heterogeneity?
Statistical heterogeneity
 Common views
– Variation in the results of studies
– More variation than would be expected by chance
 In truth:
– Variation in the true effects underlying the studies
– that may manifest itself in more observed variation
than expected by chance
– may be due to different treatment effects or different
biases
 Is statistical heterogeneity inevitable?
20
Identifying heterogeneity
How do we tell whether statistical variation among (between)
results is due to chance or real differences?
 Eyeballing
– a graphical inspection of the results is usually
the first step
– a lack of overlap in confidence intervals
indicates heterogeneity (but overlap does not
imply absence of heterogeneity)
21
[Figure: two forest plots of estimates with 95% confidence intervals.
Left: Opioids for breathlessness; standardized mean difference axis from -2 to 2; negative values favor opioid, positive values favor placebo.
Right: Early light reduction (LR) for ROP; risk ratio axis from 0.2 to 5; values below 1 favor LR, values above 1 favor control.]
22
Identifying heterogeneity
 Statistical test
– A chi-squared (χ²) test (Cochran’s Q)
– Has low power because there are usually very
few studies:
 i.e., test is not very good at detecting
heterogeneity when it exists
– But, has excessive power to detect clinically
unimportant heterogeneity when there are
many studies
23
Identifying heterogeneity
 Test is not asking a useful question if heterogeneity is
inevitable
 Quantify inconsistency
– based on χ² statistic, Q, and its degrees of freedom.
$$I^2 = \frac{Q - \mathrm{d.f.}}{Q} \times 100\%$$
I² describes the proportion of variability that is due to
heterogeneity as opposed to sampling error
(d.f. = degrees of freedom = the number of studies
minus 1)
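As a minimal sketch, Q and I² can be computed from study-level estimates and their variances with inverse-variance weights. The four log risk ratios and variances below are hypothetical. Truncating I² at zero when Q falls below its degrees of freedom is a common convention assumed here, not something stated on the slide.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, d.f., I-squared in percent) for study-level effect estimates
    (e.g., log risk ratios) and their within-study variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Assumed convention: report 0% when Q is smaller than its d.f.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log risk ratios and variances from four studies.
log_rr = [-0.5, -0.1, 0.3, -0.4]
var = [0.04, 0.09, 0.05, 0.02]
q, df, i2 = cochran_q_and_i2(log_rr, var)
print(f"Q = {q:.2f} on {df} d.f., I^2 = {i2:.0f}%")
```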
24
What can we do with heterogeneity?
 Ignore it
– Don’t do that! (worse yet – some people throw out the outliers)
 Check the data
– Incorrect data extraction; unit of analysis errors (e.g., with crossover trials, cluster randomized trials, counts)
 Encompass it
– Random effects meta-analysis
 Explore it
– Subgroup analysis
– Meta-regression
– Funnel plot
25
Fixed effect meta-analysis model
[Figure: each study’s result equals a single common true effect plus random error (statistical homogeneity)]
26
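Random effects meta-analysis model
[Figure: each trial-specific effect is drawn from a distribution centered on the true mean effect; each result equals its trial-specific effect plus random error]
 The width of the curve reflects the amount of heterogeneity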
27
Random effects meta-analysis
 The ‘amount’ of heterogeneity can be estimated
 Weights are adjusted to account for both within-study and
among-study variability
 Random effects analyses give
– similar results when there is no heterogeneity
– similar pooled effect, wider confidence interval when
there is ‘symmetric’ heterogeneity
– different results when there is funnel plot asymmetry –
they give more weight to the potentially biased sample
of small studies
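The weighting behind these bullets can be written out. The slide does not name an estimator, so the DerSimonian-Laird moment estimator is assumed here. The symbols $y_i$ (study estimate), $v_i$ (within-study variance), $k$ (number of studies) and $\hat\tau^2$ (among-study variance) are notation introduced for this sketch.

```latex
\begin{aligned}
w_i &= \frac{1}{v_i}, \qquad
Q = \sum_{i=1}^{k} w_i \,(y_i - \hat\theta_{FE})^2, \qquad
\hat\theta_{FE} = \frac{\sum_i w_i y_i}{\sum_i w_i} \\
\hat\tau^2 &= \max\!\left(0,\; \frac{Q - (k - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right) \\
w_i^{*} &= \frac{1}{v_i + \hat\tau^2}, \qquad
\hat\theta_{RE} = \frac{\sum_i w_i^{*} y_i}{\sum_i w_i^{*}}, \qquad
\mathrm{SE}(\hat\theta_{RE}) = \frac{1}{\sqrt{\sum_i w_i^{*}}}
\end{aligned}
```

When $\hat\tau^2 = 0$ the two sets of weights coincide. As $\hat\tau^2$ grows relative to the $v_i$, the weights $w_i^{*}$ become nearly equal, which is how small studies gain relative weight.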
28
Identical results
Early light reduction for ROP
[Figure: forest plot of estimates with 95% confidence intervals for Kennedy 1997, Locke 1952A, Lopes 1997, Reynolds 1998 and Seiberth 1994, with fixed effect and random effects summaries; risk ratio axis from 0.2 to 5; values below 1 favour LR, values above 1 favour control]
29
Slightly different results
Opioids for breathlessness
[Figure: forest plot of estimates with 95% confidence intervals for the trials Woodcock 1981, Woodcock 1982, Johnson, Eiser (A), Eiser (B), Bruera, Light, Chua, Poole, Davis, Leung and Noseda; standardised mean difference axis from -2 to 2; negative values mean opioid better, positive values mean placebo better]
Fixed effect: –0.32 (–0.43, –0.20)
Random effects: –0.31 (–0.50, –0.13)
30
Very different results
IV magnesium for acute MI (mortality)
[Figure: forest plot of estimates with 95% confidence intervals for the studies Morton, Rasmussen, Smith, Abraham, Feldstedt, Shechter 1990, Ceremuzynski, LIMIT-2, Bertschat, Singh, Pereira, Golf, Thogersen, Shechter 1995, ISIS-4 and MAGIC; risk ratio axis on a log scale from 0.01 to 10]
Fixed effect: 1.01 (0.97, 1.07)
Random effects: 0.76 (0.62, 0.92)
31
RE models can be counter-intuitive
Study 1
             Dead        Alive    Total
Treatment    30 (60%)    20       50
Control      10 (20%)    40       50
32
Counter-intuitive RE (2)
Study 2
             Dead        Alive    Total
Treatment    100 (1%)    9,900    10,000
Control      200 (2%)    9,800    10,000
33
Counter-intuitive RE (3)
 Study 1: RR = 60% / 20% = 3 (N = 100)
 Study 2: RR = 1% / 2% = 0.50 (N = 20,000)
 Heterogeneity test p-value < 0.001
 Fixed effect summary OR = 0.60 (0.48, 0.76)
 Random effects summary OR = 1.66 (0.14, 19)
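The two summaries can be reproduced from the 2x2 counts of Study 1 and Study 2. The sketch below assumes the fixed effect summary is a Mantel-Haenszel odds ratio and the random effects summary is a DerSimonian-Laird estimate on the log odds ratio scale. The slides do not say which estimators were used, but these choices agree with the reported values to two decimals.

```python
from math import exp, log, sqrt

# (deaths_treatment, alive_treatment, deaths_control, alive_control)
studies = [
    (30, 20, 10, 40),        # Study 1, N = 100
    (100, 9900, 200, 9800),  # Study 2, N = 20,000
]


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio (fixed effect)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def dersimonian_laird_or(tables):
    """DerSimonian-Laird pooled odds ratio with 95% CI, tau^2 and weights."""
    y = [log((a * d) / (b * c)) for a, b, c, d in tables]
    v = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c_term)
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se), tau2, w_star


mh = mantel_haenszel_or(studies)
re_or, re_low, re_high, tau2, w_star = dersimonian_laird_or(studies)
share = [wi / sum(w_star) for wi in w_star]
print(f"Mantel-Haenszel OR = {mh:.2f}")
print(f"Random effects OR = {re_or:.2f} ({re_low:.2f}, {re_high:.1f})")
print(f"Random effects weight shares: {share[0]:.0%} vs {share[1]:.0%}")
```

With an among-study variance near 3 on the log odds scale, the 100-patient study and the 20,000-patient study end up with almost equal weight. That is the counter-intuitive part.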
34
Examples and challenges
35
SSRIs and Suicidal Behaviors
36
Did we already know this?
 “With beginning convalescence (following initiation of
treatment with tricyclic antidepressants), the risk of suicide
once more becomes serious as retardation fades.”
– [Clinical Psychiatry, by Mayer-Gross, Slater, and Roth, 1960, p. 231]
 “While this and other mechanisms all have some
plausibility as explanations for the clinical observation of
worsening depression or suicidality in depressed patients
being treated with antidepressants, proposing a
mechanism is quite a different matter from demonstrating
empirically that there is a causal association between
antidepressant use and induction of suicidality.”
– FDA Briefing Book for PDAC, 2006
37
SSRI Methods: adjudication?
 Possibly suicide-related adverse events (PSRAEs)
were adjudicated by the sponsors using the algorithm
developed by the group at Columbia U. (K. Posner)
 Reason: “…large number of subjects (approximately
100,000) in the adult suicidality analysis, which made
impracticable more detailed adjudication of all
potentially suicidal behaviors by the FDA.”
 So – what’s the standard?
– Independent third party?
– What would be the anticipated direction of any bias related to
lack of adjudication?
38
Conclusions about Adjudication
 “A wide variety of approaches can help assure that
outcome assessment in large simple trials is clinically
relevant, accurate, and without differential
misclassification” (JB added emphasis)
 Adjudication increases cost and complexity
 “Based on the available data from cardiovascular
trials, adjudication has not been shown to improve the
ability to determine treatment effects.”
– Granger CB, Vogel V, Cummings SR, et al. Do we need to adjudicate
major clinical events? Clinical Trials 2008;5:56-60.
39
Broad versus narrow definitions
 Common view is that more sensitive definitions
– Are more “conservative” by being inclusive
– Increase power by generating more events
 Overly broad inclusion of events can lead to an underestimation
of the true relative risk
– might include events less likely to be related to the true (but
possibly unknown) mechanism of action or
– by their nature, are simply more likely to be misclassified in
clinical trials
 Implications of “non-differential” misclassification in efficacy
versus safety settings?
 (MORE LATER)
40
What endpoints (AEs) were included?
 Primary outcome: suicidal ideation or worse
(outcomes 1, 2, 3 or 4 below), also called
suicidality or suicidal behavior and ideation.
– 1. Completed suicide
– 2. Suicide attempt
– 3. Preparatory acts toward imminent suicidal
behavior
– 4. Suicidal ideation
– 5. Self-injurious behavior, intent unknown
– 6. Not enough information (Fatal)
– 7. Not enough information (Non-Fatal)
41
Statistical methods
 Aggregate-level analyses:
– Mantel-Haenszel (fixed-effect primary)
– DerSimonian-Laird
– “Double zero” studies excluded
– Single zero – continuity correction
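The zero-cell handling can be sketched as a preprocessing step before pooling. The trial counts are hypothetical, and the correction of 0.5 added to every cell is an assumed (though conventional) value. The slide does not state the size of the correction used.

```python
from math import exp, log

# Hypothetical trials: (name, events_drug, n_drug, events_placebo, n_placebo)
trials = [
    ("A", 3, 200, 1, 198),
    ("B", 0, 150, 2, 148),  # single zero: continuity correction applied
    ("C", 0, 90, 0, 95),    # double zero: excluded
    ("D", 5, 400, 4, 390),
]


def log_odds_ratios(trials, correction=0.5):
    """Return (name, log OR, variance) per trial, dropping double-zero trials
    and adding `correction` to every cell of trials with a zero cell."""
    results = []
    for name, e_t, n_t, e_c, n_c in trials:
        if e_t == 0 and e_c == 0:
            continue  # no events in either arm
        a, b, c, d = e_t, n_t - e_t, e_c, n_c - e_c
        if 0 in (a, b, c, d):
            a, b, c, d = (x + correction for x in (a, b, c, d))
        results.append((name, log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d))
    return results


for name, log_or, variance in log_odds_ratios(trials):
    print(f"{name}: OR = {exp(log_or):.2f}, var(log OR) = {variance:.2f}")
```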
42
What about those “no event” studies?
 The exclusion of trials with no events in either placebo or primary active drug arms is problematic.
 The absence of events provides some information because of the background rate of events independent of drug effect.
 Studies with no events are dropped from the likelihood for usual ratio estimates (OR, RR)
 Risk differences are perhaps more promising, although they also have their own problems (e.g., variance estimation)
 DO SENSITIVITY ANALYSES (and attend the talk later)
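One possible sensitivity analysis is a pooled risk difference, which lets “no event” trials contribute. The sketch uses the Mantel-Haenszel risk difference as an assumed choice of estimator, with hypothetical counts, and omits variance estimation (the problem the slide flags).

```python
def mantel_haenszel_rd(trials):
    """Mantel-Haenszel pooled risk difference (drug minus placebo).

    Each trial is (events_drug, n_drug, events_placebo, n_placebo).
    Trials with no events still add to the denominator."""
    num = 0.0
    den = 0.0
    for e_t, n_t, e_c, n_c in trials:
        total = n_t + n_c
        num += (e_t * n_c - e_c * n_t) / total
        den += n_t * n_c / total
    return num / den


# Hypothetical data: two trials with events, one large trial with none.
with_events = [(4, 200, 2, 200), (3, 150, 1, 150)]
no_events = [(0, 500, 0, 500)]

print(f"RD excluding no-event trial: {mantel_haenszel_rd(with_events):.4f}")
print(f"RD including no-event trial: {mantel_haenszel_rd(with_events + no_events):.4f}")
```

Including the no-event trial pulls the pooled risk difference toward zero, so the decision to exclude such trials is not neutral on this scale.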
43
Patient-level analyses
 Allow exploration of subgroups defined by
patient-level characteristics
 Ecological bias can be a problem when regressing
study result (e.g., log OR) against aggregate-level
patient characteristics (e.g., mean age, percent
male, etc.)
– Statistics in Medicine, 2002; 21:371-387
 FDA used conditional logistic regression
 NOTE: also allows proper time-to-event analyses when
appropriate
44
Analyses by age
 Young vs. Older Adults: <25, 25+
 Young, Middle-aged and Elderly: <25, 25-64, 65+
 Age by Decade: <25, 25-34, 35-44, 45-54, 55-64, 65-74, 75+
 Age by Double Decade: <25, 25-44, 45-64, 65+
 (Assessing sensitivity of results to choice of definition of
age categories)
 Could use non-linear fitting algorithms, like multivariate
restricted splines
– (e.g. Royston P, Sauerbrei W. Multivariable modeling with cubic regression
splines: A principled approach. The Stata Journal 2007;7(1):45-70)
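The four categorizations can be generated from one helper, so the same patient-level analysis can be rerun under each definition. The scheme names and the function `age_category` are hypothetical. Only the cut points come from the slide.

```python
from bisect import bisect_right

# Lower bounds of every category after the first (ages in whole years).
AGE_SCHEMES = {
    "young_vs_older": [25],
    "three_groups": [25, 65],
    "decade": [25, 35, 45, 55, 65, 75],
    "double_decade": [25, 45, 65],
}


def age_category(age, cut_points):
    """Return a label such as '<25', '25-34' or '75+' for a given age."""
    index = bisect_right(cut_points, age)
    if index == 0:
        return f"<{cut_points[0]}"
    if index == len(cut_points):
        return f"{cut_points[-1]}+"
    return f"{cut_points[index - 1]}-{cut_points[index] - 1}"


for scheme, cuts in AGE_SCHEMES.items():
    print(scheme, [age_category(age, cuts) for age in (19, 30, 64, 80)])
```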
45
Results overall
 Suicidality Risk for Active Drug relative to Placebo–
Ideation or Worse – All Adults – All Diagnoses
– 0.85 (0.71 – 1.02), p = 0.08 by conditional LR
– 0.86 (0.71 – 1.04), p = 0.12 Exact Method
 Suicide-related behavior (preparatory acts, attempts and
completed suicide)
– OR = 1.12 (95% CI, 0.79 – 1.58) by conditional logistic
regression (LOOKS DIFFERENT?)
46
Results by indication
47
Results by age
48
MA of observational studies (briefly)
 Meta-analysis of observational studies remains
controversial
– How many epidemiologists does it take to
change a light bulb?
 The point will often NOT be to produce a single
summary estimate, but to explore (presumed)
sources of heterogeneity of findings
49
50
Conclusions (1)
 Meta-analysis has valuable applications in
pharmacoepidemiology
– Evaluation of safety using existing randomized
trials
– Evaluation of safety using non-experimental
studies (need more time to show)
51
Conclusions (2)
 There are challenging methodologic issues in the
meta-analysis of safety data
– Rare events, multiplicity, adjudication, …
 Sensitivity analyses should always be performed
– Then more sensitivity analyses should always
be performed
 Use patient-level data when possible
Another example (if time permits)
53
Example: Galantamine
 Acetylcholinesterase inhibitors (AChEIs) are used as a
standard treatment for Alzheimer’s Disease (AD)
 Galantamine, an AChEI, has been extensively studied
in patients with mild to moderate AD
 Galantamine has also been studied in patients with
AD with concomitant cerebrovascular disease (CVD)
and in patients with vascular dementia (VaD).
 The benefit is to slow the progress of cognitive decline
(relative to placebo)
54
Safety “signal” for Galantamine in Mild
Cognitive Impairment
 Two 2-year randomized controlled trials
– Individuals with mild cognitive impairment
– Findings replicated in both studies
– 13 deaths versus 1 death
 Higher mortality observed in galantamine-treated
patients, compared with placebo
– Overall mortality rates were low in both groups
 The findings prompted a reevaluation in patients with
dementia
55
Galantamine Methods
 All galantamine trials (J&J or Shire-sponsored) for which
J&J could access data
 Also searched MEDLINE and the Cochrane Controlled
Trials Register (2005) Issue 4
 Trials included were independently reviewed, verified by
two readers, and met the following criteria:
– a) randomized
– b) placebo-controlled
– c) parallel group
– d) blinded
– e) at least one treatment arm with galantamine
56
Meta-analysis of survival in galantamine
randomized trials (6 months duration)
[Figure: forest plot of odds ratios (OR) with 95% CIs; odds ratio axis on a log scale from 0.001 to 100; values below 1 are in favor of galantamine, values above 1 in favor of placebo]
Deaths / total randomized (%)

Study or sub-category             Placebo          Galantamine      OR [95% CI]

AD trials up to 4 months
GAL-93-01                         0/87,   0.0%     1/198,  0.5%     1.33 [0.05; 32.95]
GAL-CAN-5                         1/66,   1.5%     0/64,   0.0%     0.34 [0.01; 8.46]
GAL-INT-2                         2/125,  1.6%     0/261,  0.0%     0.09 [0.00; 1.98]
GAL-USA-16                        0/69,   0.0%     1/70,   1.4%     3.00 [0.12; 74.92]
Subtotal, fixed trial effect                                        0.57 [0.12; 2.77]
Subtotal, random trial effect                                       0.39 [0.06; 2.34]
Breslow-Day test for homogeneity: Chisq = 5.5422, DF = 3, Pr > Chisq = 0.1361

AD trials equal to 5-6 months
GAL-95-05                         2/279,  0.7%     2/275,  0.7%     1.01 [0.14; 7.25]
GAL-INT-1                         2/215,  0.9%     2/438,  0.5%     0.49 [0.07; 3.49]
GAL-INT-10                        3/320,  0.9%     8/645,  1.2%     1.33 [0.35; 5.04]
GAL-JPN-3                         2/138,  1.4%     2/256,  0.8%     0.54 [0.07; 3.84]
GAL-USA-1                         1/213,  0.5%     3/423,  0.7%     1.51 [0.16; 14.65]
GAL-USA-10                        4/286,  1.4%     8/692,  1.2%     0.82 [0.25; 2.76]
Subtotal, fixed trial effect                                        0.90 [0.46; 1.76]
Subtotal, random trial effect                                       0.95 [0.49; 1.83]
Breslow-Day test for homogeneity: Chisq = 1.2286, DF = 5, Pr > Chisq = 0.9421

AD+CVD and VaD trials equal to 6 months
GAL-INT-26                        11/390, 2.8%     5/396,  1.3%     0.44 [0.15; 1.28]
GAL-INT-6                         5/196,  2.6%     5/396,  1.3%     0.49 [0.14; 1.71]
Subtotal, fixed trial effect                                        0.46 [0.20; 1.04]
Subtotal, random trial effect                                       0.46 [0.21; 1.01]
Breslow-Day test for homogeneity: Chisq = 0.0152, DF = 1, Pr > Chisq = 0.9019

ALL TRIALS
Fixed trial effect                                                  0.67 [0.41; 1.10]
Random trial effect                                                 0.65 [0.41; 1.05]
Breslow-Day test for homogeneity: Chisq = 8.4661, DF = 11, Pr > Chisq = 0.6710
57
Other Galantamine Analyses
 Nested case-control study of deaths was used to
investigate potential mechanism for the mortality increase
– Baseline ECG findings
– Comorbidities
– Concomitant medications
 Findings were inconclusive due to small sample size
 Mortality analyses in press (Feldman et al.; Acta
Neurologica Scandinavica)
 We are doing a large, placebo-controlled study with
mortality as the primary endpoint