An Overview of Meta-analysis in Drug Safety Assessments
Jesse A. Berlin, ScD
Johnson & Johnson Pharmaceutical Research and Development
DIA – FDA – PhRMA Drug Safety Conference, October 2008, Arlington, VA (Oct 15, 2008)

The Obligatory Disclaimer
The views expressed herein represent those of the presenter and do not necessarily represent the views or practices of the presenter’s employer or any other party.

Outline
Recommendations for the use of meta-analysis for safety assessment during product development: methodologic questions
Case studies of the use of historical randomized trial data to address potential safety concerns (including observational studies)
Emphasis on exploration of patient-level characteristics as potential effect modifiers
– Some more methodologic “heads up”

What is Meta-analysis?
An optional component of a systematic review
Definition: “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” (Glass, 1976)
meta: ‘after’, ‘above’, ‘transcending’

Is it sampling variability?
Problem: How do we distinguish sampling variability from “real” variability (possibly) associated with different effects of treatment in different subgroups of patients (or with different dosing algorithms or other specific aspects of treatment)?

Why do a meta-analysis?
To increase power and precision
– detect effect as statistically significant; narrower CIs
To quantify effect sizes and their uncertainty
– reduce problems of interpretation due to sampling variation
To systematically assess the overall findings from a body of literature
– reduce the tendency to focus only on results that support prior beliefs
To answer questions not posed by the individual studies
– Study-level factors (e.g., double-blind vs. open-label)
– Patient-level factors

[Figure: forest plot of estimates with 95% confidence intervals, IV streptokinase for acute MI (3-month mortality); x-axis: risk ratio, log scale from 0.01 to 10. Combined: RR = 0.79 (95% CI 0.72, 0.87)]

[Figure: estimated OR for IHD events by extent of serum cholesterol reduction (from Thompson, SMMR 1993; 2:173-192)]

Guidelines from the Safety Planning, Evaluation, and Reporting Team (SPERT) white paper
Multi-company PhRMA committee with the goal of recommending an industry-wide standard for safety planning, data collection, evaluation, and reporting
A few selected items for your consideration for what to do during development

SPERT Recommendations (1)
PRINCIPLES: Safety questions can be investigated by aggregating the cumulative safety data on an ongoing basis to obtain a single estimate of treatment effect for individual safety parameters
We recommend that sponsors develop a Program Safety Analysis Plan (PSAP) as a tool to proactively plan for meta-analysis of the program safety data.

SPERT Recommendations (2)
Specify important adverse events prior to commencing pivotal clinical trials
This facilitates subsequent integration and interpretation of data by collecting important data in a standard fashion in all relevant studies.

Program Safety Analysis Plan (PSAP)
Second section focuses on analyses
– Those to be analyzed using formal inferential statistics (Tier 1 events: specified a priori)
– Statistical and graphical methodologies
– Should address missing values, multiplicity, analysis population, etc., much like a single-trial statistical analysis plan does.
POINT: Make safety analysis plans look more like efficacy analysis plans than they have in the past

Analytical considerations
Power and precision considerations for the contemplated pooled/meta-analyses (including subgroup analyses)
– Traditional hypothesis testing versus “ruling out” an increase in risk of a certain size (like “non-inferiority”)
Are dedicated clinical safety studies needed to address specific safety endpoints?
The PSAP should be discussed with the regulatory authorities at an agreed-upon milestone (e.g., end-of-Phase II meeting)
– Therefore the first version of the analysis plan should be completed prior to this meeting!

Planning meta-analyses (you will hear this again)
ICH E9 guideline states that meta-analyses should be prospectively planned with the clinical trials program in the development of a new treatment
Not just planning the logistics, but planning the scientific questions to be addressed (Berlin and Colditz, JAMA 1999; 281:830-834)
– Standardization of definitions of endpoints
– Standardization of data collection to allow combination of results across all studies in the development program
– “Meta-design” considerations

Meta-experimental design
Plan and control variation in the different factors in a systematic manner.
– Like a factorial experiment or a single randomized trial with stratified randomization
Better to conduct 2 studies, each including both men and women, and to stratify (either in the randomization or post hoc) by sex, rather than to do one study in men and a separate study in women.
– Separating studies by sex confounds sex and “study”
– Might be further confounded by different doses
POINT: Think about what the “meta design space” will look like when you’re done

What Happens in Practice?
Whether we’re doing the meta-analysis before or after approval, we need to think about how to address “heterogeneity” (a recurring theme for today)

What is heterogeneity?
Clinical heterogeneity
Participants
– e.g., conditions under investigation, eligibility criteria for trials, geographical variation
Interventions
– e.g., intensity / dose / duration, sub-type of drug, mode of administration, experience of practitioners, nature of the control (placebo/none/standard care)
Outcomes
– e.g., follow-up duration, ways of measuring, cut-off points on scales

What is heterogeneity?
Methodologic heterogeneity
Design
– e.g., randomized vs. non-randomized, crossover vs. parallel group vs. cluster randomized, length
Conduct
– e.g., allocation concealment, blinding (masking) of subjects, treating physicians, outcome evaluation, etc., approach to analysis (intent-to-treat vs. “completers”)

What is heterogeneity?
Statistical heterogeneity
Common views
– Variation in the results of studies
– More variation than would be expected by chance
In truth:
– Variation in the true effects underlying the studies
– that may manifest itself in more observed variation than expected by chance
– may be due to different treatment effects or different biases
Is statistical heterogeneity inevitable?

Identifying heterogeneity
How do we tell whether statistical variation among (between) results is due to chance or real differences?
Eyeballing
– a graphical inspection of the results is usually the first step
– a lack of overlap in confidence intervals indicates heterogeneity (but overlap does not imply absence of heterogeneity)

[Figure: two forest plots of estimates with 95% confidence intervals. Opioids for breathlessness: standardized mean difference, axis from -2 to 2, favors opioid vs. favors placebo. Early light reduction for ROP: risk ratio, axis from 0.2 to 5, favors LR vs. favors control.]

Identifying heterogeneity
Statistical test
– A chi-squared (χ²) test (Cochran’s Q)
– Has low power because there are usually very few studies: i.e., test is not very good at detecting heterogeneity when it exists
– But, has excessive power to detect clinically unimportant heterogeneity when there are many studies

Identifying heterogeneity
Test is not asking a useful question if heterogeneity is inevitable
Quantify inconsistency
– based on the χ² statistic, Q, and its degrees of freedom:
  $I^2 = \frac{Q - \mathrm{d.f.}}{Q} \times 100\%$
– I² describes the proportion of variability that is due to heterogeneity as opposed to sampling error (d.f. = degrees of freedom = the number of studies minus 1)

What can we do with heterogeneity?
Ignore it
– Don’t do that! (worse yet – some people throw out the outliers)
Check the data
– Incorrect data extraction; unit of analysis errors (e.g., with crossover trials, cluster randomized trials, counts)
Encompass it
– Random effects meta-analysis
Explore it
– Subgroup analysis
– Meta-regression
– Funnel plot

Fixed effect meta-analysis model
[Diagram labels: true effect; result; random error (statistical homogeneity)]

Random effects meta-analysis model
[Diagram labels: true mean effect; trial-specific effect; random error. The width of the curve reflects the amount of heterogeneity.]

Random effects meta-analysis
The ‘amount’ of heterogeneity can be estimated
Weights are adjusted to account for both within-study and among-study variability
Random effects analyses give
– similar results when there is no heterogeneity
– similar pooled effect, wider confidence interval when there is ‘symmetric’ heterogeneity
– different results when there is funnel plot asymmetry: they give more weight to the potentially biased sample of small studies

Identical results
[Figure: forest plot, early light reduction for ROP; estimates with 95% confidence intervals for Kennedy 1997, Locke 1952A, Lopes 1997, Reynolds 1998, Seiberth 1994, with fixed effect and random effects summaries; x-axis: risk ratio, 0.2 to 5, favours LR vs. favours control]

Slightly different results
[Figure: forest plot, opioids for breathlessness; estimates with 95% confidence intervals for Woodcock 1981, Woodcock 1982, Johnson, Eiser (A), Eiser (B), Bruera, Light, Chua, Poole, Davis, Leung, Noseda; x-axis: standardised mean difference, -2 to 2, opioid better vs. placebo better]
Fixed effect: –0.32 (–0.43, –0.20)
Random effects: –0.31 (–0.50, –0.13)

Very different results
[Figure: forest plot, IV magnesium for acute MI (mortality); estimates with 95% confidence intervals for Morton, Rasmussen, Smith, Abraham, Feldstedt, Shechter 1990, Ceremuzynski, LIMIT-2, Bertschat, Singh, Pereira, Golf, Thogersen, Shechter 1995, ISIS-4, MAGIC; x-axis: risk ratio, log scale from 0.01 to 10]
Fixed effect: 1.01 (0.97, 1.07)
Random effects: 0.76 (0.62, 0.92)

RE models can be counter-intuitive
Study 1      Dead        Alive    Total
Treatment    30 (60%)    20       50
Control      10 (20%)    40       50

Counter-intuitive RE (2)
Study 2      Dead        Alive    Total
Treatment    100 (1%)    9,900    10,000
Control      200 (2%)    9,800    10,000

Counter-intuitive RE (3)
Study 1: RR = 60% / 20% = 3 (N = 100)
Study 2: RR = 1% / 2% = 0.50 (N = 20,000)
Heterogeneity test p-value < 0.001
Fixed effect summary OR = 0.60 (0.48, 0.76)
Random effects summary OR = 1.66 (0.14, 19)

Examples and challenges

SSRIs and Suicidal Behaviors

Did we already know this?
“With beginning convalescence (following initiation of treatment with tricyclic antidepressants), the risk of suicide once more becomes serious as retardation fades.”
– [Clinical Psychiatry, by Mayer-Gross, Slater, and Roth, 1960, p. 231]
“While this and other mechanisms all have some plausibility as explanations for the clinical observation of worsening depression or suicidality in depressed patients being treated with antidepressants, proposing a mechanism is quite a different matter from demonstrating empirically that there is a causal association between antidepressant use and induction of suicidality.”
– FDA Briefing Book for PDAC, 2006

SSRI Methods: adjudication?
Possibly suicide-related adverse events (PSRAEs) were adjudicated by the sponsors using the algorithm developed by the group at Columbia U. (K. Posner)
Reason: “…large number of subjects (approximately 100,000) in the adult suicidality analysis, which made impracticable more detailed adjudication of all potentially suicidal behaviors by the FDA.”
So, what’s the standard?
– Independent third party?
– What would be the anticipated direction of any bias related to lack of adjudication?
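
Returning to the two-study example under “RE models can be counter-intuitive”: the summary figures quoted there can be approximately reproduced from the 2x2 counts. The sketch below is illustrative only. It assumes (the slides do not say) that the fixed-effect summary is a Mantel-Haenszel odds ratio and that the random-effects summary is a DerSimonian-Laird estimate on the log odds ratio scale; the function and variable names are made up for this example.

```python
import math

# 2x2 counts from the two illustrative studies:
# (dead_treated, alive_treated, dead_control, alive_control)
STUDIES = [
    (30, 20, 10, 40),        # Study 1, N = 100
    (100, 9900, 200, 9800),  # Study 2, N = 20,000
]


def mantel_haenszel_or(tables):
    """Mantel-Haenszel fixed-effect pooled odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def dersimonian_laird(tables):
    """Cochran's Q, I^2 and the DerSimonian-Laird random-effects pooled OR."""
    y = [math.log(a * d / (b * c)) for a, b, c, d in tables]      # log odds ratios
    v = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]  # their variances
    w = [1 / vi for vi in v]                                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))       # Cochran's Q
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0           # I^2 in percent
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                 # among-study variance
    w_re = [1 / (vi + tau2) for vi in v]                          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "Q": q, "df": df, "I2": i2, "tau2": tau2, "weights": w_re,
        "OR": math.exp(mu),
        "CI": (math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se)),
    }


mh_or = mantel_haenszel_or(STUDIES)
re_fit = dersimonian_laird(STUDIES)
# Chi-squared tail probability; this closed form holds only for df = 1.
p_het = math.erfc(math.sqrt(re_fit["Q"] / 2))

print(f"Mantel-Haenszel fixed-effect OR: {mh_or:.2f}")
print(f"Q = {re_fit['Q']:.1f} on {re_fit['df']} d.f., p = {p_het:.1e}, I^2 = {re_fit['I2']:.0f}%")
print(f"Random-effects OR: {re_fit['OR']:.2f} ({re_fit['CI'][0]:.2f}, {re_fit['CI'][1]:.1f})")
```

Under these assumptions the estimated among-study variance is so large that the two random-effects weights come out nearly equal, so the 100-patient study counts almost as much as the 20,000-patient study. That is what moves the pooled odds ratio from below 1 to above 1.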
Conclusions about Adjudication
“A wide variety of approaches can help assure that outcome assessment in large simple trials is clinically relevant, accurate, and without differential misclassification” (JB added emphasis)
Adjudication increases cost and complexity
“Based on the available data from cardiovascular trials, adjudication has not been shown to improve the ability to determine treatment effects.”
– Granger CB, Vogel V, Cummings SR, et al. Do we need to adjudicate major clinical events? Clinical Trials 2008;5:56-60.

Broad versus narrow definitions
Common view is that more sensitive definitions
– Are more “conservative” by being inclusive
– Increase power by generating more events
Overly broad inclusion of events can lead to an underestimation of the true relative risk
– might include events less likely to be related to the true (but possibly unknown) mechanism of action, or
– events that, by their nature, are simply more likely to be misclassified in clinical trials
Implications of “non-differential” misclassification in efficacy versus safety settings? (MORE LATER)

What endpoints (AEs) were included?
Primary outcome: suicidal ideation or worse (outcomes 1, 2, 3 or 4 below), also called suicidality or suicidal behavior and ideation.
– 1. Completed suicide
– 2. Suicide attempt
– 3. Preparatory acts toward imminent suicidal behavior
– 4. Suicidal ideation
– 5. Self-injurious behavior, intent unknown
– 6. Not enough information (Fatal)
– 7. Not enough information (Non-Fatal)

Statistical methods
Aggregate-level analyses:
– Mantel-Haenszel (fixed-effect primary)
– DerSimonian-Laird
– “Double zero” studies excluded
– Single zero: continuity correction

What about those “no event” studies?
The exclusion of trials with no events in either placebo or primary active drug arms is problematic.
The absence of events provides some information because of the background rate of events independent of drug effect.
Studies with no events are dropped from the likelihood for usual ratio estimates (OR, RR)
Risk differences are perhaps more promising, although they also have their own problems (e.g., variance estimation)
DO SENSITIVITY ANALYSES (and attend the talk later)

Patient-level analyses
Allow exploration of subgroups defined by patient-level characteristics
Ecological bias can be a problem when regressing study result (e.g., log OR) against aggregate-level patient characteristics (e.g., mean age, percent male, etc.)
– Statistics in Medicine, 2002; 21:371-387
FDA used conditional logistic regression
NOTE: also allows proper time-to-event analyses when appropriate

Analyses by age
Young vs. Older Adults: <25, 25+
Young, Middle-aged and Elderly: <25, 25-64, 65+
Age by Decade: <25, 25-34, 35-44, 45-54, 55-64, 65-74, 75+
Age by Double Decade: <25, 25-44, 45-64, 65+
(Assessing sensitivity of results to choice of definition of age categories)
Could use non-linear fitting algorithms, like multivariate restricted splines
– (e.g., Royston P, Sauerbrei W. Multivariable modeling with cubic regression splines: A principled approach. The Stata Journal 2007;7(1):45-70)

Results overall
Suicidality risk for active drug relative to placebo (ideation or worse; all adults; all diagnoses)
– 0.85 (0.71 – 1.02), p = 0.08 by conditional logistic regression
– 0.86 (0.71 – 1.04), p = 0.12 by exact method
Suicide-related behavior (preparatory acts, attempts and completed suicide)
– OR = 1.12 (95% CI, 0.79 – 1.58), by conditional logistic regression (LOOKS DIFFERENT?)

Results by indication

Results by age

MA of observational studies (briefly)
Meta-analysis of observational studies remains controversial
– How many epidemiologists does it take to change a light bulb?
The point will often NOT be to produce a single summary estimate, but to explore (presumed) sources of heterogeneity of findings

Conclusions (1)
Meta-analysis has valuable applications in pharmacoepidemiology
– Evaluation of safety using existing randomized trials
– Evaluation of safety using non-experimental studies (need more time to show)

Conclusions (2)
There are challenging methodologic issues in the meta-analysis of safety data
– Rare events, multiplicity, adjudication, …
Sensitivity analyses should always be performed
– Then more sensitivity analyses should always be performed
Use patient-level data when possible

Another example (if time permits)

Example: Galantamine
Acetylcholinesterase inhibitors (AChEIs) are used as a standard treatment for Alzheimer’s Disease (AD)
Galantamine, an AChEI, has been extensively studied in patients with mild to moderate AD
Galantamine has also been studied in patients with AD with concomitant cerebrovascular disease (CVD) and in patients with VaD (16).
The benefit is to slow the progress of cognitive decline (relative to placebo)

Safety “signal” for Galantamine in Mild Cognitive Impairment
Two 2-year randomized controlled trials
– Individuals with mild cognitive impairment
– Findings replicated in both studies
– 13 deaths versus 1 death
Higher mortality observed in galantamine-treated patients, compared with placebo
– Overall mortality rates were low in both groups
The findings prompted a reevaluation in patients with dementia

Galantamine Methods
All galantamine trials (J&J or Shire-sponsored) for which J&J could access data
Also searched MEDLINE and the Cochrane Controlled Trials Register (2005, Issue 4)
Trials included were independently reviewed, verified by two readers, and met the following criteria:
– a) randomized
– b) placebo-controlled
– c) parallel group
– d) blinded
– e) at least one treatment arm with galantamine

Meta-analysis of survival in galantamine randomized trials (6 months duration)
[Figure: forest plot of odds ratios with 95% CIs; x-axis: odds ratio, log scale from 0.001 to 100; left: in favor of galantamine, right: in favor of placebo. The per-arm columns of deaths / total randomized (galantamine, placebo) could not be matched to trials in this copy and are omitted.]

Study or sub-category                     OR     95% CI
AD trials up to 4 months
  GAL-93-01                               1.33   [0.05; 32.95]
  GAL-CAN-5                               0.34   [0.01; 8.46]
  GAL-INT-2                               0.09   [0.00; 1.98]
  GAL-USA-16                              3.00   [0.12; 74.92]
  Subtotal, fixed trial effect            0.57   [0.12; 2.77]
  Subtotal, random trial effect           0.39   [0.06; 2.34]
  Breslow-Day test for homogeneity: Chisq = 5.5422, DF = 3, Pr > Chisq = 0.1361
AD trials equal to 5-6 months
  GAL-95-05                               1.01   [0.14; 7.25]
  GAL-INT-1                               0.49   [0.07; 3.49]
  GAL-INT-10                              1.33   [0.35; 5.04]
  GAL-JPN-3                               0.54   [0.07; 3.84]
  GAL-USA-1                               1.51   [0.16; 14.65]
  GAL-USA-10                              0.82   [0.25; 2.76]
  Subtotal, fixed trial effect            0.90   [0.46; 1.76]
  Subtotal, random trial effect           0.95   [0.49; 1.83]
  Breslow-Day test for homogeneity: Chisq = 1.2286, DF = 5, Pr > Chisq = 0.9421
AD+CVD and VaD trials equal to 6 months
  GAL-INT-26                              0.44   [0.15; 1.28]
  GAL-INT-6                               0.49   [0.14; 1.71]
  Subtotal, fixed trial effect            0.46   [0.20; 1.04]
  Subtotal, random trial effect           0.46   [0.21; 1.01]
  Breslow-Day test for homogeneity: Chisq = 0.0152, DF = 1, Pr > Chisq = 0.9019
ALL TRIALS
  Fixed trial effect                      0.67   [0.41; 1.10]
  Random trial effect                     0.65   [0.41; 1.05]
  Breslow-Day test for homogeneity: Chisq = 8.4661, DF = 11, Pr > Chisq = 0.6710

Other Galantamine Analyses
Nested case-control study of deaths was used to investigate potential mechanism for the mortality increase
– Baseline ECG findings
– Comorbidities
– Concomitant medications
Findings were inconclusive due to small sample size
Mortality analyses in press (Feldman et al.; Acta Neurologica Scandinavica)
We are doing a large, placebo-controlled study with mortality as the primary endpoint
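
A note on zero cells. Several arms in the galantamine trials recorded no deaths, and the “Statistical methods” slide earlier mentions excluding “double zero” studies and applying a continuity correction to single-zero studies. The sketch below shows one common version of that handling (adding a constant to every cell) and how sensitive the odds ratio is to the constant chosen. The counts and the function names are hypothetical, chosen only for illustration; they are not taken from any trial in this talk, and the correction actually used in the analyses described is not stated.

```python
import math


def odds_ratio(events_t, n_t, events_c, n_c, cc=0.5):
    """Odds ratio with a 95% CI; adds `cc` to every cell if any cell is zero.

    Returns None for a "double zero" study (no events in either arm),
    mirroring the exclusion of such studies from ratio estimates.
    """
    if events_t == 0 and events_c == 0:
        return None
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if 0 in (a, b, c, d):
        a, b, c, d = (x + cc for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))


def risk_difference(events_t, n_t, events_c, n_c):
    """Risk difference; stays defined even when both arms have zero events."""
    return events_t / n_t - events_c / n_c


# Hypothetical studies: (events_treated, n_treated, events_control, n_control)
single_zero = (0, 100, 2, 100)
double_zero = (0, 50, 0, 50)

# Sensitivity analysis: the same study under different continuity corrections.
sensitivity = {cc: odds_ratio(*single_zero, cc=cc)[0] for cc in (0.5, 0.1, 0.01)}
for cc, or_hat in sensitivity.items():
    print(f"cc = {cc}: OR = {or_hat:.3f}")

print("double-zero OR:", odds_ratio(*double_zero))
print("double-zero risk difference:", risk_difference(*double_zero))
```

With a single-zero study the point estimate depends heavily on the arbitrary constant, which is one reason to run the sensitivity analyses the talk calls for rather than rely on a single correction.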