Informing the selection of futility stopping thresholds: case study from a late-phase clinical trial
Hughes S, Cuffe RL, Lieftucht A, Nichols WG
Pharmaceutical Statistics 2009; 8: 25-37

Sara Hughes
GSK Head of Clinical Statistics
For PSI/DIA Journal Club, June 2012

Research Example
Constant drive for more efficient clinical trial designs
– Quicker decisions
– Reduced financial & human investment in ‘futile’ drugs / doses
– Patient safety
Adaptive designs receiving much press and research
– Futility designs became viable in late 1970s
– But, limited examples of application in clinical trial literature (at time of this paper)
Futility case study
– General futility definitions and case study background
– Useful graphical tools created to demonstrate risks of futility design
– Decision analysis developed to aid selection of futility stopping rules

Futility Dictionary of Terms
Futility interim analysis: the option to stop a study if the possibility at the interim stage of ultimately getting a positive result is remote
– ie “it’s futile to continue - the data looks so bad that no amount of further data will reverse that - let’s quit now”
Stopping threshold: what result would make us quit?
Various statistical methods exist to quantify probability of future success (POS) but little guidance available for selecting optimal values for stopping thresholds
– High threshold: few bad trials continue but some good trials stopped
– Low threshold: most good trials continue but so do some failures

HIV Futility Case Study
GSK has an EU license to sell HIV drug Telzir at dose 700mg twice-daily with Ritonavir 100mg twice-daily boosting
– Interested in investigating Telzir 1400mg once-daily with Ritonavir 100mg once-daily boosting
– Once daily dosing would offer increased convenience
– Reduced Ritonavir dose may offer improved safety profile
Study to assess this is large, lengthy and costly
– Futility design reduces the risk of a failed study
– Without high probability of success, can redirect resources to other research & stop prescribing ineffective dosing regimen

Study Design
1:1 randomisation to investigational dose or standard dose
Stage One (N=200): 24 week interim futility analysis
Stage Two (N=528): 48 week final analysis
Primary endpoint: Non-inferiority on efficacy (proportion with undetectable HIV viral load)
– Stop after Stage 1 if POS < X%
Key powered secondary endpoint: Superior on safety (difference of ≥13mg/dL in non-HDL cholesterol)
– Stop after Stage 1 if POS < Y%

“POS” for Case Study
A variety of statistical stopping methods can be used for calculating POS (probability of future success):
– frequentist conditional power (calculated under H0, H1, or current trend)
– semi-Bayesian predictive power
– formal group sequential methods
Case study POS: conditional power under current trend
– “Based on the results so far - and assuming these results reflect the truth - what is the probability of successfully achieving the study objectives at the end of the study?”
Choice of stopping thresholds more important than choice of method.
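
Conditional power under current trend can be sketched in a few lines. The example below is an illustration only, not the authors' implementation: it assumes a normal approximation, a one-sided significance level of 0.025, and the standard B-value form in which the drift is estimated from the interim z-statistic. The function names, the non-inferiority margin and the information fraction are hypothetical choices; the slides do not state them.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def noninferiority_z(resp_new, resp_std, n_per_arm, margin):
    """Wald z-statistic for H0: p_new - p_std <= -margin (unpooled variance)."""
    se = sqrt(resp_new * (1 - resp_new) / n_per_arm
              + resp_std * (1 - resp_std) / n_per_arm)
    return (resp_new - resp_std + margin) / se


def conditional_power_current_trend(z_interim, info_frac, alpha=0.025):
    """Probability of final success if the interim trend continues.

    With B(t) = Z(t) * sqrt(t) and drift estimated as B(t) / t, the
    projected mean of the final z-statistic is Z(t) / sqrt(t) and the
    remaining variance is 1 - t.
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    projected_mean = z_interim / sqrt(info_frac)
    return 1 - _STD_NORMAL.cdf((z_crit - projected_mean) / sqrt(1 - info_frac))


# Illustrative inputs only (assumed, not taken from the case study):
# 100 patients per arm at the interim, a 12-point margin, 30% information.
z = noninferiority_z(resp_new=0.70, resp_std=0.72, n_per_arm=100, margin=0.12)
pos = conditional_power_current_trend(z, info_frac=0.30)
print(f"interim z = {z:.2f}, conditional power = {pos:.1%}")
```

The computed value would then be compared against the chosen stopping threshold, which is why the choice of threshold matters more than the exact formula.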
We had two challenges:
– How to convey features & risks of futility design to non-statistical colleagues
– How to derive optimal stopping thresholds?

Interpreting Conditional Power
[Figure: conditional power (%) against difference in response rates (%), from -12 to 0, with curves for control response rates of 76%, 72% and 68%]

Impact Of “When” The Interim Occurs
[Figure: probability of falsely stopping for futility at the interim (%) against patients recruited (0 to 600), with curves for POS thresholds of 90%, 70%, 50% and 30%]

Impact of Interim on Trial’s Power for Primary Efficacy Endpoint
[Figure: power (%) against conditional power threshold (%), thresholds from 10 to 90]
Note: no impact on type I error for futility designs

Quantifying Risks of Design
Setting a stopping threshold of 70% POS will lead to a 27% chance of stopping at the interim if the drug works and a 10% chance of continuing if it doesn’t work
[Figure: probability of false stop (%) against probability of false go (%), with points labelled by POS thresholds of 90%, 70%, 50%, 30% and 10%]

Issues
Clear graphs illustrated risks & benefits of varying stopping thresholds and timing of interim analysis
But:
Not every ‘successful’ trial is equally good
– eg some results more or less likely to lead to license approval
Wanted to quantitatively include in decision making information we already had on this new regimen’s performance (PK and small pilot studies)
Decision analysis combined all these factors in order to weigh up benefits and risks of each stopping threshold

Decision Analysis
Step 1: Categorise possible outcomes & elicit prior expectations
[Figure: prior probability (0.0 to 0.5) of bad, mediocre, good and excellent outcomes; panels: Efficacy (proportion of responders at Wk 24) and Safety (improvement in non-HDL cholesterol, mg/dL)]

Decision Analysis
Step 2: Calculate predicted distribution of trial outcomes for each choice of stopping threshold
(shown for 80% POS)
[Figure: pie charts for efficacy and safety, splitting trials into: stop at interim; continue to bad results; continue to mediocre results; continue to good results; continue to excellent results. Extracted segment labels, not matched to segments: 18%, 22%, 39%, 50%, 20%, 30%, 1%, 3%, 9%, 8%]

Results of Decision Analysis
Using pie-charts for primary efficacy endpoint:
– 50% probability of study continuation given 80% POS stopping threshold
– Relaxing POS threshold to 70% progresses an additional 5% of trials, 53% of which go on to good/excellent results
– Relaxing POS threshold to 60% progresses a further 5% of trials, 48% of which go on to good/excellent results
– Relaxing POS threshold to 50% progresses a further 5% of trials, 43% of which go on to good/excellent results
– …

Final Stopping Thresholds Selected
Efficacy endpoint: 70% POS
Safety endpoint: 60% POS
Based on our assumptions, we had 38% overall probability of continuing the study to Stage Two
If study continued, estimated 62% probability of final good/excellent results for both endpoints
– Compared to 33% probability of good/excellent results with no futility interim analysis
If stopped correctly for futility, prevented 528/2 subjects from possibly inferior regimen and saved company approx. £8million in wasted R&D funds

Case Study Conclusions
Futility designs under-utilised but have great potential:
– “playing the winner”, maximising use of limited resources
– Depending on phase of trial and nature of disease and drug being studied, stopping threshold level may vary considerably
Selection of optimal stopping threshold challenging
– Lack of practical guidance in statistical literature
– Can motivate discussion via informative graphs, simulations and decision analysis – making this design far more appealing and acceptable to non-statistical colleagues
Statistical team led the study design development & choice of threshold work
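
The false-stop and false-go trade-off described under “Quantifying Risks of Design” can be approximated analytically. The sketch below rests on simplifying assumptions that are not from the paper: a normally distributed interim z-statistic, a stopping rule based on conditional power under current trend, “drug works” taken to mean the effect for which the final test has the design power, and “drug doesn’t work” taken to mean an effect exactly on the null boundary. The function names, the information fraction and the 90% design power are hypothetical, so the output is not expected to reproduce the figures quoted in the slides.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def interim_z_cutoff(cp_threshold, info_frac, alpha=0.025):
    """Interim z-value at which current-trend conditional power equals the threshold.

    Obtained by inverting CP(z) = 1 - Phi((z_crit - z / sqrt(t)) / sqrt(1 - t)).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return sqrt(info_frac) * (
        z_crit - sqrt(1 - info_frac) * _STD_NORMAL.inv_cdf(1 - cp_threshold)
    )


def futility_risks(cp_threshold, info_frac, alpha=0.025, design_power=0.90):
    """Return (P(false stop), P(false go)) for one stopping threshold."""
    cutoff = interim_z_cutoff(cp_threshold, info_frac, alpha)
    # Drift giving the final one-sided test the assumed design power.
    drift_h1 = _STD_NORMAL.inv_cdf(1 - alpha) + _STD_NORMAL.inv_cdf(design_power)
    # Interim z ~ N(drift * sqrt(t), 1); the trial stops when z < cutoff.
    false_stop = _STD_NORMAL.cdf(cutoff - drift_h1 * sqrt(info_frac))
    false_go = 1 - _STD_NORMAL.cdf(cutoff)  # drift of zero on the null boundary
    return false_stop, false_go


# Illustrative information fraction (assumed); thresholds as in the slides.
for threshold in (0.90, 0.70, 0.50, 0.30, 0.10):
    stop, go = futility_risks(threshold, info_frac=0.30)
    print(f"POS threshold {threshold:.0%}: false stop {stop:.1%}, false go {go:.1%}")
```

Raising the threshold moves along the curve toward more false stops and fewer false goes, which is the trade-off the decision analysis then weighs against prior expectations.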