Biostatistics Case Studies 2006
Session 5: Reporting Subgroup Results
Peter D. Christenson, Biostatistician
http://gcrc.LAbiomed.org/Biostat

Subgroup Issues
• Measuring subgroup effect
  • Subgroups separately
  • Interaction
• Selection of subgroups
  • A priori
  • Post-hoc
  • Based on data
• Significance/strength of conclusions
  • Transparency of analysis
  • Formal statistical comparisons; p-values, CIs.

Case Study Editorial: pp. 1667-69

Case Study: Abstract

Main Subgroup Result

Separate Subgroup Comparisons
% with events, Aspirin Only vs. Combination:
  Symptomatic (N = 12153): 7.9 vs. 6.9; Δ = 1.0, p = 0.05, RR = 0.88 (0.77 to 1.0)
  Asymptomatic (N = 3284): 5.5 vs. 6.6; Δ = -1.0, p = 0.20, RR = 1.2 (0.91 to 1.59)

Separate Subgroup Conclusions
• Symptomatic group: Combination better
  • Large N.
  • Is magnitude of effect relevant? See CIs.
• Asymptomatic group: Inconclusive (0.91 ≤ RR ≤ 1.59)
  • Same magnitude, apparent inverse from symptomatics.
  • Much smaller N; less power.
• Have not demonstrated subgroup difference.
  • Use interaction to do so.
  • Need to, based on CIs?

Subgroup Interaction
  Symptomatic (N = 12153): Δ = 1.0, p = 0.05, RR = 0.88 (0.77 to 1.0)
  vs.
  Asymptomatic (N = 3284): Δ = -1.0, p = 0.20, RR = 1.2 (0.91 to 1.59)
  Interaction = ΔΔ = 2.0%, with 95% CI ~ 0.65% to 3.35%
Why is interaction relevant? Next slide.

Subgroup Conclusions with Interaction
• Symptomatic group: Combination better
  • Large N.
  • Is magnitude of effect relevant? See CIs.
• Asymptomatic group: Inconclusive (0.91 ≤ RR ≤ 1.59)
  • Same magnitude, apparent inverse from symptomatics.
  • Much smaller N; less power.
• Difference between subgroups:
  • Significant according to interaction.
  • Inverse “non-effect” nevertheless incorporated.

Change Data to Give Non-Significant Interaction
Suppose, % with events, Aspirin Only vs. Combination:
  Symptomatic (N = 12153): 7.9 vs. 6.9; Δ = 1.0, p = 0.05, RR = 0.88 (0.77 to 1.0)
  Asymptomatic (N = 3284): 6.4 vs. 6.6; Δ = -0.2, p = 0.80, RR = 1.03 (0.40 to 1.4)
→ P for interaction ~ 0.50. Change conclusions?

Changed Data Subgroup Conclusions
• Symptomatic group: Combination better
  • Large N.
  • Is magnitude of effect relevant? See CIs.
• Asymptomatic group: Inconclusive (0.40 ≤ RR ≤ 1.40)
  • Apparently negligible, but not proven.
  • Much smaller N; less power.
• Difference between subgroups:
  • Not demonstrated.
  • Use CI for ΔΔ to quantify magnitude of difference.

Change Data to Give Non-Significant Interaction
Suppose, % with events, Aspirin Only vs. Combination:
  Symptomatic (N = 12153): 7.9 vs. 6.9; Δ = 1.0, p = 0.05, RR = 0.88 (0.77 to 1.0)
  Asymptomatic (N = 10000, changed from 3284): 6.4 vs. 6.6; Δ = -0.2, p = 0.80, RR = 1.03 (0.96 to 1.1, changed from 0.40 to 1.3)
New changes → P for interaction will be small.

Twice-Changed Data Subgroup Conclusions
• Symptomatic group: Combination better
  • Large N.
  • Is magnitude of effect relevant? See CIs.
• Asymptomatic group: Negligible (0.96 ≤ RR ≤ 1.1)
  • Negligible, proven.
  • Larger N → smaller CI; power not relevant.
• Difference between subgroups:
  • Significantly demonstrated with interaction.
  • Use CI for ΔΔ to quantify magnitude of difference.

Many Subgroup Analyses
12 Subgroups + Overall

Formal Multiple Comparison Adjustment
• Number of comparisons: k.
• Individual comparison false positive error rate = α.
• Experiment-wise error rate = α*.
• Bonferroni adjustment:
  • Assume k comparisons are independent.
  • True negative rate = specificity = 1 - α.
  • Set α* = 1 - (1 - α)^k → solve for α = 1 - (1 - α*)^(1/k) ≈ α*/k.
  • So, typically p < 0.05/(# tests) = 0.05/13 = 0.004 here.
  • Conservative if comparisons are correlated; can improve if correlation is known.
• No adjustment: Prob[≥ 1 false pos] = 1 - 0.95^k = 0.49 if k = 13. See next slide.

Likelihood of False Positive Conclusions

Subgroup Multiple Comparison Comments
• Many other specialized methods.
• Pre-specified comparisons count just as post-hoc, if post-hoc not based on results.
• Why limit “experiment-wise” count to subgroup comparisons?
  • No formal comparisons in this paper (but what if a large diff was observed?): Tables 1-3: 22 + 20 + 26 potential covariates.
  • P-values: Table 4: 12 efficacy and safety comparisons.
  • Figure 2: 12 Subgroups. At least one explicit test.

Subgroup Multiple Comparison Conclusions
• Obviously usually need to examine subgroups.
• If want to claim more than observations, need to adjust in a well-defined way.
• Typically, report as observational and:
  • Explain decisions and choices of subgroups.
  • Formal adjustment typically not necessary.
  • Avoid p-values. Emphasize CI range.
  • Separate planned from data mining results.
  • Number of comparisons should be explicit.

Recommendations for Reporting on Subgroups:
• See Editorial. Use to justify the following approach to journal.
• Do not make multiple comparison adjustment.
• Be transparent about all analyses.
• State where conclusions are based on interactions.
• Report number of comparisons that were planned prior to looking at data (1) included and (2) not included in paper.
• Report which results were a consequence of looking at data; no p-values.
• Report if alternate definitions for a subgroup were examined.
• Give confidence intervals for effects that are compatible with the data, not p-values, for subgroups.

Recommendations: Example of a Start at Them
Cohan (2005) Crit Care 23;10:2359-66.
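
Worked example for the Formal Multiple Comparison Adjustment slide. The sketch below is one possible way to check that slide's arithmetic for k = 13 comparisons (12 subgroups plus overall), assuming independent tests as the slide does. The function names are hypothetical and chosen only for illustration. The exact per-test solution α = 1 - (1 - α*)^(1/k) is often called the Šidák correction, and α*/k is its usual Bonferroni approximation.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent
    tests, each run at per-test level alpha: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** k


def exact_per_test_alpha(alpha_star: float, k: int) -> float:
    """Per-test level that gives an experiment-wise rate of exactly
    alpha_star under independence: 1 - (1 - alpha_star)^(1/k)."""
    return 1.0 - (1.0 - alpha_star) ** (1.0 / k)


def bonferroni_per_test_alpha(alpha_star: float, k: int) -> float:
    """Bonferroni approximation to the per-test level: alpha_star / k."""
    return alpha_star / k


if __name__ == "__main__":
    k = 13             # 12 subgroups + overall, as on the slide
    alpha_star = 0.05  # desired experiment-wise error rate

    # No adjustment: the slide rounds this to 0.49.
    print(f"Unadjusted P[>=1 false positive]: "
          f"{familywise_error_rate(0.05, k):.3f}")

    # Adjusted per-test thresholds: the slide rounds alpha*/k to 0.004.
    print(f"Exact per-test alpha:      "
          f"{exact_per_test_alpha(alpha_star, k):.5f}")
    print(f"Bonferroni per-test alpha: "
          f"{bonferroni_per_test_alpha(alpha_star, k):.5f}")
```

The Bonferroni threshold is slightly smaller than the exact one, so it is marginally conservative even when the tests really are independent. If the comparisons are correlated, neither formula gives the true experiment-wise rate.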