Technical Appendix: Quantifying Gains in the War on Cancer

This appendix provides additional detail on the technical methods used to estimate the life expectancy gains for cancer patients and to decompose these gains between early detection and improved treatment.

A1. Analytic Framework

To quantify the separate contributions of treatment and detection to improvements in cancer survival, we rely on two premises: (1) gains in detection manifest as patients being diagnosed earlier in cancer progression; and (2) gains in treatment manifest as longer survival for patients diagnosed at a fixed point in disease progression. Because each of these two trends can be observed independently, we can empirically separate their effects on the overall survival of cancer patients.

We use the decomposition approach employed previously by Sun et al. and Lakdawalla et al. [1, 2]. This approach compares gains in stage-conditional survival with gains in the percent of patients detected at earlier stages. A simple example with just two stages of cancer, early and late, illustrates it. The probability that a patient dies within a certain timeframe equals the probability of being detected early multiplied by the expected mortality rate for early-diagnosed patients, plus the probability of being detected late multiplied by the expected mortality rate for late-diagnosed patients. If mortality rates for early-diagnosed patients remain fixed, and mortality rates for late-diagnosed patients remain fixed, any reduction in the overall mortality rate would be entirely due to a higher percentage of patients being diagnosed early. Similarly, if there were no changes in the share of patients detected early or late, any overall mortality reduction would be entirely attributable to improvements in stage-conditional survival (presumably due to improved treatment).
Thus, to decompose the observed mortality reductions into those due to treatment and those due to detection, we separately estimate mortality rates by stage of diagnosis and the percent of patients diagnosed early, and then estimate the survival gains due to each.

Let $M_{it}$ represent the probability of cancer-related mortality within three years of initial diagnosis for a patient diagnosed in year $t$ at stage $i$. For simplicity, consider the two-stage, two-period case where $i$ equals “early” or “late” and $t$ equals 1 or 2. Similarly, let $p_t$ represent the probability that a patient in year $t$ is diagnosed early. Advances in detection should lead to a higher $p_t$, while advances in treatment should lead to a lower $M_{it}$. Both could reduce mortality conditional on diagnosis, which we denote by $D_t = p_t M_{\text{early},t} + (1 - p_t) M_{\text{late},t}$. In this simple model, the relative gains from treatment and detection between years 1 and 2 satisfy the decomposition:

$$D_2 - D_1 = \left[ p_1\,\Delta M_{\text{early}} + (1 - p_1)\,\Delta M_{\text{late}} \right] + \left[ \Delta p\, M_{\text{early},2} - \Delta p\, M_{\text{late},2} \right] \tag{1}$$

Here $\Delta M_{\text{early}}$ and $\Delta M_{\text{late}}$ are the changes in mortality between years 1 and 2 for someone diagnosed in each stage, while $\Delta p = p_2 - p_1$ is the change in the probability that a patient is diagnosed early; the probability of a late diagnosis therefore changes by $-\Delta p$. The first bracketed term in Equation 1 is the contribution of improvements in treatment to survival gains. This is a weighted average of the survival gains in each stage, with weights equal to the year-1 probability of detection at each stage. The second bracketed term is the contribution of early detection to survival gains, valued at year-2 mortality rates.

In our approach, we estimate the overall survival gain (defined as the reduction in overall mortality, $D_2 - D_1$) and the survival gain from treatment, $p_1\,\Delta M_{\text{early}} + (1 - p_1)\,\Delta M_{\text{late}}$, using trends in stage-conditional survival (unadjusted and adjusted), and attribute the residual survival gain to detection.
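The two-stage decomposition above can be checked numerically. The following is a minimal sketch, assuming overall mortality is D_t = p_t M_early,t + (1 - p_t) M_late,t, so that the detection term is Δp (M_early,2 - M_late,2). The function names and all input values are hypothetical and chosen only for illustration; they are not estimates from the SEER data.

```python
def overall_mortality(p_early, m_early, m_late):
    """D_t = p_t * M_early,t + (1 - p_t) * M_late,t."""
    return p_early * m_early + (1 - p_early) * m_late


def decompose(p1, p2, m_early1, m_early2, m_late1, m_late2):
    """Split D2 - D1 into treatment and detection terms (two-stage case)."""
    d_m_early = m_early2 - m_early1
    d_m_late = m_late2 - m_late1
    d_p = p2 - p1
    # Treatment: within-stage mortality changes weighted by year-1 stage shares.
    treatment = p1 * d_m_early + (1 - p1) * d_m_late
    # Detection: shift in stage shares valued at year-2 mortality rates.
    detection = d_p * (m_early2 - m_late2)
    total = (overall_mortality(p2, m_early2, m_late2)
             - overall_mortality(p1, m_early1, m_late1))
    return treatment, detection, total


# Hypothetical inputs (assumptions for illustration, not SEER estimates).
treatment, detection, total = decompose(
    p1=0.50, p2=0.60,
    m_early1=0.10, m_early2=0.08,
    m_late1=0.60, m_late2=0.55,
)
print(f"treatment={treatment:.4f} detection={detection:.4f} total={total:.4f}")
```

Because the two terms sum exactly to the overall change, computing the detection share as the residual (total minus treatment) gives the same number as computing it directly.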
While we consider only two stages in this example, the decomposition generalizes to multiple stages according to:

$$D_2 - D_1 = \sum_{j=1}^{J} p_j\,\Delta M_j + \sum_{j=1}^{J} \Delta p_j\, M_j \tag{2}$$

where $p_j$ is the probability of being diagnosed in stage $j$ (so $\sum_{j=1}^{J} p_j = 1$). As in Equation 1, $p_j$ in the first sum is the year-1 share and $M_j$ in the second sum is the year-2 mortality rate.

Data and Study Sample

We use data from 1997 to 2010 from the Surveillance, Epidemiology, and End Results (SEER) registry, which collects data on cancer incidence and survival from population-based cancer registries across the US. The SEER Research Data incorporate information from 18 registries from varied geographic regions encompassing approximately 28 percent of the US population. The SEER data include information on patient demographics (age at diagnosis, gender, and race), patient area of residence at diagnosis, primary tumor site, first course of treatment, and, notably for our analysis, stage of cancer at diagnosis and follow-up on vital status. Detailed information is also available on the time (month and year) of diagnosis and length of survival.

Our analysis focuses on 15 of the most commonly occurring tumor types: breast, lung and bronchus, colorectal, melanoma of the skin, urinary bladder, kidney and renal pelvis, pancreatic, thyroid, stomach, leukemia, myeloma, liver and intrahepatic bile duct, prostate, non-Hodgkin lymphoma (NHL), and ovarian. We restrict the sample to patients who have just one malignant primary tumor in their lifetimes, and exclude any patients who are missing key variables included in the analysis.

Statistical Analysis

The difficulty of measuring disease progression at diagnosis presents the principal analytic challenge that we address. Three principal biases in the measurement of disease progression in registry data are known to affect survival analysis, and we review them here [4-6].

Lead time bias occurs when advances in screening lead to earlier diagnosis, and hence longer measured survival, even in the absence of improved treatment.
Lead time is the interval between the earlier diagnosis made possible by screening and the later usual time of diagnosis.

Stage migration occurs as diagnostic technology improves and some cancers previously classified in an earlier stage are now classified in a later stage, for instance, due to the detection of previously undetectable micro-metastases. This leads to increases in average survival times in both stages, even in the absence of improved treatment.

Length-biased sampling concerns the type of disease detected by screening: slow-growing tumors are more likely to be diagnosed by screening than fast-growing ones. Patients with slow-growing tumors are also likely to survive longer than those with fast-growing ones, leading to observed improvement in outcomes even in the absence of any treatment effects. The better outcomes are due to the increased selection of patients with inherently better prognoses.

None of these three biases would exist if there were a perfect and fixed measure of disease progression, or of speed in tumor growth, but no such measure exists. Thus, we take two distinct analytic approaches to address the potential mis-measurement of disease progression that provide us with bounds on the contribution of treatment versus detection.

Measuring stage at diagnosis

The simplest approach is to use a patient’s stage at diagnosis as a measure of disease progression. We estimate changes in the fraction of patients diagnosed at each stage, and changes in mortality rates within stage of diagnosis. The cancer community has developed sophisticated systems to grade cancer severity, and these are heterogeneous according to cancer type. For example, stage I invasive breast cancer is localized and relatively small, and treated with breast-conserving surgery. On the other hand, stage IV breast cancers have often metastasized and may require more systemic therapies.
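To make the stage-migration bias described earlier in this section concrete, here is a minimal sketch using made-up survival times (in years) for seven patients. The numbers and variable names are assumptions for illustration only. Reclassifying one patient from the early stage to the late stage raises average survival in both stages while leaving overall average survival unchanged.

```python
from statistics import mean

# Hypothetical survival times in years, for illustration only.
early_before = [10, 9, 8, 4]
late_before = [3, 2, 1]

# Better imaging finds micro-metastases in the early-stage patient with the
# shortest survival, so that patient is now recorded as late stage.
migrated = min(early_before)
early_after = [s for s in early_before if s != migrated]
late_after = late_before + [migrated]

print("early:", mean(early_before), "->", mean(early_after))
print("late: ", mean(late_before), "->", mean(late_after))
print("all:  ", mean(early_before + late_before), "->",
      mean(early_after + late_after))
```

No patient lives longer in this example, yet stage-conditional survival improves in both groups, which is why within-stage trends alone can overstate treatment gains.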
Reflecting these nuances, the SEER data include several variables that describe tumor histology. However, the coding of these variables changes over time, and not all tumor types have consistent stage information. We use the SEER recoded Historic Stage A variable, supplemented with additional variables whenever it is missing, to identify tumor stage. The SEER Historic Stage A variable provides four classifications for stage: in situ, localized, regional metastasis, and distant metastasis. However, there is wide heterogeneity in the applicability of these categories to different cancers. To simplify the analysis, we combine in situ and localized tumors, and study stage-conditional survival for local, regional, or distant tumors. A handful of cancer types, namely blood diseases such as leukemia and myeloma, are always classified as distant, meaning these cancers are defined as late in all cases (so the probability of early detection is 0 by definition). In the case of prostate cancer, local and regional cancers are combined in the SEER data, so we label prostate cancer as local or distant.

To identify the stage-conditional survival gains, we estimate a logistic regression model of 3-year mortality for patients diagnosed from 1997 to 2007, as a function of demographic covariates and measures of the availability of detection technology. This covers mortality events from 1997 to 2010. We estimate the regression model for all cancers combined and separately for each tumor type. For the pooled model we include fixed effects for the diagnosed tumor type. Covariates include age, age squared, sex, race, ethnicity, stage fixed effects, and fixed effects for year of diagnosis. Then, using the regression model, we compute predicted 3-year mortality rates by stage of diagnosis in 1997 and 2007 at the mean value of the other covariates. This approach isolates trends in cancer-related mortality for each cancer type at each stage of diagnosis, independent of other covariates that could influence patient outcomes.
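The prediction step described above, evaluating a fitted logistic model at the mean of the other covariates for each stage and year, can be sketched as follows. This is an illustrative sketch only: the coefficient values, covariate means, and the function name `predicted_mortality` are hypothetical, and the covariate list is shortened (for example, age squared, race, and ethnicity are omitted) to keep the example small.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_mortality(coefs, covariate_means, stage, year):
    """Predicted mortality for one stage and diagnosis year at covariate means."""
    z = coefs["intercept"]
    z += coefs["stage"][stage]  # stage fixed effect
    z += coefs["year"][year]    # year-of-diagnosis fixed effect
    for name, value in covariate_means.items():
        z += coefs["covariates"][name] * value
    return logistic(z)


# Hypothetical coefficients and sample means (assumptions, not fitted values).
coefs = {
    "intercept": -3.0,
    "stage": {"local": 0.0, "regional": 1.2, "distant": 3.0},
    "year": {1997: 0.0, 2007: -0.3},
    "covariates": {"age": 0.03, "female": -0.1},
}
means = {"age": 65.0, "female": 0.5}

changes = {}
for stage in ("local", "regional", "distant"):
    m_1997 = predicted_mortality(coefs, means, stage, 1997)
    m_2007 = predicted_mortality(coefs, means, stage, 2007)
    changes[stage] = m_2007 - m_1997
    print(f"{stage}: 1997={m_1997:.3f} 2007={m_2007:.3f} "
          f"change={changes[stage]:+.3f}")
```

Holding the other covariates at their means means that the difference between the 1997 and 2007 predictions for a given stage reflects only the year effect, which is the within-stage trend used as the treatment term in the decomposition.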
We then use these predicted mortality rates, evaluated at the mean values of the covariates for each stage and year of diagnosis, to estimate the reduction in mortality from 1997 to 2007 due to improvements in treatment, using the within-stage trend. The residual improvement is attributed to improved detection.

This approach relies on the SEER cancer staging variables to construct stage fixed effects as our measures of disease progression. Cancer staging, however, is not a perfect measure of disease progression for several reasons. First, any staging system leaves fine variation in disease progression within each stage. That is, two patients with subtly different levels of progression might be assigned the same stage. Therefore, if improved detection results in patients being diagnosed earlier within a stage, some lead-time bias might still exist, even after we condition on stage at diagnosis. Furthermore, both stage migration and length-biased sampling may present problems. Note that all these forces suggest that the staging approach overstates the contribution of treatment and understates the role of detection, because detection improvements will confound the within-stage time trend and appear analytically to be improvements in treatment. Thus, our first approach provides an upper bound on the contribution of treatment and a lower bound on detection.

Measuring improvements in detection

Improvements in detection confound the cancer staging approach by introducing the possibility of unobserved changes in the timing of diagnosis. Our alternative analytic approaches focus on improvements in survival: (1) within regions of the country that have the lowest access to detection services; and (2) within the regions with the least measured improvement in early cancer detection.
In principle, areas with the lowest access to and rates of improvement in detection will also have the most limited degrees of stage migration and length-biased sampling. However, since areas worse at detection are also likely to be worse at treatment, this approach likely understates the contribution of treatment by focusing on areas less adept at deploying new treatment innovations. Thus, measuring survival gains in these areas should provide a lower bound on the contribution of treatment and an upper bound on detection.

To identify regions with the poorest access to detection services, we construct county-level measures of access to diagnostic services and treatment intensity by combining the SEER data with county-level data from the Area Health Resources File (AHRF). The AHRF is a health resource database produced by the Health Resources and Services Administration, an agency of the US Department of Health and Human Services. The AHRF contains county-level data on health facilities, health professions, and environmental and socioeconomic characteristics, among others. The AHRF also provides data on county-level population, provided by the US Census Bureau.

We use county-level variables on the number of radiologists per capita, including diagnostic radiologists, and the number of general surgeons per capita to derive our local indexes. We include the surgeon measure to mitigate possible correlations between radiologist access and treatment aggressiveness. Areas with a higher number of surgeons per capita are more likely to engage in more intensive practice styles, which we would expect to be reflected in more aggressive treatment of cancer patients. If the number of radiologists is a measure of patient access to radiology, then we would expect it to lead to earlier cancer diagnoses. We document evidence for this assumption by comparing the number of radiologists to the probability of early detection.
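The county-level per-capita measures described above could be built along the following lines. This is only a sketch: the county records, the field names, and the choice to scale per 100,000 residents are assumptions for illustration and are not AHRF variable names or the exact index used in the analysis.

```python
def per_100k(count, population):
    """Providers per 100,000 residents."""
    return 100_000 * count / population


# Hypothetical county records; field names are illustrative, not AHRF names.
counties = {
    "County A": {"radiologists": 30, "surgeons": 45, "population": 300_000},
    "County B": {"radiologists": 4, "surgeons": 10, "population": 80_000},
    "County C": {"radiologists": 1, "surgeons": 6, "population": 50_000},
}

access = {
    name: {
        "radiologists_per_100k": per_100k(c["radiologists"], c["population"]),
        "surgeons_per_100k": per_100k(c["surgeons"], c["population"]),
    }
    for name, c in counties.items()
}

# County with the lowest access to diagnostic services in this toy sample.
lowest_access = min(access, key=lambda name: access[name]["radiologists_per_100k"])
print(lowest_access, access[lowest_access])
```

Scaling by population keeps large and small counties comparable, so the radiologist measure reflects access rather than county size.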
Below, we confirm that the number of radiologists is positively correlated with early diagnosis of cancer patients, but the number of surgeons per capita is unrelated to the probability of an earlier diagnosis (Section A3).

We use access to radiology services to remove confounding variation in detection from our estimates of the improvements in survival due to treatment. First, we include it in the logistic regression as a covariate, to eliminate county-level differences in mortality that are due to improvements in detection technology. Second, we interact the year-of-diagnosis fixed effects with the number of radiologists per capita, to allow counties with more access to diagnostic services to have different time trends. When we estimate the predicted change in mortality using the logistic regression, we do so using the base year dummies (not the interacted dummies), which is equivalent to using only the time trend for the counties with the lowest access to diagnostic services.

Finally, we “book-end” this approach by focusing on counties with the lowest (indeed, almost no) measurable improvements in early cancer detection over our study period. Specifically, we identify those counties in the US that demonstrate little to no measurable progress in detection, in the sense that these counties remain in the bottom quartile (below the 25th percentile) in terms of the share of patients diagnosed early across all years. Gains in survival within these counties are taken as due solely to treatment, and we compare the mortality reductions in these counties to our overall estimates from the logistic regression model. This is likely a conservative approach to measuring treatment improvements, since treatment diffusion might be poorer in these counties as well. For comparison, we construct similar estimates for the high-detection counties, those that remain in the top quartile (above the 75th percentile) for all study years.

Limitations

Our study faces several potential limitations.
First, the SEER registries are not a fully representative sample of the US population, which raises concerns about generalizability. Nonetheless, the SEER data offer the most comprehensive source on cancer survival including variables on stage at diagnosis, which are necessary for our decomposition into treatment and detection effects. We also focus on cancer-related mortality as our primary outcome measure, to avoid influencing our survival analysis with trends in mortality due to non-cancer causes. However, multiple studies have documented misclassifications in recordings of underlying cause of death. Below, we show that our results are consistent whether we use cancer-related mortality or all-cause mortality (Section A2).

Our analysis does not directly identify the benefits of any particular treatment. Instead, we identify the gains in life expectancy conditional on diagnosis between 2000 and 2010, controlling for other factors that we expect would influence life expectancy (e.g., age at diagnosis). A more detailed analysis would include information on the actual treatment used, and compare life expectancy gains for newer versus older treatment options. However, as long as our analysis appropriately controls for trends in other factors that influence life expectancy, our approach should estimate the approximate life expectancy gains due to changes in treatment for the average patient.

Another limitation of our approach is that defining cancers as local, regional, or distant is relatively coarse. The ideal approach would be to have a single variable that was comparable across tumor types and perfectly identified disease severity at the time of diagnosis, but no such variable exists. If we inadequately control for disease severity at diagnosis, it could introduce measurement error into our classification of stage. However, the approaches that measure and control for improvements in detection are designed to mitigate any such error.
Finally, our model simplifies the analysis by comparing survival gains from treatment and survival gains from early detection as if the two were independent. In fact, there is a symbiotic relationship between treatment and detection that is important to consider. Detection can only improve health outcomes if it is followed by effective treatment, and treatment is generally more effective if detection occurs at an earlier disease stage. If innovative diagnostic technologies emerge for a tumor site, then effective treatments for that tumor become all the more important. This point extends to the development of new biomarkers that identify the presence of specific cancers. Biomarkers that identify cancer risk will only help improve patient health if some treatment or preventive option is available for the cancers that they are able to identify. Similarly, biomarkers that identify which types of treatment will be effective for patients will help improve the cost-effectiveness of treatment and avoid wasteful therapy that is unlikely to improve outcomes and could expose patients to the risk of side effects.

A2. All-Cause Mortality

In our study we focus on cancer-related mortality because it should be most susceptible to improvements in both treatment and detection. Ignoring non-cancer mortality circumvents the potential problem that overall survival is improving over time (e.g., because of improved treatment options in cardiovascular disease). However, focusing on disease-specific mortality requires that we can accurately measure cause of death, and past work has documented misclassifications in recordings of underlying cause of death. Inaccuracies in coding cause of death are not necessarily problematic for this analysis, but they could introduce confounding variation if the degree of miscoding changed over time. Here we replicate our findings using all-cause mortality as the outcome variable to verify that our findings are not driven by the choice of outcome variable.
Exhibit A1 compares the trends in cancer-related mortality to all-cause mortality within three years of cancer diagnosis. Overall, all-cause mortality is about 4-6 percentage points higher than cancer-related mortality. In other words, about 85% of deaths in this sample are cancer-related. The difference between all-cause and cancer-related mortality varies across tumor types. Overall, however, the percent change in mortality is similar whether we consider all-cause or cancer-related mortality. All-cause mortality fell by 7.1 percentage points (17.9%) for patients diagnosed in 2007 compared to 1997, compared to 5.6 percentage points (16.7%) for cancer-related mortality.

Exhibit A2 reports the results of the estimated decomposition of all-cause mortality rates into treatment and detection innovations. The methods are the same as those used to decompose cancer-related mortality in the main text: we use logistic regression to predict stage-specific mortality rates and then apply the decomposition method to separate the overall gains into treatment and detection gains. Overall, the results are similar for all-cause and cancer-related mortality, except that a focus on all-cause mortality indicates larger gains in treatment. For example, all of the overall gain across cancer types can be attributed to treatment gains, though detection does explain more of the difference for individual cancers (e.g., 62% of the improvement in all-cause mortality for colorectal cancer patients can be attributed to treatment). These findings suggest that treatment innovations are an important driver of survival gains for patients, in terms of both all-cause and cancer-related mortality, but failure to account for the trends in survival for non-cancer causes likely causes us to overstate the gains from treatment in most cases.

A3. The Use of Radiologists to Control for Diagnostic Intensity

We use local access to radiologists as a proxy for the intensity of diagnostic effort in an area, and use it to net out improvements in diagnosis from improvements in treatment. We also use the number of general surgeons per capita as a proxy for treatment intensity, to avoid attributing trends in practice styles to improved treatment options. In the text, we describe analyses that verify that areas with more radiologists per capita have higher rates of early cancer detection. More formally, we conducted a logistic regression of the probability that patients were detected with local-stage cancer and found that the number of radiologists per capita had a statistically significant effect (OR: 1.035; p = 0.015) while the number of general surgeons per capita did not (OR: 1.008; p = 0.265). This supports the hypothesis that the number of radiologists proxies for local access to detection technology, while the number of surgeons helps proxy for local treatment intensity.

A4. Assessing Trends in 5-Year Mortality

In the text, we focus on mortality 3 years after the date of diagnosis as a balance between allowing sufficient time over which to observe mortality and still capturing the more recent treatment advances. To verify that the results were not driven by this choice of time period, we replicated the analysis using mortality rates up to 5 years after diagnosis. Note that because we only observe mortality in the sample through 2010, we only consider changes in mortality and detection from 1997 to 2005 for this analysis.

Exhibit A3 reports the findings from this sensitivity analysis. Overall, the results suggest an even bigger contribution of treatment to the trends in survival gains. We find that 5-year cancer-related mortality fell by 11% (4.1 percentage points) from 1997 to 2005, approximately 89.8% of which is due to improved treatment.
The share due to treatment varies across cancer types in a similar way as with 3-year mortality rates. For example, approximately 69.4% of the reduction in 5-year mortality for breast cancer from 1997-2005 is due to treatment, compared to 69.2% of the reduction in 3-year mortality from 1997-2007. Similarly, for colorectal cancer patients 55.6% of the reduction in 5-year mortality from 1997-2005 is due to treatment, compared to 57.5% of the reduction in 3-year mortality from 1997-2007.

Appendix References

1. Sun, E., et al. The Contributions of Improved Therapy and Early Detection to Cancer Survival Gains, 1988-2000. in Forum for Health Economics & Policy. 2010.
2. Lakdawalla, D.N., et al., An economic evaluation of the war on cancer. Journal of Health Economics, 2010. 29(3): p. 333-346.
3. National Cancer Institute. Overview of the SEER Program. [Web Page] [cited 2014 May 2]; Available from: http://seer.cancer.gov/about/overview.html.
4. Shapiro, S., J.D. Goldberg, and G.B. Hutchison, Lead time in breast cancer detection and implications for periodicity of screening. American Journal of Epidemiology, 1974. 100(5): p. 357-366.
5. Chu, K.C., C.R. Smart, and R.E. Tarone, Analysis of breast cancer mortality and stage distribution by age for the Health Insurance Plan clinical trial. Journal of the National Cancer Institute, 1988. 80(14): p. 1125-1132.
6. Connor, R.J., K.C. Chu, and C.R. Smart, Stage-shift cancer screening model. Journal of Clinical Epidemiology, 1989. 42(11): p. 1083-1095.
7. American Joint Committee on Cancer, AJCC cancer staging manual. 7th ed. 2010, New York: Springer.
8. American Cancer Society. Treatment of invasive breast cancer, by stage. [Web Page] 2014 [cited 2014 May 2]; Available from: http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-treating-bystage.
9. Wennberg, J.E., et al., Tracking the Care of Patients with Severe Chronic Illness-The Dartmouth Atlas of Health Care 2008. 2008.
10. German, R.R., et al., The accuracy of cancer mortality statistics based on death certificates in the United States. Cancer Epidemiology, 2011. 35(2): p. 126-131.
11. Goldman, D.P., et al. The Value of Diagnostic Testing in Personalized Medicine. in Forum for Health Economics and Policy. 2013.

Exhibit A1: 3-year mortality rates by tumor type, cancer-related and all-cause, 1997 to 2007

Mortality rates are in percent. Differences are in percentage points, with the percent change relative to 1997 in parentheses.

| Tumor type | Cancer-related, 1997 | Cancer-related, 2007 | Cancer-related, difference (%) | All-cause, 1997 | All-cause, 2007 | All-cause, difference (%) |
|---|---|---|---|---|---|---|
| All Cancers Combined | 33.6 | 28.0 | -5.6 (-16.7%) | 39.8 | 32.6 | -7.1 (-17.9%) |
| Bladder | 23.9 | 22.6 | -1.3 (-5.2%) | 33.0 | 30.0 | -2.9 (-8.9%) |
| Breast | 9.8 | 7.6 | -2.2 (-22.7%) | 13.6 | 10.1 | -3.4 (-25.3%) |
| Colorectal | 35.2 | 28.4 | -6.8 (-19.4%) | 43.0 | 34.4 | -8.6 (-20.1%) |
| Kidney | 37.7 | 23.3 | -14.4 (-38.3%) | 43.9 | 27.9 | -16.1 (-36.6%) |
| Leukemia | 48.7 | 40.6 | -8.0 (-16.5%) | 58.3 | 47.4 | -10.9 (-18.7%) |
| Liver | 79.1 | 68.3 | -10.8 (-13.7%) | 87.7 | 77.8 | -9.9 (-11.3%) |
| Lung | 78.4 | 72.8 | -5.6 (-7.1%) | 85.3 | 79.2 | -6.2 (-7.3%) |
| Melanoma | 9.8 | 8.0 | -1.8 (-18.5%) | 13.6 | 10.7 | -2.9 (-21.2%) |
| Myeloma | 49.4 | 40.6 | -8.9 (-17.9%) | 62.1 | 50.6 | -11.5 (-18.6%) |
| Non-Hodgkin Lymphoma | 41.8 | 28.5 | -13.3 (-31.8%) | 48.7 | 34.1 | -14.5 (-29.9%) |
| Ovarian | 46.0 | 43.5 | -2.5 (-5.5%) | 48.4 | 46.6 | -1.8 (-3.7%) |
| Pancreatic | 90.6 | 87.0 | -3.6 (-3.9%) | 94.6 | 91.4 | -3.2 (-3.4%) |
| Prostate | 7.3 | 4.5 | -2.8 (-38.0%) | 13.8 | 8.3 | -5.6 (-40.2%) |
| Stomach | 70.4 | 63.6 | -6.8 (-9.7%) | 77.9 | 69.9 | -7.9 (-10.2%) |
| Thyroid | 6.0 | 3.6 | -2.4 (-39.5%) | 9.0 | 4.9 | -4.1 (-45.2%) |

Source: Authors’ analysis of the Surveillance, Epidemiology, and End Results (SEER) Program registry data.

Notes: Table reports the 3-year mortality for patients who were diagnosed with a malignant cancer in 1997 or 2007 in the Surveillance, Epidemiology, and End Results (SEER) Program registry. The 3-year mortality rate is defined as the percent of patients who die within 3 years of the initial diagnosis. Separate estimates are provided for cancer-related and all-cause mortality.

Exhibit A2: Reductions in 3-year all-cause mortality due to treatment and detection by tumor type, 1997-2007

[Figure: bar chart. Axis: change in 3-year mortality rate (1997 to 2007), in percentage points, scaled from -20 to 5. Bars are shown for All Cancers Combined and for each tumor type: Bladder, Breast, Colorectal, Kidney and Renal Pelvis, Leukemia, Liver and Intrahepatic Bile Duct, Lung and Bronchus, Melanoma of the Skin, Myeloma, Non-Hodgkin Lymphoma, Ovarian, Pancreatic, Prostate, Stomach, Thyroid. Legend: mortality reduction from treatment; mortality reduction from detection.]

Source: Authors’ analysis of the Surveillance, Epidemiology, and End Results (SEER) Program registry data.

Notes: Figure reports the percentage point change in 3-year mortality rate from 1997-2007 for patients diagnosed with a malignant tumor in the SEER registry, overall and by tumor type. The mortality reductions are estimated using a patient-level logistic regression model of all-cause mortality, controlling for patient demographics, area characteristics, year of diagnosis, access to radiology services, and local area treatment intensity. The dark gray bar represents the portion of the mortality reduction due to treatment innovations, while the light gray bar represents that due to gains in earlier detection.

Exhibit A3: Reductions in 5-year cancer-related mortality due to treatment and detection by tumor type, 1997-2005

[Figure: bar chart. Axis: change in 5-year mortality rate (1997 to 2005), in percentage points, scaled from -14 to 2. Bars are shown for All Cancers Combined and for each tumor type: Bladder, Breast, Colorectal, Kidney and Renal Pelvis, Leukemia, Liver and Intrahepatic Bile Duct, Lung and Bronchus, Melanoma of the Skin, Myeloma, Non-Hodgkin Lymphoma, Ovarian, Pancreatic, Prostate, Stomach, Thyroid. Legend: mortality reduction from treatment; mortality reduction from detection.]

Source: Authors’ analysis of the Surveillance, Epidemiology, and End Results (SEER) Program registry data.

Notes: Figure reports the percentage point change in 5-year mortality rate from 1997-2005 for patients diagnosed with a malignant tumor in the SEER registry, overall and by tumor type.
The mortality reductions are estimated using a patient-level logistic regression model of cancer-related mortality, controlling for patient demographics, area characteristics, year of diagnosis, access to radiology services, and local area treatment intensity. The dark gray bar represents the portion of the mortality reduction due to treatment innovations, while the light gray bar represents that due to gains in earlier detection.