Download Analytical models for estimating cause-specific

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Forecasting wikipedia , lookup

Choice modelling wikipedia , lookup

Data assimilation wikipedia , lookup

Transcript
1
Additional File 4: Supplementary Materials:
2
3
Analytical models for estimating cause-specific mortality
4
GBD 2010
5
The GBD 2010 modeling process used the following steps for pneumonia and diarrhea mortality
6
estimation: 1) data identification, 2) data processing, 3) Cause of Death Ensemble models
7
(CODEm) for modeling acute respiratory infections (ARI) and diarrheal deaths, 4) proportional
8
model for dividing ARI into lower respiratory infections (LRI) and upper respiratory infections
9
(URI) deaths, 5) DisMod-MR for estimating deaths due to LRI and diarrhea etiologies, and 6)
10
CoDCorrect for scaling cause-specific mortality fractions to sum to one (Figure 4).
11
12
The first step of their analytical approach for estimating all causes of death for all age groups,
13
including under-five (U5) mortality due to ARI and diarrhea and their respective etiologies, GBD
14
2010 identified all vital registration (VR) data sources, verbal autopsy (VA) studies, and
15
Demographic Health Surveillance Sites data. After identifying all possible data, including
16
published and unpublished sources, step two involved the preprocessing of this data in three
17
ways. First, undefined or ill-defined causes of death (“garbage” codes) were redistributed. For
18
example, deaths coded as fever or extreme dehydration were reassigned to pneumonia,
19
malaria, and other relevant target causes. Next, different versions of the International
20
Classification of Disease (e.g. ICD-8 and ICD-9) were mapped to match the ICD-10 causes of
21
death. Data were adjusted to be comparable over time. Finally, the resulting data were then
22
assessed for outliers. IHME’s website provides data visualizations which present data pre-and-
23
post adjustment, as well as which data were identified as outliers:
1
24
http://viz.healthmetricsandevaluation.org/cod/?geo=USA&sex=1&adjType=0&cause=A.1.1&uni
25
t=8&minMax=self&userType=external.
26
27
In step three, GBD 2010 used CODEm to model mortality for 133 of the 235 estimated causes of
28
death, including ARI and diarrhea. Because there are more data available for ARI overall (i.e. LRI
29
and URI combined), GBD 2010 began the modeling process by using CODEm to estimate the
30
mortality due to ARI.
31
In step four, GBD 2010 used a fixed proportional model to split ARI deaths into URI and LRI
32
deaths using all available cause of death data that distinguishes between URI and LRI. Ensemble
33
modeling is a statistical technique in which the predictions of a group of base models are
34
combined to generate a composite prediction which is believed to be more accurate than the
35
prediction of any single model [1]. CODEm uses a three-step process to build models. 1) The
36
first step is to create a large range of possible statistical models for each cause based on
37
published studies and plausible relationships between covariates and the specific cause of
38
death (the component models). 2) The second step is to assess the performance, or predictive
39
capacity, of the component models and to generate and assess the ensemble models by
40
randomly excluding 30% of the data and generating the component models using the remaining
41
70% of the data. Next, CODEm assesses the performance of the component model by using half
42
of the excluded data (i.e., 15% of the total data) to predict mortality and to build various
43
ensemble models from the component models. The last 15% of the data are then used to
44
assess the predictive capacity of the ensemble models and all the component models. Unique
45
to this process, data are randomly held out from the analysis (20 times for diarrhea and 25
2
46
times for ARI) using the same pattern of missing data as is present in the original database. This
47
allows the predictive validity test to best represent the real or observed pattern of missing data.
48
3) The final step is to pick the best performing model, usually an ensemble model, based on the
49
predictive validity test described above and estimate the cause-specific mortality rates using
50
the full dataset [2]. CODEm provides a cause fraction and total number of deaths due to ARI
51
and diarrhea for males and females aged 0–6, 7–27, and 28–356 days, and 1–4 years,
52
independently. When the cause fractions for each cause of death are summed within each age
53
group, the total may exceed one.
54
In step five, GBD 2010 built a database using data from existing systematic reviews and
55
additional literature searches to split LRI and diarrhea into their respective etiologies.
56
Calculations were performed on the number of deaths due to five etiologies of LRI (influenza,
57
pneumococcal pneumonia, H. influenzae type b (Hib) pneumonia, respiratory syncytial virus
58
(RSV), and other LRI) and ten etiologies of diarrhea (rotavirus, enteropathogenic E. coli,
59
enterotoxigenic E. coli, Campylobacter, Cryptosporidium, Shigella, Cholera, Amoebiasis, other
60
salmonella infection (Non-typhoidal salmonella: NTS), and other diarrhea pathogens) using the
61
DisMod-MR data model, which is a Bayesian meta-regression tool [3, 4]. DisMod-MR is a
62
nonlinear mixed effects model capable of integrating data on disease incidence, prevalence,
63
remission, and mortality. When applied to estimating etiologic fractions of disease, DisMod-MR
64
uses an age-standardizing negative-binomial spline model to combine measurements of disease
65
from heterogeneous age groups, and from different years and geographic areas [5]. Random
66
effects of country, region, and super-region quantify the geographic variation in the etiologic
67
fractions. Fixed effects capture study-level variation, such as sex, method of diagnosis, and
3
68
setting, and country-level variation, such as country income, malnutrition prevalence, and
69
smoking prevalence. DisMod-MR produces point estimates and uncertainty intervals for each
70
etiologic fraction for each country, sex, and year, using an empirical Bayes method.
71
72
Step six in the analytical approach is to use CoDCorrect to adjust the cause-specific mortality
73
fractions to sum to one. The algorithm takes into account the fact that certain causes of death
74
were known with greater precision than others and ‘penalizes’ those causes of death.
75
CoDCorrect was applied in a hierarchal fashion based on the GBD 2010 cause hierarchy. Level 1
76
causes are known with the greatest precision and Level 3 with the least precision. For the
77
causes under study in this discussion paper, diarrhea and ARI (LRI and URI combined), and all
78
other Level 1 causes are fit into the mortality envelope generated for all-cause mortality. As
79
described above, a fixed proportion model is then used to distribute ARI into the Level 2 causes:
80
LRI and URI. The LRI and diarrhea etiologies (Level 3) are then fit into the LRI and diarrhea
81
mortality envelopes respectively. This is implemented at the draw level (1000 draws that
82
represent uncertainty) [2].
83
84
CHERG
85
The CHERG approach used the following steps 1) data source identification, 2) data processing,
86
3) use of one of three models to estimate neonatal and postneonatal mortality separately
87
according to mortality level, disease profile, and data availability, 4) development of a disease-
88
specific model to estimate neonatal pneumonia and tetanus, as well as measles and pertussis
89
among older children, and 5) adjusting deaths to sum to the total U5 mortality (Figure 5) [6].
4
90
91
After identifying data sources and cleaning the data, country-specific proportions of neonatal
92
and postneonatal deaths were assigned to a specific cause using one of three different methods
93
depending on the coverage and quality of VR data and the under-five mortality rate (U5MR) in
94
each country:
95
96
1. Use of VR data: used when a country's coverage of the VR system was high (>80%) and
97
quality was perceived as good. Data were adjusted for coverage incompleteness if
98
needed by reassigning illogical causes of death and then grouped according to standard
99
ICD-10 codes.
100
2. Use of a vital-registration-data-based multi-cause model (VRMCM): used in countries
101
with inadequate vital registration system and lower U5MRs (≤35 deaths per 1000 live
102
births). CHERG used univariate meta-regression and multivariate ordinary least squares
103
regression to identify explanatory variables (risk factors) for the log ratio of each cause
104
with pneumonia as the reference category. CHERG applied the results of the model to
105
calculate the proportional mortality for each of the eight etiology groupings.
106
3. Use of a verbal-autopsy-data-based multi-cause model (VAMCM): used in countries with
107
poor VR coverage and high U5MRs (>35 per 1000 live births). CHERG used ordinary least
108
squares to determine inclusion of risk factor data in a multinomial logistic regression
109
(with pneumonia as the reference category) [6]. CHERG applied country- and year-
110
specific data points for each covariate to calculate the proportional mortality for each of
111
the eight etiology groupings.
5
112
113
In an update of the CHERG approach by Liu et al., neonatal pneumonia deaths were estimated
114
separately from the VAMCM and VRMCM models [6]. An additional 36 studies where deaths
115
due to neonatal pneumonia were reported were used to estimate the proportion of neonatal
116
deaths attributed to pneumonia using a logistic regression model.
117
118
Post-hoc adjustments were made to the country-level cause of death (COD) estimates for
119
pneumonia, malaria, and meningitis to account for health intervention coverage and
120
effectiveness. The deaths ‘saved’ by the intervention were reallocated to the other COD
121
categories by relative importance. Pneumonia and meningitis deaths were adjusted for the use
122
and effectiveness of three doses of Hib vaccine. Malaria deaths were adjusted for the use and
123
effectiveness of insecticide treated nets.
124
125
For all neonatal and postneonatal VRMCMs and VAMCMs, 10% of the data were reserved for
126
cross-validation of the models to compare out-of-sample prediction accuracy between model
127
versions. The best-performing model was chosen as the final model with the smallest absolute
128
prediction error. Bootstrapping was used to construct uncertainty ranges for estimates
129
produced by VRMCMs and VAMCMs.
130
131
CHERG’s etiology-specific mortality estimates for pneumonia (Hib and pneumococcus only) and
132
diarrhea were calculated using separate models [7, 8]. The conceptual and statistical approach
133
for the pneumococcus- and Hib-specific pneumonia mortality estimates was based on a
6
134
previous estimation project from the WHO, which aimed to estimate country-level
135
pneumococcal and Hib morbidity and mortality for pneumonia, meningitis and non-
136
pneumonia/non-meningitis in the year 2000 [9, 10]. For pneumonia etiologies, the conceptual
137
approach applied the country-specific pneumonia mortality values to the proportion of cases
138
due to pneumococcus and Hib. The etiologic fractions for the deaths were assumed to be the
139
same as the proportion of chest x-ray confirmed pneumonias that are due to Hib and
140
pneumococcus. In other words, if 33% of chest x-ray confirmed pneumonias were attributed to
141
pneumococcus, then 33% of pneumonia deaths were attributed to pneumococcus. The latter
142
proportion was calculated using values from randomized controlled trials of pneumococcal or
143
Hib vaccines and generating a single global point estimate through a random effects meta-
144
analysis. CHERG assumed that 21.3% of pneumonia deaths were due to Hib and 33.3% were
145
due to pneumococcus. These two estimates were determined through meta-analyses of
146
vaccine efficacy studies [7, 9, 10]. This approach assumes that the underlying etiologic
147
distribution of pneumonia in the absence of Hib and pneumococcal vaccination is the same
148
across all geographies. Because of sparse data for estimating the fraction of pneumonia deaths
149
from viral etiologies, point estimates were not available for RSV and influenza mortality;
150
therefore, the remaining 46% of pneumonia deaths were not assigned to an etiology. However,
151
Nair et al. (2010 and 2011) from CHERG have published mortality upper and lower “bounds”
152
which will be improved due to a current data collecting exercise funded by BMGF [11, 12].
153
154
To estimate the etiologies of diarrheal diseases, CHERG conducted a systematic review of
155
studies that analyzed stool samples to assess causation, focusing on 13 specific pathogens:
7
156
rotavirus, EPEC, ETEC, Salmonella spp. (excluding S. typhi), Shigella spp., Campylobacter spp.,
157
Vibrio cholerae O1 and O139, Giardia lamblia, Cryptosporidium spp., Entamoeba histolytica,
158
human Caliciviruses (genogroup I and II norovirus and sapovirus), astrovirus, and enteric
159
adenovirus [8]. CHERG also used 2011 World Health Organization (WHO) Rotavirus Surveillance
160
Network data from countries that have not introduced the rotavirus vaccine [13]. Rather than
161
taking a modeling approach to estimating the etiologic fraction of each pathogen, CHERG used
162
the age-adjusted median proportion from the studies found in the systematic review. They
163
examined the data in several ways, including comparing the median etiologic proportions of
164
studies that examined only one pathogen with those that examined 5–13 pathogens and scaling
165
the etiologic proportions so that they summed to one. CHERG applied both the unscaled and
166
scaled etiologic proportions to the total estimated number of diarrheal deaths to produce
167
global estimates of cause-specific diarrheal deaths.
168
Inclusion criteria for data
169
GBD 2010
170
The GBD 2010 LRI etiology models used data from existing systematic reviews conducted by
171
Black et al. [14], O’Brien et al. [10], Watt et al. [9], and a database of published studies of ARI
172
compiled by the WHO. GBD 2010 included all studies from these reviews after applying
173
additional exclusion criteria: studies not general population-representative (e.g., studies in
174
refugee populations); studies that did not provide data on the prevalence, incidence or
175
mortality due to LRI; studies that provided no numerator and denominator counts; studies
176
conducted before 1980; and studies with a denominator less than 50. An additional systematic
8
177
search was conducted using the same inclusion and exclusion rules for studies published
178
between 2006–2011, which identified an additional 54 studies for analysis [2]. Ten vaccine
179
efficacy studies were also used in the GBD 2010 LRI etiology models. Appendix 2 lists all studies
180
used to model pneumonia etiologies by both GBD 2010 and CHERG.
181
182
The GBD 2010 diarrhea etiology models used the following exclusion criteria for studies: case-
183
control studies that did not include data for controls, studies that provided no measurements
184
(only estimates), studies that did not include the proportion of diarrhea cases with positive
185
laboratory tests for at least one pathogen, studies that were conducted for fewer than 12
186
months (which did not account for potential effects of seasonality), studies that only reported
187
data for HIV-positive patients, studies that assessed travelers, and studies of populations
188
experiencing outbreaks of diarrhea due to a pathogen. GBD 2010 also used unpublished data
189
from a large cohort study in Pakistan. In total they included 189 studies from 1975–2010 with
190
rotavirus providing the largest number of studies at 126 [2].
191
Model Covariates:
192
Pneumonia Covariates
193
To estimate the number of deaths due to ARI, GDB 2010 used the following covariates in the
194
CODEm model: Hib3 vaccine coverage, health system access, malnutrition (proportion <2 SD
195
weight for age), indoor air pollution, outdoor air pollution, DTP3 coverage (proportion), rainfall
196
(quintiles 4-5), proportion of population with access to improved sanitation, proportion of
197
population with access to improved water, average years of education per capita, log lag-
9
198
distributed income (USD per capita), and proportion of population in areas over 1,000 people
199
per square kilometer. GBD 2010 used the following covariates to estimate LRI mortality for U5
200
children: healthcare access, Hib3 vaccine coverage, access to sanitation and clean water,
201
malnutrition, education, population density, and the log of lag-distributed income.
202
203
For the LRI etiology models, Hib3 vaccine coverage was included as a covariate to estimate the
204
proportion of LRI due to Hib, and health system access as a covariate for all of the five
205
etiologies. GBD 2010 included as a study-level covariate whether the study investigated the
206
etiology of LRI in inpatients or in an outpatient/community-based setting. Estimates for
207
inpatients were assumed to be representative of severe cases of LRI, whereas estimates for
208
outpatient or community-based settings were assumed to be representative of all cases of LRI.
209
For the efficacy study data, they assumed that clinical pneumonia was analogous to all cases of
210
LRI and chest radiography-positive pneumonia to be analogous to severe cases of LRI.
211
Proportions corresponding to clinical and chest radiography-positive pneumonia were
212
therefore coded as outpatient and inpatient, respectively. Inpatient studies were set as the
213
reference category in the model so that the resultant proportions estimated would represent
214
severe cases of LRI.
215
216
Because pneumonia was used as the reference category for CHERG’s VAMCM model, no
217
specific covariates were used to estimate postneonatal pneumonia mortality. Instead, all
218
covariates used to estimate the other seven causes of death informed the estimates for
219
pneumonia mortality. For neonates, the following covariates were used in the VAMCM model
10
220
for severe infection: neonatal mortality rate, DPT3 vaccination coverage, Quad-DPT coverage,
221
and period (early, late, or overall). In the VRMCM for severe infection, early neonatal was
222
modeled using access to antenatal care, female literacy and region, while late neonatal was
223
modeled using female literacy, under-five mortality rate, births with skilled attendant, low birth
224
weight, and region as covariates. The VAMCM model for neonatal pneumonia deaths used
225
female literacy, low birth weight, birth with skilled attendant, and tetanus toxoid vaccine
226
coverage at birth as covariates. In the VRMCM model, female literacy, neonatal morality rate,
227
births will skilled attendant, and region were used as covariates for pneumonia deaths in early
228
neonates, while access to antenatal care, region, GFR, and births with skilled attendant were
229
used for late neonatal pneumonia. For the pneumonia etiology model, estimates for
230
pneumococcal pneumonia and Hib deaths were based on each country’s pneumococcal and
231
Hib3 vaccination coverage, respectively.
232
233
Diarrhea Covariates
234
GBD 2010 estimated the number of deaths due to diarrhea using the following covariates:
235
health care access, rotavirus vaccine coverage, access to sanitation and water, percentage of
236
children under 2 malnourished, education, population density, and the log of lag-distributed
237
income ($ per capita).
238
239
CHERG used the following covariates in their neonatal VAMCM model for diarrhea deaths:
240
neonatal mortality rate, DPT vaccination coverage, Quad-DPT vaccination coverage, and period
241
(early, late, overall). For the postneonatal VRMCM model for diarrhea deaths the following
11
242
covariates were used: U5 population, coverage of the third dose of DPT3 vaccine, WHO Europe
243
Region vs. other WHO regions, and gross national income (GNI, per capita). For the VAMCM
244
model for postneonatal diarrhea deaths year of study, percent urban population, U5 mortality
245
rate, and oral rehydration therapy coverage were used as covariates.
246
247
References:
248
249
1. Foreman KJ, Lozano R, Lopez AD, Murray CJ: Modeling causes of death: an integrated
approach using CODEm. Popul Health Metr 2012, 10:1.
250
251
252
253
254
255
2. Lozano R, Naghavi M, Foreman K, Lim S, Shibuya K, Aboyans V, Abraham J, Adair T, Aggarwal
R, Ahn SY, Alvarado M, Anderson HR, Anderson LM, Andrews KG, Atkinson C, Baddour LM,
Barker-Collo S, Bartels DH, Bell ML, Benjamin EJ, Bennett D, Bhalla K, Bikbov B, Bin Abdulhak A,
Birbeck G, Blyth F, Bolliger I, Boufous S, Bucello C, Burch M, et al.: Global and regional mortality
from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the
Global Burden of Disease Study 2010. Lancet 2012, 380:2095–128.
256
257
258
259
260
261
3. Vos T, Flaxman AD, Naghavi M, Lozano R, Michaud C, Ezzati M, Shibuya K, Salomon J a,
Abdalla S, Aboyans V, Abraham J, Ackerman I, Aggarwal R, Ahn SY, Ali MK, Alvarado M,
Anderson HR, Anderson LM, Andrews KG, Atkinson C, Baddour LM, Bahalim AN, Barker-Collo S,
Barrero LH, Bartels DH, Basáñez M-G, Baxter A, Bell ML, Benjamin EJ, Bennett D, et al.: Years
lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a
systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012, 380:2163–2196.
262
263
264
4. Ferrari AJ, Charlson FJ, Norman RE, Flaxman AD, Patten SB, Vos T, Whiteford H a.: The
Epidemiological Modelling of Major Depressive Disorder: Application for the Global Burden of
Disease Study 2010. PLoS One 2013, 8:e69637.
265
266
5. Barendregt JJ, Van Oortmarssen GJ, Vos T, Murray CJ: A generic model for the assessment of
disease epidemiology: the computational basis of DisMod II. Popul Health Metr 2003, 1:4.
267
268
269
6. Liu L, Johnson HL, Cousens S, Perin J, Scott S, Lawn JE, Rudan I, Campbell H, Cibulskis R, Li M,
Mathers C, Black RE: Global, regional, and national causes of child mortality: an updated
systematic analysis for 2010 with time trends since 2000. Lancet 2012, 379:2151–61.
270
271
7. Rudan I, O’Brien KL, Nair H, Liu L, Theodoratou E, Qazi S, Lukšić I, Fischer Walker CL, Black RE,
Campbell H: Epidemiology and etiology of childhood pneumonia in 2010: estimates of
12
272
273
incidence, severe morbidity, mortality, underlying risk factors and causative pathogens for
192 countries. J Glob Health 2013, 3:10401.
274
275
276
8. Lanata CF, Fischer-Walker CL, Olascoaga AC, Torres CX, Aryee MJ, Black RE: Global causes of
diarrheal disease mortality in children <5 years of age: a systematic review. PLoS One 2013,
8:e72788.
277
278
279
9. Watt JP, Wolfson LJ, O’Brien KL, Henkle E, Deloria-Knoll M, McCall N, Lee E, Levine OS, Hajjeh
R, Mulholland K, Cherian T: Burden of disease caused by Haemophilus influenzae type b in
children younger than 5 years: global estimates. Lancet 2009, 374:903–11.
280
281
282
10. O’Brien KL, Wolfson LJ, Watt JP, Henkle E, Deloria-Knoll M, McCall N, Lee E, Mulholland K,
Levine OS, Cherian T: Burden of disease caused by Streptococcus pneumoniae in children
younger than 5 years: global estimates. Lancet 2009, 374:893–902.
283
284
285
11. Nair H, Gessner BD, Nokes DJ, Theodoratou E, Rudan I, Madhi S a, Simões EA, Campbell H:
Childhood mortality due to respiratory syncytial virus – Authors’ reply. Lancet 2010, 376:872–
873.
286
287
288
289
290
291
12. Nair H, Brooks WA, Katz M, Roca A, Berkley J a, Madhi S a, Simmerman JM, Gordon A, Sato
M, Howie S, Krishnan A, Ope M, Lindblade K a, Carosone-Link P, Lucero M, Ochieng W,
Kamimoto L, Dueger E, Bhat N, Vong S, Theodoratou E, Chittaganpitch M, Chimah O, Balmaseda
A, Buchy P, Harris E, Evans V, Katayose M, Gaur B, O’Callaghan-Gordo C, et al.: Global burden of
respiratory infections due to seasonal influenza in young children: a systematic review and
meta-analysis. Lancet 2011, 378:1917–30.
292
293
13. World Health Organization: Rotavirus surveillance worldwide-2009. Wkly Epidemiol Rec
2011:173–176.
294
295
296
14. Black RE, Cousens S, Johnson HL, Lawn JE, Rudan I, Bassani DG, Jha P, Campbell H, Walker
CF, Cibulskis R, Eisele T, Liu L, Mathers C: Global, regional, and national causes of child
mortality in 2008: a systematic analysis. Lancet 2010, 375:1969–87.
13