Relating models to data: A review
P.D. O’Neill
University of Nottingham
Caveats
- Scope is strictly limited
- Review with a view to future challenges

Outline
1. Why relate models to data?
2. How to relate models to data
3. Present and future challenges
1. Why relate models to data?
1. Scientific hypothesis testing
e.g. Can within-host heterogeneity of susceptibility to HIV explain decreasing prevalence?
e.g. Did control measures alone control SARS in Hong Kong?
2. Estimation
e.g. What is R0?
e.g. What is the efficacy of a vaccine?
3. What-if scenarios
e.g. What would have happened if transport restrictions were in place sooner in the UK foot and mouth outbreak?
e.g. How much would school closure prevent spread of influenza?
4. Real-time analyses
e.g. Has the epidemic finished yet?
e.g. Are control measures effective?
5. Calibration/parameterisation
e.g. What range of parameter values is sensible for simulation studies?
2. How to relate models to data
2.1 Fitting deterministic models
Options include
(i) “Estimation from the literature”
(ii) Least-squares / minimise metric
(iii) Bayesian methods (Elderd, Dukic and Dwyer, 2006)
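As an illustration of option (ii), the following is a minimal sketch of a least-squares fit of a deterministic SIR model to daily incidence. Everything specific here is an assumption made for the example rather than something taken from the talk: the Euler integration scheme, the grid search over the infection rate `beta`, the known removal rate `gamma`, and the synthetic "data" generated from the model itself.

```python
def sir_incidence(beta, gamma, n, i0, days, steps_per_day=20):
    """Daily new infections from a deterministic SIR model (Euler steps)."""
    s, i = float(n - i0), float(i0)
    dt = 1.0 / steps_per_day
    daily = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt
            removals = gamma * i * dt
            s -= infections
            i += infections - removals
            new_cases += infections
        daily.append(new_cases)
    return daily


def sum_of_squares(observed, fitted):
    """The metric to minimise: squared distance between data and model."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def fit_beta(observed, gamma, n, i0, grid):
    """Return the value of beta on the grid that minimises the sum of squares."""
    return min(
        grid,
        key=lambda b: sum_of_squares(
            observed, sir_incidence(b, gamma, n, i0, len(observed))
        ),
    )


# Synthetic "data" (hypothetical): model output at a known beta,
# so the fit should recover that value.
TRUE_BETA, GAMMA, N, I0 = 0.5, 0.2, 1000, 5
observed = sir_incidence(TRUE_BETA, GAMMA, N, I0, days=30)
grid = [0.30 + 0.01 * k for k in range(41)]  # 0.30, 0.31, ..., 0.70
beta_hat = fit_beta(observed, GAMMA, N, I0, grid)
print(round(beta_hat, 2))
```

With real data the minimum would not be zero, and a numerical optimiser would normally replace the grid search.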
2.2 Fitting stochastic models
Available methods depend heavily on the model and the data.
(i) Explicit likelihood
e.g. Longini-Koopman model for household data (Longini and Koopman, 1982)
SEIR model within household:
P(avoid infection from housemate) = p
P(avoid infection from outside) = q
Given data on final outcome in (independent) households, can formulate likelihood L(p, q).
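A sketch of how such a likelihood can be computed, using the standard final-size recursion for this kind of household model with p and q as defined above. The function names, the grid search and the household counts are all hypothetical and chosen only to make the example self-contained.

```python
import math


def final_size_probs(k, p, q):
    """P(j of k household members are ever infected), for j = 0..k.

    p: probability of avoiding infection from one infected housemate
    q: probability of avoiding infection from outside the household
    """
    full = [1.0]  # full[j] = P(all j members infected in a household of size j)
    for size in range(1, k + 1):
        partial = [
            math.comb(size, j) * full[j] * q ** (size - j) * p ** (j * (size - j))
            for j in range(size)
        ]
        full.append(1.0 - sum(partial))
    return [
        math.comb(k, j) * full[j] * q ** (k - j) * p ** (j * (k - j))
        for j in range(k + 1)
    ]


def log_likelihood(counts, p, q):
    """log L(p, q) for independent households.

    counts maps (household size, number ever infected) to number of households.
    """
    total = 0.0
    for (size, infected), n_households in counts.items():
        prob = final_size_probs(size, p, q)[infected]
        if prob <= 0.0:
            return -math.inf
        total += n_households * math.log(prob)
    return total


# Hypothetical final-outcome data for households of size 2.
counts = {(2, 0): 30, (2, 1): 10, (2, 2): 5}

# Crude maximum likelihood by grid search over (p, q) in (0, 1) x (0, 1).
grid = [k / 100 for k in range(1, 100)]
p_hat, q_hat = max(
    ((p, q) for p in grid for q in grid),
    key=lambda pq: log_likelihood(counts, pq[0], pq[1]),
)
print(p_hat, q_hat)
```

For a household of size 2 the recursion gives P(0 infected) = q^2 and P(1 infected) = 2(1 - q)qp, which can be checked by hand.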
(i) Explicit likelihood (continued)
Examples of related household models:
Bayesian analysis (O’Neill et al., 2000)
Multi-type models (van Boven et al., 2007)
(i) Explicit likelihood (continued)
Methods include
Maximum likelihood (e.g. Longini and Koopman, 1982)
EM algorithm (e.g. Becker, 1997)
MCMC (e.g. O’Neill et al., 2000)
Rejection sampling (e.g. Clancy and O’Neill, 2007)
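To show the idea behind the last method, here is a toy rejection sampler that draws exactly from a posterior distribution when the likelihood is explicit. It is not the algorithm of Clancy and O’Neill (2007). The setting is an assumption made for illustration: `n` individuals each independently avoid infection from outside with probability q, `escaped` of them do so, and q has a Uniform(0, 1) prior. The numbers are hypothetical.

```python
import random


def rejection_sample_posterior(escaped, n, n_samples, rng):
    """Exact posterior draws of q under a Uniform(0, 1) prior.

    Likelihood: q**escaped * (1 - q)**(n - escaped).
    A draw from the prior is accepted with probability L(q) / max L.
    """
    def likelihood(q):
        return q ** escaped * (1.0 - q) ** (n - escaped)

    l_max = likelihood(escaped / n)  # likelihood at its maximiser
    samples = []
    while len(samples) < n_samples:
        q = rng.random()                        # propose from the prior
        if rng.random() * l_max <= likelihood(q):
            samples.append(q)                   # accept
    return samples


rng = random.Random(1)
samples = rejection_sample_posterior(escaped=14, n=20, n_samples=2000, rng=rng)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 3))
```

In this toy case the posterior is Beta(escaped + 1, n - escaped + 1), so the sample mean can be compared with (escaped + 1) / (n + 2).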
(ii) No explicit likelihood
Can arise due to model complexity and/or insufficient data
[Figure: two-level mixing model; legend: sample, ever-infected, never-infected, unseen]
Individual-based transmission models involve unseen infection times.
Even detailed data from studies generally only give bounds on unseen infection times, e.g. infection occurs between the last -ve test and the first +ve test.
(ii) No explicit likelihood (continued)
Solutions include:
Use a simpler approximating model, e.g. a pseudolikelihood (Ball, Mollison and Scalia-Tomba, 1997)
[Figure: two-level mixing model with explicit interactions between households; legend: ever-infected, never-infected]
[Figure: two-level mixing model -> independent households model; legend: ever-infected, never-infected]
In a large population, households are approximately independent.
(ii) No explicit likelihood (continued)
Solutions include:
Use a simpler approximating model, e.g. a discrete-time model instead of a continuous-time model (e.g. Lekone and Finkenstädt, 2006)
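A sketch of why discretising time helps: if events are counted per time step, the likelihood becomes a product of binomial terms. The version below is a simplified SIR chain-binomial model assumed for illustration. It is not the SEIR model with control intervention of Lekone and Finkenstädt (2006), and the function and parameter names are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), handling p = 0 and p = 1."""
    if k < 0 or k > n:
        return -math.inf
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def chain_binomial_loglik(beta, gamma, n, s0, i0, new_infections, new_removals):
    """Log-likelihood of per-step counts under a discrete-time SIR model.

    In each step, every susceptible is infected with probability
    1 - exp(-beta * I / n) and every infective is removed with
    probability 1 - exp(-gamma).
    """
    s, i = s0, i0
    loglik = 0.0
    for b, c in zip(new_infections, new_removals):
        p_inf = 1.0 - math.exp(-beta * i / n)
        p_rem = 1.0 - math.exp(-gamma)
        loglik += binom_logpmf(b, s, p_inf) + binom_logpmf(c, i, p_rem)
        s, i = s - b, i + b - c
    return loglik


# Hypothetical counts for a population of 10 with one initial infective.
print(chain_binomial_loglik(1.0, 0.5, 10, 9, 1, [1, 2, 1], [0, 1, 1]))
```

Counts that are impossible under the model, such as more new infections than remaining susceptibles, give a log-likelihood of minus infinity.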
(ii) No explicit likelihood (continued)
Solutions include:
Direct approach, e.g. martingale methods (Becker, 1989)
(ii) No explicit likelihood (continued)
Solutions include:
Data augmentation: add in “missing data” or extra model parameters to formulate a likelihood
(ii) No explicit likelihood: Data augmentation (continued)
Common example
- model describes individual-to-individual transmission
- observe times of case ascertainment, test results, etc., but not times of infection/exposure
- augment data with missing infection/exposure times
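The point of the augmentation is that once infection times are added, the likelihood becomes easy to write down. The sketch below evaluates the log-likelihood of a Markov SIR model given both infection and removal times. The model is an assumption for illustration (infection rate beta * S * I / n, removal rate gamma * I, one initial infective whose infection time starts the clock), and the names are hypothetical. An MCMC scheme would alternate between updating the parameters and proposing new values for the unobserved infection times, calling a function like this at each step.

```python
import math


def augmented_loglik(beta, gamma, n, infection_times, removal_times):
    """Log-likelihood of a Markov SIR epidemic given all event times.

    The earliest infection is treated as the initial infective and is
    conditioned upon, so it contributes no term of its own.
    """
    events = [(t, "I") for t in infection_times]
    events += [(t, "R") for t in removal_times]
    events.sort()
    t_prev, kind = events[0]
    if kind != "I":
        return -math.inf  # a removal cannot precede the first infection
    s, i = n - 1, 1
    loglik = 0.0
    for t, kind in events[1:]:
        # No event occurs in (t_prev, t): exp(-total rate * elapsed time).
        loglik -= (beta * s * i / n + gamma * i) * (t - t_prev)
        # The event at time t occurs at its own rate.
        rate = beta * s * i / n if kind == "I" else gamma * i
        if rate <= 0.0:
            return -math.inf  # e.g. an infection while nobody is infectious
        loglik += math.log(rate)
        if kind == "I":
            s, i = s - 1, i + 1
        else:
            i -= 1
        t_prev = t
    return loglik


# Population of 2: infections at times 0 and 1, removals at times 2 and 3.
print(augmented_loglik(2.0, 1.0, 2, [0.0, 1.0], [2.0, 3.0]))
```

Imputed infection times that are incompatible with the observed removals, such as an infection occurring when no infective is present, receive a log-likelihood of minus infinity and would be rejected by the sampler.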
[Figure, after Höhle et al. (2005): timeline showing exposure time (TE), start of infectivity (TI) and end of infectivity, which are not observed; the observed data are +ve and -ve test results]
(ii) No explicit likelihood: Data augmentation (continued)
Data-augmentation methods include
MCMC (e.g. Gibson and Renshaw, 1998; O’Neill and Roberts, 1999; Auranen et al., 2000)
EM algorithm (e.g. Becker, 1997)
(ii) No explicit likelihood: Data augmentation (continued)
Data-augmentation methods can also be used in less “obvious” settings, e.g. final size data for complex models
[Figure: two-level mixing model with final size data; legend: ever-infected, never-infected]
Augment parameter space using links to describe potential infections (Demiris and O’Neill, 2005)
3. Present & future challenges
3.1 Large populations/complex models
Current methods often struggle with large-scale problems, e.g.
- large population
- many missing data
- many hard-to-estimate parameters/covariates
e.g. UK foot and mouth outbreak 2001
Keeling et al. (2001) use a stochastic discrete-time model, parameterised via likelihood estimation and tuning/simulation.
Attempting to fit this kind of model using “standard” Bayesian/MCMC methods does not work well.
Large data set and many missing data can cause problems for standard (and also non-standard) MCMC
e.g. Measles data
Cauchemez and Ferguson (2008) discuss the problems that arise when fitting a standard SIR model to large-scale, temporally aggregated data in a large population using standard methods.
Problems of this kind are usually tackled via approximations (e.g. of the model itself).
Challenge: Can generic non-approximate methods be found?
3.2 Data augmentation
Comment: this technique is surprisingly powerful and is (probably) underdeveloped.
e.g. Cauchemez and Ferguson (2008) use a novel MCMC data-augmentation scheme in which a diffusion model approximates an SIR epidemic model.
e.g. For final size data, instead of imputing a graph describing infection pathways, one could impute generations of infection (joint work with Simon White).
This can lead to much faster MCMC algorithms.
[Figure: two-level mixing model (ever-infected, never-infected): imputing edges in graph]
[Figure: two-level mixing model (ever-infected, never-infected) with individuals labelled by generation of infection; infection chain = {1, 3, 1, 2, 1}]
e.g. Augmented data can also (sometimes) be used to bound quantities of interest.
Clancy and O’Neill (2008) show how to obtain stochastic bounds on R0 and other quantities by considering “minimal” and “maximal” configurations of unobserved infection times in an SIR model.
[Figure: timeline of observed removal times and imputed infection times]
[Figure: imputed infection times placed as soon as possible before the observed removal times]
[Figure: imputed infection times placed as late as possible before the observed removal times]
Can show that “soon as possible” maximises R0, but that the minimal value is not necessarily given by “late as possible”; use linear programming to find the actual solution.
General idea also applicable to final outcome data.
3.3 Model fit and model choice
Various methods are used in the literature to assess model fit, e.g. simulation-based methods; use of the Bayesian predictive distribution; standard methods where applicable; Bayesian p-values
Likewise for model choice: methods include AIC and RJMCMC.
Challenge: better understanding of the pros and cons of such methods.
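For reference, AIC is 2k - 2 log L, where k is the number of fitted parameters and L the maximised likelihood, and the model with the smallest value is preferred. A minimal sketch follows; the model names and maximised log-likelihood values are hypothetical and serve only to show the arithmetic.

```python
def aic(max_loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * max_loglik


# Hypothetical fits: (maximised log-likelihood, number of parameters).
candidates = {
    "one-parameter model": (-105.2, 1),
    "two-parameter model": (-101.9, 2),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here the extra parameter improves the log-likelihood by more than the penalty of 2, so the two-parameter model has the lower AIC.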
References
B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103, 15693-15697.
I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38, 115-126.
P.D. O’Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Applied Statistics 49, 517-542.
M. van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of Avian Influenza virus in human households. PLoS Computational Biology 3, 1394-1402.
D. Clancy and P.D. O’Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34, 259-274.
N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6, 24-37.
F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7, 46-89.
M. Höhle, E. Jørgensen and P.D. O’Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54, 349-366.
N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15, 19-40.
P.D. O’Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162, 121-129.
K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95, 1044-1053.
P.E. Lekone and B.F. Finkenstädt (2006) Statistical Inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62, 1170-1177.
M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK Foot and Mouth Epidemic: Stochastic Dispersal in a Heterogeneous Landscape. Science 294, 813-817.
S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5, 885-897.
D. Clancy and P.D. O’Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.