Approximate Bayesian Computation: a simulation-based approach to inference
Richard Wilkinson (Department of Probability and Statistics, University of Sheffield)
Simon Tavaré (Department of Applied Mathematics and Theoretical Physics, University of Cambridge)
Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems (PASCAL 2008)
Stochastic Computation
Implicit Statistical Models
Two types of statistical model:
Prescribed models - the likelihood function is specified.
Implicit models - there is a mechanism to simulate observations, but no explicit likelihood.
Implicit models give scientists more freedom to model the phenomenon under consideration accurately, and the increase in computer power has made their use more practicable. They are popular in many disciplines.
[Figure: a simulated genealogical tree of individuals, with time t marked on a vertical Time axis]
Fitting to data
Most models are forwards models, i.e., we specify the parameters θ and initial conditions, and the model generates output D. Usually we are interested in the inverse problem: we observe data and want to estimate parameter values.
The same task goes by different names:
Calibration
Data assimilation
Parameter estimation
Inverse problem
Bayesian inference
Monte Carlo Inference
Aim to sample from the posterior distribution:
π(θ|D) ∝ prior × likelihood = π(θ)P(D|θ).
Monte Carlo methods enable Bayesian inference in more complex models.
MCMC can be difficult or impossible in many stochastic models, e.g., if
◮ P(D|θ) is unknown - true for many stochastic models,
◮ or there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure, as in
  ⋆ Population Genetics
  ⋆ Epidemiology
  ⋆ Evolutionary Biology
Likelihood-Free Inference
Rejection Algorithm
Draw θ from prior π(·)
Accept θ with probability P(D | θ)
Accepted θ are independent draws from the posterior distribution, π(θ | D).
If the likelihood, P(D|θ), is unknown:
‘Mechanical’ Rejection Algorithm
Draw θ from π(·)
Simulate D′ ∼ P(· | θ)
Accept θ if D = D′
The acceptance rate is P(D): the number of runs to get n observations is negative binomial, with mean n/P(D).
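To make the mechanics concrete, here is a minimal Python sketch of the mechanical rejection algorithm; the Poisson simulator and Exp(1) prior are illustrative assumptions, not part of the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    def mechanical_rejection(D, n_accept, prior_sample, simulate):
        # Keep a draw theta only if its simulated output exactly matches the data.
        accepted = []
        while len(accepted) < n_accept:
            theta = prior_sample()       # draw theta from the prior pi(.)
            D_prime = simulate(theta)    # simulate D' ~ P(. | theta)
            if D_prime == D:             # exact match required
                accepted.append(theta)
        return np.array(accepted)

    # Toy implicit model (an assumption for illustration): Poisson output, Exp(1) prior.
    draws = mechanical_rejection(
        D=3, n_accept=500,
        prior_sample=lambda: rng.exponential(1.0),
        simulate=lambda th: rng.poisson(th),
    )
    print(draws.mean())  # Monte Carlo estimate of E[theta | D = 3]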
Approximate Bayesian Computation I
If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:
Approximate Rejection Algorithm
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(D, D′) ≤ ε
This generates observations from π(θ | ρ(D, D′) ≤ ε):
As ε → ∞, we get observations from the prior, π(θ).
If ε = 0, we generate observations from π(θ | D).
ε reflects the tension between computability and accuracy.
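A matching sketch of the approximate version, reusing the same toy Poisson model; the choice of ρ as an absolute difference and ε = 1 are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def abc_rejection(D, n_accept, prior_sample, simulate, rho, eps):
        # As before, but accept whenever the simulation lands within eps of the data.
        accepted = []
        while len(accepted) < n_accept:
            theta = prior_sample()
            D_prime = simulate(theta)
            if rho(D, D_prime) <= eps:   # eps = 0 recovers the exact algorithm
                accepted.append(theta)
        return np.array(accepted)

    # Same toy model; eps trades accuracy for acceptance rate.
    draws = abc_rejection(
        D=3, n_accept=500,
        prior_sample=lambda: rng.exponential(1.0),
        simulate=lambda th: rng.poisson(th),
        rho=lambda a, b: abs(a - b), eps=1,
    )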
Approximate Bayesian Computation II
If the data are too high-dimensional, we never observe simulations that are ‘close’ to the field data.
Reduce the dimension using summary statistics, S(D).
Approximate Rejection Algorithm With Summaries
Draw θ from π(θ)
Simulate D′ ∼ P(· | θ)
Accept θ if ρ(S(D), S(D′)) ≤ ε
If S is sufficient, this is equivalent to the previous algorithm.
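A sketch of the algorithm with summaries, under assumed choices not fixed by the slides: a Gaussian implicit model, S(D) = (mean, sd), and Euclidean ρ.

    import numpy as np

    rng = np.random.default_rng(2)

    def S(D):
        # Two-dimensional summary of a data vector (an illustrative choice).
        return np.array([D.mean(), D.std()])

    def abc_with_summaries(s_obs, n_accept, eps):
        accepted = []
        while len(accepted) < n_accept:
            mu = rng.uniform(-10, 10)                # theta from a (proper) uniform prior
            D_prime = rng.normal(mu, 1.0, size=50)   # simulate D' ~ P(. | theta)
            if np.linalg.norm(s_obs - S(D_prime)) <= eps:  # compare summaries, not raw data
                accepted.append(mu)
        return np.array(accepted)

    D = rng.normal(0.0, 1.0, size=50)  # 'observed' data
    posterior_draws = abc_with_summaries(S(D), n_accept=200, eps=0.2)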
Error Structure
Example (Gaussian Distribution)
Suppose X_i ∼ N(µ, σ²), with σ² known, and give µ an improper flat prior distribution, π(µ) = 1 for µ ∈ R. Suppose we observe data with x̄ = 0.
Pick µ ∼ U(−∞, ∞)
Simulate X_i ∼ N(µ, σ²)
Accept µ if |x̄| ≤ ε
Then
π(µ | |x̄| ≤ ε) = [Φ((ε − µ)/(σ/√n)) − Φ((−ε − µ)/(σ/√n))] / (2ε)
and
Var(µ | |x̄| ≤ ε) = Var(µ | x̄ = 0) + ε²/3.
[Figure: ABC posterior densities for µ from 1000 samples, for tolerances ε = 0.1, 0.5, 1 and 5]
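The variance-inflation identity can be checked by simulation; a short sketch, in which a wide proper uniform prior stands in for the improper flat prior.

    import numpy as np

    rng = np.random.default_rng(3)

    n, sigma, eps = 10, 1.0, 0.5
    # Wide uniform prior as a stand-in for the improper flat prior on mu.
    mu = rng.uniform(-20.0, 20.0, size=2_000_000)
    xbar = rng.normal(mu, sigma / np.sqrt(n))  # xbar | mu ~ N(mu, sigma^2 / n)
    kept = mu[np.abs(xbar) <= eps]             # the ABC accept step

    print(kept.var())                 # empirical Var(mu | |xbar| <= eps)
    # Predicted value: Var(mu | xbar = 0) + eps^2 / 3, where under the flat
    # prior Var(mu | xbar = 0) = sigma^2 / n.
    print(sigma**2 / n + eps**2 / 3)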
Approximate MCMC
Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution. The idea behind MCMC is that, by correlating observations, more time is spent in regions of high likelihood.
Approximate Metropolis-Hastings Algorithm
Suppose we are currently at θ. Propose θ′ from density q(θ, θ′).
Simulate D′ from P(· | θ′).
If ρ(D, D′) ≤ ε, calculate
h(θ, θ′) = min{1, [π(θ′) q(θ′, θ)] / [π(θ) q(θ, θ′)]}.
Accept the move to θ′ with probability h(θ, θ′); otherwise stay at θ.
Adaptive tolerance choices: Sisson et al. and Robert et al. proposed an approximate sequential importance sampling algorithm.
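A minimal sketch of this approximate Metropolis-Hastings kernel; the Poisson model, Exp(1) prior and Gaussian random-walk proposal are assumptions, and a proposal failing the ρ(D, D′) ≤ ε check is simply rejected, so the chain stays put.

    import numpy as np

    rng = np.random.default_rng(4)

    def abc_mcmc(D, n_iter, eps, theta0=1.0, step=0.5):
        def prior(th):
            # Exp(1) prior density (an illustrative choice).
            return np.exp(-th) if th > 0 else 0.0

        theta = theta0
        chain = [theta]
        for _ in range(n_iter):
            theta_prop = theta + step * rng.normal()  # symmetric q, so the q terms cancel
            D_prime = rng.poisson(theta_prop) if theta_prop > 0 else -1
            if abs(D - D_prime) <= eps:               # likelihood-free check first
                h = min(1.0, prior(theta_prop) / prior(theta))
                if rng.random() < h:
                    theta = theta_prop
            chain.append(theta)                       # a failed check means we stay at theta
        return np.array(chain)

    chain = abc_mcmc(D=3, n_iter=5000, eps=0)
    print(chain[1000:].mean())  # compare with the rejection-sampler estimate above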
ABC-within-MCMC
Problem: a low acceptance rate leads to slow convergence.
Suppose θ = (θ₁, θ₂) with
π(θ₁ | D, θ₂) known,
π(θ₂ | D, θ₁) unknown.
This is often the case for models with a hidden tree structure generating highly dependent data. We can combine Gibbs update steps (or any M-H update) with ABC.
ABC-within-Gibbs Algorithm
Suppose we are at θᵗ = (θ₁ᵗ, θ₂ᵗ):
1. Draw θ₁ᵗ⁺¹ ∼ π(θ₁ | D, θ₂ᵗ)
2. Draw θ₂* ∼ π_{θ₂}(·)
◮ Simulate D′ ∼ P(· | θ₁ᵗ⁺¹, θ₂*)
◮ If ρ(D, D′) ≤ ε, set θ₂ᵗ⁺¹ = θ₂*. Else return to step 2.
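A schematic sketch of this loop; the toy model in the usage below (Gaussian likelihood with a conjugate update for θ₁) is an assumption chosen so that π(θ₁ | D, θ₂) is available in closed form.

    import numpy as np

    def abc_within_gibbs(D, n_iter, theta1_0, theta2_0,
                         gibbs_draw, prior2_sample, simulate, rho, eps, rng):
        # Exact Gibbs update for theta1, ABC rejection update for theta2.
        theta1, theta2 = theta1_0, theta2_0
        trace = []
        for _ in range(n_iter):
            theta1 = gibbs_draw(D, theta2, rng)       # step 1: theta1 ~ pi(theta1 | D, theta2)
            while True:                                # step 2: ABC rejection for theta2
                theta2_star = prior2_sample(rng)
                if rho(D, simulate(theta1, theta2_star, rng)) <= eps:
                    theta2 = theta2_star
                    break
            trace.append((theta1, theta2))
        return np.array(trace)

    # Toy model (all choices are assumptions): D | theta ~ N(theta1 + theta2, 1),
    # theta1 ~ N(0, 1), so pi(theta1 | D, theta2) = N((D - theta2)/2, 1/2); theta2 ~ N(0, 1).
    rng = np.random.default_rng(5)
    trace = abc_within_gibbs(
        D=2.0, n_iter=500, theta1_0=0.0, theta2_0=0.0,
        gibbs_draw=lambda D, th2, r: r.normal((D - th2) / 2.0, np.sqrt(0.5)),
        prior2_sample=lambda r: r.normal(0.0, 1.0),
        simulate=lambda th1, th2, r: r.normal(th1 + th2, 1.0),
        rho=lambda a, b: abs(a - b), eps=0.5, rng=rng,
    )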
Example From Population Biology
Inferring ancestral divergence times
[Figure: a species genealogy with the divergence time t marked on a vertical Time axis]
Choosing summary statistics and metrics
We need
summaries S(D) which are sensitive to changes in θ, but robust to random variations in D;
a definition of approximate sufficiency (LeCam 1963): distance between π(θ | D) and π(θ | S(D))?
a systematic, implementable approach for finding good summary statistics.
Complex dependence structures can be accounted for.
[Figure: scatter plot of two candidate summaries, D1 against D2]
ABC Approach
Data can be thought of in two parts:
the observed number of fossils D_i found in the i-th interval,
the total number of fossils found, D_+.
D′ denotes simulated data. A suitable metric might be
ρ(D, D′) = Σ_{i=1}^{k} |D_i/D_+ − D′_i/D′_+| + |D_+/D′_+ − 1|
Note: no data summaries here.
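A direct transcription of this metric in Python, assuming D and D′ are length-k vectors of per-interval fossil counts.

    import numpy as np

    def rho(D, D_prime):
        # D, D_prime: fossil counts per interval; D_plus is the total count.
        D, D_prime = np.asarray(D, float), np.asarray(D_prime, float)
        D_plus, Dp_plus = D.sum(), D_prime.sum()
        return (np.abs(D / D_plus - D_prime / Dp_plus).sum()
                + abs(D_plus / Dp_plus - 1.0))

    print(rho([5, 3, 2], [4, 4, 3]))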
Not going so well
[Figure: trace of the simulated extant population size (0 to 300) against iteration number (0 to 1000)]
Tweak the metric
The simulated N_0 values are too small (376 modern species).
It is easy to combine different types of information with ABC. Change the metric to
ρ(D, D′) = Σ_{i=1}^{k} |D_i/D_+ − D′_i/D′_+| + |D_+/D′_+ − 1| + |N_0/N′_0 − 1|
This gives approximate samples from
π(θ | D, N_0 = 376) ∝ P(D, N_0 = 376 | θ)π(θ)
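The corresponding tweak to the sketch above; N0_obs = 376 is the observed number of modern species from the slide, and the orientation of the N_0 ratio (observed over simulated) follows the displayed metric.

    import numpy as np

    def rho_tweaked(D, D_prime, N0_sim, N0_obs=376):
        # The metric above plus a penalty on the simulated extant population size N0.
        D, D_prime = np.asarray(D, float), np.asarray(D_prime, float)
        base = (np.abs(D / D.sum() - D_prime / D_prime.sum()).sum()
                + abs(D.sum() / D_prime.sum() - 1.0))
        return base + abs(N0_obs / N0_sim - 1.0)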
Results
[Figure: posterior density of the divergence time (My), over roughly 60 to 140 My]
Extensions
Model selection:
Ratio of acceptance rates π_{M1}(S′ ≈ S) / π_{M2}(S′ ≈ S) ≈ Bayes factor; relative acceptance rates give posterior model probabilities.
◮ Hopeless in practice, as it is too sensitive to the tolerance ε.
Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors. Neither works.
Expensive simulators:
Emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).
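A sketch of model selection by acceptance-rate ratio, with two assumed toy models; it illustrates the idea, and varying eps exposes the sensitivity that makes it hopeless in practice.

    import numpy as np

    rng = np.random.default_rng(6)

    def abc_acceptance_rate(simulate_summary, s_obs, eps, n_sims, rng):
        # Fraction of prior-predictive simulations whose summary lands within eps of s_obs.
        hits = sum(abs(simulate_summary(rng) - s_obs) <= eps for _ in range(n_sims))
        return hits / n_sims

    # Toy models (assumptions): under M1 the summary is N(0,1); under M2 it is N(1,1).
    s_obs = 0.2
    r1 = abc_acceptance_rate(lambda r: r.normal(0.0, 1.0), s_obs, 0.1, 100_000, rng)
    r2 = abc_acceptance_rate(lambda r: r.normal(1.0, 1.0), s_obs, 0.1, 100_000, rng)
    print(r1 / r2)  # crude Bayes-factor estimate; rerun with other eps to see the sensitivity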
Pros and cons of ABC
Pros
◮ Likelihood is not needed
◮ Easy to code
◮ Easy to adapt
◮ Generates independent observations (parallel computation)
Cons
◮ Hard to anticipate the effect of summary statistics (needs intuition)
◮ Over-dispersion of the posterior due to ρ(D, D′) ≤ ε
◮ For complex problems, sampling from the prior does not make good use of the observations
Issues
◮ One run or many?
◮ How to choose good summary statistics?
◮ How good an approximation do we get?
References
M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.
P. Marjoram, J. Molitor, V. Plagnol and S. Tavaré, Markov chain Monte Carlo without likelihoods, PNAS, 2003.
S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.
C. P. Robert, M. A. Beaumont, J. Marin and J. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.