Approximate Bayesian
Computation methods
and their applications for
hierarchical statistical models
University College London, 2013
Contents
1. Introduction
2. ABC methods
3. Hierarchical algorithm
4. Application for ovarian cancer detection
5. Conclusion
Introduction
•  The likelihood function plays an important role in statistical inference problems
•  For complex models, the computational cost of evaluating the analytical formula is very high
•  Methods that provide statistical inference while bypassing evaluation of the likelihood function have gained great popularity
ABC methods
•  ABC methods provide ways of evaluating posterior
distributions when the likelihood function is
analytically or computationally intractable
•  These methods are based on replacing the
calculation of the likelihood with a comparison
between the observed and simulated data
Let θ be a parameter vector to be estimated. Given the prior distribution π(θ), the goal is to approximate the posterior distribution π(θ|x) ∝ f(x|θ)π(θ), where f(x|θ) is the likelihood of θ given the data x.
Generic form of ABC methods
1.  Sample a candidate parameter vector θ* from some proposal distribution π(θ).
2.  Simulate a dataset x* from the model described by a conditional probability distribution f(x|θ*).
3.  Compare the simulated dataset x* with the experimental data x0 using a distance function d and tolerance ε; if d(x0, x*) ≤ ε, accept θ*.
The tolerance ε ≥ 0 is the desired level of agreement between x0 and x*.
Most popular ABC algorithms
•  ABC rejection algorithm
•  ABC MCMC algorithm (Markov Chain Monte Carlo)
•  ABC SMC algorithm (Sequential Monte Carlo)
ABC rejection method
1.  Sample θ* from π(θ).
2.  Simulate a dataset x* from f(x|θ*).
3.  If d(x0, x*) ≤ ε, accept θ*; otherwise reject.
4.  Return to step 1.
Disadvantage: if the prior distribution is very different from the posterior, the acceptance rate will be low.
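To make the steps concrete, here is a minimal R sketch of the rejection scheme for a toy problem; the model (the mean θ of N(θ, 1) data with a N(0, 10²) prior and the sample mean as summary statistic) is an assumption chosen for illustration, not part of the original slides.
# ABC rejection sketch: infer the mean theta of N(theta, 1) data,
# using the absolute difference of sample means as the distance d
set.seed(1)
x0 <- rnorm(50, mean = 2, sd = 1)     # "observed" data
s0 <- mean(x0)                        # observed summary statistic
eps <- 0.05                           # tolerance
accepted <- numeric(0)
for (i in 1:100000) {
  theta_star <- rnorm(1, 0, 10)                   # 1. sample theta* from the prior
  x_star <- rnorm(50, mean = theta_star, sd = 1)  # 2. simulate a dataset from f(x | theta*)
  if (abs(mean(x_star) - s0) <= eps)              # 3. accept theta* if d(x0, x*) <= eps
    accepted <- c(accepted, theta_star)
}
hist(accepted, main = "ABC posterior sample for theta")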
ABC MCMC
1.  Metropolis-Hastings Algorithm
2.  Random-walk Metropolis-Hastings Algorithm
3.  Gibbs Sampling Algorithm
4.  Metropolis within Gibbs Algorithm
Metropolis-Hastings Algorithm
Let q(y|x) be an arbitrary, friendly distribution (one we know how to sample from), called the proposal.
Choose X0 arbitrarily. Suppose we have generated
X0, X1,..., Xi . To generate Xi+1 do the following:
(1) Generate a proposal or candidate value Y ~q(y|Xi)
(2) Evaluate r ≡ r(Xi, Y), where
r(x, y) = min{ f(y) q(x|y) / (f(x) q(y|x)), 1 }
(3) Set Xi+1 = Y with probability r, and Xi+1 = Xi with probability 1 − r.
Remarks to Metropolis-Hastings Algorithm
•  A simple way to execute step (3) is to generate U ~ Uniform(0, 1). If U < r, set Xi+1 = Y; otherwise Xi+1 = Xi.
•  A common choice for q(y|x) is N(x, b²) for some b > 0. In this case the proposal density q is symmetric, q(y|x) = q(x|y), and
r(x, y) = min{ f(y)/f(x), 1 }
Metropolis-Hastings Algorithm. Example1
Let’s simulate a Markov chain whose stationary distribution is
f(x) = 1/(π(1 + x²)) (the standard Cauchy distribution).
Let’s take N(x, b²) as the proposal distribution. Then
r(x, y) = min{ f(y)/f(x), 1 } = min{ (1 + x²)/(1 + y²), 1 }
Let’s choose b = 1 and a chain of length N = 10,000.
Example 1. Code in R
# Metropolis-Hastings sampler targeting the standard Cauchy density
N <- 10000                                   # chain length
b <- 1                                       # proposal standard deviation
x_values <- numeric(N)
x_old <- 0                                   # starting value X0
for (i in 1:N) {
  y <- rnorm(1, x_old, b)                    # propose Y ~ N(x_old, b^2)
  r <- min((1 + x_old^2) / (1 + y^2), 1)     # acceptance probability
  if (runif(1) < r) x_old <- y               # accept Y, otherwise keep current value
  x_values[i] <- x_old
}
x_axis <- seq(-7, 7, by = 0.1)
plot(x_axis, dcauchy(x_axis), type = "p", col = "black")  # true Cauchy density
lines(density(x_values), col = "red", lwd = 3)            # kernel density estimate of the chain
Gibbs Sampling
Gibbs Sampling is the easiest MCMC algorithm to use for high-dimensional problems, as it turns a high-dimensional problem into several one-dimensional problems. One example of a high-dimensional problem is a hierarchical model.
Hierarchical model. Example1
Posterior distribution on (θ, σ²) associated with the joint model
Xi ~ N(θ, σ²), i = 1,...,n,
θ ~ N(θ0, τ²), σ² ~ IG(a, b),
θ0, τ², a, b specified.
Gibbs Sampling algorithm
Suppose that (X, Y) has density fX,Y(x, y), and that it is possible to simulate from the conditional distributions fX|Y(x|y) and fY|X(y|x). Let (X0, Y0) be starting values and assume we have drawn (X0, Y0),...,(Xn, Yn). Then the Gibbs sampling step for obtaining (Xn+1, Yn+1) is:
Xn+1 ~ fX|Y(x | Yn)
Yn+1 ~ fY|X(y | Xn+1)
and repeat.
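As a concrete illustration, here is a minimal R sketch of this alternating scheme for a standard bivariate normal (X, Y) with correlation ρ, where both full conditionals are the known normals N(ρy, 1 − ρ²) and N(ρx, 1 − ρ²); the target and the value of ρ are assumptions chosen only for this example.
# Gibbs sampler for a standard bivariate normal with correlation rho:
# X | Y = y ~ N(rho*y, 1 - rho^2) and Y | X = x ~ N(rho*x, 1 - rho^2)
rho <- 0.8
n_iter <- 5000
x <- y <- numeric(n_iter)
x[1] <- y[1] <- 0                                      # starting values (X0, Y0)
for (n in 2:n_iter) {
  x[n] <- rnorm(1, rho * y[n - 1], sqrt(1 - rho^2))    # X_{n+1} ~ f_{X|Y}(x | Y_n)
  y[n] <- rnorm(1, rho * x[n],     sqrt(1 - rho^2))    # Y_{n+1} ~ f_{Y|X}(y | X_{n+1})
}
cor(x, y)   # should be close to rho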
Posteriors for the Example1
Xi ~ N(θ, σ²), i = 1,...,n,
θ ~ N(θ0, τ²), σ² ~ IG(a, b),
θ0, τ², a, b specified.
f(θ | x, σ²) ~ N( (σ² θ0 + nτ² x̄)/(σ² + nτ²), σ² τ²/(σ² + nτ²) ), where x̄ is the sample mean,
f(σ² | x, θ) ~ IG( n/2 + a, (1/2) Σi (xi − θ)² + b )
Example 1. Code in R
x <- rnorm(1000, 10, 2)        # simulated data: n = 1000 observations from N(10, 2^2)
n <- length(x)
a <- 3; b <- 3                 # IG(a, b) prior on sigma^2
tau2 <- 10; theta0 <- 5        # N(theta0, tau2) prior on theta
Nsim <- 5000                   # number of Gibbs iterations
xbar <- mean(x)
sh1 <- (n/2) + a               # shape of the IG full conditional (fixed)
sigma2 <- theta <- rep(0, Nsim)                 # init arrays
sigma2[1] <- 1/rgamma(1, shape = a, rate = b)   # init chains
B <- sigma2[1]/(sigma2[1] + n*tau2)
theta[1] <- rnorm(1, mean = B*theta0 + (1 - B)*xbar, sd = sqrt(tau2*B))
for (i in 2:Nsim) {
  B <- sigma2[i-1]/(sigma2[i-1] + n*tau2)
  theta[i] <- rnorm(1, mean = B*theta0 + (1 - B)*xbar, sd = sqrt(tau2*B))
  ra1 <- (1/2)*sum((x - theta[i])^2) + b        # rate of the IG full conditional
  sigma2[i] <- 1/rgamma(1, shape = sh1, rate = ra1)
}
mean(theta[3000:5000])         # posterior mean estimates after burn-in
mean(sigma2[3000:5000])
Conjugate priors
In Bayesian probability theory, if the posterior distribution is in the same family as the prior distribution, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior.
P(θ | D) = P(θ)P(D|θ) / ∫ P(θ)P(D|θ) dθ
Conjugate priors. Example
Let’s consider the normal distribution x ~ N(μ, σ²).
For normally distributed x with fixed variance σ², the conjugate prior is also normally distributed. For the prior μ ~ N(μ0, σ0²) the posterior has the form:
μ | x, σ² ~ N(μ̂0, σ̂0²), where
μ̂0 = (σ0² x + σ² μ0)/(σ² + σ0²),
σ̂0² = σ² σ0²/(σ² + σ0²)
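As a quick numerical illustration of the update, the R lines below (with arbitrary values for x, σ², μ0 and σ0²) compute the posterior mean and variance directly from the formulas above.
# Normal likelihood with known variance sigma2, normal prior on mu
x      <- 1.7                 # single observation (arbitrary)
sigma2 <- 4                   # known data variance
mu0    <- 0; sigma02 <- 1     # prior mu ~ N(mu0, sigma02)
mu_hat     <- (sigma02*x + sigma2*mu0)/(sigma2 + sigma02)   # posterior mean
sigma2_hat <- sigma2*sigma02/(sigma2 + sigma02)              # posterior variance
c(mu_hat, sigma2_hat)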
Conjugate priors. Example
Let’s consider the normal distribution x ~ N(μ, σ²).
For normally distributed x with fixed mean μ, the conjugate prior on σ² is the inverse-gamma distribution. For the prior σ² ~ IG(α, β):
P(x | μ, σ²) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)) ∝ (σ²)^(−1/2) exp(−(1/2)(x − μ)²/σ²)
P(σ²) = IG(α, β) = (β^α/Γ(α)) (σ²)^(−α−1) exp(−β/σ²)
P(σ² | x, μ) ∝ (σ²)^(−(α+1/2)−1) exp(−(β + (1/2)(x − μ)²)/σ²)
so the posterior is IG(α̂, β̂) with α̂ = α + 1/2, β̂ = β + (1/2)(x − μ)².
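Similarly, a short R illustration of this update for a single observation (all values arbitrary):
# Inverse-gamma prior on sigma2 with a single N(mu, sigma2) observation
x <- 2.3; mu <- 1             # observation and known mean (arbitrary)
alpha <- 3; beta <- 2         # prior sigma2 ~ IG(alpha, beta)
alpha_hat <- alpha + 1/2      # posterior shape
beta_hat  <- beta + 0.5*(x - mu)^2   # posterior scale
c(alpha_hat, beta_hat)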
ABC SMC
A number of sampled parameter values (particles) {θ(1),...,θ(N)}, sampled from the prior distribution π(θ), are propagated through a sequence of intermediate distributions π(θ | d(x0, x*) ≤ εi), i = 1,...,T−1, until they represent a sample from the target distribution π(θ | d(x0, x*) ≤ εT). The tolerances ε1 > ... > εT ≥ 0 ensure a gradual evolution towards the target posterior. For sufficiently large numbers of particles, this approach avoids the problem of getting stuck in areas of low probability (as can happen in ABC MCMC).
ABC SMC Algorithm
S1. Initialize ε1,...,εT . Set the population indicator t=0.
S2.0 Set the particle indicator i=1.
S2.1 If t = 0, sample θ** independently from π(θ).
Else, sample θ* from the previous population {θt−1(i)} with weights wt−1 and perturb the particle to obtain θ** ~ Kt(θ | θ*), where Kt is a perturbation kernel.
If π(θ**) = 0, return to S2.1.
Simulate a candidate dataset x* ~ f(x | θ**).
If d(x*, x0) ≥ εt, return to S2.1.
ABC SMC Algorithm
S2.2 Set θt(i) = θ** and calculate the weight of particle θt(i):
wt(i) = 1, if t = 0,
wt(i) = π(θt(i)) / Σj=1..N wt−1(j) Kt(θt−1(j), θt(i)), if t > 0.
If i < N, set i = i + 1 and go to S2.1.
S3 Normalize the weights. If t < T, set t = t + 1 and go to S2.0.
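For concreteness, here is a compact R sketch of the scheme for a toy problem. The toy model (mean θ of N(θ, 1) data with the sample mean as summary), the Uniform(−10, 10) prior, the tolerance schedule, and the Gaussian perturbation kernel Kt are all assumptions chosen for illustration; population indices run from 1 rather than 0 in R.
# ABC SMC sketch (toy model): theta is the mean of N(theta, 1) data
set.seed(1)
x0   <- rnorm(50, mean = 2, sd = 1)
s0   <- mean(x0)
eps  <- c(1, 0.5, 0.2, 0.1, 0.05)        # decreasing tolerances eps_1 > ... > eps_T
N    <- 200                               # number of particles
Tpop <- length(eps)
kern_sd <- 0.5                            # sd of the Gaussian perturbation kernel K_t
theta <- matrix(NA, Tpop, N)              # particles theta_t^(i)
w     <- matrix(NA, Tpop, N)              # weights w_t^(i)
for (t in 1:Tpop) {
  i <- 1
  while (i <= N) {
    if (t == 1) {
      theta_ss <- runif(1, -10, 10)                             # sample from the prior
    } else {
      theta_s  <- sample(theta[t - 1, ], 1, prob = w[t - 1, ])  # sample from previous population
      theta_ss <- rnorm(1, theta_s, kern_sd)                    # perturb with K_t
      if (theta_ss < -10 || theta_ss > 10) next                 # prior density zero: retry
    }
    x_star <- rnorm(50, theta_ss, 1)                            # simulate candidate dataset
    if (abs(mean(x_star) - s0) > eps[t]) next                   # distance too large: retry
    theta[t, i] <- theta_ss
    w[t, i] <- if (t == 1) 1 else
      dunif(theta_ss, -10, 10) /
        sum(w[t - 1, ] * dnorm(theta_ss, theta[t - 1, ], kern_sd))
    i <- i + 1
  }
  w[t, ] <- w[t, ] / sum(w[t, ])                                # normalize the weights
}
# weighted resample of the final population approximates the posterior
post <- sample(theta[Tpop, ], 1000, replace = TRUE, prob = w[Tpop, ])
hist(post, main = "ABC SMC posterior sample for theta")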
Ovarian Cancer case study. CA125
Risk calculation
Change-point hierarchical model for CA125
Controls:
Yij | tij ~ N(θi, σ²)
Cases:
Yij | tij, {Ii = 0} ~ N(θi, σ²)
Yij | tij, {Ii = 1} ~ N(θi + γi(tij − τi)+, σ²)
Here Ii indicates whether case i exhibits a change point, and (tij − τi)+ denotes the positive part: the marker level of such a case rises linearly with slope γi after the change point τi.
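The model above can be simulated directly. Below is a minimal R sketch generating one control and one case trajectory; all parameter values (σ, θi, γi, τi and the screening times) are illustrative assumptions, not estimates from the paper.
# Simulating one control and one case trajectory from the change-point model
set.seed(1)
t_ij    <- seq(0, 5, by = 0.25)    # screening times (years)
sigma   <- 0.3                     # residual standard deviation
theta_i <- 3                       # subject-specific baseline level
gamma_i <- 1.2                     # post-change slope for the case
tau_i   <- 2.5                     # change point for the case
y_control <- rnorm(length(t_ij), theta_i, sigma)
y_case    <- rnorm(length(t_ij), theta_i + gamma_i * pmax(t_ij - tau_i, 0), sigma)
plot(t_ij, y_case, type = "b", col = "red",
     ylim = range(y_case, y_control), xlab = "t (years)", ylab = "marker level Y")
points(t_ij, y_control, type = "b", col = "blue")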
Conditional distributions
Conclusion
1.  ABC methods have a great impact on parameter estimation.
2.  Many applied problems can be reduced to hierarchical models.
3.  The Gibbs sampling algorithm is the most useful when dealing with hierarchical models.
Literature
1. Steven J. Skates, Donna K. Pauler, Ian J. Jacobs. Screening Based on the Risk of Cancer Calculation from Bayesian Hierarchical Changepoint and Mixture Models of Longitudinal Markers. Journal of the American Statistical Association, vol. 96 (2001).
2. Wasserman L. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
3. Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, Michael P.H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6, 187-202 (2009).
4. Christian P. Robert, George Casella. Introducing Monte Carlo Methods with R. Springer, 2009.
Questions