Approximate Bayesian Computation methods and their applications for hierarchical statistical models
University College London, 2013

Contents
1. Introduction
2. ABC methods
3. Hierarchical algorithm
4. Application for ovarian cancer detection
5. Conclusion

Introduction
• The likelihood function plays an important role in statistical inference problems.
• For complex models, the computational cost of evaluating its analytical formula is very high.
• Methods that provide statistical inference while bypassing evaluation of the likelihood function have gained great popularity.

ABC methods
• ABC methods provide ways of evaluating posterior distributions when the likelihood function is analytically or computationally intractable.
• These methods are based on replacing the calculation of the likelihood with a comparison between the observed and simulated data.

Let θ be the parameter vector to be estimated. Given the prior distribution π(θ), the goal is to approximate the posterior distribution π(θ|x) ∝ f(x|θ)π(θ), where f(x|θ) is the likelihood of θ given the data x.

Generic form of ABC methods
1. Sample a candidate parameter vector θ* from the prior distribution π(θ).
2. Simulate a dataset x* from the model described by the conditional probability distribution f(x|θ*).
3. Compare the simulated dataset x* with the experimental data x0 using a distance function d and tolerance ε; if d(x0, x*) ≤ ε, accept θ*.
The tolerance ε ≥ 0 is the desired level of agreement between x0 and x*.

Most popular ABC algorithms
• ABC rejection algorithm
• ABC MCMC algorithm (Markov chain Monte Carlo)
• ABC SMC algorithm (sequential Monte Carlo)

ABC rejection method
1. Sample θ* from π(θ).
2. Simulate a dataset x* from f(x|θ*).
3. If d(x0, x*) ≤ ε, accept θ*; otherwise reject.
4. Return to step 1.
Disadvantage: if the prior distribution is very different from the posterior, the acceptance rate will be low. A sketch in R is given below.
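The following is a minimal R sketch of the ABC rejection method for a toy problem: the data are 100 draws from N(θ, 1), the prior on θ is N(0, 10), and the distance compares sample means. The toy model, the tolerance, and all variable names are illustrative assumptions, not part of the original slides.

# ABC rejection for a toy model (assumed example): x ~ N(theta, 1),
# prior theta ~ N(0, 10), distance d = |mean(x*) - mean(x0)|
set.seed(1)
x0 <- rnorm(100, mean = 2, sd = 1)   # "observed" data, true theta = 2
eps <- 0.1                           # tolerance
N <- 1000                            # number of accepted particles wanted
accepted <- numeric(0)
while (length(accepted) < N) {
  theta_star <- rnorm(1, mean = 0, sd = sqrt(10))  # 1. sample from the prior
  x_star <- rnorm(100, mean = theta_star, sd = 1)  # 2. simulate a dataset
  if (abs(mean(x_star) - mean(x0)) <= eps)         # 3. compare and accept
    accepted <- c(accepted, theta_star)
}
mean(accepted)   # approximate posterior mean of theta

Because the wide N(0, 10) prior is very different from the posterior here, most proposals are rejected, which illustrates the disadvantage noted above.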
ABC MCMC
1. Metropolis-Hastings algorithm
2. Random-walk Metropolis-Hastings algorithm
3. Gibbs sampling algorithm
4. Metropolis-within-Gibbs algorithm

Metropolis-Hastings Algorithm
Let q(y|x) be an arbitrary, friendly distribution (one we know how to sample from), called the proposal. Choose X0 arbitrarily. Suppose we have generated X0, X1, ..., Xi. To generate Xi+1, do the following:
(1) Generate a proposal or candidate value Y ~ q(y|Xi).
(2) Evaluate r ≡ r(Xi, Y), where r(x, y) = min{ f(y)q(x|y) / (f(x)q(y|x)), 1 }.
(3) Set Xi+1 = Y with probability r and Xi+1 = Xi with probability 1 − r.

Remarks on the Metropolis-Hastings Algorithm
• A simple way to execute step (3) is to generate U ~ Uniform(0, 1); if U < r, set Xi+1 = Y, otherwise set Xi+1 = Xi.
• A common choice for q(y|x) is N(x, b²) for some b > 0. In this case the proposal density is symmetric, q(y|x) = q(x|y), and r(x, y) = min{ f(y)/f(x), 1 }.

Metropolis-Hastings Algorithm. Example 1
Let's simulate a Markov chain whose stationary distribution is f(x) = 1 / (π(1 + x²)) (the Cauchy distribution). Take N(x, b²) as the proposal distribution. Then
r(x, y) = min{ f(y)/f(x), 1 } = min{ (1 + x²)/(1 + y²), 1 }.
Let's choose b = 1 and chain length N = 10,000.

Example 1. Code in R
# Random-walk Metropolis-Hastings for the standard Cauchy target
N <- 10000                                     # chain length
b <- 1                                         # proposal standard deviation
x_values <- numeric(N)
x_old <- 0
for (i in 1:N) {
  y <- rnorm(1, x_old, b)                      # propose Y ~ N(x_old, b^2)
  r <- min((1 + x_old^2) / (1 + y^2), 1)       # acceptance probability
  if (runif(1) < r) x_old <- y                 # accept, otherwise keep x_old
  x_values[i] <- x_old
}
x_axis <- seq(-7, 7, by = 0.1)
plot(x_axis, dcauchy(x_axis), type = "p", col = "black")     # true density
points(density(x_values), type = "l", col = "red", lwd = 3)  # chain estimate

Gibbs Sampling
Gibbs sampling is the easiest MCMC algorithm to use for high-dimensional problems, since it turns one high-dimensional problem into several one-dimensional problems. One example of a high-dimensional problem is a hierarchical model.

Hierarchical model. Example 1
Posterior distribution on the joint model (θ, σ²) associated with
Xi ~ N(θ, σ²), i = 1, ..., n,
θ ~ N(θ0, τ²),
σ² ~ IG(a, b),
where θ0, τ², a and b are specified.

Gibbs Sampling algorithm
Suppose that (X, Y) has density fX,Y(x, y) and that it is possible to simulate from the conditional distributions fX|Y(x|y) and fY|X(y|x). Let (X0, Y0) be starting values, and assume we have drawn (X0, Y0), ..., (Xn, Yn). Then the Gibbs sampling algorithm for getting (Xn+1, Yn+1) is:
Xn+1 ~ fX|Y(x|Yn)
Yn+1 ~ fY|X(y|Xn+1)
and repeat.

Posteriors for Example 1
For the hierarchical model above, with x̄ the sample mean, the full conditionals are
θ | x, σ² ~ N( (σ²θ0 + nτ²x̄) / (σ² + nτ²), σ²τ² / (σ² + nτ²) ),
σ² | x, θ ~ IG( n/2 + a, (1/2)Σi(xi − θ)² + b ).

Example 1. Code in R
# Gibbs sampler for the normal hierarchical model above
x <- rnorm(1000, 10, 2)            # simulated data (true theta = 10, sigma = 2)
n <- length(x)
a <- 3; b <- 3                     # IG(a, b) prior on sigma^2
tau2 <- 10; theta0 <- 5            # N(theta0, tau2) prior on theta
Nsim <- 5000
xbar <- mean(x)
sh1 <- n / 2 + a                   # posterior shape for sigma^2 (constant)
sigma2 <- theta <- rep(0, Nsim)    # initialize arrays
sigma2[1] <- 1 / rgamma(1, shape = a, rate = b)   # initialize chain
B <- sigma2[1] / (sigma2[1] + n * tau2)
theta[1] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
for (i in 2:Nsim) {
  B <- sigma2[i - 1] / (sigma2[i - 1] + n * tau2)
  theta[i] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
  ra1 <- sum((x - theta[i])^2) / 2 + b            # posterior rate for sigma^2
  sigma2[i] <- 1 / rgamma(1, shape = sh1, rate = ra1)
}
mean(theta[3000:5000])     # posterior mean of theta after burn-in
mean(sigma2[3000:5000])    # posterior mean of sigma^2 after burn-in

Conjugate priors
In Bayesian probability theory, if the posterior distributions are in the same family as the prior distributions, then the prior and posterior are called conjugate distributions and the prior is called a conjugate prior.
P(θ|D) = P(θ)P(D|θ) / ∫ P(θ)P(D|θ) dθ

Conjugate priors. Example
Consider the normal distribution x ~ N(µ, σ²). For normally distributed x with fixed variance σ², the conjugate prior is also normally distributed. For the prior µ ~ N(µ0, σ0²) the posterior has the form
µ | x, σ² ~ N(µ̂0, σ̂0²),
µ̂0 = ( σ0² / (σ² + σ0²) ) x + ( σ² / (σ² + σ0²) ) µ0,
σ̂0² = σ²σ0² / (σ² + σ0²).

Conjugate priors. Example
Consider the normal distribution x ~ N(µ, σ²). For normally distributed x with fixed mean µ, the conjugate prior is distributed according to the inverse-gamma distribution. For the prior σ² ~ IG(α, β):
P(x | µ, σ²) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²)) ∝ (σ²)^(−1/2) exp(−(1/2)(x − µ)² / σ²),
P(σ²) = IG(α, β) = (β^α / Γ(α)) (σ²)^(−α−1) exp(−β / σ²),
P(σ² | x, µ) ∝ (σ²)^(−(α+1/2)−1) exp(−(β + (1/2)(x − µ)²) / σ²),
so the posterior is IG(α̂, β̂) with α̂ = α + 1/2 and β̂ = β + (1/2)(x − µ)².

ABC SMC
A number of sampled parameter values (particles) {θ(1), ..., θ(N)}, sampled from the prior distribution π(θ), are propagated through a sequence of intermediate distributions π(θ | d(x0, x*) ≤ εi), i = 1, ..., T−1, until they represent a sample from the target distribution π(θ | d(x0, x*) ≤ εT). The tolerances ε1 > ... > εT ≥ 0 are chosen so that the populations gradually evolve towards the target posterior. For sufficiently large numbers of particles, this approach avoids the problem of getting stuck in areas of low probability (as in ABC MCMC).

ABC SMC Algorithm
S1. Initialize ε1, ..., εT. Set the population indicator t = 0.
S2.0 Set the particle indicator i = 1.
S2.1 If t = 0, sample θ** independently from π(θ). Else, sample θ* from the previous population {θt−1(i)} with weights wt−1 and perturb the particle to obtain θ** ~ Kt(θ | θ*), where Kt is a perturbation kernel. If π(θ**) = 0, return to S2.1. Simulate a candidate dataset x* ~ f(x | θ**). If d(x*, x0) ≥ εt, return to S2.1.
S2.2 Set θt(i) = θ** and calculate the weight for particle θt(i):
wt(i) = 1 if t = 0,
wt(i) = π(θt(i)) / Σj=1..N wt−1(j) Kt(θt−1(j), θt(i)) if t > 0.
If i < N, set i = i + 1 and go to S2.1.
S3. Normalize the weights. If t < T, set t = t + 1 and go to S2.0.
A sketch in R of this algorithm is given below.
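The following is a minimal R sketch of the ABC SMC algorithm above, reusing the toy normal-mean model from the rejection sketch and a Gaussian perturbation kernel Kt of fixed width. The tolerance schedule, kernel width, particle count, and variable names are illustrative assumptions; populations are indexed from 1 here rather than from 0.

# ABC SMC for the toy model (assumed example): x ~ N(theta, 1),
# prior theta ~ N(0, 10), distance d = |mean(x*) - mean(x0)|
set.seed(2)
x0 <- rnorm(100, mean = 2, sd = 1)                # "observed" data
prior_d <- function(th) dnorm(th, 0, sqrt(10))    # prior density pi(theta)
eps <- c(1, 0.5, 0.2, 0.1)                        # S1: decreasing tolerances
N <- 500                                          # particles per population
kern_sd <- 0.5                                    # sd of Gaussian kernel K_t
theta <- w <- numeric(N)
for (t in seq_along(eps)) {
  theta_new <- w_new <- numeric(N)
  for (i in 1:N) {
    repeat {                                      # S2.1
      if (t == 1) {
        th2 <- rnorm(1, 0, sqrt(10))              # sample from the prior
      } else {
        th1 <- sample(theta, 1, prob = w)         # sample previous population
        th2 <- rnorm(1, th1, kern_sd)             # perturb with K_t
      }
      if (prior_d(th2) == 0) next                 # reject if prior is zero
      x_star <- rnorm(100, mean = th2, sd = 1)    # simulate candidate dataset
      if (abs(mean(x_star) - mean(x0)) < eps[t]) break
    }
    theta_new[i] <- th2                           # S2.2: store and weight
    w_new[i] <- if (t == 1) 1 else
      prior_d(th2) / sum(w * dnorm(th2, theta, kern_sd))
  }
  theta <- theta_new
  w <- w_new / sum(w_new)                         # S3: normalize the weights
}
sum(w * theta)   # weighted posterior mean of theta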
Ovarian Cancer case study. CA125 risk calculation
Change-point hierarchical model for CA125:
Controls: Yij | tij ~ N(θi, σ²)
Cases: Yij | tij, {Ii = 0} ~ N(θi, σ²)
Yij | tij, {Ii = 1} ~ N(θi + γi(tij − τi)+, σ²)
Here (following Skates, Pauler and Jacobs, 2001) Yij is the marker level of subject i at time tij, Ii indicates whether the subject's marker has a change point, τi is the subject-specific change point, γi is the slope after the change point, and (·)+ denotes the positive part.

Conditional distributions

Conclusion
1. ABC methods have a great impact on parameter estimation.
2. Many applied problems can be reduced to a hierarchical model.
3. The Gibbs sampling algorithm is the most useful for dealing with hierarchical models.

Literature
1. Steven J. Skates, Donna K. Pauler, Ian J. Jacobs. Screening based on the risk of cancer calculation from Bayesian hierarchical changepoint and mixture models of longitudinal markers. Journal of the American Statistical Association, vol. 96 (2001).
2. Wasserman L. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
3. Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, Michael P. H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6, 187-202 (2009).
4. Christian P. Robert, George Casella. Introducing Monte Carlo Methods with R. Springer, 2009.

Questions