Metropolis-Hastings Algorithm
The original algorithm, used in simulated annealing and MCMC,
is due to Metropolis and was later generalized by Hastings.
Hastings showed that it is not necessary to use a symmetric
proposal distribution: the proposed new state can be generated
from any q(y|x).
The speed with which we reach the equilibrium distribution
depends on the choice of the proposal function.
Data Analysis and Monte Carlo Methods
Extra - Metropolis Hastings
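Given x_i:
1.  Generate Y_i according to q(y|x_i)
2.  Take
$$X_{i+1} = \begin{cases} Y_i & \text{with probability } \rho(x_i, Y_i) \\ x_i & \text{with probability } 1 - \rho(x_i, Y_i) \end{cases}$$
where
$$\rho(x, y) = \min\left\{ \frac{f(y)}{f(x)} \, \frac{q(x \mid y)}{q(y \mid x)},\ 1 \right\}$$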
f is the target density and q is the instrumental or proposal
distribution. Note that if f(Y_i) q(x_i|Y_i) > f(x_i) q(Y_i|x_i), then
the new step is always accepted. Otherwise, it is accepted only with
probability ρ(x_i, Y_i). If the proposal distribution is symmetric,
q(y|x) = q(x|y), then the acceptance probability depends only on the
ratio f(y)/f(x) (Metropolis algorithm).
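The update rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the lecture: the names `mh_step`, `mh_chain`, `target`, `propose` and `proposal_density` are hypothetical, and the Gamma-shaped target and exponential proposal are chosen here only because the proposal is not symmetric, so the q ratio in ρ does not cancel.

```python
import math
import random


def mh_step(x, f, q_sample, q_density, rng):
    """One Metropolis-Hastings update from the current state x."""
    y = q_sample(x, rng)              # Y ~ q(.|x)
    num = f(y) * q_density(x, y)      # f(y) q(x|y)
    den = f(x) * q_density(y, x)      # f(x) q(y|x)
    rho = 1.0 if num >= den else num / den
    if rng.random() < rho:
        return y, True                # accept: move to Y
    return x, False                   # reject: repeat the current state


def mh_chain(x0, n, f, q_sample, q_density, seed=0):
    """Run n updates from x0; return the chain and the acceptance rate."""
    rng = random.Random(seed)
    x, chain, n_accepted = x0, [], 0
    for _ in range(n):
        x, accepted = mh_step(x, f, q_sample, q_density, rng)
        n_accepted += accepted
        chain.append(x)
    return chain, n_accepted / n


# Illustrative (assumed) target: unnormalized Gamma(2, 1) density, mean 2.
def target(x):
    return x * math.exp(-x) if x > 0.0 else 0.0


# Illustrative (assumed) proposal: exponential with rate 0.5, independent of x,
# so q(y|x) != q(x|y) in general and the q ratio must be kept in rho.
def propose(x, rng):
    return rng.expovariate(0.5)


def proposal_density(y, x):
    """q(y|x) for the exponential proposal (x is unused)."""
    return 0.5 * math.exp(-0.5 * y) if y >= 0.0 else 0.0


chain, rate = mh_chain(1.0, 20000, target, propose, proposal_density, seed=1)
print("sample mean:", sum(chain) / len(chain), "acceptance rate:", rate)
```

The target only needs to be known up to a constant, because f enters ρ only through the ratio f(y)/f(x). Rejected proposals append the current state again, which is where the repeated values in the chain come from.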
The method looks a lot like hit-or-miss, except that we can have
repeated instances of the same value: when a proposal is rejected,
the chain stays at its current state.
Note that the Metropolis-Hastings algorithm satisfies the definition
of a Markov process (the next state depends only on the current state).
The following conditions are sufficient for the stationary distribution
to yield the target density: Ω, the support of f, is connected and
$$\bigcup_{x \in \operatorname{supp} f} \operatorname{supp} q(\cdot \mid x) \supset \operatorname{supp} f$$
This means we can visit all parts of Ω using q, given enough steps.
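To see why the requirement on Ω matters, here is a small Python sketch. It is an assumed illustration, not from the lecture: the two-interval target and the name `fraction_in_right_piece` are hypothetical. The target is flat on [0, 1] and on [2, 3], so its support is not connected, and a random-walk proposal whose step is shorter than the gap can never cross it.

```python
import random


def f(x):
    # Unnormalized target: flat on [0, 1] and on [2, 3] (disconnected support).
    return 1.0 if (0.0 <= x <= 1.0 or 2.0 <= x <= 3.0) else 0.0


def fraction_in_right_piece(s, n=20000, x0=0.5, seed=0):
    """Random-walk Metropolis with steps uniform on [-s, s].

    Returns the fraction of chain states that lie in [2, 3].
    """
    rng = random.Random(seed)
    x, hits = x0, 0
    for _ in range(n):
        y = x + rng.uniform(-s, s)   # symmetric proposal, so rho = f(y)/f(x)
        if f(y) > 0.0:               # f(x) = 1 here, so rho is either 0 or 1
            x = y
        hits += (x >= 2.0)
    return hits / n


stuck = fraction_in_right_piece(s=0.5)    # steps too short to cross the gap
mixing = fraction_in_right_piece(s=2.5)   # steps long enough to cross it
print(stuck, mixing)
```

With s = 0.5 every proposal made from [0, 1] lands in [-0.5, 1.5], so the chain never reaches [2, 3] no matter how long it runs. With s = 2.5 both pieces are visited, and the fraction of time spent in each should approach one half.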
The transition kernel can be written as (continuous form):
$$P_{xy} = \rho(x, y)\, q(y \mid x) + (1 - r(x))\, \delta(y - x)$$
$$r(x) = \int \rho(x, y)\, q(y \mid x)\, dy$$
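Claim: detailed balance is built into the algorithm.
$$P_{xy} f(x) = \rho(x, y)\, q(y \mid x)\, f(x) + (1 - r(x))\, \delta(y - x)\, f(x)$$
$$\begin{aligned}
\rho(x, y)\, q(y \mid x)\, f(x) &= \min\left\{ \frac{f(y)\, q(x \mid y)}{f(x)\, q(y \mid x)},\ 1 \right\} q(y \mid x)\, f(x) \\
&= \min\{ f(y)\, q(x \mid y),\ q(y \mid x)\, f(x) \}
\end{aligned}$$
$$\begin{aligned}
\rho(y, x)\, q(x \mid y)\, f(y) &= \min\left\{ \frac{f(x)\, q(y \mid x)}{f(y)\, q(x \mid y)},\ 1 \right\} q(x \mid y)\, f(y) \\
&= \min\{ f(y)\, q(x \mid y),\ q(y \mid x)\, f(x) \}
\end{aligned}$$
and
$$(1 - r(x))\, \delta(y - x)\, f(x) = (1 - r(y))\, \delta(x - y)\, f(y)$$
so
$$P_{xy} f(x) = P_{yx} f(y)$$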
So f is a stationary distribution of P_xy: integrating the detailed
balance relation over x gives ∫ P_xy f(x) dx = f(y) ∫ P_yx dx = f(y).
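The claim can be checked numerically on a small discrete state space, where the integral becomes a sum and the delta term becomes the diagonal of the transition matrix. The sketch below is an assumed illustration: the four-state target `f` and the non-symmetric proposal matrix `q` are arbitrary hypothetical values, not taken from the lecture.

```python
# Target probabilities f(x) on four states (illustrative values).
f = [0.1, 0.2, 0.3, 0.4]

# Proposal matrix: q[x][y] = q(y|x). Rows sum to 1; deliberately not symmetric.
q = [
    [0.10, 0.50, 0.20, 0.20],
    [0.30, 0.10, 0.40, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.60, 0.10, 0.20, 0.10],
]
n = len(f)


def rho(x, y):
    """Metropolis-Hastings acceptance probability for the move x -> y."""
    return min(f[y] * q[y][x] / (f[x] * q[x][y]), 1.0)


# Transition matrix: accepted moves off the diagonal, everything else stays put.
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if y != x:
            P[x][y] = rho(x, y) * q[x][y]
    P[x][x] = 1.0 - sum(P[x])   # diagonal is still 0 here, so this sums moves away

# Detailed balance: f(x) P_xy = f(y) P_yx for every pair of states.
balance_gap = max(
    abs(f[x] * P[x][y] - f[y] * P[y][x]) for x in range(n) for y in range(n)
)

# Stationarity: sum_x f(x) P_xy = f(y) for every y.
stationary_gap = max(
    abs(sum(f[x] * P[x][y] for x in range(n)) - f[y]) for y in range(n)
)
print(balance_gap, stationary_gap)
```

Both gaps should vanish up to floating-point rounding, whatever positive proposal matrix is used.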
As we showed, detailed balance implies that f is a stationary
distribution of the transition kernel.
Since X(t+1) = X(t) is possible, the chain is aperiodic.
Furthermore, if
$$\bigcup_{x \in \operatorname{supp} f} \operatorname{supp} q(\cdot \mid x) \supset \operatorname{supp} f$$
then the chain is irreducible (all states communicate).
Recurrence can also be demonstrated.
Therefore, the ergodic theorem applies: the Markov chain is
guaranteed to reach the stationary distribution independent
of the starting point. The stationary distribution is f.
Example
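Sample from the target function
$$f(x) = (\cos(50x) + \sin(20x))^2$$
using the following algorithm:
* testfun is assumed to evaluate f; rvec and rvec1 are assumed to be
* arrays of uniform random numbers on (0,1) filled beforehand
* starting value
      x=0.5
* counter of accepted steps (not initialized on the original slide)
      Igood=0
*
      Do I=1,100000
* the trial coordinate is generated flat (flat covering function)
         y=rvec(I)
* Get the ratio of probabilities
         rho=min(testfun(y)/testfun(x),1.)
         pass=0
* accept the trial coordinate with probability rho
         If (rvec1(I).le.rho) then
            Igood=Igood+1
            pass=1
         Endif
         write (25,*) I,Igood,pass,rho,x,y
* on rejection x is kept, so the same value is repeated
         If (pass.eq.1) x=y
      Enddo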
Example
We now look at the importance of the proposal distribution.
Generate a Gaussian distribution with zero mean and σ = 1 from a
random walk Markov chain with a step drawn from a flat
distribution as follows:
1.  Generate a number from a flat distribution on [-s, s]; call
it ε. Now set y = x_t + ε
2.  Calculate
$$\rho = \min\left\{ \frac{e^{-y^2/2}}{e^{-x_t^2/2}},\ 1 \right\}$$
(note that q(y|x) = q(x|y))
3.  Set x_{t+1} = y if U < ρ, where U is a random variable drawn from a
uniform distribution on (0,1); otherwise set x_{t+1} = x_t
We will look at how quickly we converge to the desired
distribution depending on s.
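The three steps can be sketched in Python to compare different step sizes. This is a minimal sketch under assumptions: the names `random_walk_gaussian` and `summarize`, the chain length, and the three values of s are illustrative choices, not the settings used in the lecture.

```python
import math
import random


def random_walk_gaussian(s, n=50000, x0=0.0, seed=0):
    """Random-walk Metropolis for a unit Gaussian with steps uniform on [-s, s]."""
    rng = random.Random(seed)
    x, n_accepted, chain = x0, 0, []
    for _ in range(n):
        y = x + rng.uniform(-s, s)            # step 1: y = x_t + eps
        log_ratio = (x * x - y * y) / 2.0     # log of exp(-y^2/2) / exp(-x^2/2)
        rho = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)   # step 2
        if rng.random() < rho:                # step 3: accept if U < rho
            x = y
            n_accepted += 1
        chain.append(x)
    return chain, n_accepted / n


def summarize(chain):
    """Sample mean and variance of the chain."""
    m = len(chain)
    mean = sum(chain) / m
    var = sum((v - mean) ** 2 for v in chain) / m
    return mean, var


results = {}
for s in (0.1, 2.5, 50.0):
    chain, rate = random_walk_gaussian(s)
    results[s] = (rate, *summarize(chain))
    print("s =", s, "rate, mean, variance =", results[s])
```

A very small s accepts almost every step but moves slowly, while a very large s proposes mostly far into the tails and is rarely accepted. Both extremes need more steps than an intermediate s before the sample mean and variance settle near 0 and 1.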
Lecture 7