Download Inference V: MCMC Methods - CS

Inference V: MCMC Methods . Stochastic Sampling  In previous class, we examined methods that use independent samples to estimate P(X = x |e ) Problem: It is difficult to sample from P(X1, …. Xn |e )  We had to use likelihood weighting to reweigh our samples  This introduced bias in estimation  In some case, such as when the evidence is on leaves, these methods are inefficient MCMC Methods  We are going to discuss sampling methods that are based on Markov Chain  Markov Chain Monte Carlo (MCMC) methods  Key  ideas: Sampling process as a Markov Chain  Next  sample depends on the previous one These will approximate any posterior distribution  We start by reviewing key ideas from the theory of Markov chains Markov Chains X1, X2, … take some set of values  wlog. These values are 1, 2, ...  A Markov chain is a process that corresponds to the network:  Suppose X1  To X2 X3 ... Xn quantify the chain, we need to specify  Initial probability: P(X1)  Transition probability: P(Xt+1|Xt)  A Markov chain has stationary transition probability  P(Xt+1|Xt) is the same for all times t ... Irreducible Chains state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0  There is a positive probability of reaching j from i after some number steps A A chain is irreducible if every state is accessible from every state Ergodic Chains A state is positively recurrent if there is a finite expected time to get back to state i after being in state i  If X has finite number of states, then this is suffices that i is accessible from itself A chain is ergodic if it is irreducible and every state is positively recurrent (A)periodic Chains state i is periodic if there is an integer d such that P(Xn = i | X1 = i ) = 0 when n is not divisible by d A A chain is aperiodic if it contains no periodic state Stationary Probabilities Thm:  If a chain is ergodic and aperiodic, then the limit lim P (Xn | X1  i ) n  exists, and does not depend on i  Moreover, let P * (X  j )  lim P (Xn  j | X1  i ) n  then, P*(X) is the unique probability satisfying P * (X  j )   P (Xt 1  j | Xt  i )P * (X  i ) i Stationary Probabilities probability P*(X) is the stationary probability of the process  Regardless of the starting point, the process will converge to this probability  The  The rate of convergence depends on properties of the transition probability Sampling from the stationary probability  This theory suggests how to sample from the stationary probability:    Set X1 = i, for some random/arbitrary i For t = 1, 2, …, n  Sample a value xt+1 for Xt+1 from P(Xt+1|Xt=xt) return xn  If n is P*(X) large enough, then this is a sample from Designing Markov Chains  How do we construct the right chain to sample from?  Ensuring aperiodicity and irreducibility is usually easy  Problem is ensuring the desired stationary probability Designing Markov Chains Key tool:  If the transition probability satisfies P ( Xt 1  j |Xt i ) P ( Xt 1 i |Xt  j )  Q (X  j ) whenever P ( Xt 1  j | Xt  i )  0 Q ( X i ) then, P*(X) = Q(X)  This gives a local criteria for checking that the chain will have the right stationary distribution MCMC Methods  We can use these results to sample from P(X1,…,Xn|e) Idea:  Construct an ergodic & aperiodic Markov Chain such that P*(X1,…,Xn) = P(X1,…,Xn|e)  Simulate the chain n steps to get a sample MCMC Methods Notes:  The Markov chain variable Y takes as value assignments to all variables that are consistent evidence  For simplicity, we will denote such a state using the vector of variables V (Y )  { x1,..., xn V (X1 )  V (X1 ) | x1,..., xn satisfy e } Gibbs Sampler  One of the simplest MCMC method  At each transition change the state of just on Xi  We can describe the transition probability as a stochastic procedure:  Input: a state x1,…,xn  Choose i at random (using uniform probability)  Sample x’i from P(Xi|x1, …, xi-1, xi+1 ,…, xn, e)  let x’j = xj for all j  i  return x’1,…,x’n Correctness of Gibbs Sampler  By chain rule P(x1, …, xi-1, xi, xi+1 ,…, xn|e) = P(x1, …, xi-1, xi+1 ,…, xn|e)P(xi|x1, …, xi-1, xi+1 ,…, xn, e)  Thus, we get P ( x1 ,,xi 1 ,xi ,xi 1 ,,xn |e ) P ( x1 ,,xi 1 ,x 'i ,xi 1 ,,xn |e )  Since  P ( xi |x1 ,,xi 1 ,xi 1 ,,xn ,e ) P ( x 'i |x1 ,,xi 1 ,xi 1 ,,xn ,e ) we choose i from the same distribution at each stage, this procedure satisfies the ratio criteria Gibbs Sampling for Bayesian Network  Why is the Gibbs sampler “easy” in BNs?  Recall that the Markov blanket of a variable separates it from the other variables in the network  P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi | Mbi )  This property allows us to use local computations to perform sampling in each transition Gibbs Sampling in Bayesian Networks do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn) ?  Let Y1, …, Yk be the children of Xi  By definition of Mbi, the parents of Yj are in Mbi{Xi}  How  It is easy to show that P (xi | Mbi )  P (xi | Pai ) P (yi | pay j ) j  P (x ' | Pa ) P (y x 'i i i j i | pay j ) Sampling Strategy  How do we collect the samples? Strategy I:  Run the chain M times, each run for N steps  each run starts from a different state points  Return the last state in each run M chains Sampling Strategy Strategy II:  Run one chain for a long time  After some “burn in” period, sample points every some fixed number of steps “burn in” M samples from one chain Comparing Strategies Strategy I:   Better chance of “covering” the space of points especially if the chain is slow to reach stationarity Have to perform “burn in” steps for each chain Strategy II:   Perform “burn in” only once Samples might be correlated (although only weakly) Hybrid strategy:  run several chains, and sample few samples from each  Combines benefits of both strategies

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Inference V: MCMC Methods - CS