Inference V: MCMC Methods

Stochastic Sampling
- In the previous class, we examined methods that use independent samples to estimate P(X = x | e).
- Problem: it is difficult to sample from P(X_1, ..., X_n | e) directly.
- We had to use likelihood weighting to reweight our samples, and this introduced bias into the estimates.
- In some cases, such as when the evidence is on the leaves, these methods are inefficient.

MCMC Methods
- We now discuss sampling methods that are based on Markov chains: Markov Chain Monte Carlo (MCMC) methods.
- Key ideas:
  - The sampling process is a Markov chain: each sample depends on the previous one.
  - These methods can approximate any posterior distribution.
- We start by reviewing key ideas from the theory of Markov chains.

Markov Chains
- X_1, X_2, ... take values in some set; w.l.o.g. these values are 1, 2, ...
- A Markov chain is a process that corresponds to the network X_1 -> X_2 -> X_3 -> ... -> X_n.
- To quantify the chain, we need to specify:
  - the initial probability P(X_1), and
  - the transition probability P(X_{t+1} | X_t).
- A Markov chain has stationary transition probabilities: P(X_{t+1} | X_t) is the same for all times t.

Irreducible Chains
- State j is accessible from state i if there is an n such that P(X_n = j | X_1 = i) > 0, i.e., there is a positive probability of reaching j from i after some number of steps.
- A chain is irreducible if every state is accessible from every state.

Ergodic Chains
- State i is positively recurrent if the expected time to return to state i, after being in state i, is finite.
- If X has a finite number of states, it suffices that i is accessible from itself.
- A chain is ergodic if it is irreducible and every state is positively recurrent.

(A)periodic Chains
- State i is periodic if there is an integer d > 1 such that P(X_n = i | X_1 = i) = 0 whenever n is not divisible by d.
- A chain is aperiodic if it contains no periodic state.

Stationary Probabilities
- Theorem: if a chain is ergodic and aperiodic, then the limit
    \lim_{n \to \infty} P(X_n | X_1 = i)
  exists and does not depend on i. Moreover, if we let
    P^*(X = j) = \lim_{n \to \infty} P(X_n = j | X_1 = i)
  then P^*(X) is the unique probability distribution satisfying
    P^*(X = j) = \sum_i P(X_{t+1} = j | X_t = i) P^*(X = i)
- The probability P^*(X) is called the stationary probability of the process: regardless of the starting point, the process will converge to it.
- The rate of convergence depends on properties of the transition probability.

Sampling from the Stationary Probability
- This theory suggests how to sample from the stationary probability (see the sketches below):
  - Set X_1 = i for some random/arbitrary i.
  - For t = 1, 2, ..., n-1: sample a value x_{t+1} for X_{t+1} from P(X_{t+1} | X_t = x_t).
  - Return x_n.
- If n is large enough, then x_n is (approximately) a sample from P^*(X).
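To make the fixed-point characterization of P^*(X) concrete, here is a minimal Python sketch; the 2-state transition matrix is an arbitrary illustrative assumption, not something from the lecture. It simply iterates the update P^*(j) <- \sum_i P(j | i) P^*(i) until it converges.

```python
import numpy as np

# Assumed 2-state chain: T[i, j] = P(X_{t+1} = j | X_t = i).
# The entries are arbitrary; any ergodic, aperiodic chain works.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Iterate the stationarity equation P*(j) = sum_i P(j | i) * P*(i),
# starting from an arbitrary initial distribution.
p = np.array([1.0, 0.0])
for _ in range(1000):
    p = p @ T

print(p)  # -> approx [0.8333, 0.1667], independent of the start
```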
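And a sketch of the sampling procedure itself, under the same assumed transition matrix: start at an arbitrary state, take n transition steps, and return the last state. Repeating this many times and histogramming the returned states should reproduce the P^*(X) computed above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1],    # same assumed chain as in the previous sketch
              [0.5, 0.5]])

def sample_stationary(n: int) -> int:
    """Run the chain for n transitions and return the final state x_n."""
    x = rng.integers(2)                 # arbitrary starting state x_1
    for _ in range(n):
        x = rng.choice(2, p=T[x])       # x_{t+1} ~ P(X_{t+1} | X_t = x)
    return x

samples = [sample_stationary(100) for _ in range(10_000)]
print(np.bincount(samples) / len(samples))  # -> approx [0.833, 0.167]
```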
Designing Markov Chains
- How do we construct the right chain to sample from?
- Ensuring aperiodicity and irreducibility is usually easy; the problem is ensuring the desired stationary probability.
- Key tool: if the transition probability satisfies
    P(X_{t+1} = j | X_t = i) / P(X_{t+1} = i | X_t = j) = Q(X = j) / Q(X = i)
  whenever P(X_{t+1} = j | X_t = i) > 0, then P^*(X) = Q(X).
- This gives a local criterion for checking that the chain will have the right stationary distribution (a numerical check appears in the first sketch at the end of this section).

MCMC Methods
- We can use these results to sample from P(X_1, ..., X_n | e).
- Idea:
  - Construct an ergodic and aperiodic Markov chain such that P^*(X_1, ..., X_n) = P(X_1, ..., X_n | e).
  - Simulate the chain for n steps to get a sample.
- Note: the Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence. For simplicity, we denote such a state by the vector x_1, ..., x_n:
    V(Y) = { (x_1, ..., x_n) ∈ V(X_1) × ... × V(X_n) | x_1, ..., x_n satisfy e }

Gibbs Sampler
- One of the simplest MCMC methods.
- At each transition, change the state of just one X_i.
- We can describe the transition probability as a stochastic procedure (see the second sketch at the end of this section):
  - Input: a state x_1, ..., x_n.
  - Choose i at random (using a uniform probability).
  - Sample x'_i from P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e).
  - Let x'_j = x_j for all j ≠ i.
  - Return x'_1, ..., x'_n.

Correctness of the Gibbs Sampler
- By the chain rule,
    P(x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_n | e) = P(x_1, ..., x_{i-1}, x_{i+1}, ..., x_n | e) P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e)
- Thus, we get
    P(x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_n | e) / P(x_1, ..., x_{i-1}, x'_i, x_{i+1}, ..., x_n | e)
      = P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e) / P(x'_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e)
- Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion.

Gibbs Sampling in Bayesian Networks
- Why is the Gibbs sampler "easy" in BNs?
- Recall that the Markov blanket Mb_i of a variable separates it from the other variables in the network:
    P(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) = P(X_i | Mb_i)
- This property allows us to use local computations to perform the sampling at each transition.
- How do we evaluate P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)?
- Let Y_1, ..., Y_k be the children of X_i. By the definition of Mb_i, the parents of each Y_j are contained in Mb_i ∪ {X_i}. It is easy to show that
    P(x_i | Mb_i) / P(x'_i | Mb_i) = [ P(x_i | Pa_i) ∏_j P(y_j | pa_{Y_j}) ] / [ P(x'_i | Pa_i) ∏_j P(y_j | pa'_{Y_j}) ]
  where pa_{Y_j} (resp. pa'_{Y_j}) is the assignment to Y_j's parents with X_i set to x_i (resp. x'_i).

Sampling Strategy
- How do we collect the samples?
- Strategy I: run the chain M times, each run for N steps, with each run starting from a different state. Return the last state of each run: M samples from M chains.
- Strategy II: run one chain for a long time; after some "burn-in" period, record a sample every fixed number of steps: M samples from one chain.

Comparing Strategies
- Strategy I:
  - Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity.
  - Has to perform the "burn-in" steps for each chain.
- Strategy II:
  - Performs "burn-in" only once.
  - Samples might be correlated (although only weakly).
- Hybrid strategy: run several chains and take a few samples from each. This combines the benefits of both strategies (the third sketch below contrasts Strategies I and II).
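First sketch: the ratio criterion from "Designing Markov Chains", checked numerically. The target Q and the Metropolis-style transition matrix below are invented for illustration; the transitions propose the other state and accept with probability min(1, Q(j)/Q(i)), which enforces the criterion by construction.

```python
import numpy as np

Q = np.array([0.2, 0.8])   # invented target distribution over two states

# Metropolis-style transitions: from i, propose the other state j and
# accept with probability min(1, Q(j) / Q(i)). By construction,
#   P(j | i) / P(i | j) = Q(j) / Q(i)  whenever P(j | i) > 0.
T = np.array([[0.00, 1.00],
              [0.25, 0.75]])

# Ratio criterion, in the equivalent form Q(i) P(j | i) = Q(j) P(i | j):
assert np.isclose(Q[0] * T[0, 1], Q[1] * T[1, 0])

# Hence the stationary probability is Q itself: Q = Q @ T.
assert np.allclose(Q @ T, Q)
print(Q @ T)  # -> [0.2, 0.8]
```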
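Second sketch: the Gibbs sampler on a toy Bayesian network, a chain A -> B -> C of binary variables with evidence C = 1. The network and all CPT numbers are invented for this example. Each transition picks one unobserved variable uniformly at random and resamples it given its Markov blanket, using only the local CPTs, as promised by the Markov-blanket identity above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented CPTs for the toy network A -> B -> C; the evidence is C = 1.
p_a1 = 0.3                        # P(A = 1)
p_b1 = {0: 0.2, 1: 0.8}           # P(B = 1 | A = a)
p_c1 = {0: 0.1, 1: 0.9}           # P(C = 1 | B = b)

def sample_a(b: int) -> int:
    # A's Markov blanket is {B}: P(A | b) is proportional to P(A) * P(b | A)
    w0 = (1 - p_a1) * (p_b1[0] if b else 1 - p_b1[0])
    w1 = p_a1       * (p_b1[1] if b else 1 - p_b1[1])
    return int(rng.random() < w1 / (w0 + w1))

def sample_b(a: int) -> int:
    # B's Markov blanket is {A, C}: P(B | a, C=1) prop. to P(B | a) * P(C=1 | B)
    w0 = (1 - p_b1[a]) * p_c1[0]
    w1 = p_b1[a]       * p_c1[1]
    return int(rng.random() < w1 / (w0 + w1))

# Gibbs chain over the unobserved variables (A, B); C = 1 stays fixed.
a, b = 0, 0                       # arbitrary initial state consistent with e
count_a1, n_steps = 0, 50_000
for _ in range(n_steps):
    if rng.random() < 0.5:        # choose the variable to update uniformly
        a = sample_a(b)
    else:
        b = sample_b(a)
    count_a1 += a

print(count_a1 / n_steps)         # -> approx 0.55; exact P(A=1 | C=1) = 0.5495...
```

Note that sample_a and sample_b consult only the CPTs of the variable itself and of its children; nothing else in the network is touched, which is what makes each Gibbs transition cheap.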
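Third sketch: the two collection strategies, run on the same kind of arbitrary 2-state chain used earlier. Strategy I restarts M independent chains and keeps only each final state; Strategy II runs one long chain and, after a burn-in, keeps every k-th state.

```python
import numpy as np

rng = np.random.default_rng(2)
T = np.array([[0.9, 0.1],        # assumed transition matrix, as before
              [0.5, 0.5]])

def step(x: int) -> int:
    return rng.choice(2, p=T[x])

M, N = 1000, 100                  # samples wanted / steps per restarted chain

# Strategy I: M independent chains; keep the last state of each run.
samples_i = []
for _ in range(M):
    x = rng.integers(2)          # fresh arbitrary start for every chain
    for _ in range(N):
        x = step(x)
    samples_i.append(x)

# Strategy II: one long chain; discard a burn-in, then keep every k-th state.
burn_in, k = 1000, 10
x = rng.integers(2)
samples_ii = []
for t in range(burn_in + M * k):
    x = step(x)
    if t >= burn_in and (t - burn_in) % k == 0:
        samples_ii.append(x)

# Both estimates should approach the stationary probability [0.833, 0.167];
# Strategy II pays the burn-in once, but its samples are weakly correlated.
print(np.bincount(samples_i) / M, np.bincount(samples_ii) / M)
```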