CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Monte Carlo Methods for Probabilistic Inference

AGENDA

- Monte Carlo methods
  - O(1/sqrt(N)) standard deviation
- For Bayesian inference
  - Likelihood weighting
  - Gibbs sampling

MONTE CARLO INTEGRATION

Estimate large integrals/sums:

\[ I = \int f(x)\,p(x)\,dx \qquad\text{or}\qquad I = \sum_x f(x)\,p(x) \]

Using N i.i.d. samples $x^{(1)},\dots,x^{(N)}$ from p(x):

\[ I \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \]

Examples:

- $\int_{[a,b]} f(x)\,dx \approx \frac{b-a}{N}\sum_{i} f(x^{(i)})$, with the $x^{(i)}$ drawn uniformly from [a,b]
- $E[X] = \int x\,p(x)\,dx \approx \frac{1}{N}\sum_{i} x^{(i)}$
- Volume of a set in $\mathbb{R}^n$

MEAN & VARIANCE OF ESTIMATE

Let $I_N$ be the random variable denoting the estimate of the integral with N samples.

What is the bias (mean error) $E[I - I_N]$?

\[
\begin{aligned}
E[I - I_N] &= I - E[I_N] && \text{(linearity of expectation)}\\
&= E[f(x)] - \frac{1}{N}\sum_{i} E[f(x^{(i)})] && \text{(definition of $I$ and $I_N$)}\\
&= \frac{1}{N}\sum_{i}\left(E[f(x)] - E[f(x^{(i)})]\right)\\
&= \frac{1}{N}\sum_{i} 0 && \text{($x$ and $x^{(i)}$ are distributed w.r.t. $p(x)$)}\\
&= 0
\end{aligned}
\]

So $I_N$ is an unbiased estimator.

What is the variance $\mathrm{Var}[I_N]$?
\[
\begin{aligned}
\mathrm{Var}[I_N] &= \mathrm{Var}\Big[\frac{1}{N}\sum_{i} f(x^{(i)})\Big] && \text{(definition)}\\
&= \frac{1}{N^2}\,\mathrm{Var}\Big[\sum_{i} f(x^{(i)})\Big] && \text{(scaling of variance)}\\
&= \frac{1}{N^2}\sum_{i} \mathrm{Var}[f(x^{(i)})] && \text{(variance of a sum of independent variables)}\\
&= \frac{1}{N}\,\mathrm{Var}[f(x)] && \text{(i.i.d. sample)}
\end{aligned}
\]

Summary:

- Bias: 0 (unbiased estimator)
- Variance: $\frac{1}{N}\mathrm{Var}[f(x)]$
- Standard deviation: O(1/sqrt(N))

APPROXIMATE INFERENCE THROUGH SAMPLING

Unconditional simulation: To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed.
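The coin-flip simulation is itself a Monte Carlo estimate with f(x) = 1 for heads and 0 for tails, so it can be used to check the O(1/sqrt(N)) standard deviation numerically. The following is a minimal sketch under stated assumptions: the coin bias `p = 0.3`, the function names, and the number of repeated trials are all hypothetical choices for illustration, not part of the lecture.

```python
import math
import random
import statistics

def estimate_heads(p_heads, n, rng):
    """Monte Carlo estimate of P(heads): the fraction of heads in n flips."""
    return sum(rng.random() < p_heads for _ in range(n)) / n

def spread_of_estimate(p_heads, n, trials, rng):
    """Empirical standard deviation of the estimate over repeated experiments."""
    estimates = [estimate_heads(p_heads, n, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)

rng = random.Random(0)
p = 0.3  # hypothetical coin bias, chosen only for this illustration
for n in (100, 400, 1600):
    observed = spread_of_estimate(p, n, trials=500, rng=rng)
    predicted = math.sqrt(p * (1 - p) / n)  # sqrt(Var[f(x)] / N)
    print(f"N={n:5d}  observed std={observed:.4f}  predicted={predicted:.4f}")
```

Quadrupling N should roughly halve the observed spread, which is what the variance derivation predicts.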
Conditional simulation: To estimate the probability P(H) that a coin picked out of bucket B flips heads, repeat for i = 1,…,N:

1. Pick a coin C out of a random bucket $b^{(i)}$ chosen with probability P(B)
2. $h^{(i)}$ = flip C according to probability $P(H \mid b^{(i)})$
3. Sample $(h^{(i)}, b^{(i)})$ comes from distribution P(H,B)

Result approximates P(H,B).

MONTE CARLO INFERENCE IN BAYES NETS

BN over variables X. Repeat for i = 1,…,N:

- In top-down order, generate $x^{(i)}$ as follows: sample $x_j^{(i)} \sim P(X_j \mid pa_{X_j}^{(i)})$ (the RHS is obtained by putting the parent values in the sample into the CPT for $X_j$)

The samples $x^{(1)},\dots,x^{(N)}$ approximate the distribution over X.

APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION

Sample from the joint distribution.

[Figure: Bayes net in which Burglary and Earthquake are parents of Alarm, and Alarm is the parent of JohnCalls and MaryCalls]

P(B) = 0.001
P(E) = 0.002

B  E  P(A|…)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

A  P(J|…)
T  0.90
F  0.05

A  P(M|…)
T  0.70
F  0.01

Example sample: B=0 E=0 A=0 J=1 M=0

As more samples are generated, the distribution of the samples approaches the joint distribution:

B=0 E=0 A=0 J=1 M=0
B=0 E=0 A=0 J=0 M=0
B=0 E=0 A=0 J=0 M=0
B=1 E=0 A=1 J=1 M=0

BASIC METHOD FOR HANDLING EVIDENCE

Inference: given evidence E=e (e.g., J=1), approximate $P(X \setminus E \mid E=e)$.

Remove the samples that conflict with the evidence. With J=1, the two samples above that have J=0 are removed, leaving:

B=0 E=0 A=0 J=1 M=0
B=1 E=0 A=1 J=1 M=0

Distribution of remaining samples approximates the conditional distribution.

RARE EVENT PROBLEM

What if some events are really rare (e.g., burglary & earthquake)?
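The top-down sampling procedure and the sample-removal method for evidence can be sketched as follows. This is an illustrative sketch, not the course's code: the CPT values are transcribed from the alarm network above, while the function names, the dictionary layout, and the sample count are assumptions made here. The second call uses evidence A=1, M=1 to show how few samples survive when the evidence is rare.

```python
import random

# CPTs transcribed from the alarm network in the slides.
P_B = 0.001
P_E = 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}  # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}  # P(M=1 | A)

def bernoulli(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def forward_sample(rng):
    """Draw one joint sample in top-down (parents before children) order."""
    b = bernoulli(P_B, rng)
    e = bernoulli(P_E, rng)
    a = bernoulli(P_A[(b, e)], rng)
    j = bernoulli(P_J[a], rng)
    m = bernoulli(P_M[a], rng)
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

def rejection_estimate(query, evidence, n, rng):
    """Estimate P(query=1 | evidence) by discarding conflicting samples.

    Returns (estimate, samples kept); the estimate is None if none survive.
    """
    kept = hits = 0
    for _ in range(n):
        s = forward_sample(rng)
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]
    return (hits / kept if kept else None), kept

rng = random.Random(0)
n = 200_000
est, kept = rejection_estimate("A", {"J": 1}, n, rng)
print(f"P(A=1 | J=1) ~= {est:.3f} using {kept} of {n} samples")
_, kept_rare = rejection_estimate("B", {"A": 1, "M": 1}, n, rng)
print(f"evidence A=1, M=1 keeps only {kept_rare} of {n} samples")
```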
The number of samples must be huge to get a reasonable estimate.

Solution: likelihood weighting

- Enforce that each sample agrees with the evidence
- While generating a sample, keep track of the ratio
  (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)

LIKELIHOOD WEIGHTING

Suppose the evidence is Alarm & MaryCalls, and B and E are each sampled with P=0.5 rather than from their priors. Every sample starts with weight w=1. The CPTs are those of the alarm network: P(B)=0.001, P(E)=0.002, P(A|…) = 0.95, 0.94, 0.29, 0.001 for (B,E) = TT, TF, FT, FF, P(J|…) = 0.90, 0.05 and P(M|…) = 0.70, 0.01 for A = T, F.

First sample:

- B=0, E=1 is sampled: w = (0.999 × 0.002) / (0.5 × 0.5) ≈ 0.008
- A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs: w = 0.008 × 0.29 ≈ 0.0023
- M=1 is enforced and J=1 is sampled: w = 0.0023 × 0.70 ≈ 0.0016

Second sample:

- B=0, E=0 is sampled: w = (0.999 × 0.998) / (0.5 × 0.5) ≈ 3.988
- A=1 is enforced: w = 3.988 × 0.001 ≈ 0.004
- M=1 is enforced and J=1 is sampled for the sample B=0, E=0, A=1: w = 0.004 × 0.70 ≈ 0.0028

Third sample:

- B=1, E=0 is sampled and A=1 is enforced: w = (0.001 × 0.998) / (0.5 × 0.5) × 0.94 ≈ 0.00375
- M=1 is enforced and J=1 is sampled: w = 0.00375 × 0.70 ≈ 0.0026

Fourth sample:

- B=1, E=1 is sampled, A=1 and M=1 are enforced, and J=1 is sampled: w = (0.001 × 0.002) / (0.5 × 0.5) × 0.95 × 0.70 ≈ 5e-6

Resulting weighted samples:

w=0.0016  B=0 E=1 A=1 M=1 J=1
w=0.0028  B=0 E=0 A=1 M=1 J=1
w=0.0026  B=1 E=0 A=1 M=1 J=1
w~=0      B=1 E=1 A=1 M=1 J=1

N=4 gives P(B|A,M) ~= 0.0026 / (0.0016 + 0.0028 + 0.0026) ~= 0.371
Exact inference gives P(B|A,M) ~= 0.374

ANOTHER RARE-EVENT PROBLEM

[Figure: network over A1, A2, …, A10 and B1, B2, …, B10]

- B=b given as evidence
- Each $b_i$ is improbable under all but one setting of $A_i$ (say, $A_i$=1)
- Chance of sampling all 1's is very low => most likelihood weights will be too low
- Problem: evidence is not being used to sample the A's effectively (i.e., near $P(A_i \mid b)$)

GIBBS SAMPLING

Idea: reduce the computational burden of sampling from a multidimensional distribution $P(x) = P(x_1,\dots,x_n)$ by doing repeated draws of individual attributes.

- Cycle through j = 1,…,n
- Sample $x_j \sim P(x_j \mid x_1,\dots,x_{j-1},x_{j+1},\dots,x_n)$

Over the long run, the random walk taken by x approaches the true distribution P(x).

GIBBS SAMPLING IN BNS

Each Gibbs sampling step: 1) pick a
variable $X_i$, 2) sample $x_i \sim P(X_i \mid X \setminus X_i)$

Look at the values of the "Markov blanket" of $X_i$:

- Parents $Pa_{X_i}$
- Children $Y_1,\dots,Y_k$
- Parents of children (excluding $X_i$): $Pa_{Y_1} \setminus X_i, \dots, Pa_{Y_k} \setminus X_i$

$X_i$ is independent of the rest of the network given its Markov blanket.

Sample

\[ x_i \sim P(X_i \mid Pa_{X_i}, Y_1, Pa_{Y_1} \setminus X_i, \dots, Y_k, Pa_{Y_k} \setminus X_i) = \frac{1}{Z}\, P(X_i \mid Pa_{X_i})\, P(Y_1 \mid Pa_{Y_1}) \cdots P(Y_k \mid Pa_{Y_k}) \]

This is the product of $X_i$'s factor and the factors of its children, normalized by Z.

HANDLING EVIDENCE

Simply set each evidence variable to its appropriate value and don't sample it.

- Resulting walk approximates the distribution $P(X \setminus E \mid E=e)$
- Uses evidence more efficiently than likelihood weighting

GIBBS SAMPLING ISSUES

- Demonstrating correctness & convergence requires examining the Markov chain random walk (more later)
- Need to take many steps before the effects of poor initialization wear off (mixing time)
  - Difficult to tell how many steps are needed a priori
- Numerous variants
- Known as Markov Chain Monte Carlo techniques

NEXT TIME

Continuous and hybrid distributions
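The likelihood-weighting example and the Gibbs sampling step described earlier can be compared on the alarm network with evidence Alarm & MaryCalls. What follows is a rough sketch under stated assumptions: the CPT values come from the slides, but the function names, the sample counts, the burn-in length, and the choice to omit JohnCalls (an unobserved leaf that does not affect the query on B) are choices made here for illustration. With A clamped to 1, the Markov blanket of B is {A, E}, so only the factors P(B) and P(A|B,E) enter its conditional.

```python
import random

# CPTs transcribed from the alarm network in the slides.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_M_GIVEN_A1 = 0.70  # P(M=1 | A=1)

def prob(p_true, value):
    """P(X=value) for a binary X with P(X=1) = p_true."""
    return p_true if value else 1.0 - p_true

def likelihood_weighting(n, rng, q=0.5):
    """Estimate P(B=1 | A=1, M=1); B and E are proposed with probability q."""
    total = total_b = 0.0
    for _ in range(n):
        b = int(rng.random() < q)
        e = int(rng.random() < q)
        # Ratio of true probability to proposal probability for B and E.
        w = prob(P_B, b) / prob(q, b) * prob(P_E, e) / prob(q, e)
        # Evidence A=1 and M=1 is enforced; multiply in its likelihood.
        w *= P_A[(b, e)] * P_M_GIVEN_A1
        total += w
        total_b += w * b
    return total_b / total

def gibbs(n, rng, burn_in=1000):
    """Estimate P(B=1 | A=1, M=1) by resampling B and E in turn."""
    b, e = 0, 0  # arbitrary initial state; evidence A=1, M=1 stays clamped
    count_b = 0
    for step in range(burn_in + n):
        # P(B | E, A=1) is proportional to P(B) * P(A=1 | B, E).
        p1 = P_B * P_A[(1, e)]
        p0 = (1 - P_B) * P_A[(0, e)]
        b = int(rng.random() < p1 / (p0 + p1))
        # P(E | B, A=1) is proportional to P(E) * P(A=1 | B, E).
        p1 = P_E * P_A[(b, 1)]
        p0 = (1 - P_E) * P_A[(b, 0)]
        e = int(rng.random() < p1 / (p0 + p1))
        if step >= burn_in:
            count_b += b
    return count_b / n

def exact():
    """P(B=1 | A=1, M=1) by enumerating B and E (the M factor cancels)."""
    joint = {b: sum(prob(P_B, b) * prob(P_E, e) * P_A[(b, e)] for e in (0, 1))
             for b in (0, 1)}
    return joint[1] / (joint[0] + joint[1])

rng = random.Random(0)
print(f"exact                = {exact():.4f}")
print(f"likelihood weighting = {likelihood_weighting(50_000, rng):.4f}")
print(f"Gibbs sampling       = {gibbs(50_000, rng):.4f}")
```

On this small network both samplers should land close to the enumerated value; the advantage of Gibbs sampling is expected to show up mainly in cases like the A1…A10 / B1…B10 example, where proposals that ignore the evidence rarely receive non-negligible weight.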
