CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Monte Carlo Methods for Probabilistic Inference
AGENDA
Monte Carlo methods
  O(1/sqrt(N)) standard deviation
For Bayesian inference
  Likelihood weighting
  Gibbs sampling

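MONTE CARLO INTEGRATION
Estimate large integrals/sums:
  $I = \int f(x)\, p(x)\, dx$
  $I = \sum_x f(x)\, p(x)$
Using N i.i.d. samples $x^{(1)}, \dots, x^{(N)}$ from p(x):
  $I \approx \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$
Examples:
  $\int_a^b f(x)\, dx \approx \frac{b-a}{N} \sum_{i=1}^{N} f(x^{(i)})$, with $x^{(i)}$ drawn uniformly from [a,b]
  $E[X] = \int x\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$
  Volume of a set in $\mathbb{R}^n$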
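
The interval example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `mc_integrate`, the integrand sin(x) on [0, pi] (exact integral 2), and the sample count are all assumptions chosen for the demo.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] as (b-a)/N * sum f(x_i),
    with x_i drawn uniformly from [a, b]."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = mc_integrate(math.sin, 0.0, math.pi, 100_000, rng)
print(estimate)  # should be close to the exact value 2
```

The same function works for any integrand that can be evaluated pointwise; only the accuracy, not the code, depends on how f behaves.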

MEAN & VARIANCE OF ESTIMATE
Let $I_N$ be the random variable denoting the estimate of the integral with N samples
What is the bias (mean error) $E[I - I_N]$?
  $E[I - I_N] = I - E[I_N]$  (linearity of expectation)
  $= E[f(x)] - \frac{1}{N} \sum_{i=1}^{N} E[f(x^{(i)})]$  (definition of $I$ and $I_N$)
  $= \frac{1}{N} \sum_{i=1}^{N} \left( E[f(x)] - E[f(x^{(i)})] \right)$
  $= \frac{1}{N} \sum_{i=1}^{N} 0$  ($x$ and $x^{(i)}$ are both distributed according to p(x))
  $= 0$
Unbiased estimator
What is the variance $\mathrm{Var}[I_N]$?
  $\mathrm{Var}[I_N] = \mathrm{Var}\left[ \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \right]$  (definition)
  $= \frac{1}{N^2} \mathrm{Var}\left[ \sum_{i=1}^{N} f(x^{(i)}) \right]$  (scaling of variance)
  $= \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{Var}[f(x^{(i)})]$  (variance of a sum of independent variables)
  $= \frac{1}{N} \mathrm{Var}[f(x)]$  (i.i.d. sample)
The estimate is unbiased: $E[I - I_N] = 0$
Variance: $\mathrm{Var}[I_N] = \frac{1}{N} \mathrm{Var}[f(x)]$
Standard deviation: O(1/sqrt(N))
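
The zero bias and the 1/sqrt(N) standard deviation can be checked empirically. The sketch below is an illustration under assumed choices that are not in the slides: f(x) = x^2 with x ~ Uniform(0, 1), so that I = 1/3 and Var[f(x)] = 1/5 - 1/9, and 2000 repeated estimates per value of N.

```python
import math
import random
import statistics

EXACT_I = 1.0 / 3.0                        # E[x^2] for x ~ Uniform(0, 1)
STD_F = math.sqrt(1.0 / 5.0 - 1.0 / 9.0)   # sqrt(Var[f(x)]) = sqrt(E[x^4] - I^2)

def estimate(n, rng):
    """One draw of the random variable I_N: the mean of f over n samples."""
    return sum(rng.random() ** 2 for _ in range(n)) / n

rng = random.Random(0)
trials = 2000
results = {}
for n in (10, 100, 1000):
    estimates = [estimate(n, rng) for _ in range(trials)]
    results[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    # Columns: N, empirical mean of I_N, empirical std of I_N, predicted std
    print(n, results[n][0], results[n][1], STD_F / math.sqrt(n))
```

Each tenfold increase in N should shrink the empirical standard deviation by roughly a factor of sqrt(10), while the mean stays near 1/3.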
APPROXIMATE INFERENCE THROUGH SAMPLING
Unconditional simulation:
  To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed
Conditional simulation:
  To estimate the probability P(H) that a coin picked out of bucket B flips heads, repeat for i=1,…,N:
  1. Pick a coin C out of a random bucket $b^{(i)}$ chosen with probability P(B)
  2. $h^{(i)}$ = flip C according to probability $P(H \mid b^{(i)})$
  3. Sample $(h^{(i)}, b^{(i)})$ comes from distribution P(H,B)
  Result approximates P(H,B)
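
A small simulation of the bucket procedure, as a sketch only. The two buckets and all probabilities in `P_BUCKET` and `P_HEADS_GIVEN_BUCKET` are hypothetical values made up for the example; with them the exact answer is P(H) = 0.3*0.9 + 0.7*0.4 = 0.55.

```python
import random
from collections import Counter

P_BUCKET = {"b1": 0.3, "b2": 0.7}               # hypothetical P(B)
P_HEADS_GIVEN_BUCKET = {"b1": 0.9, "b2": 0.4}   # hypothetical P(H=heads | B)

def sample_pair(rng):
    """Steps 1-2: pick a bucket from P(B), then flip according to P(H | b)."""
    b = rng.choices(list(P_BUCKET), weights=list(P_BUCKET.values()))[0]
    h = rng.random() < P_HEADS_GIVEN_BUCKET[b]
    return h, b

rng = random.Random(0)
n = 100_000
counts = Counter(sample_pair(rng) for _ in range(n))

# Empirical joint distribution P(H, B), and the marginal P(H) by summing over B
joint = {pair: c / n for pair, c in counts.items()}
p_heads = sum(p for (h, _), p in joint.items() if h)
print(joint)
print(p_heads)  # should be close to 0.55
```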
MONTE CARLO INFERENCE IN BAYES NETS
BN over variables X
Repeat for i=1,…,N
  In top-down order, generate $x^{(i)}$ as follows:
    Sample $x_j^{(i)} \sim P(X_j \mid pa_{X_j}^{(i)})$
    (RHS is taken by putting parent values in sample into the CPT for $X_j$)
Samples $x^{(1)}, \dots, x^{(N)}$ approximate the distribution over X
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
Sample from the joint distribution
[Figure: burglary network. Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

P(B) = 0.001
P(E) = 0.002

B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

A  P(J|A)
T  0.90
F  0.05

A  P(M|A)
T  0.70
F  0.01

Example sample: B=0 E=0 A=0 J=1 M=0
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
As more samples are generated, the distribution of the samples approaches the joint distribution
  B=0 E=0 A=0 J=1 M=0
  B=0 E=0 A=0 J=0 M=0
  B=0 E=0 A=0 J=0 M=0
  B=1 E=0 A=1 J=1 M=0
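
Top-down sampling of the burglary network can be sketched as follows. The CPT values are the ones shown on the slides; the helper names (`flip`, `sample_joint`) and the sample count are assumptions for this illustration.

```python
import random

# CPTs from the slides; keys of P_A are (B, E), keys of P_J and P_M are A.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}
P_M = {1: 0.70, 0: 0.01}

def flip(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def sample_joint(rng):
    """Sample every variable in top-down order, parents before children."""
    b = flip(P_B, rng)
    e = flip(P_E, rng)
    a = flip(P_A[(b, e)], rng)
    j = flip(P_J[a], rng)
    m = flip(P_M[a], rng)
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

rng = random.Random(0)
samples = [sample_joint(rng) for _ in range(200_000)]

# Any marginal can be read off as a sample frequency, e.g. P(J=1).
frac_j = sum(s["J"] for s in samples) / len(samples)
print(frac_j)  # exact value is about 0.052
```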
BASIC METHOD FOR HANDLING EVIDENCE
Inference: given evidence E=e (e.g., J=1), approximate $P(X \setminus E \mid E=e)$
Remove the samples that conflict with the evidence
  B=0 E=0 A=0 J=1 M=0
  B=0 E=0 A=0 J=0 M=0  (conflicts with J=1, removed)
  B=0 E=0 A=0 J=0 M=0  (conflicts with J=1, removed)
  B=1 E=0 A=1 J=1 M=0
Distribution of remaining samples approximates the conditional distribution
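
The remove-conflicting-samples idea is usually called rejection sampling. A minimal sketch, assuming the slide CPTs and the evidence J=1 from the example; the function names and the query P(A=1 | J=1) are choices made for this illustration.

```python
import random

# CPTs from the slides; keys of P_A are (B, E), keys of P_J and P_M are A.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}
P_M = {1: 0.70, 0: 0.01}

def flip(p, rng):
    return 1 if rng.random() < p else 0

def sample_joint(rng):
    b, e = flip(P_B, rng), flip(P_E, rng)
    a = flip(P_A[(b, e)], rng)
    return {"B": b, "E": e, "A": a, "J": flip(P_J[a], rng), "M": flip(P_M[a], rng)}

def rejection_estimate(query, evidence, n, rng):
    """Estimate P(query=1 | evidence) from the samples that agree with the evidence."""
    kept = [s for s in (sample_joint(rng) for _ in range(n))
            if all(s[var] == val for var, val in evidence.items())]
    if not kept:
        raise ValueError("no sample agreed with the evidence")
    return sum(s[query] for s in kept) / len(kept), len(kept)

rng = random.Random(0)
p_alarm, n_kept = rejection_estimate("A", {"J": 1}, 200_000, rng)
print(p_alarm, n_kept)  # only about 5% of the samples survive the filter
```

The fraction of surviving samples equals the probability of the evidence, which is why this approach breaks down when the evidence is rare.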
RARE EVENT PROBLEM:
What if some events are really rare (e.g., burglary & earthquake)?
  # of samples must be huge to get a reasonable estimate
Solution: likelihood weighting
  Enforce that each sample agrees with evidence
  While generating a sample, keep track of the ratio
    (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls (A=1, M=1)
Sample B and E each with P=0.5 (rather than from their priors)
Start each sample with weight w=1
CPTs of the burglary network: P(B)=0.001, P(E)=0.002; P(A|B,E) = 0.95, 0.94, 0.29, 0.001 for (B,E) = (T,T), (T,F), (F,T), (F,F); P(J|A) = 0.90, 0.05 and P(M|A) = 0.70, 0.01 for A = T, F
Sample 1:
  Sample B=0, E=1: w = (0.999/0.5) × (0.002/0.5) ≈ 0.008
  A=1 is enforced, and the weight updated to reflect the likelihood that this occurs: w ≈ 0.008 × P(A=1|B=0,E=1) = 0.008 × 0.29 ≈ 0.0023
  M=1 is enforced: w ≈ 0.0023 × P(M=1|A=1) = 0.0023 × 0.70 ≈ 0.0016
  J=1 is sampled from P(J|A=1); w is unchanged
  Result: B=0 E=1 A=1 M=1 J=1 with w=0.0016
Sample 2:
  Sample B=0, E=0: w = (0.999/0.5) × (0.998/0.5) ≈ 3.988
  A=1 is enforced: w ≈ 3.988 × P(A=1|B=0,E=0) = 3.988 × 0.001 ≈ 0.004
  M=1 is enforced: w ≈ 0.004 × P(M=1|A=1) = 0.004 × 0.70 ≈ 0.0028
  J=1 is sampled from P(J|A=1); w is unchanged
  Result: B=0 E=0 A=1 M=1 J=1 with w=0.0028
Sample 3:
  Sample B=1, E=0: w = (0.001/0.5) × (0.998/0.5) ≈ 0.004
  A=1 is enforced: w ≈ 0.004 × P(A=1|B=1,E=0) = 0.004 × 0.94 ≈ 0.00375
  M=1 is enforced: w ≈ 0.00375 × P(M=1|A=1) = 0.00375 × 0.70 ≈ 0.0026
  J=1 is sampled from P(J|A=1); w is unchanged
  Result: B=1 E=0 A=1 M=1 J=1 with w=0.0026
Sample 4:
  Sample B=1, E=1, then enforce A=1 and M=1: w = (0.001/0.5) × (0.002/0.5) × 0.95 × 0.70 ≈ 5e-6
  J=1 is sampled from P(J|A=1); w is unchanged
  Result: B=1 E=1 A=1 M=1 J=1 with w≈5e-6
Resulting weighted samples (evidence Alarm & MaryCalls; B, E sampled with P=0.5):
  w=0.0016: B=0 E=1 A=1 M=1 J=1
  w=0.0028: B=0 E=0 A=1 M=1 J=1
  w=0.0026: B=1 E=0 A=1 M=1 J=1
  w≈0: B=1 E=1 A=1 M=1 J=1
N=4 gives P(B|A,M) ≈ 0.0026 / (0.0016 + 0.0028 + 0.0026 + 0) ≈ 0.371
Exact inference gives P(B|A,M) ≈ 0.374
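
The weighting scheme walked through above can be sketched as below, using the slide CPTs, the 0.5 proposal for B and E, and the evidence A=1, M=1. The function names are assumptions for this illustration, and J is left out because it is sampled from its own CPT and therefore never changes the weight or the estimate of P(B|A,M).

```python
import random

P_B, P_E = 0.001, 0.002
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_M1_GIVEN_A1 = 0.70                                              # P(M=1 | A=1)
PROPOSAL = 0.5   # B and E are each drawn with probability 0.5

def prior(p_true, value):
    """Probability of a binary value under its true prior."""
    return p_true if value == 1 else 1.0 - p_true

def weight(b, e):
    """Weight of a sample with the given B, E once A=1 and M=1 are enforced."""
    w = (prior(P_B, b) / PROPOSAL) * (prior(P_E, e) / PROPOSAL)  # true / proposal
    w *= P_A1[(b, e)]      # likelihood of the enforced A=1
    w *= P_M1_GIVEN_A1     # likelihood of the enforced M=1
    return w

def estimate_p_burglary(n, rng):
    """Weighted fraction of samples with B=1, approximating P(B=1 | A=1, M=1)."""
    num = den = 0.0
    for _ in range(n):
        b = 1 if rng.random() < PROPOSAL else 0
        e = 1 if rng.random() < PROPOSAL else 0
        w = weight(b, e)
        num += w * b
        den += w
    return num / den

rng = random.Random(0)
p_burglary = estimate_p_burglary(100_000, rng)
print(weight(0, 1), weight(0, 0), weight(1, 0), weight(1, 1))
print(p_burglary)
```

The four printed weights correspond to the four samples in the walkthrough, and the final estimate should land near the exact value.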

ANOTHER RARE-EVENT PROBLEM
B=b given as evidence
Each $b_i$ is improbable under all but one setting of $A_i$ (say, $A_i=1$)
[Figure: nodes A1, A2, …, A10 and B1, B2, …, B10]
Chance of sampling all 1's is very low => most likelihood weights will be too low
Problem: evidence is not being used to sample the A's effectively (i.e., near $P(A_i \mid b)$)
GIBBS SAMPLING
Idea: reduce the computational burden of sampling from a multidimensional distribution $P(x) = P(x_1, \dots, x_n)$ by doing repeated draws of individual attributes
  Cycle through j=1,…,n
    Sample $x_j \sim P(x_j \mid x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_n)$
Over the long run, the distribution of the states visited by the random walk x approaches the true distribution P(x)
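
A toy version of the cycle above, as a sketch only. The joint table `JOINT` over two binary variables is hypothetical, and reading conditionals off a full joint table is done purely for illustration; in practice Gibbs sampling is used when only the conditionals are cheap to compute.

```python
import random

# Hypothetical joint distribution P(x1, x2) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def sample_conditional(j, state, rng):
    """Draw x_j from P(x_j | all other coordinates of state)."""
    weights = []
    for value in (0, 1):
        candidate = list(state)
        candidate[j] = value
        weights.append(JOINT[tuple(candidate)])  # unnormalized conditional
    return rng.choices((0, 1), weights=weights)[0]

def gibbs(n_sweeps, rng, burn_in=1000):
    """Cycle through the coordinates and return visit frequencies after burn-in."""
    state = [0, 0]
    counts = {key: 0 for key in JOINT}
    for sweep in range(n_sweeps + burn_in):
        for j in range(len(state)):
            state[j] = sample_conditional(j, state, rng)
        if sweep >= burn_in:
            counts[tuple(state)] += 1
    return {key: c / n_sweeps for key, c in counts.items()}

rng = random.Random(0)
freqs = gibbs(100_000, rng)
print(freqs)  # visit frequencies should be close to the entries of JOINT
```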
GIBBS SAMPLING IN BNS
Each Gibbs sampling step: 1) pick a variable $X_i$, 2) sample $x_i \sim P(X_i \mid X \setminus X_i)$
Look at values of the "Markov blanket" of $X_i$:
  Parents $Pa_{X_i}$
  Children $Y_1, \dots, Y_k$
  Parents of children (excluding $X_i$) $Pa_{Y_1} \setminus X_i, \dots, Pa_{Y_k} \setminus X_i$
$X_i$ is independent of the rest of the network given its Markov blanket
Sample $x_i \sim P(X_i \mid Pa_{X_i}, Y_1, Pa_{Y_1} \setminus X_i, \dots, Y_k, Pa_{Y_k} \setminus X_i)$
  $= \frac{1}{Z} P(X_i \mid Pa_{X_i}) \, P(Y_1 \mid Pa_{Y_1}) \cdots P(Y_k \mid Pa_{Y_k})$
Product of $X_i$'s factor and the factors of its children
HANDLING EVIDENCE
Simply set each evidence variable to its appropriate value and do not resample it
Resulting walk approximates distribution $P(X \setminus E \mid E=e)$
Uses evidence more efficiently than likelihood weighting
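
Gibbs sampling with clamped evidence can be sketched on the burglary network from the earlier slides, with A=1 and M=1 held fixed. The function names, step count, and burn-in length are assumptions for this illustration. Only B and E are resampled here: J has no influence on B, and the factor P(M=1|A=1) does not involve B or E, so both drop out of the conditionals.

```python
import random

P_B, P_E = 0.001, 0.002
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

def sample_b(e, rng):
    """P(B | e, A=1) is proportional to P(B) * P(A=1 | B, e)."""
    w1 = P_B * P_A1[(1, e)]
    w0 = (1.0 - P_B) * P_A1[(0, e)]
    return 1 if rng.random() < w1 / (w0 + w1) else 0

def sample_e(b, rng):
    """P(E | b, A=1) is proportional to P(E) * P(A=1 | b, E)."""
    w1 = P_E * P_A1[(b, 1)]
    w0 = (1.0 - P_E) * P_A1[(b, 0)]
    return 1 if rng.random() < w1 / (w0 + w1) else 0

def gibbs_p_burglary(n_steps, rng, burn_in=1000):
    """Fraction of post-burn-in states with B=1, approximating P(B=1 | A=1, M=1)."""
    b, e = 0, 0   # arbitrary initialization; the evidence A=1, M=1 is never resampled
    hits = 0
    for step in range(n_steps + burn_in):
        b = sample_b(e, rng)
        e = sample_e(b, rng)
        if step >= burn_in:
            hits += b
    return hits / n_steps

rng = random.Random(0)
p_burglary = gibbs_p_burglary(100_000, rng)
print(p_burglary)
```

Every state visited agrees with the evidence by construction, so no sample is discarded or down-weighted.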

GIBBS SAMPLING ISSUES
Demonstrating correctness & convergence requires examining the Markov chain random walk (more later)
Need to take many steps before the effects of poor initialization wear off (mixing time)
  Difficult to tell how much is needed a priori
Numerous variants
  Known as Markov Chain Monte Carlo techniques
NEXT TIME

Continuous and hybrid distributions