The Monte Carlo method
[Figure: a circle of radius 1 inscribed in the 2×2 square with corners (1,1), (-1,1), (-1,-1), (1,-1), centered at the origin (0,0).]

$$Z = \begin{cases} 1 & \text{if } X^2 + Y^2 \le 1 \\ 0 & \text{o/w.} \end{cases}$$

• (X,Y) is a point chosen uniformly at random in a 2×2 square centered at the origin (0,0).
• $P(Z=1) = \pi/4$.

• Assume we run this experiment m times, with $Z_i$ being the value of $Z$ at the $i$-th run.
• If $W = \sum_{i=1}^{m} Z_i$, then
$$E[W] = E\!\left[\sum_{i=1}^{m} Z_i\right] = \sum_{i=1}^{m} E[Z_i] = \frac{m\pi}{4}.$$
• Hence $E[W/m] = \pi/4$, and $W' = 4W/m$ is an estimate of $\pi$.
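A minimal sketch of this estimator in Python (the names below are illustrative, not from the slides):

```python
import random

def estimate_pi(m: int) -> float:
    """Estimate pi by sampling m points uniformly from the 2x2 square."""
    w = 0
    for _ in range(m):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:   # Z_i = 1 when the point falls inside the unit circle
            w += 1
    return 4 * w / m             # W' = 4W/m

# Example: with m = 10**6 samples, the output is typically within about 0.01 of pi.
print(estimate_pi(10**6))
```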

• By the Chernoff bound,
$$\Pr[|W' - \pi| \ge \epsilon\pi] = \Pr\!\left[\left|W - \frac{m\pi}{4}\right| \ge \frac{\epsilon m\pi}{4}\right] = \Pr[|W - E[W]| \ge \epsilon E[W]] \le 2e^{-m\pi\epsilon^2/12}.$$
• Def: A randomized algorithm gives an $(\epsilon,\delta)$-approximation for the value $V$ if the output $X$ of the algorithm satisfies $\Pr[|X - V| \le \epsilon V] \ge 1 - \delta$.

• The above method for estimating $\pi$ gives an $(\epsilon,\delta)$-approximation, as long as $\epsilon < 1$ and $m$ is large enough:
$$2e^{-m\pi\epsilon^2/12} \le \delta \iff m \ge \frac{12\ln(2/\delta)}{\pi\epsilon^2}.$$
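As a quick plug-in of concrete numbers (chosen here only for illustration): with $\epsilon = 0.01$ and $\delta = 0.05$, the bound asks for $m \ge \frac{12\ln 40}{\pi (0.01)^2} \approx 1.4 \times 10^{5}$ samples.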

• Thm 1: Let $X_1,\ldots,X_m$ be independent and identically distributed indicator random variables, with $\mu = E[X_i]$. If $m \ge \frac{3\ln(2/\delta)}{\epsilon^2\mu}$, then
$$\Pr\!\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\right| \ge \epsilon\mu\right] \le \delta.$$
• I.e., $m$ samples provide an $(\epsilon,\delta)$-approximation for $\mu$.
• Pf: Exercise!


• Def: FPRAS: fully polynomial randomized approximation scheme.
• An FPRAS for a problem is a randomized algorithm that, given an input $x$ and any parameters $\epsilon$ and $\delta$ with $0 < \epsilon,\delta < 1$, outputs an $(\epsilon,\delta)$-approximation to $V(x)$ in time $\mathrm{poly}(1/\epsilon, \ln\delta^{-1}, |x|)$.



• Def: DNF counting problem: counting the number of satisfying assignments of a Boolean formula in disjunctive normal form (DNF).
• Def: A DNF formula is a disjunction of clauses $C_1 \lor C_2 \lor \cdots \lor C_t$, where each clause is a conjunction of literals.
• E.g. $(x_1 \land x_2 \land x_3) \lor (x_2 \land x_4) \lor (x_1 \land x_3 \land x_4)$.


• Counting the number of satisfying assignments of a DNF formula is actually #P-complete.
• Counting the number of Hamiltonian cycles in a graph and counting the number of perfect matchings in a bipartite graph are other examples of #P-complete problems.
A naïve algorithm for the DNF counting problem:
• Input: A DNF formula F with n variables.
• Output: Y = an approximation of c(F), the number of satisfying assignments of F.
1. X ← 0.
2. For k = 1 to m, do:
   (a) Generate an assignment for the n variables, chosen uniformly at random.
   (b) If the random assignment satisfies F, then X ← X + 1.
3. Return Y ← (X/m)·2^n.
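A sketch of this naïve estimator in Python, assuming (purely for illustration) that a formula is given as a list of clauses and each clause as a list of signed integers, where +j stands for the literal x_j and -j for its negation:

```python
import random

def satisfies(clause, a):
    """True if the 0/1 assignment a (dict: variable -> bit) satisfies every literal of the clause."""
    return all(a[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)

def naive_dnf_count(F, n, m):
    """Return Y = (X/m) * 2^n, the naive Monte Carlo estimate of c(F)."""
    X = 0
    for _ in range(m):
        a = {v: random.randint(0, 1) for v in range(1, n + 1)}   # uniform random assignment
        if any(satisfies(C, a) for C in F):                      # F is satisfied if some clause is
            X += 1
    return (X / m) * 2**n
```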
Analysis
• $X_k = 1$ if the $k$-th iteration in the algorithm generated a satisfying assignment, and $0$ otherwise.
• $\Pr[X_k = 1] = c(F)/2^n$.
• Let $X = \sum_{k=1}^{m} X_k$; then $E[X] = m\,c(F)/2^n$.
• $\displaystyle E[Y] = \frac{E[X]\cdot 2^n}{m} = c(F).$
Analysis
• By Theorem 1, $X/m$ gives an $(\epsilon,\delta)$-approximation of $c(F)/2^n$, and hence $Y$ gives an $(\epsilon,\delta)$-approximation of $c(F)$, when $m \ge \frac{3\cdot 2^n \ln(2/\delta)}{\epsilon^2 c(F)}$.
• If $c(F) \ge 2^n/\mathrm{poly}(n)$, then this is not too bad: $m$ is polynomial.
• But if $c(F) = \mathrm{poly}(n)$, then the required $m$ is on the order of $2^n/c(F)$, which is exponentially large!
Analysis
• Note that if $C_i$ has $l_i$ literals, then there are exactly $2^{n-l_i}$ satisfying assignments for $C_i$.
• Let $SC_i$ denote the set of assignments that satisfy clause $i$.
• $U = \{(i,a) : 1 \le i \le t \text{ and } a \in SC_i\}$, so $|U| = \sum_{i=1}^{t} |SC_i| = \sum_{i=1}^{t} 2^{n-l_i}$.
• Want to estimate $c(F) = \left|\bigcup_{i=1}^{t} SC_i\right|$.
• Define $S = \{(i,a) : 1 \le i \le t,\ a \in SC_i,\ a \notin SC_j \text{ for } j < i\}$; then $|S| = c(F)$, since each satisfying assignment $a$ contributes exactly one pair $(i,a)$, with $i$ the first clause it satisfies.
DNF counting algorithm II:
• Input: A DNF formula F with n variables.
• Output: Y = an approximation of c(F).
1. X ← 0.
2. For k = 1 to m, do:
   (a) Choose $i$ with probability $|SC_i| \big/ \sum_{j=1}^{t} |SC_j|$, then choose an assignment $a \in SC_i$ uniformly at random.
   (b) If $a$ is not in any $SC_j$, $j < i$, then X ← X + 1.
3. Return $Y \leftarrow (X/m)\sum_{i=1}^{t} |SC_i|$.
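A sketch of algorithm II under the same hypothetical representation as in the previous snippet (clauses as lists of signed integers); it assumes each clause mentions $l_i$ distinct variables, so $|SC_i| = 2^{n-l_i}$:

```python
import random

def satisfies(clause, a):
    """True if the 0/1 assignment a (dict: variable -> bit) satisfies every literal of the clause."""
    return all(a[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)

def sample_from_SC(clause, n):
    """Uniform assignment satisfying the clause: fix its literals, flip fair coins elsewhere."""
    a = {v: random.randint(0, 1) for v in range(1, n + 1)}
    for lit in clause:
        a[abs(lit)] = 1 if lit > 0 else 0
    return a

def dnf_count_II(F, n, m):
    """Estimate c(F) as Y = (X/m) * sum_i |SC_i|."""
    sizes = [2 ** (n - len(C)) for C in F]          # |SC_i| = 2^(n - l_i)
    U = sum(sizes)                                  # |U| = sum_i |SC_i|
    X = 0
    for _ in range(m):
        # Step 2(a): pick clause i with probability |SC_i|/|U|, then a uniform a in SC_i.
        i = random.choices(range(len(F)), weights=sizes)[0]
        a = sample_from_SC(F[i], n)
        # Step 2(b): count (i, a) only if a satisfies no earlier clause.
        if not any(satisfies(F[j], a) for j in range(i)):
            X += 1
    return (X / m) * U
```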
DNF counting algorithm II:
• Note that $|U| \le t|S|$. Why? Each satisfying assignment of $F$ appears exactly once in $S$ and at most $t$ times in $U$.
• $\Pr[i \text{ is chosen}] = |SC_i| \big/ \sum_{j=1}^{t} |SC_j| = |SC_i|/|U|$.
• Then $\Pr[(i,a) \text{ is chosen}] = \Pr[i \text{ is chosen}] \cdot \Pr[a \text{ is chosen} \mid i \text{ is chosen}] = \frac{|SC_i|}{|U|}\cdot\frac{1}{|SC_i|} = \frac{1}{|U|}$.
DNF counting algorithm II:
• Thm: DNF counting algorithm II is an FPRAS for the DNF counting problem when $m = (3t/\epsilon^2)\ln(2/\delta)$.
• Pf: Step 2(a) chooses an element of $U$ uniformly at random.
• The probability that this element belongs to $S$ is at least $1/t$.
• Fix any $\epsilon,\delta > 0$, and let $m = (3t/\epsilon^2)\ln(2/\delta)$, which is $\mathrm{poly}(t, 1/\epsilon, \ln(1/\delta))$.
DNF counting algorithm II:
• The processing time of each sample is poly(t).
• By Thm 1, with $m$ samples, $X/m$ gives an $(\epsilon,\delta)$-approximation of $c(F)/|U|$, and hence $Y$ gives an $(\epsilon,\delta)$-approximation of $c(F)$.
Counting with Approximate Sampling


• Def: Let $w$ be the output of a sampling algorithm for a finite sample space $\Omega$.
• The sampling algorithm generates an $\epsilon$-uniform sample of $\Omega$ if, for any subset $S$ of $\Omega$, $\big|\Pr[w \in S] - |S|/|\Omega|\big| \le \epsilon$.



• Def: A sampling algorithm is a fully polynomial almost uniform sampler (FPAUS) for a problem if, given an input $x$ and $\epsilon > 0$, it generates an $\epsilon$-uniform sample of $\Omega(x)$ in time $\mathrm{poly}(|x|, \ln(1/\epsilon))$.
• For example, an FPAUS for independent sets would take as input a graph $G=(V,E)$ and a parameter $\epsilon$.
• The sample space: the set of all independent sets in $G$.




• Goal: Given an FPAUS for independent sets, we construct an FPRAS for counting the number of independent sets.
• Assume $G$ has $m$ edges, and let $e_1,\ldots,e_m$ be an arbitrary ordering of the edges.
• $E_i$: the set of the first $i$ edges in $E$; let $G_i = (V, E_i)$.
• $\Omega(G_i)$: the set of independent sets in $G_i$.
| (Gm ) | | (Gm 1 ) |
| (G1 ) |
| (G ) |

 
 | (G0 ) | .
| (Gm 1 ) | | (Gm 2 ) |
| (G0 ) |


|(G0)|=2n. Why?
To estimate |(G)|, we need good estimates
for r  | (Gi ) | , i  1,.., m.
i
| (Gi 1 ) |
21


• Let $\tilde{r}_i$ be the estimate for $r_i$; then $|\Omega(G)|$ is estimated by $2^n \prod_{i=1}^{m} \tilde{r}_i$.
• To evaluate the error, we need to bound the ratio $R = \prod_{i=1}^{m} \frac{\tilde{r}_i}{r_i}$.
• To have an $(\epsilon,\delta)$-approximation, we want $\Pr[|R - 1| \le \epsilon] \ge 1 - \delta$.



• Lemma: Suppose that for all $i$, $1 \le i \le m$, $\tilde{r}_i$ is an $(\epsilon/2m, \delta/m)$-approximation for $r_i$. Then $\Pr[|R - 1| \le \epsilon] \ge 1 - \delta$.
• Pf: For each $1 \le i \le m$, we have
$$\Pr\!\left[|\tilde{r}_i - r_i| \le \frac{\epsilon}{2m} r_i\right] \ge 1 - \frac{\delta}{m}.$$







• Equivalently, $\Pr\!\left[|\tilde{r}_i - r_i| > \frac{\epsilon}{2m} r_i\right] \le \frac{\delta}{m}$.
• By a union bound, $\Pr\!\left[\exists i,\ |\tilde{r}_i - r_i| > \frac{\epsilon}{2m} r_i\right] \le \delta$.
• Hence $\Pr\!\left[\forall i,\ |\tilde{r}_i - r_i| \le \frac{\epsilon}{2m} r_i\right] \ge 1 - \delta$.
• Equivalently, $\Pr\!\left[\forall i,\ 1 - \frac{\epsilon}{2m} \le \frac{\tilde{r}_i}{r_i} \le 1 + \frac{\epsilon}{2m}\right] \ge 1 - \delta$.
• Therefore
$$\Pr\!\left[\left(1 - \frac{\epsilon}{2m}\right)^{m} \le \prod_{i=1}^{m}\frac{\tilde{r}_i}{r_i} \le \left(1 + \frac{\epsilon}{2m}\right)^{m}\right] \ge 1 - \delta.$$
• Since $(1 - \epsilon/2m)^m \ge 1 - \epsilon$ and $(1 + \epsilon/2m)^m \le e^{\epsilon/2} \le 1 + \epsilon$ for $0 < \epsilon \le 1$, it follows that $\Pr[1 - \epsilon \le R \le 1 + \epsilon] \ge 1 - \delta$.
Estimating $r_i$:
• Input: Graphs $G_{i-1} = (V, E_{i-1})$ and $G_i = (V, E_i)$.
• Output: $\tilde{r}_i$ = an approximation of $r_i$.
1. X ← 0.
2. Repeat for $M = (1296 m^2/\epsilon^2)\ln(2m/\delta)$ independent trials:
   (a) Generate an $(\epsilon/6m)$-uniform sample from $\Omega(G_{i-1})$.
   (b) If the sample is an independent set in $G_i$, then X ← X + 1.
3. Return $\tilde{r}_i \leftarrow X/M$.
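A sketch of this estimation step together with the surrounding telescoping product, where `fpaus` is a hypothetical black-box sampler (not defined in the slides) that takes a vertex set, an edge list, and a tolerance, and returns an almost-uniform random independent set of that graph:

```python
import math

def estimate_ri(prev_edges, new_edge, vertices, M, tol, fpaus):
    """Estimate r_i = |Omega(G_i)| / |Omega(G_{i-1})| from M almost-uniform samples of Omega(G_{i-1})."""
    u, v = new_edge
    X = 0
    for _ in range(M):
        I = fpaus(vertices, prev_edges, tol)   # (eps/6m)-uniform sample from Omega(G_{i-1})
        if not (u in I and v in I):            # still an independent set once the new edge is added
            X += 1
    return X / M

def count_independent_sets(vertices, edges, eps, delta, fpaus):
    """FPRAS sketch: 2^n times the product of the estimated ratios r_1, ..., r_m."""
    m = len(edges)
    M = math.ceil((1296 * m**2 / eps**2) * math.log(2 * m / delta))
    estimate = 2 ** len(vertices)              # |Omega(G_0)| = 2^n
    for i in range(1, m + 1):
        estimate *= estimate_ri(edges[:i - 1], edges[i - 1], vertices, M, eps / (6 * m), fpaus)
    return estimate
```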





• Lemma: When $m \ge 1$ and $0 < \epsilon \le 1$, the procedure for estimating $r_i$ yields an $(\epsilon/2m, \delta/m)$-approximation for $r_i$.
• Pf: Suppose $G_{i-1}$ and $G_i$ differ in that edge $\{u,v\}$ is in $G_i$ but not in $G_{i-1}$.
• $\Omega(G_i) \subseteq \Omega(G_{i-1})$.
• An independent set in $\Omega(G_{i-1}) \setminus \Omega(G_i)$ contains both $u$ and $v$.

• Associate each $I \in \Omega(G_{i-1}) \setminus \Omega(G_i)$ with the independent set $I \setminus \{v\} \in \Omega(G_i)$.
• Note that each $I' \in \Omega(G_i)$ is associated with no more than one independent set $I' \cup \{v\} \in \Omega(G_{i-1}) \setminus \Omega(G_i)$; thus $|\Omega(G_{i-1}) \setminus \Omega(G_i)| \le |\Omega(G_i)|$.
• It follows that
$$r_i = \frac{|\Omega(G_i)|}{|\Omega(G_{i-1})|} = \frac{|\Omega(G_i)|}{|\Omega(G_i)| + |\Omega(G_{i-1}) \setminus \Omega(G_i)|} \ge \frac{1}{2}.$$


• Let $X_k = 1$ if the $k$-th sample is in $\Omega(G_i)$, and $0$ otherwise.
• Because our samples are generated by an $(\epsilon/6m)$-uniform sampler, by definition,
$$\left|\Pr[X_k = 1] - \frac{|\Omega(G_i)|}{|\Omega(G_{i-1})|}\right| \le \frac{\epsilon}{6m}, \qquad\text{i.e.}\qquad \left|E[X_k] - \frac{|\Omega(G_i)|}{|\Omega(G_{i-1})|}\right| \le \frac{\epsilon}{6m}.$$

• By linearity of expectation,
$$\left| E\!\left[\frac{\sum_{k=1}^{M} X_k}{M}\right] - \frac{|\Omega(G_i)|}{|\Omega(G_{i-1})|} \right| \le \frac{\epsilon}{6m}.$$
• Hence $\displaystyle |E[\tilde{r}_i] - r_i| = \left| E\!\left[\frac{\sum_{k=1}^{M} X_k}{M}\right] - \frac{|\Omega(G_i)|}{|\Omega(G_{i-1})|} \right| \le \frac{\epsilon}{6m}$.
• Since $r_i \ge 1/2$, we have
$$E[\tilde{r}_i] \ge r_i - \frac{\epsilon}{6m} \ge \frac{1}{2} - \frac{\epsilon}{6m} \ge \frac{1}{3}.$$

If M3ln(2m/)/(/12m)2(1/3)=1296m2-2ln(2m/),
then
 ~
ri
 

~
 
~
~
Pr  ~  1 
E[ ri ]  .
  Pr  ri  E[ ri ] 
12m 
12m

 m
 E[ ri ]

Equivalently, with probability 1- /m,

~
ri

1

 1
.
~
12m E[ ri ]
12m
-----(1)
30



• As $|E[\tilde{r}_i] - r_i| \le \frac{\epsilon}{6m}$, we have
$$1 - \frac{\epsilon}{6m r_i} \le \frac{E[\tilde{r}_i]}{r_i} \le 1 + \frac{\epsilon}{6m r_i}.$$
• Using $r_i \ge 1/2$, then
$$1 - \frac{\epsilon}{3m} \le \frac{E[\tilde{r}_i]}{r_i} \le 1 + \frac{\epsilon}{3m}. \qquad (2)$$

• Combining (1) and (2), with probability $1 - \delta/m$,
$$1 - \frac{\epsilon}{2m} \le \left(1 - \frac{\epsilon}{3m}\right)\!\left(1 - \frac{\epsilon}{12m}\right) \le \frac{\tilde{r}_i}{r_i} \le \left(1 + \frac{\epsilon}{3m}\right)\!\left(1 + \frac{\epsilon}{12m}\right) \le 1 + \frac{\epsilon}{2m}.$$
• This gives the desired $(\epsilon/2m, \delta/m)$-approximation.
• Thm: Given an FPAUS for independent sets in any graph, we can construct an FPRAS for the number of independent sets in a graph $G$.
The Markov Chain Monte Carlo Method


• The Markov Chain Monte Carlo method provides a very general approach to sampling from a desired probability distribution.
• Basic idea: Define an ergodic Markov chain whose set of states is the sample space and whose stationary distribution is the required sampling distribution.



• Lemma: For a finite state space $\Omega$ and neighborhood structure $\{N(x) \mid x \in \Omega\}$, let $N = \max_{x \in \Omega} |N(x)|$. Let $M$ be any number such that $M \ge N$. Consider a Markov chain where
$$P_{x,y} = \begin{cases} 1/M & \text{if } x \ne y \text{ and } y \in N(x), \\ 0 & \text{if } x \ne y \text{ and } y \notin N(x), \\ 1 - |N(x)|/M & \text{if } x = y. \end{cases}$$
• If this chain is irreducible and aperiodic, then the stationary distribution is the uniform distribution.



• Pf: For any $x \ne y$ with $y \in N(x)$, we have $\pi_x P_{x,y} = \pi_y P_{y,x}$, since $P_{x,y} = P_{y,x} = 1/M$ (and both sides are $0$ when $y \notin N(x)$).
• It follows that the uniform distribution $\pi_x = 1/|\Omega|$ is the stationary distribution, by the following theorem.
• Thm: Let $P$ be the transition matrix of a finite irreducible and ergodic Markov chain. If there are nonnegative numbers $\bar{\pi} = (\pi_0,\ldots,\pi_n)$ such that $\sum_{i=0}^{n} \pi_i = 1$ and if, for any pair of states $i,j$, $\pi_i P_{i,j} = \pi_j P_{j,i}$, then $\bar{\pi}$ is the stationary distribution corresponding to $P$.
• Pf: $\sum_{i=0}^{n} \pi_i P_{i,j} = \sum_{i=0}^{n} \pi_j P_{j,i} = \pi_j$, i.e. $\bar{\pi} P = \bar{\pi}$.
• Since $\sum_{i=0}^{n} \pi_i = 1$, it follows that $\bar{\pi}$ is the unique stationary distribution of the Markov chain.

Eg. A Markov chain whose states are the independent sets in $G=(V,E)$:
1. $X_0$ is an arbitrary independent set in $G$.
2. To compute $X_{i+1}$:
   (a) choose a vertex $v$ uniformly at random from $V$;
   (b) if $v \in X_i$, then $X_{i+1} = X_i \setminus \{v\}$;
   (c) if $v \notin X_i$ and adding $v$ to $X_i$ still gives an independent set, then $X_{i+1} = X_i \cup \{v\}$;
   (d) o/w, $X_{i+1} = X_i$.
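A minimal sketch of one transition of this chain, assuming the graph is given as an adjacency dict (vertex → set of neighbors); the names are illustrative only:

```python
import random

def chain_step(G, X):
    """One step of the independent-set chain: G maps each vertex to its neighbor set, X is the current independent set."""
    v = random.choice(list(G))              # (a) uniform random vertex
    if v in X:
        return X - {v}                      # (b) remove v
    if all(u not in X for u in G[v]):
        return X | {v}                      # (c) add v if the result is still independent
    return X                                # (d) otherwise stay put

# Usage sketch: start at the empty set and iterate, e.g.
#   X = set()
#   for _ in range(10**5):
#       X = chain_step(G, X)
```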




• The neighbors of a state $X_i$ are independent sets that differ from $X_i$ in just one vertex.
• Since every state is reachable from the empty set, the chain is irreducible.
• Assuming $G$ has at least one edge $(u,v)$, the state $\{v\}$ has a self-loop ($P_{v,v} > 0$), so the chain is aperiodic.
• When $x \ne y$, $P_{x,y} = 1/|V|$ or $0$; by the previous lemma, the stationary distribution is the uniform distribution.
The Metropolis Algorithm
(when the stationary distribution is nonuniform)
• Lemma: For a finite state space $\Omega$ and neighborhood structure $\{N(x) \mid x \in \Omega\}$, let $N = \max_{x \in \Omega} |N(x)|$. Let $M$ be any number such that $M \ge N$. For all $x \in \Omega$, let $\pi_x > 0$ be the desired probability of state $x$ in the stationary distribution. Consider a Markov chain where
$$P_{x,y} = \begin{cases} \frac{1}{M}\min(1, \pi_y/\pi_x) & \text{if } x \ne y \text{ and } y \in N(x), \\ 0 & \text{if } x \ne y \text{ and } y \notin N(x), \\ 1 - \sum_{y \ne x} P_{x,y} & \text{if } x = y. \end{cases}$$
• Then, if this chain is irreducible and aperiodic, the stationary distribution is given by the probabilities $\pi_x$.






• Pf: For any $x \ne y$ with $y \in N(x)$, if $\pi_x \le \pi_y$, then $P_{x,y} = 1/M$ and $P_{y,x} = \frac{1}{M}\cdot\frac{\pi_x}{\pi_y}$.
• It follows that $P_{x,y} = \frac{1}{M} = \frac{\pi_y}{\pi_x} P_{y,x}$, so $\pi_x P_{x,y} = \pi_y P_{y,x}$.
• The case $\pi_x > \pi_y$ is similar.
• Again, by the previous theorem, the $\pi_x$ form the stationary distribution.



• Eg. Create a Markov chain in whose stationary distribution each independent set $I$ has probability proportional to $\lambda^{|I|}$, for some $\lambda > 0$.
• I.e. $\pi_x = \lambda^{|I_x|}/B$, where $I_x$ is the independent set corresponding to state $x$ and $B = \sum_{x} \lambda^{|I_x|}$.
• Note that, when $\lambda = 1$, this is the uniform distribution.
1. $X_0$ is an arbitrary independent set in $G$.
2. To compute $X_{i+1}$:
   (a) choose $v$ uniformly at random from $V$;
   (b) if $v \in X_i$, then $X_{i+1} = X_i \setminus \{v\}$ with probability $\min(1, 1/\lambda)$;
   (c) if $v \notin X_i$ and $X_i \cup \{v\}$ still gives an independent set, then put $X_{i+1} = X_i \cup \{v\}$ with probability $\min(1, \lambda)$;
   (d) o/w, set $X_{i+1} = X_i$.
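A sketch of one Metropolis transition for this weighted chain, using the same adjacency-dict representation as the earlier chain sketch, with `lam` playing the role of $\lambda$:

```python
import random

def metropolis_step(G, X, lam):
    """One Metropolis step targeting pi(I) proportional to lam**|I| over independent sets of G."""
    v = random.choice(list(G))                        # (a) uniform random vertex
    if v in X:
        if random.random() < min(1.0, 1.0 / lam):     # (b) remove with probability min(1, 1/lam)
            return X - {v}
    elif all(u not in X for u in G[v]):
        if random.random() < min(1.0, lam):           # (c) add with probability min(1, lam)
            return X | {v}
    return X                                          # (d) otherwise unchanged
```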