Week 6
Probabilistic method
This week: [MR] parts of 5.3, 5.5
Probabilistic method
We continue with the probabilistic method. Two forms may be distinguished:
a) If some random variable X has expectation E[X] > a, then there must be a realization of X with value at least a.
b) If we pick an object s at random from a set of objects S and
Pr{s has property P } > 0 then there must be some s ∈ S with property P .
A standard application of (b) is by a simple counting argument and using the union bound.
Consider the following generic example. Let G = (V, E) be a graph on n vertices and let
S ⊆ 2^V be a collection of subsets of the vertices. Assume we want to show that there
exists a graph G with property P, where P is the property that no subset in S has some
property P′. Now take G at random (for some distribution) and assume that for any S ∈ S,
Pr{S has property P′} < p for some p.
Then,
Pr{G has property P} = 1 − Pr{G does not have property P}
= 1 − Pr{there is an S ∈ S with property P′}
> 1 − |S|·p,
which is strictly positive if p < 1/|S|.
Example: No large clique or independent set [not in [MR]].
Is there a graph on n = 1000 vertices such that any subgraph on 20 vertices is not a clique
or an independent set?
S: the subgraphs on 20 vertices.
P: no subgraph in S has property P′.
P′: it is a clique or an independent set.
Now take G uniformly at random by selecting each edge of the complete graph Kn with
probability 1/2 and let H be an arbitrary subgraph of G on 20 vertices. The probability
p that H is a clique or an independent set is 2^{1−(20 choose 2)}. On the other hand, the number
|S| of subgraphs is (1000 choose 20). One may check that p < 1/|S|. In general we get the following
theorem.
Theorem 1. For 2^{(k choose 2)−1} > (n choose k) there is a graph on n vertices which does not have a clique
or independent set of size (at least) k.
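The condition of Theorem 1 is easy to check numerically with exact integer arithmetic. A minimal sketch (the function name is ours, not from [MR]):

```python
from math import comb

def ramsey_bound_holds(n: int, k: int) -> bool:
    """True if 2^(C(k,2)-1) > C(n,k), the condition of Theorem 1."""
    return 2 ** (comb(k, 2) - 1) > comb(n, k)

# The instance from the example: n = 1000, k = 20.
print(ramsey_bound_holds(1000, 20))   # True: such a graph exists
print(ramsey_bound_holds(1000, 10))   # False: the bound is inconclusive here
```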
Example: Expanding graphs ([MR] 5.3)
This example is similar to the previous example but S is now partitioned into several sets.
Definition 1. An (n, d, α, c) OR-concentrator is a bipartite graph G = (L, R, E) with
|L| = |R| = n and such that
1. every vertex in L has degree at most d;
2. for any S ⊂ L with |S| ≤ αn the number of neighbors of S is at least c|S|.
Theorem 2. There is an n0 such that for all n ≥ n0 there is an (n, 18, 1/3, 2) OR-concentrator.
Proof. Construct a bipartite graph G = (L, R, E) at random as follows. For each v ∈ L
choose d neighbors uniformly at random (with replacement). Double edges are replaced
by a single edge.
For i ≤ αn let Si = {(S, T) | S ⊂ L, T ⊆ R, |S| = i, |T| = ci}.
P: for all i ≤ αn, no (S, T) ∈ Si has property P′.
P′: all neighbors of S are in T.
For fixed (S, T) ∈ Si,
Pr{(S, T) has property P′} = (ci/n)^{di}.
Let Ei be the event that some (S, T) ∈ Si has property P′. Since |Si| = (n choose i)·(n choose ci) we get
Pr{Ei} ≤ (n choose i)·(n choose ci)·(ci/n)^{di}.
Next we need to find values for d, α and c such that the sum of the right-hand side for i = 1
to αn is less than 1. The following computation shows that this holds for d = 18, α = 1/3
and c = 2.
Since for any integers k ≤ n it holds that (n choose k) ≤ (ne/k)^k, we get
Pr{Ei} ≤ (ne/i)^i · (ne/(ci))^{ci} · (ci/n)^{di} = [ (i/n)^{d−c−1} · e^{c+1} · c^{d−c} ]^i.
Now using i/n ≤ α = 1/3, c = 2 and d = 18 gives Pr{Ei} < (1/2)^i. Hence
Pr{G has property P} ≥ 1 − Σ_{i=1}^{αn} Pr{Ei} > 1 − Σ_{i=1}^{αn} (1/2)^i > 0.
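For a concrete n one can evaluate the failure bound exactly with rational arithmetic. The sketch below (names are ours) sums (n choose i)(n choose ci)(ci/n)^{di} for d = 18, c = 2, α = 1/3 and confirms it stays below 1; the theorem only promises this for n beyond some n0, so we just test one value.

```python
from fractions import Fraction
from math import comb

def failure_bound(n, d=18, c=2):
    """Exact upper bound on Pr{random G is not an OR-concentrator}:
    sum over 1 <= i <= n/3 of C(n,i) * C(n,ci) * (ci/n)^(di)."""
    total = Fraction(0)
    for i in range(1, n // 3 + 1):
        total += comb(n, i) * comb(n, c * i) * Fraction(c * i, n) ** (d * i)
    return total

print(failure_bound(60) < 1)   # True: the bound is already below 1 at n = 60
```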
Example: Large independent set. (not in [MR])
Given a graph G with m edges, what can we say about the size k of the largest independent
set? One way is to take k vertices at random and to show that the induced subgraph is an
independent set with non-zero probability. An edge e is in the subgraph with probability
less than (k/n)². By the union bound, the probability that some edge is in the subgraph is
less than m(k/n)². So the probability of hitting an independent set is non-zero if
m(k/n)² < 1, i.e. if k < n/√m.
Clearly, this is not a strong bound. Consider the following two events. (i) In a random
sample of k vertices there is no edge in the induced subgraph. (ii) In a random sample of
2k vertices there are at most k edges. Intuitively, the probability of the latter event is
much larger. But if it happens, then removing for each edge one of its endpoints leaves an
independent set of size at least k. Next, we use this sample-and-modify approach to get a
better bound.
Theorem 3. Any graph G with n vertices and m ≥ n/2 edges has an independent set of
size at least k = n²/(4m).
Proof. (If m ≤ n/2 then clearly there is an independent set of size at least n/2.)
Do the following:
1. (Sample) Select each vertex with probability p.
2. (Modify) In the induced subgraph, select one endpoint for each edge and delete all
these points.
Let X be the number of selected vertices and Y be the number of edges in the induced
subgraph. Then the expected number of vertices in the remaining independent set is at
least E[X − Y]. Clearly, E[X] = np. The probability that an edge is in the induced
subgraph is p². Therefore, E[Y] = mp². Hence
E[X − Y] = np − mp².
This value is maximized for p = n/(2m) (which is at most 1 by assumption). In that case,
E[X − Y] = n · n/(2m) − m · n²/(4m²) = n²/(2m) − n²/(4m) = n²/(4m).
This bound is much stronger than the previous bound of n/√m.
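The sample-and-modify proof is constructive in expectation. A minimal sketch of the two steps (seeded for reproducibility; names are ours, not from [MR]). Whatever the coin flips, the modify step guarantees the returned set is independent:

```python
import random

def sample_and_modify(n, edges, seed=0):
    """Keep each vertex with prob p = n/(2m); then delete one endpoint
    of every edge that survives inside the sample."""
    p = min(1.0, n / (2 * len(edges)))
    rng = random.Random(seed)
    kept = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(v)      # modify step: break the edge
    return kept

cycle = [(i, (i + 1) % 5) for i in range(5)]     # 5-cycle: n = 5, m = 5
S = sample_and_modify(5, cycle, seed=1)
assert all(u not in S or v not in S for u, v in cycle)   # S is independent
```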
Example: Girth of a graph. (not in [MR])
This is another example of the sample and modify approach. The girth of a graph G is
defined as the length of the smallest cycle in G. Clearly, there is some correlation between
the girth k of a graph and the number of edges |E|: Graphs with large girth cannot be
very dense. The probabilistic method is an easy tool to show what density is still possible
for a given value of k. It certainly does not give the best bound.
Theorem 4. For all k ≥ 3 there is a graph G = (V, E) on n vertices such that
(i) G has girth at least k;
(ii) |E| ≥ (1/3)·n^{1+1/k}.
Proof. We take a random graph on n vertices and then we modify it such that its girth is
at least k. We then show that the expected number of edges is at least (1/3)·n^{1+1/k}.
1. (Sample) Take a random graph G on n vertices: edges are chosen independently
with probability p = n^{1/k−1}.
2. (Modify) For each cycle of length j < k select one edge. Remove all selected edges.
Let X be the number of edges in the random graph and Y be the number of cycles in it
of length less than k. Then the expected number of remaining edges is at least E[X − Y ].
E[X] = (n choose 2)·p.
Let Cj be the number of cycles of length j in a complete graph on n vertices, j = 3, . . . , k−1.
Then
Cj ≤ (n choose j)·(j − 1)!/2 < n^j.
Each such cycle appears in G with probability p^j. Therefore, the expected number of
cycles of length j in G is at most n^j p^j = n^j n^{(1/k−1)j} = n^{j/k} < n. Hence, E[Y] < kn.
E[X − Y] ≥ (n choose 2)·p − kn
= (n(n − 1)/2)·n^{1/k−1} − kn
= (1/2)(1 − 1/n)·n^{1/k+1} − kn
≥ (1/3)·n^{1/k+1}, for n large enough.
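The last inequality is a routine comparison; a quick numeric check (our own, for k = 4) confirms the lower bound (n choose 2)·n^{1/k−1} − kn does exceed (1/3)·n^{1+1/k} once n is large:

```python
def expected_remaining_edges(n, k):
    """Lower bound on E[X - Y]: C(n,2) * n^(1/k - 1) - k*n."""
    return n * (n - 1) / 2 * n ** (1 / k - 1) - k * n

k = 4
n = 10 ** 6
assert expected_remaining_edges(n, k) > n ** (1 + 1 / k) / 3
```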
Probabilistic method: Lovász Local Lemma
([MR] 5.5)
Definition 2. We say that event E is mutually independent of events E1, . . . , Ek if
Pr(E | ∩_{j∈S} Ej) = Pr(E) for all S ⊆ {1, 2, . . . , k}.
Definition 3. Graph G = (V, E) with vertices 1, 2, . . . , n is a dependency graph for
events E1, E2, . . . , En if, for every j, Ej is mutually independent of all other events Ei with
(i, j) ∉ E.
Mutual independence is a stronger property than pairwise independence. It is a misunderstanding that the dependency graph has an edge between i and j if and only if Ei and
Ej are dependent. Consider the following example. We flip a coin twice and say that we
win if both outcomes are the same. Let E1 (E2) be the event that the first (second) coin
flip gives heads and let E3 be the event of winning. Then the three events are pairwise
independent but E3 is not independent of E1 ∩ E2. So what is the dependency graph for
this simple example? The complete graph K3 is a valid choice and so is any graph with
exactly two edges. We see that the dependency graph is not unique and that in general the
complete graph is always valid. For a graph G(V, E) and j ∈ V denote the neighborhood of
j by N(j) = {i | (i, j) ∈ E}.
Theorem 5 (LLL general form). Let G be a dependency graph for events E1, E2, . . . , En.
If there are numbers x1, x2, . . . , xn ∈ [0, 1] such that
Pr[Ej] ≤ xj · ∏_{i∈N(j)} (1 − xi) for every j,     (1)
then
Pr[∩_{j=1}^n Ēj] ≥ ∏_{i=1}^n (1 − xi).     (2)
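Condition (1) is mechanical to verify for a concrete instance. The helper below is our own sketch, not from [MR]: given the event probabilities, the dependency neighborhoods, and candidate numbers x_j, it checks (1) and returns the lower bound (2).

```python
from math import prod

def lll_lower_bound(probs, neighbors, xs):
    """If Pr[E_j] <= x_j * prod_{i in N(j)} (1 - x_i) for all j,
    return prod_j (1 - x_j), a lower bound on Pr[no event occurs];
    otherwise return None."""
    for j, p in enumerate(probs):
        if p > xs[j] * prod(1 - xs[i] for i in neighbors[j]):
            return None
    return prod(1 - x for x in xs)

# Three events with Pr = 1/16 on a triangle dependency graph, x_j = 1/4:
# condition (1) needs 1/16 <= (1/4)(3/4)^2, which holds.
bound = lll_lower_bound([1/16] * 3, [[1, 2], [0, 2], [0, 1]], [1/4] * 3)
assert bound is not None and abs(bound - (3/4) ** 3) < 1e-12
```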
If xj = 1 for some j then the theorem is obviously true so assume xj < 1 for all j. The
theorem follows almost directly from the next lemma.
Lemma 1. If condition (1) holds, then for any S ⊂ {1, 2, . . . , n} and j ∉ S,
Pr{Ej | ∩_{i∈S} Ēi} ≤ xj.     (3)
Proof. We prove it by induction on the size of S. If |S| = 0, then the lemma follows
directly from (1). Now let |S| = k ≥ 1 and assume the lemma holds for sets of size at
most k − 1. We partition S into two sets. Let
S1 = S ∩ N(j),  S2 = S \ S1,
and events
A1 = ∩_{i∈S1} Ēi,  A2 = ∩_{i∈S2} Ēi.
Note that if S1 = ∅ then Pr{Ej | A1 ∩ A2} = Pr{Ej | A2} = Pr{Ej} ≤ xj. So assume from
now on that S1 ≠ ∅.
Pr{Ej | A1 ∩ A2} = Pr{Ej ∩ A1 ∩ A2} / Pr{A1 ∩ A2} = Pr{Ej ∩ A1 | A2} / Pr{A1 | A2}.     (4)
We shall bound the numerator and denominator separately. Numerator:
Pr{Ej ∩ A1 | A2} ≤ Pr{Ej | A2} = Pr{Ej} ≤ xj · ∏_{i∈N(j)} (1 − xi).     (5)
The equality follows from the independence of Ej and A2 and the inequality is by condition (1). For the denominator we use the induction hypothesis. Let S1 = {i1, . . . , ir} for
some r ≥ 1.
Pr{A1 | A2} = Pr{Ēi1 | A2} · Pr{Ēi2 | Ēi1 ∩ A2} · · · Pr{Ēir | Ēi1 ∩ · · · ∩ Ēir−1 ∩ A2} ≥ ∏_{i∈S1} (1 − xi).     (6)
Substituting (5) and (6) in (4) we get
Pr{Ej | A1 ∩ A2} ≤ xj · ∏_{i∈N(j)} (1 − xi) / ∏_{i∈S1} (1 − xi) = xj · ∏_{i∈N(j)\S1} (1 − xi) ≤ xj.
Proof. (LLL, general form) The proof follows directly from the previous lemma:
Pr{∩_{j=1}^n Ēj} = Pr{Ē1} · Pr{Ē2 | Ē1} · · · Pr{Ēn | Ē1 ∩ · · · ∩ Ēn−1}
≥ (1 − x1)(1 − x2) · · · (1 − xn).
The LLL may be better known in its symmetric form.
Theorem 6 (LLL symmetric form). Let G be a dependency graph for events E1, E2, . . . , En
and let d be the maximum degree. Further assume that Pr(Ej) ≤ p for all j and some p.
If e·p·(d + 1) ≤ 1 (where e = 2.71...) then Pr[∩_{j=1}^n Ēj] > 0.
Proof. Choose xi = 1/(d + 1) for all i. Then
xj · ∏_{i∈N(j)} (1 − xi) ≥ (1/(d+1)) · (1 − 1/(d+1))^d ≥ (1/(d+1)) · (1/e) ≥ p ≥ Pr[Ej].
Now the general LLL states that
Pr[∩_{j=1}^n Ēj] ≥ ∏_j (1 − xj) = (1 − 1/(d+1))^n > 0.
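The symmetric condition is a one-liner to test; the helper below (name is ours) just evaluates e·p·(d + 1) ≤ 1.

```python
from math import e

def symmetric_lll_applies(p: float, d: int) -> bool:
    """True if e * p * (d + 1) <= 1, in which case Pr[no bad event] > 0."""
    return e * p * (d + 1) <= 1

assert symmetric_lll_applies(1 / 100, 20)       # e * 0.21 < 1
assert not symmetric_lll_applies(1 / 10, 20)    # e * 2.1 > 1
```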
Example: Cycle coloring (not in [MR])
Assume we are given a cycle C of c · n vertices together with a coloring of its vertices.
There are c colors and each color is given to n vertices. We want to choose one vertex of
each color such that no two are adjacent (i.e., no edge has both endpoints picked). Clearly,
this is not possible for small values of n and always possible for large values of n. How
small can n be? Let us first follow the simple argument without LLL and then apply LLL.
Pick a vertex at random from each color. Let Ej be the event that both endpoints of edge
j are picked. Then Pr[Ej] ≤ 1/n². (Equality if the endpoints have different colors and zero
if the same color.) The probability that no edge is picked is
Pr[∩_j Ēj] = 1 − Pr[∪_j Ej] ≥ 1 − Σ_j Pr[Ej] ≥ 1 − cn/n² = 1 − c/n.
The probability is strictly positive if n > c.
This bound is not very strong. The weak point is the union bound. Assume there is
an edge i with endpoints colored 1 and 2 and another edge j with colors 3 and 4. The
union bound says that Pr[Ei ∪ Ej] ≤ Pr[Ei] + Pr[Ej] = 2/n². However, these events are
independent, which implies that Pr[Ei ∪ Ej] = Pr[Ei] + Pr[Ej] − Pr[Ei ∩ Ej] = 2/n² − 1/n⁴.
Hence, we can get a stronger bound by using independence.
Now we apply the LLL. Consider the following dependency graph G on the cn edges of
the cycle: there is an edge between i and j if and only if edges i and j have a color in
common among their endpoints. Then G is a dependency graph for the events Ej and the
maximum degree is d ≤ 4(n − 1) + 2 = 4n − 2. Now the LLL tells us that
Pr[∩_j Ēj] > 0
if e·(1/n²)·(4n − 1) ≤ 1, i.e., if n ≥ 11. This is much stronger than the bound n > c that we
obtained without the LLL.
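A short loop (our own check) confirms that n = 11 is exactly where the condition e·(4n − 1)/n² ≤ 1 starts to hold:

```python
from math import e

# Find the smallest n for which the symmetric LLL condition holds.
n = 2
while e * (4 * n - 1) / n ** 2 > 1:
    n += 1
print(n)   # 11: one vertex per color can be chosen once n >= 11
```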
Example: Edge disjoint paths (not in [MR])
This is a similar application of the LLL. We have a graph G = (V, E) together with N pairs
(sj, tj) ∈ V² and would like to find one path between each pair such that the N paths are
pairwise edge disjoint. This may or may not exist. Assume we have for each pair (sj, tj) a
set Pj of m paths between the pair that we can choose from. The paths in Pj do not have
to be edge disjoint. Assume that for any two Pi, Pj at most k of the m² pairs of paths
share some edge.
Let's first do the analysis without the LLL. For each j pick a path pj from Pj at random. Let
Eij be the event that pi and pj share an edge. Then
Pr{Eij} ≤ k/m².
The probability that no pair of paths shares an edge is at least
Pr(∩_{i,j} Ēij) ≥ 1 − Σ_{i,j} Pr(Eij) ≥ 1 − (N choose 2)·k/m²,
which is strictly positive for m²/k > (N choose 2).
Now we use the LLL. The event Eij is mutually independent of the set of events {Egh |
{i, j} ∩ {g, h} = ∅}. In the dependency graph there is an edge between Eij and Egh only if
{i, j} ∩ {g, h} ≠ ∅. Hence,
d ≤ 2(N − 2).
The condition of the LLL states
e·p·(d + 1) ≤ 1 ⇔ e·(k/m²)·(2N − 1) ≤ 1 ⇔ m²/k ≥ e(2N − 1).
This is stronger than the bound m²/k > (N choose 2) obtained without the LLL.
For example, if we have N = 100 pairs of vertices (in some large graph) and between each
pair m = 50 paths and k = 4, then we know that there should be a set of pairwise edge
disjoint paths.
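Plugging in this example (N = 100, m = 50, k = 4) shows the union-bound requirement fails while the LLL condition holds; a two-line check (our own):

```python
from math import comb, e

N, m, k = 100, 50, 4
ratio = m ** 2 / k                   # = 625

assert ratio < comb(N, 2)            # union bound needs 625 > 4950: fails
assert ratio >= e * (2 * N - 1)      # LLL needs 625 >= e*199 ~ 540.9: holds
```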
Example: k-Satisfiability ([MR] 5.5)
In general, it may be very difficult to get from the existence proof (obtained from LLL)
to actually finding a solution. The book gives a complex example for k-SAT in which one
actually finds an assignment (with high probability). Let us first give a simple proof for
the existence using LLL.
The k-SAT problem is the maximum satisfiability problem in which each clause has exactly
k literals. We may assume that no clause contains both a variable and its negation since
these clauses are always true.
Theorem 7 (Not in MR). If each variable appears in at most K = 2^k/(3k) clauses then
the formula has a satisfying assignment. (Assume k ≥ 6.)
Proof. Set each variable true with probability 1/2. Let Ej be the event that clause j is not
satisfied. Then Pr{Ej} = 2^{−k}. Since each variable appears in at most K clauses we have
d ≤ k(K − 1) < k·2^k/(3k) = 2^k/3.
Then,
e·p·(d + 1) = e·(1/2^k)·(2^k/3 + 1) = e·(1/3 + 1/2^k) < 1 for k ≥ 6.
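The final inequality of the proof can be checked directly; the sketch below (our own) evaluates e·(1/3 + 2^{−k}) for a range of k:

```python
from math import e

def lll_value(k: int) -> float:
    """e * p * (d + 1) with p = 2^-k and d + 1 = 2^k/3 + 1."""
    return e * (1 / 3 + 1 / 2 ** k)

assert lll_value(6) < 1                            # k = 6 already works
assert all(lll_value(k) < 1 for k in range(6, 40)) # and it only improves
```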
Note that K = 2^k/(3k) is much larger than the K = 2^{k/50} from the book. In that sense,
the result above is stronger than what is done in the book. However, it only gives existence.
Finding a satisfying assignment is much harder.
Theorem 8. (Theorem 5.14) If each variable appears in at most K = 2^{k/50} clauses then
a satisfying assignment can be found in expected polynomial time (where k is assumed an
even constant).
The algorithm works in two phases. In the first phase, some of the variables are given a
random value. In the second phase, the assignment is completed in an optimal way.
Algorithm:
Phase 1 Sequentially flip a coin for each variable (for example in the order 1, 2, . . . , n) but
skip a variable if it became marked in the process. When a clause has k/2 of its variables
assigned and is still not satisfied, then all its remaining variables are 'marked'. The clause
is then called dangerous.
Phase 2 Build a graph for the yet unsatisfied clauses. There is a vertex for each clause
and an edge if the two clauses share a marked variable. If all components in this graph
have size at most z log m (for some constant z following from the analysis below) then
complete the assignment in an optimal way by complete enumeration. Otherwise, apply
phase 1 again.
Example The example below is easily satisfied but illustrates the two phases.
C 1 = x1 ∨ x3 ∨ x5 ∨ x6 ,
C 2 = x1 ∨ x2 ∨ x5 ∨ x7 ,
C 3 = x2 ∨ x3 ∨ x4 ∨ x6 ,
C4 = x2 ∨ x4 ∨ x5 ∨ x7
Assume the first coin flip is: x1 = true. Then the first clause is satisfied. Now assume the second
coin flip gives x2 = false. The false literals are displayed in red.
C 2 = x1 ∨ x2 ∨ x 5 ∨ x 7 ,
C3 = x2 ∨ x3 ∨ x4 ∨ x6 ,
C4 = x2 ∨ x4 ∨ x5 ∨ x7
Clause C2 is dangerous. Variables x5 and x7 are deferred to the second phase. Now assume the
third coin flip gives x3 = false. Then clause C3 is also dangerous and x4 and x6 are deferred to
the second phase. Clause C4 is not dangerous but all its remaining variables are deferred. Clauses
C2, C3, C4 survive the first phase.
C2 = x1 ∨ x2 ∨ x5 ∨ x7 ,
C3 = x2 ∨ x3 ∨ x4 ∨ x6 ,
C4 = x2 ∨ x4 ∨ x5 ∨ x7 .
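Phase 1 can be sketched as follows. This is our own reconstruction of the marking rule, not code from [MR]: the clause representation (lists of (variable, is_positive) literals), the function name, and the coin-flip interface are all assumptions. A clause becomes dangerous once k/2 of its variables are assigned while every assigned literal is false.

```python
def phase1(clauses, n, k, coin):
    """One pass of Phase 1: assign unmarked variables in order, marking
    the free variables of clauses that become dangerous."""
    assignment, marked, dangerous = {}, set(), set()

    def satisfied(clause):
        return any(v in assignment and assignment[v] == pos for v, pos in clause)

    for v in range(n):
        if v in marked:
            continue                                  # deferred to Phase 2
        assignment[v] = coin[v]
        for idx, clause in enumerate(clauses):
            if idx in dangerous or satisfied(clause):
                continue
            if sum(w in assignment for w, _ in clause) >= k // 2:
                dangerous.add(idx)                    # k/2 set, still unsatisfied
                marked.update(w for w, _ in clause if w not in assignment)
    return assignment, marked, dangerous

# One clause x0 v x1 v x2 v x3 with the first two coin flips false:
clauses = [[(0, True), (1, True), (2, True), (3, True)]]
a, mk, dg = phase1(clauses, 4, 4, {0: False, 1: False, 2: True, 3: True})
assert dg == {0} and mk == {2, 3} and a == {0: False, 1: False}
```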
Lemma 2. There is always a feasible completion of the partial assignment given in Phase 1.
Proof. Each of the unsatisfied clauses still has at least k/2 marked variables left. Let k′ = k/2
and assume each clause has exactly k′ variables. (Just remove the extra variables.) Each variable
appears in at most 2^{k/50} = 2^{k′/25} clauses, which is much less than the bound of Theorem 7. So
existence is guaranteed.
Lemma 3. If all components of the graph constructed in Phase 2 are of size at most z log m for
constant z, then an optimal completion can be found in polynomial time.
Proof. No two components have a variable in common, so the assignment can be done independently
for each component. For each component, the number of possible assignments is 2^{kz log m}, which
is polynomial in m if we assume that k is a constant.
It remains to show that the expected number of repetitions of the phases is polynomially bounded.
We show that the expected number is less than 2. Let G be the dependency graph and let d be
the maximum degree. In this case, d ≤ k(K − 1) < k·2^{k/50}.
Lemma 4. The probability that a clause is still unsatisfied at the end of Phase 1 is at most
(d + 1)·2^{−k/2}.
Proof. This only happens when it became a dangerous clause or when some of its variables were
marked because other clauses became dangerous. In the latter case, the other clause is adjacent in
the dependency graph. For any clause, the probability that it becomes dangerous is at most 2^{−k/2}.
So, by the union bound, the probability that a clause survives the first phase is at most (d + 1)·2^{−k/2}.
Lemma 5. Let C1, C2, . . . , Cr be r clauses at pairwise distance at least 4 in G. The probability
that all clauses survive the first phase is at most [(d + 1)·2^{−k/2}]^r.
Proof. If a clause survives, then one of its neighbors or the clause itself became dangerous. For
j = 1, 2, . . . , r let Dj be a clause at distance at most 1 from Cj in G. For each j, the probability
that Dj becomes dangerous is at most 2^{−k/2}. Any pair Di, Dj is at distance at least 2 in G, so they
have no variables in common. Therefore, the probability that all of D1, . . . , Dr become dangerous
is at most (2^{−k/2})^r. (Note that this holds even though the events may not be independent.) The
number of ways to select one such Dj for each Cj is at most (d + 1)^r. Hence, the probability is
bounded by (d + 1)^r · (2^{−k/2})^r.
Definition 4. A subset T of vertices in G is called a 4-tree if the following holds:
1. The clauses in T are at pairwise distance at least 4 in G.
2. If we construct a new graph in which there is an edge between two clauses in T whenever they
are at distance exactly 4 in G, then this graph is connected.
Lemma 6. The number of 4-trees of size r is at most m·d^{8r}.
Proof. Construct a new graph on the clauses where there is an edge if the distance in G between
the clauses is exactly 4. The maximum degree in this graph is at most d′ = d⁴. The number of
4-trees is no more than the number of connected subgraphs of this graph. By Problem 5.7, the
number of connected subgraphs is at most m·d′^{2r} = m·d^{8r}.
Lemma 7. There is a constant b such that the probability that any 4-tree of size larger than
r = b log m survives is o(1).
Proof. Follows directly from the previous two lemmas and the union bound: the probability is at
most
[(d + 1)·2^{−k/2}]^r · m·d^{8r} ≤ f(k)^r · m, for some function f(k) < 1.
This is at most 1/m if f(k)^r ≤ m^{−2}. We assumed k to be constant, so this holds for r ≥ b log m
for some constant b.
Lemma 8. There is a constant z such that the probability that any connected subgraph of G of
size larger than z log m survives is o(1).
Proof. Any connected subgraph contains some maximal 4-tree T. Each vertex in T has no more
than d + d² + d³ ≤ 3d³ vertices of the component at distance at most 3, and there are no other
vertices (by the maximality of T). So the size of the subgraph is bounded by 3d³·|T|. So take
z = 3d³·b.
By Lemma 8, the expected number of repetitions of the phases is less than 2.
Questions for this week:
Complete the proof of the k-SAT construction by solving Problem 5.7 from [MR].
Problem 5.8 from [MR].
Each vertex j of a graph G = (V, E) is assigned a list Sj of 6r colors. For each pair j ∈ V, c ∈ Sj
there are at most r neighbors i of j such that c ∈ Si. Use the LLL to show that a coloring of the
vertices exists such that
• no two adjacent vertices get the same color;
• each vertex j is assigned a color from Sj.