Week 6: Probabilistic method

This week: [MR] parts of 5.3, 5.5.

We continue with the probabilistic method. Two forms may be distinguished:

a) If some random variable $X$ has expectation $E[X] > a$, then there must be a realization of $X$ with value at least $a$.

b) If we pick an object $s$ at random from a set of objects $S$ and $\Pr\{s \text{ has property } P\} > 0$, then there must be some $s \in S$ with property $P$.

A standard application of (b) is a simple counting argument using the union bound. Consider the following generic example. Let $G = (V, E)$ be a graph on $n$ vertices and let $\mathcal{S} \subseteq 2^V$ be a collection of subsets of the vertices. Assume we want to show that there exists a graph $G$ with property $P$, where $P$ is the property that no subset in $\mathcal{S}$ has some property $P'$. Now take $G$ at random (for some distribution) and assume that for any $S \in \mathcal{S}$
\[ \Pr\{S \text{ has property } P'\} \leq p \]
for some $p$. Then, by the union bound,
\[ \Pr\{G \text{ has property } P\} = 1 - \Pr\{G \text{ does not have property } P\} \geq 1 - |\mathcal{S}|\,p, \]
which is strictly positive if $p < 1/|\mathcal{S}|$.

Example: No large clique or independent set (not in [MR])

Is there a graph on $n = 1000$ vertices such that no subgraph on 20 vertices is a clique or an independent set?

$\mathcal{S}$: the subgraphs on 20 vertices.
$P$: no subgraph in $\mathcal{S}$ has property $P'$.
$P'$: it is a clique or an independent set.

Now take $G$ uniformly at random by selecting each edge of the complete graph $K_n$ with probability 1/2, and let $H$ be an arbitrary subgraph of $G$ on 20 vertices. The probability $p$ that $H$ is a clique or an independent set is $2^{-\binom{20}{2}+1}$. On the other hand, the number $|\mathcal{S}|$ of subgraphs is $\binom{1000}{20}$. One may check that $p < 1/|\mathcal{S}|$.

In general we get the following theorem.

Theorem 1. If $2^{\binom{k}{2}-1} > \binom{n}{k}$, then there is a graph on $n$ vertices which does not have a clique or independent set of size (at least) $k$.

Example: Expanding graphs ([MR] 5.3)

This example is similar to the previous one, but $\mathcal{S}$ is now partitioned into several sets.

Definition 1. An $(n, d, \alpha, c)$ OR-concentrator is a bipartite graph $G = (L, R, E)$ with $|L| = |R| = n$ such that
1. every vertex in $L$ has degree at most $d$;
2. for any $S \subseteq L$ with $|S| \leq \alpha n$, the number of neighbors of $S$ is at least $c|S|$.

Theorem 2. There is an $n_0$ such that for all $n \geq n_0$ there is an $(n, 18, 1/3, 2)$ OR-concentrator.

Proof. Construct a bipartite graph $G = (L, R, E)$ at random as follows. For each $v \in L$ choose $d$ neighbors uniformly at random (with replacement); double edges are replaced by a single edge. For $i \leq \alpha n$ let
\[ \mathcal{S}_i = \{(S, T) \mid S \subseteq L,\ T \subseteq R,\ |S| = i,\ |T| = ci\}. \]

$P$: for all $i \leq \alpha n$, no $(S, T) \in \mathcal{S}_i$ has property $P'$.
$P'$: all neighbors of $S$ are in $T$.

For fixed $(S, T) \in \mathcal{S}_i$,
\[ \Pr\{(S, T) \text{ has property } P'\} = \left(\frac{ci}{n}\right)^{di}. \]
Let $E_i$ be the event that some $(S, T) \in \mathcal{S}_i$ has property $P'$. Since $|\mathcal{S}_i| = \binom{n}{i}\binom{n}{ci}$ we get
\[ \Pr\{E_i\} \leq \binom{n}{i}\binom{n}{ci}\left(\frac{ci}{n}\right)^{di}. \]
Next we need to find values for $d$, $\alpha$ and $c$ such that the sum of the right-hand side for $i = 1$ to $\alpha n$ is less than 1. The following computation shows that this holds for $d = 18$, $\alpha = 1/3$ and $c = 2$. Since $\binom{n}{k} \leq (ne/k)^k$ for all integers $k \leq n$, we get
\[ \Pr\{E_i\} \leq \left(\frac{ne}{i}\right)^{i} \left(\frac{ne}{ci}\right)^{ci} \left(\frac{ci}{n}\right)^{di} = \left[\left(\frac{i}{n}\right)^{d-c-1} c^{\,d-c}\, e^{\,c+1}\right]^{i}. \]
Now using $i/n \leq \alpha = 1/3$, $c = 2$ and $d = 18$ gives $\Pr\{E_i\} < \left(\frac{1}{2}\right)^i$. Hence,
\[ \Pr\{G \text{ has property } P\} \geq 1 - \sum_{i=1}^{\alpha n} \Pr\{E_i\} > 1 - \sum_{i=1}^{\alpha n} \left(\frac{1}{2}\right)^i > 0. \]
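Both "one may check" steps above reduce to plugging in numbers. Here is a minimal Python sketch (my own, not from [MR]) that verifies the counting condition of Theorem 1 for $n = 1000$, $k = 20$, and the base of the geometric bound in the proof of Theorem 2.

```python
from math import comb, exp

# Theorem 1 check: 2^(C(k,2)-1) > C(n,k) for n = 1000, k = 20.
n, k = 1000, 20
assert 2 ** (comb(k, 2) - 1) > comb(n, k)  # holds comfortably

# Theorem 2 check: for d = 18, alpha = 1/3, c = 2 the bracketed base
# (i/n)^(d-c-1) * c^(d-c) * e^(c+1) is at most alpha^15 * 2^16 * e^3,
# which is below 1/2, so Pr{E_i} < (1/2)^i and the errors sum below 1.
d, alpha, c = 18, 1 / 3, 2
base = alpha ** (d - c - 1) * c ** (d - c) * exp(c + 1)
print(base)          # ~0.0918
assert base < 0.5
```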
Example: Large independent set (not in [MR])

Given a graph $G$ with $m$ edges, what can we say about the size $k$ of the largest independent set? One way is to take $k$ vertices at random and to show that the induced subgraph is an independent set with non-zero probability. An edge $e$ is in the subgraph with probability less than $(k/n)^2$. By the union bound, the probability that some edge is in the subgraph is less than $m(k/n)^2$. So the probability of hitting an independent set is non-zero if
\[ m \left(\frac{k}{n}\right)^2 < 1, \quad \text{i.e. if } k < \frac{n}{\sqrt{m}}. \]
Clearly, this is not a strong bound. Consider the following two events.
(i) In a random sample of $k$ vertices there is no edge in the induced subgraph.
(ii) In a random sample of $2k$ vertices there are at most $k$ edges.
Intuitively, the probability of the latter event is much larger. But if it happens, then removing for each edge one of its endpoints leaves an independent set of size at least $k$. Next, we use this sample-and-modify approach to get a better bound; a code sketch follows the proof below.

Theorem 3. Any graph $G$ with $n$ vertices and $m \geq n/2$ edges has an independent set of size at least $k = n^2/(4m)$.

Proof. (If $m \leq n/2$ then clearly there is an independent set of size at least $n/2$.) Do the following:
1. (Sample) Select each vertex independently with probability $p$.
2. (Modify) In the induced subgraph, select one endpoint of each edge and delete all these selected points.

Let $X$ be the number of sampled vertices and $Y$ be the number of edges in the induced subgraph. Then the expected number of vertices in the remaining independent set is at least $E[X - Y]$. Clearly, $E[X] = np$. The probability that an edge is in the induced subgraph is $p^2$. Therefore, $E[Y] = mp^2$ and
\[ E[X - Y] = np - mp^2. \]
This value is maximized for $p = n/(2m)$ (which is at most 1 by assumption). In that case,
\[ E[X - Y] = n \cdot \frac{n}{2m} - m \cdot \frac{n^2}{4m^2} = \frac{n^2}{4m}. \]
This bound is much stronger than the previous bound of $n/\sqrt{m}$.

Example: Girth of a graph (not in [MR])

This is another example of the sample-and-modify approach. The girth of a graph $G$ is defined as the length of the smallest cycle in $G$. Clearly, there is some correlation between the girth $k$ of a graph and the number of edges $|E|$: graphs with large girth cannot be very dense. The probabilistic method is an easy tool to show what density is still possible for a given value of $k$. It certainly does not give the best bound.

Theorem 4. For all $k \geq 3$ there is a graph $G = (V, E)$ on $n$ vertices such that
(i) $G$ has girth at least $k$;
(ii) $|E| \geq \frac{1}{3} n^{1+1/k}$.

Proof. We take a random graph on $n$ vertices and then modify it such that its girth is at least $k$. We then show that the expected number of remaining edges is at least $\frac{1}{3} n^{1+1/k}$.
1. (Sample) Take a random graph $G$ on $n$ vertices: edges are chosen independently with probability $p = n^{1/k - 1}$.
2. (Modify) For each cycle of length $j < k$ select one edge. Remove all selected edges.

Let $X$ be the number of edges in the random graph and $Y$ be the number of cycles in it of length less than $k$. Then the expected number of remaining edges is at least $E[X - Y]$, and
\[ E[X] = \binom{n}{2} p. \]
Let $C_j$ be the number of cycles of length $j$ in a complete graph on $n$ vertices, $j = 3, \ldots, k-1$. Then
\[ C_j \leq \binom{n}{j} (j-1)!/2 < n^j. \]
Each such cycle appears in $G$ with probability $p^j$. Therefore, the expected number of cycles of length $j$ in $G$ is at most $n^j p^j = n^j n^{(1/k - 1)j} = n^{j/k} < n$. Hence, $E[Y] < kn$ and
\[ E[X - Y] \geq \binom{n}{2} p - kn = \frac{n(n-1)}{2}\, n^{1/k - 1} - kn = \frac{1}{2}\left(1 - \frac{1}{n}\right) n^{1+1/k} - kn \geq \frac{1}{3}\, n^{1+1/k}, \]
for $n$ large enough.
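As an illustration of the sample-and-modify argument of Theorem 3, here is a minimal Python sketch (function name and graph representation are my own; the procedure is the one from the proof).

```python
import random

def sample_and_modify_independent_set(n, edges, p):
    """One run of the sample-and-modify procedure from Theorem 3.

    n: number of vertices (labelled 0..n-1)
    edges: list of pairs (u, v)
    p: sampling probability, e.g. n / (2m)
    Returns an independent set as a Python set of vertices.
    """
    # Sample: keep each vertex independently with probability p.
    sample = {v for v in range(n) if random.random() < p}
    # Modify: for every edge inside the sample, delete one endpoint.
    for u, v in edges:
        if u in sample and v in sample:
            sample.discard(u)   # removing either endpoint works
    return sample  # no edge survives, so this set is independent

# Example: a cycle on 12 vertices (m = 12), so p = n/(2m) = 1/2 and the
# expected size of the result is at least n^2/(4m) = 3.
n = 12
edges = [(i, (i + 1) % n) for i in range(n)]
best = max((sample_and_modify_independent_set(n, edges, n / (2 * len(edges)))
            for _ in range(100)), key=len)
print(len(best), best)
```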
Probabilistic method: Lovász Local Lemma ([MR] 5.5)

Definition 2. We say that event $E$ is mutually independent of events $E_1, \ldots, E_k$ if
\[ \Pr(E \mid \cap_{j \in S} E_j) = \Pr(E) \quad \text{for all } S \subseteq \{1, 2, \ldots, k\}. \]

Definition 3. Graph $G = (V, E)$ with vertices $1, 2, \ldots, n$ is a dependency graph for events $E_1, E_2, \ldots, E_n$ if, for every $j$, $E_j$ is mutually independent of all events $E_i$ with $(i, j) \notin E$.

Mutual independence is a stronger property than pairwise independence. It is a misunderstanding that the dependency graph has an edge between $i$ and $j$ if and only if $E_i$ and $E_j$ are dependent. Consider the following example. We flip a coin twice and say that we win if both outcomes are the same. Let $E_1$ ($E_2$) be the event that the first (second) coin flip gives heads, and let $E_3$ be the event of winning. Then the three events are pairwise independent, but $E_3$ is not independent of $E_1 \cap E_2$. So what is the dependency graph for this simple example? The complete graph $K_3$ is a valid choice, and so is any graph with exactly two edges. We see that the dependency graph is not unique and that in general the complete graph is always valid. (A brute-force check of this example follows below.)

For a graph $G = (V, E)$ and $j \in V$, denote the neighborhood of $j$ by $N(j) = \{i \mid (i, j) \in E\}$.

Theorem 5 (LLL, general form). Let $G$ be a dependency graph for events $E_1, E_2, \ldots, E_n$. If there are numbers $x_1, x_2, \ldots, x_n \in [0, 1]$ such that
\[ \Pr[E_j] \leq x_j \prod_{i \in N(j)} (1 - x_i) \quad \text{for every } j, \quad (1) \]
then
\[ \Pr\Big[\bigcap_{j=1}^{n} \overline{E_j}\Big] \geq \prod_{i=1}^{n} (1 - x_i). \quad (2) \]

If $x_j = 1$ for some $j$ then the theorem is obviously true, so assume $x_j < 1$ for all $j$. The theorem follows almost directly from the next lemma.

Lemma 1. If condition (1) holds, then for any $S \subseteq \{1, 2, \ldots, n\}$ and $j \notin S$,
\[ \Pr\{E_j \mid \cap_{i \in S} \overline{E_i}\} \leq x_j. \quad (3) \]

Proof. We prove it by induction on the size of $S$. If $|S| = 0$, then the lemma follows directly from (1). Now let $|S| = k \geq 1$ and assume the lemma holds for sets of size at most $k - 1$. We partition $S$ into two sets:
\[ S_1 = S \cap N(j), \text{ with event } A_1 = \cap_{i \in S_1} \overline{E_i}, \qquad S_2 = S \setminus S_1, \text{ with event } A_2 = \cap_{i \in S_2} \overline{E_i}. \]
Note that if $S_1 = \emptyset$ then
\[ \Pr\{E_j \mid A_1 \cap A_2\} = \Pr\{E_j \mid A_2\} = \Pr\{E_j\} \leq x_j. \]
So assume from now on that $S_1 \neq \emptyset$. Then
\[ \Pr\{E_j \mid A_1 \cap A_2\} = \frac{\Pr\{E_j \cap A_1 \cap A_2\}}{\Pr\{A_1 \cap A_2\}} = \frac{\Pr\{E_j \cap A_1 \mid A_2\}}{\Pr\{A_1 \mid A_2\}}. \quad (4) \]
We shall bound the numerator and denominator separately. Numerator:
\[ \Pr\{E_j \cap A_1 \mid A_2\} \leq \Pr\{E_j \mid A_2\} = \Pr\{E_j\} \leq x_j \prod_{i \in N(j)} (1 - x_i). \quad (5) \]
The equality follows from the independence of $E_j$ and $A_2$, and the last inequality is by condition (1). For the denominator we use the induction hypothesis. Let $S_1 = \{i_1, \ldots, i_r\}$ for some $r \geq 1$. Then
\[ \Pr\{A_1 \mid A_2\} = \Pr\{\overline{E_{i_1}} \mid A_2\} \cdot \Pr\{\overline{E_{i_2}} \mid \overline{E_{i_1}} \cap A_2\} \cdots \Pr\{\overline{E_{i_r}} \mid \overline{E_{i_1}} \cap \cdots \cap \overline{E_{i_{r-1}}} \cap A_2\} \geq \prod_{i \in S_1} (1 - x_i). \quad (6) \]
Substituting (5) and (6) in (4) we get
\[ \Pr\{E_j \mid A_1 \cap A_2\} \leq \frac{x_j \prod_{i \in N(j)} (1 - x_i)}{\prod_{i \in S_1} (1 - x_i)} = x_j \prod_{i \in N(j) \setminus S_1} (1 - x_i) \leq x_j. \]

Proof (LLL, general form). The proof follows directly from the previous lemma:
\[ \Pr\Big\{\bigcap_{j=1}^{n} \overline{E_j}\Big\} = \Pr\{\overline{E_1}\} \cdot \Pr\{\overline{E_2} \mid \overline{E_1}\} \cdots \Pr\{\overline{E_n} \mid \overline{E_1} \cap \cdots \cap \overline{E_{n-1}}\} \geq (1 - x_1)(1 - x_2) \cdots (1 - x_n). \]

The LLL may be better known in its symmetric form.

Theorem 6 (LLL, symmetric form). Let $G$ be a dependency graph for events $E_1, E_2, \ldots, E_n$ and let $d$ be its maximum degree. Further assume that $\Pr(E_j) \leq p$ for all $j$ and some $p$. If $e \cdot p \cdot (d + 1) \leq 1$ (where $e = 2.71\ldots$) then $\Pr[\cap_{j=1}^{n} \overline{E_j}] > 0$.

Proof. Choose $x_i = 1/(d+1)$ for all $i$. Then
\[ x_j \prod_{i \in N(j)} (1 - x_i) \geq \frac{1}{d+1} \left(1 - \frac{1}{d+1}\right)^{d} \geq \frac{1}{d+1} \cdot \frac{1}{e} \geq p \geq \Pr[E_j]. \]
Now the general LLL states that
\[ \Pr\Big[\bigcap_{j=1}^{n} \overline{E_j}\Big] \geq \prod_{j=1}^{n} (1 - x_j) = \left(1 - \frac{1}{d+1}\right)^{n} > 0. \]
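The coin-flip example referenced above can be verified exactly by enumerating the four equally likely outcomes. A small Python sketch (my own) making the pairwise-versus-mutual distinction concrete:

```python
from itertools import product

# All four equally likely outcomes of two fair coin flips (1 = heads).
outcomes = list(product([0, 1], repeat=2))

E1 = {o for o in outcomes if o[0] == 1}      # first flip heads
E2 = {o for o in outcomes if o[1] == 1}      # second flip heads
E3 = {o for o in outcomes if o[0] == o[1]}   # win: both flips equal

def pr(A):
    return len(A) / len(outcomes)

# Pairwise independent: Pr[A and B] = Pr[A] * Pr[B] for every pair.
for A, B in [(E1, E2), (E1, E3), (E2, E3)]:
    assert pr(A & B) == pr(A) * pr(B)

# But not mutually independent: conditioning E3 on E1 and E2 changes it.
print(pr(E3), len(E3 & E1 & E2) / len(E1 & E2))  # 0.5 versus 1.0
```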
Example: Cycle coloring (not in [MR])

Assume we are given a cycle $C$ on $c \cdot n$ vertices together with a coloring of its vertices. There are $c$ colors and each color is given to $n$ vertices. We want to choose one vertex of each color such that no two chosen vertices are adjacent (i.e., no edge has both endpoints picked). Clearly, this is not possible for small values of $n$ and always possible for large values of $n$. How small can $n$ be?

Let us first follow the simple argument without LLL and then apply LLL. Pick a vertex at random from each color. Let $E_j$ be the event that both endpoints of edge $j$ are picked. Then $\Pr[E_j] \leq 1/n^2$ (with equality if the endpoints have different colors, and zero if they have the same color). The probability that no edge is picked is
\[ \Pr[\cap_j \overline{E_j}] = 1 - \Pr[\cup_j E_j] \geq 1 - \sum_j \Pr[E_j] \geq 1 - \frac{cn}{n^2} = 1 - \frac{c}{n}. \]
The probability is strictly positive if $n > c$. This bound is not very strong. The weak point is the union bound. Assume there is an edge $i$ with endpoints colored 1 and 2 and another edge $j$ with endpoints colored 3 and 4. The union bound says that $\Pr[E_i \cup E_j] \leq \Pr[E_i] + \Pr[E_j] = 2/n^2$. However, these events are independent, which implies that
\[ \Pr[E_i \cup E_j] = \Pr[E_i] + \Pr[E_j] - \Pr[E_i \cap E_j] = 2/n^2 - 1/n^4. \]
Hence, we can get a stronger bound by using independence.

Now we apply LLL. Consider the following graph $H$ on the $cn$ edges of the cycle: there is an edge between $i$ and $j$ if and only if the edges $i$ and $j$ have a color in common among their endpoints. Then $H$ is a dependency graph for the events $E_j$ and its maximum degree is $d \leq 4(n - 1) + 2 = 4n - 2$. Now LLL tells us that $\Pr[\cap_{j=1}^{cn} \overline{E_j}] > 0$ if
\[ e \cdot \frac{1}{n^2} \cdot (4n - 1) \leq 1, \]
which holds for all $n \geq 11$. For large $c$ this is much stronger than the bound $n > c$ that we obtained without LLL.

Example: Edge-disjoint paths (not in [MR])

This is a similar application of LLL. We have a graph $G = (V, E)$ together with $N$ pairs $(s_j, t_j) \in V^2$ and would like to find one path between each pair such that the $N$ paths are pairwise edge-disjoint. This may or may not exist. Assume we have for each pair $(s_j, t_j)$ a set $P_j$ of $m$ paths between the pair that we can choose from. The paths in $P_j$ do not have to be edge-disjoint. Assume that for any two sets $P_i, P_j$, at most $k$ of the $m^2$ pairs of paths share some edge.

Let us first do the analysis without LLL. For each $j$ pick a path $p_j$ from $P_j$ at random. Let $E_{ij}$ be the event that $p_i$ and $p_j$ share an edge. Then $\Pr\{E_{ij}\} \leq k/m^2$. The probability that no pair of paths shares an edge is at least
\[ \Pr(\cap_{i,j} \overline{E_{ij}}) \geq 1 - \sum_{i,j} \Pr(E_{ij}) \geq 1 - \binom{N}{2} \frac{k}{m^2}, \]
which is strictly positive for $\frac{m^2}{k} > \binom{N}{2}$.

Now we use LLL. The event $E_{ij}$ is mutually independent of the set of events $\{E_{gh} \mid \{i, j\} \cap \{g, h\} = \emptyset\}$. In the dependency graph there is an edge between $E_{ij}$ and $E_{gh}$ only if $\{i, j\} \cap \{g, h\} \neq \emptyset$. Hence, $d \leq 2(N - 2)$, and the condition $ep(d+1) \leq 1$ of LLL is satisfied if
\[ e \cdot \frac{k}{m^2} \cdot (2N - 1) \leq 1 \iff \frac{m^2}{k} \geq e(2N - 1). \]
This is stronger than the bound $\frac{m^2}{k} > \binom{N}{2}$ obtained without LLL: the required ratio grows only linearly in $N$ instead of quadratically. For example, if we have $N = 100$ pairs of vertices (in some large graph), between each pair $m = 50$ paths, and $k = 4$, then we know that there is a set of pairwise edge-disjoint paths.
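Both numerical claims above are easy to confirm. The following Python sketch (my own; it simply scans for the smallest $n$) finds the threshold $n \geq 11$ for the cycle coloring and checks the edge-disjoint-paths instance $N = 100$, $m = 50$, $k = 4$.

```python
from math import e, comb

# Cycle coloring: smallest n with e * (1/n^2) * (4n - 1) <= 1.
n = 1
while e * (4 * n - 1) / n ** 2 > 1:
    n += 1
print(n)  # 11

# Edge-disjoint paths with N = 100, m = 50, k = 4 (m^2/k = 625):
N, m, k = 100, 50, 4
print(m ** 2 / k > comb(N, 2))        # False: the union bound fails
print(m ** 2 / k >= e * (2 * N - 1))  # True: the LLL condition holds
```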
Example: k-Satisfiability ([MR] 5.5)

In general, it may be very difficult to get from the existence proof (obtained from LLL) to actually finding a solution. The book gives a complex example for k-SAT in which one actually finds an assignment (with high probability). Let us first give a simple proof of existence using LLL. The k-SAT problem is the satisfiability problem in which each clause has exactly $k$ literals. We may assume that no clause contains both a variable and its negation, since such clauses are always true.

Theorem 7 (not in [MR]). If each variable appears in at most $K = 2^k/(3k)$ clauses then the formula has a feasible assignment. (Assume $k \geq 6$.)

Proof. Set each variable to true with probability 1/2. Let $E_j$ be the event that clause $j$ is not satisfied. Then $\Pr\{E_j\} = 2^{-k}$. Since each variable appears in at most $K$ clauses we have $d \leq k(K - 1) < k \cdot 2^k/(3k) = 2^k/3$. Then,
\[ ep(d + 1) \leq e \cdot \frac{1}{2^k} \left(\frac{2^k}{3} + 1\right) = e\left(\frac{1}{3} + \frac{1}{2^k}\right) < 1 \quad \text{for } k \geq 6. \]

Note that $K = 2^k/(3k)$ is much larger than the $K = 2^{k/50}$ from the book. In that sense, the result above is stronger than what is done in the book. However, it only gives existence; finding a feasible assignment is much harder.

Theorem 8 (Theorem 5.14 in [MR]). If each variable appears in at most $K = 2^{k/50}$ clauses then a satisfying assignment can be found in expected polynomial time (where $k$ is assumed to be an even constant).

The algorithm works in two phases. In the first phase, part of the variables are given a random value. In the second phase, the assignment is completed in an optimal way.

Algorithm:
Phase 1. Sequentially flip a coin for each variable (for example in the order $1, 2, \ldots, n$), but skip a variable if it became marked in the process. When a clause has $k/2$ of its variables assigned and is still not satisfied, then all its remaining variables are 'marked' and the clause is called dangerous.
Phase 2. Build a graph on the yet unsatisfied clauses: there is a vertex for each clause and an edge if the two clauses share a marked variable. If all components in this graph have size at most $z \log m$ (for some constant $z$ following from the analysis below), then complete the assignment in an optimal way by complete enumeration. Otherwise, apply Phase 1 again.

Example. The formula below is easily satisfied but illustrates the two phases.
\[ C_1 = x_1 \vee x_3 \vee x_5 \vee x_6, \quad C_2 = \overline{x_1} \vee x_2 \vee x_5 \vee x_7, \quad C_3 = x_2 \vee x_3 \vee x_4 \vee x_6, \quad C_4 = x_2 \vee x_4 \vee x_5 \vee x_7. \]
Assume the first coin flip gives $x_1 = \text{true}$. Then the first clause is satisfied. Now assume the second coin flip gives $x_2 = \text{false}$. In the remaining clauses
\[ C_2 = \overline{x_1} \vee x_2 \vee x_5 \vee x_7, \quad C_3 = x_2 \vee x_3 \vee x_4 \vee x_6, \quad C_4 = x_2 \vee x_4 \vee x_5 \vee x_7, \]
the literals $\overline{x_1}$ and $x_2$ are now false. Clause $C_2$ has $k/2 = 2$ of its variables assigned and is still unsatisfied, so it is dangerous: variables $x_5$ and $x_7$ are deferred to the second phase. Now assume the third coin flip gives $x_3 = \text{false}$. Then clause $C_3$ also becomes dangerous and $x_4$ and $x_6$ are deferred to the second phase. Clause $C_4$ is not dangerous, but all its remaining variables are deferred. Clauses $C_2$, $C_3$, $C_4$ survive the first phase.
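To make Phase 1 concrete, here is a minimal Python sketch (data representation and function name are my own; clauses are lists of signed integers, DIMACS-style, so -1 stands for $\overline{x_1}$).

```python
import random

def phase1(num_vars, clauses, k):
    """Phase 1 of the two-phase k-SAT algorithm: assign variables in order,
    skipping marked ones; a clause with k/2 assigned variables that is
    still unsatisfied becomes dangerous and its remaining variables are
    marked (deferred to Phase 2)."""
    assignment = {}                     # variable -> bool
    marked = set()                      # variables deferred to Phase 2
    satisfied = [False] * len(clauses)

    for v in range(1, num_vars + 1):
        if v in marked:
            continue
        assignment[v] = random.random() < 0.5   # the coin flip
        for c, clause in enumerate(clauses):
            if satisfied[c]:
                continue
            # A clause is satisfied once some assigned literal is true.
            if any(abs(l) in assignment and
                   (l > 0) == assignment[abs(l)] for l in clause):
                satisfied[c] = True
                continue
            assigned = sum(abs(l) in assignment for l in clause)
            if assigned >= k // 2:      # dangerous: mark the rest
                marked.update(abs(l) for l in clause
                              if abs(l) not in assignment)
    return assignment, marked, satisfied

# The 4-SAT example above, with the negation of x1 in C2.
clauses = [[1, 3, 5, 6], [-1, 2, 5, 7], [2, 3, 4, 6], [2, 4, 5, 7]]
print(phase1(7, clauses, k=4))
```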
Lemma 2. There always is a feasible completion of the partial assignment given in Phase 1.

Proof. Each of the unsatisfied clauses still has at least $k/2$ marked variables left. Let $k' = k/2$ and assume each clause has exactly $k'$ variables (just remove the extra variables). Each variable appears in at most $2^{k/50} = 2^{k'/25}$ clauses, which is much less than the bound $2^{k'}/(3k')$ of Theorem 7. So existence is guaranteed.

Lemma 3. If all components of the graph constructed in Phase 2 are of size at most $z \log m$ for constant $z$, then an optimal completion can be found in polynomial time.

Proof. No two components have a variable in common, so the assignment can be done independently for each component. For each component, the number of possible assignments is at most $2^{kz \log m}$, which is polynomial in $m$ if we assume that $k$ is a constant.

It remains to show that the expected number of repetitions of the phases is polynomially bounded. We show that the expected number is less than 2. Let $G$ be the dependency graph and let $d$ be its maximum degree. In this case, $d \leq k(K - 1) < k\, 2^{k/50}$.

Lemma 4. The probability that a clause is still unsatisfied at the end of Phase 1 is at most $(d + 1)\, 2^{-k/2}$.

Proof. This only happens when the clause itself became dangerous or when some of its variables were marked because other clauses became dangerous. In the latter case, the other clause is adjacent in the dependency graph. For any clause, the probability that it becomes dangerous is at most $2^{-k/2}$. So, by the union bound, the probability that a clause survives the first phase is at most $(d + 1)\, 2^{-k/2}$.

Lemma 5. Let $C_1, C_2, \ldots, C_r$ be $r$ clauses at pairwise distance at least 4 in $G$. The probability that all $r$ clauses survive the first phase is at most $[(d + 1)\, 2^{-k/2}]^r$.

Proof. If a clause survives, then the clause itself or one of its neighbors became dangerous. For $j = 1, 2, \ldots, r$ let $D_j$ be a clause at distance at most 1 from $C_j$ in $G$. For each $j$, the probability that $D_j$ becomes dangerous is at most $2^{-k/2}$. Any pair $D_i, D_j$ is at distance at least 2 in $G$, so they have no variables in common. Therefore, the probability that all of $D_1, \ldots, D_r$ become dangerous is at most $(2^{-k/2})^r$. (Note that this holds even though the events may not be independent.) The number of ways to select one such $D_j$ for each $C_j$ is at most $(d + 1)^r$. Hence, the probability is bounded by $(d + 1)^r (2^{-k/2})^r$.

Definition 4. A subset $T$ of vertices of $G$ is called a 4-tree if the following holds:
1. The clauses in $T$ are at pairwise distance at least 4 in $G$.
2. If we construct a new graph with an edge between two clauses of $T$ whenever they are at distance exactly 4 in $G$, then this graph is connected.

Lemma 6. The number of 4-trees of size $r$ is at most $m\, d^{8r}$.

Proof. Construct a new graph on the clauses where there is an edge if the distance in $G$ between the clauses is exactly 4. The maximum degree of this graph is at most $d' = d^4$. The number of 4-trees is no more than the number of connected subgraphs of size $r$ of this graph. By Problem 5.7, the number of connected subgraphs is at most $m\, d'^{\,2r} = m\, d^{8r}$.

Lemma 7. There is a constant $b$ such that the probability that any 4-tree of size larger than $r = b \log m$ survives is $o(1)$.

Proof. Follows directly from the previous two lemmas and the union bound: the probability is at most
\[ [(d + 1)\, 2^{-k/2}]^r \cdot m\, d^{8r} \leq f(k)^r\, m, \]
for some function $f(k) < 1$. This is at most $1/m$ if $f(k)^r \leq m^{-2}$. We assumed $k$ to be constant, so this holds for $r \geq b \log m$ for some constant $b$.

Lemma 8. There is a constant $z$ such that the probability that any connected subgraph of $G$ of size larger than $z \log m$ survives is $o(1)$.

Proof. Any connected surviving subgraph contains some maximal 4-tree $T$. Each vertex of $T$ has no more than $d + d^2 + d^3 \leq 3d^3$ vertices of the component at distance at most 3, and by the maximality of $T$ there are no other vertices in the component. So the size of the component is bounded by $3d^3 |T|$, and we can take $z = 3d^3 b$.

By Lemma 8, the expected number of repetitions of the phases is less than 2.

Questions for this week:

Complete the proof of the k-SAT construction by solving Problem 5.7 from [MR].

Problem 5.8 from [MR]. Each vertex $j$ of a graph $G = (V, E)$ is assigned a list $S_j$ of $6r$ colors. For each pair $j \in V$, $c \in S_j$ there are at most $r$ neighbors $i$ of $j$ such that $c \in S_i$. Use LLL to show that a coloring of the vertices exists such that
• no two adjacent vertices get the same color;
• each vertex $j$ is assigned a color from its list $S_j$.