Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger

Randomized Algorithms
Flip coins to decide what to do next
Avoid the hard work of making the "right" choice
Often faster and simpler than deterministic algorithms
Different from average-case analysis:
» the input is worst case
» the algorithm adds the randomness

Methods
Random selection
» if most candidate choices are "good", then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve it, extrapolate to the whole problem
Randomized rounding
» for approximation

Cuts in Graphs
Focus on undirected graphs
A cut is a vertex partition
Its value is the number (or total weight) of crossing edges

Optimization with Cuts
Cut values determine the solution of many graph optimization problems:
» min-cut / max-flow
» multicommodity flow (sort of)
» bisection / separator
» network reliability
» network design
Randomization helps solve these problems

Presentation Assumption
For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
All results apply unchanged to arbitrarily weighted graphs:
» integer weights = parallel edges
» rational weights scale to integers
» the analysis is unaffected
» only some implementation details differ

Basic Probability
Conditional probability:
» Pr[A ∩ B] = Pr[A] · Pr[B | A]
Independent events multiply:
» Pr[A ∩ B] = Pr[A] · Pr[B]
Linearity of expectation:
» E[X + Y] = E[X] + E[Y]
Union bound:
» Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]

Random Selection for Minimum Cuts
Random choices are good when problems are rare

Minimum Cut
The smallest cut of the graph: the cheapest way to separate it into 2 parts
Not the s-t min-cut
Various applications:
» network reliability (small cuts are weakest)
» subtour elimination constraints for TSP
» separation oracle for network design

Max-flow / Min-cut
s-t flow: an edge-disjoint packing of s-t paths
s-t cut: a cut separating s and t
[FF]: s-t max-flow = s-t min-cut
» max-flow saturates all s-t min-cuts
» the most efficient way to find s-t min-cuts
[GH]: the min-cut is the "all-pairs" s-t min-cut
» find it using n flow computations

Flow Algorithms
Push-relabel [GT]:
» push "excess" around the graph till it's gone
» max-flow in O*(mn) (note: O* hides logs)
» min-cut in O*(mn²) --- "harder" than flow
Pipelining [HO]:
» save push/relabel data between flows
» min-cut in O*(mn) --- "as easy" as flow
Recent: O*(m^{3/2}) [GR]

Contraction
Find an edge that doesn't cross the min-cut
Contract (merge) its endpoints into 1 vertex

Contraction Algorithm
Repeat n - 2 times:
» find a non-min-cut edge
» contract it (keep parallel edges)
Each contraction decrements the number of vertices
At the end, 2 vertices are left
» the unique cut between them corresponds to the min-cut of the starting graph

Picking an Edge
Must contract non-min-cut edges
[NI]: an O(m)-time algorithm to pick an edge
» n contractions: O(mn) time for min-cut
» slightly faster than flows
If only we could find an edge faster....
Idea: min-cut edges are few
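To make the contraction step concrete, here is a minimal Python sketch of the contraction primitive on an edge-list multigraph. The representation and the function name are illustrative assumptions, not the array-based O(m)-per-trial implementation the slides mention:

```python
def contract(edges, u, v):
    """Contract edge (u, v): merge v into u, keep the parallel edges
    this creates, and discard self-loops. `edges` is a list of
    endpoint pairs, i.e., an unweighted multigraph."""
    merged = []
    for a, b in edges:
        a = u if a == v else a  # redirect endpoints of v to u
        b = u if b == v else b
        if a != b:              # a self-loop can never cross a cut
            merged.append((a, b))
    return merged
```

Repeated contraction preserves every cut that separates the surviving super-vertices, which is why contracting only non-min-cut edges leaves the min-cut intact.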
Randomize
Repeat until 2 vertices remain:
» pick a random edge
» contract it
(keep fingers crossed)

Analysis I
The min-cut is small---few edges:
» suppose the graph has min-cut c
» then the minimum degree is at least c
» thus the graph has at least nc/2 edges
So a random edge is probably safe:
» Pr[min-cut edge] ≤ c / (nc/2) = 2/n
(easy generalization to the capacitated case)

Analysis II
The algorithm succeeds if it never accidentally contracts a min-cut edge
It contracts the number of vertices from n down to 2
When k vertices remain, the chance of error is 2/k
» thus, the chance of being right is 1 - 2/k
Pr[always right] is the product of the probabilities of being right each time

Analysis III
Pr[success] ≥ ∏ Pr[contraction at k vertices is safe]   (k = n down to 3)
  = (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/3)
  = ((n-2)/n)((n-3)/(n-1)) ⋯ (1/3)
  = 2 / n(n-1)
  ≈ 2/n²
...not too good!

Repetition
Repetition amplifies the success probability:
» the basic failure probability is 1 - 2/n²
» so repeat 7n² times
Pr[complete failure] ≤ Pr[fail 7n² times]
  = (Pr[fail once])^{7n²}
  ≤ (1 - 2/n²)^{7n²}
  ≤ 10⁻⁶

How Fast?
Easy to perform 1 trial in O(m) time
» just use an array of edges, no data structures
But we need n² trials: O(mn²) time
Simpler than flows, but slower

An Improvement [KS]
When k vertices remain, the error probability is 2/k
» big when k is small
Idea: once k is small, change the algorithm
» the algorithm needs to be safer
» but can afford to be slower
Amplify by repetition!
» repeat the base algorithm many times

Recursive Algorithm
Algorithm RCA(G, n)   {G has n vertices}
  repeat twice:
    randomly contract G to n/2^{1/2} vertices   (50-50 chance of avoiding the min-cut)
    RCA(G, n/2^{1/2})

Main Theorem
On any capacitated, undirected graph, Algorithm RCA
» runs in O*(n²) time with simple structures
» finds the min-cut with probability ≥ 1/log n
Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10⁻⁶) in O(n² log² n) time.

Proof Outline
The graph has O(n²) (capacitated) edges
So O(n²) work to contract, then two subproblems of size n/2^{1/2}
» T(n) = 2 T(n/2^{1/2}) + O(n²) = O(n² log n)
The algorithm fails only if both iterations fail
» an iteration succeeds if its contractions and its recursion succeed
» P(n) = 1 - [1 - ½ P(n/2^{1/2})]² = Ω(1/log n)
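The following sketch assembles the pieces above: random contraction, amplification by repetition, and the [KS]-style recursion. It is meant to be readable rather than efficient; the list-based contraction, the base-case size, and the trial counts are illustrative assumptions, not the O*(n²)-time implementation of the theorem:

```python
import math
import random

rng = random.Random(0)

def contract_to(edges, k):
    """Randomly contract a multigraph (a list of endpoint pairs)
    until only k vertices remain."""
    vertices = {x for e in edges for x in e}
    while len(vertices) > k and edges:
        u, v = rng.choice(edges)                      # pick a random edge
        edges = [(u if a == v else a, u if b == v else b)
                 for a, b in edges]                   # merge v into u
        edges = [(a, b) for a, b in edges if a != b]  # drop self-loops
        vertices.discard(v)
    return edges

def basic_min_cut(edges):
    """Basic contraction algorithm: O(n^2 log n) independent trials
    make the failure probability polynomially small."""
    n = len({x for e in edges for x in e})
    trials = int(n * n * math.log(n + 2)) + 1
    return min(len(contract_to(list(edges), 2)) for _ in range(trials))

def rca(edges):
    """Recursive contraction [KS]: contract to ~n/sqrt(2) vertices,
    recurse twice, and keep the better answer."""
    n = len({x for e in edges for x in e})
    if n <= 6:                     # small base case: finish by repeated trials
        return min(len(contract_to(list(edges), 2)) for _ in range(30))
    target = max(2, math.ceil(n / math.sqrt(2)))
    return min(rca(contract_to(list(edges), target)) for _ in range(2))
```

One call to `rca` succeeds with probability Ω(1/log n), so O(log n) independent calls give the high-probability guarantee of the main theorem.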
Failure Modes
Monte Carlo algorithms always run fast and probably give you the right answer
Las Vegas algorithms probably run fast and always give you the right answer
To make a Monte Carlo algorithm Las Vegas, we need a way to check the answer
» repeat till the answer is right
No fast min-cut check is known (flow is slow!)
How do we verify a minimum cut?

Enumerating Cuts
The probabilistic method, backwards

Cut Counting
The original contraction algorithm finds any given min-cut with probability at least 2/n(n-1)
Only one cut is found per run
These are disjoint events, so the probabilities add
So there are at most n(n-1)/2 min-cuts
» otherwise the probabilities would sum to more than one
Tight:
» a cycle has exactly this many min-cuts

Enumeration
RCA as stated has constant probability of finding any given min-cut
If run O(log n) times, the probability of missing a given min-cut drops to 1/n³
But there are only n² min-cuts
So the probability of missing any of them is at most 1/n
So, with probability 1 - 1/n, we find all of them
» in O(n² log³ n) time

Generalization
If G has min-cut c, a cut of value αc is an α-mincut
Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^{-2α})
» proof: just add a factor to the basic analysis
Corollary: there are O(n^{2α}) α-mincuts
Corollary: we can find all of them in O*(n^{2α}) time
» just change the contraction factor in RCA

Summary
A simple, fast min-cut algorithm
» random selection avoids rare problems
Generalization to near-minimum cuts
A bound on the number of small cuts
» the probabilistic method, backwards

Network Reliability
Monte Carlo estimation

The Problem
Input:
» a graph G with n vertices
» edge failure probabilities (for simplicity, fix a single p)
Output:
» FAIL(p): the probability that G is disconnected by edge failures

Approximation Algorithms
Computing FAIL(p) is #P-complete [V]
» an exact algorithm seems unlikely
Approximation scheme:
» given G, p, ε, outputs an ε-approximation
» may be randomized: succeeds with high probability
» fully polynomial (FPRAS) if the runtime is polynomial in n, 1/ε

Monte Carlo Simulation
Flip a coin for each edge, test the graph
k failures in t trials: FAIL(p) ≈ k/t
E[k/t] = FAIL(p)
How many trials are needed for confidence?
» "bad luck" on trials can yield a bad estimate
» clearly need at least 1/FAIL(p)
Chernoff bound: O*(1/ε²FAIL(p)) trials suffice for probable accuracy within ε
» time O*(m/ε²FAIL(p))

Chernoff Bound
Random variables Xᵢ ∈ [0,1], sum X = Σ Xᵢ
Bound on the deviation from the expectation:
» Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
If E[X] ≥ 4(log n)/ε², we get "tight concentration":
» deviation by ε has probability < 1/n
(no one variable is a big part of E[X])

Application
Let Xᵢ = 1 if trial i is a failure, else 0
Let X = X₁ + … + X_t
Then E[X] = t · FAIL(p)
Chernoff says X is within relative error ε of E[X] with probability 1 - exp(-ε² t FAIL(p)/4)
So choose t to cancel the other terms
» "high probability": t = O(log n / ε²FAIL(p))
» deviation by ε has probability < 1/n

Review
Contraction algorithm
» O(n^{2α}) α-mincuts
» enumerate them in O*(n^{2α}) time
Network reliability problem
» random edge failures
» estimate FAIL(p) = Pr[graph disconnects]
Naïve Monte Carlo simulation
» Chernoff bound---"tight concentration": Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
» O(log n / ε²FAIL(p)) trials expect O(log n / ε²) network failures---good for Chernoff
» so we estimate within ε in O*(m/ε²FAIL(p)) time
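Here is a minimal sketch of the naïve estimator, written to stop once enough failures have been observed; the explicit stopping rule and the union-find connectivity test are my reading of the slides, not a quoted implementation:

```python
import math
import random

def survives(n, edges, p, rng):
    """One trial: fail each edge independently with probability p,
    then test whether the surviving graph is connected (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    components = n
    for u, v in edges:
        if rng.random() >= p:        # this edge survives
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components == 1

def estimate_fail(n, edges, p, eps=0.1, seed=0, max_trials=10**6):
    """Naive Monte Carlo estimate of FAIL(p): run trials until
    ~4 log(n)/eps^2 failures are seen (the regime where the Chernoff
    bound gives 'tight concentration'), then return the frequency."""
    rng = random.Random(seed)
    target = 4 * math.log(max(n, 2)) / eps ** 2
    failures = trials = 0
    while failures < target and trials < max_trials:
        trials += 1
        if not survives(n, edges, p, rng):
            failures += 1
    return failures / trials
```

For example, `estimate_fail(4, [(0,1),(1,2),(2,3),(3,0)], 0.3)` estimates the disconnection probability of a 4-cycle. When FAIL(p) is tiny, the trial cap is hit before enough failures appear, which is exactly the rare-event problem the next slides address.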
Rare Events
When FAIL(p) is too small, it takes too long to collect sufficient statistics
Solution: skew the trials to make the interesting event more likely
But in a way that lets you recover the original probability

DNF Counting
Given a DNF formula (an OR of ANDs):
» (e₁ ∧ e₂ ∧ e₃) ∨ (e₁ ∧ e₄) ∨ (e₂ ∧ e₆)
Each variable is set true with probability p
Estimate Pr[formula true]
» #P-complete
FPRAS [KL, KLM]:
» skew to make true outcomes "common"
» time linear in the formula size

Rewrite Problem
Assume p = 1/2
» count satisfying assignments
"Satisfaction matrix":
» S_ij = 1 if the i-th assignment satisfies the j-th clause
We want the number of nonzero rows
Randomly sampling rows won't work
» there might be too few nonzeros

New Sample Space
So normalize every nonzero row to sum to one (divide by its number of nonzeros)
» now the sum over all nonzeros is the desired value
» so it suffices to estimate the average nonzero entry

Sampling Nonzeros
We know the number of nonzeros per column:
» to satisfy a given clause, all variables in the clause must be true
» all other variables are unconstrained
Estimate the average by random sampling:
» since we know the number of nonzeros per column,
» we can pick a random column proportional to it,
» then pick a random true-for-that-column assignment

Few Samples Needed
Suppose there are k clauses
Then E[sample] > 1/k
» an assignment satisfies between 1 and k clauses
» so each sample value lies between 1/k and 1
Taking O(k log n / ε²) samples gives a "large" mean
So Chernoff says the sample mean is probably a good estimate

Reliability Connection
Reliability as DNF counting:
» a variable per edge, true if the edge fails
» a cut fails if all its edges do (an AND of edge variables)
» the graph fails if some cut does (an OR of cuts)
» FAIL(p) = Pr[formula true]
Problem: the DNF has 2ⁿ clauses

Focus on Small Cuts
Fact: FAIL(p) > p^c
Theorem: if p^c = 1/n^{2+δ}, then Pr[a cut of value > αc fails] < n^{-αδ}
Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ
Recall: O(n^{2α}) α-mincuts
Enumerate them with RCA, then run DNF counting

Proof of Theorem
Given p^c = 1/n^{2+δ}:
» at most n^{2α} cuts have value αc
» each fails with probability p^{αc} = 1/n^{α(2+δ)}
» so Pr[any cut of value αc fails] = O(n^{-αδ})
Sum over all α > 1

Algorithm
RCA can enumerate all α-minimum cuts with high probability in O*(n^{2α}) time
Given the α-minimum cuts, we can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^{2α}))
Corollary: when FAIL(p) < n^{-(2+δ)}, we can ε-approximate it in O*(n^{2+4/δ}) time

Combine
For large FAIL(p), use naïve Monte Carlo
For small FAIL(p), use RCA/DNF counting
Balance: an ε-approximation in O(mn^{3.5}/ε²) time
Implementations show this is practical for hundreds of nodes
Again, there is no way to verify correctness

Summary
Naïve Monte Carlo simulation works well for common events
It needs adapting for rare events
Cut structure and DNF counting let us do this for network reliability
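A sketch of the self-normalizing estimator just described, for the p = 1/2 rewrite and monotone clauses (each clause is the set of variables that must all be true, like the edges of a cut that must all fail). The function name and the fixed sample count are illustrative assumptions:

```python
import random

def dnf_count(n, clauses, samples=100_000, seed=0):
    """Estimate the number of assignments of n boolean variables
    satisfying a monotone DNF, by sampling uniformly among the
    nonzero entries of the satisfaction matrix."""
    rng = random.Random(seed)
    # nonzeros per column: clause j is satisfied by 2^(n - |clause|) assignments
    weights = [2 ** (n - len(c)) for c in clauses]
    total = sum(weights)
    acc = 0.0
    for _ in range(samples):
        # pick a random column (clause), proportional to its nonzeros
        j = rng.choices(range(len(clauses)), weights=weights)[0]
        # pick a random assignment that satisfies clause j
        assign = {v: True for v in clauses[j]}
        for v in range(n):
            if v not in assign:
                assign[v] = rng.random() < 0.5
        # normalized row entry: 1 / (number of clauses this assignment satisfies)
        sat = sum(1 for c in clauses if all(assign[v] for v in c))
        acc += 1.0 / sat
    return total * acc / samples
```

Dividing the result by 2ⁿ gives Pr[formula true]. Each sample value lies in [1/k, 1] for k clauses, which is what makes O(k log n / ε²) samples enough.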
Random Sampling
More min-cut algorithms

Random Sampling
A general tool for faster algorithms:
» pick a small, representative sample
» analyze it quickly (small)
» extrapolate to the original (representative)
Speed-accuracy tradeoff:
» a smaller sample means less time
» but also less accuracy

Min-cut Duality
[Edmonds]: min-cut = max tree packing
» convert to a directed graph
» "source" vertex s (doesn't matter which)
» spanning trees directed away from s
[Gabow] "augmenting trees":
» add a tree in O*(m) time
» min-cut c (via max packing) in O*(mc)
» great if m and c are small...

Example
[Figure: a graph with min-cut 2, packed by 2 directed spanning trees; the directed min-cut is 2.]

Sampling for Approximation

Random Sampling
The [Gabow] scheme is great if m, c are small
Random sampling
» reduces m, c
» scales cut values (in expectation)
» if we pick half the edges, we get about half of each cut
So find tree packings and cuts in samples
Problem: maybe some cuts show large deviations

Sampling Theorem
Given graph G, build a sample G(p) by including each edge with probability p
A cut of value v in G has expected value pv in G(p)
Definition: the "constant" ρ = 8(ln n)/ε²
Theorem: with high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values.

A Simple Application
[Gabow] packs trees in O*(mc) time
Build G(ρ/c):
» minimum expected cut ρ
» by the theorem, its min-cut is probably near ρ
» find its min-cut in O*(ρm) time using [Gabow]
» it corresponds to a near-minimum cut of G
Result: (1+ε) times min-cut in O*(ρm) time

Proof of Sampling: Idea
The Chernoff bound says the probability of a large deviation in one cut's value is small
Problem: exponentially many cuts; perhaps some deviate a great deal
Solution: we showed there are few small cuts
» only small cuts are likely to deviate much
» but they are few, so the Chernoff bound covers them all

Proof of Sampling
Sampling with probability ρ/c,
» a cut of value αc has mean αρ
» [Chernoff]: it deviates from its expected size by more than ε with probability at most n^{-3α}
At most n^{2α} cuts have value αc
So Pr[any cut of value αc deviates] = O(n^{-α})
Sum over all α ≥ 1

Las Vegas Algorithms
Finding good certificates

Approximate Tree Packing
Break the edges into c/ρ random groups
Each looks like a sample at rate ρ/c:
» O*(ρm/c) edges
» min expected cut ρ
» so the theorem says its min-cut is ≥ (1 - ε)ρ
So each group has a tree packing of size (1 - ε)ρ
[Gabow] finds it in O*(ρ²m/c) time per group
» overall time O*(ρ²m)

Las Vegas Algorithm
The packing algorithm is Monte Carlo
We previously found an approximate cut (faster)
If the two are close, each "certifies" the other:
» the cut is above the optimum
» the packing is below the optimum
If not, re-run both
Result: Las Vegas, expected time O*(ρ²m)

Exact Algorithm
Randomly partition the edges into two groups
» each is like a 1/2-sample: ε = O*(c^{-1/2})
Recursively pack trees in each half:
» c/2 - O*(c^{1/2}) trees each
Merge the packings
» gives a packing of size c - O*(c^{1/2})
» augment to a maximum packing: O*(mc^{1/2})
T(m,c) = 2 T(m/2, c/2) + O*(mc^{1/2}) = O*(mc^{1/2})

Nearly Linear Time

Analyze Trees
Recall: [G] packs c (directed-)edge-disjoint spanning trees
Corollary: in such a packing, some tree crosses the min-cut only twice
To find the min-cut:
» find a tree packing
» find the smallest cut with 2 tree edges crossing it
Problem: the packing takes O*(mc) time

Constraint Trees
Min-cut c:
» c directed trees
» 2c directed min-cut edges
» on average, two min-cut edges per tree
Definition: a tree 2-crosses a cut

Finding the Cut
From the crossing tree edges, deduce the cut:
» remove the tree edges
» no other edges cross
» so each component is on one side, opposite its "neighbor's" side

Sampling
Solution: use G(ρ/c) with ε = 1/8
» pack O*(ρ) trees in O*(m) time
» the original min-cut has at most (1+ε)ρ edges in G(ρ/c)
» some tree 2-crosses it in G(ρ/c)
» ...and thus 2-crosses it in G
Analyze the O*(ρ) trees in G
» time O*(m) per tree
» Monte Carlo

Simplify
First discuss the case where one tree edge crosses the min-cut

Analyzing a Tree
Root the tree, so a cut corresponds to a subtree
Use a dynamic program up from the leaves to determine subtree cuts efficiently
Given the cuts at the children of a node, compute the cut at the parent
Definitions:
» the subtree of v is v and the nodes below it
» C(v) is the value of the cut at v's subtree

The Dynamic Program
For a node u with children v and w, accounting for edges whose least common ancestor is u:
» C(u) = C(v) + C(w) - 2 C(v,w)
where C(v,w) counts the edges joining the subtrees of v and w

Algorithm: 1-Crossing Trees
Compute the edges' LCAs: O(m)
Compute "cuts" at the leaves
» cut values = degrees
» each edge is incident on at most two leaves
» total time O(m)
Dynamic program upwards: O(n)
Total: O(m + n)

2-Crossing Trees
The cut corresponds to two subtrees, one kept and one discarded:
» C(v ∪ w) = C(v) + C(w) - 2 C(v,w)
n² table entries
» fill them in O(n²) time with a dynamic program
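A sketch of the leaves-up dynamic program for the 1-crossing case, computing C(v) for every subtree of a rooted spanning tree. Two simplifications are mine: the LCA is found by naive path-walking rather than an O(m) method, and the recurrence carries an explicit degree term, C(u) = deg(u) + Σ_children C(w) - 2·(edges with LCA u), which is the same bookkeeping as the slide's two-child formula:

```python
def subtree_cuts(n, edges, parent, root=0):
    """For each vertex v of a rooted spanning tree, return C(v), the
    value of the cut separating v's subtree from the rest of G.
    `edges` is the full edge list of G (tree edges included);
    `parent[v]` is v's tree parent, with parent[root] == root."""
    def depth(v):
        d = 0
        while parent[v] != v:
            v, d = parent[v], d + 1
        return d

    def lca(u, v):
        du, dv = depth(u), depth(v)
        while du > dv:
            u, du = parent[u], du - 1
        while dv > du:
            v, dv = parent[v], dv - 1
        while u != v:
            u, v = parent[u], parent[v]
        return u

    deg = [0] * n
    lca_count = [0] * n            # edges whose least common ancestor is v
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        lca_count[lca(u, v)] += 1

    children = [[] for _ in range(n)]
    for v in range(n):
        if parent[v] != v:
            children[parent[v]].append(v)

    cut = [0] * n
    for v in sorted(range(n), key=depth, reverse=True):  # deepest first
        cut[v] = deg[v] + sum(cut[w] for w in children[v]) - 2 * lca_count[v]
    return cut
```

An edge with one endpoint inside v's subtree is counted once; an edge with both endpoints inside is counted twice by the degree/children sums and removed twice by its LCA term, so cut[v] counts exactly the crossing edges (and cut[root] is always 0).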
Linear Time
The bottleneck is the C(v,w) computations. Avoid them:
» find the right "twin" w for each v
» C(v ∪ w) = C(v) + min over w of [ C(w) - 2 C(v,w) ]
Compute using the addpath and minpath operations of dynamic trees [ST]
Result: O(m log³ n) time (messy)

How do we verify a minimum cut?

Network Design
Randomized rounding

Problem Statement
Given vertices, and a cost c_vw to buy an edge from v to w, find the minimum-cost purchase that creates a graph with the desired connectivity properties
Example: a minimum-cost k-connected graph
Generally NP-hard
Recent approximation algorithms [GW], [JV]

Integer Linear Program
Variable x_vw = 1 if we buy edge vw
Solution cost: Σ x_vw c_vw
Constraint: for every cut, Σ x_vw ≥ k over the crossing edges
Relaxing integrality gives a tractable LP
» exponentially many cuts
» but separation oracles exist (e.g., min-cut)
What is the integrality gap?

Randomized Rounding
Given the LP solution values x_vw,
build a graph where edge vw is present with probability x_vw
The expected cost is at most opt: Σ x_vw c_vw
The expected number of edges crossing any cut satisfies the constraint
If the expected number is large for every cut, the sampling theorem applies

k-Connected Subgraph
The fractional solution is k-connected
So every cut has (in expectation) ≥ k edges crossing it in the rounded solution
The sampling theorem says every cut has at least k - (k log n)^{1/2} edges
A close approximation for large k
Can often repair: e.g., get a k-connected subgraph at cost 1 + ((log n)/k)^{1/2} times the minimum

Nonuniform Sampling
Concentrate on the important things
[Benczúr-Karger, Karger, Karger-Levine]

s-t Min-Cuts
Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε
Applications:
» min-cut in O*(mc) time [G]
» approximate/exact in O*((m/c)·c) = O*(m)
» s-t min-cut of value v in O*(mv)
» approximate in O*(mv/c) time
Trouble if c is small and v is large

The Problem
Cut sampling relied on the Chernoff bound
Chernoff bounds require that no one edge is a large fraction of the expectation of a cut it crosses
If the sample rate is ≪ 1/c, each edge across a min-cut is too significant
But: if an edge only crosses large cuts, then a sample rate ≪ 1/c is OK!

Biased Sampling
The original sampling theorem is weak when
» m is large
» c is small
But if m is large,
» then G has dense regions
» where c must be large
» where we can sample more sparsely

Results (weighted, undirected graphs; compression lets m → n/ε²):

Problem               | Old time    | New time
Approx. s-t min-cut   | O*(mn)      | O*(n²/ε²)
Approx. s-t max-flow  | O*(m^{3/2}) | O*(mn^{1/2}/ε)
Flow of value v       | O*(mv)      | O*(n^{11/9}v)
Approx. bisection     | O*(m²)      | O*(n²/ε²)

Strong Components
Definition: a k-strong component is a maximal vertex-induced subgraph with min-cut k
[Figure: a graph decomposed into 2-strong and 3-strong components.]

Nonuniform Sampling
Definition: an edge is k-strong if its endpoints are in the same k-strong component
» stricter than having k-connected endpoints
Definition: the strong connectivity c_e of edge e is the largest k for which e is k-strong
Plan: sample dense regions lightly

Nonuniform Sampling
Idea: if an edge is k-strong, then it is in a k-connected graph
» so it is "safe" to sample it with probability 1/k
Problem: if we sample edges with different probabilities, E[cut value] gets messy
Solution: if we sample e with probability p_e, give it weight 1/p_e
» then E[cut value] = original cut value

Compression Theorem
Definition: given compression probabilities p_e, the compressed graph G[p_e]
» includes edge e with probability p_e, and
» gives it weight 1/p_e if included
Note E[G[p_e]] = G
Theorem: G[ρ/c_e]
» approximates all cuts to within ε
» has O(ρn) edges
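A minimal sketch of compression, assuming the strong connectivities c_e are already known (estimating them is the subject of the construction below); the dictionary-based representation is an illustrative assumption:

```python
import random

def compress(capacities, strength, rho, seed=0):
    """Build G[p_e]: keep edge e with probability p_e = min(1, rho/c_e)
    and scale its capacity by 1/p_e if kept, so every cut keeps its
    expected value. `capacities` and `strength` map edges (u, v) to
    their capacity and strong connectivity c_e."""
    rng = random.Random(seed)
    compressed = {}
    for e, cap in capacities.items():
        p = min(1.0, rho / strength[e])
        if rng.random() < p:
            compressed[e] = cap / p   # reweight to keep E[cut value] fixed
    return compressed
```

By the strength lemma below, Σ 1/c_e ≤ n, so the expected number of surviving edges is at most ρn no matter how dense G was.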
Proof (Approximation)
Basic idea: within a k-strong component, edges are sampled with probability ρ/k
» so the original sampling theorem works there
Problem: some edges may lie in stronger components and be sampled less often
Induct up from the strongest components:
» apply the original sampling theorem inside
» then "freeze" it so it doesn't affect the weaker parts

Strength Lemma
Lemma: Σ 1/c_e ≤ n
» consider a connected component C of G
» suppose C has min-cut k
» then every edge e in C has c_e ≥ k
» so the k edges crossing C's min-cut have Σ 1/c_e ≤ k · (1/k) = 1
» delete these edges ("cost" 1)
» repeat at most n - 1 times: no more edges!

Proof (Edge Count)
Edge e is included with probability ρ/c_e
So the expected number of edges is Σ ρ/c_e
We saw Σ 1/c_e ≤ n
So the expected number of edges is at most ρn

Construction
To sample, we must find the edge strengths
» we can't, but an approximation suffices
Sparse certificates identify weak edges:
» constructed in linear time [NI]
» contain all edges crossing cuts of size ≤ k
» iterate until the strong components emerge
Iterate for the 2^i-strong edges, for all i
» tricks turn it strongly polynomial

Certificate Algorithm
Repeat k times:
» find a spanning forest
» delete it
Each iteration deletes one edge from every cut (the forest is spanning)
So at the end, every edge crossing a cut of size ≤ k has been deleted
[NI]: merge all iterations in O(m) time
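A sketch of the certificate step in its naive O(km) form (the [NI] version merges all k rounds into O(m) time):

```python
def sparse_certificate(n, edges, k):
    """Repeat k times: find a spanning forest and delete it. Returns
    (certificate, rest): the union of the k forests, which contains
    every edge crossing a cut of size <= k, and the surviving edges,
    which cross only larger cuts."""
    remaining = list(edges)
    certificate = []
    for _ in range(k):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        forest, rest = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:              # joins two components: forest edge
                parent[ru] = rv
                forest.append((u, v))
            else:
                rest.append((u, v))
        certificate.extend(forest)
        remaining = rest
    return certificate, remaining
```

Running this for k = 2^i, for each i, sorts the edges into approximate strength classes, which is all the compression scheme above needs.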
Flows
Uniform sampling led to flow algorithms:
» randomly partition the edges
» merge the flows from the partition elements
Compression is problematic:
» the edge capacities changed
» so flow path capacities are distorted
» a flow in the compressed graph doesn't fit in the original graph

Smoothing
If an edge has strength c_e, divide it into βρ/c_e edges of capacity c_e/βρ
» creates βρ Σ 1/c_e ≤ βρn edges
Now each edge is only a 1/βρ fraction of any cut of its strong component
So sampling a 1/β fraction works
So dividing into β groups works
Yields a (1 - ε) max-flow in O*((mnv)^{1/2}/ε) time

Cleanup
An approximate max-flow can be made exact by augmenting paths
Integrality problems:
» augmenting paths are fast for small integer flows
» but the breakup done by smoothing ruins integrality
Surmountable:
» the flows in the dense and sparse parts are separable
Result: max-flow in O*(n^{11/9}v) time

Proof by Picture
[Figure sequence, each panel an s-t graph: dense regions; compress the dense regions; solve the sparse flow; replace the dense parts (keeping the flow in the sparse bits); "fill in" the dense parts.]

Conclusions

Conclusion
Randomization is a crucial tool for algorithm design
It often yields algorithms that are faster or simpler than their traditional counterparts
In particular, it gives significant improvements for core problems in graph algorithms

Randomized Methods
Random selection
» if most candidate choices are "good", then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve it, extrapolate to the whole problem
Randomized rounding for approximation

Random Selection
When most choices are good, make one at random
Recursive contraction algorithm for minimum cuts
» extremely simple (also to implement)
» fast in theory and in practice [CGKLS]

Monte Carlo
To estimate an event's likelihood, run trials
Slow for very rare events
Bias the samples to reveal the rare event
FPRAS for network reliability

Random Sampling
Generate a representative subproblem
Use it to estimate a solution to the whole
» gives an approximate solution
» may be quickly repaired to an exact solution
Bias the sample toward the "important" or "sensitive" parts of the problem
New max-flow and min-cut algorithms

Randomized Rounding
Convert fractional to integral solutions
Get approximation algorithms for integer programs
» "sampling" from a well-designed sample space of feasible solutions
Good approximations for network design

Generalization
Our techniques work because undirected graphs are matroids
All our results extend to, or are special cases of, matroid results:
» packing bases
» finding minimum "quotients"
» matroid optimization (MST)

Directed Graphs?
Directed graphs are not matroids
Directed graphs can have lots of minimum cuts
Sampling doesn't appear to work
Residual graphs for flows are directed
» precludes obvious recursive solutions to flow problems

Open Problems
Flow in O(nv) time (complete the m → n reduction)
» eliminate the v dependence
» apply to weighted graphs with large flows
» flow in O(m) time?
Las Vegas algorithms
» finding good certificates
Deterministic algorithms
» deterministic construction of "samples"
» deterministically compress a graph

Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger
[email protected]