Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger

Randomized Algorithms
Flip coins to decide what to do next
Avoid the hard work of making the "right" choice
Often faster and simpler than deterministic algorithms
Different from average-case analysis:
» the input is worst case
» the algorithm adds the randomness

Methods
Random selection
» if most candidate choices are "good", then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve it, extrapolate to the whole problem
Randomized rounding
» for approximation

Cuts in Graphs
Focus on undirected graphs
A cut is a vertex partition
Its value is the number (or total weight) of crossing edges

Optimization with Cuts
Cut values determine the solution of many graph optimization problems:
» min-cut / max-flow
» multicommodity flow (sort of)
» bisection / separator
» network reliability
» network design
Randomization helps solve these problems

Presentation Assumption
For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
All results apply unchanged to arbitrarily weighted graphs:
» integer weights = parallel edges
» rational weights scale to integers
» the analysis is unaffected
» only some implementation details differ

Basic Probability
Conditional probability:
» Pr[A ∩ B] = Pr[A] · Pr[B | A]
Independent events multiply:
» Pr[A ∩ B] = Pr[A] · Pr[B]
Linearity of expectation:
» E[X + Y] = E[X] + E[Y]
Union bound:
» Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]

Random Selection for Minimum Cuts
Random choices are good when problems are rare

Minimum Cut
The smallest cut of the graph: the cheapest way to separate it into 2 parts
Not the s-t min-cut
Various applications:
» network reliability (small cuts are weakest)
» subtour elimination constraints for TSP
» separation oracle for network design

Max-flow / Min-cut
s-t flow: an edge-disjoint packing of s-t paths
s-t cut: a cut separating s and t
[FF]: s-t max-flow = s-t min-cut
» max-flow saturates all s-t min-cuts
» the most efficient way to find s-t min-cuts
[GH]: the min-cut is the "all-pairs" s-t min-cut
» find it using n flow computations

Flow Algorithms
Push-relabel [GT]:
» push "excess" around the graph till it's gone
» max-flow in O*(mn) (note: O* hides logs)
» min-cut in O*(mn²) --- "harder" than flow
Pipelining [HO]:
» save push/relabel data between flows
» min-cut in O*(mn) --- "as easy" as flow
Recent: O*(m^{3/2}) [GR]

Contraction
Find an edge that doesn't cross the min-cut
Contract (merge) its endpoints into 1 vertex

Contraction Algorithm
Repeat n - 2 times:
» find a non-min-cut edge
» contract it (keep parallel edges)
Each contraction decrements the number of vertices
At the end, 2 vertices are left
» the unique cut between them corresponds to the min-cut of the starting graph

Picking an Edge
Must contract non-min-cut edges
[NI]: an O(m)-time algorithm to pick an edge
» n contractions: O(mn) time for min-cut
» slightly faster than flows
If only we could find an edge faster....
Idea: min-cut edges are few
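To make the contraction step concrete, here is a minimal Python sketch of the contraction primitive on an edge-list multigraph. The representation and the function name are illustrative assumptions, not the array-based O(m)-per-trial implementation the slides mention:

```python
def contract(edges, u, v):
    """Contract edge (u, v): merge v into u, keep the parallel edges
    this creates, and discard self-loops. `edges` is a list of
    endpoint pairs, i.e., an unweighted multigraph."""
    merged = []
    for a, b in edges:
        a = u if a == v else a  # redirect endpoints of v to u
        b = u if b == v else b
        if a != b:              # a self-loop can never cross a cut
            merged.append((a, b))
    return merged
```

Repeated contraction preserves every cut that separates the surviving super-vertices, which is why contracting only non-min-cut edges leaves the min-cut intact.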
Randomize
Repeat until 2 vertices remain:
» pick a random edge
» contract it
(keep fingers crossed)

Analysis I
The min-cut is small---few edges:
» suppose the graph has min-cut c
» then the minimum degree is at least c
» thus the graph has at least nc/2 edges
So a random edge is probably safe:
» Pr[min-cut edge] ≤ c / (nc/2) = 2/n
(easy generalization to the capacitated case)

Analysis II
The algorithm succeeds if it never accidentally contracts a min-cut edge
It contracts the number of vertices from n down to 2
When k vertices remain, the chance of error is 2/k
» thus, the chance of being right is 1 - 2/k
Pr[always right] is the product of the probabilities of being right each time

Analysis III
Pr[success] ≥ ∏ Pr[contraction at k vertices is safe]   (k = n down to 3)
  = (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/3)
  = ((n-2)/n)((n-3)/(n-1)) ⋯ (1/3)
  = 2 / n(n-1)
  ≈ 2/n²
...not too good!

Repetition
Repetition amplifies the success probability:
» the basic failure probability is 1 - 2/n²
» so repeat 7n² times
Pr[complete failure] ≤ Pr[fail 7n² times]
  = (Pr[fail once])^{7n²}
  ≤ (1 - 2/n²)^{7n²}
  ≤ 10⁻⁶

How Fast?
Easy to perform 1 trial in O(m) time
» just use an array of edges, no data structures
But we need n² trials: O(mn²) time
Simpler than flows, but slower

An Improvement [KS]
When k vertices remain, the error probability is 2/k
» big when k is small
Idea: once k is small, change the algorithm
» the algorithm needs to be safer
» but can afford to be slower
Amplify by repetition!
» repeat the base algorithm many times

Recursive Algorithm
Algorithm RCA(G, n)   {G has n vertices}
  repeat twice:
    randomly contract G to n/2^{1/2} vertices   (50-50 chance of avoiding the min-cut)
    RCA(G, n/2^{1/2})

Main Theorem
On any capacitated, undirected graph, Algorithm RCA
» runs in O*(n²) time with simple structures
» finds the min-cut with probability ≥ 1/log n
Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10⁻⁶) in O(n² log² n) time.

Proof Outline
The graph has O(n²) (capacitated) edges
So O(n²) work to contract, then two subproblems of size n/2^{1/2}
» T(n) = 2 T(n/2^{1/2}) + O(n²) = O(n² log n)
The algorithm fails only if both iterations fail
» an iteration succeeds if its contractions and its recursion succeed
» P(n) = 1 - [1 - ½ P(n/2^{1/2})]² = Ω(1/log n)
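The following sketch assembles the pieces above: random contraction, amplification by repetition, and the [KS]-style recursion. It is meant to be readable rather than efficient; the list-based contraction, the base-case size, and the trial counts are illustrative assumptions, not the O*(n²)-time implementation of the theorem:

```python
import math
import random

rng = random.Random(0)

def contract_to(edges, k):
    """Randomly contract a multigraph (a list of endpoint pairs)
    until only k vertices remain."""
    vertices = {x for e in edges for x in e}
    while len(vertices) > k and edges:
        u, v = rng.choice(edges)                      # pick a random edge
        edges = [(u if a == v else a, u if b == v else b)
                 for a, b in edges]                   # merge v into u
        edges = [(a, b) for a, b in edges if a != b]  # drop self-loops
        vertices.discard(v)
    return edges

def basic_min_cut(edges):
    """Basic contraction algorithm: O(n^2 log n) independent trials
    make the failure probability polynomially small."""
    n = len({x for e in edges for x in e})
    trials = int(n * n * math.log(n + 2)) + 1
    return min(len(contract_to(list(edges), 2)) for _ in range(trials))

def rca(edges):
    """Recursive contraction [KS]: contract to ~n/sqrt(2) vertices,
    recurse twice, and keep the better answer."""
    n = len({x for e in edges for x in e})
    if n <= 6:                     # small base case: finish by repeated trials
        return min(len(contract_to(list(edges), 2)) for _ in range(30))
    target = max(2, math.ceil(n / math.sqrt(2)))
    return min(rca(contract_to(list(edges), target)) for _ in range(2))
```

One call to `rca` succeeds with probability Ω(1/log n), so O(log n) independent calls give the high-probability guarantee of the main theorem.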
Failure Modes
Monte Carlo algorithms always run fast and probably give you the right answer
Las Vegas algorithms probably run fast and always give you the right answer
To make a Monte Carlo algorithm Las Vegas, we need a way to check the answer
» repeat till the answer is right
No fast min-cut check is known (flow is slow!)
How do we verify a minimum cut?

Enumerating Cuts
The probabilistic method, backwards

Cut Counting
The original contraction algorithm finds any given min-cut with probability at least 2/n(n-1)
Only one cut is found per run
These are disjoint events, so the probabilities add
So there are at most n(n-1)/2 min-cuts
» otherwise the probabilities would sum to more than one
Tight:
» a cycle has exactly this many min-cuts

Enumeration
RCA as stated has constant probability of finding any given min-cut
If run O(log n) times, the probability of missing a given min-cut drops to 1/n³
But there are only n² min-cuts
So the probability of missing any of them is at most 1/n
So, with probability 1 - 1/n, we find all of them
» in O(n² log³ n) time

Generalization
If G has min-cut c, a cut of value αc is an α-mincut
Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^{-2α})
» proof: just add a factor to the basic analysis
Corollary: there are O(n^{2α}) α-mincuts
Corollary: we can find all of them in O*(n^{2α}) time
» just change the contraction factor in RCA

Summary
A simple, fast min-cut algorithm
» random selection avoids rare problems
Generalization to near-minimum cuts
A bound on the number of small cuts
» the probabilistic method, backwards

Network Reliability
Monte Carlo estimation

The Problem
Input:
» a graph G with n vertices
» edge failure probabilities (for simplicity, fix a single p)
Output:
» FAIL(p): the probability that G is disconnected by edge failures

Approximation Algorithms
Computing FAIL(p) is #P-complete [V]
» an exact algorithm seems unlikely
Approximation scheme:
» given G, p, ε, outputs an ε-approximation
» may be randomized: succeeds with high probability
» fully polynomial (FPRAS) if the runtime is polynomial in n, 1/ε

Monte Carlo Simulation
Flip a coin for each edge, test the graph
k failures in t trials: FAIL(p) ≈ k/t
E[k/t] = FAIL(p)
How many trials are needed for confidence?
» "bad luck" on trials can yield a bad estimate
» clearly need at least 1/FAIL(p)
Chernoff bound: O*(1/ε²FAIL(p)) trials suffice for probable accuracy within ε
» time O*(m/ε²FAIL(p))

Chernoff Bound
Random variables Xᵢ ∈ [0,1], sum X = Σ Xᵢ
Bound on the deviation from the expectation:
» Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
If E[X] ≥ 4(log n)/ε², we get "tight concentration":
» deviation by ε has probability < 1/n
(no one variable is a big part of E[X])

Application
Let Xᵢ = 1 if trial i is a failure, else 0
Let X = X₁ + … + X_t
Then E[X] = t · FAIL(p)
Chernoff says X is within relative error ε of E[X] with probability 1 - exp(-ε² t FAIL(p)/4)
So choose t to cancel the other terms
» "high probability": t = O(log n / ε²FAIL(p))
» deviation by ε has probability < 1/n

Review
Contraction algorithm
» O(n^{2α}) α-mincuts
» enumerate them in O*(n^{2α}) time
Network reliability problem
» random edge failures
» estimate FAIL(p) = Pr[graph disconnects]
Naïve Monte Carlo simulation
» Chernoff bound---"tight concentration": Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
» O(log n / ε²FAIL(p)) trials expect O(log n / ε²) network failures---good for Chernoff
» so we estimate within ε in O*(m/ε²FAIL(p)) time
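Here is a minimal sketch of the naïve estimator, written to stop once enough failures have been observed; the explicit stopping rule and the union-find connectivity test are my reading of the slides, not a quoted implementation:

```python
import math
import random

def survives(n, edges, p, rng):
    """One trial: fail each edge independently with probability p,
    then test whether the surviving graph is connected (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    components = n
    for u, v in edges:
        if rng.random() >= p:        # this edge survives
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components == 1

def estimate_fail(n, edges, p, eps=0.1, seed=0, max_trials=10**6):
    """Naive Monte Carlo estimate of FAIL(p): run trials until
    ~4 log(n)/eps^2 failures are seen (the regime where the Chernoff
    bound gives 'tight concentration'), then return the frequency."""
    rng = random.Random(seed)
    target = 4 * math.log(max(n, 2)) / eps ** 2
    failures = trials = 0
    while failures < target and trials < max_trials:
        trials += 1
        if not survives(n, edges, p, rng):
            failures += 1
    return failures / trials
```

For example, `estimate_fail(4, [(0,1),(1,2),(2,3),(3,0)], 0.3)` estimates the disconnection probability of a 4-cycle. When FAIL(p) is tiny, the trial cap is hit before enough failures appear, which is exactly the rare-event problem the next slides address.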
Rare Events
When FAIL(p) is too small, it takes too long to collect sufficient statistics
Solution: skew the trials to make the interesting event more likely
But in a way that lets you recover the original probability

DNF Counting
Given a DNF formula (an OR of ANDs):
» (e₁ ∧ e₂ ∧ e₃) ∨ (e₁ ∧ e₄) ∨ (e₂ ∧ e₆)
Each variable is set true with probability p
Estimate Pr[formula true]
» #P-complete
FPRAS [KL, KLM]:
» skew to make true outcomes "common"
» time linear in the formula size

Rewrite Problem
Assume p = 1/2
» count satisfying assignments
"Satisfaction matrix":
» S_ij = 1 if the i-th assignment satisfies the j-th clause
We want the number of nonzero rows
Randomly sampling rows won't work
» there might be too few nonzeros

New Sample Space
So normalize every nonzero row to sum to one (divide by its number of nonzeros)
» now the sum over all nonzeros is the desired value
» so it suffices to estimate the average nonzero entry

Sampling Nonzeros
We know the number of nonzeros per column:
» to satisfy a given clause, all variables in the clause must be true
» all other variables are unconstrained
Estimate the average by random sampling:
» since we know the number of nonzeros per column,
» we can pick a random column proportional to it,
» then pick a random true-for-that-column assignment

Few Samples Needed
Suppose there are k clauses
Then E[sample] > 1/k
» an assignment satisfies between 1 and k clauses
» so each sample value lies between 1/k and 1
Taking O(k log n / ε²) samples gives a "large" mean
So Chernoff says the sample mean is probably a good estimate

Reliability Connection
Reliability as DNF counting:
» a variable per edge, true if the edge fails
» a cut fails if all its edges do (an AND of edge variables)
» the graph fails if some cut does (an OR of cuts)
» FAIL(p) = Pr[formula true]
Problem: the DNF has 2ⁿ clauses

Focus on Small Cuts
Fact: FAIL(p) > p^c
Theorem: if p^c = 1/n^{2+δ}, then Pr[a cut of value > αc fails] < n^{-αδ}
Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ
Recall: O(n^{2α}) α-mincuts
Enumerate them with RCA, then run DNF counting

Proof of Theorem
Given p^c = 1/n^{2+δ}:
» at most n^{2α} cuts have value αc
» each fails with probability p^{αc} = 1/n^{α(2+δ)}
» so Pr[any cut of value αc fails] = O(n^{-αδ})
Sum over all α > 1

Algorithm
RCA can enumerate all α-minimum cuts with high probability in O*(n^{2α}) time
Given the α-minimum cuts, we can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^{2α}))
Corollary: when FAIL(p) < n^{-(2+δ)}, we can ε-approximate it in O*(n^{2+4/δ}) time

Combine
For large FAIL(p), use naïve Monte Carlo
For small FAIL(p), use RCA/DNF counting
Balance: an ε-approximation in O(mn^{3.5}/ε²) time
Implementations show this is practical for hundreds of nodes
Again, there is no way to verify correctness

Summary
Naïve Monte Carlo simulation works well for common events
It needs adapting for rare events
Cut structure and DNF counting let us do this for network reliability
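A sketch of the self-normalizing estimator just described, for the p = 1/2 rewrite and monotone clauses (each clause is the set of variables that must all be true, like the edges of a cut that must all fail). The function name and the fixed sample count are illustrative assumptions:

```python
import random

def dnf_count(n, clauses, samples=100_000, seed=0):
    """Estimate the number of assignments of n boolean variables
    satisfying a monotone DNF, by sampling uniformly among the
    nonzero entries of the satisfaction matrix."""
    rng = random.Random(seed)
    # nonzeros per column: clause j is satisfied by 2^(n - |clause|) assignments
    weights = [2 ** (n - len(c)) for c in clauses]
    total = sum(weights)
    acc = 0.0
    for _ in range(samples):
        # pick a random column (clause), proportional to its nonzeros
        j = rng.choices(range(len(clauses)), weights=weights)[0]
        # pick a random assignment that satisfies clause j
        assign = {v: True for v in clauses[j]}
        for v in range(n):
            if v not in assign:
                assign[v] = rng.random() < 0.5
        # normalized row entry: 1 / (number of clauses this assignment satisfies)
        sat = sum(1 for c in clauses if all(assign[v] for v in c))
        acc += 1.0 / sat
    return total * acc / samples
```

Dividing the result by 2ⁿ gives Pr[formula true]. Each sample value lies in [1/k, 1] for k clauses, which is what makes O(k log n / ε²) samples enough.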
Random Sampling
More min-cut algorithms

Random Sampling
A general tool for faster algorithms:
» pick a small, representative sample
» analyze it quickly (small)
» extrapolate to the original (representative)
Speed-accuracy tradeoff:
» a smaller sample means less time
» but also less accuracy

Min-cut Duality
[Edmonds]: min-cut = max tree packing
» convert to a directed graph
» "source" vertex s (doesn't matter which)
» spanning trees directed away from s
[Gabow] "augmenting trees":
» add a tree in O*(m) time
» min-cut c (via max packing) in O*(mc)
» great if m and c are small...

Example
[Figure: a graph with min-cut 2, packed by 2 directed spanning trees; the directed min-cut is 2.]

Sampling for Approximation

Random Sampling
The [Gabow] scheme is great if m, c are small
Random sampling
» reduces m, c
» scales cut values (in expectation)
» if we pick half the edges, we get about half of each cut
So find tree packings and cuts in samples
Problem: maybe some cuts show large deviations

Sampling Theorem
Given graph G, build a sample G(p) by including each edge with probability p
A cut of value v in G has expected value pv in G(p)
Definition: the "constant" ρ = 8(ln n)/ε²
Theorem: with high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values.

A Simple Application
[Gabow] packs trees in O*(mc) time
Build G(ρ/c):
» minimum expected cut ρ
» by the theorem, its min-cut is probably near ρ
» find its min-cut in O*(ρm) time using [Gabow]
» it corresponds to a near-minimum cut of G
Result: (1+ε) times min-cut in O*(ρm) time

Proof of Sampling: Idea
The Chernoff bound says the probability of a large deviation in one cut's value is small
Problem: exponentially many cuts; perhaps some deviate a great deal
Solution: we showed there are few small cuts
» only small cuts are likely to deviate much
» but they are few, so the Chernoff bound covers them all

Proof of Sampling
Sampling with probability ρ/c,
» a cut of value αc has mean αρ
» [Chernoff]: it deviates from its expected size by more than ε with probability at most n^{-3α}
At most n^{2α} cuts have value αc
So Pr[any cut of value αc deviates] = O(n^{-α})
Sum over all α ≥ 1

Las Vegas Algorithms
Finding good certificates

Approximate Tree Packing
Break the edges into c/ρ random groups
Each looks like a sample at rate ρ/c:
» O*(ρm/c) edges
» min expected cut ρ
» so the theorem says its min-cut is ≥ (1 - ε)ρ
So each group has a tree packing of size (1 - ε)ρ
[Gabow] finds it in O*(ρ²m/c) time per group
» overall time O*(ρ²m)

Las Vegas Algorithm
The packing algorithm is Monte Carlo
We previously found an approximate cut (faster)
If the two are close, each "certifies" the other:
» the cut is above the optimum
» the packing is below the optimum
If not, re-run both
Result: Las Vegas, expected time O*(ρ²m)

Exact Algorithm
Randomly partition the edges into two groups
» each is like a 1/2-sample: ε = O*(c^{-1/2})
Recursively pack trees in each half:
» c/2 - O*(c^{1/2}) trees each
Merge the packings
» gives a packing of size c - O*(c^{1/2})
» augment to a maximum packing: O*(mc^{1/2})
T(m,c) = 2 T(m/2, c/2) + O*(mc^{1/2}) = O*(mc^{1/2})

Nearly Linear Time

Analyze Trees
Recall: [G] packs c (directed-)edge-disjoint spanning trees
Corollary: in such a packing, some tree crosses the min-cut only twice
To find the min-cut:
» find a tree packing
» find the smallest cut with 2 tree edges crossing it
Problem: the packing takes O*(mc) time

Constraint Trees
Min-cut c:
» c directed trees
» 2c directed min-cut edges
» on average, two min-cut edges per tree
Definition: a tree 2-crosses a cut

Finding the Cut
From the crossing tree edges, deduce the cut:
» remove the tree edges
» no other edges cross
» so each component is on one side, opposite its "neighbor's" side

Sampling
Solution: use G(ρ/c) with ε = 1/8
» pack O*(ρ) trees in O*(m) time
» the original min-cut has at most (1+ε)ρ edges in G(ρ/c)
» some tree 2-crosses it in G(ρ/c)
» ...and thus 2-crosses it in G
Analyze the O*(ρ) trees in G
» time O*(m) per tree
» Monte Carlo

Simplify
First discuss the case where one tree edge crosses the min-cut

Analyzing a Tree
Root the tree, so a cut corresponds to a subtree
Use a dynamic program up from the leaves to determine subtree cuts efficiently
Given the cuts at the children of a node, compute the cut at the parent
Definitions:
» the subtree of v is v and the nodes below it
» C(v) is the value of the cut at v's subtree

The Dynamic Program
For a node u with children v and w, accounting for edges whose least common ancestor is u:
» C(u) = C(v) + C(w) - 2 C(v,w)
where C(v,w) counts the edges joining the subtrees of v and w

Algorithm: 1-Crossing Trees
Compute the edges' LCAs: O(m)
Compute "cuts" at the leaves
» cut values = degrees
» each edge is incident on at most two leaves
» total time O(m)
Dynamic program upwards: O(n)
Total: O(m + n)

2-Crossing Trees
The cut corresponds to two subtrees, one kept and one discarded:
» C(v ∪ w) = C(v) + C(w) - 2 C(v,w)
n² table entries
» fill them in O(n²) time with a dynamic program
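A sketch of the leaves-up dynamic program for the 1-crossing case, computing C(v) for every subtree of a rooted spanning tree. Two simplifications are mine: the LCA is found by naive path-walking rather than an O(m) method, and the recurrence carries an explicit degree term, C(u) = deg(u) + Σ_children C(w) - 2·(edges with LCA u), which is the same bookkeeping as the slide's two-child formula:

```python
def subtree_cuts(n, edges, parent, root=0):
    """For each vertex v of a rooted spanning tree, return C(v), the
    value of the cut separating v's subtree from the rest of G.
    `edges` is the full edge list of G (tree edges included);
    `parent[v]` is v's tree parent, with parent[root] == root."""
    def depth(v):
        d = 0
        while parent[v] != v:
            v, d = parent[v], d + 1
        return d

    def lca(u, v):
        du, dv = depth(u), depth(v)
        while du > dv:
            u, du = parent[u], du - 1
        while dv > du:
            v, dv = parent[v], dv - 1
        while u != v:
            u, v = parent[u], parent[v]
        return u

    deg = [0] * n
    lca_count = [0] * n            # edges whose least common ancestor is v
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        lca_count[lca(u, v)] += 1

    children = [[] for _ in range(n)]
    for v in range(n):
        if parent[v] != v:
            children[parent[v]].append(v)

    cut = [0] * n
    for v in sorted(range(n), key=depth, reverse=True):  # deepest first
        cut[v] = deg[v] + sum(cut[w] for w in children[v]) - 2 * lca_count[v]
    return cut
```

An edge with one endpoint inside v's subtree is counted once; an edge with both endpoints inside is counted twice by the degree/children sums and removed twice by its LCA term, so cut[v] counts exactly the crossing edges (and cut[root] is always 0).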
Linear Time
The bottleneck is the C(v,w) computations. Avoid them:
» find the right "twin" w for each v
» C(v ∪ w) = C(v) + min over w of [ C(w) - 2 C(v,w) ]
Compute using the addpath and minpath operations of dynamic trees [ST]
Result: O(m log³ n) time (messy)

How do we verify a minimum cut?

Network Design
Randomized rounding

Problem Statement
Given vertices, and a cost c_vw to buy an edge from v to w, find the minimum-cost purchase that creates a graph with the desired connectivity properties
Example: a minimum-cost k-connected graph
Generally NP-hard
Recent approximation algorithms [GW], [JV]

Integer Linear Program
Variable x_vw = 1 if we buy edge vw
Solution cost: Σ x_vw c_vw
Constraint: for every cut, Σ x_vw ≥ k over the crossing edges
Relaxing integrality gives a tractable LP
» exponentially many cuts
» but separation oracles exist (e.g., min-cut)
What is the integrality gap?

Randomized Rounding
Given the LP solution values x_vw,
build a graph where edge vw is present with probability x_vw
The expected cost is at most opt: Σ x_vw c_vw
The expected number of edges crossing any cut satisfies the constraint
If the expected number is large for every cut, the sampling theorem applies

k-Connected Subgraph
The fractional solution is k-connected
So every cut has (in expectation) ≥ k edges crossing it in the rounded solution
The sampling theorem says every cut has at least k - (k log n)^{1/2} edges
A close approximation for large k
Can often repair: e.g., get a k-connected subgraph at cost 1 + ((log n)/k)^{1/2} times the minimum

Nonuniform Sampling
Concentrate on the important things
[Benczúr-Karger, Karger, Karger-Levine]

s-t Min-Cuts
Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε
Applications:
» min-cut in O*(mc) time [G]
» approximate/exact in O*((m/c)·c) = O*(m)
» s-t min-cut of value v in O*(mv)
» approximate in O*(mv/c) time
Trouble if c is small and v is large

The Problem
Cut sampling relied on the Chernoff bound
Chernoff bounds require that no one edge is a large fraction of the expectation of a cut it crosses
If the sample rate is ≪ 1/c, each edge across a min-cut is too significant
But: if an edge only crosses large cuts, then a sample rate ≪ 1/c is OK!

Biased Sampling
The original sampling theorem is weak when
» m is large
» c is small
But if m is large,
» then G has dense regions
» where c must be large
» where we can sample more sparsely

Results (weighted, undirected graphs; compression lets m → n/ε²):

Problem               | Old time    | New time
Approx. s-t min-cut   | O*(mn)      | O*(n²/ε²)
Approx. s-t max-flow  | O*(m^{3/2}) | O*(mn^{1/2}/ε)
Flow of value v       | O*(mv)      | O*(n^{11/9}v)
Approx. bisection     | O*(m²)      | O*(n²/ε²)

Strong Components
Definition: a k-strong component is a maximal vertex-induced subgraph with min-cut k
[Figure: a graph decomposed into 2-strong and 3-strong components.]

Nonuniform Sampling
Definition: an edge is k-strong if its endpoints are in the same k-strong component
» stricter than having k-connected endpoints
Definition: the strong connectivity c_e of edge e is the largest k for which e is k-strong
Plan: sample dense regions lightly

Nonuniform Sampling
Idea: if an edge is k-strong, then it is in a k-connected graph
» so it is "safe" to sample it with probability 1/k
Problem: if we sample edges with different probabilities, E[cut value] gets messy
Solution: if we sample e with probability p_e, give it weight 1/p_e
» then E[cut value] = original cut value

Compression Theorem
Definition: given compression probabilities p_e, the compressed graph G[p_e]
» includes edge e with probability p_e, and
» gives it weight 1/p_e if included
Note E[G[p_e]] = G
Theorem: G[ρ/c_e]
» approximates all cuts to within ε
» has O(ρn) edges
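A minimal sketch of compression, assuming the strong connectivities c_e are already known (estimating them is the subject of the construction below); the dictionary-based representation is an illustrative assumption:

```python
import random

def compress(capacities, strength, rho, seed=0):
    """Build G[p_e]: keep edge e with probability p_e = min(1, rho/c_e)
    and scale its capacity by 1/p_e if kept, so every cut keeps its
    expected value. `capacities` and `strength` map edges (u, v) to
    their capacity and strong connectivity c_e."""
    rng = random.Random(seed)
    compressed = {}
    for e, cap in capacities.items():
        p = min(1.0, rho / strength[e])
        if rng.random() < p:
            compressed[e] = cap / p   # reweight to keep E[cut value] fixed
    return compressed
```

By the strength lemma below, Σ 1/c_e ≤ n, so the expected number of surviving edges is at most ρn no matter how dense G was.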
Proof (Approximation)
Basic idea: within a k-strong component, edges are sampled with probability ρ/k
» so the original sampling theorem works there
Problem: some edges may lie in stronger components and be sampled less often
Induct up from the strongest components:
» apply the original sampling theorem inside
» then "freeze" it so it doesn't affect the weaker parts

Strength Lemma
Lemma: Σ 1/c_e ≤ n
» consider a connected component C of G
» suppose C has min-cut k
» then every edge e in C has c_e ≥ k
» so the k edges crossing C's min-cut have Σ 1/c_e ≤ k · (1/k) = 1
» delete these edges ("cost" 1)
» repeat at most n - 1 times: no more edges!

Proof (Edge Count)
Edge e is included with probability ρ/c_e
So the expected number of edges is Σ ρ/c_e
We saw Σ 1/c_e ≤ n
So the expected number of edges is at most ρn

Construction
To sample, we must find the edge strengths
» we can't, but an approximation suffices
Sparse certificates identify weak edges:
» constructed in linear time [NI]
» contain all edges crossing cuts of size ≤ k
» iterate until the strong components emerge
Iterate for the 2^i-strong edges, for all i
» tricks turn it strongly polynomial

Certificate Algorithm
Repeat k times:
» find a spanning forest
» delete it
Each iteration deletes one edge from every cut (the forest is spanning)
So at the end, every edge crossing a cut of size ≤ k has been deleted
[NI]: merge all iterations in O(m) time
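A sketch of the certificate step in its naive O(km) form (the [NI] version merges all k rounds into O(m) time):

```python
def sparse_certificate(n, edges, k):
    """Repeat k times: find a spanning forest and delete it. Returns
    (certificate, rest): the union of the k forests, which contains
    every edge crossing a cut of size <= k, and the surviving edges,
    which cross only larger cuts."""
    remaining = list(edges)
    certificate = []
    for _ in range(k):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        forest, rest = [], []
        for u, v in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:              # joins two components: forest edge
                parent[ru] = rv
                forest.append((u, v))
            else:
                rest.append((u, v))
        certificate.extend(forest)
        remaining = rest
    return certificate, remaining
```

Running this for k = 2^i, for each i, sorts the edges into approximate strength classes, which is all the compression scheme above needs.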
Flows
Uniform sampling led to flow algorithms:
» randomly partition the edges
» merge the flows from the partition elements
Compression is problematic:
» the edge capacities changed
» so flow path capacities are distorted
» a flow in the compressed graph doesn't fit in the original graph

Smoothing
If an edge has strength c_e, divide it into βρ/c_e edges of capacity c_e/βρ
» creates βρ Σ 1/c_e ≤ βρn edges
Now each edge is only a 1/βρ fraction of any cut of its strong component
So sampling a 1/β fraction works
So dividing into β groups works
Yields a (1 - ε) max-flow in O*((mnv)^{1/2}/ε) time

Cleanup
An approximate max-flow can be made exact by augmenting paths
Integrality problems:
» augmenting paths are fast for small integer flows
» but the breakup done by smoothing ruins integrality
Surmountable:
» the flows in the dense and sparse parts are separable
Result: max-flow in O*(n^{11/9}v) time

Proof by Picture
[Figure sequence, each panel an s-t graph: dense regions; compress the dense regions; solve the sparse flow; replace the dense parts (keeping the flow in the sparse bits); "fill in" the dense parts.]

Conclusions

Conclusion
Randomization is a crucial tool for algorithm design
It often yields algorithms that are faster or simpler than their traditional counterparts
In particular, it gives significant improvements for core problems in graph algorithms

Randomized Methods
Random selection
» if most candidate choices are "good", then a random choice is probably good
Monte Carlo simulation
» simulations estimate event likelihoods
Random sampling
» generate a small random subproblem
» solve it, extrapolate to the whole problem
Randomized rounding for approximation

Random Selection
When most choices are good, make one at random
Recursive contraction algorithm for minimum cuts
» extremely simple (also to implement)
» fast in theory and in practice [CGKLS]

Monte Carlo
To estimate an event's likelihood, run trials
Slow for very rare events
Bias the samples to reveal the rare event
FPRAS for network reliability

Random Sampling
Generate a representative subproblem
Use it to estimate a solution to the whole
» gives an approximate solution
» may be quickly repaired to an exact solution
Bias the sample toward the "important" or "sensitive" parts of the problem
New max-flow and min-cut algorithms

Randomized Rounding
Convert fractional to integral solutions
Get approximation algorithms for integer programs
» "sampling" from a well-designed sample space of feasible solutions
Good approximations for network design

Generalization
Our techniques work because undirected graphs are matroids
All our results extend to, or are special cases of, matroid results:
» packing bases
» finding minimum "quotients"
» matroid optimization (MST)

Directed Graphs?
Directed graphs are not matroids
Directed graphs can have lots of minimum cuts
Sampling doesn't appear to work
Residual graphs for flows are directed
» precludes obvious recursive solutions to flow problems

Open Problems
Flow in O(nv) time (complete the m → n reduction)
» eliminate the v dependence
» apply to weighted graphs with large flows
» flow in O(m) time?
Las Vegas algorithms
» finding good certificates
Deterministic algorithms
» deterministic construction of "samples"
» deterministically compress a graph

Randomization in Graph Optimization Problems
David Karger, MIT
http://theory.lcs.mit.edu/~karger
[email protected]