Randomization in
Graph Optimization Problems
David Karger
MIT
http://theory.lcs.mit.edu/~karger
Randomized Algorithms
• Flip coins to decide what to do next
• Avoid hard work of making "right" choice
• Often faster and simpler than deterministic algorithms
• Different from average-case analysis
» Input is worst case
» Algorithm adds randomness
Methods
• Random selection
» if most candidate choices "good", then a random choice is probably good
• Random sampling
» generate a small random subproblem
» solve, extrapolate to whole problem
• Monte Carlo simulation
» simulations estimate event likelihoods
• Randomized Rounding for approximation
Cuts in Graphs
• Focus on undirected graphs
• A cut is a vertex partition
• Value is number (or total weight) of crossing edges
Optimization with Cuts
• Cut values determine solution of many graph optimization problems:
» min-cut / max-flow
» multicommodity flow (sort-of)
» bisection / separator
» network reliability
» network design
• Randomization helps solve these problems
Presentation Assumption
• For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
• All results apply unchanged to arbitrarily weighted graphs
» Integer weights = parallel edges
» Rational weights scale to integers
» Analysis unaffected
» Some implementation details differ
Basic Probability
• Conditional probability:
» Pr[A ∩ B] = Pr[A] · Pr[B | A]
• Independent events multiply:
» Pr[A ∩ B] = Pr[A] · Pr[B]
• Union Bound:
» Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]
• Linearity of expectation:
» E[X + Y] = E[X] + E[Y]
Random Selection for Minimum Cuts
Random choices are good when problems are rare
Minimum Cut
• Smallest cut of graph
• Cheapest way to separate into 2 parts
• Various applications:
» network reliability (small cuts are weakest)
» subtour elimination constraints for TSP
» separation oracle for network design
• Not s-t min-cut
Max-flow/Min-cut
• s-t flow: edge-disjoint packing of s-t paths
• s-t cut: a cut separating s and t
• [FF]: s-t max-flow = s-t min-cut
» max-flow saturates all s-t min-cuts
» most efficient way to find s-t min-cuts
• [GH]: min-cut is "all-pairs" s-t min-cut
» find using n flow computations
Flow Algorithms
• Push-relabel [GT]:
» push "excess" around graph till it's gone
» max-flow in O*(mn) (note: O* hides logs)
– recent O*(m^{3/2}) [GR]
» min-cut in O*(mn²): "harder" than flow
• Pipelining [HO]:
» save push/relabel data between flows
» min-cut in O*(mn): "as easy" as flow
Contraction
• Find edge that doesn't cross min-cut
• Contract (merge) endpoints to 1 vertex
Contraction Algorithm
• Repeat n - 2 times:
» find non-min-cut edge
» contract it (keep parallel edges)
• Each contraction decrements #vertices
• At end, 2 vertices left
» unique cut
» corresponds to min-cut of starting graph
Picking an Edge
• Must contract non-min-cut edges
• [NI]: O(m) time algorithm to pick edge
» n contractions: O(mn) time for min-cut
» slightly faster than flows
• If only we could find an edge faster…
• Idea: min-cut edges are few
Randomize
• Repeat until 2 vertices remain:
» pick a random edge
» contract it
(keep fingers crossed)
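For illustration only (none of this code appears on the slides): a minimal Python sketch of one randomized contraction trial, assuming a connected multigraph given as an edge list. Union-find tracks merged vertices, so parallel edges survive automatically and contraction never rewrites the edge list.

    import random

    def contraction_trial(n, edges):
        """One trial of the contraction algorithm on a connected multigraph.
        n: number of vertices (labeled 0..n-1); edges: list of (u, v) pairs,
        parallel edges allowed. Returns the value of the final 2-vertex cut."""
        parent = list(range(n))
        def find(x):                      # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        pool = edges[:]                   # work on a copy
        vertices = n
        while vertices > 2:
            u, v = random.choice(pool)    # pick a random edge
            ru, rv = find(u), find(v)
            if ru == rv:                  # endpoints already merged: self-loop
                pool.remove((u, v))
                continue
            parent[ru] = rv               # contract the edge
            vertices -= 1
        # Every surviving non-self-loop edge crosses the final 2-vertex cut.
        return sum(1 for u, v in edges if find(u) != find(v))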
Analysis I
• Min-cut is small: few edges
» Suppose graph has min-cut c
» Then minimum degree at least c
» Thus at least nc/2 edges
• Random edge is probably safe
» Pr[min-cut edge] ≤ c/(nc/2) = 2/n
» (easy generalization to capacitated case)
Analysis II
• Algorithm succeeds if it never accidentally contracts a min-cut edge
• Contracts #vertices from n down to 2
• When k vertices remain, chance of error is 2/k
» thus, chance of being right is 1 - 2/k
• Pr[always right] is product of probabilities of being right each time
Analysis III
Pr[success] ≥ ∏ (over k = 3..n) Pr[k-th contraction safe]
  = (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/3)
  = ((n-2)/n)((n-3)/(n-1)) ⋯ (1/3)
  = 2/(n(n-1))
  ≥ 2/n²
…not too good!
Repetition
• Repetition amplifies success probability
» basic failure probability ≤ 1 - 2/n²
» so repeat 7n² times
Pr[complete failure] ≤ Pr[fail 7n² times in a row]
  = (Pr[fail once])^{7n²}
  ≤ (1 - 2/n²)^{7n²}
  ≤ 10^{-6}
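Continuing the illustrative sketch from above, the amplification step is a one-line driver:

    def min_cut_whp(n, edges):
        # 7n^2 independent trials push the failure probability below 10^-6.
        return min(contraction_trial(n, edges) for _ in range(7 * n * n))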
How fast?
• Easy to perform 1 trial in O(m) time
» just use array of edges, no data structures
• But need n² trials: O(mn²) time
• Simpler than flows, but slower
An improvement [KS]
• When k vertices remain, error probability is 2/k
» big when k small
• Idea: once k small, change algorithm
» algorithm needs to be safer
» but can afford to be slower
• Amplify by repetition!
» Repeat base algorithm many times
Recursive Algorithm
Algorithm RCA(G, n)
{G has n vertices}
repeat twice
    randomly contract G to n/√2 vertices
    RCA(G, n/√2)
(50-50 chance of avoiding min-cut)
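A sketch of how this recursion might look in Python, for illustration; contract_to(G, k) (random contractions down to k vertices) and brute_force_min_cut are assumed helpers, not anything specified on the slide:

    import math

    def rca(G, n):
        """Recursive Contraction Algorithm [KS] (sketch under the
        assumptions above). Returns the smallest cut value found."""
        if n <= 6:
            return brute_force_min_cut(G)        # assumed helper: tiny base case
        k = math.ceil(1 + n / math.sqrt(2))      # target size ~ n/sqrt(2)
        best = float('inf')
        for _ in range(2):                       # repeat twice
            H = contract_to(G, k)                # assumed helper: random contractions
            best = min(best, rca(H, k))
        return best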
Main Theorem
• On any capacitated, undirected graph, Algorithm RCA
» runs in O*(n²) time with simple structures
» finds min-cut with probability Ω(1/log n)
• Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10^{-6}) in O(n² log² n) time.
Proof Outline
• Graph has O(n²) (capacitated) edges
• So O(n²) work to contract, then two subproblems of size n/√2
» T(n) = 2T(n/√2) + O(n²) = O(n² log n)
• Algorithm fails if both iterations fail
» Iteration succeeds if contractions and recursion succeed
» P(n) = 1 - [1 - ½P(n/√2)]² = Ω(1/log n)
Failure Modes
• Monte Carlo algorithms always run fast and probably give you the right answer
• Las Vegas algorithms probably run fast and always give you the right answer
• To make a Monte Carlo algorithm Las Vegas, need a way to check the answer
» repeat till answer is right
• No fast min-cut check known (flow slow!)
How do we verify a minimum cut?
Enumerating Cuts
The probabilistic method, backwards
Cut Counting
• Original contraction algorithm finds any given min-cut with probability at least 2/(n(n-1))
• Only one cut found per run
• Disjoint events, so probabilities add
• So at most n(n-1)/2 min-cuts
» otherwise, probabilities would sum to more than one
• Tight
» Cycle has exactly this many min-cuts
Enumeration
• RCA as stated has constant probability of finding any given min-cut
• If run O(log n) times, probability of missing a given min-cut drops to 1/n³
• But only n² min-cuts
• So, probability of missing any is at most 1/n
• So, with probability 1 - 1/n, find all
» O(n² log³ n) time
Generalization
• If G has min-cut c, a cut of value αc is an α-mincut
• Lemma: contraction algorithm finds any given α-mincut with probability Ω(n^{-2α})
» Proof: just add a factor to the basic analysis
• Corollary: O(n^{2α}) α-mincuts
• Corollary: can find all in O*(n^{2α}) time
» Just change contraction factor in RCA
Summary
• A simple fast min-cut algorithm
» Random selection avoids rare problems
• Generalization to near-minimum cuts
• Bound on number of small cuts
» Probabilistic method, backwards
Random Sampling
Random Sampling
• General tool for faster algorithms:
» pick a small, representative sample
» analyze it quickly (small)
» extrapolate to original (representative)
• Speed-accuracy tradeoff
» smaller sample means less time
» but also less accuracy
A Polling Problem
• Population of size m
• Subset of c red members
• Goal: estimate c
• Naïve method: check whole population
• Faster method: sampling
» Choose random subset of population
» Use relative frequency in sample as estimate for frequency in population
Analysis: Chernoff Bound
• Random variables X_i ∈ [0,1]
• Sum X = Σ X_i
• Bound deviation from expectation:
Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
• "Probably, X ∈ (1±ε)E[X]"
• If E[X] ≥ 4(ln n)/ε², "tight concentration"
» Deviation by ε with probability < 1/n
Application to Polling
• Choose each member with probability p
• Let X be total number of reds seen
• Then E[X] = pc
• So estimate c by ĉ = X/p
• Note ĉ is accurate to within 1±ε iff X is within 1±ε of its expectation:
ĉ = X/p ∈ (1±ε)E[X]/p = (1±ε)c
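As an illustrative sketch (not from the slides), the estimator in code:

    import random

    def estimate_reds(population, p):
        """Sample each member independently with probability p.
        population: list of booleans (True = red). Returns c_hat = X/p."""
        X = sum(1 for red in population if red and random.random() < p)
        return X / p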
Analysis
• Let X_i = 1 if the i-th red item is chosen, else 0
• Then X = Σ X_i
• Chernoff Bound applies
» Pr[deviation by ε] < exp(-ε²pc/4)
» < 1/n if pc > 4(log n)/ε²
• Pretty tight
» if pc < 1, likely no red samples
» so no meaningful estimate
Sampling for Min-Cuts
Min-cut Duality
• [Edmonds]: min-cut = max tree packing
» convert to directed graph
» "source" vertex s (doesn't matter which)
» spanning trees directed away from s
• [Gabow] "augmenting trees"
» add a tree in O*(m) time
» min-cut c (via max packing) in O*(mc)
» great if m and c are small…
Example
[Figure: example graph with min-cut 2, packed as 2 directed spanning trees; directed min-cut 2]
Random Sampling
• Gabow's algorithm great if m, c small
• Random sampling
» reduces m, c
» scales cut values (in expectation)
» if pick half the edges, get half of each cut
• So find tree packings, cuts in samples
• Problem: maybe some large deviations
Sampling Theorem
• Given graph G, build a sample G(p) by including each edge with probability p
• Cut of value v in G has expected value pv in G(p)
• Definition: "constant" ρ = 8(ln n)/ε²
• Theorem: With high probability, all exponentially many cuts in G(ρ/c) have (1±ε) times their expected values.
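For concreteness, building the sample is just independent coin flips; a minimal sketch, with ρ computed as defined above:

    import math, random

    def sample_graph(edges, p):
        """G(p): keep each edge independently with probability p."""
        return [e for e in edges if random.random() < p]

    def skeleton(edges, n, c, eps):
        """G(rho/c) from the theorem, with rho = 8 ln(n) / eps^2."""
        rho = 8 * math.log(n) / eps ** 2
        return sample_graph(edges, min(1.0, rho / c))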
A Simple Application
• [Gabow] packs trees in O*(mc) time
• Build G(ρ/c)
» minimum expected cut ρ
» by theorem, min-cut probably near ρ
» find min-cut in O*(ρm) time using [Gabow]
» corresponds to a near-min-cut in G
• Result: (1+ε) times min-cut in O*(m/ε²) time
Proof of Sampling: Idea
• Chernoff bound says probability of large deviation in cut value is small
• Problem: exponentially many cuts. Perhaps some deviate a great deal
• Solution: showed few small cuts
» only small cuts likely to deviate much
» but few, so Chernoff bound applies
Proof of Sampling
• Sampled with probability ρ/c,
» a cut of value αc has mean αρ
» [Chernoff]: deviates from expected size by more than ε with probability at most n^{-3α}
• At most n^{2α} cuts have value αc
• Pr[any cut of value αc deviates] = O(n^{-α})
• Sum over all α
Las Vegas Algorithms
Finding Good Certificates
Approximate Tree Packing
• Break edges into c/ρ random groups
• Each looks like a sample at rate ρ/c
» O*(ρm/c) edges
» each has min expected cut ρ
» so theorem says min-cut ≥ (1 - ε)ρ
• So each has a packing of size (1 - ε)ρ
• [Gabow] finds it in time O*(ρ²m/c) per group
» so overall time is (c/ρ) · O*(ρ²m/c) = O*(ρm)
Las Vegas Algorithm
• Packing algorithm is Monte Carlo
• Previously found approximate cut (faster)
• If close, each "certifies" the other
» Cut exceeds optimum cut
» Packing below optimum cut
• If not, re-run both
• Result: Las Vegas, expected time O*(ρm)
Exact Algorithm
• Randomly partition edges in two groups
» each like a ½-sample: ε = O*(1/√c)
• Recursively pack trees in each half
» c/2 - O*(√c) trees
• Merge packings
» gives packing of size c - O*(√c)
» augment to maximum packing: O*(m√c)
• T(m,c) = 2T(m/2, c/2) + O*(m√c) = O*(m√c)
Nearly Linear Time
Analyze Trees
• Recall: [G] packs c (directed-)edge-disjoint spanning trees
• Corollary: in such a packing, some tree crosses the min-cut only twice
• To find min-cut:
» find tree packing
» find smallest cut with 2 tree edges crossing
Constraint trees
• Min-cut c:
» c directed trees
» 2c directed min-cut edges
» On average, two min-cut edges per tree
• Definition: a tree 2-crosses a cut if at most two of its edges cross it
Finding the Cut
• From crossing tree edges, deduce cut
• Remove tree edges
• No other edges cross
• So each component is on one side
• And opposite its "neighbor's" side
Two Problems
• Packing trees takes too long
» Gabow runtime is O*(mc)
• Too many trees to check
» Only claimed that one (of c) is good
• Solution: sampling
Sampling
• Use G(ρ/c) with ε = 1/8
» pack O*(ρ) trees in O*(m) time
» original min-cut has ≤ (1+ε)ρ edges in G(ρ/c)
» some tree 2-crosses it in G(ρ/c)
» …and thus 2-crosses it in G
• Analyze O*(ρ) trees in G
» time O*(m) per tree
• Monte Carlo
Simple First Step
Discuss the case where one tree edge crosses the min-cut
Analyzing a tree
• Root the tree, so each subtree defines a cut
• Use dynamic program up from leaves to determine subtree cuts efficiently
• Given cuts at children of a node, compute cut at parent
• Definitions:
» v↓ is the set of nodes at or below v
» C(v↓) is the value of the cut with v's subtree on one side
The Dynamic Program
[Figure: node u with children v and w; u's subtree is discarded, the rest kept]
• Counting edges with least common ancestor u:
C(u↓) = C(v↓) + C(w↓) - 2 C(v↓, w↓)
where C(v↓, w↓) is the number of edges joining v's subtree to w's subtree.
Algorithm: 1-Crossing Trees
• Compute edges' LCAs: O(m)
• Compute "cuts" at leaves
» Cut values = degrees
» each edge incident on at most two leaves
» total time O(m)
• Dynamic program upwards: O(n)
• Total: O(m+n)
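An illustrative Python sketch of this dynamic program. It uses an identity equivalent to the slide's recurrence: C(v↓) = Σ over u in v↓ of (deg(u) - 2 · #{edges with LCA u}), which holds because an edge has both endpoints below v exactly when its LCA is below v. The lca oracle is assumed (standard offline LCA takes O(m) total):

    def subtree_cut_values(n, edges, children, root, lca):
        """C(v_down) for every tree vertex v: the number of graph edges with
        exactly one endpoint in v's subtree. children: adjacency lists of the
        rooted spanning tree; lca(x, y): least common ancestor oracle (assumed)."""
        deg = [0] * n
        lca_count = [0] * n
        for x, y in edges:
            deg[x] += 1
            deg[y] += 1
            lca_count[lca(x, y)] += 1
        C = [0] * n
        def dfs(v):                      # post-order subtree sums
            C[v] = deg[v] - 2 * lca_count[v]
            for w in children[v]:
                C[v] += dfs(w)
            return C[v]
        dfs(root)
        return C

At a leaf the sum reduces to the leaf's degree, matching the slide.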
2-Crossing Trees
• Cut corresponds to two subtrees:
[Figure: subtrees below v and w discarded together, the rest kept]
C(v↓ ∪ w↓) = C(v↓) + C(w↓) - 2 C(v↓, w↓)
• n² table entries
• fill in O(n²) time with dynamic program
Linear Time
• Bottleneck is the C(v↓, w↓) computations
• Avoid them: find the right "twin" w for each v
C(v↓) + min over w of ( C(w↓) - 2 C(v↓, w↓) )
• Compute using addpath and minpath operations of dynamic trees [ST]
• Result: O(m log³ n) time (messy)
Network Design
Randomized Rounding
Problem Statement
• Given vertices, and cost c_vw to buy an edge from v to w, find the minimum cost purchase that creates a graph with desired connectivity properties
• Example: minimum cost k-connected graph
• Generally NP-hard
• Recent approximation algorithms [GW], [JV]
Integer Linear Program
• Variable x_vw = 1 if we buy edge vw
• Solution cost Σ x_vw c_vw
• Constraint: for every cut, Σ x_vw ≥ k
• Relaxing integrality gives tractable LP
» Exponentially many cut constraints
» But separation oracles exist (e.g. min-cut)
• What is the integrality gap?
Randomized Rounding
• Given LP solution values x_vw
• Build graph where edge vw is present with probability x_vw
• Expected cost is at most opt: Σ x_vw c_vw
• Expected number of edges crossing any cut satisfies its constraint
• If expected number large for every cut, sampling theorem applies
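The rounding step itself is tiny; an illustrative sketch:

    import random

    def randomized_round(x):
        """x: dict mapping edge (v, w) -> fractional LP value in [0, 1].
        Buy each edge independently with probability x[v, w]."""
        return {e for e, xe in x.items() if random.random() < xe}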
k-connected subgraph
• Fractional solution is k-connected
• So every cut has (expected) k edges crossing in rounded solution
• Sampling theorem says every cut has at least k - (k log n)^{1/2} edges
• Close approximation for large k
• Can often repair: e.g., get k-connected subgraph at cost 1 + ((log n)/k)^{1/2} times min
Repair Methods
• Slightly increase all x_vw before rounding
» E.g., multiply by (1+ε)
» Works fine, but some x_vw become > 1
» Problem if only want single use of edges
• Round to approx, then fix
» Solve "augmentation problem" using other network design techniques
» May be worse approx, but only to a small part of cost
Nonuniform Sampling
Concentrate on the important things
[Benczur-Karger, Karger, Karger-Levine]
s-t Min-Cuts
• Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε.
• Applications:
» min-cut in O*(mc) time [G] becomes approximate/exact in O*((m/c)·c) = O*(m)
» s-t min-cut of value v in O*(mv) becomes approximate in O*(mv/c) time
• Trouble if c is small and v large.
The Problem
• Cut sampling relied on Chernoff bound
• Chernoff bounds require that no one edge is a large fraction of the expectation of a cut it crosses
• If sample rate << 1/c, each edge across a min-cut is too significant
• But: if an edge only crosses large cuts, then sample rate << 1/c is OK!
Biased Sampling
• Original sampling theorem weak when
» m is large
» c is small
• But if m is large
» then G has dense regions
» where c must be large
» where we can sample more sparsely
Problem                 Old Time      New Time
Approx. s-t min-cut     O*(mv)        O*(nv/ε²)
Approx. s-t min-cut     O*(mn)        O*(n²/ε²)
Approx. s-t max-flow    O*(m^{3/2})   O*(m√n/ε)
Flow of value v         O*(mv)        O*(nv)

(m → n/ε² in weighted, undirected graphs)
Strong Components
• Definition: A k-strong component is a maximal vertex-induced subgraph with min-cut k.
[Figure: a graph partitioned into 2-strong and 3-strong components]
Nonuniform Sampling
• Definition: An edge is k-strong if its endpoints are in the same k-strong component.
• Stricter than k-connected endpoints.
• Definition: The strong connectivity c_e for edge e is the largest k for which e is k-strong.
• Plan: sample dense regions lightly
Nonuniform Sampling
• Idea: if an edge is k-strong, then it is in a k-connected graph
• So "safe" to sample with probability 1/k
• Problem: if we sample edges with different probabilities, E[cut value] gets messy
• Solution: if we sample e with probability p_e, give it weight 1/p_e
• Then E[cut value] = original cut value
Compression Theorem
• Definition: Given compression probabilities p_e, the compressed graph G[p_e]
» includes edge e with probability p_e and
» gives it weight 1/p_e if included
• Note E[G[p_e]] = G
• Theorem: G[ρ/c_e]
» approximates all cuts by ε
» has O(ρn) edges
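An illustrative sketch of compression, assuming the strong connectivities c_e are already available (their construction is discussed below):

    import random

    def compress(edges, strength, rho):
        """Build G[p_e] with p_e = min(1, rho / c_e): keep edge e with
        probability p_e and give it weight 1/p_e, so every cut keeps its
        expectation. strength: dict mapping edge -> c_e.
        Returns a list of (u, v, weight) triples."""
        out = []
        for (u, v) in edges:
            p = min(1.0, rho / strength[u, v])
            if random.random() < p:
                out.append((u, v, 1.0 / p))
        return out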
Application
• Compress graph to ρn = O*(n/ε²) edges
• Find s-t max-flow in compressed graph
• Gives s-t min-cut in compressed graph
• So approximate s-t min-cut in original
• Assorted runtimes:
» [GT] O(mn) becomes O*(n²/ε²)
» [FF] O(mv) becomes O(nv/ε²)
» [GR] O(m^{3/2}) becomes O(n^{3/2}/ε³)
Proof (approximation)
• Basic idea: in a k-strong component, edges get sampled with probability ρ/k
» original sampling theorem works
• Problem: some edges may be in stronger components, sampled less often
• Induct up from strongest components:
» apply original sampling theorem inside
» then "freeze" so they don't affect weaker parts
Strength Lemma
• Lemma: Σ 1/c_e ≤ n
» Consider a connected component C of G
» Suppose C has min-cut k
» Then every edge e in C has c_e ≥ k
» So the ≤ k edges crossing C's min-cut have Σ 1/c_e ≤ Σ 1/k ≤ k(1/k) = 1
» Delete these edges ("cost" 1)
» Repeat n - 1 times: no more edges!
Proof (edge count)
• Edge e included with probability ρ/c_e
• So expected number of edges is Σ ρ/c_e
• We saw Σ 1/c_e ≤ n
• So expected number at most ρn
Construction
• To sample, must find edge strengths
» can't exactly, but approximation suffices
• Sparse certificates identify weak edges:
» construct in linear time [NI]
» contain all edges crossing cuts ≤ k
» iterate until strong components emerge
• Iterate for 2^i-strong edges, all i
» tricks turn it strongly polynomial
NI Certificate Algorithm
• Repeat k times
» Find a spanning forest
» Delete it
• Each iteration deletes one edge from every cut (forest is spanning)
• So at end, any edge crossing a cut of size ≤ k is deleted
• [NI] pipeline all iterations in O(m) time
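The naive O(km) version is easy to state in code; a sketch (recall [NI] pipeline the k rounds into O(m)):

    def sparse_certificate(n, edges, k):
        """k rounds of spanning forests. Returns the union of the forests,
        which contains every edge crossing a cut of value <= k."""
        remaining = list(edges)
        certificate = []
        for _ in range(k):
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            rest = []
            for u, v in remaining:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    certificate.append((u, v))   # edge joins this round's forest
                else:
                    rest.append((u, v))
            remaining = rest
        return certificate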
Approximate Flows
• Uniform sampling led to tree algorithms
» Randomly partition edges
» Merge trees from each partition element
• Compression problematic for flow
» Edge capacities changed
» So flow path capacities distorted
» Flow in compressed graph doesn't fit in original graph
Smoothing
• If edge has strength c_e, divide it into βρ/c_e edges of capacity c_e/βρ
» Creates βρ Σ 1/c_e ≤ βρn edges
• Now each edge is only a 1/βρ fraction of any cut of its strong component
• So sampling a 1/β fraction works
• So dividing into β groups works
• Yields (1-ε) max-flow in O*(m√n/ε) time
Exact Flow Algorithms
Sampling from residual graphs
Residual Graphs
• Sampling can be used to approximate cuts and flows
• A non-maximum flow can be made maximum by augmenting paths
• But residual graph is directed.
• Can sampling help?
» Yes, to a limited extent
First Try
• Suppose current flow value f
» residual flow value v - f
• Lemma: if all edges are sampled with probability ρv/(c(v-f)) then, w.h.p., all directed cuts are within ε of their expectations
» Original undirected sampling used ρ/c
• Expectations nonzero, so no empty cut
• So, some augmenting path exists
Application
• When residual flow is i, seek an augmenting path in a sample of mρv/(ic) edges. Time O(mρv/(ic)).
• Sum over all i from v down to 1
• Total O(mρv(log v)/c) since Σ 1/i = O(log v)
• Here, ε can be any constant < 1 (say ½)
• So ρ = O(log n)
• Overall runtime O*(mv/c)
Proof
• Augmenting a unit of flow from s to t decrements the residual capacity of each s-t cut by exactly one
[Figure: an s-t cut; each unit of augmentation crosses it once]
Analysis
• Each s-t cut loses f edges, had at least v
• So, has at least (v - f)/v times as many edges as before
• But we increase sampling probability by a factor of v/(v - f)
• So expected number of sampled edges no worse than before
• So Chernoff and union bound as before
Strong Connectivity
• Drawback of previous: dependence on minimum cut c
• Solution: use strong connectivities
• Initialize α = 1
• Repeat until done
» Sample edges with probabilities αρ/k_e
» Look for augmenting path
» If don't find one, double α
Analysis
• Theorem: if we sample with probabilities αρ/k_e, and α > v/(v-f), then we will find an augmenting path w.h.p.
• Runtime:
» α always within a factor of 2 of the "right" v/(v-f)
» As in compression, edge count O(αρn)
» So runtime O(ρn Σ_i v/i) = O*(nv)
Summary
• Nonuniform sampling for cuts and flows
• Approximate cuts in O(n²) time
» for arbitrary flow value
• Max-flow in O(nv) time
» only useful for "small" flow value
» but does work for weighted graphs
» large flow open
Network Reliability
Monte Carlo estimation
The Problem
• Input:
» Graph G with n vertices
» Edge failure probabilities
– For exposition, fix a single p
• Output:
» FAIL(p): probability G is disconnected by edge failures
Approximation Algorithms
• Computing FAIL(p) is #P-complete [V]
• Exact algorithm seems unlikely
• Approximation scheme
» Given G, p, ε, outputs an ε-approximation
» May be randomized:
– succeed with high probability
» Fully polynomial (FPRAS) if runtime is polynomial in n, 1/ε
Monte Carlo Simulation
• Flip a coin for each edge, test graph
• k failures in t trials ⇒ FAIL(p) ≈ k/t
• E[k/t] = FAIL(p)
• How many trials needed for confidence?
» "bad luck" on trials can yield bad estimate
» clearly need at least 1/FAIL(p)
• Chernoff bound: O*(1/ε²FAIL(p)) trials suffice to give probable accuracy within ε
» Time O*(m/ε²FAIL(p))
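The naive simulation, as an illustrative sketch:

    import random

    def estimate_fail(n, edges, p, trials):
        """Monte Carlo estimate of FAIL(p): in each trial, fail each edge
        independently with probability p and test connectivity of the rest."""
        failed_trials = 0
        for _ in range(trials):
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            components = n
            for u, v in edges:
                if random.random() >= p:        # edge survives this trial
                    ru, rv = find(u), find(v)
                    if ru != rv:
                        parent[ru] = rv
                        components -= 1
            failed_trials += (components > 1)   # graph disconnected
        return failed_trials / trials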
Chernoff Bound
• Random variables X_i ∈ [0,1]
• Sum X = Σ X_i
• Bound deviation from expectation:
Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
• If E[X] ≥ 4(log n)/ε², "tight concentration"
» Deviation by ε with probability < 1/n
• No one variable is a big part of E[X]
Application
• Let X_i = 1 if trial i is a failure, else 0
• Let X = X_1 + … + X_t
• Then E[X] = t · FAIL(p)
• Chernoff says X within relative ε of E[X] with probability 1 - exp(-ε² t FAIL(p)/4)
• So choose t to cancel other terms
» "High probability": t = O(log n / ε²FAIL(p))
» Deviation by ε with probability < 1/n
For network reliability
• Random edge failures
» Estimate FAIL(p) = Pr[graph disconnects]
• Naïve Monte Carlo simulation
» Chernoff bound: "tight concentration"
Pr[ |X - E[X]| ≥ εE[X] ] < exp(-ε²E[X]/4)
» O(log n / ε²FAIL(p)) trials expect O(log n / ε²) network failures, sufficient for Chernoff
» So estimate within ε in O*(m/ε²FAIL(p)) time
Rare Events
• When FAIL(p) is too small, it takes too long to collect sufficient statistics
• Solution: skew trials to make the interesting event more likely
• But in a way that lets you recover the original probability
DNF Counting
• Given DNF formula (OR of ANDs)
(e1 ∧ e2 ∧ e3) ∨ (e1 ∧ e4) ∨ (e2 ∧ e6)
• Each variable set true with probability p
• Estimate Pr[formula true]
» #P-complete
• [KL, KLM] FPRAS
» Skew to make true outcomes "common"
» Time linear in formula size
Rewrite problem
• Assume p = 1/2
» Count satisfying assignments
• "Satisfaction matrix"
» Truth table with one column per clause
» S_ij = 1 if i-th assignment satisfies j-th clause
• We want the number of nonzero rows
Satisfaction Matrix
Assignments (rows) × Clauses (columns):

  1  1  0  1
  0  0  0  0
  0  1  0  0
  1  0  1  0

• 3 nonzero rows
• Randomly sampling rows won't work
» Might be too few nonzero rows
New sample space
• Normalize each nonzero row to sum to one
• So sum of all nonzeros is the desired value
• Goal: estimate average nonzero
• Method: sample random nonzeros

Assignments (rows) × Clauses (columns):

  1/3  1/3   0   1/3
   0    0    0    0
   0    1    0    0
   0   1/2   0   1/2
Sampling Nonzeros
• We know the number of nonzeros per column
» If an assignment satisfies a given clause, all variables in the clause must be true
» All other variables are unconstrained
• Estimate the average by random sampling (see the sketch below)
» Know number of nonzeros per column
» So can pick a random column, weighted by its count
» Then pick a random true-for-column assignment
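An illustrative sketch of the whole estimator for p = 1/2, so the answer is the number of satisfying assignments; the clause representation is an assumption, not from the slides:

    import random

    def dnf_count(n_vars, clauses, samples):
        """[KL]-style DNF counting via the normalized matrix above.
        clauses: list of clauses, each a dict {var: required boolean value};
        a clause with k literals has 2^(n_vars - k) satisfying assignments."""
        sizes = [2 ** (n_vars - len(c)) for c in clauses]
        total = sum(sizes)                      # total nonzero matrix entries
        acc = 0.0
        for _ in range(samples):
            # Uniform random nonzero entry: column j w.p. sizes[j]/total, ...
            j = random.choices(range(len(clauses)), weights=sizes)[0]
            # ... then a uniform assignment satisfying clause j.
            assign = dict(clauses[j])
            for v in range(n_vars):
                if v not in assign:
                    assign[v] = random.random() < 0.5
            # Entry value = 1 / (number of clauses this assignment satisfies).
            sat = sum(all(assign[v] == b for v, b in c.items()) for c in clauses)
            acc += 1.0 / sat
        # E[entry] * (#entries) = number of nonzero rows.
        return total * acc / samples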
Few Samples Needed
• Suppose k clauses
• Then E[sample] ≥ 1/k
» 1 ≤ satisfied clauses ≤ k
» 1 ≥ sample value ≥ 1/k
• Adding O(k log n / ε²) samples gives a "large" mean
• So Chernoff says the sample mean is probably a good estimate
Reliability Connection
• Reliability as DNF counting:
» Variable per edge, true if edge fails
» Cut fails if all its edges do (AND of edge variables)
» Graph fails if some cut does (OR of cuts)
» FAIL(p) = Pr[formula true]
• Problem: the DNF has 2^n clauses
Focus on Small Cuts
• Fact: FAIL(p) ≥ p^c
• Theorem: if p^c = 1/n^{2+δ} then Pr[some cut of value > αc fails] < n^{-αδ}
• Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ
• Recall: O(n^{2α}) α-mincuts
• Enumerate with RCA, run DNF counting
Review
• Contraction Algorithm
» O(n^{2α}) α-mincuts
» Enumerate in O*(n^{2α}) time
Proof of Theorem
• Given p^c = 1/n^{2+δ}
• At most n^{2α} cuts have value αc
• Each fails with probability p^{αc} = 1/n^{α(2+δ)}
• Pr[any cut of value αc fails] = O(n^{-αδ})
• Sum over all α > 1
Algorithm
• RCA can enumerate all α-minimum cuts with high probability in O*(n^{2α}) time.
• Given the α-minimum cuts, can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^{2α}))
• Corollary: when FAIL(p) < n^{-(2+δ)}, can ε-approximate it in O*(cn^{2+4/δ}) time
Combine
• For large FAIL(p), naïve Monte Carlo
• For small FAIL(p), RCA/DNF counting
• Balance: ε-approximation in O*(mn^{3.5}/ε²) time
• Implementations show this is practical for hundreds of nodes
• Again, no way to verify correctness
Summary
• Naïve Monte Carlo simulation works well for common events
• Need to adapt for rare events
• Cut structure and DNF counting let us do this for network reliability
Conclusions
Conclusion
• Randomization is a crucial tool for algorithm design
• Often yields algorithms that are faster or simpler than traditional counterparts
• In particular, gives significant improvements for core problems in graph algorithms
Randomized Methods
• Random selection
» if most candidate choices "good", then a random choice is probably good
• Monte Carlo simulation
» simulations estimate event likelihoods
• Random sampling
» generate a small random subproblem
» solve, extrapolate to whole problem
• Randomized Rounding for approximation
Random Selection
• When most choices are good, make one at random
• Recursive contraction algorithm for minimum cuts
» Extremely simple (also to implement)
» Fast in theory and in practice [CGKLS]
Monte Carlo
• To estimate event likelihood, run trials
• Slow for very rare events
• Bias samples to reveal rare event
• FPRAS for network reliability
Random Sampling
• Generate representative subproblem
• Use it to estimate solution to whole
» Gives approximate solution
» May be quickly repaired to exact solution
• Bias sample toward "important" or "sensitive" parts of problem
• New max-flow and min-cut algorithms
Randomized Rounding
• Convert fractional to integral solutions
• Get approximation algorithms for integer programs
• "Sampling" from a well designed sample space of feasible solutions
• Good approximations for network design
Generalization
• Our techniques work because undirected graphs are matroids
• All our results extend/are special cases
» Packing bases
» Finding minimum "quotients"
» Matroid optimization (MST)
Directed Graphs?
• Directed graphs are not matroids
• Directed graphs can have lots of minimum cuts
• Sampling doesn't appear to work
Open problems
• Flow in O(n²) time
» Eliminate v dependence
» Apply to weighted graphs with large flows
» Flow in O(m) time?
• Las Vegas algorithms
» Finding good certificates
• Deterministic algorithms
» Deterministic construction of "samples"
» Deterministically compress a graph
Randomization in
Graph Optimization Problems
David Karger
MIT
http://theory.lcs.mit.edu/~karger
[email protected]