Random Sampling in Matroids, with Applications to Graph
Connectivity and Minimum Spanning Trees.
David R. Karger*
Department of Computer Science
Stanford University
[email protected]
Abstract

Random sampling is a powerful way to gather information about a group by considering only a small part of it. We give a paradigm for applying this technique to optimization problems, and demonstrate its effectiveness on matroids. Matroids abstractly model many optimization problems that can be solved by greedy methods, such as the minimum spanning tree (MST) problem.

Our results have several applications. We give an algorithm that uses simple data structures to construct an MST in O(m + n log n) time (Klein and Tarjan [21] have recently shown that a better choice of parameters makes this algorithm run in O(m + n) time). We give bounds on the connectivity (minimum cut) of a graph suffering random edge failures. We give fast algorithms for packing matroid bases, with particular attention to packing spanning trees in graphs.

1 Introduction

1.1 Representative sampling

Arguably the central concept of statistics is that of a representative sample. It is often possible to gather a great deal of information about a large population by examining a small sample randomly drawn from it. This has obvious advantages in reducing the investigator's work, both in gathering and in analyzing the data.

We apply the concept of a representative sample to combinatorial optimization. Given an optimization problem, it may be possible to generate a small random representative subproblem. Intuitively, such a subproblem should form a microcosm of the larger problem. In particular, an optimal solution to the subproblem may be a nearly optimal solution to the problem as a whole. Furthermore, it may be relatively quick and easy to improve this good solution to a solution for the entire problem.

Floyd and Rivest [14] use this approach in a fast and elegant algorithm for finding the median of an ordered set. They select a small random sample from the set; inspecting this sample gives a very accurate estimate of the value of the median. It is then easy to find the actual median by examining only those elements close to the estimate. This algorithm uses fewer comparisons than any other known median-finding algorithm.

*Supported by a National Science Foundation Graduate Fellowship, NSF Grant CCR-9010517, NSF Young Investigator Award CCR-9357849, and grants from Mitsubishi Corporation and OTL.

1.2 Matroids and the greedy algorithm

Matroids provide a demonstration of the effectiveness of our random sampling approach. The matroid is a powerful abstraction which generalizes both graphs and vector spaces. A matroid M consists of a ground set M of which some subsets are declared to be independent. The independent sets must satisfy three properties:

- The empty set is independent.
- All subsets of an independent set are independent.
- If U and V are independent and |U| > |V|, then some element of U can be added to V to yield an independent set.

This definition clearly generalizes the notion of linear independence in vector spaces; indeed this was the
0272-5428/93 $03.00 © 1993 IEEE
ification the elements of M can be examined in any order. Furthermore, the basis which must be verified is static. Extensive study of dynamic algorithms has demonstrated that they tend to be significantly more complicated than their static counterparts; in particular, algorithms on a static input can preprocess the input so as to accelerate queries against it.
Consider, for example, the problem of verifying an MST on an m-edge, n-vertex graph. This problem can be solved in linear time [9], and has simple and relatively practical O(m + n log n)-time solutions based on least common ancestor queries [2, 5, 28]. In contrast, the problem of constructing an MST had no linear-time solution until the techniques of this paper were applied [21], and all O(m + n log n)-time solutions are relatively complicated [15, 30]. The best of them [16] runs in O(m log β(m,n)) time, where β(m,n) = min{i | log^(i) n ≤ m/n}. References to other construction and verification algorithms can be found in [9].
We demonstrate that representative sampling is useful in matroid optimization. We show that a random sample of a 1/k fraction of the elements of a matroid will likely contain one of the k best matroid bases (in a sense to be formalized later). As an immediate application, this gives a simple algorithm for finding an approximately optimal basis: choose a random sample of the elements, and construct its optimal basis. We provide a precise description of the time-accuracy tradeoff in this approach. In particular, we give an extremely simple approximate-MST algorithm which runs in linear time.
We also give a simple reduction which uses verification to exactly solve the optimization problem. The
approach is to construct the optimal solution of a representative sample, to use verification to determine
how the solution fails to be optimal for the entire problem, and then to improve it so that it becomes optimal
for the entire problem. Since the solution to the representative sample is good, the improvement takes little
additional time. Because the verification problem may
be easier to solve in either a theoretical or a practical sense, this can yield both theoretical and practical
improvements to optimization algorithms.
Applying this reduction yields a practical MST algorithm which runs in O(m + n log n) time. Such a time bound was previously achieved only by using the theoretically important but somewhat impractical Fibonacci heap data structure of Fredman and Tarjan [15]. This paradigm of improving on the good solution of a representative subproblem may have applications to other optimization problems as well.
first use of matroids [31]. However, it was quickly noted [33] that matroids also generalize graphs: in the graphic matroid the edges of the graph form the ground set, and the independent sets are the acyclic sets of edges (forests). Maximal independent sets of a matroid are called bases; bases in a vector space are the standard ones, while bases in a graph are the spanning forests (spanning trees, if the graph is connected). In the matching matroid of a graph [24], bases correspond to maximum matchings.
Matroids have rich structure and are the subject of
much study in their own right [32]. Matroid theory
is used to solve problems in electrical circuit analysis
and structural rigidity [26]. In computer science, perhaps the most natural problem involving matroids is
matroid optimization. If a weight is assigned to each
element of a matroid, and the weight of a set is defined
as the sum of its elements’ weights, the optimization
problem is to find a basis of minimum weight. The
minimum spanning tree (MST) problem is the matroid
optimization problem on the graphic matroid just described. Numerous other problems can also be formulated as instances of matroid optimization [8, 24].
Edmonds [11] was the first to observe that the matroid optimization problem can be solved by the following natural greedy algorithm. Begin with an empty
independent set G, and consider the matroid elements
in order of increasing weight. Add each element to
G if doing so will keep G independent. Applying
the greedy algorithm to the graphic matroid yields
Kruskal’s algorithm [23] for minimum spanning trees:
grow a forest by repeatedly adding to the forest the
minimum weight edge which does not form a cycle
with edges already in the forest. An interesting converse result [32] is that if a family of sets does not form
a matroid, then there is an assignment of weights to
the elements for which the greedy algorithm will fail
to find an optimal set in the family.
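Instantiated on the graphic matroid, this greedy procedure is exactly Kruskal's algorithm. The following minimal sketch (illustrative, not from the paper) uses a union-find structure as the independence test:

```python
# Matroid greedy algorithm specialized to the graphic matroid (Kruskal).
# Independence test: an edge set stays independent (acyclic) iff the new
# edge joins two distinct union-find components.

def greedy_spanning_forest(n, weighted_edges):
    """weighted_edges: list of (weight, u, v) with vertices in range(n)."""
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []  # the independent set grown by the greedy algorithm
    for w, u, v in sorted(weighted_edges):  # elements in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge keeps the set independent
            parent[ru] = rv
            forest.append((w, u, v))
    return forest
```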
The greedy algorithm has two drawbacks. First, the
elements of the matroid must be examined in order of
weight. Thus the matroid elements must be sorted,
forcing an Ω(m log m) lower bound on the running
time of the greedy algorithm. Second, the independent set under construction is constantly changing, so
that the problem of determining independence of elements is a dynamic one.
1.3 Optimizing by sampling and verifying
Contrast the optimization problem with that of verifying the optimality of a given basis. For matroids,
all that is necessary is to verify that no single element
of the matroid M “improves” the basis. Thus in ver-
1.4 Basis packing
divide-and-conquer parameter used by our MST algorithm was not optimal. Using the correct parameter value makes the MST algorithm presented here run in linear time. Their improved analysis also makes it possible to simplify some of the details of the algorithm.
Reif and Spirakis [27] studied random matroids. In
particular, they generalized existence proofs and algorithms for Hamiltonian paths and perfect matchings
in random graphs. However, their approach was to analyze the average case behavior of matroid algorithms
on random inputs, while our goal is to develop randomized algorithms which work well on all inputs.
In previous work [18] we gave a limited version of
our basis counting theorem which applies only to the
graphic matroid and determines only the existence,
rather than a count, of sampled bases.
Another matroid problem, which was first studied by Edmonds [10], is that of packing matroid bases, i.e. finding a maximum set of disjoint bases in a matroid. A simpler algorithm for the problem was given by Knuth [22]. Faster algorithms exist for the special case of the graphic matroid [17], where the problem is to find a maximum collection of disjoint spanning trees. A related problem is that of counting the total number of (non-disjoint) bases of a matroid. This #P-complete generalization of the problem of counting perfect matchings in a graph has been the focus of much recent work; see for example [13].
We apply random sampling to the basis packing problem. Let the packing number of a matroid be the maximum number of disjoint bases in it. We show that a random sample of 1/k of the elements from a matroid with packing number n has a packing number very close to n/k. This yields fast and simple algorithms for estimating the packing number of a matroid. We also use sampling to accelerate algorithms for finding packings, using again the idea that a good solution (packing) in random subproblems can quickly be improved to an optimum packing of the whole. There has been a great deal of work on packing problems in numerous settings [29]; our approach of packing random subproblems might yield improved results in some of them.
We can also relate our sampling theorems to the very successful study of random graphs [3]. Our results generalize to matroids the result which initiated the study of random graphs, namely the work of Erdős and Rényi [12] which determined the probability threshold at which a random graph is likely to become connected. Although the study of random graphs focuses mainly on the complete graph, every graph determines a corresponding graphic matroid. Furthermore, the connectivity of a graph is closely related to the number of disjoint spanning trees it contains [7]. Our matroid result therefore yields a simple characterization of the likelihood of any undirected graph to remain connected under random edge failures. By bounding the number of bases in a random sample of a matroid, we also estimate the expected connectivity (minimum cut) of a graph with random edge failures.
2 Definitions
Matroids were defined in the introduction; those looking for more detail may wish to consult [32]. A basis in a matroid is a maximal independent set. All bases in a matroid have the same size. We therefore define the rank of a matroid as the size of any matroid basis (so the rank of a connected graph is one less than the number of vertices in it). Throughout this paper, we will focus on a matroid M of rank r on a ground set M containing m elements.

Definition 2.1 Given a set T ⊆ M, the restriction of M to T, denoted M|T, is a matroid with ground set T and independent sets all independent sets of M contained in T.

Definition 2.2 S(p) is a set generated from S by including each element independently with probability p. For A ⊆ S, A(p) = A ∩ S(p).

It is important for our work that a restriction is indeed a matroid, a fact which follows easily from the definition of matroids. In particular, M(p) is a matroid.
1.5 Related work

Klein and Tarjan [21] have recently developed an elegant improved analysis of our main theorem on sampling from weighted matroids. They show that a

3 Weighted matroids

In this section we consider the result of sampling at random from a weighted matroid and formalize the following intuition: a random sample is likely to contain a nearly optimal basis. This shows that if we need to find a good but not necessarily optimal basis,
it suffices to choose a small random sample from the matroid and construct its optimal basis. This gives an obvious improvement in the running time; we will make the time-accuracy tradeoff precise. Extending this idea, once we have a good basis, we can use a verification algorithm to determine why it is not optimal. If improving this good basis to an optimum one is easy, then the problem of constructing an optimal basis is reduced to the problem of verifying a basis for optimality. This is the subject of Section 4.
To formalize the above statements, we make the
following definitions.
Definition 3.1 The optimum or greedy basis G(S) for S ⊆ M is the minimum weight basis which can be constructed from elements of S.

Thus G(S) is the basis which results from running the greedy algorithm on S. As noted in Definition 2.1, S is itself a restricted matroid, so the concept of an optimal basis in S makes sense and matroid algorithms can be applied to S to find it. The optimum basis of the matroid is simply G(M).

Definition 3.2 An element x improves an independent set A if x is in G(A ∪ {x}).

In other words, an element improves A if adding it changes the greedy basis of A. The elements of G(M) are easily seen to improve every independent set.

In the preliminary version of this paper, we proved the following theorem:

Theorem 3.1 With high probability, O(rk log r) elements of M improve G(M(1/k)).

We give a sketch of the proof of this theorem in Section 5.1. However, the discussion is abbreviated because Klein and Tarjan [21] have recently presented the following improvement, which is proved through a much more elegant analysis:

Theorem 3.2 (Klein-Tarjan) With high probability, O(rk) elements of M improve G(M(1/k)).

4 Application: optimizing by verifying

We use the results of the previous section to reduce the problem of constructing the optimum basis to the problem of verifying a basis to determine which elements improve it. This is useful because, as was discussed in the introduction, the verification task is likely to be easier than the construction task.

Some of the algorithms we define, in particular the MST algorithm, use a parameter i. The smaller i, the better the running times of our algorithms. However, i must be large enough to ensure that with high probability, at most ik elements of M improve G(M(1/k)). Our algorithms initially used Theorem 3.1 to set i = O(r log r); however, the new result of Klein and Tarjan shows that it is possible to set i = O(r) and thus improve our running times.

To begin with, suppose that we have two algorithms available: a construction algorithm which takes a matroid of m elements and rank r and constructs its optimum basis in time C(m, r); and a verification algorithm which takes an m-element matroid M of rank r and an independent set I and determines which elements of M improve I in time V(m, r). We show how to combine these two algorithms to yield a more efficient construction algorithm when V is faster than C.

Begin by sampling each element of M with probability 1/k and constructing the greedy basis G of the sample using C. With high probability the sample has size O(m/k), so that this takes C(m/k, r) time. Use the verification algorithm to find the set I of elements of M which improve G; this takes time V(m, r). Construct G(I); since G(M) ⊆ I, we know G(I) = G(M). By definition of i, I has size at most ki with high probability; thus this construction takes C(ki, r) time. The overall running time is thus V(m, r) + C(m/k, r) + C(ki, r). To balance, set m/k = ki, i.e. k = √(m/i). The running time then becomes

V(m, r) + 2C(√(mi), r).

This is a clear improvement when i is less than m. It should be noted that this new algorithm is just as practical as the original construction and verification algorithms were, since it consists simply of two construction calls and a verification call.
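An end-to-end toy rendering of this reduction on the graphic matroid (illustrative only): Kruskal's algorithm serves as the construction algorithm C, and a naive path-maximum search serves as the verifier. A real implementation would substitute one of the O(m + n log n) verification schemes cited in the introduction.

```python
import random
from collections import defaultdict

def kruskal(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def improves(forest, edge):
    """Naive verifier: edge (w,u,v) improves the forest if it joins two
    components, or is lighter than the heaviest edge on the u-v path."""
    w, u, v = edge
    adj = defaultdict(list)
    for fw, a, b in forest:
        adj[a].append((b, fw))
        adj[b].append((a, fw))
    stack, seen = [(u, 0.0)], {u}
    while stack:
        x, mx = stack.pop()   # mx: max edge weight on the path from u to x
        if x == v:
            return w < mx
        for y, fw in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, max(mx, fw)))
    return True  # endpoints lie in different components

rng = random.Random(1)
n = 50
edges = [(rng.random(), u, v) for u in range(n) for v in range(u + 1, n)]
sampled = [e for e in edges if rng.random() < 0.5]
F = kruskal(n, sampled)                     # greedy basis of the sample
Fset = set(F)
I = F + [e for e in edges if e not in Fset and improves(F, e)]
mst = kruskal(n, I)                         # G(I) = G(M): the true MST
```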
At the cost of some additional complexity, we can improve the running time further by applying the new construction algorithm recursively. Choose a random sample of half the matroid elements, and recursively find the greedy basis G of that sample. Then verify the remaining half of the matroid elements against G. By definition of i, the set I of elements which improve G has size at most 2i with high probability. The optimum basis of M is then the greedy basis of G ∪ I and can be found in C(2i, r) time. This leads to the recurrence

C′(2i, r) = C(2i, r),
C′(m, r) = C′(m/2, r) + V(m/2, r) + C(2i, r).

Given the natural assumption that V(m, r) = Ω(m) (since every element must be examined), this recurrence solves to
C′(m, r) = O(V(m, r) + C(i, r) log(m/i)).

Since the depth of recursion is logarithmic in m, the same algorithm can be used to reduce the processor cost of NC algorithms for matroid optimization problems; the analysis is unchanged if we let V and C denote processor costs for verification and construction rather than running times.

4.1 Minimum spanning trees

As a concrete example, consider the MST problem. This is of course the matroid optimization problem on the graphic matroid, which has rank n − 1 in a connected n-vertex graph. To match this accepted terminology, we let ñ play the role of i from the previous section.

There are many well known MST algorithms which run in O(m log n) time [4, 25, 23, 30, 1]. We use random sampling to improve this bound. The following may be useful because of the simple and practical way it can be applied.

Lemma 4.1 In linear time, it is possible to construct a spanning tree which is minimal for all but O(ñ log n) edges of a graph.

Proof: Sample a 1/log n fraction of the edges of the graph and construct their MST in linear time using any one of the standard O(m log n)-time MST algorithms. If the resulting forest is not spanning, add arbitrary additional edges to make it spanning. □

We now turn to the problem of finding the actual MST. There are several simple and practical verification schemes for minimum spanning trees which run in O(m + n log n) time [2, 28, 5]. Using an O(m log n) time construction algorithm and applying the construction-to-verification transformation yields a practical algorithm for the MST problem. If we use the non-recursive formulation of the matroid algorithm, we get a running time of

O(m + √(mñ) log n) = O(m + ñ log² n).

The recursive formulation yields a running time of

O(m + ñ log n log(m/ñ)) = O(m + ñ log n log log n).

An additional modification, which does not appear extensible to general matroids, allows us to improve this running time to O(m + ñ). It is based on the first MST algorithm, by Borůvka ([4], see also [30, 1]). Borůvka shows that solving an m-edge, n-vertex MST problem can be reduced in O(m) time to solving an m-edge, (n/2)-vertex problem by having each vertex identify and contract its smallest incident edge (all such edges will be in the MST).

We use this reduction to improve T(m, n), the time needed to find an MST in a graph with m edges and n vertices. Given the graph G, run two Borůvka iterations to reduce to n/4 vertices, and then continue the Borůvka iterations until the graph satisfies m > 4ñ (note that ñ changes as n does). The total amount of work this requires is O(m + 4(ñ/4) + 4(ñ/8) + ...) = O(m + ñ). Then sample half the remaining edges and recursively construct the MST G of the sample. This takes at most T(m/2, n/4) time. Then verify the unsampled edges against the sample; this takes V(m) time. With high probability, at most ñ < m/4 edges will improve G. Thus to find the actual MST of the remainder we need only use T(m/4, n/4) time. In other words,

T(m, n) ≤ m + ñ + V(m) + T(m/2, n/4) + T(m/4, n/4).

By substitution, T(m, n) = O(V(m) + ñ). Using one of the simple O(m + n log n)-time algorithms for verification and the ñ = O(n log n) parameter value originally determined here gives a practical O(m + n log n)-time MST algorithm. Using the linear time verification algorithm of [9] and the Klein-Tarjan bound of ñ = O(n) gives a linear-time MST algorithm.

We can also apply the sampling paradigm to find an MST in parallel. It is possible to apply the (m/log n + n^(1+ε))-processor, O(log n)-time EREW PRAM connected components algorithm of [19] to yield an MST algorithm which uses m^(1+ε) processors and has the same time bounds. Using the verification paradigm, we improve the processor cost to m/log n + n^(1+ε), yielding an EREW MST algorithm which is optimal in time and processor costs on dense graphs.

5 Sampling to find a basis

We now turn to the second topic of this paper. We consider an unweighted matroid and study the way bases arise in a random sample from the matroid. This section generalizes and extends a well known fact proved by Erdős and Rényi [12, 3], namely that if a random graph on n vertices is generated by including each edge independently with probability exceeding (ln n)/n, then the graph is likely to be connected.
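The connectivity threshold mentioned above is easy to observe empirically; a small illustrative experiment (all parameters chosen arbitrarily):

```python
import math
import random

def sample_connected(n, p, rng):
    """Sample each edge of the complete graph K_n independently with
    probability p and test whether the sample is connected (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    comps = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1
    return comps == 1

rng = random.Random(0)
n = 200
above = sum(sample_connected(n, 3.0 * math.log(n) / n, rng) for _ in range(20))
below = sum(sample_connected(n, 0.2 * math.log(n) / n, rng) for _ in range(20))
```

Well above the (ln n)/n threshold the sample is essentially always connected; well below it, isolated vertices almost surely disconnect it.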
Rephrased in the language of graphic matroids, if the edges of a complete graph on n vertices are sampled with the given probability, then the sampled edges are likely to contain a spanning tree, i.e. a basis. We generalize this result to arbitrary matroids.

The result of this section is merely a special case of Theorem 6.1, which actually counts the number of bases which survive. However, this section provides the intuition and demonstrates the techniques which will be used in the proof of that more difficult theorem. We begin with some definitions needed in the proof.

Definition 5.1 The rank of A ⊆ M, denoted ρA, is the size of the largest independent subset of A.

Definition 5.2 A set A spans an element x if ρA = ρ(A ∪ {x}). The span of a set A, denoted σA, is the set of elements spanned by A. A spans B if B ⊆ σA. If A ⊆ B then σA ⊆ σB. If A spans B and B spans C, then A spans C.

Definition 5.3 B(n,p) is a binomial random variable: Pr[B(n,p) = k] = (n choose k) p^k (1 − p)^(n−k).

examining the sets B_i(p) one at a time and adding some of their elements to an independent set I (initially empty) until I is large enough to be a basis. We invert the problem by asking how many bases must be examined before I becomes a basis. Suppose we determine U = B_1(p), the set of elements of B_1 contained in M(p). Note that the size u of U is distributed as B(r,p); thus E[u] = rp. Consider the contraction M/U. By Lemma 5.1, this matroid contains disjoint bases B_2/U, B_3/U, ..., and has rank r − u. We ask recursively how many of these bases we need to examine to construct a basis B for the contracted matroid. Once we have done so, we know from Lemma 5.1 that U ∪ B is a basis for M. This gives a probabilistic recurrence for the number of bases we need to examine:

T(r) = 1 + T(r − u),    u = B(r,p).

If we replaced random variables by their expected values, we would get a recurrence of the form S(r) = 1 + S((1 − p)r), which solves to S(r) = log_b r, where b = 1/(1 − p). Probabilistic recurrences are studied by Karp in [20]. His first theorem exactly describes our recurrence, and proves that for any a,

Pr[T(r) ≥ ⌈log_b r⌉ + a + 1] ≤ (1 − 1/k)^a.

In our case, log_b r ≈ k ln r. □
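The probabilistic recurrence above can also be simulated directly. In this illustrative sketch (sample sizes arbitrary), the observed number of examined bases should sit near log_b r, i.e. roughly k ln r:

```python
import random

def bases_examined(r, p, rng):
    """Simulate T(r) = 1 + T(r - u) with u ~ B(r, p): the number of
    disjoint bases examined before the sample completes a rank-r basis."""
    count = 0
    while r > 0:
        u = sum(1 for _ in range(r) if rng.random() < p)  # u ~ B(r, p)
        r -= u
        count += 1
    return count

rng = random.Random(0)
k, r = 4, 1000
avg = sum(bases_examined(r, 1.0 / k, rng) for _ in range(200)) / 200
# for p = 1/4, log_b r = ln(1000) / ln(4/3), about 24
```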
A corollary to this theorem is the performance of the following natural randomized incremental algorithm for constructing a matroid basis. Start with an empty set. Examine elements in random order, and add them to the set if they are independent of it, until the set forms a basis.

Corollary 5.3 The randomized incremental algorithm for constructing a basis requires O(m/k) independence tests on a matroid with k disjoint bases.

The concept of a contracted matroid is well known in matroid theory; however, we use slightly different terminology. For the following definitions, fix some independent set T in M.

Definition 5.4 A set A is T-independent, or independent of T in M, if A ∪ T is independent.

Definition 5.5 The contraction of M by T, denoted M/T, is the matroid on M whose independent sets are all the T-independent sets of M.

Definition 5.6 A/T is any maximal T-independent subset of A.
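On the graphic matroid, the randomized incremental algorithm of Corollary 5.3 looks as follows (an illustrative sketch, not the paper's implementation); on a dense graph, which packs many disjoint spanning trees, the scan stops well before examining every edge:

```python
import random

def incremental_basis(n, edges, rng):
    """Scan edges in random order, keeping each edge that is independent of
    (creates no cycle with) the edges kept so far; stop once the kept set
    is a spanning tree, i.e. a basis of the graphic matroid."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    order = edges[:]
    rng.shuffle(order)
    basis, tests = [], 0
    for u, v in order:
        tests += 1
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            basis.append((u, v))
            if len(basis) == n - 1:  # the set is now a basis
                break
    return basis, tests

rng = random.Random(0)
n = 30
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
basis, tests = incremental_basis(n, edges, rng)
```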
Lemma 5.1 If A is a basis of M, then A/T is a basis of M/T. If B is a basis of M/T, then B ∪ T is a basis of M.

Theorem 5.2 Suppose M contains a + 2 + k ln r disjoint bases. Then M(1/k) contains a basis for M with probability 1 − e^(−a/k).

In other words, once the number of bases exceeds a threshold of k ln r, the probability that no basis appears in the sample decreases geometrically to 0.

Proof: Let p = 1/k, and let {B_i}, 1 ≤ i ≤ a + 2 + k ln r, be disjoint bases of M. We construct the basis in M(p) by

5.1 Weighted Matroids

The above proof can be adapted to prove Theorem 3.1. Instead of analyzing an arbitrary collection of bases, we define a sequence of bases of "small" weight: B_1 = G(M), B_2 = G(M − B_1), B_3 = G(M − (B_1 ∪ B_2)), and so on. Let S^{<x} denote the elements smaller than x in a set S. By construction, B_i^{<x} spans B_{i+1}^{<x}. Suppose x ∈ B_j. This means B_1^{<x} spans B_2^{<x}, which spans B_3^{<x}, and so on until B_{j−1}^{<x}, which spans x. Applying the analysis of Theorem 5.2 to the sets B_i^{<x} proves that if j > 2k ln r, then M(1/k) is likely to contain a set of elements smaller than x which span x. If this is so, then x does not improve G(M(1/k)). This proves that only the elements in the first 2k ln r bases improve G(M(1/k)). However, the first 2k ln r bases contain at most 2rk ln r elements. This gives Theorem 3.1.
I_n^k: independent set so far.
R_n^k: remainder of the nth basis.
E_n^k: elements examined for use.
U_n^k: elements actually used from E_n^k, namely E_n^k(p).

6 Counting bases
We now extend the ideas of Section 5 to get a stronger result: namely, an estimate of the number of disjoint bases which will be contained in a random sample of the matroid. As before, we assume that the sample is constructed by including each matroid element in the sample with probability p.
Figure 1: Variables describing nth basis in kth phase
Definition 6.1 The packing number P(M) for a matroid M is the maximum number of disjoint bases in it.

Theorem 6.1 If P(M) = n, then the probability that M(p) fails to contain k disjoint bases of M is at most r · Pr[B(n, p) ≤ k].

Proof: We generalize the technique of Section 5. We line up the bases {B_n} and pass through them one by one, adding some of the sampled elements from each basis to an independent set I which grows until it is itself a basis. For each B_n, we set aside some of the elements because they are dependent on elements already added to I; we then examine the remaining elements of B_n to find out which ones were actually sampled, and add those sampled elements to I. The change in the procedure is that we do this more than once: the next time, we examine those elements set aside the first time.

Consider a series of phases; in each phase we will construct one basis. At the beginning of phase k, there will be a remaining portion R_n^k of B_n; the elements of R_n^k are those elements of B_n which were not examined in any of the previous phases. We construct an independent set I^k by processing each of the R_n^k in order. Let I_{n-1}^k be the portion of I^k that we have constructed before processing R_n^k. To process R_n^k, we split it into two sets: R_n^{k+1} are those elements which are set aside until the next phase, while E_n^k = R_n^k − R_n^{k+1} is the set of elements we examine in this phase. The elements of E_n^k will be independent of I_{n-1}^k. Thus, as in the single-basis case, we simply check which elements of E_n^k are in the sampled set, identifying the set U_n^k = E_n^k(p) of elements we use, and add them to our growing basis. Formally, we let I_n^k = I_{n-1}^k ∪ U_n^k; by construction I_n^k is independent.

We now explain precisely how we determine the split of R_n^k into R_n^{k+1} and E_n^k. Let r_n^k, i_n^k, e_n^k, and u_n^k be the sizes of R_n^k, I_n^k, E_n^k, and U_n^k respectively. Suppose that we have I_{n-1}^k in hand, and wish to extend it by examining elements of R_n^k. We assume by induction that i_{n-1}^k ≤ r_n^k. It follows from the definition of matroids that there must exist a set E_n^k ⊆ R_n^k such that I_{n-1}^k ∪ E_n^k is independent and has size r_n^k. Defining E_n^k this way determines R_n^{k+1} = R_n^k − E_n^k. We then set U_n^k = E_n^k(p) and I_n^k = I_{n-1}^k ∪ U_n^k.

To justify our inductive assumption we use induction on k. To prove it for k + 1, note that our construction makes r_n^{k+1} = i_{n-1}^k. Thus the fact that i_{n-1}^{k+1} ≤ i_{n-1}^k implies that i_{n-1}^{k+1} ≤ r_n^{k+1}, as desired.

We now use the just noted invariant r_n^{k+1} = i_{n-1}^k to derive recurrences for the sizes of the various sets. As before, the recurrences will be probabilistic in nature. Recall that u_n^k is the size of U_n^k, so u_n^k = B(e_n^k, p). Thus

r_n^{k+1} = i_{n-1}^k = i_{n-2}^k + u_{n-1}^k = r_{n-1}^{k+1} + B(e_{n-1}^k, p).

It follows that

e_n^k = r_n^k − r_n^{k+1} = [r_{n-1}^k + B(e_{n-1}^{k-1}, p)] − [r_{n-1}^{k+1} + B(e_{n-1}^k, p)] = e_{n-1}^k − B(e_{n-1}^k, p) + B(e_{n-1}^{k-1}, p).

Now let f_n^k = E[e_n^k]. Linearity of expectation applied to the recurrence shows that

f_n^k = (1 − p) f_{n-1}^k + p f_{n-1}^{k-1}.

Since we examine the entire first basis in the first phase, e_1^0 = r and e_1^k = 0 for k > 0. Therefore this recurrence is solved by

f_n^k = (n−1 choose k) p^k (1 − p)^{n−1−k} r.

We now ask how big n needs to be to give us a basis in the kth phase. As in Section 5, it simplifies matters
to assume that we begin with an infinite set of disjoint bases, and ask for the value of n such that in the kth phase, we finish constructing the kth sampled basis I^k before we reach the nth original basis B_n. Recall the variable u_n^k denoting the number of items from B_n used in I^k. Suppose that in the kth phase we use no elements from any basis after B_n. One way this might happen is if we never finish constructing I^k. However, this is a probability 0 event. The only other possibility is that we have finished constructing I^k by the time we reach B_n, so that we examine no more bases.

It follows that if u_j^k = 0 for every j ≥ n, then we must have finished constructing I^k before we examined B_n. Since the u_j^k are non-negative, this is equivalent to saying that Σ_{j≥n} u_j^k = 0. It follows that our problem can be solved by determining the value of n such that Σ_{j≥n} u_j^k = 0 with high probability.

From the Markov inequality, which says that for positive integer random variables Pr[X > 0] ≤ E[X], and from the fact that E[u_j^k] = p E[e_j^k] = p f_j^k, we deduce that the probability that we fail to construct I^k before reaching B_n is at most

s_n^k = Σ_{j≥n} p f_j^k.

To bound s_n^k, we can sum by parts to prove that

s_n^k = r · Pr[B(n, p) ≤ k].

This yields the theorem. □

The probability of finding no bases is thus at most s_n^0 = r · Pr[B(n, p) = 0] ≤ r e^{−np}; this is exactly the result proven in Section 5.

We also consider the converse problem, namely to upper bound the number of bases which survive. This analysis is relatively easy thanks to the following packing theorem due to Edmonds [10]. Let A̅ denote M − A.

Theorem 6.2 (Edmonds) A matroid M on M with rank r has n disjoint bases if and only if n ρ(A) + |A̅| ≥ nr for every A ⊆ M.

Proof (of the upper bound): Since P(M) = n, the matroid has no n + 1 disjoint bases, so by Edmonds' theorem there must exist some A ⊆ M such that

(n + 1) ρ(A) + |A̅| < (n + 1) r.

It is straightforward to prove that with the desired probability,

(k + 1) ρ(A(p)) + |A̅(p)| < (k + 1) r.

In other words, A(p) demonstrates through Edmonds' Theorem that M(p) contains at most k bases. □

Applying the Chernoff bound [6] to the previous two theorems yields the following:

Theorem 6.4 Given a matroid M of rank r and packing number n, and given p, let n′ be the number of bases of M in M(p). Then

Pr[|n′ − np| > εnp] < r e^{−ε²np/4}.

6.1 Application: network reliability

Consider the graphic matroid on G. The bases of the graphic matroid are the spanning trees of the graph. A theorem of Polesskii [7] shows that a graph with minimum cut c contains between c/2 and c disjoint spanning trees. In [18] we proved Theorem 5.2 for the special case of the graphic matroid. Theorem 6.1 yields a more general result:

Corollary 6.5 If a graph G contains k disjoint spanning trees, then with high probability G(p) contains between kp − O(√(kp log n)) and kp + O(√(kp log n)) disjoint spanning trees.

Corollary 6.6 If a graph G has minimum cut c and p is such that pc = Ω(log n), then G(p) has minimum cut O(pc).

7 Packing bases

Our results can be applied to the basis packing problem. The basis packing problem is to find a maximum disjoint collection of bases in a matroid M. The first algorithm for this problem was given by Edmonds [10], and there have been numerous improvements.

Many matroid packing algorithms use a concept of augmenting paths: the algorithms repeatedly augment a collection of independent sets until they form the desired number of bases. These augmentation steps generally have running times dependent on the matroid size m, the matroid rank r, and k, the number
Corollary 6.3 If P ( M ) 5 n, and k > np, then the
probability that M ( p ) contains more than k disjoint
bases of M is at most Pr[B(n,p)2 k ] .
91
of bases already found. For example, Knuth's algorithm [22] finds an augmenting path in $O(mrk)$ time and can therefore find a set of $k$ disjoint bases of total size $rk$ (if they exist) in $O(mr^2k^2)$ time.

One application of sampling is to estimate the packing number of a matroid. If we find $p$ such that $M(p)$ has packing number $k = \Theta(\log^2 n)$, then we can be sure that $M$ has packing number roughly $k/p$. Thus with Knuth's algorithm and binary search on $p$, we can estimate the packing number $k$ to within a constant factor in $\tilde{O}(mr^2/k)$ time.
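This estimator can be illustrated on a toy example (the partition matroid and the function names below are our own illustration, not the paper's construction). In a partition matroid whose ground set consists of groups of parallel elements, a basis takes one element from each group, so the packing number is simply the smallest group size; sampling each element with probability $p$ and dividing the sampled packing number by $p$ then recovers the true packing number to within a constant factor:

```python
import random

def sampled_packing_number(group_sizes, p, rng):
    # Packing number of a partition matroid = smallest group size.
    # After keeping each element independently with probability p, each
    # group's size becomes a Binomial(size, p) random variable.
    return min(sum(1 for _ in range(size) if rng.random() < p)
               for size in group_sizes)

rng = random.Random(42)
k = 400                     # true packing number: every group has k elements
groups = [k] * 30           # 30 groups of parallel elements
p = 0.25
estimate = sampled_packing_number(groups, p, rng) / p
assert 0.5 * k < estimate < 1.5 * k   # constant-factor estimate
```

When $kp$ is well above the logarithm of the number of groups, the concentration of Theorem 6.4 keeps the sampled packing number near $kp$, which is what makes such an estimate reliable.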
We can extend this approach to find a maximum packing of bases. Suppose that the matroid contains $k$ disjoint bases. We can randomly partition the matroid into two groups. We can apply the sampling theorem to each group, with $p = 1/2$, to deduce that with high probability each group contains $k/2 - O(\sqrt{k \log m})$ disjoint bases. We recursively run the packing algorithm on each group to find each subcollection of bases, and join them to yield a set of $k - O(\sqrt{k \log m})$ bases. The benefit is that this packing was found by examining smaller matroids. We now augment the packing to $k$ bases using Knuth's algorithm; this takes $O(mr^2k^{3/2})$ time and is the dominant term in the running time of our algorithm; thus we improve the running time of Knuth's algorithm by an $O(\sqrt{k})$ factor.
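The split-and-augment scheme above can be sketched in code. The sketch below instantiates it on a partition matroid, where both the base-case packing and the final augmentation reduce to trivial greedy steps; the helper names and the partition-matroid choice are ours, purely to show the control flow, and the greedy augmentation stands in for Knuth's algorithm:

```python
import random

def direct_pack(elements, groups):
    # Greedy base case for a partition matroid: bucket elements by group;
    # basis i takes the i-th element of every bucket, so the number of
    # bases formed equals the smallest bucket size.
    by_group = {g: [] for g in groups}
    for eid, g in elements:
        by_group[g].append(eid)
    k = min(len(bucket) for bucket in by_group.values())
    return [{g: by_group[g][i] for g in groups} for i in range(k)]

def pack_bases(elements, groups, rng):
    # Split-and-augment skeleton: randomly halve the ground set (p = 1/2),
    # pack each half recursively, join the two packings, then augment the
    # result using the elements no basis has used yet.
    if len(elements) <= 4 * len(groups):
        return direct_pack(elements, groups)
    rng.shuffle(elements)
    mid = len(elements) // 2
    packing = (pack_bases(elements[:mid], groups, rng)
               + pack_bases(elements[mid:], groups, rng))
    used = {eid for basis in packing for eid in basis.values()}
    unused = [(eid, g) for eid, g in elements if eid not in used]
    return packing + direct_pack(unused, groups)

rng = random.Random(7)
groups = list(range(6))
elements = [(i, i % 6) for i in range(6 * 50)]  # 50 elements in each group
packing = pack_bases(elements, groups, rng)
assert len(packing) == 50                       # a maximum packing
```

Because every basis of a partition matroid uses exactly one element per group, greedily packing the unused elements after the join always completes a maximum packing here; in a general matroid this final step is where the augmenting-path algorithm does its real work.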
Gabow and Westermann [17] study the problem of packing spanning trees in the graphic matroid, and give algorithms which are significantly faster than the one just presented (they use special properties of graphs, and are based on an analogue of blocking flows rather than augmenting paths). Combining their algorithm with our sampling techniques, we can estimate the number of disjoint spanning trees in a graph to within any constant factor in $O(m^{3/2})$ time. It is an open problem to combine our sampling approach with their algorithm to find optimum packings faster than they already do.
8 Conclusion

This paper has suggested a new approach to matroids and given results which apply to matroids as models for greedy algorithms and as combinatorial objects. Two future directions suggest themselves.

In the realm of optimization, we have suggested a new paradigm which works particularly well for matroid greedy algorithms: generate a small random representative subproblem, solve it quickly, and use the information gained to home in on the solution to the entire problem. In particular, an optimal solution to the subproblem may be a good solution to the original problem which can quickly be improved to an optimal solution. The obvious open question: apply this paradigm to other optimization problems.

In the realm of combinatorics, how much of the theory of random graphs can be extended to the more general matroid model? There is a well defined notion of connectivity in matroids [32]; is this relevant to the basis packing results presented here? What further insight into random graphs can be gained by examining them from a matroid perspective? The result of Erdős and Rényi provided a tight threshold of $p = (\ln n)/n$ for connectivity in random graphs, whereas our result gives a looser bound of $\Omega((\log n)/n)$. Is there a 0-1 law for bases in a matroid?

9 Acknowledgements

Thanks to Don Knuth for help with a troublesome summation, and to Daphne Koller for her comments. Thanks to Philip Klein and Bob Tarjan for permitting reference to their work in progress.

References

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.

[2] N. Alon and B. Schieber. "Optimal preprocessing for answering online product queries". Technical report, Tel Aviv University, 1987.

[3] B. Bollobás. Random Graphs. Harcourt Brace Jovanovich, 1985.

[4] O. Borůvka. "O jistém problému minimálním". Práce Moravské Přírodovědecké Společnosti, 3:37-58, 1926.
[5] B. Chazelle. "Computing on a free tree via complexity-preserving mappings". Algorithmica, 2:337-361, 1987.

[6] H. Chernoff. "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations". Annals of Mathematical Statistics, 23:493-507, 1952.

[7] C. J. Colbourn. The Combinatorics of Network Reliability, volume 4 of The International Series of Monographs on Computer Science. Oxford University Press, 1987.

[8] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

[9] B. Dixon, M. Rauch, and R. E. Tarjan. "Verification and sensitivity analysis of minimum spanning trees in linear time". SIAM Journal on Computing, 21(6):1184-1192, 1992.

[10] J. Edmonds. "Minimum partition of a matroid into independent subsets". Journal of Research of the National Bureau of Standards, 69B:67-72, 1965.

[11] J. Edmonds. "Matroids and the greedy algorithm". Mathematical Programming, 1:127-136, 1971.

[12] P. Erdős and A. Rényi. "On random graphs I". Publ. Math. Debrecen, 6:290-297, 1959.

[13] T. Feder and M. Mihail. "Balanced matroids". In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, pages 26-38, 1992.

[14] R. W. Floyd and R. L. Rivest. "Expected time bounds for selection". Communications of the ACM, 18(3):165-172, 1975.

[15] M. L. Fredman and R. E. Tarjan. "Fibonacci heaps and their uses in improved network optimization algorithms". Journal of the ACM, 34(3):596-615, 1987.

[16] H. N. Gabow, Z. Galil, T. Spencer, and R. E. Tarjan. "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs". Combinatorica, 6:109-122, 1986.

[17] H. N. Gabow and H. H. Westermann. "Forests, frames, and games: Algorithms for matroid sums and applications". Algorithmica, 7:465-497, 1992.

[18] D. R. Karger. "Global min-cuts in RNC and other ramifications of a simple mincut algorithm". In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 21-30, Jan. 1993.

[19] D. R. Karger, M. Parnas, and N. Nisan. "Fast connected components algorithms for the EREW PRAM". In Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 562-572, 1992.

[20] R. M. Karp. "Probabilistic recurrence relations". In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pages 190-197, May 1991.

[21] P. N. Klein and R. E. Tarjan. "A linear-time algorithm for minimum spanning tree". Personal communication, 1993.

[22] D. E. Knuth. "Matroid partitioning". Technical Report STAN-CS-73-342, Stanford University, 1973.

[23] J. B. Kruskal, Jr. "On the shortest spanning subtree of a graph and the traveling salesman problem". Proceedings of the American Mathematical Society, 7(1):48-50, 1956.

[24] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.

[25] R. C. Prim. "Shortest connection networks and some generalizations". Bell System Technical Journal, 36:1389-1401, 1957.

[26] A. Recski. Matroid Theory and its Applications in Electric Network Theory and in Statics. Number 6 in Algorithms and Combinatorics. Springer-Verlag, 1989.

[27] J. H. Reif and P. Spirakis. "Random matroids". In Proceedings of the 12th Annual ACM Symposium on the Theory of Computing, pages 385-397, 1980.

[28] B. Schieber and U. Vishkin. "On finding lowest common ancestors: Simplification and parallelization". SIAM Journal on Computing, 17:1253-1262, Dec. 1988.

[29] A. Schrijver, editor. Packing and Covering in Combinatorics. Number 106 in Mathematical Centre Tracts. Mathematical Centre, 1979.

[30] R. E. Tarjan. Data Structures and Network Algorithms, volume 44 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, 1983.

[31] B. L. van der Waerden. Moderne Algebra. Springer, 1937.

[32] D. J. A. Welsh. Matroid Theory. London Mathematical Society Monographs. Academic Press, 1976.

[33] H. Whitney. "On the abstract properties of linear independence". American Journal of Mathematics, 57:509-533, 1935.