Random Sampling in Matroids, with Applications to Graph Connectivity and Minimum Spanning Trees

David R. Karger*
Department of Computer Science
Stanford University
[email protected]

*Supported by a National Science Foundation Graduate Fellowship, NSF Grant CCR-9010517, NSF Young Investigator Award CCR-9357849, and grants from Mitsubishi Corporation and OTL.

Abstract

Random sampling is a powerful way to gather information about a group by considering only a small part of it. We give a paradigm for applying this technique to optimization problems, and demonstrate its effectiveness on matroids. Matroids abstractly model many optimization problems that can be solved by greedy methods, such as the minimum spanning tree (MST) problem. Our results have several applications. We give an algorithm that uses simple data structures to construct an MST in O(m + n log n) time (Klein and Tarjan [21] have recently shown that a better choice of parameters makes this algorithm run in O(m + n) time). We give bounds on the connectivity (minimum cut) of a graph suffering random edge failures. We give fast algorithms for packing matroid bases, with particular attention to packing spanning trees in graphs.

1 Introduction

1.1 Representative sampling

Arguably the central concept of statistics is that of a representative sample. It is often possible to gather a great deal of information about a large population by examining a small sample randomly drawn from it. This has obvious advantages in reducing the investigator's work, both in gathering and in analyzing the data.

Floyd and Rivest [14] use this approach in a fast and elegant algorithm for finding the median of an ordered set. They select a small random sample from the set; inspecting this sample gives a very accurate estimate of the value of the median. It is then easy to find the actual median by examining only those elements close to the estimate. This algorithm uses fewer comparisons than any other known median-finding algorithm.

We apply the concept of a representative sample to combinatorial optimization. Given an optimization problem, it may be possible to generate a small random representative subproblem. Intuitively, such a subproblem should form a microcosm of the larger problem. In particular, an optimal solution to the subproblem may be a nearly optimal solution to the problem as a whole. Furthermore, it may be relatively quick and easy to improve this good solution to a solution for the entire problem.

1.2 Matroids and the greedy algorithm

Matroids provide a demonstration of the effectiveness of our random sampling approach. The matroid is a powerful abstraction which generalizes both graphs and vector spaces. A matroid M consists of a ground set M of which some subsets are declared to be independent. The independent sets must satisfy three properties:

- The empty set is independent.
- All subsets of an independent set are independent.
- If U and V are independent and |U| > |V|, then some element of U can be added to V to yield an independent set.

This definition clearly generalizes the notion of linear independence in vector spaces; indeed this was the first use of matroids [31]. However, it was quickly noted [33] that matroids also generalize graphs: in the graphic matroid the edges of the graph form the ground set, and the independent sets are the acyclic sets of edges (forests). Maximal independent sets of a matroid are called bases; bases in a vector space are the standard ones, while bases in a graph are the spanning forests (spanning trees, if the graph is connected). In the matching matroid of a graph [24], bases correspond to maximum matchings.

Matroids have rich structure and are the subject of much study in their own right [32]. Matroid theory is used to solve problems in electrical circuit analysis and structural rigidity [26]. In computer science, perhaps the most natural problem involving matroids is matroid optimization. If a weight is assigned to each element of a matroid, and the weight of a set is defined as the sum of its elements' weights, the optimization problem is to find a basis of minimum weight.
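As a quick illustration (our addition, not the paper's), the following Python sketch brute-forces the three axioms for the graphic matroid of a small four-vertex graph, taking "independent" to mean "acyclic" as just described. The example graph and all identifiers are invented for the illustration.

```python
from itertools import combinations

def is_forest(edges, n):
    """Union-find cycle test: True iff the edge set is acyclic."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # this edge would close a cycle
        parent[ru] = rv
    return True

n = 4
ground = [(0, 1), (1, 2), (2, 0), (2, 3)]    # a triangle plus a pendant edge
indep = [frozenset(s) for k in range(len(ground) + 1)
         for s in combinations(ground, k) if is_forest(s, n)]
indep_set = set(indep)

assert frozenset() in indep_set                              # axiom 1
assert all(frozenset(t) in indep_set                         # axiom 2
           for s in indep if s for t in combinations(s, len(s) - 1))
for u_set in indep:                                          # axiom 3 (exchange)
    for v_set in indep:
        if len(u_set) > len(v_set):
            assert any(v_set | {e} in indep_set for e in u_set - v_set)
print(len(indep), "independent sets; all three axioms hold")
```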
The minimum spanning tree (MST) problem is the matroid optimization problem on the graphic matroid just described. Numerous other problems can also be formulated as instances of matroid optimization [8, 24].

Edmonds [11] was the first to observe that the matroid optimization problem can be solved by the following natural greedy algorithm. Begin with an empty independent set G, and consider the matroid elements in order of increasing weight. Add each element to G if doing so will keep G independent. Applying the greedy algorithm to the graphic matroid yields Kruskal's algorithm [23] for minimum spanning trees: grow a forest by repeatedly adding to the forest the minimum weight edge which does not form a cycle with edges already in the forest. An interesting converse result [32] is that if a family of sets does not form a matroid, then there is an assignment of weights to the elements for which the greedy algorithm will fail to find an optimal set in the family.

The greedy algorithm has two drawbacks. First, the elements of the matroid must be examined in order of weight. Thus the matroid elements must be sorted, forcing an Ω(m log m) lower bound on the running time of the greedy algorithm. Second, the independent set under construction is constantly changing, so that the problem of determining independence of elements is a dynamic one.
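The following sketch (ours, not the paper's) shows the greedy algorithm as a generic routine driven by an independence oracle. Instantiated with a union-find cycle test for the graphic matroid, it becomes exactly Kruskal's algorithm; the edge weights below are made up.

```python
def greedy_basis(elements, weight, keeps_independent):
    """Generic matroid greedy: note the forced sort over all m elements."""
    basis = []
    for x in sorted(elements, key=weight):
        if keeps_independent(basis, x):
            basis.append(x)
    return basis

def kruskal(n, weighted_edges):
    """Greedy on the graphic matroid: independence = no cycle (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def oracle(_forest, edge):
        u, v, _w = edge
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # edge would close a cycle
        parent[ru] = rv           # accept: merge the two trees
        return True
    return greedy_basis(weighted_edges, lambda e: e[2], oracle)

edges = [(0, 1, 4.0), (1, 2, 1.0), (2, 0, 3.0), (2, 3, 2.0)]   # (u, v, weight)
print(kruskal(4, edges))   # -> [(1, 2, 1.0), (2, 3, 2.0), (2, 0, 3.0)]
```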
1.3 Optimizing by sampling and verifying

Contrast the optimization problem with that of verifying the optimality of a given basis. For matroids, all that is necessary is to verify that no single element of the matroid M "improves" the basis. Thus in verification the elements of M can be examined in any order. Furthermore, the basis which must be verified is static. Extensive study of dynamic algorithms has demonstrated that they tend to be significantly more complicated than their static counterparts; in particular, algorithms on a static input can preprocess the input so as to accelerate queries against it.

Consider, for example, the problem of verifying an MST on an m-edge, n-vertex graph. This problem can be solved in linear time [9], and has simple and relatively practical O(m + n log n)-time solutions based on least common ancestor queries [2, 5, 28]. In contrast, the problem of constructing an MST had no linear-time solution until the techniques of this paper were applied [21], and all O(m + n log n)-time solutions are relatively complicated [15, 30]. The best of them [16] runs in O(m log β(m,n)) time, where β(m,n) = min{i | log^(i) n ≤ m/n}. References to other construction and verification algorithms can be found in [9].

We demonstrate that representative sampling is useful in matroid optimization. We show that a random sample of a 1/k fraction of the elements of a matroid will likely contain one of the k best matroid bases (in a sense to be formalized later). As an immediate application, this gives a simple algorithm for finding an approximately optimal basis: choose a random sample of the elements, and construct its optimal basis. We provide a precise description of the time-accuracy tradeoff in this approach. In particular, we give an extremely simple approximate-MST algorithm which runs in linear time.

We also give a simple reduction which uses verification to exactly solve the optimization problem. The approach is to construct the optimal solution of a representative sample, to use verification to determine how the solution fails to be optimal for the entire problem, and then to improve it so that it becomes optimal for the entire problem. Since the solution to the representative sample is good, the improvement takes little additional time. Because the verification problem may be easier to solve in either a theoretical or a practical sense, this can yield both theoretical and practical improvements to optimization algorithms. Applying this reduction yields a practical MST algorithm which runs in O(m + n log n) time. Such a time bound was previously achieved only by using the theoretically important but somewhat impractical Fibonacci heap data structure of Fredman and Tarjan [15]. This paradigm of improving on the good solution of a representative subproblem may have applications to other optimization problems as well.

1.4 Basis packing

Another matroid problem, which was first studied by Edmonds [10], is that of packing matroid bases, i.e. finding a maximum set of disjoint bases in a matroid. A simpler algorithm for the problem was given by Knuth [22]. Faster algorithms exist for the special case of the graphic matroid [17], where the problem is to find a maximum collection of disjoint spanning trees. A related problem is that of counting the total number of (non-disjoint) bases of a matroid. This #P-complete generalization of the problem of counting perfect matchings in a graph has been the focus of much recent work; see for example [13].

We apply random sampling to the basis packing problem. Let the packing number of a matroid be the maximum number of disjoint bases in it. We show that a random sample of 1/k of the elements from a matroid with packing number n has a packing number very close to n/k. This yields fast and simple algorithms for estimating the packing number of a matroid. We also use sampling to accelerate algorithms for finding packings, using again the idea that a good solution (packing) in random subproblems can quickly be improved to an optimum packing of the whole. There has been a great deal of work on packing problems in numerous settings [29]; our approach of packing random subproblems might yield improved results in some of them.

We can also relate our sampling theorems to the very successful study of random graphs [3]. Our results generalize to matroids the result which initiated the study of random graphs, namely the work of Erdős and Rényi [12] which determined the probability threshold at which a random graph is likely to become connected. Although the study of random graphs focuses mainly on the complete graph, every graph determines a corresponding graphic matroid. Furthermore, the connectivity of a graph is closely related to the number of disjoint spanning trees it contains [7]. Our matroid result therefore yields a simple characterization of the likelihood of any undirected graph to remain connected under random edge failures. By bounding the number of bases in a random sample of a matroid, we also estimate the expected connectivity (minimum cut) of a graph with random edge failures.

1.5 Related work

Klein and Tarjan [21] have recently developed an elegant improved analysis of our main theorem on sampling from weighted matroids. They show that the divide-and-conquer parameter used by our MST algorithm was not optimal. Using the correct parameter value makes the MST algorithm presented here run in linear time. Their improved analysis also makes it possible to simplify some of the details of the algorithm.

Reif and Spirakis [27] studied random matroids. In particular, they generalized existence proofs and algorithms for Hamiltonian paths and perfect matchings in random graphs. However, their approach was to analyze the average case behavior of matroid algorithms on random inputs, while our goal is to develop randomized algorithms which work well on all inputs. In previous work [18] we gave a limited version of our basis counting theorem which applies only to the graphic matroid and determines only the existence, rather than a count, of sampled bases.
2 Definitions

Matroids were defined in the introduction; those looking for more detail may wish to consult [32]. A basis in a matroid is a maximal independent set. All bases in a matroid have the same size. We therefore define the rank of a matroid as the size of any matroid basis (so the rank of a connected graph is one less than the number of vertices in it). Throughout this paper, we will focus on a matroid M of rank r on a ground set M containing m elements.

Definition 2.1 Given a set T ⊆ M, the restriction of M to T, denoted M|T, is a matroid with ground set T and independent sets all independent sets of M contained in T.

Definition 2.2 S(p) is a set generated from S by including each element independently with probability p. For A ⊆ S, A(p) = A ∩ S(p).

It is important for our work that a restriction is indeed a matroid, a fact which follows easily from the definition of matroids. In particular, M(p) is a matroid.
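Definition 2.2 amounts to a single line of code; the following minimal sketch (our illustration, with an arbitrary seed) is used implicitly throughout the algorithms below.

```python
import random

def sample(ground_set, p, rng=random):
    """Return S(p): keep each element independently with probability p."""
    return [x for x in ground_set if rng.random() < p]

random.seed(0)
print(len(sample(range(1000), 0.1)))   # about 100 elements in expectation
```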
3 Weighted matroids

In this section we consider the result of sampling at random from a weighted matroid, and formalize the following intuition: a random sample is likely to contain a nearly optimal basis. This shows that if we need to find a good but not necessarily optimal basis, it suffices to choose a small random sample from the matroid and construct its optimal basis. This gives an obvious improvement in the running time; we will make the time-accuracy tradeoff precise. Extending this idea, once we have a good basis, we can use a verification algorithm to determine why it is not optimal. If improving this good basis to an optimum one is easy, then the problem of constructing an optimal basis is reduced to the problem of verifying a basis for optimality. This is the subject of Section 4.

To formalize the above statements, we make the following definitions.

Definition 3.1 The optimum or greedy basis G(S) for S ⊆ M is the minimum weight basis which can be constructed from elements of S.

Thus G(S) is the basis which results from running the greedy algorithm on S. As noted in Definition 2.1, S is itself a restricted matroid, so the concept of an optimal basis in S makes sense and matroid algorithms can be applied to S to find it. The optimum basis of the matroid is simply G(M).

Definition 3.2 An element x improves an independent set A if x is in G(A ∪ {x}).

In other words, an element improves A if adding it changes the greedy basis of A. The elements of G(M) are easily seen to improve every independent set.

In the preliminary version of this paper, we proved the following theorem:

Theorem 3.1 With high probability, O(rk log r) elements of M improve G(M(1/k)).

We give a sketch of the proof of this theorem in Section 5.1. However, the discussion is abbreviated because Klein and Tarjan [21] have recently presented the following improvement, which is proved through a much more elegant analysis:

Theorem 3.2 (Klein-Tarjan) With high probability, O(rk) elements of M improve G(M(1/k)).

Some of the algorithms we define, in particular the MST algorithm, use a parameter ĩ. The smaller ĩ, the better the running times of our algorithms. However, ĩ must be large enough to ensure that with high probability, at most ĩk elements of M improve M(1/k). Our algorithms initially used Theorem 3.1 to set ĩ = O(r log r); however, the new result of Klein and Tarjan shows that it is possible to set ĩ = O(r) and thus improve our running times.

4 Application: optimizing by verifying

We use the results of the previous section to reduce the problem of constructing the optimum basis to the problem of verifying a basis to determine which elements improve it. This is useful because, as was discussed in the introduction, the verification task is likely to be easier than the construction task.

To begin with, suppose that we have two algorithms available: a construction algorithm which takes a matroid of m elements and rank r and constructs its optimum basis in time C(m, r); and a verification algorithm which takes an m-element matroid M of rank r and an independent set I and determines which elements of M improve I in time V(m, r). We show how to combine these two algorithms to yield a more efficient construction algorithm when V is faster than C. Begin by sampling each element of M with probability 1/k and constructing the greedy basis G of the sample using C. With high probability the sample has size O(m/k), so this takes C(m/k, r) time. Use the verification algorithm to find the set I of elements of M which improve G; this takes time V(m, r). Construct G(I); since G(M) ⊆ I, we know G(I) = G(M). By definition of ĩ, I has size at most kĩ with high probability; thus this construction takes C(kĩ, r) time. The overall running time is thus V(m, r) + C(m/k, r) + C(kĩ, r). To balance, set m/k = kĩ, i.e. k = √(m/ĩ). The running time then becomes

    V(m, r) + 2C(√(mĩ), r).

This is a clear improvement when ĩ is less than m. It should be noted that this new algorithm is just as practical as the original construction and verification algorithms were, since it consists simply of two construction calls and a verification call.
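The reduction can be summarized in a short skeleton (our hedged sketch, not the paper's code). Here `construct` and `improves` stand for the abstract C and V routines, which depend on the particular matroid and are assumed rather than implemented.

```python
import math
import random

def sample_and_fix(elements, i_tilde, construct, improves):
    """One-round reduction: two calls to 'construct' (C) and one to 'improves' (V)."""
    m = len(elements)
    k = max(1, round(math.sqrt(m / i_tilde)))        # balance m/k = k*i_tilde
    sampled = [x for x in elements if random.random() < 1.0 / k]
    g = construct(sampled)           # C(m/k, r): greedy basis G of the sample
    i_set = improves(elements, g)    # V(m, r): elements of M that improve G
    return construct(i_set)          # C(k*i_tilde, r): G(I) = G(M) since G(M) is in I
```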
At the cost of some additional complexity, we can improve the running time further by applying the new construction algorithm recursively. Choose a random sample of half the matroid elements, and recursively find the greedy basis G of that sample. Then verify the remaining half of the matroid elements against G. By definition of ĩ, the set I of elements which improve G has size at most 2ĩ with high probability. The optimum basis of M is then the greedy basis of G ∪ I, and can be found in C(2ĩ, r) time. This leads to the recurrence

    C'(2ĩ, r) = C(2ĩ, r)
    C'(m, r) = C'(m/2, r) + V(m/2, r) + C(2ĩ, r).

Given the natural assumption that V(m, r) = Ω(m) (since every element must be examined), this recurrence solves to

    C'(m, r) = O(V(m, r) + C(ĩ, r) log(m/ĩ)).

Since the depth of recursion is logarithmic in m, the same algorithm can be used to reduce the processor cost of NC algorithms for matroid optimization problems; the analysis is unchanged if we let V and C denote processor costs for verification and construction rather than running times.

4.1 Minimum spanning trees

As a concrete example, consider the MST problem. This is of course the matroid optimization problem on the graphic matroid, which has rank n - 1 in a connected n-vertex graph. To match the accepted terminology, we let ñ play the role of ĩ from the previous section. There are many well known MST algorithms which run in O(m log n) time [4, 25, 23, 30, 1]. We use random sampling to improve this bound. The following may be useful because of the simple and practical way it can be applied.

Lemma 4.1 In linear time, it is possible to construct a spanning tree which is minimal for all but O(ñ log n) edges of a graph.

Proof: Sample a 1/log n fraction of the edges of the graph and construct their MST in linear time using any one of the standard O(m log n)-time MST algorithms. If the resulting forest is not spanning, add arbitrary additional edges to make it spanning. □
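A sketch of the lemma's algorithm (our code, assuming a `kruskal` routine such as the one in Section 1.2; the patching step adds arbitrary spanning edges exactly as in the proof):

```python
import math
import random

def approx_mst(n, edges, kruskal, rng=random):
    """Spanning tree minimal for all but roughly n_tilde*log(n) edges (whp)."""
    p = 1.0 / max(2.0, math.log(n))
    sampled = [e for e in edges if rng.random() < p]
    forest = kruskal(n, sampled)              # (m/log n) edges, so O(m) total
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v, _ in forest:
        parent[find(u)] = find(v)
    for u, v, w in edges:                     # patch into a spanning tree
        if find(u) != find(v):
            parent[find(u)] = find(v)
            forest.append((u, v, w))
    return forest
```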
We now turn to the problem of finding the actual MST. There are several simple and practical verification schemes for minimum spanning trees which run in O(m + n log n) time [2, 28, 5]. Using an O(m log n)-time construction algorithm and applying the construction-to-verification transformation yields a practical algorithm for the MST problem. If we use the non-recursive formulation of the matroid algorithm, we get a running time of

    O(m + √(mñ log² n)) = O(m + ñ log² n).

The recursive formulation yields a running time of

    O(m + ñ log n log(m/ñ)) = O(m + ñ log n log log n).

An additional modification, which does not appear extensible to general matroids, allows us to improve this running time to O(m + ñ). It is based on the first MST algorithm, by Borůvka ([4], see also [30, 1]). Borůvka shows that solving an m-edge, n-vertex MST problem can be reduced in O(m) time to solving an m-edge, n/2-vertex problem by having each vertex identify and contract its smallest incident edge (all such edges will be in the MST). We use this reduction to improve T(m, n), the time needed to find an MST in a graph with m edges and n vertices.

Given the graph G, run two Borůvka iterations to reduce to n/4 vertices, and then continue the Borůvka iterations until the graph satisfies m > 4ñ (note that ñ changes as n does). The total amount of work this requires is O(m + 4(ñ/4) + 4(ñ/8) + ···) = O(m + ñ). Then sample half the remaining edges and recursively construct the MST G of the sample; this takes at most T(m/2, n/4) time. Then verify the unsampled edges against the sample; this takes V(m) time. With high probability, at most ñ < m/4 edges will improve G. Thus to find the actual MST of the remainder we need only T(m/4, n/4) time. In other words,

    T(m, n) ≤ m + ñ + V(m) + T(m/2, n/4) + T(m/4, n/4).

By substitution, T(m, n) = O(V(m) + ñ). Using one of the simple O(m + n log n)-time algorithms for verification and the ñ = O(n log n) parameter value originally determined here gives a practical O(m + n log n)-time MST algorithm. Using the linear-time verification algorithm of [9] and the Klein-Tarjan bound of ñ = O(n) gives a linear-time MST algorithm.

We can also apply the sampling paradigm to find an MST in parallel. It is possible to apply the m/log n + n^(1+ε) processor, O(log n)-time EREW PRAM connected components algorithm of [19] to yield an MST algorithm which uses m^(1+ε) processors and has the same time bounds. Using the verification paradigm, we improve the processor cost to m/log n + n^(1+ε), yielding an EREW MST algorithm which is optimal in time and processor costs on dense graphs.
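The whole scheme for the exact sequential algorithm might look like the following hedged skeleton (ours, not the paper's). Here `boruvka_step`, `verify` (returning the edges that improve a forest), and the base-case `mst` are assumed primitives, with contraction bookkeeping left to them; the base-case threshold stands in for the "m > 4ñ" stopping rule.

```python
import random

def sampling_mst(n, edges, boruvka_step, verify, mst, n_tilde):
    """Borůvka steps, sample half, recurse, verify, finish on improving edges."""
    if len(edges) <= 4 * n_tilde(n):              # small case: any MST algorithm
        return mst(n, edges)
    n2, edges2, forced1 = boruvka_step(n, edges)  # contracted graph + forced MST edges
    n3, edges3, forced2 = boruvka_step(n2, edges2)
    half = [e for e in edges3 if random.random() < 0.5]
    g = sampling_mst(n3, half, boruvka_step, verify, mst, n_tilde)
    half_set = set(half)
    unsampled = [e for e in edges3 if e not in half_set]
    improving = verify(n3, g, unsampled)          # whp few edges improve G
    finish = mst(n3, g + improving)               # exact MST of the remainder
    return forced1 + forced2 + finish
```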
5 Sampling to find a basis

We now turn to the second topic of this paper. We consider an unweighted matroid and study the way bases arise in a random sample from the matroid. This section generalizes and extends a well known fact proved by Erdős and Rényi [12, 3], namely that if a random graph on n vertices is generated by including each edge independently with probability exceeding (ln n)/n, then the graph is likely to be connected. Rephrased in the language of graphic matroids, if the edges of a complete graph on n vertices are sampled with the given probability, then the sampled edges are likely to contain a spanning tree, i.e. a basis. We generalize this result to arbitrary matroids. The result of this section is merely a special case of Theorem 6.1, which actually counts the number of bases which survive. However, this section provides the intuition and demonstrates the techniques which will be used in the proof of that more difficult theorem.

We begin with some definitions needed in the proof.

Definition 5.1 The rank of A ⊆ M, denoted ρA, is the size of the largest independent subset of A.

Definition 5.2 A set A spans an element x if ρA = ρ(A ∪ {x}). The span of a set A, denoted σA, is the set of elements spanned by A. A spans B if B ⊆ σA. If A ⊆ B, then σA ⊆ σB. If A spans B and B spans C, then A spans C.

Definition 5.3 B(n, p) is a binomial random variable: Pr[B(n, p) = k] = C(n, k) p^k (1-p)^(n-k).

The concept of a contracted matroid is well known in matroid theory; however, we use slightly different terminology. For the following definitions, fix some independent set T in M.

Definition 5.4 A set A is T-independent, or independent of T, in M if A ∪ T is independent.

Definition 5.5 The contraction of M by T, denoted M/T, is the matroid on M whose independent sets are all the T-independent sets of M.

Definition 5.6 A/T is any maximal T-independent subset of A.

Lemma 5.1 If A is a basis of M, then A/T is a basis of M/T. If B is a basis of M/T, then B ∪ T is a basis of M.

Theorem 5.2 Suppose M contains a + 2 + k ln r disjoint bases. Then M(1/k) contains a basis for M with probability 1 - e^(-a/k).

In other words, once the number of bases exceeds a threshold of k ln r, the probability that no basis appears in the sample decreases geometrically to 0.

Proof: Let p = 1/k, and let {B_i}, i = 1, ..., a + 2 + k ln r, be disjoint bases of M. We construct the basis in M(p) by examining the sets B_i(p) one at a time and adding some of their elements to an independent set I (initially empty) until I is large enough to be a basis. We invert the problem by asking how many bases must be examined before I becomes a basis. Suppose we determine U = B_1(p), the set of elements of B_1 contained in M(p). Note that the size u of U is distributed as B(r, p); thus E[u] = rp. Consider the contraction M/U. By Lemma 5.1, this matroid contains disjoint bases B_2/U, B_3/U, ..., and has rank r - u. We ask recursively how many of these bases we need to examine to construct a basis B for the contracted matroid. Once we have done so, we know from Lemma 5.1 that U ∪ B is a basis for M. This gives a probabilistic recurrence for the number of bases we need to examine:

    T(r) = 1 + T(r - u),    u = B(r, p).

If we replaced random variables by their expected values, we would get a recurrence of the form S(r) = 1 + S((1-p)r), which solves to S(r) = log_b r, where b = 1/(1-p). Probabilistic recurrences are studied by Karp in [20]. His first theorem exactly describes our recurrence, and proves that for any a,

    Pr[T(r) ≥ log_b r + a + 2] ≤ (1 - 1/k)^a.

In our case, log_b r ≈ k ln r. □

A corollary to this theorem is the performance of the following natural randomized incremental algorithm for constructing a matroid basis: start with an empty set, examine elements in random order, and add them to the set if they are independent of it, until the set forms a basis.

Corollary 5.3 The randomized incremental algorithm for constructing a basis requires O(m/k) independence tests on a matroid with k disjoint bases.
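For the graphic matroid, the incremental algorithm is a few lines (our sketch, not the paper's). Counting independence tests on a complete graph, which contains many disjoint spanning trees, gives a feel for the bound.

```python
import random

def incremental_basis(n, edges, rng=random):
    """Scan edges in random order; keep those independent of the current forest."""
    order = edges[:]
    rng.shuffle(order)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    basis, tests = [], 0
    for u, v in order:
        tests += 1
        ru, rv = find(u), find(v)
        if ru != rv:               # independence test passes
            parent[ru] = rv
            basis.append((u, v))
            if len(basis) == n - 1:
                break              # rank reached: basis complete
    return basis, tests

random.seed(2)
n = 50
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]   # K_50
print(incremental_basis(n, edges)[1], "independence tests for a", n - 1, "edge basis")
```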
5.1 Weighted matroids

The above proof can be adapted to prove Theorem 3.1. Instead of analyzing an arbitrary collection of bases, we define a sequence of bases of "small" weight: B_1 = G(M), B_2 = G(M - B_1), B_3 = G(M - (B_1 ∪ B_2)), and so on. Let S^(<x) denote the elements smaller than x in a set S. By construction, B_i^(<x) spans B_(i+1)^(<x). Suppose x ∈ B_j. Then B_1^(<x) spans B_2^(<x), which spans B_3^(<x), and so on until B_(j-1)^(<x), which spans x. Applying the analysis of Theorem 5.2 to the sets B_i^(<x) proves that if j > 2k ln r, then M(1/k) is likely to contain a set of elements smaller than x which span x. If this is so, then x does not improve G(M(1/k)). This proves that only the elements in the first 2k ln r bases improve G(M(1/k)). However, the first 2k ln r bases contain at most 2rk ln r elements. This gives Theorem 3.1.

6 Counting bases

We now extend the ideas of Section 5 to get a stronger result: namely, an estimate of the number of disjoint bases which will be contained in a random sample of the matroid. As before, we assume that the sample is constructed by including each matroid element in the sample with probability p.

Definition 6.1 The packing number P(M) for a matroid M is the maximum number of disjoint bases in it.

Theorem 6.1 If P(M) = n, then the probability that M(p) fails to contain k disjoint bases of M is at most r · Pr[B(n, p) ≤ k].

Proof: We generalize the technique of Section 5. We line up the bases {B_n} and pass through them one by one, adding some of the sampled elements from each basis to an independent set I which grows until it is itself a basis. For each B_n, we set aside some of the elements because they are dependent on elements already added to I; we then examine the remaining elements of B_n to find out which ones were actually sampled, and add those sampled elements to I. The change in the procedure is that we do this more than once: the next time, we examine those elements set aside the first time.

Consider a series of phases; in each phase we will construct one basis. At the beginning of phase k, there will be a remaining portion R_n^k of B_n; the elements of R_n^k are those elements of B_n which were not examined in any of the previous phases. We construct an independent set I^k by processing each of the R_n^k in order. Let I_(n-1)^k be the portion of I^k that we have constructed before processing R_n^k. To process R_n^k, we split it into two sets: R_n^(k+1) contains those elements which are set aside until the next phase, while E_n^k = R_n^k - R_n^(k+1) is the set of elements we examine in this phase. The elements of E_n^k will be independent of I_(n-1)^k. Thus, as in the single-basis case, we simply check which elements of E_n^k are in the sampled set, identifying the set U_n^k = E_n^k(p) of elements we use, and add them to our growing basis. Formally, we let I_n^k = I_(n-1)^k ∪ U_n^k; by construction, I_n^k is independent.

    I_n^k: independent set so far
    R_n^k: remainder of the nth basis
    E_n^k: elements examined for use
    U_n^k: elements actually used from E_n^k, namely E_n^k(p)

    Figure 1: Variables describing the nth basis in the kth phase.

We now explain precisely how we determine the split of R_n^k into R_n^(k+1) and E_n^k. Let r_n^k, i_n^k, e_n^k, and u_n^k be the sizes of R_n^k, I_n^k, E_n^k, and U_n^k respectively. Suppose that we have I_(n-1)^k in hand, and wish to extend it by examining elements of R_n^k. We assume by induction that i_(n-1)^k ≤ r_n^k. It follows from the definition of matroids that there must exist a set E_n^k ⊆ R_n^k such that I_(n-1)^k ∪ E_n^k is independent and has size r_n^k. Defining E_n^k this way determines R_n^(k+1) = R_n^k - E_n^k. We then set U_n^k = E_n^k(p) and I_n^k = I_(n-1)^k ∪ U_n^k.

To justify our inductive assumption we use induction on k. To prove it for k + 1, note that our construction makes r_n^(k+1) = i_(n-1)^k, and forces i_(n-1)^(k+1) ≤ i_(n-1)^k; thus i_(n-1)^(k+1) ≤ r_n^(k+1), as desired.

We now use the just noted invariant r_n^(k+1) = i_(n-1)^k to derive recurrences for the sizes of the various sets. As before, the recurrences will be probabilistic in nature. Recall that u_n^k is the size of U_n^k, so u_n^k = B(e_n^k, p). Thus

    r_n^(k+1) = i_(n-1)^k = i_(n-2)^k + u_(n-1)^k = r_(n-1)^(k+1) + B(e_(n-1)^k, p).

It follows that

    e_n^k = r_n^k - r_n^(k+1)
          = [r_(n-1)^k + B(e_(n-1)^(k-1), p)] - [r_(n-1)^(k+1) + B(e_(n-1)^k, p)]
          = e_(n-1)^k + B(e_(n-1)^(k-1), p) - B(e_(n-1)^k, p).

Now let f_n^k = E[e_n^k]. Linearity of expectation applied to the recurrence shows that f_n^k = (1-p) f_(n-1)^k + p f_(n-1)^(k-1). Since we examine the entire first basis in the first phase, e_1^0 = r and e_1^k = 0 for k > 0. Therefore this recurrence is solved by

    f_n^k = C(n-1, k) p^k (1-p)^(n-1-k) r.

We now ask how big n needs to be to give us a basis in the kth phase. As in Section 5, it simplifies matters to assume that we begin with an infinite set of disjoint bases, and ask for the value of n such that in the kth phase, we finish constructing the kth sampled basis I^k before we reach the nth original basis B_n. Recall the variable u_n^k denoting the number of items from B_n used in I^k. Suppose that in the kth phase we use no elements from any basis after B_n. One way this might happen is if we never finish constructing I^k; however, this is a probability 0 event. The only other possibility is that we have finished constructing I^k by the time we reach B_n, so that we examine no more bases. It follows that if u_j^k = 0 for every j ≥ n, then we must have finished constructing I^k before we examined B_n. Since the u_j^k are non-negative, this is equivalent to saying that Σ_(j≥n) u_j^k = 0. It follows that our problem can be solved by determining the value of n such that Σ_(j≥n) u_j^k = 0 with high probability.
From the Markov inequality, which says that for nonnegative integer random variables Pr[X > 0] ≤ E[X], and from the fact that E[u_j^k] = p E[e_j^k] = p f_j^k, we deduce that the probability that we fail to construct I^k before reaching B_n is at most

    s_n^k = Σ_(j≥n) p f_j^k.

To bound s_n^k, we can sum by parts to prove that

    s_n^k = r · Pr[B(n, p) ≤ k].

This yields the theorem. □

The probability of finding no bases is thus at most s_n^0 ≤ r e^(-np); this is exactly the result proven in Section 5.

We also consider the converse problem, namely to upper bound the number of bases which survive. This analysis is relatively easy thanks to the following packing theorem due to Edmonds [10]. Let Ā denote M - A.

Theorem 6.2 (Edmonds) A matroid M on ground set M with rank r has n disjoint bases if and only if nρ(A) + |Ā| ≥ nr for every A ⊆ M.

Corollary 6.3 If P(M) ≤ n and k > np, then the probability that M(p) contains more than k disjoint bases of M is at most Pr[B(n, p) ≥ k].

Proof: By Edmonds' theorem, there must exist some A ⊆ M such that

    (n+1)ρ(A) + |Ā| < (n+1)r.

It is straightforward to prove that, with the desired probability,

    (k+1)ρ(A(p)) + |Ā(p)| < (k+1)r.

In other words, A(p) demonstrates through Edmonds' theorem that M(p) contains at most k bases. □

Applying the Chernoff bound [6] to the previous two theorems yields the following:

Theorem 6.4 Given a matroid M of rank r and packing number n, and given p, let n' be the number of disjoint bases of M contained in M(p). Then

    Pr[|n' - np| > εnp] < r e^(-ε²np/3).

6.1 Application: network reliability

Consider the graphic matroid of a graph G. The bases of the graphic matroid are the spanning trees of the graph. A theorem of Polesskii [7] shows that a graph with minimum cut c contains between c/2 and c disjoint spanning trees. In [18] we proved Theorem 5.2 for the special case of the graphic matroid. Theorem 6.1 yields a more general result:

Corollary 6.5 If a graph G contains k disjoint spanning trees, then with high probability G(p) contains between kp - 2√(kp ln n) and kp + 2√(kp ln n) disjoint spanning trees.

Corollary 6.6 If a graph G has minimum cut c and p is such that pc = Ω(log n), then G(p) has minimum cut Θ(pc).
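A small Monte Carlo experiment (our illustration, not the paper's) makes the regime of Corollary 6.6 concrete: when pc comfortably exceeds log n, a graph rarely disconnects under random edge failures. The 8-regular circulant test graph is our own choice; being vertex-transitive, its minimum cut equals its degree.

```python
import random

def connected_after_failures(n, edges, p, rng):
    """Sample each edge with probability p; test connectivity via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    comps = n
    for u, v in edges:
        if rng.random() < p:               # edge survives the failure process
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps == 1

rng = random.Random(3)
n = 200                                    # circulant graph C_n(1,2,3,4): min cut 8
edges = [(u, (u + d) % n) for u in range(n) for d in range(1, 5)]
trials, p = 200, 0.9                       # p*c = 7.2, log(200) is about 5.3
ok = sum(connected_after_failures(n, edges, p, rng) for _ in range(trials))
print(f"connected in {ok}/{trials} trials at p={p}")
```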
7 Packing bases

Our results can be applied to the basis packing problem: find a maximum disjoint collection of bases in a matroid M. The first algorithm for this problem was given by Edmonds [10], and there have been numerous improvements. Many matroid packing algorithms use a concept of augmenting paths: the algorithms repeatedly augment a collection of independent sets until they form the desired number of bases. These augmentation steps generally have running times dependent on the matroid size m, the matroid rank r, and k, the number of bases already found. For example, Knuth's algorithm [22] finds an augmenting path in O(mrk) time, and can therefore find a set of k disjoint bases of total size rk (if they exist) in time O(mr²k²).

One application of sampling is to estimate the packing number of a matroid. If we find p such that M(p) has packing number k = Θ(log² n), then we can be sure that M has packing number roughly k/p. Thus, with Knuth's algorithm and binary search on p, we can estimate the packing number k to within a constant factor in Õ(mr²/k) time.

We can extend this approach to find a maximum packing of bases. Suppose that the matroid contains k bases. We can randomly partition the matroid into 2 groups, and apply the sampling theorem to each group, with p = 1/2, to deduce that with high probability each group contains k/2 - O(√(k log r)) disjoint bases. We recursively run the packing algorithm on each group to find each subcollection of bases, and join them to yield a set of k - O(√(k log r)) bases. The benefit is that this packing was found by examining smaller matroids. We now augment the packing to k bases using Knuth's algorithm; this takes O(mr²k^(3/2)) time and is the dominant term in the running time of our algorithm. Thus we improve the running time of Knuth's algorithm by an O(√k) factor.

Gabow and Westermann [17] study the problem of packing spanning trees in the graphic matroid, and give algorithms which are significantly faster than the one just presented (they use special properties of graphs, and are based on an analogue of blocking flows rather than augmenting paths). Combining their algorithm with our sampling techniques, we can estimate the number of disjoint spanning trees in a graph to within any constant factor in O(m^(3/2)) time. It is an open problem to combine our sampling approach with their algorithm to find optimum packings faster than they already do.
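Returning to the estimation idea above, the following hedged sketch (ours) shows the shape of the estimator. The `pack` argument stands in for an assumed exact packing routine such as Knuth's algorithm, and repeated halving of p stands in for the binary search.

```python
import random

def estimate_packing(elements, pack, target, rng=random):
    """Estimate P(M) within a constant factor; 'pack' is the assumed exact oracle."""
    p = 1.0
    k = pack(elements)
    while k >= 2 * target:               # sample still contains too many bases
        p /= 2.0
        sampled = [x for x in elements if rng.random() < p]
        k = pack(sampled)                # run the exact algorithm on a small sample
    return k / p                         # k = Theta(target) whp, or P(M) was small
```

With target = Θ(log² n), each halving of p roughly halves the sample's packing number (by the sampling theorem), so the loop exits with k between target and 2·target with high probability, or immediately if the true packing number is already small.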
8 Conclusion

This paper has suggested a new approach to matroids, and given results which apply to matroids as models for greedy algorithms and as combinatorial objects. Two future directions suggest themselves. In the realm of optimization, we have suggested a new paradigm which works particularly well for matroid greedy algorithms: generate a small random representative subproblem, solve it quickly, and use the information gained to home in on the solution to the entire problem. In particular, an optimal solution to the subproblem may be a good solution to the original problem which can quickly be improved to an optimal solution. The obvious open question: apply this paradigm to other optimization problems.

In the realm of combinatorics, how much of the theory of random graphs can be extended to the more general matroid model? There is a well defined notion of connectivity in matroids [32]; is it relevant to the basis packing results presented here? What further insight into random graphs can be gained by examining them from a matroid perspective? The result of Erdős and Rényi provided a tight threshold of p = (ln n)/n for connectivity in random graphs, whereas our result gives a looser bound of Ω((log n)/n). Is there a 0-1 law for bases in a matroid?

9 Acknowledgements

Thanks to Don Knuth for help with a troublesome summation, and to Daphne Koller for her comments. Thanks to Philip Klein and Bob Tarjan for permitting reference to their work in progress.

References

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
[2] N. Alon and B. Schieber. "Optimal preprocessing for answering online product queries". Technical report, Tel Aviv University, 1987.
[3] B. Bollobás. Random Graphs. Harcourt Brace Jovanovich, 1985.
[4] O. Borůvka. "O jistém problému minimálním". Práce Moravské Přírodovědecké Společnosti, 3:37-58, 1926.
[5] B. Chazelle. "Computing on a free tree via complexity-preserving mappings". Algorithmica, 2:337-361, 1987.
[6] H. Chernoff. "A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations". Annals of Mathematical Statistics, 23:493-509, 1952.
[7] C. J. Colbourn. The Combinatorics of Network Reliability, volume 4 of The International Series of Monographs on Computer Science. Oxford University Press, 1987.
[8] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
[9] B. Dixon, M. Rauch, and R. E. Tarjan. "Verification and sensitivity analysis of minimum spanning trees in linear time". SIAM Journal on Computing, 21(6):1184-1192, 1992.
[10] J. Edmonds. "Minimum partition of a matroid into independent subsets". Journal of Research of the National Bureau of Standards, 69:67-72, 1965.
[11] J. Edmonds. "Matroids and the greedy algorithm". Mathematical Programming, 1:126-136, 1971.
[12] P. Erdős and A. Rényi. "On random graphs I". Publ. Math. Debrecen, 6:290-297, 1959.
[13] T. Feder and M. Mihail. "Balanced matroids". In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, pages 26-38, 1992.
[14] R. W. Floyd and R. L. Rivest. "Expected time bounds for selection". Communications of the ACM, 18(3):165-172, 1975.
[15] M. L. Fredman and R. E. Tarjan. "Fibonacci heaps and their uses in improved network optimization algorithms". Journal of the ACM, 34(3):596-615, 1987.
[16] H. N. Gabow, Z. Galil, T. Spencer, and R. E. Tarjan. "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs". Combinatorica, 6:109-122, 1986.
[17] H. N. Gabow and H. H. Westermann. "Forests, frames, and games: Algorithms for matroid sums and applications". Algorithmica, 7:465-497, 1992.
[18] D. R. Karger. "Global min-cuts in RNC and other ramifications of a simple mincut algorithm". In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 21-30, Jan. 1993.
[19] D. R. Karger, M. Parnas, and N. Nisan. "Fast connected components algorithms for the EREW PRAM". In Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 562-572, 1992.
[20] R. M. Karp. "Probabilistic recurrence relations". In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pages 190-197, May 1991.
[21] P. N. Klein and R. E. Tarjan. "A linear-time algorithm for minimum spanning tree". Personal communication, 1993.
[22] D. E. Knuth. "Matroid partitioning". Technical Report STAN-CS-73-342, Stanford University, 1973.
[23] J. B. Kruskal, Jr. "On the shortest spanning subtree of a graph and the traveling salesman problem". Proceedings of the American Mathematical Society, 7(1):48-50, 1956.
[24] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.
[25] R. C. Prim. "Shortest connection networks and some generalizations". Bell System Technical Journal, 36:1389-1401, 1957.
[26] A. Recski. Matroid Theory and its Applications in Electric Network Theory and in Statics. Number 6 in Algorithms and Combinatorics. Springer-Verlag, 1989.
[27] J. H. Reif and P. Spirakis. "Random matroids". In Proceedings of the 12th Annual ACM Symposium on Theory of Computing, pages 385-397, 1980.
[28] B. Schieber and U. Vishkin. "On finding lowest common ancestors: Simplification and parallelization". SIAM Journal on Computing, 17:1253-1262, Dec. 1988.
[29] A. Schrijver, editor. Packing and Covering in Combinatorics. Number 106 in Mathematical Centre Tracts. Mathematical Centre, 1979.
[30] R. E. Tarjan. Data Structures and Network Algorithms, volume 44 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, 1983.
[31] B. L. van der Waerden. Moderne Algebra. Springer, 1937.
[32] D. J. A. Welsh. Matroid Theory. London Mathematical Society Monographs. Academic Press, 1976.
[33] H. Whitney. "On the abstract properties of linear independence". American Journal of Mathematics, 57:509-533, 1935.