PAMBAYANG DALUBHASAAN NG MARILAO
Abangan Norte, Marilao, Bulacan
INFORMATION TECHNOLOGY DEPARTMENT
IT 213 – Data Structures and Algorithms

Minimum Spanning Trees

Spanning trees

A spanning tree of a graph is a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance, the complete graph on four vertices

    o---o
    |\ /|
    | X |
    |/ \|
    o---o

has sixteen spanning trees:

    o---o    o---o    o   o    o---o
    |   |    |        |   |        |
    |   |    |        |   |        |
    |   |    |        |   |        |
    o   o    o---o    o---o    o---o

    o---o    o   o    o   o    o   o
     \ /     |\ /      \ /      \ /|
      X      | X        X        X |
     / \     |/ \      / \      / \|
    o   o    o   o    o---o    o   o

    o   o    o---o    o   o    o---o
    |\  |       /     |  /|     \
    | \ |      /      | / |      \
    |  \|     /       |/  |       \
    o   o    o---o    o   o    o---o

    o---o    o   o    o   o    o---o
    |\       |  /      \  |       /|
    | \      | /        \ |      / |
    |  \     |/          \|     /  |
    o   o    o---o    o---o    o   o

Minimum spanning trees

Now suppose the edges of the graph have weights or lengths. The weight of a tree is the sum of the weights of its edges. Different trees can have different lengths. The problem: how do we find the minimum length spanning tree?

This problem can be solved by many different algorithms. It is the topic of some very recent research. There are several "best" algorithms, depending on the assumptions you make:

A randomized algorithm can solve it in linear expected time. [Karger, Klein, and Tarjan, "A randomized linear-time algorithm to find minimum spanning trees", J. ACM, vol. 42, 1995, pp. 321-328.]

It can be solved in linear worst case time if the weights are small integers. [Fredman and Willard, "Trans-dichotomous algorithms for minimum spanning trees and shortest paths", 31st IEEE Symp. Foundations of Comp. Sci., 1990, pp. 719--725.]

Otherwise, the best solution is very close to linear but not exactly linear. The exact bound is O(m log beta(m,n)), where the beta function has a complicated definition: beta(m,n) is the smallest i such that log(log(log(...log(n)...))), with the logs nested i times, is less than m/n.
[Gabow, Galil, Spencer, and Tarjan, "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs", Combinatorica, vol. 6, 1986, pp. 109--122.]

These algorithms are all quite complicated, and probably not that great in practice unless you're looking at really huge graphs. The book tries to keep things simpler, so it only describes one algorithm but (in my opinion) doesn't do a very good job of it. I'll go through three simple classical algorithms (spending not so much time on each one).

Why minimum spanning trees?

The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money.

A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once.

Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance, in the example above, twelve of the sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is no more than the TSP weight, because the MST is a minimization over a larger set (all spanning trees rather than only paths). On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is at most twice the MST weight. Therefore this tour is within a factor of two of optimal.
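The two bounds above can be written as a single chain of inequalities. The symbols are introduced here only for illustration: w(MST) is the weight of the minimum spanning tree, w(TSP) the weight of the optimal traveling salesman path, and w(tour) the weight of the path that traces around the tree.

```latex
\[
  w(\mathrm{MST}) \;\le\; w(\mathrm{TSP}) \;\le\; w(\mathrm{tour}) \;\le\; 2\,w(\mathrm{MST}) \;\le\; 2\,w(\mathrm{TSP})
\]
```

The first inequality holds because every optimal path contains a spanning tree, the second because the optimal path is no longer than any other path visiting all points, and the third because the tracing path uses each tree edge at most twice.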
There is a more complicated way (Christofides' heuristic) of using minimum spanning trees to find a tour within a factor of 1.5 of optimal; I won't describe this here but it might be covered in ICS 163 (graph algorithms) next year.

Kruskal's algorithm

Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). Kruskal's algorithm is an example of a greedy algorithm.

This algorithm first appeared in Proceedings of the American Mathematical Society, pp. 48–50 in 1956, and was written by Joseph Kruskal.

We'll start with Kruskal's algorithm, which is easiest to understand and probably the best one for solving problems by hand.

Kruskal's algorithm:

    sort the edges of G in increasing order by length
    keep a subgraph S of G, initially empty
    for each edge e in sorted order
        if the endpoints of e are disconnected in S
            add e to S
    return S

Note that, whenever you add an edge (u,v), it's always the smallest edge connecting the part of S reachable from u with the rest of G, so by the lemma (the smallest edge connecting any set of vertices to the rest of the graph belongs to some minimum spanning tree) it must be part of the MST.

This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. You should be very careful when trying to use greedy algorithms to solve other problems, since they usually don't work. For example, if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges. The greedy idea only works in Kruskal's algorithm because of the key property we proved.
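The pseudocode above might be sketched in Python as follows. This is only an illustrative sketch, not the handout's own implementation: the names kruskal, find and EDGES are hypothetical, the connectivity test uses a simple union-find structure with path halving, and the edge list is assembled from the weights quoted in this handout's worked examples.

```python
def kruskal(vertices, edges):
    """Return (total_weight, tree_edges) of a minimum spanning forest.

    edges is a list of (weight, u, v) tuples (an assumed input format).
    """
    # Union-find forest: every vertex starts as its own component.
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the component's representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    total = 0
    for weight, u, v in sorted(edges):   # increasing order by length
        root_u, root_v = find(u), find(v)
        if root_u != root_v:             # endpoints are disconnected in S
            parent[root_u] = root_v      # merge the two components
            tree.append((u, v, weight))  # add e to S
            total += weight
    return total, tree


# Seven-vertex example graph as (weight, u, v) tuples.
EDGES = [
    (7, "A", "B"), (5, "A", "D"), (8, "B", "C"), (9, "B", "D"),
    (7, "B", "E"), (5, "C", "E"), (15, "D", "E"), (6, "D", "F"),
    (8, "E", "F"), (9, "E", "G"), (11, "F", "G"),
]

total, tree = kruskal("ABCDEFG", EDGES)
print(total)  # 39
print(tree)
```

Sorting tuples that start with the weight keeps the code short; ties between equal-weight edges are then broken by vertex name, which is one arbitrary choice among several valid ones.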
Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But actually there are some complicated data structures that let us perform each test in close to constant time; this is known as the union-find problem and is discussed in Baase section 8.5 (I won't get to it in this class, though). The slowest part turns out to be the sorting step, which takes O(m log n) time.

Example

This is our original graph. The numbers near the arcs indicate their weight. None of the arcs are highlighted.

AD and CE are the shortest arcs, with length 5, and AD has been arbitrarily chosen, so it is highlighted.

CE is now the shortest arc that does not form a cycle, with length 5, so it is highlighted as the second arc.

The next arc, DF with length 6, is highlighted using much the same method.

The next-shortest arcs are AB and BE, both with length 7. AB is chosen arbitrarily, and is highlighted. The arc BD has been highlighted in red, because there already exists a path (in green) between B and D, so it would form a cycle (ABD) if it were chosen.

The process continues to highlight the next-smallest arc, BE with length 7. Many more arcs are highlighted in red at this stage: BC because it would form the loop BCE, DE because it would form the loop DEBA, and FE because it would form FEBAD.

Finally, the process finishes with the arc EG of length 9, and the minimum spanning tree is found.

The proof consists of two parts. First, it is proved that the algorithm produces a spanning tree.
Second, it is proved that the constructed spanning tree is of minimal weight.

Prim's algorithm

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník, later independently by computer scientist Robert C. Prim in 1957, and rediscovered by Edsger Dijkstra in 1959. Therefore it is also sometimes called the DJP algorithm, the Jarník algorithm, or the Prim–Jarník algorithm.

Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at a time.

Prim's algorithm:

    let T be a single vertex x
    while (T has fewer than n vertices)
    {
        find the smallest edge connecting T to G-T
        add it to T
    }

Since each edge added is the smallest connecting T to G-T (the part of the graph not yet in T), the lemma we proved shows that we only add edges that should be part of the MST.

Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.

Prim with heaps:

    make a heap of values (vertex,edge,weight(edge))
        initially (v,-,infinity) for each vertex
    let tree T be empty
    while (T has fewer than n vertices)
    {
        let (v,e,weight(e)) have the smallest weight in the heap
        remove (v,e,weight(e)) from the heap
        add v and e to T
        for each edge f=(u,v)
            if u is not already in T
                find value (u,g,weight(g)) in heap
                if weight(f) < weight(g)
                    replace (u,g,weight(g)) with (u,f,weight(f))
    }

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v).
For each of those steps, we might replace a value on the heap, reducing its weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time, which gives O((m + n) log n) overall. Alternatively, by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant (amortized) time. The result is a total time bound of O(m + n log n).

Example

This is our original weighted graph. The numbers near the edges indicate their weight.

Vertex D has been arbitrarily chosen as a starting point. Vertices A, B, E and F are connected to D through a single edge. A is the vertex nearest to D and will be chosen as the second vertex along with the edge AD.

The next vertex chosen is the vertex nearest to either D or A. B is 9 away from D and 7 away from A, E is 15, and F is 6. F is the smallest distance away, so we highlight the vertex F and the arc DF.

The algorithm carries on as above. Vertex B, which is 7 away from A, is highlighted.

In this case, we can choose between C, E, and G. C is 8 away from B, E is 7 away from B, and G is 11 away from F. E is nearest, so we highlight the vertex E and the arc BE.

Here, the only vertices available are C and G. C is 5 away from E, and G is 9 away from E. C is chosen, so it is highlighted along with the arc EC.

Vertex G is the only remaining vertex. It is 11 away from F, and 9 away from E. E is nearer, so we highlight G and the arc EG.

Now all the vertices have been selected and the minimum spanning tree is shown in green.
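The heap-based version of Prim's algorithm might be sketched in Python as follows. This is a sketch under stated assumptions rather than a transcription of the pseudocode: Python's heapq module has no operation for reducing the weight of an element already in the heap, so this version pushes every candidate edge and skips stale entries when they are popped (often called lazy deletion), which costs O(m log n) instead of the Fibonacci-heap bound. The names prim, build_graph and GRAPH are hypothetical, and the edge list is the seven-vertex example graph from this handout.

```python
import heapq


def build_graph(edges):
    """Turn (u, v, weight) triples into an undirected adjacency dict."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def prim(graph, start):
    """Return (total_weight, tree_edges) for the component containing start."""
    in_tree = {start}
    tree = []
    total = 0
    # Heap entries are (weight, u, v): an edge from tree vertex u to vertex v.
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        weight, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v already joined the tree via another edge
        in_tree.add(v)
        tree.append((u, v, weight))
        total += weight
        for nxt, w in graph[v].items():
            if nxt not in in_tree:
                heapq.heappush(heap, (w, v, nxt))
    return total, tree


GRAPH = build_graph([
    ("A", "B", 7), ("A", "D", 5), ("B", "C", 8), ("B", "D", 9),
    ("B", "E", 7), ("C", "E", 5), ("D", "E", 15), ("D", "F", 6),
    ("E", "F", 8), ("E", "G", 9), ("F", "G", 11),
])

total, tree = prim(GRAPH, "D")
print(total)                       # 39
print([v for _, v, _ in tree])     # ['A', 'F', 'B', 'E', 'C', 'G']
```

Starting from D, the vertices join the tree in the same order as in the walkthrough above: A, F, B, E, C, G.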
The minimum spanning tree in this example has weight 39.

The table below traces the run. U is the set of vertices already in the tree and V \ U is the set of vertices not yet reached. "chosen" marks the edge selected at each step; "cycle" marks an edge whose endpoints are both already in U.

    U                    Edge(u,v)                V \ U
    {}                                            {A,B,C,D,E,F,G}
    {D}                  (D,A) = 5   chosen       {A,B,C,E,F,G}
                         (D,B) = 9
                         (D,E) = 15
                         (D,F) = 6
    {A,D}                (D,B) = 9                {B,C,E,F,G}
                         (D,E) = 15
                         (D,F) = 6   chosen
                         (A,B) = 7
    {A,D,F}              (D,B) = 9                {B,C,E,G}
                         (D,E) = 15
                         (A,B) = 7   chosen
                         (F,E) = 8
                         (F,G) = 11
    {A,B,D,F}            (B,C) = 8                {C,E,G}
                         (B,E) = 7   chosen
                         (D,B) = 9   cycle
                         (D,E) = 15
                         (F,E) = 8
                         (F,G) = 11
    {A,B,D,E,F}          (B,C) = 8                {C,G}
                         (D,B) = 9   cycle
                         (D,E) = 15  cycle
                         (E,C) = 5   chosen
                         (E,G) = 9
                         (F,E) = 8   cycle
                         (F,G) = 11
    {A,B,C,D,E,F}        (B,C) = 8   cycle        {G}
                         (D,B) = 9   cycle
                         (D,E) = 15  cycle
                         (E,G) = 9   chosen
                         (F,E) = 8   cycle
                         (F,G) = 11
    {A,B,C,D,E,F,G}      (B,C) = 8   cycle        {}
                         (D,B) = 9   cycle
                         (D,E) = 15  cycle
                         (F,E) = 8   cycle
                         (F,G) = 11  cycle

Proof of Correctness

Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to Y are connected.

Let Y1 be a minimum spanning tree of P. If Y1=Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of Y that is not in Y1, and V be the set of vertices connected by the edges added before e. Then one endpoint of e is in V and the other is not. Since Y1 is a spanning tree of P, there is a path in Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in V to one that is not in V. Now, at the iteration when e was added to Y, f could also have been added, and it would have been added instead of e if its weight were less than that of e.
Since f was not added, we conclude that w(f) >= w(e), where w denotes edge weight.

Let Y2 be the graph obtained by removing f from and adding e to Y1. It is easy to show that Y2 is connected, has the same number of edges as Y1, and the total weight of its edges is not larger than that of Y1; therefore it is also a minimum spanning tree of P, and it contains e and all the edges added before it during the construction of V. Repeat the steps above and we will eventually obtain a minimum spanning tree of P that is identical to Y. This shows Y is a minimum spanning tree.

Find the minimum spanning tree using Kruskal's algorithm:

[Figure: weighted graph on vertices A, D, E, F, G, H, I with edge weights 3, 5, 2, 4, 2, 4, 6, 4, 5, 5, 3; the edge layout is not recoverable from the text.]

Find the minimum spanning tree using Prim's algorithm:

[Figure: graph on vertices A, B, C, D, E, F, G, H, I; the edges and weights are not recoverable from the text.]