The Rothschild Lecture
Robert E. Tarjan (Princeton University), Turing Award Winner
“Problems in Data Structures & Algorithms”

1. Introduction

I would like to talk about various problems I have worked on over the course of my career. In this lecture I’ll review simple problems with interesting applications, and problems that have rich, sometimes surprising, structures.

Let me start by saying a few words about how I view the process of research, discovery and development (see Figure 1).

[Figure 1: The Discovery/Development Process. Labels: Application, experiment, model, Abstraction, apply, develop, Algorithm, Old/New Theory/Process.]

My view is based on my experience with data structures and algorithms in computer science, but I think it applies more generally. There is an interesting interplay between theory and practice. The way I like to work is to start out with some application from the real world. The real world, of course, is very messy, and the application gets modeled or abstracted away into some problem or setting that someone with a theoretical background can actually deal with. Given the abstraction, I then try to develop a solution, which in computer science is usually an algorithm: a computational method to perform some task. We may be able to prove things about the algorithm, its running time, its efficiency, and so on. Then, if it’s at all useful, we want to apply the algorithm back to the application and see if it actually solves the real problem. There is an interplay in the experimental domain between the algorithm, developed from the abstraction, and the application. Perhaps we discover that the abstraction does not capture the right parts of the problem: we have solved an interesting mathematical problem, but it doesn’t solve the real-world application. Then we need to go back, change the abstraction, solve the new abstract problem, and try to apply that in practice.
In this entire process we are developing a body of new theory and practice which can then be used in other settings.

A very interesting and important aspect of computation is that often the key to performing computations efficiently is to understand the problem, to represent the problem data appropriately, and to look at the operations that need to be performed on the data. In fact, many algorithmic problems turn into data manipulation problems, and the key issue is to develop the right kind of data structure to solve the problem. I would like to talk about many of these problems. The real question is to devise a data structure, or to analyze a data structure, which is a concrete representation of some kind of algorithmic process.

2. Optimum Stack Generation Problem

Let’s take a look at the following simple problem. I’ve chosen it because it is an abstraction which is, on the one hand, very easy to state, but on the other hand captures a number of ideas. We are given a finite alphabet $\Sigma$ and a stack S. We would like to generate strings of letters over the alphabet using the stack. There are three stack operations we can perform:

push(a): push any letter a from the alphabet onto the stack,
emit: output the top letter of the stack,
pop: pop the top letter from the stack.

We can perform any sequence of these operations subject to the following well-formedness constraints: we begin with an empty stack, we perform an arbitrary series of push, emit and pop operations, we never perform a pop from an empty stack, and we end up with an empty stack. These operations generate some sequence of letters over the alphabet.

Problem 2.1: Given some string over the alphabet, find a minimum-length sequence of stack operations that generates it.

We would like a fast algorithm that finds the minimum-length sequence of stack operations for generating any particular string. For example, consider the string A B C A C B A.
We could generate it by performing push(A), emit A, pop A, push(B), emit B, pop B, push(C), emit C, pop C, etc., but the point is that the string has repeated letters, and we can use the same item on the stack to generate repeats. A shorter sequence of operations is: push(A), emit A, push(B), emit B, push(C), emit C, push(A), emit A, pop A; now we can emit C (we don’t have to put a new C on the stack), pop C, emit B, pop B, emit A, and finally pop A to leave the stack empty. We got the ‘CBA’ suffix for free rather than having to do new pushes and pops.

This problem is a simplification of the programming problem which appeared in “The International Conference on Functional Programming” in 2001 [?]. It may apply to the more complicated problem of optimum parsing of HTML-like expressions.

What can we say about this problem? There is an obvious $O(n^3)$ dynamic programming algorithm (see footnote 1). This is really a special case of optimum context-free language parsing, where you have costs associated with the rules. If you have a small alphabet, say of size three, there is an $O(n)$ algorithm (by Y. Zhou [?]). If the number of letters is four, there exists an $O(n^2)$ algorithm. That’s basically all I know about this problem. I would suspect you can solve this problem by matrix multiplication, which would give a complexity of $O(n^{2.3})$. I have no idea whether you can do it in $O(n^2)$ or in $O(n \log n)$. I think it is very interesting, and I think that solving this problem and getting a better upper bound or a better lower bound would reveal more about context-free parsing than we already know. I think this kind of question actually arises in practice. There are also string problems in biology that are related to this one.

Footnote 1. Sketch of the algorithm: Let $S[1..n]$ denote the sequence of characters. Note that there must be exactly n emits and that the number of pushes must equal the number of pops. Thus we may assume that the cost is simply the number of pushes.
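Counting pushes only, the minimum cost can be computed by an interval dynamic program. The following is a minimal Python sketch of that $O(n^3)$ idea, not the speaker's own code; the names `min_stack_ops` and `cost` are illustrative assumptions. For each substring it either pays one fresh push for the last letter or reuses the stack item that emitted an earlier equal letter.

```python
def min_stack_ops(s):
    """Return (pushes, total_ops) for a cheapest well-formed sequence of
    push/emit/pop operations that emits the string s."""
    n = len(s)
    if n == 0:
        return 0, 0
    # cost[i][j] = fewest pushes needed to emit s[i..j] (inclusive),
    # starting and ending with the same stack contents.
    cost = [[0] * n for _ in range(n)]
    for i in range(n):
        cost[i][i] = 1  # a single letter needs exactly one push
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = cost[i][j - 1] + 1  # option 1: fresh push for s[j]
            for t in range(i, j):
                if s[t] == s[j]:
                    # option 2: the item that emitted s[t] also emits s[j];
                    # s[t+1..j-1] is generated on top of it in between.
                    inner = cost[t + 1][j - 1] if t + 1 <= j - 1 else 0
                    best = min(best, cost[i][t] + inner)
            cost[i][j] = best
    pushes = cost[0][n - 1]
    # Every push is matched by one pop, and each letter needs one emit.
    return pushes, 2 * pushes + n
```

For the example string, `min_stack_ops("ABCACBA")` reports 4 pushes, hence 4 pops and 7 emits, for 15 operations in total; the naive push-emit-pop scheme would use 21.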
The dynamic programming is based on the observation that if the same stack item is used to produce, say, $S[i_1]$ and $S[i_2]$ (where $i_2 > i_1$ and $S[i_1] = S[i_2]$), then the state of the stack at the time of emit $S[i_1]$ must be restored for emit $S[i_2]$. Thus the cost $C[i,j]$ of producing the subsequence $S[i..j]$ is the minimum of $C[i,j-1] + 1$ and $\min\{\, C[i,t] + C[t+1,j-1] : S[t] = S[j],\ i \le t < j \,\}$.

3. Path Compression

Let me turn to an old, simple problem with a surprising solution. The answer to this problem has already come up several times in some of the talks in the conference “Second Haifa Workshop on Interdisciplinary Applications of Graph Theory, Combinatorics and Algorithms.” The goal is to maintain a collection of n elements which are partitioned into sets, i.e., the sets are always disjoint and each element is in a unique set. Initially each element is in a singleton set. Each set is named by some arbitrary element in it. We would like to perform the following two operations:

find(x): for a given arbitrary element x, return the name of the set containing it.
unite(x,y): combine the two sets named by x and y. The new set gets the name of one of the old sets.

Let’s assume that the number of elements is n. Initially, each element is in a singleton set, and after n-1 unite operations all the elements are combined into a single set.

Problem 3.1: Find a data structure that minimizes the worst-case total cost of m find operations intermingled with n-1 unite operations.

For simplicity in stating time bounds, we assume that $m \ge n$, although this assumption is not very important. This problem originally arose in the processing of COMMON and EQUIVALENCE statements in the ancient programming language FORTRAN. It is also the key problem in computing minimum spanning trees using Kruskal’s [?] minimum spanning tree algorithm.

There is a beautiful and very simple algorithm for solving Problem 3.1, developed in the late ‘60s, early ‘70s.
I’m sure that many of you are familiar with it. We use a tree data structure with essentially the simplest possible representation of a tree (see Figure 2). It is a rooted tree where every node has one pointer to its parent. Every set is represented by a tree, where the set name is in the root. Every node represents an element in the set. To answer a find(x) operation we start at the given node x and follow the pointers to the root node, which contains the name of the set. The time of the find operation is proportional to the length of the path. The tree structure is important here because it affects the length of the find path. To perform a unite(x,y) operation, we get hold of the two corresponding tree roots x and y, and make one of the roots point to the other root. The unite operation takes constant time.

[Figure 2: a set represented as a rooted tree on the elements A, B, C, D, E, F; each node points to its parent.]

The question is, how long can find paths be? Well, if this is all there is to it, we can get bad examples. In particular, we can construct the example in Figure 3: a tree which is just a long path. If we do lots of finds, each of linear cost, then the total cost is proportional to the number of finds times the number of elements, $O(m \cdot n)$, which is not a happy situation.

[Figure 3: a tree that is a single long path on the elements A, B, C, D, E, F.]

As we know, there are a couple of heuristics we can add to this method to substantially improve the running time. We use the fact that the structure of each tree is completely arbitrary. The best structure for the finds would be if each tree had all its nodes just one step away from the root. Then find operations would all have constant cost. But as we do the unite operations, the trees get deeper and deeper. However, we can perform the unites in several intelligent ways to ensure that the trees don’t get too deep.

3.1 Unite by size (McIlroy) [?]: One method to combine two trees into one is by always making the root of the smaller tree point to the root of the larger tree.
The method is described in the pseudo code below:

3.1.1 Maintain at each root x the tree size, size(x) (the number of nodes in the tree).
3.1.2 unite(x,y):
    if size(x) ≥ size(y), make x the parent of y and set size(x) ← size(x) + size(y);
    otherwise, make y the parent of x and set size(y) ← size(x) + size(y).

3.2 Unite by rank: In this method each node contains a “rank”, which is an estimate of the height of the node, that is, the maximum number of edges on a path from a leaf in its subtree up to the node. When we combine two trees of different rank we attach the shallower tree to the deeper tree, thus getting a tree with the same rank as the deeper tree. The pseudo code is below:

3.2.1 Maintain at each root x the tree rank, rank(x). Initially, the rank of each node is zero.
3.2.2 unite(x,y):
    if rank(x) > rank(y), make x the parent of y;
    if rank(x) < rank(y), make y the parent of x;
    if rank(x) = rank(y), make x the parent of y and increase the rank of x by 1.

As we will see when I talk about what we do during finds, it will no longer be true that the rank is the tree height; it will be an upper bound on the tree height.

The rules above improve the complexity drastically. In particular, they cut the find time from linear to logarithmic. Now the total cost for a sequence of m find operations and an appropriate number of unite operations is $O(m \log n)$, since the rank of a tree is at most logarithmic in the number of nodes in the tree. That was in Galler and Fischer’s [?] paper.

There is one more thing we can do to improve the complexity of the solution to Problem 3.1. It is an idea that Knuth attributes to Alan Tritter [?] from IBM. The idea is that we modify the tree not only when we do unite operations (by sticking the two trees together), but also when we perform find operations: we “squash” the trees along the find path (see Figure 4). When we perform a find on an element, let’s say E, we walk up the path to the root, A, which contains the name of the set represented by this tree.
We know now not only the answer for E, but also the answer for every node along the path from E to the root. We take advantage of this fact by squashing or compressing this path and making all these nodes point directly to the root. The tree is modified as depicted in Figure 4. Thus, if later we do a find on, say, D, instead of being three steps away from the root, it is now one step away from it.

[Figure 4: path compression. Left: the find path A, B, C, D, E before a find on E. Right: after the find, B, C, D and E all point directly to the root A.]

The question now is, by how much does path compression improve the complexity of the problem? Analyzing this algorithm, especially if you use both path compression and one of the unite rules, is complicated, and Knuth proposed it as a challenge. I will remind you of the history of the upper bounds for this problem from the early 1970’s. There was an early “proof” by Bogus (???) of an $O(m)$ time bound, with constant time per find. Shortly thereafter, Mike Fischer [?] obtained a correct bound of $O(m \log\log n)$. Later, Hopcroft and Ullman [?] obtained the bound $O(m \log^* n)$. Here $\log^* n$ denotes the number of times you have to apply the log function to n to get down to a constant. After this result had already appeared, there was yet another result by Bogus [?] claiming a lower bound of $\Omega(n \log\log n)$. Then I was able to obtain a lower bound which showed that this data structure, in fact, does not perform in constant time per find. Rather, it performs in slightly worse than constant time per find. I showed the lower bound $\Omega(n\,\alpha(n))$, where $\alpha(n)$ is the inverse of Ackermann’s function, an incredibly slowly growing function that you cannot possibly measure in practice. It will be defined below. After showing the lower bound, I was able to obtain a matching upper bound of $O(m\,\alpha(n))$. So the correct answer for the complexity of the algorithm using both path compression and one of the unite rules is almost constant time per find, where “almost constant” is the inverse of Ackermann’s function.
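The combination discussed above, unite by rank together with path compression, fits in a few lines. The following Python class is a minimal sketch under stated assumptions: elements are the integers 0..n-1, the class and attribute names are illustrative, and `unite` accepts arbitrary elements (it calls `find` first), whereas the lecture's unite takes the two set names directly.

```python
class DisjointSets:
    """Union-find with union by rank and path compression (a sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # rank is an upper bound on node height

    def find(self, x):
        # First pass: walk up the parent pointers to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: compress the path so each node points at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def unite(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx                 # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx           # make rx the root of larger rank
        self.parent[ry] = rx          # shallower tree goes under deeper one
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1        # equal ranks: the new root grows by one
        return rx
```

As a usage example, after `ds = DisjointSets(5)` and `ds.unite(0, 1)`, the calls `ds.find(0)` and `ds.find(1)` return the same name, while `ds.find(2)` still returns 2.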
Ackermann’s function was originally constructed as a function that grows so rapidly that it is not in the class of primitive recursive functions. Here is one possible definition of the inverse of Ackermann’s function. We define a sequence of functions: for $j \ge 1$ and $k \ge 0$,

$A_0(j) = j + 1$,
$A_k(j) = A_{k-1}^{(j+1)}(j)$ for $k \ge 1$,

where $A^{(i+1)}(x) = A(A^{(i)}(x))$ denotes function composition. Note that $A_0$ is just the successor function; $A_1$ is essentially multiplying by two; $A_2$ is exponentiation; $A_3$ is iterated exponentiation, the inverse of $\log^*(n)$; after that the functions grow horrendously.

The inverse of Ackermann’s function is defined as:

$\alpha(n) = \min\{\, k : A_k(1) \ge n \,\}$

The growth of the function $\alpha(n)$ is incredibly slow even for enormous n’s. For $k = 4$, $A_k(1)$ is greater than any number that anyone will ever compute with, in the course of all time.

The most interesting thing about this problem, in my opinion, is the surprising result involving $\alpha(n)$. I was able to obtain this result because I imagined that it wasn’t linear (and I was right), so what could it be? Since this result, the inverse of Ackermann’s function has turned up in a number of other places in computer science, especially in computational geometry, in Davenport-Schinzel sequences [?] and related topics, and in the complexity of various geometrical configurations involving lines and points and other structures.

What is left to research on this problem? There are still interesting questions having to do with extensions of the upper bounds, lower bounds, and similar topics. Let me just mention that this data structure is really very powerful. You can attach values to the edges or vertices in the trees and compute functions of these values along tree paths during finds, and there are many applications of this. There are variants of path compression that have the same kinds of inverse Ackermann function bounds and some other variants that have worse bounds; see, for example, Tarjan and van Leeuwen [?].
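To get a feel for how slowly the inverse Ackermann function grows, it can be evaluated directly. The sketch below assumes the reading $A_0(j) = j+1$, $A_k(j) = A_{k-1}^{(j+1)}(j)$ and $\alpha(n) = \min\{k : A_k(1) \ge n\}$ of the definition in this section; the function names and the `cap` parameter are illustrative assumptions. Values are clipped at `cap`, so the astronomically large $A_4(1)$ is never actually computed, and $A_1$ uses the closed form $2j+1$ that follows from applying the successor function $j+1$ times.

```python
def ackermann_capped(k, j, cap):
    """Return min(A_k(j), cap) for the hierarchy A_0(j) = j + 1,
    A_k(j) = A_{k-1} applied (j + 1) times to j."""
    if k == 0:
        return min(j + 1, cap)
    if k == 1:
        # A_0 applied (j + 1) times to j gives j + (j + 1).
        return min(2 * j + 1, cap)
    x = j
    for _ in range(j + 1):
        x = ackermann_capped(k - 1, x, cap)
        if x >= cap:
            return cap  # stop early: the true value is at least cap
    return x


def inverse_ackermann(n):
    """alpha(n) = min{k : A_k(1) >= n}, for n >= 1."""
    k = 0
    while ackermann_capped(k, 1, n) < n:
        k += 1
    return k
```

Under this definition $A_0(1) = 2$, $A_1(1) = 3$, $A_2(1) = 7$ and $A_3(1) = 2047$, so `inverse_ackermann(n)` is at most 3 for every n up to 2047 and equals 4 for any larger input one could realistically supply.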
The lower bound that I originally obtained was for the particular algorithm that I have described. The inverse Ackermann function turns out to be inherent in the problem. There is no way to solve this problem without having the inverse Ackermann function dependence. I was able to show this for a pointer machine model with certain restrictions [?]. Later Fredman and Saks [?] had a beautiful result for the cell probe model, where the inverse Ackermann function is also necessary. Recently, Haim Kaplan, Nira Shafrir and I [?] have extended the data structure to insertions and deletions with the same kind of bounds.

4. Amortization and Self-adjusting Search Trees

The inverse Ackermann function that results from path compression is very complicated, if one gets into the heart of the analysis. It illustrates a very important concept, which is the notion of amortization. In solving Problem 3.1 we are performing a sequence of operations, unites and path compressions, in various orders. We may, in fact, generate a long path in the tree and have a single find cost logarithmic time. But the find operation squashes the tree and causes later path compressions to be cheap. Since we are interested in measuring the total cost, we do not mind if some operations are expensive, as long as they are balanced by cheap ones. This leads to the notion of amortization, which is the cost per operation averaged over a worst-case sequence. This is the first example that I am aware of where this notion arose, although in the original work the word amortization was not used and the analysis we perform these days was not present then. The idea that you can have a data structure where you do simple modifications to improve things for later operations is extremely powerful. I would like to turn to another data structure where this idea comes into play: self-adjusting search trees.

4.1 Splay Trees

There is a type of self-adjusting search tree called the splay tree that I’m sure many of you know about.
It was invented by Danny Sleator and myself [?]. Many complexity results are known for splay trees, as we will see, and they illustrate some clever ideas in algorithmic analysis. However, the ultimate question of the optimality of the splaying algorithm, within a constant factor, remains an open problem.

Let me remind you about binary search trees.

Definition 4.1: A binary search tree is a binary tree, i.e., every node has a left and a right child, either or both of which can be missing. Each node contains an item of data and a key, and the items are totally ordered by their keys. The items are arranged in the binary search tree in the following way: for every node r in the tree, every item in the left subtree of r is less than the item stored in r, and every item in the right subtree of r is greater than or equal to the item stored in r.

The operations done on the tree are access, insert and delete. We perform a search for an item in the obvious way: we start at the root and go down the tree, choosing at every node whether to go left or right by comparing the key of the node to the key of the item we are searching for. The search time is proportional to the depth of the tree or, more precisely, to the length of the path from the root to the designated item. For example, in Figure 5, if we are searching for “frog”, which is at the root, it is cheap; if we are searching for “zebra”, it will take us four steps. Searching for “zebra” is more expensive, but not too expensive, because the tree is reasonably balanced. Of course, there are “bad” trees which are just long paths, and there are “good” trees, which are spread out wide like the one in Figure 5. If we have a fixed set of items, it is easy to construct a perfectly balanced tree, which gives us logarithmic access time.

[Figure 5: a binary search tree on the keys cat, cow, dog, frog, goat, horse, pig, rabbit, zebra, with “frog” at the root.]

The situation becomes more interesting if we want to allow insertion and deletion operations, since the shape of the tree will change.
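Before turning to updates, the search procedure described above can be sketched as follows. This is a minimal Python illustration, not code from the lecture: the `Node` class and the `search` function are assumed names, and the exact shape of the example tree is a reconstruction of Figure 5 from the text ("frog" at the root, "zebra" four steps down, "dog" as the predecessor of "frog", "pig" a leaf).

```python
class Node:
    """A binary search tree node; missing children are None."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(root, key):
    """Return (node, steps): the node holding key (or None if absent)
    and the number of nodes visited on the search path."""
    node, steps = root, 0
    while node is not None:
        steps += 1
        if key == node.key:
            return node, steps
        # Smaller keys live in the left subtree, larger ones in the right.
        node = node.left if key < node.key else node.right
    return None, steps


# Assumed shape of the Figure 5 tree (a reconstruction, see above).
tree = Node("frog",
            Node("cow", Node("cat"), Node("dog")),
            Node("horse",
                 Node("goat"),
                 Node("rabbit", Node("pig"), Node("zebra"))))
```

With this tree, `search(tree, "frog")` visits one node and `search(tree, "zebra")` visits four, in line with the costs quoted in the text.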
There are standard methods for inserting and deleting items in a binary search tree. Let me remind you how these work. The easiest method for an insert operation is just to follow the search path, which will end at the bottom of the tree, and stick the new item in the missing position. A delete operation is slightly more complicated. Here is an example. If I want to delete “pig” (a leaf in the tree), I simply delete the node containing it. But if I want to delete “frog”, which is sitting at the root, I have to replace that node with another node. I can get the replacement node by taking the left branch from the root and then going all the way down to the right, giving me the predecessor of “frog”, which happens to be “dog”, and moving it to replace the root. Or, symmetrically, I can take the successor of “frog” and move it to the position of “frog”. In either case, an insertion or deletion takes essentially one search in the tree which, in the worst case, is proportional to the depth.

But now the tree structure changes. In the presence of insertions and deletions, long paths may develop. Now if we want to perform finds in such trees, the search time will increase. Therefore, we need operations that allow us to restructure the tree, so that we can move things up and down to restore and maintain a "good" search tree.

The standard operation for restructuring trees is the rebalancing operation called rotation. Rotation takes an edge (f, k) in the tree and switches it around to become (k, f), as depicted in Figure 6. This is a right rotation. A left rotation is defined similarly. In any standard computer representation this takes constant time, and the resulting tree is still a binary search tree. Rotation is universal in the sense that any tree on the same set of ordered items can be turned into any other tree on the same set of items in the same order by doing an appropriate sequence of rotations.
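A rotation as described above can be sketched in a few lines of Python. This is an illustrative sketch: the `Node` class and function names are assumptions, and the variable names f and k follow the edge (f, k) in the text, with k the left child of f before a right rotation. Each function returns the new root of the rotated subtree, which the caller must reattach.

```python
class Node:
    """A binary search tree node; missing children are None."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(f):
    """Turn the edge (f, k), with k the left child of f, into (k, f)."""
    k = f.left
    f.left = k.right   # k's right subtree becomes f's left subtree
    k.right = f        # f becomes the right child of k
    return k           # k is the new root of this subtree


def rotate_left(k):
    """Inverse of rotate_right: the right child f of k moves above k."""
    f = k.right
    k.right = f.left   # f's left subtree becomes k's right subtree
    f.left = k         # k becomes the left child of f
    return f


def inorder(node):
    """Keys in symmetric (left-to-right) order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Only a constant number of pointers change, and the symmetric order of the keys is unchanged, which is why the result is still a binary search tree.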
We can use rotations to rebalance the tree in the presence of insertions and deletions, and there are various so-called “balanced tree structures” that use extra information to preserve the balance when there are insertions and deletions: “AVL trees”, “red-black trees”, etc. All these balanced tree structures have the property that the worst-case time for a search, insertion or deletion is $O(\log n)$.

[Figure 6: rotation. A right rotation at the edge between f and its left child k makes k the parent of f; a left rotation reverses it. A, B and C denote subtrees.]

Perhaps that should be the end of the story, but balanced search trees possess certain drawbacks:

You have to store extra information to keep track of the balance, so they need extra space.
The rebalancing in the case of insertions or deletions involves several cases, some of which are complicated.
And perhaps more important, the data structure is logarithmic in the worst case, but it is logarithmic in the best case as well. It doesn’t necessarily adapt to a usage pattern. That is to say, suppose I have a tree with a million items but I am only using a thousand of them. I would like those thousand items to be accessed cheaply, more cheaply than log of a million. An ordinary search tree does not allow that.

There are various data structures that people have invented to handle this last drawback. Assume that you know something about the usage pattern; for example, you have an estimate of the access frequency for each item. Then you can think about constructing an optimum search tree to minimize the average access time. But maybe the access pattern changes over time. Maybe the access frequencies vary. This happens a lot in practice.

Motivated by these questions, and also knowing about the path compression results, Danny Sleator and I [?] asked the following question:

Problem 4.2: Is there a simple self-adjusting mechanism for search trees where we could avoid explicitly balancing the tree and somehow take advantage of the usage pattern, i.e., have the tree automatically adjust itself based upon the usage?
The goal is to have items that are accessed frequently somehow bubble up in the tree, and items that are accessed less frequently remain, somehow, at the bottom of the tree. We were able to come up with such a structure, which we call the “splaying tree”, or “splay tree”. A splay tree is a self-adjusting search tree. “Splay” means to spread out.

Splaying is a simple self-adjusting heuristic, like path compression, but now we are operating on binary search trees. In the splaying heuristic we take a designated item and move it up to the root of the tree by performing rotations, thus preserving the fact that it’s a search tree. The idea is essentially to perform rotations bottom up, but it turns out that if you do rotations one at a time strictly in order, you don’t get the right properties. The heuristic performs rotations in pairs, roughly in bottom-up order, according to the rules shown in Figure 7 and explained below. Thus, when we access an item, we walk down the tree to the item and then perform the splay operation, which moves the item all the way up to the tree root. Every item along the path has its distance to the root roughly halved, and all other nodes get pushed to the side. No node moves down more than a constant number of steps.

[Figure 7: Cases of splaying. (a) Zig: a single rotation at the edge between x and its parent y. (b) Zig-zig: x, its parent y and its grandparent z are linked in the same direction. (c) Zig-zag: x, y and z are linked in opposite directions. A, B, C and D denote subtrees.]

Figure 7 shows the cases of splaying. Assume x is the node to be accessed. If the two steps above x toward the root are in the same direction, we do a rotation on the top edge first and then on the bottom edge, which locally transforms the tree as seen in Figure 7b. This transformation doesn’t look helpful, but in fact, when you do the rotations in sequence, it has positive effects. That’s called the “zig-zig” case.
If the two edges from x toward the root are in opposite directions, right-left as seen in Figure 7c or, symmetrically, left-right, then the bottom rotation is done first, followed by a top rotation. That’s the “zig-zag” case. In this case, x moves up, and y and z get split between the two subtrees of x. All other subtrees are attached in the correct places. We keep doing zig-zag and zig-zig steps, as appropriate, bubbling x up two steps at a time, until either x gets to the root or it gets one step away from the root, in which case we then do one final rotation, the zig case (Figure 7a). The splay operation is the entire sequence, which moves a designated item all the way up to the root by means of these transformations.

[Figure 8: step-by-step splaying examples on a six-node tree. (a) Pure zig-zig case. (b) Pure zig-zag case.]

Figure 8a contains another step-by-step example. This is the purest zig-zig case. We perform two rotations, moving item #1 up the path, and then two more rotations, moving item #1 further up the path. Finally, a last rotation is performed to complete the task. Going from the initial position to the final one is called “splaying node 1”. Figure 8b is the pure zig-zag case.

Figure 9 is another example of a pure zig-zag case and a single splay operation. Again, the accessed node moves to the root, every other node along the find path has its distance to the root roughly halved, and no nodes are pushed down by more than a constant amount. If we start out with a really bad example and do a few splay operations, the tree gets balanced very quickly. Given this splay operation, we are able to perform insertions, deletions, splits and joins of trees very efficiently and simply.

I should say that there is also a top-down version of this algorithm.
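The bottom-up splay step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: nodes carry parent pointers, and the names `Node`, `attach`, `rotate_up`, `splay` and `inorder` are all illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def attach(parent, child, side):
    """Helper for building example trees: hang child on parent's side."""
    setattr(parent, side, child)
    child.parent = parent


def rotate_up(x):
    """Rotate x above its parent p, preserving symmetric order."""
    p, g = x.parent, x.parent.parent
    if p.left is x:                      # right rotation at the edge (p, x)
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:                                # left rotation at the edge (p, x)
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                    # reconnect x to its new parent
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Move x to the root with zig, zig-zig and zig-zag steps."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)                 # zig: x is one step from the root
        elif (g.left is p) == (p.left is x):
            rotate_up(p)                 # zig-zig: top edge first,
            rotate_up(x)                 # then the bottom edge
        else:
            rotate_up(x)                 # zig-zag: bottom edge first,
            rotate_up(x)                 # then the new top edge
    return x


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

As a usage example, building a left path on the keys 6, 5, ..., 1 with `attach` and calling `splay` on the node with key 1 moves that node to the root while keeping the keys in symmetric order.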
If you look at Danny Sleator’s website at Carnegie Mellon [?], you may find code for the top-down version of this algorithm.

[Figure 9: a single splay operation on a ten-node tree.]

4.2 Complexity Results

The main question for us theoreticians is, “How does this algorithm perform?” We are able to show that in the long run, in the amortized sense, this algorithm performs just as well as balanced trees.

Theorem 4.3: Assume we start off with an n-node tree and we perform a sequence of m accesses, ignoring start-up effects, that is, assume that $m \ge n$. The following results hold:

(a) The total cost of m accesses is $O(m \log n)$, thus matching the bound for balanced trees.
(b) The splaying algorithm on any access sequence performs within a constant factor of the best possible static tree, in spite of the fact that this algorithm does not know the access frequencies.
(c) If you access the items in symmetric order, that is, in left-to-right order, small to large, starting with an arbitrary tree, the total access time is linear, $O(m)$. The amortized cost per access is constant, so it’s better than the usual bound of $O(\log n)$.

Note that in any static tree, if you access each of the items in order, it would cost you $\Omega(m \log n)$. So this illustrates the fact that modifying the tree as you go along can be good in certain situations.

It is relatively straightforward to prove results (a) and (b) above. They follow from an interesting potential argument that I trust many of you are familiar with. It can be found in [?]. Result (c) turns out to be complicated to prove; I was eventually able to come up with a complicated inductive proof.

The behavior of the tree and its access times is really interesting (see Figure 10). [Insert figure from CD] If you start off with a long path in the tree, the left path, and begin accessing the nodes in sequential order, the first access costs $n$. The next access costs about $n/2$, the next one costs $n/4$, and so on.
You get an exponential decay until you get down to something like a constant. Then the access time function behaves like a ruler function, with a certain amount of randomness thrown in, so about half the accesses are really cheap, about a quarter are slightly more expensive, an eighth of the accesses are slightly more expensive than that. The average cost per access turns out to be constant, not logarithmic but constant. This is an experimental result. Proving it is true, especially for any initial tree, is quite difficult.

To state our audacious conjecture, consider any possible way of manipulating binary search trees whatsoever, ignoring insertions and deletions. We begin with some initial tree. We want to perform a sequence of accesses. The cost of accessing a node is the number of nodes on the access path from the node to the root, and anytime we want we can perform a rotation, also at a cost of one. In an off-line algorithm the sequence of accesses is given in advance. In an on-line algorithm there is no prior knowledge of the sequence of accesses. The conjecture compares the minimum cost of off-line algorithms with that of the on-line splaying algorithm.

Problem 4.4: Dynamic Optimality Conjecture: For any access sequence, the on-line splaying algorithm performs within a constant factor of the optimal off-line algorithm (under the assumptions that the cost of accessing a node is the number of nodes on the access path and the cost of each rotation is one).

The closest that anyone has come to proving this conjecture is an extension by Richard Cole of the sequential access result. Consider a sequence of accesses. For simplicity, assume that the items are numbered 1, 2, 3, … and that they appear in that order in the search tree. For each consecutive pair of accesses, the distance between them is defined as the number of items in between the two given items that are to be accessed.
For example, if we access item 10 and then we access item 25, the distance is their difference, i.e., 15. Richard Cole proved the following:

Theorem 4.5 (Cole et al. [?]): The total cost of a sequence of accesses in a splay tree is at most the sum of the logarithms of the distances between consecutive accesses, multiplied by some constant factor.

This result, called the “dynamic finger theorem”, would follow from the dynamic optimality conjecture. The dynamic optimality conjecture, however, is still open. To summarize, we do not know the answer to the following problem:

Problem 4.6: Is the splaying algorithm optimal within a constant factor?

4.3 The Rotation Distance between Search Trees

The splay tree data structure is very interesting. Again, it illustrates the fact that if you perform very simple operations but you repeat them, then you get very complicated behavior that is hard to analyze. It is also worth noting that this data structure has been used often in practice in various systems applications, memory management, etc. In many applications of table structures or tree structures, most of the data does not get accessed most of the time. The splaying algorithm takes advantage of locality of reference; in other words, for some small working set in the tree, the splay algorithm moves that set right up to the top of the tree, where you have your hands on it with low access cost. Then, as the working set changes, the new working set moves up to the top of the tree. The drawback of this data structure is, of course, that it is performing rotations all the time. Nevertheless, it seems to work very well in practical situations where the access pattern is changing.

I mentioned optimum search trees; they lead us into questions on search trees that are less about data structures and more algorithmic. Search trees are a fascinating topic. Another interesting question about binary search trees is to look at their structure under rotation.
I have said that you can convert any search tree into any other search tree by performing an appropriate sequence of rotations. We could ask the following:

Question 1: Given two trees, how many rotations does it take to convert one tree into the other?

This is an algorithmic question. The problem seems to be NP-complete, but I am not sure whether it is or not. A related question is how far apart two trees can be. To formulate the question more precisely, we define the “tree graph” in the following way: the nodes of the graph are the n-node binary trees, and two trees are connected by an edge if and only if one can be obtained from the other by a single rotation. This graph is connected.

Question 2: What is the diameter of the “tree graph” described above?

It is easy to get an upper bound of about 2n for the diameter. A matching lower bound is not so easy: it is easy to get a lower bound of n, but that leaves a gap of a factor of 2. Danny Sleator and I worked with Bill Thurston, who turns everything into a hyperbolic geometry problem. We were able to show that the 2n bound is tight by converting the distance problem in the tree graph into a hyperbolic geometry problem. This was another amazing thing. I don’t have slides to illustrate the details, and it is technical, but it is an example where some piece of mathematics from way out in left field comes in and solves some nitty-gritty data structure problem. It just goes to show the power of mathematics in the world.

5. Algorithmic questions on search trees

Assume we have a sequence of items in fixed order. We want to store them in a binary search tree so that we minimize the average access time. We assume that we know the access probabilities and that each item is accessed independently with its own fixed probability. The problem is to construct the tree. There are exponentially many trees, so it might seem to take exponential time, but in fact it does not.
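To make the claim that exponential time is not needed concrete, here is a minimal sketch of a dynamic program for the variant in which items are stored in the internal nodes. It is an illustration under stated assumptions, not the lecture's own code: the root is taken to have depth 1 (matching the access cost of "number of nodes on the access path"), and the function name `optimal_bst_cost` and the flag `restrict_roots` are hypothetical. With `restrict_roots=False` every root is tried for every interval, giving cubic time. With `restrict_roots=True` the candidate roots for an interval are limited to the range between the optimal roots of the two intervals that are one item shorter, which is the monotonicity property usually attributed to Knuth and brings the total work down to quadratic.

```python
from itertools import accumulate
from typing import List, Sequence


def optimal_bst_cost(w: Sequence[int], restrict_roots: bool = False) -> int:
    """Minimum of sum(w[i] * depth[i]) over binary search trees on items 0..n-1.

    Assumption of this sketch: the root has depth 1.
    cost[i][j] is the optimum for the half-open item range i..j-1.
    """
    n = len(w)
    prefix: List[int] = [0, *accumulate(w)]            # prefix[j] - prefix[i] = total weight of i..j-1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):                                  # single-item trees
        cost[i][i + 1] = w[i]
        root[i][i + 1] = i
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if restrict_roots:
                lo, hi = root[i][j - 1], root[i + 1][j]  # restricted candidate range
            else:
                lo, hi = i, j - 1                        # try every item as the root
            best, best_r = None, lo
            for r in range(lo, hi + 1):
                c = cost[i][r] + cost[r + 1][j]
                if best is None or c < best:
                    best, best_r = c, r
            # Hanging both subtrees under a root pushes every item one level deeper,
            # which adds the total weight of the range once.
            cost[i][j] = best + prefix[j] - prefix[i]
            root[i][j] = best_r
    return cost[0][n]


assert optimal_bst_cost([1, 1, 1]) == 5   # balanced tree: 1*1 + 1*2 + 1*2
assert optimal_bst_cost([4, 1, 1]) == 9   # heavy first item at the root: 4*1 + 1*2 + 1*3
```

The sketch returns only the optimal cost; the `root` table it fills in would be enough to rebuild the tree itself.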
There are two different kinds of algorithms, depending on the kind of binary search tree we pick. Thus we consider two cases:

Case 5.1: Items are allowed to be stored in the internal nodes of the tree. The problem in this case is: Given positive weights $w_1, w_2, \ldots, w_n$, construct an n-node binary tree that minimizes $\sum_{i=1}^{n} w_i d_i$, where $d_i$ is the depth of the i-th node in symmetric order.

Case 5.2: Items are allowed to be stored only in the external nodes (the leaves) of the tree. The problem in this case is: Given positive weights $w_1, w_2, \ldots, w_n$, construct a binary tree with n external nodes that minimizes $\sum_{i=1}^{n} w_i d_i$, where $d_i$ is the depth of the i-th external node in symmetric order.

In the splay data structure I allowed items to be stored in the internal nodes of the tree. To build an optimum tree in the first case, there is a straightforward $O(n^3)$ dynamic programming algorithm, just as in the stack generation problem I mentioned. Knuth [?] was able to show that there is no need to look at all the sub-problems. There is a restriction on the sub-problems that have to be looked at, which reduces the time to $O(n^2)$, and this was extended by Frances Yao [?] to other problems. There is a certain kind of structural restriction that Frances called the “quadrangle inequalities”. Here we have an $O(n^2)$ dynamic programming algorithm with a certain amount of cleverness in it. This result is twenty-five years old or so. Nothing better is known. There may, in fact, be a faster algorithm; no nontrivial lower bound is known.

If we change the problem so that we store the items in the external nodes and just put values in the internal nodes to drive the search process, then we can do even better. There is a beautiful algorithm due to Hu and Tucker [?] which runs in $O(n \log n)$ time and which is an extension of the classical algorithm of Huffman for Huffman codes. The difference here is the alphabetic restriction.
The items have to be in order in the leaves, whereas in Huffman coding you can permute them. It turns out that a simple variant of the Huffman coding algorithm with a similar running time works here. The amazing thing about this is that the proof is inordinately complicated. The original proof was almost unreadable and tremendously long. There are now many nicer and simpler proofs (references ??), but it is still quite a miracle that this thing in fact works. And again, there is no lower bound here. In fact, there is no reason to believe that the problem can’t be solved in linear time, and there is good reason to believe that maybe it can. There are many special cases of this problem that can be solved in linear time.

6. Minimum Spanning Tree Problem

Let me close by coming back to a classical graph problem: the minimum spanning tree problem. We are given a connected, undirected graph with edge costs. We want to find a spanning tree of minimum total edge cost. This is a classical problem in network optimization, and it is about the simplest problem in this area. It has a really long history. Computing a minimum spanning tree corresponds to single-linkage clustering, so a lot of work on this problem was done by people developing clustering algorithms, and it occurred very early on. There was an anthropologist in about 1909 who hinted at a solution that approaches what we would call an algorithm. The first algorithms for this problem were developed in the late 1920s by various Czech mathematicians, and most of the algorithms we know were discovered back in the 1920s and 1930s by mathematicians who are now little known. The exception is Kruskal’s algorithm, with which we are all familiar. In Kruskal’s algorithm the idea is to sort the edges in increasing order by cost, process the edges in this order, and build up the tree incrementally, looking at each edge sequentially.
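As a concrete illustration of this edge-by-edge process, here is a minimal sketch of Kruskal’s algorithm. It assumes vertices are numbered 0 to n-1 and edges are given as `(cost, u, v)` tuples. The function name `kruskal` and the particular disjoint-set structure used (union by size with path halving) are choices made for this sketch, not necessarily the variant the lecture has in mind.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (cost, u, v); this layout is an assumption of the sketch


def kruskal(n: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning forest of an undirected graph."""
    parent = list(range(n))   # disjoint-set forest: parent pointer per vertex
    size = [1] * n            # component sizes, used for union by size

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> bool:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                   # already in the same component
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                    # attach the smaller tree under the larger
        size[ra] += size[rb]
        return True

    tree: List[Edge] = []
    for cost, u, v in sorted(edges):       # increasing order of cost
        if union(u, v):                    # keep the edge only if it joins two components
            tree.append((cost, u, v))
    return tree


example = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 2, 3), (3, 1, 3)]
mst = kruskal(4, example)
assert sum(cost for cost, _, _ in mst) == 6   # edges of cost 1, 2 and 3 are kept
```

The sort dominates the running time here; the loop over the sorted edges performs only disjoint-set operations.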
If the edge connects two different components of the forest built so far, we add it to the tree we are building; if it connects two vertices in the same component, we throw it away. To implement this algorithm, we need a sorting or sorting-like algorithm plus a data structure like the Set union [?] algorithm that was mentioned in section 3. If you include the sorting time, the cost of Kruskal’s algorithm is $O(m \log n)$; if you assume that the edges are presorted by cost, the running time of Kruskal’s algorithm is $O(m\,\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function.

There were earlier algorithms. There is a classical algorithm, usually credited to Prim and Dijkstra, which is a single-source version of Kruskal’s algorithm. It begins at a vertex and grows a tree from it by repeatedly adding the cheapest edge connecting the tree to a new vertex. This algorithm was actually discovered by Jarník in 1930 [?]. It runs in quadratic time as it was described by Prim and Dijkstra. Jarník did not describe computational complexity at all.

There is an even earlier algorithm, a beautiful parallel algorithm that was described by Borůvka [?] in 1926, which has a running time of $O(\min\{m \log n, n^2\})$. This algorithm is Kruskal’s algorithm in parallel: for every vertex, pick the cheapest incident edge and add it to the set of edges that you keep. In general, you have a collection of pieces of the spanning tree. For each piece, pick the cheapest incident edge and add it. Two pieces might select the same edge from opposite directions. In every iteration the number of pieces goes down by at least a factor of 2, and this is where the logarithm comes in. Once again, Borůvka did not analyze the computational complexity.

The classical results are $O(n^2)$ or $O(m \log n)$. The obvious question here is whether sorting is really needed, and whether one can get rid of the log factor. Can this problem, in fact, be solved in linear time? Andy Yao [?]
in 1975 was able to add a new idea to Borůvka’s algorithm that reduced the running time from $O(m \log n)$ to $O(m \log\log n)$, thereby showing that sorting was not, in fact, inherent in this problem. Then there was a sequence of improvements. If the Fibonacci heap data structure that Mike Fredman and I [?] invented is used, a running time of $O(m \log^* n)$ is achieved. If Gabow’s idea [?] is combined with Fibonacci heaps, $O(m \log\log^* n)$ is achieved. Very recently, Bernard Chazelle was able to bring the complexity down to $O(m\,\alpha(n))$, and again you see the inverse Ackermann function coming in. This is a deterministic algorithm.

This is not the end of the story. You could ask the question: “Is $\alpha(n)$ inherent in this problem as well?” It turns out that it probably is not. The algorithms above, including Chazelle’s, are all deterministic. If randomization is allowed, the problem can, in fact, be solved in $O(m)$ expected time. There is also a truly amazing result, based on Chazelle’s work, by Pettie and Ramachandran [?], which says the following: it is possible to construct an algorithm that is optimum, but they cannot analyze its running time. So you have an algorithm that runs in at most $O(m\,\alpha(n))$ time. It might run in linear time. They cannot analyze the complexity of the algorithm, but they know it runs as fast as possible.

Now, how can you come up with an algorithm that you know is optimum but you cannot say how fast it is? The idea is to use pre-processing. If the problem is small enough, all possibilities can be enumerated and a decision tree can be constructed that tells you the minimum number of computations, in this case comparisons, that have to be made in order to solve the problem. Suppose there are n edges. If there were enough time, the optimum decision tree for the computation could be built. It would take doubly exponential or triply exponential time or something like that. This pre-processing time is too expensive.
But if the problem is really, really small, say, of size $O(\log\log\log n)$ or something like that, then all the possible algorithms can be enumerated on that small problem and the fastest one picked, and then you are in good shape. Using Borůvka’s idea plus Chazelle’s, a recursion is obtained that takes the original problem and reduces it to a collection of really tiny sub-problems. For the tiny sub-problem size the pre-processing computation is run; it is hugely exponential in that size, but it becomes linear or sub-linear in the size of the original problem. The optimum algorithm for sub-problems of that small size is found, then that algorithm is applied to all these small sub-problems, and we are done. It can be shown that this is the best possible algorithm to within a constant factor, but its running time cannot be stated. It is a very peculiar thing. This is the strange, unsettled state of the minimum spanning tree problem at the moment.

7. References