The Rothschild Lecture
Robert E. Tarjan - (Princeton University)
Turing Award Winner
"Problems in Data Structures & Algorithms"
1. Introduction
I would like to talk about various problems I have worked on over the course of my
career. In this lecture I'll review simple problems with interesting
applications, and problems that have rich, sometimes surprising, structures.
Let me start by saying a few words about how I view the process of research,
discovery and development. (See Figure 1).
Figure 1. The discovery/development process: an application is modeled (abstracted) into an abstraction; an algorithm is developed from the abstraction and applied back to the application; experiments connect the algorithm and the application, and the whole cycle produces old and new theory and practice.
My view is based on my experience with data structures and algorithms in
computer science, but I think it applies more generally. There is an interesting
interplay between theory and practice. The way I like to work is to start out with some
application from the real world. The real world, of course, is very messy and the
application gets modeled or abstracted away into some problem or some setting that
someone, with a theoretical background, can actually deal with. Given the
abstraction, I then try to develop a solution which is usually, in the case of computer
science, an algorithm, a computational method to perform some task. We may be
able to prove things about the algorithm, its running time, its efficiency, and so on.
And then, if it’s at all useful, we want to apply the algorithm back to the application
and see if it actually solves the real problem. There is an interplay in the experimental
domain between the algorithm developed, based on the abstraction, and the
application; perhaps we discover that the abstraction does not capture the right parts
of the problem; we have solved an interesting mathematical problem but it doesn’t
solve the real-world application. Then we need to go back and change the
abstraction and solve the new abstract problem and then try to apply that in practice.
In this entire process we are developing a body of new theory and practice which can
then be used in other settings.
A very interesting and important aspect of computation is that often the key to
performing computations efficiently is to understand the problem, to represent the
problem data appropriately, and to look at the operations that need to be performed
on the data. In fact, it may be that many algorithmic problems turn into data
manipulation problems and the key issue is to develop the right kind of data structure
to solve the problem. I would like to talk about many of these problems. The real
question is to devise a data structure, or to analyze a data structure which is a
concrete representation of some kind of algorithmic process.
2. Optimum Stack Generation Problem
Let’s take a look at the following simple problem. I’ve chosen this problem
because it’s an abstraction which is, on the one hand, very easy to state, but on the
other hand, captures a number of ideas. We are given a finite alphabet and a stack S.
We would like to generate strings of letters over the alphabet using the stack.
There are three stack operations we can perform:
push (a) - push any letter a from the alphabet onto the stack,
emit - output the top letter from the stack,
pop - pop the top letter from the stack.
We can perform any sequence of these operations subject to the following
well-formed constraints: we begin with an empty stack, we perform an arbitrary series
of push, emit and pop operations, we never perform pop from an empty stack, and
we end up with an empty stack. Following these operations we will then have
generated some sequence of letters over the alphabet.
Problem 2.1: Given a string over the alphabet, find a minimum-length sequence
of stack operations that generates it.
We would like to find a fast algorithm to find the minimum length sequence of
stack operations for generating any particular string.
For example, consider the string A B C A C B A. We could generate it by
performing: push (A), emit A, pop A, push (B), emit B, pop B, push (C), emit C,
pop C etc., but the point is that we have repeated letters in the string and we can
use the same item on the stack to generate repeats. A shorter sequence of
operations is: push (A), emit A, push (B), emit B, push (C), emit C, push (A),
emit A, pop A; now we can emit C (we don’t have to put a new C on the stack), pop
C, emit B, pop B, emit A. We got the ‘CBA’ string for free rather than having to do a
new push-pop. This problem is a simplification of a programming problem that
appeared at "The International Conference on Functional Programming" in 2001 [?].
It may also apply to the more complicated problem of optimally parsing HTML-like
expressions.
What can we say about this problem? There is an obvious O(n^3) dynamic
programming algorithm¹. This is really a special case of optimum context-free
language parsing, where you have costs associated with the rules. If you have a small
alphabet, say of size three, there is an O(n) algorithm (by Y. Zhou [?]). If the
number of letters is four, there exists an O(n^2) algorithm. That's basically all I know
about this problem. I would suspect you can solve this problem by matrix
multiplication, which would give a complexity of about O(n^2.3). I have no idea whether
you can do it in O(n^2) or in O(n log n). I think it is very interesting, and I think
that solving this problem and getting a better upper bound or a better lower bound
would reveal more information about context-free parsing than what we already
know. I think this kind of question actually arises in practice. There are also problems
related to string questions in biology that are related to this problem.
¹ Sketch of the algorithm: Let S[1..n] denote the sequence of characters. Note
that there must be exactly n emits and that the number of pushes must equal the
number of pops. Thus we may assume that the cost is simply the number of pushes.
The dynamic programming is based on the observation that if the same stack item is
used to produce, say, S[i_1] and S[i_2] (where i_2 > i_1 and S[i_1] = S[i_2]), then the
state of the stack at the time of emit S[i_1] must be restored for emit S[i_2]. Thus the
cost C[i, j] of producing the subsequence S[i..j] is the minimum of
C[i, j-1] + 1 and min{ C[i, t] + C[t+1, j-1] : S[t] = S[j], i <= t < j }.
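For concreteness, here is a small Python sketch of that O(n^3) dynamic program. This is my own code and naming (not the lecture's); the 0-based indexing is an implementation choice.

# C[i][j] = minimum number of pushes needed to emit S[i..j] (0-indexed,
# inclusive), following the recurrence in the footnote above.

def min_pushes(s):
    n = len(s)
    if n == 0:
        return 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        C[i][i] = 1                          # a single letter needs one push
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = C[i][j - 1] + 1           # push a fresh copy of s[j]
            for t in range(i, j):
                if s[t] == s[j]:
                    # reuse the item pushed for s[t]; s[t+1..j-1] is generated
                    # and completely popped above it in between
                    inner = C[t + 1][j - 1] if t + 1 <= j - 1 else 0
                    best = min(best, C[i][t] + inner)
            C[i][j] = best
    return C[0][n - 1]

def min_operations(s):
    # total operations = pushes + emits + pops = 2 * pushes + len(s)
    return 2 * min_pushes(s) + len(s)

print(min_operations("ABCACBA"))             # the lecture's example: 15 operations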
3. Path Compression
Let me turn to an old simple problem with a surprising solution. The answer to
this problem has already come up several times in some of the talks in the
conference “Second Haifa Workshop on Interdisciplinary Applications of Graph
Theory, Combinatorics and Algorithms.” The goal is to maintain a collection of n
elements which are partitioned into sets, i.e., the sets are always disjoint and each
element is in a unique set. Initially each element is in a singleton set. The sets are
named by some arbitrary element in it. We would like to perform the following two
operations:
find(x) – for a given arbitrary element x, we want to return the set containing
it.
unite(x,y) – combine the two sets named by x and y. The new set gets the
name of one of the old sets.
Let’s assume that the number of elements is n. Initially, each element is in a
singleton set, and after n-1 unite operations all the elements are combined into a
single set.
Problem 3.1: Find a data structure so as to minimize the worst case total cost
of m find operations intermingled with n-1 unite operations.
For simplicity in stating the time bounds, we assume that m >= n, although
this assumption is not very important. This problem originally arose in the processing
of COMMON and EQUIVALENCE statements in the ancient programming language
FORTRAN. It is also the key problem in computing minimum spanning trees using
Kruskal’s [?] minimum spanning tree algorithm.
There is a beautiful and very simple algorithm for solving Problem 3.1,
developed in the late ‘60s, early ‘70s. I’m sure that many of you are familiar with it.
We use a tree data structure with essentially the simplest possible representation of
a tree (see Figure 2): a rooted tree where every node has one pointer to its
parent. Every set is represented by a tree, where the set name is in the root. Every
node represents an element in the set. To answer a find(x) operation we start at the
given node x, and follow the pointers to the root node which contains the name of the
set. The time of the find operation is proportional to the length of the path. The tree
structure is important here because it affects the length of the find path. To perform a
unite(x,y) operation, we get hold of the two corresponding tree roots x and y, and
make one of the roots point to the other root. The unite operation takes constant
time.
Figure 2. A set represented as a rooted tree with parent pointers (nodes A through F); the root A holds the name of the set.
The question is, how long can find paths be? Well, if this is all there is to it,
we can get bad examples. In particular, we can construct the example in Figure 3: a
tree which is just a long path. If we do lots of finds, each of linear cost, then the total
cost is proportional to the number of finds times the number of elements, O(mn),
which is not a happy situation.
Figure 3. A bad case: the tree is a single long path A-B-C-D-E-F.
As we know, there are a couple of heuristics we can add to this method to
substantially improve the running time. We use the fact that the structure of each tree
is completely arbitrary. The best structure for the finds would be if each tree has all
its nodes just one step away from the root. Then find operations would all be at
constant cost. But as we do the unite operation, the trees get deeper and deeper.
However, we can perform the unites in several intelligent ways to ensure that the
trees don’t get too deep.
3.1 Unite by size (McIlroy) [?]: One method to combine two trees into one is by
always making the root of the smaller tree point to the root of the larger tree. The
method is described in the pseudo code below:
3.1.1 Maintain at each root x the tree size, size(x) (the number of nodes in the tree).
3.1.2 unite(x,y): if size(x) >= size(y), make x the parent of y and set
size(x) <- size(x) + size(y); otherwise, make y the parent of x and set
size(y) <- size(x) + size(y).
3.2 Unite by rank: In this method each node contains a "rank", which is an estimate
of the maximum number of edges from the node to the root. When we combine
two trees of different rank, we attach the shallower tree to the deeper tree, thus
getting a tree with the same rank as the deeper tree. The pseudo code is below:
3.2.1 Maintain at each root x the tree rank, rank(x). Initially, the rank of each
node is zero.
3.2.2 unite(x,y): if rank(x) > rank(y), make x the parent of y;
if rank(x) < rank(y), make y the parent of x;
if rank(x) = rank(y), make x the parent of y and increase the rank of x by 1.
We will see, when I talk about what we do in the case of finds, it will no longer be
true that the rank is the tree height. It will be an upper bound on the tree height.
The rules above improve the complexity drastically. In particular, they cut the
find time from linear to logarithmic. Now the total cost for a sequence of m find
operations and an appropriate number of unite operations is O(m log n), since on
average, the rank of a tree is logarithmic in the number of nodes in the tree. That was
in Galler and Fischer's [?] paper.
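For concreteness, here is a minimal Python sketch of the parent-pointer forest with unite by rank; it is my own illustrative code, not the lecture's (unite by size is analogous). Unlike the unite described above, which takes two roots, this version accepts arbitrary elements and finds their roots first.

# Parent-pointer forest with unite by rank.
class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))   # every element starts as a singleton root
        self.rank = [0] * n

    def find(self, x):
        # walk parent pointers up to the root, which names the set
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def unite(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.rank[x] < self.rank[y]:
            x, y = y, x                # make x the root of larger (or equal) rank
        self.parent[y] = x             # attach the shallower tree to the deeper one
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1          # equal ranks: the new root's rank grows by 1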
There is one more thing we can do to improve the complexity of the solution
to Problem 3.1. It is an idea that Knuth attributes to Alan Tritter [?] of IBM. The
idea is that we modify the tree not only when we do unite operations (by sticking the
two trees together), but also when we perform find operations: we “squash” the trees
along the find path (see Figure 4). When we perform a find on an element, let's say E,
we walk up the path to the root, A, which contains the name of the set represented
by this tree. We know now not only the answer for E, but also the answer for every
node along the path from E to the root. We take advantage of this fact by squashing
or compressing this path and making all these nodes point directly to the root. The
tree is modified as depicted in Figure 4. Thus, if later we do a find on say, D, instead
of being three steps away from the root, it is now one step away from it.
Figure 4. Path compression: after find(E), the nodes on the find path (E, D, C, B) all point directly to the root A.
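Here is how the find operation changes once path compression is added, as a replacement for the find method of the DisjointSets sketch above; again this is my own illustrative code, not the lecture's.

# Path compression: after locating the root, every node on the find path is
# made to point directly to it, as in Figure 4.
def find(self, x):
    root = x
    while self.parent[root] != root:
        root = self.parent[root]
    while self.parent[x] != root:      # second pass: squash the path
        self.parent[x], x = root, self.parent[x]
    return root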
The question now is, by how much does path compression improve the
complexity of the problem? Analyzing this algorithm, especially if you use both path
compression and one of the unite rules, is complicated and Knuth proposed it as a
challenge. I will remind you of the history of the upper bounds to this problem from
the early 1970’s:
There was an early, bogus "proof" [?] of an O(m) time bound, that is,
constant time per find. Shortly thereafter, Mike Fischer [?] obtained a correct bound of
O(m log log n). Later, Hopcroft and Ullman [?] obtained the bound O(m log* n).
Here log* n denotes the number of times you have to apply the log function to n to
get down to a constant. After this result had already appeared, there was yet another
bogus result [?] claiming a lower bound of Ω(n log log n). Then I was able to
obtain a lower bound which showed that this data structure, in fact, does not perform
in constant time per find. Rather, it performs in slightly worse than constant time per
find. I showed the lower bound Ω(n α(n)), where α(n) is the inverse of
Ackermann's function, an incredibly slowly growing function that you cannot possibly
measure in practice. It will be defined below. After showing the lower bound, I was
able to obtain a matching upper bound of O(m α(n)). So the correct answer for the
complexity of the algorithm using both path compression and one of the unite rules is
almost constant time per find, where almost constant is the inverse of Ackermann's
function. Ackermann's function was originally constructed as a function that grows so
rapidly that it is not in the class of primitive recursive functions.
Here is one possible definition of the inverse of Ackermann's function. We
define a sequence of functions: for j >= 1 and k >= 0,

A_0(j) = j + 1, and A_k(j) = A_{k-1}^{(j+1)}(j) for k >= 1,

where A^{(i+1)}(x) = A(A^{(i)}(x)) denotes function composition (iteration). Note that
A_0 is just the successor function; A_1 is essentially multiplying by two;
A_2 is exponentiation; A_3 is iterated exponentiation, the inverse of log*(n);
after that the functions grow horrendously.

The inverse of Ackermann's function is defined as:

α(n) = min{ k : A_k(1) >= n }.
The growth of the function α(n) is incredibly slow, even for enormous n: for
α(n) to exceed 4, n would have to be greater than any number that anyone will ever
compute with, in the course of all time.
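As a quick illustration, here is my own small Python sketch following the definition above. Only tiny arguments are feasible, because A_4(1) is already astronomically large; that is exactly the point.

# A_0 is the successor function; A_k iterates A_{k-1}.
def A(k, j):
    if k == 0:
        return j + 1
    x = j
    for _ in range(j + 1):            # apply A_{k-1} a total of j+1 times
        x = A(k - 1, x)
    return x

def alpha(n):
    # smallest k with A_k(1) >= n; only safe here for n <= A_3(1) = 2047,
    # because evaluating A_4(1) naively is hopeless, which is the point:
    # alpha(n) <= 4 for any n you could ever write down.
    k = 0
    while A(k, 1) < n:
        k += 1
    return k

print(A(1, 3), A(2, 3), A(3, 1))      # 7, 63, 2047
print(alpha(2000))                    # 3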
The most interesting thing about this problem, in my opinion, is the surprising
result involving α(n). I was able to obtain this result because I imagined that it
wasn't linear (and I was right), so what could it be? Since this result, the inverse of
Ackermann's function has turned up in a number of other places in computer science,
especially in computational geometry, in Davenport-Schinzel sequences [?] and
related topics, in the complexity of various geometrical configurations involving lines
and points and other structures.
What is left to research on this problem? There are still interesting questions
having to do with extensions of the upper bounds, lower bounds, and similar topics. Let
me just mention that this data structure is really very powerful. You can attach values
to the edges or vertices in the trees and compute functions of find on tree paths and
there are many applications of this. There are variants of path compression that have
the same kinds of inverse Ackerman function bounds and some other variants that
have worse bounds; see, for example, Tarjan and van Leeuwen [?]. The lower bound
that I originally obtained was for the particular algorithm that I have described. The
inverse Ackermann function turns out to be inherent in the problem. There is no way to
solve this problem without having the inverse Ackermann function dependence. I was
able to show this for a pointer machine model with certain restrictions [?]. Later
Fredman and Saks [?] had a beautiful result for the cell probe model where the
inverse Ackerman function is necessary. Recently, Haim Kaplan, Nira Shafrir and I
[?] have extended the data structure to insertions and deletions with the same kind of
bounds.
4. Amortization and Self-adjusting Search Trees
The inverse Ackermann function that results from path compression is very
complicated, if one gets into the heart of the analysis. It illustrates a very important
concept, which is the notion of amortization. In solving Problem 3.1 we are
performing a sequence of operations, unites and path compressions in various
orders. We may, in fact, generate a long path in the tree, and have a single find cost
logarithmic time. But the find operation squashes the tree and causes later path
compressions to be cheap. Since we are interested in measuring the total cost, we
do not mind if some operations are expensive, as long as they are balanced by
cheap ones. This leads to the notion of amortization, which is cost per operation
averaged over a worst case sequence. This is the first example that I am aware of,
where this notion arose, although in the original work the word amortization was not
used and the analysis we perform these days was not present then. The idea that
you can have a data structure, where you do simple modifications to improve things
for later operations, is extremely powerful. I would like to turn to another data
structure where this idea comes into play - self-adjusting search trees.
4.1 Splay Trees
There is a type of self-adjusting search tree called splay tree that I’m sure
many of you know about. It was invented by Danny Sleator and myself [?]. Many
complexity results are known for splay trees, as we will see, and they illustrate some
clever ideas in algorithmic analysis. However, the ultimate question of the optimality
of the splaying algorithm, within a constant factor, remains an open problem.
Let me remind you about binary search trees.
Definition 4.1: A binary search tree is a binary tree, i.e., every node has a
left and a right child, either of which or both, can be missing. Each node contains an
item of data and a key, and the items are totally ordered by their keys. The items are
arranged in the binary search tree in the following way: for every node r in the tree,
every item in the left subtree of r is less than the item stored in r, and every item in
the right subtree of r is greater than or equal to the item stored in r. The operations
done on the tree are: access, insert and delete.
We perform a search for an item in the obvious way: we start at the root, and
we go down the tree, choosing in every node whether to go left or right by comparing
the key of the node to the key of the item we are searching for. The search time is
proportional to the depth of the tree or, more precisely, the length of the path from the
root to the designated item. For example, in Figure 5, if we are searching for “frog”
which is at the root, it is cheap; if we are searching for “zebra”, it will take us four
steps. Searching for “zebra” is more expensive, but not too expensive, because the
tree is reasonably balanced. Of course, there are “bad” trees which are just long
paths and there are “good” trees, which are spread out wide like the one in Figure 5.
If we have a fixed set of items, it is easy to construct a perfectly balanced tree, which
gives us logarithmic access time.
Figure 5. A reasonably balanced binary search tree on the keys cat, cow, dog, frog, goat, horse, pig, rabbit and zebra, with frog at the root.
The situation becomes more interesting if we want to allow insertion and
deletion operations, since the shape of the tree will change. There are standard
methods for inserting and deleting items in a binary search tree. Let me remind you
how these work. The easiest method for an insert operation is just to follow the
search path, which will end at the bottom of the tree, and stick the new item in the
missing position. A delete operation is slightly more complicated. Here is an
example. If I want to delete “pig” (a leaf in the tree), I simply delete the node
containing it. But if I want to delete "frog", which is sitting at the root, I have to
replace that node with another node. I can get the replacement node by taking the
left branch from the root and then going all the way down to the right, giving me the
predecessor of “frog”, which happens to be “dog”, and moving it to replace the root.
Or symmetrically, I can take the successor of “frog” and move it to the position of
“frog”. In either case, an insertion or deletion takes essentially one search in the tree
which, in the worst case, is proportional to the depth of the tree.
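For reference, here is a minimal binary search tree sketch in Python (my own code, not from the lecture); both search and insert follow a single root-to-node path, so their cost is proportional to the depth of the tree.

from typing import Optional

# Keys in the left subtree are smaller; keys in the right subtree are greater
# than or equal, as in Definition 4.1.
class Node:
    def __init__(self, key):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None

def search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root                        # None if the key is not present

def insert(root, key):
    if root is None:
        return Node(key)               # the search path ended at a missing child
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root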
But now the tree structure changes. In the presence of insertions and
deletions, long paths may develop. Now if we want to perform finds in such trees, the
search time will increase. Therefore, we need operations that allow us to restructure
the tree, so that we can move things up and down to restore and maintain a "good"
search tree.
The standard operation for restructuring trees is the rebalancing operation
called rotation. Rotation takes an edge (f, k) in the tree, and switches it around to
become (k, f), as depicted in Figure 6. This is a right rotation. A left rotation is defined
similarly. In any standard computer representation this takes constant time, and the
resulting tree is still a binary search tree. Rotation is universal in the sense that any
tree on the same set of ordered items can be turned into any other tree on the same
set of items in the same order by doing an appropriate sequence of rotations. We can
use rotations to rebalance the tree in the presence of insertions and deletions, and
there are various so-called “balanced tree structures” that use extra information to
preserve the balance when there are insertions and deletions – "AVL trees", "red-black trees", etc. All these balanced tree structures have the property that the worst
case time for a search, insertion or deletion, is O(log n) .
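Here is a sketch of the rotation itself, reusing the Node class from the sketch above; this is my own code, written in the spirit of Figure 6.

# Right rotation of the edge (f, k); subtrees A, B, C keep their symmetric order.
def rotate_right(f):
    k = f.left
    f.left = k.right                   # subtree B moves under f
    k.right = f                        # k becomes the parent of f
    return k                           # new root of this subtree

def rotate_left(k):                    # the symmetric (inverse) operation
    f = k.right
    k.right = f.left
    f.left = k
    return f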
Figure 6. Rotation of the edge (f, k): a right rotation makes k the parent of f, and a left rotation reverses it; the subtrees A, B, C keep their symmetric order.
Perhaps that should be the end of the story, but balanced search trees
possess certain drawbacks:
• You have to store extra information to keep track of the balance, so they need
extra space.
• The re-balancing in the case of insertions or deletions involves several cases,
some of them possibly complicated.
• And perhaps more important, the data structure is logarithmic-worst-case but it is
logarithmic-best-case as well. It doesn't necessarily adapt to a usage pattern.
That is to say, suppose I have a tree with a million items but I am only using a
thousand of them. I would like those thousand items to be accessed cheaply,
more cheaply than log of a million. An ordinary search tree does not allow that.
There are various data structures that people have invented to handle this last
drawback. Assume that you know something about the usage pattern, for example,
you have an estimate of the access frequency for each item. Then you can think
about constructing an optimum search tree to minimize the average access time. But
maybe the access pattern changes over time. Maybe the access frequencies vary.
This happens a lot in practice.
Motivated by these questions and also knowing about path compression
results, Danny Sleator and I [?] asked the following question:
Problem 4.2: Is there a simple self-adjusting mechanism for search trees
where we could avoid explicitly balancing the tree and somehow take advantage of
the usage pattern, i.e., have the tree automatically adjust itself based upon the
usage?
The goal is to have items that are accessed frequently somehow bubble
up in the tree, and items that are accessed less frequently remain, somehow, at
the bottom of the tree.
We were able to come up with such a structure, which we call the "splaying
tree", or "splay tree". A splay tree is a self-adjusting search tree. "Splay" means
to spread out. Splaying is a simple self-adjusting heuristic, like path compression, but
now we are operating on binary search trees. In the splaying heuristic we take a
designated item and move it up to the root of the tree by performing rotations, thus
preserving the fact that it’s a search tree. The idea is essentially to perform rotations
bottom up, but it turns out that if you do rotations one at a time strictly in order, you
don't get the right properties. The heuristic performs rotations in pairs, roughly in
bottom up order, and according to the rules shown in Figure 7 and explained below.
Thus, when we access an item, we walk down the tree to the item, and then perform
the splay operation, which moves the item all the way up to the tree root. Every item
along the path has its distance to the root roughly halved, and all other nodes get
pushed to the side. No node moves down more than a constant number of steps.
Figure 7. Cases of splaying: (a) zig, a single rotation when the accessed node's parent is the root; (b) zig-zig, when the two edges from x toward the root go in the same direction; (c) zig-zag, when they go in opposite directions. The subtrees A, B, C, D are re-attached in symmetric order.
Figure 7 shows an example of splaying. Assume x is the node to be
accessed. If the two steps above x toward the root are in the same direction, we do
a rotation on the top edge first and then on the bottom edge, which locally transforms
the tree as seen in Figure 7b. This rotation doesn't look helpful, but in fact, when you
do the rotations in sequence, it has positive effects. That's called the "zig-zig" case. If
the two edges from x to the root are in opposite directions, right-left directions as
is seen in Figure 7c, or symmetrically, left-right directions, then the bottom rotation is
done first, followed by a top rotation. In this case, x moves up and y and z get split
into two subtrees of the root. All other subtrees are attached in the correct places.
We keep doing zig-zag and zig-zig steps, as appropriate, bubbling x up, two steps at
a time, until either x gets to the root or it gets one step away from the root, in which
case we then do one final rotation, the zig case (Figure 7a). The splay operation is an
entire sequence, which moves a designated item all the way up to the root by means
of this sequence of transformations.
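Here is one way to code the splay operation bottom-up with parent pointers; it is my own sketch, not the lecture's code (a top-down variant is mentioned below).

# Bottom-up splaying: repeat zig-zig and zig-zag steps, finishing with a
# single zig, until the accessed node x is the root.
class SplayNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def _rotate_up(x):
    # rotate the edge between x and its parent p, moving x one level up
    p, g = x.parent, x.parent.parent
    if x is p.left:                    # right rotation
        p.left = x.right
        if p.left:
            p.left.parent = p
        x.right = p
    else:                              # left rotation
        p.right = x.left
        if p.right:
            p.right.parent = p
        x.left = p
    p.parent, x.parent = x, g
    if g is not None:                  # reconnect x in place of p under g
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            _rotate_up(x)              # zig: one final rotation at the root
        elif (x is p.left) == (p is g.left):
            _rotate_up(p)              # zig-zig: rotate the top edge first,
            _rotate_up(x)              # then the bottom edge
        else:
            _rotate_up(x)              # zig-zag: rotate the bottom edge first,
            _rotate_up(x)              # then the top edge
    return x                           # x is now the root of the tree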
Figure 8. Two step-by-step examples of splaying node 1 to the root of a six-node tree: (a) the pure zig-zig case, (b) the pure zig-zag case.
Figure 8a contains another step-by-step example. This is the purest zigzig
case. We perform two rotations, moving item #1 up the path and then two more
rotations, moving item #1 further up the path. Finally, a last rotation is performed to
complete the task. Going from the initial position to the final one is called “splaying
node 1”. Figure 8b is the pure zigzag case.
Figure 9 is another example of a pure zigzag case and a single splay
operation. Again, the accessed node moves to the root, every other node along the
find path has its distance to the root roughly halved, and no nodes are pushed down
by more than a constant amount. If we start out with a really bad example, and we
do extensive splay operations, the tree gets balanced very quickly, and again, the
accessed node moves to the root. Given this splay operation, we are able to perform
very efficiently and simply the operations of insertions, deletions, splits and joins of
trees.
I should say that there is also a top-down version of this algorithm. If you look
at Danny Sleator's Web site at Carnegie Mellon [?], you may find code for the
top-down version of this algorithm.
Figure 9. A single splay operation (a pure zig-zag case) on a ten-node tree: the accessed node moves to the root, and every other node on the find path has its distance to the root roughly halved.
4.2 Complexity Results
The main question for us theoreticians is, “How does this algorithm perform?”
We are able to show that in the long run, in the amortized sense, this algorithm
performs just as well as balanced trees.
Theorem 4.3: Assume we start off with an n node tree and we perform a
sequence of m accesses, ignoring start-up effects; that is, assume that m >= n. The
following results hold:
(a) The total cost of m accesses is O (m log n) , thus matching the bound for
balanced trees.
(b) The splaying algorithm on any access sequence performs within a
constant factor of the performance of the best possible static tree, in
spite of the fact that this algorithm does not know the frequencies.
(c) If you access the items in symmetric order; that is, in left to right order,
small to large, starting with an arbitrary tree, the total access time is linear,
O(m) .
The amortized cost per access is constant, so it’s better than the
usual bound of
O(log n) . Note that in any static tree, if you access each of
the items in order, it would cost you O(m log n) . So this illustrates the fact
that modifying the tree as you go along can be good in certain situations.
It is relatively straightforward to prove results (a) and (b) above. They follow
from an interesting potential argument that I trust many of you are familiar with. It can
be found in [?]. Result (c) turns out to be complicated to prove – a complicated
inductive proof I was eventually able to come up with.
The behavior of the tree and its access times is really interesting (see Figure 10).
If you start off with a long path in the tree, the left path,
and begin accessing the nodes in sequential order, the first access costs n, the
next access costs about n/2, the next one costs n/4, and so on. You get an
exponential decay until you get down to something like a constant. Then the access
time function behaves similarly to a ruler function, with a certain amount of
randomness thrown in, so about half the accesses are really cheap, about a quarter
are slightly more expensive, an eighth of the accesses are slightly more expensive
than that. The average cost per access turns out to be constant, not logarithmic but
constant. This is an experimental result. Proving it is true, especially for any initial
tree, is quite difficult.
Now for our audacious conjecture. Consider any possible way of
manipulating binary search trees whatsoever, ignoring insertions and deletions.
We begin with some initial tree. We want to perform a sequence of accesses. The
cost for accessing a node is the number of nodes on the access path from the node
to the root, and anytime we want we can perform a rotation, also at a cost of one. In
an off-line algorithm the sequence of accesses is given in advance. In an on-line
algorithm there is no prior knowledge of the sequence of accesses. The conjecture
compares the minimum cost of off-line algorithms with on-line algorithms.
Problem 4.4 (Dynamic Optimality Conjecture): For any access sequence,
the splay tree on-line algorithm performs within a constant factor of the optimal
off-line algorithm (under the assumptions that the cost of accessing a node is the
number of nodes on the access path and the cost of each rotation is one).
The closest that anyone has come to proving this conjecture is an extension
by Richard Cole of the sequential access result. Consider a sequence of accesses.
For simplicity, assume that the items are numbered 1, 2, 3, … and that they appear
in that order in the search tree. For each consecutive pair of accesses, the distance
between them is defined as the number of items in between the two given items that
are to be accessed. For example, if we access item 10 and then we access item 25,
the distance is their difference, i.e., 15. Richard Cole proved the following:
Theorem 4.5 (Cole et al. [?]): The total cost of a sequence of accesses in
a splay tree is at most the sum of the logarithms of the distances between consecutive
accesses, multiplied by some constant factor.
This result, called the "dynamic finger theorem", would follow from the
dynamic optimality conjecture. The dynamic optimality conjecture, however, is still
open.
To summarize, we do not know the answer to the following problem:
Problem 4.6: Is the splaying algorithm optimal within a constant factor?
4.3 The Rotation Distance between Search Trees
The splaying tree data structure is very interesting. Again, it illustrates the fact
that if you perform very simple operations but you repeat them, then you get very
complicated behavior that is hard to analyze. It is also worth noting that this data
structure has been used often in practice in various systems applications, memory
management, etc. In many applications of table structures or tree structures, most of
the data does not get accessed most of the time. The splaying tree algorithm takes
advantage of locality of reference; in other words, for some small working set on the
tree, the splay algorithm moves that set right up to the top of the tree, where you
have your hands on it with low access cost. Then, as the working set changes, the
new working set moves up to the top of the tree.
The drawback of this data structure is, of course, that it is performing rotations
all the time. Nevertheless, it seems to work very well in practical situations where the
access pattern is changing.
I mentioned optimum search trees; these lead us into less data-structural, more
algorithmic questions on search trees. Search trees are a fascinating topic. Another
interesting question about binary search trees is to look at their structure under
rotation. I have said that you can convert any search tree into any other search tree
by performing an appropriate sequence of rotations. We could ask the following:
Question 1: Given two trees, how many rotations does it take to convert one
tree into another?
This is an algorithmic question. This question seems to be NP-complete, but
I am not sure whether it is or not. A related question is how far apart can two trees
be? To formulate the question more precisely, we define the “tree graph” in the
following way: the nodes of the graph are the n-node binary trees. Two trees are
connected by an edge if and only if one tree can be gotten from the other using one
rotation. This graph is connected.
Question 2: What is the diameter of the “tree-graph” described above?
It is easy to get an upper bound for the diameter. It is not so easy to get a
lower bound for it. It is easy to get a lower bound of n, but there is a factor of 2 gap.
Danny Sleator and I are working with Bill Thurston, who turns everything into a
hyperbolic geometry problem. We were able to show that the 2n bound is tight, by
converting the distance in the tree-graph problem into a hyperbolic geometry
problem. This was another amazing thing. I don’t have slides to illustrate the details
and it is technical, but it is an example where some piece of mathematics from way
out in left field comes in and solves some nitty gritty data structure problem. It just
goes to show the power of mathematics in the world.
5. Algorithmic questions on search trees
Assume we have a sequence of items in fixed order. We want to store them
in a binary search tree, so that we minimize the average access time. We assume
that we know the access probabilities and that every item gets accessed with an
independent fixed probability. The problem is to construct the tree. There are an
exponential number of trees, so it might take exponential time, but in fact it does not.
There are two different kinds of algorithms, depending on the kind of binary search
tree we pick. Thus we consider two cases:
Case 5.1: Items are allowed to be stored in the internal nodes of the tree.
The problem in this case is: Given positive weights w_1, w_2, ..., w_n, construct
an n-node binary tree that minimizes Σ_{i=1..n} w_i d_i, where d_i is the depth of the
i-th node in symmetric order.
Case 5.2: Items are allowed to be stored only in the external nodes (the
leaves) of the tree.
The problem in this case is: Given positive weights w_1, w_2, ..., w_n, construct a
binary tree with n external nodes that minimizes Σ_{i=1..n} w_i d_i, where d_i is the
depth of the i-th external node in symmetric order.
In the splay data structure I allowed items to be stored in the internal nodes of the
tree.
To build an optimum tree in the first case, there is a straightforward O(n^3)
dynamic programming algorithm, just as in the stack generation problem I mentioned.
Knuth [?] was able to show that there is no need to look at all the sub-problems.
There is a restriction on the sub-problems that have to be looked at, which reduces
the time to O(n^2), and this was extended by Frances Yao [?] to other problems.
There is a certain kind of structural restriction that Frances called the "quadrangle
inequalities". Here we have an O(n^2) dynamic programming algorithm with a
certain amount of cleverness in it. This result is twenty-five years old or so. Nothing
better is known. There may, in fact, be a faster algorithm here, as there is no nontrivial
lower bound.
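As an illustration, here is my own Python sketch of the O(n^2) dynamic program with Knuth's restriction on candidate roots; the monotonicity bound root[i][j-1] <= root[i][j] <= root[i+1][j] is the standard statement of Knuth's observation, and the depth of a subtree root is counted as 1 here, which shifts the objective by a constant without changing the optimal tree.

# Weights w[0..n-1] are the access weights of the items in symmetric order.
def optimum_bst_cost(w):
    n = len(w)
    w = [0] + list(w)                              # switch to 1-based indexing
    pref = [0] * (n + 1)
    for i in range(1, n + 1):
        pref[i] = pref[i - 1] + w[i]

    def W(i, j):                                   # total weight of items i..j
        return pref[j] - pref[i - 1]

    cost = [[0] * (n + 2) for _ in range(n + 2)]   # empty ranges cost 0
    root = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        cost[i][i] = w[i]
        root[i][i] = i
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best, best_r = None, i
            # Knuth's restriction: only roots between root[i][j-1] and root[i+1][j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                c = cost[i][r - 1] + cost[r + 1][j]
                if best is None or c < best:
                    best, best_r = c, r
            cost[i][j] = best + W(i, j)            # every item gets one level deeper
            root[i][j] = best_r
    return cost[1][n]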
If we change the problem so that we store the items in the external nodes and
just put values in the internal nodes to drive the search process, then we can do even
better. There is a beautiful algorithm due to Hu and Tucker [?] which runs in
O(n log n) time, and which is an extension of the classical algorithm of Huffman for
Huffman codes. The difference here is the alphabetic restriction. The items have to
be in order in the leaves, whereas in Huffman coding you can permute them. It turns
out that a simple variant of the Huffman coding algorithm with a similar running time
works here. The amazing thing about this is that the proof is inordinately complicated.
The original proof was almost unreadable and tremendously long. There are now
many nicer and simpler proofs (references ??) but it is still quite a miracle that this
thing in fact works. And again, there is no lower bound here. In fact, there is no
reason to believe that the problem can’t be solved in linear time, and there is good
reason to believe that maybe it can. There are many special cases of this problem,
which can be solved in linear time.
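Since the Hu-Tucker algorithm is described as a variant of Huffman's, here is a minimal sketch of the plain, non-alphabetic Huffman construction for comparison; it is my own code, not the alphabetic algorithm itself, and its cost is a lower bound for the alphabetic problem because it ignores the ordering restriction.

import heapq

# Weighted external path length, sum of w_i * depth_i, of an optimal
# (non-alphabetic) Huffman tree.
def huffman_cost(weights):
    if len(weights) < 2:
        return 0
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b                 # merging raises both subtrees by one level
        heapq.heappush(heap, a + b)
    return total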
6. Minimum Spanning Tree Problem
Let me close by coming back to a classical graph problem – the minimum
spanning tree problem. We are given a connected, undirected graph with edge costs.
We want to find a spanning tree of minimum total edge cost. This problem is a
classical problem about network optimization, and it is about the simplest problem in
this area. It has a really long history. Computing minimum spanning tree corresponds
to single linkage clustering, so there is a lot of work by people who are trying to do
clustering algorithms on this problem, and it occurred very early on. There was an
anthropologist in about 1909 who hinted at a solution, which approaches what we
would call an algorithm. The first algorithms for this problem were developed in the
late 1920's by various Czech mathematicians, and the algorithms that we mostly know
were discovered back in the 1920's and 1930's by mathematicians who are little
known today. The exception is Kruskal's algorithm, with which we are all familiar.
In Kruskal's algorithm the idea is to sort the edges in increasing order by cost,
process the edges in this order, and build up the tree incrementally, looking at each
edge sequentially. If the edge combines two disconnected components of the graph,
we add it to the tree we are building; if it connects two vertices in the same tree, we
throw it away. To implement this algorithm, we need sorting, or a sorting-like
algorithm, plus a data structure like the set union structure [?] that was mentioned in
Section 3. If you include the sorting time, the cost of Kruskal's algorithm is
O(m log n); if you assume that the edges are presorted by cost, the running time of
Kruskal's algorithm is O(m α(n)), where α(n) is the inverse Ackermann function.
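Here is a sketch of Kruskal's algorithm in code (my own), reusing the DisjointSets structure from the Section 3 sketches; edges are (cost, u, v) triples over vertices numbered 0..n-1.

def kruskal(n, edges):
    dsu = DisjointSets(n)
    tree = []
    for cost, u, v in sorted(edges):           # sorting dominates: O(m log m)
        if dsu.find(u) != dsu.find(v):         # connects two different pieces?
            dsu.unite(u, v)
            tree.append((cost, u, v))
            if len(tree) == n - 1:
                break                          # spanning tree is complete
    return tree

# a four-vertex example: the edge of cost 3 closes a cycle and is discarded
print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))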
There were earlier algorithms. There is a classical algorithm usually credited
to Prim and Dijkstra, which is a single-source version of Kruskal's algorithm. It begins
at a vertex and grows a tree from it by adding the cheapest edge connecting the tree
to a new vertex. This algorithm was actually discovered by Jarnik in 1930 [?]. It runs
in quadratic time as it was described by Prim and Dijkstra. Jarnik did not describe
computational complexity at all.
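For comparison, here is a heap-based sketch of this tree-growing method; it is my own code, and with a binary heap it runs in O(m log n) rather than the quadratic time of the original descriptions. Here graph[u] is a list of (cost, v) pairs.

import heapq

def grow_tree(graph, start=0):
    in_tree = {start}
    frontier = list(graph[start])            # candidate edges leaving the tree
    heapq.heapify(frontier)
    total = 0
    while frontier and len(in_tree) < len(graph):
        cost, v = heapq.heappop(frontier)     # cheapest candidate edge
        if v in in_tree:
            continue                          # both endpoints already in the tree
        in_tree.add(v)
        total += cost
        for edge in graph[v]:
            heapq.heappush(frontier, edge)
    return total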
There is an even earlier algorithm, a beautiful parallel algorithm described
by Borůvka [?] in 1926, which has a running time of O(min{m log n, n^2}).
This algorithm is Kruskal's algorithm in parallel: for every
vertex, pick the cheapest incident edge and throw that into the set of edges
that you keep. In general, you have a collection of pieces of the spanning tree. For
each one pick the cheapest incident edge and throw that in. So you might have two
pieces selecting the same edge from opposite directions. In every iteration the
number of pieces goes down by at least a factor of 2, and this is where the logarithm
comes in. Once again, Borůvka did not analyze the computational complexity.
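And here is one way to sketch Borůvka's algorithm, again my own code reusing DisjointSets to maintain the pieces of the tree; for simplicity it assumes distinct edge costs.

def boruvka(n, edges):
    dsu = DisjointSets(n)
    tree, pieces = [], n
    while pieces > 1:
        cheapest = {}                          # piece root -> cheapest incident edge
        for cost, u, v in edges:
            ru, rv = dsu.find(u), dsu.find(v)
            if ru == rv:
                continue                       # edge lies inside a single piece
            for r in (ru, rv):
                if r not in cheapest or cost < cheapest[r][0]:
                    cheapest[r] = (cost, u, v)
        if not cheapest:
            break                              # graph is disconnected
        for cost, u, v in set(cheapest.values()):   # two pieces may pick the same edge
            if dsu.find(u) != dsu.find(v):
                dsu.unite(u, v)
                tree.append((cost, u, v))
                pieces -= 1
    return tree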
The classical results are O(n^2) or O(m log n). The obvious question here
is whether sorting is really needed, and whether one can get rid of the log factor. Can
this problem, in fact, be solved in linear time? Andy Yao [?] in 1975 was able to add a
new idea to Borůvka's algorithm that reduced the running time from O(m log n) to
O(m log log n), thereby showing that sorting was not, in fact, inherent in this
problem. Then there was a sequence of improvements. If the Fibonacci heap data
structure that Mike Fredman and I [?] invented is used, a running time of
O(m log* n) is achieved. If Gabow's idea [?] is combined with Fibonacci heaps,
O(m log log* n) is achieved. Very recently, Bernard Chazelle was able to decrease
the complexity down to O(m α(n)), and again you see the inverse Ackermann
function coming in. This is a deterministic algorithm.
This is not the end of the story. You could ask the question: "Is α(n) inherent
in this problem as well?" It turns out that probably it is not. These algorithms, as well as
Chazelle's algorithm, are all deterministic. If randomization is allowed, the problem, in
fact, can be solved in O(m) time. This is a truly amazing result, based on Chazelle's
work, by Pettie and Ramachandran [?], which says the following: it is possible to
construct an algorithm which is optimum, but they cannot analyze its running time.
So you have an algorithm which runs in at most O(m α(n)) time; it might run in linear
time. They cannot analyze the complexity of the algorithm, but they know it runs as
fast as possible. Now, how can you come up with an algorithm that you know is
optimum but you cannot say how fast it is? The idea is to use pre-processing. If the
problem is small enough, all possibilities can be enumerated, a decision tree can be
constructed in this case, which tells the minimum number of computations, in this
case comparisons, that have to be made in order to solve the problem. Suppose
there are n edges. If there was enough time, the optimum decision tree for the
computation could be built. It would take double exponential or triple exponential time
or something like that. This pre-processing time is too expensive. But if the problem
is really, really small, say, of size O(log log log n) or something like that, then all the
possible algorithms can be enumerated on that small problem, the fastest one
picked, and then you are in good shape. Using Borůvka's algorithm plus Chazelle's idea,
a recursion is achieved which takes the original problem and reduces it to a collection
of really tiny sub-problems. For every really tiny sub-problem the pre-processing
computation is run; it is hugely exponential in the sub-problem size, but in this case
that becomes linear or sub-linear in the size of the original problem. The optimum
algorithm for each small sub-problem is found, then that algorithm is applied to all
these small sub-problems, and we are done. It can be shown that this is going to be
the best possible algorithm to within a constant factor, but its running time cannot be
determined. It is a very peculiar thing.
This is the strange unsettled state of the minimum spanning tree problem, at
the moment.
7. References