Amortization
The goode workes that men don whil they ben in good lif
al amortised by synne folwyng.
— Geoffrey Chaucer, “The Persones [Parson’s] Tale” (c.1400)
I will gladly pay you Tuesday for a hamburger today.
— J. Wellington Wimpy, “Thimble Theatre” (1931)
I want my two dollars!
— Johnny Gasparini [Demian Slade], “Better Off Dead” (1985)
A dollar here, a dollar there. Over time, it adds up to two dollars.
— Jarod Kintz, The Titanic Would Never Have Sunk
if It Were Made out of a Sink (2012)
15 Amortized Analysis

15.1 Incrementing a Binary Counter
It is a straightforward exercise in induction, which often appears on Homework 0, to prove that
any non-negative integer n can be represented as the sum of distinct powers of 2. Although some
students correctly use induction on the number of bits—pulling off either the least significant bit
or the most significant bit in the binary representation and letting the Recursion Fairy convert
the remainder—the most commonly submitted proof uses induction on the value of the integer,
as follows:
Proof: The base case n = 0 is trivial. For any n > 0, the inductive hypothesis implies that there is a set of distinct powers of 2 whose sum is n − 1. If we add 2^0 to this set, we obtain a multiset of powers of two whose sum is n, which might contain two copies of 2^0. Then as long as the multiset contains two copies of any 2^i, we remove them both and insert 2^{i+1} in their place. The sum of the elements of the multiset is unchanged by this replacement, because 2^{i+1} = 2^i + 2^i. Each iteration decreases the size of the multiset by 1, so the replacement process must eventually terminate. When it does terminate, we have a set of distinct powers of 2 whose sum is n. □
This proof is describing an algorithm to increment a binary counter from n − 1 to n. Here’s a
more formal (and shorter!) description of the algorithm to add 1 to a binary counter. The input B
is an (infinite) array of bits, where B[i] = 1 if and only if 2^i appears in the sum.
Increment(B[0 .. ∞]):
  i ← 0
  while B[i] = 1
    B[i] ← 0
    i ← i + 1
  B[i] ← 1
We’ve already argued that Increment must terminate, but how quickly? Obviously, the
running time depends on the array of bits passed as input. If the first k bits are all 1s, then
Increment takes Θ(k) time. The binary representation of any positive integer n is exactly ⌊lg n⌋ + 1 bits long. Thus, if B represents an integer between 0 and n, Increment takes Θ(log n)
time in the worst case.
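To make this concrete, here is a minimal Python sketch of the same algorithm (my own illustration, not part of the original notes), storing B as a growable list of bits with index 0 least significant:

def increment(bits):
    # Add 1 to a binary counter stored as a list of bits, least significant first.
    # Returns the number of bits flipped, i.e. the actual cost of the operation.
    i = 0
    flips = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0            # clear a trailing 1
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(0)         # grow the "infinite" array on demand
    bits[i] = 1                # set the first 0 bit
    return flips + 1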
© Copyright 2014 Jeff Erickson.
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/).
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.
15.2 Counting from 0 to n
Now suppose we call Increment n times, starting with a zero counter. How long does it take
to count from 0 to n? If we only use the worst-case running time for each Increment, we get
an upper bound of O(n log n) on the total running time. Although this bound is correct, we can
do better; in fact, the total running time is only Θ(n). This section describes several general
methods for deriving, or at least proving, this linear time bound. Many (perhaps even all) of
these methods are logically equivalent, but different formulations are more natural for different
problems.
15.2.1 Summation
Perhaps the simplest way to derive a tighter bound is to observe that Increment doesn’t flip
Θ(log n) bits every time it is called. The least significant bit B[0] does flip in every iteration, but
B[1] only flips every other iteration, B[2] flips every fourth iteration, and in general, B[i] flips every 2^i-th iteration. Because we start with an array full of 0s, a sequence of n Increments flips each bit B[i] exactly ⌊n/2^i⌋ times. Thus, the total number of bit-flips for the entire sequence is

    ∑_{i=0}^{⌊lg n⌋} ⌊n/2^i⌋ < ∑_{i≥0} n/2^i = 2n.
(More precisely, the number of flips is exactly 2n − #1(n), where #1(n) is the number of 1 bits
in the binary representation of n.) Thus, on average, each call to Increment flips just less than
two bits, and therefore runs in constant time.
This sense of “on average” is quite different from the averaging we consider with randomized
algorithms. There is no probability involved; we are averaging over a sequence of operations, not
the possible running times of a single operation. This averaging idea is called amortization—the
amortized time for each Increment is O(1). Amortization is a sleazy clever trick used by
accountants to average large one-time costs over long periods of time; the most common example
is calculating uniform payments for a loan, even though the borrower is paying interest on less
and less capital over time. For this reason, it is common to use “cost” as a synonym for running
time in the context of amortized analysis. Thus, the worst-case cost of Increment is O(log n),
but the amortized cost is only O(1).
Most textbooks call this particular technique “the aggregate method”, or “aggregate analysis”,
but these are just fancy names for computing the total cost of all operations and then dividing by
the number of operations.
The Summation Method. Let T (n) be the worst-case running time for a sequence of
n operations. The amortized time for each operation is T (n)/n.
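As a quick numerical check of this aggregate bound (my own illustration; total_flips is a hypothetical helper, not something defined in the notes), the closed form for the total number of bit-flips can be computed directly and compared against 2n − #1(n):

def total_flips(n):
    # Total bit-flips performed by n Increments starting from zero:
    # bit B[i] flips exactly floor(n / 2**i) times.
    total, i = 0, 0
    while (n >> i) > 0:
        total += n >> i
        i += 1
    return total

for n in [1, 7, 64, 1000]:
    assert total_flips(n) == 2 * n - bin(n).count("1")   # exactly 2n - #1(n)
    assert total_flips(n) < 2 * n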
15.2.2 Taxation
A second method we can use to derive amortized bounds is called either the accounting method
or the taxation method. Suppose it costs us a dollar to toggle a bit, so we can measure the
running time of our algorithm in dollars. Time is money!
Instead of paying for each bit flip when it happens, the Increment Revenue Service charges a
two-dollar increment tax whenever we want to set a bit from zero to one. One of those dollars is
spent changing the bit from zero to one; the other is stored away as credit until we need to reset
the same bit to zero. The key point here is that we always have enough credit saved up to pay for
the next Increment. The amortized cost of an Increment is the total tax it incurs, which is
exactly 2 dollars, since each Increment changes just one bit from 0 to 1.
It is often useful to distribute the tax income to specific pieces of the data structure. For
example, for each Increment, we could store one of the two dollars on the single bit that is set from 0 to 1, so that that bit can pay to reset itself back to zero later on.
Taxation Method 1. Certain steps in the algorithm charge you taxes, so that the total
cost incurred by the algorithm is never more than the total tax you pay. The amortized
cost of an operation is the overall tax charged to you during that operation.
A different way to schedule the taxes is for every bit to charge us a tax at every operation, regardless of whether the bit changes or not. Specifically, each bit B[i] charges a tax of 1/2^i dollars for each Increment. The total tax we are charged during each Increment is ∑_{i≥0} 2^{−i} = 2 dollars. Every time a bit B[i] actually needs to be flipped, it has collected exactly $1, which is just enough for us to pay for the flip.
Taxation Method 2. Certain portions of the data structure charge you taxes at each
operation, so that the total cost of maintaining the data structure is never more than
the total taxes you pay. The amortized cost of an operation is the overall tax you pay
during that operation.
In both of the taxation methods, our task as algorithm analysts is to come up with an
appropriate ‘tax schedule’. Different ‘schedules’ can result in different amortized time bounds.
The tightest bounds are obtained from tax schedules that just barely stay in the black.
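As an illustrative check (my own Python sketch, not from the notes), the following simulation implements the first schedule: each Increment is charged two dollars, one pays for the single 0→1 flip and the other is stored on that bit, and the stored credit always suffices to pay for clearing the bit later:

def taxed_increment(bits, credit):
    # Increment the counter, spending $1 per bit flip out of a flat $2 tax.
    # credit[i] holds the dollars currently stored on bit i.
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        credit[i] -= 1            # spend the dollar stored on this bit
        assert credit[i] >= 0     # the saved credit never runs out
        i += 1
    if i == len(bits):
        bits.append(0)
        credit.append(0)
    bits[i] = 1                   # one tax dollar pays for this flip...
    credit[i] += 1                # ...and the other is banked for the future

bits, credit = [], []
for _ in range(1000):
    taxed_increment(bits, credit)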
15.2.3 Charging
Another common method of amortized analysis involves charging the cost of some steps to some
other, earlier steps. The method is similar to taxation, except that we focus on where each unit of
tax is (or will be) spent, rather than where it is collected. By charging the cost of some operations
to earlier operations, we are overestimating the total cost of any sequence of operations, since we
pay for some charges from future operations that may never actually occur.
The Charging Method. Charge the cost of some steps of the algorithm to earlier steps,
or to steps in some earlier operation. The amortized cost of the algorithm is its actual
running time, minus its total charges to past operations, plus its total charge from
future operations.
For example, in our binary counter, suppose we charge the cost of clearing a bit (changing
its value from 1 to 0) to the previous operation that sets that bit (changing its value from 0 to
1). If we flip k bits during an Increment, we charge k − 1 of those bit-flips to earlier bit-flips.
Conversely, the single operation that sets a bit receives at most one unit of charge from the next
time that bit is cleared. So instead of paying for k bit-flips, we pay for at most two: one for
actually setting a bit, plus at most one charge from the future for clearing that same bit. Thus,
the total amortized cost of the Increment is at most two bit-flips.
We can visualize this charging scheme as follows. For each integer i, we represent the running
time of the ith Increment as a stack of blocks, one for each bit flip. The jth block in the ith
stack is white if the ith Increment changes B[ j] from 0 to 1, and shaded if the ith Increment
changes B[ j] from 1 to 0. If we moved each shaded block onto the white block directly to its left,
there would be at most two blocks in each stack.
[Figure: Charging scheme for a binary counter.]
15.2.4 Potential
The most powerful method (and the hardest to use) builds on a physics metaphor of ‘potential
energy’. Instead of associating costs or taxes with particular operations or pieces of the data
structure, we represent prepaid work as potential that can be spent on later operations. The
potential is a function of the entire data structure.
Let Di denote our data structure after i operations have been performed, and let Φi denote
its potential. Let ci denote the actual cost of the ith operation (which changes Di−1 into Di ).
Then the amortized cost of the ith operation, denoted ai , is defined to be the actual cost plus the
increase in potential:
ai = ci + Φi − Φi−1
So the total amortized cost of n operations is the actual total cost plus the total increase in
potential:
    ∑_{i=1}^{n} ai = ∑_{i=1}^{n} (ci + Φi − Φi−1) = ∑_{i=1}^{n} ci + Φn − Φ0.
A potential function is valid if Φi − Φ0 ≥ 0 for all i. If the potential function is valid, then the
total actual cost of any sequence of operations is always less than the total amortized cost:
    ∑_{i=1}^{n} ci = ∑_{i=1}^{n} ai − Φn + Φ0 ≤ ∑_{i=1}^{n} ai.
For our binary counter example, we can define the potential Φi after the ith Increment to
be the number of bits with value 1. Initially, all bits are equal to zero, so Φ0 = 0, and clearly
Φi > 0 for all i > 0, so this is a valid potential function. We can describe both the actual cost of
an Increment and the change in potential in terms of the number of bits set to 1 and reset to 0.
ci = #bits changed from 0 to 1 + #bits changed from 1 to 0
Φi − Φi−1 = #bits changed from 0 to 1 − #bits changed from 1 to 0
Thus, the amortized cost of the ith Increment is
ai = ci + Φi − Φi−1 = 2 × #bits changed from 0 to 1
Since Increment changes only one bit from 0 to 1, the amortized cost of Increment is 2.
The Potential Method. Define a potential function for the data structure that is initially equal to zero and is always non-negative. The amortized cost of an operation is
its actual cost plus the change in potential.
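The following Python check (my own sketch, reusing the increment illustration from Section 15.1, which returns the number of bit flips) confirms this calculation: with Φ equal to the number of 1 bits, every Increment has amortized cost exactly 2.

bits, phi = [], 0                     # phi = number of 1 bits = potential
for _ in range(1000):
    cost = increment(bits)            # actual cost c = number of bit flips
    new_phi = sum(bits)
    assert cost + new_phi - phi == 2  # amortized cost c + (change in potential)
    phi = new_phi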
For this particular example, the potential is precisely the total unspent taxes paid using the
taxation method, so it should be no surprise that we obtain precisely the same amortized cost.
In general, however, there may be no natural way to interpret change in potential as “taxes” or
“charges”. Taxation and charging are useful when there is a convenient way to distribute costs to
specific steps in the algorithm or components of the data structure. Potential arguments allow us
to argue more globally when a local distribution is difficult or impossible.
Different potential functions can lead to different amortized time bounds. The trick to using
the potential method is to come up with the best possible potential function. A good potential
function goes up a little during any cheap/fast operation, and goes down a lot during any
expensive/slow operation. Unfortunately, there is no general technique for finding good potential
functions, except to play around with the data structure and try lots of possibilities (most of
which won’t work).
15.3 Incrementing and Decrementing
Now suppose we wanted a binary counter that we can both increment and decrement efficiently.
A standard binary counter won't work, even in an amortized sense; if we alternate between 2^k and 2^k − 1, every operation costs Θ(k) time.
A nice alternative is to represent each integer as a pair (P, N) of bit strings, subject to the
invariant P ∧ N = 0 where ∧ represents bit-wise And. In other words,
For every index i, at most one of the bits P[i] and N [i] is equal to 1.
If we interpret P and N as binary numbers, the actual value of the counter is P − N ; thus,
intuitively, P represents the “positive” part of the pair, and N represents the “negative” part.
Unlike the standard binary representation, this new representation is not unique, except for zero,
which can only be represented by the pair (0, 0). In fact, every positive or negative integer has an infinite number of distinct representations.
We can increment and decrement our double binary counter as follows. Intuitively, the
Increment algorithm increments P, and the Decrement algorithm increments N ; however, in
both cases, we must change the increment algorithm slightly to maintain the invariant P ∧ N = 0.
Increment(P, N):
  i ← 0
  while P[i] = 1
    P[i] ← 0
    i ← i + 1
  if N[i] = 1
    N[i] ← 0
  else
    P[i] ← 1

Decrement(P, N):
  i ← 0
  while N[i] = 1
    N[i] ← 0
    i ← i + 1
  if P[i] = 1
    P[i] ← 0
  else
    N[i] ← 1
          start    ++       ++       ++       −−       −−       ++
P         10001    10010    10011    10000    10000    10000    10001
N         01100    01100    01100    01000    01001    01010    01010
P − N     5        6        7        8        7        6        7

Incrementing and decrementing a double-binary counter.
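Here is a direct Python transcription of the two algorithms above (my own sketch, not part of the notes; the helper _grow and the list representation are my own choices):

def _grow(bits, i):
    # Extend a bit list with zeros so that index i is valid.
    while len(bits) <= i:
        bits.append(0)

def increment(P, N):
    # Add 1 to the value P - N, preserving the invariant P AND N = 0.
    i = 0
    while i < len(P) and P[i] == 1:
        P[i] = 0
        i += 1
    _grow(P, i); _grow(N, i)
    if N[i] == 1:
        N[i] = 0      # cancel a negative bit instead of setting a positive one
    else:
        P[i] = 1

def decrement(P, N):
    # Subtract 1 from the value P - N, preserving the invariant P AND N = 0.
    i = 0
    while i < len(N) and N[i] == 1:
        N[i] = 0
        i += 1
    _grow(P, i); _grow(N, i)
    if P[i] == 1:
        P[i] = 0
    else:
        N[i] = 1

P, N = [], []
for _ in range(5):
    increment(P, N)
for _ in range(3):
    decrement(P, N)
# the counter now holds 5 - 3 = 2
assert sum(b << i for i, b in enumerate(P)) - sum(b << i for i, b in enumerate(N)) == 2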
Now suppose we start from (0, 0) and apply a sequence of n Increments and Decrements.
In the worst case, each operation takes Θ(log n) time, but what is the amortized cost? We can’t
use the aggregate method here, because we don’t know what the sequence of operations looks
like.
What about taxation? It's not hard to prove (by induction, of course) that after either P[i] or N[i] is set to 1, there must be at least 2^i operations, either Increments or Decrements, before that bit is reset to 0. So if each bit P[i] and N[i] pays a tax of 2^{−i} at each operation, we will always have enough money to pay for the next operation. Thus, the amortized cost of each operation is at most ∑_{i≥0} 2 · 2^{−i} = 4.
We can get even better amortized time bounds using the potential method. Define the
potential Φi to be the number of 1-bits in both P and N after i operations. Just as before, we have
ci = #bits changed from 0 to 1 + #bits changed from 1 to 0
Φi − Φi−1 = #bits changed from 0 to 1 − #bits changed from 1 to 0
=⇒
ai = 2 × #bits changed from 0 to 1
Since each operation changes at most one bit to 1, the ith operation has amortized cost ai ≤ 2.
⋆15.4 Gray Codes
An attractive alternate solution to the increment/decrement problem was independently suggested
by several students. Gray codes (named after Frank Gray, who discovered them in the 1950s) are
methods for representing numbers as bit strings so that successive numbers differ by only one bit.
For example, here is the four-bit binary reflected Gray code for the integers 0 through 15:
0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000
The general rule for incrementing a binary reflected Gray code is to invert the bit that would be
set from 0 to 1 by a normal binary counter. In terms of bit-flips, this is the perfect solution; each increment or decrement by definition changes only one bit. Unfortunately, the naïve algorithm to find the single bit to flip still requires Θ(log n) time in the worst case. Thus, the total cost of maintaining a Gray code, using the obvious algorithm, is the same as that of maintaining a normal binary counter.
Fortunately, this is only true of the naïve algorithm. The following algorithm, discovered
by Gideon Ehrlich¹ in 1973, maintains a Gray code counter in constant worst-case time per
increment! The algorithm uses a separate ‘focus’ array F [0 .. n] in addition to a Gray-code bit
array G[0 .. n − 1].
EhrlichGrayIncrement(n):
  j ← F[0]
  F[0] ← 0
  if j = n
    G[n − 1] ← 1 − G[n − 1]
  else
    G[j] ← 1 − G[j]
    F[j] ← F[j + 1]
    F[j + 1] ← j + 1

EhrlichGrayInit(n):
  for i ← 0 to n − 1
    G[i] ← 0
  for i ← 0 to n
    F[i] ← i
¹Gideon Ehrlich. Loopless algorithms for generating permutations, combinations, and other combinatorial
configurations. J. Assoc. Comput. Mach. 20:500–513, 1973.
The EhrlichGrayIncrement algorithm obviously runs in O(1) time, even in the worst case.
Here’s the algorithm in action with n = 4. The first line is the Gray bit-vector G, and the second
line shows the focus vector F , both in reverse order:
G : 0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100, 1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000
F : 3210, 3211, 3220, 3212, 3310, 3311, 3230, 3213, 4210, 4211, 4220, 4212, 3410, 3411, 3240, 3214
Voodoo! I won’t explain in detail how Ehrlich’s algorithm works, except to point out the following
invariant. Let B[i] denote the ith bit in the standard binary representation of the current number.
If B[ j ] = 0 and B[ j − 1] = 1, then F [ j ] is the smallest integer k > j such that B[k] = 1;
otherwise, F [ j] = j. Got that?
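For the skeptical, here is a Python transcription of Ehrlich's algorithm (my own sketch, not from the notes); it reproduces the four-bit sequence shown above:

def ehrlich_init(n):
    # n Gray-code bits (G[0] least significant) plus the focus array F[0 .. n].
    return [0] * n, list(range(n + 1))

def ehrlich_increment(G, F):
    # Advance the Gray-code counter by one bit flip, in O(1) worst-case time.
    n = len(G)
    j = F[0]
    F[0] = 0
    if j == n:
        G[n - 1] = 1 - G[n - 1]
    else:
        G[j] = 1 - G[j]
        F[j] = F[j + 1]
        F[j + 1] = j + 1

G, F = ehrlich_init(4)
seq = []
for _ in range(16):
    seq.append("".join(map(str, reversed(G))))   # print G in reverse, as above
    ehrlich_increment(G, F)
assert seq[:8] == ["0000", "0001", "0011", "0010", "0110", "0111", "0101", "0100"]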
But wait — this algorithm only handles increments; what if we also want to decrement?
Sorry, I don’t have a clue. Extra credit, anyone?
15.5 Generalities and Warnings
Although computer scientists usually apply amortized analysis to understand the efficiency of
maintaining and querying data structures, you should remember that amortization can be applied
to any sequence of numbers. Banks have been using amortization to calculate fixed payments for
interest-bearing loans for centuries. The IRS allows taxpayers to amortize business expenses or
gambling losses across several years for purposes of computing income taxes. Some cell phone
contracts let you apply amortization to calling time, by rolling unused minutes from one month
into the next month.
It’s also important to remember that amortized time bounds are not unique. For a data
structure that supports multiple operations, different amortization schemes can assign different
costs to exactly the same algorithms. For example, consider a generic data structure that can be
modified by three algorithms: Fold, Spindle, and Mutilate. One amortization scheme might
imply that Fold and Spindle each run in O(log n) amortized time, while Mutilate runs in O(n) amortized time. Another scheme might imply that Fold runs in O(√n) amortized time, while
Spindle and Mutilate each run in O(1) amortized time. These two results are not necessarily
inconsistent! Moreover, there is no general reason to prefer one of these sets of amortized time
bounds over the other; our preference may depend on the context in which the data structure is
used.
Exercises
1. Suppose we are maintaining a data structure under a series of n operations. Let f (k)
denote the actual running time of the kth operation. For each of the following functions f ,
determine the resulting amortized cost of a single operation. (For practice, try all of the
methods described in this note.)
(a) f (k) is the largest integer i such that 2^i divides k.
(b) f (k) is the largest power of 2 that divides k.
(c) f (k) = n if k is a power of 2, and f (k) = 1 otherwise.
(d) f (k) = n^2 if k is a power of 2, and f (k) = 1 otherwise.
(e) f (k) is the index of the largest Fibonacci number that divides k.
(f) f (k) is the largest Fibonacci number that divides k.
(g) f (k) = k if k is a Fibonacci number, and f (k) = 1 otherwise.
(h) f (k) = k^2 if k is a Fibonacci number, and f (k) = 1 otherwise.
(i) f (k) is the largest integer whose square divides k.
(j) f (k) is the largest perfect square that divides k.
(k) f (k) = k if k is a perfect square, and f (k) = 1 otherwise.
(l) f (k) = k^2 if k is a perfect square, and f (k) = 1 otherwise.
(m) Let T be a complete binary search tree, storing the integer keys 1 through n. f (k) is
the number of ancestors of node k.
(n) Let T be a complete binary search tree, storing the integer keys 1 through n. f (k) is
the number of descendants of node k.
(o) Let T be a complete binary search tree, storing the integer keys 1 through n. f (k) is
the square of the number of ancestors of node k.
(p) Let T be a complete binary search tree, storing the integer keys 1 through n. f (k) =
size(k) lg size(k), where size(k) is the number of descendants of node k.
(q) Let T be an arbitrary binary search tree, storing the integer keys 0 through n. f (k) is
the length of the path in T from node k − 1 to node k.
(r) Let T be an arbitrary binary search tree, storing the integer keys 0 through n. f (k) is
the square of the length of the path in T from node k − 1 to node k.
(s) Let T be a complete binary search tree, storing the integer keys 0 through n. f (k) is
the square of the length of the path in T from node k − 1 to node k.
2. Consider the following modification of the standard algorithm for incrementing a binary
counter.
Increment(B[0 .. ∞]):
  i ← 0
  while B[i] = 1
    B[i] ← 0
    i ← i + 1
  B[i] ← 1
  SomethingElse(i)
The only difference from the standard algorithm is the function call at the end, to a
black-box subroutine called SomethingElse.
Suppose we call Increment n times, starting with a counter with value 0. The amortized
time of each Increment clearly depends on the running time of SomethingElse. Let
T (i) denote the worst-case running time of SomethingElse(i). For example, we proved
in class that Increment algorithm runs in O(1) amortized time when T (i) = 0.
(a) What is the amortized time per Increment if T(i) = 42?
(b) What is the amortized time per Increment if T(i) = 2^i?
(c) What is the amortized time per Increment if T(i) = 4^i?
(d) What is the amortized time per Increment if T(i) = √(2^i)?
(e) What is the amortized time per Increment if T(i) = 2^i/(i + 1)?
3. An extendable array is a data structure that stores a sequence of items and supports the
following operations.
• AddToFront(x) adds x to the beginning of the sequence.
• AddToEnd(x) adds x to the end of the sequence.
• Lookup(k) returns the kth item in the sequence, or Null if the current length of the
sequence is less than k.
Describe a simple data structure that implements an extendable array. Your AddToFront
and AddToEnd algorithms should take O(1) amortized time, and your Lookup algorithm
should take O(1) worst-case time. The data structure should use O(n) space, where n is
the current length of the sequence.
4. An ordered stack is a data structure that stores a sequence of items and supports the
following operations.
• OrderedPush(x) removes all items smaller than x from the beginning of the
sequence and then adds x to the beginning of the sequence.
• Pop deletes and returns the first item in the sequence (or Null if the sequence is
empty).
Suppose we implement an ordered stack with a simple linked list, using the obvious
OrderedPush and Pop algorithms. Prove that if we start with an empty data structure,
the amortized cost of each OrderedPush or Pop operation is O(1).
5. A multistack consists of an infinite series of stacks S0, S1, S2, . . ., where the ith stack Si can hold up to 3^i elements. The user always pushes and pops elements from the smallest stack
S0 . However, before any element can be pushed onto any full stack Si , we first pop all the
elements off Si and push them onto stack Si+1 to make room. (Thus, if Si+1 is already full,
we first recursively move all its members to Si+2 .) Similarly, before any element can be
popped from any empty stack Si, we first pop 3^i elements from Si+1 and push them onto
Si to make room. (Thus, if Si+1 is already empty, we first recursively fill it by popping
elements from Si+2 .) Moving a single element from one stack to another takes O(1) time.
Here is pseudocode for the multistack operations MSPush and MSPop. The internal
stacks are managed with the subroutines Push and Pop.
MPush(x):
  i ← 0
  while Si is full
    i ← i + 1
  while i > 0
    i ← i − 1
    for j ← 1 to 3^i
      Push(Si+1, Pop(Si))
  Push(S0, x)

MPop():
  i ← 0
  while Si is empty
    i ← i + 1
  while i > 0
    i ← i − 1
    for j ← 1 to 3^i
      Push(Si, Pop(Si+1))
  return Pop(S0)
(a) In the worst case, how long does it take to push one more element onto a multistack
containing n elements?
[Figure: Making room in a multistack, just before pushing on a new element.]
(b) Prove that if the user never pops anything from the multistack, the amortized cost
of a push operation is O(log n), where n is the maximum number of elements in the
multistack during its lifetime.
(c) Prove that in any intermixed sequence of pushes and pops, each push or pop operation
takes O(log n) amortized time, where n is the maximum number of elements in the
multistack during its lifetime.
6. Recall that a standard (FIFO) queue maintains a sequence of items subject to the following
operations.
• Push(x): Add item x to the end of the sequence.
• Pull(): Remove and return the item at the beginning of the sequence.
It is easy to implement a queue using a doubly-linked list and a counter, so that the entire
data structure uses O(n) space (where n is the number of items in the queue) and the
worst-case time per operation is O(1).
(a) Now suppose we want to support the following operation instead of Pull:
• MultiPull(k): Remove the first k items from the front of the queue, and return
the kth item removed.
Suppose we use the obvious algorithm to implement MultiPull:
MultiPull(k):
  for i ← 1 to k
    x ← Pull()
  return x
Prove that in any intermixed sequence of Push and MultiPull operations, the amortized cost of each operation is O(1).
(b) Now suppose we also want to support the following operation instead of Push:
• MultiPush(x, k): Insert k copies of x into the back of the queue.
Suppose we use the obvious algorithm to implement MultiPush:

MultiPush(k, x):
  for i ← 1 to k
    Push(x)
Prove that for any integers ℓ and n, there is a sequence of ℓ MultiPush and MultiPull operations that require Ω(nℓ) time, where n is the maximum number of items in the queue at any time. Such a sequence implies that the amortized cost of each operation is Ω(n).
(c) Describe a data structure that supports arbitrary intermixed sequences of MultiPush
and MultiPull operations in O(1) amortized cost per operation. Like a standard
queue, your data structure should use only O(1) space per item.
7. Recall that a standard (FIFO) queue maintains a sequence of items subject to the following
operations.
• Push(x): Add item x to the end of the sequence.
• Pull(): Remove and return the item at the beginning of the sequence.
• Size(): Return the current number of items in the sequence.
It is easy to implement a queue using a doubly-linked list, so that it uses O(n) space (where
n is the number of items in the queue) and the worst-case time for each of these operations
is O(1).
Consider the following new operation, which removes every tenth element from the
queue, starting at the beginning, in Θ(n) worst-case time.
Decimate():
  n ← Size()
  for i ← 0 to n − 1
    if i mod 10 = 0
      Pull()    〈〈result discarded〉〉
    else
      Push(Pull())
Prove that in any intermixed sequence of Push, Pull, and Decimate operations, the
amortized cost of each operation is O(1).
8. Chicago has many tall buildings, but only some of them have a clear view of Lake Michigan.
Suppose we are given an array A[1 .. n] that stores the height of n buildings on a city block,
indexed from west to east. Building i has a good view of Lake Michigan if and only if every
building to the east of i is shorter than i.
Here is an algorithm that computes which buildings have a good view of Lake Michigan.
What is the running time of this algorithm?
GoodView(A[1 .. n]):
  initialize a stack S
  for i ← 1 to n
    while (S not empty and A[i] > A[Top(S)])
      Pop(S)
    Push(S, i)
  return S
9. Suppose we can insert or delete an element into a hash table in O(1) time. In order to
ensure that our hash table is always big enough, without wasting a lot of memory, we will
use the following global rebuilding rules:
• After an insertion, if the table is more than 3/4 full, we allocate a new table twice as
big as our current table, insert everything into the new table, and then free the old
table.
• After a deletion, if the table is less than 1/4 full, we allocate a new table half as big as
our current table, insert everything into the new table, and then free the old table.
Show that for any sequence of insertions and deletions, the amortized time per operation
is still O(1). [Hint: Do not use potential functions.]
10. Professor Pisano insists that the size of any hash table used in his class must always be a
Fibonacci number. He insists on the following variant of the previous global rebuilding
strategy. Suppose the current hash table has size Fk .
• After an insertion, if the number of items in the table is Fk−1 , we allocate a new hash
table of size Fk+1 , insert everything into the new table, and then free the old table.
• After a deletion, if the number of items in the table is Fk−3 , we allocate a new hash
table of size Fk−1 , insert everything into the new table, and then free the old table.
Show that for any sequence of insertions and deletions, the amortized time per operation
is still O(1). [Hint: Do not use potential functions.]
11. Remember the difference between stacks and queues? Good.
(a) Describe how to implement a queue using two stacks and O(1) additional memory,
so that the amortized time for any enqueue or dequeue operation is O(1). The only
access you have to the stacks is through the standard subroutines Push and Pop.
(b) A quack is a data structure combining properties of both stacks and queues. It can
be viewed as a list of elements written left to right such that three operations are
possible:
• QuackPush(x): add a new item x to the left end of the list;
• QuackPop(): remove and return the item on the left end of the list;
• QuackPull(): remove the item on the right end of the list.
Implement a quack using three stacks and O(1) additional memory, so that the
amortized time for any QuackPush, QuackPop, or QuackPull operation is O(1).
In particular, each element in the quack must be stored in exactly one of the three
stacks. Again, you cannot access the component stacks except through the interface
functions Push and Pop.
12. Let’s glom a whole bunch of earlier problems together. Yay! An random-access doubleended multi-queue or radmuque (pronounced “rad muck”) stores a sequence of items and
supports the following operations.
• MultiPush(x, k) adds k copies of item x to the beginning of the sequence.
• MultiPoke(x, k) adds k copies of item x to the end of the sequence.
• MultiPop(k) removes k items from the beginning of the sequence and returns the last
item removed. (If there are less than k items in the sequence, remove them all and
return Null.)
• MultiPull(k) removes k items from the end of the sequence and returns the last item
removed. (If there are less than k items in the sequence, remove them all and return
Null.)
• Lookup(k) returns the kth item in the sequence. (If there are less than k items in the
sequence, return Null.)
Describe and analyze a simple data structure that supports these operations using O(n)
space, where n is the current number of items in the sequence. Lookup should run in O(1)
worst-case time; all other operations should run in O(1) amortized time.
13. Suppose you are faced with an infinite number of counters x i , one for each integer i.
Each counter stores an integer mod m, where m is a fixed global constant. All counters
are initially zero. The following operation increments a single counter x i ; however, if x i
overflows (that is, wraps around from m to 0), the adjacent counters x i−1 and x i+1 are
incremented recursively.
Nudgem(i):
  xi ← xi + 1
  while xi ≥ m
    xi ← xi − m
    Nudgem(i − 1)
    Nudgem(i + 1)
(a) Prove that Nudge3 runs in O(1) amortized time. [Hint: Prove that Nudge3 always
halts!]
(b) What is the worst-case total time for n calls to Nudge2 , if all counters are initially
zero?
14. Now suppose you are faced with an infinite two-dimensional grid of modular counters,
one counter x i, j for every pair of integers (i, j). Again, all counters are initially zero. The
counters are modified by the following operation, where m is a fixed global constant:
2dNudgem(i, j):
  xi,j ← xi,j + 1
  while xi,j ≥ m
    xi,j ← xi,j − m
    2dNudgem(i − 1, j)
    2dNudgem(i, j + 1)
    2dNudgem(i + 1, j)
    2dNudgem(i, j − 1)
(a) Prove that 2dNudge5 runs in O(1) amortized time.
(b) Prove or disprove: 2dNudge4 also runs in O(1) amortized time.
(c) Prove or disprove: 2dNudge3 always halts.
⋆15. Suppose instead of powers of two, we represent integers as the sum of Fibonacci numbers.
In other words, instead of an array of bits, we keep an array of fits, where the ith least
significant fit indicates whether the sum includes the ith Fibonacci number Fi . For example,
the fitstring 101110 F represents the number F6 + F4 + F3 + F2 = 8+3+2+1 = 14. Describe
algorithms to increment and decrement a single fitstring in constant amortized time. [Hint:
Most numbers can be represented by more than one fitstring!]
⋆16. A doubly lazy binary counter represents any number as a weighted sum of powers of two, where each weight is one of four values: −1, 0, 1, or 2. (For succinctness, I'll write 1̄ instead of −1.) Every integer—positive, negative, or zero—has an infinite number of doubly lazy binary representations. For example, the number 13 can be represented as 1101 (the standard binary representation), or 21̄01 (because 2 · 2^3 − 2^2 + 2^0 = 13), or 101̄11̄ (because 2^4 − 2^2 + 2^1 − 2^0 = 13), or 1̄12000101̄11̄ (because −2^10 + 2^9 + 2 · 2^8 + 2^4 − 2^2 + 2^1 − 2^0 = 13).
To increment a doubly lazy binary counter, we add 1 to the least significant bit, then carry the rightmost 2 (if any). To decrement, we subtract 1 from the least significant bit, and then borrow the rightmost 1̄ (if any).
LazyIncrement(B[0 .. n]):
  B[0] ← B[0] + 1
  for i ← 0 to n − 1
    if B[i] = 2
      B[i] ← 0
      B[i + 1] ← B[i + 1] + 1
      return

LazyDecrement(B[0 .. n]):
  B[0] ← B[0] − 1
  for i ← 0 to n − 1
    if B[i] = −1
      B[i] ← 1
      B[i + 1] ← B[i + 1] − 1
      return
For example, here is a doubly lazy binary count from zero up to twenty and then back
down to zero. The bits are written with the least significant bit B[0] on the right, omitting
all leading 0’s.
Increments (++):
  0 → 1 → 10 → 11 → 20 → 101 → 110 → 111 → 120 → 201 → 210
    → 1011 → 1020 → 1101 → 1110 → 1111 → 1120 → 1201 → 1210 → 2011 → 2020
Decrements (−−):
    → 2011 → 2010 → 2001 → 2000 → 201̄1 → 21̄10 → 21̄01 → 1100 → 11̄11 → 1010
    → 1001 → 1000 → 101̄1 → 11̄10 → 11̄01 → 100 → 11̄1 → 10 → 1 → 0
Prove that for any intermixed sequence of increments and decrements of a doubly lazy
binary number, starting with 0, the amortized time for each operation is O(1). Do not
assume, as in the example above, that all the increments come before all the decrements.
Everything was balanced before the computers went off line. Try and adjust something,
and you unbalance something else. Try and adjust that, you unbalance two more and
before you know what’s happened, the ship is out of control.
— Blake, Blake’s 7, “Breakdown” (March 6, 1978)
A good scapegoat is nearly as welcome as a solution to the problem.
— Anonymous
Let’s play.
— El Mariachi [Antonio Banderas], Desperado (1992)
CAPTAIN: TAKE OFF EVERY ’ZIG’!!
CAPTAIN: YOU KNOW WHAT YOU DOING.
CAPTAIN: MOVE ’ZIG’.
CAPTAIN: FOR GREAT JUSTICE.
— Zero Wing (1992)
16 Scapegoat and Splay Trees

16.1 Definitions

〈〈To do: Move intro paragraphs to earlier treap notes, or maybe to new appendix on basic data structures (arrays, stacks, queues, heaps, binary search trees).〉〉
I’ll assume that everyone is already familiar with the standard terminology for binary search
trees—node, search key, edge, root, internal node, leaf, right child, left child, parent, descendant,
sibling, ancestor, subtree, preorder, postorder, inorder, etc.—as well as the standard algorithms
for searching for a node, inserting a node, or deleting a node. Otherwise, consult your favorite
data structures textbook.
For this lecture, we will consider only full binary trees—where every internal node has exactly
two children—where only the internal nodes actually store search keys. In practice, we can
represent the leaves with null pointers.
Recall that the depth of a node is its distance from the root, and its height is the distance to
the farthest leaf in its subtree. The height (or depth) of the tree is just the height of the root.
The size of a node is the number of nodes in its subtree. The size n of the whole tree is just the
total number of nodes.
A tree with height h has at most 2^h leaves, so the minimum height of an n-leaf binary tree is ⌈lg n⌉. In the worst case, the time required for a search, insertion, or deletion is proportional to the height of the tree, so in general we would like to keep the height as close to lg n as possible. The best we can possibly do is to have a perfectly balanced tree, in which each subtree has as close to half the leaves as possible, and both subtrees are perfectly balanced. The height of a perfectly balanced tree is ⌈lg n⌉, so the worst-case search time is O(log n). However, even if we started
with a perfectly balanced tree, a malicious sequence of insertions and/or deletions could make
the tree arbitrarily unbalanced, driving the search time up to Θ(n).
To avoid this problem, we need to periodically modify the tree to maintain ‘balance’. There
are several methods for doing this, and depending on the method we use, the search tree is
given a different name. Examples include AVL trees, red-black trees, height-balanced trees,
weight-balanced trees, bounded-balance trees, path-balanced trees, B-trees, treaps, randomized
binary search trees, skip lists,¹ and jumplists. Some of these trees support searches, insertions,
and deletions, in O(log n) worst-case time, others in O(log n) amortized time, still others in
O(log n) expected time.
In this lecture, I’ll discuss three binary search tree data structures with good amortized
performance. The first two are variants of lazy balanced trees: lazy weight-balanced trees,
developed by Mark Overmars* in the early 1980s, [14] and scapegoat trees, discovered by Arne
Andersson* in 1989 [1, 2] and independently² by Igal Galperin* and Ron Rivest in 1993 [11]. The
third structure is the splay tree, discovered by Danny Sleator and Bob Tarjan in 1981 [19, 16].
16.2 Lazy Deletions: Global Rebuilding
First let’s consider the simple case where we start with a perfectly-balanced tree, and we only
want to perform searches and deletions. To get good search and delete times, we can use a
technique called global rebuilding. When we get a delete request, we locate and mark the node
to be deleted, but we don’t actually delete it. This requires a simple modification to our search
algorithm—we still use marked nodes to guide searches, but if we search for a marked node, the
search routine says it isn’t there. This keeps the tree more or less balanced, but now the search
time is no longer a function of the amount of data currently stored in the tree. To remedy this,
we also keep track of how many nodes have been marked, and then apply the following rule:
Global Rebuilding Rule. As soon as half the nodes in the tree have been marked,
rebuild a new perfectly balanced tree containing only the unmarked nodes.³
With this rule in place, a search takes O(log n) time in the worst case, where n is the number of
unmarked nodes. Specifically, since the tree has at most n marked nodes, or 2n nodes altogether,
we need to examine at most lg n + 1 keys. There are several methods for rebuilding the tree in
O(n) time, where n is the size of the new tree. (Homework!) So a single deletion can cost Θ(n)
time in the worst case, but only if we have to rebuild; most deletions take only O(log n) time.
We spend O(n) time rebuilding, but only after Ω(n) deletions, so the amortized cost of
rebuilding the tree is O(1) per deletion. (Here I’m using a simple version of the ‘taxation method’.
For each deletion, we charge a $1 tax; after n deletions, we’ve collected $n, which is just enough
to pay for rebalancing the tree containing the remaining n nodes.) Since we also have to find
and mark the node being ‘deleted’, the total amortized time for a deletion is O(log n).
16.3 Insertions: Partial Rebuilding
Now suppose we only want to support searches and insertions. We can’t ‘not really insert’ new
nodes into the tree, since that would make them unavailable to the search algorithm.⁴ So
instead, we’ll use another method called partial rebuilding. We will insert new nodes normally,
but whenever a subtree becomes unbalanced enough, we rebuild it. The definition of ‘unbalanced
enough’ depends on an arbitrary constant α > 1.
Each node v will now also store height(v) and size(v). We now modify our insertion algorithm
with the following rule:
¹Yeah, yeah. Skip lists aren’t really binary search trees. Whatever you say, Mr. Picky.
²The claim of independence is Andersson’s [2]. The two papers actually describe very slightly different rebalancing
algorithms. The algorithm I’m using here is closer to Andersson’s, but my analysis is closer to Galperin and Rivest’s.
³Alternately: When the number of unmarked nodes is one less than an exact power of two, rebuild the tree. This
rule ensures that the tree is always exactly balanced.
⁴Well, we could use the Bentley-Saxe* logarithmic method [3], but that would raise the query time to O(log^2 n).
Partial Rebuilding Rule. After we insert a node, walk back up the tree updating
the heights and sizes of the nodes on the search path. If we encounter a node v
where height(v) > α · lg(size(v)), rebuild its subtree into a perfectly balanced tree (in
O(size(v)) time).
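Here is a minimal Python sketch of this rule (my own illustration; the constant ALPHA and the helper names are my own choices, with α = 1.5 standing in for an arbitrary constant bigger than 1). Each node stores its subtree size and height, and an insertion rebuilds any subtree on the search path that has become too tall:

import math

ALPHA = 1.5   # any constant strictly bigger than 1 works

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.size, self.height = 1, 0

def rebuild(node):
    # Rebuild node's subtree into a perfectly balanced tree in O(size) time.
    nodes = []
    def flatten(v):
        if v:
            flatten(v.left); nodes.append(v); flatten(v.right)
    flatten(node)
    def build(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        v = nodes[mid]
        v.left, v.right = build(lo, mid - 1), build(mid + 1, hi)
        v.size = hi - lo + 1
        v.height = 1 + max(v.left.height if v.left else -1,
                           v.right.height if v.right else -1)
        return v
    return build(0, len(nodes) - 1)

def insert(v, key):
    # Insert key below v, then apply the Partial Rebuilding Rule on the way back up.
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    v.size += 1
    v.height = 1 + max(v.left.height if v.left else -1,
                       v.right.height if v.right else -1)
    if v.height > ALPHA * math.log2(v.size):
        return rebuild(v)
    return v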
If we always follow this rule, then after an insertion, the height of the tree is at most α · lg n.
Thus, since α is a constant, the worst-case search time is O(log n). In the worst case, insertions
require Θ(n) time—we might have to rebuild the entire tree. However, the amortized time for
each insertion is again only O(log n). Not surprisingly, the proof is a little bit more complicated
than for deletions.
Define the imbalance Imbal(v) of a node v to be the absolute difference between the sizes of its
two subtrees:
Imbal(v) := |size(left(v)) − size(right(v))|
A simple induction proof implies that Imbal(v) ≤ 1 for every node v in a perfectly balanced tree.
In particular, immediately after we rebuild the subtree of v, we have Imbal(v) ≤ 1. On the other
hand, each insertion into the subtree of v increments either size(left(v)) or size(right(v)), so
Imbal(v) changes by at most 1.
The whole analysis boils down to the following lemma.
Lemma 1. Just before we rebuild v’s subtree, Imbal(v) = Ω(size(v)).
Before we prove this lemma, let’s first look at what it implies. If Imbal(v) = Ω(size(v)), then
Ω(size(v)) keys have been inserted into v’s subtree since the last time it was rebuilt from scratch.
On the other hand, rebuilding the subtree requires O(size(v)) time. Thus, if we amortize the
rebuilding cost across all the insertions since the previous rebuild, v is charged constant time for
each insertion into its subtree. Since each new key is inserted into at most α · lg n = O(log n)
subtrees, the total amortized cost of an insertion is O(log n).
Proof: Since we’re about to rebuild the subtree at v, we must have height(v) > α · lg size(v).
Without loss of generality, suppose that the node we just inserted went into v’s left subtree. Either
we just rebuilt this subtree or we didn’t have to, so we also have height(left(v)) ≤ α·lg size(left(v)).
Combining these two inequalities with the recursive definition of height, we get
α · lg size(v) < height(v) ≤ height(left(v)) + 1 ≤ α · lg size(left(v)) + 1.
After some algebra, this simplifies to size(left(v)) > size(v)/2^{1/α}. Combining this with the identity size(v) = size(left(v)) + size(right(v)) + 1 and doing some more algebra gives us the inequality

    size(right(v)) < (1 − 1/2^{1/α}) size(v) − 1.

Finally, we combine these two inequalities using the recursive definition of imbalance.

    Imbal(v) ≥ size(left(v)) − size(right(v)) − 1 > (2/2^{1/α} − 1) size(v)

Since α is a constant bigger than 1, the factor in parentheses is a positive constant. □
16.4 Scapegoat (Lazy Height-Balanced) Trees
Finally, to handle both insertions and deletions efficiently, scapegoat trees use both of the previous
techniques. We use partial rebuilding to re-balance the tree after insertions, and global rebuilding
to re-balance the tree after deletions. Each search takes O(log n) time in the worst case, and the
amortized time for any insertion or deletion is also O(log n). There are a few small technical
details left (which I won’t describe), but no new ideas are required.
Once we’ve done the analysis, we can actually simplify the data structure. It’s not hard to
prove that at most one subtree (the scapegoat) is rebuilt during any insertion. Less obviously, we
can even get the same amortized time bounds (except for a small constant factor) if we only
maintain the three integers in addition to the actual tree: the size of the entire tree, the height
of the entire tree, and the number of marked nodes. Whenever an insertion causes the tree to
become unbalanced, we can compute the sizes of all the subtrees on the search path, starting at
the new leaf and stopping at the scapegoat, in time proportional to the size of the scapegoat
subtree. Since we need that much time to re-balance the scapegoat subtree, this computation
increases the running time by only a small constant factor! Thus, unlike almost every other kind
of balanced trees, scapegoat trees require only O(1) extra space.
16.5 Rotations, Double Rotations, and Splaying
Another method for maintaining balance in binary search trees is by adjusting the shape of the
tree locally, using an operation called a rotation. A rotation at a node x decreases its depth by
one and increases its parent’s depth by one. Rotations can be performed in constant time, since
they only involve simple pointer manipulation.
[Figure 1. A right rotation at x and a left rotation at y are inverses.]
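In code, a rotation is just a couple of pointer assignments. Here is one way to write it in Python (my own sketch, not from the notes); rotating right at a node promotes its left child, and rotating left is the mirror image:

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    # Promote y's left child x by one level; y becomes x's right child.
    x = y.left
    y.left = x.right
    x.right = y
    return x        # x is the new root of this subtree

def rotate_left(x):
    # The mirror image: promote x's right child y by one level.
    y = x.right
    x.right = y.left
    y.left = x
    return y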
For technical reasons, we will need to use rotations two at a time. There are two types of
double rotations, which might be called zig-zag and roller-coaster. A zig-zag at x consists of two
rotations at x, in opposite directions. A roller-coaster at a node x consists of a rotation at x’s
parent followed by a rotation at x, both in the same direction. Each double rotation decreases
the depth of x by two, leaves the depth of its parent unchanged, and increases the depth of
its grandparent by either one or two, depending on the type of double rotation. Either type of
double rotation can be performed in constant time.
Finally, a splay operation moves an arbitrary node in the tree up to the root through a series
of double rotations, possibly with one single rotation at the end. Splaying a node v requires time
proportional to depth(v). (Obviously, this means the depth before splaying, since after splaying v
is the root and thus has depth zero!)
16.6 Splay Trees
A splay tree is a binary search tree that is kept more or less balanced by splaying. Intuitively, after
we access any node, we move it to the root with a splay operation. In more detail:
[Figure 2. A zig-zag at x. The symmetric case is not shown.]
[Figure 3. A right roller-coaster at x and a left roller-coaster at z.]
[Figure 4. Splaying a node. Irrelevant subtrees are omitted for clarity.]
• Search: Find the node containing the key using the usual algorithm, or its predecessor or
successor if the key is not present. Splay whichever node was found.
• Insert: Insert a new node using the usual algorithm, then splay that node.
• Delete: Find the node x to be deleted, splay it, and then delete it. This splits the tree into
two subtrees, one with keys less than x, the other with keys bigger than x. Find the node
w in the left subtree with the largest key (the inorder predecessor of x in the original tree),
splay it, and finally join it to the right subtree.
[Figure 5. Deleting a node in a splay tree.]
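For concreteness, here is one common recursive formulation of splaying in Python (my own sketch, reusing the Node class and rotation helpers from Section 16.5 above; it is not taken from the notes or from Sleator and Tarjan). It splays the node with the given key, or the last node reached on the search path, to the root using zig-zag and roller-coaster steps:

def splay(root, key):
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                    # roller-coaster (zig-zig)
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                  # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:
        if root.right is None:
            return root
        if key > root.right.key:                   # roller-coaster (zig-zig)
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:                 # zig-zag
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)

Searching then amounts to root = splay(root, key) followed by a single comparison at the new root.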
Each search, insertion, or deletion consists of a constant number of operations of the form
walk down to a node, and then splay it up to the root. Since the walk down is clearly cheaper
than the splay up, all we need to get good amortized bounds for splay trees is to derive good
amortized bounds for a single splay.
Believe it or not, the easiest way to do this uses the potential method. We define the rank of a node v to be ⌊lg size(v)⌋, and the potential of a splay tree to be the sum of the ranks of its nodes:

    Φ := ∑_v rank(v) = ∑_v ⌊lg size(v)⌋
It’s not hard to observe that a perfectly balanced binary tree has potential Θ(n), and a linear
chain of nodes (a perfectly unbalanced tree) has potential Θ(n log n).
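A quick numerical check of these two extremes (my own sketch, not from the notes):

import math

def chain_potential(n):
    # A path of n nodes: the subtree sizes are exactly 1, 2, ..., n.
    return sum(math.floor(math.log2(k)) for k in range(1, n + 1))

def balanced_potential(n):
    # A perfectly balanced tree with n = 2**h - 1 nodes: level L has 2**L nodes,
    # each with subtree size 2**(h - L) - 1.
    h = n.bit_length()
    return sum((2 ** L) * math.floor(math.log2(2 ** (h - L) - 1)) for L in range(h))

n = 2 ** 10 - 1
print(balanced_potential(n))   # 1013 -- roughly n
print(chain_potential(n))      # 8194 -- roughly n lg n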
The amortized analysis of splay trees boils down to the following lemma. Here, rank(v) denotes the rank of v before a (single or double) rotation, and rank′(v) denotes its rank afterwards. Recall that the amortized cost is defined to be the number of rotations plus the increase in potential.
The Access Lemma. The amortized cost of a single rotation at any node v is at most 1 + 3 rank′(v) − 3 rank(v), and the amortized cost of a double rotation at any node v is at most 3 rank′(v) − 3 rank(v).
Proving this lemma is a straightforward but tedious case analysis of the different types of
rotations. For the sake of completeness, I’ll give a proof (of a generalized version) in the next
section.
By adding up the amortized costs of all the rotations, we find that the total amortized cost of splaying a node v is at most 1 + 3 rank′(v) − 3 rank(v), where rank′(v) is the rank of v after the entire splay. (The intermediate ranks cancel out in a nice telescoping sum.) But after the splay, v is the root! Thus, rank′(v) = ⌊lg n⌋, which implies that the amortized cost of a splay is at most 3 lg n + 1 = O(log n).
We conclude that every insertion, deletion, or search in a splay tree takes O(log n) amortized
time.
⋆16.7 Other Optimality Properties
In fact, splay trees are optimal in several other senses. Some of these optimality properties follow
easily from the following generalization of the Access Lemma.
Let’s arbitrarily assign each node v a non-negative real weight w(v). These weights are not
actually stored in the splay tree, nor do they affect the splay algorithm in any way; they are only
used to help with the analysis. We then redefine the size s(v) of a node v to be the sum of the
weights of the descendants of v, including v itself:
s(v) := w(v) + s(right(v)) + s(left(v)).
If w(v) = 1 for every node v, then the size of a node is just the number of nodes in its subtree, as
in the previous section. As before, we define the rank of any node v to be r(v) = lg s(v), and the
potential of a splay tree to be the sum of the ranks of all its nodes:

    Φ = ∑_v r(v) = ∑_v lg s(v)
In the following lemma, r(v) denotes the rank of v before a (single or double) rotation, and r′(v) denotes its rank afterwards.
The Generalized Access Lemma. For any assignment of non-negative weights to the nodes, the amortized cost of a single rotation at any node x is at most 1 + 3r′(x) − 3r(x), and the amortized cost of a double rotation at any node x is at most 3r′(x) − 3r(x).
Proof: First consider a single rotation, as shown in Figure 1.

    1 + Φ′ − Φ = 1 + r′(x) + r′(y) − r(x) − r(y)    [only x and y change rank]
               ≤ 1 + r′(x) − r(x)                    [r′(y) ≤ r(y)]
               ≤ 1 + 3r′(x) − 3r(x)                  [r′(x) ≥ r(x)]

Now consider a zig-zag, as shown in Figure 2. Only w, x, and z change rank.

    2 + Φ′ − Φ = 2 + r′(w) + r′(x) + r′(z) − r(w) − r(x) − r(z)    [only w, x, z change rank]
               ≤ 2 + r′(w) + r′(z) − 2r(x)                          [r(x) ≤ r(w) and r′(x) = r(z)]
               = 2 + (r′(w) − r′(x)) + (r′(z) − r′(x)) + 2(r′(x) − r(x))
               = 2 + lg(s′(w)/s′(x)) + lg(s′(z)/s′(x)) + 2(r′(x) − r(x))
               ≤ 2 + 2 lg((s′(x)/2)/s′(x)) + 2(r′(x) − r(x))        [s′(w) + s′(z) ≤ s′(x), lg is concave]
               = 2(r′(x) − r(x))
               ≤ 3(r′(x) − r(x))                                    [r′(x) ≥ r(x)]

Finally, consider a roller-coaster, as shown in Figure 3. Only x, y, and z change rank.

    2 + Φ′ − Φ = 2 + r′(x) + r′(y) + r′(z) − r(x) − r(y) − r(z)    [only x, y, z change rank]
               ≤ 2 + r′(x) + r′(z) − 2r(x)                          [r′(y) ≤ r(z) and r(x) ≤ r(y)]
               = 2 + (r(x) − r′(x)) + (r′(z) − r′(x)) + 3(r′(x) − r(x))
               = 2 + lg(s(x)/s′(x)) + lg(s′(z)/s′(x)) + 3(r′(x) − r(x))
               ≤ 2 + 2 lg((s′(x)/2)/s′(x)) + 3(r′(x) − r(x))        [s(x) + s′(z) ≤ s′(x), lg is concave]
               = 3(r′(x) − r(x))

This completes the proof.⁵ □
Observe that this argument works for arbitrary non-negative vertex weights. By adding up
the amortized costs of all the rotations, we find that the total amortized cost of splaying a node x
is at most 1 + 3r(root) − 3r(x). (The intermediate ranks cancel out in a nice telescoping sum.)
This analysis has several immediate corollaries. The first corollary is that the amortized
search time in a splay tree is within a constant factor of the search time in the best possible static
⁵This proof is essentially taken verbatim from the original Sleator and Tarjan paper. Another proof technique,
which may be more accessible, involves maintaining ⌊lg s(v)⌋ tokens on each node v and arguing about the changes in
token distribution caused by each single or double rotation. But I haven’t yet internalized this approach enough to
include it here.
binary search tree. Thus, if some nodes are accessed more often than others, the standard splay
algorithm automatically keeps those more frequent nodes closer to the root, at least most of the
time.
Static Optimality Theorem. Suppose each node x is accessed at least t(x) times, and let T = Σ_x t(x). The amortized cost of accessing x is O(log T − log t(x)).

Proof: Set w(x) = t(x) for each node x.    □
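To spell out the calculation hidden in that one-line proof: with w(x) = t(x), the root has size s(root) = Σ_y w(y) = T, and every node satisfies s(x) ≥ w(x) = t(x). The splay bound above therefore gives an amortized cost of at most

1 + 3 r(root) − 3 r(x) = 1 + 3 lg T − 3 lg s(x) ≤ 1 + 3 lg T − 3 lg t(x) = O(log T − log t(x)).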
For any nodes x and z, let dist(x, z) denote the rank distance between x and z, that is,
the number of nodes y such that key(x) ≤ key(y) ≤ key(z) or key(x) ≥ key(y) ≥ key(z). In
particular, dist(x, x) = 1 for all x.
Static Finger Theorem. For any fixed node f (‘the finger’), the amortized cost of accessing x is
O(lg dist( f , x)).
Proof: Set w(x) = 1/dist(x, f)² for each node x. Then s(root) ≤ Σ_{i=1}^∞ 2/i² = π²/3 = O(1), and r(x) ≥ lg w(x) = −2 lg dist(f, x).    □
Here are a few more interesting properties of splay trees, which I’ll state without proof.⁶
The proofs of these properties (especially the dynamic finger theorem) are considerably more
complicated than the amortized analysis presented above.
Working Set Theorem [16]. The amortized cost of accessing node x is O(log D), where D is the
number of distinct items accessed since the last time x was accessed. (For the first access to x, we
set D = n.)
Scanning Theorem [18]. Splaying all nodes in a splay tree in order, starting from any initial tree,
requires O(n) total rotations.
Dynamic Finger Theorem [7, 6]. Immediately after accessing node y, the amortized cost of accessing node x is O(lg dist(x, y)).
⋆16.8  Splay Tree Conjectures
Splay trees are conjectured to have many interesting properties in addition to the optimality
properties that have been proved; I’ll describe just a few of the more important ones.
The Deque Conjecture [18] considers the cost of dynamically maintaining two fingers l and r,
starting on the left and right ends of the tree. Suppose at each step, we can move one of these
two fingers either one step left or one step right; in other words, we are using the splay tree
as a doubly-ended queue. Sundar* proved that the total cost of m deque operations on an
n-node splay tree is O((m + n)α(m + n)) [17]. Pettie later improved this bound to
O(mα∗ (n)) [15]. The Deque Conjecture states that the total cost is actually O(m + n).
The Traversal Conjecture [16] states that accessing the nodes in a splay tree, in the order
specified by a preorder traversal of any other binary tree with the same keys, takes O(n) time.
This is a generalization of the Scanning Theorem.
The Unified Conjecture [13] states that the time to access node x is O(lg min y (D( y)+d(x, y))),
where D( y) is the number of distinct nodes accessed since the last time y was accessed. This
⁶This list and the following section are taken almost directly from Erik Demaine’s lecture notes [5].
would immediately imply both the Dynamic Finger Theorem, which is about spatial locality, and
the Working Set Theorem, which is about temporal locality. Two other structures are known that
satisfy the unified bound [4, 13].
Finally, the most important conjecture about splay trees, and one of the most important open
problems about data structures, is that they are dynamically optimal [16]. Specifically, the cost of
any sequence of accesses to a splay tree is conjectured to be at most a constant factor more than
the cost of the best possible dynamic binary search tree that knows the entire access sequence in
advance. To make the rules concrete, we consider binary search trees that can undergo arbitrary
rotations after a search; the cost of a search is the number of key comparisons plus the number
of rotations. We do not require that the rotations be on or even near the search path. This is an
extremely strong conjecture!
No dynamically optimal binary search tree is known, even in the offline setting. However,
three very similar O(log log n)-competitive binary search trees have been discovered in the last
few years: Tango trees [9], multisplay trees [20], and chain-splay trees [12]. A recently-published
geometric formulation of dynamic binary search trees [8, 10] also offers significant hope for
future progress.
References
[1] Arne Andersson*. Improving partial rebuilding by using simple balance criteria. Proc. Workshop on
Algorithms and Data Structures, 393–402, 1989. Lecture Notes Comput. Sci. 382, Springer-Verlag.
[2] Arne Andersson. General balanced trees. J. Algorithms 30:1–28, 1999.
[3] Jon L. Bentley and James B. Saxe*. Decomposable searching problems I: Static-to-dynamic transformation. J. Algorithms 1(4):301–358, 1980.
[4] Mihai Bădoiu* and Erik D. Demaine. A simplified and dynamic unified structure. Proc. 6th Latin
American Sympos. Theoretical Informatics, 466–473, 2004. Lecture Notes Comput. Sci. 2976, Springer-Verlag.
[5] Jeff Cohen* and Erik Demaine. 6.897: Advanced data structures (Spring 2005), Lecture 3, February
8 2005. 〈http://theory.csail.mit.edu/classes/6.897/spring05/lec.html〉.
[6] Richard Cole. On the dynamic finger conjecture for splay trees. Part II: The proof. SIAM J. Comput.
30(1):44–85, 2000.
[7] Richard Cole, Bud Mishra, Jeanette Schmidt, and Alan Siegel. On the dynamic finger conjecture for
splay trees. Part I: Splay sorting log n-block sequences. SIAM J. Comput. 30(1):1–43, 2000.
[8] Erik D. Demaine, Dion Harmon*, John Iacono, Daniel Kane*, and Mihai Pătrașcu. The geometry of
binary search trees. Proc. 20th Ann. ACM-SIAM Symp. Discrete Algorithms, 496–505, 2009.
[9] Erik D. Demaine, Dion Harmon*, John Iacono, and Mihai Pătrașcu**. Dynamic optimality—almost.
Proc. 45th Annu. IEEE Sympos. Foundations Comput. Sci., 484–490, 2004.
[10] Jonathan Derryberry*, Daniel Dominic Sleator, and Chengwen Chris Wang*. A lower bound
framework for binary search trees with rotations. Tech. Rep. CMU-CS-05-187, Carnegie Mellon Univ.,
Nov. 2005. 〈http://reports-archive.adm.cs.cmu.edu/anon/2005/CMU-CS-05-187.pdf〉.
[11] Igal Galperin* and Ronald R. Rivest. Scapegoat trees. Proc. 4th Annu. ACM-SIAM Sympos. Discrete
Algorithms, 165–174, 1993.
[12] George F. Georgakopoulos. Chain-splay trees, or, how to achieve and prove log log N -competitiveness
by splaying. Inform. Proc. Lett. 106(1):37–43, 2008.
[13] John Iacono*. Alternatives to splay trees with O(log n) worst-case access times. Proc. 12th Annu.
ACM-SIAM Sympos. Discrete Algorithms, 516–522, 2001.
[14] Mark H. Overmars*. The Design of Dynamic Data Structures. Lecture Notes Comput. Sci. 156.
Springer-Verlag, 1983.
[15] Seth Pettie. Splay trees, Davenport-Schinzel sequences, and the deque conjecture. Proc. 19th Ann.
ACM-SIAM Symp. Discrete Algorithms, 1115–1124, 2008.
[16] Daniel D. Sleator and Robert E. Tarjan. Self-adjusting binary search trees. J. ACM 32(3):652–686,
1985.
[17] Rajamani Sundar*. On the Deque conjecture for the splay algorithm. Combinatorica 12(1):95–124,
1992.
[18] Robert E. Tarjan. Sequential access in splay trees takes linear time. Combinatorica 5(5):367–378,
1985.
[19] Robert Endre Tarjan. Data Structures and Network Algorithms. CBMS-NSF Regional Conference
Series in Applied Mathematics 44. SIAM, 1983.
[20] Chengwen Chris Wang*, Jonathan Derryberry*, and Daniel Dominic Sleator. O(log log n)-competitive
dynamic binary search trees. Proc. 17th Annu. ACM-SIAM Sympos. Discrete Algorithms, 374–383, 2006.
*Starred authors were graduate students at the time that the cited work was published. **Double-starred
authors were undergraduates.
Exercises
1. (a) An n-node binary tree is perfectly balanced if either n ≤ 1, or its two subtrees are
perfectly balanced binary trees, each with at most ⌊n/2⌋ nodes. Prove that I(v) ≤ 1
for every node v of any perfectly balanced tree.
(b) Prove that at most one subtree is rebalanced during a scapegoat tree insertion.
2. In a dirty binary search tree, each node is labeled either clean or dirty. The lazy deletion
scheme used for scapegoat trees requires us to purge the search tree, keeping all the clean
nodes and deleting all the dirty nodes, as soon as half the nodes become dirty. In addition,
the purged tree should be perfectly balanced.
(a) Describe and analyze an algorithm to purge an arbitrary n-node dirty binary search
tree in O(n) time. (Such an algorithm is necessary for scapegoat trees to achieve
O(log n) amortized insertion cost.)
⋆(b) Modify your algorithm so that it uses only O(log n) space, in addition to the tree itself.
Don’t forget to include the recursion stack in your space bound.
(c) Modify your algorithm so that it uses only O(1) additional space. In particular, your
algorithm cannot call itself recursively at all.
3. Consider the following simpler alternative to splaying:
MoveToRoot(v):
  while parent(v) ≠ Null
    rotate at v
Prove that the amortized cost of MoveToRoot in an n-node binary tree can be Ω(n). That
is, prove that for any integer k, there is a sequence of k MoveToRoot operations that
require Ω(kn) time to execute.
4. Let P be a set of n points in the plane. The staircase of P is the set of all points in the plane
that have at least one point in P both above and to the right.
A set of points in the plane and its staircase (shaded).
(a) Describe an algorithm to compute the staircase of a set of n points in O(n log n) time.
(b) Describe and analyze a data structure that stores the staircase of a set of points, and
an algorithm Above?(x, y) that returns True if the point (x, y) is above the staircase,
or False otherwise. Your data structure should use O(n) space, and your Above?
algorithm should run in O(log n) time.
Two staircase queries.
(c) Describe and analyze a data structure that maintains a staircase as new points are
inserted. Specifically, your data structure should support a function Insert(x, y)
that adds the point (x, y) to the underlying point set and returns True or False to
indicate whether the staircase of the set has changed. Your data structure should use
O(n) space, and your Insert algorithm should run in O(log n) amortized time.
Two staircase insertions.
5. Suppose we want to maintain a dynamic set of values, subject to the following operations:
• Insert(x): Add x to the set (if it isn’t already there).
• Print&DeleteBetween(a, b): Print every element x in the range a ≤ x ≤ b, in
increasing order, and delete those elements from the set.
For example, if the current set is {1, 5, 3, 4, 8}, then
• Print&DeleteBetween(4, 6) prints the numbers 4 and 5 and changes the set to
{1, 3, 8};
• Print&DeleteBetween(6, 7) prints nothing and does not change the set;
• Print&DeleteBetween(0, 10) prints the sequence 1, 3, 4, 5, 8 and deletes everything.
(a) Suppose we store the set in our favorite balanced binary search tree, using the
standard Insert algorithm and the following algorithm for Print&DeleteBetween:
Print&DeleteBetween(a, b):
x ← Successor(a)
while x ≤ b
print x
Delete(x)
x ← Successor(a)
Here, Successor(a) returns the smallest element greater than or equal to a (or ∞
if there is no such element), and Delete is the standard deletion algorithm. Prove
that the amortized time for Insert and Print&DeleteBetween is O(log N ), where
N is the maximum number of items that are ever stored in the tree.
(b) Describe and analyze Insert and Print&DeleteBetween algorithms that run in
O(log n) amortized time, where n is the current number of elements in the set.
(c) What is the running time of your Insert algorithm in the worst case?
(d) What is the running time of your Print&DeleteBetween algorithm in the worst
case?
6. Say that a binary search tree is augmented if every node v also stores size(v), the number
of nodes in the subtree rooted at v.
(a) Show that a rotation in an augmented binary tree can be performed in constant time.
(b) Describe an algorithm ScapegoatSelect(k) that selects the kth smallest item in an
augmented scapegoat tree in O(log n) worst-case time. (The scapegoat trees presented
in these notes are already augmented.)
(c) Describe an algorithm SplaySelect(k) that selects the kth smallest item in an
augmented splay tree in O(log n) amortized time.
(d) Describe an algorithm TreapSelect(k) that selects the kth smallest item in an
augmented treap in O(log n) expected time.
7. Many applications of binary search trees attach a secondary data structure to each node in
the tree, to allow for more complicated searches. Let T be an arbitrary binary tree. The
secondary data structure at any node v stores exactly the same set of items as the subtree
of T rooted at v. This secondary structure has size O(size(v)) and can be built in O(size(v))
time, where size(v) denotes the number of descendants of v.
The primary and secondary data structures are typically defined by different attributes
of the data being stored. For example, to store a set of points in the plane, we could define
the primary tree T in terms of the x-coordinates of the points, and define the secondary
data structures in terms of their y-coordinate.
Maintaining these secondary structures complicates algorithms for keeping the top-level
search tree balanced. Specifically, performing a rotation at any node v in the primary tree
now requires O(size(v)) time, because we have to rebuild one of the secondary structures
(at the new child of v). When we insert a new item into T , we must also insert into one or
more secondary data structures.
(a) Overall, how much space does this data structure use in the worst case?
(b) How much space does this structure use if the primary search tree is perfectly
balanced?
(c) Suppose the primary tree is a splay tree. Prove that the amortized cost of a splay (and
therefore of a search, insertion, or deletion) is Ω(n). [Hint: This is easy!]
(d) Now suppose the primary tree T is a scapegoat tree. How long does it take to rebuild
the subtree of T rooted at some node v, as a function of size(v)?
(e) Suppose the primary tree and all secondary trees are scapegoat trees. What is the
amortized cost of a single insertion?
⋆(f) Finally, suppose the primary tree and every secondary tree is a treap. What is the
worst-case expected time for a single insertion?
8. Suppose we want to maintain a collection of strings (sequences of characters) under the
following operations:
• NewString(a) creates a new string of length 1 containing only the character a and
returns a pointer to that string.
• Concat(S, T ) removes the strings S and T (given by pointers) from the data structure,
adds the concatenated string S T to the data structure, and returns a pointer to the
new string.
• Split(S, k) removes the strings S (given by a pointer) from the data structure, adds
the first k characters of S and the rest of S as two new strings in the data structure,
and returns pointers to the two new strings.
• Reverse(S) removes the string S (given by a pointer) from the data structure, adds
the reversal of S to the data structure, and returns a pointer to the new string.
• Lookup(S, k) returns the kth character in string S (given by a pointer), or Null if
the length of the S is less than k.
Describe and analyze a simple data structure that supports NewString and Reverse
in O(1) worst-case time, supports every other operation in O(log n) time (either worst-case,
expected, or amortized), and uses O(n) space, where n is the sum of the current string
lengths. [Hint: Why is this problem here?]
9. After the Great Academic Meltdown of 2020, you get a job as a cook’s assistant at Jumpin’
Jack’s Flapjack Stack Shack, which sells arbitrarily-large stacks of pancakes for just four
bits (50 cents) each. Jumpin’ Jack insists that any stack of pancakes given to one of his
customers must be sorted, with smaller pancakes on top of larger pancakes. Also, whenever
a pancake goes to a customer, at least the top side must not be burned.
The cook provides you with an unsorted stack of n perfectly round pancakes, of n
different sizes, possibly burned on one or both sides. Your task is to throw out the pancakes
that are burned on both sides (and only those) and sort the remaining pancakes so that
their burned sides (if any) face down. Your only tool is a spatula. You can insert the spatula
under any pancake and then either flip or discard the stack of pancakes above the spatula.
More concretely, we can represent a stack of pancakes by a sequence of distinct integers
between 1 and n, representing the sizes of the pancakes, with each number marked to
Flipping the top four pancakes. Again.
indicate the burned side(s) of the corresponding pancake. For example, 1 4 3 2 represents
a stack of four pancakes: a one-inch pancake burned on the bottom; a four-inch pancake
burned on the top; an unburned three-inch pancake, and a two-inch pancake burned
on both sides. We store this sequence in a data structure that supports the following
operations:
• Position(x): Return the position of integer x in the current sequence, or 0 if x is
not in the sequence.
• Value(k): Return the kth integer in the current sequence, or 0 if the sequence has no
kth element. Value is essentially the inverse of Position.
• TopBurned(k): Return True if and only if the top side of the kth pancake in the
current sequence is burned.
• Flip(k): Reverse the order and the burn marks of the first k elements of the sequence.
• Discard(k): Discard the first k elements of the sequence.
(a) Describe an algorithm to filter and sort any stack of n burned pancakes using O(n) of
the operations listed above. Try to make the big-Oh constant small.
1432  -Flip(4)->  2341  -Discard(1)->  341  -Flip(2)->  431  -Flip(3)->  134
(b) Describe a data structure that supports each of the operations listed above in O(log n)
amortized time. Together with part (a), such a data structure gives us an algorithm
to filter and sort any stack of n burned pancakes in O(n log n) time.
10. Let X = 〈x_1, x_2, . . . , x_m〉 be a sequence of m integers, each from the set {1, 2, . . . , n}. We
can visualize this sequence as a set of integer points in the plane, by interpreting each
element x_i as the point (x_i, i). The resulting point set, which we can also call X, has exactly
one point on each row of the n × m integer grid.
(a) Let Y be an arbitrary set of integer points in the plane. Two points (x_1, y_1) and
(x_2, y_2) in Y are isolated if (1) x_1 ≠ x_2 and y_1 ≠ y_2, and (2) there is no other point
(x, y) ∈ Y with x_1 ≤ x ≤ x_2 and y_1 ≤ y ≤ y_2. If the set Y contains no isolated pairs
of points, we call Y a commune.⁷
Let X be an arbitrary set of points on the n × n integer grid with exactly one point
per row. Show that there is a commune Y that contains X and consists of O(n log n)
points.
⁷Demaine et al. [8] refer to communes as arborally satisfied sets.
(b) Consider the following model of self-adjusting binary search trees. We interpret X
as a sequence of accesses in a binary search tree. Let T0 denote the initial tree. In
the ith round, we traverse the path from the root to node x i , and then arbitrarily
reconfigure some subtree Si of the current search tree Ti−1 to obtain the next search
tree Ti . The only restriction is that the subtree Si must contain both x i and the root
of Ti−1 . (For example, in a splay tree, Si is the search path to x i .) The cost of the ith
access is the number of nodes in the subtree Si .
Prove that the minimum cost of executing an access sequence X in this model is
at least the size of the smallest commune containing the corresponding point set X .
[Hint: Lowest common ancestor.]
⋆(c) Suppose X is a random permutation of the integers 1, 2, . . . , n. Use the lower bound
in part (b) to prove that the expected minimum cost of executing X is Ω(n log n).
Æ(d) Describe a polynomial-time algorithm to compute (or even approximate up to constant
factors) the smallest commune containing a given set X of integer points, with at
most one point per row. Alternately, prove that the problem is NP-hard.
Algorithms
Lecture 17: Disjoint Sets [Fa’13]
E pluribus unum (Out of many, one)
— Official motto of the United States of America
John: Who’s your daddy? C’mon, you know who your daddy is!
Who’s your daddy? D’Argo, tell him who his daddy is!
D’Argo: I’m your daddy.
— Farscape, “Thanks for Sharing” (June 15, 2001)
What rolls down stairs, alone or in pairs, rolls over your neighbor’s dog?
What’s great for a snack, and fits on your back? It’s Log, Log, Log!
It’s Log! It’s Log! It’s big, it’s heavy, it’s wood!
It’s Log! It’s Log! It’s better than bad, it’s good!
— Ren & Stimpy, “Stimpy’s Big Day/The Big Shot" (August 11, 1991)
lyrics by John Kricfalusi
The thing’s hollow - it goes on forever - and - oh my God! - it’s full of stars!
— Capt. David Bowman’s last words(?)
2001: A Space Odyssey by Arthur C. Clarke (1968)
17  Data Structures for Disjoint Sets
In this lecture, we describe some methods for maintaining a collection of disjoint sets. Each set
is represented as a pointer-based data structure, with one node per element. We will refer to
the elements as either ‘objects’ or ‘nodes’, depending on whether we want to emphasize the set
abstraction or the actual data structure. Each set has a unique ‘leader’ element, which identifies
the set. (Since the sets are always disjoint, the same object cannot be the leader of more than
one set.) We want to support the following operations.
• MakeSet(x): Create a new set {x} containing the single element x. The object x must
not appear in any other set in our collection. The leader of the new set is obviously x.
• Find(x): Find (the leader of) the set containing x.
• Union(A, B): Replace two sets A and B in our collection with their union A ∪ B. For
example, Union(A, MakeSet(x)) adds a new element x to an existing set A. The sets A
and B are specified by arbitrary elements, so Union(x, y) has exactly the same behavior as
Union(Find(x), Find( y)).
Disjoint set data structures have lots of applications. For instance, Kruskal’s minimum
spanning tree algorithm relies on such a data structure to maintain the components of the
intermediate spanning forest. Another application is maintaining the connected components
of a graph as new vertices and edges are added. In both these applications, we can use a
disjoint-set data structure, where we maintain a set for each connected component, containing
that component’s vertices.
17.1  Reversed Trees
One of the easiest ways to store sets is using trees, in which each node represents a single element
of the set. Each node points to another node, called its parent, except for the leader of each
set, which points to itself and thus is the root of the tree. MakeSet is trivial. Find traverses
parent pointers up to the leader. Union just redirects the parent pointer of one leader to the
other. Unlike most tree data structures, nodes do not have pointers down to their children.
MakeSet(x):
  parent(x) ← x

Find(x):
  while x ≠ parent(x)
    x ← parent(x)
  return x

Union(x, y):
  x ← Find(x)
  y ← Find(y)
  parent(y) ← x
Merging two sets stored as trees. Arrows point to parents. The shaded node has a new parent.
MakeSet clearly takes Θ(1) time, and Union requires only O(1) time in addition to the two
Finds. The running time of Find(x) is proportional to the depth of x in the tree. It is not hard
to come up with a sequence of operations that results in a tree that is a long chain of nodes, so
that Find takes Θ(n) time in the worst case.
However, there is an easy change we can make to our Union algorithm, called union by depth,
so that the trees always have logarithmic depth. Whenever we need to merge two trees, we
always make the root of the shallower tree a child of the deeper one. This requires us to also
maintain the depth of each tree, but this is quite easy.
MakeSet(x):
  parent(x) ← x
  depth(x) ← 0

Find(x):
  while x ≠ parent(x)
    x ← parent(x)
  return x

Union(x, y):
  x ← Find(x)
  y ← Find(y)
  if depth(x) > depth(y)
    parent(y) ← x
  else
    parent(x) ← y
    if depth(x) = depth(y)
      depth(y) ← depth(y) + 1
With this new rule in place, it's not hard to prove by induction that for any set leader x, the
size of x's set is at least 2^depth(x), as follows. If depth(x) = 0, then x is the leader of a singleton
set. For any d > 0, when depth(x) becomes d for the first time, x is becoming the leader of the
union of two sets, both of whose leaders had depth d − 1. By the inductive hypothesis, both
component sets had at least 2d−1 elements, so the new set has at least 2d elements. Later Union
operations might add elements to x’s set without changing its depth, but that only helps us.
Since there are only n elements altogether, the maximum depth of any set is lg n. We conclude
that if we use union by depth, both Find and Union run in Θ(log n) time in the worst case.
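For concreteness, here is a minimal Python sketch of the reversed-tree representation with union by depth, mirroring the pseudocode above. The class name, the dictionaries, and the small usage example are mine, added only for illustration.

class ReversedTrees:
    """Disjoint sets stored as reversed trees, using union by depth (no path compression)."""

    def __init__(self):
        self.parent = {}
        self.depth = {}

    def make_set(self, x):
        self.parent[x] = x
        self.depth[x] = 0

    def find(self, x):
        # Walk parent pointers up to the leader (the root of x's tree).
        while x != self.parent[x]:
            x = self.parent[x]
        return x

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return                      # already in the same set
        # Make the root of the shallower tree a child of the deeper root.
        if self.depth[x] > self.depth[y]:
            self.parent[y] = x
        else:
            self.parent[x] = y
            if self.depth[x] == self.depth[y]:
                self.depth[y] += 1

# Example: maintaining connected components of a graph as edges are added.
uf = ReversedTrees()
for v in range(6):
    uf.make_set(v)
for u, v in [(0, 1), (2, 3), (1, 2)]:
    uf.union(u, v)
print(uf.find(0) == uf.find(3))   # True: vertices 0, 1, 2, 3 form one component
print(uf.find(0) == uf.find(4))   # False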
17.2  Shallow Threaded Trees
Alternately, we could just have every object keep a pointer to the leader of its set. Thus, each set
is represented by a shallow tree, where the leader is the root and all the other elements are its
children. With this representation, MakeSet and Find are completely trivial. Both operations
clearly run in constant time. Union is a little more difficult, but not much. Our algorithm sets
all the leader pointers in one set to point to the leader of the other set. To do this, we need a
method to visit every element in a set; we will ‘thread’ a linked list through each set, starting at
the set’s leader. The two threads are merged in the Union algorithm in constant time.
Merging two sets stored as threaded trees.
Bold arrows point to leaders; lighter arrows form the threads. Shaded nodes have a new leader.
MakeSet(x):
  leader(x) ← x
  next(x) ← x

Find(x):
  return leader(x)

Union(x, y):
  x ← Find(x)
  y ← Find(y)
  ȳ ← y
  leader(y) ← x
  while next(y) ≠ Null
    y ← next(y)
    leader(y) ← x
  next(y) ← next(x)
  next(x) ← ȳ
The worst-case running time of Union is a constant times the size of the larger set. Thus, if we
merge a one-element set with another n-element set, the running time can be Θ(n). Generalizing
this idea, it is quite easy to come up with a sequence of n MakeSet and n − 1 Union operations
that requires Θ(n²) time to create the set {1, 2, . . . , n} from scratch.
WorstCaseSequence(n):
MakeSet(1)
for i ← 2 to n
MakeSet(i)
Union(1, i)
We are being stupid in two different ways here. One is the order of operations in WorstCaseSequence. Obviously, it would be more efficient to merge the sets in the other order, or to use
some sort of divide and conquer approach. Unfortunately, we can’t fix this; we don’t get to decide
how our data structures are used! The other is that we always update the leader pointers in the
larger set. To fix this, we add a comparison inside the Union algorithm to determine which set is
smaller. This requires us to maintain the size of each set, but that’s easy.
MakeWeightedSet(x):
leader(x) ← x
next(x) ← x
size(x) ← 1
WeightedUnion(x, y)
x ← Find(x)
y ← Find( y)
if size(x) > size( y)
Union(x, y)
size(x) ← size(x) + size( y)
else
Union( y, x)
size( y) ← size(x) + size( y)
The new WeightedUnion algorithm still takes Θ(n) time to merge two n-element sets.
However, in an amortized sense, this algorithm is much more efficient. Intuitively, before we can
merge two large sets, we have to perform a large number of MakeWeightedSet operations.
Theorem 1. A sequence of m MakeWeightedSet operations and n WeightedUnion operations takes
O(m + n log n) time in the worst case.
Proof: Whenever the leader of an object x is changed by a WeightedUnion, the size of the set
containing x increases by at least a factor of two. By induction, if the leader of x has changed k
times, the set containing x has at least 2^k members. After the sequence ends, the largest set
contains at most n members. (Why?) Thus, the leader of any object x has changed at most ⌊lg n⌋
times.

Since each WeightedUnion reduces the number of sets by one, there are m − n sets at
the end of the sequence, and at most 2n objects are not in singleton sets. Since each of the
non-singleton objects had O(log n) leader changes, the total amount of work done in updating
the leader pointers is O(n log n).    □
The aggregate method now implies that each WeightedUnion has amortized cost O(log n).
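Here is one possible Python rendering of the threaded representation with weighted union. It is a sketch under my own conventions (in particular, each thread is terminated with None rather than pointing back into the set, and the leader is always the head of its thread); it is not a line-by-line transcription of the pseudocode above.

class ThreadedSets:
    """Each element stores its leader; a 'next' pointer threads the elements of each set, leader first."""

    def __init__(self):
        self.leader = {}
        self.next = {}
        self.size = {}

    def make_set(self, x):
        self.leader[x] = x
        self.next[x] = None          # a singleton's thread is just x itself
        self.size[x] = 1

    def find(self, x):
        return self.leader[x]

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.size[x] < self.size[y]:
            x, y = y, x              # weighted union: always relabel the smaller set
        z = y
        self.leader[z] = x
        while self.next[z] is not None:
            z = self.next[z]
            self.leader[z] = x       # every element of the smaller set gets the new leader
        self.next[z] = self.next[x]  # splice the smaller set's thread into the larger one's
        self.next[x] = y
        self.size[x] += self.size[y]

Relabeling costs time proportional to the smaller set, which is exactly the quantity the aggregate argument above charges against earlier MakeWeightedSet operations.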
17.3  Path Compression
Using unthreaded trees, Find takes logarithmic time and everything else is constant; using
threaded trees, Union takes logarithmic amortized time and everything else is constant. A third
method allows us to get both of these operations to have almost constant running time.
We start with the original unthreaded tree representation, where every object points to a
parent. The key observation is that in any Find operation, once we determine the leader of an
object x, we can speed up future Finds by redirecting x’s parent pointer directly to that leader.
In fact, we can change the parent pointers of all the ancestors of x all the way up to the root; this
is easiest if we use recursion for the initial traversal up the tree. This modification to Find is
called path compression.
Path compression during Find(c). Shaded nodes have a new parent.
Find(x):
  if x ≠ parent(x)
    parent(x) ← Find(parent(x))
  return parent(x)
If we use path compression, the ‘depth’ field we used earlier to keep the trees shallow is no
longer correct, and correcting it would take way too long. But this information still ensures
that Find runs in Θ(log n) time in the worst case, so we’ll just give it another name: rank. The
following algorithm is usually called union by rank:
MakeSet(x):
  parent(x) ← x
  rank(x) ← 0

Union(x, y):
  x ← Find(x)
  y ← Find(y)
  if rank(x) > rank(y)
    parent(y) ← x
  else
    parent(x) ← y
    if rank(x) = rank(y)
      rank(y) ← rank(y) + 1
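Combining the two ideas, a compact Python sketch of union by rank with the recursive, path-compressing Find might look like this (the class and method names are mine):

class UnionFind:
    """Union by rank plus path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # Path compression: repoint every node on the search path directly at the leader.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        # Union by rank: the root of smaller rank becomes a child of the other root.
        if self.rank[x] > self.rank[y]:
            self.parent[y] = x
        else:
            self.parent[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1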
Find still runs in O(log n) time in the worst case; path compression increases the cost by at
most a constant factor. But we have good reason to suspect that this upper bound is no longer
tight. Our new algorithm memoizes the results of each Find, so if we are asked to Find the
same item twice in a row, the second call returns in constant time. Splay trees used a similar
strategy to achieve their optimal amortized cost, but our up-trees have fewer constraints on their
structure than binary search trees, so we should get even better performance.
This intuition is exactly correct, but it takes a bit of work to define precisely how much better
the performance is. As a first approximation, we will prove below that the amortized cost of a
Find operation is bounded by the iterated logarithm of n, denoted log∗ n, which is the number of
times one must take the logarithm of n before the value is less than 1:
lg∗ n = 1 if n ≤ 2, and lg∗ n = 1 + lg∗(lg n) otherwise.
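The iterated logarithm is easy to compute; the following small Python function is a sketch that uses ⌊lg n⌋ (via bit_length) in place of lg n, which does not change the answers for the powers of two shown below.

def lg_star(n):
    """lg* n = 1 if n <= 2, and 1 + lg*(lg n) otherwise (using floor(lg n))."""
    if n <= 2:
        return 1
    return 1 + lg_star(n.bit_length() - 1)   # n.bit_length() - 1 == floor(lg n)

print(lg_star(2), lg_star(16), lg_star(65536), lg_star(2**65536))   # 1 3 4 5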
Our proof relies on several useful properties of ranks, which follow directly from the Union and
Find algorithms.
• If a node x is not a set leader, then the rank of x is smaller than the rank of its parent.
• Whenever parent(x) changes, the new parent has larger rank than the old parent.
• Whenever the leader of x’s set changes, the new leader has larger rank than the old leader.
• The size of any set is exponential in the rank of its leader: size(x) ≥ 2^rank(x). (This is easy
to prove by induction, hint, hint.)
• In particular, since there are only n objects, the highest possible rank is ⌊lg n⌋.
• For any integer r, there are at most n/2^r objects of rank r.
Only the last property requires a clever argument to prove. Fix your favorite integer r.
Observe that only set leaders can change their rank. Whenever the rank of any set leader x
changes from r − 1 to r, mark all the objects in x’s set. Since leader ranks can only increase over
time, each object is marked at most once. There are n objects altogether, and any object with
rank r marks at least 2^r objects. It follows that there are at most n/2^r objects with rank r, as
claimed.
⋆17.4  O(log∗ n) Amortized Time
The following analysis of path compression was discovered just a few years ago by Raimund Seidel
and Micha Sharir.¹ Previous proofs² relied on complicated charging schemes or potential-function
¹Raimund Seidel and Micha Sharir. Top-down analysis of path compression. SIAM J. Computing 34(3):515–525,
2005.
²Robert E. Tarjan. Efficiency of a good but not linear set union algorithm. J. Assoc. Comput. Mach. 22:215–225, 1975.
arguments; Seidel and Sharir’s analysis relies on a comparatively simple recursive decomposition.
(Of course, simple is in the eye of the beholder.)
Seidel and Sharir phrase their analysis in terms of two more general operations on set forests.
Their more general Compress operation compresses any directed path, not just paths that lead
to the root. The new Shatter operation makes every node on a root-to-leaf path into its own
parent.
Compress(x, y):
  〈〈y must be an ancestor of x〉〉
  if x ≠ y
    Compress(parent(x), y)
    parent(x) ← parent(y)

Shatter(x):
  if parent(x) ≠ x
    Shatter(parent(x))
  parent(x) ← x
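To make the two operations concrete, here is a tiny Python sketch that applies them to a parent-pointer forest stored as a dictionary; it is only an illustration of the definitions above.

def compress(parent, x, y):
    # Redirect every node on the path from x up to y so that it points at parent[y].
    # (y must be an ancestor of x; if y is a root, the whole path ends up pointing at y.)
    if x != y:
        compress(parent, parent[x], y)
        parent[x] = parent[y]

def shatter(parent, x):
    # Make every node on the path from x up to its root into its own parent.
    if parent[x] != x:
        shatter(parent, parent[x])
    parent[x] = x

# A chain d -> c -> b -> a, where a is the root:
parent = {'a': 'a', 'b': 'a', 'c': 'b', 'd': 'c'}
compress(parent, 'd', 'a')
print(parent)   # {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'a'}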
Clearly, the running time of a Find(x) operation is dominated by the running time of Compress(x, y),
where y is the leader of the set containing x. Thus, we can prove the upper bound by analyzing
an arbitrary sequence of Union and Compress operations. Moreover, we can assume that the
arguments of every Union operation are set leaders, so that each Union takes only constant
worst-case time.
Finally, since each call to Compress specifies the top node in the path to be compressed,
we can reorder the sequence of operations, so that every Union occurs before any Compress,
without changing the number of pointer assignments.
Top row: A Compress followed by a Union. Bottom row: The same operations in the opposite order.
Each Union requires only constant time, so we only need to analyze the amortized cost of
Compress. The running time of Compress is proportional to the number of parent pointer
assignments, plus O(1) overhead, so we will phrase our analysis in terms of pointer assignments.
Let T (m, n, r ) denote the worst case number of pointer assignments in any sequence of at most m
Compress operations, executed on a forest of at most n nodes, in which each node has rank at
most r.
The following trivial upper bound will be the base case for our recursive argument.
Theorem 2. T (m, n, r) ≤ nr
Proof: Each node can change parents at most r times, because each new parent has higher rank
than the previous parent.
□
Fix a forest F of n nodes with maximum rank r, and a sequence C of m Compress operations
on F , and let T (F, C ) denote the total number of pointer assignments executed by this sequence.
Let s be an arbitrary positive rank. Partition F into two sub-forests: a ‘low’ forest F− containing
all nodes with rank at most s, and a ‘high’ forest F+ containing all nodes with rank greater than s.
Since ranks increase as we follow parent pointers, every ancestor of a high node is another high
node. Let n− and n+ denote the number of nodes in F− and F+ , respectively. Finally, let m+
denote the number of Compress operations that involve any node in F+ , and let m− = m − m+ .
Splitting the forest F (in this case, a single tree) into sub-forests F+ and F− at rank s.
Any sequence of Compress operations on F can be decomposed into a sequence of Compress
operations on F+ , plus a sequence of Compress and Shatter operations on F− , with the same
total cost. This requires only one small modification to the code: We forbid any low node from
having a high parent. Specifically, if x is a low node and y is a high node, we replace any
assignment parent(x) ← y with parent(x) ← x.
A Compress operation in F splits into a Compress operation in F+ and a Shatter operation in F−
This modification is equivalent to the following reduction:
Compress(x, y, F):
  〈〈y is an ancestor of x〉〉
  if rank(x) > s
    Compress(x, y, F+)                    〈〈in C+〉〉
  else if rank(y) ≤ s
    Compress(x, y, F−)                    〈〈in C−〉〉
  else
    z ← x
    while rank(parent_F(z)) ≤ s
      z ← parent_F(z)
    Compress(parent_F(z), y, F+)          〈〈in C+〉〉
    Shatter(x, z, F−)
    parent(z) ← z                         (!?)
The pointer assignment in the last line (!?) looks redundant, but it is actually necessary for the
analysis. Each execution of that line mirrors an assignment of the form parent(z) ← w, where z
is a low node, w is a high node, and the previous parent of z was also a high node. Each of these
‘redundant’ assignments happens immediately after a Compress in the top forest, so we perform
at most m+ redundant assignments.
Each node x is touched by at most one Shatter operation, so the total number of pointer
reassignments in all the Shatter operations is at most n.
Thus, by partitioning the forest F into F+ and F− , we have also partitioned the sequence C of
Compress operations into subsequences C+ and C− , with respective lengths m+ and m− , such
that the following inequality holds:
T (F, C) ≤ T (F+ , C+ ) + T (F− , C− ) + m+ + n
Since there are only n/2^i nodes of any rank i, we have n+ ≤ Σ_{i>s} n/2^i = n/2^s. The number
of different ranks in F+ is r − s < r. Thus, Theorem 2 implies the upper bound

T(F+, C+) < r n/2^s.
Let us fix s = lg r , so that T (F+ , C+ ) ≤ n. We can now simplify our earlier recurrence to
T (F, C) ≤ T (F− , C− ) + m+ + 2n,
or equivalently,
T (F, C) − m ≤ T (F− , C− ) − m− + 2n.
Since this argument applies to any forest F and any sequence C, we have just proved that
T′(m, n, r) ≤ T′(m, n, ⌊lg r⌋) + 2n,

where T′(m, n, r) = T(m, n, r) − m. The solution to this recurrence is T′(m, n, r) ≤ 2n lg∗ r.
Voilà!
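Unrolling that recurrence makes the bound transparent: each application replaces the rank bound r by ⌊lg r⌋ at a cost of 2n, and after lg∗ r − 1 applications the rank bound is at most 2, where Theorem 2 already gives T′(m, n, 2) ≤ 2n − m ≤ 2n. Hence

T′(m, n, r) ≤ 2n + T′(m, n, ⌊lg r⌋) ≤ 2n + 2n + T′(m, n, ⌊lg lg r⌋) ≤ · · · ≤ 2n(lg∗ r − 1) + 2n = 2n lg∗ r,

which is Theorem 3 below after adding the m back in.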
Theorem 3. T (m, n, r) ≤ m + 2n lg∗ r
⋆17.5  Turning the Crank
There is one place in the preceding analysis where we have significant room for improvement.
Recall that we bounded the total cost of the operations on F+ using the trivial upper bound from
Theorem 2. But we just proved a better upper bound in Theorem 3! We can apply precisely the
same strategy, using Theorem 3 recursively instead of Theorem 2, to improve the bound even
more.
Suppose we fix s = lg∗ r, so that n+ = n/2^(lg∗ r). Theorem 3 implies that

T(F+, C+) ≤ m+ + 2n (lg∗ r)/2^(lg∗ r) ≤ m+ + 2n.

This implies the recurrence

T(F, C) ≤ T(F−, C−) + 2m+ + 3n,

which in turn implies that

T″(m, n, r) ≤ T″(m, n, lg∗ r) + 3n,
where T″(m, n, r) = T(m, n, r) − 2m. The solution to this recurrence is T(m, n, r) ≤ 2m + 3n lg∗∗ r,
where lg∗∗ r is the iterated iterated logarithm of r:

lg∗∗ r = 1 if r ≤ 2, and lg∗∗ r = 1 + lg∗∗(lg∗ r) otherwise.
Naturally we can apply the same improvement strategy again, and again, as many times as
we like, each time producing a tighter upper bound. Applying the reduction c times, for any
positive integer c, gives us T(m, n, r) ≤ cm + (c + 1)n lg∗^c r, where

lg∗^c r = lg r if c = 0;  lg∗^c r = 1 if r ≤ 2;  and lg∗^c r = 1 + lg∗^c(lg∗^(c−1) r) otherwise.
Each time we ‘turn the crank’, the dependence on m increases, while the dependence on n
and r decreases. For sufficiently large values of c, the cm term dominates the time bound, and
further iterations only make things worse. The point of diminishing returns can be estimated by
the minimum number of stars such that lg∗∗···∗ r is smaller than a constant:
α(r) = min { c ≥ 1 | lg∗^c r ≤ 3 }.

(The threshold value 3 is used here because lg∗^c 5 ≥ 2 for all c.) By setting c = α(r), we obtain
our final upper bound.
Theorem 4. T (m, n, r) ≤ mα(r) + 3n(α(r) + 1)
We can assume without loss of generality that m ≥ n by ignoring any singleton sets, so this
upper bound can be further simplified to T (m, n, r) = O(mα(r)) = O(mα(n)). It follows that if
we use union by rank, Find with path compression runs in O(α(n)) amortized time.
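To get a feel for how slowly α grows, here is a small integer-only Python sketch of lg∗^c and of α as defined above, again substituting ⌊lg r⌋ for lg r (which can only make the computed values smaller or equal, so the point stands).

def lg_star_c(c, r):
    """lg*^0 r = floor(lg r); for c >= 1, lg*^c r = 1 if r <= 2, else 1 + lg*^c(lg*^(c-1) r)."""
    if c == 0:
        return r.bit_length() - 1
    if r <= 2:
        return 1
    return 1 + lg_star_c(c, lg_star_c(c - 1, r))

def alpha(r):
    """alpha(r) = min { c >= 1 : lg*^c r <= 3 }."""
    c = 1
    while lg_star_c(c, r) > 3:
        c += 1
    return c

print(alpha(16), alpha(65536))   # 1 2 -- and alpha(n) stays at most 3 for any imaginable n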
Even this upper bound is somewhat conservative if m is larger than n. A closer estimate is
given by the function

α(m, n) = min { c ≥ 1 | lg∗^c(lg n) ≤ m/n }.
It’s not hard to prove that if m = Θ(n), then α(m, n) = Θ(α(n)). On the other hand, if
m ≥ n lg∗∗∗∗∗ n, for any constant number of stars, then α(m, n) = O(1). So even if the number of
Find operations is only slightly larger than the number of nodes, the amortized cost of each Find
is constant.
O(α(m, n)) is actually a tight upper bound for the amortized cost of path compression; there
are no more tricks that will improve the analysis further. More surprisingly, this is the best
amortized bound achievable by any pointer-based data structure for maintaining disjoint sets;
the amortized cost of every Find algorithm is at least Ω(α(m, n)). The proof of the matching
lower bound is, unfortunately, far beyond the scope of this class.³
³Robert E. Tarjan. A class of algorithms which require non-linear time to maintain disjoint sets. J. Comput. Syst.
Sci. 19:110–127, 1979.
17.6  The Ackermann Function and its Inverse
The iterated logarithms that fell out of our analysis of path compression are the inverses of a
hierarchy of recursive functions defined by Wilhelm Ackermann in 1928.⁴

2↑^c n := 2 if n = 1;  2↑^c n := 2n if c = 0;  and 2↑^c n := 2↑^(c−1) (2↑^c (n − 1)) otherwise.
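This hierarchy is easy to transcribe into code; the following Python sketch follows the definition above literally, and is only safe to run for tiny arguments, since the values explode almost immediately.

def up(c, n):
    """2 'up-arrow c times' n, following the recursive definition above."""
    if n == 1:
        return 2
    if c == 0:
        return 2 * n
    return up(c - 1, up(c, n - 1))

print(up(1, 5), up(2, 4), up(3, 3))   # 32 65536 65536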
For each fixed integer c, the function 2↑^c n is monotonically increasing in n, and these functions
grow incredibly faster as the index c increases. 2↑n is the familiar power function 2^n. 2↑↑n is
the tower function

2↑↑n = 2↑2↑···↑2 (n twos) = 2^2^···^2 (a tower of n twos).

John Conway named 2↑↑↑n the wower function:

2↑↑↑n = 2↑↑2↑↑···↑↑2 (n twos).

And so on, et cetera, ad infinitum.
For any fixed c, the function log∗^c n is the inverse of the function 2↑^(c+1) n, the (c + 1)th row
in the Ackermann hierarchy. Thus, for any remotely reasonable values of n, say n ≤ 2^256, we have
log∗ n ≤ 5, log∗∗ n ≤ 4, and log∗^c n ≤ 3 for any c ≥ 3.
The function α(n) is usually called the inverse Ackermann function.⁵ Our earlier definition is
equivalent to α(n) = min{c ≥ 1 | 2↑^(c+2) 3 ≥ n}; in other words, α(n) + 2 is the inverse of the third
column in the Ackermann hierarchy. The function α(n) grows much more slowly than log∗^c n
for any fixed c; we have α(n) ≤ 3 for all even remotely imaginable values of n. Nevertheless, the
function α(n) is eventually larger than any constant, so it is not O(1).
2↑^c n      n = 1   n = 2   n = 3       n = 4            n = 5
2n           2       4       6           8                10
2↑n          2       4       8           16               32
2↑↑n         2       4       16          65536            2^65536
2↑↑↑n        2       4       65536       2↑↑65536         2↑↑(2↑↑65536)
2↑↑↑↑n       2       4       2↑↑65536    〈〈Yeah, right.〉〉  〈〈Very funny.〉〉
2↑↑↑↑↑n      2       4       〈〈Argh! My eyes!〉〉

Small (!!) values of Ackermann’s functions.
⁴Ackermann didn’t define his functions this way—I’m actually describing a slightly cleaner hierarchy defined 35
years later by R. Creighton Buck—but the exact details of the definition are surprisingly irrelevant! The mnemonic
up-arrow notation for these functions was introduced by Don Knuth in the 1970s.
⁵Strictly speaking, the name ‘inverse Ackermann function’ is inaccurate. One good formal definition of the true
inverse Ackermann function is α̃(n) = min{c ≥ 1 | lg∗^c n ≤ c} = min{c ≥ 1 | 2↑^(c+2) c ≥ n}. However, it’s not hard to
prove that α̃(n) ≤ α(n) ≤ α̃(n) + 1 for all sufficiently large n, so the inaccuracy is completely forgivable. As I said in
the previous footnote, the exact details of the definition are surprisingly irrelevant!
17.7  To infinity. . . and beyond!
Of course, one can generalize the inverse Ackermann function to functions that grow arbitrarily
more slowly, starting with the iterated inverse Ackermann function
α∗(n) = 1 if n ≤ 4, and α∗(n) = 1 + α∗(α(n)) otherwise,

then the iterated iterated iterated inverse Ackermann function

α∗^c(n) = α(n) if c = 0;  α∗^c(n) = 1 if n ≤ 4;  and α∗^c(n) = 1 + α∗^c(α∗^(c−1)(n)) otherwise,

and then the diagonalized inverse Ackermann function

Head-asplode(n) = min{c ≥ 1 | α∗^c(n) ≤ 4},
and so on ad nauseam. Fortunately(?), such functions appear extremely rarely in algorithm analysis. Perhaps the only naturally-occurring example of a super-constant sub-inverse-Ackermann
function is a recent result of Seth Pettie⁶, who proved that if a splay tree is used as a double-ended
queue — insertions and deletions of only smallest or largest elements — then the amortized cost
of any operation is O(α∗ (n))!
Exercises
1. Consider the following solution for the union-find problem, called union-by-weight. Each
set leader x stores the number of elements of its set in the field weight(x). Whenever we
Union two sets, the leader of the smaller set becomes a new child of the leader of the
larger set (breaking ties arbitrarily).
MakeSet(x):
parent(x) ← x
weight(x) ← 1
Find(x):
while x ≠ parent(x)
x ← parent(x)
return x
Union(x, y)
x ← Find(x)
y ← Find( y)
if weight(x) > weight( y)
parent( y) ← x
weight(x) ← weight(x) + weight( y)
else
parent(x) ← y
weight(x) ← weight(x) + weight( y)
Prove that if we use union-by-weight, the worst-case running time of Find(x) is O(log n),
where n is the cardinality of the set containing x.
2. Consider a union-find data structure that uses union by depth (or equivalently union by
rank) without path compression. For all integers m and n such that m ≥ 2n, prove that
there is a sequence of n MakeSet operations, followed by m Union and Find operations,
that require Ω(m log n) time to execute.
⁶Splay trees, Davenport-Schinzel sequences, and the deque conjecture. Proceedings of the 19th Annual ACM-SIAM
Symposium on Discrete Algorithms, 1115–1124, 2008.
3. Suppose you are given a collection of up-trees representing a partition of the set {1, 2, . . . , n}
into disjoint subsets. You have no idea how these trees were constructed. You are
also given an array node[1 .. n], where node[i] is a pointer to the up-tree node containing
element i. Your task is to create a new array label[1 .. n] using the following algorithm:
LabelEverything:
for i ← 1 to n
label[i] ← Find(node[i])
(a) What is the worst-case running time of LabelEverything if we implement Find
without path compression?
(b) Prove that if we implement Find using path compression, LabelEverything runs in
O(n) time in the worst case.
4. Consider an arbitrary sequence of m MakeSet operations, followed by u Union operations,
followed by f Find operations, and let n = m + u + f . Prove that if we use union by rank
and Find with path compression, all n operations are executed in O(n) time.
5. Suppose we want to maintain an array X [1 .. n] of bits, which are all initially zero, subject
to the following operations.
• Lookup(i): Given an index i, return X [i].
• Blacken(i): Given an index i < n, set X [i] ← 1.
• NextWhite(i): Given an index i, return the smallest index j ≥ i such that X [ j] = 0.
(Because we never change X [n], such an index always exists.)
If we use the array X [1 .. n] itself as the only data structure, it is trivial to implement
Lookup and Blacken in O(1) time and NextWhite in O(n) time. But you can do better!
Describe data structures that support Lookup in O(1) worst-case time and the other two
operations in the following time bounds. (We want a different data structure for each set
of time bounds, not one data structure that satisfies all bounds simultaneously!)
(a) The worst-case time for both Blacken and NextWhite is O(log n).
(b) The amortized time for both Blacken and NextWhite is O(log n). In addition, the
worst-case time for Blacken is O(1).
(c) The amortized time for Blacken is O(log n), and the worst-case time for NextWhite
is O(1).
(d) The worst-case time for Blacken is O(1), and the amortized time for NextWhite is
O(α(n)). [Hint: There is no Whiten.]
6. Suppose we want to maintain a collection of strings (sequences of characters) under the
following operations:
• NewString(a) creates a new string of length 1 containing only the character a and
returns a pointer to that string.
• Concat(S, T ) removes the strings S and T (given by pointers) from the data structure,
adds the concatenated string S T to the data structure, and returns a pointer to the
new string.
• Reverse(S) removes the string S (given by a pointer) from the data structure, adds
the reversal of S to the data structure, and returns a pointer to the new string.
• Lookup(S, k) returns the kth character in string S (given by a pointer), or Null if
the length of the S is less than k.
Describe and analyze a simple data structure that supports Concat in O(log n) amortized
time, supports every other operation in O(1) worst-case time, and uses O(n) space, where n
is the sum of the current string lengths. Unlike the similar problem in the previous lecture
note, there is no Split operation. [Hint: Why is this problem here?]
7. (a) Describe and analyze an algorithm to compute the size of the largest connected
component of black pixels in an n × n bitmap B[1 .. n, 1 .. n].
For example, given the bitmap below as input, your algorithm should return the
number 9, because the largest connected black component (marked with white dots
on the right) contains nine pixels.
(b) Design and analyze an algorithm Blacken(i, j) that colors the pixel B[i, j] black and
returns the size of the largest black component in the bitmap. For full credit, the
amortized running time of your algorithm (starting with an all-white bitmap) must
be as small as possible.
For example, at each step in the sequence below, we blacken the pixel marked
with an X. The largest black component is marked with white dots; the number
underneath shows the correct output of the Blacken algorithm.
9    14    14    16    17
(c) What is the worst-case running time of your Blacken algorithm?
⋆8. Consider the following game. I choose a positive integer n and keep it secret; your goal is
to discover this integer. We play the game in rounds. In each round, you write a list of at
most n integers on the blackboard. If you write more than n numbers in a single round,
you lose. (Thus, in the first round, you must write only the number 1; do you see why?)
If n is one of the numbers you wrote, you win the game; otherwise, I announce which of
the numbers you wrote is smaller or larger than n, and we proceed to the next round. For
example:
You: 1                           Me: It's bigger than 1.
You: 4, 42                       Me: It's between 4 and 42.
You: 8, 15, 16, 23, 30           Me: It's between 8 and 15.
You: 9, 10, 11, 12, 13, 14       Me: It's 11; you win!
Describe a strategy that allows you to win in O(α(n)) rounds!