Chapter 4
Algorithms and Data Structures
In this chapter, we will discuss certain algorithms and data structures which are needed to implement some of the algorithms we will discuss in Chapter 5, and which allow us to accelerate these algorithms in certain cases.
The first aspect we want to look at is searching for data. Then, we will consider binary trees. These have many applications in computer science. Many powerful data structures, such as dictionaries (see Section 2.3.5), are usually implemented using (balanced) binary trees.¹
Also, binary trees can be used to sort data efficiently, or to speed up mathematical operations such as computing f mod g_i for polynomials f, g_1, . . . , g_n ∈ R[X] efficiently, when the degrees of the g_i's are not close to each other. We will use this in Chapter 5.
Finally, we will consider sorting data sets. We will show how to sort using Python, how to implement a simple sorting algorithm (Section 4.3.1), and how to use binary heaps – a special kind of binary tree – to create an asymptotically optimal sorting algorithm (Section 4.3.2).
Note that essentially all algorithms in this chapter will have running times O(log n) or O(n log n) (except the one in Section 4.3.1).
4.1 Searching
If we are given a list L = (a_0, . . . , a_{n−1}) with unsorted data, and we want to find a specific element x, we need in the worst case n comparisons to find out whether or not a_i = x for some i, and if so, to determine such an i.
In case the list is sorted, we can dramatically speed this up using binary search. This is another classical divide and conquer algorithm, where the input problem is split into two halves. For this, one compares the sought element x with the element in the middle, a_{⌊n/2⌋}. If x is less than a_{⌊n/2⌋}, then x must be in the first half of the list – if it exists in the list. And if x > a_{⌊n/2⌋}, then x must be in the second half of the list.
¹ The dictionaries in Python, though, are implemented using hash tables. But for example, in C++, dictionaries (there called maps) are implemented using trees.
Input: A sorted list L = (a_0, . . . , a_{n−1}) in ascending order (i.e. a_i ≤ a_j for i < j), and an element x
Output: An index i ∈ {0, . . . , n − 1} with a_i = x in case it exists, or ∅ otherwise.
1. If n = 0, return ∅;
2. Let k = ⌊n/2⌋;
3. Compare a_k to x:
   • If a_k = x, return k;
   • If a_k > x, apply Algorithm 4.1 recursively to (a_0, . . . , a_{k−1}); if the result is i, return i;
   • If a_k < x, apply Algorithm 4.1 recursively to (a_{k+1}, . . . , a_{n−1}); if the result is i, return ∅ in case i = ∅ and i + k + 1 otherwise.
Algorithm 4.1: Binary search
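As an illustration, the following Python function is one possible iterative rendering of Algorithm 4.1; the name binary_search and the use of None in place of ∅ are our own choices, not part of the algorithm statement:

def binary_search(L, x):
    # Search the sorted list L for x; return an index i with L[i] == x,
    # or None (our stand-in for the empty set) if x does not occur.
    lo, hi = 0, len(L)           # invariant: x can only occur in L[lo:hi]
    while lo < hi:
        k = (lo + hi) // 2       # the middle element
        if L[k] == x:
            return k
        elif L[k] > x:
            hi = k               # continue in (a_lo, ..., a_{k-1})
        else:
            lo = k + 1           # continue in (a_{k+1}, ..., a_{hi-1})
    return None

For example, binary_search([1, 5, 12, 13, 23, 42], 13) returns 3.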
Proposition 4.1.1. Algorithm 4.1 is correct and needs at most 1 + ⌊log₂ n⌋ ∈ O(log n) comparisons for n ≥ 1.
Proof. The correctness is clear in case L is sorted in ascending order. For the number of comparisons, denote by T(n) the maximal number of comparisons needed by Algorithm 4.1 if it is given a list of length n. We will show T(n) ≤ 1 + ⌊log₂ n⌋, which is monotonic in n. Clearly, T(1) = 1, whence for n = 1 the inequality holds. Let n ≥ 2.
Each recursive call receives a list of length at most k = ⌊n/2⌋: the first half has length k, and the second half has length (n − 1) − (k + 1) + 1 = n − k − 1 ≤ k, since n ≤ 2k + 1. Therefore,

  T(n) ≤ 1 + (1 + ⌊log₂ k⌋) = 1 + ⌊1 + log₂⌊n/2⌋⌋ = 1 + ⌊log₂(2⌊n/2⌋)⌋ ≤ 1 + ⌊log₂ n⌋.
In case nothing is known about the order of the elements, it can be shown that searching on a classical computer is in Θ(n). Interestingly, on a quantum computer, such searches can be done in O(√n) operations (Grover search).
4.2 Binary Trees
Trees are one way to store data in a non-linear fashion. If the trees are search trees,
one can efficiently search for elements in them; for balanced trees, the complexity for
searching an element, inserting an element, etc. is O(log n), where n is the number
of elements stored in that tree.
In this section we want to introduce trees and forests, present balanced trees, discuss heaps and priority queues, and finally discuss Huffman trees and their relation to entropy. All trees in this section will be binary, which means that every node has at most two children. One can also consider more general trees, but we do not need them in this course.
In graph theory, a binary tree can be described as a connected undirected graph without circles, loops and double edges, where the degree of every node is ≤ 3 and the degree of at least one node is ≤ 2. If we drop the requirement of being connected, the result is a forest; a forest is the disjoint union of trees. The root of a tree is a node with degree ≤ 2. If a root is fixed, and a, b are two nodes connected by an
edge, we say that a is the parent of b and b is a child of a if a is closer to the root.
The leaves of a tree are precisely the nodes of degree one, or the root in case it has
degree zero.
From the computer science perspective, a binary tree looks like this:
[Figure: a rooted binary tree of height 5, drawn with the root at the top; the inner vertices are labeled "node" and the leaves are labeled "leaf".]
while a forest of five trees with roots r_1, . . . , r_5 looks like this:

[Figure: a forest of five rooted trees with roots r_1, . . . , r_5.]
We call the distance between a node and the root of its tree the height of that node. The height of the tree is the maximal height of all its nodes. For example, the first tree has height 5, while the five trees in the forest have heights 3, 4, 3, 4 and 2, respectively.
Note that if we fix a node of a tree, we can look at its subtree: these are the nodes which are children of this node, their own children, etc.
4.2.1 A Formal Approach
A more formal approach to binary trees and forests requires some background from
Graph Theory. In this section, we will provide the necessary definitions and some
results.
We restrict ourselves to undirected graphs without loops.
Definition 4.2.1.
(i) A graph G is a pair G = (V, E), where E ⊆ {e ⊆ V | |e| = 2}. The set V is called the vertex set of G and denoted by V(G), and the set E is called the edge set of G and denoted by E(G).
An element e = {x, y} ∈ E is called an edge of G, and often written as xy. An element x ∈ V is called a vertex of G.
(ii) If x ∈ V(G) and e ∈ E(G), then we say that e and x are incident if and only if x ∈ e.
We denote by E(x) or E_G(x) the set of edges which are incident to x, and by δ_G(x) := |E_G(x)| the degree of x.
(iii) Two vertices x, y ∈ V(G) are said to be adjacent if {x, y} ∈ E(G), and two edges e_1, e_2 ∈ E(G) are said to be adjacent if |e_1 ∩ e_2| = 1.
(iv) A subgraph of a graph G = (V, E) is a graph G′ = (V′, E′) with V′ ⊆ V, E′ ⊆ E.
(v) Given a subset V′ ⊆ V, where G = (V, E) is a graph, the subgraph induced by V′ is the graph G[V′] := (V′, E′), where E′ = {e ∈ E | e ⊆ V′}.
Remark 4.2.2. For every subset V′ ⊆ V(G), G[V′] is a subgraph of G.
Definition 4.2.3.
(i) A way of length n in a graph G is a sequence of vertices (x_0, . . . , x_n) such that for all i ∈ {1, . . . , n}, {x_{i−1}, x_i} ∈ E(G). We denote the way by x_0 x_1 · · · x_n. (Note that n = 0 is possible.)
We say that x_0 · · · x_n connects x_0 with x_n.
(ii) A graph G is connected if for every two vertices x, y ∈ V, there exists a way connecting x with y.
(iii) A connected component of a graph G is a subgraph G[V′], where V′ ⊆ V(G) is maximal with respect to inclusion such that G[V′] is connected.
(iv) A way x_0 · · · x_n is called a path if x_i ≠ x_j for i ≠ j.
(v) A way x_0 · · · x_n with n ≥ 2 is a circle if x_n = x_0 and x_0 · · · x_{n−1} is a path.
Proposition 4.2.4. If G is a graph and {G_i | i ∈ I} the set of connected components of G, then V(G) = ⋃_{i∈I} V(G_i) and E(G) = ⋃_{i∈I} E(G_i), and both unions are disjoint.
In particular, for every x ∈ V(G), there exists a unique connected component G_i with x ∈ V(G_i). We write G(x) := G_i.
Definition 4.2.5.
(i) A forest is a graph which has no circle. A connected forest is called a tree. A forest (respectively tree) is called binary if all vertex degrees are ≤ 3.
(ii) A rooted (binary) tree is a pair (G, r) such that G is a (binary) tree, and in case it is binary, δ(r) ≤ 2. We call r the root of (G, r).
(iii) A rooted (binary) forest is a pair (G, (r_i)_{i∈I}) such that
• r_i and r_j are not connected for i ≠ j;
• (G(r_i), r_i) is a rooted (binary) tree;
• {G(r_i) | i ∈ I} is the set of connected components of G.
The vertices r_i are called the roots of the trees in (G, (r_i)_i).
(iv) If G is a (rooted) forest or (rooted) tree, we call a vertex x ∈ V(G) a leaf if and only if
(1) x is a root (in case G is rooted) and δ(x) = 0; or
(2) x is not a root (in case G is rooted) and δ(x) = 1.
Remark 4.2.6. A rooted (binary) tree (G, r) is also a rooted (binary) forest (G, (r_i)_{i∈{1}}) with r_1 := r.
Proposition 4.2.7. Let G be a forest.
(a) If x, y ∈ V(G), there exists at most one path connecting x to y. Such a path exists if and only if y ∈ V(G(x)). The path equals the shortest way connecting x to y.
(b) If (G, (r_i)_{i∈I}) is a rooted forest and x ∈ V(G(r_i)), then the height h(x) of x is the length of the unique path connecting r_i to x.
Definition 4.2.8. Let (G, (r_i)_{i∈I}) be a rooted forest. Let x, y ∈ V(G).
(i) We say that x is a child of y and y a parent of x if and only if x and y are connected by a path of length 1 and h(x) = h(y) + 1.
(ii) The height of G(r_i) is defined as max{h(x) | x ∈ V(G(r_i))}.
(iii) The vertices of G are also called nodes.
(iv) The set S_x of grandchildren of x ∈ V(G) is the set of nodes y ∈ V(G) such that there exists a path x_0 · · · x_n with x_0 = x and x_n = y such that x_i is a child of x_{i−1}, 1 ≤ i ≤ n. We denote G[S_x] by G|_x, which we will call the subtree starting at x.
Remarks 4.2.9. Let (G, (r_i)_{i∈I}) be a (binary) forest and x ∈ V(G).
(a) Then x is a leaf if and only if x has no children.
(b) If (G, (r_i)_{i∈I}) is binary, every node has at most two children.
(c) The roots r_i are the only nodes of height 0, and the only nodes which have no parent. All other nodes have precisely one parent (which we from now on call the parent).
(d) (G|_x, x) is a rooted (binary) tree. Moreover, G|_{r_i} = G(r_i).
(e) If G(x) is a tree of height n, then G|_x is a tree of height ≤ n − h(x).
Lemma 4.2.10. A rooted binary tree has at most 2^n nodes of height n. In particular, a binary tree of height n has at most 2^{n+1} − 1 nodes and at most 2^n leaves.
Proof. Let (G, r) be a rooted binary tree. We show the first claim by induction. Clearly, h(x) = 0 if and only if x = r, whence the number of nodes of height 0 equals 1 = 2^0.
Let S_n be the set of nodes of height n. Assuming |S_n| ≤ 2^n, we see that the map S_{n+1} → S_n, mapping each node to its parent, is at most two-to-one. Therefore, |S_{n+1}| ≤ 2 · |S_n| ≤ 2^{n+1}.
Now assume that the height of G is n. Then S_k = ∅ for k > n, whence

  |V| = Σ_{k=0}^{n} |S_k| ≤ Σ_{k=0}^{n} 2^k = (2^{n+1} − 1)/(2 − 1) = 2^{n+1} − 1.

For the statement about the number of leaves, we proceed by induction on the height of G. Clearly, a rooted binary tree of height 0 has precisely one node, which is a leaf. Thus the statement is true for such trees. Now assume that the statement is true for all rooted binary trees of height at most n. Let (G, r) be a rooted binary tree of height n + 1, and let x_i, i ∈ I with I ∈ {{1}, {1, 2}}, be all children of r. By the previous remark, G|_{x_i} is a rooted binary tree of height ≤ (n + 1) − h(x_i) = (n + 1) − 1 = n; therefore, the number of leaves of G|_{x_i} is at most 2^n.
Now V(G) = {r} ∪ ⋃_{i∈I} V(G|_{x_i}), whence the number of leaves of G equals the sum of the numbers of leaves of the G|_{x_i}, i ∈ I. Therefore, the total number of leaves of G is bounded by |I| · 2^n ≤ 2 · 2^n = 2^{n+1}, which is what we wanted to show. By induction, the claim follows.
Definition 4.2.11.
(i) A binary tree is called perfect if it contains precisely 2^{n+1} − 1 nodes and has height n.
(ii) A binary tree is called complete if it contains precisely 2^k nodes of height k for all k less than its height.
Remarks 4.2.12.
(a) In a perfect binary tree, every node is either a leaf, or has precisely two children.
(b) In a complete binary tree of height n, all nodes of height < n − 1 have precisely two children. The nodes of height n − 1 can have between zero and two children.
(c) A perfect binary tree of height n has precisely 2^{n+1} − 1 nodes, and a complete binary tree of height n has at least 2^n and at most 2^{n+1} − 1 nodes.
(d) A perfect binary tree of height n has precisely 2^n leaves, and a complete binary tree of height n has at least 2^{n−1} + 1 and at most 2^n leaves.
The following graph depicts a rooted binary forest, whose two trees are both of height 4, and one is perfect (G(r_1)) and the other one complete (G(r_2)):

[Figure: a rooted binary forest of two trees of height 4 with roots r_1 and r_2; G(r_1) is perfect and G(r_2) is complete.]

4.2.2 Search Trees and Balanced Trees
A binary tree with data is a tree whose nodes contain data. More precisely, let (T, r) be a rooted binary tree, and let (X, ≤) be an arbitrary set. Let d : V(T) → X be a map: this map associates to every node x ∈ V(T) of the tree one element of X, the data stored in this node. Then a binary rooted tree with data is the tuple (T, r, d).
There are many ways to implement binary trees with data. In Python, one could do this using classes:
class Node(object):
    left = None
    right = None
    data = None

    def __init__(self, data):
        self.data = data
Each Node object contains references to at most two children, which are stored in Node.left and Node.right. Sometimes, it is useful to also add a reference to the parent. It is possible to "walk through the tree" and print the data using the following function:
def printTree(root):
    if root is None:
        return
    print(root.data)
    printTree(root.left)
    printTree(root.right)
If we create a tree as follows:

def createTestTree():
    n1 = Node("This")
    n2 = Node("is")
    n2.left = n1
    n3 = Node("some")
    n4 = Node("test")
    n4.right = n3
    n4.left = n2
    n5 = Node("tree")
    root = Node("Root")
    root.left = n4
    root.right = n5
    return root
Then we can do the following:

>>> root = createTestTree()
>>> printTree(root)
Root
test
is
This
some
tree
Note that the way printTree() traverses the tree is called a depth-first traversal.
We can also modify printTree() so that it outputs the tree structure:

def printTree2(root, ind=0):
    if root is None:
        return
    print(" " * ind + str(root.data))
    printTree2(root.left, ind + 2)
    printTree2(root.right, ind + 2)
Then we obtain:

>>> root = createTestTree()
>>> printTree2(root)
Root
  test
    is
      This
    some
  tree
It is also possible to get a more "graphical" structure similar to the one we used before:

def printTree3(root, ind=0):
    if root is None:
        return
    printTree3(root.left, ind + 2)
    print(" " * ind + str(root.data))
    printTree3(root.right, ind + 2)
Then we obtain:

>>> root = createTestTree()
>>> printTree3(root)
      This
    is
  test
    some
Root
  tree
This has to be read as follows:

[Figure: the same tree drawn top-down; Root has the children test and tree, test has the children is and some, and is has the single child This.]
Note that this is a bit more than a binary rooted tree with data (T, r, d): there is an explicit order on the children, by distinguishing them as "left" and "right" children. Also, if there is only one child of a node, it can be either a "left" or a "right" child. This can be formulated mathematically by adding another map ℓ : V(T) \ {r} → {left, right}. (One could also define two maps which map each node to its left child and its right child; but then we need a value for "has no such child".) Such a quadruple (T, r, d, ℓ) is what is meant when talking about binary trees in Computer Science. We denote this definition by binary CS-tree.
We can now also say what a binary search tree is.
Definition 4.2.13. Let (T, r, d, ℓ) be a binary CS-tree, and assume that (X, ≤) is a totally ordered set, where d : V(T) → X. Then (T, r, d, ℓ) is a binary search CS-tree if for every x ∈ V(T), the following hold:
• if x has a left child x_1, then all grandchildren x′_1 of x_1 satisfy d(x′_1) ≤ d(x);
• if x has a right child x_2, then all grandchildren x′_2 of x_2 satisfy d(x′_2) ≥ d(x).
Note that each node is a grandchild of itself. Thus if node x has the left child x_1 and the right child x_2, we have in particular d(x_1) ≤ d(x) ≤ d(x_2).
Colloquially, nodes further to the left in a binary search CS-tree are less than or equal to nodes further to the right. For example, the following tree is a search tree (using the natural ordering on N):
[Figure: a binary search tree with root 23; the left child 13 has the children 5 and 21, the node 5 has the single child 1, and the right child of the root is 42.]
If we replaced 21 by 31, it would no longer be a search tree, since one grandchild of 13 would be larger than 23, while 13 is a left child of 23.
The main property of binary search CS-trees is that they allow one to search in them efficiently:
Theorem 4.2.14. If we are given a binary search CS-tree of height n, we can test whether it contains an element, and if yes, find this element, in O(n) comparisons. In case the tree is complete and has m nodes, one can search for elements in O(log m) comparisons.
The second statement follows from the fact that a complete tree of height n has between 2^n and 2^{n+1} − 1 nodes, whence 2^n ≤ m < 2^{n+1} and thus n ≤ log₂ m < n + 1.
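Using the Node class from above, the search behind Theorem 4.2.14 can be sketched in a few lines of Python; the function name searchTree is our own choice, and we assume that the stored data supports the comparison operators:

def searchTree(root, x):
    # Walk down from the root, using the search tree property: values
    # in the left subtree are <= root.data, values in the right subtree
    # are >= root.data. Returns a node with data x, or None.
    while root is not None:
        if x == root.data:
            return root
        elif x < root.data:
            root = root.left
        else:
            root = root.right
    return None

The number of loop iterations is bounded by the height of the tree, which gives the O(n), respectively O(log m), bounds of the theorem.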
One problem with trees is that creating a binary search CS-tree is quite easy – to insert a new element, one essentially searches for it, and then one knows where it has to be inserted – but that they usually are not complete. The farther they are from being complete, the larger their height is compared to log₂ |V(T)|, and the slower searching becomes. In the worst case, when T is degenerate, every node has at most one child, and the tree is essentially an ordered linked list:
[Figure: a degenerate search tree containing 1, 5, 12, 13, 23, 42, in which every node has at most one child.]
Searching for an element can take as many comparisons as there are elements in the tree, which is far away from log₂ |V(T)|.
For this reason, one often uses balanced binary search CS-trees. There are different ways of being balanced; one very strict notion is being complete: then the heights of two different leaves differ by at most one.
If one inserts a new node into a balanced binary search CS-tree (or removes
one), one has to ensure that the tree is still balanced. This is usually achieved by
rotations. Two important examples of such trees are red-black trees and AVL trees.
We will not go into details here, but refer to the literature and to Wikipedia.
130
CHAPTER 4. ALGORITHMS AND DATA STRUCTURES
4.2.3 Simple Trees
There are several applications where one would like to use a tree, but where one wants to keep things simple. For example, given a list of elements x_1, . . . , x_n, we want a tree with the elements x_1, . . . , x_n as its leaves.
We do not care precisely what the tree looks like. Instead, our main focus is on the following points:
(i) The height of the tree is ≈ log₂ n;
(ii) All leaves are at the same height;
(iii) It is easy to process the tree by processing all nodes on one height at the same time.
The simplest solution is to start with a perfect tree of height m such that the m-th row can contain all n elements. The smallest such m is m = ⌈log₂ n⌉.
For example, assume that we want to store eleven elements x_1, . . . , x_{11}. The minimal m is m = ⌈log₂ 11⌉ = 4. Consider the following perfect binary tree of height 4:

[Figure: a perfect binary tree of height 4; its first eleven leaves are labeled x_1, . . . , x_{11}, and the remaining five leaves are empty.]
If we now cut off all leaves which are empty, we obtain a complete binary tree. If we further cut off all branches (i.e. subtrees) whose leaves are empty, we obtain the following binary tree:

[Figure: the resulting tree; the leaves x_1, . . . , x_{11} remain, and levels 0 to 4 now contain 1, 2, 3, 6 and 11 nodes, respectively.]
We will call trees constructed with this method simple trees.
Note that the number of nodes on level k equals ⌈11/2^{4−k}⌉ for k = 0, . . . , 4:

  k            | 0 | 1 | 2 | 3 | 4
  ⌈11/2^{4−k}⌉ | 1 | 2 | 3 | 6 | 11
This is a general property of such trees, as we will see below.
Moreover, describing which leaves are below a node (i.e. what the leaves of the corresponding subtree are) at level k is not so complicated:
• On level 3, the first node has x_1, x_2 as leaves, the second x_3, x_4, . . ., the fifth node has x_9, x_{10} as leaves, and the sixth node has x_{11} as a leaf;
• On level 2, the first node has x_1, . . . , x_4 as leaves, the second node x_5, . . . , x_8, and the third x_9, . . . , x_{11};
• On level 1, the first node has x_1, . . . , x_8 as leaves, and the second node x_9, . . . , x_{11};
• On level 0, the only node (which is the root) has x_1, . . . , x_{11} as leaves.
Therefore, one easily deduces that node ℓ ∈ {1, . . . , ⌈11/2^{4−k}⌉} on level k has x_{2^{4−k}(ℓ−1)+1}, . . . , x_{min{11, 2^{4−k}ℓ}} as leaves:
  k | leaves below the nodes on level k (node ℓ = 1, 2, . . .)
  0 | {1, . . . , 11}
  1 | {1, . . . , 8}, {9, 10, 11}
  2 | {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11}
  3 | {1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}, {11}
  4 | {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}, {11}
Let us collect a few results about simple trees:
Proposition 4.2.15. Let n ∈ N_{>0}. Consider the simple tree for the data elements x_1, . . . , x_n.
(a) The tree has height h = ⌈log₂ n⌉;
(b) In row k (where k = 0 is the root and k = h contains the leaves), we have ⌈n/2^{h−k}⌉ nodes;
(c) The parent of node i ∈ {0, . . . , ⌈n/2^{h−k}⌉ − 1} in row k is node ⌊i/2⌋ in row k − 1, if k > 0; the grandparent in row ℓ < k is node ⌊i/2^{k−ℓ}⌋;
(d) The children of node i ∈ {0, . . . , ⌈n/2^{h−k}⌉ − 1} in row k are nodes 2i and 2i + 1 in row k + 1, if k < h. Node 2i in row k + 1 always exists, and node 2i + 1 exists in row k + 1 if and only if i < ⌈n/2^{h−k}⌉ − 1, or i = ⌈n/2^{h−k}⌉ − 1 and either n ≡ 0 (mod 2^{h−k}) or (n mod 2^{h−k}) > 2^{h−k−1};
(e) The leaves under node i in row k have indices 2^{h−k} i to min{2^{h−k}(i + 1) − 1, n − 1}; thus they correspond to x_{2^{h−k} i + 1}, . . . , x_{min{2^{h−k}(i+1), n}}.
Proof. To ease the proofs of parts (b) and (d), define A : R_{≥0} → R by A(0) = 0 and A(x) = 1 for x > 0.
(a) A perfect tree of height h has 2^h nodes in the last row. Therefore, 2^{h−1} < n ≤ 2^h yields h = ⌈log₂ n⌉.
(b) Above any node, there is at least one other node. Each node has at most two children, and every node but the last in a row always has two children. Therefore, if row k + 1 has t_{k+1} nodes, then row k has t_k = ⌈t_{k+1}/2⌉ nodes. We will show by induction on k = h, h − 1, . . . , 0 that t_h = n implies t_k = ⌈n/2^{h−k}⌉. For k = h this is clear.
Thus, assume that t_{k+1} = ⌈n/2^{h−(k+1)}⌉ for k < h. Write n = 2^{h−k} a + 2^{h−(k+1)} b + c with a, b, c ∈ N, b < 2 and c < 2^{h−(k+1)}. Then t_{k+1} = 2a + b + A(c), and t_{k+1}/2 = a + (b + A(c))/2, and (b + A(c))/2 ≤ 1. Therefore,

  ⌈t_{k+1}/2⌉ = a + A(b + A(c)) = a + A(2b + c) = a + ⌈(2^{h−(k+1)} b + c)/2^{h−k}⌉ = ⌈n/2^{h−k}⌉.

(c) The first part is clear. The second part follows by induction, using ⌊⌊i/2^t⌋/2⌋ = ⌊i/2^{t+1}⌋.
(d) For i < ⌈n/2^{h−k}⌉ − 1 the statement is clear (compare the proof of (b)). For i = ⌈n/2^{h−k}⌉ − 1, the first child is 2i = 2⌈n/2^{h−k}⌉ − 2 ≤ ⌈n/2^{h−(k+1)}⌉ − 1. The second child would be 2i + 1, which exists if and only if 2i + 1 = 2⌈n/2^{h−k}⌉ − 1 ≤ ⌈n/2^{h−(k+1)}⌉ − 1. This is equivalent to 2⌈n/2^{h−k}⌉ ≤ ⌈n/2^{h−(k+1)}⌉.
Write n = 2^{h−k} a + 2^{h−k−1} b + c with a, b, c ∈ N, b < 2 and c < 2^{h−k−1}. Then ⌈n/2^{h−k}⌉ = a + A(b + c), and ⌈n/2^{h−(k+1)}⌉ = 2a + b + A(c) (compare the proof of (b)). Therefore, the existence of node 2i + 1 is equivalent to 2a + 2A(b + c) ≤ 2a + b + A(c), i.e. to 2A(b + c) ≤ b + A(c). This is true if either b + c = 0, or if b = 1 and c > 0. The first condition is equivalent to n ≡ 0 (mod 2^{h−k}), and the second is equivalent to (n mod 2^{h−k}) > 2^{h−k−1}.
(e) The first leaf under node i has index 2^{h−k} i. Assuming that each of the (grand-)children has two children, the last leaf would be 2^{h−k} i + (2^{h−k} − 1) = 2^{h−k}(i + 1) − 1. Since the last index in the last row is n − 1, the claim thus follows.
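The index arithmetic of Proposition 4.2.15 is easy to translate into Python; the following helper functions are a sketch of our own (with 0-based numbering of nodes within a row, as in the proposition):

def simple_tree_height(n):
    # h = ceil(log2(n)), computed exactly via bit_length to avoid
    # floating point issues.
    return (n - 1).bit_length()

def nodes_in_row(n, h, k):
    # Part (b): row k contains ceil(n / 2^(h-k)) nodes.
    return -(-n // 2 ** (h - k))

def leaves_below(n, h, k, i):
    # Part (e): the leaves below node i in row k have the (0-based)
    # indices 2^(h-k) * i, ..., min(2^(h-k) * (i+1) - 1, n - 1).
    return 2 ** (h - k) * i, min(2 ** (h - k) * (i + 1) - 1, n - 1)

# The example with n = 11: height 4, rows of sizes 1, 2, 3, 6, 11, and
# the sixth node on level 3 has only the leaf with index 10 (i.e. x_11).
assert simple_tree_height(11) == 4
assert [nodes_in_row(11, 4, k) for k in range(5)] == [1, 2, 3, 6, 11]
assert leaves_below(11, 4, 3, 5) == (10, 10)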
4.2.4 Heaps and Priority Queues

One important data structure which is best visualized as a complete binary CS-tree is a heap. One important application of heaps is priority queues. Such queues allow one to quickly insert new elements and to quickly remove the smallest element (or the largest element, by inverting the order).
Similarly to binary search CS-trees, heaps are binary (CS-)trees with special properties. As opposed to search CS-trees, we do not need the stricter definition of CS-trees; it suffices to look at binary trees.
Definition 4.2.16. A (binary) heap is a rooted (binary) tree with data (T, r, d) such that every x ∈ V(T) satisfies the heap condition: if y is any child of x, then d(x) ≤ d(y).
This ensures that the root r of the heap (T, r, d) satisfies d(r) ≤ d(x) for all x ∈ V(T). Also note that if x ∈ V(T), then T|_x is also a heap.
Note that a binary search CS-tree (T, r, d, ℓ) yields a special binary heap (T, r, d), but a binary heap is often far away from being a search tree. For example, look at the example heaps in Sections 4.2.4.1, 4.2.4.3 and 4.2.4.2.
In the following, we will only consider binary heaps, though most statements are also true for arbitrary heaps, as long as the number of children is bounded and assumed to be in O(1).
4.2.4.1 Insertion

One of the two most important algorithms for a heap is the insertion of new elements. The following algorithm describes how to do this.
Input: the root R of a heap, an element x ∈ X to be inserted
Output: the root R′ of a new heap containing everything from the old heap and x
(1) Append a new leaf L such that the height of the tree only increases if it was perfect before;
    Note: we will see below how to do this in practice (see Section 4.2.4.4);
(2) Let the value of L be x, i.e. d(L) = x;
(3) As long as L has a parent P with d(P) > d(L), do the following:
    • Swap d(P) and d(L), so that afterwards d(P) < d(L);
    • Let L now point to P;
    Note: we treat L as a reference to a node in the tree; here we change the reference;
(4) Return R.
Algorithm 4.2: Inserting a new element into a heap
The algorithm works as follows. Assume that we want to insert 4 into the following heap:

[Figure: a binary heap of height 4 with root 3; the insertion path followed below passes through the nodes holding 5, 7 and 13.]
We begin by adding a new leaf L in level 4 (since the tree is not yet perfect, we cannot add a new level). The red rounded boxes show the positions of the references R (the root), L (the inserted leaf) and P (the parent of L):

[Figure: the heap with the new leaf L holding d(L) = 4 appended below the node P with d(P) = 13.]
Clearly, the heap property is violated, as d(L) = 4 < 13 = d(P). Therefore, we swap d(L) and d(P), and let L now point to P (and adjust P to be L's parent):

[Figure: the heap after the first swap; L now holds 4 one level higher, below the node P with d(P) = 7.]
Again, the heap property is violated, since d(L) = 4 < 7 = d(P). Therefore, we again swap d(L) and d(P), and let L now point to P (and adjust P to be L's parent):

[Figure: the heap after the second swap; L now holds 4 below the node P with d(P) = 5.]
And again, the heap property is violated, since d(L) = 4 < 5 = d(P). Therefore, we swap d(L) and d(P) another time, and let L now point to P (and adjust P to be L's parent):

[Figure: the heap after the third swap; 4 is now a child of the root 3.]
Now, finally, the heap property is satisfied.
Note that since our tree is complete, the height is in O(log n), where n is the
number of elements. Since we do at most one swap and comparison per level, the
number of operations is in O(log n). Finally, note that if the heap was complete
before, it is still complete afterwards: a new level is only added in case the binary
tree was perfect.
This is summed up in the following proposition:
Proposition 4.2.17. Let R be the root of a binary heap of n nodes which is complete.
Assume that appending a new leaf as described in the algorithm is possible in O(log n)
operations. Then Algorithm 4.2 inserts a new element into the heap using at most
O(log n) comparisons and swaps. Afterwards, the tree is still complete.
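Anticipating the linear representation of Section 4.2.4.4, Algorithm 4.2 can be sketched in Python as follows; the heap is stored in a plain list H, the parent of index i > 0 sits at index (i − 1) // 2, and the function name heap_insert is our own choice:

def heap_insert(H, x):
    # Steps (1) and (2): append a new leaf with value x. In the list
    # representation this automatically keeps the tree complete.
    H.append(x)
    i = len(H) - 1
    # Step (3): swap upwards as long as the parent is larger.
    while i > 0 and H[(i - 1) // 2] > H[i]:
        H[(i - 1) // 2], H[i] = H[i], H[(i - 1) // 2]
        i = (i - 1) // 2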
4.2.4.2 Deletion of the Root
The second of the two most important algorithms we want to describe for a heap
is the process of removing the smallest element of the heap, which can always be
found at its root. We assume that the heap has at least two elements, as otherwise
the process consists of just getting rid of the root.
We split up the deletion algorithm into two steps:
• Removing one node (at the bottom) and replacing the content of the root with
its content;
• Restoring the heap property.
Let us first consider the process of restoring the heap property. We give it as an extra algorithm since we will need this part of the removal process again later:
Input: the root R of a binary tree which satisfies the heap property except for possibly the root node
Output: the root of the same tree with rearranged data, now satisfying the heap property everywhere
(1) Let L point to R;
(2) While L is not a leaf:
    (a) If d(L) ≤ d(C) for all children C of L, exit the loop;
    (b) Let C be a child of L such that d(C) < d(L) and d(C) ≤ d(C′) for all other children C′ of L;
    (c) Swap d(L) with d(C), and let L point to C;
(3) Return R.
Algorithm 4.3: Percolate down (for heaps)
Proposition 4.2.18. Let R be the root of a binary tree of height h which satisfies the hypotheses of Algorithm 4.3. Then Algorithm 4.3 rearranges the data such that the resulting binary tree is a heap, with at most O(h) comparisons and swaps. The structure of the tree is not changed.
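In the same list representation as before, Algorithm 4.3 becomes the following sketch; the children of index i sit at indices 2i + 1 and 2i + 2 (see Proposition 4.2.21 below):

def percolate_down(H, i=0):
    # Restore the heap property below index i, assuming the two
    # subtrees of i already satisfy it (Algorithm 4.3).
    n = len(H)
    while True:
        c = 2 * i + 1                      # left child
        if c >= n:
            return                         # L is a leaf: the loop ends
        if c + 1 < n and H[c + 1] < H[c]:
            c += 1                         # pick the smallest child
        if H[i] <= H[c]:
            return                         # heap property holds; exit loop
        H[i], H[c] = H[c], H[i]            # step (c): swap and descend
        i = c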
Now the removal algorithm can be specified as follows:

Input: the root R of a heap of at least two nodes
Output: the root R′ of a new heap containing everything from the old heap except its root
(1) Let L be any leaf in the last level;
(2) Swap d(R) with d(L) and remove L from the heap;
(3) Apply Algorithm 4.3 to the heap with root R;
(4) Return R.
Algorithm 4.4: Removing the smallest element of a heap
The asymptotic running time is the same as for insertion, and it also preserves
the property of being a complete binary tree:
Proposition 4.2.19. Let R be the root of a binary heap of n nodes which is complete,
and assume that n ≥ 2. Further assume that it is possible to obtain any leaf in the
last level in O(log n) operations. Then Algorithm 4.4 removes the root data using at
most O(log n) comparisons and swaps. Afterwards, the tree is still complete.
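Reusing percolate_down from the sketch above, Algorithm 4.4 could look like this in the list representation:

def heap_remove_min(H):
    # Swap the root with the last leaf, remove that leaf, and repair
    # the heap property from the root downwards (steps (1)-(3)).
    H[0], H[-1] = H[-1], H[0]
    smallest = H.pop()
    percolate_down(H, 0)
    return smallest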
We again want to illustrate removal with an example. Consider the heap built in Section 4.2.4.3. We want to remove its root element 1:

[Figure: the heap constructed at the end of Section 4.2.4.3, with root 1.]
For that, we take some leaf, say the one with data 12, replace 1 by 12 and delete the leaf:

[Figure: the heap with 12 at the root (referenced by L and R); the children C_1 and C_2 of the root hold 1 and 3.]
Now we have to restore the heap property, which is violated since d(C_1) < d(L). As d(C_1) ≤ d(C_2), we swap d(C_1) with d(L), and assign L to point to C_1:

[Figure: the heap after the first swap; 1 is back at the root, and L points to the node now holding 12.]
Again, the heap property is violated since d(C_1) < d(L). As d(C_1) ≤ d(C_2), we swap d(C_1) with d(L), and assign L to point to C_1:

[Figure: the heap after the second swap; 7 has moved up, and L points to the node now holding 12, one level further down.]

Since d(L) ≤ d(C_1) and d(L) ≤ d(C_2), the heap property is now satisfied.
4.2.4.3 Building a Heap
In case we want to create a new heap and fill it with (unsorted) data x_1, . . . , x_n, we just take the first datum x_1, create a root node R, and assign d(R) = x_1. We then use Algorithm 4.2 (n − 1) times to insert x_2, . . . , x_n. Since before inserting x_i, the heap has i − 1 nodes, the total running time is

  Σ_{i=2}^{n} O(log(i − 1)) = O(Σ_{i=1}^{n−1} log i).

One can now show that Σ_{i=1}^{n−1} log i ∈ Θ(n log n), whence we see that the total complexity for creating a heap of n elements is O(n log n).
Interestingly, it turns out that this process can be done much faster. The basic idea is to just create a complete binary tree which contains all the data x_1, . . . , x_n, and only then to ensure the heap property:

Input: a binary tree with data
Output: the same tree with rearranged data, now satisfying the heap property
(1) Let h be the height of the tree.
(2) For level k = h − 1, h − 2, . . . , 0, do:
    (a) For each node P on level k, do:
        (i) Let L_1, . . . , L_ℓ be the children of P, ℓ ∈ {0, 1, 2};
        (ii) If there exists an i such that d(L_i) < d(P) and d(L_i) ≤ d(L_j) for all j:
            • Swap d(L_i) and d(P);
            • Turn the subtree starting at node L_i back into a heap using Algorithm 4.3;
(3) Return the tree.
Algorithm 4.5: Building up a heap from a binary tree
In level k, we have 2^k nodes. By Proposition 4.2.18, the call to Algorithm 4.3 is done in O(h − k) operations, since the subtree beginning at L_i has height ≤ h − k − 1. Therefore, the total number of operations is in

  Σ_{k=0}^{h−1} 2^k · O(h − k) = O(n · Σ_{k=0}^{h−1} 2^{k−h} (h − k)) = O(n · Σ_{k=1}^{h} k/2^k) = O(n)

since

  Σ_{k=1}^{∞} k/2^k = 2.

(For this, consider the power series f(x) = Σ_{k=0}^{∞} x^k = 1/(1 − x); its radius of convergence is 1 and its derivative is f′(x) = Σ_{k=1}^{∞} k x^{k−1} = 1/(1 − x)², whence f′(1/2) = 2 · Σ_{k=1}^{∞} k · 2^{−k} = 4.) We have therefore shown the following result:
Proposition 4.2.20. Assume that it is possible to iterate over all nodes as specified
in the algorithm in O(n) operations in total. Then Algorithm 4.5 builds a heap out
of n unsorted data in O(n) comparisons and swaps.
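In the list representation, the bottom-up pass of Algorithm 4.5 collapses into a short loop: every index ≥ ⌊n/2⌋ is a leaf, so it suffices to percolate down from the last inner node up to the root. This sketch again reuses percolate_down from above:

def build_heap(H):
    # Algorithm 4.5: establish the heap property from the lowest inner
    # nodes up to the root, in O(n) comparisons and swaps in total.
    for i in range(len(H) // 2 - 1, -1, -1):
        percolate_down(H, i)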
Let us now illustrate the building process in a small example. The following complete tree consisting of 15 nodes has height 3:
[Figure: a complete binary tree with 15 nodes; the root holds 26, and the nodes hold the values 26, 1, 8, 7, 19, 1, 14, 8, 6, 20, 3, 12, 11, 18, 9.]
In the first step, all nodes on level 2 are checked; they are marked in red. All children which will be swapped with the nodes are marked in blue:

[Figure: the same tree with the level-2 nodes marked in red and the children to be swapped marked in blue.]
After swapping, the trees beginning at the red nodes are already heaps, and Algorithm 4.3 does not do anything. Now we consider the nodes on level 1:
[Figure: the tree after the level-2 swaps, with the level-1 nodes marked in red and the children to be swapped marked in blue.]
After another swapping process, we are now in the situation that the left subtree
below 3 (marked in red) is no longer a heap:
[Figure: the tree after the level-1 swaps; the subtree below the node 3 (marked in red) violates the heap property.]
Algorithm 4.3 will now restore the heap property by swapping 8 with 6. After this,
we continue with Algorithm 4.5 and consider the root, the only node on level 0:
[Figure: the tree after restoring the heap property below 3; the root 26, the only node on level 0, is now considered.]
In the final step, one last swap is done. Afterwards, the left subtree below the new
root (marked in red) is no longer a heap:
[Figure: the tree after swapping the root value 26 with its smallest child value 1; the subtree now containing 26 (marked in red) is no longer a heap.]
Algorithm 4.3 will swap the 26 with the 1 in its right child, and then swap the new
26 with its left child 8. This results in the following heap, where we marked all
nodes which were changed since the last tree in blue:
[Figure: the final heap with root 1; all nodes changed since the last tree are marked in blue.]
4.2.4.4 Representing a Complete Heap
So far, everything seems to be quite easy, in particular for humans. The problem is
that Algorithms 4.2, 4.4 and 4.5 use some operations which we have not explained
yet:
• Algorithm 4.2 needs to append a leaf to the tree, which should be on the last
level, except if the last level is full;
• Algorithm 4.4 needs to get hold of one leaf in the last level;
• Algorithm 4.5 needs to go through all levels step by step.
There is a simple solution for all these problems:
• to (conceptually) use a complete binary CS-tree such that all leaves in the last
level are on the left side, and all “missing” leaves in the last level are on the
right side;
• and to represent this complete CS-tree as a simple list.
In fact, this is a very similar strategy to the simple trees in Section 4.2.3, but indexing is even simpler. The following complete tree of height 4 displays the indices of the data in each node:

[Figure: a complete binary tree of height 4 whose nodes are numbered in level order from 0 to 30; the rows contain the index ranges [0, 0], [1, 2], [3, 6], [7, 14] and [15, 30].]
Proposition 4.2.21. Let x_0, . . . , x_{n−1} be some data. If represented as a complete binary CS-tree whose last level is filled from left to right, the tree T has the following properties:
(a) Its height is h := ⌊log₂ n⌋;
(b) On level k ∈ {0, . . . , h − 1}, it has 2^k nodes, and on level h, it has n − 2^h + 1 ≤ 2^h nodes;
(c) The nodes on level k ∈ {0, . . . , h} have indices 2^k − 1 to min{2(2^k − 1), n − 1}; if k < h, then min{2(2^k − 1), n − 1} = 2(2^k − 1);
(d) The parent of the node with index i > 0 has index ⌊(i + 1)/2⌋ − 1 = ⌊(i − 1)/2⌋ = ⌈i/2⌉ − 1;
(e) The children of the node with index i have indices j with 2i + 1 ≤ j ≤ min{2i + 2, n − 1}. That is, node i has no child if 2(i + 1) > n, one child if 2(i + 1) = n, and two children if 2(i + 1) < n.
Proof.
(a) Follows from Remark 4.2.12 (c).
(b) For levels 0 up to h − 1, this follows from Remark 4.2.12 (c). Therefore, in all these levels, the tree has Σ_{k=0}^{h−1} 2^k = 2^h − 1 nodes. Therefore, on level h, it must have n − (2^h − 1) nodes.
Finally, as h = ⌊log₂ n⌋, we have 2^h ≤ n < 2^{h+1}. As n is an integer, n ≤ 2^{h+1} − 1, whence n − 2^h + 1 ≤ (2^{h+1} − 1) − 2^h + 1 = 2^h.
(c) We show this by induction on k. For k = 0, there is only one node with index 0, and 2^0 − 1 = 0 = 2(2^0 − 1). Now assume that the formula is correct for level k − 1, where k ∈ {1, . . . , h}. Then the last index in level k − 1 is 2(2^{k−1} − 1), and thus the first index in level k is 2(2^{k−1} − 1) + 1 = 2^k − 1.
In case k < h, level k has 2^k nodes by (b), whence the last index is

  2(2^{k−1} − 1) + 2^k = 2^{k+1} − 2 = 2(2^k − 1).

In case k = h, level h has n − 2^h + 1 nodes, whence the last index is

  2(2^{h−1} − 1) + (n − 2^h + 1) = n − 1 = min{2(2^h − 1), n − 1},

as 2(2^{h−1} − 1) + (n − 2^h + 1) ≤ 2(2^{h−1} − 1) + 2^h = 2(2^h − 1) by (b).
(d) In case node i is on level k > 0, we have i = 2^k − 1 + j for 0 ≤ j < 2^k. Now the parent of node i is on level k − 1 and has index ⌊j/2⌋ there. Therefore, its global index is (2^{k−1} − 1) + ⌊j/2⌋.
Now j = i − (2^k − 1), whence

  ⌊j/2⌋ = ⌊(i − 2^k + 1)/2⌋ = ⌊i/2 + 1/2⌋ − 2^{k−1}.

Therefore, the parent has index

  (2^{k−1} − 1) + ⌊j/2⌋ = (2^{k−1} − 1) + ⌊i/2 + 1/2⌋ − 2^{k−1} = ⌊(i + 1)/2⌋ − 1 = ⌊(i − 1)/2⌋.

Finally, if we write i = 2ℓ + t with t ∈ {0, 1}, then

  ⌊(i − 1)/2⌋ = ⌊ℓ + t/2 − 1/2⌋ = ℓ + ⌊(t − 1)/2⌋ = ℓ − 1 + t = ℓ − 1 + ⌈t/2⌉ = ⌈(2ℓ + t)/2⌉ − 1 = ⌈i/2⌉ − 1.

(e) Assume that node i is the j-th node on level k, where 0 ≤ j < 2^k. Then i = 2^k − 1 + j by (c). Now the children of node i – if they exist – are on level k + 1 and have indices 2j and 2j + 1 on that level. Therefore, their global indices are A := (2^{k+1} − 1) + 2j and B := (2^{k+1} − 1) + (2j + 1) = A + 1.
Now j = i − (2^k − 1) yields A = (2^{k+1} − 1) + 2(i − (2^k − 1)) = 2^{k+1} − 1 + 2i − 2^{k+1} + 2 = 2i + 1 and B = 2i + 2. Now these nodes only exist in the tree if their index is ≤ n − 1. Therefore, the children of node i have indices t with 2i + 1 ≤ t ≤ min{2i + 2, n − 1}, and the statement on the number of children is correct.
Using this representation, we obtain a binary heap which can do all operations
as fast as in Propositions 4.2.17, 4.2.19 and 4.2.20:
Theorem 4.2.22 (Algorithms for Linearly Represented Binary Heaps). There exist algorithms such that a binary heap of n nodes can be stored in a linear fashion requiring O(n) memory positions, such that the following operations have the following asymptotic complexities:
(a) Insertion of a new element can be done in O(log n) operations;
(b) Querying the smallest element can be done in O(1) operations;
(c) Removal of the smallest element can be done in O(log n) operations;
(d) Building a new heap from n unsorted data items can be done in O(n) operations.
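In Python, these operations do not have to be implemented by hand: the standard library module heapq provides them for a heap stored in a plain list, in exactly this linear representation. For example:

import heapq

H = [26, 1, 8, 7, 19, 1, 14]
heapq.heapify(H)           # (d): build a heap in O(n)
heapq.heappush(H, 3)       # (a): insertion in O(log n)
print(H[0])                # (b): the smallest element, here 1, in O(1)
print(heapq.heappop(H))    # (c): removal of the smallest element in O(log n)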
4.2.5 Huffman Trees
Huffman trees originate in Information Theory. The question is the following: given an alphabet a_1, . . . , a_n, where every letter a_i appears in texts with a probability of p_i, i.e. p_i ∈ [0, 1] and Σ_{i=1}^{n} p_i = 1 (such a collection of p_i's is called a (discrete) probability distribution), how can we choose a uniquely decodable² binary encoding, i.e. sequences consisting of 0 and 1, for every letter a_i, such that a random text encoded with this scheme has a minimal length?
Assume that an encoding scheme is chosen where a_i is encoded using c_i bits. Then the average length of one encoded letter is Σ_{i=1}^{n} p_i c_i. C. Shannon [Sha48] showed in his famous Source Coding Theorem (see Theorem 4.2.26 below) that we always have

  Σ_{i=1}^{n} p_i c_i ≥ Σ_{i=1}^{n} p_i log₂(1/p_i) = −Σ_{i=1}^{n} p_i log₂ p_i =: H(p_1, . . . , p_n);

here, we understand that 0 · log₂ 0 = 0. The value H(p_1, . . . , p_n) is called the entropy of the probability distribution (p_1, . . . , p_n). It always satisfies H(p_1, . . . , p_n) ≤ log₂ n.
D. Huffman gave a simple algorithm which allows one to obtain such an encoding which is almost optimal. More precisely, his algorithm constructs a binary encoding with bit-lengths c_1, . . . , c_n such that

  Σ_{i=1}^{n} p_i c_i < H(p_1, . . . , p_n) + 1,

and no other binary encoding performs better. Therefore, his encoding is optimal.
We begin with bounding the entropy of a probability distribution.
Lemma 4.2.23. For any probability distribution (p_1, . . . , p_n), 0 ≤ H(p_1, . . . , p_n) ≤ log₂ n, with equality on the left-hand side if and only if n = 1, and equality on the right-hand side if and only if p_1 = · · · = p_n.
Proof. In case n = 1, we have p_1 = 1 and H(p_1, . . . , p_n) = 0 = log₂ 1. Now assume n > 1. As 0 < p_i ≤ 1, log p_i ≤ 0. As n > 1, we have p_i < 1 for at least one i, and the corresponding terms are then < 0, whence 0 < H(p_1, . . . , p_n).
² This means that if we write all encoded words one after another without any separator, it is possible to determine where one code word ends and where the next starts, if one goes through the code letters from left to right.
For the other inequality, we use log x ≤ x − 1 for x > 0, with equality if and only if x = 1. Then

  H(p_1, . . . , p_n) − log₂ n = Σ_{i=1}^{n} p_i [log(1/p_i) − log n]/log 2
    = (log 2)^{−1} Σ_{i=1}^{n} p_i log(1/(p_i n))
    ≤ (log 2)^{−1} Σ_{i=1}^{n} p_i (1/(n p_i) − 1)
    = (log 2)^{−1} Σ_{i=1}^{n} (1/n − p_i) = (log 2)^{−1} (1 − 1) = 0,

and equality holds if and only if 1/(n p_i) = 1 for all i.
An additive tree (T, r, f) is a rooted tree (T, r) together with a function f : V(T) → R_{≥0} such that if x ∈ V(T) and y_1, . . . , y_k ∈ V(T) are all children of x, then f(x) = Σ_{i=1}^{k} f(y_i). We call f(x) the weight of the node x, and f(r) the weight of (T, r, f).
Given an alphabet a_1, . . . , a_n with probability distribution p_1, . . . , p_n, any binary encoding gives rise to an additive binary tree whose weight is 1. For example, assume that we are given the following alphabet with n = 7:

  i   | 1    | 2    | 3    | 4    | 5    | 6    | 7
  a_i | A    | B    | C    | E    | G    | L    | R
  p_i | 0.30 | 0.10 | 0.09 | 0.05 | 0.15 | 0.13 | 0.18
We can first form any binary tree whose leaves are p_1, . . . , p_7:

[Figure: a binary tree whose seven leaves carry the weights p_1, . . . , p_7 together with the corresponding letters.]
Then we determine the weight of a parent node of two children which already have weights by adding them together:

[Figure: the same tree with the weights of all inner nodes filled in; the root has weight 1.00.]
The average height of this tree is the sum of the heights of the leaves, weighted with
their value:
0.30 · 3 + (0.10 + 0.09 + 0.05 + 0.15) · 4 + 0.13 · 3 + 0.18 · 1 = 3.03.
On the other hand, the entropy of the probability distribution is

  −Σ_{i=1}^{7} p_i log₂ p_i ≈ 2.6205.
Now how is this related to an encoding scheme? We treat each left branch of the binary tree as a 0, and each right branch of the binary tree as a 1:

[Figure: the same tree with every left branch labeled 0 and every right branch labeled 1.]
This yields the following encodings:

  Letter   | A   | B    | C    | E    | G    | L   | R
  Encoding | 000 | 0010 | 0011 | 0100 | 0101 | 011 | 1
For example, the word ALGEBRA would be encoded as 000 011 0101 0100 0010 1
000, or written without spaces,
0000110101010000101000.
To decode this, one can again use the tree: one begins at the root and follows the labeled branches down until a leaf is reached; then one letter has been decoded, and one can begin again at the root. This shows that such a scheme is uniquely decodable.
Now note that the height of a leaf equals the length of its encoding. Therefore, the average height equals the average length of an encoded letter! By Shannon's theorem, the average length is at least H(p_1, . . . , p_7), which matches what we verified above.
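Both numbers are quickly verified in Python; the probabilities are taken from the table above and the code lengths from the encoding tree:

import math

p = [0.30, 0.10, 0.09, 0.05, 0.15, 0.13, 0.18]   # A, B, C, E, G, L, R
c = [3, 4, 4, 4, 4, 3, 1]                        # lengths of the codes above

entropy = -sum(pi * math.log(pi, 2) for pi in p)
average = sum(pi * ci for pi, ci in zip(p, c))
print(entropy, average)    # approximately 2.6205 and 3.03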
As an intermission, let us prove Shannon’s theorem. We begin with two lemmas
from which the theorem follows.
Lemma 4.2.24 (Gibbs' Inequality). Let (p_1, . . . , p_n), (q_1, . . . , q_n) be two discrete probability distributions. Then

  H(p_1, . . . , p_n) = −Σ_{i=1}^{n} p_i log p_i ≤ −Σ_{i=1}^{n} p_i log q_i,

and equality holds if and only if p_i = q_i for all i.
Proof. Let I = {i ∈ {1, . . . , n} | p_i > 0}. Then, using log x ≤ x − 1 for all x > 0 with equality if and only if x = 1,

  −Σ_{i∈I} p_i log(q_i/p_i) ≥ −Σ_{i∈I} p_i (q_i/p_i − 1) = −Σ_{i∈I} q_i + Σ_{i∈I} p_i = −Σ_{i∈I} q_i + 1 ≥ 0,

as Σ_{i∈I} q_i ≤ 1. Therefore,

  −Σ_{i=1}^{n} p_i log q_i = −Σ_{i∈I} p_i log q_i ≥ −Σ_{i∈I} p_i log p_i = −Σ_{i=1}^{n} p_i log p_i.

Dividing both sides by log 2 yields the inequality.
Now in the above, we have equality if and only if q_i/p_i = 1 for all i ∈ I and Σ_{i∈I} q_i = 1. This is possible if and only if p_i = q_i for all i.
Lemma 4.2.25 (Kraft's Inequality). Let c_1, . . . , c_n be binary strings which form a uniquely decodable code. If c_i has length s_i, then

  Σ_{i=1}^{n} 2^{−s_i} ≤ 1.

Conversely, if there exist natural numbers s_1, . . . , s_n satisfying Σ_{i=1}^{n} 2^{−s_i} ≤ 1, then there exist binary strings c_1, . . . , c_n, where c_i has length s_i, such that c_1, . . . , c_n is a uniquely decodable code.
Proof. Without loss of generality, we can assume s_1 ≤ · · · ≤ s_n. Let A_i be the set of binary strings of length s_n which begin with c_i; then |A_i| = 2^{s_n − s_i}. As c_1, . . . , c_n is a uniquely decodable code, A_i ∩ A_j = ∅ for i ≠ j. Therefore,

  Σ_{i=1}^{n} 2^{s_n − s_i} = Σ_{i=1}^{n} |A_i| = |⋃_{i=1}^{n} A_i| ≤ 2^{s_n}.

Dividing by 2^{s_n} yields the inequality.
[Figure 4.1: a perfect binary CS-tree with marked nodes c_1, . . . , c_9, corresponding to the Kraft inequality 1 · 2^{−2} + 3 · 2^{−3} + 2 · 2^{−4} + 3 · 2^{−5} = 27/32 ≤ 1.]
For the converse, we are given natural numbers s_1, . . . , s_n with Σ_{i=1}^{n} 2^{s_n − s_i} ≤ 2^{s_n}, and we again assume that s_1 ≤ · · · ≤ s_n. We begin with a rooted perfect binary tree (T, r) of height s_n; this tree has 2^{s_n} leaves. In the first step, we pick any node x_1 of height s_1 and remove T|_{x_1} from T. This will remove 2^{s_n − s_1} leaves of T. The node x_1 corresponds to c_1, and the removed leaves correspond to all binary strings of length s_n which begin with c_1. By iteratively performing this operation for s_2, . . . , s_n, we obtain the desired code. We only have to show that this is always possible:
Assume that we have iteratively removed subtrees beginning at levels s_1, . . . , s_{i−1}. We are left with a binary tree with 2^{s_n} − Σ_{j=1}^{i−1} 2^{s_n − s_j} leaves at level s_n. All removed leaves belong to removed subtrees beginning at heights ≤ s_i. In fact, the number of nodes removed at level s_i equals Σ_{j=1}^{i−1} 2^{s_i − s_j} < 2^{s_i} (as Σ_{j=1}^{n} 2^{s_i − s_j} ≤ 2^{s_i}), so there exists at least one node of height s_i which can be chosen for c_i and removed.
Theorem 4.2.26 (Shannon's Source Coding Theorem). Let (T, r, f) be an additive tree with leaves ℓ_1, . . . , ℓ_n and weight > 0. Then

  Σ_{i=1}^{n} h_T(ℓ_i) f(ℓ_i) ≥ f(r) · H(f(ℓ_1)/f(r), . . . , f(ℓ_n)/f(r)).

Moreover, there exists an additive tree (T′, r′, f′) with the same leaf weights such that

  Σ_{i=1}^{n} h_{T′}(ℓ′_i) f′(ℓ′_i) < f(r) · (H(f(ℓ_1)/f(r), . . . , f(ℓ_n)/f(r)) + 1).
Proof. Without loss of generality, we can assume that the tree has weight 1, i.e. f(r) = 1. We then have to show Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i) ≥ H(f(ℓ_1), . . . , f(ℓ_n)).
Let c_i be the binary string corresponding to the leaf ℓ_i. Then c_1, . . . , c_n is a uniquely decodable binary code, and since the length of c_i is h(ℓ_i), Kraft's inequality (Lemma 4.2.25) yields C := Σ_{i=1}^{n} 2^{−h(ℓ_i)} ≤ 1.
Set q_i := 2^{−h(ℓ_i)}/C and p_i = f(ℓ_i); then Σ_{i=1}^{n} q_i = 1 = Σ_{i=1}^{n} p_i. Gibbs' inequality (Lemma 4.2.24) yields

  H(p_1, . . . , p_n) ≤ −Σ_{i=1}^{n} p_i log₂ q_i = Σ_{i=1}^{n} p_i h(ℓ_i) + Σ_{i=1}^{n} p_i log₂ C ≤ Σ_{i=1}^{n} p_i h(ℓ_i),

which is what we wanted to show for the first part.
For the existence of (T′, r′, f′), set s_i = ⌈−log₂ p_i⌉. Then −log₂ p_i ≤ s_i < −log₂ p_i + 1, and 2^{−s_i} ≤ p_i and Σ_{i=1}^{n} 2^{−s_i} ≤ Σ_{i=1}^{n} p_i = 1. By Kraft's inequality (Lemma 4.2.25), there exists a uniquely decodable binary code with codeword lengths s_1, . . . , s_n. This code corresponds to an additive tree with average height

  Σ_{i=1}^{n} p_i s_i < Σ_{i=1}^{n} p_i (−log₂ p_i + 1) = H(p_1, . . . , p_n) + 1.
We now want to relate this to the topic of this lecture, namely Computer Algebra. Assume we are given a list of polynomials f_1, . . . , f_n ∈ R[X], together with a multiplication time M for polynomials. Our aim is to find out how fast we can compute Π_{i=1}^{n} f_i. If we use the naive approach, i.e. we first compute f_1 f_2, then (f_1 f_2) f_3, then (f_1 f_2 f_3) f_4, etc., and if deg f_i ≈ deg f_j for all i, j, then we obtain a worst case bound of O(k · M(k)), where k = Σ_{i=1}^{n} deg f_i. Assuming that deg f_i ≥ 1 for all i, we have k ≥ n.
There are simple approaches to do this faster; one is based on simple trees (as
introduced in Section 4.2.3), and its complexity is O(M(k) log n). This is already
much better than the worst case, but can still be improved, especially if the deg fi ’s
vary a lot.
Define p_i = deg f_i / k, where k = Σ_{i=1}^{n} deg f_i; then p_i ∈ [0, 1] and Σ_{i=1}^{n} p_i = 1, whence (p_1, . . . , p_n) is a probability distribution. Now consider the function g : (1/k) N → R_{≥0}, x ↦ M(kx); then g(p_i) is the number of operations required to multiply two polynomials of degree deg f_i, and g is non-negative and satisfies g(x)/x ≤ g(y)/y for x ≤ y.
We can now describe the way the f_i's are multiplied to obtain Π_{i=1}^{n} f_i as an additive tree, where each leaf corresponds to one f_i and the other nodes correspond to a subproduct of Π_{i=1}^{n} f_i; more precisely, each node corresponds to the product of all leaves in its subtree:

[Figure: an additive tree for n = 7; leaf i has weight p_i, the root has weight p_1 + · · · + p_7 = 1, and red rounded boxes show the polynomial products corresponding to the nodes.]

Here, the red rounded boxes show the polynomial (products) corresponding to the nodes. Now, we can bound the number of operations required to compute the product Π_{i=1}^{n} f_i using the tree by summing g(f(x)) over all non-leaf nodes x. The following more general proposition relates, in our special case, the number of operations to M(Σ_{i=1}^{n} deg f_i) times the (normalized) average height of the tree:
Proposition 4.2.27. Let (T, r, f) be an additive tree of positive weight, with f(x) > 0 for all leaves x, and g a non-negative function defined on f(V(T)) which satisfies g(t)/t ≤ g(t′)/t′ for t ≤ t′. Let L = {ℓ_1, . . . , ℓ_n} be the n leaves of T. Then

  Σ_{x∈V(T)\L} g(f(x)) ≤ g(Σ_{i=1}^{n} f(ℓ_i)) · (1/f(r)) · Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i);

note that f(r) = Σ_{i=1}^{n} f(ℓ_i), and (1/f(r)) Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i) is the average height of (T, r, (1/f(r)) f), which is the normalized tree (i.e. it has weight 1). In case g is linear, equality holds.
Proof. For i ∈ {1, . . . , n} and x ∈ V(T), let δ(x, i) = 1 if ℓ_i is a leaf of T|_x, and δ(x, i) = 0 otherwise. With this notation, we have Σ_{x∈V(T)\L} δ(x, i) = h(ℓ_i) for every i, and g(f(x)) = g(Σ_{i=1}^{n} δ(x, i) f(ℓ_i)) for every x ∈ V(T).
For a fixed x ∈ V(T) \ L, set s(x) := Σ_{i=1}^{n} δ(x, i) f(ℓ_i) and t := Σ_{i=1}^{n} f(ℓ_i) = f(r); then 0 < s(x) ≤ t, and thus g(s(x))/s(x) ≤ g(t)/t. Multiplying by s(x) · t > 0 yields

  f(r) · g(Σ_{i=1}^{n} δ(x, i) f(ℓ_i)) = t · g(s(x)) ≤ s(x) · g(t) = Σ_{i=1}^{n} δ(x, i) f(ℓ_i) · g(f(r)).

Summing over all such x, we obtain

  Σ_{x∈V(T)\L} f(r) · g(f(x)) ≤ Σ_{x∈V(T)\L} Σ_{i=1}^{n} δ(x, i) f(ℓ_i) · g(f(r)) = Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i) · g(f(r)).

Dividing by f(r) and using f(r) = Σ_{i=1}^{n} f(ℓ_i) yields the result.
In case g is linear, all "≤" above can be replaced by "=", whence we obtain equality.
We now want to present Huffman's algorithm, which allows one to create a tree which minimizes (1/f(r)) Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i) among all additive binary trees with leaves ℓ_1, . . . , ℓ_n.
Instead of presenting the algorithm right away, we want to begin with an example run of the algorithm. We begin with ten non-negative real numbers, say

  7, 45, 11, 61, 13, 16, 23, 3, 20 and 5,

and create ten additive trees whose only node consists of one of these numbers. We order the trees by weight:

  3   5   7   11   13   16   20   23   45   61
We then begin by taking the two trees with the smallest weights (or some of them, if these are not unique), and combine them into a new tree, which we insert at the correct position:

[Figure: the forest 7, 8, 11, 13, 16, 20, 23, 45, 61; the new tree of weight 8 has the leaves 3 and 5.]
In the next step, we again take the two trees of smallest weight, and combine them:

[Figure: the forest 11, 13, 15, 16, 20, 23, 45, 61; the new tree of weight 15 combines 7 with the tree of weight 8.]
In the next step, we again take the two trees of smallest weight, and combine them:

[Figure: the forest 15, 16, 20, 23, 24, 45, 61; the new tree of weight 24 has the leaves 11 and 13.]
After another step, we have this forest:

[Figure: the forest 20, 23, 24, 31, 45, 61; the new tree of weight 31 combines the trees of weights 15 and 16.]
The next step results in the following forest:

[Figure: the forest 24, 31, 43, 45, 61; the new tree of weight 43 has the leaves 20 and 23.]
One more step:

[Figure: the forest 43, 45, 55, 61; the new tree of weight 55 combines the trees of weights 24 and 31.]
After three more steps, we obtain the following final result:

[Figure: the final tree of weight 204; its root has the children 88 = 43 + 45 and 116 = 55 + 61.]
The average height of this tree is

  (2 · (45 + 61) + 3 · (20 + 23) + 4 · (11 + 13 + 16) + 5 · 7 + 6 · (3 + 5)) / (3 + 5 + 7 + 11 + 13 + 16 + 20 + 23 + 45 + 61) = 584/204,

which is ≈ 2.8627, and the entropy is

  H(3/204, 5/204, 7/204, 11/204, 13/204, 16/204, 20/204, 23/204, 45/204, 61/204) ≈ 2.8412.

Now we state a formal description of the algorithm:
Input: elements x_1, . . . , x_n with non-negative real numbers p_1, . . . , p_n
Output: an additive tree with leaves x_1, . . . , x_n of minimal average height
(1) Create a list L;
(2) For i = 1, . . . , n:
    (a) Create an additive tree T with one node x_i with weight p_i;
    (b) Put T somewhere into L;
(3) Create a binary heap on L;
(4) While L contains more than one element:
    (a) Remove the two smallest trees T_1 and T_2 from L and restore the heap property;
    (b) Create a new additive tree T such that the two subtrees below the root are T_1 and T_2;
    (c) Insert T into the heap L;
(5) Return the only element in L.
Algorithm 4.6: Huffman's algorithm
Instead of keeping a heap, one could also sort the list L (see Section 4.3) and use binary search (see Section 4.1) to insert each new tree T at the correct position, so that the list L remains sorted. This is essentially what we did in the example above, and we would obtain the same asymptotic complexity for building the tree, assuming that insertion can be done in O(log n) operations.
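As an illustration, here is a compact Python sketch of Algorithm 4.6 based on the heapq module; the tuple representation (weight, left, right) for trees and the counter used for tie-breaking (so that heapq never has to compare two trees directly) are our own choices:

import heapq
from itertools import count

def huffman(weights):
    # Algorithm 4.6: repeatedly combine the two trees of smallest
    # weight. A tree is a tuple (weight, left, right); leaves have
    # left = right = None.
    tiebreak = count()
    L = [(w, next(tiebreak), (w, None, None)) for w in weights]
    heapq.heapify(L)                           # step (3)
    while len(L) > 1:                          # step (4)
        w1, _, T1 = heapq.heappop(L)
        w2, _, T2 = heapq.heappop(L)
        heapq.heappush(L, (w1 + w2, next(tiebreak), (w1 + w2, T1, T2)))
    return heapq.heappop(L)[2]

def weighted_height(T, h=0):
    # Sum of h(leaf) * weight(leaf) over all leaves of T.
    w, left, right = T
    if left is None:
        return h * w
    return weighted_height(left, h + 1) + weighted_height(right, h + 1)

T = huffman([7, 45, 11, 61, 13, 16, 23, 3, 20, 5])
print(weighted_height(T))   # 584, so the average height is 584/204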
Proposition 4.2.28. Algorithm 4.6 returns after O(n log n) operations.
Proof. Building the heap can be done in O(n log n) operations (Theorem 4.2.22 (d)). In iteration k ∈ {1, . . . , n} of the while loop, the heap contains n − k + 1 elements at the beginning. Therefore, removing the two smallest elements can be done in O(log(n − k + 1)) operations, and inserting the new tree can also be done in O(log(n − k + 1)) operations (Theorem 4.2.22 (a)–(c)). Therefore, the loop in total needs

  Σ_{k=1}^{n} O(log(n − k + 1)) ⊆ O(n log n)

operations.
We are left to show that the algorithm is correct, i.e. that the algorithm returns an additive tree with minimal average height.

Proposition 4.2.29. Algorithm 4.6 returns an additive tree of minimal average height. That is, if (T, r, f) is the output of the algorithm with leaves ℓ_1, . . . , ℓ_n, and (T′, r′, f′) is any other additive tree with leaves ℓ′_1, . . . , ℓ′_n such that f(ℓ_i) = f′(ℓ′_i) for all i, then

  Σ_{i=1}^{n} h(ℓ_i) f(ℓ_i) ≤ Σ_{i=1}^{n} h(ℓ′_i) f′(ℓ′_i).
Proof. Without loss of generality, we can assume that p1 ≤ · · · ≤ pn and n > 1.
First, note that there are only finitely many additive trees with leaf weights p1 , . . . , pn ;
therefore, we can pick one which minimizes the average path length. Let one such
tree be (T 0 , r0 , f 0 ), and let `01 , . . . , `0n be its leaves with f 0 (`0i ) = pi .
Let N be a node of T 0 which is not a leaf and whose h(N ) is maximal under this
condition. If `01 and `02 are not leaves of N , we can permute the (`0i )’s such that this
is the case, without increasing the average height. Thus, the children of N can be
assumed to be `01 and `02 . We replace T 0 |N by a leaf `002 of weight p1 + p2 , and obtain
a new additive tree (T 00 , r00 , f 00 ) with leaves `002 , . . . , `00n of weights p1 + p2 , p3 , . . . , pn .
We now claim that this additive tree also has minimal average height:
By Proposition 4.2.27 with g(x) = x, we have for L00 = {`002 , . . . , `00n } that
X
f 00 (x) =
x∈V (T 00 )\L00
n
X
h(`00i )f 00 (`00i ).
i=2
This shows that the average height of $T''$ equals the average height of $T'$ minus $p_1 + p_2$. If $(T'', r'', f'')$ did not have minimal average height, we could take another additive tree of minimal average height with the leaf weights $p_1 + p_2, p_3, \ldots, p_n$, replace the leaf corresponding to $p_1 + p_2$ with the subtree $T'|_N$, and obtain another additive tree with leaf weights $p_1, \ldots, p_n$ whose average height is smaller than that of $T'$, a contradiction.
By repeating this construction, we see that $T'$ must contain one possible output of Algorithm 4.6 as a subtree. Moreover, the above shows that the resulting average height of the output of Algorithm 4.6 does not depend on the choices made inside the algorithm.
Finally, we want to show that the average height of the generated tree is not too far away from the lower bound $p \cdot H(p_1/p, \ldots, p_n/p)$, where $p = \sum_{i=1}^{n} p_i$. More precisely:
Corollary 4.2.30. The output (T, r, f ) of Algorithm 4.6 satisfies
$$p \cdot H(p_1/p, \ldots, p_n/p) \le \sum_{i=1}^{n} h(\ell_i) f(\ell_i) < p \cdot (H(p_1/p, \ldots, p_n/p) + 1),$$
where $\ell_1, \ldots, \ell_n$ are the leaves of $T$ and $p = \sum_{i=1}^{n} p_i$.
Proof. In case p = 0, the statement is clear, since this implies pi = 0 for all i. Thus,
assume that p > 0. The lower bound follows from the first part of Theorem 4.2.26,
as it is valid for all additive trees with weight p > 0. Without loss of generality, we
can assume that p = 1.
The second part of Theorem 4.2.26 shows that there exists an additive tree whose average height is $< p \cdot (H(p_1/p, \ldots, p_n/p) + 1)$. Since the average height of the output of Algorithm 4.6 is minimal by Proposition 4.2.29, the upper bound follows as well.
We therefore have shown the following result:
Corollary 4.2.31. Given polynomials $f_1, \ldots, f_n \in R[X]$ of degree $\ge 1$, it is possible to compute the product $f := \prod_{i=1}^{n} f_i$ in less than
$$M(\deg f) \cdot (H(\deg f_1/\deg f, \ldots, \deg f_n/\deg f) + 1) \le M(\deg f) \cdot (\log_2 n + 1)$$
arithmetic operations in $R$, with an administrative cost of $O(n \log n)$ operations.
Proof. Let $(T, r, f)$ denote the additive tree output by Algorithm 4.6 for the leaf weights $\deg f_1, \ldots, \deg f_n$. Building the tree is the administrative cost, which is $O(n \log n)$ by Proposition 4.2.28. Let $\ell_1, \ldots, \ell_n$ be the leaves of $T$, where $f(\ell_i) = \deg f_i$.
By Proposition 4.2.27, the total running time of multiplying along the tree $T$ is bounded by $M(\deg f) \cdot \sum_{i=1}^{n} h(\ell_i) \frac{\deg f_i}{\deg f}$. Next, Corollary 4.2.30 yields
$$\sum_{i=1}^{n} h(\ell_i) \frac{\deg f_i}{\deg f} < H(\deg f_1/\deg f, \ldots, \deg f_n/\deg f) + 1,$$
and Lemma 4.2.23 gives $H(\deg f_1/\deg f, \ldots, \deg f_n/\deg f) \le \log_2 n$.
Note that the algorithm is essentially equivalent to the following algorithm, which
avoids talking about (Huffman) trees. The statements of Corollary 4.2.31 also hold
for this algorithm:
Input: polynomials $f_1, \ldots, f_n \in R[X]$
Output: the product $\prod_{i=1}^{n} f_i$
(1) Put the polynomials into a list L and sort the list by degrees;
(2) While L contains more than one element:
(a) Remove the two first elements g, h of L;
(b) Compute f := g · h;
(c) Insert f into L such that L is still sorted by degree;
(3) Return the unique element in L.
Algorithm 4.7: Computing $\prod_{i=1}^{n} f_i$
In fact, several results which use multiplication trees can be extended using Huffman trees. We will see some such examples in Chapter 5.
4.3 Sorting
There exist many different sorting algorithms. We want to discuss two of them: one slow and simple, namely selection sort, and one which is fast (in fact, asymptotically optimal), namely heap sort. The algorithm used in Python is called Timsort, a hybrid algorithm; see Wikipedia for more information. Its asymptotic complexity is optimal as well, but it is better tuned for many “every-day situations”. There are many other algorithms, both asymptotically optimal ones and slow but simple ones; a nice overview can be found on Wikipedia as well. Let us first begin with a theorem from complexity theory:
Theorem 4.3.1. For any sorting algorithm which runs on a classical computer and knows nothing about the data except a way to compare two elements with each other, there exists an input $(x_1, \ldots, x_n)$ such that the algorithm needs at least $\lceil \log_2(n!) \rceil \in \Theta(n \log n)$ comparisons to sort it.
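The reason for the bound is that such an algorithm must distinguish between the $n!$ possible orderings of its input, and every (binary) comparison has at most two outcomes, so at least $\lceil \log_2(n!) \rceil$ comparisons are needed in the worst case. That $\log_2(n!) \in \Theta(n \log n)$ can be seen directly:
$$\log_2(n!) = \sum_{k=1}^{n} \log_2 k \le n \log_2 n
\qquad \text{and} \qquad
\log_2(n!) \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2}.$$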
For more details, see Wikipedia. Also note that if more is known about the data, faster algorithms might exist; one example is the well-known bucket sort.
Before we begin with describing the sorting algorithms, we want to explain how to sort data in Python. There are two ways to sort data. The first is the sorted() function, which can be applied to any sequence object (for example, tuples and lists; compare Section 2.3.2) and returns a new, sorted list. The second is the sort() member function of lists: it sorts the list in-place (i.e. without creating a new list). By default, both functions use <, == and > to compare elements, but this behavior can be changed in two ways. In the following, let x1, . . . , xn ∈ X be some data which shall be sorted.
Sorting using Keys  Assume that we have a function f : X → Y such that objects of type Y can be compared using <, == and >. Then Python can sort x1, . . . , xn such that f(x1) ≤ · · · ≤ f(xn) as follows:

x = ["house", "forest", "kid", "witch"]
x.sort(key=len)
Then the function len will be applied to every element of x to obtain the keys f(xi). The result will be:

x == ["kid", "house", "witch", "forest"]

Note that the relative order of "house" and "witch" (both have five letters) is not changed; this is because Timsort is stable. (With heap sort, it could happen that the relative order of such elements changes.)
Note that one can reverse the sorting order by using the additional argument reverse=True:

x = ["house", "forest", "kid", "witch"]
x.sort(key=len, reverse=True)
This yields:
x == ["forest", "house", "witch", "kid"]
Note that here, too, the original relative order of "house" and "witch" is unchanged; we only have f(x1) ≥ · · · ≥ f(xn) now.
Note that one can also use user-defined functions:

def getSecondElement(x):
    return x[1]

x = [("house", 13), ("forest", 15), ("kid", 1), ("witch", 38)]
x.sort(key=getSecondElement)
This yields:
x == [("kid", 1), ("house", 13), ("forest", 15), ("witch", 38)]
The list is sorted by the second element of each tuple it contains.
Sorting using Comparison Functions  By default, Python uses <, >, == to compare two elements. To change the sorting order fundamentally (or to define it in the first place), one possibility is to redefine these operators; this is usually not a good idea. A better idea is to pass a comparison function to sort(), which returns a negative number, zero or a positive number to indicate that the first element is less than, equal to or larger than the second, respectively:
def compareTwoTuples(x, y):
    c = cmp(x[1], y[1])
    if c != 0:
        return c
    return cmp(len(x[0]), len(y[0]))

x = [("forest", 13), ("house", 13), ("kid", 1), ("witch", 5)]
x.sort(cmp=compareTwoTuples)
(The function cmp(x, y) returns -1 if x < y, 0 if x == y and 1 if x > y. It can be
implemented for own classes by defining a method __cmp__(self, y) inside the class.
If this method is not available, Python uses <, == and > to determine their order.)
This comparison function first compares the second element of the tuple (in the
usual manner). If the second elements are equal, it considers the length of the first
element. This yields:
x == [("kid", 1), ("witch", 5), ("house", 13), ("forest", 13)]
Note that here, one can also use reverse=True to obtain the reversed order.
Also note that specifying a comparison function is more powerful than specifying
a key function, but also slower, since Python uses the key function once for each
input element, while it has to call the cmp function for each comparison (and by
Theorem 4.3.1, this can happen Θ(n log n) times).
Finally, one can also combine key and cmp: then cmp will be used to compare key(x[0]), . . . , key(x[n-1]). It is possible to add reverse=True here as well.
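For example, the following small variation on the example above (our own illustration) sorts by the second component in descending order; note that the cmp function now receives the keys, not the tuples themselves:

x = [("forest", 13), ("house", 13), ("kid", 1), ("witch", 5)]
x.sort(key=getSecondElement, cmp=lambda a, b: cmp(b, a))
# x == [("forest", 13), ("house", 13), ("witch", 5), ("kid", 1)]

Similarly, the __cmp__ method mentioned above can be illustrated with a minimal, hypothetical class (the class and its fields are our own example; we assume positive denominators, so that cross-multiplication preserves the order):

class Fraction(object):
    def __init__(self, num, den):
        self.num, self.den = num, den   # assumes den > 0
    def __cmp__(self, other):
        # compare num/den with other.num/other.den by cross-multiplying
        return cmp(self.num * other.den, other.num * self.den)

x = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
x.sort()   # uses Fraction.__cmp__: 1/3 < 1/2 < 3/4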
As an application of everything discussed above, we want to show how Algorithm 4.7 can be implemented in Python:
def multiply_list(list):
    if len(list) == 0: return None
    # Sort list by degree of entries, and store
    # the result in a new list
    list = sorted(list, key=len)
    # (we use that len(poly) will return deg+1)
    while len(list) > 1:
        # Remove the two first elements
        f1 = list[0]
        f2 = list[1]
        del list[0:2]
        # Compute their product
        f = f1 * f2
        # Insert the product at the right position.
        # Here, we cheat by simply sorting. This
        # is *NOT* optimal!
        list.append(f)
        list.sort(key=len)
    return list[0]
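As a possible improvement (a sketch of our own, not part of the text's algorithms), the bisect module can locate the insertion position by binary search, so that only the list insertion itself still shifts elements:

import bisect

def multiply_list_bisect(polys):
    if len(polys) == 0: return None
    polys = sorted(polys, key=len)
    # parallel list of keys (len(poly) == deg + 1) for bisect
    keys = [len(f) for f in polys]
    while len(polys) > 1:
        f = polys[0] * polys[1]
        del polys[0:2]
        del keys[0:2]
        i = bisect.bisect(keys, len(f))   # binary search for the position
        polys.insert(i, f)                # note: insert still shifts, O(n)
        keys.insert(i, len(f))
    return polys[0]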
We can use multiply_list as follows:
>>> import polynomials
>>> import rings
>>> R = rings.Integers()
>>> x = [polynomials.Polynomial(R, [1, 2]),
...      polynomials.Polynomial(R, [4, 5, 6]),
...      polynomials.Polynomial(R, [4, 5, 6, 7, 8, 9]),
...      polynomials.Polynomial(R, [1, 2, 3, 4])]
>>> f = multiply_list(x)
>>> print f
(432 * X^11 + 1284 * X^10 + 2252 * X^9 + 2871 * X^8 + 2986 * X^7 + 2750 * X^6
 + 2196 * X^5 + 1494 * X^4 + 820 * X^3 + 345 * X^2 + 104 * X^1 + 16)
>>> g = x[0] * x[1] * x[2] * x[3]
>>> print f == g
True

4.3.1 Selection Sort
Selection sort is a very simple sorting algorithm. It works as follows: given n data elements x1, . . . , xn, it first looks through the whole list to find the smallest element, say xi. Then it moves xi to the first position and continues recursively with the remaining list (x1, . . . , xi−1, xi+1, . . . , xn). This can be done in-place3 as follows:
def selection_sort(v):
    for i in xrange(len(v)):
        # Find index with smallest v[current] for i <= current < len(v)
        current = i
        for j in xrange(i + 1, len(v)):
            if v[j] < v[current]:
                current = j
        # Swap v[i] with v[current]
        v[i], v[current] = v[current], v[i]
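A quick usage example (our own check):

v = [5, 2, 4, 1, 3]
selection_sort(v)
print v   # prints [1, 2, 3, 4, 5]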
Note that in iteration i, one has to do len(v) - i - 1 comparisons. Therefore, if the length of v is n, then the total number of comparisons is
$$\sum_{i=0}^{n-1} (n - i - 1) = n^2 - \sum_{i=0}^{n-1} i - n = n^2 - \tfrac{1}{2} n(n-1) - n = \tfrac{1}{2} n(n-1).$$
Therefore, we obtain:
Theorem 4.3.2 (Selection Sort). The running time of selection sort is in Θ(n2 ).
Note that selection sort is one of the few sorting algorithms whose running time does not depend on the input data (except for its length and the time needed for each comparison, of course). Most algorithms have certain inputs for which they are very fast (usually Θ(n) operations), and others for which they are slow (often ranging between O(n log n) and O(n²) operations).
3
That is, without creating new lists, but by just modifying the current list.
4.3.2 Heap Sort
The idea is very simple: since a heap is a priority queue which allows one to quickly retrieve the smallest element, we create a heap of the given input data and then iteratively remove the smallest element and append it to the destination list, until the heap is empty.
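In Python, this simple (not in-place) variant can be sketched with the heapq module (the function name is ours):

import heapq

def heap_sort_simple(data):
    heap = list(data)
    heapq.heapify(heap)   # turn the copy into a heap
    # pop the smallest element until the heap is empty
    return [heapq.heappop(heap) for _ in xrange(len(heap))]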
It is also possible to do this in-place. For this, one usually creates the heap such that the root is the largest element (i.e. we use the reversed order). The heap can be created in-place in the list (x1, . . . , xn) of original data, as described in Section 4.2.4.4 (representing the heap in a linear fashion) and Section 4.2.4.3 (turning unsorted data into a heap). After this, we iteratively remove the largest element. Since each removal reduces the size of the heap by one, the position directly behind the shrunken heap becomes free, and we can store the element which was just removed there:
Input: unsorted data x1 , . . . , xn
Output: the data rearranged such that it is sorted
(1) Use Algorithm 4.5 to turn (x1 , . . . , xn ) into the linear representation of a heap,
with reversed order such that the root is always the largest element;
(2) For k from n, n − 1, . . . , 2, do:
(a) Set t := x1 ;
(b) Remove the largest element x1 from the heap, which is represented by the
elements (x1 , . . . , xk );
(c) Afterwards, the heap will be represented by (x1 , . . . , xk−1 );
(d) Set xk := t;
(3) Return (x1 , . . . , xn ).
Algorithm 4.8: Heap sort
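A minimal in-place sketch of Algorithm 4.8 might look as follows; sift_down is our own name for the routine that restores the heap property (cf. Section 4.2.4), and we assume the usual linear representation where the children of index i are at 2i + 1 and 2i + 2:

def sift_down(v, i, size):
    # Restore the max-heap property below index i in v[:size].
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and v[left] > v[largest]:
            largest = left
        if right < size and v[right] > v[largest]:
            largest = right
        if largest == i:
            return
        v[i], v[largest] = v[largest], v[i]
        i = largest

def heap_sort(v):
    n = len(v)
    # Step (1): build a max-heap bottom-up (cf. Algorithm 4.5)
    for i in xrange(n // 2 - 1, -1, -1):
        sift_down(v, i, n)
    # Step (2): repeatedly move the largest element behind the heap
    for k in xrange(n - 1, 0, -1):
        v[0], v[k] = v[k], v[0]   # steps (a)-(d) combined
        sift_down(v, 0, k)        # restore the heap property on v[:k]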
Theorem 4.3.3 (Heap Sort). Algorithm 4.8 sorts n elements of unsorted data in
O(n log n) operations. Therefore, it is an asymptotically optimal sorting algorithm.
Proof. By Proposition 4.2.20, building the heap can be done in O(n) operations. In
loop iteration k, the heap has size k, whence removing the largest element can be
done in O(log k) operations by Proposition 4.2.19. Therefore, the total number of
operations is in
$$O(n) + \sum_{k=1}^{n} O(\log k) \subseteq O(n \log n).$$