Chapter 4 Algorithms and Data Structures

In this chapter, we discuss certain algorithms and data structures that are needed to implement some of the algorithms in Chapter 5, and that allow us to accelerate these algorithms in certain cases.

The first aspect we look at is searching for data. Then we consider binary trees, which have many applications in computer science. Many powerful data structures, such as dictionaries (see Section 2.3.5), are usually implemented using (balanced) binary trees.¹ Binary trees can also be used to sort data efficiently, or to speed up mathematical operations such as computing $f \bmod g_i$ for polynomials $f, g_1, \dots, g_n \in R[X]$ when the degrees of the $g_i$ are not close to each other. We will use this in Chapter 5.

Finally, we consider sorting data sets. We show how to sort using Python, how to implement a simple sorting algorithm (Section 4.3.1), and how to use binary heaps, a special kind of binary tree, to create an asymptotically optimal sorting algorithm (Section 4.3.2).

Note that essentially all algorithms in this chapter have running times $O(\log n)$ or $O(n \log n)$ (except the one in Section 4.3.1).

4.1 Searching

If we are given a list $L = (a_0, \dots, a_{n-1})$ of unsorted data and want to find a specific element $x$, we need $n$ comparisons in the worst case to find out whether $a_i = x$ for some $i$ and, if so, to determine such an $i$.

If the list is sorted, we can speed this up dramatically using binary search. This is another classical divide-and-conquer algorithm, in which the input problem is split into two halves. One compares the sought element $x$ with the element in the middle, $a_{\lfloor n/2 \rfloor}$. If $x < a_{\lfloor n/2 \rfloor}$, then $x$ must be in the first half of the list, if it occurs in the list at all. If $x > a_{\lfloor n/2 \rfloor}$, then $x$ must be in the second half of the list.

¹ The dictionaries in Python, though, are implemented using hash tables.
But, for example, in C++, dictionaries (there called maps) are implemented using trees.

Input: A sorted list $L = (a_0, \dots, a_{n-1})$ in ascending order (i.e. $a_i \le a_j$ for $i < j$), and an element $x$
Output: An index $i \in \{0, \dots, n-1\}$ with $a_i = x$ in case it exists, or $\emptyset$ otherwise.

1. If $n = 0$, return $\emptyset$;
2. Let $k = \lfloor n/2 \rfloor$;
3. Compare $a_k$ to $x$:
   • If $a_k = x$, return $k$;
   • If $a_k > x$, apply Algorithm 4.1 recursively to $(a_0, \dots, a_{k-1})$; if the result is $i$, return $i$;
   • If $a_k < x$, apply Algorithm 4.1 recursively to $(a_{k+1}, \dots, a_{n-1})$; if the result is $i$, return $\emptyset$ in case $i = \emptyset$ and $i + k + 1$ otherwise.

Algorithm 4.1: Binary search

Proposition 4.1.1. Algorithm 4.1 is correct and needs at most $1 + \lfloor \log_2 n \rfloor \in O(\log n)$ comparisons for $n \ge 1$.

Proof. The correctness is clear in case $L$ is sorted in ascending order. For the number of comparisons, denote by $T(n)$ the maximal number of comparisons needed by Algorithm 4.1 if it is given a list of length $n$. We will show $T(n) \le 1 + \lfloor \log_2 n \rfloor$, a bound which is monotonic in $n$. Clearly, $T(1) = 1$, whence the inequality holds for $n = 1$. Let $n \ge 2$. Note that $(n-1) - (k+1) + 1 = n - k - 1 \le k$, since $n \le 2k + 1$; hence both sublists have length at most $k$. Therefore, by induction and monotonicity,

$T(n) \le 1 + (1 + \lfloor \log_2 k \rfloor) = 1 + \lfloor 1 + \log_2 \lfloor n/2 \rfloor \rfloor = 1 + \lfloor \log_2 (2 \lfloor n/2 \rfloor) \rfloor \le 1 + \lfloor \log_2 n \rfloor.$

In case nothing is known about the order of the elements, it can be shown that searching on a classical computer is in $\Theta(n)$. Interestingly, on a quantum computer, such searches can be done in $O(\sqrt{n})$ operations (Grover search).

4.2 Binary Trees

Trees are one way to store data in a non-linear fashion. If the trees are search trees, one can efficiently search for elements in them; for balanced trees, the complexity of searching for an element, inserting an element, etc. is $O(\log n)$, where $n$ is the number of elements stored in the tree.
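Returning briefly to Algorithm 4.1 from Section 4.1: binary search can be sketched in Python as follows. This is only a minimal sketch, not the notes' own implementation. The function name `binary_search` and the use of `None` for $\emptyset$ are assumptions of this sketch, and the recursion on sublists is replaced by a loop over index bounds so that no sublists are copied.

```python
def binary_search(a, x):
    """Return an index i with a[i] == x, or None if x does not occur.

    Assumes that the list a is sorted in ascending order.
    """
    lo, hi = 0, len(a)               # the current sublist is a[lo:hi]
    while lo < hi:                   # sublist length n = hi - lo is positive
        k = lo + (hi - lo) // 2      # middle element, i.e. floor(n/2) in the sublist
        if a[k] == x:
            return k
        if a[k] > x:
            hi = k                   # continue in the first half a[lo:k]
        else:
            lo = k + 1               # continue in the second half a[k+1:hi]
    return None                      # empty sublist: x is not in the list


data = [1, 5, 12, 13, 23, 42]
print(binary_search(data, 13))       # index of 13 in data
print(binary_search(data, 7))        # None, since 7 does not occur
```

Each pass through the loop performs one three-way comparison with the middle element and at least halves the sublist, which mirrors the counting argument in the proof of Proposition 4.1.1.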
In this section we introduce trees and forests, present balanced trees, discuss heaps and priority queues, and finally discuss Huffman trees and their relation to entropy.

All trees in this section are binary, which means that every node has at most two children. One can also consider more general trees, but we do not need them in this course.

In graph theory, a binary tree can be described as a connected undirected graph without circles, loops and multiple edges, where the degree of every node is $\le 3$ and the degree of at least one node is $\le 2$. If we drop the requirement of being connected, the result is a forest; a forest is a disjoint union of trees. The root of a tree is a node with degree $\le 2$. If a root is fixed, and $a$, $b$ are two nodes connected by an edge, we say that $a$ is the parent of $b$ and $b$ is a child of $a$ if $a$ is closer to the root. The leaves of a tree are precisely the non-root nodes of degree one, together with the root in case it has degree zero.

From the computer science perspective, a binary tree looks like this:

[Figure: a binary tree consisting of a root, inner nodes and leaves]

while a forest of five trees with roots $r_1, \dots, r_5$ looks like this:

[Figure: a forest of five trees with roots $r_1, \dots, r_5$]

We call the distance between a node and the root of its tree the height of that node. The height of the tree is the maximal height of all its nodes. For example, the first tree has height 5, while the five trees in the forest have heights 3, 4, 3, 4 and 2, respectively. Note that if we fix a node of a tree, we can look at its subtree: these are the nodes which are children of this node, their own children, etc.

4.2.1 A Formal Approach

A more formal approach to binary trees and forests requires some background from graph theory. In this section, we provide the necessary definitions and some results. We restrict ourselves to undirected graphs without loops.

Definition 4.2.1.
(i) A graph $G$ is a pair $G = (V, E)$, where $E \subseteq \{e \subseteq V \mid |e| = 2\}$.
The set $V$ is called the vertex set of $G$ and denoted by $V(G)$, and the set $E$ is called the edge set of $G$ and denoted by $E(G)$. An element $e = \{x, y\} \in E$ is called an edge of $G$, and often written as $xy$. An element $x \in V$ is called a vertex of $G$.
(ii) If $x \in V(G)$ and $e \in E(G)$, then we say that $e$ is incident to $x$ if and only if $x \in e$. We denote by $E(x)$ or $E_G(x)$ the set of edges which are incident to $x$, and by $\delta_G(x) := |E_G(x)|$ the degree of $x$.
(iii) Two vertices $x, y \in V(G)$ are said to be adjacent if $\{x, y\} \in E(G)$, and two edges $e_1, e_2 \in E(G)$ are said to be adjacent if $|e_1 \cap e_2| = 1$.
(iv) A subgraph of a graph $G = (V, E)$ is a graph $G' = (V', E')$ with $V' \subseteq V$, $E' \subseteq E$.
(v) Given a subset $V' \subseteq V$, where $G = (V, E)$ is a graph, the subgraph induced by $V'$ is the graph $G[V'] := (V', E')$, where $E' = \{e \in E \mid e \subseteq V'\}$.

Remark 4.2.2. For every subset $V' \subseteq V(G)$, $G[V']$ is a subgraph of $G$.

Definition 4.2.3.
(i) A way of length $n$ in a graph $G$ is a sequence of vertices $(x_0, \dots, x_n)$ such that for all $i \in \{1, \dots, n\}$, $\{x_{i-1}, x_i\} \in E(G)$. We denote the way by $x_0 x_1 \cdots x_n$. (Note that $n = 0$ is possible.) We say that $x_0 \cdots x_n$ connects $x_0$ with $x_n$.
(ii) A graph $G$ is connected if for every two vertices $x, y \in V(G)$, there exists a way connecting $x$ with $y$.
(iii) A connected component of a graph $G$ is a subgraph $G[V']$, where $V' \subseteq V(G)$ is maximal with respect to inclusion such that $G[V']$ is connected.
(iv) A way $x_0 \cdots x_n$ is called a path if $x_i \ne x_j$ for $i \ne j$.
(v) A way $x_0 \cdots x_n$ with $n \ge 3$ is a circle if $x_n = x_0$ and $x_0 \cdots x_{n-1}$ is a path. (For $n = 2$, walking along a single edge and back would otherwise count as a circle.)

Proposition 4.2.4. If $G$ is a graph and $\{G_i \mid i \in I\}$ the set of connected components of $G$, then $V(G) = \bigcup_{i \in I} V(G_i)$ and $E(G) = \bigcup_{i \in I} E(G_i)$, and both unions are disjoint. In particular, for every $x \in V(G)$, there exists a unique connected component $G_i$ with $x \in V(G_i)$. We write $G(x) := G_i$.

Definition 4.2.5.
(i) A forest is a graph which has no circle.
A connected forest is called a tree. A forest, respectively tree, is called binary if all vertex degrees are $\le 3$.
(ii) A rooted (binary) tree is a pair $(G, r)$ such that $G$ is a (binary) tree, $r \in V(G)$, and in case it is binary, $\delta(r) \le 2$. We call $r$ the root of $(G, r)$.
(iii) A rooted (binary) forest is a pair $(G, (r_i)_{i \in I})$ such that
   • $r_i$ and $r_j$ are not connected for $i \ne j$;
   • $(G(r_i), r_i)$ is a rooted (binary) tree;
   • $\{G(r_i) \mid i \in I\}$ is the set of connected components of $G$.
   The vertices $r_i$ are called the roots of the trees in $(G, (r_i)_i)$.
(iv) If $G$ is a (rooted) forest or (rooted) tree, we call a vertex $x \in V(G)$ a leaf if and only if
   (1) $x$ is a root (in case $G$ is rooted) and $\delta(x) = 0$; or
   (2) $x$ is not a root (in case $G$ is rooted) and $\delta(x) = 1$.

Remark 4.2.6. A rooted (binary) tree $(G, r)$ is also a rooted (binary) forest $(G, (r)_{\{1\}})$.

Proposition 4.2.7. Let $G$ be a forest.
(a) If $x, y \in V(G)$, there exists at most one path connecting $x$ to $y$. Such a path exists if and only if $y \in V(G(x))$. The path equals the shortest way connecting $x$ to $y$.
(b) If $(G, (r_i)_{i \in I})$ is a rooted forest and $x \in V(G(r_i))$, then the height $h(x)$ of $x$ is the length of the unique path connecting $r_i$ to $x$.

Definition 4.2.8. Let $(G, (r_i)_{i \in I})$ be a rooted forest. Let $x, y \in V(G)$.
(i) We say that $x$ is a child of $y$ and $y$ a parent of $x$ if and only if $x$ and $y$ are connected by a path of length 1 and $h(x) = h(y) + 1$.
(ii) The height of $G(r_i)$ is defined as $\max\{h(x) \mid x \in V(G(r_i))\}$.
(iii) The vertices of $G$ are also called nodes.
(iv) The set $S_x$ of grandchildren of $x \in V(G)$ is the set of nodes $y \in V(G)$ such that there exists a path $x_0 \cdots x_n$ with $x_0 = x$, $x_n = y$ such that $x_i$ is a child of $x_{i-1}$, $1 \le i \le n$. Denote $G[S_x]$ by $G|_x$, which we will call the subtree starting at $x$.

Remarks 4.2.9. Let $(G, (r_i)_{i \in I})$ be a rooted (binary) forest and $x \in V(G)$.
(a) Then $x$ is a leaf if and only if $x$ has no children.
(b) If $(G, (r_i)_{i \in I})$ is binary, every node has at most two children.
(c) The roots $r_i$ are the only nodes of height 0, and the only nodes which have no parent. All other nodes have precisely one parent (which we from now on call the parent).
(d) $(G|_x, x)$ is a rooted (binary) tree. Moreover, $G|_{r_i} = G(r_i)$.
(e) If $G(x)$ is a tree of height $n$, then $G|_x$ is a tree of height $\le n - h(x)$.

Lemma 4.2.10. A rooted binary tree has at most $2^n$ nodes of height $n$. In particular, a binary tree of height $n$ has at most $2^{n+1} - 1$ nodes and at most $2^n$ leaves.

Proof. Let $(G, r)$ be a rooted binary tree. We show the first claim by induction. Clearly, $h(x) = 0$ if and only if $x = r$, whence the number of nodes of height 0 equals $1 = 2^0$. Let $S_n$ be the set of nodes of height $n$. Assuming $|S_n| \le 2^n$, we see that the map $S_{n+1} \to S_n$, mapping each node to its parent, is at most two-to-one, since every node has at most two children. Therefore, $|S_{n+1}| \le 2 \cdot |S_n| \le 2^{n+1}$.

Now assume that the height of $G$ is $n$. Then $S_k = \emptyset$ for $k > n$, whence

$|V| = \sum_{k=0}^{n} |S_k| \le \sum_{k=0}^{n} 2^k = \frac{2^{n+1} - 1}{2 - 1} = 2^{n+1} - 1.$

For the statement about the number of leaves, we proceed by induction on the height of $G$. Clearly, a rooted binary tree of height 0 has precisely one node, which is a leaf. Thus the statement is true for such trees. Now assume that the statement is true for all rooted binary trees of height $\le n$. Let $(G, r)$ be a rooted binary tree of height $n + 1$, and let $x_i$, $i \in I$ with $I \in \{\{1\}, \{1, 2\}\}$, be all children of $r$. By the previous remark, $G|_{x_i}$ is a rooted binary tree of height $\le (n + 1) - h(x_i) = (n + 1) - 1 = n$; therefore, the number of leaves of $G|_{x_i}$ is at most $2^n$.

Now $V(G) = \{r\} \cup \bigcup_{i \in I} V(G|_{x_i})$, whence the number of leaves of $G$ equals the sum of the numbers of leaves of the $G|_{x_i}$, $i \in I$. Therefore, the total number of leaves of $G$ is bounded by $|I| \cdot 2^n \le 2 \cdot 2^n = 2^{n+1}$, which is what we wanted to show. By induction, the claim follows.

Definition 4.2.11.
(i) A binary tree is called perfect if it contains precisely $2^{n+1} - 1$ nodes and has height $n$.
(ii) A binary tree is called complete if it contains precisely $2^k$ nodes of height $k$ for all $k$ less than its height.

Remarks 4.2.12.
(a) In a perfect binary tree, every node is either a leaf, or has precisely two children.
(b) In a complete binary tree of height $n$, all nodes of height $< n - 1$ have precisely two children. The nodes of height $n - 1$ can have between zero and two children.
(c) A perfect binary tree of height $n$ has precisely $2^{n+1} - 1$ nodes, and a complete binary tree of height $n$ has at least $2^n$ and at most $2^{n+1} - 1$ nodes.
(d) A perfect binary tree of height $n$ has precisely $2^n$ leaves, and a complete binary tree of height $n \ge 1$ has at least $2^{n-1}$ and at most $2^n$ leaves. (The lower bound is attained when level $n$ consists of a single node: then the remaining $2^{n-1} - 1$ nodes of height $n - 1$ are leaves as well.)

The following graph depicts a rooted binary forest, whose two trees are both of height 4, and one is perfect ($G(r_1)$) and the other one complete ($G(r_2)$):

[Figure: a rooted binary forest with roots $r_1$ and $r_2$]

4.2.2 Search Trees and Balanced Trees

A binary tree with data is a tree whose nodes contain data. More precisely, let $(T, r)$ be a rooted binary tree, and let $X$ be an arbitrary set. Let $d : V(T) \to X$ be a map: this map associates to every node $x \in V(T)$ of the tree one element of $X$, the data stored in this node. Then a binary rooted tree with data is the tuple $(T, r, d)$.

There are many ways to implement binary trees with data. In Python, one could do this using classes:

    class Node(object):
        left = None    # left child (a Node) or None
        right = None   # right child (a Node) or None
        data = None    # the data stored in this node

        def __init__(self, data):
            self.data = data

Each Node object contains references to at most two children, which are stored in Node.left and Node.right. Sometimes, it is useful to also add a reference to the parent. It is possible to “walk through the tree” and print the data using the following function:

    def printTree(root):
        if root is None:
            return
        print(root.data)         # visit the node first, then its subtrees
        printTree(root.left)
        printTree(root.right)

If we create a tree as follows:

    def createTestTree():
        n1 = Node("This")
        n2 = Node("is")
        n2.left = n1
        n3 = Node("some")
        n4 = Node("test")
        n4.right = n3
        n4.left = n2
        n5 = Node("tree")
        root = Node("Root")
        root.left = n4
        root.right = n5
        return root

Then we can do the following:

    >>> root = createTestTree()
    >>> printTree(root)
    Root
    test
    is
    This
    some
    tree

Note that the way printTree() traverses the tree is called a depth-first traversal. We can also modify printTree() so that it outputs the tree structure:

    def printTree2(root, ind=0):
        if root is None:
            return
        print(" " * ind + str(root.data))   # indent by the depth of the node
        printTree2(root.left, ind + 2)
        printTree2(root.right, ind + 2)

Then we obtain:

    >>> root = createTestTree()
    >>> printTree2(root)
    Root
      test
        is
          This
        some
      tree

It is also possible to get a more “graphical” structure similar to the one we used before:

    def printTree3(root, ind=0):
        if root is None:
            return
        printTree3(root.left, ind + 2)      # left subtree first
        print(" " * ind + str(root.data))
        printTree3(root.right, ind + 2)

Then we obtain:

    >>> root = createTestTree()
    >>> printTree3(root)
          This
        is
      test
        some
    Root
      tree

This has to be read as follows:

[Figure: the tree with root “Root”, whose children are “test” (left) and “tree” (right); “test” has the children “is” (left) and “some” (right), and “is” has the left child “This”]

Note that this is a bit more than a binary rooted tree with data $(T, r, d)$: there is an explicit order on the children, by distinguishing them as “left” and “right” children. Also, if a node has only one child, it can be either a “left” or a “right” child. This can be formulated mathematically by adding another map $\ell : V(T) \setminus \{r\} \to \{\text{left}, \text{right}\}$. (One could also define two maps which map each node to its left child and its right child; but then we need a value for “has no such child”.) Such a quadruple $(T, r, d, \ell)$ is what is meant when talking about binary trees in Computer Science.
We call such a quadruple a binary CS-tree. We can now also say what a binary search tree is.

Definition 4.2.13. Let $(T, r, d, \ell)$ be a binary CS-tree, and assume that $(X, \le)$ is a totally ordered set, where $d : V(T) \to X$. Then $(T, r, d, \ell)$ is a binary search CS-tree if for every $x \in V(T)$, both of the following hold:
• if $x$ has a left child $x_1$, then all grandchildren $x_1'$ of $x_1$ satisfy $d(x_1') \le d(x)$;
• if $x$ has a right child $x_2$, then all grandchildren $x_2'$ of $x_2$ satisfy $d(x_2') \ge d(x)$.

Note that each node is a grandchild of itself. Thus if node $x$ has the left child $x_1$ and the right child $x_2$, we have in particular $d(x_1) \le d(x) \le d(x_2)$. Colloquially, nodes further to the left in a binary search CS-tree are less than or equal to nodes further to the right. For example, the following tree is a search tree (using the natural ordering on $\mathbb{N}$):

[Figure: a binary search tree with root 23 containing the values 1, 5, 13, 21, 23 and 42; 13 is the left child of 23, and 21 is a grandchild of 13]

If we replaced 21 by 31, it would no longer be a search tree, since one grandchild of 13 would be larger than 23, while 13 is the left child of 23.

The main property of binary search CS-trees is that they allow efficient searching:

Theorem 4.2.14. Given a binary search CS-tree of height $n$, we can test whether it contains an element and, if so, find this element in $O(n)$ comparisons. In case the tree is complete and has $m$ nodes, one can search for elements in $O(\log m)$ comparisons.

The second statement follows from the fact that a complete tree of height $n$ has between $2^n$ and $2^{n+1} - 1$ nodes, whence $2^n \le m < 2^{n+1}$ and thus $n \le \log_2 m < n + 1$.

One problem with binary search CS-trees is that creating them is quite easy (to insert a new element, one essentially searches for it, and then one knows where it has to be inserted), but that they usually are not complete. The further they are from being complete, the larger their height is compared to $\log_2 |V(T)|$, and the slower searching becomes.
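The search-then-insert procedure described above can be sketched in Python. This is a minimal sketch and not part of the original notes: the class `BSTNode` and the functions `bst_insert`, `bst_contains` and `bst_height` are hypothetical names, and sending equal values to the right subtree is an assumption (the definition would also allow the left subtree).

```python
class BSTNode:
    """Node of a binary search CS-tree (hypothetical helper for this sketch)."""

    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def bst_insert(root, x):
    """Insert x and return the root; equal values are sent to the right."""
    if root is None:
        return BSTNode(x)
    node = root
    while True:
        if x < node.data:
            if node.left is None:        # free position found on the left
                node.left = BSTNode(x)
                return root
            node = node.left
        else:
            if node.right is None:       # free position found on the right
                node.right = BSTNode(x)
                return root
            node = node.right


def bst_contains(root, x):
    """Follow one path from the root: at most height + 1 nodes are visited."""
    node = root
    while node is not None:
        if x == node.data:
            return True
        node = node.left if x < node.data else node.right
    return False


def bst_height(root):
    """Height as in the text (a single node has height 0); -1 for no tree."""
    if root is None:
        return -1
    return 1 + max(bst_height(root.left), bst_height(root.right))


mixed = None
for v in [23, 13, 42, 5, 21, 1]:
    mixed = bst_insert(mixed, v)

in_order = None
for v in [1, 5, 12, 13, 23, 42]:         # already sorted input
    in_order = bst_insert(in_order, v)

print(bst_height(mixed), bst_height(in_order))   # 3 5
```

With the insertion order 23, 13, 42, 5, 21, 1 the resulting tree has height 3, whereas inserting the sorted values 1, 5, 12, 13, 23, 42 produces a tree of height 5: the shape, and hence the search cost, depends on the insertion order.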
In the worst case, when $T$ is degenerate, every node has at most one child, and the tree is essentially an ordered linked list:

[Figure: a degenerate tree, a single chain containing the values 1, 5, 12, 13, 23, 42]

Searching for an element can then take as many comparisons as there are elements in the tree, which is far from $\log_2 |V(T)|$.

For this reason, one often uses balanced binary search CS-trees. There are different notions of being balanced; one very strict notion is being complete: then the heights of two different leaves differ by at most one. If one inserts a new node into a balanced binary search CS-tree (or removes one), one has to ensure that the tree is still balanced. This is usually achieved by rotations. Two important examples of such trees are red-black trees and AVL trees. We will not go into details here, but refer to the literature and to Wikipedia.

4.2.3 Simple Trees

There are several applications where one would like to use a tree, but where one wants to keep things simple. For example, given a list of elements $x_1, \dots, x_n$, we want a tree with the elements $x_1, \dots, x_n$ as its leaves. We do not care what precisely the tree looks like. Instead, our main focus is on the following points:
(i) The height of the tree is $\approx \log_2 n$;
(ii) All leaves are at the same height;
(iii) It is easy to process the tree by processing all nodes on one height at the same time.

The simplest solution is to start with a complete tree of height $m$ such that the $m$-th row contains all $n$ elements. The smallest such $m$ is $m = \lceil \log_2 n \rceil$. For example, assume that we want to store eleven elements $x_1, \dots, x_{11}$. The minimal $m$ is $m = \lceil \log_2 11 \rceil = 4$. Consider the following perfect binary tree of height 4:

[Figure: a perfect binary tree of height 4 in which eleven of the sixteen leaves hold $x_1, \dots, x_{11}$]

If we now cut off all leaves which are empty, we obtain a complete binary tree. If we further cut off all branches (i.e.
subtrees) whose leaves are empty, we obtain the following binary tree:

[Figure: the resulting tree with leaves $x_1, \dots, x_{11}$]

We will call trees constructed with this method simple trees. Note that the number of nodes on level $k$ equals $\lceil 11/2^{4-k} \rceil$ for $k = 0, \dots, 4$:

    k                           | 0 | 1 | 2 | 3 | 4
    $\lceil 11/2^{4-k} \rceil$  | 1 | 2 | 3 | 6 | 11

This is a general property of such trees, as we will see below. Moreover, describing which leaves are below a node (i.e. what are the leaves of the corresponding subtree) at level $k$ is not so complicated:
• On level 3, the first node has $x_1, x_2$ as leaves, the second $x_3, x_4$, ..., the fifth node has $x_9, x_{10}$ as leaves, and the sixth node has $x_{11}$ as a leaf;
• On level 2, the first node has $x_1, \dots, x_4$ as leaves, the second node $x_5, \dots, x_8$, and the third $x_9, \dots, x_{11}$;
• On level 1, the first node has $x_1, \dots, x_8$ as leaves, and the second node $x_9, \dots, x_{11}$;
• On level 0, the only node (which is the root) has $x_1, \dots, x_{11}$ as leaves.

Therefore, one easily deduces that on level $k$, node $\ell \in \{1, \dots, \lceil 11/2^{4-k} \rceil\}$ has $x_{2^{4-k}(\ell-1)+1}, \dots, x_{\min\{11,\, 2^{4-k}\ell\}}$ as leaves. The following table lists the indices of the leaves below node $\ell$ on level $k$:

    k | $\ell$ = 1  | 2          | 3          | 4    | 5     | 6  | 7 | 8 | 9 | 10 | 11
    0 | 1, ..., 11  |            |            |      |       |    |   |   |   |    |
    1 | 1, ..., 8   | 9, 10, 11  |            |      |       |    |   |   |   |    |
    2 | 1, 2, 3, 4  | 5, 6, 7, 8 | 9, 10, 11  |      |       |    |   |   |   |    |
    3 | 1, 2        | 3, 4       | 5, 6       | 7, 8 | 9, 10 | 11 |   |   |   |    |
    4 | 1           | 2          | 3          | 4    | 5     | 6  | 7 | 8 | 9 | 10 | 11

Let us collect a few results about simple trees:

Proposition 4.2.15. Let $n \in \mathbb{N}_{>0}$. Consider the simple tree for the data elements $x_1, \dots, x_n$.
(a) The tree has height $h = \lceil \log_2 n \rceil$;
(b) In row $k$ (where $k = 0$ is the root and $k = h$ contains the leaves), we have $\lceil n/2^{h-k} \rceil$ nodes;
(c) The parent of node $i \in \{0, \dots, \lceil n/2^{h-k} \rceil - 1\}$ in row $k$ is node $\lfloor i/2 \rfloor$ in row $k - 1$, if $k > 0$; the grandparent in row $\ell < k$ is node $\lfloor i/2^{k-\ell} \rfloor$;
(d) The children of node $i \in \{0, \dots, \lceil n/2^{h-k} \rceil - 1\}$ in row $k$ are nodes $2i$ and $2i + 1$ in row $k + 1$, if $k < h$.
Node $2i$ in row $k + 1$ always exists; node $2i + 1$ exists in row $k + 1$ if and only if $i < \lceil n/2^{h-k} \rceil - 1$, or $i = \lceil n/2^{h-k} \rceil - 1$ and either $n \equiv 0 \pmod{2^{h-k}}$ or $(n \bmod 2^{h-k}) > 2^{h-k-1}$;
(e) The leaves under node $i$ in row $k$ have indices $2^{h-k} i$ to $\min\{2^{h-k}(i+1) - 1, n - 1\}$; thus they correspond to $x_{2^{h-k} i + 1}, \dots, x_{\min\{2^{h-k}(i+1),\, n\}}$.

Proof. To ease the proofs of parts (b) and (d), define $A : \mathbb{R}_{\ge 0} \to \mathbb{R}$ by $A(0) = 0$ and $A(x) = 1$ for $x > 0$.

(a) A perfect tree of height $h$ has $2^h$ nodes in the last row. Therefore, $2^{h-1} < n \le 2^h$ yields $h = \lceil \log_2 n \rceil$.

(b) Above any node, there is at least one other node. Each node has at most two children, and every node but the last in a row always has two children. Therefore, if row $k + 1$ has $t_{k+1}$ nodes, then row $k$ has $t_k = \lceil t_{k+1}/2 \rceil$ nodes. We will show by induction on $k = h, h - 1, \dots, 0$ that $t_h = n$ implies $t_k = \lceil n/2^{h-k} \rceil$. For $k = h$ this is clear. Thus, assume that $t_{k+1} = \lceil n/2^{h-(k+1)} \rceil$ for $k < h$. Write $n = 2^{h-k} a + 2^{h-(k+1)} b + c$ with $a, b, c \in \mathbb{N}$, $b < 2$ and $c < 2^{h-(k+1)}$. Then $t_{k+1} = 2a + b + A(c)$, and $t_{k+1}/2 = a + (b + A(c))/2$, and $(b + A(c))/2 \le 1$. Therefore,

$\lceil t_{k+1}/2 \rceil = a + A(b + A(c)) = a + A(2b + c) = a + \lceil (2b + c)/2^{h-k} \rceil = \lceil n/2^{h-k} \rceil.$

(c) The first part is clear. The second part follows by induction using $\lfloor \lfloor n/2^t \rfloor / 2 \rfloor = \lfloor n/2^{t+1} \rfloor$.

(d) For $i < \lceil n/2^{h-k} \rceil - 1$ the statement is clear (compare the proof of (b)). For $i = \lceil n/2^{h-k} \rceil - 1$, the first child is $2i = 2 \lceil n/2^{h-k} \rceil - 2 \le \lceil n/2^{h-(k+1)} \rceil - 1$. The second child would be $2i + 1$, which exists if and only if $2i + 1 = 2 \lceil n/2^{h-k} \rceil - 1 \le \lceil n/2^{h-(k+1)} \rceil - 1$. This is equivalent to $2 \lceil n/2^{h-k} \rceil \le \lceil n/2^{h-(k+1)} \rceil$. Write $n = 2^{h-k} a + 2^{h-k-1} b + c$ with $a, b, c \in \mathbb{N}$, $b < 2$ and $c < 2^{h-k-1}$. Then $\lceil n/2^{h-(k+1)} \rceil = 2a + b + A(c)$, and $\lceil n/2^{h-k} \rceil = a + A(b + c)$ (compare the proof of (b)). Therefore, the existence of node $2i + 1$ is equivalent to $2a + 2A(b + c) \le 2a + b + A(c)$, i.e. to $2A(b + c) \le b + A(c)$. This is true precisely if either $b + c = 0$, or if $b = 1$ and $c > 0$.
The first condition is equivalent to $n \equiv 0 \pmod{2^{h-k}}$, and the second is equivalent to $(n \bmod 2^{h-k}) > 2^{h-k-1}$.

(e) The first leaf under node $i$ has index $2^{h-k} i$. Assuming that each of the (grand)children has two children, the last leaf will be $2^{h-k} i + (2^{h-k} - 1) = 2^{h-k}(i + 1) - 1$. Since the last index in the last row is $n - 1$, the claim follows.

4.2.4 Heaps and Priority Queues

One important data structure which is best visualized as a complete binary CS-tree is the heap. One important application of heaps is priority queues. Such queues allow one to quickly insert new elements and to quickly remove the smallest element (or the largest element, by inverting the order).

Similarly to binary search CS-trees, heaps are binary (CS-)trees with special properties. As opposed to search CS-trees, we do not need the stricter definition of CS-trees; it suffices to look at binary trees with data.

Definition 4.2.16. A (binary) heap is a rooted (binary) tree with data $(T, r, d)$ such that every $x \in V(T)$ satisfies the heap condition: if $y$ is any child of $x$, then $d(x) \le d(y)$.

This ensures that the root $r$ of the heap satisfies $d(r) \le d(x)$ for all $x \in V(T)$. Also note that if $x \in V(T)$, then $T|_x$ is also a heap.

Note that the heap condition differs from the search tree condition: in a binary search CS-tree, a left child is at most as large as its parent, whereas in a heap every child is at least as large as its parent. In particular, a binary heap is usually far from being a search tree. For example, look at the example heaps in Sections 4.2.4.1, 4.2.4.2 and 4.2.4.3.

In the following, we will only consider binary heaps, though most statements are also true for arbitrary heaps, as long as the number of children per node is bounded, i.e. in $O(1)$.

4.2.4.1 Insertion

One of the two most important algorithms for a heap is the insertion of new elements. The following algorithm describes how to do this.
Input: the root $R$ of a heap, an element $x \in X$ to be inserted
Output: the root $R'$ of a new heap containing everything from the old heap and $x$

(1) Append a new leaf $L$ such that the height of the tree only increases if it was perfect before;
    Note: we will see below how to do this in practice (see Section 4.2.4.4);
(2) Let the value of $L$ be $x$, i.e. $d(L) = x$;
(3) As long as $L$ has a parent $P$ and $d(P) > d(L)$, do the following:
    • Swap $d(P)$ and $d(L)$, so that afterwards $d(P) < d(L)$;
    • Let $L$ now point to $P$;
      Note: we treat $L$ as a reference to a node in the tree. Now we change the reference;
(4) Return $R$.

Algorithm 4.2: Inserting a new element into a heap

The algorithm works as follows. Assume that we want to insert 4 into the following heap:

[Figure: a complete binary heap of height 4 with root data 3, whose last level is not yet full]

We begin by adding a new leaf $L$ on level 4 (since the tree is not yet perfect, we cannot add a new level). The red rounded boxes show the positions of the references $R$ (the root), $L$ (the inserted leaf) and $P$ (the parent of $L$):

[Figure: the heap with the new leaf $L$ (data 4) below its parent $P$ (data 13)]

Clearly, the heap property is violated, as $d(L) = 4 < 13 = d(P)$. Therefore, we swap $d(L)$ and $d(P)$, and let $L$ now point to $P$ (and adjust $P$ to be $L$'s parent):

[Figure: the heap after the first swap; $L$ has data 4 and its parent $P$ has data 7]

Again, the heap property is violated, since $d(L) = 4 < 7 = d(P)$. Therefore, we again swap $d(L)$ and $d(P)$, and let $L$ now point to $P$ (and adjust $P$ to be $L$'s parent):

[Figure: the heap after the second swap; $L$ has data 4 and its parent $P$ has data 5]

And again, the heap property is violated, since $d(L) = 4 < 5 = d(P)$. Therefore, we swap $d(L)$ and $d(P)$ another time, and let $L$ now point to $P$ (and adjust $P$ to be $L$'s parent):

[Figure: the heap after the third swap; $L$ has data 4 and its parent $P$ is the root $R$ with data 3]

Now, finally, the heap property is satisfied, since $d(P) = 3 \le 4 = d(L)$.
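The steps of Algorithm 4.2 can be sketched in Python as follows. This is a minimal sketch under an assumption that the text only introduces in Section 4.2.4.4: the heap is stored as a Python list in which the parent of index $i > 0$ has index $\lfloor (i-1)/2 \rfloor$, so that appending to the list creates the new leaf in the required position. The function name `heap_insert` is hypothetical.

```python
def heap_insert(heap, x):
    """Insert x into the min-heap stored in the list `heap` (in place)."""
    heap.append(x)                  # steps (1) and (2): new leaf L with d(L) = x
    l = len(heap) - 1               # index of L
    while l > 0:                    # L has a parent as long as it is not the root
        p = (l - 1) // 2            # index of the parent P of L
        if heap[p] <= heap[l]:      # heap condition holds, nothing left to do
            break
        heap[p], heap[l] = heap[l], heap[p]   # step (3): swap d(P) and d(L)
        l = p                       # let L point to P
    return heap


heap = [3, 5, 10, 7]                # a small heap: 3 has children 5 and 10, 5 has child 7
heap_insert(heap, 4)
print(heap)                         # [3, 4, 10, 7, 5]
```

In this small example the new element 4 is swapped once with its parent 5 and then stops below the root 3. Python's standard library module `heapq` offers `heappush`, which works on the same kind of list representation.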
Note that since our tree is complete, the height is in $O(\log n)$, where $n$ is the number of elements. Since we do at most one swap and one comparison per level, the number of operations is in $O(\log n)$. Finally, note that if the heap was complete before, it is still complete afterwards: a new level is only added in case the binary tree was perfect. This is summed up in the following proposition:

Proposition 4.2.17. Let $R$ be the root of a complete binary heap with $n$ nodes. Assume that appending a new leaf as described in the algorithm is possible in $O(\log n)$ operations. Then Algorithm 4.2 inserts a new element into the heap using $O(\log n)$ comparisons and swaps. Afterwards, the tree is still complete.

4.2.4.2 Deletion of the Root

The second of the two most important algorithms for a heap is removing the smallest element of the heap, which can always be found at its root. We assume that the heap has at least two elements, as otherwise the process consists of just getting rid of the root. We split the deletion algorithm into two steps:
• Removing one node (at the bottom) and replacing the content of the root with its content;
• Restoring the heap property.

Let us first consider the process of restoring the heap property. We give it as a separate algorithm since we will need this part of the removal process again later:

Input: the root $R$ of a binary tree which satisfies the heap property except for possibly the root node
Output: the root of the same tree with rearranged data, now satisfying the heap property everywhere

(1) Let $L$ point to $R$;
(2) While $L$ is not a leaf:
    (a) If $d(L) \le d(C)$ for all children $C$ of $L$, exit the loop;
    (b) Let $C$ be a child of $L$ such that $d(C) < d(L)$ and $d(C) \le d(C')$ for all other children $C'$ of $L$;
    (c) Swap $d(L)$ with $d(C)$, and let $L$ point to $C$;
(3) Return $R$.

Algorithm 4.3: Percolate down (for heaps)

Proposition 4.2.18.
Let $R$ be the root of a binary tree of height $h$ which satisfies the hypotheses of Algorithm 4.3. Then Algorithm 4.3 rearranges the data such that the resulting binary tree is a heap, using $O(h)$ comparisons and swaps. The structure of the tree is not changed.

Now the removal algorithm can be specified as follows:

Input: the root $R$ of a heap of at least two nodes
Output: the root $R'$ of a new heap containing everything from the old heap except its root

(1) Let $L$ be any leaf in the last level;
(2) Swap $d(R)$ with $d(L)$ and remove $L$ from the heap;
(3) Apply Algorithm 4.3 to the heap with root $R$;
(4) Return $R$.

Algorithm 4.4: Removing the smallest element of a heap

The asymptotic running time is the same as for insertion, and the algorithm also preserves the property of being a complete binary tree:

Proposition 4.2.19. Let $R$ be the root of a complete binary heap with $n$ nodes, and assume that $n \ge 2$. Further assume that it is possible to obtain any leaf in the last level in $O(\log n)$ operations. Then Algorithm 4.4 removes the root data using $O(\log n)$ comparisons and swaps. Afterwards, the tree is still complete.

We again want to illustrate removal with an example. Consider the heap built in Section 4.2.4.3. We want to remove its root element 1:

[Figure: the heap built in Section 4.2.4.3, with root data 1]

For that, we take some leaf, say the one with data 12, replace 1 by 12 and delete the leaf:

[Figure: the heap after moving 12 to the root; $L$ points to the root $R$, and $C_1$, $C_2$ are the children of $L$]

Now we have to restore the heap property, which is violated since $d(C_1) < d(L)$. As $d(C_1) \le d(C_2)$, we swap $d(C_1)$ with $d(L)$, and assign $L$ to point to $C_1$:

[Figure: the heap after the first swap; $L$ (data 12) is now the left child of the root, with children $C_1$ and $C_2$]

Again, the heap property is violated since $d(C_1) < d(L)$. As $d(C_1) \le d(C_2)$, we swap $d(C_1)$ with $d(L)$, and assign $L$ to point to $C_1$:

[Figure: the heap after the second swap; $L$ (data 12) with children $C_1$ and $C_2$]

Since $d(L) \le d(C_1)$ and $d(L) \le d(C_2)$, the heap property is now satisfied.
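Algorithms 4.3 and 4.4 can be sketched together in Python. As before, this is a minimal sketch rather than the notes' implementation: it assumes the list representation of Section 4.2.4.4, in which the children of index $i$ are $2i + 1$ and $2i + 2$, and it always takes the last list element as the leaf $L$ of Algorithm 4.4. The names `percolate_down` and `heap_remove_min` are hypothetical. Unlike Algorithm 4.4, the sketch also handles a heap with a single element.

```python
def percolate_down(heap, i=0):
    """Restore the heap property below index i (sketch of Algorithm 4.3)."""
    n = len(heap)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):          # indices of the children of i
            if c < n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:                          # d(L) <= d(C) for all children C
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest                               # let L point to C


def heap_remove_min(heap):
    """Remove and return the smallest element (sketch of Algorithm 4.4)."""
    smallest = heap[0]
    last = heap.pop()              # remove the last leaf from the tree
    if heap:                       # unless the heap is now empty,
        heap[0] = last             # move the leaf's data to the root
        percolate_down(heap, 0)    # and restore the heap property
    return smallest


heap = [1, 3, 2, 7, 4]
print(heap_remove_min(heap))       # 1
print(heap)                        # [2, 3, 4, 7]
```

Removing the minimum repeatedly returns the elements in ascending order, which is the idea behind the heap-based sorting algorithm announced for Section 4.3.2.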
4.2.4.3 Building a Heap

In case we want to create a new heap and fill it with (unsorted) data $x_1, \dots, x_n$, we just take the first datum $x_1$, create a root node $R$, and assign $d(R) = x_1$. We then use Algorithm 4.2 $(n - 1)$ times to insert $x_2, \dots, x_n$. Since the heap has $i - 1$ nodes before inserting $x_i$, the total running time is

$\sum_{i=2}^{n} O(\log(i - 1)) = O\Bigl( \sum_{i=1}^{n-1} \log i \Bigr).$

One can now show that $\sum_{i=1}^{n-1} \log i \in \Theta(n \log n)$, whence the total complexity for creating a heap of $n$ elements in this way is $O(n \log n)$.

Interestingly, it turns out that this process can be done much faster. The basic idea is to just create a complete binary tree which contains all the data $x_1, \dots, x_n$, and only then to ensure the heap property:

Input: a binary tree
Output: the same tree with rearranged data, now satisfying the heap property

(1) Let $h$ be the height of the tree.
(2) For level $k = h - 1, h - 2, \dots, 0$, do:
    (a) For each node $P$ on level $k$, do:
        (i) Let $L_1, \dots, L_\ell$ be the children of $P$, $\ell \in \{0, 1, 2\}$;
        (ii) If there exists an $i$ such that $d(L_i) < d(P)$ and $d(L_i) \le d(L_j)$ for all $j$:
             • Swap $d(L_i)$ and $d(P)$;
             • Turn the subtree starting at node $L_i$ back into a heap using Algorithm 4.3;
(3) Return the tree.

Algorithm 4.5: Building up a heap from a binary tree

On level $k$, we have $2^k$ nodes. By Proposition 4.2.18, the call to Algorithm 4.3 is done in $O(h - k)$ operations, since the subtree beginning at $L_i$ has height $\le h - k - 1$. Therefore, using $2^h \le n$, the total number of operations is in

$\sum_{k=0}^{h-1} 2^k \, O(h - k) = O\Bigl( n \cdot \sum_{k=0}^{h-1} 2^{k-h} (h - k) \Bigr) = O\Bigl( n \cdot \sum_{k=1}^{h} \frac{k}{2^k} \Bigr) = O(n)$

since

$\sum_{k=1}^{\infty} \frac{k}{2^k} = 2.$

(For this, consider the power series $f(x) = \sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$; its radius of convergence is 1 and its derivative is $f'(x) = \sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}$, whence $f'(1/2) = 2 \cdot \sum_{k=1}^{\infty} k \cdot 2^{-k} = 4$.)

We have therefore shown the following result:

Proposition 4.2.20. Assume that it is possible to iterate over all nodes as specified in the algorithm in $O(n)$ operations in total.
Then Algorithm 4.5 builds a heap out of $n$ unsorted data items in $O(n)$ comparisons and swaps.

Let us now illustrate the building process in a small example. The following complete tree consisting of 15 nodes has height 3:

[Figure: complete binary tree. Root 26; level 1: 1, 8; level 2: 7, 1, 6, 12; the leaves on level 3 hold the values 19, 14, 8, 20, 3, 11, 18 and 9.]

In the first step, all nodes on level 2 are checked; they are marked in red. All children which will be swapped with the nodes are marked in blue:

[Figure: the same tree; the level 2 nodes 7, 1, 6, 12 are marked in red, the leaves 3 (child of 6) and 9 (child of 12) in blue.]

After swapping, the trees beginning at the red nodes are already heaps, and Algorithm 4.3 does not do anything. Now we consider the nodes on level 1:

[Figure: root 26; level 1: 1, 8; level 2: 7, 1, 3, 9. The node 8 on level 1 will be swapped with its child 3.]

After another swapping process, we are now in the situation that the left subtree below 3 (marked in red) is no longer a heap:

[Figure: root 26; level 1: 1, 3; level 2: 7, 1, 8, 9. The node 8 on level 2 has the children 20 and 6.]

Algorithm 4.3 will now restore the heap property by swapping 8 with 6. After this, we continue with Algorithm 4.5 and consider the root, the only node on level 0:

[Figure: root 26; level 1: 1, 3; level 2: 7, 1, 6, 9.]

In the final step, one last swap is done. Afterwards, the left subtree below the new root (marked in red) is no longer a heap:

[Figure: root 1; level 1: 26, 3; level 2: 7, 1, 6, 9.]

Algorithm 4.3 will swap the 26 with the 1 in its right child, and then swap the new 26 with its left child 8. This results in the following heap, where we marked all nodes which were changed since the last tree in blue:

[Figure: the final heap. Root 1; level 1: 1, 3; level 2: 7, 8, 6, 9; the leaves on level 3 hold the values 19, 14, 26, 20, 8, 11, 18 and 12. The nodes 1 (level 1), 8 (level 2) and 26 (level 3) are marked in blue.]

4.2.4.4 Representing a Complete Heap

So far, everything seems to be quite easy, in particular for humans. The problem is that Algorithms 4.2, 4.4 and 4.5 use some operations which we have not explained yet:
• Algorithm 4.2 needs to append a leaf to the tree, which should be on the last level, except if the last level is full;
• Algorithm 4.4 needs to get hold of one leaf in the last level;
• Algorithm 4.5 needs to go through all levels step by step.
There is a simple solution for all these problems:
• to (conceptually) use a complete binary CS-tree such that all leaves in the last level are on the left side, and all "missing" leaves in the last level are on the right side;
• and to represent this complete CS-tree as a simple list.

In fact, this is a very similar strategy to the simple trees in Section 4.2.3, but indexing is even simpler. The following complete tree of height 4 displays the indices of the data in each node:

[Figure: complete binary tree of height 4 whose nodes are labeled 0 to 30, level by level and from left to right. The index ranges of the levels 0 to 4 are [0, 0], [1, 2], [3, 6], [7, 14] and [15, 30].]

Proposition 4.2.21. Let $x_0, \dots, x_{n-1}$ be some data. If represented as a complete binary CS-tree whose last level is filled from left to right, the tree $T$ has the following properties:
(a) Its height is $h := \lfloor \log_2 n \rfloor$;
(b) On level $k \in \{0, \dots, h-1\}$, it has $2^k$ nodes, and on level $h$, it has $n - 2^h + 1 \le 2^h$ nodes;
(c) The nodes on level $k \in \{0, \dots, h\}$ have indices $2^k - 1$ to $\min\{2(2^k - 1), n - 1\}$; if $k < h$, then $\min\{2(2^k - 1), n - 1\} = 2(2^k - 1)$;
(d) The parent of the node with index $i > 0$ has index $\lfloor \frac{i+1}{2} \rfloor - 1 = \lfloor \frac{i-1}{2} \rfloor = \lceil \frac{i}{2} \rceil - 1$;
(e) The children of the node with index $i$ have indices $j$ with $2i + 1 \le j \le \min\{2i + 2, n - 1\}$. That is, node $i$ has no child if $2(i+1) > n$, one child if $2(i+1) = n$, and two children if $2(i+1) < n$.

Proof. (a) Follows from Remark 4.2.12 (c).

(b) For levels 0 up to $h - 1$, this follows from Remark 4.2.12 (c). Therefore, in all these levels, the tree has $\sum_{k=0}^{h-1} 2^k = 2^h - 1$ nodes. Therefore, on level $h$, it must have $n - (2^h - 1)$ nodes. Finally, as $h = \lfloor \log_2 n \rfloor$, we have $2^h \le n < 2^{h+1}$. As $n$ is an integer, $n \le 2^{h+1} - 1$, whence $n - 2^h + 1 \le (2^{h+1} - 1) - 2^h + 1 = 2^h$.

(c) We show this by induction on $k$. For $k = 0$, there is only one node with index 0, and $2^0 - 1 = 0 = 2(2^0 - 1)$. Now assume that the formula is correct for level $k-1$, where $k \in \{1, \dots, h\}$.
Then the last index in level $k-1$ is $2(2^{k-1} - 1)$, and thus the first index in level $k$ is $2(2^{k-1} - 1) + 1 = 2^k - 1$. In case $k < h$, level $k$ has $2^k$ nodes by (b), whence the last index is $2(2^{k-1} - 1) + 2^k = 2^{k+1} - 2 = 2(2^k - 1)$. In case $k = h$, level $h$ has $n - 2^h + 1$ nodes, whence the last index is

$$2(2^{h-1} - 1) + (n - 2^h + 1) = n - 1 = \min\{2(2^h - 1), n - 1\}$$

as $2(2^{h-1} - 1) + (n - 2^h + 1) \le 2(2^{h-1} - 1) + 2^h = 2(2^h - 1)$ by (b).

(d) In case node $i$ is on level $k > 0$, we have $i = 2^k - 1 + j$ for $0 \le j < 2^k$. Now the parent of node $i$ is on level $k-1$ and has index $\lfloor j/2 \rfloor$ there. Therefore, its global index is $(2^{k-1} - 1) + \lfloor j/2 \rfloor$. Now $j = i - (2^k - 1)$, whence

$$\lfloor j/2 \rfloor = \lfloor (i - 2^k + 1)/2 \rfloor = \lfloor i/2 - 2^{k-1} + 1/2 \rfloor = \lfloor i/2 + 1/2 \rfloor - 2^{k-1}.$$

Therefore, the parent has index

$$(2^{k-1} - 1) + \lfloor j/2 \rfloor = (2^{k-1} - 1) + \lfloor i/2 + 1/2 \rfloor - 2^{k-1} = \lfloor (i+1)/2 \rfloor - 1 = \lfloor (i-1)/2 \rfloor.$$

Finally, if we write $i = 2\ell + t$ with $t \in \{0, 1\}$, then

$$\lfloor (i-1)/2 \rfloor = \lfloor \ell + t/2 - 1/2 \rfloor = \ell + \lfloor (t-1)/2 \rfloor = \ell - 1 + t = \ell - 1 + \lceil t/2 \rceil = \lceil (2\ell + t)/2 \rceil - 1 = \lceil i/2 \rceil - 1.$$

(e) Assume that node $i$ is the $j$-th node on level $k$, where $0 \le j < 2^k$. Then $i = 2^k - 1 + j$ by (c). Now the children of node $i$, if they exist, are on level $k+1$ and have indices $2j$ and $2j+1$ on that level. Therefore, their global indices are $A := (2^{k+1} - 1) + 2j$ and $B := (2^{k+1} - 1) + (2j + 1) = A + 1$. Now $j = i - (2^k - 1)$ yields

$$A = (2^{k+1} - 1) + 2(i - (2^k - 1)) = 2^{k+1} - 1 + 2i - 2^{k+1} + 2 = 2i + 1$$

and $B = 2i + 2$. Now these nodes only exist in the tree if their index is $\le n - 1$. Therefore, the children of node $i$ have indices $t$ with $2i + 1 \le t \le \min\{2i + 2, n - 1\}$, and the statement on the number of children is correct.

Using this representation, we obtain a binary heap which can do all operations as fast as in Propositions 4.2.17, 4.2.19 and 4.2.20:

Theorem 4.2.22 (Algorithms for Linearly Represented Binary Heaps).
There exist algorithms such that a binary heap of $n$ nodes can be stored in a linear fashion requiring $O(n)$ memory positions, such that the following operations have the following asymptotic complexities:
(a) Insertion of a new element can be done in $O(\log n)$ operations;
(b) Querying of the smallest element can be done in $O(1)$ operations;
(c) Removal of the smallest element can be done in $O(\log n)$ operations;
(d) Building a new heap with $n$ unsorted data items can be done in $O(n)$ operations.

4.2.5 Huffman Trees

Huffman trees originate in Information Theory. The question is the following: we are given an alphabet $a_1, \dots, a_n$, where every letter $a_i$ appears in texts with a probability of $p_i$, i.e. $p_i \in [0, 1]$ and $\sum_{i=1}^n p_i = 1$. (Such a collection of $p_i$'s is called a (discrete) probability distribution.) How can we choose a uniquely decodable² binary encoding, i.e. a sequence consisting of 0 and 1 for every letter $a_i$, such that a random text encoded with this scheme has minimal length?

² This means that if we write all encoded words one after another without any separator, it is possible to determine where one code word ends and where the next starts, if one goes through the code letters from left to right.

Assume that an encoding scheme is chosen where $a_i$ is encoded using $c_i$ bits. Then the average length of one encoded letter is $\sum_{i=1}^n p_i c_i$. C. Shannon [Sha48] showed in his famous Source Coding Theorem (see Theorem 4.2.26 below) that we always have

$$\sum_{i=1}^n p_i c_i \ge \sum_{i=1}^n p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^n p_i \log_2 p_i =: H(p_1, \dots, p_n);$$

here, we understand that $0 \cdot \log_2 0 = 0$. The value $H(p_1, \dots, p_n)$ is called the entropy of the probability distribution $(p_1, \dots, p_n)$. We always have $H(p_1, \dots, p_n) \le \log_2 n$.

D. Huffman gave a simple algorithm which produces such an encoding whose average length almost attains this lower bound. More precisely, his algorithm constructs a binary encoding with bit-lengths $c_1, \dots, c_n$ such that

$$\sum_{i=1}^n p_i c_i < H(p_1, \dots, p_n) + 1$$

and such that no other binary encoding performs better. Therefore, his encoding is optimal.

We begin with bounding the entropy of a probability distribution.

Lemma 4.2.23. For any probability distribution $(p_1, \dots, p_n)$,

$$0 \le H(p_1, \dots, p_n) \le \log_2 n,$$

with equality on the left-hand side if and only if $n = 1$, and equality on the right-hand side if and only if $p_1 = \dots = p_n$.

Proof. In case $n = 1$, we have $p_1 = 1$ and $H(p_1, \dots, p_n) = 0 = \log_2 1$. Now assume $n > 1$. As $0 < p_i \le 1$, $\log p_i \le 0$. As $n > 1$, we have $p_i < 1$ for at least one $i$, and the corresponding terms $p_i \log p_i$ are then $< 0$, whence $0 < H(p_1, \dots, p_n)$.

For the other inequality, we use $\log x \le x - 1$ for $x > 0$, with equality if and only if $x = 1$. Then

$$\begin{aligned}
H(p_1, \dots, p_n) - \log_2 n &= \sum_{i=1}^n p_i [\log(1/p_i) - \log n] / \log 2 \\
&= (\log 2)^{-1} \sum_{i=1}^n p_i \log(1/(p_i n)) \\
&\le (\log 2)^{-1} \sum_{i=1}^n p_i (1/(n p_i) - 1) \\
&= (\log 2)^{-1} \sum_{i=1}^n (1/n - p_i) = (\log 2)^{-1} (1 - 1) = 0,
\end{aligned}$$

and equality holds if and only if $1/(n p_i) = 1$ for all $i$.

An additive tree $(T, r, f)$ is a rooted tree $(T, r)$ together with a function $f : V(T) \to \mathbb{R}_{\ge 0}$ such that if $x \in V(T)$ and $y_1, \dots, y_k \in V(T)$ are all children of $x$, then $f(x) = \sum_{i=1}^k f(y_i)$. We call $f(x)$ the weight of the node $x$, and $f(r)$ the weight of $(T, r, f)$.

Given an alphabet $a_1, \dots, a_n$ with probability distribution $p_1, \dots, p_n$, any binary encoding gives rise to an additive binary tree whose weight is 1. For example, assume that we are given the following alphabet with $n = 7$:

    i    |  1     2     3     4     5     6     7
    a_i  |  A     B     C     E     G     L     R
    p_i  |  0.30  0.10  0.09  0.05  0.15  0.13  0.18

We can first form any binary tree whose leaves are $p_1, \dots, p_7$:

[Figure: a binary tree with the leaves 0.30 (A), 0.10 (B), 0.09 (C), 0.05 (E), 0.15 (G), 0.13 (L) and 0.18 (R); the inner nodes do not carry weights yet.]

Then we determine the weight of a parent node of two children which already have weights by adding them together (children are indented below their parent, left child first):

    1.00
      0.82
        0.49
          0.30 (A)
          0.19
            0.10 (B)
            0.09 (C)
        0.33
          0.20
            0.05 (E)
            0.15 (G)
          0.13 (L)
      0.18 (R)
The average height of this tree is the sum of the heights of the leaves, weighted with their value:

$$0.30 \cdot 3 + (0.10 + 0.09 + 0.05 + 0.15) \cdot 4 + 0.13 \cdot 3 + 0.18 \cdot 1 = 3.03.$$

On the other hand, the entropy of the probability distribution is

$$-\sum_{i=1}^{7} p_i \log_2 p_i \approx 2.6205.$$

Now how is this related to an encoding scheme? We treat each left branch of the binary tree as a 0, and each right branch of the binary tree as a 1:

[Figure: the additive tree from above, with every left branch labeled 0 and every right branch labeled 1.]

This yields the following encodings:

    Letter    |  A    B     C     E     G     L    R
    Encoding  |  000  0010  0011  0100  0101  011  1

For example, the word ALGEBRA would be encoded as 000 011 0101 0100 0010 1 000, or written without spaces, 0000110101010000101000. To decode this, one can again use the tree: one begins at the root and follows the labeled branches down until a leaf is reached; then one letter has been decoded, and one can begin again at the root. This shows that such a scheme is uniquely decodable.

Now note that the height of a leaf equals the length of the encoding. Therefore, the average height equals the average length of an encoded letter! By Shannon, the average length is at least $H(p_1, \dots, p_7)$, which is true as we verified above.

As an intermission, let us prove Shannon's theorem. We begin with two lemmas from which the theorem follows.

Lemma 4.2.24 (Gibbs' Inequality). Let $(p_1, \dots, p_n)$, $(q_1, \dots, q_n)$ be two discrete probability distributions. Then

$$H(p_1, \dots, p_n) = -\sum_{i=1}^n p_i \log_2 p_i \le -\sum_{i=1}^n p_i \log_2 q_i,$$

and equality holds if and only if $p_i = q_i$ for all $i$.

Proof. Let $I = \{ i \in \{1, \dots, n\} \mid p_i > 0 \}$. Then using $\log x \le x - 1$ for all $x > 0$ with equality if and only if $x = 1$,

$$-\sum_{i \in I} p_i \log \frac{q_i}{p_i} \ge -\sum_{i \in I} p_i \cdot \Bigl(\frac{q_i}{p_i} - 1\Bigr) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \ge 0$$

as $\sum_{i \in I} q_i \le 1$. Therefore,

$$-\sum_{i=1}^n p_i \log q_i = -\sum_{i \in I} p_i \log q_i \ge -\sum_{i \in I} p_i \log p_i = -\sum_{i=1}^n p_i \log p_i.$$

Dividing both sides by $\log 2$ yields the inequality.

Now in the above, we have equality if and only if $q_i / p_i = 1$ for all $i \in I$ and $\sum_{i \in I} q_i = 1$. This is possible if and only if $p_i = q_i$ for all $i$.

Lemma 4.2.25 (Kraft's Inequality). Let $c_1, \dots, c_n$ be binary strings which form a uniquely decodable code. If $c_i$ has length $s_i$, then

$$\sum_{i=1}^n 2^{-s_i} \le 1.$$

Conversely, if there exist natural numbers $s_1, \dots, s_n$ satisfying $\sum_{i=1}^n 2^{-s_i} \le 1$, there exist binary strings $c_1, \dots, c_n$, where $c_i$ has length $s_i$, such that $c_1, \dots, c_n$ is a uniquely decodable code.

Proof. Without loss of generality, we can assume $s_1 \le \dots \le s_n$. Let $A_i$ be the set of binary strings of length $s_n$ which begin with $c_i$; then $|A_i| = 2^{s_n - s_i}$. As $c_1, \dots, c_n$ is a uniquely decodable code, $A_i \cap A_j = \emptyset$ for $i \ne j$. Therefore,

$$\sum_{i=1}^n 2^{s_n - s_i} = \sum_{i=1}^n |A_i| = \Bigl| \bigcup_{i=1}^n A_i \Bigr| \le 2^{s_n}.$$

Dividing by $2^{s_n}$ yields the inequality.

[Figure 4.1: Perfect binary CS-tree corresponding to Kraft's inequality $1 \cdot 2^{-2} + 3 \cdot 2^{-3} + 2 \cdot 2^{-4} + 3 \cdot 2^{-5} = 27/32 \le 1$, with the nodes for the code words $c_1, \dots, c_9$ marked.]

For the converse, we are given natural numbers $s_1, \dots, s_n$ with $\sum_{i=1}^n 2^{s_n - s_i} \le 2^{s_n}$, and we again assume that $s_1 \le \dots \le s_n$. We begin with a rooted perfect binary tree $(T, r)$ of height $s_n$; this tree has $2^{s_n}$ leaves. In the first step, we pick any node $x_1$ of height $s_1$ and remove $T|_{x_1}$ from $T$. This will remove $2^{s_n - s_1}$ leaves of $T$. The node $x_1$ corresponds to $c_1$, and the removed leaves correspond to all binary strings of length $s_n$ which begin with $c_1$. By iteratively performing this operation for $s_2, \dots, s_n$, we obtain the desired code. We only have to show that this is always possible:

Assume that we have iteratively removed subtrees beginning at levels $s_1, \dots, s_{i-1}$. We are left with a binary tree with $2^{s_n} - \sum_{j=1}^{i-1} 2^{s_n - s_j}$ leaves at level $s_n$. All removed leaves belong to removed subtrees beginning at heights $\le s_i$.
In fact, the number of nodes removed at level $s_i$ equals $\sum_{j=1}^{i-1} 2^{s_i - s_j} < 2^{s_i}$ (as $\sum_{j=1}^{n} 2^{s_i - s_j} \le 2^{s_i}$), so there exists at least one node of height $s_i$ which can be chosen for $c_i$ and removed.

Theorem 4.2.26 (Shannon's Source Coding Theorem). Let $(T, r, f)$ be an additive tree with leaves $\ell_1, \dots, \ell_n$ and weight $> 0$. Then

$$\sum_{i=1}^n h_T(\ell_i) f(\ell_i) \ge f(r) \cdot H\Bigl(\frac{f(\ell_1)}{f(r)}, \dots, \frac{f(\ell_n)}{f(r)}\Bigr).$$

Moreover, there exists an additive tree $(T', r', f')$ with the same leaf weights such that

$$\sum_{i=1}^n h_{T'}(\ell'_i) f'(\ell'_i) < f(r) \cdot \Bigl( H\Bigl(\frac{f(\ell_1)}{f(r)}, \dots, \frac{f(\ell_n)}{f(r)}\Bigr) + 1 \Bigr).$$

Proof. Without loss of generality, we can assume that the tree has weight 1, i.e. $f(r) = 1$. We then have to show $\sum_{i=1}^n h(\ell_i) f(\ell_i) \ge H(f(\ell_1), \dots, f(\ell_n))$.

Let $c_i$ be the binary string corresponding to the leaf $\ell_i$. Then $c_1, \dots, c_n$ is a uniquely decodable binary code, and since the length of $c_i$ is $h(\ell_i)$, Kraft's inequality (Lemma 4.2.25) yields $C := \sum_{i=1}^n 2^{-h(\ell_i)} \le 1$.

Set $q_i := 2^{-h(\ell_i)} / C$ and $p_i = f(\ell_i)$; then $\sum_{i=1}^n q_i = 1 = \sum_{i=1}^n p_i$. Gibbs' inequality (Lemma 4.2.24) yields

$$H(p_1, \dots, p_n) \le -\sum_{i=1}^n p_i \log_2 q_i = \sum_{i=1}^n p_i h(\ell_i) + \sum_{i=1}^n p_i \log_2 C \le \sum_{i=1}^n p_i h(\ell_i),$$

which is what we wanted to show for the first part.

For the existence of $(T', r', f')$, set $s_i = \lceil -\log_2 p_i \rceil$. Then $-\log_2 p_i \le s_i < -\log_2 p_i + 1$, and $2^{-s_i} \le p_i$ and $\sum_{i=1}^n 2^{-s_i} \le \sum_{i=1}^n p_i = 1$. By Kraft's inequality (Lemma 4.2.25), there exists a uniquely decodable binary code with codeword lengths $s_1, \dots, s_n$. This code corresponds to an additive tree with average height $\sum_{i=1}^n p_i s_i < \sum_{i=1}^n p_i (-\log_2 p_i + 1) = H(p_1, \dots, p_n) + 1$.

We now want to relate this to the topic of this lecture, namely Computer Algebra. Assume we are given a list of polynomials $f_1, \dots, f_n \in R[X]$, together with a multiplication time $M$ for polynomials. Our aim is to find out how fast we can compute $\prod_{i=1}^n f_i$. If we use the naive approach, i.e.
we first compute $f_1 f_2$, then $(f_1 f_2) f_3$, then $(f_1 f_2 f_3) f_4$, etc., and if $\deg f_i \approx \deg f_j$ for all $i, j$, then we obtain a worst case bound of $O(k\, M(k))$, where $k = \sum_{i=1}^n \deg f_i$. Assuming that $\deg f_i \ge 1$ for all $i$, we have $k \ge n$. There are simple approaches to do this faster; one is based on simple trees (as introduced in Section 4.2.3), and its complexity is $O(M(k) \log n)$. This is already much better than the worst case, but can still be improved, especially if the $\deg f_i$'s vary a lot.

Define $p_i = \frac{\deg f_i}{k}$, where $k = \sum_{i=1}^n \deg f_i$; then $p_i \in [0, 1]$ and $\sum_{i=1}^n p_i = 1$, whence $(p_1, \dots, p_n)$ is a probability distribution. Now consider the function $g : \frac{1}{k} \mathbb{N} \to \mathbb{R}_{\ge 0}$, $x \mapsto M(kx)$; then $g(p_i)$ denotes the number of operations required to multiply two polynomials of degree $\deg f_i$, and $g$ is non-negative and satisfies $g(x)/x \le g(y)/y$ for $x \le y$.

We can now describe the way how the $f_i$'s are multiplied to obtain $\prod_{i=1}^n f_i$ as an additive tree, where each leaf corresponds to one $f_i$ and the other nodes correspond to a subproduct of $\prod_{i=1}^n f_i$; more precisely, each node corresponds to the product of all leaves in its subtree (children are indented below their parent; the weight of each node is followed by its polynomial in brackets):

    p_1 + ... + p_7 = 1   [f_1 ... f_7]
      p_1 + ... + p_6   [f_1 ... f_6]
        p_1 + ... + p_3   [f_1 ... f_3]
          p_1   [f_1]
          p_2 + p_3   [f_2 · f_3]
            p_2   [f_2]
            p_3   [f_3]
        p_4 + ... + p_6   [f_4 ... f_6]
          p_4 + p_5   [f_4 · f_5]
            p_4   [f_4]
            p_5   [f_5]
          p_6   [f_6]
      p_7   [f_7]

Here, the entries in brackets (red rounded boxes in the original figure) show the polynomials or polynomial products corresponding to the nodes. Now, we can bound the number of operations required to compute the product $\prod_{i=1}^n f_i$ using the tree by summing $g(f(x))$ over all non-leaf nodes $x$. The following more general proposition relates in our special case the number of operations to $M(\sum_{i=1}^n \deg f_i)$ times the (normalized) average height of the tree:

Proposition 4.2.27. Let $(T, r, f)$ be an additive tree of positive weight, with $f(x) > 0$ for all leaves $x$, and $g$ a non-negative function defined on $f(V(T))$ which satisfies $g(t)/t \le g(t')/t'$ for $t \le t'$.
Let $L = \{\ell_1, \dots, \ell_n\}$ be the $n$ leaves of $T$. Then

$$\sum_{x \in V(T) \setminus L} g(f(x)) \le g\Bigl(\sum_{i=1}^n f(\ell_i)\Bigr) \cdot \frac{1}{f(r)} \sum_{i=1}^n h(\ell_i) f(\ell_i);$$

note that $f(r) = \sum_{i=1}^n f(\ell_i)$, and $\frac{1}{f(r)} \sum_{i=1}^n h(\ell_i) f(\ell_i)$ is the average height of $(T, r, \frac{1}{f(r)} f)$, which is the normalized tree (i.e. it has weight 1). In case $g$ is linear, equality holds.

Proof. For $i \in \{1, \dots, n\}$ and $x \in V(T)$, let $\delta(x, i) = 1$ if $\ell_i$ is a leaf of $T|_x$, and $\delta(x, i) = 0$ otherwise. With this notation, we have $\sum_{x \in V(T) \setminus L} \delta(x, i) = h(\ell_i)$ for every $i$ and $g(f(x)) = g(\sum_{i=1}^n \delta(x, i) f(\ell_i))$ for every $x \in V(T)$.

For a fixed $x \in V(T) \setminus L$, set $s(x) := \sum_{i=1}^n \delta(x, i) f(\ell_i)$ and $t := \sum_{i=1}^n f(\ell_i) = f(r)$; then $0 < s(x) \le t$, and thus $g(s(x))/s(x) \le g(t)/t$. Multiplying by $s(x) \cdot t > 0$ yields

$$f(r) \cdot g\Bigl(\sum_{i=1}^n \delta(x, i) f(\ell_i)\Bigr) = t \cdot g(s(x)) \le s(x) \cdot g(t) = \sum_{i=1}^n \delta(x, i) f(\ell_i) g(f(r)).$$

Summing over all such $x$, we obtain

$$\begin{aligned}
\sum_{x \in V(T) \setminus L} f(r) g(f(x)) &= \sum_{x \in V(T) \setminus L} f(r) \cdot g\Bigl(\sum_{i=1}^n \delta(x, i) f(\ell_i)\Bigr) \\
&\le \sum_{x \in V(T) \setminus L} \sum_{i=1}^n \delta(x, i) f(\ell_i) g(f(r)) = \sum_{i=1}^n h(\ell_i) f(\ell_i) g(f(r)).
\end{aligned}$$

Dividing by $f(r)$ and using $f(r) = \sum_{i=1}^n f(\ell_i)$ yields the result. In case $g$ is linear, all "$\le$" above can be replaced by "$=$", whence we obtain equality.

We now want to present Huffman's algorithm, which allows to create a tree which minimizes $\frac{1}{f(r)} \sum_{i=1}^n h(\ell_i) f(\ell_i)$ among all additive binary trees with leaves $\ell_1, \dots, \ell_n$. Instead of presenting the algorithm right away, we want to begin with an example run of the algorithm.

We begin with ten non-negative real numbers, say 7, 45, 11, 61, 13, 16, 23, 3, 20 and 5, and create ten additive trees whose only node consists of one of these numbers.
We order the trees by weight. In the following, $w(T_1, T_2)$ denotes a tree of weight $w$ whose root has the two subtrees $T_1$ and $T_2$:

    3, 5, 7, 11, 13, 16, 20, 23, 45, 61

We then begin with taking the two trees with the smallest weights (or some of them, if these are not unique), and combine them to a new tree and insert it at the correct position:

    7, 8(3, 5), 11, 13, 16, 20, 23, 45, 61

In the next step, we again take the two trees of smallest weight, and combine them:

    11, 13, 15(7, 8(3, 5)), 16, 20, 23, 45, 61

In the next step, we again take the two trees of smallest weight, and combine them:

    15(7, 8(3, 5)), 16, 20, 23, 24(11, 13), 45, 61

After another step, we have this forest:

    20, 23, 24(11, 13), 31(15(7, 8(3, 5)), 16), 45, 61

The next step results in the following forest:

    24(11, 13), 31(15(7, 8(3, 5)), 16), 43(20, 23), 45, 61

One more step:

    43(20, 23), 45, 55(24(11, 13), 31(15(7, 8(3, 5)), 16)), 61

After three more steps, we obtain the following final result:

    204(88(43(20, 23), 45), 116(55(24(11, 13), 31(15(7, 8(3, 5)), 16)), 61))

The average height of this tree is

$$\frac{2 \cdot (45 + 61) + 3 \cdot (20 + 23) + 4 \cdot (11 + 13 + 16) + 5 \cdot 7 + 6 \cdot (3 + 5)}{3 + 5 + 7 + 11 + 13 + 16 + 20 + 23 + 45 + 61} = \frac{584}{204},$$

which is $\approx 2.8627$, and the entropy is

$$H\Bigl(\tfrac{3}{204}, \tfrac{5}{204}, \tfrac{7}{204}, \tfrac{11}{204}, \tfrac{13}{204}, \tfrac{16}{204}, \tfrac{20}{204}, \tfrac{23}{204}, \tfrac{45}{204}, \tfrac{61}{204}\Bigr) \approx 2.8412.$$

Now we state a formal description of the algorithm:

Input: elements $x_1, \dots, x_n$ with non-negative real numbers $p_1, \dots, p_n$
Output: an additive tree with leaves $x_1, \dots, x_n$ with minimal average height
(1) Create a list $L$;
(2) For $i = 1, \dots, n$:
    (a) Create an additive tree $T$ with one node $x_i$ with weight $p_i$;
    (b) Put $T$ somewhere into $L$;
(3) Create a binary heap on $L$;
(4) While $L$ contains more than one element:
    (a) Remove the two smallest trees $T_1$ and $T_2$ from $L$ and restore the heap property;
    (b) Create a new additive tree $T$ such that the two subtrees below the root are $T_1$ and $T_2$;
    (c) Insert $T$ into the heap $L$;
(5) Return the only element in $L$.
Algorithm 4.6: Huffman's algorithm

Instead of keeping a heap, one could also sort the list $L$ (see Section 4.3), and use a binary search (see Section 4.1) to insert the new tree $T$ at the correct position so that the list $L$ is sorted afterwards. This is essentially what we did in the example above, and we would obtain the same asymptotic complexity for building the tree assuming that insertion can be done in $O(\log n)$ operations.

Proposition 4.2.28. Algorithm 4.6 returns after $O(n \log n)$ operations.

Proof. Building the heap can be done in $O(n)$ operations (Theorem 4.2.22 (d)). In iteration $k \in \{1, \dots, n-1\}$ of the while loop, the heap contains $n - k + 1$ elements at the beginning. Therefore, removing the two smallest elements can be done in $O(\log(n - k + 1))$ operations, and inserting the new tree can also be done in $O(\log(n - k + 1))$ operations (Theorem 4.2.22 (a)–(c)). Therefore, the loop in total needs

$$\sum_{k=1}^{n-1} O(\log(n - k + 1)) \subseteq O(n \log n)$$

operations.

We are left to show that the algorithm is correct, i.e. that the algorithm returns an additive tree with minimal average height.

Proposition 4.2.29. Algorithm 4.6 returns an additive tree of minimal average height. That is, if $(T, r, f)$ is the output of the algorithm with leaves $\ell_1, \dots, \ell_n$, and $(T', r', f')$ is any other additive tree with leaves $\ell'_1, \dots, \ell'_n$ such that $f(\ell_i) = f'(\ell'_i)$ for all $i$, then

$$\sum_{i=1}^n h(\ell_i) f(\ell_i) \le \sum_{i=1}^n h(\ell'_i) f'(\ell'_i).$$

Proof. Without loss of generality, we can assume that $p_1 \le \dots \le p_n$ and $n > 1$. First, note that there are only finitely many additive trees with leaf weights $p_1, \dots, p_n$; therefore, we can pick one which minimizes the average height. Let one such tree be $(T', r', f')$, and let $\ell'_1, \dots, \ell'_n$ be its leaves with $f'(\ell'_i) = p_i$. Let $N$ be a node of $T'$ which is not a leaf and whose $h(N)$ is maximal under this condition.
If $\ell'_1$ and $\ell'_2$ are not the children of $N$, we can permute the $\ell'_i$'s such that this is the case, without increasing the average height. Thus, the children of $N$ can be assumed to be $\ell'_1$ and $\ell'_2$.

We replace $T'|_N$ by a leaf $\ell''_2$ of weight $p_1 + p_2$, and obtain a new additive tree $(T'', r'', f'')$ with leaves $\ell''_2, \dots, \ell''_n$ of weights $p_1 + p_2, p_3, \dots, p_n$. We now claim that this additive tree also has minimal average height: By Proposition 4.2.27 with $g(x) = x$, we have for $L'' = \{\ell''_2, \dots, \ell''_n\}$ that

$$\sum_{x \in V(T'') \setminus L''} f''(x) = \sum_{i=2}^n h(\ell''_i) f''(\ell''_i).$$

This shows that the average height of $T''$ equals the average height of $T'$ minus $p_1 + p_2$. If $(T'', r'', f'')$ did not have minimal average height, we could take another additive tree with minimal average height having the leaf weights $p_1 + p_2, p_3, \dots, p_n$, replace the node for $p_1 + p_2$ with the subtree $T'|_N$, and obtain another additive tree with leaf weights $p_1, \dots, p_n$ whose average height is smaller than that of $T'$, a contradiction.

By repeating this construction, we see that $T'$ must contain one possible output of Algorithm 4.6 as a subtree. Moreover, the above shows that the resulting average height of the output of Algorithm 4.6 does not depend on the choices made inside the algorithm.

Finally, we want to show that the average height of the generated tree is not too far away from the lower bound $p \cdot H(p_1/p, \dots, p_n/p)$, where $p = \sum_{i=1}^n p_i$. More precisely:

Corollary 4.2.30. The output $(T, r, f)$ of Algorithm 4.6 satisfies

$$p \cdot H(p_1/p, \dots, p_n/p) \le \sum_{i=1}^n h(\ell_i) f(\ell_i) < p \cdot (H(p_1/p, \dots, p_n/p) + 1),$$

where $\ell_1, \dots, \ell_n$ are the leaves of $T$ and $p = \sum_{i=1}^n p_i$.

Proof. In case $p = 0$, the statement is clear, since this implies $p_i = 0$ for all $i$. Thus, assume that $p > 0$.

The lower bound follows from the first part of Theorem 4.2.26, as it is valid for all additive trees with weight $p > 0$. Without loss of generality, we can assume that $p = 1$.
The second part of Theorem 4.2.26 shows that there exists an additive tree with average height $< p \cdot (H(p_1/p, \dots, p_n/p) + 1)$. Since the average height of the output of Algorithm 4.6 is minimal by Proposition 4.2.29, this claim follows as well.

We therefore have shown the following result:

Corollary 4.2.31. Given polynomials $f_1, \dots, f_n \in R[X]$ of degree $\ge 1$, it is possible to compute the product $f := \prod_{i=1}^n f_i$ in less than

$$M(\deg f) \cdot (H(\deg f_1 / \deg f, \dots, \deg f_n / \deg f) + 1) \le M(\deg f) \cdot (\log_2 n + 1)$$

arithmetic operations in $R$, with an administrative cost of $O(n \log n)$ operations.

Proof. Let $(T, r, f)$ denote the additive tree output by Algorithm 4.6 for the leaf weights $\deg f_1, \dots, \deg f_n$. Building the tree is the administrative cost, which is $O(n \log n)$ by Proposition 4.2.28. Let $\ell_1, \dots, \ell_n$ be the leaves of $T$, where $f(\ell_i) = \deg f_i$. By Proposition 4.2.27, the total running time of multiplying along the tree $T$ is bounded by $M(\deg f) \cdot \sum_{i=1}^n h(\ell_i) \frac{\deg f_i}{\deg f}$. Next, Corollary 4.2.30 yields

$$\sum_{i=1}^n h(\ell_i) \frac{\deg f_i}{\deg f} < H(\deg f_1 / \deg f, \dots, \deg f_n / \deg f) + 1,$$

and Lemma 4.2.23 gives $H(\deg f_1 / \deg f, \dots, \deg f_n / \deg f) \le \log_2 n$.

Note that the algorithm is essentially equivalent to the following algorithm, which avoids talking about (Huffman) trees. The statements of Corollary 4.2.31 also hold for this algorithm:

Input: polynomials $f_1, \dots, f_n \in R[X]$
Output: the product $\prod_{i=1}^n f_i$
(1) Put the polynomials into a list $L$ and sort the list by degrees;
(2) While $L$ contains more than one element:
    (a) Remove the two first elements $g$, $h$ of $L$;
    (b) Compute $f := g \cdot h$;
    (c) Insert $f$ into $L$ such that $L$ is still sorted by degree;
(3) Return the unique element in $L$.
Algorithm 4.7: Computing $\prod_{i=1}^n f_i$

In fact, several results which use multiplication trees can be extended by using Huffman trees. We will see some such examples in Chapter 5.

4.3 Sorting

There exist many different sorting algorithms.
We want to discuss two algorithms, one slow and simple, namely selection sort, and another which is fast (in fact, asymptotically optimal), namely heap sort. The algorithm used in Python is called Timsort, a hybrid algorithm; see Wikipedia for more information. Its asymptotic complexity is optimal as well, but it is better optimized for many "every-day situations". There are many other algorithms, both asymptotically optimal ones and slow but easy ones. A nice overview can be found on Wikipedia as well.

Let us first begin with a theorem from Complexity Theory:

Theorem 4.3.1. For any sorting algorithm which runs on a classical computer and knows nothing about the data except a way to compare two data items to each other, there exists an input $(x_1, \dots, x_n)$ such that it needs at least $\lceil \log_2(n!) \rceil \in \Theta(n \log n)$ comparisons to sort it.

For more details, see Wikipedia. Also note that if more is known about the data, faster algorithms might exist; one example is the infamous bucket sort.

Before we begin with describing the sorting algorithms, we want to explain how to sort data in Python. There are two ways to sort data. The first is using the sorted() function, which can be applied to any sequence object (for example, tuples and lists; compare Section 2.3.2). The second is the sort() member function of lists: it sorts the list in-place (i.e. without creating a new list).

By default, both functions use <, == and > to compare elements, but this behavior can be changed in two ways. In the following, let $x_1, \dots, x_n \in X$ be some data which shall be sorted.

Sorting using Keys

Assume that we have a function $f : X \to Y$, such that objects of type $Y$ can be compared using <, == and >. Then Python can sort $x_1, \dots, x_n$ such that $f(x_1) \le \dots \le f(x_n)$ as follows:

    x = ["house", "forest", "kid", "witch"]
    x.sort(key=len)

Then the function len will be applied to all elements of x to obtain $f(x)$.
The result will be:

    x == ["kid", "house", "witch", "forest"]

Note that the relative order of "house" and "witch" (both have five letters) is not changed; this is because Timsort is stable. (With heap sort, it could happen that the relative order of such elements would change.)

Note that one can reverse the sorting order by using the additional argument reverse=True:

    x = ["house", "forest", "kid", "witch"]
    x.sort(key=len, reverse=True)

This yields:

    x == ["forest", "house", "witch", "kid"]

Note that here, the original relative order of "house" and "witch" was not changed either; we only have $f(x_1) \ge \dots \ge f(x_n)$ now.

Note that one also can use user-defined functions:

    def getSecondElement(x):
        return x[1]

    x = [("house", 13), ("forest", 15), ("kid", 1), ("witch", 38)]
    x.sort(key=getSecondElement)

This yields:

    x == [("kid", 1), ("house", 13), ("forest", 15), ("witch", 38)]

The list is sorted by the second element of each tuple it contains.

Sorting using Comparison Functions

By default, Python uses <, >, == to compare two elements. To change the sorting order fundamentally (or define it in the first place), one possibility is to redefine these operators; this is usually not a good idea. A better idea is to specify a comparison function to sort(), which returns a negative number, zero or a positive number to indicate that the first element is less than, equal to or larger than the second, respectively:

    def compareTwoTuples(x, y):
        c = cmp(x[1], y[1])
        if c != 0:
            return c
        return cmp(len(x[0]), len(y[0]))

    x = [("forest", 13), ("house", 13), ("kid", 1), ("witch", 5)]
    x.sort(cmp=compareTwoTuples)

(The function cmp(x, y) returns -1 if x < y, 0 if x == y and 1 if x > y. It can be implemented for own classes by defining a method __cmp__(self, y) inside the class.
If this method is not available, Python uses <, == and > to determine their order.)

This comparison function first compares the second elements of the tuples (in the usual manner). If the second elements are equal, it compares the lengths of the first elements. This yields:

    x == [("kid", 1), ("witch", 5), ("house", 13), ("forest", 13)]

Note that here, one can also use reverse=True to obtain the reversed order. Also note that specifying a comparison function is more powerful than specifying a key function, but also slower: Python calls the key function once for each input element, while it has to call the cmp function for each comparison (and by Theorem 4.3.1, this can happen Θ(n log n) times).

Finally, one can also combine key and cmp: then cmp will be used to compare key(x[0]), ..., key(x[n-1]). It is possible to add reverse=True here as well.

As an application of everything discussed above, we want to show how Algorithm 4.7 can be implemented in Python:

    def multiply_list(polys):
        if len(polys) == 0:
            return None
        # Sort the list by degree of its entries, and store
        # the result in a new list
        # (we use that len(poly) will return deg + 1)
        polys = sorted(polys, key=len)
        while len(polys) > 1:
            # Remove the first two elements
            f1 = polys[0]
            f2 = polys[1]
            del polys[0:2]
            # Compute their product
            f = f1 * f2
            # Insert the product at the right position.
            # Here, we cheat by simply sorting. This
            # is *NOT* optimal!
            polys.append(f)
            polys.sort(key=len)
        return polys[0]

We can use it as follows:

    >>> import polynomials
    >>> import rings
    >>> R = rings.Integers()
    >>> x = [polynomials.Polynomial(R, [1, 2]),
    ...      polynomials.Polynomial(R, [4, 5, 6]),
    ...      polynomials.Polynomial(R, [4, 5, 6, 7, 8, 9]),
    ...      polynomials.Polynomial(R, [1, 2, 3, 4])]
    >>> f = multiply_list(x)
    >>> print f
    (432 * X^11 + 1284 * X^10 + 2252 * X^9 + 2871 * X^8 + 2986 *
    X^7 + 2750 * X^6 + 2196 * X^5 + 1494 * X^4 + 820 * X^3 +
    345 * X^2 + 104 * X^1 + 16)
    >>> g = x[0] * x[1] * x[2] * x[3]
    >>> print f == g
    True

4.3.1 Selection Sort

Selection sort is a very simple sorting algorithm. It works as follows: given n data items x_1, ..., x_n, it first looks through the whole list to find the smallest element, say x_i. Then it moves x_i to the first position, and continues recursively with the list (x_1, ..., x_{i−1}, x_{i+1}, ..., x_n). This can be done in-place (that is, without creating new lists, but by just modifying the current list) as follows:

    def selection_sort(v):
        for i in xrange(len(v)):
            # Find the index current with smallest v[current]
            # for i <= current < len(v)
            current = i
            for j in xrange(i + 1, len(v)):
                if v[j] < v[current]:
                    current = j
            # Swap v[i] with v[current]
            v[i], v[current] = v[current], v[i]

Note that in iteration i, one has to do len(v) - i - 1 comparisons. Therefore, if the length of v is n, then the total number of comparisons is

\[
\sum_{i=0}^{n-1} (n - i - 1) = n^2 - \sum_{i=0}^{n-1} i - n = n^2 - \tfrac{1}{2} n(n-1) - n = \tfrac{1}{2} n(n-1).
\]

Therefore, we obtain:

Theorem 4.3.2 (Selection Sort). The running time of selection sort is in Θ(n^2).

Note that selection sort is one of the few sorting algorithms whose running time does not depend on the input data (except for its length and the time needed for each comparison, of course). Most algorithms have certain inputs for which they perform very fast (usually in Θ(n) operations), and others for which they are slow (often somewhere between O(n log n) and O(n^2) operations).
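The count of n(n − 1)/2 comparisons for selection sort can be checked empirically. The following is a minimal sketch, assuming Python 3 (so range is used where the Python 2 listings use xrange); the function name selection_sort_count is hypothetical and was introduced only for this illustration. It performs the same in-place selection sort, but additionally counts the element comparisons.

```python
def selection_sort_count(v):
    """Sort v in place with selection sort and return the number of
    element comparisons performed (hypothetical helper for illustration)."""
    comparisons = 0
    for i in range(len(v)):
        # Find the index of the smallest element among v[i], ..., v[-1]
        current = i
        for j in range(i + 1, len(v)):
            comparisons += 1
            if v[j] < v[current]:
                current = j
        # Move the smallest remaining element to position i
        v[i], v[current] = v[current], v[i]
    return comparisons


if __name__ == "__main__":
    # The count depends only on the length, not on the order of the input
    for data in ([3, 1, 2, 5, 4], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]):
        n = len(data)
        count = selection_sort_count(data)
        print(data, count, n * (n - 1) // 2)
```

For all three inputs of length 5, the reported count equals 5 · 4 / 2 = 10, whether the input is shuffled, already sorted or reversed.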
4.3.2 Heap Sort

The idea is very simple: since a heap is a priority queue and allows us to quickly retrieve the smallest element, we create a heap from the given input data, and then repeatedly remove the smallest element and append it to the destination list until the heap is empty.

It is also possible to do this in-place. For this, one usually creates the heap such that the root is the largest element (i.e. we use the reversed order). The heap can be created in-place in the list (x_1, ..., x_n) of original data, as described in Section 4.2.4.4 (representing the heap in a linear fashion) and Section 4.2.4.3 (turning unsorted data into a heap). After this, we iteratively remove the largest element. Since each removal reduces the size of the heap by one, we can store the element which was just removed at the position directly after the reduced heap, which is now free:

Input: unsorted data x_1, ..., x_n
Output: the data rearranged such that it is sorted

(1) Use Algorithm 4.5 to turn (x_1, ..., x_n) into the linear representation of a heap, with reversed order such that the root is always the largest element;
(2) For k = n, n − 1, ..., 2, do:
    (a) Set t := x_1;
    (b) Remove the largest element x_1 from the heap, which is represented by the elements (x_1, ..., x_k);
    (c) Afterwards, the heap will be represented by (x_1, ..., x_{k−1});
    (d) Set x_k := t;
(3) Return (x_1, ..., x_n).

Algorithm 4.8: Heap sort

Theorem 4.3.3 (Heap Sort). Algorithm 4.8 sorts n elements of unsorted data in O(n log n) operations. Therefore, it is an asymptotically optimal sorting algorithm.

Proof. By Proposition 4.2.20, building the heap can be done in O(n) operations. In loop iteration k, the heap has size k, whence removing the largest element can be done in O(log k) operations by Proposition 4.2.19. Therefore, the total number of operations is in

\[
O(n) + \sum_{k=1}^{n} O(\log k) \subseteq O(n \log n).
\]
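The in-place variant of heap sort can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the implementation from Section 4.2.4: it assumes Python 3 and 0-based indexing (the heap occupies v[0], ..., v[k-1], and the children of index i are 2i + 1 and 2i + 2), and the names sift_down and heap_sort are hypothetical. The bottom-up heap construction used here is the standard one and may differ in detail from Algorithm 4.5.

```python
def sift_down(v, root, end):
    """Restore the max-heap property for the subtree rooted at index root,
    considering only the elements v[0], ..., v[end - 1]."""
    while True:
        child = 2 * root + 1          # left child in the linear representation
        if child >= end:
            return                    # root is a leaf
        # Select the larger of the two children (if a right child exists)
        if child + 1 < end and v[child] < v[child + 1]:
            child += 1
        if v[root] < v[child]:
            v[root], v[child] = v[child], v[root]
            root = child              # continue one level further down
        else:
            return                    # heap property holds here


def heap_sort(v):
    """Sort the list v in place in ascending order and return it."""
    n = len(v)
    # Step (1): turn v into a max-heap, starting from the last inner node
    for start in range(n // 2 - 1, -1, -1):
        sift_down(v, start, n)
    # Step (2): move the current maximum behind the shrinking heap
    for k in range(n - 1, 0, -1):
        v[0], v[k] = v[k], v[0]       # largest element goes to position k
        sift_down(v, 0, k)            # the heap is now v[0], ..., v[k - 1]
    return v


if __name__ == "__main__":
    print(heap_sort([13, 1, 38, 15, 5]))  # prints [1, 5, 13, 15, 38]
```

The swap in step (2) combines steps (a) to (d) of the algorithm: the root is exchanged with the last heap element, and sift_down then repairs the heap on the shortened range. Only the operator < is used for comparisons, so the sketch works for any elements that support it.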