COMP200 - Data Structures
Non-Linear Data Structures - Trees

Rob Dempster
School of Computer Science
University of KwaZulu-Natal
Pietermaritzburg Campus

Abstract

This is not a paper. It is a segment of the lecture presentation slides I used to provide a framework for the Data Structures lectures I presented while my (ex) boss spent a year (on sabbatical) gadding about the northern hemisphere. The slides were prepared using SuSE Linux, Emacs, LaTeX and Prosper.

© 2008, Robert Dempster. These are free slides. This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. (This license allows you to redistribute these slides and handouts in unmodified form. It allows you to make and distribute modified versions, as long as you include an attribution to the original author, clearly describe the modifications that you have made, and distribute the modified work under the same license as the original. See http://creativecommons.org/licenses/by-sa/2.5/ for full details.)

The most recent versions of these slides and handouts are always available, at no charge, for downloading and for on-line use at the Web address http://java-sun.cs.ukzn.ac.za/~robd/dsslides/ . There you will find the LaTeX source code together with the slides in formats suitable for slide presentations (.a4.sp.) and hand-outs (.a4.ho.).

Introduction

In the previous section we studied the properties of linear (sequential) data types that could be used to efficiently represent and manipulate linear arrangements of data.

In this section we will study the properties of a subset of the nonlinear data types that could be used to efficiently represent and manipulate nonlinear arrangements of data.

The nonlinear data we will be concerned with here are those data that are either explicitly arranged in a hierarchical manner, or result in a hierarchical structure when processed.
Classical examples include:
Family trees.
Possible routes from a starting point to a particular destination: shortest paths, spanning trees and minimum spanning trees.
(Business) decision tables or game trees.

Definitions

A general tree (ADT) can be defined recursively as follows:

My definition:
1. A tree is either empty (NULL), i.e., contains no elements, or
2. It consists of a root node that has zero or more (sub)trees associated with it.

From [DROZ1996]:
1. An empty structure (i.e., one that contains no elements/nodes) is an empty tree.
2. If t1, t2, ..., tk are disjoint (independent/unconnected) trees, then the structure whose root has as its children the roots of t1, t2, ..., tk is also a tree.
3. Only structures generated by rules 1 and 2 are trees.

From [HORO1995]: A tree is a finite set of one or more nodes such that
1. There is a specially designated node called the root.
2. The remaining nodes are partitioned into n ≥ 0 disjoint sets T1, T2, ..., Tn, where each of these sets is a tree. T1, T2, ..., Tn are called the subtrees of the root.

Tree Examples

[Figure: example trees (i) to (v); not reproduced]

Some Terminology

(i) is an empty tree, i.e., it contains no nodes.
(ii) is a tree containing a single node, which is also the root node.
(iii) is a tree consisting of a root node and three subtrees.
In (iii) the nodes b, c and d are children of the parent a.
Nodes which have no children are generally referred to as the leaves of the tree. In case you have not noticed, our trees are upside down.
The ancestors of a node are all the nodes along the unique path from the root to that node.
In (v) the nodes b, c and d are siblings, i.e., brothers/sisters. Note that e and f are not siblings of b, c and d.
The level of a node is equal to the number of arcs/links/edges traversed from the root to the node, plus 1.
The height (or depth) of a tree is equal to the maximum level of any node in the tree. For (v) it is 5.

More Tree Terminology

The number of subtrees of a node is called its degree.
The degree of a tree is the maximum of the degrees of the nodes in the tree.
An n-ary tree can be defined as follows:
1. An n-ary tree is either empty (NULL), i.e., contains no elements, or
2. It consists of a root node that has at most n (sub)trees attached to it.

More Tree Terminology (cont'd.)

Consider the following examples for n = 2 (better known as Binary Trees):
If all the leaves in a tree are at either level h or level h + 1, then the tree is balanced. Both (i) and (ii) are balanced. Allowing a tree to grow in an unbalanced manner may make some of the algorithms that use the tree inefficient.
If all the leaves in a tree are at the same level h, then the tree is complete. Only (i) is complete.
For all nonempty binary trees whose non-terminal nodes have exactly two nonempty children, the number of leaves m exceeds the number of non-terminal nodes k by one, i.e., m = k + 1.

Balanced Binary Trees

[Figure: binary trees (i) and (ii); not reproduced]

Tree Traversals and Binary Search Trees

If the nodes in a tree have been placed such that a particular traversal of the tree reveals a specific order in some way, then it is an ordered tree. Both these trees exhibit order:
An in-order traversal of (i) returns the list of letters sorted alphabetically.
For (ii) (taking note of how unary operators are taken care of):
pre-order returns the prefix expression,
in-order returns the infix expression,
post-order returns the postfix expression.
Ordered trees are often maintained as such to improve the efficiency of searching algorithms.
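The three depth-first traversals mentioned above can be tried out with a small Java sketch. It is only an illustration under stated assumptions: the class TraversalSketch, its Node type and the helper names are hypothetical and do not come from the slides, and the tree is built by ordinary binary search tree insertion of single letters.

```java
// Illustrative sketch only: TraversalSketch, Node, build and the traversal
// helpers are hypothetical names, not taken from the slides.
class TraversalSketch {
    static class Node {
        final char data;
        Node left, right;
        Node(char data) { this.data = data; }
    }

    // Ordinary BST insertion: smaller keys go left, all others go right.
    static Node insert(Node t, char c) {
        if (t == null) return new Node(c);
        if (c < t.data) t.left = insert(t.left, c);
        else t.right = insert(t.right, c);
        return t;
    }

    // Builds a BST by inserting the characters of keys from left to right.
    static Node build(String keys) {
        Node root = null;
        for (char c : keys.toCharArray()) root = insert(root, c);
        return root;
    }

    // order: 0 = pre-order, 1 = in-order, 2 = post-order.
    static void walk(Node t, int order, StringBuilder out) {
        if (t == null) return;
        if (order == 0) out.append(t.data);   // visit before both subtrees
        walk(t.left, order, out);
        if (order == 1) out.append(t.data);   // visit between the subtrees
        walk(t.right, order, out);
        if (order == 2) out.append(t.data);   // visit after both subtrees
    }

    static String traverse(Node t, int order) {
        StringBuilder sb = new StringBuilder();
        walk(t, order, sb);
        return sb.toString();
    }

    static String preOrder(Node t)  { return traverse(t, 0); }
    static String inOrder(Node t)   { return traverse(t, 1); }
    static String postOrder(Node t) { return traverse(t, 2); }

    public static void main(String[] args) {
        Node root = build("DBFACEG");
        System.out.println(inOrder(root));    // ABCDEFG
        System.out.println(preOrder(root));   // DBACFEG
        System.out.println(postOrder(root));  // ACBEGFD
    }
}
```

Whatever order the letters are inserted in, the in-order result is the sorted list; only the pre-order and post-order results depend on the shape of the tree.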
Binary trees are used in this context so often that we refer to Binary Search Trees (BST), Decision Trees and Game Trees:
Average case search: O(log n) for BSTs versus about n/2 comparisons, i.e., O(n), for typical ordered linear data structures.
Worst case: both O(n).

Tree Traversals - Breadth First

The traversals we have examined here are also depth first traversals. The traversal algorithm always descends to the greatest possible depth within the tree before backtracking and trying an alternate path.
This means that a search based on these algorithms may traverse several alternate paths before it finds a solution that exists at a depth significantly less than the height of the tree.
This suggests that an alternative approach that traverses the tree breadth first, rather than depth first, may be more appropriate.
While a discussion of tree ADTs should not concern itself with implementation details, we have already done so in terms of the figures we have used to represent trees. We have also done so in terms of some of the terminology we have introduced.

Tree Traversals - Breadth First (cont'd.)

The following representation/implementation is presented as an alternative for one of the trees (v) presented earlier.
This representation certainly supports breadth first traversals more directly. It also makes it possible to proceed directly from one sibling to another.
Traversals can also be supported by using the arcs linking the nodes in a different manner and/or by using additional arcs.
We can also avoid the use of extra arcs if we use small tags, possibly dynamically, to determine how arcs are to be interpreted.

Breadth First Searching Tree Example

[Figure: breadth first searching tree example; not reproduced]

Threaded Tree Traversals

The BST in the following diagram employs tags to enhance its interpretation of arcs such that they can be followed directly to perform one of the binary tree traversals.
In this implementation, if a tag implies that an arc does not point to a node in the conventional sense, then it serves as a thread (pointer) which points either to the node's predecessor or to its successor.
The threads used in this form of (threaded) tree can thus be used to traverse the tree without having to employ a stack either directly, or implicitly via recursion. (The threads are in essence a form of implicit hardwired stack.)
Finally, it is possible to traverse a tree without an implicit run-time stack and without resorting to explicit threads. This can be done by creating the stack (or threads) dynamically during the traversal of the tree.

Threaded Tree Example

[Figure: threaded tree example; not reproduced]

Morris' stackless in-order Tree Traversal Example

Morris' algorithm (shown in the next slide) achieves a stackless BST traversal by making changes to the tree during the traversal.
The changes involve right links that are null and are therefore not otherwise required during the traversal: the rightmost node of a left subtree is temporarily made to point at its in-order successor.
In the version shown, each temporary link is cut again (tmp.right = null) when it is encountered for the second time, so the tree is back in its original shape once the traversal completes.
A variant that made no attempt to restore the altered links would in effect destroy the tree. The tree could then only be restored by creating and retaining some additional information during the traversal, which would of course be contrary to the purpose of Morris' algorithm, namely to eliminate the run-time stack.
Morris' Algorithm

    BT p = this;
    BT tmp;
    while (p != null) {
        if (p.left == null) {
            visit(p);                      // process the node data
            p = p.right;
        } else {
            tmp = p.left;
            while (tmp.right != null &&    // go to the rightmost node of the
                   tmp.right != p)         // left subtree, or to the temporary
                tmp = tmp.right;           // parent of p
            if (tmp.right == null) {       // the true rightmost node was reached:
                tmp.right = p;             // make it a temporary parent of the
                p = p.left;                // current root
            } else {                       // a temporary parent has been found:
                visit(p);                  // visit node p, then cut the right
                tmp.right = null;          // pointer of the temporary parent,
                p = p.right;               // whereby it ceases to be a parent
            }
        }
    }

Node Deletion

Deleting a node from any tree is generally non-trivial and presents three cases:
The node is a leaf node: simply replace the reference to the node with a null reference.
The node has only one child: replace the reference to the node with a reference to its single child node.
The node has two children: several solutions exist and we will examine one (Delete by Merging) in more detail in the next slide.

The Delete by Merging solution:
Constructs a new ordered tree out of the two subtrees by merging them: the right subtree is attached as the right child of the rightmost node of the left subtree.
It then attaches this tree to the parent of the node to be deleted, in its place.
The height of the tree may be extended during the deletion and the tree may also become quite unbalanced.

Delete (a node) by Merge - Pseudo Code

    // assume p is the reference in the tree
    // to the node to be deleted
    if (p references a leaf node)
        set p to null
    else if (p.right is null)
        set p to p.left
    else if (p.left is null)
        set p to p.right
    else
        tp = p.left
        while (tp.right != null)
            tp = tp.right
        tp.right = p.right
        set p to p.left
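The delete by merging pseudo code can be turned into a small, self-contained Java sketch. It is an illustration under stated assumptions, not the implementation used in these slides: MergeDeleteSketch, Node and the helper methods are hypothetical names, keys are plain ints, an empty subtree is represented by null, and "set p to ..." is realised by returning the replacement reference so that the caller can store it.

```java
// Illustrative sketch only: all names here are hypothetical.
class MergeDeleteSketch {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else t.right = insert(t.right, key);
        return t;
    }

    // Returns the subtree that takes the place of p once p has been removed.
    static Node mergeSubtrees(Node p) {
        if (p.right == null) return p.left;    // leaf (null) or left child only
        if (p.left == null) return p.right;    // right child only
        Node tp = p.left;                      // two children: find the rightmost
        while (tp.right != null)               // node of the left subtree ...
            tp = tp.right;
        tp.right = p.right;                    // ... and hang the right subtree off it
        return p.left;
    }

    // Deletes key (if present) and returns the possibly new root of the tree.
    static Node delete(Node t, int key) {
        if (t == null) return null;            // key not found
        if (key < t.key) t.left = delete(t.left, key);
        else if (key > t.key) t.right = delete(t.right, key);
        else t = mergeSubtrees(t);
        return t;
    }

    static void inOrder(Node t, StringBuilder out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.append(t.key).append(' ');
        inOrder(t.right, out);
    }

    static String inOrder(Node t) {
        StringBuilder sb = new StringBuilder();
        inOrder(t, sb);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {50, 30, 70, 20, 40, 60, 80}) root = insert(root, k);
        root = delete(root, 50);               // the root has two children
        System.out.println(root.key);          // 30
        System.out.println(inOrder(root));     // 20 30 40 60 70 80
    }
}
```

In the usage example the subtree rooted at 70 ends up below 40, so the tree grows from three levels to five, which illustrates the remark that the height may be extended by the deletion.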
Delete (a node) by Merge - A Java Problem

Consider a method deleteNode(BST bt, T data) where:
bt is a reference to the root node of the BST (or null if the BST is empty), and
data is the value associated with the node in the BST to be deleted.
If the node to be deleted is the root node of the BST, then we have a problem.
The problem is that Java passes parameters by value. Any change made to the root reference by this method, e.g., such that it references the new root node of the merged subtrees, does not affect the reference to the root node of the BST in the calling environment.

Delete (a node) by Merge - A Java Problem (cont'd.)

This problem could be solved if the root field could be passed as a reference parameter.
That would allow the method to change the reference to the root node such that the change is visible in the calling environment.
Both C++ and C# support reference parameters.
The problem could also be solved by modifying the BST representation and delete algorithm as shown in the following slides.
The algorithm requires that an additional parameter be passed to the method: a reference to the Java field instance (probably root) that in turn references the node at the root of the BST.

The new BST and Empty BST Representation

[Figure: the BSTX representation and the EMPTY tree, with fields labelled root, data, left and right; not reproduced]

Delete by Merge - Java Code - Part 1

    public void deleteByMerging(T data) {
        BT p = null, pp = null, tmp = null;  // p: current node, pp: parent of p, tmp: scratch
        boolean left = false;                // true if p is the left child of pp
        if (isEmpty()) {
            return;
        } else {
            pp = root;                       // the root field acts as the parent of the topmost
            p = root.getRight();             // node, which hangs off its right link
            while (!p.isEmpty())
                if (p.compareTo(data) == 0)
                    break;
                else if (p.compareTo(data) < 0) {
                    left = false;
                    pp = p;
                    p = p.getRight();
                } else {
                    left = true;
                    pp = p;
                    p = p.getLeft();
                }
        }
        // continued in Part 2

Delete by Merge - Java Code - Part 2

        System.out.println("*** " + p.getData());       // trace output
        if (!p.isEmpty()) {                             // the node was found
            if (p.getLeft().isEmpty() && p.getRight().isEmpty()) {   // leaf
                if (left) pp.setLeft(BT.EMPTY);
                else pp.setRight(BT.EMPTY);
            } else if (p.getRight().isEmpty()) {        // left child only
                if (left) pp.setLeft(p.getLeft());
                else pp.setRight(p.getLeft());
            } else if (p.getLeft().isEmpty()) {         // right child only
                if (left) pp.setLeft(p.getRight());
                else pp.setRight(p.getRight());
            } else {                                    // continued on the following slide

Delete by Merge - Java Code - Part 3

            // } else {  (two children; continued from the previous slide)
                tmp = p.getLeft();
                while (!tmp.getRight().isEmpty())       // rightmost node of the left subtree
                    tmp = tmp.getRight();
                tmp.setRight(p.getRight());             // attach the right subtree to it
                if (left) pp.setLeft(p.getLeft());
                else pp.setRight(p.getLeft());
            }
        }
    }

Node Deletion by Copying

This solution effectively deletes a non-leaf node by replacing its data with the data of its immediate predecessor.
The immediate predecessor has no right child, so it is either a leaf or has a single (left) child, and is easily deleted.
This algorithm does not increase the height of the tree.
It is however asymmetric, as it always deletes the node of the immediate predecessor of the node which is in effect being deleted.
The development of this algorithm and its implementation is left as an exercise.

Balancing Trees

In terms of the general discussion thus far it has been pointed out that:
Trees model the hierarchical structure of the data of certain problem domains very well, and
Searching a tree for an item can be done faster than for a list.
The second claim is only true if we are able to ensure that the tree being searched remains reasonably balanced. For a balanced binary search tree containing n nodes the number of comparisons is about log2 n (at most ⌊log2 n⌋ + 1 for a tree of minimum height).
There are a number of techniques that can be employed to obtain balanced trees. They include:
Ordering the data prior to inserting it into the tree, such that it produces a balanced tree.
This suits applications which only ever search an established tree.
Rearranging the nodes in the tree after a new node has been inserted, to ensure that it remains balanced. This suits applications that add/delete nodes dynamically as the program runs.

Balancing Trees (cont'd.)

The following neat routine is based on the first option. It requires a previously sorted array containing the data. It takes the element in the middle of the (sub)array, inserts it into the tree, and then recursively does the same for the elements to the left and to the right of the middle.

    void balance(T[] data, int first, int last) {
        if (first <= last) {
            int middle = first + (last - first) / 2;
            insert(data[middle]);
            balance(data, first, middle - 1);
            balance(data, middle + 1, last);
        }
    }

Splay Trees (leans heavily on Bailey [2])

Because the process of adding a new value to a binary search tree is deterministic (it produces the same result tree each time), and because inspection of the tree does not modify its structure, one is stuck with the performance of any degenerate tree constructed.
What might work better would be to allow the tree to reconfigure itself when operations appear to be inefficient.
The splay tree quickly overcomes poor performance by rearranging the tree's nodes on the fly using a simple operation called a splay. Instead of performing careful analysis and optimally modifying the structure whenever a node is added or removed, the splay tree simply moves the referenced node to the top of the tree.
The operation has the interesting characteristic that the average depth of the ancestors of the node to be splayed is approximately halved.
As with skew heaps, the performance of a splay tree's operations, when amortised over many operations, is logarithmic.

Splay Tree Rotations

The basis for the splay operation is a pair of operations called rotations as shown in the following slide.
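A rotation is easiest to appreciate in code. The following Java sketch is only an illustration under stated assumptions: RotationSketch, Node, rotateRight and rotateLeft are hypothetical names (they are not the methods of the class referred to in these slides), nodes carry no parent reference, and each method returns the new root of the rotated subtree so that the caller can re-attach it.

```java
// Illustrative sketch only: all names here are hypothetical.
class RotationSketch {
    static class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Right rotation about y: y's left child x becomes the subtree root.
    // Assumes y.left is not null.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // x's right subtree moves across to y
        x.right = y;        // y becomes the right child of x
        return x;
    }

    // Left rotation about x: x's right child y becomes the subtree root.
    // Assumes x.right is not null. This is the inverse of rotateRight.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;   // y's left subtree moves across to x
        y.left = x;         // x becomes the left child of y
        return y;
    }

    static void inOrder(Node t, StringBuilder out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.append(t.key).append(' ');
        inOrder(t.right, out);
    }

    static String inOrder(Node t) {
        StringBuilder sb = new StringBuilder();
        inOrder(t, sb);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Node y = new Node(4,
                new Node(2, new Node(1, null, null), new Node(3, null, null)),
                new Node(5, null, null));
        Node x = rotateRight(y);
        System.out.println(x.key);             // 2
        System.out.println(inOrder(x));        // 1 2 3 4 5
        System.out.println(rotateLeft(x).key); // 4
    }
}
```

The in-order sequence is the same before and after each rotation, so a rotation changes the shape of a binary search tree without disturbing its ordering.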
Each of these rotations replaces the root of a subtree with one of its children.
A right rotation takes a left child, x, of a node y and reverses their relationship. This induces certain obvious changes in the connectivity of subtrees, but in all other ways the tree remains the same. In particular, there is no structural effect on the tree above the original location of node y.
A left rotation is precisely the opposite of a right rotation; these operations are inverses of each other.
For each rotation accomplished, the non-root node moves upward by one level.

Splay Tree Rotations (cont'd.)

The code for rotating a binary tree about a node is a method of the BinaryTree class. A splay at a node x proceeds as follows:
If x is the root, we are done.
If x is a left (or right) child of the root, rotate the tree to the right (or left) about the root. x becomes the root and we are done.
If x is the left child of its parent p, which is, in turn, the left child of its grandparent g, rotate right about g, followed by a right rotation about p. A symmetric pair of rotations is possible if x is a right child of a right child. After the double rotation, continue the splay of the tree at x with this new tree.
If x is the right child of p, which is the left child of g, rotate left about p, then right about g. The method is similar if x is the left child of a right child. Again, continue the splay at x in the new tree.
After the splay has been completed, the node x is located at the root of the tree. If node x were to be immediately accessed again (a strong possibility), the tree is clearly optimised to handle this situation.

Splay Tree Rotations (cont'd.)

Splaying does not necessarily make the tree more balanced. Clearly, if the tree is splayed at an extremal value, the tree is likely to be extremely unbalanced.
An interesting feature, however, is that the depth of the nodes on the original path from x to the root of the tree is, on average, halved.
Since the average depth of these nodes is halved, they clearly occupy locations closer to the top of the tree where they may be more efficiently accessed.
To guarantee that the splay has an effect on all operations, we simply perform each of the binary search tree operations as before, but we splay the tree at the node accessed or modified during the operation. In the case of remove, we splay the tree at the parent of the value removed.

References

[1] Downey, Allen B. How to Think Like a Computer Scientist. URL: http://www.greenteapress.com/thinkapjava/
[2] Bailey, Duane A. Java Structures: Data Structures in Java for the Principled Programmer. URL: http://www.cs.williams.edu/JavaStructures/Welcome.html