COSC242 Lecture 13: Trees

Why do we care about trees? Well, consider the following classification, which Borges attributes to a Chinese encyclopedia called the Celestial Emporium of Benevolent Knowledge: "On those remote pages it is written that animals may be divided into (a) those that belong to the Emperor (b) embalmed ones (c) those that are trained (d) suckling pigs (e) mermaids (f) fabulous ones (g) stray dogs (h) those that are included in this classification (i) those that tremble as if they were mad (j) innumerable ones (k) those drawn with a very fine camel’s hair brush (l) others (m) those that have just broken a flower vase (n) those that resemble flies from a distance."

Why is this a bad taxonomy?

The uses of trees

A classification makes better sense if we visualise it as a tree:

[Figure: the classification drawn as a tree with root "animals"; the other nodes are insects, bees, fish, "bad ones", mammals, bitey mammals and cuddly mammals.]

It is natural for people to visualise hierarchies such as business organisation charts as trees, and it is no accident that people speak of family trees. We draw trees hanging down from a root at the top because it is easier to draw them that way round, and each node, including the root, can have any number of children. We will start with simple binary trees, in which each node has at most 2 children.

Hash tables or binary search trees?

Hash tables (basically arrays) are convenient if the maximum size of the table is known beforehand, the actual size does not fluctuate much or spend long at a small fraction of the maximum, and the emphasis is on insertion and retrieval (search).

But what if we need to do lots of deletions? What if we need to do traversals, e.g. to print out items in order of increasing key values? What if dynamic storage allocation is needed, because the maximum table size is unknown or the size fluctuates a lot? Pointer-based data structures like linked lists are good for these operations, and we can hang linked lists off a hash table if we use chaining.
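A minimal C sketch of the linked-list side of this trade-off may help. The names `list_node`, `list_push`, `list_search` and `list_free`, and the `probes` counter, are hypothetical illustrations, not course code. Nodes are allocated on demand, so no maximum size is needed, and adding at the head is O(1); finding a key, however, means walking the chain one node at a time.

```c
#include <stdlib.h>

/* A singly linked list node holding an int key (hypothetical layout). */
typedef struct list_node {
    int key;
    struct list_node *next;
} list_node;

/* Push a key onto the front of the list in O(1); returns the new head. */
list_node *list_push(list_node *head, int key) {
    list_node *n = malloc(sizeof *n);
    if (n == NULL) abort();          /* out of memory */
    n->key = key;
    n->next = head;
    return n;
}

/* Linear search. If probes is non-NULL it receives the number of nodes
   examined: the key's position if found, or the full length if absent. */
list_node *list_search(list_node *head, int key, size_t *probes) {
    size_t count = 0;
    list_node *p = head;
    while (p != NULL) {
        count++;
        if (p->key == key) break;
        p = p->next;
    }
    if (probes != NULL) *probes = count;
    return p;                        /* NULL means "item not found" */
}

/* Free every node in the list. */
void list_free(list_node *head) {
    while (head != NULL) {
        list_node *next = head->next;
        free(head);
        head = next;
    }
}
```

For instance, after pushing 5, 7 and 2 the list reads 2, 7, 5; searching for 7 takes 2 probes, while searching for a missing key such as 4 visits all 3 nodes, which is the O(n) worst case.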
But we know linked lists are suboptimal for searching, because we must use linear search, which is O(n) in the worst case. Can we make pointer-based structures faster to search? That is what binary search trees are for! We will review what you know about BSTs and then discuss how to improve their efficiency by balancing them.

What is a binary search tree?

Recursive definition: a binary tree T is either
• the empty tree, or
• a root node containing a key field and data fields, a left subtree T_L, and a right subtree T_R.

Nodes whose left and right subtrees are both empty are leaves.

A binary tree T has the search-tree property if:
• nodes in T have a key field of an ordered type, so that keys can be compared with <, and
• for each node N in T, N's key is greater than all keys in N's left subtree and less than all keys in N's right subtree (so, in particular, T_L and T_R are themselves binary search trees).

[Diagram: a root node labelled key, with left subtree T_L and right subtree T_R; keys in T_L < key < keys in T_R.]

Examples and counterexamples

Which of these are BSTs?

[Figure: several small example trees, some with letter keys and some with numeric keys.]

The Search Algorithm

Suppose we want to search for a record with key k in a binary search tree T. We can exploit the recursive definition of a tree, which gives two cases.

Case 1: If T is empty, return ‘item not found’ or some other suitable value.
Case 2: If T has a root, compare its key with k. There are now three possibilities: if k equals the key, return this root node; if k is less than the key, search the left subtree; if k is greater, search the right subtree.

How efficient is searching? In the worst case we have to travel from the root down to a leaf. How many steps is that?

Exercise: Give a recursive definition of the height of T. (Hint: think of how many edges you need to follow.)

To think about: How does height differ from depth? Would it make sense to define the height of a node as well as the height of the tree? What about the depth of a node?

The Insertion Algorithm
Suppose we want to insert a record (object, struct) with key k into a binary search tree T. Again we exploit the recursive definition of trees, which gives two cases.

Case 1: If T is empty, make a new node with key k the root.
Case 2: If T has a root x, compare the key of x with k. If k is less than the key of x, insert k into the left subtree of x. If k is greater than the key of x, insert k into the right subtree of x.

So the idea is simple: proceed down the tree as you would with Search, and attach k as a new leaf at the point where the path runs off the tree. How efficient is insertion in the worst case?

Traversal

Suppose we want to print the items in a BST in sorted order by key value. We need to walk over (traverse) the tree, pausing at the right moment to print each node, so that the nodes come out in increasing order of key. So suppose we have an operation Zap to apply to each node (where Zap is something like Print or Update):

    Inorder_tree_walk(T, Zap)
        if root(T) ≠ NIL
            then Inorder_tree_walk(T_L, Zap)
                 Zap(root(T))
                 Inorder_tree_walk(T_R, Zap)

In what order would we Zap the nodes of the example tree on the slide?

More BST traversals

    Preorder_tree_walk(T, Zap)
        if root(T) ≠ NIL
            then Zap(root(T))
                 Preorder_tree_walk(T_L, Zap)
                 Preorder_tree_walk(T_R, Zap)

    Postorder_tree_walk(T, Zap)
        if root(T) ≠ NIL
            then Postorder_tree_walk(T_L, Zap)
                 Postorder_tree_walk(T_R, Zap)
                 Zap(root(T))

So a preorder traversal zaps a node before its children, and a postorder traversal zaps a node after its children. How long does a traversal take?

Sorting

Suppose we have an array we want to sort, say the array A of length 7: [3 1 8 2 6 7 5]. We can build a BST to sort A. This is very similar to Quicksort and also uses a pivot to partition around at every step: each node's key splits the keys that later pass through it into smaller ones (sent left) and larger ones (sent right). However, it is going to be a bit less efficient than Quicksort (why?).

Example (in class): Insert the following keys into a BST: 3 1 8 2 6 7 5. Once we have built the tree, we can use an in-order traversal to read off (i.e. print) the data in sorted order.
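The search, insertion, traversal and sorting algorithms above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the names `bst_node`, `bst_search`, `bst_insert`, `bst_inorder`, `bst_preorder`, `bst_postorder`, `collector` and `tree_sort` are hypothetical, keys are plain `int`s with no data fields, and equal keys are ignored on insertion (matching the strict < in the search-tree property), so `tree_sort` assumes distinct keys. Since C has no closures, each Zap takes a `void *ctx` argument to carry state such as an output buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: an int key plus left and right subtrees. */
typedef struct bst_node {
    int key;
    struct bst_node *left;   /* T_L: every key here is smaller */
    struct bst_node *right;  /* T_R: every key here is larger  */
} bst_node;

/* Search. Case 1: empty tree, return NULL ("item not found").
   Case 2: compare k with the root's key and stop, go left, or go right. */
bst_node *bst_search(bst_node *t, int k) {
    if (t == NULL) return NULL;
    if (k == t->key) return t;
    return (k < t->key) ? bst_search(t->left, k) : bst_search(t->right, k);
}

/* Insert: follow the search path and attach a new leaf where it ends.
   Returns the (possibly new) root. Equal keys are ignored (an assumption). */
bst_node *bst_insert(bst_node *t, int k) {
    if (t == NULL) {                     /* Case 1: empty tree */
        bst_node *n = malloc(sizeof *n);
        if (n == NULL) abort();          /* out of memory */
        n->key = k;
        n->left = n->right = NULL;
        return n;
    }
    if (k < t->key) t->left = bst_insert(t->left, k);
    else if (k > t->key) t->right = bst_insert(t->right, k);
    return t;
}

/* Zap is applied to each node; ctx carries any state it needs. */
typedef void (*zap_fn)(bst_node *node, void *ctx);

void bst_inorder(bst_node *t, zap_fn zap, void *ctx) {
    if (t == NULL) return;
    bst_inorder(t->left, zap, ctx);
    zap(t, ctx);                         /* between the two subtrees */
    bst_inorder(t->right, zap, ctx);
}

void bst_preorder(bst_node *t, zap_fn zap, void *ctx) {
    if (t == NULL) return;
    zap(t, ctx);                         /* before its children */
    bst_preorder(t->left, zap, ctx);
    bst_preorder(t->right, zap, ctx);
}

void bst_postorder(bst_node *t, zap_fn zap, void *ctx) {
    if (t == NULL) return;
    bst_postorder(t->left, zap, ctx);
    bst_postorder(t->right, zap, ctx);
    zap(t, ctx);                         /* after its children */
}

/* A Zap that appends each key to a caller-supplied buffer. */
typedef struct {
    int *out;
    size_t len, cap;
} collector;

static void collect(bst_node *node, void *ctx) {
    collector *c = ctx;
    assert(c->len < c->cap);             /* never write past the buffer */
    c->out[c->len++] = node->key;
}

/* A Zap that frees a node; safe in postorder because children go first. */
static void free_node(bst_node *node, void *ctx) {
    (void)ctx;
    free(node);
}

/* Tree sort: insert every key, read them back in order, then free the tree.
   Returns how many keys were written back (fewer than n if keys repeat). */
size_t tree_sort(int *a, size_t n) {
    bst_node *root = NULL;
    for (size_t i = 0; i < n; i++) root = bst_insert(root, a[i]);
    collector c = { a, 0, n };
    bst_inorder(root, collect, &c);
    bst_postorder(root, free_node, NULL);
    return c.len;
}
```

For example, `tree_sort` on the array {3, 1, 8, 2, 6, 7, 5} from the sorting slide builds a tree with root 3 and rewrites the array as {1, 2, 3, 5, 6, 7, 8}; a preorder walk of the same tree visits 3, 1, 2, 8, 6, 5, 7, and a postorder walk visits 2, 1, 5, 7, 6, 8, 3. Freeing the tree is a natural job for a postorder walk, since a node must still exist while the walk descends into its children.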