Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSE 5311 Notes 2: Binary Search Trees (Last updated 4/29/17 2:51 PM) ROTATIONS B A a C b c d Single right rotation at B (AKA rotating edge BA) Single left rotation at B (AKA rotating edge BC) C A B B a d A C c a b b c D F Double right rotation at F B B G A A D C d E What two single rotations are equivalent? F C E G 2 (BOTTOM-UP) RED-BLACK TREES A red-black tree is a binary search tree whose height is logarithmic in the number of keys stored. 1. Every node is colored red or black. (Colors are only examined during insertion and deletion) 2. Every “leaf” (the sentinel) is colored black. 3. Both children of a red node are black. 4. Every simple path from a child of node X to a leaf has the same number of black nodes. This number is known as the black-height of X (bh(X)). Example: = red = black 40 3 0 20 100 2 3 10 30 60 120 1 1 2 2 0 0 0 0 50 80 110 140 1 2 1 2 0 0 70 90 1 1 0 0 0 0 0 130 160 1 1 0 0 150 170 1 1 0 0 0 Observations: 1. A red-black tree with n internal nodes (“keys”) has height at most 2 lg(n+1). 2. If a node X is not a leaf and its sibling is a leaf, then X must be red. 3. There may be many ways to color a binary search tree to make it a red-black tree. 4. If the root is colored red, then it may be switched to black without violating structural properties. 0 3 Insertion 1. Start with unbalanced insert of a “data leaf” (both children are the sentinel). 2. Color of new node is _________. 3. May violate structural property 3. Leads to three cases, along with symmetric versions. The x pointer points at a red node whose parent might also be red. Case 1: (in 2320 Notes 12, using Sedgewick’s book, this is done top-down before attaching new leaf) = red = black k+1 k+1 x B B k+1 k A k C A C k k k A’ x A’ x k k c a c d b a d b Case 2: = red = black k+1 k+1 C C k k A D B D k k-1 k k-1 x x B A k k d a b c e c a b d e 4 Case 3: = red = black k+1 x k+1 C B k k B D A C k k-1 k k A D k k-1 d c a e b a c b d e Example: = red = black 40 20 60 10 30 70 50 Insert 15 40 40 20 60 10 30 x x 1 70 50 20 60 10 30 15 70 50 15 Insert 13 40 40 20 10 60 30 70 50 10 15 x 30 x 40 20 3 60 13 13 13 10 20 2 60 30 15 50 70 15 50 70 5 Insert 75 40 40 20 60 13 30 70 50 15 10 x 20 1 13 x 75 30 60 70 50 15 10 75 Insert 14 40 40 20 13 30 20 1 x 70 50 15 10 x 60 75 13 30 70 50 15 10 14 60 75 14 x 40 Reset to black 20 1 60 13 30 70 50 15 10 75 14 Example: 140 40 160 20 10 60 30 50 170 150 100 120 110 80 70 130 90 Insert 75 140 40 140 160 40 20 160 170 150 100 20 10 60 30 50 120 110 80 70 x 90 75 1 130 10 60 30 50 x 170 150 100 120 110 80 90 70 75 130 6 140 40 140 160 x 20 x 1 10 60 30 120 40 120 110 80 60 110 130 130 10 50 30 90 70 80 90 70 75 75 100 40 3 140 60 20 10 30 150 2 20 50 160 100 170 150 100 50 120 110 80 160 130 150 170 90 70 75 Deletion Start with one of the unbalanced deletion cases: 1. Deleted node is a “data leaf”. a. Splice around to sentinel. b. Color of deleted node? Red Done Black Set “double black” pointer at sentinel. Determine which of four rebalancing cases applies. 2. Deleted node is parent of one “data leaf”. a. Splice around to “data leaf” b. Color of deleted node? Red Not possible Black “data leaf” must be red. Change its color to black. 3. Node with key-to-delete is parent of two “data nodes”. a. “Steal” key and data from successor (but not the color). b. “Delete” successor using the appropriate one of the previous two cases. 170 7 Case 1: Case 2: = red = 0 = either = black = 1 k+color(B) k+color(B) x B B k x k-1 A k-2 a D A D k-1 k-2 k-1 C E C E k-2 k-2 k-2 k-2 b a d c e b f d c e f Case 3: = red = 0 = either = black = 1 k+color(B) k+color(B) B B k x a k A D k-2 k-1 x A C k-2 k-1 C E D k-1 k-2 k-1 b a b c E k-2 c d e f d e f 8 Case 4: = red = 0 = either = black = 1 k+color(D)+color(C) k+color(B)+color(C) x B D k+color(C) k+color(C) A D B E k-2 +color(C) k-1 +color(C) k-1 +color(C) k-1 +color(C) C E A C k-1 k-1 +color(C) k-2 +color(C) k-1 b a c d e f a e c b f d (At most three rotations occur while processing the deletion of one key) Example: 80 40 120 20 10 100 60 30 70 50 90 140 110 150 130 Delete 50 80 40 120 20 10 100 60 x 30 70 90 140 110 150 130 80 40 2 x 20 10 120 30 100 60 70 90 140 110 130 150 9 80 x 2 40 120 20 100 60 10 70 30 90 x 140 110 80 40 2 120 20 100 60 10 150 130 70 30 140 90 110 150 130 If x reaches the root, then done. Only place in tree where this happens. Delete 60 80 40 120 20 70 10 100 30 90 140 110 150 130 Delete 70 80 80 40 x 20 1 120 100 140 20 120 100 40 10 140 x 10 30 90 110 150 130 30 90 80 2 20 10 x 30 120 100 40 90 140 110 130 150 If x reaches a red node, then change color to black and done. 110 130 150 10 Delete 10 80 80 20 x 3 120 100 40 30 20 x 140 90 110 120 30 40 150 130 100 140 90 110 150 130 80 30 4 20 120 40 100 140 90 110 150 130 Delete 40 80 80 30 120 x 2 x 20 100 110 120 20 140 90 30 100 150 130 90 100 30 20 2 140 80 x 90 110 150 130 120 120 1 140 x 150 130 140 80 100 30 20 110 90 150 130 110 Delete 120 130 130 140 80 x 80 2 140 x 100 30 20 90 150 100 30 20 110 90 100 80 x 140 80 4 150 110 30 110 100 130 3 150 30 130 90 110 140 20 90 150 20 Delete 100 110 80 30 20 110 4 130 90 x 80 30 140 150 20 140 90 130 150 11 AVL TREES An AVL tree is a binary search tree whose height is logarithmic in the number of keys stored. 1. Each node stores the difference of the heights (known as the balance factor) of the right and left subtrees rooted by the children: heightright - heightleft 50 +1 80 +1 20 +1 10 0 40 -1 30 0 100 +1 60 +1 70 0 120 -1 90 0 110 0 2. A balance factor must be +1, 0, -1 (leans right, “balanced”, leans left). 3. An insertion is implemented by: a. Attaching a leaf b. Rippling changes to balance factor: 1. Right child ripple Parent.Bal = 0 +1 and ripple to parent Parent.Bal = -1 0 to complete insertion Parent.Bal = +1 +2 and ROTATION to complete insertion 2. Left child ripple Parent.Bal = 0 -1 and ripple to parent Parent.Bal = +1 0 to complete insertion Parent.Bal = -1 -2 and ROTATION to complete insertion 12 4. Rotations a. Single (LL) - right rotation at D Rest of Tree Rest of Tree B 0 D -2 B -1 D 0 E h A h A h E h C h C h Restores height of subtree to pre-insertion number of levels RR case is symmetric b. Double (LR) Rest of Tree Rest of Tree F -2 D 0 B +1 D -1 A h C h-1 B 0 G h A h F +1 C h-1 E h-1 Insert on either subtree Restores height of subtree to pre-insertion number of levels RL case is symmetric E h-1 G h 13 Deletion Still have RR, RL, LL, and LR, but two addditional (symmetric) cases arise. Suppose 70 is deleted from this tree. Either LL or LR may be applied. 60 -1 30 80 -1 0 10 50 +1 -1 20 40 0 0 70 0 Fibonacci Trees - special case of AVL trees exhibiting two worst-case behaviors 1. Maximally skewed. (max height is roughly log1.618 n =1.44 lg n, expected height is lg n +.25) 2. (log n) rotations for a single deletion. (empty) 0 (empty) 1 2 3 4 5 6 7 14 TREAPS (CLRS, p. 333) Hybrid of BST and min-heap ideas Gives code that is clearer than RB or AVL (but comparable to skip lists) Expected height of tree is logarithmic (2.5 lg n) Keys are used as in BST Tree also has min-heap property based on each node having a priority: Randomized priority - generated when a new key is inserted Virtual priority - computed (when needed) using a function similar to a hash function 34 1 62 2 19 3 10 4 27 5 12 13 23 15 71 7 41 6 31 16 18 14 38 17 65 12 53 21 57 50 77 8 67 20 75 18 82 9 15 22 Asides: the first published such hybrid were the cartesian trees of J. Vuillemin, “A Unifying Look at Data Structures”, C. ACM 23 (4), April 1980, 229-239. A more complete explanation appears in E.M. McCreight, “Priority Search Trees”, SIAM J. Computing 14 (2), May 1985, 257-276 and chapter 10 of M. de Berg et.al. These are also used in the elegant implementation in M.A. Babenko and T.A. Starikovskaya, “Computing Longest Common Substrings” in E.A. Hirsch, Computer Science - Theory and Applications, LNCS 5010, 2008, 64-75. Insertion Insert as leaf Generate random priority (large range to minimize duplicates) Single rotations to fix min-heap property 88 11 15 Example: Insert 16 with a priority of 2 34 1 62 2 19 3 10 4 27 5 12 13 71 7 41 6 23 15 31 16 38 17 65 12 53 21 18 14 77 8 67 20 57 50 75 18 82 9 15 22 88 11 16 2 After rotations: 34 1 62 2 16 2 19 3 10 4 12 13 27 5 18 14 15 22 71 7 41 6 23 15 38 17 31 16 65 12 53 21 57 50 77 8 67 20 75 18 82 9 88 11 Deletion Find node and change priority to Rotate to bring up child with lower priority. Continue until min-heap property holds. Remove leaf. 16 Delete key 2: 2 1 2 4 3 1 2 4 3 1 2 5 5 3 4 1 2 1 2 3 4 4 3 2 2 4 3 5 5 3 4 5 5 5 5 3 4 2 4 3 4 3 5 5 5 5 3 4 3 4 2 1 2 1 AUGMENTING DATA STRUCTURES Read CLRS, section 14.1 on using RB tree with ranking information for order statistics. Retrieving an element with a given rank Determine the rank of an element Problem: Maintain summary information to support an aggregate operation on the k smallest (or largest) keys in O(log n) time. Example: Prefix Sum Given a key, determine the sum of all keys given key (prefix sum). Solution: Store sum of all keys in a subtree at the root of the subtree. 15 172 1 20 26 137 10 19 3 9 2 2 20 81 16 16 4 4 To compute prefix sum for a key: 30 30 24 45 21 21 Key 1 2 3 4 10 15 16 20 21 24 26 30 Prefix Sum 1 3 6 10 20 35 51 71 92 116 142 172 17 Initialize sum to 0 Search for key, modifying total as search progresses: Search goes left - leave total alone Search goes right or key has been found - add present node’s key and left child’s sum to total Key is 24: (15 + 20) + (20 + 16) + (24 + 21) = 116 Key is 10: (1 + 0) + (10 + 9) = 20 Key is 16: (15 + 20) + (16 + 0) = 51 Variation: Determine the smallest key that has a prefix sum ≥ a specified value. Updates to tree: Non-structural (attach/remove node) - modify node and every ancestor Single rotation (for prefix sum) D D B B A A B D E E C C A A D C+E+D C C E E (Similar for double rotation) General case - see CLRS 14.2, especially “Theorem” 14.1 Interval trees (CLRS 14.3) - a more significant application Set of (closed) intervals [low, high] - low is the key, but duplicates are allowed Each subtree root contains the max value appearing in any interval in that subtree Aggregate operation to support - find any interval that overlaps a given interval [low’, high’] Modify BST search . . . 18 if ptr == nil no interval in tree overlaps [low’, high’] if high’ ≥ ptr->low and ptr->high ≥ low’ return ptr as an answer if ptr->left != nil and ptr->left->max ≥ low’ ptr := ptr->left else ptr := ptr->right Updates to tree - similar to prefix sum, but replace additions with maximums