Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
2IL05 Data Structures Spring 2010 Lecture 9: Augmenting Data Structures Data structures Data structures are used in many applications directly: the user repeatedly queries the data structure indirectly: to make algorithms run faster In most cases a standard data structure is sufficient (possibly provided by a software library) But sometimes one needs additional operations that aren’t supported by any standard data structure ➨ need to design new data structure? Not always: often augmenting an existing structure is sufficient Example S set of elements, each with an unique key. Operations Search(S, k): return a pointer to an element x in S with key[x] = k, or NIL if such an element does not exist. OS-select(S, i): return a pointer to an element x in S with the ith smallest key (the key with rank i) Solution: sorted array A 2 5 6 9 10 11 24 27 31 35 41 43 54 55 73 the key with rank i is stored in A[i] Example S set of elements, each with an unique key. Operations Search(S, k): return a pointer to an element x in S with key[x] = k, or NIL if such an element does not exist. OS-select(S, i): return a pointer to an element x in S with the ith smallest key (the key with rank i) Insert(S, x): inserts element x into S, that is, S ← S ⋃ {x} Delete(S, x): remove element x from S Solution? Idea: Use red-black trees OS-select(S, 3): report key with rank 3 Is the key with rank 3 in the left subtree, in the right subtree, or in the root? 10 2 18 NIL NIL 12 Idea 1: store the rank of each node in the node 2 ➨ 2│1 NIL 50 17 NIL NIL NIL NIL Idea 1: Store the rank in each node Problem: Insertion can change the rank of every node! ➨ worst case O(n) 10 │ 10 2 22 │1 NIL NIL 18 │ 18 5 12 │ 12 3 NIL 17 │ 17 4 NIL NIL 50 │ 50 6 NIL NIL Idea 1: Store the rank in each node Problem: Insertion can change the rank of every node! ➨ worst case O(n) 10 │ 10 2 22 │1 NIL NIL Idea 2: store the size of the subtree in each node 18 │ 18 5 12 │ 12 3 NIL 17 │ 17 4 NIL NIL 50 │ 50 6 NIL NIL Idea 2: Store the size of the subtree 10 │ 10 6 22 │1 NIL NIL 18 │ 18 4 12 │ 12 2 NIL 17 │ 17 1 NIL NIL 50 │ 50 1 NIL NIL Idea 2: Store the size of the subtree 10 │ 10 6 store in each node x: left [x], right [x] parent [x] 22 │1 18 │ 18 4 key [x] color [x] NIL NIL size [x] = number of keys in subtree rooted at x (size [NIL] = 0) order-statistic tree 12 │ 12 2 NIL 17 │ 17 1 NIL NIL 50 │ 50 1 NIL NIL Order-statistic trees: OS-Select OS-Select(x, i): return pointer to node containing the ith smallest key of the subtree rooted at x OS-Select(x , i) 1. r = size[left[x]] + 1 2. if i == r 3. then return x 4. else if i < r 5. then return OS-Select(left[x], i) 6. else return OS-Select(right[x], i) Running time? O(log n) i-r OS-Select(x, 17) r = 16 17 25 size = 24 16 15 x Order-statistic trees: OS-Rank OS-Rank(T, x): return the rank of x in the linear order determined by an inorder walk of T = 1 + number of keys smaller than x x Order-statistic trees: OS-Rank OS-Rank(T, x): return the rank of x in the linear order determined by an inorder walk of T = 1 + number of keys smaller than x OS-Rank(T, x) 1. r = size[left[x]] + 1 2. y = x 3. while y ≠ root [T] 4. do if y == right[p[y]] 5. then r = r + size [left[p[y]] + 1 6. y = p[y] 7. return r Running time? O(log n) x OS-Rank: Correctness OS-Rank(T, x) 1. r = size[left[x]] + 1 2. y = x 3. while y ≠ root [T] 4. do if y == right[p[y]] 5. then r = r + size [left[p[y]] + 1 6. y = p[y] 7. return r Invariant At the start of each iteration of the while loop, r = rank of key[x] in Ty Initialization r = rank of key[x] in Tx (y = x) = number of keys smaller than key[x] in Tx +1 = size[left[x]] + 1 (binary-search-tree property) subtree with root y OS-Rank: Correctness OS-Rank(T, x) 1. r = size[left[x]] + 1 2. y = x 3. while y ≠ root [T] 4. do if y == right[p[y]] 5. then r = r + size [left[p[y]] + 1 6. y = p[y] 7. return r Invariant At the start of each iteration of the while loop, r = rank of key[x] in Ty Termination loop terminates when y = root[T] ➨ subtree rooted at y is entire tree ➨ r = rank of key[x] in entire tree OS-Rank: Correctness OS-Rank(T, x) 1. r = size[left[x]] + 1 2. y = x 3. while y ≠ root [T] 4. do if y == right[p[y]] 5. then r = r + size [left[p[y]] + 1 6. y = p[y] 7. return r Invariant At the start of each iteration of the while loop, r = rank of key[x] in Ty Maintenance case i: y = right[p[y]] ➨ all keys Tleft[p[y]] and key[p[y]] smaller than key[x] ➨ rank key[x] in Tp[y] = rank key[x] in Ty + size [left[p[y]] + 1 OS-Rank: Correctness OS-Rank(T, x) 1. r = size[left[x]] + 1 2. y = x 3. while y ≠ root [T] 4. do if y == right[p[y]] 5. then r = r + size [left[p[y]] + 1 6. y = p[y] 7. return r Invariant At the start of each iteration of the while loop, r = rank of key[x] in Ty Maintenance case ii: y = left[p[y]] ➨ all keys Tright[p[y]] and key[p[y]] larger than key[x] ➨ rank key[x] in Tp[y] = rank key[x] in Ty Order-statistic trees: Insertion and deletion Insertion and deletion as in a regular red-black tree, but we have to update size[x] field Red-black trees: Insertion 1. Do a regular binary search tree insertion 15 2. Fix the red-black properties 10 +1 Step1 find the leave where the node should be inserted replace the leave by a red node that contains the key to be inserted size of the new node = 1 increment size of each node on the search path 2 23 +1 12 +1 50 17 +1 15 size[x] = 1 Red-black trees: Insertion 1. a regular tree2insertion TheDo new node isbinary red ➨search Property or 4 can be violated. Remove violationproperties by rotations and recoloring. 2. Fix thethe red-black 10 Red-black properties 1. 2. 3. 4. Every node is either red or black. The root is black Every leaf (nil[T]) is black. If a node is red, then both its children are black. 5. For each node, all paths from the node to descendant leaves contain the same number of black nodes. 23 2 12 50 17 15 Rotation right rotation around y y x x y left rotation around x A rotation affects only size[x] and size[y] We can determine the new values based on the size of children: size[x] = size[left[x]] + size[right[x]] + 1 and the same for y … Order-statistic trees The operations Insert, Delete, Search, OS-Select, and OS-Rank can be executed with an order-statistic tree in O(log n) time. Augmenting data structures Methodology for augmenting a data structure OS tree 1. Choose an underlying data structure. 1. R-B tree 2. Determine additional information to maintain. 2. size[x] 3. Verify that we can maintain additional information for existing data structure operations. 3. maintain size during insert and delete 4. Develop new operations. 4. OS-Select and OS-Rank You don’t need to do these steps in strict order! Red-black trees are very well suited to augmentation … Augmenting red-black trees Theorem Augment a R-B tree with field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then can maintain values of f in all nodes during insert and delete without affecting O(log n) performance. When we alter information in x, changes propagate only upward on the search path for x … Augmenting red-black trees Theorem Augment a R-B tree with field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then can maintain values of f in all nodes during insert and delete without affecting O(log n) performance. Proof (insert) Step 1 Do a regular binary search tree insertion go up from inserted node and update f additional time: O(log n) x Augmenting red-black trees Theorem Augment a R-B tree with field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then can maintain values of f in all nodes during insert and delete without affecting O(log n) performance. Proof (insert) Step 2 y x Fix the red-black properties by rotations and recoloring x y update f for x, y, and their ancestors additional time per rotation: O(log n) Example: Interval trees Interval trees S set of closed intervals closed: endpoints are part of the interval j i low[i] high[i] Operations Interval-Insert(T, x): adds an interval x, whose int field is assumed to contain an interval, to the interval tree T. Interval-Delete(T, x): removes the element x from the interval tree T. Interval-Search(T, j): returns pointer to a node x in T such that int[x] overlaps j, or nil[T] if no such element exists. Methodology 1. Choose an underlying data structure. 2. Determine additional information to maintain. 3. Verify that we can maintain additional information for existing data structure operations. 4. Develop new operations. Methodology 1. Choose an underlying data structure. use red-black trees each node x contains interval int[x] key is left endpoint low[int[x]] inorder walk would list intervals sorted by low endpoint 2. Determine additional information to maintain. 3. Verify that we can maintain additional information for existing data structure operations. 4. Develop new operations. Methodology 1. Choose an underlying data structure. ✔ 2. Determine additional information to maintain. 3. Verify that we can maintain additional information for existing data structure operations. 4. Develop new operations. Additional information for Interval-Search case 1: i ∩ j ≠ Ø ➨ report i j i low[.] ≤ low[i] low[.] > low[i] j i Additional information for Interval-Search case 1: i ∩ j ≠ Ø ➨ report i j i low[.] ≤ low[i] case 2: j lies left of i ➨ j cannot overlap any interval in the right subtree low[.] > low[i] j i Additional information for Interval-Search case 1: i ∩ j ≠ Ø ➨ report i j i low[.] ≤ low[i] case 2: j lies left of i ➨ j cannot overlap any interval in the right subtree case 3: j lies right of i ➨ need additional information! low[.] > low[i] j i max[x] = max endpoint value in subtree rooted at x = max{high[i] where i is stored in the subtree rooted at x} Additional information for Interval-Search case 1: i ∩ j ≠ Ø ➨ report i j i low[.] ≤ low[i] case 2: j lies left of i ➨ j cannot overlap any interval in the right subtree case 3: j lies right of i ➨ j overlaps interval in left subtree if and only if low[j] ≤ max[left[i]] low[.] > low[i] j i max[x] = max endpoint value in subtree rooted at x = max{high[i] where i is stored in the subtree rooted at x} Methodology 1. Choose an underlying data structure. ✔ 2. Determine additional information to maintain. ✔ 3. Verify that we can maintain additional information for existing data structure operations. 4. Develop new operations. Interval-Search Interval-Search(T, j) 1. x = root [ T ] 2. while x ≠ nil[T] and j does not overlap int[x] 3. do if left[x] ≠ NIL and max[left[x]] ≥ low[j] 4. then x = left[x] 5. else x = right[x] 6. return x Correctness Invariant If tree T contains an interval that overlaps j, then there is such an interval in the subtree rooted at x. Running time? O(log n) Methodology 1. Choose an underlying data structure. ✔ 2. Determine additional information to maintain. ✔ 3. Verify that we can maintain additional information for existing data structure operations. 4. Develop new operations. ✔ Maintain additional information Theorem Augment a R-B tree with field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then can maintain values of f in all nodes during insert and delete without affecting O(log n) performance. Additional information max[x] = max endpoint value in subtree rooted at x max[x] depends only on information in x: high[int[x]] information in left[x]: max[left[x]] information in right[x]: max[right[x]] max[x] = max{high[int[x]], max[left[x]], max[right[x]]} ➨ insert and delete still run in O(log n) time Search for all intervals Interval-Search-All(T, j) report all intervals that overlap j Worst case running time? Θ(k log n) where k = # of reported intervals How? See Assignment 8 … There is another type of interval tree that can answer this query in Θ(k + log n) time. Another interval tree use a red-black tree, key is left endpoint low[.] store each interval i є S in the highest node x such that i contains low[x] i1 Another interval tree use a red-black tree, key is left endpoint low[.] {i2, i3, i1} {i2, i1, i3} store each interval i є S in the highest node x such that i contains low[x] ➨ each node can store several intervals ➨ use two sorted lists (one on low[.] and one on high[.]) i2 i3 i1 Tutorials this week No small tutorials on Tuesday 1+2. Thursday 3+4 big tutorial.