Data structures - Skip Lists - Idea
Sorted linked list: 1 3 4 6 7 9
Insertion and deletion are fast (they require just constant time once the position is known); however, the Find operation requires Θ(n) time. Can we do better? (E.g. somehow do binary search on a linked list?)

Data structures - Skip Lists - Idea
We can try something like this: [Adapted from J.Erickson]

Data structures - Skip Lists - "perfect"
[Figure: a perfect skip list over the keys 1 3 4 5 6 9 11 13]

Data structures - Skip Lists - real examples [Adapted from B.Weems]

Data structures - Skip Lists - LookUp [Adapted from J.Erickson]

procedure SkipListLookUp(int K, SkipList L):
  P ← Header(L)
  for i from Level(L) downto 0 do
    while Key(Forward(P)[i]) < K do
      P ← Forward(P)[i]
  P ← Forward(P)[0]
  if Key(P) = K then return Data(P)
  else return fail

Data structures - Skip Lists - Insert

procedure SkipListInsert(int K, int I, SkipList L):
  P ← Header(L)
  for i from Level(L) downto 0 do
    while Key(Forward(P)[i]) < K do
      P ← Forward(P)[i]
    Update[i] ← P
  P ← Forward(P)[0]
  if Key(P) = K then Data(P) ← I
  else
    NewLevel ← RandomLevel()
    if NewLevel > Level(L) then
      for i from Level(L) + 1 to NewLevel do
        Update[i] ← Header(L)
      Level(L) ← NewLevel
    P ← NewCell(NewLevel)
    Key(P) ← K; Data(P) ← I
    for i from 0 to NewLevel do
      Forward(P)[i] ← Forward(Update[i])[i]
      Forward(Update[i])[i] ← P

Data structures - Skip Lists - Random Level

procedure RandomLevel():
  v ← 0
  while Random() < 1/2 and v < MaxLevel do
    v ← v + 1
  return v

Data structures - Skip Lists - Complexity
Let T be a path from the header (starting at level Level(L)) of skip list L to node P (at level i). If we trace the path backwards (from P to the header), what will be the expected number of segments C(k) after which the path will rise k levels?
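The pseudocode above can be sketched as runnable Python. This is a minimal sketch, not the slides' own code: `MAX_LEVEL`, the ±infinity sentinels, and all class/field names are my assumptions.

```python
import random

MAX_LEVEL = 16          # assumed cap on levels (the slides call it MaxLevel)
INF = float("inf")

class _Node:
    def __init__(self, key, data, level):
        self.key, self.data = key, data
        self.forward = [None] * (level + 1)   # forward[i] = next node on level i

class SkipList:
    def __init__(self):
        # Every header pointer leads to a +inf sentinel, so
        # Key(Forward(P)[i]) is always defined during the scan.
        self.tail = _Node(INF, None, MAX_LEVEL)
        self.header = _Node(-INF, None, MAX_LEVEL)
        self.header.forward = [self.tail] * (MAX_LEVEL + 1)
        self.level = 0                         # current Level(L)

    @staticmethod
    def _random_level():
        # RandomLevel() from the slides: flip coins until tails
        v = 0
        while random.random() < 0.5 and v < MAX_LEVEL:
            v += 1
        return v

    def lookup(self, key):
        p = self.header
        for i in range(self.level, -1, -1):
            while p.forward[i].key < key:
                p = p.forward[i]
        p = p.forward[0]
        return p.data if p.key == key else None

    def insert(self, key, data):
        # update[i] = rightmost node on level i whose key is < key;
        # initialising to header also covers levels above self.level
        update = [self.header] * (MAX_LEVEL + 1)
        p = self.header
        for i in range(self.level, -1, -1):
            while p.forward[i].key < key:
                p = p.forward[i]
            update[i] = p
        p = p.forward[0]
        if p.key == key:
            p.data = data                      # key present: overwrite data
            return
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        q = _Node(key, data, new_level)
        for i in range(new_level + 1):         # splice q in on each level
            q.forward[i] = update[i].forward[i]
            update[i].forward[i] = q
```

The sentinel with key +inf plays the role the slides leave implicit: it guarantees the inner `while` loop terminates without a null check.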
Data structures - Skip Lists - Complexity
If we are at node P then there are 2 possibilities:
• P is a node of level i and the previous node is also of level i (from which the path still has to rise k levels)
• P is a node of level i and the previous node is of level i – 1 (from which the path has to rise k – 1 levels)
C(k) = 1/2 (1 + C(k)) + 1/2 (1 + C(k – 1))
C(k) = 2 + C(k – 1) = Θ(k)

Data structures - Skip Lists - Complexity
The number of traversed segments from P (at level 0) to the header (at level log n) will not exceed
S(n) = C(log n) + L(log n) = Θ(log n) + L(log n),
where L(log n) is the expected number of nodes at level log n or higher.
L(log n) = n · (1/2)^(log n) = const
S(n) = O(log n) + const = O(log n)

Data structures - Skip Lists - Summary
LookUp Θ(log n) expected
Insert Θ(log n) expected
Delete Θ(log n) expected
MakeEmpty Θ(1)
IsEmpty Θ(1)

Data structures - Skip Lists - Summary [Adapted from B.Weems]

Data structures - Binary Search Trees
T is a binary search tree, if
• it is a binary tree (with a key associated with each node)
• for each node x in T the keys at all nodes of the left subtree of x are not larger than the key at node x, and the keys at all nodes of the right subtree of x are not smaller than the key at node x

Data structures - BST - Examples
[Figure: a BST over the keys 7 3 2 12 5 11 14]
[Figure: a BST over the keys 7 3 12 5 11]
[Figure: a tree over the keys 7 2 3 4 5 11 14 that is not a BST]
[Figure: a tree over the keys 7 3 2 12 5 6 14 that is not a BST]

Data structures - Tree traversal algorithms
Inorder - left subtree, root, right subtree
Postorder - left subtree, right subtree, root
Preorder - root, left subtree, right subtree

Data structures - Tree traversal algorithms - Preorder
procedure Preorder(Tree T):
  if T ≠ Λ then
    write Key(T)
    Preorder(LC(T))
    Preorder(RC(T))
Example: the tree with keys 7 3 2 12 5 11 14 is printed as 7 3 2 5 12 11 14

Data structures - Tree traversal algorithms - Inorder
procedure Inorder(Tree T):
  if T ≠ Λ then
    Inorder(LC(T))
    write Key(T)
    Inorder(RC(T))
Example: the same tree is printed as 2 3 5 7 11 12 14

Data structures - Tree traversal algorithms - Postorder
procedure Postorder(Tree T):
  if T ≠ Λ then
    Postorder(LC(T))
    Postorder(RC(T))
    write Key(T)
Example: the same tree is printed as 2 5 3 11 14 12 7

Data structures - BST - LookUp
procedure BSTLookUp(int K, BST T):
  while T ≠ Λ do
    if K = Key(T) then return Data(T)
    else if K < Key(T) then T ← LC(T)
    else T ← RC(T)
  return fail

Data structures - BST - Insert
procedure BSTInsert(int K, int I, BST T):
  while T ≠ Λ do
    if Key(T) = K then Data(T) ← I; return
    else if K < Key(T) then T ← LC(T)
    else T ← RC(T)
  Q ← NewCell()
  Key(Q) ← K; Data(Q) ← I
  LC(Q) ← Λ; RC(Q) ← Λ
  T ← Q

Data structures - BST - Insert - Example
[Figure: inserting key 11 into the tree 7 3 2 13 5 12 14 9 8 10]
Data structures - BST - Insert - Example 2 [Adapted from N.Dale]

Data structures - BST - Delete
procedure BSTDelete(int K, BST T):
  while T ≠ Λ and Key(T) ≠ K do
    if K < Key(T) then T ← LC(T)
    else T ← RC(T)
  if T = Λ then return
  if RC(T) = Λ then T ← LC(T)
  else
    Q ← RC(T)
    while LC(Q) ≠ Λ do Q ← LC(Q)          (Q is the inorder successor of T)
    detach Q from its place (replacing it by RC(Q))
    LC(Q) ← LC(T); RC(Q) ← RC(T)
    T ← Q

Data structures - BST - Delete - Examples 1–4
[Figures: deleting keys from sample trees; Example 4 adapted from N.Dale]

Data structures - BST - Complexity
LookUp Θ(h)
Insert Θ(h)
Delete Θ(h)
MakeEmpty Θ(1)
IsEmpty Θ(1)
(h is the height of the tree)

Data structures - BST - Insertion order defines shape [Adapted from N.Dale]

Data structures - BST - Average Height
Theorem
After inserting n keys into an initially empty binary search tree T, the average height of T is Θ(log n).
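The BST operations above can be sketched in Python. This is my own sketch: the class and helper names are assumptions, and deletion here copies the inorder successor's key into the node rather than relinking the successor node as the slides' pseudocode does; the resulting tree contains the same keys.

```python
class Node:
    def __init__(self, key, data):
        self.key, self.data = key, data
        self.left = self.right = None

class BST:
    def __init__(self):
        self.root = None

    def lookup(self, key):
        t = self.root
        while t:
            if key == t.key:
                return t.data
            t = t.left if key < t.key else t.right
        return None                      # "fail" in the slides

    def insert(self, key, data):
        if self.root is None:
            self.root = Node(key, data)
            return
        t = self.root
        while True:
            if key == t.key:
                t.data = data            # key present: overwrite data
                return
            nxt = t.left if key < t.key else t.right
            if nxt is None:
                child = Node(key, data)
                if key < t.key:
                    t.left = child
                else:
                    t.right = child
                return
            t = nxt

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, t, key):
        if t is None:
            return None
        if key < t.key:
            t.left = self._delete(t.left, key)
        elif key > t.key:
            t.right = self._delete(t.right, key)
        elif t.right is None:
            return t.left                # no right child: splice out the node
        else:
            q = t.right                  # find the inorder successor
            while q.left:
                q = q.left
            t.key, t.data = q.key, q.data
            t.right = self._delete(t.right, q.key)
        return t

    def inorder(self):
        out = []
        def walk(t):
            if t:
                walk(t.left)
                out.append(t.key)
                walk(t.right)
        walk(self.root)
        return out
```

As on the slides, every operation walks one root-to-leaf path, so each costs Θ(h).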
Theorem
Let S(n) be the expected number of comparisons in a successful search of a randomly constructed n-node BST, and let U(n) be the expected number of comparisons in an unsuccessful search of a randomly constructed n-node BST. Then
• S(n) ≈ 1.39 log n, i.e. S(n) = Θ(log n)
• U(n) ≈ 1.39 log (n + 1), i.e. U(n) = Θ(log n)

Data structures - External Binary Trees
[Figure: the BST 7 3 2 12 5 14 with external (square) nodes attached at every empty child position]

Data structures - External Binary Trees
Proposition
For a BST with n nodes the corresponding external binary tree has 2n + 1 nodes.
Proposition
Let the internal path length I(n) be the sum of the depths of all internal nodes, and let the external path length E(n) be the sum of the depths of all external nodes. Then E(n) = I(n) + 2n.

Data structures - BST - Average Height
• we need one more comparison to find a key in a successful search than we needed to insert that key in the unsuccessful search that preceded its insertion:
S(n) = 1/n (U(0) + 1 + U(1) + 1 + … + U(n – 1) + 1)
• the expected number of comparisons in a successful search is 1 more than the average internal path length, and the expected number of comparisons in an unsuccessful search is the average external path length:
S(n) = 1 + I(n)/n
U(n) = E(n)/(n + 1)

Data structures - BST - Average Height [Adapted from H.Lewis, L.Denenberg]

Data structures - AVL Trees
T is an AVL tree, if
• it is a binary search tree
• for each node x in T we have Height(LC(x)) – Height(RC(x)) ∈ {– 1, 0, 1}

Data structures - AVL Trees - Height
Theorem
Any AVL tree with n nodes has height less than 1.44 log n.
We will prove a slightly weaker statement, namely that h = Θ(log n).

Data structures - AVL Trees - "Worst" examples

Data structures - AVL Trees - Height
Let w(h) be the minimum number of nodes in an AVL tree of height h:
w(0) = 1, w(1) = 2, w(2) = 4, w(3) = 7, etc.
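Combining the two identities above with E(n) = I(n) + 2n from the proposition gives a direct relation between successful and unsuccessful search costs. This short derivation is consistent with the slides' formulas but is not itself on the slides:

```latex
U(n) = \frac{E(n)}{n+1} = \frac{I(n) + 2n}{n+1}
\;\Longrightarrow\; I(n) = (n+1)\,U(n) - 2n,
\qquad
S(n) = 1 + \frac{I(n)}{n}
     = \Bigl(1 + \tfrac{1}{n}\Bigr)U(n) - 1 .
```

So the two quantities differ only by lower-order terms, which is why both come out as approximately 1.39 log n.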
w(h) = 1 + w(h – 1) + w(h – 2)
Recall the Fibonacci numbers: F(i) = F(i – 1) + F(i – 2), F(0) = 0, F(1) = 1.
Then w(h) = F(h + 3) – 1.

Data structures - AVL Trees - Height
Since w(h) < 2^h, an AVL tree of height h with the minimum number of nodes n = w(h) satisfies h > log n, so heights as large as Ω(log n) do occur.
Since w(h) ≥ 2^(h/2), every AVL tree with n nodes and height h satisfies n ≥ w(h) ≥ 2^(h/2), i.e. h ≤ 2 log n, so h = O(log n).
Hence h = Θ(log n).

Data structures - AVL Trees – Insertion – Cases 1–3
[Figures: the three cases of how an insertion into a subtree of height h changes the balance factor of an ancestor node; in the last case the balance factor reaches ±2 and the tree must be rebalanced]

Data structures - AVL Trees – Insertion – Rotation 1
[Figure: single rotation; a node A with balance factor +2 whose right child C has balance factor +1 is rotated so that C becomes the root of the subtree and both balance factors become 0]

Data structures - AVL Trees – Insertion – Rotation 2
[Figure: double rotation; a node A with balance factor +2 whose right child C has balance factor –1 is restructured so that the grandchild B becomes the root of the subtree]

Data structures - AVL Trees – Insertion – Examples 1–3
[Figures: inserting 6, 21 and 16 into sample AVL trees; inserting 21 triggers Rotation 1 (around A = 12 and C = 20), inserting 16 triggers Rotation 2 (around A = 10, B = 12 and C = 20)]

Data structures - AVL Trees – Rebalancing [Adapted from P.Flocchini]
Data structures - AVL Trees – Insertion [Adapted from P.Flocchini]

Data structures - AVL Trees – Deletion – Example 1
[Figures: deleting key 16 from a sample tree; rebalancing after a deletion proceeds bottom-up and may require rotations at several levels, up to the root]

Data structures - AVL Trees – Deletion – Example 2 [Adapted from P.Flocchini]

Data structures - AVL Trees - Complexity
LookUp Θ(log n)
Insert Θ(log n)
Delete Θ(log n)
MakeEmpty Θ(1)
IsEmpty Θ(1)

Data structures - 2 – 3 Trees
• All leaves are at the same depth and contain 1 or 2 keys
• An interior node either
  - contains one key and has 2 children (a 2-node), or
  - contains two keys and has three children (a 3-node)
• A key in an interior node is between the keys in the subtrees of its adjacent children. (For a 2-node this is just the BST property; for a 3-node the two keys split the keys in the subtrees into 3 groups: less than the smaller key, between the two keys, and greater than the larger key.)

Data structures - 2 – 3 Trees - Example
[Figure: a 2–3 tree with root 13-41, interior nodes 9, 17-33 and 50, and leaves 2, 11, 16, 21, 38-40, 44 and 99]

Data structures - 2 – 3 Trees - Insertion
1. Search for the leaf where the key belongs. Remember the path.
2. If there is only one key in the leaf, add the key and stop.
3.
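The two rebalancing rotations can be sketched in Python. This shows only the pointer restructuring of Rotation 1 (single) and Rotation 2 (double); the node class, names, and the omission of balance-factor bookkeeping are my simplifications.

```python
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    # height of the empty tree is taken as -1, so a leaf has height 0
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def rotate_left(a):
    # Rotation 1 (single): the right child c of a becomes the subtree root;
    # c's old left subtree becomes a's right subtree
    c = a.right
    a.right, c.left = c.left, a
    return c

def rotate_right(a):
    # mirror image of rotate_left
    c = a.left
    a.left, c.right = c.right, a
    return c

def double_rotate_left(a):
    # Rotation 2 (double): the grandchild b = a.right.left becomes the root;
    # implemented as two single rotations in opposite directions
    a.right = rotate_right(a.right)
    return rotate_left(a)
```

Both rotations preserve the inorder sequence of keys, which is why they keep the BST property while reducing the height of the heavier side.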
Split the node in two (keeping the first and third keys) and pass the middle key to the parent.
4. If the parent is a 2-node, then stop. Else return to step 3.
5. The process is repeated until there is room for a key, or until the root itself is split; then a new root is created and the height increases by 1.

Data structures - 2 – 3 Trees - Insertion
[Figures: inserting 37 into the example tree overfills the leaf 37-38-40, whose split overfills the node 17-33-38, whose split in turn overfills the root 13-33-41; splitting the root creates a new root 33 and increases the height by 1]

Data structures - 2 – 3 Trees - Deletion
1. If the key is in a leaf, then remove it. Otherwise replace it by its inorder successor and remove the successor from its leaf.
2. A key has been removed from node N. If N still has a key, then stop. Otherwise:
a) If N is the root, delete it. If N has no child the tree becomes empty; otherwise the child becomes the root.
b) If N has a sibling N' immediately to its left or right that has two keys, let S be the key in the parent that separates them. Move S to N and replace it in the parent by the key of N' that is adjacent to N. If N and N' are interior nodes, then also move one child of N' to be a child of N. (N and N' now have 1 key each, instead of 0 and 2.) Stop.
c) Otherwise every such sibling N' of N has only one key. Let P be the parent of N and N' and let S be the key in P that separates them. Consolidate S and the one key of N' into a new 3-node, which replaces both N and N'. This reduces by 1 the number of keys in P and the number of children of P. Let N ← P and repeat step 2.
Data structures - 2 – 3 Trees – Deletion - Example 1
[Figures: deleting key 21 from the example tree; the keys of 17-33 and 38-40 are redistributed, giving 17-38 and 40]

Data structures - 2 – 3 Trees – Deletion - Example 2
[Figures: deleting key 11 from the example tree; the leaf underflow is first fixed by consolidating 2 and 9 into a 3-node, and then by borrowing through the root, which becomes 17-41]

Data structures - Red-Black trees
[Figure: red-black tree configurations corresponding to the 2-nodes and 3-nodes of a 2–3 tree]

Data structures - (a,b) Trees
We assume a ≥ 2 and b ≥ 2a – 1.
• All leaves are at the same depth
• Every interior node has c children, where a ≤ c ≤ b (the root may have from 2 to b children)

Data structures - B-trees - motivation
So far we have assumed that we can store an entire data structure in main memory. What if we have so much data that it won't fit? We will have to use disk storage, but when this happens our time complexity analysis fails. The problem is that Big-Oh analysis assumes that all operations take roughly equal time. This is not the case when disk access is involved. [Adapted from S.Garretti]

Data structures - B-trees - motivation
Assume that a disk spins at 3600 RPM. In 1 minute it makes 3600 revolutions, hence one revolution takes 1/60 of a second, or 16.7 ms. On average the data we want is half way round the disk, so a disk access takes about 8 ms. This sounds good until you realise that it means only about 120 disk accesses a second, while in the same second a CPU can execute some 25 million instructions. In other words, one disk access takes about the same time as 200,000 instructions. It is worth executing lots of instructions to avoid a single disk access. [Adapted from S.Garretti]

Data structures - B-trees - motivation
Assume that we use an AVL tree to store all the car driver details in the UK (about 20 million records). We still end up with a very deep tree and lots of separate disk accesses: log2 20,000,000 is about 24, so a single search takes about 0.2 seconds (even if there is only one user of the program). We know we can't improve on the log n bound for a binary tree. But the solution is to use more branches and thus less height!
As branching increases, depth decreases [Adapted from S.Garretti]

Data structures - B-trees - definition
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree
2. all leaves are on the same level
3. all non-leaf nodes except the root have at least ⌈m/2⌉ children
4. the root is either a leaf node, or it has from two to m children
5. a leaf node contains no more than m – 1 keys
The number m should always be odd [Adapted from S.Garretti]

Data structures - B-trees - example
[Figure: a B-tree of order 5 containing 26 items; note that all the leaves are at the same level] [Adapted from S.Garretti]

Data structures - B*-trees - example
[Figure: a B*-tree example with all data stored at the leaf level and separator keys in the interior nodes]

Data structures - Self adjusting (Splay) BST [Adapted from B.Weems]

Data structures - Self adjusting (Splay) BST
Basic operations: LookUp(K,T), Insert(K,I,T), Delete(K,T)
Additional technical operations: Splay(K,T), Concat(T1,T2)

Data structures - Self adjusting BST - Splay operation
Splay(K,T)
Modifies T so that it remains a BST on the same keys, but the new tree has K at the root if K is in the tree; otherwise the root contains the next largest or next smallest key compared to K.

Data structures - Self adjusting BST - Splay operation
[Figures: Splay(K,T) moves K, or its closest neighbour J when K is absent, to the root]

Data structures - Self adjusting BST - LookUp
LookUp(K,T)
Execute Splay(K,T); then examine the root to see if it contains K.

Data structures - Self adjusting BST - Insert
Insert(K,I,T)
Execute Splay(K,T); if K is in the root, install I in this node. Otherwise create a new node containing K and I and break one link to make this node the new root.
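The effect of branching on depth can be checked numerically. This is a quick sketch: `levels` is my own helper approximating the number of levels of a balanced m-way tree, and 200-way branching is an arbitrary illustrative B-tree order.

```python
import math

def levels(n, m):
    """Approximate number of levels of a balanced m-way search tree holding n keys."""
    return math.ceil(math.log(n + 1, m))

n = 20_000_000                 # the UK driver records example from the slides
print(levels(n, 2))            # binary tree: 25 levels, i.e. ~25 disk accesses
print(levels(n, 200))          # 200-way B-tree: 4 levels, i.e. ~4 disk accesses
```

At roughly 8 ms per disk access, the figures above make the slides' point concrete: about 0.2 s for the binary tree versus a few hundredths of a second for the B-tree.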
Data structures - Self adjusting BST - Insert
[Figure: Splay(K,T) brings the closest key to the root; the new node K is then made the root by breaking one link]

Data structures - Self adjusting BST - Concat
Concat(T1,T2)
T1 and T2 are BSTs such that every key in T1 is less than every key in T2. Concat(T1,T2) creates a BST containing all keys from T1 and T2.
Execute Splay(+∞,T1). Now the root of T1 has no right subtree; attach T2 as the right child of the root of T1.

Data structures - Self adjusting BST - Concat
[Figure: after Splay(+∞,T1) the root of T1 has an empty right subtree, where T2 is attached]

Data structures - Self adjusting BST - Delete
Delete(K,T)
Execute Splay(K,T). If the root does not contain K, there is nothing to do. Otherwise apply Concat to the two subtrees of the root.

Data structures - Self adjusting BST - Delete
[Figure: Splay(K,T) brings K to the root; Concat(T1,T2) then joins its two subtrees]

Data structures - Self adjusting BST - Splay
Splay(K,T)
Search for node K, remembering the search path by stacking it. Let P be the last node inspected. If K is in the tree, then it is at P; otherwise P has an empty child where the search for K terminated. Return along the path from P to the root, carrying out the following rotations, which move P up the tree.

Data structures - Self adjusting BST - Splay - Case 1
Case 1. P has no grandparent, i.e. Parent(P) is the root. Perform a single rotation around the parent of P.
[Figure: single rotation around Q = Parent(P)]

Data structures - Self adjusting BST - Splay - Case 2
Case 2. P and Parent(P) are both left children, or both right children. Perform two single rotations in the same direction, first around the grandparent of P and then around the parent of P.
[Figure: the two same-direction rotations]

Data structures - Self adjusting BST - Splay - Case 3
Case 3. One of P and Parent(P) is a left child and the other is a right child. Perform single rotations in opposite directions, first around the parent of P and then around its grandparent.
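The three splay cases can be sketched compactly in Python. This is my own sketch, and it uses a recursive formulation instead of the slides' explicit stack of the search path; the case structure (zig, zig-zig, zig-zag) is the same.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def _rot_right(p):           # single rotation: left child becomes subtree root
    q = p.left
    p.left, q.right = q.right, p
    return q

def _rot_left(p):            # mirror image
    q = p.right
    p.right, q.left = q.left, p
    return q

def splay(t, key):
    """Return the tree re-rooted at key, or at key's closest neighbour
    on the search path when key is absent."""
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t                       # key absent: neighbour stays at root
        if key < t.left.key:               # Case 2: both steps go left (zig-zig)
            t.left.left = splay(t.left.left, key)
            t = _rot_right(t)              # first rotation, around the grandparent
        elif key > t.left.key:             # Case 3: left then right (zig-zag)
            t.left.right = splay(t.left.right, key)
            if t.left.right:
                t.left = _rot_left(t.left)
        return _rot_right(t) if t.left else t   # Case 1: final single rotation
    else:
        if t.right is None:
            return t
        if key > t.right.key:              # Case 2, mirrored
            t.right.right = splay(t.right.right, key)
            t = _rot_left(t)
        elif key < t.right.key:            # Case 3, mirrored
            t.right.left = splay(t.right.left, key)
            if t.right.left:
                t.right = _rot_right(t.right)
        return _rot_left(t) if t.right else t
```

All rotations preserve the inorder sequence, so the result is a BST on the same keys, exactly as the Splay specification requires.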
Data structures - Self adjusting BST - Splay - Case 3
[Figure: the two opposite-direction rotations]

Data structures - Self adjusting BST - Splay [Adapted from B.Weems]

Data structures - SA BST - Worst case complexity
LookUp Θ(n)
Insert Θ(n)
Delete Θ(n)
MakeEmpty Θ(1)
IsEmpty Θ(1)

Data structures - SA BST - Amortised complexity
Theorem
Any sequence of m dictionary operations on a self-adjusting tree that is initially empty and never has more than n nodes uses O(m log n) time.

Data structures - SA BST - Amortised complexity - Tree Potential
Node rank: r(N) = ⌊log w(N)⌋, where w(N) is the number of descendants of N (including N itself)
Tree potential: P(Tree) = the sum of r(N) over all nodes N in Tree
Amortised cost: TA(op) = T(op) + ΔP

Data structures - SA BST - Amortised complexity [Adapted from B.Weems]
Data structures - SA BST - Amortised complexity - Examples [Adapted from B.Weems]

Data structures - SA BST - Amortised complexity
Theorem
The amortised complexity of the Splay operation on a tree with n nodes is at most 3 log n + 1.
LookUp - TA(Splay) + const = O(log n)
Insert - TA(Splay) + r(root) + const = O(log n)
Concat - TA(Splay) + (part of) r(root) + const = O(log n)
Delete - TA(Splay) + TA(Concat) + const = O(log n)

Data structures - SA BST - Amortised complexity
Lemma
The amortised cost of a Splay step involving node P, the parent of P and (possibly) the grandparent of P is at most 3 (r′(P) – r(P)), where r(P) and r′(P) are the ranks of P before and after the step (at most 3 (r′(P) – r(P)) + 1 for the last step in Splay).

Data structures - SA BST - Amortised complexity
TA(Splay step) ≤ 3 (r′(P) – r(P))
Summing over the k steps of a Splay (the last step contributes the extra + 1), the sum telescopes:
TA(Splay) ≤ 3 (r′(P) – r(P)) + 3 (r(2)(P) – r′(P)) + 3 (r(3)(P) – r(2)(P)) + … + 3 (r(k)(P) – r(k–1)(P)) + 1 = 3 (r(k)(P) – r(P)) + 1
Here r(k)(P), the rank of P after the last step, is the rank of the original root, which is at most log n. Hence
TA(Splay) ≤ 3 (log n – r(P)) + 1 ≤ 3 log n + 1

Data structures - SA BST - Amortised complexity
Lemma (Rank Rule)
If a node has two children of equal rank, then its rank is greater than that of each child.
Let R be a node with children P and Q. By the definition of rank, w(P) ≥ 2^r(P) and w(Q) ≥ 2^r(Q).
If r(Q) = r(P), then w(R) = w(P) + w(Q) + 1 > 2^(r(P)+1).
Thus r(R) = ⌊log w(R)⌋ ≥ r(P) + 1.

Data structures - SA BST - Amortised complexity - Case 1
This must be the last step!
[Figure: single rotation around Q = Parent(P)]
TA(Splay step) = r′(P) + r′(Q) – r(P) – r(Q) + T ≤ 3 (r′(P) – r(P)) + 1
[Adapted from B.Weems]

Data structures - SA BST - Amortised complexity - Case 2
[Figure: the two same-direction rotations around R and Q]
TA(Splay step) = r′(P) + r′(Q) + r′(R) – r(P) – r(Q) – r(R) + T ≤ 3 (r′(P) – r(P))
[Adapted from B.Weems]

Data structures - SA BST - Amortised complexity - Case 3
[Figure: the two opposite-direction rotations]
TA(Splay step) = r′(P) + r′(Q) + r′(R) – r(P) – r(Q) – r(R) + T ≤ 3 (r′(P) – r(P))
[Adapted from B.Weems]
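The Rank Rule can be checked mechanically on concrete trees. This is a small sketch of my own: the tuple-based tree representation and helper names are assumptions, not part of the slides.

```python
import math
import random

def insert(t, key):
    # a tree is None or a tuple (key, left, right)
    if t is None:
        return (key, None, None)
    k, l, r = t
    if key < k:
        return (k, insert(l, key), r)
    return (k, l, insert(r, key))

def check_rank_rule(t):
    """Return (w(t), r(t)) and assert the Rank Rule at every node:
    equal-rank children force a strictly larger parent rank."""
    if t is None:
        return 0, -1
    _, l, r = t
    wl, rl = check_rank_rule(l)
    wr, rr = check_rank_rule(r)
    w = wl + wr + 1
    rank = int(math.log2(w))          # r(N) = floor(log w(N))
    if l is not None and r is not None and rl == rr:
        # w >= 2^rl + 2^rr + 1 > 2^(rl+1), hence rank >= rl + 1
        assert rank > rl
    return w, rank

# spot-check the rule on a batch of random BSTs
random.seed(0)
for _ in range(50):
    t = None
    for k in random.sample(range(1000), 63):
        t = insert(t, k)
    check_rank_rule(t)
```

The inline inequality mirrors the lemma's proof: the parent's weight exceeds twice each equal-rank child's weight, so its floor-log rank must be strictly larger.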