Relational Database Systems 2
4. Trees & Advanced Indexes
Christoph Lofi, Benjamin Köhncke
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

2 Storage
• Buffer management is very important
  – Holds DB blocks in primary memory
    • DB blocks are made up of several FS blocks
    • Find good strategies to have requested DB blocks available when needed
  – Each block holds some metadata and row data
• Indexes drastically speed up queries
  – Fewer blocks need to be scanned
  – Primary index
    • On the primary key attribute, usually influences row storage order
  – Secondary index
    • On any attribute, does not influence storage order

4 Trees & Advanced Indexes
4.1 Introduction
4.2 Binary Search Trees
4.3 Self-Balancing Binary Search Trees
4.4 B-Trees

4.1 Introduction
• Indexes need a suitable data structure
  – For efficient index look-ups, search keys need to be ordered
• Remember: all indexes should be stored in a separate database file, not together with the data
  – A suitable number of DB blocks (adjacent on disk) is reserved at index creation time
  – If the space is not sufficient, another file is created and linked to the original index file
  [Figure: index entries as pairs (search key 1, block address 1), (search key 2, block address 2), …]
• Search within an index
  – Bisection search is possible: ⌈log2 n⌉ comparisons, i.e. O(log n)
    Example index: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – But usually indexes span several DB blocks
    • If the index is stored in n blocks, O(n) blocks need to be read from disk
    • Example: search for 92
• Maintenance of the index is also difficult
  – Insert a new search key
with value 5!
    Before: 4 6 7 16 18 21 24 33 39 47 68 72 89 92 99
    After:  4 5 6 7 16 18 21 24 33 39 47 68 72 89 92 99
  – In the worst case, all cells need to be shifted and all blocks need to be accessed
• A similar problem occurs when deleting a value
  – Often: do not shift values, but mark the key as deleted
• In this lecture, we discuss more efficient multilevel data structures
  – B-trees
    • Prevalent in database systems
    • Better access performance
    • Much better update performance
• To understand B-trees better, we start by examining binary search trees

4.2 Binary Trees
• Binary trees are
  – Rooted and directed trees
  – Each node has zero, one or two children
  – Each node (except the root) has exactly one parent
• Some naming conventions
  – Nodes without children are called leaf nodes
  – The depth of node N is the path length from the root
  – The tree height is the maximum node depth
  – If there is a path from node N1 to node N2, N1 is an ancestor of N2 and N2 is a descendant of N1
  – The size of a node N is the number of all descendants of N including itself
  – A subtree of a node N is formed of all descendant nodes including N and the respective links
  [Figure: example tree marking the root, a subtree and the leaf nodes; tree height = 3; the red node has size = 3 and depth = 1]
• Properties of binary trees
• Full (or proper) binary tree
  – Each node has either zero or two children
• Perfect binary tree
  – All leaf nodes have the same depth
  – With height h, contains about 2^h nodes (exactly 2^h − 1 if h counts the levels, as in the later capacity examples)
• Height-balanced binary tree
  – The depths of all leaf nodes differ by at most 1
  – With height h, contains between 2^(h−1) and 2^h nodes
  [Figure: example trees labelled “Full and Balanced” and “Full and Perfect”]
• Degenerated binary tree
  – Each node has either zero or one child
  – Behaves like a linked list: search in O(n)
  [Figure: example tree labelled “Degenerated”]

4.2 Binary Search Trees
• Binary search trees are binary trees where
  – Each node has a unique value assigned
  – There is a total order on all values
  – The left subtree of a node contains only values less than the node value
  – The right subtree of a node contains only values larger than the node value
  – The aim is O(log n) search complexity
    • Structurally resembles bisection search
  [Figure: example tree with root 57, inner nodes 33 and 85, leaves 17, 42, 61, 99]
• Constructing and inserting into binary search trees
  – Values are inserted incrementally
  – The first value becomes the root
  – Additional values sink into the tree
    • Sink to the left subtree if the value is smaller
    • Sink to the right subtree if the value is larger
    • Attach to the last node as left/right child if the subtree is empty
• The insert order of the values strongly influences the properties of the resulting and intermediate trees
• Suppose insert order 57, 33, 42, 85, 17, 61, 99
  – After inserting 57, 33, 42: degenerated
  – After inserting 85, 17: full and balanced
  – After inserting 61, 99: perfect and full
  [Figure: the intermediate trees after each of these steps]
• Suppose insert order 99, 85, 61, 57, 42, 33, 17
  – The resulting tree is degenerated
  [Figure: degenerated tree 99, 85, 61, 57, 42, 33, 17]
• Insert complexity is thus
  – O(n) in the worst case
  – O(log n) in the average case
• Search for a key v
  – Start with the root
  – Recursive procedure
    • If node value = v: return the node
    • If the node is a leaf: value not found
    • If
v < node value: descend to the left subtree; else: descend to the right subtree
  – Complexity
    • Average case: O(log n)
    • Worst case: O(n) (degenerated tree)
• Tree traversal
  – Accesses all nodes of the tree
  – Pre-order
    • Visit node
    • Traverse left subtree
    • Traverse right subtree
  – In-order (sorted access)
    • Traverse left subtree
    • Visit node
    • Traverse right subtree
  – Post-order
    • Traverse left subtree
    • Traverse right subtree
    • Visit node
  – Example tree: root 57 with children 33 and 85; 33 has children 17 and 42; 42 has children 35 and 49; 85 has children 61 and 99
    • Pre-order: 57-33-17-42-35-49-85-61-99
    • In-order: 17-33-35-42-49-57-61-85-99
    • Post-order: 17-35-49-42-33-61-99-85-57
• Deleting nodes has complexity O(n) in the worst case, O(log n) in the average case
  – Locate the node to delete by tree search
  – If the node is a leaf, just delete it
  – If the node has one child, delete the node and attach the child to the parent
  – If the node has two children
    • Replace it either by
      a) the in-order successor (the left-most child of the right subtree), or
      b) the in-order predecessor (the right-most child of the left subtree)
    • Example: delete the search key with value 57
      [Figure: tree with keys 22, 27, 57, 83, 86; variant a) replaces 57 by 83, variant b) replaces 57 by 27]
• Summary
  – Very simple, dynamic data structure
  – Efficient on average
    • O(log n) for all operations
  – Can be very inefficient for degenerated cases
    • O(n) for all operations

4.3 Self-Balancing Binary Search Trees
• Observation
  – Binary search trees are very efficient when perfect or balanced
• Idea
  – Continuously optimize the tree structure to keep the tree balanced
• Popular implementations
  – AVL-Tree (classic example)
  – Red-Black-Tree
  – Splay-Tree
  – Scapegoat-Tree
  – …
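The binary search tree operations from Section 4.2 (insertion by sinking, search, in-order traversal, deletion via the in-order successor) can be sketched in Python as follows. This is a minimal illustration and not part of the lecture material; the names Node, insert, search, in_order, delete, height and build are assumptions made for this sketch, and height counts levels.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Node:
    key: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None


def insert(root: Optional[Node], key: int) -> Node:
    """Let the key sink left or right until an empty subtree is reached."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored because keys are unique


def search(root: Optional[Node], key: int) -> Optional[Node]:
    """Descend from the root; returns None if the key is not present."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def in_order(root: Optional[Node]) -> Iterator[int]:
    """Left subtree, node, right subtree: yields the keys in sorted order."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


def delete(root: Optional[Node], key: int) -> Optional[Node]:
    """Delete a key; a node with two children takes its in-order successor."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right            # leaf or only a right child
    elif root.right is None:
        return root.left             # only a left child
    else:
        succ = root.right            # left-most node of the right subtree
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def height(root: Optional[Node]) -> int:
    """Number of levels; 0 for the empty tree (an assumption of this sketch)."""
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


def build(keys: Iterable[int]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

With the two insert orders from the slides, build([57, 33, 42, 85, 17, 61, 99]) has 3 levels, while build([99, 85, 61, 57, 42, 33, 17]) degenerates to 7 levels, which is the O(n) worst case.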
• Basic concepts for deletion: Global Rebuild (Lazy Deletion)
  – Start with a balanced tree
  – Do not delete a node, just mark it as deleted
    • The search algorithm scans deleted nodes, but does not return them
  – If the rebuild condition is met, rebuild the whole tree without the deleted nodes
    • “Rebuild as soon as half of the nodes are marked as deleted”
    • A complete rebuild can be performed in O(n)
• Global Rebuild: efficiency
  – Search efficiency
    • Let n be the number of unmarked nodes
    • The tree is balanced and contains at most 2n nodes overall
      – The number of accesses during a search usually increases by just 1
      – O(log n)
  – Delete efficiency
    • A global rebuild is in O(n)
      – But it is only necessary after n deletions
      – The amortized additional cost per deletion is O(1)
    • Overall complexity
      – Average: O(log n)
      – Worst case: O(n), if an actual rebuild is performed
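The lazy-deletion scheme with global rebuild described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class LazyTree, the helper build_balanced and the counters total, marked and rebuilds are hypothetical names, insertion is omitted, and the rebuild condition is taken literally as "at least half of the nodes are marked".

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.deleted = False                  # lazy-deletion mark
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_balanced(keys: List[int], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Build a balanced tree from sorted keys in O(n)."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node


class LazyTree:
    def __init__(self, sorted_keys: List[int]) -> None:
        self.root = build_balanced(sorted_keys)
        self.total = len(sorted_keys)         # all nodes, marked or not
        self.marked = 0                       # nodes marked as deleted
        self.rebuilds = 0                     # bookkeeping for the example only

    def _find(self, key: int) -> Optional[Node]:
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def contains(self, key: int) -> bool:
        """Search scans marked nodes but never reports them."""
        node = self._find(key)
        return node is not None and not node.deleted

    def delete(self, key: int) -> bool:
        node = self._find(key)
        if node is None or node.deleted:
            return False
        node.deleted = True
        self.marked += 1
        if 2 * self.marked >= self.total:     # rebuild condition: half marked
            self._rebuild()
        return True

    def _collect_live(self, node: Optional[Node], out: List[int]) -> None:
        """In-order walk that keeps only unmarked keys (already sorted)."""
        if node is not None:
            self._collect_live(node.left, out)
            if not node.deleted:
                out.append(node.key)
            self._collect_live(node.right, out)

    def _rebuild(self) -> None:
        keys: List[int] = []
        self._collect_live(self.root, keys)
        self.root = build_balanced(keys)
        self.total, self.marked = len(keys), 0
        self.rebuilds += 1
```

Starting from the keys 17, 33, 42, 57, 61, 85, 99 and deleting 57, 33, 42 and 61 in that order, the fourth deletion triggers the only rebuild, leaving a three-node tree with root 85.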
• Global Rebuild: Direct Deletion with Rebuild
  – Similar complexity as with lazy deletion
    • Increased per-delete effort
    • Reduced per-search effort until the rebuild
  – Delete nodes as in normal binary trees
    • Increment the deletion counter c_d
  – Rebuild the tree as soon as c_d = n, then reset c_d
  [Figure: tree with keys 17, 33, 42, 57, 61, 85, 99; after deleting 57, 33, 42, 61 the remaining keys 17, 85, 99 are rebuilt into a balanced tree]
• Basic concepts for insertion and deletion: Local Balancing (Subtree Balancing)
  – Start with a balanced tree
  – Insert/delete nodes normally
    • If a subtree becomes “too unbalanced”, locally balance the subtree to regain global balance
  – To detect unbalanced subtrees, each node v needs to know the size |v| and the height h(v) of its subtree
    • Unbalanced condition (height balancing)
      – A subtree is “too unbalanced” when |h(left(v)) − h(right(v))| > α
      – α is a constant which can be adjusted (for AVL, α = 1)
    • Alternative unbalanced condition
      – A subtree is “too unbalanced” when h(v) > α · log2 |v|
      – α is a constant which can be adjusted
• Local Balancing (cont.)
  – “After inserting a node, walk back up the tree and update the stored subtree statistics h(v) and |v|. If node v is too imbalanced, balance the subtree of v”
  [Figure: nodes annotated as key (h(v), |v|). Before inserting 5: 57 (2, 5), 33 (1, 3), 17 (0, 1), 42 (0, 1), 85 (0, 1). After inserting 5: 57 (3, 6), 33 (2, 4), 17 (1, 2), 5 (0, 1), 42 (0, 1), 85 (0, 1). The tree is now height-imbalanced for α = 1: |2 − 0| = 2 > 1]
  – Local balancing can be achieved by
    • Rebuilding the subtree
      – O(|v|) = O(n) in the worst case
      – However, O(log n) on average
      – This operation is expensive.
        But in the context of a DBMS, it may pay off, as it can also consolidate and optimize physical storage locations
      – Especially suited for disk-based trees
    • Rotating
      – Only pointers are moved
      – Very efficient: O(1)
      – Does not change the physical storage of nodes
      – Especially suited for main-memory-based trees
• Local Balancing: Rotating
  – Simple rotation (left, right)
    [Figure: a right rotation around pivot y makes its left child x the new subtree root with y as right child; a left rotation reverses this; the subtrees 1, 2, 3 keep their left-to-right order]
  – Double rotation (left-left, right-right, “Rollercoaster”)
    [Figure: nodes x, y, z with subtrees 1 to 4, rotated twice in the same direction]
  – Double rotation (left-right, “Zig-Zag”)
    [Figure: nodes x, y, z with subtrees 1 to 4, rotated left and then right]
  – Double rotation (right-left, “Zig-Zag”)
    • Analogous to left-right
• The presented concepts can be combined in different ways to implement self-balancing trees
  – AVL-Tree (classic example)
  – Red-Black-Tree
  – Splay-Tree
  – Scapegoat-Tree
• Implementation: AVL-Trees
  – Invented 1962 by Adelson-Velsky and Landis
  – Uses local rebalancing with rotations for insertion and deletion
    • Unbalanced criterion: |h(left(v)) − h(right(v))| > 1
      – “The height difference of the left and right subtree of v is 2 or more”
  – Height information is stored explicitly within the nodes
    • Updated while backtracking after each insert and delete
    • Storage overhead of O(n)
  – Guaranteed maximum height of 1.44 log2 n
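The simple and double rotations and the AVL criterion |h(left(v)) − h(right(v))| > 1 described above can be sketched as follows. This is an illustrative insertion-only sketch, not the lecture's implementation; the names Node, h, update, rotate_right, rotate_left, rebalance and insert are assumptions. Heights follow the annotated figure (a leaf has height 0), so an empty subtree is given height −1.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0          # a leaf has height 0


def h(node):
    """Stored height; the empty subtree counts as -1 (sketch convention)."""
    return -1 if node is None else node.height


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y):
    """Lift the left child x of pivot y; only pointers move, O(1)."""
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    """Mirror image: lift the right child y of pivot x."""
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(v):
    """Refresh h(v) while backtracking and rotate if the AVL criterion fails."""
    update(v)
    balance = h(v.left) - h(v.right)
    if balance > 1:
        if h(v.left.left) < h(v.left.right):     # left-right case (zig-zag)
            v.left = rotate_left(v.left)
        return rotate_right(v)
    if balance < -1:
        if h(v.right.right) < h(v.right.left):   # right-left case (zig-zag)
            v.right = rotate_right(v.right)
        return rotate_left(v)
    return v


def insert(root, key):
    """Insert as in a plain binary search tree, then rebalance on the way up."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return rebalance(root)
```

Inserting the degenerating order 99, 85, 61, 57, 42, 33, 17 with this sketch yields a perfect tree with root 57 instead of a list, because every imbalance is repaired by a single right rotation on the way back to the root.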
Scapegoat-Trees
  – Invented 1993 by Galperin and Rivest
  – Uses global rebuilding for deletions
  – Uses local balancing by subtree rebuilding for insertions
    • Unbalanced criterion: h(v) > log_(1/α) |v| + 1, with 0.5 ≤ α < 1
  – Node statistics (height, size) are determined dynamically during backtracking
    • Only global statistics are stored
    • Storage overhead of O(1)

4.4 Problems with Binary Search Trees
• Are binary trees really suitable for disk-based databases?
  – Yes and no…
  – Binary trees are great data structures for use in internal memory
  – But they perform very badly when stored on external storage (i.e. hard disks)
• Binary tree nodes have to be stored within hard disk blocks in linear fashion
  – When the tree is large, the nodes are scattered among the blocks
  – In the worst case, a new block must be read from disk for every node accessed during search or traversal
    • Every linearization scheme for binary trees has that problem
  – Reading a block from disk is very expensive
• Sample linearization
  – Search for “42”
    • In the worst case, 3 blocks need to be fetched from disk for just 4 nodes
  – The problem is even worse for a full tree traversal
  Tree (level by level): 33 / 16 47 / 6 21 39 68 / 4 7 18 24 35 42 59 99
  Disk/DB blocks:
    Block 1: 33 16 47
    Block 2: 6 21 39 68
    Block 3: 4 7 18 24
    Block 4: 35 42 59 99

4.4 B-Trees
• B-Trees adapt concepts and techniques learned for binary trees and optimize them for hard disk storage
• Basic ideas
  – Searching within a DB/disk block is very efficient
    • Take advantage of the static nature within a block
    • Search can be performed in memory with bisection
search
    • Treat entire blocks as tree nodes
  – Reading blocks from the disk is expensive
    • Reduce block reads
    • Most data resides in the leaf nodes
  – Thus minimize the height of the tree
    • Dramatically increase the fan-out factor
    • The tree becomes “bushy”
    • Shorter search path length

4.4 Block Search Trees (EN 14.3.1)
• First improvement: block search tree
  – Nodes are complete DB blocks
  – Each node can store up to q pointers p_i and q−1 unique and ordered key entries k_i: <p_1, k_1, …, k_(q−1), p_q>
    • k_i < k_(i+1)
    • Pointers p_i link to subtrees (or are empty). All keys in the subtree of p_i are less than k_i and greater than k_(i−1)
  [Figure: example block search tree made of key values and node pointers, with root keys 10, 20, 30 and further nodes holding 5, 13, 14, 15, 17, 34, 38, 55]
• Locate a key k
  – Recursive procedure: start with the root node
    • Use bisection search within the current node
    • If the key is found
      – Return it
    • If the key is not found
      – If there is a used pointer p_i with k_(i−1) < k < k_i
        » Follow p_i and repeat the algorithm with the linked node
      – Else
        » The key is not in the tree
  – Example: locate 14
• Insert a key k
  – Recursive procedure: start with the root node
    • Use bisection search within the current node
    • If the key is found
      – A key cannot be inserted twice, abort
    • If the key is not found
      – If there is a used pointer p_i with k_(i−1) < k < k_i
        » Follow p_i and repeat the algorithm with the linked node
      – Else
        » If there is space left in the node
          • Insert the key and restore the sort order
        » Else
          • Create a new, empty node
          • Insert k into the new node
          • Link the new node to the pointer p_i of the current node for which k_(i−1) < k < k_i
• Example: insert key 44
  [Figure: block search tree before and after inserting 44]
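The locate and insert procedures for block search trees described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: BlockNode, locate and insert are hypothetical names, a node is modelled as a Python object rather than a disk block, and no balancing is performed (as in the plain block search tree).

```python
from bisect import bisect_left


class BlockNode:
    def __init__(self, q):
        self.q = q                  # maximum number of pointers per node
        self.keys = []              # up to q - 1 sorted, unique keys
        self.children = [None]      # always len(keys) + 1 pointers


def locate(node, key):
    """Return (node, index) for the key, or None if it is not in the tree."""
    while node is not None:
        i = bisect_left(node.keys, key)         # bisection search in the block
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        node = node.children[i]                 # p_i with k_(i-1) < key < k_i
    return None


def insert(node, key):
    """Insert below the given root; returns False if the key already exists."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return False                        # keys cannot be inserted twice
        if node.children[i] is not None:
            node = node.children[i]             # follow the used pointer
        elif len(node.keys) < node.q - 1:
            node.keys.insert(i, key)            # space left: keep sort order
            node.children.insert(i, None)
            return True
        else:
            child = BlockNode(node.q)           # node full: hang a new node
            child.keys.append(key)
            child.children.append(None)
            node.children[i] = child
            return True
```

With q = 4 and the insert order 10, 20, 30, 5, 15, 13, 17, 14, 34, 38, 55, 44 (an order chosen for this sketch), the root holds 10, 20, 30, and the key 44 ends up in a new node below the full node 34, 38, 55, between 38 and 55.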
• Delete a key k
  – Start with the root node
  – Locate k
    • If k is in a leaf node, delete k from the node and restore the order
      – If the leaf node is now empty, delete the node
    • If k is in an internal node
      – If none or only one of the pointers directly adjacent to k is used
        » Delete k and restore the order
      – If k is a separator between two used pointers
        » If the keys of both subnodes fit into a single node
          • Union both nodes into one
          • Delete k and restore the order
        » Else
          • Replace k with a new separator key
          • Either the largest key in the left node or the smallest key in the right node
  – Any completely empty node is deleted as in binary search trees
• Examples (EN 14.3.1)
  [Figure: tree before and after deleting key 10]
  [Figure: tree before and after deleting key 20]
  [Figure: tree before and after deleting key 30]
• Block search trees have similar properties to binary search trees
  – Can be perfect, balanced or degenerated
• Assume height h = 3, fan-out factor q = 2048, and total number of keys n
  – Block search tree
    • One node can store up to 2047 keys and 2048 links
    • Perfect: n ≈ 8590M (2048^3 − 1)
    • Balanced: 4M ≤ n ≤ 8590M
    • Degenerated: n = 6141
  – Binary search tree (block tree with q = 2)
    • One node can store 1 key and up to 2 links
    • Perfect: n = 7
    • Balanced: 3 < n ≤ 7
    • Degenerated: n = 3
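The capacity figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name capacity is hypothetical, h is read as the number of levels, every counted node is completely filled with q − 1 keys, and a degenerated tree is taken to have exactly one node per level. The exact values depend on how partially filled nodes are counted, so slide figures given in millions are roundings.

```python
def capacity(levels: int, q: int) -> dict:
    """Key counts for a block search tree with the given levels and fan-out q."""
    keys_per_node = q - 1
    # Perfect tree: every level d holds q**d completely filled nodes.
    perfect_nodes = sum(q ** d for d in range(levels))
    # Balanced tree, lower bound: all levels but the last are full,
    # plus at least one key on the last level.
    upper_nodes = sum(q ** d for d in range(levels - 1))
    return {
        "perfect": perfect_nodes * keys_per_node,
        "balanced_min": upper_nodes * keys_per_node + 1,
        "degenerated": levels * keys_per_node,   # one full node per level
    }


block = capacity(3, 2048)
binary = capacity(3, 2)
print(block)    # perfect = 2048**3 - 1, degenerated = 3 * 2047
print(binary)   # perfect = 7, balanced_min = 4, degenerated = 3
```

For q = 2 the function reduces to the binary case: 7 keys in a perfect tree, at least 4 in a balanced one, and 3 in a degenerated one.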
• Assume n = 1,000,000,000, fan-out factor q = 2048, and height h
  – Block search tree
    • Balanced: h = 3
    • Degenerated: h = 488,520
  – Binary search tree
    • Balanced: h = 30
    • Degenerated: h = 1,000,000,000
• During search, there is one disk access per tree level in the worst case
  – In this example, block search trees are already 10 times more efficient when balanced, and about 2000 times when degenerated
• Summary: block search trees
  – Data structure optimized for disk storage
  – Very efficient in the average case
    • O(log n) for all operations
    • Even better
      – The average number of node accesses to locate a key is log_fan-out(n)
        » The fan-out is usually in the order of several thousands
        » A binary tree averages only log2(n)
      – Accessing a node is expensive on disks, so this is a huge improvement
  – Can be very inefficient for degenerated cases
    • O(n) for all operations
    • Better than binary trees, but still bad

4.4 B-Trees (EN 14.3.2)
• B-Trees are specialized block search trees for disk-based indexing
  – Invented by Rudolf Bayer in 1971
  – Keys may be non-unique
  – The tree is self-balancing
    • No degenerated cases anymore
• Basic structure of a B-tree node
  – Nodes contain key values and the respective data (block) pointers to the actual data records
  – Additionally, there are node pointers for the left, resp.
right interval around a key value
  [Figure: tree node consisting of key values, data pointers and node pointers]
• B-Trees as primary index
  [Figure: B-tree over the keys 1 to 9 whose data pointers reference the following records]
    1, Adams,   $ 887,00
    2, Bertram, $ 19,99
    3, Behaim,  $ 167,00
    4, Cesar,   $ 1866,00
    5, Miller,  $ 179,99
    6, Naders,  $ 682,56
    7, Ruth,    $ 8642,78
    8, Smith,   $ 675,99
    9, Tarrens, $ 99,00
• All base operations are similar to the block search tree, with small changes
  – Guaranteed fill degree
  – Self-balancing
• Each node contains between L(ower) and U(pper) links
  – Usually 2 · L = U
  – Nodes are split during insertion as soon as they would contain more than U−1 keys
  – Nodes are unioned during deletion as soon as they contain fewer than L−1 keys
  – If a complete node is created or deleted, use local rebalancing to re-balance the tree
    • Local rebuilding for disk-based storage, rotations for memory-based storage
• All insertions start at the leaf nodes
  – Search the tree to find the leaf node where the new element should be added
  – If the leaf node contains fewer than the maximum legal number of elements (fewer than U−1 keys)
    • Insert the new element in the node and restore the order
  – Otherwise the leaf node is split into two nodes (node split)
    • The median is chosen from among the leaf's elements and the new element
    • Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value
    • That separation value is added to the node's parent, which may also cause it to be split
    • If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children
  – Remember: the lower bound on the size of internal nodes does not apply to the
root
• Deleting elements is problematic, because node sizes can fall below the minimum number of elements (i.e. |new node size| < L−1)
  – An element to be deleted from an internal node may be a separator for its child nodes
• Deletion from a leaf node
  – Search for the value to delete
  – If the value is in a leaf node, it can simply be deleted from the node
    • Test whether the node now has too few elements; in that case the tree has to be rebalanced
• Rebalancing after deletion
  – If some leaf node is under the minimum size, some elements must be redistributed from its siblings to bring all child nodes up to the minimum again (stealing)
    • If all siblings have only the minimum size, the parent node is affected and has to hand over an element
    • If the parent then falls under the minimum size, the redistribution must be applied iteratively up the tree
    • Since the minimum element count does not apply to the root, making the root the only deficient node is not a problem
• The rebalancing strategy is to find a sibling of the deficient node which has more than the minimum number of elements (for stealing)
  – Choose a new separator
    • Move it to the parent node and redistribute the values in both original nodes to the new left and right children
  – If the sibling node to the right of the deficient node has only the minimum number of elements, examine the sibling node to the left
  – If both siblings have only the minimum number of elements
    • Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between
the two combined sibling nodes
    • Remove the separator from the parent, and replace the two children it separated with the combined node
    • If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root may be deficient
• Example: steal keys from siblings (L = 3)
  [Figure: stealing keys from a sibling]
• Example: join child nodes (L = 3)
  [Figure: joining two child nodes]
• Each element in an internal node acts as a separation value for two subtrees
• When such an element is deleted, there are two cases
  – Both child nodes to the left and right of the deleted element have the minimum number of elements (L−1) and can then be joined into a single legal node with 2L−2 elements
  – One of the two child nodes contains more than the minimum number of elements. Then a new separator for those subtrees must be found.
There are two possible choices:
    • The largest element in the left subtree, which is the largest element still less than the separator
    • The smallest element in the right subtree, which is the smallest element still greater than the separator
• Deletion from an internal node
  – If the value is in an internal node, choose a new separator, remove it from the leaf node it is in, and replace the element to be deleted with the new separator
  – This has deleted an element from a child node, so the deletion is passed down the tree iteratively
    • If the child is a leaf node, the leaf node deletion procedure applies
• Example: build a B-tree
  [Figure: step-by-step construction of a B-tree]
• Summary (EN 14.3.2)
  – Very efficient data structure for disk storage
    • O(log n) for all operations
    • Even better
      – The guaranteed maximum number of node accesses to locate a key is log_fan-out((n+1)/2)
      – A balanced binary tree guarantees only ⌈log2(n)⌉
      – Accessing a node is expensive on disks, so this is a huge improvement
  – No degenerated cases
    • Self-balancing is rarely necessary, as most updates affect just one node
    • Wasted space is decreased due to the guaranteed minimal fill factor

4.4 B*Trees (EN 14.3.3)
• The B*Tree is a constrained B-Tree
  – All non-root nodes need to be filled to at least 2/3
  – Implemented in various file systems
    • HFS
    • Reiser4
  – Used to be quite popular, but lost its importance…

4.4 B+Trees (EN 14.3.3)
• The B+Tree is an optimization of the B-Tree
  – Improved traversal performance
  – Increased search efficiency
  – Increased memory efficiency
• The B+Tree uses different nodes for leaf
nodes and internal nodes
  – Internal nodes: only unique keys and node links
    • No data pointers!
  – Leaf nodes: replicated keys with data pointers
    • Data pointers only here
  [Figure: an internal node consists of node pointers and key values; a leaf node consists of key values and data pointers]
• Internal nodes are used for search guidance
  – A block can contain more keys, so the fan-out is higher
• Leaves just contain data links
  – All leaves are linked to each other in order, for increased traversal performance
  [Figure: internal search nodes on top of a linked chain of data nodes]
• Summary
  – The B+Tree is THE super index structure for disk-based databases
  – Improved over the B-Tree
    • Improved traversal performance
    • Increased search efficiency
    • Increased memory efficiency

4.5 IMDBs
• Observation
  – Loading data from the hard disk is a major bottleneck
  – Available main memory still doubles every 18 months
    • Moore’s Law
• Idea
  – Store all data in fast main memory!
• Solutions
  – Use a “traditional” DBMS with a huge buffer pool (block cache)
    • DBMS are usually optimized for sequential disk access
  – Design special in-memory database systems (IMDB)
    • Or MMDB (Main Memory Database)
• Why do we need in-memory databases?
  – Embedded systems
    • Mobile phones
    • PDAs
    • Sensors
    • Diskless computing devices
    • …
  – Ultra-high-performance (real-time) scenarios
    • Network applications
    • Telecommunication applications
    • High-volume trading
    • …
• Why should IMDBs be different?
• Traditional DBMS also work in-memory, but they waste potential
  – Random access has nearly no penalty compared to sequential access
    • Optimizing for linear storage and block read/write is unnecessary

  Type  Media                                        Size     Random acc. speed  Transfer speed  Characteristics   Price  Price/GB
  Pri   DDR3-RAM (Corsair 1600C7DHX)                 2 GiB    0.004 ms           8000 MB/sec     Vol, Dyn, RA, OL  €38    €19
  Sec   Hard drive, magnetic (Seagate ST32000641AS)  2000 GB  < 8.5 ms           138 MB/sec      Stat, RA, OL      €143   €0.07

• Storing a DB in main memory also has problems
  – Main memory is usually smaller and more expensive
  – ACID support
    • IMDBs support atomicity, consistency and isolation
    • Problem: main memory is not persistent
      – What happens in case of a power failure?
      – How to ensure the durability requirement of DBs?

4.5 IMDBs: Durability
• Snapshot files / checkpoint images
  – Record the state of the DB at a given point in time
  – Done periodically or in case of a controlled shutdown
  – Only partial durability
• Transaction logging
  – History of the actions executed by the DBMS
  – A file of the changes made to the database, kept in stable storage
  – If the DB has not been shut down properly (or is in an inconsistent state), the DBMS reviews the logs for uncommitted transactions and rolls back their changes
• Non-volatile random access memory (NVRAM)
  – Static RAM backed up with battery power (battery RAM)
  – Or electrically erasable programmable ROM (EEPROM)

4.5 IMDBs: Indexes
• IMDB index structures
  – B-Trees are great, but they are shallow and bushy, which is unnecessary in main memory
    • Can save some performance there
  – Hash indexes are very suitable in main
memory for unsorted data
    • Especially bucket chained hashing is very efficient
  – For sorted data: use the T-Tree instead of the B-Tree
    • Specialized tree for main memory databases
    • A blend between AVL-Tree and B-Tree

4.5 T-Tree
• T-Tree design considerations
  – I/O access is cheap in main memory
  – The expensive resources are computation time and memory space
  [Figure: T-Tree node layout with parent link p, data links d1, d2, …, dm−1, dm, left link l and right link r]
• Properties
  – The T-Tree is a self-balancing binary tree (AVL algorithm)
  – T-Tree nodes contain only links
  – Each node links to m data records (d1 … dm)
    • Data entries are ordered, smallest left, biggest right
    • All nodes contain a maximum of c_max entries
    • Each internal node contains c_min to c_max entries (usually c_max − c_min ≤ 2)
  – Each node has a link to its parent
  – Each node has at most a left and a right subtree
    • The left subtree contains only entries smaller than the minimal node entry
    • The right subtree contains only entries bigger than the maximal node entry

4.5 IMDB Indexes
• How do main memory index structures compare?
  [Figures: comparison charts of main memory index structures]
• Why not always use chained bucket hashing?
  – No range queries
  – Storage overhead
  – Suboptimal if the amount of data is unknown during initialization
• Why do T-Tree and AVL-Tree perform better than B-Tree and ordered array for search?
  – Bisection search within a B-tree node/array needs to compute the position of the next comparison
    • AVL-Tree and T-Tree need only 2 comparisons in each node
• Why does the T-Tree perform better than the AVL-Tree for updates?
  – Due to larger nodes, many updates do not require rebalancing
• Why does the ordered array perform so badly for updates?
  – Reordering of all elements is necessary for each update

4.5 References and Timeline
• AVL-Tree
  – G. Adelson-Velskii, E. M. Landis: “An algorithm for the organization of information”. Proceedings of the USSR Academy of Sciences 146: 263–266 (Russian), 1962
  – English translation by M. J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962
• B-Trees
  – R. Bayer, E. M. McCreight: “Organization and Maintenance of Large Ordered Indexes”. Acta Informatica 1, 173–189, 1972
• T-Trees
  – T. J. Lehman, M. J. Carey: “A Study of Index Structures for Main Memory Database Management Systems”. 12th Int. Conf. on Very Large Data Bases, Kyoto, August 1986
• Scapegoat Trees
  – I. Galperin, R. L. Rivest: “Scapegoat trees”. ACM-SIAM Symposium on Discrete Algorithms, Austin, Texas, US, 1993

4 Storage
• Tree data structures are good index structures
  – O(log n) performance on average
  – But they may degenerate to O(n)
    • Balancing is necessary!
  – The block structure of hard disks must be considered
    • Block trees
  – B-Tree
    • Self-balancing block tree with fill guarantees
  – B+-Tree
    • Special inner nodes without data pointers
    • Leaf nodes optimized for linear traversal

5 Outlook
• The Query Processor
  – How do DBMS actually answer queries?
  – Query parsing/translation
  – Query optimization
  – Query execution
  – Implementation of joins