* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download What is a B
Survey
Document related concepts
Transcript
B-Trees • Disk Storage • What is a multiway tree? • What is a B-tree? • Why B-trees? • Insertion in a B-tree • Deleting from a B-tree Disk Storage • In this lecture you will study two external data storage structures: M-way and B-trees. • Data is stored on disk (i.e., secondary memory) in blocks. • A block is the smallest amount of data that can be accessed on a disk. • Each block has a fixed number of bytes. • Each block may hold many data records. 999000 Record 1 data 999004 999004 Record 2 data empty Record 3 empty empty empty 999000 Record 4 What is a Multiway (M-way) search tree? • A multiway (or M-way) search tree of order m is a finite set of keys and references. Either the set is empty or: 1. Each non-leaf node consists of distinct keys ki and of at least 2 and at most m non-empty M-way subtrees T0, T1, . . . , Tm-1. The number of keys is one less than the number of non-empty subtrees. k1 k2 k3 ... km-2 km-1 T0 T1 T2 Tm-2 Tm-1 key < k1 k1 < key < k2 k2 < key < k3 km-2 < key < km-1 key > km-1 2. The keys and subtrees of a non-leaf node are ordered as: T0, k1, T1, k2, T2, . . . , km-1, Tm-1 such that: • All keys in subtree T0 are less than k1. • All keys in subtree Ti , 1 <= i <= m - 2, are greater than ki but less than ki+1. • All keys in subtree Tm-1 are greater than km-1 What is an (M-way) search tree?(cont’d) 3. Each leaf node consists of distinct keys pi and at least 2 and at most m non-null references to data blocks B0, B1, B2, . . . , Bm-1. The number of keys is one less than the number of non-null references. 4. The keys in a non-leaf node and the referenced blocks are ordered as: B0, p1, B1, p2, B2, . . . , pm-1, Bm-1 such that: • All data in block B0 correspond to keys that are less than p1. • All data in block Bi , 1 <= i <= m - 2, correspond to pi and to keys that are greater than pi but less than pi+1. • All data in block Bm - 1 correspond to pm-1 and to keys that are greater than pm-1. Multiway (M-way) search tree Examples • A multiway search tree of order 3: 10 4 1 2 7 4 26 14 7 10 20 32 14 20 26 8 17 24 28 9 19 32 Multiway search tree Examples (cont’d) • Example:A multiway search tree of order 3: Note: The leaf nodes in an M-way tree are not necessarily at the same level. • Example:A multiway search tree of order 5: What is a B-Tree? • A B-tree of order m (or branching factor m), where m > 2, is either an empty tree or a multiway search tree with the following properties: The root is either a leaf or it has at least two non-empty subtrees and at most m non-empty subtrees. Each non-leaf node, other than the root, has at least m / 2 nonempty subtrees and at most m non-empty subtrees. (Note: x is the lowest integer > x ). The number of keys in each non-leaf node is one less than the number of non-empty subtrees for that node. Each leaf node has at least m / 2 and at most m references to data blocks. The number of keys in each leaf node is one less than the number of non-null references in that node. All leaf nodes are at the same level; that is the tree is perfectly balanced. A data block referenced by by a leaf node may store at least L / 2 and at most L data records, where L is a fixed constant, L 1. B-Tree Examples Example: A B-tree with m = 4 and L = 3 Example: Choosing M and L for a B-Tree • A good value for L , the number of records in a block, is the largest integer satisfying the inequality: L <= BlockSize / RecordSize • A good value for the branching factor, m, is the largest integer satisfying the inequality: ReferenceSize * m + KeySize * (m - 1) <= BlockSize • Example: A B-tree stores student records. Each student record is 1024 bytes. The key for a record is a student ID that is 36 bytes. If a block holds 4096 bytes and a reference requires 32 bytes. Find suitable values for m and L. L <= 4096 / 1024 => L = 4 32m + 36(m - 1) <= 4096 => m <= 59.7 => m = 58 Why B-Trees? • Accessing data in secondary memory is much slower than accessing data in primary memory • Thus, when data is too large to fit in primary memory, minimizing the number of disk accesses is important. • Many algorithms and data structures that are efficient for manipulating data in primary memory are not efficient for manipulating large data in secondary memory because they do not minimize the number of disk accesses. • For example, AVL trees are not suitable for representing huge tables residing in secondary memory. • The height of an AVL tree increases, and hence the number of disk accesses required to access a particular record increases, as the number of records increases. Why B-Trees? (cont’d) • B-trees are suitable for representing huge tables residing in secondary memory because: 1. With a large branching factor m, the height of a B-tree is low resulting in fewer disk accesses. 2. The branching factor can be chosen such that a node reference corresponds to a block of secondary memory. 3. B-trees usually keep related records on the same block. Again this results in fewer disk accesses. Why B-Trees? (comparison with AVL Trees) • The height h of a B-tree of order m, with a total of n keys, satisfies the inequality: h <= 1 + log m / 2 ((n + 1) / 2) • If m = 300 and n = 16,000,000 then h <= 3.31 . • Thus, in the worst case finding a key in such a B-tree requires 3 disk accesses. • The average number of comparisons for an AVL tree with n keys is log n + 0.25 where n is large. • If n = 16,000,000 the average number of comparisons is 17. • Thus, in the average case, finding a key in such an AVL tree requires 17 disk accesses. Insertion in B-Trees • • • In insertion or deletion, a B-tree undergoes changes that must maintain: Its height balance. Its leaves to be at the same level. Each of its nodes, except the root, to be at least half full (i.e., to contain a minimum of m / 2 - 1 keys, where m is the order of the tree). A node of a B-tree of order m is said to underflow if, after deletion, it contains m / 2 - 2 keys. There are 2 insertion cases: 1. The leaf in which the insertion is to be done does not overflow. 2. The leaf overflows: (a) with an odd number of keys, (b) with an even number of keys. Insertion: case 1 • The leaf does not overflow: B-tree of order 5 Insert 511 Insertion: case 2a • The leaf overflows with an odd numbers of keys. B-tree of order 5 Insert 490 Insertion: case 2a (cont’d) Insertion: case 2b The leaf overflows with an even number of keys. Left bias insertion of 102 B-tree of order 4 Right bias insertion of 102 Deletion • If the key to be deleted is not in a leaf, swap it with either its successor or predecessor (each will be in a leaf). • Delete the key from the leaf. • There are four cases: 1. The leaf does not underflow. B-tree of order 4 Delete 140 Deletion (cont’d) 2. The leaf underflows and the adjacent right sibling has at least m / 2 keys. B-tree of order 5 Delete 113 Deletion (cont’d) 3. The leaf underflows and the adjacent left sibling has at least m / 2 keys. B-tree of order 5 Delete 135 Deletion (cont’d) 4. The leaf underflows and each adjacent sibling has m / 2 - 1 keys. merge node, sibling and the separating key x If the parent of the merged node underflows, the merging process propagates upward. In the limit, a root with one key is deleted and the height decreases by one. Deletion (cont’d) B-tree of order 5 Delete 412 The parent of the merged node does not underflow. The merging process does not propagate upward. Deletion (cont’d) B-tree of order 5 Delete D