Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSE 373 Data Structures and Algorithms Lecture 25: B-Trees Assumptions of Algorithm Analysis Big-Oh assumes all operations take the same amount of time 2 Is this really true? Disk Based Data Structures All data structures we have examined are limited to main memory Have been assuming data fits into main memory Counter-example: Transaction data of a bank > 1 GB per day 3 Uses secondary storage (e.g., hard disks) Operations: insert, delete, searches Cycles* to access: CPU Registers 1 Cache tens Main memory hundreds Disk millions *cycle: time it takes to execute an instruction Hard Disks Large amount of storage but slow access Identifying a particular block takes a long time Read or write data in chunks (“a page”) of 0.5 – 8 KB in size (Newer technology) Solidstate drives are 50 – 100 times faster 5 Still “slow” Algorithm Analysis Running time of disk-based data structures measured in terms of Computing time (CPU) Number of disk accesses Regular main-memory algorithms that work one data element at a time can not be "ported" to secondary storage in a straight forward way 6 Principles Almost all of our data structure is on disk. Every time we access a node in the tree it amounts to a random disk access. How can we address this problem? 7 M-ary Search Tree Suppose we devised a search tree with branching factor M: M – 1 keys needed to decide branch to take Complete tree has height: (logMn) # Nodes accessed for search: (logMn) 8 B-Trees Internal nodes store (up to) M 1 keys M=7 Order property: Subtree between two keys x and y contain leaves with values v such that x v < y Note the “” Leaf nodes contain up to L sorted values/ data items (“records”). 9 3 7 12 21 x<3 3x<7 7x<12 12x<21 21x B-Tree Structure Properties Root (special case) Internal nodes Has between 2 and M children (or could be a leaf) Stores up to M1 keys Have between ceiling(M/2) and M children Leaf nodes Nodes are at least ½ full Leaves are at least ½ full Where data is stored Contains between ceiling(L/2) and L data items The tree is perfectly balanced ! 10 B-Tree: Example B-Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf) 12 44 Data objects… which we’ll ignore in slides 6 1, AB.. 2, GH.. 4, XY.. 6 8 9 10 20 27 34 12 14 16 17 19 20 22 24 27 28 32 50 34 38 39 41 44 47 49 Definition for later: “neighbor” is the next sibling to the left or right. 11 50 60 All leaves 70 at the same depth Disk Friendliness Many keys stored in a node Each node is one disk page/block. All brought to memory/cache in one disk access. Internal nodes contain only keys; only leaf nodes contain actual data What is limiting you from increasing the number of keys stored in each node? 12 The page size Exercise: Computing M and L Exercise: If disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M (# of branches) and L (# of data items per leaf) be for our B-Tree? Solve for M: M - 1 keys + M pointers = 20M - 20 + 4M = 24M – 20 24M - 20 <= 4000 M = 167 Solve for L: L = 4000 / 200 L = 20 13 B-trees vs. AVL trees Suppose we have n = 109 (billion) data items: Depth of AVL Tree: log2 109 = 30 Depth of B-Tree with M = 256, L = 256: ~ log M / 2109 = log 128109 = 4.3 (M/2 because keys half-full) 14 14 Building a B-Tree with Insertions 3 Insert(3) 3 Insert(18) 3 Insert(14) 18 14 18 The empty B-Tree M=3 15 L=3 3 3 3 18 14 14 14 30 18 18 Insert(30) 30 M=3 16 L=3 18 3 18 18 3 18 18 Insert(32) 14 30 32 3 18 32 14 30 36 Insert(36) 14 30 32 18 M=3 L=3 Insert(15) 3 18 32 14 30 36 15 17 32 18 3 32 18 18 32 32 3 18 32 14 30 36 Insert(16) 14 30 36 15 15 16 18 15 32 18 15 18 32 3 15 18 32 14 16 30 36 M=3 L=3 Insert(12,40,45,38) 18 18 15 32 15 32 40 3 15 18 32 3 15 18 32 40 14 16 30 36 12 16 30 36 45 14 19 38 Insertion Algorithm: The Overflow Step K0 K1 K2 K3 Too big M=5 20 K0 K6 K4 K5 K1 K3 K2 K6 K4 K5 Insertion Algorithm 1. Insert the key in its leaf in sorted order 2. If the leaf ends up with L+1 items, overflow! 21 Split the leaf into two nodes with each half of the data. Add the new leaf to the parent. If the parent ends up with M+1 children, overflow! Insertion Algorithm 3. If an internal node ends up with M+1 children, overflow! 4. Split the internal node into two nodes each with half the keys. Add the new child to the parent If the parent ends up with M+1 items, overflow! If the root ends up with M+1 children, split it in two, and create new root with two children This makes the tree deeper! 22 And Now for Deletion… Delete(32) 18 15 32 18 15 40 40 3 15 18 32 40 3 15 18 36 40 12 16 30 36 45 12 16 30 38 45 14 38 M=3 23 L=3 14 Delete(15) 18 15 36 40 18 16 3 15 18 36 40 3 12 16 30 38 45 12 14 36 16 40 18 36 40 30 38 45 14 Are we okay? M=3 24 L=3 Leaf not half full! Are you using that 14? Can I borrow it? 18 18 16 3 36 16 12 14 M=3 25 L=3 40 14 36 40 18 36 40 3 14 18 36 40 30 38 45 12 16 30 38 45 Delete(16) 18 14 36 40 14 3 14 18 36 40 3 12 16 30 38 45 12 M=3 L=3 Are you using that 14? 26 18 36 14 40 18 36 40 30 38 45 18 18 14 3 36 14 12 36 40 40 18 36 40 3 18 36 40 30 38 45 12 30 38 45 14 M=3 27 L=3 Are you using the node18/30? 18 36 36 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 14 14 M=3 28 L=3 Delete(14) 36 36 18 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 14 M=3 29 L=3 Delete(18) 36 36 18 18 40 3 18 36 40 3 12 30 38 45 12 M=3 30 L=3 40 30 36 40 38 45 36 36 18 3 40 30 12 40 36 40 3 36 40 38 45 12 38 45 30 M=3 31 L=3 36 40 36 40 3 36 40 3 36 40 12 38 45 12 38 45 30 M=3 32 30 L=3 36 36 3 12 30 M=3 33 L=3 40 36 38 40 45 40 3 36 40 12 38 45 30 Deletion Algorithm: Rotation Step K0 K2 K1 K0 K3 K4 K5 K1 K3 K2 K4 Too small M=5 34 This is left rotation. Similarly, right rotation K5 Deletion Algorithm: Merging Step Too small! K2 K1 K3 Too small M=5 35 K4 Can’t get smaller K1 K2 K3 K4 Deletion Algorithm 1. Remove the key from its leaf 2. If the leaf ends up with fewer than L/2 items, underflow! 36 Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow Deletion Slide Two 3. If an internal node ends up with fewer than M/2 children, underflow! 3. Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow If the root ends up with only one child, make the child the new root of the tree This reduces the height of the tree! 37