Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
File Structures by Folk, Zoellick, and Ricarrdi Chap 9. Multilevel Indexing and B-Trees 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 김 형 주 교수 File Structure SNU-OOPSLA Lab 1 Chapter Objectives(1) Place the development of B-trees in the historical context of the problems they were designed to solve Look briefly at other tree structures that might be used on secondary storage, such as paged AVL trees Introduce multirecord and multilevel indexes and evaluate the speed of the search operation Provide an understanding of the important properties possessed by B-trees, and show how these properties are especially well suited to secondary storage applications File Structure SNU-OOPSLA Lab 2 Chapter Objectives(2) Present the object-oriented design of B-trees define class BTreeNode and Btree Explain the implementation of the fundamental operations on Btrees Introduce the notion of page buffering and virtual B-trees Describe variations of the fundamental B-trees algorithms, such as those used to build B* trees and B-trees with variable-length records File Structure SNU-OOPSLA Lab 3 Contents(1) 9.1 Introduction 9.2 Statement of the Problem 9.3 Indexing with Binary Search Trees : AVL Trees, Paged Binary Trees, Problems with Paged Tress 9.4 Multilevel Indexing 9.5 B-Trees 9.6 Example of Creating a B-Tree 9.7 An Object-Oriented Representation of B-Trees : Class BTreeNode , Class BTree File Structure SNU-OOPSLA Lab 4 Contents(2) 9.8 B-Tree Methods Search, Insert, and Others 9.9 B-Tree Nomenclature 9.10 Formal Definition of B-Tree Properties 9.11 Worst-case Search Depth 9.12 Deletion, Merging, and Redistribution 9.13 Redistribution During Insertion 9.14 B* Trees 9.15 Buffering of Pages : Virtual B-Trees 9.16 Variable-Length Records and Keys File Structure SNU-OOPSLA Lab 5 9.1 Introduction : Invention of the B-Tree Introduction: The Invention of the B-tree 1972 Acta Infomatica : R. Bayer and E. McCreight (at Boeing Corporation) “Organization and Maintenance of Large Ordered Indexes” 1979 : ‘de facto’ standard for database index D.Comer “The Ubiquitous B-tree” ACM Computing Survey Why the name B-tree? Balanced, Bushy, Broad, Boeing, Bayer Retrieval, Insertion, Deletion time = log K I ( I : no of indexes in file, K : no of indexes in a page) Excellent for dynamically changing random access files File Structure SNU-OOPSLA Lab 6 9.2 Statement of the Problem Statement of the Problem Problems in an index on secondary storage Searching the index must be faster than binary searching In binary search: 15 items - 4 seeks, 1,000 items - 9.5 seeks Insertion and deletion must be as fast as search inserting a key may involve moving many other keys in some file structures File Structure SNU-OOPSLA Lab 7 9.3 Indexing with Binary Search Trees Binary Search Tree(1) Advantages Data may not be physically sorted Good performance on balanced tree Insert cost = search cost Disadvantages In out-of-balance binary tree, more seeks are required File Structure SNU-OOPSLA Lab 8 9.3 Indexing with Binary Search Trees Binary Search Tree(2) Sorted list of keys AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, YJ KF At most 4 seeks/one record Binary search tree representation FB HN CL AX File Structure DE SD FT WS PA JD NR SNU-OOPSLA Lab RF TK YJ 9 9.3 Indexing with Binary Search Trees Internal Representation of Binary Tree With RRN(fixed length record) or pointer key left right ROOT 9 0 FB 10 8 1 JD 2 RF 3 SD 6 13 4 AX 5 YJ 6 PA 11 2 File Structure 7 FT SNU-OOPSLA Lab key left right 8 HN 7 1 9 KF 0 3 10 CL 4 12 11 NR 12 DE 13 WS 14 5 14 TK 10 9.3 Indexing with Binary Search Trees Unbalanced Binary Tree KF FB CL AX DE SD HN FT WS PA JD NR RF TK YJ LV LA NP - At most 9 seeks/one record MB - Worst case : sequential search ND NK File Structure SNU-OOPSLA Lab 11 9.3 Indexing with Binary Search Trees AVL Tree(1) A height-balanced k tree ( HB(k) tree) Allowable difference in the height of any two sub-tree is k AVL Tree : HB(1) Tree G.M. Adel’son, Vel’skii, E.M. Landis Maintenance overhead is needed Performance Given N keys, worst-case search => 1.44 log2(N+2) cf. Completely balanced AVL tree : worst-case search => log2(N+1) File Structure SNU-OOPSLA Lab 12 9.3 Indexing with Binary Search Trees AVL Tree(2) (a) AVL Trees (b) Non - AVL Trees X X X File Structure SNU-OOPSLA Lab X 13 9.3 Indexing with Binary Search Trees AVL Tree(3) Binary tree structure that is balanced nature with respect to the height of subtree Definition An empty tree is height balanced If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff (1) TL and TR are height balanced and (2) |hL-hR|<1 where hL and hR are the heights of TL and TR, respectively File Structure SNU-OOPSLA Lab 14 9.3 Indexing with Binary Search Trees AVL Tree(4) BalanceFactor, BF(T), of a node T in a binary tree is hL-hR where hL and hR are the height of the left and right subtree of T For any node in tree T in AVL tree, BF(T) should be one of “ -1, 0, 1” If BF(T) is -2 or 2, then proper rotation is carried out in order to get balance File Structure SNU-OOPSLA Lab 15 9.3 Indexing with Binary Search Trees AVL Tree(5) New Identifier After Insertion MARCH 0 MAR New Identifier After Insertion MAY No Rebalancing needed No Rebalancing needed -1 MAR 0 MAY New Identifier NOVEMBER After Rebalancing After Insertion -2 MAR -1 MAY File Structure 0 MAY RR 0 MAR 0 NOV SNU-OOPSLA Lab 0 NOV 16 9.3 Indexing with Binary Search Trees AVL Tree(6) New Identifier After Insertion AUGUST +1 MAY +1 MAR No Rebalancing needed 0 NOV 0 AUG File Structure SNU-OOPSLA Lab 17 9.3 Indexing with Binary Search Trees AVL Tree(7) New Identifier After Insertion APRIL +2 MAY +2 MAR After Rebalancing LL +1 MAY 0 NOV +1 AUG 0 AUG 0 APR 0 NOV 0 MAR 0 APR File Structure SNU-OOPSLA Lab 18 9.3 Indexing with Binary Search Trees AVL Tree(8) New Identifier After Rebalancing After Insertion +2 MAY JANUARY -1 AUG 0 APR 0 MAR LR 0 NOV +1 MAR 0 AUG 0 APR -1 MAY 0 JAN 0 NOV 0 JAN File Structure SNU-OOPSLA Lab 19 9.3 Indexing with Binary Search Trees AVL Tree(9) New Identifier No Rebalancing needed After Insertion +1 MAR DECEMBER -1 AUG -1 MAY +1 JAN 0 APR 0 NOV 0 DEC File Structure SNU-OOPSLA Lab 20 9.3 Indexing with Binary Search Trees AVL Tree(10) New Identifier +1 MAR JULY -1 AUG -1 MAY 0 JAN 0 APR 0 DEC File Structure No Rebalancing needed After Insertion 0 NOV 0 JUL SNU-OOPSLA Lab 21 9.3 Indexing with Binary Search Trees AVL Tree(11) New Identifier FEBRUARY RL +2 MAR -2 AUG +1 MAR -1 MAY +1 JAN 0 APR After Rebalancing After Insertion 0 NOV 0 JUL -1 DEC 0 DEC 0 JAN +1 AUG 0 APR -1 MAY 0 FEB 0 NOV 0 JUL 0 FEB File Structure SNU-OOPSLA Lab 22 9.3 Indexing with Binary Search Trees AVL Tree(12) JUNE -1 DEC 0 APR LR +2 MAR 0 FEB 0 NOV -1 JUL 0 JAN 0 MAR +1 DEC -1 MAY -1 JAN +1 AUG After Rebalancing After Insertion New Identifier +1 AUG 0 APR 0 FEB -1 MAY -1 JUL 0 JUN -1 NOV 0 JUN File Structure SNU-OOPSLA Lab 23 9.3 Indexing with Binary Search Trees AVL Tree(13) OCTOBER -1 JAN 0 APR RR -1 MAR +1 DEC +1 AUG After Rebalancing After Insertion New Identifier 0 FEB -1 JUL -2 MAY 0 JUN -1 NOV 0 OCT File Structure SNU-OOPSLA Lab 24 9.3 Indexing with Binary Search Trees AVL Tree(14) 0 JAN +1 DEC +1 AUG 0 APR File Structure 0 MAR 0 FEB 0 NOV -1 JUL 0 JUN SNU-OOPSLA Lab 0 MAY 0 OCT 25 9.3 Indexing with Binary Search Trees AVL Tree(15) New Identifier No Rebalancing needed After Insertion -1 JAN SEPTEMBER +1 DEC +1 AUG 0 APR -1 MAR 0 FEB -1 NOV -1 JUL 0 JUN 0 MAY -1 OCT 0 SEP File Structure SNU-OOPSLA Lab 26 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(1) Rebalancing is carried out using four different kinds of rotations LL when new node Y is inserted in the left subtree of the left subtree of A LR when new node Y is inserted in the right subtree of the left subtree of A RR when new node Y is inserted in the right subtree of the right subtree of A RL when new node Y is inserted in the left subtree of the right subtree of A File Structure SNU-OOPSLA Lab 27 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(2) A Insert Y LL File Structure LR SNU-OOPSLA Lab RL RR 28 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(LL) Balanced Subtree Unbalanced following insertion +1 A +2 A 0 B AR 0 B h+2 rotation type LL BL BL 0 B BL AR 0 A h+2 h BR Balanced Subtree BR AR BR Height of BL increase to h+1 (BL < B < BR < A < AR) File Structure SNU-OOPSLA Lab 29 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(RR) Unbalanced following insertion Balanced Subtree 0 B AL BL rotation type RR -2 A -1 A 0 B BR 0 A 0 B AL h+2 BR Balanced Subtree h+2 BL BR Al BL Height of BR increase to h+1 (AL < A < BL < B < BR) File Structure SNU-OOPSLA Lab 30 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(LR) Balanced Subtree +1 A 0 B Unbalanced following insertion rotation type Balanced Subtree 0 C LR(a) +1 A 0 B -1 B 0 A 0 C (B < C < A) File Structure SNU-OOPSLA Lab 31 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(LR) Unbalanced following insertion Balanced Subtree 0 B AR h+2 BL CR h+2 -1 A AR +1 C h h-1 CL 0 C 0 B -1 B h 0 C BL rotation type LR(b) +2 A +1 A Balanced Subtree CL BL CL CR AR h CR (BL < B < CL < C < CR < A < AR) File Structure SNU-OOPSLA Lab 32 9.3 Indexing with Binary Search Trees AVL Tree : Rebalancing(LR) Unbalanced following insertion Balanced Subtree 0 B AR h-1 CL h+2 BL CR 0 C h+2 0 A +1 B -1 B h 0 C BL rotation type LR(c) +2 A +1 A Balanced Subtree AR -1 C CL BL CL CR AR h CR RL a, b and c are symmetric to LR a, b and c File Structure SNU-OOPSLA Lab 33 9.3 Indexing with Binary Search Trees Paged Binary Tree(1) Page A unit of disk I/O for handling seek and transfer of disk data Typically, 4k, 8k, 16k ... Paged Binary Tree Divide a binary tree into pages and then store each page in a block of contiguous locations on disk. If every page holds 7 keys, 511 nodes(keys) in only three seeks Performance : # of seeks A completely full balanced tree : log2 (N+1) A completely full paged tree : log(k+1) (N+1) (k : # of keys hold in a single page) File Structure SNU-OOPSLA Lab 34 9.3 Indexing with Binary Search Trees Paged Binary Tree(2) File Structure SNU-OOPSLA Lab 35 9.3 Indexing with Binary Search Trees The Problem with Paged Trees Only valid when we have the entire set of keys in hand before the tree is built Problems due to out of balance How to select a good separator How to group keys How to guarantee the maximum loading B-tree provides a solution for above problems! File Structure SNU-OOPSLA Lab 36 9.3 Indexing with Binary Search Trees Paged Binary Tree (Out of balance) random input sequence : C S D T A M P I B W N G U R K E H O L J Y Q Z F X V D C S A M U I B G E T P K H J N L R O Q W V Y X Z F File Structure SNU-OOPSLA Lab 37 9.4 Multilevel Indexing : A Better Approach to Tree Indexes Multilevel Indexing Approach as simple index record limited on the number of keys allowed Approach as multirecord index consists of a sequence of simple index records binary search is too expensive Approach as multilevel index reduced the number of records to be searched speed up the search <example> 80Mbytes file of 8,000,000 records 10-byte keys File Structure SNU-OOPSLA Lab 38 9.4 Multilevel Indexing : A Better Approach to Tree Indexes Example of Multilevel Indexing 4th level index 1 1 2 8 a single index record with 8 keys 3rd level index 1 2 1 2 ... : : 8 100 : : 801 8 index records to index the largest keys in the 800 second-level records 800 2nd level index 1 2 : : 1 2 ... 100 : : 9 901 : : : : : 800 7901 800 index records with 80,000 keys choose one of the keys in each index record as the key of that whole record 1000 8000 Lowest level index is an index to data file and its reference fields are record addresses in the data file File Structure SNU-OOPSLA Lab 39 Multi-level Indexing(3) How can we insert new keys into the multilevel index? The index records in some level might be full The several levels of indexes might be rebuilt Overflow chain may be helpful, but still ugly Multi-level index structure is not strong in dynamic data processing applications B-tree will give you the right solution! File Structure SNU-OOPSLA Lab 40 9.5 B-Trees:Working up from the bottom B-Trees: Working up from the bottom Bayer and McCreight, 1972, Acta Infomatica Build trees upward from the bottom instead of downward from the top Each node of B-tree is an index record which consists of “key-reference” pairs The order of B-tree: the max number of key-reference pairs Every index record should have at least half of the order File Structure SNU-OOPSLA Lab 41 Sample B-Tree D P A C D A C File Structure D D T I M P I M T S D P SNU-OOPSLA Lab S T D 42 9.6 Example of Creating a B-Tree Splitting & Promoting(1) Splitting Creation of two nodes out of one because the original node becomes overfull Result in the need to promote a key to a higherlevel node to provide an index separating the two new nodes Promotion of a key Movement of a key from one node into a higherlevel node when split occurs File Structure SNU-OOPSLA Lab 43 9.6 Example of Creating a B-Tree Splitting & Promoting(2) *A *B * C *D *E * F *G * Initial leaf of a B-tree with a page size of seven Insert J key *A *B * C *D * * * * *E *F *G*J * * * Splitting the leaf to accommodate the new J key (continued....) File Structure SNU-OOPSLA Lab 44 * 9.6 Example of Creating a B-Tree Splitting & Promoting(3) J * D *A *B * C *D * * * * * * *E *F *G* J * * * * * * Promotion of the E key into a root node File Structure SNU-OOPSLA Lab 45 * 9.6 Example of Creating a B-Tree Insertion in B-tree(1) Input Sequence : CSDTAMPIBWNGURKEHOLJYQZFXV C D S D T T Insertion of C, S, D, T into the initial page A C D S T Insertion of A causes node to split and the largest key in each leaf node(D and T)to be placed in the root node File Structure SNU-OOPSLA Lab 46 9.6 Example of Creating a B-Tree Insertion in B-tree(2) D P A C D I T M P S T M and P are inserted into the rightmost leaf node, then insertion of I causes it to split File Structure SNU-OOPSLA Lab 47 9.6 Example of Creating a B-Tree Insertion in B-tree(3) D M P W A B C D G I M N P S T W Insertions of B,W,N, and G into leaf nodes causes another split and the root is now full File Structure SNU-OOPSLA Lab 48 9.6 Example of Creating a B-Tree Insertion in B-tree(4) D M P W A B C D G I M N P S T U W Insertion of U proceeds without incident, but R would have to be inserted into the rightmost leaf, which is full File Structure SNU-OOPSLA Lab 49 9.6 Example of Creating a B-Tree Insertion in B-tree(5) P W D M P A B C D G I M T W N P R S T U W Insertion of causes the rightmost leaf node to split, insertion into the root to split and the tree grows to level three File Structure SNU-OOPSLA Lab 50 9.6 Example of Creating a B-Tree Insertion in B-tree(6) P Z D I M P T Z Q R S T A B C D E G M I J K L M U W Y Z N O P Insertions of K,E,H,O,L,J,Y,Q, and Z, continue with another node split File Structure SNU-OOPSLA Lab 51 9.6 Example of Creating a B-Tree Insertion in B-tree(7) I D G I A B C D E F G P Z M P J K L M H I T X Z Q R S T N O P Y Z U V W X Insertions of F, X, and V finish the insertion of the alphabet File Structure SNU-OOPSLA Lab 52 Insertion in B-trees Major components of insertion Split the node Promote the middle key Increase the height of the B-tree Insertion may touch no more than 2 nodes per level Insertion cost is strictly linear in the height of the tree File Structure SNU-OOPSLA Lab 53 9.7 An Object-Oriented Representation of B-Trees Class BTreeNode(1) Represent B-Tree nodes in memory B-tree is an index file associated with a data file Specified in btnode.h of Appendix I The template BTreeNode class based on the SimpleIndex template class Public methods SimpleIndex Class BTreeNode Class File Structure Insert, Remove, Clear, Search Print, NumKeys Public methods Insert, Remove, LargestKey Split, Pack, Unpack SNU-OOPSLA Lab 54 9.7 An Object-Oriented Representation of B-Trees Class BTreeNode(2) Members Public methods: insert : simply calls SimpleIndex::Insert and then check for overflow remove a key, split and merge nodes search : inherited from SimpleIndex class(works perfectly well) pack/unpack : manage the difference between the memory and the disk representation of BTreeNode objects Protected member store the file address of the node and the minimum and maximum number of keys File Structure SNU-OOPSLA Lab 55 Template <class keyType> class BTreeNode: public SimpleIndex <keyType> {public BTreeNode(int maxKeys, int unique = 1); int Insert (const keyType key, int recAddr); int Remove(const keyType key, int recAddr = -1); int LargestKey (); int Split (BTreeNode<ketType>*newNode); int Pack (IOBuffer& buffer); int Unpack(IOBuffer& buffer); protected int MaxBKeys; int Init(); friend class Btree<keyType>; } File Structure SNU-OOPSLA Lab 56 9.7 An Object-Oriented Representation of B-Trees Class BTree Uses in-memory BTreeNode objects adds the file access portion enforces the consistent size of the nodes specified in btree.h of Appendix I Methods Create, Open, Close a B-Tree Search, Insert, Remove key-reference pairs Protected area Fetch(transfer nodes from disk to memory) Store(transfer nodes back to disk) root node, height of the tree, file of index records BTNode **Node:used to keep a collection of tree nodes in memory and reduce disk access File Structure SNU-OOPSLA Lab 57 Template <class keyType> class Btree {public: Btree(int order, int keySize=sizeof(keyType), int unique=1); int Open (char * name, int mode); int Create (char * name, int mode); int Close (); int Insert (const keyType key, const int recAddr); int Remove (const ketType key, const int recAddr = -1); int Search (const keyType key, const int recAddr = -1); protected typedef BTreeNode<keyType> BTNode; BTNode * FindLeaf (const ketType key); BTNode * Fetch(const int recaddr); int Store (BTNode *); BTNode Root; int Height; int Order; BTNode ** Nodes; RecordFile<BTNode> BtreeFile; }| File Structure SNU-OOPSLA Lab 58 9.8 B-Tree Methods Search, Insert, and Others Page Structure 2 D M P W 0 3 A B C D 8 G I M KEYCOUNT N 5 S P KEY array T U W CHILD array Page 2 4 D M P W 0 Page 3 3 G I M 3 8 5 content of PAGE 2, 3 File Structure SNU-OOPSLA Lab Nil Nil Nil Nil 59 9.8 B-Tree Methods Search, Insert, and Others Algorithm for Search Searching procedure iterative work in two stages operating alternatively on entire pages (Class BTree) and then within pages (Class BTreeNode) Step1: Loading a page into memeory Step 2: Searching through a page, looking for the key along the tree until it reaches the leaf level File Structure SNU-OOPSLA Lab 60 9.8 B-Tree Methods Search, Insert, and Others Search and FindLeaf method • Specifications of Search and FindLeaf methods(Fig 9.18) Template <class keyType> int BTree<keyType>::Search(const keyType key, const int recAddr) template <class keyType> BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key) Search method recAddr = btree.Search(‘L’) call FindLeaf(‘L’); Search key in the leaf node, and then if key exists, return the data file address of record with key ‘L’ otherwise, return -1 FindLeaf method Search down to leafNode, beginning of the root return the address of leafNode File Structure SNU-OOPSLA Lab 61 9.8 B-Tree Methods Search, Insert, and Others Algorithm for Insertion(1) Observations of Insertion, Splitting, and Promotion proceed all the way down to the leaf level after finding the insertion location at the leaf level, the work proceeds upward from the bottom Iterative procedure as having three phases Search to the leaf level, using FindLeaf method Insertion, overflow detection, and splitting on the upward path Creation of a new root node, if the current root was split File Structure SNU-OOPSLA Lab 62 9.8 B-Tree Methods Search, Insert, and Others Algorithm for Insertion(2) With no redistribution (Step 1) Locate node on bottom most level in which to insert record. Location is determined by key search. (Step 2) If vacant record slot is available, insert the record so that key sequencing is maintained. Then, update the pointer associated with the record (Pointer is null for level 0 records). Then Stop! (Step 3) If no vacant record slot exists, identify median record. All records and pointers to the left of the median records are stored in one node (the original) and those to the right are stored in another node(the new node). File Structure SNU-OOPSLA Lab 63 9.8 B-Tree Methods Search, Insert, and Others Algorithm for Insertion(3) (Step 4) If the topmost node was split, create a new topmost node which contains the median record identified in Step 3, filled with pointers to the original and split nodes. Update the root node to point to the new topmost node. Then Stop! (Step 5) If topmost node was not split, prepare to insert median record identified in Step 3 and a pointer to the new node (created in Step 3). Then Goto Step 2. Note : Step 4 makes B-tree increase in height by 1 level B-trees have 70% occupancy(like B+-trees) on an average File Structure SNU-OOPSLA Lab 64 9.8 B-Tree Methods Search, Insert, and Others Insertion Example Insert 3 Insert 1 Insert 19,4,20 3 split 3 4 19 20 4 20 2 0 0 1 3 4 0 Insert 9 Insert 13,16 4 20 0 split 2 13 16 19 20 1 File Structure 1 4 16 20 2 1 3 4 19 20 1 3 4 0 SNU-OOPSLA Lab 9 13 16 1 19 20 3 65 9.8 B-Tree Methods Search, Insert, and Others Create, Open, and Close Specified in btree.tc of Appendix I Method Create Method Open writes the empty root node into the file BTreeFile so that its first record is reserved for that root node opens BTreeFile and load the root node into memory from the first record in the file Method Close simply stores the node into BTreeFile and close it File Structure SNU-OOPSLA Lab 66 9.9 B-Tree Nomenclature B-Tree Nomenclature Be aware that terms are not uniform in the literature Definitions are also quite different In fact, there are a number of B-tree variations This text book uses “B tree” for B+ tree by other books In this book, “B+ tree” is B+ tree with a linked list of sorted data blocks File Structure SNU-OOPSLA Lab 67 Root A B Data Block File Structure C G E F Data Block Other Book Our Book B-Tree N/A H Data Block SNU-OOPSLA Lab I Data Block 68 Root A B Data Block File Structure C C G I E F G Data Block Other Book Our Book B+-Tree B-Tree H Data Block SNU-OOPSLA Lab I Data Block 69 Other Book B+-Tree with Linked List Root A B Data Block File Structure C C G I E F G Data Block Data Block SNU-OOPSLA Lab H Our Book B+-Tree I Data Block 70 Another aspect (node structures) Homogeneous Trees :B-Tree in other text Homogeneous trees - leaf nodes and interior nodes have same structures; Each contains both data pointers and tree pointers Average search length less for homogeneous trees, because some searches may conclude before reaching a leaf node File Structure SNU-OOPSLA Lab. 71 B-Tree in other text 23 pointers to 23 records in data file 37 64 45 53 8 23 1 7 1420 27 36 70 80 38 40 File Structure 85 91 50 52 SNU-OOPSLA Lab. 88 95 60 72 Another Aspect (node structures) Heterogeneous Trees :B+-Tree in other text Heterogeneous trees - leaf nodes and interior nodes have different structures File Structure SNU-OOPSLA Lab. 73 B+-Tree in other text 23 pointers to 23 records in data file 37 64 45 53 14 23 1 7 8 1420 232736 85 91 6470 9195 808588 373840 455052 5360 File Structure SNU-OOPSLA Lab. 74 Comparison of B-Tree and B+-Tree in other text Topic B-Tree Algorithm Complexity Rather complexity for insertion Retrieval efficiency Storage efficiency 1-pass structure creation algorithms File Structure less efficiency (B-tree is tall & spindle) slightly more efficient (is less space) rather complex SNU-OOPSLA Lab. B+-Tree more simple more efficient B+-tree is short & bushy less efficient (is more space) simple 75 Comparison of B-Tree and B+-Tree in other text Historical Note B-tree : Bayer & McCreight B+-tree: Comer B*-tree : Knuth, B-trees with 67% minimum occupancy ÷-trees : B+-trees with 67% minimum occupancy B File Structure SNU-OOPSLA Lab. 76 9.10 Formal Definition of B-Tree Properties Formal Definition of B-Tree Properties ** The properties of a B-tree of order m 1. Every page has a maximum of m descendants 2. Every page, except for the root and the leaves, has at least ceiling of (m/2) descendants 3. The root has at least two descendants (unless it is a leaf) 4. All the leaves appear on the same level 5. The leaf level forms a complete, ordered index of the associated data file File Structure SNU-OOPSLA Lab 77 9.11 Worst-Case Search Depth Worst-case Search Depth(1) Search depth : depth of the tree Worst case When every page of the tree has only the minimum # of descendants A maximal height with a minimum breadth File Structure SNU-OOPSLA Lab 78 9.11 Worst-Case Search Depth Worst-case Search Depth(2) B-TREE WITH ORDER m level minimum # of descendants 1(root) 2 3 ... ... d 2 2 x [m/2] 2 x [m/2]2 2 x [m/2]d-1 For a tree with N keys in its leaves, N >= 2 x [m/2]d-1 Upper bound for the depth of a B-tree ---> d d <= 1 + log[m/2](N/2) e.g.. Btree order = 512 keys, given 1,000,000 keys d <= 3.37 at most 3 depth ( 3 disk I/O ) File Structure SNU-OOPSLA Lab 79 9.12 Deletion, Merging, and Redistribution Deletion, Redistribution, and Concatenation Ensure that the B-tree properties are maintained after a deletion Algorithm (with redistribution and cocatenation) 1. If the key to be deleted is not in a leaf, swap it with its immediate successor, which is in a leaf (might be redistributed or concatenated!) 2. Delete the key File Structure SNU-OOPSLA Lab 80 9.12 Deletion, Merging, and Redistribution Deletion Algorithm(Cont’d) 3. If underflow occurs (the leaf now contains one too few keys), 3.1 If the left or right sibling has more than the minimum number of keys , redistribute 3.2 Otherwise, concatenate the two leaves and the median key from the parent into one leaf 3.3 Apply above step 3 to the parent as if it were deleted File Structure SNU-OOPSLA Lab 81 9.12 Deletion, Merging, and Redistribution Redistribution Occur when a sibling has more than the minimum # of keys Idea: Move keys between siblings Result in a change in the key in the parent page Does not propagate : strictly local effects How many keys should be moved? Not necessarily fixed Even distribution is desired File Structure SNU-OOPSLA Lab 82 9.12 Deletion, Merging, and Redistribution Concatenation(merge) Occur in case of underflow Combining the two pages and the key from the parent page ==> make a single full page Reverse the splitting Concatenation must involve demotion of keys : may cause underflow in the parent page The effects propagate upward File Structure SNU-OOPSLA Lab 83 9.12 Deletion, Merging, and Redistribution e.g. Deletion(1) Figure A I P Z M P D G I A B C D E F G File Structure J K L M H I T X Z Q R S T N O P SNU-OOPSLA Lab Y Z U V W X 84 9.12 Deletion, Merging, and Redistribution e.g. Deletion(2) I A B C D P Z M P D G I A B D E F G File Structure Removal of key C from figure A: Change occurs only in leaf node J K L M H I T X Z Q R S T N O P SNU-OOPSLA Lab Y Z U V W X 85 9.12 Deletion, Merging, and Redistribution e.g. Deletion(3) I D F A B C D E F G File Structure Result of deleting P from figure A : P changes to O in the second level and the root O Z M O I J K L M H I T X Z Q R S T N O SNU-OOPSLA Lab Y Z U V W X 86 9.12 Deletion, Merging, and Redistribution e.g. Deletion(4) I A B C D E F G P Z M P I D I File Structure Result of deleting H from figure A : Removal of H caused an underflow, and two leaf nodes were merged J K L M T X Z Q R S T N O P SNU-OOPSLA Lab Y Z U V W X 87 9.13 Redistribution During Insertion Redistribution during Insertion A way to improve storage utilization A way of avoiding the creation of new pages Tend to make an efficient B-tree in terms of space utilization Worst case : around 50% Average case : 67 ~ 69% With redistribution during insertion : over 85% File Structure SNU-OOPSLA Lab 88 9.13 Redistribution During Insertion 0 M DELETE J (No change) 1 QU DH AC E F I J K DELETE M (Swap with N) NOP RS VWXY Z 0 MN 1 QU DH AC E F File Structure I K MN O P SNU-OOPSLA Lab RS VWXY Z 89 9.13 Redistribution During Insertion 0 DELETE R (Redistribution) N 1 QUW DH AC I K E F OP RS UV V WXY Z DELETE A (Concatenation) 0 N 1 QW D H underflow File Structure AC E F CD EF I K SNU-OOPSLA Lab O P S U V X Y Z 90 9.13 Redistribution During Insertion NOW UNDERFLOW PROPAGATE UPWARD! 0 N 1 underflow QW H CDEF I K HEIGHT OF THE TREE DECREASED CDEF File Structure I K O P S U V X Y Z HN QW O P SNU-OOPSLA Lab S U V X Y Z 91 9.14 B* Trees B* Trees Knuth, 1973, Addison-Wesley Use redistribution operation during insertion Perform two-to-three split When split, the page has at least one sibling that is also full After split, the pages are about 2/3 full The page with at least (ceiling of (2m -1)/3) keys c.f. remember (ceiling of (m/2)) -1 keys File Structure SNU-OOPSLA Lab 92 9.14 B* Trees B* Tree(Cont’d) Original tree: A A CD F H K P RS T V X Two-to-three-split: FR Insert B A BC D File Structure H KMP SNU-OOPSLA Lab S TV X 93 9.15 Buffering of Pages:Virtual B-Trees Buffering of B-tree pages: Virtual B-Trees B-tree size >> main memory (in practice) Need buffering pages of B-tree Better to keep the root page in the main memory Buffer replacement algorithm: LRU + page height weighting factor Keep pages of top some levels all the time in main memory File Structure SNU-OOPSLA Lab 94 9.15 Buffering of Pages:Virtual B-Trees Placement of Information associated with the Key How to store associated information In a data and index mingled file Once the key is found, no more disk access required In a separate file Larger number of keys per a page Higher order, shallower tree File Structure SNU-OOPSLA Lab 95 9.16 Variable-Length Records and Keys Variable Length Records and Keys A B-tree with variable length keys No single, fixed order A different criterion for over/underflow condition Using max/min number of bytes (c.f. max/min number of keys) Key promotion mechanism Shortest variable-length keys are promoted in preference to longer ones Pages with the largest numbers of descendants up high in the tree File Structure SNU-OOPSLA Lab 96 Let’s Review !!! 9.1 Introduction 9.2 Statement of the Problem 9.3 Indexing with Binary Search Trees : AVL Trees, Paged Binary Trees, Problems with Paged Tress 9.4 Multilevel Indexing 9.5 B-Trees 9.6 Example of Creating a B-Tree 9.7 An Object-Oriented Representation of B-Trees : Class BTreeNode , Class BTree File Structure SNU-OOPSLA Lab 97 Let’s Review !!! 9.8 B-Tree Methods Search, Insert, and Others 9.9 B-Tree Nomenclature 9.10 Formal Definition of B-Tree Properties 9.11 Worst-case Search Depth 9.12 Deletion, Merging, and Redistribution 9.13 Redistribution During Insertion 9.14 B* Trees 9.15 Buffering of Pages : Virtual B-Trees 9.16 Variable-Length Records and Keys File Structure SNU-OOPSLA Lab 98