7. B Tree, ISAM and B+ Tree Indexes
CENG 206 - Data Management and File Structures
DYNAMIC MULTILEVEL INDEXES USING B AND B+ TREES

Recall: Tree Nomenclature (Terminology)
• Root: The top node in a tree.
• Parent: A node with at least one child.
• Child: A node with a parent.
• Siblings: Nodes with the same parent.
• Branch (Internal Node): A node with at least one child.
• Leaf (External Node): A node with no children.
• Edge: The connection between one node and another.
• Height of a Node: The number of edges on the longest path from the node to a leaf. A leaf node has a height of 0.
• Depth of a Node: The number of edges from the node to the tree's root node. The root node has a depth of 0.
The height and the depth of a tree are the same: the number of edges on the longest downward path between the root and a leaf.

What is a B-Tree?
• A B-Tree is a multi-way (multilevel) tree, designed especially for use on magnetic disks, in which each node contains a set of search keys and pointers.
• It contains index pages as well as data pages, where the user data is actually stored.
• A B-Tree is a generalization of a binary search tree in that more than two paths can diverge from a single node.
• Unlike self-balancing binary search trees, the B-Tree is optimized for systems that read and write large blocks of data.
• It is commonly used in databases and file systems.

What is a B-Tree? - 2
• A B-Tree keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time (O(log n)).
• B-Trees are dynamic; that is, the height of the tree grows and shrinks as records are added and deleted.
• B-Trees are also balanced (all leaves must be at the same depth), so adding or deleting records must not violate this property.
• Etymology: The origin of the name "B-Tree" has never been explained by the authors (Rudolf Bayer, Edward M. McCreight).
As we shall see, "balanced," "broad," or "bushy" might apply. Others suggest that the "B" stands for Boeing or for Bayer.

B-Tree Structure
(figure)

B-Tree Structure - 2
(figure)

What is ISAM?
• ISAM (Indexed Sequential Access Method) is a static file management system developed at IBM that allows records to be accessed either sequentially (in the order they were entered) or randomly (with an index).
• Each index defines a different ordering of the records.
• E.g., an employee database may have several indexes, based on the information being sought. A name index may order employees alphabetically by last name, while a department index may order employees by their department.
• A key is specified in each index.
• For an alphabetical index of employee names, the last name field would be the key.

What is ISAM? - 2
(figure)

What is a B+ Tree?
• A B+ Tree combines features of ISAM and B-Trees.
• It contains index pages and data pages.
• The data pages always appear as leaf (data) nodes in the tree.
• The root node and the intermediate nodes (the nodes between the root and the leaf nodes) are always index pages.
• It is fully dynamic; that is, it can grow and shrink.

What is a B+ Tree? - 2
• B+ Trees are among the most commonly used data structures in operating systems for metadata indexing.
• A B+ Tree consists of a hierarchy of nodes.
• Each node in the tree, except the root node (which has no parent), has one parent node and either zero child nodes (a leaf node) or one or more child nodes (a branch node).
• In a general tree, depth may vary across different paths from root to leaf (an unbalanced tree), or it may be the same from the root node to each leaf node, producing a balanced tree.

What is a B+ Tree? - 3
• B+ Trees are always height balanced.
• The degree, or order, of a tree is the maximum number of children allowed per parent.
• Large degrees, in general, create broader, shallower trees.
• Because access time in a tree structure depends more often upon depth than on breadth, it is usually advantageous to have "bushy", shallow trees.

B+ Tree Structure
(figure)

B+ Tree Node Structure
Key/Page | Tree/Data Pointer | Key/Page | Tree/Data Pointer | Key/Page | Pointer to Next Leaf Node

Fill Factor
• The index pages in a B+ Tree are constructed through the process of inserting and deleting records.
• Thus, B+ Trees grow and contract like their B-Tree counterparts.
• The contents and the number of index pages reflect this growth and shrinkage.
• B+ Trees and B-Trees use a "fill factor" to control the growth and the shrinkage.
• The fill factor is the value that determines the percentage of space on each leaf-level page to be filled with data.
• The minimum fill factor for B+ Trees and B-Trees is 50% (the root page may violate this rule).

Branching Factor
• The branching factor b, or the order of a B+ Tree, measures the capacity of its nodes: the maximum number of children (out-degree) of each node.
• The actual number of children of a node, m, is constrained for internal nodes so that ⌈b/2⌉ ≤ m ≤ b.
• For example, if the order of a B+ Tree is 7, each internal node (except for the root) may have between 4 and 7 children.
• The root is an exception: it is allowed to have as few as two children (2 ≤ m ≤ b), so with order 7 the root may have between 2 and 7 children.

Branching Factor - 2
• Leaf nodes have no children but hold records, and are constrained so that the number of keys is at least ⌊b/2⌋ and at most b-1.
• When a B+ Tree is nearly empty, it contains only one node, which is a leaf node (in this case the root is also the single leaf).
• This node is permitted to have as few as one key if necessary, and at most b.

Branching Factor - 3
Node Type | Children Type | Min Children | Max Children | Example b = 7 | Example b = 100
Root Node (only node in the tree) | Records | 1 | b | 1-7 | 1-100
Root Node | Internal or External Nodes | 2 | b | 2-7 | 2-100
Internal Node | Internal or External Nodes | ⌈b/2⌉ | b | 4-7 | 50-100
Leaf Node | Records | ⌊b/2⌋ | b-1 | 3-6 | 50-99

B+ Tree Calculations

B+ Trees Characteristics
• Differences between B and B+ Trees:
  • In B+ Trees, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.
  • In B+ Trees, all leaf nodes are linked together as a doubly linked list.
  • A B+ Tree can have fewer levels than the corresponding B-Tree.
• Characteristics of B+ Trees:
  • A B+ Tree is a balanced tree.
  • A minimum occupancy of 50% is guaranteed for each node except the root.
  • Searching for a record requires just a traversal from the root to the appropriate leaf.

B+ Tree Example
(figure; the placement of the equal sign on the branch labels, e.g. "< 25" versus "≥ 25", may vary)
Parameter | Value
Number of Keys/Page | 4
Number of Pointers | 5
Fill Factor | 50%
Minimum Keys in Each Page | 2

B+ Tree Adding
• The key value determines a record's placement in a B+ Tree.
• The leaf pages are maintained in sequential order, and a doubly linked list connects each leaf page with its sibling page(s).
• This doubly linked list speeds data movement as the pages grow and contract.
• We must consider three scenarios when we add a record to a B+ Tree.
• Each scenario causes a different action in the insert algorithm.
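The doubly linked leaf level described above can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the class `LeafPage` and the helpers `link_leaves`, `scan_forward`, and `scan_backward` are hypothetical names, and the key values are sample data.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class LeafPage:
    """Hypothetical leaf page: sorted keys plus links to both sibling pages."""
    keys: List[int] = field(default_factory=list)
    prev: Optional["LeafPage"] = field(default=None, repr=False, compare=False)
    next: Optional["LeafPage"] = field(default=None, repr=False, compare=False)


def link_leaves(pages: List[LeafPage]) -> Optional[LeafPage]:
    """Connect pages (already in key order) into a doubly linked list."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left
    return pages[0] if pages else None


def scan_forward(first: Optional[LeafPage]) -> Iterator[int]:
    """Yield all keys in ascending order by following the next pointers."""
    page = first
    while page is not None:
        yield from page.keys
        page = page.next


def scan_backward(last: Optional[LeafPage]) -> Iterator[int]:
    """Yield all keys in descending order by following the prev pointers."""
    page = last
    while page is not None:
        yield from reversed(page.keys)
        page = page.prev


# Sample leaf pages (assumed values, chosen only for illustration).
pages = [LeafPage([3, 5, 6]), LeafPage([7, 10]), LeafPage([12, 15, 16])]
first = link_leaves(pages)
print(list(scan_forward(first)))       # [3, 5, 6, 7, 10, 12, 15, 16]
print(list(scan_backward(pages[-1])))  # [16, 15, 12, 10, 7, 6, 5, 3]
```

Because the leaf pages are chained in key order, a sequential scan never has to revisit the index pages once the first leaf has been located.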

Recall: Doubly and Single Linked List
(figure)

B+ Tree Adding: Scenario 1
• Leaf page and index page are not full: place the record in sorted position in the appropriate leaf page.

B+ Tree Adding: Scenario 2
• Leaf page is full, index page is not:
  1. Split the leaf page.
  2. Place the middle key in the index page in sorted order.
  3. The left leaf page contains records with keys below the middle key.
  4. The right leaf page contains records with keys equal to or greater than the middle key.

B+ Tree Adding: Scenario 2 - 2
• Suppose that we want to insert a record with a key value of 70 into our B+ Tree.
• This record should go in the leaf page containing 50, 55, 60, and 65.
• Unfortunately, this leaf page is full.

B+ Tree Adding: Scenario 2 - 3
(figure)

B+ Tree Adding: Scenario 2 - 4
• This means that we must split the page as follows: left leaf page: 50, 55 and right leaf page: 60, 65, 70.
• The middle key, 60, is placed in the index page between 50 and 75.

B+ Tree Adding: Scenario 3
• Leaf page and index page are full:
  1. Split the leaf page.
  2. Records with keys < middle key go to the left leaf page.
  3. Records with keys ≥ middle key go to the right leaf page.
  4. Split the index page.
  5. Keys < middle key go to the left index page.
  6. Keys > middle key go to the right index page.
  7. The middle key goes to the next (higher level) index.
• If the next level index page is full, continue splitting the index pages.

B+ Tree Adding: Scenario 3 - 2
• Suppose that we want to add a record containing a key value of 95 to our B+ Tree.
• This record belongs in the page containing 75, 80, 85, and 90.

B+ Tree Adding: Scenario 3 - 3
• This means that we must split the page as follows: left leaf page: 75, 80 and right leaf page: 85, 90, 95.
• The middle key, 85, rises to the index page.
• Unfortunately, the index page is also full, so we split the index page as follows: left index page: 25, 50, right index page: 75, 85 and new index page: 60.

B+ Tree Adding: Scenario 3 - 4
(figure)

B+ Tree Deleting
*** You are not responsible for deletion in the exams.
• We must consider three scenarios when we delete a record from a B+ Tree.
• Each scenario causes a different action in the delete algorithm.
• However, in all cases (the average and the worst case), the time complexity of deleting is O(log n), since the work is bounded by the height of the tree.

B+ Tree Deleting: Scenario 1
• Leaf page and index page are not below the fill factor:
  • Delete the record from the leaf page.
  • Arrange the keys in ascending order to fill the void.
  • If the key of the deleted record appears in the index page, use the next key to replace it.

B+ Tree Deleting: Scenario 1 - Example
• We begin by deleting the record with key 70 from the B+ Tree.
• This record is in a leaf page containing 60, 65 and 70.
• This page will contain 2 records after the deletion.
• Since our fill factor is 50% (2 records), we simply delete 70 from the leaf node.

B+ Tree Deleting: Scenario 1 - Example
(figure)

B+ Tree Deleting: Scenario 2
• Leaf page is below the fill factor and index page is not:
  • Combine the leaf page and its sibling.
  • Change the index page to reflect the change.

B+ Tree Deleting: Scenario 2 - Example
• Next, we delete the record containing 25 from the B+ Tree.
• This record is found in the leaf node containing 25, 28, and 30.
• The fill factor will be 50% after the deletion; however, 25 appears in the index page.
• Thus, when we delete 25, we must replace it with 28 in the index page.

B+ Tree Deleting: Scenario 2 - Example
(figure)

B+ Tree Deleting: Scenario 3
• Leaf page and index page are below the fill factor:
  1. Combine the leaf page and its sibling.
  2. Adjust the index page to reflect the change.
  3. Combine the index page with its sibling.
• Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.

B+ Tree Deleting: Scenario 3 - Example
• We want to delete 60 from the B+ Tree.
  1. The leaf page containing 60 will be below the fill factor after the deletion. Thus, we must combine leaf pages.
  2. With the recombined pages, the index page will be reduced by one key. Hence, it will also fall below the fill factor, so we must combine index pages.
  3. Sixty appears as the only key in the root index page. Obviously, it will be removed with the deletion.

B+ Tree Deleting: Scenario 3 - Example
(figure)

Example - 1
• Consider a DBMS that has the following characteristics:
  • 2 KB fixed-size blocks
  • 12-byte pointers
  • 56-byte block headers
• We want to build an index on a search key that is 8 bytes long. Calculate the maximum number of records we can index with a 3-level B+ Tree (2 levels plus the root).

Example - 1 (Solution)
• Let each node of a B+ Tree contain at most n pointers and n-1 keys, and let each block carry a 56-byte header.
• "Search Key Size * # of Keys + Ptr. Size * # of Ptr. + Block Header" must be smaller than or equal to the fixed block size, therefore:
  8 * (n-1) + 12 * n + 56 ≤ 2048
  20 * n + 48 ≤ 2048
  n ≤ 100
• The root has at most 100 children and each second-level node has at most 100 children, giving at most 100 * 100 leaf pages with at most 99 keys each. The leaf level of the B+ Tree can therefore hold at most 99 * 100 * 100 = 990,000 record pointers.

Example - 2
• Build a B+ Tree for the numbers given below with 3 keys per page and a 50% fill factor (⌈# of Keys * 1/2⌉ = 2):
  7, 5, 3, 10, 12, 15, 6, 16
• Assume that the tree is initially empty and the values are inserted in the given order.

Example - 2 (Solution)
• Insert 7: the single leaf page (which is also the root) holds 7.
• Insert 5: the leaf page holds 5, 7.
• Insert 3: the leaf page holds 3, 5, 7.
• Insert 10: the leaf page is full, so it splits. The index page holds 7; the left leaf page (x < 7) holds 3, 5 and the right leaf page (7 ≤ x) holds 7, 10. The slide also shows an alternative arrangement marked "Wrong".
• Insert 12: the right leaf page becomes 7, 10, 12.
• Insert 15: the right leaf page is full, so it splits. The index page holds 7, 12; the leaf pages hold 3, 5 (x < 7), 7, 10 (7 ≤ x < 12), and 12, 15 (12 ≤ x).
• Insert 6: the first leaf page becomes 3, 5, 6.
• Insert 16: the last leaf page becomes 12, 15, 16.
• Final tree: index page 7, 12; data (leaf) pages 3, 5, 6 | 7, 10 | 12, 15, 16.
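
The three adding scenarios can be replayed on Example 2 with a short Python sketch. It is a minimal illustration under stated assumptions, not the lecture's reference algorithm: the names `Node`, `insert`, `_insert`, and `leaf_pages` are hypothetical, the middle key is taken at position len(keys) // 2 (which matches the splits shown in the slides), and only the forward leaf link is kept for brevity.

```python
import bisect


class Node:
    """Hypothetical page: an index page has children, a leaf page has a next link."""

    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # used by index pages only
        self.next = None    # used by leaf pages only


def _insert(node, key, max_keys):
    """Insert key below node. Return (middle_key, right_page) if node split, else None."""
    if node.leaf:
        bisect.insort(node.keys, key)          # Scenario 1: sorted placement
        if len(node.keys) <= max_keys:
            return None
        mid = len(node.keys) // 2              # Scenario 2: split the leaf page
        right = Node(leaf=True)
        right.keys = node.keys[mid:]           # keys >= middle key go right
        node.keys = node.keys[:mid]            # keys < middle key stay left
        right.next, node.next = node.next, right
        return right.keys[0], right            # middle key is copied to the index

    i = bisect.bisect_right(node.keys, key)    # keys equal to a separator go right
    split = _insert(node.children[i], key, max_keys)
    if split is None:
        return None
    mid_key, right_child = split
    node.keys.insert(i, mid_key)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= max_keys:
        return None
    mid = len(node.keys) // 2                  # Scenario 3: split the index page
    right = Node(leaf=False)
    up_key = node.keys[mid]                    # middle key moves up one level
    right.keys = node.keys[mid + 1:]
    right.children = node.children[mid + 1:]
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up_key, right


def insert(root, key, max_keys=3):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, max_keys)
    if split is None:
        return root
    mid_key, right = split
    new_root = Node(leaf=False)
    new_root.keys = [mid_key]
    new_root.children = [root, right]
    return new_root


def leaf_pages(root):
    """Collect the keys of every leaf page, left to right."""
    node = root
    while not node.leaf:
        node = node.children[0]
    pages = []
    while node is not None:
        pages.append(node.keys)
        node = node.next
    return pages


root = Node(leaf=True)
for k in [7, 5, 3, 10, 12, 15, 6, 16]:
    root = insert(root, k)
print(root.keys)         # [7, 12]
print(leaf_pages(root))  # [[3, 5, 6], [7, 10], [12, 15, 16]]
```

With `max_keys=4`, the same split rule reproduces the Scenario 2 example: inserting 70 into the leaf page 50, 55, 60, 65 leaves 50, 55 on the left, 60, 65, 70 on the right, and sends 60 to the index page.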