7. B Tree, ISAM and B+ Tree Indexes

CENG 206 - Data Management and File Structures
DYNAMIC MULTILEVEL INDEXES USING B AND B+ TREES
Recall: Tree Nomenclature (Terminology)
Terminology | Definition
Root | The top node in a tree.
Parent | A node with at least one child.
Child | A node with a parent.
Siblings | Nodes with the same parent.
Branch (Internal Node) | A node with at least one child.
Leaf (External Node) | A node with no children (childless).
Edge | The connection between one node and another.
Height of a Node | The number of edges on the longest path from the node to a leaf. A leaf node has a height of 0.
Depth of a Node | The number of edges from the node to the tree's root node. The root node has a depth of 0.
The height and the depth of a tree are the same: the number of edges on the longest downward path between the root and a leaf.
What is a B-Tree?
• A B-Tree is a multi-way (multilevel) tree, designed especially for use on magnetic disks, in which each node contains a set of search keys and pointers.
• It contains index pages and data pages; the data pages are where the user data is actually stored.
• A B-Tree is a generalization of a binary search tree in that more than
two paths diverge from a single node.
• Unlike self-balancing binary search trees, the B-Tree is optimized for systems that read and write large blocks of data.
• It is commonly used in databases and file systems.
What is a B-Tree? - 2
• A B-Tree keeps data sorted and allows searches, insertions, and deletions in logarithmic time (O(log n)); it also supports sequential access.
• B-Trees are dynamic; that is, the height of the tree grows and shrinks as records are added and deleted.
• B-Trees are also balanced (all leaves are at the same depth), so adding or deleting records must not violate this property.
• Etymology: The origin of B-Tree has never been explained by the
authors (Rudolf Bayer, Edward M. McCreight). As we shall see,
"balanced," "broad," or "bushy" might apply. Others suggest that the
"B" stands for Boeing or Bayer-Trees.
B-Tree Structure
B-Tree Structure - 2
What is ISAM?
• ISAM (Indexed Sequential Access Method) is a static file management
system developed at IBM that allows records to be accessed either
sequentially (in the order they were entered) or randomly (with an index).
• Each index defines a different ordering of the records.
• E.g., an employee database may have several indexes, based on the
information being sought. A name index may order employees
alphabetically by last name, while a department index may order
employees by their department.
• A key is specified in each index.
• For an alphabetical index of employee names, the last name field would be
the key.
What is ISAM? - 2
What is a B+ Tree?
• A B+ Tree combines features of ISAM and B-Trees.
• It contains index pages and data pages.
• The data pages always appear as leaf (data) nodes in the tree.
• The root node and the intermediate nodes (those between the root and the leaf nodes) are always index pages.
• It is fully dynamic; that is, it can grow and shrink.
What is a B+ Tree? - 2
• The B+ Tree is one of the most commonly used data structures in operating systems for metadata indexing.
• A B+ Tree consists of a hierarchy of nodes.
• Each node in the tree except the root (which has no parent) has exactly one parent node. A leaf node has no children; a branch node has one or more.
• In a general tree, depth may vary across different paths from root to leaf (an unbalanced tree), or it may be the same from the root to every leaf, producing a balanced tree.
What is a B+ Tree? - 3
• B+ Trees are always height balanced.
• The degree, or order, of a tree is the maximum number of children
allowed per parent.
• Large degrees, in general, create broader, shallower trees.
• Because access time in a tree structure depends more often upon
depth than on breadth, it is usually advantageous to have “bushy”
shallow trees.
B+ Tree Structure
B+ Tree Node Structure
Key/Page | Tree/Data Pointer | Key/Page | Tree/Data Pointer | Key/Page | Pointer to Next Leaf Node
Fill Factor
• The index pages in a B+ Tree are constructed through the process of
inserting and deleting records.
• Thus, B+ Trees grow and contract like their B-Tree counterparts.
• The contents and the number of index pages reflect this growth and shrinkage.
• B+ Trees and B-Trees use a "fill factor" to control the growth and
the shrinkage.
• Fill factor is the value that determines the percentage of space on
each leaf-level page to be filled with data.
• Minimum fill factor for B+ or B trees is 50% (the root page may
violate this rule).
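As a small illustration of the fill factor rule, the sketch below computes the minimum number of keys a non-root page must hold. It assumes the minimum is the page capacity times the fill factor, rounded up; the function name `min_keys_per_page` is made up for this example.

```python
import math

def min_keys_per_page(max_keys: int, fill_factor: float = 0.5) -> int:
    """Minimum number of keys a non-root page must hold.

    Assumption: minimum = ceiling(max_keys * fill_factor).
    """
    return math.ceil(max_keys * fill_factor)

# A page that holds up to 4 keys must keep at least 2 at a 50% fill factor.
print(min_keys_per_page(4))
# A page that holds up to 3 keys must also keep at least 2.
print(min_keys_per_page(3))
```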
Branching Factor
• The branching factor, or the order, of a B+ Tree is the maximum number of children per node (the out-degree), denoted b.
• The actual number of children m of an internal node is constrained so that ⌈b/2⌉ ≤ m ≤ b.
• The root is an exception: it is allowed to have as few as two children (2 ≤ m ≤ b).
• For example, if the order of a B+ Tree is 7, each internal node (except the root) may have between 4 and 7 children; the root may have between 2 and 7.
Branching Factor - 2
• Leaf nodes have no children; they hold records, and the number of keys in a leaf must be at least ⌊b/2⌋ and at most b-1.
• When a B+ Tree is nearly empty, it contains only one node, which is a leaf node (the root is also the single leaf in this case).
• This node is permitted to have as few as one key if necessary, and at most b.
Branching Factor - 3
Node Type | Children Type | Min Children | Max Children | Example b = 7 | Example b = 100
Root Node (only node in the tree) | Records | 1 | b | 1-7 | 1-100
Root Node | Internal or External Nodes | 2 | b | 2-7 | 2-100
Internal Node | Internal or External Nodes | ⌈b/2⌉ | b | 4-7 | 50-100
Leaf Node | Records | ⌊b/2⌋ | b-1 | 3-6 | 50-99
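The occupancy limits for each node type can be checked with a few lines of code. This is only a sketch: the function name `occupancy_bounds` and the dictionary keys are made up for illustration, and the formulas are the ones given in the table.

```python
import math

def occupancy_bounds(b: int) -> dict:
    """(min, max) children or keys per node type for a B+ Tree of order b."""
    return {
        "single_root_records": (1, b),                  # root is the only node
        "root_children": (2, b),                        # root with children
        "internal_children": (math.ceil(b / 2), b),     # ceiling(b/2) .. b
        "leaf_keys": (b // 2, b - 1),                   # floor(b/2) .. b-1
    }

for order in (7, 100):
    print(order, occupancy_bounds(order))
```

For b = 7 this gives 4 to 7 children per internal node and 3 to 6 keys per leaf; for b = 100 it gives 50 to 100 and 50 to 99.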
***B+ Tree Calculations
B+ Trees Characteristics
• Differences between B-Trees and B+ Trees:
• In B+ Trees, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.
• In B+ Trees, all leaf nodes are linked together as a doubly linked list.
• A B+ Tree can have fewer levels than the corresponding B-Tree.
• Characteristics of B+ Trees:
• A B+ Tree is a balanced tree.
• A minimum occupancy of 50% is guaranteed for each node except
the root.
• Searching for a record requires just a traversal from the root to the
appropriate leaf.
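The root-to-leaf traversal can be sketched as follows. This is a minimal illustration, not a full implementation: the `Node` class and the `search` function are hypothetical names, and the sketch assumes that keys smaller than a separator go to the left child while keys equal to or greater than it go to the right child.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf page

def search(node: Node, key) -> tuple:
    """Return (found, pages_visited) for a single root-to-leaf traversal."""
    visited = 1
    while node.children:
        # bisect_right sends key == separator to the right child (the ">=" side).
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.keys, visited

# A small two-level tree: index page [7, 12] over three leaf pages.
root = Node([7, 12], [Node([3, 5, 6]), Node([7, 10]), Node([12, 15, 16])])
print(search(root, 10))
print(search(root, 11))
```

The number of pages visited equals the number of levels in the tree, whether or not the key is found.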
B+ Tree Example
[Figure: example B+ Tree with branches labeled "<25" and "≥25". Place of the equal sign may vary.]
Parameters | Values
Number of Keys/Page | 4
Number of Pointers | 5
Fill Factor | 50%
Minimum Keys in Each Page | 2
B+ Tree Adding
• The key value determines a record's placement in a B+ Tree.
• The leaf pages are maintained in sequential order and a doubly linked
list connects each leaf page with its sibling page(s).
• This doubly linked list speeds data movement as the pages grow
and contract.
• We must consider three scenarios when we add a record to a B+
Tree.
• Each scenario causes a different action in the insert algorithm.
Recall: Doubly and Singly Linked Lists
B+ Tree Adding: Scenario 1
• Leaf page and index page are not full: place the record in sorted position in the appropriate leaf page.
B+ Tree Adding: Scenario 2
• Leaf page is full, index page is not:
1. Split the leaf page.
2. Place the middle key in the index page in sorted order.
3. The left leaf page contains records with keys below the middle key.
4. The right leaf page contains records with keys equal to or greater than the middle key.
B+ Tree Adding: Scenario 2 - 2
• Suppose that we want to insert a record with a key value of 70 into
our B+ tree.
• This record should go in the leaf page containing 50, 55, 60, and 65.
• Unfortunately, this leaf page is full.
B+ Tree Adding: Scenario 2 - 3
B+ Tree Adding: Scenario 2 - 4
• This means that we must split the page as follows: left leaf page: 50,
55 and right leaf page: 60, 65, 70.
• The middle key of 60 is placed in the index page between 50 and 75.
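The leaf split in this scenario can be sketched in a few lines. The function name `insert_into_leaf` is made up for illustration, and the choice of the middle position as `len(keys) // 2` is an assumption picked so that the result matches the split shown above.

```python
from bisect import insort

def insert_into_leaf(leaf_keys: list, new_key, max_keys: int):
    """Insert new_key into a leaf; split the leaf if it overflows.

    Returns (left, right, middle). right and middle are None when no split
    is needed. On a split, the middle key stays in the right leaf and a copy
    of it is what gets placed in the index page.
    """
    keys = list(leaf_keys)
    insort(keys, new_key)                # keep the leaf in sorted order
    if len(keys) <= max_keys:
        return keys, None, None
    mid = len(keys) // 2                 # assumed choice of the middle key
    return keys[:mid], keys[mid:], keys[mid]

left, right, middle = insert_into_leaf([50, 55, 60, 65], 70, max_keys=4)
print(left, right, middle)
```

With the page from the example, the left leaf becomes 50, 55, the right leaf becomes 60, 65, 70, and 60 is the key placed in the index page.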
B+ Tree Adding: Scenario 3
• Leaf page and index page are full:
1. Split the leaf page.
2. Records with keys "<" middle key go to the left leaf page.
3. Records with keys ">=" middle key go to the right leaf page.
4. Split the index page.
5. Keys "<" middle key go to the left index page.
6. Keys ">" middle key go to the right index page.
7. The middle key goes to the next (higher level) index.
• If the next level index page is full, continue splitting the index pages.
B+ Tree Adding: Scenario 3 - 2
• Suppose that we want to add a record containing a key value of 95 to
our B+ tree.
• This record belongs in the page containing 75, 80, 85, and 90.
B+ Tree Adding: Scenario 3 - 3
• This means that we must split the page as follows: left leaf page: 75,
80 and right leaf page: 85, 90, 95.
• The middle key, 85, rises to the index page.
• Unfortunately, the index page is also full, so we split the index page
as follows: left index page: 25, 50, right index page: 75, 85 and new
index page: 60.
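The index page split differs from the leaf split in one detail: the middle key moves up to the higher level and is not kept in either half. A minimal sketch, assuming the middle position is `len(keys) // 2` and using the made-up name `split_full_index_page`:

```python
from bisect import insort

def split_full_index_page(index_keys: list, new_key):
    """Add new_key to a full index page and split it.

    Returns (left, middle, right). The middle key goes to the next (higher
    level) index page and appears in neither half.
    """
    keys = list(index_keys)
    insort(keys, new_key)
    mid = len(keys) // 2                 # assumed choice of the middle key
    return keys[:mid], keys[mid], keys[mid + 1:]

left_index, up_key, right_index = split_full_index_page([25, 50, 60, 75], 85)
print(left_index, up_key, right_index)
```

For the example above this yields the left index page 25, 50, the right index page 75, 85, and 60 as the key of the new higher-level index page.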
B+ Tree Adding: Scenario 3 - 4
B+ Tree Deleting
***You are not responsible for deletion in the exams.
• We must consider three scenarios when we delete a record from a B+
tree.
• Each scenario causes a different action in the delete algorithm.
• However, for all cases (the average and the worst case) the time complexity of deleting is O(log n).
B+ Tree Deleting: Scenario 1
• Leaf page and index page are not below the fill factor:
• Delete the record from the leaf page.
• Arrange keys in ascending order to fill void.
• If the key of the deleted record appears in the index page, use the
next key to replace it.
B+ Tree Deleting: Scenario 1 - Example
• We begin by deleting the record with key 70 from the B+ Tree.
• This record is in a leaf page containing 60, 65 and 70.
• This page will contain 2 records after the deletion.
• Since our fill factor is 50% (2 records), we simply delete 70 from the leaf node.
B+ Tree Deleting: Scenario 2
• Leaf page is below the fill factor and index page is not:
• Combine the leaf page and its sibling.
• Change the index page to reflect the change.
B+ Tree Deleting: Scenario 2 - Example
• Next, we delete the record containing 25 from the B+ Tree.
• This record is found in the leaf node containing 25, 28, and 30.
• The fill factor will be 50% after the deletion; however, 25 appears in
the index page.
• Thus, when we delete 25 we must replace it with 28 in the index
page.
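The simple case, where the leaf stays at or above the fill factor and only the index key may need replacing, can be sketched as below. The function name `delete_without_merge` is hypothetical, the index page contents 25, 50 are taken from the earlier insertion example, and combining pages is deliberately left out.

```python
def delete_without_merge(leaf_keys: list, index_keys: list, key, min_keys: int):
    """Delete key from a leaf that stays at or above the fill factor.

    If the deleted key also appears in the index page, it is replaced there
    by the next key of the leaf. Combining pages is not handled in this sketch.
    """
    pos = leaf_keys.index(key)                    # ValueError if key is absent
    new_leaf = leaf_keys[:pos] + leaf_keys[pos + 1:]
    if len(new_leaf) < min_keys:
        raise ValueError("leaf falls below the fill factor; pages must be combined")
    new_index = list(index_keys)
    if key in new_index and pos < len(new_leaf):
        new_index[new_index.index(key)] = new_leaf[pos]   # next key takes its place
    return new_leaf, new_index

print(delete_without_merge([25, 28, 30], [25, 50], 25, min_keys=2))
```

Deleting 25 leaves the page with 28, 30 and changes the index page from 25, 50 to 28, 50.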
B+ Tree Deleting: Scenario 3
• Leaf page and index page are below fill factor:
1. Combine the leaf page and its sibling.
2. Adjust the index page to reflect the change.
3. Combine the index page with its sibling.
• Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.
B+ Tree Deleting: Scenario 3 - Example
• We want to delete 60 from the B+ Tree.
1. The leaf page containing 60 will be below the fill factor after the deletion. Thus, we must combine leaf pages.
2. With recombined pages, the index page will be reduced by one key. Hence, it will also fall below the fill factor; thus, we must combine index pages.
3. Sixty appears as the only key in the root index page. Obviously, it will be removed with the deletion.
Example - 1
• Consider a DBMS that has the following characteristics:
• 2KB fixed-size blocks
• 12 byte pointers
• 56 byte block headers
• We want to build an index on a search key that is 8 bytes long.
Calculate the maximum number of records we can index with a 3-level B+ Tree (2 levels plus the root).
Example - 1 (Solution)
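• Let each node of a B+ Tree contain at most n pointers and n-1 keys. (Each block also carries a 56-byte header.)
• "Search Key Size * # of Keys + Ptr. Size * # of Ptr. + Block Header" must be smaller than or equal to the fixed block size; therefore:
8 * (n-1) + 12 * n + 56 ≤ 2048
20 * n + 48 ≤ 2048
Therefore, n ≤ 100
• The root has at most 100 children and each of them has at most 100 children, so there are at most 100 * 100 leaf nodes, each holding at most 99 keys. The leaf level of the B+ Tree can therefore hold at most:
99 * 100 * 100 = 990,000 record pointers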
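The same arithmetic can be written as a short script, which makes it easy to try other block, key, or pointer sizes. The function names are made up for this sketch; the inequality is the one used in the solution, solved for n.

```python
def max_pointers_per_node(block_size: int, header_size: int,
                          key_size: int, pointer_size: int) -> int:
    """Largest n with key_size*(n-1) + pointer_size*n + header_size <= block_size."""
    return (block_size - header_size + key_size) // (key_size + pointer_size)

def max_indexed_records(n: int, levels: int) -> int:
    """Each of the n**(levels-1) leaf nodes holds at most n-1 keys."""
    return (n - 1) * n ** (levels - 1)

n = max_pointers_per_node(block_size=2048, header_size=56, key_size=8, pointer_size=12)
print(n)
print(max_indexed_records(n, levels=3))
```

With the values of this example, n is 100 and the 3-level tree indexes at most 990,000 records.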
Example - 2
• Build a B+ Tree for the given numbers below with at most 3 keys per page and a 50% fill factor (⌈# of Keys * 1/2⌉ = 2).
7, 5, 3, 10, 12, 15, 6, 16
• Assume that the tree is initially empty and values are inserted in the
given order.
Example - 2 (Solution)
• Insert 7: leaf [7]
• Insert 5: leaf [5, 7]
• Insert 3: leaf [3, 5, 7]
• Insert 10: the leaf is full and splits; the middle key 7 goes to the index. Index: [7]; leaves: [3, 5] (<7), [7, 10] (7≤)
• Insert 12: Index: [7]; leaves: [3, 5] (<7), [7, 10, 12] (7≤)
• Insert 15: the leaf [7, 10, 12] is full and splits; the middle key 12 goes to the index. Index: [7, 12]; leaves: [3, 5] (<7), [7, 10] (7≤x<12), [12, 15] (12≤)
• Insert 6: Index: [7, 12]; leaves: [3, 5, 6] (<7), [7, 10] (7≤x<12), [12, 15] (12≤)
• Insert 16: Index: [7, 12]; leaves (data): [3, 5, 6] (<7), [7, 10] (7≤x<12), [12, 15, 16] (12≤)
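The insertion sequence of this example can be replayed with a small simulation. This is a simplified sketch under stated assumptions: it models only a single index page above the leaf pages, it does not split the index page (it stops if that would be needed), and it picks the middle key at position `len(keys) // 2`. The function name `build_two_level` is made up for illustration.

```python
from bisect import bisect_right, insort

def build_two_level(values, max_keys: int):
    """Insert values one by one into one index page plus sorted leaf pages."""
    index, leaves = [], [[]]
    for value in values:
        i = bisect_right(index, value)       # leaf whose key range holds value
        insort(leaves[i], value)
        if len(leaves[i]) > max_keys:        # leaf overflow: split the leaf
            keys = leaves[i]
            mid = len(keys) // 2
            leaves[i:i + 1] = [keys[:mid], keys[mid:]]
            index.insert(i, keys[mid])       # copy the middle key into the index
            if len(index) > max_keys:
                raise NotImplementedError("index page split is outside this sketch")
    return index, leaves

index, leaves = build_two_level([7, 5, 3, 10, 12, 15, 6, 16], max_keys=3)
print(index)
print(leaves)
```

The run ends with the index page 7, 12 and the leaf pages 3, 5, 6; 7, 10; and 12, 15, 16.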