Relational Database Systems 2
4. Trees & Advanced Indexes
Christoph Lofi
Benjamin Köhncke
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
2 Storage
• Buffer Management is very important
– Holds DB blocks in primary memory
• DB blocks are made up of several FS blocks
• Find good strategies to have requested DB blocks available when needed
– Each block holds some meta data and row data
• Indexes drastically speed up queries
– Fewer blocks need to be scanned
– Primary Index
• On primary key attribute, usually influences row storage order
– Secondary Index
• On any attribute, does not influence storage order
Relational Database Systems 2 – Christoph Lofi - Benjamin Köhncke – Institut für Informationssysteme
4 Trees & Advanced Indexes
4.1 Introduction
4.2 Binary Search Trees
4.3 Self-Balancing Binary Search Trees
4.4 B-Trees
Datenbanksysteme 2 – Christoph Lofi – Institut für Informationssysteme – TU Braunschweig
4.1 Introduction
• Indexes need a suitable data structure
– For efficient index look-ups search keys need to be
ordered
• Remember: All indexes should be stored in a
separate database file, not together with data
– A suitable number of DB blocks (adjacent on disk) is
reserved at index creation time
– If the space is not sufficient, another file is created
and linked to the original index file
[Figure: index entries, each pairing a search key with a block address (Search Key 1 with Block Address 1, Search Key 2 with Block Address 2)]
4.1 Introduction
• Search within an index
– Bisection search possible: ⌈log2 n⌉ steps; O(log n)
[Figure: sorted index keys 4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]
– But usually indexes span several DB blocks
• If index is in n blocks, O(n) blocks need to be read from disk
• Example: search for 92
[Figure: the sorted keys 4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99 spread over several DB blocks, with the search for 92]
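A minimal sketch of the bisection search above, run on the keys from the slide. The function name `bisection_search` and the probe counter are illustrative assumptions; the sketch treats the whole index as one in-memory list and ignores block boundaries.

```python
def bisection_search(keys, target):
    """Return (position, probes); position is None if target is absent."""
    lo, hi, probes = 0, len(keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one probe per inspected cell
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1                 # continue in the upper half
        else:
            hi = mid - 1                 # continue in the lower half
    return None, probes

index_keys = [4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]
position, probes = bisection_search(index_keys, 92)
print(position, probes)                  # 13 3
```

With n = 15 keys, no search needs more than ⌈log2 15⌉ = 4 probes.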
4.1 Introduction
• Maintenance of index is also difficult
– Insert a new search key with value 5!
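[Figure: inserting key 5 into the sorted cells 4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99 shifts every key after 4 one cell to the right, giving 4, 5, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]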
– In worst case, all cells need to be shifted and all blocks
need to be accessed
• Similar problem occurs when deleting a value
– Often: do not shift values, but mark key as deleted
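The cost of keeping a flat sorted index in order can be illustrated with a small sketch that counts how many cells move when a key is inserted. The helper name `insert_sorted` is an assumption made for this example.

```python
def insert_sorted(cells, key):
    """Insert key into the sorted list cells; return the number of shifted cells."""
    cells.append(key)                    # grow the array by one cell
    i = len(cells) - 1
    shifted = 0
    while i > 0 and cells[i - 1] > key:
        cells[i] = cells[i - 1]          # move the larger key one cell to the right
        i -= 1
        shifted += 1
    cells[i] = key
    return shifted

index_keys = [4, 6, 7, 16, 18, 21, 24, 33, 39, 47, 68, 72, 89, 92, 99]
shifted = insert_sorted(index_keys, 5)
print(shifted)                           # 14 of the 15 existing keys had to move
```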
4.1 Introduction
• In this lecture, we discuss more efficient multilevel data structures
– B-trees
• Prevalent in database systems
• Better access performance
• Much better update performance
• To understand B-trees better,
we start by examining binary
search trees
4.2 Binary Trees
• Binary trees are
– Rooted and directed trees
– Each node has none, one or two children
– Each node (except root) has exactly
one parent
4.2 Binary Trees
• Some naming conventions
– Nodes without children are called leaf
nodes
– The depth of node N is the path
length from the root
– The tree height is the maximum node
depth
– If there is a path from node N1 to
node N2, N1 is an ancestor of N2 and
N2 is a descendant of N1
– The size of a node N is the number of
all descendants of N including itself
– A subtree of a node N is formed of
all descendant nodes including N and
the respective links
[Figure: example tree with the root, the leaf nodes, and a highlighted (red) node with its subtree; tree height = 3; red node: size = 3, depth = 1]
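The naming conventions above translate directly into small recursive functions. This is a sketch under assumptions: the `Node` class and the example tree are made up to reproduce the values from the figure (tree height 3, red node with size 3 and depth 1), and an empty subtree is given height -1 so that a single leaf has height 0.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def size(node):
    """Number of nodes in the subtree rooted at node, including node itself."""
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def height(node):
    """Maximum depth of any node below node; a single leaf has height 0."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def depth(root, target, current=0):
    """Path length from root to target, or None if target is not in the tree."""
    if root is None:
        return None
    if root is target:
        return current
    found = depth(root.left, target, current + 1)
    return found if found is not None else depth(root.right, target, current + 1)

red = Node(Node(), Node())                        # hypothetical "red" node with two leaves
root = Node(red, Node(None, Node(None, Node())))  # right branch reaches depth 3
print(height(root), size(red), depth(root, red))  # 3 3 1
```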
4.2 Binary Trees
• Properties of binary trees
• Full binary tree (or proper)
– Each node has either zero or two children
• Perfect binary tree
– All leaf nodes have the same depth
– With height h (counted in levels), contains 2^h − 1 nodes
• Height-balanced binary tree
– Depths of all leaf nodes differ by at most 1
– With height h (counted in levels), contains between 2^(h−1) and 2^h − 1 nodes
• Degenerated binary tree
– Each node has either zero or one child
– Behaves like a linked list: search in O(n)
[Figure: three example trees labeled "Full and Balanced", "Full and Perfect", and "Degenerated"]
4.2 Binary Search Trees
• Binary search trees are binary trees with
– Each node has a unique value assigned
– There is a total order on all values
– Left subtree of a node contains only values less than
node value
– Right subtree of a node contains only values larger than
the node value
– Aiming for O(log n) search complexity
• Structurally resembles bisection search
[Figure: binary search tree with root 57, children 33 and 85, and leaves 17, 42, 61, 99]
4.2 Binary Search Trees
• Constructing and inserting into binary search
trees
– Values are inserted incrementally
– First value is root
– Additional values sink into tree
• Sink to left subtree if value smaller
• Sink to right subtree if value larger
• Attach to last node as left/right child, if subtree is empty
• The insert order of the values strongly influences the resulting and intermediate tree properties
4.2 Binary Search Trees
• Suppose insert order 57, 33, 42, 85, 17, 61, 99
[Figure: intermediate trees. Insert 57: single root. Insert 33, 42: degenerated. Insert 85, 17: full and balanced. Insert 61, 99: perfect and full, with root 57, children 33 and 85, and leaves 17, 42, 61, 99]
4.2 Binary Search Trees
• Suppose insert order
– 99, 85, 61, 57, 42, 33, 17
[Figure: degenerated tree, a single chain 99, 85, 61, 57, 42, 33, 17]
• Insert complexity is thus
– O(n) worst case
– O(log n) average case
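The effect of the insert order can be reproduced with a short sketch of the "sinking" procedure. The names `Node`, `insert`, `build`, and `height` are illustrative assumptions; height is measured as the maximum node depth, so a single node has height 0.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Let key sink into the tree and attach it where the subtree is empty."""
    if root is None:
        return Node(key)                 # first value becomes the root
    node = root
    while True:
        if key == node.key:
            return root                  # values are unique, nothing to do
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

perfect = build([57, 33, 42, 85, 17, 61, 99])
degenerated = build([99, 85, 61, 57, 42, 33, 17])
print(height(perfect), height(degenerated))   # 2 6
```

The same seven keys give a tree of height 2 or a chain of height 6, depending only on the order of insertion.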
4.2 Binary Search Tree
• Search for a key v
– Start with root
– Recursive procedure
• If node value = v
– Return node
• If node is leaf
– Value not found
• If v < node value
– Descend to left subtree
• Else
– Descend to right subtree
[Figure: example tree with root 57, children 33 and 85, and leaves 17, 42, 61, 99]
• Complexity:
– Average case: O(log n)
– Worst case: O(n) – degenerated tree
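A compact sketch of the search procedure, written as a loop instead of a recursion. The tuple representation `(key, left, right)` and the step counter are assumptions chosen to keep the example short.

```python
def search(node, v):
    """Search v in a tree of (key, left, right) tuples; return (found, visited nodes)."""
    steps = 0
    while node is not None:
        key, left, right = node
        steps += 1
        if v == key:
            return True, steps
        node = left if v < key else right    # descend into exactly one subtree
    return False, steps

def leaf(key):
    return (key, None, None)

tree = (57, (33, leaf(17), leaf(42)), (85, leaf(61), leaf(99)))
print(search(tree, 42))    # (True, 3)
print(search(tree, 50))    # (False, 3)
```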
4.2 Binary Search Tree
• Tree Traversal
– Accesses all nodes of the tree
• Pre-Order
– Visit node
– Traverse left subtree
– Traverse right subtree
– Pre-Order: 57-33-17-42-35-49-85-61-99
• In-Order (sorted access)
– Traverse left subtree
– Visit node
– Traverse right subtree
– In-Order: 17-33-35-42-49-57-61-85-99
• Post-Order
– Traverse left subtree
– Traverse right subtree
– Visit node
– Post-Order: 17-35-49-42-33-61-99-85-57
[Figure: example tree with root 57; left child 33 with children 17 and 42, where 42 has children 35 and 49; right child 85 with children 61 and 99]
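The three traversal orders differ only in where the node itself is visited. The sketch below assumes the tree is given as nested `(key, left, right)` tuples and rebuilds the example tree from the slide.

```python
def pre_order(n):
    return [] if n is None else [n[0]] + pre_order(n[1]) + pre_order(n[2])

def in_order(n):
    return [] if n is None else in_order(n[1]) + [n[0]] + in_order(n[2])

def post_order(n):
    return [] if n is None else post_order(n[1]) + post_order(n[2]) + [n[0]]

def leaf(key):
    return (key, None, None)

tree = (57,
        (33, leaf(17), (42, leaf(35), leaf(49))),
        (85, leaf(61), leaf(99)))

print(pre_order(tree))    # [57, 33, 17, 42, 35, 49, 85, 61, 99]
print(in_order(tree))     # [17, 33, 35, 42, 49, 57, 61, 85, 99]
print(post_order(tree))   # [17, 35, 49, 42, 33, 61, 99, 85, 57]
```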
4.2 Binary Search Tree
• Deleting nodes has complexity O(n) worst case, O(log n) average case
– Locate the node to delete by tree search
– If node is leaf, just delete it
– If node has one child, delete node and attach child to parent
– If node has two children
• Replace either by
a) in-order successor (the left-most child of the right subtree)
b) in-order predecessor (the right-most child of the left subtree)
• Example: delete search key with value 57
[Figure: deleting key 57 from an example tree with the keys 22, 27, 57, 83, 86: a) replaced by the in-order successor, b) replaced by the in-order predecessor]
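A sketch of the three deletion cases, using variant a) (in-order successor) for nodes with two children. The `Node` class and the shape of the example tree are assumptions; only the keys are taken from the figure.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def delete(root, key):
    """Delete key from the subtree rooted at root and return the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:            # leaf or only a right child
            return root.right
        if root.right is None:           # only a left child
            return root.left
        succ = root.right                # two children: find in-order successor,
        while succ.left is not None:     # the left-most node of the right subtree
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root

def in_order(node):
    return [] if node is None else in_order(node.left) + [node.key] + in_order(node.right)

# Hypothetical arrangement of the keys from the figure.
root = Node(57, Node(27, Node(22)), Node(83, None, Node(86)))
root = delete(root, 57)
print(root.key, in_order(root))          # 83 [22, 27, 83, 86]
```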
4.2 Binary Search Trees
• Summary
– Very simple, dynamic data structure
– Efficient on average
• O(log n) for all operations
– Can be very inefficient for degenerated cases
• O(n) for all operations
4.3 Self-Balancing Binary Search Trees
• Observation:
– Binary Search Trees are very efficient when perfect or
balanced
• Idea:
– Continuously optimize tree structure to keep tree
balanced
• Popular Implementations
– AVL-Tree (classic example)
– Red-Black-Tree
– Splay-Tree
– Scapegoat-Tree
– …
4.3 Self-Balancing Binary Search Trees
• Basic Concepts for Deletion:
Global Rebuild (Lazy Deletion)
– Start with balanced tree
– Don’t delete a node, just mark it as deleted
• Search algorithm scans deleted nodes, but does not return
them
– If Rebuild Condition is met, rebuild the whole tree
without the deleted nodes
• “Rebuild as soon as half of the nodes are marked as
deleted”
• Complete rebuild can be performed in O(n)
4.3 Self-Balancing Binary Search Trees
• Global Rebuild (cont.)
– Search Efficiency
• n number of unmarked nodes
• Tree is balanced, contains max 2n nodes overall
– Number of accesses during search usually just increases by 1
– O(log n)
– Delete Efficiency
• Global rebuild is in O(n)
– But only necessary after n deletions
– Amortized additional cost per deletion is O(1)
• Overall complexity
– Average: O(log n)
– Worst Case: O(n), if actual rebuild is performed
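Lazy deletion with a global rebuild can be sketched as follows. The class `LazyTree`, its counters, and the helper names are assumptions for illustration; the rebuild condition implements "rebuild as soon as half of the nodes are marked as deleted".

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.deleted = key, None, None, False

def build_balanced(keys, lo, hi):
    """Build a balanced tree from the sorted slice keys[lo:hi] in O(n)."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node

def live_keys(node, out):
    """Collect all unmarked keys in sorted (in-order) sequence."""
    if node is not None:
        live_keys(node.left, out)
        if not node.deleted:
            out.append(node.key)
        live_keys(node.right, out)
    return out

class LazyTree:
    def __init__(self, keys):
        keys = sorted(keys)
        self.root = build_balanced(keys, 0, len(keys))
        self.live, self.marked = len(keys), 0

    def _find(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def contains(self, key):
        node = self._find(key)           # marked nodes are scanned but not returned
        return node is not None and not node.deleted

    def delete(self, key):
        node = self._find(key)
        if node is None or node.deleted:
            return False
        node.deleted = True
        self.live, self.marked = self.live - 1, self.marked + 1
        if self.marked >= self.live:     # at least half of all nodes are marked
            keys = live_keys(self.root, [])
            self.root = build_balanced(keys, 0, len(keys))
            self.marked = 0
        return True

tree = LazyTree([57, 33, 85, 17, 42, 61, 99])
for key in (57, 33, 42, 61):
    tree.delete(key)                     # the fourth deletion triggers the rebuild
print(tree.root.key, tree.marked)        # 85 0
```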
4.3 Self-Balancing Binary Search Trees
• Global Rebuild (cont.)
– Direct Deletion with Rebuild
• Similar complexity as with lazy deletion
– Increased per delete effort
– Reduced per search effort until rebuild
• Delete nodes as in normal binary trees
– Increment deletion counter cd
• Rebuild tree as soon as cd = n, reset cd
[Figure: deleting 57, 33, 42, 61 from the tree with the keys 17, 33, 42, 57, 61, 85, 99 leaves the keys 17, 85, 99; the rebuild then restores a balanced tree]
4.3 Self-Balancing Binary Search Trees
• Basic Concepts for Insertion and Deletion:
Local Balancing (Subtree Balancing)
– Start with balanced tree
– Insert/delete nodes normally
• If a subtree becomes “too unbalanced”, locally balance
subtree to regain global balance
4.3 Self-Balancing Binary Search Trees
– To detect unbalanced subtrees, each node v needs to
know the size |v| and the height h(v) of its subtree
• Unbalanced Condition: (Height Balancing)
– Subtree is “too unbalanced” when |h(left(v)) - h(right(v))| > α
– α is a constant which can be adjusted (for AVL, α=1)
• Alternative Unbalanced Condition:
– Subtree is “too unbalanced” when h(v) > α * log2|v|
– α is a constant which can be adjusted
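Both unbalanced conditions can be checked with a few lines. This sketch makes several assumptions: trees are nested `(key, left, right)` tuples, an empty subtree counts as height -1, and the statistics are recomputed on every call rather than stored in the nodes.

```python
import math

def stats(node):
    """Return (h(v), |v|) for the subtree rooted at node."""
    if node is None:
        return -1, 0
    (hl, sl), (hr, sr) = stats(node[1]), stats(node[2])
    return 1 + max(hl, hr), 1 + sl + sr

def height_unbalanced(node, alpha=1):
    """|h(left(v)) - h(right(v))| > alpha"""
    return abs(stats(node[1])[0] - stats(node[2])[0]) > alpha

def size_unbalanced(node, alpha):
    """h(v) > alpha * log2|v|"""
    h, n = stats(node)
    return h > alpha * math.log2(n)

def leaf(key):
    return (key, None, None)

before = (57, (33, leaf(17), leaf(42)), leaf(85))
after = (57, (33, (17, leaf(5), None), leaf(42)), leaf(85))   # key 5 inserted

print(stats(before), height_unbalanced(before))   # (2, 5) False
print(stats(after), height_unbalanced(after))     # (3, 6) True
```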
4.3 Self-Balancing Binary Search Trees
• Local Balancing (cont.)
– “After inserting a node, walk back the tree and update
stored subtree statistics h(v) and |v|. If node v is too
imbalanced, balance subtree of v”
[Figure: each node is annotated with key, h(v), |v|. Before the insert: 57 (2, 5), 33 (1, 3), 17 (0, 1), 42 (0, 1), 85 (0, 1). After inserting 5: 57 (3, 6), 33 (2, 4), 17 (1, 2), 5 (0, 1), 42 (0, 1), 85 (0, 1). Node 57 is height-imbalanced for α = 1: |2 − 0| = 2 > 1]
4.3 Self-Balancing Binary Search Trees
– Local Balancing can be achieved by
• Rebuilding the subtree
– O(|v|) = O(n) in the worst case
– However, O(log n) on average
– This operation is expensive. But in the context of a DBMS, it may
pay off as it can also consolidate and optimize physical storage
locations
– Especially suited for disk-based trees
• Rotating
– Only pointers are moved → very efficient
– O(1)
– Does not change physical storage of nodes
– Especially suited for main-memory-based trees
4.3 Self-Balancing Binary Search Trees
• Local Balancing – Rotating
– Simple Rotation (left, right)
[Figure: a right rotation around pivot x turns the tree y(x(1, 2), 3) into x(1, y(2, 3)); the left rotation is the inverse]
– Double Rotation (left-left, right-right, Rollercoaster)
[Figure: two successive rotations in the same direction (right-right or left-left) on the nodes x, y, z with the subtrees 1 to 4]
4.3 Self-Balancing Binary Search Trees
• Local Balancing – Rotating
– Double Rotation (left-right, Zig-Zag)
[Figure: a left rotation followed by a right rotation on the nodes x, y, z with the subtrees 1 to 4]
– Double Rotation (right-left, Zig-Zag)
• Analogous to left-right
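Rotations only rewire a constant number of pointers, which the following sketch makes explicit. The `Node` class and the function names are assumptions; the subtrees 1 to 4 from the figures are left empty here.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    """The left child x of y becomes the new subtree root."""
    x = y.left
    y.left = x.right       # the middle subtree changes its parent
    x.right = y
    return x

def rotate_left(x):
    """The right child y of x becomes the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

def rotate_left_right(z):
    """Zig-zag case: rotate the left child to the left, then z to the right."""
    z.left = rotate_left(z.left)
    return rotate_right(z)

# Zig-zag shape: z = 30 with left child x = 10, whose right child is y = 20.
z = Node(30, Node(10, None, Node(20)))
root = rotate_left_right(z)
print(root.key, root.left.key, root.right.key)   # 20 10 30
```

Each function touches at most three pointers, independent of the size of the subtrees involved.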
4.3 Self-Balancing Binary Search Trees
• The presented concepts can be combined in
different ways to implement self-balancing trees
– AVL-Tree (classic example)
– Red-Black-Tree
– Splay-Tree
– Scapegoat-Tree
4.3 Self-Balancing Binary Search Trees
• Implementation: AVL-Trees
– Invented 1962 by Adelson-Velsky and Landis
– Uses Local Rebalancing with Rotations for
Insertion and Deletion
• Unbalanced criterion: |h(left(v)) - h(right(v))| > 1
– “Height difference of left and right subtree of v is 2 or more”
– Height information is stored explicitly within nodes
• Update backtracking after each insert and delete
• Storage overhead of O(n)
– Guaranteed maximum height of about 1.44 · log2 n
4.3 Self-Balancing Binary Search Trees
• Implementation: Scapegoat-Trees
– Invented 1993 by Galperin and Rivest
– Uses Global Rebuilding for Deletions
– Uses Local Balancing by rebuilding the unbalanced subtree for Insertions
• Unbalanced criterion: h(v) > log_{1/α}|v| + 1; 0.5 ≤ α ≤ 1
– Node statistics (height, size) determined dynamically
during backtracking
• Only global statistics are stored
• Storage overhead of O(1)
4.4 Problems with Binary Search Trees
• Are binary trees really suitable for disk based
databases?
– Yes and No…
– Binary Trees are great data-structures for usage in
internal memory
– But they have a very bad performance when stored
on external storage (i.e. hard disks)
4.4 Problems with Binary Search Trees
• Binary tree nodes have to be stored within hard
disk blocks in linear fashion
– When tree is large, nodes are scattered among the
blocks
– In worst case, a new block must be read from disk
for every node accessed during search or traversal
• Every linearization scheme for binary trees has that problem
– Reading a block from disk is very expensive
4.4 Problems with Binary Search Trees
• Sample linearization
– Search for '42'
• In worst case needs to fetch 3 blocks from disk for just 4 nodes
– Problem is even worse for full tree traversal
[Figure: binary search tree with root 33; second level 16, 47; third level 6, 21, 39, 68; fourth level 4, 7, 18, 24, 35, 42, 59, 99. Disk/DB blocks: Block 1: 33, 16, 47; Block 2: 6, 21, 39, 68; Block 3: 4, 7, 18, 24; Block 4: 35, 42, 59, 99]
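The block count for the sample linearization can be replayed with a small sketch. The dictionaries `children` and `block_of` encode the tree and the block assignment from the figure; the function names and the assumption that only the most recently read block stays buffered are illustrative.

```python
# Tree from the figure: key -> (left child, right child)
children = {33: (16, 47), 16: (6, 21), 47: (39, 68),
            6: (4, 7), 21: (18, 24), 39: (35, 42), 68: (59, 99)}

# Level-wise linearization from the figure: key -> block number
block_of = {33: 1, 16: 1, 47: 1,
            6: 2, 21: 2, 39: 2, 68: 2,
            4: 3, 7: 3, 18: 3, 24: 3,
            35: 4, 42: 4, 59: 4, 99: 4}

def search_path(root, target):
    """Keys visited while searching target, starting at root."""
    path, node = [], root
    while node is not None:
        path.append(node)
        if node == target:
            break
        left, right = children.get(node, (None, None))
        node = left if target < node else right
    return path

def blocks_read(path):
    """Block fetches needed if only the last fetched block is kept in memory."""
    reads, current = 0, None
    for key in path:
        if block_of[key] != current:
            current = block_of[key]
            reads += 1
    return reads

path = search_path(33, 42)
print(path, blocks_read(path))   # [33, 47, 39, 42] 3
```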
4.4 B-Trees
• B-Trees adapt concepts and techniques learned for binary
trees and optimize them for hard disk storage
• Basic Ideas:
– Searching within a DB/disk block is very efficient
• Take advantage of static nature within a block
• Search can be performed in memory with bisection search
• Treat entire blocks as tree nodes
– Reading blocks from the disk is expensive
• Reduce block reads
• Most data resides in the leaf nodes
– Thus minimize the height of the tree
• Dramatically increase fan-out factor
• Tree becomes “bushy”
• Shorter search path length
4.4 Block Search Trees
• First Improvement: Block Search Tree
– Nodes are complete DB blocks
– Each node can store up to q pointers p_i and q−1 unique
and ordered key entries k_i: <p_1, k_1, …, k_{q−1}, p_q>
• k_i < k_{i+1}
• Pointers p_i link to subtrees (or are empty).
All keys in the subtree of p_i are less than k_i and greater than k_{i−1}
[Figure: example block search tree; each node holds key values and node pointers. The root holds the keys 10, 20, 30; further nodes hold the keys 5, 13, 14, 15, 17, 34, 38, 55]
EN 14.3.1
4.4 Block Search Trees
• Locate a key k
– Recursive Procedure: Start with root node
• Use bisection search within the current node
• If key found
– Return it
• If key not found
– If there is a p_i with k_{i−1} < k < k_i
» Follow p_i and repeat algorithm with the linked node
– Else
» Key not in tree
– Example: Locate 14
[Figure: search path for key 14 in the example tree with the root keys 10, 20, 30]
EN 14.3.1
4.4 Block Search Trees
• Insert a key k
– Recursive Procedure: Start with root node
• Use bisection search within the current node
• If key found
– Key cannot be inserted twice; abort
• If key not found
– If there is a used pointer p_i with k_{i−1} < k < k_i
» Follow p_i and repeat algorithm with the linked node
– Else
» If there is space left in the node
• Insert key and restore sort order
» Else
• Create new, empty node
• Insert k into new node
• Link new node to p_i in current node such that k_{i−1} < k < k_i
EN 14.3.1
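The locate and insert procedures for block search trees can be sketched together. Everything named here is an assumption for illustration: the class `BlockNode`, the capacity constant `Q = 4` (up to 4 pointers and 3 keys per node), and the insert sequence, which only loosely follows the slide example.

```python
from bisect import bisect_left

Q = 4   # hypothetical capacity: up to Q pointers and Q - 1 keys per node

class BlockNode:
    def __init__(self, keys=()):
        self.keys = list(keys)                        # sorted key entries k_i
        self.children = [None] * (len(self.keys) + 1) # pointers p_i, None = empty

def locate(node, k):
    """Return the node containing k, or None if k is not in the tree."""
    while node is not None:
        i = bisect_left(node.keys, k)                 # bisection search inside the node
        if i < len(node.keys) and node.keys[i] == k:
            return node
        node = node.children[i]                       # follow p_i (may be empty)
    return None

def insert(node, k):
    """Insert k below node; return False if k is already present."""
    while True:
        i = bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return False                              # key cannot be inserted twice
        if node.children[i] is not None:
            node = node.children[i]                   # a subtree covers this interval
        elif len(node.keys) < Q - 1:
            node.keys.insert(i, k)                    # space left: insert in order
            node.children.insert(i, None)
            return True
        else:
            node.children[i] = BlockNode([k])         # node full: link a new node
            return True

root = BlockNode()
for key in [10, 20, 30, 5, 15, 34, 38, 55, 44]:
    insert(root, key)

print(root.keys)                 # [10, 20, 30]
print(locate(root, 44).keys)     # [44]
print(locate(root, 14))          # None
```

Because nothing in this scheme rebalances the tree, the resulting shape depends on the insert order, just as for binary search trees.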
4.4 Block Search Trees
• Insert a key : 44
[Figure: the example tree before and after inserting key 44]
EN 14.3.1
4.4 Block Search Trees
• Delete a key k
– Start with root node
– Locate k
• If k is in leaf node, delete k from node and restore order
– If leaf node is now empty, delete the node
• If k is in internal node
– If none or only one of the pointers directly adjacent to k is used
» Delete k and restore order
– If k is a separator between two used pointers,
» If the keys of both subnodes fit into a single node
• Merge both nodes into one
• Delete k and restore order
» Else
• Replace k with new separator key
• Either largest key in left node or smallest key in right node
– Any completely empty node is deleted as in binary search trees
EN 14.3.1
4.4 Block Search Trees
• Delete a key : 10
[Figure: the example tree before and after deleting key 10]
EN 14.3.1
4.4 Block Search Trees
• Delete a key : 20
[Figure: the example tree before and after deleting key 20]
EN 14.3.1
4.4 Block Search Trees
• Delete a key : 30
[Figure: the example tree before and after deleting key 30]
EN 14.3.1
4.4 Block Search Trees
• Block Search Trees have similar properties to Binary
Search Trees
– Can be perfect, balanced or degenerated
• Assume height h=3; fan-out-factor q=2048; and total
number of keys n
– Block Search Tree
• One node can store up to 2047 keys and 2048 links
• Perfect : n = 2048^3 − 1 ≈ 8590M
• Balanced : 4M ≤ n ≤ 8590M
• Degenerated : n = 6141
– Binary Search Tree (Block tree with q=2)
• One node can store 1 key and up to 2 links
• Perfect : n = 7
• Balanced : 3 < n ≤ 7
• Degenerated : n = 3
4.4 Block Search Trees
• Assume n = 1,000,000,000; fan-out-factor q=2048;
and height h
– Block Search Tree
• Balanced : h = 3
• Degenerated : h = 488,520
– Binary Search Tree
• Balanced : h = 30
• Degenerated : h = 1,000,000,000
• During search, there is one disk access per tree
height in worst case
– In this example, block search trees are already 10 times
more efficient when balanced, 2000 times when
unbalanced
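The key counts and heights on the last two slides follow from simple arithmetic, sketched below. The function names are assumptions, and the sketch assumes that heights count levels and that every node of a perfect tree has q − 1 keys and q children.

```python
def perfect_keys(levels, q):
    """Keys in a perfect tree: q**level nodes per level, q - 1 keys per node."""
    nodes = sum(q ** level for level in range(levels))
    return nodes * (q - 1)               # equals q**levels - 1

def degenerated_keys(levels, q):
    """Keys in a chain of full nodes, one node per level."""
    return levels * (q - 1)

def balanced_levels(n, q):
    """Smallest number of levels whose perfect tree can hold n keys."""
    levels = 0
    while q ** levels - 1 < n:
        levels += 1
    return levels

def degenerated_levels(n, q):
    """Levels of a chain of full nodes holding n keys (ceiling division)."""
    return -(-n // (q - 1))

n = 1_000_000_000
print(perfect_keys(3, 2048), degenerated_keys(3, 2048))       # 8589934591 6141
print(perfect_keys(3, 2), degenerated_keys(3, 2))             # 7 3
print(balanced_levels(n, 2048), degenerated_levels(n, 2048))  # 3 488520
print(balanced_levels(n, 2), degenerated_levels(n, 2))        # 30 1000000000
```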
4.4 Block Search Trees
• Summary Block Search Trees
– Data structure optimized for disk storage
– Is very efficient in the average case
• O(log n) for all operations
• Even better
– Average number of node accesses to locate a key is log_{fan-out} n
» Fan-out usually in the order of several thousands
» Binary tree averages only to log2 n
– Accessing a node is expensive on disks, so this is a huge improvement
– Can be very inefficient for degenerated cases
• O(n) for all operations
• Better than binary trees, but still bad
4.4 B-Trees
• B-Trees are specialized Block Search Trees for
disk-based indexing
– Invented by Rudolf Bayer in 1971
– Keys may be non-unique
– Tree is self-balancing
• No degenerated cases anymore
EN 14.3.2
4.4 B-Trees
• Basic structure of a B-tree node
– Nodes contain key values and respective data
(block) pointers to the actual data records
– Additionally, there are node pointers for the left,
resp. right interval around a key value
[Figure: B-tree node layout. A tree node holds key values, each with a data pointer, and node pointers between them]
4.4 B-Trees
• B-Trees as Primary Index
[Figure: B-tree over the keys 1 to 9; the data pointer of each key references one of the records below]
1, Adams, $ 887,00
2, Bertram, $ 19,99
3, Behaim, $ 167,00
4, Cesar, $ 1866,00
5, Miller, $ 179,99
6, Naders, $ 682,56
7, Ruth, $ 8642,78
8, Smith, $ 675,99
9, Tarrens, $ 99,00
EN 14.3.2
4.4 B-Trees
• All base operations similar to Block Search Tree with small
changes
– Guaranteed fill degree
– Self-Balancing
• Each node contains between L(ower) and U(pper) links
– Usually 2 * L = U
– Nodes are split during insertion as soon as they would contain
more than U−1 keys
– Nodes are merged during deletion as soon as they contain
fewer than L−1 keys
– If a complete node is created or deleted, use local rebalancing
to re-balance the tree
• Local rebuilding for disk-based storage, rotations for memory-based
storage
EN 14.3.2
4.4 B-Trees
• All insertions start at the leaf nodes
– Search the tree to find leaf node where new element should be
added
– If the leaf node contains fewer than the maximum legal number of
elements (if |leaf node| < U − 1)
• Insert the new element in the node and restore order
– Otherwise the leaf node is split into two nodes (node split)
• The median is chosen from among the leaf's elements and the new element
• Values less than the median are put in the new left node and values greater than
the median are put in the new right node, with the median acting as a separation
value
• That separation value is added to the node's parent, which may also cause it to
be split
• If the splitting goes all the way up to the root, it creates a new root with a
single separator value and two children
– Remember: the lower bound on the size of internal nodes does not apply to the root
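The node split at the heart of B-tree insertion can be sketched on a single leaf. The function `insert_into_leaf`, its return convention, and the parameter `max_keys` are assumptions for illustration; propagating the separator to the parent is left out.

```python
from bisect import insort

def insert_into_leaf(keys, new_key, max_keys):
    """Insert new_key into a sorted leaf.

    Returns (left, separator, right). Without a split, separator and right
    are None and left is the updated leaf.
    """
    merged = list(keys)
    insort(merged, new_key)              # insert and restore order
    if len(merged) <= max_keys:
        return merged, None, None
    mid = len(merged) // 2               # median of old elements plus new element
    return merged[:mid], merged[mid], merged[mid + 1:]

print(insert_into_leaf([10, 20], 15, max_keys=4))
# ([10, 15, 20], None, None)
print(insert_into_leaf([10, 20, 30, 40], 25, max_keys=4))
# ([10, 20], 25, [30, 40])
```

In the second call the median 25 becomes the separation value that would be handed to the parent node.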
4.4 B-Trees
• Deleting elements is problematic, because node
sizes can fall below the minimum number of
elements (i.e. |new node size| < L − 1)
– An element to be deleted in an internal node may be a
separator for its child nodes
• Deletion from a leaf node
– Search for the value to delete
– If the value is in a leaf node, it can simply be deleted from
the node
• Test if node has too few elements; in that case the tree has to be
rebalanced
4.4 B-Trees
• Rebalancing after deletion
– If some leaf node is under the minimum size,
some elements must be redistributed from its
siblings to bring all children nodes again up to the
minimum (stealing)
• If all siblings have only minimum size, the parent node is
affected and has to hand over an element
• If the parent then falls under the minimum size, the
redistribution must be applied iteratively up the tree
• Since the minimum element count does not apply to the
root, making the root the only deficient node is not a
problem
4.4 B-Trees
• The rebalancing strategy is to find a sibling of
the deficient node which has more than the
minimum number of elements (for stealing)
– Choose a new separator
• Move it to the parent node and redistribute the values in
both original nodes to the new left and right children
– If the sibling node to the right of the deficient node
has only the minimum number of elements, examine
the sibling node to the left
4.4 B-Trees
– If both siblings have only the minimum number of
elements:
• create a new node with all the elements from the deficient
node & all the elements from one of its siblings & the
separator in the parent between the two combined sibling nodes
– Remove the separator from the parent, and replace the
two children it separated with the combined node.
– If that brings the number of elements in the parent under
the minimum, repeat these steps with that deficient node,
unless it is the root, since the root may be deficient
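The rebalancing steps after a deletion (steal from a sibling, otherwise join two nodes) can be sketched for one parent and its leaf children. The function `rebalance`, the plain-list representation, and the example keys are assumptions; the sketch handles one level only and does not repeat the procedure up the tree.

```python
def rebalance(parent_keys, children, i, min_keys):
    """Repair the deficient leaf children[i] in place and report the action taken."""
    node = children[i]
    if i + 1 < len(children) and len(children[i + 1]) > min_keys:
        right = children[i + 1]
        node.append(parent_keys[i])          # old separator moves down
        parent_keys[i] = right.pop(0)        # smallest key of the sibling moves up
        return "stole from right"
    if i > 0 and len(children[i - 1]) > min_keys:
        left = children[i - 1]
        node.insert(0, parent_keys[i - 1])   # old separator moves down
        parent_keys[i - 1] = left.pop()      # largest key of the sibling moves up
        return "stole from left"
    # Both siblings are at the minimum: join two nodes and their separator.
    j = i if i + 1 < len(children) else i - 1
    children[j:j + 2] = [children[j] + [parent_keys[j]] + children[j + 1]]
    del parent_keys[j]
    return "merged"

parent, leaves = [20, 40], [[5, 10], [25], [45, 50, 55]]
print(rebalance(parent, leaves, 1, min_keys=2), parent, leaves)
# stole from right [20, 45] [[5, 10], [25, 40], [50, 55]]

parent2, leaves2 = [20, 40], [[5, 10], [25], [45, 50]]
print(rebalance(parent2, leaves2, 1, min_keys=2), parent2, leaves2)
# merged [20] [[5, 10], [25, 40, 45, 50]]
```

After a merge the parent has lost one separator, so in a full implementation the same check would be applied to the parent next.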
4.4 B-Trees
• Example: Steal Keys from Siblings (L = 3)
4.4 B-Trees
• Example: Join Child Nodes (L = 3)
4.4 B-Trees
• Each element in an internal node acts as a separation
value for two subtrees.
• When such an element is deleted, there are two cases:
– Both of the two child nodes to the left and right of the
deleted element have the minimum number of elements (L−1)
and can then be joined into a legal single node with (2L−2)
elements
– One of the two child nodes contains more than the
minimum number of elements. Then a new separator for those
subtrees must be found. There are two possible choices:
• The largest element in the left subtree is the largest element which is
still less than the separator
• The smallest element in the right subtree is the smallest element which
is still greater than the separator
4.4 B-Trees
• Deletion from an internal node
– If the value is in an internal node, choose a new separator,
remove it from the leaf node it is in, and replace the
element to be deleted with the new separator
– This removes an element from a child node, so the
deletion is passed down the tree iteratively
• If the child is a leaf node, the leaf node deletion procedure applies
4.4 B-Trees
• Example: Build a B-Tree
4.4 B-Trees
• Summary
– Very efficient data structure for disk storage
• O(log n) for all operations
• Even better
– Guaranteed maximum node-accesses to locate a key is log_{fan-out}((n+1)/2)
– Balanced binary tree guarantees only ⌈log2(n)⌉
– Accessing a node is expensive on disks → huge improvement
– No degenerated cases
• Self-Balancing rarely necessary as most updates affect just
one node
• Wasted space decreased due to guaranteed minimal fill
factor
EN 14.3.2
4.4 B*Trees
• The B*Tree is a constrained B-Tree
– All non-root nodes need to be at least 2/3 full
– Implemented in various file systems
• HFS
• Reiser4
– Used to be quite popular, but lost its importance…
EN 14.3.3
4.4 B+Trees
• The B+Tree is an optimization of the B-Tree
– Improved traversal performance
– Increased search efficiency
– Increased memory efficiency
• B+Tree uses different nodes for leaf nodes and
internal nodes
– Internal Nodes: Only unique keys and node links
• No data pointers!
– Leaf Nodes: Replicated keys with data pointer
• Data pointers only here
[Figure: internal node layout (node pointers and key values) and leaf node layout (key values, each with a data pointer)]
EN 14.3.3
4.4 B+Trees
• Internal Nodes are used for search guidance
– A block can contain more keys → higher fan-out
• Leaves just contain data links
– All leaves are linked to each other in-order for
increased traversal performance
[Figure: internal search nodes on top, linked data (leaf) nodes at the bottom]
EN 14.3.3
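The benefit of linking the leaves shows up in range queries: once the first leaf is found, the scan follows the leaf chain and never returns to the internal nodes. The sketch below assumes a hypothetical `Leaf` class and starts directly at the first leaf instead of descending through internal nodes.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys          # sorted keys (data pointers omitted)
        self.next = next_leaf     # link to the next leaf in key order

def range_scan(start_leaf, low, high):
    """Collect all keys k with low <= k <= high by walking the leaf chain."""
    result, leaf = [], start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result     # keys are ordered, nothing more can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

third = Leaf([34, 38, 55])
second = Leaf([15, 17, 20], third)
first = Leaf([5, 10, 13], second)
print(range_scan(first, 13, 34))   # [13, 15, 17, 20, 34]
```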
4.4 B+Trees
• Summary
– B+ Tree is THE super index structure for disk-based
databases
– Improved over B-Tree
• Improved traversal performance
• Increased search efficiency
• Increased memory efficiency
EN 14.3.3
4.5 IMDBs
• Observation
– Loading data from the hard disk is a major
bottleneck
– Available main memory still doubles every 18 months
• Moore’s Law
• Idea
– Store all data in fast main memory!
• Solutions
– Use “traditional” DBMS with huge buffer pool (block
cache)
• DBMS are usually optimized for sequential disk access
– Design special In-Memory Database Systems
• Or MMDB (Main Memory Database)
4.5 IMDBs
• Why do we need in-memory databases?
– Embedded Systems
• Mobile Phones
• PDAs
• Sensors
• Diskless Computing Devices
• …
– Ultra-High-Performance (Real Time) Scenarios
• Network Applications
• Telecommunication Applications
• High-Volume Trading
• …
4.5 IMDBs
• Why should IMDB’s be different?
• Traditional DBMS do also work in-memory – but
they waste potential
– Random access has nearly no penalty compared to
sequential access
• Optimizing for linear storage and block read/write
unnecessary
Type | Media | Size | Random Acc. Speed | Transfer Speed | Characteristics | Price | Price/GB
Pri | DDR3-Ram (Corsair 1600C7DHX) | 2 GiB | 0.004 ms | 8000 MB/sec | Vol, Dyn, RA, OL | €38 | €19
Sec | Harddrive Magnetic (Seagate ST32000641AS) | 2000 GB | < 8.5 ms | 138 MB/sec | Stat, RA, OL | €143 | €0.07
4.5 IMDBs
• Storing a DB in main memory also has problems
– Main memory usually smaller and more expensive
– ACID support
• IMDBs support atomicity, consistency and isolation
• Problem: main memory is not persistent
– What happens in case of power failure?
– How to ensure the durability requirement of DBs?
4.5 IMDBs - Durability
• Snapshot Files / Checkpoint Images
– Record state of DB at given point in time
– Done periodically or in case of controlled shutdown
– Only partial durability
• Transaction Logging
– History of actions executed by the DBMS
– File of changes done in the database stored in stable
storage
– If DB has not been shut down properly (respectively is
in inconsistent state) the DBMS reviews logs for
uncommitted transactions and rolls back changes
4.5 IMDBs - Durability
• Non-volatile random access memory (NVRAM)
– Static RAM backed up with battery power (battery
RAM)
– Or electrically erasable programmable ROM
(EEPROM)
4.5 IMDBs - Indexes
• IMDB Index Structures
– B-Trees are great, but they are shallow and bushy
which is unnecessary in main memory
• Can save some performance there
– Hash Indexes are very suitable in main memory for
unsorted data
• Especially bucket chained hashing is very efficient
– For sorted data: Use the T-Tree instead of B-Tree
• Specialized tree for main memory databases
• Blend between AVL-Tree and B-Tree
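Bucket chained hashing can be illustrated with a minimal sketch. The class `ChainedBucketHash` and its methods are hypothetical names chosen for this example; it assumes a fixed number of buckets chosen at creation time and uses Python lists as chains.

```python
class ChainedBucketHash:
    """Minimal hash index with one chain (list) per bucket (illustrative)."""

    def __init__(self, num_buckets=8):
        # The bucket count is fixed at initialization in this sketch.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        # The hash value selects the bucket; colliding keys share its chain.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, record_ref):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, record_ref)  # overwrite existing entry
                return
        chain.append((key, record_ref))

    def get(self, key):
        for existing_key, record_ref in self._chain(key):
            if existing_key == key:
                return record_ref
        return None  # key not present


index = ChainedBucketHash(num_buckets=4)
for k in range(10):
    index.put(k, f"row-{k}")
print(index.get(7))
```

A lookup touches exactly one bucket and then scans its chain, so it stays fast as long as the chains are short. The keys are not stored in sorted order, which matches the use for unsorted data.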
4.5 T-Tree
• T-Tree design considerations
– I/O access is cheap in main memory
– Expensive resources are computation time
and memory space
[Figure: T-Tree node with parent link p, data links d1, d2, …, dm-1, dm, left child link l and right child link r]
• Properties
– T-Tree is a self-balancing binary tree (AVL algorithm)
– T-Tree nodes contain only links
– Each node links to m data records (d1 … dm)
• Data entries are ordered, smallest left, biggest right
• All nodes contain a maximum of cmax entries
• Each internal node contains cmin to cmax entries (usually cmax − cmin ≤ 2)
– Each node has a link to its parent
– Each node has at most a left and a right subtree
• Left subtree contains only entries smaller than the minimal node entry
• Right subtree contains only entries bigger than maximal node entry
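The node layout and the search path implied by these properties can be sketched as follows. This is a simplified illustration: `TNode` and `t_search` are hypothetical names, plain integer keys stand in for the links to data records, nodes are assumed to be non-empty, and insertion and AVL rebalancing are left out.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TNode:
    keys: List[int] = field(default_factory=list)  # sorted, stand in for d1 … dm
    left: Optional["TNode"] = None
    right: Optional["TNode"] = None
    parent: Optional["TNode"] = None


def t_search(node: Optional[TNode], key: int) -> bool:
    """Descend with two comparisons per node, then bisect in the bounding node."""
    while node is not None:
        if key < node.keys[0]:        # smaller than the minimal node entry
            node = node.left
        elif key > node.keys[-1]:     # bigger than the maximal node entry
            node = node.right
        else:
            # Bounding node found: the key is either here or not in the tree.
            i = bisect.bisect_left(node.keys, key)
            return node.keys[i] == key
    return False


# Hand-built example tree (no balancing logic involved).
root = TNode(keys=[40, 50, 60])
root.left = TNode(keys=[10, 20, 30], parent=root)
root.right = TNode(keys=[70, 80, 90], parent=root)

print(t_search(root, 20), t_search(root, 55))
```

On the way down, only the minimal and maximal entry of each node are compared; a search inside a node happens once, in the node whose range bounds the key.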
4.5 IMDB Indexes
• How do main memory index structures compare?
4.5 IMDB Indexes
• Why not always use Chained Bucket Hashing?
– No range queries
– Storage overhead
– Suboptimal if the amount of data is unknown during initialization
• Why do T-Tree and AVL-Tree perform better than B-Tree and ordered array for search?
– Bisection search within a B-Tree node/array needs to compute the
position of the next comparison
• AVL-Tree and T-Tree need only 2 comparisons in each node
• Why does T-Tree perform better than AVL-Tree for updates?
– Due to larger nodes, many updates do not require
rebalancing
• Why does the ordered array perform poorly for updates?
– Each insert or delete has to shift all following elements
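The update cost of the ordered array can be made concrete with a short sketch. It uses Python's standard `bisect` module to find the insert position; the helper `insert_sorted` is a hypothetical name, and the returned count simply reports how many elements had to move to make room.

```python
import bisect


def insert_sorted(arr, key):
    """Insert key into the sorted list arr; return how many elements were shifted."""
    pos = bisect.bisect_left(arr, key)  # bisection search: O(log n) comparisons
    arr.insert(pos, key)                # all elements from pos onward move right
    return len(arr) - 1 - pos


keys = [10, 20, 30, 40]
print(insert_sorted(keys, 5))    # new smallest key: every element moves
print(insert_sorted(keys, 50))   # new largest key: nothing moves
print(keys)
```

Finding the position is cheap, but inserting near the front moves almost the whole array, so the cost per update grows linearly with the number of entries.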
4.5 References and Timeline
– AVL-Tree
• G. Adelson-Velskii, E. M. Landis: “An algorithm for the organization of
information”. Proceedings of the USSR Academy of Sciences 146: 263–266 (Russian), 1962
• English translation by M. J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962
– B-Trees
• R. Bayer, E. M. McCreight: “Organization and Maintenance of Large
Ordered Indexes”. Acta Informatica 1, 173–189, 1972
– T-Trees
• T. J. Lehman, M. J. Carey: “A Study of Index Structures for Main
Memory Database Management Systems”, 12th Int. Conf. on Very Large
Data Bases, Kyoto, August 1986
– Scapegoat Trees
• I. Galperin, R. L. Rivest: “Scapegoat trees”, ACM-SIAM Symposium on
Discrete Algorithms, Austin, Texas, US, 1993
4 Storage
• Tree data structures are good index structures
– O(log n) performance on average
– But they may degenerate → O(n)
• Balancing necessary!
– Block structure of hard discs
must be considered
• → Block trees
– B-Tree
• Self-balancing block tree with fill-guarantees
– B+-Tree
• Special inner nodes without data pointers
• Leaf nodes optimized for linear traversal
5 Outlook
• The Query Processor
– How do DBMS actually answer queries?
– Query Parsing/Translation
– Query Optimization
– Query Execution
– Implementation of Joins