1431227-3
File Organization and Processing
Week 3
2-3 Tree
2-3-4 Tree
2-3 Tree
Outline
• Balanced Search Trees
• 2-3 Trees
• 2-3-4 Trees
Why care about advanced implementations?
Same entries, different insertion sequence:
• Not good! We would like to keep the tree balanced.
2-3 Trees
Features
• each internal node has either 2 or 3 children
• all leaves are at the same level
2-3 Trees with Ordered Nodes
2-node
3-node
• leaf node can be either a 2-node or a 3-node
Example of 2-3 Tree
What did we gain?
What is the time efficiency of searching for an item?
Gain: Ease of Keeping the Tree Balanced
[Figure: a binary search tree and a 2-3 tree, both after inserting items 39, 38, ... 32]
Inserting Items
Insert 39
Inserting Items
Insert 38
• insert in leaf
• divide leaf and move middle value up to parent
• result
Inserting Items
Insert 37
Inserting Items
Insert 36
• insert in leaf
• divide leaf and move middle value up to parent
• overcrowded node
Inserting Items
... still inserting 36
• divide overcrowded node
• move middle value up to parent
• attach children to smallest and largest
• result
Inserting Items
After Insertion of 35, 34, 33
Inserting so far
Inserting Items
How do we insert 32?
Inserting Items
• creating a new root if necessary
• tree grows at the root
Inserting Items
Final Result
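The insertion steps on the preceding slides (insert in leaf, divide an overcrowded node, move the middle value up, grow at the root) can be sketched in Python. This is a minimal sketch, not the course's implementation: the names Node23, insert, height and inorder are hypothetical, duplicate keys are assumed not to occur, and the tree here starts empty, whereas the slides insert into an existing tree.

```python
class Node23:
    """A 2-3 tree node: 1 or 2 sorted keys; children is empty for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def _insert(node, key):
    """Insert key below node. Return None, or (left, middle, right) if node was divided."""
    if not node.children:
        # Leaf: put the key in sorted position (the leaf may now be overcrowded).
        node.keys = sorted(node.keys + [key])
    else:
        i = sum(k < key for k in node.keys)      # index of the child to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        left, middle, right = split              # child was divided: take its middle value
        node.keys.insert(i, middle)
        node.children[i:i + 1] = [left, right]
    if len(node.keys) <= 2:
        return None
    # Overcrowded node (3 keys): divide it and hand the middle value to the parent.
    # An internal node has 4 children here; smallest and largest each adopt two.
    left = Node23(node.keys[:1], node.children[:2])
    right = Node23(node.keys[2:], node.children[2:])
    return left, node.keys[1], right


def insert(root, key):
    """Return the (possibly new) root after inserting key."""
    if root is None:
        return Node23([key])
    split = _insert(root, key)
    if split is None:
        return root
    left, middle, right = split                  # the tree grows at the root
    return Node23([middle], [left, right])


def height(node):
    """Number of edges from node down to a leaf (all leaves are at the same level)."""
    return 0 if not node.children else 1 + height(node.children[0])


def inorder(node):
    """Return all keys in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = None
for k in range(39, 31, -1):                      # 39, 38, ..., 32 as on the slides
    root = insert(root, k)
print(inorder(root), height(root))               # eight keys in order, height 2
```

Inserting the same descending sequence into a plain binary search tree would produce a chain of height 7, which is the imbalance the 2-3 tree avoids.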
Deleting Items
Delete 70
70
80
Deleting Items
Deleting 70: swap 70 with inorder successor (80)
Deleting Items
Deleting 70: ... get rid of 70
Deleting Items
Result
Deleting Items
Delete 100
Deleting Items
Deleting 100
Deleting Items
Result
Deleting Items
Delete 80
Deleting Items
Deleting 80 ...
Deleting Items
Deleting 80 ...
Deleting Items
Deleting 80 ...
Deleting Items
Final Result
Comparison with binary search tree
Deletion Algorithm I
Deleting item I:
1. Locate node n, which contains item I.
2. If node n is not a leaf, swap I with its inorder successor.
• Deletion therefore always begins at a leaf.
3. If leaf node n contains another item, just delete item I.
Else:
• try to redistribute items from siblings (see next slide)
• if that is not possible, merge nodes (see next slide)
Deletion Algorithm II
Redistribution
A sibling has 2 items:
• redistribute items between siblings and parent
Merging
No sibling has 2 items:
• merge node
• move item from parent to sibling
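The two leaf-level cases above (redistribution and merging) can be sketched as follows. This is only an illustration under simplifying assumptions: the parent is represented by a list of its keys plus a list of its leaf children (each leaf a list of keys), the function name fix_empty_leaf is hypothetical, only adjacent siblings are considered, and the recursive step for a parent left without an item is not shown.

```python
def fix_empty_leaf(parent_keys, leaves, i):
    """Repair the empty leaf leaves[i] in place.

    parent_keys[j] separates leaves[j] and leaves[j + 1].
    After a merge, parent_keys may become empty; the caller would then
    have to repair the parent in the same way (not shown here).
    """
    left = i - 1 if i > 0 else None
    right = i + 1 if i + 1 < len(leaves) else None

    if left is not None and len(leaves[left]) == 2:
        # Redistribution from the left sibling:
        # the parent item moves down, the sibling's largest item moves up.
        leaves[i].append(parent_keys[left])
        parent_keys[left] = leaves[left].pop()
    elif right is not None and len(leaves[right]) == 2:
        # Redistribution from the right sibling:
        # the parent item moves down, the sibling's smallest item moves up.
        leaves[i].append(parent_keys[i])
        parent_keys[i] = leaves[right].pop(0)
    elif left is not None:
        # Merging: move the separating item from the parent into the left
        # sibling and drop the empty leaf.
        leaves[left].append(parent_keys.pop(left))
        del leaves[i]
    else:
        # Merging with the right sibling (the empty leaf is the leftmost child).
        leaves[right].insert(0, parent_keys.pop(i))
        del leaves[i]


# Redistribution: the right sibling has 2 items (values are made up for illustration).
keys, leaves = [50], [[], [70, 90]]
fix_empty_leaf(keys, leaves, 0)
print(keys, leaves)   # [70] [[50], [90]]

# Merging: no sibling has 2 items.
keys, leaves = [30, 60], [[10], [], [80]]
fix_empty_leaf(keys, leaves, 1)
print(keys, leaves)   # [60] [[10, 30], [80]]
```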
Deletion Algorithm III
Redistribution
Internal node n has no item left:
• redistribute
Merging
Redistribution not possible:
• merge node
• move item from parent to sibling
• adopt child of n
If n's parent ends up without an item, apply the process recursively.
Deletion Algorithm IV
If the merging process reaches the root and the root is left without an item:
• delete the root
Operations of 2-3 Trees
Search, insertion, and deletion all have time complexity O(log n)
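A short calculation shows where the logarithm comes from. A 2-3 tree with L levels holds at least 2^L - 1 items (every node a 2-node) and at most 3^L - 1 items (every node a 3-node), so the number of levels for n items lies between log3(n + 1) and log2(n + 1). The function name level_bounds below is hypothetical; the sketch uses integer arithmetic to avoid floating-point rounding.

```python
def level_bounds(n):
    """Return (fewest, most) levels a 2-3 tree holding n >= 1 items can have."""
    # Fewest levels: every node is a 3-node, so L levels hold 3**L - 1 items.
    fewest = 1
    while 3 ** fewest - 1 < n:
        fewest += 1
    # Most levels: every node is a 2-node, so L levels need at least 2**L - 1 items.
    most = 1
    while 2 ** (most + 1) - 1 <= n:
        most += 1
    return fewest, most


print(level_bounds(8))          # (2, 3)
print(level_bounds(1_000_000))  # (13, 19)
```

Each operation visits one node per level and does a constant amount of work in each node, which gives the logarithmic bound.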
2-3-4 Tree
Introduction
• Multiway trees are trees whose nodes can have more than two
children. A 2-3-4 tree is a multiway tree with up to four
children and three data items per node.
• 2-3-4 Trees: Very nice features
– Are balanced, like red-black (RB) trees.
– Slightly less efficient than RB trees, but easier to program.
– Serve as an introduction to B-Trees.
• B-Trees: another kind of multiway tree, particularly
useful in organizing external storage.
– B-Trees can have dozens or hundreds of children.
Introduction to 2-3-4 Trees
• Nodes are drawn as lozenges.
• In a 2-3-4 tree, all leaf nodes are at the same level
(but data can appear in all nodes).
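[Figure: example 2-3-4 tree. Root: 50. Children of the root: 30 and 60/70/80. Leaves: 10/20 and 40 under 30; 55, 62/64/66, 75, and 83/86 under 60/70/80.]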
Introduction
• The 2, 3, and 4 in the name refer to how many links to
child nodes can potentially be contained in a given
node.
• For non-leaf nodes, three arrangements are possible:
– A node with only one data item always has two children
– A node with two data items always has three children
– A node with three data items always has four children.
Introduction
• A non-leaf node must always have one more
child than it has data items (see below).
– Equivalently, if the number of child links is L and
the number of data items is D, then L = D + 1.
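The relationship L = D + 1, together with the rule that all leaves sit at the same level, can be checked mechanically. The following is a minimal sketch; the class name Node234 and the function leaf_depth are hypothetical, and the tree built at the end is the example tree drawn on these slides.

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def leaf_depth(node):
    """Return the common depth of all leaves below node.

    Raises ValueError if a node breaks one of the shape rules.
    """
    if not 1 <= len(node.keys) <= 3:
        raise ValueError("a node must hold one, two, or three data items")
    if not node.children:                       # a leaf has no child links
        return 0
    if len(node.children) != len(node.keys) + 1:
        raise ValueError("non-leaf node violates L = D + 1")
    depths = {leaf_depth(child) for child in node.children}
    if len(depths) != 1:
        raise ValueError("leaves are not all at the same level")
    return 1 + depths.pop()


example = Node234([50], [
    Node234([30], [Node234([10, 20]), Node234([40])]),
    Node234([60, 70, 80], [Node234([55]), Node234([62, 64, 66]),
                           Node234([75]), Node234([83, 86])]),
])
print(leaf_depth(example))   # 2
```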
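[Figure: example 2-3-4 tree. Root: 50. Children of the root: 30 and 60/70/80. Leaves: 10/20 and 40 under 30; 55, 62/64/66, 75, and 83/86 under 60/70/80.]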
More Introductory stuff
• This critical relationship determines the structure of 2-3-4
trees.
• A leaf node has no children, but can still contain one, two, or
three data items; it cannot be empty.
• Because a 2-3-4 tree can have nodes with up to four
children, it’s called a multiway tree of order 4.
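[Figure: example 2-3-4 tree. Root: 50. Children of the root: 30 and 60/70/80. Leaves: 10/20 and 40 under 30; 55, 62/64/66, 75, and 83/86 under 60/70/80.]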
More Introductory stuff
• Binary and RB trees may be referred to as multiway trees of order 2: each
node can have up to two children.
• But note: in a binary tree, a node may have up to two child links (one or
both may be null).
• In a 2-3-4 tree, nodes with a single link are NOT permitted; a node with
one data item must have two links (unless it is a leaf); a node with two
data items must have three children; a node with three data items must
have four children. (You will see this more clearly once we talk about how
they are actually built.)
• These numbers are important. If a node has one data item, then it has
a link to lower-level nodes whose values are less than the value of this
item and a link to lower-level nodes whose values are greater than this
value. (Equal keys are not permitted.)
• A node with two links is called a 2-node; a node with three links is called a
3-node; with four links, a 4-node. (There is no such thing as a 1-node.)
More Introductory Stuff
Do you see any 2-nodes? 3-nodes? 4-nodes?
Do you see: a node with one data item that has two links?
a node with two data items having three children;
a node with three data items having four children?
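[Figure: the example 2-3-4 tree with node types labeled. Root 50 is a 2-node; its children are 30 (a 2-node) and 60/70/80 (a 4-node). Leaves: 10/20 and 40 under 30; 55, 62/64/66, 75, and 83/86 under 60/70/80.]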
Nodes in a 2-3-4 Tree
2-3-4 Tree Organization
• The organization is very different from that of a binary tree.
• First, we number the data items in a node 0, 1, 2
and the child links 0, 1, 2, 3. Very important.
• Data items are always in ascending order, left to right in
a node.
• The relationship between data items and child links
is critical:
More on 2-3-4 Tree’s Organization
[Figure: a 4-node with data items A, B, C and child links 0, 1, 2, 3]
• Child 0: nodes with keys < A
• Child 1: nodes with keys between A and B
• Child 2: nodes with keys between B and C
• Child 3: nodes with keys > C
Note: equal keys are not permitted; leaves are all on the same level; upper-level nodes are often not
full; the tree is balanced.
Its construction always maintains its balance, even if you add additional data items (ahead).
• All keys in the subtree rooted at child 0 are less than key 0.
• All keys in the subtree rooted at child 1 are greater than key 0 but less than key 1.
• All keys in the subtree rooted at child 2 are greater than key 1 but less than key 2.
• All keys in the subtree rooted at child 3 are greater than key 2.
Searching a 2-3-4 Tree
• A very nice feature of these trees.
• You have a search key; go to the root.
• If the key is in the node,
– done.
• Else
– Select the link that leads to the subtree with the appropriate
range of values.
– If you don’t find your target there, repeat with the next child down. (Notice that
items are sequential: very important later.)
– And so on. If you reach a leaf without a hit, the item is ‘not found.’
Try it: search for 64, 40, 65
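[Figure: the example 2-3-4 tree with node types labeled. Root 50 is a 2-node; its children are 30 (a 2-node) and 60/70/80 (a 4-node). Leaves: 10/20 and 40 under 30; 55, 62/64/66, 75, and 83/86 under 60/70/80.]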
Start at the root.
You search the root, but don't find the item. Because 64 is larger than 50, you
go to child 1, which we will represent as 60/70/80. (Remember that child 1 is on the
right, because the numbering of children and links starts at 0 on the left.) You don't
find the data item in this node either, so you must go down to the next child. Here, because
64 is greater than 60 but less than 70, you go again to child 1. This time you find the
specified item in the 62/64/66 node.
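The search just traced can be sketched in a few lines of Python. The names Node234 and search are hypothetical, and the function returns the nodes visited along the way so the walk-through can be compared with the result.

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def search(root, key):
    """Return (found, path), where path lists the keys of each node visited."""
    path = []
    node = root
    while True:
        path.append(node.keys)
        if key in node.keys:                     # hit: done
            return True, path
        if not node.children:                    # reached a leaf: not found
            return False, path
        # Child i holds the keys that lie between data item i - 1 and data item i.
        i = sum(k < key for k in node.keys)
        node = node.children[i]


example = Node234([50], [
    Node234([30], [Node234([10, 20]), Node234([40])]),
    Node234([60, 70, 80], [Node234([55]), Node234([62, 64, 66]),
                           Node234([75]), Node234([83, 86])]),
])

print(search(example, 64))  # (True, [[50], [60, 70, 80], [62, 64, 66]])
print(search(example, 40))  # (True, [[50], [30], [40]])
print(search(example, 65))  # (False, [[50], [60, 70, 80], [62, 64, 66]])
```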
Node Insertion
• Can be quite easy; sometimes very complex.
– Can use a top-down or a bottom-up approach.
• But the structure of the tree must be maintained at all
costs.
• Easy approach:
– Start by searching to find a spot.
• We like to insert at the leaf level. So,
– This may very likely involve moving data items
around to maintain the sequential order of the data
in a leaf.
– We will take the top-down approach.
Node Split
• We will use a top-down 2-3-4 tree.
• Full nodes are split on the way down, if
we encounter a full node in looking for the
insertion point.
• This approach keeps the tree balanced.
Node Split – Insertion
Upon encountering a full node (while searching for a
place to insert):
1. Split that node at that time.
2. Move the highest data item from the current (full) node
into a new node to the right.
3. Move the middle value of the node undergoing the split up
to the parent node. (We know we can do this because the parent
node was not full.)
4. Retain the lowest item in the node.
5. Note: the new node (to the right) has only one data item
(the highest value).
Node Split – Insertion
Upon encountering a full node:
6. The original (formerly full) node contains the lowest of
the three values.
7. The rightmost two children of the original full node are
disconnected and connected to the new sibling of the
original full node (to the right).
(They must be disconnected, since their parent data
has changed.)
They are hooked to the new sibling as its child 0 (keys lower than its
only data item) and child 1 (keys greater than its only data item).
8. Insert the new data item into the proper leaf node.
Please note: there can be multiple splits encountered
en route to finding the insertion point.
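The split steps listed above can be sketched as one small function. This is a sketch under assumptions, not the course's code: Node234 and split_child are hypothetical names, and the parent is assumed not to be full, as the slides require. The usage example rebuilds the "Insert 99" situation from the next slides; the node called other is a made-up stand-in for the part of the tree the slide elides.

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def split_child(parent, i):
    """Split the full node parent.children[i]; parent must not be full."""
    full = parent.children[i]
    assert len(full.keys) == 3 and len(parent.keys) < 3
    low, middle, high = full.keys
    # The highest item moves into a new node to the right, which also
    # adopts the two rightmost children (none if the full node is a leaf).
    sibling = Node234([high], full.children[2:])
    # The lowest item and the two leftmost children stay where they are.
    full.keys = [low]
    full.children = full.children[:2]
    # The middle item moves up into the parent, next to the new link.
    parent.keys.insert(i, middle)
    parent.children.insert(i + 1, sibling)


full = Node234([83, 92, 104], [Node234([74]), Node234([87, 89]),
                               Node234([97]), Node234([112])])
other = Node234([10])        # hypothetical stand-in for the elided "other stuff"
parent = Node234([62], [other, full])

split_child(parent, 1)
print(parent.keys)                                   # [62, 92]
print(parent.children[1].keys, parent.children[2].keys)  # [83] [104]
```

After the split, 99 would go into the leaf holding 97, which is now child 0 of the new node 104.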
Insert: Split is NOT the Root Node
Insert 99
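Want to add a data value of 99.
[Figure: before the split. Root: 62 (other subtrees not shown). Its child 83/92/104 is full and must be split; that node's four children are 74, 87/89, 97, and 112. 99 is to be inserted.]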
Case 1 Insert: Split is NOT the root node
Insert 99
1. 104 starts a new node.
2. 92 moves up.
3. 83 stays.
4. Two rightmost children of split node are reconnected to new node.
5. New data item moved in.
[Figure: after the split. Root: 62/92 (other subtrees not shown). 83 has children 74 and 87/89; the new node 104 has children 97/99 and 112.]
Splitting the Root
• Let’s label the 3 items say A, B and C respectively.
• A new node is created that becomes the new root and the parent of
the node being split.
• A second new node is created that becomes a sibling of the node
being split.
• Data item C is moved into the new sibling.
• Data item B is moved into the new root.
• Data item A remains where it is.
• The two rightmost children of the node being split are disconnected
from it and connected to the new right-hand node.
• This process creates a new root that's at a higher level than the old
one. Thus the overall height of the tree is increased by one.
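The root split described above can be sketched as follows. Node234 and split_root are hypothetical names, and the keys in the usage example are made up for illustration.

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def split_root(root):
    """Split a full root and return the new root, one level higher."""
    a, b, c = root.keys
    # C moves into a new sibling, which adopts the two rightmost children.
    sibling = Node234([c], root.children[2:])
    # A remains where it is, with the two leftmost children.
    root.keys = [a]
    root.children = root.children[:2]
    # B moves into a new root that becomes the parent of both nodes.
    return Node234([b], [root, sibling])


old_root = Node234([20, 40, 60], [Node234([10]), Node234([30]),
                                  Node234([50]), Node234([70])])
new_root = split_root(old_root)
print(new_root.keys)                              # [40]
print([c.keys for c in new_root.children])        # [[20], [60]]
```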
Root Node Split
Splitting on the Way Down
• Note: when we hit a node that must be split (on the way down),
we know that the parent, which receives the data value moved ‘up’,
was not full.
– It may be full ‘now,’ but it wasn’t on the way down.
• The algorithm is reasonably straightforward.
• Just remember:
– 1. You are splitting a 4-node. The node being split has three data
values. The data item on the right goes to a new node; the data item on the left
remains; the data item in the middle is promoted upward; the new data item is
inserted appropriately.
– 2. We do a node split any time we encounter a full node
while trying to insert a new data value.
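Putting the pieces together, a complete top-down insertion can be sketched as below. This is one possible reading of the procedure, with hypothetical names (Node234, insert) and the assumption that keys are distinct. The usage example inserts the sequence 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100 used on the following slides.

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def _split_child(parent, i):
    """Split the full node parent.children[i]; parent is known not to be full."""
    full = parent.children[i]
    low, middle, high = full.keys
    sibling = Node234([high], full.children[2:])   # right item + two rightmost children
    full.keys, full.children = [low], full.children[:2]
    parent.keys.insert(i, middle)                  # middle item is promoted
    parent.children.insert(i + 1, sibling)


def insert(root, key):
    """Insert key top-down and return the (possibly new) root."""
    if root is None:
        return Node234([key])
    if len(root.keys) == 3:                        # full root: tree grows by one level
        holder = Node234([], [root])               # temporary parent with no items yet
        _split_child(holder, 0)
        root = holder
    node = root
    while node.children:
        i = sum(k < key for k in node.keys)        # child whose range contains key
        if len(node.children[i].keys) == 3:        # full node met on the way down
            _split_child(node, i)
            if key > node.keys[i]:                 # choose between the two halves
                i += 1
        node = node.children[i]
    node.keys.append(key)                          # the leaf has room by construction
    node.keys.sort()
    return root


root = None
for k in [60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100]:
    root = insert(root, k)

print(root.keys)                                   # [50]
print([c.keys for c in root.children])             # [[30], [70]]
print([[g.keys for g in c.children] for c in root.children])
# [[[10, 15, 20], [40]], [[60], [80, 90, 100]]]
```

Because every full node is split before the search descends into it, the leaf that finally receives the new item always has room, and no split ever has to propagate back up.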
2-3-4 Tree: Insertion
Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100
2-3-4 Tree: Insertion
Inserting 60, 30, 10, 20 ...
... 50, 40 ...
2-3-4 Tree: Insertion
Inserting 50, 40 ...
... 70, ...
2-3-4 Tree: Insertion
Inserting 70 ...
... 80, 15 ...
2-3-4 Tree: Insertion
Inserting 80, 15 ...
... 90 ...
2-3-4 Tree: Insertion
Inserting 90 ...
... 100 ...
2-3-4 Tree: Insertion
Inserting 100 ...
2-3-4 Tree: Insertion Procedure
Splitting 4-nodes during Insertion
2-3-4 Tree: Insertion Procedure
Splitting a 4-node whose parent is a 2-node during insertion
2-3-4 Tree: Insertion Procedure
Splitting a 4-node whose parent is a 3-node during insertion
2-3-4 Trees and Red-Black Trees
2-3-4 Trees and Red-Black Trees
• As we shall see later, these two data structures
have very much in common.
• One of the main common features is that both provide
balanced trees.
• One can be converted into the other very easily.
• Even the operations applied to the two trees are
equivalent.
– We keep the 2-3-4 tree balanced via node splits; in RB
trees, we balance via color flips and rotations.
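The conversion mentioned above can be illustrated with the usual correspondence: a 2-node becomes a black node, a 3-node becomes a black node with one red child, and a 4-node becomes a black node with two red children. The sketch below assumes that mapping; the names Node234, RBNode, to_red_black and black_height are hypothetical, and the choice to hang the red child of a 3-node on the left is arbitrary (the right side would be equally valid).

```python
class Node234:
    """A 2-3-4 tree node: 1 to 3 sorted data items; no children for a leaf."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


class RBNode:
    def __init__(self, key, red, left=None, right=None):
        self.key, self.red, self.left, self.right = key, red, left, right


def to_red_black(node):
    """Convert a 2-3-4 subtree into an equivalent red-black subtree."""
    # Convert the children first; a leaf contributes null links instead.
    kids = [to_red_black(c) for c in node.children] or [None] * (len(node.keys) + 1)
    k = node.keys
    if len(k) == 1:      # 2-node: a single black node
        return RBNode(k[0], False, kids[0], kids[1])
    if len(k) == 2:      # 3-node: black node with one red child (left, by choice)
        return RBNode(k[1], False, RBNode(k[0], True, kids[0], kids[1]), kids[2])
    # 4-node: black middle item with two red children
    return RBNode(k[1], False,
                  RBNode(k[0], True, kids[0], kids[1]),
                  RBNode(k[2], True, kids[2], kids[3]))


def black_height(n):
    """Black nodes on every path from n to a null link (the null link counts as 1).

    Raises ValueError if a red node has a red child or the paths disagree.
    """
    if n is None:
        return 1
    if n.red and any(c is not None and c.red for c in (n.left, n.right)):
        raise ValueError("red node with a red child")
    lh, rh = black_height(n.left), black_height(n.right)
    if lh != rh:
        raise ValueError("unequal black heights")
    return lh + (0 if n.red else 1)


example = Node234([50], [
    Node234([30], [Node234([10, 20]), Node234([40])]),
    Node234([60, 70, 80], [Node234([55]), Node234([62, 64, 66]),
                           Node234([75]), Node234([83, 86])]),
])
rb = to_red_black(example)
print(rb.key, black_height(rb))   # 50 4
```

Each 2-3-4 node contributes exactly one black node, so the black height of the result equals the number of levels of the 2-3-4 tree (plus one for the null link in this counting).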
2-3-4 Trees and Red-Black Trees - more
• One thing to remember, though (we will see a lot
about this later), is that a red-black tree has nodes that
contain a single item; 2-3-4 trees have nodes
that can contain up to three data items and can have up to four
children.
– Thus, the number of levels of the two equivalent tree structures
is quite different, and
– the amount of data stored at each node is quite different.
– These factors affect search times and storage requirements.
– These things come to bear later!
Efficiency Considerations for 2-3-4 Trees
• Searching:
• RB trees: one node at each level must be visited.
• 2-3-4 trees: one node at each level must be visited here too, but
– there is more data per node / level, so
– fewer nodes are visited per search.
• Recognize that all data items at a node must be checked in a 2-3-4 tree,
• but this is very fast.
• The overall height of the RB tree is generally up to twice that of the equivalent 2-3-4 tree,
• but the nodes in the 2-3-4 tree are NOT always full.
• Yet overall speed is slightly better in 2-3-4 trees.
• Overall, for 2-3-4 trees, the increased number of items per node
(which increases the processing time at each node) tends to cancel
out the gain from the decreased height of the tree (fewer
node retrievals).
• So, the search times for a 2-3-4 tree and for a balanced binary tree
are approximately equal, and both are O(log n).
Efficiency Considerations for 2-3-4 Trees
• Storage
• 2-3-4 trees: a node can have up to three data items and up to four
references.
– The references can be an array or four specific variables.
– If not all of them are used, there can be considerable waste.
– In 2-3-4 trees, it is quite common to see many nodes not full.
• RB trees: balanced; contain few nodes with only one child.
Almost all references are used.
– Each node contains the maximum number of data items: one.
• RB trees make more efficient use of storage than 2-3-4 trees.