* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Nodes
Survey
Document related concepts
Transcript
1431227-3 File Organization and Processing Week 3 2-3 Tree 2-3-4 Tree 2-3 Tree Outline Balanced Search Trees • 2-3 Trees • 2-3-4 Trees Why care about advanced implementations? Same entries, different insertion sequence: Not good! Would like to keep tree balanced. 2-3 Trees Features each internal node has either 2 or 3 children all leaves are at the same level 2-3 Trees with Ordered Nodes 2-node 3-node • leaf node can be either a 2-node or a 3-node Example of 2-3 Tree What did we gain? What is the time efficiency of searching for an item? Gain: Ease of Keeping the Tree Balanced Binary Search Tree both trees after inserting items 39, 38, ... 32 2-3 Tree Inserting Items Insert 39 Inserting Items Insert 38 insert in leaf divide leaf and move middle value up to parent result Inserting Items Insert 37 Inserting Items Insert 36 insert in leaf divide leaf and move middle value up to parent overcrowded node Inserting Items ... still inserting 36 divide overcrowded node, move middle value up to parent, attach children to smallest and largest result Inserting Items After Insertion of 35, 34, 33 Inserting so far Inserting so far Inserting Items How do we insert 32? Inserting Items creating a new root if necessary tree grows at the root Inserting Items Final Result Deleting Items Delete 70 70 80 Deleting Items Deleting 70: swap 70 with inorder successor (80) Deleting Items Deleting 70: ... get rid of 70 Deleting Items Result Deleting Items Delete 100 Deleting Items Deleting 100 Deleting Items Result Deleting Items Delete 80 Deleting Items Deleting 80 ... Deleting Items Deleting 80 ... Deleting Items Deleting 80 ... Deleting Items Final Result comparison with binary search tree Deletion Algorithm I Deleting item I: 1. Locate node n, which contains item I 2. If node n is not a leaf swap I with inorder successor deletion always begins at a leaf 3. If leaf node n contains another item, just delete item I else try to redistribute nodes from siblings (see next slide) if not possible, merge node (see next slide) Deletion Algorithm II Redistribution A sibling has 2 items: redistribute item between siblings and parent Merging No sibling has 2 items: merge node move item from parent to sibling Deletion Algorithm III Redistribution Internal node n has no item left redistribute Merging Redistribution not possible: merge node move item from parent to sibling adopt child of n If n's parent ends up without item, apply process recursively Deletion Algorithm IV If merging process reaches the root and root is without item delete root Operations of 2-3 Trees all operations have time complexity of log n 2-3-4 Tree Introduction • Multi-way Trees are trees that can have up to four children and three data items per node. • 2-3-4 Trees: Very nice features – Are balanced, like RB Trees. – Slightly less efficient but easier to program. – Serve as an introduction to the understanding of BTrees!! • B-Trees: another kind of multi-way tree particularly useful in organizing external storage. – B-Trees can have dozens or hundreds of children. 39 Introduction to 2-3-4 Trees • Shape of nodes is a lozenge-shaped node. • In a 2-3-4 tree, all leaf nodes are at the same level. (but data can appear in all nodes) 50 30 10 20 40 60 55 62 64 66 70 80 75 83 86 40 Introduction • The 2, 3, and 4 in the name refer to how many links to child nodes can potentially be contained in a given node. • For non-leaf nodes, three arrangements are possible: – A node with only one data item always has two children – A node with two data items always has three children – A node with three data items always has four children. 41 Introduction • Non-leaf nodes must always have one more child than it has data items (see below); – Equivalently, if the number of child links is L and the number of data items is D, then L = D+1. 50 30 10 20 40 60 55 62 64 66 70 75 80 83 86 More Introductory stuff • This critical relationship determines the structure of 2-3-4 trees. • A leaf node has no children, but can still contain one, two, or three data items; cannot be empty. • Because a 2-3-4 tree can have nodes with up to four children, it’s called a multiway tree of order 4. 50 30 10 20 40 60 55 62 64 66 70 75 80 83 86 More Introductory stuff • Binary and RB trees may be referred to as multiway trees of order 2 - each node can have up to two children. • But note: in a binary tree, a node may have up to two child links (one or more may be null). • In a 2-3-4 tree, nodes with a single link are NOT permitted; a node with one data item must have two links (unless it is a leaf); nodes with two data items must have three children; nodes with three data items must have four children. (You will see this more clearly once we talk about how they are actually built.) • These numbers are important. If a node has one data item, then it also points to lower level nodes that have values less than the value of this item and a pointer to a node that has values greater than or equal to this value. • Nodes with two links is called a 2-node; a node with three links is called a 3-node; with four links, a 4-node. (no such thing as a 1-node). 44 More Introductory Stuff Do you see any 2-nodes? 3-nodes? 4-nodes? Do you see: a node with one data item that has two links? a node with two data items having three children; a node with three data items having four children? 50 2 node 4 node 30 10 20 40 2 node 55 60 62 64 66 70 75 80 83 86 Nodes in a 2-3-4 Tree 2-3-4 Tree Organization • Very different organization than for a binary tree. • First, we number the data items in a node 0,1,2 and the child links: 0,1,2,3. Very Important. • Data items are always ascending: left to right in a node. • Relationships between data items and child links is critical: 47 More on 2-3-4 Tree’s Organization A 0 1 Nodes w/key < A Nodes with key between A and <B B C 2 3 Nodes w/keys between B and < C Nodes w/keys > C See below: (Equal keys not permitted; leaves all on same level; upper level nodes often not full; tree balanced! Its construction always maintains its balance, even if you add additional data items. (ahead) • All children in the subtree rooted at child 0 have key values less than key 0. • All children in the subtree rooted at child 1 have key values greater than key 0 but less than key 1. • All children in the subtree rooted at child 2 have key values greater than key 1 but less than key 2. • All children in the subtree rooted at child 3 have key values greater than key 2. Searching a 2-3-4 Tree • A very nice feature of these trees. • You have a search key; Go to root. • If hit, – done. • Else – Select the link that leads to the subtree with the appropriate range of values. – If you don’t find your target here, go to next child. (notice items are sequential – VIP later) – etc. Perhaps data will be ‘not found.’ 49 Try it: search for 64, 40, 65 50 2 node 4 node 30 10 20 40 2 node 55 60 62 64 66 70 80 75 83 86 Start at the root. You search the root, but don't find the item. Because 64 is larger than 50, you go to child 1, which we will represent as 60/70/80. (Remember that child 1 is on the right, because the numbering of children and links starts at 0 on the left.) You don't find the data item in this node either, so you must go to the next child. Here, because 64 is greater than 60 but less than 70, you go again to child 1. This time you find the specified item in the 62/64/66 link. 50 Node Insertion • Can be quite easy; sometimes very complex. – Can do a top-down or a bottom-up approach… • But structure of tree must be maintained at all costs. • Easy Approach: – Start with searching to find a spot. • We like to insert at the leaf level. So, – This may very likely involve moving a data item around to maintain the sequential nature of the data in a leaf. – We will take the top down approach. 51 Node Split • We will use a top-down 2-3-4 tree. • Full nodes are split on the way down, if we encounter a full node in looking for the insertion point. • This approach keeps the tree balanced. 52 Node Split – Insertion Upon encountering a full node (searching for a place to insert…) 1. split that node at that time. 2. move highest data item from the current (full) node into new node to the right. 3. move middle value of node undergoing the split up to parent node (Know we can do all this because parent node was not full) 4. Retain lowest item in node. 5. Note: new node (to the right) only has one data item (the highest value) 53 Node Split – Insertion Upon encountering a full node: 6. Original (formerly full) node contains the lowest of the three values. 7. Rightmost two children of original full node are disconnected and connected to the new sibling of the original full node (to the right) (They must be disconnected, since their parent data is changed) Hooked to new sibling with links ‘lower than’ first data and >= first (and only) data item 7. Insert new data item into the proper leaf node. Please note: there can be multiple splits encountered en route to finding the insertion point. 54 Insert: Split is NOT the Root Node Insert 99 62 … other stuff 74 83 87 89 92 97 Want to add a data value of 99 Split this node… 99 to be inserted… 104 112 55 Case 1 Insert: Split is NOT the root node Insert 99 2. 92 moves up 62 92 3. 83 stays 1. 104 starts a new node … other stuff 83 74 87 89 104 97 99 112 4. Two rightmost children of split node are reconnected to new node. 5. New data item moved in. 56 Splitting the Root • Let’s label the 3 items say A, B and C respectively. • A new node is created that becomes the new root and the parent of the node being split. • A second new node is created that becomes a sibling of the node being split. • Data item C is moved into the new sibling. • Data item B is moved into the new root. • Data item A remains where it is. • The two rightmost children of the node being split are disconnected from it and connected to the new right-hand node. • This process creates a new root that's at a higher level than the old one. Thus the overall height of the tree is increased by one. 57 Root Node Split 58 Splitting on the Way Down • Note: once we hit a node that must be split (on the way down), we know that when we move a data value ‘up’ that ‘that’ node was not full. – May be full ‘now,’ but it wasn’t on way down. • Algorithm is reasonably straightforward. • Just remember: – 1. You are splitting a 4-node. Node being split has three data values. Data on the right goes to a new node. Data on the left remains; data in middle is promoted upward; new data item is inserted appropriately. – 2. We do a node split any time we encounter a full node and when we are trying to insert a new data value. 59 2-3-4 Tree: Insertion Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100 2-3-4 Tree: Insertion Inserting 60, 30, 10, 20 ... ... 50, 40 ... 2-3-4 Tree: Insertion Inserting 50, 40 ... ... 70, ... 2-3-4 Tree: Insertion Inserting 70 ... ... 80, 15 ... 2-3-4 Tree: Insertion Inserting 80, 15 ... ... 90 ... 2-3-4 Tree: Insertion Inserting 90 ... ... 100 ... 2-3-4 Tree: Insertion Inserting 100 ... 2-3-4 Tree: Insertion Procedure Splitting 4-nodes during Insertion 2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 2-node during insertion 2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 3-node during insertion 2-3-4 Trees and Red-Black Trees 70 2-3-4 Trees and Red-Black Trees • As we shall see later, these two data structures have very much in common. • One of the main common features is they provide for balanced trees • One can be converted into the other very easily. • Even the operations applied to the two trees are equivalent. – We keep the 2-3-4 tree balanced via node splits; in RB Trees, we balance via color flips and rotations. 71 2-3-4 Trees and Red-Black Trees - more • One thing to remember, though, (we will see a lot about this later) is that the Red Black tree has nodes that contain a single item; 2-3-4 trees contain nodes that can contain three data items and can have four children. – Thus, the number of levels of the two equivalent tree structures is quite different and – The amount of data stored at each node is quite different. – These factors affect search times and storage requirements. – These things come to bear later!! 72 Efficiency Considerations for 2-3-4 Trees • Searching: • RB Trees: one node at each level must be visited. • 2-3-4 Trees: one node must be visited here too, but – More data per node / level. – Searches are faster. • recognize all data items at node must be checked in a 2-3-4 tree, • but this is very fast. • Overall height of the RB Tree is generally twice 2-3-4 tree, • But all nodes in the 2-3-4 tree are NOT always full • Yet overall speed is slightly better in 2-3-4 trees. • Overall, for 2-3-4 trees, the increased number of items (which increases processing / search times) per node processing tends to cancel out the increases gained from the decreased height of the tree – fewer node retrievals. • So, the search times for a 2-3-4 tree and for a balanced binary tree are approximately equal and both are O(log2n) 73 Efficiency Considerations for 2-3-4 Trees • Storage • 2-3-4 Trees: a node can have three data items and up to four references. – Can be an array of references or four specific variables. – IF not all of it is used there can be considerable waste. – In 2-3-4 trees, quite common to see many nodes not full. • RB Trees: balanced; contain few nodes w/ only one child. Almost all references are used. – Each node contains max number of data items: one. • RB Trees: more efficient use of storage than 2-3-4 trees for storage. 74