Download 2-3-4 Trees - Randomly Philled

Data Structures
2-3-4 Trees
Phil Tayco
Slide version 1.0, Apr. 23, 2015

Binary trees revisited
• Binary trees combine the best of both worlds: dynamic memory usage, plus the binary search you could perform on a sorted array
• Searching a binary tree only achieves O(log n) as long as the tree is balanced
• The balance of a tree depends on the order in which nodes are inserted and deleted, which can lead to imbalance
• Imbalance degrades search toward O(n), at which point the tree is basically a linked list

Advanced tree ideas
• As with other data structures, we try to address the cons
• For trees, we want to efficiently maintain balance as inserts and deletes are performed
• There are tree algorithms that already look at ways to do this:
  – AVL trees
  – Red-black trees
• These trees keep the basic structure of a binary node
• As you would guess, their insert and delete algorithms are more complex than those of the standard tree

Multiway tree
• What if we modified the tree node instead?
[Figure: root node 20 40 60 with four children: 10 | 30 | 50 | 70 80]
• Notice each node here contains multiple data elements and multiple child links
• The modified structure is interesting, but it needs to work within a set of rules to guarantee balance
• A non-leaf node with 1 data item always has 2 children
[Figure: root 20 with children 10 | 30]
• A non-leaf node with 2 data items always has 3 children
[Figure: root 20 40 with children 10 | 30 | 50 60]
• A non-leaf node with 3 data items always has 4 children
[Figure: root 20 40 60 with children 10 | 30 | 50 | 70 80]
• Leaf nodes have no children and can hold any number of data items, from 1 up to the maximum of 3
[Figure: root 20 40 60 with children 10 | 30 31 32 | 50 | 70 80]
• As before, the child nodes to the left and right of a data item hold smaller and larger values respectively, which maintains order
[Figure: root 20 40 60 with children 10 | 30 31 32 | 50 | 70 80]

Similarities to Binary trees
• While the number of data items and children per node has increased, the basic ordering is the same
• This gives search and insert performance similar to a balanced binary tree, O(log n)
• Search starts at the root, compares the search value against the node's data items, and follows the appropriate child pointer down
• Insert adds new data items at the appropriate leaf
• The algorithms will show that balance is always achieved.
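One way to see where the O(log n) comes from is a worked bound. This is a sketch that assumes the node rules above hold and every leaf sits at the same depth. The symbols h (number of levels) and n (number of data items) are introduced here for illustration and do not appear on the slides.

```latex
% Fewest items for h levels: every node holds 1 item and has 2 children.
% Most items for h levels:   every node holds 3 items and has 4 children.
\[
  2^{h} - 1 \;\le\; n \;\le\; 4^{h} - 1
  \quad\Longrightarrow\quad
  \log_4 (n + 1) \;\le\; h \;\le\; \log_2 (n + 1)
\]
```

For example, a tree holding n = 1,000,000 items has between 10 and 19 levels, so the number of nodes visited grows with log n rather than with n.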
This makes search and insert perform at O(log n)

Insert
• New data items are inserted at the leaf level
• To maintain balance, we add a rule to the normal search for the leaf that will receive the new data element:
  – When visiting any node, if it is full, “split” the node
  – Whether or not a split has occurred, continue down the path using the standard search until a leaf node is reached
  – Once a leaf is reached, add the new data element to it (if it is full, perform another “split” first)

Split
• Splitting a node requires creating a new sibling node and either modifying the existing parent node or creating a new one
• Data elements are moved and child pointers are readjusted as follows:
  – A new node is created as a sibling to the full node
  – The 3rd data item of the full node is moved to the sibling node as its 1st data item
  – The 2nd data item of the full node is added to the parent node
  – The 1st data item of the full node remains where it is
  – The 3rd and 4th child pointers of the full node move to the sibling node as its 1st and 2nd child pointers

Split example 1
• We want to add 5 to the tree below. We start at the root; its 1st data item is 14, so we go down the 1st child pointer. That node is full, so we must split it
[Figure: root 14 with children 3 6 10 | 17; node 3 6 10 has leaf children 1 2 | 4 | 7 8 | 12; node 17 has leaf children 16 | 18 20]

Step 1: Create new sibling node
• Notice the parent node in this case is the root, and the sibling is not yet attached to the parent (the 2nd child pointer of the root is still connected as before)
[Figure: same tree, with the root labeled (parent), node 3 6 10 labeled (current), and a new empty node labeled (sibling)]

Step 2: Move 3rd item to the new node as its 1st item
• 10 moves from current to the new sibling node
[Figure: current holds 3 6; sibling holds 10]

Step 3: Move 2nd item to parent
• Notice 6 is inserted into the data item list of parent.
This shifts 14 as well as its 2 child pointers
[Figure: parent (root) holds 6 14; current holds 3; sibling holds 10]

Step 5: Move 3rd and 4th child pointers to sibling as its 1st and 2nd child pointers
• This keeps the parent-child relationships and ordering intact and balanced
[Figure: root 6 14 with children 3 | 10 | 17; node 3 has leaf children 1 2 | 4; node 10 has leaf children 7 8 | 12; node 17 has leaf children 16 | 18 20]

Split Analysis
• The split keeps the non-leaf and leaf rules intact
• It guarantees that non-leaf nodes with 1, 2 or 3 data items have 2, 3 or 4 child nodes
• The split is performed as full nodes are encountered on the way down
• In the previous example, the insert of 5 still has not been performed
• The insert process resumes at the parent. Note that if the parent becomes full as a result of the split, it is not split at this point

Resume insert at parent
• 5 is less than 6, so we go down child pointer 1. 5 is greater than 3 and there is only 1 data item, so we go down the 2nd child pointer. The node with data item 4 is a leaf and is not full, so we add 5 there.
[Figure: root 6 14 with children 3 | 10 | 17; node 3 has leaf children 1 2 | 4 5; node 10 has leaf children 7 8 | 12; node 17 has leaf children 16 | 18 20]

Insert Analysis
• The algorithm keeps the tree balanced
• New nodes are created as needed by adding siblings before adding levels
• The number of levels increases only when the root node is the one that requires splitting
• When splitting the root, the same split algorithm applies, but instead of adding the 2nd data item to an existing parent node, a new parent node is created (as the new root)

Splitting the root
• Here, we will insert 15.
Before we even go down to a child node, we must split the root because it is full
[Figure: root 20 40 60 with children 10 | 30 31 32 | 50 | 70 80]

Step 1: Create the sibling node
• The algorithm works the same as before, except there is no “parent” node (yet)
[Figure: root 20 40 60 labeled (current), with a new empty node labeled (sibling)]

Step 2: Create new root as parent
• Since the current node is the root, we create another new node to be the parent (and new root)
[Figure: new empty node labeled (parent) above (current) and (sibling)]

Step 3: Move data items
• The normal split occurs. The 3rd item of current moves to the 1st position of sibling, and the 2nd item of current moves to the 1st position of parent
[Figure: parent holds 40; current holds 20; sibling holds 60]

Step 4: Update pointers
• The 3rd and 4th child pointers of current become the 1st and 2nd of sibling. The 1st and 2nd child pointers of the new parent are set to the current and sibling nodes respectively
[Figure: parent 40 with children 20 | 60; node 20 has children 10 | 30 31 32; node 60 has children 50 | 70 80]

Step 5: New root and continue
• Make the parent the new root of the tree. Resume the insert from the root (15 will go down to the leaf node containing 10 and be added there)
[Figure: root 40 with children 20 | 60; node 20 has leaf children 10 15 | 30 31 32; node 60 has leaf children 50 | 70 80]
• Notice the full leaf node 30, 31, 32 is not split.
This is because it is never visited

Insert Analysis
• Splitting only occurs when a visited node is full, keeping the 2-3-4 tree rules intact
• The tree grows “upward”: a level is added only when the root node is full (because the new parent is created at that moment and becomes the new root)
• Splitting a leaf node never gives its parent more than 4 children (if the parent already had 4 children, it would be full and would have been split before any of its child leaf nodes was reached)
• Balance is maintained because even if one side gets “heavy” with data items, the splitting algorithm keeps every leaf at the same depth
• The best way to understand the algorithm is to insert a series of numbers and draw the resulting tree

    public class Node234 {
        private int numItems;
        private Node234 parent;
        private Node234[] children;
        private int[] dataItems;

        public Node234() {
            numItems = 0;
            parent = null;
            children = new Node234[4];
            dataItems = new int[3];
            for (int n = 0; n < 4; n++)
                children[n] = null;
            for (int n = 0; n < 3; n++)
                dataItems[n] = -1;
        }
        // (the remaining Node234 methods appear on later slides)

    public class Tree234 {
        private Node234 root;

        public Tree234() {
            root = new Node234();
        }
        // (the remaining Tree234 methods appear on later slides)

Node234 and Tree234 Code
• More properties are needed here for the node
  – numItems to keep track of how many data items are in the node
  – Reference to the parent node (useful for handling splits)
  – Array of child pointers
  – Array of data items
• The arrays are allocated in the constructor and their elements are initialized to null (for children) and -1 (for data items); -1 marks an empty slot, so -1 itself cannot be stored as data
• We could also use a Linked List for the child and data arrays, but they are so small that we don’t necessarily need to, and arrays keep the code simple to start
• The Tree is just the root node.
Note that it is not initialized to null, but to a new Node234 object with no data items

    public void insert(int value) {
        Node234 current = root;
        while (true) {
            if (current.isFull()) {
                split(current);
                current = current.getParent();
                current = getNextChild(current, value);
            }
            // (the slide stops here; the rest of the loop, which keeps descending
            //  until a leaf is reached and then adds value to it, is not shown)

Tree234 Insert Code
• We start with a current node at root
• The loop goes down the child nodes of the tree until we reach a leaf
• Along the way, if a node’s isFull method returns true, we have to split it
• After the split, we set current to its parent and then find the appropriate child to go to based on the value to be inserted
• Many methods are used here: isFull, split, getParent and getNextChild

    public boolean isFull() {
        return (numItems == 3);
    }

    public Node234 getParent() {
        return parent;
    }
    // Note: these methods appear in the Node234 class (split and getNextChild are in Tree234)

    private void split(Node234 n) {
        int thirdItem = n.removeItem();
        int secondItem = n.removeItem();
        Node234 fourthChild = n.removeChild(3);
        Node234 thirdChild = n.removeChild(2);
        Node234 sibling = new Node234();
        Node234 parent;

Tree234 Split Code
• If you haven’t been drawing pictures while going through the code, now is the time to start…
• Split begins by removing the 3rd and then the 2nd data item from the full node and storing their values – these will be transferred to the sibling and parent nodes respectively
• We do the same with the 4th and 3rd child pointers of the node, disconnecting them so we can transfer them to the sibling
• We then create a new sibling node and a parent reference (parent is not a new node yet, as we haven’t determined at this point whether the full node is the root)
• The setup is complete, but there are 2 new methods in Node234 to review: removeItem and removeChild

    public int removeItem() {
        int lastItem = dataItems[numItems - 1];
        dataItems[--numItems] = -1;
        return lastItem;
    }
    // This removes the last data item in the data array (setting it to -1),
    // decrements numItems and returns the value that was removed

    public Node234 removeChild(int n) {
        Node234 child = children[n];
        children[n] = null;
        return child;
    }
    // This sets the given child of the node to null while returning a reference to that child
    // Now we can look at the next part of the split function…

        if (n == root) {
            parent = new Node234();
            root = parent;
            root.setChild(0, n);
        }
        else
            parent = n.getParent();
    // If the node being split is root, now create a new node as parent and root
    // and set its first child to the current node
    // Otherwise, a parent exists and we just get it
    // (setChild is not shown on the slides; for getParent to work later, it
    //  presumably also sets the child's parent reference)

        int itemLocation = parent.insertItem(secondItem);
        int parentItems = parent.getNumItems();
        int c = parentItems - 1;
        while (c > itemLocation) {
            Node234 temp = parent.removeChild(c);
            parent.setChild(c + 1, temp);
            c--;
        }
        parent.setChild(itemLocation + 1, sibling);

Tree234 Split Code – adjusting the parent
• The second item from the full node being split is inserted into the parent node using the Node’s insertItem function
• The location of that insert can vary, so it is returned here to determine how to adjust the child pointers of the parent
• This is done by getting the number of items in the parent and looping from the last item down to the location of the new item that was inserted
  – At each iteration, we remove the child pointer at position c and reattach it at position c + 1 – this shifts the child pointers that come after the inserted item one place to the right
• Once that shift is complete, there will be a “hole” in the child pointers just to the right of the item that was inserted into the parent
• This hole is filled by connecting it to the new sibling node just created!
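The child-pointer shift is easier to follow on a bare array. The following is a minimal sketch, not the deck's code: the class ChildShiftDemo and the method attachSibling are hypothetical names, and strings stand in for child nodes. It assumes, as in the split code, that numItems is the parent's item count after the promoted item has been inserted.

```java
import java.util.Arrays;

class ChildShiftDemo {
    // Shifts every child slot after itemLocation one place to the right,
    // then attaches the sibling in the hole at itemLocation + 1.
    // Works on a copy so the input array is left unchanged.
    static String[] attachSibling(String[] children, int numItems,
                                  int itemLocation, String sibling) {
        String[] result = Arrays.copyOf(children, children.length);
        for (int c = numItems - 1; c > itemLocation; c--) {
            result[c + 1] = result[c];   // move the pointer right by one
            result[c] = null;            // leave a hole behind it
        }
        result[itemLocation + 1] = sibling;
        return result;
    }

    public static void main(String[] args) {
        // Split example 1: the parent held 14 with children A (the full node)
        // and B. After 6 is promoted the parent holds 6 14, so numItems is 2
        // and itemLocation is 0.
        String[] before = {"A", "B", null, null};
        String[] after = attachSibling(before, 2, 0, "S");
        System.out.println(Arrays.toString(after)); // prints [A, S, B, null]
    }
}
```

The sibling lands immediately to the right of the node that was split, which is what keeps the children in sorted order.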
Notice we have more Node234 functions: insertItem and getNumItems…

    public int getNumItems() {
        return numItems;
    }
    // This method is a standard get function of a class, returning the numItems property
    // insertItem is not as simple…

    public int insertItem(int data) {
        numItems++;
        for (int n = 2; n >= 0; n--) {
            if (dataItems[n] == -1)
                continue;
            // From right to left of the data items array, we check for non-empty data items
            // (denoted as not equal to -1); if a spot is empty, ignore it
            else {
                int d = dataItems[n];
                if (data < d)
                    dataItems[n + 1] = dataItems[n];
                else {
                    dataItems[n + 1] = data;
                    return n + 1;
                }
            }
        }
        dataItems[0] = data;
        return 0;
    }

Node234 Code – inserting a data item
• The “else” branch here deals with encountering a data item as we go right to left in the data array looking for the correct place to insert the new data item
• When a data item is found, compare it to the new item
  – If the new item is less than it, the new item belongs to its left, so we shift the data item in the array to the right by 1
  – Otherwise, the new data item belongs to the right of this item in the array, so we set it there and return that index
• If we reach the end of the loop, all data items in the array have shifted to the right and the new item belongs in the first spot (index 0). We insert it there and return that index
• A lot of bouncing back and forth between Node234 and Tree234! We’re almost done though.
At this point, we’ve created the sibling node and inserted the 2nd data item of the full node into the parent (created or existing). All that is left in the split function is to give the sibling its data item and child pointers

        sibling.insertItem(thirdItem);
        sibling.setChild(0, thirdChild);
        sibling.setChild(1, fourthChild);
    }
    // Using the Node functions previously discussed, we insert the 3rd data item from the full node
    // into the sibling and set its 1st and 2nd child pointers to what were once the full node’s
    // 3rd and 4th children

Efficiency
• The insert algorithm and the splits guarantee that a 2-3-4 tree stays balanced
• The balance leads to O(log n) performance
• Each node has room for 3 data items, which implies extra storage and extra comparisons per node
• Question 1: is the cost of traversing each node’s data array significant?
• Question 2: is allocating an array of 3 elements per node a significant amount of data storage?

Performance
• In the worst case, the entire data array of every node visited is traversed before finding the element or determining which child to descend to (as when searching for the tree’s maximum value)
• Because of the way the insert and split algorithms work, it is rare for the nodes on every level to be full
• Even if every node visited were full, that is at most 3 comparisons per level, so the number of data item comparisons is still O(log n) in the total number of data elements
• This makes the search performance ultimately comparable to a balanced binary search tree

Data Storage
• With most nodes in the tree not usually full, some of the allocated space goes unused
• The math works out to about 2/7 of the space being unused, based on the number of elements inserted into the tree
• Compared to self-balancing trees like red-black trees and AVL trees, the overhead they spend balancing the tree is comparable to the amount of unused space here (you get a little better performance with 2-3-4 trees than with the balancing trees, at a relative price in data storage)
• Why not use a linked list instead of an array? That has its own overhead as well, but it can be used if it is necessary to reclaim the unused space

Tree Traversal
• Displaying data in order with a binary search tree used simple recursion: display the subtree on the left, print the current element, then display the subtree on the right
• The same concept applies to a 2-3-4 tree, except you must now account for the multiple data items and child pointers:
  – If the current node is not null, print the child[0] subtree, print data item[0], print the child[1] subtree
  – If the current node has 2 data items, also print data item[1] and then print the child[2] subtree
  – If the current node is full, also print data item[2] and then print the child[3] subtree

    private void displayInOrder(Node234 current) {
        if (current != null) {
            displayInOrder(current.getChild(0));
            int n = current.getNumItems();
            for (int c = 0; c < n; c++) {
                System.out.println(current.getItem(c));
                displayInOrder(current.getChild(c + 1));
            }
        }
    }

Delete
• As you can imagine, the delete function is quite challenging:
  – Removing an item at the leaf level is not hard
  – Removing an item at a non-leaf level requires rearranging nodes and child pointers
• The “cop out” discussed with Binary Trees is even more necessary here
  – Make each data item a class with an additional “isDeleted” property
  – Set isDeleted to true on a data item when it is removed
  – Rebuild the 2-3-4 tree as needed by walking through the tree and inserting the elements that are not flagged for deletion into a new tree
  – The new 2-3-4 tree will still be a balanced tree

Applications
• Guaranteed balance is a big advantage that you get with a 2-3-4 tree over a binary search tree
• Minimized node count and balance also reduce the number of node visits
• Fewer node visits are useful in applications where each node represents a significant unit of data
  – With disk blocks as nodes, fewer visits mean less time spent locating blocks of data on a track, which is slow
  – Disk storage is a popular use of this data structure

Other Multiway Trees
• 2-3 trees are similar to 2-3-4 trees:
  – Up to 2 data items and 3 child pointers per node
  – Same non-leaf node rules apply
• Trees with larger nodes follow the same rule relating data items and children (links = data items + 1) – this keeps the insert and split algorithms the same
• 2-3 trees split only when the leaf is full and recursively split full parents up the tree (this keeps the number of splits necessary per insert to a minimum)

Summary
• Whether it is a self-balancing binary search tree or a 2-3-4 type tree, balance is the theme that keeps performance at O(log n)
• Self-balancing trees use less memory and allocate it more dynamically, while the algorithms for 2-3-4 trees are not as complex
• The search is optimized by way of storing the data in some determined order
• Search can reach O(1) performance if that order is not as significant and the data elements can be mapped in a different way…
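The slides call getNextChild, setChild, getChild and getItem without showing them, and they never show a search. The following is a compact sketch that fills those gaps so the whole insert path can be run end to end. It is not the deck's code. Tree234Sketch, Node, nextChild, find, isLeaf and inOrder are names chosen for this illustration, fields are accessed directly instead of through getters, and it assumes duplicate values are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

class Tree234Sketch {
    static class Node {
        int numItems = 0;
        Node parent;
        final Node[] children = new Node[4];
        final int[] items = new int[3];

        boolean isFull() { return numItems == 3; }
        boolean isLeaf() { return children[0] == null; }

        // Attach child at slot i and record this node as its parent.
        void setChild(int i, Node child) {
            children[i] = child;
            if (child != null) child.parent = this;
        }

        // Sorted insert into a non-full node; returns the index used.
        int insertItem(int data) {
            int n = numItems - 1;
            while (n >= 0 && data < items[n]) { items[n + 1] = items[n]; n--; }
            items[n + 1] = data;
            numItems++;
            return n + 1;
        }
    }

    Node root = new Node();

    // Hypothetical stand-in for getNextChild: the child whose range holds value.
    static Node nextChild(Node node, int value) {
        int i = 0;
        while (i < node.numItems && value >= node.items[i]) i++;
        return node.children[i];
    }

    boolean find(int value) {
        for (Node cur = root; cur != null; cur = nextChild(cur, value))
            for (int i = 0; i < cur.numItems; i++)
                if (cur.items[i] == value) return true;
        return false;
    }

    void insert(int value) {
        if (find(value)) return;              // assumption: duplicates are ignored
        Node cur = root;
        while (true) {
            if (cur.isFull()) { split(cur); cur = nextChild(cur.parent, value); }
            else if (cur.isLeaf()) break;
            else cur = nextChild(cur, value);
        }
        cur.insertItem(value);
    }

    // Top-down split: 2nd item goes up, 3rd item and last two children go right.
    private void split(Node n) {
        int second = n.items[1], third = n.items[2];
        Node thirdChild = n.children[2], fourthChild = n.children[3];
        n.numItems = 1;
        n.children[2] = null;
        n.children[3] = null;
        Node sibling = new Node();
        Node parent = n.parent;
        if (n == root) {                      // grow upward with a new root
            parent = new Node();
            root = parent;
            parent.setChild(0, n);
        }
        int loc = parent.insertItem(second);
        for (int c = parent.numItems - 1; c > loc; c--)
            parent.setChild(c + 1, parent.children[c]);
        parent.setChild(loc + 1, sibling);
        sibling.insertItem(third);
        sibling.setChild(0, thirdChild);
        sibling.setChild(1, fourthChild);
    }

    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Integer> out) {
        if (node == null) return;
        collect(node.children[0], out);
        for (int c = 0; c < node.numItems; c++) {
            out.add(node.items[c]);
            collect(node.children[c + 1], out);
        }
    }

    public static void main(String[] args) {
        Tree234Sketch tree = new Tree234Sketch();
        for (int v = 1; v <= 20; v++) tree.insert(v);   // sorted input, the bad case for a plain BST
        System.out.println(tree.inOrder());             // values come back in ascending order
    }
}
```

Inserting already-sorted values is the case that turns a plain binary search tree into a linked list, so it makes a useful first test for drawing the resulting tree by hand.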
