Chapter 8: Binary Trees, Binary Search Trees, Huffman Code

Binary Trees

Binary Trees in all their forms present a very useful data structure that allows O(log N) searches, insertions and deletions when balanced, while being fully dynamic in nature. To discuss the tree structure we need to understand the terminology associated with a tree.

Tree Terminology

Node is an item on the tree. Typically a node consists of data (or a reference to data) and at least two pointers to possible children (usually left and right).

Root is the original parent node of all other child nodes, or the parent node of a subtree.

Child is a node that is referenced from another node. The parent is the ancestor of the child.

Parent is the node that a child node directly comes from. The child(ren) are descendants of the parent.

Leaf is a node with no descendants.

Depth or Height of a tree is the maximum number of nodes that can be traversed (in one direction) starting from the root, counted from 0 to n.

Level is the current number of nodes that have to be traversed to reach a node. In general each level can hold up to 2^depth nodes, where depth starts at zero for the root.

Descendants of a given node are the set of nodes that can trace a path back up to that node.

Ancestors are the set of nodes that lie between a node and the root.

Sub Tree is a given node (root or otherwise) and all its descendants.

Binary Tree is a root node with up to two children. Each child can have up to two children, etcetera.

Balanced Tree is a tree with approximately the same depth for all of its leaves. Balance maintains the “logarithmic” speed of the tree; without it a tree could turn into a linear access structure.

Full Binary Tree is a tree with all possible nodes filled at each level. Such a tree is considered balanced.

Complete Tree is a tree that is either full or only has missing nodes from the bottom level.
Such a tree is also considered balanced.

Tree Operations

There are several bog standard tree operations.

Find: if there is no special order to the tree then this is an O(N) process, and possibly one with a somewhat expensive constant. However, trees usually have some special properties that bring searching down to O(log N).

Insert is similar to search in that a generic tree insert might be O(N), but inserts are usually O(log N) for any type of tree we would use. We must create a new node and attach it to its new parent.

Delete: ditto, but deleting a node involves more complexity. For example, deleting a node with children involves re-arranging how they are placed on the tree.

Traversing a tree for output is an O(N) process regardless of how a tree is arranged or what type of tree it is. This is expected, as a tree with N nodes requires visiting at least N nodes. It may be some constant multiple of N, as we may have to go through a node more than once to get to all its children.

Binary Search Trees

A Binary Search Tree is a binary tree with the additional property that the left child (if any) of a node, and every node below that child, contains a value smaller than the current node's value. The right child of a node (if any), and every node below it, must store only greater values. Because of this property a Binary Search Tree makes searching, sorted insertion and deletion all O(log N) if the tree is near balanced. Of course, an unbalanced tree approaches O(N), just like a list.

Binary Search Tree Operations

Find: use the binary search tree property (children to the left are less / right are greater) to search for a node in at most O(log N) time in a balanced tree. If you end up at a node that does not contain the value you are looking for, and it has no applicable children to traverse, then the tree does not contain the value.

Insert: use the same approach as above to fall to the lowest level of the tree and attach the new leaf node to the node reached there.

Delete must ensure that the tree maintains its binary search tree status.
First we find a node as above. This algorithm can be recursive or non-recursive, although non-recursive is actually not any harder to code (and may not even need a stack). There are three cases for deleting a node:

A leaf node is the simplest case: we set the parent's link to that leaf to null.

A node with two children requires finding the in-order successor. It is this node that we want to use to replace the node to delete. The successor is found by going right once and then left as far as possible. Note that upon finding this node it may be necessary to also handle its right child by moving it up.

A node with a single child, whether right or left, can be deleted by simply replacing its parent's link to it with a link to its child.

Tree Traversal means walking the tree, or visiting each node in some order. Note that you may well visit each node more than once, but usually you will only ‘process’ each node once, such as is typically done for output of the tree. There are three standard ways to walk a tree:

Pre Order: print the current node. Then try to go left and print if able. If not, try to go right and print if able. Failing that, back up a level and try those again until all nodes have been visited at least once (but only processed/printed once).

In Order is essentially viewing the nodes in sorted order. Go left as far as possible and then print, go right if able and repeat the above process. If you have to back up, then print if you have not already done so.

Post Order: go left as far as possible and then right as far as possible. Print and then back up a level.
As usual, do not reprint a node you have already visited.

Code Example)

// tree.java
// demonstrates binary tree
// to run this program: C>java TreeApp
import java.io.*;
import java.util.*;                    // for Stack class
////////////////////////////////////////////////////////////////
class Node
{
    public int iData;                  // data item (key)
    public double dData;               // data item
    public Node leftChild;             // this node's left child
    public Node rightChild;            // this node's right child

    public void displayNode()          // display ourself
    {
        System.out.print('{');
        System.out.print(iData);
        System.out.print(", ");
        System.out.print(dData);
        System.out.print("} ");
    }
}  // end class Node
////////////////////////////////////////////////////////////////
class Tree
{
    private Node root;                 // first node of tree
    // -------------------------------------------------------------
    public Tree()                      // constructor
    {
        root = null;                   // no nodes in tree yet
    }
    // -------------------------------------------------------------
    public Node find(int key)          // find node with given key
    {
        Node current = root;           // start at root (may be null)
        while(current != null && current.iData != key)   // while no match,
        {
            if(key < current.iData)    // go left?
                current = current.leftChild;
            else                       // or go right?
                current = current.rightChild;
        }
        return current;                // found it, or null if absent
    }  // end find()
    // -------------------------------------------------------------
    public void insert(int id, double dd)
    {
        Node newNode = new Node();     // make new node
        newNode.iData = id;            // insert data
        newNode.dData = dd;
        if(root==null)                 // no node in root
            root = newNode;
        else                           // root occupied
        {
            Node current = root;       // start at root
            Node parent;
            while(true)                // (exits internally)
            {
                parent = current;
                if(id < current.iData) // go left?
                {
                    current = current.leftChild;
                    if(current == null)    // if end of the line,
                    {                      // insert on left
                        parent.leftChild = newNode;
                        return;
                    }
                }  // end if go left
                else                   // or go right?
                {
                    current = current.rightChild;
                    if(current == null)    // if end of the line
                    {                      // insert on right
                        parent.rightChild = newNode;
                        return;
                    }
                }  // end else go right
            }  // end while
        }  // end else not root
    }  // end insert()
    // -------------------------------------------------------------
    public boolean delete(int key)     // delete node with given key
    {
        if(root == null)               // empty tree, nothing to delete
            return false;
        Node current = root;
        Node parent = root;
        boolean isLeftChild = true;

        while(current.iData != key)    // search for node
        {
            parent = current;
            if(key < current.iData)    // go left?
            {
                isLeftChild = true;
                current = current.leftChild;
            }
            else                       // or go right?
            {
                isLeftChild = false;
                current = current.rightChild;
            }
            if(current == null)        // end of the line,
                return false;          // didn't find it
        }  // end while
        // found node to delete

        // if no children, simply delete it
        if(current.leftChild==null && current.rightChild==null)
        {
            if(current == root)        // if root,
                root = null;           // tree is empty
            else if(isLeftChild)
                parent.leftChild = null;   // disconnect
            else                           // from parent
                parent.rightChild = null;
        }
        // if no right child, replace with left subtree
        else if(current.rightChild==null)
        {
            if(current == root)
                root = current.leftChild;
            else if(isLeftChild)
                parent.leftChild = current.leftChild;
            else
                parent.rightChild = current.leftChild;
        }
        // if no left child, replace with right subtree
        else if(current.leftChild==null)
        {
            if(current == root)
                root = current.rightChild;
            else if(isLeftChild)
                parent.leftChild = current.rightChild;
            else
                parent.rightChild = current.rightChild;
        }
        else  // two children, so replace with inorder successor
        {
            // get successor of node to delete (current)
            Node successor = getSuccessor(current);

            // connect parent of current to successor instead
            if(current == root)
                root = successor;
            else if(isLeftChild)
                parent.leftChild = successor;
            else
                parent.rightChild = successor;

            // connect successor to current's left child
            successor.leftChild = current.leftChild;
        }  // end else two children
        // (successor cannot have a left child)
        return true;                   // success
    }  // end delete()
    // -------------------------------------------------------------
    // returns node with next-highest value after delNode
    // goes to right child, then right child's left descendants
    private Node getSuccessor(Node delNode)
    {
        Node successorParent = delNode;
        Node successor = delNode;
        Node current = delNode.rightChild;   // go to right child
        while(current != null)               // until no more
        {                                    // left children,
            successorParent = successor;
            successor = current;
            current = current.leftChild;     // go to left child
        }
                                             // if successor not
        if(successor != delNode.rightChild)  // right child,
        {                                    // make connections
            successorParent.leftChild = successor.rightChild;
            successor.rightChild = delNode.rightChild;
        }
        return successor;
    }
    // -------------------------------------------------------------
    public void traverse(int traverseType)
    {
        switch(traverseType)
        {
            case 1: System.out.print("\nPreorder traversal: ");
                    preOrder(root);
                    break;
            case 2: System.out.print("\nInorder traversal: ");
                    inOrder(root);
                    break;
            case 3: System.out.print("\nPostorder traversal: ");
                    postOrder(root);
                    break;
        }
        System.out.println();
    }
    // -------------------------------------------------------------
    private void preOrder(Node localRoot)
    {
        if(localRoot != null)
        {
            System.out.print(localRoot.iData + " ");
            preOrder(localRoot.leftChild);
            preOrder(localRoot.rightChild);
        }
    }
    // -------------------------------------------------------------
    private void inOrder(Node localRoot)
    {
        if(localRoot != null)
        {
            inOrder(localRoot.leftChild);
            System.out.print(localRoot.iData + " ");
            inOrder(localRoot.rightChild);
        }
    }
    // -------------------------------------------------------------
    private void postOrder(Node localRoot)
    {
        if(localRoot != null)
        {
            postOrder(localRoot.leftChild);
            postOrder(localRoot.rightChild);
            System.out.print(localRoot.iData + " ");
        }
    }
    // -------------------------------------------------------------
    public void displayTree()
    {
        Stack<Node> globalStack = new Stack<>();
        globalStack.push(root);
        int nBlanks = 32;
        boolean isRowEmpty = false;
        System.out.println(
            "......................................................");
        while(isRowEmpty==false)
        {
            Stack<Node> localStack = new Stack<>();
            isRowEmpty = true;

            for(int j=0; j<nBlanks; j++)
                System.out.print(' ');

            while(globalStack.isEmpty()==false)
            {
                Node temp = globalStack.pop();
                if(temp != null)
                {
                    System.out.print(temp.iData);
                    localStack.push(temp.leftChild);
                    localStack.push(temp.rightChild);

                    if(temp.leftChild != null ||
                       temp.rightChild != null)
                        isRowEmpty = false;
                }
                else
                {
                    System.out.print("--");
                    localStack.push(null);
                    localStack.push(null);
                }
                for(int j=0; j<nBlanks*2-2; j++)
                    System.out.print(' ');
            }  // end while globalStack not empty
            System.out.println();
            nBlanks /= 2;
            while(localStack.isEmpty()==false)
                globalStack.push( localStack.pop() );
        }  // end while isRowEmpty is false
        System.out.println(
            "......................................................");
    }  // end displayTree()
}  // end class Tree
////////////////////////////////////////////////////////////////
class TreeApp
{
    public static void main(String[] args) throws IOException
    {
        int value;
        Tree theTree = new Tree();

        theTree.insert(50, 1.5);
        theTree.insert(25, 1.2);
        theTree.insert(75, 1.7);
        theTree.insert(12, 1.5);
        theTree.insert(37, 1.2);
        theTree.insert(43, 1.7);
        theTree.insert(30, 1.5);
        theTree.insert(33, 1.2);
        theTree.insert(87, 1.7);
        theTree.insert(93, 1.5);
        theTree.insert(97, 1.5);

        while(true)
        {
            System.out.print("Enter first letter of show, ");
            System.out.print("insert, find, delete, or traverse: ");
            int choice = getChar();
            switch(choice)
            {
                case 's':
                    theTree.displayTree();
                    break;
                case 'i':
                    System.out.print("Enter value to insert: ");
                    value = getInt();
                    theTree.insert(value, value + 0.9);
                    break;
                case 'f':
                    System.out.print("Enter value to find: ");
                    value = getInt();
                    Node found = theTree.find(value);
                    if(found != null)
                    {
                        System.out.print("Found: ");
                        found.displayNode();
                        System.out.print("\n");
                    }
                    else   // println with a String: value + '\n' alone would add 10 to the int
                        System.out.println("Could not find " + value);
                    break;
                case 'd':
                    System.out.print("Enter value to delete: ");
                    value = getInt();
                    boolean didDelete = theTree.delete(value);
                    if(didDelete)
                        System.out.println("Deleted " + value);
                    else
                        System.out.println("Could not delete " + value);
                    break;
                case 't':
                    System.out.print("Enter type 1, 2 or 3: ");
                    value = getInt();
                    theTree.traverse(value);
                    break;
                default:
                    System.out.print("Invalid entry\n");
            }  // end switch
        }  // end while
    }  // end main()
    // -------------------------------------------------------------
    public static String getString() throws IOException
    {
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(isr);
        String s = br.readLine();
        return s;
    }
    // -------------------------------------------------------------
    public static char getChar() throws IOException
    {
        String s = getString();
        return s.charAt(0);
    }
    // -------------------------------------------------------------
    public static int getInt() throws IOException
    {
        String s = getString();
        return Integer.parseInt(s);
    }
}  // end class TreeApp

Optimizing Binary Trees for the tree operations can take several approaches, from removing recursion to implementing the tree with an array.

Removing Recursion is one approach to speeding up some tree operations. To do this you will need doubly linked nodes (each node also references its parent) so that you can traverse them backwards without using a stack to store previous nodes.

Incrementing Mark: keep some int or long in each node that marks whether you have been to that node before during a given tree operation. To keep this from requiring a clear operation to reset the marks, simply increment the mark value you will use for each operation, and use that new value to check / mark the path of your operation.

Saving Deleted Nodes: whenever you need to insert a node, first check the list of deleted nodes. If one exists, then reuse it. This saves two costs. First, no garbage collection for a deleted node occurs. Second, no instantiation of the new node occurs in this case.
This obviously has a fairly good effect on trees that are actively being added to or deleted from.

Trees Using Arrays are another speedup. In this case we trade extra storage for speed, since the array must have room for a full tree. This means we possibly pay array resizing costs as well. In turn, we can traverse or search the tree very quickly compared to linked nodes and may gain some cache locality benefits as well.

A node's parent is found by taking its index and computing (index - 1) / 2, using integer division.
A node's children are at 2 * index + 1 and 2 * index + 2.

Huffman Code

A Huffman Code is a means of compressing data by coming up with a binary tree representation of the characters where the most frequently used characters are close to the root. Instead of using the normal 8+ bit representation for characters we traverse the tree, with a left traverse being a 0 and a right traverse being a 1. In this way we can build a new non-fixed length binary representation for each character in a message.

Constructing a Tree is a several step process.

Build the Character Frequency Array by counting each character in the message. We want to place these characters into an array (or list) of trees where the least frequently occurring character is placed first. Each tree starts out as a single root node that holds the character in question.

Combine the Character Frequency Tree Array as follows. Take the first two trees and combine them as children into a new tree with a new parent. Place this tree back into the array in the position that the total frequency of all its leaf nodes' characters would dictate. Repeat this process until only a single tree remains.

Build the Message character by character using a tree search to determine its encoding (left = 0, right = 1).
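The construction steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions rather than a production encoder: the class name HuffmanSketch, the HNode type and the helper names (buildTree, insertSorted, buildCodes, encode, decode) are all invented for this example, and the bits are kept as a String of '0' and '1' characters instead of being packed. Ties between equal frequencies are broken by insertion order, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch class; none of these names come from the notes.
class HuffmanSketch {
    // Leaves carry a character; internal nodes only carry a combined frequency.
    static class HNode {
        final char ch;
        final int freq;
        final HNode left, right;
        HNode(char ch, int freq, HNode left, HNode right) {
            this.ch = ch; this.freq = freq; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Count characters, then repeatedly combine the two least frequent trees.
    static HNode buildTree(String message) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : message.toCharArray()) counts.merge(c, 1, Integer::sum);

        List<HNode> trees = new ArrayList<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet())
            insertSorted(trees, new HNode(e.getKey(), e.getValue(), null, null));

        while (trees.size() > 1) {
            HNode first = trees.remove(0);   // least frequent becomes left child
            HNode second = trees.remove(0);  // next least frequent becomes right child
            insertSorted(trees, new HNode('\0', first.freq + second.freq, first, second));
        }
        return trees.isEmpty() ? null : trees.get(0);
    }

    // Keeps the list ordered by ascending frequency (least frequent first).
    static void insertSorted(List<HNode> trees, HNode node) {
        int i = 0;
        while (i < trees.size() && trees.get(i).freq <= node.freq) i++;
        trees.add(i, node);
    }

    // Walk the tree once and record the path to every leaf (left = 0, right = 1).
    static Map<Character, String> buildCodes(HNode root) {
        Map<Character, String> codes = new HashMap<>();
        if (root == null) return codes;
        if (root.isLeaf()) { codes.put(root.ch, "0"); return codes; } // one distinct character
        collect(root, "", codes);
        return codes;
    }

    private static void collect(HNode n, String path, Map<Character, String> codes) {
        if (n.isLeaf()) { codes.put(n.ch, path); return; }
        collect(n.left, path + "0", codes);
        collect(n.right, path + "1", codes);
    }

    static String encode(String message, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    // Follow the bits down from the root; emit a character at each leaf reached.
    static String decode(String bits, HNode root) {
        StringBuilder out = new StringBuilder();
        if (root == null) return "";
        if (root.isLeaf()) {
            for (int i = 0; i < bits.length(); i++) out.append(root.ch);
            return out.toString();
        }
        HNode cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.left : cur.right;
            if (cur.isLeaf()) { out.append(cur.ch); cur = root; }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String msg = "aaaabbc";
        HNode root = buildTree(msg);
        String bits = encode(msg, buildCodes(root));
        System.out.println(bits + " decodes to " + decode(bits, root));
    }
}
```

For the message "aaaabbc" this sketch gives a the one-bit code 1 and gives b and c the two-bit codes 01 and 00, so the message takes 10 bits instead of 56 at 8 bits per character. Decoding walks the same tree from the root for every character.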
Both sender and receiver have to agree as to how the encoding works, so the encoding (the code table or the tree itself) must be sent in some way as well.

Chapter 9: Red-Black Trees

Red-Black Trees

Red-Black Trees present a very useful data structure that allows O(log N) searches, insertions and deletions regardless of how data is entered. This is because they self-balance dynamically on insertion and deletion of nodes. Two basic types of trees exist: those that insert top down and change the tree data structure as they insert, and the somewhat less efficient variety that first traverses to find the insert spot and then propagates tree structure changes upward. Both types require bottom up for deletion.

Key Red-Black Tree Concepts

Each node in the tree is ‘colored’. There are a few rules about this coloration process:
1) Every node is colored red or black
2) The root is always black
3) If a node is red its children must be black. But the converse does not have to be true
4) Every path from root to any actual (or possible) leaf must contain the same number of black nodes (the black height). Possible children are also known as null children
5) Duplicate items are either not allowed (easy) or somehow caused to be distributed evenly along either child branch of the first such value (harder)

Default to a Red Insert; it is simpler and faster, since a new red node does not change the black height of any path.

Color Change can sometimes be used to rebalance a tree. That is, changing the color of nodes in the tree to match the above rules.

Rotation of nodes may occur in either direction. Note that we can rotate any node and its sub tree, not just the root.
1) A right rotate means the current node moves down to the right and its left child moves up into its old position. The inner grandchild (the right sub tree of that left child) also moves: it is detached from the rising child and connected to the node that moved down
2) A left rotate is similar except the rotation is to the left of course

Searching a Red-Black tree is just like searching a binary tree.
We do not have to pay attention to any special Red-Black Tree rules since we are not changing the tree.

Simplified Pseudo Code Examples for deletes and inserts in a bottom up manner. This insert approach requires another rule: null children are counted as black.

T is the tree
p() returns the parent of a node
x is the current node pointer
color() returns the color of a node, or sets it if assigned to
root() returns the root node of the tree
left() returns the left child of a node
right() ditto but the right child
RotateLeft and RotateRight work as described above
TreeInsert() uses a standard search to find where the node should be inserted

Inserting into a Red-Black Tree is more complex than a search due to the rules listed above.

Example Code)

RedBlackInsert(T, x)
{
    TreeInsert(T, x)
    color(x) = Red
    while ( x != root(T) && color(p(x)) == Red )
    {
        if ( p(x) == left(p(p(x))) )
        {
            y = right(p(p(x)))              // y is the uncle of x
            if ( color(y) == Red )
            {
                color(p(x)) = Black
                color(y) = Black
                color(p(p(x))) = Red
                x = p(p(x))
            }
            else
            {
                if ( x == right(p(x)) )
                {
                    x = p(x)
                    RotateLeft(T, x)
                }
                color(p(x)) = Black
                color(p(p(x))) = Red
                RotateRight(T, p(p(x)))
            }
        }
        else // p(x) != left(p(p(x)))
        {
            // this is the same as above but swap right and left
        }
    } // end while
    color(root(T)) = Black
}

Deleting from a Red-Black Tree is even more complex than inserting.
Example Code)

RedBlackDelete(T, z)
{
    if ( left(z) == nil(T) || right(z) == nil(T) )
    {
        y = z
    }
    else
    {
        y = TreeSuccessor(z)
    }
    if ( left(y) != nil(T) )
    {
        x = left(y)
    }
    else
    {
        x = right(y)
    }
    p(x) = p(y)
    if ( p(y) == nil(T) )
    {
        root(T) = x
    }
    else
    {
        if ( y == left(p(y)) )
        {
            left(p(y)) = x
        }
        else
        {
            right(p(y)) = x
        }
    }
    if ( y != z )
    {
        key(z) = key(y)     // note if y has other fields, copy them too
    }
    if ( color(y) == Black )
    {
        RBDeleteFixup(T, x)
    }
}

RBDeleteFixup(T, x)
{
    while ( x != root(T) && color(x) == Black )
    {
        if ( x == left(p(x)) )
        {
            w = right(p(x))                 // w is the sibling of x
            if ( color(w) == Red )
            {
                color(w) = Black
                color(p(x)) = Red
                RotateLeft(T, p(x))
                w = right(p(x))
            }
            if ( color(left(w)) == Black && color(right(w)) == Black )
            {
                color(w) = Red
                x = p(x)
            }
            else
            {
                if ( color(right(w)) == Black )
                {
                    color(left(w)) = Black
                    color(w) = Red
                    RotateRight(T, w)
                    w = right(p(x))
                }
                color(w) = color(p(x))
                color(p(x)) = Black
                color(right(w)) = Black
                RotateLeft(T, p(x))
                x = root(T)
            }
        }
        else // x != left(p(x))
        {
            // same code as if portion, but switch right and left
        }
    } // end while
    color(x) = Black
}

Chapter 10: 2-3-4 Trees, Storage and B-Trees

2-3-4 Trees

2-3-4 Trees are slightly less efficient than red-black trees but a whole lot easier to code and understand. As with all self-balancing trees, they allow O(log N) searches, insertions and deletions regardless of how data is entered.

2-3-4 Tree Concepts

Each non-leaf node has 2-4 links off of it. This means that there are 1-3 data items in a node (only an empty root holds 0). The maximum number of links (4 here) is referred to as the order of the tree.

Data items in the node are stored in sorted order. This means the links may have to move as well, depending on insertions or deletions to the data in the node.

Links refer to children that are between the data items. End links just compare to the nearest data item (lesser or greater). There will always be one more link than the number of data items.

A Split refers to the process where a full node is broken into two and a value is propagated up to its parent (or a new parent is made). Calling the three items in the full node A, B and C in sorted order, the details of the process are as follows:
1) A new node is created that will be the sibling of the node to be split
2) Move the last item (C) into the new node
3) Item A is left as is
4) The rightmost two children are attached to the new node
5) Item B is moved up one level

Searching a 2-3-4 tree is just like searching a binary tree so long as order is 2. Otherwise, we need to search each of the data items in the node until we find one greater than the value we are looking for or reach the end. If the former occurs, take the link just before that item. Else, take the last link.

Inserting into a 2-3-4 Tree can be fairly easy or hard, depending on the condition of the nodes on the way to the leaf:
1) If none of the nodes on the path are full, we just need to traverse the tree and insert the data at the leaf
2) If a node on the way is full, we split it and continue on toward the leaf
3) If the leaf is full then we split that and move the middle value up

Deleting will not be covered.

Example Code)

// demonstrates 234 tree
// to run this program: C>java Tree234App
import java.io.*;
////////////////////////////////////////////////////////////////
class DataItem
{
    public long dData;                 // one data item

    public DataItem(long dd)           // constructor
    {
        dData = dd;
    }

    public void displayItem()          // display item, format "/27"
    {
        System.out.print("/"+dData);
    }
}  // end class DataItem
////////////////////////////////////////////////////////////////
class Node
{
    private static final int ORDER = 4;
    private int numItems;
    private Node parent;
    private Node childArray[] = new Node[ORDER];
    private DataItem itemArray[] = new DataItem[ORDER-1];

    // connect child to this node
    public void connectChild(int childNum, Node child)
    {
        childArray[childNum] = child;
        if(child != null)
        {
            child.parent = this;
        }
    }

    // disconnect child from this node, return it
    public Node disconnectChild(int childNum)
    {
        Node tempNode = childArray[childNum];
        childArray[childNum] = null;
        return tempNode;
    }

    public Node getChild(int childNum)
    {
        return childArray[childNum];
    }

    public Node getParent()
    {
        return parent;
    }

    public boolean isLeaf()
    {
        return childArray[0] == null;
    }

    public int getNumItems()
    {
        return numItems;
    }

    public DataItem getItem(int index)   // get DataItem at index
    {
        return itemArray[index];
    }

    public boolean isFull()
    {
        return numItems == ORDER - 1;
    }

    public int findItem(long key)        // return index of
    {                                    // item (within node)
        for(int j=0; j<ORDER-1; j++)     // if found,
        {                                // otherwise,
            if( itemArray[j] == null )   // return -1
            {
                break;
            }
            else if( itemArray[j].dData == key )
            {
                return j;
            }
        }
        return -1;
    }  // end findItem

    public int insertItem(DataItem newItem)
    {
        // assumes node is not full
        numItems++;                          // will add new item
        long newKey = newItem.dData;         // key of new item

        for(int j=ORDER-2; j>=0; j--)        // start on right,
        {                                    // examine items
            if(itemArray[j] == null)         // if item null,
            {
                continue;                    // go left one cell
            }
            else                             // not null,
            {                                // get its key
                long itsKey = itemArray[j].dData;
                if(newKey < itsKey)          // if it's bigger
                {
                    itemArray[j+1] = itemArray[j];   // shift it right
                }
                else
                {
                    itemArray[j+1] = newItem;        // insert new item
                    return j+1;                      // return index to
                }                                    // new item
            }  // end else (not null)
        }  // end for
        itemArray[0] = newItem;              // shifted all items,
        return 0;                            // insert new item
    }  // end insertItem()

    public DataItem removeItem()             // remove largest item
    {
        // assumes node not empty
        DataItem temp = itemArray[numItems-1];   // save item
        itemArray[numItems-1] = null;            // disconnect it
        numItems--;                              // one less item
        return temp;                             // return item
    }

    public void displayNode()                // format "/24/56/74/"
    {
        for(int j=0; j<numItems; j++)
            itemArray[j].displayItem();      // "/56"
        System.out.println("/");             // final "/"
    }
}  // end class Node
////////////////////////////////////////////////////////////////
class Tree234
{
    private Node root = new Node();          // make root node

    public int find(long key)
    {
        Node curNode = root;
        int childNumber;
        while(true)
        {
            if(( childNumber=curNode.findItem(key) ) != -1)
            {
                return childNumber;          // found it
            }
            else if( curNode.isLeaf() )
            {
                return -1;                   // can't find it
            }
            else                             // search deeper
            {
                curNode = getNextChild(curNode, key);
            }
        }  // end while
    }

    // insert a DataItem
    public void insert(long dValue)
    {
        Node curNode = root;
        DataItem tempItem = new DataItem(dValue);

        while(true)
        {
            if( curNode.isFull() )               // if node full,
            {
                split(curNode);                  // split it
                curNode = curNode.getParent();   // back up
                                                 // search once
                curNode = getNextChild(curNode, dValue);
            }  // end if(node is full)
            else if( curNode.isLeaf() )          // if node is leaf,
            {
                break;                           // go insert
            }
            // node is not full, not a leaf; so go to lower level
            else
            {
                curNode = getNextChild(curNode, dValue);
            }
        }  // end while

        curNode.insertItem(tempItem);            // insert new DataItem
    }  // end insert()

    public void split(Node thisNode)             // split the node
    {
        // assumes node is full
        DataItem itemB, itemC;
        Node parent, child2, child3;
        int itemIndex;

        itemC = thisNode.removeItem();           // remove items from
        itemB = thisNode.removeItem();           // this node
        child2 = thisNode.disconnectChild(2);    // remove children
        child3 = thisNode.disconnectChild(3);    // from this node

        Node newRight = new Node();              // make new node

        if(thisNode==root)                       // if this is the root,
        {
            root = new Node();                   // make new root
            parent = root;                       // root is our parent
            root.connectChild(0, thisNode);      // connect to parent
        }
        else                                     // this node not the root
        {
            parent = thisNode.getParent();       // get parent
        }

        // deal with parent
        itemIndex = parent.insertItem(itemB);    // item B to parent
        int n = parent.getNumItems();            // total items?

        for(int j=n-1; j>itemIndex; j--)         // move parent's
        {                                        // connections
            Node temp = parent.disconnectChild(j);   // one child
            parent.connectChild(j+1, temp);          // to the right
        }
        // connect newRight to parent
        parent.connectChild(itemIndex+1, newRight);

        // deal with newRight
        newRight.insertItem(itemC);              // item C to newRight
        newRight.connectChild(0, child2);        // connect to 0 and 1
        newRight.connectChild(1, child3);        // on newRight
    }  // end split()

    // gets appropriate child of node during search for value
    public Node getNextChild(Node theNode, long theValue)
    {
        int j;
        // assumes node is not empty, not full, not a leaf
        int numItems = theNode.getNumItems();
        for(j=0; j<numItems; j++)                // for each item in node
        {                                        // are we less?
            if( theValue < theNode.getItem(j).dData )
            {
                return theNode.getChild(j);      // return left child
            }
        }  // end for
        // we're greater, so
        return theNode.getChild(j);              // return right child
    }

    public void displayTree()
    {
        recDisplayTree(root, 0, 0);
    }

    private void recDisplayTree(Node thisNode, int level, int childNumber)
    {
        System.out.print("level="+level+" child="+childNumber+" ");
        thisNode.displayNode();                  // display this node

        // call ourselves for each child of this node
        int numItems = thisNode.getNumItems();
        for(int j=0; j<numItems+1; j++)
        {
            Node nextNode = thisNode.getChild(j);
            if(nextNode != null)
            {
                recDisplayTree(nextNode, level+1, j);
            }
            else
            {
                return;
            }
        }
    }  // end recDisplayTree()
}  // end class Tree234
////////////////////////////////////////////////////////////////
class Tree234App
{
    public static void main(String[] args) throws IOException
    {
        long value;
        Tree234 theTree = new Tree234();

        theTree.insert(50);
        theTree.insert(40);
        theTree.insert(60);
        theTree.insert(30);
        theTree.insert(70);

        while(true)
        {
            System.out.print("Enter first letter of ");
            System.out.print("show, insert, or find: ");
            char choice = getChar();
            switch(choice)
            {
                case 's':
                    theTree.displayTree();
                    break;
                case 'i':
                    System.out.print("Enter value to insert: ");
                    value = getInt();
                    theTree.insert(value);
                    break;
                case 'f':
                    System.out.print("Enter value to find: ");
                    value = getInt();
                    int found = theTree.find(value);
                    if(found != -1)
                        System.out.println("Found "+value);
                    else
                        System.out.println("Could not find "+value);
                    break;
                default:
                    System.out.print("Invalid entry\n");
            }  // end switch
        }  // end while
    }  // end main()

    public static String getString() throws IOException
    {
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(isr);
        String s = br.readLine();
        return s;
    }

    public static char getChar() throws IOException
    {
        String s = getString();
        return s.charAt(0);
    }

    public static int getInt() throws IOException
    {
        String s = getString();
        return Integer.parseInt(s);
    }
}  // end class Tree234App

Storage Basics

Hard Disks come in several interfaces and formats. In any case some of the basic descriptions are:

Storage Capacity is measured in Gigabytes (1,000,000,000 bytes).

Bandwidth determines how fast data can be moved to or from storage. It is measured in MB/sec, with both sustained and burst rates for read and write. Be aware that your I/O subsystem needs to handle the sustained amount, along with any other traffic, or you will have a bottleneck.

Access Time is measured in ms and consists of seek time (the head moving across the platter), rotational latency (the time it takes for the drive to rotate to the correct position) and block transfer time (the time to read/write a block). In general higher RPMs, smaller platter size and more numerous platters all make for faster access.

Mean Time Between Failure (MTBF) is usually the number of hours of operation before a drive will fail (on average). Look for at least 500,000 hours.

Interface is the protocol that the drive uses to communicate with the PC. In general SCSI is much better than IDE for servers because it requires fewer CPU cycles to do its job.

HD Terminology

Heads are the read/write ‘needles’ that can access your drive.
In general there are 2 per platter.

Spindle is what the drive platters spin on.

Platter is a magnetically coated disk that resembles a record and stores numerous 0s or 1s. A drive may have multiple platters stacked on top of one another (typically 20 GB a platter for IDE and 18 GB a platter for SCSI).

Tracks and Cylinders (multi-platter tracks) are positional descriptors assigned to each “ring” of a disk.

Sector is another positional descriptor of the disk: a pie shaped slice of the disk that contains many blocks.

Blocks are the combined position of sector and track numbers and typically store 512 to 4096 bytes each. Blocks are separated by Inter Block Gaps which serve as “speed bumps” so that the drive knows where blocks begin and end. Blocks can be combined into contiguous, logically addressable units called clusters.

Hardware Address consists of block, sector and track numbers.

Efficient Use of the Hard Drive involves filling each block as full as possible, and reading as many blocks in order on a track as possible. I.e. seek and rotational latency are paid once, and then you read all the blocks you need. Additionally, files placed near the edge of a drive tend to be accessed faster. In essence this is what a disk defragmenter does for you.

Why does it matter? Hard drive access is measured in milliseconds (ms) while your computer processes information in nanoseconds (ns). So a hard drive access can be on the order of a million times slower than a CPU operation. Hence any speedup in hard drive access yields a serious speedup in machine performance. Which leads us to B-Trees and Merge Sorts.

B-Trees

B-Trees are slightly less efficient than top down 2-3-4 trees or red-black trees, but easier to code of course. Like all the balanced trees we have seen, they are O(log N) for searches, insertions and deletions.

B-Tree Concepts

A B-Tree is ordered like the others. Typically we pick an order so that one node ends up filling the block size of the hard drive.

The simple B-Trees are bottom up for insertion.
And, for our purposes, this means doubly linked pointers so we can backtrack up the tree
There are more advanced versions called B+ trees, but these will not be covered here
A Split refers to the process where a full node is broken into two and a value is propagated up to its parent (or a new parent is made). The details of the process are as follows
1) We split the node into two, with half the ordered data staying in place and the other half minus 1 going to a new node
2) The item at position half + 1 moves up the tree and is inserted into the parent
3) If the parent is full we split the parent in turn, and so on
A Merge refers to the process where two children are brought up to the parent's level. This occurs when the current node and both children's nodes together hold fewer values than a single node can hold. We test for this after a deletion
1) Read in values and pointers from each child and insert them into the current node
2) Delete both children
3) Strictly speaking we might need to rotate the tree at this point, but we will leave that as an exercise for the reader's imagination
Searching a B-Tree is just like searching a 2-3-4 tree
   while not found and not leaf do
      traverse tree using value to search
   if value in node found
      return true
   return false
Inserting into a B-Tree means descending from the root to a leaf and inserting there (bottom up)
   while not leaf do
      traverse tree using value to insert
   if leaf not full
      insert
   else
      split leaf as above, then insert
Deleting this is a simple delete routine for a B-Tree. There are better, but it will work for us
   while value not found and not leaf do
      traverse tree using value to delete
   if value in node not found
      return
   if node is a leaf
      remove value
      if no values left
         delete node
   else if number of values in node and children is > values node can store
      return
   else
      merge

Chapter 11: Hash, Chaining, Special Hashes
Hash Tables
A Hash is an algorithm that uses a calculation to insert and then to find the location of a piece of data in (ideally) one step.
This makes hashes quite fast. However, a hash table is usually fixed in size and needs about 50% more storage than the number of hashed keys it will hold. A good choice is to pick the table size as a power of two, or alternatively as a prime number
Hash Table is the base structure that holds the data. A hash table is typically about 50% larger than the number of values it will hold. However, since this is usually implemented as an array of references, the actual excess memory consumption is less than expected
Hash Slot/Bucket is a slot in the hash table. It typically holds a link to an element of data, but can also hold the data itself in some algorithms. Buckets in the same slot can be chained together as a linked list, a tree, etc.
Hash Key is the variable that is used in the hash function to determine which bucket an item should fall into
Hash Function is a consistent formula that will map between the received key variable and a specific slot/bucket. It should ideally distribute data uniformly. This can be difficult to accomplish if you do not know much about your data to begin with
   // hash example
   int[] hashtable = new int[8];        // all slots start at 0
   int slot = intInsertMe % 8;
   hashtable[slot] = intInsertMe;
Collisions occur when a second value hashes to the same bucket. Assuming a basic hash algorithm, we can deal with this in one of three ways.
Wrap the data around to the next bucket. This is also known as linear probing and is generally considered inefficient
Re-hash with another function. Possible, but may still cause performance issues
Chain another bucket off of the current one, usually in some sorted order. As mentioned above, this chaining can use any data structure. This is typically the best solution
Hash Benefits and Limitations a hash is ideally an O(1) data structure for searches, inserts and deletions. So what are the issues?
Collisions can turn this into an O(N) or O(log N) algorithm depending on how chaining is implemented.
The extra memory utilization, with expensive operations to resize, just as in arrays.
Usually there is no quick way to sequentially access values in sorted order. So hashes are almost always considered an unsorted data structure
Chaining
Chaining means adding buckets off of our hash slots. This can be done via any data structure you wish, from an array to a tree of whatever sort you wish. In general, we are assuming that our depth of chaining is not much, or else our hash is in trouble anyway. Therefore, we tend to implement nothing more advanced than a binary tree to support our chaining
Code Example)
// hashChain.java
// demonstrates hash table with separate chaining
// to run this program: C:>java HashChainApp
import java.io.*;
////////////////////////////////////////////////////////////////
class Link {                                  // (could be other items)
   private int iData;                         // data item
   public Link next;                          // next link in list

   public Link(int it) {                      // constructor
      iData = it;
   }
   public int getKey() {
      return iData;
   }
   public void displayLink() {                // display this link
      System.out.print(iData + " ");
   }
} // end class Link

class SortedList {
   private Link first;                        // ref to first list item

   public SortedList() {                      // constructor (no return type;
      first = null;                           // with "void" it is only a method)
   }

   public void insert(Link theLink) {         // insert link, in order
      int key = theLink.getKey();
      Link previous = null;                   // start at first
      Link current = first;
      // until end of list, or current > key,
      while( current != null && key > current.getKey() ) {
         previous = current;
         current = current.next;              // go to next item
      }
      if(previous==null) {                    // if beginning of list,
         first = theLink;                     //    first --> new link
      } else {                                // not at beginning,
         previous.next = theLink;             //    prev --> new link
      }
      theLink.next = current;                 // new link --> current
   } // end insert()

   public void delete(int key) {              // delete link
      Link previous = null;                   // start at first
      Link current = first;
      // until end of list, or key == current,
      while( current != null && key != current.getKey() ) {
         previous = current;
         current = current.next;              // go to next link
      }
      if(current==null) {                     // key not in list (or list empty):
         return;                              //    nothing to delete
      }
      // disconnect link
      if(previous==null) {                    // if beginning of list
         first = first.next;                  //    delete first link
      } else {                                // not at beginning
         previous.next = current.next;        //    delete current link
      }
   } // end delete()

   public Link find(int key) {                // find link
      Link current = first;                   // start at first
      // until end of list, or key too small,
      while(current != null && current.getKey() <= key) {
         if(current.getKey() == key) {        // is this the link?
            return current;                   // found it, return link
         }
         current = current.next;              // go to next item
      }
      return null;                            // didn't find it
   } // end find()

   public void displayList() {
      System.out.print("List (first-->last): ");
      Link current = first;                   // start at beginning of list
      while(current != null) {                // until end of list,
         current.displayLink();               // print data
         current = current.next;              // move to next link
      }
      System.out.println("");
   }
} // end class SortedList

class HashTable {
   private SortedList[] hashArray;            // array of lists
   private int arraySize;

   public HashTable(int size) {               // constructor
      arraySize = size;
      hashArray = new SortedList[arraySize];  // create array
      for(int j=0; j<arraySize; j++) {        // fill array
         hashArray[j] = new SortedList();     // with lists
      }
   }

   public void displayTable() {
      for(int j=0; j<arraySize; j++) {        // for each cell,
         System.out.print(j + ". ");          // display cell number
         hashArray[j].displayList();          // display list
      }
   }

   public int hashFunc(int key) {             // hash function
      return key % arraySize;
   }

   public void insert(Link theLink) {         // insert a link
      int key = theLink.getKey();
      int hashVal = hashFunc(key);            // hash the key
      hashArray[hashVal].insert(theLink);     // insert at hashVal
   } // end insert()

   public void delete(int key) {              // delete a link
      int hashVal = hashFunc(key);            // hash the key
      hashArray[hashVal].delete(key);         // delete link
   } // end delete()

   public Link find(int key) {                // find link
      int hashVal = hashFunc(key);            // hash the key
      Link theLink = hashArray[hashVal].find(key);  // get link
      return theLink;                         // return link
   }
} // end class HashTable

class HashChainApp {
   public static void main(String[] args) throws IOException {
      int aKey;
      Link aDataItem;
      int size, n, keysPerCell = 100;
      // get sizes
      System.out.print("Enter size of hash table: ");
      size = getInt();
      System.out.print("Enter initial number of items: ");
      n = getInt();
      // make table
      HashTable theHashTable = new HashTable(size);
      for(int j=0; j<n; j++) {                // insert data
         aKey = (int)(java.lang.Math.random() * keysPerCell * size);
         aDataItem = new Link(aKey);
         theHashTable.insert(aDataItem);
      }
      while(true) {                           // interact with user
         System.out.print("Enter first letter of ");
         System.out.print("show, insert, delete, or find: ");
         char choice = getChar();
         switch(choice) {
            case 's':
               theHashTable.displayTable();
               break;
            case 'i':
               System.out.print("Enter key value to insert: ");
               aKey = getInt();
               aDataItem = new Link(aKey);
               theHashTable.insert(aDataItem);
               break;
            case 'd':
               System.out.print("Enter key value to delete: ");
               aKey = getInt();
               theHashTable.delete(aKey);
               break;
            case 'f':
               System.out.print("Enter key value to find: ");
               aKey = getInt();
               aDataItem = theHashTable.find(aKey);
               if(aDataItem != null) {
                  System.out.println("Found " + aKey);
               } else {
                  System.out.println("Could not find " + aKey);
               }
               break;
            default:
               System.out.print("Invalid entry\n");
         } // end switch
      } // end while
   } // end main()

   public static String getString() throws IOException {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      String s = br.readLine();
      return s;
   }

   public static char getChar() throws IOException {
      String s = getString();
      return s.charAt(0);
   }

   public static int getInt() throws IOException {
      String s = getString();
      return Integer.parseInt(s);
   }
} // end class HashChainApp

Special Hashes
Special Hashes are typically used for hard drive access, but can also be adapted to in-memory hashing
Extensible Hashing is a modification of the general hash scheme that can expand and contract its table size via a directory that uses the first d bits (the global depth) of the hashed key to determine which bucket to use. We start with 1 bit and expand from there when we have an overflow (buckets may have N items in them). It gives flexible in-memory performance, but can suffer from odd sequences of data causing frequent expansions
The directory needs to be accessed before we go to the appropriate bucket. Since the directory always has 2^d entries, we can always find the right bucket in O(1) time
When a bucket overflow occurs, the directory gains one bit in depth, which causes a re-organization that is somewhat expensive. Note that only the bucket that overflowed gets split into two (or more) new buckets; all other buckets stay the same
Buckets can be merged as well if the data continues to shrink. However, this is not that common an operation; it is perhaps run once in a while just to check whether the directory can be shrunk
Linear Hashing is another approach to dynamically sizing the hash table, but this time without the need for a directory. When the first overflow occurs in any bucket, the 0th bucket is split in two, with roughly half of its data remaining and half being placed in a new last bucket, the (M + 1)th (regardless of where the overflow occurred). This process repeats, splitting bucket 1 on the next overflow, then bucket 2, and so on, until there are 2M buckets.
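The linear hashing scheme just described can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not code from the original notes: the class and member names (LinearHashSketch, splitPointer, bucketFor and so on) are hypothetical, keys are assumed to be non-negative ints, and a bucket that overflows without being the one split is simply allowed to grow instead of getting a proper overflow chain. It shows how a split pointer plus two hash functions (key mod M·2^level and key mod M·2^(level+1)) locate a key while the table grows from M toward 2M buckets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of linear hashing; assumes non-negative int keys.
class LinearHashSketch {
    private final int initialBuckets;   // M in the text
    private final int bucketCapacity;   // items a bucket holds before it "overflows"
    private int level = 0;              // number of completed doubling rounds
    private int splitPointer = 0;       // next bucket to be split in this round
    private final List<List<Integer>> buckets = new ArrayList<>();

    LinearHashSketch(int initialBuckets, int bucketCapacity) {
        this.initialBuckets = initialBuckets;
        this.bucketCapacity = bucketCapacity;
        for (int i = 0; i < initialBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Buckets before the split pointer were already split this round,
    // so they are addressed with the finer of the two hash functions.
    int bucketFor(int key) {
        int coarse = initialBuckets << level;      // M * 2^level
        int slot = key % coarse;
        if (slot < splitPointer) {
            slot = key % (coarse * 2);             // M * 2^(level+1)
        }
        return slot;
    }

    void insert(int key) {
        List<Integer> bucket = buckets.get(bucketFor(key));
        bucket.add(key);
        if (bucket.size() > bucketCapacity) {      // overflow anywhere...
            splitNext();                           // ...splits the bucket at splitPointer
        }
    }

    private void splitNext() {
        int coarse = initialBuckets << level;
        List<Integer> stay = new ArrayList<>();
        List<Integer> moved = new ArrayList<>();
        for (int key : buckets.get(splitPointer)) {
            if (key % (coarse * 2) == splitPointer) {
                stay.add(key);
            } else {
                moved.add(key);                    // belongs in bucket splitPointer + coarse
            }
        }
        buckets.set(splitPointer, stay);
        buckets.add(moved);                        // new last bucket
        splitPointer++;
        if (splitPointer == coarse) {              // table has doubled: start a new round
            level++;
            splitPointer = 0;
        }
    }

    boolean contains(int key) {
        return buckets.get(bucketFor(key)).contains(key);
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

A caller might create the table with new LinearHashSketch(2, 2), insert a run of keys, and then check membership with contains(key). The number of buckets always equals M·2^level + splitPointer, which is why the finer hash function never addresses a bucket that does not yet exist.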
Multiple Hash Functions are needed because we will have buckets that are "more split" than other buckets as the total number of buckets grows toward 2M. Note that this process repeats with new hash functions when we pass 2M

Chapter 12: Heap, Heap Sort
Heap
A Heap is a binary tree wherein each parent has the greatest value in the subtree consisting of it and its descendants. So the root will always have the greatest value.
Heaps support fixed O(log N) insertion and removal time because a heap can maintain a balanced tree state, regardless of the data inserted or deleted
Heaps are the preferred method of creating a priority queue because they maintain O(log N) performance regardless of data
Heaps can be used to sort an unsorted array in O(N log N) time regardless of data
Heaps are usually implemented as an array for speed, but can be based on linked lists
Heap Properties
Each node of a heap must satisfy the following condition: all its descendants are less than or equal to it. This is also referred to as weakly ordered, as it is less rigid than a binary search tree
A heap is always a complete "tree." As a matter of fact, only the rightmost leaves of the bottom level may be missing
Heap Operations Most heap operations are fairly trivial, except for enqueue and dequeue. Note that we typically avoid fully swapping elements when we percolate; instead we simply shift each node along the path and then insert the final value when it reaches the appropriate level
Enqueue adds an element at the first empty leaf position and then runs a percolate up. This operation moves the element higher and higher in the tree until it is either the root, or until its parent has a value at least as high as it does
Dequeue removes the leading element (the root) from the heap, replaces it with the last leaf's value, and then percolates that value downward.
In this way the highest remaining value moves up into the root and lower values move down the tree
Code Example)
// heap.java
// demonstrates heaps
// to run this program: C>java HeapApp
import java.io.*;
////////////////////////////////////////////////////////////////
class Node {
   private int iData;                         // data item (key)

   public Node(int key) {                     // constructor
      iData = key;
   }
   public int getKey() {
      return iData;
   }
   public void setKey(int id) {
      iData = id;
   }
} // end class Node

class Heap {
   private Node[] heapArray;
   private int maxSize;                       // size of array
   private int currentSize;                   // number of nodes in array

   public Heap(int mx) {                      // constructor
      maxSize = mx;
      currentSize = 0;
      heapArray = new Node[maxSize];          // create array
   }

   public boolean isEmpty() {
      return currentSize==0;
   }

   public boolean insert(int key) {
      if(currentSize==maxSize) {
         return false;
      }
      Node newNode = new Node(key);
      heapArray[currentSize] = newNode;
      trickleUp(currentSize++);
      return true;
   } // end insert()

   public void trickleUp(int index) {
      int parent = (index-1) / 2;
      Node bottom = heapArray[index];
      while( index > 0 && heapArray[parent].getKey() < bottom.getKey() ) {
         heapArray[index] = heapArray[parent];   // move it down
         index = parent;
         parent = (parent-1) / 2;
      } // end while
      heapArray[index] = bottom;
   } // end trickleUp()

   public Node remove() {                     // delete item with max key
      Node root = heapArray[0];               // (assumes non-empty list)
      heapArray[0] = heapArray[--currentSize];
      trickleDown(0);
      return root;
   } // end remove()

   public void trickleDown(int index) {
      int largerChild;
      Node top = heapArray[index];            // save root
      while(index < currentSize/2) {          // while node has at least one child,
         int leftChild = 2*index+1;
         int rightChild = leftChild+1;
         // find larger child
         if ( rightChild < currentSize &&
              heapArray[leftChild].getKey() < heapArray[rightChild].getKey()) {
            largerChild = rightChild;
         } else {
            largerChild = leftChild;
         }
         // top >= largerChild?
         if( top.getKey() >= heapArray[largerChild].getKey() ) {
            break;
         }
         heapArray[index] = heapArray[largerChild];   // shift child up
         index = largerChild;                 // go down
      } // end while
      heapArray[index] = top;                 // root to index
   } // end trickleDown()

   public boolean change(int index, int newValue) {
      if(index<0 || index>=currentSize) {
         return false;
      }
      int oldValue = heapArray[index].getKey();   // remember old
      heapArray[index].setKey(newValue);      // change to new
      if(oldValue < newValue) {               // if raised,
         trickleUp(index);                    //    trickle it up
      } else {                                // if lowered,
         trickleDown(index);                  //    trickle it down
      }
      return true;
   } // end change()

   public void displayHeap() {
      System.out.print("heapArray: ");        // array format
      for(int m = 0; m < currentSize; m++) {
         if(heapArray[m] != null) {
            System.out.print( heapArray[m].getKey() + " ");
         } else {
            System.out.print( "-- ");
         }
      }
      System.out.println();
      // heap format
      int nBlanks = 32;
      int itemsPerRow = 1;
      int column = 0;
      int j = 0;                              // current item
      String dots = "...............................";
      System.out.println(dots+dots);          // dotted top line
      while(currentSize > 0) {                // for each heap item
         if(column == 0) {                    // first item in row?
            for(int k=0; k<nBlanks; k++) {    // preceding blanks
               System.out.print(' ');
            }
         }
         System.out.print(heapArray[j].getKey());   // display item
         if(++j == currentSize) {             // done?
            break;
         }
         if(++column==itemsPerRow) {          // end of row?
            nBlanks /= 2;                     // half the blanks
            itemsPerRow *= 2;                 // twice the items
            column = 0;                       // start over on
            System.out.println();             //    new row
         } else {                             // next item on row
            for(int k=0; k<nBlanks*2-2; k++) {
               System.out.print(' ');         // interim blanks
            }
         }
      } // end while
      System.out.println("\n"+dots+dots);     // dotted bottom line
   } // end displayHeap()
} // end class Heap

class HeapApp {
   public static void main(String[] args) throws IOException {
      int value, value2;
      Heap theHeap = new Heap(31);            // make a Heap; max size 31
      boolean success;
      theHeap.insert(70);                     // insert 10 items
      theHeap.insert(40);
      theHeap.insert(50);
      theHeap.insert(20);
      theHeap.insert(60);
      theHeap.insert(100);
      theHeap.insert(80);
      theHeap.insert(30);
      theHeap.insert(10);
      theHeap.insert(90);
      while(true) {                           // until [Ctrl]-[C]
         System.out.print("Enter first letter of ");
         System.out.print("show, insert, remove, change: ");
         int choice = getChar();
         switch(choice) {
            case 's':                         // show
               theHeap.displayHeap();
               break;
            case 'i':                         // insert
               System.out.print("Enter value to insert: ");
               value = getInt();
               success = theHeap.insert(value);
               if( !success )
                  System.out.println("Can't insert; heap full");
               break;
            case 'r':                         // remove
               if( !theHeap.isEmpty() )
                  theHeap.remove();
               else
                  System.out.println("Can't remove; heap empty");
               break;
            case 'c':                         // change
               System.out.print("Enter current index of item: ");
               value = getInt();
               System.out.print("Enter new key: ");
               value2 = getInt();
               success = theHeap.change(value, value2);
               if( !success )
                  System.out.println("Invalid index");
               break;
            default:
               System.out.println("Invalid entry\n");
         } // end switch
      } // end while
   } // end main()

   public static String getString() throws IOException {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      String s = br.readLine();
      return s;
   }

   public static char getChar() throws IOException {
      String s = getString();
      return s.charAt(0);
   }

   public static int getInt() throws IOException {
      String s = getString();
      return Integer.parseInt(s);
   }
} // end class HeapApp

Tree Based Heaps are the same order as their array-based brethren. However, they are usually harder to navigate. One trick is to store the number of the node as it is created.
Then you can use the binary representation of that node's number to determine the path you need to take (left or right) from the root to get back to that node: numbering the root as 1, skip the leading 1 bit and read each remaining bit as left (0) or right (1). Another is to doubly link your nodes
Heap Sort first converts data into a heap and then removes items one at a time. The conversion process can be accomplished in O(N log N) time, and we already know an item can be removed in O(log N) time, so removing all items in sorted order takes O(N log N) as well
Code Example)
// heapSort.java
// demonstrates heap sort
// to run this program: C>java HeapSortApp
import java.io.*;

class Node {
   private int iData;                         // data item (key)

   public Node(int key) {                     // constructor
      iData = key;
   }
   public int getKey() {
      return iData;
   }
} // end class Node

class Heap {
   private Node[] heapArray;
   private int maxSize;                       // size of array
   private int currentSize;                   // number of items in array

   public Heap(int mx) {                      // constructor
      maxSize = mx;
      currentSize = 0;
      heapArray = new Node[maxSize];
   }

   public Node remove() {                     // delete item with max key
      Node root = heapArray[0];               // (assumes non-empty list)
      heapArray[0] = heapArray[--currentSize];
      trickleDown(0);
      return root;
   } // end remove()

   public void trickleDown(int index) {
      int largerChild;
      Node top = heapArray[index];            // save root
      while(index < currentSize/2) {          // not on bottom row
         int leftChild = 2*index+1;
         int rightChild = leftChild+1;
         // find larger child
         if(rightChild < currentSize &&
            heapArray[leftChild].getKey() < heapArray[rightChild].getKey()) {
            largerChild = rightChild;
         } else {
            largerChild = leftChild;
         }
         // top >= largerChild?
         if(top.getKey() >= heapArray[largerChild].getKey()) {
            break;
         }
         heapArray[index] = heapArray[largerChild];   // shift child up
         index = largerChild;                 // go down
      } // end while
      heapArray[index] = top;                 // root to index
   } // end trickleDown()

   public void displayHeap() {
      int nBlanks = 32;
      int itemsPerRow = 1;
      int column = 0;
      int j = 0;                              // current item
      String dots = "...............................";
      System.out.println(dots+dots);          // dotted top line
      while(currentSize > 0) {                // for each heap item
         if(column == 0) {                    // first item in row?
            for(int k=0; k<nBlanks; k++) {    // preceding blanks
               System.out.print(' ');
            }
         }
         System.out.print(heapArray[j].getKey());   // display item
         if(++j == currentSize) {             // done?
            break;
         }
         if(++column==itemsPerRow) {          // end of row?
            nBlanks /= 2;                     // half the blanks
            itemsPerRow *= 2;                 // twice the items
            column = 0;                       // start over on
            System.out.println();             //    new row
         } else {                             // next item on row
            for(int k=0; k<nBlanks*2-2; k++) {
               System.out.print(' ');         // interim blanks
            }
         }
      } // end while
      System.out.println("\n"+dots+dots);     // dotted bottom line
   } // end displayHeap()

   public void displayArray() {
      for(int j=0; j<maxSize; j++) {
         System.out.print(heapArray[j].getKey() + " ");
      }
      System.out.println("");
   }

   public void insertAt(int index, Node newNode) {
      heapArray[index] = newNode;
   }

   public void incrementSize() {
      currentSize++;
   }
} // end class Heap

class HeapSortApp {
   public static void main(String[] args) throws IOException {
      int size, j;
      System.out.print("Enter number of items: ");
      size = getInt();
      Heap theHeap = new Heap(size);
      for(j=0; j<size; j++) {                 // fill array with random nodes
         int random = (int)(java.lang.Math.random()*100);
         Node newNode = new Node(random);
         theHeap.insertAt(j, newNode);
         theHeap.incrementSize();
      }
      System.out.print("Random: ");
      theHeap.displayArray();                 // display random array
      for(j=size/2-1; j>=0; j--) {
         theHeap.trickleDown(j);              // make random array into heap
      }
      System.out.print("Heap:   ");
      theHeap.displayArray();                 // display heap array
      theHeap.displayHeap();                  // display heap
      for(j=size-1; j>=0; j--) {              // remove from heap and store at array end
         Node biggestNode = theHeap.remove();
         theHeap.insertAt(j, biggestNode);
      }
      System.out.print("Sorted: ");
      theHeap.displayArray();                 // display sorted array
   } // end main()

   public static String getString() throws IOException {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      String s = br.readLine();
      return s;
   }

   public static int getInt() throws IOException {
      String s = getString();
      return Integer.parseInt(s);
   }
} // end class HeapSortApp

Chapter 13: Graphs, Searches, Spanning Trees, Sorting
Graphs
A graph consists of nodes and edges that connect nodes. It is very much like a tree except that it does not need to be limited to a hierarchical form. Nodes can have edges that loop back around, and edges can allow travel in both directions (or only one direction if you so wish)
Vertex is a node in the graph. In many cases we also track, with a Boolean, whether that node has been visited yet for a given algorithm
Edge (Arc) is a path connecting two nodes together
Undirected Graph is a graph in which edges can be traversed in either direction
Directed Graph (Digraph) is a graph in which each edge can be traversed in only one direction
Acyclic describes a graph that has NO loops in it, that is, no way to start at a node and end up at the same node by following some combination of edges
Cyclic describes a graph that has at least one way of looping
Weight (Weighted Edge) is a value given to an edge to make it more/less easily traveled in relation to other edges
Adjacent Vertices are vertices connected by an edge
Path is a series of edges that connect any two vertices
Shortest Path is the path of least total weight (or fewest edges, if unweighted) between any two vertices
Minimal Spanning Tree is the set of edges with least total weight (or fewest total edges) that connects every vertex in the graph
Keeping Track of Vertices and Edges is typically done with either an adjacency matrix or an adjacency list. They both have pros and cons depending on what your requirements are
An Adjacency Matrix is an N by N matrix showing which of the N vertices are connected to which of the N vertices. Each entry in the matrix is either a 1 or 0, true or false, or a weight for a weighted graph. Note that the vertices serve as indices into the two-dimensional matrix
Benefits include generally faster access for both looking up and modifying edge data
Drawbacks include N^2 memory usage and lack of inexpensive dynamic growth should you wish to change your graph on the fly
Example)
      A  B  C
   A  -  1  1
   B  1  -  -
   C  1  -  -
An Adjacency List is an array (or list) of N entries with chained nodes that represent the vertices that can be reached from a given vertex
Benefits include memory usage proportional to the number of vertices plus edges, and the possibility of being fully dynamic.
Drawbacks include generally slower access for both looking up and modifying edge data. Even putting a tree on the links will likely not help much unless the graph is fairly big and close to fully connected
Example)
   A ->B->C
   B ->A
   C ->B
Vertices are typically composed of a name and some visited Boolean value that indicates if we have seen this node before
Example Code)
class Vertex {
   public char label;
   public boolean wasVisited;

   public Vertex(char cLabel) {
      label = cLabel;
      wasVisited = false;
   }
}

Searches
Searches are typically done in one of two ways: breadth first search or depth first search. They both end up visiting all vertices of a graph, but the order in which they do so is dramatically different
A Depth First Search (DFS) uses a stack to roam the graph. By using a stack we ensure that the most recent additions are the first ones to be dealt with; thus the most recently pushed vertex will be the first to be processed by a DFS.
It visits each of the N vertices once (the adjacency-matrix version below scans a whole matrix row per vertex, so it costs O(N^2) overall) and generally works as follows
1) Prime the pump by pushing the starting node (pick one) onto the stack
2) While the stack is not empty pop a node
3) Mark it visited
4) Push its children on the stack if they are not visited
Code Example)
// dfs.java
// demonstrates depth-first search
// to run this program: C>java DFSApp
////////////////////////////////////////////////////////////////
class StackX {
   private final int SIZE = 20;
   private int[] st;
   private int top;

   public StackX() {                          // constructor
      st = new int[SIZE];                     // make array
      top = -1;
   }
   public void push(int j) {                  // put item on stack
      st[++top] = j;
   }
   public int pop() {                         // take item off stack
      return st[top--];
   }
   public int peek() {                        // peek at top of stack
      return st[top];
   }
   public boolean isEmpty() {                 // true if nothing on stack
      return (top == -1);
   }
} // end class StackX

class Vertex {
   public char label;                         // label (e.g. 'A')
   public boolean wasVisited;

   public Vertex(char lab) {                  // constructor
      label = lab;
      wasVisited = false;
   }
} // end class Vertex

class Graph {
   private final int MAX_VERTS = 20;
   private Vertex vertexList[];               // list of vertices
   private int adjMat[][];                    // adjacency matrix
   private int nVerts;                        // current number of vertices
   private StackX theStack;

   public Graph() {                           // constructor
      vertexList = new Vertex[MAX_VERTS];
      // adjacency matrix
      adjMat = new int[MAX_VERTS][MAX_VERTS];
      nVerts = 0;
      for(int y=0; y<MAX_VERTS; y++) {        // set adjacency
         for(int x=0; x<MAX_VERTS; x++) {     //    matrix to 0
            adjMat[x][y] = 0;
         }
      }
      theStack = new StackX();
   } // end constructor

   public void addVertex(char lab) {
      vertexList[nVerts++] = new Vertex(lab);
   }

   public void addEdge(int start, int end) {
      adjMat[start][end] = 1;
      adjMat[end][start] = 1;
   }

   public void displayVertex(int v) {
      System.out.print(vertexList[v].label);
   }

   public void dfs() {                        // depth-first search
      // begin at vertex 0
      vertexList[0].wasVisited = true;        // mark it
      displayVertex(0);                       // display it
      theStack.push(0);                       // push it
      while( !theStack.isEmpty() ) {          // until stack empty,
         // get an unvisited vertex adjacent to stack top
         int v = getAdjUnvisitedVertex( theStack.peek() );
         if(v == -1) {                        // if no such vertex,
            theStack.pop();
         } else {                             // if it exists,
            vertexList[v].wasVisited = true;  // mark it
            displayVertex(v);                 // display it
            theStack.push(v);                 // push it
         }
      } // end while
      // stack is empty, so we're done
      for(int j=0; j<nVerts; j++) {           // reset flags
         vertexList[j].wasVisited = false;
      }
   } // end dfs

   // returns an unvisited vertex adj to v
   public int getAdjUnvisitedVertex(int v) {
      for(int j=0; j<nVerts; j++) {
         if(adjMat[v][j]==1 && vertexList[j].wasVisited==false) {
            return j;
         }
      }
      return -1;
   } // end getAdjUnvisitedVertex()
} // end class Graph

class DFSApp {
   public static void main(String[] args) {
      Graph theGraph = new Graph();
      theGraph.addVertex('A');                // 0 (start for dfs)
      theGraph.addVertex('B');                // 1
      theGraph.addVertex('C');                // 2
      theGraph.addVertex('D');                // 3
      theGraph.addVertex('E');                // 4
      theGraph.addEdge(0, 1);                 // AB
      theGraph.addEdge(1, 2);                 // BC
      theGraph.addEdge(0, 3);                 // AD
      theGraph.addEdge(3, 4);                 // DE
      System.out.print("Visits: ");
      theGraph.dfs();                         // depth-first search
      System.out.println();
   } // end main()
} // end class DFSApp

A Breadth First Search (BFS) uses a queue to roam the graph. By using a queue we ensure that the oldest additions are the first ones to be dealt with; thus the earliest enqueued vertex will be the first to be processed by a BFS.
It is an O(N) algorithm that generally works as follows 1) Prime the pump by enqueing the starting node (pick one) onto the queue 2) While the queue is not empty dequeue a node 3) Mark it visited 4) Enqueue its children on the stack if they are not visited Code Example) // bfs.java // demonstrates breadth-first search // to run this program: C>java BFSApp //////////////////////////////////////////////////////////////// class Queue { private final int SIZE = 20; private int[] queArray; private int front; private int rear; public Queue() // constructor { queArray = new int[SIZE]; front = 0; rear = -1; } public void insert(int j) // put item at rear of queue { if(rear == SIZE-1) { rear = -1; } queArray[++rear] = j; } 66 public int remove() // take item from front of queue { int temp = queArray[front++]; if(front == SIZE) { front = 0; } return temp; } public boolean isEmpty() // true if queue is empty { return ( rear+1==front || (front+SIZE-1==rear) ); } } // end class Queue class Vertex { public char label; // label (e.g. 
'A') public boolean wasVisited; public Vertex(char lab) { label = lab; wasVisited = false; } } // constructor // end class Vertex class Graph { private final int MAX_VERTS = 20; private Vertex vertexList[]; // list of vertices private int adjMat[][]; // adjacency matrix private int nVerts; // current number of vertices private Queue theQueue; public Graph() // constructor { vertexList = new Vertex[MAX_VERTS]; // adjacency matrix adjMat = new int[MAX_VERTS][MAX_VERTS]; 67 nVerts = 0; for(int j=0; j<MAX_VERTS; j++) { for(int k=0; k<MAX_VERTS; k++) { adjMat[j][k] = 0; } } theQueue = new Queue(); } // end constructor // set adjacency // matrix to 0 public void addVertex(char lab) { vertexList[nVerts++] = new Vertex(lab); } public void addEdge(int start, int end) { adjMat[start][end] = 1; adjMat[end][start] = 1; } public void displayVertex(int v) { System.out.print(vertexList[v].label); } public void bfs() // breadth-first search { // begin at vertex 0 vertexList[0].wasVisited = true; // mark it displayVertex(0); // display it theQueue.insert(0); // insert at tail int v2; while( !theQueue.isEmpty() ) // until queue empty, { int v1 = theQueue.remove(); // remove vertex at head // until it has no unvisited neighbors while( (v2=getAdjUnvisitedVertex(v1)) != -1 ) { // get one, vertexList[v2].wasVisited = true; // mark it displayVertex(v2); // display it theQueue.insert(v2); // insert it } // end while } // end while(queue not empty) 68 // queue is empty, so we're done for(int j=0; j<nVerts; j++) { vertexList[j].wasVisited = false; } } // end bfs() // reset flags // returns an unvisited vertex adj to v public int getAdjUnvisitedVertex(int v) { for(int j=0; j<nVerts; j++) { if(adjMat[v][j]==1 && vertexList[j].wasVisited==false) { return j; } } return -1; } // end getAdjUnvisitedVertex() } // end class Graph class BFSApp { public static void main(String[] args) { Graph theGraph = new Graph(); theGraph.addVertex('A'); // 0 (start for bfs) theGraph.addVertex('B'); // 1 
        theGraph.addVertex('C');    // 2
        theGraph.addVertex('D');    // 3
        theGraph.addVertex('E');    // 4

        theGraph.addEdge(0, 1);     // AB
        theGraph.addEdge(1, 2);     // BC
        theGraph.addEdge(0, 3);     // AD
        theGraph.addEdge(3, 4);     // DE

        System.out.print("Visits: ");
        theGraph.bfs();             // breadth-first search
        System.out.println();
    } // end main()
} // end class BFSApp

Spanning Trees

A Spanning Tree is a set of edges that connects all the vertices of a graph. Typically we are concerned with the Minimal Spanning Tree (MST), which is a smallest such set of edges (N-1 edges for a connected graph of N vertices). There are usually several such possibilities, but any one MST answer is as good as another.

The Minimal Spanning Tree Algorithm can be based upon either a depth-first search or a breadth-first search. In either case it is an O(N) algorithm. The primary difference from a plain search is in how we output the answer, because an MST produces an edge list instead of a vertex list.

Code Example)

// dfs.java
// demonstrates a minimum spanning tree built with depth-first search
// to run this program: C>java MSTApp
////////////////////////////////////////////////////////////////
class StackX {
    private final int SIZE = 20;
    private int[] st;
    private int top;

    public StackX() {                   // constructor
        st = new int[SIZE];             // make array
        top = -1;
    }

    public void push(int j) {           // put item on stack
        st[++top] = j;
    }

    public int pop() {                  // take item off stack
        return st[top--];
    }

    public int peek() {                 // peek at top of stack
        return st[top];
    }

    public boolean isEmpty() {          // true if nothing on stack
        return (top == -1);
    }
} // end class StackX

class Vertex {
    public char label;                  // label (e.g. 'A')
    public boolean wasVisited;

    public Vertex(char lab) {           // constructor
        label = lab;
        wasVisited = false;
    }
} // end class Vertex

class Graph {
    private final int MAX_VERTS = 20;
    private Vertex vertexList[];        // list of vertices
    private int adjMat[][];             // adjacency matrix
    private int nVerts;                 // current number of vertices
    private StackX theStack;

    public Graph() {                    // constructor
        vertexList = new Vertex[MAX_VERTS];
        adjMat = new int[MAX_VERTS][MAX_VERTS];   // adjacency matrix
        nVerts = 0;
        for (int y=0; y<MAX_VERTS; y++) {         // set adjacency
            for (int x=0; x<MAX_VERTS; x++) {     // matrix to 0
                adjMat[x][y] = 0;
            }
        }
        theStack = new StackX();
    } // end constructor

    public void addVertex(char lab) {
        vertexList[nVerts++] = new Vertex(lab);
    }

    public void addEdge(int start, int end) {
        adjMat[start][end] = 1;
        adjMat[end][start] = 1;
    }

    public void displayVertex(int v) {
        System.out.print(vertexList[v].label);
    }

    public void mst() {                 // minimum spanning tree (depth first)
        // start at 0
        vertexList[0].wasVisited = true;    // mark it
        theStack.push(0);                   // push it
        while ( !theStack.isEmpty() ) {     // until stack empty
            // get stack top
            int currentVertex = theStack.peek();
            // get next unvisited neighbor
            int v = getAdjUnvisitedVertex(currentVertex);
            if (v == -1) {                  // if no more neighbors
                theStack.pop();             // pop it away
            } else {                        // got a neighbor
                vertexList[v].wasVisited = true;    // mark it
                theStack.push(v);                   // push it
                // display edge
                displayVertex(currentVertex);       // from currentV
                displayVertex(v);                   // to v
                System.out.print(" ");
            }
        } // end while(stack not empty)

        // stack is empty, so we're done
        for (int j=0; j<nVerts; j++) {      // reset flags
            vertexList[j].wasVisited = false;
        }
    } // end mst()

    // returns an unvisited vertex adj to v
    public int getAdjUnvisitedVertex(int v) {
        for (int j=0; j<nVerts; j++) {
            if (adjMat[v][j]==1 && vertexList[j].wasVisited==false) {
                return j;
            }
        }
        return -1;
    } // end getAdjUnvisitedVertex()
} // end class Graph

class MSTApp {
    public static void main(String[] args) {
        Graph theGraph = new
            Graph();
        theGraph.addVertex('A');    // 0 (start for mst)
        theGraph.addVertex('B');    // 1
        theGraph.addVertex('C');    // 2
        theGraph.addVertex('D');    // 3
        theGraph.addVertex('E');    // 4

        theGraph.addEdge(0, 1);     // AB
        theGraph.addEdge(0, 2);     // AC
        theGraph.addEdge(0, 3);     // AD
        theGraph.addEdge(0, 4);     // AE
        theGraph.addEdge(1, 2);     // BC
        theGraph.addEdge(1, 3);     // BD
        theGraph.addEdge(1, 4);     // BE
        theGraph.addEdge(2, 3);     // CD
        theGraph.addEdge(2, 4);     // CE
        theGraph.addEdge(3, 4);     // DE

        System.out.print("Minimum spanning tree: ");
        theGraph.mst();             // minimum spanning tree
        System.out.println();
    } // end main()
} // end class MSTApp

Topological Sorting

Topological Sorting (Critical Path Analysis) is the process of determining which order things should be done in, given a directed acyclic graph. The algorithm is destructive in that it ends up eliminating the graph to get an answer. The adjacency-matrix version below does O(N^2) work for each vertex it removes, and works as follows:

1) So long as any vertices remain in the graph, find a vertex with no outgoing edges (no successors)
2) Add this vertex to the beginning of the answer list
3) Remove this vertex from the graph, go back to step one

Code Example)

// topo.java
// demonstrates topological sorting
// to run this program: C>java TopoApp
////////////////////////////////////////////////////////////////
class Vertex {
    public char label;                  // label (e.g. 'A')

    public Vertex(char lab) {           // constructor
        label = lab;
    }
} // end class Vertex

class Graph {
    private final int MAX_VERTS = 20;
    private Vertex vertexList[];        // list of vertices
    private int adjMat[][];             // adjacency matrix
    private int nVerts;                 // current number of vertices
    private char sortedArray[];

    public Graph() {                    // constructor
        vertexList = new Vertex[MAX_VERTS];
        adjMat = new int[MAX_VERTS][MAX_VERTS];   // adjacency matrix
        nVerts = 0;
        for (int j=0; j<MAX_VERTS; j++) {         // set adjacency
            for (int k=0; k<MAX_VERTS; k++) {     // matrix to 0
                adjMat[j][k] = 0;
            }
        }
        sortedArray = new char[MAX_VERTS];  // sorted vert labels
    } // end constructor

    public void addVertex(char lab) {
        vertexList[nVerts++] = new Vertex(lab);
    }

    public void addEdge(int start, int end) {
        adjMat[start][end] = 1;         // directed: start -> end only
    }

    public void displayVertex(int v) {
        System.out.print(vertexList[v].label);
    }

    public void topo() {                // topological sort
        int orig_nVerts = nVerts;       // remember how many verts
        while (nVerts > 0) {            // while vertices remain,
            // get a vertex with no successors, or -1
            int currentVertex = noSuccessors();
            if (currentVertex == -1) {  // must be a cycle
                System.out.println("ERROR: Graph has cycles");
                return;
            }
            // insert vertex label in sorted array (start at end)
            sortedArray[nVerts-1] = vertexList[currentVertex].label;
            deleteVertex(currentVertex);    // delete vertex
        } // end while

        // vertices all gone; display sortedArray
        System.out.print("Topologically sorted order: ");
        for (int j=0; j<orig_nVerts; j++) {
            System.out.print( sortedArray[j] );
        }
        System.out.println("");
    } // end topo

    public int noSuccessors() {         // returns vert with no successors
                                        // (or -1 if no such verts)
        boolean isEdge;                 // edge from row to column in adjMat
        for (int row=0; row<nVerts; row++) {        // for each vertex,
            isEdge = false;                         // check edges
            for (int col=0; col<nVerts; col++) {
                if ( adjMat[row][col] > 0 ) {       // if edge to another,
                    isEdge = true;                  // this vertex has a successor
                    break;                          // try another
                }
            }
            if ( !isEdge ) {                        // if no edges,
                return row;                         // has no successors
            }
        }
        return -1;                                  // no such vertex
    } // end noSuccessors()

    public void deleteVertex(int delVert) {
        if (delVert != nVerts-1) {      // if not last vertex,
            // delete from vertexList
            for (int j=delVert; j<nVerts-1; j++) {
                vertexList[j] = vertexList[j+1];
            }
            // delete row from adjMat
            for (int row=delVert; row<nVerts-1; row++) {
                moveRowUp(row, nVerts);
            }
            // delete col from adjMat
            for (int col=delVert; col<nVerts-1; col++) {
                moveColLeft(col, nVerts-1);
            }
        }
        nVerts--;                       // one less vertex
    } // end deleteVertex

    private void moveRowUp(int row, int length) {
        for (int col=0; col<length; col++) {
            adjMat[row][col] = adjMat[row+1][col];
        }
    }

    private void moveColLeft(int col, int length) {
        for (int row=0; row<length; row++) {
            adjMat[row][col] = adjMat[row][col+1];
        }
    }
} // end class Graph

class TopoApp {
    public static void main(String[] args) {
        Graph theGraph = new Graph();
        theGraph.addVertex('A');    // 0
        theGraph.addVertex('B');    // 1
        theGraph.addVertex('C');    // 2
        theGraph.addVertex('D');    // 3
        theGraph.addVertex('E');    // 4
        theGraph.addVertex('F');    // 5
        theGraph.addVertex('G');    // 6
        theGraph.addVertex('H');    // 7

        theGraph.addEdge(0, 3);     // AD
        theGraph.addEdge(0, 4);     // AE
        theGraph.addEdge(1, 4);     // BE
        theGraph.addEdge(2, 5);     // CF
        theGraph.addEdge(3, 6);     // DG
        theGraph.addEdge(4, 6);     // EG
        theGraph.addEdge(5, 7);     // FH
        theGraph.addEdge(6, 7);     // GH

        theGraph.topo();            // do the sort
    } // end main()
} // end class TopoApp

Chapter 14: Java 1.5 Features and Issues

Generics

A Generic (Template) type allows you to create classes that work with objects of arbitrary type. The types are chosen at instantiation time, where you are required to supply them. You can have as many type parameters as you wish in a class, although frequently one is enough.

The Benefits of Generics primarily focus on code reuse. By allowing for any type I could create a stack that would hold any data type (objects directly, primitives through their wrapper classes) without having to change one line of code.
This will obviously greatly expand code reuse, as you will now only have to write one binary tree, one red-black tree, etc. Or, at least that is the theory.

The Big Issue with generics is that any operation you perform on your data must be supported by all the data types that you use your generic on. This can be problematic in Java as (for example) == works one way for primitives and another for objects: it compares values for primitives but references for objects.

The Differences from Collections are that a generic is an actual type you can specify, useful anywhere a type is used. A collection, on the other hand, is a data structure that can hold any type(s) of objects. The problem with collections is that you have to test and cast their contents back and forth, which is slow at best with primitives, as you have to box them up in wrapper objects and then un-box them.

Code Example)

class MyList<Type> {
    //...
    boolean insert(Type item) {
        //...
    }

    Type remove() {
        //...
    }
}

class Test {
    public static void main(String[] args) {
        MyList<String> list = new MyList<String>();
        list.insert("Hi");
        //...
    }
}

Autoboxing and Unboxing

Autoboxing means that a primitive will be wrapped in a wrapper object when you provide one to a template type or collection that requires objects. Such values are unboxed automatically when they are removed from such a structure as well.

The Good News is that you can now store primitive values in generics and collections without writing the wrapping code yourself (the type parameter is still the wrapper class, e.g. Integer rather than int).

The Bad News is that Java must instantiate and garbage collect each of these wrapper objects. That means a serious performance hit from 'hidden' instantiations and GC activity. While I cannot say how efficiently they will implement this, I know it is an issue under C#, as they have done it.

Operator Overloading (currently missing in action)

Operator Overloading allows you to specify how a given mathematical or logical operator will work on your class.
In this way you could add the + operator for fractions so that you could type C = A + B; as code instead of A.add(B,C) or some such. There are two advantages to the operator approach:

1) By using the standard operators there will never be any doubt or confusion as to what or how to handle operations on your class. For example, implement +, and everyone will know how to use it because the notation is standard. Otherwise someone may implement LT, another lessthan, another LessThan, etc. Standard operators result in compatible code.
2) Integration of primitives and objects. I can now write a + method in fraction that will take an int and a fraction, so now I can write C = 2 + A; if I so wish. This is of course very clean and intuitive to use. This also would/will allow generics to work whether you want to use primitives or objects, as the primitives MUST use the standard operators, and the objects (previously) could never use the standard operators. Now each could use the same code.

Example Code) NA at this time

Collections (part of Java 1.2+ actually)

Collections are a bit like generics in that a collection can store any sort of object, including auto-boxed primitives (now). We have avoided collections so far simply because they can be slow with boxing, but they do provide a really easy way to work with a variety of data types.

Notes on Collections

Collections store things of type Object, so you may need to down-cast to your type to access the methods or field-level variables of your class. Otherwise only things that all objects can do are supported.

Collections arrange the objects they store according to several schemes. These include HashMap (key/value pairs, unordered), TreeMap (key/value pairs, sorted by key), HashSet (unordered, no duplicates), List (kept in insertion order), etc. You must specify this on instantiation.
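The test-and-down-cast step described in the notes above might look something like the following. This is only a sketch: the class and method names are made up for illustration, and the list is declared as List<Object> (an assumption on my part) so that it compiles without warnings under Java 1.5 while still handing back plain Objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; names are illustrative, not from the course code.
class DowncastSketch {

    // Adds up the lengths of every String in the list,
    // skipping anything that is not a String.
    static int totalStringLength(List<Object> items) {
        int total = 0;
        for (Iterator<Object> it = items.iterator(); it.hasNext(); ) {
            Object o = it.next();          // the collection hands back a plain Object
            if (o instanceof String) {     // test before casting
                String s = (String) o;     // down-cast to reach String methods
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Object> items = new ArrayList<Object>();
        items.add("Max");
        items.add("Bailey");
        items.add(Integer.valueOf(7));     // a different type in the same collection
        System.out.println(totalStringLength(items));
    }
}
```

Without the instanceof test, the cast on the Integer element would throw a ClassCastException at run time, which is the kind of mistake generics move to compile time.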
Code Example)

import java.util.*;

public class CollectionTest {
    // Statics
    public static void main( String [] args ) {
        System.out.println( "Collection Test" );

        // Create a collection
        HashSet collection = new HashSet();

        // Adding
        String dog1 = "Max", dog2 = "Bailey", dog3 = "Harriet";
        collection.add( dog1 );
        collection.add( dog2 );
        collection.add( dog3 );

        // Sizing
        System.out.println( "Collection created" +
            ", size=" + collection.size() +
            ", isEmpty=" + collection.isEmpty() );

        // Containment
        System.out.println( "Collection contains " + dog3 +
            ": " + collection.contains( dog3 ) );

        // Iteration. Iterator supports hasNext, next, remove
        System.out.println( "Collection iteration (unsorted):" );
        Iterator iterator = collection.iterator();
        while ( iterator.hasNext() ) {
            System.out.println( " " + iterator.next() );
        }

        // Removing
        collection.remove( dog1 );
        collection.clear();
    }
}

Enhanced Iterators

An Iterator allows you to walk through a container of objects or an array in an easy manner. For example:

Current Source Code)

// Current for loop (c is some collection of Strings)
for (Iterator i = c.iterator(); i.hasNext(); ) {
    String s = (String) i.next();
    //...
}

// Now, with generics:
for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
    String s = i.next();
    // ...
}

// Now with the new iterator (a is an array of int)
int sum = 0;
for (int e : a) {       // read: for each item e in a
    sum += e;
}

Type Safe Enumerated

An Enumerated Type is one for which you define the range of values that it can hold. These could stand for some numbers, strings, etc. It is all up to you. The nice feature of Java is that these are type safe, meaning they will not allow mis-assignments. Apparently they are almost as quick as the C/C++ implementation of enums, which uses integer constants and is not type safe in the least. Schweet.

Tentative Code Example)

public enum Suit { clubs, diamonds, hearts, spades }

Suit s;
s = 1;                  // error, type violation
s = Suit.clubs;         // okay
if ( s == Suit.clubs ) {    // okay
    // …
}
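The tentative enum fragment above can be fleshed out into something that actually compiles. The following is a sketch only: EnumSketch and its color method are hypothetical names invented for the illustration, and the suit-to-colour mapping is just a convenient thing to switch on.

```java
// Hypothetical names throughout; a sketch, not the course's code.
enum Suit { clubs, diamonds, hearts, spades }

class EnumSketch {

    // Maps a suit to its colour; only Suit values are accepted here.
    static String color(Suit s) {
        switch (s) {
            case diamonds:
            case hearts:
                return "red";
            default:
                return "black";
        }
    }

    public static void main(String[] args) {
        Suit s = Suit.clubs;                // okay
        // Suit bad = 1;                    // would not compile: type violation
        if (s == Suit.clubs) {              // == compares enum constants safely
            System.out.println(s + " is " + color(s));
        }
        for (Suit each : Suit.values()) {   // enhanced for loop over all constants
            System.out.println(each.ordinal() + " " + each);
        }
    }
}
```

Note that the constants are always referred to through the type (Suit.clubs), except inside a switch on a Suit value, where the bare constant name is used.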