Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lists, Hash Tables, Trees CS 1302 Fall 1999 Contents of Lecture • Linked lists – Linked lists in general – What classes are needed to implement? – Code example – Linked lists versus arrays: when to use which? • Hash table basics • Binary search trees Linked Lists in Java We will now show the basics for a Java linked list holding data elements of some specific class. Our example: • A linked list of class RingList. It’s a class, so we may create any number of instances of RingList. • Any and all objects of class Ring contain nodes that are of class RingNode which manages data about a single ring. • Note that the list itself is one class, the items in that list are of another class, and the program using them is a 3rd class. head 1 instance of RingList width width 2 instances of RingNode The RingNode Class class RingNode { width private int width; private RingNode next; public RingNode (int width) { this.width = width; next = null; // reserved word for // nonexistent obj } public void setNext (RingNode next) { this.next = next; } public RingNode getNext () { return (next); } public void print () { System.out.println (title); } }//class BookNode class RingList { private RingNode head; The List Class head RingList () { head = null; } public void add (RingNode newRing) { RingNode currRing; if (head == null) head = newRing; else { currRing = head; while (currRing.getNext() != null) currRing = currRing.getNext(); currRing.setNext (newRing); } }//add public void print () { printHelper (head); }//print private void printHelper (RingNode probe) { if (probe != null) { probe.print (); printHelper (probe.getNext()); } } //printHelper }//class RingList The class that uses the list public class ListUser { public static void main (String args[ ]) { RingList rings = new RingList (); /* the RingList.add method can do the "new RingNode” for you. Which is better, do you think? */ rings.add (new RingNode (6)); rings.add (new RingNode (5)); rings.add (new RingNode (4)); rings.add (new RingNode (3)); rings.print (); }//main head }//class ListUser rings 6 5 Using Linked Lists in Java Three crucial things to remember: 1. No pointers in Java. Manipulate references • main difference in code from 1301/1501: no need for de-referencing operator ( ^ ) 2. No direct access to data: • assignment of references only • data access via accessor/modifier methods 3. Multiple classes required: • A type of node is defined in one class. • A list that uses nodes is defined in another class. • The program that uses the list is in yet another class. Q: Why bother with different Collection types? Designing is largely choosing among alternatives based on design goals Arrays vs. Linked Lists Same Cost/Benefit Trade-offs as in Pseudocode: Arrays are statically sized, so you have to commit to length in advance Linked lists are dynamically sized, so you can hedge your bets Arrays require O(N) work to insert at front (need to shift rest) Linked lists require O(1) (constant) work (need to “splice” in place) Arrays provide random access to each data element Linked lists require that you traverse list to get to search node Arrays require space for data objects Linked lists also require space for “next” references Linked Structures & Dynamic Memory Allocation: Summary Points Problems with static memory allocation • you run out of memory if you underestimate • you waste memory if you overestimate • you have to manage the memory manually (e.g., closing gaps in the data structure) Dynamic memory allocation • all data (objects) is allocated in the heap; you work with reference to those objects • you request new memory when you need it (using "new") • java collects memory that you don't need any more and returns it to the heap (garbage collection) Examples • linked lists, ordered lists, queues, stacks, doubly-linked lists, etc. • trees, binary trees, binary search trees, height-balanced binary search trees, graphs (to come) Lists: Summary • Linked lists – Class that uses the list (e.g. driver) – Class representing the list itself (NodeList) – Class representing nodes (Node) • One or more data fields • One field is “pointer” to next node: type is Node – null is “zero”/non-existent object • Never reference a null object!!! • Lists -v- arrays: Design as choosing among options – Early commitment to length in arrays – Insertion cost constant for lists – Space overhead for next field in list • • Hashtable basics Binary search trees Contents of Lecture • Linked lists • Hash table basics – Purpose – Hash functions and collisions – Collision resolution strategies • Binary search trees Hashing Problem: How to quickly access items in a list? Possible Solution: Hashing Idea: Shrink the address space to fit the population size. For example: Use some function to reduce the address space of a billion possible SocSecNums to the population size of 100 students. Hash Function: The function by which you shrink the address space, e.g, index = SocSecNum % 100 Hash Functions The Perfect Hash Function: • would be very fast (used for all data access) • would return a unique result for each key, i.e., would result in zero collisions • in general case, perfect hash doesn’t exist (we can create one for a specific population, but as soon as that population changes... ) • provides an ideal point of reference Common Hash Functions: • Digit selection: e.g., last 4 of phone num • Division: modulo •Character keys: use ASCII num values for chars (e.g., ‘R’ is 82) Cost of Hash Two costs of hashing: 1. loss of natural order • side effect of desired random shrinking • lose any ordering of original indices 2. collision will occur • no perfect hash function • when (not “if”) collision, how to handle it? Collision Resolution strategies: • Multiple record buckets: small for each index, but . . . • Open address methods: look for next open address, but . . . • Coalesced chaining: use cellar for overflow (~34..40% of size) • External chaining: linked list at each location Collision Resolution Technique: Multiple element buckets • Idea: have extra spaces there for overflow • if population of 8, and if hash fuction of mod 8, then: 1st hash 1st 2nd collision collision 0 1 2 3 4 5 6 7 Problems: using 3N space; “what if 3rd collision at any one locale?” Collision Resolution Technique: Open address methods • Idea: upon collision, look for an empty spot • if population of 8, and if hash fuction of mod 8 • Assume data items arrived in the order: W, X, Y, Z, A, B, C, D 0 1 2 3 D hashes to 2 W hashes to 1 C hashes to 1 X hashes to 3 4 5 6 7 Y hashes to 4 Z hashes to 3 A hashes to 6 B hashes to 5 D belongs at 2, but C already there W already at 1, so C to next available slot X already at 3, so Z to next available slot B belongs at 5, but Z already there Problem: Deteriorates to an unsorted list (e.g., O(N) ) Collision Resolution Cellar Technique: Coalesced chaining: • Idea: have small extra “cellar” to handle collision • if population of 8, and if hash fuction of mod 8 • Assume data items arrived in the order: W, X, Y, Z, A, B, C, D 0 1 W hashes to 1 2 D hashes to 2 3 4 5 X hashes to 3 Y hashes to 4 B hashes to 5 6 7 8 9 10 A hashes to 6 C hashes to 1 Z hashes to 3 9 10 Works well with cellar of 35..40% of N if good hash function; cellar can overflow if need be Cellar bottom is now 8 Collision Resolution Technique: External chaining: • Idea: have pointers to all items at given hash, handle collision as normal event. • if population of 8, and if hash fuction of mod 8 • Assume data items arrived in the order: W, X, Y, Z, A, B, C, D 0 1 2 3 4 5 6 7 W hashes to 1 D hashes to 2 X hashes to 3 Y hashes to 4 B hashes to 5 A hashes to 6 C hashes to 1 Z hashes to 3 Hashing with Chaining: Example public class HashChain { private Node[] bucket; private int TableSize; public HashChain(int TableSize) { this.TableSize = TableSize; bucket = new Node[TableSize]; for (int i=0; i< TableSize; i++) bucket[i] = new Node(); } // HashChain private int getHashKey(int newElement) { return newElement % TableSize; } // getHashKey public void addElement (int newElement) { int index = getHashKey(newElement); bucket[index] .insertNode(newElement); } //addElement public Node getElement(int iData) { int index = getHashKey(iData); Node item = bucket[index] .locateNode(iData); return item; } // getElement } // HashChain public Node locateNode (int iData, Node current) { if (iData == current.getData()) return current; else if (current.getNextNode()== null) return null; else return locateNode (iData, current.getNextNode()); } public class Node { int iData; Node nextNode; public Node() { ; } public Node(int iData) { this.iData = iData; } public void insertNode(int iData) { insertNode (iData, this); } public int getData() { return iData; } public void insertNode (int iData, Node current) { if (current.getNextNode() == null) current.setNextNode (new Node(iData)); else insertNode (iData, current.getNextNode()); } public Node locateNode(int iData) { return locateNode(iData, this); } public Node getNextNode() { return nextNode; } public void setNextNode(Node nextNode) { this.nextNode = nextNode; } public String toString() { return "Node: " + iData; } } class Driver{ public static void main(String arg[]) { HashChain hash = new HashChain(10); for (int i=0; i< 100; i++) { hash.addElement(i); } // for for (int i=0; i<100; i++) { System.out.println (hash.getElement(i)); } // for } // main } // Driver Summary of Hash Tables • Purpose: Fast searching of lists by reducing address space to approx. population size. • Hash function: the reduction function • Collision: hash(a) == hash(b), but a!=b • Collision resolution strategies – Multiple element buckets still risk collisions – Open addressing quickly deteriorates to unordered list – Chaining is most general solution Contents of Lecture • • Linked lists Hash table basics • Binary search trees – BSTs in Java – Balancing a BST – Search strategies • and there’s a bit more coming up, so don’t pack up yet Trees--Definitions Recall our previous discussion of Trees. Defined: A tree is a multiply-linked data structure path Internal node (non terminal) fox branch hen height cat dog elf hat hog Leaf (terminal) Binary Trees Binary Trees: same basic idea as from Pseudocode. A few neat properties of binary trees: * There exists at most one path between any two nodes. * A tree with N nodes has N-1 edges. * A full binary tree with N internal nodes has N+1 external nodes. A B D C E F * The height of a full binary tree with N nodes is about log2N. Binary Trees: Part of a Class for the Tree’s Nodes data Object a TreeNode class TreeNode { private SomeClass dataObject; // reference to data private TreeNode left; // reference to left subtree private TreeNode right; // reference to right subtree // constructor for leaf node with null references public TreeNode(SomeClass newObject) { this( newObject, null, null ); } // accessor: returns reference to current data object public SomeClass getObject( ) { return dataObject ; } // accessor: returnes reference to left subtree public TreeNode getLeft( ) { return left; } // accessor: return reference to right subtree public TreeNode getRight( ) { return right; } } // class TreeNode Insertion into a Binary Search Tree In Class TreeNode: public void insert (SomeClass newObject) { if ( newObject.lessThan( dataObject ) ) { if ( left == null ) left = new TreeNode(newObject); else left.insert( newObject ); } else // treating duplicates as if they’re greaterThan { if ( right == null ) right = new TreeNode (newObject); else right.insert( newObject ); } } // insert More Insertion, plusTraversal a TreeNode In Class Tree: root public class Tree { private TreeNode root; public Tree( ) { root = null; } 2 possible Trees public void insertInBST (SomeClass newObject) { if (root == null) root = new TreeNode (newObject); else root.insert(newObject ); } // insertInBST public void inorderTraversal( ) { inorderHelper (root); } root data Object public void inorderHelper (TreeNode node) { if (node != null) { inorderHelper (node.left); System.out.println (node.dataObject ); inorderHelper (node.right); } // if } // inorderHelper } // class Tree Binary Tree Balancing If unbalanced, binary tree can become linked list: In best-case, tree search time is O(log N). A B Problem: As tree grows out of balance search time deteriorates: worst-case, the search time of O(N). C Tree as linked list creates worst case search time D E Solved by keeping tree F in balance! Binary Tree Balancing Two general methods of tree balancing 1) Rebalance from sorted list. E.g., perform inorder traversal followed by tree reconstruction of sorted array. Cost: O(N) time to reconstruct global tree 2) Create ‘almost balanced’ binary tree and use local tree balancing (AVL trees, covered in algorithms courses) Cost : O(log N) insertion time. Binary Tree Balancing Reconstruction Technique: place nodes in array (in order traversal). Sort the array Combine into a recursive call Take midpoint as new tree head. Take midpoint of each remaining half as the left and right child; repeat New Head 9 12 14 22 25 31 44 47 64 74 84 Left & Right Children DFS: Simple Example A DFS of a binary tree is similar to a recursive pre-order traversal, except that: 1. Pre-order visits entire tree, while DFS stops at at a goal node. 2. The right tree gets visited only if the left fails to hold the goal node. A B D C E F To find C, traverse: A-B-D-E-C (but not F) To find E, traverse: A-B-D-E, etc. Tree Searching: Recursive public boolean searchTree(Node item, Node current) { boolean found = false; Terminates if (current == null) recursive found = false; else if (current.equals(item)) call found = true; else if (searchTree(current.getLeft()) found = true; else if (searchTree(current.getRight()) found = true; else Note: If binary search tree, one achieves improved search found = false; performance by comparing the return found; current node to item, limiting } // searchTree search to left or right subtree. Contents of Lecture • • Linked lists Hash table basics • Binary search trees – Terms, revisited: branch, leaf, etc. – Java example for Node class (cf. List Node) – Search cost degenerates to that for list unless tree is balanced – Depth-first search is like preorder traversal, but terminates when target node is found • and there’s really is a bit more coming up, so don’t pack up just yet Designing with (== choosing among) Data Structures Example: Maintaining Very Large Distributed Databases Issues How fast can we search? How fast can we insert and delete entries? How large a data structure do we need? How large a data structure do we need in main memory to work with? Very Large Databases • How large is an English dictionary? /usr/dict/words: 25,000 words ARTFL Project, Webster’s Revised Unabridged Dictionary, 1913 Edition: 110,000 words (http://humanities.uchicago.edu/forms_unrest/webster.form.html) • How large is the web? • How fast can we search databases this size? Searching and Maintaining Very Large Databases How fast are these techniques? Data representation Search Algorithm Sorted Array Binary Search Ordered Linked List Linear Search Binary Search Tree Tree Search Balanced Binary Search Tree Tree Search O(log n), but how long do they really take on real machines?