Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables III Topics Hashing Definition Hash function Key Hash value collision 2 Open hashing Common Problem A common pattern in many programs is to store and look up data Because it is so common, many data structures for it have been investigated 3 Find student record, given ID# Find person address, given phone # How? Phone Number Problem Problem: phone company wants to implement caller ID. given a phone number (the key), look up person’s name or address(the data) lots of phone numbers (P=107-1) in a given area code only a small fraction of them are in use 4 Nobody has a phone number :0000000 or 0000001 Comparison of Time Complexity (average) Operation Insertion Deletion Search O(1) O(n) O(n) Unsorted reference O(1) O(n) O(n) Sorted Array O(n) O(n) O(logn) Sorted reference O(n) O(n) O(n) BST O(logn) O(logn) O(logn) Unsorted Array Can we do better than O(logn)? 5 Can we do better than O(log N)? All previous searching techniques require a specified amount of time (O(logn) or O(n)) Time usually depends on number of elements (n) stored in the table In some situations searching should be almost instantaneous -- how? Examples 6 911 emergency system Air-traffic control system Can we do better than O(log N)? Search is O(1), but huge amount of space wasted. – how to solve this? 7 000-0002 000-0001 000-0000 Null Null Null Xu ••• Null Null ••• ••• Null Sub 263-3049 Answer: Yes … sort of, if we're lucky. General idea: take the key of the data record you’re inserting, and use that number directly as the item number in a list (array). 259-1623 ••• Hashing Basic idea: Don't use the data value directly. Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index (“bucket”) in the array. 0 1 x 8 h h(x) B-1 What is Hash Table? The simplest kind of hash table is an array of records. This example has 101 records. [0] [1] [2] [3] [4] [5] [100] ... An array of records 9 What is Hash Table? [4] Number 256-2879 8888 Queen St. Linda Kim Each record has a special field, called its key. In this example, the key is a long integer field called Number. [0] [1] [2] [3] [4] [5] [100] ... An array of records 10 What is Hash Table? [4] Number 256-2879 The number is person's phone number, and the rest is person name or address. [0] [1] [2] [3] [4] [5] [100] ... An array of records 11 What is Hash Table? When a hash table is in use, some spots contain valid records, and other spots are "empty". [0] [1] [2] Number 281942902 Number [3] 233667136 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 12 Inserting a New Record? Number 265-1556 In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key. [0] [1] [2] Number 281942902 Number [3] 233667136 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 13 Inserting a New Record? Number 265-1556 Typical way to create a hash value: (Number mod 101) What is (265-1556 mod 101) ? [0] [1] [2] Number 281942902 Number [3] 233667136 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 14 Inserting a New Record?Number 265-1556 Typical way to create a hash value: (Number mod 101) What is (2651556 mod 101) ? [0] [1] [2] Number 281942902 Number [3] 233667136 3 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 15 Number 265-1556 Inserting a New Record? The hash value is used for the location of the new record. [0] [1] [2] Number 281942902 Number [3] 233667136 [3] [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 16 Inserting a New Record? The hash value is used for the location of the new record. [0] [1] [2] Number 281942902 Number 233667136 [3] Number 580625685 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 17 What is Hashing? What is hashing? Each item has a unique key. Use a large array called a Hash Table. Use a Hash Function. Hashing is like indexing in that it involves associating a key with a relative record address. Hashing, however, is different from indexing in two important ways: With hashing, there is no obvious connection between the key and the location. With hashing two different keys may be transformed to the same address. A Hash function is a function h(K) which transforms a key K into an address. 18 What is Hashing? An address calculator (hashing function) is used to determine the location of the item 0 Search key Array Address Calculator (Hash table) (Hash function) N-1 19 What Can Be Hashed? 20 Anything! Can hash on numbers, strings, structures, etc. Java defines a hashing method for general objects which returns an integer value. Where do we use Hashing? 21 Databases (phone book, student name list). Spell checkers. Computer chess games. Compilers. Hashing and Tables Hashing gives us another implementation of Table ADT Hashing operations Initialize Insert Search Delete Hash the key; this gives an index; use it to find the value stored in the table in O(1) 22 all locations in Hash Table are empty. Great improvement over Log N. Hashing Insert pseudocode tableInsert (newItem) i = the array index that the address calculator gives you for the new item’s search key table[i]=newItem Retrieval pseudocode tableRerieve (searchKey) i = array index for searchKey given by the hash function if (table[i].getKey( ) == searchKey) return table[i] else return null 23 Hashing Deletion pseudocode tableDelete (searchKey) i = array index for searchKey given by the hash function success=(tabke[I].getKey() equals searchKey if (success) Delete the item from table[i] Return success 24 Hash Tables Table size Mapping Simple to compute Ideally 1-1: not possible Even distribution Main problems 25 Entries are numbered 0 to TSIZE-1 Choosing table size Choosing a good hash function What to do on collisions How to choose the Table Size? TSIZE = 11 H (Key) = Key mod TSIZE TSIZE = 10 110 20 0 15 210 20 1 320 22 22 2 460 26 3 520 49 4 54 600 54 5 15 6 26 7 8 9 49 26 0 1 2 3 4 5 6 7 8 9 110 210 320 460 520 600 110 0 210,320 1 2 520 3 4 5 600 6 7 8 460 9 10 How to choose a Hashing Function? The hash function we choose depends on the type of the key field (the key we use to do our lookup). Rule 27 Finding a good one can be hard Be easy to calculate. Use all of the key. Spread the keys uniformly. How to choose a Hashing Function? Example: Student Ids (integers) h(idNumber) = idNumber % B eg. h(678921) = 678921 % 100 = 21 Names (char strings) h(name) = (sum over the ascii values) % B eg. h(“Bill”) = (66+105+108+108) % 101 = 86 28 Number 2641455 Collision Here is another new record to insert, with a hash value of 2. My hash value is [2]. [0] [1] [2] Number 281942902 Number 233667136 [3] Number 580625685 [4] Number [5] [100] 506643548 Number 155778322 ... An array of records 29 What to do on collisions? Open hashing (separate chaining) Close hashing (open address) 30 Linear Probing Quadratic Probing Double hashing Open hashing (separate chaining) Keep a list of all elements that hash to the same value. 0 0 1 4 9 16 25 36 49 64 81 31 0 1 2 3 4 5 6 7 8 9 1 81 4 25 16 64 9 49 36 Open hashing (separate chaining) Secondary Data Structure List Search tree another hash table We expect small collision List 32 Simple Small overhead 0 1 2 3 4 5 6 7 8 9 0 1 81 4 25 16 64 9 49 36 Operations with Chaining Insert with chaining Search with chaining 33 Apply hash function to get a position. Insert key into the Linked List at this position. Apply hash function to get a position. Search the Linked List at this position. Open hashing (separate chaining) public class ChainNode { Private KeyedItem item; private ChainNode next; public ChainNode(KeyedItem newItem, ChainNode nextNode) { item = newItem; next= nextNode; // set and get methods } } // end of ChainNode 34 Open hashing (separate chaining) public class HashTable { private final int HASH_TABLE_SIZE = 101; // size of hash table private ChainNode [] table; //hash table private int size; //size of hash table public HashTable() { table = new ChainNode [HASH_TABLE_SIZE]; size =0; } 35 public bool tableIsEmpty() { return size ==0;} public int tableLength() { return size;} public void tableInsert(KeyedItem newItem) throws HashException {} public boolean tableDelete(Comparable searchKey) {} public KeyedIten tableRetrieve(Comparable searchKey) {} } // end of hashtable Open hashing (separate chaining) tableInsert(newItem) if (table is not full) { searchKey= the search key of newItem i = hashIndex (searchKey) node= reference to a new node containing newItem node.setNext (table[I]); table[I] = node } else //table full throw new HashException () 36 Open hashing (separate chaining) tableRetrieve (searchKey) i = hashIndex (searchKey) node= table [I]; while ((node !=null)&& node.getItem().getKey()!= searchKey ) node=getNext () if (node !=null) return node.getITem() else return null 37 Evaluation of Chaining Disadvantages of Chaining More complex to implement. Search and Delete are harder. We need to know: The number of elements in the table (N); the number of buckets (B); the quality of the hash function Advantage of Chaining Insertions is easy and quick. Allows more records to be stored. 38 Worse case (O(n)) for searching The size of table is dynamic Review A(n) ______ maps the search key of a table item into a location that will contain the item. 39 hash function hash table AVL tree heap Review A hash table is a(n) ______. 40 stack queue array list Review The condition that occurs when a hash function maps two or more distinct search keys into the same location is called a(n) ______. 41 disturbance collision Rotation congestion Review ______ is a collision-resolution scheme that searches the hash table sequentially, starting from the original location specified by the hash function, for an unoccupied location. 42 Linear probing Quadratic probing Double hashing Separate chaining Review ______ is a collision-resolution scheme that searches the hash table for an unoccupied location beginning with the original location that the hash function specifies and continuing at increments of 12, 22, 32, and so on. 43 Linear probing Double hashing Quadratic probing Separate chaining Review ______ is a collision-resolution scheme that uses an array of linked lists as a hash table. 44 Linear probing Double hashing Quadratic probing Separate chaining Review The load factor of a hash table is calculated as ______. 45 table size + current number of table items table size – current number of table items current number of table items * table size current number of table items / table size