Hash Tables
Asst. Prof. Dr. İlker Kocabaş

Overview
• Information Retrieval
• Binary Search Trees
• Hashing
• Applications
• Example
• Hash Functions
• Hash Tables
• Collisions
• Linear Probing
• Problems with Linear Probing
• Chaining

Example: Bibliography
• R. Kruse, C. Tondo, B. Leung, “Data Structures and Program Design in C”, 1991, Prentice Hall.
• E. Horowitz, S. Sahni, S. Anderson-Freed, “Fundamentals of Data Structures in C”, 1993, Computer Science Press.
• R. Sedgewick, “Algorithms in C”, 1990, Addison-Wesley.
• A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.
• T.A. Standish, “Data Structures, Algorithms & Software Principles in C”, 1995, Addison-Wesley.
• D. Knuth, “The Art of Computer Programming”, 1975, Addison-Wesley.
• Y. Langsam, M. Augenstein, A. Tenenbaum, “Data Structures using C and C++”, 1996, Prentice Hall.

Binary Search Tree
Insert the information into a Binary Search Tree, using the first author’s surname as the key. Inserting the entries in the order listed gives:

              Kruse
            /       \
     Horowitz        Sedgewick
      /    \          /     \
   Aho    Knuth   Langsam   Standish

Complexity
             Balanced Trees   Unbalanced Trees
Inserting    O(log(n))        O(n)
Searching    O(log(n))        O(n)

Hashing
key → hash function → pos, where pos is an index into the hash table (positions 0, 1, 2, 3, …, TABLESIZE - 1).

Example:
“Kruse” → hash function → 5, so Kruse is stored at position 5 of a hash table with positions 0 to 6.

Hashing
• Each item has a unique key.
• Use a large array called a Hash Table.
• Use a Hash Function.

Applications
• Databases.
• Spell checkers.
• Computer chess games.
• Compilers.

Operations
• Initialize: all locations in the Hash Table are empty.
• Insert
• Search
• Delete

Hash Function
Maps keys to positions in the Hash Table. A good hash function should:
• Be easy to calculate.
• Use all of the key.
• Spread the keys uniformly.
Example: Hash Function #1

    unsigned hash(char* s)
    {
        int i = 0;
        unsigned value = 0;

        /* Fold every character of the key into the running value. */
        while (s[i] != '\0')
        {
            value = (s[i] + 31*value) % 101;
            i++;
        }
        return value;
    }

Example: Hash Function #1
value = (s[i] + 31*value) % 101;

Key “Aho”, from A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.

‘A’ = 65, ‘h’ = 104, ‘o’ = 111

value = (65 + 31 * 0) % 101 = 65
value = (104 + 31 * 65) % 101 = 99
value = (111 + 31 * 99) % 101 = 49

Example: Hash Function #1
value = (s[i] + 31*value) % 101;

Key          Hash Value
Aho          49
Kruse        95
Standish     60
Horowitz     28
Langsam      21
Sedgewick    24
Knuth        44

The resulting table is “sparse”: seven keys are spread over 101 positions.

Example: Hash Function #2
value = (s[i] + 1024*value) % 128;

Key          Hash Value
Aho          111
Kruse        101
Standish     104
Horowitz     122
Langsam      109
Sedgewick    107
Knuth        104

Likely to result in “clustering”. Because 1024 is a multiple of 128, the earlier characters drop out and the hash value is just the character code of the last letter of the key.

Example: Hash Function #3
value = (s[i] + 3*value) % 7;

Key          Hash Value
Aho          0
Kruse        5
Standish     1
Horowitz     5
Langsam      5
Sedgewick    2
Knuth        1

“Collisions”: Kruse, Horowitz and Langsam all map to position 5, and Standish and Knuth both map to position 1.

Insert
• Apply the hash function to get a position.
• Try to insert the key at this position.
• Deal with collision.

Example: Insert
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth

Key        Hash Value   Hash table after insertion (positions 0 to 6)
Aho        0            0: Aho
Kruse      5            0: Aho, 5: Kruse
Standish   1            0: Aho, 1: Standish, 5: Kruse

Search
• Apply the hash function to get a position.
• Look in that position.
• Deal with collision.

Example: Search
Hash table: 0: Aho, 1: Standish, 5: Kruse (positions 2, 3, 4 and 6 are empty)

Key         Hash Value   Result
Kruse       5            Found.
Sedgewick   2            Not found (position 2 is empty).

Hash Tables: Collision Resolution

Collision
• When two keys are mapped to the same position.
• Very likely.

Birthdays: the probability that at least two people in a group share a birthday.

Number of People   Probability
10                 0.1169
20                 0.4114
30                 0.7063
40                 0.8912
50                 0.9704
60                 0.9941
70                 0.9992

Collision Resolution
Two methods are commonly used:
• Linear Probing.
• Chaining.

Linear Probing
Linear search in the array, starting from the position where the collision occurred.

Insert with Linear Probing
• Apply the hash function to get a position.
• Try to insert the key at this position.
• Deal with collision.
• Must also deal with a full table!
Example: Insert with Linear Probing
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth

Key        Hash Value   Placed at   Note
Aho        0            0
Kruse      5            5
Standish   1            1
Horowitz   5            6           Position 5 is occupied by Kruse.
Langsam    5            2           Positions 5, 6, 0 and 1 are occupied; the probe wraps around.

Hash table after these insertions:

Position   Key
0          Aho
1          Standish
2          Langsam
3          -
4          -
5          Kruse
6          Horowitz

    module linearProbe(item)
    {
        position = hash(key of item)
        count = 0
        loop
        {
            if (count == hashTableSize) then
            {
                output “Table is full”
                exit loop
            }
            if (hashTable[position] is empty) then
            {
                hashTable[position] = item
                exit loop
            }
            position = (position + 1) % hashTableSize
            count++
        }
    }

Search with Linear Probing
• Apply the hash function to get a position.
• Look in that position.
• Deal with collision.

Example: Search with Linear Probing
Hash table: 0: Aho, 1: Standish, 2: Langsam, 5: Kruse, 6: Horowitz (positions 3 and 4 are empty)

Key       Hash Value   Positions probed   Result
Langsam   5            5, 6, 0, 1, 2      Found.
Knuth     1            1, 2, 3            Not found (position 3 is empty).

    module search(target)
    {
        count = 0
        position = hash(key of target)
        loop
        {
            if (count == hashTableSize) then
            {
                output “Target is not in Hash Table”
                return -1
            }
            else if (hashTable[position] is empty) then
            {
                output “Target is not in Hash Table”
                return -1
            }
            else if (hashTable[position].key == target) then
            {
                return position
            }
            position = (position + 1) % hashTableSize
            count++
        }
    }

Delete with Linear Probing
• Use the search function to find the item.
• If found, check whether the items after it hash to the item’s position.
• If items after it do hash to that position, move them back in the hash table and delete the item.
• Very difficult and time/resource consuming!

Linear Probing: Problems
• Speed.
• Tendency for clustering to occur as the table becomes half full.
• Deletion of records is very difficult.
• If implemented in arrays, the table may become full fairly quickly; resizing is time and resource consuming.

Chaining
• Uses a Linked List at each position in the Hash Table.
• The linked list at a position contains all the items that “hash” to that position.
• May keep linked lists sorted or not.

Example: Chaining
Keys:         Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash values:  0, 5, 1, 5, 5, 2, 1

Position   Items in list   Linked list
0          1               Aho
1          2               Standish → Knuth
2          1               Sedgewick
3          0
4          0
5          3               Kruse → Horowitz → Langsam
6          0

Hashtable with Chaining
At each position in the array you have a list:

    List hashTable[MAXTABLE];

You must initialise each list in the table.

Insert with Chaining
• Apply the hash function to get a position in the array.
• Insert the key into the Linked List at this position in the array.

    module InsertChaining(item)
    {
        posHash = hash(key of item)
        insert (hashTable[posHash], item);
    }

Search with Chaining
• Apply the hash function to get a position in the array.
• Search the Linked List at this position in the array.

    /* module returns NULL if not found, or the address of the
     * node if found */
    module SearchChaining(item)
    {
        posHash = hash(key of item)
        Node* found;
        found = searchList (hashTable[posHash], item);
        return found;
    }

Delete with Chaining
• Apply the hash function to get a position in the array.
• Delete the node in the Linked List at this position in the array.
    /* module uses the Linked List delete function to delete an item
     * inside that list; it does nothing if that item isn’t there. */
    module DeleteChaining(item)
    {
        posHash = hash(key of item)
        deleteList (hashTable[posHash], item);
    }

Disadvantages of Chaining
• Uses more space.
• More complex to implement.
• Contains a linked list at every element in the array.
• Requires linear searching within each list.
• May be time consuming.

Advantages of Chaining
• Insertions and deletions are easy and quick.
• Allows more records to be stored.
• Naturally resizable: allows a varying number of records to be stored.

Double Hashing
• Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
      (i + j·d(k)) mod N   for j = 0, 1, …, N - 1
  where i is the position given by the primary hash function.
• The secondary hash function d(k) cannot have zero values.
• The table size N must be a prime to allow probing of all the cells.
• Common choice of compression map for the secondary hash function:
      d2(k) = q - (k mod q)
  where q < N and q is a prime.
• The possible values for d2(k) are 1, 2, …, q.

Dictionaries and Hash Tables

Performance of Hashing
• In the worst case, searches, insertions and removals on a hash table take O(n) time.
• The worst case occurs when all the keys inserted into the dictionary collide.
• The load factor α = n/N (n items stored in a table of N cells) affects the performance of a hash table.
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - α).
• The expected running time of all the dictionary ADT operations in a hash table is O(1).
• In practice, hashing is very fast provided the load factor is not close to 100%.
• Applications of hash tables: small databases, compilers, browser caches.