Download ppt - EECG Toronto

Data Structures & Computational Complexity
ECE 244 2013
Vaughn Betz

Data Structure Complexity

Operation                 Unordered List                        Vector                           Binary Search Tree
Build (N items)           O(N)                                  O(N)                             O(N log N)
Find item by value/key    O(N) in general, O(1) at head/tail    O(N)                             O(log N)
Insert item               O(1)                                  O(N) in general, O(1) at back    O(log N)
Delete item               O(N) in general, O(1) at head/tail    O(N) in general, O(1) at back    O(log N)

• BST: all operations reasonably fast
  – Good match to database (find, insert, delete common)
  – Search through everything to find entry
• vector:
  – Slow to insert/delete except at end/back
  – Slow find by value/key
• Unordered List: fast insert, slow find and delete
  – Good if you only find/delete at head and tail
  – Stacks and queues

Important Vector Special Case

• If key is an integer
• AND range of key limited: [0 to N-1]
→ Can store in array/vector, use key as index

    MyObject array[N];
    int key;
    MyObject object_with_key = array[key];

[Figure: array with slots MyObject 0 .. MyObject N-1]

• Build: O(N)
• Find, Insert, Delete: O(1)
• Fastest data structure!

Hash Tables
ECE 244 2013

Hash Table - Idea

• Stores data in a specially indexed array
• Enables O(1) find, insert, delete operations
  – Very fast!
  – Trade-off: wastes some space
• Each item has a unique key
  – Key range large (e.g. student numbers: 900123456)
  – Many more keys than items to store
  – Can’t just make array[MAX_KEY] → too big
• Know roughly how many items to store
  – E.g. number of students in ECE department
• Create array with more entries than we have items to store
• “hashing function” h(key) to map from key to array entry
• Reference: Algorithms by Cormen et al, Chap. 11

Hash Table - Implementation

    MyObject hash_array[M];

key (0 .. 999999999) → h(key) → index [0..M-1]
– Example:
  – h(key) = key % M;
  – M = 10
  – key = 900200204
  – h(key) = 4

[Figure: hash_array with slots MyObject 0 .. MyObject M-1]

Hashing Collisions

    const int M = 10;
    MyObject hash_array[M];

– h(key) = key % M;
– M = 10
– key0 = 789222308
– h(key0) = 8
– key1 = 900200205
– h(key1) = 5
– key2 = 988777335
– h(key2) = 5 → collision!
• Problem: can’t overwrite data of another object/key

[Figure: hash_array with slots MyObject 0 .. MyObject 9]

Solution 1: Hashing with Chaining

• Also known as “open hashing”
• Don’t store object at array entry i
• Instead store pointer to linked list of objects
  – Every object with a key such that h(key) == i

    ListofObjects hash_array[M];

key0 = 789222304    h(key0) = 4
key1 = 900200201    h(key1) = 1
key2 = 988777331    h(key2) = 1

[Figure: hash_array of lists. List 4 → (key = 789222304, next = NULL). List 1 → (key = 900200201) → (key = 988777331, next = NULL). All other lists = NULL.]

Open Hashing

• If hash function is “good”
  – Spreads out keys across array evenly
• And if hash table (array) big enough
  – More array locations than keys we want to insert
• Then average length of list is approx. 1
  – Achieves O(1) access

Open Hashing & Hash Quality

• Bad hash function example:
  – Instead of h(key) = key % M, could use h’(key) = key / 100000000
  • E.g.
    – key = 988777334
    – h’(key) = 9
  – If keys are student numbers, and many start with same digit (entered university at same time), h’ will not map evenly across array
  – Lists will be longer, and access slower

Solution 2: Closed Hashing

• If h(key) == i, and location i occupied
  – check location (i+1) % M in hash table
  – store key/object there if empty
  – else check location (i+2) % M
  – …

    const int M = 10;
    MyObject hash_array[M];

key0 = 789222308    h(key0) = 8
key1 = 900200204    h(key1) = 4
key2 = 988777334    h(key2) = 4

hash_array
index   contents
9
8       key=789222308
7
6
5       key=988777334
4       key=900200204
3..0

Closed Hashing

• Consider:
  – h(key) == i
  – i occupied
• Linear probing:
  – Check (i+1) % M, (i+2) % M, …
  – Until we find an empty slot
  – Problem: used entries tend to cluster, even with a good hash function
    • More probes per insert or find
    • Slows down average insert and average find
• Quadratic probing:
  – Check (i+1) % M, (i+4) % M, (i+9) % M, …
  – Less clustering (find an empty slot faster)

Closed Hashing

• Double hashing (or re-hashing)
  – Defines two hash functions: h(key) and h2(key)
  – Example:
    • h(key) = key % M
    • h2(key) = (key * 7) % M
    • Note: h2(key) must not evaluate to 0, or the probe never moves away from h(key)
  – First map to h(key)
  – If collision, try [h(key) + h2(key)] % M
  – Then [h(key) + 2 * h2(key)] % M
  – Then [h(key) + 3 * h2(key)] % M
  – …
• +ve: reduces clustering
• -ve: more time to compute hash functions

Closed Hashing: Class

    // Storing Element type objects, using an integer key.
    // Using closed hashing with linear probing in this class.
    class HashTable {
      private:
        Element *htable;               // [0..m_table_size-1]
        int m_table_size;
        int n_items;                   // Num items stored in table.
        int hash (int key);            // Helper function: the hash
        bool is_empty (int index);     // Helper functions
        void set_to_empty (int index);
      public:
        HashTable(int table_size);
        ~HashTable();
        bool insert (const Element &item);
        bool find (int key, Element &item);
        // Note: "delete" is a reserved word in C++, so compilable code
        // needs a different name for this member (e.g. remove).
        bool delete (const Element &item);
    };

Closed Hashing: Constructor

    HashTable::HashTable (int table_size) {
        n_items = 0;
        m_table_size = table_size;
        htable = new Element[m_table_size];
        // Need to store a special value for empty items,
        // so we can tell where we have spots to insert things
        // in the table.
        for (int i = 0; i < m_table_size; i++)
            set_to_empty (i);
    }

Closed Hashing: Insert

    bool HashTable::insert (const Element &item) {
        if (n_items == m_table_size - 1)
            return (false);              // Table full! Need to resize.
        int index = hash (item.key);     // “Home” spot for item.
        while (!is_empty(index)) {
            if (item.key == htable[index].key)
                return (false);          // Already in table.
            // Look for free entry, using linear probing.
            index = (index + 1) % m_table_size;
        }
        // Found free entry. Store item.
        htable[index] = item;
        n_items++;
        return (true);
    }

Closed Hashing: Find

    bool HashTable::find (int key, Element &item) {
        int index = hash (key);          // “Home” spot for item.
        while (!is_empty(index)) {
            if (key == htable[index].key) {
                // Found item!
                item = htable[index];
                return (true);
            }
            // Keep looking, using linear probing.
            index = (index + 1) % m_table_size;
        }
        // Hit an empty spot, without finding item with key.
        // Will have at least one empty spot, since insert
        // only allowed m_table_size - 1 items.
        return (false);
    }

Closed Hashing: Delete

    HashTable my_hash(10);
    item0.key = 789222308;
    my_hash.insert (item0);
    item1.key = 900200205;
    my_hash.insert (item1);
    item2.key = 988777335;
    my_hash.insert (item2);

htable after the three inserts
index   contents
9       <empty>
8       key=789222308
7       <empty>
6       key=988777335
5       key=900200205
4..0    <empty>

    my_hash.delete (900200205);
    my_hash.find (988777335, found_item);

Closed Hashing: Delete

htable if delete marks index 5 as empty
index   contents
9       <empty>
8       key=789222308
7       <empty>
6       key=988777335
5       <empty>
4..0    <empty>

• Find will fail (return false)!
• Problem: didn’t reproduce insert’s probe sequence
• How can we fix?

Closed Hashing: Delete

• Solution: can’t mark deleted items as empty. Mark as “deleted” (another special value)
  – OK for insert to re-use.
  – But find must keep looking past them, as it probes.
  – Ensures original (insert) probe sequence reproduced by find

htable
index   contents
9       <empty>
8       key=789222308
7       <empty>
6       key=988777335
5       <deleted>
4..0    <empty>

    my_hash.delete (900200205);
    my_hash.find (988777335, found_item);

• Now finds item in index 6
• What do we need to change in insert & find code?

Closed Hashing: Fixed Insert

    bool HashTable::insert (const Element &item) {
        int index;
        if (n_items == m_table_size - 1)
            return (false);              // Table full! Need to resize.
        index = hash (item.key);         // “Home” spot for item.
        // Note: stopping at the first deleted slot can miss a copy of
        // the same key stored later in the probe sequence.
        while (!is_empty(index) && !is_deleted(index)) {
            if (item.key == htable[index].key)
                return (false);          // Already in table.
            // Look for free entry, using linear probing.
            index = (index + 1) % m_table_size;
        }
        // Found free entry. Store item.
        htable[index] = item;
        n_items++;
        return (true);
    }

Closed Hashing: Fixed Find

    bool HashTable::find (int key, Element &item) {
        int index = hash (key);          // “Home” spot for item.
        while (!is_empty(index)) {
            if (!is_deleted(index) && key == htable[index].key) {
                // Found item!
                item = htable[index];
                return (true);
            }
            // Keep looking, using linear probing.
            index = (index + 1) % m_table_size;
        }
        // Hit an empty spot, without finding item with key.
        return (false);                  // Can’t find key in table.
    }

String Hash Functions

• Can hash more than integers
• pair of ints, strings, …

    string s = "Hash tables enable fast find operations!";
    int hash_index = hash_func (s);

    int hash_func (string s) {
        return (s[0] % m);               // m is the hash table size
    }

Good? No!
1. Indices all from 0 to 255. If m = 5000, most indices never used.
2. Some letters (e.g. ‘e’) more likely than others
   – Poor spreading over indices even from 0 to 255
3. Empty string may crash (no s[0])

Better String Hash Function

    int hash_func (string s) {
        // Unsigned, so overflow on long strings wraps around
        // instead of producing a negative index.
        unsigned int hash = 0;
        int length = s.length();
        for (int i = 0; i < length; i++) {
            // Multiply by a non-power of 2 for “bit scrambling”
            hash = hash * 127 + (unsigned char) s[i];
        }
        hash = hash % m;                 // m is the hash table size
        return (hash);
    }

1. Produces larger numbers: uses all indices
   – With strings of length 4, indices up to 500 million possible
2. Each bit of hash is a scramble of the string characters
   → Spreads various strings better across indices

Complexity Analysis – Open Hashing

• Let α = n / m
  – n = items in table
  – m = table size (array size)
  – “load factor”
• Worst case?
  – All items could hash to one location
  – Linked list with n items to search
  – O(n) find and delete
  – O(1) insert (at head)

Complexity Analysis – Open Hashing

• Average case
  – Good hash function: all m locations equally likely
  – Assume O(1) time to compute hash function
  – Expected length of each linked list: n/m = α
  – Find and delete are: O(1 + α)
    • 1 to compute hash and go to “home” index
    • α to look through a linked list of length α
  – If α kept small (constant):
    • O(1) find and delete
    • Already had O(1) insert
  – To keep α small, keep m within a constant factor of n
    • Often want m >= n for highest speed

Complexity Analysis – Closed Hashing

• Worst case
  – Insert, find, delete could all take O(n) time
• Average case
  – Good hash function: all m locations equally likely
  – Insert and unsuccessful find:
    • O(1 / (1 − α))
  – Successful find
    • O((1 / α) ∗ ln(1 / (1 − α)))
  – O(1) if α kept well below 1
    • α = 0.5: average 2 probes per insert and 1.4 per successful find
  – Degrades rapidly as α approaches 1
  – Table fails (cannot insert) when α == 1
  – Usually slightly faster than open hashing for small α
  – Degrades more rapidly than open hashing for larger α

Average Probes vs. Load Factor

[Figure: probes (comparisons) per find, 0 to 25, plotted against α (load factor), 0 to 1; curves: “Open hashing: Find” and “Closed Hashing: Find”]
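
The average-case expressions in the complexity slides can be evaluated directly to see how quickly closed hashing degrades as the load factor grows. The sketch below is only an illustration: the function names are made up for this example, and the open-hashing cost is taken as the 1 + α term from the slides rather than an exact probe count.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names; the formulas are the ones quoted in the
// "Complexity Analysis" slides, with alpha = n / m (the load factor).

// Closed hashing, insert or unsuccessful find: 1 / (1 - alpha).
double closed_insert_probes(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);   // formula undefined at alpha == 1
    return 1.0 / (1.0 - alpha);
}

// Closed hashing, successful find: (1 / alpha) * ln(1 / (1 - alpha)).
double closed_successful_find_probes(double alpha) {
    assert(alpha > 0.0 && alpha < 1.0);    // needs a non-empty, non-full table
    return (1.0 / alpha) * std::log(1.0 / (1.0 - alpha));
}

// Open hashing (chaining), find or delete: 1 + alpha
// (one step to reach the home index, alpha to walk the list).
double open_find_cost(double alpha) {
    assert(alpha >= 0.0);                  // chaining tolerates alpha >= 1
    return 1.0 + alpha;
}
```

At α = 0.5 these functions give 2 probes per insert and about 1.39 per successful find for closed hashing, matching the figures on the slide; at α = 0.9 the closed-hashing insert cost is already about 10 probes, while the open-hashing term is still 1.9.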

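
The closed-hashing slides (insert, find, and delete with a “deleted” marker) can be pulled together into one compilable sketch. This is not the course's implementation. The class name TombstoneHashTable, the SlotState enum, the locate helper, the value field of Element, and the name remove (used because delete is a C++ keyword) are all assumptions made for this example. Unlike the slide code, insert first searches the whole probe sequence for the key, so re-using a deleted slot cannot create a duplicate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type: the slides only say an Element has an integer key.
struct Element {
    int key;
    int value;
};

class TombstoneHashTable {
  private:
    enum SlotState { EMPTY, OCCUPIED, DELETED };
    std::vector<Element> slots;
    std::vector<SlotState> state;
    int m_table_size;
    int n_used;   // Slots that are OCCUPIED or DELETED (never decreases).

    int hash(int key) const { return key % m_table_size; }  // Assumes key >= 0.

    // Follow the linear probe sequence; return the index holding key, or -1.
    // Terminates because insert always leaves at least one EMPTY slot.
    int locate(int key) const {
        int index = hash(key);
        while (state[index] != EMPTY) {
            if (state[index] == OCCUPIED && slots[index].key == key)
                return index;
            index = (index + 1) % m_table_size;   // Probe past deleted slots too.
        }
        return -1;
    }

  public:
    explicit TombstoneHashTable(int table_size)
        : slots(table_size), state(table_size, EMPTY),
          m_table_size(table_size), n_used(0) {
        assert(table_size > 0);
    }

    bool insert(const Element &item) {
        if (locate(item.key) != -1)
            return false;                         // Already in table.
        int index = hash(item.key);
        while (state[index] == OCCUPIED)          // First EMPTY or DELETED slot.
            index = (index + 1) % m_table_size;
        if (state[index] == EMPTY) {
            if (n_used == m_table_size - 1)
                return false;                     // Would use the last empty slot.
            n_used++;
        }
        slots[index] = item;
        state[index] = OCCUPIED;
        return true;
    }

    bool find(int key, Element &item) const {
        int index = locate(key);
        if (index == -1)
            return false;
        item = slots[index];
        return true;
    }

    bool remove(int key) {
        int index = locate(key);
        if (index == -1)
            return false;
        state[index] = DELETED;                   // Keep the probe chain intact.
        return true;
    }
};
```

With a table of size 10, inserting keys 789222308, 900200205 and 988777335 and then removing 900200205 leaves a deleted marker at index 5, so a later find for 988777335 still probes on to index 6 and succeeds. Because deleted slots count toward n_used, a long-running table of this kind would eventually need to be rebuilt; that step is left out of the sketch.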