* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Heap Sort
Survey
Document related concepts
Mathematical optimization wikipedia , lookup
Birthday problem wikipedia , lookup
Cryptography wikipedia , lookup
Corecursion wikipedia , lookup
Diffie–Hellman key exchange wikipedia , lookup
Error detection and correction wikipedia , lookup
Fisher–Yates shuffle wikipedia , lookup
History of cryptography wikipedia , lookup
Selection algorithm wikipedia , lookup
Binary search algorithm wikipedia , lookup
Post-quantum cryptography wikipedia , lookup
Bloom filter wikipedia , lookup
Transcript
Hashing CS 110: Data Structures and Algorithms First Semester, 2010-2011 Hashing ► In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast ► Example: empdata[1000] index = employee ID number ► Search for employee with ID number 500 ► return empdata[500] ► Running Time: O(1) Hash Table ►A data structure implemented as an array of objects, where the search keys correspond to the array indices ► Insert and find operations involve straight forward array accesses: O(1) time complexity About Hash Tables ► In the first example shown, it was relatively easy since employee number is an integer ► A few problems may arise in different situations About Hash Table ► Problem 1: possible integer key values might be too large; creating an appropriate array might be impractical ► Need to map large integer values to smaller array indices ► Problem 2: What if the key is a word in the English Alphabet (e.g. last names) ► Need to map names to integers (indices) Large Values to Small Values ► Hash function: converts a number from a large range into a number from a smaller range (the range of array indices) ► Size of the array ► Rule of thumb: the array size should be about twice the size of the data set ► For 50,000 words, use an array of 100,000 elements Hash Function and Modulo ► Simplest Hash Function: achieved by using the modulo function (returns the remainder) ► For example, 33 % 10 = 3 ► General Formula: LargeNumber % SmallRange Hash Functions for Names ► Sum of Digits Method ► Map the alphabet A to Z to the numbers 1 to 26 (a=1, b=2, c=3, etc) ► Add the total of the letters ► For example, “cats” ►c=3, a=1, t=20, s=19, 3+1+20+19=43 ►“cats” will be stored using index 43 ► Use modulo to map to a smaller array Collisions ► Problem ► Too many words with the same index ► “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words add to 43 ► These are called collisions: case where two different search keys hash to the same index value Collisions ► Can occur even when dealing with integers ► Suppose the size of the hash table is 100 ► Keys 158 and 358 hash to the same value when using the modulo hash function Collision Resolution Policy ► Need to know what to do when a collision occurs; i.e. during an insert operation; What if the array slot is already occupied? ► Most common policy: go to the next available slot ► “wrap around” the array if necessary Collision Resolution Policy ► Consequence: when searching, use the hash function, first check whether the element is the one you are looking for ► If not, try the next slots ► How do you know if the element is not in the array? Probe Sequence ► Sequence of indices that serve as array slots where a key value would map to ► The first index in the probe sequence is the home position; the value of the hash function ► The next indices are the alternative slots Probe Sequence ► Suppose the array size is 10, and the hash function is h(K) = K%10. ► The probe sequence for K=25 is: ► 5,6,7,8,9,0,1,2,3,4 ► Here, we assume that most common collision resolution policy of going to the next slot: p(K,i) = I ► Goal: exhaust array slots Hash Table Operations ► Insert ► object Obj with key value K home h(K) for i 0 to M-1 do pos = (home + p(K,i)) % 10 if HT[pos].getKey() = K then throw exception “error” // or overwrite it else if HT[pos] is null then HT[pos] Obj break; Hash Table Operations ► Finding ► an object with key value K home h(K) for i 0 to M-1 do pos = (home + p(K,i)) % if HT[pos].getKey() = K return HT[pos] else if HT[pos] is null throw exception “not 10 then then found” Hash Table Operations ► Although insert and find run in O(1) time during typical conditions, the time complexity in the worst-case is O(n) ► Something to think about: characterize the worst-case scenarios for insert and find Removing Elements ► Removing an element from a hash table during a delete operation poses a problem ► If we set the corresponding hash table entry to null, then succeeding find operations might not work properly ► Recall that for the find algorithm, seeing a null means a target element is not found but in fact the element might be in a next slot Removing Elements ► Solution: ► Arrange tombstone it so that deleted entries seem null when inserting, but don’t seem null when searching ► Requires a simple flag on the objects stored Hash Tables in Java ► java.util.Hashtable ► Important methods for Hashtable class ► put(Object key, Object entry) ► Object get(Object key) ► remove(Object key) ► boolean constainsKey(Object key) Summary ► Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations ► Caveat: O(n) in the worst-case because of the possibility of collisions Summary ► Requires a hash function(maps keys to array indices) and a collision resolution policy ► Probe sequence depicts a sequence of array slots that an object would occupy, given its key ► In Java: use the Hashtable class