Data Structures
CSCI 132
Hash Tables
Tables with complicated index functions
•Index functions are not always simple functions that compute
an integer value from integer inputs.
•Often, the key used for table lookup is not a number, but rather
an object or string.
•Example:
Keys that consist of 8-character words.
•Problem: There are 26^8 ≈ 2 x 10^11 possible arrangements of
characters. There is not enough memory to contain a table with
one position for each possible word. Furthermore, only a few
of the table positions would be filled--it would be a sparse table.
Hash Tables
•Hash tables use an index function that maps many possible keys to a single
location.
•If the table is sparse, then most of the time only 1 key will go to each
location.
•If 2 records do get assigned to the same location (a collision), we use a
method for reassigning the second record (collision resolution).
[Figure: A hash table]
The Hash Table Algorithm
Insertion:
1) Calculate hash function of the key of the record to be inserted.
2) If the location is empty, insert the record there.
3) If the location contains the same record, do not insert.
4) If the location contains a different record, find a new location for
insertion with collision resolution method.
Retrieval:
1) Calculate the hash function of the key.
2) If the record is at that location, retrieve it.
3) Otherwise, follow collision resolution method to find the record.
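The insertion and retrieval steps above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the course's Hash_table class: the names TinyTable, kSlots and home are made up for the example, std::hash stands in for the hash function, the empty string marks an empty location (so it cannot be used as a key), and linear probing is used as the collision resolution method.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical miniature table: 11 locations, string keys, linear probing.
constexpr std::size_t kSlots = 11;

struct TinyTable {
    // An empty string marks an empty location (assumption of this sketch).
    std::array<std::string, kSlots> slots;

    // Step 1 of both algorithms: compute the hash function of the key.
    static std::size_t home(const std::string &key) {
        return std::hash<std::string>{}(key) % kSlots;
    }

    // Returns true if the key is in the table after the call,
    // false only if every location was probed without success (table full).
    bool insert(const std::string &key) {
        std::size_t probe = home(key);
        for (std::size_t tries = 0; tries < kSlots; ++tries) {
            if (slots[probe].empty()) {      // location is empty: insert here
                slots[probe] = key;
                return true;
            }
            if (slots[probe] == key)         // same record: do not insert again
                return true;
            probe = (probe + 1) % kSlots;    // different record: try next location
        }
        return false;
    }

    // Retrieval follows the same probe sequence as insertion.
    bool contains(const std::string &key) const {
        std::size_t probe = home(key);
        for (std::size_t tries = 0; tries < kSlots; ++tries) {
            if (slots[probe].empty()) return false;  // empty location ends the search
            if (slots[probe] == key) return true;
            probe = (probe + 1) % kSlots;
        }
        return false;
    }
};
```

After `TinyTable t; t.insert("cat");` the call `t.contains("cat")` returns true and `t.contains("dog")` returns false. Note that retrieval must use exactly the same collision resolution method as insertion, otherwise it would look in the wrong locations.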
Creating Hash Functions
Hash functions should:
1) Be easy and quick to compute
2) Achieve an even distribution of keys across the table.
Methods:
Truncation
Folding
Modular Arithmetic
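As a rough illustration of the three methods, here is a sketch for integer keys. The function names and the choice of 3-digit groups are assumptions made for this example; other ways of truncating or folding a key are equally valid.

```cpp
#include <cstdint>

// Truncation: ignore part of the key and use the rest directly.
// Here we keep only the last three decimal digits.
inline std::uint32_t hash_truncate(std::uint32_t key) {
    return key % 1000;
}

// Folding: split the key into parts and combine them.
// Here we add up successive 3-digit groups, starting from the right.
// The sum can exceed the table size, so it is usually reduced afterwards.
inline std::uint32_t hash_fold(std::uint32_t key) {
    std::uint32_t sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return sum;
}

// Modular arithmetic: take the remainder on division by the table size.
// A prime table size tends to spread the keys more evenly.
inline std::uint32_t hash_mod(std::uint32_t key, std::uint32_t table_size) {
    return key % table_size;
}
```

For the key 62538194, truncation gives 194, folding gives 194 + 538 + 62 = 794, and modular arithmetic with a table of size 997 gives 372.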
A Hash Function Example
class Key: public String {
public:
    char key_letter(int position) const;
    void make_blank( );
    // Add constructors and other methods.
};

int hash(const Key &target) {
    int value = 0;
    for (int position = 0; position < 8; position++)
        value = 4 * value + target.key_letter(position);
    return value % hash_size;
}
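To experiment with this hash function without the Key and String classes, a stand-alone variant might look like the sketch below. It assumes std::string keys, pads keys shorter than 8 characters with blanks, and repeats the table size 997 from the specification; the names hash_word and the free function key_letter are hypothetical stand-ins.

```cpp
#include <string>

const int hash_size = 997;  // table size, a prime number

// Stand-in for Key::key_letter: the character at the given position,
// or a blank if the key is shorter than that (assumption of this sketch).
char key_letter(const std::string &key, int position) {
    return position < static_cast<int>(key.size()) ? key[position] : ' ';
}

// Same computation as the slide's hash(): combine the first 8 characters,
// then reduce to a table index with modular arithmetic.
int hash_word(const std::string &key) {
    int value = 0;
    for (int position = 0; position < 8; position++)
        value = 4 * value + key_letter(key, position);
    return value % hash_size;
}
```

With ASCII character codes, `hash_word("AAAAAAAA")` evaluates to 65 * (4^8 - 1)/3 = 1419925, and 1419925 % 997 = 197.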
Collision Resolution
Methods:
Linear Probing
Quadratic probing
Key dependent Increments
Random probing
Chaining
Chaining
Chaining uses a table of linked lists. Collisions are resolved by inserting
the new elements into a list at the shared location.
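A minimal sketch of chaining, assuming string keys and using std::list for the chains and std::hash as the hash function. The class name ChainedTable and its interface are invented for this illustration and are not the course's Hash_table class.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t slots) : chains_(slots) {}

    // Returns false if the key was already present.
    bool insert(const std::string &key) {
        std::list<std::string> &chain = chains_[index(key)];
        for (const std::string &k : chain)
            if (k == key) return false;
        chain.push_front(key);  // a collision just makes the list longer
        ++size_;
        return true;
    }

    // Sequential search of the one list the key could be in.
    bool contains(const std::string &key) const {
        for (const std::string &k : chains_[index(key)])
            if (k == key) return true;
        return false;
    }

    // Deletion only unlinks a node from one list.
    bool remove(const std::string &key) {
        std::list<std::string> &chain = chains_[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (*it == key) {
                chain.erase(it);
                --size_;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t index(const std::string &key) const {
        return std::hash<std::string>{}(key) % chains_.size();
    }

    std::vector<std::list<std::string>> chains_;
    std::size_t size_ = 0;
};
```

A table built with `ChainedTable t(2);` can still hold five or more keys, because each of its two locations is a list that grows as needed.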
Advantages and disadvantages of chaining
Advantages:
•Create an array of addresses rather than records. If the records are large,
this saves considerable space.
•Collision handling is simple--Insert colliding records into a list.
•Allows more records to be stored than the size of the table.
•Deletion of records is easy.
Disadvantages:
•If table is full (or nearly full) there may be long lists at some key locations.
This can slow down retrieval because you have to search the list for your
record.
•Pointers take up memory space. This may be wasteful if the records are
small.
The C++ Hash Table Specification
const int hash_size = 997; // a prime number of appropriate size

class Hash_table {
public:
    Hash_table( );
    void clear( );
    Error_code insert(const Record &new_entry);
    Error_code retrieve(const Key &target, Record &found) const;
private:
    Record table[hash_size];
};
Implementation of insert( )
Error_code Hash_table :: insert(const Record &new_entry) {
    Error_code result = success;
    int probe_count,  // Counter to be sure that table is not full.
        increment,    // Increment used for quadratic probing.
        probe;        // Position currently probed in the hash table.
    Key null;         // Null key for comparison purposes.
    null.make_blank( );
    probe = hash(new_entry);  // Find location to insert new_entry.
    probe_count = 0;
    increment = 1;
insert( ) continued
//we will complete this in class.
}
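The remainder of insert( ) is left for class, so the following is not that completion. It is a separate, self-contained sketch of how a quadratic probing loop can use the three counters declared above. It assumes non-negative integer keys, a table of 11 locations, -1 as the empty marker, and a hypothetical InsertStatus type in place of Error_code.

```cpp
#include <array>

enum class InsertStatus { success, duplicate_error, overflow };  // hypothetical

constexpr int table_size = 11;  // a small prime, for illustration only
constexpr int empty_slot = -1;  // marks an unused location (keys must be >= 0)

using IntTable = std::array<int, table_size>;

InsertStatus quadratic_insert(IntTable &table, int key) {
    int probe = key % table_size;  // position currently probed
    int probe_count = 0;           // guards against probing forever
    int increment = 1;             // grows 1, 3, 5, ... so offsets are 1, 4, 9, ...

    // With a prime table size, quadratic probing reaches only about half
    // of the locations, so we give up after (table_size + 1) / 2 probes.
    while (table[probe] != empty_slot && table[probe] != key &&
           probe_count < (table_size + 1) / 2) {
        probe_count++;
        probe = (probe + increment) % table_size;
        increment += 2;
    }

    if (table[probe] == empty_slot) {
        table[probe] = key;
        return InsertStatus::success;
    }
    if (table[probe] == key)
        return InsertStatus::duplicate_error;
    return InsertStatus::overflow;
}
```

Starting from a table filled with empty_slot, inserting 0, 11 and 22 (which all hash to location 0) places them at locations 0, 1 and 4, following the offsets 0, 1, 4.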
Likelihood of collisions
•How many people have to be in a room before the probability that
two of them have the same birthday reaches 50%?
P=?
•The calculation for a probability of a collision in a table is similar.
•The table does not have to be very full for the probability of a
collision to reach at least 50%.
•Therefore:
Collisions happen! We must handle them efficiently.
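The birthday question can be worked out numerically. The sketch below assumes every day (or table location) is equally likely; the function names are made up for the example. The probability that all entries land in different locations is the product (t/t)((t-1)/t)((t-2)/t)..., and the collision probability is one minus that product.

```cpp
// Probability that at least two of `people` items fall in the same one
// of `slots` equally likely positions.
double collision_probability(int people, int slots) {
    double all_distinct = 1.0;
    for (int i = 0; i < people; ++i)
        all_distinct *= static_cast<double>(slots - i) / slots;
    return 1.0 - all_distinct;
}

// Smallest group size whose collision probability reaches `threshold`
// (threshold is assumed to be at most 1).
int smallest_group_for(double threshold, int slots) {
    int people = 0;
    while (collision_probability(people, slots) < threshold)
        ++people;
    return people;
}
```

`smallest_group_for(0.5, 365)` returns 23, the classic birthday answer. For a table with t locations the 50% point is roughly 1.2 times the square root of t, so a 997-location table reaches it with only a few dozen entries.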
Counting Probes
•We can analyze the running time of hash tables by counting comparisons.
•Comparisons take place when "probing" an entry: Looking at an entry and
comparing its key to a target.
•The number of probes done depends on how full the table is.
n = number of entries in the table
t = number of total positions in table (= hash_size)
l = n/t = Load Factor
l = 0 means no entries in table
l = 0.5 means the table is 1/2 full
l <= 1 for contiguous table without chaining (open addressing)
l can be greater than 1 if using chaining
Number of comparisons for chaining
Unsuccessful searches:
•If entries distributed evenly over the table, then the expected number of entries
in each chain is: n/t = l.
•For an unsuccessful search, we must do one probe for each entry in the list, so
the average number of probes (or comparisons) is l.
Successful searches:
•Average number of comparisons for sequential search of a list with k items is:
(k + 1)/2
•The node we are looking for is in our list, the other n-1 nodes are distributed
evenly over the table so the average number of nodes will be:
k = (n-1)/t + 1 ~ n/t + 1 = l + 1.
•Average number of comparisons will be
(l + 1 + 1)/2 = l/2 + 1
Open addressing (without chaining)
Evenly distributed entries, Random probing:
Number of Comparisons (approx)
Successful case:
(1/l)ln(1/(1-l))
Unsuccessful case:
1/(1 - l)
Linear Probing:
Successful case:
0.5(1 + 1/(1-l) )
Unsuccessful case:
0.5(1 + 1/(1-l)^2 )
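To get a feel for these estimates, the chaining and open addressing formulas can be written as small functions of the load factor l = n/t and evaluated at a few values. The function names are chosen for this sketch; the open addressing formulas assume 0 < l < 1.

```cpp
#include <cmath>

// Expected number of probes as a function of the load factor l = n/t.

// Chaining
double chaining_unsuccessful(double l) { return l; }
double chaining_successful(double l)   { return 1.0 + l / 2.0; }

// Open addressing with random probing (0 < l < 1)
double random_successful(double l)   { return std::log(1.0 / (1.0 - l)) / l; }
double random_unsuccessful(double l) { return 1.0 / (1.0 - l); }

// Open addressing with linear probing (0 < l < 1)
double linear_successful(double l) {
    return 0.5 * (1.0 + 1.0 / (1.0 - l));
}
double linear_unsuccessful(double l) {
    return 0.5 * (1.0 + 1.0 / ((1.0 - l) * (1.0 - l)));
}
```

At l = 0.5 an unsuccessful search costs about 2 probes with random probing and 2.5 with linear probing. At l = 0.9 the same searches cost about 10 and 50.5 probes, which shows how quickly linear probing degrades as the table fills.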
Theoretical and empirical results
Hash Tables vs. Other Methods
•Speed of retrieval from a hash table does not depend on the
total number of entries, but on the ratio of entries/table-size (l).
•A table of size 40 with 20 entries has the same performance as a
table of size 4000 with 2000 entries.
Sequential Search: Θ(n)
Binary Search: Θ(lg n)
Hash Table retrieval: O(1) for small l.
•Read section 9.8 on choosing a method for storage and retrieval
of data.
Radix sort
Radix sort creates a table of queues. Each queue corresponds to a
letter of the alphabet.
Sort from least significant letter to most significant letter.
Implementation of Radix Sort
const int key_size = 5;
const int max_chars = 28;

template <class Record>
void Sortable_list<Record> :: radix_sort( ) {
    Record data;
    Queue queues[max_chars];
    for (int position = key_size - 1; position >= 0; position--) {
        // Loop from the least to the most significant position.
        while (remove(0, data) == success) {
            int queue_number = alphabetic_order(data.key_letter(position));
            queues[queue_number].append(data); // Queue operation.
        }
        rethread(queues); // Reassemble the list.
    }
}
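The same idea can be tried out with standard library containers. The sketch below is an assumption-laden stand-in for the Sortable_list version: it sorts a std::vector of std::string keys, uses std::queue for the queues, treats keys shorter than key_size as padded with blanks, and supplies its own guess at alphabetic_order (blank first, then the 26 letters, then everything else), which gives the 28 queues.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical mapping of a character to a queue number in 0..27:
// blank = 0, letters (either case) = 1..26, anything else = 27.
int alphabetic_order(char c) {
    if (c == ' ') return 0;
    if (c >= 'a' && c <= 'z') return c - 'a' + 1;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
    return 27;
}

void radix_sort(std::vector<std::string> &words, std::size_t key_size) {
    const int max_chars = 28;
    std::vector<std::queue<std::string>> queues(max_chars);

    // Loop from the least to the most significant position.
    for (std::size_t position = key_size; position-- > 0;) {
        // Distribute every word into the queue for its letter at this position.
        for (const std::string &word : words) {
            char letter = position < word.size() ? word[position] : ' ';
            queues[alphabetic_order(letter)].push(word);
        }
        // Reassemble the list by emptying the queues in order.
        words.clear();
        for (std::queue<std::string> &q : queues) {
            while (!q.empty()) {
                words.push_back(q.front());
                q.pop();
            }
        }
    }
}
```

The sort works because queues are first-in, first-out: words that tie on the current letter keep the order established by the earlier, less significant passes.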