Data Structures & Computational Complexity
ECE 244
2013
Vaughn Betz
Data Structure Complexity
Operation              | Unordered List                        | Vector                         | Binary Search Tree
Build (N items)        | O(N)                                  | O(N)                           | O(N log N)
Find item by value/key | O(N) in general; O(1) at head / tail  | O(N)                           | O(log N)
Insert item            | O(1)                                  | O(N) in general; O(1) at back  | O(log N)
Delete item            | O(N) in general; O(1) at head / tail  | O(N) in general; O(1) at back  | O(log N)
• BST: all operations reasonably fast
– Good match to database (find, insert, delete common)
– Search through everything to find entry
• vector:
– Slow to insert/delete except at end/back
– Slow find by value/key
• Unordered List: fast insert, slow find and delete
– Good if you only find/delete at head and tail
– Stacks and queues
Important Vector Special Case
• If key is an integer
• AND range of key limited: [0 to N-1]
→ Can store in array/vector, use key as index
MyObject array[N];
int key;
MyObject object_with_key = array[key];
array: MyObject 0 | MyObject 1 | MyObject 2 | MyObject 3 | MyObject 4 | ... | MyObject N-3 | MyObject N-2 | MyObject N-1
• Build: O(N)
• Find, Insert, Delete: O(1)
• Fastest data structure!
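The key-as-index idea can be sketched as a small class. This is only an illustration, assuming integer keys in [0, N-1]; the names DirectTable and Record and the `present` flag are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type for illustration; the key doubles as the array index.
struct Record {
    int key = 0;
    std::string name;
    bool present = false;  // Marks whether this slot currently holds an item.
};

class DirectTable {
public:
    // Build: O(N) to create N empty slots.
    explicit DirectTable(int n) : slots(n) {}

    // Insert, find and erase each touch exactly one slot: O(1).
    void insert(int key, const std::string &name) {
        assert(in_range(key));
        slots[key].key = key;
        slots[key].name = name;
        slots[key].present = true;
    }

    bool find(int key, std::string &name) const {
        if (!in_range(key) || !slots[key].present)
            return false;
        name = slots[key].name;
        return true;
    }

    void erase(int key) {
        if (in_range(key))
            slots[key].present = false;
    }

private:
    bool in_range(int key) const {
        return key >= 0 && key < static_cast<int>(slots.size());
    }
    std::vector<Record> slots;
};
```

No searching happens anywhere: every operation goes straight to `slots[key]`, which is why this only works when the key range is small enough to allocate in full.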
Hash Tables
Hash Table - Idea
• Stores data in a specially indexed array
• Enables O(1) find, insert, delete operations
– Very fast!
– Trade-off: wastes some space
• Each item has a unique key
– Key range large (e.g. student numbers: 900123456)
– Many more keys than items to store
– Can’t just make array[MAX_KEY] → too big
• Know roughly how many items to store
– E.g. number of students in ECE department
• Create array with more entries than we have items to store
• “hashing function” h(key) to map from key to array entry
• Reference: Introduction to Algorithms by Cormen et al., Chap. 11
Hash Table - Implementation
MyObject hash_array[M];
key (0 .. 999999999) → h(key) → index [0..M-1]
– Example:
– h(key) = key % M;
– M = 10
– key = 900200204
– h(key) = 4
hash_array: MyObject 0 | MyObject 1 | MyObject 2 | MyObject 3 | MyObject 4 | ... | MyObject M-3 | MyObject M-2 | MyObject M-1
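The mapping from key to index on this slide can be written as a one-line function. A minimal sketch, assuming non-negative keys that fit in an int (true for 9-digit student numbers); the name hash_mod is hypothetical.

```cpp
#include <cassert>

// Map a non-negative integer key to an index in [0, table_size - 1].
// table_size plays the role of M on the slide.
int hash_mod(int key, int table_size) {
    assert(key >= 0 && table_size > 0);
    return key % table_size;
}
```

With table_size = 10, the slide's key 900200204 maps to index 4, its last decimal digit.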
Hashing Collisions
const int M = 10;
MyObject hash_array[M];
– h(key) = key % M;
– M = 10
– key0 = 789222308
– h(key0) = 8
– key1 = 900200205
– h(key1) = 5
– key2 = 988777335
– h(key2) = 5 → collision!
• Problem: can’t overwrite data of another object/key
hash_array: MyObject 0 | MyObject 1 | ... | MyObject 8 | MyObject 9
Solution 1: Hashing with Chaining
• Also known as “open hashing”
• Don’t store object at array entry i
• Instead store pointer to linked list of objects
– Every object with a key such that h(key) == i
ListofObjects hash_array[M];
key0 = 789222304
h(key0) = 4
key1 = 900200201
h(key1) = 1
key2 = 988777331
h(key2) = 1
hash_array:
...
List 5 = NULL
List 4 → [key = 789222304, next = NULL]
List 3 = NULL
List 2 = NULL
List 1 → [key = 900200201] → [key = 988777331, next = NULL]
List 0 = NULL
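Chaining can be sketched with one linked list per array entry. This is a minimal illustration rather than the course's implementation: it stores non-negative integer keys only, uses h(key) = key % M, and the class name ChainedTable is hypothetical.

```cpp
#include <cassert>
#include <forward_list>
#include <vector>

// Minimal chained ("open hashing") table storing non-negative integer keys.
class ChainedTable {
public:
    explicit ChainedTable(int m) : lists(m) {}

    // Insert at the head of the list for h(key); the duplicate check
    // costs one walk of that list first.
    bool insert(int key) {
        if (find(key))
            return false;  // Already in table.
        lists[hash(key)].push_front(key);
        return true;
    }

    // Walk only the one list that the key hashes to.
    bool find(int key) const {
        for (int k : lists[hash(key)])
            if (k == key)
                return true;
        return false;
    }

    // Unlink the key from its list, if present.
    bool erase(int key) {
        if (!find(key))
            return false;
        lists[hash(key)].remove(key);
        return true;
    }

private:
    int hash(int key) const { return key % static_cast<int>(lists.size()); }
    std::vector<std::forward_list<int>> lists;
};
```

With M = 10, the keys 900200201 and 988777331 both land in list 1, and deleting one leaves the other reachable because each list is searched in full.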
Open Hashing
• If hash function is “good”
– Spreads out keys across array evenly
• And if hash table (array) big enough
– More array locations than keys we want to insert
• Then average length of list is approx. 1
– Achieves O(1) access
Open Hashing & Hash Quality
• Bad hash function example:
– Instead of h(key) = key % M, could use h’(key) = key / 100000000
• E.g.
– key = 988777334
– h’(key) = 9
– If keys are student numbers, and many start with the same digit (entered university at same time):
→ h’ will not map evenly across array
→ Lists will be longer, and access slower
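The effect of hash quality on list length can be checked by counting how many keys land on each index. A small sketch, assuming M = 10 and 9-digit keys; longest_chain, h_mod and h_lead are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Length of the longest chain when every key is placed by hash function h
// into a table with m entries.
template <typename HashFn>
int longest_chain(const std::vector<int> &keys, int m, HashFn h) {
    std::vector<int> count(m, 0);
    for (int key : keys)
        count[h(key)]++;
    return *std::max_element(count.begin(), count.end());
}

int h_mod(int key) { return key % 10; }           // h(key) = key % M, with M = 10
int h_lead(int key) { return key / 100000000; }   // h'(key): leading digit of a 9-digit key
```

For ten consecutive student numbers starting at 900200200, h_mod gives every key its own index, while h_lead sends all ten to index 9, so every find walks one long list.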
Solution 2: Closed Hashing
• If h(key) == i, and location i occupied
– check location (i+1) % M in hash table
– store key/object there if empty
– else check location (i+2) % M
– …
const int M = 10;
MyObject hash_array[M];
key0 = 789222308
h(key0) = 8
key1 = 900200204
h(key1) = 4
key2 = 988777334
h(key2) = 4 → occupied, so stored at index 5
hash_array:
9
8 key=789222308
7
6
5 key=988777334
4 key=900200204
3
2
1
0
Closed Hashing
• Consider:
– h(key) == i
– i occupied
• Linear probing:
– Check (i+1) % M, (i+2) % M, …
– Until we find an empty slot
– Problem: used entries tend to cluster, even with a good hash function
• More probes per insert or find
• Slows down average insert and average find
• Quadratic probing:
– Check (i+1) % M, (i+4) % M, (i+9) % M, …
– Less clustering (find an empty slot faster)
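The two probe sequences can be compared by listing the slots each one checks after a collision. A minimal sketch; the function names are hypothetical, and i is the occupied "home" index.

```cpp
#include <cassert>
#include <vector>

// First `count` slots checked after a collision at home index i
// in a table of size m, using linear probing: i+1, i+2, i+3, ...
std::vector<int> linear_probes(int i, int m, int count) {
    std::vector<int> seq;
    for (int k = 1; k <= count; k++)
        seq.push_back((i + k) % m);
    return seq;
}

// Same, using quadratic probing: i+1, i+4, i+9, ...
std::vector<int> quadratic_probes(int i, int m, int count) {
    std::vector<int> seq;
    for (int k = 1; k <= count; k++)
        seq.push_back((i + k * k) % m);
    return seq;
}
```

For i = 8 and M = 10, linear probing checks 9, 0, 1 while quadratic probing checks 9, 2, 7, so the quadratic sequence jumps away from the occupied neighbourhood sooner.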
Closed Hashing
• Double hashing (or re-hashing)
– Defines two hash functions:
– h(key) and h2(key)
– Example:
• h(key) = key % M
• h2(key) = (key * 7) % M
– First map to h(key)
– If collision, try [h(key) + h2(key)] % M
– Then [h(key) + 2 * h2(key)] % M
– Then [h(key) + 3 * h2(key)] % M
– …
• +ve: reduces clustering
• -ve: more time to compute hash functions
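The double-hashing probe sequence can be sketched as a function of the attempt number. This uses the slide's h and h2, with two additions that are assumptions rather than part of the slides: 64-bit arithmetic, because key * 7 overflows a 32-bit int for 9-digit keys, and a guard that replaces a zero step with 1, because a step of 0 would probe the home slot forever. The name double_hash_probe is hypothetical.

```cpp
#include <cassert>

// Slot checked on the given attempt (0 = home slot) for a non-negative key
// in a table of size m, using h(key) = key % m and h2(key) = (key * 7) % m.
int double_hash_probe(int key, int m, int attempt) {
    long long h = key % m;
    long long h2 = (static_cast<long long>(key) * 7) % m;  // 64-bit: key * 7 can overflow int.
    if (h2 == 0)
        h2 = 1;  // Assumption, not from the slides: a zero step never leaves the home slot.
    return static_cast<int>((h + attempt * h2) % m);
}
```

For key 988777334 and M = 10 the sequence is 4, 2, 0, 8, 6 and then repeats: the step of 8 shares a factor with M, so only the even slots are ever visited. Choosing a prime table size is the usual way to make the sequence reach every slot.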
Closed Hashing: Class
// Storing Element type objects, using an integer key.
// Using closed hashing with linear probing in this class.
class HashTable {
private:
    Element *htable;                 // [0..m_table_size-1]
    int m_table_size;
    int n_items;                     // Num items stored in table.
    int hash (int key);              // Helper function: the hash
    bool is_empty (int index);       // Helper functions
    void set_to_empty (int index);
public:
    HashTable (int table_size);
    ~HashTable ();
    bool insert (const Element &item);
    bool find (int key, Element &item);
    bool remove (int key);           // The delete operation; "delete" is a reserved word in C++.
};
Closed Hashing: Constructor
HashTable::HashTable (int table_size) {
    n_items = 0;
    m_table_size = table_size;
    htable = new Element[m_table_size];
    // Need to store a special value for empty items,
    // so we can tell where we have spots to insert things
    // in the table.
    for (int i = 0; i < m_table_size; i++)
        set_to_empty (i);
}
Closed Hashing: Insert
bool HashTable::insert (const Element &item) {
    if (n_items == m_table_size - 1)
        return (false);  // Table full! Need to resize.
    int index = hash (item.key);  // "Home" spot for item.
    while (!is_empty(index)) {
        if (item.key == htable[index].key)
            return (false);  // Already in table.
        // Look for free entry, using linear probing.
        index = (index + 1) % m_table_size;
    }
    htable[index] = item;  // Found free entry. Store item.
    n_items++;
    return (true);
}
Closed Hashing: Find
bool HashTable::find (int key, Element &item) {
    int index = hash (key);  // "Home" spot for item.
    while (!is_empty(index)) {
        if (key == htable[index].key) {
            // Found item!
            item = htable[index];
            return (true);
        }
        // Keep looking, using linear probing.
        index = (index + 1) % m_table_size;
    }
    // Hit an empty spot, without finding item with key.
    // Will have at least one empty spot, since insert
    // only allowed m_table_size - 1 items.
    return (false);
}
Closed Hashing: Delete
HashTable my_hash(10);
item0.key = 789222308;
my_hash.insert (item0);
item1.key = 900200205;
my_hash.insert (item1);
item2.key = 988777335;
my_hash.insert (item2);
htable after the three inserts:
9 <empty>
8 key=789222308
7 <empty>
6 key=988777335
5 key=900200205
4 <empty>
3 <empty>
2 <empty>
1 <empty>
0 <empty>
my_hash.remove (900200205);  // Delete item1 ("delete" is a reserved word in C++).
my_hash.find (988777335, found_item);
Closed Hashing: Delete
HashTable my_hash(10);
item0.key = 789222308;
my_hash.insert (item0);
item1.key = 900200205;
my_hash.insert (item1);
item2.key = 988777335;
my_hash.insert (item2);
my_hash.remove (900200205);  // Delete item1, marking index 5 as empty.
my_hash.find (988777335, found_item);
htable after the delete:
9 <empty>
8 key=789222308
7 <empty>
6 key=988777335
5 <empty> (was key=900200205)
4 <empty>
3 <empty>
2 <empty>
1 <empty>
0 <empty>
Find will fail (return false)!
Problem: find starts at index 5, sees <empty>, and stops before reaching index 6. It didn’t reproduce insert’s probe sequence.
How can we fix?
Closed Hashing: Delete
Solution: can’t mark deleted items as empty.
Mark as “deleted” (another special value)
– OK for insert to re-use.
– But find must keep looking past them, as it probes.
– Ensures original (insert) probe sequence reproduced by find
my_hash.remove (900200205);  // Delete item1.
my_hash.find (988777335, found_item);
htable after the delete:
9 <empty>
8 key=789222308
7 <empty>
6 key=988777335
5 <deleted> (was key=900200205)
4 <empty>
3 <empty>
2 <empty>
1 <empty>
0 <empty>
Now finds item in index 6
What do we need to change in insert & find code?
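Closed Hashing: Fixed Insert
bool HashTable::insert (const Element &item) {
    if (n_items == m_table_size - 1)
        return (false);  // Table full! Need to resize.
    int index = hash (item.key);  // "Home" spot for item.
    int free_index = -1;          // First deleted slot seen, if any.
    // Probe until an empty slot, or until every slot has been checked once.
    for (int probes = 0; probes < m_table_size && !is_empty(index); probes++) {
        if (is_deleted(index)) {
            if (free_index < 0)
                free_index = index;  // Re-usable, but keep checking for a duplicate.
        }
        else if (item.key == htable[index].key)
            return (false);  // Already in table.
        // Look for free entry, using linear probing.
        index = (index + 1) % m_table_size;
    }
    if (free_index < 0)
        free_index = index;  // No deleted slot on the way: use the empty one.
    htable[free_index] = item;  // Found free entry. Store item.
    n_items++;
    return (true);
}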
Closed Hashing: Fixed Find
bool HashTable::find (int key, Element &item) {
    int index = hash (key);  // "Home" spot for item.
    // Stop at an empty slot, or after checking every slot once: deleted
    // markers can leave the table with no empty slot at all.
    for (int probes = 0; probes < m_table_size && !is_empty(index); probes++) {
        if (!is_deleted(index) && key == htable[index].key) {
            // Found item!
            item = htable[index];
            return (true);
        }
        // Keep looking, using linear probing.
        index = (index + 1) % m_table_size;
    }
    // Hit an empty spot, without finding item with key.
    return (false);  // Can't find key in table.
}
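The slides show insert and find but not the delete operation itself. Below is a minimal, self-contained sketch of all three working together with "deleted" markers. It is an illustration under stated assumptions, not the course's HashTable class: it stores non-negative integer keys only, uses h(key) = key % M with linear probing, tracks slot status in a separate array, and the names ProbingTable and SlotState are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Closed hashing with linear probing and "deleted" markers, integer keys only.
class ProbingTable {
public:
    explicit ProbingTable(int m) : keys(m, 0), state(m, EMPTY), n_items(0) {}

    bool insert(int key) {
        int m = static_cast<int>(keys.size());
        if (n_items == m - 1)
            return false;  // Table full.
        int index = key % m;
        int free_index = -1;  // First deleted slot seen on the probe path.
        for (int probes = 0; probes < m && state[index] != EMPTY; probes++) {
            if (state[index] == DELETED) {
                if (free_index < 0)
                    free_index = index;
            } else if (keys[index] == key) {
                return false;  // Already in table.
            }
            index = (index + 1) % m;
        }
        if (free_index < 0)
            free_index = index;  // No deleted slot seen: use the empty one.
        keys[free_index] = key;
        state[free_index] = FULL;
        n_items++;
        return true;
    }

    // Returns the slot holding key, or -1 if it is not in the table.
    int find(int key) const {
        int m = static_cast<int>(keys.size());
        int index = key % m;
        for (int probes = 0; probes < m && state[index] != EMPTY; probes++) {
            if (state[index] == FULL && keys[index] == key)
                return index;
            index = (index + 1) % m;  // Keep going past deleted markers.
        }
        return -1;
    }

    // Delete: mark the slot as deleted rather than empty.
    bool remove(int key) {
        int index = find(key);
        if (index < 0)
            return false;
        state[index] = DELETED;
        n_items--;
        return true;
    }

private:
    enum SlotState { EMPTY, FULL, DELETED };
    std::vector<int> keys;
    std::vector<SlotState> state;
    int n_items;
};
```

Replaying the slide's scenario with M = 10: after inserting 789222308, 900200205 and 988777335 and then removing 900200205, a find for 988777335 still reaches index 6 because it probes past the deleted marker at index 5. A later insert of another key that hashes to 5 re-uses that marker's slot.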
String Hash Functions
• Can hash more than integers
• pair of ints, strings, …
string s = "Hash tables enable fast find operations!";
int hash_index = hash_func (s);
int hash_func (string s) {
    return (s[0] % m);  // m is the hash table size
}
Good?
No!
1. Indices all from 0 to 255. If m = 5000, most indices never used.
2. Some letters (e.g. ‘e’) more likely than others
– Poor spreading over indices even from 0 to 255
3. Empty string has no first character: s[0] is undefined behavior before C++11, and only the '\0' terminator since
Better String Hash Function
int hash_func (string s) {
    unsigned int hash = 0;  // Unsigned, so overflow wraps around instead of being undefined.
    int length = s.length();
    for (int i = 0; i < length; i++) {
        // Multiply by a non-power of 2 for "bit scrambling"
        hash = hash * 127 + (unsigned char) s[i];
    }
    hash = hash % m;  // m is the hash table size
    return (hash);
}
1. Produces larger numbers: uses all indices
– With strings of length 4, indices up to 500 million possible
2. Each bit of hash is a scramble of the string characters
→ Spreads various strings better across indices
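The difference between the two string hash functions shows up as soon as two strings share a first letter. A small sketch for comparison, assuming the table size is passed in as a parameter instead of a global m; first_char_hash and poly_hash are hypothetical names, and returning 0 for an empty string is an assumption made here to keep the first function safe.

```cpp
#include <cassert>
#include <string>

// First-character hash from the earlier slide, guarded for empty strings.
unsigned int first_char_hash(const std::string &s, unsigned int m) {
    if (s.empty())
        return 0;  // Assumption: send the empty string to index 0.
    return static_cast<unsigned char>(s[0]) % m;
}

// Polynomial hash in the spirit of the slide above; unsigned arithmetic
// wraps around on overflow, so long strings are safe to hash.
unsigned int poly_hash(const std::string &s, unsigned int m) {
    unsigned int hash = 0;
    for (unsigned char c : s)
        hash = hash * 127u + c;
    return hash % m;
}
```

With m = 5000, "hash" and "heap" collide under first_char_hash because both start with 'h', while poly_hash separates them because every character contributes to the result.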
Complexity Analysis – Open Hashing
• Let α = n / m
– n = items in table
– m = table size (array size)
– “load factor”
• Worst case?
– All items could hash to one location
– Linked list with n items to search
– O(n) find and delete
– O(1) insert (at head)
Complexity Analysis – Open Hashing
• Average case
– Good hash function: all m locations equally likely
– Assume O(1) time to compute hash function
– Expected length of each linked list: n/m = α
– Find and delete are: O(1 + α)
• 1 to compute hash and go to “home” index
• α to look through a linked list of length α
– If α kept small (constant):
• O(1) find and delete
• Already had O(1) insert
– To keep α small, keep m within a constant factor of n
• Often want m >= n for highest speed
Complexity Analysis – Closed Hashing
• Worst case
– Insert, find, delete could all take O(n) time
• Average case
– Good hash function: all m locations equally likely
– Insert and unsuccessful find:
• O(1 / (1 − α))
– Successful find
• O((1/α) ∗ ln(1 / (1 − α)))
– O(1) if α kept well below 1
• α = 0.5: average 2 probes per insert and 1.4 per successful find
– Degrades rapidly as α approaches 1
– Table fails (cannot insert) when α == 1
– Usually slightly faster than open hashing for small α
– Degrades more rapidly than open hashing for larger α
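The two average-case expressions for closed hashing can be evaluated directly to see how quickly the probe count grows with the load factor. A minimal sketch, valid for a load factor strictly between 0 and 1; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected probes for an insert or an unsuccessful find: 1 / (1 - alpha).
double probes_insert(double alpha) {
    assert(alpha > 0.0 && alpha < 1.0);
    return 1.0 / (1.0 - alpha);
}

// Expected probes for a successful find: (1 / alpha) * ln(1 / (1 - alpha)).
double probes_successful_find(double alpha) {
    assert(alpha > 0.0 && alpha < 1.0);
    return std::log(1.0 / (1.0 - alpha)) / alpha;
}
```

At a load factor of 0.5 these give 2 probes per insert and about 1.39 per successful find, matching the figures on the slide; at 0.9 an insert already needs about 10 probes.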
Average Probes vs. Load Factor
[Figure: probes (comparisons) per find, 0 to 25, plotted against load factor, 0 to 1. Curves: "Open hashing: Find" and "Closed Hashing: Find".]