Hash Tables
Asst. Prof. Dr. İlker Kocabaş
Overview
Information Retrieval
Binary Search Trees
Hashing.
Applications.
Example.
Hash Functions.
Hash Tables
Collisions
Linear Probing
Problems with Linear Probing
Chaining
2
Example: Bibliography
R. Kruse, C. Tondo, B. Leung, “Data Structures and Program Design in C”, 1991, Prentice Hall.
E. Horowitz, S. Sahni, S. Anderson-Freed, “Fundamentals of Data Structures in C”, 1993, Computer Science Press.
R. Sedgewick, “Algorithms in C”, 1990, Addison-Wesley.
A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.
T.A. Standish, “Data Structures, Algorithms & Software Principles in C”, 1995, Addison-Wesley.
D. Knuth, “The Art of Computer Programming”, 1975, Addison-Wesley.
Y. Langsam, M. Augenstein, A. Tenenbaum, “Data Structures using C and C++”, 1996, Prentice Hall.
3
Insert the information into a Binary Search Tree,
using the first author’s surname as the key
4
Insert the information into a Binary Search Tree, using the first author’s surname as the key
                 Kruse
               /       \
        Horowitz        Sedgewick
        /      \        /       \
     Aho      Knuth  Langsam   Standish
5
Complexity
Inserting
  Balanced Trees: O(log(n))
  Unbalanced Trees: O(n)
Searching
  Balanced Trees: O(log(n))
  Unbalanced Trees: O(n)
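The costs above follow from how far an operation has to walk down the tree. A minimal sketch in C, assuming a hypothetical `Node` type that stores one surname and orders keys with `strcmp` (neither appears in the slides):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one surname per node. */
typedef struct Node {
    const char *key;
    struct Node *left, *right;
} Node;

/* Insert key and return the (possibly new) subtree root.
 * Duplicates are ignored; work is proportional to the depth reached. */
Node *bst_insert(Node *root, const char *key)
{
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;            /* allocation failed: key is not stored */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp < 0)
        root->left = bst_insert(root->left, key);
    else if (cmp > 0)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Return the node holding key, or NULL if it is absent. */
Node *bst_search(Node *root, const char *key)
{
    while (root != NULL) {
        int cmp = strcmp(key, root->key);
        if (cmp == 0)
            return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}

/* Height counted in nodes; 0 for an empty tree. */
int bst_height(const Node *root)
{
    if (root == NULL)
        return 0;
    int l = bst_height(root->left);
    int r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the seven surnames in bibliography order gives a tree of height 3, while inserting them in alphabetical order gives a chain of height 7, which is the unbalanced O(n) case.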
6
Hashing
[Figure: a key is passed to the hash function, which returns a position pos in the hash table; the table is an array indexed 0 to TABLESIZE - 1.]
7
Example:
[Figure: the key “Kruse” is passed to the hash function, which returns 5, so Kruse is stored at position 5 of a hash table indexed 0 to 6.]
8
Hashing
Each item has a unique key.
Use a large array called a Hash Table.
Use a Hash Function.
9
Applications
Databases.
Spell checkers.
Computer chess games.
Compilers.
10
Operations
Initialize: all locations in Hash Table are empty.
Insert
Search
Delete
11
Hash Function
Maps keys to positions in the Hash Table.
Should be easy to calculate.
Should use all of the key.
Should spread the keys uniformly.
12
Example: Hash Function #1
unsigned hash(const char* s)
{
    int i = 0;
    unsigned value = 0;
    /* fold each character into the running value, kept in the range 0..100 */
    while (s[i] != '\0')
    {
        value = (s[i] + 31*value) % 101;
        i++;
    }
    return value;
}
13
Example: Hash Function #1
value = (s[i] + 31*value) % 101;
Key “Aho”, taken from: A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.
‘A’ = 65
‘h’ = 104
‘o’ = 111
value = (65 + 31 * 0) % 101 = 65
value = (104 + 31 * 65) % 101 = 99
value = (111 + 31 * 99) % 101 = 49
14
Example: Hash Function #1
value = (s[i] + 31*value) % 101;
Key          Hash Value
Aho          49
Kruse        95
Standish     60
Horowitz     28
Langsam      21
Sedgewick    24
Knuth        44
The resulting table is “sparse”.
15
Example: Hash Function #2
value = (s[i] + 1024*value) % 128;
Key          Hash Value
Aho          111
Kruse        101
Standish     104
Horowitz     122
Langsam      109
Sedgewick    107
Knuth        104
Likely to result in “clustering”.
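The first two hash functions differ only in their multiplier and modulus, so they can be compared with one parameterised routine. The sketch below assumes ASCII character codes; `hash_str`, `hash1` and `hash2` are illustrative names, not taken from the slides.

```c
#include <stddef.h>

/* Polynomial string hash: value = (c + mult*value) % mod for each character c.
 * mult and mod are parameters so that the variants can be compared. */
unsigned hash_str(const char *s, unsigned mult, unsigned mod)
{
    unsigned value = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        value = ((unsigned char)s[i] + mult * value) % mod;
    return value;
}

/* Hash Function #1: multiplier 31, modulus 101. */
unsigned hash1(const char *s) { return hash_str(s, 31, 101); }

/* Hash Function #2: multiplier 1024, modulus 128. */
unsigned hash2(const char *s) { return hash_str(s, 1024, 128); }
```

Because 1024 is a multiple of 128, the term 1024*value vanishes modulo 128, so `hash2` returns just the code of the last character. That is why Standish and Knuth, which both end in 'h', receive the same value 104, and why Hash Function #2 does not "use all of the key".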
16
Example: Hash Function #3
value = (s[i] + 3*value) % 7;
Key          Hash Value
Aho          0
Kruse        5
Standish     1
Horowitz     5
Langsam      5
Sedgewick    2
Knuth        1
Results in “collisions”.
17
Insert
Apply hash function to get a position.
Try to insert key at this position.
Deal with collision.
18
Example: Insert
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Aho -> 0
hash table:
0: Aho
1 to 6: empty
19
Example: Insert
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Kruse -> 5
hash table:
0: Aho
1 to 4: empty
5: Kruse
6: empty
20
Example: Insert
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Standish -> 1
hash table:
0: Aho
1: Standish
2 to 4: empty
5: Kruse
6: empty
21
Search
Apply hash function to get a position.
Look in that position.
Deal with collision.
22
Example: Search
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Kruse -> 5
hash table:
0: Aho
1: Standish
2 to 4: empty
5: Kruse
6: empty
Position 5 holds Kruse: found.
23
Example: Search
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Sedgewick -> 2
hash table:
0: Aho
1: Standish
2 to 4: empty
5: Kruse
6: empty
Position 2 is empty: not found.
24
Hash Tables: Collision Resolution
Collision
When two keys are mapped to the same position.
Very likely.
Birthdays
Number of People    Probability
10                  0.1169
20                  0.4114
30                  0.7063
40                  0.8912
50                  0.9704
60                  0.9941
70                  0.9992
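The Birthdays figures are the probability that at least two people in a group share a birthday, which is the same calculation as the chance of at least one collision when n keys are hashed uniformly into 365 positions. A small sketch, assuming 365 equally likely birthdays (`birthday_collision` is an illustrative name):

```c
/* Probability that at least two of n people share a birthday,
 * assuming 365 equally likely birthdays. */
double birthday_collision(int n)
{
    double all_distinct = 1.0;
    /* Person i must avoid the i birthdays already taken. */
    for (int i = 0; i < n; i++)
        all_distinct *= (365.0 - i) / 365.0;
    return 1.0 - all_distinct;
}
```

Rounded to four decimals, `birthday_collision(10)` and `birthday_collision(20)` agree with the 0.1169 and 0.4114 entries above.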
31
Collision Resolution
Two methods are commonly used:
Linear Probing.
Chaining.
32
Linear Probing
On a collision, search the array linearly from the position where the collision occurred until a free position is found.
33
Insert with Linear Probing
Apply hash function to get a position.
Try to insert key at this position.
Deal with collision.

Must also deal with a full table!
34
Example: Insert with Linear Probing
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Aho -> 0
hash table:
0: Aho
1 to 6: empty
35
Example: Insert with Linear Probing
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Kruse -> 5
hash table:
0: Aho
1 to 4: empty
5: Kruse
6: empty
36
Example: Insert with Linear Probing
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Standish -> 1
hash table:
0: Aho
1: Standish
2 to 4: empty
5: Kruse
6: empty
37
Example: Insert with Linear Probing
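Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Horowitz -> 5
Position 5 is occupied by Kruse, so Horowitz goes into the next free position, 6.
hash table:
0: Aho
1: Standish
2 to 4: empty
5: Kruse
6: Horowitz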
38
Example: Insert with Linear Probing
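Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Langsam -> 5
Positions 5, 6, 0 and 1 are occupied, so Langsam goes into the next free position, 2.
hash table:
0: Aho
1: Standish
2: Langsam
3 to 4: empty
5: Kruse
6: Horowitz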
39
module linearProbe(item)
{
    position = hash(key of item)
    count = 0
    loop {
        if (count == hashTableSize) then {
            output "Table is full"
            exit loop
        }
        if (hashTable[position] is empty) then {
            hashTable[position] = item
            exit loop
        }
        position = (position + 1) % hashTableSize
        count++
    }
}
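The linearProbe module can be sketched in C as follows. The sketch assumes a table of seven string keys with NULL marking an empty position, ASCII character codes, and the formula of Hash Function #3 as the hash; the return value (position used, or -1 when the table is full) is an addition for illustration.

```c
#include <stddef.h>

#define TABLESIZE 7

/* Table of string keys; NULL marks an empty position. */
const char *hashTable[TABLESIZE];

/* Formula of Hash Function #3: value = (s[i] + 3*value) % 7. */
unsigned hash(const char *s)
{
    unsigned value = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        value = ((unsigned char)s[i] + 3 * value) % TABLESIZE;
    return value;
}

/* Insert key with linear probing.
 * Returns the position used, or -1 if every position is occupied. */
int linearProbe(const char *key)
{
    int position = (int)hash(key);
    for (int count = 0; count < TABLESIZE; count++) {
        if (hashTable[position] == NULL) {
            hashTable[position] = key;
            return position;
        }
        position = (position + 1) % TABLESIZE;   /* wrap around at the end */
    }
    return -1;   /* table is full */
}
```

Inserting Aho, Kruse, Standish and Horowitz in that order places Horowitz at position 6, because its home position 5 is already held by Kruse.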
40
Search with Linear Probing
Apply hash function to get a position.
Look in that position.
Deal with collision.
41
Example: Search with Linear Probing
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Langsam -> 5
hash table:
0: Aho
1: Standish
2: Langsam
3 to 4: empty
5: Kruse
6: Horowitz
Positions 5, 6, 0 and 1 hold other keys; position 2 holds Langsam: found.
42
Example: Search with Linear Probing
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash Function: Knuth -> 1
hash table:
0: Aho
1: Standish
2: Langsam
3 to 4: empty
5: Kruse
6: Horowitz
Positions 1 and 2 hold other keys and position 3 is empty: not found.
43
module search(target)
{
    count = 0
    position = hash(key of target)
    loop {
        if (count == hashTableSize) then {
            output "Target is not in Hash Table"
            return -1
        }
        else if (hashTable[position] is empty) then {
            output "Item is not in Hash Table"
            return -1
        }
        else if (hashTable[position].key == key of target) then {
            return position
        }
        position = (position + 1) % hashTableSize
        count++
    }
}
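A C sketch of the search module, under the same assumptions as before: seven string keys, NULL for an empty position, ASCII codes, and the formula of Hash Function #3. The `insert` helper is included only so that the example is self-contained.

```c
#include <stddef.h>
#include <string.h>

#define TABLESIZE 7

const char *hashTable[TABLESIZE];   /* NULL marks an empty position */

unsigned hash(const char *s)
{
    unsigned value = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        value = ((unsigned char)s[i] + 3 * value) % TABLESIZE;
    return value;
}

/* Linear-probing insert; returns the position used or -1 if full. */
int insert(const char *key)
{
    int position = (int)hash(key);
    for (int count = 0; count < TABLESIZE; count++) {
        if (hashTable[position] == NULL) {
            hashTable[position] = key;
            return position;
        }
        position = (position + 1) % TABLESIZE;
    }
    return -1;
}

/* Returns the position of target, or -1 if it is not in the table. */
int search(const char *target)
{
    int position = (int)hash(target);
    for (int count = 0; count < TABLESIZE; count++) {
        if (hashTable[position] == NULL)
            return -1;               /* an empty position ends the probe run */
        if (strcmp(hashTable[position], target) == 0)
            return position;
        position = (position + 1) % TABLESIZE;
    }
    return -1;                       /* every position was probed */
}
```

The empty-position test is what keeps unsuccessful searches short: the probe run stops at the first gap instead of scanning the whole table.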
44
Delete with Linear Probing
Use the search function to find the item.
If found, check whether the items stored after it also hash to the item’s position.
If they do, move them back in the hash table when deleting the item, so that later searches can still reach them.
Very difficult and time/resource consuming!
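One way to carry out the "move them back" step is backward-shift deletion, sketched below. This is one possible realisation rather than the method the slides prescribe; it assumes seven string keys, NULL for an empty position, ASCII codes, and the formula of Hash Function #3. `deleteKey` and the helper names are illustrative.

```c
#include <stddef.h>
#include <string.h>

#define TABLESIZE 7

const char *hashTable[TABLESIZE];   /* NULL marks an empty position */

unsigned hash(const char *s)
{
    unsigned value = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        value = ((unsigned char)s[i] + 3 * value) % TABLESIZE;
    return value;
}

int insert(const char *key)
{
    int position = (int)hash(key);
    for (int count = 0; count < TABLESIZE; count++) {
        if (hashTable[position] == NULL) {
            hashTable[position] = key;
            return position;
        }
        position = (position + 1) % TABLESIZE;
    }
    return -1;
}

int search(const char *target)
{
    int position = (int)hash(target);
    for (int count = 0; count < TABLESIZE; count++) {
        if (hashTable[position] == NULL)
            return -1;
        if (strcmp(hashTable[position], target) == 0)
            return position;
        position = (position + 1) % TABLESIZE;
    }
    return -1;
}

/* Remove key, then shift later items of the same probe run back into
 * the hole so that searches for them never hit a premature gap. */
void deleteKey(const char *key)
{
    int hole = search(key);
    if (hole < 0)
        return;                                   /* key is not present */
    hashTable[hole] = NULL;
    for (int j = (hole + 1) % TABLESIZE; hashTable[j] != NULL;
         j = (j + 1) % TABLESIZE) {
        int home = (int)hash(hashTable[j]);
        /* The item at j stays put if its home position lies cyclically
         * in (hole, j]: a search for it never passes through the hole. */
        int reachable = (hole < j) ? (hole < home && home <= j)
                                   : (hole < home || home <= j);
        if (!reachable) {
            hashTable[hole] = hashTable[j];       /* move the item back */
            hashTable[j] = NULL;
            hole = j;                             /* the hole moves forward */
        }
    }
}
```

With Aho, Kruse, Standish and Horowitz in the table, deleting Kruse moves Horowitz from position 6 back to its home position 5, so a later search for Horowitz still succeeds.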
45
Linear Probing: Problems
Speed.
Tendency for clustering to occur as the table becomes half full.
Deletion of records is very difficult.
If implemented in arrays, the table may become full fairly quickly; resizing is time and resource consuming.
46
Chaining
Uses a Linked List at each position in the Hash Table.
The Linked List at a position contains all the items that ‘hash’ to that position.
The linked lists may be kept sorted or unsorted.
47
[Figure: hash table with positions 0, 1, 2, 3, ..., each holding a linked list.]
48
Example: Chaining
Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash values: 0, 5, 1, 5, 5, 2, 1
Position    Items    Linked list
0           1        Aho
1           2        Standish, Knuth
2           1        Sedgewick
3           0
4           0
5           3        Kruse, Horowitz, Langsam
6           0
49
Hashtable with Chaining
At each position in the array you have a list:
List hashTable[MAXTABLE];
[Figure: positions 0, 1, 2, ... of the table, each with its own list.]
You must initialise each list in the table.
50
Insert with Chaining
Apply hash function to get a position in
the array.
Insert key into the Linked List at this
position in the array.
51
module InsertChaining(item)
{
    posHash = hash(key of item)
    insert (hashTable[posHash], item);
}
[Figure: position 0 holds Aho; position 1 holds Standish and Knuth; position 2 holds Sedgewick.]
52
Search with Chaining
Apply hash function to get a position in
the array.
Search the Linked List at this position in
the array.
53
/* module returns NULL if not found, or the address of the
 * node if found */
module SearchChaining(item)
{
    posHash = hash(key of item)
    Node* found;
    found = searchList (hashTable[posHash], item);
    return found;
}
[Figure: position 0 holds Aho; position 1 holds Standish and Knuth; position 2 holds Sedgewick.]
54
Delete with Chaining
Apply hash function to get a position in
the array.
Delete the node in the Linked List at
this position in the array.
55
/* module uses the Linked list delete function to delete an item
 * inside that list, it does nothing if that item isn’t there. */
module DeleteChaining(item)
{
    posHash = hash(key of item)
    deleteList (hashTable[posHash], item);
}
[Figure: position 0 holds Aho; position 1 holds Standish and Knuth; position 2 holds Sedgewick.]
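The three chaining modules can be sketched together in C. The `Node` type, the unsorted head insertion, and the function names below are assumptions made for illustration; the slides leave the list implementation to a separate Linked List module. The hash is the formula of Hash Function #3, with ASCII codes assumed.

```c
#include <stdlib.h>
#include <string.h>

#define MAXTABLE 7

/* Hypothetical list node: one key per node. */
typedef struct Node {
    const char *key;
    struct Node *next;
} Node;

/* A file-scope array is zero-initialised, so every list starts empty. */
Node *hashTable[MAXTABLE];

unsigned hash(const char *s)
{
    unsigned value = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        value = ((unsigned char)s[i] + 3 * value) % MAXTABLE;
    return value;
}

/* Insert at the head of the list for the key's position (unsorted list).
 * Returns 0 on success, -1 if memory could not be allocated. */
int insertChaining(const char *key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    unsigned pos = hash(key);
    n->key = key;
    n->next = hashTable[pos];
    hashTable[pos] = n;
    return 0;
}

/* Returns the node holding key, or NULL if it is not found. */
Node *searchChaining(const char *key)
{
    for (Node *n = hashTable[hash(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}

/* Unlinks and frees the first node holding key; does nothing if absent. */
void deleteChaining(const char *key)
{
    Node **link = &hashTable[hash(key)];
    while (*link != NULL && strcmp((*link)->key, key) != 0)
        link = &(*link)->next;
    if (*link != NULL) {
        Node *dead = *link;
        *link = dead->next;
        free(dead);
    }
}
```

Deletion only touches the one list the key hashes to, which is why it is so much simpler here than with linear probing.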
56
Disadvantages of Chaining
Uses more space.
More complex to implement.
Contains a linked list at every element in the array.
Requires linear searching.
May be time consuming.
57
Advantages of Chaining
Insertions and Deletions are easy and quick.
Allows more records to be stored.
Naturally resizable, allows a varying number of records to be stored.
58
Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
(i + j*d(k)) mod N, for j = 0, 1, ..., N - 1
where i is the position given by the primary hash function.
The secondary hash function d(k) cannot have zero values.
The table size N must be a prime to allow probing of all the cells.
Common choice of compression map for the secondary hash function:
d(k) = q - k mod q
where q < N and q is a prime.
The possible values for d(k) are 1, 2, ..., q.
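A small C sketch of double hashing for non-negative integer keys. The table size 13, the prime 7, the primary hash k mod N, and all names are assumptions chosen for illustration; none of them come from the slides.

```c
#define N_CELLS 13      /* table size N: a prime (assumed value) */
#define Q_PRIME 7       /* prime q smaller than N (assumed value) */
#define EMPTY   (-1)    /* marks a free cell; keys must be non-negative */

int table[N_CELLS];

/* Mark every cell as free. */
void initTable(void)
{
    for (int i = 0; i < N_CELLS; i++)
        table[i] = EMPTY;
}

/* Primary hash: the home cell i (assumed to be k mod N). */
int h(int k) { return k % N_CELLS; }

/* Secondary hash: step size q - k mod q, always in 1..q, never zero. */
int d(int k) { return Q_PRIME - k % Q_PRIME; }

/* Place k in the first free cell of the series (i + j*d(k)) mod N.
 * Returns the cell used, or -1 if no free cell was found. */
int insertDouble(int k)
{
    int i = h(k);
    int step = d(k);
    for (int j = 0; j < N_CELLS; j++) {
        int cell = (i + j * step) % N_CELLS;
        if (table[cell] == EMPTY) {
            table[cell] = k;
            return cell;
        }
    }
    return -1;
}
```

With these values, 18 goes to cell 5; 44 also has home cell 5 but a step of 5, so it lands in cell 10. Keys that collide at their home cell usually follow different probe sequences, which is the advantage over linear probing.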
Dictionaries and Hash Tables
59
Performance of Hashing
In the worst case, searches, insertions and removals on a hash table take O(n) time.
The worst case occurs when all the keys inserted into the dictionary collide.
The load factor α = n/N (n items stored in a table of N cells) affects the performance of a hash table.
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - α).
The expected running time of all the dictionary ADT operations in a hash table is O(1).
In practice, hashing is very fast provided the load factor is not close to 100%.
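To see why a load factor close to 100% hurts, the expected-probes formula can be evaluated directly. A minimal sketch; `expectedProbes` and its parameter names are illustrative:

```c
/* Expected number of probes for an insertion with open addressing,
 * 1 / (1 - load factor), where load factor = items / cells.
 * Only meaningful while items < cells. */
double expectedProbes(int items, int cells)
{
    double loadFactor = (double)items / cells;
    return 1.0 / (1.0 - loadFactor);
}
```

A half-full table needs about 2 probes per insertion and a table that is 90% full needs about 10; the count grows without bound as the table fills.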
Applications of hash tables:
small databases
compilers
browser caches
60