Hash Table

Yerusha Nuh & Ivan Yu
Efficient access of data.
• Access by index.
• A mapping between search keys and
indices allows each data item to be stored in
the array element with the
corresponding index.
There are 500 students in a school. Each
student has their own TDSB nine-digit
student number.
If we want to assign an ID to each student
name, we could use their student
number as the array index. However, if the greatest
student number is “351000005”, the array would
need 351,000,006 elements (indices 0 to
351000005). This is far more than what is
required to store the names of 500 students.
• Mapping between the student numbers
and the numbers from 0 to 499.
By using arithmetic operations on keys, we
can map them onto table addresses.
• Direct referencing.
Methods for mapping:
• Direct address table
• Hash table
Hash table – a data structure that uses a hash function
to efficiently map certain identifiers or keys (e.g.
persons’ names) to associated values (e.g. their
telephone numbers).
A hash table is made up of two parts:
• An array (the actual table where the data to be
searched is stored)
• A mapping function, a.k.a. hash function.
Hash Function
Hash function – a function that transforms
the search key into a table address.
Different hash functions use different
arithmetic operations to do this. We will
focus on modulo arithmetic.
Modulo Arithmetic
Numbers as keys
• Address = search key % size of array
Pseudocode - Number
get key
address = key % size of array
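The pseudocode above can be sketched in Java as follows. The class and method names are hypothetical, chosen only for illustration, and the table size of 500 comes from the student example. `Math.floorMod` is used instead of a bare `%` because `%` can return a negative remainder for negative keys in Java.

```java
// Hypothetical helper: maps a numeric key to a table address with modulo arithmetic.
final class ModuloHash {
    // Returns an address in the range [0, tableSize), even for negative keys.
    static int address(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 500; // one slot per student, as in the example above
        // 351000005 = 500 * 702000 + 5, so this prints 5
        System.out.println(address(351000005, tableSize));
    }
}
```

Two student numbers that differ by a multiple of 500 would receive the same address under this scheme, which is why collisions have to be handled separately.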
Strings as keys
• Take the binary representation of a key
as a number and then apply the first
method (numbers as keys).
In general, the arithmetic operations in
such expressions will use 32-bit modular
arithmetic, ignoring overflow.
For example:
Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
Integer.MAX_VALUE = 2147483647
Integer.MIN_VALUE = -2147483648
104*31^4 + 101*31^3 + 108*31^2 + 108*31^1 + 111*31^0
= 99162322
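As a quick check of the figures above, the following sketch evaluates the base-31 polynomial term by term in 32-bit `int` arithmetic. The coefficients 104, 101, 108, 108, 111 are the character codes of the string "hello", and Java's `String.hashCode()` is documented to use this same polynomial. The class and method names are hypothetical.

```java
// Hypothetical demo of 32-bit wrap-around and the base-31 polynomial hash.
final class OverflowDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1]*31^0 in int arithmetic,
    // silently wrapping around on overflow.
    static int polynomial(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int j = 0; j < n - 1 - i; j++) {
                power *= 31; // 31^(n-1-i), possibly wrapped
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
        System.out.println(polynomial("hello"));                        // 99162322
        System.out.println("hello".hashCode());                         // 99162322
    }
}
```

For longer strings the intermediate powers exceed the `int` range, and the result may wrap around to a negative number.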
To prevent overflow, we first rewrite the polynomial using Horner’s method:
a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + … + a_1 x^1 + a_0 x^0
= x(x(…x(x(a_n x + a_{n-1}) + a_{n-2}) + …) + a_1) + a_0
99162322 = (((104*31 + 101)*31 + 108)*31 +
108)*31 + 111
We then compute the hash function by applying
the mod (%) operation at each step, thus
avoiding overflow.
For example, with base 32, character codes
22, 5, 18, 25 and table size N:
Compute h0 = (22*32 + 5) % N
Compute h1 = (32*h0 + 18) % N
Compute h2 = (32*h1 + 25) % N
Pseudocode - String
get string
address = 0
loop (once for each character of the
string, from first to last)
address = (31*address + Unicode of
character) % size of array
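A minimal Java sketch of the string pseudocode above, assuming the address starts at 0 and the characters are processed from first to last. The class and method names are hypothetical. The intermediate product is computed as a `long` so that it cannot overflow, whatever the table size.

```java
// Hypothetical helper: Horner's method with a mod at every step.
final class StringHash {
    static int address(String key, int tableSize) {
        int address = 0;
        for (int i = 0; i < key.length(); i++) {
            // 31L forces 64-bit arithmetic for the intermediate value;
            // the result of % tableSize always fits back into an int.
            address = (int) ((31L * address + key.charAt(i)) % tableSize);
        }
        return address;
    }

    public static void main(String[] args) {
        // Steps for "hello" with table size 101: 3, 93, 62, 10, 17
        System.out.println(address("hello", 101)); // 17
    }
}
```

The result agrees with 99162322 % 101 = 17, so reducing at every step gives the same address as reducing the full polynomial once at the end.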
Hash Table
How do we choose the size of the array (hash table)?
Let N be the number of records to be stored.
Let M be the size of the hash table.
Ideally N records are stored in a hash table of size N.
• We may not have prior knowledge of the exact number of
records.
• It is possible to have two keys mapped to the same index
(although this can be prevented).
Hence, we assume that the size of the table (M) can be
different from the number of records (N).
Load factor – the ratio between N and M.
• Load factor L = N/M
• The default load factor of Java’s HashMap is 0.75.
Note: M should be a prime number to
obtain a more even distribution of keys
over the table.
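As an illustration of the load factor, the sketch below computes L = N/M for the 500-student example with an assumed prime table size of 701 (a value picked only for this example). It also shows how the load factor can be passed explicitly to `java.util.HashMap`, whose constructor accepts an initial capacity and a load factor. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of the load factor L = N / M.
final class LoadFactorDemo {
    static double loadFactor(int records, int tableSize) {
        return (double) records / tableSize;
    }

    public static void main(String[] args) {
        int n = 500; // number of records (students)
        int m = 701; // assumed table size; 701 is prime
        System.out.println(loadFactor(n, m)); // roughly 0.71

        // HashMap with initial capacity 16 and load factor 0.75 (its defaults).
        Map<Integer, String> students = new HashMap<>(16, 0.75f);
        students.put(351000005, "example student");
        System.out.println(students.size()); // 1
    }
}
```

In `HashMap`, the load factor acts as a threshold: once the number of entries exceeds capacity times load factor, the table is enlarged and the entries are rehashed.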
Collision Resolution
Collision – when two or more keys hash to
the same index.
Methods to resolve collisions:
• Separate chaining
• Open addressing
○ Linear probing
○ Quadratic probing
○ Double hashing
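Separate chaining is only named in the list above. As a rough sketch of the idea, each table slot can hold a list of all keys that hash to it, so colliding keys simply share a slot. The class below is hypothetical, stores integer keys only, and uses modulo hashing.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of separate chaining for integer keys.
final class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>()); // one empty chain per slot
        }
    }

    private int index(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Appends the key to the chain of its slot unless it is already there.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(index(key));
        if (!chain.contains(key)) {
            chain.add(key);
        }
    }

    // Searches only the chain of the slot the key hashes to.
    boolean contains(int key) {
        return buckets.get(index(key)).contains(key);
    }
}
```

With a table of size 7, the keys 6 and 13 both hash to slot 6 and end up in the same chain, and a lookup for either one scans only that chain.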
Linear Probing
Collision when inserting:
• Probe the next slot in the table.
• If unoccupied, store the key.
• If occupied, continue probing the next
slot.
Linear Probing - Collision
Collision when searching:
• If the key hashes to an occupied slot but
does not match the key occupying the
slot, probe the next slot.
• If slot is empty, search is unsuccessful.
• If slot is occupied:
○ If it does not match, continue probing the next slot.
○ If it matches, search is successful.
When reaching the end of table, resume
from the beginning.
• Primary clustering – building up of large
clusters of occupied slots.
• Runs slowly for tables that are almost full.
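The insertion and search rules for linear probing can be sketched as follows. This is a minimal illustration, not a full implementation. It assumes integer keys, modulo hashing, no deletions, and a hypothetical class name. An empty slot is marked by `null`, and both operations give up after probing every slot once.

```java
// Hypothetical sketch of an open-addressing table with linear probing.
final class LinearProbingTable {
    private final Integer[] slots; // null marks an unoccupied slot

    LinearProbingTable(int size) {
        slots = new Integer[size];
    }

    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Returns the index where the key is stored, or -1 if the table is full.
    int insert(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
            i = (i + 1) % slots.length; // next slot, wrapping to the beginning
        }
        return -1;
    }

    // Returns the index of the key, or -1 if the search is unsuccessful.
    int search(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) {
                return -1;              // empty slot: the key is not in the table
            }
            if (slots[i] == key) {
                return i;               // occupied and matching: found
            }
            i = (i + 1) % slots.length; // occupied, no match: keep probing
        }
        return -1;
    }
}
```

In a table of size 7, inserting 6 and then 13 places 6 in slot 6 and 13 in slot 0, because 13 also hashes to slot 6 and the probe wraps around to the beginning. Each further key that hashes into this run of occupied slots makes the run longer, which is the primary clustering effect.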
Hash Table - Advantages
• Fast lookup, especially with a large number of entries (thousands
or more).
• Efficient when the maximum number of entries can be
predicted in advance.
• If the set of key-value pairs is fixed and known
ahead of time (no insertions and deletions),
average lookup cost can be reduced by a
careful choice of the hash function, bucket
table size, and internal data structures.
Hash Table - Disadvantages
• More difficult to implement than self-balancing binary
search trees.
• Difficult to create a perfect hash function.
• Insertion or deletion may occasionally take time proportional to
the number of entries.
○ May not be suitable for real-time or interactive
applications.
• The cost of computing a good hash function can be
significantly higher than a lookup step in a sequential list or
search tree, even though operations take constant
time on average.
○ Not suitable for a small number of entries.