Download Hashing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Generalized linear model wikipedia , lookup

Cryptography wikipedia , lookup

Probability box wikipedia , lookup

Receiver operating characteristic wikipedia , lookup

Birthday problem wikipedia , lookup

Post-quantum cryptography wikipedia , lookup

Hash table wikipedia , lookup

Bloom filter wikipedia , lookup

Rainbow table wikipedia , lookup

Cryptographic hash function wikipedia , lookup

Transcript
Look-up problem
IP address
did we see the IP address before?
Hashing + chaining
use IP address
as an index
IP address
hash function
linked list
index
How to choose a hash function?
depends on the distribution of the data
x [0,1]
x  x.n
interpret x as a number
x  x mod n
IP-addresses
n=256
bad
n=257
good?
Universal hash functions
choose a hash function “randomly”
n = number of entries in the hash table
U = the universe
h: U  {0,...,n-1} a hash function
Universal hash functions
choose a hash function “randomly”
n = number of entries in the hash table
U = the universe
h: U  {0,...,n-1} a hash function
a set of hash functions H is universal
if x,y U and random h  H
P ( h(x) = h(y) )  1/n
Universal hash functions
a set of hash functions H is universal
if x,y U and random h  H
P ( h(x) = h(y) )  1/n
For IP addresses
choose a1,a2,a3,a4  {0,1,...,256}
(x1,x2,x3,x4)  a1x1+a2x2+a3x3+a4x4 mod 257
Perfect hashing
Goal: worst-case O(1) search
space used O(m)
static set of elements
Perfect hashing
Goal: worst-case O(1) search
space used O(m)
static set of elements
n = m2
i.e., space used (m2)
H = family of universal hash functions
 hash function h H with no collision
Perfect hashing
Goal: worst-case O(1) search
space used O(m)
n=m
H = family of universal hash functions
x1,...,xn the number of elements that map
to 1,2,...,n
 h H such that
 xi2 = O(m)
Perfect hashing
 h H such that
Goal: worst-case O(1) search
space used O(m)
n=m
 xi2 = O(m)
H = family of universal hash functions
x1,...,xn the number of elements that map
to 1,2,...,n
secondary hash table of size xi2
Bloom filter
Goal: store an m element subset of IP addresses
IP address
HASH
0
0
HASH
n-bits of storage
HASH
0
Bloom filter - insert
INSERT(x)
for i from 1 to k do
A(hi(x))  1
IP address
HASH
1
1
HASH
n-bits of storage
HASH
1
Bloom filter – member
MEMBER(x)
for i from 1 to k do
if A(hi(x))=0 then return FALSE
return TRUE
IP address
HASH
1
1
HASH
n-bits of storage
HASH
1
Bloom filter – member
MEMBER(x)
for i from 1 to k do
if A(hi(x))=0 then return FALSE
return TRUE
sometimes gives false positive answer
error parameter: false positive probability
Bloom filter – analysis
error parameter: false positive probability
m = number of items to be stored
n = number of bits of storage
k = number of hash functions
Bloom filter – analysis
error parameter: false positive probability
m = number of items to be stored
n = number of bits of storage
k = number of hash functions
p = fraction of the bits filled
p
-km/n
e
Bloom filter – analysis
error parameter: false positive probability
m = number of items to be stored
n = number of bits of storage
k = number of hash functions
p = fraction of the bits filled
false positive probability
k
(1-p)
p  e-km/n
Bloom filter – analysis
error parameter: false positive probability
m = number of items to be stored
n = number of bits of storage
k = number of hash functions
optimal
k  0.7 m/n
false positive rate  0.6185m/n