Universal Hashing
Worst-case analysis
Probabilistic analysis: requires knowledge of the distribution of the inputs
Indicator random variables
Given a sample space S and an event A, the indicator random variable I{A} associated with event A is defined as:
I{A} = 1 if A occurs, and I{A} = 0 otherwise.
E.g.:
Consider flipping a fair coin:
• Sample space S = { H,T }
• Define random variable Y with Pr{Y=H} = Pr{Y=T} = 1/2
• We can define an indicator r.v. X_H associated with the coin coming up heads, i.e. Y=H:
X_H = I{Y=H} = 1 if Y=H, 0 if Y=T
E[X_H] = E[I{Y=H}] = 1·Pr{Y=H} + 0·Pr{Y=T} = Pr{Y=H} = 1/2
Lemma:
Given a sample space S and an event A in the sample space S, let X_A = I{A}.
Then E[X_A] = Pr{A}.
Proof:
E[X_A] = E[I{A}] = 1·Pr{A} + 0·Pr{Ā} = Pr{A}
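As a quick sanity check of the lemma (an illustration added here, not part of the original notes), the sample mean of an indicator variable approaches the probability of its event. Below is a minimal Python sketch assuming a fair coin, matching the X_H example above; the names and trial count are illustrative.

import random

def indicator(event_occurred: bool) -> int:
    # Indicator random variable: 1 if the event occurs, 0 otherwise.
    return 1 if event_occurred else 0

# Estimate E[X_H] for a fair coin by averaging the indicator over many flips.
trials = 100_000
flips = [random.choice("HT") for _ in range(trials)]
x_h_mean = sum(indicator(flip == "H") for flip in flips) / trials
print(f"empirical E[X_H] = {x_h_mean:.3f}   (Pr{{Y=H}} = 0.5)")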
H = { h : U→{0,…,m-1} } is a finite collection of hash functions.
H is called “universal” if for each pair of distinct keys k, ℓ ∈ U, the number of hash functions h ∈ H for which h(k) = h(ℓ) is at most |H|/m.
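To make the definition concrete, the brute-force check below (an illustrative sketch, not from the notes; the name is_universal and the parameters p = 17, m = 6 are assumptions) counts, for every pair of distinct keys, how many functions in a small family collide on that pair and compares the count with |H|/m. It is run on the family H_{p,m} constructed later in these notes.

from itertools import combinations

def is_universal(family, universe, m):
    # Universality condition, checked by brute force: for every pair of
    # distinct keys k, l, at most |H|/m functions in the family collide.
    bound = len(family) / m
    for k, l in combinations(universe, 2):
        collisions = sum(1 for h in family if h(k) == h(l))
        if collisions > bound:
            return False
    return True

# Example: the family h_{a,b}(k) = ((a*k + b) mod p) mod m defined later.
p, m = 17, 6
family = [lambda k, a=a, b=b: ((a * k + b) % p) % m
          for a in range(1, p) for b in range(p)]
print(is_universal(family, universe=range(p), m=m))   # expected: True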
Define n_i = the length of list T[i], and recall the load factor α = n/m, where n is the number of keys stored in T.
Theorem:
Suppose h is randomly selected from a universal family H and collisions are resolved by chaining. If k is not in the table, then E[n_{h(k)}] ≤ α. If k is in the table, then E[n_{h(k)}] ≤ 1 + α.
Proof:
For each pair of distinct keys k and ℓ, define X_{kℓ} = I{h(k)=h(ℓ)}.
By definition, Pr_h{h(k)=h(ℓ)} ≤ 1/m, and so E[X_{kℓ}] ≤ 1/m.
Define Y_k to be the number of keys other than k that hash to the same slot as k, so that
Y_k = Σ_{ℓ∈T, ℓ≠k} X_{kℓ}
and
E[Y_k] = Σ_{ℓ∈T, ℓ≠k} E[X_{kℓ}] ≤ Σ_{ℓ∈T, ℓ≠k} 1/m.
If k ∉ T, then n_{h(k)} = Y_k and |{ℓ : ℓ ∈ T, ℓ ≠ k}| = n,
thus E[n_{h(k)}] = E[Y_k] ≤ n/m = α.
If k ∈ T, then because k appears in T[h(k)] and the count Y_k does not include k, we have n_{h(k)} = Y_k + 1 and |{ℓ : ℓ ∈ T, ℓ ≠ k}| = n − 1.
Thus E[n_{h(k)}] = E[Y_k] + 1 ≤ (n − 1)/m + 1 = 1 + α − 1/m < 1 + α.
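The following Python sketch (an illustration, not part of the notes; the helper make_hash and the parameter choices are assumptions) builds a chained table using a randomly drawn h_{a,b} from the family introduced below, then reports the average chain length seen when searching for stored keys; over random choices of h, this is expected to be at most 1 + α.

import random

def make_hash(p, m):
    # Draw h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly at random.
    a = random.randrange(1, p)
    b = random.randrange(p)
    return lambda k: ((a * k + b) % p) % m

p, m = 101, 16                        # p prime, table with m slots
keys = random.sample(range(p), 24)    # n = 24 keys, so alpha = 1.5
h = make_hash(p, m)

table = [[] for _ in range(m)]        # chaining: table[i] is the list for slot i
for k in keys:
    table[h(k)].append(k)

avg_chain = sum(len(table[h(k)]) for k in keys) / len(keys)
print(f"alpha = {len(keys)/m:.2f}, average chain length for stored keys = {avg_chain:.2f}")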
Corollary: Using universal hashing and collision
resolution by chaining in an initially empty table
with m slots, it takes expected time Θ(n) to handle
any sequence of n Insert, Search and Delete
operations containing O(m) Insert operations.
Proof: Since n = O(m), the load factor α = n/m is O(1). By the theorem, each Search takes expected O(1) time, and each Insert and Delete takes O(1) time. Thus the expected time for the whole sequence of n operations is Θ(n).
Designing a universal class of hash functions:
p: prime
Z_p = {0, 1, …, p−1},  Z_p* = {1, 2, …, p−1}
For any a ∈ Z_p* and b ∈ Z_p, define h_{a,b}: Z_p → Z_m by
h_{a,b}(k) = ((ak + b) mod p) mod m
H_{p,m} = { h_{a,b} : a ∈ Z_p* and b ∈ Z_p }
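A direct implementation of this family (a minimal sketch; the helper names hab and random_hash and the choice p = 17 are illustrative assumptions):

import random

def hab(a, b, p, m, k):
    # h_{a,b}(k) = ((a*k + b) mod p) mod m
    return ((a * k + b) % p) % m

def random_hash(p, m):
    # Choose a member of H_{p,m} uniformly at random: a in Z_p^*, b in Z_p.
    a = random.randrange(1, p)
    b = random.randrange(p)
    return lambda k: hab(a, b, p, m, k)

p, m = 17, 6          # p is prime and larger than every key, so all keys lie in Z_p
h = random_hash(p, m)
print([h(k) for k in range(10)])   # slot indices in {0, ..., m-1}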
Theorem:
H_{p,m} is universal.
Pf: Let k, ℓ be two distinct keys in Z_p.
Given h_{a,b}, let r = (ak + b) mod p and s = (aℓ + b) mod p.
Then r − s ≡ a(k − ℓ) (mod p).
Since p is prime, a ≢ 0 (mod p), and k ≢ ℓ (mod p), we have r ≠ s: for any h_{a,b} ∈ H_{p,m}, distinct inputs k and ℓ map to distinct values r and s modulo p.
Moreover, each of the p(p−1) possible choices of the pair (a,b) with a ≠ 0 yields a different pair (r,s) with r ≠ s, since we can solve for a and b given r and s:
a = ((r − s)((k − ℓ)^{-1} mod p)) mod p
b = (r − ak) mod p
Since there are exactly p(p−1) possible pairs (r,s) with r ≠ s, there is a one-to-one correspondence between pairs (a,b) with a ≠ 0 and pairs (r,s) with r ≠ s.
Therefore, for any given pair of inputs k and ℓ, if we pick (a,b) uniformly at random from Z_p* × Z_p, the resulting pair (r,s) is equally likely to be any pair of distinct values modulo p.
It follows that Pr[k and ℓ collide] = Pr_{r,s}[r ≡ s (mod m)].
Given r, the number of values s such that s ≠ r and s ≡ r (mod m) is at most
⌈p/m⌉ − 1 ≤ ((p + m − 1)/m) − 1 = (p − 1)/m,
since the values in Z_p congruent to r modulo m are r, r+m, r+2m, …, all less than p.
Thus,
Pr_{r,s}[r ≡ s (mod m)] ≤ ((p − 1)/m)/(p − 1) = 1/m.
Therefore, for any pair of distinct keys k, ℓ ∈ Z_p,
Pr[h_{a,b}(k) = h_{a,b}(ℓ)] ≤ 1/m,
so H_{p,m} is universal.
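The key step of the proof, recovering (a, b) from (r, s), can be checked directly with modular arithmetic. The sketch below (an illustrative check added here; p = 17 and the keys 3 and 11 are arbitrary choices) confirms the one-to-one correspondence for every (a, b) with a ≠ 0.

p = 17
k, l = 3, 11                         # any two distinct keys in Z_p

seen = set()
for a in range(1, p):                # a in Z_p^*
    for b in range(p):               # b in Z_p
        r = (a * k + b) % p
        s = (a * l + b) % p
        assert r != s                # distinct keys give distinct r, s mod p
        # Solve back for (a, b) from (r, s), exactly as in the proof.
        a_rec = ((r - s) * pow(k - l, -1, p)) % p
        b_rec = (r - a_rec * k) % p
        assert (a_rec, b_rec) == (a, b)
        seen.add((r, s))

# p(p-1) distinct (a,b) pairs map to p(p-1) distinct (r,s) pairs with r != s.
print(len(seen) == p * (p - 1))      # expected: True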
Perfect Hashing
Good when the keys are static, i.e., once stored, the keys never change; for example, data on a CD-ROM, or the set of reserved words in a programming language. Perfect hashing uses O(1) memory accesses in the worst case for a search.
Thm:
If we store n keys in a hash table of size m = n² using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions is < 1/2.
Proof:
Let h be chosen from a universal family. Then each pair of distinct keys collides with probability at most 1/m, and there are C(n,2) = n(n−1)/2 pairs of keys.
Let X be a r.v. that counts the number of collisions. When m = n²,
E[X] ≤ C(n,2) · (1/n²) = (n(n−1)/2) · (1/n²) = (n−1)/(2n) < 1/2.
By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t; taking t = 1 gives Pr[X ≥ 1] ≤ E[X] < 1/2.
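An empirical illustration (a sketch with assumed parameters n = 20 and p = 10007, not from the notes): drawing a random h_{a,b} for a table of size m = n² should give a collision-free placement more than half the time.

import random
from collections import Counter

p = 10007                              # a prime larger than every key
keys = random.sample(range(p), 20)     # n = 20 static keys
m = len(keys) ** 2                     # table size m = n^2

trials = 1000
collision_free = 0
for _ in range(trials):
    a, b = random.randrange(1, p), random.randrange(p)
    counts = Counter(((a * k + b) % p) % m for k in keys)
    if all(c == 1 for c in counts.values()):
        collision_free += 1

print(f"collision-free fraction: {collision_free / trials:.2f}   (theorem: > 1/2)")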
Thm: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, then E[Σ_{j=0}^{m−1} n_j²] < 2n, where n_j is the number of keys hashing to slot j.
Pf:
It is clear that for any nonnegative integer a,
a² = a + 2·C(a,2),   where C(a,2) = a(a−1)/2.
Thus
E[Σ_{j=0}^{m−1} n_j²] = E[Σ_{j=0}^{m−1} (n_j + 2·C(n_j,2))]
= Σ_{j=0}^{m−1} E[n_j] + 2·E[Σ_{j=0}^{m−1} C(n_j,2)]
= E[n] + 2·E[Σ_{j=0}^{m−1} C(n_j,2)]
= n + 2·E[Σ_{j=0}^{m−1} C(n_j,2)].
Now Σ_{j=0}^{m−1} C(n_j,2) is the total number of pairs of keys that collide, so
E[Σ_{j=0}^{m−1} C(n_j,2)] ≤ C(n,2) · (1/m) = n(n−1)/(2m) = (n−1)/2, since m = n.
Therefore
E[Σ_{j=0}^{m−1} n_j²] ≤ n + 2 · (n−1)/2 = 2n − 1 < 2n.
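A quick numerical check of this bound (an illustrative sketch with assumed parameters, not from the notes): with m = n and a randomly drawn h_{a,b}, the average of Σ_j n_j² over many draws stays below 2n.

import random
from collections import Counter

p = 10007
n = 50
keys = random.sample(range(p), n)
m = n                                   # primary table of size m = n

trials = 2000
total = 0
for _ in range(trials):
    a, b = random.randrange(1, p), random.randrange(p)
    counts = Counter(((a * k + b) % p) % m for k in keys)
    total += sum(c * c for c in counts.values())    # sum of n_j^2

print(f"average of sum n_j^2: {total / trials:.1f}   (theorem: < {2 * n})")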
Cor: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, and we set the size of each secondary hash table to m_j = n_j² for j = 0, …, m−1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.
Pf:
E[Σ_{j=0}^{m−1} m_j] = E[Σ_{j=0}^{m−1} n_j²] < 2n.
Testing a few randomly chosen hash functions will therefore quickly find one that uses small storage, as in the sketch below.
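The two-level construction can be put together as follows. This is an illustrative FKS-style sketch under the assumptions above (keys drawn from Z_p for a prime p, primary table of size m = n, secondary tables of size n_j²); the helper names build_perfect, lookup, hab and the parameters are assumptions, not part of the notes. It retries level-one functions until total secondary storage is at most 4n (see the corollary that follows) and level-two functions until each secondary table is collision-free.

import random
from collections import defaultdict

def hab(a, b, p, m, k):
    # h_{a,b}(k) = ((a*k + b) mod p) mod m
    return ((a * k + b) % p) % m

def build_perfect(keys, p):
    # Two-level perfect hashing sketch: primary table of size m = n,
    # secondary table of size n_j^2 for slot j, no collisions at level two.
    n = len(keys)
    m = n
    # Level 1: retry until total secondary storage is at most 4n.
    while True:
        a, b = random.randrange(1, p), random.randrange(p)
        buckets = defaultdict(list)
        for k in keys:
            buckets[hab(a, b, p, m, k)].append(k)
        if sum(len(ks) ** 2 for ks in buckets.values()) <= 4 * n:
            break
    # Level 2: retry each slot until a collision-free function is found
    # (each attempt succeeds with probability > 1/2, by the theorem above).
    secondary = {}
    for j, ks in buckets.items():
        mj = len(ks) ** 2
        while True:
            aj, bj = random.randrange(1, p), random.randrange(p)
            slots = [hab(aj, bj, p, mj, k) for k in ks]
            if len(set(slots)) == len(ks):
                secondary[j] = (aj, bj, mj, dict(zip(slots, ks)))
                break
    return (a, b, m, secondary)

def lookup(table, p, k):
    a, b, m, secondary = table
    j = hab(a, b, p, m, k)
    if j not in secondary:
        return False
    aj, bj, mj, slot_map = secondary[j]
    return slot_map.get(hab(aj, bj, p, mj, k)) == k

p = 10007
keys = random.sample(range(p), 40)
table = build_perfect(keys, p)
print(all(lookup(table, p, k) for k in keys))     # expected: True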
Cor: Pr[total storage for secondary hash tables ≥ 4n] < 1/2.
Pf: By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t.
Take X = Σ_{j=0}^{m−1} m_j and t = 4n:
Pr{Σ_{j=0}^{m−1} m_j ≥ 4n} ≤ E[Σ_{j=0}^{m−1} m_j] / (4n)
< 2n/(4n) = 1/2.