Universal Hashing


Worst-case analysis
Probabilistic analysis: needs knowledge of the distribution of the inputs

Indicator random variables

Given a sample space S and an event A, the indicator random variable I{A} associated with event A is defined as:

I{A} = 1 if A occurs, 0 otherwise

E.g., consider flipping a fair coin:
• Sample space S = { H, T }
• Define a random variable Y with Pr{Y=H} = Pr{Y=T} = 1/2
• We can define an indicator r.v. X_H associated with the coin coming up heads, i.e. Y=H:

X_H = I{Y=H} = 1 if Y=H, 0 if Y=T

E[X_H] = E[I{Y=H}] = 1·Pr{Y=H} + 0·Pr{Y=T} = Pr{Y=H} = 1/2
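As a quick sanity check, here is a minimal Python simulation (not part of the original notes; the function name and trial count are illustrative) that estimates E[X_H] by averaging the indicator over many simulated fair coin flips:

```python
import random

def estimate_indicator_expectation(trials=100_000):
    """Estimate E[X_H], where X_H = I{Y = H} for a fair coin flip Y."""
    total = 0
    for _ in range(trials):
        y = random.choice(["H", "T"])   # fair coin: Pr{Y=H} = Pr{Y=T} = 1/2
        x_h = 1 if y == "H" else 0      # indicator random variable X_H
        total += x_h
    return total / trials

if __name__ == "__main__":
    print(estimate_indicator_expectation())  # should be close to 1/2 = Pr{Y=H}
```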
Lemma:
Given a sample space S and an event A in the sample space S, let X_A = I{A}. Then E[X_A] = Pr{A}.
Proof:
E[X_A] = E[I{A}] = 1·Pr{A} + 0·Pr{not A} = Pr{A}




H = { h : U → {0,…,m−1} }, which is a finite collection of hash functions.
H is called “universal” if for each pair of distinct keys k, ℓ ∈ U, the number of hash functions h ∈ H for which h(k) = h(ℓ) is at most |H|/m.
Define n_i = the length of list T[i].
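The definition can be checked mechanically for any finite family. Below is a small, hypothetical Python helper (the name is_universal and its parameters are illustrative) that counts colliding functions for every pair of distinct keys:

```python
from itertools import combinations

def is_universal(family, universe, m):
    """Check the definition of a universal family: for every pair of
    distinct keys k, l in the universe, the number of functions h in
    the family with h(k) == h(l) must be at most |family| / m."""
    bound = len(family) / m
    for k, l in combinations(universe, 2):
        colliding = sum(1 for h in family if h(k) == h(l))
        if colliding > bound:
            return False
    return True
```

For instance, one could pass it the family H_{p,m} constructed later in these notes, with each h_{a,b} given as a Python function.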
Theorem:
Suppose h is randomly selected from H, and chaining is used to resolve collisions; let α = n/m denote the load factor. If k is not in the table, then E[n_{h(k)}] ≤ α. If k is in the table, then E[n_{h(k)}] ≤ 1 + α.

Proof:
For each pair k and ℓ of distinct keys, define X_{kℓ} = I{h(k) = h(ℓ)}.
By definition, Pr_h{h(k) = h(ℓ)} ≤ 1/m, and so E[X_{kℓ}] ≤ 1/m.
Define Y_k to be the number of keys other than k that hash to the same slot as k, so that

Y_k = Σ_{ℓ∈T, ℓ≠k} X_{kℓ}

E[Y_k] = Σ_{ℓ∈T, ℓ≠k} E[X_{kℓ}] ≤ |{ℓ : ℓ ∈ T, ℓ ≠ k}| · (1/m)

If k ∉ T, then n_{h(k)} = Y_k and |{ℓ : ℓ ∈ T, ℓ ≠ k}| = n,
thus E[n_{h(k)}] = E[Y_k] ≤ n/m = α.

If k ∈ T, then because k appears in T[h(k)] and the count Y_k does not include k, we have n_{h(k)} = Y_k + 1 and |{ℓ : ℓ ∈ T, ℓ ≠ k}| = n − 1.
Thus E[n_{h(k)}] = E[Y_k] + 1 ≤ (n − 1)/m + 1 < 1 + α.
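The bound is easy to see empirically. The sketch below is hypothetical and assumes an idealized hash that assigns each key a uniformly random slot (a stand-in that satisfies the universal property with collision probability exactly 1/m); it measures the average chain length seen by a stored key and compares it with 1 + α:

```python
import random

def average_chain_length(n=1000, m=128, trials=200):
    """Compare the average of n_{h(k)} over stored keys k with 1 + alpha,
    where alpha = n/m, under an idealized uniformly random hash."""
    alpha = n / m
    total = 0.0
    for _ in range(trials):
        slots = [random.randrange(m) for _ in range(n)]  # slot h(k) of each stored key
        counts = [0] * m                                  # counts[j] = n_j
        for s in slots:
            counts[s] += 1
        total += sum(counts[s] for s in slots) / n        # avg chain length over stored keys
    return total / trials, 1 + alpha

if __name__ == "__main__":
    observed, bound = average_chain_length()
    print(f"average chain length {observed:.2f}  vs  1 + alpha = {bound:.2f}")
```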

Corollary: Using universal hashing and collision resolution by chaining in an initially empty table with m slots, it takes expected time Θ(n) to handle any sequence of n Insert, Search and Delete operations containing O(m) Insert operations.
Proof: Since the sequence contains O(m) Insert operations, the table holds O(m) keys, so the load factor α is O(1). By the theorem, each Search takes expected O(1) time, and each Insert and Delete takes O(1) time. Thus the expected time for the whole sequence of n operations is Θ(n).
Designing a universal class of hash functions:

Let p be a prime large enough that every key lies in Z_p.
Z_p = {0, 1, …, p−1},   Z_p* = {1, 2, …, p−1}
For any a ∈ Z_p* and b ∈ Z_p, define h_{a,b} : Z_p → Z_m by
h_{a,b}(k) = ((ak + b) mod p) mod m
H_{p,m} = { h_{a,b} : a ∈ Z_p* and b ∈ Z_p }
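A direct Python rendering of this construction follows (a sketch; the concrete values of p, m, and the sample keys are illustrative, and p is assumed to be a prime larger than every key):

```python
import random

def make_h(a, b, p, m):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m as a Python function."""
    return lambda k: ((a * k + b) % p) % m

def random_hash(p, m):
    """Choose h_{a,b} uniformly at random from the family H_{p,m}."""
    a = random.randrange(1, p)   # a in Z_p* = {1, ..., p-1}
    b = random.randrange(0, p)   # b in Z_p  = {0, ..., p-1}
    return make_h(a, b, p, m)

if __name__ == "__main__":
    p, m = 101, 10               # p prime and larger than any key used below
    h = random_hash(p, m)
    print([h(k) for k in (3, 17, 58)])   # three example keys in Z_p
```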
Theorem:
H_{p,m} is universal.
Pf: Let k, ℓ be two distinct keys in Z_p.
For a given h_{a,b}, let r = (ak + b) mod p and s = (aℓ + b) mod p.
Then r − s ≡ a(k − ℓ) (mod p).


For any h_{a,b} ∈ H_{p,m}, distinct inputs k and ℓ map to distinct values r and s modulo p: since a ≢ 0 and k ≢ ℓ (mod p), the difference a(k − ℓ) is nonzero modulo p.
Each of the p(p−1) possible choices for the pair (a, b) with a ≠ 0 yields a different resulting pair (r, s) with r ≠ s, since we can solve for a and b given r and s:
a = ((r − s)·((k − ℓ)^(−1) mod p)) mod p
b = (r − ak) mod p
Since there are exactly p(p−1) possible pairs (r, s) with r ≠ s, there is a one-to-one correspondence between pairs (a, b) with a ≠ 0 and pairs (r, s) with r ≠ s.

For any given pair of distinct inputs k and ℓ, if we pick (a, b) uniformly at random from Z_p* × Z_p, the resulting pair (r, s) is equally likely to be any pair of distinct values modulo p.


Pr[k and ℓ collide] = Pr_{r,s}[r ≡ s (mod m)]
Given r, the number of values s such that s ≠ r and s ≡ r (mod m) is at most
⌈p/m⌉ − 1 ≤ ((p + m − 1)/m) − 1 = (p − 1)/m,
since the values congruent to r modulo m are r, r + m, r + 2m, …, all less than p.
Thus,
Pr_{r,s}[r ≡ s (mod m)] ≤ ((p − 1)/m) / (p − 1) = 1/m.
Therefore, for any pair of distinct k, ℓ ∈ Z_p,
Pr[h_{a,b}(k) = h_{a,b}(ℓ)] ≤ 1/m,
so H_{p,m} is universal.
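For small parameters the theorem can also be verified exhaustively. The following hypothetical sketch counts, for every pair of distinct keys in Z_p, how many (a, b) pairs with a ≠ 0 cause a collision, and checks the count against |H_{p,m}|/m = p(p−1)/m (the chosen p = 11, m = 4 are just illustrative):

```python
from itertools import combinations

def verify_universal(p, m):
    """Exhaustively check that H_{p,m} is universal for a small prime p."""
    bound = p * (p - 1) / m                  # |H_{p,m}| / m
    for k, l in combinations(range(p), 2):   # all pairs of distinct keys in Z_p
        colliding = sum(
            1
            for a in range(1, p)             # a in Z_p*
            for b in range(p)                # b in Z_p
            if ((a * k + b) % p) % m == ((a * l + b) % p) % m
        )
        if colliding > bound:
            return False
    return True

if __name__ == "__main__":
    print(verify_universal(p=11, m=4))       # expected output: True
```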

Perfect Hashing

Good when the keys are static, i.e., once stored, the keys never change; e.g. a CD-ROM, or the set of reserved words in a programming language. A perfect hashing scheme uses O(1) memory accesses for a search in the worst case.
Thm:
If we store n keys in a hash table of size m = n² using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions is < 1/2.
Proof:
Let h be chosen from a universal family. Then each pair of distinct keys collides with probability at most 1/m, and there are C(n,2) pairs of keys (where C(n,2) denotes the binomial coefficient “n choose 2”).
Let X be a r.v. that counts the number of collisions. When m = n²,

E[X] ≤ C(n,2)·(1/m) = ((n² − n)/2)·(1/n²) < 1/2.

By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t; taking t = 1 gives Pr[X ≥ 1] ≤ E[X] < 1/2.
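A hedged empirical illustration (not from the original notes; p = 10007 and n = 10 are illustrative choices): draw h_{a,b} at random from H_{p,m} with m = n², hash a fixed set of n keys, and estimate how often any collision occurs. By the theorem the estimate should stay below 1/2:

```python
import random

def estimate_collision_probability(n=10, p=10007, trials=2000):
    """Estimate Pr[any collision] when n fixed keys are hashed into m = n^2
    slots by a random h_{a,b} from H_{p,m}.  p = 10007 is prime."""
    m = n * n
    keys = random.sample(range(p), n)              # n distinct keys in Z_p
    bad = 0
    for _ in range(trials):
        a = random.randrange(1, p)                 # a in Z_p*
        b = random.randrange(p)                    # b in Z_p
        slots = [((a * k + b) % p) % m for k in keys]
        if len(set(slots)) < n:                    # at least one collision occurred
            bad += 1
    return bad / trials

if __name__ == "__main__":
    print(estimate_collision_probability())       # expected to be below 0.5
```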

Thm: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, then E[Σ_{j=0}^{m−1} n_j²] < 2n, where n_j is the number of keys hashing to slot j.

Pf:
It is clear that for any nonnegative integer a, a² = a + 2·C(a,2).

E[Σ_{j=0}^{m−1} n_j²] = E[Σ_{j=0}^{m−1} (n_j + 2·C(n_j,2))]
                      = E[Σ_{j=0}^{m−1} n_j] + 2·E[Σ_{j=0}^{m−1} C(n_j,2)]
                      = n + 2·E[Σ_{j=0}^{m−1} C(n_j,2)],   since Σ_{j=0}^{m−1} n_j = n.

Σ_{j=0}^{m−1} C(n_j,2) is the total number of pairs of keys that collide, so its expectation is at most

C(n,2)·(1/m) = n(n−1)/(2m) = (n−1)/2,   since m = n.

Therefore

E[Σ_{j=0}^{m−1} n_j²] ≤ n + 2·((n−1)/2) = 2n − 1 < 2n.

Cor: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions and set the size of each secondary hash table to m_j = n_j² for j = 0,…,m−1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.

Pf:
E[Σ_{j=0}^{m−1} m_j] = E[Σ_{j=0}^{m−1} n_j²] < 2n.

Testing a few randomly chosen hash functions will quickly find one that uses a small amount of storage.


Cor: Pr[total storage for secondary hash tables ≥ 4n] < 1/2.
Pf: By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t.
Take X = Σ_{j=0}^{m−1} m_j and t = 4n:

Pr{Σ_{j=0}^{m−1} m_j ≥ 4n} ≤ E[Σ_{j=0}^{m−1} m_j] / (4n) < 2n/(4n) = 1/2.