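Universal Hashing

Worst-case analysis
Probabilistic analysis: needs knowledge of the distribution of the inputs

Indicator random variables

Given a sample space $S$ and an event $A$, the indicator random variable $I\{A\}$ associated with event $A$ is defined as:

$$I\{A\} = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

E.g.: Consider flipping a fair coin:
• Sample space $S = \{H, T\}$
• Define random variable $Y$ with $\Pr\{Y = H\} = \Pr\{Y = T\} = 1/2$
• We can define an indicator r.v. $X_H$ associated with the coin coming up heads, i.e. $Y = H$:

$$X_H = I\{Y = H\} = \begin{cases} 1 & \text{if } Y = H \\ 0 & \text{if } Y = T \end{cases}$$

$$E[X_H] = E[I\{Y = H\}] = 1 \cdot \Pr\{Y = H\} + 0 \cdot \Pr\{Y = T\} = \Pr\{Y = H\} = \frac{1}{2}$$

Lemma: Given a sample space $S$ and an event $A$ in the sample space $S$, let $X_A = I\{A\}$. Then $E[X_A] = \Pr\{A\}$.

Proof: $E[X_A] = E[I\{A\}] = 1 \cdot \Pr\{A\} + 0 \cdot \Pr\{\bar{A}\} = \Pr\{A\}$.

Let $H = \{\, h : U \to \{0, \ldots, m-1\} \,\}$ be a finite collection of hash functions. $H$ is called "universal" if for each pair of distinct keys $k, \ell \in U$, the number of hash functions $h \in H$ for which $h(k) = h(\ell)$ is at most $|H|/m$.

Define $n_i$ = the length of list $T[i]$.

Theorem: Suppose $h$ is randomly selected from a universal collection $H$, and chaining is used to resolve collisions in a table $T$ holding $n$ keys in $m$ slots, with load factor $\alpha = n/m$. If $k$ is not in the table, then $E[n_{h(k)}] \le \alpha$. If $k$ is in the table, then $E[n_{h(k)}] \le 1 + \alpha$.

Proof: For each pair $k$ and $\ell$ of distinct keys, define $X_{k\ell} = I\{h(k) = h(\ell)\}$. By the definition of universality, $\Pr_h\{h(k) = h(\ell)\} \le 1/m$, and so $E[X_{k\ell}] \le 1/m$ by the lemma. Define $Y_k$ to be the number of keys other than $k$ that hash to the same slot as $k$, so that

$$Y_k = \sum_{\ell \in T,\ \ell \ne k} X_{k\ell}$$

$$E[Y_k] = E\Big[\sum_{\ell \in T,\ \ell \ne k} X_{k\ell}\Big] = \sum_{\ell \in T,\ \ell \ne k} E[X_{k\ell}] \le \sum_{\ell \in T,\ \ell \ne k} \frac{1}{m}$$

If $k \notin T$, then $n_{h(k)} = Y_k$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n$, thus

$$E[n_{h(k)}] = E[Y_k] \le \frac{n}{m} = \alpha$$

If $k \in T$, then because $k$ appears in $T[h(k)]$ and the count $Y_k$ does not include $k$, we have $n_{h(k)} = Y_k + 1$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n - 1$. Thus

$$E[n_{h(k)}] = E[Y_k] + 1 \le \frac{n-1}{m} + 1 = 1 + \alpha - \frac{1}{m} < 1 + \alpha$$

Corollary: Using universal hashing and collision resolution by chaining in an initially empty table with $m$ slots, it takes expected time $\Theta(n)$ to handle any sequence of $n$ Insert, Search and Delete operations containing $O(m)$ Insert operations.

Proof: Since there are $O(m)$ Insert operations, the table never holds more than $O(m)$ keys, so the load factor is $O(1)$. By the theorem, each Search takes $O(1)$ expected time. Each Insert and Delete takes $O(1)$ time.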
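Thus the expected time for the whole sequence is $\Theta(n)$.

Designing a universal class of hash functions:

Let $p$ be a prime, $Z_p = \{0, 1, \ldots, p-1\}$ and $Z_p^* = \{1, 2, \ldots, p-1\}$. For any $a \in Z_p^*$ and $b \in Z_p$, define $h_{a,b} : Z_p \to Z_m$ by

$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$$

$$H_{p,m} = \{\, h_{a,b} : a \in Z_p^* \text{ and } b \in Z_p \,\}$$

Theorem: $H_{p,m}$ is universal.

Pf: Let $k, \ell$ be two distinct keys in $Z_p$. Given $h_{a,b}$, let $r = (ak + b) \bmod p$ and $s = (a\ell + b) \bmod p$. Then

$$r - s \equiv a(k - \ell) \pmod{p}$$

Since $p$ is prime and both $a$ and $k - \ell$ are nonzero modulo $p$, their product is nonzero modulo $p$, so $r \ne s$. That is, for any $h_{a,b} \in H_{p,m}$, distinct inputs $k$ and $\ell$ map to distinct $r$ and $s$ modulo $p$.

Each of the $p(p-1)$ possible choices for the pair $(a, b)$ with $a \ne 0$ yields a different resulting pair $(r, s)$ with $r \ne s$, since we can solve for $a$ and $b$ given $r$ and $s$:

$$a = \big((r - s)\,((k - \ell)^{-1} \bmod p)\big) \bmod p$$
$$b = (r - ak) \bmod p$$

Since there are $p(p-1)$ possible pairs $(r, s)$ with $r \ne s$, there is a one-to-one correspondence between pairs $(a, b)$ with $a \ne 0$ and pairs $(r, s)$ with $r \ne s$. For any given pair of inputs $k$ and $\ell$, if we pick $(a, b)$ uniformly at random from $Z_p^* \times Z_p$, the resulting pair $(r, s)$ is equally likely to be any pair of distinct values modulo $p$. Hence

$$\Pr[k \text{ and } \ell \text{ collide}] = \Pr_{r,s}[r \equiv s \pmod{m}]$$

Given $r$, the number of $s$ such that $s \ne r$ and $s \equiv r \pmod{m}$ is at most

$$\lceil p/m \rceil - 1 \le \frac{p + m - 1}{m} - 1 = \frac{p - 1}{m}$$

because the values in $Z_p$ congruent to $r$ modulo $m$ are $r \bmod m,\ (r \bmod m) + m,\ (r \bmod m) + 2m, \ldots$ up to $p - 1$, of which there are at most $\lceil p/m \rceil$, and one of them is $r$ itself. Thus

$$\Pr_{r,s}[r \equiv s \pmod{m}] \le \frac{(p-1)/m}{p-1} = \frac{1}{m}$$

Therefore, for any pair of distinct $k, \ell \in Z_p$, $\Pr[h_{a,b}(k) = h_{a,b}(\ell)] \le 1/m$, so that $H_{p,m}$ is universal.

Perfect Hashing

Good for when the keys are static, i.e., once stored, the set of keys never changes, e.g. the files on a CD-ROM or the set of reserved words in a programming language. Perfect hashing uses $O(1)$ memory accesses for a search in the worst case.

Thm: If we store $n$ keys in a hash table of size $m = n^2$ using a hash function $h$ randomly chosen from a universal class of hash functions, then the probability of there being any collisions is $< 1/2$.

Proof: Let $h$ be chosen from a universal family. Then each pair of keys collides with probability at most $1/m$, and there are $\binom{n}{2}$ pairs of keys. Let $X$ be a r.v. that counts the number of collisions. When $m = n^2$,

$$E[X] \le \binom{n}{2} \cdot \frac{1}{m} = \frac{n^2 - n}{2} \cdot \frac{1}{n^2} < \frac{1}{2}$$

By Markov's inequality, $\Pr[X \ge t] \le E[X]/t$. Taking $t = 1$ gives $\Pr[X \ge 1] \le E[X] < 1/2$.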
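The family $H_{p,m}$ and the $m = n^2$ theorem can be tried out directly. The following is a minimal Python sketch, not a reference implementation: the function names `make_hash`, `colliding_functions` and `collision_free_hash` are hypothetical, and it assumes the keys are distinct integers in $\{0, \ldots, p-1\}$ for a prime $p$ supplied by the caller.

```python
import random

def make_hash(p, m, rng=random):
    """Draw h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly from H_{p,m}."""
    a = rng.randrange(1, p)  # a in Z_p^* = {1, ..., p-1}
    b = rng.randrange(0, p)  # b in Z_p   = {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

def colliding_functions(p, m, k, l):
    """Count the pairs (a, b) for which h_{a,b}(k) == h_{a,b}(l), by brute force."""
    return sum(
        ((a * k + b) % p) % m == ((a * l + b) % p) % m
        for a in range(1, p)
        for b in range(p)
    )

def collision_free_hash(keys, p, rng=random):
    """Redraw h until the keys occupy distinct slots in a table of size n^2.

    Assumes the keys are distinct integers in {0, ..., p-1}; by the theorem
    each draw succeeds with probability > 1/2, so few retries are expected.
    """
    m = len(keys) ** 2
    tries = 0
    while True:
        tries += 1
        h = make_hash(p, m, rng)
        if len({h(k) for k in keys}) == len(keys):
            return h, m, tries
```

For example, `colliding_functions(17, 6, 8, 3)` counts 32 colliding functions out of $|H_{17,6}| = 17 \cdot 16 = 272$, which is below the universal bound $272/6 \approx 45.3$.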
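Thm: If we store $n$ keys in a hash table of size $m = n$ using a hash function $h$ randomly chosen from a universal class of hash functions, then

$$E\Big[\sum_{j=0}^{m-1} n_j^2\Big] < 2n,$$

where $n_j$ is the number of keys hashing to slot $j$.

Pf: For any nonnegative integer $a$,

$$a^2 = a + 2\binom{a}{2}$$

Hence

$$E\Big[\sum_{j=0}^{m-1} n_j^2\Big] = E\Big[\sum_{j=0}^{m-1} \Big(n_j + 2\binom{n_j}{2}\Big)\Big] = E\Big[\sum_{j=0}^{m-1} n_j\Big] + 2E\Big[\sum_{j=0}^{m-1} \binom{n_j}{2}\Big] = E[n] + 2E\Big[\sum_{j=0}^{m-1} \binom{n_j}{2}\Big] = n + 2E\Big[\sum_{j=0}^{m-1} \binom{n_j}{2}\Big]$$

The sum $\sum_{j=0}^{m-1} \binom{n_j}{2}$ is the total number of pairs of keys that collide. By universality its expectation is at most

$$\binom{n}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m} = \frac{n-1}{2}, \quad \text{since } m = n.$$

Therefore

$$E\Big[\sum_{j=0}^{m-1} n_j^2\Big] \le n + 2 \cdot \frac{n-1}{2} = 2n - 1 < 2n.$$

Cor: If we store $n$ keys in a hash table of size $m = n$ using a hash function $h$ randomly chosen from a universal class of hash functions and set the size of each secondary hash table to $m_j = n_j^2$ for $j = 0, \ldots, m-1$, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is $< 2n$.

Pf:

$$E\Big[\sum_{j=0}^{m-1} m_j\Big] = E\Big[\sum_{j=0}^{m-1} n_j^2\Big] < 2n.$$

Testing a few randomly chosen hash functions will soon find one using small storage.

Cor: $\Pr[\text{total storage for secondary hash tables} \ge 4n] < 1/2$.

Pf: By Markov's inequality, $\Pr[X \ge t] \le E[X]/t$. Take $X = \sum_{j=0}^{m-1} m_j$ and $t = 4n$:

$$\Pr\Big\{\sum_{j=0}^{m-1} m_j \ge 4n\Big\} \le \frac{E\big[\sum_{j=0}^{m-1} m_j\big]}{4n} < \frac{2n}{4n} = \frac{1}{2}.$$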
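The two corollaries suggest a simple construction: redraw the first-level function until the secondary tables need fewer than $4n$ slots in total, then redraw each secondary function until it is collision-free. Below is a rough Python sketch of such a two-level scheme, assuming a nonempty set of distinct integer keys in $\{0, \ldots, p-1\}$ for a prime $p$ chosen by the caller. The names `make_hash`, `build_perfect_table` and `contains` are illustrative, not taken from the slides.

```python
import random

def make_hash(p, m, rng):
    """Draw h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly from H_{p,m}."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m

def build_perfect_table(keys, p, rng=None):
    """Build a two-level table for a static, nonempty set of distinct keys."""
    if not keys:
        raise ValueError("this sketch assumes at least one key")
    rng = rng or random.Random()
    n = len(keys)
    # Level 1: m = n slots. Redraw h until sum of n_j^2 is below 4n;
    # each draw succeeds with probability > 1/2 by the corollary.
    while True:
        h = make_hash(p, n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:
            break
    # Level 2: bucket j gets m_j = n_j^2 slots and a collision-free h_j.
    secondary = []
    for bucket in buckets:
        m_j = len(bucket) ** 2
        while True:
            h_j = make_hash(p, m_j, rng) if m_j else None
            slots = [None] * m_j
            collided = False
            for k in bucket:
                i = h_j(k)
                if slots[i] is not None:
                    collided = True
                    break
                slots[i] = k
            if not collided:
                break
        secondary.append((h_j, slots))
    return h, secondary

def contains(table, key):
    """Search with one first-level and one second-level probe."""
    h, secondary = table
    h_j, slots = secondary[h(key)]
    return bool(slots) and slots[h_j(key)] == key
```

A search evaluates two hash functions and inspects one slot, which is the $O(1)$ worst-case behaviour the scheme is after. The retry loops only affect construction time, not lookups.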