Auto-completion Search

How it works
What's the dictionary?

Trie for the Dictionary
[Figure: trie over the dictionary, nodes numbered 0 to 7, edge labels s, y, z, stile, aibelyite, zyg, etic, ial, omo, czecin, ygy]
Pro: O(p) search time = path scan
Cons: edge + node labels + tree structure

Top-1
[Figure: the same trie with scores attached, query prefix P = sy]
What's the ranking/scoring of the answers?

Top-1: How to speed-up
[Figure: the same trie with scores attached, query prefix P = sy]
How to compute the top-1 in O(1) time?

Top-2
[Figure: the same trie with scores attached, query prefix P = sy, nodes annotated with pairs]
How to compute the top-2 in O(1) time?
Top-k in O(1) time, but k× space

Top-k: How to squeeze?
Prefixed by P, proceed D&C

String:  1  2  3  4  5  6  7
Score:   8  2  1  4  2  3  5

Let H be a max-heap of size k, keep also min[H] and max[H]
Initialize H with k pairs <-∞, NULL>
Given the range <L,R> (here <1,4>)
Compute max-score in Array[L,R] (pos. M, value m)   [RMQ-query in O(1) time and O(n) space]
If m ≤ min[H], skip; else:
    Insert <m,string> in H;
    If size(H)>k then remove min[H];
    Recurse on <L,M-1> and <M+1,R>, if not empty.
Time: O(k) time and space
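The divide-and-conquer steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the slides' implementation: `range_max` is a naive linear scan standing in for the O(1)-time RMQ structure, the heap is Python's `heapq` min-heap capped at k entries (so min[H] sits at its root) instead of a max-heap pre-filled with k dummy pairs, and the names `top_k` and `range_max` are hypothetical. Positions are 1-based, as in the slides.

```python
import heapq


def range_max(scores, lo, hi):
    """Naive stand-in for the RMQ (assumption): position and value of the
    maximum in scores[lo..hi], 1-based and inclusive. Runs in O(hi - lo)."""
    pos = max(range(lo, hi + 1), key=lambda i: scores[i - 1])
    return pos, scores[pos - 1]


def top_k(scores, lo, hi, k):
    """Return the k best (score, position) pairs in scores[lo..hi]."""
    if k <= 0:
        return []
    heap = []  # min-heap of (score, position), never more than k entries

    def visit(l, r):
        if l > r:  # empty range
            return
        pos, val = range_max(scores, l, r)
        # Heap full and the best of this range cannot beat min[H]: skip it.
        if len(heap) == k and val <= heap[0][0]:
            return
        heapq.heappush(heap, (val, pos))
        if len(heap) > k:
            heapq.heappop(heap)  # remove min[H]
        visit(l, pos - 1)
        visit(pos + 1, r)

    visit(lo, hi)
    return sorted(heap, reverse=True)
```

Reading the score table as 8, 2, 1, 4, 2, 3, 5 (an assumption about the flattened slide), `top_k(scores, 1, 4, 2)` returns `[(8, 1), (4, 4)]`. The recursion here is depth-first, so ranges may be visited in a different order than on the slides, but the pruning test is the same.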
Example for Top-2
Consider this other array (L = 1, R = 7):

String:  1  2  3  4  5  6  7
Score:   4  2  1  8  2  3  5

Range : operations
[1,7]: insert <8,4> in H; recurse on [1,3] and [5,7]
[1,3]: H={<8,4>}, insert <4,1>; recurse on [1,0] and [2,3]
[5,7]: H={<8,4>,<4,1>}, insert <5,7>; delete <4,1> from H, recurse on [5,6] and [8,7]
[2,3]: H={<8,4>,<5,7>}, max is <2,2>; since min[H]=5, not inserted in H
[5,6]: H={<8,4>,<5,7>}, max is <3,6>; since min[H]=5, not inserted in H
H = {<8,4> and <5,7>}

A smarter approach
Prefixed by P, proceed D&C

String:  1  2  3  4  5  6  7
Score:   8  2  1  4  2  3  5

Let H be a max-heap, including items <val, string, [low,high]>
Compute max-score in Array[L,R] (pos. M, value m)
i=0; insert <m, string[M], L, R> in H
While (i<k) do
    Extract <x, string[X], Lx, Rx> from H, where x is max-value in H
    Return String[X] as one of the top-k strings
    Compute max-score in Array[Lx,X-1] (pos. M', value m')
    insert <m', string[M'], Lx, X-1>
    Compute max-score in Array[X+1,Rx] (pos. M'', value m'')
    insert <m'', string[M''], X+1, Rx>
    i++;
Time: still O(k) time and space

Random access to postings lists and other data types

A basic problem!

Dog → 1 12 15 20 22 ....
Array of n skip pointers to an array of m integers
• (log m) bits per pointer = (n log m) bits = 32 n bits.
• it is effective for few pointers

D = AbacoBattleCarColdCod ....
Array of n string pointers to strings of total length m
• (n log m) bits = 32 n bits.
• it is independent of string length

D = AbacoBattleCarColdCod ....
B = 100001000001001000100 ....
(B has a 1 at the starting position of each string of D)
We aim at achieving ≈ n log(m/n) bits < n log m

Rank/Select
Wish to index the bit vector B (possibly compressed).
B = 00101001010101011111110000011010101....
Select1(3) = 8, Rank1(6) = 2
• Rankb(i) = number of b in B[1,i]
• Selectb(i) = position of the i-th b in B
m = |B|, n = #1s
There exist data structures that solve this problem in O(1) query time and very small extra space (i.e. +o(m) bits).

The Bit-Vector Index: |B| + o(|B|)
m = |B|, n = #1s
Goal. B is read-only, and the additional index takes o(m) bits.

Rank
B 00101001010101011 1111100010110101 0101010111000....
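As a reference point for the Rank/Select definitions above, here is a minimal Python sketch that computes both by plain scanning. It only pins down the semantics (1-based positions, inclusive prefix B[1,i]); it is not the O(1)-time index with o(m) extra bits that the slides aim at. The function names and the helper `ith_string` are hypothetical.

```python
def rank(bits, b, i):
    """Number of occurrences of symbol b in bits[1..i] (1-based, inclusive)."""
    return bits[:i].count(b)


def select(bits, b, i):
    """1-based position of the i-th occurrence of b in bits, or None."""
    seen = 0
    for pos, bit in enumerate(bits, start=1):
        if bit == b:
            seen += 1
            if seen == i:
                return pos
    return None


def ith_string(D, B, i):
    """i-th string of the concatenation D, where B marks string starts with '1'."""
    start = select(B, "1", i)
    if start is None:
        raise IndexError("fewer than i strings")
    nxt = select(B, "1", i + 1)
    end = nxt if nxt is not None else len(D) + 1  # last string runs to the end
    return D[start - 1:end - 1]
```

On the bit vector of the Rank/Select slide, `rank(B, "1", 6)` gives 2 and `select(B, "1", 3)` gives 8, matching the slide. With D = AbacoBattleCarColdCod and its marker vector B, `ith_string(D, B, 3)` gives `"Car"`, which shows how Select replaces the array of explicit string pointers.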
Z (absolute) Rank1, one value per bucket of Z bits: 8, 18, ...
z (bucket-relative) Rank1, one value per block of z bits: 4, 5, 8, ...

Lookup table over blocks of z bits:
block  pos  #1
0000   1    0
....   ...  ...
1011   2    1
....

Setting Z = poly(log m) and z = (1/2) log m:
Extra space is (m/Z) log m + (m/z) log Z + o(m) = O(m loglog m / log m) = o(m) bits
Rank time is O(1)
Term o(m) is crucial in practice
B is needed: it is untouched (not compressed) and read-only!
There exists a Bit-Vector Index taking o(m) extra bits and constant time for Rank/Select.

Elias-Fano (B is not needed)
[Figure: example with z = 3, w = 2; the high parts (buckets 0 to 7) are written in unary in H]
If w = log (m/n) and z = log n, where m = |B| and n = #1s, then
- L takes n w = n log (m/n) bits
- H takes n 1s + n 0s = 2n bits
Select1(i) on B uses L and (Select1(H,i) - i), i.e. Select1 on H, in +o(n) space
Rank1(i) on B needs binary search over B

If you wish to play with Rank and Select
m/10 + n log (m/n)
Rank in 0.4 msec, Select in < 1 msec
vs 32n bits of explicit pointers
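The Elias-Fano layout described above (low bits in L, high bits in unary in H) can be sketched in Python as follows. This is an illustrative sketch under assumptions, not a space-efficient implementation: L and H are plain Python lists, Select1 on H is a linear scan instead of an O(1) structure, the stored values are 0-based integers below 2^(w+z), and all function names are hypothetical. In H, each value contributes a 1 and each of the 2^z buckets is closed by a 0, so Select1(H,i) - i is the high part of the i-th value.

```python
def ef_encode(values, w, z):
    """Encode a sorted list of integers in [0, 2**(w+z)) as (L, H)."""
    low_mask = (1 << w) - 1
    L = [x & low_mask for x in values]  # w low bits of each value
    H = []
    bucket = 0
    for x in values:
        high = x >> w
        while bucket < high:  # close the buckets before this value's bucket
            H.append(0)
            bucket += 1
        H.append(1)
    while bucket < (1 << z):  # close the remaining buckets
        H.append(0)
        bucket += 1
    return L, H


def select1(H, i):
    """1-based position of the i-th 1 in H (naive scan, an assumption)."""
    seen = 0
    for pos, bit in enumerate(H, start=1):
        seen += bit
        if bit and seen == i:
            return pos
    raise IndexError("fewer than i ones")


def ef_select(L, H, w, i):
    """i-th smallest value (1-based): high part from H, low part from L."""
    high = select1(H, i) - i  # number of 0s before the i-th 1
    return (high << w) | L[i - 1]


def ef_rank(L, H, w, x):
    """Number of stored values <= x, by binary search over ef_select."""
    lo, hi = 0, len(L)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ef_select(L, H, w, mid) <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, with z = 3 and w = 2 as on the slide and the hypothetical set 1, 4, 7, 18, 24, 26, 30, 31 (n = 8 values below m = 32), H has 8 ones and 8 zeros, that is 2n bits, and L holds n values of w = 2 bits each.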