Download 1,0 - DidaWiki

Auto-completion Search
How it works
What’s the dictionary?
Trie for the Dictionary
(Figure: trie for the dictionary, with edge labels s, y, z, stile, aibelyite, zyg, etic, ial, omo, czecin, ygy.)
Pro: O(p) search time, where p = |P|, i.e. a scan of the root-to-node path.
Cons: space for edge labels, node labels and the tree structure.
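A minimal sketch of prefix search on a trie. It assumes an uncompacted trie (one character per edge) instead of the compacted one in the figure, and a hypothetical seven-word dictionary chosen to fit the figure's edge labels; `Trie`, `locate` and `complete` are illustrative names. `locate` is the O(p) path scan; `complete` then lists every string in the subtree below that node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # one character per edge (uncompacted trie)
        self.is_word = False  # True if a dictionary string ends here


class Trie:
    def __init__(self, words):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def locate(self, p):
        """Follow the path spelled by p: one step per character, O(p)."""
        node = self.root
        for ch in p:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def complete(self, p):
        """All dictionary strings prefixed by p, in lexicographic order."""
        node = self.locate(p)
        if node is None:
            return []
        out = []
        stack = [(node, p)]
        while stack:
            cur, prefix = stack.pop()
            if cur.is_word:
                out.append(prefix)
            # push children in reverse order so the smallest is popped first
            for ch in sorted(cur.children, reverse=True):
                stack.append((cur.children[ch], prefix + ch))
        return out


# Hypothetical dictionary, for illustration only.
WORDS = ["systile", "syzygetic", "syzygial", "syzygy",
         "szaibelyite", "szczecin", "szomo"]
trie = Trie(WORDS)
print(trie.complete("sy"))  # ['systile', 'syzygetic', 'syzygial', 'syzygy']
```

Listing the whole subtree costs time proportional to its size, which is what the ranking question below addresses.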
Top-1
(Figure: the same trie, queried with P = sy.)
What’s the ranking/scoring of the answers?
Top-1: how to speed up?
(Figure: the trie queried with P = sy.)
How to compute the top-1 in O(1) time?
Top-k in O(1) time, but k× space
Top-2
(Figure: the trie queried with P = sy.)
How to compute the top-2 in O(1) time?
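One reading of "Top-k in O(1) time, but k× space" is to cache, at every trie node, the k best-scored strings of its subtree: once the node of P is reached by the O(p) path scan, the answer is read off directly. The sketch below assumes that reading; the scores and the `TopKTrie` class are hypothetical, and insertion simply re-sorts the small per-node list.

```python
class Node:
    def __init__(self):
        self.children = {}  # char -> Node
        self.topk = []      # at most k (score, string) pairs, best first


class TopKTrie:
    """Trie whose every node caches the k best-scored strings below it."""

    def __init__(self, k):
        self.k = k
        self.root = Node()

    def _offer(self, node, score, word):
        # Keep only the k highest (score, word) pairs at this node.
        node.topk.append((score, word))
        node.topk.sort(reverse=True)
        del node.topk[self.k:]

    def insert(self, word, score):
        node = self.root
        self._offer(node, score, word)
        for ch in word:
            node = node.children.setdefault(ch, Node())
            self._offer(node, score, word)

    def top(self, p):
        """Walk the path of p, then read the cached answers at its node."""
        node = self.root
        for ch in p:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.topk


# Hypothetical scores, for illustration only.
SCORES = {"systile": 8, "syzygetic": 2, "syzygial": 1, "syzygy": 4,
          "szaibelyite": 2, "szczecin": 3, "szomo": 5}
t = TopKTrie(k=2)
for w, s in SCORES.items():
    t.insert(w, s)
print(t.top("sy"))  # [(8, 'systile'), (4, 'syzygy')]
```

Every node stores up to k pairs, which is exactly the k× space overhead.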
Top-k: how to squeeze?
(Figure: the trie queried with P = sy.)
The strings prefixed by P form a contiguous range of the sorted dictionary: proceed by divide and conquer (D&C) on that range.
String  1  2  3  4  5  6  7
Score   8  2  1  4  2  3  5
Let H be a max-heap of size k; keep also min[H] and max[H].
Initialize H with k pairs <−∞, NULL>.
Given the range <L,R> (here <1,4>):
Compute the max-score in Array[L,R] (pos. M, value m), via an RMQ-query in O(1) time and O(n) space;
If m ≤ min[H], skip;
else:
  Insert <m, string> in H;
  If size(H) > k then remove min[H];
  Recurse on <L,M-1> and <M+1,R>, if not empty.
Time: O(k) time, and space.
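A Python sketch of the divide-and-conquer step above, under two stated substitutions: a sparse table (O(n log n) space, O(1) query) stands in for the O(n)-space RMQ structure, and instead of k sentinel pairs <−∞, NULL> the code checks whether H already holds k items. Positions are 1-based to match the slides; `SparseTableRMQ` and `topk_dc` are hypothetical names.

```python
import heapq


class SparseTableRMQ:
    """Range-maximum queries in O(1) after O(n log n)-space preprocessing.

    A stand-in for the O(n)-space RMQ structure mentioned in the slides."""

    def __init__(self, a):
        self.a = a
        n = len(a)
        self.table = [list(range(n))]  # table[j][i] = argmax of a[i : i + 2**j]
        j = 1
        while (1 << j) <= n:
            prev, half = self.table[j - 1], 1 << (j - 1)
            row = []
            for i in range(n - (1 << j) + 1):
                l, r = prev[i], prev[i + half]
                row.append(l if a[l] >= a[r] else r)
            self.table.append(row)
            j += 1

    def argmax(self, lo, hi):
        """Index of a maximum of a[lo..hi] (0-based, inclusive)."""
        j = (hi - lo + 1).bit_length() - 1
        l, r = self.table[j][lo], self.table[j][hi - (1 << j) + 1]
        return l if self.a[l] >= self.a[r] else r


def topk_dc(scores, L, R, k):
    """Top-k <score, position> pairs in scores[L..R] (1-based, inclusive)."""
    if k <= 0:
        return []
    rmq = SparseTableRMQ(scores)
    heap = []  # min-heap on score, so heap[0] plays the role of min[H]

    def visit(lo, hi):  # 1-based, inclusive
        if lo > hi:
            return
        M = rmq.argmax(lo - 1, hi - 1) + 1
        m = scores[M - 1]
        if len(heap) == k and m <= heap[0][0]:
            return  # every score in [lo, hi] is <= m <= min[H]: skip
        heapq.heappush(heap, (m, M))
        if len(heap) > k:
            heapq.heappop(heap)  # remove min[H]
        visit(lo, M - 1)
        visit(M + 1, hi)

    visit(L, R)
    return sorted(heap, reverse=True)


print(topk_dc([4, 2, 1, 8, 2, 3, 5], 1, 7, 2))  # [(8, 4), (5, 7)]
```

The call replays the Top-2 example on the array 4 2 1 8 2 3 5; this sketch visits ranges depth-first, but it ends with the same H.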
Example for Top-2
Consider this other array:
String  1  2  3  4  5  6  7
Score   4  2  1  8  2  3  5
Range: operations (pairs are <score, string>)
[1,7]: H ← <8,4>; recurse on [1,3] and [5,7]
[1,3]: H = {<8,4>} ← <4,1>; recurse on [1,0] and [2,3]
[5,7]: H = {<8,4>,<4,1>} ← <5,7>; delete <4,1> from H, recurse on [5,6] and [8,7]
[2,3]: H = {<8,4>,<5,7>}, max is <2,2>; since min[H] = 5, do not insert it in H
[5,6]: H = {<8,4>,<5,7>}, max is <3,6>; since min[H] = 5, do not insert it in H
Result: H = {<8,4>, <5,7>}
A smarter approach
Let H be a max-heap of items <val, string, [low,high]>.
Compute the max-score in Array[L,R] (pos. M, value m);
i = 0; insert <m, string[M], L, R> in H.
While (i < k and H is not empty) do
  Extract <x, string[X], Lx, Rx> from H, where x is the max value in H;
  Return String[X] as one of the top-k strings;
  Compute the max-score in Array[Lx,X-1] (pos. M’, value m’) and insert <m’, string[M’], Lx, X-1> in H, if the range is not empty;
  Compute the max-score in Array[X+1,Rx] (pos. M’’, value m’’) and insert <m’’, string[M’’], X+1, Rx> in H, if the range is not empty;
  i++;
Time: still O(k) time, and space.
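The heap-driven variant can be sketched the same way. Again a sparse table is an assumed stand-in for the O(n)-space RMQ, and the function names are hypothetical; Python's `heapq` is a min-heap, so scores are negated to extract the maximum first. Each heap item carries its range, as in <val, string, [low,high]>.

```python
import heapq


def build_sparse_table(a):
    """table[j][i] = index of a maximum of a[i : i + 2**j]."""
    n = len(a)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        table.append([prev[i] if a[prev[i]] >= a[prev[i + half]] else prev[i + half]
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_argmax(a, table, lo, hi):
    """O(1) index of a maximum of a[lo..hi] (0-based, inclusive)."""
    j = (hi - lo + 1).bit_length() - 1
    l, r = table[j][lo], table[j][hi - (1 << j) + 1]
    return l if a[l] >= a[r] else r


def topk_heap(scores, L, R, k):
    """Top-k <score, position> pairs of scores[L..R] (1-based, inclusive)."""
    table = build_sparse_table(scores)
    H = []  # entries (-score, position, low, high)

    def push(lo, hi):  # 1-based range; empty ranges are ignored
        if lo <= hi:
            M = range_argmax(scores, table, lo - 1, hi - 1) + 1
            heapq.heappush(H, (-scores[M - 1], M, lo, hi))

    push(L, R)
    out = []
    while len(out) < k and H:
        neg_x, X, Lx, Rx = heapq.heappop(H)
        out.append((-neg_x, X))  # String[X] is one of the top-k
        push(Lx, X - 1)
        push(X + 1, Rx)
    return out


print(topk_heap([4, 2, 1, 8, 2, 3, 5], 1, 7, 2))  # [(8, 4), (5, 7)]
```

Each iteration extracts one item and inserts at most two, so only answers and their two sibling ranges are ever touched. With `heapq`, each push or pop costs O(log k), so this particular sketch runs in O(k log k) time.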
Random access to postings lists and other data types
A basic problem!
Dog → 1 12 15 20 22 ....
Array of n skip pointers to an array of m integers:
• log m bits per pointer, i.e. n log m bits overall (32n bits with 32-bit pointers);
• it is effective for few pointers.
D = AbacoBattleCarColdCod ....
Array of n string pointers to strings of total length m:
• n log m bits (32n bits with 32-bit pointers);
• it is independent of the string lengths.
B = 100001000001001000100 ....
We aim at achieving ≈ n log(m/n) bits < n log m.
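A small sketch of how the bit vector B can replace the array of string pointers: B has a 1 at the first character of each string of D, so the i-th string spans from Select1(i) to just before Select1(i+1). The naive linear-scan `select1` is a placeholder for the O(1) Rank/Select structures, the function names are hypothetical, and the code assumes no empty strings.

```python
def build(words):
    """Concatenate the (non-empty) strings into D and mark their starts in B."""
    D = "".join(words)
    B = "".join("1" + "0" * (len(w) - 1) for w in words)
    return D, B


def select1(B, i):
    """1-based position of the i-th '1' in B (naive O(|B|) scan)."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        if bit == "1":
            seen += 1
            if seen == i:
                return pos
    return len(B) + 1  # sentinel: one past the end when i = #1s + 1


def access(D, B, i):
    """The i-th string: from Select1(i) up to just before Select1(i+1)."""
    return D[select1(B, i) - 1 : select1(B, i + 1) - 1]


D, B = build(["Abaco", "Battle", "Car", "Cold", "Cod"])
print(B)                # 100001000001001000100
print(access(D, B, 4))  # Cold
```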
Rank/Select
We wish to index the bit vector B (possibly compressed), where m = |B| and n = #1s.
B = 00101001010101011111110000011010101....
• Rank_b(i) = number of b's in B[1,i]; e.g. Rank1(6) = 2
• Select_b(i) = position of the i-th b in B; e.g. Select1(3) = 8
There exist data structures that solve this problem in O(1) query time and very small extra space (i.e. +o(m) bits).
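Naive reference implementations of the two definitions, handy for checking the examples on the slide. They take O(m) time per query, unlike the O(1) structures just mentioned; the function names are illustrative.

```python
def rank(B, b, i):
    """Rank_b(i): number of occurrences of bit b in B[1, i] (1-based)."""
    return B[:i].count(b)


def select(B, b, i):
    """Select_b(i): 1-based position of the i-th occurrence of b in B."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        if bit == b:
            seen += 1
            if seen == i:
                return pos
    raise ValueError(f"B has fewer than {i} occurrences of {b!r}")


B = "00101001010101011111110000011010101"
print(rank(B, "1", 6), select(B, "1", 3))  # 2 8
```

Note that Rank_b(Select_b(i)) = i, a convenient sanity check for any faster implementation.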
The Bit-Vector Index: |B| + o(|B|)
Goal. B is read-only, and the additional index takes o(m) bits.
Rank
(Figure: B = 00101001010101011 1111100010110101 0101010111000.... is split into superblocks of Z bits, each storing its absolute Rank1 (e.g. 8, 18), and into blocks of z bits, each storing its bucket-relative Rank1.)
Setting Z = poly(log m) and z = (1/2) log m, the position inside a block is resolved with a lookup table:
block  pos  #1
0000   1    0
....   ...  ...
1011   2    1
....
Extra space is $+\frac{m}{Z}\log m + \frac{m}{z}\log Z + o(m) = +O\!\left(\frac{m \log\log m}{\log m}\right) = o(m)$ bits.
Rank time is O(1).
There exists a Bit-Vector Index taking o(m) extra bits and constant time for Rank/Select.
The o(m) term is crucial in practice; B is needed untouched (not compressed) and read-only!
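A sketch of the two-level Rank1 index: absolute counts every Z bits, superblock-relative counts every z bits, and a lookup table indexed by (block, offset) for the last partial block. The fixed values z = 4 and Z = 16 are assumptions for a tiny example (the slide sets them as functions of m), only Rank is shown, and the table is a Python dict rather than a packed array.

```python
class RankIndex:
    """Two-level Rank1 directory over a read-only bit string B (sketch)."""

    def __init__(self, B, z=4, Z=16):
        assert Z % z == 0  # superblocks must contain whole blocks
        self.B, self.z, self.Z = B, z, Z
        self.super_rank = []  # absolute Rank1 before each superblock
        self.block_rank = []  # Rank1 before each block, relative to its superblock
        total = rel = 0
        for start in range(0, len(B), z):
            if start % Z == 0:
                self.super_rank.append(total)
                rel = 0
            self.block_rank.append(rel)
            ones = B[start:start + z].count("1")
            total += ones
            rel += ones
        # Lookup table: (z-bit block, 0-based offset j) -> number of 1s in block[0..j]
        self.lut = {}
        for v in range(1 << z):
            pat = format(v, f"0{z}b")
            for j in range(z):
                self.lut[(pat, j)] = pat[:j + 1].count("1")

    def rank1(self, i):
        """Number of 1s in B[1, i] (1-based, inclusive), for 0 <= i <= |B|."""
        if i == 0:
            return 0
        p = i - 1  # 0-based position of the last counted bit
        blk = p // self.z
        start = blk * self.z
        block = self.B[start:start + self.z].ljust(self.z, "0")  # pad last block
        return (self.super_rank[p // self.Z]
                + self.block_rank[blk]
                + self.lut[(block, p - start)])


B = "00101001010101011111110000011010101"
idx = RankIndex(B)
print(idx.rank1(6))  # 2
```

Each query does three table reads, so its cost does not depend on |B|; B itself is only read, never modified.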
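Elias-Fano (B is not needed)
If w = log(m/n) and z = log n, where m = |B| and n = #1s, each position of a 1 in B is split into its w low bits, stored in L, and its z high bits, stored in H (in the figure, z = 3 and w = 2). Then:
- L takes n w = n log(m/n) bits;
- H stores the high parts in unary, over the buckets 0 to 2^z − 1: n 1s + n 0s = 2n bits.
(Figure: H encodes in unary the contents of the buckets 0 to 7; Select1 is run on H.)
Select1(i) on B → uses L and (Select1(H,i) − i), with +o(n) space.
Rank1(i) on B → needs a binary search over B.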
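A minimal Elias-Fano sketch, assuming the 1s of B are given as sorted 0-based positions and n ≥ 1. L keeps the w low bits of each position and H the high parts in unary; Select1 on B is rebuilt from L and the number of 0s preceding the i-th 1 of H, which is Select1(H, i) − i in 1-based terms, so B is never stored. The linear scan over H is a placeholder for an o(n)-space Select structure; the function names are hypothetical.

```python
from math import floor, log2


def ef_encode(positions, m):
    """Elias-Fano encoding of the sorted 0-based positions of the 1s in B, |B| = m."""
    n = len(positions)
    w = max(0, floor(log2(m / n)))               # low bits per element, about log(m/n)
    L = [x & ((1 << w) - 1) for x in positions]  # n*w bits overall
    H = [0] * (n + (m >> w) + 1)                 # unary high parts, about 2n bits
    for i, x in enumerate(positions):
        H[(x >> w) + i] = 1                      # the i-th 1 follows (x >> w) zeros
    return w, L, H


def ef_select1(ef, i):
    """Position of the i-th 1 of B (1-based i, 0-based result)."""
    w, L, H = ef
    seen = 0
    for p, bit in enumerate(H):                  # naive Select1 on H, for clarity
        if bit:
            seen += 1
            if seen == i:
                high = p - (i - 1)               # zeros before it: Select1(H, i) - i, 1-based
                return (high << w) | L[i - 1]
    raise IndexError(i)


ones = [0, 5, 11, 14, 18]  # the 1s of B = 100001000001001000100
ef = ef_encode(ones, 21)
print([ef_select1(ef, i) for i in range(1, 6)])  # [0, 5, 11, 14, 18]
```

Here m = 21 and n = 5 give w = 2, so L costs 10 bits and H a few more than 2n, versus 32 bits per explicit pointer.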
If you wish to play with Rank and Select:
space m/10 + n log(m/n) bits; Rank in 0.4 msec, Select in < 1 msec
vs 32n bits of explicit pointers.