Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics
Extension of Rank/Select (3)
• Multi-alphabet
– access(S, x): returns S[x]
– rank_c(S, x): number of occurrences of c in S[0..x] = S[0]S[1]…S[x]
– select_c(S, i): position of the i-th c in S (i ≥ 1)
– c ∈ A = {0, 1, …, σ−1}, alphabet size σ > 2
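The three operations can be pinned down with a naive reference version. This is only a sketch of the definitions (linear time, no succinct structure), assuming 0-based positions and the inclusive range S[0..x] used above; the function names are chosen here for illustration.

```python
def access(S, x):
    """Return S[x] (0-based)."""
    return S[x]

def rank(S, c, x):
    """Number of occurrences of c in S[0..x], inclusive."""
    return S[:x + 1].count(c)

def select(S, c, i):
    """Position of the i-th occurrence of c in S (i >= 1), or -1 if there is none."""
    pos = -1
    for _ in range(i):
        pos = S.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos

S = "g$ccaggaa"
print(access(S, 4))       # a
print(rank(S, "g", 5))    # 2
print(select(S, "a", 2))  # 7
```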
Data Structure 1
• Represent S by σ 0,1 vectors and compress by FID
• rank and select are done in O(1) time
• access is done in O(σ) time
• Size
– If a letter c appears n_c times, the size of the vector is
$n_c \log\frac{n}{n_c} + 1.44\,n_c + O\!\left(\frac{n \log\log n}{\log n}\right)$
– In total
$nH_0 + 1.44\,n + O\!\left(\frac{\sigma\, n \log\log n}{\log n}\right)$
$: 010000000
a: 000010011
c: 001100000
g: 100001100
S: g$ccaggaa
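A minimal sketch of this layout, under a loud assumption: the FID is replaced here by plain Python lists with precomputed prefix counts and position lists, so nothing is compressed. It only illustrates why rank and select need a single lookup while access has to probe up to σ vectors. The class name IndicatorVectors is hypothetical.

```python
class IndicatorVectors:
    """One 0,1 vector per letter; uncompressed stand-in for the FID-based structure."""

    def __init__(self, S):
        self.alphabet = sorted(set(S))
        self.bits = {c: [1 if ch == c else 0 for ch in S] for c in self.alphabet}
        self.prefix = {}     # prefix[c][x + 1] = number of c in S[0..x]
        self.positions = {}  # positions[c][i - 1] = position of the i-th c
        for c in self.alphabet:
            acc = [0]
            for b in self.bits[c]:
                acc.append(acc[-1] + b)
            self.prefix[c] = acc
            self.positions[c] = [p for p, b in enumerate(self.bits[c]) if b]

    def rank(self, c, x):
        return self.prefix[c][x + 1]      # one lookup

    def select(self, c, i):
        return self.positions[c][i - 1]   # one lookup

    def access(self, x):
        for c in self.alphabet:           # up to sigma probes
            if self.bits[c][x]:
                return c
        raise IndexError(x)

ds = IndicatorVectors("g$ccaggaa")
print("".join(map(str, ds.bits["a"])))  # 000010011
print(ds.rank("g", 5), ds.select("c", 2), ds.access(1))  # 2 3 $
```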
Data Structure 2
• Store S as it is
– access takes O(1) time
• The 0,1 vectors are computed from S
– Succinct indexes for rank/select can be used without modifications
– Time to obtain log n bits of a 0,1 vector is O(log σ)
– rank and select are done in O(log σ) time
• Size
$|S| + O\!\left(\frac{\sigma\, n \log\log n}{\log n}\right)$, where $|S| = n \log\sigma$
$: 010000000
a: 000010011
c: 001100000
g: 100001100
S: g$ccaggaa
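The idea of keeping S raw and deriving the 0,1 vectors on demand can be sketched as follows. This is an illustration under stated assumptions, not the succinct index itself: per-letter counts sampled at block boundaries stand in for the rank index, and the block size of 4 and the class name are hypothetical.

```python
class RawStringWithIndex:
    """Stores S as is; pieces of the per-letter 0,1 vectors are computed from S when needed."""

    def __init__(self, S, block=4):
        self.S = S
        self.block = block
        self.samples = {}  # samples[c][q] = number of c in the first q blocks
        for c in set(S):
            counts = [0]
            for start in range(0, len(S), block):
                counts.append(counts[-1] + S[start:start + block].count(c))
            self.samples[c] = counts

    def access(self, x):
        return self.S[x]  # direct lookup

    def bits(self, c, start, length):
        """A chunk of the 0,1 vector for letter c, derived from S."""
        return [1 if ch == c else 0 for ch in self.S[start:start + length]]

    def rank(self, c, x):
        q = (x + 1) // self.block  # complete blocks inside S[0..x]
        rest = self.bits(c, q * self.block, x + 1 - q * self.block)
        return self.samples[c][q] + sum(rest)

ds2 = RawStringWithIndex("g$ccaggaa")
print(ds2.access(4))          # a
print(ds2.bits("g", 4, 4))    # [0, 1, 1, 0]
print(ds2.rank("g", 5))       # 2
```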
Wavelet Trees
• Binary tree consisting of σ−1 0,1 vectors
• The vector in root node V[0..n−1]
– V[i] = 1 ⇔ most significant bit of S[i] is 1
• Right (left) subtree of root node is the wavelet tree
for the string consisting of letters of S whose most
significant bit is 1 (0)
– Remove the most significant bit
S: g $ c c a g g a a
$ = 00
a = 01
c = 10
g = 11
V  = 1 0 1 1 0 1 1 0 0
S0 = $ a a a      V0 = 0 1 1 1
S1 = g c c g g    V1 = 1 0 0 1 1
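The recursive definition can be checked on the example with a short construction sketch. It assumes the 2-bit codes from the example and stores each node as a plain dictionary; the node layout and the name build are illustrative choices, not part of the slides.

```python
# Codes taken from the example: $ = 00, a = 01, c = 10, g = 11
CODE = {"$": "00", "a": "01", "c": "10", "g": "11"}
WIDTH = 2  # bits per letter

def build(seq, depth=0):
    """Return a node {'bits', '0', '1'}; None once all bits of the codes are consumed."""
    if depth == WIDTH:
        return None
    # Bit vector of this node: the current most significant bit of every letter.
    node = {"bits": "".join(CODE[ch][depth] for ch in seq)}
    # Left child gets letters whose current bit is 0, right child those with 1.
    for b in "01":
        node[b] = build([ch for ch in seq if CODE[ch][depth] == b], depth + 1)
    return node

tree = build("g$ccaggaa")
print(tree["bits"])       # 101101100
print(tree["0"]["bits"])  # 0111
print(tree["1"]["bits"])  # 10011
```

For this alphabet of size 4 the tree holds 3 vectors, matching the σ−1 count stated above.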
Computing rank
• b = (most significant bit of c)
• c' = (c with its most significant bit omitted)
• r = rank_b(V, x)
• rank_c(S, x) = rank_c'(S_b, r) (recursively computed)
• O(log σ) time
• Total length of vectors in depth d nodes is n
• Height of tree is log σ
• n log σ + O(n log σ log log n/log n) = |S| + o(|S|) bits
Computing access
• b = access(V, x)
• r = rank_b(V, x)
• access(S, x) = b : access(S_b, r) (recursion)
• O(log σ) time
Computing select
• b = (most significant bit of c)
• c' = (c with its most significant bit omitted)
• select_c(S, i) = select_b(V, select_c'(S_b, i))
• O(log σ) time
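The three recursions can be put together in one small sketch. Several things here are assumptions made for illustration: letters are given as integer codes, rank and select on the bit vectors are done by naive scans (a real structure would use o(n)-bit indexes to make them O(1)), and the class name is hypothetical. With 0-based positions and rank counting S[0..x] inclusively, a count r corresponds to position r−1 in the child, and a child position p corresponds to the (p+1)-th bit b in V, so the sketch makes those shifts explicit.

```python
class WaveletTree:
    def __init__(self, codes, width):
        self.width = width          # bits still to be consumed
        self.n = len(codes)
        if width == 0:
            return                  # leaf: n copies of a single letter
        s = width - 1
        self.bits = [(v >> s) & 1 for v in codes]
        mask = (1 << s) - 1
        self.child = [
            WaveletTree([v & mask for v in codes if (v >> s) & 1 == b], s)
            for b in (0, 1)
        ]

    def _rank_bit(self, b, x):      # number of b in bits[0..x]; naive scan
        return sum(1 for v in self.bits[:x + 1] if v == b)

    def _select_bit(self, b, i):    # position of the i-th b (i >= 1); naive scan
        for pos, v in enumerate(self.bits):
            if v == b:
                i -= 1
                if i == 0:
                    return pos
        raise ValueError("fewer occurrences than requested")

    def access(self, x):
        if self.width == 0:
            return 0
        b = self.bits[x]
        r = self._rank_bit(b, x)    # r >= 1 because bits[x] == b
        return (b << (self.width - 1)) | self.child[b].access(r - 1)

    def rank(self, c, x):
        if x < 0:
            return 0
        if self.width == 0:
            return x + 1
        s = self.width - 1
        b = (c >> s) & 1
        r = self._rank_bit(b, x)
        return self.child[b].rank(c & ((1 << s) - 1), r - 1)

    def select(self, c, i):
        if self.width == 0:
            if i > self.n:
                raise ValueError("fewer occurrences than requested")
            return i - 1
        s = self.width - 1
        b = (c >> s) & 1
        p = self.child[b].select(c & ((1 << s) - 1), i)  # position in S_b
        return self._select_bit(b, p + 1)

CODE = {"$": 0, "a": 1, "c": 2, "g": 3}  # $ = 00, a = 01, c = 10, g = 11
wt = WaveletTree([CODE[ch] for ch in "g$ccaggaa"], 2)
print(wt.access(4))             # 1, the code of a
print(wt.rank(CODE["g"], 5))    # 2
print(wt.select(CODE["a"], 2))  # 7
```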
Compressing Vectors
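• n_c: frequency of letter c
• Size of compressed V
$\log\binom{n}{\,n_0+\cdots+n_{\sigma/2-1},\;\; n_{\sigma/2}+\cdots+n_{\sigma-1}\,}$
• Size of compressed V0 and V1
$\log\binom{n_0+\cdots+n_{\sigma/2-1}}{\,n_0+\cdots+n_{\sigma/4-1},\;\; n_{\sigma/4}+\cdots+n_{\sigma/2-1}\,} + \log\binom{n_{\sigma/2}+\cdots+n_{\sigma-1}}{\,n_{\sigma/2}+\cdots+n_{3\sigma/4-1},\;\; n_{3\sigma/4}+\cdots+n_{\sigma-1}\,}$
S:     g $ c c a g g a a
V:     1 0 1 1 0 1 1 0 0
V0 V1: 0 1 1 1 | 1 0 0 1 1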
• By adding the sizes for all levels
$\log\binom{n}{n_0, n_1, \ldots, n_{\sigma-1}} \le \sum_{c=0}^{\sigma-1} n_c \log\frac{n}{n_c} = nH_0$
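The telescoping of the per-level sizes can be checked numerically on the running example S = g$ccaggaa. The sketch below only verifies the arithmetic: the binomial for the root times the binomials for V0 and V1 equals the multinomial coefficient, whose logarithm stays below nH0. Variable names are chosen here for illustration.

```python
from collections import Counter
from math import comb, factorial, log2

S = "g$ccaggaa"
n = len(S)
counts = Counter(S)  # $: 1, a: 3, c: 2, g: 3

# Multinomial coefficient n! / (n_$! n_a! n_c! n_g!)
multinomial = factorial(n)
for nc in counts.values():
    multinomial //= factorial(nc)

# Root V splits {$, a} (most significant bit 0) from {c, g} (most significant bit 1).
n_left = counts["$"] + counts["a"]
n_right = counts["c"] + counts["g"]
root = comb(n, n_left)
# V0 splits $ from a, V1 splits c from g.
level1 = comb(n_left, counts["$"]) * comb(n_right, counts["c"])

nH0 = sum(nc * log2(n / nc) for nc in counts.values())

print(root, level1, multinomial)     # 126 40 5040
print(root * level1 == multinomial)  # True
print(log2(multinomial) <= nH0)      # True
```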
• Summary
– access/rank/select: O(log σ) time
– size: nH0 + O(n log σ log log n/log n) bits
Multi-ary Wavelet Trees [5]
• Use multi-ary rather than binary wavelet trees
• access/rank/select: O(log σ/log log n) time
• Size: nH0 + o(n log σ) bits
• If σ = polylog(n)
– access/rank/select: O(1) time
– size: nH0 + o(n) bits
• Size can be reduced to nH0 + o(n)(H0+1) bits
Summary
Size                    access               rank                 select
n(H0+1.44) + σ·o(n)     O(σ)                 O(1)                 O(1)
|S| + σ·o(n)            O(1)                 O(log σ)             O(log σ)
nH0 + log σ·o(n)        O(log σ)             O(log σ)             O(log σ)
nH0 + o(n log σ)        O(log σ/log log n)   O(log σ/log log n)   O(log σ/log log n)
nH0 + o(n)(H0+1)        O(log log σ)         O(log log σ)         O(1)