Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics
Extension of Rank/Select (3)
• Multi-alphabet
– access(S, x): returns S[x]
– rank_c(S, x): number of occurrences of c in S[0..x] = S[0]S[1]…S[x]
– select_c(S, i): position of the i-th c in S (i ≥ 1)
– c ∈ A = {0, 1, …, σ−1}, alphabet size σ > 2
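As a reference point for the three queries, here is a minimal sketch that answers them by plain scanning. It assumes 0-based positions, an inclusive right end for rank (as in S[0..x]), and a hypothetical return value of -1 when select has no answer; the test string is the one used in the slides' examples. It is a specification aid, not a succinct structure.

```python
def access(S, x):
    """Return the symbol S[x] (0-based position)."""
    return S[x]


def rank(S, c, x):
    """Number of occurrences of c in S[0..x], right end inclusive."""
    return sum(1 for ch in S[:x + 1] if ch == c)


def select(S, c, i):
    """0-based position of the i-th occurrence of c (i >= 1).

    Returns -1 (an assumption of this sketch) if c occurs fewer than i times.
    """
    seen = 0
    for pos, ch in enumerate(S):
        if ch == c:
            seen += 1
            if seen == i:
                return pos
    return -1


S = "g$ccaggaa"
print(access(S, 4))       # a
print(rank(S, "g", 5))    # 2
print(select(S, "a", 2))  # 7
```

Every query here costs O(n) time; the data structures on the following slides trade space for faster answers.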
Data Structure 1
• Represent S by σ 0,1 vectors and compress by FID
• rank and select are done in O(1) time
• access is done in O(σ) time
• Size
– If a letter c appears n_c times, the size of the vector is
\[ n_c \log\frac{n}{n_c} + 1.44\,n_c + O\!\left(\frac{n \log\log n}{\log n}\right) \]
– In total
\[ n(H_0 + 1.44) + \sigma \cdot O\!\left(\frac{n \log\log n}{\log n}\right) \]
$: 010000000
a: 000010011
c: 001100000
g: 100001100
S: g$ccaggaa
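The one-vector-per-letter idea can be sketched as follows. This is only an illustration under loud assumptions: the vectors are stored uncompressed, and plain prefix-count and position lists stand in for the FID, so the space bound of the slide is not achieved. The class name `IndicatorVectors` is hypothetical. What the sketch does show is why rank and select need a single lookup while access has to probe up to σ vectors.

```python
class IndicatorVectors:
    """One uncompressed 0,1 vector per letter (stand-in for compressed FIDs)."""

    def __init__(self, S):
        self.bits = {}    # letter -> list of 0/1; bit x is 1 iff S[x] == letter
        self.prefix = {}  # letter -> prefix[x] = number of 1s in bits[0..x]
        self.ones = {}    # letter -> sorted positions of the 1s
        for c in sorted(set(S)):
            vec = [1 if ch == c else 0 for ch in S]
            running, pre = 0, []
            for bit in vec:
                running += bit
                pre.append(running)
            self.bits[c] = vec
            self.prefix[c] = pre
            self.ones[c] = [x for x, bit in enumerate(vec) if bit]

    def rank(self, c, x):
        """Occurrences of c in S[0..x]: one lookup in c's vector."""
        return self.prefix[c][x]

    def select(self, c, i):
        """Position of the i-th c (i >= 1): one lookup in c's vector."""
        return self.ones[c][i - 1]

    def access(self, x):
        """S[x]: probe the vectors one by one, up to sigma probes."""
        for c, vec in self.bits.items():
            if vec[x]:
                return c
        raise IndexError(x)


ds = IndicatorVectors("g$ccaggaa")
print("".join(map(str, ds.bits["g"])))                   # 100001100
print(ds.rank("a", 7), ds.select("c", 2), ds.access(5))  # 2 3 g
```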
Data Structure 2
• Store S as it is
– access takes O(1) time
• The σ 0,1 vectors are computed from S
– Succinct indexes for rank/select can be used without modifications
– Time to obtain log n bits of a 0,1 vector is O(log σ)
– rank and select are done in O(log σ) time
• Size
\[ |S| + \sigma \cdot O\!\left(\frac{n \log\log n}{\log n}\right), \qquad |S| = n \log \sigma \]
$: 010000000
a: 000010011
c: 001100000
g: 100001100
S: g$ccaggaa
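A rough sketch of the second approach: S is kept as it is, only per-letter block counts are stored, and the bits of a letter's 0,1 vector are recomputed from S whenever they are needed. The block length `BLOCK = 4` and all function names are assumptions made for illustration. The sketch recomputes bits one symbol at a time, so it does not reproduce the O(log σ) bound, which relies on extracting log n bits at once. Only rank is shown.

```python
BLOCK = 4  # hypothetical block length; a real index uses blocks of about log n bits


def build_rank_index(S, alphabet):
    """For each letter, the number of occurrences before each block boundary."""
    index = {c: [0] for c in alphabet}
    for start in range(0, len(S), BLOCK):
        block = S[start:start + BLOCK]
        for c in alphabet:
            index[c].append(index[c][-1] + block.count(c))
    return index


def indicator_bits(S, c, start, length):
    """Bits of the 0,1 vector of letter c, recomputed from S on demand."""
    return [1 if ch == c else 0 for ch in S[start:start + length]]


def rank(S, index, c, x):
    """Occurrences of c in S[0..x]: stored block count plus a short scan."""
    full_blocks = (x + 1) // BLOCK  # blocks lying entirely inside S[0..x]
    done = full_blocks * BLOCK      # symbols covered by the stored count
    tail = indicator_bits(S, c, done, x + 1 - done)
    return index[c][full_blocks] + sum(tail)


S = "g$ccaggaa"
index = build_rank_index(S, "$acg")
print(rank(S, index, "g", 5))  # 2
print(rank(S, index, "a", 8))  # 3
```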
Wavelet Trees
• Binary tree consisting of σ−1 0,1 vectors
• The vector in root node V[0..n−1]
– V[i] = 1 ⇔ most significant bit of S[i] is 1
• Right (left) subtree of root node is the wavelet tree for the string consisting of letters of S whose most significant bit is 1 (0)
– Remove the most significant bit
$ = 00
a = 01
c = 10
g = 11
S:  g $ c c a g g a a
V:  1 0 1 1 0 1 1 0 0
S0: $ a a a      V0: 0 1 1 1
S1: g c c g g    V1: 1 0 0 1 1
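The recursive definition can be sketched in a few lines. The node layout (a dict with keys `"V"` and `"child"`) and the helper names are assumptions chosen for readability; the vectors are plain Python lists without any rank/select index. Running it on the slide's string reproduces V, V0 and V1.

```python
CODE = {"$": 0b00, "a": 0b01, "c": 0b10, "g": 0b11}  # codes from the slide


def build(codes, width):
    """Build the wavelet tree of a list of width-bit integer codes.

    A node is a dict with its 0,1 vector under "V" and its left/right
    subtrees under "child"; None marks the level below the last bit.
    """
    if width == 0:
        return None
    shift = width - 1
    V = [(c >> shift) & 1 for c in codes]  # most significant bit of each code
    mask = (1 << shift) - 1                # keeps the bits below the MSB
    child = [
        build([c & mask for c, v in zip(codes, V) if v == b], shift)
        for b in (0, 1)
    ]
    return {"V": V, "child": child}


def bits(node):
    """Render a node's vector as a string of 0s and 1s."""
    return "".join(map(str, node["V"]))


S = "g$ccaggaa"
root = build([CODE[ch] for ch in S], 2)
print(bits(root))              # 101101100
print(bits(root["child"][0]))  # 0111
print(bits(root["child"][1]))  # 10011
```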
Computing rank
• b = (most significant bit of c)
• c' = (c whose most significant bit is omitted)
• r = rank_b(V, x)
• rank_c(S, x) = rank_c'(S_b, r) (recursively computed)
• O(log σ) time
• Total length of vectors in depth d nodes is n
• Height of tree is log σ
• Size: n log σ + O(n log σ log log n / log n) = |S| + o(|S|) bits
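The rank recursion can be traced on the two-level example tree (V = 101101100, V0 = 0111, V1 = 10011 for S = g$ccaggaa with $ = 00, a = 01, c = 10, g = 11). In this sketch the vectors are hard-coded and `rank_bit` scans linearly, standing in for a constant-time rank index. Since rank counts positions 0..x inclusive, the r symbols routed to S_b occupy positions 0..r−1 there, so the code passes r − 1 to the child; that index shift is my reading of the 0-based convention, not something the slide states.

```python
V = [1, 0, 1, 1, 0, 1, 1, 0, 0]  # root vector: MSB of each symbol
V0 = [0, 1, 1, 1]                # low bits of the symbols whose MSB is 0
V1 = [1, 0, 0, 1, 1]             # low bits of the symbols whose MSB is 1
CODE = {"$": 0b00, "a": 0b01, "c": 0b10, "g": 0b11}


def rank_bit(vec, b, x):
    """Number of bits equal to b in vec[0..x], inclusive (0 if x < 0)."""
    return sum(1 for v in vec[:x + 1] if v == b)


def rank(c, x):
    """rank_c(S, x) computed through the two levels of the wavelet tree."""
    code = CODE[c]
    b, c_rest = code >> 1, code & 1  # MSB of c, and c without its MSB
    r = rank_bit(V, b, x)            # symbols of S[0..x] that go to S_b
    child = V1 if b else V0
    return rank_bit(child, c_rest, r - 1)


print(rank("g", 5))  # 2
print(rank("a", 8))  # 3
print(rank("$", 0))  # 0
```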
Computing access
• b = access(V, x)
• r = rank_b(V, x)
• access(S, x) = b : access(S_b, r) (recursion)
• O(log σ) time
Computing select
• b = (most significant bit of c)
• c' = (c whose most significant bit is omitted)
• select_c(S, x) = select_b(V, select_c'(S_b, x))
• O(log σ) time
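Access walks the tree top-down and select walks it bottom-up. The sketch below traces both on the two-level example tree for S = g$ccaggaa, with hard-coded vectors and linear-scan helpers standing in for constant-time rank/select indexes. Two index shifts are assumptions of the 0-based convention used here: access looks at position r − 1 of the child, and select asks V for its (p + 1)-th b, where p is the position returned by the child.

```python
V = [1, 0, 1, 1, 0, 1, 1, 0, 0]  # root vector: MSB of each symbol
V0 = [0, 1, 1, 1]                # low bits of the symbols whose MSB is 0
V1 = [1, 0, 0, 1, 1]             # low bits of the symbols whose MSB is 1
CODE = {"$": 0b00, "a": 0b01, "c": 0b10, "g": 0b11}
SYMBOL = "$acg"                  # SYMBOL[code] is the letter with that code


def rank_bit(vec, b, x):
    """Number of bits equal to b in vec[0..x], inclusive."""
    return sum(1 for v in vec[:x + 1] if v == b)


def select_bit(vec, b, i):
    """0-based position of the i-th bit equal to b (i >= 1)."""
    seen = 0
    for pos, v in enumerate(vec):
        if v == b:
            seen += 1
            if seen == i:
                return pos
    raise ValueError("fewer than i occurrences")


def access(x):
    """S[x]: read the MSB from V, then the low bit from the child vector."""
    b = V[x]
    r = rank_bit(V, b, x)  # S[x] is the r-th symbol of S_b
    child = V1 if b else V0
    return SYMBOL[(b << 1) | child[r - 1]]


def select(c, i):
    """Position of the i-th c in S: solve in the child, then map back via V."""
    code = CODE[c]
    b, c_rest = code >> 1, code & 1
    child = V1 if b else V0
    p = select_bit(child, c_rest, i)  # position of the i-th c' inside S_b
    return select_bit(V, b, p + 1)    # position of the (p+1)-th b inside V


print("".join(access(x) for x in range(9)))  # g$ccaggaa
print(select("a", 2))                        # 7
print(select("g", 3))                        # 6
```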
Compressing Vectors
• n_c: frequency of letter c
• Size of compressed V
\[ \log \binom{n}{n_0 + \cdots + n_{\sigma/2-1},\;\; n_{\sigma/2} + \cdots + n_{\sigma-1}} \]
• Size of compressed V0 and V1
\[ \log \binom{n_0 + \cdots + n_{\sigma/2-1}}{n_0 + \cdots + n_{\sigma/4-1},\;\; n_{\sigma/4} + \cdots + n_{\sigma/2-1}} + \log \binom{n_{\sigma/2} + \cdots + n_{\sigma-1}}{n_{\sigma/2} + \cdots + n_{3\sigma/4-1},\;\; n_{3\sigma/4} + \cdots + n_{\sigma-1}} \]
S:    g $ c c a g g a a
V:    1 0 1 1 0 1 1 0 0
V0V1: 0 1 1 1 1 0 0 1 1
• By adding the sizes for all levels
\[ \log \binom{n}{n_0, n_1, \ldots, n_{\sigma-1}} \le \sum_{c=0}^{\sigma-1} n_c \log \frac{n}{n_c} = nH_0 \]
• Summary
– access/rank/select: O(log σ) time
– size: nH_0 + O(n log σ log log n / log n)
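The level-by-level accounting can be checked numerically on the example string g$ccaggaa. The sketch below assumes base-2 logarithms and the codes $ = 00, a = 01, c = 10, g = 11, so the root splits {$, a} from {c, g}. It verifies that the per-level binomial terms add up exactly to the log of the multinomial coefficient, and that this total does not exceed nH_0. The lower-order index terms are ignored.

```python
from collections import Counter
from math import comb, factorial, log2

S = "g$ccaggaa"
n = len(S)
freq = Counter(S)  # n_c for each letter c

# Root vector V: the 0-side holds $ and a, the 1-side holds c and g.
n_left = freq["$"] + freq["a"]
n_right = freq["c"] + freq["g"]
root_bits = log2(comb(n, n_left))

# Second level: V0 splits $ from a, V1 splits c from g.
level1_bits = log2(comb(n_left, freq["$"])) + log2(comb(n_right, freq["c"]))

# Multinomial coefficient n! / (n_$! n_a! n_c! n_g!).
multinomial = factorial(n)
for count in freq.values():
    multinomial //= factorial(count)

# Zeroth-order empirical entropy times n.
nH0 = sum(count * log2(n / count) for count in freq.values())

print(root_bits + level1_bits)  # sum of the per-level sizes
print(log2(multinomial))        # same value, computed in one step
print(nH0)                      # upper bound on both
```

On a string this short the gap between the multinomial term and nH_0 is noticeable; it shrinks relative to n as the string grows.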
Multi-ary Wavelet Trees [5]
• Use multi-ary rather than binary wavelet trees
• access/rank/select: O(log σ / log log n) time
• Size: nH_0 + o(n log σ) bits
• If σ = polylog(n)
– access/rank/select: O(1) time
– size: nH_0 + o(n) bits
• Size can be reduced to nH_0 + o(n)(H_0 + 1) bits
Summary
Size                      access               rank                 select
n(H_0+1.44) + σ·o(n)      O(σ)                 O(1)                 O(1)
|S| + σ·o(n)              O(1)                 O(log σ)             O(log σ)
nH_0 + log σ·o(n)         O(log σ)             O(log σ)             O(log σ)
nH_0 + o(n log σ)         O(log σ/log log n)   O(log σ/log log n)   O(log σ/log log n)
nH_0 + o(n)(H_0+1)        O(log log σ)         O(log log σ)         O(1)