Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics

Extension of Rank/Select (3)
• Multi-alphabet
  – access(S, x): returns S[x]
  – rank_c(S, x): number of c in S[0..x] = S[0]S[1]…S[x]
  – select_c(S, i): position of the i-th c in S (i ≥ 1)
  – c ∈ A = {0, 1, …, σ−1}, alphabet size σ > 2

Data Structure 1
• Represent S by σ 0,1 vectors and compress them by FID
• rank and select are done in O(1) time
• access is done in O(σ) time
• Size
  – If a letter c appears n_c times, the size of its vector is
    n_c log(n/n_c) + 1.44 n_c + O(n log log n / log n) bits
  – In total: n(H_0 + 1.44) + O(σ n log log n / log n) bits

  S: g$ccaggaa
  $: 010000000
  a: 000010011
  c: 001100000
  g: 100001100

Data Structure 2
• Store S as it is
  – access takes O(1) time
• The 0,1 vectors are computed from S
  – Succinct indexes for rank/select can be used without modifications
  – Time to obtain log n bits of a 0,1 vector is O(log σ)
  – rank and select are done in O(log σ) time
• Size: |S| + O(σ n log log n / log n) bits, where |S| = n log σ

  S: g$ccaggaa
  $: 010000000
  a: 000010011
  c: 001100000
  g: 100001100

Wavelet Trees
• Binary tree consisting of σ−1 0,1 vectors
• The vector in the root node is V[0..n−1]
  – V[i] = 1 ⇔ the most significant bit of S[i] is 1
• The right (left) subtree of the root node is the wavelet tree for the string consisting of the letters of S whose most significant bit is 1 (0)
  – Remove the most significant bit

  Codes: $ = 00, a = 01, c = 10, g = 11
  S:   g $ c c a g g a a
  V:   1 0 1 1 0 1 1 0 0
  S_0: $ a a a        S_1: g c c g g
  V_0: 0 1 1 1        V_1: 1 0 0 1 1

Computing rank
• b = (most significant bit of c)
• c' = (c with its most significant bit omitted)
• r = rank_b(V, x)
• rank_c(S, x) = rank_c'(S_b, r) (recursively computed)
  – With 0-based positions, the recursive call is made at position r − 1 of S_b (the result is 0 when r = 0)
• O(log σ) time
• Total length of the vectors in depth-d nodes is n
• Height of the tree is log σ
• Size: n log σ + O(n log σ log log n / log n) = |S| + o(|S|) bits

Computing access
• b = access(V, x)
• r = rank_b(V, x)
• access(S, x) = b : access(S_b, r)
  – With 0-based positions, the recursive call is made at position r − 1 of S_b
• O(log σ) time (recursion)

Computing select
• b = (most significant bit of c)
• c' = (c with its most significant bit omitted)
• select_c(S, i) = select_b(V, select_c'(S_b, i))
  – With 0-based positions, if p = select_c'(S_b, i), then the outer call is select_b(V, p + 1)
• O(log σ) time
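The access, rank and select recursions on the wavelet tree can be sketched in Python as follows. This is only an illustrative sketch, not the succinct structure from the slides: the class name WaveletTree and its helper methods are hypothetical, the 0,1 vectors are plain Python lists, and bit-level rank/select are done by linear scans instead of FID or a succinct index, so the O(log σ) bounds do not hold here. The sketch counts symbols in prefixes internally, which sidesteps the r versus r − 1 adjustment needed with 0-based inclusive positions.

```python
class WaveletTree:
    """Pointer-based binary wavelet tree over integer codes in [0, 2**bits).

    Illustrative only: vectors are plain lists, bit rank/select are linear scans.
    """

    def __init__(self, codes, bits):
        self.bits = bits
        if bits == 0:            # leaf: all symbols here are identical
            self.V = None
            return
        shift = bits - 1
        mask = (1 << shift) - 1
        # V[i] = most significant bit of the i-th symbol
        self.V = [(c >> shift) & 1 for c in codes]
        # child[b] is the wavelet tree of the symbols whose MSB is b, MSB removed
        self.child = [
            WaveletTree([c & mask for c in codes if ((c >> shift) & 1) == b], shift)
            for b in (0, 1)
        ]

    def _rank_bit(self, b, k):
        """Number of bits equal to b among the first k bits of V."""
        ones = sum(self.V[:k])
        return ones if b else k - ones

    def _select_bit(self, b, i):
        """0-based position of the i-th (i >= 1) bit equal to b in V."""
        for pos, bit in enumerate(self.V):
            if bit == b:
                i -= 1
                if i == 0:
                    return pos
        raise ValueError("fewer than i occurrences")

    def _split(self, c):
        """Return (MSB of c, c with its MSB removed)."""
        shift = self.bits - 1
        return (c >> shift) & 1, c & ((1 << shift) - 1)

    def access(self, x):
        """Return S[x]."""
        if self.bits == 0:
            return 0
        b = self.V[x]
        r = self._rank_bit(b, x)          # b's strictly before x = index in S_b
        return (b << (self.bits - 1)) | self.child[b].access(r)

    def _prefix_count(self, c, k):
        """Number of c among the first k symbols."""
        if self.bits == 0:
            return k
        b, c_rest = self._split(c)
        return self.child[b]._prefix_count(c_rest, self._rank_bit(b, k))

    def rank(self, c, x):
        """Number of c in S[0..x] (inclusive), as defined on the slides."""
        return self._prefix_count(c, x + 1)

    def select(self, c, i):
        """0-based position of the i-th (i >= 1) occurrence of c."""
        if self.bits == 0:
            return i - 1                   # i-th symbol of a leaf is at index i-1
        b, c_rest = self._split(c)
        p = self.child[b].select(c_rest, i)
        return self._select_bit(b, p + 1)  # position p in S_b is the (p+1)-th b in V


# The example from the slides: $ = 00, a = 01, c = 10, g = 11
CODE = {"$": 0, "a": 1, "c": 2, "g": 3}
S = "g$ccaggaa"
wt = WaveletTree([CODE[ch] for ch in S], bits=2)
```

With this example, wt.V reproduces the root vector 101101100 of the slides, and queries such as wt.rank(CODE["g"], 6) or wt.select(CODE["a"], 3) can be checked against S by hand.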
Compressing Vectors
• n_c: frequency of letter c
• Size of compressed V
  \[ \log \binom{n}{n_0 + \cdots + n_{\sigma/2-1}} \]
• Size of compressed V_0 and V_1
  \[ \log \binom{n_0 + \cdots + n_{\sigma/2-1}}{n_0 + \cdots + n_{\sigma/4-1}} + \log \binom{n_{\sigma/2} + \cdots + n_{\sigma-1}}{n_{\sigma/2} + \cdots + n_{3\sigma/4-1}} \]

  S:       g $ c c a g g a a
  V:       1 0 1 1 0 1 1 0 0
  V_0 V_1: 0 1 1 1 1 0 0 1 1

• By adding the sizes for all levels
  \[ \log \binom{n}{n_0, n_1, \ldots, n_{\sigma-1}} \le \sum_{c=0}^{\sigma-1} n_c \log \frac{n}{n_c} = nH_0 \]
• Summary
  – access/rank/select: O(log σ) time
  – size: nH_0 + O(n log σ log log n / log n) bits

Multi-ary Wavelet Trees [5]
• Use not binary but multi-ary trees for wavelet trees
• access/rank/select: O(log σ / log log n) time
• Size: nH_0 + o(n log σ) bits
• If σ = polylog(n)
  – access/rank/select: O(1) time
  – size: nH_0 + o(n) bits
• Size can be reduced to nH_0 + o(n)(H_0 + 1) bits

Summary
Size                       access               rank                 select
n(H_0 + 1.44) + σ·o(n)     O(σ)                 O(1)                 O(1)
|S| + σ·o(n)               O(1)                 O(log σ)             O(log σ)
nH_0 + log σ · o(n)        O(log σ)             O(log σ)             O(log σ)
nH_0 + o(n log σ)          O(log σ/log log n)   O(log σ/log log n)   O(log σ/log log n)
nH_0 + o(n)(H_0 + 1)       O(log log σ)         O(log log σ)         O(1)
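The size argument on the "Compressing Vectors" slide can be checked numerically on the running example. The sketch below is an illustration under stated assumptions: the function names h0_bits and tree_bits are hypothetical, only the leading log-binomial term of each compressed vector is counted (the lower-order index terms are ignored), and math.comb requires Python 3.8 or later.

```python
from collections import Counter
from math import comb, log2


def h0_bits(s):
    """n * H_0(s) = sum over letters c of n_c * log2(n / n_c)."""
    n = len(s)
    return sum(k * log2(n / k) for k in Counter(s).values())


def tree_bits(codes, bits):
    """Sum of log2 C(length, number of ones) over all wavelet tree nodes."""
    if bits == 0 or not codes:
        return 0.0
    shift = bits - 1
    mask = (1 << shift) - 1
    # Split the symbols by their most significant bit, then drop that bit
    ones = [c & mask for c in codes if ((c >> shift) & 1) == 1]
    zeros = [c & mask for c in codes if ((c >> shift) & 1) == 0]
    here = log2(comb(len(codes), len(ones)))
    return here + tree_bits(zeros, shift) + tree_bits(ones, shift)


# The example from the slides: $ = 00, a = 01, c = 10, g = 11
CODE = {"$": 0, "a": 1, "c": 2, "g": 3}
S = "g$ccaggaa"
total = tree_bits([CODE[ch] for ch in S], bits=2)
entropy_bound = h0_bits(S)
```

For S = g$ccaggaa the per-node terms are log C(9,5), log C(4,3) and log C(5,3), whose product is 126 · 4 · 10 = 5040 = 9!/(1! 3! 2! 3!), so total equals the log of the multinomial coefficient (about 12.3 bits) and stays below nH_0 (about 17.0 bits). The gap is large only because n is tiny here.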