Succinct Representations
of Dynamic Strings
Meng He and J. Ian Munro
University of Waterloo
Background: Succinct Data Structures
What are succinct data structures (Jacobson 1989)
Representing data structures using ideally
information-theoretic minimum space
Supporting efficient navigational operations
Why succinct data structures
Large data sets in modern applications: textual,
genomic, spatial or geometric
Strings: Definitions
Notation
Alphabet: [σ]={1, 2, …, σ}
String: S[1..n]
Operations:
access(i): S[i]
rank(α, i): number of occurrences of α in S[1..i]
select(α, i): position of the ith occurrence of α in S
Strings: An Example
S=aabacccdaddabbbc
string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
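The three queries can be checked against the example with a naive Python reference. This is only a sketch of the semantics (1-based positions, linear-time scans), not a succinct structure; the function names simply mirror the operations defined on the previous slide.

```python
def access(s, i):
    """Return S[i], using 1-based positions as on the slides."""
    return s[i - 1]


def rank(s, alpha, i):
    """Number of occurrences of alpha in S[1..i]."""
    return s[:i].count(alpha)


def select(s, alpha, i):
    """1-based position of the i-th occurrence of alpha in S."""
    pos = 0  # 1-based position of the previous occurrence (0 = none yet)
    for _ in range(i):
        # str.find is 0-based, so searching from index `pos` starts
        # just after the previous occurrence.
        pos = s.find(alpha, pos) + 1
        if pos == 0:
            raise ValueError("fewer than i occurrences of alpha")
    return pos


S = "aabacccdaddabbbc"
print(access(S, 8), rank(S, "a", 8), select(S, "b", 3))
```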
Succinct Representations of Strings
Information-theoretic minimum: n lg σ bits
Succinct representation (Grossi et al. 2003)
Space: n H0 +o(n)∙lg σ bits
Time: O(lg σ)
There are many more results.
The case in which σ = 2 (bit vector) is even more
fundamental!
Jacobson 1989
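To make the two space bounds concrete, the following Python sketch computes the zeroth-order empirical entropy H0 of a string and compares n H0 with n lg σ. The helper name `h0` and the sample string are illustrative assumptions; the o(n)∙lg σ term of the succinct representation is ignored here.

```python
import math
from collections import Counter


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per character."""
    n = len(s)
    # Sum over distinct characters of (n_c / n) * lg(n / n_c).
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


sample = "aabacccdaddabbbc"
n = len(sample)
sigma = len(set(sample))

plain_bits = n * math.log2(sigma)   # n lg sigma
entropy_bits = n * h0(sample)       # n H0, never larger than n lg sigma

print(f"n lg sigma = {plain_bits:.1f} bits, n H0 = {entropy_bits:.1f} bits")
```

For this sample σ = 4, so n lg σ is 32 bits, and n H0 comes out slightly lower because the four characters are not equally frequent.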
Applications of Strings and Bit Vectors
Ordinal trees on n nodes
Standard approach: 3n lg n bits
Succinct data structures: 2n + o(n) bits (Jacobson 1989,
Munro & Raman 1997, Benoit et al. 1999…)
Full-text indexes for a text string from [σ]^n
Suffix trees can use as much as 4n lg n to 6n lg n bits!
Succinct data structures: n lg σ + o(n lg σ) bits (Grossi et
al. 2003, González and Navarro 2009…)
Labeled trees, planar graphs, binary relations,
permutations, functions, …
Our Problem: Dynamic Strings
Motivation: In many applications, data are also
updated frequently
For strings, we also consider the following
update operations:
insert(α, i), which inserts character α between S[i-1]
and S[i]
delete(i), which deletes S[i] from S
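The update semantics can be pinned down with a naive Python reference in which every operation takes O(n) time. The class name `NaiveDynamicString` is hypothetical and the list-based storage is an assumption made for illustration; it shows only what insert and delete mean, not how the succinct structure implements them.

```python
class NaiveDynamicString:
    """Reference semantics only; positions are 1-based as on the slides."""

    def __init__(self, text=""):
        self.chars = list(text)

    def insert(self, alpha, i):
        # Place alpha between S[i-1] and S[i], so it becomes the new S[i].
        if not 1 <= i <= len(self.chars) + 1:
            raise IndexError("insert position out of range")
        self.chars.insert(i - 1, alpha)

    def delete(self, i):
        # Remove S[i]; later characters shift one position to the left.
        if not 1 <= i <= len(self.chars):
            raise IndexError("delete position out of range")
        del self.chars[i - 1]

    def rank(self, alpha, i):
        # Occurrences of alpha in S[1..i], recomputed from scratch.
        return self.chars[:i].count(alpha)

    def __str__(self):
        return "".join(self.chars)


s = NaiveDynamicString("aabac")
s.insert("d", 3)   # string is now "aadbac"
s.delete(1)        # string is now "adbac"
print(s, s.rank("a", 4))
```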
Comparisons
Reference                  Space (bits)                Access, rank and select                  Insert and delete
Gupta et al. 2007          n lg σ + lg σ∙(o(n)+O(1))   O(lg lg n)                               O(n^ε) amortized
Mäkinen & Navarro 2008     n H0 + o(n)∙lg σ            O(lg n lg σ)                             O(lg n lg σ)
Lee & Park 2009            n lg σ + o(n)∙lg σ          O(lg n (lg σ / lg lg n + 1))             O(lg n (lg σ / lg lg n + 1)) amortized
González and Navarro 2009  n H0 + o(n)∙lg σ            O(lg n (lg σ / lg lg n + 1))             O(lg n (lg σ / lg lg n + 1))
This paper                 n H0 + o(n)∙lg σ            O((lg n / lg lg n)(lg σ / lg lg n + 1))  O((lg n / lg lg n)(lg σ / lg lg n + 1))
For the special cases in which σ = polylog (n) or 2 (bit vector!), our results
also improve previous results
Searchable Partial Sums
Data
A sequence Q of n nonnegative integers
Operations
sum(i): Q[1] + Q[2] + … + Q[i]
search(x): the smallest i such that sum(i) ≥ x
update(i, δ): Q[i] ← Q[i] + δ
Raman et al. 2001
Assumptions: |Q| = O(lg^ε n), |δ| ≤ lg n
Space: O(lg^(1+ε) n) bits, with a universal table of size O(n^ε') bits
Operations: O(1) time
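The interface of a searchable partial sums structure can be illustrated with a Fenwick (binary indexed) tree. This is a swapped-in textbook technique with O(lg n) time per operation, not the constant-time table-based structure of Raman et al.; the class name `PartialSums` is an assumption, and `search` relies on all entries staying nonnegative.

```python
class PartialSums:
    """Fenwick tree over Q[1..n]; positions are 1-based."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.update(i, v)

    def update(self, i, delta):
        # Q[i] <- Q[i] + delta
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        # Q[1] + Q[2] + ... + Q[i]
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, x):
        # Smallest i with sum(i) >= x, or n + 1 if the total is below x.
        # Descend by powers of two to the last position whose prefix
        # sum is still below x, then step one position to the right.
        pos = 0
        remaining = x
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step >>= 1
        return pos + 1


q = PartialSums([3, 0, 2, 5])
print(q.sum(3), q.search(4))
q.update(2, 4)
print(q.sum(3), q.search(4))
```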
Collections of Searchable Partial Sums
Data
d sequences of k-bit nonnegative integers of length n each
Operations
sum, search, update: supported on each sequence
insert, delete: operated simultaneously on the same positions of
all the sequences, but only 0’s can be inserted or deleted
González and Navarro 2009 (CSPSI)
Example (d = 3, n = 11)
Sequence 1: 8 2 9 5 11 0 5 12 0 3 1
Sequence 2: 9 0 7 3 6 0 19 0 4 2 8
Sequence 3: 1 5 3 12 4 0 3 5 4 1 0
sum(2, 5) = 25
insert(6)
delete(6)
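The CSPSI interface can be sketched with a naive Python reference that stores each sequence as a plain list. The class name `NaiveCSPSI` is hypothetical and every operation here takes linear time. The data are the figure's numbers read as three rows of eleven integers, which is an assumption about the figure's layout, though it agrees with sum(2, 5) = 25.

```python
class NaiveCSPSI:
    """d sequences of equal length; sequences and positions are 1-based."""

    def __init__(self, sequences):
        self.seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        # Prefix sum of the first i entries of sequence j.
        return sum(self.seqs[j - 1][:i])

    def search(self, j, x):
        # Smallest i such that sum(j, i) >= x (n + 1 if there is none).
        total = 0
        for i, v in enumerate(self.seqs[j - 1], start=1):
            total += v
            if total >= x:
                return i
        return len(self.seqs[j - 1]) + 1

    def update(self, j, i, delta):
        self.seqs[j - 1][i - 1] += delta

    def insert(self, i):
        # A 0 is inserted at position i of every sequence simultaneously.
        for q in self.seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        # Only a position holding 0 in every sequence may be deleted.
        if any(q[i - 1] != 0 for q in self.seqs):
            raise ValueError("only 0's can be deleted")
        for q in self.seqs:
            del q[i - 1]


c = NaiveCSPSI([
    [8, 2, 9, 5, 11, 0, 5, 12, 0, 3, 1],
    [9, 0, 7, 3, 6, 0, 19, 0, 4, 2, 8],
    [1, 5, 3, 12, 4, 0, 3, 5, 4, 1, 0],
])
print(c.sum(2, 5))
c.delete(6)   # position 6 holds 0 in all three sequences
c.insert(6)   # restores the original sequences
```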
Our results on CSPSI
Assumptions
d = O(lg^η n)
|δ| ≤ lg n
Space
O(kdn + w) bits, where w is the word size
Buffer: O(n lg n) bits
Time
All operations: O(lg n / lg lg n)
Data Structures for Dynamic Strings Over a
Small Alphabet of Size O(lg^(1/2) n)
Main data structure: a B-tree constructed over S
Leaf
Each leaf stores a superblock of at most 2L bits which encodes a
substring of S (L = lg^2 n / lg lg n)
The numbers of occurrences of each character in all the
superblocks form an integer sequence
Maintain the above sequences for all the characters in the
alphabet in a CSPSI structure E
Internal node v (lg^(1/2) n ≤ degree(v) ≤ 2 lg^(1/2) n)
U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th
child of v
I(v): I(v)[i] = number of characters stored in the subtree rooted at
the i-th child of v
Supporting Queries
rank(α, i)
Perform a top-down traversal with
the help of I(v)’s
Locate the superblock, j, containing
S[i] with the help of U(v)’s
Perform a sum(α, j-1) operation on E to
count the number of occurrences of α
in superblocks 1, 2, …, j-1
Read superblock j in blocks of size
(lg n) / 2 bits
The support for access and select
is similar
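The rank procedure can be sketched in Python with a deliberately flattened layout. This is a simplification under stated assumptions: a linear scan over superblock lengths stands in for the B-tree traversal guided by I(v) and U(v), a list of per-superblock counters stands in for the CSPSI structure E, and a fixed `block_len` in characters replaces the bit-based superblock size. The names `build` and `rank` are illustrative.

```python
from collections import Counter


def build(s, block_len):
    """Cut s into superblocks and count each character per superblock."""
    blocks = [s[k:k + block_len] for k in range(0, len(s), block_len)]
    counts = [Counter(b) for b in blocks]
    return blocks, counts


def rank(blocks, counts, alpha, i):
    """Occurrences of alpha in S[1..i], with 1-based i."""
    before = 0  # characters stored in superblocks preceding the current one
    occ = 0     # occurrences of alpha in those preceding superblocks
    for block, count in zip(blocks, counts):
        if before + len(block) >= i:
            # This superblock contains S[i]: scan only its prefix.
            return occ + block[:i - before].count(alpha)
        before += len(block)
        occ += count[alpha]
    return occ  # i is past the end of the string


S = "aabacccdaddabbbc"
blocks, counts = build(S, 4)
print(rank(blocks, counts, "a", 8))
```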
Insert, delete and deamortization
Supporting insert and delete requires traversing
and updating the B-tree and updating E
It is however much more complicated
Merging and splitting B-tree nodes
Deamortization
Succinct Global Rebuilding
A key technique for deamortizing operations on B-trees
is global rebuilding (Overmars and van Leeuwen 1981)
Global rebuilding
Rebuild the B-tree after the number of update operations
performed exceeds half the initial length of the string
A new copy and an old copy of the B-tree: more space
A buffer of O(n lg n) bits is required
Succinct global rebuilding
Only one copy of the data: no duplication
During rebuilding, queries and updates are performed on either
the new part or the old part
No buffer required
Putting Everything Together
Dynamic strings over an alphabet of size O(lg^(1/2) n)
Space: n H0 + o(n)∙lg σ bits
Time: O(lg n / lg lg n)
This can be extended to general alphabets using wavelet
trees
Space: n H0 + o(n)∙lg σ bits
Time: O((lg n / lg lg n)(lg σ / lg lg n + 1))
When σ = polylog (n) or 2 (bit vectors)
Space: n H0 + o(n)∙lg σ bits
Time: O(lg n / lg lg n)
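The wavelet-tree idea behind the extension to general alphabets can be sketched as follows. This is a static, binary wavelet tree written for illustration only: each node stores a bit vector with plain prefix counts, whereas the slides' construction would place a dynamic small-alphabet string at each node. The class name `WaveletTree` and its layout are assumptions, not the authors' implementation.

```python
class WaveletTree:
    """Static binary wavelet tree supporting rank(alpha, i), 1-based i."""

    def __init__(self, s, alphabet=None):
        self.alphabet = sorted(set(s)) if alphabet is None else list(alphabet)
        self.bits = None
        if len(self.alphabet) <= 1:
            return  # leaf: every character routed here is the same symbol
        mid = len(self.alphabet) // 2
        self.left_syms = set(self.alphabet[:mid])
        # Bit 0 sends a character to the left child, bit 1 to the right.
        self.bits = [0 if c in self.left_syms else 1 for c in s]
        # prefix[k] = number of 1 bits among the first k bits.
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        self.left = WaveletTree(
            [c for c in s if c in self.left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [c for c in s if c not in self.left_syms], self.alphabet[mid:])

    def rank(self, alpha, i):
        if self.bits is None:
            # At a leaf, all i characters equal the leaf's single symbol.
            return i if self.alphabet == [alpha] else 0
        ones = self.prefix[i]
        if alpha in self.left_syms:
            return self.left.rank(alpha, i - ones)  # rank of 0 bits
        return self.right.rank(alpha, ones)         # rank of 1 bits


S = "aabacccdaddabbbc"
wt = WaveletTree(S)
print(wt.rank("a", 8))
```

Each query performs one bit-vector rank per level, which is why the cost of the underlying small-alphabet structure is multiplied by the height of the tree.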
Applications
Dynamic text collections
Data: a collection of text strings
Operations
Pattern search
Display a substring
Insert/delete a text string
Compressed construction of full-text indexes
Working space: n Hk + o(n)∙lg σ bits
Time: O((n lg n / lg lg n)(lg σ / lg lg n + 1))
Conclusions
We designed a succinct representation of
dynamic strings that provides more efficient
operations than previous results
This structure can be directly applied to improve
previous results on text indexing
We expect our results to play an important role
in the design of dynamic succinct data structures
We expect succinct global rebuilding to be useful
for the deamortization of algorithms on dynamic
succinct data structures
Thank you!