Transcript
```
Succinct Representations of Dynamic Strings
Meng He and J. Ian Munro
University of Waterloo
Background: Succinct Data Structures
What are succinct data structures (Jacobson 1989)
  Representing data structures using ideally information-theoretic minimum space
Why succinct data structures
  Large data sets in modern applications: textual, genomic, spatial or geometric
Strings: Definitions
Notation
  Alphabet: [σ] = {1, 2, …, σ}
  String: S[1..n]
Operations:
  access(i): S[i]
  rank(α, i): number of occurrences of α in S[1..i]
  select(α, i): position of the ith occurrence of α in S
Strings: An Example
string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
Succinct Representations of Strings
Information-theoretic minimum: n lg σ bits
Succinct representation (Grossi et al. 2003)
  Space: n H0 + o(n)∙lg σ bits
  Time: O(lg σ)
There are many more results.
The case in which σ = 2 (bit vector) is even more fundamental!
  Jacobson 1989
Applications of Strings and Bit Vectors
Ordinal trees on n nodes
  Standard approach: 3n lg n bits
  Succinct data structures: 2n + o(n) bits (Jacobson 1989, Munro & Raman 1997, Benoit et al. 1999…)
Full text indexes for text string from [σ]^n
  Suffix trees can use as much as 4n lg n to 6n lg n bits!
  Succinct data structures: n lg σ + o(n lg σ) bits (Grossi et al. 2003, González and Navarro 2009…)
Labeled trees, planar graphs, binary relations, permutations, functions, …
Our Problem: Dynamic Strings
Motivation: In many applications, data are also updated frequently
For strings, we also consider the following update operations:
  insert(α, i), which inserts character α between S[i-1] and S[i]
  delete(i), which deletes S[i] from S
Comparisons
Reference | Space (bits) | Access, rank and select | Insert and delete
Gupta et al. 2007 | n lg σ + lg σ∙(o(n) + O(1)) | O(lg lg n) | O(n^ε) amortized
Mäkinen & Navarro 2008 | n H0 + o(n)∙lg σ | O(lg n lg σ) | O(lg n lg σ)
Lee & Park 2009 | n lg σ + o(n)∙lg σ | O(lg n (lg σ / lg lg n + 1)) | O(lg n (lg σ / lg lg n + 1)) amortized
González and Navarro 2009 | n H0 + o(n)∙lg σ | O(lg n (lg σ / lg lg n + 1)) | O(lg n (lg σ / lg lg n + 1))
This paper | n H0 + o(n)∙lg σ | O((lg n / lg lg n)(lg σ / lg lg n + 1)) | O((lg n / lg lg n)(lg σ / lg lg n + 1))
For the special cases in which σ = polylog(n) or σ = 2 (bit vector!), our results also improve previous results
Searchable Partial Sums
Data
  A sequence Q of n nonnegative integers
Operations
  sum(i): Q[1] + Q[2] + … + Q[i]
  search(x): the smallest i such that sum(i) ≥ x
  update(i, δ): Q[i] ← Q[i] + δ
Raman et al. 2001
  Assumptions: |Q| = O(lg^ε n), |δ| ≤ lg n
  Space: O(lg^(1+ε) n) bits, with a universal table of size O(n^ε’) bits
  Operations: O(1) time
Collections of Searchable Partial Sums
Data
  d sequences of k-bit nonnegative integers of length n each
Operations
  sum, search, update: supported on each sequence
  insert, delete: operated simultaneously on the same positions of all the sequences, but only 0’s can be inserted or deleted
González and Navarro 2009 (CSPSI)

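Example (figure values read row by row: 3 sequences of length 11)
  Sequence 1:  8  2  9  5 11  0  5 12  0  3  1
  Sequence 2:  9  0  7  3  6  0 19  0  4  2  8
  Sequence 3:  1  5  3 12  4  0  3  5  4  1  0
sum(2, 5) = 9 + 0 + 7 + 3 + 6 = 25
insert(6)
delete(6)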
Our results on CSPSI
Assumptions
  d = O(lg^η n)
  |δ| ≤ lg n
Space
  O(kdn + w) bits, where w is the word size
  Buffer: O(n lg n) bits
Time
  All operations: O(lg n / lg lg n)
Data Structures for Dynamic Strings Over a Small Alphabet of size O(lg^(1/2) n)
Main data structure: a B-tree constructed over S
Leaf
  Each leaf stores a superblock of at most 2L bits which encodes a substring of S (L = lg^2 n / lg lg n)
  The numbers of occurrences of each character in all the superblocks form an integer sequence
  Maintain the above sequences for all the characters in the alphabet in a CSPSI structure E
Internal node v (lg^(1/2) n ≤ degree(v) ≤ 2 lg^(1/2) n)
  U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th child of v
  I(v): I(v)[i] = number of characters stored in the subtree rooted at the i-th child of v
Supporting Queries
rank(α, i)
  Perform a top-down traversal with the help of I(v)’s
  Locate the superblock, j, containing S[i] with the help of U(v)’s
  Perform sum(α, j) operation on E to count the number of occurrences of α in superblocks 1, 2, …, j-1
  Read superblock j in blocks of size (lg n) / 2 bits
The support for access and select is similar
Insert, delete and deamortization
Supporting insert and delete requires traversing and updating the B-tree and updating E
It is however much more complicated
  Merging and splitting B-tree nodes
  Deamortization
Succinct Global Rebuilding
A key technique for deamortizing operations on B-trees is global rebuilding (Overmars and van Leeuwen 1981)
Global rebuilding
  Rebuild the B-tree after the number of update operations performed exceeds half the initial length of the string
  A new copy and an old copy of the B-tree: more space
  A buffer of O(n lg n) bits is required
Succinct global rebuilding
  Only one copy of the data: no duplication
  During rebuilding, queries and updates are performed on either the new part or the old part
  No buffer required
Putting Everything Together
Dynamic strings over an alphabet of size O(lg^(1/2) n)
  Space: n H0 + o(n)∙lg σ bits
  Time: O(lg n / lg lg n)
This can be extended to general alphabets using wavelet trees
  Space: n H0 + o(n)∙lg σ bits
  Time: O((lg n / lg lg n)(lg σ / lg lg n + 1))
When σ = polylog(n) or 2 (bit vectors)
  Space: n H0 + o(n)∙lg σ bits
  Time: O(lg n / lg lg n)
Applications
Dynamic text collections
  Data: a collection of text strings
  Operations
    Pattern search
    Display a substring
    Insert/delete a text string
Compressed construction of full-text indexes
  Working space: n Hk + o(n)∙lg σ bits
  Time: O((n lg n / lg lg n)(lg σ / lg lg n + 1))
Conclusions
We designed a succinct representation of dynamic strings that provides more efficient operations than previous results
This structure can be directly applied to improve previous results on text indexing
We expect our results to play an important role in the design of dynamic succinct data structures
We expect succinct global rebuilding to be useful for the deamortization of algorithms on dynamic succinct data structures

Thank you!
```
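The string operations defined in the slides (access, rank, select, insert, delete) can be pinned down with a small reference model. The sketch below is only a naive illustration of their semantics with 1-based positions: it stores the string in a Python list, so every operation takes O(n) time and it is not succinct. The class name `NaiveDynamicString` and the sample text are hypothetical and are not taken from the talk.

```python
class NaiveDynamicString:
    """Reference semantics only: a plain list, O(n) per operation, not succinct."""

    def __init__(self, text=""):
        self._chars = list(text)

    def __len__(self):
        return len(self._chars)

    def access(self, i):
        """Return S[i] (positions are 1-based, as in the slides)."""
        return self._chars[i - 1]

    def rank(self, alpha, i):
        """Number of occurrences of alpha in S[1..i]."""
        return self._chars[:i].count(alpha)

    def select(self, alpha, i):
        """Position of the i-th occurrence of alpha in S."""
        seen = 0
        for pos, ch in enumerate(self._chars, start=1):
            if ch == alpha:
                seen += 1
                if seen == i:
                    return pos
        raise ValueError("fewer than i occurrences of alpha")

    def insert(self, alpha, i):
        """Insert alpha between S[i-1] and S[i]."""
        self._chars.insert(i - 1, alpha)

    def delete(self, i):
        """Delete S[i] from S."""
        del self._chars[i - 1]


s = NaiveDynamicString("abracadabra")   # sample text chosen for illustration
print(s.access(8), s.rank("a", 8), s.select("b", 2))
```

A model like this is mainly useful as a test oracle: a succinct implementation can be checked against it on random sequences of queries and updates.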
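The collection of searchable partial sums (CSPSI) interface can be illustrated in the same naive way. The sketch below assumes that the numbers on the CSPSI example slide are three sequences of length 11 read row by row; that reading is an assumption, although it agrees with the slide's sum(2, 5) = 25. The class `NaiveCSPSI` is hypothetical, takes O(n) time per operation, and is not the structure of González and Navarro or of this talk.

```python
class NaiveCSPSI:
    """Reference semantics for d equal-length sequences of nonnegative integers."""

    def __init__(self, sequences):
        if len({len(q) for q in sequences}) > 1:
            raise ValueError("all sequences must have the same length")
        self._seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        """Q_j[1] + ... + Q_j[i] (sequence index j and position i are 1-based)."""
        return sum(self._seqs[j - 1][:i])

    def search(self, j, x):
        """Smallest i such that sum(j, i) >= x."""
        total = 0
        for pos, value in enumerate(self._seqs[j - 1], start=1):
            total += value
            if total >= x:
                return pos
        raise ValueError("x exceeds the total of the sequence")

    def update(self, j, i, delta):
        """Q_j[i] <- Q_j[i] + delta, keeping the entry nonnegative."""
        new_value = self._seqs[j - 1][i - 1] + delta
        if new_value < 0:
            raise ValueError("entries must stay nonnegative")
        self._seqs[j - 1][i - 1] = new_value

    def insert(self, i):
        """Insert a 0 at position i of every sequence simultaneously."""
        for q in self._seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        """Delete position i of every sequence; only 0's may be deleted."""
        if any(q[i - 1] != 0 for q in self._seqs):
            raise ValueError("only 0's can be deleted")
        for q in self._seqs:
            del q[i - 1]


# Assumed row-by-row reading of the example figure.
E = NaiveCSPSI([
    [8, 2, 9, 5, 11, 0, 5, 12, 0, 3, 1],
    [9, 0, 7, 3, 6, 0, 19, 0, 4, 2, 8],
    [1, 5, 3, 12, 4, 0, 3, 5, 4, 1, 0],
])
print(E.sum(2, 5))
```

Position 6 holds a 0 in all three sequences, which is why delete(6) followed by insert(6) is legal here and restores the original collection.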