Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
Jérémy Barbay, Meng He, J. Ian Munro,
University of Waterloo
S. Srinivasa Rao,
IT University of Copenhagen
Background: Succinct Data Structures
What are succinct data structures
Why succinct data structures
Jacobson 1989
Large data sets in modern applications: textual, genomic, spatial or geometric
An implementation: Delpratt et al. 2006
Succinct integrated encodings
Main data and auxiliary data structures
Our Problem: Succinct Indexes
Use of the concept in previous work
Compact PAT trees: Clark & Munro 1996
Lower bounds: Demaine & López-Ortiz 2001;
Miltersen 2005
Upper bounds: Sadakane & Grossi 2006
Definition of succinct indexes in data structure design
ADT: primitive access operators
Succinct index: more powerful operators
Succinct Integrated Encodings
[Diagram: Main Data + Auxiliary Data Structures → Navigational Operations; the diagram also carries an "X" mark whose placement is lost in extraction]
Succinct Indexes
[Diagram: Main Data + Succinct Index → Navigational Operations]
Succinct Indexes vs. Integrated Encodings
Maximizing the freedom of the encoding of the main data
Allowing incremental design
Supporting implicit data
Strings: Definitions
Notation
Alphabet: [σ]={1, 2, …, σ}
String: S[1..n]
Operations:
string_access(x): S[x]
string_rank(α, x): number of occurrences of α in S[1..x]
string_select(α, r): position of the rth occurrence of α in S
Strings: An Example
S=aabacccdaddabbbc
string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
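The three operations can be checked against this example with a direct, linear-time sketch. It only pins down the semantics (1-based positions, as on the slides); it is not a succinct structure, and the function bodies are an illustration rather than anything taken from the talk.

```python
def string_access(s, x):
    """Return S[x], using the slides' 1-based positions."""
    return s[x - 1]


def string_rank(s, alpha, x):
    """Count the occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s, alpha, r):
    """Return the 1-based position of the r-th alpha in S, or None if absent."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


S = "aabacccdaddabbbc"
print(string_access(S, 8))       # d
print(string_rank(S, "a", 8))    # 3
print(string_select(S, "b", 3))  # 14
```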
Strings: Previous Results
Succinct Integrated Encodings
Wavelet trees: Grossi et al. 2003
Space: nH0 + o(n)∙lg σ bits
Time: O(lg σ) time for all three operations
Golynski et al. 2006
Space: n (lg σ + o(lg σ)) bits
Time: O(lglg σ) time for string_access and
string_rank, O(1) time for string_select
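To make the O(lg σ) bound for wavelet trees concrete, here is a minimal pointer-based sketch over an integer alphabet. It is a simplification under stated assumptions: the per-node bitvectors are replaced by plain prefix-count lists, so the space is far from nH0 + o(n)∙lg σ bits, and string_select is omitted for brevity. The class name and layout are hypothetical; only the level-by-level descent reflects the technique.

```python
class WaveletTree:
    """Pointer-based wavelet tree over the integer alphabet [lo, hi].

    Each internal node splits its alphabet range in half and records, for
    every prefix of its sequence, how many symbols fall in the upper half.
    """

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:  # top-level call: derive the range from a non-empty seq
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi:
            return  # leaf: a single symbol, nothing to store
        mid = (lo + hi) // 2
        # ones[i] = number of symbols > mid among the first i symbols
        self.ones = [0]
        for c in seq:
            self.ones.append(self.ones[-1] + (c > mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def access(self, x):
        """Symbol at 1-based position x."""
        node = self
        while node.lo != node.hi:
            went_right = node.ones[x] - node.ones[x - 1]
            if went_right:
                x = node.ones[x]       # position among the upper-half symbols
                node = node.right
            else:
                x -= node.ones[x]      # position among the lower-half symbols
                node = node.left
        return node.lo

    def rank(self, alpha, x):
        """Number of occurrences of alpha among the first x symbols."""
        if alpha < self.lo or alpha > self.hi:
            return 0
        node = self
        while node.lo != node.hi:
            mid = (node.lo + node.hi) // 2
            if alpha <= mid:
                x -= node.ones[x]
                node = node.left
            else:
                x = node.ones[x]
                node = node.right
        return x


S = "aabacccdaddabbbc"
wt = WaveletTree([ord(c) for c in S])
print(chr(wt.access(8)))      # d
print(wt.rank(ord("a"), 8))   # 3
```

Both operations visit one node per level, which is where the lg σ factor comes from.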
Strings: Our Results
Succinct Indexes
ADT
string_access: f(n, σ) time
Space: n∙o(lg σ) bits
Operations
string_rank: O(lglg σ lglglg σ (f(n, σ)+lglg σ))
string_select: O(lglglg σ (f(n, σ)+lglg σ))
Other operations: negations
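The separation between the ADT and the index can be illustrated with a toy example: the index below sees the string only through a string_access callback, so the same index code works over any encoding of the main data. The class name SampledRankIndex, the block parameter, and the sampling scheme are assumptions made for illustration. This is not the construction behind the bounds above, and it is not succinct (it stores one counter table per block).

```python
from collections import Counter


class SampledRankIndex:
    """Rank index that touches the string only via an access(x) callback."""

    def __init__(self, access, n, block=4):
        self.access = access
        self.block = block
        # samples[j] holds the symbol counts of S[1..j*block]
        self.samples = [Counter()]
        running = Counter()
        for pos in range(1, n + 1):
            running[access(pos)] += 1
            if pos % block == 0:
                self.samples.append(running.copy())

    def rank(self, alpha, x):
        """Occurrences of alpha in S[1..x], using fewer than `block` access calls."""
        j = x // self.block
        count = self.samples[j][alpha]
        for pos in range(j * self.block + 1, x + 1):
            if self.access(pos) == alpha:
                count += 1
        return count


S = "aabacccdaddabbbc"

# Main data stored as a Python str.
plain = SampledRankIndex(lambda x: S[x - 1], len(S))

# Same string stored as raw bytes: a different encoding, same index code.
packed = bytes(S, "ascii")
over_bytes = SampledRankIndex(lambda x: chr(packed[x - 1]), len(S))

print(plain.rank("a", 8), over_bytes.rank("a", 8))  # 3 3
```

The query time is expressed in terms of the cost of the callback, which mirrors how the bounds above are stated in terms of f(n, σ).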
Binary Relations: Definitions
Notation
Binary relation: R ⊆ [n] x [σ]
Number of objects: n; number of labels: σ
Number of object-label pairs: t
Operations
object_access(x, r): rth label associated with x
label_access(x, α): whether x is associated with α
label_rank(α, x): number of objects labeled α up to object x
label_select(α, r): rth object labeled α
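A naive sketch of these four operations follows. The relation used is hypothetical (it is not read off the slides), and the sketch assumes that the "rth label" of an object is taken in increasing label order; the class name BinaryRelation is also an assumption. It is meant only to fix the semantics, not to be space-efficient.

```python
from bisect import bisect_left


class BinaryRelation:
    """Relation R on objects 1..n and integer labels, stored as sorted label lists."""

    def __init__(self, pairs, n):
        self.n = n
        self.labels_of = {x: [] for x in range(1, n + 1)}
        for x, alpha in sorted(set(pairs)):
            self.labels_of[x].append(alpha)  # each list ends up sorted

    def object_access(self, x, r):
        """r-th label of object x (increasing label order), or None."""
        labels = self.labels_of[x]
        return labels[r - 1] if 1 <= r <= len(labels) else None

    def label_access(self, x, alpha):
        """Whether object x is associated with label alpha."""
        labels = self.labels_of[x]
        i = bisect_left(labels, alpha)
        return i < len(labels) and labels[i] == alpha

    def label_rank(self, alpha, x):
        """Number of objects among 1..x that carry label alpha."""
        return sum(self.label_access(y, alpha) for y in range(1, x + 1))

    def label_select(self, alpha, r):
        """r-th object carrying label alpha, or None."""
        seen = 0
        for y in range(1, self.n + 1):
            if self.label_access(y, alpha):
                seen += 1
                if seen == r:
                    return y
        return None


# Hypothetical relation with n = 5 objects, 4 labels and t = 8 pairs.
R = BinaryRelation(
    [(1, 3), (1, 4), (2, 1), (3, 3), (3, 4), (4, 2), (4, 3), (5, 4)], n=5
)
print(R.object_access(1, 2))  # 4
print(R.label_access(2, 3))   # False
print(R.label_rank(3, 4))     # 3
print(R.label_select(4, 3))   # 5
```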
Binary Relations: An Example
[Figure: 0/1 matrix of an example relation, with axes labeled n (objects) and σ (labels)]
object_access(1, 2) = 4
label_access(2, 3) = false
label_rank(3, 4) = 3
label_select(4, 3) = 5
Binary Relations: Previous Results
Succinct Integrated Encodings
Barbay et al., 2006
Space: t (lg σ + o(lg σ)) bits
Time: O(lglg σ) time for object_access, label_rank and label_access, O(1) time for label_select
Binary Relations: Our Results
Succinct Indexes
ADT: object_access: f(n,σ,t) time
Space: t∙o(lg σ) bits
Time:
label_rank and label_access: O(lglg σ lglglg σ (f(n,σ,t) + lglg σ))
label_select: O(lglglg σ (f(n,σ,t) + lglg σ))
Multi-labeled Trees: Definitions
Notation
Number of nodes: n
Number of labels: σ
Number of node-label pairs: t
Operations
α-descendant
α-child
α-ancestor
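The three query types can be sketched naively on a pointer-style tree. The sketch reads an α-ancestor (α-child, α-descendant) of x as a proper ancestor (child, descendant) of x that carries label α; the tree, the labels, and the class name MultiLabeledTree are hypothetical and chosen only for illustration.

```python
class MultiLabeledTree:
    """Rooted tree in which every node carries a set of labels."""

    def __init__(self, parent, labels):
        # parent: node -> parent node (the root maps to None)
        # labels: node -> set of labels
        self.parent = parent
        self.labels = labels
        self.children = {v: [] for v in parent}
        for v, p in parent.items():
            if p is not None:
                self.children[p].append(v)

    def alpha_children(self, x, alpha):
        """Children of x that carry label alpha."""
        return [c for c in self.children[x] if alpha in self.labels[c]]

    def alpha_ancestors(self, x, alpha):
        """Proper ancestors of x that carry label alpha, nearest first."""
        out = []
        p = self.parent[x]
        while p is not None:
            if alpha in self.labels[p]:
                out.append(p)
            p = self.parent[p]
        return out

    def alpha_descendants(self, x, alpha):
        """Proper descendants of x that carry label alpha, in preorder."""
        out = []
        stack = list(reversed(self.children[x]))
        while stack:
            v = stack.pop()
            if alpha in self.labels[v]:
                out.append(v)
            stack.extend(reversed(self.children[v]))
        return out


# Hypothetical 6-node tree.
T = MultiLabeledTree(
    parent={1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 4},
    labels={1: {"a", "c"}, 2: {"a", "c"}, 3: {"b"},
            4: {"b", "d"}, 5: {"c", "d"}, 6: {"a", "b"}},
)
print(T.alpha_ancestors(6, "c"))    # [2, 1]
print(T.alpha_descendants(2, "b"))  # [4, 6]
print(T.alpha_children(2, "d"))     # [4, 5]
```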
Multi-labeled Trees: An Example
[Figure: a tree with nodes numbered 1 to 11, each carrying a set of labels drawn from {a, b, c, d}; for example, node 1 is labeled {a, c, d}]
Node 2 is a c-ancestor of node 6
Node 6 is a b-descendant of node 2
Node 10 is a d-child of node 8
Multi-labeled Trees: Previous Results
Labeled trees
Geary et al. 2004
Ferragina et al. 2005
Barbay et al. 2006
Multi-labeled trees
Barbay et al. 2006
Multi-labeled Trees: Our Approach
Traversal Orders
Preorder
DFUDS order
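The two traversal orders can be compared on a small tree. The sketch assumes that "DFUDS order" means the order in which nodes are listed when the tree is walked in preorder and each visited node emits all of its children consecutively (root first); that reading, the function names, and the example tree are assumptions, not taken from the slides.

```python
def preorder(children, root):
    """Nodes in depth-first preorder."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children.get(v, [])))
    return order


def dfuds_order(children, root):
    """Root first; then, for each node in preorder, all of its children in a row."""
    order = [root]
    for v in preorder(children, root):
        order.extend(children.get(v, []))
    return order


# Hypothetical tree: r has children a, b; a has c, d; b has e; c has f.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"], "c": ["f"]}
print(preorder(children, "r"))     # ['r', 'a', 'c', 'f', 'd', 'b', 'e']
print(dfuds_order(children, "r"))  # ['r', 'a', 'b', 'c', 'd', 'f', 'e']
```

Under this reading, the descendants of a node are contiguous in preorder, while the children of a node are contiguous in DFUDS order, which suggests why keeping both orders is convenient for descendant and child queries respectively.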
Ordinal Trees: DFUDS
Benoit et al. 1999 & 2005
Jansson et al. 2007
2 Binary Relations
Nodes in preorder & labels
Nodes in DFUDS order & labels
[Figure: the example tree with nodes 1 to 11, numbered once in preorder and once in DFUDS order]
Multi-labeled Trees: Our Results
Succinct Indexes
ADT: node_label(x, r)
Supporting α-child/descendant queries: t∙o(lg σ) bits
Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
Supporting α-child/descendant/ancestor queries of node x after another node y
Applications
Compressed Succinct Encodings
Strings
Space: nHk + o(nlg σ) bits
Operations:
string_access: O(1)
string_rank: O((lglg σ)^2 lglglg σ)
string_select: O(lglg σ lglglg σ)
First high-order entropy-compressed encoding supporting rank/select efficiently
Other Data Structures
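The rank and select times listed for the compressed string encoding are consistent with plugging a constant-time string_access, f(n, σ) = O(1), into the general succinct-index bounds from the "Strings: Our Results" slide. A short check of that substitution, as a sketch:

```latex
\begin{align*}
\texttt{string\_rank}:\quad
  & O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    \;\xrightarrow{\,f = O(1)\,}\;
    O\bigl((\lg\lg\sigma)^{2}\,\lg\lg\lg\sigma\bigr) \\
\texttt{string\_select}:\quad
  & O\bigl(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr)
    \;\xrightarrow{\,f = O(1)\,}\;
    O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma\bigr)
\end{align*}
```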
Applications (Continued)
High-order entropy-compressed text indexes for large alphabets
Notation: n: text size, σ: alphabet size, m: pattern length, occ: number of occurrences
Our results
Space: n Hk + o(n lg σ) bits
Pattern searching: O(m lglg σ + occ lg^(1+ε) n lglg σ)
Previous results: either a lg σ factor instead of lglg σ, or no compression
Conclusions
We showed the importance of succinct indexes in the design of succinct data structures by designing:
Succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
First high-order entropy-compressed representation of strings supporting rank/select
High-order entropy-compressed text indexes for large alphabets
Conclusions (Continued)
The concept of succinct indexes is useful in designing succinct data structures: it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.
Thank you!