Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
Jérémy Barbay, Meng He, J. Ian Munro (University of Waterloo)
S. Srinivasa Rao (IT University of Copenhagen)
Background: Succinct Data Structures
- What are succinct data structures
  - Jacobson 1989
- Why succinct data structures
  - Large data sets in modern applications: textual, genomic, spatial or geometric
  - An implementation: Delpratt et al. 2006
- Succinct integrated encodings
  - Main data and auxiliary data structures
Our Problem: Succinct Indexes
- Use of the concept in previous work
  - Compact PAT trees: Clark & Munro 1996
  - Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
  - Upper bounds: Sadakane & Grossi 2006
- Definition of succinct indexes in data structure design
  - ADT: primitive access operators
  - Succinct index: more powerful operators
Succinct Integrated Encodings
[Diagram: Main Data + Auxiliary Data Structures -> Navigational Operations]
Succinct Indexes
[Diagram: Main Data + Succinct Index -> Navigational Operations]
Succinct Indexes vs. Integrated Encodings
- Maximizing the freedom of the encoding of the main data
- Allowing incremental design
- Supporting implicit data
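
To make the separation between main data and index concrete, here is a minimal sketch in Python. The class name `SampledRankIndex`, the `block` parameter and the sampling scheme are all assumptions made for illustration; this is not the index from the talk, and its sampled counters are far from the n∙o(lg σ)-bit bound. The point it shows is only the interface: the index touches the string exclusively through an `access(i)` callable, so the main data can be stored in any encoding or even computed on the fly.

```python
from typing import Callable, Dict, List


class SampledRankIndex:
    """Toy rank index that sees the string only through access(i), 1-based.

    Illustrative only: the per-block counters are not succinct.
    """

    def __init__(self, access: Callable[[int], str], n: int, block: int = 4):
        self.access = access
        self.n = n
        self.block = block
        # samples[j][c] = number of occurrences of c in S[1..j*block]
        self.samples: List[Dict[str, int]] = [{}]
        running: Dict[str, int] = {}
        for i in range(1, n + 1):
            c = access(i)
            running[c] = running.get(c, 0) + 1
            if i % block == 0:
                self.samples.append(dict(running))

    def rank(self, alpha: str, x: int) -> int:
        """Occurrences of alpha in S[1..x], using at most block-1 access calls."""
        j = x // self.block
        count = self.samples[j].get(alpha, 0)
        for i in range(j * self.block + 1, x + 1):
            if self.access(i) == alpha:
                count += 1
        return count


# Main data stored explicitly as a Python string.
S = "aabacccdaddabbbc"
explicit = SampledRankIndex(lambda i: S[i - 1], len(S))
print(explicit.rank("a", 8))

# Implicit main data: the string is defined by a formula and never stored.
implicit = SampledRankIndex(lambda i: "a" if i % 3 == 0 else "b", 10)
print(implicit.rank("a", 10))
```

Swapping the lambda for a compressed or externally stored representation would leave the index code unchanged, which is the kind of freedom the list above refers to.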
Strings: Definitions
- Notation
  - Alphabet: [σ] = {1, 2, …, σ}
  - String: S[1..n]
- Operations:
  - string_access(x): S[x]
  - string_rank(α, x): number of occurrences of α in S[1..x]
  - string_select(α, r): position of the r-th occurrence of α in S
Strings: An Example
S=aabacccdaddabbbc
string_access(8) = d
string_rank(a, 8) = 3
string_select(b, 3) = 14
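
The three operations can be checked against this example with a naive reference implementation. This is only a linear-time sketch of the definitions (positions are 1-based, as on the slides), not a succinct structure; the Python function signatures and the choice to return -1 when no r-th occurrence exists are assumptions of this sketch.

```python
def string_access(s: str, x: int) -> str:
    """Return S[x], with 1-based positions."""
    return s[x - 1]


def string_rank(s: str, alpha: str, x: int) -> int:
    """Number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s: str, alpha: str, r: int) -> int:
    """1-based position of the r-th occurrence of alpha in S, or -1 if none."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return -1


S = "aabacccdaddabbbc"
print(string_access(S, 8))       # d
print(string_rank(S, "a", 8))    # 3
print(string_select(S, "b", 3))  # 14
```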
Strings: Previous Results
- Succinct Integrated Encodings
  - Wavelet trees: Grossi et al. 2003
    - Space: nH_0 + o(n)∙lg σ bits
    - Time: O(lg σ) for all three operations
  - Golynski et al. 2006
    - Space: n (lg σ + o(lg σ)) bits
    - Time: O(lglg σ) for string_access and string_rank, O(1) for string_select
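
To get a feel for the two leading space terms, the following sketch computes the zero-order empirical entropy H_0 of a string and compares n∙H_0 with n∙lg σ. It uses the standard definition H_0 = Σ (n_c / n) lg(n / n_c) over the characters c that occur in S; the function name and the sample string (borrowed from the strings example) are choices made here for illustration, and the lower-order o(∙) terms are ignored.

```python
from collections import Counter
from math import log2


def zero_order_entropy(s: str) -> float:
    """Empirical zero-order entropy of s, in bits per character."""
    n = len(s)
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


S = "aabacccdaddabbbc"
n, sigma = len(S), len(set(S))
print(n * zero_order_entropy(S))  # leading term n*H_0, in bits
print(n * log2(sigma))            # leading term n*lg(sigma), in bits
```

On a string with a nearly uniform character distribution, such as this one, the two terms are close; the gap grows as the distribution becomes more skewed.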
Strings: Our Results
- Succinct Indexes
  - ADT
    - string_access: f(n, σ) time
  - Space: n∙o(lg σ) bits
  - Operations
    - string_rank: O(lglg σ lglglg σ (f(n, σ) + lglg σ))
    - string_select: O(lglglg σ (f(n, σ) + lglg σ))
    - Other operations: negations
Binary Relations: Definitions
- Notation
  - Binary relation: R ⊆ [n] × [σ]
  - Number of objects: n; number of labels: σ
  - Number of object-label pairs: t
- Operations
  - object_access(x, r): r-th label associated with x
  - label_access(x, α): whether x is associated with α
  - label_rank(α, x): number of objects labeled α up to object x
  - label_select(α, r): r-th object labeled α
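Binary Relations: An Example
(rows: objects 1..n, n = 5; columns: labels 1..σ, σ = 4)
label:     1  2  3  4
object 1:  0  0  1  1
object 2:  1  0  0  1
object 3:  0  0  1  0
object 4:  1  1  1  0
object 5:  0  0  0  1
object_access(1, 2) = 4
label_access(2, 3) = false
label_rank(3, 4) = 3
label_select(4, 3) = 5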
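
The four query results in this example can be reproduced with a naive sketch. It assumes the 0/1 entries of the example are read as five objects over four labels, giving object 1 → {3, 4}, 2 → {1, 4}, 3 → {3}, 4 → {1, 2, 3}, 5 → {4}; that reading is consistent with all four results. The dictionary representation and the function signatures are assumptions for illustration, not the encoding discussed in the talk.

```python
# Assumed reading of the example relation: object -> sorted list of labels.
R = {1: [3, 4], 2: [1, 4], 3: [3], 4: [1, 2, 3], 5: [4]}


def object_access(rel, x, r):
    """r-th label associated with object x (1-based), or None if absent."""
    labels = rel[x]
    return labels[r - 1] if r <= len(labels) else None


def label_access(rel, x, alpha):
    """Whether object x is associated with label alpha."""
    return alpha in rel[x]


def label_rank(rel, alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(1 for obj in rel if obj <= x and alpha in rel[obj])


def label_select(rel, alpha, r):
    """r-th object labeled alpha, or None if there are fewer than r."""
    seen = 0
    for obj in sorted(rel):
        if alpha in rel[obj]:
            seen += 1
            if seen == r:
                return obj
    return None


print(object_access(R, 1, 2))  # 4
print(label_access(R, 2, 3))   # False
print(label_rank(R, 3, 4))     # 3
print(label_select(R, 4, 3))   # 5
```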
Binary Relations: Previous Results
- Succinct Integrated Encodings
  - Barbay et al., 2006
    - Space: t (lg σ + o(lg σ)) bits
    - Time: O(lglg σ) for object_access, label_rank and label_access, O(1) for label_select
Binary Relations: Our Results
- Succinct Indexes
  - ADT:
    - object_access: f(n, σ, t)
  - Space:
    - t∙o(lg σ) bits
  - Time:
    - label_rank and label_access: O(lglg σ lglglg σ (f(n, σ, t) + lglg σ))
    - label_select: O(lglglg σ (f(n, σ, t) + lglg σ))
Multi-labeled Trees: Definitions
- Notation
  - Number of nodes: n
  - Number of labels: σ
  - Number of node-label pairs: t
- Operations
  - α-descendant
  - α-child
  - α-ancestor
Multi-labeled Trees: An Example
[Figure: a tree with 11 nodes, numbered 1 to 11, each annotated with a set of labels drawn from {a, b, c, d}; for example, node 1 carries {a, c, d}.]
- Node 2 is a c-ancestor of node 6
- Node 6 is a b-descendant of node 2
- Node 10 is a d-child of node 8
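
The three label-based relations can be sketched naively as follows. The tree and label sets below are hypothetical and deliberately small; they are not the tree from the example. The sketch reads "α-child", "α-descendant" and "α-ancestor" of x as a child, proper descendant or proper ancestor of x that carries label α, which matches the three statements above.

```python
# Hypothetical 6-node tree: node -> ordered list of children.
CHILDREN = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}
# Hypothetical label sets: node -> set of labels.
LABELS = {1: {"a", "c"}, 2: {"b"}, 3: {"a"}, 4: {"a", "b"}, 5: {"c"}, 6: {"b", "c"}}
# Parent map derived from the child lists.
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def alpha_children(x, alpha):
    """Children of x that carry label alpha, in left-to-right order."""
    return [c for c in CHILDREN[x] if alpha in LABELS[c]]


def alpha_descendants(x, alpha):
    """Proper descendants of x that carry label alpha, in preorder."""
    out = []
    stack = list(reversed(CHILDREN[x]))
    while stack:
        v = stack.pop()
        if alpha in LABELS[v]:
            out.append(v)
        stack.extend(reversed(CHILDREN[v]))
    return out


def alpha_ancestors(x, alpha):
    """Proper ancestors of x that carry label alpha, nearest first."""
    out = []
    while x in PARENT:
        x = PARENT[x]
        if alpha in LABELS[x]:
            out.append(x)
    return out


print(alpha_children(1, "a"))     # [3]
print(alpha_descendants(1, "b"))  # [2, 4, 6]
print(alpha_ancestors(6, "a"))    # [3, 1]
```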
Multi-labeled Trees: Previous Results
- Labeled trees
  - Geary et al. 2004
  - Ferragina et al. 2005
  - Barbay et al. 2006
- Multi-labeled trees
  - Barbay et al. 2006
Multi-labeled Trees: Our Approach
- Traversal Orders
  - Preorder
  - DFUDS order
- Ordinal Trees: DFUDS
  - Benoit et al. 1999 & 2005
  - Jansson et al. 2007
- 2 Binary Relations
  - Nodes in preorder & labels
  - Nodes in DFUDS order & labels
[Figure: a tree whose nodes are numbered both in preorder and in DFUDS order.]
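
A small sketch may help show why two traversal orders are useful. It builds the DFUDS parenthesis sequence in the usual way (one extra leading "(", then for each node in preorder its degree in unary followed by ")"), and it assumes that "DFUDS order" numbers the nodes by the position of their opening parenthesis in that sequence. Under that assumption the children of a node are consecutive in DFUDS order, while the nodes of a subtree are consecutive in preorder. The tree and node names are hypothetical.

```python
# Hypothetical ordinal tree: node -> ordered list of children.
CHILDREN = {
    "r": ["a", "b", "c"],
    "a": ["d", "e"],
    "b": [],
    "c": ["f"],
    "d": ["g"],
    "e": [],
    "f": [],
    "g": [],
}


def preorder(root):
    """Nodes in preorder (parent before children, children left to right)."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(CHILDREN[v]))
    return order


def dfuds_sequence(root):
    """Leading '(' plus, per node in preorder, degree in unary then ')'."""
    return "(" + "".join("(" * len(CHILDREN[v]) + ")" for v in preorder(root))


def dfuds_order(root):
    """Assumed DFUDS order: root first, then each node's children listed
    together, taking the parents in preorder."""
    order = [root]
    for v in preorder(root):
        order.extend(CHILDREN[v])
    return order


print(preorder("r"))
print(dfuds_order("r"))
print(dfuds_sequence("r"))
```

In this tree the subtree of "a" occupies a contiguous run in preorder, and the children of "a" occupy a contiguous run in the assumed DFUDS order, so a descendant query and a child query each reduce to a range query on one of the two node-label relations.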
Multi-labeled Trees: Our Results
- Succinct Indexes
  - ADT: node_label(x, r)
  - Supporting α-child/descendant queries: t∙o(lg σ) bits
  - Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
  - Supporting α-child/descendant/ancestor queries of node x after another node y
Applications
- Compressed Succinct Encodings
  - Strings
    - Space: nH_k + o(n lg σ) bits
    - Operations:
      - string_access: O(1)
      - string_rank: O((lglg σ)^2 lglglg σ)
      - string_select: O(lglg σ lglglg σ)
    - First high-order entropy-compressed encoding supporting rank/select efficiently
  - Other Data Structures
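
For readers unfamiliar with the H_k term, the sketch below computes the k-th order empirical entropy under the standard definition: group the characters of S by the k characters that precede them, take n_w∙H_0 of each group (n_w being the group size), sum, and divide by n. The function names and test strings are choices made here for illustration. A periodic string shows how much smaller H_k can be than H_0.

```python
from collections import Counter, defaultdict
from math import log2


def h0(seq):
    """Zero-order empirical entropy of a sequence, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(seq).values())


def hk(s, k):
    """k-th order empirical entropy of s, in bits per symbol."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        # s[i] is grouped under the k characters that precede it
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


T = "abababab"
print(h0(T))     # 1.0
print(hk(T, 1))  # 0.0
```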
Applications (Continued)
- High-order entropy-compressed text indexes for large alphabets
  - Notations: n: text size, σ: alphabet size, m: pattern length, occ: number of occurrences
  - Our results
    - Space: nH_k + o(n lg σ) bits
    - Pattern searching: O(m lglg σ + occ lg^(1+ε) n lglg σ)
    - Previous results: a lg σ factor instead of lglg σ or incompressible
Conclusions
- We showed the importance of succinct indexes in the design of succinct data structures by designing:
  - Succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
  - First high-order entropy-compressed representation of strings supporting rank/select
  - High-order entropy-compressed text indexes for large alphabets
Conclusions (Continued)
The concept of succinct indexes is useful in designing succinct data structures: it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.
Thank you!