Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees

Jérémy Barbay, Meng He, J. Ian Munro (University of Waterloo)
S. Srinivasa Rao (IT University of Copenhagen)

Background: Succinct Data Structures
- What are succinct data structures? (Jacobson 1989)
- Why succinct data structures? Large data sets in modern applications: textual, genomic, spatial or geometric
- An implementation: Delpratt et al. 2006
- Succinct integrated encodings: main data and auxiliary data structures

Our Problem: Succinct Indexes
- Use of the concept in previous work:
  - Compact PAT trees: Clark & Munro 1996
  - Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
  - Upper bounds: Sadakane & Grossi 2006
- Definition of succinct indexes in data structure design:
  - ADT: primitive access operators
  - Succinct index: more powerful operators

[Diagram: a succinct integrated encoding combines the main data with auxiliary data structures to support navigational operations; a succinct index keeps the main data separate and adds a succinct index to support navigational operations]

Succinct Indexes vs. Integrated Encodings
- Maximizing the freedom of the encoding of the main data
- Allowing incremental design
- Supporting implicit data

Strings: Definitions
- Notation:
  - Alphabet: $[\sigma] = \{1, 2, …, \sigma\}$
  - String: S[1..n]
- Operations:
  - string_access(x): S[x]
  - string_rank(α, x): number of occurrences of α in S[1..x]
  - string_select(α, r): position of the r-th occurrence of α in S

Strings: An Example
S = aabacccdaddabbbc
- string_access(8) = d
- string_rank(a, 8) = 3
- string_select(b, 3) = 14

Strings: Previous Results (Succinct Integrated Encodings)
- Wavelet trees (Grossi et al. 2003):
  - Space: $nH_0 + o(n)\cdot\lg\sigma$ bits
  - Time: $O(\lg\sigma)$ for all three operations
- Golynski et al. 2006:
  - Space: $n(\lg\sigma + o(\lg\sigma))$ bits
  - Time: $O(\lg\lg\sigma)$ for string_access and string_rank; $O(1)$ for string_select

Strings: Our Results (Succinct Indexes)
- ADT: string_access in $f(n, \sigma)$ time
- Space: $n\cdot o(\lg\sigma)$ bits
- Operations:
  - string_rank: $O(\lg\lg\sigma\,\lg\lg\lg\sigma\,(f(n,\sigma) + \lg\lg\sigma))$
  - string_select: $O(\lg\lg\lg\sigma\,(f(n,\sigma) + \lg\lg\sigma))$
  - Other operations: negations

Binary Relations: Definitions
- Notation:
  - Binary relation: $R \subseteq [n] \times [\sigma]$
  - Number of objects: n; number of labels: σ
  - Number of object-label pairs: t
- Operations:
  - object_access(x, r): the r-th label associated with object x
  - label_access(x, α): whether object x is associated with label α
  - label_rank(α, x): number of objects labeled α up to object x
  - label_select(α, r): the r-th object labeled α

Binary Relations: An Example
Relation with n = 5 objects (rows) and σ = 4 labels (columns); 1 marks an object-label pair:

  object | label 1  label 2  label 3  label 4
  -------+-----------------------------------
     1   |    0        0        1        1
     2   |    1        0        0        1
     3   |    0        0        1        0
     4   |    1        1        1        0
     5   |    0        0        0        1

- object_access(1, 2) = 4
- label_access(2, 3) = false
- label_rank(3, 4) = 3
- label_select(4, 3) = 5

Binary Relations: Previous Results (Succinct Integrated Encodings)
- Barbay et al. 2006:
  - Space: $t(\lg\sigma + o(\lg\sigma))$ bits
  - Time: $O(\lg\lg\sigma)$ for object_access, label_rank and label_access; $O(1)$ for label_select

Binary Relations: Our Results (Succinct Indexes)
- ADT: object_access in $f(n,\sigma,t)$ time
- Space: $t\cdot o(\lg\sigma)$ bits
- Time:
  - label_rank and label_access: $O(\lg\lg\sigma\,\lg\lg\lg\sigma\,(f(n,\sigma,t) + \lg\lg\sigma))$
  - label_select: $O(\lg\lg\lg\sigma\,(f(n,\sigma,t) + \lg\lg\sigma))$

Multi-labeled Trees: Definitions
- Notation:
  - Number of nodes: n
  - Number of labels: σ
  - Number of node-label pairs: t
- Operations: α-descendant, α-child, α-ancestor

Multi-labeled Trees: An Example
[Figure: an 11-node tree in which each node carries a set of labels from {a, b, c, d}]
- Node 2 is a c-ancestor of node 6
- Node 6 is a b-descendant of node 2
- Node 10 is a d-child of node 8

Multi-labeled Trees: Previous Results
- Labeled trees: Geary et al. 2004; Ferragina et al. 2005; Barbay et al. 2006
- Multi-labeled trees: Barbay et al. 2006

Multi-labeled Trees: Our Approach
- Traversal orders: preorder and DFUDS order

Ordinal Trees: DFUDS
- Benoit et al. 1999 & 2005; Jansson et al. 2007
[Figure: an example 11-node ordinal tree with its nodes numbered in preorder and in DFUDS order]

Binary Relations
- Nodes in preorder & labels
- Nodes in DFUDS order & labels

Multi-labeled Trees: Our Results (Succinct Indexes)
- ADT: node_label(x, r)
- Supporting α-child/descendant queries: $t\cdot o(\lg\sigma)$ bits
- Supporting α-child/descendant/ancestor queries: $t\cdot(\lg\rho + o(\lg\rho) + o(\lg\sigma))$ bits (ρ: recursivity)
- Supporting α-child/descendant/ancestor queries of node x after another node y

Applications: Compressed Succinct Encodings
- Strings:
  - Space: $nH_k + o(n\lg\sigma)$ bits
  - Operations: string_access in $O(1)$; string_rank in $O((\lg\lg\sigma)^2\lg\lg\lg\sigma)$; string_select in $O(\lg\lg\sigma\,\lg\lg\lg\sigma)$
  - First high-order entropy-compressed encoding supporting rank/select efficiently
- Other data structures

Applications (Continued)
- High-order entropy-compressed text indexes for large alphabets
- Notation: n: text size; σ: alphabet size; m: pattern length; occ: number of occurrences
- Our results:
  - Space: $nH_k + o(n\lg\sigma)$ bits
  - Pattern searching: $O(m\lg\lg\sigma + occ\cdot\lg^{1+\epsilon}n\,\lg\lg\sigma)$
- Previous results: a $\lg\sigma$ factor instead of $\lg\lg\sigma$, or not compressed

Conclusions
We showed the importance of succinct indexes in the design of succinct data structures by designing:
- A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors, children and descendants by label
- The first high-order entropy-compressed representation of strings supporting rank/select
- High-order entropy-compressed text indexes for large alphabets

Conclusions (Continued)
The concept of succinct indexes is useful in designing succinct data structures: it maximizes the freedom of the encoding of the main data and leads to a rich choice of design trade-offs.

Thank you!
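A plain, non-succinct reference implementation can pin down the semantics of the three string operations string_access, string_rank and string_select (1-based positions, as in the Strings definitions). This is only a sketch for checking the example values on S = aabacccdaddabbbc: it stores S directly and spends O(n) time per query, so it says nothing about the succinct bounds. Returning -1 when the r-th occurrence does not exist is an assumption of this sketch.

```python
def string_access(s: str, x: int) -> str:
    """Return S[x] using 1-based positions."""
    return s[x - 1]


def string_rank(s: str, alpha: str, x: int) -> int:
    """Number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s: str, alpha: str, r: int) -> int:
    """1-based position of the r-th occurrence of alpha in S, or -1 if there is none."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return -1


S = "aabacccdaddabbbc"
print(string_access(S, 8))       # d
print(string_rank(S, "a", 8))    # 3
print(string_select(S, "b", 3))  # 14
```

Rank and select are inverses in the sense that string_rank(S, α, string_select(S, α, r)) = r whenever the r-th occurrence exists.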
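The defining feature of a succinct index is that the auxiliary structure reads the main data only through the ADT (for strings, string_access). The toy sketch below makes that separation concrete. The class `SampledRankIndex`, its `block` parameter and the sampling scheme are hypothetical illustrations: the sketch stores about σ·n/block counters, far more than the n·o(lg σ) bits achieved in the talk, and it is not the authors' construction.

```python
from bisect import bisect_left
from typing import Callable, Dict, List


class SampledRankIndex:
    """Hypothetical toy index: keeps only sampled counts and reads the string
    exclusively through the access(x) callback (the ADT, 1-based positions).

    Illustrative assumption: counts are sampled every `block` positions for
    every symbol. This is NOT the n * o(lg sigma)-bit construction from the talk.
    """

    def __init__(self, access: Callable[[int], str], n: int, alphabet: str, block: int = 4):
        self.access, self.n, self.block = access, n, block
        # samples[c][j] = number of occurrences of c in S[1 .. j*block]
        self.samples: Dict[str, List[int]] = {c: [0] for c in alphabet}
        counts = {c: 0 for c in alphabet}
        for x in range(1, n + 1):
            counts[access(x)] += 1
            if x % block == 0:
                for c in alphabet:
                    self.samples[c].append(counts[c])

    def rank(self, alpha: str, x: int) -> int:
        """Occurrences of alpha in S[1..x]: one sample plus fewer than `block` access calls."""
        j = x // self.block
        r = self.samples[alpha][j]
        for y in range(j * self.block + 1, x + 1):
            if self.access(y) == alpha:
                r += 1
        return r

    def select(self, alpha: str, r: int) -> int:
        """Position of the r-th alpha, or -1: locate its block from the samples, then scan it."""
        if r < 1:
            return -1
        s = self.samples[alpha]
        j = bisect_left(s, r) - 1  # last sampled boundary with fewer than r occurrences
        count = s[j]
        for y in range(j * self.block + 1, self.n + 1):
            if self.access(y) == alpha:
                count += 1
                if count == r:
                    return y
        return -1


S = "aabacccdaddabbbc"
index = SampledRankIndex(lambda x: S[x - 1], len(S), "abcd", block=4)
print(index.rank("a", 8))    # 3
print(index.select("b", 3))  # 14
```

Because the index only calls `access`, the same index works unchanged whether S is stored plainly, compressed, or computed on the fly; this is the "freedom of the encoding of the main data" that the talk emphasizes.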
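Several space bounds in the talk are stated in terms of the empirical entropies H_0 and H_k. The sketch below computes both for the example string, assuming the common definitions: H_0 from symbol frequencies, and n·H_k as the sum, over each length-k context w, of |S_w|·H_0(S_w), where S_w collects the characters that follow occurrences of w. Published definitions differ slightly in how they treat the first k characters, so treat this as one convention, not the talk's exact one.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, List


def h0(s: str) -> float:
    """Empirical zeroth-order entropy: sum over symbols c of (n_c/n) * lg(n/n_c)."""
    n = len(s)
    return sum((k / n) * log2(n / k) for k in Counter(s).values())


def hk(s: str, k: int) -> float:
    """Empirical k-th order entropy: (1/n) * sum over contexts w of |S_w| * H0(S_w),
    where S_w lists the characters that follow each occurrence of w in s."""
    if k == 0:
        return h0(s)
    followers: Dict[str, List[str]] = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0("".join(f)) for f in followers.values()) / len(s)


S = "aabacccdaddabbbc"
print(f"{h0(S):.3f}")     # 1.977
print(f"{hk(S, 1):.3f}")  # 1.320
```

Knowing the preceding character already lowers the per-symbol cost here, which is why an nH_k bound can be much smaller than nH_0 on real text.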
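The four binary-relation operations can be checked with a brute-force sketch on a small relation: 5 objects, 4 labels and t = 9 pairs, on which object_access(1, 2), label_access(2, 3), label_rank(3, 4) and label_select(4, 3) evaluate to 4, false, 3 and 5. The dictionary `R` is a hypothetical plain representation chosen for readability, and ordering each object's labels increasingly for object_access is an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical example relation: object -> labels it is associated with.
R: Dict[int, List[int]] = {
    1: [3, 4],
    2: [1, 4],
    3: [3],
    4: [1, 2, 3],
    5: [4],
}


def object_access(x: int, r: int) -> int:
    """r-th label of object x, assuming labels are ordered increasingly."""
    return sorted(R[x])[r - 1]


def label_access(x: int, alpha: int) -> bool:
    """Whether object x is associated with label alpha."""
    return alpha in R[x]


def label_rank(alpha: int, x: int) -> int:
    """Number of objects among 1..x labeled alpha."""
    return sum(1 for y in range(1, x + 1) if alpha in R[y])


def label_select(alpha: int, r: int) -> int:
    """The r-th object labeled alpha, or -1 if fewer than r objects carry it."""
    seen = 0
    for y in sorted(R):
        if alpha in R[y]:
            seen += 1
            if seen == r:
                return y
    return -1


print(object_access(1, 2))  # 4
print(label_access(2, 3))   # False
print(label_rank(3, 4))     # 3
print(label_select(4, 3))   # 5
```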
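The α-child, α-descendant and α-ancestor queries on multi-labeled trees can be made concrete with a pointer-based brute-force sketch. The 6-node tree below (`PARENT`, `LABELS`) is hypothetical and is not the tree from the example figure; node ids are assumed to be assigned in preorder, and "descendant" and "ancestor" are taken as proper (excluding the node itself), matching usages such as "node 6 is a b-descendant of node 2". The talk's succinct indexes answer these queries without any of this pointer structure.

```python
from typing import Dict, List, Optional, Set

# Hypothetical multi-labeled tree; node ids follow preorder.
PARENT: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5}
LABELS: Dict[int, Set[str]] = {
    1: {"a", "c"}, 2: {"b"}, 3: {"c", "d"}, 4: {"a"}, 5: {"b", "d"}, 6: {"c"},
}


def children(x: int) -> List[int]:
    """Children of x in left-to-right order (ids are preorder, so sorting suffices)."""
    return sorted(y for y, p in PARENT.items() if p == x)


def alpha_children(x: int, alpha: str) -> List[int]:
    return [y for y in children(x) if alpha in LABELS[y]]


def alpha_descendants(x: int, alpha: str) -> List[int]:
    """Proper descendants of x labeled alpha, in preorder."""
    out: List[int] = []
    stack = list(reversed(children(x)))
    while stack:
        y = stack.pop()
        if alpha in LABELS[y]:
            out.append(y)
        stack.extend(reversed(children(y)))
    return out


def alpha_ancestors(x: int, alpha: str) -> List[int]:
    """Proper ancestors of x labeled alpha, nearest first."""
    out: List[int] = []
    y = PARENT[x]
    while y is not None:
        if alpha in LABELS[y]:
            out.append(y)
        y = PARENT[y]
    return out


print(alpha_children(1, "b"))     # [2, 5]
print(alpha_descendants(1, "c"))  # [3, 6]
print(alpha_ancestors(6, "b"))    # [5]
```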
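The DFUDS representation cited from Benoit et al. encodes an ordinal tree as a parenthesis sequence: a leading "(", then, for each node in preorder, one "(" per child followed by a single ")". The sketch below builds that sequence from a child-list dictionary; the 6-node tree is hypothetical, and the sketch only produces the encoding, not the rank/select machinery needed to navigate it in constant time.

```python
from typing import Dict, List


def dfuds(children: Dict[int, List[int]], root: int) -> str:
    """DFUDS sequence: '(' then, per node in preorder, deg(v) '(' followed by ')'."""
    out = ["("]
    stack = [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        out.append("(" * len(kids) + ")")
        stack.extend(reversed(kids))  # leftmost child is visited next (preorder)
    return "".join(out)


# Hypothetical tree: 1 has children 2 and 5; 2 has children 3 and 4; 5 has child 6.
tree = {1: [2, 5], 2: [3, 4], 5: [6]}
bits = dfuds(tree, 1)
print(bits)  # ((()(()))())
```

A tree with n nodes always yields 2n balanced parentheses, so the encoding uses 2n + o(n) bits once the usual rank/select support is added.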