Transcript

Succinct Representations of Dynamic Strings
Meng He and J. Ian Munro
University of Waterloo

Background: Succinct Data Structures
  What are succinct data structures (Jacobson 1989)
    Representing data structures using ideally information-theoretic minimum space
    Supporting efficient navigational operations
  Why succinct data structures
    Large data sets in modern applications: textual, genomic, spatial or geometric

Strings: Definitions
  Notation
    Alphabet: [σ] = {1, 2, …, σ}
    String: S[1..n]
  Operations
    access(i): S[i]
    rank(α, i): number of occurrences of α in S[1..i]
    select(α, i): position of the i-th occurrence of α in S

Strings: An Example
  S = aabacccdaddabbbc
  string_access(8) = d
  string_rank(a, 8) = 3
  string_select(b, 3) = 14

Succinct Representations of Strings
  Information-theoretic minimum: n lg σ bits
  Succinct representation (Grossi et al. 2003)
    Space: n H_0 + o(n)∙lg σ bits
    Time: O(lg σ)
  There are many more results.
  The case in which σ = 2 (bit vector) is even more fundamental! (Jacobson 1989)

Applications of Strings and Bit Vectors
  Ordinal trees on n nodes
    Standard approach: 3n lg n bits
    Succinct data structures: 2n + o(n) bits (Jacobson 1989, Munro & Raman 1997, Benoit et al. 1999…)
  Full text indexes for text strings from [σ]^n
    Suffix trees can use as much as 4n lg n to 6n lg n bits!
    Succinct data structures: n lg σ + o(n lg σ) bits (Grossi et al. 2003, González and Navarro 2009…)
  Labeled trees, planar graphs, binary relations, permutations, functions, …

Our Problem: Dynamic Strings
  Motivation: in many applications, data are also updated frequently
  For strings, we also consider the following update operations:
    insert(α, i), which inserts character α between S[i-1] and S[i]
    delete(i), which deletes S[i] from S

Comparisons
                              Space (bits)                  Access, rank and select                   Insert and delete
  Gupta et al. 2007           n lg σ + lg σ∙(o(n) + O(1))   O(lg lg n)                                O(n^ε) amortized
  Mäkinen & Navarro 2008      n H_0 + o(n)∙lg σ             O(lg n lg σ)                              O(lg n lg σ)
  Lee & Park 2009             n lg σ + o(n)∙lg σ            O(lg n (lg σ / lg lg n + 1))              O(lg n (lg σ / lg lg n + 1)) amortized
  González and Navarro 2009   n H_0 + o(n)∙lg σ             O(lg n (lg σ / lg lg n + 1))              O(lg n (lg σ / lg lg n + 1))
  This paper                  n H_0 + o(n)∙lg σ             O((lg n / lg lg n)(lg σ / lg lg n + 1))   O((lg n / lg lg n)(lg σ / lg lg n + 1))

  For the special cases in which σ = polylog(n) or 2 (bit vector!), our results also improve previous results

Searchable Partial Sums
  Data
    A sequence Q of n nonnegative integers
  Operations
    sum(i): Q[1] + Q[2] + … + Q[i]
    search(x): the smallest i such that sum(i) ≥ x
    update(i, δ): Q[i] ← Q[i] + δ
  Raman et al. 2001
    Assumptions: |Q| = O(lg^ε n), |δ| ≤ lg n
    Space: O(lg^(1+ε) n) bits, with a universal table of size O(n^ε′) bits
    Operations: O(1) time

Collections of Searchable Partial Sums
  Data
    d sequences of k-bit nonnegative integers of length n each
  Operations
    sum, search, update: supported on each sequence
    insert, delete: operated simultaneously on the same positions of all the sequences, but only 0’s can be inserted or deleted
  González and Navarro 2009 (CSPSI)
  Example (three sequences of length 11)
    8  2  9  5 11  0  5 12  0  3  1
    9  0  7  3  6  0 19  0  4  2  8
    1  5  3 12  4  0  3  5  4  1  0
    sum(2, 5) = 25
    insert(6), delete(6): applied at position 6 of all three sequences

Our results on CSPSI
  Assumptions
    d = O(lg^η n)
    |δ| ≤ lg n
  Space
    O(kdn + w) bits, where w is the word size
    Buffer: O(n lg n) bits
  Time
    All operations: O(lg n / lg lg n)

Data Structures for Dynamic Strings Over a Small Alphabet of size O(lg^(1/2) n)
  Main data structure: a B-tree constructed over S
  Leaf
    Each leaf stores a superblock of at most 2L bits which encodes a substring of S (L = lg^2 n / lg lg n)
    The numbers of occurrences of each character in all the superblocks form an integer sequence
    Maintain the above sequences for all the characters in the alphabet in a CSPSI structure E
  Internal node v (lg^(1/2) n ≤ degree(v) ≤ 2 lg^(1/2) n)
    U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th child of v
    I(v): I(v)[i] = number of characters stored in the subtree rooted at the i-th child of v

Supporting Queries
  rank(α, i)
    Perform a top-down traversal with the help of I(v)’s
    Locate the superblock, j, containing S[i] with the help of U(v)’s
    Perform a sum(α, j-1) operation on E to count the number of occurrences of α in superblocks 1, 2, …, j-1
    Read superblock j in blocks of size (lg n) / 2 bits
  The support for access and select is similar

Insert, delete and deamortization
  Supporting insert and delete requires traversing and updating the B-tree and updating E
  It is however much more complicated
    Merging and splitting B-tree nodes
    Deamortization

Succinct Global Rebuilding
  A key technique for deamortizing operations on B-trees is global rebuilding (Overmars and van Leeuwen 1981)
  Global rebuilding
    Rebuild the B-tree after the number of update operations performed exceeds half the initial length of the string
    A new copy and an old copy of the B-tree: more space
    A buffer of O(n lg n) bits is required
  Succinct global rebuilding
    Only one copy of the data: no duplication
    During rebuilding, queries and updates are performed on either the new part or the old part
    No buffer required

Putting Everything Together
  Dynamic strings over an alphabet of size O(lg^(1/2) n)
    Space: n H_0 + o(n)∙lg σ bits
    Time: O(lg n / lg lg n)
  This can be extended to general alphabets using wavelet trees
    Space: n H_0 + o(n)∙lg σ bits
    Time: O((lg n / lg lg n)(lg σ / lg lg n + 1))
  When σ = polylog(n) or 2 (bit vectors)
    Space: n H_0 + o(n)∙lg σ bits
    Time: O(lg n / lg lg n)

Applications
  Dynamic text collections
    Data: a collection of text strings
    Operations
      Pattern search
      Display a substring
      Insert/delete a text string
  Compressed construction of full-text indexes
    Working space: n H_k + o(n)∙lg σ bits
    Time: O((n lg n / lg lg n)(lg σ / lg lg n + 1))

Conclusions
  We designed a succinct representation of dynamic strings that provides more efficient operations than previous results
  This structure can be directly applied to improve previous results on text indexing
  We expect our results to play an important role in the design of dynamic succinct data structures
  We expect succinct global rebuilding to be useful for the deamortization of algorithms on dynamic succinct data structures

Thank you!
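
Note on the string operations: access, rank, select, insert and delete as defined in the slides can be modeled with a plain Python list. The sketch below is a naive reference model, not the succinct structure from the talk: every operation takes linear time and the space is far from n H_0 bits. The class name NaiveDynamicString and its method layout are assumptions made for illustration only. Positions are 1-based, as in the slides.

```python
class NaiveDynamicString:
    """Naive, non-succinct reference model (hypothetical name); positions are 1-based."""

    def __init__(self, text=""):
        self.chars = list(text)

    def access(self, i):
        # S[i]
        return self.chars[i - 1]

    def rank(self, alpha, i):
        # Number of occurrences of alpha in S[1..i]
        return self.chars[:i].count(alpha)

    def select(self, alpha, i):
        # Position of the i-th occurrence of alpha in S
        seen = 0
        for pos, ch in enumerate(self.chars, start=1):
            if ch == alpha:
                seen += 1
                if seen == i:
                    return pos
        raise ValueError("fewer than i occurrences of alpha")

    def insert(self, alpha, i):
        # Insert alpha between S[i-1] and S[i]
        self.chars.insert(i - 1, alpha)

    def delete(self, i):
        # Delete S[i] from S
        del self.chars[i - 1]


s = NaiveDynamicString("aabacccdaddabbbc")
print(s.access(8), s.rank("a", 8), s.select("b", 3))  # d 3 14
```

The printed values match the "Strings: An Example" slide. A model like this is mainly useful as a test oracle when checking a faster implementation.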
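
Note on collections of searchable partial sums: the CSPSI interface can be modeled the same way, with d plain lists that are edited in lockstep. This is again a naive sketch with linear-time operations, not the structure of González and Navarro or of the talk. The class name NaiveCSPSI and the argument order (sequence index first, then position) are assumptions; the order is chosen so that sum(2, 5) reads as in the slide example. The three rows are an assumed layout of the 33 numbers in that example, split into sequences of length 11.

```python
class NaiveCSPSI:
    """Naive model of d integer sequences edited in lockstep (hypothetical name)."""

    def __init__(self, sequences):
        self.seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        # Q_j[1] + ... + Q_j[i]
        return sum(self.seqs[j - 1][:i])

    def search(self, j, x):
        # Smallest i such that sum(j, i) >= x
        total = 0
        for pos, value in enumerate(self.seqs[j - 1], start=1):
            total += value
            if total >= x:
                return pos
        raise ValueError("x exceeds the total of the sequence")

    def update(self, j, i, delta):
        # Q_j[i] <- Q_j[i] + delta
        self.seqs[j - 1][i - 1] += delta

    def insert(self, i):
        # Insert a 0 at position i of every sequence
        for q in self.seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        # Only 0's may be deleted, and from all sequences at once
        if any(q[i - 1] != 0 for q in self.seqs):
            raise ValueError("position i must hold 0 in every sequence")
        for q in self.seqs:
            del q[i - 1]


rows = [
    [8, 2, 9, 5, 11, 0, 5, 12, 0, 3, 1],
    [9, 0, 7, 3, 6, 0, 19, 0, 4, 2, 8],
    [1, 5, 3, 12, 4, 0, 3, 5, 4, 1, 0],
]
e = NaiveCSPSI(rows)
print(e.sum(2, 5))  # 25
```

In the dynamic string structure, one such sequence per character would hold the per-superblock occurrence counts, which is why insertions and deletions of 0's at the same position of all sequences are the natural update.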
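
Note on the extension to general alphabets: the slides state that the small-alphabet structure extends to general alphabets using wavelet trees. The sketch below shows only the general idea of that reduction, using a static binary wavelet tree in which a rank query on the string becomes one binary rank per tree level. It is not the construction from the talk, whose tree layout and branching are not given in the slides; the names WaveletNode and wt_rank are hypothetical, and the bit-vector rank is done by naive counting rather than by a succinct bit vector.

```python
class WaveletNode:
    """Static binary wavelet tree node (illustrative sketch, hypothetical name)."""

    def __init__(self, text, alphabet):
        self.alphabet = alphabet      # sorted list of symbols handled by this node
        self.bits = []                # 0 = symbol goes left, 1 = symbol goes right
        self.left = self.right = None
        if len(alphabet) > 1:
            mid = len(alphabet) // 2
            left_syms = set(alphabet[:mid])
            self.bits = [0 if ch in left_syms else 1 for ch in text]
            self.left = WaveletNode(
                [ch for ch in text if ch in left_syms], alphabet[:mid])
            self.right = WaveletNode(
                [ch for ch in text if ch not in left_syms], alphabet[mid:])


def wt_rank(root, alpha, i):
    """Number of occurrences of alpha among the first i characters."""
    if alpha not in root.alphabet:
        return 0
    node = root
    while len(node.alphabet) > 1:
        mid = len(node.alphabet) // 2
        bit = 0 if alpha in node.alphabet[:mid] else 1
        # Binary rank: how many of the first i symbols go to the same child
        i = node.bits[:i].count(bit)
        node = node.left if bit == 0 else node.right
    return i


text = "aabacccdaddabbbc"
root = WaveletNode(text, sorted(set(text)))
print(wt_rank(root, "a", 8))  # 3
```

Replacing the naive count with a bit vector that supports rank, and making that bit vector dynamic, is what turns this reduction into a dynamic string over a larger alphabet.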
