ENCODING NEAREST LARGER VALUES
Pat Nicholson (MPII) and Rajeev Raman (University of Leicester)

DÉJÀ VU: THE ENCODING APPROACH
- Input data (relatively big)
- Preprocess w.r.t. some query to produce an encoding (hope: much smaller)
- Auxiliary data structures (should be smaller still)
- Succinct data structure: minimum space possible
- Query (hope: as fast as the non-succinct counterpart)

NEAREST LARGER VALUES
Support nearest larger value (NLV) queries on an array A[1..n]:
- Given an index i, return the position j of the "nearest" value larger than A[i].
Example array: 10 2 3 1 9 8 7 11 5 4

Two important questions:
- #1: What does "nearest" mean? There are many possible variants of the problem:
  - Unidirectional: return the index of the NLV to the left (j < i)
  - Bidirectional: return the indices of the NLV to the left AND to the right
  - Nondirectional: return the index of the closest NLV (minimizing |j − i|)
- #2: Are all elements in the array distinct?

OVERVIEW: ENCODING NLV
Distinct | Problem                              | Space                                                   | Query | Notes
Yes      | Unidirectional                       | 2n + o(n)                                               | O(1)  | Cartesian tree
Yes      | Bidirectional                        | 2n + o(n)                                               | O(1)  | Cartesian tree
Yes      | Nondirectional                       | < 1.9n + o(n) (upper), > 1.3173n − O(1) (lower)         | O(1)  | This paper: NLV tree
No       | Unidirectional [Fischer et al. 2009] | 2n + o(n)                                               | O(1)  | Cartesian tree
No       | Bidirectional [Fischer 2011]         | $\log_2(3 + 2\sqrt{2})\,n + o(n) \approx 2.54n + o(n)$  | O(1)  | Schröder trees (navigate CSA)
No       | Nondirectional                       | ???                                                     | ???   |

For all the previously known results above, the space bound is optimal to within lower-order terms.
Still very open: what is the constant? Is it log₂ 3? Prove it!

BIGGER PICTURE
Encoding 1D range minimum queries
- Fischer and Heun [SICOMP 2011]
- Encoding also uses 2n + o(n) bits, via the Cartesian tree
All-nearest larger values
- Asano et al. [Mehlhorn's Festschrift 2009, WADS 2013]: trade-offs for computing the solutions to all NLV queries
- Berkman et al. [J. Alg. 1993]: parallel algorithms for parenthesis matching and triangulating monotone polygons
Encoding 2D nearest larger values
- Jo, Raman, and Rao [WALCOM 2015]; Jayapaul et al. [IWOCA 2014]
- Encode NLV of an N × N array under the L1 metric using O(N²) bits
Encoding 2D range minimum queries
- See Brodal et al. [ESA 2010, 2012, and 2013]
- Encoding RMQ for an N × M matrix requires Ω(NM log min(N, M)) bits

CARTESIAN TREES REVIEW
We can rebuild him. We have the technology.

NONDIRECTIONAL NLV TREE
Tie-breaking rule: break ties by choosing the one to the right.

TIEBREAKING MATTERS?
Number x_n of distinct nondirectional NLV structures on n distinct elements, by tie-breaking rule:
n               1  2  3   4   5    6    7     8     9     10
To the right    1  2  5  14  40  116  341  1010  3009   9012
To the smaller  1  2  5  14  42  126  383  1178  3640  11316
To the larger   1  2  5  12  32   88  248   702  1998   5696

Open problem: does the tie-breaking rule affect the constant factor, i.e., the value of $\lim_{n \to \infty} \frac{\log x_n}{n}$?

IDEA: COMPRESS RUNS

DIGRESSION: PATH (OR CHAIN) COMPRESSION
(Figure: a binary tree with degree-two nodes, degree-one nodes, chain terminals, and subtrees marked.)
If there are k deleted nodes and m chains, then store:
- Path/chain-compressed tree: ~n − k bits
- Bitvector marking chain terminals: $\log\binom{n-k}{m}$ bits
- Bitvector of length k with m ones (unary chain lengths): $\log\binom{k}{m}$ bits
- Bitvector of length k indicating a zig or zag for each deleted node: k bits
This works out to 2n + Θ(log n) bits.
Note: this doesn't support queries; it just recovers the structure.

COMPRESSING CARTESIAN TREES W.R.T. NLVS
Lemma: Excluding chains containing the nodes that represent array elements A[1] or A[n], if a chain contains c_i deleted elements, then there are exactly c_i + 1 combinatorially distinct chains with respect to answering nearest larger value queries (breaking ties to the right).
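The NLV definitions and the TIEBREAKING MATTERS? table can be checked on small inputs with a brute-force sketch like the one below. It is not the NLV tree from the slides: the function names, the 0-indexed convention, and the use of None for "no larger value" are illustrative assumptions. Each nondirectional query is answered by scanning outward from i, equidistant candidates are resolved by one of the three rules, and distinct answer tuples are counted over all permutations of size n, so it is only practical for small n.

```python
from itertools import permutations

# Hypothetical helper names; arrays are 0-indexed, None marks "no larger value".
TIE_RULES = ("right", "smaller", "larger")


def nearest_larger(a, i, tie="right"):
    """Nondirectional NLV of a[i]: index j minimizing |j - i| with a[j] > a[i].

    Returns None when a[i] is the maximum. If the nearest larger values on
    both sides are equidistant, `tie` picks the right one, the one with the
    smaller value, or the one with the larger value.
    """
    if tie not in TIE_RULES:
        raise ValueError(f"unknown tie rule: {tie!r}")
    n = len(a)
    for d in range(1, n):
        left, right = i - d, i + d
        left_ok = left >= 0 and a[left] > a[i]
        right_ok = right < n and a[right] > a[i]
        if left_ok and right_ok:  # equidistant candidates: apply the rule
            if tie == "right":
                return right
            if tie == "smaller":
                return left if a[left] < a[right] else right
            return left if a[left] > a[right] else right  # tie == "larger"
        if left_ok:
            return left
        if right_ok:
            return right
    return None


def nlv_structure(a, tie="right"):
    """Answers to all n nondirectional NLV queries on a (0-indexed)."""
    return tuple(nearest_larger(a, i, tie) for i in range(len(a)))


def count_structures(n, tie="right"):
    """Number of distinct NLV answer tuples over all permutations of 0..n-1."""
    return len({nlv_structure(p, tie) for p in permutations(range(n))})


if __name__ == "__main__":
    for rule in TIE_RULES:
        print(rule, [count_structures(n, rule) for n in range(1, 8)])
```

For ties to the right this yields 1, 2, 5, 14 for n = 1..4, and for ties to the larger value 1, 2, 5, 12, matching the first columns of the table. On the example array 10 2 3 1 9 8 7 11 5 4, nlv_structure returns (7, 2, 4, 4, 7, 4, 7, None, 7, 8) in 0-indexed positions; for instance, position 1 (value 2) has larger neighbours 10 and 3 at distance 1, and the tie goes to the right.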
So forget about whether each chain node zigs or zags; for each chain, just store the number of nodes in its prefix.

THE ENCODING
If there are k deleted nodes and m chains, then store:
- Path/chain-compressed tree: ~n − k bits
- Bitvector marking chain terminals: $\log\binom{n-k}{m}$ bits
- Bitvector of length k with m ones (unary chain lengths): $\log\binom{k}{m}$ bits
- For each deleted chain of length c_i, the number of nodes in its prefix. Since $\sum_{i=1}^{m} c_i = k$, by concavity of the logarithm this takes no more than $\log\prod_{i=1}^{m}(c_i + 1) \le m\log\left(\frac{k}{m} + 1\right)$ bits.
Maximizing this expression over m and k gives an upper bound of 1.9198n + o(n) bits.
We can improve the encoding of the chain lengths:
- Encode the multiset of lengths to its zeroth-order empirical entropy.
- This improves the upper-bound constant factor to about 1.9 (> 1.8999).

SUB-OPTIMALITY EXAMPLES

ENCODING → DATA STRUCTURE
- Tree decomposition: mini-micro trees (Farzan and Munro)
- Davoodi et al. showed how to support select-inorder for binary trees
- We simply plug our compression into this framework
- Need to support two additional operations: is_chain_prefix/suffix
- Decompress fingerprints and use lookup tables (tree + inorder position)
Theorem: The space bound is the same as the encoding plus o(n) bits, and nondirectional NLV queries are supported in O(1) time.

LOWER BOUND SKETCH
"Computer-assisted" lower bound idea:
- Rough idea 1: use the computer to count the number of distinct structures on β elements, for some integer β > 0. Call this value S_β.
- Rough idea 2: given an instance of size n, glue pieces of size β together without restricting the number of possible configurations of each piece.
- Problem: two adjacent β-structures can obviously interfere with each other in non-trivial ways.

LOWER BOUND SKETCH (CONTINUED)
Clearer idea:
- Fix the tie-breaking rule: to the right
- S_β is the number of distinct β-sized NLV structures
- R_β is the number of distinct β-sized NLV structures with added restrictions
(Figure: ∞ 10 2 3 1 9 8 7 11 5 4 ∞)
- Break the permutation into an upper and a lower half: green blocks come from the upper half, blue blocks from the lower half
- Interleave: green blocks can have S_β configurations, blue blocks R_β
- Only the maximum in each green block will "exit"

CONCLUSIONS AND OPEN PROBLEMS
For nearest larger value problems, the details are crucial:
- Distinct elements?
- Definition of nearest?
- Tie-breaking rules?
We have considered encodings of nondirectional NLV:
- For an array containing distinct elements, these can be encoded using less than 1.9n + o(n) bits: slightly less than the 2n + o(n) bits of the Cartesian tree.
Open problems:
- What is the optimal space bound for nondirectional NLV?
- Distinct vs. non-distinct?
- Does the tie-breaking rule affect the constant factor?
- Other formulations?

THANK YOU
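The 1.9198n + o(n) figure on THE ENCODING slide comes from maximizing the encoding size over k and m. The Python sketch below is a rough numerical reproduction under assumptions that are not stated on the slides: each $\log\binom{N}{M}$ term is replaced by its entropy approximation $N\,H(M/N)$, all o(n) terms are dropped, and the only constraints imposed are m ≤ k and m ≤ n − k. It grid-searches over κ = k/n and μ = m/n; the function names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bits_per_element(kappa: float, mu: float) -> float:
    """Approximate encoding size / n for k = kappa*n deleted nodes, m = mu*n chains.

    Assumption: log2 C(N, M) ~ N * H(M/N); o(n) terms are dropped.
    """
    tree = 1 - kappa                                # chain-compressed tree, ~n - k bits
    terminals = (1 - kappa) * h2(mu / (1 - kappa))  # log C(n - k, m), chain terminals
    lengths = kappa * h2(mu / kappa)                # log C(k, m), unary chain lengths
    prefixes = mu * math.log2(kappa / mu + 1)       # m log(k/m + 1), prefix counts
    return tree + terminals + lengths + prefixes


def maximize(step: float = 0.005):
    """Grid search over (kappa, mu) subject to the assumed m <= k and m <= n - k."""
    steps = round(1 / step)
    best = (0.0, None, None)
    for a in range(1, steps):
        kappa = a * step
        for b in range(1, steps):
            mu = b * step
            if mu > kappa or mu > 1 - kappa:
                continue
            value = bits_per_element(kappa, mu)
            if value > best[0]:
                best = (value, kappa, mu)
    return best


if __name__ == "__main__":
    value, kappa, mu = maximize()
    print(f"max ~ {value:.4f} bits/element at k ~ {kappa:.3f}n, m ~ {mu:.3f}n")
```

Under these assumptions the grid maximum comes out close to 1.92 bits per element, consistent with the slide's 1.9198n + o(n); a finer grid or a continuous optimizer would be needed to pin down the remaining digits. The sharper bound of about 1.9 from entropy-coding the multiset of chain lengths is not modelled here.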
