ENCODING NEAREST LARGER VALUES
Pat Nicholson* and Rajeev Raman**
*MPII
**University of Leicester
DÉJÀ VU: THE ENCODING APPROACH
Input data (relatively big) → preprocess w.r.t. some query → encoding (hope: much smaller)
Encoding + auxiliary data structures (should be smaller still)
Succinct data structure: minimum space possible
Query (hope: as fast as the non-succinct counterpart)
NEAREST LARGER VALUES
Support nearest larger value queries on an array A[1..n]:
• Given an index i, return the position j of the “nearest” value larger than A[i]
[Figure: an example array containing the values 10, 2, 3, 1, 9, 8, 7, 11, 5 and 4]
Two important questions:
• #1: What does “nearest” mean?
• Many possible variants of the problem:
• Unidirectional: return the index of the NLV to the left (j < i)
• Bidirectional: return the indices of the NLV to the left AND to the right
• Nondirectional: return the index of the closest NLV (minimizing |j − i|)
• #2: Are all elements in the array distinct?
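To make the three meanings of "nearest" concrete, here is a minimal Python sketch (0-based indices, "larger" meaning strictly larger). The unidirectional case uses the usual monotonic-stack scan; the right-hand answers come from the same scan on the reversed array, and the nondirectional answer is whichever of the two is closer, with equal distances broken to the right. The function names are ours, and the ordering of the demo array is a guess, since the slide's figure only tells us which values appear.

```python
from typing import List, Optional, Tuple

# 0-based indices (the slides use A[1..n]); "larger" means strictly larger.

def left_nlv(a: List[int]) -> List[Optional[int]]:
    """Unidirectional NLV: nearest j < i with a[j] > a[i], or None."""
    res: List[Optional[int]] = [None] * len(a)
    stack: List[int] = []  # candidate indices, values strictly decreasing bottom to top
    for i, x in enumerate(a):
        # Anything <= x is shadowed by x for every later index, so drop it.
        while stack and a[stack[-1]] <= x:
            stack.pop()
        res[i] = stack[-1] if stack else None
        stack.append(i)
    return res


def right_nlv(a: List[int]) -> List[Optional[int]]:
    """Nearest j > i with a[j] > a[i]: run left_nlv on the reversed array, map back."""
    n = len(a)
    rev = left_nlv(a[::-1])
    return [None if r is None else n - 1 - r for r in reversed(rev)]


def bidirectional_nlv(a: List[int]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Both answers at once: (left NLV, right NLV) for each index."""
    return list(zip(left_nlv(a), right_nlv(a)))


def nondirectional_nlv(a: List[int]) -> List[Optional[int]]:
    """Closest larger value on either side; equal distances go to the right."""
    out: List[Optional[int]] = []
    for i, (l, r) in enumerate(bidirectional_nlv(a)):
        if l is None or r is None:
            out.append(r if l is None else l)  # at most one side has an answer
        else:
            out.append(r if r - i <= i - l else l)
    return out


if __name__ == "__main__":
    a = [10, 2, 3, 1, 9, 8, 7, 11, 5, 4]  # hypothetical ordering of the slide's values
    print(left_nlv(a))            # [None, 0, 0, 2, 0, 4, 5, None, 7, 8]
    print(right_nlv(a))           # [7, 2, 4, 4, 7, 7, 7, None, None, None]
    print(nondirectional_nlv(a))  # [7, 2, 4, 4, 7, 4, 7, None, 7, 8]
```

Each scan is linear time; the nondirectional rule only compares the two one-sided answers, because the closest larger value overall must be the nearest larger value on one of the two sides.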
OVERVIEW: ENCODING NLV
| Distinct | Problem | Space | Query time | Notes |
|---|---|---|---|---|
| Yes | Unidirectional | 2n + o(n) | O(1) | Cartesian tree |
| Yes | Bidirectional | 2n + o(n) | O(1) | Cartesian tree |
| Yes | Nondirectional | ??? | ??? | |
| No | Unidirectional [Fischer et al. 2009] | 2n + o(n) | O(1) | Cartesian tree |
| No | Bidirectional [Fischer 2011] | $\log_2(3 + 2\sqrt{2})\,n + o(n) \approx 2.54n + o(n)$ | O(1) | Schröder trees (navigate CSA) |
| No | Nondirectional | ??? | ??? | |
For all these results, the space bound is optimal to within lower-order terms.
OVERVIEW: ENCODING NLV
| Distinct | Problem | Space | Query time | Notes |
|---|---|---|---|---|
| Yes | Unidirectional | 2n + o(n) | O(1) | Cartesian tree |
| Yes | Bidirectional | 2n + o(n) | O(1) | Cartesian tree |
| Yes | Nondirectional | Upper bound: < 1.9n + o(n); lower bound: > 1.3173n − O(1) | O(1) | This paper: NLV tree |
| No | Unidirectional [Fischer et al. 2009] | 2n + o(n) | O(1) | Cartesian tree |
| No | Bidirectional [Fischer 2011] | $\log_2(3 + 2\sqrt{2})\,n + o(n) \approx 2.54n + o(n)$ | O(1) | Schröder trees (navigate CSA) |
| No | Nondirectional | ??? | ??? | |
Still very open: what is the constant? $\log_2 3$? Prove it!
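A quick arithmetic check of the constants on this slide, using only Python's standard `math` module. It confirms that Fischer's constant log₂(3 + 2√2) equals 2·log₂(1 + √2) ≈ 2.543 bits per element, and that the guessed constant log₂ 3 ≈ 1.585 lies between the quoted nondirectional bounds 1.3173 and 1.9.

```python
import math

# Bidirectional NLV on non-distinct arrays [Fischer 2011]: log2(3 + 2*sqrt(2)) bits/element.
fischer = math.log2(3 + 2 * math.sqrt(2))
# (1 + sqrt 2)^2 = 3 + 2 sqrt 2, so the same constant is 2 * log2(1 + sqrt 2).
assert math.isclose(fischer, 2 * math.log2(1 + math.sqrt(2)))

# Nondirectional NLV on distinct arrays: the quoted bounds and the guessed constant.
lower, upper = 1.3173, 1.9
guess = math.log2(3)

print(f"Fischer bidirectional: {fischer:.4f} bits/element")    # 2.5431
print(f"log2(3) guess:         {guess:.4f} bits/element")      # 1.5850
print(f"guess lies between the bounds: {lower < guess < upper}")  # True
```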
BIGGER PICTURE
Encoding 1D Range Minimum Queries
• Fischer and Heun [SICOMP 2011]
• Encoding also using 2n + o(n) bits via the Cartesian tree
All-Nearest Larger Values
• Asano et al. [Mehlhorn's Festschrift 2009, WADS 2013]
• Trade-offs for computing the solutions to all NLV queries
• Berkman et al. [J. Alg 1993]
• Parallel algorithms for parenthesis matching, triangulating monotone polygons
Encoding 2D Nearest Larger Values
• Jo, Raman, and Rao [WALCOM 2015], Jayapaul et al. [IWOCA 2014]
• Encode NLV of an N × N array under the L1 metric using O(N²) bits
Encoding 2D Range Minimum Queries
• See Brodal et al. [ESA 2010, 2012, and 2013]
• Encoding RMQ for an N × M matrix requires Ω(NM log min(N, M)) bits
CARTESIAN TREES REVIEW
We can rebuild him. We have the technology.
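As a refresher, the sketch below builds a max-rooted Cartesian tree of distinct values with the standard stack algorithm, writes its shape in 2n bits (two child-present flags per node in preorder; this is one common 2n-bit code, not necessarily the one used in the talk), and then answers unidirectional NLV queries from those bits alone. It relies on the fact that the left NLV of position i is its deepest ancestor with a smaller position. All function names are hypothetical, and the recursive decoder is only meant for small inputs.

```python
from typing import List, Optional, Tuple


def cartesian_tree(a: List[int]) -> Tuple[int, List[int], List[int]]:
    """Max-rooted Cartesian tree of distinct values (stack construction).
    Returns (root, left, right); -1 marks a missing child or an empty tree."""
    n = len(a)
    left, right = [-1] * n, [-1] * n
    stack: List[int] = []  # right spine of the tree built so far
    for i in range(n):
        last = -1
        while stack and a[stack[-1]] < a[i]:
            last = stack.pop()
        left[i] = last             # the popped part of the spine becomes i's left subtree
        if stack:
            right[stack[-1]] = i   # i hangs off the remaining right spine
        stack.append(i)
    return (stack[0] if stack else -1), left, right


def encode_shape(root: int, left: List[int], right: List[int]) -> str:
    """2 bits per node in preorder: (has left child, has right child)."""
    bits: List[str] = []
    todo = [root] if root != -1 else []
    while todo:
        v = todo.pop()
        bits.append("1" if left[v] != -1 else "0")
        bits.append("1" if right[v] != -1 else "0")
        if right[v] != -1:
            todo.append(right[v])
        if left[v] != -1:
            todo.append(left[v])  # pushed last so the left subtree is emitted first
    return "".join(bits)


def left_nlv_from_shape(bits: str) -> List[Optional[int]]:
    """Left-NLV answers from the shape alone: for the node at inorder position i,
    the answer is its deepest ancestor with a smaller inorder position."""
    n = len(bits) // 2
    res: List[Optional[int]] = [None] * n
    pos = 0   # next preorder record to read
    rank = 0  # next inorder position to assign

    def visit(anc: Optional[int]) -> None:
        nonlocal pos, rank
        has_l, has_r = bits[2 * pos] == "1", bits[2 * pos + 1] == "1"
        pos += 1
        if has_l:
            visit(anc)   # the left child inherits this node's answer
        me = rank
        rank += 1
        res[me] = anc
        if has_r:
            visit(me)    # this node is the answer for its right child

    if n:
        visit(None)
    return res


if __name__ == "__main__":
    a = [10, 2, 3, 1, 9, 8, 7, 11, 5, 4]  # hypothetical example array
    bits = encode_shape(*cartesian_tree(a))
    print(len(bits), bits)          # 20 11011111000001000100
    print(left_nlv_from_shape(bits))  # [None, 0, 0, 2, 0, 4, 5, None, 7, 8]
```

The decoder never looks at the values, which is the point of the encoding approach: the 2n-bit shape is all that unidirectional queries need.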
NONDIRECTIONAL NLV TREE
Tie-breaking rule: when the nearest larger values on the left and on the right are equally distant, choose the one to the right.
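TIEBREAKING MATTERS?
| Rule / n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| To the right | 1 | 2 | 5 | 14 | 40 | 116 | 341 | 1010 | 3009 | 9012 |
| To the smaller | 1 | 2 | 5 | 14 | 42 | 126 | 383 | 1178 | 3640 | 11316 |
| To the larger | 1 | 2 | 5 | 12 | 32 | 88 | 248 | 702 | 1998 | 5696 |
Open problem: does the tie-breaking rule affect the constant factor, i.e., the value of $\lim_{n \to \infty} \frac{\log x_n}{n}$, where $x_n$ is the table entry for size $n$?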
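The table's counts can be probed by brute force for small n. This sketch assumes that a "structure" is the vector of nondirectional NLV answers of a permutation of n distinct values, and reads "to the smaller" and "to the larger" as choosing the candidate with the smaller or larger value; both readings are our assumptions, and the function names are hypothetical. For n = 1, 2, 3 it yields 1, 2, 5 under every rule; larger n can be compared against the table, although nothing here guarantees that the same definition was used there.

```python
from itertools import permutations
from typing import Optional, Tuple


def nondirectional_answers(p: Tuple[int, ...], rule: str) -> Tuple[Optional[int], ...]:
    """Nondirectional NLV answer for every index of p (distinct values) under a tie rule:
    'right', 'smaller' (candidate with the smaller value) or 'larger'."""
    n = len(p)
    out = []
    for i in range(n):
        l = next((j for j in range(i - 1, -1, -1) if p[j] > p[i]), None)
        r = next((j for j in range(i + 1, n) if p[j] > p[i]), None)
        if l is None or r is None:
            out.append(r if l is None else l)
        elif i - l != r - i:
            out.append(l if i - l < r - i else r)  # the strictly closer side wins
        elif rule == "right":
            out.append(r)
        elif rule == "smaller":
            out.append(l if p[l] < p[r] else r)
        elif rule == "larger":
            out.append(l if p[l] > p[r] else r)
        else:
            raise ValueError(f"unknown rule {rule!r}")
    return tuple(out)


def count_structures(n: int, rule: str) -> int:
    """Number of distinct answer vectors over all n! permutations (brute force)."""
    return len({nondirectional_answers(p, rule) for p in permutations(range(n))})


if __name__ == "__main__":
    for rule in ("right", "smaller", "larger"):
        print(rule, [count_structures(n, rule) for n in range(1, 7)])
```

The enumeration costs n! · O(n²), so it is only practical up to roughly n = 9.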
IDEA: COMPRESS RUNS
DIGRESSION: PATH (OR CHAIN) COMPRESSION
[Figure: a tree with its degree-two nodes, degree-one nodes, chain terminals and subtrees labelled]
If there are k deleted nodes and m chains, then store:
• Path/chain-compressed tree: ~(n − k) bits
• Bitvector marking chain terminals: $\log\binom{n-k}{m}$ bits
• Bitvector of length k with m ones (unary chain lengths): $\log\binom{k}{m}$ bits
• Bitvector of length k indicating a zig or zag for each deleted node: k bits
This works out to 2n + Θ(log n) bits
• Note: this does not support queries; it only recovers the structure
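A minimal sketch of the accounting above, assuming a binary tree stored as a dict from node id to its (left, right) children. `chain_stats` and `chain_compressed_bits` are hypothetical helpers; the formula simply adds the four listed components and drops lower-order terms. We read "~(n − k) bits" as one bit per remaining node, which is plausible because deleting the degree-one nodes leaves only nodes with zero or two children.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical representation: node id -> (left child, right child), None if absent.
Tree = Dict[int, Tuple[Optional[int], Optional[int]]]


def chain_stats(tree: Tree) -> Tuple[int, int]:
    """Return (k, m): k degree-one (deleted) nodes, grouped into m maximal chains."""
    parent: Dict[int, int] = {}
    for v, kids in tree.items():
        for c in kids:
            if c is not None:
                parent[c] = v
    unary = {v for v, (l, r) in tree.items() if (l is None) != (r is None)}
    # A chain starts at a degree-one node whose parent is missing or not degree-one.
    m = sum(1 for v in unary if parent.get(v) not in unary)
    return len(unary), m


def chain_compressed_bits(n: int, k: int, m: int) -> float:
    """Sum of the four components on the slide, ignoring lower-order terms."""
    tree_bits = n - k                           # ~1 bit per remaining node
    terminals = math.log2(math.comb(n - k, m))  # bitvector marking chain terminals
    lengths = math.log2(math.comb(k, m))        # unary chain lengths
    zigzag = k                                  # one zig/zag bit per deleted node
    return tree_bits + terminals + lengths + zigzag


if __name__ == "__main__":
    t = {0: (1, None), 1: (None, 2), 2: (None, None)}  # a zig-zag path of 3 nodes
    k, m = chain_stats(t)
    print(k, m, chain_compressed_bits(len(t), k, m))  # 2 1 4.0
```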
COMPRESSING CARTESIAN TREES W.R.T. NLVS
Lemma: Excluding chains that contain the nodes representing array elements A[1] or A[n], if a chain contains c_i deleted elements, then there are exactly c_i + 1 combinatorially distinct chains with respect to answering nearest larger value queries (breaking ties to the right).
Forget about whether each node zigs or zags; just store the number of nodes in the prefix…
THE ENCODING
If there are k deleted nodes and m chains, then store:
• Path/chain-compressed tree: ~(n − k) bits
• Bitvector marking chain terminals: $\log\binom{n-k}{m}$ bits
• Bitvector of length k with m ones (unary chain lengths): $\log\binom{k}{m}$ bits
• For each deleted chain of length c_i, store the number of nodes in its prefix
• This takes no more than $\log \prod_{i=1}^{m} (c_i + 1) \le m \log\left(\frac{k}{m} + 1\right)$ bits (since $\sum_{i=1}^{m} c_i = k$ and log is concave)
If we maximize this expression over m and k:
• It is upper bounded by 1.9198n + o(n) bits
We can improve the encoding of the chain lengths:
• Encode the multiset of lengths to its zeroth-order empirical entropy
• This improves the constant factor of the upper bound to just under 1.9 (it remains above 1.8999)
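The 1.9198 constant can be checked numerically. Writing k = κn and m = μn and replacing each log-binomial by its entropy approximation (which drops O(log n) terms), the per-element cost is (1 − κ) + (1 − κ)H(μ/(1 − κ)) + κH(μ/κ) + μ log₂(κ/μ + 1). The sketch grid-searches it under the constraints m ≤ k (every chain has a node) and m ≤ n − k (we assume each chain has its own terminal); the grid step and function names are our choices.

```python
import math


def H(p: float) -> float:
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bits_per_element(kappa: float, mu: float) -> float:
    """Leading term of (encoding size)/n with k = kappa*n deleted nodes in m = mu*n chains.
    Uses log2 C(a, b) ~ a*H(b/a), which drops O(log n) terms."""
    tree = 1 - kappa                               # compressed tree, ~(n - k) bits
    terminals = (1 - kappa) * H(mu / (1 - kappa))  # log C(n - k, m)
    lengths = kappa * H(mu / kappa)                # log C(k, m)
    prefixes = mu * math.log2(kappa / mu + 1)      # m log(k/m + 1)
    return tree + terminals + lengths + prefixes


def maximize(step: float = 0.0025):
    """Grid search over 0 < mu <= min(kappa, 1 - kappa); returns (value, kappa, mu)."""
    best = (0.0, 0.0, 0.0)
    steps = int(round(1 / step))
    for a in range(1, steps):
        kappa = a * step
        for b in range(1, steps):
            mu = b * step
            if mu > min(kappa, 1 - kappa):
                break  # mu only grows from here, so the rest of the row is infeasible
            val = bits_per_element(kappa, mu)
            if val > best[0]:
                best = (val, kappa, mu)
    return best


if __name__ == "__main__":
    value, kappa, mu = maximize()
    print(f"max ~ {value:.4f} bits/element at k ~ {kappa:.3f}n, m ~ {mu:.3f}n")
```

On this grid the maximum should come out close to 1.92, in line with the 1.9198 quoted on the slide.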
SUB-OPTIMALITY EXAMPLES
ENCODING → DATA STRUCTURE
Tree decomposition: mini-micro trees (Farzan and Munro)
• Davoodi et al. showed how to support select-inorder for binary trees
• We simply plug our compression into this framework
• We need to support two additional operations: is_chain_prefix/suffix
• Decompress fingerprints, use lookup tables: tree + inorder position
Theorem: The space bound is the same as for the encoding plus o(n) bits, and the structure supports nondirectional NLV queries in O(1) time.
LOWER BOUND SKETCH
“Computer-assisted” lower bound idea:
• Rough Idea 1: Use the computer to count the number of distinct structures on β elements for some integer β > 0. Call this value S_β.
• Rough Idea 2: Given an instance of size n, glue pieces of size β together without restricting the number of possible configurations of each piece
• However, two adjacent β-structures can interfere with each other in non-trivial ways
LOWER BOUND SKETCH
Clearer idea:
• Fix the tie-breaking rule: to the right
• S_β is the number of distinct NLV structures of size β
• R_β is the number of distinct NLV structures of size β with added restrictions
[Figure: an example array with the values 10, 2, 3, 1, 9, 8, 7, 11, 5 and 4, with ∞ at both ends]
• Break the permutation into an upper and a lower half:
• Green blocks come from the upper half, blue blocks from the lower half
• Interleave: green blocks can have S_β configurations, blue blocks R_β
• Only the maximum in each green block will “exit”
CONCLUSIONS AND OPEN PROBLEMS
For nearest larger value problems the details are crucial:
• Distinct elements?
• Definition of nearest?
• Tie-breaking rules?
We have considered encodings of nondirectional NLV
• For an array containing distinct elements, these can be encoded using less than 1.9n + o(n) bits: slightly less than the 2n + o(n) bits of the Cartesian tree
Open Problems:
• What is the optimal space bound for nondirectional NLV?
• Distinct vs. nondistinct?
• Does the tie-breaking rule affect the constant factor?
• Other formulations?
THANK YOU