Paweł Gawrychowski* and Pat Nicholson**
*University of Warsaw
**Max-Planck-Institut für Informatik
Range Queries in Arrays
• Input: an array 𝐴[1..𝑛]
• Preprocess the array to answer queries of the form
  “Given a range [𝑖, 𝑗], find _____ in the subarray 𝐴[𝑖..𝑗]”
• Where _____ is something like:
  • the index of the maximum/minimum element
  • the indices of the top-𝑘 values
  • the index of the 𝑘-th largest/smallest number
  • the maximum-sum range [𝑖′, 𝑗′] ⊆ [𝑖, 𝑗]
Encoding Range Queries in Arrays
• How much space do we need to answer these queries?
• As an example, think of range minimum queries (RMinQ):
  • If we return the value of the minimum, then we must store the array. Why?
    • Because we can ask the query [𝑖, 𝑖] for each 𝑖 ∈ [1, 𝑛]
    • This allows us to recover the entire array
  • If we return just the array index, then we can do much better:
    there is a succinct data structure that occupies 2𝑛 + 𝑜(𝑛) bits and
    answers queries in constant time. Fischer and Heun (2011)
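The value-vs-index contrast can be made concrete. Below is a minimal index-returning RMinQ sketch using a sparse table; this is emphatically not the 2𝑛 + 𝑜(𝑛)-bit structure of Fischer and Heun, just an illustration of the query interface:

```python
def build_rminq(A):
    """Index-returning range-minimum structure via a sparse table.
    Uses O(n log n) words, far from the 2n + o(n) bits of Fischer and
    Heun (2011); a sketch of the interface only, not their structure."""
    n = len(A)
    table = [list(range(n))]      # table[k][i] = argmin of A[i .. i + 2^k - 1]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        table.append([
            prev[i] if A[prev[i]] <= A[prev[i + half]] else prev[i + half]
            for i in range(n - (1 << k) + 1)
        ])
        k += 1

    def rminq(i, j):                      # 0-indexed, inclusive range [i, j]
        k = (j - i + 1).bit_length() - 1  # two overlapping 2^k-blocks cover [i, j]
        a, b = table[k][i], table[k][j - (1 << k) + 1]
        return a if A[a] <= A[b] else b

    return rminq
```

Since only indices are returned, querying every [𝑖, 𝑖] reveals nothing about the values, which is exactly what leaves room for sub-array-size encodings.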
Typical Data Structure
• Input Data (Relatively Big) → Preprocess → Data Structure
Encoding Approach
• Input Data (Relatively Big) → Preprocess w.r.t. Some Query → Encoding (Hope: much smaller)
• Auxiliary Data Structures (Should be smaller still)
• Query (Hope: as fast as the non-succinct counterpart)
• Succinct Data Structure: Minimum Space Possible
This Talk: Maximum-Sum Segments
• From Jon Bentley’s “Programming Pearls”:
  • Input: an array 𝐴[1..𝑛] containing arbitrary numbers
  • Output: the range [𝑖, 𝑗] maximizing the sum 𝐴[𝑖] + ⋯ + 𝐴[𝑗]
  • Only non-trivial if the array contains negative numbers
  • Can be solved in linear time (credited to Kadane)
• Applications:
  • Bentley [1986]: “[problem] is a toy – it was never incorporated into a system.”
  • Chen and Chao [2004]: “…plays an important role in sequence analysis.”
• We focus on the range query case:
  • Find the range [𝑖′, 𝑗′] ⊆ [𝑖, 𝑗] maximizing 𝐴[𝑖′] + ⋯ + 𝐴[𝑗′]
  • Also motivated by biological sequence analysis applications
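The linear-time algorithm credited to Kadane can be sketched as follows (a minimal 0-indexed version, not code from the talk; ties are broken toward the leftmost segment):

```python
def max_sum_segment(A):
    """Kadane's algorithm: return (i, j, s) such that the non-empty
    subarray A[i..j] (inclusive, 0-indexed) has maximum sum s."""
    best_i, best_j, best = 0, 0, A[0]
    cur_i, cur = 0, A[0]
    for k in range(1, len(A)):
        # Either extend the current segment or start fresh at k.
        if cur < 0:
            cur_i, cur = k, A[k]
        else:
            cur += A[k]
        if cur > best:
            best_i, best_j, best = cur_i, k, cur
    return best_i, best_j, best
```

The range-query version asks for exactly this answer restricted to 𝐴[𝑖..𝑗], but in constant time after preprocessing.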
Range Maximum-Sum Segment Queries
• What was known:
  • Chen and Chao [ISAAC 2004, Disc. App. Math. 2007]:
    this can be done in Θ(𝑛) words of space and Θ(1) time
• Very closely related to the range maximum problem:
  • RMSSQ → RMaxQ: pad elements with large negative numbers
  • RMinQ/RMaxQ → RMSSQ: a more complicated argument
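The padding reduction can be sketched as follows; the sentinel value and the index mapping are assumptions (the talk does not spell them out), and a brute-force maximum-sum scan stands in for an actual RMSSQ structure:

```python
def rmaxq_via_rmssq(A, i, j):
    """Sketch of answering RMaxQ through maximum-sum segments, for a
    1-indexed query range [i, j]: interleave A with a large negative
    sentinel, so that every maximum-sum segment of the padded array
    within the mapped range is a single original element."""
    big = sum(abs(v) for v in A) + 1
    B = [x for v in A for x in (-big, v)]   # A[k] sits at B[2k-1] (0-indexed B)
    lo, hi = 2 * i - 1, 2 * j - 1           # mapped range B[lo..hi]
    # Brute-force maximum-sum segment in place of a real RMSSQ structure.
    best, seg = None, None
    for a in range(lo, hi + 1):
        s = 0
        for b in range(a, hi + 1):
            s += B[b]
            if best is None or s > best:
                best, seg = s, (a, b)
    a, b = seg
    assert a == b and a % 2 == 1            # a single original element
    return (a + 1) // 2                     # back to a 1-indexed A position
```

Any segment containing a sentinel sums to strictly less than the best single element, so the returned segment pinpoints the maximum.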
Range Maximum-Sum Segment Queries
• What was not known:
  • Is there an efficient encoding structure for this problem?
  • That is: can we beat Θ(𝑛) words?
Range Maximum-Sum Segment Queries
• Our main results:
  I. We can encode these queries using Θ(𝑛) bits (rest of this talk)
  II. A space lower bound of 1.89113𝑛 bits (an enumeration argument using methods from: …)
  III. Application to computing 𝑘-covers (𝑘 disjoint subranges that achieve the maximum sum)
• Csűrös: “The problem arises in DNA and protein segmentation, and in postprocessing of sequence alignments.”
Main Idea: Θ(𝑛)-word solution
• Define an array 𝐶 consisting of the partial sums of 𝐴
• Imagine shooting a ray from each 𝐶[𝑖] to the left
• Now find the minimum in this range
• Define another array 𝑃 storing these minima: 𝑃[𝑖] is the index of the minimum hit by the ray from 𝐶[𝑖]
[Figure: the rays and the resulting minima 𝑃[𝑖] on an example prefix-sum array]
Candidate Pairs
• We call each pair (𝑃[𝑖], 𝑖) a candidate
• We define (yet another) array 𝐷 as follows:
  • 𝐷[𝑖] is the score of the candidate (𝑃[𝑖], 𝑖)
  • That is: the sum within the range [𝑃[𝑖] + 1, 𝑖]
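Assuming (from the figures) that each ray stops at the first prefix-sum value larger than 𝐶[𝑖], and that 𝑃[𝑖] is the position of the minimum within the ray's reach, the three arrays can be built as follows (a quadratic brute force, for illustration only):

```python
def build_candidates(A):
    """Build the talk's arrays for a 1-indexed A of length n.
    C[0..n] holds prefix sums with sentinel C[0] = 0.  For each i >= 1,
    a ray shot left from C[i] stops at the nearest j < i with C[j] > C[i]
    (assumed ray rule); P[i] is the position of the minimum prefix sum in
    the ray's reach, and D[i] = C[i] - C[P[i]] is the candidate's score,
    i.e. the sum of A[P[i]+1 .. i]."""
    n = len(A)
    C = [0] * (n + 1)
    for i in range(1, n + 1):
        C[i] = C[i - 1] + A[i - 1]
    P = [0] * (n + 1)
    D = [0] * (n + 1)
    for i in range(1, n + 1):
        j = i - 1
        while j > 0 and C[j] <= C[i]:   # ray continues over values <= C[i]
            j -= 1
        lo = min(range(j, i), key=lambda t: C[t])  # min prefix sum in [j, i)
        P[i], D[i] = lo, C[i] - C[lo]
    return C, P, D
```

With a left-to-right stack the same arrays are computable in linear time; the scan above just keeps the sketch short.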
What Do They Store?
1) The array 𝐶 (cumulative sums): Θ(𝑛) words
2) The array 𝑃 (candidate partners): Θ(𝑛) words
3) Range min (RMinQ) structure on 𝐶: 2𝑛 + 𝑜(𝑛) bits
4) Range max (RMaxQ) structure on 𝐷 (candidate scores): 2𝑛 + 𝑜(𝑛) bits
Main Idea: Θ(𝑛)-word solution
• How to answer a query: the easy case
  • Let 𝑥 = RMaxQ(𝐷, 𝑖, 𝑗), and examine the candidate pair (𝑃[𝑥], 𝑥)
  • If 𝑃[𝑥] + 1 is in the query range, return [𝑃[𝑥] + 1, 𝑥]
• How to answer a query: the not-so-easy case
  • Let 𝑥 = RMaxQ(𝐷, 𝑖, 𝑗)… this time 𝑃[𝑥] + 1 ∉ [𝑖, 𝑗]
  • Let 𝑑 = RMinQ(𝐶, 𝑖, 𝑥) and 𝑦 = RMaxQ(𝐷, 𝑥 + 1, 𝑗)
  • Return the range with the greater sum: [𝑑 + 1, 𝑥] or [𝑃[𝑦] + 1, 𝑦]
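The two cases can be sketched end to end, with plain scans standing in for the constant-time RMinQ/RMaxQ structures; the exact index convention for RMinQ on 𝐶 and the tie-breaking rules are glossed in the slides and are assumptions here:

```python
def rmssq(C, P, D, i, j):
    """Query procedure from the slides for a 1-indexed range [i, j].
    C, P, D are as in the preceding slides (C has sentinel C[0] = 0).
    Returns (a, b) with segment sum C[b] - C[a-1].  Scans return the
    leftmost optimum; index conventions are assumptions."""
    x = max(range(i, j + 1), key=lambda k: D[k])        # x = RMaxQ(D, i, j)
    if P[x] + 1 >= i:                                   # easy case
        return P[x] + 1, x
    # Not-so-easy case: best segment ending at x starts after the range min.
    d = min(range(i - 1, x), key=lambda k: C[k])        # d = RMinQ(C, i-1, x-1)
    if x == j:
        return d + 1, x
    y = max(range(x + 1, j + 1), key=lambda k: D[k])    # y = RMaxQ(D, x+1, j)
    # Return whichever of [d+1, x] and [P[y]+1, y] has the greater sum.
    if D[y] > C[x] - C[d]:
        return P[y] + 1, y
    return d + 1, x
```

The example arrays in the test below were computed by hand for 𝐴 = [3, −5, 2, 2, −1, 4, −3, 2] under the ray rule assumed earlier.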
Reducing the Space
• What are the bottlenecks in the data structure?
  I. Storing the array 𝑃
    • We need to store the candidate pairs
  II. Storing the array 𝐶
    • We must compare scores of candidates in the not-so-easy case
Dealing with 𝑃: Bottleneck I
Nested Is Good
• Imagine indices as 𝑛 vertices, candidate pairs as edges
• We can represent an 𝑛-edge nested graph in 4𝑛 bits
  • Also known as a one-page or outerplanar graph
• Navigation is efficient: select vertices, follow edges, etc.
  • Jacobson (1989), Munro and Raman (2001); 4𝑛 + 𝑜(𝑛) bits
[Figure: an 8-vertex example; each vertex is labeled with its parenthesis string: 1 → ()((, 2 → ())(, 3 → ()(, 4 → ())), 5 → ())((, 6 → ()(, 7 → ())), 8 → ())]
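A minimal sketch of the balanced-parentheses idea (the per-vertex layout in the talk's example may differ in detail):

```python
def encode_nested_pairs(n, pairs):
    """Balanced-parentheses encoding of a nested (one-page) set of pairs
    over vertices 1..n: every vertex contributes a '()' marker, preceded
    by one ')' per pair closing at it and followed by one '(' per pair
    opening at it.  2n + 2m characters in total, i.e. 4n bits when the
    number of pairs m equals n.  A sketch of the idea only."""
    opens = [0] * (n + 1)
    closes = [0] * (n + 1)
    for u, v in pairs:                      # each pair (u, v) with u < v
        opens[u] += 1
        closes[v] += 1
    return ''.join(')' * closes[v] + '()' + '(' * opens[v]
                   for v in range(1, n + 1))
```

Because the pairs nest, the parentheses match properly, and rank/select machinery on the resulting bit string supports the navigation operations mentioned above.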
Dealing with 𝐶: Bottleneck II
• We call the point ℓ the left sibling of (𝑃[𝑦], 𝑦)
• Knowing ℓ, we can handle the not-so-easy case
[Figure: ℓ, 𝑃[𝑦], and 𝑦 marked on the prefix-sum array]
Recall The Query Algorithm
• Return the range with the greater sum: [𝑑 + 1, 𝑥] or [𝑃[𝑦] + 1, 𝑦]
  • Case: the left sibling of (𝑃[𝑦], 𝑦) is < 𝑑
  • Case: the left sibling of (𝑃[𝑦], 𝑦) is ∈ [𝑑, 𝑥]
  • The left sibling of (𝑃[𝑦], 𝑦) can’t be anywhere else
[Figure: 𝑑, 𝑥, 𝑃[𝑦], and 𝑦 marked on the array in each case]
Dealing with 𝐶: Bottleneck II
• Problem: we cannot store the left siblings explicitly
• Idea: try to find something that is nested
• Solution: the pairs (ℓ, 𝑃[𝑦]) are nested
[Figure: ℓ, 𝑃[𝑦], and 𝑦; the left-sibling pairs nest]
What Do We Store?
1) The graph representing candidates: 4𝑛 + 𝑜(𝑛) bits
2) The graph representing left siblings: 4𝑛 + 𝑜(𝑛) bits
3) Range min (RMinQ) structure on 𝐶: 2𝑛 + 𝑜(𝑛) bits
4) Range max (RMaxQ) structure on 𝐷: 2𝑛 + 𝑜(𝑛) bits
Grand total: 12𝑛 + 𝑜(𝑛) bits… (can be reduced slightly with more tricks)