Paweł Gawrychowski* and Pat Nicholson**
*University of Warsaw
**Max-Planck-Institut für Informatik

Range Queries in Arrays
• Input: an array A[1..n]
• Preprocess the array to answer queries of the form "Given a range [i, j], find _____ in the subarray A[i..j]"
• Where _____ is something like:
    • the index of the maximum/minimum element
    • the indices of the top-k values
    • the index of the k-th largest/smallest number
    • the maximum-sum range [i′, j′] ⊆ [i, j]

Encoding Range Queries in Arrays
• How much space do we need to answer these queries?
• As an example, think of range minimum queries (RMinQ):
    • If we return the value of the minimum, then we must store the array. Why?
        • Because we can ask the query [i, i] for each i ∈ [1, n]
        • This allows us to recover the entire array
    • If we return just the array index, then we can do much better. There is a succinct data structure that occupies 2n + o(n) bits and answers queries in constant time: Fischer and Heun (2011)

Typical Data Structure
[Diagram: Input Data (relatively big) → Preprocess → Data Structure]

Encoding Approach
[Diagram: Input Data (relatively big) → Preprocess w.r.t. some query → Encoding (hope: much smaller). Encoding plus Auxiliary Data Structures (should be smaller still) form the Succinct Data Structure (minimum space possible) → Query (hope: as fast as the non-succinct counterpart)]

This Talk: Maximum-Sum Segments
• From Jon Bentley's "Programming Pearls":
    • Input: an array A[1..n] containing arbitrary numbers
    • Output: the range [i, j] s.t. $\sum_{k=i}^{j} A[k]$ is maximized
    • Only non-trivial if the array contains negative numbers
    • Can be solved in linear time (credited to Kadane)
• Applications:
    • Bentley [1986]: "[problem] is a toy – it was never incorporated into a system."
    • Chen and Chao [2004]: "…plays an important role in sequence analysis."
• We focus on the range query case:
    • Find the range [i′, j′] ⊆ [i, j] s.t. $\sum_{k=i'}^{j'} A[k]$ is maximized
    • Also motivated by biological sequence analytics applications

Range Maximum-Sum Segment Queries
• What was known:
    • Chen and Chao [ISAAC 2004, Disc. App. Math. 2007]
    • This can be done in Θ(n) words of space and Θ(1) time
    • Very closely related to the range maximum problem:
        • RMSSQ → RMaxQ: pad elements with large negative numbers
        • RMinQ/RMaxQ → RMSSQ: more complicated argument
• What was not known:
    • Is there an efficient encoding structure for this problem?
    • That is: can we beat Θ(n) words?
• Our main results:
    I. We can encode these queries using Θ(n) bits (Rest of this talk)
    II. A space lower bound of 1.89113n bits (Enumeration argument using methods from: )
    III. Application to computing k-covers (k disjoint subranges that achieve the maximum sum)
        Csűrös: "The problem arises in DNA and protein segmentation, and in postprocessing of sequence alignments."

Main Idea: Θ(n) word solution
• Define an array C consisting of the partial sums of A
• Imagine shooting a ray from each C[i] to the left
• Now find the minimum in this range
• Define another array P storing these minima
[Figure labels: i, P[i]]

Candidate Pairs
• We call each pair (P[i], i) a candidate
• We define (yet another) array D as follows:
    • D[i] is the score of the candidate (P[i], i)
    • That is: the sum within the range [P[i] + 1, i]

What Do They Store?
1) The array C: Θ(n) words (Cumulative Sums)
2) The array P: Θ(n) words (Candidate partners)
3) Range min (RMinQ) structure on C: 2n + o(n) bits
4) Range max (RMaxQ) structure on D: 2n + o(n) bits (Candidate Scores)

Main Idea: Θ(n) word solution
• How to answer a query: the easy case
    • Let x = RMaxQ(D, i, j), and examine the candidate pair (P[x], x)
    • If P[x] + 1 is in the query range, return [P[x] + 1, x]
• How to answer a query: the not so easy case
    • Let x = RMaxQ(D, i, j)… this time P[x] + 1 ∉ [i, j]
    • Let t = RMinQ(C, i, x) and y = RMaxQ(D, x + 1, j)
    • Return the greater sum: [t + 1, x] or [P[y] + 1, y]

Reducing the Space
• What are the bottlenecks in the data structure?
    I. Storing the array P
        • We need to store the candidate pairs
    II. Storing the array C
        • We must compare scores of candidates in the not so easy case

Dealing with P: Bottleneck I
Nested Is Good
• Imagine indices as n vertices, candidate pairs as edges
• We can represent an n-edge nested graph in 4n bits
• Also known as a one-page or outerplanar graph
• Navigation is efficient: select vertices, follow edges, etc.
• Jacobson (1989), Munro and Raman (2001); 4n + o(n) bits
[Figure: parenthesis encoding of a nested graph on 8 vertices. 1: ()((   2: ())(   3: ()(   4: ()))   5: ())((   6: ()(   7: ()))   8: ())]

Dealing with C: Bottleneck II
[Figure labels: y, ℓ, P[y]]
• We call the point ℓ the left sibling of (P[y], y)
• Knowing ℓ, we can handle the not so easy case.

Recall The Query Algorithm
• Return the greater sum: [t + 1, x] or [P[y] + 1, y]
    • If the left sibling of (P[y], y) is < t
    • If the left sibling of (P[y], y) is ∈ [t, x]
    • Left sibling of (P[y], y) can't be here [region marked in the figure]

Dealing with C: Bottleneck II
• Problem: cannot store the left siblings explicitly
• Idea: try to find something that is nested
• Solution: the pairs (ℓ, P[y]) are nested

What Do We Store?
1) The graph representing candidates: 4n + o(n) bits
2) The graph representing left siblings: 4n + o(n) bits
3) Range min (RMinQ) structure on C: 2n + o(n) bits
4) Range max (RMaxQ) structure on D: 2n + o(n) bits
Grand total: 12n + o(n) bits… (can be reduced slightly with more tricks)
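
Sketch: the Θ(n)-word query procedure in code
The query steps from the "Main Idea" slides (easy case and not so easy case) can be tried out with a small Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names build_candidates and rmssq are hypothetical, plain linear scans stand in for the constant-time RMinQ/RMaxQ structures, the prefix-sum array is taken as C[0..n] with C[0] = 0 (so the minimum for t is searched over C[i-1..x-1]), the ray stops at the first prefix sum that is at least C[b], and a candidate whose ray hits immediately falls back to P[b] = b - 1 (the one-element segment).

```python
from itertools import accumulate


def build_candidates(A):
    """Precompute C (prefix sums), P (candidate partners), D (candidate scores).

    Positions of A are treated as 1-based; C has n + 1 entries with C[0] = 0.
    Assumption of this sketch: quadratic-time construction, chosen for clarity.
    """
    n = len(A)
    C = [0] + list(accumulate(A))      # C[k] = A[1] + ... + A[k]
    P = [0] * (n + 1)                  # P[0] is unused
    D = [float("-inf")] * (n + 1)      # D[0] is unused
    for b in range(1, n + 1):
        lo = b - 1                     # fallback: the one-element segment [b, b]
        q = b - 1
        # Shoot the ray to the left while prefix sums stay strictly below C[b].
        while q >= 0 and C[q] < C[b]:
            lo = q
            q -= 1
        P[b] = min(range(lo, b), key=C.__getitem__)   # minimum under the ray
        D[b] = C[b] - C[P[b]]                         # sum of A[P[b]+1 .. b]
    return C, P, D


def rmssq(C, P, D, i, j):
    """Return (a, b), i <= a <= b <= j, maximizing A[a] + ... + A[b]."""
    x = max(range(i, j + 1), key=D.__getitem__)       # stand-in for RMaxQ(D, i, j)
    if P[x] + 1 >= i:                                 # easy case
        return P[x] + 1, x
    # Not so easy case: the partner of x lies outside the query range.
    t = min(range(i - 1, x), key=C.__getitem__)       # stand-in for RMinQ on C
    best = (t + 1, x)
    if x < j:
        y = max(range(x + 1, j + 1), key=D.__getitem__)
        if D[y] > C[x] - C[t]:                        # compare the two scores
            best = (P[y] + 1, y)
    return best


A = [2, -3, 4, -1, 2, -5, 3]
C, P, D = build_candidates(A)
print(rmssq(C, P, D, 1, 7))   # (3, 5): easy case, sum 4 - 1 + 2 = 5
print(rmssq(C, P, D, 4, 7))   # (7, 7): not so easy case, sum 3
```

The comparison D[y] > C[x] - C[t] is exactly the step that needs the array C, which is why the slides treat C as Bottleneck II: the encoding replaces this explicit score comparison with the left-sibling test.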