Space Efficient Data Structures for Dynamic Orthogonal Range Counting
Meng He and J. Ian Munro
University of Waterloo

Dynamic Orthogonal Range Counting
- A fundamental geometric query problem
- Definitions
  - Data sets: a set P of n points in the plane
  - Query: given an axis-aligned query rectangle R, compute the number of points in P∩R
  - Update: insertion or deletion of a point
- Applications
  - Geometric data processing (GIS, CAD)
  - Databases
- Example
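
As a minimal illustration of the definitions above, here is a brute-force baseline in Python. It is only a sketch of what the query and updates mean, not the structure presented in the talk: it scans every point, so a query costs O(n) time. The function name `range_count`, the closed-rectangle convention, and the sample points are assumptions made for this example.

```python
def range_count(points, x1, x2, y1, y2):
    """Count points (x, y) with x1 <= x <= x2 and y1 <= y <= y2 by scanning all of P."""
    return sum(1 for x, y in points if x1 <= x <= x2 and y1 <= y <= y2)


# Hypothetical data set P (made up for illustration).
P = [(1, 1), (2, 5), (4, 3), (6, 2)]
P.append((3, 4))   # update: insert a point
P.remove((1, 1))   # update: delete a point
print(range_count(P, 2, 6, 2, 5))
```

Any faster structure has to return the same counts as this baseline, which makes it a convenient reference when testing.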

Classic Solutions and Our Result

                   Space   Query                   Update
Chazelle (1988)    O(n)    O(lg n)                 -
JáJá (2004)*       O(n)    O(lg n / lglg n)        -
Chazelle (1988)    O(n)    O(lg^2 n)               O(lg^2 n)
Nekrich (2009)     O(n)    O((lg n / lglg n)^2)    O(lg^{4+ε} n) (0<ε<1)
Our result         O(n)    O((lg n / lglg n)^2)    O((lg n / lglg n)^2)

* For integer coordinates.

- Matches the lower bound under the group model, Pătraşcu (2007)

Background: Succinct Data Structures
- What are succinct data structures (Jacobson 1989)
  - Representing data structures using ideally information-theoretic minimum space
  - Supporting efficient navigational operations
- Why succinct data structures
  - Large data sets in modern applications: textual, genomic, spatial or geometric
- A novel and unusual way of using succinct data structures (this paper)
  - Matching the storage cost of standard data structures
  - Improving the time efficiency

Dynamic Range Sum
- Data
  - A 2D array A[1..r, 1..c] of numbers
- Operations
  - range_sum(i1, j1, i2, j2): the sum of the numbers in A[i1..i2, j1..j2]
  - modify(i, j, δ): A[i, j] ← A[i, j] + δ
  - insert(j): insert a 0 between A[i, j-1] and A[i, j] for i = 1, 2, …, r
  - delete(j): delete A[i, j] for i = 1, 2, …, r; to perform this, A[i, j] must be 0 for all i
- Restrictions on r, c and δ, and on the operations supported, may apply
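
The four operations can be pinned down with a naive reference implementation. The sketch below is not the data structure from the talk: it stores the array as a list of rows and spends time proportional to the range size on every query. The class name `NaiveRangeSum` and the 1-based index convention are assumptions chosen to mirror the notation above.

```python
class NaiveRangeSum:
    """Slow reference version of the dynamic range sum interface (1-based indices)."""

    def __init__(self, rows):
        self.a = [list(row) for row in rows]

    def range_sum(self, i1, j1, i2, j2):
        # Sum of A[i1..i2, j1..j2], both ranges inclusive.
        return sum(self.a[i][j]
                   for i in range(i1 - 1, i2)
                   for j in range(j1 - 1, j2))

    def modify(self, i, j, delta):
        self.a[i - 1][j - 1] += delta

    def insert(self, j):
        # A new all-zero column becomes column j; the old column j moves to j + 1.
        for row in self.a:
            row.insert(j - 1, 0)

    def delete(self, j):
        # Only an all-zero column may be deleted.
        if any(row[j - 1] != 0 for row in self.a):
            raise ValueError("column must contain only zeros")
        for row in self.a:
            del row[j - 1]
```

A usage pattern in the same spirit as the example slide: call `insert(j)`, then `modify(i, j, δ)` to place a value in the new column, and undo it with `modify(i, j, -δ)` followed by `delete(j)`.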

Dynamic Range Sum: An Example
[Figure: example 2D array A, shown before and after the updates]
range_sum(2, 3, 3, 6) = 25
insert(6), modify(2, 6, 5)
range_sum(2, 3, 3, 7) = 30
modify(2, 6, -5), delete(6)

Dynamic Range Sum in a Small 2D Array
- Assumptions and restrictions
  - Word size w: Ω(lg n)
  - Each number: nonnegative, O(lg n) bits
  - rc = O(lg^λ n), 0 < λ < 1
  - modify(i, j, δ): |δ| ≤ lg n
  - insert and delete: not supported
- Our solution
  - Space: O(lg^{1+λ} n) bits, with an o(n)-bit universal table
  - Time: modify and range_sum in O(1) time
  - Generalization of the 1D array version (Raman et al. 2001)
  - Deamortization is interesting

Range Sum in a Narrow 2D Array
- Assumptions and restrictions
  - b = O(w): number of bits required to encode each number
  - “Narrow”: r = O(lg^γ c), 0 < γ < 1
  - |δ| ≤ lg c
- Our results
  - Space: O(rcb + w) bits, with an O(c lg c)-bit buffer
  - Operations: O(lg c / lg lg c) time
  - A generalization of the solution to the CSPSI problem based on B-trees (He and Munro 2010), using our small 2D array structure on each B-tree node

Range Counting in Dynamic Integer Sequences
- Notation
  - Integer range: [1..σ]
  - Sequence: S[1..n]
  - Operations:
    - access(x): S[x]
    - rank(α, x): number of occurrences of α in S[1..x]
    - select(α, r): position of the rth occurrence of α in S
    - range_count(p1, p2, v1, v2): number of entries in S[p1..p2] whose values are in the range [v1..v2]
    - insert(α, i): insert α between S[i-1] and S[i]
    - delete(i): delete S[i] from S

Range Counting in Integer Sequences: An Example
S = 5,5,2,5,3,1,3,4,7,6,4,1,2,2,5,8
rank(5, 8) = 3
select(2, 3) = 14
range_count(6, 12, 2, 6) = 4
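
The three values in this example can be reproduced with a naive Python version of the sequence operations. This is a sketch for checking the definitions only: every operation scans the list, whereas the talk's structures are succinct and much faster. The class name `NaiveSequence` is an assumption; positions are 1-based as in the notation.

```python
class NaiveSequence:
    """Slow reference version of the dynamic sequence interface (1-based positions)."""

    def __init__(self, values):
        self.s = list(values)

    def access(self, x):
        return self.s[x - 1]

    def rank(self, alpha, x):
        # Occurrences of alpha in S[1..x].
        return self.s[:x].count(alpha)

    def select(self, alpha, r):
        # Position of the r-th occurrence of alpha in S.
        seen = 0
        for pos, value in enumerate(self.s, start=1):
            if value == alpha:
                seen += 1
                if seen == r:
                    return pos
        raise ValueError("fewer than r occurrences of alpha")

    def range_count(self, p1, p2, v1, v2):
        # Entries of S[p1..p2] whose values lie in [v1..v2].
        return sum(1 for value in self.s[p1 - 1:p2] if v1 <= value <= v2)

    def insert(self, alpha, i):
        # Insert alpha between S[i-1] and S[i].
        self.s.insert(i - 1, alpha)

    def delete(self, i):
        del self.s[i - 1]


S = NaiveSequence([5, 5, 2, 5, 3, 1, 3, 4, 7, 6, 4, 1, 2, 2, 5, 8])
print(S.rank(5, 8), S.select(2, 3), S.range_count(6, 12, 2, 6))  # prints: 3 14 4
```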

Range Counting in Sequences of Small Integers
- Restrictions
  - σ = O(lg^ρ n) for any constant 0 < ρ < 1
- Our result
  - Space: nH0 + o(n lg σ) + O(w) bits
  - Time: O(lg n / lglg n)
- This is achieved by combining:
  - Our solution to range sum on narrow 2D arrays
  - A succinct dynamic string representation (He and Munro 2010)

Dynamic Range Counting: An Augmented Red-Black Tree
- Tx: a red-black tree storing all the x-coordinates
- Each node also stores the number of its descendants
- Purpose: conversions between real x-coordinates and rank space in O(lg n) time
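
To illustrate what the conversion to rank space does, the sketch below swaps the augmented red-black tree for a plain sorted Python list searched with `bisect`. That keeps the lookup logarithmic but makes insertions and deletions O(n), so it stands in for Tx only functionally. The class name `XRankIndex` and its methods are hypothetical.

```python
import bisect


class XRankIndex:
    """Sorted list standing in for the augmented red-black tree Tx."""

    def __init__(self):
        self.xs = []

    def insert(self, x):
        bisect.insort(self.xs, x)

    def delete(self, x):
        i = bisect.bisect_left(self.xs, x)
        if i == len(self.xs) or self.xs[i] != x:
            raise KeyError(x)
        del self.xs[i]

    def rank_range(self, x1, x2):
        """Return (lo, hi): the 1-based ranks of the stored x's that fall in [x1, x2].

        The range is empty when lo > hi.
        """
        lo = bisect.bisect_left(self.xs, x1) + 1
        hi = bisect.bisect_right(self.xs, x2)
        return lo, hi
```

A balanced tree with subtree sizes supports the same `rank_range` lookup while also keeping updates at O(lg n), which is the reason for the augmentation on the slide.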

Dynamic Range Counting: A Range Tree
- Ty: a weight-balanced B-tree (Arge and Vitter 2003) constructed over all the y-coordinates
  - Branching factor d = Θ(lg^ε n) for constant 0 < ε < 1
  - Leaf parameter: 1
  - The levels are numbered 0, 1, … from top to bottom
- Essentially a range tree
  - Each node represents a range of y-coordinates
- Choice of weight-balanced B-tree: amortizing a rebuilding cost
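
The number of levels of Ty follows from the branching factor. A short worked bound, assuming (the slide does not state it) that the height h of a weight-balanced B-tree with branching parameter d and leaf parameter 1 is O(log_d n):

```latex
h = O(\log_d n)
  = O\!\left(\frac{\lg n}{\lg d}\right)
  = O\!\left(\frac{\lg n}{\varepsilon \lg\lg n}\right)
  = O\!\left(\frac{\lg n}{\lg\lg n}\right),
\qquad \text{since } d = \Theta(\lg^{\varepsilon} n) \text{ and } \varepsilon \text{ is a constant.}
```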

Dynamic Range Counting: A Wavelet Tree
- Ideas from generalized wavelet trees (Ferragina et al. 2006)
- For each node v of Ty, construct a sequence Sv:
  - Each entry of Sv corresponds to a point whose y-coordinate is in the range represented by node v
  - Sv[i] corresponds to the point with the ith smallest x-coordinate among all these points
  - Sv[i] indicates which child of v contains the y-coordinate of the above point
- For each level m, construct a sequence Lm[1..n] of integers from [1..4d] by concatenating all the Sv’s constructed at level m
- Lm: stored as dynamic sequences of small integers
- Space: O(n lg d + w) bits per level, O(n) words overall
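
The construction of the Sv and Lm sequences can be sketched for a static point set. The code below makes several simplifying assumptions that are not from the talk: the tree over y-coordinates is a plain balanced tree that splits each range into at most d nearly equal parts (rather than a weight-balanced B-tree with up to 4d children), a single-point range is carried down as a one-child node so that every level has length n, and all points are distinct. The function name `build_levels` is hypothetical.

```python
import bisect


def build_levels(points, d):
    """Return [L0, L1, ...], where L_m concatenates the S_v of all nodes at level m.

    Entries are 1-based child indices. Requires d >= 2.
    """
    assert d >= 2
    by_y = sorted(points, key=lambda p: p[1])
    yrank = {p: r for r, p in enumerate(by_y)}   # y-rank of each point
    # A node is (lo, hi, pts): pts are the points with y-rank in [lo, hi), in x order.
    frontier = [(0, len(points), sorted(points))]
    levels = []
    while any(hi - lo > 1 for lo, hi, _ in frontier):
        level_seq, next_frontier = [], []
        for lo, hi, pts in frontier:
            k = min(d, hi - lo)                  # number of children of this node
            bounds = [lo + c * (hi - lo) // k for c in range(k + 1)]
            buckets = [[] for _ in range(k)]
            for p in pts:                        # scan in x order: builds S_v
                c = bisect.bisect_right(bounds, yrank[p]) - 1
                level_seq.append(c + 1)
                buckets[c].append(p)
            for c in range(k):
                next_frontier.append((bounds[c], bounds[c + 1], buckets[c]))
        levels.append(level_seq)
        frontier = next_frontier
    return levels


# Hypothetical points, listed here in x order; their y-ranks are 2, 0, 3, 1.
pts = [(1, 3), (2, 1), (3, 4), (4, 2)]
print(build_levels(pts, 2))  # prints: [[2, 1, 2, 1], [1, 2, 1, 2]]
```

With d = 2 this degenerates to an ordinary binary wavelet tree; a larger d shortens the tree at the cost of a larger alphabet per level.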

Range Counting Queries
Query range: [x1..x2] × [y1..y2]
- Use Tx to convert the query x-range to a range in rank space
- Perform a top-down traversal to locate the (up to two) leaves in Ty whose ranges contain y1 and y2
- Perform range_count on Sv for each node v visited in the above traversal
- Sum up the query results to get the answer
- Time: O(lg n / lglg n) per level, O(lg n / lglg n) levels
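
The query steps above can be sketched on a static, pointer-based version of the structure. This is a simplified illustration under assumptions that are not from the talk: the tree splits each y-rank range into at most d nearly equal parts, each Sv is a plain Python list that is scanned linearly (so the stated time bounds do not apply), coordinates are assumed distinct, and the names `WaveletNode` and `RangeCounter` are hypothetical.

```python
import bisect


class WaveletNode:
    def __init__(self, yranks, lo, hi, d):
        # yranks: y-ranks of this node's points in x order, all within [lo, hi).
        self.lo, self.hi = lo, hi
        self.seq, self.children = [], []
        if hi - lo <= 1:
            return                                # leaf
        k = min(d, hi - lo)
        bounds = [lo + c * (hi - lo) // k for c in range(k + 1)]
        # S_v, with 0-based child indices.
        self.seq = [bisect.bisect_right(bounds, r) - 1 for r in yranks]
        for c in range(k):
            sub = [r for r, s in zip(yranks, self.seq) if s == c]
            self.children.append(WaveletNode(sub, bounds[c], bounds[c + 1], d))

    def count(self, a, b, ylo, yhi):
        """Entries at positions [a, b) of this node with y-rank in [ylo, yhi)."""
        if a >= b or yhi <= self.lo or self.hi <= ylo:
            return 0
        if ylo <= self.lo and self.hi <= yhi:
            return b - a
        total = 0
        window = self.seq[a:b]
        for c, child in enumerate(self.children):
            if yhi <= child.lo or child.hi <= ylo:
                continue                          # child misses the query y-range
            if ylo <= child.lo and child.hi <= yhi:
                total += window.count(c)          # fully covered: counted on S_v alone
            else:                                 # boundary child: map positions, descend
                ca = self.seq[:a].count(c)
                cb = self.seq[:b].count(c)
                total += child.count(ca, cb, ylo, yhi)
        return total


class RangeCounter:
    def __init__(self, points, d=4):
        assert d >= 2
        pts = sorted(points)                      # x order
        self.xs = [x for x, _ in pts]
        self.ys = sorted(y for _, y in pts)
        yranks = [bisect.bisect_left(self.ys, y) for _, y in pts]
        self.root = WaveletNode(yranks, 0, len(pts), d)

    def count(self, x1, x2, y1, y2):
        # Convert both query ranges to half-open rank ranges, then traverse.
        a = bisect.bisect_left(self.xs, x1)
        b = bisect.bisect_right(self.xs, x2)
        ylo = bisect.bisect_left(self.ys, y1)
        yhi = bisect.bisect_right(self.ys, y2)
        return self.root.count(a, b, ylo, yhi)
```

Only children whose y-range is cut by the query boundary are descended into, which is at most two per node; everything between them is answered by a count over the node's own sequence, matching the "range_count on Sv" step.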

Insertions and Deletions
- More complicated: splits and merges; changes to child ranks
- The choice of storing Ty as a weight-balanced B-tree allows us to amortize the updating cost of subsequences of the Lm’s
- Additional techniques supporting batch updating of integer sequences are also developed

Our Results
- Dynamic Orthogonal Range Counting
  - Space: O(n) words
  - Time: O((lg n / lglg n)^2)
- Points on a U×U grid
  - Space: O(n) words
  - Time (worst-case): O(lg n lg U / (lg lg n)^2)
- Succinct representations of dynamic integer sequences
  - Space: nH0 + o(n lg σ) + O(w) bits
  - Time (including range_count): O((lg n / lg lg n)(lg σ / lg lg n + 1))
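
As a consistency check on the time bound for dynamic integer sequences, O((lg n / lg lg n)(lg σ / lg lg n + 1)), here is a sketch of how it specializes to small alphabets, assuming σ = O(lg^ρ n) for a constant 0 < ρ < 1 as in the small-integer restriction:

```latex
\lg\sigma = O(\lg\lg n)
\;\Longrightarrow\;
O\!\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)
= O\!\left(\frac{\lg n}{\lg\lg n}\right).
```

This agrees with the O(lg n / lglg n) time stated for sequences of small integers.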

Conclusions
- Results
  - The best result for dynamic orthogonal range counting
  - Same problem for points on a grid
  - The first succinct representations of dynamic integer sequences supporting range counting
  - Two preliminary results on dynamic range sum
- Techniques
  - The first that combines wavelet trees with range trees
  - Deamortization on 2D arrays
- Future work
  - Lower bound
  - Use techniques from succinct data structures to improve standard data structures

Thank you!