Space-Efficient Range Reporting for Categorical Data
Yakov Nekrich
Department of Computer Science, University of Chile
PODS 2012

Outline
- Colored Range Reporting
- Three-sided Color Reporting in Linear Space
- Path-range Trees
- Three-sided Color Reporting for O(B log^2 N) Points
- Colored Range Reporting in Two Dimensions

Colored Range Reporting
- Definition
  - Each point p in a set S is assigned a color col(p)
  - For a query rectangle Q, report the colors of all points that occur in Q
- Contributions
  - First I/O-efficient data structures
  - Achieve optimal query costs and almost the same space usage as the corresponding data structures for regular range reporting queries

Lemma 1
- There exists a data structure D
  - D stores two-dimensional points and uses O(N/B) space
  - For any three-sided range Q = [a, b] × [0, c], D can report all results in O((N/B)^δ + K/B) I/Os for any δ > 0
- [Figure: tree labeled with 1/δ and (N/B)^δ]
- For every node v (containing N_v points)
  - Maintain a tree L_v
  - L_v uses O(N_v/B) space
  - Range search costs O(log_B N_v + K/B) I/Os, which is O(log_B N + K/B) I/Os
- Space used
  - O((1/δ) (N/B)) = O(N/B)
- I/O cost for Q = [a, b] × [0, c]
  - [a, b] identifies O((N/B)^δ) nodes on every level
  - Total cost: O((1/δ) (N/B)^δ log_B N + K/B) = O((N/B)^δ log_B N + K/B)

Lemma 2
- There exists a data structure D
  - D contains colored one-dimensional points and uses O(N/B) space
  - For any one-dimensional range Q = [a, b], D can report all distinct colors of points in Q in O((N/B)^δ log_B N + K/B) I/Os
- [Figure: the query [2, 6] and the three-sided query [2, 6] × [0, 2)]

Lemma 3
- There exists an O(N/B) space data structure D
  - D answers three-sided color range reporting queries for points on an N × N grid in O((N/B)^δ log_B N + K/B) I/Os
- Sweep a horizontal line h in the +y direction
  - When h hits a point p, add p to the structure E of Lemma 2
- Since points lie on an N × N grid
  - Q = [a, b] × [0, c] costs N * O((N/B)^δ log_B N + K/B) I/Os = O((N/B)^δ log_B N + K/B) I/Os

Structure
- [Figure: paths π(w, l_a), π_r(w, l_a), π_l(w, l_b)]
- ymin(v, F)
  - For a color v and a set of points F, ymin(v, F) denotes the point with the smallest y-coordinate among all points p ∈ F with col(p) = v
- Y_l(u, v) (Y_r(u, v))
  - ymin(c, π_l(u, v)) (ymin(c, π_r(u, v)))
- Identify levels N_i of the tree
  - log B + f(i) log log N for f(i) = 2^i and i = 1, …, log log_{log N}(N/B)
  - O(B log^{f(i)} N) points in each node
- For each node v in N_i
  - For each ancestor u of v, store the lists
      L_l(u, v) = Y_l(u, v)[1, …, m_i]
      L_r(u, v) = Y_r(u, v)[1, …, m_i]
    where m_i = B log^{f(i)-1} N
  - For i >= 2, store all points from S(v) in an O(|S(v)|/B) space data structure D(v)
  - For each v ∈ N_1, the points are stored in a structure D'(v)

Queries
- Q = [a, b] × [0, c]
  - Identify l_a and l_b
  - Identify the lowest common ancestor w of l_a and l_b
- Find the ancestor v_i of l_b that belongs to N_i
- Traverse the list L_l(w, v_i) until a point p_i with p_i.y > c is reached
- If p_i is not found, increment i and proceed to the next node v_i
- If p_i is found
  - Report the colors of all points p ∈ L_l(w, v_i) with p.y < c
  - Answer the query Q = [a, b] × [0, c] using the data structure D(v_i)

Space Usage
- D(v) cost
  - For v ∈ N_i, v contains O(B log^{f(i)} N) points
  - O(N/(B log^{f(i)} N)) nodes in N_i
  - log log_{log N}(N/B) levels in total
  - D(v) total cost: O((N/B) log log(N/B)) blocks
- L_r(u, v) cost
  - Each list L_r(u, v) contains B log^{f(i)-1} N points
  - O(log N) lists for each v in N_i
  - L_r(u, v) total cost: O((N/B) log log(N/B))

I/O cost
- Traversing all L_l(w, v_t)
  - K >= |L_l(w, v_{t-1})| = O(B log^{f(t-1)-1} N)
  - |L_l(w, v_{t-1})| >= sum_{i=1}^{t-2} |L_l(w, v_i)|
  - K = Ω(sum_{i=1}^{t-1} |L_l(w, v_i)|)
  - Total cost: O(K/B) I/Os
- Cost in D(v)
  - O(((B log^{f(t)} N)/B)^{1/4} + K'/B), where K' <= K
  - ((B log^{f(t)} N)/B)^{1/4} = O(log^{f(t-1)-1} N) = O(|L_l(w, v_{t-1})|/B) for t > 1
  - Total cost: O(K/B) I/Os

Lemma 5
- Let S be a set of r = O(B log^2 N) points on an N × N grid
- There exists an O((|S|/B) log log N) space data structure D
- Cost: O((log log N)^δ + K/B) I/Os

Lemma 6
- Let S be a set of r = O(B log^2 N) points on an N × N grid
- There exists an O((|S|/B) log log N log^(3) N) space data structure D
- Cost: O(K/B) I/Os

N × N to U × U
- Rank space technique
  - "Scaling and Related Techniques for Geometry Problems"
- Query cost increases by O(log log_B U) I/Os
- Space usage remains unchanged

Three-sided to Two Dimensions
- Construct a range tree T_y on the y-coordinates
- In each node v of T_y, define path-range trees P_1(v) and P_2(v) that support three-sided queries
- To answer a query [a, b] × [c, d], answer either S(w_1) ∩ ([a, b] × [c, +∞]) or S(w_2) ∩ ([a, b] × [0, d])
- Query cost is the same as the cost of three-sided queries
- Space usage increases by an O(log N) factor
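
The Lemma 2 figure, where the one-dimensional query [2, 6] is paired with the three-sided query [2, 6] × [0, 2), appears to depict the standard reduction from one-dimensional color reporting to three-sided point reporting. The sketch below is a minimal illustration of that reduction under stated assumptions, not the structure from the talk. Coordinates are assumed to be distinct positive integers, the function names `to_three_sided` and `distinct_colors` are hypothetical, and a linear scan stands in for the I/O-efficient structure of Lemma 1. Each point at position x is mapped to (x, prev), where prev is the position of the previous point of the same color (0 if there is none). A point in [a, b] is then the first of its color inside the range exactly when prev < a, so each color is reported once.

```python
def to_three_sided(points):
    """Map colored 1-D points (x, color) to triples (x, prev, color).

    prev is the coordinate of the previous point with the same color,
    or 0 when no such point exists (coordinates are assumed >= 1).
    """
    last_seen = {}  # color -> coordinate of the most recent point of that color
    mapped = []
    for x, color in sorted(points, key=lambda p: p[0]):
        mapped.append((x, last_seen.get(color, 0), color))
        last_seen[color] = x
    return mapped


def distinct_colors(mapped, a, b):
    """Answer the color query [a, b] as the three-sided query [a, b] x [0, a).

    A linear scan is used here as a stand-in for a real three-sided
    range reporting structure.
    """
    return [color for x, prev, color in mapped if a <= x <= b and prev < a]


# Hypothetical example data: seven colored points on a line.
points = [(1, "red"), (2, "blue"), (3, "red"), (4, "green"),
          (5, "blue"), (6, "red"), (7, "green")]
mapped = to_three_sided(points)
colors_in_2_6 = distinct_colors(mapped, 2, 6)
```

For the query [2, 6], the points at positions 5 and 6 are filtered out because their same-colored predecessors (positions 2 and 3) already lie inside the range, so every color in the range is reported exactly once.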