Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern University Boston, USA 9/2/2005 VLDB 2005, Trondheim, Norway 1 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 2 Problem Definition Given: Top-t most influential sites query: a set of sites S a set of weighted objects O a spatial region Q an integer t. find t sites in Q with the largest influences. influence of a site s = total weight of objects that consider s as the nearest site. 9/2/2005 VLDB 2005, Trondheim, Norway 3 Motivation Which supermarket in Boston is the most influential among residential buildings? Sites: supermarkets; Objects: residential buildings; Weight: # people in a building; Query region: Boston; Which wireless station in Boston is the most influential among mobile users? 9/2/2005 VLDB 2005, Trondheim, Norway 4 Example o2 o1 s1 s2 o4 s3 o5 s4 o3 o6 Suppose all objects have weight = 1, Q is the whole space, and t = 1. The most influential site is s1, with influence = 3. 9/2/2005 VLDB 2005, Trondheim, Norway 5 Example o2 o1 s1 s2 o4 s3 o5 s4 o3 o6 Now that Q is the shadowed rectangle and t = 2. Top-2 most influential sites: s4 and s2. 9/2/2005 VLDB 2005, Trondheim, Norway 6 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 7 Related Work Bi-chromatic RNN query: considers two datasets, sites and objects. The RNNs of a site s S are the objects that consider s as the nearest site. o2 o1 s1 9/2/2005 s2 o4 s3 o5 s4 o3 VLDB 2005, Trondheim, Norway o6 8 Related Work Solutions to the RNN query based on precomputation [KM00, YL01]. o2 o1 s1 9/2/2005 s2 o4 s3 o5 s4 o3 VLDB 2005, Trondheim, Norway o6 9 Related Work Solution to RNN query based on Voronoi diagram [SRAE01]. 9/2/2005 Compute the Voronoi cell of s: a region enclosing the locations closer to s than to any other sites. Querying the object R-tree using the Voronoi cell. VLDB 2005, Trondheim, Norway 10 Related Work [SRAE01] o2 o1 s1 9/2/2005 s2 o4 s3 o5 s4 o3 VLDB 2005, Trondheim, Norway o6 11 Our Problem vs. RNN Query RNN query: A single site as an input. Interested in the actual set of the RNNs. Top-t most influential sites query: 9/2/2005 A spatial region as an input. Interested in the aggregate weight of RNNs. VLDB 2005, Trondheim, Norway 12 Straightforward Solution 1 For each site, pre-compute its influence. At query time, find the sites in Q and return the t sites with max influences. Drawback 1: Costly maintenance upon updates. Drawback 2: binding a set of sites closely with a set of objects. 9/2/2005 VLDB 2005, Trondheim, Norway 13 Straightforward Solution 2 An extension of the Voronoi diagram based solution to the RNN query. 1. 2. 3. 9/2/2005 Find all sites in Q. For each such site, find its RNNs by using the Voronoi cell, and compute its influence. Return the t sites with max influences. VLDB 2005, Trondheim, Norway 14 Straightforward Solution 2 Drawback 1: All sites in Q need to be retrieved from the leaf nodes. Drawback 2: The object R-tree and the site R-tree are browsed multiple times. 9/2/2005 For each site in Q, browse the site R-tree to compute the Voronoi Cell. For each such Voronoi Cell, browse the object R-tree to compute the influence. VLDB 2005, Trondheim, Norway 15 Features of Our Solution Systematically browse both trees once. Pruning techniques are provided based on a new metric, minExistDNN. No need to compute the influences for all sites in Q, or even to locate all sites in Q. 9/2/2005 VLDB 2005, Trondheim, Norway 16 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 17 Motivation Intuitively, if some object in Oi may consider some site in Sj as an NN, Oi affects Sj. To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi will affect Sj. O1 O2 S1 S2 O1 only affects S1, while O2 affects both S1 and S2. 9/2/2005 VLDB 2005, Trondheim, Norway 18 maxDist – A Loose Estimation If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2. Why not good enough? minDist(O1,S2)=8 S2 O1 S1 9/2/2005 maxDist(O1,S1)=10 VLDB 2005, Trondheim, Norway 19 minMaxDist – A Tight Estimation? minDist(o1,S2) = 6 S1 o1 S2 minMaxDist(o1, S1) = 5 An object o does not affect S2, if there exists S1 such that minMaxDist(o1, S1) < minDist(o1, S2) 9/2/2005 VLDB 2005, Trondheim, Norway 20 minMaxDist – A Tight Estimation? minDist(O1,S2) = 6 s1 S1 S2 O1 7 6 o1 s2 minMaxDist(O1, S1) = 5 Not true for an object MBR O1. 9/2/2005 VLDB 2005, Trondheim, Norway 21 A Tight Estimation? A metric m(O1, S1) should: 1) 2) 9/2/2005 guarantee that, each location in O1 is within m(O1, S1) of a site in S1, and be the smallest distance with this property. VLDB 2005, Trondheim, Norway 22 New Metric – minExistDNNS1(O1) Definition: minExistDNNS1(O1) = max {minMaxDist(l, S1) | location l O1} O1 does not affect S2, if there exists S1, s.t. minExistDNNS1(O1) < minDist(O1, S2). 9/2/2005 VLDB 2005, Trondheim, Norway 23 Examples of minExistDNNS1(O1) O1 O1 S1 S1 How to calculate it? 9/2/2005 VLDB 2005, Trondheim, Norway 24 Calculating minExistDNNS1(O1) Step 1: Space partitioning P1:b P2:c P3:a a P4:d c S1 b P8:a 9/2/2005 d P7:d P6:b P5:c Every location l in the same partition is associated with the second closest corner of S1 – the distance is minMaxDist(l, S1)! VLDB 2005, Trondheim, Norway 25 Space Partitioning O1 is divided into multiple sub-regions, one in each partition. P1:b P2:c O1 a c S1 b 9/2/2005 d VLDB 2005, Trondheim, Norway 26 Calculating minExistDNNS1(O1) Step 2: Choose up-to 8 locations on O1’ border and compute the minMaxDist’s to S1. minExistDNN is the largest one! P1:b P2:c O1 minExistDNNS1(O1) a c S1 b 9/2/2005 VLDB 2005, Trondheim, Norway d 27 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 28 Data Structure Two R-trees: S of sites, O of objects. Three queues: 9/2/2005 queueSIN: entries of S inside Q. queueSOUT: entries of S outside Q. queueO: entries of O. VLDB 2005, Trondheim, Norway 29 Data Structure S3 O2 O1 S1 Q S4 S2 O3 O4 queueSIN: S1 S2 queueO: O1 queueSOUT: S3 9/2/2005 VLDB 2005, Trondheim, Norway 30 maxInfluence and minInfluence For each entry Sj in queueSIN, maxInfluence: total weight of entries in queueO that affect Sj. minInfluence: total weight of entries in queueO that ONLY affect Sj, divided by the number of objects in Sj. queueSIN is sorted in decreasing order of maxInfluence. 9/2/2005 VLDB 2005, Trondheim, Norway 31 Algorithm Overview Expand an entry from one of the three queues. Remove the entry from the queue. Retrieve the referenced node, and insert the (unpruned) entries into the same queue. Update maxInfluence and minInfluence if necessary. If top-t entries in queueSIN are sites, with minInfluences ≥ maxInfluences of all remaining entries, return. 9/2/2005 VLDB 2005, Trondheim, Norway 32 Example S3 O5 S8 S9 O6 O1 S5 S1 Q S6 S7 queueSIN: S1 queueO: O1 queueSOUT: S3 queueSIN: S5, S7 queueO: O6 queueSOUT: S9 S6 is not affected by O1, prune S6. O5 does not affect S5 and S7, prune O5. 9/2/2005 VLDB 2005, Trondheim, Norway 33 A Pruning Case minExistDNNS3(O1)=4 minDist(S2, O1)=5 Expand S1 S1 S4 S3 S2 O1 minExistDNNS1(O1)=7 S2 is pruned because of minExistDNNS3(O1) < minDist(S2, O1) 9/2/2005 VLDB 2005, Trondheim, Norway 34 Choosing an Entry to Expand Expand top entries in queueSIN. Expand the most important Oi. Importance: |Oi| * #affected entries * area(Oi) Expand Sj that contains the most important Oi. 9/2/2005 VLDB 2005, Trondheim, Norway 35 Choosing an Entry to Expand Estimate the probability of pruning Oi using some Sj in queueSOUT. Q Q S1 minDist(S1, O1)=5 minExistDNNS2(O1)=6 S1 O1 minDist(S1, O1)=5 O1 S2 minExistDNNS2(O1)=6 S’2 After expanding S2, O1 is likely not to affect S1. 9/2/2005 VLDB 2005, Trondheim, Norway 36 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 37 Experimental Setup Data sets: 24,493 populated places in North America 9,203 cultural landmarks in North America R-tree page size: 1 KB LRU buffer: 128 disk pages. t = 4. Comparing to the solution using Voronoi diagram. 9/2/2005 VLDB 2005, Trondheim, Norway 38 Selected Experimental Results #sites : #objects = 1 : 2.5 9/2/2005 VLDB 2005, Trondheim, Norway 39 Selected Experimental Results #sites : #objects = 2.5 : 1 9/2/2005 VLDB 2005, Trondheim, Norway 40 Outline Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions 9/2/2005 VLDB 2005, Trondheim, Norway 41 Conclusions We addressed a new problem: Top-t most influential sites query. We proposed a new metric: minExistDNN. It can be used to prune search space in NN/RNN related problems. We carefully designed an algorithm which systematically browses both R-trees once. Experiments showed more than an order of magnitude improvement. 9/2/2005 VLDB 2005, Trondheim, Norway 42 Thank you! Q&A 9/2/2005 VLDB 2005, Trondheim, Norway 43