Nearest Neighbor Queries*

Nick Roussopoulos    Stephen Kelley    Frédéric Vincent
Department of Computer Science
University of Maryland
College Park, MD 20742

Abstract

A frequently encountered type of query in Geographic Information Systems (GIS) is to find the k nearest neighbor objects to a given point in space. Processing such queries requires substantially different search algorithms than those for location or range queries. In this paper we present an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalize it to finding the k nearest neighbors. We also discuss metrics for an optimistic and a pessimistic search ordering strategy as well as for pruning. Finally, we present the results of several experiments obtained using the implementation of our algorithm and examine the behavior of the metrics and the scalability of the algorithm.

1 INTRODUCTION

The efficient implementation of Nearest Neighbor (NN) queries is of particular interest in Geographic Information Systems (GIS). For example, a user may point to a specific location or to an object on the screen and request the system to find the four nearest objects to it in the database. Another situation is when the user is not familiar with the layout of the spatial objects: finding the nearest object to a point of interest could require multiple unsuccessful attempts with windows of varying sizes if it were handled with traditional range queries. Another example is the case of an astrophysics database, where a query is to find the ten nearest stars which are at least five light-years away from a given point in the sky. The versatility of NN queries increases substantially if we consider all the variations of the problem, such as the k nearest neighbors, the k furthest neighbors, other spatial queries such as find the k NN to the East of a location, or even spatial joins with an NN join predicate, such as find the three closest restaurants for each of two different movie theaters.

Efficient processing of NN queries requires spatial data structures which capitalize on the proximity of the objects to focus the search on the potential neighbors only. R-trees [Gutt84], Packed R-trees [Rous85], [Kame94], and the R-tree variations [SRF87], [Beck90] have been primarily used for location and range queries and for spatial join queries [BKS93] based on overlap/containment. However, very few spatial access methods have been used for NN queries. In [Same90], heuristics are provided for finding the nearest objects in quadtrees. The exact k-NN problem is also posed for hierarchical spatial data structures such as the PM quadtree [Same89]. The proposed solution is a top-down recursive algorithm which first goes down the quadtree, exploring the subtree that contains the query point, in order to get a first estimate of the NN location. Then it backtracks and explores the remaining subtrees which potentially contain NN until no subtree needs be visited. In [FBF77] a NN algorithm for k-d-trees was proposed, which was later refined in [Spro91].

In this paper, we provide an efficient branch-and-bound search algorithm for processing exact k-NN queries on R-trees, introduce several metrics for ordering and pruning the search, and perform several experiments on synthetic and real-world data to demonstrate the performance and scalability of our approach. To the best of our knowledge, neither NN algorithms have been developed for R-trees, nor similar metrics for NN search. We would also like to point out that, although the algorithm and these metrics are presented in the context of R-trees, they are directly applicable to all other spatial data structures as well.

Section 2 of the paper contains the theoretical foundation for the nearest neighbor search. Section 3 describes the algorithm and the metrics for ordering the search and pruning during it. Section 4 has the experiments with the implementation of the algorithm. The conclusion is in Section 5.

* This research was sponsored partially by the National Science Foundation under grant BIR 9318183, by ARPA under contract 003195 Ve100ID, and by NASA/USRA under contract 5555-09.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
SIGMOD '95, San Jose, CA USA
© 1995 ACM 0-89791-731-6/95/0005...$3.50
2 NEAREST NEIGHBOR SEARCH USING R-TREES

R-trees were proposed as a natural extension of B-trees in higher than one dimensions [Gutt84]. They combine most of the nice features of both B-trees and quadtrees. Like B-trees, they remain balanced, while they maintain the flexibility of dynamically adjusting their grouping to deal with either dead-space or dense areas, like the quadtrees do. The decomposition used in R-trees is dynamic, driven by the spatial data objects, and with appropriate split algorithms, if a region of an n-dimensional space includes dead-space, no entry in the R-tree will be introduced for it.

Leaf nodes of the R-tree contain entries of the form (RECT, oid), where oid is an object-identifier used as a pointer to a data object and RECT is an n-dimensional Minimal Bounding Rectangle (MBR) which bounds the corresponding object. For example, in a 2-dimensional space, an entry RECT will be of the form (x_low, x_high, y_low, y_high), which represents the coordinates of the lower-left and upper-right corners of the rectangle. The spatial objects stored at the leaf level are considered atomic, i.e. they are not further decomposed into their spatial primitives such as quadrants, triangles, trapezoids, line segments, or pixels. Non-leaf R-tree nodes contain entries of the form (RECT, p), where p is a pointer to a successor node in the next level of the R-tree and RECT is a minimal rectangle which bounds all the entries in the descendent node. The term branching factor (or fan-out) can be used to specify the maximum number of entries that a node can have; each node of an R-tree with branching factor fifty, for example, points to a maximum of fifty descendants or leaf objects. To illustrate the way an R-tree is defined on some space, Figure 1 shows a collection of rectangles and Figure 2 the corresponding tree.

[Figure 1: Collection of Rectangles]
[Figure 2: R-tree Construction]

Performance of an R-tree search is measured by the number of disk accesses (reads) necessary to find (or not find) the desired object(s) in the database. So, the R-tree branching factor is chosen such that the size of a node is equal to (or a multiple of) the size of a disk block or file system page.
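As an illustration of the node layout just described, the following is a minimal Python sketch of leaf and non-leaf entries; the class and field names and the tuple-based rectangle encoding are assumptions made for this example only, not structures defined in the paper.

    # Minimal sketch of the R-tree node layout described above (names assumed).
    from dataclasses import dataclass
    from typing import List, Tuple, Union

    Rect = Tuple[float, float, float, float]   # (x_low, x_high, y_low, y_high)

    @dataclass
    class LeafEntry:          # (RECT, oid): MBR of one object plus its object identifier
        rect: Rect
        oid: int

    @dataclass
    class Node:
        entries: List[Union["LeafEntry", "NonLeafEntry"]]
        is_leaf: bool

    @dataclass
    class NonLeafEntry:       # (RECT, p): MBR bounding every entry of the child node
        rect: Rect
        child: Node

    # A branching factor of 50 means len(node.entries) <= 50 for every node;
    # in practice the factor is chosen so that a node fills one disk page.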
2.1 Metrics for Nearest Neighbor Search

Given a query point P and an object O enclosed in its MBR, we provide two metrics for ordering the NN search. The first one is based on the minimum distance (MINDIST) of the object O from P. The second metric is based on the minimum of the maximum possible distances (MINMAXDIST) from P to a face (or vertex) of the MBR containing O. MINDIST and MINMAXDIST offer a lower and an upper bound on the actual distance of O from P respectively. These bounds are used by the nearest neighbor algorithm to order and efficiently prune the paths of the search space in an R-tree.

Definition 1: A rectangle R in Euclidean space E(n) of dimension n will be defined by the two endpoints S and T of its major diagonal:

R = (S, T), where S = [s_1, s_2, ..., s_n] and T = [t_1, t_2, ..., t_n], and s_i <= t_i for 1 <= i <= n.

To avoid confusion, whenever we refer to distance in this paper we will in practice be using the square of the distance, and our metrics will reflect this. The first metric is a variation of the classic Euclidean distance applied to a point and a rectangle. If the point is inside the rectangle, the distance between them is zero. If the point is outside the rectangle, we use the square of the Euclidean distance between the point and the nearest edge of the rectangle. We use the square of the Euclidean distance because it involves fewer and less costly computations.

Definition 2: The distance of a point P in E(n) from a rectangle R in the same space, denoted MINDIST(P, R), is:

MINDIST(P, R) = \sum_{i=1}^{n} |p_i - r_i|^2

where

r_i = s_i if p_i < s_i;  r_i = t_i if p_i > t_i;  r_i = p_i otherwise.

Lemma 1: If P is outside R or on its perimeter, the distance of Definition 2 is equal to the square of the minimal Euclidean distance from P to any point on the perimeter of R; if P is inside R, it is zero.

Proof: If P is inside R, then MINDIST = 0 by the choice of the r_i. If P is on the perimeter, MINDIST = 0 and so is equal to the square of the minimal distance of P from its closest point on the perimeter, namely itself. If P is outside R and j coordinates of P, j = 1, 2, ..., n-1, satisfy s_j <= p_j <= t_j, then MINDIST measures the square of the length of a perpendicular segment from P to an edge (for j = 1), to a plane (for j = 2), or to a hyperface (for j >= 3). If none of the p_j coordinates falls between (s_j, t_j), then MINDIST is the square of the distance to the closest vertex of R, by the way of selecting the r_i.

Definition 3: The minimum distance of a point P from a spatial object o, denoted ||(P, o)||, is:

||(P, o)|| = min { \sum_{i=1}^{n} |p_i - x_i|^2 : for all X = [x_1, ..., x_n] in o }.

Theorem 1: Given a point P and an MBR R enclosing a set of objects O = {o_i, 1 <= i <= m}, the following is true: for every o_i in O, MINDIST(P, R) <= ||(P, o_i)||.

Proof: If P is inside R, then MINDIST = 0, which is less than or equal to the distance of any object within R, including one that may be touching P. If P is outside R, then, according to Lemma 1, for every point X on the perimeter of R, MINDIST(P, R) <= ||(P, X)||. But every object enclosed in R either touches the perimeter of R or lies in its interior, so the segment from P to any such object crosses the perimeter at some point X and its length is therefore at least MINDIST(P, R). The equality in the above theorem holds when an object of R touches the circle with center P and radius the square root of MINDIST.

When searching an R-tree for the nearest neighbor to a query point P, at each visited node of the R-tree one must decide which MBR to search first. MINDIST offers a first approximation of the NN distance to every MBR of the node and, therefore, can be used to direct the search. In general, deciding which MBR to visit first in order to minimize the total number of visited nodes is not that straightforward. In fact, in many cases, due to dead space inside the MBRs, visiting first the MBR with the smallest MINDIST may result in fake-drops, i.e. visits to unnecessary nodes: the NN might be much further away than MINDIST. For this reason, we introduce a second metric, MINMAXDIST. But first, the following lemma is necessary.

Lemma 2 (MBR Face Property): Every face (i.e. edge in dimension 2, rectangle in dimension 3, and 'hyperface' in higher dimensions) of any MBR (at any level of the R-tree) contains at least one point of some spatial object in the DB. (See Figures 3 and 4.)

Proof: At the leaf level of the R-tree (object level), assume by contradiction that one face of the enclosing MBR does not touch the enclosed object. Then there exists a smaller rectangle that encloses the object, which contradicts the definition of the Minimum Bounding Rectangle. For the non-leaf levels, we use an induction on the level in the tree of the MBR. Assume any level k > 0 MBR has the MBR face property, and consider an MBR at level k+1. By the definition of an MBR, each face of that MBR touches an MBR of lower level, and therefore a leaf object, by applying the inductive hypothesis.

Notice that computing MINDIST requires only a number of operations linear in the number of dimensions, O(n).
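As a concrete illustration of Definition 2, the following Python sketch computes the (squared) MINDIST between a point and a rectangle given by the two endpoints of its major diagonal; the function name and the list-based representation are assumptions made for this example, not part of the paper.

    # Minimal sketch of MINDIST (Definition 2); representation is assumed:
    # a point is a sequence of coordinates, a rectangle is the pair (S, T)
    # of its lower and upper corners with s_i <= t_i on every axis.
    def mindist(p, s, t):
        """Squared minimum distance from point p to rectangle R = (s, t)."""
        total = 0.0
        for p_i, s_i, t_i in zip(p, s, t):
            if p_i < s_i:
                r_i = s_i          # point lies below the rectangle on this axis
            elif p_i > t_i:
                r_i = t_i          # point lies above the rectangle on this axis
            else:
                r_i = p_i          # point is within the rectangle's extent
            total += (p_i - r_i) ** 2
        return total

    # MINDIST is 0 for a point inside the rectangle, and the squared
    # distance to the nearest edge otherwise.
    assert mindist([2, 2], [1, 1], [4, 4]) == 0.0
    assert mindist([0, 0], [1, 1], [4, 4]) == 2.0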
[Figure 3: MBR Face Property in 2-Space. Each edge of an MBR at level i is in contact with at least one object in the DB; the same property applies to the MBRs at level i+1.]

In order to avoid visiting unnecessary MBRs, we should have an upper bound of the NN distance to any object inside an MBR. This will allow us to prune MBRs that have MINDIST higher than this upper bound. The following distance construction (called MINMAXDIST) is introduced to compute the minimum value of all the maximum distances between the query point and points on each of the n axes respectively. MINMAXDIST guarantees that there is an object within the MBR at a distance less than or equal to MINMAXDIST.

Definition 4: Given a point P in E(n) and an MBR R = (S, T) of the same dimensionality, we define MINMAXDIST(P, R) as:

MINMAXDIST(P, R) = \min_{1 <= k <= n} ( |p_k - rm_k|^2 + \sum_{i != k, 1 <= i <= n} |p_i - rM_i|^2 )

where:

rm_k = s_k if p_k <= (s_k + t_k)/2, and t_k otherwise;
rM_i = s_i if p_i >= (s_i + t_i)/2, and t_i otherwise.

This construction can be described as follows. For each k, select the hyperplane H_k = rm_k which contains the closer of the two faces of the MBR orthogonal to the k-th space axis (one of these faces has H_k = s_k and the other has H_k = t_k). The point V_k = (rM_1, rM_2, ..., rM_{k-1}, rm_k, rM_{k+1}, ..., rM_n) is the farthest vertex from P on this face. MINMAXDIST, then, is the minimum of the squares of the distances from P to each of these points.

Notice that this expression can be efficiently implemented in O(n) operations by first computing S = \sum_{1 <= i <= n} |p_i - rM_i|^2 (the distance from P to the furthest vertex of the MBR), and then iteratively selecting the minimum of S - |p_k - rM_k|^2 + |p_k - rm_k|^2 for 1 <= k <= n.

[Figure 4: MBR Face Property for Enclosed MBRs or Objects in 3-Space]
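The sketch below implements Definition 4 directly, using the same assumed point/rectangle representation as the MINDIST sketch above; it first computes the squared distance to the farthest vertex and then swaps in the closer face one axis at a time, following the O(n) construction just described.

    # Minimal sketch of MINMAXDIST (Definition 4); same assumed representation
    # as the MINDIST sketch: point p, rectangle R = (s, t).
    def minmaxdist(p, s, t):
        """Squared MINMAXDIST from point p to rectangle R = (s, t)."""
        n = len(p)
        rm = [s[k] if p[k] <= (s[k] + t[k]) / 2 else t[k] for k in range(n)]
        rM = [s[i] if p[i] >= (s[i] + t[i]) / 2 else t[i] for i in range(n)]
        # Squared distance from p to the farthest vertex of the MBR.
        far = sum((p[i] - rM[i]) ** 2 for i in range(n))
        # For each axis k, replace the farthest coordinate by the closer face
        # and keep the minimum over all axes.
        return min(far - (p[k] - rM[k]) ** 2 + (p[k] - rm[k]) ** 2
                   for k in range(n))

    # MINDIST <= actual NN distance <= MINMAXDIST for any object in the MBR,
    # so the two values bracket the distance to the nearest enclosed object.
    # For example, with P = (0, 0) and R = ((1, 1), (4, 4)):
    #   mindist    = 1^2 + 1^2 = 2
    #   minmaxdist = 1^2 + 4^2 = 17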
Theorem 2: Given a point P and an MBR R enclosing a set of objects O = {o_i, 1 <= i <= m}, the following property holds: there exists an o_i in O with ||(P, o_i)|| <= MINMAXDIST(P, R).

Proof: Because of Lemma 2, we know that each face of the MBR is touching at least one object enclosed in the MBR. For each axis, the distance from P to the extremity (farthest vertex) of the face closest to P is an upper bound on the distance to an object touching that face. Since the definition of MINMAXDIST takes the minimum of these upper bounds, there is at least one object in R whose distance from P is within (less than or equal to) MINMAXDIST, so MINMAXDIST is a valid estimate of the NN distance.

Theorem 2 says that MINMAXDIST is the minimum distance that guarantees the presence of an object in R whose distance from P is less than or equal to it: a larger value would always 'catch' some object inside the MBR, but a smaller distance could 'miss' some object. The MBR face property guarantees that MINMAXDIST is greater than or equal to the NN distance; on the other hand, a point object located exactly at the vertex of the MBR at distance MINMAXDIST shows that no smaller distance gives the same guarantee. Figures 5 and 6 illustrate MINDIST and MINMAXDIST in a 2-space and a 3-space respectively.

3 NEAREST NEIGHBOR SEARCH ALGORITHM

In this section we present a branch-and-bound R-tree traversal algorithm to find the k-NN objects to a given query point. We first discuss the merits of using the MINDIST and MINMAXDIST metrics to order and prune the search tree, then we present the algorithm for finding the 1-NN, and finally we generalize it for finding the k-NN.
3.1 Search Ordering and Pruning

Branch-and-bound algorithms have been studied and used extensively in the areas of artificial intelligence and operations research [HS78]. If the ordering and pruning heuristics are chosen well, they can significantly reduce the number of nodes visited in a large search space.

Search Ordering: The ordering heuristics we use in our algorithm and in the following experiments are based on the MINDIST and MINMAXDIST metrics. The MINDIST ordering is the optimistic choice, while the MINMAXDIST metric is the pessimistic (though not worst case) one. Since MINDIST estimates the distance from the query point to any enclosed MBR or data object as the minimum distance from the point to the MBR itself, it is the most optimistic choice possible. Due to the properties of MBRs and the construction of MINMAXDIST, the MINMAXDIST ordering produces the most pessimistic ordering that need ever be considered.

[Figure 5: MINDIST and MINMAXDIST in 2-Space]
[Figure 6: MINDIST and MINMAXDIST in 3-Space]

In applying a depth first traversal to a query point in an R-tree, the optimal visit ordering of the MBRs depends not only on the distance from the query point to each of the MBRs along the path(s) from the root to the leaf node(s), but also on the size and layout of the MBRs (or objects) within each MBR in the leaf node. In particular, one can construct examples in which the MINDIST metric ordering produces tree traversals that are more costly (in terms of nodes visited) than the MINMAXDIST metric. This is shown in Figure 7, where the MINDIST ordering will lead the search to MBR1 first, which would require opening up M11 and M12. If, on the other hand, the MINMAXDIST metric ordering were used, visiting MBR2 first would result in a smaller estimate of the actual distance to the NN (which will be found to be in M21), which will then eliminate the need to examine M11 and M12. The MINDIST ordering optimistically assumes that the NN to P in MBR M is going to be close to MINDIST(M, P), which is not always the case. Similarly, counterexamples could be constructed for any predefined ordering. As we stated above, the MINDIST metric produces the most optimistic ordering, but that is not always the best choice; many other orderings are possible by choosing metrics which compute the distance from the query point to faces or vertices of the MBR which are further away. The importance of MINMAXDIST(P, M) is that it computes the smallest distance between point P and MBR M that guarantees the finding of an object in M at a Euclidean distance less than or equal to MINMAXDIST(P, M).

[Figure 7: MINDIST and MINMAXDIST orderings (the NN is somewhere in MBR2).
1. MINDIST ordering: If we visit MBR1 first, we have to visit M11, M12, MBR2 and M21 before finding the NN.
2. MINMAXDIST ordering: If we visit MBR2 first, and then M21, when we eventually visit MBR1, we can prune M11 and M12.]

Search Pruning: We utilize the two theorems we developed to formulate the following three strategies to prune MBRs during the search:

1. An MBR M with MINDIST(P, M) greater than the MINMAXDIST(P, M') of another MBR M' is discarded, because it cannot contain the NN (Theorems 1 and 2). We use this in downward pruning.

2. An actual distance from P to a given object O which is greater than the MINMAXDIST(P, M) of an MBR M can be discarded (actually, replaced by MINMAXDIST(P, M) as an estimate of the NN distance), because M contains an object O' which is nearer to P (Theorem 2). This is also used in downward pruning.

3. Every MBR M with MINDIST(P, M) greater than the actual distance from P to a given object O is discarded, because it cannot enclose an object nearer than O (Theorem 1). We use this in upward pruning.

Although we specify only the use of MINMAXDIST in downward pruning, in practice there are situations where it is better to apply MINDIST (and in fact strategy 3) instead. For example, when there is no dead space (or at least very little) in the nodes of the R-tree, MINDIST is a much better estimate of ||(P, N)||, the actual distance to the NN, than is MINMAXDIST, at all levels in the tree.
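As a minimal sketch of how these pruning rules might be applied to a list of branches (all names, and the reduction of the three strategies to a single distance bound, are assumptions for this example, not the paper's exact procedure):

    # Minimal sketch of pruning over an active branch list (names assumed).
    # Each branch is a pair (mindist, minmaxdist) computed for one MBR of the
    # current node; `nearest` is the current NN distance estimate (possibly an
    # actual object distance, as in strategy 3).
    def prune_branch_list(branches, nearest):
        """Return the surviving branches and a possibly improved NN estimate."""
        # Strategy 2 (downward): the MINMAXDIST of any MBR is a valid upper
        # bound on the NN distance, so it may replace a worse estimate.
        for _, mm in branches:
            nearest = min(nearest, mm)
        # Strategies 1 and 3: discard every MBR whose MINDIST exceeds the
        # best upper bound (another MBR's MINMAXDIST or an object distance).
        survivors = [(md, mm) for md, mm in branches if md <= nearest]
        return survivors, nearest

    # Example: starting from an infinite estimate, the branch (5.0, 9.0)
    # lowers the bound to 9.0 and the branch (12.0, 20.0) is then discarded.
    print(prune_branch_list([(5.0, 9.0), (12.0, 20.0)], float("inf")))
    # -> ([(5.0, 9.0)], 9.0)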
3.2 Finding the 1 Nearest Neighbor

The algorithm presented here implements an ordered depth first traversal. It begins with the R-tree root node and proceeds down the tree. Originally, our guess for the nearest neighbor distance (call it Nearest) is infinity. During the descending phase, at each newly visited non-leaf node, the algorithm computes the ordering metric bounds (e.g. MINDIST, Definition 2) for all its MBRs and sorts them (associated with their corresponding node) into an Active Branch List (ABL). We then apply pruning strategies 1 and 2 to the ABL to remove unnecessary branches. The algorithm iterates on this ABL until the ABL is empty: for each iteration, the algorithm selects the next branch in the list and applies itself recursively to the node corresponding to the MBR of this branch. At a leaf node (DB objects level), the algorithm calls a type specific distance function for each object, selects the smaller of the current value of Nearest and each computed value, and updates Nearest appropriately. At the return from the recursion, we take this new estimate of the NN and apply pruning strategy 3 to remove all branches with MINDIST(P, M) > Nearest for all MBRs M remaining in the ABL. See Figure 8 for the pseudo-code description of the algorithm.

RECURSIVE PROCEDURE nearestNeighborSearch(Node, Point, Nearest)
  NODE        Node       // Current node
  POINT       Point      // Search point
  NEARESTN    Nearest    // Nearest neighbor found so far
  // Local variables
  NODE        newNode
  BRANCHARRAY branchList
  integer     dist, last, i

  // At leaf level - compute distance to actual objects
  If Node.type = LEAF
    For i := 1 to Node.count
      dist := objectDIST(Point, Node.branch[i].rect)
      If (dist < Nearest.dist)
        Nearest.dist := dist
        Nearest.rect := Node.branch[i].rect
  // Non-leaf level - order, prune and visit nodes
  Else
    // Generate Active Branch List
    genBranchList(Point, Node, branchList)
    // Sort ABL based on ordering metric values
    sortBranchList(branchList)
    // Perform Downward Pruning (may discard all branches)
    last := pruneBranchList(Node, Point, Nearest, branchList)
    // Iterate through the Active Branch List
    For i := 1 to last
      newNode := Node.branch[branchList[i]]
      // Recursively visit child nodes
      nearestNeighborSearch(newNode, Point, Nearest)
      // Perform Upward Pruning
      last := pruneBranchList(Node, Point, Nearest, branchList)

Figure 8: Nearest Neighbor Search Pseudo-Code

3.3 Generalization: Finding the k Nearest Neighbors

The algorithm presented above can be easily generalized to answer queries of the type: find the k nearest neighbors to a given query point, where k is greater than zero. The only differences are:

● A sorted buffer of at most k current nearest neighbors is needed.
● The MBR pruning is done according to the distance of the furthest nearest neighbor in this buffer.

The next section provides experimental results using both MINDIST and MINMAXDIST.
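Before turning to the experiments, here is a minimal, self-contained Python sketch of the generalized search just described. The node layout, the function names, and the use of MINDIST alone for both ordering and pruning are assumptions for this example (the paper's procedure also uses MINMAXDIST in downward pruning); the buffer of the k current nearest neighbors is kept as a max-heap so that the pruning distance is always available.

    # Minimal sketch of the k-NN branch-and-bound traversal (assumed names).
    import heapq

    def mindist(p, s, t):
        return sum((pi - min(max(pi, si), ti)) ** 2
                   for pi, si, ti in zip(p, s, t))

    class Node:
        def __init__(self, entries, leaf):
            # leaf: entries are (rect, object); non-leaf: entries are (rect, child Node),
            # where rect = (S, T) are the two corners of the MBR.
            self.entries, self.leaf = entries, leaf

    def knn_search(node, point, k, heap=None):
        """heap keeps (-squared_distance, object) for the k best objects so far."""
        heap = [] if heap is None else heap
        worst = -heap[0][0] if len(heap) == k else float("inf")
        if node.leaf:
            for (s, t), obj in node.entries:
                d = mindist(point, s, t)   # stand-in for a type-specific objectDIST
                if d < worst:
                    heapq.heappush(heap, (-d, obj))
                    if len(heap) > k:
                        heapq.heappop(heap)
                    worst = -heap[0][0] if len(heap) == k else float("inf")
        else:
            # Active Branch List ordered by MINDIST (the optimistic ordering).
            abl = sorted(((mindist(point, s, t), child)
                          for (s, t), child in node.entries), key=lambda b: b[0])
            for d, child in abl:
                # Pruning: once the buffer is full, an MBR whose MINDIST exceeds
                # the k-th nearest distance cannot contribute; since the list is
                # sorted, neither can any later branch.
                if len(heap) == k and d > -heap[0][0]:
                    break
                knn_search(child, point, k, heap)
        return sorted((-negd, obj) for negd, obj in heap)

    # Example: a tiny two-level tree over four point objects (squared distances).
    leaf1 = Node([(((0, 0), (0, 0)), "a"), (((2, 2), (2, 2)), "b")], leaf=True)
    leaf2 = Node([(((9, 9), (9, 9)), "c"), (((5, 5), (5, 5)), "d")], leaf=True)
    root = Node([(((0, 0), (2, 2)), leaf1), (((5, 5), (9, 9)), leaf2)], leaf=False)
    print(knn_search(root, (1, 1), k=2))   # -> [(2.0, 'a'), (2.0, 'b')]

Keeping the buffer as a max-heap means the furthest of the k current neighbors, which is the pruning distance mentioned above, can be read in constant time at every step.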
4 EXPERIMENTAL RESULTS

We implemented our k-NN search algorithm and designed and carried out our experiments in order to demonstrate the capability and usefulness of our NN search approach as applied to GIS types of queries. We examined the behavior of our algorithm as the number of neighbors increased and as the cardinality of the data set grew, and how the MINDIST and MINMAXDIST metrics affected performance.

We performed our experiments on both publicly available real-world spatial data sets and synthetic data sets. The real-world data sets included segment-based TIGER data files for the city of Long Beach, CA and for Montgomery County, MD, and observation records from the International Ultraviolet Explorer (I.U.E.) satellite from N.A.S.A. Our examples will be from the Long Beach data, which consists of 55,000 street segments stored as pairs of latitude and longitude coordinates.

For the synthetic data experiments, we generated test data files of 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K and 256K points, stored as rectangles. The points were unique values generated randomly in a grid of 8K by 8K space, using a different seed value for each data set. We then generated 100 equally spaced query points in the 8K by 8K space.

We built the R-tree indexes by first presorting the data files using a Hilbert [Jaga90] number generating function, and then applying a modified version of the [Rous85] R-tree packing technique according to the suggestion of [Kame94]. The branching factor of both the terminal and non-terminal nodes was set to 50 in all indexes.
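The following is a minimal sketch of the bottom-up packing step of such an index build, under stated simplifications: the function and variable names are assumptions, the toy data is generated here for illustration, and the presort key is a plain coordinate order rather than the Hilbert value [Jaga90] used in the actual experiments.

    # Minimal sketch of packing presorted rectangles into nodes of capacity 50
    # (the branching factor used in the experiments); all names are assumed.
    import random

    random.seed(0)
    # Assumed toy data: 1,000 unit squares with random corners in an 8K by 8K space.
    data_mbrs = [((x, y), (x + 1.0, y + 1.0))
                 for x, y in ((random.uniform(0, 8192), random.uniform(0, 8192))
                              for _ in range(1000))]

    def pack_level(rects, capacity=50):
        """Group a presorted list of MBRs into parent MBRs of <= capacity entries."""
        parents = []
        for i in range(0, len(rects), capacity):
            group = rects[i:i + capacity]
            lo = tuple(min(r[0][d] for r in group) for d in range(2))
            hi = tuple(max(r[1][d] for r in group) for d in range(2))
            parents.append((lo, hi))
        return parents

    # The paper presorts by Hilbert value; as a stand-in, this sketch sorts the
    # rectangles lexicographically by their lower-left corner.
    level = pack_level(sorted(data_mbrs))
    while len(level) > 1:          # pack level by level until one root MBR remains
        level = pack_level(level)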
We generated three uniform sets of queries of 100 points each, based on regions of the Long Beach, CA data set. The first set was from a sparse region of the city (few or no segments at all), the second was from a dense region (a large number of segments per unit area), and the third was a uniform sample from the MBR of the whole city map. We then executed a series of nearest neighbor queries for each of the query points in each of these regions and plotted the average number of nodes accessed against the number of nearest neighbors. In Figure 9 we see the average of 100 queries for each of several different numbers of nearest neighbors, for both the MINDIST and MINMAXDIST ordering metrics applied to the Long Beach data.

[Figure 9: MINDIST and MINMAXDIST Metric Comparison (average pages accessed vs. number of neighbors)]

For this experiment we see that the graphs of the MINMAXDIST ordered searches were similar in shape to the graphs of the MINDIST ordered searches, but the number of pages accessed was consistently, approximately 20%, higher. MINMAXDIST performed the worst in dense regions of the various data sets, which was not surprising. It turned out that in all the experiments we performed comparing the two metrics, the results were similar to this one. Since this occurred both with real-world (pseudo) randomly distributed data and with spatially sorted and packed R-trees, we surmise that MINDIST is the ordering metric of choice. So, for the sake of clarity and simplicity, the rest of the figures show the results of the MINDIST metric only.

Figure 10 shows the results of an experiment using the synthetic data. We ran the 100 equally spaced query points and averaged the results for different values of k NN (ranging from 1 to 121) on the 1K, 4K, 16K, 64K and 256K data sets. We graphed the results as the (average) number of pages (nodes) accessed versus the number of nearest neighbors searched for.

[Figure 10: Synthetic Data Experiment 1 Results (Nearest Neighbor Scalability)]

From the experimental observations we noticed two distinct components in the curves. First, the average number of pages accessed increased with the number of nearest neighbors requested; the curves appeared to be piecewise linear, with a (small) constant fractional term, the steps occurring where a larger number of neighbors forces the search to examine the next group of nodes of the index. Second, the number of pages accessed grew with the logarithm (base 2) of the cardinality of the data set. Since all the synthetic data sets were created in the same space and with the same node size, the dominant factor is the height of the R-tree index: the 1K, 2K and 4K data sets have a depth of 2, the 8K, 16K, 32K and 64K data sets have a depth of 3, and the 128K and the 256K data sets have a depth of 4, and the cost of the NN search increases linearly with the height of the tree. This is an important observation, because it shows that the algorithm behaves well for ordered (spatially sorted and packed) data sets as the database grows.

In the second experiment we examined the scalability of the algorithm with the size of the database. We plotted the (average) number of pages accessed against the size of the data set, in terms of kilobytes (log base 2), for each of 1, 16, 32, 64 and 128 nearest neighbors.

[Figure 11: Synthetic Data Experiment 2 Results (Database Size Scalability)]

The curves clustered into distinct groupings, producing a banded pattern that strongly correlates with the height of the R-tree index, and within each band the number of page accesses grew only sublinearly as the data set size grew. This suggests that the algorithm scales well with the cardinality of the database as well as with the number of nearest neighbors requested.

5 CONCLUSIONS

In this paper, we developed a branch-and-bound R-tree traversal algorithm which finds the k Nearest Neighbors of a given query point. We also introduced two metrics, MINDIST and MINMAXDIST, that can be used to guide an ordered depth first spatial search and to prune it. The first metric, MINDIST, produces the most optimistic ordering possible, whereas the second, MINMAXDIST, is the most pessimistic one and effectively limits the MBRs that ever need be considered. Although many other orderings, or more sophisticated metrics using a linear combination of the two, are possible, these two metrics were shown to be valuable tools in directing and pruning the NN search. Our experiments have shown that the MINDIST ordering was more effective, at least in the case where the data were spatially sorted and packed.

We implemented and thoroughly tested our k-NN search algorithm. The experiments on both real-world and synthetic data sets showed that the algorithm scales up well with respect to both the number of nearest neighbors requested and the size of the data sets. Further research on NN queries in our group will focus on defining and analyzing other metrics for ordering and pruning the search, and on characterizing the behavior of our algorithm in dynamic as well as static database environments.

6 ACKNOWLEDGEMENTS

We would like to thank Christos Faloutsos for his insightful comments and suggestions.
References
[Beck90] Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B., "The R*-tree: an efficient and robust access method for points and rectangles," Proc. ACM SIGMOD, May 1990, pp. 322-331.

[BKS93] Brinkhoff, T., Kriegel, H.-P., and Seeger, B., "Efficient Processing of Spatial Joins Using R-trees," Proc. ACM SIGMOD, May 1993, pp. 237-246.

[FBF77] Friedman, J. H., Bentley, J. L., and Finkel, R. A., "An algorithm for finding best matches in logarithmic expected time," ACM Trans. Math. Software, 3, September 1977, pp. 209-226.

[Gutt84] Guttman, A., "R-trees: A Dynamic Index Structure for Spatial Searching," Proc. ACM SIGMOD, June 1984, pp. 47-57.

[HS78] Horowitz, E., and Sahni, S., "Fundamentals of Computer Algorithms," Computer Science Press, 1978, pp. 370-421.

[Jaga90] Jagadish, H. V., "Linear Clustering of Objects with Multiple Attributes," Proc. ACM SIGMOD, May 1990, pp. 332-342.

[Kame94] Kamel, I., and Faloutsos, C., "Hilbert R-Tree: an Improved R-Tree Using Fractals," Proc. of VLDB, 1994, pp. 500-509.

[Rous85] Roussopoulos, N., and Leifker, D., "Direct Spatial Search on Pictorial Databases Using Packed R-trees," Proc. ACM SIGMOD, May 1985.

[Same89] Samet, H., "The Design & Analysis Of Spatial Data Structures," Addison-Wesley, 1989.

[Same90] Samet, H., "Applications Of Spatial Data Structures: Computer Graphics, Image Processing and GIS," Addison-Wesley, 1990.

[Spro91] Sproull, R. F., "Refinements to Nearest-Neighbor Searching in k-Dimensional Trees," Algorithmica, 6, 1991, pp. 579-589.

[SRF87] Sellis, T., Roussopoulos, N., and Faloutsos, C., "The R+-tree: A Dynamic Index for Multi-dimensional Objects," Proc. 13th International Conference on Very Large Data Bases, 1987, pp. 507-518.