
Spatial Query Processing
• Spatial DBs do not have a fixed set of operators that are considered the basic elements of query evaluation.
• Spatial DBs handle large sets of complex data that have no natural sort order along a single dimension.
• Complex algorithms are needed to evaluate spatial predicates.
• It cannot be assumed that the computational cost of query processing comes from I/O alone.

Spatial Operations
• Update operations
• Selection operations:
– Point query (PQ): given a query point p, find all objects O that contain it:
  PQ(p) = { O | p ∩ O.G ≠ Ø }
– Range or region query (WQ): given a query polygon P, find all objects O that intersect P. When P is rectangular, it is called a window query.
  WQ(P) = { O | O.G ∩ P.G ≠ Ø }
• Spatial aggregation: a variant of the nearest-neighbor search. Given an object o', find the objects o that have minimum distance to o'.
  NNQ(o') = { o | ∀o'': dist(o'.G, o.G) ≤ dist(o'.G, o''.G) }

Spatial Operations
• Spatial join: The join is one of the most important operators in relational databases. When two tables R and S are joined on a spatial predicate θ, the join is called a spatial join. A variant of this operator in GIS is the map overlay, which combines two sets of spatial objects to create a new set. The boundaries of the new objects are determined by the nonspatial attributes assigned by the overlay operation. For example, if the operation assigns the same value of a nonspatial attribute to two adjacent objects, they are merged.
  R ⋈_θ S = { (o, o') | o ∈ R, o' ∈ S, θ(o.G, o'.G) }
  Some spatial predicates are: intersection, northeast, distance, overlap, meets, adjacent, contains, and so on.

Techniques of Query Processing
• Selection:
– Unsorted data and no index
– Spatial indexing
– Rank = (selectivity − 1) / differential cost
  selectivity(p) = cardinality(output(p)) / cardinality(input(p))
  differential cost is the per-tuple cost of evaluating the predicate.
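The selection queries defined above, evaluated over unsorted data with no index, can be sketched as a linear scan. This is a minimal illustration under stated assumptions, not the method of any particular system: geometries are approximated by axis-aligned rectangles, the nearest-neighbor query takes a query point rather than an object, and the names Rect, point_query, window_query, and nearest_neighbor are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an object's geometry O.G (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other):
        # Two rectangles intersect when their extents overlap in both dimensions.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def distance_to_point(self, x, y):
        # Zero if the point is inside or on the boundary of the rectangle.
        dx = max(self.xmin - x, 0.0, x - self.xmax)
        dy = max(self.ymin - y, 0.0, y - self.ymax)
        return hypot(dx, dy)


def point_query(objects, x, y):
    """PQ(p): ids of all objects whose geometry contains the point."""
    return [oid for oid, geom in objects.items() if geom.contains_point(x, y)]


def window_query(objects, window):
    """WQ(P): ids of all objects whose geometry intersects the window."""
    return [oid for oid, geom in objects.items() if geom.intersects(window)]


def nearest_neighbor(objects, x, y):
    """NNQ: id of an object at minimum distance from the query point."""
    return min(objects, key=lambda oid: objects[oid].distance_to_point(x, y))


# Hypothetical sample data.
objects = {
    "a": Rect(0, 0, 2, 2),
    "b": Rect(1, 1, 4, 3),
    "c": Rect(6, 6, 8, 8),
}
```

Every query touches every object here, which is the baseline that spatial indexing is meant to improve on.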
Techniques of Query Processing
• Nearest neighbor: One approach to this type of query uses a pair of distance measures, search pruning criteria, and a search algorithm.
  Min-distance(P,R) is zero if P is inside R or on its boundary. If P is outside R, min-distance(P,R) is the Euclidean distance between P and the closest point on the boundary of R.
  Min-max distance(P,R) is the minimum, over the faces of R that contain the vertex of R closest to P, of the distance from P to the farthest point on that face.
  The construction of the R-tree guarantees that there is an object O inside R such that distance(O,P) ≤ min-max distance(P,R).
  Some search pruning strategies are:
– An MBR M can be eliminated if there is another MBR M' such that min-distance(P,M) > min-max distance(P,M').
– An MBR M can be eliminated if there is an object O such that distance(P,O) < min-distance(P,M).
– An object O can be eliminated if there is an MBR M such that distance(P,O) > min-max distance(P,M).

Techniques of Query Processing
• Join: A join is defined as the cross product followed by a selection condition. This is especially expensive for spatial databases. Spatial joins are processed as a filter step followed by a refinement step; the following algorithms address the filter step, which operates on rectangles (MBRs).

JOIN Algorithms
• Nested loop
  for all tuples f ∈ F
    for all tuples r ∈ R
      if overlap(f.Geom, r.Geom) then add <f,r> to result
  If F needs M pages with pf tuples per page, and R needs N pages with pr tuples per page, a tuple-at-a-time nested loop reads on the order of M + pf·M·N pages, which is prohibitive. With B buffers in memory, one can load B−2 pages of F at a time, leaving one buffer for R and one for the results <f,r>. An alternative is to use each tuple in F as a window query over an indexed R.

JOIN Algorithms
• Tree matching: Both tables are indexed.
SJ(R1, R2: nodes)
  forall er2 in R2
    forall er1 in R1
      if overlap(er1.rect, er2.rect) then
        if R1 and R2 are leaf pages then
          output(er1.oid, er2.oid)
        else if R1 is a leaf page then
          ReadPage(er2.ptr); SJ(R1, er2.ptr)
        else if R2 is a leaf page then
          ReadPage(er1.ptr); SJ(er1.ptr, R2)
        else
          ReadPage(er1.ptr); ReadPage(er2.ptr); SJ(er1.ptr, er2.ptr)

JOIN Algorithms
• Partition-Based Spatial Merge Join
• Filter step: Given two relations F and R:
– For each tuple in F and R, form a key-pointer element consisting of the unique id (OID) and the MBR. Call the resulting sets Fkp and Rkp.
– If both Fkp and Rkp fit in main memory, the operation can be processed with a plane-sweep algorithm.
– If the relations do not fit in memory, partition both relations into P parts.
• Partition: The partition must satisfy the following constraints:
– For each partition Fikp, every element of Rkp that overlaps an element of Fikp lies in the corresponding partition Rikp.
– Both Fikp and Rikp fit in main memory.
• Plane sweep: intersection of polygons [figure omitted]

Optimization
In a traditional DB, the computational cost of a query is defined in terms of I/O. In a spatial DB, in contrast, the system deals with complex data, which makes the choice of a query plan and its optimization more important. The query optimizer generates different evaluation plans and selects one. The selected plan is often not the best, but at least it is not the worst. The activities of the optimizer can be classified into logical transformation and dynamic programming.

Schema of Query Optimizer
[Figure omitted. Labels: Query; Parser (SQL Grammar, Abstract Data Types); Optimizer: Logical Transformation (Decomposition: Nonspatial, Spatial, Hybrid; Heuristic Rule), Dynamic Programming (Cost Function: Nonspatial, Spatial; Evaluation; Merge); Architecture Specification (CPU, Bfr); System Catalog (selectivity, Index)]

Query Optimizer
• Parsing: Before the optimizer can operate, a high-level declarative statement must be scanned by a parser. In a traditional DB, the data types and functions are fixed and the parsers are relatively simple.
Spatial DBs are extended with user-defined types, so their parsers are more complicated.
  SELECT L.nombre
  FROM Lago L, Servicio Fa
  WHERE Area(L.Geometry) > 20 AND Fa.nombre = 'camping' AND Distance(Fa.Geometry, L.Geometry) < 50

Query Tree
  π L.nombre
    σ Area(L.Geometry) > 20
      σ Fa.nombre = 'camping'
        ⋈ Distance(Fa.Geometry, L.Geometry) < 50
          Lago L
          Servicio Fa

Query Optimizer
• Logical transformation: The strategy derived from the parser can be very inefficient. The join operation is very expensive, and its cost depends on the size of its inputs. Thus, it is better to reduce the size of the inputs to the join. One option is to move the selection on the nonspatial attribute down in the query tree.
  π L.nombre
    σ Area(L.Geometry) > 20
      ⋈ Distance(Fa.Geometry, L.Geometry) < 50
        Lago L
        σ Fa.nombre = 'camping'
          Servicio Fa

Transformations
• In this step, the tree is mapped onto equivalent trees by using a set of formal rules inherited from relational algebra.
• The trees are enumerated, and heuristics filter out candidates that are obviously not advisable. The general rule in this case is "move the nonspatial operators SELECT and PROJECT down in the tree." For each alternative it is possible to define a rank:
  Rank = (selectivity − 1) / differential cost
  selectivity(p) = cardinality(output(p)) / cardinality(input(p))
  The space of alternatives is generated with rules of relational algebra based on the notions of commutativity, associativity, and distributivity.

Equivalence Rules
• Selection
  σ_{c1 ∧ c2 ∧ … ∧ cn}(R) ≡ σ_{c1}(σ_{c2}(…(σ_{cn}(R))…))
  All nonspatial selections are moved to the right, so that they are applied first.
  σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
  Nonspatial selection is applied before spatial selection.
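The commutativity of selections and the rank heuristic can be sketched together. The following is a minimal illustration, assuming rank = (selectivity − 1) / differential cost with predicates applied in ascending rank order; the predicate names, selectivities, per-tuple costs, and sample rows are all hypothetical.

```python
def rank(selectivity, cost_per_tuple):
    """Rank of a predicate; lower rank means it should be applied earlier."""
    return (selectivity - 1.0) / cost_per_tuple


def order_by_rank(predicates):
    """Return predicates sorted by ascending rank."""
    return sorted(predicates, key=lambda p: rank(p["selectivity"], p["cost"]))


def apply_in_order(rows, predicates):
    """Apply a cascade of selections, counting how often each predicate runs."""
    calls = {p["name"]: 0 for p in predicates}
    result = []
    for row in rows:
        for p in predicates:
            calls[p["name"]] += 1
            if not p["fn"](row):
                break  # row rejected; later predicates are not evaluated
        else:
            result.append(row)
    return result, calls


# Hypothetical predicates: a cheap nonspatial test and an expensive spatial one.
is_camping = {"name": "is_camping", "selectivity": 0.5, "cost": 1.0,
              "fn": lambda row: row["kind"] == "camping"}
area_gt_20 = {"name": "area_gt_20", "selectivity": 0.5, "cost": 100.0,
              "fn": lambda row: row["area"] > 20}

# Hypothetical sample rows.
rows = [
    {"kind": "camping", "area": 30},
    {"kind": "hotel", "area": 50},
    {"kind": "camping", "area": 10},
    {"kind": "hotel", "area": 5},
]

ordered = order_by_rank([area_gt_20, is_camping])
ranked_result, ranked_calls = apply_in_order(rows, ordered)
naive_result, naive_calls = apply_in_order(rows, [area_gt_20, is_camping])
```

Both orders return the same rows, as the commutativity rule requires, but the ranked order evaluates the expensive spatial predicate only on rows that survive the cheap nonspatial one.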
• Projection
  If a1, …, an are sets of attributes such that ai ⊆ ai+1 for i = 1, …, n−1, then
  π_{a1}(R) ≡ π_{a1}(π_{a2}(…(π_{an}(R))…))

Equivalence Rules
• Cross product and join
  Commutativity: R ⋈ S ≡ S ⋈ R
  Associativity: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
  Implication: (R ⋈ T) ⋈ S ≡ (T ⋈ R) ⋈ S
• Selection, projection, and join
  If the selection condition c involves only attributes retained by the projection:
  π_{a}(σ_{c}(R)) ≡ σ_{c}(π_{a}(R))

Equivalence Rules
• Selection, projection, and join
  If a selection condition c involves attributes that appear only in R and not in S, then:
  σ_{c}(R ⋈ S) ≡ σ_{c}(R) ⋈ S
  Projection can be pushed through the join, provided the join attributes are included in a:
  π_{a}(R ⋈ S) ≡ π_{a1}(R) ⋈ π_{a2}(S)
  where a1 is the subset of a that appears in R, and a2 is the subset of a that appears in S.

Query Optimizer
• Dynamic programming: the technique that selects an evaluation plan. The selection is made with the goal of minimizing the computational cost. The factors to consider are:
– Access cost
– Storage cost
– CPU cost
– Communication cost
• Catalogs: keep the information needed to compute the cost.
• Cost function:
  Cost = Expression(records-examined) + K * Expression(pages-read)
  where K is the weight relating CPU cost and I/O cost.

Execution Plan
Example:
  SELECT F.nombre
  FROM Bosque F, Rios R
  WHERE Intersect(F.Geometry, :WINDOW) AND Overlap(F.Geometry, R.Geometry)

Plan operators [tree figure omitted]:
  π F.nombre (on-the-fly)
  σ Intersect(F.Geometry, :WINDOW) (R-tree index)
  ⋈ Overlap(F.Geometry, R.Geometry) (tree-matching join)
  Bosque F, Rios R
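The cost function from the Query Optimizer slide can be sketched as a way to compare candidate plans. This is an illustration only: the plan names, the record and page counts, and the value of K are hypothetical numbers, not catalog statistics from any real system, and a real optimizer would derive them from the system catalog.

```python
def plan_cost(records_examined, pages_read, k):
    """Cost = records-examined + K * pages-read, with K a tunable weight."""
    return records_examined + k * pages_read


def cheapest(plans, k):
    """Return the name of the plan with the lowest estimated cost."""
    return min(plans, key=lambda name: plan_cost(*plans[name], k))


# Hypothetical estimates: (records examined, pages read) per candidate plan.
plans = {
    "nested_loop": (1_000_000, 5_000),
    "tree_matching": (40_000, 600),
}

K = 10  # hypothetical weight
best = cheapest(plans, K)
```

With these made-up numbers the tree-matching plan is selected; changing K or the estimates can change the choice, which is why the catalog statistics matter.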
