Indexes in general-purpose DBMSs: B-trees

B-trees, short for balanced trees, are the most common type of database index. A B-tree index is an ordered list of values divided into ranges. By associating a key with a row or range of rows, B-trees provide excellent retrieval performance for a wide range of queries, including exact-match and range searches.

Formation of the index structure

Spatial indices are used by spatial databases (databases that store information about objects in space) to optimize spatial queries. Conventional index types do not efficiently handle spatial queries such as how far apart two points are, or whether points fall within a spatial area of interest.

Geometry data indexes

Types of geometric data indexes (spatial indexes)

1) Grid (spatial index);
2) Z-order (curve);
3) Quadtree;
4) Octree;
5) UB-tree;
6) R-tree: typically the preferred method for indexing spatial data;
7) R+ tree;
8) R* tree;
9) Hilbert R-tree;
10) X-tree;
11) kd-tree;
12) M-tree: an M-tree index can be used to resolve similarity queries efficiently on complex objects that are compared using an arbitrary metric.

Spatial index methods

1. A grid ("mesh", also "global grid" if it covers the entire surface of the globe) is a regular tessellation of a manifold or 2-D surface that divides it into a series of contiguous cells, which can then be assigned unique identifiers and used for spatial indexing. A wide variety of such grids have been proposed or are in use, including grids based on square or rectangular cells, triangular grids or meshes, hexagonal grids, and grids based on diamond-shaped cells.

2. Z-order, Morton order, or Morton code is a function that maps multidimensional data to one dimension while preserving the locality of the data points. It was introduced in 1966 by G. M. Morton. The z-value of a point in multiple dimensions is calculated by interleaving the binary representations of its coordinate values. For example, interleaving x = 10 and y = 01 (binary) bit by bit, taking the y bit first, gives the z-value 0110.

3. A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J. L. Bentley in 1974. A similar partitioning is also known as a Q-tree.

4. An octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three-dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional analog of quadtrees. The name is formed from oct + tree, but it is normally written "octree" with only one "t". Octrees are often used in 3D graphics and 3D game engines.

5. The UB-tree, as proposed by Rudolf Bayer and Volker Markl, is a balanced tree for storing and efficiently retrieving multidimensional data. It is essentially a B+ tree (information only in the leaves) with records stored according to Z-order, also called Morton order. Z-order is calculated by bitwise interlacing of the keys.

Approximating rectangles (bounding boxes)

Detecting overlap (find intersecting shapes)

6. R-tree. Typically the preferred method for indexing spatial data. Objects (shapes, lines, and points) are grouped using the minimum bounding rectangle (MBR). A new object is added to the MBR within the index whose size would increase the least. The R-tree was proposed by Antonin Guttman in 1984.

7. An R+ tree is a method for looking up data using a location, often (x, y) coordinates, and often for locations on the surface of the earth.
Searching on one number is a solved problem; searching on two or more, and asking for locations that are nearby in both the x and y directions, requires craftier algorithms. Fundamentally, an R+ tree is a tree data structure, a variant of the R-tree, used for indexing spatial information. In the accompanying figure, the layout of the data objects D to M is the same as in the R-tree configuration, but the areas of MBRs A and C differ. In an R+ tree, an object that lies in an area shared by several MBRs at the same level can, where needed, be stored in more than one node. Here J and D are stored redundantly. Thanks to this redundancy, a search for the area where F and J intersect needs to look only in MBR A and does not have to search C.

8. R*-trees are a variant of R-trees used for indexing spatial information. R*-trees have a slightly higher construction cost than standard R-trees, because data may need to be reinserted, but the resulting tree usually gives better query performance. Like the standard R-tree, an R*-tree can store both point and spatial data. It was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.

9. The Hilbert R-tree, an R-tree variant, is an index for multidimensional objects such as lines, regions, 3-D objects, or high-dimensional feature-based parametric objects. It can be thought of as an extension of the B+-tree to multidimensional objects. The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node. Hilbert R-trees use space-filling curves, specifically the Hilbert curve, to impose a linear ordering on the data rectangles. There are two types of Hilbert R-trees, one for static databases and one for dynamic databases. In both cases Hilbert space-filling curves are used to achieve a better ordering of multidimensional objects in the node.
This ordering has to be "good" in the sense that it should group "similar" data rectangles together, so as to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Packed Hilbert R-trees are suitable for static databases in which updates are very rare or do not occur at all.

10. An X-tree (for eXtended node tree) is an index tree structure based on the R-tree and used for storing data in many dimensions. It appeared in 1996 and differs from R-trees (1984), R+-trees (1987), and R*-trees (1990) in that it emphasizes preventing overlap between bounding boxes, which becomes an increasing problem in high dimensions. When a node cannot be split without creating overlap, the split is deferred, which results in super-nodes. In extreme cases the tree linearizes, which defends against the worst-case behavior observed in some other data structures.

11. A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space (Bentley 1975). k-d trees are useful for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest-neighbor searches). k-d trees are a special case of binary space partitioning trees.

12. M-trees are tree data structures similar to R-trees and B-trees. An M-tree is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest-neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap, and there is no clear strategy for avoiding it. In addition, an M-tree can be used only with distance functions that satisfy the triangle inequality, which many advanced dissimilarity functions used in information retrieval do not.

13. Point access method.

14. Binary space partitioning (BSP) is a method for recursively subdividing a space into convex sets by hyperplanes.
This subdivision gives rise to a representation of the objects within the space by means of a tree data structure known as a BSP tree. Binary space partitioning was developed in the context of 3D computer graphics, where the structure of a BSP tree gives rapid access to spatial information about the objects in a scene that is useful in rendering, such as their front-to-back ordering with respect to a viewer at a given location. Other applications include geometrical operations on shapes (constructive solid geometry) in CAD, collision detection in robotics and 3-D video games, ray tracing, and other computer applications that handle complex spatial scenes.

i. Start with a list of lines (or, in 3-D, polygons) making up the scene. In the tree diagrams, lists are denoted by rounded rectangles and nodes in the BSP tree by circles. In the spatial diagram of the lines, the direction chosen to be the "front" of a line is denoted by an arrow. Following the steps of the algorithm:
1. We choose a line, A, from the list and
2. add it to a node.
3. We split the remaining lines in the list into those in front of A (i.e. B2, C2, D2) and those behind (B1, C1, D1). A line that crosses A is cut into two pieces, which is why B, C, and D each appear twice.
4. We first process the lines in front of A (in steps ii to v),
5. followed by those behind (in steps vi to viii).

ii. We now apply the algorithm to the list of lines in front of A (containing B2, C2, D2). We choose a line, B2, add it to a node, and split the rest of the list into the lines that are in front of B2 (D2) and those that are behind it (C2, D3). D3 is the piece of D2 cut off by B2.

iii. Choose a line, D2, from the list of lines in front of B2. It is the only line in the list, so after adding it to a node, nothing further needs to be done.

iv. We are done with the lines in front of B2, so consider the lines behind B2 (C2 and D3). Choose one of these (C2), add it to a node, and put the other line (D3) into the list of lines in front of C2.

v. Now look at the list of lines in front of C2. There is only one line (D3), so add this to a node and continue.

vi. We have now added all of the lines in front of A to the BSP tree, so we start on the list of lines behind A. Choosing a line (B1) from this list, we add B1 to a node and split the remainder of the list into lines in front of B1 (i.e. D1) and lines behind B1 (i.e. C1).

vii. Processing first the list of lines in front of B1, D1 is the only line in this list, so add it to a node and continue.

viii. Looking next at the list of lines behind B1, the only line in this list is C1, so add it to a node, and the BSP tree is complete.

Hybrid index

Creating geometry data indexes in Oracle10g

After the data has been loaded into the tables, a spatial index must be created for each table to provide faster searching and efficient access to the data. This is done by covering the geometries with tiles. Oracle Spatial provides three spatial indexing methods for the object-relational database:
1) fixed;
2) hybrid;
3) R-tree indexing.

Indexing can use tiles of fixed or variable size. Fixed-size tiles are controlled by their resolution. Variable-size tiles are controlled by a number that sets their maximum count. The resolution of the fixed-size tiles and the number of variable-size tiles are user-set parameters, SDO_LEVEL and SDO_NUMTILES respectively. The smaller the fixed-size tiles and the larger the number of variable-size tiles, the more precise the approximation.

Spatial supports two combinations of SDO_LEVEL and SDO_NUMTILES. The first, a nonzero SDO_LEVEL with a zero SDO_NUMTILES, produces fixed-size tiles (fixed indexing). The second, a nonzero SDO_LEVEL with a nonzero SDO_NUMTILES, produces hybrid indexing. Hybrid indexing generates two sets of tiles for each geometry: the first contains fixed-size tiles and the second contains variable-size tiles.
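
As a rough illustration of why a finer fixed tiling approximates a geometry more closely, the sketch below covers a geometry's bounding box with equal tiles. It is not Oracle's tessellation algorithm: the function names `fixed_tiles` and `covered_area`, the example coordinates, and the assumption that level L cuts the coordinate domain into 2^L by 2^L tiles are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def fixed_tiles(geom_box: Box, domain: Box, level: int) -> List[Tuple[int, int]]:
    """Return (column, row) of every fixed-size tile that geom_box touches.

    Assumption (not taken from Oracle documentation): level L divides the
    domain into 2**L x 2**L equal tiles.
    """
    n = 2 ** level
    dx0, dy0, dx1, dy1 = domain
    tile_w = (dx1 - dx0) / n
    tile_h = (dy1 - dy0) / n

    def clamp(i: int) -> int:
        # Keep tile indexes inside the grid even on the domain border.
        return max(0, min(n - 1, i))

    gx0, gy0, gx1, gy1 = geom_box
    col_lo = clamp(int((gx0 - dx0) // tile_w))
    col_hi = clamp(int((gx1 - dx0) // tile_w))
    row_lo = clamp(int((gy0 - dy0) // tile_h))
    row_hi = clamp(int((gy1 - dy0) // tile_h))
    return [(c, r)
            for r in range(row_lo, row_hi + 1)
            for c in range(col_lo, col_hi + 1)]


def covered_area(geom_box: Box, domain: Box, level: int) -> float:
    """Total area of the tiles used to approximate geom_box at this level."""
    n = 2 ** level
    tile_area = ((domain[2] - domain[0]) / n) * ((domain[3] - domain[1]) / n)
    return len(fixed_tiles(geom_box, domain, level)) * tile_area


if __name__ == "__main__":
    domain = (0.0, 0.0, 16.0, 16.0)
    box = (1.5, 1.5, 2.5, 2.5)  # true area is 1.0
    for level in (2, 4, 5):
        print(level, len(fixed_tiles(box, domain, level)),
              covered_area(box, domain, level))
```

For this box of area 1, the tiles cover an area of 16 at level 2, 4 at level 4, and 2.25 at level 5: smaller tiles follow the geometry more closely, at the cost of more tiles to store per geometry.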
Hybrid indexes provide fairly efficient memory allocation. Hybrid indexes are created on table columns that contain geometry coordinates.

An R-tree can index spatial data in up to four dimensions, whereas quadtree indexing (fixed and hybrid indexes) supports only two dimensions. Creating an R-tree index does not require the SDO_LEVEL and SDO_NUMTILES parameters. Here the index approximates each geometry by the smallest rectangle that encloses it, the Minimum Bounding Rectangle (MBR; abbreviated MIR in Latvian).

Defining geometry data indexes

Index types:
1) fixed;
2) hybrid;
3) R-tree indexing.

Creating a fixed index:

    CREATE INDEX IND_CELTNES ON CELTNES(CELTNE)
      INDEXTYPE IS MDSYS.SPATIAL_INDEX
      PARAMETERS('SDO_LEVEL = 4');

Creating a hybrid index:

    CREATE INDEX IND_KOKI ON KOKI(KOKS)
      INDEXTYPE IS MDSYS.SPATIAL_INDEX
      PARAMETERS('SDO_LEVEL = 4, SDO_NUMTILES = 4');

Creating an R-tree index:

    CREATE INDEX IND_IELAS ON IELAS(IELA)
      INDEXTYPE IS MDSYS.SPATIAL_INDEX;

Metadata about the created indexes can be viewed through the ALL_SDO_INDEX_INFO view:

    SELECT INDEX_NAME, TABLE_NAME, COLUMN_NAME, SDO_INDEX_TYPE
    FROM ALL_SDO_INDEX_INFO;

    INDEX_NAME    TABLE_NAME    COLUMN_NAME    SDO_INDEX_TYPE
    -----------   ----------    -----------    --------------
    IND_CELTNES   CELTNES       CELTNE         QTREE
    IND_IELAS     IELAS         IELA           RTREE
    IND_KOKI      KOKI          KOKS           QTREE

Index performance comparison

The figure shows data being loaded from all 7 layer tables, each of which has a fixed index with SDO_LEVEL=4. Load time differs between layers: the street layer takes the longest, followed by the water-body and house layers, while the building, railway, tree, and path layers load the fastest (47 ms).

The figure shows data being loaded using the hybrid index with SDO_LEVEL=4 and SDO_NUMTILES=4. All layer tables use the hybrid index.
Compared with the fixed index, the hybrid index gives worse results, possibly because the parameter values were poorly chosen. For example, loading the building layer takes 296 ms with the hybrid index but 47 ms with the fixed index.

The figure shows data being loaded with R-tree indexes on all layer tables. Compared with loading through the fixed and hybrid indexes, the R-tree is the fastest in this case, and it is the fastest for every layer.

The table shows the performance test results for the MAJAS table. As already concluded above, these tests confirm that the R-tree index is the fastest (at least for visualization with Map Builder).

Index performance comparison

    Index     Table    Time
    Fixed     MAJAS    62 ms
    Hybrid    MAJAS    172 ms
    R-tree    MAJAS    31 ms

Summary of the index performance comparison

                    Fixed indexes          Hybrid indexes         R-tree indexes
    Result          1.     2.     3.       1.     2.     3.       1.     2.      3.
    CITI_OBJEKTI    3009   16     16       4      3      369      7      141     17
    EKAS            2962   7      15       7      1      44       7      62      14
    IELAS           2768   16     17       2      2      65       6      180     14
    KOKI            2939   14     8        8      1      48       7      77      15
    UDENSTILPNES    2742   16     15       6      3      187      7      69      11
    VID_N           2884   13.8   14.2     5.4    2      142.6    6.8    105.8   14.2
    VID_REZ         970.6666667            50                     42.26666667

VID_N is the mean of each column over the five tables; VID_REZ is the mean of the three VID_N values for each index type.
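
The two summary rows of the table above can be rechecked with a few lines of arithmetic. The sketch below assumes that VID_N is the per-column mean over the five tables and that VID_REZ is the mean of the three VID_N values belonging to each index type. That reading is an interpretation of the row labels, not something the source states, and the helper names `column_means` and `group_means` are made up for this example.

```python
from typing import Dict, List

# Rows copied from the summary table: three results each for the
# fixed, hybrid, and R-tree indexes, in that order.
results: Dict[str, List[float]] = {
    "CITI_OBJEKTI": [3009, 16, 16, 4, 3, 369, 7, 141, 17],
    "EKAS":         [2962, 7, 15, 7, 1, 44, 7, 62, 14],
    "IELAS":        [2768, 16, 17, 2, 2, 65, 6, 180, 14],
    "KOKI":         [2939, 14, 8, 8, 1, 48, 7, 77, 15],
    "UDENSTILPNES": [2742, 16, 15, 6, 3, 187, 7, 69, 11],
}


def column_means(rows: Dict[str, List[float]]) -> List[float]:
    """Mean of every column across all rows (assumed meaning of VID_N)."""
    return [sum(col) / len(col) for col in zip(*rows.values())]


def group_means(values: List[float], size: int = 3) -> List[float]:
    """Mean of each consecutive group of `size` values (assumed VID_REZ)."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values), size)]


vid_n = column_means(results)
vid_rez = group_means(vid_n)

if __name__ == "__main__":
    print("VID_N:  ", [round(v, 1) for v in vid_n])
    print("VID_REZ:", [round(v, 7) for v in vid_rez])
```

Under these assumptions the computed values agree with the VID_N and VID_REZ rows of the table, which suggests the nine result columns are grouped three per index type as laid out there.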