General-purpose DBMS indexes
B-trees
B-trees, short for balanced trees, are the most common type of database index.
A B-tree index is an ordered list of values divided into ranges. By associating a
key with a row or range of rows, B-trees provide excellent retrieval
performance for a wide range of queries, including exact match and range
searches.
Formation of the index structure
Spatial indices are used by spatial databases (databases which store
information related to objects in space) to optimize spatial queries.
Conventional index types do not efficiently handle spatial queries such as how
far apart two points are, or whether points fall within a spatial area of interest.
Geometry data indexes
Types of geometric data indexes (spatial indexes)
1) Grid (spatial index);
2) Z-order (curve);
3) Quadtree;
4) Octree;
5) UB-tree;
6) R-tree: Typically the preferred method for indexing spatial data.
Objects (shapes, lines and points) are grouped using the minimum
bounding rectangle (MBR). Objects are added to an MBR within the index
that will lead to the smallest increase in its size.
7) R+ tree;
8) R* tree;
9) Hilbert R-tree;
10) X-tree;
11) kd-tree;
12) M-tree: an M-tree index can be used for the efficient resolution of
similarity queries on complex objects that are compared using an arbitrary
metric.
Spatial index methods
1. A grid ("mesh", also "global grid" if it covers the entire surface of the globe) is a
regular tessellation of a manifold or 2-D surface that divides it into a series of
contiguous cells, which can then be assigned unique identifiers and used for spatial
indexing purposes. A wide variety of such grids have been proposed or are currently in
use, including grids based on "square" or "rectangular" cells, triangular grids or meshes,
hexagonal grids and grids based on diamond-shaped cells.
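The idea of assigning unique identifiers to grid cells can be sketched as follows. This is a minimal illustration for square cells only; the function names, the row-major numbering and the parameters (`min_x`, `min_y`, `cell_size`, `cols`) are assumptions made for the example and do not come from the slides.

```python
import math

def grid_cell_id(x, y, min_x, min_y, cell_size, cols):
    """Return the id of the square cell containing (x, y).

    Cells are numbered row by row (row-major); this numbering is an
    assumption of the sketch, any unique scheme would do.
    """
    col = math.floor((x - min_x) / cell_size)
    row = math.floor((y - min_y) / cell_size)
    return row * cols + col

def cells_for_box(box, min_x, min_y, cell_size, cols):
    """Return the ids of all cells touched by box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    c1 = math.floor((x1 - min_x) / cell_size)
    c2 = math.floor((x2 - min_x) / cell_size)
    r1 = math.floor((y1 - min_y) / cell_size)
    r2 = math.floor((y2 - min_y) / cell_size)
    return [r * cols + c for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

# A 10-column grid of unit cells starting at the origin.
print(grid_cell_id(2.5, 1.5, 0, 0, 1.0, 10))                 # 12
print(cells_for_box((0.5, 0.5, 1.5, 1.5), 0, 0, 1.0, 10))    # [0, 1, 10, 11]
```

An index built this way stores, for each cell id, the objects whose bounding box touches that cell, so a query only has to look at the cells it overlaps.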
2. Z-order, Morton order, or Morton code is a function which maps multidimensional data
to one dimension while preserving locality of the data points. It was introduced in 1966 by
G. M. Morton. The z-value of a point in multidimensions is simply calculated by
interleaving the binary representations of its coordinate values.
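The bit interleaving can be sketched for two non-negative integer coordinates. Which coordinate takes the even bit positions is a convention; placing x on the even bits, as well as the names `morton_encode` and `morton_decode`, are assumptions of this example.

```python
def morton_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one z-value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bit i of x goes to even position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y goes to odd position 2i+1
    return z

def morton_decode(z, bits=16):
    """Inverse of morton_encode: split a z-value back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

# x = 3 (011), y = 5 (101) interleave to 100111 in binary.
print(morton_encode(3, 5))     # 39
print(morton_decode(39))       # (3, 5)
```

Sorting points by their z-value yields the Z-shaped traversal of the plane, which is what allows an ordinary one-dimensional index to store them.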
3. A quadtree is a tree data structure in which each internal node has exactly four
children. Quadtrees are most often used to partition a two-dimensional space by
recursively subdividing it into four quadrants or regions. The regions may be square or
rectangular, or may have arbitrary shapes. This data structure was named a quadtree by
Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree.
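A minimal sketch of a quadtree that stores points and subdivides a square region into four quadrants once a node is full. The class name, the `capacity` and `min_size` parameters and the half-open cell boundaries are assumptions chosen for the example, not part of the original definition.

```python
class QuadTree:
    """Point quadtree over the square [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity=4, min_size=1e-6):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity      # points a leaf holds before it splits
        self.min_size = min_size      # stop splitting below this size
        self.points = []
        self.children = None          # None for a leaf, else four subtrees

    def _contains(self, px, py):
        return (self.x <= px < self.x + self.size and
                self.y <= py < self.y + self.size)

    def insert(self, px, py):
        if not self._contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.size <= self.min_size:
                self.points.append((px, py))
                return True
            self._subdivide()
        return any(child.insert(px, py) for child in self.children)

    def _subdivide(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.min_size)
            for dy in (0, 1) for dx in (0, 1)
        ]
        old, self.points = self.points, []
        for px, py in old:            # push stored points down into the quadrants
            any(child.insert(px, py) for child in self.children)

    def query(self, x1, y1, x2, y2):
        """Return all points with x1 <= px <= x2 and y1 <= py <= y2."""
        if (x2 < self.x or x1 >= self.x + self.size or
                y2 < self.y or y1 >= self.y + self.size):
            return []                 # query box misses this quadrant entirely
        found = [p for p in self.points
                 if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
        if self.children is not None:
            for child in self.children:
                found += child.query(x1, y1, x2, y2)
        return found

tree = QuadTree(0, 0, 8, capacity=2)
for p in [(1, 1), (2, 2), (5, 5), (6, 1), (7, 7)]:
    tree.insert(*p)
print(sorted(tree.query(0, 0, 3, 3)))   # [(1, 1), (2, 2)]
```

The query skips every quadrant that does not overlap the search box, which is where the speed-up over a linear scan comes from.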
4. An octree is a tree data structure in which each internal node has exactly eight children. Octrees are
most often used to partition a three-dimensional space by recursively subdividing it into eight octants.
Octrees are the three-dimensional analog of quadtrees. The name is formed from oct + tree, but note
that it is normally written "octree" with only one "t". Octrees are often used in 3D graphics and 3D game
engines.
5. The UB-tree as proposed by Rudolf Bayer and Volker Markl is a balanced tree for storing and
efficiently retrieving multidimensional data. It is basically a B+ tree (information only in the
leaves) with records stored according to Z-order, also called Morton order. Z-order is simply
calculated by bitwise interlacing the keys.
Bounding rectangles (bounding boxes)
Overlap detection (find intersecting shapes)
6. R-tree. Typically the preferred method for indexing spatial data. Objects
(shapes, lines and points) are grouped using the minimum bounding
rectangle (MBR). Objects are added to an MBR within the index that will lead
to the smallest increase in its size. The R-tree was proposed by Antonin Guttman in
1984.
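The two operations named here, testing whether bounding rectangles intersect and picking the MBR whose size grows least when an object is added, can be sketched as follows. Rectangles are written as `(x1, y1, x2, y2)` tuples and "size" is taken to mean area; both choices, the function names, and the tie-break on smaller area are assumptions of the example.

```python
def area(r):
    x1, y1, x2, y2 = r
    return (x2 - x1) * (y2 - y1)

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def enlargement(mbr, obj):
    """How much the area of mbr grows if obj is added to it."""
    return area(union(mbr, obj)) - area(mbr)

def choose_mbr(mbrs, obj):
    """Index of the MBR needing the least enlargement (ties: smaller area)."""
    return min(range(len(mbrs)),
               key=lambda i: (enlargement(mbrs[i], obj), area(mbrs[i])))

mbrs = [(0, 0, 4, 4), (10, 10, 12, 12)]
obj = (3, 3, 5, 5)
print(choose_mbr(mbrs, obj))          # 0  (growth 9 versus 77)
print(intersects(mbrs[0], obj))       # True
print(intersects(mbrs[1], obj))       # False
```

In a full R-tree the same choice is repeated at every level on the way down to a leaf, and a node that overflows is split; neither step is shown here.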
7. An R+ tree is a method for looking up data using a location, often (x, y) coordinates,
and often for locations on the surface of the earth. Searching on one number is a solved
problem; searching on two or more, and asking for locations that are nearby in both x and
y directions, requires craftier algorithms.
Fundamentally, an R+ tree is a tree data structure, a variant of the R tree, used for indexing
spatial information.
In the figure above, the spatial data configuration from D to M is the same as in the R-tree configuration,
but the areas of MBRs A and C are different. In an R+ tree, areas that would overlap on the same level can, if
needed, be covered from different nodes, so an object may be stored more than once; here J and D are redundant.
Because of this redundancy, a search for the area where F and J intersect needs to examine only MBR A and
does not have to search C.
8. R*-trees are a variant of R-trees used for indexing spatial information. R*-trees have slightly higher
construction cost than standard R-trees, as the data may need to be reinserted; but the resulting tree will
usually have a better query performance. Like the standard R-tree, it can store both point and
spatial data. It was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard
Seeger in 1990.
9. Hilbert R-tree, an R-tree variant, is an index for multidimensional objects like lines, regions, 3-D objects,
or high-dimensional feature-based parametric objects. It can be thought of as an extension to B+-tree for
multidimensional objects.
The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node.
Hilbert R-trees use space-filling curves, and specifically the Hilbert curve, to impose a linear ordering on
the data rectangles.
There are two types of Hilbert R-trees, one for static databases, and one for dynamic databases. In both cases
Hilbert space-filling curves are used to achieve better ordering of multidimensional objects in the node. This
ordering has to be ‘good’, in the sense that it should group ‘similar’ data rectangles together, to minimize the
area and perimeter of the resulting minimum bounding rectangles (MBRs). Packed Hilbert
R-trees are suitable for static databases in which updates are very rare or in which there are
no updates at all.
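The linear ordering can be sketched for the static (packed) case: compute the Hilbert value of each rectangle's centre, sort by it, and cut the sorted list into leaf nodes. The conversion below follows the widely used iterative formulation of the Hilbert curve on an n x n grid; the integer coordinates, the grid size `n` (a power of two), the leaf `capacity` and the function names are assumptions of the example.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                   # rotate/flip the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack_leaves(rects, n, capacity):
    """Sort rectangles by the Hilbert value of their centre and group them."""
    def key(r):
        cx = (r[0] + r[2]) // 2
        cy = (r[1] + r[3]) // 2
        return hilbert_index(n, cx, cy)
    ordered = sorted(rects, key=key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]

rects = [(0, 0, 1, 1), (6, 6, 7, 7), (0, 2, 1, 3), (6, 0, 7, 1)]
leaves = pack_leaves(rects, n=8, capacity=2)
print(leaves)
```

Because consecutive Hilbert values always belong to neighbouring cells, rectangles that end up in the same leaf tend to be close together, which keeps the leaf MBRs small.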
10. An X-tree (for eXtended node tree) is an index tree structure based on the R-tree used
for storing data in many dimensions. It appeared in 1996, and differs from R-trees (1984),
R+-trees (1987) and R*-trees (1990) because it emphasizes prevention of overlap in the
bounding boxes, which increasingly becomes a problem in high dimensions. In cases
where nodes cannot be split without introducing overlap, the node split will be deferred,
resulting in super-nodes. In extreme cases, the tree will linearize, which defends against
worst-case behaviors observed in some other data structures.
11. A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing
points in a k-dimensional space (Bentley 1975). k-d trees are a useful data structure for several
applications, such as searches involving a multidimensional search key (e.g. range searches
and nearest neighbor searches). k-d trees are a special case of binary space partitioning trees.
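A minimal sketch of a k-d tree with a range search. Splitting at the median and cycling through the axes by depth is one common construction and is an assumption here, as are the dictionary-based node layout and the function names.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of equal-length coordinate tuples."""
    if not points:
        return None
    axis = depth % len(points[0])             # cycle through the k axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                    # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def range_search(node, lo, hi, out=None):
    """Collect all points p with lo[i] <= p[i] <= hi[i] on every axis i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[axis] <= p[axis]:                   # range reaches into the left half
        range_search(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:                   # range reaches into the right half
        range_search(node["right"], lo, hi, out)
    return out

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kd = build_kdtree(pts)
print(kd["point"])                                   # (7, 2)
print(sorted(range_search(kd, (3, 1), (8, 5))))      # [(5, 4), (7, 2), (8, 1)]
```

Each node splits space with an axis-aligned line (a hyperplane in higher dimensions), which is why the structure counts as a special case of binary space partitioning.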
12. M-trees are tree data structures that are similar to R-trees and B-trees. An M-tree is constructed
using a metric and relies on the triangle inequality for efficient range and k-nearest
neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can
also have large overlap and there is no clear strategy on how to best avoid overlap. In
addition, it can only be used for distance functions that satisfy the triangle inequality, while
many advanced dissimilarity functions used in information retrieval do not satisfy this.
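How the triangle inequality saves distance computations can be sketched with a single pivot object: if the distance from every stored object to the pivot is known in advance, then |d(q, p) - d(p, o)| is a lower bound on d(q, o), and objects whose bound already exceeds the search radius can be skipped. This shows only the pruning rule, not a full M-tree; the Euclidean metric, the flat list of entries and the function names are assumptions of the example.

```python
import math

def build_entries(pivot, objects, dist=math.dist):
    """Precompute the distance from each object to the pivot."""
    return [(obj, dist(pivot, obj)) for obj in objects]

def range_query(pivot, entries, q, r, dist=math.dist):
    """Return (objects within distance r of q, number of distance evaluations)."""
    d_qp = dist(q, pivot)
    evaluations = 1
    hits = []
    for obj, d_po in entries:
        # Triangle inequality: d(q, obj) >= |d(q, pivot) - d(pivot, obj)|.
        if abs(d_qp - d_po) > r:
            continue                  # cannot be within r, skip without computing
        evaluations += 1
        if dist(q, obj) <= r:
            hits.append(obj)
    return hits, evaluations

pivot = (0, 0)
entries = build_entries(pivot, [(1, 0), (5, 0), (10, 0)])
print(range_query(pivot, entries, q=(9, 0), r=1.5))   # ([(10, 0)], 2)
```

The shortcut is only valid when the distance function is a metric, which is exactly the restriction stated above.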
13. Point access method.
14. Binary space partitioning (BSP) is a method for recursively subdividing a space into
convex sets by hyperplanes. This subdivision gives rise to a representation of objects
within the space by means of a tree data structure known as a BSP tree.
Binary space partitioning was developed in the context of 3D computer graphics, where
the structure of a BSP tree allows spatial information about the objects in a scene that is
useful in rendering, such as their ordering from front-to-back with respect to a viewer at a
given location, to be accessed rapidly. Other applications include performing geometrical
operations with shapes (constructive solid geometry) in CAD, collision detection in
robotics and 3-D video games, ray tracing and other computer applications that involve
handling of complex spatial scenes.
i. Start with a list of lines (or in 3-D, polygons)
making up the scene. In the tree diagrams, lists are
denoted by rounded rectangles and nodes in the
BSP tree by circles. In the spatial diagram of the
lines, the direction chosen to be the 'front' of a line
is denoted by an arrow.
Following the steps of the algorithm above,
1. We choose a line, A, from the list and,...
2. ...add it to a node.
3. We split the remaining lines in the list into
those in front of A (i.e. B2, C2, D2), and
those behind (B1, C1, D1).
4. We first process the lines in front of A (in
steps ii–v),...
5. ...followed by those behind (in steps vi–vii).
ii. We now apply the algorithm to the list of lines in
front of A (containing B2, C2, D2). We choose a
line, B2, add it to a node and split the rest of the list
into those lines that are in front of B2 (D2), and
those that are behind it (C2, D3).
iii. Choose a line, D2, from the list of lines in front of
B2. It is the only line in the list, so after adding it to
a node, nothing further needs to be done.
iv. We are done with the lines in front of B2, so
consider the lines behind B2 (C2 and D3). Choose
one of these (C2), add it to a node, and put the other
line in the list (D3) into the list of lines in front of
C2.
v. Now look at the list of lines in front of C2. There is
only one line (D3), so add this to a node and
continue.
vi. We have now added all of the lines in front of A to
the BSP tree, so we now start on the list of lines
behind A. Choosing a line (B1) from this list, we
add B1 to a node and split the remainder of the list
into lines in front of B1 (i.e. D1), and lines behind
B1 (i.e. C1).
vii. Processing first the list of lines in front of B1, D1 is
the only line in this list, so add this to a node and
continue.
viii. Looking next at the list of lines behind B1, the only
line in this list is C1, so add this to a node, and the
BSP tree is complete.
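The walkthrough above can be sketched in code for 2-D line segments: take the first segment as the node, split every remaining segment into the part in front of it and the part behind it, and recurse on both lists. Treating the left-hand side of a segment's direction as its "front", always choosing the first segment in the list, and the names used are assumptions of the example; the slides mark the front of each line with an arrow instead.

```python
EPS = 1e-9

def side(line, p):
    """Positive if p is in front of (left of) line, negative if behind."""
    (ax, ay), (bx, by) = line
    return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

def split(line, seg):
    """Return (front_parts, back_parts) of seg relative to line."""
    p, q = seg
    sp, sq = side(line, p), side(line, q)
    if sp >= -EPS and sq >= -EPS:
        return [seg], []              # entirely in front (or on the line)
    if sp <= EPS and sq <= EPS:
        return [], [seg]              # entirely behind
    t = sp / (sp - sq)                # where seg crosses the splitting line
    m = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    if sp > 0:
        return [(p, m)], [(m, q)]
    return [(m, q)], [(p, m)]

def build_bsp(segments):
    """Recursively build a BSP tree from a list of segments."""
    if not segments:
        return None
    root, rest = segments[0], segments[1:]
    front, back = [], []
    for seg in rest:
        f, b = split(root, seg)
        front += f
        back += b
    return {"line": root, "front": build_bsp(front), "back": build_bsp(back)}

def count_nodes(node):
    return 0 if node is None else 1 + count_nodes(node["front"]) + count_nodes(node["back"])

a = ((0, 0), (4, 0))      # horizontal segment; its front is the side y > 0
b = ((2, -2), (2, 2))     # vertical segment crossing a at (2, 0)
bsp = build_bsp([a, b])
print(count_nodes(bsp))   # 3: a, plus the two halves of b
```

As in the walkthrough, a segment that straddles the chosen line is cut in two, so the finished tree can contain more nodes than there were input segments.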
Hybrid index
Creating geometry data indexes in Oracle10g
After the data has been entered into the tables, a spatial index must be created for each table to
provide faster searching and efficient access to the data. This is done by
covering the geometries with tiles.
Oracle Spatial provides three spatial indexing methods for the object-relational
database:
1) fixed;
2) hybrid;
3) R-tree indexing.
Indexing can use tiles of fixed or variable size. Fixed-size tiles are
controlled by their resolution. Variable-size tiles are controlled by a number that
sets their maximum count. The resolution of the fixed-size tiles and the number of
variable-size tiles are user-set parameters, SDO_LEVEL and SDO_NUMTILES respectively.
The smaller the fixed-size tiles and the larger the number of variable-size tiles, the
more precise the approximation.
Spatial supports two combinations of SDO_LEVEL and SDO_NUMTILES.
The first, a non-zero SDO_LEVEL value with a zero SDO_NUMTILES value, results in
fixed-size tiles (fixed indexing).
The second, a non-zero SDO_LEVEL value with a non-zero SDO_NUMTILES value, results
in hybrid indexing. Hybrid indexing generates two sets of tiles for each
geometry: the first contains fixed-size tiles and the second variable-size tiles.
Hybrid indexes provide fairly efficient memory allocation. Hybrid indexes are created on table
columns that contain geometry coordinates.
An R-tree can index spatial data in up to four dimensions, whereas quadtree indexing
(fixed and hybrid indexes) supports only two dimensions. The parameters SDO_LEVEL and
SDO_NUMTILES do not have to be supplied when creating an R-tree index. Here the index
approximates each geometry by its smallest possible bounding rectangle
(Minimum Bounding Rectangle, MBR).
Defining geometry data indexes
Index types:
1) fixed;
2) hybrid;
3) R-tree indexing.
Creating a fixed index:
-- SDO_LEVEL on its own (no SDO_NUMTILES) gives fixed-size tiles
CREATE INDEX IND_CELTNES ON CELTNES(CELTNE)
INDEXTYPE IS MDSYS.SPATIAL_INDEX
PARAMETERS('SDO_LEVEL = 4');
Creating a hybrid index:
CREATE INDEX IND_KOKI ON KOKI(KOKS)
INDEXTYPE IS MDSYS.SPATIAL_INDEX
PARAMETERS('SDO_LEVEL = 4, SDO_NUMTILES =4');
Creating an R-tree index:
-- no SDO_LEVEL or SDO_NUMTILES parameters are supplied for an R-tree
CREATE INDEX IND_IELAS ON IELAS(IELA)
INDEXTYPE IS MDSYS.SPATIAL_INDEX;
Metadata about the created indexes can be viewed with the view
ALL_SDO_INDEX_INFO:
SELECT INDEX_NAME, TABLE_NAME, COLUMN_NAME, SDO_INDEX_TYPE
FROM ALL_SDO_INDEX_INFO;

INDEX_NAME     TABLE_NAME   COLUMN_NAME   SDO_INDEX_TYPE
-------------  -----------  ------------  --------------
IND_CELTNES    CELTNES      CELTNE        QTREE
IND_IELAS      IELAS        IELA          RTREE
IND_KOKI       KOKI         KOKS          QTREE
Index performance comparison
The figure shows data being loaded from all 7 layer tables, where every table
has a fixed index with SDO_LEVEL=4. As can be seen, the loading time differs
between layers: the street layer takes longest to load, followed by the
water-body and house layers, while the structure, railway, tree and path
layers load fastest (47 ms).
The figure shows how the data is loaded using a hybrid index with
SDO_LEVEL=4 and SDO_NUMTILES=4. A hybrid index is used for all layer tables.
Compared with the fixed index, the results for the hybrid index are
worse, possibly because unsuitable parameter values were chosen.
For example, loading the structure layer takes 296 ms from the table with a hybrid index, but
47 ms with a fixed index.
The figure shows data being loaded with R-tree indexes applied to all layer
tables. Compared with loading using the fixed and hybrid indexes, the R-tree is the fastest in this case, and it is also the fastest for every layer.
The table shows the results of the performance tests for the table MAJAS. As already
concluded above, these tests show that the R-tree index is the
fastest (at least for visualization with Map Builder).
Index performance comparison
Table: MAJAS

Index     Time
Fixed     62 ms
Hybrid    172 ms
R-tree    31 ms
Summary of the index performance comparison

                Fixed indexes             Hybrid indexes           R-tree indexes
Table           1st     2nd     3rd       1st     2nd     3rd      1st     2nd     3rd
CITI_OBJEKTI    3009    16      16        4       3       369      7       141     17
EKAS            2962    7       15        7       1       44       7       62      14
IELAS           2768    16      17        2       2       65       6       180     14
KOKI            2939    14      8         8       1       48       7       77      15
UDENSTILPNES    2742    16      15        6       3       187      7       69      11
VID_N           2884    13.8    14.2      5.4     2       142.6    6.8     105.8   14.2
VID_REZ         970.6666667               50                       42.26666667

(1st, 2nd, 3rd: first, second and third result. VID_N: mean of each column. VID_REZ: mean of the three VID_N values for each index type.)