DATA STRUCTURES USED IN
SPATIAL DATA MINING
What is spatial data?

Spatial data can broadly be defined as data which covers
multidimensional points, lines, rectangles,
polygons, cubes and other geometric objects.
Spatial data occupies a certain amount of space,
called its spatial extent, which is characterized by
location and boundary.
USES
• Geographic Information Systems (GIS)
• CAD/CAM
• Multimedia applications
– Content-based image retrieval
– Fingerprint matching
– MRI (digitized medical images)
Features of spatial data

Specific features of spatial data are rich data
types, implicit spatial relationships among the
variables, observations that are not independent,
and spatial autocorrelation among the features.
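Spatial autocorrelation means that observations at nearby locations tend to be similar (or dissimilar) instead of independent. One widely used way to measure it, which the slides do not name, is Moran's I. The sketch below assumes a simple neighbor weight matrix (1 for adjacent locations, 0 otherwise); the function name `morans_i` is illustrative.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Moran's I: I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2.

    Assumes weights[i][j] > 0 when locations i and j are neighbors and
    weights[i][i] == 0. Positive I: similar values cluster in space;
    negative I: neighboring values tend to differ.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_total = sum(sum(row) for row in weights)  # W, the sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Four locations along a line; each is a neighbor of the adjacent ones.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], line_weights))  # smooth trend: positive (about 0.33)
print(morans_i([1, 4, 1, 4], line_weights))  # alternating: negative (about -1.0)
```

Because neighboring observations are correlated like this, methods that assume independent samples can be misleading on spatial data.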

Spatial data has two distinct types of attributes:
spatial attributes and non-spatial attributes. Spatial
attributes are used to define the spatial location and
extent of spatial objects.
Types of spatial data

Region data: a region has a spatial extent, with a
location and a boundary. Region data is basically a
geometric approximation to an actual object.

Point data: point data consists of a collection of
points in a multidimensional space. A point does not
cover any area of space.
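A minimal sketch of the two kinds of data, assuming 2D coordinates: point data is just coordinate tuples, while a region (here a polygon given by its boundary vertices) is approximated by its minimum bounding rectangle (MBR), which records its location and extent. `Rect` and `mbr` are hypothetical names, not a library API.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    """Axis-aligned rectangle used as the geometric approximation of a region."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def mbr(vertices: Sequence[Point]) -> Rect:
    """Minimum bounding rectangle of a region given by its boundary vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return Rect(min(xs), min(ys), max(xs), max(ys))


well: Point = (3.0, 4.0)                                  # point data: no area
lake = [(1.0, 1.0), (4.0, 0.5), (5.0, 3.0), (2.0, 4.0)]   # region data
box = mbr(lake)
print(box)                 # Rect(xmin=1.0, ymin=0.5, xmax=5.0, ymax=4.0)
print(box.area())          # 14.0
print(box.contains(well))  # True
```

The well lies inside the lake's MBR even though it lies outside the polygon itself, so an MBR is only an approximation; it is typically used as a cheap filter before an exact geometric test.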
What is Spatial Data Mining?

It is defined as the non-trivial search for
interesting and unexpected spatial patterns in
spatial databases.
• Spatial data mining is needed to gain new understanding of
geographic processes and to answer critical questions such as
"How healthy is planet Earth?" and "What are the effects of
human activity on the environment and ecology?"
Spatial data in GIS


A geographic information system (GIS) is any
system for capturing, storing, analyzing and
managing data and associated attributes that are
spatially referenced to the Earth.
There are two broad methods used to store data
in a GIS: raster and vector. In a GIS,
geographical features are often expressed as
vectors, i.e. as geometric shapes such as
points, chains and polygons.
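To contrast the two storage models, the sketch below keeps a polygon in vector form (a list of vertices) and converts it to a raster by marking each grid cell whose center falls inside the polygon. The ray-casting point-in-polygon test, the 1x1 cell size and the function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray casting: toggle `inside` each time a ray from p towards +x crosses an edge."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """Raster model: 1x1 cells, value 1 where the cell center lies in the polygon.

    Row 0 is the bottom row (y from 0 to 1).
    """
    return [[1 if point_in_polygon((col + 0.5, row + 0.5), poly) else 0
             for col in range(width)]
            for row in range(height)]


# Vector model: a triangle stored as the coordinates of its vertices.
triangle = [(0.0, 0.0), (4.2, 0.0), (0.0, 4.2)]
for row in rasterize(triangle, 4, 4):
    print(row)
```

The vector form stays exact at any scale, while the raster form trades precision (set by the cell size) for a simple grid layout.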
Spatial data structures used in GIS
In order to handle spatial data efficiently, as
required in computer-aided design and geodata applications, a database system needs
an index mechanism that helps it
retrieve data items quickly according to
their spatial locations.
• Quadtree
• k-d tree
• R-tree
• R+-tree
• R*-tree
Quad trees
• A quadtree is used to index objects in two-dimensional space.
• Each node of a quadtree is associated with a
rectangular region of space.
• The top node is associated with the entire target
space.
• Each internal node splits its region into four disjoint
subspaces (quadrants) along the axes.
• Each of these subspaces is split recursively until there
is at most one object inside each of them.
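The recursive splitting described above can be sketched as a point-region (PR) quadtree over a square area. This is a minimal illustration, not a production index: points are assumed to lie inside the root's half-open rectangle, a leaf is split as soon as it holds more than one point, and the `max_depth` cap is an added assumption that stops endless splitting when two points are identical. The range query shows the benefit: quadrants that do not intersect the query rectangle are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """Quadtree node covering the half-open rectangle [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None  # the four quadrants, once split

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point, depth: int = 0, max_depth: int = 16) -> None:
        if self.children is None:
            self.points.append(p)
            # Split a leaf that now holds more than one object; the depth cap
            # (an added safeguard) stops identical points splitting forever.
            if len(self.points) > 1 and depth < max_depth:
                self._split(depth, max_depth)
            return
        for child in self.children:
            if child.contains(p):
                child.insert(p, depth + 1, max_depth)
                return

    def _split(self, depth: int, max_depth: int) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [
            QuadNode(self.x0, self.y0, mx, my), QuadNode(mx, self.y0, self.x1, my),
            QuadNode(self.x0, my, mx, self.y1), QuadNode(mx, my, self.x1, self.y1),
        ]
        old, self.points = self.points, []
        for p in old:  # push the stored points down into the quadrants
            self.insert(p, depth, max_depth)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Points inside the closed query rectangle; disjoint quadrants are skipped."""
        if qx1 < self.x0 or qx0 >= self.x1 or qy1 < self.y0 or qy0 >= self.y1:
            return []
        if self.children is None:
            return [p for p in self.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        return [p for c in self.children for p in c.query(qx0, qy0, qx1, qy1)]


root = QuadNode(0, 0, 8, 8)
for pt in [(1, 1), (2, 6), (5, 5), (6, 2), (7, 7)]:
    root.insert(pt)
print(sorted(root.query(4, 4, 8, 8)))  # [(5, 5), (7, 7)]
```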
Figure: Division of space by a quadtree
k-d Trees
• At each node, a k-d tree partitions the space into two subspaces
according to one of the coordinates of the splitting
point.

Let level(nod) be the length of the path from the root
to the node nod, and let the axes be numbered
from 0 to k − 1. At every node nod, the space is split
according to coordinate number (level(nod) mod k).

The partitioning is done along one dimension at the
node at the top level of the tree, along another
dimension in nodes at the next level, and so on,
cycling through the dimensions.
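A short sketch of the construction rule above: the node at level l splits on coordinate l mod k, here using the median point along that coordinate as the splitting point (the median choice is an assumption; the slides do not fix how splitting points are chosen). The `nearest` function shows a typical use: the subtree on the far side of a splitting plane is visited only if that plane is closer than the best point found so far.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int                        # coordinate used to split at this node
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build(points: Sequence[Point], level: int = 0) -> Optional[KDNode]:
    """Split on coordinate (level mod k), using the median point as the splitter."""
    if not points:
        return None
    k = len(points[0])
    axis = level % k
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], level + 1),
                  build(pts[mid + 1:], level + 1))


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Nearest-neighbor search that prunes subtrees beyond the current best distance."""
    if node is None:
        return best
    if best is None or dist(target, node.point) < dist(target, best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    if abs(diff) < dist(target, best):  # the far side may still hold a closer point
        best = nearest(far, target, best)
    return best


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point, tree.axis)   # (7, 2) 0
print(nearest(tree, (9, 2)))   # (8, 1)
```

Choosing the median keeps the tree balanced when all points are known in advance; inserting points one at a time instead can leave the tree unbalanced.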
Figure: Division of space by a k-d tree
R-Trees

An R-tree is a height-balanced tree structure with the index
records stored in its leaf nodes.

The structure is completely dynamic: insertions and deletions
need no periodic reorganization.

Let M be the maximum number of entries in one node and
m ≤ M/2 (for example, m = M/2) the minimum number of
entries allowed in a node other than the root.
R-Trees (continued)
• Every non-leaf node has between m and M children
unless it is the root.
• The root node has at least two children unless it is a leaf.
• For each index record (I, tuple-id) in a leaf node, I is the
smallest rectangle that spatially contains the n-dimensional
data object.
• For each (I, child-ptr) entry in a non-leaf node, I is the
smallest rectangle that spatially contains the rectangles in
the child node.
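The rules above can be illustrated with a small, hand-built two-level R-tree in 2D. This sketch does not implement Guttman's insertion or node-splitting algorithms; it shows only the range search, which descends into an entry only when its rectangle I overlaps the query, and a checker for the m/M fill rule, the root rule, the enclosing-rectangle rule and the balanced height. `Node`, `search`, `check`, the ids "a1" etc. and the values M = 4, m = 2 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclose(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains all of `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    leaf: bool
    # Leaf entries are (I, tuple_id); non-leaf entries are (I, child Node).
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def search(node: Node, query: Rect) -> List[Any]:
    """Return the tuple ids of all objects whose rectangle I overlaps `query`."""
    hits: List[Any] = []
    for rect, item in node.entries:
        if overlaps(rect, query):  # subtrees whose I misses the query are pruned
            if node.leaf:
                hits.append(item)
            else:
                hits.extend(search(item, query))
    return hits


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Check the R-tree rules and return the subtree height (a leaf has height 1)."""
    n = len(node.entries)
    if n > M or (not is_root and n < m):
        raise ValueError("entry count outside [m, M]")
    if node.leaf:
        return 1
    if is_root and n < 2:
        raise ValueError("a non-leaf root needs at least two children")
    heights = set()
    for rect, child in node.entries:
        if rect != enclose([r for r, _ in child.entries]):
            raise ValueError("I is not the smallest rectangle containing the child")
        heights.add(check(child, m, M, is_root=False))
    if len(heights) != 1:
        raise ValueError("tree is not balanced")
    return heights.pop() + 1


# Hand-built example with M = 4, m = 2 (illustrative values).
leaf_a = Node(True, [((0, 0, 1, 1), "a1"), ((2, 2, 3, 3), "a2")])
leaf_b = Node(True, [((6, 6, 7, 7), "b1"), ((8, 5, 9, 6), "b2")])
root = Node(False, [(enclose([r for r, _ in leaf.entries]), leaf)
                    for leaf in (leaf_a, leaf_b)])
print(check(root, m=2, M=4))               # 2
print(search(root, (2.5, 2.5, 6.5, 6.5)))  # ['a2', 'b1']
```

Because sibling rectangles in an R-tree may overlap, a query can still have to follow several branches; the R+-tree and R*-tree variants target exactly that cost.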
Figure: Division of space by an R-tree
R+-tree
• The R+-tree is an extension of the R-tree.
• The bounding rectangles of nodes at the same level do not
overlap. This decreases the number of branches that must be
searched, which reduces query time at the cost of increased
space consumption.
• Data objects may be split, so that different parts of one
object can be stored in several nodes of the same tree
level.
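The effect of non-overlapping node regions can be sketched as follows, assuming the node regions at one level are already fixed and disjoint (how an R+-tree chooses and splits those regions is not shown). Following the slide's description, an object that crosses a region boundary is split, and each clipped part is stored in the node it falls into. All names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Part of `a` that lies inside `b`, or None if they share no area."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def distribute(objects: Dict[str, Rect],
               partitions: List[Rect]) -> List[List[Tuple[str, Rect]]]:
    """Store each object, clipped, in every disjoint node region it intersects."""
    buckets: List[List[Tuple[str, Rect]]] = [[] for _ in partitions]
    for oid, rect in objects.items():
        for i, part in enumerate(partitions):
            piece = intersection(rect, part)
            if piece is not None:
                buckets[i].append((oid, piece))
    return buckets


def point_query(buckets: List[List[Tuple[str, Rect]]],
                partitions: List[Rect], p: Point) -> List[str]:
    """Because node regions do not overlap, only one node has to be visited."""
    for bucket, part in zip(buckets, partitions):
        if part[0] <= p[0] < part[2] and part[1] <= p[1] < part[3]:
            return [oid for oid, r in bucket
                    if r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]]
    return []


# Two disjoint node regions that together tile [0, 10) x [0, 10).
parts = [(0, 0, 5, 10), (5, 0, 10, 10)]
objs = {"road": (2, 4, 8, 5), "lake": (6, 6, 9, 9)}
buckets = distribute(objs, parts)
for i, bucket in enumerate(buckets):
    print(i, bucket)
# 0 [('road', (2, 4, 5, 5))]
# 1 [('road', (5, 4, 8, 5)), ('lake', (6, 6, 9, 9))]
print(point_query(buckets, parts, (7, 4.5)))  # ['road']
```

The road is stored twice, which is the extra space consumption mentioned above; in exchange, the point query visits only the one node whose region contains the point.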
R+-tree (continued)
• The root has at least two children unless it is a leaf.
• All leaves are at the same level.
• There is no constraint on the minimum number of
entries in each node.
Figure: Division of space by an R+-tree
R*-tree


The R*-tree is a modification of the R-tree. The R-tree tries to
minimize only the area of all nodes of the tree,
whereas the R*-tree combines several criteria:
• the area covered by a bounding rectangle;
• the margin of a rectangle: minimizing the margin (perimeter)
of a bounding rectangle favors square-like rectangles;
• the overlap between rectangles: minimizing the
overlap between rectangles decreases the number of
paths that must be searched.
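The three criteria can be computed directly for axis-aligned rectangles. This sketch only shows how area, margin and overlap distinguish a good split of a node from a poor one; it is not the full R*-tree split algorithm (which, in the original R*-tree design, picks the split axis by total margin and then the distribution with the least overlap, breaking ties by area). The function names are illustrative.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Perimeter of r; for a fixed area it is smallest for a square."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def enclose(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def split_criteria(g1: List[Rect], g2: List[Rect]) -> Tuple[float, float, float]:
    """(overlap, total area, total margin) of the two nodes produced by a split."""
    b1, b2 = enclose(g1), enclose(g2)
    return overlap(b1, b2), area(b1) + area(b2), margin(b1) + margin(b2)


# Same area, different margins: the square is preferred.
print(area((0, 0, 2, 2)), margin((0, 0, 2, 2)))  # 4 8
print(area((0, 0, 4, 1)), margin((0, 0, 4, 1)))  # 4 10

# Two ways to split four entries into two nodes.
e = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 3, 1, 4), (1, 3, 2, 4)]
print(split_criteria(e[:2], e[2:]))                # (0.0, 4, 12)
print(split_criteria([e[0], e[3]], [e[1], e[2]]))  # (8, 16, 24)
```

The first split (bottom pair versus top pair) wins on all three criteria: its nodes do not overlap, so a search needs to follow at most one of them for a point query.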
Conclusion
New techniques are needed for spatial data mining (SDM) because of
spatial autocorrelation and the continuity of space.
The indexing structures discussed above are very
useful for spatial data represented in a
vector space. For metric spaces, the M-tree, vp-tree and
mvp-tree are used. The main aim of all these
indexing structures is to minimize disk accesses.