Spatial-enabled Mining in Oracle
Ravi Kothuri
Spatial Technologies
Oracle USA
Dagstuhl 2006
Copyright Oracle Corporation
Oracle Spatial: Store, Analyze and Visualize Spatial Data
• Oracle10g Spatial
– Spatial data types: vector (feature/topological), raster, network types
– Spatial relationships, route computation, raster manipulation
– Versioning
– Visualization (Mapviewer)
• Scalability & Seamless Integration for Spatial Data
Oracle Spatial: Future Projects
• 3-D
– Extensions to SDO_GEOMETRY
• Composite Surface and Composite/Multi-Solid
• Support different operators: Anyinteract, Filter, NN,
Within_distance
– Scalable Storage and Management of PointCloud Data:
Partitioning and Visibility Query (LOD)
• TIN generation: need to experiment with variety
of approaches
• Intelligent Map Caching, WFS,…
Oracle Data Mining
• Preprocessing, data cleanup: a number of
transformations and normalization functions
– Binning, Spatial Binning,…
• Data Mining Functions:
– Classification: Decision Trees, Adaptive Bayes,…
– Clustering: KMeans, KModes, Oracle-specific
• Spatial: BIRCH+Agglomerative Clustering
– Association Rules: Apriori
– Regression:
• SVM with linear kernel and more…
Robust Framework for Mining Data in Oracle
Spatial Data Mining
• Where result patterns have a spatial component
– Clustering
– Colocation of data items
• Spatial-enabled: Include Spatial Info in Data Mining
– Information is implicit (not materialized)
– What information to materialize?
• Spatial correlation with target data (e.g., habitats of birds)
• Spatial auto-correlation in Regression
– Target variable Y = a X + p W Y
– where p is the spatial autocorrelation coefficient and W is the neighborhood matrix
• First step: materialize target variable estimates
– How to incorporate spatial auto-correlation
• Materialize spatial information, estimates as additional attributes
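
The autoregressive form on this slide can be rearranged to show why neighbor values have to be materialized before a standard mining function can use them. The short derivation below is a sketch, assuming the matrix I − pW is invertible; the identity matrix I is a symbol introduced here, not on the slide.

```latex
\begin{aligned}
Y &= aX + pWY \\
(I - pW)\,Y &= aX \\
Y &= (I - pW)^{-1}\,aX
\end{aligned}
```

Because Y appears on both sides of the first line, the term W Y (the neighbor-weighted target values) is what the "materialize target variable estimates" step approximates as an extra attribute.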
Materializing Neighborhood Influence
• Compute a weighted-sum of interesting
information (target variable, other attributes)
from neighbors
– E.g., if you are estimating CRIME for a region/point T,
take a “distance-based” weighted sum of the crime of its
neighbors.
– Additionally, you can also estimate population in a
10-mile radius (based on race), etc.
– For two neighbors A and B of T, with C the crime value and d the distance:
C(T) = ( C(A)/d(A,T) + C(B)/d(B,T) ) / ( 1/d(A,T) + 1/d(B,T) )
– Oracle Spatial provides specific functions to
compute such neighborhood-based estimates
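
The weighted sum above generalizes to any number of neighbors. The following is a minimal Python sketch of that inverse-distance weighting. It is not the Oracle Spatial function the slide refers to; the function name `idw_estimate`, the planar Euclidean distance, and the sample coordinates are all assumptions made for illustration.

```python
import math


def idw_estimate(target, neighbors):
    """Inverse-distance weighted estimate at `target`.

    target:    (x, y) location T (planar coordinates, an assumption).
    neighbors: list of ((x, y), value) pairs, e.g. crime at A, B, ...
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # A neighbor at the target location: use its value directly.
            return value
        weighted_sum += value / d
        weight_total += 1.0 / d
    if weight_total == 0.0:
        raise ValueError("at least one neighbor is required")
    return weighted_sum / weight_total


# Hypothetical neighbors A (distance 1, crime 10) and B (distance 2, crime 40).
estimate = idw_estimate((0.0, 0.0), [((1.0, 0.0), 10.0), ((0.0, 2.0), 40.0)])
```

With the sample data the closer neighbor A counts twice as much as B, so the estimate lands nearer to A's value than a plain average would.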
Spatial-enabled Mining
[Diagram: Table → Neighborhood Estimates (e.g., population in 2 miles, crime in neighborhood, …) → Augmented Table → Oracle Data Mining → Mining Results]
Spatial-enabled Mining
[Diagram labels: Mapviewer; applications; ODM (Classification, Regression, Association Rules, …); Spatial Analysis building blocks (Spatial Binning, Spatial Estimates, Clustering for polygons with BIRCH+agglomerative)]
Case Study for Spatial-enabled Mining:
How helpful are these estimates?
• Test on a specific dataset
– US Block groups from Census for CA (21K)
– Crime Data for US Blockgroups (from a partner
company)
• Crimerate is number of crimes per 1000 of population
– Separate the data into TRAINING data and TEST
data
– Compute Data Mining models using TRAINING
data
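
The data preparation steps above can be sketched in Python as follows. This is an illustrative sketch only: the record fields, the helper names `crime_rate` and `split_train_test`, and the random 80/20 split are assumptions. The talk's actual TEST set was a geographic selection (locations in the San Francisco area), not a random sample.

```python
import random


def crime_rate(crimes, population):
    """Crimes per 1000 of population, as defined on the slide."""
    return 1000.0 * crimes / population


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical block-group records; real data would come from the Census tables.
rows = [{"id": i, "crimes": i % 7, "population": 1000 + i} for i in range(100)]
for row in rows:
    row["crimerate"] = crime_rate(row["crimes"], row["population"])

train, test = split_train_test(rows)
```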
Evaluation
• Predict Crime for TEST regions with and without
spatial estimates using ODM’s Mining functions
– Test Regions: 450 locations in San Francisco area
– Classification (Adaptive Bayes Network)
• Create bins or “classes” of the data and results
• Measure how well the model predicts the “class” for new test regions
– Regression (Support Vector Machines)
• Predict the exact value of crimerate
– Estimates for spatial neighborhood
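
The two evaluation measures used here, classification accuracy over binned values and root-mean-square error for regression, can be sketched as below. The bin edges and the sample values are made up for illustration, and the helper names are hypothetical; ODM's own binning and scoring functions are not shown.

```python
import math


def to_bin(value, edges):
    """Index of the first bin whose upper edge is >= value (edges ascending)."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)  # value is above every edge: last bin


def accuracy(actual_classes, predicted_classes):
    """Fraction of test regions whose predicted class equals the actual class."""
    hits = sum(1 for a, p in zip(actual_classes, predicted_classes) if a == p)
    return hits / len(actual_classes)


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


edges = [10.0, 30.0]                    # hypothetical bins: low, medium, high
actual = [5.0, 12.0, 45.0, 28.0]        # hypothetical actual crimerates
predicted = [8.0, 35.0, 40.0, 20.0]     # hypothetical model output

acc = accuracy([to_bin(v, edges) for v in actual],
               [to_bin(v, edges) for v in predicted])
err = rmse(actual, predicted)
```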
Spatial Neighborhood
– How do you define neighborhood?
• Buffer around test location? Quarter-mile to 10 miles
• Nearest neighbors? 2 to 20
– Compute spatial estimates for crime
– Can also be done for population (white, asian, black,
hispanic, …)
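
The two neighborhood definitions above (a distance buffer versus a fixed number of nearest neighbors) can be illustrated with a short Python sketch. It uses brute-force planar distances and hypothetical helper names; in Oracle Spatial these selections would be expressed with operators such as Within_distance and NN, which are not reproduced here.

```python
import math


def distance(a, b):
    """Planar Euclidean distance between two (x, y) points (an assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def within_buffer(target, points, radius):
    """All points no farther than `radius` from `target`."""
    return [p for p in points if distance(target, p) <= radius]


def nearest_neighbors(target, points, k):
    """The `k` points closest to `target`, nearest first."""
    return sorted(points, key=lambda p: distance(target, p))[:k]


# Hypothetical locations around a test location at the origin.
points = [(1.0, 0.0), (0.0, 3.0), (2.0, 2.0), (5.0, 5.0)]
target = (0.0, 0.0)

in_buffer = within_buffer(target, points, 3.0)
nearest = nearest_neighbors(target, points, 2)
```

A buffer returns a variable number of neighbors depending on local density, while the nearest-neighbor form always returns k of them; this is one reason the two definitions can give different estimates for the same location.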
Some Results:
• Classification:
– Accuracy increases from 62% to 89% with 7 nearest
neighbors
• Regression:
– Root-Mean-Square-Error between predicted and
actual value improves from ~25 to 8 (5-7 neighbors)
• Detailed results in a white paper on
http://technet.oracle.com/products/spatial
• Visualize the results with Mapviewer
Summary of the case study
• Adding Neighborhood Influence to Data
– Improves classification accuracy from 62% to 89%
– Best Neighborhood for this case study: 5-7 neighbors or 2-mile
distance
• Details, Additions: White paper on OTN
– http://technet.oracle.com/products/spatial
• Recommendation for businesses: spatial-enable the data
– Always geocode customer/business locations
– Materialize demographic information from spatial neighborhood
– Test the data and perform mining tasks
More research needed…
• Current case study:
– SVM w/o spatial, although worse than with spatial, is
still good: Which attributes are helping?
• Colocation Mining
– “Co-location” of items as opposed to “co-occurrence”
in a transaction
– E.g., which sets of items are colocated and what are the
implications (interesting patterns)
– One approach: identify items that co-occur within
“tiled” regions
– Needs tighter integration with association rule mining
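
The "tiled regions" approach mentioned above can be sketched in Python: each tile plays the role of a transaction, and item types that fall in the same tile are counted as co-located. This is only one possible reading of the slide; the square grid, the tile size, the function names, and the sample items are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def tile_of(x, y, tile_size):
    """Grid cell (column, row) containing the point (x, y)."""
    return (math.floor(x / tile_size), math.floor(y / tile_size))


def colocated_pairs(items, tile_size):
    """Count, for each pair of item types, the tiles that contain both.

    items: list of (item_type, x, y) tuples.
    """
    types_by_tile = {}
    for item_type, x, y in items:
        types_by_tile.setdefault(tile_of(x, y, tile_size), set()).add(item_type)

    counts = Counter()
    for types in types_by_tile.values():
        # Each tile acts like one transaction holding a set of item types.
        for pair in combinations(sorted(types), 2):
            counts[pair] += 1
    return counts


# Hypothetical point items on a grid of 1.0 x 1.0 tiles.
items = [
    ("bank", 0.2, 0.3), ("cafe", 0.8, 0.1),
    ("cafe", 1.5, 0.5), ("gym", 1.2, 0.9), ("bank", 1.9, 0.2),
]
pairs = colocated_pairs(items, 1.0)
```

The per-tile sets could then be fed to an association-rule miner such as Apriori. A known weakness of fixed tiles is that two items just across a tile boundary are not counted as co-located, which is part of why the slide calls for tighter integration.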