Spatial-enabled Mining in Oracle
Ravi Kothuri
Spatial Technologies, Oracle USA
Dagstuhl 2006
Copyright Oracle Corporation

Oracle Spatial: Store, Analyze and Visualize Spatial Data
[Diagram labels: Spatial Data Types; Vector (feature/topological), Raster, Network types; Oracle10g Spatial; Spatial Relationships, Route Computation, Raster Manipulation, Versioning; Mapviewer; Visualization]
Scalability & Seamless Integration for Spatial Data

Oracle Spatial: Future Projects
• 3-D
  – Extensions to SDO_GEOMETRY
    • Composite Surface and Composite/Multi-Solid
    • Support different operators: Anyinteract, Filter, NN, Within_distance
  – Scalable Storage and Management of PointCloud Data: Partitioning and Visibility Query (LOD)
    • TIN generation: need to experiment with a variety of approaches
• Intelligent Map Caching, WFS,…

Oracle Data Mining
• Preprocessing, data cleanup: a number of transformations, normalization functions
  – Binning, Spatial Binning,…
• Data Mining Functions:
  – Classification: Decision Trees, Adaptive Bayes,…
  – Clustering: KMeans, KModes, Oracle-specific
    • Spatial: BIRCH+Agglomerative Clustering
  – Association Rules: Apriori
  – Regression:
    • SVM with linear kernel and more…
Robust Framework for Mining Data in Oracle

Spatial Data Mining
• Where result patterns have a spatial component
  – Clustering
  – Colocation of data items
• Spatial-enabled: Include Spatial Info in Data Mining
  – Information is implicit (not materialized)
  – What information to materialize?
    • Spatial correlation with target data (e.g., habitats of birds)
    • Spatial auto-correlation in Regression
      – Target variable: $Y = aX + \rho W Y$
      – where $\rho$ is the spatial autocorrelation and $W$ is the neighborhood matrix
    • First step: materialize target variable estimates
  – How to incorporate spatial auto-correlation
    • Materialize spatial information, estimates as additional attributes

Materializing Neighborhood Influence
• Compute a weighted sum of interesting information (target variable, other attributes) from neighbors
  – E.g., if you are estimating CRIME for a region/point T, take a “distance-based” weighted sum of the crime of its neighbors.
  – Additionally, you can also estimate population in a 10-mile radius (based on race), etc.
  [Figure: target T with neighbors A and B]
  $$C(T) = \frac{C(A)/d(A,T) + C(B)/d(B,T)}{1/d(A,T) + 1/d(B,T)}$$
  – Oracle Spatial provides specific functions to compute such neighborhood-based estimates

Spatial-enabled Mining
[Diagram: Table + Neighborhood Estimates (e.g. population in 2 miles, crime in neighborhood,…) → Augmented Table → Oracle Data Mining → Mining Results]

Spatial-enabled Mining
[Diagram layers: Mapviewer; ODM applications: Classification, Regression, Association Rules,…; Spatial Analysis (building blocks): Spatial Binning, Spatial Estimates, Clustering for polygons (BIRCH+agglomerative)]

Case Study for Spatial-enabled Mining: How helpful are these estimates?
• Test on a specific dataset
  – US block groups from Census for CA (21K)
  – Crime data for US block groups (from a partner company)
    • Crimerate is the number of crimes per 1000 of population
  – Separate the data into TRAINING data and TEST data
  – Compute Data Mining models using TRAINING data

Evaluation
• Predict crime for TEST regions with and without spatial estimates using ODM’s mining functions
  – Test regions: 450 locations in the San Francisco area
  – Classification (Adaptive Bayes Network)
    • Create bins or “classes” of the data and results
    • Measure how well the model predicts the “class” for new test regions
  – Regression (Support Vector Machines)
    • Predict the exact value of crimerate (regression analysis using SVM)
  – Estimates for spatial neighborhood

Spatial Neighborhood
  – How do you define neighborhood?
    • Buffer around test location? Quarter-mile to 10 miles
    • Nearest neighbors? 2 to 20
  – Compute spatial estimates for crime
  – Can also be done for population (white, asian, black, hispanic,..)
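The distance-weighted estimate C(T) from the "Materializing Neighborhood Influence" slide, combined with the two neighborhood definitions above (buffer or k nearest neighbors), can be sketched as follows. This is a minimal illustration and not Oracle Spatial's implementation: the function name `idw_estimate`, its parameters, and the use of planar Euclidean distance are assumptions made for the example.

```python
import math

def idw_estimate(target, neighbors, k=None, radius=None):
    """Inverse-distance-weighted estimate of a value at `target`.

    target:    (x, y) of the location T (planar coordinates, an assumption)
    neighbors: list of ((x, y), value) pairs with known values, e.g. crimerate
    k:         if given, keep only the k nearest neighbors
    radius:    if given, keep only neighbors within this distance (buffer)
    """
    # Rank candidate neighbors by distance to the target.
    ranked = sorted((math.dist(target, xy), value) for xy, value in neighbors)
    if radius is not None:
        ranked = [(d, v) for d, v in ranked if d <= radius]
    if k is not None:
        ranked = ranked[:k]
    if not ranked:
        return None  # empty neighborhood: no estimate available
    if ranked[0][0] == 0:
        return ranked[0][1]  # a neighbor coincides with T; use its value
    # C(T) = sum(C(N)/d(N,T)) / sum(1/d(N,T)) over the chosen neighbors N
    numerator = sum(v / d for d, v in ranked)
    denominator = sum(1 / d for d, v in ranked)
    return numerator / denominator

# Hypothetical data: neighbor A at distance 1 with crimerate 10,
# neighbor B at distance 2 with crimerate 40.
sample = [((1.0, 0.0), 10.0), ((2.0, 0.0), 40.0)]
estimate = idw_estimate((0.0, 0.0), sample, k=2)
```

The value returned for each region would then be stored as an extra column of the augmented table before the mining step.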

Some Results:
• Classification:
  – Accuracy increases from 62% to 89% with 7 nearest neighbors
• Regression:
  – Root-mean-square error between predicted and actual value improves from ~25 to 8 (5-7 neighbors)
• Detailed results in a white paper on http://technet.oracle.com/products/spatial
• Visualize the results with Mapviewer

Summary of the case study
• Adding Neighborhood Influence to Data
  – Improves classification accuracy from 62% to 89%
  – Best neighborhood for this case study: 5-7 neighbors or 2-mile distance
• Details, Additions: White paper on OTN
  – http://technet.oracle.com/products/spatial
• Recommendation for Businesses: Spatial-enable the data
  – Always geocode customer/business locations
  – Materialize demographic information from spatial neighborhood
  – Test the data and perform mining tasks

More research needed…
• Current case study:
  – SVM without spatial, although worse than with spatial, is still good: Which attributes are helping?
• Colocation Mining
  – “Co-location” of items as opposed to “co-occurrence” in a transaction
  – E.g., which sets of items are colocated and what are the implications (interesting patterns)
  – One approach: identify items that co-occur within “tiled” regions
  – Needs tighter integration with association rule mining
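The "tiled regions" approach to colocation mining can be illustrated with a small sketch. Each grid tile is treated as one transaction whose items are the item types found inside it, and pairs are then counted as in the first pass of association rule mining. The function names, the square grid, and the `min_support` threshold are assumptions for the example; the slides do not specify them.

```python
from collections import defaultdict
from itertools import combinations

def tile_transactions(points, tile_size):
    """Group item types by square grid tile; each tile acts as a transaction."""
    tiles = defaultdict(set)
    for x, y, item in points:
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].add(item)
    return tiles

def colocated_pairs(points, tile_size, min_support):
    """Return item pairs whose share of non-empty tiles is >= min_support."""
    tiles = tile_transactions(points, tile_size)
    counts = defaultdict(int)
    for items in tiles.values():
        # Count every unordered pair of distinct item types in this tile.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(tiles)
    return {pair: c / total for pair, c in counts.items() if c / total >= min_support}

# Hypothetical points: (x, y, item type)
points = [
    (0.1, 0.2, "cafe"), (0.5, 0.9, "bank"),
    (1.2, 0.3, "cafe"), (1.8, 0.4, "bank"),
    (5.0, 5.0, "cafe"),
]
pairs = colocated_pairs(points, tile_size=1.0, min_support=0.5)
```

A fixed grid misses items that are close together but fall on opposite sides of a tile boundary, which is one reason the result depends on the tile size and grid placement.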