CHAPTER-7 -- SPATIAL DATA MINING
By G10: Anuj Karpatne and Vijay Borra
Old Organization
1.1 Pattern Discovery
1.2 Motivation
1.3 Spatial Statistics
1.4 Classification Techniques
1.5 Association Rule Discovery
1.6 Clustering
1.7 Outlier detection
New Organization
1.1 Pattern Discovery
1.2 Motivation
1.3 Spatial Statistics
1.3.1 LISA
1.3.2 Geostatistics-Kriging
1.4 Classification Techniques
1.4.2 Geographically Weighted Regression
1.5 Association Rule Discovery
1.5.1 Algorithm for Global Colocation Discovery
1.5.2 Algorithm for Regional Colocation Discovery
1.6 Clustering
1.7 Hotspot Analysis
1.7.1 Introduction and Types
1.7.2 Understanding Hotspots
1.8 Outlier Detection
1.9 Spatio-Temporal Data Mining
1.9.1 Trends
Learning Objectives
Learning Objectives (LO)
– LO1: Understand the concept of spatial data mining (SDM)
– LO2: Learn about patterns explored by SDM
– LO3: Learn about techniques to find spatial patterns
– LO4: Learn about the statistical measures used in a spatial context
– LO5: Understand regional and global colocation patterns
– LO6: Understand hotspots
– LO7: Understand trends in spatio-temporal data mining
Focus on concepts, not procedures!
Mapping sections to learning objectives
– LO1: 7.1
– LO2: 7.2.4
– LO3: 7.3 - 7.6
– LO4: 7.3.1, 7.3.2, 7.4.2
– LO5: 7.5.1, 7.5.2
– LO6: 7.7
– LO7: 7.9
LO4: Statistical Measures
Local Indicators of Spatial Association (LISA)
• The LISA for each observation indicates the extent of
significant spatial clustering of similar values around that observation.
• The sum of the LISAs for all observations is proportional to a global
indicator of spatial association.
• Examples: Local Moran’s I, Local Geary, Local Gamma
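The proportionality between local and global indicators can be checked numerically. Below is a minimal sketch using Local Moran's I, assuming a small hand-made example: four sites on a line with binary neighbor weights. The function names, the data, and the weight matrix are illustrative and are not taken from the chapter.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I for each observation: I_i = (z_i / m2) * sum_j w_ij * z_j."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    m2 = (z ** 2).sum() / len(x)     # second moment of the deviations
    return (z / m2) * (w @ z)

def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with S0 the sum of all weights."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical data: high values at one end of the line, low values at the other.
values = [10.0, 12.0, 3.0, 2.0]
weights = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])

local = local_morans_i(values, weights)
# The local values sum to S0 times the global statistic.
print(local.sum(), weights.sum() * global_morans_i(values, weights))
```

A positive local value marks a site surrounded by similar values (the two end sites here); a negative value marks a site whose neighbors differ from it.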
Geographically Weighted Regression (GWR):
• Ordinary regression methods assume that one relationship holds over the entire study area.
• GWR fits a regression at each location of interest using all the data, but gives greater
weight to observations close to that location.
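The weighting idea can be sketched as a weighted least-squares fit centered on one location. This is a minimal sketch, not the chapter's procedure: the Gaussian kernel, the `bandwidth` parameter, the function name, and the synthetic data (slope 1 in the west, slope 3 in the east) are all assumptions made for illustration.

```python
import numpy as np

def gwr_coefficients_at(target, coords, x, y, bandwidth):
    """Fit y ~ b0 + b1*x by weighted least squares centered on `target`."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Distance from every observation to the location of interest.
    dist = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    # Gaussian kernel (an assumed choice): nearby observations get weights near 1.
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)
    design = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta  # [intercept, slope]

# Hypothetical data: ten sites along a line; the true slope changes halfway.
coords = [(i, 0.0) for i in range(10)]
x = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
slope = np.where(np.arange(10) < 5, 1.0, 3.0)
y = slope * x

west = gwr_coefficients_at((0.0, 0.0), coords, x, y, bandwidth=1.0)
east = gwr_coefficients_at((9.0, 0.0), coords, x, y, bandwidth=1.0)
print(west[1], east[1])  # local slopes differ between the two ends
```

A single global regression on the same data would report one compromise slope, which is the limitation the slide points to.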
LO5-Understand Regional and Global Colocations
• Global Colocation Algorithm
– Relevant Methods
• Feature centric model
• Partitioning approach
– Algorithm used
• Event Centric Model
• Participation index as the prevalence criterion
• K-function used to test statistical significance
• Regional Colocation Algorithm
– Uses the prevalence locality concept
– Computational efficiency through maximal locality pruning
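As an illustration of the participation index, here is a minimal sketch for a colocation of two feature types. It assumes a distance-threshold neighbor relation and uses the common definition: the participation ratio of a feature is the fraction of its instances that take part in some instance of the colocation, and the participation index is the minimum ratio over the features. The function name, the `radius` parameter, and the points are hypothetical.

```python
import math

def participation_index(points_a, points_b, radius):
    """Participation index of the pair colocation {A, B}.

    Two instances are neighbors when they lie within `radius` of each other
    (an assumed neighbor relation).
    """
    a_participating = set()
    b_participating = set()
    for i, pa in enumerate(points_a):
        for j, pb in enumerate(points_b):
            if math.dist(pa, pb) <= radius:
                a_participating.add(i)
                b_participating.add(j)
    ratio_a = len(a_participating) / len(points_a)  # participation ratio of A
    ratio_b = len(b_participating) / len(points_b)  # participation ratio of B
    return min(ratio_a, ratio_b)

# Hypothetical instances: two of the three A points have a B neighbor.
feature_a = [(0, 0), (5, 5), (10, 10)]
feature_b = [(0, 1), (5, 6)]
pi = participation_index(feature_a, feature_b, radius=1.5)
print(pi)
```

A colocation is typically reported as prevalent when this index meets a user-chosen threshold.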
LO6-Understanding Hotspots
• A hotspot is
– An unusually high spatial concentration of a
phenomenon,
e.g., crime, accidents
• Types of Hotspots
• Geometric Hotspots
• Point maps
• Area maps
(isoclines, ellipses, choropleth)
• Network Hotspots
Figure source: Identifying Patterns in Spatial Information:
A Survey of Methods by Shekhar et al.
LO6-Understanding Hotspots
• Traditional clustering algorithms like K-means fail for hotspot detection
• Solved using
– Nearest neighbor index (NNI)
• Generate a random sample distribution and
compare the NNI of the data with the
NNI of the random distribution
– Spatial statistics based ellipsoids
– Thematic maps are generated and LISA is applied
– Network Hotspots using K-means routing algorithms
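The nearest neighbor comparison can be sketched as follows. This is a minimal sketch under stated assumptions: the random reference distribution is uniform over a rectangular study area, the index is taken as the ratio of the observed mean nearest-neighbor distance to the simulated mean, and the function names, `bounds`, `n_sim`, and the sample points are all hypothetical.

```python
import numpy as np

def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    p = np.asarray(points, dtype=float)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return d.min(axis=1).mean()

def nearest_neighbor_index(points, bounds, n_sim=200, seed=0):
    """Observed mean NN distance divided by the mean over random point sets."""
    rng = np.random.default_rng(seed)
    (xmin, ymin), (xmax, ymax) = bounds
    n = len(points)
    simulated = [
        mean_nn_distance(rng.uniform([xmin, ymin], [xmax, ymax], size=(n, 2)))
        for _ in range(n_sim)
    ]
    return mean_nn_distance(points) / np.mean(simulated)

# Hypothetical incidents: 20 points packed tightly inside a 100 x 100 study area.
clustered = [(50 + 0.1 * i, 50 + 0.1 * j) for i in range(5) for j in range(4)]
nni = nearest_neighbor_index(clustered, bounds=((0, 0), (100, 100)))
print(nni)  # well below 1 for a tightly clustered pattern
```

Values well below 1 suggest clustering, values near 1 are consistent with a random pattern, and values above 1 suggest dispersion.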
Figure: thematic map of vehicle robbery produced by quadrat thematic mapping
[Source: Eck, J., Chainey, S., Cameron, J. and Wilson, R. (2005). Mapping Crime: Understanding Hot Spots. National
Institute of Justice, Washington, DC.]
LO7-Trends in Spatio-Temporal Data Mining
• Flow Anomalies
– Flow anomaly discovery finds time intervals in which the fraction of time instances
with significantly mismatched sensor readings exceeds a user-defined
threshold
– Applications in water treatment systems, transportation networks and
video surveillance systems
– Computationally expensive
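The definition above can be sketched as a brute-force search over all time intervals, which also shows why the problem is expensive: the number of candidate intervals grows quadratically with the number of time instances. This is a minimal sketch, assuming two paired sensor series and an absolute-difference mismatch test; the names `upstream`, `downstream`, `mismatch_tol`, `min_fraction`, and `min_length` are hypothetical.

```python
def flow_anomaly_intervals(upstream, downstream, mismatch_tol, min_fraction, min_length=1):
    """Return all (start, end) index intervals, inclusive, whose fraction of
    mismatched time instances exceeds `min_fraction`."""
    # A time instance is mismatched when the two readings differ by more than the tolerance.
    mismatched = [abs(u - d) > mismatch_tol for u, d in zip(upstream, downstream)]
    # Prefix sums give the mismatch count of any interval in constant time.
    prefix = [0]
    for m in mismatched:
        prefix.append(prefix[-1] + m)
    n = len(mismatched)
    intervals = []
    for start in range(n):
        for end in range(start + min_length, n + 1):
            fraction = (prefix[end] - prefix[start]) / (end - start)
            if fraction > min_fraction:
                intervals.append((start, end - 1))
    return intervals

# Hypothetical readings: the sensors disagree at time instances 2 and 3.
upstream = [5, 5, 5, 5, 5, 5]
downstream = [5, 5, 9, 9, 5, 5]
found = flow_anomaly_intervals(upstream, downstream, mismatch_tol=1,
                               min_fraction=0.6, min_length=2)
print(found)
```

Note that an interval can qualify even though it contains some matching instances, as long as the mismatched fraction stays above the threshold.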
Figure Source: [Identifying Patterns in Spatial Information: A Survey of Methods by Shekhar et al.]
LO7-Trends in Spatio-Temporal Data Mining
Tele-connected flow anomalies
• A tele-connection represents a strong interaction between paired events that are
spatially distant from each other,
e.g., the El Niño phenomenon
• Tele-connected flow events are computationally hard to detect due to the large
number of time instances of measurements, sensors, and locations
Sequential patterns in the ST domain
• These are patterns that occur sequentially in a spatio-temporal domain
• Events are Boolean, instantaneous, and totally ordered.
Figure (labels: Low Temperature, High Temperature, High Evaporation)
Figure Source: NSF Expeditions in Computing: Understanding Climate Change
LO7-Trends in Spatio-Temporal Data Mining
• Cascading Spatio-Temporal Patterns
– Sequential patterns in spatio-temporal domain where the events are
partially ordered
– Patterns represented as directed acyclic graph substructures
– Applications in crime analysis, disaster planning, climate science,
epidemiology etc.
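The directed acyclic graph representation can be sketched with Python's standard `graphlib` module. This is a minimal sketch and not the discovery algorithm itself: the event types and the dictionary layout (each event type mapped to the event types that must precede it) are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical cascading pattern: each event type maps to its required predecessors.
pattern = {
    "bar_closing": set(),
    "assault": {"bar_closing"},
    "drunk_driving": {"bar_closing"},
}

def is_valid_cascade(predecessors):
    """A cascade must be acyclic: no event type may precede itself, even indirectly."""
    try:
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False

# One total order that is consistent with the partial order.
order = list(TopologicalSorter(pattern).static_order())
print(is_valid_cascade(pattern), order)
```

In this example "assault" and "drunk_driving" are not ordered relative to each other, which is what distinguishes a partially ordered cascade from a totally ordered sequential pattern.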
Figure Source: [Cascading spatio-temporal pattern discovery, Mohan et al.]