CHAPTER 7: SPATIAL DATA MINING
By G10: Anuj Karpatne and Vijay Borra

OLD Organization
1.1 Pattern Discovery
1.2 Motivation
1.3 Spatial Statistics
1.4 Classification Techniques
1.5 Association Rule Discovery
1.6 Clustering
1.7 Outlier Detection

New Organization
1.1 Pattern Discovery
1.2 Motivation
1.3 Spatial Statistics
  1.3.1 LISA
  1.3.2 Geostatistics: Kriging
1.4 Classification Techniques
  1.4.2 Geographically Weighted Regression
1.5 Association Rule Discovery
  1.5.1 Algorithm for Global Colocation Discovery
  1.5.2 Algorithm for Regional Colocation Discovery
1.6 Clustering
1.7 Hotspot Analysis
  1.7.1 Introduction and Types
  1.7.2 Understanding Hotspots
1.8 Outlier Detection
1.9 Spatio-Temporal Data Mining
  1.9.1 Trends

Learning Objectives (LO)
– LO1: Understand the concept of spatial data mining (SDM)
– LO2: Learn about the patterns explored by SDM
– LO3: Learn about techniques to find spatial patterns
– LO4: Learn about the statistical measures used in a spatial context
– LO5: Understand regional and global colocation patterns
– LO6: Understand hotspots
– LO7: Understand trends in spatio-temporal data mining
Focus on concepts, not procedures!

Mapping Sections to Learning Objectives
– LO1: 7.1
– LO2: 7.2.4
– LO3: 7.3-7.6
– LO4: 7.3.1, 7.3.2, 7.4.2
– LO5: 7.5.1, 7.5.2
– LO6: 7.7
– LO7: 7.9

LO4: Statistical Measures
Local Indicators of Spatial Association (LISA)
• The LISA for each observation indicates the extent of significant spatial clustering of similar values around that observation.
• The sum of the LISAs over all observations is proportional to a global indicator of spatial association.
• Examples: Local Moran's I, Local Geary, Local Gamma
Geographically Weighted Regression (GWR)
• Ordinary (global) regression methods assume that a single relationship holds over the entire space.
• GWR fits a regression at each location of interest using all of the data, but gives greater weight to observations at locations close to that location.
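A minimal sketch of Local Moran's I, one of the LISA statistics listed above, computed as I_i = (z_i / m2) * sum_j w_ij z_j, where z_i is the deviation of observation i from the mean and m2 = sum_k z_k^2 / n. It assumes binary contiguity neighbors that are row-standardized (each neighbor of location i gets weight 1/|N(i)|); the function names and the six-location example are hypothetical. With row-standardized weights the sum of the local values equals n times the global Moran's I, which illustrates the proportionality property stated on the slide.

```python
# Sketch only: Local and global Moran's I with row-standardized binary
# contiguity weights (an assumption chosen for illustration).

def local_morans_i(values, neighbors):
    """Return Local Moran's I_i for every observation.

    values:    list of attribute values, one per location
    neighbors: dict mapping a location index to the indices of its neighbors
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # scaling term m2
    local = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs:                        # isolated location: no local term
            local.append(0.0)
            continue
        w = 1.0 / len(nbrs)                 # row-standardized weight w_ij
        lag = sum(w * z[j] for j in nbrs)   # spatial lag of the deviations
        local.append(z[i] / m2 * lag)
    return local


def global_morans_i(values, neighbors):
    """Global Moran's I using the same row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0
    s0 = 0.0                                # S0: sum of all weights
    for i in range(n):
        nbrs = neighbors.get(i, [])
        for j in nbrs:
            w = 1.0 / len(nbrs)
            num += w * z[i] * z[j]
            s0 += w
    return (n / s0) * num / sum(d * d for d in z)


if __name__ == "__main__":
    # Hypothetical data: six locations on a line, low values then high values.
    vals = [1, 1, 1, 5, 5, 5]
    nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(vals)] for i in range(len(vals))}
    li = local_morans_i(vals, nbrs)
    print(li)
    # Row-standardized weights give S0 = n, so sum(I_i) = n * I.
    print(sum(li), len(vals) * global_morans_i(vals, nbrs))
```

Locations inside each homogeneous run get positive local values (similar values cluster), while the two locations at the boundary between the runs get values near zero.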
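The GWR idea above can be sketched for a single predictor as a weighted least-squares fit at one target location. The Gaussian kernel exp(-0.5 (d/h)^2) and the fixed bandwidth h are assumptions chosen for illustration (GWR implementations offer other kernels and data-driven bandwidth selection), and the two-region toy data are hypothetical.

```python
import math


def gaussian_weights(target, coords, bandwidth):
    """Assumed Gaussian kernel: observations near `target` get weight ~1,
    distant ones decay towards 0."""
    return [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]


def gwr_at(target, coords, x, y, bandwidth):
    """Local intercept b0 and slope b1 of y = b0 + b1 * x at `target`,
    fitted by weighted least squares over *all* observations."""
    w = gaussian_weights(target, coords, bandwidth)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


if __name__ == "__main__":
    # Hypothetical data: in the "west" y = 2x + 1, in the "east" y = -x + 10.
    coords = [(float(i), 0.0) for i in range(5)] + [(float(100 + i), 0.0) for i in range(5)]
    x = [1, 2, 3, 4, 5] * 2
    y = [2 * v + 1 for v in x[:5]] + [-v + 10 for v in x[5:]]
    print(gwr_at((2.0, 0.0), coords, x, y, bandwidth=2.0))     # west regime
    print(gwr_at((102.0, 0.0), coords, x, y, bandwidth=2.0))   # east regime
    print(gwr_at((50.0, 0.0), coords, x, y, bandwidth=1e9))    # ~global fit
```

With a very large bandwidth all weights approach 1 and the local fit collapses to the ordinary global regression, which in this toy data averages the two regimes into a slope of about 0.5; a small bandwidth recovers each regional relationship.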
LO5: Understand Regional and Global Colocations
• Global Colocation Algorithm
  – Relevant methods
    • Feature-centric model
    • Partitioning approach
  – Algorithm used
    • Event-centric model
    • Participation index as the prevalence criterion
    • K-function used for statistical significance
• Regional Colocation Algorithm
  – Uses the prevalence locality concept
  – Achieves computational efficiency through maximal locality pruning

LO6: Understanding Hotspots
• A hotspot is
  – An unusually high spatial concentration of a phenomenon, e.g., crime or accidents
• Types of hotspots
  – Geometric hotspots
    • Point maps
    • Area maps (isoclines, ellipses, choropleth)
  – Network hotspots
Figure source: Identifying Patterns in Spatial Information: A Survey of Methods, by Shekhar et al.

LO6: Understanding Hotspots
• Traditional clustering algorithms such as K-means fail at hotspot detection
• Approaches used instead
  – Nearest neighbor index (NNI)
    • Generate a random sample distribution and compare the NNI of the data with the NNI of the random distribution
  – Spatial-statistics-based ellipsoids
  – Thematic maps are generated and LISA is applied
  – Network hotspots using K-means routing algorithms
Figure: thematic map of vehicle robbery using quadrat thematic mapping [Source: Eck, J., Chainey, S., Cameron, J. and Wilson, R. (2005). Mapping Crime: Understanding Hot Spots. National Institute of Justice, Washington, DC.]
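The participation index used as the prevalence criterion for global colocation can be illustrated with a brute-force sketch. Assumptions: the neighbor relation is "Euclidean distance at most radius", and row instances are found by enumerating every combination of feature instances, which is exponential and is not the join-based procedure used by actual colocation mining algorithms. The participation ratio of a feature f is the fraction of f's instances that appear in at least one row instance of the pattern; the participation index is the minimum ratio over the pattern's features. Function and data names are hypothetical.

```python
import math
from itertools import combinations, product


def participation_index(points, pattern, radius):
    """Brute-force participation index of a colocation pattern.

    points:  dict feature -> list of (x, y) instance locations
    pattern: tuple of feature names, e.g. ("A", "B")
    radius:  assumed neighbor threshold (Euclidean distance)
    """
    participating = {f: set() for f in pattern}
    # Enumerate every combination of one instance per feature (exponential!).
    for combo in product(*(list(enumerate(points[f])) for f in pattern)):
        # A row instance requires all chosen instances to be pairwise neighbors.
        if all(math.dist(p, q) <= radius for (_, p), (_, q) in combinations(combo, 2)):
            for f, (idx, _) in zip(pattern, combo):
                participating[f].add(idx)
    # Participation ratio: share of a feature's instances in some row instance.
    ratios = {f: len(participating[f]) / len(points[f]) for f in pattern}
    return min(ratios.values()), ratios


if __name__ == "__main__":
    # Hypothetical instances of three Boolean spatial features.
    pts = {
        "A": [(0, 0), (10, 10), (20, 20)],
        "B": [(0, 1), (10, 11)],
        "C": [(0.5, 0.5), (50, 50)],
    }
    print(participation_index(pts, ("A", "B"), radius=2))
    print(participation_index(pts, ("A", "B", "C"), radius=2))
```

Taking the minimum ratio makes the index anti-monotone: adding a feature to a pattern can never raise it, which is what allows prevalence-based pruning of larger candidate patterns.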
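The NNI comparison described above can be sketched by Monte Carlo simulation: compute the mean nearest-neighbor distance of the observed points, repeat for uniformly random point sets of the same size in the same rectangle, and take the ratio. The rectangular study area, the number of trials and the pseudo p-value are assumptions; the classic Clark-Evans index instead uses the closed-form expectation 0.5 / sqrt(n / A) for n points in area A under complete spatial randomness. A ratio well below 1 suggests clustering, i.e., a candidate hotspot.

```python
import math
import random


def mean_nn_distance(pts):
    """Average distance from each point to its nearest other point (O(n^2))."""
    return sum(
        min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ) / len(pts)


def nni_vs_random(pts, width, height, trials=99, seed=0):
    """Compare the observed mean NN distance with uniformly random point sets
    of the same size in an assumed width x height rectangle."""
    rng = random.Random(seed)
    observed = mean_nn_distance(pts)
    simulated = [
        mean_nn_distance([(rng.uniform(0, width), rng.uniform(0, height)) for _ in pts])
        for _ in range(trials)
    ]
    ratio = observed / (sum(simulated) / trials)   # < 1 suggests clustering
    # Pseudo p-value: share of random sets at least as clustered as the data.
    p_value = (1 + sum(s <= observed for s in simulated)) / (trials + 1)
    return ratio, p_value


if __name__ == "__main__":
    # Hypothetical tight cluster of 20 incidents in a 100 x 100 area.
    cluster = [(50 + 0.1 * i, 50 + 0.1 * (i % 2)) for i in range(20)]
    print(nni_vs_random(cluster, 100, 100))
```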
LO7: Trends in Spatio-Temporal Data Mining
• Flow Anomalies
  – Flow anomaly detection finds time intervals in which the fraction of time instances with significantly mismatched sensor readings exceeds a user-defined threshold
  – Applications in water treatment systems, transportation networks and video surveillance systems
  – Computationally expensive
Figure Source: Identifying Patterns in Spatial Information: A Survey of Methods, by Shekhar et al.

LO7: Trends in Spatio-Temporal Data Mining
Tele-connected Flow Anomalies
• A teleconnection represents a strong interaction between paired events that are spatially distant from each other, e.g., the El Niño phenomenon
• Tele-connected flow events are computationally hard to detect because of the large number of measurement time instances, sensors and locations
Sequential Patterns in the ST Domain
• These are patterns that occur sequentially in a spatio-temporal domain
• Events are Boolean, instantaneous and totally ordered
[Figure: example sequential pattern with the events Low Temperature, High Temperature, High Evaporation]
Figure Source: NSF Expeditions in Computing: Understanding Climate Change

LO7: Trends in Spatio-Temporal Data Mining
• Cascading Spatio-Temporal Patterns
  – Sequential patterns in the spatio-temporal domain where the events are partially ordered
  – Patterns are represented as directed acyclic graph (DAG) substructures
  – Applications in crime analysis, disaster planning, climate science, epidemiology, etc.
Figure Source: Cascading Spatio-Temporal Pattern Discovery, Mohan et al.
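The flow anomaly definition can be sketched for one upstream/downstream sensor pair. Assumptions: a pair of readings is "significantly mismatched" when their absolute difference exceeds delta, candidate intervals must span at least min_len time instances, and only maximal qualifying intervals are reported. The function name and readings are hypothetical. The exhaustive scan over all intervals (quadratic in the series length even with prefix sums) hints at why the slide calls the problem computationally expensive.

```python
def flow_anomalies(upstream, downstream, delta, threshold, min_len):
    """Return maximal half-open intervals [s, e) in which the fraction of
    mismatched readings is strictly greater than `threshold`."""
    n = len(upstream)
    # 1 where the two sensors disagree by more than the assumed tolerance delta.
    mismatch = [1 if abs(u - d) > delta else 0 for u, d in zip(upstream, downstream)]
    prefix = [0]                      # prefix[k] = mismatches in [0, k)
    for m in mismatch:
        prefix.append(prefix[-1] + m)
    hits = []
    for s in range(n):                # exhaustive scan over all intervals
        for e in range(s + min_len, n + 1):
            if (prefix[e] - prefix[s]) / (e - s) > threshold:
                hits.append((s, e))
    # Keep only intervals not contained in another qualifying interval.
    return [a for a in hits
            if not any(b != a and b[0] <= a[0] and a[1] <= b[1] for b in hits)]


if __name__ == "__main__":
    # Hypothetical readings: the downstream sensor drops at t = 3, 4, 6, 7.
    up = [10] * 12
    down = [10, 10, 10, 4, 4, 10, 4, 4, 10, 10, 10, 10]
    print(flow_anomalies(up, down, delta=2, threshold=0.5, min_len=4))
```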
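A cascading pattern can be pictured as a DAG of precedence constraints among event types. The sketch below only checks whether one candidate set of events is an instance of such a pattern; it is not a pattern-discovery algorithm. The spatial radius, the maximum time gap per edge, and the rule that only edge-connected events must be spatial neighbors are assumptions for illustration, and the function names are hypothetical. Event types not connected by an edge (B and C in the example) may occur in either order, which is what makes the order partial rather than total.

```python
import math
from collections import deque


def is_dag(edges):
    """Kahn's algorithm: True if the precedence graph has no cycle."""
    nodes = {u for e in edges for u in e}
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    seen = 0
    while queue:
        v = queue.popleft()
        seen += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(nodes)          # every node removed => no cycle


def matches_cascade(instance, edges, radius, max_gap):
    """instance: dict event_type -> (time, (x, y)) for one candidate occurrence.
    edges: precedence constraints (a, b) meaning a must occur before b."""
    if not is_dag(edges):
        raise ValueError("a cascading pattern must be a DAG")
    for a, b in edges:
        ta, pa = instance[a]
        tb, pb = instance[b]
        if not (0 < tb - ta <= max_gap):   # order and assumed time window
            return False
        if math.dist(pa, pb) > radius:     # assumed spatial neighborhood
            return False
    return True


if __name__ == "__main__":
    # Hypothetical pattern: A precedes both B and C; B and C are unordered.
    edges = [("A", "B"), ("A", "C")]
    inst = {"A": (0, (0, 0)), "B": (5, (1, 0)), "C": (3, (0, 1))}
    print(matches_cascade(inst, edges, radius=2, max_gap=10))
```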