SIGKDD 2015
Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data
Sheng QIAN
2015-08-01

Content
1. Introduction
2. Problem Description
3. The Assembler Method
   Stage I: Detecting Individual Evolutions
   Stage II: SCP Generation
   Time and space complexity
4. Experiment

Introduction
Spatial Co-evolving Patterns (SCP)
e.g. AQI sensors in Beijing

Introduction: Challenge
• Interesting evolutions are often flooded by trivial fluctuations
• The pattern search space is extremely large

Problem Description: Our Interest

Problem Description: Symbol
• S = {s_1, s_2, ..., s_m}: sensors
• l_i: location of s_i
• T = {t_1, t_2, ..., t_n}: time domain

Problem Description: Definitions

Method: I. Detecting Individual Evolutions
Haar Wavelet Transformation
c_ij

Method: I. Detecting Individual Evolutions
Evolving interval extraction

Method: I. Detecting Individual Evolutions
Mining Frequent Evolutions
Segment-and-group approach
1. Segment: bottom-up
2. Mean Shift: divide the segments into groups such that the segments in the same group have similar slopes

Method: II. SCP Generation
The Anti-monotonicity Property
Find SCP by intersecting matching timestamps
SCP Search Tree
Neighbor and Parent
Algorithm

Method: Discussion
Time Complexity
• Segment approach: O(n_e · l_e · l_s) ≈ O(m), since l_s is small and n_e · l_e < m
• Mean Shift: O(n_l · k) ≈ O(m), where k is the average number of shifting operations
• Second Stage: O(n_G · (n · |E_G| + n_p^2 · n_s))
  n_G: the number of connected components in G that have SCPs
  |E_G|: the number of edges in G
  n_p: the maximum number of SCPs on a connected component
  n_s: the maximum support of an SCP

Method: Discussion
Space Complexity
• Segment & Mean Shift: nearly linear
• Second Stage: O(n · n_p · n_s)

Method: Discussion
Parameters Setting
• The minimum support θ: how many occurrences can be considered frequent enough
• The distance threshold h: what distance makes two sensors reachable
• The change threshold δ: how much change in the reading reflects a significant and unusual behavior
• The mean shift bandwidth ω

Experiment: Dataset
1. Air is an air quality data set. 180 air quality sensors are deployed in 16 cities in northern China (Beijing, Tianjin, and 14 cities in Hebei Province). Each sensor measured the hourly AQI during the period 2013.02.08 – 2014.08.27.
2. Bike is the Citi Bike rental data set for the 332 rental docks in New York. We record the number of available bikes at each dock every 30 minutes during 2013.07.01 – 2014.08.30.
3. Syn-Sensor is a collection of 4 synthetic data sets used to evaluate the scalability of Assembler w.r.t. the number of sensors n.

Experiment: Illumination

Experiment: Efficiency Study
Varying θ and h
Varying δ and ω

Experiments: Scalability

Thank you
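
Note on Stage I (Haar Wavelet Transformation): the slides name the transformation, but its formulas did not survive. The following is a minimal sketch of a standard averaging Haar decomposition, assuming a series whose length is a power of two. The function name `haar_decompose` is hypothetical, and this is not necessarily the exact normalization Assembler uses.

```python
def haar_decompose(series):
    """Return (overall_average, detail_levels) for a power-of-two-length series.

    detail_levels[0] holds the coarsest detail coefficients and the
    last entry holds the finest ones.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")

    approx = [float(x) for x in series]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail = half the difference within each pair.
        details.append([(a - b) / 2 for a, b in pairs])
        # Approximation = mean of each pair, passed to the next coarser level.
        approx = [(a + b) / 2 for a, b in pairs]

    details.reverse()  # coarsest level first
    return approx[0], details


# Example: a four-reading series.
average, levels = haar_decompose([9, 7, 3, 5])
# average is 6.0; levels is [[2.0], [1.0, -1.0]]
```

A coefficient with large magnitude marks a sharp change at that scale and position. Thresholding such changes with δ is one plausible way to separate evolutions from trivial fluctuations, though how Assembler uses its coefficients is not recoverable from these slides.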
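
Note on Stage II (anti-monotonicity and intersecting matching timestamps): the sketch below illustrates the two ideas with a plain level-wise search, not the SCP search tree from the talk. It assumes each sensor's evolution has already been mapped to the set of timestamps where it occurs, and that a pattern's support is the size of the intersection of those sets. The name `frequent_patterns` and the sample data are hypothetical, and the spatial reachability check with h is omitted.

```python
def frequent_patterns(occurrences, min_support):
    """Grow sensor sets level by level, keeping those with enough support.

    occurrences: dict mapping sensor id -> set of matching timestamps.
    Returns a dict mapping frozenset of sensor ids -> shared timestamps.
    """
    # Level 1: single sensors that are frequent on their own.
    frontier = {
        frozenset([s]): ts
        for s, ts in occurrences.items()
        if len(ts) >= min_support
    }
    result = dict(frontier)

    while frontier:
        next_frontier = {}
        for pattern, ts in frontier.items():
            for s, s_ts in occurrences.items():
                if s in pattern:
                    continue
                candidate = pattern | {s}
                if candidate in next_frontier:
                    continue
                # Intersecting timestamps can only shrink the support
                # (anti-monotonicity), so infrequent patterns are never extended.
                common = ts & s_ts
                if len(common) >= min_support:
                    next_frontier[candidate] = common
        result.update(next_frontier)
        frontier = next_frontier
    return result


# Example with made-up timestamps (assumption, not data from the talk).
sample = {"s1": {1, 2, 3, 5}, "s2": {2, 3, 5, 8}, "s3": {3, 9}}
patterns = frequent_patterns(sample, min_support=2)
# {s1, s2} survives with timestamps {2, 3, 5}; any set containing s3 and
# another sensor shares only timestamp 3 and is pruned.
```

Because a superset's timestamps are always a subset of its parent's, only surviving patterns are extended, which is what keeps the search space manageable as θ grows.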