SIGKDD 2015
Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data
Sheng QIAN
2015-08-01
Contents
1. Introduction
2. Problem Description
3. The Assembler Method
Stage I: Detecting Individual Evolutions
Stage II: SCP Generation
Time and space complexity
4. Experiment
Introduction
Spatial Co-evolving Patterns (SCPs)
e.g., AQI sensors in Beijing
Introduction
Challenge
• Interesting evolutions are often flooded by trivial fluctuations
• The pattern search space is extremely large
Problem Description
Our Interest
Problem Description
Symbols
• S = {s1, s2, ..., sm}: sensors
• li: location of sensor si
• T = {t1, t2, ..., tn}: time domain
Problem Description
Definitions
Method: I. Detecting Individual Evolutions
Haar Wavelet Transformation
Wavelet coefficients c_ij
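To make the coefficients c_ij concrete, here is a minimal Python sketch of a multi-level, unnormalized Haar decomposition. It assumes a series whose length is a power of two and uses pairwise averages and half-differences; the orthonormal variant scales by 1/√2 instead, and the slides do not say which normalization Assembler uses. The function name `haar_coefficients` is hypothetical. Index i is the level (0 = finest) and j the position within that level, so c_ij describes the change across the j-th window of length 2^(i+1).

```python
def haar_coefficients(x):
    """Unnormalized multi-level Haar decomposition (hypothetical helper).

    Returns (detail, mean): detail[i][j] is the coefficient c_ij at level i
    (0 = finest) and position j; mean is the average of the whole series.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = [float(v) for v in x]
    detail = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Half-difference: a large magnitude means a sharp change inside the window.
        detail.append([(a - b) / 2 for a, b in pairs])
        # Pairwise averages become the input of the next, coarser level.
        approx = [(a + b) / 2 for a, b in pairs]
    return detail, approx[0]


detail, mean = haar_coefficients([2, 4, 6, 8])
print(detail, mean)  # [[-1.0, -1.0], [-2.0]] 5.0
```

Negative coefficients here correspond to rising readings, because each coefficient is (first half minus second half) / 2.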
Method: I. Detecting Individual Evolutions
Evolving interval extraction
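One way to read this step, shown as a hedged Python sketch: compare the mean of the first and second half of every dyadic window (the quantity a Haar detail coefficient encodes, up to scale), flag windows whose shift is at least the change threshold δ, keep only the smallest flagged windows, and merge overlaps into evolving intervals. The function name `evolving_intervals`, the minimal-window rule, and the merging policy are assumptions for illustration, not the paper's exact procedure.

```python
def evolving_intervals(x, delta):
    """Return merged half-open [start, end) index ranges whose dyadic window
    shows a mean shift of at least `delta` between its two halves.

    Hypothetical sketch: the threshold rule and merging policy are assumptions.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    flagged = []
    width = 2
    while width <= n:
        half = width // 2
        for start in range(0, n, width):
            left = sum(x[start:start + half]) / half
            right = sum(x[start + half:start + width]) / half
            if abs(right - left) >= delta:
                flagged.append((start, start + width))
        width *= 2
    # Keep only the finest flagged windows: drop any window that contains another one.
    minimal = [w for w in flagged
               if not any(o != w and w[0] <= o[0] and o[1] <= w[1] for o in flagged)]
    # Merge overlapping or touching windows into maximal evolving intervals.
    merged = []
    for s, e in sorted(minimal):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


print(evolving_intervals([0, 0, 0, 0, 0, 0, 10, 10], delta=5))  # [(4, 8)]
```

Dyadic windows are cheap to scan but only detect changes aligned with power-of-two boundaries; a real implementation may use overlapping windows or refine the endpoints afterwards.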
Method: I. Detecting Individual Evolutions
Mining Frequent Evolutions
Segment-and-group approach
1. Segment: bottom-up segmentation
2. Mean shift: divide the segments into groups so that segments in the same group have similar slopes
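The segment-and-group idea can be sketched in Python as follows. This is an illustration under stated assumptions, not Assembler's implementation: segments are merged bottom-up while the least-squares fitting error stays below a hypothetical budget `max_error`, and segment slopes are then grouped with a flat-kernel, one-dimensional mean shift whose `bandwidth` plays the role of ω. Treating modes within half a bandwidth of each other as one group is a simplification chosen here; all function names are hypothetical.

```python
def line_fit(y, start, end):
    """Least-squares line through points start..end (inclusive).

    Returns (slope, sum of squared residuals).
    """
    xs = range(start, end + 1)
    n = end - start + 1
    mx = sum(xs) / n
    my = sum(y[start:end + 1]) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y[x] - my) for x in xs) / sxx
    sse = sum((y[x] - my - slope * (x - mx)) ** 2 for x in xs)
    return slope, sse


def bottom_up_segments(y, max_error):
    """Bottom-up piecewise-linear segmentation; segments share endpoints."""
    segs = [(i, i + 1) for i in range(len(y) - 1)]  # finest segmentation
    while len(segs) > 1:
        # Fitting error if each pair of neighbouring segments were merged.
        costs = [line_fit(y, segs[i][0], segs[i + 1][1])[1] for i in range(len(segs) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # every remaining merge would exceed the error budget
        segs[best:best + 2] = [(segs[best][0], segs[best + 1][1])]
    return segs


def mean_shift_1d(values, bandwidth, tol=1e-9, max_iter=100):
    """Flat-kernel mean shift on scalars; returns (labels, cluster centers)."""
    modes = []
    for v in values:
        m = v
        for _ in range(max_iter):
            window = [u for u in values if abs(u - m) <= bandwidth]
            new_m = sum(window) / len(window)  # shift to the local mean
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    labels, centers = [], []
    for m in modes:  # points whose modes (nearly) coincide share a group
        for k, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


y = [0, 1, 2, 3, 4, 4, 4, 4, 4]
segs = bottom_up_segments(y, max_error=0.01)
slopes = [line_fit(y, s, e)[0] for s, e in segs]
print(segs, slopes)  # [(0, 4), (4, 8)] [1.0, 0.0]
print(mean_shift_1d([1.0, 0.95, 0.0, 0.05], bandwidth=0.2)[0])  # [0, 0, 1, 1]
```

Mean shift is a natural fit for grouping slopes because it does not need the number of groups in advance; only the bandwidth has to be chosen.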
Method: II. SCP Generation
The Anti-monotonicity Property
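One natural way to state the property formally. The notation is ours, not the slide's: write T(P) for the set of timestamps at which every member evolution e of pattern P occurs, and sup(P) = |T(P)|. Adding members to a pattern can only remove matching timestamps, which yields the pruning rule used with the minimum support θ.

$$
\begin{aligned}
P \subseteq P' &\;\Longrightarrow\; T(P') = \bigcap_{e \in P'} T(e) \;\subseteq\; \bigcap_{e \in P} T(e) = T(P) \\
&\;\Longrightarrow\; \mathrm{sup}(P') \le \mathrm{sup}(P), \\
\mathrm{sup}(P) < \theta &\;\Longrightarrow\; \mathrm{sup}(P') < \theta \quad \text{for every } P' \supseteq P.
\end{aligned}
$$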
Method: II. SCP Generation
Find SCPs by intersecting matching timestamps
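A minimal Python sketch of this step, assuming each (sensor, evolution) pair has already been mapped to the set of timestamps at which it occurs: the matching timestamps of a pattern are the intersection of its members' sets, and a child pattern's set can be obtained by intersecting its parent's set with one more member, so support never has to be recounted from the raw series. The dictionary layout, the labels "rise"/"drop", and the function names are hypothetical.

```python
from functools import reduce


def matching_timestamps(pattern, occurrences):
    """Timestamps at which every (sensor, evolution) item in a non-empty `pattern` occurs."""
    sets = [occurrences[item] for item in pattern]
    # Start from a copy so the input sets are never mutated.
    return reduce(set.intersection, sets[1:], set(sets[0]))


def extend(ts, occurrences, item):
    """Matching timestamps of pattern + item, reusing the parent's set `ts`."""
    return ts & occurrences[item]


occurrences = {
    ("s1", "rise"): {1, 2, 5, 8},
    ("s2", "rise"): {2, 5, 9},
    ("s3", "drop"): {5, 8},
}
ts = matching_timestamps([("s1", "rise"), ("s2", "rise")], occurrences)
print(sorted(ts))  # [2, 5]
print(sorted(extend(ts, occurrences, ("s3", "drop"))))  # [5]
```

The support of a pattern is then simply the size of its timestamp set.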
Method: II. SCP Generation
SCP Search Tree
Method: II. SCP Generation
Neighbor and Parent
Method: II. SCP Generation
SCP Search Tree
Method: II. SCP Generation
Algorithm
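A hedged Python sketch of the overall search, under simplifying assumptions: patterns grow from single frequent (sensor, evolution) items by adding an item whose sensor is a neighbor of a sensor already in the pattern, with at most one item per sensor; matching timestamps are carried down the tree by intersection; and a branch stops as soon as its support drops below θ. Duplicate nodes are removed with a visited set rather than with a canonical parent rule, and all names are hypothetical.

```python
def mine_scps(occurrences, neighbors, theta):
    """Depth-first enumeration of spatially connected, frequent patterns.

    occurrences: {(sensor, evolution): set of timestamps}
    neighbors:   {sensor: set of sensors within the distance threshold}
    theta:       minimum support
    Returns {frozenset of items: matching timestamps} for patterns of size >= 2.
    """
    frequent = {item: ts for item, ts in occurrences.items() if len(ts) >= theta}
    results, seen = {}, set()
    stack = [(frozenset([item]), ts) for item, ts in frequent.items()]
    while stack:
        pattern, ts = stack.pop()
        if pattern in seen:
            continue  # the same pattern can be reached along different branches
        seen.add(pattern)
        if len(pattern) > 1:
            results[pattern] = ts
        sensors = {s for s, _ in pattern}
        for item, item_ts in frequent.items():
            s = item[0]
            # One evolution per sensor, and the new sensor must be adjacent.
            if s in sensors or not (neighbors.get(s, set()) & sensors):
                continue
            new_ts = ts & item_ts
            # Anti-monotonicity: an infrequent child has no frequent supersets, so drop it.
            if len(new_ts) >= theta:
                stack.append((pattern | {item}, new_ts))
    return results


occ = {("a", "up"): {1, 2, 3}, ("b", "up"): {1, 2, 3},
       ("c", "up"): {2, 3}, ("d", "up"): {1, 2, 3}}
nbrs = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(len(mine_scps(occ, nbrs, theta=2)))  # 3
```

The three patterns found are {a, b}, {b, c}, and {a, b, c}; sensor d is frequent on its own but has no neighbors, and {a, c} is never formed because a and c are not adjacent.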
Method: Discussion
Time Complexity
Segment approach: $O(n_e \cdot l_e \cdot l_s) \approx O(m)$
$l_s$ is small, and $n_e \cdot l_e < m$
Mean shift: $O(n_l \cdot k) \approx O(m)$
$k$: the average number of shifting operations
Second stage: $O(n_G(n|E_G| + n_p^2 n_s))$
$n_G$: the number of connected components in $G$ that have SCPs
$|E_G|$: the number of edges in $G$
$n_p$: the maximum number of SCPs on a connected component
$n_s$: the maximum support of an SCP
Method: Discussion
Space Complexity
Segment & mean shift: nearly linear
Second stage: $O(n \cdot n_p \cdot n_s)$
Method: Discussion
Parameter Settings
The minimum support θ
How many occurrences a pattern needs to be considered frequent
The distance threshold h
What distance makes two sensors reachable
The change threshold δ
How much change in the reading reflects a significant and unusual behavior
The mean shift bandwidth ω
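The distance threshold h determines the neighbor graph G over which patterns must be connected. A minimal Python sketch, assuming sensor locations li are (latitude, longitude) pairs in degrees and that distance means great-circle (haversine) distance in kilometres with a mean Earth radius of 6371 km; the slides do not say which distance Assembler uses. The helper names are hypothetical. The sketch also counts connected components, since the second-stage cost is stated per connected component of G.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption)


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reachability_graph(locations, h_km):
    """Adjacency sets: two sensors are neighbors if they are at most h_km apart."""
    ids = list(locations)
    nbrs = {s: set() for s in ids}
    for i, s in enumerate(ids):
        for t in ids[i + 1:]:
            if haversine_km(locations[s], locations[t]) <= h_km:
                nbrs[s].add(t)
                nbrs[t].add(s)
    return nbrs


def connected_components(nbrs):
    """Connected components of the graph, found by iterative DFS."""
    comps, seen = [], set()
    for start in nbrs:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            s = stack.pop()
            comp.add(s)
            for t in nbrs[s]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        comps.append(comp)
    return comps


locations = {"a": (39.90, 116.40), "b": (39.91, 116.40), "c": (39.00, 117.20)}
g = reachability_graph(locations, h_km=5)
print(g["a"], len(connected_components(g)))  # {'b'} 2
```

The all-pairs loop is quadratic in the number of sensors; for thousands of sensors a spatial index (grid or k-d tree) would avoid comparing distant pairs.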
Experiment
Dataset
1. Air is an air quality data set: 180 air quality sensors are deployed in 16 cities in northern China (Beijing, Tianjin, and 14 cities in Hebei Province). Each sensor measured the hourly AQI during the period 2013.02.08 – 2014.08.27.
2. Bike is the Citi Bike rental data set for 332 rental docks in New York; we record the number of available bikes at each dock every 30 minutes during 2013.07.01 – 2014.08.30.
3. Syn-Sensor is a collection of 4 synthetic data sets used to evaluate the scalability of Assembler w.r.t. the number of sensors n.
Experiment
Illustration
Experiment
Efficiency Study
Varying θ and h
Experiment
Efficiency Study
Varying δ and ω
Experiment
Scalability
Thank you