Stream Hierarchy Data Mining for Sensor Data
Margaret H. Dunham
SMU
Dallas, Texas 75275
[email protected]
Vijay Kumar
UMKC
Kansas City, Missouri 64110
[email protected]
11/26/07 – IRADSN’07
From Sensors to Streams – An Outline
• Data Stream Overview
• Data Stream Visualization
  ◦ Temporal Heat Map
• Data Stream Modeling
  ◦ Extensible Markov Model
• Data Stream Hierarchy
From Sensors to Streams
• Data captured and sent by a set of sensors is usually referred to as “stream data”.
• It is a real-time sequence of encoded signals that contain the desired information: a continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or by geographic coordinates).
• Stream data is infinite: the data keeps coming.
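As a minimal sketch of this definition, assuming Python and hypothetical names (`Reading`, `sensor_stream`), an unbounded generator can stand in for a stream: items arrive in order, each carries an explicit timestamp, and a consumer can only ever look at a finite prefix. The values are simulated, not real sensor data.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Reading:
    """One stream item; the field names are illustrative assumptions."""
    timestamp: int   # explicit ordering key (arrival order is the implicit one)
    sensor_id: str
    value: float


def sensor_stream(sensor_id: str, seed: int = 0) -> Iterator[Reading]:
    """Yield readings forever; the stream has no last element."""
    rng = random.Random(seed)            # simulated signal, not real sensor data
    for t in itertools.count():
        yield Reading(timestamp=t, sensor_id=sensor_id, value=rng.random())


# A consumer can only ever inspect a finite prefix of the stream.
first_five = list(itertools.islice(sensor_stream("s1"), 5))
```

Because the generator never terminates, anything that needs "all the data" (sorting, exact global statistics) is off the table, which is what motivates the synopsis and windowing ideas that follow.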
Data Stream Management Systems (DSMS)
• Software that facilitates querying and managing stream data.
• Retrieves the most recent information from the stream.
• Data aggregation merges multiple streams together.
• Modeling “summarizes” the stream data.
• Visualization is needed to observe, in real time, the spatial and temporal patterns and trends hidden in the data.
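Two of these functions, merging streams and retrieving the most recent information, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the interface of any particular DSMS: the `(timestamp, sensor_id, value)` tuple layout and the names `merge_streams` and `RecentWindow` are hypothetical, and each input stream is assumed to be already ordered by timestamp.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Item = Tuple[int, str, float]  # (timestamp, sensor_id, value); assumed layout


def merge_streams(*streams: Iterable[Item]) -> Iterator[Item]:
    """Lazily merge timestamp-ordered streams into one ordered stream."""
    return heapq.merge(*streams, key=lambda item: item[0])


class RecentWindow:
    """Keep only the most recent `size` items; older items are dropped."""

    def __init__(self, size: int) -> None:
        self._items: Deque[Item] = deque(maxlen=size)

    def push(self, item: Item) -> None:
        self._items.append(item)

    def latest(self) -> List[Item]:
        return list(self._items)

    def mean(self) -> float:
        """A simple summary: mean value over the window (0.0 if empty)."""
        if not self._items:
            return 0.0
        return sum(v for _, _, v in self._items) / len(self._items)


a = [(1, "a", 10.0), (4, "a", 12.0), (5, "a", 14.0)]
b = [(2, "b", 20.0), (3, "b", 22.0)]

window = RecentWindow(size=3)
for item in merge_streams(a, b):
    window.push(item)
```

The bounded `deque` is the design point here: memory stays constant no matter how long the stream runs, at the price of forgetting everything older than the window.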
DSMS Problems
• Stream management development is in a state similar to that of databases prior to the 1970s.
• Each system or researcher looks at a specific application or system.
  ◦ No standards concerning functionality
  ◦ No standard query language
• It is unreasonable to expect that end users will access raw data, data in the DSMS, or even a summarized view of the data.
• Domain experts need to “see” a higher level of data.
Our Proposal
A four-level data abstraction to facilitate the creation of actionable intelligence for domain experts evaluating sensor data.
Assumptions for Our Research
• End User:
  ◦ May not be knowledgeable about sensors
  ◦ Probably a domain expert
  ◦ May not need to see exact sensor values
  ◦ Concerned with trends and approximate values
• Need to see data from MANY sensors at one time
• Need to see data continuously in a visualization of the stream
Suppose There Were MANY Sensors
• Traditional line graphs would be very difficult to read
• Requirements for a new visualization technique:
  ◦ High-level summary of data
  ◦ Handle multiple sensors at once
  ◦ Continuous
  ◦ Temporal
  ◦ Spatial
Temporal Heat Map
• Also called Temporal Chaos Game Representation (TCGR)
• A Temporal Heat Map (THM) is a visualization technique for streaming data derived from multiple sensors.
• It is a two-dimensional structure similar to an infinite table.
• Each row of the table is associated with one sensor value.
• Each column of the table is associated with a point in time.
• Each cell within the THM is a color representation of the sensor value.
• Colors are normalized (in our examples):
  ◦ 0 – White
  ◦ 0.5 – Blue
  ◦ 1.0 – Red
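The color scale above can be sketched as a mapping from a normalized value to an RGB triple. The slide gives only the three anchor colors, so the linear blending between them, the per-row min-max normalization, and the function names (`value_to_color`, `normalize`, `heat_map`) are assumptions made for this illustration. The example also works on a finite batch of columns rather than an unbounded table.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
WHITE: RGB = (255, 255, 255)
BLUE: RGB = (0, 0, 255)
RED: RGB = (255, 0, 0)


def _lerp(c0: RGB, c1: RGB, f: float) -> RGB:
    """Linear blend between two colors, with f in [0, 1]."""
    return tuple(round(a + (b - a) * f) for a, b in zip(c0, c1))


def value_to_color(v: float) -> RGB:
    """Map a normalized value to white (0) -> blue (0.5) -> red (1.0)."""
    v = min(1.0, max(0.0, v))            # clamp out-of-range readings
    if v <= 0.5:
        return _lerp(WHITE, BLUE, v / 0.5)
    return _lerp(BLUE, RED, (v - 0.5) / 0.5)


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalize one row to [0, 1]; a constant row maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def heat_map(rows: Sequence[Sequence[float]]) -> List[List[RGB]]:
    """rows[i][t] is the value of row i at time step t; returns cell colors."""
    return [[value_to_color(v) for v in normalize(row)] for row in rows]


cells = heat_map([[0.0, 5.0, 10.0], [3.0, 3.0, 3.0]])
```

In a truly continuous setting the minimum and maximum of a row are not known in advance, so a streaming version would need fixed bounds or a running estimate in place of `normalize`.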
Cisco – Internal VoIP Traffic Data
[Figure: Complete Stream (CiscoEMM.png); axes: Values →, Time →]
• VoIP traffic data was provided by Cisco Systems and represents logged VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
10/11/07 – NGDM'07
Derwent River (UK)
[Figure: Derwent Temporal Heat Map (derwentrotate.png); labels: 28023, 28043, 28011, 28117, 28048, 28010]
Data Stream Modeling Requirements
• Summarization (synopsis) of data
• Use the data, NOT a sample
• Temporal and spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate (clustering)
Extensible Markov Model
• Extensible Markov Model (EMM): at any time t, the EMM consists of a Markov Chain (MC) with a designated current node, Nn, and algorithms to modify it, where the algorithms include:
  ◦ EMMCluster, which defines a technique for matching between input data at time t + 1 and existing states in the MC at time t.
  ◦ EMMIncrement, which updates the MC at time t + 1 given the MC at time t and the clustering measure result at time t + 1.
  ◦ EMMDecrement, which removes nodes from the EMM when needed.
• In addition, the EMM has associated data mining functions such as Rare Event Detection and Prediction.
Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
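The three roles described above (match, learn, forget) can be illustrated with a toy Markov chain over clusters. This is a sketch under assumptions, not the algorithms from the cited paper: the class name `SimpleEMM`, the Euclidean nearest-center matching, the fixed `threshold`, and the way probabilities are derived from link counts are all choices made for this example, and the input vectors are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


class SimpleEMM:
    """Toy Markov chain over clusters; NOT the published EMM algorithms."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold                    # assumed matching radius
        self.centers: Dict[int, Tuple[float, ...]] = {}
        self.links: Dict[Tuple[int, int], int] = defaultdict(int)
        self.current: Optional[int] = None            # designated current node
        self._next_id = 1

    def cluster(self, x: Sequence[float]) -> Optional[int]:
        """Return the nearest existing node within `threshold`, else None."""
        best, best_dist = None, self.threshold
        for node, center in self.centers.items():
            dist = math.dist(center, x)
            if dist <= best_dist:
                best, best_dist = node, dist
        return best

    def increment(self, x: Sequence[float]) -> int:
        """Learn one input: reuse or add a node, then count the transition."""
        node = self.cluster(x)
        if node is None:
            node = self._next_id
            self._next_id += 1
            self.centers[node] = tuple(x)
        if self.current is not None:
            self.links[(self.current, node)] += 1
        self.current = node
        return node

    def decrement(self, node: int) -> None:
        """Forget a node together with every link that touches it."""
        self.centers.pop(node, None)
        for link in [k for k in self.links if node in k]:
            del self.links[link]
        if self.current == node:
            self.current = None

    def probability(self, src: int, dst: int) -> float:
        """Estimated P(src -> dst): link count over all transitions out of src."""
        total = sum(c for (s, _), c in self.links.items() if s == src)
        return self.links.get((src, dst), 0) / total if total else 0.0


emm = SimpleEMM(threshold=2.0)
inputs = [(18, 10, 3), (17, 10, 2), (14, 8, 2), (18, 10, 3), (14, 8, 2)]
visited = [emm.increment(x) for x in inputs]
```

Here five inputs produce only two nodes, because nearby vectors are matched to an existing node instead of creating a new one. A low value from `probability` is one plausible signal for flagging a rare transition, and the most probable outgoing link is one plausible basis for prediction; both uses are suggestions of this sketch rather than descriptions of the authors' functions.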
EMM Learning
[Figure: EMM learning example with nodes N1, N2, N3; transition labels 1/1, 1/2, 1/3, 2/2, 2/3; and input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0>]
EMM Forgetting
[Figure: EMM forgetting example with nodes N1, N2, N3, N5, N6 and transition labels 1/2, 1/3, 1/6, 2/2]
EMM Sublinear Growth Rate
Minnesota Department of Transportation (MnDot)
Traditional DBMS Data Abstraction
• Three levels of data abstraction
  ◦ Physical
  ◦ Logical
  ◦ External
• Data is normally pulled to the user by a query
Proposed DSMS Data Abstraction
• Abstraction
  ◦ Level 0 – Physical Level
    ▪ Raw data from sensors
    ▪ Cannot be stored
  ◦ Level 1 – DSMS
    ▪ Sensor data is merged, aggregated, and cleansed.
    ▪ DSMS queries may be processed against this data.
  ◦ Level 2 – Model
    ▪ Summarization (synopsis) of data
  ◦ Level 3 – Domain Expert
    ▪ Summary visualization
• Data is normally pushed to the user
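The four levels and the push-style delivery can be sketched as a chain of Python generators ending in a callback. Everything concrete here is an assumption for illustration: the function names, the cleansing rule (drop missing values), the aggregation (mean across sensors), and the bucketing used as a stand-in for the model level. Level 0 is simulated with a short list.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Raw = Dict[str, Optional[float]]       # Level 0: one raw reading per sensor


def level1_dsms(raw: Iterable[Raw]) -> Iterator[float]:
    """Level 1: cleanse (drop missing values) and aggregate across sensors."""
    for reading in raw:
        values = [v for v in reading.values() if v is not None]
        if values:
            yield mean(values)


def level2_model(aggregates: Iterable[float], width: float) -> Iterator[int]:
    """Level 2: synopsis; replace each aggregate by a coarse bucket index."""
    for value in aggregates:
        yield int(value // width)


def level3_push(states: Iterable[int], alert_at: int,
                notify: Callable[[str], None]) -> None:
    """Level 3: push a message to the domain expert when a trigger fires."""
    for step, state in enumerate(states):
        if state >= alert_at:
            notify(f"step {step}: state {state} reached alert level")


# Level 0 is simulated here; real raw data would arrive from sensors.
raw_stream: List[Raw] = [
    {"s1": 10.0, "s2": 14.0},
    {"s1": None, "s2": 31.0},
    {"s1": 52.0, "s2": 48.0},
]
messages: List[str] = []
level3_push(level2_model(level1_dsms(raw_stream), width=10.0),
            alert_at=5, notify=messages.append)
```

Nothing is stored between levels: each raw reading flows through once and is discarded, and the expert receives output only through `notify`, without issuing a query.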
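[Table: comparison of data hierarchies. Columns: Levels; Lowest Level; Highest Level; Inter-level Data Abstraction; Migration. Remaining cell text as extracted: Subset/Cache/Buffer; Fetch/Prefetch; Fetch, Prefetch; Aggregation.]
• Memory Hierarchy: n levels; lowest level: External Storage
• DBMS Data Hierarchy: 3 levels; lowest level: Physical Storage; highest level: External View
• Data Warehouse: n levels; lowest level: Operational Data; highest level: Cube/Multidimensional View
• Stream Hierarchy: 4 levels; lowest level: Sensor Data; highest level: Visualization/Triggers; migration: Automatic Push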
Actionable Intelligence
[Figure: the four-level stream hierarchy, top to bottom]
• LEVEL 3 – Domain Expert: Visualization; Triggers, Lookmarks, Anomalies; Data Mining Applications
• LEVEL 2 – Model: Temporal Dynamic Synopsis (nodes N1, N2, N3, N4, N5 with transition labels P12, P15, P21, P24, P31, P34, P41, P53, P55)
• LEVEL 1 – DSMS: Data Stream Management System (DSMS); Query; Scratch Space
• LEVEL 0 – Sensors: Streams
Stream Hierarchy Summary
• Except for the inter-level functionality requirements, each level's functionality is independent of the others and may differ across implementations.
• The model used must capture time and ordering of data, be able to both learn and forget, and use some variation of clustering.
• Visualization at the domain expert level must capture both time and ordering. In addition, it should be easy to “read” for many sets of sensors.