Synthesis of Streaming Data from Multiple
Sensors via Embedded Data Extraction
April 15th, 2004 Project Report
Magdiel Galán
CSE591: Data Mining
Dr. Huan Liu
Spring 2004
http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt
Outline






Problem/Project Description
Sampling
Smoothing
Clustering
Current Status
Plans
Project Description


Synthesis of Streaming Data from Multiple
Sensors (~100’s) via Embedded Data
Extraction for mission critical applications.
Work in conjunction with Motorola’s Human
Interface Lab (on-going project)

Simulation Environment
Project Description

Goal: Develop a driver assistance system that provides
feedback, but not control, during unsafe instances

  arising from distractions caused by cellphones, PDAs, eMail

Why: Targeting a government initiative to create a
safer car environment in the information age explosion
How: Develop an intelligent system by mining Streaming
Data from multiple automotive sensors

  Development work is being done using a driving simulator with
  projection screens and up to 400 parameters/sensors,
  including video links for eye-gaze and foot-pedal movement
Sample Cases

Case Scenario #1:

Passing Slow Traffic

  which slowed down due to an accident
  at which you are also rubber-necking
  while fidgeting with your radio
Case Scenario #2:

Making a left turn

  while hearing directions from MapTracker
  while checking the time because you are late
  while reaching for the cellphone with an incoming call
Simulation Environment
[Figure: driving simulator diagram; labels: 150 Simulated View, Driving Experience, Gas, Batt, EngineTemp, Acceleration, Lateral Acc., PDA, GearShift, Oil, Air Bag, GPS, Driver, Internet, CellPhone, A/C, CD, Sonar Proximity Sensor, RPMs, Wheel Rotation, Brake Pressure]
Motivation

Primary Interest: Robotics

  Merging of Sensors/Sensor Fusion

    optical
    proximity (IR, sonar, radar)
    location (GPS, visual maps)
    movement (actuators, rotations)
    system (battery, temperature, bump switches)

  Problem: decide agent’s next best action vs. a goal
  Not too dissimilar from an Automobile environment

Other Applications:

  Manufacturing Environment

    Increase Yields/Productivity/Reduce Defects using daily
    quality control monitoring data (100’s to 1K’s of parameters)

    Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant
    Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc.
Stream Data Properties

Numerical/Continuous

  Speed
  Steering/Heading
  Acceleration (Forward/Lateral)
  Distance (Lane Edge, Vehicle in Front)
  Lane Position

Categorical

  Gear: P/R/D/OD/L1/L2
  Headlights On/Off
  Radio/CD On
  Incoming Call

Sampling Rate: 60Hz
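One way to picture a single stream instance with these properties is as a record mixing numerical and categorical fields. The sketch below is illustrative only: the class and field names (DrivingSample, speed_kmh, and so on) are assumptions, not names from the project. Only the attribute list, the gear values, and the 60Hz rate come from the slide.

```python
from dataclasses import dataclass
from enum import Enum

SAMPLING_RATE_HZ = 60  # sampling rate stated on the slide


class Gear(Enum):
    """Categorical gear positions listed on the slide."""
    P = "P"
    R = "R"
    D = "D"
    OD = "OD"
    L1 = "L1"
    L2 = "L2"


@dataclass(frozen=True)
class DrivingSample:
    """One instance of the stream (field names are hypothetical)."""
    # Numerical/continuous attributes
    speed_kmh: float
    steering_deg: float
    heading_deg: float
    forward_acc: float
    lateral_acc: float
    lane_edge_distance: float
    lead_vehicle_distance: float
    lane_position: float
    # Categorical attributes
    gear: Gear
    headlights_on: bool
    radio_cd_on: bool
    incoming_call: bool


def samples_in_window(seconds: float, rate_hz: int = SAMPLING_RATE_HZ) -> int:
    """Number of instances that arrive within a time window."""
    return round(seconds * rate_hz)
```

At 60Hz, a half-second window holds samples_in_window(0.5) = 30 instances, which gives a feel for how little data an on-line decision can look at.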
Critical/Special Conditions






Left/Right Turn
Passing/Changing Lanes
U-Turn
Reverse
Tailgating
Not On Road
Some Warning Signs


Lane Drifting
Erratic Behavior

  droopy eyes
  eyes not facing the road
  foot/pedal movement does not correspond with
  road conditions

Incoming Call while performing Critical
Maneuver
Goal

Identify Instances outside normal patterns
as an indication of an Abnormal Situation


Hence – Need to draw Driver’s Attention to
Impending Situation
Ultimate Goal:

Develop a bootstrapping mechanism that
combines driving situation classifiers (e.g.
LeftTurn/Passing) together with instance
selection methods in active learning

Bootstrapping – selecting high utility data for retraining
Instance Selection Properties



Instance representative
Instance selection reduces rows
Ideal outcome of instance selection

  choose a data subset that achieves the same result as
  the whole data with little or no deterioration in
  performance P

Should be model independent

  ∆P(Mi) ≐ ∆P(Mj)
  [LM01]
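The model-independence condition can be checked numerically. The sketch below assumes P is plain classification accuracy on a held-out set and takes ∆P(M) as the accuracy lost when model M is trained on the selected subset instead of the whole data. The two toy models (nearest class mean and 1-nearest-neighbor) and the data are made up for illustration and are not the project's classifiers.

```python
from statistics import mean


def nearest_mean_fit(train):
    """Model Mi (hypothetical): classify by the closest class mean."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    centers = {label: mean(xs) for label, xs in by_label.items()}
    return lambda x: min(centers, key=lambda label: abs(x - centers[label]))


def one_nn_fit(train):
    """Model Mj (hypothetical): classify by the single nearest training point."""
    return lambda x: min(train, key=lambda pair: abs(x - pair[0]))[1]


def accuracy(predict, test):
    """Performance P, assumed here to be plain accuracy."""
    return sum(predict(x) == label for x, label in test) / len(test)


def deterioration(fit, full, subset, test):
    """Delta P(M): accuracy with the whole data minus accuracy with the subset."""
    return accuracy(fit(full), test) - accuracy(fit(subset), test)


full = [(0.0, "slow"), (0.2, "slow"), (0.4, "slow"),
        (1.0, "fast"), (1.2, "fast"), (1.4, "fast")]
subset = [(0.2, "slow"), (1.2, "fast")]  # one representative per class
test = [(0.1, "slow"), (0.5, "slow"), (0.9, "fast"), (1.3, "fast")]

delta_mi = deterioration(nearest_mean_fit, full, subset, test)
delta_mj = deterioration(one_nn_fit, full, subset, test)
```

On this toy data both deteriorations come out equal, which is the ideal case: the two-row subset serves either model as well as the six-row whole.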
Problem#1: Sampling

Initial step towards instance selection:
select representative subset…

Divide into collection of elements which must
cover the whole population without
overlapping [GHL01]

These are called sampling units
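A minimal sketch of this idea, assuming the sampling units are consecutive fixed-size blocks of the stream and that one instance is drawn at random from each unit. The function names and the unit size are illustrative choices, not taken from [GHL01].

```python
import random


def sampling_units(stream, unit_size):
    """Split a sequence into consecutive units that cover every element
    exactly once (no overlap, nothing left out)."""
    return [stream[i:i + unit_size] for i in range(0, len(stream), unit_size)]


def sample_one_per_unit(units, seed=0):
    """Draw one representative from each unit (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.choice(unit) for unit in units]


readings = list(range(10))            # stand-in for ten sensor readings
units = sampling_units(readings, 4)   # last unit is shorter, nothing is dropped
picked = sample_one_per_unit(units)
```

Because the units partition the data, concatenating them gives back the original sequence, which is the "cover the whole population without overlapping" requirement.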
Sampling Results
Sampling at 10 ms
(x-axis: signal duration; y-axis: count)
Problem#2: Smoothing


Reduce/Filter out noise and outliers.
Smoothing Techniques used:

Bin Median/Rolling Average [LM01]/[D03]


Median preferred over Mean since less sensitive to
outliers
Thresholding/Bin Boundaries [LM01]/[HK01]

10% offset threshold
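The smoothing techniques can be sketched as follows. The bin median and rolling average follow their usual textbook definitions. The thresholding function is one possible reading of the "10% offset" rule, assumed here to mean "hold the last accepted value until a new reading differs from it by more than 10%"; the slides do not spell the rule out, and all function names are illustrative.

```python
from statistics import median


def bin_median(values, bin_size):
    """Replace every value in a bin by the median of that bin."""
    out = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        out.extend([median(chunk)] * len(chunk))
    return out


def rolling_average(values, window):
    """Mean of the current value and up to window-1 previous values."""
    out = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1):i + 1]
        out.append(sum(recent) / len(recent))
    return out


def threshold_hold(values, offset=0.10):
    """Assumed rule: keep the last accepted value until a reading
    moves more than `offset` (as a fraction) away from it."""
    if not values:
        return []
    out = [values[0]]
    for v in values[1:]:
        held = out[-1]
        out.append(v if abs(v - held) > offset * abs(held) else held)
    return out
```

For speeds [60, 61, 200, 62, 63, 64] with bins of three, bin_median returns [61, 61, 61, 63, 63, 63]: the spike at 200 disappears, whereas a bin mean would have been pulled up to 107.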
PreSmoothing - RAW Data
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
RAW Data Map/Course
Route Map – starting point at (0,0)
Smoothing Results - Median
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
Smoothing Results - Median
Smoothing Results - Threshold
Smoothing Results - Threshold
Dr. Liu’s Incremental Instance
Selection Algorithm
Given: Data streams with instances I
Output: indicative instances
For each data stream
  Do the following incrementally
    Create a profile P for I
    Check new instance i against P
    if i is an outlier of P
      Return i (Outliers)
    else
      Update P with i
  End do
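The loop above can be sketched in Python for a single numerical stream. The slides do not say what the profile P contains, so this sketch assumes a running mean and standard deviation (Welford's update) and calls an instance an outlier when it lies more than z_threshold standard deviations from the mean. The class name, the threshold, and the warm-up length are all assumptions.

```python
import math


class Profile:
    """Assumed profile: running mean/variance of the instances seen so far."""

    def __init__(self, z_threshold=3.0, warmup=10):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.z_threshold = z_threshold
        self.warmup = warmup  # instances absorbed before any outlier test

    def is_outlier(self, x):
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.z_threshold

    def update(self, x):
        # Welford's one-pass update of mean and squared deviations
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def select_instances(stream, profile):
    """Yield outliers of the profile; fold everything else into it."""
    for x in stream:
        if profile.is_outlier(x):
            yield x
        else:
            profile.update(x)


speeds = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 120, 50, 51]
indicative = list(select_instances(speeds, Profile()))
```

The stream is read once and only three numbers are kept per profile. Outliers are returned without updating the profile, as in the pseudocode, so a single spike does not shift the notion of normal.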
Problem#3: Clustering

Why?




Data is Unclassified
Previous results using Numerical Data on most
significant key parameters
Develop clusters exemplifying ALL attributes
Select instances that do not belong to a cluster
as triggering mechanism
Stream Clustering Challenges


Large “Unclassified” Data Base
Fast On-Line Resolution within small window

  0.5 to 2 or 3 seconds

One Pass Only restriction (need fast I/O)
Mix of Numerical and Categorical Data

  Traditional algorithms do not work well for categorical
  attributes (remember P/R/D/OD/L1/L2, or CD On)

    Centroid approach cannot be used
    Hard to reflect the properties of the neighborhood of the
    points

Memory Constraints
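To make the mixed-data difficulty concrete: a centroid needs an average, and there is no average of gears P, R and D. One common workaround, shown here only as an illustration and not as the project's method, is a Gower-style dissimilarity that scales numerical differences by the attribute range and counts categorical attributes as simply equal or different. The attribute names and ranges below are made up.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] between two records.

    a, b           -- dicts with the same keys
    numeric_ranges -- {attribute: (low, high)} for numerical attributes;
                      every other attribute is treated as categorical
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            low, high = numeric_ranges[key]
            total += abs(a[key] - b[key]) / (high - low)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


ranges = {"speed": (0.0, 120.0)}  # assumed operating range in km/h
forward = {"speed": 60.0, "gear": "D", "cd_on": True}
backing = {"speed": 90.0, "gear": "R", "cd_on": True}
d = mixed_distance(forward, backing, ranges)
```

Here the speed term contributes 30/120 = 0.25, the gear term 1, and the CD term 0, so d is 1.25/3, roughly 0.42. A distance like this supports neighborhood-based methods but still gives no centroid to summarize a cluster.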
Clustering Techniques vs.
Streaming Data

SVM



Good at handling multidimensional data
Not good – need classified data, lots of I/O,
data in memory
BIRCH


Good at handling multidimensional data, large
databases; single scan, linear I/O time
Not good – predominantly for “numerical” type
of attributes; order dependent
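BIRCH's single-scan behavior rests on its clustering feature, the triple (N, LS, SS) of point count, linear sum, and sum of squares, which can be updated one point at a time. The sketch below shows only that summary, not the CF-tree or the full algorithm, and the class name is an illustrative choice. It also shows why the method is tied to numerical attributes: every quantity is a sum.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """BIRCH-style cluster summary: count, linear sum, sum of squares."""
    n: int = 0
    ls: list = field(default_factory=list)
    ss: float = 0.0

    def add(self, point):
        """Absorb one point in O(dimensions) time without storing it."""
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of the absorbed points to the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))


cf = ClusteringFeature()
for p in [(0.0, 0.0), (2.0, 0.0)]:
    cf.add(p)
```

After the two points, the centroid is [1.0, 0.0] and the radius is 1.0, and neither point had to be kept in memory.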
Clustering Techniques vs.
Streaming Data (2)

CURE (Clustering Using REpresentatives) [D03]



Good at handling outliers; hierarchical
Not good – random sampling (won’t fit
streaming)
ROCK (RObust Clustering Using LinKs) [D03]


Good at Hierarchical clustering for categorical
attributes
Not good: Random sampling for scale up
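The "links" in ROCK are counts of common neighbors: two records are neighbors when their similarity reaches a threshold θ, and link(p, q) is the number of neighbors they share. The sketch below computes links with Jaccard similarity over attribute=value items. It covers only the link computation, not ROCK's merging procedure. The records and the value θ = 0.5 are made up, and a record is not counted as its own neighbor, which is a choice made for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbor_sets(records, theta):
    """For each record index, the indices of other records with
    similarity >= theta."""
    return [
        {j for j, other in enumerate(records)
         if j != i and jaccard(rec, other) >= theta}
        for i, rec in enumerate(records)
    ]


def link(neighbors, p, q):
    """Number of neighbors that records p and q have in common."""
    return len(neighbors[p] & neighbors[q])


records = [
    {"gear=D", "cd=on", "lights=off"},   # 0
    {"gear=D", "cd=on", "lights=on"},    # 1
    {"gear=D", "cd=off", "lights=off"},  # 2
    {"gear=R", "cd=off", "lights=on"},   # 3
]
nbrs = neighbor_sets(records, theta=0.5)
```

Records 1 and 2 have similarity 0.2 and are not neighbors of each other, yet link(nbrs, 1, 2) is 1 because both are neighbors of record 0. This is how links reflect the neighborhood of a point without any centroid.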
My 1st Clustering Attempt…
[Figure; labels: Move in Reverse]
My 1st Clustering Attempt (2)
[Figure; labels: Zoom, Next Page]
My 1st Clustering Attempt (3)
[Figure; labels: Move in Reverse]
Current Status/Plans


This is an ON-GOING project
Cluster Technique Development


Evolve from known methods?
Generalization of the technique

Not just Automobile Streaming Data
References




[LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance
Selection and Construction for Data Mining. KAP, 2001. ASU Library
[GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance
Selection and Construction for Data Mining. KAP, 2001. ASU Library
[HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8:
Data Cleaning, Clustering. Morgan Kaufmann. ASU Library
[D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall. Chps. 3-5:
Mining Techniques, Classification, Clustering. ASU Library