Candidate: Parisa Rashidi
Advisor: Diane J. Cook
Agenda
Introduction
Challenges
Solutions
Sequence mining
Stream mining
Transfer Learning
Active learning
Results
Conclusions & future directions
Smart Homes
Sensors & actuators integrated into everyday objects
Knowledge acquisition about inhabitant
[Figure: agent-environment loop: the agent receives percepts (sensors) from the environment and acts on it through actions (controllers)]
Applications
Energy efficiency
Security
Achieving more comfort
Monitoring well-being of residents
In home monitoring
Monitor daily activities
Check for anomalies
Help by giving prompts and cues
Activity Recognition
A vital component of smart homes
Recognizing activities from stream of sensor events
[Figure: stream of sensor events … A B C D A C D F …; each letter is a sensor event, and an activity is a sequence of sensor events]
Why is it difficult?
Human activity is erratic and complex
Discontinuous (interrupting events)
Step order might vary each time
Inter-subject and intra-subject variability
The algorithm should be scalable
Data annotation
Costly and laborious
Training for each new space?
Unsolved Challenges
Many methods proposed
Hidden Markov models, conditional
random fields, naïve Bayes, …
Current methods
Consider many simplifying assumptions
Most are supervised
Data annotation problem
Even if unsupervised
Trained for each new setting from scratch
Ignore activity variations or interruptions
…
Our Solutions
Discovering complex activities
Sequence mining
Discovering activities from streams
Stream sequence mining
Transferring activity models to new spaces
Transfer learning
Guiding activity annotation
Active learning
Sequence Mining
Sequence
Ordered set of items
Examples
Speech: sequence of phonemes
DNA sequence: AAGCTACGTAA
Network: sequence of packets
Our data: sequence of sensor events
Goal
Finding repetitive sequential patterns in data
Many methods proposed
GSP, PrefixSpan, SPADE, …
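A minimal sketch of the goal stated above, finding repeated sequential patterns in a sequence of sensor events. It simply counts contiguous subsequences and is not GSP, PrefixSpan, or SPADE; the function name, parameters, and the toy event string are illustrative assumptions.

```python
from collections import Counter

def frequent_subsequences(events, min_len=2, max_len=3, min_count=2):
    """Count contiguous subsequences of each length and keep the repeated ones."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(events) - n + 1):
            counts[tuple(events[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}

# Toy stream: each letter stands for one sensor event (hypothetical data).
events = list("ABCDACDFABCD")
patterns = frequent_subsequences(events)
for pattern, count in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print("".join(pattern), count)
```

On this toy stream the pattern C D occurs three times and A B C twice; real sensor data would also need the discontinuity and ordering tolerance discussed in the following slides.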
Activity Sequence Mining Problem
Data: a single sequence with no boundaries
Unlike transaction data
We are looking for activity sequence patterns
With discontinuous steps
Variations of the same activity
Transaction ID | Items
1 | {Milk, Egg, Bread}
2 | {Bread, Beer}
3 | {Soap, Milk, Egg}
Transaction data: item-set boundaries are given
Sensor data: … M D M D A C D F … (no boundaries!)
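To illustrate what "discontinuous steps" and "variations" mean for an unsegmented stream, the sketch below scans a single event sequence for windows that contain all events of a pattern in any order, allowing a few interrupting events. This is an assumption-laden illustration, not the DVSM or COM algorithm; `find_instances` and `max_gap` are hypothetical names.

```python
def find_instances(stream, pattern, max_gap=1):
    """Return (start, end) index pairs of non-overlapping windows that contain
    every event of `pattern`, in any order, within len(pattern) + max_gap events."""
    need = set(pattern)
    width = len(need) + max_gap
    instances = []
    i = 0
    while i < len(stream):
        match_end = None
        if stream[i] in need:
            seen = set()
            for j in range(i, min(i + width, len(stream))):
                if stream[j] in need:
                    seen.add(stream[j])
                if seen == need:
                    match_end = j
                    break
        if match_end is None:
            i += 1
        else:
            instances.append((i, match_end))
            i = match_end + 1  # continue after the matched window
    return instances

# Hypothetical stream with three variations of the pattern <a, b>.
stream = list("bxaqabzaub")
print(find_instances(stream, "ab", max_gap=1))  # [(0, 2), (4, 5), (7, 9)]
```

The three matches correspond to the windows b x a, a b, and a u b: the same two events, in varying order and with one interrupting event.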
From Sequence Mining to Activity Recognition
Find activity patterns
Discontinuous Varied Sequence Mining (DVSM)
Continuous, varied Order, Multi Threshold (COM)
Cluster similar patterns
Cluster centroid is a representative activity.
Recognize activities
Hidden Markov Model
[Figure: pipeline: Sensor Data → DVSM → Interesting Patterns → Clustering → Representative Activities → Recognition]
DVSM
Finds general patterns/variations in several iterations
During each iteration
Finds increasing length patterns
Extend by prefix and suffix at each iteration
Checks if it is a variation of a general pattern
At the end of each iteration
Retain only interesting patterns according to MDL principle
[Figure: general pattern <a,b> with pattern instances {b,x,a}, {a,b,q}, {a,u,b}; labels: Compression, Continuity]
DVSM
[Figure: continuity at the pattern, variation, instance, and event levels]
Prunes patterns/variations with low compression values
Highly discontinuous
Infrequent
Prunes non-maximal patterns
Prune irrelevant variations using mutual information and
sensor
Improve DVSM: COM
Different sensor frequencies for
Different regions of home
Different types of sensor
“Rare item problem”
A global min-support doesn’t work!
Use multiple support thresholds
[Figure: support thresholds per home region for key sensors (f_k: 0.01, 0.02, or NA) and motion sensors (f_m: 0.02 to 0.06). Legend: frequent motion sensors, frequent key sensors, infrequent motion sensors, infrequent key sensors]
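The rare item problem can be made concrete with a short sketch: each sensor is checked against the support threshold of its own type instead of one global threshold. The sensor IDs, counts, and threshold values below are hypothetical, and the sketch does not reproduce how COM actually derives its thresholds.

```python
from collections import Counter

def frequent_sensors(events, sensor_type, min_support):
    """Keep sensors whose relative frequency meets the threshold of their own type."""
    counts = Counter(events)
    total = len(events)
    frequent = {}
    for sensor, count in counts.items():
        support = count / total
        if support >= min_support[sensor_type[sensor]]:
            frequent[sensor] = support
    return frequent

# Hypothetical data: motion sensors fire far more often than key sensors.
events = ["M001"] * 90 + ["M002"] * 8 + ["K001"] * 2
sensor_type = {"M001": "motion", "M002": "motion", "K001": "key"}

per_type = frequent_sensors(events, sensor_type, {"motion": 0.10, "key": 0.01})
global_only = frequent_sensors(events, sensor_type, {"motion": 0.10, "key": 0.10})
print(sorted(per_type))     # ['K001', 'M001']
print(sorted(global_only))  # ['M001']
```

With a single global threshold of 0.10 the key sensor is discarded even though it may mark an important activity; the per-type threshold keeps it.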
Clustering
Grouping similar objects together
There are many different clustering methods
Partition based (k-Means)
Hierarchical (CURE)
Density based (DBSCAN)
Model based (EM)
[Figure: activity clusters, each with a centroid activity]
Similarity Measure
How is similarity determined?
Our activity similarity measure
Total Similarity = Start Time Similarity + Duration Similarity + Structure Similarity + Location Similarity
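The slide gives only the four-term sum, so the sketch below fills in each component with a simple stand-in: circular time-of-day difference, duration ratio, Jaccard overlap of sensor sets, and exact location match. All four definitions, the `Activity` fields, and the sample values are assumptions for illustration, not the measures used in this work.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    start_hour: float    # time of day in hours, 0 <= start_hour < 24
    duration_min: float  # duration in minutes, > 0
    sensors: frozenset   # IDs of the sensors involved
    location: str

def start_time_similarity(a, b):
    diff = abs(a.start_hour - b.start_hour)
    diff = min(diff, 24 - diff)  # wrap around midnight
    return 1 - diff / 12

def duration_similarity(a, b):
    return min(a.duration_min, b.duration_min) / max(a.duration_min, b.duration_min)

def structure_similarity(a, b):
    union = a.sensors | b.sensors
    return len(a.sensors & b.sensors) / len(union) if union else 1.0

def location_similarity(a, b):
    return 1.0 if a.location == b.location else 0.0

def total_similarity(a, b):
    """Sum of the four components, as on the slide; each lies in [0, 1]."""
    return (start_time_similarity(a, b) + duration_similarity(a, b)
            + structure_similarity(a, b) + location_similarity(a, b))

cooking = Activity(8.0, 10.0, frozenset({"M001", "M002"}), "kitchen")
hygiene = Activity(20.0, 20.0, frozenset({"M002", "M003"}), "bathroom")
print(total_similarity(cooking, cooking))  # 4.0
print(total_similarity(cooking, hygiene))
```

A clustering step could then group patterns whose total similarity is high.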
Activity Recognition
Basically a sequence classification problem
Different from ordinary classification problems
Variable length records
Order
Probabilistic methods are the most widely used
Markov chains
Hidden Markov models
Dynamic Bayesian Networks
Conditional random fields
[Figure: HMM and DBN examples over time slices t and t+1; nodes include Day, Time, Room, and Activity n]
Hidden Markov Model
A statistical model
Markovian property
A number of observed & hidden variables
Their transition probabilities
We automatically build HMM from cluster centroids
[Figure: HMM with hidden activity states (Cooking, Taking Meds, Hygiene, Leaving), transition probabilities a_ij, and emission probabilities b_ij to sensor observations (M001, M003, M004, M006, D029, D032)]
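Once an HMM is available, recognition amounts to finding the most likely hidden activity sequence for the observed sensor events. The sketch below is a plain Viterbi decoder on a two-state toy model; the probabilities and the association of M003 with cooking and M006 with hygiene are made up for illustration, and building the model from cluster centroids is not shown.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    best = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    back = []  # one back-pointer table per step after the first
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            prob, prev = max((prev_best[r] * trans_p[r].get(s, 0.0), r) for r in states)
            best[s] = prob * emit_p[s].get(obs, 0.0)
            pointers[s] = prev
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-activity model over two motion sensors.
states = ["Cooking", "Hygiene"]
start_p = {"Cooking": 0.5, "Hygiene": 0.5}
trans_p = {"Cooking": {"Cooking": 0.8, "Hygiene": 0.2},
           "Hygiene": {"Cooking": 0.2, "Hygiene": 0.8}}
emit_p = {"Cooking": {"M003": 0.9, "M006": 0.1},
          "Hygiene": {"M003": 0.1, "M006": 0.9}}

print(viterbi(["M003", "M003", "M006", "M006"], states, start_p, trans_p, emit_p))
# ['Cooking', 'Cooking', 'Hygiene', 'Hygiene']
```

For long event streams the products would underflow; a practical version would work with log probabilities.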
Stream Mining
Many emerging applications
IP network traffic
Scientific data
Process data as it arrives
We cannot store all data
One pass
Approximate and randomized answers
E.g. relaxed support threshold
Some proposed methods
Frequent itemset mining
Lossy counting [Manku 2002], SpaceSaving algorithm [Metwally 2005], …
Frequent sequence mining
SPEED algorithm [Raissi 2005], ..
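As a concrete instance of one-pass, approximate counting with a relaxed support threshold, here is a compact sketch of lossy counting for single items, written from the published description of the algorithm. Variable names and the toy stream are assumptions; the sequence-mining variants listed above are more involved.

```python
import math

def lossy_count(stream, epsilon):
    """One pass over the stream; counts are underestimated by at most epsilon * n."""
    width = math.ceil(1 / epsilon)  # bucket width
    entries = {}                    # item -> [count, maximum possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:  # bucket boundary: drop entries that are too rare
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
    return entries, n

def frequent_items(entries, n, support, epsilon):
    """Report items whose stored count reaches the relaxed threshold (support - epsilon) * n."""
    return {k for k, (count, _) in entries.items() if count >= (support - epsilon) * n}

entries, n = lossy_count("aabcaadaaeab", epsilon=0.25)
print(entries)                                   # {'a': [7, 0]}
print(frequent_items(entries, n, 0.5, 0.25))     # {'a'}
```

Only the frequent item survives pruning, so memory stays bounded regardless of how many distinct rare items pass through.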
Tilted Time Model
Uses a set of time-tilted windows to keep frequency of
items
Finer details for more recent time frame
Coarser details for older time frames
Shifting history into older time frames as data arrives
[Figure: tilted-time windows at month, day, and hour granularity]
*C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent Patterns in Data Streams at Multiple
Time Granularities. MIT Press, 2003, ch. 3.
Tilted Time Model
Minimum support: σ
Maximum support error: ε
An itemset can be
Frequent
Sub-frequent
Infrequent
Pruning itemsets (tail pruning)
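The three categories can be read as a comparison of an itemset's support against the two thresholds, assuming ε < σ as in the cited tilted-time work. The sketch below shows that reading for a single time window; the function name and sample numbers are illustrative, and tail pruning across windows is not shown.

```python
def classify_support(count, window_size, sigma, epsilon):
    """Classify an itemset by its support within one time window (epsilon < sigma)."""
    support = count / window_size
    if support >= sigma:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"  # kept, because it may become frequent later
    return "infrequent"        # candidate for pruning

for count in (30, 10, 2):
    print(count, classify_support(count, window_size=100, sigma=0.2, epsilon=0.05))
```

Keeping sub-frequent itemsets is what bounds the error of the reported supports by ε.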
StreamCOM
Extending COM into a stream mining method
Using tilted time model
[Figure: COM combined with the tilted time model yields StreamCOM]
Transfer Learning
Apply skills learned in previous tasks to novel tasks
Chess → Checkers
Math → CS
[Figure: training items and test items in traditional ML vs. transfer learning]
Why in Smart Homes?
Why transfer learning?
Supervised methods
Requires annotation
Unsupervised methods
Requires lots of data
[Figure: labeled activity patterns from the source home are mapped to the target home, which has a small initial dataset and an infinite stream of data; the mapped patterns are used for activity recognition]
Our Transfer Learning Solutions
Activity Transfer
Transfer from one resident to another
Different residents, space layouts, sensors
Transfer from a single physical source to a target
Transfer from multiple physical sources to a target
Domain selection
[Figure: transfer from source activities to target activities]
Multi Home Transfer Learning (MHTL)
1. Find activity models in both spaces
Source: extract activity model
Target: location based mining, incremental clustering
Activity consolidation, sensor selection
2. Map activity models from source to target
Map sensors
Map activities
3. Map labels
4. Use labels for recognition
MHTL Architecture
[Figure: MHTL architecture in four stages: Input, Activity Extraction, Mapping, Recognition. Inputs: source labeled data, target unlabeled data, target labeled data (if any). Extraction: mine data (target), form activities, consolidate activities, select sensors, producing activity templates for both spaces. Mapping: initialize, map sensors, map activities, adjust mapping. Output: target labeled activities]
Domain Selection
Our previous works
Assumed “all sources are equal”
Not all sources are equal
Some sources are more equal!
Select top N sources
Efficiency: do not use all sources
Accuracy: negative transfer effect
Some animals are more equal ...
George Orwell – Animal Farm
Domain Similarity
How to measure difference between two distributions?
Conventional similarity measures
Kullback-Leibler divergence (KL), Jensen-Shannon
divergence (JSD), L1 or Lp norms
Kifer et al. [2004] proposed the H-distance
Ben-David et al. [2007] later showed that
Computing it is exactly the problem of minimizing the empirical
risk of a classifier that discriminates between instances
drawn from the two domains
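The classifier view suggests a practical proxy: train a classifier to tell source instances from target instances and turn its held-out error into a distance, 2(1 − 2·error), so that easily separable domains score high. The sketch below uses a nearest-centroid classifier and a fixed even/odd split purely for brevity; the classifier choice, the split, the function names, and the toy data are assumptions, and the top-N selection is only one way to use the score.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def proxy_distance(source, target):
    """Train a nearest-centroid domain classifier on even-indexed rows, measure
    its error on odd-indexed rows, and return 2 * (1 - 2 * error)."""
    c_src, c_tgt = centroid(source[0::2]), centroid(target[0::2])
    held_out = [(x, 0) for x in source[1::2]] + [(x, 1) for x in target[1::2]]
    errors = 0
    for x, label in held_out:
        predicted = 0 if sq_dist(x, c_src) <= sq_dist(x, c_tgt) else 1
        errors += predicted != label
    return 2 * (1 - 2 * errors / len(held_out))

def select_sources(sources, target, n):
    """Rank candidate source domains by proxy distance and keep the n closest."""
    ranked = sorted(sources, key=lambda name: proxy_distance(sources[name], target))
    return ranked[:n]

# Hypothetical one-feature domains.
target = [[1.0], [1.2], [0.8], [1.1]]
sources = {"near": [[1.1], [0.9], [1.0], [1.2]],
           "far": [[5.0], [5.2], [4.8], [5.1]]}
print(proxy_distance(sources["far"], target))  # 2.0
print(select_sources(sources, target, n=1))    # ['near']
```

A perfectly separable source gets the maximum score of 2; a score at or below 0 means the classifier did no better than chance, so the domains look alike.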
Demonstration of H Distance
H-distance: 0.1, small!
*Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for
domain adaptation. In NIPS, 2007.
Active Learning
The learning algorithm can query for the label of a
point
Ask the oracle!
Proposed methods
Uncertainty sampling, committee based, …
[Figure: active learning loop: the learning algorithm selects an informative instance, and the oracle returns its label]
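Uncertainty sampling, one of the methods named above, can be sketched in a few lines: ask the oracle about the unlabeled instance whose predicted class distribution has the highest entropy. The `predict_proba` callable and the one-dimensional logistic model are hypothetical stand-ins for whatever learner is in use.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def most_uncertain(pool, predict_proba):
    """Pick the unlabeled instance with the highest predictive entropy."""
    return max(pool, key=lambda x: entropy(predict_proba(x)))

# Hypothetical binary model: a logistic curve over one feature.
def predict_proba(x):
    p = 1 / (1 + math.exp(-x))
    return [p, 1 - p]

pool = [-3.0, -0.2, 2.5]
print(most_uncertain(pool, predict_proba))  # -0.2
```

The instance closest to the decision boundary is selected, which is exactly the kind of fully specified, single-instance query that the following slides argue is too specific for a human oracle.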
A Problem!
Traditional active learning methods
Ask overly specific queries
“What is the class label if
(sex = female) and (age = 39) and (chest
pain type = 3) and (serum cholesterol =
150.2 mg/dL) and (fasting blood sugar =
150 mg/dL)... and (electrocardiographic
result = 1) and (maximum heart rate
achieved = 126) and (exercise induced
angina = 90) and (heart old peak = 2.3)
and (number of major vessels colored by
fluoroscopy = 3)?”
vs.
“What is the class label
if (age > 65) and (chest pain type = 3)
and (serum cholesterol > 240 mg/dL)?”
Template Based Queries
Select the most informative instances
Select friends (+) and enemies (-) = Δ
Select relevant and weakly relevant features in Δ
Build a template query using relevant and weakly
relevant features
[Figure: template-based query loop: Data → Learning Algorithm → Select Informative Instance → Select Neighbors and Enemies → Build Template Query based on Neighbors and Enemies → Query → Oracle → Label → Update]
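A rough sketch of the last step, turning one informative instance into a more general query: keep only the relevant features and widen each to the range spanned by the instance and its neighbors. How instances, neighbors, enemies, and relevant features are chosen is not shown here; the function names, feature names, and values are hypothetical, and the range-based generalization is an assumption about what a template might look like.

```python
def build_template_query(instance, neighbors, relevant_features):
    """Generalize one instance into a range query over the relevant features only."""
    group = [instance] + neighbors
    query = {}
    for feature in relevant_features:
        values = [row[feature] for row in group]
        query[feature] = (min(values), max(values))
    return query

def format_query(query):
    parts = [f"({lo} <= {name} <= {hi})" for name, (lo, hi) in query.items()]
    return "What is the class label if " + " and ".join(parts) + "?"

# Hypothetical patient records.
instance = {"age": 67, "cholesterol": 250, "max_heart_rate": 126}
neighbors = [{"age": 70, "cholesterol": 262, "max_heart_rate": 140},
             {"age": 66, "cholesterol": 245, "max_heart_rate": 110}]

query = build_template_query(instance, neighbors, ["age", "cholesterol"])
print(format_query(query))
# What is the class label if (66 <= age <= 70) and (245 <= cholesterol <= 262)?
```

One answer from the oracle then labels a whole region of the feature space instead of a single point.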
RIQY
RIQY: Rule Induced active learning QuerY method
Select the most informative instances
Select friends (+) and enemies (-) = Δ
Use rule induction to build generic queries
[Figure: RIQY loop: Data → Learning Algorithm → Select Informative Instance → Select Neighbors and Enemies → Induce Rule based on Neighbors and Enemies → Rule → Oracle → Label → Update]
Can we discover activities?
DVSM vs. COM
Activity Discovery
Confusion matrix for various activities in apartment 1
Some Discovered Patterns
StreamCOM
Taking medication activity
Transferring Activities
What about active learning?
Wisconsin breast cancer dataset (UCI repository)
Kyoto smart apartment dataset (CASAS)
Conclusions
Two novel sequence mining methods
DVSM
COM
A novel stream data mining method
StreamCOM
A couple of transfer learning methods
Between residents
Between one/multiple smart homes
Source selection
Two novel active learning methods
Template based active learning
RIQY
Future Work
• Anomaly detection in sequences
• Exploiting more temporal information
• Order of activities
• Change detection in patterns
• …
Publications
Published/Accepted
Parisa Rashidi and Diane J. Cook. Mining and Monitoring Patterns of Daily
Routines for Assisted Living in Real World Settings. Proceedings of
International Health Informatics Conference (IHI). 2010.
Parisa Rashidi and Diane J. Cook. Transferring learned activities in smart
environments between different residents. Proceedings of International
Conference on Intelligent Environments (IE), volume 2, pages 185-192.
Springer-Verlag, 2009.
Parisa Rashidi and Diane J. Cook. Multi Home Transfer Learning for
Resident Activity Discovery and Recognition. Proceedings of International
Workshop on Knowledge Discovery from Sensor Data (KDD), pages 53-63,
2010.
Parisa Rashidi, Diane J. Cook, "Home to home transfer learning",
Proceedings of AAAI Plan, Activity, Intention Recognition Workshop
(AAAI), 2010.
Publications
Published/Accepted
Parisa Rashidi, Diane J. Cook, "Transferring Learned Activities and Cues
between Different Residential Spaces", Journal of Pervasive and Mobile
Computing (PMC). March 2010.
Maureen Schmitter-Edgecombe, Parisa Rashidi, Diane J. Cook, Larry
Holder. Discovering and Tracking Activities for Assisted Living, The
American Journal of Geriatric Psychiatry. In Press, 2010.
Parisa Rashidi, Diane J. Cook, Larry Holder, Maureen Schmitter-Edgecombe. Discovering Activities to Recognize and Track in a Smart
Environment, IEEE Transactions on Knowledge and Data Engineering
(TKDE). In Press, 2010.
Parisa Rashidi, Diane J. Cook, Mining Sensor Streams for Discovering
Human Activity Patterns Over Time. Proceedings of International
Conference on Data Mining (ICDM), 2010.
Publications
Submitted
Parisa Rashidi, Diane J. Cook. Domain Selection and
Adaptation in Smart Homes. ICOST 2011, January 2011,
submitted.
Parisa Rashidi, Diane J. Cook. Template Based Active
Learning. AAAI 2011, February 2011. Submitted.
Parisa Rashidi, Diane J. Cook. Ask Me Better Questions.
Rule Induction Based Active Learning. KDD 2011,
February 2011. Submitted.
Publications
Invited/To be submitted
Parisa Rashidi, Diane J. Cook. Mining and Monitoring
Patterns of Daily Routines for Assisted Living in Real
World Settings. ACM Transactions special issue on
Intelligent Systems for Health Informatics. Invited. April
2011
Parisa Rashidi, Diane J. Cook. Generic Active Learning
Queries. TKDE or JMLR. May 2011. To be submitted.
Questions?