Candidate: Parisa Rashidi
Advisor: Diane J. Cook

Agenda
• Introduction
• Challenges
• Solutions
  • Sequence mining
  • Stream mining
  • Transfer learning
  • Active learning
• Results
• Conclusions & future directions

Smart Homes
• Sensors and actuators integrated into everyday objects
• Knowledge acquisition about the inhabitant
[Figure: an agent interacting with its environment through percepts (sensors) and actions (controllers).]

Applications
• Energy efficiency
• Security
• Achieving more comfort
• Monitoring the well-being of residents
• In-home monitoring
  • Monitor daily activities
  • Check for anomalies
  • Help by giving prompts and cues

Activity Recognition
• A vital component of smart homes
• Recognizing activities from a stream of sensor events
[Figure: a stream of sensor events (… A B C D … A C D F …). Each letter is a single sensor event; an activity is a sequence of sensor events.]

Why Is It Difficult?
• Human activity is erratic and complex
  • Discontinuous (interrupting events)
  • Step order might vary each time
  • Inter-subject and intra-subject variability
• The algorithm should be scalable
• Data annotation
  • Costly and laborious
  • Training for each new space?
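To make the first two difficulties concrete, here is a minimal Python sketch. It is an illustration only, not the DVSM or COM algorithms presented later. It writes sensor events as single letters, as on the activity recognition slide, and uses a hypothetical four-step activity A B C D. A contiguous match fails as soon as an unrelated event interrupts the activity; an in-order gapped match tolerates interruptions but not reordered steps; an order-insensitive count check tolerates reordering but ignores order entirely. All function names and example streams are hypothetical.

```python
from collections import Counter


def contiguous_match(stream, activity):
    """True if the activity's events appear back to back, in order, in the stream."""
    n = len(activity)
    return any(stream[i:i + n] == activity for i in range(len(stream) - n + 1))


def gapped_match(stream, activity):
    """True if the activity's events appear in order, possibly interrupted."""
    it = iter(stream)
    # `event in it` consumes the iterator up to the first match, so order is enforced.
    return all(event in it for event in activity)


def unordered_match(window, activity):
    """True if the window contains every step of the activity, in any order."""
    need = Counter(activity)
    have = Counter(window)
    return all(have[event] >= k for event, k in need.items())


# Hypothetical activity and streams, for illustration only.
activity = ["A", "B", "C", "D"]
interrupted = ["A", "X", "B", "C", "Y", "D"]  # interrupted by events X and Y
reordered = ["A", "C", "B", "D"]              # same steps, different order

print(contiguous_match(interrupted, activity))  # False
print(gapped_match(interrupted, activity))      # True
print(gapped_match(reordered, activity))        # False
print(unordered_match(reordered, activity))     # True
```

None of the three checks handles both interruptions and varied step order at once, which is the gap the activity-specific mining methods are meant to close.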
Unsolved Challenges
• Many methods have been proposed: hidden Markov models, conditional random fields, naïve Bayes, …
• Current methods
  • Make many simplifying assumptions
  • Are mostly supervised, which raises the data annotation problem
  • Even if unsupervised, are trained from scratch for each new setting
  • Ignore activity variations or interruptions
  • …

Our Solutions
• Discovering complex activities: sequence mining
• Discovering activities from streams: stream sequence mining
• Transferring activity models to new spaces: transfer learning
• Guiding activity annotation: active learning

Sequence Mining
• Sequence: an ordered set of items. Examples:
  • Speech: a sequence of phonemes
  • DNA sequence: AAGCTACGTAA
  • Network: a sequence of packets
  • Our data: a sequence of sensor events
• Goal: finding repetitive sequential patterns in data
• Many methods have been proposed: GSP, PrefixSpan, SPADE, …

Activity Sequence Mining Problem
• Data: a single sequence with no boundaries, unlike transaction data
• We are looking for activity sequence patterns
  • With discontinuous steps
  • With variations of the same activity

Transaction ID | Items
1              | {Milk, Egg, Bread}
2              | {Bread, Beer}
3              | {Soap, Milk, Egg}

[Figure: transaction data has item-set boundaries, whereas the sensor event stream (… M D M D A C D F …) has no boundaries!]

From Sequence Mining to Activity Recognition
• Find activity patterns
  • Discontinuous Varied Sequence Mining (DVSM)
  • Continuous, varied Order, Multi Threshold (COM)
• Cluster similar patterns
  • Each cluster centroid is a representative activity
• Recognize activities
  • Hidden Markov model
[Figure: processing pipeline with stages for sensor data, DVSM, pattern instances, interesting patterns, clustering, representative activities, and recognition.]

DVSM
• Finds general patterns and their variations over several iterations
• During each iteration:
  • Finds patterns of increasing length, extending each pattern by a prefix and a suffix
  • Checks whether a pattern is a variation of a general pattern
• At the end of each iteration:
  • Retains only the interesting patterns according to the minimum description length (MDL) principle (compression)
[Figure: example patterns <a,b>, {b,x,a}, and {a,b,q}.]

DVSM (continued)
• Prunes patterns and variations with low compression values
  • Highly discontinuous
  • Infrequent
• Prunes non-maximal patterns
• Prunes irrelevant variations using mutual information and sensor
[Figure: continuity of the general pattern {a,u,b}, its variations, and their instances in the event stream "abchdadcbopa bb cgeqydc arhabxc".]

Improving DVSM: COM
• Sensor frequencies differ across
  • Different regions of the home
  • Different types of sensors
• The "rare item problem": a global minimum support doesn't work!
• Use multiple support thresholds
[Figure: home regions annotated with sensor frequencies f_k and f_m, e.g. (f_k = 0.02, f_m = 0.02), (0.02, 0.02), (NA, 0.03), (0.01, 0.03), (NA, 0.06); legend: frequent/infrequent motion sensors and frequent/infrequent key sensors.]

Clustering
• Grouping similar objects together
• There are many different clustering methods:
  • Partition-based (k-means)
  • Hierarchical (CURE)
  • Density-based (DBSCAN)
  • Model-based (EM)
[Figure: activity clusters and their centroids.]

Similarity Measure
• How is similarity determined?
• Our activity similarity measure:
  Total similarity = start time similarity + duration similarity + structure similarity + location similarity

Activity Recognition
• Basically a sequence classification problem
• Different from ordinary classification problems
  • Variable-length records
  • Order matters
• Probabilistic methods are the most widely used:
  • Markov chains
  • Hidden Markov models (HMM)
  • Dynamic Bayesian networks (DBN)
  • Conditional random fields
[Figure: HMM and DBN structures over time slices t and t+1, with day, time, room, and activity variables.]

Hidden Markov Model
• A statistical model with the Markovian property
• A number of observed and hidden variables, with their transition probabilities
• We automatically build the HMM from the cluster centroids
[Figure: HMM whose hidden states are activities (Cooking, Taking Meds, Hygiene, Leaving), linked by transition probabilities a_ij, with emission probabilities b_ij to sensor observations (M001, M003, M004, M006, D029, D032).]

Stream Mining
• Many emerging applications: IP network traffic, scientific data
• Process data as it arrives
  • We cannot store all the data
  • One pass
  • Approximate and randomized answers, e.g. a relaxed support threshold
• Some proposed methods
  • Frequent itemset mining: Lossy Counting [Manku 2002], SpaceSaving [Metwally 2005], …
  • Frequent sequence mining: SPEED [Raissi 2005], …

Tilted Time Model*
• Uses a set of tilted time windows to keep the frequency of items
  • Finer details for more recent time frames
  • Coarser details for older time frames
• Shifts history into older time frames as data arrives
[Figure: tilted time windows at month, day, and hour granularities.]
*C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities. MIT Press, 2003, ch. 3.
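The tilted time model can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the exact structure of Giannella et al. or of StreamCOM: it keeps one frequency history per pattern, uses three hypothetical levels (24 hourly slots, 31 daily slots, 12 monthly slots), folds a full level into a single slot of the next coarser level, and simply discards the oldest slot when the coarsest level is full.

```python
from collections import deque


class TiltedTimeWindow:
    """Frequency history of a single pattern at several time granularities.

    Simplified sketch: level 0 is the finest granularity (e.g. hours) and each
    following level is coarser (e.g. days, then months). The capacities are
    hypothetical defaults, not values taken from the thesis.
    """

    def __init__(self, capacities=(24, 31, 12)):
        self.capacities = capacities
        # The newest slot is kept at the left end of each deque.
        self.levels = [deque() for _ in capacities]

    def add(self, count):
        """Record the pattern's count for the most recent finest-grained unit."""
        self._push(0, count)

    def _push(self, level, count):
        slots = self.levels[level]
        if len(slots) == self.capacities[level]:
            if level + 1 < len(self.levels):
                # Level is full: shift its history into one coarser slot.
                merged = sum(slots)
                slots.clear()
                self._push(level + 1, merged)
            else:
                # Coarsest level is full: forget the oldest slot.
                slots.pop()
        slots.appendleft(count)

    def total(self):
        """Total count still represented in the window."""
        return sum(sum(slots) for slots in self.levels)


# Usage with small capacities so that the shifting is easy to see.
window = TiltedTimeWindow(capacities=(4, 3, 2))
for _ in range(20):
    window.add(1)  # the pattern occurred once in each finest-grained unit
print([list(slots) for slots in window.levels])  # [[1, 1, 1, 1], [4], [12]]
print(window.total())  # 20
```

Because a full level collapses into a single coarser slot, each pattern needs at most 24 + 31 + 12 slots with the default capacities, yet the total count is preserved until the coarsest level overflows. Recent history keeps fine detail while older history keeps only coarse detail, matching the behavior described on the slide.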
Tilted Time Model (continued)
• Minimum support: σ
• Maximum support error: ε
• An itemset can be frequent, sub-frequent, or infrequent
• Pruning itemsets (tail pruning)

StreamCOM
• Extends COM into a stream mining method by using the tilted time model
[Figure: StreamCOM combines COM with the tilted time model.]

Transfer Learning
• Apply skills learned in previous tasks to novel tasks (e.g. chess to checkers, math to CS)
[Figure: in traditional ML, training and test items come from the same task; in transfer learning, training items from one task are used for test items from another.]

Why Transfer Learning in Smart Homes?
• Supervised methods require annotation
• Unsupervised methods require lots of data
[Figure: labeled activity patterns from a source home are mapped to a target home, which starts with a small initial dataset followed by an infinite stream of data, and are then used for activity recognition.]

Our Transfer Learning Solutions
• Activity transfer
  • Transfer from one resident to another
  • Different residents, space layouts, and sensors
    • Transfer from a single physical source to a target
    • Transfer from multiple physical sources to a target
• Domain selection
[Figure: source activities transferred to target activities.]

Multi Home Transfer Learning (MHTL)
1. Find activity models in both spaces
   • Source: extract the activity model
   • Target: location-based mining, incremental clustering
   • Activity consolidation, sensor selection
2. Map activity models from source to target
   • Map sensors
   • Map activities
3. Map labels
4. Use the labels for recognition!

MHTL Architecture
[Figure: MHTL architecture with input, activity extraction, mapping, and recognition stages. Inputs are source labeled data, target unlabeled data, and target labeled data (if any). Activity extraction forms activities, consolidates them, and selects sensors (mining the data first on the target side) to produce activity templates. Mapping initializes, maps sensors, maps activities, and adjusts the mapping, yielding target labeled activities.]

Domain Selection
• Our previous works assumed "all sources are equal"
• Not all sources are equal: some sources are more equal!
• Select the top N sources
  • Efficiency: do not use all sources
  • Accuracy: avoid the negative transfer effect
"Some animals are more equal …" (George Orwell, Animal Farm)

Domain Similarity
• How can we measure the difference between two distributions?
• Conventional similarity measures: Kullback-Leibler divergence (KL), Jensen-Shannon divergence (JSD), L1 or Lp norms
• Kifer et al. [2004] proposed the H-distance
• Later, Ben-David et al. [2007]* proved that computing it is exactly the problem of minimizing the empirical risk of a classifier that discriminates between instances drawn from the two domains!

Demonstration of the H-Distance
[Figure: two domains whose H-distance is 0.1, which is small.]
*Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.

Active Learning
• The learning algorithm can query for the label of a point: ask the oracle!
• Proposed methods: uncertainty sampling, committee-based, …
[Figure: active learning loop in which the learning algorithm selects an informative instance, the oracle labels it, and the label is returned to the learner.]

A Problem!
• Traditional active learning methods ask overly specific queries:
  “What is the class label if (sex = female) and (age = 39) and (chest pain type = 3) and (serum cholesterol = 150.2 mg/dL) and (fasting blood sugar = 150 mg/dL) ... and (electrocardiographic result = 1) and (maximum heart rate achieved = 126) and (exercise induced angina = 90) and (heart old peak = 2.3) and (number of major vessels colored by fluoroscopy = 3)?”
  vs.
  “What is the class label if (age > 65) and (chest pain type = 3) and (serum cholesterol > 240 mg/dL)?”

Template-Based Queries
• Select the most informative instances
• Select friends (+) and enemies (-) = Δ
• Select relevant and weakly relevant features in Δ
• Build a template query using the relevant and weakly relevant features
[Figure: loop in which the learning algorithm selects an informative instance from the data, selects its neighbors and enemies, builds a template query based on them, queries the oracle, and is updated with the returned label.]

RIQY: Rule Induced active learning QuerY method
• Select the most informative instances
• Select friends (+) and enemies (-) = Δ
• Use rule induction to build generic queries
[Figure: loop in which the learning algorithm selects an informative instance from the data, selects its neighbors and enemies, induces a rule based on them, queries the oracle with the rule, and is updated with the returned label.]

Can We Discover Activities?
[Figure: DVSM vs. COM.]

Activity Discovery
[Figure: confusion matrix for various activities in apartment 1.]

Some Discovered Patterns

StreamCOM
[Figure: taking medication activity.]

Transferring Activities
[Figures: two slides of activity transfer results.]

What About Active Learning?
[Figures: Wisconsin breast cancer dataset (UCI repository); Kyoto smart apartment dataset (CASAS).]

Conclusions
• Two novel sequence mining methods: DVSM and COM
• A novel stream data mining method: StreamCOM
• Several transfer learning methods
  • Between residents
  • Between one or multiple smart homes
  • Source selection
• Two novel active learning methods: template-based active learning and RIQY

Future Work
• Anomaly detection in sequences
• Exploiting more temporal information
• Order of activities
• Change detection in patterns
• …

Publications: Published/Accepted
• Parisa Rashidi and Diane J. Cook. Mining and Monitoring Patterns of Daily Routines for Assisted Living in Real World Settings. Proceedings of the International Health Informatics Conference (IHI), 2010.
• Parisa Rashidi and Diane J. Cook. Transferring learned activities in smart environments between different residents. Proceedings of the International Conference on Intelligent Environments (IE), volume 2, pages 185-192. Springer-Verlag, 2009.
• Parisa Rashidi and Diane J. Cook. Multi Home Transfer Learning for Resident Activity Discovery and Recognition. Proceedings of the International Workshop on Knowledge Discovery from Sensor Data (KDD), pages 53-63, 2010.
• Parisa Rashidi and Diane J. Cook. Home to home transfer learning. Proceedings of the AAAI Plan, Activity, Intention Recognition Workshop (AAAI), 2010.
• Parisa Rashidi and Diane J. Cook. Transferring Learned Activities and Cues between Different Residential Spaces. Journal of Pervasive and Mobile Computing (PMC), March 2010.
• Maureen Schmitter-Edgecombe, Parisa Rashidi, Diane J. Cook, and Larry Holder. Discovering and Tracking Activities for Assisted Living. The American Journal of Geriatric Psychiatry, in press, 2010.
• Parisa Rashidi, Diane J. Cook, Larry Holder, and Maureen Schmitter-Edgecombe. Discovering Activities to Recognize and Track in a Smart Environment. IEEE Transactions on Knowledge and Data Engineering (TKDE), in press, 2010.
• Parisa Rashidi and Diane J. Cook. Mining Sensor Streams for Discovering Human Activity Patterns Over Time. Proceedings of the International Conference on Data Mining (ICDM), 2010.

Publications: Submitted
• Parisa Rashidi and Diane J. Cook. Domain Selection and Adaptation in Smart Homes. ICOST 2011, submitted January 2011.
• Parisa Rashidi and Diane J. Cook. Template Based Active Learning. AAAI 2011, submitted February 2011.
• Parisa Rashidi and Diane J. Cook. Ask Me Better Questions: Rule Induction Based Active Learning. KDD 2011, submitted February 2011.

Publications: Invited/To Be Submitted
• Parisa Rashidi and Diane J. Cook. Mining and Monitoring Patterns of Daily Routines for Assisted Living in Real World Settings. ACM Transactions special issue on Intelligent Systems for Health Informatics. Invited, April 2011.
• Parisa Rashidi and Diane J. Cook. Generic Active Learning Queries. TKDE or JMLR. To be submitted, May 2011.

Questions?