Data Stream Mining with
Extensible Markov Model
Yu Meng, Margaret H. Dunham, F. Marco Marchetti,
Jie Huang, Charlie Isaksson
October 18, 2006
Outline
- Data Stream Mining
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
Data Mining
Data mining is the process of automatically searching large volumes of data for nontrivial, hidden, previously unknown, and potentially useful information (interrelations of data).
- Also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining.
- Classification (e.g., Yahoo news, finance)
- Clustering (e.g., types of customers in online purchasing)
- Association (e.g., Market Basket Analysis)
Classification
Given a collection of records (the training set):
- Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Common models: decision tree, neural network, naïve Bayes, etc.
Classification is a supervised learning process.
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Workflow: Training Set -> Induction (learning algorithm) -> Learn Model -> Model -> Apply Model -> Deduction -> Test Set.
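The induction/deduction loop above can be sketched with any classifier. The following is a hypothetical illustration (not from the slides) using a hand-rolled 1-nearest-neighbor model on the training table; the attribute encodings and distance function are arbitrary choices made for this sketch.

```python
# Hypothetical sketch: 1-NN classification of the test records above.
SIZE = {"Small": 0, "Medium": 1, "Large": 2}

train = [  # (Attrib1, Attrib2, Attrib3 in K, class) from the training set
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),   ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),  ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),  ("No", "Small", 90, "Yes"),
]

def dist(a, b):
    """Mixed-attribute distance: mismatch penalty plus scaled numeric gap."""
    return ((a[0] != b[0]) + abs(SIZE[a[1]] - SIZE[b[1]]) / 2
            + abs(a[2] - b[2]) / 100)

def classify(record):
    """Deduction step: assign the class of the nearest training record."""
    return min(train, key=lambda t: dist(record, t))[3]

print(classify(("No", "Small", 55)))   # Tid 11 -> No (nearest: Tid 3)
```

With this distance, test record 11 lands nearest training record 3 and inherits its class; a real deployment would of course pick the model family and distance by validation on the test set.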
Clustering
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Clustering is an unsupervised learning process.
[Figure: example clusters — intra-cluster distances are minimized, inter-cluster distances are maximized]
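As a minimal sketch of the idea, here is Lloyd's k-means algorithm in plain Python (one common clustering method; the slides do not prescribe it). The 1-D points and starting centers are made-up illustrative inputs.

```python
# Minimal k-means sketch: alternate assignment and center update.
def kmeans(points, centers, iters=10):
    """Assign each point to its nearest center, then move each center
    to the mean of its assigned points; repeat for `iters` rounds."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[i].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers, groups

# Two obvious 1-D groups: values near 1 and values near 10.
centers, groups = kmeans([1.0, 1.2, 0.8, 9.8, 10.0, 10.2], [0.0, 5.0])
print(centers)  # converges close to [1.0, 10.0]
```

The converged centers sit at the group means, i.e. intra-cluster distances are minimized, which is exactly the objective described above.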
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example association rules:
{Diaper} -> {Beer}
{Milk, Bread} -> {Eggs, Coke}
{Beer, Bread} -> {Milk}

Implication means co-occurrence, not causality!
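Rules like {Diaper} -> {Beer} are scored by support and confidence. A small sketch over the five transactions above, using the standard definitions:

```python
# Support and confidence for a candidate rule over the basket table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs): how often rhs co-occurs with lhs."""
    return support(lhs | rhs) / support(lhs)

print(support({"Diaper", "Beer"}))       # 3 of 5 baskets -> 0.6
print(confidence({"Diaper"}, {"Beer"}))  # 3 of the 4 Diaper baskets (≈ 0.75)
```

So {Diaper} -> {Beer} has support 0.6 and confidence 0.75 in this table — co-occurrence statistics only, with no causal claim.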
Why Data Stream Mining?
A growing number of applications generate streams of data:
- Computer network monitoring data (IEPM-BW 2004, Abilene 2005)
- Call detail records in telecommunications (Cisco VoIP data 2003)
- Highway transportation traffic data (MnDot 2005)
- Online web purchase log records (JcPenny data 2003)
- Sensor network data (Ouse, Serwent 2002)
- Stock exchange, transactions in retail chains, ATM operations in banks, credit card transactions
What Do We See in Data Streams?
Characteristics of data streams:
- Records may arrive at a rapid rate
- High volume (possibly infinite) of continuous data
- Concept drift: the data distribution changes on the fly
- Data are raw
- Multidimensional
- Spatiality and temporality
What Do We See in Data Streams?
Requirements:
- Highly efficient computation and processing of the input streams in terms of both time and space: soft real-time and scalability.
- "Seek needles in a haystack": rare event detection.
(Haixun Wang, Jian Pei, Philip S. Yu, ICDE 2005; Keogh, ICDM'04)
What Do We See in Data Streams?
Stream processing restrictions:
- Single pass: each record is examined at most once
- Bounded storage: limited memory may be used
- Real time: per-record processing time must be low
- Incremental responses to queries

Our solution:
- Data modeling (global synopsis)
- Mining of local patterns based on the synopsis
- Incremental, scalable algorithms
Extensible Markov Model
Goal: develop a new data mining framework to model spatiotemporal data streams and mine interesting local patterns.
Assumptions about the data:
- Data are collected in discrete time intervals
- Data are in a structured format, <a1, a2, ...>
- Data are multidimensional
- Data hold an approximation of the Markov property
Extensible Markov Model
Capabilities of the technique:
- Soft real-time processing (incremental)
- Global modeling capability (scalable synopsis)
- Local pattern finding capability (mining performed on the synopsis)
- Adaptive to concept changes
- Rare event detection
Outline
- Introduction
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
EMM: An Overview
Motivation for EMM:
- A Markov process is a random process satisfying the Markov property; a Markov chain is a Markov process with discrete states.
- Clustering determines representative granules in the data space.
- From a static Markov chain to a dynamic Markov chain.
- Each cluster is mapped to a state in the Markov chain.
What is EMM: a data mining framework that models spatiotemporal data streams and is employed for local pattern detection.
EMM models a data stream by interleaving a clustering algorithm with a dynamic Markov chain.
EMM applies a series of efficient algorithms to mine interesting patterns from the modeled data (the synopsis).
EMM Overview
EMM clustering algorithms (EMM can use any clustering algorithm):
- Nearest neighbor: O(m)
- Hierarchical clustering: O(log m)
EMM building algorithms, O(1):
- EMMIncrement, EMMDecrement, EMMMerge, EMMSplit
EMM application algorithms, O(1):
- Prediction, anomaly detection, risk assessment, emerging event finding
Selection among these algorithms depends solely on hypotheses about the data profiles.
EMM performs learning incrementally and is able to perform application computations simultaneously.
EMM Components and Workflow
Data stream -> Online Preprocessing (guided by hypotheses) -> EMM Modeling -> EMM Pattern Finding (driven by queries) -> Output
- Flexibility
- Modularization
- It models while executing applications
EMM – A Walk Through

Input records (7 attributes each) and the states they are clustered to:
Input -> State   1      2      3      4      5      6      7
1 -> N1          18.63  10.97  3.179  3.803  1.239  0.718  0.137
2 -> N1          17.6   10.81  2.989  3.741  1.497  0.661  0.135
3 -> N2          16     9.503  2.685  3.432  1.169  0.594  0.125
4 -> N3          14.62  8.966  2.561  3.296  1.01   0.56   0.116
5 -> N3          14.62  8.32   2.409  3.107  0.915  0.512  0.114
6 -> N1          18.73  10.37  3.19   3.83   1.39   1.18   0.13

EMM building, step by step (CNi is the cardinality of state Ni; CLij is the count of transition Ni -> Nj):
- Input 1: create state N1 (CN1=1).
- Input 2: maps to N1 again (CN1=2), adding the self-transition CL11=1.
- Input 3: create state N2 (CN2=1) and the transition CL12=1.
- Input 4: create state N3 (CN3=1) and the transition CL23=1.
- Input 5: maps to N3 again (CN3=2), adding the self-transition CL33=1.
- Input 6: maps back to N1, extending the chain.

At this point the synopsis holds states N1 (CN1=2), N2 (CN2=1), N3 (CN3=2) and links CL11=1, CL12=1, CL23=1, CL33=1, built entirely incrementally as the inputs arrived.
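The counter updates in the walk-through can be sketched in a few lines. This is a minimal sketch, not the published EMM algorithms: the state assignments (which EMM obtains from its clustering step) are taken as given from the table, and only the incremental CN/CL bookkeeping is shown.

```python
# Sketch of EMMIncrement-style bookkeeping for the walk-through inputs.
from collections import Counter

assignments = ["N1", "N1", "N2", "N3", "N3", "N1"]  # inputs 1..6 from the table

CN = Counter()   # CN[s]: how many inputs were absorbed by state s
CL = Counter()   # CL[(s, t)]: how many transitions s -> t were observed
prev = None
for state in assignments:
    CN[state] += 1                  # bump the current state's cardinality
    if prev is not None:
        CL[(prev, state)] += 1      # bump the link from the previous state
    prev = state

print(CN)                           # N1 seen 3 times, N2 once, N3 twice
print(CL[("N1", "N1")], CL[("N2", "N3")])
```

Each arriving record costs O(1) dictionary updates, which is what makes the building algorithms suitable for streams.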
More Issues of EMM
- Label of nodes — cluster feature <CNi, LSi>, where LS is the medoid or centroid.
- Label of links — <CLij>.
- Calibration of the granularity of clusters:
  - Determine the threshold using the Markov property
  - Parameter-free modeling [Keogh, KDD04]
[Figure 65.5: RMS error for prediction in the Serwent dataset (roughly 6.6 to 8.4) at thresholds Th = 0.5, 0.6, 0.7, 0.8, 0.9, 0.99, 0.995, 0.999]
Modeling Performance
- Growth rate of EMM states (Matlab as a testbed):
  - Sublinear growth of the number of states
  - The growth rate decreases over time
  - Memory usage: 0.02-0.04% of data size for the Ouse, Serwent, and MnDot datasets
- Time efficiency:
  - Clustering: O(m) vs. O(log m)
  - Markov chain update: O(1)
- Continued learning
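One of the listed application algorithms is prediction from the synopsis. The slides report prediction error but not the rule; predicting the highest-count outgoing transition of the current state is one natural, assumed choice, sketched here (a scan is used for clarity, though per-state adjacency lists would make it O(1)).

```python
# Assumed prediction rule: follow the most frequent outgoing transition.
CL = {("N1", "N1"): 1, ("N1", "N2"): 1, ("N2", "N3"): 4, ("N3", "N3"): 2}

def predict_next(state):
    """Return the most frequent successor of `state`, or None if unseen."""
    out = {t: c for (s, t), c in CL.items() if s == state}
    return max(out, key=out.get) if out else None

print(predict_next("N2"))  # -> N3
```

The RMS prediction errors in Figure 65.5 would then measure how far the predicted state's representative (medoid/centroid) lies from the actual next record.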
Outline
- Introduction
- EMM Framework
- EMM Applications
  - Anomaly detection
  - Risk assessment
  - Emerging event finding
- Future Work
- Conclusions
EMM Application: Anomaly Detection
Problem: compare a synopsis representing "normal" behavior to actual behavior; any deviation is flagged as a potentially interesting pattern.
- Also known as the Positive Security Model [http://www.imperva.com]
- Assumes that everything that deviates from normal is bad.
Methodology: concepts and rules
- Cardinality of nodes and links
- Normalized occurrence frequency and normalized transition probability
- Performance metric: detection rate = TP/(TP+FN)
Plus: has the potential to detect interesting patterns of all kinds, including "unknown" patterns.
Minus: can lead to a high false alarm rate.
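A hedged sketch of the rule described above: an incoming record's state (or transition) is flagged when its normalized occurrence frequency in the synopsis falls below a threshold. The counts and the threshold value here are made up for illustration.

```python
# Sketch: flag states that are unseen or rare relative to modeled "normal".
CN = {"N1": 120, "N2": 75, "N3": 4}   # state cardinalities in the synopsis
total = sum(CN.values())

def is_anomalous(state, threshold=0.05):
    """True when the state's normalized occurrence frequency is below threshold."""
    freq = CN.get(state, 0) / total
    return freq < threshold

print(is_anomalous("N1"))  # frequent state -> False
print(is_anomalous("N3"))  # rare state    -> True
print(is_anomalous("N9"))  # unseen state  -> True
```

This captures both strengths and weaknesses from the slide: unseen patterns ("N9") are caught automatically, but any legitimate-but-rare behavior ("N3") raises a false alarm.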
EMM Application: Anomaly Detection
[Two slides of anomaly detection result figures, not recoverable from the transcript]
EMM Application: Risk Assessment
Problem: mitigate the false alarm rate while maintaining a high detection rate.
"98% of the alarm incidents in most communities are false alarms which distracts law enforcement from real public safety responses." - Gary Purvis, http://www.falsealarmreduction.com/
Methodology:
- Historic feedbacks can be used as a free resource to take some possibly safe anomalies out.
- Combine the anomaly detection model with the user's feedbacks.
- Risk level index
Evaluation metrics:
- Detection rate = TP/(TP+FN)
- False alarm rate = FP/(TP+FP)
Results and discussions follow.
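The slides do not give the risk-level formula, so the following is a purely hypothetical sketch of the idea: blend the model's anomaly evidence with an accumulated user-feedback "safeness" score, weighted by a factor alpha (the weight factor that the plots below vary).

```python
# Hypothetical risk-level index: anomaly evidence tempered by user feedback.
def risk_level(anomaly_score, feedback_safe_score, alpha=0.5):
    """Blend anomaly evidence with feedback; both scores assumed in [0, 1].
    High feedback_safe_score means users have marked this pattern safe."""
    return alpha * anomaly_score + (1 - alpha) * (1 - feedback_safe_score)

# An anomaly that users repeatedly marked safe gets a reduced risk level.
print(risk_level(0.9, 0.8, alpha=0.5))  # ≈ 0.55
print(risk_level(0.9, 0.0, alpha=0.5))  # ≈ 0.95
```

Thresholding on this blended index instead of the raw anomaly score is what lets the method trade a small amount of detection rate for a much lower false alarm rate.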
EMM Application: Risk Assessment
[Figure: detection rate of the anomaly detection and risk assessment models vs. (a) Euclidean threshold for clustering (th), (b) risk assessment weight factor (alpha), (c) EMM state cardinality threshold (thNode), and (d) EMM transition cardinality threshold (thLink)]
EMM Application: Risk Assessment
[Figure: false alarm rate of the anomaly detection and risk assessment models vs. (a) Euclidean threshold for clustering (th), (b) risk assessment weight factor (alpha), (c) EMM state cardinality threshold (thNode), and (d) EMM transition cardinality threshold (thLink)]
EMM Application: Risk Assessment
[Figure: relative operating characteristic (ROC) curve of the anomaly detection model — detection rate vs. false alarm rate, under varied Euclidean thresholds, EMM state thresholds, and EMM transition thresholds]
EMM Application: Emerging Events
Problem: model a dynamically changing spatiotemporal data series; find emerging events that represent new and significant trends.
- How to delete obsolete nodes?
- How to identify a new trend at an early time?
Methodology:
- Sliding window: EMMDelete
- Decay of importance: aging score
- Extended cluster feature
- Extended transition labeling
- Emerging events
Results and discussions; all updates are O(1).
EMM Application: Emerging Events
[Figure: aging-score examples — with unit link weights, S(t)/CN = 3/5 = 0.6 for a state with CN = 5, or 4.5/5 = 0.9; with decayed link weights, S(t) = 0.3+0.4+0.5+0.6+1.0 = 2.8, giving S(t)/CN = 2.8/3 ≈ 0.93]
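The decayed-score arithmetic from the example can be reproduced directly; this small sketch assumes only what the figure shows: each link contributes a weight that decays with age, and the score is the decayed sum S(t) divided by the state cardinality CN.

```python
# Reproducing the aging-score arithmetic from the example above.
def aging_score(weights, cn):
    """S(t)/CN for a state whose incident links carry decayed weights."""
    return sum(weights) / cn

weights = [0.3, 0.4, 0.5, 0.6, 1.0]       # older links decayed more
print(round(sum(weights), 2))             # S(t) -> 2.8
print(round(aging_score(weights, 3), 2))  # S(t)/CN -> 0.93
```

A state whose recent activity dominates keeps a score near 1 even as old links fade, which is how emerging trends stand out against obsolete structure.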
EMM Application: Emerging Events
[Figure: EMM state increment over time for the CiscoInternal2 dataset (Euclidean th = 30, centroid, window size = 1000, alpha = 0.01, r = 0.9); the number of EMM states stays below about 30 over 5000 time units]
Outline
- Introduction
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
Future Work: Adaptive EMM
Motivation: modeling a dynamically changing data profile requires changing the cluster granularity.
Proposed methodology: a local ensemble of EMMs
- One main EMM and two ancillary EMMs (with fewer descriptors)
- Compare the performance of the three EMMs
- Switch the main EMM when an ancillary one performs better
- Create a new ancillary EMM based on the new main EMM (faster time-to-mature)
- EMMSplit and EMMMerge; new algorithms are needed
[Figure: EMM performance at time t as a function of granularity]
Future Work: Hierarchical EMM
Hierarchical EMM: the logical geographic area under consideration is divided into virtual regions; a high-level EMM is an agglomeration of lower-level EMMs.
- Parallel EMM: a high-level EMM is a summary of lower-level EMMs with the same features/attributes.
- Heterogeneous EMM: a lower-level EMM is a feature of the higher-level EMM.
- Recursive EMM: a lower-level EMM represents one or several sub-states of the higher-level EMM.
[Figure: a tree of EMMs, with one high-level EMM aggregating several lower-level EMMs]
Conclusions
- EMM is an efficient, modularized, flexible data mining framework suitable for spatiotemporal data stream processing.
- It supports a series of applications.
- EMM aligns with current research trends and demands.
- EMM is innovative.
- A list of publications follows.
Related Publications
- Yu Meng and Margaret H. Dunham, "Mining Developing Trends of Dynamic Spatiotemporal Data Streams", Journal of Computers, Vol. 1, No. 3, Academy Publisher, 2006.
- Charlie Isaksson, Yu Meng and Margaret H. Dunham, "Risk Leveling of Network Traffic Anomalies", Int'l Journal of Computer Science and Network Security (IJCSNS), Vol. 6, No. 6, 2006.
- Yu Meng and Margaret H. Dunham, "Online Mining of Risk Level of Traffic Anomalies with User's Feedbacks", in Proceedings of the Second IEEE International Conference on Granular Computing (GrC'06), Atlanta, GA, May 10-12, 2006.
- Y. Meng, M.H. Dunham, F.M. Marchetti, and J. Huang, "Rare Event Detection in a Spatiotemporal Environment", in Proceedings of the Second IEEE International Conference on Granular Computing (GrC'06), Atlanta, GA, May 10-12, 2006.
- Yu Meng and Margaret H. Dunham, "Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment", in Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006), Singapore, April 9-12, 2006, Springer LNCS Vol. 3918.
- M.H. Dunham, Y. Meng, and J. Huang, "Extensible Markov Model", in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM'04), Brighton, UK, November 1-4, 2004.
Thank you
Questions?