Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Anomaly Detection Using Data Mining Techniques Margaret H. Dunham, Yu Meng, Donya Quick, Jie Huang, Charlie Isaksson CSE Department Southern Methodist University Dallas, Texas 75275 [email protected] This material is based upon work supported by the National Science Foundation under Grant No. IIS-0208741 12/9/08, Sandia National Labs 1 Objectives/Outline Develop modeling techniques which can “learn/forget” past behavior of spatiotemporal stream events. Apply to prediction of anomalous events. Introduction EMM Overview EMM Applications to Anomaly Detection EMM Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 2 12/9/08, Sandia National Labs 3 Outline Introduction Motivation What is an anomaly? Spatiotemporal Data Modeling Spatiotemporal Data EMM Overview EMM Applications to Anomaly Detection Emm Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 4 Motivation A growing number of applications generate streams of data. Computer network monitoring data Call detail records in telecommunications Highway transportation traffic data Online web purchase log records Sensor network data Stock exchange, transactions in retail chains, ATM operations in banks, credit card transactions. Data mining techniques play a key role in modeling and analyzing this data. 12/9/08, Sandia National Labs 5 What is Anomaly? Event that is unusual Event that doesn’t occur frequently Predefined event What is unusual? What is deviation? 12/9/08, Sandia National Labs 6 What is Anomaly in Stream Data? Rare - Anomalous – Surprising Out of the ordinary Not outlier detection No knowledge of data distribution Data is not static Must take temporal and spatial values into account May be interested in sequence of events Ex: Snow in upstate New York is not an anomaly Snow in upstate New York in June is rare Rare events may change over time 12/9/08, Sandia National Labs 7 Statistical View of Anomaly Outlier Data item that is outside the normal distribution of the data Identify by Box Plot Image from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002. 12/9/08, Sandia National Labs 8 Statistical View of Anomaly Identify by looking at distribution THIS DOES NOT WORK with stream data Image from www.wikipedia.org, Normal distribution. 12/9/08, Sandia National Labs 9 Data Mining View of Anomaly Classification Problem Build classifier from training data Problem is that training data shows what is NOT an anomaly Thus an anomaly is anything that is not viewed as normal by the classification technique MUST build dynamic classifier Identify anomalous behavior Signatures of what anomalous behavior looks like Input data is identified as anomaly if it is similar enough to one of these signatures Mixed – Classification and Signature 12/9/08, Sandia National Labs 10 Visualizing Anomalies Temporal Heat Map (THM) is a visualization technique for streaming data derived from multiple sensors. Two dimensional structure similar to an infinite table. Each row of the table is associated with one sensor value. Each column of the table is associated with a point in time. Each cell within the THM is a color representation of the sensor value Colors normalized (in our examples) 0 – While 0.5 – Blue 1.0 - Red 12/9/08, Sandia National Labs 11 THM of VoIP Data VoIP traffic data was provided by Cisco Systems and represents logged VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003. 12/9/08, Sandia National Labs 12 Spatiotemporal Stream Data Records may arrive at a rapid rate High volume (possibly infinite) of continuous data Concept drifts: Data distribution changes on the fly Data does not necessarily fit any distribution pattern Multidimensional Temporal Spatial Data are collected in discrete time intervals, Data are in structured format, <a1, a2, …> Data hold an approximation of the Markov property. 12/9/08, Sandia National Labs 13 Spatiotemporal Environment Events arriving in a stream At any time, t, we can view the state of the problem as represented by a vector of n numeric values: Vt = <S1t, S2t, ..., Snt> V1 S1 S2 … Sn S11 S21 … Sn1 V2 S12 S22 … Sn2 … … … … … Vq S1q S2q … Snq Time 12/9/08, Sandia National Labs 14 Data Stream Modeling Single pass: Each record is examined at most once Bounded storage: Limited Memory for storing synopsis Real-time: Per record processing time must be low Summarization (Synopsis )of data Use data NOT SAMPLE Temporal and Spatial Dynamic Continuous (infinite stream) Learn Forget Sublinear growth rate - Clustering 12/9/08, Sandia National Labs 15 15 MM A first order Markov Chain is a finite or countably infinite sequence of events {E1, E2, … } over discrete time points, where Pij = P(Ej | Ei), and at any time the future behavior of the process is based solely on the current state A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that: S ={N1,N2, …, Nm}, and A = {Lij | i 1, 2, …, m, j 1, 2, …, m} and Each arc, Lij = <Ni,Nj> is labeled with a transition probability Pij = P(Nj | Ni). 12/9/08, Sandia National Labs 16 Problem with Markov Chains The required structure of the MC may not be certain at the model construction time. As the real world being modeled by the MC changes, so should the structure of the MC. Not scalable – grows linearly as number of events. Our solution: Extensible Markov Model (EMM) Cluster real world events Allow Markov chain to grow and shrink dynamically 12/9/08, Sandia National Labs 17 Outline Introduction EMM Overview EMM Applications to Anomaly Detection EMM Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 18 Extensible Markov Model (EMM) Time Varying Discrete First Order Markov Model Nodes are clusters of real world states. Learning continues during application phase. Learning: Transition probabilities between nodes Node labels (centroid/medoid of cluster) Nodes are added and removed as data arrives 12/9/08, Sandia National Labs 19 Related Work Splitting Nodes in HMMs Create new states by splitting an existing state M.J. Black and Y. Yacoob,”Recognizing facial expressions in image sequences using local parameterized models of image motion”, Int. Journal of Computer Vision, 25(1), 1997, 23-48. Dynamic Markov Modeling States and transitions are cloned G. V. Cormack, R. N. S. Horspool. “Data compression using dynamic Markov Modeling,” The Computer Journal, Vol. 30, No. 6, 1987. Augmented Markov Model (AMM) Creates new states if the input data has never been seen in the model, and transition probabilities are adjusted Dani Goldberg, Maja J Mataric. “Coordinating mobile robot group behavior using a model of interaction dynamics,” Proceedings, the Third International Conference on Autonomous Agents (agents ’99), Seattle, Washington 12/9/08, Sandia National Labs 20 EMM vs AMM Our proposed EMM model is similar to AMM, but is more flexible: EMM continues to learn during the application phase. State matching is determined using a clustering technique. EMM not only allows the creation of new nodes, but deletion (or merging) of existing nodes. This allows the EMM model to “forget” old information which may not be relevant in the future. It also allows the EMM to adapt to any main memory constraints for large scale datasets. EMM performs one scan of data and therefore is suitable for online data processing. 12/9/08, Sandia National Labs 21 EMM Extensible Markov Model (EMM): at any time t, EMM consists of an MM and algorithms to modify it, where algorithms include: EMMSim, which defines a technique for matching between input data at time t + 1 and existing states in the MM at time t. EMMIncrement algorithm, which updates MM at time t + 1 given the MM at time t and classification measure result at time t + 1. Additional algorithms may be added to modify the model or for applications. 12/9/08, Sandia National Labs 22 EMMSim Find closest node to incoming event. If none “close” create new node Labeling of cluster is centroid/medoid of members in cluster Problem Nearest Neighbhor O(n) BIRCH O(lg n) • Requires second phase to recluster initial 12/9/08, Sandia National Labs 23 EMMIncrement <18,10,3,3,1,0,0> <17,10,2,3,1,0,0> <16,9,2,3,1,0,0> <14,8,2,3,1,0,0> 2/3 2/3 2/21 2/3 1/1 1/2 1/2 N3 N1 1/3 N2 1/1 1/2 1/1 <14,8,2,3,0,0,0> <18,10,3,3,1,1,0.> 12/9/08, Sandia National Labs 24 EMMDecrement N1 N3 1/3 1/3 2/2 1/3 N2 1/2 N5 12/9/08, Sandia National Labs N1 1/3 N3 1/3 1/6 Delete N2 1/6 1/3 N6 N5 1/6 N6 25 EMM Advantages Dynamic Adaptable Use of clustering Learns rare event Scalable: Growth of EMM is not linear on size of data. Hierarchical feature of EMM Creation/evaluation quasi-real time Distributed / Hierarchical extensions 12/9/08, Sandia National Labs 26 Growth of EMM number of state in model 800 700 600 threshold 0.994 500 threshold 0.995 400 threshold 0.996 300 threshold 0.997 200 threshold 0.998 100 1502 1423 1344 1265 1186 1107 1028 949 870 791 712 633 554 475 396 317 238 159 80 1 0 number of input data (total 1574) Servent Data 12/9/08, Sandia National Labs 27 EMM Performance – Growth Rate Data Ser went Ouse Sim 0.99 Jaccrd 156 Dice 72 Cosine 11 Ovrlap 2 Jaccrd 56 Dice 40 Cosine 6 Ovrlap 1 12/9/08, Sandia National Labs Threshold 0.992 0.994 0.996 0.998 190 268 389 667 92 123 191 389 14 19 31 61 2 3 3 4 66 81 105 162 43 52 66 105 8 10 13 24 1 1 1 1 28 EMM Performance – Growth Rate 12/9/08, Sandia National Labs Minnesota Traffic Data 29 Outline Introduction EMM Overview EMM Applications to Anomaly Detection EMM Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 30 Datasets/Anomalies MnDot – Minnesota Department of Transportation Automobile Accident Ouse and Serwent – River flow data from England Flood Drought KDD Cup’99 http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Intrusion Cisco VoIP – VoIP traffic data obtained at Cisco Unusual Phone Call 12/9/08, Sandia National Labs 31 Rare Event Detection Detected unusual weekend traffic pattern Weekdays Weekend 12/9/08, Sandia National Labs Minnesota DOT Traffic Data 32 Our Approach to Detect Anomalies By learning what is normal, the model can predict what is not Normal is based on likelihood of occurrence Use EMM to build model of behavior We view a rare event as: Unusual event Transition between events states which does not frequently occur. Base rare event detection on determining events or transitions between events that do not frequently occur. Continue learning 12/9/08, Sandia National Labs 33 EMMRare EMMRare algorithm indicates if the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs: The frequency of the node at time t+1 is below this threshold The updated transition probability of the MC transition from node at time t to the node at t+1 is below the threshold 12/9/08, Sandia National Labs 34 Determining Rare Occurrence Frequency (OFc) of a node Nc : OFc = CNc CN i i Normalized Transition Probability (NTPmn), from one state, Nm, to another, Nn : NTPmn = CL mn CN i i 12/9/08, Sandia National Labs 36 EMMRare Given: • Rule#1: CNi <= thCN • Rule#2: CLij <= thCL • Rule#3: OFc <= thOF • Rule#4: NTPmn <= thNTP Input: Gt: EMM at time t i: Current state at time t R= {R1, R2,…,RN}: A set of rules Output: At: Boolean alarm at time t Algorithm: 1 Ri = True At = 0 Ri = False 12/9/08, Sandia National Labs 37 Rare Event in Cisco Data 12/9/08, Sandia National Labs 38 Risk assessment Problem: Mitigate false alarm rate while maintaining a high detection rate. Methodology: Historic feedbacks can be used as a free resource to take out some possibly safe anomalies Combine anomaly detection model and user’s feedbacks. Risk level index Evaluation metrics: Detection rate, false alarm rate. Detection rate Detection rate = TP/(TP+TN) False alarm rate False alarm rate = FP/(TP+FP) Operational Curve 12/9/08, Sandia National Labs 39 Reducing False Alarms •Calculate Risk using historical feedback •Historical Feedback: •Count of true alarms: 12/9/08, Sandia National Labs 40 Detection Rate Experiments DETECTION RATE OF ANOMALY DETECTION AND RISK ASSESSMENT MODELS 1 0.5 0 16 ANOMALY DETECTION RISK ASSESSMENT 18 20 22 24 26 28 (a) EUCLIDEAN THRESHOLD FOR CLUSTERING (th) 30 32 1 0.5 0 0 ANOMALY DETECTION RISK ASSESSMENT 0.1 1 0.3 0.4 0.5 0.6 0.7 0.8 (b) RISK ASSESSMENT WEIGHT FACTOR (alpha) 0.9 1 ANOMALY DETECTION RISK ASSESSMENT 0.5 0 0 0.2 50 100 150 200 250 300 (c) EMM STATE CARDINALITY THRESHOLD (thNode) 350 400 1 0.5 0 0 12/9/08, Sandia National Labs ANOMALY DETECTION RISK ASSESSMENT 50 100 150 200 250 (d) EMM TRANSITION CARDINALITY THRESHOLD (thLink) 300 41 False Alarm Rate FALSE ALARM RATE OF ANOMALY DETECTION AND RISK ASSESSMENT MODELS 1 ANOMALY DETECTION RISK ASSESSMENT 0.5 0 16 18 1 20 22 24 26 28 (a) EUCLIDEAN THRESHOLD FOR CUSTERING (th) 0.1 1 1 0.5 0 0 12/9/08, Sandia National Labs 0.2 0.3 0.4 0.5 0.6 0.7 0.8 (b) RISK ASSESSMENT WEIGHT FACTOR (alpha) 0.9 1 ANOMALY DETECTION RISK ASSESSMENT 0.5 0 0 32 ANOMALY DETECTION RISK ASSESSMENT 0.5 0 0 30 50 100 150 200 250 300 (c) EMM STATE CARDINALITY THRESHOLD (thNode) 350 400 ANOMALY DETECTION RISK ASSESSMENT 50 100 150 200 250 (d) EMM TRANSITION CARDINALITY THRESHOLD (thLink) 300 42 Outline Introduction EMM Overview EMM Applications to Anomaly Detection EMM Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 43 Research Objectives Apply proven spatiotemporal modeling technique to seismic data Construct EMM to model sensor data Local EMM at location or area Hierarchical EMM to summarize lower level models Represent all data in one vector of values EMM learns normal behavior Develop new similarity metrics to include all sensor data types (Fusion) Apply anomaly detection algorithms 12/9/08, Sandia National Labs 44 Input Data Representation •Vector of sensor values (numeric) at precise time points or aggregated over time intervals. •Need not come from same sensor types. •Similarity/distance between vectors used to determine creation of new nodes in EMM. 12/9/08, Sandia National Labs 45 EMM with Seismic Data Input – Wave arrivals (all or one per sensor) Identify states and changes of states in seismic data Wave form would first have to be converted into a series of vectors representing the activity at various points in time. Initial Testing with RDG data Use amplitude, period, and wave type 12/9/08, Sandia National Labs 46 New Distance Measure Data = <amplitude, period, wave type> Different wave type = 100% difference For events of same wave type: 50% weight given to the difference in amplitude. 50% weight given to the difference in period. If the distance is greater than the threshold, a state change is required. amplitude = | amplitudenew – amplitudeaverage | / amplitudeaverage period = | periodnew – periodaverage | / periodaverage 12/9/08, Sandia National Labs 47 EMM with Seismic Data States 1, 2, and 3 correspond to Noise, Wave A, and Wave B respectively. 12/9/08, Sandia National Labs 48 Preliminary Testing RDG data February 1, 1981 – 6 earthquakes Find transition times close to known earthquakes 9 total nodes 652 total transitions Found all quakes 12/9/08, Sandia National Labs 49 Outline Introduction EMM Overview EMM Applications to Anomaly Detection EMM Applications to Nuclear Testing Future Work 12/9/08, Sandia National Labs 50 Ongoing/Future Work Extend to Emerging Patterns Extend to Hierarchical/Distributed Yu Su Test with more data – KDD Cup Compare to other approaches Charlie Isaksson Apply to nuclear testing 12/9/08, Sandia National Labs 51 Hierarchical EMM Summary EMM Regional EMM Local EMM 12/9/08, Sandia National Labs Local EMM Regional EMM Local EMM Local EMM Local EMM 52 12/9/08, Sandia National Labs 53 References Zhigang Li and Margaret H. Dunham, “ STIFF: A Forecasting Framework for Spatio-Temporal Data”, Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9. Zhigang Li, Liangang Liu, and Margaret H. Dunham, “ Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50. Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265. Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport Louisiana. 12/9/08, Sandia National Labs 54