Net-Centric Software and Systems I/UCRC
Self-Detection of Abnormal Event Sequences
Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan
Date: April 1, 2010
Copyright © 2010 NSF Net-Centric I/UCRC. All rights reserved.

2009/Current Project Overview
Self-Detection of Abnormal Event Sequences

Project Scope:
• Given a set of event sequences, determine the normal and abnormal transitions using data mining and automata techniques
• Develop techniques for problem-specific anomaly detection, including data collection and extraction and a suite of techniques for detecting abnormal event sequences
• Industry members can share the techniques for abnormal event sequence detection to achieve high-quality systems

Tasks:
1. Develop a preprocessor for processing log data and extracting event sequences
2. Develop cluster-based anomaly detection techniques
3. Develop probabilistic finite state automata (PFSA) based anomaly detection techniques
4. Parallelize the algorithms to make them more efficient
5. Apply the techniques to the datasets provided by the industrial partner and report the results

Project Schedule (Gantt chart; monthly axis A M J J A S O N D J F M A, 09 to 10):
• T2, T3, T5: Applied clustering and PFSA to the small dataset and obtained results
• T4: Parallelized
• T1: Implemented preprocessor algorithms on the large dataset
• T1: Refined the preprocessor and got new results
• T5: Applied the algorithms to the large dataset

Deliverables:
• A suite of anomaly detection algorithms (cluster-based and PFSA-based tools)
• Anomaly detection results

Success Criteria:
• Identify injected anomalies with high precision and recall

5/22/2017 Page 2

2009 Project Results
Status legend: Significant Finding/Accomplishment!, Complete, Partially Complete, Not Started
Columns: TASK, STAT, PROGRESS and ACCOMPLISHMENT

1: Develop preprocessor for processing log data and extracting event sequences
• Use lex/yacc to implement a flexible processor.
• Need to refine the preprocessor to eliminate the noisy data caused by initialization and concurrent execution

2: Develop cluster-based anomaly detection techniques
• Completed. Parallelized the GA-based clustering technique for anomaly detection.

3: Develop PFSA-based anomaly detection techniques
• Completed. Enhanced MDI (minimal divergence inference) to handle event attributes and anomaly detection.

4: Enhance the algorithms to make them more efficient and effective
• Invented the prefix tree based approach, which facilitates the analysis of very large datasets and reduces processing time by more than a factor of 20.

5: Apply the techniques to the datasets provided by Cisco
• The results show high precision in identifying injected anomalies.

The tools from this research have detected the injected anomalies with high precision.

Major Accomplishments, Discoveries and Surprises
1. Enhanced clustering-based anomaly detection
   • Developed a multi-objective genetic algorithm to avoid local minima in the search
   • Parallelized the algorithm
2. Enhanced PFSA-based anomaly detection
   • Implemented the prefix tree scheme
   • Used MDI (minimal divergence inference)
   • Attributes as transition symbols
      • Use some of the attributes directly
      • Quantize other attributes (e.g., time) and use the quantized values
3. Applied to Cisco call signal event sequences
   • Identified all the injected anomalies
4. Invented the prefix tree based approaches

New Problems
• Use prefix tree to greatly enhance the efficiency of the algorithms
   • Event sequences can be built into a prefix tree
   • Prefix tree can be used to group event sequences at different levels of granularity (this is especially the case for datasets containing execution traces)
   • Prefix tree can provide some distance information (figure label: 2nd closest neighbor)
• How to achieve real-time on-the-fly anomaly detection?
   • Need to determine a suitable time interval T
   • Data collected in T should be sufficient to build a good anomaly detection model, while keeping the detection latency insignificant
• How to handle event sequences created due to concurrent execution?
   • Concurrent execution can generate event sequences of arbitrary order and make anomaly detection difficult
   • Investigate association rule mining techniques for this problem

Proposed Solution
• Enhance existing tools using information provided by prefix tree
   • Clustering-based approaches: Use prefix tree to determine the sequence groups at different granularity levels (object level, method level, exact sequence level); clustering algorithms can then be used to merge these groups into clusters
   • Density-based approaches: Use prefix tree to help determine the k-th nearest neighbor
   • PFSA-based approaches: Always start from prefix tree
• Enhance existing tools for real-time on-the-fly anomaly detection
   • Collect data D_t in (t, t+T], use D_t to build the anomaly detection model A_t in (t+T, t+2T], use A_t for anomaly detection in (t+2T, t+3T]
   • Experimentally determine a suitable T

Pipeline over consecutive intervals (reconstructed from the slide diagram):

Interval        Collect     Build      Apply
(t, t+T]        D_t         A_{t-T}    A_{t-2T}
(t+T, t+2T]     D_{t+T}     A_t        A_{t-T}
(t+2T, t+3T]    D_{t+2T}    A_{t+T}    A_t

2010/New Project Summary
Self-Detection of Abnormal Event Sequences

Tasks:
1. Modify the anomaly detection tools for real-time on-the-fly anomaly detection
2. Enhance the anomaly detection techniques using knowledge in prefix tree
3. Continue to
   • Refine the preprocessor
   • Apply the techniques to the datasets
   • Compare the results (time/precision)
4. Develop visualization tool for PFSA
5. Adapt the tools for different datasets

Research Goals:
1. Improve existing anomaly detection techniques, specifically for execution traces and event sequences
2. Develop a diverse set of anomaly detection techniques for handling datasets with different characteristics
3. Make the tools available for future anomaly detection tasks

Benefits to Industry Partners:
1. A comprehensive set of techniques and tools to allow best analysis of different datasets
2. Real-time on-the-fly anomaly detection capability
3. Rapid adaptation of the tools to handle other application-specific datasets

Project Schedule (Gantt chart; monthly axis A M J J A S O N D J F M A, 10 to 11):
• Task 1. Modification
• Task 2. Prefix tree
• Task 3. Experiment
• Task 4. Visualization
• Task 5? Additional datasets
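
Illustrative sketch (not part of the original slides): the slides state that event sequences can be built into a prefix tree and that the PFSA-based approaches always start from the prefix tree. The Python below is a minimal sketch of that starting point only, assuming event sequences are given as tuples of symbols. It builds a frequency prefix tree and scores a sequence by the log of its estimated transition probabilities. It does not implement MDI state merging or the project's actual tools; the names `build_prefix_tree`, `log_likelihood`, `is_anomalous`, the `floor` probability, the threshold, and the event names are all hypothetical choices made for illustration.

```python
import math


class PrefixTreeNode:
    """One node per distinct prefix of the training event sequences."""

    def __init__(self):
        self.children = {}  # event symbol -> PrefixTreeNode
        self.visits = 0     # number of training sequences passing through this prefix
        self.ends = 0       # number of training sequences ending exactly here


def build_prefix_tree(sequences):
    """Build a frequency prefix tree from an iterable of event sequences."""
    root = PrefixTreeNode()
    for seq in sequences:
        node = root
        node.visits += 1
        for event in seq:
            node = node.children.setdefault(event, PrefixTreeNode())
            node.visits += 1
        node.ends += 1
    return root


def log_likelihood(root, seq, floor=1e-6):
    """Sum of log transition probabilities along seq, plus the log probability
    of stopping at the final node. Transitions never seen in training are
    assigned the (assumed) probability `floor`."""
    node = root
    total = 0.0
    for event in seq:
        child = node.children.get(event) if node is not None else None
        if child is None:
            total += math.log(floor)
            node = None  # we have left the tree; all later steps are unseen
        else:
            total += math.log(child.visits / node.visits)
            node = child
    if node is None or node.ends == 0:
        total += math.log(floor)
    else:
        total += math.log(node.ends / node.visits)
    return total


def is_anomalous(root, seq, threshold=-10.0):
    """Flag a sequence whose log-likelihood falls below an assumed threshold."""
    return log_likelihood(root, seq) < threshold


# Hypothetical training data: 8 "read" traces and 2 "write" traces.
training = [("open", "read", "close")] * 8 + [("open", "write", "close")] * 2
tree = build_prefix_tree(training)

print(is_anomalous(tree, ("open", "read", "close")))  # seen often in training
print(is_anomalous(tree, ("open", "close")))          # transition never seen
```

In practice the threshold would have to be chosen experimentally, for example against injected anomalies as in the success criteria above; a PFSA inference method such as MDI would additionally merge compatible states of this tree rather than use it directly.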