Net-Centric Software and Systems
I/UCRC
Self-Detection of Abnormal Event Sequences
Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan
Date: April 1, 2010
Copyright © 2010 NSF Net-Centric I/UCRC. All rights reserved.
2009/Current Project Overview
Self-Detection of Abnormal Event Sequences
Project Scope:
• Given a set of event sequences, determine the
normal and abnormal transitions using data
mining and automata techniques
• Develop techniques for problem-specific
anomaly detection, including data collection
and extraction, a suite of techniques for
detecting abnormal event sequences
• The industry members can share the
techniques for abnormal event sequence
detection to achieve high quality systems
Project Schedule (timeline axis: A M J J A S O N D J F M A, 2009 to 2010):
• T1: Implemented preprocessor
• T2,3,5: Applied clustering and PFSA on the small dataset and obtained results
• T4: Parallelized algorithms on large dataset
• T1: Refined preprocessor & got new results
• T5: Applied algorithms on the large dataset
Tasks:
1. Develop a preprocessor to process log data and extract event sequences
2. Develop cluster-based anomaly detection techniques
3. Develop probabilistic finite state automata (PFSA) based anomaly detection techniques
4. Parallelize the algorithms to make them more efficient
5. Apply the techniques on the datasets provided by the industrial partner and report the results
Deliverables:
• A suite of anomaly detection algorithms
(cluster-based and PFSA based tools)
• Anomaly detection results
Success Criteria:
• Identify injected anomalies with high precision
and recall
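The success criterion above can be made concrete with a short calculation. This is a minimal sketch, assuming each event sequence has an ID and that the IDs of the injected anomalies are known; the function name and the example IDs are hypothetical and do not come from the project.

```python
def precision_recall(flagged, injected):
    """Return (precision, recall) for a set of flagged sequence IDs.

    flagged:  IDs the detector reported as anomalous
    injected: IDs of the anomalies that were deliberately injected
    """
    flagged, injected = set(flagged), set(injected)
    true_positives = len(flagged & injected)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    return precision, recall


# Hypothetical IDs, for illustration only.
p, r = precision_recall(flagged={3, 7, 9, 12}, injected={3, 7, 9})
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision drops when normal sequences are flagged; recall drops when injected anomalies are missed.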
5/22/2017
Page 2
2009 Project Results
Status legend: Significant Finding/Accomplishment, Complete, Partially Complete, Not Started
Task 1: Develop preprocessor for processing log data and extracting event sequences
Progress: Used lex/yacc to implement a flexible preprocessor. Still need to refine the preprocessor to eliminate noisy data caused by initialization and concurrent execution.
Task 2: Develop cluster-based anomaly detection techniques
Progress: Completed. Parallelized the GA-based clustering technique for anomaly detection.
Task 3: Develop PFSA-based anomaly detection techniques
Progress: Completed. Enhanced MDI (minimal divergence inference) to handle event attributes and anomaly detection.
Task 4: Enhance the algorithms to make them more efficient and effective
Progress: Invented the prefix tree based approach, which facilitates the analysis of very large datasets and reduces processing time more than 20-fold.
Task 5: Apply the techniques on the datasets provided by Cisco
Progress: The results show high precision in identifying injected anomalies.
The tools from this research have detected the injected anomalies with high precision.
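To illustrate the general idea behind PFSA-based detection, the sketch below scores a sequence by its probability under an automaton learned from training sequences. It is not the project's enhanced MDI. It uses only the unmerged probabilistic prefix tree, which state-merging methods such as MDI would then generalize. The end marker, the floor probability for unseen transitions, and all names are assumptions made for illustration.

```python
import math
from collections import defaultdict

END = "<end>"  # hypothetical end-of-sequence marker


def build_counts(sequences):
    """Count outgoing symbols for every prefix (state) seen in training."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        state = ()
        for symbol in list(seq) + [END]:
            counts[state][symbol] += 1
            state = state + (symbol,)
    return counts


def log_probability(counts, seq, floor=1e-6):
    """Log-probability of seq; unseen transitions get the assumed floor."""
    state, total = (), 0.0
    for symbol in list(seq) + [END]:
        outgoing = counts.get(state)
        if outgoing and symbol in outgoing:
            total += math.log(outgoing[symbol] / sum(outgoing.values()))
        else:
            total += math.log(floor)
        state = state + (symbol,)
    return total


# Hypothetical training data: three identical traces and one variant.
training = [["a", "b", "c"]] * 3 + [["a", "b", "d"]]
counts = build_counts(training)
print(log_probability(counts, ["a", "b", "c"]))  # common trace, higher score
print(log_probability(counts, ["a", "x"]))       # unseen transition, much lower score
```

A sequence whose log-probability falls below a threshold would be reported as abnormal; how that threshold is chosen is left open here.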
Major Accomplishments, Discoveries
and Surprises
1. Enhanced clustering based anomaly detection
• Developed a multi-objective genetic algorithm to avoid getting stuck in local minima
• Parallelized the algorithm
2. Enhanced PFSA based anomaly detection
• Implemented the prefix tree scheme
• Used MDI (minimal divergence inference)
• Used attributes as transition symbols
  • Use some of the attributes directly
  • Quantize other attributes (e.g., time) and use the quantized values
3. Applied to Cisco call signal event sequences
• Identified all the injected anomalies
4. Invented the prefix tree based approaches
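The use of attributes as transition symbols (item 2 above) can be sketched as follows. Categorical attributes are used directly, and a continuous attribute such as time is first mapped to a small set of labels. The bucket boundaries, labels, and event field names are hypothetical; the slides do not say how the quantization was chosen.

```python
import bisect

# Hypothetical bucket boundaries in seconds; real ones would be chosen from the data.
TIME_BOUNDARIES = [0.1, 1.0, 10.0]
TIME_LABELS = ["fast", "normal", "slow", "very_slow"]


def quantize_time(seconds):
    """Map a continuous duration onto a small set of labels."""
    return TIME_LABELS[bisect.bisect_right(TIME_BOUNDARIES, seconds)]


def to_symbol(event):
    """Combine the event name, a directly used attribute, and a quantized one."""
    return (event["name"], event["status"], quantize_time(event["elapsed"]))


# Hypothetical event record, for illustration only.
event = {"name": "call_setup", "status": "ok", "elapsed": 0.5}
print(to_symbol(event))
```

Each resulting tuple is one symbol of the automaton's alphabet, so coarser buckets keep the alphabet small at the cost of temporal detail.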
New Problems
• Use prefix tree to greatly enhance the efficiency of the algorithms
  • Event sequences can be built into a prefix tree
  • Prefix tree can be used to group event sequences at different levels of granularity (this is especially the case for datasets containing execution traces)
  • Prefix tree can provide some distance information
• How to achieve real-time on-the-fly anomaly detection?
  • Need to determine a suitable time interval T
  • Data collected in T should be sufficient to build a good anomaly detection model, while the detection latency is not significant
• How to handle event sequences created due to concurrent execution?
  • Concurrent execution can generate event sequences of arbitrary order and make anomaly detection difficult
  • Investigate association rule mining techniques for this problem
[Figure label: 2nd closest neighbor]
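The prefix tree points on this slide can be illustrated with a small sketch: build a tree from event sequences, group sequences by a shared prefix of a chosen depth, and derive a simple distance from the shared prefix. The slides do not define the distance, so the one used here (number of tree edges between two sequences' nodes) is an assumption, as are all names and the sample traces.

```python
class PrefixTreeNode:
    def __init__(self):
        self.children = {}  # event symbol -> PrefixTreeNode
        self.count = 0      # number of sequences passing through this node


def build_prefix_tree(sequences):
    root = PrefixTreeNode()
    for seq in sequences:
        node = root
        node.count += 1
        for event in seq:
            node = node.children.setdefault(event, PrefixTreeNode())
            node.count += 1
    return root


def groups_at_depth(root, depth):
    """Return {prefix: count} for prefixes of exactly `depth` events.

    Sequences shorter than `depth` are not included.
    """
    groups = {}
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == depth:
            groups[prefix] = node.count
            continue
        for event, child in node.children.items():
            stack.append((child, prefix + (event,)))
    return groups


def prefix_distance(a, b):
    """Assumed distance: tree edges between the nodes of sequences a and b."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


# Hypothetical execution traces written as object.method events.
traces = [
    ("A.open", "A.read", "A.close"),
    ("A.open", "A.read", "B.send"),
    ("A.open", "A.write", "A.close"),
    ("B.connect", "B.send"),
]
tree = build_prefix_tree(traces)
print(groups_at_depth(tree, 1))
print(prefix_distance(traces[0], traces[1]))
```

A smaller depth gives coarser groups and a larger depth gives finer ones, which is one way to read "different levels of granularity".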
Proposed Solution
• Enhance existing tools using information provided by prefix tree
  • Clustering-based approaches: Use prefix tree to determine the sequence groups at different granularity levels (object level, method level, exact sequence level); clustering algorithms can then be used to merge these groups into clusters
  • Density-based approaches: Use prefix tree to help determine the k-th nearest neighbor
  • PFSA-based approaches: Always start from prefix tree
• Enhance existing tools for real-time on-the-fly anomaly detection
  • Collect data Dt in (t, t+T], use Dt to build the anomaly detection model At in (t+T, t+2T], use At for anomaly detection in (t+2T, t+3T]
  • Experimentally determine a suitable T
Timeline (figure):
(t, t+T]: Collect Dt, Build At–T, Apply At–2T
(t+T, t+2T]: Collect Dt+T, Build At, Apply At–T
(t+2T, t+3T]: Collect Dt+2T, Build At+T, Apply At
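The collect/build/apply schedule can be sketched as a loop over consecutive intervals of length T. In a real deployment the three stages would run concurrently; here they are simulated one interval at a time. The function names, the callback signatures, and the toy model (a set of previously seen sequences) are assumptions for illustration, not the project's tools.

```python
def run_pipeline(windows, build_model, is_anomalous):
    """Process consecutive windows of length T.

    windows:      list of event-sequence batches, one per interval
    build_model:  function(batch) -> model
    is_anomalous: function(model, sequence) -> bool
    Returns a list of (window_index, sequence) alerts.
    """
    alerts = []
    collected = None  # data from the previous interval, awaiting model building
    model = None      # model finished in the previous interval, ready to apply
    for index, batch in enumerate(windows):
        # Apply: the model ready at the start of this interval screens incoming data.
        if model is not None:
            alerts.extend((index, seq) for seq in batch if is_anomalous(model, seq))
        # Build: data collected in the previous interval becomes the next model.
        next_model = build_model(collected) if collected is not None else None
        # Collect: this interval's data is held for building in the next interval.
        collected = batch
        model = next_model
    return alerts


# Toy model: the set of sequences seen during collection (hypothetical).
windows = [
    [("a", "b")],
    [("a", "c")],
    [("a", "b"), ("x", "y")],
    [("a", "b"), ("a", "c")],
]
alerts = run_pipeline(windows, build_model=set,
                      is_anomalous=lambda model, seq: seq not in model)
print(alerts)
```

Data from interval 0 is first applied in interval 2, which shows the trade-off in choosing T: a longer T gives each model more data but delays when that data starts to influence detection.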
2010/New Project Summary
Self-Detection of Abnormal Event Sequences
Tasks:
1. Modify the anomaly detection tools for real-time on-the-fly anomaly detection
2. Enhance the anomaly detection techniques
using knowledge in prefix tree
3. Continue to
• Refine the preprocessor
• Apply the techniques to the datasets
• Compare the results (time/precision)
4. Develop a visualization tool for PFSA
5. Adapt the tools for different datasets
Research Goals:
1. Improve existing anomaly detection techniques, specifically for execution traces and event sequences
2. Develop a diverse set of anomaly detection techniques for handling datasets with different characteristics
3. Make the tools available for future anomaly detection tasks
Benefits to Industry Partners:
1. A comprehensive set of techniques and tools to allow the best analysis of different datasets
2. Real-time on-the-fly anomaly detection capability
3. Rapid adaptation of the tools to handle other application-specific datasets
Project Schedule (timeline axis: A M J J A S O N D J F M A, 2010 to 2011):
• Task 1. Modification
• Task 2. Prefix tree
• Task 3. Experiment
• Task 4. Visualization
• Task 5? Additional datasets