Net-Centric Software & Systems Consortium
Kick-off Meeting
February 26-27, 2009
Self-Detection of Abnormal Event Sequences
Farokh B. Bastani, UT-Dallas, [email protected]
I-Ling Yen, UT-Dallas, [email protected]
Latifur Khan, UT-Dallas, [email protected]
Problem Description
• There are numerous types of event-based workflows in net-centric systems
• E.g., call control signal processing, network accesses, access to resources, access to data, etc.
• Need for abnormal behavior detection
• Event-based workflows may incur software & system faults, operational errors, attacks, fraud, and illegitimate manipulations, resulting in abnormal behaviors
• If the abnormal behavior can be detected, proactive techniques can be used to mitigate the problem
5/22/2017
Net-Centric Software & Systems
Consortium
Existing Solutions
• Many data mining and machine learning algorithms can be used to classify normal and abnormal events
• Bayesian networks, neural networks, decision trees, K-means, support vector machines (SVM), hidden Markov models, etc.
• Problem: Which method to use?
• Data set dependent, so the best approach must be explored for each dataset
• Feature extraction from raw data can have a significant impact on the prediction quality, so various feature extraction models must be explored
• Problem: How to mine event sequences?
• Automata-based approach: for known event sequences, cluster them and determine the abnormal ones (no well-established clustering techniques)
• Episode-based approaches: need to mine the event sequences first, then cluster them and determine the abnormal ones (well-established episode mining techniques exist, but there is not much research on clustering)
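As a rough illustration of the episode-based approach, the sketch below counts serial episodes (ordered event tuples) over sliding windows, in the spirit of window-based episode mining. It is not the method from the slides: the function name `frequent_serial_episodes`, its parameters, and the sample event names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_serial_episodes(events, window, length, min_support):
    """Return {episode: support} for ordered event tuples of the given length.

    Support is the fraction of sliding windows in which the episode occurs
    as a (not necessarily contiguous) subsequence. Hypothetical helper.
    """
    n_windows = len(events) - window + 1
    if n_windows <= 0:
        return {}
    counts = Counter()
    for start in range(n_windows):
        win = events[start:start + window]
        # combinations() preserves positional order, so each tuple is an
        # ordered subsequence; the set counts it once per window.
        counts.update(set(combinations(win, length)))
    return {
        episode: count / n_windows
        for episode, count in counts.items()
        if count / n_windows >= min_support
    }


# Made-up event stream for illustration only.
stream = ["A", "B", "C", "A", "B", "D", "A", "B"]
episodes = frequent_serial_episodes(stream, window=3, length=2, min_support=0.5)
```

Episodes that fall below the support threshold are the ones a later clustering or outlier step would need to examine; this sketch only covers the mining step.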
5/22/2017
Net-Centric Software & Systems
Consortium
3
Net-Centric Software & Systems Consortium
Kick-off Meeting
Our Solution
• Multivariate automata and episode mining
• Unknown event sequence: Use episode mining
• Automata merging for known or mined event sequences
• Multiple variables result in a huge state space
• Use dominance parameters and weights to merge states
• Develop techniques to merge automata efficiently (hashing, clustering)
• Identify abnormal event sequences
• Use clustering techniques to identify outliers
• Need effective clustering techniques
• Need to handle event sequences with different lengths
• Need to integrate inter-event parameters in the clustering process
• Manual help to identify actual faulty event sequences offline
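One way to compare event sequences of different lengths is a length-normalized edit distance. The sketch below scores each sequence by its mean distance to its k nearest neighbors, a simple distance-based stand-in for the clustering step rather than the technique proposed in the slides. The function names, the choice of k, and the sample sequences are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of events."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] so sequences of any length compare."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def knn_outlier_scores(sequences, k=2):
    """Mean normalized distance from each sequence to its k nearest neighbors."""
    scores = []
    for i, s in enumerate(sequences):
        dists = sorted(normalized_distance(s, t)
                       for j, t in enumerate(sequences) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


# Made-up sequences; the last one is deliberately unlike the others.
sequences = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["close", "close", "delete", "delete", "open"],
]
scores = knn_outlier_scores(sequences, k=2)
most_suspicious = scores.index(max(scores))
```

Sequences with the highest scores would be the candidates handed to the manual, offline review mentioned above. Inter-event parameters such as timing are not modeled here.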
Our Solution (Cont.)
• Develop a feedback based self-improving mechanism
• When the prediction error exceeds a threshold, adjust the algorithm
• Use multiple algorithms to provide fine tuning
• E.g., use weighted decision from multiple algorithms
• Fine tune feature set extractions and use dimension reduction mechanisms to obtain faster and better results
• Off-line analysis to achieve improvements and feed the improvements to the online model
• Adjusted algorithm, revision of features, addition of inter-ES features
• Develop fault-injection techniques to induce self-learning
• Establish the faulty pattern library from data that have been learned
• Inject faulty patterns to train the mining process and to measure the effects (use faulty pattern library and develop fault generation algorithms)
[Diagram labels: Faulty data injector, Data sets, Classifier, Current prediction, feedback, Classifier Analysis, All data sets]
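The weighted-decision and threshold-triggered adjustment ideas can be sketched as a small voting ensemble. This is a minimal illustration under assumed details, not the mechanism from the slides: the class name `FeedbackEnsemble`, the multiplicative penalty, the error window, and the toy detectors are all hypothetical.

```python
from collections import deque


class FeedbackEnsemble:
    """Weighted vote over several detectors, re-weighted from feedback."""

    def __init__(self, detectors, error_threshold=0.2, penalty=0.5, window=10):
        self.detectors = list(detectors)      # callables: sample -> bool
        self.weights = [1.0] * len(self.detectors)
        self.error_threshold = error_threshold
        self.penalty = penalty
        self.recent_errors = deque(maxlen=window)

    def predict(self, sample):
        """Flag the sample as abnormal if the weighted majority says so."""
        votes = [bool(d(sample)) for d in self.detectors]
        abnormal = sum(w for w, v in zip(self.weights, votes) if v)
        return abnormal > sum(self.weights) / 2

    def feedback(self, sample, is_abnormal):
        """Record the true label; adjust weights if recent error is too high."""
        self.recent_errors.append(self.predict(sample) != is_abnormal)
        error_rate = sum(self.recent_errors) / len(self.recent_errors)
        if error_rate > self.error_threshold:
            # Down-weight every detector that was wrong on this sample.
            for i, d in enumerate(self.detectors):
                if bool(d(sample)) != is_abnormal:
                    self.weights[i] *= self.penalty
        return error_rate


# Toy detectors over a numeric feature; only the first is sensible here.
ensemble = FeedbackEnsemble([lambda x: x > 10, lambda x: True, lambda x: x > 3])
ensemble.feedback(5, is_abnormal=False)   # labeled sample, e.g. from fault injection
```

Labeled samples for `feedback` could come from a fault injector, since injected faulty patterns have known ground truth.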
Experimental Plan
• Develop techniques for abnormal event sequence detection
• Develop automata generation and merging techniques
• Study the effects of various clustering algorithms on various event sequence datasets
• Consider signal flow data from Cisco
• Consider network-based intrusion detection datasets
• Consider human interoperations (if possible)
• Develop the models and methods for dynamic adaptation
• Algorithmic adaptation and feature set extraction adaptation
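To make "automata generation and merging" concrete, the sketch below builds a prefix-tree automaton from known event sequences and then merges states whose futures are identical, using a hashed signature per state. This exact-equivalence merge is a simplification chosen for the example; the dominance parameters and weights mentioned earlier in the slides are not specified, so they are not modeled. All function names and the sample sequences are assumptions.

```python
def build_prefix_tree(sequences):
    """Build a prefix-tree automaton; state 0 is the root."""
    transitions = [{}]            # transitions[state] = {event: next_state}
    accepting = set()
    for seq in sequences:
        state = 0
        for ev in seq:
            if ev not in transitions[state]:
                transitions.append({})
                transitions[state][ev] = len(transitions) - 1
            state = transitions[state][ev]
        accepting.add(state)
    return transitions, accepting


def merge_equivalent_states(transitions, accepting):
    """Merge states with identical futures by hashing bottom-up signatures."""
    canonical = {}                # signature -> merged state id
    mapping = {}                  # original state -> merged state id

    def visit(state):
        children = tuple(sorted((ev, visit(nxt))
                                for ev, nxt in transitions[state].items()))
        signature = (state in accepting, children)
        if signature not in canonical:
            canonical[signature] = len(canonical)
        mapping[state] = canonical[signature]
        return mapping[state]

    root = visit(0)
    merged = {new: {ev: mapping[nxt] for ev, nxt in transitions[old].items()}
              for old, new in mapping.items()}
    return root, merged, {mapping[s] for s in accepting}


def accepts(root, merged, accepting, sequence):
    """Check whether the merged automaton accepts an event sequence."""
    state = root
    for ev in sequence:
        if ev not in merged[state]:
            return False
        state = merged[state][ev]
    return state in accepting


# Made-up event sequences sharing common suffixes.
known = [["a", "b", "c"], ["a", "d", "c"], ["x", "b", "c"]]
tree, tree_accepting = build_prefix_tree(known)
root, merged, merged_accepting = merge_equivalent_states(tree, tree_accepting)
```

The recursion depth equals the longest sequence, which is fine for a sketch but would need an iterative traversal for very long event sequences.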
Industry Member Benefits
• The abnormal behavior prediction approach can be applied to many net-centric applications that are event-based and workflow-oriented
• Call control signal processing
• Resource and database access control
• System health monitoring for real-time embedded systems, including avionic systems, space-based systems, etc.
• Application-dependent workflows, e.g. monitoring the behavior of drivers on roads
• Need real data and related knowledge from industry for analysis, model construction, effectiveness analysis
Deliverables and Budget
• First year, $30K: Develop the basic multivariate automata mining and abnormal sequence detection techniques
• First quarter: Work with industrial partner to understand the data and develop a pre-processor to extract event patterns
• Second and third quarters: Develop the automata merging and automata clustering techniques
• Fourth quarter: Apply the techniques to the dataset and validate the approaches
• Second year, $30K: Develop dynamic learning techniques
• Develop the feedback learning approach
• Develop tools to efficiently achieve self-learning
STATUS QUO
Many data clustering algorithms can be used for abnormal event detection, but they do not self-adapt, and data features have to be identified in advance.
NEW INSIGHTS
Key objectives:
• Dynamic learning
• Adaptive feature extraction
MAIN ACHIEVEMENT:
Applied data clustering algorithms to various data sets to study their effectiveness. The experiments show that Support Vector Machine yields the best results for 90% of the data sets.
Developed improved SVM algorithm to further improve data clustering outcomes.
Developed methods for clustering sparse data sets.
HOW IT WORKS:
Dynamic learning: Develop a feedback based self-improving mechanism to improve the clustering algorithm on-the-fly based on a small set of data, and verify the improvement off-line on a large volume of historical data.
Automated feature extraction: Build a workflow and event model to allow automatic extraction of data features, including event characteristics, inter-event effects, etc. Try to improve the precision of abnormality prediction by improving the extracted features.
ASSUMPTIONS AND LIMITATIONS:
• Availability of data
QUANTITATIVE IMPACT
With dynamic learning: more accurate abnormal event prediction results and fewer false alarms.
Automated feature extraction: obtain features that can optimize the prediction effectiveness.
Comparison of Prediction Accuracy
Methods \ Datasets   dataset A   dataset B
Item-Based           61.326      60.132
User-Based           61.271      60.321
LPWSI                68.3        67.0
LPKA                 71.5        70.0
KAWOK                72.5        70.4
BIC-aiNet            82 (80% training, 20% test)
BMKL                 84.03       83.53
BMKL with NSM        84.13       83.71
END-OF-PHASE GOAL
Develop an abnormal event detection algorithm that can dynamically adapt through learning and can automatically extract the best features for optimal prediction.
Apply the technique to Cisco signal flow data and to network intrusion detection.
Can be used for abnormal event sequence detection for many event-based applications.