CS685 Presentation
Data Mining for
Network Intrusion Detection
Paul Dokas, Levent Ertoz, Vipin Kumar, Aleksandar Lazarevic, Jaideep
Srivastava, Pang-Ning Tan
Computer Science Department
University of Minnesota
Presented By: [email protected]
Outline
• Motivation
• Related Work
• Detection Models and Approaches
• Experimental Evaluation
• Conclusion
Motivation
• Organizations are becoming increasingly vulnerable
to potential cyber threats, e.g., network intrusions.
[Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC), by year, 1990 to 2001; vertical axis from 0 to 60,000 cyber incidents]
Motivation (cont.)
• Intrusion Detection System (IDS)
  • collect signatures of known attacks
  • input attack signatures into IDS signature databases
  • extract features from various audit streams
  • compare these features with attack signatures
  • raise an alarm when a possible intrusion occurs
• Limitations of traditional signature-based methods
  • manual update of signature database
  • inability to detect emerging cyber threats
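The signature-matching loop above, and the limitation it implies, can be illustrated with a minimal sketch. The signature names, feature names, and thresholds below are hypothetical and chosen only for illustration; they do not come from any real IDS.

```python
# Hypothetical signature database: signature name -> rule over a feature record.
SIGNATURES = {
    "syn_flood": lambda f: f["syn_count"] > 100 and f["ack_count"] == 0,
    "port_scan": lambda f: f["distinct_ports"] > 50,
}

def match_signatures(features, signatures=SIGNATURES):
    """Return the names of all signatures that the feature record matches."""
    return [name for name, rule in signatures.items() if rule(features)]

# A record that fits a known signature raises an alarm ...
known_attack = {"syn_count": 500, "ack_count": 0, "distinct_ports": 1}
# ... while a record that fits no stored signature passes silently.
novel_attack = {"syn_count": 3, "ack_count": 3, "distinct_ports": 2}

alarms_known = match_signatures(known_attack)
alarms_novel = match_signatures(novel_attack)
```

Here `alarms_novel` is empty: anything not yet entered in the signature database goes undetected, which is the second limitation listed above.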
Motivation (cont.)
Why data mining?
• large volumes of network data
• different data mining techniques
  clustering, classification
Related Work
Data mining based intrusion detection techniques
• anomaly detection
  • Build models of normal data
  • Detect any deviation from normal data
  • Flag deviation as suspect
  • Identify new types of intrusions as deviation from normal behavior
• misuse detection
  • Label all instances in the data set (“normal” or “intrusion”)
  • Run learning algorithms over the labeled data to generate classification rules
  • Automatically retrain intrusion detection models on different input data
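As a rough illustration of the anomaly detection steps listed above (model normal data, measure deviation, flag it as suspect), here is a minimal sketch that uses a single numeric feature and a z-score threshold. The feature, the baseline values, and the threshold of 3 standard deviations are assumptions made for the example, not the models used in the presented work.

```python
from statistics import mean, stdev

def fit_normal_model(values):
    """Model of normal data: the mean and standard deviation of one feature."""
    return mean(values), stdev(values)

def is_suspect(x, model, threshold=3.0):
    """Flag x when it deviates from the normal model by more than `threshold` sigmas."""
    mu, sigma = model
    return abs(x - mu) / sigma > threshold

# Hypothetical baseline: packets per connection observed during normal operation.
baseline = [98, 102, 100, 101, 99, 100]
model = fit_normal_model(baseline)

typical = is_suspect(101, model)   # close to the baseline
unusual = is_suspect(150, model)   # far from anything seen as normal
```

No labels are needed to build the model, which is why this style of detection can flag intrusions of a type never seen before. The sketch assumes the baseline has nonzero variance.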
Related Work --- misuse detection
•Classification Model
Bayesian classifier
Decision tree
Association rule
Support vector machine
Learning from rare class
Related Work --- anomaly detection
•Anomaly Detection Model
Association rule
Neural network
Unsupervised SVM
Outlier detection
Detection Models
• misuse detection
  rare class prediction model
  → known intrusions and their variations
• anomaly detection
  outlier detection model
  → novel attacks whose nature is unknown
Learning from Rare Class
• Problem: how to build a classification model for a data set with a
skewed class distribution?
  intrusion class << normal class
  → mining for a needle in a haystack
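To see why a skewed class distribution is a problem, consider a small worked example. The counts are made up for illustration: 10 intrusions among 10,000 connections, and a trivial classifier that labels everything as normal.

```python
def accuracy_and_recall(y_true, y_pred, positive="intrusion"):
    """Return (accuracy, recall on the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    return correct / len(y_true), true_pos / actual_pos

# Hypothetical data set: the intrusion class is far smaller than the normal class.
y_true = ["intrusion"] * 10 + ["normal"] * 9990
# A classifier that never predicts "intrusion".
y_pred = ["normal"] * 10000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

The classifier reaches 99.9% accuracy while detecting none of the intrusions (recall 0), so plain accuracy says little about how well the rare class is learned.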
Learning from Rare Class (cont.)
• Novel classification algorithms
  • PN-rule
    • P-rule: covers most of the intrusive examples
    • N-rule: eliminates false alarms
  • SMOTEBoost
    • SMOTE (Synthetic Minority Over-sampling TEchnique)
    • Boosting
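The over-sampling half of SMOTEBoost can be sketched as follows. SMOTE creates synthetic minority-class examples by interpolating between a minority example and one of its nearest minority-class neighbours. This is a simplified version under stated assumptions: base points are drawn at random rather than visited in turn, the function name and parameters are illustrative, and the boosting step is not shown.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic points by interpolating between minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x, excluding x itself.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # Pick a random point on the line segment between x and the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority (intrusion) examples in a 2-D feature space.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6)
```

Because each synthetic point lies between two real minority examples, the new points stay inside the region the minority class already occupies instead of merely duplicating existing records.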
Anomaly Detection
• Novel attacks/intrusions
  → deviation from normal behavior
• Outlier detection algorithm
  Nearest neighbor approach
  Distance based approach
  Density based approach
  Unsupervised support vector machines
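A minimal sketch of the nearest neighbor idea: score each point by its distance to its k-th nearest neighbour, so that points far from everything else receive high scores. The data, the choice of Euclidean distance, and k = 1 are assumptions for illustration; the exact variant used in the experiments is not given on the slides.

```python
import math

def kth_nn_distance(point, data, k=1):
    """Outlier score: Euclidean distance from `point` to its k-th nearest neighbour."""
    dists = sorted(math.dist(point, q) for q in data if q is not point)
    return dists[k - 1]

# Hypothetical 2-D feature vectors: a tight group of normal records and one stray point.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
stray = (3.0, 3.0)
data = normal + [stray]

scores = {p: kth_nn_distance(p, data) for p in data}
top_outlier = max(scores, key=scores.get)
```

Ranking records by this score and inspecting the top of the list is one simple way to surface candidates for novel attacks.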
Anomaly Detection
• Density based approach (LOF)
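LOF (Local Outlier Factor) compares the local density around a point with the local density around its neighbours; a value near 1 means the point is about as dense as its surroundings, while a much larger value marks an outlier. The sketch below is a simplified pure-Python version under stated assumptions: it takes exactly k neighbours per point (ties are not handled specially) and assumes there are no duplicate points.

```python
import math

def lof_scores(points, k=2):
    """Return the Local Outlier Factor of every point (simplified, no tie handling)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point.
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean neighbour density divided by the point's own density.
    return [sum(lrds[j] for j in neigh[i]) / (k * lrds[i]) for i in range(n)]

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
lof = lof_scores(points, k=2)
```

Unlike a plain distance score, LOF is relative to the neighbourhood, so a point is not penalised simply for belonging to a sparse but otherwise regular region.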
Anomaly Detection
• Identify normal behavior
• Construct a useful set of features
• Define a similarity function
• Flag deviation as suspect
Experimental Evaluation
• Public data sets
  • DARPA 1998 Intrusion Detection Evaluation Data Set
    prepared and managed by MIT Lincoln Lab
    training data and test data
  • KDD Cup 1999 Data
    an extension of DARPA’98
    training data and test data
• Real network data
  • Network data from University of Minnesota
Experimental Evaluation --- feature construction
Purpose:
  derive a more informative data set from the public data set
Method:
• connection records
• label connection records as ‘normal’ or ‘intrusion’
• features for each connection record
  # of {packets, bytes}, {ACK, Re-Tx} packets, SYN/FIN, …
  time-based features (DoS attacks)
  connection-based features (PROBING attacks)
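A time-based feature of the kind listed above can be sketched as a sliding-window count: for each connection, how many earlier connections went to the same destination within the last few seconds. The record layout, the addresses, and the 2-second window are assumptions for illustration. A connection-based variant would look at the last N connections instead of the last T seconds.

```python
def count_same_dst_in_window(connections, window=2.0):
    """For each record, count earlier connections to the same destination
    that started within `window` seconds before it.

    connections: list of (timestamp, src_ip, dst_ip), sorted by timestamp.
    """
    counts = []
    for i, (t, _, dst) in enumerate(connections):
        n = sum(1 for (t2, _, d2) in connections[:i]
                if d2 == dst and t - t2 <= window)
        counts.append(n)
    return counts

# Hypothetical connection records: a short burst towards one host, then a pause.
connections = [
    (0.0, "10.0.0.1", "10.0.0.9"),
    (0.5, "10.0.0.2", "10.0.0.9"),
    (1.0, "10.0.0.3", "10.0.0.9"),
    (1.2, "10.0.0.1", "10.0.0.8"),
    (5.0, "10.0.0.4", "10.0.0.9"),
]
same_dst_counts = count_same_dst_in_window(connections)
```

A burst of connections to one host drives this count up quickly, which is why time-based features are associated with DoS attacks on the slide.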
Experimental Evaluation --- single connection attacks
[Figure: ROC curves for single connection attacks, comparing outlier detection techniques (LOF approach, NN approach, Mahalanobis approach, Unsupervised SVM); x-axis: False Alarm Rate (0 to 0.1), y-axis: Detection Rate (0 to 1)]
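For reference, each point on a ROC curve pairs a false alarm rate with a detection rate at one score threshold. The sketch below shows how such points could be computed from outlier scores and ground-truth labels; the scores and labels are made up and are not the experimental results.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs, one per threshold.

    scores: outlier scores, higher means more anomalous.
    labels: True for intrusion, False for normal.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        flagged = [s >= thr for s in scores]
        tp = sum(f and l for f, l in zip(flagged, labels))      # intrusions caught
        fp = sum(f and not l for f, l in zip(flagged, labels))  # normal records flagged
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores for five connections, two of which are intrusions.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [True, False, True, False, False]
curve = roc_points(scores, labels)
```

Lowering the threshold moves along the curve from the lower left to the upper right: more intrusions are detected, at the cost of more false alarms.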
Experimental Evaluation --- bursty attacks
[Figure: ROC curves for bursty attacks, comparing outlier detection techniques (Unsupervised SVM, LOF approach, Mahalanobis approach, NN approach); x-axis: False Alarm Rate (0 to 0.12), y-axis: Detection Rate (0 to 1)]
Experimental Evaluation --- real network data
• Why?
  Limitations of the DARPA’98 data set
• How?
  Detect network intrusions in live network traffic
• Result?
  Successfully identified some novel intrusions
  (top-ranked outliers)
Conclusion
• promising intrusion detection models
• performance of algorithm (on-line detection)
• new classification and anomaly detection algorithms
Thanks!
Questions?