CS685 Presentation
Data Mining for Network Intrusion Detection
Paul Dokas, Levent Ertoz, Vipin Kumar, Aleksandar Lazarevic, Jaideep Srivastava, Pang-Ning Tan
Computer Science Department, University of Minnesota
Presented By: [email protected]

Outline
• Motivation
• Related Work
• Detection Models and Approaches
• Experimental Evaluation
• Conclusion

Motivation
• Organizations are becoming increasingly vulnerable to potential cyber threats, e.g., network intrusions.
[Figure: Incidents Reported to Computer Emergency Response Team/Coordination Center (CERT/CC), number of cyber incidents per year, 1990 to 2001]

Motivation (cont.)
• How a signature-based Intrusion Detection System (IDS) works:
  • collect signatures of known attacks
  • enter the attack signatures into the IDS signature database
  • extract features from various audit streams
  • compare these features with the attack signatures
  • raise an alarm when a possible intrusion occurs
• Limitations of traditional signature-based methods:
  • the signature database must be updated manually
  • emerging cyber threats cannot be detected

Motivation (cont.)
Why data mining?
• large volumes of network data
• a range of data mining techniques, e.g., clustering and classification

Related Work
Data mining based intrusion detection techniques:
• Anomaly detection
  • build models of normal data
  • detect any deviation from normal data
  • flag deviations as suspect
  • identify new types of intrusions as deviations from normal behavior
• Misuse detection
  • label all instances in the data set ("normal" or "intrusion")
  • run learning algorithms over the labeled data to generate classification rules
• Automatically retrain intrusion detection models on different input data

Related Work: misuse detection
Classification models:
• Bayesian classifier
• Decision tree
• Association rules
• Support vector machine
• Learning from rare class

Related Work: anomaly detection
Anomaly detection models:
• Association rules
• Neural networks
• Unsupervised SVM
• Outlier detection

Detection Models
• Misuse detection: rare class prediction model, targeting known intrusions and their variations
• Anomaly detection: outlier detection model, targeting novel attacks whose nature is unknown

Learning from Rare Class
• Problem: building a classification model for a data set with a skewed class distribution (intrusion class << normal class)
• Mining for a needle in a haystack

Learning from Rare Class (cont.)
• Novel classification algorithms
  • PN-rule
    • P-rule: covers most of the intrusive examples
    • N-rule: eliminates false alarms
  • SMOTEBoost
    • SMOTE (Synthetic Minority Over-sampling TEchnique)
    • Boosting

Anomaly Detection
• Novel attacks/intrusions appear as deviations from normal behavior
• Outlier detection algorithms:
  • Nearest neighbor approach
  • Distance based approach
  • Density based approach
  • Unsupervised support vector machines

Anomaly Detection
• Density based approach: LOF (Local Outlier Factor)

Anomaly Detection
• Identify normal behavior
• Construct a useful set of features
• Define a similarity function
• Flag deviations as suspect

Experimental Evaluation
• Public data sets
  • DARPA 1998 Intrusion Detection Evaluation Data Set: prepared and managed by MIT Lincoln Lab; training data and test data
  • KDD Cup 1999 Data: an extension of the DARPA'98 data; training data and test data
• Real network data
  • Network data from the University of Minnesota

Experimental Evaluation: feature construction
Purpose: build a more informative data set from the public data set
Method:
• build connection records
• label connection records 'normal' or 'intrusion'
• compute features for each connection record:
  • # of {packets, bytes}, {ACK, Re-Tx} packets, SYN/FIN, …
  • time-based features (DoS attacks)
  • connection-based features (PROBING attacks)

Experimental Evaluation: single connection attacks
[Figure: ROC curves for single connection attacks, comparing the LOF, NN, Mahalanobis, and unsupervised SVM outlier detection techniques; detection rate (0 to 1) vs. false alarm rate (0 to 0.1)]

Experimental Evaluation: bursty attacks
[Figure: ROC curves for bursty attacks, comparing the unsupervised SVM, LOF, Mahalanobis, and NN outlier detection techniques; detection rate (0 to 1) vs. false alarm rate (0 to 0.12)]
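For contrast with the outlier detectors evaluated above, the signature-based IDS workflow from the Motivation slides can be sketched in a few lines. This is a minimal sketch, not how any real IDS encodes signatures: the exact-match `Signature` format, the feature names (`service`, `flag`, `num_failed_logins`, borrowed loosely from KDD Cup 1999 style connection records), and the signature names are all hypothetical. It shows both the workflow and its main limitation: a novel attack with no stored signature raises no alarm.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    # Hypothetical signature format: every listed feature must equal the given value.
    name: str
    conditions: dict


def matching_signatures(features: dict, database: list) -> list:
    """Compare extracted connection features with every stored signature and
    return the names of those that match; a non-empty result raises an alarm."""
    return [
        sig.name
        for sig in database
        if all(features.get(key) == value for key, value in sig.conditions.items())
    ]


# Hypothetical signature database, entered manually from known attacks.
signature_db = [
    Signature("hypothetical-syn-flood", {"service": "http", "flag": "S0"}),
    Signature("hypothetical-password-guessing", {"service": "telnet", "num_failed_logins": 5}),
]

# Hypothetical feature vectors extracted from two connections.
known_attack = {"service": "http", "flag": "S0", "src_bytes": 0}
novel_attack = {"service": "ftp", "flag": "SF", "src_bytes": 123456}

for conn in (known_attack, novel_attack):
    alerts = matching_signatures(conn, signature_db)
    print("ALARM" if alerts else "no alarm", alerts)
```

The first connection matches a stored signature; the second, although it may be an attack, matches nothing, which is exactly the gap that anomaly detection is meant to cover.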
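The "needle in a haystack" problem from the Learning from Rare Class slides can be made concrete with a small worked example. The numbers below are a hypothetical illustration, not data from the study: with 990 normal connections and 10 intrusions, a trivial model that labels everything "normal" scores 99% accuracy while detecting no intrusions, which is why detection rate and false alarm rate (the ROC axes used in the evaluation) matter more than accuracy here.

```python
def rates(y_true, y_pred, positive="intrusion"):
    """Return accuracy, detection rate (recall on the positive class)
    and false alarm rate (false positive rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical skewed data set: 990 normal connections and 10 intrusions.
y_true = ["normal"] * 990 + ["intrusion"] * 10
# A trivial "classifier" that labels every connection as normal.
y_pred = ["normal"] * len(y_true)

acc, dr, far = rates(y_true, y_pred)
print(f"accuracy={acc:.2f} detection_rate={dr:.2f} false_alarm_rate={far:.2f}")
# prints: accuracy=0.99 detection_rate=0.00 false_alarm_rate=0.00
```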
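The SMOTE component of SMOTEBoost, listed under Learning from Rare Class (cont.), over-samples the rare class by interpolating between minority examples instead of duplicating them. The sketch below assumes NumPy and follows the usual SMOTE recipe: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. It covers SMOTE only; the boosting loop that SMOTEBoost wraps around it is not shown, and the function name, `k=5` default, and seed are illustrative choices.

```python
import numpy as np


def smote(X_min: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_synthetic minority-class samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                     # random minority sample
        nb = X_min[rng.choice(neighbours[j])]   # one of its k neighbours
        gap = rng.random()                      # uniform in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic


# Hypothetical minority (intrusion) examples with two numeric features.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_synthetic=8, k=2)
print(synthetic.round(2))
```

Because every synthetic point lies on a segment between two real minority examples, the new samples stay inside the region the rare class already occupies, which broadens the decision region for that class rather than overfitting to exact copies.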
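The density based approach (LOF) on the Anomaly Detection slides scores each point by comparing its local density with that of its neighbours: a score near 1 means "as dense as its neighbourhood", while a large score marks an outlier. The sketch below assumes NumPy and uses the standard definitions (k-distance, reachability distance, local reachability density); as a simplification it uses exactly k neighbours and ignores ties at the k-distance, and it assumes no duplicate points (duplicates would make a reachability distance zero). The function name and data are hypothetical; scikit-learn also ships a `LocalOutlierFactor` estimator.

```python
import numpy as np


def lof(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Local Outlier Factor for every row of X (exactly k neighbours; ties ignored)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # exclude each point from its own neighbourhood
    knn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbours
    k_dist = d[np.arange(n), knn[:, -1]]         # k-distance: distance to the k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[knn], d[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density
    return lrd[knn].mean(axis=1) / lrd           # neighbours' density relative to own density


# Hypothetical 2-D feature vectors: a tight cluster of normal connections plus one far point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5]], dtype=float)
scores = lof(X, k=3)
print(scores.round(2))
print("most outlying point:", int(np.argmax(scores)))
```

Ranking connections by this score and inspecting the top-ranked ones mirrors how outliers would be flagged as suspect.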
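The ROC curves in the evaluation plot detection rate against false alarm rate as the alarm threshold on an outlier score is lowered. A minimal sketch of how such points could be computed from scores and ground-truth labels follows; the scores and labels are hypothetical, and tied scores are not grouped (each connection is processed one at a time), which is a simplification.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs, sweeping the alarm
    threshold from the highest outlier score downward.
    labels: 1 = intrusion, 0 = normal (both classes must be present)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]            # threshold above every score: no alarms
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1                  # a true intrusion is now flagged
        else:
            fp += 1                  # a normal connection is now a false alarm
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical outlier scores (e.g., LOF values) for five connections.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0, 0]
for far, dr in roc_points(scores, labels):
    print(f"false alarm rate={far:.2f}  detection rate={dr:.2f}")
```

Since the slides zoom in on false alarm rates of at most about 0.1, the interesting part of such a curve is its first few points, where only the top-ranked outliers have been flagged.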
Experimental Evaluation: real network data
• Why? The DARPA'98 data set has limitations
• How? Detect network intrusions in live network traffic
• Result? Successfully identified some novel intrusions (top-ranked outliers)

Conclusion
• Promising intrusion detection models
• Performance of the algorithms (on-line detection)
• New classification and anomaly detection algorithms

Thanks! Questions?