Network Intrusion Detection Using Random Forests
Jiong Zhang and Mohammad Zulkernine
School of Computing, Queen's University, Kingston, Ontario, Canada
PST2005

Outline
• Motivation
• Intrusion detection system
• Data mining meets intrusion detection
• Proposed architecture
• Challenges and solutions
• Experimental results
• Conclusion and future work

Motivation
• An intrusion prevention system (e.g., a firewall) cannot prevent all attacks.
[Diagram: intruders, a firewall, the Internet, and a victim host]

Motivation (contd.)
Statistical data on intrusions
• Reported total losses in 2004: $141,496,560 (source: FBI survey for 2004)
• 50% of security breaches go undetected (source: FBI statistics for 2000)

Intrusion Detection Techniques
Misuse detection
• Extracts patterns of known intrusions
• Cannot detect novel intrusions
• Has a low false positive rate
Anomaly detection
• Builds profiles of normal activities
• Detects attacks as deviations from those profiles
• Can detect unknown attacks
• Has a high false positive rate

Network Intrusion Detection System (NIDS)
• Monitors network traffic to detect intrusions
• Monitors more targets on a network
• Detects some attacks that host-based systems miss
• Does not affect network operations

Current NIDS
Many current NIDSs (such as Snort):
• Are rule-based
• Cannot detect novel attacks
• Have a high maintenance cost

Rule Based vs. Data Mining
• Rule-based systems: intrusion data → security experts → rules
• Data-mining-based systems: labeled data → data mining engine → patterns

Data Mining Meets Intrusion Detection
• Extract patterns of intrusions for misuse detection
• Build profiles of normal activities for anomaly detection
• Build classifiers to detect attacks
• Several IDSs have successfully applied data mining techniques to intrusion detection

Proposed Architecture
[Diagram: architecture of the proposed NIDS]
• Off-line: training data set → off-line preprocessor → feature vectors → pattern builder → patterns
• On-line: network packets → sensors → audited data → on-line preprocessors → feature vectors → detector (uses the patterns) → alarms → alarmer
• Databases: one on-line, one off-line

Random Forests
• Unsurpassed in accuracy among current data mining algorithms
• Runs efficiently on large data sets with many features
• Estimates which features are important
• Handles nominal (categorical) features without special treatment
• Does not overfit as more trees are added

Imbalanced Intrusion
Problems
• Higher error rate for minority intrusions
• Some minority intrusions are more dangerous
• Need to improve detection performance for minority intrusions
Proposed solution
• Down-sample the majority intrusions and over-sample the minority intrusions

Feature Selection
• Essential for improving the detection rate
• Reduces the computational cost
• Many NIDSs select features by intuition or domain knowledge

Feature Selection over the KDD’99 Dataset
• Calculate variable importance using random forests
• Select the 38 most important features for detection
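The sampling proposed on the Imbalanced Intrusion slide (down-sample the majority intrusions, over-sample the minority intrusions) can be sketched as below. This is a minimal sketch assuming simple random sampling: down-sampling draws without replacement, and over-sampling appends random duplicates of minority records. The function resample, its targets mapping, and the class names in the example are hypothetical; the slides do not state the exact sampling scheme or the target sizes.

```python
import random
from collections import defaultdict


def resample(records, labels, targets, seed=0):
    """Rebalance a labeled data set class by class.

    targets (hypothetical parameter) maps a class label to the number of
    examples wanted for it. Classes with more examples than their target
    are down-sampled without replacement; classes with fewer are
    over-sampled by appending random duplicates. Classes missing from
    targets are kept at their original size.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)

    out_records, out_labels = [], []
    for label, items in by_class.items():
        wanted = targets.get(label, len(items))
        if wanted <= len(items):
            # Majority class: keep a random subset (down-sampling).
            chosen = rng.sample(items, wanted)
        else:
            # Minority class: keep every original example and add random
            # duplicates until the target size is reached (over-sampling).
            chosen = items + rng.choices(items, k=wanted - len(items))
        out_records.extend(chosen)
        out_labels.extend([label] * wanted)
    return out_records, out_labels


if __name__ == "__main__":
    from collections import Counter

    # Toy data (hypothetical): 1000 "normal" connections and 10 "u2r" attacks.
    records = [f"normal-{i}" for i in range(1000)] + [f"u2r-{i}" for i in range(10)]
    labels = ["normal"] * 1000 + ["u2r"] * 10
    _, new_labels = resample(records, labels, {"normal": 200, "u2r": 50})
    print(Counter(new_labels))
```

Keeping every original minority record before adding duplicates guarantees that no rare attack example is lost, while the seeded random generator makes a run reproducible.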
[Figure: variable importance of the 41 KDD’99 features, computed by random forests. Features in decreasing order of importance: 3, 23, 10, 35, 33, 17, 8, 6, 32, 14, 24, 5, 36, 40, 13, 12, 4, 16, 34, 22, 1, 2, 29, 31, 38, 37, 30, 18, 19, 41, 27, 9, 26, 11, 28, 25, 39, 15, 7, 20, 21]

Some Features
The two most important features
• Feature 3. service: network service type, such as http, telnet, or ftp
• Feature 23. count: number of connections to the same host as the current connection during the past two seconds
The three least important features
• Feature 7. land: 1 if the connection is from/to the same host/port; 0 otherwise
• Feature 20. num_outbound_cmds: number of outbound commands in an ftp session
• Feature 21. is_hot_login: 1 if the login belongs to the “hot” list; 0 otherwise

Parameter Optimization for Random Forests
• Optimize the random forest parameter Mtry (the number of features randomly sampled as split candidates at each node) to improve the detection rate
• Choose Mtry = 15, the value at which the out-of-bag (OOB) error rate reaches its minimum
[Figure: OOB error rate and running time versus Mtry, for Mtry from 5 to 38]

Performance Comparison on the KDD’99 Dataset
• Our approach achieves a lower overall error rate and a lower cost of misclassification than the best KDD’99 result
• Feature selection can improve the performance of intrusion detection
[Figure: overall error rate and cost of misclassification for the best KDD’99 result, the experiment without feature selection, and the experiment with feature selection]

Conclusion and Future Work
• The random forests algorithm can help improve detection performance and select features
• Sampling techniques can reduce the time needed to build patterns and increase the detection rate of minority intrusions
• In future work, we will focus on anomaly detection and a multiple-classifier architecture
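The Performance Comparison slide reports two metrics: overall error rate and cost of misclassification. A minimal sketch of both follows, assuming the cost is the average penalty per test connection taken from a class-by-class cost matrix, in the style of KDD Cup '99 scoring. The function name error_rate_and_cost and the example matrices are hypothetical, and the actual KDD’99 cost matrix is not reproduced here.

```python
def error_rate_and_cost(confusion, cost):
    """Return (overall error rate, average cost of misclassification).

    confusion[i][j] counts test connections whose true class is i and
    whose predicted class is j; cost[i][j] is the (assumed) penalty for
    that outcome. The average cost is the total penalty divided by the
    number of test connections.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Off-diagonal cells are misclassifications.
    errors = sum(confusion[i][j] for i in range(n) for j in range(n) if i != j)
    penalty = sum(confusion[i][j] * cost[i][j] for i in range(n) for j in range(n))
    return errors / total, penalty / total


if __name__ == "__main__":
    # Hypothetical two-class example (normal, attack); not data from the slides.
    confusion = [[90, 10],
                 [5, 95]]
    cost = [[0, 1],
            [2, 0]]
    rate, avg_cost = error_rate_and_cost(confusion, cost)
    print(f"error rate = {rate:.3f}, average cost = {avg_cost:.3f}")
```

With these toy matrices the script prints an error rate of 0.075 and an average cost of 0.100. Because the cost matrix can weight some mistakes more heavily than others, two classifiers with the same error rate can differ in cost, which is why the slide compares both.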