A Proposal of New Benchmark Data to Evaluate Mining Algorithms for Intrusion Detection

Jungsuk SONG†, Hiroki TAKAKURA‡, Yasuo OKABE‡
†Graduate School of Informatics, Kyoto Univ.
‡Academic Center for Computing and Media Studies, Kyoto Univ.
[email protected], [email protected], [email protected]
2007/1/25, 23rd Asia Pacific Advanced Network Meeting

Overview
  - Introduction
    - Intrusion Detection System
    - Intrusion Detection Evaluation Data
  - KDD Cup 99 Data Set
    - Details
    - Problems
  - Our Experimental Result
  - Our Proposal

Introduction
  Intrusion Detection System (IDS)
    - Combination of software and hardware that attempts to perform intrusion detection
    - Raises the alarm when a possible intrusion or suspicious patterns are observed
  [Figure: network diagram with the labels Attacker, The Internet, Firewall, IDS, Internal Network]

Introduction
  Why we need IDS?
    - Unknown weakness or bugs
    - Complex, unforeseen attacks
    - Firewalls, security policies
  Using information detected
    - Recover compromised system
    - Understand the attack mechanism
    - Detect novel attacks
    - Defend our systems

Introduction
  We need evaluation data for IDS
    - Performance improvement
    - Technical progress
    - Research guide…
  KDD Cup 99 Data Set
    - Most commonly used evaluation data, but..
  Propose new benchmark data

KDD Cup 99 Data Set
  Modification of DARPA 1998 data set
  DARPA 1998 data set
    - Managed by Lincoln Lab. (under DARPA sponsorship)
    - Simulated nine weeks of raw TCP dump data
    - Attacks
      - 38 different attacks against Unix/Linux machines
      - DoS, Scan, Buffer overflow and so on.
    - Normal traffic
      - 1000’s of virtual hosts and 100’s of user automata

KDD Cup 99 Data Set
  Each connection ⇒ 41-dimensional vector
  Samples
    5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,0.01,0.00,0.00,0.00,0.00,normal.
    0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,118,118,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
    0,tcp,http_443,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,114,2,1.00,1.00,0.00,0.00,0.02,0.06,0.00,255,2,0.01,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune.
  Numerical: 34, Categorical: 7
    - Basic feature: “duration”, “protocol”…
    - Statistical feature: “number of connections to the same host as the current connection in the past two seconds”…
  Label ⇒ “normal” or the name of the attack

KDD Cup 99 Data Set
  Problems
    - Attacks
      - Cannot reflect current malicious activities
      - Stealthy scan ⇒ short time interval, no multiple IP address scan
      - No attacks against Windows machines
    - Protocol types
      - Only TCP, UDP, ICMP
      - Cannot detect attacks such as ARP Spoofing
    - Simplicity
      - Only 3 real victim hosts
      - 1000’s of virtual hosts and 100’s of user automata (custom software)

Our Experimental Results
  PCA (Principal Component Analysis)
    - Technique for reducing the dimensions of a data set
    - Transforms the data to a new coordinate system
  What we know from PCA
    - The number of dimensions that are actually required to represent the original data
    - Accumulative Contribution Ratio
      - Indicates what percentage of the original data can be represented
      - For example, 2 dimensions ⇒ 90%: those 2 dimensions represent 90% of the original data

Our Experimental Results
  There is no guarantee that their performance will also be good in a real environment

Our Proposal
  New benchmark data
    - IDS
    - KDD Cup 99 form
    - Honeypots
    - Privacy problems
      - Sanitize IP address
      - Remove payload data
    - Open
    - Update every month
  Goal
    - Comparison analysis of IDS alert and Honeypots traffic data
    - Detect the attacks that are missed by IDS

Thank you for your attention!
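
Supplementary note on the KDD Cup 99 record layout. The slides describe each connection as a 41-dimensional vector (34 numerical and 7 categorical features) followed by a label. The sketch below shows one possible way to split such a record. The function name `parse_record` and the positions of the 7 categorical fields are assumptions: the positions follow the publicly distributed `kddcup.names` listing, not anything stated in the slides.

```python
# Zero-based positions of the 7 categorical (symbolic) features.
# Assumption: taken from the public kddcup.names listing
# (protocol_type, service, flag, land, logged_in, is_host_login, is_guest_login).
CATEGORICAL = {1, 2, 3, 6, 11, 20, 21}
N_FEATURES = 41


def parse_record(line):
    """Split one comma-separated connection record into (features, label)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(fields)}")
    *raw, label = fields
    # Keep categorical values as strings, convert everything else to float.
    features = [v if i in CATEGORICAL else float(v) for i, v in enumerate(raw)]
    # Labels in the data set end with a trailing period, e.g. "normal."
    return features, label.rstrip(".")


# First sample record from the slides.
SAMPLE = (
    "5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,"
    "0.70,0.02,0.01,0.01,0.00,0.00,0.00,0.00,normal."
)

features, label = parse_record(SAMPLE)
n_categorical = sum(isinstance(v, str) for v in features)
n_numerical = len(features) - n_categorical
```

For the sample record this gives 41 features, of which 7 stay categorical and 34 become floats, matching the counts on the slide. A mining algorithm would normally still need to encode the categorical values numerically; that step is left out here.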
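
Supplementary note on the accumulative contribution ratio. The PCA slide uses this ratio to decide how many dimensions are needed to represent the original data. As commonly defined, the ratio for the first k components is the sum of the k largest eigenvalues of the covariance matrix divided by the sum of all eigenvalues. The sketch below assumes the eigenvalues have already been computed by some linear algebra routine; the function names and the example eigenvalues are hypothetical and do not come from the authors' experiments.

```python
def cumulative_contribution(eigenvalues):
    """Return the accumulative contribution ratio after each principal component."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    ratios, running = [], 0.0
    for value in ordered:
        running += value
        ratios.append(running / total)
    return ratios


def dimensions_needed(eigenvalues, target=0.90):
    """Smallest number of components whose cumulative ratio reaches the target."""
    for k, ratio in enumerate(cumulative_contribution(eigenvalues), start=1):
        if ratio >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues, chosen so that two components cover 90%.
EIGENVALUES = [6.0, 3.0, 0.5, 0.5]
ratios = cumulative_contribution(EIGENVALUES)
needed = dimensions_needed(EIGENVALUES, target=0.90)
```

With these illustrative values the first two components reach a ratio of 0.9, so `needed` is 2, which mirrors the "2 dimensions ⇒ 90%" example on the slide.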