Real-time Intrusion Detection and Classification

Phurivit Sangkatsanee¹, Naruemon Wattanapongsakorn¹,* and Chalermpol Charnsripinyo²
¹ Computer Engineering Department, King Mongkut’s University of Technology Thonburi, 126 Pracha-Utid, Tung-Kru, Bangkok 10140 Thailand
² Network Technology Laboratory, National Electronics and Computer Technology Center, Klong Luang, Pathumthani, 10120 Thailand
*Corresponding author: [email protected]

Abstract
As computer network activity grows, network attacks by hackers, crackers, and criminal enterprises have been growing as well, threatening the availability, confidentiality, and integrity of critical information. In this paper, we propose a Real-Time Intrusion Detection System (RT-IDS) that uses a decision tree to classify online network data preprocessed into only 13 features. The number of features affects both the detection speed of the RT-IDS and its resource consumption. In addition, our RT-IDS classifies network activity as normal or as one of the two main attack types, Probe and Denial of Service (DoS), which helps decrease the time needed to diagnose and defend against each network attack. The results show that our RT-IDS achieves a detection rate higher than 98% while consuming less than 25% of the CPU and 94.5 MB of memory under a full traffic load of 100 Mbps.

Key Words: Intrusion Detection System, Decision tree, Network security system, Denial of Service, Probe, KDD99 dataset

1. Introduction
Nowadays, many organizations and companies use Internet services for communication and as a marketplace for doing business, as eBay and Amazon.com do. As computer network activity grows, the rate of network attacks has been growing too, impacting the availability, confidentiality, and integrity of critical information.
A network system therefore needs one or more security tools, such as a firewall, antivirus software, an IDS, or a honeypot, to protect important data from criminal enterprises. A firewall alone is not enough to protect a network from all attack types, because it cannot defend against intrusion attempts made through the ports it leaves open. A Real-Time Intrusion Detection System (RT-IDS), shown in Figure 1, is therefore a useful protection tool: it inspects network activity for hazardous behavior and sends an alarm to the computer user or network administrator when it detects hostile activity in an open session.

Figure 1. Intrusion detection system environment

Previous research has proposed intrusion detection systems based on various classification algorithms, such as Adaptive Resonance Theory (ART), Self-Organizing Maps (SOM), Back-Propagation (Back-Prop) neural networks, statistical probability distributions, BLINd classification, and Bayesian classification [1-8]. Most of these used the KDD99 dataset to evaluate IDS performance. KDD99, however, is about 10 years old and consists of off-line data with 41 features. Only a few researchers have proposed real-time intrusion detection systems, each using a different technique. The first uses a Self-Organizing Map to classify normal traffic and DoS attacks from 10 features computed over every 50 packets, and is evaluated by visualizing the differing characteristics of normal and DoS traffic [6]. The second uses a Bayesian classification model to separate normal traffic from attacks, training on the first 3 months of data and testing on the last month, and is evaluated by a detection penalty [7]. The third uses ART and SOM with about 5000 packets for training and 3000 packets for testing, sampled during a 4-day experiment; its classification inputs are 27 features, each a frequency of occurrence within an interval.
The detection rates (attack and normal) of the ART and SOM systems are about 97% and 95%, respectively [8].

In this paper, we propose a Real-Time Intrusion Detection System (RT-IDS) based on a decision tree that considers only 13 features of network traffic data, which keeps detection fast and CPU and memory consumption low. Moreover, we classify normal activity and the main attack types, Probe and DoS. This reduces the time computer users need to analyze network data and protect the network from criminal enterprises. The decision tree algorithm has also shown high performance in classifying unknown attacks, that is, attacks absent from the training data [9].

The rest of this paper is organized as follows. Section 2 presents our research methodology with the decision tree. Section 3 describes the real-time IDS process. Section 4 explains parameter settings and evaluation. Section 5 shows the experimental results, including the detection rate and CPU and memory consumption. Finally, Section 6 concludes the paper.

2. Research Methodology
The decision tree model is a well-known classification algorithm. It consists of non-terminal nodes (a root and internal nodes) and terminal nodes (leaves), which together classify data efficiently [10]. The root node holds the first attribute test, whose conditions send each record toward an internal node according to the record's characteristics. Before the decision tree can classify new or untrained data, it must first be trained on known (labeled) data by a learning algorithm. After training, the tree predicts the class of a new record by starting at the root and following the attribute test conditions through the internal nodes until it reaches a leaf, which holds the answer class, as shown in Figure 2 [11].

Figure 2. Decision tree structure (root, internal nodes, and leaves holding the answer class)

C4.5 is an efficient and popular decision tree learning algorithm. It was proposed by J.R. Quinlan, whose research in this area spans more than 15 years [13].

3. Real-Time IDS Process
As shown in Figure 3, our Real-Time IDS consists of two main parts: the preprocess part and the classifying part.

Figure 3. Real-time IDS process (preprocess part: online Ethernet data, packet sniffer, extraction of IP, TCP, UDP, and ICMP features, collection every 2 seconds, separation of the data into records by connection between 2 IP addresses, records of 13 features; classifying part: dataset, C4.5 decision tree training and testing, detection result written to a log file)

When the RT-IDS receives online network packets, they first enter the preprocess part, where the packet headers and other detailed data are examined. The detailed packet data are converted into numeric values, and the essential features that represent the network activity are extracted. The preprocessed data, with these key features extracted, then enter the classifying part, where the IDS classifies them as normal network activity or as one of the main attack types. The Real-Time IDS is implemented on a 2.83 GHz Intel Core 2 Quad Q9550 processor with 4 GB of RAM and a 100 Mbps LAN.

3.1 Preprocess Part
In the preprocess part, a packet sniffer built with the Jpcap library captures packets in promiscuous mode and stores their header information, including the IP, TCP, UDP, and ICMP headers. The packet information is then divided by connection between two IP addresses (source IP and destination IP), and all records are collected every 2 seconds, as shown in the preprocess part of Figure 3. Each record consists of 13 data features and an answer class, as listed in Table 1. Examples of the data records obtained from the preprocess part are shown below.
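The windowing and counting described in Section 3.1 can be sketched in Python. This is not the authors' Jpcap/Java implementation: `Packet`, `_PairCounter`, `build_records`, `WINDOW_SECONDS`, and `TCP_FLAGS` are hypothetical names introduced here. The sketch also assumes two readings that the text leaves open: a "connection between two IP addresses" is treated as an unordered pair, so traffic in both directions lands in one record, and the port features (2, 3, 11, 12 in Table 1) count distinct port numbers. It produces only the 13 features; the answer class would be attached separately when labeling training data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

WINDOW_SECONDS = 2.0  # records are collected every 2 seconds

# Order of the six TCP flag features in Table 1 (features 4-9).
TCP_FLAGS = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")


@dataclass(frozen=True)
class Packet:
    """Hypothetical summary of one sniffed packet (not a Jpcap type)."""
    timestamp: float                 # capture time in seconds
    src_ip: str
    dst_ip: str
    protocol: str                    # "TCP", "UDP" or "ICMP"
    src_port: Optional[int] = None   # None for ICMP
    dst_port: Optional[int] = None
    tcp_flags: FrozenSet[str] = frozenset()


class _PairCounter:
    """Accumulates the 13 Table 1 features for one IP-address pair."""

    def __init__(self) -> None:
        self.tcp_packets = 0
        self.tcp_src_ports: Set[Optional[int]] = set()
        self.tcp_dst_ports: Set[Optional[int]] = set()
        self.flag_counts = {flag: 0 for flag in TCP_FLAGS}
        self.udp_packets = 0
        self.udp_src_ports: Set[Optional[int]] = set()
        self.udp_dst_ports: Set[Optional[int]] = set()
        self.icmp_packets = 0

    def add(self, p: Packet) -> None:
        if p.protocol == "TCP":
            self.tcp_packets += 1
            self.tcp_src_ports.add(p.src_port)
            self.tcp_dst_ports.add(p.dst_port)
            for flag in p.tcp_flags:
                if flag in self.flag_counts:
                    self.flag_counts[flag] += 1
        elif p.protocol == "UDP":
            self.udp_packets += 1
            self.udp_src_ports.add(p.src_port)
            self.udp_dst_ports.add(p.dst_port)
        elif p.protocol == "ICMP":
            self.icmp_packets += 1

    def features(self) -> List[int]:
        # Same order as Table 1, features 1-13.
        return ([self.tcp_packets, len(self.tcp_src_ports), len(self.tcp_dst_ports)]
                + [self.flag_counts[flag] for flag in TCP_FLAGS]
                + [self.udp_packets, len(self.udp_src_ports),
                   len(self.udp_dst_ports), self.icmp_packets])


def build_records(packets: Iterable[Packet],
                  window_start: float) -> Dict[Tuple[str, ...], List[int]]:
    """One 13-feature record per IP pair seen in [window_start, window_start + 2 s)."""
    counters: Dict[Tuple[str, ...], _PairCounter] = defaultdict(_PairCounter)
    window_end = window_start + WINDOW_SECONDS
    for p in packets:
        if window_start <= p.timestamp < window_end:
            pair = tuple(sorted((p.src_ip, p.dst_ip)))  # unordered pair (assumption)
            counters[pair].add(p)
    return {pair: c.features() for pair, c in counters.items()}


if __name__ == "__main__":
    demo = [
        Packet(0.1, "10.0.0.5", "10.0.0.9", "TCP", 40000, 22, frozenset({"SYN"})),
        Packet(0.2, "10.0.0.5", "10.0.0.9", "TCP", 40000, 23, frozenset({"SYN"})),
        Packet(0.3, "10.0.0.9", "10.0.0.5", "TCP", 23, 40000, frozenset({"RST", "ACK"})),
        Packet(0.4, "10.0.0.7", "10.0.0.9", "ICMP"),
        Packet(2.5, "10.0.0.5", "10.0.0.9", "UDP", 5000, 53),  # falls in the next window
    ]
    for pair, record in build_records(demo, window_start=0.0).items():
        print(pair, record)
    # ('10.0.0.5', '10.0.0.9') [3, 2, 3, 0, 2, 1, 0, 1, 0, 0, 0, 0, 0]
    # ('10.0.0.7', '10.0.0.9') [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Using sets for the ports is what turns features 2, 3, 11, and 12 into distinct-port counts; if those features were meant as plain occurrence counts, the sets would simply become integers.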
2138, 33, 33, 4, 4, 0, 644, 2136, 0, 0, 0, 0, 0, Normal
12, 2, 2, 0, 0, 0, 1, 12, 0, 0, 0, 0, 0, Normal
230, 2, 120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Probe
6, 3, 1, 2, 2, 0, 2, 2, 2, 153, 2, 78, 0, Probe
0, 0, 0, 0, 0, 0, 0, 0, 0, 48810, 1, 1, 48810, DoS
145, 145, 1, 0, 145, 0, 0, 0, 0, 0, 0, 0, 0, DoS

Table 1. Thirteen features in preprocessed data

No.  Feature description                  Data type
1    Number of TCP packets                Integer
2    Number of TCP source ports           Integer
3    Number of TCP destination ports      Integer
4    Number of TCP FIN flags              Integer
5    Number of TCP SYN flags              Integer
6    Number of TCP reset (RST) flags      Integer
7    Number of TCP push (PSH) flags       Integer
8    Number of TCP ACK flags              Integer
9    Number of TCP urgent (URG) flags     Integer
10   Number of UDP packets                Integer
11   Number of UDP source ports           Integer
12   Number of UDP destination ports      Integer
13   Number of ICMP packets               Integer
14   Answer class                         String (Normal, DoS, Probe)

3.2 Classifying Part
The classifying part consists of two main processes, training and testing, implemented with the Java library of the WEKA tool [13]. We train the C4.5 decision tree model on records from the preprocess part whose answer class is known. We then test the trained model on new, untrained records captured by the real-time system, as shown in the classifying part of Figure 3. Our experimental network data consist of 4 DoS attack types, 13 Probe attack types, and normal activity, as shown in Table 2.

Table 2. Attack types and normal activity

Data                Tool                  Category
Smurf               Smurf.c               DoS
UDP Flood           Net Tools 5           DoS
HTTP Flood          Net Tools 5           DoS
Jping               Jping.c               DoS
Port Scan           Net Tools 5           Probe
Advance Port Scan   Net Tools 5           Probe
Host Scan           Host Scan 1.6         Probe
Connect             NMapWin 1.3.1         Probe
SYN Stealth         NMapWin 1.3.1         Probe
FIN Stealth         NMapWin 1.3.1         Probe
UDP Scan            NMapWin 1.3.1         Probe
Null Scan           NMapWin 1.3.1         Probe
Xmas Tree           NMapWin 1.3.1         Probe
IP Scan             NMapWin 1.3.1         Probe
ACK Scan            NMapWin 1.3.1         Probe
Window Scan         NMapWin 1.3.1         Probe
RPC Scan            NMapWin 1.3.1         Probe
Normal              Actual environment    Normal

4. Parameter Settings & Evaluation
Our RT-IDS runs on a 2.83 GHz Intel Core 2 Quad Q9550 processor with 4 GB of RAM, connected at up to 100 Mbps, and captures network traffic in the Computer Engineering Department of King Mongkut’s University of Technology Thonburi (KMUTT). We simultaneously generate attacks from many computers, as shown in Figure 4, covering the 4 DoS and 13 Probe attack types listed in Table 2, against a victim computer that hosts our RT-IDS. Normal network activity is generated by running many Internet services over the Ethernet at full load. The detection performance of our RT-IDS is quantified by the following values.
• The Total Detection Rate (TDR) is the percentage of all records (DoS attacks, Probe attacks, and Normal network data) that the RT-IDS classifies correctly.
• The Normal Detection Rate (NDR) is the percentage of Normal records that the RT-IDS classifies correctly.
• The DoS Detection Rate (DDR) is the percentage of DoS attacks that the RT-IDS detects correctly.
• The Probe Detection Rate (PDR) is the percentage of Probe attacks that the RT-IDS detects correctly.
The CPU and memory consumption of the running RT-IDS is captured with the Process Explorer tool. With a full load (100 Mbps), our RT-IDS uses less than 25% of the CPU and about 94.5 MB of memory. In addition, the detection time, measured with the OS clock, is about 2 seconds.
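The four rates defined in Section 4 can be computed from a confusion matrix of record counts, in which each row is an actual class and each column a predicted class. The Python sketch below is illustrative only: the function names `detection_rates` and `_class_rate` and the nested-dictionary layout are assumptions introduced here, and the counts in the example are made up rather than taken from the experiments.

```python
from typing import Dict, Mapping

CLASSES = ("DoS", "Probe", "Normal")

# confusion[actual][predicted] = number of records
Confusion = Mapping[str, Mapping[str, int]]


def _class_rate(confusion: Confusion, cls: str) -> float:
    """Percentage of records of class `cls` that were predicted as `cls`."""
    row = confusion[cls]
    total = sum(row.values())
    return 100.0 * row.get(cls, 0) / total if total else 0.0


def detection_rates(confusion: Confusion) -> Dict[str, float]:
    """TDR, NDR, DDR, and PDR, all in percent."""
    correct = sum(confusion[c].get(c, 0) for c in CLASSES)
    total = sum(sum(confusion[c].values()) for c in CLASSES)
    return {
        "TDR": 100.0 * correct / total if total else 0.0,  # pooled over all classes
        "NDR": _class_rate(confusion, "Normal"),
        "DDR": _class_rate(confusion, "DoS"),
        "PDR": _class_rate(confusion, "Probe"),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's data).
    example = {
        "DoS":    {"DoS": 98, "Probe": 2, "Normal": 0},
        "Probe":  {"DoS": 0, "Probe": 45, "Normal": 5},
        "Normal": {"DoS": 1, "Probe": 1, "Normal": 148},
    }
    print(detection_rates(example))
```

For these made-up counts, TDR is 97.0, DDR 98.0, and PDR 90.0. Because TDR pools all records, it is weighted toward the most frequent class; in the test set of Section 5, Normal records make up about 73% of the total (75,113 of 102,959).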
Figure 4. Network environment (DoS intruder, Probe intruder, Internet, RT-IDS & victim)

5. Experimental Results
In our experiment, the training data have 55,000 records: 10,000 DoS records, 30,000 Probe records, and 15,000 Normal records. After 0.03 seconds of training on these data, the C4.5 decision tree model has 197 nodes in total, of which 99 are terminal nodes (leaves) and the remaining 98 are non-terminal nodes (a root and internal nodes). The RT-IDS with the trained decision tree is then used to test online network data over a 24-hour interval, capturing about 109 million connections. Preprocessing the test data yields a total of 102,959 test records, consisting of 19,454 DoS records, 8,392 Probe records, and 75,113 Normal records, which are used to test the classifying part of the RT-IDS. The results are shown in Tables 3 and 4.

Table 3. Experimental results (rows: actual class; columns: predicted class)

Actual \ Predicted   DoS (%)   Probe (%)   Normal (%)
DoS                  99.17     0.76        0.07
Probe                0.07      98.73       1.2
Normal               0.08      0.47        99.43

Table 4. Summary of results for our RT-IDS (decision tree)

TDR (%)   NDR (%)   DDR (%)   PDR (%)
99.33     99.43     99.17     98.73

While the RT-IDS is running, the Process Explorer tool captures its CPU and memory consumption, as shown in Figures 5 and 6.

Figure 5. CPU consumption of the RT-IDS (annotations: Create C4.5 Decision Tree model; Full load (100 Mbps))

Figure 6. Memory consumption of the RT-IDS (annotation: Using 94.5 MB of memory)

6. Conclusion
This paper proposes a new real-time intrusion detection system (RT-IDS) that uses a decision tree approach with efficient data preprocessing into only 13 features. We evaluate the RT-IDS performance, including detection rate and CPU and memory consumption, in a real-time environment. In the experimental results, our RT-IDS achieves a total detection rate (TDR) and a normal detection rate (NDR) higher than 99%, and the false alarm rate is very low: only 0.55% of Normal records are classified as attacks (Table 3).
When capturing network traffic at full load (100 Mbps), the RT-IDS uses less than 25% of the CPU and only 94.5 MB of memory, which is very low compared with the memory capacity of a present-day PC. Importantly, our RT-IDS classifies network data within about 2 seconds, which is fast enough to warn or alert the computer user or administrator in time to protect the network system. The decision tree classification algorithm is therefore a suitable approach for real-time intrusion detection.

7. References
[1] M. Sabhnani and G. Serpen, “Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context”, International Conference on Machine Learning: Models, Technologies and Applications (MLMTA), 2003, pp. 209-215.
[2] M. Gil-Jong, K. Yong-Min, K. DongKook, and N. BongNam, “Network Intrusion Detection Using Statistical Probability Distribution”, International Conference: ICCSA (2), 2006, pp. 340-348.
[3] V. Katos, “Network Intrusion Detection: Evaluating Cluster, Discriminant, and Logit Analysis”, Information Sciences, vol. 177, 2007, pp. 3060-3073.
[4] N. Ngamwitthayanon, N. Wattanapongsakorn, C. Charnsripinyo, and D.W. Coit, “Multi-Stage Network-Based Intrusion Detection System Using Back Propagation Neural Networks”, Asian International Workshop on Advanced Reliability Modeling (AIWARM), 2008.
[5] S. Pukkawanna, V. Visoottiviseth, and P. Pongpaibool, “Lightweight Detection of DoS Attacks”, 15th IEEE International Conference on Networks, 2007, pp. 77-82.
[6] K. Labib and R. Vemuri, “NSOM: A Real-Time Network-Based Intrusion Detection System Using Self-Organizing Maps”, Networks and Security, 2002.
[7] R.S. Puttini, Z. Marrakchi, and L. Mé, “A Bayesian Classification Model for Real-Time Intrusion Detection”, AIP Conference, 2003, pp. 150-162.
[8] M. Amini, A. Jalili, and H. Reza Shahriari, “RT-UNNID: A Practical Solution to Real-Time Network-Based Intrusion Detection Using Unsupervised Neural Networks”, Computers & Security, vol. 25, 2005, pp. 459-468.
[9] P. Sangkatsanee, N. Wattanapongsakorn, and C. Charnsripinyo, “Network Intrusion Detection and Classification with Decision Tree and Rule Based Approaches”, IEEE International Conference: ISCIT, 2009.
[10] Z.-S. Pan, S.-C. Chen, G.-B. Hu, and D.-Q. Zhang, “Hybrid Neural Network and C4.5 for Misuse Detection”, International Conference on Machine Learning and Cybernetics, vol. 4, 2003, pp. 2463-2467.
[11] P.N. Tan, M. Steinbach, and V. Kumar, “Introduction to Data Mining”, Pearson Addison Wesley, 2005, pp. 150-168.
[12] W. Cohen, “Fast Effective Rule Induction”, International Conference on Machine Learning, 1995.
[13] Weka 3.7.0 tools [Online], Available: www.cs.waikato.ac.nz/ml/weka/ [2009, July 2]