Hybrid Intrusion Detection System Using Principal Component Analysis

Khin Khattar Myint
PhD Student of UCSY, Myanmar, [email protected]

Abstract-- Intrusion detection is an essential component of today's computer networks. There are two different approaches to intrusion detection: misuse detection and anomaly detection. Misuse detection can detect known attacks but is unable to detect new attacks when they arise. Anomaly detection can detect unknown attacks, but its false positive rate may be high. To detect intrusions in a computer network efficiently, both misuse and anomaly detection may be used. Moreover, collected data contains irrelevant and redundant features. The key point is to build an intrusion detection system with a high detection rate and a low false alarm rate. We therefore propose a hybrid intrusion detection system that detects both known and unknown attacks. In this paper, a decision tree is used for misuse detection, classification based on association is used for anomaly detection, and principal component analysis is used for feature selection. The KDD Cup 99 dataset is used to train and test our model.

Index Terms-- decision tree C4.5 algorithm, misuse and anomaly detection, KDD dataset.

I. INTRODUCTION

With the increasing use of networked computers for critical systems, network security is becoming more and more challenging. Intrusion detection systems (IDS) have been widely deployed as a second line of defense for computer network systems, along with other network security techniques such as firewalls and access control. The main goal of an intrusion detection system is to detect unauthorized use, misuse and abuse of computer systems by both system insiders and external intruders.

Intrusion detection systems can be classified into two categories based on the technique used to detect intrusion: anomaly detection and misuse detection. Signature detection, or misuse detection, searches for well-known patterns of attacks and intrusions by scanning for pre-classified signatures in network packets. Anomaly detection can detect new intrusions, while misuse detection may not. The idea behind anomaly detection is that if we can establish a normal activity profile for a system, then in theory we can flag all system states that vary from the established profile as intrusion attempts. Anomalous activities that are not intrusive but are flagged as intrusive are referred to as false positives. The effectiveness of an IDS is measured using the attack detection rate and the false alarm rate.

This paper consists of the following sections: Section II gives a brief description of intrusion detection systems, Section III discusses related work, Section IV describes the proposed system, and Section V concludes the paper.

II. Intrusion Detection System

Intrusion may be defined as an unauthorized attempt to gain access to a secured system or network. Intrusion detection is the course of action taken to detect suspicious activity on a network or a device. An intrusion detection system (IDS) is an important countermeasure used to preserve data integrity and system availability against attacks. An IDS inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system.

A. Intrusion Detection Process

Intrusion detection processes [3] are categorized into misuse detection and anomaly detection. Misuse detection is sometimes called signature-based detection, and anomaly detection is sometimes called behavior-based detection.

Misuse detection compares user activities with known intruder activities on the web. The idea of misuse detection is to represent attacks in the form of a pattern or a signature so that the same attack can be detected and prevented in the future. The IDS searches for defined signatures and, if a match is found, generates an alarm indicating the presence of an intrusion. Since it works on the basis of predefined signatures, it is unable to detect new or previously unknown intrusions.

Anomaly detection identifies intrusions as deviations from normal usage behavior patterns. It estimates the deviation of a user activity from normal behavior and, if the deviation goes beyond a preset threshold, considers that activity an intrusion. Anomaly detection is able to detect new intrusions, but the need to set such a limiting threshold results in a high false positive rate.

B. Intrusion Detection Approaches

Intrusion detection systems can be classified into Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS) according to the data analyzed and stored. A host-based IDS analyzes host-bound audit sources such as operating system audit trails, system logs and application logs. It is a software application installed on a system in order to protect it from intruders. The audit data to be analyzed is collected from the host in the network. A network-based IDS analyzes network packets that are captured on a network. In a NIDS, detection software is installed in a network in order to detect intrusions. The NIDS collects data directly from the network in the form of packets, which are then analyzed to detect intrusions [3].

C. Attacks Detected by IDS

Attack types [1] fall into one of the following categories:

A. Denial of Service Attack (DOS)
An attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine.

B. User to Root Attack (U2R)
A class of exploit in which the attacker starts out with access to a normal user account on the system and is able to exploit some vulnerability to gain root access to the system.

C. Remote to Local Attack (R2L)
Occurs when an attacker who has the ability to send packets to a machine over a network, but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.

D. Probing or Surveillance Attack
Any attempt to gather information about a network of computers for the apparent purpose of circumventing its security controls.

D. Intrusion Detection Dataset

The KDD Cup 1999 intrusion detection contest data [3] is used in our experiments. This data was prepared for the DARPA intrusion detection evaluation program by MIT Lincoln Laboratory. Lincoln Labs acquired nine weeks of raw TCP dump data, which was processed into about 5 million connection records. The data set contains 24 attack types. These attacks fall into the four main categories described above. Besides the four types of attack, the normal class also needs to be detected.

The dataset for our experiments contained 1000 connection records, which is a subset of the 10% KDD Cup'99 intrusion detection benchmark dataset. These records were randomly sampled from the MIT dataset. The random sample includes a number of records from each class proportional to the size of that class, except that the smallest class is included completely. All the intrusion detection models are trained and tested with the same dataset.

III. Related Works

There are many research papers on hybrid intrusion detection with different data mining algorithms. Kandeeban and Rajesh [7] used a combination of a genetic algorithm and neural networks for intrusion detection on the KDD 99 dataset. They achieved a detection rate close to 95% when the false alarm rate is 1.9% to 2%, and a detection rate of 70% when the false alarm rate is brought down to 1%.

Depren et al. [5] proposed an IDS architecture utilizing a Self-Organizing Map (SOM) structure for anomaly detection and C4.5 for misuse detection. A rule-based Decision Support System (DSS) was also developed for interpreting the results of both the anomaly and the misuse detection modules. They obtained a detection rate of 98.96% for the anomaly detection module and 99.61% for the misuse detection module on the KDD 99 data set.

Another model for IDS, proposed by Pan et al. [8], used a neural network and C4.5 for attack detection. Their model achieved an average detection rate of 93.28% and a false positive rate of 0.2% on the KDD Cup 99 dataset.

Nguyen and Choi [2] compared different algorithms on the basis of their percentage accuracy in detecting individual attack types and on the basis of overall accuracy (AA). The algorithms' classification times (TT, in seconds) were also compared to assess their suitability for real-time use. The compared algorithms included BayesNet (AA 90.62, TT 6.28), NaiveBayes (AA 78.32, TT 5.57), C4.5 (AA 92.06, TT 15.85), NBTree (AA 92.28, TT 295.88), Decision Table (AA 91.66, TT 66.24) and a few others. Even though the overall accuracy of single classifiers, mainly C4.5, was quite good, none of them was able to detect all four attack categories efficiently.

Goel et al. [6] proposed a novel hybrid model for misuse and anomaly detection. C4.5-based binary decision trees are used for misuse detection and a CBA-based classifier is used for anomaly detection. Their results show that a misuse detection rate of 99.995% with an anomaly detection rate of 99.298% is achievable.

As discussed above, many researchers have conducted extensive performance comparisons of various popular classification algorithms. Among them, decision tree based algorithms such as C4.5 give better performance than the other classifiers.

IV. Proposed System

Figure 1 shows the framework of our proposed hybrid intrusion detection system. First, both the training and the testing data are preprocessed to remove irrelevant attributes. In our proposed system, feature selection is done using Principal Component Analysis (PCA). A decision tree based classifier is used to separate attack patterns from normal patterns. Attack patterns are sent to the misuse detector and normal patterns are sent to the anomaly detector.

Figure 1. Proposed system framework

A. Principal Component Analysis

PCA is a common statistical method used in multivariate optimization problems to reduce the dimensionality of data while retaining a large fraction of the data's characteristics. First, PCA is used to project the training set onto eigenspace vectors representing the mean of the data. These eigenspace vectors are then used to predict malicious connections in a workload containing normal and attack behavior [4]. PCA reduces the number of dimensions required to classify new data and produces a set of principal components, which are orthonormal eigenvector/eigenvalue pairs. In other words, PCA finds a set of axes which best suit the data. This set of axes represents the normal connection data. Outlier detection is performed by mapping live network data onto these normal axes and calculating the distance from the axes. If the distance is greater than a certain threshold, the connection is classified as an attack.

The principal components are derived from the covariance matrix. When some values are much larger than others, their corresponding eigenvalues have larger weights. The larger the eigenvalue, the more significant its corresponding projected eigenvector. Therefore, the principal components are sorted from most to least significant, i.e. in descending order. If a new data item is projected along the upper set of the most significant principal components, it is likely that the data item can be classified without projecting along all the principal components.

The eigenvectors of the principal components represent the axes which best suit a data sample. Points which lie far from the axes exhibit abnormal behavior. Outliers, measured using the Euclidean distance, are the network connections that are anomalous. Using a threshold value (t), any network connection with a distance greater than the threshold is considered an outlier.

B. Misuse Detector

The misuse detector is a hierarchical sequential model using a decision tree. The differential approach separates out one attack at a time. This technique defines the unique features of one attack and at the same time brings out the general characteristics of the remaining attacks that differentiate them from that attack. The distribution of different attacks in the training dataset is usually uneven, so some attacks have far fewer instances than others. The sequence in which attacks are separated at the different levels is therefore chosen so that the less frequent records are combined together, which makes the classifier less biased. In the proposed architecture, C4.5 is used as the decision tree algorithm [6].

C. Anomaly Detector

Classification based on association (CBA) is the application of association rules to classification problems. It generates class association rules (CARs), which are association rules with the target class on the right-hand side of the rule. A CAR is an implication of the form:

X → y, where X ⊆ I and y ∈ Y.

X is the set of features. I is the set of all features. y is the target class. Y is the set of all classes. CBA also provides strength measurements for the CARs:

Support: The rule holds with support sup if sup% of cases contain X and are labeled with class y.

Confidence: The rule holds with confidence conf if conf% of cases that contain X also contain y.

The algorithm used for rule generation in CBA is similar to the Apriori algorithm and uses a generate-and-test approach. First, size-k patterns are generated. These are called candidate patterns. Then, using the Apriori approach, candidate patterns satisfying the minimum support are selected as frequent patterns. From the size-k frequent patterns, size k+1 candidate patterns are generated and tested against the minimum support value. This process continues until a maximum limit on rule size is reached or no frequent patterns exist. Finally, rules are generated from the frequent patterns using the confidence value.

After rule generation, CBA uses a heuristic method to order the rules in decreasing precedence based on their confidence and support values. If a set of rules has the same antecedent, the rule with the highest confidence is selected to represent the set. If the confidence of the rules that apply is the same, the rule with the highest support is picked. If the support is also equal, CBA classifies the case according to the rule that was generated earlier than the others. In this way an ordered list of rules is created. When a new tuple is given for classification, the class associated with the first rule that the tuple satisfies is used for labeling. The classifier also contains a default rule, which has the lowest precedence. If a tuple does not satisfy any rule, it is assigned the label of the default class [6].

V. Conclusion

In this paper we have proposed a new model for attack detection that uses a decision tree for misuse detection and a CBA rule based classifier for anomaly detection. Moreover, a relevant feature selection model using principal component analysis was proposed to select the best feature set for designing a lightweight intrusion detection system. Our proposed system is currently only a framework; its implementation is left as future work.

VI. REFERENCES

[1] Ayman I. Madbouly, Amr M. Gody, Tamer M. Barakat, Relevant Feature Selection Model Using Data Mining for Intrusion Detection System, International Journal of Engineering Trends and Technology, Vol. 9, No. 10, March 2014.
[2] H. A. Nguyen and D. Choi, Application of data mining to network intrusion detection: classifier selection model, APNOMS 2008, LNCS 5297, pp. 399-408, Springer-Verlag, 2008.
[3] Mradul Dhakar and Akhilesh Tiwari, Journal of Information and Computing Science, Vol. 9, No. 1, 2014.
[4] Nethu B, Adaptive Intrusion Detection Using Machine Learning, International Journal of Computer Science and Network Security, Vol. 13, No. 3, March 2013.
[5] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks, Expert Systems with Applications, Vol. 29, No. 4, pp. 713-722, 2005.
[6] Radhika Goel, Anjali Sardana, and Ramesh C. Joshi, Parallel Misuse and Anomaly Detection Model, International Journal of Network Security, Vol. 14, No. 4, pp. 211-222, July 2012.
[7] S. S. Kandeeban and R. S. Rajesh, Integrated Intrusion Detection System Using Soft Computing, International Journal of Network Security, Vol. 10, No. 2, pp. 87-92, Mar. 2010.
[8] Z. S. Pan, S. C. Chen, G. B. Hu, and D. Q. Zhang, Hybrid neural network and C4.5 for misuse detection, Proceedings of the Second International Conference on Machine Learning and Cybernetics, 2003.
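
The PCA-based outlier test described in Section IV-A can be sketched as follows. This is a minimal illustration and not the implementation of the proposed framework, which is left as future work. It assumes NumPy and uses small synthetic data in place of KDD Cup 99 connection records. The function names, the number of retained components k and the threshold t are hypothetical choices made for this example. The distance from the axes is taken here to be the Euclidean norm of the residual that remains after a centered record is projected onto the retained principal axes.

```python
import numpy as np


def fit_pca(normal_data, k):
    """Fit PCA on normal records; return the mean, the top-k axes and all eigenvalues."""
    mean = normal_data.mean(axis=0)
    centered = normal_data - mean
    # Principal components are derived from the covariance matrix.
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components from most to least significant (descending eigenvalue).
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order[:k]], eigvals[order]


def distance_from_axes(x, mean, axes):
    """Euclidean distance between a record and its projection onto the axes."""
    centered = x - mean
    projected = axes @ (axes.T @ centered)
    return float(np.linalg.norm(centered - projected))


def is_outlier(x, mean, axes, t):
    """Flag a record as anomalous when its distance exceeds the threshold t."""
    return distance_from_axes(x, mean, axes) > t


# Synthetic "normal" records (an assumption for the demo): points close to a
# single line in 3-D, so one principal axis captures most of the variance.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 0.5])
scale = rng.normal(size=200)
normal_data = np.outer(scale, direction) + 0.01 * rng.normal(size=(200, 3))

k = 1      # number of retained principal components (hypothetical)
t = 0.5    # distance threshold (hypothetical)
mean, axes, eigvals = fit_pca(normal_data, k)

normal_point = direction.copy()               # lies along the normal axis
attack_point = np.array([5.0, -5.0, 5.0])     # lies far from the normal axis

print("normal point flagged:", is_outlier(normal_point, mean, axes, t))
print("attack point flagged:", is_outlier(attack_point, mean, axes, t))
```

In a real deployment the threshold t would presumably be tuned on held-out normal traffic, since raising it lowers the false alarm rate at the cost of the detection rate.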