Proceedings of National Conference on Challenges & Opportunities in Information Technology (COIT-2007), RIMT-IET, Mandi Gobindgarh, March 23, 2007.

Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters

Satinder Singh (1), Guljeet Kaur (2)
(1) Computer Science Department, Lyallpur Khalsa College, Jalandhar; H.No. 5790, Near Jain Hospital, Akash Avenue, Jandiala Guru, Amritsar; [email protected]
(2) Computer Science Department, Khalsa College, Amritsar; H.No. 5790, Near Jain Hospital, Akash Avenue, Jandiala Guru, Amritsar; [email protected]

Abstract
Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to produce. Moreover, these methods have difficulty in detecting new types of attack. In this paper, we discuss anomaly-based intrusion detection, the pros and cons of anomaly detection, and supervised and unsupervised anomaly detection.

1. INTRODUCTION
Intrusion detection is the art of detecting inappropriate, incorrect, or anomalous activity. ID systems that operate on a host to detect malicious activity on that host are called host-based ID systems, and ID systems that operate on network data flows are called network-based ID systems.

2. ANOMALY BASED INTRUSION DETECTION
Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to produce. Moreover, these methods have difficulty in detecting new types of attack. Using unsupervised anomaly detection techniques, however, the system can be trained with unlabelled data and is capable of detecting previously "unseen" attacks.

With the increased usage of computer networks, security becomes a critical issue. A network intrusion by malicious or unauthorized users can cause severe disruption to networks. Therefore the development of a robust and reliable network intrusion detection system (IDS) is increasingly important. Traditionally, signature-based automatic detection methods have been widely used in intrusion detection systems. When an attack is discovered, the associated traffic pattern is recorded and coded as a signature by human experts, and then used to detect malicious traffic. However, signature-based methods suffer from their inability to detect new types of attack. Furthermore, the database of signatures grows as new types of attack are detected, which may affect the efficiency of detection.

Other methods have been proposed that use machine learning algorithms to train on labeled network data, i.e., with instances pre-classified as being an attack or not (Lee & Stolfo 1998). These methods can be classified into two categories: misuse detection and anomaly detection.

Misuse detection
In the misuse detection approach, the machine learning algorithm is trained over the set of labeled data and automatically builds detection models. Thus, the detection models are similar to the signatures described before. Nonetheless these detection methods share the weakness of the signature-based methods in that they are vulnerable to new types of attack.

Anomaly detection
In contrast, anomaly detection approaches build models of normal data and then attempt to detect deviations from the normal model in observed data. Consequently these algorithms can detect new types of intrusions because these new intrusions, by assumption, will deviate from normal network usage (Javitz & Valdes 1993, Denning 1987). Nevertheless these algorithms require a set of purely normal data from which to train their model. If the training data contains traces of intrusions, the algorithm may not detect future instances of these attacks, because it will assume that they are normal.

An IDS which is anomaly based monitors network traffic and compares it against an established baseline. The baseline identifies what is "normal" for that network: what sort of bandwidth is generally used, what protocols are used, and what ports and devices generally connect to each other. The IDS alerts the administrator or user when traffic is detected which is anomalous, that is, significantly different from the baseline.
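To make the baseline idea concrete, the following Python sketch estimates a per-feature statistical profile from purely normal traffic and flags a connection whenever a feature deviates too far from that profile. The feature names, the z-score rule and the threshold are assumptions made for this illustration rather than part of any particular IDS.

    # Minimal sketch of baseline-based anomaly detection (illustrative assumptions only).
    import math

    FEATURES = ["bytes_per_sec", "packets_per_sec", "distinct_ports"]  # hypothetical features

    def build_baseline(normal_records):
        """Estimate the mean and standard deviation of each feature from purely normal traffic."""
        baseline = {}
        n = len(normal_records)
        for f in FEATURES:
            values = [r[f] for r in normal_records]
            mean = sum(values) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
            baseline[f] = (mean, std if std > 0 else 1.0)
        return baseline

    def is_anomalous(record, baseline, threshold=3.0):
        """Flag a record whose largest per-feature z-score exceeds the threshold."""
        score = max(abs(record[f] - mean) / std for f, (mean, std) in baseline.items())
        return score > threshold

A deployed system would have to refresh such a profile periodically, since normal behaviour drifts over time.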
In most circumstances, labeled data or purely normal data is not readily available, since it is time consuming and expensive to classify it manually. Purely normal data is also very hard to obtain in practice, since it is very hard to guarantee that there are no intrusions while we are collecting network traffic. To address these problems, we use a new type of intrusion detection algorithm called unsupervised anomaly detection. It makes two assumptions about the data.

Assumption 1: The majority of the network connections are normal traffic. Only X% of the traffic is malicious (Portnoy, Eskin & Stolfo 2001).

Assumption 2: The attack traffic is statistically different from normal traffic (Javitz & Valdes 1993, Denning 1987).

The algorithm takes as input a set of unlabelled data and attempts to find intrusions buried within the data. After these intrusions are detected, we can train a misuse detection algorithm or a traditional anomaly detection algorithm using the data. If either of the assumptions fails, the performance of the algorithm will deteriorate. For example, it will have difficulty detecting a bandwidth denial-of-service attack, because under such attacks there are often so many instances of the intrusion that they occur in numbers similar to normal instances.
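Under these two assumptions, the detection step reduces to a simple rule once the unlabelled connections have been clustered: clusters that contain only a small fraction of the data are labelled as attack traffic. The Python sketch below shows this labelling rule; the clustering step itself is discussed in Section 5, and the 5% cutoff is an illustrative assumption, not a value prescribed by the cited work.

    # Minimal sketch: label small clusters as anomalous (illustrative assumptions only).
    def label_clusters(clusters, total_points, max_anomalous_fraction=0.05):
        """Label each cluster of connection records as 'normal' or 'anomalous'.
        clusters: a list of lists of records produced by some clustering algorithm."""
        labels = {}
        for i, cluster in enumerate(clusters):
            # Assumption 1: attacks are rare, so very small clusters are treated as attacks.
            # Assumption 2: attacks are statistically different, so they fall into their own clusters.
            fraction = len(cluster) / total_points
            labels[i] = "anomalous" if fraction < max_anomalous_fraction else "normal"
        return labels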
3. PROS AND CONS OF ANOMALY DETECTION
Pros: An anomaly-based IDS examines ongoing traffic, activity, transactions, or behavior for anomalies on networks or systems that may indicate attack. The underlying principle is the notion that "attack behavior" differs enough from "normal user behavior" that it can be detected by cataloging and identifying the differences involved. By creating baselines of normal behavior, anomaly-based IDS systems can observe when current behavior deviates statistically from the norm. This capability theoretically gives anomaly-based IDSs the ability to detect new attacks that are not yet known and for which no signatures have been created.

Cons: Because normal behavior can change easily and readily, anomaly-based IDS systems are prone to false positives, where attacks may be reported based on changes to the norm that are "normal" rather than representing real attacks. Their intensely analytical behavior can also impose sometimes-heavy processing overheads on the systems where they are running. Furthermore, anomaly-based systems take a while to create statistically significant baselines (to separate normal behavior from anomalies), and they are relatively open to attack during this period.

Today, many antivirus packages include both signature-based and anomaly-based detection characteristics, but only a few IDSs incorporate both approaches. Most experts expect anomaly-based detection to become more widespread in IDSs, but research and programming breakthroughs will be necessary to deliver the kind of capability that anomaly-based detection should be, but is currently not, able to deliver.

Finally, some IDSs are capable of responding to attacks when they occur. This behavior is desirable from two points of view. For one thing, a computer system can track behavior and activity in near-real time and respond much more quickly and decisively during the early stages of an attack. Since automation helps hackers mount attacks, it stands to reason that it should also help security professionals fend them off as they occur. For another thing, IDSs run 24/7, but network administrators may not be able to respond as quickly during off hours as they can during peak hours (even if the IDS can page them with an alarm that an attack has begun). By automating a response to block incoming traffic from one or more addresses from which an attack originates, the IDS can halt an attack in progress and block future attacks from the same address.

Unlike signature or misuse based intrusion detection techniques, anomaly detection is capable of detecting novel attacks. However, the use of anomaly detection in practice is hampered by a high rate of false alarms. Specification-based techniques have been shown to produce a low rate of false alarms, but are not as effective as anomaly detection in detecting novel attacks, especially when it comes to network probing and denial-of-service attacks. This paper presents a new approach that combines specification-based and anomaly-based intrusion detection, mitigating the weaknesses of the two approaches while magnifying their strengths. Our approach begins with state-machine specifications of network protocols, and augments these state machines with information about the statistics that need to be maintained to detect anomalies. We present a specification language in which all of this information can be captured in a succinct manner. We demonstrate the effectiveness of the approach on the 1999 Lincoln Labs intrusion detection evaluation data, where we are able to detect all of the probing and denial-of-service attacks with a low rate of false alarms (less than 10 per day). Whereas feature selection was a crucial step that required a great deal of expertise and insight in previous anomaly detection approaches, we show that the use of protocol specifications in our approach simplifies this problem. Moreover, the machine learning component of our approach is robust enough to operate without human supervision and fast enough that no sampling techniques need to be employed. As further evidence of effectiveness, we present results of applying our approach to detect stealthy email viruses in an intranet environment.
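As a rough illustration of how a protocol specification can be combined with statistics, the Python sketch below pairs a hand-written state machine for TCP connection establishment with per-transition counters: an event the specification does not allow is reported immediately, while a transition whose observed count departs strongly from a profile learned on normal traffic raises a statistical alarm. The states, events, rate profile and tolerance factor are assumptions made for this example; they do not reproduce the specification language referred to above.

    # Illustrative sketch only: a protocol state machine augmented with per-transition statistics.
    TCP_SETUP = {
        ("LISTEN", "SYN"): "SYN_RCVD",
        ("SYN_RCVD", "ACK"): "ESTABLISHED",
        ("SYN_RCVD", "RST"): "LISTEN",
    }

    class SpecMonitor:
        def __init__(self, spec, learned_rates, tolerance=5.0):
            self.spec = spec                    # allowed (state, event) -> next state
            self.learned_rates = learned_rates  # expected count per interval, from normal traffic
            self.tolerance = tolerance
            self.counts = {key: 0 for key in spec}

        def observe(self, state, event):
            """Advance the state machine; report an event the specification does not allow."""
            if (state, event) not in self.spec:
                return state, "specification violation"
            self.counts[(state, event)] += 1
            return self.spec[(state, event)], None

        def statistical_alarms(self):
            """Return transitions whose observed count deviates strongly from the learned rate."""
            return [key for key, expected in self.learned_rates.items()
                    if self.counts.get(key, 0) > self.tolerance * max(expected, 1.0)]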
4. UNSUPERVISED AND SUPERVISED ANOMALY DETECTION
Applying unsupervised anomaly detection to network intrusion detection is a new research area that has already drawn interest in the academic community. Eskin, et al. (Eskin et al. 2002) investigated the effectiveness of three algorithms in intrusion detection: the fixed-width clustering algorithm, an optimized version of the k-nearest neighbour algorithm, and the one-class support vector machine algorithm. Oldmeadow, et al. (Oldmeadow et al. 2004) carried out further research based on the clustering method in (Eskin et al. 2002) and showed improvements in accuracy when the clusters are adaptive to changing traffic patterns. A different approach using a quarter-sphere support vector machine is proposed in (Laskov, Schafer & Kotenko 2004), with moderate success. In (Eskin 2000), a mixture model for explaining the presence of anomalies is presented, and machine learning techniques are used to estimate the probability distributions of the mixture to detect anomalies. In (Zanero & Savaresi 2004), a novel two-tier IDS is proposed: the first tier uses unsupervised clustering to classify the packets and compresses the information within the payload, and the second tier uses an anomaly detection algorithm and the information from the first tier for intrusion detection. Lane and Brodley (Lane & Brodley 1997) evaluated unlabelled data by looking at user profiles and comparing the activity during an intrusion to the activity during normal use.

Supervised anomaly detection in network intrusion detection, which uses purely normal instances as training data, has been studied extensively in the academic community. A comprehensive survey of various techniques is given in (Lazarevic, Ertoz, Kumar, Ozgur & Srivastava 2003). An approach for modelling normal traffic using self-organising maps is presented in (Gonzalez & Dasgupta 2002), while another uses principal component classifiers to obtain the model (Shyu, Chen, Sarinnapakorn & Chang 2003). One approach uses graphs to model the normal data and detects irregularities in the graph as anomalies (Noble & Cook 2003). Another approach uses the normal data to generate abnormal data and uses it as input for a classification algorithm (Gonzalez & Dasgupta 2003).

5. CLUSTERING
Clustering is a well known and studied problem, and a large number of clustering algorithms exist in the literature. These methods can be categorised as partitioning methods, hierarchical methods, density-based methods and grid-based methods. We shall concentrate on the algorithms that are closely related to our investigation.

Partitioning Methods
Given a database of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster. One partitioning method that is of interest to our study is the fixed-width clustering algorithm. It is one of the algorithms used in the studies by Stolfo, et al. (Eskin et al. 2002) and Oldmeadow, et al. (Oldmeadow et al. 2004), against which we compare our results. The main advantage of the fixed-width algorithm is that it scales linearly with the number of objects in the data set and the number of attributes of the objects. Nevertheless the quality of the clusters is sensitive to the choice of the cluster width w, and the user often needs several runs of the algorithm to choose the best value of w for a particular application.
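For reference, a single-pass version of the fixed-width clustering algorithm is sketched below in Python: each point is assigned to the first cluster whose centre lies within the width w, and otherwise starts a new cluster centred on itself. The Euclidean distance and the unnormalised feature vectors are simplifying assumptions made for this sketch.

    # Minimal sketch of fixed-width clustering (single pass, illustrative only).
    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def fixed_width_clustering(points, w):
        """Assign each point to the first cluster whose centre is within width w,
        otherwise start a new cluster centred on that point."""
        centres, clusters = [], []
        for p in points:
            for i, c in enumerate(centres):
                if euclidean(p, c) <= w:
                    clusters[i].append(p)
                    break
            else:
                centres.append(p)      # the point becomes the centre of a new cluster
                clusters.append([p])
        return centres, clusters

Combined with the cluster-size labelling rule sketched earlier in Section 2, the small clusters produced by this procedure would be reported as intrusions.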
Density-based Methods
Density-based methods are based on a simple assumption: clusters are dense regions in the data space that are separated by regions of lower density. Their general idea is to keep growing a given cluster as long as the density in the neighbourhood exceeds some threshold. In other words, for each data point within a given cluster, the neighbourhood of a given radius has to contain at least a minimum number of points. These methods are good at filtering out outliers and discovering clusters of arbitrary shapes. Some examples of density-based methods are DBSCAN (Ester, Kriegel, Sander & Xu 1996) and OPTICS (Ankerst, Breunig, Kriegel & Sander 1999).

Grid-based Methods
Grid-based methods divide the object space into a finite number of cells that form a grid structure. All of the clustering operations are performed on the grid structure. The main advantage of this approach is its fast processing time, which typically depends mainly on the number of cells in each dimension of the quantised space. Some examples of grid-based methods are STING (Wang, Yang & Muntz 1997), WaveCluster (Sheikholeslami, Chatterjee & Zhang 1998), CLIQUE (Agrawal, Gehrke, Gunopulos & Raghavan 1998) and pMAFIA (Nagesh, Goil & Choudhary 2000).

CLIQUE (CLustering In QUEst)
Our work builds upon the CLIQUE and pMAFIA algorithms. CLIQUE (CLustering In QUEst) (Agrawal et al. 1998) is a hybrid clustering method that combines the ideas of both grid-based and density-based approaches. CLIQUE first partitions the n-dimensional data space into non-overlapping rectangular units. It attempts to discover the overall distribution patterns of the data set by identifying the sparse and dense units in the space. The identification of the candidate search space is based on the following monotonicity principle: if a k-dimensional unit is dense, then so are its projections in (k-1)-dimensional space.

pMAFIA
pMAFIA (Nagesh et al. 2000) is an optimized and improved version of CLIQUE. There are two main differences between them. First, pMAFIA uses an adaptive grid algorithm to reduce the total number of potential dense units by merging small one-dimensional partitions that have similar densities. Second, it parallelizes the generation and population of the candidate dense units across a computer cluster. However, both algorithms scale exponentially with the dimension of the highest-dimensional cluster in the data set.
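To make the grid and density notions concrete, the Python sketch below partitions each dimension into equal-width intervals, collects the dense one-dimensional units, and uses the monotonicity principle to generate candidate two-dimensional units (a unit can only be dense if both of its one-dimensional projections are). The interval count and the density threshold are illustrative assumptions; the full bottom-up search over higher dimensions and pMAFIA's adaptive grid are omitted.

    # Illustrative sketch of the grid/density idea behind CLIQUE (not the full algorithm).
    from collections import Counter
    from itertools import combinations

    def dense_units_1d(points, intervals=10, threshold=5):
        """Partition each dimension into equal-width intervals and return the dense
        one-dimensional units as a set of (dimension, interval_index) pairs."""
        dims = len(points[0])
        dense = set()
        for d in range(dims):
            values = [p[d] for p in points]
            lo, hi = min(values), max(values)
            width = (hi - lo) / intervals or 1.0
            counts = Counter(min(int((v - lo) / width), intervals - 1) for v in values)
            dense |= {(d, idx) for idx, c in counts.items() if c >= threshold}
        return dense

    def candidate_2d_units(dense_1d):
        """Monotonicity principle: a two-dimensional unit can only be dense if both of its
        one-dimensional projections are dense, so candidates are built from dense pairs."""
        return {(u, v) for u, v in combinations(sorted(dense_1d), 2) if u[0] != v[0]}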
6. MINING FREQUENT ITEMSETS
Mining frequent itemsets is an intermediate step in the anomaly detection algorithm. Two well known methods, Apriori (Agrawal & Srikant 1994) and FP-growth (Han, Pei & Yin 2000), are described below.

Apriori algorithm
The Apriori algorithm is a well known method for mining frequent itemsets. It employs an iterative approach known as a level-wise search, in which finding each level of itemsets requires one full scan of the database. To reduce the search space, candidate k-itemsets are generated only if all of their (k-1)-subsets are frequent. Although this approach is guaranteed to find all of the frequent itemsets, its computational complexity is exponential in the number of maximally frequent itemsets. It may also need to scan the database repeatedly and check a large set of candidate itemsets by pattern matching. This is infeasible for large databases containing millions of records with long patterns.

Frequent-pattern growth (FP-growth)
Frequent-pattern growth (FP-growth) is an efficient method for mining frequent itemsets. It avoids the cost of generating a huge set of candidate itemsets by building a compact prefix-tree data structure, the frequent-pattern tree (FP-Tree). First, the algorithm scans the database and derives the set of frequent items and their support counts. The set is sorted in order of descending support count; let L denote the resulting set. To construct the FP-Tree, the algorithm scans the database and processes the items in each record in L order (i.e., sorted by descending support count). The processed itemset represents a branch in the tree, with each frequent item represented by a node. The branch is added to the tree if it does not already exist; if any prefix of the branch already exists in the tree, the count of each node along the common prefix is incremented by one and the branch is extended. The algorithm then mines the frequent itemsets from the FP-Tree, starting from nodes deep inside the tree. As shown in (Hipp, Guntzer & Nakhaeizadeh 2000), the FP-growth algorithm is about an order of magnitude faster than Apriori on large databases.

7. REFERENCES
[1] Rebecca Gurley Bace, Intrusion Detection. New Riders.
[2] Carl Endorf, Gene Schultz, Jim Mellander, Intrusion Detection & Prevention. McGraw-Hill Osborne.
[3] Andreas Wespi, Giovanni Vigna, Luca Deri (Eds.), Recent Advances in Intrusion Detection: 5th International Symposium, RAID 2002, Zurich, Switzerland, October 2002, Proceedings. Springer.
[4] Snort Intrusion Detection 2.0. Syngress.
[5] Jaideep Srivastava, Aleksandar Lazarevic, Vipin Kumar (Eds.), Managing Cyber Threats: Issues, Approaches, and Challenges.
[6] Marian Bubak, Geert D. van Albada, Peter M. A. Sloot, Jack J. Dongarra (Eds.), Computational Science - ICCS 2004, Part II: Proceedings of the 4th International Conference.
[7] Information Security and Cryptology - ICISC 2004: 7th International Conference, Seoul, Korea, ...
[8] Yves Deswarte, Frederic Cuppens, Sushil Jajodia, Lingyu Wang (Eds.), Security and Protection in Information Processing Systems: IFIP 18th World Computer Congress. Kluwer Academic Publishers.
[9] D. Frincke (Ed.), Intrusion Detection. IOS Press.