SCYR 2010 - 10th Scientific Conference of Young Researchers - FEEI TU of Košice

Anomaly Detection Techniques for Adaptive Anomaly Driven Traffic Engineering

1 Jakub Kopka, 2 Martin Révés, 3 Juraj Giertl
Dept. of Computers and Informatics, FEEI TU of Košice, Slovak Republic
1 [email protected], 2 [email protected], 3 [email protected]

Abstract—Traffic engineering (TE) has become a key mechanism for the safe and efficient transportation of data in a computer network. TE uses statistical methods for the prediction of network traffic behavior. However, the traffic behavior will never match these predictions 100%. The aim of this paper is to describe detection techniques that can be used for the detection of anomalous traffic in computer networks. We survey the known methods that are suitable for anomaly detection in different application domains and suggest the best techniques for the detection of anomalous traffic.

Keywords—Computer networks, Computer networks management, Traffic control, Traffic engineering

I. INTRODUCTION

Inadequate utilization of network resources is a challenging problem for network traffic engineers. TE allows the optimization of network resource usage through multiple mechanisms. This optimization is difficult due to the dynamic nature of network traffic. Network traffic can be characterized by several parameters. These parameters can be recorded and a model of the traffic can be built. If one or more parameters have a value different from the one predicted by the traffic model, we call it an anomaly. These anomalies are monitored by a distributed monitoring system. The monitoring system can detect and localize the cause of anomalies and respond appropriately by reconfiguring the network. We call this approach Adaptive Anomaly Driven Traffic Engineering. This paper is organized as follows.
Section II describes and divides anomalies into categories, describes the application domains where anomaly detection techniques can be applied, and names the best known of them. Section III presents classification based anomaly detection techniques. Section IV presents techniques for detecting contextual anomalies. Section V presents techniques for detecting collective anomalies. Section VI concludes the previous sections and presents our suggestion as to which of the detection techniques is the most suitable for our traffic engineering approach.

II. CLASSIFICATION OF ANOMALIES AND THEIR DETECTION TECHNIQUES

A. Anomaly

An anomaly is a deviation from the common rule, type or form. Anomalies are patterns in data that do not conform to a well defined notion of normal behavior [1]. In relation to network traffic, an anomaly is any deviation from the expected traffic behavior. Such an anomaly is significantly distinct from the normal traffic and influences one or more links in the network. For example, Fig. 1 (a simple example of anomalies) illustrates anomalies in a simple data set. This data set has two areas of normal data, S1 and S2, because most of the data instances lie in these two areas. Points x1 and x2 are anomalies, because they are significantly far away from these areas. The set X3 also denotes anomalies, although it contains more than one data instance. Detecting such patterns in the data, which differ from the normal expected behavior, is called anomaly detection. Anomaly detection is used in several application domains, e.g. image processing, card fraud detection, pharmaceutical research, network intrusion detection systems and many others. All these application domains provide data which can be analysed for the presence of anomalies. Data are the input for all anomaly detection techniques. These data are described by a set of attributes, which can be binary, categorical or continuous.
Data can be univariate or multivariate if they have one or multiple attributes, respectively. The character of the data determines which of the techniques can be used. Anomalies are divided into three categories:
• point anomalies,
• contextual anomalies,
• collective anomalies.
Point anomalies are single data instances which are separated from the rest of the data. Contextual (conditional [2]) anomalies are data instances which are considered anomalies in a specific context only; in another context they can be considered normal data. These data have two types of attributes - behavioral and contextual attributes. The behavioral attributes denote noncontextual characteristics of the data, while the contextual attributes denote the relation between the data. Collective anomalies occur when a whole data subset deviates from the normal data and is considered anomalous. Both point and collective anomalies can be transformed into contextual anomalies.

B. Anomaly detection

There exists more than one approach to anomaly detection. A straightforward approach defines a region which represents the normal data or behavior and declares any data which do not fall into this region as an anomaly. However, there are some challenging factors [1]:
• It is difficult to say what normal is. The boundary between normal and anomalous behavior is often not precise.
• When the anomalies are the result of malicious software, this software can try to mask the anomalies as a normal traffic pattern.
• The normal traffic pattern is dynamic. What is normal now might not be normal tomorrow.
• It is very difficult to get normally labeled data for building a prediction model.
• The data can contain noise which is not considered an anomaly, but which is of no interest to the analyst and can cause false detection of anomalies.
The anomaly detection methods differ by application domain and the specific problems related to it. Data labels denote data as normal or anomalous. Labeling data is done manually, so acquiring fully labeled data is very difficult and expensive. Based on the type of labeled data, the anomaly detection techniques can be divided into three categories:
• supervised anomaly detection techniques,
• semi-supervised anomaly detection techniques,
• unsupervised anomaly detection techniques.
The techniques working in the supervised mode need fully labeled data. The typical approach in this mode is to build predictive models of the normal and the anomalous data. All tested data are then compared to these models and denoted as normal or anomalous. The techniques working in the semi-supervised mode need only normally labeled data to build the predictive models. A specific type of this mode are techniques which work with the anomalous data only and build a model of the anomalous behavior. The unsupervised mode techniques do not need the training data to be labeled, because they assume that there are many more normal data instances than anomalous ones.

III. POINT ANOMALY DETECTION TECHNIQUES

A. Classification based techniques

These techniques work in two phases. During the first phase, which is called learning, a prediction model (classifier) is built using available labeled data. The classifier can distinguish between the normal and the anomalous data. During the second, testing phase, the tested data are classified into the normal or the anomalous classes. In the learning phase, the model can divide the normal data into several sets. When this occurs, such a technique is called a multi-class technique. When only one normal class exists, it is a one-class technique. One group of classification techniques uses a classification algorithm based on neural networks. Such techniques can be used with multi-class or one-class data.
Other techniques use algorithms based on Bayesian networks, support vector machines or rule based systems. The testing phase is generally very fast, because the predictive model has already been built and testing instances are only compared to the model. These techniques also use algorithms that can distinguish between instances belonging to different normal classes. A disadvantage is that these techniques need labeled training data to build the predictive model.

B. Nearest neighbor based techniques

These techniques are based on the assumption that normal data instances form neighborhoods, while the anomalous ones do not. These techniques compute distances to the nearest neighbors or use a relative density as the anomaly score. The first group of techniques uses the distance to the k nearest neighbors as the anomaly score. The second group of techniques computes the relative density in a hypersphere with radius d. Such an anomaly score s can be computed as [3][4]:

s = n / (π d^2)   (1)

where n is the number of data instances. The advantage of these techniques is that they can work in the unsupervised mode, but if the semi-supervised mode is used, the number of false anomaly detections is smaller [5]. The computational complexity of such techniques is relatively high, because the distance is computed between each pair of data instances. The rate of false anomaly detection is also high if a normal neighborhood consists of only a few data instances.

C. Clustering based techniques

These techniques are very similar to the nearest neighbor based techniques of the previous subsection, and the problem of detecting anomalies which form clusters can be transformed to the nearest neighbor based problem. The clustering based techniques, however, evaluate each instance with respect to the cluster it belongs to.
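To make the two scoring schemes of subsection B concrete, the following sketch computes both the k-th nearest neighbor distance and the relative density score of (1) by brute force. The function names, the pure-Python brute-force approach and the 2-D specialization of (1) are our own illustration, not part of any surveyed system:

```python
import math

def _dist(a, b):
    """Euclidean distance between two instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_anomaly_scores(data, k=3):
    """Score each instance by the distance to its k-th nearest neighbor.

    A large score marks an instance far from any neighborhood, i.e. a
    likely point anomaly.
    """
    scores = []
    for p in data:
        dists = sorted(_dist(p, q) for q in data)
        # dists[0] is the zero distance from p to itself, so dists[k]
        # is the distance to the k-th nearest neighbor.
        scores.append(dists[k])
    return scores

def density_scores(data, d=1.0):
    """Relative density score of eq. (1), s = n / (pi d^2), for 2-D data.

    n counts the neighbors inside a circle of radius d around the
    instance; here a LOW score marks a likely anomaly.
    """
    scores = []
    for p in data:
        n = sum(1 for q in data if _dist(p, q) <= d) - 1  # exclude p itself
        scores.append(n / (math.pi * d ** 2))
    return scores
```

Note the quadratic cost: both functions compare every pair of instances, which is exactly the computational complexity drawback mentioned above.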
The first type of clustering based techniques assumes that normal instances form clusters (Fig. 2). Such techniques apply known clustering algorithms and declare whether each data instance belongs to a cluster or not. A disadvantage is that they are optimized to find clusters, not anomalies. The second type assumes that the normal data instances lie close enough to their closest cluster centroid (Fig. 3). Such techniques are not applicable if the anomalies form clusters. Therefore there exists a third type of clustering techniques, which assumes that the normal instances form large dense clusters, while anomalies form small sparse clusters (Fig. 4). The previous two types work in two phases: in the first phase, a clustering algorithm clusters the data, and in the second phase, a distance is computed as the anomaly score.

Fig. 2. Normal instances form one big cluster.
Fig. 3. Normal instances form more than one cluster.
Fig. 4. Anomalous instances form a cluster.

The clustering based techniques can work in the unsupervised mode, because clustering algorithms do not need labeled data. Once a model is built, the testing phase is very fast, because it just compares the tested instances to the model. The main disadvantages are a high computational complexity and the fact that these algorithms are not optimized to find anomalies.

D. Anomaly detection techniques based on a statistical approach

The statistical methods are based on the following assumption [6]: an anomaly is an observation which is suspected of being partially or wholly irrelevant because it is not generated by the assumed stochastic model. This means that the normal data instances occur in the high probability regions of the stochastic model, while the anomalies occur in its low probability regions.
The statistical methods fit a statistical model to the normal data and then determine whether a tested data instance belongs to the model or not. If a technique assumes knowledge of the distribution, it is called parametric [7]; otherwise it is called nonparametric [8]. The nonparametric methods assume that the model is determined from the given data. The most used techniques are the kernel function based and the histogram based techniques. The kernel function based techniques use Parzen window estimation [9]. The simplest are the histogram based techniques, which are widely used in intrusion detection systems. Their first step consists of building a histogram from the different values taken from the training data. In the second step, a tested data instance is checked to see whether it falls into one of the histogram bins.

The parametric methods assume that the data are generated by a parametric distribution with parameters Θ and probability density function f(o, Θ), where o is an observation. The parametric methods can be divided based on the type of the distribution:
• Gaussian model based,
• regression model based,
• mixture of parametric distributions based.
The Gaussian model based methods use many known techniques like the box plot rule or Grubbs' test. In Grubbs' test, for each test instance x, its z score is computed as:

z = |x − x̄| / s   (2)

where x̄ is the mean and s is the standard deviation. The test instance is anomalous if

z > ((N − 1) / √N) · √( t²_{α/(2N),N−2} / (N − 2 + t²_{α/(2N),N−2}) )   (3)

where N is the data size and t_{α/(2N),N−2} is the critical value of the t distribution at significance level α/(2N) with N − 2 degrees of freedom. The regression based methods are used for time-series data. Techniques based on mixtures of parametric distributions use different types of distributions to model the normal and the anomalous data. If the normal data cannot be modeled by any single known distribution, a mixture of distributions is used.
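Equations (2) and (3) can be sketched as follows. The critical value t_{α/(2N),N−2} of the Student t distribution is passed in as a parameter (looked up from a table for the chosen α), since the paper names no particular library for computing it; the function name is our own:

```python
import math

def grubbs_test(data, t_crit):
    """Grubbs' test for a single outlier, following eqs. (2) and (3).

    t_crit is the upper critical value t_{alpha/(2N), N-2} of the
    Student t distribution, taken from a table for the chosen
    significance level alpha. Returns the index of the most extreme
    instance and whether eq. (3) declares it anomalous.
    """
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (N - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # z score of the most extreme instance, eq. (2).
    idx = max(range(n), key=lambda i: abs(data[i] - mean))
    z = abs(data[idx] - mean) / s
    # Threshold from eq. (3).
    crit = (n - 1) / math.sqrt(n) * math.sqrt(
        t_crit ** 2 / (n - 2 + t_crit ** 2))
    return idx, z > crit
```

For example, for N = 6 and α = 0.05 a t table gives t_{0.05/12, 4} ≈ 4.85, which makes the threshold in (3) roughly 1.89.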
The advantages of the statistical methods are that they are widely used and, if a good model is designed, very effective. They can be used in the unsupervised mode when training data are lacking. The histogram based techniques are not suitable for detecting contextual anomalies, because they cannot record an interaction between data instances. Choosing the proper test method is also nontrivial.

E. Other detection techniques

The aforementioned techniques are the most widely used. Other methods use information theoretic techniques based on relative entropy or Kolmogorov complexity. These techniques can operate in the unsupervised mode and do not need a statistical assumption about the data. Yet other techniques are the spectral anomaly detection techniques, which try to find an approximation of the data and determine the subspaces in which the anomalous instances can be easily identified. These techniques have a high computational complexity.

IV. CONTEXTUAL ANOMALY DETECTION TECHNIQUES

When detecting contextual anomalies, data instances have contextual and behavioral attributes. The context between data can be defined using sequences, space, graphs or profiles. Profiling is typically used for detecting credit card fraud. For each credit card holder (each holder denotes a separate context), a behavioral profile is built. Using a credit card to pay abroad can be labeled as an anomalous or a normal instance, depending on the context, i.e., the card owner. The problem of contextual anomaly detection can be transformed to the problem of point anomaly detection: it is necessary to identify the context and then compute an anomaly score. Other methods utilize the structure of the data and use regression or a divide and conquer approach.
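The reduction of contextual detection to point detection can be sketched on the credit card example above: scoring each transaction only against the other transactions of the same card holder. The per-context z score and the data layout are our own illustration of the reduction, not a technique prescribed by the paper:

```python
import math
from collections import defaultdict

def contextual_scores(records):
    """Reduce contextual anomaly detection to point anomaly detection.

    records is a list of (context, value) pairs, e.g. (card_holder,
    transaction_amount). Each value is scored only against the other
    values sharing its context, so an amount that is ordinary in one
    context can be extreme in another.
    """
    # Group the behavioral attribute (value) by the contextual one.
    by_context = defaultdict(list)
    for ctx, value in records:
        by_context[ctx].append(value)
    # Build a simple profile (mean, std deviation) per context.
    profiles = {}
    for ctx, values in by_context.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        profiles[ctx] = (mean, math.sqrt(var))
    # Point anomaly score within each context: a z score.
    scores = []
    for ctx, value in records:
        mean, sd = profiles[ctx]
        scores.append(abs(value - mean) / sd if sd > 0 else 0.0)
    return scores
```

The same amount thus receives a high score under one holder's profile and a low score under another's, which is exactly the contextual behavior described above.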
The advantage of these techniques is that they can identify anomalies which would be undetectable using the techniques in the previous section.

V. COLLECTIVE ANOMALY DETECTION TECHNIQUES

These techniques can be divided into three categories:
• sequential anomaly detection techniques,
• spatial anomaly detection techniques,
• graph anomaly detection techniques.
The sequential anomaly detection techniques work with sequential data and try to find anomalous subsequences. Such data can be system call data or biological data. The problem of detecting sequence anomalies can be reduced to the point anomaly detection problem [10][11]. Handling spatial anomalies involves finding subcomponents in the data. There are few techniques in this category; image processing techniques using Markov random fields are one of them [12]. The graph anomaly detection techniques involve finding an anomalous subgraph in a large graph. The size of the subgraph is also taken into consideration [13].

VI. CONCLUSION

In the previous sections we presented the most used anomaly detection techniques and defined what anomalies are. For our application domain, which is traffic engineering, techniques that can work in the unsupervised mode are suitable, because it is hard to get fully labeled data which would cover all possible traffic in a computer network. Such techniques use neural networks, statistical models or Bayesian networks. Our future work will include the design of a distributed system which will collect data from the computer network and build a model of normal traffic behavior. This distributed system will also detect anomalies using the unsupervised anomaly detection techniques, react to the detected anomalies and reconfigure the network.

ACKNOWLEDGMENT

The authors want to thank the entire staff of the Computer Networks Laboratory at DCI FEEI at the Technical University of Košice.
This publication is the result of the project implementation Centre of Information and Communication Technologies for Knowledge Systems (project number: 26220120020) supported by the Research & Development Operational Programme funded by the ERDF, and was partially prepared within the project "Methods of multimedia information effective transmission", No. 1/0525/08, with the support of the VEGA agency.

REFERENCES

[1] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly Detection: A Survey," University of Minnesota, Tech. Rep., August 2007.
[2] X. Song, M. Wu, C. Jermaine, and S. Ranka, "Conditional anomaly detection," IEEE Transactions on Knowledge and Data Engineering, 2007.
[3] E. M. Knorr and R. T. Ng, "A unified approach for mining outliers," Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research, 1997.
[4] E. M. Knorr and R. T. Ng, "Algorithms for mining distance-based outliers in large datasets," Proceedings of the 24th International Conference on Very Large Data Bases, pp. 392–403, 1998.
[5] D. Pokrajac, A. Lazarevic, and L. J. Latecki, "Incremental local outlier detection for data streams," Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, 2007.
[6] F. J. Anscombe and I. Guttman, "Rejection of outliers," Technometrics, 1960.
[7] E. Eskin, "Anomaly detection over noisy data using learned probability distributions," Proceedings of the Seventeenth International Conference on Machine Learning, pp. 255–262, 2000.
[8] M. Desforges, P. Jacob, and J. Cooper, "Applications of probability density estimation to the detection of abnormal conditions in engineering," Proceedings of the Institution of Mechanical Engineers, vol. 212, pp. 687–703, 1998.
[9] E. Parzen, "On estimation of a probability density function and mode," The Annals of Mathematical Statistics, no. 2, 1962.
[10] P. K. Chan and M. V. Mahoney, "Modeling multiple time series for anomaly detection," Proceedings of the Fifth IEEE International Conference on Data Mining, 2005.
[11] S. Budalakoti, A. Srivastava, A. Akella, and E. Turkov, "Anomaly detection in large sets of high-dimensional symbol sequences," NASA Ames Research Center, Tech. Rep., 2006.
[12] G. G. Hazel, "Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection," IEEE Transactions on Geoscience and Remote Sensing, pp. 1199–1211, 2000.
[13] C. C. Noble and D. J. Cook, "Graph-based anomaly detection," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.