SCYR 2010 - 10th Scientific Conference of Young Researchers - FEEI TU of Košice
Anomaly Detection Techniques for Adaptive Anomaly Driven Traffic Engineering
1 Jakub Kopka, 2 Martin Révés, 3 Juraj Giertl
Dept. of Computers and Informatics, FEEI TU of Košice, Slovak Republic
1 [email protected], 2 [email protected], 3 [email protected]
Abstract—Traffic engineering (TE) has become a mechanism for the safe and efficient transportation of data in a computer network. TE uses statistical methods to predict the behavior of network traffic. However, the actual traffic behavior never matches these predictions completely. The aim of this paper is to describe detection techniques which can be used to detect anomalous traffic in computer networks. We survey the known methods which are suitable for anomaly detection in different application domains and suggest the techniques best suited to the detection of anomalous traffic.
Keywords—Computer networks, Computer networks management, Traffic control, Traffic engineering
I. INTRODUCTION
Inadequate utilization of network resources is a challenging problem for network traffic engineers. TE allows the usage of network resources to be optimized through multiple mechanisms. This optimization is difficult due to the dynamic nature of network traffic. Network traffic can be characterized by several parameters. These parameters can be recorded and a model of the traffic can be built. If one or more parameters have a value different from the one predicted by the traffic model, we call it an anomaly. These anomalies are monitored by a distributed monitoring system. The monitoring system can detect and localize the cause of anomalies and respond appropriately by reconfiguring the network. We call this approach Adaptive Anomaly Driven Traffic Engineering.
This paper is organized as follows. In Section II we describe and divide anomalies into categories, describe the application domains where anomaly detection techniques can be applied, and name the best known of them. Section III presents techniques for detecting point anomalies. In Section IV we present techniques for detecting contextual anomalies. Section V presents techniques for detecting collective anomalies. Section VI concludes the previous sections and presents our suggestion of which detection technique is the most suitable for our traffic engineering approach.
II. CLASSIFICATION OF ANOMALIES AND THEIR DETECTION TECHNIQUES
A. Anomaly
An anomaly is a deviation from the common rule, type or form. Anomalies are patterns in data that do not conform to a well defined notion of normal behavior [1]. In relation to network traffic, an anomaly is any deviation from the expected traffic behavior. Such an anomaly differs significantly from the normal traffic and influences one or more links in the network. For example, Fig. 1 illustrates anomalies in a simple data set.

Fig. 1. A simple example of anomalies.

This set of data has two areas of normal data, S1 and S2, because most of the data instances lie in these two areas. Points x1 and x2 are anomalies, because they are significantly far away from these areas. The set X3 also denotes anomalies, although it contains more than one data instance. Detecting such patterns in the data, which differ from the normally expected behavior, is called anomaly detection. Anomaly detection is used in several application domains, e.g. image processing, credit card fraud detection, pharmaceutical research, network intrusion detection systems and many others. All these application domains provide data which can be analysed for the presence of anomalies.
Data are the input for all anomaly detection techniques. These data are described by a set of attributes, which can be binary, categorical or continuous. Data are univariate or multivariate if they have one or multiple attributes, respectively. The character of the data determines which of the techniques can be used.
Anomalies are divided into three categories:
• point anomalies,
• contextual anomalies,
• collective anomalies.
Point anomalies are single data instances which are separated from the rest of the data. Contextual (conditional [2]) anomalies are data instances which are considered anomalous only in a specific context; in another context they may be considered normal data. Such data have two types of attributes: behavioral and contextual attributes. The behavioral attributes denote the noncontextual characteristics of the data, while the contextual attributes denote the relation between the data. Collective anomalies occur when a whole subset of the data deviates from the normal data and is considered anomalous. Either a point anomaly or a collective anomaly can be transformed into a contextual anomaly.
B. Anomaly detection
There exists more than one approach to anomaly detection. A straightforward approach defines a region which represents the normal data or behavior and declares any data which do not fall into this normal region to be an anomaly. However, there are several challenging factors [1]:
• It is hard to say what normal is. The boundary between normal and anomalous behavior is often not precise.
• When the anomalies are the result of malicious software, this software can try to mask the anomalies as a normal traffic pattern.
• The normal traffic pattern is dynamic. What is normal now might not be normal tomorrow.
• It is very difficult to obtain labeled normal data for building a prediction model.
• The data can contain noise, which is not considered an anomaly but is not interesting for the analyst; it is unwanted because it can cause false detection of anomalies.
Anomaly detection methods differ among application domains and the specific problems related to them.
Data labels denote data as normal or anomalous. Data are labeled manually, so acquiring fully labeled data is very difficult and expensive. Based on the type of labeled data available, anomaly detection techniques can be divided into three categories:
• supervised anomaly detection techniques,
• semi-supervised anomaly detection techniques,
• unsupervised anomaly detection techniques.
Techniques working in the supervised mode need fully labeled data. The typical approach in this mode is to build predictive models of the normal and the anomalous data. All tested data are then compared to these models and denoted as normal or anomalous.
Techniques working in the semi-supervised mode need only labeled normal data for building the predictive models. A specific variant of this mode are techniques which work with the anomalous data only and build a model of the anomalous behavior.
The unsupervised techniques do not need the training data to be labeled, because they assume that there are many more normal data instances than anomalous ones.
III. POINT ANOMALY DETECTION TECHNIQUES
A. Classification based techniques
These techniques work in two phases. During the first phase, called learning, a prediction model (classifier) is built using the available labeled data. The classifier can distinguish between the normal and the anomalous data. During the second, testing phase, the tested data are classified into the normal or the anomalous class. In the learning phase the model may divide the normal data into several classes; such a technique is called a multi-class technique. When only one normal class exists, it is a one-class technique.
One group of classification techniques uses classification algorithms based on neural networks. Such techniques can be used with multi-class or one-class data. Other techniques use algorithms based on Bayesian networks, support vector machines or rule based systems.
The testing phase is generally very fast, because the predictive model has already been built and testing instances are only compared to the model. These techniques can also distinguish between instances belonging to different normal classes. A disadvantage is that they need labeled training data to build the predictive model.
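To make the one-class case concrete, the following is a minimal sketch (not taken from the paper) of how such a classifier could be trained and applied; it assumes scikit-learn's OneClassSVM and uses synthetic per-interval traffic features (packets/s, bytes/s) as purely hypothetical inputs.

```python
# Minimal one-class classification sketch (illustrative only, not the
# authors' implementation). Assumes scikit-learn; the "traffic features"
# (packets/s, bytes/s per measurement interval) are hypothetical.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Learning phase: fit the one-class model on (assumed) normal traffic samples.
normal_traffic = rng.normal(loc=[1000.0, 800_000.0],
                            scale=[50.0, 40_000.0], size=(500, 2))
model = OneClassSVM(nu=0.01, kernel="rbf", gamma="scale").fit(normal_traffic)

# Testing phase: new observations are only compared against the built model.
tested = np.array([[1010.0, 810_000.0],      # close to the normal region
                   [5000.0, 4_000_000.0]])   # far from it
print(model.predict(tested))                 # +1 = normal class, -1 = anomalous class
```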
B. Nearest neighbor based techniques
These techniques are based on the assumption that normal data instances form dense neighborhoods, while anomalous ones do not.
These techniques either compute distances to the nearest neighbors or use a relative density as the anomaly score. The first group of techniques uses the distance to the k nearest neighbors as the anomaly score. The second group of techniques computes the relative density in a hypersphere with radius d. Such an anomaly score s can be computed as [3][4]

s = \frac{n}{\pi d^2}  (1)

where n is the number of data instances within the hypersphere. The advantage of these techniques is that they can work in the unsupervised mode, but if the semi-supervised mode is used, the number of false anomaly detections is smaller [5]. The computational complexity of such techniques is relatively high, because the distance is computed between each pair of data instances. The rate of false anomaly detections is also high if a normal neighborhood consists of only a few data instances.
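The first group of techniques can be sketched in a few lines; the example below is illustrative only (not from the paper) and assumes scikit-learn's NearestNeighbors with hypothetical 2-D traffic features.

```python
# Minimal k-nearest-neighbor anomaly score sketch (illustrative only).
# The score of a tested instance is its distance to the k-th nearest
# neighbor among the (assumed normal) reference data.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 2))      # hypothetical 2-D traffic features
k = 5

nn = NearestNeighbors(n_neighbors=k).fit(data)
tested = np.array([[0.1, -0.2], [6.0, 6.0]])
distances, _ = nn.kneighbors(tested)  # shape (n_tested, k)
scores = distances[:, -1]             # distance to the k-th neighbor
print(scores)                         # the far-away point gets a large score
```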
C. Clustering based techniques
These techniques are very similar to the techniques mentioned in the previous subsection. The problem of detecting anomalies which form clusters can be transformed into the nearest neighbor based problem, so the two families are closely related. The clustering based techniques, however, evaluate each instance with respect to the cluster it belongs to.
The first type of clustering based technique assumes that normal instances form clusters (Fig. 2). Such techniques apply known clustering algorithms and declare whether a given data instance belongs to a cluster or not. A disadvantage is that they are optimized to find clusters, not anomalies.
The second type assumes that normal data instances lie close enough to the closest cluster centroid (Fig. 3). Such techniques are not applicable if the anomalies themselves form clusters. Therefore there exists a third type of clustering technique, which assumes that the normal instances form large dense clusters, while anomalies form small sparse clusters (Fig. 4). The first two types work in two phases: in the first phase a clustering algorithm clusters the data, and in the second phase a distance is computed as the anomaly score.

Fig. 2. Normal instances as one big cluster.

Fig. 3. Normal instances form more than one cluster.

Fig. 4. Anomalous instances form a cluster.

The clustering based techniques can work in the unsupervised mode, because clustering algorithms do not need labeled data. Once a model is built, the testing phase is very fast, because it only compares tested instances to the model. The main disadvantages are the high computational complexity and the fact that these algorithms are not optimized to find anomalies.
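The second type described above can be sketched as follows; this is an illustrative example (not from the paper) assuming scikit-learn's KMeans and hypothetical 2-D features.

```python
# Minimal clustering-based sketch: distance to the closest cluster centroid
# is used as the anomaly score (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Phase 1: cluster the (unlabeled) data.
data = np.vstack([rng.normal(loc=[0.0, 0.0], size=(200, 2)),
                  rng.normal(loc=[8.0, 8.0], size=(200, 2))])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Phase 2: score tested instances by their distance to the nearest centroid;
# large scores suggest anomalies.
tested = np.array([[0.2, -0.1], [4.0, 4.0]])
scores = np.min(kmeans.transform(tested), axis=1)
print(scores)
```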
D. Anomaly detection techniques based on a statistical approach
The statistical methods are based on the following assumption [6]: an anomaly is an observation which is suspected of being partially or wholly irrelevant because it is not generated by the assumed stochastic model. This means that normal data instances occur in the high probability regions of the stochastic model, while anomalies occur in its low probability regions.
The statistical methods fit a statistical model to the normal data and then determine whether a tested data instance belongs to the model or not. If a technique assumes knowledge of the distribution, it is called parametric [7]; otherwise it is called nonparametric [8].
The nonparametric methods assume that the model is determined from the given data. The most used techniques are the kernel function based and the histogram based techniques. The kernel function based techniques use Parzen window estimation [9]. The simplest techniques are the histogram based ones, which are widely used in intrusion detection systems. The first step of these techniques consists of building a histogram from the different values taken from the training data. In the second step, a tested data instance is checked to see whether it falls into one of the histogram bins.
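A histogram based check can be sketched with plain numpy; the example below is illustrative only (not from the paper) and uses a hypothetical per-flow feature.

```python
# Minimal histogram-based sketch (illustrative only): an instance is flagged
# as anomalous if it falls outside the histogram range or into a bin that
# holds fewer than min_count training samples.
import numpy as np

rng = np.random.default_rng(3)
training = rng.normal(loc=50.0, scale=5.0, size=5000)  # hypothetical per-flow feature

# Step 1: build the histogram from the training data.
counts, edges = np.histogram(training, bins=30)

def is_anomalous(value, min_count=1):
    if value < edges[0] or value > edges[-1]:
        return True
    bin_index = min(np.searchsorted(edges, value, side="right") - 1, len(counts) - 1)
    return counts[bin_index] < min_count

# Step 2: check tested instances against the histogram bins.
print(is_anomalous(51.0))   # expected: False
print(is_anomalous(95.0))   # expected: True
```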
The parametric methods assume that the data are generated by a parametric distribution with parameters Θ and probability density function f(o, Θ), where o is an observation. The parametric methods can be divided based on the type of distribution:
• Gaussian model based,
• regression model based,
• mixture of parametric distributions based.
The Gaussian model based methods use many known techniques such as the box plot rule or Grubbs' test. In Grubbs' test, for each test instance x its z score is computed as

z = \frac{|x - \bar{x}|}{s}  (2)

where \bar{x} is the mean and s is the standard deviation of the data. The test instance is anomalous if

z > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N - 2 + t^2_{\alpha/(2N),\,N-2}}}  (3)

where N is the data size and t_{\alpha/(2N),\,N-2} is the critical value of the t-distribution with N-2 degrees of freedom at significance level \alpha/(2N), which serves as the threshold.
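Equations (2) and (3) translate directly into code; the sketch below is illustrative only (not from the paper) and assumes scipy is available for the t-distribution critical value.

```python
# Minimal Grubbs' test sketch for Eq. (2)-(3) (illustrative only).
import numpy as np
from scipy import stats

def grubbs_is_anomalous(data, x, alpha=0.05):
    """Return True if the test instance x is anomalous with respect to data."""
    data = np.asarray(data, dtype=float)
    N = data.size
    z = abs(x - data.mean()) / data.std(ddof=1)            # Eq. (2)
    t_crit = stats.t.ppf(1.0 - alpha / (2.0 * N), N - 2)    # t_{alpha/(2N), N-2}
    threshold = ((N - 1) / np.sqrt(N)) * np.sqrt(
        t_crit**2 / (N - 2 + t_crit**2))                     # Eq. (3)
    return z > threshold

rng = np.random.default_rng(4)
sample = rng.normal(loc=100.0, scale=10.0, size=200)  # hypothetical link-load samples
print(grubbs_is_anomalous(sample, 105.0))  # expected: False
print(grubbs_is_anomalous(sample, 180.0))  # expected: True
```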
The regression based methods are used for time-series data. Techniques based on mixtures of parametric distributions use different types of distribution to model the normal and the anomalous data. If the normal data cannot be modeled by any single known distribution, a mixture of distributions is used.
The advantages of the statistical methods are that they are widely used and, if a good model is designed, they are very effective. They can be used in the unsupervised mode when training data are lacking. The histogram based techniques are not suitable for detecting contextual anomalies, because they cannot capture interactions between data instances. Choosing the proper statistical test is also nontrivial.
E. Other detection techniques
The aforementioned techniques are the most widely used. Other methods use information theoretic techniques based on relative entropy or Kolmogorov complexity. These techniques can operate in the unsupervised mode and do not need statistical assumptions about the data. Yet another family are the spectral anomaly detection techniques, which try to find a lower-dimensional approximation of the data and determine the subspaces in which the anomalous instances can be easily identified. These techniques have a high computational complexity.
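A common realization of the spectral approach scores instances by how poorly they are reconstructed from a low-dimensional subspace; the sketch below is illustrative only (not from the paper) and assumes scikit-learn's PCA.

```python
# Minimal spectral sketch (illustrative only): project the data onto a
# low-dimensional subspace and use the reconstruction error as the score.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Hypothetical traffic features that mostly vary along one direction.
latent = rng.normal(size=(400, 1))
data = np.hstack([latent, 2.0 * latent + rng.normal(scale=0.05, size=(400, 1))])

pca = PCA(n_components=1).fit(data)

def reconstruction_error(x):
    """Anomaly score: distance between x and its projection onto the subspace."""
    x = np.atleast_2d(x)
    reconstructed = pca.inverse_transform(pca.transform(x))
    return np.linalg.norm(x - reconstructed, axis=1)

print(reconstruction_error([0.5, 1.0]))   # near the subspace: small score
print(reconstruction_error([0.5, -3.0]))  # off the subspace: large score
```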
IV. CONTEXTUAL ANOMALY DETECTION TECHNIQUES
When detecting contextual anomalies, data instances have contextual and behavioral attributes. The context of the data can be defined using sequences, space, graphs or profiles. Profiling is typically used for detecting credit card fraud: for each credit card holder (each holder denotes a separate context) a behavioral profile is built. Using the credit card to pay abroad can be labeled as either an anomalous or a normal instance, depending on the context, i.e. on the card owner.
The problem of contextual anomaly detection can be transformed into the problem of point anomaly detection: it is necessary to identify the context and then compute an anomaly score within it. Other methods utilize the structure of the data and use regression or a divide and conquer approach. The advantage of these techniques is that they can identify anomalies which would be undetectable using the techniques of the previous section.
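The transformation to point anomaly detection can be sketched with a per-context profile; the example below is illustrative only (not from the paper), using a hypothetical card-holder context and transaction amounts as the behavioral attribute.

```python
# Minimal contextual-to-point reduction sketch (illustrative only): group
# observations by their contextual attribute (card holder) and score the
# behavioral attribute (amount) as a z-score within that holder's profile.
import numpy as np
from collections import defaultdict

history = [
    ("alice", 20.0), ("alice", 25.0), ("alice", 22.0), ("alice", 27.0),
    ("bob", 300.0), ("bob", 350.0), ("bob", 320.0), ("bob", 330.0),
]

# Build a behavioral profile per context.
profiles = defaultdict(list)
for holder, amount in history:
    profiles[holder].append(amount)

def contextual_score(holder, amount):
    """Point-anomaly score computed only within the holder's own context."""
    values = np.array(profiles[holder])
    return abs(amount - values.mean()) / (values.std() + 1e-9)

print(contextual_score("alice", 400.0))  # large score: unusual for alice
print(contextual_score("bob", 330.0))    # small score: normal for bob
```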
V. COLLECTIVE ANOMALY DETECTION TECHNIQUES
These techniques can be divided into three categories:
• sequential anomaly detection techniques,
• spatial anomaly detection techniques,
• graph anomaly detection techniques.
The sequential anomaly detection techniques work with sequential data and try to find anomalous subsequences. Such data can be system call data or biological data. The problem of detecting sequence anomalies can be reduced to the point anomaly detection problem [10][11], as the sketch below illustrates.
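The reduction can be made concrete with a sliding window; this example is illustrative only (not from the paper), with a hypothetical repeating pattern standing in for normal sequential data.

```python
# Minimal sequence-to-point reduction sketch (illustrative only): slide a
# fixed-length window over the sequence and treat each window as a point,
# scored by its distance to the nearest window seen in training data.
import numpy as np

def windows(seq, w):
    """Return all length-w sliding windows of seq as a 2-D array."""
    return np.array([seq[i:i + w] for i in range(len(seq) - w + 1)], dtype=float)

w = 4
train_seq = [1, 2, 3, 4] * 25                  # hypothetical repeating normal pattern
test_seq = [1, 2, 3, 4, 1, 9, 9, 4, 1, 2, 3, 4]

train_w = windows(train_seq, w)
for win in windows(test_seq, w):
    score = np.min(np.linalg.norm(train_w - win, axis=1))
    print(win, score)                          # windows covering the 9s score high
```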
Handling spatial anomalies involves finding anomalous subcomponents in the data. There are only a few techniques in this category; image processing techniques using Markov random fields are one of them [12].
The graph anomaly detection techniques involve finding an anomalous subgraph within a large graph. The size of the subgraph is also taken into consideration [13].
VI. CONCLUSION
In the previous sections we presented the most widely used anomaly detection techniques and defined what anomalies are. For our application domain, which is traffic engineering, techniques which can work in the unsupervised mode are suitable, because it is hard to get fully labeled data which would cover all possible traffic in a computer network. Such techniques use neural networks, statistical models or Bayesian networks.
Our future work includes the design of a distributed system which will collect data from the computer network and build a model of normal traffic behavior. This distributed system will also detect anomalies using unsupervised anomaly detection techniques, react to the detected anomalies and reconfigure the network.
ACKNOWLEDGMENT
The authors want to thank the entire staff of Computer
Networks Laboratory at DCI FEEI at Technical University of
Košice.
This publication is the result of the project implementation Centre of Information and Communication Technologies for Knowledge Systems (project number 26220120020), supported by the Research & Development Operational Programme funded by the ERDF, and was partially prepared within the project "Methods of multimedia information effective transmission", No. 1/0525/08, with the support of the VEGA agency.
REFERENCES
[1] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," University of Minnesota, Tech. Rep., August 2007.
[2] X. Song, M. Wu, C. Jermaine, and S. Ranka, "Conditional anomaly detection," IEEE Transactions on Knowledge and Data Engineering, 2007.
[3] E. M. Knorr and R. T. Ng, "A unified approach for mining outliers," Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research, 1997.
[4] E. M. Knorr and R. T. Ng, "Algorithms for mining distance-based outliers in large datasets," Proceedings of the 24th International Conference on Very Large Data Bases, pp. 392–403, 1998.
[5] D. Pokrajac, A. Lazarevic, and L. J. Latecki, "Incremental local outlier detection for data streams," Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, 2007.
[6] F. J. Anscombe and I. Guttman, "Rejection of outliers," Technometrics, 1960.
[7] E. Eskin, "Anomaly detection over noisy data using learned probability distributions," Proceedings of the Seventeenth International Conference on Machine Learning, pp. 255–262, 2000.
[8] M. Deforges, P. Jacob, and J. Cooper, "Applications of probability density estimation to the detection of abnormal conditions in engineering," Proceedings of the Institution of Mechanical Engineers, vol. 212, pp. 687–703, 1998.
[9] E. Parzen, "On the estimation of a probability density function and mode," Institute of Mathematical Statistics, 1962, no. 2.
[10] P. K. Chan and M. V. Mahoney, "Modeling multiple time series for anomaly detection," Proceedings of the Fifth IEEE International Conference on Data Mining, 2005.
[11] S. Budalakoti, A. Srivastava, A. Akella, and E. Turkov, "Anomaly detection in large sets of high-dimensional symbol sequences," NASA Ames Research Center, Tech. Rep., 2006.
[12] G. G. Hazel, "Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection," IEEE Transactions on Geoscience and Remote Sensing, pp. 1199–1211, 2000.
[13] C. C. Noble and D. J. Cook, "Graph-based anomaly detection," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.