Data Mining Techniques in
Network Security
Mgr. Rudolf B. Blažek, Ph.D.
Department of Computer Systems
Faculty of Information Technologies
Czech Technical University in Prague
© Rudolf Blažek 2010-2011
Network Security
MI-SIB, ZS 2011/12, Lecture 12
The European Social Fund
Prague & EU: We Invest in Your Future
Data Mining Techniques
Reference
Anoop Singhal (Editor)
Data Warehousing and Data Mining Techniques for Cyber Security
(Advances in Information Security)
Springer, 2006
ISBN: 0387264094
Rudolf Blažek, Ph.D. (FIT ČVUT)
Network Security
MI-SIB, ZS 2011/12, Lecture 12
3
Data Mining Techniques
Introduction
Data Mining in Network Security
Do not use signatures, but learn about usage patterns
• Also viewed as Machine Learning techniques
Two basic detection approaches
• Misuse detection: detection of normal vs. intrusive data flows
• Anomaly detection: classification algorithms, rare class predictive models, association rules, cost sensitive modeling
Data Mining Techniques
Introduction
Data Mining in Network Security
Misuse Detection
• Models of misuse are created automatically
• Often better rules than manually created signatures
• Very good for detection of known (or modified) attacks
• But deciding normal vs. intrusive is human resource intensive
Anomaly detection
• Models of normal behavior are created automatically
• Deviations from normal behavior are detected automatically
• Can potentially detect unexpected and unknown attacks
Data Mining Techniques
Introduction
Data Mining for Anomaly Detection
Advantages
• Detection of unknown attacks
• Detection of unusual behavior interesting to IT managers
Limitations
• Possibly very high false alarm rate (FAR)
Supervised Detection
• Models for normal behavior are built from training data
Unsupervised Detection
• No training data; attempts to learn about anomalies automatically
Data Mining Techniques
Introduction
Unsupervised Anomaly Detection
No training data
• Attempts to learn about anomalies automatically
Algorithms used
• Statistical methods, clustering, outlier detection, state machines
We will focus on unsupervised anomaly detection
• We will study MINDS, the Minnesota IDS, as an example
• It will illustrate the usage of data mining techniques
Data Mining Techniques
MINDS
MINDS
Minnesota Intrusion Detection System
Data Mining Techniques
MINDS
Minnesota Intrusion Detection System
Abbreviated as MINDS
Uses data mining techniques for
• unsupervised anomaly detection (assigns an “anomaly” score to each network connection)
• association pattern analysis (for suspected anomalous network connections)
Performance
• Detects new intrusions that escape signature-based IDSs
Data Mining Techniques
MINDS
Minnesota Intrusion Detection System
[Figure 3.1: The MINDS System — a data capture device feeds network data into storage; after filtering and feature extraction, known attack detection reports detected known attacks, while anomaly detection produces anomaly scores; association pattern analysis turns the most anomalous connections into a summary of anomalies, which an analyst labels.]
Source: Data Warehousing and Data Mining Techniques for Cyber Security
Data Mining Techniques
MINDS
Minnesota Intrusion Detection System
Data capture
• Netflow v. 5 (Cisco protocol)
• Using flow-tools (http://www.splintered.net/sw/flow-tools/)
• Header information only, summarized into (one-way) flows
• Much less data storage than tcpdump (or e.g. Wireshark)
• Flow analysis period: 10 min. (1-2 mil. flows)
MINDS processing speed
• Processing of 10 min. of data: less than 3 min.
• But unwanted traffic is filtered out before processing
Data Mining Techniques
MINDS
MINDS Operation Steps
1. Feature Extraction
• Basic features: source and destination IP addresses and ports, protocol, flags, number of bytes, number of packets
• Derived features for a T-seconds time-window:
  • Capture connections with similar recent characteristics
  • Useful to detect sources of high volume connections per unit time (e.g. fast scanning)
• Derived features for a connection-window:
  • Similar characteristics, but for the last connections from distinct sources (good for e.g. slow scanning)
Data Mining Techniques
MINDS
Time-window based features
count-dest ... Number of flows to unique destination IP addresses inside the network in the last T seconds from the same source
count-src ... Number of flows from unique source IP addresses inside the network in the last T seconds to the same destination
count-serv-src ... Number of flows from the source IP to the same destination port in the last T seconds
count-serv-dest ... Number of flows to the destination IP address using the same source port in the last T seconds
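As an illustration of this kind of derived feature, the following minimal Python sketch (my own illustration, not MINDS code; the flow tuple layout, the function name and the sample addresses are assumptions) computes a count-serv-src-style feature over a sliding T-second window:

```python
from collections import deque

def count_serv_src(flows, t_window):
    """For each flow, count the earlier flows within the last t_window
    seconds that share the same source IP and destination port
    (a count-serv-src-style feature). `flows` is a list of
    (timestamp, src_ip, dst_ip, dst_port) tuples sorted by timestamp."""
    recent = deque()          # flows still inside the sliding time window
    counts = []
    for ts, src, dst, dport in flows:
        while recent and ts - recent[0][0] > t_window:
            recent.popleft()  # drop flows older than the window
        counts.append(sum(1 for r in recent if r[1] == src and r[3] == dport))
        recent.append((ts, src, dst, dport))
    return counts

flows = [
    (0.0, "10.0.0.1", "10.0.0.9", 445),
    (1.0, "10.0.0.1", "10.0.0.10", 445),
    (2.0, "10.0.0.1", "10.0.0.11", 445),
    (9.0, "10.0.0.2", "10.0.0.9", 80),
]
print(count_serv_src(flows, t_window=3.0))  # → [0, 1, 2, 0]
```

The fast-scanning source 10.0.0.1 accumulates a growing count inside the window, while the unrelated later flow scores 0.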
Data Mining Techniques
MINDS
Connection-window based features
count-dest-conn ... Number of flows to unique destination IP addresses inside the network in the last N flows from the same source
count-src-conn ... Number of flows from unique source IP addresses inside the network in the last N flows to the same destination
count-serv-src-conn ... Number of flows from the source IP to the same destination port in the last N flows
count-serv-dest-conn ... Number of flows to the destination IP address using the same source port in the last N flows
Data Mining Techniques
MINDS
MINDS Operation Steps
2. Signature based detection for known attacks
• Detected attacks are not further analyzed. Not our focus.
3. Anomaly detection
• Outlier detection assigns an anomaly score to network flows
• Humans can analyze only the most anomalous connections
4. Association pattern analysis
• Summarization of highly anomalous network connections
• Humans decide whether the summaries can be used to create signatures for detection in step 2
Data Mining Techniques
MINDS
MINDS Anomaly Detection
Local Outlier Factor (LOF)
• Assigned to each data point
• It is local: outliers with respect to their neighborhood
LOF Calculation
1. Compute the density of each point’s neighborhood
2. Calculate the ratio of the point’s density and the density of each of its neighbors
3. Find the average of the density ratios over all its neighbors
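The three steps above can be sketched in a few lines of Python (a toy illustration of the LOF idea only, not the MINDS implementation; the helper names and the sample points are my own, and ties among neighbors are resolved naively):

```python
import math

def knn(points, i, k):
    """k nearest neighbors of points[i] (Euclidean); returns (indices, k-distance)."""
    d = sorted((math.dist(points[i], q), j)
               for j, q in enumerate(points) if j != i)
    return [j for _, j in d[:k]], d[k - 1][0]

def lof(points, k):
    neigh, kdist = zip(*(knn(points, i, k) for i in range(len(points))))
    # Step 1: local reachability density = inverse of the mean
    # reachability distance to the k nearest neighbors
    def lrd(i):
        return 1.0 / (sum(max(kdist[j], math.dist(points[i], points[j]))
                          for j in neigh[i]) / k)
    dens = [lrd(i) for i in range(len(points))]
    # Steps 2-3: average the ratio of each neighbor's density
    # to the point's own density
    return [sum(dens[j] for j in neigh[i]) / (k * dens[i])
            for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]   # (8, 8) is an outlier
print([round(s, 2) for s in lof(pts, k=2)])      # → [1.0, 1.0, 1.0, 1.0, 10.26]
```

Points inside the cluster score close to 1, while the isolated point gets a score far above 1, which is exactly the behavior the anomaly score relies on.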
Data Mining Techniques
Outlier Detection
Outlier Detection
Minnesota Intrusion Detection System
Data Mining Techniques
Outlier Detection
2-D Outlier Detection Example
In the example, the points of the dense cluster are closer to each other than p2 is to that cluster, yet p2 would not be considered an outlier by a simple nearest-neighbor approach based on computing distances; such approaches fail in these scenarios. The example point p1 may be detected as an outlier using only the distances to the nearest neighbor. LOF, on the other hand, is able to capture both outliers (p1 and p2) due to the fact that it considers the density around the points.
[Figure 5: Advantages of the LOF approach — points p1 and p2 relative to a sparse cluster and a dense cluster.]
Source: Data Warehousing and Data Mining Techniques for Cyber Security
Data Mining Techniques
Outlier Detection
Local Outlier Factor Calculation
Reachability Distance
Definition 5 (reachability distance of an object p w.r.t. object o):
Let k be a natural number. The reachability distance of object p with respect to object o is defined as
reach-dist_k(p, o) = max { k-distance(o), d(p, o) }
Here k-distance(o) is the distance from o to its k-th closest point q: at least k points are not farther from o than q, and at most k-1 points are closer than q.
Figure 2 of the LOF paper illustrates the idea of reachability distance for k = 4. Intuitively, if object p is far away from o (e.g. p2 in the figure), the reachability distance between the two is simply their actual distance. However, if they are “sufficiently” close (e.g. p1 in the figure), the actual distance is replaced by the k-distance of o, so reach-dist_k(p1, o) = k-distance(o). In so doing, the statistical fluctuations of d(p, o) for all points p close to o can be significantly reduced. The strength of this smoothing effect can be controlled by the parameter k: the higher the value of k, the more similar the reachability distances for objects within the same neighborhood.
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
Local Outlier Factor Calculation
Definition 6 (local reachability density of an object p):
The local reachability density of p is defined as
lrd_MinPts(p) = 1 / ( ( Σ_{o ∈ N_MinPts(p)} reach-dist_MinPts(p, o) ) / |N_MinPts(p)| )
Intuitively, the local reachability density of an object p is the inverse of the average reachability distance based on the MinPts-nearest neighbors of p; it is used as a measure of the density in the neighborhood of p. (The local density can be ∞ if all the reachability distances are 0, i.e. if there are duplicates of p; duplicates can be handled by basing the notion of neighborhood on a k-distinct-distance, defined analogously to k-distance with the additional requirement that there be at least k objects with different spatial coordinates.)
Definition 7 ((local) outlier factor of an object p):
The (local) outlier factor of p is defined as
LOF_MinPts(p) = ( Σ_{o ∈ N_MinPts(p)} lrd_MinPts(o) / lrd_MinPts(p) ) / |N_MinPts(p)|
MinPts is a parameter (the k used before). The outlier factor of object p captures the degree to which we call p an outlier. It is the average of the ratio of the local reachability density of p and those of p’s MinPts-nearest neighbors. It is easy to see that the lower p’s local reachability density is, and the higher the local reachability densities of p’s MinPts-nearest neighbors are, the higher the LOF value of p. (To simplify notation, the subscript MinPts is dropped from reach-dist, lrd and LOF if no confusion arises.)
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
LOF Bounds
Example (MinPts = 3): let dmin and dmax denote the minimum and maximum reachability distances from p to its MinPts-nearest neighbors (the “direct” distances), and imin and imax the minimum and maximum reachability distances among those neighbors themselves (the “indirect” distances). If dmin = 4 ∗ imax and dmax = 6 ∗ imin, then 4 ≤ LOF_MinPts(p) ≤ 6.
Figure 3: Illustration of theorem 1
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
LOF Bounds
In particular, we hope to label o2 as outlying, but label all objects in the cluster C1 as non-outlying. Below, we show that for most objects in C1 the LOF is approximately 1, indicating that they cannot be labeled as outlying.
Lemma 1: Let C be a collection of objects. Let reach-dist-min denote the minimum reachability distance of objects in C, i.e., reach-dist-min = min { reach-dist(p, q) | p, q ∈ C }. Similarly, let reach-dist-max denote the maximum reachability distance of objects in C. Let ε be defined as (reach-dist-max / reach-dist-min − 1).
Then for all objects p ∈ C, such that:
(i) all the MinPts-nearest neighbors q of p are in C, and
(ii) all the MinPts-nearest neighbors o of q are also in C,
it holds that 1/(1 + ε) ≤ LOF(p) ≤ (1 + ε).
Proof (Sketch): For all MinPts-nearest neighbors q of p, reach-dist(p, q) ≥ reach-dist-min, so the local reachability density of p, as per definition 6, is ≤ 1/reach-dist-min. On the other hand, reach-dist(p, q) ≤ reach-dist-max, so the local reachability density of p is ≥ 1/reach-dist-max; since the same bounds hold for p’s neighbors, the ratios in definition 7 lie between 1/(1 + ε) and (1 + ε).
Meaning: LOF(p) is close to 1 if p is in the same cluster as its neighbors (not an outlier).
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
LOF Bounds
Among the MinPts-nearest neighbors of p (the “direct” neighbors), we in turn find their minimum and maximum reachability distances to their own MinPts-nearest neighbors; in Figure 3, these indirect_min(p) and indirect_max(p) values are marked as imin and imax.
Theorem 1: Let p be an object from the database D, and 1 ≤ MinPts ≤ |D|. Then it is the case that
direct_min(p) / indirect_max(p) ≤ LOF(p) ≤ direct_max(p) / indirect_min(p)
• direct_min(p) ... minimum reachability distance from p to its MinPts closest neighbors
• direct_max(p) ... maximum such reachability distance
• indirect_min/max(p) ... the same for the MinPts closest neighbors of the MinPts closest neighbors
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
LOF Bounds
pct = relative distance from the center; pct is small for points with neighbors in the same cluster.
[Figure 4: Upper and lower bounds on LOF (LOFmax and LOFmin) as functions of the ratio direct/indirect, plotted for pct = 1%, 5% and 10%.]
Source: LOF: Identifying Density-Based Local Outliers
Data Mining Techniques
Outlier Detection
LOF Bounds
pct = relative distance from the center
• pct is small for a point p with its MinPts neighbors in the same cluster
  • no matter whether p is in that cluster or not
  • thus the bounds are good for both cases
Outliers:
• p is NOT in the same cluster as its closest neighbors
• LOF is large, bounds are tight
Non-outliers:
• LOF is close to 1, bounds are tight
Data Mining Techniques
Outlier Detection
Other Data Mining Methods for Outlier Detection
Distance to the k-th closest neighbor
• D^k(p) ... distance from the point p to its k-th nearest neighbor
• outliers: n points with the largest D^k(p) values
Nearest neighbor (NN)
• Same as above with k = 1
• An “outlier threshold” determines whether a point is an outlier. The threshold is based on the training (“normal behavior”) data and is set to the 2% level: for all training points, distances to their nearest neighbors are computed and sorted. Test data points with distances to their nearest neighbors greater than the threshold are detected as outliers.
Mahalanobis-distance Based Outlier Detection
• normal data determined in training; since the training data corresponds to “normal behavior”, it is straightforward to compute the mean μ and the covariance matrix Σ of the “normal” data
• outliers: n points farthest from the mean μ of normal data
• Mahalanobis distance between a point p and the mean: d_M = sqrt( (p − μ)^T Σ^{-1} (p − μ) )
• Similarly to the previous approach, the threshold is based on training data.
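A minimal pure-Python sketch of Mahalanobis-distance based outlier ranking for 2-D data (my own illustration; the function name and sample points are assumptions, and a real system would use a linear algebra library for higher dimensions):

```python
import math

def mahalanobis_outliers(normal, test, n_outliers):
    """Rank 2-D test points by Mahalanobis distance from the mean of the
    'normal' training data; return the n_outliers most distant points."""
    m = len(normal)
    mx = sum(p[0] for p in normal) / m
    my = sum(p[1] for p in normal) / m
    # sample covariance matrix of the normal data
    sxx = sum((p[0] - mx) ** 2 for p in normal) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in normal) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in normal) / (m - 1)
    det = sxx * syy - sxy * sxy
    # inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    def dist(p):
        dx, dy = p[0] - mx, p[1] - my
        return math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy)
    return sorted(test, key=dist, reverse=True)[:n_outliers]

normal = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]  # "normal behavior" data
print(mahalanobis_outliers(normal, [(0.4, 0.6), (5, 5)], n_outliers=1))  # → [(5, 5)]
```

The point far from the mean of the training data is ranked first, while the point inside the normal region is not flagged.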
Data Mining Techniques
Outlier Detection
Advantage of LOF-based Detection
Nearest Neighbor Approach
• points in C1 are far apart
• p2 is closer to C2 than the points in C1 are to each other
• p2 will NOT be detected as an outlier, but p1 will
LOF Approach
• considers the density of clusters
• both p1 and p2 WILL be detected as outliers
Figure 5. Advantages of the LOF approach
Source: Data Warehousing and Data Mining Techniques for Cyber Security
Data Mining Techniques
MINDS Performance
MINDS Performance
Data Mining Techniques
MINDS Performance
Computational Complexity
Neighborhoods around all data points:
• Calculate distances between all pairs of data points
• This is an O(n²) calculation – computationally infeasible for millions of data points
Sampling a training data set from the observed data
• Compare all data points to this small set
• Reduced complexity: O(data size × sample size)
• Additional benefit: anomalous behavior will not be seen often in the sample, so it will not be mistaken for normal behavior
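The sampling idea can be sketched in Python (my own illustration, not the MINDS code; here the anomaly score is simply the distance to the k-th nearest sampled point rather than the full LOF computation):

```python
import math
import random

def sampled_knn_scores(data, sample_size, k, seed=0):
    """Anomaly score = distance to the k-th nearest point of a fixed
    random sample: O(len(data) * sample_size) work instead of the
    O(n^2) all-pairs computation (requires k <= sample_size - 1)."""
    sample = random.Random(seed).sample(data, sample_size)
    scores = []
    for p in data:
        dists = sorted(math.dist(p, q) for q in sample if q != p)
        scores.append(dists[k - 1])
    return scores

# dense grid of "normal" points plus one distant outlier
data = [(0.1 * (i % 7), 0.1 * (i // 7)) for i in range(49)] + [(10.0, 10.0)]
scores = sampled_knn_scores(data, sample_size=10, k=3)
print(scores.index(max(scores)))  # → 49, the index of the outlier
```

Because the outlier is rare, it is unlikely to dominate the sample, so it is compared against typical points and receives the top score.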
Data Mining Techniques
MINDS Performance
Performance of MINDS
Worms
• October 10, 2002: MINDS detected two activities of the slapper worm that were not identified by SNORT since they were new variations of an existing worm code
Scanning and DoS activities
• August 9, 2002: CERT/CC issued an alert for “widespread scanning and possible denial of service activity targeted at the Microsoft-DS service on port 445/TCP”
• August 13, 2002: Network connections related to such scanning were the top ranked outliers in MINDS
• The port scan module of SNORT failed (slow scanning)
Data Mining Techniques
MINDS Performance
Performance of MINDS
Policy Violations
• August 8 and 10, 2002: MINDS detected a machine running a Microsoft PPTP VPN server, and another one running an FTP server on non-standard ports.
• Both policy violations were the top ranked outliers since they are not allowed, and therefore very rare.
Data Mining Techniques
MINDS Performance
Performance of MINDS
Policy Violations
• February 6, 2003: unsolicited ICMP echo reply messages to a computer previously infected with the Stacheldraht worm (a DDoS agent) were detected by MINDS.
• The infected machine had been removed from the network, but other infected machines outside the local network were still trying to talk to it.
Data Mining Techniques
MINDS Performance
Performance of MINDS vs. SNORT
Scanning
• SNORT rule: 3 seconds window / more than 4 destination IPs for one source IP
• SNORT requires longer windows for slow scans
• MINDS detects fast and slow scans automatically
• MINDS detects outbound scans, SNORT does not by default
Policy Violations
• SNORT requires a rule for each violation
• MINDS detects any rare unusual behavior
Data Mining Techniques
Mining Association Rules
Mining Association Rules
Summarizing Anomalous Connections Using
Association Rules
Data Mining Techniques
Mining Association Rules
Summarizing Anomalous Connections
Using Association Rules
Association Rules
• Implications X → Y (for binary features X and Y)
• Example: {Bread, Butter} → {Milk}
Mining Association Rules
• Find all rules with specified support (percentage in data)
• And with specified confidence = estimate of P(Y|X)
Extracted rules
• The item sets supporting the extracted rules are called “Frequent Item Sets”
Data Mining Techniques
Mining Association Rules
Summarizing Anomalous Connections
Using Association Rules
Example rule: {Bread, Butter} → {Milk}
Assume
• 10% of transactions contain bread and butter
• 6% of the transactions contain bread, butter, and milk
Support of the rule
• Percentage of data with the rule: 6%
Confidence of the rule
• Estimate of P({Milk} | {Bread, Butter}): 6% / 10% = 60%
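The arithmetic above can be reproduced with a few lines of Python (a toy transaction list of my own construction matching the slide’s 10% / 6% numbers):

```python
def support_confidence(transactions, lhs, rhs):
    """Support and confidence of the association rule lhs -> rhs."""
    lhs, both = set(lhs), set(lhs) | set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / len(transactions), n_both / n_lhs

# 50 transactions: 5 contain {bread, butter} (10%), 3 of those also milk (6%)
transactions = ([["bread", "butter", "milk"]] * 3
                + [["bread", "butter"]] * 2
                + [["milk"]] * 45)
print(support_confidence(transactions, ["bread", "butter"], ["milk"]))
# → (0.06, 0.6)
```

Support counts transactions containing all items of the rule; confidence divides that by the count of transactions containing the left-hand side.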
Data Mining Techniques
Mining Association Rules
Summarizing Anomalous Connections
Using Association Rules
Example rule: {Bread, Butter} → {Milk}
• Support of the rule: 6%
• Confidence of the rule: 60%
If the requirement is 1% support and 50% confidence
• then the rule will be extracted
• Frequent Item Set: {Bread, Butter, Milk}
Data Mining Techniques
Mining Association Rules
Summarizing Anomalous Connections
Using Association Rules
Association patterns
• specified as frequent item sets or association rules
Summary of detected anomalous flows:
• Humans can analyze the summary
• E.g. a frequent set for scanning: {srcIP = X, dstPort = Y}
• This is a candidate signature for a signature-based system
Constructing a profile of normal network traffic:
• Identify sets of features that are found in normal traffic
• E.g. web browsing: {protocol = TCP, dstPort = 80, NumPackets = 3...6}
Data Mining Techniques
Mining Association Rules
Challenges in Mining Association Rules
Imbalanced class distribution
• Small support for attack, large for normal traffic
Binarization and grouping of attribute values
• Supervised or unsupervised
Pruning the redundant patterns
• Must be subsets with similar support
Finding discriminating patterns
• Patterns that identify attacks and normal traffic
Grouping the discovered patterns