Data Mining & Intrusion Detection
Shan Bai
Instructor: Dr. Yingshu Li
CSC 8712, Spring 08
1
Outline

Intrusion Detection

Data Mining

Data Mining in Intrusion Detection

Reference
2
What is an intrusion?

An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.

[Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center, 1990 to 2002]
[Figure: Spread of the SQL Slammer worm 10 minutes after its deployment]
3
Intrusion Examples

• DOS
– denial-of-service
• R2L
– unauthorized access from a remote machine, e.g. guessing password
• U2R
– unauthorized access to local super user (root) privileges, e.g., various "buffer overflow" attacks
• Probing
– surveillance and other probing, e.g., port scanning
• Trojan horse / worm
• Address spoofing
– a malicious user uses a fake IP address to send malicious packets to a target
• Many others…
4
Intrusion Detection System (IDS)

An intrusion detection system is a combination of software and hardware that attempts to perform intrusion detection and raises an alarm when a possible intrusion happens.
5
IDS Categories

Intrusion detection systems are split into two groups:
• Anomaly detection systems
– Identify malicious traffic based on deviations from established normal network behavior.
• Misuse detection systems
– Identify intrusions based on known patterns (signatures) of malicious activity.
6
Anomaly Detection

[Figure: activity measures (e.g., CPU, process size) compared against a normal profile; abnormal values indicate a probable intrusion]

• Baseline the normal traffic and then look for things that are out of the norm.
• Relatively high false positive rate: anomalies can just be new normal activities.
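
The baselining idea can be sketched in a few lines of Python. This is only an illustration, not the method of any system named in these slides: the mean-plus-k-standard-deviations rule, the function names, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def build_profile(normal_values):
    """Summarize attack-free measurements as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Flag a measurement that deviates more than k standard deviations
    from the normal profile (k=3 is an assumed, tunable threshold)."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma

# Hypothetical CPU utilization samples collected during normal operation.
profile = build_profile([20, 22, 19, 21, 23, 20])
print(is_anomalous(22, profile))  # within the normal band
print(is_anomalous(85, profile))  # far outside the normal band
```

A legitimate new workload would also exceed the threshold here, which is the false positive problem noted on the slide.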
7
Misuse Detection

[Figure: intrusion activities are pattern-matched against known intrusion patterns]

• Example: if (src_ip == dst_ip) then “land attack”
• Look for known indicators: ICMP scans, port scans, connection attempts; CPU, RAM, and I/O utilization; file system activity; modification of system files; permission modifications.
• Can’t detect new attacks.
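
As a minimal sketch of signature matching, the "land attack" rule from the slide can be written as a predicate over packet fields. The `Packet` and `Signature` types and the sample addresses are hypothetical, introduced only for this example; real signature-based tools use much richer rule languages.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int

@dataclass(frozen=True)
class Signature:
    name: str
    matches: Callable[[Packet], bool]

# Known patterns; only the rule quoted on the slide is encoded here.
SIGNATURES: List[Signature] = [
    Signature("land attack", lambda p: p.src_ip == p.dst_ip),
]

def match_signatures(packet: Packet, signatures=SIGNATURES) -> List[str]:
    """Return the names of all known signatures the packet matches."""
    return [s.name for s in signatures if s.matches(packet)]

print(match_signatures(Packet("10.0.0.5", "10.0.0.5", 80)))
print(match_signatures(Packet("10.0.0.5", "10.0.0.9", 80)))
```

A packet that matches no stored signature produces an empty list, which is why this approach cannot flag attacks it has never seen.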
8


Goal of Intrusion Detection Systems (IDS):
• To detect an intrusion as it happens and be able to respond to it.
False positives:
• A false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion.
• Too many false positives: users will quit monitoring the IDS because of the noise.
False negatives:
• A false negative is a situation where an intrusion is really happening, but the IDS doesn't catch it.
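
The two error types can be counted by comparing IDS alerts with ground truth. The sketch below assumes labeled data is available as two parallel lists of booleans; the function names and sample values are illustrative only.

```python
def confusion_counts(alerts, intrusions):
    """Count outcomes for equal-length lists of booleans:
    alerts[i] is True if the IDS raised an alarm for event i,
    intrusions[i] is True if event i really was an intrusion."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for alerted, intruded in zip(alerts, intrusions):
        if alerted and intruded:
            counts["tp"] += 1      # intrusion caught
        elif alerted:
            counts["fp"] += 1      # false positive: alarm without intrusion
        elif intruded:
            counts["fn"] += 1      # false negative: missed intrusion
        else:
            counts["tn"] += 1      # normal event, no alarm
    return counts

def false_positive_rate(counts):
    """Fraction of non-intrusive events that still triggered an alarm."""
    negatives = counts["fp"] + counts["tn"]
    return counts["fp"] / negatives if negatives else 0.0

counts = confusion_counts(
    alerts=[True, True, False, False, True],
    intrusions=[True, False, True, False, False],
)
print(counts, false_positive_rate(counts))
```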
9
Outline

Intrusion Detection

Data Mining

Data Mining in Intrusion Detection

Reference
10
Why do we need Data Mining?

• Despite the enormous amount of data, particular events of interest are still quite rare: their frequency ranges from 0.1% to less than 10%.
• We are drowning in data, but starving for knowledge!
11
Data Mining vs. KDD

• Knowledge Discovery in Databases (KDD): the whole process of finding useful information and patterns in data
• Data Mining: use of algorithms to extract the information and patterns derived by the KDD process
• Data mining is the core of the knowledge discovery process
12
KDD Process

• Selection: Obtain data from various sources.
• Preprocessing: Cleanse data.
• Transformation: Convert to common format. Transform to new format.
• Data Mining: Obtain desired results.
• Interpretation/Evaluation: Present results to user in a meaningful manner.
13
Data Mining: A KDD Process
– Data mining: core of the knowledge discovery process

[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
14
Typical Data Mining Architecture

[Figure: layered architecture, top to bottom: graphical user interface; pattern evaluation; data mining engine, with a knowledge base; database or data warehouse server; data cleaning, data integration, and filtering; databases and data warehouse]
15
Outline

Intrusion Detection

Data Mining

Data Mining in Intrusion Detection

Reference
16

Network intrusion detection
The number of intrusions on the network is typically a very small fraction of the total network traffic.
17
Why Can Data Mining Help?

• Learn from traffic data
– Supervised learning: learn precise models from past intrusions
– Unsupervised learning: identify suspicious activities
• Maintain models on dynamic data
• Correlation of suspicious events across network sites
– Helps detect sophisticated attacks not identifiable by single-site analyses
• Analysis of long-term data (months/years)
– Uncover suspicious stealth activities (e.g. insiders leaking/modifying information)
18
Intrusion Detection

• Traditional intrusion detection system (IDS) tools (e.g. SNORT) are based on signatures of known attacks
• Limitations
– The signature database has to be manually revised for each new type of discovered intrusion
– They cannot detect emerging cyber threats
– Substantial latency in deployment of newly created signatures across the computer system
19
Data Mining for Intrusion Detection:
Techniques and Applications




Frequent pattern mining
Classification
Clustering
Mining data streams
20
Frequent pattern mining

• Patterns that occur frequently in a database
• Mining frequent patterns: finding regularities
• Process of mining frequent patterns for intrusion detection
– Phase I: mine a repository of normal frequent itemsets from attack-free data
– Phase II: find frequent itemsets in the last n connections and compare the patterns to the normal profile
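
The two phases can be illustrated with a small sketch. It assumes each connection record is a tuple of string items such as "tcp" or "port=80" (a made-up encoding), counts itemsets by brute force rather than with an optimized miner, and treats any recently frequent itemset missing from the normal profile as suspicious.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(connections, min_support, max_size=2):
    """Return itemsets (as sorted tuples) that occur in at least
    min_support (a fraction) of the connection records."""
    counts = Counter()
    for record in connections:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(connections)
    return {itemset for itemset, c in counts.items() if c >= threshold}

def suspicious_itemsets(normal_profile, recent_connections, min_support):
    """Phase II: frequent itemsets of recent traffic not seen in the profile."""
    return frequent_itemsets(recent_connections, min_support) - normal_profile

# Phase I: build the normal profile from (hypothetical) attack-free data.
normal = [("tcp", "port=80"), ("tcp", "port=80"), ("udp", "port=53")]
profile = frequent_itemsets(normal, min_support=0.5)

# Phase II: compare the last n connections against the profile.
recent = [("tcp", "port=8888")] * 3 + [("tcp", "port=80")]
print(suspicious_itemsets(profile, recent, min_support=0.5))
```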
21
Frequent pattern mining
Apriori:
• Any subset of a frequent itemset must also be frequent (an anti-monotone property)
– A transaction containing {beer, diaper, nuts} also contains {beer, diaper}
– If {beer, diaper, nuts} is frequent, then {beer, diaper} must also be frequent
• No superset of any infrequent itemset should be generated or tested
– Many item combinations can be pruned
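
A compact, unoptimized sketch of level-wise Apriori mining shows where the pruning happens. The function name, the absolute `min_count` threshold, and the sample transactions are assumptions for illustration; production implementations use more efficient candidate generation and counting.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset contained in at least
    min_count transactions, growing candidates one item at a time."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # so supersets of infrequent itemsets are never counted.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_count}
    return frequent

baskets = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "nuts", "diaper"},
    {"milk"},
]
for itemset, n in sorted(apriori(baskets, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```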
22
Sequential Pattern Analysis

• Models sequence patterns
• (Temporal) order is important in many situations
– Time-series databases and sequence databases
– Frequent patterns → (frequent) sequential patterns
• Sequential patterns for intrusion detection
– Capture the signatures for attacks in a series of packets
23
Sequential Pattern Mining
Given a set of sequences, find the complete set
of frequent subsequences
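
The problem statement can be made concrete with a brute-force sketch that enumerates every subsequence up to a small length and counts how many input sequences contain it. This is an illustration of the definition only, not an efficient miner; the event names and the `max_len` cap are assumptions.

```python
from collections import Counter
from itertools import combinations

def subsequences(seq, max_len):
    """All distinct order-preserving subsequences of seq up to max_len items."""
    found = set()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), n):
            found.add(tuple(seq[i] for i in idx))
    return found

def frequent_subsequences(sequences, min_count, max_len=3):
    """Return {subsequence: count} for subsequences that appear in at least
    min_count of the input sequences (each sequence counted once)."""
    counts = Counter()
    for seq in sequences:
        counts.update(subsequences(seq, max_len))
    return {p: c for p, c in counts.items() if c >= min_count}

# Hypothetical event sequences, e.g. packet types seen per connection.
sessions = [
    ["syn", "ack", "fin"],
    ["syn", "fin"],
    ["ack", "syn", "fin"],
]
print(frequent_subsequences(sessions, min_count=3))
```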
24
Apriori Property in Sequences
25
Classification: A Two-Step Process

• Model construction: describe a set of predetermined classes
– Training dataset: tuples for model construction
– Each tuple/sample belongs to a predefined class
– Classification rules, decision trees, or math formulae
• Model application: classify unseen objects
– Estimate accuracy of the model using an independent test set
– Acceptable accuracy → apply the model to classify data tuples with unknown class labels
26
Classification
27
Classification: Decision Tree

• A node in the tree: a test of some attribute
• A branch: a possible value of the attribute
• Classification
– Start at the root
– Test the attribute
– Move down the tree branch
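
The traversal described above can be sketched with a tree stored as nested dictionaries. The tree below is hand-written and hypothetical (attribute names, values, and labels are invented for the example); in practice the tree would be learned from a training dataset.

```python
# Internal nodes test one attribute; leaves are class labels (strings).
TREE = {
    "attribute": "protocol",
    "branches": {
        "icmp": "intrusive",
        "tcp": {
            "attribute": "flag",
            "branches": {"SF": "normal", "S0": "intrusive"},
        },
    },
}

def classify(node, record):
    """Start at the root, test the node's attribute, and follow the branch
    for the record's value until a leaf (class label) is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node

print(classify(TREE, {"protocol": "tcp", "flag": "SF"}))
print(classify(TREE, {"protocol": "tcp", "flag": "S0"}))
```

An attribute value with no matching branch raises `KeyError` in this sketch; a fuller version would fall back to a default class.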
28
Neural classification: HIDE

• “A hierarchical network intrusion detection system using statistical processing and neural network classification” by Zheng et al.
• Five major components
– Probes collect traffic data
– Event preprocessor preprocesses traffic data and feeds the statistical model
– Statistical processor maintains a model for normal activities and generates vectors for new events
– Neural network classifies the vectors of new events
– Post processor generates reports
29
Clustering


What Is Clustering?
Group data into clusters



– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
– Unsupervised learning: no predefined classes
30
Clustering

What Is A Good Clustering?

• High intra-class similarity and low inter-class similarity
– Depending on the similarity measure
• The ability to discover some or all of the hidden patterns
31
Clustering

Clustering Approaches

• Partitioning algorithms
– Partition the objects into k clusters
– Iteratively reallocate objects to improve the clustering
• Hierarchical algorithms
– Agglomerative: each object is a cluster, merge clusters to form larger ones
– Divisive: all objects are in a cluster, split it up into smaller clusters
32
Clustering

K-Means: Example
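
Since the slide's worked figure is not preserved in the transcript, here is a minimal k-means sketch in its place. It assumes 2-D points, Euclidean distance, and caller-supplied initial centroids; the data and starting centroids are invented for illustration.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]   # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:           # assignments have stabilized
            break
        centroids = updated
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids; this sketch does not attempt random restarts or smarter seeding.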
33
Mining Data Streams for Intrusion Detection

• Maintaining profiles of normal activities
– The profiles of normal activities may drift
• Identifying novel attacks
– Identifying clusters and outliers in traffic data streams
• Reduce the future alarm load by writing filtering rules that automatically discard well-understood false positives
34
Data Mining for Intrusion Detection
• Misuse detection
– Predictive models are built from labeled data sets (instances are labeled as “normal” or “intrusive”)
– These models can be more sophisticated and precise than manually created signatures
– Recent research, e.g. JAM (Java Agents for Metalearning)
35
JAM (Java Agents for Metalearning)

• JAM (developed at Columbia University) uses data mining techniques to discover patterns of intrusions. It then applies a meta-learning classifier to learn the signature of attacks.
• The association rules algorithm determines relationships between fields in the audit trail records, and the frequent episodes algorithm models sequential patterns of audit events. Features are then extracted from both algorithms and used to compute models of intrusion behavior.
• The classifiers build the signatures of attacks. Thus, data mining in JAM builds a misuse detection model.
• Classifiers in JAM are generated by running a rule learning program on training data of system usage. After training, the resulting classification rules are used to recognize anomalies and detect known intrusions.
• The system has been tested with data from Sendmail-based attacks, and with network attacks using TCP dump data.
37
Data Mining for Intrusion Detection
• Anomaly detection
– Identifies anomalies as deviations from “normal” behavior
– E.g. ADAM: Audit Data Analysis and Mining; MINDS: MINnesota INtrusion Detection System
38
ADAM: Audit Data Analysis and Mining
Detecting Intrusion by Data Mining
Combination of Association Rule and Classification Rule

• First, ADAM collects known frequent datasets using an off-line algorithm
• Second, ADAM runs an online algorithm
– Finds the last frequent connection records
– Compares them with the known mined data
– Discards those that seem to be normal
– Forwards suspicious ones to the classifier
• The trained classifier then classifies the suspicious data as one of the following:
– Known type of attack
– Unknown type of attack
– False alarm
40
ADAM: Detecting Intrusion by Data
Mining
41
ADAM: Audit Data Analysis and Mining

ADAM's model has two phases:
• 1st phase: train the classifier
– Offline process
– Takes place only once
– Before the main experiment
• 2nd phase: use the trained classifier
– The trained classifier is then used to detect anomalies
– Online process
42
The MINDS Project

MINDS: MINnesota INtrusion Detection System
• Learning from Rare Class: building rare class prediction models
• Anomaly/outlier detection
• Summarization of attacks using association pattern analysis

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
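
The strength of the two discovered rules can be checked directly against the five transactions in the table. The sketch below uses the standard definitions of support (fraction of transactions containing an itemset) and confidence (support of the whole rule divided by support of its left-hand side); the function names are chosen for this example.

```python
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions=TRANSACTIONS):
    """Estimated probability of rhs given lhs: support(lhs and rhs) / support(lhs)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), transactions) / lhs_support

print(support({"Milk", "Coke"}), confidence({"Milk"}, {"Coke"}))
print(support({"Diaper", "Milk", "Beer"}), confidence({"Diaper", "Milk"}, {"Beer"}))
```

On this table, {Milk} --> {Coke} holds in 3 of the 4 transactions containing Milk, and {Diaper, Milk} --> {Beer} in 2 of the 3 transactions containing both Diaper and Milk.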
43
MINDS - Learning from Rare Class

Problem: Building models for rare network attacks (mining a needle in a haystack)
• Standard data mining models are not suitable for rare classes
• Models must be able to handle skewed class distributions
• Learning from data streams: intrusions are sequences of events
44
MINDS - Anomaly Detection

• Detect novel attacks/intrusions by identifying them as deviations from “normal”, i.e. anomalous behavior
– Identify normal behavior
– Construct useful set of features
– Define similarity function
• Use outlier detection algorithm
– Nearest neighbor approach
– Density based schemes
– Unsupervised Support Vector Machines (SVM)
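
To illustrate the nearest neighbor approach only (this is not claimed to be the algorithm MINDS uses), the sketch below scores each point by its distance to its k-th nearest neighbor. Euclidean distance stands in for the similarity function, and the feature vectors are invented.

```python
from math import dist

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor;
    larger scores mean the point lies farther from the rest of the data."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical 2-feature connection records; the last one is far from the rest.
records = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(records, k=2)
most_anomalous = max(range(len(records)), key=lambda i: scores[i])
print(records[most_anomalous], scores[most_anomalous])
```

This brute-force version compares every pair of points, so it is only practical for small samples.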
45
Experimental Evaluation
• Publicly available data set
– The DARPA 1998 Intrusion Detection Evaluation Data Set, prepared and managed by MIT Lincoln Lab, includes a wide variety of intrusions simulated in a military network environment
• Real network data from the University of Minnesota
– Anomaly detection is applied 4 times a day, with a 10-minute time window

[Figure: net-flow data collected from the network using CISCO routers (10-minute cycle, 2 million connections) pass through data preprocessing into MINDS anomaly detection, which produces anomaly scores for association pattern analysis. The figure also labels Snort (www.snort.org), an open source signature-based network IDS.]
46
MINDS - Framework for Mining Associations

[Figure: MINDS association analysis module. The anomaly detection system outputs ranked connections, split into attack and normal sets. A discriminating association pattern generator produces rules for the knowledge base (R1: TCP, DstPort=1863 → Attack; …; R100: TCP, DstPort=80 → Normal), which is fed back as an update.]

1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack
47
Discovered Real-life Association Patterns
Rule 1: SrcIP=XXXX, DstPort=80, Protocol=TCP, Flag=SYN,
NoPackets: 3, NoBytes:120…180 (c1=256, c2 = 1)
Rule 2: SrcIP=XXXX, DstIP=YYYY, DstPort=80, Protocol=TCP,
Flag=SYN, NoPackets: 3, NoBytes: 120…180 (c1=177, c2 = 0)



• At first glance, Rule 1 appears to describe a Web scan
• Rule 2 indicates an attack on a specific machine
• Both rules together indicate that a scan is performed first, followed by an attack on a specific machine identified as vulnerable by the attacker
48
Discovered Real-life Association Patterns
DstIP=ZZZZ, DstPort=8888, Protocol=TCP (c1=369, c2=0)
DstIP=ZZZZ, DstPort=8888, Protocol=TCP, Flag=SYN (c1=291, c2=0)



• This pattern indicates an anomalously high number of TCP connections on port 8888 involving machine ZZZZ
• Follow-up analysis of connections covered by the pattern indicates that this could be a machine running a variation of the Kazaa file-sharing protocol
• Having an unauthorized application increases the vulnerability of the system
49
Discovered Real-life Association Patterns…(ctd)
SrcIP=XXXX, DstPort=27374, Protocol=TCP, Flag=SYN, NoPackets=4,
NoBytes=189…200 (c1=582, c2=2)
SrcIP=XXXX, DstPort=12345, NoPackets=4, NoBytes=189…200
(c1=580, c2=3)
SrcIP=YYYY, DstPort=27374, Protocol=TCP, Flag=SYN, NoPackets=3,
NoBytes=144 (c1=694, c2=3)
……

• This pattern indicates a large number of scans on ports 27374 (which is a signature for the SubSeven worm) and 12345 (which is a signature for the NetBus worm)
• Further analysis showed that no fewer than five machines were scanning for one or both of these ports in any time window
50
Discovered Real-life Association Patterns…(ctd)
DstPort=6667, Protocol=TCP (c1=254, c2=1)




• This pattern indicates an unusually large number of connections on port 6667 detected by the anomaly detector
• Port 6667 is where IRC (Internet Relay Chat) is typically run
• Further analysis reveals that there are many small packets from/to various IRC servers around the world
• Although IRC traffic is not unusual, the fact that it is flagged as anomalous is interesting
– This might indicate that the IRC server has been taken down (by a DOS attack, for example) or that it is a rogue IRC server (it could be involved in some hacking activity)
51
Discovered Real-life Association Patterns…(ctd)
DstPort=1863, Protocol=TCP, Flag=0, NoPackets=1, NoBytes<139
(c1=498, c2=6)
DstPort=1863, Protocol=TCP, Flag=0 (c1=587, c2=6)
DstPort=1863, Protocol=TCP (c1=606, c2=8)



• This pattern indicates a large number of anomalous TCP connections on port 1863
• Further analysis reveals that the remote IP block is owned by Hotmail
• Flag=0 is unusual for TCP traffic
52
MINDS: Conclusion

• Data mining based algorithms are capable of detecting intrusions that cannot be detected by state-of-the-art signature based methods
• SNORT has static knowledge manually updated by human analysts
• MINDS anomaly detection algorithms are adaptive in nature
• MINDS anomaly detection algorithms can also be effective in detecting anomalous behavior originating from a compromised or infected machine

MINDS Research
• Defining normal behavior
• Feature extraction
• Similarity functions
• Outlier detection
• Result summarization
• Detection of attacks originating from multiple sites

[Figure labels: outsider attack (network intrusion); insider attack (policy violation); worm/virus detection after infection]
53
IDS Using Both Misuse and Anomaly Detection: RIDS-100

• RIDS (Rising Intrusion Detection System) is provided by Rising Tech, a leader in antivirus and content security software and services in China.
• The company is a leading provider of client, gateway, and server security solutions for virus protection, firewall and intrusion detection technologies, and security services to enterprises and service providers around China.
• RIDS makes use of both intrusion detection techniques: misuse detection and anomaly detection.
• A distance-based outlier detection algorithm is used to detect deviant behavior among collected network data.
• For misuse detection, it has a very large set of collected data patterns that can be matched against scanned network data.
• This large set of data patterns is scanned using a decision tree classification algorithm.
• http://www.rising-global.com/
54
A cooperative anomaly and intrusion detection system (CAIDS)

CAIDS is built with a network-based intrusion detection system (NIDS) and an anomaly detection system (ADS) operating interactively through a signature generator.
55
A cooperative anomaly and intrusion detection system (CAIDS)

• A frequent episode rule (FER) is generated out of a collection of frequent episodes. The FER is defined over episode sequences with multiple connection events.
• For example, we envision a window where we observe a 3-event sequence: E, D, and F. An FER is generated as E → D, F with confidence level freq(a ∪ b)/freq(a) = 0.8, where a represents the event E on the LHS and b corresponds to the two events D and F on the RHS of the rule.
• If a occurs with 5% probability and the joint event of a and b occurs with 4% probability, there is a (0.04/0.05) = 80% chance that D and F will follow in the same window.
56
A cooperative anomaly and intrusion detection system (CAIDS)

• In practice, the event E could be an authentication service characterized by two attributes (service = authentication, flag = SF).
• The events D, F may be two sequential smtp requests denoted by (service = smtp).
• Thus we can derive an FER with a confidence level of c = 80% that two smtp services will follow the authentication service within a window w = 2 sec. The three joint traffic events account for a support level s = 10% out of all the network connections being evaluated. This FER is formally stated as follows:

(service = authentication) → (service = smtp), (service = smtp) (0.8, 0.1, 2 sec)    (1)
57
A cooperative anomaly and intrusion detection system (CAIDS)

• An association rule is aimed at finding interesting intra-relationships inside a single connection record
• In general, an FER is specified by the following expression:

L1, L2, …, Ln → R1, …, Rm (c, s, window)    (2)

• Li (1 ≤ i ≤ n) and Rj (1 ≤ j ≤ m) are ordered traffic connection events.
• We call L1, L2, …, Ln the LHS episode and R1, …, Rm the RHS of the episode rule.
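
The general form of an FER maps naturally onto a small record type. The sketch below is a hypothetical representation (the class and field names are not from CAIDS) holding the ordered LHS and RHS events with the (c, s, window) triple, and it reproduces the authentication/smtp rule from the previous slide.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FrequentEpisodeRule:
    lhs: Tuple[str, ...]      # ordered LHS events L1..Ln
    rhs: Tuple[str, ...]      # ordered RHS events R1..Rm
    confidence: float         # c, between 0 and 1
    support: float            # s, between 0 and 1
    window_sec: float         # window length in seconds

    def __post_init__(self):
        if not self.lhs or not self.rhs:
            raise ValueError("both sides of the rule need at least one event")
        if not (0.0 <= self.confidence <= 1.0 and 0.0 <= self.support <= 1.0):
            raise ValueError("confidence and support must lie in [0, 1]")

    def __str__(self):
        left = ", ".join(f"({e})" for e in self.lhs)
        right = ", ".join(f"({e})" for e in self.rhs)
        return f"{left} -> {right} ({self.confidence}, {self.support}, {self.window_sec} sec)"

rule = FrequentEpisodeRule(
    lhs=("service = authentication",),
    rhs=("service = smtp", "service = smtp"),
    confidence=0.8,
    support=0.1,
    window_sec=2,
)
print(rule)
```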
58
A cooperative anomaly and intrusion detection system (CAIDS)

[Figure: Architecture of the CAIDS simulator built with a 2,000-signature Snort and an anomaly detection subsystem (ADS) with 60 FERs after 2 weeks of rule training over the Lincoln Lab IDS evaluation dataset]
59
Conclusion

• In this report we have studied the basic concepts and some classic system models in this area, such as ADAM and MINDS.
• We have summarized those system models, their technologies, and their validation methods.
• We hope this gives an overview of current developments in this area and of how data mining is evolving into the field of network intrusion detection.
60
Reference

• DARPA 1998 data set
– A cleansed set in KDDCup’99
– DARPA 1991 data set is also available
– http://www.ll.mit.edu/IST/ideval/data/data_index.html
• Daniel Barbara, Julia Couto, Sushil Jajodia, Leonard Popyack, Ningning Wu, “ADAM: Detecting Intrusions by Data Mining”, Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June 2001
• J. Zhang and M. Zulkernine, “A Hybrid Network Intrusion Detection Technique Using Random Forests”, Proceedings of the First International Conference on Availability, Reliability and Security, April 20-22, 2006
• W. Lee et al., “A data mining framework for building intrusion detection models”, Information and System Security, Vol. 3, No. 4, 2000
• L. Ertoz et al., “MINDS - Minnesota Intrusion Detection System”, Next Generation Data Mining, Chapter 3, 2004
• C.T. Lu, A.P. Boedihardjo, P. Manalwar, “Exploiting efficient data mining techniques to enhance intrusion detection systems”, IEEE International Conference on Information Reuse and Integration (IRI-2005), 15-17 Aug. 2005, pp. 512-517
• Sal Stolfo, Andreas Prodromidis, Shelley Tselepis, Wenke Lee, Dave Fan, and Phil Chan (Honorable mention (runner-up) for Best Paper Award in Applied Research Category), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD '97), Newport Beach, CA, August 1997
61
Questions & Comments
62