February 21-23, 2007
Data Preprocessing for Anomaly Based Network
Intrusion Detection: A Review
by
<Jonathan J. Davis and Andrew J. Clark>
Presented by Sait Murat GIRAY
April 19, 2013
Overview
• Outline of the presentation
– Introduction
– Packet Header Anomaly Detection
– Protocol Anomaly Detection
– KDD Cup 1999 Detection
– Content Anomaly Detection
– Alert Anomaly Detection
– Discussion and Conclusions
<Sait Murat GIRAY>, <METU>
Key Points and Focus
• The focus is on the data preprocessing methods
used by anomaly-based network intrusion detection
systems (NIDSs), with regard to:
– Aspects of network traffic
– Feature construction methods
– Selection techniques
• The motivation behind the review is the impact of data
preprocessing on a NIDS's:
– Precision (fidelity)
– Aptitude (capability)
What Is an Intrusion and an Intrusion Detection System?
• Intrusion: An attempt to break into or misuse a
system.
• Intruders may come from outside the network or be
legitimate users of the network.
• An intrusion can be physical, system or remote.
• Intrusion Detection Systems (IDSs) monitor systems
for security compromises in real time (or as close
to it as possible), and log and report malicious activity.
• IDSs are passive, while Intrusion Prevention
Systems (IPSs) actively react to threats.
• Anomaly-based approaches are not well suited to IPSs and
are out of the scope of this paper.
Intrusion Detection Systems (IDS)
• IDSs are mainly classified by:
– How they detect malicious activity
▪ Signature based (misuse detection)
▪ Anomaly based
– Where they are deployed
▪ Network based (monitors network activity)
▪ Host based (monitors client activity)
Signature Based IDS
• Possess an attack description that can be matched
to sensed attack manifestations.
• Rely upon rules written by domain experts.
• Most signature analysis systems are based on
simple pattern matching algorithms.
• The question of what information is relevant to an
IDS depends upon what it is trying to detect.
– E.g., DNS, FTP, etc.
• Primary strength is very low false positive rates.
Drawbacks of Signature Based IDS
• They are unable to detect novel attacks.
• They require continuous signature updates and have to be
reprogrammed for every new attack pattern.
Anomaly Based IDS
• Models all types of normal usage
or valid behavior of the network
as a noise characterization.
• Anything that diverges from the
noise (the normal baseline) is
assumed to be intrusive activity.
– E.g., flooding a host with a large
number of packets (DoS).
• The primary strength is the ability
to recognize new attack types.
Drawbacks of Anomaly Detection IDS
• Assumes that intrusions will be accompanied by
manifestations that are sufficiently unusual
to permit detection.
• Anomaly detectors generate many false alarms, which
compromise the effectiveness of the IDS.
• New (unfamiliar) but legitimate network traffic
triggers false positives, and it is not easy to
model all types of normal traffic at once.
Why Data Preprocessing?
• Data in the real world is dirty
– incomplete: lacking attribute values, lacking
certain attributes of interest, or containing only
aggregate data
• e.g., occupation=“ ”
– noisy: containing errors or outliers
• e.g., Salary=“-10”
– inconsistent: containing discrepancies in codes
or names
• e.g., Age=“42” Birthday=“03/07/1997”
• e.g., Was rating “1,2,3”, now rating “A, B, C”
• e.g., discrepancy between duplicate records
Major Tasks in Data Preprocessing
• Data cleaning
– Fill in missing values, smooth noisy data, identify or
remove outliers, and resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes or files
• Data transformation
– Normalization and aggregation
• Data reduction
– Obtains reduced representation in volume but
produces the same or similar analytical results
• Data discretization
– Part of data reduction but with particular importance,
especially for numerical data
Data Preprocessing & Anomaly Detection
• Network intrusion detection is a knowledge discovery task: a NIDS tries
to classify network traffic in order to detect malicious activity.
• Preprocessing is required to
– Convert traffic into observations
– Represent each observation as a feature vector
– Label each observation as “normal” or “anomalous”
• These feature vectors are then fed into data mining
and machine learning algorithms.
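The three preprocessing tasks above can be sketched as follows. This is only an illustration: the `Packet` record, its fields, and the blacklist-based labeling are hypothetical stand-ins, not something taken from the reviewed paper.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, simplified observation; real captures carry many more fields.
    src_ip: str
    dst_ip: str
    dst_port: int
    length: int
    syn: bool


def to_feature_vector(packet):
    """Represent one observation as a numeric feature vector."""
    return [float(packet.dst_port), float(packet.length), 1.0 if packet.syn else 0.0]


def label(packet, known_bad_sources):
    """Label an observation for training (the blacklist is an assumed shortcut)."""
    return "anomalous" if packet.src_ip in known_bad_sources else "normal"
```

In practice the labeling step is the expensive one; a lookup table like the one assumed here is rarely available for real traffic.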
Preprocessing Steps for NIDS
• Dataset creation
– Identify traffic for training and testing
– Label connections as normal or anomalous (time
consuming)
• Feature Construction
– Create additional features with more discriminative
power
– Sequence analysis, association mining and frequent-episode mining can be used.
• Feature Selection
– Eliminate redundant or irrelevant features to optimize
dataset
– Principal component analysis (PCA) is a common
method.
Research Papers by Anomaly vs. Features
7 Layers of OSI Model
Mnemonic: All People Seem To Need Data Preprocessing
(Application, Presentation, Session, Transport, Network, Data link, Physical)
A Network Packet Header
Packet Header Anomaly Detection
• Packet headers are only a small portion of network data.
– Preprocessing is straightforward and requirements are
minimized
– Require fewer resources (CPU, RAM, HDD) compared to
payload analysis
– Remain useful even in the presence of encrypted payloads
(SSL)
– Good for real-time operation without deep packet
inspection on high bandwidth networks (transcontinental fiber
links or stock market systems)
• Feature construction is
– Required to produce more discriminative features
– But hard to do with only the basic traffic information in headers
Packet Header Anomaly Detection Feature Types
• Three types of feature sets are used in detection:
– Basic features
▪ Individual packet data without further feature
construction
– Single connection derived (SCD) features
▪ A complete network flow is used as the data instance.
▪ A flow is a sequence of packets sharing common values
(e.g., the same source or destination IP/port).
▪ Provides more context to analyze.
▪ The most important features are time-based statistical measures
produced by monitoring basic features in a flow over
time.
– Multiple connection derived (MCD) features
▪ Features are constructed from multiple flows.
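A minimal sketch of how single connection derived features might be built from basic packet data. The tuple layout and the chosen statistics (packet count, bytes, duration, byte rate) are assumptions made for illustration.

```python
from collections import defaultdict


def flow_features(packets):
    """Group packets into flows and derive per-flow statistics.

    Each packet is a tuple (timestamp, src_ip, src_port, dst_ip, dst_port, size).
    """
    flows = defaultdict(list)
    for ts, src_ip, src_port, dst_ip, dst_port, size in packets:
        flows[(src_ip, src_port, dst_ip, dst_port)].append((ts, size))

    features = {}
    for key, items in flows.items():
        times = [ts for ts, _ in items]
        total_bytes = sum(size for _, size in items)
        duration = max(times) - min(times)
        features[key] = {
            "packets": len(items),
            "bytes": total_bytes,
            "duration": duration,
            # Bytes per second; 0.0 when the flow has zero duration (e.g. one packet).
            "byte_rate": total_bytes / duration if duration > 0 else 0.0,
        }
    return features
```

Multiple connection derived features would then be computed over the values of this dictionary, for example by counting flows per source address.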
Packet Header Anomaly Detector (PHAD)
Feature type: Basic
• This model detects TCP/IP attacks and DoS attacks
– Learns normal ranges for packet header fields from the data link
(Ethernet), network (IP) and transport (TCP)
layers.
– The range of possible values is very large, so clustering
is employed as preprocessing to reduce this space.
– Each attribute of a packet header is assigned an
anomaly score based on the time since it was last anomalous.
– The anomaly score of a packet is the sum of its
attributes' anomaly scores.
• Since analysis is performed per attribute of each header
– The distribution depends on one random variable
– The main algorithm is univariate anomaly detection
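The per-attribute scoring can be sketched as below. The slide does not give a formula, so the score `t * n / r` (time since the attribute last produced an anomaly, times the number of training observations, divided by the number of distinct values seen) is an assumption, and the clustering step is left out.

```python
class AttributeModel:
    """Model of one header attribute: values seen in training plus anomaly timing."""

    def __init__(self):
        self.seen = set()
        self.n = 0                # number of training observations
        self.last_anomaly = 0.0   # assumes the clock starts at 0 when training begins

    def train(self, value):
        self.seen.add(value)
        self.n += 1

    def score(self, value, now):
        if value in self.seen:
            return 0.0
        t = now - self.last_anomaly       # time since this attribute was last anomalous
        self.last_anomaly = now
        r = max(len(self.seen), 1)        # guard against an untrained model
        return t * self.n / r


def packet_score(models, packet, now):
    """Sum per-attribute scores; attributes are scored independently (univariate)."""
    return sum(models[name].score(value, now) for name, value in packet.items())
```

Under this assumed formula, an attribute that rarely misbehaves and has few distinct normal values contributes the largest scores.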
Statistical Packet Anomaly Detection
Engine (SPADE)
Feature type: Basic
• Developed to detect network and port scans:
– Uses basic features such as source and destination IP
addresses
– Builds a model of the normal traffic distribution rather
than simply counting the number of attempts within a certain
time window
– Maintains the distributions in real time using joint probability
measures or a Bayes network
• Detection phase:
– An anomaly score is calculated by comparing each packet to the
probability distribution.
– Highly anomalous packets are retained.
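One way to picture the detection phase is a joint-probability table over basic features, with the anomaly score taken as the negative log of the estimated probability. The feature pair (destination IP, destination port) and the crude smoothing below are assumptions for illustration.

```python
import math
from collections import Counter


class JointModel:
    """Running joint distribution over (dst_ip, dst_port) pairs."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, dst_ip, dst_port):
        self.counts[(dst_ip, dst_port)] += 1
        self.total += 1

    def score(self, dst_ip, dst_port):
        # Crude add-one smoothing (an assumption) so unseen pairs get a finite score.
        p = (self.counts[(dst_ip, dst_port)] + 1) / (self.total + 1)
        return -math.log2(p)
```

Packets whose score exceeds a chosen threshold would be the "highly anomalous" ones that are retained.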
Wireless Network Anomaly Detection
Feature type: Basic
• This approach detects attacks on wireless networks
– Preprocesses all frame headers
– Categorizes features and derives new ones
– Finally applies feature selection to discover the most
relevant set for detecting malicious traffic
• The detection mechanism is as follows:
– Features are filtered and ranked according to
their relevance.
– A combination of a forward search algorithm and a K-Means classifier is used to find the best set of
features.
Anomalous Network Traffic Detection with
Self-Organizing Maps (ANDSOM)
Feature type: Single connection
• Self-organizing maps (SOMs) are artificial neural networks (ANNs)
trained with unsupervised learning that produce a low-dimensional, discretized representation of the input space
of the training samples, called a map.
• In ANDSOM, preprocessing categorizes the dataset by service
type (TCP, UDP) and application protocol (HTTP or
SMTP), and a separate model is created for each data
segment.
• SCD (single connection derived) features provide a session fingerprint. Data instances
are compared with the SOM model to detect anomalies.
Other SCD based Anomaly Detectors
Feature type: Single connection
• HTTP request and response features
– Detect attacks against web servers when traffic is
encrypted with SSL
– Since the only features used are the clear text HTTP
request and response sizes, frequency analysis
is used to reduce false positives
• Round Trip Time (RTT) features
– Attackers use multiple-step connections
to avoid detection
– Clustering and partitioning are used to calculate
RTTs and estimate the number of steps
Other SCD based Anomaly Detectors
Feature type: Single connection
• Application protocol features
– Features are extracted from the TCP/IP headers of a flow
– C5 decision trees are used to classify these
features
– This mechanism discovers services and applications
running on nonstandard (anomalous) ports
• SCD features are good for detecting unusual
behavior in a single session, such as an unexpected
protocol, data size or packet timing, but are not
sufficient for finding activity spanning multiple
flows.
Minnesota Network Intrusion Detection
System (MINDS)
Feature type: Multiple connection
• Uses a volume-based approach that counts flow features
• Two sets of multiple connection features are calculated:
– Over the last N connections
– Over each time window (10 minutes)
• The main anomaly detection algorithm is a density-based
outlier detection method called Local Outlier Factor (LOF),
combined with association mining.
• It is capable of exposing network scans, DoS attacks and
worms.
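The two kinds of multiple connection features (time window and last N connections) can be sketched as simple counts. The specific feature counted here, earlier connections from the same source, is an assumed example rather than the exact MINDS feature list.

```python
from collections import deque


def window_counts(connections, window_seconds, last_n):
    """For each connection, count earlier connections from the same source
    (a) within the time window and (b) among the last N connections.

    connections: list of (timestamp, src_ip, dst_ip), sorted by timestamp.
    """
    recent = deque()              # connections still inside the time window
    last = deque(maxlen=last_n)   # the last N connections, regardless of time
    out = []
    for ts, src, dst in connections:
        while recent and ts - recent[0][0] > window_seconds:
            recent.popleft()
        in_window = sum(1 for _, s, _ in recent if s == src)
        in_last_n = sum(1 for _, s, _ in last if s == src)
        out.append((in_window, in_last_n))
        recent.append((ts, src, dst))
        last.append((ts, src, dst))
    return out
```

The last-N variant matters for slow scans: a source that spaces its probes far apart in time drops out of the time window but can still show up among the last N connections.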
Audit Data Analysis and Mining
(ADAM)
Feature type: Multiple connection
• Preprocessing creates connection records
from network flows
• This method uses association mining to
– Derive multiple connection features
– Apply them to connection records over sliding time
windows of 3 s and 24 h
• A model of association rules is created in the training
phase
• In the detection phase, data mining dynamically finds association
rules, which are then compared with the model.
Stochastic Clustering Algorithm for
Anomaly Detection (SCAN)
Feature type: Multiple connection
• It both samples incoming data and produces summaries to
reduce the workload.
• These summaries include multiple connection features over
a 60 s time window.
• Time-based features are fed into a clustering algorithm to
find outliers. Discovered outliers are treated as
anomalies in the network.
• The SCAN technique is quite successful at detecting DoS attacks in
high-speed networks.
Fuzzy Intrusion Recognition Engine
(FIRE)
Feature type: Multiple connection
• FIRE combines fuzzy logic with MCD statistical
features to identify abnormal behavior
• Header attributes are extracted from flows:
– TCP sessions are reassembled
– A unique key is created for each session
• Statistical measures are calculated from MCD features over
a 15-minute interval.
• Each feature is an input to a Fuzzy Threat Analyzer, for
which security admins write fuzzy rules to detect
anomalies.
Protocol Anomaly Detection
• Various protocols (TCP, SMTP, HTTP, FTP) are
analyzed within network traffic to find
deviant behavior.
• The approaches adopted are:
– Specification based
– Parser based
– Application protocol keyword based
Specification Based Anomaly Detection
• When an expert “specifies” the model manually for an
anomaly-based NIDS, the approach is called specification based.
• Network protocols change much more slowly than attacks
do:
– X.200 (OSI basic model) was approved in 1994.
– New cyber attacks emerge every day.
• The idea behind this technique:
– Protocols have solid standards and definitions.
– Specification-based models are therefore considered superior to
trained models for anomaly detection.
Specification Based Anomaly Detection
• Network traffic used to train models is generally
contaminated with malicious activity and does not
represent real normal behavior.
• Specification approaches:
– Segment data by IP address and port combinations as
preprocessing
– Calculate frequency distributions associated with state
machines
– Mark unusual frequency distributions as
anomalous
• A limitation of the model is that state machines exist only
for TCP/IP traffic and that single-packet attacks are missed.
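To illustrate what a hand-written specification looks like, here is a deliberately simplified state machine for a TCP session. The states and event names are hypothetical and cover only a fraction of real TCP behavior.

```python
# Allowed (state, event) -> next state transitions; anything else violates the spec.
TRANSITIONS = {
    ("CLOSED", "SYN"): "SYN_SEEN",
    ("SYN_SEEN", "SYN-ACK"): "SYN_ACK_SEEN",
    ("SYN_ACK_SEEN", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSED",
}


def run_session(events):
    """Return (final_state, violations) for a sequence of observed TCP events."""
    state, violations = "CLOSED", 0
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            violations += 1   # event not allowed by the specification in this state
        else:
            state = next_state
    return state, violations
```

Counting how often each transition or violation occurs per address and port segment would give the frequency distributions the slide refers to.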
Parser Based Anomaly Detection
• Protocol parsers or decoders are created with protocol
specifications built into them
• Protocol-specific preprocessors parse and normalize
header fields
• When a decoder detects invalid protocol usage,
the anomaly is flagged
– E.g., an invalid total length
• The advantage is the ability to deliver detailed information
about the location and the cause of the anomaly
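A parser-based check can be sketched for the IPv4 header. The sketch assumes the buffer holds exactly one IP packet with no link-layer padding, and it validates only three fields.

```python
import struct


def check_ipv4(packet):
    """Return a list of protocol anomalies found in a raw IPv4 packet."""
    if len(packet) < 20:
        return ["truncated header"]
    # First 4 bytes: version/IHL, type of service, total length (network byte order).
    version_ihl, _tos, total_length = struct.unpack("!BBH", packet[:4])
    version, ihl = version_ihl >> 4, version_ihl & 0x0F
    anomalies = []
    if version != 4:
        anomalies.append("invalid version")
    if ihl < 5:
        anomalies.append("invalid header length")
    if total_length != len(packet):
        anomalies.append("invalid total length")
    return anomalies
```

Because each check names the offending field, the result says where and why the packet is anomalous, which is the advantage noted above.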
Parser Based Anomaly Detection
• Snort is a very common NIDS mainly based on pattern
matching signatures but also offers parser based anomaly
detection.
• Bro is another NIDS using policy scripts to detect
anomalies.
Application Protocol Keyword Based Anomaly
Detection
• Some SCD features from packet headers are used along
with protocol-specific keywords
• The model is built from the allowed keywords of different
application protocols and trained on them
• If an unknown or rare keyword is used by a particular service, its
anomaly score is increased
• The total anomaly score of a connection depends on the
probability of each feature and its related keyword.
KDD Cup 1999
• This is the “Computer Network Intrusion” contest
• Its dataset is publicly available and already
preprocessed
– That is why it is used in many NIDS studies
• Creating labels for a custom dataset is very labor
intensive.
– The KDD Cup 99 dataset has a labeled vector of 41
features for each network connection.
Data Transformation
• Categorical features are normalized
and transformed by their mean and
standard deviation.
• The processed dataset is used to compare
unsupervised and supervised learning.
• Although supervised learning is better than
unsupervised learning for known attacks, the latter
beats the former on new attacks.
– It makes sense to choose unsupervised learning, as it
does not need a labeled dataset (which takes a lot of
work to produce) to train on.
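Transforming a feature by its mean and standard deviation usually means standardization (z-scores). A minimal sketch for one numeric column, assuming the population standard deviation is the one intended:

```python
import statistics


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]
```

Standardizing keeps features with large numeric ranges, such as byte counts, from dominating distance-based learners.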
Data Cleaning and Reduction
• The dataset underwent further preprocessing and was sampled
for cleaning
• Principal Component Analysis (PCA) is used to reduce
the dimensionality of the dataset, which
– Lowers the computational needs of the classifier
– Shortens classifier building and testing time
– Improves detection rates
Data Discretization and Relabeling
• Symbolic features of the KDD dataset are encoded as numeric
data
• The dimensionality of the dataset increases to 119 (from 41)
• An unsupervised approach is followed to identify
anomalies within the set.
• The conversion yielded good results, achieving better
overall classification accuracy.
• In relabeling, the standard labels are replaced with a “service
type” label.
– A proximity measure detects anomalies as outliers
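One common way to encode symbolic features numerically is one-hot encoding, which also explains why the dimensionality grows. Whether the reviewed work used exactly this encoding to reach 119 dimensions is an assumption; the sketch only shows the mechanism.

```python
def one_hot_encode(records, symbolic_columns):
    """Replace each symbolic column with one 0/1 column per distinct value."""
    categories = {
        col: sorted({rec[col] for rec in records}) for col in symbolic_columns
    }
    encoded = []
    for rec in records:
        row = []
        for col, value in enumerate(rec):
            if col in categories:
                row.extend(1.0 if value == cat else 0.0 for cat in categories[col])
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded
```

With two protocol values and three service values, a 3-column record becomes a 6-column one, the same kind of expansion that takes the KDD records from 41 to more columns.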
Content Anomaly Detection
• Content (payload) is the data
transported in the packet after the header.
• Exploit code is essentially
embedded into this “free wagon”
and sent to clients
• Detection requires much deeper
inspection and is
expensive
• Common vectors of payload
attacks are
– SQL injection and cross-site
scripting on servers
– Web browser exploits on clients
N-gram Analysis of Server Requests
• An N-gram is a contiguous sequence of N items from
a given sequence of text or speech
• Using N-grams for data preprocessing
– Does not require expert domain knowledge
to construct relevant features
– Traffic payload models are automatically created
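For network payloads the "items" are typically bytes. A minimal sketch of N-gram extraction, with a hypothetical function name:

```python
def byte_ngrams(payload, n):
    """Return all contiguous byte sequences of length n in the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]
```

For example, `byte_ngrams(b"GET /", 3)` yields `b"GET"`, `b"ET "` and `b"T /"`; no protocol knowledge is needed to produce these features.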
Payload Based Anomaly Detector (PAYL)
• Uses 1-grams (single bytes in the 0-255 range) and unsupervised
learning to build a byte-frequency distribution model
of network payloads.
• Measures the distance (dissimilarity) between an unknown sample and
the known model, comparing each packet of current traffic with the
model.
• If distance > threshold, an anomaly is flagged.
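A sketch of a byte-frequency model and a distance to it. The distance used here, a sum of absolute deviations scaled by the per-byte standard deviation plus a smoothing constant `alpha`, is in the style of a simplified Mahalanobis distance; the constant and any threshold are assumptions.

```python
import statistics


def byte_frequencies(payload):
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1
    return [c / total for c in counts]


def train(payloads):
    """Per-byte mean and standard deviation over the training payloads."""
    freqs = [byte_frequencies(p) for p in payloads]
    means = [statistics.fmean(col) for col in zip(*freqs)]
    stds = [statistics.pstdev(col) for col in zip(*freqs)]
    return means, stds


def distance(model, payload, alpha=0.001):
    """Deviation of a payload's byte distribution from the trained model."""
    means, stds = model
    x = byte_frequencies(payload)
    return sum(abs(xi - m) / (s + alpha) for xi, m, s in zip(x, means, stds))
```

A payload made of bytes never seen in training, such as a run of `0x90`, lands far from the model, while a payload resembling the training traffic stays close.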
ANAGRAM
• Uses a mixture of N-grams where N>1.
• Reduces the risk of mimicry attacks, as it is hard to
emulate extra padded bits
• ANAGRAM:
– Utilizes supervised learning to model
normal traffic
– Stores N-grams in Bloom filters,
which are probabilistic data structures
used to test whether an element is a
member of a set
– Compares N-grams from traffic against the
Bloom filters for attack classification
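The Bloom filter lookup can be sketched as follows. The filter size, the number of hash functions, the SHA-256 based hashing, and the "fraction of unseen N-grams" score are all assumptions chosen for a compact illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def unseen_fraction(bloom, payload, n):
    """Fraction of the payload's N-grams that the filter has never seen."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in bloom) / len(grams)
```

Training amounts to calling `add` on every N-gram of normal traffic; at detection time a high unseen fraction suggests an anomalous payload.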
Analysis of Web Content to Client
• Client computers are protected from external threats by
boundary protection such as NIDSs.
• Therefore attackers target clients through web browsers,
email clients, instant messaging, etc.
– They view external content on the client
– Lots of functionality is embedded, along with
vulnerabilities.
Analysis of Web Content to Client
• Techniques to detect attacks on network clients are
mainly categorized as:
– Constructing features from web pages and embedded
scripts (JavaScript) and weighting each
feature to produce an anomaly score
– Performing statistical analysis of script functions
to discriminate normal activity from malicious behavior
– Analyzing web pages for HTTP links and whitelisting
allowed sites
Alert Anomaly Detection
• This is a hierarchy of NIDSs in which the upper one
processes alerts produced by the lower one.
• Upper NIDSs correlate and classify alerts, generate
statistics and build clusters to detect outliers, in order to
give refined alert output.
• This technique is particularly useful in the presence of
a large number of alerts, which means either:
– Your system is under heavy attack, or
– Your NIDS is giving many false positives and it is time to
check it.
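The simplest form of alert post-processing is aggregation. The sketch below collapses raw alerts into counts per source and alert type; the dictionary keys `src` and `type` are hypothetical field names.

```python
from collections import Counter


def summarize_alerts(alerts):
    """Collapse raw alerts into (source, alert type) counts, most frequent first."""
    return Counter((a["src"], a["type"]) for a in alerts).most_common()
```

An upper-level detector could then treat the counts themselves as features and look for outliers among them.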
Feature Set Comparison and Recommendations
• The most popular packet header approach uses MCD features and
statistical measures to evaluate network flows
• Approaches for analyzing packet payloads and client requests
are gaining popularity as attacks evolve in this
direction.
• Recommendations:
• Broad anomaly detection requires a separate detector per
feature set, while targeted detection requires only a single
feature set.
• Packet header features run fast with low overhead
• Payload analysis can detect application-specific
attacks missed by packet header approaches, but at the
cost of expensive computation