Intelligence Based Intrusion Detection
System (IBIDS)
Senior Project
Session 2004-2005
Intelligence Based Intrusion Detection System
IBIDS
Feb 18th
2005
Submitted By:
Mobeen Faiq
Sayyed Sharjeel Musa Hussain
Zahra Nadeem
Abdul Moeed
2005-02-0090
2005-02-0260
2005-02-0208
2003-02-0005
Department of Computer Science
Lahore University of Management Sciences
A report submitted to the Department of Computer
Science in partial fulfillment of the requirements for the
degree BSc (Honours) in Computer Science
By
Mobeen Faiq
Zahra Nadeem
Sayyed Sharjeel Musa Hussain
Abdul Moeed
Lahore University of Management Sciences
February 18th, 2005
Acknowledgements
First of all, we would like to pay our respects and thanks to Allah Almighty for making us
capable of completing a successful project and for showering His blessings upon us throughout
the project, especially at times of need. The cooperation, help and support provided by Dr
Asim Karim was a great asset during our moments of confusion, and we truly
acknowledge that. He has been a constant source of guidance throughout the course of this
project, and a great advisor. We would like to thank Dr Tariq Jadoon for his help and
guidance in understanding the different aspects of networks, which we truly needed in the
beginning; without his help, we might not have gotten halfway through our project. We
would also like to thank M.M. Awais for his guidance on the various aspects of artificial
intelligence that he helped us understand, and for helping us out where we needed him. The
project's completion would not have been possible without the constant follow-up by Dr.
Asim Loan and Mr. Bilal Afzal, who gave us a boost at times when we were lagging behind.
We would also like to thank our families, whose love and care enabled us to ease our minds
and souls.
______________________
Sayyed Sharjeel Musa Hussain
2005-02-0260
______________________
Mobeen Faiq
2005-02-0090
______________________
Zahra Nadeem
2005-02-0208
______________________
Abdul Moeed
2003-02-0005
Date
February 18th, 2005
Executive Summary
In the current age of technology, the means of communicating and establishing networks
across the globe has changed from a human medium to a digital medium. It is now
mostly through computers, and especially the Internet, that global communication
takes place. This communication does not consist only of ordinary conversations and
publicly available data; it also includes transactions and transfers of private and
confidential data kept at sites on the network. Everyone who uses the network has
some private data on the Internet, whether it is a simple Hotmail account password
or data on his own computer. This information needs to be secured and kept safe from
all kinds of intruders so that the safety and privacy of the individual is maintained.

Intrusion Detection Systems are hardware- or software-based devices that detect
different kinds of intrusions. It is the job of these systems to detect and warn of any
kind of attack taking place, so that the user is aware of the situation and can take control
of it, even if that means restarting the computer. The problem with these systems is that most of
them use pre-defined rules that are easy to defeat, because hackers usually find workarounds
for those rules. Our project is a software-based Intrusion Detection System for the Linux
operating system, and it identifies malicious activities using the pattern matching
found in most commercial Intrusion Detection Systems. The difference between our system
and the rest is that ours is a hybrid of anomaly and misuse detection, with an
additional domain-knowledge rule base extracted through data mining on the
network. All this serves the purpose of not just detecting existing attacks, but also of
detecting new and unknown attacks.
Table of Contents

Statement of Submission
Acknowledgements
Executive Summary
Table of Contents

Chapter 1: Introduction
1.1 Scope of Project
1.2 Currently Available Solutions
1.2.1 Stateless Firewalls
1.2.2 Stateful Firewalls
1.2.3 Proxy Servers
1.2.4 Signature Matching IDS
1.2.5 Anomaly Detection IDS
1.3 Problems with Current Solutions
1.3.1 Problems with Stateless Firewalls
1.3.2 Problems with Stateful Firewalls
1.3.3 Problems with Signature Matching IDS
1.3.4 Problems with Anomaly Based IDS
1.4 Development Steps

Chapter 2: Research Work
2.1 Honeyd
2.2 Anomaly Detection
2.2.1 Clustering: A Data Mining Technique
2.2.2 Types of Clustering Algorithms
2.2.3 Partitioning Algorithms
2.2.4 Hierarchical Clustering Algorithms
2.2.5 Density Based Clustering Algorithms
2.3 Misuse Detection
2.3.1 Boyer-Moore Algorithm
2.3.2 ExB and E2xB Algorithms
2.3.3 Wu and Manber's Algorithm
2.3.4 Kim's Algorithm
2.4 Derived Attributes
2.5 Packet Header
2.6 Intrusion Types and their Characteristics
2.6.1 DoS Attacks
2.6.2 R2L Attacks
2.6.3 U2R Attacks
2.6.4 Probes and Scans

Chapter 3: Design
3.1 Design Overview
3.2 Details of Modules
3.2.1 Blocked Ports/IP
3.2.2 Pre Anomaly
3.2.3 Anomaly Detection
3.2.4 Signature Matching
3.2.5 Attack Clustering
3.3 Performance Tweaking

Chapter 4: Prototype Results
4.1 Coverage
4.1.1 The Pre-Filtering Stage
4.1.2 Layer One: Anomaly Detection Stage
4.1.3 Layer Two: Payload Signature Matching and Payload Attributes Check Stage
4.2 False Positives
4.3 False Negatives
4.4 Detection Probability
4.5 Handling Higher Bandwidth Traffic
4.6 Ability to Detect New Attacks
4.7 Limitations of the Model

Chapter 5: Future Enhancements
Chapter 6: Conclusion
References
Chapter One
Introduction
1.1 Scope of Project
Typically network administrators use firewalls, proxy servers and Network Intrusion
Detection and Response mechanisms to protect their networks against any possible
threat from an external unauthorized source. However, the problem with currently
available NIDS lies in their detection mechanism. Current NIDS use pattern
recognition and signature matching to detect flow of malicious packets over the
network. However, these techniques fail in the presence of new and
unknown attacks, due to the absence of signatures and patterns for new
attacks within the database. This creates a loophole in the security of the network,
making it vulnerable to potential threats. The scope of this project was to come up
with a new design for a Network Intrusion Detection System whose ability to detect
unauthorized attacks did not lie in having a pattern and signature database to match
network flow against.
The limited time for our senior project forced us to limit our focus to the TCP
protocol only. In other words, the solution proposed in this report currently works only
on TCP packets. However, this in no way means that the solution proposed cannot
be modified and applied to other protocols. Essentially, this paper puts forward a
generic design for a totally new real-time intrusion detection system; a design which
can easily be replicated for protocols besides TCP. By stating that our
design works on the TCP protocol, we mean that the prototype developed for
this design was only tested on the TCP protocol. Testing of this design for ICMP,
UDP and other protocols was not carried out; thus the viability of this design for
protocols other than TCP is not known.
1.2 Currently Available Solutions
In this section we will provide a brief overview of the currently available solutions for
monitoring intrusions within the network.
1.2.1 Stateless Firewalls
Firewalls, in their basic form, are intended to prevent people with harmful
intentions from penetrating and gaining access to a network. There are
several different technologies in use today that a firewall can use to
accomplish this. For the purposes of this discussion, we will concentrate on
stateless packet filters, or “Stateless Firewalls”.
Stateless packet filters work by examining individual packets as they are
transmitted between the data link layer and network layer of the receiving
computer. Based on how the filter is configured, the incoming packet’s
protocol header is examined and compared with criteria established by the
network administrator. Some of the more useful data fields in the protocol
header are:

- Protocol type
- IP address
- TCP/UDP port
- Fragment number
- Source routing information
Filtering based on the IP address is more effective than protocol filtering,
depending on how the filter is configured. It’s a two-way type of filter as well,
meaning that incoming as well as outgoing packets can be dropped. If the
filter is configured so that all IP addresses are permitted through the filter with
the exception of a few, then this represents a significant risk to the network of
being attacked by a hacker. The reason is that a hacker has a good
chance of finding an IP address that is permitted by the filter. A better
approach is to deny all IP addresses by default and grant access to a limited
number of IP addresses. A potential hacker will have a harder time finding IP
addresses that are granted access by the filter.
Filtering on the basis of TCP/UDP port numbers is another way to filter
packets. Ports, which represent access points to a network, are a common
entry point for a hacker to gain access to a system. The filter examines the
port number associated with the incoming packet and compares it with a list
established by the network administrator. Some of the more important
ports to block include the Telnet, NetBIOS session and POP
ports. Most hackers try to exploit the ports associated with these protocols
because they give them enormous capability once they have broken in.
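The default-deny approach described above can be sketched as follows. This is a toy illustration; the addresses, ports and rules are invented for the example, not taken from any real configuration.

```python
# Hypothetical sketch of a stateless packet filter with a default-deny
# policy: a packet passes only if its source IP and destination port
# appear on explicit allow lists.

ALLOWED_IPS = {"192.168.1.10", "192.168.1.11"}   # explicitly permitted hosts
ALLOWED_PORTS = {80, 443}                        # HTTP/HTTPS only
BLOCKED_PORTS = {23, 110, 139}                   # Telnet, POP3, NetBIOS session

def filter_packet(src_ip: str, dst_port: int) -> bool:
    """Return True if the packet is permitted, False if dropped."""
    if dst_port in BLOCKED_PORTS:
        return False
    # Default deny: only explicitly listed IPs and ports get through.
    return src_ip in ALLOWED_IPS and dst_port in ALLOWED_PORTS

print(filter_packet("192.168.1.10", 80))   # permitted
print(filter_packet("10.0.0.5", 80))       # dropped: IP not on allow list
print(filter_packet("192.168.1.10", 23))   # dropped: Telnet port blocked
```

Note that the filter examines only header fields (address and port); it never looks at the payload, which is exactly the limitation discussed in Section 1.3.1.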
1.2.2 Stateful Firewalls
A stateful firewall remembers the context of connections and continuously
updates this state information in dynamic connection tables. To give an
example of the benefits of a stateful firewall, a hacker trying to gain access
has less chance of forging entry as part of a valid series of connections
because the context will show that the additional connection does not make
sense for a legitimate user.
What is state and how does a firewall determine the state of a communication
between a source and destination host? State can be loosely defined as the
"condition or status of a connection between two communicating hosts".
States might be defined as beginning, middle, and end, or beginning and end,
or sent and received, or none of the above (as seen with "stateless"
protocols). The first rule about communication states is that they vary with
the protocols used.
Regardless of the protocol and how it manages its state of communication, a
firewall needs to keep track of the communication status between a source
and destination host. This information is stored in what is called a "state
table". Various types of information are stored in a state table and the
information varies with the protocol used by the communicating hosts.
Examples of information kept in a state table include:

- Source and destination IP address
- Source and destination port
- Protocol, flags, sequence and acknowledgement numbers
- ICMP code and type numbers
- Secondary connection information communicated in application layer headers
- Application layer specific command sequences (GET, PUT, OPTIONS, etc.)
For example, one of the main jobs a firewall performs is to block all
unsolicited inbound connections while allowing responses from servers that
internal network clients have made outbound connections to. The firewall can
block the unsolicited inbound connections while allowing the servers to
respond by keeping track of the outbound connections in its state table.
For example, when the internal network client makes an outbound
connection, the firewall might enter the source and destination IP address and
port number in the state table (it might also enter flag, sequence number, and
ack number information too). When the firewall receives the server's
response, it checks the state table to see if anyone made an outbound
request to that server. If so, and if the flags, sequence, and acknowledge
numbers are appropriate (for TCP communications), then the firewall passes
the response to the internal network client that made the outbound request.
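The outbound-tracking behaviour just described can be sketched as follows. This is a toy illustration, not a real firewall: the table records only the address/port 4-tuple, whereas a real implementation would also track flags, sequence and acknowledgement numbers, and entry timeouts.

```python
# Illustrative sketch of a stateful connection table: outbound connections
# are recorded, and an inbound packet is accepted only if it answers a
# recorded outbound request.

state_table = {}  # (src_ip, src_port, dst_ip, dst_port) -> connection state

def record_outbound(src_ip, src_port, dst_ip, dst_port):
    """Called when an internal client opens a connection outward."""
    state_table[(src_ip, src_port, dst_ip, dst_port)] = "ESTABLISHED"

def allow_inbound(src_ip, src_port, dst_ip, dst_port) -> bool:
    """An inbound packet is a valid response only if it reverses the
    4-tuple of some tracked outbound connection."""
    return (dst_ip, dst_port, src_ip, src_port) in state_table

record_outbound("192.168.1.5", 34512, "203.0.113.7", 80)
print(allow_inbound("203.0.113.7", 80, "192.168.1.5", 34512))   # solicited reply
print(allow_inbound("198.51.100.9", 80, "192.168.1.5", 34512))  # unsolicited, blocked
```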
1.2.3 Proxy Servers
A proxy server is a web server that resides between a client computer and the
Internet. It usually is on the same server as the firewall. The proxy server
acts as a block between the client computers and the live Internet. It is able
to monitor all requests, inbound and outbound, that pass through the server.
Proxy servers are a key component of most corporate networks. Proxy
servers are used for the following reasons:

- Filter requests and control access
- Provide access to clients that are behind a firewall
- Improve the performance of the network
- Share Internet connections among several computers
In order for the proxy server to be able to filter and control access to the
Internet, it must be setup on the network as the gateway to the Internet. For
this to be possible all computers that will require filtering and access control
are required to be inside the firewall. These computers will then request an
Internet service (FTP, HTTP, etc.). The proxy server receives the request
and uses access control lists to determine if the request is acceptable. If it is
not, the user receives an error stating that the page is forbidden or
inaccessible. If it does pass the filtering requirements, the proxy server
checks to see if the page is cached locally. If it is, it is passed to the user. If
not, the proxy server uses one of its own IP addresses and the firewall
software to connect to the Internet on behalf of the requesting user. The
request is then made on the Internet. When the response is returned, the proxy server
reads it and makes sure it is acceptable before forwarding it to
the user. This process is virtually invisible to the users, and is extremely fast,
making the user believe they are getting the response directly from a remote
server.
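The filter-then-cache-then-fetch flow above can be sketched as follows. The hosts, paths and cache contents here are invented for illustration, and the network fetch is stubbed out.

```python
# Toy sketch of a proxy's decision flow: check the access control list,
# then the local cache, then fetch on the client's behalf.

ACL_BLOCKED = {"badsite.example"}                           # forbidden hosts
cache = {"news.example/index.html": "<html>cached copy</html>"}

def fetch_from_internet(host, path):
    """Stand-in for the proxy's outbound request to the real server."""
    return f"<html>fresh copy of {host}{path}</html>"

def proxy_request(host, path):
    if host in ACL_BLOCKED:
        return "403 Forbidden"            # request rejected by the ACL
    key = f"{host}{path}"
    if key in cache:
        return cache[key]                 # serve the locally cached copy
    response = fetch_from_internet(host, path)
    cache[key] = response                 # cache for subsequent requests
    return response

print(proxy_request("badsite.example", "/"))          # blocked by ACL
print(proxy_request("news.example", "/index.html"))   # served from cache
```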
1.2.4 Signature Matching IDS
The Idea behind a Signature Matching IDS lies in the ability of a system to
detect pre compiled attack patterns within a packet. Every attack produces
some sort of signature or pattern. These signatures or patterns are basically
packet attributes which when arranged in a certain manner can be used to
intrude networks. These attributes include:




Flag Bit settings
Time to live
Payload Content
Acknowledgement Values
Based on the values of these attributes found within attack data, signatures and
patterns are compiled. These signatures and patterns are then used as
benchmarks against which incoming network packets are compared. A packet that
contains any one of these signatures is classified as malicious and not
allowed to enter the network. For the purposes of this project, these rules have
been taken from the Snort website. Snort is an open-source intrusion detection
system which runs on Linux-based server machines. These precompiled
signatures and patterns have been made available for distribution on the website
(URL).
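The matching idea can be sketched as follows. The signatures shown are invented for illustration and are not actual Snort rules; real rule sets encode many more attributes and use optimized string-matching algorithms.

```python
# Minimal sketch of signature matching: each signature is a set of
# attribute conditions, and a packet matching any signature is flagged.

signatures = [
    {"flags": "SF", "payload_contains": b""},              # SYN+FIN combination
    {"flags": None, "payload_contains": b"/etc/passwd"},   # suspicious payload string
]

def is_malicious(packet) -> bool:
    """Compare a packet (dict of attributes) against every signature."""
    for sig in signatures:
        flag_ok = sig["flags"] is None or packet["flags"] == sig["flags"]
        payload_ok = sig["payload_contains"] in packet["payload"]
        if flag_ok and payload_ok:
            return True
    return False

print(is_malicious({"flags": "SF", "payload": b""}))              # flagged by rule 1
print(is_malicious({"flags": "A", "payload": b"GET /etc/passwd"}))  # flagged by rule 2
print(is_malicious({"flags": "A", "payload": b"hello"}))          # clean packet
```

Even in this tiny sketch, every packet is tested against the whole signature list, which previews the scaling problem discussed in Section 1.3.3.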
1.2.5 Anomaly Detection IDS
Anomaly detection systems are another form of intrusion detection system.
While a signature matching based intrusion detection system compares
arriving packets with available signatures, an anomaly detection IDS defines a
normal state for the network. By a normal state we mean that an
anomaly based system observes the normal flow of data over the intended
network for a pre-specified period of time. Using this data, the system defines
an average (normal) packet for the network. This average packet is different
for different protocols. After this normal packet has been defined, the system
compares all incoming packets with it. Any packet found to
deviate considerably from this normal state is classified as malicious for the
intended network.
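The approach can be sketched as follows. The features chosen (packet length and TTL), the training values and the threshold are all invented for illustration; a real system would learn many more header attributes.

```python
# Toy sketch of anomaly detection: learn an "average" packet from normal
# traffic, then flag packets whose Euclidean distance from that average
# exceeds a threshold.

import math

def mean_packet(packets):
    """Average each feature over the observed training window."""
    n = len(packets)
    return [sum(p[i] for p in packets) / n for i in range(len(packets[0]))]

def is_anomalous(packet, normal, threshold):
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(packet, normal)))
    return dist > threshold

# Features: [packet length, TTL] observed during the training period.
training = [[60, 64], [64, 64], [60, 63], [62, 64]]
normal = mean_packet(training)
print(is_anomalous([61, 64], normal, threshold=10.0))   # close to normal: False
print(is_anomalous([1500, 1], normal, threshold=10.0))  # far from normal: True
```

The threshold embodies the leverage tradeoff discussed in Chapter 2: lowering it catches more attacks but raises the false positive rate.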
1.3 Problems with Current Solutions
1.3.1 Problems with Stateless Firewalls
Stateless firewalls suffer from several significant drawbacks that make them
insufficient to safeguard networks by themselves. The major drawbacks to
stateless firewalls are:

- They cannot check the data (payload) that packets contain.
- They do not retain the state of connections.
- TCP can only be filtered in the 0th fragment.
- Public services must be forwarded through the filter.
- Trojan horses can defeat packet filters using NAT.
- Low pass blocking filters don't catch high port connections.

Each of these is explained below.
The first drawback pertains to what the packet filters check prior to either
dropping or permitting a packet access to the network. Packet filters apply
criteria to a packet's protocol header, which says nothing about the data
portion of the packet. As an example, an HTTP packet flowing into a network
could contain Trojan horses embedded in ActiveX controls. The packet filter
cannot detect this because it's not part of the packet's protocol header. The
other limitation has to do with the fact that stateless firewalls don't retain a
memory of connections between host computers. As such, a hacker can send a
packet and claim that it belongs to a connection, and a stateless firewall has
minimal ability (checking the packet's SYN flag, which a hacker can set) to
determine that it doesn't.

As hacker sophistication and the use of the Internet have grown,
hackers are now able to gain access to networks through e-mail and web
servers that are open to the public. If these servers are part of a larger
network, a hacker can gain access to the larger network through them.
1.3.2 Problems with Stateful Firewalls
Despite the fact that many stateful firewalls by definition can examine
application layer traffic, holes in their implementation prevent stateful firewalls
from being a replacement for proxy firewalls in environments that need the
utmost in application-level control. The main problems with the stateful
examination of application-level traffic involve the abbreviated examination of
application-level traffic and the lack of thoroughness of this examination,
including the firewall's inability to track the content of said application flow.
To provide better performance, many stateful firewalls abbreviate
examinations by performing only an application-level examination of the
packet that initiates a communication session, which means that all
subsequent packets are tracked through the state table using Layer 4
information and lower. This is an efficient way to track communications, but it
lacks the ability to consider the full application dialog of a session. In turn, any
deviant application-level behavior after the initial packet might be missed, and
there are no checks to verify that proper application commands are being
used throughout the communication session.
However, because the state table entry will record at least the source and
destination IP address and port information, whatever exploit was applied
would have to involve those two communicating parties and transpire over the
same port numbers. Also, the connection that established the state table
entry would not be properly terminated, or the entry would be instantly
cleared. Finally, whatever activity transpired would have to take place in the
time left on the timeout of the state table entry in question. Making such an
exploit work would take a determined attacker or involve an accomplice on
the inside.
Another issue with the way stateful inspection firewalls handle application-level
traffic is that they typically watch traffic more for triggers than for a full
understanding of the communication dialog; therefore, they lack full
application support. As an example, a stateful device might be monitoring an
FTP session for the port command, but it might let other non-FTP traffic pass
through the FTP port as normal. Such is the nature of a stateful firewall; it is
most often reactive and not proactive. A stateful firewall simply filters on one
particular command type on which it must act rather than considering each
command that might pass in a communication flow. Such behavior, although
efficient, can leave openings for unwanted communications types, such as
those used by covert channels or those used by outbound devious application
traffic.
In the previous example, we considered that the stateful firewall watches
diligently for the FTP port command, while letting non-FTP traffic traverse
without issue. For this reason, it would be possible in most standard stateful
firewall implementations to pass traffic of one protocol through a port that was
being monitored at the application level for a different protocol. For example,
if you are only allowing HTTP traffic on TCP port 80 out of your stateful
firewall, an inside user could run a communication channel of some sort (that
uses a protocol other than the HTTP protocol) to an outside server listening
for such communications on port 80.
Another potential issue with a stateful firewall is its inability to monitor the
content of allowed traffic. For example, because you allow HTTP and HTTPS
out through your firewall, it would be possible for an inside user to contact an
outside website service such as www.gotomypc.com. This website offers
users the ability to access their PC from anywhere via the web. The firewall
will not prevent this access, because the user's desktop initiates a connection to
the outside Gotomypc.com server via TCP port 443 using HTTPS, which is
allowed by your firewall policy. Then the user can contact the Gotomypc.com
server from the outside and it will "proxy" the user's access back to his
desktop via the same TCP port 443 data flow. The whole communication will
transpire over HTTPS. The firewall won't be able to prevent this obvious
security breach because the application inspection portion of most stateful
firewalls really isn't meant to consider content. It is looking for certain trigger
application behaviors, but most often (with some exceptions) not the lack thereof.
1.3.3 Problems with Signature Matching IDS
Essentially all signatures matching IDS lack the ability to detect new attacks.
Since signature matching is based on a database of signatures, new attacks
or variations of old attacks are not logged into the database and hence forth
go undetected by the IDS. This creates a major loop hole in the network
security.
Furthermore, the number of known attacks and their different variations run
into thousands. The database maintains signatures for all these attacks
making the database enormous. Each packet that arrives on the network is
matched against this entire database. This makes signature matching based
IDS very slow. Also as time progresses, this database will continue grow and
so will the bandwidth of the Internet. This would mean that packets would be
arriving at a much faster rate at the network and the signature database will
be bigger as well. Under such circumstances, it will become virtually
impossible for a signature matching based IDS to do detection in real time,
creating a further security loop hole for hackers.
1.3.4 Problems with Anomaly based IDS
Anomaly detection systems suffer from an inherent drawback: they
generate too many false positives. A false positive is a situation where the IDS
declares a packet an attack whereas in reality that packet is non-malicious.
Since every declaration of an attack by an IDS needs to be sent to the
response system, excessive false positives will overwhelm the response
system.
1.4 Development Steps
Initially we started off in one direction; however, the final result which we
accomplished was drastically different from our initial thinking. We based our
early work on the assumption that there must be some common features amongst all
attacks, and based on this assumption we started by collecting attack data.
However, as time progressed, we realized that no such commonality existed.
From that point onwards we started experimenting with the currently available
technologies, understanding them and attempting to combine them in a manner that
would bring about a new hybrid system covering the shortcomings of all the
currently available solutions for network security. The following section gives a
complete account of all the work undertaken, both in the initial phase of the project
and later in developing the hybrid system.
Chapter Two
Research Work
2.1 Honeyd
What Honeyd essentially does is simulate a virtual network consisting of
routers, print servers, web servers and the like. This virtual network is then put on
the web with the intention of allowing hackers to attack it. Whenever a
hacker attacks this network, an invisible logger records all the activities that the hacker
undertakes, without his/her knowledge. The basic purpose behind deploying Honeyd
was to collect attack data for our initial assumption, i.e. to derive a common feature
among all attacks. Honeyd was supposed to provide us with the data and statistics
that we required to carry out our work. During the course of our research into the use
and deployment of Honeyd, the following technical papers were read:
Article: Usage of Honey pots for Detection and Analysis of unknown Security Attacks
Author: Prof. Dr.-Ing. A. Wolisz (University of Berlin)
Abstract: This article, which was originally written as a PhD thesis, discusses in
detail the design of a virtual Honeyd network. It talks about the various
issues in setting up a Honeyd network and how Honeyd operates. It
overviews the different forms of signatures and hack attacks currently
available. However, further research is still required in this area
because this article only gave an overview of different forms of
intrusions but left out the details that are more relevant to
our current project.
Article: Honey pots: Weighing up the costs and benefits
Author: Andrew Evans
Abstract: This article was more of a side reading to determine the feasibility of
using honey pots for our senior project. No design or implementation
issues were discussed. The article contained a critical analysis of the
use of honey pots against currently available solutions for
intrusion detection, primarily intrusion detection systems.
Article: To Build a Honey pot
Author: Lance Spitzner
Abstract: This again was among the readings which aided in gaining
a better idea about the working of a honey pot. What was essentially
different in this article from the first one was that it concentrated
less on the design of the honey pot and more on how to use it
and how different services can be simulated on a Honeyd
network.
2.2 Anomaly Detection
The anomaly detection technique works on headers, not payload. This makes
the technique very fast and a good candidate for real-time detection, as no lengthy
payload string comparisons need to be done. One disadvantage is that it produces a
lot of false positives, because it only makes an intelligent guess. But this
disadvantage can become an advantage: because the system is making an intelligent
guess, any new attack that has not been seen before may be detected.

With progress in developing software and methods to counter network attacks,
hackers and attackers are getting cleverer and deploying ever newer
techniques. So you are not safe from attacks just by guarding yourself
against known attacks; you have to anticipate the methods and intentions of
attackers and come up with a means to counteract unknown attacks to some degree.
Anomaly detection techniques can catch new attacks that have not been seen
before: because incoming packets are compared with normal packets, any
abnormal packet, seen or unseen, will be detected. But these are only the
attacks that can be identified from headers. A new virus or worm would not be
detected by such a system, because viruses and worms are carried in the payloads
of packets.

Since not all normal packets can be incorporated in these clusters, false
positives are produced. Here there is a tradeoff: the more leverage you give the
system, the more false positives are produced, but at the same time more
novel attacks are detected and fewer false negatives are produced, and
vice versa. So one has to decide how much leverage should be given to the
system.
Anomaly detection seemed a good choice for fast real-time detection of attacks
using packet headers. We reached this decision after reading various papers on
real-time intrusion detection. It was seen that if fast attack detection is needed,
then the payload should not be touched, as payload comparisons are very costly;
detection using header information was seen as very fast. One paper that
really caught our attention was NATE (Proceedings of the 2001 Workshop on New
Security Paradigms, ACM Press), which measures TCP header information and
defines clusters for it. It then detects deviations of new TCP header data
from the existing clusters to decide whether the header represents a normal or an
attack packet. It gave us an idea of how to use headers in clustering. Another paper,
Anomaly Detection using TCP Header Information by Weijie Cai and Li, helped us
further, as it gave detailed information about the usage of TCP headers. It is based
on the idea of NATE and describes all the TCP flags needed and how they should
be checked.
2.2.1 Clustering: A Data Mining Technique
It is a common practice to employ data mining techniques for anomaly
detection. A common such technique is clustering. Clusters are groups of
objects that are similar to each other: similar objects end up in one cluster.
Clusters have high intra-cluster similarity and low inter-cluster similarity,
which means that the similarity between any two objects of two DIFFERENT
clusters is less than the similarity between any two objects of the SAME cluster.
Clustering has various applications. One field is marketing, where it
helps marketers discover groups among customers and use these to
develop targeted marketing programs. Another area is city planning:
planners can identify groups of houses according to house type, value and
geographical location by using this technique. The World Wide Web also benefits,
as documents can be classified and web log data can be clustered
to discover groups of similar access patterns.

The objects to be categorized into clusters can be numerical or non-numerical,
and depending on the type of the objects, there are various similarity
measures and algorithms available for clustering. Similarity is expressed in
terms of a distance function, which is typically a metric. Some popular distance
measures for numerical data are the Minkowski distance, Manhattan distance
and Euclidean distance. We used the Euclidean distance in our project.
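As a small worked example, the Euclidean and Manhattan distances between two numeric feature vectors can be computed as follows (the vectors are invented for illustration):

```python
# Two distance measures over numeric feature vectors: Euclidean
# (straight-line) and Manhattan (sum of per-coordinate differences).

import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```

Both are special cases of the Minkowski distance (orders 2 and 1 respectively).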
2.2.2 Types of Clustering Algorithms
There are five basic clustering approaches. Partitioning algorithms construct
various partitions and then evaluate them by some criterion. Hierarchical
algorithms create a hierarchical decomposition of the data set using some
criterion. Density-based algorithms produce clusters based on connectivity
and density functions. Grid-based clustering is based on a multiple-level
granularity structure. And in model-based clustering, a model is hypothesized
for each cluster and the idea is to find the best fit of the data to that model.
These are the basic approaches. There are various algorithms that follow
either one of these approaches or are a mixture of them. Different algorithms
were reviewed in order to decide which algorithm should be used in our
project to cluster the header information. The information needed to
differentiate attack packets from normal packets is basically numeric; so we
looked at algorithms that worked with numeric data. We needed an algorithm
that was simple yet efficient. The following sections describe the algorithms
we short-listed for implementation, of which one was eventually selected.
2.2.3 Partitioning Algorithms
The K-Means Algorithm constructs a partition of a database D of n objects
into k clusters. Each cluster is represented by its mean, and a point is placed
in the cluster whose mean is nearest to it.
Its strength is that it is relatively efficient and simple: it takes O(tkn) time,
where t is the number of iterations, k is the number of clusters, and n is the
number of points. Its weakness is that the number of clusters must be
specified in advance.
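A minimal sketch of the K-Means loop, assuming numeric points as tuples; the random initialization and convergence test are simplified compared to production implementations:

```python
import math
import random

def kmeans(points, k, iterations=100):
    """Naive k-means: assign each point to its nearest mean, then
    recompute each mean, until the means stop changing."""
    means = random.sample(points, k)          # simple random initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [math.dist(p, m) for m in means]
            clusters[d.index(min(d))].append(p)
        new_means = []
        for c, m in zip(clusters, means):
            if c:
                new_means.append(tuple(sum(x) / len(c) for x in zip(*c)))
            else:
                new_means.append(m)           # keep old mean for an empty cluster
        if new_means == means:                # converged
            break
        means = new_means
    return means, clusters
```

On two well-separated groups of points, the loop converges to the two group centers regardless of which points are sampled as the initial means.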
There are some variations of the K-Means Algorithm, such as the K-Modes
Algorithm, which uses modes instead of means, and the K-Medoid Algorithm,
which uses medoids (actual representative objects) instead of means. A
well-known K-Medoid algorithm is PAM (Partitioning Around Medoids), which
works for small data sets but does not scale to large ones, so PAM could not
be used here. CLARA builds on PAM and improves its performance by using
sampling, so it can deal with larger data sets; CLARANS is more efficient still,
but these algorithms are complex.
2.2.4 Hierarchical Clustering Algorithms
AGNES and DIANA are two hierarchical clustering algorithms: AGNES is
agglomerative and DIANA is divisive. Neither requires the number of clusters
k as input, but both need a terminating condition.
AGNES proceeds bottom-up, repeatedly merging the clusters with the least
dissimilarity until all objects belong to a single cluster, which is why a
terminating condition is needed. DIANA works in the inverse order of
AGNES, splitting top-down.
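A naive single-linkage sketch of the agglomerative (AGNES-style) idea, using "k clusters remaining" as the terminating condition; this is illustrative only and far less efficient than real implementations:

```python
import math

def agnes(points, k):
    """Single-linkage agglomerative clustering: repeatedly merge the two
    clusters with the least dissimilarity until k clusters remain."""
    clusters = [[p] for p in points]          # start with singleton clusters
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage: distance between closest members
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge the closest pair
    return clusters
```

With the terminating condition k = 2, two well-separated groups of points end up as the two final clusters.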
These algorithms do not scale well: agglomerative methods have time
complexity of at least O(n²). We also looked at BIRCH, which builds an
in-memory tree of the data and then applies a clustering algorithm to it, but it
is sensitive to the order of the data records.
2.2.5 Density Based Clustering Algorithms
Several density-based algorithms exist. One is DBSCAN, a long and complex
algorithm. Another, OPTICS, is based upon it: OPTICS creates an
augmented ordering of the data set with respect to its density-based
clustering, from which clusters of all densities can be derived. DENCLUE is
faster than OPTICS and DBSCAN and works well for data sets with large
amounts of noise, but it needs a large number of parameters.
2.3 Misuse Detection
Misuse detection essentially checks for "activity that is bad" by comparison with
abstracted descriptions of undesired activity. This approach attempts to draft rules
describing known undesired usage (based on past penetrations, or on theorized
activity that would exploit known weaknesses) rather than describing historical
"normal" usage. Rules may be written to recognize a single auditable event that in
and of itself represents a threat to system security, or a sequence of events that
represents a prolonged penetration scenario. The effectiveness of the provided
misuse detection rules depends on how knowledgeable the developers are about
vulnerabilities. Misuse detection may be implemented with expert system rules,
model-based reasoning, state transition analysis, or neural networks.
Expert Systems may be used to code misuse signatures as if-then
implication rules. Signature analysis focuses on defining specific descriptions
and instances of attack-type behavior to flag. Signatures describe an attribute
of an attack or class of attacks, and may require the recognition of sequences
of events. A misuse information database provides a quick-and-dirty
capability to address newly identified attacks prior to overcoming the
vulnerability on the target system. Typically, misuse rules tend to be specific
to the target machine, and thus not very portable.
Model Based Reasoning attempts to combine models of misuse with
evidential reasoning to support conclusions about the occurrence of a misuse.
This technique seeks to model intrusions at a higher level of abstraction than
the audit records. In this technique, developers develop intrusion descriptions
at a high, intuitive level of abstraction in terms of sequences of events that
define the intrusion. This technique may be useful for identifying intrusions
which are closely related, but whose audit trails patterns are different. It
permits the selective narrowing of the focus of the relevant data, so a smaller
part of the collected data needs to be examined. As a rule-based approach it
is still based on being able to define and monitor known intrusions, whereas
new and unknown vulnerabilities and attacks are the greatest threats.
State Transition Analysis creates a state transition model of known
penetrations. In the Initial State the intruder has some prerequisite access to
the system. The intruder executes a series of actions which take the target
system through intermediate states and may eventually result in a
Compromised State. The model specifies state variables, intruder actions,
and defines the meaning of a compromised state. Evidence is pre-selected
from the audit trail to assess the possibility that current system activity
matches a modeled sequence of intruder penetration activity (i.e., described
state transitions lead to a compromised state). Based upon an ongoing set of
partial matches, specific audit data may be sought for confirmation. The
higher level representation of intrusions allows this technique to recognize
variations of scenarios missed by lower level approaches.
Neural Networks offer an alternative means of maintaining a model of
expected normal user behavior. They may offer a more efficient, less
complex, and better performing model than mean and standard deviation,
time-decayed models of system and user behavior. Neural network
techniques are still in the research stage and their utility has yet to be
proven. They may be found to be more efficient and less computationally
intensive than conventional rule-based systems. However, a lengthy, careful
training phase with skilled monitoring is required.
This was followed by a review of expert systems techniques.
Expert systems mainly use signature matching as the technique for detection of
known attacks. However, the greatest problem encountered with these systems is
that the signatures database is quite large. In order to gain a better understanding of
how a very large signatures database can create problems, imagine a database
containing around 1000 signatures, and assume that on average 100 packets arrive
per second on the network (a very conservative figure; in reality it runs into
thousands or even millions depending on the bandwidth and capacity of the link).
Since the signature matching mechanism requires that every arriving packet be
compared against the database, this leads to roughly 100,000 comparisons every
second. Imagine a gigabit network, or even a 100 Mbps network, and you will realize
how quickly the number of comparisons grows. The problem is further compounded
by substring matching: the majority of signatures target the packet payload, so these
payload-based signatures must be located within the packet payloads, a
computationally expensive process.
Computer scientists around the world have been working on this problem and have
devised a number of effective solutions for locating signatures within packets
efficiently and reliably. The proposed signature matching algorithms fall into two
broad categories:
Single Pattern Matching Algorithms
Algorithms in this category share one common feature: they compare one signature
at a time. Any such algorithm picks one signature from the signatures database
and, only after it has determined whether that signature is located within the packet,
moves on to the next signature. The algorithms in this category are described
below:
2.3.1 Boyer-Moore algorithm:
The most well-known algorithm for matching a single pattern against an input
was proposed by Boyer and Moore. The Boyer-Moore algorithm compares
the search string with the input starting from the rightmost character of the
search string. This allows the use of two heuristics that may reduce the
number of comparisons needed for string matching (compared to the naive
algorithm). Both heuristics are triggered on a mismatch. The first heuristic,
called the bad character heuristic, works as follows: if the mismatching
character appears in the search string, the search string is shifted so that the
mismatching character is aligned with the rightmost position at which the
mismatching character appears in the search string. If the mismatching
character does not appear in the search string, the search string is shifted so
that the first character of the pattern is one position past the mismatching
character in the input. The second heuristic, called the good suffix heuristic,
is also triggered on a mismatch: if the mismatch occurs in the middle of the
search string, there is a non-empty suffix of the search string that has already
matched, and the heuristic shifts the search string to align with the next
occurrence of that suffix within it.
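A sketch of Boyer-Moore using only the bad character heuristic described above (the good suffix heuristic is omitted for brevity):

```python
def bad_character_search(text, pattern):
    """Boyer-Moore search using only the bad character heuristic.
    Returns the index of the first match, or -1."""
    # Rightmost position of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    m, n = len(pattern), len(text)
    s = 0                                  # current alignment of the pattern
    while s <= n - m:
        j = m - 1
        # Compare right to left, starting from the rightmost character.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s                       # full match
        # Shift so the mismatching text character lines up with its
        # rightmost occurrence in the pattern, or shift past it entirely.
        s += max(1, j - last.get(text[s + j], -1))
    return -1
```

For example, `bad_character_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE")` returns 17.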
2.3.2 ExB and E2xB
E. P. Markatos and K. G. Anagnostakis proposed the exclusion-based string
matching algorithms in 2002 and 2003. The idea is that for a pattern P, if any
character pi does not appear in text T, then P cannot be found in T. The
algorithm is designed mainly for NIDS and makes two assumptions. The first
is that most traffic will not trigger patterns, so in the common case we do not
need to invoke an expensive algorithm such as Boyer-Moore to verify that
the traffic is safe. The second is that the text T is not too big; otherwise the
effectiveness of the idea decreases dramatically, because the chance that T
contains all the characters of P increases as T gets big. This forces the NIDS
not to accumulate too much data before pattern matching. Given the trend
for NIDS to be increasingly data-stream oriented (e.g. TCP data streams),
this limitation may make the algorithm less valuable.
To gain a better understanding of the ExB algorithm, we first give its
pseudo-code and then explain it.
boolean exists[256];

pre_process(char *text_T, int len_of_T)
{
    bzero(exists, 256/8);    // clear array
    for (int idx = 0; idx < len_of_T; idx++)
        exists[text_T[idx]] = 1;
}

search(char *pattern_P, char *text_T, int len_of_P, int len_of_T)
{
    for (int idx = 0; idx < len_of_P; idx++)
    {
        if (exists[pattern_P[idx]] == 0)
            return DOES_NOT_EXIST;
    }
    return boyer_moore(pattern_P, len_of_P, text_T, len_of_T);
}
Given above is the pseudo-code of the ExB algorithm. It excludes patterns
that cannot match the packet in two steps. In the first step, pre_process()
records the existence of each character of the packet (text T) in the array
exists[256]. In the second step, search() checks whether signature pattern P
occurs in text T: if any character of P does not show up in exists[256], we can
conclude that P cannot match T. If there are multiple patterns, the second
step is called repeatedly, but the first step is called only once for all patterns.
To decrease false positives, the idea can be generalized to use more bits per
character when building exists[]. For common packets with nearly 1500 bytes
of data, the chance that all byte values show up in a packet is high, so the
8-bit-indexed array (256 elements) will have a high false positive rate.
The major enhancement of E2xB over ExB is to use integers for the array
exists[256]. The reason for this change is that clearing exists[256] for each
search (each packet in a NIDS) is too expensive. With integer entries,
pre_process() assigns the ID of the current packet to exists[256], and
search() checks whether exists[x] equals the ID of the packet; if not,
character x of pattern P does not show up in this packet (text T).
Multi Pattern Matching Algorithms
Unlike single pattern matching algorithms, which take one signature at a
time and try to locate it within the arriving packet, multi-pattern algorithms
simultaneously match a single packet against all available signatures. How
is this done? For a better explanation we will look at some currently
available multi-pattern matching algorithms.
2.3.3 Wu and Manber’s algorithm
Sun Wu and Udi Manber proposed their multi-pattern matching algorithm in
1994. It mainly uses the bad character heuristic of the Boyer-Moore
algorithm, but since a large number of patterns decreases the chance of a
large shift, the algorithm uses a block of characters, say 2 or 3 characters,
to find a movement offset.
Let’s see an example to understand how it works. The figure below shows
that we are going to find 4 patterns, P1, P2, P3 and P4 in the text T. The
comparison region covers 2 characters. It is also the suffix region and the
character block that we are going to use to find patterns in the text.
Figure 1: Illustration of the Wu and Manber’s Algorithm
In step 1, “12” of text T is in the character block. Using the pre-computed
movement table, we know “12” matches P2’s substring “12”, so the text is
shifted by 4. The movement table is obtained using the same idea as the
Boyer-Moore algorithm. In step 2, “56” of text T is in the character block.
Since “56” of text T matches the substring “56” of P1, P2, and P3, the
character block is good, and the bad character heuristic cannot tell us how
far to move. The algorithm therefore pre-populates a hash table with the
suffix-region character blocks of all patterns, from which it can quickly obtain
all potential patterns; in this case, P1, P2, and P3.
To eliminate patterns that will not match the text, the algorithm discards
those that do not match the text in the prefix region. In this case, pattern P1
is eliminated quickly without checking all of its characters, leaving only P2
and P3. The algorithm then checks each remaining pattern against the text T
in full. If no pattern matches, the text is shifted by one and the process
restarts from step 1.
The key point of this algorithm is that its larger character block (2 or 3
characters instead of 1) usually allows the bad character heuristic to shift the
text quickly. When the substring in the character block does match some
patterns, the bad character heuristic no longer helps; the solution is to use
the hash table to find the potential patterns and the prefix to eliminate
ineligible ones. The number of remaining patterns is hopefully small, and a
naive comparison then verifies whether each of them occurs in the text.
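A much-simplified sketch of the Wu-Manber idea with a block size of 2; real implementations pack the shift and hash tables far more carefully, and this sketch assumes every pattern is at least as long as the block:

```python
from collections import defaultdict

B = 2  # character block size

def build_tables(patterns):
    """Build the shift table and the suffix-block hash table. Patterns are
    indexed by the length m of the shortest one (full patterns are kept
    for verification)."""
    m = min(len(p) for p in patterns)
    shift = defaultdict(lambda: m - B + 1)     # default: maximum shift
    bucket = defaultdict(list)                 # suffix block -> patterns
    for p in patterns:
        for i in range(m - B + 1):
            block = p[i:i + B]
            # Distance from this block to the end of the m-char prefix.
            shift[block] = min(shift[block], m - B - i)
        bucket[p[m - B:m]].append(p)
    return m, shift, bucket

def wu_manber(text, patterns):
    m, shift, bucket = build_tables(patterns)
    hits = []
    pos = m - B                                # index of the current block
    while pos <= len(text) - B:
        block = text[pos:pos + B]
        s = shift[block]
        if s > 0:
            pos += s                           # bad-character-style shift
            continue
        start = pos + B - m                    # candidate match start
        for p in bucket[block]:                # only the potential patterns
            if text.startswith(p, start):      # naive final verification
                hits.append((start, p))
        pos += 1
    return hits
```

For instance, `wu_manber("xxabcdefxx", ["abcd", "bcde", "cdef"])` finds all three patterns, skipping ahead by the shift table wherever the current block cannot end a match.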
2.3.4 Kim’s Algorithm
Sun Kim and Yanggon Kim proposed their encoding- and hashing-based
multi-pattern matching algorithm in 1999. There are two basic ideas. The
first is that patterns may contain only a few distinct characters, so the
characters can be encoded with fewer bits, the same idea used by
compression algorithms.
With fewer bits to compare, fewer comparisons are needed. The second idea
is that in each comparison only some patterns, rather than all of them, need
to be compared with the text; the potential patterns are found via hashing. In
fact, these two ideas are independent, and either can work by itself.
Let’s see an example to understand how the encoding scheme works. In the
following example, the pattern string has only 3 distinct characters: a, b, and
c, so a 2-bit encoding is enough to represent the pattern. For the text string,
all characters other than a, b, and c (e.g. d, e, f, g) need only one shared
value; even if we encode all characters that are not in the pattern to one
value, the pattern matching will still be correct. In the example below, “abc”
of text T and pattern P still match after encoding, but “de” of text T does not
match “ac” of pattern P, because the encoded value of “de” is not equal to
the encoded value of “ac”. This encoding scheme resembles the ExB
algorithm: the special encoded value for characters not in the pattern plays
the role of exclusion.
Pattern string: abcac
Text string: abcdefg
Encoding scheme: a: 00, b: 01, c: 10, all other characters: 11 Encoded pattern
string: 00 01 10 00 10
Encoded text string: 00 01 10 11 11 11 11
To search for one pattern in the text, the algorithm proposed in the paper is
the naive one: character-by-character comparison. In our example, if a
comparison fails, the encoded pattern string moves right by 2 bits
(corresponding to one character of the original strings). But because the
encoded pattern string is shorter, we may be able to compare all of its
characters with the text in one machine instruction: a 32-bit machine needs
two instructions to compare the original 5-character pattern with the text,
whereas the encoded pattern needs only one, because it is only 10 bits long.
To search for multiple patterns in the text, the authors propose a hash table
to decrease the number of patterns that need to be compared against the
text. All patterns are hashed into the table, keyed on their first j characters.
For instance, suppose we add two more patterns to the example above:
acbcaaabb and cccaaabbbccc. The value of j is chosen as the minimum
pattern length, so here j is 5, the length of abcac. The hash table is
pre-populated with the patterns, and in each comparison the current j
characters of the text are used as the key to search the table, yielding only
the potential patterns. Patterns whose hash value differs from that of the j
characters of the text cannot match, so it is safe to use this hash scheme to
find the potential patterns.
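The encoding idea can be sketched with a dictionary in place of true 2-bit packing (the bit-level packing that makes the scheme fast on real hardware is omitted):

```python
def build_encoding(pattern):
    """Kim-style encoding: each distinct pattern character gets its own
    small code; every character NOT in the pattern collapses to one
    shared extra code, which acts as ExB-style exclusion."""
    alphabet = sorted(set(pattern))
    codes = {c: i for i, c in enumerate(alphabet)}
    other = len(alphabet)               # shared code for all other characters
    return lambda s: tuple(codes.get(c, other) for c in s)

# Encoding derived from the pattern "abcac": a -> 0, b -> 1, c -> 2,
# every other character -> 3.
encode = build_encoding("abcac")
```

With this encoding, `encode("abcac")` is `(0, 1, 2, 0, 2)` and `encode("abcdefg")` is `(0, 1, 2, 3, 3, 3, 3)`, matching the 2-bit example above: "abc" still matches after encoding, while "de" cannot match "ac" because its shared "other" code differs.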
2.4 Derived Attributes
As mentioned before, the medium of attack for the attacker lies only in two places,
i.e. the packet payload and the packet header. The research done on how to detect
various kinds of attacks through these mediums led us to understand that the
current intrusion detection products in the market use only a basic set of options
which may consist of signature matching on the basis of rules, pattern recognition
and state-based deviation detection systems. Further research on the types of
attacks made it obvious that a different set of rules, not part of the basic set, was
being used. These rules are defined as derived attributes.
Derived attributes use domain knowledge to construct attributes that encompass a
wider range of attacks, and were therefore of more interest to us. Our research on
derived attributes drew primarily on the 1999 KDD Cup data, the 1998 DARPA
intrusion detection evaluation program, and research papers such as "Cost-based
Modeling and Evaluation for Data Mining With Application to Fraud and Intrusion
Detection: Results from the JAM Project" by Salvatore J. Stolfo, Wei Fan, Wenke
Lee, Andreas Prodromidis, and Philip K. Chan.
The objective of the DARPA evaluation program arranged by MIT Lincoln Labs was
to survey and evaluate research in intrusion detection. They divided attacks in four
main categories:
- DOS: denial of service, e.g. syn flood;
- R2L: unauthorized access from a remote machine, e.g. guessing passwords;
- U2R: unauthorized access to local superuser (root) privileges, e.g. various
  "buffer overflow" attacks;
- Probing: surveillance and other probing, e.g. port scanning.
For detecting these attacks, just checking the payload for signatures and headers is
not enough, as that will not detect new kinds of attacks. Some higher-level features
were required, which were found on the KDD Cup site. These derived features are
as follows:
feature name      description                                          type
duration          length (number of seconds) of the connection         continuous
protocol_type     type of the protocol, e.g. tcp, udp, etc.            discrete
service           network service on the destination, e.g. http,       discrete
                  telnet, etc.
src_bytes         number of data bytes from source to destination      continuous
dst_bytes         number of data bytes from destination to source      continuous
flag              normal or error status of the connection             discrete
land              1 if connection is from/to the same host/port;       discrete
                  0 otherwise
wrong_fragment    number of "wrong" fragments                          continuous
urgent            number of urgent packets                             continuous
Table 1: Basic features of individual TCP connections.
The features described in Table 1 are the basic features of TCP connections and
will nearly always be used. All of them can be extracted from the TCP header, and
they are used in nearly all the intrusion detection systems on the market. The
following table lists the derived content features.
feature name          description                                      type
hot                   number of "hot" indicators                       continuous
num_failed_logins     number of failed login attempts                  continuous
logged_in             1 if successfully logged in; 0 otherwise         discrete
num_compromised       number of "compromised" conditions               continuous
root_shell            1 if root shell is obtained; 0 otherwise         discrete
su_attempted          1 if "su root" command attempted; 0 otherwise    discrete
num_root              number of "root" accesses                        continuous
num_file_creations    number of file creation operations               continuous
num_shells            number of shell prompts                          continuous
num_access_files      number of operations on access control files     continuous
num_outbound_cmds     number of outbound commands in an ftp session    continuous
is_hot_login          1 if the login belongs to the "hot" list;        discrete
                      0 otherwise
is_guest_login        1 if the login is a "guest" login; 0 otherwise   discrete
Table 2: Content features within a connection suggested by domain knowledge.
Table 2 lists the content features suggested by domain knowledge. These features
are part and parcel of the payload of the TCP packet and must be extracted from
there; nevertheless, they give precious information regarding the usage of the
system by an external source, and they cover most of the R2L and U2R attacks.
The hot feature describes events like the transfer of a file which can carry malicious
code, access to system directories, and the creation and execution of programs.
The compromised condition covers wrong path traversal errors such as path not
found / file not found errors. This is one of the most important tables, and its
features are necessary to detect login attacks. Table 3 gives a time-based analysis
of the traffic to find an attack.
feature name        description                                        type
count               number of connections to the same host as the      continuous
                    current connection in the past two seconds
Note: the following features refer to these same-host connections.
serror_rate         % of connections that have "SYN" errors            continuous
rerror_rate         % of connections that have "REJ" errors            continuous
same_srv_rate       % of connections to the same service               continuous
diff_srv_rate       % of connections to different services             continuous
srv_count           number of connections to the same service as the   continuous
                    current connection in the past two seconds
Note: the following features refer to these same-service connections.
srv_serror_rate     % of connections that have "SYN" errors            continuous
srv_rerror_rate     % of connections that have "REJ" errors            continuous
srv_diff_host_rate  % of connections to different hosts                continuous
Table 3: Traffic features computed using a two-second time window
As is obvious from Table 3, these features gather time-based information from the
traffic to determine whether an attack has taken place, making them a good
derivation for detecting floods and scans. Although this table was important, we did
not use it, because most of the detection it performs was already covered by other
modules we had designed.
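To illustrate how such a two-second traffic feature can be derived, here is a sketch of the Table 3 "count" feature over time-ordered connection records; the record format `(timestamp, host)` is our own simplifying assumption:

```python
from collections import deque

def count_feature(connections, window=2.0):
    """For each (timestamp, host) record, compute the Table 3 'count'
    feature: number of earlier connections to the same host within the
    past `window` seconds. Records must be in timestamp order."""
    recent = deque()                    # records still inside the window
    counts = []
    for t, host in connections:
        while recent and t - recent[0][0] > window:
            recent.popleft()            # drop records older than the window
        counts.append(sum(1 for _, h in recent if h == host))
        recent.append((t, host))
    return counts
```

A burst of connections to one host within two seconds drives its count up, which is exactly the signal useful for detecting floods and scans.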
2.5 Packet Header
To implement a good intrusion detection system, one needs a good understanding
of how network packets operate and what their components are. We therefore had
to gain in-depth knowledge of the kinds of packets there are; the most common
include:
- TCP
- UDP
- ICMP
- IP
- IPv6
Most of the packets transported over the network are carried over the IP protocol
which has the following header properties:
 0       4       8               16                              32 bits
+-------+-------+---------------+--------------------------------+
| Ver.  | IHL   | Type of service | Total length                 |
+-------+-------+---------------+--------+-----------------------+
| Identification                | Flags  | Fragment offset       |
+---------------+---------------+--------+-----------------------+
| Time to live  | Protocol      | Header checksum               |
+---------------+---------------+-------------------------------+
| Source address                                                |
+---------------------------------------------------------------+
| Destination address                                           |
+---------------------------------------------------------------+
| Option + Padding                                              |
+---------------------------------------------------------------+
Figure 2: IP header structure
This shows that the IP header is responsible for directing packets to/from IP
sources. Other protocols are built on top of the IP protocol, e.g. TCP, which is
responsible for a reliable end-to-end service. The TCP header structure is as
follows:
 0               16                              32 bits
+----------------+-------------------------------+
| Source port    | Destination port              |
+----------------+-------------------------------+
| Sequence number                                |
+------------------------------------------------+
| Acknowledgement number                         |
+--------+--------+-+-+-+-+-+-+------------------+
| Offset | Resrvd |U|A|P|R|S|F| Window           |
+--------+--------+-+-+-+-+-+-+------------------+
| Checksum        | Urgent pointer               |
+-----------------+------------------------------+
| Option + Padding                               |
+------------------------------------------------+
Figure 3: TCP header structure
Most of the anomalies and attacks used in scans and floods manipulate the TCP
header by setting certain combinations of flag values, which may, for example,
establish a connection to obtain the computer's operating system identification and
then end the connection.
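Such flag-combination checks can be sketched as a lookup table; the flag bit positions are the standard TCP ones, while the particular combinations listed are well-known illustrative scan fingerprints, not an exact rule set:

```python
# Standard TCP flag bits (the low 6 bits of byte 13 of the TCP header).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# Illustrative malicious combinations used by scanning tools.
SUSPICIOUS = {
    SYN | FIN:        "SYN+FIN (illegal combination)",
    0:                "null scan (no flags set)",
    FIN | PSH | URG:  "Xmas scan",
}

def check_flags(flags):
    """Return a description if the flag byte matches a known bad
    combination, else None (packet passes this check)."""
    return SUSPICIOUS.get(flags & 0x3F)
```

For example, a packet with both SYN and FIN set is flagged, while an ordinary SYN packet passes through.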
The most famous techniques for obtaining the victim's information are the PING
and FINGER commands. The FINGER command operates on the Finger protocol,
while the PING command operates on the ICMP protocol. Another widely used
transport protocol is UDP, whose header is as follows:
 0               16                              32 bits
+----------------+-------------------------------+
| Source port    | Destination port              |
+----------------+-------------------------------+
| Length         | Checksum                      |
+----------------+-------------------------------+
Figure 4: UDP header structure
The above header structures were mentioned due to their popularity. If the need
arises to check the header structure of other protocols, they can be found at
www.protocols.com.
2.6 Intrusion types and their characteristics
The research that we had done led us to understand that attacks fall into four main
categories:
2.6.1 DOS attacks:
DOS stands for denial of service. As the name implies, this attack denies a
service to the victim, usually network communication and Internet access,
but it can also be more specific, targeting the HTTP protocol and the web
service, or the TELNET service. Unlike a privacy attack, where an adversary
tries to access resources for which it has no authorization, the goal of a DOS
attack is to keep authorized users from accessing resources. The affected
computers may crash or be disconnected from the Internet. In some cases
the damage is limited, because restarting the crashed computer puts
everything back on track; in other cases it can be a disaster, especially for a
corporate network or an ISP. These attacks either exploit a bug in the
operating system to crash the victim's computer, or flood the victim's service
with a large amount of data packets so that the service becomes of limited
use. Famous DOS attacks include nukes, syn floods, and http floods. DOS
attacks can also be extended to DDOS (distributed denial of service)
attacks, which use multiple computers to launch a DOS attack on one victim.
Characteristic of DOS attacks is a large surge of packets to a certain service
or to the entire network.
2.6.2 R2L attacks:
R2L stands for Remote to Local: unauthorized access from a remote
machine. These attacks range from guessing passwords to using scripts to
crack them; the goal is to infiltrate the system by gaining unauthorized
access from a remote machine. Such attacks are hard to detect, but a major
characteristic is a large number of "password incorrect" errors sent back to
the intruder. In the scenario where the attacker already knows the password,
the attack becomes very hard to detect unless an anomaly detection
scheme is installed that compares the current usage of the computer with its
normal usage and treats a sufficiently large deviation as an attack.
2.6.3 U2R attacks:
U2R stands for User to Root: unauthorized access to local superuser (root)
privileges. These attacks exploit user passwords and certain bugs and
characteristics of the operating system and its memory handling to gain root
access. The most famous kinds of U2R attacks are buffer overflows.
Characteristic of such attacks is access to memory outside the set bounds.
2.6.4 Probes and scans:
Probes and scans are not necessarily attacks themselves, but they help the
attacker in a variety of ways to construct and initiate an attack. They are
used to gain information about the victim computer: whether certain ports
are open on which an attack can be initiated, and which operating system is
used. The operating system information is used to exploit certain bugs in the
OS or to launch OS-specific attacks. Characteristics of such activity include
packets being sent to multiple ports from a single computer, or packets, such
as finger or ping requests, that solicit a reply identifying the OS.
Chapter Three
Design
This section provides architectural details of a proposed hybrid design.
3.1 Design Overview
After reviewing a number of topics in network security, our team came to the
conclusion that in order to resolve the issues present in currently available IDSs,
one needed to develop a hybrid design incorporating the features of currently
available technologies. The figure below represents the final design for our
prototype.
Path one: Packet Sniffer -> Blocked Ports/IP -> Pre Anomaly -> Anomaly
Detection -> Signature Matching -> Attack Clustering
Path two: Packet Sniffer -> Pseudo Random -> Signature Matching ->
Check Attributes
Figure 5: Modular View of proposed NIDS
Individual modules of this design are explained later on in this text. For now let us
examine the flow of data through this design.
Any packet arriving on the network can take one of two paths through the IDS. The
details of these paths are mentioned below:
Path One
(a) Packet Arrives at the Sniffer
(b) Packet goes to the blocked IP/ Ports module. If the packet is coming from a
source IP that has been registered with the IDS as blocked then the network
administrator will be alerted. Same will be the case with blocked ports. If packet
is not from a blocked IP or port then it is forwarded to the Pre Anomaly Module.
(c) This module checks for various malicious settings of the header flags. Details of
these checks can be found in the relevant section. If the attributes of the packet
header match any of the pre-defined malicious header attributes, the
administrator is alerted; otherwise the packet is passed on to the Anomaly
Detection Module.
(d) In the anomaly detection module the arriving packet is checked against the mean
(average/normal) packet which has been defined for the network. How this mean
packet is calculated and details of what criteria are used to reject or accept a
packet as malicious are given in the relevant section. However, it suffices to say
that if the anomaly detection phase classifies the packet as malicious then it is
stored in a database from where the next module will retrieve the packet for
further processing. If not then no action is taken.
(e) After the packet has been stored in the database, the signature matching
layer, which works offline, takes up this data and attempts to classify it as
an attack based on the currently available signatures. If a signature is not
found for a given packet, the packet is not discarded but stored in a
different database, to be taken up for further processing by other modules.
(f) The unclassified malicious packet is then taken from this database and
passed on to the attack clustering module, which is based on attack data
clustering. It checks the arriving malicious packet against precomputed attack
clusters. If the packet is within a certain threshold of an attack cluster,
the system administrator is warned that although the packet could not be
definitively classified as an attack, its characteristics resemble those of
one. If the packet is outside these thresholds, no action is taken.
Path Two
(a) The packet arrives at the Sniffer
(b) Some packets are randomly sent to the pseudo random module instead of
sending them through path one.
(c) In the pseudo random module, the IDS initially performs signature matching on
the packet against provided attack signatures. If a match is found then the
administrator is alerted else the packet is passed on to the check attribute
module.
(d) Details of how the Check Attributes module works are left for later
sections. Here it suffices to say that the module works on domain-specific
knowledge, from which it generates derived attributes for the packets. It then
performs if-then analysis on these derived attributes, and based on that
analysis the packet is declared either malicious or clean.
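The two paths above can be summarised as a small dispatch routine. This is an illustrative sketch, not the prototype's code: the module callables and the sampling predicate are placeholders for the modules described in this chapter.

```python
def process_packet(packet, modules, alert, sample=lambda: False):
    """Route a packet down Path One or Path Two.

    modules: dict of detection callables standing in for the IDS modules;
    alert: callable used to notify the network administrator;
    sample: predicate deciding the pseudo-random Path Two sampling.
    """
    if sample():  # Path Two: pseudo random module
        if modules["signature_match"](packet):
            alert("signature match")
        elif modules["check_attributes"](packet):
            alert("malicious attributes")  # if-then analysis on derived attributes
    else:  # Path One
        if modules["is_blocked"](packet):
            alert("blocked IP/port")
        elif modules["pre_anomaly"](packet):
            alert("malicious header flags")
        elif modules["anomaly_detect"](packet):
            # stored for offline signature matching and attack clustering
            modules["store"](packet)
```

Each module here returns a truth value; in the actual design the later stages operate on databases rather than callables, so this sketch only captures the ordering of the checks.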
3.2
Details of Modules
3.2.1 Blocked Ports/IP
This module is by far the simplest in our entire design. It maintains a list
of all the IPs and ports that have been blocked by the network administrator.
Whenever a packet arrives through the sniffer, this module checks the IP and
TCP headers of that packet against the list of blocked ports and IPs. If the
packet is found to be coming from a blocked source or going to a blocked port,
the administrator is alerted.
3.2.2 Pre Anomaly
This module is invoked immediately after the packets have been filtered for
blocked ports and IPs. Its functions are the following:

Matching known flag anomalies
The flags of the incoming packet are matched against the flag combinations of
popular attacks, such as the SYN-FIN flag combination attack. The module also
handles values which should not be set, e.g. out-of-range or extremely high or
low values, such as a very small packet size combined with the more-fragments
bit.

Matching popular values
The other thing that this module is capable of is that it matches certain
popular values amongst hackers with the incoming packets e.g. IP ID
31337.
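A minimal sketch of these checks follows. The rule set and the fragment-size threshold are illustrative assumptions, not the prototype's full list:

```python
def pre_anomaly(flags, ip_id, packet_size, more_fragments):
    """Return a reason string if the header looks malicious, else None.

    flags: dict of TCP flag counts; ip_id, packet_size, more_fragments are
    fields from the IP header. All rules here are illustrative examples.
    """
    if flags.get("syn") and flags.get("fin"):
        return "SYN-FIN combination"          # illegal flag combination
    if more_fragments and packet_size < 64:   # hypothetical size threshold
        return "tiny fragment with MF bit set"
    if ip_id == 31337:                        # value popular amongst hackers
        return "suspicious IP ID"
    return None
```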
3.2.3 Anomaly detection
K-Mean Algorithm: The Selected Algorithm
AGNES and DIANA took a lot of time. Moreover, they perform a single pass and
cannot undo what has already been done, so the resulting clusters may be poor.
BIRCH is sensitive to the order of the data; it may need fewer passes than
partitioning algorithms, but this affects its results. Density-based
algorithms are complex, and we simply need a clustering, not an augmented
ordering from which clusters of various densities are derived by supplying
different parameters. DENCLUE is intended for data sets with large amounts of
noise; since we deal only with normal traffic data, there is no noise and all
the data needs to be placed in clusters. That leaves the partitioning
algorithms. PAM reduces the number of scans done by the K-Means algorithm and
its variants, but this affects its results: it works on small data sets but
not on large ones. This also excludes CLARA and CLARANS, which are based on
PAM. The K-Means algorithm is relatively efficient and simple, with complexity
O(ktn) for k clusters, t iterations and n objects. It requires the number of
clusters to be specified in advance, but as you will soon see, this can be
worked around.
Algorithm Implementation
It is implemented in four steps:
 Partition objects into k non empty clusters.
 Compute seed points as the centroids of the clusters. Centroids are
the centers (the means) of the clusters.
 Assign each object to the cluster with the nearest seed point.
 Go back to step 2 (start next iteration); stop when the clusters remain
unchanged.
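The four steps above can be sketched in Python as follows. This is an illustrative one-dimensional implementation (matching the per-feature clustering used later), not the prototype's code:

```python
import random

def kmeans(points, k, max_iter=100):
    """Basic K-Means over a list of one-dimensional points."""
    # Step 1: partition objects into k non-empty clusters
    # (here: seed the centroids from k distinct points).
    centroids = random.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest seed point.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 2: recompute seed points as the centroids (means) of the clusters.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 4: stop when the clusters (and hence centroids) remain unchanged.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```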
Issues with the K-Mean Algorithm
1. It is difficult to estimate the optimum number of clusters for a particular
dataset.
2. A new point that should belong to a new (k+1) cluster is forcibly placed
in one of the old (<=k) clusters.
3. Density of the k clusters will be low if the data is scattered.
K-Mean Modification
Instead of fixing the number of clusters k at the start, we provide a radius
as the parameter, so the number of clusters need not be specified in advance.
If a point is too far from every cluster, i.e. does not fall within the
specified radius of any of them, it is placed in a new cluster. The density
improves, but since the radius is the same for all clusters, some clusters
will still have low density compared to others. If we decrease the magnitude
of the radius, the low-density clusters are replaced by high-density ones, but
high-density clusters might divide into overlapping clusters. The radius is
therefore a very rigid way of defining boundaries; instead of a radius, we
could use the standard deviation.
Standard Deviation and its impact on Clusters
To overcome this problem of density, we use the standard deviation instead of
a radius. The standard deviation of each cluster is calculated; as the
clusters contain different data points, the standard deviation differs from
cluster to cluster. Around 99% of the data points lie within 2.32 times the
standard deviation of a cluster. If a new point lies within 2.32 times the
standard deviation of a cluster, it is added to that cluster. This works for
one iteration; the standard deviation is then recalculated before the next
iteration. Since we only add points whose distance from the cluster mean is
within 2.32 times the standard deviation of that cluster, and most such points
will be closer to the mean, this will mostly reduce the overall standard
deviation of the cluster. Thus, when a new point arrives in the next
iteration, it has a much stricter standard deviation to compare with. This
ensures that the density of the resulting clusters is high.
It is possible, however, that points accumulate near the edge of the cluster.
Since more points will then be farther from the mean than when the iteration
started, the new standard deviation will be larger than the previous one. We
therefore keep a bound on the standard deviation: if it goes beyond that
bound, the points responsible for the increase are not added to the cluster,
and in the next iteration those points are put in a new cluster. Both clusters
will then be of relatively high density.
The benefit of the standard deviation is that it is not rigid like a radius:
each cluster is represented by its own mean along with its own standard
deviation. The deviations will increase and decrease, but the density is kept
under control by not getting low. Moreover, only those clusters whose points
are spread throughout the cluster will have large standard deviations (except
when they have a relatively higher density in an area farther from the mean).
Any cluster whose points are not spread across the whole space but are
accumulated at specific points farther from the mean will be divided, since
its standard deviation will become larger than the boundary specified.
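The membership rule described above can be sketched as follows; the Cluster container mirrors the structure given later in the pseudo-code, and the names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    mean: float
    st_dev: float

def belongs(cluster, point, factor=2.32):
    """True if the point lies within factor * st_dev of the cluster mean."""
    return abs(point - cluster.mean) <= factor * cluster.st_dev
```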
Network Traffic Header Features
Following are the features that are extracted and then used in clustering and
comparisons.
1. Connection: This includes the source and destination IPs and the
destination port. The paper "Anomaly Detection Using TCP Header Information"
claims that this combination is sufficient to identify a connection and to
detect anomalies. It is used as the ID of a connection and thus also as the
primary key in searches.
2. FSR: This is the sum of the FIN, SYN and RST flags divided by the total
number of packets in a connection. The paper claims that a high value is a
good indication of an attack. FIN and SYN can contribute a maximum of three,
and RST is usually zero; but keeping in mind that RST can take nonzero values,
we came up with a value around which the normal FSR revolves. This was used to
decide the initial standard deviation in the Modified K-Means Algorithm.
3. PSH: This is the sum of PSH flag divided by the number of total packets in
a connection.
4. ACK: This is the sum of ACK flag divided by the number of total packets
in a connection.
5. Total Packets: The number of packets sent in a connection so far.
6. Bytes per Packet: The total bytes sent in a connection divided by the
total packets in a connection.
7. Port: The port number with which the connection is being established. This
helps in checking for port scans: when a port is not present in the normal
clusters (indicating that no service runs on that port) and a connection is
attempted to it, an anomaly is generated, indicating a port scan.
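The feature definitions above amount to a small computation over per-connection counters. A sketch, with an illustrative dictionary layout rather than the prototype's data structures:

```python
def connection_features(flags, total_packets, total_bytes):
    """Compute the per-connection features from running counters.

    flags: per-connection counts of the FIN, SYN, RST, PSH and ACK flags.
    """
    n = total_packets
    return {
        # FSR: (FIN + SYN + RST) / total packets in the connection
        "fsr": (flags["fin"] + flags["syn"] + flags["rst"]) / n,
        # PSH and ACK: flag count / total packets
        "psh": flags["psh"] / n,
        "ack": flags["ack"] / n,
        "total_packets": n,
        # Bytes per packet: total bytes / total packets
        "bytes_per_packet": total_bytes / n,
    }
```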
Modified K-Means Algorithm, the Training Data and Network Sampling
Before the NIDS is put into action to detect attacks, the algorithm is
executed to cluster the normal data. Normal traffic data is provided to the
algorithm as training data, and the clusters formed represent normal traffic.
When the NIDS starts working, it simply compares the incoming packet data with
the clusters and produces an alert on any deviation. The training data is
important because the algorithm trains itself on it; in the anomaly detection
phase the comparisons must be made against normal traffic only, so the
training data must represent the normal network traffic.
In order to keep our clusters representative of normal network traffic, we
need to take a large amount of normal traffic data, so that it encompasses as
much detail about the traffic as possible. This is a case of sampling: the
sample must contain as much information about the network as possible, because
the training data is to represent the whole network. For this, a large amount
of data needs to be collected over weeks. As it was not possible for us to
collect such a large amount of data, and that too of only normal traffic, by
ourselves, we obtained the data from the DARPA website.
Network Traffic Header Features and Training Data
Firstly, all the connections from the normal data collected for training are
extracted. Then features are extracted for each connection. The connection
IDs along with their features are placed in a file that becomes the training
data file for the Modified K-Means Algorithm. Before the algorithm is
executed, another function separates the individual features of each
connection and places the same feature in the same file, because the
algorithm works on individual features. The individual feature training files
are then sent to the Modified K-Means Algorithm one by one, which produces a
clustering for each feature separately. As the algorithm is run for each feature
separately, clusters are created for each feature separately. Thus, when the
network traffic is being checked, the features of the connections present are
extracted, and each feature is individually compared with its corresponding
clusters. If an anomaly is detected in any feature, the connection is flagged
as malicious. There may be more than one cluster for a single feature.
Modified K-Means Algorithm Pseudo-code
structure Cluster
float mean;
float st_dev;
int no_of_pts;
//used in the algo
int used;
float updated_mean;
float updated_st_dev;
/* initially the updated_mean and updated_st_dev are equal to mean and
st_dev. As new points enter the cluster, the updated_mean and st_dev
changes but not the original mean and st_dev, until the end of the iteration.
After that, the mean and st_dev are set equal to updated_mean and
updated_st_dev*/
Procedure KMeanVariant(string filename, float stdev): returns ClusterArray
current is an array of clusters
previous is an array of clusters
do
for all elements in the current array
set no_of_pts = 0
set used = NOTUSED
copyArray(previous,current)
Set pointer to beginning of file
While end of file is not reached
Input = read a number from file
If current is empty
current[0] = addCluster(input,stdev)
else
Call function: updateClusterArray(current, input, stdev)
delete those clusters from the current array which were not updated
update the mean and stdev of the clusters in current
while(current and previous are not same)
return current;
updateClusterArray(ClusterArray a, float m, float stdev)
for(i=0; i<size of array a; i++)
if(a[i] == NULL)
a[i] = addCluster(m, stdev);
else
mean_diff = absolute(a[i]->mean - m);
if(mean_diff <= 2.32*a[i]->st_dev)
add point m in cluster a[i] by updating the updated_mean,
updated_st_dev and no_of_pts
Modified K-Means Algorithm
Preview
This algorithm is executed before the real time analysis of the network traffic
begins. This provides the clusters for the anomaly detection of the network
data. We already defined the features for a connection. This algorithm is run
for each feature separately, except for the connection feature, which is an ID.
This results in the creation of clusters for each feature separately. As
described earlier, when the network traffic is being checked, each feature of
a connection is compared individually with its corresponding clusters, and an
anomaly in any feature causes the connection to be flagged as malicious.
When we start the algorithm for a feature, we give a Starting Standard
Deviation for that feature. This serves as the initial standard deviation to
which the points should be compared, in order to decide whether they lie in a
cluster or not, or whether they should be placed in a new cluster.
Structure “Cluster”
The structure cluster has six values:
 mean: The “mean” of the cluster
 st_dev: The “standard deviation” of the cluster
 no_of_pts: The “no. of points” in the cluster
 used: This variable tells us if the cluster contains any points or not
 updated_mean
 updated_st_dev
Initially the “updated_mean” and “updated_st_dev” are equal to “mean” and
“st_dev”. As new points enter the cluster, the “updated_mean” and
“updated_st_dev” change but not the original “mean” and “st_dev”, until the
end of the clustering. After that, the “mean” and “st_dev” are set equal to
“updated_mean” and “updated_st_dev”.
The “used” variable comes into play when re-clustering is done. Before
re-clustering, we have clusters that each contain at least one point. When we
do the re-clustering in the next iteration, it is possible that, once all the
points have been catered for, a cluster/mean that previously existed no longer
contains any point. Such a cluster would have “used” equal to NOTUSED and
would be deleted at the end of the clustering iteration. “used” is set to USED
when a point is added to a cluster; when a new cluster is made, its “used” is
also set to USED.
“updateClusterArray” function
This function takes in a cluster array “a”, a point “m” and a starting standard
deviation “std”.
There is a loop that goes through the whole of the array. It searches to see if
the point “m” falls in any cluster in the array. If the value lies within
“2.32*st_dev” of the cluster, then the point is added to that cluster. If the end
of the array is reached and the point “m” does not fall in any cluster, then a
new cluster is formed and that point is placed in that cluster.
When a point does not fall in any cluster, a new cluster consisting of that
point is made. Here we need the starting standard deviation “std”: the
“st_dev” and “updated_st_dev” of the new cluster are set to the starting
standard deviation, the “mean” and “updated_mean” are set to the value of the
point, and “no_of_pts” is set to 1.
When a point falls in a cluster, its “updated_mean” and “updated_st_dev” are
updated and the number of points in the cluster is increased by one; this
update is just a mathematical calculation using the formulae. When the
clustering of all the points is done at the end of an iteration, “mean” is set
equal to “updated_mean” and “st_dev” is set equal to “updated_st_dev”.
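The report does not state which formulae the prototype used; one plausible choice (an assumption on our part) is Welford's online update, which maintains the running mean and population standard deviation as each point is added:

```python
import math

def add_point(mean, st_dev, n, x):
    """Return (updated_mean, updated_st_dev, n+1) after adding point x
    to a cluster of n points (population standard deviation)."""
    m2 = st_dev * st_dev * n          # recover the sum of squared deviations
    n += 1
    delta = x - mean
    mean += delta / n                 # running-mean update
    m2 += delta * (x - mean)          # Welford's M2 update
    return mean, math.sqrt(m2 / n), n
```

This updates the statistics in constant time per point, which matches the requirement that the cluster be updated as each point enters without rescanning its members.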
“KMeanVariant” function
This is the function that does the clustering of the points in the dataset. This
function takes in the filename of the file that includes the values of a feature
for all the connections in the training data set (Another function extracts the
features from the training data file that contains all the connections along with
their features and passes these files to the KMeanVariant algorithm one by
one). It also takes in the starting standard dev “std”. This function returns a
ClusterArray.
In the beginning, two cluster arrays are declared, “current” and “previous”.
Two arrays are kept so that the ending condition can be checked: when, after
re-clustering, the previous clustering differs from the new/current clustering
by only a small fraction, the algorithm stops. This fraction is “st_dev/30”.
Now the loop that does the clustering starts, and runs until the ending
condition is met. First, if there are any elements in the “current” array, we
set their “no_of_pts” to 0 and “used” to NOTUSED. Then we copy the current
array into the previous array, because the re-clustering is done on the
current array and the two arrays must afterwards be checked against the ending
condition. Then we set the pointer to the beginning of the file so that each
clustering iteration reads the file from the start. Then we do the following
until the end of the file is reached:
We read a point/value from the file. If the current array is empty, i.e.
there are no clusters in the array (this is the first iteration), then a
new cluster is simply added at the first location in the array, with its
“mean” equal to the input value from the file, its “st_dev” equal to the
starting standard deviation and its “no_of_pts” set to 1.
If the current array is not empty, we call the updateClusterArray function
to place the value in its right location in the cluster array.
After the above is done for all the points in the file, then we check if there is
any cluster in the array that was “not used”/“has no points in it”. If such a
cluster is present, we delete that cluster from the current array.
The next and last thing to be done is to update the “mean” and “st_dev”
of the clusters so that they can be used in the next iteration for re-clustering.
This is simply done by setting “mean” equal to “updated_mean” and “st_dev”
equal to the “updated_st_dev”. As we know, the “updated_mean” and
“updated_st_dev” are being updated as new points are being added to the
cluster.
Lastly we check the ending condition. If it is fulfilled, we return the current
array. Else, we go into the next iteration.
Creation and Real Time Update of Clusters
The clusters are created using the Modified K-Means Algorithm right before
the NIDS starts working. Normal network traffic data is provided as training
data to the algorithm: certain features of the headers of normal traffic
packets are extracted and fed to the algorithm to create the clusters.
The clusters are updated in real time. After the NIDS starts working, when
the features of a new connection are compared with the clusters to check for
anomalies and are found to be normal, they are added to the normal clusters by
updating those clusters, so as to keep recording the network’s progress. This
simply means recalculating the mean and standard deviation of the clusters
that changed. Without this, the clusters would not evolve, and a packet that
is not malicious might be flagged as one. To avoid that, we evolve the
clusters in the least costly way in real time.
After some time, the clusters are bound to drift farther and farther from
optimal: the real-time update of clusters can lead to overlapping and
malformed clusters. To overcome this, at the end of the day we perform a
costly re-clustering on all the normal data recorded that day, which optimizes
the clusters and improves the functioning of the Anomaly Detection layer. As
the previous clusters are kept and the re-clustering is done upon them, the
history is maintained. Moreover, as the re-clustering is done day by day using
that network’s data, the NIDS becomes accustomed to that network.
Feature Extraction from Window
Since our clustering is based on certain features of a TCP connection, a main
part of using these features is extracting them from the incoming packets and
storing them according to their connections. This module performs that task.
Upon execution of the IDS, the first thing done is to initialize a window of
packets which retains information about the TCP connections made within that
window; this module is therefore set into play after the window has been
initialized, at which point the entire contents of the window are sent to it.
The module takes each packet from the window list and performs the following
tasks on it:
1. Feature Extraction:
This task essentially takes the packet and extracts the features inside its
header that are required for our IDS.
2. Insert into Tree:
The second task performed on the packet is that the connection it belongs
to is searched for in the tree that we are maintaining. If a match is found,
then the features that we have extracted are added to the existing connection.
If however the connection does not exist in the tree, then a new
connection is created and inserted into the tree with the features set to the
extracted values.
After all the packets in the window have had their features extracted and
been inserted into the tree, a function is called which runs anomaly detection
on each connection in the tree, to check whether any of the connections
filling the window are anomalous. All anomalous connections in the window are
displayed at this point.
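The per-packet tasks above can be sketched with a dictionary keyed by the connection ID. The dictionary is a simplified stand-in for the tree the prototype maintains, and the field names are illustrative:

```python
from collections import defaultdict

def insert_packet(table, packet):
    """Add a packet's extracted features to its connection's running totals.

    The connection ID is (source IP, destination IP, destination port),
    as defined in the Connection feature above.
    """
    key = (packet["src_ip"], packet["dst_ip"], packet["dst_port"])
    conn = table.setdefault(key, defaultdict(int))
    # accumulate flag counts used by the FSR, PSH and ACK features
    for flag in ("fin", "syn", "rst", "psh", "ack"):
        conn[flag] += packet["flags"].get(flag, 0)
    conn["packets"] += 1          # Total Packets feature
    conn["bytes"] += packet["size"]  # used for Bytes per Packet
    return key
```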
Anomaly Detection
The anomaly detection module operates on a single connection. It checks
whether that connection is anomalous or not by checking if it lies within the
clusters that we have formed. It performs the following tasks:
1. Cluster Check.
It checks all the clusters separately; if the connection’s features do not
fall inside any of the clusters set up during the clustering phase, then the
connection is classified as an anomaly.
2. Update Cluster.
If the features lie within the clusters then each cluster number is saved in
a temporary location before updating the corresponding clusters.
Update Connection Features
After the initialization of the window and the initial tree, each successive
packet that arrives is used to update the features in the tree while maintaining
the window size. The following tasks are performed in this module:
1. Extracting the packets
Two packets are involved in updating the tree; the packet that has just
arrived and the oldest packet in the tree. The oldest packet is extracted
from the window queue that is being maintained.
2. Feature Extraction
The features are extracted for two packets, the one that has just arrived,
and the one that will be removed from the tree.
3. Updating the tree
After the features have been extracted, the contribution of the oldest
packet is removed and the contribution of the arriving one is added to the
tree. This is done by first searching for the connection of the oldest packet
and subtracting the feature values from that connection; if the values for the
connection reach zero after the subtraction, that connection is deleted from
the tree. The contribution of the arriving packet is then added by searching
for its connection in the tree. If the connection exists, the feature values
are simply added; otherwise a new connection is made with the extracted
features and inserted into the tree.
4. Check For Anomaly
After the tree has been updated, the connection that has just been updated
by the new packet is sent to the anomaly detection module to check whether
an anomaly has been found in it.
3.2.4
Signature Matching
This module checks the packet payloads for existence of attack
signatures. Essentially, these signatures are various flag settings along
with different payload content. These payloads can range from executable
scripts to viruses and worms.
As discussed in section …, many different kinds of algorithms exist for
matching attack signatures against incoming packets. The algorithm used in
this prototype is the Boyer-Moore algorithm, discussed previously; it was
chosen for its simplicity and efficiency.
Basically, the signature matching module maintains a list of all possible
signatures. These signatures are available from Snort, an open-source
Linux-based intrusion detection system.
For information on how the Boyer-Moore algorithm works, please consult the
above documentation.
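As an illustration of this style of search, the following sketch uses the Boyer-Moore-Horspool simplification (bad-character shifts only) rather than full Boyer-Moore, so it is a stand-in for, not a reproduction of, the prototype's matcher:

```python
def horspool_find(payload, signature):
    """Return the index of the first occurrence of signature in payload, or -1."""
    m, n = len(signature), len(payload)
    if m == 0:
        return 0
    # Bad-character table: how far to shift for each byte of the pattern
    # (last pattern byte excluded, as in Horspool's variant).
    shift = {b: m - i - 1 for i, b in enumerate(signature[:-1])}
    i = 0
    while i <= n - m:
        if payload[i:i + m] == signature:
            return i
        # Shift by the distance associated with the last byte of the window;
        # bytes not in the pattern allow a full-length shift.
        i += shift.get(payload[i + m - 1], m)
    return -1
```

Like Boyer-Moore, this skips portions of the payload rather than checking every offset, which is why such algorithms are attractive for scanning packet payloads against many signatures.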
3.2.5
Attack Clustering
Basic Idea
Layer 3 was designed to reduce the false positives generated by the Anomaly
Detection phase. It rests on the fundamental premise behind the misuse model:
attacks follow a pattern, usually formulated to exploit known weaknesses in
the system. If attacks follow patterns, then there is similarity between
attacks, and it is therefore possible to cluster them.
Attack Clustering and Modified K-Means
As already mentioned, an algorithm well suited to this system has been made
for clustering; the same algorithm is used for clustering the attack data. As
in the case of normal clustering, the attack clusters are created before the
NIDS starts checking the traffic for attacks, because clusters are needed for
the comparisons. Training data containing the header features of anomalous
connections is provided to the algorithm, which works on individual features
and creates clusters for each feature separately. Attack clustering is the
same as normal clustering, except that the training data provided and the
clusters created differ: the first produces clusters that represent attacks,
and the second produces clusters that represent normal traffic.
Attack Clustering and the Detection of Attacks
The anomalous packets from the Anomaly Detection go to the signature
matching phase and then come to the attack clustering phase. This whole
path is traversed in order to reduce the false positives generated by the
anomaly detection phase and to further make sure that the anomaly
detected is actually an attack.
The anomaly detection phase sends, along with the anomalous connection
packet, a distance measure: the least distance of each feature from its
corresponding cluster mean. These distances are used here in the attack
clustering phase, where the connection information is compared with the
attack clusters. Three cases arise:
1. If any feature of the connection lies within an attack cluster, then it is
labeled as an attack.
2. We calculate the least distance of each feature from its corresponding
attack cluster mean and compare it with the distance from the normal cluster
mean. If any feature has an attack cluster distance less than its normal
cluster distance, an alert is generated, indicating that this could
potentially be a new kind of attack. Otherwise, no alert is generated.
3. There is a third possibility, which is checked before the alert for the
2nd possibility is generated. If the distance from the attack cluster is
greater than a certain radius surrounding that cluster, an alert is generated
indicating that even though the connection is closer to attack data than to
normal data, it deviates enough from most known attacks that it cannot be
definitely classified as one; it is still an anomaly. The radius is larger
than the standard deviation of that cluster.
Appropriate information about the connection is also generated if it is
not deemed normal as in the 2nd possibility.
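The three cases above can be summarised as a decision procedure; the parameter names (dist_attack, dist_normal, radius) are ours, and the sketch is illustrative:

```python
def classify(inside_attack_cluster, dist_attack, dist_normal, radius):
    """Decide the outcome for one feature of an anomalous connection.

    dist_attack / dist_normal: least distances from the attack and normal
    cluster means; radius: per-cluster bound, larger than the standard
    deviation of the attack cluster.
    """
    if inside_attack_cluster:
        return "attack"                   # case 1: lies within an attack cluster
    if dist_attack < dist_normal:
        # case 3 is checked before the case-2 alert is generated
        if dist_attack > radius:
            return "anomaly"              # case 3: closer to attacks, but too deviant
        return "possible new attack"      # case 2: closer to attack than normal
    return "normal"                       # no alert
```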
Update of Attack and Normal Clusters
After the connection information is checked against the attack clusters and
one of the above possibilities is followed, the clusters need to be updated so
that they, and the system as a whole, can evolve.
If the first possibility is followed, then the cluster in which the offending
feature falls (the feature which caused the packet to be labeled as an attack)
is simply updated. If the second possibility is followed, then if the
connection is labeled as a possible attack, a new attack cluster is made for
that feature; if it is deemed normal, the corresponding normal cluster is
updated. If the third possibility is followed, no change is made.
The attack cluster update is done in the same way as the normal cluster
update, and to optimize the clusters, re-clustering of the attack clusters is
done at the end of the day, just as for the normal clusters.
3.3
Performance Tweaking
For the tweaking part of our project the following variables were tweaked:
Cluster Standard Deviations
a. FSR deviation
The final FSR deviation we settled upon was 0.1. This makes sense, as the FSR
value ranges from 0 to 3.
b. PSH deviation
The final PSH deviation we settled upon was 0.2, since the PSH value already
lies between 0 and 1.
c. ACK deviation
The final ACK deviation we settled upon was 0.1; the ACK value ranges from 0
to 1.
d. Port deviation
The final Port deviation reached was 10, which was found to be more accurate.
All the cluster deviation values were determined by modifying the deviations
in both directions. In all cases, increasing the deviation resulted in fewer
clusters and decreasing it in more clusters. In many cases, decreasing the
deviation led to more false positives, while increasing it led to false
negatives at higher values. The values were chosen by trial and error,
starting with an educated guess and then checking which values led to the
minimum error.
Window Size
The window size was set at 500 as that gave a good representation of the traffic flow
while not being too hard on the disk space and the initial window initialization.
Real Time Optimizations
1. AVL trees are used to make searches more efficient.
2. Least costly Real Time Update of normal and attack clusters for their evolution so
that they incorporate normal data of the network as it comes.
3. Addition of new attack clusters that represent new unknown attacks so that the
system can evolve.
4. Re-clustering of clusters at the end of the day. Not only does this keep a trace of
history, but because of this, the system becomes accustomed to the network.
5. Modified K-Means Algorithm and the use of standard deviation.
6. Clusters are different for features. This may produce more false
positives but this makes the architecture more efficient to attack
detection because the features need to lie within a cluster to be
classified as normal and if the feature fails to lie within a cluster
of a specific feature then it is classified as an attack
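The per-feature classification rule can be sketched as follows. The feature names, the cluster representation as (mean, standard deviation) pairs, and the 2.32 multiplier (used later in Chapter 4) are assumptions for illustration.

```python
def is_normal(record, clusters, k=2.32):
    """record: {feature: value}; clusters: {feature: [(mean, std), ...]}.
    Normal only if EVERY feature falls within k standard deviations of at
    least one cluster built for that same feature."""
    for feature, value in record.items():
        if not any(abs(value - mean) <= k * std
                   for mean, std in clusters[feature]):
            return False  # one feature outside all its clusters -> attack
    return True

# Hypothetical per-feature clusters for two of the features used above.
clusters = {"fsr": [(0.2, 0.1)], "ack": [(0.5, 0.1)]}
print(is_normal({"fsr": 0.25, "ack": 0.55}, clusters))  # True: both fit
print(is_normal({"fsr": 1.50, "ack": 0.55}, clusters))  # False: fsr fits nothing
```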
Payload tree size
The payload tree size was set at 100. This value was chosen arbitrarily; we did
not have enough time to test alternatives.
Payload Attributes
Timed Attributes (for 5 second time window)
i. Login attempts threshold
Set at 3: it is very unlikely that a legitimate user attempts more than 3 logins in 5 seconds.
ii. HTTP requests threshold
Set at 10. This value was not tested.
iii. Directory creation attempt threshold
Set at 3: normal usage also involves fewer than 3 directory creation attempts in the window.
Derived Attributes
iv. Hot threshold
Set at 3, as this is an important threshold and higher values here are a strong sign of anomalous activity.
v. Hidden directory threshold
Set at 3, as this is also an important threshold and higher values here are a strong sign of anomalous activity.
vi. Total failed login threshold
Set at 15, because a normal user does not enter a wrong password 15 times in a session.
vii. Total directories created threshold
Set at 60, for the same reason as above.
viii. Compromised condition threshold
Set at 3, as this is an important threshold and higher values here are a strong sign of anomalous activity.
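The threshold checks above amount to a simple lookup table. The attribute names below are hypothetical labels for the attributes just listed; the threshold values are the ones chosen above.

```python
# Hypothetical attribute names; threshold values as chosen in this section.
THRESHOLDS = {
    "login_attempts_5s": 3,
    "http_requests_5s": 10,
    "dir_creations_5s": 3,
    "hot_indicators": 3,
    "hidden_dir_accesses": 3,
    "total_failed_logins": 15,
    "total_dirs_created": 60,
    "compromised_conditions": 3,
}

def payload_alerts(counters):
    """Return the names of attributes whose observed counts exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if counters.get(name, 0) > limit]

print(payload_alerts({"total_failed_logins": 20, "http_requests_5s": 4}))
# -> ['total_failed_logins']
```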
Chapter Four
Prototype Results
4.1 Coverage
This measurement determines which attacks an IDS can detect under ideal
conditions. As our IDS contains three layers, each dealing with a different type
of data, we treat them separately.
4.1.1 The Pre-filtering Stage
The pre-filtering stage is the first stage that incoming network packets
encounter. This layer first checks whether the packet is coming from a blocked
IP; if it is, the network administrator is informed. Second, it checks whether a
blocked port is being accessed; if so, the appropriate information is displayed.
The lists of blocked IPs and ports are provided by the user. The layer then
performs a number of checks on packet header attributes. These are well-known
attributes that could not be used in clustering because they are malicious only
in certain combinations, not individually. They were taken from the Snort
website, research papers, and related sources, and correspond to signatures that
look only at the header, not the payload. An anomaly was detected and the
appropriate information displayed when:
• The FIN flag is set along with any other flag except ACK. This detects scans
such as SCAN nmap fingerprint attempt, SCAN synscan portscan, SCAN SYN FIN, and
SCAN XMAS.
• TTL > 220, SYN is set, and the acknowledgement number is 0. This detects SCAN
myscan.
• Reserved bits 1 and 2 of the TCP flags are set (this may lead to false
positives, as we will see later). This detects scans such as SCAN cybercop os
and SCAN synscan portscan, DOS attacks such as DDOS shaft synflood and DDOS
mstream client to handler, and the backdoor scan BACKDOOR ACKcmdC trojan scan.
It can also detect new attacks that set these flags.
• The IP header fragment ID field value is 31337 (a value popular with some
hackers).
• Acknowledgement = 0, flags = 0, and sequence = 0. This detects SCAN NULL.
• ACK is set and the acknowledgement number is 0. This detects the SCAN NMAP TCP
ping.
• The more-fragments bit is set and the IP length is less than 256 bytes. This
detects attacks that use malformed fragmentation.
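The seven conditions above can be sketched as a single rule function. The packet representation (flags as a set of strings, plain numeric header fields) and the field names are our assumptions, not the project's actual data structures.

```python
# Hedged sketch of the seven pre-filter header checks; field names are
# illustrative only.
def prefilter_alerts(pkt):
    alerts = []
    flags = pkt["flags"]
    if "FIN" in flags and flags - {"FIN", "ACK"}:
        alerts.append("FIN combined with other flags (nmap/XMAS-style scan)")
    if pkt["ttl"] > 220 and "SYN" in flags and pkt["ack_num"] == 0:
        alerts.append("high TTL SYN with zero ack (SCAN myscan)")
    if "RES1" in flags or "RES2" in flags:
        alerts.append("TCP reserved bits set (scan/DDoS tool, or ECN)")
    if pkt["ip_id"] == 31337:
        alerts.append("IP fragment ID 31337")
    if not flags and pkt["ack_num"] == 0 and pkt["seq_num"] == 0:
        alerts.append("null scan (no flags, zero seq/ack)")
    if flags == {"ACK"} and pkt["ack_num"] == 0:
        alerts.append("ACK with zero ack (nmap TCP ping)")
    if pkt["more_fragments"] and pkt["ip_len"] < 256:
        alerts.append("undersized fragment (malformed fragmentation)")
    return alerts

xmas = {"flags": {"FIN", "URG", "PSH"}, "ttl": 64, "ack_num": 7,
        "seq_num": 1, "ip_id": 0, "more_fragments": False, "ip_len": 1500}
benign = {"flags": {"SYN"}, "ttl": 64, "ack_num": 0,
          "seq_num": 100, "ip_id": 12, "more_fragments": False, "ip_len": 60}
print(prefilter_alerts(xmas))    # one alert: FIN combined with other flags
print(prefilter_alerts(benign))  # no alerts
```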
4.1.2 Layer 1: Anomaly Detection Stage
This is the stage that uses clustering. Clustering is performed on essentially
two types of attributes:
• Ports: The clusters are built around the ports that are normally used (those
with services running on them), so access to a port not covered by the clusters
is treated as an anomaly. This counteracts the SYN scan. Since most Trojans and
viruses communicate back on an ephemeral port, or at least listen on one, their
activity is also detected.
• Flags: Clustering on flags detects flood attacks, including SYN floods,
ACK-packet floods, and FIN floods. In an attack, the number of these individual
flags set in a connection is much larger than in a normal connection, which
helps identify it as an attack. The advantage of this feature is that any new
flood attack using header fields will also be detected by this scheme.
4.1.3 Layer 2: Payload Signature Matching and Payload Attributes Check Stage
The signature matching stage is not applied to every packet because it is costly
and cannot be done in real time. When a packet has cleared stage 2, the system
checks whether stage 3 is free (i.e., not processing any packet). If it is free,
the packet is sent for payload signature matching; otherwise, the packet is
simply let through. This stage matches the signatures that require a payload
check, detecting attacks that involve the payload, such as viruses and Trojans.
The signatures were taken from the Snort website, and stage 3 works on these
signatures; the number of attacks detected by signature matching therefore
depends on the size of the attack rule base.
This stage also provides the derived-attributes functionality. Attributes are
derived from payload information, and the results indicate whether an intrusion
is in progress. It detects failed logins, attempts to fill up disk space (for
example, by creating a large number of directories), system compromise attempts
such as trying to access the root directory, invalid file traversals, and HTTP
floods.
4.2 False Positives
The ports check is one area where false positives can occur. When a new
application is installed, it uses a new port, and when a sender tries to access
that port, an anomaly will be generated. However, this is a one-time issue: the
user can update the training data to incorporate the new port.
The flags check in anomaly detection can also produce false positives. There can
be cases where the flag counts for a connection come out large, indicating an
anomaly, when in reality the connection is benign. This is a tradeoff we accept
in order to detect attacks, including new ones, in real time.
One more area can produce false positives: the check on the two TCP reserved
bits, which involves ECN (Explicit Congestion Notification). ECN is a standard
proposed by the IETF to reduce network congestion and packet drops at routers.
RFC 2481 states that, to accomplish this, ECN uses four previously unused bits
in the IP and TCP headers; in the TCP header these are the reserved bits, which
should normally be zero. The problem is that scans and attacks also set these
bits. Here lies the tradeoff: if we treat these bits being set as normal, false
negatives increase; if we treat it as abnormal, false positives increase. We
treat it as abnormal, because it is better to receive false positives than false
negatives; otherwise we would compromise the security of our network. Moreover,
ECN uses the three-way handshake to determine whether sender and receiver are
both ECN-capable; only then is ECN used, and otherwise communication proceeds
normally. Since ECN is new and few hosts support it, most packets with the
reserved bits set will be malicious rather than normal. We therefore do not
consider setting the reserved bits normal behavior and give more weight to
detecting attacks. This increases our detection rate and reduces the false
negative rate without raising the false positive rate too much (because ECN is
still new), and is thus beneficial overall.
4.3 False Negatives
One area where false negatives can be generated is the pseudo-real-time phase.
This phase is not applied to all packets passing through the IDS; it selects
packets in pseudo-real time and works on them. So even when a signature exists
for an attack, the attack may go undetected because the packet was not picked up
by the pseudo-real-time module.
In the anomaly detection stage, when comparing clusters with incoming traffic,
we use a range of 2.32 times the standard deviation. If a connection's
information lies within this range of a cluster, it is considered to belong to
that cluster; otherwise it does not. About 99% of the points in any cluster lie
within 2.32 standard deviations of it. A connection may therefore be part of the
remaining 1% that belongs to an attack cluster but falls outside the
2.32-standard-deviation range; such a connection will not be detected, even
though it should be.
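The ~99% figure quoted above can be checked against the standard normal distribution, assuming the cluster features are roughly Gaussian (our assumption; the report does not state the distribution). The one-sided probability P(Z ≤ 2.32) is about 0.99, which matches the 2.32 multiplier used here; counting both tails gives slightly less.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

k = 2.32
one_sided = normal_cdf(k)                    # P(Z <= 2.32)
two_sided = normal_cdf(k) - normal_cdf(-k)   # P(|Z| <= 2.32)
print(round(one_sided, 4))  # ~0.9898, the ~99% figure quoted in the text
print(round(two_sided, 4))  # ~0.9797 if both tails are counted
```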
4.4 Detection Probability
For the pseudo-random signature matching phase, detection is 100 percent when
the attack information (an attack carried in the payload) is present in the rule
base and the packet is picked for signature matching. If the packet is not
picked, it enters the network. This depends on how busy the network is: the
busier the network, the more packets the pseudo-random phase will skip.
The anomaly detection phase detects only 99% of the anomalies present in the
training data, as discussed under false negatives; the remaining 1% goes
undetected.
4.5 Handling High Bandwidth Traffic
The system we designed can be implemented in a distributed fashion, and given
its current performance, such a deployment would handle high-bandwidth traffic
efficiently. The pre-filter is computationally very light: only seven simple
comparisons are performed. The anomaly detection phase simply compares
connection information with the predefined clusters; since the information is
numeric, these are plain greater-than comparisons, which are very fast, and the
research literature identifies header-based clustering as one of the most
effective current techniques for real-time intrusion detection. Signature
matching is comparatively slow and is therefore performed in pseudo-real time:
when traffic is fast, fewer packets will be checked than when traffic is slow.
4.6 Ability to Detect New Attacks
New attacks can be detected in the anomaly detection stage. Almost all kinds of
floods using the TCP header can be detected: clustering on flags catches flood
attacks, because in an attack the number of individual flags set in a connection
is much larger than in a normal connection. Thus new flood attacks can be
detected.
Where payload-derived attributes are checked, new attacks can be detected
because the attributes are extracted from domain knowledge developed by domain
experts, who claim it is capable of detecting many new attacks.
Since the port clusters are built around the ports that are normally used (those
with services running on them), accessing a port not covered by the clusters is
treated as an anomaly, and an alert is generated whenever someone tries to
access a port with no service running on it. Thus any new probe that tries to
access an unused port on our machine is detected.
Another area where new attacks can be detected is the pre-filtering stage: new
attacks matching any of the seven tested conditions will be caught there.
4.7 Limitations of the Model
The limitations of the model include:
• Coverage:
Our model incorporates only a subset of the current signatures and of the
domain-knowledge attributes that we were able to obtain in our research. This
limitation can be overcome by incorporating the missing signatures and
attributes into the current model.
• Pseudo-real time:
The pseudo-real-time processing in our model is a significant limitation, as it
means a certain fraction of packets is left unchecked by the signature matcher.
This can be greatly reduced by a distributed deployment, with layer one working
in real time and layer two running on another machine. Distributing the
resources would give each computer more time to capture packets and bring the
pseudo-real-time model close to a true real-time model.
• Attacks not catered for:
Although we have tried to incorporate as much as possible in the given time,
some attacks are still not detected. One class of misses stems from the
limitations discussed above, for which we have remedies. The second class
consists of attacks our model does not cater for at all and cannot improve upon,
mainly in the U2R and R2L categories. Detecting these would require an anomaly
detector for each user of the computer, which our model does not provide, since
it builds its anomaly detector over network data rather than personal usage
data.
• No response system:
Our model currently provides no automatic response to incoming attacks; we rely
solely on the administrator to decide what to do when an attack is found and
reported. A response system could be introduced that, because of its automated
nature, would stand a better chance of responding to attacks and stopping them,
whereas at present the outcome of an attack depends on the administrator's
response.
Chapter Five
Future Enhancements
• In the signature matching part, not all signatures could be incorporated, so
attacks whose signatures are absent are not detected by the system. They can,
however, be added to our rule base.
• The DDOS attack that spoofs source addresses is not detected by the system,
because it must be handled differently from the usual DOS attacks. However, the
system can be extended to incorporate it.
• Our design can be implemented in a distributed fashion, which would improve
its efficiency on high-bandwidth networks.
• An anomaly detection mechanism could be set up for all important personnel, so
that if a U2R or R2L attack launched against them succeeds, the anomaly detector
could detect deviations in the usage of that system or account.
• The clustering algorithm can be further optimized to re-cluster itself when
the clusters become inefficient.
Chapter Six
Conclusion
IBIDS is a hybrid of anomaly detection, misuse detection, and
domain-knowledge-based attribute extraction. The prototype we developed is
capable of detecting not only existing attacks but also new ones. Although there
is much room for future enhancement, a number of methods have been incorporated
into the system to improve detection. In short, IBIDS is:
• Efficient, saving time and space.
• Able to process headers in real time.
• Able to process payloads in pseudo-real time.
• Able to detect anomalous behavior and new attacks.
• Open to future enhancements.
References
[1] http://www.ll.mit.edu/IST/ideval/data/data_index.html
[2] http://kdd.ics.uci.edu/databases/kddcup99/task.html
[3] Weijie Cai and Li Li, “Anomaly Detection Using TCP Header Information”
[4] S. Terry Brugger, “Data Mining Methods for Network Intrusion Detection”, June 2004.
[5] Salvatore J. Stolfo, “Cost-based Modeling for Fraud and Intrusion Detection: Results from the JAM Project”
[6] Wenke Lee and Salvatore J. Stolfo, “Real Time Data Mining-based Intrusion Detection”
[7] Wenke Lee, “A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems”
[8] Binh Viet Nguyen, “An Application of Support Vector Machines to Anomaly Detection”
[9] Wei Lu and Issa Traore, “Detecting New Forms of Network Intrusion Using Genetic Programming”
[10] Aleksandar Lazarevic, “A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection”
[11] Wenke Lee and Salvatore J. Stolfo, “A Data Mining Framework for Building Intrusion Detection Models”
[12] Paul Dokas and Levent Ertoz, “Data Mining for Network Intrusion Detection”
[13] Matthew V. Mahoney and Philip K. Chan, “PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic”
[14] Luca Deri, Gaia Maselli, and Stefano Suin, “Design and Implementation of an Anomaly Detection System”
[15] Maheshkumar Sabhnani and Gursel Serpen, “KDD Feature Set Compliant Heuristic Rules for R2L Attack Detection”
[16] Alexandr Seleznyov and Seppo Puuronen, “Anomaly Intrusion Detection Systems: Handling Temporal Relations between Events”
[17] Vasilios A. Siris and Fotini Papagalou, “Application of Anomaly Detection Algorithms for Detecting SYN Flooding Attacks”
[18] Marina Bykova, Shawn Ostermann, and Brett Tjaden, “Detecting Network Intrusions via a Statistical Analysis of Network Packet Characteristics”
[19] Eleazar Eskin and Andrew Arnold, “A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data”
[20] http://www.protocols.com/pbook/tcpip2.htm