DECISION BASED INTRUSION DETECTION SYSTEM USING GENETIC ALGORITHM
Ms. Priyanka Kisan Ghodekar, Ms. Shaila Bhausaheb Dere, Ms. Sneha Prabhakar Chaskar, Mr. Harish Tanaji Indalkar
Students, Department of Computer Engineering, Sharadchandra Pawar College of Engineering, Pune, Maharashtra, India
Abstract- Security is important in today's rapidly growing computer networks, which makes an Intrusion Detection System an essential part of daily practice. Various approaches are used for intrusion detection. We present an Intrusion Detection System that uses a Genetic Algorithm and a Decision Tree to identify various types of intrusions or attacks efficiently and effectively. The project develops an intelligent system that uses ensemble computing techniques for the detection of intrusions: a Genetic Algorithm (GA) and a Decision Tree (DT) are integrated to detect and prevent intrusions. The purpose of the Intrusion Detection System (IDS) in networking using a Genetic Algorithm and a Decision Tree is to identify the intruder and block the intruder's data so that the system is protected from attack. The major task of the system is to create new sets of rules at run time. The GA component performs feature subset selection through a suitably framed fitness function. A decision tree is used to detect the subtype of attack. The KDDCUP99 training and testing datasets are used to generate effective new rules with a reasonable detection rate.
Keywords- Computer and Network Security, Intrusion Detection System, Genetic Algorithm, KDD CUP 1999 Dataset, Decision Tree.

1. INTRODUCTION
An intrusion detection system (IDS) is a device (or application) that monitors network and/or system activities for malicious activities or policy violations and produces reports. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents, which are violations of computer security practices.
Intrusion Detection Systems have undergone rapid growth in power, scope and complexity in their short history. In recent years, intrusion detection has been one of the most sought-after research topics in the field of Information Security, with many applications in the corporate world, where data integrity and security are complex issues.
When an intruder attempts to break into an information system, we say that an intrusion has occurred. Intruders may be external or internal, depending on their authorization level. Intrusion techniques include exploiting software bugs or system configurations and password cracking. An Intrusion Detection System (IDS) is a system for detecting intrusions and reporting them accurately to the proper authority. IDSs are usually specific to the operating system they run on, and they are an important tool in implementing an organization's information security criteria, which define the rules and practices to provide security, handle intrusions, and recover from damage caused by security breaches.

2. RELATED WORK
GAs have been used for network intrusion detection in several ways. Some approaches use a Genetic Algorithm to obtain the classification rules directly, while others use different AI methods to acquire the rules and use GAs to select appropriate features or to determine the optimal parameters of some functions.
Li: This work presents a technique that uses a GA to detect abnormal network behavior. The approach obtains classification rules for both quantitative and categorical features of network data. The implementation of rule generation for the IDS is given, but no experimental results are reported.
Bridges [4]: This method combines fuzzy data mining techniques and a Genetic Algorithm to detect network anomalies and misuses. In many existing Genetic Algorithm based IDSs, most features are not handled properly. This method uses a Genetic Algorithm to find the optimal parameters of the fuzzy functions and to select the relevant network features.
Lu: In this method, classification rules are generated by Genetic Programming, and the fitness function is used to fine-tune the detection and classification of network intrusions. The time required to train the system on large amounts of data makes the Genetic Programming implementation difficult.
Crosbie [3]: Multiple agent techniques and Genetic Programming can be used to detect network intrusions. Genetic Programming is used to determine anomalous network behavior, and each agent monitors one parameter of the network audit data. The use of many small autonomous agents is an advantage of this method; communication among the agents is a drawback.
Selvakani: This system identifies attacks using a rule set produced by a Genetic Algorithm, but it generates rules only for R2L and DoS attacks, with one attack selected from each of the two categories. The overall performance of the system is below 60%.

3. INTRUSION DETECTION SYSTEM OVERVIEW
The following sections give an overview of the components of an Intrusion Detection System, its classifications, and networking attacks.

3.1 Components of Intrusion Detection System
An intrusion detection system normally consists of three functional components. The first component is known as the event generator or data source. The
second component is known as the analysis engine, which takes information from the data source and examines the data for symptoms of attacks. The analysis engine uses the following analysis approaches:
• Misuse/Signature-Based Detection: This type of detection engine detects intrusions that follow well-known patterns of attacks (or signatures) that exploit known software vulnerabilities. The main limitation of this approach is that it looks only for known weaknesses and may not detect unknown future intrusions.
• Anomaly/Statistical Detection: An anomaly-based detection engine searches for something rare or unusual. The drawbacks of such systems are that they are expensive and that, with insufficient data, they can classify intrusive behavior as normal behavior.
The third component of an intrusion detection system is the response manager, which acts only when irregularities (possible intrusion attacks) are found.

3.2 Classification of Intrusion Detection
Intrusion detection can be classified into the following two categories.
• Host Based Intrusion Detection: HIDSs evaluate information found on a single host or on multiple host systems, including the contents of operating systems, system files and application files.
• Network Based Intrusion Detection: NIDSs evaluate information captured from network communications, analyzing the stream of packets that travel across the network.

3.3 Networking Attacks
This section gives an overview of the four major categories of networking attacks; each attack is placed into one of the following groups.
• Denial of Service (DoS): A DoS attack is one in which the attacker makes a computing or memory resource too busy or too full to serve legitimate networking requests, thereby denying users access to that resource. For example smurf, apache, mail bomb, Neptune, etc.
• Remote to User Attacks (R2L): A remote to user attack is one in which a user sends packets over a network to a machine that he or she does not have access to, in order to expose the machine's vulnerabilities and exploit privileges that a local user would have on the machine. For example phf, guest, xlock, xsnoop, etc.
• User to Root Attacks (U2R): In these attacks the attacker starts off on the system with a normal user account and attempts to abuse vulnerabilities in the system in order to gain super user rights. For example xterm, perl, etc.
• Probing: Probing is an attack in which the attacker scans a machine or a networking device in order to find weaknesses or vulnerabilities that may later be exploited. This technique is also used in data mining. For example portsweep, nmap, etc.

4. PROPOSED SYSTEM
• This system generates its own rules using a GA.
• The system is implemented using the KDD CUP 1999 training and testing datasets.

• It dynamically adds rules to the rule base according to the packets flowing in the network, which also increases the reliability of the system.
• Our system classifies attacks using a Decision Tree.
• The major objective of this system is to improve the detection rate.

5. WORKING OF OUR IDS IN REAL SYSTEM
The following diagram shows the components of our system. In this system, an attack is detected before the packet arrives at the machine through the network connection. For this we use an IDS that combines the KDD dataset, a GA and a Decision Tree.
Figure 5.1 Applying Intrusion Detection System
Let us see where the Intrusion Detection System is applied. A network connection is a sequence of TCP packets starting from a source IP address and ending at a target IP address; each connection is described by 41 attributes and 1 manually assigned record type. Before a packet arrives at the machine, its network connection is analyzed by the Intrusion Detection System. If the connection is normal, it is permitted. If the connection is an attack, the type of attack is detected and the system generates an alert.

6. FLOW OF SYSTEM
Figure 6.1 Flow of System
Let us look at the flow of the system. The KDD CUP 1999 dataset contains records, each with 41 attributes and 1 manually assigned record type. The record type indicates whether a record is a normal or an abnormal network connection. Here we use only 6 attributes, which are extracted with the Weka tool. These 6 attributes are given to the genetic algorithm as input. The genetic algorithm then generates a run-time rule set, which is represented as one chromosome. A chromosome may be a signature of the normal type or of an attack type. That signature is matched against the signatures predefined in the test dataset. If it matches the signature of a particular attack, that attack is detected. The rule set or detected attack is then given to the decision tree as input, and the decision tree classifies the attack into a particular attack type. The generated rule set is stored in the rule base.
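The flow described in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the six attributes are the first six fields of a KDD CUP 1999 record (duration, protocol, service, flag, source bytes, destination bytes), it uses a hypothetical rule encoding in which `None` acts as a wildcard, and the `ATTACK_CATEGORY` lookup is only a stand-in for the trained decision tree. The sample record is synthetic.

```python
from typing import List, NamedTuple, Optional, Tuple


class Connection(NamedTuple):
    """The six attributes kept from a 41-attribute record (names are illustrative)."""
    duration: int
    protocol: str
    service: str
    flag: str
    src_bytes: int
    dst_bytes: int


# One rule (chromosome) holds one value per attribute; None matches anything.
# This encoding is an assumption made for the example.
Rule = Tuple[Optional[object], ...]

# Hypothetical stand-in for the decision tree: attack name -> attack category.
ATTACK_CATEGORY = {
    "smurf": "DoS", "neptune": "DoS",
    "phf": "R2L", "guest": "R2L",
    "perl": "U2R", "xterm": "U2R",
    "portsweep": "Probe", "nmap": "Probe",
}


def parse_record(line: str) -> Tuple[Connection, str]:
    """Split one comma-separated record into the six attributes and its record type."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 41 attributes + 1 record type, got {len(fields)}")
    conn = Connection(int(fields[0]), fields[1], fields[2], fields[3],
                      int(fields[4]), int(fields[5]))
    return conn, fields[41].rstrip(".")


def matches(rule: Rule, conn: Connection) -> bool:
    """A rule matches when every non-wildcard value equals the connection's value."""
    return all(want is None or want == got for want, got in zip(rule, conn))


def detect(conn: Connection, rule_base: List[Tuple[Rule, str]]) -> str:
    """Return the category of the first matching attack rule, or 'normal'."""
    for rule, attack in rule_base:
        if matches(rule, conn):
            return ATTACK_CATEGORY.get(attack, "unknown attack")
    return "normal"


# Synthetic record: six leading attributes, 35 zero-filled attributes, record type.
sample = ",".join(["0", "icmp", "ecr_i", "SF", "1032", "0"] + ["0"] * 35 + ["smurf."])
rule_base = [((None, "icmp", "ecr_i", "SF", None, None), "smurf")]

conn, record_type = parse_record(sample)
print(record_type, detect(conn, rule_base))
```

A connection that matches no attack rule is reported as normal, which corresponds to the "permitted" branch of the flow; in a real system the lookup table would be replaced by the decision tree.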

7. GENETIC ALGORITHM
A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy. It works on the mechanics of natural selection and is based on Darwin's theory of survival of the fittest.
The GA process begins with a set of potential solutions, or chromosomes, which are randomly generated or selected. These chromosomes are normally encoded in binary form, but other encodings are also used. The entire set of chromosomes comprises a population. In every generation the fitness of the chromosomes is checked: a fitness function evaluates each chromosome, and the selection operator then chooses the fittest chromosomes using tournament selection. Chromosomes with poor fitness values are discarded.
A GA applies evolution and natural selection to a chromosome-like data structure and evolves the chromosomes using selection, recombination (crossover), and mutation operators. The process generally begins with a randomly generated population of chromosomes, which represent candidate solutions to the problem. The positions of each chromosome are encoded as bits, characters or numbers and are referred to as genes. An evaluation function, known as the "Fitness Function", computes the goodness of each chromosome with respect to the desired solution. During evaluation, the two basic operators, crossover and mutation, are used to imitate natural reproduction and mutation. The selection of chromosomes for survival and combination is biased towards the best-fit chromosomes.

7.1 FLOWCHART OF GA
The following flowchart shows the flow of a simple genetic algorithm. It starts with the random generation of an initial population, which is then evaluated and evolved through selection, recombination (crossover), and mutation. Finally, the best individual (chromosome) is picked as the final result once the optimization meets its target.
Figure 7.1.1 Flowchart of GA

7.2 ALGORITHM OF GA
GA_Rule_Generation
Input: Encoded binary string of length n (where n is the number of features being passed), number of generations, population size, crossover probability (Pc), mutation probability (Pm).
Output: A rule set for the IDS.
1. Initialize the population randomly.
2. Initialize N (total number of records in the training set).
3. For each chromosome in the new population
4. Calculate fitness = Fx / Sum(Fx)
5. End for
6. Select the 50% best-fit chromosomes and remove the worse-fit chromosomes.
7. Apply crossover to the selected chromosomes.
8. Apply mutation to each chromosome to generate the new population. Go to step 3.
9. Stop

GA Parameters
GA has some general elements and parameters which can be defined:
• GA Operators: The GA operators selection, crossover and mutation are the most important parts of the algorithm, as they contribute to the generation of each population.
  • Selection is the phase in which individuals of the population with superior fitness are selected; the others are discarded.
  • Crossover is the method in which randomly selected pairs of individuals exchange their parents' genes with each other, until an entire new population has been generated.
  • Mutation flips some of the bits in an individual; since any bit could be flipped, there is a low probability of predicting the change.
• Fitness Function: The fitness function scales the value of an individual relative to the rest of the population. It identifies the best possible solutions among the candidates in the population.

In the preprocessing phase, the KDDCUP99 dataset is processed with the Weka tool to remove redundant data, which yields the tested dataset. Removing redundant records from the dataset improves the detection rate and the performance of our system.
In the detection phase, the Genetic Algorithm is applied to the chosen feature dataset, and the fitness of every rule is computed with the following fitness function.
Fitness = Fx / sum(Fx)
where Fx is the fitness of individual x and sum(Fx) is the total fitness of all individuals.

8. KNOWLEDGE DISCOVERY DATASET (KDD)
KDDCUP99, which is based on DARPA data from MIT Lincoln Laboratory, is widely used to evaluate IDSs. In this study, we used the KDDCUP99 training and testing datasets.
Each record of the datasets consists of 41 network features and 1 manually assigned record type. Six network features were used in the GA: Duration, Protocol, Service, Flag, Source bytes and Destination bytes. The record type indicates whether a record is a normal network connection or an abnormal network connection.
The KDD 99 intrusion detection benchmark consists of the following components:
kddcup.data; kddcup.data_10_percent; kddcup.newtestdata_10_percent_unlabeled; kddcup.testdata.unlabeled; kddcup.testdata.unlabeled_10_percent; corrected.
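Once records from these files have been reduced to the selected features and encoded as binary strings, the GA_Rule_Generation procedure of Section 7.2 can be run on them. The Python sketch below is illustrative only and is not the authors' implementation. From the text it takes the normalization Fitness = Fx / sum(Fx), the retention of the best 50% of chromosomes, and the probabilities Pc and Pm. Everything else is an assumption made for the example: the raw score function `score` supplied by the caller (in a real system it would be computed from the N training records), the single-point crossover, and the parameter values in the usage lines.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def normalized_fitness(raw_scores: List[float]) -> List[float]:
    """Fitness = Fx / sum(Fx); falls back to equal shares if all scores are zero."""
    total = sum(raw_scores)
    if total == 0:
        return [1.0 / len(raw_scores)] * len(raw_scores)
    return [fx / total for fx in raw_scores]


def rank(population: List[Chromosome],
         score: Callable[[Chromosome], float]) -> List[Chromosome]:
    """Return the population ordered from best to worst normalized fitness."""
    fitness = normalized_fitness([score(c) for c in population])
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in order]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random):
    """Single-point crossover (an assumed operator; the text does not fix one)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(c: Chromosome, pm: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability pm."""
    return [bit ^ 1 if rng.random() < pm else bit for bit in c]


def ga_rule_generation(score: Callable[[Chromosome], float], n: int,
                       generations: int, pop_size: int,
                       pc: float, pm: float, seed: int = 0) -> List[Chromosome]:
    if n < 2 or pop_size < 2:
        raise ValueError("n and pop_size must both be at least 2")
    rng = random.Random(seed)
    # Step 1: random initial population of binary strings of length n.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 3-6: evaluate fitness and keep the best-fit half.
        survivors = rank(population, score)[: max(2, pop_size // 2)]
        # Step 7: refill the population from the survivors, crossing with probability pc.
        children: List[Chromosome] = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            if rng.random() < pc:
                a, b = crossover(a, b, rng)
            children.extend([a, b])
        # Step 8: mutate every chromosome to form the new population.
        population = [mutate(c, pm, rng) for c in (survivors + children)[:pop_size]]
    return rank(population, score)


# Usage with a toy score: the number of 1-bits in the chromosome.
rules = ga_rule_generation(score=sum, n=6, generations=5,
                           pop_size=8, pc=0.8, pm=0.05, seed=1)
print(rules[0])
```

Because mutation is applied to every chromosome, as in step 8 of the algorithm, the best chromosome of one generation is not guaranteed to survive unchanged into the next; keeping an unmutated copy of the best rule would be a design change beyond what the text describes.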
We have used "kddcup.data_10_percent" as the training dataset and "corrected" as the testing dataset. In this case the training set consists of 494,021 records, among which 97,280 are normal connection records, while the test set contains 311,029 records, among which 60,593 are normal connection records. Table 8.1 shows the distribution of each intrusion type in the training and the test set.

Table 8.1- Distribution of intrusion types in datasets
Types     Train ("kddcup.data_10_percent")     Test ("corrected")
normal    97280                                60593
probe     4107                                 4166
dos       391458                               229853
u2r       52                                   228
r2l       1124                                 16189
Total     494021                               311029

9. EXPERIMENTAL RESULTS AND ANALYSIS
As Table 9.1 shows, the detection rate is highest for denial of service (88.26 %) and normal connections (87.35 %), followed by probe (79.24 %); it is lowest for remote-to-local (67.21 %) and user-to-root (59.82 %).

Table 9.1- Detection rate of intrusions
Types     Total no. of records     Correctly detected records     Detection Rate
normal    25640                    22398                          87.35 %
Probe     17890                    14177                          79.24 %
dos       16872                    14982                          88.26 %
u2r       8742                     5230                           59.82 %
r2l       6412                     4310                           67.21 %

10. CONCLUSIONS
In this paper, we presented and implemented an Intrusion Detection System using a Genetic Algorithm and a Decision Tree to efficiently detect various types of network intrusions. The KDDCUP99 training and testing datasets are used to generate effective new rules with a reasonable detection rate.
The major advantage of the proposed detection system is that it can generate new rules as new intrusions become known. A GA is used to obtain a set of classification rules. Six features were used when encoding and obtaining the rules. A simple but effective and flexible fitness function is used to select the appropriate rules. Depending on the selection of fitness, the generated rules are given to the Decision Tree to detect network intrusions or to categorize the types of intrusions.
With the rule set provided, the Genetic Algorithm based Intrusion Detection System is able to detect several types of attacks.
11. REFERENCES
[1] Mohammad Sazzadul Hoque, Md. Abu Naser Bikas, and Md. Abdul Mukit, "An Implementation of Intrusion Detection System Using Genetic Algorithm", International Journal of Network Security and Its Applications (IJNSA), Vol. 4, No. 2, March 2012.
[2] Ch. Satya Keerthi N.V.L., B. Minny Priscilla, P. Lakshmi Prasanna, M.V.B.T. Santhi, "Model Generation for an Intrusion Detection System using Genetic Algorithm", International Journal of P2P Network Trends & Technology, Vol. 1, Issue 2, 2011.
[3] Crosbie, Mark and Gene Spafford. 1995. "Applying Genetic Programming to Intrusion Detection". In Proceedings of 1995 AAAI Fall Symposium on Genetic Programming, pp. 1-8. Cambridge, Massachusetts.
[4] Bridges, Susan and Rayford B. Vaughn. 2000. "Intrusion Detection via Fuzzy Data Mining". In Proceedings of 12th Annual Canadian IT Security Symposium, pp. 109-122. Ottawa, Canada.
[5] KDDcup 1999 data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[6] www.google.com