Hybrid Intrusion Detection System Using Principal Component Analysis
Khin Khattar Myint
PhD Student of UCSY, Myanmar, [email protected]

Abstract-- Intrusion detection is an essential component of
today's computer networks. There are two different approaches to
intrusion detection: misuse detection and anomaly detection.
Misuse detection can detect known attacks, but it is unable to
detect new attacks when they arise. Anomaly detection can detect
unknown attacks, but its false positive rate may be high. To
protect a computer network efficiently, we may therefore use both
misuse and anomaly detection. Moreover, the collected data contain
irrelevant and redundant features. The key point is to build an
intrusion detection system with a high detection rate and a low
false alarm rate. We therefore propose a hybrid intrusion
detection system that detects both known and unknown attacks. In
this paper, a decision tree is used for misuse detection,
classification based on association (CBA) is used for anomaly
detection, and principal component analysis (PCA) is used for
feature selection. The KDD Cup 99 dataset is used to train and
test our model.

Index Terms-- decision tree C4.5 algorithm, misuse and anomaly
detection, KDD dataset.

I. INTRODUCTION
With the increasing use of networked computers for critical
systems, network security is becoming more and more challenging.
Intrusion detection systems (IDS) have been widely deployed as a
second line of defense for computer network systems, alongside
other network security techniques such as firewalls and access
control. The main goal of an intrusion detection system is to
detect unauthorized use, misuse and abuse of computer systems by
both system insiders and external intruders. Intrusion detection
systems can be classified into two categories based on the
technique used to detect intrusions: anomaly detection and misuse
detection. Signature detection, or misuse detection, searches for
well-known patterns of attacks and intrusions by scanning for
pre-classified signatures in network packets. Anomaly detection
can detect new intrusions, while misuse detection may not. The
idea behind anomaly detection is that if we can establish a
normal activity profile for a system, then in theory we can flag
every system state that deviates from the established profile as
an intrusion attempt. Anomalous activities that are not intrusive
but are flagged as intrusive are referred to as false positives.
The effectiveness of an IDS is measured using its attack
detection rate and its false alarm rate.
This paper is organized as follows: Section II briefly describes
intrusion detection systems, Section III discusses related work,
Section IV describes the proposed system, and Section V concludes
the paper.

II. Intrusion Detection System
Intrusion may be defined as an unauthorized attempt to gain
access to a secured system or network. Intrusion detection is the
process of detecting suspicious activity on a network or a
device. An intrusion detection system (IDS) is an important
countermeasure for preserving data integrity and system
availability against attacks. An IDS inspects all inbound and
outbound network activity and identifies suspicious patterns that
may indicate a network or system attack from someone attempting
to break into or compromise a system.

A. Intrusion Detection Process
Intrusion detection processes [3] are categorized into misuse
detection and anomaly detection. Misuse detection
is sometimes called signature-based detection, and anomaly
detection is sometimes called behavior-based detection. Misuse
detection compares user activities with known intruder
activities. The idea of misuse detection is to represent attacks
in the form of a pattern or signature so that the same attack can
be detected and prevented in the future. The IDS searches for
defined signatures, and if a match is found, the system generates
an alarm indicating the presence of an intrusion. Since it works
on the basis of predefined signatures, it is unable to detect new
or previously unknown intrusions. Anomaly intrusion detection
identifies intrusions as deviations from normal usage behavior
patterns. It estimates the deviation of a user activity from the
normal behavior, and if the deviation goes beyond a preset
threshold, it considers that activity an intrusion. Anomaly
detection is able to detect new intrusions, but its dependence on
such a limiting threshold results in a high false positive rate.

B. Intrusion Detection Approaches
Intrusion detection systems can be classified into host-based
intrusion detection systems (HIDS) and network-based intrusion
detection systems (NIDS), according to the source of the data
they analyze. A host-based IDS analyzes host-bound audit sources
such as operating system audit trails, system logs and
application logs. It is a software application installed on a
system in order to protect it from intruders, and the audit data
to be analyzed are collected from that host. A network-based IDS
analyzes network packets captured on a network. In a NIDS, the
detection software is installed in the network in order to
detect intrusions; it collects data directly from the network in
the form of packets, which are then analyzed for intrusions [3].

C. Attacks Detected by IDS
Attack types [1] fall into one of the following categories:
1) Denial of Service Attack (DoS)
An attack in which the attacker makes some computing or memory
resource too busy or too full to handle legitimate requests, or
denies legitimate users access to a machine.
2) User to Root Attack (U2R)
A class of exploit in which the attacker starts out with access
to a normal user account on the system and is able to exploit
some vulnerability to gain root access to the system.
3) Remote to Local Attack (R2L)
Occurs when an attacker who has the ability to send packets to a
machine over a network, but who does not have an account on that
machine, exploits some vulnerability to gain local access as a
user of that machine.
4) Probing or Surveillance Attack
Any attempt to gather information about a network of computers
for the apparent purpose of circumventing its security controls.

D. Intrusion Detection Dataset
The KDD Cup 1999 intrusion detection contest data [3] is used in
our experiments. This data was prepared for the DARPA Intrusion
Detection Evaluation Program by MIT Lincoln Laboratory. Lincoln
Labs acquired nine weeks of raw TCP dump data, which was
processed into about 5 million connection records. The data set
contains 24 attack types, which fall into the four main
categories described above. Besides these four types of attacks,
the normal class also needs to be detected. The dataset for our
experiments contains 1000 connection records, a subset of the 10%
KDD Cup'99 intrusion detection benchmark dataset. These records
were randomly sampled from the MIT dataset: the sample includes
records from each class in proportion to the size of the class,
except that the smallest class is included completely. All the
intrusion detection models are trained and tested with the same
dataset.

III. Related Works
There are many research papers on hybrid intrusion detection
using different data mining algorithms. Kandeeban and Rajesh [7]
used a combination of a genetic algorithm and neural networks for
intrusion detection on the KDD 99 dataset. They achieved a
detection rate close to 95% when the false alarm rate was 1.9% to
2%, and a detection rate of 70% when the false alarm rate was
brought down to 1%. Depren et al. [5]
proposed an IDS architecture utilizing a Self-Organizing Map
(SOM) structure for anomaly detection and C4.5 for misuse
detection. A rule-based Decision Support System (DSS) was also
developed for interpreting the results of both the anomaly and
the misuse detection modules. They obtained a detection rate of
98.96% for the anomaly detection module and 99.61% for the misuse
detection module on the KDD 99 data set.
Another IDS model, proposed by Pan et al. [8], used a neural
network and C4.5 for attack detection. Their model achieved an
average detection rate of 93.28% and a false positive rate of
0.2% on the KDD Cup 99 dataset.
Nguyen and Choi [2] compared different algorithms on the basis
of their percentage accuracy for each individual attack type and
of their overall accuracy (AA). The algorithms' classification
times (TT, in seconds) were also compared to assess their
suitability for real-time use. The compared algorithms included
BayesNet (AA 90.62, TT 6.28), NaiveBayes (AA 78.32, TT 5.57),
C4.5 (AA 92.06, TT 15.85), NBTree (AA 92.28, TT 295.88),
Decision Table (AA 91.66, TT 66.24) and a few others. Even though
the overall accuracy of the single classifiers, mainly C4.5, was
quite good, none of them was able to detect all four attack
categories efficiently.
Goel et al. [6] proposed a novel hybrid model for misuse and
anomaly detection, in which C4.5-based binary decision trees are
used for misuse detection and a CBA-based classifier is used for
anomaly detection. Their results show that a misuse detection
rate of 99.995% with an anomaly detection rate of 99.298% is
achievable.
As discussed above, many researchers have conducted extensive
performance comparisons of various popular classification
algorithms. Among them, decision-tree-based algorithms such as
C4.5 perform better than the other classifiers.

IV. Proposed System
Figure 1 shows the framework of our proposed hybrid intrusion
detection system. First, both the training and the testing data
are preprocessed to remove irrelevant attributes. In our proposed
system, feature selection is done using principal component
analysis (PCA). A decision-tree-based classifier is then used to
separate attack patterns from normal patterns: attack patterns
are sent to the misuse detector, and normal patterns are sent to
the anomaly detector.

Figure 1. Proposed system framework

A. Principal Component Analysis
PCA is a common statistical method used in multivariate
optimization problems to reduce the dimensionality of data while
retaining a large fraction of the data's characteristics. First,
PCA is used to project the mean-centered training set onto the
eigenvectors (the eigenspace) of its covariance matrix. These
eigenspace vectors are then used to predict malicious connections
in a workload containing normal and attack behavior [4]. PCA
reduces the number of dimensions required to classify new data
and produces a set of principal components, which are
eigenvalue/eigenvector pairs whose eigenvectors are orthonormal.
In other words, PCA finds a set of axes that best fits the data;
here, these axes represent the normal connection data. Outlier
detection is performed by mapping live network data onto these
normal axes and calculating its distance from the axes. If the
distance is greater than a certain threshold, the connection is
classified as an attack. The principal components are derived
from the covariance matrix. Some eigenvalues are typically much
larger than others, and the larger the eigenvalue, the more
significant its corresponding eigenvector, because more of the
data's variance lies along it. Therefore, the principal components
are sorted from most to least significant, i.e., in descending
order of their eigenvalues. If a new data item is projected onto
the upper set of significant principal components, the data item
can likely be classified without projecting it onto all the
principal components. The eigenvectors of the principal
components represent the axes that best fit a data sample.
Points that lie far from these axes exhibit abnormal behavior.
Outliers measured using the Euclidean distance are the network
connections that are anomalous: using a threshold value t, any
network connection whose distance is greater than t is
considered an outlier.

B. Misuse Detector
The misuse detector is a hierarchical, sequential model built
from decision trees. This differential approach separates out one
attack at a time: it identifies the features unique to one attack
and, at the same time, the general characteristics of the
remaining attacks that distinguish them from that attack. The
distribution of the different attacks in the training dataset is
usually uneven, so some attacks have far fewer instances than
others. The sequence of levels is therefore arranged so that the
less frequent records are combined together, which yields an
unbiased classifier. In the proposed architecture, C4.5 is used
as the decision tree algorithm [6].

C. Anomaly Detector
Classification based on association (CBA) is the application of
association rules to classification problems. It generates class
association rules (CARs), which are association rules with the
target class on the right-hand side. A CAR is an implication of
the form:
$$X \rightarrow y, \quad X \subseteq I, \; y \in Y$$
Here X is a set of features (the antecedent), I is the set of all
features, y is the target class, and Y is the set of all classes.
CBA also provides strength measurements for the CARs:
Support: the rule holds with support sup if sup% of the cases
contain X and are labeled with class y.
Confidence: the rule holds with confidence conf if conf% of the
cases that contain X are labeled with class y.
The algorithm used for rule generation in CBA is similar to the
Apriori algorithm and uses a generate-and-test approach. First,
size-k patterns, called candidate patterns, are generated. Then,
following the Apriori approach, the candidate patterns that
satisfy the minimum support are selected as frequent patterns.
From the size-k frequent patterns, size-(k+1) candidate patterns
are generated and tested against the minimum support value. This
process continues until a maximum limit on rule size is reached
or no frequent patterns remain. Finally, rules are generated from
the frequent patterns using the confidence value.
After rule generation, CBA uses a heuristic method to order the
rules in decreasing precedence based on their confidence and
support values. If a set of rules has the same antecedent, the
rule with the highest confidence is selected to represent the
set. If the rules that apply have the same confidence, the rule
with the highest support is picked; if the support is also equal,
CBA classifies the case according to the rule that was generated
earlier. In this way, an ordered list of rules is created. When a
new tuple is given for classification, the class associated with
the first rule that the tuple satisfies is used as its label. The
classifier also contains a default rule with the lowest
precedence: if a tuple does not satisfy any rule, it is assigned
the default class [6].

V. Conclusion
In this paper we have proposed a new model for attack detection
that uses a decision tree for misuse detection and a CBA
rule-based classifier for anomaly detection. Moreover, a relevant
feature selection model using principal component analysis was
proposed to select the best feature set, which could be used to
design a lightweight intrusion detection system. Our proposed
system is so far only a framework; its implementation is left for
future work.

VI. REFERENCES
[1] Ayman I. Madbouly, Amr M. Gody, and Tamer M. Barakat,
"Relevant Feature Selection Model Using Data Mining for Intrusion
Detection System," International Journal of Engineering Trends
and Technology, vol. 9, no. 10, March 2014.
[2] H. A. Nguyen and D. Choi, "Application of data mining to
network intrusion detection: classifier selection model," in
APNOMS 2008, LNCS 5297, pp. 399-408, Springer-Verlag, 2008.
[3] Mradul Dhakar and Akhilesh Tiwari, Journal of Information and
Computing Science, vol. 9, no. 1, 2014.
[4] Nethu B, "Adaptive Intrusion Detection Using Machine
Learning," International Journal of Computer Science and Network
Security, vol. 13, no. 3, March 2013.
[5] O. Depren, M. Topallar, E. Anarim, and M. K. Cili, "An
intelligent intrusion detection system (IDS) for anomaly and
misuse detection in computer networks," Expert Systems with
Applications, vol. 29, no. 4, pp. 713-722, 2005.
[6] Radhika Goel, Anjali Sardana, and Ramesh C. Joshi, "Parallel
Misuse and Anomaly Detection Model," International Journal of
Network Security, vol. 14, no. 4, pp. 211-222, July 2012.
[7] S. S. Kandeeban and R. S. Rajesh, "Integrated Intrusion
Detection System Using Soft Computing," International Journal of
Network Security, vol. 10, no. 2, pp. 87-92, March 2010.
[8] Z. S. Pan, S. C. Chen, G. B. Hu, and D. Q. Zhang, "Hybrid
neural network and C4.5 for misuse detection," in Proceedings of
the Second International Conference on Machine Learning and
Cybernetics, 2003.
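The PCA-based outlier test described in Section IV-A can be illustrated with a short sketch. It is not the authors' implementation (the paper presents only a framework) but a minimal NumPy version under stated assumptions: the principal axes are fitted on normal connections only, "distance from the axes" is taken to mean the Euclidean distance from the subspace spanned by the k most significant principal components, and the class name PCAOutlierDetector and the parameter k are hypothetical. The threshold t plays the role of the threshold value t in the text.

```python
import numpy as np


class PCAOutlierDetector:
    """Flags connections lying far from the principal axes of normal traffic.

    Sketch only: `k` (number of retained components) and `t` (distance
    threshold) are hypothetical parameters chosen by the caller.
    """

    def __init__(self, k: int, t: float):
        self.k = k  # number of most significant principal components kept
        self.t = t  # distance above which a connection is treated as an outlier

    def fit(self, normal: np.ndarray) -> "PCAOutlierDetector":
        # Rows are normal connections, columns are numeric features.
        self.mean_ = normal.mean(axis=0)
        centred = normal - self.mean_
        # Principal components come from the covariance matrix.
        cov = np.cov(centred, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1]        # most to least significant
        self.eigvals_ = eigvals[order]
        # Columns are orthonormal axes that best fit the normal data.
        self.components_ = eigvecs[:, order[: self.k]]
        return self

    def distance(self, x: np.ndarray) -> np.ndarray:
        # Euclidean distance from each point to the subspace of the top-k axes.
        centred = np.atleast_2d(x) - self.mean_
        projected = centred @ self.components_ @ self.components_.T
        return np.linalg.norm(centred - projected, axis=1)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # True marks a connection classified as an attack (outlier).
        return self.distance(x) > self.t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical "normal" traffic: 3 features lying close to one direction.
    s = rng.normal(size=(200, 1))
    normal = np.hstack([s, 2 * s, -s]) + 0.01 * rng.normal(size=(200, 3))
    detector = PCAOutlierDetector(k=1, t=0.1).fit(normal)
    # First point follows the normal pattern, second deviates from it.
    print(detector.predict(np.array([[1.0, 2.0, -1.0], [1.0, -2.0, 1.0]])))
```

Keeping only the top k components reflects the remark in Section IV-A that a data item can often be classified without projecting it onto all principal components. Choosing k and t in practice would need validation data, which this sketch does not address.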
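The rule-generation step of the anomaly detector (Section IV-C) can be sketched as an Apriori-style, level-wise search over class association rules. This is a simplified illustration, not CBA's published implementation: records are assumed to be categorical (feature, value) pairs, support and confidence are expressed as fractions rather than percentages, a pattern is kept as frequent if its rule item is frequent for at least one class, and the function name mine_cars and the max_len limit on rule size are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]                            # (feature, value) pair
Rule = Tuple[FrozenSet[Item], str, float, float]  # (X, y, support, confidence)


def mine_cars(records: List[Dict[str, str]], labels: List[str],
              min_sup: float, min_conf: float, max_len: int = 3) -> List[Rule]:
    """Level-wise (Apriori-style) mining of class association rules X -> y."""
    n = len(records)
    cases = [frozenset(r.items()) for r in records]
    rules: List[Rule] = []

    # Size-1 candidate patterns: every (feature, value) item in the data.
    candidates = {frozenset([item]) for case in cases for item in case}
    k = 1
    while candidates and k <= max_len:
        frequent = set()
        for X in candidates:
            # Count the cases containing X, broken down by class label.
            hits = Counter(lab for case, lab in zip(cases, labels) if X <= case)
            cover = sum(hits.values())
            for y, count in hits.items():
                sup = count / n        # share of cases containing X and labeled y
                if sup < min_sup:
                    continue           # rule item X -> y is not frequent
                frequent.add(X)
                conf = count / cover   # share of X-cases that are labeled y
                if conf >= min_conf:
                    rules.append((X, y, sup, conf))
        # Join size-k frequent patterns into size-(k+1) candidates, keeping
        # only those whose size-k subsets are all frequent (Apriori pruning).
        candidates = {
            a | b
            for a, b in combinations(frequent, 2)
            if len(a | b) == k + 1
            and all(frozenset(s) in frequent for s in combinations(a | b, k))
        }
        k += 1
    return rules
```

For example, with four toy connections of which the two with flag S0 are attacks, mining with min_sup=0.25 and min_conf=0.8 yields the rule {("flag", "S0")} -> attack with support 0.5 and confidence 1.0.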
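The rule ordering and classification procedure described for CBA in Section IV-C can also be sketched in a few lines. This is a minimal illustration under assumptions: rules are tuples (X, y, support, confidence) listed in the order they were generated, and the default class is supplied by the caller (for instance the majority class of the training data), since the text only states that a low-precedence default rule exists. The names order_rules and classify are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Item = Tuple[str, str]                            # (feature, value) pair
Rule = Tuple[FrozenSet[Item], str, float, float]  # (X, y, support, confidence)


def order_rules(rules: List[Rule]) -> List[Rule]:
    """Order CARs by decreasing precedence, following the CBA heuristic."""
    # Higher confidence first, then higher support, then earlier generation.
    ranked = sorted(enumerate(rules), key=lambda ir: (-ir[1][3], -ir[1][2], ir[0]))
    ordered: List[Rule] = []
    seen = set()
    for _, rule in ranked:
        # Keep one representative (the best-ranked rule) per antecedent X.
        if rule[0] not in seen:
            seen.add(rule[0])
            ordered.append(rule)
    return ordered


def classify(case: Iterable[Item], ordered: List[Rule], default: str) -> str:
    """Label a tuple with the class of the first rule it satisfies."""
    items = frozenset(case)
    for X, y, _sup, _conf in ordered:
        if X <= items:
            return y
    # No rule applies: fall back to the default class.
    return default
```

Sorting on the key (confidence, support, generation index) encodes all three tie-breaking levels in one pass, and keeping only the first rule per antecedent implements the choice of a single representative for rules that share the same antecedent.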