Heather Ames
Chuan-Heng Chsiao
Chaitanya Sai Gaddam
13 February 2006
CN710 Discussion Proposal: Network Intrusion Detection
The growing amount of data stored electronically, by consumers and companies alike, has
made data privacy and tampering critical issues. Attempts to connect remotely to
computers or networks to gain illegal access to such data are called intrusion attempts.
Automated early detection of such behavior can greatly help network administrators
safeguard their systems. As might be expected, these attacks vary greatly in how the
inputs of malicious users appear in log files, both over time and within a single file.
Learning systems are applied in the hope of building classifiers that can flag previously
known intrusion tactics as well as novel, deviant behavior. We will discuss the use of
various classifiers and preprocessing techniques on such datasets, as well as the paradigm
of artificial immune systems (AIS), which seems to be currently popular with researchers
in this field.
Some of the papers proposed as readings use the KDD Cup 1999 data set. The 1999 edition
of the Knowledge Discovery in Databases (KDD) Cup, an annual competition organized by
ACM SIGKDD, an ACM special interest group, looked specifically at network intrusion.
The contest data come from about two months of traffic logged on a simulated US Air
Force local-area network and are labeled with 24 different classes (one class labeled
normal, the others labeled as abnormal network connections). Each connection record is
summarized by 41 features, most of them derived from packet-header information. There
are roughly five million training records and about two million test records. This data
set can give an intuitive feel for the problem at hand and help in judging which type of
supervised learning algorithm is best suited to the task. We suggest looking at the data set
and training readily available classifier implementations on it.
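As a starting point for such experiments, here is a minimal Python sketch of loading one connection record. It assumes the comma-separated layout of the distributed KDD Cup 1999 files: 41 feature fields (the three symbolic ones being protocol_type, service and flag) followed by a label that ends in a period. The helper name parse_kdd_record is hypothetical; the resulting feature lists can then be handed to any off-the-shelf classifier.

```python
# Sketch: parse one KDD Cup 1999 connection record (assumed layout: 41 features + label).
SYMBOLIC_FIELDS = {1, 2, 3}  # protocol_type, service, flag are strings; the rest are numeric


def parse_kdd_record(line):
    """Return (features, label, is_attack) for one comma-separated record (hypothetical helper)."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 41 features plus a label, got {len(fields)} fields")
    *raw, label = fields
    # Keep the symbolic fields as strings and convert every other field to float.
    features = [v if i in SYMBOLIC_FIELDS else float(v) for i, v in enumerate(raw)]
    label = label.rstrip(".")  # labels end in a period, e.g. "normal." or "smurf."
    return features, label, label != "normal"


if __name__ == "__main__":
    sample = "0,tcp,http,SF,181,5450," + ",".join(["0"] * 35) + ",normal."
    features, label, is_attack = parse_kdd_record(sample)
    print(len(features), features[1], label, is_attack)  # 41 tcp normal False
```

Mapping every non-normal label to "attack" gives the binary normal/abnormal task; keeping the label string gives the multi-class one.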
References
Core Readings
de Castro, L. N. & Timmis, J. Artificial Immune Systems: A Novel Paradigm to Pattern
Recognition. In Artificial Neural Networks in Pattern Recognition, University of Paisley,
2002.
This article introduces artificial immune systems and explores their computational
analogies to biological immune networks.
Aickelin, U., Greensmith, J. & Twycross, J. Immune System Approaches to Intrusion
Detection: A Review. In G. Nicosia, V. Cutello & P.J.B. (eds.), Lecture Notes in
Computer Science 3239, pp. 316-329, 2004.
A review paper describing developments in applying the immune system metaphor to
intrusion detection.
Kim, D. S. & Park, J. S. Network-Based Intrusion Detection with Support Vector
Machines. Lecture Notes in Computer Science 2662, pp. 747-756, 2003.
This article details the use of SVMs for this problem.
Rawat, S. & Sastry, C. S. Network Intrusion Detection Using Wavelet Analysis. CIT 2004,
Lecture Notes in Computer Science 3356, pp. 224-232, 2004.
This article describes the use of wavelets to analyze network traffic.
Cannady, J. & Garcia, R. C. The Application of Fuzzy ARTMAP in the Detection of
Computer Network Attacks. Lecture Notes in Computer Science 2130, pp. 225-230, 2001.
This article describes the use of Fuzzy ARTMAP to detect network intrusions.
Supplementary Readings
Kim, J. & Bentley, P. The Human Immune System and Network Intrusion Detection. 7th
European Conference on Intelligent Techniques and Soft Computing (EUFIT '99),
Aachen, Germany, 1999.
This article gives an overview of the desirable properties of an intrusion detection
system, and really stretches the analogy to immune systems to its limits.
Kim, J. & Bentley, P. The Artificial Immune Model for Network Intrusion Detection. 7th
European Conference on Intelligent Techniques and Soft Computing (EUFIT '99),
Aachen, Germany, 1999a.
This article focuses on the artificial immune system model. It is light on
implementation details and results but worth a quick read.
Timmis, J., Knight, T., de Castro, L. N. & Hart, E. An Overview of Artificial Immune
Systems. In Computation in Cells and Tissues: Perspectives and Tools for Thought, Natural
Computation Series, pp. 51-86, 2004.
A more in-depth look at artificial immune systems.
21 February 2006
CN710: Network Intrusion Detection
Intrusion detection systems (IDS) can be classified into two main categories: misuse-based
systems and anomaly-based systems. Misuse-based IDS look for signatures of previously
known attacks and consequently cannot detect novel attacks whose signatures they have
never seen. Anomaly-based systems are trained to learn the normal behavior of a system
and signal any deviation from it that exceeds a certain threshold.
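The contrast between the two approaches can be sketched in Python. The signature strings and the single numeric feature (for example, bytes sent per connection) are hypothetical illustrations, and the anomaly rule, flagging values more than a set number of standard deviations from the mean of normal training data, is just one simple way to put a threshold on deviation.

```python
import statistics

# Hypothetical signatures, for illustration only; real misuse-based IDS use curated rule sets.
KNOWN_SIGNATURES = ("/cgi-bin/phf?", "/etc/passwd")


def misuse_alert(payload, signatures=KNOWN_SIGNATURES):
    """Misuse-based check: alert only if a known attack signature appears."""
    return any(sig in payload for sig in signatures)


class AnomalyDetector:
    """Anomaly-based check on one numeric feature learned from normal traffic."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # allowed deviation, in standard deviations

    def fit(self, normal_values):
        self.mean = statistics.fmean(normal_values)
        self.std = statistics.pstdev(normal_values)
        return self

    def alert(self, value):
        if self.std == 0:
            return value != self.mean
        return abs(value - self.mean) / self.std > self.threshold


detector = AnomalyDetector().fit([100, 102, 98, 101, 99])  # e.g. bytes per connection
print(misuse_alert("GET /cgi-bin/phf?Qalias=x"), misuse_alert("GET /index.html"))  # True False
print(detector.alert(100.5), detector.alert(150))  # False True
```

A payload containing none of the stored signatures passes the misuse check however unusual it is, which is exactly the blind spot for novel attacks; the anomaly check has the opposite weakness, since any unusual but harmless value also raises an alert.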
Host-based IDS reside on each computer in a network and examine system data.
Network-based IDS are usually installed on a single computer that gates the network's
Internet connections, and mainly examine packet data.
Intrusion detection can be cast as a machine learning problem in which the task is to learn
to distinguish harmless behavior (data) from potentially malicious behavior (data). The
design issues commonly associated with machine learning tasks need to be addressed:
preprocessing of the data, choice of the initial input feature set, a metric for the similarity
of data points, and the network parameters.
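Two of these design issues, preprocessing and the similarity metric, can be illustrated with a short Python sketch for records that mix a symbolic field with numeric ones. The symbolic value is one-hot encoded, numeric values are min-max scaled with ranges learned from training data, and Euclidean distance serves as the similarity metric. The record layout and the protocol vocabulary (tcp, udp, icmp, the values of the KDD protocol_type feature) are assumptions chosen for illustration.

```python
import math

PROTOCOLS = ["tcp", "udp", "icmp"]  # assumed vocabulary of the symbolic protocol field


def one_hot(value, vocabulary):
    """Encode a symbolic value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def fit_minmax(rows):
    """Per-column (min, max) ranges learned from training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(values, ranges):
    """Min-max scale numeric values to [0, 1]; constant columns map to 0."""
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]


def encode(record, ranges):
    """record = (protocol, numeric_1, numeric_2, ...) -> one flat numeric vector."""
    protocol, *numeric = record
    return one_hot(protocol, PROTOCOLS) + scale(numeric, ranges)


train = [("tcp", 0, 200), ("udp", 10, 1000), ("icmp", 5, 600)]  # (protocol, duration, bytes)
ranges = fit_minmax([r[1:] for r in train])
a, b = encode(train[0], ranges), encode(train[1], ranges)
print(math.dist(a, b))  # 2.0
```

Scaling matters for the metric: without it, a large-valued feature such as a byte count would dominate the Euclidean distance and the other features would barely affect similarity.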
Preprocessing: wavelets
Network traffic has been observed to be self-similar, meaning that its statistical structure
looks alike across a wide range of time scales. Because wavelets decompose a signal scale
by scale, this makes traffic a natural candidate for wavelet preprocessing. Self-similarity is
thought to weaken under abnormal conditions, so detecting abnormal behavior can be
reduced to detecting outliers in sequences of wavelet coefficients.
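A minimal sketch of this idea in Python, assuming traffic has already been binned into counts per time interval (with an even number of bins at each level). One level of the orthonormal Haar transform produces detail coefficients, and coefficients far from their median, measured in units of the median absolute deviation, are flagged. The Haar wavelet, the MAD rule and the cutoff k = 5 are illustrative choices, not necessarily those of Rawat and Sastry (2004).

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def haar_details(signal, levels):
    """Detail coefficients for each level, finest scale first."""
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details


def outlier_indices(coeffs, k=5.0):
    """Indices of coefficients more than k robust scale units from the median."""
    med = statistics.median(coeffs)
    deviations = [abs(c - med) for c in coeffs]
    scale = statistics.median(deviations)  # median absolute deviation (MAD)
    if scale == 0:  # over half the coefficients equal the median: fall back to mean deviation
        scale = statistics.fmean(deviations)
    if scale == 0:
        return []
    return [i for i, d in enumerate(deviations) if d > k * scale]


traffic = [10, 12, 11, 10, 12, 11, 10, 12, 11, 10, 12, 11, 10, 12, 11, 10]  # packets per bin
spiky = traffic.copy()
spiky[8] = 80  # a sudden burst in bin 8
print(outlier_indices(haar_step(traffic)[1]), outlier_indices(haar_step(spiky)[1]))  # [] [4]
```

Detail coefficient i at the finest level covers bins 2i and 2i+1, so index 4 points back to the burst in bin 8; haar_details gives the coarser scales, where slower changes would show up.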
Network Design Choices
Researchers have applied Fuzzy ARTMAP to this problem. Assigning predefined class
labels to the ARTb module makes the network a misuse-based system. An anomaly-based
detection system can instead be created by feeding unsupervised feedback from the
monitored system as input to the ARTb module. The problem then becomes one of
predicting this diagnostic feedback, which can lead to the detection of novel anomalies. A
dynamic vigilance parameter, tied to the feedback, is used to prevent a proliferation of F2
(category) nodes.
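The role of vigilance can be illustrated with a minimal Python sketch of unsupervised Fuzzy ART, the module from which Fuzzy ARTMAP is built. It uses complement coding, the choice function T_j = |I ∧ w_j| / (α + |w_j|), the vigilance test |I ∧ w_j| / |I| ≥ ρ, and fast learning (β = 1). This is not the Cannady and Garcia system: there is no ARTb module, map field or dynamic vigilance here, and the class and parameter names are hypothetical.

```python
def complement_code(x):
    """Complement coding: [x, 1 - x], so every input has the same L1 norm."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and_norm(a, b):
    """|a ∧ b|: L1 norm of the component-wise minimum."""
    return sum(min(p, q) for p, q in zip(a, b))


class FuzzyART:
    """Minimal unsupervised Fuzzy ART (hypothetical class, fixed vigilance)."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = vigilance, alpha, beta
        self.weights = []  # one weight vector per committed F2 node

    def train(self, x):
        """Present one input in [0, 1]^d and return the index of the F2 node that learns it."""
        i = complement_code(x)
        # Search committed nodes in order of the choice function T_j.
        order = sorted(
            range(len(self.weights)),
            key=lambda j: fuzzy_and_norm(i, self.weights[j]) / (self.alpha + sum(self.weights[j])),
            reverse=True,
        )
        for j in order:
            w = self.weights[j]
            if fuzzy_and_norm(i, w) / sum(i) >= self.rho:  # vigilance test passed: resonance
                self.weights[j] = [self.beta * min(p, q) + (1 - self.beta) * q for p, q in zip(i, w)]
                return j
            # otherwise this node is reset and the next candidate is tried
        self.weights.append(i)  # no node passed: commit a new F2 node
        return len(self.weights) - 1


points = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9], [0.88, 0.92], [0.5, 0.1]]
for rho in (0.9, 0.3):
    net = FuzzyART(vigilance=rho)
    print(rho, [net.train(p) for p in points], len(net.weights))
# 0.9 [0, 0, 1, 1, 2] 3
# 0.3 [0, 0, 1, 1, 0] 2
```

With the same inputs, the lower vigilance folds the last point into an existing category instead of committing a third F2 node, which is the lever a feedback-driven vigilance would pull to keep the number of categories in check.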
In applying support vector machines to the problem, researchers have trained multi-class
classifiers in addition to the usual binary normal/anomalous classifier.
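For reference, the standard soft-margin SVM for the binary case can be written as follows, with labels y_i ∈ {−1, +1} for normal and anomalous connections, a feature map φ and a penalty constant C. One common way to handle multiple categories is one-vs-rest: train one such classifier per category k (positive for category k, negative for all others) and pick the largest score. This is the generic textbook formulation, not necessarily the kernel, multi-class scheme or parameters used by Kim and Park (2003).

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \bigl( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0,

\hat{y}(\mathbf{x}) = \arg\max_{k \in \{1, \dots, K\}} \bigl( \mathbf{w}_k^{\top} \phi(\mathbf{x}) + b_k \bigr)
```

Here the slack variables ξ_i let some connections fall inside the margin or on the wrong side, which matters when normal and attack records overlap in feature space.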
Artificial Immune Systems (AIS)
Artificial immune systems adopt the metaphor of the vertebrate immune system, which
detects foreign elements. The computational procedure is divided into three parts: negative
selection (the behavior that is normal or intrinsic to the system, the "self", is learnt, and
candidate detectors that match it are discarded), clonal selection (detectors that are good
at detecting anomalies are allowed to proliferate and mutate), and immune network
formation (detectors form suppressive networks to prevent too many false positives).
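Negative selection can be sketched in a few lines of Python using the classic binary-string formulation: self samples are encoded as fixed-length bit strings, and a detector matches a string when the two agree in at least r contiguous positions. Random candidate detectors that match any self string are discarded, so whatever survives can only fire on non-self. The encoding, the r-contiguous rule, the example self strings and all function names are assumptions for illustration.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, seed=0, max_tries=100_000):
    """Negative selection: keep random candidates that match no self string."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, r):
    """Monitoring phase: a sample is flagged if any detector matches it."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)


self_set = ["00000000", "00001111"]  # hypothetical encodings of normal behavior
detectors = generate_detectors(self_set, n_detectors=5, length=8, r=4, seed=1)
print(detectors)
print([is_nonself(s, detectors, 4) for s in self_set])  # [False, False]
```

By construction no detector ever fires on a self string; how much of non-self the detector set covers depends on r and on how many detectors are generated.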
Discussion Questions
What kind of data is likely to be most informative?
Is self-similarity a good characterization of the data?
What effect does c-index (paucity of attack data) have?
Is AIS really a new paradigm? What are the radical departures from normal computation?
de Castro and Timmis (2002): Comparative Analysis of AIS and ANN

Component
  AIS: Attribute string, s (information storage and processing), represented in an appropriate shape-space; might correspond to an immune cell or molecule
  ANN: Neuron (processing element) composed of an activation function, summing junction, connection strengths, and activation threshold
Location of the components
  AIS: Located according to the environmental stimuli
  ANN: Fixed, predetermined locations
Structure
  AIS: Usually follows the spatial distribution of the antigens represented in shape-space
  ANN: Predefined architectures and weights biased by the environment
Memory
  AIS: Content-addressable and distributed; carried in the attribute strings as well as in the connections
  ANN: Knowledge in connection strengths; self-associative or content-addressable, and distributed
Adaptation
  AIS: Learning and evolution
  ANN: Learning
Plasticity and diversity
  AIS: Continuous insertion and elimination of the basic elements (cells/molecules) of the system
  ANN: Pruning and/or insertion of new connections, units, and layers in the network
Interaction with other components
  AIS: Attribute strings are matched by cell receptors; cells have weighted connections
  ANN: Neurons are interconnected through connection strengths
Interaction with the environment
  AIS: The attribute string is compared with patterns in the environment; some or all of the components might be involved in pattern recognition
  ANN: Neurons receive input signals from the environment; the whole ANN might be used to recognize the pattern
Threshold
  AIS: An affinity threshold determines the degree of recognition between immune cells and the presented input pattern
  ANN: A threshold determines the neuron's activation
Robustness
  AIS: Highly flexible and noise tolerant; self-tolerant (learn to recognize themselves)
  ANN: Highly flexible and noise tolerant
State
  AIS: Concentration of immune cells and molecules and/or their affinities and connection strengths
  ANN: Activation level of the output neuron
Control
  AIS: Any immune principle or theory (e.g., clonal selection)
  ANN: Unsupervised, supervised, and reinforcement learning for training
Generalization capability
  AIS: Cross-reactivity allows recognition of similar patterns, and components can be multi-specific
  ANN: Good generalization provided by training; satisfactory generalization by reducing the dimensions of the parameter space and the size of the dimensions
Non-linearities
  AIS: Activation functions that define the degree of recognition between two components
  ANN: Activation functions of individual neurons
Some common intrusion terminology
Buffer overflow
What happens when a program tries to store more data in a buffer (a holding area) than it
can handle; the excess overwrites adjacent memory. Crackers commonly exploit this
problem to get arbitrary commands executed by a program running with root permissions.
DoS attack
This abbreviation for denial-of-service attack labels attempts to shut down websites by
flooding network links with large amounts of traffic.
SYN attack
When a session is initiated between a Transmission Control Protocol (TCP) client and
server, a very small buffer space exists on the server to handle the usually rapid
"handshaking" exchange of messages that sets up the session. The session-establishing
packets have the SYN (synchronize) flag set, which identifies them as part of this
exchange. An attacker can send a number of connection requests very rapidly and then fail
to respond to the server's replies. These half-open connections remain in the buffer, so
other, legitimate connection requests cannot be accommodated.
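The effect can be illustrated with a toy Python model of a server's backlog of half-open connections, assuming a fixed backlog size and a fixed timeout after which unanswered entries are discarded. Both values and the class name are hypothetical; real TCP stacks use their own limits, retransmission schedules and defenses. Spoofed SYNs that are never acknowledged fill the backlog, and a legitimate client is refused until those entries time out.

```python
class ListenQueue:
    """Toy model of a server's backlog of half-open TCP connections (hypothetical class)."""

    def __init__(self, backlog=4, timeout=3.0):
        self.backlog = backlog  # maximum number of half-open connections held at once
        self.timeout = timeout  # seconds before an unanswered entry is discarded
        self.half_open = {}  # client id -> time its SYN arrived

    def _expire(self, now):
        self.half_open = {c: t for c, t in self.half_open.items() if now - t < self.timeout}

    def on_syn(self, client, now):
        """A SYN arrives: reply with SYN-ACK and queue it, or drop it if the backlog is full."""
        self._expire(now)
        if len(self.half_open) >= self.backlog:
            return False
        self.half_open[client] = now
        return True

    def on_ack(self, client, now):
        """The final ACK completes the handshake and frees the backlog slot."""
        self._expire(now)
        return self.half_open.pop(client, None) is not None


queue = ListenQueue(backlog=4, timeout=3.0)
for i in range(4):
    queue.on_syn(f"spoofed-{i}", now=0.0)  # attacker never sends the final ACK
print(queue.on_syn("alice", now=1.0))  # False: backlog full, legitimate request refused
print(queue.on_syn("alice", now=3.5), queue.on_ack("alice", now=3.6))  # True True
```

An attacker who keeps sending fresh SYNs faster than the timeout clears them keeps the backlog permanently full, which is why the attack works with modest traffic.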
Teardrop attack
This type of denial-of-service attack exploits the way the Internet Protocol (IP) requires a
packet that is too large for the next router to handle to be divided into fragments. Each
fragment carries an offset from the beginning of the original packet, which enables the
receiving system to reassemble the entire packet. In the teardrop attack, the attacker puts a
confusing (overlapping) offset value in the second or later fragment.
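A receiver that validates offsets before trusting them might look like the following minimal Python sketch. It assumes fragments arrive as (offset, data) pairs with the offset in bytes (real IPv4 headers express it in 8-byte units) and that the full set of fragments is already present. Overlapping or missing ranges, such as a teardrop fragment whose offset points back into data already received, are rejected instead of being copied. The function name is hypothetical.

```python
def reassemble(fragments):
    """Rebuild a packet from (offset, data) fragments, rejecting overlaps and gaps.

    Offsets are in bytes here for simplicity; real IPv4 headers give them in 8-byte units.
    """
    payload = bytearray()
    expected = 0  # offset at which the next fragment should start
    for offset, data in sorted(fragments):
        if offset < expected:
            raise ValueError(f"fragment at offset {offset} overlaps data already received")
        if offset > expected:
            raise ValueError(f"missing data before offset {offset}")
        payload += data
        expected = offset + len(data)
    return bytes(payload)


print(reassemble([(4, b"efgh"), (0, b"abcd")]))  # b'abcdefgh'
try:
    reassemble([(0, b"abcd"), (2, b"xyz")])  # second fragment points back into the first
except ValueError as err:
    print("rejected:", err)
```

Sorting by offset lets fragments arrive in any order; the explicit overlap check is the step whose absence made teardrop-style fragments dangerous to naive reassembly code.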