Heather Ames, Chuan-Heng Chsiao, Chaitanya Sai Gaddam
13 February 2006
CN710 Discussion Proposal: Network Intrusion Detection

The growing amount of data stored electronically by consumers and companies has made data privacy and tampering critical issues. Attempts to connect remotely to computers or networks in order to gain illegal access to such data are labeled intrusion attempts. Automated early detection of such behavior can greatly help network administrators safeguard their systems. As might be expected, attacks vary greatly in how malicious users' inputs appear in log files, both over time and within a single file. Learning systems are applied in the hope of building classifiers that can flag previously known intrusion tactics as well as novel, deviant behavior. We will discuss the use of various classifiers and preprocessing techniques on such datasets, as well as the paradigm of Artificial Immune Systems, which appears to be currently popular with researchers in this field.

Some of the papers proposed as readings use the KDD Cup 1999 data set. The 1999 edition of the Knowledge Discovery in Databases (KDD) Cup, an annual competition organized by an ACM special interest group, looked specifically at network intrusion. The contest data is derived from roughly two months of traffic recorded on a network simulating a U.S. Air Force LAN, and each connection is labeled as one of 24 classes (one class labeled normal, the others marking different kinds of abnormal network connections). Each connection is summarized by 41 features, many of them derived from packet headers (for example, connection duration, protocol type, service, and bytes transferred). There are roughly five million training records and about two million test records. This data set could provide an intuitive feel for the problem at hand and aid speculation about the type of supervised learning algorithm best suited to the task. We suggest examining the data set and training readily available classifier implementations on it.
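A natural first step in examining the data set is simply parsing the connection records. The Python sketch below assumes the comma-separated layout of the distributed KDD Cup 1999 files: 41 feature values followed by a class label that ends with a period, with protocol_type, service, and flag (columns 2 to 4) as the only symbolic features. The function names are hypothetical, and the sample rows mentioned afterwards are invented for illustration, not taken from the data set.

```python
from collections import Counter

NUM_FEATURES = 41
# 0-based positions of the symbolic features protocol_type, service and flag;
# every other feature is parsed as a number.
SYMBOLIC_COLUMNS = {1, 2, 3}


def parse_record(line):
    """Split one comma-separated connection record into (features, label)."""
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    features = [
        value if col in SYMBOLIC_COLUMNS else float(value)
        for col, value in enumerate(fields[:NUM_FEATURES])
    ]
    label = fields[-1].rstrip(".")  # assumed: raw labels end with a period
    return features, label


def binarize(label):
    """Collapse the multi-class label into the binary normal (0) / attack (1) case."""
    return 0 if label == "normal" else 1


def label_counts(lines):
    """Count records per class label to see how the classes are distributed."""
    return Counter(parse_record(line)[1] for line in lines if line.strip())
```

For an invented row such as "0,tcp,http,SF,181,5450," followed by 35 zeros and "normal.", parse_record returns the 41 features and the label "normal", which binarize maps to 0. Running label_counts over a file before training shows how the classes are distributed, which matters when choosing between the multi-class and binary formulations.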
References

Core Readings

de Castro, L. N. & Timmis, J. Artificial Immune Systems: A Novel Paradigm to Pattern Recognition. In Artificial Neural Networks in Pattern Recognition, University of Paisley, 2002. This article introduces Artificial Immune Systems and explores the computational analogies to biological immune networks.

Aickelin, U., Greensmith, J., & Twycross, J. Immune System Approaches to Intrusion Detection: A Review. In G. Nicosia, V. Cutello, P.J.B. (eds.), Lecture Notes in Computer Science 3239, pp. 316-329, 2004. A review paper describing developments in applying the immune system metaphor to intrusion detection.

Kim, D. S. & Park, J. S. Network-Based Intrusion Detection with Support Vector Machines. Lecture Notes in Computer Science 2662, pp. 747-756, 2003. The article details the use of SVMs for this problem.

Rawat, S. & Sastry, J. C. Network Intrusion Detection Using Wavelet Analysis. CIT, LNCS 3356, pp. 224-232, 2004. The article describes the use of wavelets in analyzing network traffic.

Cannady, J. & Garcia, R. C. The Application of Fuzzy ARTMAP in the Detection of Computer Network Attacks. Lecture Notes in Computer Science 2130, pp. 225-230, 2001. The article describes the use of Fuzzy ARTMAP to detect network intrusions.

Supplementary Readings

Kim, J. & Bentley, P. The Human Immune System and Network Intrusion Detection. 7th European Conference on Intelligent Techniques and Soft Computing (EUFIT '99), Aachen, Germany, 1999. This article gives an overview of the desirable properties of an intrusion detection system, and really stretches the analogy to immune systems to its limits.

Kim, J. & Bentley, P. The Artificial Immune Model for Network Intrusion Detection. 7th European Conference on Intelligent Techniques and Soft Computing (EUFIT '99), Aachen, Germany, 1999a. This article focuses on the artificial immune system model. It is light on implementation details and results, but worth a quick read.

Timmis, J., Knight, T., de Castro, L. N., & Hart, E. An Overview of Artificial Immune Systems. Computation in Cells and Tissues: Perspectives and Tools for Thought, Natural Computation Series, pp. 51-86, 2004. A more in-depth look at artificial immune systems.

Heather Ames, Chuan-Heng Chsiao, Chaitanya Sai Gaddam
21 February 2006
CN710: Network Intrusion Detection

Intrusion detection systems (IDS) can be classified into two main categories: misuse-based systems and anomaly-based systems. Misuse-based IDS look for signatures of previously known attacks and consequently cannot detect novel attacks. Anomaly-based systems are trained to learn the normal behavior of a system and signal any deviation from it that exceeds a certain threshold. IDS can also be classified by where they run and what data they examine. Host-based IDS reside on each computer in a network and examine system data. Network-based systems are usually installed on a single computer that gates the network's Internet connections, and mainly examine packet data.

Intrusion detection can be cast as a machine learning problem in which the task is to learn to distinguish harmless behavior (data) from potentially malicious behavior (data). Design issues commonly associated with machine learning tasks (preprocessing of data, choice of initial input feature sets, similarity metrics between data points, and network parameters) need to be addressed.

Preprocessing: wavelets

Network traffic has been observed to be self-similar: its statistical structure looks alike across a wide range of time scales. Because a wavelet transform decomposes a signal scale by scale, such traffic is a natural candidate for wavelet preprocessing. Self-similarity is thought to be attenuated under abnormal conditions, so detecting abnormal behavior reduces to detecting outliers in sequences of wavelet coefficients.

Network Design Choices

Researchers have used Fuzzy ARTMAP on this problem. Assigning pre-defined class labels to the ARTb module makes the network a misuse-based system. An anomaly-based detection system can be created instead by feeding unsupervised feedback from the monitored system into the ARTb module as its input.
The problem then becomes one of predicting this diagnostic feedback, which can lead to the detection of novel anomalies. A dynamic vigilance parameter, tied to the feedback, is used to prevent the proliferation of F2 (category) nodes.

In applying support vector machines to the problem, researchers have used multiple attack categories in addition to the usual binary normal/anomalous classification.

Artificial Immune Systems (AIS)

Artificial immune systems borrow the metaphor of the vertebrate immune system for detecting foreign elements. The computational procedure is divided into three parts: negative selection (the system learns what normal, or "self", behavior looks like, and candidate detectors that match self are discarded), clonal selection (detectors that are good at detecting anomalies are allowed to proliferate and mutate), and immune network formation (detectors form suppressive networks to prevent too many false positives).

Discussion Questions
What kind of data is likely to be most informative?
Is self-similarity a good characterization of the data?
What effect does the c-index (paucity of attack data) have?
Is AIS really a new paradigm? What are its radical departures from conventional computation?
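The outlier-detection step described under "Preprocessing: wavelets" can be sketched with a plain Haar transform applied to a series of traffic counts (for example, packets per time bin). This is an illustration under our own assumptions: the Haar wavelet, the median-absolute-deviation (MAD) rule, the cutoff k = 5, and the function names are choices made here, not the method of the readings, and the sketch does not estimate self-similarity itself.

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail


def haar_details(signal, levels):
    """Detail coefficients at scales 1..levels, finest scale first."""
    details, approx = [], list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return details


def flag_outliers(coeffs, k=5.0):
    """Indices of coefficients lying more than k MADs away from the median."""
    med = statistics.median(coeffs)
    mad = statistics.median(abs(c - med) for c in coeffs)
    mad = mad or 1e-12  # guard against division by zero on perfectly flat input
    return [i for i, c in enumerate(coeffs) if abs(c - med) / mad > k]
```

On an invented 16-bin series such as [10, 12, 11, 14, 13, 12, 15, 11, 200, 13, 10, 14, 11, 11, 13, 10], where bin 8 (counting from zero) jumps to 200, only level-1 coefficient 4, which covers bins 8 and 9, is flagged. Using the median and MAD rather than the mean and standard deviation keeps a single large burst from inflating the very threshold that is supposed to catch it.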
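The Fuzzy ARTMAP design discussed above is built from Fuzzy ART modules, and the vigilance parameter governs how many F2 category nodes get created. The sketch below implements only a single Fuzzy ART module with complement coding, fast learning, and a fixed vigilance. It omits the ARTb module, the map field, and the feedback-driven dynamic vigilance described for the intrusion detection system, so it illustrates the mechanism rather than reproducing that system. Inputs are assumed to be pre-scaled to [0, 1]; the class and parameter names are our own.

```python
def complement_code(x):
    """Complement-code an input scaled to [0, 1]: x -> [x, 1 - x]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(u, v) for u, v in zip(a, b)]


class FuzzyART:
    """Minimal single-module Fuzzy ART (no ARTb module, no map field)."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance  # match criterion: higher -> more, finer F2 categories
        self.alpha = alpha    # choice parameter
        self.beta = beta      # learning rate; 1.0 is "fast learning"
        self.weights = []     # one weight vector per committed F2 node

    def _choice(self, i, j):
        """Choice function T_j = |I ^ w_j| / (alpha + |w_j|)."""
        w = self.weights[j]
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def train(self, x):
        """Present one input; return the index of the F2 node that learns it."""
        i = complement_code(x)
        norm_i = sum(i)  # equals len(x), up to rounding, under complement coding
        # Search committed nodes from highest to lowest choice value.
        ranked = sorted(range(len(self.weights)),
                        key=lambda k: self._choice(i, k), reverse=True)
        for j in ranked:
            w = self.weights[j]
            if sum(fuzzy_and(i, w)) / norm_i >= self.rho:
                # Resonance: move the winner's weights toward I ^ w_j.
                self.weights[j] = [self.beta * m + (1.0 - self.beta) * old
                                   for m, old in zip(fuzzy_and(i, w), w)]
                return j
            # Otherwise this node is reset and the search continues.
        # No committed node passed the vigilance test: recruit a new F2 node.
        self.weights.append(i)
        return len(self.weights) - 1
```

Presenting the points (0.1, 0.1), (0.12, 0.1), and (0.9, 0.9) with vigilance 0.9 puts the two nearby points in category 0 and the distant one in category 1; with vigilance 0.995 each point gets its own F2 node. This is the trade-off a dynamic vigilance has to manage: set it too high and categories proliferate, set it too low and behavior that should be kept apart is merged.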
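The negative selection stage of an artificial immune system can be illustrated with binary strings, using the r-contiguous-bits matching rule common in the AIS literature: two strings match if they agree on at least r consecutive positions. Encoding connections as bit strings, the string length, the value of r, and all function names are assumptions made for this sketch; clonal selection and immune network formation are not shown.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length bit strings a and b agree on at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Negative selection: keep random candidate detectors that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Censoring step: discard any candidate that would react to normal data.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_anomalous(sample, detectors, r):
    """Flag a sample as non-self if at least one detector matches it."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)
```

By construction no detector matches any self string, so normal samples are never flagged, while any string a detector does match is reported as non-self. Larger values of r make each detector more specific, so more detectors are needed to cover the non-self space.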
de Castro and Timmis (2002): Comparative Analysis of AIS and ANN

Component
  AIS: Attribute string s (information storage and processing) represented in an appropriate shape-space; might correspond to an immune cell or molecule.
  ANN: Neuron (processing element) composed of an activation function, a summing junction, connection strengths, and an activation threshold.

Location of the components
  AIS: Located according to the environmental stimuli.
  ANN: Fixed, predetermined locations.

Structure
  AIS: Usually follows the spatial distribution of the antigens represented in shape-space.
  ANN: Pre-defined architectures and weights biased by the environment.

Memory
  AIS: Content-addressable and distributed; carried in the attribute strength as well as in connections.
  ANN: Knowledge stored in connection strengths; self-associative or content-addressable, and distributed.

Adaptation
  AIS: Learning and evolution.
  ANN: Learning.

Plasticity and diversity
  AIS: Continuous insertion and elimination of the basic elements (cells/molecules) of the system.
  ANN: Pruning and/or insertion of new connections, units, and layers in the network.

Interaction with other components
  AIS: Attribute strings are matched by cell receptors; cells have weighted connections.
  ANN: Neurons are interconnected through connection strengths.

Interaction with the environment
  AIS: Attribute strings are compared with the patterns in the environment; some or all of the components might be involved in pattern recognition.
  ANN: Neurons receive input signals from the environment; the whole ANN might be used to recognize the pattern.

Threshold
  AIS: An affinity threshold determines the degree of recognition between immune cells and the presented input pattern.
  ANN: A threshold determines the neuron's activation.

Robustness
  AIS: Highly flexible and noise tolerant; self-tolerant (learns to recognize itself).
  ANN: Highly flexible and noise tolerant.

State
  AIS: Concentration of immune cells and molecules and/or their affinities and connection strengths.
  ANN: Activation level of the output neuron.

Control
  AIS: Any immune principle or theory (e.g., clonal selection).
  ANN: Unsupervised, supervised, and reinforcement learning for training.

Generalization capability
  AIS: Cross-reactivity allows recognition of similar patterns, and components can be multi-specific.
  ANN: Good generalization given training; satisfactory generalization by reducing the dimensions of the parameter space and the size of the dimensions.

Non-linearities
  AIS: Activation functions that define the degree of recognition between two components.
  ANN: Activation functions of individual neurons.

Some common intrusion terminology

Buffer overflow: What happens when a program tries to put more data into a buffer (a temporary holding area) than it can hold, so that the excess overwrites adjacent memory. Crackers commonly exploit this to make a program running with root permissions execute arbitrary commands.

DoS attack: Short for Denial-of-Service attack; the term labels attempts to shut down websites by flooding network links with large amounts of traffic.

SYN attack: When a session is initiated between a Transmission Control Protocol (TCP) client and server in a network, a very small buffer space exists to handle the usually rapid "handshaking" exchange of messages that sets up the session. The session-establishing packets carry a SYN (synchronize) flag, which marks them as the opening messages that synchronize sequence numbers for the exchange. An attacker can send many connection requests very rapidly and then never respond to the server's replies. The resulting half-open connections stay in the buffer, so other, legitimate connection requests cannot be accommodated.

Teardrop attack: This type of denial-of-service attack exploits the way the Internet Protocol (IP) requires that a packet too large for the next router to handle be divided into fragments. Each fragment carries an offset relative to the beginning of the original packet, which enables the receiving system to reassemble the entire packet.
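In the teardrop attack, the attacker puts a misleading offset value in the second or a later fragment, so that it overlaps data already covered by an earlier fragment; systems that do not check for this can crash while trying to reassemble the packet.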
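The SYN attack entry can be made concrete with a toy model of the server's buffer of half-open connections. The class below is hypothetical and deliberately simplified: real TCP stacks also expire half-open entries after a timeout and may apply other defenses, none of which is modeled here.

```python
class ListenQueue:
    """Toy model of a server's fixed-size buffer of half-open TCP connections."""

    def __init__(self, backlog):
        self.backlog = backlog
        self.half_open = set()  # clients that sent SYN but have not yet ACKed

    def on_syn(self, client):
        """A SYN arrives: reserve a slot (and notionally reply with SYN-ACK)."""
        if len(self.half_open) >= self.backlog:
            return False  # buffer full: the connection request is dropped
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """The client's final ACK completes the handshake and frees its slot."""
        self.half_open.discard(client)
```

With a backlog of four, four spoofed SYNs that are never acknowledged fill the buffer, and a legitimate client's SYN is then dropped; clients that do complete the handshake free their slots for others.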
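A reassembling host, or an IDS watching the traffic, can spot teardrop-style fragments by checking whether fragment byte ranges overlap. In the sketch below each fragment is given as its IP fragment-offset field together with its payload length in bytes; the offset field counts 8-byte units. The function name and input format are assumptions for illustration, and real reassembly must also group fragments by IP identification, addresses, and protocol and honor the more-fragments flag, all of which is omitted here.

```python
def overlapping_fragments(fragments):
    """Return indices of fragments that overlap bytes covered by a fragment sorted before them.

    Each fragment is (offset_field, payload_len). The IP fragment-offset field
    counts 8-byte units, so a fragment's first payload byte is offset_field * 8.
    """
    spans = sorted(
        (offset * 8, offset * 8 + length, idx)
        for idx, (offset, length) in enumerate(fragments)
    )
    flagged = []
    furthest_end = 0  # highest byte position covered so far (exclusive)
    for start, end, idx in spans:
        if start < furthest_end:
            flagged.append(idx)
        furthest_end = max(furthest_end, end)
    return flagged
```

A packet split cleanly at offset fields 0, 185, and 370 (bytes 0, 1480, and 2960) with 1480-byte leading fragments produces no overlaps, while a second fragment starting at byte 24, inside a first fragment covering bytes 0 to 35, is flagged; that overlap is the pattern the teardrop attack relies on.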