February 21-23, 2007
Data Preprocessing for Anomaly Based Network Intrusion Detection: A Review
by Jonathan J. Davis and Andrew J. Clark
Presented by Sait Murat GIRAY (METU), April 19, 2013

Overview
• Outline of the presentation
  – Introduction
  – Packet Header Anomaly Detection
  – Protocol Anomaly Detection
  – KDD Cup 1999 Detection
  – Content Anomaly Detection
  – Alert Anomaly Detection
  – Discussion and Conclusions

Key Points and Focus
• The focus is on the data preprocessing methods used by anomaly-based network intrusion detection systems (NIDS), with regard to:
  – Aspects of network traffic
  – Feature construction methods
  – Selection techniques
• The motivation behind the review is the impact of data preprocessing on a NIDS's:
  – Precision (fidelity)
  – Aptitude (capability)

What Is an Intrusion and an Intrusion Detection System?
• Intrusion: an attempt to break into or misuse a system.
• Intruders may come from outside the network or be legitimate users of the network.
• An intrusion can be physical, system or remote.
• Intrusion Detection Systems (IDS) monitor systems for security compromises in real time (or as close to it as possible), and log and report malicious activities.
• IDSs are passive, while Intrusion Prevention Systems (IPSs) actively react against threats.
• Anomaly-based approaches do not fit IPSs well, so IPSs are out of the scope of this paper.

Intrusion Detection Systems (IDS)
• IDSs are mainly classified by:
  – How they detect malicious activity
      Signature based (misuse)
      Anomaly based
  – Where they are deployed
      Network based (monitors network activity)
      Host based (monitors client activity)

Signature Based IDS
• Possesses an attack description that can be matched to sensed attack manifestations.
• Relies upon rules written by domain experts.
• Most signature analysis systems are based on simple pattern matching algorithms.
• The question of what information is relevant to an IDS depends upon what it is trying to detect.
  – E.g. DNS, FTP, etc.
• The primary strength is a very low false positive rate.

Drawbacks of Signature Based IDS
• They are unable to detect novel attacks.
• They require continuous signature updates and have to be reprogrammed for every new attack pattern.

Anomaly Based IDS
• Models all types of normal usage or valid behavior of the network as a noise characterization.
• Anything that diverges from the noise (the normal baseline) is assumed to be intrusive activity.
  – E.g. flooding a host with lots of packets (DoS).
• The primary strength is the ability to recognize new attack types.

Drawbacks of Anomaly Detection IDS
• Assumes that intrusions will be accompanied by manifestations that are sufficiently unusual to permit detection.
• These systems generate many false alarms, which compromises the effectiveness of the IDS.
• New (unfamiliar) but legitimate network traffic triggers false positives, and it is not easy to model all types of normal traffic at once.

Why Data Preprocessing?
• Data in the real world is dirty
  – incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
      • e.g., occupation=“ ”
  – noisy: containing errors or outliers
      • e.g., Salary=“-10”
  – inconsistent: containing discrepancies in codes or names
      • e.g., Age=“42” Birthday=“03/07/1997”
      • e.g., Was rating “1,2,3”, now rating “A, B, C”
      • e.g., discrepancy between duplicate records

Major Tasks in Data Preprocessing
• Data cleaning
  – Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
• Data integration
  – Integration of multiple databases, data cubes or files
• Data transformation
  – Normalization and aggregation
• Data reduction
  – Obtains a representation that is reduced in volume but produces the same or similar analytical results
• Data discretization
  – Part of data reduction but with particular importance, especially for numerical data

Data Preprocessing & Anomaly Detection
• NIDS is a knowledge discovery task: it tries to classify network traffic to detect malicious activity.
• Preprocessing is required to
  – Convert traffic into observations
  – Represent each observation as a feature vector
  – Label each vector as “normal” or “anomalous”
• These feature vectors are then fed into data mining and machine learning algorithms.

Preprocessing Steps for NIDS
• Dataset creation
  – Identify traffic for training and testing
  – Label connections as normal or anomalous (time consuming)
• Feature construction
  – Create additional, more discriminative features
  – Sequence analysis, association mining and frequent-episode mining can be used.
• Feature selection
  – Eliminate redundant or irrelevant features to optimize the dataset
  – Principal component analysis (PCA) is a common method.

Research Papers by Anomaly vs.
Features

Next topic: Packet Header Anomaly Detection

7 Layers of OSI Model
• Mnemonic: All People Seem To Need Data Preprocessing (Application, Presentation, Session, Transport, Network, Data link, Physical)

A Network Packet Header

Packet Header Anomaly Detection
• Packet headers are only a small portion of network data.
  – Preprocessing is straightforward and requirements are minimal
  – Requires fewer resources (CPU, RAM, HDD) than payload analysis
  – Remains useful even when payloads are encrypted (SSL)
  – Good for real-time operation without deep packet inspection on high-bandwidth networks (transcontinental fiber links or stock market systems)
• Feature construction is
  – Required to produce more discriminative features
  – But hard to do with only the basic traffic information found in headers.

Packet Header Anomaly Detection Feature Types
• Three types of feature sets are used in detection:
  – Basic features
      Individual packet data without further feature construction
  – Single connection derived (SCD)
      A complete network flow is used as the data instance.
      A flow is a sequence of packets with a common value (same source or destination IP/port).
      Provides more context to analyze.
      The most important features are time-based statistical measures produced by monitoring basic features in a flow over time.
  – Multiple connection derived (MCD)
      Features are constructed from multiple flows

Packet Header Anomaly Detector (PHAD): Basic Feature
• This model detects TCP/IP and DoS attacks
  – Learns normal ranges for packet header fields from the data link (Ethernet), network (IP) and transport/control (TCP) layers.
  – This is a very large numeric range, so clustering is employed as preprocessing to reduce the space.
  – Each attribute of a packet header is assigned an anomaly score based on the time since an anomaly was last detected for it.
  – The anomaly score of each packet is the sum of its attributes' anomaly scores.
• Since the analysis is performed per attribute of each header
  – The distribution depends on one random variable
  – The main algorithm is univariate anomaly detection

Statistical Packet Anomaly Detection Engine (SPADE): Basic Feature
• Developed to detect network and port scans:
  – Uses basic features such as source and destination IP addresses
  – Builds a model of the normal traffic distribution rather than simply counting the number of attempts within a certain time window.
  – Maintains distributions in real time using joint probability measures or a Bayes network
• Detection phase:
  – The anomaly score is calculated by comparing packets to the probability distribution.
  – Highly anomalous packets are retained

Wireless Network Anomaly Detection: Basic Feature
• This approach detects attacks on wireless networks
  – Preprocess all frame headers
  – Categorize features and derive new ones.
  – Finally, apply feature selection to find the most relevant set for detecting malicious traffic.
• The detection mechanism is as follows:
  – Features are filtered and sorted according to their relevance.
  – A combination of a forward search algorithm and a K-Means classifier is used to find the best set of features.

Anomalous Network Traffic Detection with Self-Organizing Maps (ANDSOM): Single Connection
• Self-organizing maps are artificial neural networks (ANN) trained with unsupervised learning; they produce a low-dimensional, discretized representation of the input space of the training samples, called a map.
• In ANDSOM, preprocessing categorizes the dataset by service type (TCP, UDP) and application protocol (HTTP or SMTP), and a different model is created for each data segment.
• SCD features provide a session fingerprint.
  Data instances are compared with the SOM model to detect anomalies.

Other SCD Based Anomaly Detectors: Single Connection
• HTTP request and response features
  – Target attacks against web servers when traffic is encrypted with SSL
  – Since the only features used are clear-text HTTP request and response sizes, frequency analysis is used to reduce false positives
• Round Trip Time (RTT) features
  – Multiple-step connections are used by attackers to avoid detection
  – Clustering and partitioning are used to calculate RTTs and estimate the number of steps

Other SCD Based Anomaly Detectors: Single Connection (continued)
• Application protocol features
  – Features are extracted from the TCP/IP headers of a flow
  – C5 decision trees are used to classify these features
  – This mechanism discovers services and applications running on non-standard (anomalous) ports
• SCD features are good for detecting unusual behavior in a single session, such as an unexpected protocol, data size or packet timing, but they are not sufficient for finding activity that spans multiple flows.

Minnesota Network Intrusion Detection System (MINDS): Multiple Connection
• Uses a volume-based approach to count flow features
• Two sets of multiple connection features are calculated:
  – Over the last N connections
  – Over each time window (10 minutes)
• The main algorithms for anomaly detection are a density-based outlier detection method called Local Outlier Factor (LOF) and association mining.
• It is capable of exposing network scans, DoS attacks and worms.
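The two kinds of multiple connection features listed for MINDS (counts over the last N connections and over a time window) can be sketched as follows. This is a minimal illustration, not the MINDS implementation: the `Conn` record, the function name `mcd_features`, and the choice of counting connections from the same source are assumptions made for the example. The LOF step is not shown; the resulting feature vectors would be passed to an outlier detector.

```python
from collections import deque, namedtuple

# Hypothetical connection record; field names are illustrative only.
Conn = namedtuple("Conn", "ts src dst dport")

def mcd_features(conns, window_s=600.0, last_n=3):
    """For each connection (sorted by timestamp), count earlier connections
    from the same source (a) inside the preceding time window and
    (b) among the last N connections seen."""
    recent = deque()                 # connections still inside the time window
    history = deque(maxlen=last_n)   # the last N connections, regardless of age
    features = []
    for c in conns:
        # Drop connections that have fallen out of the time window.
        while recent and c.ts - recent[0].ts > window_s:
            recent.popleft()
        same_src_window = sum(1 for r in recent if r.src == c.src)
        same_src_last_n = sum(1 for r in history if r.src == c.src)
        features.append((same_src_window, same_src_last_n))
        recent.append(c)
        history.append(c)
    return features

example = [
    Conn(0, "a", "x", 80),
    Conn(10, "a", "y", 80),
    Conn(20, "b", "x", 22),
    Conn(700, "a", "z", 80),
]
print(mcd_features(example))
```

The last connection shows why both views are kept: the time window has expired, but the last-N view still remembers two earlier connections from the same source, which is how slow scans can remain visible.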

Audit Data Analysis and Mining (ADAM): Multiple Connection
• Preprocessing is performed to create connection records from network flows
• This method uses association mining to:
  – Derive multiple connection features
  – Apply them to connection records over sliding time windows of 3 s and 24 h
• A model is created from association rules in the training phase
• In the detection phase, data mining is used to dynamically find association rules, which are then compared with the model.

Stochastic Clustering Algorithm for Anomaly Detection (SCAN): Multiple Connection
• It samples incoming data and produces summaries to reduce workload.
• These summaries include multiple connection features over a 60 s time window.
• Time-based features are fed into a clustering algorithm to find outliers. Discovered outliers are treated as anomalies in the network.
• The SCAN technique is quite successful at detecting DoS attacks in high-speed networks.

Fuzzy Intrusion Recognition Engine (FIRE): Multiple Connection
• FIRE combines fuzzy logic and MCD statistical features to identify abnormal behavior
• Header attributes are extracted from flows:
  – TCP sessions are reassembled
  – A unique key is created for each session
• Statistical measures are calculated from the MCD features at 15-minute intervals.
• Each feature is an input to a Fuzzy Threat Analyzer, for which security administrators write fuzzy rules to detect anomalies.

Next topic: Protocol Anomaly Detection

Protocol Anomaly Detection
• Various protocols (TCP, SMTP, HTTP, FTP) are analyzed within network traffic to find any divergent behavior.
• Approaches adopted:
  – Specification based
  – Parser based
  – Application protocol keyword based

Specification Based Anomaly Detection
• When an expert “specifies” the model manually for an anomaly-based NIDS, it is called specification based.
• Network protocols change much more slowly than attacks do:
  – X.200 (OSI basic model) was approved in 1994.
  – New cyber attacks emerge every day
• The idea behind this technique:
  – Protocols have solid standards and definitions
  – Models built from them are considered superior to trained models for anomaly detection.

Specification Based Anomaly Detection (continued)
• Network traffic used to train models is generally both tainted with malicious activity and unrepresentative of real normal behavior.
• Specification approaches:
  – Segment data by IP address and port combinations as preprocessing
  – Calculate frequency distributions associated with state machines
  – Mark unusual frequency distributions as anomalous
• A limitation of the model is that it has state machines only for TCP/IP traffic and misses single-packet attacks.

Parser Based Anomaly Detection
• Protocol parsers or decoders are created with protocol specifications built into them
• Protocol-specific preprocessors parse and normalize header fields
• When a decoder detects invalid protocol usage, the anomaly is flagged
  – E.g. an invalid total length
• The advantage is the ability to deliver detailed information about the location and cause of the anomaly

Parser Based Anomaly Detection (continued)
• Snort is a very common NIDS, mainly based on pattern-matching signatures, that also offers parser-based anomaly detection.
• Bro is another NIDS, which uses policy scripts to detect anomalies.
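The "invalid total length" example from the parser based slides can be illustrated with a small decoder for the fixed part of an IPv4 header. This is only a sketch of the idea and is not how Snort or Bro implement their decoders; the function name `check_ipv4_header` and the wording of the anomaly messages are hypothetical. The field layout (version and header length in the first byte, total length in bytes 2-3) follows the IPv4 specification.

```python
import struct

def check_ipv4_header(packet: bytes):
    """Return a list of anomaly descriptions for one raw IPv4 packet.
    An empty list means no protocol violation was found by these checks."""
    if len(packet) < 20:
        return ["truncated header: fewer than 20 bytes"]
    anomalies = []
    # First 4 bytes: version/IHL, type of service, total length (network order).
    ver_ihl, _tos, total_len = struct.unpack("!BBH", packet[:4])
    version = ver_ihl >> 4
    header_len = (ver_ihl & 0x0F) * 4   # IHL is given in 32-bit words
    if version != 4:
        anomalies.append(f"unexpected version {version}")
    if header_len < 20:
        anomalies.append(f"header length {header_len} below minimum of 20")
    if total_len < header_len:
        anomalies.append(
            f"total length {total_len} smaller than header length {header_len}")
    # Captured frames may carry link-layer padding, so only flag the case
    # where the header claims more bytes than were actually captured.
    if total_len > len(packet):
        anomalies.append(
            f"total length {total_len} exceeds captured size {len(packet)}")
    return anomalies

def make_header(total_len, ver_ihl=0x45):
    """Build a 20-byte IPv4 header with zeroed addresses (test helper)."""
    return struct.pack("!BBHHHBBH4s4s", ver_ihl, 0, total_len,
                       0, 0, 64, 6, 0, bytes(4), bytes(4))

print(check_ipv4_header(make_header(20)))
print(check_ipv4_header(make_header(4000)))
```

Because each check names the field and the violated constraint, the output shows the advantage noted above: the decoder reports where the anomaly is and why it was raised.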

Application Protocol Keyword Based Anomaly Detection
• Some SCD features from packet headers are used along with protocol-specific keywords
• The model is built from, and trained on, the allowed keywords of different application protocols
• If an unknown or rare keyword is used by a particular service, its anomaly score is increased
• The total anomaly score of a connection depends on the probability of each feature and its related keyword.

Next topic: KDD Cup 1999 Detection

KDD Cup 1999
• This is the “Computer Network Intrusion” contest
• Its dataset is publicly available and already preprocessed
  – That is why it is used in much NIDS research
• Creating labels for a custom dataset is very labor intensive.
  – The KDD Cup 99 dataset has a labeled vector of 41 features for each network connection.

Data Transformation
• Categorical features are normalized and transformed using their mean and standard deviation.
• The processed dataset is used to compare unsupervised and supervised learning
• Although supervised learning is better than unsupervised learning for known attacks, the latter beats the former on new attacks.
  – It makes sense to choose unsupervised learning, as it does not need a labeled dataset (which takes a lot of work) to train on.

Data Cleaning and Reduction
• The dataset underwent further preprocessing and was sampled for cleaning
• Principal Component Analysis (PCA) is used to reduce the dimensionality of the dataset
  – Lowers the classifier's computational needs
  – Less time for classifier building and testing
  – Improves detection rates.
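The two transformations named in the KDD Cup slides, normalizing by mean and standard deviation and reducing dimensionality with PCA, can be sketched in plain Python. This is an illustration under stated assumptions rather than the procedure of any reviewed paper: the function names are hypothetical, only the first principal component is computed, and power iteration is used here as a simple stand-in for a full eigendecomposition.

```python
import math

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard
    deviation. Constant columns (std of 0) are left at 0."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def first_component(z, iters=200):
    """Top principal direction of standardized data via power iteration
    on the covariance matrix (an assumption made for brevity)."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)]
           for i in range(d)]
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:      # degenerate data: keep the current direction
            break
        v = [x / norm for x in w]
    return v

def project(z, v):
    """Reduce each standardized row to a single score along direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in z]

# Three features: the second duplicates the first, the third is constant.
rows = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 5]]
z = standardize(rows)
v = first_component(z)
scores = project(z, v)
print(v)
print(scores)
```

In this toy dataset the redundant and constant features collapse into one component, which is the effect the slide describes: fewer dimensions for the classifier to build and test on.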

Data Discretization and Relabeling
• Symbolic features of the KDD dataset are coded into numeric data
• The dimensionality of the dataset is increased to 119 (from 41)
• An unsupervised approach is followed to identify anomalies within the set.
• The conversion process yielded good results, since better overall classification accuracy was achieved.
• In relabeling, the standard labels are replaced with a “service type” label.
  – A proximity measure detects anomalies as outliers

Next topic: Content Anomaly Detection

Content Anomaly Detection
• Content (payload) is the data transported in the packet, following the header.
• Essentially, exploit code is embedded into this “free wagon” and sent to clients
• Detection requires much deeper inspection and is expensive
• Common vectors for payload attacks are
  – SQL injection and cross-site scripting on servers
  – Web browser exploits on clients

N-gram Analysis of Server Requests
• An N-gram is a contiguous sequence of N items from a given sequence of text or speech
• Using N-grams for data preprocessing
  – Does not require expert domain knowledge to construct relevant features
  – Traffic payload models are created automatically

Payload Based Anomaly Detector (PAYL)
• Uses 1-grams (single bytes in the range 0-255) and unsupervised learning to build a byte-frequency distribution model of network payloads.
• Measures the distance (similarity) between an unknown sample and the known model, comparing each current traffic packet with the model.
• If distance > threshold, there is an anomaly.

ANAGRAM
• Uses a mixture of N-grams where N>1.
• Reduces the risk of mimicry attacks, since padding an attack so that it emulates normal traffic becomes harder
• ANAGRAM:
  – Uses supervised learning to model normal traffic
  – Stores N-grams in Bloom filters, which are probabilistic data structures used to test whether an element is a member of a set.
  – Compares N-grams from traffic to the Bloom filters for attack classification.

Analysis of Web Content to Client
• Client computers are protected from external threats by boundary protection such as NIDSs.
• Therefore attackers target clients through web browsers, email clients, instant messaging, etc.
  – These are how clients view external content
  – They embed a lot of functionality, along with vulnerabilities.

Analysis of Web Content to Client (continued)
• Techniques for detecting attacks on network clients are mainly categorized as:
  – Constructing features from web pages and embedded scripts (JavaScript) and weighting each feature to produce an anomaly score
  – Performing statistical analysis of script functions to distinguish normal activity from malicious behavior
  – Analyzing web pages with HTTP links and whitelisting allowed sites

Next topic: Alert Anomaly Detection

Alert Anomaly Detection
• This is a hierarchy of NIDSs in which the upper one processes alerts produced by the lower one.
• Upper NIDSs correlate and classify alerts, generate statistics and form clusters to detect outliers, in order to give refined alert output.
• This technique is particularly useful when there is a large number of alerts:
  – Either your system is under heavy attack
  – Or your NIDS is giving many false positives and it is time to check it.
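The alert-level processing described above (generate statistics over lower-level alerts, then look for outliers) might be sketched as follows. Everything here is an assumption made for illustration: the alert dictionaries with `src` and `sig` keys, the function names, and the use of a median-based threshold instead of the clustering used by the reviewed systems.

```python
from collections import Counter
from statistics import median

def summarize_alerts(alerts):
    """Collapse raw alerts into counts per (source, signature) pair."""
    return Counter((a["src"], a["sig"]) for a in alerts)

def outlier_groups(counts, k=3.0):
    """Return the groups whose alert count lies more than k median absolute
    deviations (MAD) above the median count."""
    values = list(counts.values())
    if not values:
        return []
    med = median(values)
    # Fall back to 1.0 when most groups share the same count (MAD of 0).
    mad = median([abs(v - med) for v in values]) or 1.0
    return [group for group, c in counts.items() if (c - med) / mad > k]

# 50 alerts from one noisy source, one alert each from five other sources.
alerts = [{"src": "10.0.0.9", "sig": "scan"} for _ in range(50)]
alerts += [{"src": f"10.0.0.{i}", "sig": "scan"} for i in range(1, 6)]

counts = summarize_alerts(alerts)
print(outlier_groups(counts))
```

The 55 raw alerts reduce to six summary rows and one flagged group, which is the kind of refined output an upper-level NIDS is meant to hand to an analyst.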

Next topic: Discussion and Conclusions

Feature Set Comparison and Recommendations
• The most popular packet header approach is to use MCD features and statistical measures to evaluate network flows
• Approaches for analyzing packet payloads and client requests are gaining popularity, since attacks are evolving in this direction.
• Recommendations:
  – Broad anomaly detection requires separate detectors per feature set, while targeted detection calls for a detector on a single feature set.
  – Packet header features run fast with low overhead
  – Payload analysis can detect application-specific attacks missed by packet header approaches, but at the cost of expensive computation
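As an illustration of the payload analysis mentioned in the recommendations, the 1-gram model described for PAYL can be sketched as below. The slides only state that a byte-frequency model is built and that a distance is compared with a threshold; the specific distance used here (per-byte absolute difference divided by the standard deviation plus a smoothing term `alpha`), the function names, and the sample payloads are assumptions for the example.

```python
import math

def byte_freq(payload: bytes):
    """Relative frequency of each byte value (0-255) in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1
    return [c / n for c in counts]

def train(payloads):
    """Learn the mean and standard deviation of each byte's frequency
    from payloads assumed to be normal."""
    freqs = [byte_freq(p) for p in payloads]
    n = len(freqs)
    mean = [sum(f[i] for f in freqs) / n for i in range(256)]
    std = [math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / n)
           for i in range(256)]
    return mean, std

def distance(payload, model, alpha=0.001):
    """Distance of one payload from the model; alpha avoids division by
    zero for bytes whose frequency never varied in training."""
    mean, std = model
    x = byte_freq(payload)
    return sum(abs(x[i] - mean[i]) / (std[i] + alpha) for i in range(256))

normal = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"GET /img/logo.png HTTP/1.1",
]
model = train(normal)
d_normal = distance(b"GET /index.html HTTP/1.1", model)
d_suspect = distance(b"\x90" * 24 + b"\xcc\xcc", model)
print(d_normal < d_suspect)
```

A payload is reported as anomalous when its distance exceeds a chosen threshold; in practice the threshold would be tuned on held-out normal traffic, which this sketch does not attempt.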