Applying Data Mining of Fuzzy Association Rules to Network Intrusion Detection

Authors: Aly El-Semary, Janica Edmonds, Jesús González-Pino, and Mauricio Papa
Center for Information Security
Department of Computer Science
University of Tulsa, Tulsa, OK 74104

Overview
• Introduction
• Background
• Architecture
• Data-mining Algorithm
• Experimental Results
• Conclusions

Introduction
• Intrusion Detection System (IDS)
  – Capable of identifying security breaches
• Classification of IDS
  – Host-based or Network-based
  – Signature-based or anomaly-based
• Boolean logic has been used in decision-making
• Fuzzy logic as an alternative
  – Sound foundation to handle imprecision and vagueness
  – Mature inference mechanisms using varying degrees of truth
• Framework for hybrid fuzzy logic IDS
  – Detection profiles represented by fuzzy rulesets
  – Expert system capable of evaluating rule truthfulness

Background
• Data mining
  – Association rule algorithms
    • Apriori algorithm for two-valued attributes
    • Algorithm for quantitative valued attributes (Kuok et al.)
  – Mined association fuzzy rules are the basis for the detection profile
• Fuzzy logic
  – Xprove: remote operating system fingerprinting tool
    • Multi-valued logic for pattern matching
    • Better and more accurate results
  – FIRE: Fuzzy Intrusion Recognition Engine
    • Data mining techniques used for identifying adequate fuzzy sets
    • Human interaction is needed to build fuzzy rules

Background (cont’d.)
• Our framework extends previous efforts in fuzzy data mining
  – Preprocessing facility for raw network traffic data
  – An optimized association rule algorithm for producing detection models
  – Expert system capable of evaluating such detection models
  – Prototype implementation

Architecture
• Two modes of operation:
  – Rule-generation mode
  – Detection mode

Packets
• Initial input for the IDS
• Can be obtained from
  – Data repository (off-line)
  – Network packet sniffer (on-line)

Preprocessor
• Accepts raw network packets as input data
  – Used in both modes (rule-generation and detection)
• Produces records for each group
• A record contains aggregate information for a group
  – Records are used to generate and evaluate fuzzy rules

Data Miner
• Implements optimized association rule algorithm
  – Integrates Apriori and Kuok’s algorithms
  – Allows for efficient, single-pass, record processing
• Resulting ruleset satisfies specific requirements
  – Support
    • Fraction of the data set for which all predicate terms hold true
  – Confidence
    • Fraction of the data set for which, if the antecedent holds true, then the consequent holds true

Fuzzy Logic Rules
• Logical implications of the form p → q
  – where
    • aa_i is an antecedent attribute
    • ca_j is a consequent attribute
    • cat_attr is an attribute category

Fuzzy Inference Engine
• Makes use of FuzzyJess
  – Integrates FuzzyJ with the Java Expert System Shell (Jess)
  – Can be configured to use the Mamdani or Larsen inference mechanisms
• Rule evaluation
  – The firing strengths of the rules are the outputs
    • Approaches 1: Observed behavior closely follows the profile
    • Approaches 0: Observed behavior deviates from the profile

Data Mining Algorithm
• Definitions
  – An attribute is a relevant feature of the input data
  – A termset {l1, l2, …, ln} of an attribute a defines the set of labels describing a
  – A term t is a tuple 〈a:l〉
  – An itemset is an ordered set of terms {t1, t2, …, tn}
  – An i-itemset is an itemset where n = i
  – An itemset is called a large itemset (L-itemset) if its support is equal to or greater than a threshold minimum support
  – An Li-itemset is an L-itemset with i terms

Data Mining Algorithm (cont’d.)
[Slide figure not recovered]

Data Mining Example
[Five slides of worked-example figures not recovered]

Experimental Results
• Dataset source
  – Training data contained in three different data files
    • File1: attack and background traffic (1 hour)
      – The attack is an ipsweep that lasts approximately 5 minutes
    • File2: background or normal traffic (1 hour)
    • File3: attack and background traffic (5 minutes – attack period)
• Mined ruleset evaluation
  – Anomaly-based (profile models normal traffic: File2)
  – Signature-based (profile models attack traffic: File3)

Experimental Results (cont’d.)
• Output analysis and metrics
  – Rule firing strengths per record
  – Single rule firing strength over the entire data set
• Visualization
  – Applet windows

Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Normal traffic flow

Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Attack and normal traffic flow

Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal traffic flow

Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal and attack traffic flow

Conclusions
• Proven ability to operate as hybrid system
  – Prototype shows promise in identifying deviations from anomaly and signature-based detection profiles
• Multi-platform
  – Modular design and implementation in Java
• Risk management
  – Fuzzy logic provides continuous rather than binary evaluations of system behavior
• Robustness
  – Better classification than traditional IDS in the presence of slight pattern changes
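
Appendix sketch: support, confidence, and large itemsets

The support and confidence requirements listed under Data Miner, and the large-itemset definition under Data Mining Algorithm, can be illustrated with a small Python sketch. It is not the authors' implementation (the prototype is in Java, and the algorithm slides were not recovered). The triangular membership functions, the fuzzy-set boundaries in FUZZY_SETS, the sample RECORDS, and the use of min to combine term memberships are all assumptions made for illustration.

```python
from itertools import combinations

def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets: a term is an (attribute, label) pair, and each
# term maps to (left, peak, right). The boundaries are invented for the example.
FUZZY_SETS = {
    ("SYN", "AVERAGE"): (0, 50, 100),
    ("FIN", "AVERAGE"): (0, 50, 100),
    ("ICMP", "AVERAGE"): (0, 10, 20),
    ("ICMP", "ABOVE"): (10, 30, 50),
}

# Hypothetical preprocessor records (aggregate packet counts per group).
RECORDS = [
    {"SYN": 50, "FIN": 50, "ICMP": 10},
    {"SYN": 50, "FIN": 50, "ICMP": 10},
    {"SYN": 50, "FIN": 50, "ICMP": 30},
    {"SYN": 100, "FIN": 50, "ICMP": 10},
]

def membership(record, term):
    """Degree to which a record satisfies the term <attribute:label>."""
    attribute, _label = term
    return triangular(record[attribute], *FUZZY_SETS[term])

def support(records, itemset):
    """Fuzzy support: average degree to which all terms hold in a record.

    Assumption: term memberships are combined with min; other
    combinations (for example, the product) are also possible.
    """
    total = sum(min(membership(r, t) for t in itemset) for r in records)
    return total / len(records)

def confidence(records, antecedent, consequent):
    """Support of the whole rule relative to the support of its antecedent."""
    s_antecedent = support(records, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(records, tuple(antecedent) + tuple(consequent)) / s_antecedent

def large_itemsets(records, terms, size, min_support):
    """All itemsets with `size` terms whose support meets the threshold.

    Itemsets with two terms on the same attribute are skipped.
    """
    result = []
    for candidate in combinations(terms, size):
        if len({attribute for attribute, _ in candidate}) < size:
            continue
        if support(records, candidate) >= min_support:
            result.append(candidate)
    return result

antecedent = (("SYN", "AVERAGE"), ("FIN", "AVERAGE"))
consequent = (("ICMP", "AVERAGE"),)
print(support(RECORDS, antecedent))                 # 0.75
print(confidence(RECORDS, antecedent, consequent))  # about 0.67
print(large_itemsets(RECORDS, list(FUZZY_SETS), 2, 0.5))
```

With the sample records, the rule "if SYN is AVERAGE and FIN is AVERAGE then ICMP is AVERAGE" has its antecedent fully satisfied in three of four records, and the consequent also holds in two of those. This brute-force enumeration only illustrates the definitions. It does not reproduce the single-pass optimization mentioned in the slides.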