Applying Data Mining of Fuzzy
Association Rules to Network Intrusion
Detection
Authors:
Aly El-Semary, Janica Edmonds, Jesús González-Pino,
and Mauricio Papa
Center for Information Security
Department of Computer Science
University of Tulsa, Tulsa, OK 74104
Overview
• Introduction
• Background
• Architecture
• Data-mining Algorithm
• Experimental Results
• Conclusions
Introduction
• Intrusion Detection System (IDS)
– Capable of identifying security breaches
• Classification of IDS
– Host-based or Network-based
– Signature-based or anomaly-based
• Boolean logic has been used in decision-making
• Fuzzy logic as an alternative
– Sound foundation to handle imprecision and vagueness
– Mature inference mechanisms using varying degrees of truth
• Framework for hybrid fuzzy logic IDS
– Detection profiles represented by fuzzy rulesets
– Expert system capable of evaluating rule truthfulness
Background
• Data mining
– Association rule algorithms
• Apriori algorithm for two-valued attributes
• Algorithm for quantitative valued attributes (Kuok et al.)
– Mined fuzzy association rules are the basis for the detection
profile
• Fuzzy logic
– Xprove: remote operating system fingerprinting tool
• Multi-valued logic for pattern matching
• Better and more accurate results
– FIRE: Fuzzy Intrusion Recognition Engine
• Data mining techniques used for identifying adequate fuzzy sets
• Human interaction is needed to build fuzzy rules
Background (cont’d.)
• Our framework extends previous efforts in
fuzzy data mining
– Preprocessing facility for raw network traffic data
– An optimized association rule algorithm for producing
detection models
– Expert system capable of evaluating such detection models
– Prototype implementation
Architecture
• Two modes of operation:
– Rule-generation mode
– Detection mode
Packets
• Initial input for the IDS
• Can be obtained from
– Data repository (off-line)
– Network packet sniffer (on-line)
Preprocessor
• Accepts raw network packets as input data
– Used in both modes (rule-generation and detection)
• Produces records for each group
• A record contains aggregate information for a group
– Records are used to generate and evaluate fuzzy rules
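The aggregation step can be illustrated with a minimal sketch. It assumes that a group is a fixed-length time window and that a record holds per-window counts of SYN, FIN, UDP and ICMP packets (the attributes that appear in the example rules later in the deck). The `Packet` type, its fields, and the window length are hypothetical and are not taken from the prototype.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Packet:
    """Hypothetical, already-parsed packet summary."""
    timestamp: float  # seconds since the start of the capture
    kind: str         # "SYN", "FIN", "UDP", "ICMP", or anything else

ATTRIBUTES = ("SYN", "FIN", "UDP", "ICMP")

def aggregate(packets, window_seconds=2.0):
    """Group packets into fixed time windows and count each packet kind.

    Returns one record (dict of attribute -> count) per window that
    contains at least one packet, ordered by window index.
    """
    counts = {}  # window index -> Counter of packet kinds
    for p in packets:
        window = int(p.timestamp // window_seconds)
        counts.setdefault(window, Counter())[p.kind] += 1
    return [
        {attr: counts[w][attr] for attr in ATTRIBUTES}
        for w in sorted(counts)
    ]

packets = [
    Packet(0.1, "SYN"), Packet(0.5, "ICMP"), Packet(1.9, "SYN"),
    Packet(2.2, "UDP"), Packet(3.0, "FIN"),
]
records = aggregate(packets)
```

The same function would serve both modes: records feed the data miner in rule-generation mode and the inference engine in detection mode.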
Data Miner
• Implements optimized association rule algorithm
– Integrates Apriori and Kuok’s algorithms
– Allows for efficient, single-pass, record processing
• Resulting ruleset satisfies specific requirements
– Support
• Fraction of the data set for which all predicate terms hold true
– Confidence
• Fraction of the records satisfying the antecedent for which the
consequent also holds true
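The two measures can be made concrete with a small crisp (non-fuzzy) sketch. It uses the usual reading of confidence as support(antecedent and consequent) divided by support(antecedent). The records and the `ATTRIBUTE:LABEL` term strings are made up for illustration; in the fuzzy setting, the 0/1 counts would be replaced by membership degrees.

```python
def support(records, terms):
    """Fraction of records in which every term in `terms` holds."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if terms <= r)
    return hits / len(records)

def confidence(records, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, antecedent | consequent) / base

# Illustrative records: each is the set of terms that hold for it.
records = [
    {"SYN:AVERAGE", "FIN:AVERAGE", "ICMP:AVERAGE"},
    {"SYN:AVERAGE", "FIN:AVERAGE", "ICMP:AVERAGE"},
    {"SYN:AVERAGE", "FIN:AVERAGE", "ICMP:ABOVE"},
    {"SYN:ABOVE", "FIN:AVERAGE", "ICMP:ABOVE"},
]
antecedent = {"SYN:AVERAGE", "FIN:AVERAGE"}
consequent = {"ICMP:AVERAGE"}

rule_support = support(records, antecedent | consequent)       # 2 of 4 records
rule_confidence = confidence(records, antecedent, consequent)  # 2 of the 3 antecedent matches
```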
Fuzzy Logic Rules
• Logical implications of the form p → q
– p (antecedent) and q (consequent) are built from clauses of the form
"attr is cat_attr", where
• aa_i is an antecedent attribute
• ca_j is a consequent attribute
• cat_attr is a category of attribute attr
Fuzzy Inference Engine
• Makes use of FuzzyJess
– Integrates FuzzyJ with the Java Expert System Shell (Jess)
– Can be configured to use the Mamdani or Larsen inference
mechanisms
• Rule evaluation
– The firing strengths of the rules are the outputs
• Approaches 1: Observed behavior closely follows the profile
• Approaches 0: Observed behavior deviates from the profile
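A rough illustration of rule evaluation follows. It does not use the FuzzyJess API and does not reproduce Mamdani or Larsen inference. As a simplifying assumption, it scores a record by the minimum membership degree over all antecedent and consequent clauses, so a value near 1 means the record matches the rule and a value near 0 means it deviates. The triangular fuzzy sets and their breakpoints are hypothetical.

```python
def triangular(a, b, c):
    """Return a triangular membership function peaking at b, zero outside (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical fuzzy sets; real ones would come from the mined profile.
FUZZY_SETS = {
    ("SYN", "AVERAGESYN"): triangular(0, 10, 20),
    ("FIN", "AVERAGEFIN"): triangular(0, 10, 20),
    ("ICMP", "AVERAGEICMP"): triangular(0, 4, 8),
}

def firing_strength(rule, record):
    """Minimum membership degree over every clause of the rule (assumed scoring)."""
    clauses = rule["if"] + rule["then"]
    return min(FUZZY_SETS[(attr, label)](record[attr]) for attr, label in clauses)

# if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
rule = {
    "if": [("SYN", "AVERAGESYN"), ("FIN", "AVERAGEFIN")],
    "then": [("ICMP", "AVERAGEICMP")],
}
normal = {"SYN": 10, "FIN": 10, "ICMP": 4}   # sits at the peak of every set
sweep = {"SYN": 10, "FIN": 10, "ICMP": 40}   # ICMP count far outside AVERAGEICMP

normal_strength = firing_strength(rule, normal)
sweep_strength = firing_strength(rule, sweep)
```

With these made-up sets, the normal record scores at the top of the scale and the record with an unusually high ICMP count scores at the bottom, which is the kind of deviation an anomaly-based profile is meant to expose.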
Data Mining Algorithm
• Definitions
– An attribute is a relevant feature of the input data
– A termset {l1, l2, …, ln} of an attribute a defines the set of
labels describing a
– A term t is a tuple 〈a:l〉
– An itemset is an ordered set of terms {t1, t2, …, tn}
– An i-itemset is an itemset where n = i
– An itemset is called a large itemset (L-itemset) if its support
is equal to or greater than the minimum support threshold
– An Li-itemset is an L-itemset with i terms
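The definitions above can be tied together in a short sketch that finds L-itemsets level by level, in the style of Apriori. It is not the optimized single-pass algorithm described in this deck. It assumes that the fuzzy support of an itemset is the average, over all records, of the product of its terms' membership degrees, and that an itemset holds at most one term per attribute. The membership values are invented.

```python
from itertools import combinations

def fuzzy_support(itemset, memberships):
    """Average over records of the product of the terms' membership degrees."""
    total = 0.0
    for degrees in memberships:
        product = 1.0
        for term in itemset:
            product *= degrees.get(term, 0.0)
        total += product
    return total / len(memberships)

def large_itemsets(memberships, min_support, max_size=2):
    """Return {itemset: support} for every L-itemset with up to max_size terms."""
    terms = sorted({t for degrees in memberships for t in degrees})
    current = [(t,) for t in terms]  # candidate 1-itemsets
    result = {}
    for size in range(1, max_size + 1):
        kept = set()
        for itemset in current:
            s = fuzzy_support(itemset, memberships)
            if s >= min_support:
                result[itemset] = s
                kept.add(itemset)
        # Next candidates: one term per attribute, and every sub-itemset
        # of the current size must itself be large (Apriori pruning).
        survivors = sorted({t for itemset in kept for t in itemset})
        current = [
            c for c in combinations(survivors, size + 1)
            if len({attr for attr, _ in c}) == size + 1
            and all(sub in kept for sub in combinations(c, size))
        ]
    return result

# One dict per record: term (attribute, label) -> membership degree.
memberships = [
    {("SYN", "AVERAGE"): 1.0, ("ICMP", "AVERAGE"): 1.0},
    {("SYN", "AVERAGE"): 0.5, ("ICMP", "AVERAGE"): 1.0},
    {("SYN", "AVERAGE"): 0.5, ("ICMP", "ABOVE"): 1.0},
    {("SYN", "ABOVE"): 1.0, ("ICMP", "ABOVE"): 0.5},
]
found = large_itemsets(memberships, min_support=0.3)
```

Here three 1-itemsets and one 2-itemset clear the threshold; the term for SYN being ABOVE is dropped at the first level, so no candidate containing it is ever evaluated.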
Data Mining Algorithm
Data Mining Example
Experimental Results
• Dataset source
– Training data contained in three different data files
• File1: attack and background traffic (1 hour)
– The attack is an ipsweep that lasts approximately 5 minutes
• File2: background or normal traffic (1 hour)
• File3: attack and background traffic (5 minutes – attack period)
• Mined ruleset evaluation
– Anomaly-based (profile models normal traffic: File2)
– Signature-based (profile models attack traffic: File3)
Experimental Results (cont’d.)
• Output analysis and metrics
– Rule firing strengths per record
– Single rule firing strength over the entire data set
• Visualization
– Applet windows
Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Normal traffic flow
Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Attack and normal traffic flow
Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal traffic flow
Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal and attack traffic flow
Conclusions
• Proven ability to operate as a hybrid system
– Prototype shows promise in identifying deviations from anomaly-based
and signature-based detection profiles
• Multi-platform
– Modular design and implementation in Java
• Risk management
– Fuzzy logic provides continuous rather than binary evaluations of
system behavior
• Robustness
– Better classification than traditional IDS in the presence of slight
pattern changes