Applying Data Mining of Fuzzy Association Rules to Network Intrusion Detection
Authors:
Aly El-Semary, Janica Edmonds, Jesús González-Pino,
and Mauricio Papa
Center for Information Security
Department of Computer Science
University of Tulsa, Tulsa, OK 74104
Overview
• Introduction
• Background
• Architecture
• Data-mining Algorithm
• Experimental Results
• Conclusions
Introduction
• Intrusion Detection System (IDS)
– Capable of identifying security breaches
• Classification of IDS
– Host-based or Network-based
– Signature-based or anomaly-based
• Boolean logic has been used in decision-making
• Fuzzy logic as an alternative
– Sound foundation to handle imprecision and vagueness
– Mature inference mechanisms using varying degrees of truth
• Framework for hybrid fuzzy logic IDS
– Detection profiles represented by fuzzy rulesets
– Expert system capable of evaluating rule truthfulness
Background
• Data mining
– Association rule algorithms
• Apriori algorithm for two-valued attributes
• Algorithm for quantitative valued attributes (Kuok et al.)
– Mined fuzzy association rules are the basis for the detection profile
• Fuzzy logic
– Xprove: remote operating system fingerprinting tool
• Multi-valued logic for pattern matching
• Better and more accurate results
– FIRE: Fuzzy Intrusion Recognition Engine
• Data mining techniques used for identifying adequate fuzzy sets
• Human interaction is needed to build fuzzy rules
Background (cont’d.)
• Our framework extends previous efforts in fuzzy data mining
– Preprocessing facility for raw network traffic data
– An optimized association rule algorithm for producing detection models
– Expert system capable of evaluating such detection models
– Prototype implementation
Architecture
• Two modes of operation:
– Rule-generation mode
– Detection mode
Packets
• Initial input for the IDS
• Can be obtained from
– Data repository (off-line)
– Network packet sniffer (on-line)
Preprocessor
• Accepts raw network packets as input data
– Used in both modes (rule-generation and detection)
• Produces records for each group
• A record contains aggregate information for a group
– Records are used to generate and evaluate fuzzy rules
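The aggregation step above can be sketched as follows. This is a minimal illustration, not the authors' preprocessor: it assumes a group is a fixed-length time window and that a record holds per-window packet counts. The window length, the `aggregate` function and the simplified `(timestamp, kind)` packet tuples are hypothetical; the attribute names SYN, FIN, ICMP and UDP are taken from the rules shown in the results.

```python
from collections import Counter, defaultdict

KINDS = ("SYN", "FIN", "ICMP", "UDP")

def aggregate(packets, window=2.0):
    """Group packets into fixed time windows and count packet kinds per window.

    Assumption: a "group" is a time window of `window` seconds.
    """
    groups = defaultdict(Counter)
    for timestamp, kind in packets:
        groups[int(timestamp // window)][kind] += 1
    # One record per window, with a count for every attribute (0 if unseen).
    return [
        {"window": w, **{k: groups[w][k] for k in KINDS}}
        for w in sorted(groups)
    ]

# Hypothetical, already-parsed packets: (timestamp in seconds, packet kind).
packets = [
    (0.1, "SYN"), (0.4, "ICMP"), (1.7, "FIN"),
    (2.2, "ICMP"), (2.9, "ICMP"), (3.5, "UDP"),
]
records = aggregate(packets)
for record in records:
    print(record)
```

The same function can serve both modes: in rule-generation mode the records feed the data miner, and in detection mode they are evaluated against the mined rules.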
Data Miner
• Implements optimized association rule algorithm
– Integrates Apriori and Kuok’s algorithms
– Allows for efficient, single-pass, record processing
• Resulting ruleset satisfies specific requirements
– Support
• Fraction of the records in the data set for which all terms of the rule hold true
– Confidence
• Fraction of the records satisfying the antecedent for which the consequent also holds true
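A small worked example may help make the two measures concrete. The sketch below assumes a fuzzy reading of the definitions: each record stores a membership degree in [0, 1] for every term, "all terms hold" is taken as the minimum of the degrees, and support is the average of that value over the data set. With degrees restricted to 0 and 1 this reduces to the plain fractions given above. The record values and helper names are hypothetical.

```python
def degree(record, terms):
    """Degree to which a record satisfies every term (fuzzy AND taken as min)."""
    return min(record[t] for t in terms)

def support(records, terms):
    """Average degree to which the records satisfy all terms."""
    return sum(degree(r, terms) for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """Support of the whole rule relative to the support of its antecedent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the antecedent never holds, so the rule is never exercised
    return support(records, antecedent + consequent) / base

# Hypothetical records: term "attribute:label" -> membership degree.
records = [
    {"SYN:AVERAGE": 1.0, "ICMP:AVERAGE": 1.0},
    {"SYN:AVERAGE": 1.0, "ICMP:AVERAGE": 0.5},
    {"SYN:AVERAGE": 0.0, "ICMP:AVERAGE": 1.0},
    {"SYN:AVERAGE": 1.0, "ICMP:AVERAGE": 0.0},
]

print(support(records, ["SYN:AVERAGE", "ICMP:AVERAGE"]))       # 0.375
print(confidence(records, ["SYN:AVERAGE"], ["ICMP:AVERAGE"]))  # 0.5
```

Here the antecedent has support 0.75 and the full rule has support 0.375, so the confidence is 0.375 / 0.75 = 0.5.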
Fuzzy Logic Rules
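• Logical implications of the form p → q, built from clauses of the form "attr is cat_attr"
– where
• aa_i is an antecedent attribute
• ca_j is a consequent attribute
• cat_attr is a category of attribute attr
– Example: if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP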
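To illustrate what an attribute category can look like, the sketch below models each category as a triangular membership function and renders a rule in the "if ... then ..." form used in the results. The triangular shape, the breakpoints and the BELOWICMP label are assumptions made for illustration; the slides do not state how the fuzzy sets are defined.

```python
def triangular(a, b, c):
    """Return a membership function that peaks at b and is zero outside (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical categories for the ICMP attribute (ICMP packets per record).
icmp_categories = {
    "BELOWICMP": triangular(-10, 0, 10),
    "AVERAGEICMP": triangular(0, 10, 20),
    "ABOVEICMP": triangular(10, 20, 30),
}

def render(antecedent, consequent):
    """Format a rule given as lists of (attribute, category) pairs."""
    def clause(terms):
        return " and ".join(f"{attr} is {cat}" for attr, cat in terms)
    return f"if {clause(antecedent)} then {clause(consequent)}"

rule = render(
    [("SYN", "AVERAGESYN"), ("FIN", "AVERAGEFIN")],
    [("ICMP", "AVERAGEICMP")],
)
degrees = {name: mu(15) for name, mu in icmp_categories.items()}

print(rule)
print(degrees)
```

A record with 15 ICMP packets belongs partly to AVERAGEICMP and partly to ABOVEICMP under these hypothetical breakpoints, which is the kind of graded evaluation that Boolean thresholds cannot express.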
Fuzzy Inference Engine
• Makes use of FuzzyJess
– Integrates FuzzyJ with the Java Expert System Shell (Jess)
– Can be configured to use the Mamdani or Larsen inference mechanisms
• Rule evaluation
– The firing strengths of the rules are the outputs
• Approaches 1: Observed behavior closely follows the profile
• Approaches 0: Observed behavior deviates from the profile
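The difference between the two inference mechanisms can be sketched in a few lines. This is plain Python, not the FuzzyJess API, and it rests on an assumption about how a rule is scored: the antecedent and consequent degrees are each taken as the minimum over their terms, and the two are then combined with min (Mamdani) or product (Larsen). The function name and membership values are hypothetical.

```python
def firing_strength(memberships, antecedent, consequent, mechanism="mamdani"):
    """Score a rule against one record.

    memberships: maps (attribute, category) -> membership degree in [0, 1]
    mechanism:   "mamdani" combines with min, "larsen" with product
    """
    a = min(memberships[t] for t in antecedent)
    c = min(memberships[t] for t in consequent)
    if mechanism == "mamdani":
        return min(a, c)
    if mechanism == "larsen":
        return a * c
    raise ValueError(f"unknown mechanism: {mechanism}")

# Hypothetical membership degrees for one record.
memberships = {
    ("SYN", "AVERAGESYN"): 0.75,
    ("FIN", "AVERAGEFIN"): 0.5,
    ("ICMP", "AVERAGEICMP"): 0.5,
}
antecedent = [("SYN", "AVERAGESYN"), ("FIN", "AVERAGEFIN")]
consequent = [("ICMP", "AVERAGEICMP")]

print(firing_strength(memberships, antecedent, consequent, "mamdani"))  # 0.5
print(firing_strength(memberships, antecedent, consequent, "larsen"))   # 0.25
```

Under this scoring, a record that matches the profile on every term yields a value near 1, and a record that violates any term pulls the value toward 0.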
Data Mining Algorithm
• Definitions
– An attribute is a relevant feature of the input data
– A termset {l1, l2, …, ln} of an attribute a is the set of labels describing a
– A term t is a tuple 〈a:l〉, where l is a label in the termset of a
– An itemset is an ordered set of terms {t1, t2, …, tn}
– An i-itemset is an itemset where n = i
– An itemset is called a large itemset (L-itemset) if its support is greater than or equal to a minimum support threshold
– An Li-itemset is an L-itemset with i terms
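The definitions above can be exercised with a plain level-wise (Apriori-style) search for large itemsets. This is a sketch under stated assumptions and not the authors' optimized single-pass algorithm: it rescans the records at every level, takes fuzzy support as the average of the minimum membership degree, and skips itemsets that contain two terms for the same attribute. The record values and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

def support(records, itemset):
    """Average over all records of the minimum membership degree in the itemset."""
    return sum(min(r[t] for t in itemset) for r in records) / len(records)

def large_itemsets(records, min_support):
    """Return all large itemsets, level by level (L1-itemsets, L2-itemsets, ...)."""
    terms = sorted(records[0])  # a term is an (attribute, label) tuple
    level = [(t,) for t in terms if support(records, (t,)) >= min_support]
    result = list(level)
    while level:
        # Join step: merge pairs of i-itemsets that differ in exactly one term.
        candidates = set()
        for x, y in combinations(level, 2):
            union = set(x) | set(y)
            if len(union) == len(x) + 1:
                candidates.add(tuple(sorted(union)))
        # Keep candidates with distinct attributes and sufficient support.
        level = [
            c for c in sorted(candidates)
            if len({attr for attr, _ in c}) == len(c)
            and support(records, c) >= min_support
        ]
        result.extend(level)
    return result

SYN_AVG, SYN_ABOVE, ICMP_AVG = ("SYN", "AVERAGE"), ("SYN", "ABOVE"), ("ICMP", "AVERAGE")
# Hypothetical records: term -> membership degree.
records = [
    {SYN_AVG: 1.0, SYN_ABOVE: 0.0, ICMP_AVG: 1.0},
    {SYN_AVG: 0.5, SYN_ABOVE: 0.5, ICMP_AVG: 1.0},
    {SYN_AVG: 1.0, SYN_ABOVE: 0.0, ICMP_AVG: 0.5},
    {SYN_AVG: 0.5, SYN_ABOVE: 0.5, ICMP_AVG: 1.0},
]

for itemset in large_itemsets(records, min_support=0.5):
    print(itemset, support(records, itemset))
```

With these values the term 〈SYN:ABOVE〉 has support 0.25 and is dropped, while 〈ICMP:AVERAGE〉 (0.875), 〈SYN:AVERAGE〉 (0.75) and their 2-itemset (0.625) are large.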
Data Mining Algorithm (cont’d.)
[Slide content not captured in the transcript]
Data Mining Example
[Worked example spanning five slides; content not captured in the transcript]
Experimental Results
• Dataset source
– Training data contained in three different data files
• File1: attack and background traffic (1 hour)
– The attack is an ipsweep that lasts approximately 5 minutes
• File2: background or normal traffic (1 hour)
• File3: attack and background traffic (5 minutes – attack period)
• Mined ruleset evaluation
– Anomaly-based (profile models normal traffic: File2)
– Signature-based (profile models attack traffic: File3)
Experimental Results (cont’d.)
• Output analysis and metrics
– Rule firing strengths per record
– Single rule firing strength over the entire data set
• Visualization
– Applet windows
Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Normal traffic flow
Results – Anomaly-based
if SYN is AVERAGESYN and FIN is AVERAGEFIN then ICMP is AVERAGEICMP
Attack and normal traffic flow
Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal traffic flow
Results – Signature-based
if UDP is AVERAGEUDP then ICMP is ABOVEICMP
Normal and attack traffic flow
Conclusions
• Proven ability to operate as a hybrid system
– Prototype shows promise in identifying deviations from both anomaly-based and signature-based detection profiles
• Multi-platform
– Modular design and implementation in Java
• Risk management
– Fuzzy logic provides continuous rather than binary evaluations of system behavior
• Robustness
– Better classification than traditional IDSs in the presence of slight pattern changes