
Data Mining & Intrusion Detection
Shan Bai
Instructor: Dr. Yingshu Li
CSC 8712, Spring 08

Outline
• Intrusion Detection
• Data Mining
• Data Mining in Intrusion Detection
• Reference

What Is an Intrusion?
• An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC), 1990-2002.
Figure: Spread of the SQL Slammer worm 10 minutes after its deployment.

Intrusion Examples
• DOS: denial-of-service.
• R2L: unauthorized access from a remote machine, e.g., guessing a password.
• U2R: unauthorized access to local superuser (root) privileges, e.g., various "buffer overflow" attacks.
• Probing: surveillance and other probing, e.g., port scanning.
• Trojan horse/worm.
• Address spoofing: a malicious user uses a fake IP address to send malicious packets to a target.
• Many others…

Intrusion Detection System (IDS)
• An intrusion detection system is a combination of software and hardware that attempts to perform intrusion detection and raises an alarm when a possible intrusion happens.

IDS Categories
Intrusion detection systems are split into two groups:
• Anomaly detection systems: identify malicious traffic based on deviations from an established model of normal network behavior.
• Misuse detection systems: identify intrusions based on known patterns (signatures) of malicious activity.

Anomaly Detection
Figure: activity measures (CPU, process size) plotted against the normal profile; an abnormal value marks a probable intrusion.
• Baseline the normal traffic, then look for activity that falls outside the norm.
• Relatively high false positive rate: anomalies can simply be new normal activities.
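The "baseline, then look for deviations" idea can be made concrete with a minimal sketch. The sketch makes two assumptions. First, each activity measure is summarized by its mean and sample standard deviation over attack-free samples. The hypothetical keys `cpu` and `proc_size` echo the figure's CPU and process-size measures. Second, a z-score above 3 counts as abnormal. Both the statistic and the threshold are illustrative choices, not those of any particular IDS.

```python
from statistics import mean, stdev


def build_profile(normal_samples):
    """Baseline each activity measure as (mean, sample std dev) over attack-free samples."""
    profile = {}
    for measure in normal_samples[0]:
        values = [sample[measure] for sample in normal_samples]
        profile[measure] = (mean(values), stdev(values))
    return profile


def flag_deviations(profile, observation, threshold=3.0):
    """Return the measures whose z-score exceeds `threshold` (a probable intrusion)."""
    flagged = []
    for measure, (mu, sigma) in profile.items():
        x = observation[measure]
        if sigma == 0:
            # A constant baseline: any change at all is a deviation.
            deviates = x != mu
        else:
            deviates = abs(x - mu) / sigma > threshold
        if deviates:
            flagged.append(measure)
    return flagged


if __name__ == "__main__":
    # Hypothetical attack-free observations.
    normal = [
        {"cpu": 20, "proc_size": 40},
        {"cpu": 25, "proc_size": 42},
        {"cpu": 22, "proc_size": 38},
        {"cpu": 24, "proc_size": 41},
        {"cpu": 21, "proc_size": 39},
    ]
    profile = build_profile(normal)
    print(flag_deviations(profile, {"cpu": 85, "proc_size": 41}))  # ['cpu']
    print(flag_deviations(profile, {"cpu": 23, "proc_size": 40}))  # []
```

The sketch also shows where the false positives come from. A legitimate but new workload that pushes CPU usage far above the baseline would be flagged exactly like an attack.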
Misuse Detection
Figure: intrusion activities are pattern-matched against stored intrusion patterns.
• Example pattern: if (src_ip == dst_ip) then “land attack”.
• Look for known indicators: ICMP scans, port scans, and connection attempts; CPU, RAM, and I/O utilization; file system activity, modification of system files, and permission modifications.
• Cannot detect new attacks.

Goal of Intrusion Detection Systems (IDS)
• To detect an intrusion as it happens and be able to respond to it.
• False positives: a false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion. With too many false positives, users stop monitoring the IDS because of the noise.
• False negatives: a false negative is a situation where an intrusion is really happening, but the IDS does not catch it.

Why Do We Need Data Mining?
• Despite the enormous amount of data, particular events of interest are still quite rare; their frequency ranges from 0.1% to less than 10%.
• We are drowning in data, but starving for knowledge!

Data Mining vs. KDD
• Knowledge Discovery in Databases (KDD): the whole process of finding useful information and patterns in data.
• Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.
• Data mining is the core of the knowledge discovery process.

KDD Process
• Selection: obtain data from various sources.
• Preprocessing: cleanse the data.
• Transformation: convert to a common format; transform to a new format.
• Data Mining: obtain the desired results.
• Interpretation/Evaluation: present the results to the user in a meaningful manner.

Data Mining: A KDD Process
Figure: data mining as the core of the knowledge discovery process: databases → data cleaning and data integration → data warehouse → selection → task-relevant data → data mining → pattern evaluation.

Typical Data Mining Architecture
Figure: layered architecture, from top to bottom: graphical user interface; pattern evaluation; data mining engine (with a knowledge base); database or data warehouse server; data cleaning, data integration, and filtering; databases and data warehouse.

Network Intrusion Detection
• The number of intrusions on a network is typically a very small fraction of the total network traffic.

Why Can Data Mining Help?
• Learn from traffic data.
  - Supervised learning: learn precise models from past intrusions.
  - Unsupervised learning: identify suspicious activities.
• Maintain models on dynamic data.
• Correlate suspicious events across network sites.
  - Helps detect sophisticated attacks that cannot be identified by single-site analysis.
• Analyze long-term data (months or years).
  - Uncovers suspicious stealth activities (e.g., insiders leaking or modifying information).

Intrusion Detection
• Traditional intrusion detection system (IDS) tools (e.g.
SNORT) are based on signatures of known attacks.
• Limitations:
  - The signature database has to be revised manually for each newly discovered type of intrusion.
  - They cannot detect emerging cyber threats.
  - There is substantial latency in deploying newly created signatures across the computer system.

Data Mining for Intrusion Detection: Techniques and Applications
• Frequent pattern mining
• Classification
• Clustering
• Mining data streams

Frequent Pattern Mining
• Frequent patterns are patterns that occur frequently in a database; mining them means finding regularities.
• Process of mining frequent patterns for intrusion detection:
  - Phase I: mine a repository of normal frequent itemsets from attack-free data.
  - Phase II: find the frequent itemsets in the last n connections and compare those patterns to the normal profile.

Frequent Pattern Mining: Apriori
• Any subset of a frequent itemset must also be frequent (an anti-monotone property).
  - A transaction containing {beer, diaper, nuts} also contains {beer, diaper}.
  - If {beer, diaper, nuts} is frequent, {beer, diaper} must also be frequent.
• No superset of an infrequent itemset needs to be generated or tested.
  - Many item combinations can be pruned.

Sequential Pattern Analysis
• Models sequence patterns.
  - (Temporal) order is important in many situations, e.g., in time-series databases and sequence databases.
  - Frequent patterns extend to (frequent) sequential patterns.
• For intrusion detection, sequential patterns capture the signatures of attacks in a series of packets.

Sequential Pattern Mining
• Given a set of sequences, find the complete set of frequent subsequences.

Apriori Property in Sequences
• If a sequence is infrequent, none of its super-sequences can be frequent.

Classification: A Two-Step Process
• Model construction: describe a set of predetermined classes.
  - Training dataset: the tuples used for model construction; each tuple/sample belongs to a predefined class.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
• Model application: classify unseen objects.
  - Estimate the accuracy of the model using an independent test set.
  - If the accuracy is acceptable, apply the model to classify data tuples whose class labels are unknown.

Classification: Decision Tree
• A node in the tree is a test of some attribute; a branch is a possible value of that attribute.
• To classify a sample, start at the root, test the node's attribute, and move down the matching branch; repeat until a leaf (class label) is reached.

Neural Classification: HIDE
• “A hierarchical network intrusion detection system using statistical processing and neural network classification” by Zheng et al.
• Five major components:
  - Probes collect traffic data.
  - The event preprocessor preprocesses the traffic data and feeds the statistical model.
  - The statistical processor maintains a model of normal activities and generates vectors for new events.
  - The neural network classifies the vectors of new events.
  - The post processor generates reports.

Clustering: What Is Clustering?
• Group data into clusters:
  - Objects are similar to one another within the same cluster.
  - Objects are dissimilar to the objects in other clusters.
  - Unsupervised learning: there are no predefined classes.

Clustering: What Is a Good Clustering?
• High intra-class similarity and low inter-class similarity, depending on the similarity measure.
• The ability to discover some or all of the hidden patterns.

Clustering Approaches
• Partitioning algorithms (e.g., k-means):
  - Partition the objects into k clusters.
  - Iteratively reallocate objects to improve the clustering.
• Hierarchical algorithms:
  - Agglomerative: start with each object as its own cluster and merge clusters to form larger ones.
  - Divisive: start with all objects in one cluster and split it into smaller clusters.

Mining Data Streams for Intrusion Detection
• Maintaining profiles of normal activities: the profiles of normal activities may drift.
• Identifying novel attacks: identify clusters and outliers in traffic data streams.
• Reduce the future alarm load by writing filtering rules that automatically discard well-understood false positives.

Data Mining for Intrusion Detection
• Misuse detection:
  - Predictive models are built from labeled data sets (instances are labeled as “normal” or “intrusive”).
  - These models can be more sophisticated and precise than manually created signatures.
  - Recent research, e.g., JAM (Java Agents for Metalearning).

JAM (Java Agents for Metalearning)
JAM (developed at Columbia University) uses data mining techniques to discover patterns of intrusions. It then applies a meta-learning classifier to learn the signatures of attacks. The association rules algorithm determines relationships between fields in the audit trail records, and the frequent episodes algorithm models sequential patterns of audit events.
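JAM's association rules step builds on frequent itemset mining with the Apriori pruning described earlier. Below is a minimal Apriori sketch over hypothetical connection records. Each item is an attribute=value pair invented for illustration, and the code is not JAM's implementation. It counts candidate k-itemsets and keeps those that meet a minimum count. It then builds (k+1)-candidates only from frequent k-itemsets, dropping any candidate that has an infrequent subset.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One pass over the data counts every candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-subset (anti-monotone property).
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, lhs, rhs):
    """confidence(lhs -> rhs) = count(lhs ∪ rhs) / count(lhs)."""
    lhs = frozenset(lhs)
    return frequent[lhs | frozenset(rhs)] / frequent[lhs]


if __name__ == "__main__":
    # Hypothetical audit records, one set of attribute=value items per connection.
    records = [
        {"service=smtp", "flag=SF", "dst_port=25"},
        {"service=smtp", "flag=SF", "dst_port=25"},
        {"service=http", "flag=SF", "dst_port=80"},
        {"service=http", "flag=S0", "dst_port=80"},
        {"service=smtp", "flag=S0", "dst_port=25"},
    ]
    freq = apriori(records, min_count=2)
    print(confidence(freq, {"service=smtp"}, {"dst_port=25"}))  # 1.0
```

With a minimum count of 2, the pair {flag=S0, service=http} occurs only once and is dropped at level 2. As a result, no 3-itemset containing it is ever generated or counted.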
Features are then extracted from the output of both algorithms (association rules and frequent episodes) and used to compute models of intrusion behavior, and the classifiers build the signatures of attacks. Data mining in JAM thus builds a misuse detection model. The classifiers in JAM are generated by running a rule learning program on training data of system usage. After training, the resulting classification rules are used to recognize anomalies and detect known intrusions. The system has been tested with data from Sendmail-based attacks and with network attacks using tcpdump data.

Data Mining for Intrusion Detection
• Anomaly detection:
  - Identifies anomalies as deviations from “normal” behavior.
  - E.g., ADAM (Audit Data Analysis and Mining) and MINDS (MINnesota INtrusion Detection System).
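"Deviation from normal behavior" can also be scored without a parametric profile. One common data-mining formulation is a distance-based outlier score. Each incoming connection record is scored by its distance to its k-th nearest neighbor among attack-free records, and records whose score exceeds a threshold are flagged. The two-dimensional feature vectors, k = 2, and the threshold 1.0 are hypothetical choices for illustration, not the parameters of ADAM or MINDS. `math.dist` requires Python 3.8 or later.

```python
import math


def knn_score(point, normal, k):
    """Distance from `point` to its k-th nearest neighbor among the normal records."""
    distances = sorted(math.dist(point, ref) for ref in normal)
    return distances[k - 1]


def flag_outliers(normal, incoming, k=2, threshold=1.0):
    """Return the incoming records whose k-NN distance to normal data exceeds `threshold`."""
    return [x for x in incoming if knn_score(x, normal, k) > threshold]


if __name__ == "__main__":
    # Hypothetical, already-normalized feature vectors (e.g. packets and bytes per connection).
    normal = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (1.0, 1.2)]
    incoming = [(1.05, 1.0), (5.0, 4.0)]
    print(flag_outliers(normal, incoming))  # [(5.0, 4.0)]
```

Using the k-th neighbor rather than the single nearest one makes the score less sensitive to an isolated training point that happens to lie near an attack record.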
ADAM: Audit Data Analysis and Mining
• Detects intrusions by data mining, combining association rules and classification rules.
• First, ADAM uses an off-line algorithm to collect known frequent itemsets.
• Second, ADAM runs an online algorithm that:
  - finds the frequent itemsets in the most recent connection records;
  - compares them with the known mined data;
  - discards those that appear to be normal;
  - forwards the suspicious ones to the classifier.
• The trained classifier then classifies the suspicious data as one of the following:
  - known type of attack;
  - unknown type of attack;
  - false alarm.

ADAM: Detecting Intrusion by Data Mining
Figure: ADAM: detecting intrusions by data mining.

ADAM: Audit Data Analysis and Mining
• ADAM has two phases in its model:
  - 1st phase: train the classifier. This is an offline process that takes place only once, before the main experiment.
  - 2nd phase: use the trained classifier. The trained classifier is then used to detect anomalies in an online process.

The MINDS Project
• MINDS: MINnesota INtrusion Detection System.
  - Learning from rare class: building rare class prediction models.
  - Anomaly/outlier detection.
  - Summarization of attacks using association pattern analysis.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered: {Milk} → {Coke}; {Diaper, Milk} → {Beer}

MINDS: Learning from Rare Class
• Problem: building models for rare network attacks (mining a needle in a haystack).
  - Standard data mining models are not suitable for rare classes.
  - Models must be able to handle skewed class distributions.
• Learning from data streams: intrusions are sequences of events.

MINDS: Anomaly Detection
• Detect novel attacks/intrusions by identifying them as deviations from “normal”, i.e.
anomalous behavior:
  - Identify normal behavior.
  - Construct a useful set of features.
  - Define a similarity function.
  - Use an outlier detection algorithm: a nearest neighbor approach, density-based schemes, or unsupervised support vector machines (SVM).

Experimental Evaluation
• Publicly available data set: the DARPA 1998 Intrusion Detection Evaluation Data Set, prepared and managed by MIT Lincoln Lab, includes a wide variety of intrusions simulated in a military network environment.
• Real network data from the University of Minnesota.
Figure: MINDS deployment at the University of Minnesota. Net-flow data collected with Cisco routers (10-minute cycles, 2 million connections) pass through data preprocessing and MINDS anomaly detection, which produces anomaly scores for association pattern analysis. Anomaly detection is applied 4 times a day over a 10-minute time window. The figure also shows Snort (www.snort.org), an open source signature-based network IDS.

MINDS: Framework for Mining Associations
Figure: MINDS association analysis module. An anomaly detection system produces ranked connections; a discriminating association pattern generator contrasts attack and normal connections and updates a knowledge base of rules. Steps: 1. Build normal profile. 2. Study changes in normal behavior. 3. Create attack summary. 4. Detect misuse behavior. 5.
Understand nature of the attack. Example knowledge base rules: R1: TCP, DstPort=1863 → Attack; …; R100: TCP, DstPort=80 → Normal.

Discovered Real-Life Association Patterns
Rule 1: SrcIP=XXXX, DstPort=80, Protocol=TCP, Flag=SYN, NoPackets: 3, NoBytes: 120…180 (c1=256, c2=1)
Rule 2: SrcIP=XXXX, DstIP=YYYY, DstPort=80, Protocol=TCP, Flag=SYN, NoPackets: 3, NoBytes: 120…180 (c1=177, c2=0)
• At first glance, Rule 1 appears to describe a Web scan.
• Rule 2 indicates an attack on a specific machine.
• Together, the two rules indicate that a scan is performed first, followed by an attack on a specific machine that the attacker identified as vulnerable.

Discovered Real-Life Association Patterns (continued)
DstIP=ZZZZ, DstPort=8888, Protocol=TCP (c1=369, c2=0)
DstIP=ZZZZ, DstPort=8888, Protocol=TCP, Flag=SYN (c1=291, c2=0)
• This pattern indicates an anomalously high number of TCP connections on port 8888 involving machine ZZZZ.
• Follow-up analysis of the connections covered by the pattern indicates that this could be a machine running a variation of the Kazaa file-sharing protocol.
• Running an unauthorized application increases the vulnerability of the system.

Discovered Real-Life Association Patterns (continued)
SrcIP=XXXX, DstPort=27374, Protocol=TCP, Flag=SYN, NoPackets=4, NoBytes=189…200 (c1=582, c2=2)
SrcIP=XXXX, DstPort=12345, NoPackets=4, NoBytes=189…200 (c1=580, c2=3)
SrcIP=YYYY, DstPort=27374, Protocol=TCP, Flag=SYN, NoPackets=3, NoBytes=144 (c1=694, c2=3)
……
• This pattern indicates a large number of scans on ports 27374 (a signature of the SubSeven backdoor Trojan) and 12345 (a signature of the NetBus Trojan).
• Further analysis showed that no fewer than five machines were scanning for one or both of these ports in any time window.

Discovered Real-Life Association Patterns (continued)
DstPort=6667, Protocol=TCP (c1=254, c2=1)
• This pattern indicates an unusually large number of connections on port 6667, detected by the anomaly detector.
• Port 6667 is where IRC (Internet Relay Chat) typically runs.
• Further analysis
reveals that there are many small packets from/to various IRC servers around the world.
• Although IRC traffic is not unusual, the fact that it was flagged as anomalous is interesting.
• This might indicate that the IRC server has been taken down (e.g., by a DOS attack) or that it is a rogue IRC server (possibly involved in some hacking activity).

Discovered Real-Life Association Patterns (continued)
DstPort=1863, Protocol=TCP, Flag=0, NoPackets=1, NoBytes<139 (c1=498, c2=6)
DstPort=1863, Protocol=TCP, Flag=0 (c1=587, c2=6)
DstPort=1863, Protocol=TCP (c1=606, c2=8)
• This pattern indicates a large number of anomalous TCP connections on port 1863.
• Further analysis reveals that the remote IP block is owned by Hotmail.
• Flag=0 is unusual for TCP traffic.

MINDS: Conclusion
• Data mining based algorithms are capable of detecting intrusions that cannot be detected by state-of-the-art signature-based methods.
  - SNORT has static knowledge that human analysts update manually.
  - MINDS anomaly detection algorithms are adaptive in nature.
• MINDS anomaly detection algorithms can also be effective at detecting anomalous behavior originating from a compromised or infected machine.

MINDS Research
• Defining normal behavior
• Feature extraction
• Similarity functions
• Outlier detection
• Result summarization
• Detection of attacks originating from multiple sites
• Outsider attacks: network intrusion
• Insider attacks: policy violation
• Worm/virus detection after infection

IDS Using Both Misuse and Anomaly Detection: RIDS-100
RIDS (Rising Intrusion Detection System) is provided by Rising Tech, a leader in antivirus and content security software and services in China. The company is a leading provider of client, gateway, and server security solutions for virus protection, firewall, and intrusion detection technologies, and of security services to enterprises and service providers across China.
RIDS uses both intrusion detection techniques, misuse detection and anomaly detection. A distance-based outlier detection algorithm is used to detect deviant behavior in the collected network data. For misuse detection, RIDS has a very large set of collected data patterns that can be matched against scanned network data. This large set of patterns is searched using a data mining classification algorithm (decision tree). http://www.rising-global.com/

A Cooperative Anomaly and Intrusion Detection System (CAIDS)
• CAIDS is built with a network-based intrusion detection system (NIDS) and an anomaly detection system (ADS) that operate interactively through a signature generator.

CAIDS (continued)
• A frequent episode rule (FER) is generated from a collection of frequent episodes. The FER is defined over episode sequences with multiple connection events.
• For example, consider a window in which we observe a 3-event sequence: E, D, and F. An FER is generated as E → D, F with confidence level freq(a ∪ b)/freq(a) = 0.8, where a represents the event E on the left-hand side (LHS) and b corresponds to the two events D and F on the right-hand side (RHS) of the rule.
• If a occurs with probability 5% and the joint event a and b occurs with probability 4%, there is a 0.04/0.05 = 80% chance that D and F will follow E in the same window.

CAIDS (continued)
In practice, the event E could be an authentication service characterized by two attributes (service = authentication, flag = SF). The events D and F may be two sequential SMTP requests denoted by (service = smtp). We can thus derive an FER with a confidence level of c = 80%, stating that two SMTP services will follow the authentication service within a window w = 2 sec. The three joint traffic events have a support level of s = 10% of all the network connections being evaluated.
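The confidence arithmetic in the example above can be reproduced with a short sketch. The event labels (`auth`, `smtp`), the timestamps, and the counting rule are hypothetical simplifications for illustration. In this rule, each occurrence of the LHS event opens a window of length w, and the RHS events must appear in order inside it. This is not the exact episode-counting procedure of CAIDS.

```python
def confidence_from_frequencies(p_joint, p_lhs):
    """c = freq(a ∪ b) / freq(a): probability that the RHS follows, given that the LHS occurred."""
    return p_joint / p_lhs


def follows_in_window(events, start, rhs, window):
    """True if the RHS events occur, in order, after events[start] within `window` seconds."""
    if not rhs:
        return True
    t0 = events[start][0]
    k = 0  # index of the next RHS event still to be matched
    for t, e in events[start + 1:]:
        if t - t0 > window:
            break
        if e == rhs[k]:
            k += 1
            if k == len(rhs):
                return True
    return False


def fer_confidence(events, lhs, rhs, window):
    """Fraction of LHS occurrences that are followed by the full RHS episode inside the window."""
    starts = [i for i, (_, e) in enumerate(events) if e == lhs]
    if not starts:
        return 0.0
    hits = sum(follows_in_window(events, i, rhs, window) for i in starts)
    return hits / len(starts)


if __name__ == "__main__":
    # Hypothetical (timestamp in seconds, event) pairs, sorted by time.
    events = [
        (0.0, "auth"), (0.4, "smtp"), (0.9, "smtp"),
        (5.0, "auth"), (5.5, "smtp"), (6.1, "smtp"),
        (10.0, "auth"), (10.2, "smtp"), (11.0, "smtp"),
        (15.0, "auth"), (15.3, "smtp"), (16.5, "smtp"),
        (20.0, "auth"), (25.0, "http"),
    ]
    print(fer_confidence(events, "auth", ("smtp", "smtp"), window=2.0))  # 0.8
```

In this toy trace, 4 of the 5 authentication events are followed by two SMTP requests within 2 seconds. This gives the rule (service = authentication) → (service = smtp), (service = smtp) a confidence of 0.8. Computed directly from frequencies, `confidence_from_frequencies(0.04, 0.05)` gives the same value up to floating-point rounding.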
This FER is formally stated as follows:
(service = authentication) → (service = smtp), (service = smtp)   (0.8, 0.1, 2 sec)   (1)

CAIDS (continued)
• An association rule aims to find interesting relationships inside a single connection record, whereas an FER relates multiple connection events within a window.
• In general, an FER is specified by the following expression:
L1, L2, …, Ln → R1, …, Rm   (c, s, window)   (2)
where Li (1 ≤ i ≤ n) and Rj (1 ≤ j ≤ m) are ordered traffic connection events. We call L1, L2, …, Ln the LHS episode and R1, …, Rm the RHS of the episode rule.

CAIDS (continued)
Figure: architecture of the CAIDS simulator, built with a 2,000-signature Snort and an anomaly detection subsystem (ADS) with 60 FERs after 2 weeks of rule training over the Lincoln Lab IDS evaluation dataset.

Conclusion
• In this report we have studied the basic concepts and some classic system models in this area, such as ADAM and MINDS.
• We have summarized those system models, their technologies, and their validation methods.
• We hope this gives an overview of current developments in this area and of how data mining is evolving within the field of network intrusion detection.

Reference
• DARPA 1998 data set; a cleansed version is used in KDD Cup '99; the DARPA 1999 data set is also available: http://www.ll.mit.edu/IST/ideval/data/data_index.html
• Daniel Barbara, Julia Couto, Sushil Jajodia, Leonard Popyack, Ningning Wu, “ADAM: Detecting Intrusions by Data Mining”, Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June 2001.
• Zhang, J. and Zulkernine, M., “A Hybrid Network Intrusion Detection Technique Using Random Forests”, Proceedings of the First International Conference on Availability, Reliability and Security, April 20-22, 2006.
• W. Lee et al., “A data mining framework for building intrusion detection models”, Information and System Security, Vol. 3, No. 4, 2000.
• Ertoz L.
et al., “MINDS - Minnesota Intrusion Detection System”, Next Generation Data Mining, Chapter 3, 2004.
• Lu, C.T., Boedihardjo, A.P., Manalwar, P., “Exploiting efficient data mining techniques to enhance intrusion detection systems”, Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration (IRI-2005), 15-17 Aug. 2005, pp. 512-517.
• Sal Stolfo, Andreas Prodromidis, Shelley Tselepis, Wenke Lee, Dave Fan, and Phil Chan, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD '97), Newport Beach, CA, August 1997 (honorable mention (runner-up) for the Best Paper Award in the Applied Research Category).

Questions & Comments
