Data Mining Techniques in Network Security
Mgr. Rudolf B. Blažek, Ph.D.
Department of Computer Systems, Faculty of Information Technologies
Czech Technical University in Prague
© Rudolf Blažek 2010-2011
Network Security MI-SIB, ZS 2011/12, Lecture 12
The European Social Fund Prague & EU: We Invest in Your Future

Reference
Anoop Singhal (Editor): Data Warehousing and Data Mining Techniques for Cyber Security (Advances in Information Security). Springer, 2006. ISBN 0387264094.

Introduction: Data Mining in Network Security
• Do not use signatures; instead, learn about usage patterns
• Also viewed as machine learning techniques
Two basic detection approaches
• Misuse detection: classification of normal vs. intrusive data flows, using classification algorithms, rare-class predictive models, association rules, and cost-sensitive modeling
• Anomaly detection: detection of deviations from normal behavior

Misuse Detection
• Models of misuse are created automatically
• Often yields better rules than manually created signatures
• Very good for detecting known (or modified) attacks
• But labeling data as normal vs. intrusive is human-resource intensive
Anomaly Detection
• Models of normal behavior are created automatically
• Deviations from normal behavior are detected automatically
• Can potentially detect unexpected and unknown attacks
Data Mining for Anomaly Detection
Advantages
• Detection of unknown attacks
• Detection of unusual behavior interesting to IT managers
Limitations
• Possibly very high false alarm rate (FAR)
Supervised detection
• Models of normal behavior are built from training data
Unsupervised detection
• No training data; attempts to learn about anomalies automatically

Unsupervised Anomaly Detection
No training data
• Attempts to learn about anomalies automatically
Algorithms used
• Statistical methods, clustering, outlier detection, state machines
We will focus on unsupervised anomaly detection
• We will study MINDS, the Minnesota IDS, as an example
• It will illustrate the usage of data mining techniques

MINDS: Minnesota Intrusion Detection System
Uses data mining techniques for
• unsupervised anomaly detection (assigns an “anomaly” score to each network connection)
• association pattern analysis (for suspected anomalous network connections)
Performance
• Detects new intrusions that escape signature-based IDSs
The MINDS System
[Figure 3.1: The MINDS System — data capture device, storage, filtering, feature extraction, known attack detection (detected known attacks), anomaly detection (anomaly scores, summary of anomalies via association pattern analysis), analyst labels. Source: Data Warehousing and Data Mining Techniques for Cyber Security]

Data capture
• NetFlow v5 (Cisco protocol), collected using flow-tools (http://www.splintered.net/sw/flow-tools/)
• Header information only, summarized into (one-way) flows
• Much less data storage than tcpdump (or e.g. Wireshark)
• Flow analysis period: 10 min (1-2 million flows)
MINDS processing speed
• Processing of 10 min of data: less than 3 min
• Unwanted traffic is filtered out before processing

MINDS Operation Steps
1. Feature extraction
• Basic features: source and destination IP addresses and ports, protocol, flags, number of bytes, number of packets
• Derived features for a T-second time window: capture connections with similar recent characteristics; useful to detect sources of a high volume of connections per unit time (e.g. fast scanning)
• Derived features for a connection window: similar characteristics, but over the last N connections from distinct sources (good for e.g. slow scanning)
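The derived window-based features above can be sketched as follows. This is an illustrative simplification, not the MINDS implementation; the `Flow` record and function names are assumptions:

```python
from collections import namedtuple

# Illustrative flow record; the field names are assumptions, not MINDS's schema.
Flow = namedtuple("Flow", "ts src_ip dst_ip src_port dst_port")

def count_dest(flows, flow, t_window):
    """count-dest style feature: unique destination IPs contacted by the
    same source in the last t_window seconds (up to flow.ts)."""
    return len({f.dst_ip for f in flows
                if f.src_ip == flow.src_ip and 0 <= flow.ts - f.ts <= t_window})

def count_serv_src(flows, flow, t_window):
    """count-serv-src style feature: number of flows from the same source IP
    to the same destination port in the last t_window seconds."""
    return sum(1 for f in flows
               if f.src_ip == flow.src_ip and f.dst_port == flow.dst_port
               and 0 <= flow.ts - f.ts <= t_window)
```

For a fast scanner, `count_dest` over a short window grows quickly; the connection-window variants would filter on the last N flows instead of the last T seconds.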
Time-window based features (Table 3.1)
• count-dest: number of flows to unique destination IP addresses inside the network in the last T seconds from the same source
• count-src: number of flows from unique source IP addresses inside the network in the last T seconds to the same destination
• count-serv-src: number of flows from the source IP to the same destination port in the last T seconds
• count-serv-dest: number of flows to the destination IP address using the same source port in the last T seconds

Connection-window based features
• count-dest-conn: number of flows to unique destination IP addresses inside the network in the last N flows from the same source
• count-src-conn: number of flows from unique source IP addresses inside the network in the last N flows to the same destination
• count-serv-src-conn: number of flows from the source IP to the same destination port in the last N flows
• count-serv-dest-conn: number of flows to the destination IP address using the same source port in the last N flows

MINDS Operation Steps
2. Signature-based detection of known attacks
• Detected attacks are not analyzed further; not our focus
3. Anomaly detection
• Outlier detection assigns an anomaly score to network flows
• Humans can analyze only the most anomalous connections
4. Association pattern analysis
• Summarization of highly anomalous network connections
• Humans decide whether the summaries can be used to create signatures for detection in step 2
MINDS Anomaly Detection
Local Outlier Factor (LOF)
• Assigned to each data point
• It is local: points are outliers with respect to their neighborhood
LOF calculation
1. Compute the density of each point’s neighborhood
2. For each neighbor, calculate the ratio of the neighbor’s density to the point’s density
3. Average these density ratios over all the point’s neighbors

Outlier Detection
2-D Outlier Detection Example
• Simple nearest-neighbor approaches based on computing distances fail in scenarios like the one in the figure: p2 would not be considered an outlier, although p1 may be detected as an outlier using only the distance to its nearest neighbor
• LOF is able to capture both outliers (p1 and p2) because it considers the density around the points
[Figure 5: Advantages of the LOF approach — outliers p1 and p2 near a sparse and a dense cluster. Source: Data Warehousing and Data Mining Techniques for Cyber Security]
Local Outlier Factor Calculation
[Figure 1 of the LOF paper: an example data set with 502 objects — a sparse cluster C1 of 400 objects, a denser cluster C2, and two outliers o1 and o2. We wish to label both o1 and o2 as outliers, which the framework of distance-based outliers cannot do.]

Reachability distance
• k-distance(o) is the distance from o to its k-th closest point: at most k − 1 points are closer than k-distance(o), and at least k points are not farther than k-distance(o)
Definition 5 (reachability distance of an object p w.r.t. object o): Let k be a natural number. The reachability distance of object p with respect to object o is defined as

  reach-dist_k(p, o) = max { k-distance(o), d(p, o) }

• Intuitively, if object p is far away from o, the reachability distance between the two is simply their actual distance d(p, o); if they are “sufficiently” close, d(p, o) is replaced by the k-distance of o
• [Figure 2 of the LOF paper illustrates this for k = 4: reach-dist_k(p1, o) = k-distance(o) for a close point p1, while reach-dist_k(p2, o) = d(p2, o) for a distant point p2]
• In this way the statistical fluctuations of d(p, o) for all p close to o can be significantly reduced; the strength of this smoothing effect is controlled by the parameter k. The higher the value of k, the more similar the reachability distances for objects within the same neighborhood
Source: LOF: Identifying Density-Based Local Outliers

We do not handle the case of duplicate points explicitly but simply assume that there are no duplicates.
(To deal with duplicates, we can base the notion of neighborhood on a k-distinct-distance, defined analogously to k-distance in Definition 3, with the additional requirement that there be at least k objects with different spatial coordinates.)

Local Outlier Factor Calculation
To compare the densities of different sets of objects, the density must be determined dynamically; MinPts is kept as the only parameter, and the reach-dist_MinPts(p, o) values for o ∈ N_MinPts(p) are used as a measure of the volume in the neighborhood of an object p.

Definition 6 (local reachability density of an object p): The local reachability density of p is defined as

  lrd_MinPts(p) = 1 / ( ( Σ_{o ∈ N_MinPts(p)} reach-dist_MinPts(p, o) ) / |N_MinPts(p)| )

Intuitively, the local reachability density of an object p is the inverse of the average reachability distance based on the MinPts-nearest neighbors of p. (Note that the local density can be ∞ if all the reachability distances are 0.)

Definition 7 ((local) outlier factor of an object p): The (local) outlier factor of p is defined as

  LOF_MinPts(p) = ( Σ_{o ∈ N_MinPts(p)} lrd_MinPts(o) / lrd_MinPts(p) ) / |N_MinPts(p)|

• MinPts is a parameter (k before)
• The outlier factor of object p captures the degree to which we call p an outlier: it is the average of the ratios of the local reachability densities of p’s MinPts-nearest neighbors to the local reachability density of p
• The lower p’s local reachability density is, and the higher the local reachability densities of p’s MinPts-nearest neighbors are, the higher the LOF value of p
Source: LOF: Identifying Density-Based Local Outliers

In the following section, the formal properties of LOF are made precise.
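Definitions 5-7 can be turned into a short sketch. This is a simplified illustration, not the MINDS implementation: it takes the neighborhood to be exactly the MinPts nearest points (ignoring distance ties) and assumes no duplicate points:

```python
import math

def lof_scores(points, min_pts=2):
    """Local Outlier Factor for each point (a sketch of Definitions 5-7).

    Simplifications: N_MinPts(p) is exactly the MinPts nearest points,
    and the data are assumed to contain no duplicates."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        neighbors.append(order[:min_pts])
        k_distance.append(d[i][order[min_pts - 1]])  # distance to MinPts-th neighbor

    # reach-dist_k(p, o) = max{ k-distance(o), d(p, o) }
    def reach_dist(p, o):
        return max(k_distance[o], d[p][o])

    # lrd(p) = inverse of the average reachability distance to p's neighbors
    lrd = [len(neighbors[i]) / sum(reach_dist(i, o) for o in neighbors[i])
           for i in range(n)]

    # LOF(p) = average of lrd(o) / lrd(p) over p's neighbors
    return [sum(lrd[o] / lrd[i] for o in neighbors[i]) / len(neighbors[i])
            for i in range(n)]
```

For a tight cluster plus one isolated point, the cluster points get LOF ≈ 1 while the isolated point gets a LOF well above 1, matching the intuition behind Definition 7.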
To simplify notation, we drop the subscript MinPts from reach-dist, lrd, and LOF if no confusion arises.

LOF Bounds
[Figure 3: Illustration of Theorem 1 with MinPts = 3. In the example, dmin = 4·imax and dmax = 6·imin, so 4 ≤ LOF_MinPts(p) ≤ 6. Source: LOF: Identifying Density-Based Local Outliers]

In particular, we hope to label o2 as outlying, but label all objects in the cluster C1 as non-outlying. Below we show that for most objects in C1 the LOF is approximately 1, indicating that they cannot be labeled as outlying.

Lemma 1: Let C be a collection of objects. Let reach-dist-min denote the minimum reachability distance of objects in C, i.e., reach-dist-min = min { reach-dist(p, q) | p, q ∈ C }. Similarly, let reach-dist-max denote the maximum reachability distance of objects in C. Let ε be defined as (reach-dist-max / reach-dist-min − 1). Then for all objects p ∈ C such that
(i) all the MinPts-nearest neighbors q of p are in C, and
(ii) all the MinPts-nearest neighbors o of q are also in C,
it holds that 1/(1 + ε) ≤ LOF(p) ≤ (1 + ε).
Proof (Sketch): For all MinPts-nearest neighbors q of p, reach-dist(p, q) ≥ reach-dist-min. Then the local reachability density of p, as per Definition 6, is ≤ 1/reach-dist-min. On the other hand, reach-dist(p, q) ≤ reach-dist-max, so the local reachability density of p is ≥ 1/reach-dist-max. The same bounds hold for the neighbors of p, and combining them yields 1/(1 + ε) ≤ LOF(p) ≤ (1 + ε).
Meaning: LOF(p) is close to 1 if p is in the same cluster as its neighbors (i.e., p is not an outlier).
Source: LOF: Identifying Density-Based Local Outliers

LOF Bounds
Among the MinPts-nearest neighbors of p, we in turn find their minimum and maximum reachability distances to their own MinPts-nearest neighbors; in the figure, these indirectmin(p) and indirectmax(p) values are marked as imin and imax respectively.
Theorem 1: Let p be an object from the database D, and 1 ≤ MinPts ≤ |D|. Then it is the case that

  directmin(p) / indirectmax(p) ≤ LOF(p) ≤ directmax(p) / indirectmin(p)

• directmin(p) ... minimum reachability distance from p to its MinPts closest neighbors
• directmax(p) ... maximum such reachability distance
• indirectmin(p), indirectmax(p) ... the same quantities for the MinPts closest neighbors of the MinPts closest neighbors
Source: LOF: Identifying Density-Based Local Outliers

LOF Bounds
[Figure 4: Upper and lower bounds on LOF as a function of the ratio direct/indirect, for pct = 1%, 5%, 10%, where pct is the relative distance from the cluster center; pct is small for points with neighbors in the same cluster. Source: LOF: Identifying Density-Based Local Outliers]
LOF Bounds
pct = relative distance from the center
• pct is small for a point p whose MinPts neighbors lie in the same cluster, no matter whether p itself is in that cluster or not
• Thus the bounds are good in both cases
Outliers
• p is NOT in the same cluster as its closest neighbors
• LOF is large, and the bounds are tight
Non-outliers
• LOF is close to 1, and the bounds are tight

Other Data Mining Methods for Outlier Detection
Distance to the k-th closest neighbor
• Dk(p) ... distance from the point p to its k-th nearest neighbor
• outliers: the n points with the largest Dk(p) values
Nearest neighbor (NN)
• Same as above with k = 1
• An “outlier threshold” is determined from training data (“normal behavior” data): the nearest-neighbor distances of all training points are computed and sorted, and the threshold is set so that 2% of them exceed it; all test points with nearest-neighbor distances greater than the threshold are detected as outliers
Mahalanobis-distance based outlier detection
• Since the training data correspond to “normal behavior,” it is straightforward to compute the mean μ and the covariance matrix Σ of the normal data
• Mahalanobis distance between a point p and the mean μ of the normal data:

  dM = sqrt( (p − μ)ᵀ Σ⁻¹ (p − μ) )

• The threshold is determined similarly to the previous approach
• outliers: the n points farthest (in dM) from the mean μ of the normal data

Advantage of LOF-based Detection
Nearest neighbor approach
• points in C1 are far apart; p2 is closer to C2 than that
• p2 will NOT be detected as an outlier, but p1 will
LOF approach
• considers the density of clusters
• both p1 and p2 WILL be detected as outliers
[Figure 5: Advantages of the LOF approach. Source: Data Warehousing and Data Mining Techniques for Cyber Security]
MINDS Performance
Computational Complexity
Neighborhoods around all data points
• Require calculating distances between all pairs of data points
• This is an O(n²) calculation – computationally infeasible for millions of data points
Sampling a training data set from the observed data
• Compare all data points to this small sample only
• Reduced complexity: O(data size × sample size)
• Additional benefit: anomalous behavior appears rarely in the sample, so it will not be mistaken for normal behavior

Performance of MINDS
Worms
• October 10, 2002: MINDS detected two activities of the slapper worm that were not identified by SNORT, since they were new variations of existing worm code
Scanning and DoS activities
• August 9, 2002: CERT/CC issued an alert for “widespread scanning and possible denial of service activity targeted at the Microsoft-DS service on port 445/TCP”
• August 13, 2002: Network connections related to this scanning were the top-ranked outliers in MINDS
• The port-scan module of SNORT failed to detect it (slow scanning)

Performance of MINDS
Policy Violations
• August 8 and 10, 2002: MINDS detected a machine running a Microsoft PPTP VPN server, and another one running an FTP server on non-standard ports
• Both policy violations were the top-ranked outliers: since they are not allowed, they are very rare
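The sampling-based complexity reduction above can be sketched as follows. This is a crude illustration with assumed names, not the MINDS scoring code: each point is scored only against a random sample, so the work is O(n × sample size) instead of O(n²):

```python
import math
import random

def sampled_anomaly_scores(points, sample_size=100, k=3, seed=0):
    """Score each point by its average distance to the k nearest
    points of a random sample (a stand-in for a density-based score)."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    scores = []
    for p in points:
        dists = sorted(math.dist(p, s) for s in sample)
        scores.append(sum(dists[:k]) / k)  # crude anomaly score
    return scores
```

Because anomalous points are rare, they are unlikely to dominate the sample, so they still stand out with high scores rather than being absorbed into the “normal” reference set.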
Performance of MINDS
Policy Violations
• February 6, 2003: MINDS detected unsolicited ICMP echo reply messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)
• The infected machine had been removed from the network, but other infected machines outside the local network were still trying to talk to it

Performance of MINDS vs. SNORT
Scanning
• SNORT rule: a 3-second window with more than 4 destination IPs for one source IP
• SNORT requires longer windows for slow scans
• MINDS detects fast and slow scans automatically
• MINDS detects outbound scans; SNORT does not by default
Policy violations
• SNORT requires a rule for each violation
• MINDS detects any rare, unusual behavior

Mining Association Rules
Summarizing Anomalous Connections Using Association Rules
Association rules
• Implications X → Y (for binary features X and Y)
• Example: {Bread, Butter} → {Milk}
Mining association rules
• Find all rules with a specified support (percentage of the data)
• and with a specified confidence (an estimate of P(Y|X))
Extracted rules
• The item sets meeting the support requirement are called “frequent item sets”
Summarizing Anomalous Connections Using Association Rules
Example rule: {Bread, Butter} → {Milk}
Assume
• 10% of transactions contain bread and butter
• 6% of transactions contain bread, butter, and milk
Support of the rule
• Percentage of the data containing all items of the rule: 6%
Confidence of the rule
• Estimate of P({Milk} | {Bread, Butter}): 6% / 10% = 60%

Summarizing Anomalous Connections Using Association Rules
Example rule: {Bread, Butter} → {Milk}
• Support of the rule: 6%
• Confidence of the rule: 60%
If the requirement is 1% support and 50% confidence
• then the rule will be extracted
• Frequent item set: {Bread, Butter, Milk}

Summarizing Anomalous Connections Using Association Rules
Association patterns
• specified as frequent item sets or association rules
Summary of detected anomalous flows
• Humans can analyze the summary
• E.g. a frequent set for scanning: {srcIP = X, dstPort = Y}
• This is a candidate signature for a signature-based system
Constructing a profile of normal network traffic
• Identify sets of features that are found in normal traffic
• E.g. web browsing: {protocol = TCP, dstPort = 80, NumPackets = 3…6}
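The support and confidence computations in the bread-butter-milk example can be written out directly (`rule_stats` is a hypothetical helper, not part of any rule-mining library):

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(transactions)
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))        # X occurs
    n_both = sum(1 for t in transactions if both <= set(t))  # X and Y occur
    support = n_both / n                        # fraction with X and Y
    confidence = n_both / n_a if n_a else 0.0   # estimate of P(Y | X)
    return support, confidence
```

With 100 transactions of which 10 contain bread and butter and 6 of those also contain milk, this returns support 0.06 and confidence 0.6, matching the example; a miner would extract the rule under a 1% support / 50% confidence requirement.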
Challenges in Mining Association Rules
Imbalanced class distribution
• Small support for attacks, large support for normal traffic
Binarization and grouping of attribute values
• Supervised or unsupervised
Pruning redundant patterns
• Redundant patterns are subsets with similar support
Finding discriminating patterns
• Patterns that distinguish attacks from normal traffic
Grouping the discovered patterns