A Data Mining Approach for Building
Cost-Sensitive and Light Intrusion
Detection Models
PI Meeting - July, 2000
North Carolina State University
Columbia University
Florida Institute of Technology
Overview
• Project description
• Progress report:
– correlation
– cost-sensitive modeling
– anomaly detection
– collaboration with industry
• Plan of work for 2000-2001
New Ideas/Hypotheses
• High-volume automated attacks can overwhelm an
IDS and its staff.
• Use cost-sensitive data mining algorithms to
construct ID models that consider cost factors:
– damage cost, response cost, operational cost, etc.
• Multiple specialized and light ID models can be
dynamically activated/configured at run time.
• Cost-effectiveness as the guiding principle and
multi-model correlation as the architectural
approach.
Impact
• A better understanding of the cost factors, cost
models, and cost metrics related to intrusion
detection.
• Modeling techniques and deployment strategies for
cost-effective IDSs.
• “Clustering” techniques for grouping intrusions and
building specialized and light models.
• An architecture for dynamically activating,
configuring, and correlating ID models.
Correlation: Model and Issues
[Diagram: correlating ID models across sources and across time/sources]
• “Good” base models: data sources and
modeling techniques.
• The combined model: the correlation
algorithms and network topology.
Correlation: Approaches
• Extend previous work in JAM
• A sequence of time-stamped records
– each is composed of signals from multiple sensors
(network topology information embedded);
• Apply data mining techniques to learn how
to correlate the signals to generate a
combined sensor:
– link analysis, sequence analysis, machine learning
(classification), etc.
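A minimal sketch of this record representation and a toy combined sensor (a per-sensor weighted vote learned from labeled records); the field names and the voting scheme are illustrative assumptions, not the project's correlation algorithm:

    from dataclasses import dataclass

    @dataclass
    class Record:
        timestamp: float
        signals: dict   # sensor name -> alert level in [0, 1]
        label: int = 0  # 1 = intrusion (known only in training data)

    def train_combined_sensor(records):
        """Weight each sensor by how often its alerts coincide with intrusions."""
        hits, totals = {}, {}
        for r in records:
            for sensor, level in r.signals.items():
                hits[sensor] = hits.get(sensor, 0.0) + level * r.label
                totals[sensor] = totals.get(sensor, 0.0) + level
        return {s: hits[s] / totals[s] for s in totals if totals[s] > 0}

    def combined_signal(weights, record):
        """Weighted vote over the sensors firing in one record."""
        return sum(weights.get(s, 0.0) * lvl for s, lvl in record.signals.items())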
Correlation: Integrating NM and
ID Signals
• A stream of measures (anomaly reports) on
MIB variables of network elements and a
stream of ID signals:
– Better coverage;
– Early sensing of attacks.
• Normal measures of network traffic and
parameter values of ID signatures:
– S = f(N, A); if the attack behavior A is invariant, then S = g(N).
– Automatic parameter adjustment: S1 = g(N1).
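A minimal illustration of the S = g(N) idea, assuming a linear g fitted from calibration pairs; both the form of g and the numbers below are assumptions made for illustration:

    # Sketch of S = g(N): refit a signature parameter S from the current
    # normal-traffic measure N. The linear form of g and the calibration
    # data below are illustrative assumptions only.
    def fit_g(samples):
        """Least-squares fit of S = a*N + b from (N, S) calibration pairs."""
        n = len(samples)
        mean_n = sum(x for x, _ in samples) / n
        mean_s = sum(y for _, y in samples) / n
        a = (sum((x - mean_n) * (y - mean_s) for x, y in samples)
             / sum((x - mean_n) ** 2 for x, _ in samples))
        return lambda n1: a * n1 + (mean_s - a * mean_n)

    g = fit_g([(100, 120), (200, 230), (400, 450)])  # hypothetical (N, S) pairs
    s1 = g(300)                                      # S1 = g(N1) for a new traffic level N1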
Cost Factors of IDSs
• Attack taxonomy: result/target/technique
• Development cost
• Damage cost (DCost)
– The amount of damage when ID is not available or
ineffective.
• Response cost (RCost)
– The cost of acting upon an alarm of potential intrusion.
• Operational cost (OpCost)
– The cost of processing and analyzing audit data;
– Mainly the computational costs of the features.
Cost Models of IDSs
• The total cost of an IDS over a set of
events:
• CumulativeCost(E) = Σ_{e ∈ E} (CCost(e) + OpCost(e))
• CCost(e), the consequential cost, depends
on the prediction made for event e
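A one-line rendering of the cost model, assuming ccost and op_cost are supplied as functions:

    # Sketch: total cost of an IDS over a set of events, per the formula
    # above. `ccost` and `op_cost` are assumed given; CCost itself is
    # specified by the table on the next slide.
    def cumulative_cost(events, ccost, op_cost):
        return sum(ccost(e) + op_cost(e) for e in events)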
Consequential Cost (CCost)
• For event e:

Outcome              CCost(e)                 Conditions
Miss (FN)            DCost(e)
False Alarm (FP)     RCost(e') + PCost(e)     DCost(e') ≥ RCost(e')
                     0                        otherwise
Hit (TP)             RCost(e) + DCost(e)      DCost(e) ≥ RCost(e)
                     DCost(e)                 otherwise
Normal (TN)          0
Misclassified Hit    RCost(e') + DCost(e)     DCost(e') ≥ RCost(e')
                     DCost(e)                 otherwise
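A direct transcription of the table into code, assuming DCost, RCost, and PCost are given lookup functions and e' denotes the event type the IDS actually predicted; a sketch, not the project's implementation:

    # Direct transcription of the table above. dcost, rcost, pcost are
    # assumed lookup functions on event types; e_prime is the event type
    # the IDS actually predicted for e.
    def consequential_cost(outcome, e, e_prime, dcost, rcost, pcost):
        if outcome == "miss":                # FN: intrusion not detected
            return dcost(e)
        if outcome == "false_alarm":         # FP: response wasted on a normal event
            return rcost(e_prime) + pcost(e) if dcost(e_prime) >= rcost(e_prime) else 0
        if outcome == "hit":                 # TP: intrusion detected and responded to
            return rcost(e) + dcost(e) if dcost(e) >= rcost(e) else dcost(e)
        if outcome == "normal":              # TN: nothing to do
            return 0
        if outcome == "misclassified_hit":   # intrusion detected as the wrong type
            return rcost(e_prime) + dcost(e) if dcost(e_prime) >= rcost(e_prime) else dcost(e)
        raise ValueError(outcome)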
Cost-sensitive Modeling: Objectives
• Reducing operational costs:
– Use cheap features in ID models.
• Reducing consequential costs:
– Do not respond to an intrusion if RCost >
DCost.
Cost-sensitive Modeling: Approaches
• Reducing operational costs:
– A multiple-model approach:
• Build multiple rule-sets, each with features of different
cost levels;
• Use cheaper rule-sets first; invoke costlier ones
only when needed for the required accuracy.
– Feature-Cost-Sensitive Rule Induction:
• Search heuristic considers information gain AND
feature cost.
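One common way to fold feature cost into the search heuristic is to discount a feature's information gain by its measurement cost; the gain/cost ratio below is one such choice, not necessarily the exact combination the project uses:

    import math
    from collections import Counter, defaultdict

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def cost_sensitive_score(rows, labels, feature, cost):
        """Information gain of `feature`, discounted by its computational cost."""
        base = entropy(labels)
        splits = defaultdict(list)
        for row, y in zip(rows, labels):
            splits[row[feature]].append(y)
        remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in splits.values())
        return (base - remainder) / cost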
Cost-sensitive Modeling: Approaches
(continued)
• Reducing consequential costs:
– MetaCost:
• Purposely re-label intrusions with RCost > DCost as
normal.
– Post-Detection decision:
• Action depends on comparison of RCost and DCost.
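The post-detection decision reduces to a comparison, sketched here with dcost and rcost assumed to be per-event estimates:

    # Sketch of the post-detection decision: act on a predicted intrusion
    # only when its estimated damage outweighs the response cost.
    # dcost and rcost are assumed per-event estimates.
    def should_respond(predicted_intrusion, dcost, rcost):
        return predicted_intrusion and dcost >= rcost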
Latest Results
• OpCost
– Compare the multiple-model approach with single-model
approach;
– rdc%: (single - multiple)/single;
– range: 57% to 79%.
[Bar chart: average OpCost per connection, single-model vs. multiple-model]
Latest Results (continued)
• CCost using a post-detection cost-sensitive
decision module
– rdc% range: 75% to 95%;
– Compared with single model: slightly better rdc%;
– Compared with cost-insensitive models: 25% higher rdc%.
[Bar chart: total CCost for cost-sensitive (CS) and cost-insensitive (CI) single and multiple models]
Anomaly Detection
• Unsupervised Training Methods
– Build models over noisy (not clean) data
• Artificial Anomalies
– Improve the performance of anomaly detection
methods.
• Combining misuse and anomaly detection.
AD over Noisy Data
• Builds normal models over data containing
some anomalies.
• Motivating Assumptions:
– Intrusions are extremely rare compared to
normal events.
– Intrusions are quantitatively different.
Approach Overview
• Mixture Model
– Normal Component
– Anomalous Component
• Build Probabilistic Model of Data
• Max Likelihood test for detection.
Mixture Model of Anomalies
• Assume a generative model: The data is
generated with a probability distribution D.
• Each element originates from one of two
components.
– M, the Majority Distribution (x ∈ M).
– A, the Anomalous Distribution (x ∈ A).
• Thus: D = (1-λ)M + λA, where λ is the (small)
fraction of anomalies.
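A sketch of sampling from this generative model; the Gaussian choices for M and A are illustrative assumptions only:

    import random

    # Sketch of the generative model D = (1-λ)M + λA: with probability lam
    # an element comes from the anomalous distribution A, otherwise from
    # the majority distribution M. The Gaussians are assumptions.
    def sample_mixture(n, lam=0.01):
        return [random.gauss(10, 3) if random.random() < lam   # component A
                else random.gauss(0, 1)                        # component M
                for _ in range(n)]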
Modeling Probability Distributions
• Train Probability Distributions over current
sets of M and A.
• PM(X) = probability distribution for
Majority
• PA(X) = probability distribution for
Anomaly
• Any probability modeling method can be
used: Naïve Bayes, Max Entropy, etc.
Detecting Anomalies
• Likelihood of a partition of the set of all
elements D into M and A:
L(D) = ∏_{x ∈ D} P_D(x)
     = ((1-λ)^|M| · ∏_{x ∈ M} P_M(x)) · (λ^|A| · ∏_{x ∈ A} P_A(x))
• Log likelihood (for computational reasons):
LL(D) = log(L(D))
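A sketch of this computation for categorical data, with P_M and P_A estimated as add-one-smoothed frequency tables (the smoothing choice is an assumption):

    import math
    from collections import Counter

    # Sketch: log-likelihood of a partition of D into majority set M and
    # anomaly set A, with P_M and P_A estimated as add-one-smoothed
    # categorical distributions.
    def log_likelihood(M, A, lam, alphabet):
        def logp(xs):
            counts, k = Counter(xs), len(alphabet)
            return {v: math.log((counts[v] + 1) / (len(xs) + k)) for v in alphabet}
        pm, pa = logp(M), logp(A)
        ll = len(M) * math.log(1 - lam) + sum(pm[x] for x in M)
        ll += len(A) * math.log(lam) + sum(pa[x] for x in A) if A else 0
        return ll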
Algorithm for Detection
• Assume all elements are normal (M0 = D,
A0 = ∅).
• Compute PD(X).
• Using PD(X) compute LL(D).
• For each element compute difference in
LL(D) if removed from M and inserted into
A.
• If the difference is large enough, then
declare the element an anomaly.
Evaluating xt
Mt+1 = Mt − {xt}
At+1 = At ∪ {xt}
Recompute PMt+1 and PAt+1 (efficiently).
If (LLt+1 − LLt) > threshold, xt is an anomaly;
otherwise xt is normal.
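Putting the loop together (reusing log_likelihood from the previous sketch; lam and the threshold are illustrative values, not tuned ones):

    # Sketch of the full detection loop, assuming log_likelihood from the
    # previous sketch is in scope.
    def detect_anomalies(D, lam=0.01, threshold=5.0):
        alphabet = set(D)
        M, A = list(D), []                    # M0 = D, A0 = empty
        ll = log_likelihood(M, A, lam, alphabet)
        anomalies = []
        for x in list(M):
            M.remove(x)                       # tentatively move x into A
            A.append(x)
            ll_new = log_likelihood(M, A, lam, alphabet)
            if ll_new - ll > threshold:       # large likelihood gain: anomaly
                anomalies.append(x)
                ll = ll_new
            else:                             # revert: x stays normal
                A.pop()
                M.append(x)
        return anomalies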
Experiments
• Two Sets of experiments:
– Measured performance against comparison
methods over noisy data.
– Measured performance when trained over noisy
data against comparison methods trained over
clean data.
AD Using Artificial Anomalies
• Generate abnormal behavior artificially
– assume the given normal data are representative
– “near misses” of normal behavior are considered
abnormal
– change the value of only one feature in an instance
of normal behavior
– sparsely represented values are sampled more
frequently
– “near misses” help define a tight boundary
enclosing the normal behavior
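A sketch of this "near miss" generation: copy a normal instance and change one feature to a different value of that feature, weighting rare values more heavily; the inverse-frequency weighting is an assumption consistent with the slide's description:

    import random
    from collections import Counter

    # Sketch of "near miss" generation. Assumes each feature takes at
    # least two distinct values in the normal data.
    def artificial_anomaly(normal_rows):
        row = dict(random.choice(normal_rows))
        feature = random.choice(list(row))
        counts = Counter(r[feature] for r in normal_rows)
        candidates = [v for v in counts if v != row[feature]]
        weights = [1.0 / counts[v] for v in candidates]  # rarer value -> higher weight
        row[feature] = random.choices(candidates, weights=weights)[0]
        return row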
Experimental Results
• Learning algorithm: RIPPER rule learner.
• Data: 1998/99 DARPA evaluation
– U2R, R2L, DOS, PRB: 22 “clusters”
• Training data: normal and artificial anomalies
• Results
– Overall hit rate: 94.26% (correctly classified as normal or intrusion)
– Overall false alarm rate: 2.02%
– 100% detection: buffer_overflow, guess_passwd, phf,
back
– 0% detection: perl, spy, teardrop, ipsweep, nmap
– 50+% detection: 13 out of 22 intrusion subclasses
Combining Anomaly And Misuse
Detection
• Training data: normal, artificially generated
anomalies, known intrusions
• The learned model can predict normal,
anomaly, or known intrusion subclass
• Experiments were performed on increasing
subsets of known intrusion subclasses in the
training data (simulating new intrusions being
identified over time).
Combining Anomaly And Misuse
Detection (continued)
• Suppose phf, pod, teardrop, spy, and smurf
are unknown (absent from the training data)
• Anomaly detection rate: phf=25%,
pod=100%, teardrop=93.91%, spy=50%,
smurf=100%
• Overall false alarm rate: 0.20%
• The false alarm rate dropped from 2.02% to
0.20% when some known attacks were
included in training
Collaboration with Industry
• RST Inc.
– Anomaly detection on NT systems
• NFR Inc.
– real-time IDS
• SAS Institute
– off-line ID (funded by SAS)
• Aprisma (Cabletron)
– Integrating ID with NM (funded by Aprisma)
• HRL Labs
– ID in wireless networks (funded by HRL)
Plan for 2000-2001
• Dynamic cost-sensitive modeling and
deployment
– work with industry for realistic cost analysis
and real-time testing
• Anomaly detection
– improve existing algorithms using feedback
from evaluation
• Correlation
– develop/evaluate algorithms for integrating
data/evidence from multiple sources