INFOCOM’2011, Shanghai, China

Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection

Jing Gao¹, Wei Fan², Deepak Turaga², Olivier Verscheure², Xiaoqiao Meng², Lu Su¹, Jiawei Han¹
¹ Department of Computer Science, University of Illinois
² IBM T. J. Watson Research Center

INPUT: multiple simple atomic detectors
OUTPUT: an optimization-based combination that is mostly consistent with all atomic detectors

Network Traffic Anomaly Detection
[Figure: a computer network producing a stream of traffic records]

Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes
1    206.135.38.95  11:07:20    160.94.179.223  139        192
2    206.163.37.95  11:13:56    160.94.179.219  139        195
3    206.163.37.95  11:14:29    160.94.179.217  139        180
4    206.163.37.95  11:14:30    160.94.179.255  139        199
5    206.163.37.95  11:14:32    160.94.179.254  139        19
6    206.163.37.95  11:14:35    160.94.179.253  139        177
7    206.163.37.95  11:14:36    160.94.179.252  139        172
8    206.163.37.95  11:14:38    160.94.179.251  139        285
9    206.163.37.95  11:14:41    160.94.179.250  139        195
…

Anomalous or normal?

Challenges
• The normal behavior can be too complicated to describe.
• Some normal data can look similar to true anomalies.
• Labeling current anomalies is expensive and slow.
• Network attacks adapt continuously: what we knew in the past may not work today.

The Problem
• Simple rules (atomic rules) are relatively easy to craft.
• Problem:
  – there can be far too many simple rules
  – each rule can have a high false-alarm (false-positive) rate
• Challenge: can we find a non-trivial combination of them (per event, per detector) that significantly improves accuracy?

Why Do We Need to Combine Detectors?
[Figure: alarms raised by individual detectors that threshold Count and Entropy at ranges 0.1-0.5, 0.3-0.7 and 0.5-0.9. Each individual view raises too many alarms; the combined view is better than any individual view.]
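Each atomic detector in this setting can be as simple as one threshold rule on one per-interval measure. The sketch below is a minimal illustration, not the detectors used in the experiments: it builds a bank of such rules over two hypothetical normalized features named count and entropy. The feature names, the threshold values and the ">" direction of each rule are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A record is one time interval described by named, normalized features.
# The feature names used below ("count", "entropy") are illustrative assumptions.
Record = Dict[str, float]
Detector = Tuple[str, Callable[[Record], bool]]


def make_threshold_detector(feature: str, threshold: float) -> Detector:
    """Atomic rule (assumed form): flag a record when feature > threshold."""
    def detect(record: Record) -> bool:
        return record[feature] > threshold
    return (f"{feature}>{threshold}", detect)


def build_detector_bank(features: List[str], thresholds: List[float]) -> List[Detector]:
    """One atomic detector per (feature, threshold) pair."""
    return [make_threshold_detector(f, t) for f in features for t in thresholds]


def run_bank(bank: List[Detector], records: List[Record]) -> List[List[bool]]:
    """Return an n x k matrix: entry [i][j] is True if detector j flags record i."""
    return [[detect(r) for _, detect in bank] for r in records]
```

Running every rule on every record yields the records-by-detectors Y/N matrix that the combination step takes as input. With several thresholds per feature, such a bank quickly raises many alarms, which is why the individual views need to be combined.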
Combining Detectors
• Combining detectors is non-trivial:
  – We aim to find a consolidated solution without any knowledge of the true anomalies (unsupervised).
  – The solution can be improved with limited supervision and updated incrementally (semi-supervised and incremental).
  – We do not know which atomic detectors are better and which are worse.
  – At any given moment, the best answer may be a non-trivial, dynamic combination of atomic detectors.
  – There may be more bad base detectors than good ones, in which case majority voting fails.

Problem Formulation
Which one is an anomaly?
[Table: records 1, 2, …, 7, … (rows) against atomic detectors A1, A2, …, Ak-1, Ak (columns); each entry is Y (anomalous) or N (normal), and a final Consensus column is to be determined.]
Combine the atomic detectors into one! We propose a non-trivial combination that is:
1. mostly consistent with all atomic detectors
2. derived from an optimization-based framework

How to Combine Atomic Detectors?
• Linear models
  – As long as one detector is correct, there always exist weights that combine the detectors linearly.
  – Question: how to find these weights
  – Weights are set per example and per detector
• Principles
  – Different from majority voting and model averaging
  – Consensus considers performance over a set of examples and weights each detector by its performance relative to the others, i.e., the examples are no longer treated as i.i.d.
  – Consensus: the combined output is mostly consistent with all atomic detectors
  – Atomic detectors are assumed to be better than random guessing and not systematically flipped
  – Atomic detectors should be weighted according to their detection performance
  – Records should be ranked by their probability of being an anomaly
• Algorithm
  – Reach consensus among multiple atomic anomaly detectors, in unsupervised, semi-supervised and incremental settings
  – Automatically derive weights of atomic detectors and records, per detector and per event: no single weight works for all situations
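The Y/N outputs in the Problem Formulation can be turned into the record-detector adjacency that the consensus framework operates on. This sketch assumes, following the [1 0] / [0 1] labels in the framework figure, that each atomic detector contributes two nodes: one for the records it flags (initial probability [1 0]) and one for the records it passes ([0 1]). The function names are hypothetical. Majority voting is included only as the baseline the slides contrast against.

```python
from typing import List, Tuple


def to_bipartite(outputs: List[List[bool]]) -> Tuple[List[List[int]], List[Tuple[float, float]]]:
    """outputs[i][d] is True if atomic detector d flags record i.

    Returns the adjacency a (n x 2k) and the initial probabilities y (length 2k).
    Assumed node layout: node 2d holds the records detector d flags,
    node 2d + 1 holds the records detector d passes.
    """
    n = len(outputs)
    k = len(outputs[0]) if n else 0
    a = [[0] * (2 * k) for _ in range(n)]
    for i, row in enumerate(outputs):
        for d, flagged in enumerate(row):
            a[i][2 * d if flagged else 2 * d + 1] = 1
    y: List[Tuple[float, float]] = []
    for _ in range(k):
        y.append((1.0, 0.0))  # anomalous node of this detector
        y.append((0.0, 1.0))  # normal node of this detector
    return a, y


def majority_vote(outputs: List[List[bool]]) -> List[bool]:
    """Baseline: a record is anomalous if more than half of the detectors flag it."""
    return [sum(row) > len(row) / 2 for row in outputs]
```

Every row of a has exactly one 1 per detector, so each record is connected to k of the 2k detector nodes. Majority voting looks at one record at a time, whereas the consensus updates weigh each detector node by the records it shares with the others.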
Framework
[Figure: bipartite graph with detector nodes (for detectors A1, …, Ak, each node labeled [1 0] or [0 1]) on one side and record nodes on the other.]
• Record i: $u_i = [u_{i0}, u_{i1}]$, its probabilities of being anomalous and normal
• Detector node j: $q_j = [q_{j0}, q_{j1}]$, defined in the same way
• Adjacency: $a_{ij} = 1$ if $u_i$ is connected to $q_j$, and $a_{ij} = 0$ otherwise
• Initial probability of detector node j: $y_j = [1\ 0]$ if the node is anomalous, $[0\ 1]$ if it is normal

Objective
Minimize disagreement:
$$\min_{Q,U}\Big(\sum_{i=1}^{n}\sum_{j=1}^{v} a_{ij}\,\|u_i - q_j\|^2 + \alpha\sum_{j=1}^{v}\|q_j - y_j\|^2\Big)$$
where n is the number of records, v the number of detector nodes, and $\alpha > 0$ weights the second term.
• First term: a record and a detector node it is connected to should have similar probabilities of being an anomaly.
• Second term: detector probabilities should not deviate much from their initial values.

Methodology
Iterate until convergence:
• Update detector probability:
$$q_j = \frac{\sum_{i=1}^{n} a_{ij}\,u_i + \alpha\,y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}$$
• Update record probability:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\,q_j}{\sum_{j=1}^{v} a_{ij}}$$
Each update sets the gradient of the objective with respect to that variable to zero while the other variables are held fixed.

Propagation Process
[Figure: example of probabilities propagating between detector nodes and record nodes; record probabilities start at [0.5 0.5] and are updated to values such as [0.7 0.3], [0.5285 0.4715] and [0.357 0.643].]

Consensus Combination Reduces Expected Error
• Detector A
  – has probability P(A)
  – outputs P(y|x,A) for record x, for y = 0 (normal) and y = 1 (anomalous)
• Expected error of a single detector:
$$Err^S = \sum_{A} P(A) \sum_{(x,y)} P(x,y)\,\big(P(y|x) - P(y|x,A)\big)^2$$
• Expected error of the combined detector:
$$Err^C = \sum_{(x,y)} P(x,y)\,\Big(P(y|x) - \sum_{A} P(A)\,P(y|x,A)\Big)^2$$
• The combined detector has a lower expected error, $Err^C \le Err^S$: the squared error is convex in the prediction, so by Jensen's inequality the error of the averaged prediction is at most the average error.

Extensions
• Semi-supervised
  – the labels of a few records are known in advance
  – incorporate this knowledge to improve the performance of the combined detector
• Incremental
  – records arrive continuously
  – update the combined detector incrementally

Incremental
When a new record n arrives (records 1, …, n-1 have already been processed):
• Update detector probability:
$$q_j = \frac{\sum_{i=1}^{n-1} a_{ij}\,u_i + a_{nj}\,u_n + \alpha\,y_j}{\sum_{i=1}^{n-1} a_{ij} + a_{nj} + \alpha}$$
• Update record probability:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\,q_j}{\sum_{j=1}^{v} a_{ij}}$$

Semi-supervised
• Iterate until
convergence:
• Update detector probability:
$$q_j = \frac{\sum_{i=1}^{n} a_{ij}\,u_i + \alpha\,y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}$$
• Update record probability of unlabeled records:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\,q_j}{\sum_{j=1}^{v} a_{ij}}$$
• Update record probability of labeled records, where $f_i$ is the known label vector of record i and $\beta > 0$ weights it:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\,q_j + \beta\,f_i}{\sum_{j=1}^{v} a_{ij} + \beta}$$

Benchmark Data Sets
• IDN
  – Data: a sequence of events (DoS flood, SYN flood, port scanning, etc.), partitioned into intervals
  – Detector: thresholds on two high-level measures describing the probability of observing events during each interval
• DARPA
  – Data: a series of TCP connection records collected by MIT Lincoln Laboratory; each record contains 34 continuous derived features, including duration, number of bytes and error rate
  – Detector: randomly select a subset of features and apply an unsupervised distance-based anomaly detection algorithm
• LBNL
  – Data: an enterprise traffic dataset collected at the edge routers of the Lawrence Berkeley National Lab; the packet traces were aggregated into 1-minute intervals
  – Detector: thresholds on six metrics: (1) the number of TCP SYN packets, (2)-(3) the number of distinct IPs in the source and in the destination, (4)-(5) the maximum number of distinct IPs an IP in the source or destination has contacted, and (6) the maximum pairwise distance between distinct IPs an IP has contacted

Experimental Setup
• Methods compared
  – baselines: base detectors, majority voting
  – consensus combination: unsupervised (consensus maximization), semi-supervised (2% of records labeled), stream (30% batch, 70% incremental)
• Evaluation measure
  – area under the ROC curve (AUC, from 0 to 1; 1 is best)
  – ROC curve: trade-off between detection rate and false alarm rate

AUC on Benchmark Data Sets

Data set  Worst   Best    Average  MV      UC      SC      IC
IDN       0.5269  0.6671  0.5904   0.7089  0.7255  0.7204  0.7270
IDN       0.2832  0.8059  0.5731   0.6854  0.7711  0.8048  0.7552
IDN       0.3745  0.8266  0.6654   0.8871  0.9076  0.9089  0.9090
DARPA     0.5804  0.6068  0.5981   0.7765  0.7812  0.8005  0.7730
DARPA     0.5930  0.6137  0.6021   0.7865  0.7938  0.8173  0.7836
DARPA     0.5851  0.6150  0.6022   0.7739  0.7796  0.7985  0.7727
LBNL      0.5005  0.8230  0.7101   0.8165  0.8180  0.8324  0.8160

Worst, Best, Average: worst, best and average performance of the atomic detectors. MV: majority voting among detectors. UC, SC, IC: unsupervised, semi-supervised and incremental versions of consensus combination.
Consensus combination improves anomaly detection performance!

Stream Computing
• Continuous ingestion
• Continuous complex analysis with low latency

Conclusions
• Consensus combination
  – combines multiple atomic anomaly detectors into a more accurate one in an unsupervised way
• We give
  – a theoretical analysis of the error reduction achieved by detector combination
  – extensions of the method to incremental and semi-supervised learning scenarios
  – experimental results on three network traffic datasets

Thanks!
• Any questions? Code available upon request.
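The update rules on the Methodology, Incremental and Semi-supervised slides can be sketched in plain Python as follows. This is an illustrative reading of the slides, not the authors' code (which is available only on request). The starting value [0.5 0.5], the default alpha=2.0, the convergence test, and all function and class names are assumptions. The incremental class is a simplification: it re-estimates only the newly arrived record and keeps earlier records' contributions fixed.

```python
from typing import List, Optional, Sequence

Vec = List[float]  # [P(anomalous), P(normal)]


def update_detectors(a: Sequence[Sequence[int]], u: Sequence[Vec],
                     y: Sequence[Vec], alpha: float) -> List[Vec]:
    """q_j = (sum_i a_ij u_i + alpha y_j) / (sum_i a_ij + alpha); alpha must be > 0."""
    q = []
    for j, y_j in enumerate(y):
        num = [alpha * y_j[0], alpha * y_j[1]]
        den = alpha
        for row, u_i in zip(a, u):
            num[0] += row[j] * u_i[0]
            num[1] += row[j] * u_i[1]
            den += row[j]
        q.append([num[0] / den, num[1] / den])
    return q


def update_records(a: Sequence[Sequence[int]], q: Sequence[Vec],
                   labels: Optional[Sequence[Optional[Vec]]] = None,
                   beta: float = 0.0) -> List[Vec]:
    """u_i = sum_j a_ij q_j / sum_j a_ij; a labeled record also gets
    beta * f_i added to the numerator and beta to the denominator."""
    u = []
    for i, row in enumerate(a):
        f = labels[i] if labels is not None else None
        num = [beta * f[0], beta * f[1]] if f is not None else [0.0, 0.0]
        den = beta if f is not None else 0.0
        for a_ij, q_j in zip(row, q):
            num[0] += a_ij * q_j[0]
            num[1] += a_ij * q_j[1]
            den += a_ij
        u.append([num[0] / den, num[1] / den])  # assumes every record has an edge
    return u


def consensus(a, y, alpha=2.0, labels=None, beta=0.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until record probabilities stop changing."""
    u = [[0.5, 0.5] for _ in a]  # uninformative start
    q = update_detectors(a, u, y, alpha)
    for _ in range(max_iter):
        new_u = update_records(a, q, labels, beta)
        q = update_detectors(a, new_u, y, alpha)
        delta = max(abs(x - z) for old, new in zip(u, new_u) for x, z in zip(old, new))
        u = new_u
        if delta < tol:
            break
    return u, q


class IncrementalConsensus:
    """Keeps the running numerators and denominators of the detector update,
    so adding one record costs O(v) instead of a full re-run."""

    def __init__(self, y: Sequence[Vec], alpha: float = 2.0):
        self.num = [[alpha * y_j[0], alpha * y_j[1]] for y_j in y]
        self.den = [alpha] * len(y)

    def q(self) -> List[Vec]:
        return [[s[0] / d, s[1] / d] for s, d in zip(self.num, self.den)]

    def add(self, a_n: Sequence[int]) -> Vec:
        u_n = update_records([a_n], self.q())[0]  # estimate from the current q
        for j, a_nj in enumerate(a_n):
            self.num[j][0] += a_nj * u_n[0]
            self.num[j][1] += a_nj * u_n[1]
            self.den[j] += a_nj
        return update_records([a_n], self.q())[0]  # re-estimate with the updated q
```

For example, consensus(a, y) returns record probabilities u, and sorting records by u[i][0] gives the anomaly ranking. Passing labels with a [1.0, 0.0] or [0.0, 1.0] entry for each known record (None elsewhere) together with a positive beta gives the semi-supervised variant.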
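The evaluation measure, area under the ROC curve, can be computed directly from a ranking of records by their probability of being anomalous. AUC equals the probability that a randomly chosen anomalous record is ranked above a randomly chosen normal one. The pairwise loop below is a minimal sketch for small examples. The function name is hypothetical, and the slides do not say how AUC was computed in the experiments.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that an anomaly (label 1) is scored above a
    normal record (label 0); tied pairs count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Given consensus record probabilities u and ground-truth labels, auc([u_i[0] for u_i in u], labels) scores the combined detector. A sort-based rank computation with average ranks for ties gives the same value in O(n log n).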