Balancing Risk and Utility in Flow Trace Anonymization
Martin Burkhart, ETH Zurich
Joint work with Daniela Brauckhoff, Elisa Boschi, Martin May

Motivation
* Sharing of traffic measurements is crucial:
  * Only a limited set of sources is available
  * Reproducibility of results
  * Dynamics / variability of traffic
  * Getting the big picture (e.g. Internet Storm Center)
  * Keeping up with globalized attacks (e.g. botnets)
* More and more traces are collected but not shared, because of:
  * Data protection legislation
  * Security concerns
  * Competitive advantage

State of the Art: Anonymization
* Black marking
* Truncation, e.g. of the last bits of IP addresses
* Permutation: random, or (partial) prefix-preserving IP address permutation
* Enumeration, e.g. of timestamps: keep the logical order of events
* Categorization
* Randomization (data mining community)
* k-anonymity (data mining community)

The Tradeoff in Anonymization
* Anonymization is a tradeoff between risk and utility
[Figure: RU-Map with Utility(t) on the x-axis and Risk(t) on the y-axis, where t is the anonymization strength; an algorithm is plotted at t = 0.1, 0.2, 0.4, and 0.7, together with prefix-preserving and random permutation; the "sweet spot" is marked]
* Not quantitatively studied so far; metrics are lacking
* Strongly dependent on the application and the attacker model

A Case Study: IP Address Truncation
* Techniques that permute IP addresses 1:1 are reversible, e.g. via characteristic object sizes/frequencies, behavioral profiling, fingerprinting of active ports, or exploiting the prefix structure
* Apply IP address truncation and evaluate the risk and utility dimensions
* Lower risk: hosts are aggregated into subnets

  IP address      8 bits trunc.   16 bits trunc.
  123.45.67.89    123.45.67.0     123.45.0.0
  123.45.67.123   123.45.67.0     123.45.0.0
  123.45.12.34    123.45.12.0     123.45.0.0

* Lower utility: the resolution of entities is reduced
* Quantifying the tradeoff: how bad is it in numbers?
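The truncation in the example above can be sketched with Python's standard ipaddress module. This is a minimal illustration, assuming IPv4 addresses given as dotted-quad strings; the function name truncate_ipv4 is hypothetical and not taken from the authors' tooling. Truncating x bits zeroes the x low-order bits, which maps every host onto its /(32 - x) prefix.

```python
import ipaddress

def truncate_ipv4(addr: str, bits: int) -> str:
    """Zero the `bits` least-significant bits of an IPv4 address.

    Truncating x bits maps each host onto its /(32 - x) prefix, so all
    hosts sharing that prefix become indistinguishable.
    """
    if not 0 <= bits <= 32:
        raise ValueError("bits must be in the range 0..32")
    value = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF  # keep the top 32 - bits bits
    return str(ipaddress.IPv4Address(value & mask))

if __name__ == "__main__":
    # Reproduce the rows of the truncation example: original, 8 bits, 16 bits.
    for ip in ("123.45.67.89", "123.45.67.123", "123.45.12.34"):
        print(ip, truncate_ipv4(ip, 8), truncate_ipv4(ip, 16))
```

An equivalent route is `ipaddress.ip_network(f"{addr}/{32 - bits}", strict=False).network_address`; the explicit bit mask is used here only to make the "zero the last x bits" operation visible.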

Internal vs. External Prefixes
[Figure: unique address count (log scale) versus prefix length (32 - x) for external and internal (AS 559) addresses; at x = 8 the reductions are annotated as factor 3 and factor 53]
* Asymmetry in prefixes between external and internal addresses
* Is this reflected in the risk reduction? In the utility reduction?

Measuring Utility of Truncated Data
* Specific application: anomaly detection
* Compare the detection quality for scans and (D)DoS attacks in original and truncated data
* Two IP-based metrics:
  * Unique address count
  * Address entropy
* 3 weeks of NetFlow data (~43 billion flows) from the SWITCH network

Measuring Detection Quality
* Ground truth: manual identification of scans and (D)DoS attacks
* Run a Kalman filter on the metric time series
* Vary the detection threshold
* Utility is measured by the AUC (area under the ROC curve)

Utility of Truncated Data
* Internal metrics degrade faster than external metrics
* Counts degrade faster than entropy

Approximating Risk of Host Identification
* In general, truncating x bits leads to $2^{32-x}$ prefixes with $2^{x}$ addresses per prefix
* But only a fraction A of the potential addresses is usually active
[Figure: host addresses 1 to 255 within the prefix 129.130.80.0/24, of which only some are active, e.g. A = 10%]
* Hence, on average there are $A \cdot 2^{x}$ active addresses per prefix

Risk of Truncated Data
$\mathrm{risk}(x) = \frac{1}{A \cdot 2^{x}}$ (the chance of picking the right host among the active hosts sharing a prefix)
* $A_{in}$ = 10.5% (total: 2.2 million addresses)
* $A_{ext}$ = 0.08% (total: 4.3 billion addresses)
* Risk for external addresses is higher due to sparsity!
* Constant offset: $\log_2(A_{in} / A_{ext}) \approx 7$, i.e. at the same truncation level the external risk is about $2^{7}$ times the internal risk

The Risk-Utility Tradeoff
[Figure: risk versus utility for no truncation and for truncation of 4, 8, 12, and 16 bits, with the best tradeoff marked]

  Metric             x (bits truncated)   Utility   Risk
  internal entropy    8                   0.94      0.035
  internal entropy   12                   0.87      0.002
  external entropy   16                   0.97      0.02

Conclusion
* We made a quantitative evaluation of the risk-utility tradeoff in anonymization
* Entropy is much more resistant to truncation than unique counts
* Risk and utility degrade faster for internal addresses
* For the detection of scans and (D)DoS attacks, it is possible to get a good tradeoff with high utility and low risk

Thank you for your attention
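The two IP-based utility metrics and the risk approximation from the slides can be illustrated with a short Python sketch. It is not the authors' evaluation pipeline: the function names are hypothetical, the address list is toy input rather than NetFlow data, and capping the risk at 1.0 (for the case $A \cdot 2^{x} < 1$) is an assumption added here. The Kalman filter and AUC steps are not reproduced.

```python
import math
from collections import Counter
from ipaddress import IPv4Address

def truncate(addr: str, bits: int) -> int:
    """Zero the `bits` low-order bits of an IPv4 address (returned as an int)."""
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return int(IPv4Address(addr)) & mask

def unique_count(addrs, bits=0):
    """Unique address count: number of distinct (truncated) addresses."""
    return len({truncate(a, bits) for a in addrs})

def address_entropy(addrs, bits=0):
    """Address entropy: Shannon entropy, in bits, of the empirical
    distribution of (truncated) addresses."""
    counts = Counter(truncate(a, bits) for a in addrs)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def risk(bits, active_fraction):
    """Approximate host-identification risk 1 / (A * 2^x).

    Capping at 1.0 is an assumption of this sketch, for A * 2^x < 1."""
    return min(1.0, 1.0 / (active_fraction * 2 ** bits))

if __name__ == "__main__":
    # Toy input: the addresses from the truncation example, one seen twice.
    addrs = ["123.45.67.89", "123.45.67.123", "123.45.12.34", "123.45.12.34"]
    for x in (0, 8, 16):
        print(x, unique_count(addrs, x), round(address_entropy(addrs, x), 3))

    # Active fractions from the slides: A_in = 10.5 %, A_ext = 0.08 %.
    a_in, a_ext = 0.105, 0.0008
    print(risk(8, a_in), risk(12, a_in), risk(16, a_ext))
    print(math.log2(a_in / a_ext))  # constant offset between the two risk curves
```

With the slide's active fractions, the formula gives roughly 0.037, 0.0023, and 0.019 for the three rows of the tradeoff table, close to the reported risk values, and an offset of about 7.04 bits. The toy address list only shows how the metrics are computed; it says nothing about how they behave on real traffic.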