Balancing Risk and Utility in
Flow Trace Anonymization
Martin Burkhart, ETH Zurich
[email protected]
Joint work with Daniela Brauckhoff, Elisa Boschi, Martin May
Motivation
- Sharing of traffic measurements is crucial
  - Only a limited set of sources available
  - Reproducibility of results
  - Dynamics / variability of traffic
  - Get the big picture (e.g. Internet Storm Center)
  - Keep up with globalized attacks (e.g. botnets)
- More and more traces are collected but not shared
  - Data protection legislation
  - Security concerns
  - Competitive advantage
Martin Burkhart, ETH Zurich
Balancing Risk and Utility in Flow Trace Anonymization
2
State-Of-The-Art: Anonymization
- Black Marking
- Truncation
  - e.g. last bits of IP addresses
- Permutation
  - Random
  - (Partial) prefix-preserving IP address permutation
- Enumeration
  - e.g. timestamps: keep the logical order of events
- Categorization
- Randomization (data mining community)
- K-Anonymity (data mining community)
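To make some of the listed primitives concrete, here is a minimal Python sketch of black marking, random permutation, prefix-preserving permutation and timestamp enumeration (truncation is treated in the case study). All function names are hypothetical. The prefix-preserving variant follows the general Crypto-PAn idea of flipping each address bit with a keyed pseudorandom function of the preceding bits; using HMAC-SHA256 as that function is an assumption for illustration, not the scheme evaluated in this talk.

```python
import hashlib
import hmac
import ipaddress
import random


def black_mark(value, marker="0.0.0.0"):
    """Black marking: discard the field and emit a constant instead."""
    return marker


def random_permutation(addresses, seed=None):
    """Random 1:1 permutation: map every distinct address to a distinct random address."""
    rng = random.Random(seed)
    distinct = sorted(set(addresses))
    targets = rng.sample(range(2**32), len(distinct))  # distinct 32-bit values
    return {a: str(ipaddress.IPv4Address(t)) for a, t in zip(distinct, targets)}


def prefix_preserving(addr, key):
    """Prefix-preserving permutation (Crypto-PAn style, HMAC-SHA256 as assumed PRF).

    Bit i is flipped by a keyed function of bits 0..i-1, so two inputs that share
    a k-bit prefix yield outputs that share exactly a k-bit prefix.
    """
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)  # the first i bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (a >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def enumerate_timestamps(timestamps):
    """Enumeration: replace timestamps by their rank, keeping only the logical order."""
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [rank[t] for t in timestamps]
```

For example, `prefix_preserving("123.45.67.89", b"secret")` and `prefix_preserving("123.45.67.123", b"secret")` still share their first 26 bits, exactly like the originals; this preserved structure is what makes such 1:1 schemes attackable.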
The Tradeoff in Anonymization
- It's a tradeoff: RU-Maps (risk-utility maps)
  - X-axis: Utility(t)
  - Y-axis: Risk(t)
  - t: anonymization strength
[Figure: RU-Map of an algorithm X at t = 0.1, 0.2, 0.4 and 0.7, with prefix-preserving and random permutation as reference points and a "sweet spot" region]
- Not quantitatively studied; lack of metrics
- Strongly dependent on the application / attacker model
A Case Study: IP Address Truncation
- Techniques that permute IP addresses 1:1 are reversible, e.g. through
  - characteristic object sizes/frequencies
  - behavioral profiling
  - fingerprinting of active ports
  - exploiting the prefix structure
- Apply IP address truncation and evaluate the risk and utility dimensions
- Lower risk: hosts are aggregated to subnets

IP address     | 8 bits trunc. | 16 bits trunc.
123.45.67.89   | 123.45.67.0   | 123.45.0.0
123.45.67.123  | 123.45.67.0   | 123.45.0.0
123.45.12.34   | 123.45.12.0   | 123.45.0.0

- Lower utility: resolution of entities is reduced
- Quantifying the tradeoff: how bad is it in numbers?
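The truncation shown in the table can be reproduced with Python's standard `ipaddress` module. This is a minimal sketch assuming IPv4 addresses in dotted-quad notation; `truncate` is a hypothetical helper name.

```python
import ipaddress


def truncate(addr: str, bits: int) -> str:
    """Zero the last `bits` bits of an IPv4 address, i.e. keep its /(32 - bits) prefix."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    network = ipaddress.IPv4Network(f"{addr}/{32 - bits}", strict=False)
    return str(network.network_address)


addresses = ["123.45.67.89", "123.45.67.123", "123.45.12.34"]
for bits in (8, 16):
    truncated = [truncate(a, bits) for a in addresses]
    # Fewer distinct values after truncation means hosts were aggregated into subnets
    print(bits, truncated, len(set(truncated)), "distinct")
```

The number of distinct values drops from three to two (8 bits) and to one (16 bits), which is exactly the loss of resolution that lowers both risk and utility.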
Internal vs. External Prefixes
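[Figure: unique count (log scale) vs. prefix length (32-x) for external and internal (AS 559) addresses, marked at x = 8, with annotations "Factor 3" and "Factor 53"]
- Asymmetry in prefixes between external and internal (AS 559) addresses
- Is this reflected in
  - risk reduction?
  - utility reduction?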
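A curve like the one in the figure counts, for each prefix length 32-x, how many distinct prefixes remain. A minimal sketch, assuming the addresses of one class (internal or external) have already been collected into a list; how the internal/external split is made is not shown, and the function name is hypothetical.

```python
import ipaddress


def unique_prefix_counts(addresses):
    """Map each prefix length 32-x (x = truncated bits) to the number of distinct prefixes."""
    ints = {int(ipaddress.IPv4Address(a)) for a in addresses}
    # Shifting right by x bits keeps exactly the /(32 - x) prefix of each address
    return {32 - x: len({i >> x for i in ints}) for x in range(33)}


counts = unique_prefix_counts(["123.45.67.89", "123.45.67.123", "123.45.12.34"])
print(counts[32], counts[24], counts[16])
```

Running it separately on internal and external addresses and plotting the counts on a log scale yields two curves whose different slopes show the asymmetry discussed on this slide.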
Measuring Utility of Truncated Data
- Specific application: anomaly detection
- Compare detection quality of scans and (D)DoS attacks in original and truncated data
- Two IP-based metrics
  - Unique address count
  - Address entropy
- 3 weeks of NetFlow data
  - ~43 billion flows
  - SWITCH network
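The two metrics can be sketched as follows, computed per time bin over the (possibly truncated) addresses seen in that bin. Assumptions of this sketch: each flow record contributes one occurrence of its address, and entropy is the plain Shannon entropy in bits; the study's exact binning and any normalization are not specified here.

```python
import math
from collections import Counter


def unique_count(addresses):
    """Number of distinct addresses (or prefixes) in one time bin."""
    return len(set(addresses))


def address_entropy(addresses):
    """Shannon entropy in bits of the empirical address distribution in one time bin."""
    counts = Counter(addresses)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical bin: one entry per flow record, addresses already truncated by 8 bits
flow_bin = ["123.45.67.0", "123.45.67.0", "123.45.12.0", "10.0.0.0"]
print(unique_count(flow_bin), address_entropy(flow_bin))
```

Evaluating both functions on every bin of the original and of the truncated trace gives the metric time series that the detector operates on.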
Measuring Detection Quality
- Ground truth: manual identification of scans/(D)DoS attacks
- Run a Kalman filter on metric time series
- Utility measured by AUC (area under the ROC curve)
[Figure: ROC curve obtained by varying the detection threshold]
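A hedged sketch of this pipeline: a scalar Kalman filter with a local-level (random walk) model tracks the metric time series, the normalized prediction residual serves as the anomaly score, and the ROC curve is obtained by sweeping the threshold over all scores and integrating with the trapezoidal rule. The state model, the noise parameters `q` and `r`, and the function names are assumptions; the slides do not specify the filter design. `labels` stands for the manually identified ground truth per time bin (1 = attack), and at least one attack bin and one normal bin are required.

```python
import math


def kalman_scores(series, q=1e-2, r=1.0):
    """Local-level Kalman filter; returns |innovation| / sqrt(innovation variance) per bin."""
    x, p = series[0], r          # initialise the state with the first observation
    scores = [0.0]
    for y in series[1:]:
        p_pred = p + q           # predict: random-walk state with process noise q
        s = p_pred + r           # innovation variance, measurement noise r
        e = y - x                # innovation (prediction residual)
        scores.append(abs(e) / math.sqrt(s))
        k = p_pred / s           # Kalman gain
        x += k * e               # update the state estimate
        p = (1 - k) * p_pred
    return scores


def roc_auc(scores, labels):
    """Sweep the threshold over all distinct scores and integrate the ROC curve."""
    pos = sum(1 for is_attack in labels if is_attack)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for s, is_attack in zip(scores, labels) if s >= th and is_attack)
        fp = sum(1 for s, is_attack in zip(scores, labels) if s >= th and not is_attack)
        points.append((fp / neg, tp / pos))
    # Trapezoidal rule; tied scores get half credit, as in the rank-based AUC
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


metric = [10.0] * 20 + [50.0] + [10.0] * 5   # hypothetical metric series with one spike
labels = [0] * 20 + [1] + [0] * 5
print(roc_auc(kalman_scores(metric), labels))
```

Comparing the AUC obtained on the original trace with the AUC on a truncated trace of the same period gives one utility value per metric and truncation level.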
Utility of Truncated Data
- Internal metrics degrade faster than external metrics
- Counts degrade faster than entropy
Approximating Risk of Host Identification
- In general: truncation of x bits leads to 2^(32-x) prefixes with 2^x addresses per prefix
- But: only a fraction A of the potential addresses is usually active
[Figure: addresses 129.130.80.1 to 129.130.80.255 of a single prefix, of which only a fraction is active, e.g. A = 10%]
- Hence, on average A·2^x active addresses per prefix
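As a worked example of these counts, take the slide's illustrative A = 10% together with x = 8 (the choice of x is only for illustration):

```latex
x = 8,\ A = 0.1:\qquad
2^{32-x} = 2^{24} = 16\,777\,216 \ \text{prefixes},\qquad
2^{x} = 256 \ \text{addresses per prefix},\qquad
A \cdot 2^{x} = 0.1 \cdot 256 = 25.6 \ \text{active addresses per prefix}
```

A host therefore hides, on average, among roughly 25 to 26 active addresses sharing its truncated prefix.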
Risk of Truncated Data
$\mathrm{risk}(x) = \dfrac{1}{2^{x} A}$

$A_{\mathrm{in}} \approx 10.5\%$ (total: 2.2 million)
$A_{\mathrm{ext}} \approx 0.08\%$ (total: 4.3 billion)
- Risk for external addresses is higher due to sparsity!
- Constant offset: $\log_2(A_{\mathrm{in}} / A_{\mathrm{ext}}) \approx 7$
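The risk formula and the constant offset can be checked numerically. This sketch uses the slide's values for A; capping the risk at 1 (for small x, where a prefix would on average contain fewer than one active address) is an assumption of the sketch, not part of the formula.

```python
import math

A_IN = 0.105    # fraction of active internal addresses (value from the slide)
A_EXT = 0.0008  # fraction of active external addresses (value from the slide)


def risk(x, active_fraction):
    """risk(x) = 1 / (2^x * A), capped at 1 (the cap is an assumption of this sketch)."""
    return min(1.0, 1.0 / (2 ** x * active_fraction))


for x in (8, 12, 16):
    print(x, round(risk(x, A_IN), 4), round(risk(x, A_EXT), 4))

offset = math.log2(A_IN / A_EXT)  # constant offset between the two log-risk curves
print(round(offset, 2))
```

Since log2 risk(x) = -x - log2 A, the internal and external risk curves are parallel lines on a log scale, shifted by log2(A_in / A_ext) ≈ 7: external addresses need roughly 7 more truncated bits than internal ones to reach the same risk.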
The Risk-Utility Tradeoff
[Figure: risk vs. utility per metric for no truncation and truncation of 4, 8, 12 and 16 bits, with the best tradeoffs marked]

Best tradeoffs:

Metric           | x  | Utility | Risk
internal entropy | 8  | 0.94    | 0.035
internal entropy | 12 | 0.87    | 0.002
external entropy | 16 | 0.97    | 0.02
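How a "sweet spot" is picked from such risk-utility data is a policy decision. As an illustration only, the hypothetical rule below selects the lowest-risk setting whose utility reaches a given minimum, using the best-tradeoff values listed on this slide; the names `Setting` and `sweet_spot` are assumptions.

```python
from typing import NamedTuple, Optional


class Setting(NamedTuple):
    metric: str
    bits: int        # number of truncated bits x
    utility: float
    risk: float


# Values from the best-tradeoff table on this slide
BEST = [
    Setting("internal entropy", 8, 0.94, 0.035),
    Setting("internal entropy", 12, 0.87, 0.002),
    Setting("external entropy", 16, 0.97, 0.02),
]


def sweet_spot(settings, min_utility) -> Optional[Setting]:
    """Hypothetical rule: lowest-risk setting whose utility is at least min_utility."""
    feasible = [s for s in settings if s.utility >= min_utility]
    return min(feasible, key=lambda s: s.risk) if feasible else None


print(sweet_spot(BEST, 0.9))
print(sweet_spot(BEST, 0.85))
```

Tightening the utility requirement from 0.85 to 0.9 moves the choice from 12 truncated bits on internal entropy to 16 truncated bits on external entropy, at a tenfold higher risk.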
Conclusion
- We quantitatively evaluated the risk-utility tradeoff in anonymization
- Entropy is much more resistant to truncation than unique counts
- Risk and utility degrade faster for internal addresses
- For the detection of scans and (D)DoS attacks, a good tradeoff with high utility and low risk is possible
Thank you for your attention