How May Auditors Inadvertently Compromise Your Privacy
Kobbi Nissim
Microsoft
With Nina Mishra HP/Stanford
Work in progress
PORTIA Workshop on Sensitive Data in Medical, Financial, and Content-Distribution Systems
The Setting
[Figure: a user sends a query q = (f, i1,…,ik) to the statistical database and receives f(di1,…,dik).]
• Dataset: d = {d1,…,dn}
– Entries di: real, integer, or Boolean
• Query: q = (f, i1,…,ik)
– f: Min, Max, Median, Sum, Average, Count, …
• Bad users will try to breach the privacy of individuals
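The query model above fits in a few lines of Python. This is a minimal sketch, not part of the talk: the table `AGGREGATES` and the function `answer` are hypothetical names, and indices are 0-based (d[0] plays the role of d1).

```python
from statistics import median
from typing import Callable, Sequence

# Hypothetical table of the aggregate functions f a query may name.
AGGREGATES: dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "median": median,
    "sum": sum,
    "average": lambda xs: sum(xs) / len(xs),
    "count": len,
}

def answer(d: Sequence[float], f: str, indices: Sequence[int]) -> float:
    """Answer q = (f, i1, ..., ik) with f(d_i1, ..., d_ik).

    Indices are 0-based here, so d[0] plays the role of d1."""
    return AGGREGATES[f]([d[i] for i in indices])

d = [12.0, 7.5, 3.0, 9.0]
print(answer(d, "sum", [0, 1, 2]))   # 22.5
print(answer(d, "max", [1, 3]))      # 9.0
```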
The Data Privacy Game:
an Information-Privacy Tradeoff
• Private functions:
– Want to hide πi(d) = di
• Information functions:
– Want to reveal query answers f(di1,…,dik)
• Major question: what may be computed over d (and
given to users) without breaching privacy?
• Confidentiality control methods
– Perturbation methods: give ‘noisy’ answers
– Query restriction methods: limit the queries users may pose,
usually imposing some structure (e.g. size/overlap
restrictions)
Auditing
• [AW89] classify auditing as a query restriction method:
– “Auditing of an SDB involves keeping up-to-date
logs of all queries made by each user (not the data
involved) and constantly checking for possible
compromise whenever a new query is issued”
• Partial motivation: auditing may allow more queries to be posed,
as long as no privacy threat occurs
• Early work: Hofmann 1977, Schlorer 1976,
Chin, Ozsoyoglu 1981, 1986
• Recent interest: Kleinberg, Papadimitriou, Raghavan
2000, Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin
2003
Auditing
[Figure: the user sends a new query qi+1 to the auditor, which keeps a query log q1,…,qi and has access to the statistical database. The auditor replies either “Here’s the answer” or “Query denied (as the answer would cause privacy loss)”.]
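The online auditing loop in the figure can be sketched as a small wrapper around a pluggable decision procedure. This is an illustrative sketch under stated assumptions: `OnlineAuditor`, `Query` and the `breaches` callback are hypothetical names. Mirroring the prior-work design choices discussed next, the decision sees the true answer of the new query, and only answered queries enter the log, so denials are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A query q = (f, (i1, ..., ik)): an aggregate function and 0-based indices.
Query = tuple[Callable[[list[float]], float], tuple[int, ...]]
# Hypothetical decision procedure: (answered log, new query, its TRUE answer) -> breach?
BreachTest = Callable[[list[tuple[Query, float]], Query, float], bool]

@dataclass
class OnlineAuditor:
    d: list[float]
    breaches: BreachTest
    log: list[tuple[Query, float]] = field(default_factory=list)

    def ask(self, q: Query) -> Optional[float]:
        f, idx = q
        a = f([self.d[i] for i in idx])   # the true answer a_(i+1)
        if self.breaches(self.log, q, a):
            return None                   # "Query denied"
        self.log.append((q, a))           # only answered queries are logged
        return a                          # "Here's the answer"

# Toy decision procedure (hypothetical): deny any query over a single entry.
auditor = OnlineAuditor(d=[3.0, 5.0, 7.0], breaches=lambda log, q, a: len(q[1]) == 1)
print(auditor.ask((sum, (0, 1, 2))))  # 15.0
print(auditor.ask((max, (1,))))       # None
```

Passing the true answer into `breaches` is deliberate here: it is exactly the information the later slides show can leak through denials.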
Design choices in Prior Work (1)
1. Privacy definition:
– Privacy breached (only) when a database entry may be deduced fully, or within some accuracy
– These privacy guarantees do not generally suffice:
• Should take into account: adversary’s computational power, prior knowledge, access to other databases…
2. Exact answers given
– Auditors viewed as a way to give ‘quality’ answers???
Design choices in Prior Work (2)
3. Which information is taken into account in the auditor decision procedure:
– Decision made based on queries q1,…,qi, qi+1 and their answers a1,…,ai, ai+1
• Denials ignored
4. Offline vs. online:
– Offline auditing: queries and answers checked for compromise at the end of the day
• Only detect breaches
– Online auditing: answer/deny queries on the fly
• Prevent breaches just before they happen
Example 1: Sum/Max auditing
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3) → Auditor: sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3) → Auditor: Denied (the answer would cause privacy loss)
User: Oh well…
Some Prior Work on Auditors
Auditor                      Data         Queries   Breach               Complexity
Sum/Max [Chin]               real         sum/max   di learned           NP-hard
Boolean [KPR00]              0/1          sum       di learned           NP-hard*
Max [KPR00]                  real         max       di learned           PTIME
Interval based [LWWJ02]      di ∈ [a,b]   sum       di within accuracy   PTIME
Generalized results [JK03]                                               NP-hard / PTIME
* An approximate version is in PTIME
Can we use the offline version for online auditing?
… After Two Minutes …
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3) → Auditor: sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3) → Auditor: Denied (the answer would cause privacy loss)
User: Oh well… There must be a reason for the denial…
q2 is denied iff d1 = d2 = d3 = 5
I win!
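The naive auditor’s decision in this example, and what its denial gives away, can be sketched as follows. This is an illustration only: the function names are hypothetical, and exact `Fraction` arithmetic stands in for real numbers so that the equality test is exact. Given the known sum S, the true maximum M pins down an entry only when 3·M = S (all three entries equal S/3); otherwise the entry attaining M is not identified and the other two only satisfy a sum constraint.

```python
from fractions import Fraction

def naive_auditor_denies_max(d):
    """Hypothetical naive auditor for q2 = max(d1, d2, d3), asked after
    q1 = sum(d1, d2, d3) was answered. It inspects the TRUE answer M and
    denies exactly when M reveals an entry, i.e. when 3*M == S."""
    return 3 * max(d) == sum(d)

def adversary_infers(s, denied):
    """What the user learns from the auditor's decision on q2 alone."""
    if denied:
        return [s / 3] * 3      # "q2 is denied iff d1 = d2 = d3 = S/3"
    return None                 # q2 answered: no single entry is determined

for d in ([Fraction(5)] * 3, [Fraction(2), Fraction(6), Fraction(7)]):
    s = sum(d)                  # answer to q1 (15 in both cases)
    denied = naive_auditor_denies_max(d)
    print(s, denied, adversary_infers(s, denied))
```

Both databases share the answer S = 15 to q1, yet the auditor behaves differently on q2, and that difference is exactly what the adversary exploits.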
Example 2: Interval Based Auditing
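di ∈ [0,100], sum queries, λ = 1 (PTIME): privacy breached if some di is confined to an interval of length at most λ
q1 = sum(d1,d2) → Auditor: Sorry, denied
q2 = sum(d2,d3) → Auditor: sum(d2,d3) = 50
User: Denial ⇒ d1,d2 ∈ [0,1] or d1,d2 ∈ [99,100]
With sum(d2,d3) = 50 ⇒ d1,d2 ∈ [0,1] and d3 ∈ [49,50]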
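The reasoning on this slide can be replayed in a short sketch. It assumes, for illustration, that the naive interval auditor judges each sum query on its own (there are no answered queries before q1, and the denied q1 is ignored when q2 arrives), with entries in [0,100] and λ = 1. The helper names are hypothetical; `Fraction` keeps the interval arithmetic exact.

```python
from fractions import Fraction

LO, HI = Fraction(0), Fraction(100)   # every entry lies in [0, 100]
LAM = Fraction(1)                     # breach: an entry confined to width <= LAM

def entry_bounds(s, k):
    """Tightest interval for each of k entries whose sum is s, using only LO <= di <= HI."""
    return (max(LO, s - (k - 1) * HI), min(HI, s - (k - 1) * LO))

def naive_deny_sum(s, k):
    """Hypothetical naive auditor: look at the true answer s, deny if it would
    confine some entry to an interval of width <= LAM."""
    lo, hi = entry_bounds(s, k)
    return hi - lo <= LAM

def adversary(a2):
    """Combine the denial of q1 = sum(d1, d2) with the answer a2 to q2 = sum(d2, d3)."""
    # For k = 2 the width is <= 1 iff s1 is in [0, 1] or [199, 200], so the
    # denial means d1, d2 both lie in [0, 1] or both lie in [99, 100].
    cases = [(Fraction(0), Fraction(1)), (Fraction(99), Fraction(100))]
    d2_lo, d2_hi = a2 - HI, a2 - LO   # from d3 = a2 - d2 with d3 in [0, 100]
    feasible = []
    for lo, hi in cases:
        lo2, hi2 = max(lo, d2_lo), min(hi, d2_hi)
        if lo2 <= hi2:
            feasible.append({"d1": (lo, hi), "d2": (lo2, hi2), "d3": (a2 - hi2, a2 - lo2)})
    return feasible

print(naive_deny_sum(Fraction(50), 2))   # q2 alone looks harmless: False
print(adversary(Fraction(50)))
```

Only the [0,1] case survives, so all three entries end up confined to intervals of width 1.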
Sounds Familiar?
Colonel Oliver North, on the Iran-Contra Arms Deal:
On the advice of my counsel I respectfully and
regretfully decline to answer the question based on
my constitutional rights.
David Duncan, Former auditor for Enron and
partner in Andersen:
Mr. Chairman, I would like to answer the
committee's questions, but on the advice of
my counsel I respectfully decline to answer
the question based on the protection afforded
me under the Constitution of the United
States.
Max Auditing
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
di real
q1 = max(d1,d2,d3,d4) → Auditor: M1234
q2 = max(d1,d2,d3) → Auditor: M123 / denied
If denied: d4 = M1234
q3 = max(d1,d2) → Auditor: M12 / denied
If denied: d3 = M123
Adversary’s Success
q1 = max(d1,d2,d3,d4)
q2 = max(d1,d2,d3): if denied, d4 = M1234
Denied with probability 1/4
q3 = max(d1,d2): if denied, d3 = M123
Denied with probability 1/3
Success probability (if each entry is equally likely to hold the maximum): 1/4 + (1 − 1/4)·1/3 = 1/2
Repeating on each block of four entries: recover 1/8 of the database (in expectation)!
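The 1/2 success probability can be checked exactly by enumerating all 24 orderings of four distinct values, assuming each ordering is equally likely. The auditor below is a hypothetical one specialised to this query sequence: after max(d1,…,dj+1) = M has been answered, answering max(d1,…,dj) reveals dj+1 = M exactly when the new maximum is smaller, so a naive auditor that inspects the true answer denies in that case.

```python
from fractions import Fraction
from itertools import permutations

def naive_denies_prefix_max(d, j):
    """Hypothetical naive auditor for max(d1..dj), asked right after max(d1..d(j+1))
    was answered: deny iff the true answer is smaller, since it would reveal d(j+1)."""
    return max(d[:j]) < max(d[:j + 1])

def attack(d):
    """Adversary from the slide on one block d = (d1, d2, d3, d4) of distinct values.
    Returns (0-based index, value) of a recovered entry, or None."""
    m1234 = max(d)                        # q1 = max(d1..d4): a single max reveals no entry
    if naive_denies_prefix_max(d, 3):     # q2 = max(d1, d2, d3)
        return 3, m1234                   # denied => d4 = M1234
    # q2 was answered, so M123 = M1234
    if naive_denies_prefix_max(d, 2):     # q3 = max(d1, d2)
        return 2, m1234                   # denied => d3 = M123
    return None

perms = list(permutations((1, 2, 3, 4)))
outcomes = [attack(p) for p in perms]
# Whenever the attack claims an entry, the claim is correct.
assert all(o is None or p[o[0]] == o[1] for p, o in zip(perms, outcomes))
success = Fraction(sum(o is not None for o in outcomes), len(perms))
print(success)  # 1/2
```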
Boolean Auditing?
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
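di Boolean
q1 = sum(d1,d2) → Auditor: 1 / denied
q2 = sum(d2,d3) → Auditor: 1 / denied
…
qi = sum(di,di+1) is denied iff di = di+1 (an answer of 0 or 2 would reveal both entries) ⇒ learn the database or its complement
Let di, dj, dk be not all equal, where qi-1, qi, qj-1, qj, qk-1, qk were all denied (so no answered query involves them)
q = sum(di,dj,dk) → Auditor: 1 / 2 (allowed, since denials are ignored and no single entry is pinned down)
The answer tells the database apart from its complement: recover the entire database!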
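A brute-force sketch of this attack on a small database. The auditor is hypothetical: it enumerates all 2^n assignments, so it only works for tiny n (unlike the algorithms in the table of prior work). Like the prior work, it decides using the answered queries plus the true answer of the new query, and it ignores denials.

```python
from itertools import combinations, product

def consistent(n, constraints):
    """All 0/1 vectors of length n meeting every (indices, sum) constraint."""
    return [x for x in product((0, 1), repeat=n)
            if all(sum(x[i] for i in idx) == s for idx, s in constraints)]

def reveals_entry(n, constraints):
    """True if some entry has the same value in every consistent assignment."""
    sols = consistent(n, constraints)
    return any(len({x[i] for x in sols}) == 1 for i in range(n))

class NaiveBooleanAuditor:
    """Hypothetical brute-force sum auditor for Boolean data (illustration only)."""

    def __init__(self, d):
        self.d, self.answered = list(d), []

    def ask(self, idx):
        a = sum(self.d[i] for i in idx)   # the TRUE answer is used in the decision
        if reveals_entry(len(self.d), self.answered + [(tuple(idx), a)]):
            return None                   # denied (and not logged)
        self.answered.append((tuple(idx), a))
        return a

def attack(auditor, n):
    """The adversary from the slide; returns its reconstruction of the database."""
    # q_i = sum(d_i, d_(i+1)) is denied iff d_i == d_(i+1).
    equal = [auditor.ask((i, i + 1)) is None for i in range(n - 1)]
    pattern = [0]                         # the database up to complement
    for e in equal:
        pattern.append(pattern[-1] if e else 1 - pattern[-1])
    # Entries whose every pair query was denied appear in no answered query.
    free = [i for i in range(n)
            if (i == 0 or equal[i - 1]) and (i == n - 1 or equal[i])]
    for i, j, k in combinations(free, 3):
        if len({pattern[i], pattern[j], pattern[k]}) > 1:   # not all equal
            a = auditor.ask((i, j, k))                      # answer is 1 or 2
            if a is not None:
                flip = a != pattern[i] + pattern[j] + pattern[k]
                return [1 - b for b in pattern] if flip else pattern
    return None                           # attack not applicable to this database

d = [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(attack(NaiveBooleanAuditor(d), len(d)) == d)   # True
```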
Two Problems
• Obvious problem: denied queries are ignored
– Algorithmic problem: it is not clear how to incorporate denials in the decision
• Subtle problem:
– Query denials leak (potentially sensitive) information
– Users cannot decide denials by themselves, since the decision depends on data they do not see
[Figure: nested sets: all possible assignments to {d1,…,dn} ⊇ assignments consistent with (q1,…,qi, a1,…,ai) ⊇ assignments for which qi+1 is denied]
A Spectrum of Auditors
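Information used in the auditor’s decision (top to bottom: utility increases, privacy decreases):
– q1,…,qi, qi+1 only (e.g. size/overlap restriction, algebraic structure): “safe”
– q1,…,qi, qi+1 and a1,…,ai: “safe”
– q1,…,qi, qi+1 and a1,…,ai, ai+1: “unsafe”
*Note: one can work in the “unsafe” region, but must then prove that denials do not leak crucial information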
Simulatable Auditing*
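An auditor is simulatable if a simulator exists s.t.:
– the auditor, given q1,…,qi, the new query qi+1 and the statistical database, outputs deny/answer;
– the simulator, given only q1,…,qi, qi+1 and the answers a1,…,ai (no access to the database), outputs the same deny/answer decision.
Simulation ⇒ denials do not leak information
* Called ‘self auditors’ in [DN03]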
Why Do Simulatable Auditors Not Leak Information?
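[Figure: within all possible assignments to {d1,…,dn}, the assignments consistent with (q1,…,qi, a1,…,ai) form one region; the decision to deny or allow qi+1 is the same for every assignment in that region, so it reveals nothing about which consistent assignment is the true one.]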
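The argument above can be illustrated with a case where simulatability comes for free: sum queries over unbounded real-valued entries. There, whether some di is determined depends only on which index sets were queried (di is fixed iff the unit vector ei lies in the row span of the query vectors), not on the answers. A minimal sketch; `SimulatableSumAuditor`, `determines_entry` and `rank` are hypothetical names, and `Fraction` keeps the rank computation exact. Because `decide` never reads the data, a user could run it as the simulator, so a denial tells them nothing new.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of equal-length vectors (Gaussian elimination over Q)."""
    m = [[Fraction(v) for v in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def determines_entry(n, queries):
    """With unbounded real data, the sums over `queries` pin down some d_i iff the
    unit vector e_i lies in the row span of the 0/1 query vectors."""
    rows = [[1 if j in q else 0 for j in range(n)] for q in queries]
    base = rank(rows)
    return any(rank(rows + [[1 if j == i else 0 for j in range(n)]]) == base
               for i in range(n))

class SimulatableSumAuditor:
    """Hypothetical sum auditor over real data; its decision uses only query sets."""

    def __init__(self, d):
        self.d, self.answered = list(d), []

    def decide(self, q):
        # Inputs: past answered query sets and the new one. No data, no answers.
        return "deny" if determines_entry(len(self.d), self.answered + [set(q)]) else "answer"

    def ask(self, q):
        if self.decide(q) == "deny":
            return None
        self.answered.append(set(q))
        return sum(self.d[i] for i in q)

queries = [(0, 1, 2, 3), (0, 1), (0, 2), (0, 3)]
for d in ([3.0, 5.0, 7.0, 1.0], [0.0, 100.0, -2.5, 4.0]):
    auditor = SimulatableSumAuditor(d)
    print([auditor.ask(q) for q in queries])
# [16.0, 8.0, 10.0, None]
# [101.5, 100.0, -2.5, None]
```

The answers differ between the two databases, but the deny/answer pattern is identical, as the definition requires.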
Summary
• Improper usage of auditors may lead to privacy
breaches, due to information leakage in the decision
procedure.
– Cell suppression / some k-anonymity methods should be
checked similarly
– Should make sure offline auditors do not leak information through their
decisions
• Simulatable auditors provably don’t leak information
– Give best utility while still “safe”
– A launching point for further research on auditors
• Further research:
– Auditors with more reasonable privacy guarantees