Online Auditing
Kobbi Nissim
Microsoft
Based on a position paper with Nina Mishra
The Setting
[Diagram: a user sends a query q = (f, i1,…,ik) to the statistical database, which returns f(di1,…,dik)]
• Dataset: {d1,…,dn}
– Entries di: Real, Integer, Boolean
• Query: q = (f ,i1,…,ik)
– f : Min, Max, Median, Sum, Average, Count…
• Some users are bad…
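The setting above can be sketched in a few lines of Python. This is only an illustration: the names `Query` and `answer` are hypothetical, and indices are 0-based here whereas the slides count from 1.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Sequence, Tuple

@dataclass(frozen=True)
class Query:
    f: Callable[[Sequence[float]], float]   # e.g. min, max, sum, median
    indices: Tuple[int, ...]                # 0-based here; the slides use 1-based i1,…,ik

def answer(dataset: Sequence[float], q: Query) -> float:
    """Evaluate f(d_i1, ..., d_ik) on the selected entries."""
    return q.f([dataset[i] for i in q.indices])

dataset = [3, 1, 4, 1, 5]
print(answer(dataset, Query(max, (0, 2, 4))))     # 5
print(answer(dataset, Query(sum, (0, 1))))        # 4
print(answer(dataset, Query(median, (0, 1, 2))))  # 3
```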
Auditing
[Diagram: the user submits a new query qi+1. The auditor consults the query log q1,…,qi and the statistical database, then replies either "Here's the answer" or "Query denied (as the answer would cause privacy loss)".]
Auditing
• [Adam, Wortmann 89] classify auditing as a query restriction
method
– Such methods limit the queries users may post, usually
imposing some structure (e.g. combinatorial, algebraic)
– “Auditing of an SDB involves keeping up-to-date logs of all
queries made by each user (not the data involved) and
constantly checking for possible compromise whenever a
new query is issued”
• Partial motivation: May allow for more queries to be
posed, if no privacy threat occurs
• Early work: Hofmann 1977, Schlorer 1976, Chin,
Ozsoyoglu 1981, 1986
• Recent interest: Kleinberg, Papadimitriou, Raghavan 2000,
Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin 2003
Design choices in Prior Work
• Outside the scope of this talk (but important):
– Very weak privacy guarantee: Privacy breached
(only) when a database entry may be uniquely
deduced
– Exact answers given
• Important for this talk:
– Data taken into account in decision procedure
• Answers to q1,…,qi and qi+1 taken into account
• Denials ignored
Some Prior Work on Auditors
Auditor                      Data         Queries   Breach                              Complexity
Sum/Max [Chin]               real         sum/max   di learned                          NP-hard
Boolean [KPR00]              0/1          sum       di learned                          NP-hard
Max [KPR00]                  real         max       di learned                          PTIME
Interval based [LWWJ02]      di ∈ [a,b]   sum       di learned within a given accuracy  PTIME
Generalized results [JK03]   -            -         -                                   NP-hard / PTIME
Example 1: Sum/Max auditing
di real, sum/max queries
q1 = sum(d1,d2,d3)
sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3)
Denied (the answer would
cause privacy loss)
q2 is denied iff d1=d2=d3 = 5
Oh well…
I win!
Auditor
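A minimal sketch of this exchange, assuming an auditor that denies exactly when the true answer would let some entry be uniquely deduced. For the pair sum(d1,…,dn) followed by max(d1,…,dn), that happens iff n·max = sum, since all entries are then forced to equal the max. The function names are hypothetical, and integer data is used to avoid floating-point equality issues.

```python
def naive_sum_then_max(data):
    """Answer sum(data), then answer max(data) unless that would pin the entries.
    Returns (sum_answer, max_answer), with max_answer = None on denial."""
    s = sum(data)
    m = max(data)
    # Every entry is <= m, so sum == n * m forces every entry to equal m.
    deny_max = (len(data) * m == s)
    return s, (None if deny_max else m)

def attacker(n, s, max_reply):
    """The user reads the denial itself as information."""
    if max_reply is None:
        return [s / n] * n   # denial implies all entries are equal
    return None              # nothing uniquely determined in this exchange

s, reply = naive_sum_then_max([5, 5, 5])
print(s, reply)               # 15 None
print(attacker(3, s, reply))  # [5.0, 5.0, 5.0]
```

With data such as [4, 5, 6] the same sum of 15 is reported and the max query is answered with 6, so the two behaviors are distinguishable to the user.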
7
Example 2: Interval Based Auditing
di ∈ [0,100], sum queries, accuracy parameter = 1 (PTIME)
q1 = sum(d1,d2)
Sorry, denied
q2 = sum(d2,d3)
sum(d2,d3) = 50
d1,d2 ∈ [0,1]
d3 ∈ [49,50]
Auditor
Sounds Familiar?
Colonel Oliver North, on the Iran-Contra Arms Deal:
On the advice of my counsel I respectfully and
regretfully decline to answer the question based on
my constitutional rights.
David Duncan, Former auditor for Enron and
partner in Andersen:
Mr. Chairman, I would like to answer the
committee's questions, but on the advice of
my counsel I respectfully decline to answer
the question based on the protection afforded
me under the Constitution of the United
States.
What about Max Auditing?
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
di real
q1 = max(d1,d2,d3,d4)
M1234
q2 = max(d1,d2,d3)
M123 / denied
If denied: d4=M1234
q3 = max(d1,d2)
M12 / denied
If denied: d3=M123
Recover 1/8 of the database!
Auditor
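The first step of this attack can be reproduced with a brute-force sketch. Assumptions: entries come from a small integer domain (standing in for the reals on the slide), the auditor denies exactly when the true answer would make some entry uniquely determined, and denied queries are not logged. All names are hypothetical.

```python
from itertools import product

DOMAIN = range(4)  # assumption: small integer domain standing in for real values

def consistent(n, log):
    """All assignments in DOMAIN^n that agree with every answered max query."""
    return [x for x in product(DOMAIN, repeat=n)
            if all(max(x[i] for i in idx) == ans for idx, ans in log)]

def naive_max_audit(data, log, idx):
    """Deny (return None) iff the true answer would pin down some entry.
    Denied queries are not recorded, mirroring 'denials ignored'."""
    ans = max(data[i] for i in idx)
    cands = consistent(len(data), log + [(idx, ans)])
    if any(len({x[i] for x in cands}) == 1 for i in range(len(data))):
        return None
    log.append((idx, ans))
    return ans

def probe(data):
    """Ask max(d1..d4), then max(d1..d3) (0-based indices in code)."""
    log = []
    m1234 = naive_max_audit(data, log, (0, 1, 2, 3))
    reply = naive_max_audit(data, log, (0, 1, 2))
    return m1234, reply

print(probe([1, 2, 1, 3]))  # (3, None): denial tells the user that d4 = 3
print(probe([1, 3, 1, 2]))  # (3, 3): answered, d4 stays hidden
```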
What about Boolean Auditing?
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
q1 = sum(d1,d2)
di Boolean
1 / denied
q2=sum(d2,d3)
…
1 / denied
qi = sum(di,di+1) denied iff di = di+1 ⇒ learn database/complement
Let di,dj,dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk all denied
q2=sum(di,dj,dk)
1/2
Recover the entire database!
Auditor
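The first phase of this attack, learning the database up to complement from the denial pattern alone, can be sketched as follows. Assumptions: a brute-force auditor over {0,1}^n (feasible only for tiny n) that denies exactly when the true answer would pin down some entry and that does not log denials. The names are hypothetical.

```python
from itertools import product

def consistent(n, log):
    """All 0/1 assignments of length n agreeing with every answered sum query."""
    return [x for x in product((0, 1), repeat=n)
            if all(sum(x[i] for i in idx) == ans for idx, ans in log)]

def naive_audit(data, log, idx):
    """Deny (return None) iff answering truthfully would pin down some entry."""
    ans = sum(data[i] for i in idx)
    cands = consistent(len(data), log + [(idx, ans)])
    if any(len({x[i] for x in cands}) == 1 for i in range(len(data))):
        return None
    log.append((idx, ans))
    return ans

def attack(data):
    """Query sum(d_i, d_{i+1}) for each i; a denial means d_i == d_{i+1}."""
    log, n = [], len(data)
    equal_next = [naive_audit(data, log, (i, i + 1)) is None for i in range(n - 1)]
    guess = [0]  # arbitrary choice for the first bit: result is data or its complement
    for eq in equal_next:
        guess.append(guess[-1] if eq else 1 - guess[-1])
    return guess

print(attack([0, 1, 1, 0, 1]))     # [0, 1, 1, 0, 1]  (the data itself)
print(attack([1, 1, 0, 0, 1, 0]))  # [0, 0, 1, 1, 0, 1]  (its complement)
```

One further answered query over entries that are not all equal then tells the user which of the two candidates is the real database.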
What are the Problems?
• Obvious problem: denied queries ignored
– Algorithmic problem: not clear how to incorporate denials in
the decision
• Subtle problem:
– Query denials leak (potentially sensitive) information
• Users cannot decide denials by themselves
Possible assignments to {d1,…,dn}
Assignments consistent
with (q1,…qi)
qi+1 denied
A Spectrum of Auditors
Decision data
q1,…,qi, qi+1
q1,…,qi, qi+1
a1,…,ai, ai+1
Examples
• Size overlap restriction
• Algebraic structure
• Sum/Max, Interval based,
Boolean, Max
• Cell suppression
• k-anonymity
“safe”
“unsafe”
*Note: can work in “unsafe” region, but need to prove denials do not
leak crucial information
Simulatable Auditing*
An auditor is simulatable if a simulator exists s.t.:
qi+1
q1,…,qi
Statistical
database
Auditor
Deny/answer
qi+1
q1,…,qi
a1,…,ai
Simulator
Deny/answer
Simulation ⇒ denials do not leak information
* `self auditors’ in [DN03]
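One way to obtain a decision that a user could compute alone is to never look at the data: deny whenever some dataset consistent with the past answers would make the new answer pin down an entry. The sketch below does this by brute force for Boolean sum queries. It is an illustrative assumption rather than a construction taken from the talk, and the names are hypothetical.

```python
from itertools import product

def consistent(n, log):
    """All 0/1 assignments of length n agreeing with every answered sum query."""
    return [x for x in product((0, 1), repeat=n)
            if all(sum(x[i] for i in idx) == ans for idx, ans in log)]

def pins_entry(cands, n):
    """True if some entry takes the same value in every candidate assignment."""
    return any(len({x[i] for x in cands}) == 1 for i in range(n))

def simulatable_decision(n, log, idx):
    """Return True to deny. Inputs are only n, past (query, answer) pairs and
    the new query, so the user can run the same computation."""
    for x in consistent(n, log):
        hypothetical = sum(x[i] for i in idx)   # answer if x were the data
        if pins_entry(consistent(n, log + [(idx, hypothetical)]), n):
            return True
    return False

print(simulatable_decision(4, [], (0, 1)))             # True: denied for every dataset
print(simulatable_decision(2, [((0, 1), 1)], (0, 1)))  # False: repeats a logged answer
```

Because the data is not an argument, a denial carries no information beyond what the user already holds. The price in this toy Boolean setting is that many harmless queries are denied as well.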
Summary
• Subtleties in current definition of auditors
allow for information leakage, and
potentially, privacy breaches
– Denials are not taken into account
– Auditor uses information not available to user
• Simulatable auditors provably don’t leak
information in decision
– New starting point for research on auditors