Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
How may Auditors Inadvertently Compromise Your Privacy Kobbi Nissim Microsoft With Nina Mishra HP/Stanford Work in progress PORTIA Workshop on Sensitive Data in Medical, Financial, and ContentDistribution Systems The Setting q = (f ,i1,…,ik) f (di1,…,dik) Statistical database • Dataset: d={d1,…,dn} – Entries di: Real, Integer, Boolean • Query: q = (f ,i1,…,ik) – f : Min, Max, Median, Sum, Average, Count… • Bad users will try to breach the privacy of individuals 2 The Data Privacy Game: an Information-Privacy Tradeoff • Private functions: i f f f – Want to hide i(d)=di • Information functions: – Want to reveal query answers f(di1,…,dik) • Major question: what may be computed over d (and given to users) without breaching privacy? • Confidentiality control methods – Perturbation methods: give `noisy’ answers – Query restriction methods: limit the queries users may post, usually imposing some structure (e.g. size/overlap restrictions) 3 Auditing • [AW89] classify auditing as a query restriction method: – “Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued” • Partial motivation: May allow for more queries to be posed, if no privacy threat occurs • Early work: Hofmann 1977, Schlorer 1976, Chin, Ozsoyoglu 1981, 1986 • Recent interest: Kleinberg, Papadimitriou, Raghavan 2000, Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin 2003 4 Auditing Here’s the answer OR Here’s a new query: qi+1 Query denied (as the answer would cause privacy loss) Auditor Query log q1,…,qi Statistical database 5 Design choices in Prior Work (1) 1. Privacy definition: – – Privacy breached (only) when a database entry may be deduced fully, or within some accuracy These privacy guarantees do not generally suffice: • Should take into account: Adversary’s computational power, prior knowledge, access to other databases… 2. Exact answers given – Auditors viewed as a way to give `quality’ answers??? 6 Design choices in Prior Work (2) 3. Which information is taken into account in the auditor decision procedure: – • Decision made based on queries q1,…,qi, qi+1 and their answers a1,…,ai, ai+1 Denials ignored 4. Offline vs. Online: • Offline auditing: queries and answers checked for compromise at the end of the day • • Only detect breaches Online auditing: answer/deny queries on the fly • Prevent breaches just before they happen 7 Example 1: Sum/Max auditing di real, sum/max queries, privacy breached if some di learned q1 = sum(d1,d2,d3) sum(d1,d2,d3) = 15 q2 = max(d1,d2,d3) Denied (the answer would cause privacy loss) Oh well… Auditor 8 Some Prior Work on Auditors Sum/Max [Chin] Boolean [KPR00] Data real Queries Breach Sum/max di learned Complexity NP-hard 0/1 Sum --”-- NP-hard* Max [KPR00] Real Max --”-- PTIME Interval based [LWWJ02] Generalized results [JK03] di [a,b] sum di within accuracy PTIME NP-hard / PTIME * Approx version in PTIME 9 Can we use the offline version for online auditing? … After Two Minutes … di real, sum/max queries, privacy breached if some di learned q1 = sum(d1,d2,d3) sum(d1,d2,d3) = 15 q2 = max(d1,d2,d3) There must beiffa q2 is denied Ohreason well… for the d1=d2=d3 =5 denial… I win! Denied (the answer would cause privacy loss) Auditor 10 Example 2: Interval Based Auditing di [0,100], sum queries, =1 (PTIME) q1 = sum(d1,d2) Sorry, denied q2 = sum(d2,d3) sum(d2,d3) = 50 d1,d2 [0,1] Denial d3 [49,50] d1,d2[0,1] or [99,100] Auditor 11 Sounds Familiar? Colonel Oliver North, on the Iran-Contra Arms Deal: On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights. David Duncan, Former auditor for Enron and partner in Andersen: Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States. 12 Max Auditing d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn di real q1 = max(d1,d2,d3,d4) M1234 q2 = max(d1,d2,d3) If denied: d4=M1234 M123 / denied q2 = max(d1,d2) If denied: d3=M123 M12 / denied Auditor 13 Adversary’s Success q1 = max(d1,d2,d3,d4) If denied: d4=M1234 Denied with probability 1/4 q2 = max(d1,d2) If denied: d3=M123 Denied with probability 1/3 Success probability: 1/4 + (1- 1/4)·1/3 = 1/2 q2 = max(d1,d2,d3) Recover 1/8 of the database! Auditor 14 Boolean Auditing? d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn q1 = sum(d1,d2) di Boolean 1 / denied q2=sum(d2,d3) … 1 / denied qi denied iff di = di+1 learn database/complement Let di,dj,dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk all denied q2=sum(di,dj,dk) 1/2 Recover the entire database! Auditor 15 Two Problems • Obvious problem: denied queries ignored – Algorithmic problem: not clear how to incorporate denials in the decision • Subtle problem: – Query denials leak (potentially sensitive) information • Users cannot decide denials by themselves Possible assignments to {d1,…,dn} Assignments consistent with (q1,…qi, a1,…,ai) qi+1 denied 16 A Spectrum of Auditors “Safe” “Safe” Size overlap restriction Algebraic structure q1,…,qi, qi+1 < utility “Unsafe” q1,…,qi, qi+1 a1,…,ai > q1,…,qi, qi+1 a1,…,ai, ai+1 privacy *Note: can work in “unsafe” region, but need to prove denials do not leak crucial information 17 Simulatable Auditing* An auditor is simulatable if a simulator exists s.t.: qi+1 q1,…,qi Statistical database Auditor Deny/answer qi+1 q1,…,qi a1,…,ai Simulator Deny/answer Simulation denials do not leak information * `self auditors’ in [DN03] 18 Why Simulatable Auditors do not Leak Information? Possible assignments to {d1,…,dn} Assignments consistent with (q1,…qi, a1,…,ai ) qi+1 denied/allowed 19 Summary • Improper usage of auditors may lead to privacy breaches, due to information leakage in the decision procedure. – Cell suppression / some k-anonymity methods should be checked similarly – Should make sure offline auditors do not leak information in decision • Simulatable auditors provably don’t leak information – Give best utility while still “safe” – A launching point for further research on auditors • Further research: – Auditors with more reasonable privacy guarantees 20