Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Ingres (database) wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Relational model wikipedia , lookup
Clusterpoint wikipedia , lookup
Lower Bounds on Noise Sergey Yekhanin Institute for Advanced Study Setting Database of information about individuals E.g. Medical history, Census data, Customer info. Need to guarantee confidentiality of individual entries Want to make deductions about the database; learn large scale trends. E.g. Learn that a drug V increases likelihood of heart disease Do not leak info about individual patients Message Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Curator Analyst Message Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Curator Summary Analyst Message Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Thesis: The interactive approach is the right way to give good accuracy for a given level of privacy Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information. Plan Mathematical model of database and queries Attacks Somewhat accurate answers to all queries lead to privacy leakage. (Fourier analysis) [Y] (extends [DiNi]). Somewhat accurate answers to a fraction of queries lead to privacy leakage. (Linear programming / Polynomial interpolation) [DMT,DY] Study of privacy leads to a variety of mathematical challenges! Model [Dinur-Nissim] Simple Model (easily justifiable) Database: n-bit binary vector x Query: vector a True answer: Dot product ax Response is ax+ e = True Answer + Noise Privacy Leakage: Attacker learns a certain bit of x. Blatant Non-Privacy: Attacker learns n−o(n) bits of x. Fourier attack Theorem: If a curator adds o(√n) noise to every response; then an attacker can ask n questions, perform O(n log n) computation and recover n-o(n) bits of the database. Put database records in one-to-one correspondence with elements of a group Z 2k . Think of a database as a function D from Z 2k to {0,1}. Choose queries to ask for Fourier coefficients of D. Noisy Fourier coefficients approximately determine the Boolean function D! (Parseval identity). Linear programming attack Theorem: If a curator adds o(√n) noise to 0.773 fraction of responses; then an attacker can ask O(n) questions, perform O(n3) computation and recover n-o(n) bits of the database. Arbitrarily large error on arbitrary and unknown 0.239 fraction on answers. Linear programming attack Ask O(n) random +1/-1 questions Obtain y=Ax+e, where e is the error vector A natural approach to recover x from y: Solve: min |e'|0 such that y=Ax'+e‘, x' in Rn (hard!) Solve a linear program [D, CT, MT]: min |e'|1 such that y=Ax'+e' x' in Rn Ax' y Polynomial interpolation attack Model: Questions have O(c) large coefficients Theorem: If a curator adds o(c) noise to 0.501 fraction of responses; then an attacker can ask c questions, perform O(c4) computation and reliably recover any particular bit of the database. Arbitrarily large error on arbitrary and unknown 0.499 fraction on answers. Polynomial interpolation attack Assume c is prime. Think of the space of queries as a linear space Fcn . To obtain a reliable answer to query x = (1,0, … , 0) , draw a degree two curve through x. Ask all queries that correspond to points on the curve. Use polynomial interpolation to carefully combine the answers. x q1 q4 q2 q3 q5 q6 Implications Privacy has a Price There is no safe way to avoid increasing the noise as the number of queries increases Applies to Non-Interactive Setting Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack. Cannot just output a noisy table. Non-interactive approach has inherent limitations Interactive approach works Can also publish a summary, as long as its clear which stats are accurate, and which ones are not. Future directions: Fewer queries Understand what can and what cannot be done privately