Uncircumventable Privacy Policies
Arvind Narayanan, Vitaly Shmatikov
The University of Texas at Austin

Outsourced Customer Support
[Figure: database]
“Answer our customers’ questions, but do NOT download the entire list of their social security numbers”

What Does NOT Work (1)
[Figure: database]
“Tamper-proof” access control system blocks forbidden queries
DRM / tamper-proof systems have a distressing track record

What Does NOT Work (2)
[Figure: database]
Randomize database records (cf. privacy-preserving data mining)
But user must be able to answer questions about specific records

NSA Phonebook
Name             Phone      Email
John Q. Spook    555-1212   [email protected]
Bob Ispy         987-6543   [email protected]
Tom Carnivore    212-2121   [email protected]
Bill Sigint      GET-RUDE   [email protected]
We want the database to behave like a lookup oracle, i.e., like a function lookup: Names → Phones
Lookup(name) is easy to compute
Retrieving list of names or list of phones is infeasible
Retrieving phone if name is not known is infeasible

Why?
Usual notion of privacy: access control using “credentials”
Our notion: retain control over data after it has been released
Publish databases but prevent people (e.g. spammers) from harvesting information indiscriminately
Easy to do if a trusted entity mediates every access to the data; we want to achieve the same level of security in a non-interactive setting

Big Picture
[Figure: query attempts against the database. Allowed queries are easy; disallowed queries are infeasible.]
Use cryptography to implement data-in-a-box (“virtual black box”)
Computationally infeasible: no trusted third parties, no access control software, no ad-hoc data scrambling …

Our Objectives
Not secrecy of individual records
We want to scramble the database so that queries not permitted by the policy are “impossible” to evaluate
• Note: permitted queries may reveal a lot about individual records
• This depends on the policy!

Obfuscation: “Virtual Black Box”
Data-in-a-box, code-in-a-box
• Data D, query Q = same as program P s.t. P(Q) = Q(D)
• Think of data as simply a special case of code
Study of putting code in a box: obfuscation
An obfuscated version of a program…
• Has the same output with high probability on all inputs (functionality)
• Runs roughly as fast as the original (efficiency)
• Reveals no more about the original program code than does a black box implementing the function (obfuscation)
  – … assuming a computationally bounded adversary

Obfuscation: State of the Art
Ad-hoc obfuscation schemes tend to be broken
• No proofs of security, many successfully attacked
  – Example: Boneh-Jacob-Felten attack on obfuscated DES
General-purpose obfuscation is impossible
• Barak et al. (CRYPTO 2001)
• No single obfuscator for all circuits
Special-purpose obfuscation
• Example: UNIX password hashes
• Obfuscation of “string equality”, a.k.a. “point function”
  – $f_\alpha(x) = 1$ if $x = \alpha$, else $0$

Obfuscation Examples
Point function: on input x, test $H(x) \stackrel{?}{=} \beta$, where $\beta = H(\alpha)$; output Yes/No
Decryption $D_\alpha(x)$: we don’t know how
• Should work for every α
• Obfuscated circuit should reveal nothing about α

Basic Approach: Simulatability
Define ideal functionality for obfuscated database
• Formalization of “privacy policy”
• Secure by definition!
• Describes permitted queries and/or access patterns
  – What we want our database to look like (e.g., lookup function)
Define the obfuscation algorithm
Argue that no efficient adversary can tell the difference between the obfuscated database and a simulation in the ideal functionality
• Therefore, obfuscated database does not leak any information beyond what’s given by ideal functionality

Simulatability
[Figure: the obfuscator turns the original database into the obfuscated database; the simulator, given only the ideal functionality (e.g., lookup function), produces a fake obfuscated database; the adversary has to tell the two apart.]
Ideal functionality is secure by definition: it cannot leak anything that’s not permitted by the ideal functionality
No probabilistic polynomial-time adversary should be able to distinguish the simulation and the real obfuscated database with more than negligible probability

Formal Definition (Lookup Only)
D is the database, i.e., list of (x,y) pairs
$I_D: X \to Y$ is the ideal lookup functionality
• $\forall x \in X$: if $(x,y_1), \ldots, (x,y_n) \in D$ then $I_D(x) = \{y_1, \ldots, y_n\}$, else $I_D(x) = \bot$
$G_D: X \to Y$ is the obfuscation of D if
(1) Correct retrieval (allowed queries are feasible)
    $\forall x \in X:\ \mathrm{Prob}(G_D(x) \neq I_D(x)) \le \mathrm{negl}()$
(2) Virtual black-box (disallowed queries infeasible)
    $\forall$ PPT adversary $A$, $\exists$ PPT simulator $S$: $|\mathrm{Prob}(A(G_D) = 1) - \mathrm{Prob}(S^{I_D}(1^{I_D}) = 1)| \le \mathrm{negl}()$

Discussion of the Definition
Indistinguishability from ideal functionality (IF) is not always the same as intuitive “privacy”
• Some forms of access are permitted by IF
• The goal is not to hide individual data records, but to control how they can be accessed
E.g., obfuscated phonebook is indistinguishable from the lookup function: Names → Phones
• It’s hard to find the phone if you don’t know the name
• Does not say that it’s hard to find the name for which there is a phone in the database
Is this the right definition? Depends on application!

Construction (Lookup Only)
ith row of the original database: $(x_i, y_i)$
ith row of the obfuscated database: $r_1,\ \mathrm{hash}(r_1, x_i),\ r_2,\ \mathrm{hash}(r_2, x_i) \oplus y_i$
To learn $x_i$ from the obfuscated database, need to invert the hash function
Easy simulatability proof in random oracle model
Access time is now linear in |D|

Group Privacy
Extracting one record is easy
• Legitimate account access
• Response to a customer request
Harvesting many records is hard
• Harvesting of emails for spam
• Theft of financial information
• Unauthorized transaction monitoring
Inverse of the census problem (allows access to individual records but hides some global property)

Applications
Electronic directories: prevent malicious users from harvesting information from the directory
Outsourced customer support: support clerk can easily look up a record in response to a customer request, but cannot steal data wholesale
Multi-institution drug trials: share encrypted test subject records, reveal some of them later
• Revelation condition is not known in advance
  – “Open records of all subjects with this group of symptoms”
• To prevent dictionary attacks, queries based on partial information should take a long time to evaluate

Exponential Slowdown
Legitimate questions vs. mass harvesting
• Intuition: legitimate users know what they are looking for and can describe it precisely
  – “Give me the email of John Q. Public, born 1969”
• Abusers want all information indiscriminately
  – “Give me the emails of all males under 50”
Idea: if N records satisfy user’s query, force user to guess N bits to compute the answer
• Answer encrypted, user learns all but N bits of the key
What queries can be obfuscated in this way?

Simple Example
Name    YOB    Email
Smith   1949   [email protected]
Brown   1952   [email protected]
Smith   1972   [email protected]
Jones   1949   [email protected]
SELECT EMAIL WHERE NAME=“Smith”
SELECT EMAIL WHERE YOB=1949
• User can’t learn email without guessing 2 bits
SELECT EMAIL WHERE NAME=“Smith” AND YOB=1949
• User can’t learn email without guessing 1 bit

Obfuscation of a Small Database
Each row gets a random 4-bit key and its own random values (r, q, p, s below). The first hash of each pair helps the user verify that he found the right row. The hidden key bits depend on other database entries.
Row 1 (Smith, 1949), random values r1 r2 r3 r4:
  H(r1, “Smith”)   H(r2, “Smith”) ⊕ (key bits 2, 4)
  H(r3, “1949”)    H(r4, “1949”) ⊕ (key bits 2, 3)
  H(key bits 1, 2, 3, 4) ⊕ “[email protected]”
Row 2 (Brown, 1952), random values q1 q2 q3 q4:
  H(q1, “Brown”)   H(q2, “Brown”) ⊕ (key bits 1, 3, 4)
  H(q3, “1952”)    H(q4, “1952”) ⊕ (key bits 1, 3, 4)
  H(key bits 1, 2, 3, 4) ⊕ “[email protected]”
Row 3 (Smith, 1972), random values p1 p2 p3 p4:
  H(p1, “Smith”)   H(p2, “Smith”) ⊕ (key bits 2, 4)
  H(p3, “1972”)    H(p4, “1972”) ⊕ (key bits 1, 2, 4)
  H(key bits 1, 2, 3, 4) ⊕ “[email protected]”
Row 4 (Jones, 1949), random values s1 s2 s3 s4:
  H(s1, “Jones”)   H(s2, “Jones”) ⊕ (key bits 1, 2, 3)
  H(s3, “1949”)    H(s4, “1949”) ⊕ (key bits 2, 3)
  H(key bits 1, 2, 3, 4) ⊕ “[email protected]”
Can obfuscate any logical circuit of equalities and not-equalities on individual field values

More Practical Construction
Space inefficiency is due to N random bits for each row
Goal: generate N bits from a small random seed so that any subset can be selectively revealed.
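To see where the N key bits per row come from, the four-row table from the “Simple Example” slide can be worked through in a short sketch. It only counts which key bits stay hidden for a query; it performs no hashing or encryption. The function name `hidden_bits` and the data layout are assumptions made for illustration, and the emails are left out because they are not legible in the transcript.

```python
# Rows of the example table as (name, year of birth).
# Rows are numbered from 1 so they line up with the key-bit positions.
ROWS = [("Smith", 1949), ("Brown", 1952), ("Smith", 1972), ("Jones", 1949)]


def hidden_bits(name=None, yob=None):
    """Return the key-bit positions a user still has to guess.

    Assumption for this sketch: bit i stays hidden exactly when row i
    matches every field value the user supplied, so a query satisfied
    by N rows leaves N bits to guess.
    """
    hidden = set()
    for position, (row_name, row_yob) in enumerate(ROWS, start=1):
        if name is not None and row_name != name:
            continue  # this row does not match, so its bit is revealed
        if yob is not None and row_yob != yob:
            continue
        hidden.add(position)
    return hidden


if __name__ == "__main__":
    print(hidden_bits(name="Smith"))            # rows 1 and 3 match
    print(hidden_bits(yob=1949))                # rows 1 and 4 match
    print(hidden_bits(name="Smith", yob=1949))  # only row 1 matches
```

Knowing “Smith” leaves bits 1 and 3 hidden (bits 2 and 4 are revealed), knowing 1949 leaves bits 1 and 4 hidden, and knowing both leaves only bit 1, which agrees with the 2-bit and 1-bit guesses stated on the slide.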
Attempt 1: sqrt(N) “blocks” of pseudorandom sequences
• If a whole block is to be revealed, simply output the seed, else output the selected bits from that block
• Same worst case, better average case complexity
Better construction: Merkle tree
[Figure: binary key tree with root $k_{root}$, children $k_0$, $k_1$ and leaves $k_{00}$, $k_{01}$, $k_{10}$, $k_{11}$]
• $k_0 = \mathrm{hash}(k_{root} \| 0)$, $k_1 = \mathrm{hash}(k_{root} \| 1)$
• In general: $k_{left} = \mathrm{hash}(k_{parent} \| 0)$, $k_{right} = \mathrm{hash}(k_{parent} \| 1)$
Each key reveals the subtree rooted at that node, but nothing more
Each leaf is a “block of random bits”
When there are O(1) hidden bits, the space complexity is O(k log n)
Open questions: worst case? provable?

Summary
Obfuscation is an interesting notion of privacy
• Orthogonal to commonly used definitions
What are the interesting ideal functionalities?
• Lookup, exponential slowdown… what else?
Provably secure constructions for a large class of access patterns
Practical implementation still a challenge
More details in our CCS 2005 paper
• … and several forthcoming papers
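The key tree from the “More Practical Construction” slide can be sketched roughly as follows. The slide only states the rule k_child = hash(k_parent || bit); SHA-256, the one-byte encoding of the bit, and the helper names `child_key`, `derive`, `cover`, and `reveal` are assumptions made for this illustration, not details from the talk.

```python
import hashlib


def child_key(parent_key, bit):
    # k_child = hash(k_parent || bit); SHA-256 and the one-byte
    # encoding of the bit are assumptions of this sketch.
    return hashlib.sha256(parent_key + bytes([bit])).digest()


def derive(key, path):
    # Walk down the tree from `key`, following a path such as "10".
    for b in path:
        key = child_key(key, int(b))
    return key


def cover(hidden_leaves, depth, prefix=""):
    """Paths of the topmost nodes whose subtrees contain no hidden leaf."""
    if not any(leaf.startswith(prefix) for leaf in hidden_leaves):
        return [prefix]  # whole subtree may be revealed with one key
    if len(prefix) == depth:
        return []        # this is a hidden leaf: reveal nothing
    return (cover(hidden_leaves, depth, prefix + "0")
            + cover(hidden_leaves, depth, prefix + "1"))


def reveal(root_key, hidden_leaves, depth):
    # Keys handed to the user: one per node in the cover.
    return {path: derive(root_key, path)
            for path in cover(hidden_leaves, depth)}


if __name__ == "__main__":
    root = hashlib.sha256(b"demo seed").digest()  # hypothetical seed
    revealed = reveal(root, {"01"}, depth=2)
    print(sorted(revealed))  # node paths whose keys are handed out
    # The key for node "1" lets the user recompute leaf "10".
    print(derive(revealed["1"], "0") == derive(root, "10"))
```

With one hidden leaf in a tree of depth 2, the user receives the keys for nodes "00" and "1". Each hidden leaf costs at most one revealed key per tree level, which is the intuition behind the O(k log n) space figure when O(1) bits are hidden.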