Uncircumventable
Privacy Policies
Arvind Narayanan
Vitaly Shmatikov
The University of Texas at Austin
Outsourced Customer Support
[Figure: database]
“Answer our customers’ questions,
but do NOT download the entire list
of their social security numbers”
What Does NOT Work (1)
[Figure: database]
DRM / tamper-proof systems
have distressing track record
“Tamper-proof” access control
system blocks forbidden queries
What Does NOT Work (2)
[Figure: database]
But user must be able to
answer questions about
specific records
Randomize database records
(cf. privacy-preserving data mining)
NSA Phonebook
Name            Phone      Email
John Q. Spook   555-1212   [email protected]
Bob Ispy        987-6543   [email protected]
Tom Carnivore   212-2121   [email protected]
Bill Sigint     GET-RUDE   [email protected]
We want the database to behave like a lookup oracle,
i.e., like a function lookup: Names → Phones
• Lookup(name) is easy to compute
• Retrieving list of names or list of phones is infeasible
• Retrieving phone if name is not known is infeasible
Why?
Usual notion of privacy: access
control using “credentials”
Our notion: retain control over
data after it has been released
Publish databases but prevent people
(e.g. spammers) from harvesting
information indiscriminately
Easy to do if a trusted entity
mediates every access to the data;
we want to achieve the same level of
security in a non-interactive setting
Big Picture
[Figure: data query attempts against the database]
Allowed queries are easy
Disallowed queries are infeasible
Use cryptography to implement
data-in-a-box – “virtual black box”
Computationally infeasible:
no trusted third parties,
no access control software,
no ad-hoc data scrambling …
Our Objectives
Not secrecy of individual records
We want to scramble the database so that
queries not permitted by the policy are
“impossible” to evaluate
• Note: permitted queries may reveal a lot about
individual records
• This depends on the policy!
Obfuscation: “Virtual Black Box”
Data-in-a-box, code-in-a-box
• Data D, query Q = same as program P s.t. P(Q) = Q(D)
• Think of data as simply a special case of code
Study of putting code in a box: obfuscation
An obfuscated version of a program…
• Has the same output with high probability on all inputs (functionality)
• Runs roughly as fast as the original (efficiency)
• Reveals no more about the original program code than
does a black box implementing the function
(obfuscation)
– … assuming a computationally bounded adversary
Obfuscation: State of the Art
Ad-hoc obfuscation schemes tend to be broken
• No proofs of security, many successfully attacked
– Example: Boneh-Jacob-Felten attack on obfuscated DES
General-purpose obfuscation is impossible
• Barak et al. (CRYPTO 2001)
• No single obfuscator for all circuits
Special-purpose obfuscation
• Example: UNIX password hashes
• Obfuscation of “string equality”, a.k.a. “point function”
– f_α(x) = 1 if x = α, else 0
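A minimal sketch of the point-function idea in the style of UNIX password hashes. The per-instance salt, the choice of SHA-256, and the names `obfuscate_point` and `evaluate` are assumptions made for illustration; the slides only say that a hash of α is stored.

```python
import hashlib
import hmac
import secrets


def obfuscate_point(alpha: bytes) -> tuple[bytes, bytes]:
    """Return (salt, beta); alpha itself is not stored anywhere."""
    salt = secrets.token_bytes(16)                 # assumption: salted, as in UNIX password files
    beta = hashlib.sha256(salt + alpha).digest()   # beta = H(alpha)
    return salt, beta


def evaluate(obf: tuple[bytes, bytes], x: bytes) -> int:
    """Compute f_alpha(x) using only the stored hash: 1 if x == alpha, else 0."""
    salt, beta = obf
    candidate = hashlib.sha256(salt + x).digest()
    return int(hmac.compare_digest(candidate, beta))


obf = obfuscate_point(b"correct horse")
print(evaluate(obf, b"correct horse"), evaluate(obf, b"wrong guess"))
```

Anyone holding `obf` can test a guess for α, but learning α from it requires inverting the hash or guessing it outright, which is only hard when α has enough entropy.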
Obfuscation Examples
Point function: x → [ H(x) =? β, where β = H(α) ] → Yes/No
Decryption with key α: x → D_α(x) (we don’t know how)
• Should work for every α
• Obfuscated circuit should reveal nothing about α
Basic Approach: Simulatability
Define ideal functionality for obfuscated database
• Formalization of “privacy policy”
• Secure by definition!
• Describes permitted queries and/or access patterns
– What we want our database to look like (e.g., lookup function)
Define the obfuscation algorithm
Argue that no efficient adversary can tell the
difference between the obfuscated database and
a simulation in the ideal functionality
• Therefore, obfuscated database does not leak any
information beyond what’s given by ideal functionality
Simulatability
[Figure: an obfuscator maps the original database to the obfuscated database; a simulator with access to the ideal functionality (e.g., lookup function) produces a fake obfuscated database; "?" marks the adversary's task of telling the two apart]
Secure by definition!
Cannot leak anything
that’s not permitted by
ideal functionality
• No probabilistic polynomial-time adversary should be able
to distinguish the simulation from the real obfuscated
database with more than negligible probability
Formal Definition (Lookup Only)
D is the database, i.e., list of (x,y) pairs
I_D: X → Y is the ideal lookup functionality
• ∀x∈X s.t. (x,y_1), …, (x,y_n) ∈ D: I_D(x) = {y_1, …, y_n}, else ∅
G_D: X → Y is the obfuscation of D if
(1) Correct retrieval (allowed queries are feasible)
∀x∈X: Prob( G_D(x) ≠ I_D(x) ) ≤ negl()
(2) Virtual black-box (disallowed queries infeasible)
∀ PPT adversary A, ∃ PPT simulator S:
| Prob(A(G_D)=1) − Prob(S^{I_D}(1^{|I_D|})=1) | ≤ negl()
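The ideal lookup functionality can be pictured as a trusted party that answers one lookup at a time. The sketch below is only an illustration: the name `ideal_lookup` is hypothetical, and returning the empty set for an absent x is an assumption about the "else" case.

```python
from collections import defaultdict


def ideal_lookup(D):
    """Build the ideal functionality for a list D of (x, y) pairs."""
    table = defaultdict(set)
    for x, y in D:
        table[x].add(y)

    def I_D(x):
        # Answers exactly one lookup; offers no way to enumerate the x's or y's.
        return frozenset(table.get(x, ()))

    return I_D


I_D = ideal_lookup([("a", 1), ("a", 2), ("b", 3)])
print(sorted(I_D("a")), sorted(I_D("zzz")))
```

An obfuscation G_D meets the definition if it agrees with this oracle on every x (except with negligible probability) and gives an adversary nothing that a simulator with oracle access to I_D could not produce.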
Discussion of the Definition
Indistinguishability from ideal functionality (IF) is
not always the same as intuitive “privacy”
• Some forms of access are permitted by IF
• The goal is not to hide individual data records, but to
control how they can be accessed
E.g., obfuscated phonebook is indistinguishable
from the lookup function: Names → Phones
• It’s hard to find the phone if you don’t know the name
• Does not say that it’s hard to find the name for which
there is a phone in the database
Is this the right definition? Depends on application!
Construction (Lookup Only)
ith row of the original database: ( xi, yi )
ith row of the obfuscated database: ( r1, hash(r1,xi), r2, hash(r2,xi) ⊕ yi )
To learn xi from the obfuscated database,
need to invert the hash function
Easy simulatability proof in random oracle model
Access time is now linear in |D|
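A rough sketch of the lookup-only construction, under stated assumptions: SHA-256 stands in for the random oracle, r1 and r2 are fresh 16-byte random strings per row, values are at most 32 bytes, and the function names are hypothetical. The phonebook pairs follow the order in which names and numbers appear in the NSA Phonebook slide.

```python
import hashlib
import secrets


def h(r: bytes, x: bytes) -> bytes:
    return hashlib.sha256(r + x).digest()  # r has fixed length, so r + x is unambiguous


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))  # truncates to the shorter input


def obfuscate(D):
    """Turn each (x, y) into (r1, hash(r1,x), r2, hash(r2,x) XOR y)."""
    rows = []
    for x, y in D:
        assert len(y) <= 32, "toy limit: one hash output of pad"
        r1, r2 = secrets.token_bytes(16), secrets.token_bytes(16)
        rows.append((r1, h(r1, x), r2, xor(h(r2, x), y)))
    secrets.SystemRandom().shuffle(rows)  # do not leak the original row order
    return rows


def lookup(G, x: bytes) -> set:
    """Linear scan: test every row's tag against x, decrypt the matches."""
    found = set()
    for r1, tag, r2, masked in G:
        if h(r1, x) == tag:
            found.add(xor(h(r2, x), masked))
    return found


G = obfuscate([(b"John Q. Spook", b"555-1212"), (b"Bob Ispy", b"987-6543")])
print(lookup(G, b"Bob Ispy"), lookup(G, b"Nobody"))
```

The scan over all rows in `lookup` is where the linear access time comes from: without knowing which row holds x, every tag has to be tested.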
Group Privacy
Extracting one record is easy
• Legitimate account access
• Response to a customer request
Harvesting many records is hard
• Harvesting of emails for spam
• Theft of financial information
• Unauthorized transaction monitoring
Inverse of the census problem
(allows access to individual records
but hides some global property)
Applications
Electronic directories: prevent malicious users
from harvesting information from the directory
Outsourced customer support: support clerk can
easily look up a record in response to a customer
request, but cannot steal data wholesale
Multi-institution drug trials: share encrypted test
subject records, reveal some of them later
• Revelation condition is not known in advance
– “Open records of all subjects with this group of symptoms”
• To prevent dictionary attacks, queries based on partial
information should take a long time to evaluate
Exponential Slowdown
Legitimate questions vs. mass harvesting
• Intuition: legitimate users know what they are looking
for and can describe it precisely
– “Give me the email of John Q. Public, born 1969”
• Abusers want all information indiscriminately
– “Give me the emails of all males under 50”
Idea: if N records satisfy user’s query, force user
to guess N bits to compute the answer
• Answer encrypted, user learns all but N bits of the key
What queries can be obfuscated in this way?
Simple Example
Name    YOB    Email
Smith   1949   [email protected]
Brown   1952   [email protected]
Smith   1972   [email protected]
Jones   1949   [email protected]

SELECT EMAIL WHERE NAME=“Smith”
SELECT EMAIL WHERE YOB=1949
• User can’t learn email without guessing 2 bits
SELECT EMAIL WHERE NAME=“Smith” AND YOB=1949
• User can’t learn email without guessing 1 bit
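The bit counts in this example are just the number of rows each query matches. A small sketch (the helper name `bits_to_guess` is hypothetical, and emails are omitted because only the match count matters):

```python
ROWS = [
    {"name": "Smith", "yob": 1949},
    {"name": "Brown", "yob": 1952},
    {"name": "Smith", "yob": 1972},
    {"name": "Jones", "yob": 1949},
]


def bits_to_guess(rows, **where):
    """N = number of rows that satisfy every equality in `where`."""
    return sum(all(row[f] == v for f, v in where.items()) for row in rows)


for query in ({"name": "Smith"}, {"yob": 1949}, {"name": "Smith", "yob": 1949}):
    n = bits_to_guess(ROWS, **query)
    print(query, "->", n, "bit(s), up to", 2 ** n, "key guesses")
```

The more precisely the user describes the record, the fewer rows match and the cheaper the query becomes.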
Obfuscation of a Small Database
Obfuscated rows, one per original record:

Record (Smith, 1949): r1 r2 r3 r4
  H(r1,“Smith”)    H(r2,“Smith”) ⊕ (24)
  H(r3,“1949”)     H(r4,“1949”) ⊕ (23)
  H(1234) ⊕ “[email protected]”
Record (Brown, 1952): q1 q2 q3 q4
  H(q1,“Brown”)    H(q2,“Brown”) ⊕ (134)
  H(q3,“1952”)     H(q4,“1952”) ⊕ (134)
  H(1234) ⊕ “[email protected]”
Record (Smith, 1972): p1 p2 p3 p4
  H(p1,“Smith”)    H(p2,“Smith”) ⊕ (24)
  H(p3,“1972”)     H(p4,“1972”) ⊕ (124)
  H(1234) ⊕ “[email protected]”
Record (Jones, 1949): s1 s2 s3 s4
  H(s1,“Jones”)    H(s2,“Jones”) ⊕ (123)
  H(s3,“1949”)     H(s4,“1949”) ⊕ (23)
  H(1234) ⊕ “[email protected]”

Annotations:
• H(r1,“Smith”): helps user verify that he found the right row
• H(1234): random 4-bit key
• (24), (23), …: hidden key bits depend on other database entries
Can obfuscate any logical circuit of equalities and not-equalities
on individual field values
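One way to read the diagram, offered as an interpretation rather than the authors' exact scheme: each record has a random key with one bit per database row, and the parenthesized digits such as (24) list the key bits a field value unlocks, namely all positions except those of the rows sharing that value. The sketch below follows that reading. The `check` hash for recognizing a correct key guess, the salts, the example.org addresses, and all function names are assumptions.

```python
import hashlib
import secrets
from itertools import product


def H(*parts: bytes) -> bytes:
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)  # length prefix keeps parts unambiguous
    return h.digest()


def obfuscate(records):
    """records: list of (name, yob, email) strings; toy limits: <= 32 rows, short emails."""
    n = len(records)
    assert n <= 32
    out = []
    for rec in records:
        key = bytes(secrets.randbits(1) for _ in range(n))  # one key bit per row
        fields = []
        for col in (0, 1):
            v = rec[col].encode()
            same = {j for j, r in enumerate(records) if r[col] == rec[col]}
            shown = [j for j in range(n) if j not in same]  # revealed bit positions
            s_tag, s_pad = secrets.token_bytes(16), secrets.token_bytes(16)
            pad = H(s_pad, v)
            masked = bytes(key[j] ^ (pad[i] & 1) for i, j in enumerate(shown))
            fields.append((s_tag, H(s_tag, v), s_pad, shown, masked))
        body = rec[2].encode()
        assert len(body) <= 32
        cipher = bytes(a ^ b for a, b in zip(body, H(b"enc", key)))
        out.append((fields, H(b"check", key), cipher))
    return out


def query(obf, known):
    """known: {column: value}. Returns (emails found, key guesses spent)."""
    n, emails, guesses = len(obf), [], 0
    for fields, check, cipher in obf:
        bits, match = {}, True
        for col, value in known.items():
            s_tag, tag, s_pad, shown, masked = fields[col]
            v = value.encode()
            if H(s_tag, v) != tag:       # verification hash: not this row
                match = False
                break
            pad = H(s_pad, v)
            bits.update((j, masked[i] ^ (pad[i] & 1)) for i, j in enumerate(shown))
        if not match:
            continue
        hidden = [j for j in range(n) if j not in bits]
        for guess in product((0, 1), repeat=len(hidden)):  # brute-force hidden bits
            guesses += 1
            trial = {**bits, **dict(zip(hidden, guess))}
            key = bytes(trial[j] for j in range(n))
            if H(b"check", key) == check:
                emails.append(bytes(a ^ b for a, b in zip(cipher, H(b"enc", key))).decode())
                break
    return emails, guesses


RECORDS = [("Smith", "1949", "smith49@example.org"), ("Brown", "1952", "brown@example.org"),
           ("Smith", "1972", "smith72@example.org"), ("Jones", "1949", "jones@example.org")]
print(query(obfuscate(RECORDS), {0: "Smith", 1: "1949"}))
```

With both fields known, only one key bit of the matching row stays hidden; with the name alone, two bits stay hidden in each of the two Smith rows, so the guessing cost grows with the number of rows the query matches.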
More Practical Construction
Space inefficiency is due to N random bits for
each row
Goal: generate N bits from a small random seed
so that any subset can be selectively revealed.
Attempt 1: sqrt(N) “blocks” of pseudorandom
sequences
• If a whole block is to be revealed, simply output the
seed, else output the selected bits from that block
• Same worst case, better average case complexity
Better construction: Merkle tree
              kroot
       k0              k1
   k00    k01      k10    k11
k0 = hash(kroot||0), k1 = hash(kroot||1)
kleft = hash(kparent||0), kright = hash(kparent||1)
Each key reveals the subtree
rooted at that node, but nothing more
Each leaf is a “block of random bits”
When there are O(1) hidden bits,
the space complexity is O(k log n)
Open questions: worst case? provable?
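A sketch of the key tree described above, assuming SHA-256 as the hash and the ASCII characters "0" and "1" as the appended bit (both are assumptions). `derive` walks down from any node key, and the hypothetical helper `cover` finds the smallest set of node keys that reveals a chosen set of leaves and nothing else.

```python
import hashlib


def derive(key: bytes, path: str) -> bytes:
    """Follow a path such as "01" downward: child = hash(parent || bit)."""
    for bit in path:
        key = hashlib.sha256(key + bit.encode()).digest()
    return key


def cover(leaves: set, depth: int) -> list:
    """Paths of the fewest nodes whose subtrees contain exactly `leaves`."""
    def walk(path: str) -> list:
        span = 1 << (depth - len(path))              # leaves under this node
        first = int(path, 2) * span if path else 0   # index of its leftmost leaf
        inside = sum(1 for i in range(first, first + span) if i in leaves)
        if inside == span:
            return [path]                            # reveal this node's key
        if inside == 0:
            return []
        return walk(path + "0") + walk(path + "1")
    return walk("")


k_root = b"\x00" * 32  # placeholder seed for the demo; use a random seed in practice
k0 = derive(k_root, "0")
print(derive(k0, "1") == derive(k_root, "01"))   # k0 reveals its whole subtree
print(cover(set(range(8)) - {5}, 3))             # hide only leaf 5 of 8
```

Hiding one leaf out of eight takes three node keys, one per tree level, which matches the logarithmic space claim for a constant number of hidden bits.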
Summary
Obfuscation is an interesting notion of privacy
• Orthogonal to commonly used definitions
What are the interesting ideal functionalities?
• Lookup, exponential slowdown… what else?
Provably secure constructions for a large class of
access patterns
Practical implementation still a challenge
More details in our CCS 2005 paper
• … and several forthcoming papers