18739A: Foundations of Security and Privacy
Privacy Research Overview
Anupam Datta
Fall 2007-08
Privacy Research Space
 What is Privacy? [Philosophy, Law, Public Policy] (TODAY)
 Formal Model, Policy Language, Compliance-check Algorithms [Programming Languages, Logic] (next 3 lectures)
 Implementation-level Compliance [Software Engg, Formal Methods]
 Data Privacy [Databases, Cryptography] (TODAY)
Philosophical studies on privacy
 Reading
• Overview article in the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/privacy/
• Alan Westin, Privacy and Freedom, 1967
• Ruth Gavison, Privacy and the Limits of Law, 1980
• Helen Nissenbaum, Privacy as Contextual Integrity, 2004 (more on Nov 8)
Westin 1967
 Privacy as control over information
"Privacy is the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others"
 Relevant when you give personal information to a web site and agree to the privacy policy posted there
 May not apply to your personal health information
Gavison 1980
 Privacy as limited access to self
"A loss of privacy occurs as others obtain information about an individual, pay attention to him, or gain access to him. These three elements of secrecy, anonymity, and solitude are distinct and independent, but interrelated, and the complex concept of privacy is richer than any definition centered around only one of them."
 Basis for the database privacy definition discussed later
Gavison 1980
 On utility
"We start from the obvious fact that both perfect privacy and total loss of privacy are undesirable. Individuals must be in some intermediate state – a balance between privacy and interaction… Privacy thus cannot be said to be a value in the sense that the more people have of it, the better."
 This balance between privacy and utility will show up in data privacy as well as in privacy policy languages; e.g., health data could be shared with medical researchers
Privacy Laws in the US
 HIPAA (Health Insurance Portability and Accountability Act, 1996)
• Protecting personal health information
 GLBA (Gramm-Leach-Bliley Act, 1999)
• Protecting personal information held by financial service institutions
 COPPA (Children's Online Privacy Protection Act, 1998)
• Protecting information posted online by children under 13
 More details in the lecture on Nov 8.
Data Privacy
 Releasing sanitized databases
• k-anonymity
• (c,t)-isolation
• Differential privacy
 Privacy Preserving Data Mining
Sanitization of Databases
Real Database (RDB) → Sanitized Database (SDB): add noise, delete names, etc.
• Examples: health records, census data
• Goals: protect privacy, provide useful information (utility)
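To make the pipeline concrete, here is a minimal Python sketch of one sanitization pass; the record layout, generalization rule, and noise range are illustrative assumptions, not a scheme from the lecture:

```python
import random

def sanitize(rdb):
    """One illustrative sanitization pass: drop names, generalize, add noise."""
    sdb = []
    for rec in rdb:
        sdb.append({
            # "name" is deleted entirely (a direct identifier)
            "zip": rec["zip"][:3] + "**",               # generalize ZIP code
            "age": rec["age"] + random.randint(-2, 2),  # perturb age with noise
            "diagnosis": rec["diagnosis"],              # keep the useful column
        })
    return sdb

rdb = [{"name": "Alice", "zip": "15213", "age": 34, "diagnosis": "flu"}]
print(sanitize(rdb))
```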
Re-identification by linking
• Linking two sets of data on shared attributes may uniquely identify some individuals:
• Example [Sweeney]: de-identified medical data was released; Sweeney purchased the Voter Registration List of Massachusetts and re-identified the Governor
• 87% of the US population is uniquely identifiable by 5-digit ZIP, sex, and date of birth
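A toy illustration of such a linkage attack; all records below are invented, and the quasi-identifier (zip, dob, sex) mirrors the Sweeney example:

```python
medical = [   # "de-identified" release: names removed, diagnosis kept
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "heart disease"},
]
voters = [    # public voter registration list: names present
    {"name": "J. Doe", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
]

QUASI = ("zip", "dob", "sex")
for m in medical:
    matches = [v for v in voters if all(v[k] == m[k] for k in QUASI)]
    if len(matches) == 1:  # a unique match re-identifies the record
        print(matches[0]["name"], "->", m["diagnosis"])
```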
K-anonymity (1)
 Quasi-identifier: set of attributes (e.g., ZIP, sex, date of birth) that can be linked with external data to uniquely identify individuals in the population
 Make every record in the table indistinguishable from at least k-1 other records with respect to the quasi-identifiers
 Linking on quasi-identifiers then yields at least k records for each possible value of the quasi-identifier
K-anonymity and beyond
• Provides some protection: in the slide's example table, linking on ZIP, age, and nationality yields at least 4 records
• Limitations: lack of diversity in sensitive attributes, background knowledge, subsequent releases of the same data set
• Utility: less suppression implies better utility (see the sketch below)
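A minimal sketch of checking the k-anonymity condition, assuming a table that has already been generalized in the style of the slide's example (the concrete rows here are invented):

```python
from collections import Counter

def is_k_anonymous(table, quasi_ids, k):
    """True iff every quasi-identifier value occurs in at least k records."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in table)
    return all(count >= k for count in groups.values())

table = [  # generalized: ZIP truncated, age bucketed, nationality suppressed
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "heart disease"},
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "viral infection"},
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "nationality": "*", "disease": "cancer"},
]
print(is_k_anonymous(table, ["zip", "age", "nationality"], k=4))  # True
```

Note that the table is 4-anonymous, yet two of the four records share the same sensitive value, illustrating the lack-of-diversity limitation above.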
(c,t)-isolation (2)
Mathematical definition motivated by Gavison's idea that privacy is protected to the extent that an individual blends into a crowd.
Image courtesy of WaldoWiki: http://images.wikia.com/waldo/images/a/ae/LandofWaldos.jpg
Definition of (c,t)-isolation
Let y be any RDB point, and let δy = ║q − y║₂. We say that q (c,t)-isolates y iff the ball B(q, c·δy) contains fewer than t points of the RDB, that is, |B(q, c·δy) ∩ RDB| < t.
A database is represented by n points in high-dimensional space (one dimension per column).
[Figure: RDB points x1, x2, …, a query point q at distance δy from a point y, and the ball of radius c·δy around q.]
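A direct transcription of the definition into code; the points and parameters are illustrative assumptions, and NumPy is used only for the Euclidean norm:

```python
import numpy as np

def isolates(q, y, rdb, c, t):
    """True iff q (c,t)-isolates y: B(q, c*δy) holds fewer than t RDB points."""
    delta_y = np.linalg.norm(q - y)    # δy = ║q − y║₂
    radius = c * delta_y               # radius of the ball B(q, c·δy)
    inside = sum(np.linalg.norm(q - x) <= radius for x in rdb)
    return inside < t

rdb = [np.array(p, dtype=float) for p in [(0, 0), (0.1, 0), (5, 5), (5.2, 5)]]
q = np.array([0.05, 0.0])
print(isolates(q, rdb[0], rdb, c=2.0, t=2))  # False: y blends into a crowd of 2
```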
Differential Privacy: Motivation (3)
Guaranteeing that a sanitized database does not reveal any private information is too hard:
• Auxiliary info: Terry is an inch taller than average
• Sanitized database: the average height is 6 feet
• The sanitized database only provided non-private data, but private information was still learned
 All surveyors really need is for people to be comfortable supplying their private data
 People will be comfortable if providing their data does not change the sanitized database enough to be noticed
Differential Privacy: Formalization
 Want a sanitization function K that maps two databases D1 and D2 that differ by one person to about the same sanitized databases K(D1) and K(D2)
 Make any disclosure S about as likely under K(D1) as under K(D2)
 A randomized function K gives ε-differential privacy if for all data sets D1 and D2 differing in at most one element and all subsets S of Range(K):
Pr[K(D1) ∈ S] ≤ exp(ε) × Pr[K(D2) ∈ S]
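The slides give only the definition; one standard mechanism that satisfies it (an addition here, not from the lecture) is Laplace noise scaled to the query's sensitivity. A sketch for a counting query, whose sensitivity is 1:

```python
import numpy as np

def noisy_count(db, predicate, epsilon):
    """ε-differentially private count via the Laplace mechanism."""
    true_count = sum(1 for row in db if predicate(row))
    # A counting query has sensitivity 1 (one person changes it by at most 1),
    # so Laplace noise with scale 1/ε gives ε-differential privacy.
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

db = [{"height": 70}, {"height": 74}, {"height": 68}]
print(noisy_count(db, lambda r: r["height"] > 69, epsilon=0.5))
```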
Privacy Preserving Data Mining
 Reference
• Y. Lindell and B. Pinkas. Privacy Preserving Data Mining. Journal of Cryptology, 15(3):177-206, 2002.
 Problem
• Compute some function of two confidential databases without revealing unnecessary information
• Example: intersecting a government database of suspected terrorists with an airline passenger database
 Approach
• Cryptographic techniques for secure multiparty computation (a toy sketch follows)
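As a flavor of the approach, here is a toy sketch of private set intersection based on the classic commutative (DDH-style) blinding idea. This is not the Lindell-Pinkas protocol from the paper, and the prime and hash-to-group step below are illustrative, insecure parameter choices:

```python
import hashlib
import secrets

P = 2**127 - 1  # a toy prime; real protocols use a proper DDH-hard group

def h(item):
    """Hash an item to a nonzero group element (illustrative only)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P or 1

def blind(items, key):
    """Raise each hashed item to a secret exponent mod P."""
    return {pow(h(x), key, P) for x in items}

# Each party picks a secret exponent; blinding commutes: (h^a)^b = (h^b)^a.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1

suspects   = {"alice", "mallory"}          # government watch list
passengers = {"bob", "mallory", "carol"}   # airline passenger list

# Each side sends its blinded set; the other side blinds it again.
double_govt = {pow(v, b, P) for v in blind(suspects, a)}
double_air  = {pow(v, a, P) for v in blind(passengers, b)}

# Equal double-blinded values correspond exactly to common elements.
print(len(double_govt & double_air))  # 1 (the one common name, "mallory")
```

A real deployment would also need to hide set sizes and authenticate the parties; the point here is only the commutative-blinding structure.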
The Security Definition
(Slide: Lindell)
 For every real adversary A attacking the protocol (REAL interaction), there exists an adversary S interacting with a trusted party (IDEAL interaction).
 Computational Indistinguishability: every probabilistic polynomial-time observer that receives the input/output distribution of the honest parties and the adversary outputs 1, upon receiving the distribution generated in IDEAL, with probability negligibly close to when it is generated in REAL.
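In the standard simulation-based notation (an assumption here; the slide states the definition only in prose and pictures), this reads:

```latex
% For every probabilistic polynomial-time real-world adversary A there
% exists a PPT ideal-world simulator S such that the two output
% distributions are computationally indistinguishable on all inputs x, y:
\forall A \;\exists S : \quad
\{\mathrm{IDEAL}_{f,S}(x, y)\}_{x,y}
\;\stackrel{c}{\equiv}\;
\{\mathrm{REAL}_{\pi,A}(x, y)\}_{x,y}
```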