Privacy in Today’s World:
Solutions and Challenges
Rebecca Wright
Stevens Institute of Technology
26 June, 2003
Talk Outline
• Overview of privacy landscape
• Privacy-preserving data mining
• Privacy-protecting statistical analysis of large
databases
• Selective private function evaluation
• Conclusions
Erosion of Privacy
“You have zero privacy. Get over it.”
- Scott McNealy, 1999
• Changes in technology are making privacy harder to maintain.
– reduced cost for data storage
– increased ability to process lots of data
• Increased need for security may make privacy seem
less critical.
Historical Changes
• Small towns, little movement:
– very little privacy, social mechanisms helped
prevent abuse
• Large cities, increased movement:
– lost social mechanisms, but gained privacy
through anonymity
• Now:
– advancing technology is reducing privacy,
social mechanisms not replaced.
What Can We Do?
• Use technology, policy, and education to
– maintain/increase privacy
– provide new social mechanisms
– create new mathematical models for better
understanding
Problem: Using old models and old modes of thought
in dealing with situations arising from new
technology.
What is Privacy?
• May mean different things to different people
– seclusion: the desire to be left alone
– property: the desire to be paid for one’s data
– autonomy: the ability to act freely
• Generally: the ability to control the dissemination
and use of one’s personal information.
Privacy of Data
• Stored data
– encryption, computer security, intrusion
detection, etc.
• Data in transit
– encryption, network security, etc.
• Release of data
– current privacy-oriented work: P3P, Privacy
Bird, EPA, Internet Explorer 6.0, etc.
Internet Explorer 6
• Privacy slider for cookie handling:
– Block All Cookies: not usable
– High ... Low: a reasonable range of behavior, blocking some cookies based on their privacy policies
– Accept All Cookies: no privacy
• Fairly simple: deals only with cookies, uses limited information
Different Types of Data
• Transaction data
– created by interaction between stakeholder and
enterprise
– current privacy-oriented solutions useful
• Authored data
– created by stakeholder
– digital rights management (DRM) useful
• Sensor data
– stakeholders not clear at time of creation
– presents a real and growing privacy threat
Product Design as Policy Decision
• product decisions by large companies or public
organizations become de facto policy decisions
• often such decisions are made without conscious
thought to privacy impacts, and without public
discussion
• this is particularly true in the United States, where
there is not much relevant legislation
Example: Metro Cards
Washington, DC
New York City
- no record kept of per
card transactions
- transactions recorded by
card ID
- damaged card can be
replaced if printed
value still visible
- damaged card can be
replaced if card ID still
readable
- have helped find
suspects, corroborate
alibis
Privacy Tradeoffs?
• Privacy vs. security: maybe, but giving up one does not mean gaining the other (who is this person? is this a dangerous person?)
• Privacy vs. usability: reasonable defaults, easy and extensive customization, and visualization tools can provide both
Often the tradeoff is really with cost or power, rather than an inherent conflict with privacy.
Surveillance and Data Mining
• Analyze large amounts of data from diverse
sources.
• Law enforcement and homeland security:
– detect and thwart possible incidents before they occur
– identify and prosecute criminals after incidents occur
• Companies like to do this, too.
– Marketing, personalized customer service
Privacy-Preserving Data Mining
Allow multiple data holders to collaborate to
compute important (e.g. security-related)
information while protecting the privacy of other
information.
Particularly relevant now, with increasing focus on
security even at the expense of privacy (e.g., TIA,
Total Information Awareness).
Advantages of privacy protection
• protection of personal information
• protection of proprietary or sensitive
information
• fosters collaboration between different
agencies (since they may be more willing to
collaborate if they need not reveal their
information)
Cryptographic Approach
• Using cryptography, one can provably reveal
nothing except the output of the computation.
– Privacy-preserving computation of decision trees [LP00]
– Secure computation of approximate Hamming distance
of two large data sets [FIMNSW01]
– Privacy-protecting statistical analysis [CIKRRW01]
– Privacy-preserving association rule mining [KC02]
Randomization Approach
• Randomizes data before computation (which can
then either be distributed or centralized).
• Induces a tradeoff between privacy and
computation error.
– Distribution reconstruction algorithm from randomized
data [AS00]
– Association rule mining [ESAG02]
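
To make the randomization idea concrete, here is a minimal Python sketch (my own illustration, far simpler than the distribution reconstruction algorithm of [AS00]): zero-mean noise masks individual values but cancels out in aggregate statistics.

```python
# Randomization approach in miniature: perturb each record with
# zero-mean noise before release, then estimate aggregates.
import random

true_values = [random.gauss(50, 10) for _ in range(100_000)]  # private data
noise_scale = 25  # larger: more privacy, more per-record error
released = [x + random.uniform(-noise_scale, noise_scale) for x in true_values]

# An individual released value may be far from the truth...
print(abs(released[0] - true_values[0]))
# ...but the zero-mean noise cancels in the aggregate estimate of the mean.
true_mean = sum(true_values) / len(true_values)
est_mean = sum(released) / len(released)
print(round(true_mean, 2), round(est_mean, 2))  # close, within sampling error
```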
Comparison of Approaches
[Diagram positioning the two approaches against three costs: inefficiency, privacy loss, and inaccuracy]
• cryptographic approach: inefficiency, but no privacy loss or inaccuracy
• randomization approach: efficient, but incurs privacy loss and inaccuracy
Privacy-Protecting Statistics [CIKRRW01]
[Diagram: a client, who wishes to compute statistics of the servers' data, interacts with multiple servers, each holding a large database]
• Parties communicate using
cryptographic protocols designed so that:
– Client learns desired statistics, but learns nothing
else about data (including individual values or
partial computations for each database)
– Servers do not learn which fields are queried, or
any information about other servers’ data
– Computation and communication are very efficient
Non-Private and Inefficient Solutions
• Database sends client entire database (violates
database privacy)
• For sample size m, use SPIR to learn m values
(violates database privacy)
• Client sends selections to database, database does
computation (violates client privacy)
• General secure multiparty computation (not
efficient for large databases)
Secure Multiparty Computation
• Allows k players P_1, P_2, ..., P_k to privately
compute a function f of their inputs.
• Overhead is polynomial in the size of the inputs and
the complexity of f [Yao, GMW, BGW, CCD, ...]
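
As a toy illustration of the flavor of MPC (not the general circuit-based constructions cited above), the following Python sketch privately computes the special case f = sum via additive secret sharing: each player splits its input into k random shares, and any k−1 shares reveal nothing about it.

```python
# Additive secret sharing mod a public prime: the k players jointly
# learn the sum of their inputs, and nothing else.
import random

P = 2_147_483_647        # public prime modulus (2^31 - 1)
inputs = [12, 30, 7]     # the k = 3 players' private inputs

def share(secret, k):
    parts = [random.randrange(P) for _ in range(k - 1)]
    parts.append((secret - sum(parts)) % P)  # shares sum to the secret mod P
    return parts

# Player i sends its j-th share to player j.
shares = [share(x, len(inputs)) for x in inputs]

# Each player publishes the sum of the shares it received; combining
# these reveals only f(inputs) = sum(inputs), not any individual input.
partial = [sum(col) % P for col in zip(*shares)]
assert sum(partial) % P == sum(inputs) % P == 49
```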
Symmetric Private Information Retrieval (SPIR)
• Allows a client with input i to interact with a database
server holding x = x_1, ..., x_n so that the client learns
(only) x_i, and the server learns nothing about i
• Overhead is polylogarithmic in the size of the database x
[CMS, GIKM]
Homomorphic Encryption
• Certain computations on encrypted messages
correspond to other computations on the cleartext
messages.
• For additively homomorphic encryption:
– E(m1) · E(m2) = E(m1 + m2)
– which also implies E(m)^x = E(m·x)
• Paillier encryption is an example.
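
The following is a minimal Python sketch of Paillier encryption, with toy key sizes chosen for readability rather than security, demonstrating both homomorphic properties above.

```python
# Minimal Paillier sketch (demo-sized primes; not secure for real use).
import math
import random

def keygen(p=293, q=433):
    # p, q: small demo primes; n = p*q is the public modulus.
    n = p * q
    lam = math.lcm(p - 1, q - 1)    # Carmichael function of n
    mu = pow(lam, -1, n)            # with generator g = n + 1
    return (n, n + 1), (lam, mu)    # (public key, private key)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)      # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n  # L(u) = (u - 1) / n

pk, sk = keygen()
n, _ = pk
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
# Additive homomorphism: E(m1) * E(m2) = E(m1 + m2)
assert decrypt(pk, sk, c1 * c2 % (n * n)) == 42
# Scalar multiplication: E(m)^x = E(m * x)
assert decrypt(pk, sk, pow(c1, 3, n * n)) == 51
```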
Privacy-Protecting Statistics Protocol
• To learn mean and variance: enough to learn sum
and sum of squares.
• Server stores x_1, x_2, ..., x_n together with
z_1, z_2, ..., z_n, where z_i = (x_i)^2, and responds
to queries on both.
• An efficient protocol for sum therefore yields an
efficient protocol for mean and variance.
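
A quick sanity check (my own illustration, not from the talk) that the sum and the sum of squares of the selected sample determine its mean and variance:

```python
# Mean and variance from sum and sum of squares.
def mean_and_variance(total, total_sq, m):
    mean = total / m
    variance = total_sq / m - mean ** 2  # E[X^2] - (E[X])^2
    return mean, variance

sample = [4.0, 7.0, 13.0, 16.0]
m = len(sample)
print(mean_and_variance(sum(sample), sum(x * x for x in sample), m))
# (10.0, 22.5), matching the sample's population mean and variance
```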
Weighted Sum
Client wants to compute a selected linear combination
of m items: Σ_{j=1..m} w_j · x_{i_j}
Client (holding homomorphic encryption keys E, D):
– sets α_i = w_j if i = i_j, and α_i = 0 otherwise
– sends E(α_1), ..., E(α_n) to the server
Server (holding x_1, ..., x_n):
– computes v = Π_{i=1..n} E(α_i)^(x_i) = E(Σ_{i=1..n} α_i · x_i)
Client:
– decrypts v to obtain Σ_{i=1..n} α_i · x_i = Σ_{j=1..m} w_j · x_{i_j}
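
A toy run of the weighted-sum protocol (my sketch, reusing the keygen, encrypt, and decrypt helpers from the Paillier sketch above; the database values and indices are made up for illustration):

```python
# Client wants 2*x_3 + 5*x_7 without revealing the indices or weights.
pk, sk = keygen()
n_mod, _ = pk
n_sq = n_mod * n_mod

database = [5, 8, 2, 9, 4, 7, 1, 6, 3, 0]  # server's values x_1..x_10
weights = {2: 2, 6: 5}                      # 0-based: weight 2 at x_3, weight 5 at x_7

# Client: encrypted selection vector alpha (weight at chosen slots, 0 elsewhere).
enc_alpha = [encrypt(pk, weights.get(i, 0)) for i in range(len(database))]

# Server: v = prod_i E(alpha_i)^(x_i) = E(sum_i alpha_i * x_i), never seeing alpha.
v = 1
for c, x in zip(enc_alpha, database):
    v = v * pow(c, x, n_sq) % n_sq

# Client: decrypting yields 2*2 + 5*1 = 9.
assert decrypt(pk, sk, v) == 9
```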
Efficiency
• Linear communication and computation (feasible
in many cases)
• If n is large and m is small, we would like to do
better
Selective Private Function Evaluation
• Allows a client to privately compute a function f
over m inputs x_{i_1}, ..., x_{i_m}
• client learns only f(x_{i_1}, ..., x_{i_m})
• server does not learn i_1, ..., i_m
Unlike general secure multiparty computation, we
want communication complexity to depend on m,
not n (more precisely, polynomial in m and
polylogarithmic in n).
Security Properties
• Correctness: If client and server follow the
protocol, client’s output is correct.
• Client privacy: malicious server does not learn
client’s input selection.
• Database privacy:
– weak: malicious client learns no more than
output of some m-input function g
– strong: malicious client learns no more than
output of specified function f
Solutions based on MPC
• Input selection phase:
– server obtains a blinded version of each x_{i_j}
• Function evaluation phase:
– client and server use MPC to compute f on the m
blinded items
Input selection phase
Server (holding homomorphic encryption keys D, E):
– computes the encrypted database E(x_1), ..., E(x_n)
Client:
– retrieves E(x_{i_1}), ..., E(x_{i_m}) using SPIR(m, n)
– picks random c_1, ..., c_m and computes
E(x_{i_j} + c_j) = E(x_{i_j}) · E(c_j)
– sends the blinded ciphertexts E(x_{i_j} + c_j) back to the server
Server:
– decrypts the received values: s_j = x_{i_j} + c_j
Function Evaluation Phase
• Client has c = c_1, ..., c_m
• Server has s = s_1, ..., s_m, where s_j = x_{i_j} + c_j
• Use MPC to compute g(c, s) = f(s − c) = f(x_{i_1}, ..., x_{i_m})
• Total communication cost is polylogarithmic in n,
polynomial in m and |f|
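
A tiny sketch (mine, for the special case f = sum) of why the blinding is harmless: subtracting the client's blinds recovers f over the selected items, while neither party's view alone exposes them. In the real protocol the subtraction happens inside the secure computation, so s − c never appears in the clear.

```python
# Blinded function evaluation in miniature: g(c, s) = f(s - c).
import random

selected = [9, 1, 6]  # x_{i_1}, x_{i_2}, x_{i_3}
c = [random.randrange(1_000_000) for _ in selected]  # client's random blinds
s = [x + cj for x, cj in zip(selected, c)]           # server's blinded values

def f(values):
    return sum(values)

# Unblinding inside the MPC recovers f over the selected items.
assert f([sj - cj for sj, cj in zip(s, c)]) == f(selected) == 16
```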
Distributed Databases
• Same approach works to compute function over a
distributed database.
– Input selection phase done in parallel with each
database server
– Function evaluation phase done as single MPC
– Database privacy means only final outcome is
revealed to client.
Performance
   Complexity                               Security
1. m·SPIR(n,1,k) + O(k|f|)                  Strong
2. m·SPIR(n,1,1) + MPC(m,|f|)               Weak
3. SPIR(n,m,log n) + MPC(m,|f|) + k·m^2     Weak
4. SPIR(n,m,k) + MPC(m,|f|)                 Honest client only
Experiments are under way to determine whether
these methods are efficient in real-world settings.
Initial Experimental Results
• Initial implementation of the linear computation and
communication solution [H. Subramaniam & Z. Yang]
– implemented in Java and C++
– uses Paillier encryption
– uses synthetic data, with client and server as separate
processes on the same machine (a two-year-old Toshiba
laptop)
Initial Experimental Results
[Chart: total time (minutes, 0 to 30) vs. database size (0 to 100,000)]
Initial Experimental Results
[Chart: total time and encryption time (minutes, 0 to 30) vs. database size (0 to 100,000)]
Conclusions
• Privacy is in danger, but some important progress
has been made.
• Important challenges ahead:
– Usable privacy solutions (efficiency and user interface)
– Sensor data
– Better use of hybrid approach: decide what can safely
be disclosed, what needs moderate protection, and use
cryptographic protocols to protect most critical
information.
– Mathematical/formal models to understand and
compare different solutions.
Research Directions
• Investigate integration of cryptographic approach
and randomization approach:
– seek to maintain strong privacy and accuracy of
cryptographic approach, ...
– while benefiting from the improved efficiency of the
randomization approach
• Understand mathematically what the resulting
privacy and/or accuracy compromises are.
• Technology, policy, and education must work
together.