Private Keyword Search on Streaming Data
Rafail Ostrovsky
William Skeith
UCLA
http://www.cs.ucla.edu/~rafail/
(patent pending)
Motivating Example

• The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis.
  • Network traffic
  • Chat rooms
  • Web sites, etc.
• However, what is “useful” is often classified.
Current Practice
• Continuously transfer all data to a secure environment.
• After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.
[Figure: three document streams (…, D(1,3), D(1,2), D(1,1); …, D(2,3), D(2,2), D(2,1); …, D(3,3), D(3,2), D(3,1)) flow into a Filter inside the Classified Environment, which writes to Storage.]
Filter rules are written by an analyst and are classified!
Current Practice
• Drawbacks:
  • Communication
  • Processing
How to improve performance?
• Distribute work to many locations on a network
• Seemingly ideal solution, but…
• Major problem:
  • Not clear how to maintain privacy, which is the focus of this talk
[Figure: each of the three streams (…, D(1,3), D(1,2), D(1,1); …, D(2,3), D(2,2), D(2,1); …, D(3,3), D(3,2), D(3,1)) passes through its own Filter with local Storage. The filters send encrypted documents E(D(1,2)), E(D(1,3)) and E(D(2,2)) to the Classified Environment, which decrypts them and stores D(1,2), D(1,3) and D(2,2).]

Example Filter:
• Look for all documents that contain special classified keywords, selected by an analyst
  • Perhaps an alias of a dangerous criminal
• Privacy
  • Must hide what words are used to create the filter
  • Output must be encrypted
More generally:
• We define the notion of Public Key Program Obfuscation
  • Encrypted version of a program
  • Performs same functionality as un-obfuscated program, but:
    • Produces encrypted output
    • Impossible to reverse engineer
A little more formally:
Public Key Program Obfuscation
Privacy
Related Notions
• PIR (Private Information Retrieval) [CGKS], [KO], [CMS]…
• Keyword PIR [KO], [CGN], [FIPR]
• Program Obfuscation [BGIRSVY]…
  • Here output is identical to un-obfuscated program, but in our case it is encrypted.
• Public Key Program Obfuscation
  • A more general notion than PIR, with lots of applications
What we want
[Figure: the stream …, D(1,3), D(1,2), D(1,1) passes through the Filter. Matching documents #1, #2 and #3 reach Storage; the non-matching documents in the stream do not.]
How to accomplish this?
• Several solutions based on homomorphic encryption
• For this talk: Paillier encryption
• Properties:
  • Plaintext set = Z_n
  • Ciphertext set = Z*_{n^2}
  • Homomorphic, i.e., E(x)E(y) = E(x+y)
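The homomorphic property can be checked with a toy Paillier implementation. This is a minimal sketch under loud assumptions: the primes are tiny illustrative values (not secure), the generator is fixed to n + 1, and the function names keygen, encrypt and decrypt are hypothetical, not taken from the talk.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation. The primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (1 + n)^m * r^n mod n^2 for a random unit r."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover m from c using L(u) = (u - 1) / n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

n, sk = keygen()
product = encrypt(n, 5) * encrypt(n, 7) % (n * n)
print(decrypt(n, sk, product))  # multiplying ciphertexts adds plaintexts
```

Raising a ciphertext to a power k multiplies the plaintext by k, which follows from the same property applied k times.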

Simplifying Assumptions for this Talk
• All keywords come from some poly-size dictionary
• Truncate documents beyond a certain length
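[Figure: the filter program is a Dictionary table pairing each word with an encrypted bit: w1 → E(0), w2 → E(1), w3 → E(0), w4 → E(0), w5 → E(1), …, w_{t-2} → E(1), w_{t-1} → E(0), w_t → E(0). For a document D, the pair (g, g^D) is computed and multiplied into several positions of the Output Buffer, whose entries all start as E(0).]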
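One reading of the dictionary and output-buffer diagram can be sketched as follows. It is a sketch under stated assumptions, not the authors' implementation: the Paillier parameters are toy values, every function name is hypothetical, the number of buffer positions written per document (copies) is an assumed parameter, and a document is represented as a single integer smaller than n.

```python
import math
import random

# Toy Paillier parameters (assumption: tiny primes, illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def make_filter(dictionary, keywords):
    """Analyst side: one encrypted bit per dictionary word."""
    return {w: enc(1 if w in keywords else 0) for w in dictionary}

def new_buffer(size):
    """Every slot starts as a pair of encryptions of zero."""
    return [(enc(0), enc(0)) for _ in range(size)]

def process(doc_words, doc_value, enc_filter, buffer, copies=3):
    """Public side: fold one document into the buffer without decrypting."""
    g = enc(0)
    for w in set(doc_words):
        if w in enc_filter:
            g = g * enc_filter[w] % N2      # g = E(number of keyword hits)
    g_d = pow(g, doc_value, N2)             # g^D = E(hits * D)
    for _ in range(copies):
        i = random.randrange(len(buffer))
        c0, c1 = buffer[i]
        buffer[i] = (c0 * g % N2, c1 * g_d % N2)

def decode(buffer):
    """Analyst side: slots holding copies of one document yield D = (hits*D)/hits."""
    found = set()
    for c0, c1 in buffer:
        hits = dec(c0)
        if hits:
            found.add(dec(c1) * pow(hits, -1, N) % N)
    return found

f = make_filter(["alias", "meet", "noon", "park"], {"alias"})
buf = new_buffer(8)
process(["meet", "noon"], 111, f, buf)       # no keyword: adds E(0) only
process(["alias", "park"], 424242, f, buf)   # contains the keyword
print(decode(buf))
```

A non-matching document contributes only encryptions of zero, so it leaves the decrypted buffer unchanged. When two different matching documents land in the same slot, the decoded value is meaningless, which this sketch does not detect.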
Collisions cause two problems:
1. Good documents are destroyed
2. Non-existent documents could be fabricated
[Figure: matching documents #1, #2, #3 and one more matching document being written into the buffer.]
We’ll make use of two combinatorial lemmas…
How to detect collisions?
• Append a highly structured (yet random) k-bit string to the message
• The sum of two or more such strings will be another such string only with probability negligible in k
• Specifically, partition the k bits into triples of bits, and set exactly one bit from each triple to 1
  100|001|100|010|010|100|001|010|010
⊕ 010|001|010|001|100|001|100|001|010
⊕ 010|100|100|100|010|001|010|001|010
= 100|100|010|111|100|100|111|010|010
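The structured strings can be sketched in a few lines. The worked example on the slide is consistent with combining strings by bitwise XOR, so this sketch assumes XOR; how the combination is carried out under encryption is not shown here. The names random_tag, combine, is_valid and parse are hypothetical.

```python
import secrets

def random_tag(triples):
    """One 3-bit group per triple, with exactly one bit set in each group."""
    return [1 << secrets.randbelow(3) for _ in range(triples)]

def combine(tags):
    """Bitwise XOR of several tags, group by group (assumed combination rule)."""
    out = [0] * len(tags[0])
    for tag in tags:
        out = [a ^ b for a, b in zip(out, tag)]
    return out

def is_valid(tag):
    """A tag is well formed when every group has exactly one bit set."""
    return all(group in (1, 2, 4) for group in tag)

def parse(text):
    """Read the slide notation, e.g. '100|001|010'."""
    return [int(group, 2) for group in text.split("|")]

slide_tags = [
    parse("100|001|100|010|010|100|001|010|010"),
    parse("010|001|010|001|100|001|100|001|010"),
    parse("010|100|100|100|010|001|010|001|010"),
]
merged = combine(slide_tags)
print(is_valid(slide_tags[0]), is_valid(merged))
```

Under XOR, two colliding tags can never look valid, because each group ends up with either zero or two bits set. With three or more tags, a single group stays valid with constant probability, so the chance that all k/3 groups stay valid shrinks exponentially in k.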
Detecting Overflow > m
• Double buffer size from m to 2m
• If m < #documents < 2m, output “overflow”
• If #documents > 2m, then the expected number of collisions is large, thus output “overflow” in this case as well.
• Not yet in eprint version, will appear soon, as well as some other extensions.
More from the paper that we don’t have time to discuss…
• Reducing program size below dictionary size (using Φ-Hiding from [CMS])
• Queries containing AND (using [BGN] machinery)
• Eliminating negligible error (using perfect hashing)
• Scheme based on arbitrary homomorphic encryption
Conclusions
• Private searching on streaming data
• Public key program obfuscation, more general than PIR
• Practical, efficient protocols
• Many open problems
Thanks for listening!
