Tools for Privacy Preserving Distributed Data Mining
By Michael Holmes
Why Private Data Mining
❖
The CDC may want to use data mining techniques to
identify trends in disease outbreaks.
❖
Insurance companies have useful data but can’t disclose
it because of privacy concerns.
❖
Is there a way to obtain this data without revealing the
identity of the patients?
Private Data Mining Techniques
❖
Secure Sum
❖
Secure Set Union
❖
Secure Size of Set Intersection
❖
Scalar Product
Private Data Mining Toolkit
❖
Association Rules in horizontally partitioned data
❖
Association Rules in vertically partitioned data
❖
EM Clustering
Secure Sum
❖
Securely compute the sum from individual databases.
❖
Site 1 randomly generates a number R.
❖
Site 1 adds R to each of its N values and sends the masked values to Site 2.
❖
Site 2 adds its own values to the values received from Site 1 and
returns a single number to Site 1.
❖
Site 1 then subtracts R N times (N * R in total) to recover the
correct sum.
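The two-site steps above can be sketched in Python as follows. This is a minimal illustration, not a hardened protocol: the function names are hypothetical, and working modulo a fixed modulus is an added assumption (the slide does not say how R is chosen). It assumes the true sum is non-negative and smaller than the modulus.

```python
import secrets

def site1_mask(values, r, modulus):
    # Site 1 adds the same random number R to each of its values.
    return [(v + r) % modulus for v in values]

def site2_combine(masked_values, own_values, modulus):
    # Site 2 adds its own values and returns a single number.
    return (sum(masked_values) + sum(own_values)) % modulus

def site1_unmask(total, r, n, modulus):
    # Site 1 removes R once for each of its n masked values.
    return (total - n * r) % modulus

def secure_sum(site1_values, site2_values, modulus=2**32):
    r = secrets.randbelow(modulus)  # assumption: R drawn uniformly below modulus
    masked = site1_mask(site1_values, r, modulus)
    combined = site2_combine(masked, site2_values, modulus)
    return site1_unmask(combined, r, len(site1_values), modulus)
```

For example, secure_sum([3, 5, 7], [10, 20]) returns 45 regardless of which R is drawn.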
Secure Set Union
Secure Size of Set Intersection
❖
Relies on commutative encryption: encrypting with several keys gives the same result in any order.
❖
Every party encrypts its data and then sends it to another party.
❖
The next party also encrypts the encrypted data.
❖
After all parties have encrypted all the data from every other
party, only items that are shared produce duplicate ciphertexts.
❖
Count the duplicates and you know the size of the intersection.
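A two-party sketch of the steps above, in Python. The slide does not name a cipher, so this assumes modular exponentiation over a prime as the commutative encryption, since (h^a)^b = (h^b)^a mod p. All function names and the choice of modulus are illustrative assumptions, and the toy omits the shuffling of ciphertexts that a real deployment would need.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used here only as a toy modulus

def keygen():
    # Keys coprime to P - 1 make exponentiation a one-to-one mapping.
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(item):
    # Map an item to a nonzero number below P.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1

def encrypt(value, key):
    return pow(value, key, P)

def intersection_size(set_a, set_b):
    key_a, key_b = keygen(), keygen()
    # Each party encrypts its own items and sends them to the other party.
    a_once = [encrypt(hash_to_group(x), key_a) for x in set_a]
    b_once = [encrypt(hash_to_group(x), key_b) for x in set_b]
    # Each party encrypts the other party's already-encrypted items.
    a_twice = {encrypt(c, key_b) for c in a_once}
    b_twice = {encrypt(c, key_a) for c in b_once}
    # Shared items end up as identical ciphertexts; count the duplicates.
    return len(a_twice & b_twice)
```

With set_a = {"alice", "bob", "carol"} and set_b = {"bob", "carol", "dave"}, the two doubly encrypted sets share two ciphertexts, so the function returns 2 without either side seeing the other's plaintext items.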
Scalar Product
❖
Want to compute the sum of the products x_i * y_i, where one
database holds the x values and the other holds the y values
❖
Use linear combinations of random numbers to disguise
elements and then computationally remove these once
you get the result.
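The idea of disguising elements with linear combinations of random numbers can be sketched as below. This is a simplified toy that only demonstrates the algebra (the random terms cancel and the true scalar product comes out); it is not a secure implementation, and the function name, the public coefficient matrix C, and the use of a single random value on party B's side are assumptions made for illustration. Both parties are simulated inside one function.

```python
import random

def masked_scalar_product(x, y, rng=None):
    """Toy two-party dot product; all names are illustrative only."""
    rng = rng or random.Random()
    n = len(x)
    m = max(1, n // 2)
    # Public coefficient matrix C (n rows, m columns), known to both parties.
    C = [[rng.randint(1, 9) for _ in range(m)] for _ in range(n)]

    # Party A: hide each x_i behind a linear combination of random numbers R_j.
    R = [rng.randint(1, 10**6) for _ in range(m)]
    x_masked = [x[i] + sum(C[i][j] * R[j] for j in range(m)) for i in range(n)]

    # Party B: dot product with the masked vector, plus its own combinations
    # z_j = sum_i C[i][j] * y_i, each shifted by a random number r_b.
    s = sum(x_masked[i] * y[i] for i in range(n))
    r_b = rng.randint(1, 10**6)
    z_masked = [sum(C[i][j] * y[i] for i in range(n)) + r_b for j in range(m)]

    # Party A: strip its own random terms; only r_b * sum(R) remains.
    t = s - sum(R[j] * z_masked[j] for j in range(m))
    r_total = sum(R)

    # Party B: remove the last random term to get the true scalar product.
    return t + r_b * r_total
```

Because the arithmetic is on integers, the cancellation is exact: masked_scalar_product([1, 2, 3, 4], [5, 6, 7, 8]) returns 70 for any random draw.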
Association Rules in Horizontally Partitioned Data
❖
Candidate Set Generation
❖
Local Pruning
❖
Itemset Exchange (Secure Union Step here)
❖
Support Count Exchange
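The data flow of the four steps above can be sketched as follows. This is a plaintext illustration only: the union and the sums are computed in the clear as stand-ins for the secure set union and secure sum, candidates are generated by brute force rather than from the previous round's frequent itemsets, and the function names are hypothetical. Each site is modeled as a list of transactions (sets of items).

```python
from itertools import combinations

def local_count(transactions, itemset):
    # Number of transactions at one site that contain the itemset.
    return sum(1 for t in transactions if itemset <= t)

def mine_round(sites, items, size, min_support):
    # Candidate set generation: every itemset of the requested size.
    candidates = [frozenset(c) for c in combinations(sorted(items), size)]

    # Local pruning and itemset exchange: each site keeps only the candidates
    # that are frequent locally; the union stands in for the secure union.
    exchanged = set()
    for transactions in sites:
        for c in candidates:
            if local_count(transactions, c) >= min_support * len(transactions):
                exchanged.add(c)

    # Support count exchange: per-site counts are added up (a stand-in for
    # secure sum) and compared against the global threshold.
    total = sum(len(transactions) for transactions in sites)
    return {c for c in exchanged
            if sum(local_count(t, c) for t in sites) >= min_support * total}
```

Local pruning is safe because an itemset that is frequent globally must be frequent at one site or more, so it always survives into the exchanged set.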
Association Rules in Vertically Partitioned Data
❖
Uses scalar product to determine if the count of an item
set is greater than a threshold
❖
If the count is above the threshold you’ve determined
that the database is worth querying
❖
Can also use Secure Size of Set Intersection to see how
much is in common.
❖
Useful with algorithms such as Apriori.
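To see why the scalar product gives the count of an itemset, assume each site holds one binary column over the same ordered set of records (1 if the record contains that site's item, 0 otherwise). The sketch below computes the product in the clear; in the private setting this step would be replaced by the scalar product protocol. The function names are hypothetical.

```python
def support_count(column_a, column_b):
    # A record contributes 1 only when both sites have a 1 for it.
    return sum(a * b for a, b in zip(column_a, column_b))

def is_above_threshold(column_a, column_b, threshold):
    # Mirrors the slide: the itemset passes if its count exceeds the threshold.
    return support_count(column_a, column_b) > threshold
```

With column_a = [1, 0, 1, 1, 0] and column_b = [1, 1, 1, 0, 0], two records contain both items, so the support count is 2.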
EM Clustering
❖
Uses secure sum to get a global number associated with
all sites involved.
❖
Once the global sum is computed, it can be used in the
expectation-maximization (EM) method to generate statistical
models.
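One way to picture the role of the global sum: in the update of a cluster mean, each site needs only two local numbers, and the new mean depends only on their totals across sites. The sketch below assumes one-dimensional data and that the summed quantities are the responsibility-weighted sum of points and the sum of responsibilities; the slide does not specify which quantities are summed, so this is an assumption. Plain addition stands in for the secure sum, and the function names are hypothetical.

```python
def local_stats(points, responsibilities):
    # Each site computes these two numbers from its own data only.
    weighted_sum = sum(r * x for x, r in zip(points, responsibilities))
    weight = sum(responsibilities)
    return weighted_sum, weight

def global_mean(per_site_stats):
    # Plain addition here; a secure sum would replace these two totals.
    total_weighted = sum(stats[0] for stats in per_site_stats)
    total_weight = sum(stats[1] for stats in per_site_stats)
    return total_weighted / total_weight
```

For example, if one site reports local_stats([1.0, 2.0], [1.0, 1.0]) and another reports local_stats([4.0], [1.0]), the combined mean is 7/3, the same value a single site holding all three points would compute.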
Things to Note
❖
These algorithms are not fully private; some information
is learned in the process.
❖
For example, in the set intersection protocol, sites can
potentially learn the size of each database.
❖
Make sure to pick the appropriate algorithms for what
you need to accomplish
❖
Watch out for intermediate information being leaked!
Thank you