PRIVACY AND SECURITY ISSUES IN DATA MINING
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti
University of Pisa
Department of Computer Science
Privacy-Preserving Data Mining

New privacy-preserving data mining techniques:
- For individual privacy: personal data are private
- For corporate privacy: the knowledge extracted is private

Goal: to develop algorithms for modifying the original data so that
- private data are protected
- private knowledge remains private even after the mining tasks
- analysis results are still useful

Natural trade-off between privacy quantification and data utility
Secure Outsourcing of Data Mining

The server has access to the owner's data.

The data owner owns:
- the data
- the knowledge extracted from the data

Requirements:
- all encrypted transactions in D* and the items they contain are secure
- given any mining query, the server can compute the encrypted result
- encrypted mining and analysis results are secure
- the owner can decrypt the results and so reconstruct the exact result
- the space and time incurred by the owner in the process have to be minimal
A Solution for Pattern Mining: K-anonymity

Attack model: the attacker knows exactly the set of plain items and their true supports in D, and has access to the encrypted database D*
- Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
- Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)

Encryption:
- Replacing each plain item in D by a 1-1 substitution cipher
- Adding fake transactions
- K-anonymity: for each item e there are at least k-1 other cipher items

Decryption: a synopsis allows computing the actual support of every pattern
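
The encryption and decryption steps above can be illustrated with a minimal Python sketch. It is not the scheme from the slides: it assumes that the k-anonymity condition means each cipher item shares its support with at least k-1 other cipher items, it pads supports with single-item fake transactions, and it uses the list of fake transactions itself as the synopsis. The names `encrypt`, `is_k_anonymous` and `true_support` are hypothetical.

```python
from collections import Counter

def support(db, pattern):
    """Number of transactions in db that contain every item of pattern."""
    pattern = frozenset(pattern)
    return sum(1 for t in db if pattern <= t)

def encrypt(db, k):
    """Toy encryption: substitution cipher plus fake transactions.

    Assumption: items are grouped k at a time by support, and each
    group is padded up to its largest support with single-item fakes.
    """
    items = sorted({i for t in db for i in t})
    if len(items) < k:
        raise ValueError("need at least k distinct items")
    # Toy 1-1 substitution; a real table would be secret and randomized.
    cipher = {item: f"e{n}" for n, item in enumerate(items)}
    enc = [frozenset(cipher[i] for i in t) for t in db]
    sup = {e: support(enc, [e]) for e in cipher.values()}
    ranked = sorted(sup, key=lambda e: (-sup[e], e))
    groups = [ranked[i:i + k] for i in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()          # merge a short tail group
        groups[-1].extend(last)
    fakes = []
    for group in groups:
        target = sup[group[0]]       # largest support in the group
        for e in group:
            fakes.extend([frozenset([e])] * (target - sup[e]))
    return enc + fakes, cipher, fakes

def is_k_anonymous(enc_db, k):
    """True if every cipher item shares its support with >= k-1 others."""
    sup = Counter(i for t in enc_db for i in t)
    group_sizes = Counter(sup.values())
    return all(group_sizes[s] >= k for s in sup.values())

def true_support(enc_db, fakes, pattern):
    """Owner-side decryption: subtract the support contributed by fakes."""
    return support(enc_db, pattern) - support(fakes, pattern)
```

In this sketch the owner keeps `cipher` and `fakes` private and ships only the encrypted database to the server; any support returned by the server is corrected with `true_support`.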
Privacy-Preserving DT Framework

Goal: publishing and sharing various forms of data without disclosing sensitive personal information while preserving mining results
- Sequence data
- Query-log data
- …

Problem: anonymizing sequence data while preserving sequential pattern mining results

Attack model: sequence linking attack
- The attacker knows part of a sequence and wants to guess the whole correct sequence

Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
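
To make the notion of a k-infrequent sequence concrete, here is a small Python sketch. It assumes that the support of a pattern is the number of dataset sequences containing it as a (not necessarily contiguous) subsequence, and that a pattern is k-infrequent when that support is below k. The function names are hypothetical, and sequences are modeled as strings of single-character items.

```python
def is_subsequence(pattern, seq):
    """True if pattern's items occur in seq in the same order."""
    it = iter(seq)
    # Each `item in it` consumes the iterator up to the first match.
    return all(item in it for item in pattern)

def support(dataset, pattern):
    """Number of sequences in dataset that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in dataset)

def is_k_infrequent(dataset, pattern, k):
    """Assumed definition: a pattern supported by fewer than k sequences."""
    return support(dataset, pattern) < k
```

An attacker who knows a k-infrequent fragment that does occur in the data can narrow the candidates down to fewer than k sequences, which is what the sequence linking attack exploits.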
Running example: k = 2

Dataset D: BC, ABCD, ABCD, BCE, BCD

[Figure: the example proceeds through four steps: Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D'. Each tree node is labeled with an item and its support, e.g. B:3, C:3.]

Lcut (from Tree Pruning): BCE:1, BCD:1

LCS:
1. B C
2. B C D

Dataset D': BC, ABCD, ABCD, BC, ABCD
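
The running example can be approximated with a short Python sketch. It is a simplification made for illustration, not the algorithm from the slides: it works on whole sequences instead of tree nodes, it cuts every sequence that has a prefix with support below k, and it replaces each cut sequence with the kept sequence sharing the longest common subsequence (LCS) with it. Breaking LCS ties in favor of the shorter kept sequence is an assumption, and all function names are hypothetical. The sketch does not guarantee k-anonymity of its output in general.

```python
from collections import Counter

def prefix_counts(dataset):
    """Support of every prefix, i.e. the node counts of a prefix tree."""
    counts = Counter()
    for seq in dataset:
        for end in range(1, len(seq) + 1):
            counts[seq[:end]] += 1
    return counts

def lcs_length(a, b):
    """Length of the longest common subsequence (classic DP, one row)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def anonymize(dataset, k):
    """Simplified pruning plus LCS-based reattachment of cut sequences."""
    counts = prefix_counts(dataset)

    def survives(seq):
        # A sequence is kept only if every node on its path has support >= k.
        return all(counts[seq[:e]] >= k for e in range(1, len(seq) + 1))

    kept = sorted({s for s in dataset if survives(s)})
    if not kept:
        return []
    result = []
    for seq in dataset:
        if survives(seq):
            result.append(seq)
        else:
            # Assumed tie-break: prefer the shorter kept sequence.
            best = max(kept, key=lambda c: (lcs_length(seq, c), -len(c)))
            result.append(best)
    return result
```

With `anonymize(["BC", "ABCD", "ABCD", "BCE", "BCD"], 2)` the sequences BCE and BCD are cut and reattached to BC and ABCD respectively, giving the same D' as in the example.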