OptRR: Optimizing Randomized
Response Schemes for Privacy-Preserving Data Mining
Zhengli Huang and Wenliang (Kevin) Du
Department of EECS
Syracuse University
[Figure: Individual Data → (Step 1: Data Collection) → Data Publisher → (Step 2: Data Publishing) → Data Miner (Data Mining/Analysis)]
Data cannot be published directly because of privacy concerns.
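Background:
Randomized Response
Question: "Do you smoke?" The true answer is "Yes".
Biased coin: $P(\text{Head}) = \theta$, $\theta \neq 0.5$
[Figure: the respondent flips the coin; Head → report "Yes", Tail → report "No", so the reported answer is "Yes" with probability $\theta$.]
$P'(\text{Yes}) = P(\text{Yes}) \cdot \theta + P(\text{No}) \cdot (1 - \theta)$
$P'(\text{No}) = P(\text{Yes}) \cdot (1 - \theta) + P(\text{No}) \cdot \theta$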
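The two equations above can be inverted to recover P(Yes) from the observed fraction of "Yes" reports. The following is a minimal Python sketch of that idea, not code from the presentation: theta stands for the coin bias P(Head), and the function names, the sample size, and the values 0.7 and 0.3 are illustrative assumptions.

```python
import random

def randomize(true_answer: bool, theta: float, rng: random.Random) -> bool:
    """Report the true answer with probability theta, otherwise its opposite."""
    return true_answer if rng.random() < theta else not true_answer

def estimate_p_yes(reported_yes_fraction: float, theta: float) -> float:
    """Solve P'(Yes) = P(Yes)*theta + (1 - P(Yes))*(1 - theta) for P(Yes)."""
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the reports independent of the truth")
    return (reported_yes_fraction - (1 - theta)) / (2 * theta - 1)

# Hypothetical population: 30% true "Yes", coin bias theta = 0.7.
rng = random.Random(0)
theta, true_p_yes, n = 0.7, 0.3, 100_000
truths = [rng.random() < true_p_yes for _ in range(n)]
reports = [randomize(t, theta, rng) for t in truths]
observed = sum(reports) / n
estimate = estimate_p_yes(observed, theta)
print(f"observed P'(Yes) = {observed:.3f}, estimated P(Yes) = {estimate:.3f}")
```

The guard on theta = 0.5 reflects the condition on the slide: with a fair coin the reported answers carry no information about the true ones, so the equations cannot be inverted.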
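RR for Categorical Data
[Figure: a true value $s_i$ is reported as $s_i$, $s_{i+1}$, $s_{i+2}$, $s_{i+3}$ with probabilities $q_1$, $q_2$, $q_3$, $q_4$, respectively.]
$$
\begin{pmatrix} P'(s_1) \\ P'(s_2) \\ P'(s_3) \\ P'(s_4) \end{pmatrix}
=
\underbrace{\begin{pmatrix}
q_1 & q_4 & q_3 & q_2 \\
q_2 & q_1 & q_4 & q_3 \\
q_3 & q_2 & q_1 & q_4 \\
q_4 & q_3 & q_2 & q_1
\end{pmatrix}}_{M}
\begin{pmatrix} P(s_1) \\ P(s_2) \\ P(s_3) \\ P(s_4) \end{pmatrix}
$$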
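The matrix M on this slide is circulant: column j is the vector (q1, q2, q3, q4) shifted down j − 1 positions with wrap-around. A small Python sketch of building M and computing the disguised distribution P' = M P follows; the function names and the numeric values of q and p are hypothetical, chosen only for illustration.

```python
def circulant_rr_matrix(q):
    """M[i][j] = q[(i - j) mod n]: probability that true value s_j is reported as s_i."""
    n = len(q)
    return [[q[(i - j) % n] for j in range(n)] for i in range(n)]

def disguised_distribution(M, p):
    """Compute P' = M P for an original distribution p."""
    n = len(p)
    return [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]

# Hypothetical values: q sums to 1, p is the original distribution.
q = [0.4, 0.3, 0.2, 0.1]
p = [0.1, 0.2, 0.3, 0.4]
M = circulant_rr_matrix(q)
p_disguised = disguised_distribution(M, p)
for row in M:
    print(row)
print(p_disguised)
```

Each column of M sums to 1 because q does, so P' is again a probability distribution.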
A Generalization
• Several RR Matrices have been proposed
– [Warner 65]
– [R.Agrawal et al. 05], [S. Agrawal et al. 05]
• RR Matrix can be arbitrary
$$
M = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
$$
• Can we find optimal RR matrices?
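For an arbitrary RR matrix, two basic operations are checking that it is a valid RR matrix (non-negative entries, each column summing to 1) and recovering the original distribution from the disguised one by solving M P = P'. The sketch below illustrates both in plain Python. Using matrix inversion as the estimator is an assumption here (it is a common choice for RR schemes, not something the slides spell out), and the 3×3 matrix and all names are hypothetical.

```python
def is_valid_rr_matrix(M, tol=1e-9):
    """Square, non-negative, and every column sums to 1."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    if any(x < -tol for row in M for x in row):
        return False
    return all(abs(sum(M[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    A = [list(M[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the original distribution cannot be recovered")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Hypothetical RR matrix and original distribution.
M = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.3],
     [0.1, 0.2, 0.6]]
p = [0.5, 0.3, 0.2]
p_disguised = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
p_recovered = solve(M, p_disguised)
print(is_valid_rr_matrix(M), p_recovered)
```

In practice P' is estimated from a finite sample of disguised records, so the recovered distribution carries estimation error that depends on M.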

What is an optimal matrix?
• Which of the following is better?
1 0 0


M1  0 1 0

0 0 1

13
1
M 2  3
1

3
1
3
1
3
1
3
Privacy: M2 is better
Utility: M1 is better
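The contrast between the identity matrix and the all-1/3 matrix can be checked numerically. In the sketch below (an illustration under assumed names and an assumed prior p, not the presenters' metrics), the posterior over the true value given a report is computed with Bayes' rule. With the identity matrix the report reveals the true value exactly; with the uniform matrix the posterior equals the prior, but the disguised distribution is uniform regardless of p, so nothing about the aggregate can be recovered.

```python
def disguised(M, p):
    """Disguised distribution P' = M P."""
    n = len(p)
    return [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]

def posterior(M, p, reported):
    """Bayes' rule: P(true = s_j | reported = s_i) for the reported index i."""
    joint = [M[reported][j] * p[j] for j in range(len(p))]
    total = sum(joint)
    return [x / total for x in joint]

M1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M2 = [[1 / 3] * 3 for _ in range(3)]
p = [0.5, 0.3, 0.2]  # hypothetical original distribution

print("M1 posterior:", posterior(M1, p, 0))  # the report pins down the true value
print("M2 posterior:", posterior(M2, p, 0))  # the report adds nothing to the prior
print("M1 disguised:", disguised(M1, p))     # identical to p
print("M2 disguised:", disguised(M2, p))     # uniform, whatever p is
```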

So, what is an optimal matrix?
Optimal RR Matrix
• An RR matrix M is optimal if no other RR matrix is
better than M in both privacy and utility (i.e., no other
matrix dominates M).
– Privacy Quantification
– Utility Quantification
• A number of privacy and utility metrics have been
proposed. We use the following:
– Privacy: how accurately one can estimate individual info.
– Utility: how accurately we can estimate aggregate info.
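The dominance-based definition of optimality can be sketched as follows. This illustration assumes each matrix has already been scored as a pair (privacy, utility_error), where higher privacy and lower error are better, and it uses the standard Pareto dominance rule (no worse in both objectives, strictly better in at least one). The scores and matrix names are made up for the example.

```python
def dominates(a, b):
    """True if score a = (privacy, utility_error) Pareto-dominates score b."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def optimal_matrices(scores):
    """Names of matrices that no other matrix dominates."""
    return [
        name for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]

# Hypothetical scores: (privacy, utility_error).
scores = {
    "A": (0.9, 0.30),
    "B": (0.5, 0.10),
    "C": (0.4, 0.20),  # dominated by B: less privacy and more error
    "D": (0.1, 0.05),
}
print(optimal_matrices(scores))
```

Note that several matrices can be optimal at once, since each one may trade privacy against utility differently.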
Optimization Methods
• Approach 1: Weighted sum:
w1 Privacy + w2 Utility
• Approach 2
– Fix Privacy, find M with the optimal Utility.
– Fix Utility, find M with the optimal Privacy.
– Challenge: Difficult to generate M with a fixed
privacy or utility.
• Our Approach: Multi-Objective Optimization
Evolutionary Multi-Objective
Optimization (EMOO)
• Genetic algorithms have difficulty dealing with
multiple objectives.
• We use an EMOO algorithm: SPEA2.
Our SPEA2-based algorithm
EMOO
• Evolution
– Crossover
– Mutation
• Fitness Assignment (SPEA2)
– Strength value S(M): the number of matrices dominated by M.
– Raw fitness F’(M): the sum of the strengths of the RR
matrices that dominate M. The lower the better.
– Density d(M): discriminates between matrices with the same raw fitness.
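The fitness assignment above can be sketched in a few lines of Python. This is an illustration of the general SPEA2 scheme rather than the presenters' implementation: it assumes each matrix is represented by a point of two objectives that are both minimized (for example, privacy loss and utility error), and it uses the density definition from the SPEA2 paper, d = 1 / (σ_k + 2), where σ_k is the distance to the k-th nearest neighbor and k = ⌊√N⌋. The sample points are hypothetical.

```python
import math

def dominates(a, b):
    """Pareto dominance for minimized objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def spea2_fitness(points):
    """Return (strength, raw_fitness, fitness) lists for the given objective points."""
    n = len(points)
    # S(M): how many points each point dominates.
    strength = [sum(dominates(p, q) for q in points) for p in points]
    # F'(M): sum of the strengths of all points that dominate this one.
    raw = [
        sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
        for i in range(n)
    ]
    k = max(1, int(math.sqrt(n)))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        density = 1.0 / (sigma_k + 2.0)  # always below 1, so it only breaks ties
        fitness.append(raw[i] + density)
    return strength, raw, fitness

# Hypothetical objective values, both minimized.
points = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.9, 0.9)]
strength, raw, fitness = spea2_fitness(points)
print(strength, raw, fitness)
```

Non-dominated points have raw fitness 0, so their final fitness is below 1, and among them the points in less crowded regions score lower (better), which is how the density term promotes diversity.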
Diversity
[Figure: RR matrices M1 to M5 plotted in the objective space; axes: Privacy and Utility, with directions labeled Worse and Better.]
The Output of Optimization
• Pareto Fronts
– The optimal set is often plotted in the objective
space and the plot is called the Pareto front.
[Figure: Pareto front; axes: Privacy and Utility (error).]
Experiments
[Figures: experimental results for (1) a normal distribution with different δ, (2) the first attribute of the Adult data, (3) a normal distribution (δ=0.75).]
Summary
• We use an evolutionary multi-objective
optimization technique to search for optimal RR
matrices.
• The evaluation shows that our scheme achieves
better performance than the existing RR
schemes.