Download OptRR: Optimizing Randomized Response Schemes

OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining
Zhengli Huang and Wenliang (Kevin) Du
Department of EECS, Syracuse University

[Figure: Individual Data → Step 1: Data Collection → Data Publisher → Step 2: Data Publishing → Data Miner → Data Mining/Analysis]
Data cannot be published directly because of privacy concerns.

Background: Randomized Response
• Question: "Do you smoke?" The true answer is "Yes".
• The respondent tosses a biased coin with $P(\text{Head}) = \theta$, where $\theta \neq 0.5$. On Head, the respondent reports the true answer ("Yes"); on Tail, the opposite answer ("No").
• The distribution $P'$ of the reported answers is then related to the true distribution $P$ by
  $P'(\text{Yes}) = P(\text{Yes})\,\theta + P(\text{No})\,(1-\theta)$
  $P'(\text{No}) = P(\text{Yes})\,(1-\theta) + P(\text{No})\,\theta$

RR for Categorical Data
[Figure: a true value $S_i$ is reported as $S_i$, $S_{i+1}$, $S_{i+2}$, or $S_{i+3}$ with probabilities $q_1$, $q_2$, $q_3$, $q_4$ (indices taken cyclically)]
$$
\begin{pmatrix} P'(s_1) \\ P'(s_2) \\ P'(s_3) \\ P'(s_4) \end{pmatrix}
=
\underbrace{\begin{pmatrix}
q_1 & q_4 & q_3 & q_2 \\
q_2 & q_1 & q_4 & q_3 \\
q_3 & q_2 & q_1 & q_4 \\
q_4 & q_3 & q_2 & q_1
\end{pmatrix}}_{M}
\begin{pmatrix} P(s_1) \\ P(s_2) \\ P(s_3) \\ P(s_4) \end{pmatrix}
$$

A Generalization
• Several RR matrices have been proposed:
  – [Warner 65]
  – [R. Agrawal et al. 05], [S. Agrawal et al. 05]
• An RR matrix can be arbitrary:
$$
M = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
$$
• Can we find optimal RR matrices?

What Is an Optimal Matrix?
• Which of the following is better?
$$
M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
M_2 = \begin{pmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{pmatrix}
$$
• Privacy: M2 is better. Utility: M1 is better. (M1 always reports the true value, while the output of M2 is independent of the true value.)
• So, what is an optimal matrix?

Optimal RR Matrix
• An RR matrix M is optimal if no other RR matrix has both better privacy and better utility than M (i.e., no other matrix dominates M). This requires:
  – Privacy quantification
  – Utility quantification
• A number of privacy and utility metrics have been proposed. We use the following:
  – Privacy: how accurately one can estimate individual information.
  – Utility: how accurately one can estimate aggregate information.

Optimization Methods
• Approach 1: weighted sum, $w_1 \cdot \text{Privacy} + w_2 \cdot \text{Utility}$.
• Approach 2:
  – Fix privacy, and find the M with the optimal utility.
  – Fix utility, and find the M with the optimal privacy.
  – Challenge: it is difficult to generate an M with a fixed privacy or utility.
• Our approach: multi-objective optimization.

Evolutionary Multi-Objective Optimization (EMOO)
• Genetic algorithms have difficulty dealing with multiple objectives.
• We use an EMOO algorithm.
• Specifically, we use SPEA2.

Our SPEA2-Based Algorithm
• Evolution
  – Crossover
  – Mutation
• Fitness assignment (SPEA2)
  – Strength value S(M): the number of matrices dominated by M.
  – Raw fitness F'(M): the sum of the strengths of the RR matrices that dominate M. The lower, the better.
  – Density d(M): discriminates among matrices with the same raw fitness, to preserve diversity.
[Figure: candidate matrices M1 to M5 plotted in the objective space, with Privacy and Utility axes running from worse to better]

The Output of Optimization
• Pareto fronts
  – The optimal set is often plotted in the objective space; the plot is called the Pareto front.
[Figure: example Pareto front, with Privacy on one axis and Utility (error) on the other]

Experiments
[Figure: results for a normal distribution with different δ]
[Figure: results for the first attribute of the Adult data]
[Figure: results for a normal distribution (δ = 0.75)]

Summary
• We use an evolutionary multi-objective optimization technique to search for optimal RR matrices.
• The evaluation shows that our scheme achieves better performance than existing RR schemes.
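The two equations on the "Background: Randomized Response" slide can be checked with a minimal simulation. This sketch assumes the scheme drawn on the slide: each respondent reports the true answer with probability θ (Head) and the opposite answer otherwise, with θ ≠ 0.5. Because P'(Yes) = P(Yes)·θ + (1 - P(Yes))·(1 - θ) is linear in P(Yes), the data miner can invert it as P(Yes) = (P'(Yes) - (1 - θ)) / (2θ - 1). The function names and the example values θ = 0.7 and P(Yes) = 0.3 are illustrative choices, not taken from the slides.

```python
import random

def randomize(true_answer: bool, theta: float, rng: random.Random) -> bool:
    """Warner-style RR: keep the true answer with probability theta, flip it otherwise."""
    return true_answer if rng.random() < theta else not true_answer

def disguised_yes_prob(p_yes: float, theta: float) -> float:
    """P'(Yes) = P(Yes) * theta + P(No) * (1 - theta)."""
    return p_yes * theta + (1.0 - p_yes) * (1.0 - theta)

def estimate_yes_prob(observed_yes_rate: float, theta: float) -> float:
    """Invert the linear relation above; only possible when theta != 0.5."""
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the reported answers independent of the truth")
    return (observed_yes_rate - (1.0 - theta)) / (2.0 * theta - 1.0)

if __name__ == "__main__":
    rng = random.Random(0)
    theta, true_p = 0.7, 0.3  # hypothetical coin bias and true smoking rate
    population = [rng.random() < true_p for _ in range(100_000)]
    reported = [randomize(answer, theta, rng) for answer in population]
    observed = sum(reported) / len(reported)
    print(f"expected P'(Yes) = {disguised_yes_prob(true_p, theta):.3f}")
    print(f"observed P'(Yes) = {observed:.3f}")
    print(f"estimated P(Yes) = {estimate_yes_prob(observed, theta):.3f}")
```

The estimate becomes noisier as θ approaches 0.5, because the divisor 2θ - 1 shrinks; at θ = 0.5 the reported answers carry no information about the true ones.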
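The matrix on the "RR for Categorical Data" slide has a cyclic structure: entry (j, i) is the probability of reporting s_j when the true value is s_i, which equals q_{((j - i) mod 4) + 1}. Given the disguised distribution P' = M·P, an estimate of P follows from solving the linear system, provided M is invertible. The sketch below is a dependency-free illustration; the helper names and the example q and P are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # M[j][i] = P(report s_j | true value s_i)

def cyclic_rr_matrix(q: List[float]) -> Matrix:
    """Build the slide's cyclic matrix: M[j][i] = q[(j - i) mod n]."""
    if abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("q must sum to 1")
    n = len(q)
    return [[q[(j - i) % n] for i in range(n)] for j in range(n)]

def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    """P' = M P."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def solve(m: Matrix, b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [list(row) + [b[r]] for r, row in enumerate(m)]  # augmented copy [m | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("RR matrix is singular: P cannot be recovered from P'")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

if __name__ == "__main__":
    M = cyclic_rr_matrix([0.7, 0.1, 0.1, 0.1])  # hypothetical q1..q4
    P = [0.4, 0.3, 0.2, 0.1]                    # hypothetical true distribution
    P_disguised = mat_vec(M, P)
    print("P' =", [round(x, 3) for x in P_disguised])
    print("recovered P =", [round(x, 3) for x in solve(M, P_disguised)])
```

The same solver rejects the uniform matrix M2 from the "What Is an Optimal Matrix?" slide: it is singular, so its output distribution is identical for every P and the aggregate cannot be recovered.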
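The "Optimal RR Matrix" slide defines optimality through dominance. The sketch assumes each candidate matrix has already been scored with two costs, a privacy loss and a utility error, where lower is better for both. The slides do not specify the metrics, so the scores here are made up. It applies the standard Pareto-dominance test (no worse in both objectives, strictly better in at least one) and keeps the non-dominated candidates.

```python
from typing import Dict, List, Tuple

# (privacy loss, utility error): both treated as costs, lower is better.
# This orientation is an assumption; the slides do not fix the metrics.
Score = Tuple[float, float]

def dominates(a: Score, b: Score) -> bool:
    """a dominates b: no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_optimal(scores: Dict[str, Score]) -> List[str]:
    """Names of the candidate matrices that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]

if __name__ == "__main__":
    # Illustrative, made-up scores: M1 = identity (no privacy, exact estimates),
    # M2 = uniform (full privacy, nothing recoverable), M3/M4 = hypothetical in-between matrices.
    candidates = {"M1": (1.0, 0.0), "M2": (0.0, 1.0), "M3": (0.4, 0.4), "M4": (0.5, 0.5)}
    print(pareto_optimal(candidates))  # ['M1', 'M2', 'M3']
```

With these scores neither M1 nor M2 dominates the other, matching the slide's observation that M2 wins on privacy and M1 on utility. M4 drops out because M3 is better on both counts.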
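The "Fitness assignment (SPEA2)" bullets can be sketched as follows. The sketch computes the strength S(M) and the raw fitness F'(M) as the slide defines them, plus a density term in the form used by SPEA2: d(M) = 1 / (σ_k + 2). Here σ_k is the distance in objective space to the k-th nearest other candidate, and k = ⌊√N⌋ for N candidates. Objectives are treated as costs to minimize, which is an assumption about how the privacy and utility scores are oriented. Archive truncation and environmental selection, which complete SPEA2, are omitted.

```python
import math
from typing import List, Tuple

Score = Tuple[float, float]  # (privacy loss, utility error), both minimized by assumption

def dominates(a: Score, b: Score) -> bool:
    """a dominates b when minimizing: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def spea2_fitness(scores: List[Score]) -> List[float]:
    """Return F'(M) + d(M) for every candidate; lower is better."""
    n = len(scores)
    # Strength S(M): number of candidates that M dominates.
    strength = [sum(dominates(scores[i], scores[j]) for j in range(n)) for i in range(n)]
    # Raw fitness F'(M): sum of the strengths of the candidates dominating M (0 = non-dominated).
    raw = [sum(strength[j] for j in range(n) if dominates(scores[j], scores[i])) for i in range(n)]
    # Density d(M) = 1 / (sigma_k + 2), with sigma_k the distance to the k-th nearest neighbour.
    k = max(1, int(math.sqrt(n)))
    density = []
    for i in range(n):
        dists = sorted(math.dist(scores[i], scores[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        density.append(1.0 / (sigma_k + 2.0))
    return [r + d for r, d in zip(raw, density)]

if __name__ == "__main__":
    # Made-up scores for five candidate RR matrices.
    scores = [(0.0, 1.0), (1.0, 0.0), (0.4, 0.4), (0.5, 0.5), (0.9, 0.9)]
    for s, f in zip(scores, spea2_fitness(scores)):
        print(s, round(f, 3))
```

Because d(M) ≤ 0.5, it never outweighs a difference of one in raw fitness. It only ranks matrices with equal F'(M), favoring those in sparse regions of the objective space.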
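The "Evolution" bullet lists crossover and mutation without defining them for RR matrices. The sketch below shows one plausible, purely hypothetical choice that relies on each column of M being a probability distribution over reported values. Uniform crossover copies whole columns from one parent or the other. Mutation adds Gaussian noise to one column, clips negatives, and renormalizes it, so every child remains a valid RR matrix. The operator designs and the noise scale `sigma` are assumptions, not the authors' operators.

```python
import random
from typing import List

Matrix = List[List[float]]  # M[j][i] = P(report s_j | true s_i); every column sums to 1

def column(m: Matrix, i: int) -> List[float]:
    return [row[i] for row in m]

def from_columns(cols: List[List[float]]) -> Matrix:
    """Reassemble a matrix from its list of columns."""
    return [[cols[i][j] for i in range(len(cols))] for j in range(len(cols[0]))]

def crossover(a: Matrix, b: Matrix, rng: random.Random) -> Matrix:
    """Hypothetical uniform column crossover: each child column comes whole from one parent."""
    n = len(a[0])
    return from_columns([column(a, i) if rng.random() < 0.5 else column(b, i) for i in range(n)])

def mutate(m: Matrix, rng: random.Random, sigma: float = 0.05) -> Matrix:
    """Hypothetical mutation: perturb one random column, clip at 0, and renormalize it."""
    n = len(m[0])
    cols = [column(m, i) for i in range(n)]
    i = rng.randrange(n)
    noisy = [max(0.0, x + rng.gauss(0.0, sigma)) for x in cols[i]]
    total = sum(noisy)
    if total > 0:  # keep the old column if clipping zeroed every entry
        cols[i] = [x / total for x in noisy]
    return from_columns(cols)

if __name__ == "__main__":
    rng = random.Random(42)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # M1 from the slides
    uniform = [[1 / 3] * 3 for _ in range(3)]                       # M2 from the slides
    child = mutate(crossover(identity, uniform, rng), rng)
    for row in child:
        print([round(x, 3) for x in row])
```

In a full SPEA2 loop, children produced this way would be scored on privacy and utility and then compete with the archive through the fitness assignment.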
