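OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining
Zhengli Huang and Wenliang (Kevin) Du
Department of EECS, Syracuse University

Data Mining/Analysis
• Step 1: Data Collection (individual data is gathered by the data publisher)
• Step 2: Data Publishing (the data publisher releases data to the data miner)
• Data cannot be published directly because of privacy concerns.

Background: Randomized Response
• Question: "Do you smoke?" Suppose the true answer is "Yes".
• The respondent flips a biased coin with $P(\text{Head}) = \theta$, $\theta \neq 0.5$.
– Head: report the true answer ("Yes").
– Tail: report the opposite answer ("No").
• The distribution of reported answers is
$P'(\text{Yes}) = P(\text{Yes})\,\theta + P(\text{No})\,(1-\theta)$
$P'(\text{No}) = P(\text{Yes})\,(1-\theta) + P(\text{No})\,\theta$

RR for Categorical Data
• A true value $s_i$ is reported as $s_i$, $s_{i+1}$, $s_{i+2}$, $s_{i+3}$ with probabilities $q_1$, $q_2$, $q_3$, $q_4$ respectively.
• In matrix form, with $M$ denoting the RR matrix:
$$\begin{pmatrix} P'(s_1) \\ P'(s_2) \\ P'(s_3) \\ P'(s_4) \end{pmatrix} = \underbrace{\begin{pmatrix} q_1 & q_4 & q_3 & q_2 \\ q_2 & q_1 & q_4 & q_3 \\ q_3 & q_2 & q_1 & q_4 \\ q_4 & q_3 & q_2 & q_1 \end{pmatrix}}_{M} \begin{pmatrix} P(s_1) \\ P(s_2) \\ P(s_3) \\ P(s_4) \end{pmatrix}$$

A Generalization
• Several RR matrices have been proposed:
– [Warner 65]
– [R. Agrawal et al. 05], [S. Agrawal et al. 05]
• The RR matrix can be arbitrary:
$$M = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$
• Can we find optimal RR matrices?

What is an optimal matrix?
• Which of the following is better?
$$M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix}$$
• Privacy: $M_2$ is better.
• Utility: $M_1$ is better.
• So, what is an optimal matrix?

Optimal RR Matrix
• An RR matrix M is optimal if no other RR matrix is better than M in both privacy and utility (i.e., no other matrix dominates M).
– Privacy Quantification
– Utility Quantification
• A number of privacy and utility metrics have been proposed. We use the following:
– Privacy: how accurately one can estimate individual information.
– Utility: how accurately we can estimate aggregate information.

Optimization Methods
• Approach 1: Weighted sum: w1 · Privacy + w2 · Utility
• Approach 2
– Fix Privacy, find M with the optimal Utility.
– Fix Utility, find M with the optimal Privacy.
– Challenge: it is difficult to generate M with a fixed privacy or utility.
• Our Approach: Multi-Objective Optimization

Evolutionary Multi-Objective Optimization (EMOO)
• Genetic algorithms have difficulty dealing with multiple objectives.
• We use an EMOO algorithm, specifically SPEA2.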
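The binary randomized response model from the background slide can be illustrated with a minimal Python sketch. It assumes the probability of a truthful report is called `theta` (the function names and the simulation parameters are hypothetical, chosen only for illustration). The estimator simply inverts the relation P'(Yes) = P(Yes)·theta + P(No)·(1 − theta); it is not the OptRR algorithm itself.

```python
import random


def randomize(truth: bool, theta: float, rng: random.Random) -> bool:
    """Report the true answer with probability theta, the opposite otherwise."""
    return truth if rng.random() < theta else not truth


def distorted_yes_rate(p_yes: float, theta: float) -> float:
    """Expected fraction of reported 'Yes': P'(Yes) = P(Yes)*theta + P(No)*(1-theta)."""
    return p_yes * theta + (1.0 - p_yes) * (1.0 - theta)


def estimate_yes_rate(observed_yes: float, theta: float) -> float:
    """Invert the distortion to estimate the true P(Yes) from the observed P'(Yes)."""
    if abs(2.0 * theta - 1.0) < 1e-12:
        # With a fair coin the reports carry no information about the truth.
        raise ValueError("theta = 0.5 makes the true rate unrecoverable")
    return (observed_yes - (1.0 - theta)) / (2.0 * theta - 1.0)


def simulate(n: int, p_yes: float, theta: float, seed: int) -> float:
    """Simulate n respondents and return the estimated true 'Yes' rate."""
    rng = random.Random(seed)
    reported_yes = 0
    for _ in range(n):
        truth = rng.random() < p_yes          # respondent's real answer
        if randomize(truth, theta, rng):      # what the collector sees
            reported_yes += 1
    return estimate_yes_rate(reported_yes / n, theta)


if __name__ == "__main__":
    # Hypothetical setting: 30% true 'Yes', truthful report 80% of the time.
    print(simulate(n=20000, p_yes=0.3, theta=0.8, seed=1))
```

The aggregate rate is recoverable even though no single report reveals an individual's true answer with certainty; values of `theta` closer to 0.5 give more privacy but a noisier estimate, which is the privacy/utility trade-off the slides go on to optimize.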
Our SPEA2-based algorithm
• Evolution
– Crossover
– Mutation
• Fitness Assignment (SPEA2)
– Strength value S(M): the number of matrices dominated by M.
– Raw fitness F'(M): the sum of the strengths of the RR matrices that dominate M. The lower, the better.
– Density d(M): discriminates between matrices with the same raw fitness, preserving diversity.
[Figure: matrices M1 to M5 plotted in the objective space; axes: Privacy and Utility, each running from worse to better]

The Output of Optimization
• Pareto Fronts
– The optimal set is often plotted in the objective space, and the plot is called the Pareto front.
[Figure: Pareto front; axes: Privacy and Utility (error)]

Experiments
[Figures: results for normal distributions with different δ; for the first attribute of the Adult data; for a normal distribution (δ = 0.75)]

Summary
• We use an evolutionary multi-objective optimization technique to search for optimal RR matrices.
• The evaluation shows that our scheme achieves better performance than the existing RR schemes.
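The strength and raw-fitness definitions from the fitness assignment slide can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each RR matrix is represented only by a hypothetical `(privacy, error)` score pair, where higher privacy and lower error are better; all candidates are treated as a single pool (SPEA2 proper combines a population with an archive); and the density term d(M) is omitted. The sample scores are made up for the example.

```python
from typing import Dict, List, Tuple

Score = Tuple[float, float]  # (privacy: higher is better, error: lower is better)


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b in both objectives and better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def strength(scores: Dict[str, Score]) -> Dict[str, int]:
    """S(M): the number of candidates that M dominates."""
    return {
        m: sum(dominates(s, t) for n, t in scores.items() if n != m)
        for m, s in scores.items()
    }


def raw_fitness(scores: Dict[str, Score]) -> Dict[str, int]:
    """F'(M): the sum of the strengths of all candidates that dominate M."""
    s_val = strength(scores)
    return {
        m: sum(s_val[n] for n, t in scores.items() if n != m and dominates(t, s))
        for m, s in scores.items()
    }


def pareto_front(scores: Dict[str, Score]) -> List[str]:
    """Candidates that no other candidate dominates (raw fitness 0)."""
    return sorted(m for m, f in raw_fitness(scores).items() if f == 0)


if __name__ == "__main__":
    # Hypothetical (privacy, error) scores for five candidate matrices.
    candidates = {
        "A": (0.2, 0.1),
        "B": (0.5, 0.3),
        "C": (0.8, 0.6),
        "D": (0.4, 0.5),
        "E": (0.1, 0.4),
    }
    print(strength(candidates))      # how many candidates each one dominates
    print(raw_fitness(candidates))   # 0 means non-dominated
    print(pareto_front(candidates))
```

Candidates with raw fitness 0 are exactly the non-dominated ones, so they form the Pareto front of this pool; in a full SPEA2 run the density term would then break ties among candidates with equal raw fitness.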