Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Privacy-Preserving Clustering Outline Introduction Related Work Preliminaries Secure Multi-Party Computation Data Sanitization Yao’s Millionaires’ Problem Homomorphic Encryption Privacy-Preserving K-Means Clustering Conclusion Introduction Why needs privacy-preserving? Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. The privacy problem is not data mining, but the way data mining is done. So, privacy and data mining can coexist. An important data mining problem: clustering. Related Work Privacy-preserving clustering: Secure multi-party computation. High computation and communication costs. Data sanitization. Lost of accuracy. Dimensionality reduction. Model-based solutions. Yao’s Millionaires’ Problem Millionaires’ problem: Two millionaires wish to know who is richer; however, they do not want to find out any additional information about each other’s wealth. Solutions Suppose Alice has i millions. Bob has j millions. 1 < i, j < 10. Solutions Suppose 1. 2. Alice: i = 5, Bob: j = 3. (B) x = 7, Ea(x) = 4 = k. (B) k - j + 1 = 2. x Ea(x) 1 7 2 3 3 5 4 8 5 6 6 2 7 4 8 1 9 10 10 9 Solutions 3. (A) y1 = Da(2) = Da(k - j + 1) = 6. x Ea(x) y2 = Da(3) = 2. 1 7 yj = y3 = Da(4) = Da(k - j + j) = 7. 2 3 y4 = Da(5) = 3. 3 5 y5 = Da(6) = 5. 4 8 y6 = Da(7) = 1. 5 6 y7 = Da(8) = 4. 6 2 … 7 4 8 1 9 10 10 9 Solutions 4. (A) z1 = y1 = 6. z2 = y2 = 2. z3 = y3 = Da(k - j + j) = 7. z4 = y4 = 3. z5 = yi = y5 = 5. z6 = yi + 1 = y5 + 1 = 6. z7 = y6 + 1 = 2. … 5. (B) Check if z3 = x or not. If yes, means that i ≧ j. If no, means that i < j. Homomorphic Encryption Homomorphic encryption: If there is an algorithm ⊕ to compute H(x⊕y) from H(x) and H(y) that does not reveal x or y. H(x⊕y) = H(x) ⊙ H(y) RSA, … Additive homomorphic: H(x+y) = H(x) * H(y) Paillier, … Homomorphic Encryption Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data SIGKDD, 2003 Problem Definition Goal: Input: Cluster the known set of common entities without revealing any value that the clustering is based on. Each user provides one attribute of all items. Output: Assignment of entities to clusters. Cluster centers themselves. K-Means Clustering K-Means Clustering cluster decision distance matrix new center computation Vertically Partitioned Data User 1 User 2 Terminology r: # of users, each having different attributes for the same set of items. n: # of the common items. k: # of clusters required. ui: each cluster mean, i = 1, …, k. uij: projection of the mean of cluster i on user j. Final result for user j: The final value / position of uij, i = 1, …, k. Cluster assignments: clusti for all points i = 1, …, n. Privacy-Preserving K-Means Clustering Securely Finding the Closest Cluster Securely Finding the Closest Cluster The security of the algorithm is based on three key ideas. Disguise the site components of the distance with random values that cancel out when combined. Permute the order of clusters so the real meaning of the comparison results is unknown. Compare distances so only the comparison result is learned; no party knows the distances being compared. Securely Finding the Closest Cluster Securely Finding the Closest Cluster Securely Finding the Closest Cluster Check Threshold j m Conclusion Horizontally partitioned data: User 1 User 2