Privacy Preserving Data Mining
Benjamin Fung
bfung(at)cs.sfu.ca

Privacy Preserving Data Mining
• What is data mining?
  – Non-trivial extraction of implicit, previously unknown, and potentially useful information from large data sets or databases [W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, 1992]
• What is privacy preserving data mining?
  – The study of how to achieve data mining goals without sacrificing the privacy of the individuals in the data

Scenario (Information Sharing)
• A data owner wants to release a person-specific data table to another party (or to the public) for classification analysis, without sacrificing the privacy of the individuals in the released data.
[Figure: person-specific data passes from the data owner to the data recipients]

Privacy Threat
• If a description on (Education, Sex) is so specific that few people match it, releasing the table can link a unique individual, or a small number of individuals, to sensitive information.
  – For example, only one record in the table below matches (Doctorate, F).

Education    Sex  Age  Class  # of Recs.
9th          F    30   0G3B   3
10th         M    32   0G4B   4
11th         F    35   2G3B   5
12th         F    37   3G1B   4
Bachelors    F    42   4G2B   6
Bachelors    F    44   4G0B   4
Masters      M    44   4G0B   4
Masters      F    44   3G0B   3
Doctorate    F    44   1G0B   1
Total: 34

Education    Sex  Diagnosis      …
Bachelors    F    Depression     …
Bachelors    M    Heart disease  …
Masters      F    Depression     …
Masters      F    Heart disease  …
Doctorate    F    Knee injury    …

[Figure: data recipients and an adversary]

Solution: Generalization
• Generalize Masters and Doctorate to Grad School; all other rows stay unchanged. The single (Doctorate, F) record merges with the three (Masters, F) records into a group of 4, and the merged class column still reads 4G0B.

Education    Sex  Age  Class  # of Recs.
9th          F    30   0G3B   3
10th         M    32   0G4B   4
11th         F    35   2G3B   5
12th         F    37   3G1B   4
Bachelors    F    42   4G2B   6
Bachelors    F    44   4G0B   4
Grad School  M    44   4G0B   4
Grad School  F    44   4G0B   4

References
1. K. Wang, B. C. M. Fung, and P. S. Yu. Template-Based Privacy Preservation in Classification Problems. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), Houston, TX, USA, November 27-30, 2005.
2. K. Wang, B. C. M. Fung, and G. Dong. Integrating Private Databases for Data Analysis. In Proc. of the 2005 IEEE International Conference on Intelligence and Security Informatics (ISI 2005), pages 171-182, Atlanta, GA, USA, May 19-20, 2005.
3. B. C. M. Fung, K. Wang, and P. S. Yu. Top-Down Specialization for Information and Privacy Preservation. In Proc. of the 21st IEEE International Conference on Data Engineering (ICDE 2005), pages 205-216, Tokyo, Japan, April 5-8, 2005.

For more information, visit http://www.cs.sfu.ca/~bfung
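The "Privacy Threat" slide's point, that a description on (Education, Sex) matched by few records is risky, can be checked by counting how many records share each (Education, Sex) value. The sketch below is illustrative only: it assumes (Education, Sex) is the identifying description, reads the class column "xGyB" as x records of class G and y of class B (the slide's row totals are consistent with that reading), and uses a hypothetical threshold k = 3, since the slides do not fix one. The names `TABLE`, `qid_group_sizes`, and `risky_groups` are made up for this example.

```python
from collections import Counter

# Rows of the "Privacy Threat" table: (Education, Sex, Age, good, bad, count).
# Assumption: "xGyB" in the Class column means x records of class G, y of class B.
TABLE = [
    ("9th", "F", 30, 0, 3, 3),
    ("10th", "M", 32, 0, 4, 4),
    ("11th", "F", 35, 2, 3, 5),
    ("12th", "F", 37, 3, 1, 4),
    ("Bachelors", "F", 42, 4, 2, 6),
    ("Bachelors", "F", 44, 4, 0, 4),
    ("Masters", "M", 44, 4, 0, 4),
    ("Masters", "F", 44, 3, 0, 3),
    ("Doctorate", "F", 44, 1, 0, 1),
]


def qid_group_sizes(rows):
    """Count how many records share each (Education, Sex) description."""
    sizes = Counter()
    for education, sex, _age, _good, _bad, count in rows:
        sizes[(education, sex)] += count
    return sizes


def risky_groups(rows, k):
    """Return the (Education, Sex) descriptions matched by fewer than k records."""
    return {qid: n for qid, n in qid_group_sizes(rows).items() if n < k}


if __name__ == "__main__":
    K = 3  # hypothetical threshold; the slides do not specify one
    print(sum(row[5] for row in TABLE))  # 34
    print(risky_groups(TABLE, K))        # {('Doctorate', 'F'): 1}
```

Note that Age is ignored here because the slide's threat is stated on (Education, Sex) alone; adding Age to the grouping key would only make the groups smaller.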
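The "Solution: Generalization" slide replaces Masters and Doctorate with Grad School. The sketch below replays only that single step; it is not the top-down specialization algorithm of reference 3, which searches over many candidate generalizations. It assumes a hypothetical one-level mapping `GENERALIZE_EDUCATION`, merges rows that become identical on (Education, Sex, Age), and sums their G/B class counts (with the same reading of "xGyB" as counts of classes G and B). The helpers `generalize` and `min_group_size` are hypothetical names. On this table, the smallest (Education, Sex) group grows from 1 record to 3 records; the remaining group of 3 is (9th, F).

```python
from collections import defaultdict

# Rows of the original table: (Education, Sex, Age, good, bad, count).
TABLE = [
    ("9th", "F", 30, 0, 3, 3),
    ("10th", "M", 32, 0, 4, 4),
    ("11th", "F", 35, 2, 3, 5),
    ("12th", "F", 37, 3, 1, 4),
    ("Bachelors", "F", 42, 4, 2, 6),
    ("Bachelors", "F", 44, 4, 0, 4),
    ("Masters", "M", 44, 4, 0, 4),
    ("Masters", "F", 44, 3, 0, 3),
    ("Doctorate", "F", 44, 1, 0, 1),
]

# Hypothetical taxonomy cut: Masters and Doctorate roll up to "Grad School";
# any Education value not listed is kept unchanged.
GENERALIZE_EDUCATION = {"Masters": "Grad School", "Doctorate": "Grad School"}


def generalize(rows, mapping):
    """Map Education values to their parent and merge rows that become identical."""
    merged = defaultdict(lambda: [0, 0, 0])  # (edu, sex, age) -> [good, bad, count]
    for education, sex, age, good, bad, count in rows:
        acc = merged[(mapping.get(education, education), sex, age)]
        acc[0] += good   # class G counts add up when rows merge
        acc[1] += bad    # class B counts add up when rows merge
        acc[2] += count
    # dicts keep insertion order, so rows come out in first-seen order
    return [(e, s, a, g, b, n) for (e, s, a), (g, b, n) in merged.items()]


def min_group_size(rows):
    """Smallest number of records sharing one (Education, Sex) description."""
    sizes = defaultdict(int)
    for education, sex, _age, _good, _bad, count in rows:
        sizes[(education, sex)] += count
    return min(sizes.values())


if __name__ == "__main__":
    generalized = generalize(TABLE, GENERALIZE_EDUCATION)
    for row in generalized:
        print(row)
    print(min_group_size(TABLE), "->", min_group_size(generalized))  # 1 -> 3
```

Because `generalize` adds up the G and B counts of merged rows, the class information a classifier would learn from is kept at the coarser level: (Grad School, F, 44) reads 4G0B, exactly as on the slide.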