29e CONFÉRENCE INTERNATIONALE DES COMMISSAIRES À LA PROTECTION DES DONNÉES ET DE LA VIE PRIVÉE
29th INTERNATIONAL CONFERENCE OF DATA PROTECTION AND PRIVACY COMMISSIONERS

Data Mining
Dr. Bradley A. Malin, Assistant Professor, Department of Biomedical Informatics, Vanderbilt University

Data Collection
[Figure: Electronic Medical Records (demographics, clinical presentation); Overt Collection (hospital visit for treatment) vs. Covert Collection (webcam while walking)]
• Localized: Personal records in databases at source
• Distributed: Integration of records from many sources

Data Mining
• Unsupervised
  – "Labels" unknown in advance, so search for intrinsic patterns of the data
  – Clustering "similar" people
    • Purchased same products
• Supervised
  – "Labels" known in advance
  – Train models on sample data to classify new cases
[Figure: example decision tree splitting on Country (USA, Canada) and Age (<50, >50), with leaves Harry Potter, Scream, 1984 and Jaws]

Website Personalization
• Can a website predict what I want to see?
  – Intra-personalization: What pages / topics did I visit in my previous visits?
  – Inter-personalization: Is my browsing / purchasing history similar to other people's?
• Does my behavior reveal my identity or sensitive things about my life?
  – What information should not be revealed?

Intelligence
• Lists of entities are becoming increasingly prevalent
  – Intelligence reports, rosters, networks
• How many Alices are there? Which is which?
• How does Alice relate to Bob?
[Figure: four documents (doc A to doc D) mentioning Alice Doe, Bob Doe, Junior Doe, John Smith and Brad Malin, with the Alice mentions labeled Alice "A" to Alice "D" and linked to Bob, Brad, Junior and John]

Surveillance
• Location Surveillance: Did someone on Interpol's watchlist visit hotel X? Airline Y?
[Figure: Hotel X, Airline Y]
• Challenge: Data holders want to collaborate, but fear revealing strategic knowledge and are bound by legal constraints

Privacy Protections
• Protect Anonymity
  – Remove / encrypt identifying information
  – Suppress inferences that can reveal identity
• Protect Confidentiality
  – Hide sensitive rules
  – Perturbation and generalization
• Secure Multiparty Computation
  – E(a) + E(b) = E(a + b) [homomorphic]
  – E(E(John, x), y) = E(E(John, y), x) [commutative]

Clinical Genomics
[Figure: clinical records and blood samples linked by medical record #; after de-identification, clinical record and DNA are linked by a 512-bit hash of the #]
• Vanderbilt DNA Databank
• DNA from "leftover" blood
  – 25-75K samples per year, 250K in 5 years
• Combined with de-identified electronic medical records
  – 600 GB on 1.4 million patients
• "Hypothesis Generation" to mine correlations between clinical features and DNA

Example De-identified Medical Record
[Figure: sample record, annotated: SSN and phone # replaced; MR# removed; names substituted; dates shifted]

Naïve Protection
[Figure: DNA (ACTG1, ACTG2, ACTG3) in genomic DBs and identities in discharge DBs, held at hospitals H1, H2 and H3]
• Patterns in data can lead to privacy compromise
• Suppress patterns "intelligently" to support goals

In Detail: Cystic Fibrosis
[Figure: % of samples re-identified and % of DNA records disclosed, plotted against k (0 to 50) for 1,149 patients at 174 hospitals; legend: Naive, Partial, Trail Suppression]
• BEFORE protection: 100% of samples in repository
• AFTER protection: 0% of samples re-identified

The Impact of Data Mining on Privacy in the Public and Private Sectors
Richard S. Rosenberg
Professor Emeritus, Department of Computer Science, University of British Columbia, and President of the BC Freedom of Information and Privacy Association, Vancouver, BC

The U.S. Government

A Revision

Top Six Purposes of Data Mining Efforts in Departments and Agencies

Table 1: Key Steps Agencies Are Required to Take to Protect Privacy, with Examples of Related Detailed Procedures and Sources
(Each key step to protect the privacy of personal information is listed with its primary statutory source, followed by examples of procedures.)
• Publish notice in the Federal Register when creating or modifying a system of records (Privacy Act)
  – Specify the routine uses for the system
  – Identify the individual responsible for the system
  – Outline procedures individuals can use to gain access to their records
• Provide individuals with access to their records (Privacy Act)
  – Permit individuals to review records about themselves
  – Permit individuals to request corrections to their records
• Notify individuals of the purpose and authority for the requested information when it is collected (Privacy Act)
  – Notify individuals of the authority that authorized the agency to collect the information
  – Notify individuals of the principal purposes for which the information is to be used
• Implement guidance on system (FISMA; Privacy Act)
  – Perform a risk assessment to determine the information system vulnerabilities, identify threats, and develop countermeasures to those threats
  – Have the system certified and accredited by management
  – Ensure the accuracy, relevance, timeliness, and completeness of information
• Conduct a privacy impact assessment (E-Government Act)
  – Describe and analyze how information is secured
  – Describe and analyze intended use of information
  – Have assessment reviewed by chief information officer or equivalent
  – Make assessment publicly available, if practicable

ADVISE Data Mining Tool (Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement)

Cato Institute: Data Mining and Terrorism
• Attempting to use predictive data mining to ferret out terrorists before they strike would be a subtle but important misdirection of national security resources.
• With a relatively small number of attempts every year and only one or two major terrorist incidents every few years – each one distinct in terms of planning and execution – there are no meaningful patterns that show what behavior indicates planning or preparation for terrorism.

Data Mining in the Private Sector
• We generate an enormous amount of data as a by-product of our everyday transactions (purchasing goods, enrolling for courses, etc.), visits to Web sites and interactions with government (taxes, census, car registration, voter registration, etc.). Not only is the number of records we generate increasing, but the amount of data gathered for each type of record is increasing.
• As data miners, our tasks are colliding with these concerns. In analytic customer relationship management (CRM), we often analyze customer data with the specific intent of understanding individual behavior and instituting sales campaigns based on this understanding. Researchers in economics, demographics, medicine and social sciences are trying to understand the relationships between behaviors and outcomes.
• How can we reconcile the legitimate needs of business and research with the equally legitimate desire of people to maintain their privacy?

The Use of Anonymizing
• Still, anonymizing technologies have been endorsed repeatedly by panels appointed to examine the implications of data mining.
  And intriguing progress appears to have been made at designing information-retrieval systems with record anonymization, user audit logs (which can confirm that no one looked at records beyond the approved scope of an investigation) and other privacy mechanisms "baked in."
• The trick is to do more than simply strip names from records. Latanya Sweeney of Carnegie Mellon University, a leading privacy technologist who once had a project funded under the Total Information Awareness (TIA) program, has shown that 87% of Americans could be identified by records listing solely their birthdate, gender and ZIP code.
• Sweeney had this challenge in mind as she developed a way for the U.S. Department of Housing and Urban Development to anonymously track the homeless.

A Private Sector Example
• Tesco is quietly building a profile of you, along with every individual in the country: a map of personality, travel habits, shopping preferences and even how charitable and eco-friendly you are. A subsidiary of the supermarket chain has set up a database, called Crucible, that is collating detailed information on every household in the UK, whether they choose to shop at the retailer or not. The company refuses to reveal the information it holds, yet Tesco is selling access to this database to other big consumer groups, such as Sky, Orange and Gillette.
• "It contains details of every consumer in the UK at their home address across a range of demographic, socio-economic and lifestyle characteristics," says the marketing blurb of dunnhumby, the Tesco subsidiary in question. It has "added intelligent profiling and targeting" to its data through a software system called Zodiac.
  This profiling can rank your enthusiasm for promotions, your brand loyalty, whether you are a "creature of habit" and when you prefer to shop. As the blurb puts it: "The list is endless if you know what you are looking for."

The View From 30,000 Feet
• Canadian company releases 3-D face scanner ($350)
• ChoicePoint's press release states they have forgone selling certain consumer information "in selected markets" at a cost of $15 million per year
• 2007: University of Pisa, Italy, KDD Laboratory, "K-Anonymity" advancements
• RCMP buys info from data broker
• Roelof Temmingh, South Africa, releases version 1 of "Evolution"
• Fall 2006: Purdue University electrophotographic halftone printer code advances
• New Zealand Court of Appeal, 4 May 2007: Brooker v. Police
• Brussels, Belgium: EU "googles" Google privacy practices
• Feb 2007: Portugal adopts "biometric" national ID card provider
• Melbourne, Australia, 3 April 2007: Jane Doe v. ABC; costs, including tort of invasion of privacy: $234,190
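Sweeney's birthdate/gender/ZIP finding and the k-anonymity work noted in the timeline both come down to counting how many records share the same combination of quasi-identifier values: a release is k-anonymous when every such combination is shared by at least k records. The sketch below is a minimal illustration of that check, not the method of Sweeney or of the Pisa KDD Laboratory. The toy records, the field names `birthdate`, `gender` and `zip`, and the generalization rule (keep only the birth year and a 3-digit ZIP prefix) are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical record layout: field names are illustrative assumptions.
Record = Dict[str, str]
QUASI_IDENTIFIERS = ("birthdate", "gender", "zip")


def equivalence_class_sizes(records: List[Record], qids: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qids) for r in records)


def k_of(records: List[Record], qids: Sequence[str]) -> int:
    """Return k, the size of the smallest equivalence class (0 if there are no records)."""
    sizes = equivalence_class_sizes(records, qids)
    return min(sizes.values()) if sizes else 0


def fraction_unique(records: List[Record], qids: Sequence[str]) -> float:
    """Fraction of records whose quasi-identifier combination appears only once."""
    if not records:
        return 0.0
    sizes = equivalence_class_sizes(records, qids)
    unique = sum(1 for r in records if sizes[tuple(r[q] for q in qids)] == 1)
    return unique / len(records)


def generalize(record: Record) -> Record:
    """Hypothetical generalization: birth year only, and the first 3 ZIP digits."""
    out = dict(record)
    out["birthdate"] = record["birthdate"][:4]
    out["zip"] = record["zip"][:3] + "**"
    return out


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    records = [
        {"birthdate": "1965-07-31", "gender": "F", "zip": "37203"},
        {"birthdate": "1965-02-14", "gender": "F", "zip": "37205"},
        {"birthdate": "1970-11-02", "gender": "M", "zip": "37212"},
        {"birthdate": "1970-01-09", "gender": "M", "zip": "37215"},
    ]
    print(k_of(records, QUASI_IDENTIFIERS), fraction_unique(records, QUASI_IDENTIFIERS))
    # prints: 1 1.0  (every record is unique on birthdate, gender, ZIP)
    generalized = [generalize(r) for r in records]
    print(k_of(generalized, QUASI_IDENTIFIERS), fraction_unique(generalized, QUASI_IDENTIFIERS))
    # prints: 2 0.0  (after generalization, each combination covers 2 records)
```

Generalization trades precision for anonymity: coarser values merge records into larger equivalence classes, which raises k but makes the data less useful for analysis, the same tension the trail-suppression results above illustrate.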