29e CONFÉRENCE INTERNATIONALE DES COMMISSAIRES À LA PROTECTION DES DONNÉES ET DE LA VIE PRIVÉE
29th INTERNATIONAL CONFERENCE OF DATA PROTECTION AND PRIVACY COMMISSIONERS

De-identification Risk and Resolution
Bradley Malin, Ph.D.
Assistant Professor
Vanderbilt University

De-identified is not Anonymous (Sweeney 1998, 2000)
• Hospital Discharge Data: Ethnicity, Visit date, Diagnosis, Procedure, Medication, Total charge
• Voter List: Name, Address, Date registered, Party affiliation, Date last voted
• Fields in both: Zip, Birthdate, Sex
• 87% of the United States is RE-IDENTIFIABLE

DNA Re-identification
• Many deployed genomic privacy technologies leave DNA susceptible to re-identification (Malin 2005)
• DNA is re-identified by automated methods, such as:
– Genotype-Phenotype Inference (Malin & Sweeney, 2000, 2002)
[Figure: Medical Database (ICD9 code 3334) and DNA Database (HD gene, (CAG)n mutation), related through the genetic mutation]

Genealogy Re-identification (Malin 2006)
• IdentiFamily:
– Software that links de-identified pedigrees to named individuals
– Uses publicly available information, such as obituaries, death records, and the Social Security Death Index database, to build genealogies
[Figure: Step 1: Extract; Step 2: Validate; Step 3: Structure; Step 4: Link. Labels: Public Resource, Death Records, Population Records, Identified Family Structures (Ada, Bob, Chaz, Dan, Ed, Fay), De-identified Pedigrees (Shared for Research)]

System Susceptibility (Malin, JAMIA 2005)
Privacy Protection Systems (What: Where)
• Trusted Third Party: deCode Genetics Inc.
• Semi-Trusted Third Party: University of Gent, Custodix
• Denominalization: University of Montreal
• De-identification: University of Utah, University of Sydney, Australian National University
Susceptibility to Attack: Family Structures, Trails, Genotype-Phenotype, Dictionary
Legend: Susceptible / Not Susceptible

Altering Data Does not Guarantee Protection
• Science Magazine (Lin et al, 2004)
– < 100 “SNPs” make DNA unique
– Proposed protection: perturb DNA
• i.e., change A with T, etc.
• aaaact → atacct
– Increase perturbation, decrease internal correlations (see graph)
– Conclusions
• Too much perturbation needed to prevent linkage
• Keep records under lock and key
DISCLAIMER: Uniqueness Does not Guarantee Privacy will be Compromised
[Graph: Utility (Correlations) versus Privacy (Perturbation)]

Formal Re-identification Model
• Necessary Condition: UNIQUENESS of the De-identified Biobank Data (aaactaaga, cacaccatg, tatatgatgt)
• Necessary Condition: UNIQUENESS of the Identified Data, Already Public (John Doe, Jane Doe, Jeremiah Doe)
• Necessary Condition: LINKAGE route (model) between the two
• Protection:
– 1. Make Data Non-unique
– 2. Certify No Linkage

Formal Protection
• k-Map (Sweeney, 2002)
– Each shared record refers to at least k entities in the population
• k-Anonymity (Sweeney, 2002)
– Each shared record is equivalent to at least k-1 other records
• k-Unlinkability (Malin 2006)
– Each shared record links to at least k identities via its trail
– Satisfies k-Map protection model

Beyond Ad hoc Protections
• Perturbation does not guarantee privacy
• Alternative: Generalization of data
– Perturbation (Lin et al 2004): ATACAACGTT, ATCGATCGAT
– Generalization (Malin 2005): ATC[G or C]A[T or A]CG[T or A]T

Learning Who You Are From Where You Have Been (“Trails”) (Malin & Sweeney, 2001; 2004, Malin & Airoldi 2006)
[Figure: DNA in Genomic DBs (records ACTG1, ACTG2, ACTG3) and Identities in Discharge DBs, each spread across hospitals H1, H2, H3]

Preventing Trails: Cystic Fibrosis Population (1149 samples)
• BEFORE STRANON: 100% Samples In Repository
• AFTER STRANON: 0% Samples k-Re-identified
[Graphs: % of Samples Re-identified versus k, and % of DNA Records Disclosed versus k, for k from 0 to 50; legend: Naive Partial Trail Suppression]

Benefit: Quantified Risk
• Change in reidentification risk
• Shift burden of increased risk to requesting analyst
• Ties together legal and computational models
[Graph: % of Samples in Repository versus k, for k from 0 to 50; labels: Requested Quantity, Initial Setting, Forced Setting]
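
The trail and k-Unlinkability slides above describe linking a DNA record to identities through the hospitals each has visited, and measuring risk as a function of k. The following is a minimal sketch of that measurement, not the STRANON algorithm itself. It assumes a DNA record links to an identity whenever the hospitals in the DNA trail are a subset of the hospitals in the identity's trail. The function names and the toy data (names and visit sets) are hypothetical; only the labels H1 to H3 and ACTG1 to ACTG3 come from the slides.

```python
from typing import Dict, Set


def candidate_identities(dna_trail: Set[str],
                         identity_trails: Dict[str, Set[str]]) -> Set[str]:
    """Identities whose hospital visits cover every hospital in the DNA trail.

    Assumption: a subset match is treated as a possible link.
    """
    return {name for name, visited in identity_trails.items()
            if dna_trail <= visited}


def fraction_below_k(dna_trails: Dict[str, Set[str]],
                     identity_trails: Dict[str, Set[str]],
                     k: int) -> float:
    """Fraction of DNA records that link to fewer than k identities."""
    if not dna_trails:
        return 0.0
    at_risk = sum(
        1 for trail in dna_trails.values()
        if len(candidate_identities(trail, identity_trails)) < k
    )
    return at_risk / len(dna_trails)


# Hypothetical toy data: which hospitals each named patient visited.
identity_trails = {
    "Ada": {"H1", "H2"},
    "Bob": {"H1", "H2"},
    "Chaz": {"H2", "H3"},
    "Dan": {"H3"},
}

# Hypothetical toy data: which hospitals hold each de-identified DNA record.
dna_trails = {
    "ACTG1": {"H1", "H2"},  # covered by Ada and Bob
    "ACTG2": {"H2", "H3"},  # covered by Chaz only
    "ACTG3": {"H3"},        # covered by Chaz and Dan
}

for k in (1, 2, 3):
    share = fraction_below_k(dna_trails, identity_trails, k)
    print(f"k={k}: {share:.0%} of DNA records link to fewer than k identities")
```

In this toy data, ACTG2 links to a single identity, so it is the only record that fails k = 2. A data holder could use a count like this to decide which records to withhold before sharing, which is the kind of trade-off the k-axis plots above illustrate.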