29e CONFÉRENCE INTERNATIONALE DES COMMISSAIRES À LA PROTECTION DES DONNÉES ET DE LA VIE PRIVÉE
29th INTERNATIONAL CONFERENCE OF DATA PROTECTION AND PRIVACY COMMISSIONERS
De-identification Risk and Resolution
Bradley Malin, Ph.D.
Assistant Professor
Vanderbilt University
De-identified is not Anonymous
(Sweeney 1998, 2000)
[Figure: two overlapping data sets]
Hospital Discharge Data: Ethnicity, Visit date, Diagnosis, Procedure, Medication, Total charge
Fields in both: Zip, Birthdate, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted
87% of the United States is RE-IDENTIFIABLE
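The linkage behind this slide can be sketched in a few lines. The following is a minimal illustration, not Sweeney's actual procedure: it joins a de-identified discharge table to a named voter list on the three shared fields (Zip, Birthdate, Sex) and reports a re-identification only when exactly one named person matches. All records, names, and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; values and field names are illustrative only.
discharge = [
    {"zip": "37212", "birthdate": "1961-04-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "37212", "birthdate": "1975-11-19", "sex": "M", "diagnosis": "fracture"},
    {"zip": "37203", "birthdate": "1975-11-19", "sex": "M", "diagnosis": "diabetes"},
]
voters = [
    {"name": "Ada Example", "zip": "37212", "birthdate": "1961-04-02", "sex": "F"},
    {"name": "Bob Example", "zip": "37212", "birthdate": "1975-11-19", "sex": "M"},
    {"name": "Chaz Example", "zip": "37212", "birthdate": "1975-11-19", "sex": "M"},
]

# The fields that appear in both data sets on the slide.
QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def key(record):
    """Project a record onto the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, identified):
    """Return (diagnosis, name) pairs whose shared fields match exactly one person."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[key(person)].append(person["name"])
    matches = []
    for record in deidentified:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # unique match: the record is re-identified
            matches.append((record["diagnosis"], names[0]))
    return matches


reidentified = link(discharge, voters)
print(reidentified)
```

In this toy data only the first discharge record is re-identified: the second matches two voters and the third matches none, which is why uniqueness on the shared fields matters.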
DNA Re-identification
• Many deployed genomic privacy technologies leave DNA susceptible to re-identification (Malin 2005)
• DNA is re-identified by automated methods, such as:
  – Genotype-Phenotype Inference (Malin & Sweeney, 2000, 2002)
[Figure: a Medical Database record (ICD9 code 3334) is related to a DNA Database record (HD gene mutation, (CAG)n) through the genetic mutation]
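A rough sketch of the genotype-phenotype idea in the figure: a diagnosis code in an identified medical database implies a genetic feature, and a DNA record carrying that feature becomes a candidate match. ICD-9 code 333.4 (shown as "3334" on the slide) is Huntington's chorea, which is associated with an expanded (CAG)n repeat in the HD gene. The threshold of 36 repeats, the record layouts, and the names below are assumptions made for illustration and are not taken from the slides.

```python
# ICD-9 code 333.4 (written "3334" on the slide) is Huntington's chorea.
HD_ICD9 = "3334"

# Assumption: 36 or more CAG repeats is treated as the disease-associated
# range. The cutoff is illustrative and is not stated in the slides.
HD_CAG_THRESHOLD = 36

# Hypothetical identified clinical records and de-identified DNA records.
medical_db = [
    {"patient": "Ada Example", "icd9": ["3334", "4019"]},
    {"patient": "Bob Example", "icd9": ["4019"]},
]
dna_db = [
    {"sample": "S1", "hd_cag_repeats": 19},
    {"sample": "S2", "hd_cag_repeats": 44},
]


def infer_links(medical, dna):
    """Link a DNA sample to a patient when the phenotype and genotype
    each single out exactly one record."""
    patients = [m["patient"] for m in medical if HD_ICD9 in m["icd9"]]
    samples = [d["sample"] for d in dna
               if d["hd_cag_repeats"] >= HD_CAG_THRESHOLD]
    if len(patients) == 1 and len(samples) == 1:
        return [(samples[0], patients[0])]
    return []  # ambiguous or empty: no confident link


links = infer_links(medical_db, dna_db)
print(links)
```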
Genealogy Re-identification
(Malin 2006)
• IdentiFamily:
  – Software that links de-identified pedigrees to named individuals
  – Uses publicly available information, such as obituaries, death records, and the Social Security Death Index database, to build genealogies
[Figure: IdentiFamily pipeline. Public resources → Step 1: Extract → Death Records → Step 2: Validate → Population Records → Step 3: Structure → Identified Family Structures (example members: Ada, Bob, Chaz, Dan, Ed, Fay) → Step 4: Link with De-identified Pedigrees (Shared for Research)]
System Susceptibility
(Malin, JAMIA 2005)
Privacy Protection Systems
  What: Trusted Third Party. Where: deCode Genetics Inc.
  What: Semi-Trusted Third Party. Where: University of Gent, Custodix
  What: Denominalization. Where: University of Montreal
  What: De-identification. Where: University of Utah, University of Sydney, Australian National University
Susceptibility to Attack: Family Structures, Trails, Genotype-Phenotype, Dictionary
Legend: Susceptible / Not Susceptible
[Table: the per-system Susceptible / Not Susceptible markings are not preserved in this transcript]
Altering Data Does not Guarantee Protection
• Science Magazine (Lin et al, 2004)
  – < 100 “SNPs” make DNA unique
  – Proposed protection: perturb DNA
    • i.e., replace A with T, etc.
    • aaaact → atacct
  – Increasing perturbation decreases internal correlations (see graph)
  – Conclusions
    • Too much perturbation needed to prevent linkage
    • Keep records under lock and key
DISCLAIMER: Uniqueness Does not Guarantee Privacy will be Compromised
[Graph: Utility (Correlations) versus Privacy (Perturbation)]
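To make the perturbation argument concrete, here is a small simulation sketch. It is not the analysis from Lin et al; it is a toy model under stated assumptions: sequences are random strings over a, c, g, t, perturbation replaces each base with a different base at a fixed rate, and an attacker re-links a perturbed sequence to whichever original agrees with it most. The function names and parameters are hypothetical.

```python
import random

BASES = "acgt"


def perturb(seq, rate, rng):
    """Replace each base with a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def agreement(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest(perturbed, originals):
    """Index of the original sequence that agrees most with the perturbed one."""
    return max(range(len(originals)),
               key=lambda i: agreement(perturbed, originals[i]))


rng = random.Random(0)  # fixed seed so the run is repeatable
originals = ["".join(rng.choice(BASES) for _ in range(200)) for _ in range(20)]

for rate in (0.0, 0.1, 0.3):
    shared = [perturb(s, rate, rng) for s in originals]
    relinked = sum(closest(p, originals) == i for i, p in enumerate(shared))
    print(f"rate={rate}: fraction re-linked = {relinked / len(originals):.2f}")
```

Running the loop with higher rates shows how much of each sequence has to be altered before nearest-match linkage starts to fail, which is the trade-off the graph on the slide plots against utility.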
Formal Re-identification Model
[Figure: De-identified Biobank Data (e.g., aaactaaga, cacaccatg, tatatgatgt) and Already Public Identified Data (John Doe, Jane Doe, Jeremiah Doe) are joined by a linkage route (LINKAGE MODEL)]
Necessary conditions for re-identification:
• UNIQUENESS (in the de-identified data and in the identified data)
• LINKAGE
Protections:
1. Make Data Non-unique
2. Certify No Linkage
Formal Protection
• k-Map (Sweeney, 2002)
  – Each shared record refers to at least k entities in the population
• k-Anonymity (Sweeney, 2002)
  – Each shared record is equivalent to at least k-1 other records
• k-Unlinkability (Malin 2006)
  – Each shared record links to at least k identities via its trail
  – Satisfies k-Map protection model
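The first two definitions above can be checked mechanically. The sketch below is one straightforward reading of them, assuming records are tuples of already-generalized attribute values; the function names and the toy data are hypothetical.

```python
from collections import Counter


def is_k_anonymous(shared, k):
    """k-Anonymity: each shared record equals at least k-1 other shared records."""
    counts = Counter(shared)
    return all(counts[record] >= k for record in shared)


def is_k_map(shared, population, k):
    """k-Map: each shared record matches at least k entities in the population."""
    population_counts = Counter(population)
    return all(population_counts[record] >= k for record in shared)


# Hypothetical records: (generalized zip, sex).
population = [
    ("372**", "F"), ("372**", "F"),
    ("372**", "M"), ("372**", "M"), ("372**", "M"),
]
shared = [("372**", "F"), ("372**", "M")]

print(is_k_anonymous(shared, 2))        # False: each shared record is alone in the release
print(is_k_map(shared, population, 2))  # True: each matches at least 2 people overall
print(is_k_map(shared, population, 3))  # False: only 2 people match ("372**", "F")
```

The toy release satisfies 2-Map without being 2-anonymous, which illustrates that k-Map reasons about the population behind the data rather than about the released table alone.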
Beyond Ad hoc Protections
• Perturbation does not guarantee privacy
• Alternative: Generalization of data
Perturbation (Lin et al 2004)
  ATACAACGTT
  ATCGATCGAT
Generalization (Malin 2005)
  ATC[G or C]A[T or A]CG[T or A]T
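Generalization, as opposed to perturbation, keeps every released value truthful but less specific. A minimal sketch, assuming two aligned sequences of equal length: positions where they agree are kept, and positions where they differ are replaced by the set of both bases. The input sequences here are hypothetical and were chosen so the result has the same form as the pattern on the slide; this is not presented as the algorithm from Malin 2005.

```python
def generalize(seq_a, seq_b):
    """Merge two aligned sequences into one pattern that covers both."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    out = []
    for a, b in zip(seq_a, seq_b):
        # Keep agreeing bases; widen disagreeing positions to "[x or y]".
        out.append(a if a == b else f"[{a} or {b}]")
    return "".join(out)


# Hypothetical input sequences (not taken from the slides).
generalized = generalize("ATCGATCGTT", "ATCCAACGAT")
print(generalized)  # ATC[G or C]A[T or A]CG[T or A]T
```

Both input sequences are consistent with the released pattern, so neither is unique in the release, and no base in the output is false.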
Learning Who You Are From Where You Have Been (“Trails”)
(Malin & Sweeney, 2001; 2004, Malin & Airoldi 2006)
[Figure: two panels, “DNA in Genomic DBs” and “Identities in Discharge DBs”, each spanning hospitals H1, H2, H3; DNA samples ACTG1, ACTG2, ACTG3 each appear at more than one hospital]
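A simplified sketch of trail matching, assuming every hospital releases both its DNA records and its identified discharge records in full, so that a sample's trail (the set of hospitals where it appears) must equal its owner's trail. The published trail algorithms also handle incomplete data, which this sketch does not. The visit data and names are hypothetical and do not reproduce the figure.

```python
from collections import defaultdict


def trails(visits):
    """Map each entity to the frozenset of hospitals where it appears."""
    seen = defaultdict(set)
    for hospital, entity in visits:
        seen[entity].add(hospital)
    return {entity: frozenset(hospitals) for entity, hospitals in seen.items()}


def link_trails(dna_visits, identity_visits):
    """For each DNA sample, list the identities whose trail equals its trail."""
    identity_trails = trails(identity_visits)
    candidates = {}
    for sample, trail in trails(dna_visits).items():
        candidates[sample] = sorted(
            name for name, other in identity_trails.items() if other == trail
        )
    return candidates


# Hypothetical releases: (hospital, entity) pairs.
dna_visits = [
    ("H1", "ACTG1"), ("H2", "ACTG1"), ("H3", "ACTG1"),
    ("H2", "ACTG2"), ("H3", "ACTG2"),
    ("H1", "ACTG3"), ("H3", "ACTG3"),
]
identity_visits = [
    ("H1", "Ada"), ("H2", "Ada"), ("H3", "Ada"),
    ("H2", "Bob"), ("H3", "Bob"),
    ("H2", "Chaz"), ("H3", "Chaz"),
    ("H1", "Dan"), ("H3", "Dan"),
]

candidates = link_trails(dna_visits, identity_visits)
# A sample is re-identified when exactly one identity shares its trail.
reidentified = {s: c[0] for s, c in candidates.items() if len(c) == 1}
print(candidates)
print(reidentified)
```

In this toy data ACTG2 links to two identities, so it would satisfy k-unlinkability for k = 2, while ACTG1 and ACTG3 each link to a single identity and are re-identified.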
Preventing Trails: Cystic Fibrosis Population
(1149 samples)
[Figure: two plots against k (0 to 50), with vertical axes “% of Samples Re-identified” and “% of DNA Records Disclosed” (0 to 100); series: Naive, Partial Trail Suppression]
BEFORE STRANON: 100% Samples In Repository
AFTER STRANON: 0% Samples k-Re-identified
Benefit: Quantified Risk
• Change in re-identification risk
• Shift burden of increased risk to requesting analyst
• Ties together legal and computational models
[Figure: “% of Samples in Repository” (0 to 100) plotted against k (0 to 50), marking the Initial Setting, the Forced Setting, and the Requested Quantity]