Re-identification and Privacy Risk
Professor John Bacon-Shone
Director, Social Sciences Research Centre &
Chair, Human Research Ethics Committee
The University of Hong Kong
Asian Privacy Scholars Network: July 2013
Introduction
Ethics committees in universities generally assume that once
personal data has been anonymized, it is no longer personal
data, so the privacy risk is permanently addressed
Recent papers suggest that this is not necessarily a wise
assumption!
I wish to examine the issue of re-identification and what it
means for privacy, confidentiality and research ethics
Anonymity
Seems the most difficult ethical concept for academics to fully
grasp.
The dictionary says: Anonymous: not named or identified
But most people think it just means not named, so for
example, if I interview you, but do not record your name, they
think it is anonymous, even if I know who you are or you make
statements in the interview record that implicitly identify you
What is much more tricky is that anonymity may not be static:
being anonymous today does not necessarily mean being
anonymous tomorrow
Personal Identifier (PDPO)
The ordinance states that:
“Personal Identifier” means an identifier that is assigned to an
individual by a data user for the purpose of the operations of the user
and that uniquely identifies that individual in relation to the data user,
but does not include an individual's name used to identify that individual
While the assignment of a “personal identifier” may provide a certain
degree of anonymity, its effectiveness relies on the data user taking the
necessary action. For example, if a hospital uses the patient’s ID card
number to identify the patient, the desired degree of anonymity will not
be attained.
Personal Identifier (my version)
Personal Identifier means an identifier, other than name, that
uniquely identifies some (but maybe not all) individuals in a
specified population
Clearly, the existence of a personal identifier does not mean
we have anonymity for all individuals. Some privacy risk
therefore exists. The evaluation of such privacy risk requires
knowing both the chance of re-identification of individuals and
the consequences.
Next, let’s examine the chance of re-identification
Chance of re-identification
This can be separated into 2 elements:
1) Chance of uniqueness
2) Ease of matching
Chance of uniqueness
The chance of uniqueness depends on both the identifier and
the population
The more variables in the dataset and the more possible
values for each variable, the more likely that the identifier is
unique for some individuals. Hence the concern about Big
Data and the development of much larger datasets.
A smaller population (e.g. identical twins in Hong Kong) has a
much greater chance of uniqueness than a large population.
Note that a DNA profile may not be unique (identical twins), and
matching can be indirect, using the similarities of DNA
within families.
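The effect of adding variables can be illustrated with a minimal sketch. The function name `unique_fraction`, the variable names and the toy records below are all hypothetical, invented for illustration; they are not drawn from any real dataset. The sketch simply counts what share of records has a combination of values that no other record shares.

```python
from collections import Counter


def unique_fraction(records, variables):
    """Share of records whose combination of values on `variables` is unique."""
    keys = [tuple(r[v] for v in variables) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical toy data (assumption: invented purely for illustration)
records = [
    {"sex": "F", "birth_year": 1980, "district": "Central"},
    {"sex": "F", "birth_year": 1980, "district": "Sha Tin"},
    {"sex": "M", "birth_year": 1980, "district": "Central"},
    {"sex": "M", "birth_year": 1975, "district": "Central"},
    {"sex": "F", "birth_year": 1980, "district": "Central"},
    {"sex": "M", "birth_year": 1980, "district": "Sha Tin"},
]

one_var = unique_fraction(records, ["sex"])
two_vars = unique_fraction(records, ["sex", "birth_year"])
three_vars = unique_fraction(records, ["sex", "birth_year", "district"])
```

In this toy population nobody is unique on sex alone, one person in six is unique on sex and birth year, and four in six are unique once district is added. Real datasets would need a proper disclosure risk assessment; this only shows the direction of the effect.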
Ease of matching
Ease of matching means how easily we can match the
identifier back to a specific person. Let us consider some
examples:
ID card number: here the risk of matching is high, because the
government has enabled leakage of matching information (e.g.
Company Registry)
DNA profile: the risk of matching should be low, unless you or
family members have provided DNA profiles to a registry (see
later discussion)
Date and time of admission to a specific hospital: would allow
matching with hospital records, if they can be accessed
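The hospital admission example can be sketched as a simple record match. Everything here is an assumption made for illustration: the field names, the function `match_by_admission`, the hospital names and the placeholder patient labels are hypothetical, and the sketch assumes the hospital records are accessible, which is exactly the condition noted above.

```python
from datetime import datetime


def match_by_admission(research_record, hospital_records):
    """Return names in hospital_records sharing hospital and admission time."""
    key = (research_record["hospital"], research_record["admitted"])
    return [
        h["name"]
        for h in hospital_records
        if (h["hospital"], h["admitted"]) == key
    ]


# Hypothetical "anonymized" research record: no name, but admission details kept
research_record = {
    "hospital": "Hospital X",
    "admitted": datetime(2013, 7, 1, 14, 35),
    "diagnosis": "(sensitive)",
}

# Hypothetical hospital records that still carry names
hospital_records = [
    {"name": "Patient A", "hospital": "Hospital X", "admitted": datetime(2013, 7, 1, 9, 10)},
    {"name": "Patient B", "hospital": "Hospital X", "admitted": datetime(2013, 7, 1, 14, 35)},
    {"name": "Patient C", "hospital": "Hospital Y", "admitted": datetime(2013, 7, 1, 14, 35)},
]

matches = match_by_admission(research_record, hospital_records)
```

If exactly one name comes back, the research record has been re-identified even though it never contained a name. Coarsening the admission time (for example to the week) would usually return more candidates and so lower the ease of matching.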
Ease of matching
Recent publications have discussed the possibility of matching
becoming easier with time, for example:
Data leakage:
Individuals make DNA profiles public, making it increasingly
possible to use familial matches to match individuals or
surnames
Arrested individuals are often required to provide DNA profiles
that are not erased even if innocent
ID card numbers are leaked from websites, making it even
easier to match to names.
Ease of matching
Linkage of the identifier to individual characteristics:
Hospital admission:
If it is known that you were involved in a traffic accident, your
hospital admission soon afterward near to the accident
location becomes likely, increasing the ease of matching to
hospital records.
DNA:
Researchers are developing methods to predict personal
characteristics from DNA profiles, such as eye, skin and hair
colour, so ease is likely to increase
Value of matching
Need to consider the reason for matching:
Authentication – need to be able to match against an identifier
carried by the individual such as ID card
Matching other records – need only to match internally, so no
need to use an identifier usable externally, greatly reducing the
risk of unintended matching
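One possible way to get an identifier that matches internally but is not usable externally is a keyed hash. This is a sketch of one common technique, not a method taken from the talk: the function name `internal_pseudonym`, the example key and the example ID value are hypothetical, and the approach only helps if the key stays secret inside the organization.

```python
import hashlib
import hmac


def internal_pseudonym(external_id: str, secret_key: bytes) -> str:
    """Derive a stable internal identifier from an external one with HMAC-SHA256."""
    digest = hmac.new(secret_key, external_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical values (assumptions for illustration only)
secret_key = b"keep-this-key-inside-the-organization"
external_id = "X0000000"  # placeholder, not a real ID card number

pseudonym_a = internal_pseudonym(external_id, secret_key)
pseudonym_b = internal_pseudonym(external_id, secret_key)
pseudonym_other_key = internal_pseudonym(external_id, b"a-different-key")
```

The same person always gets the same pseudonym under the same key, so internal records can still be linked, while someone holding only a leaked list of ID card numbers cannot reproduce the pseudonyms without the key. It does not remove the risk from the other variables in the record.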
Consequences of re-identification
Can range from the trivial (e.g. customer of a clothing retail
outlet) to the serious (e.g. HIV status), but the full
consequences cannot always be predicted
While it is possible to change some identifiers (e.g. ID card
number, mobile phone number), it is impossible to change
other identifiers (e.g. DNA profile), so long term risk needs to
be recognized and addressed
Implications of re-identification
Arguably unethical to promise:
Zero risk – mistakes can always happen
Future risk is same as current risk – technology and
circumstances change; ease of matching continues to
increase, especially for biological markers
Need to review use of identifiers – what seemed privacy-safe
in the past may not be safe in the future, so need to continue
to review privacy risk