29e CONFÉRENCE INTERNATIONALE DES COMMISSAIRES À LA PROTECTION DES DONNÉES ET DE LA VIE PRIVÉE
29th INTERNATIONAL CONFERENCE OF DATA PROTECTION AND PRIVACY COMMISSIONERS
Data Mining
Dr. Bradley A. Malin
Assistant Professor
Department of Biomedical Informatics
Vanderbilt University
Data Collection
[Figure: overt collection (a hospital visit for treatment records demographics and clinical presentation in Electronic Medical Records) versus covert collection (a webcam captures a person while walking)]
• Localized: Personal records in databases at source
• Distributed: Integration of records from many sources
Data Mining
• Unsupervised
– “Labels” unknown in advance, so search for intrinsic patterns of the data
– Clustering “similar” people (e.g., people who purchased the same products)
• Supervised
– “Labels” known in advance
– Train models on sample data to classify new cases
[Figure: example decision tree with splits on Country (USA / Canada) and Age (<50 / >50), and leaves Harry Potter, Scream, 1984 and Jaws]
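The unsupervised case can be sketched in a few lines. This is a minimal illustration, not the method behind the slide: it assumes each person is described by the set of products they bought, measures similarity with the Jaccard index, and groups people by single linkage at an arbitrary 0.5 threshold. The names `jaccard`, `cluster_by_purchases` and the sample purchases are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of products two people have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_purchases(purchases: dict[str, set[str]],
                         threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clustering: two people share a cluster when a chain of
    pairwise Jaccard similarities >= threshold connects them."""
    parent = {person: person for person in purchases}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(purchases, 2):
        if jaccard(purchases[p], purchases[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters: dict[str, set[str]] = {}
    for person in purchases:
        clusters.setdefault(find(person), set()).add(person)
    return sorted(clusters.values(), key=sorted)


# Hypothetical shoppers; no labels are given in advance.
SAMPLE_PURCHASES = {
    "ann": {"tent", "stove", "boots"},
    "ben": {"tent", "stove", "map"},
    "cat": {"novel", "lamp"},
    "dan": {"novel", "lamp", "tea"},
}
```

Here `cluster_by_purchases(SAMPLE_PURCHASES)` groups ann with ben and cat with dan, although nobody told the algorithm that "campers" or "readers" exist; that is the sense in which the patterns are intrinsic to the data.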
Website Personalization
• Can a website predict what I want to see?
– Intra-personalization: What pages / topics did
I visit in my previous visits?
– Inter-personalization: Is my browsing /
purchasing history similar to other people’s?
• Does my behavior reveal my identity or
sensitive things about my life?
– What information should not be revealed?
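Inter-personalization is often implemented as user-based collaborative filtering. The sketch below assumes that approach (the slide does not name one): each visitor's history is a map of page to visit count, similarity between visitors is cosine similarity, and unseen pages are scored by the histories of similar visitors. The function names, pages and counts are hypothetical.

```python
import math


def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity between two page-visit count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def recommend(target: str, histories: dict[str, dict[str, int]],
              top_n: int = 2) -> list[str]:
    """Score pages the target has not visited by summing other users' visit
    counts, weighted by each user's similarity to the target."""
    seen = histories[target]
    scores: dict[str, float] = {}
    for user, history in histories.items():
        if user == target:
            continue
        sim = cosine(seen, history)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for page, count in history.items():
            if page not in seen:
                scores[page] = scores.get(page, 0.0) + sim * count
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


SAMPLE_HISTORIES = {
    "me": {"/tents": 3, "/stoves": 1},
    "u1": {"/tents": 2, "/stoves": 2, "/maps": 4},
    "u2": {"/tents": 1, "/boots": 1},
    "u3": {"/novels": 5},
}
```

The privacy question on the slide follows directly: the recommendation for "me" is built from other people's browsing, so what is shown to one visitor can leak something about another.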
Intelligence
• Lists of entities are becoming increasingly prevalent
– Intelligence reports, rosters, networks
• How many Alices are there? Which is which?
• How does Alice relate to Bob?
[Figure: documents doc A to doc D mentioning Alice Doe together with Bob Doe, Junior Doe, John Smith and Brad Malin, and a network showing how many distinct Alices (“A” to “D”) the mentions may refer to, each connected to associates Bob, Brad, Junior or John]
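"How many Alices are there?" is an entity-resolution problem. One naive heuristic, assumed here purely for illustration and not taken from the slide, treats two documents that mention the same name as referring to the same person when a chain of shared associates links them. The function `resolve_mentions` and the documents are hypothetical.

```python
def resolve_mentions(docs: dict[str, set[str]], name: str) -> list[set[str]]:
    """Group the documents that mention `name`; each group is one candidate
    real-world person. Groups merge when they share an associate."""
    groups: list[tuple[set[str], set[str]]] = []  # (document ids, associates)
    for doc in sorted(d for d in docs if name in docs[d]):
        merged_docs, merged_assoc = {doc}, docs[doc] - {name}
        kept = []
        for g_docs, g_assoc in groups:
            if g_assoc & merged_assoc:  # shared associate: assume same person
                merged_docs |= g_docs
                merged_assoc |= g_assoc
            else:
                kept.append((g_docs, g_assoc))
        groups = kept + [(merged_docs, merged_assoc)]
    return sorted((g_docs for g_docs, _ in groups), key=sorted)


# Hypothetical documents listing the names that appear together in each.
SAMPLE_DOCS = {
    "doc1": {"Alice Doe", "Bob Doe"},
    "doc2": {"Alice Doe", "Bob Doe", "Junior Doe"},
    "doc3": {"Alice Doe", "John Smith"},
    "doc4": {"Alice Doe", "Junior Doe"},
}
```

With this data the heuristic finds two candidate Alices: one seen with Bob and Junior (doc1, doc2, doc4) and one seen only with John (doc3). Real systems weigh far more evidence, and the same linking power is what makes such lists a privacy concern.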
Surveillance
• Location Surveillance: Did someone on Interpol’s
watchlist visit hotel X? Airline Y?
[Figure: a watchlist checked against the records of Hotel X and Airline Y]
• Challenge: Data holders want to collaborate, but
fear strategic knowledge and legal constraints
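One way to let data holders collaborate without handing over their lists is a private set intersection built on commutative encryption, the property listed under Secure Multiparty Computation in the next slide. The sketch below assumes exponentiation modulo a prime (an SRA/Pohlig-Hellman-style cipher), uses a toy 127-bit Mersenne prime, and simulates both parties in one process; it is an illustration, not a secure implementation. All function names are hypothetical.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime; toy-sized modulus for illustration only


def keygen() -> int:
    """Random exponent coprime with P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def hash_to_group(name: str) -> int:
    """Map a name to a nonzero residue mod P (simplified encoding)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def enc(x: int, key: int) -> int:
    """Commutative: enc(enc(x, a), b) == enc(enc(x, b), a)."""
    return pow(x, key, P)


def private_intersection(watchlist: set[str], guests: set[str]) -> set[str]:
    """Names on the watchlist that also appear among the hotel's guests.
    In a real protocol each party holds only its own key and exchanges
    encrypted values; here both sides are simulated together."""
    a = keygen()  # watchlist holder's secret key
    b = keygen()  # hotel's secret key
    # Watchlist holder encrypts with a; hotel re-encrypts with b.
    watch_twice = {n: enc(enc(hash_to_group(n), a), b) for n in watchlist}
    # Hotel encrypts with b; watchlist holder re-encrypts with a.
    guests_twice = {enc(enc(hash_to_group(n), b), a) for n in guests}
    # Equal double encryptions imply equal names, by commutativity.
    return {n for n, c in watch_twice.items() if c in guests_twice}
```

Neither side sees the other's names in the clear; only values encrypted under both keys are compared, which addresses the fear of exposing strategic knowledge.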
Privacy Protections
• Protect Anonymity
– Remove / encrypt identifying information
– Suppress inferences that can reveal identity
• Protect Confidentiality
– Hide Sensitive Rules
– Perturbation and Generalization
• Secure Multiparty Computation
– E(a) ⊕ E(b) = E(a + b) [homomorphic encryption]
– E(E(John, x), y) = E(E(John, y), x) [commutative encryption]
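The homomorphic property can be demonstrated with the Paillier cryptosystem, one well-known additively homomorphic scheme; the slide does not say which scheme it has in mind. In Paillier the "⊕" on ciphertexts is multiplication modulo n². The primes below are assumptions chosen for the demo and are far too small for real use.

```python
import math
import secrets

# Toy Paillier key; real deployments use primes of 1024+ bits each.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # Carmichael lambda(N)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """E(m) = g**m * r**N mod N**2 with random r; g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (N + 1)**m mod N**2 simplifies to 1 + m*N by the binomial theorem.
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    x = pow(c, LAM, N2)
    return (x - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """The homomorphic '+': multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % N2
```

For example, `decrypt(add_encrypted(encrypt(20), encrypt(22)))` returns 42, so a party holding only ciphertexts can compute an encrypted total that only the key holder can open. Because encryption is randomized, encrypting the same value twice gives different ciphertexts.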
Clinical Genomics
[Figure: before de-identification, each clinical record and blood sample is linked by medical record #; after de-identification, each clinical record and DNA sample is linked by a 512-bit hash of the #]
• Vanderbilt DNA Databank
• DNA from “leftover” blood
– 25-75K samples per year, 250K in 5 years
• Combined with de-identified electronic medical records
– 600 GB on 1.4 million patients
• “Hypothesis generation” to mine correlations between clinical features and DNA
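Linking a clinical record to a DNA sample through a 512-bit hash of the medical record number can be sketched as below. The slide does not name the hash function; SHA-512 is an assumption, and the keyed HMAC variant is a further assumption, chosen because an unkeyed hash of a short numeric identifier can be reversed by hashing every possible number. The names `pseudonym`, `link` and the sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonym(mr_number: str, secret_key: bytes) -> str:
    """512-bit pseudonym (128 hex characters) for a medical record number,
    computed with keyed HMAC-SHA-512."""
    return hmac.new(secret_key, mr_number.encode("utf-8"),
                    hashlib.sha512).hexdigest()


def link(clinical: dict[str, dict], dna: dict[str, str],
         secret_key: bytes) -> dict[str, tuple[dict, str]]:
    """Re-key both sources by pseudonym and join them, so the released
    data set is indexed by hash rather than by MR#."""
    clinical_by_p = {pseudonym(mr, secret_key): rec for mr, rec in clinical.items()}
    dna_by_p = {pseudonym(mr, secret_key): seq for mr, seq in dna.items()}
    shared = clinical_by_p.keys() & dna_by_p.keys()
    return {p: (clinical_by_p[p], dna_by_p[p]) for p in shared}
```

The key stays with the data holder; without it, a recipient cannot recompute the hash for a known MR#. The hash alone does not stop re-identification through other patterns in the data, which is the point of the following slides.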
Example De-identified Medical Record
[Figure: sample record annotated to show that the SSN and phone # are replaced, the MR# is removed, names are substituted and dates are shifted]
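The four operations annotated on the example record can be sketched with regular expressions. The identifier formats, replacement tokens, name mapping and date offset below are all assumptions for illustration; the slide does not describe the tool used, and production de-identification must handle many more formats.

```python
import re
from datetime import date, timedelta

# Hypothetical identifier formats; real records use many more variants.
MRN = re.compile(r"MR#\s*:?\s*\d+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}-\d{4}\b")
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def deidentify(text: str, names: dict[str, str], shift_days: int) -> str:
    text = MRN.sub("", text)                  # MR# is removed
    text = SSN.sub("XXX-XX-XXXX", text)       # SSN replaced
    text = PHONE.sub("(XXX) XXX-XXXX", text)  # phone # replaced
    for real, surrogate in names.items():     # names substituted
        text = re.sub(rf"\b{re.escape(real)}\b", lambda _m: surrogate, text)

    def shift(m: re.Match) -> str:
        # One offset per patient keeps intervals between dates intact.
        shifted = date(int(m[1]), int(m[2]), int(m[3])) + timedelta(days=shift_days)
        return shifted.isoformat()

    return ISO_DATE.sub(shift, text)          # dates shifted


SAMPLE_RECORD = (
    "Patient: John Smith  MR# 123456\n"
    "SSN 123-45-6789, phone (615) 555-0100\n"
    "Admitted 2007-03-14; seen by Dr. Jane Roe."
)
```

Calling `deidentify(SAMPLE_RECORD, {"John Smith": "Bob Doe", "Jane Roe": "Ann Lee"}, shift_days=-30)` removes the MR#, masks the SSN and phone number, swaps in surrogate names and moves the admission date to 2007-02-12.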
Naïve Protection
[Figure: panels “DNA in Genomic DBs” and “Identities in Discharge DBs”, showing DNA records (ACTG1, ACTG2, ACTG3) and patient identities held by hospitals H1, H2 and H3]
• Patterns in data can lead to privacy compromise
• Suppress patterns “intelligently” to support goals
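How such patterns compromise privacy can be sketched as trail matching: the set of hospitals where a DNA record appears is compared with the set of hospitals where each named patient was discharged. This is a simplified exact-match version written for illustration, not the algorithm behind the figures; the function name and all data are hypothetical.

```python
def reidentify(dna_trails: dict[str, frozenset[str]],
               id_trails: dict[str, frozenset[str]]) -> dict[str, str]:
    """Link a DNA record to an identity when exactly one identity has the
    same set of visited hospitals (the same 'trail')."""
    by_trail: dict[frozenset[str], list[str]] = {}
    for person, trail in id_trails.items():
        by_trail.setdefault(trail, []).append(person)
    links = {}
    for sample, trail in dna_trails.items():
        candidates = by_trail.get(trail, [])
        if len(candidates) == 1:  # a unique trail re-identifies the sample
            links[sample] = candidates[0]
    return links


SAMPLE_DNA_TRAILS = {
    "ACTG1": frozenset({"H1", "H2"}),
    "ACTG2": frozenset({"H2", "H3"}),
    "ACTG3": frozenset({"H1", "H3"}),
    "ACTG4": frozenset({"H3"}),
}
SAMPLE_ID_TRAILS = {
    "Ann": frozenset({"H1", "H2"}),
    "Ben": frozenset({"H2", "H3"}),
    "Cal": frozenset({"H1", "H3"}),
    "Dee": frozenset({"H3"}),
    "Eve": frozenset({"H3"}),
}
```

Here ACTG1 to ACTG3 are each re-identified, even though no names were released with the DNA, while ACTG4 survives because two people share its trail. Suppressing selected trail entries so that several identities remain consistent with every released record is the intuition behind protecting the data "intelligently".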
In Detail: Cystic Fibrosis
[Figure: two panels for 1,149 patients across 174 hospitals, plotting % of DNA records disclosed and % of samples re-identified against k, for Naive release versus Partial Trail Suppression]
BEFORE protection: 100% of samples in repository
AFTER protection: 0% of samples re-identified
The Impact of Data Mining on Privacy
in the Public and Private Sectors
Richard S. Rosenberg
Professor Emeritus, Department of Computer
Science, University of British Columbia and
President of the BC Freedom of Information and
Privacy Association
Vancouver, BC
The U.S. Government
A Revision
Top Six Purposes of Data Mining
Efforts in Departments and Agencies
Table 1: Key Steps Agencies Are Required to Take to Protect Privacy, with Examples of Related Detailed Procedures and Sources

Key step: Publish notice in the Federal Register when creating or modifying a system of records
Examples of procedures:
• Specify the routine uses for the system
• Identify the individual responsible for the system
• Outline procedures individuals can use to gain access to their records
Primary statutory source: Privacy Act

Key step: Provide individuals with access to their records
Examples of procedures:
• Permit individuals to review records about themselves
• Permit individuals to request corrections to their records
Primary statutory source: Privacy Act

Key step: Notify individuals of the purpose and authority for the requested information when it is collected
Examples of procedures:
• Notify individuals of the authority that authorized the agency to collect the information
• Notify individuals of the principal purposes for which the information is to be used
Primary statutory source: Privacy Act

Key step: Implement guidance on system security
Examples of procedures:
• Perform a risk assessment to determine the information system vulnerabilities, identify threats, and develop countermeasures to those threats
• Have the system certified and accredited by management
• Ensure the accuracy, relevance, timeliness, and completeness of information
Primary statutory sources: FISMA, Privacy Act

Key step: Conduct a privacy impact assessment
Examples of procedures:
• Describe and analyze how information is secured
• Describe and analyze intended use of information
• Have assessment reviewed by chief information officer or equivalent
• Make assessment publicly available, if practicable
Primary statutory source: E-Government Act
ADVISE Data Mining Tool
(Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement)
Cato Institute: Data Mining and
Terrorism
• Attempting to use predictive data mining to
ferret out terrorists before they strike would
be a subtle but important misdirection of
national security resources.
• With a relatively small number of attempts
every year and only one or two major terrorist
incidents every few years – each one distinct
in terms of planning and execution – there
are no meaningful patterns that show what
behavior indicates planning or preparation for
terrorism.
Data Mining in the Private Sector
• We generate an enormous amount of data as a by-product of our
everyday transactions (purchasing goods, enrolling for courses,
etc.), visits to Web sites and interactions with government (taxes,
census, car registration, voter registration, etc.). Not only is the
number of records we generate increasing, but the amount of data
gathered for each type of record is increasing.
• As data miners, our tasks are colliding with these privacy concerns. In
analytic customer relationship management (CRM), we often
analyze customer data with the specific intent of understanding
individual behavior and instituting sales campaigns based on this
understanding. Researchers in economics, demographics, medicine
and social sciences are trying to understand the relationships
between behaviors and outcomes.
• How can we reconcile the legitimate needs of business and
research with the equally legitimate desire of people to maintain
their privacy?
The Use of Anonymizing Technologies
• Still, anonymizing technologies have been endorsed repeatedly
by panels appointed to examine the implications of data mining.
And intriguing progress appears to have been made at
designing information-retrieval systems with record
anonymization, user audit logs — which can confirm that no one
looked at records beyond the approved scope of an
investigation — and other privacy mechanisms "baked in."
• The trick is to do more than simply strip names from records.
Latanya Sweeney of Carnegie Mellon University — a leading
privacy technologist who once had a project funded under TIA
— has shown that 87% of Americans could be identified by
records listing solely their birthdate, gender and ZIP code.
• Sweeney had this challenge in mind as she developed a way for
the U.S. Department of Housing and Urban Development to
anonymously track the homeless.
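Sweeney's finding motivates k-anonymity: a release is k-anonymous when every combination of quasi-identifiers (here birthdate, gender and ZIP code) is shared by at least k records, often achieved by generalizing values. The sketch below is a minimal check under assumed generalization rules (birth year only, 3-digit ZIP prefix); it is not the method Sweeney developed for HUD, and the records are hypothetical.

```python
from collections import Counter


def raw_key(record: dict) -> tuple:
    """Quasi-identifiers exactly as collected."""
    return (record["birthdate"], record["gender"], record["zip"])


def generalize(record: dict) -> tuple:
    """Coarsened quasi-identifiers: birth year and a 3-digit ZIP prefix."""
    return (record["birthdate"][:4], record["gender"], record["zip"][:3] + "**")


def is_k_anonymous(records: list[dict], k: int, key=raw_key) -> bool:
    """True when every quasi-identifier combination occurs at least k times."""
    counts = Counter(key(r) for r in records)
    return all(c >= k for c in counts.values())


SAMPLE_PEOPLE = [
    {"birthdate": "1961-07-31", "gender": "F", "zip": "02138", "diagnosis": "flu"},
    {"birthdate": "1961-02-14", "gender": "F", "zip": "02139", "diagnosis": "asthma"},
    {"birthdate": "1975-05-02", "gender": "M", "zip": "37203", "diagnosis": "CF"},
    {"birthdate": "1975-11-20", "gender": "M", "zip": "37212", "diagnosis": "flu"},
]
```

Before generalization every person is unique on birthdate, gender and ZIP, so the release is not even 2-anonymous; after generalization each combination covers two people, so a diagnosis can no longer be pinned to one individual by those attributes alone. The trade-off is coarser data for analysis.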
A Private Sector Example
• Tesco is quietly building a profile of you, along with every individual in the
country - a map of personality, travel habits, shopping preferences and even
how charitable and eco-friendly you are. A subsidiary of the supermarket
chain has set up a database, called Crucible, that is collating detailed
information on every household in the UK, whether they choose to shop at
the retailer or not.
• The company refuses to reveal the information it holds, yet Tesco is selling
access to this database to other big consumer groups, such as Sky, Orange
and Gillette. "It contains details of every consumer in the UK at their home
address across a range of demographic, socio-economic and lifestyle
characteristics," says the marketing blurb of dunnhumby, the Tesco
subsidiary in question. It has "added intelligent profiling and targeting" to its
data through a software system called Zodiac. This profiling can rank your
enthusiasm for promotions, your brand loyalty, whether you are a "creature
of habit" and when you prefer to shop. As the blurb puts it: "The list is
endless if you know what you are looking for."
The View From 30,000 feet
• Canadian company releases 3-D face scanner ($350)
• ChoicePoint’s press release states it has forgone selling certain consumer information “in selected markets” at a cost of $15 million per year
• 2007: University of Pisa, Italy, KDD Laboratory: “k-anonymity” advancements
• RCMP buys information from a data broker
• Roelof Temmingh, South Africa, releases version 1 of “Evolution”
• Fall 2006: Purdue University electrophotographic halftone printer code advances
• New Zealand Court of Appeal, 4 May 2007: Brooker v. Police
• Brussels, Belgium: EU “googles” Google’s privacy practices
• Feb 2007: Portugal adopts “biometric” national ID card provider
• Melbourne, Australia: Jane Doe v. ABC, 3 April 2007; costs, including tort of invasion of privacy: $234,190