Agenda
1. What is (Web) data mining? And what does it have to do with privacy? – a simple view –
2. Examples of data mining and "privacy-preserving data mining":
   • Association-rule mining (& privacy-preserving AR mining)
   • Collaborative filtering (& privacy-preserving collaborative filtering)
3. A second look at ... privacy
4. A second look at ... Web / data mining
5. The goal: More than modelling and hiding – Towards a comprehensive view of Web mining and privacy: threats, opportunities and solution approaches.
6. An outlook: Data mining for privacy

Privacy Problems: Example 1
Technical background of the problem:
• The dataset allows for Web mining (e.g., which search queries lead to which site choices),
• but it violates k-anonymity (e.g., a query for "Lilburn" suggests that k is roughly the number of inhabitants of Lilburn).

Privacy Problems: Example 2
Where do people live who will buy the Koran soon?
Technical background of the problem:
• A mashup of different data sources:
  – Amazon wishlists
  – Yahoo! People (addresses)
  – Google Maps
• Each source has insufficient k-anonymity; combined, they allow for attribute matching and thereby inferences.

Privacy Problems: Example 3
Predicting political affiliation from Facebook profile and link data (1): Most Conservative Traits

Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487

(Lindamood et al. 09 & Heatherly et al. 09)

Predicting political affiliation from Facebook profile and link data (2): Most Liberal Traits per Trait Name

Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985

(Lindamood et al. 09 & Heatherly et al. 09)

"Privacy-preserving Web mining" example: find patterns, unlink personal data
• The Volvo S40 website targets people in their 20s.
• Are visitors in their 20s or 40s?
• Which demographic groups like/dislike the website?
An example of the "Randomization Approach" to PPDM (privacy-preserving data mining): R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", SIGMOD 2000.

Randomization Approach Overview
[Diagram: original records (Age | Salary | ...), e.g. 30 | 70K | ... and 50 | 40K | ..., each pass through a Randomizer, producing randomized records such as 65 | 20K | ... and 25 | 60K | ...; the distributions of Age and of Salary are reconstructed from the randomized data and passed to the Data Mining Algorithms, which build the Model.]

Seems to work well!
[Figure: Number of People vs. Age, comparing the Original, Randomized and Reconstructed distributions]

What is collaborative filtering?
"People like what people like them like" – regardless of support and confidence

User-based Collaborative Filtering
• Idea: People who agreed in the past are likely to agree again.
• To predict a user's opinion of an item, use the opinions of similar users.
• Similarity between users is determined by their overlap in opinions on other items.
• Next step: build a model of user types, i.e. a "global model" rather than "local patterns" as the mining result.

1. Privacy as confidentiality: "the right to be let alone" – and to hide data
[Diagram: Data]
Is this all there is to privacy?

2. Privacy as control: informational self-determination
[Diagram: Data – "Don't do THIS!"]
e.g. data privacy: "the right of the individual to decide what information about himself should be communicated to others and under what circumstances" (Westin, 1970)
• This view is behind much of data-protection legislation (see Eleni Kosta's talk).

Discussion item: What is this an example of?
Tracing anonymous edits in Wikipedia: http://wikiscanner.virgil.gr/
[Method: Attribute matching]
Results (an example)
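The attribute-matching method behind tracing anonymous Wikipedia edits (and behind the mashup in Privacy Problems: Example 2) links records from two sources that agree on a shared attribute. The following is a minimal sketch, assuming for illustration that the shared attribute is the IP address recorded for an anonymous edit and that a second source lists IP ranges registered to organizations. All edits, organization names and ranges are hypothetical; the ranges are taken from address blocks reserved for documentation.

```python
import ipaddress

# Hypothetical anonymous edit log: (article, editor IP). Illustrative data only.
ANON_EDITS = [
    ("Article A", "192.0.2.17"),
    ("Article B", "198.51.100.5"),
    ("Article C", "203.0.113.9"),
]

# Hypothetical second source: IP ranges registered to organizations.
ORG_RANGES = {
    "Example Corp": ipaddress.ip_network("192.0.2.0/24"),
    "Example Agency": ipaddress.ip_network("198.51.100.0/25"),
}


def match_edits_to_orgs(edits, org_ranges):
    """Link each anonymous edit to every organization whose IP range contains the editor's IP."""
    matches = []
    for article, ip_text in edits:
        ip = ipaddress.ip_address(ip_text)
        for org, network in org_ranges.items():
            if ip in network:  # attribute matching on the shared attribute (the IP address)
                matches.append((article, org))
    return matches


if __name__ == "__main__":
    for article, org in match_edits_to_orgs(ANON_EDITS, ORG_RANGES):
        print(f"{article} was edited from an address registered to {org}")
```

Neither source alone names the editor of "Article A". The join on the shared attribute produces the inference.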
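The k-anonymity criterion used in Privacy Problems: Examples 1 and 2 can be checked directly. A table is k-anonymous with respect to a set of quasi-identifying attributes (attributes that can be linked to outside data, such as a town) if every combination of their values is shared by at least k records. Below is a minimal sketch; the log records and attribute names are hypothetical.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Largest k for which `records` is k-anonymous w.r.t. `quasi_identifiers`.

    Records are grouped by their values on the quasi-identifying attributes;
    k is the size of the smallest group (0 for an empty table).
    """
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


# Hypothetical search-log records (illustrative values only).
LOG = [
    {"town": "Lilburn", "age_band": "60-69", "query": "q1"},
    {"town": "Lilburn", "age_band": "30-39", "query": "q2"},
    {"town": "Springfield", "age_band": "30-39", "query": "q3"},
    {"town": "Springfield", "age_band": "30-39", "query": "q4"},
]

print(anonymity_level(LOG, ["town"]))              # 2
print(anonymity_level(LOG, ["town", "age_band"]))  # 1
```

Adding a quasi-identifier can only split groups, never merge them, so k cannot grow. This is why combining sources, as in the mashup example, tends to lower it.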
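The randomization approach to PPDM (Agrawal and Srikant, SIGMOD 2000) adds independent random noise to each value. Because the noise distribution is known, the distribution of the original values can later be reconstructed from the noisy ones. The sketch below is a simplified discrete version and makes several assumptions: integer ages, uniform integer noise in [-alpha, alpha], and a fixed number of rounds of an iterative Bayesian update. Each round applies f_new(a) ∝ Σ_w P(noise = w − a) f(a) / Σ_z P(noise = w − z) f(z) over all observed noisy values w. The function names, parameter values and visitor ages are hypothetical, loosely modelled on the Volvo S40 question (20s or 40s?).

```python
import random
from collections import Counter


def randomize(values, alpha, rng):
    """Perturb each value with independent uniform integer noise from [-alpha, alpha]."""
    return [v + rng.randint(-alpha, alpha) for v in values]


def noise_prob(d, alpha):
    """P(noise = d) for uniform integer noise on [-alpha, alpha]."""
    return 1.0 / (2 * alpha + 1) if -alpha <= d <= alpha else 0.0


def reconstruct(randomized, domain, alpha, iterations=50):
    """Estimate the distribution of the original values from randomized ones.

    Starts from a uniform guess over `domain` and repeatedly applies the
    Bayesian update described above.
    """
    domain = list(domain)
    f = {a: 1.0 / len(domain) for a in domain}
    counts = Counter(randomized)  # identical observations share the same posterior
    for _ in range(iterations):
        new_f = dict.fromkeys(domain, 0.0)
        for w, c in counts.items():
            denom = sum(noise_prob(w - z, alpha) * f[z] for z in domain)
            if denom == 0.0:
                continue  # w cannot be explained by any value in the domain
            for a in domain:
                new_f[a] += c * noise_prob(w - a, alpha) * f[a] / denom
        total = sum(new_f.values())
        if total == 0.0:
            break  # no usable observations; keep the current estimate
        f = {a: v / total for a, v in new_f.items()}  # plays the role of the 1/n factor
    return f


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical visitor ages: half in their 20s, half in their 40s.
    ages = [rng.randint(20, 29) for _ in range(500)] + [rng.randint(40, 49) for _ in range(500)]
    noisy = randomize(ages, alpha=10, rng=rng)
    estimate = reconstruct(noisy, range(20, 70), alpha=10)
    for decade in range(20, 70, 10):
        share = sum(estimate[a] for a in range(decade, decade + 10))
        print(f"{decade}s: {share:.2f}")
```

Only the noisy values need to be collected. The per-decade shares come from the reconstructed distribution, not from any individual's true age.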
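Association-rule mining, listed in the agenda and contrasted with collaborative filtering ("regardless of support and confidence"), scores rules with two measures. The support of an itemset is the fraction of transactions containing it. The confidence of a rule X → Y is support(X ∪ Y) / support(X). Below is a minimal brute-force sketch restricted to one item on each side of a rule. The transactions are hypothetical search-query/site-choice sessions in the spirit of Privacy Problems: Example 1, and the item labels and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical sessions: queries ("q:") and site choices ("s:").
TRANSACTIONS = [
    {"q:volvo", "s:volvo.com"},
    {"q:volvo", "s:volvo.com", "s:cars.example"},
    {"q:volvo", "s:cars.example"},
    {"q:recipes", "s:food.example"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent); 0 if the antecedent never occurs."""
    s = support(antecedent, transactions)
    return support(set(antecedent) | set(consequent), transactions) / s if s else 0.0


def one_to_one_rules(transactions, min_support, min_confidence):
    """Enumerate rules x -> y (single items) meeting both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, transactions)
            c = confidence({x}, {y}, transactions)
            if s >= min_support and c >= min_confidence:
                found.append((x, y, s, c))
    return found


for x, y, s, c in one_to_one_rules(TRANSACTIONS, 0.5, 0.6):
    print(f"{x} -> {y}  (support {s:.2f}, confidence {c:.2f})")
```

Real miners such as Apriori prune the search using the fact that every subset of a frequent itemset is frequent. The brute-force loop here only serves to make the two measures concrete.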
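The user-based collaborative filtering idea ("people who agreed in the past are likely to agree again") can be sketched in a few functions. The slides do not fix a similarity measure or a prediction formula. This sketch assumes Pearson correlation over co-rated items as the similarity, and predicts a rating as the user's mean rating plus the similarity-weighted average of other users' deviations from their own means. The ratings and user names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical ratings on a 1-5 scale (illustrative data only).
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 5, "item2": 3, "item3": 4, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 1},
}


def similarity(u, v, ratings):
    """Pearson correlation of two users' ratings on the items both have rated."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    if len(common) < 2:
        return 0.0  # not enough overlap to judge agreement
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    du = [ratings[u][i] - mu for i in common]
    dv = [ratings[v][i] - mv for i in common]
    denom = sqrt(sum(x * x for x in du)) * sqrt(sum(y * y for y in dv))
    return sum(x * y for x, y in zip(du, dv)) / denom if denom else 0.0


def predict(user, item, ratings):
    """Predict `user`'s rating for `item` from similar users' deviations from their means."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = similarity(user, other, ratings)
        num += sim * (theirs[item] - mean(theirs.values()))
        den += abs(sim)
    if den == 0:
        return None  # nobody comparable has rated the item
    return mean(ratings[user].values()) + num / den


print(round(predict("alice", "item4", RATINGS), 2))  # 4.67
```

Alice agrees with Bob and disagrees with Carol on the items they share. Carol's low rating of item4 therefore raises Alice's predicted rating above her own mean.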