Limiting Disclosure in Hippocratic Databases
VLDB, August 31, 2004
Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt

Presentation Outline
• Hippocratic Databases framework for managing privacy, including the problem of limiting disclosure
• Overview of our proposal for integrating policy-driven disclosure control into an existing relational database environment
• Brief discussion of alternative cell-level enforcement models
• Optimized implementation of opt-in and opt-out choices
• Overview of performance evaluation
• Conclusions

Hippocratic Databases and Limited Disclosure
• Hippocratic Databases have been proposed as a framework for managing privacy-sensitive information
• Limited disclosure is one of the defining principles of this framework
  – Limits the outflow of information from the database
• Limited disclosure includes 3 main ideas:
  – Privacy Policy: organizations define a set of rules describing to whom data may be disclosed (recipients) and how the data may be used (purposes)
  – Consent: data subjects are given control over who may see their personal information and under what circumstances
  – Disclosure Control: the database ensures that privacy policy and data subject consent are enforced with respect to all data access

Motivating Example
• Consider a group of athletes registering for a major international competition
• Personal information is collected from each athlete, possibly including
  – Name, Age, Nationality, Address, Phone number, Visa status
• Data must be managed according to the organizing committee’s privacy policy
  – Government officials are allowed to see visa information for the purpose of venue security
  – Team travel agents may see the contact information for athletes from their own country for making travel arrangements
  – Organizing committee may not disclose athletes’ information to journalists without the athlete’s consent

Limited Disclosure Framework Goals
• Provide techniques for enforcing a broad class of privacy policy rules
• Privacy policy enforcement should require little or no modification to existing application code
• Policy rules should be stored and managed by the database
• Provide limited disclosure enforcement at the cell level

Limited Disclosure Framework Overview
• Start with an existing database environment with associated applications
• Privacy policy is defined and stored in the database in privacy meta-data tables
• When providing information, data subjects also provide consent for various data uses
• Queries are modified so results respect privacy policy and consent
[Diagram components: Query, Query Modifier, Policy Definition, Privacy Meta-Data, Subject Consent, Data Table, Consent Info]

Policy Definition
• Privacy policy is defined using one of the following XML-based policy definition languages
  – Platform for Privacy Preferences (P3P)
  – Enterprise Privacy Authorization Language (EPAL)

Privacy Meta-Data and Policy Meta-Language
• Privacy “meta-language” for expressing the privacy policy in the database
  – Not tied to one particular policy language
  – Many practical P3P and EPAL policies can be translated to this language
• Privacy policy is a set of rules of the form <data, purpose, recipient, condition>
  – Condition must be a predicate that can be expressed in SQL
• Privacy policy rules stored in the database

Privacy Meta-Data Example
• Government officials may see athletes’ visa information for security purposes.
• Journalists may only see athletes’ names for the purpose of writing articles with explicit consent.

Policy  Rule  Purpose   Recipient   Table     Column   CondID
P1      R1    Security  Gov’t Off.  Athletes  Visa     -
P1      R2    Security  Gov’t Off.  Athletes  Name     -
P1      R3    Travel    Travel Ag.  Athletes  Name     -
P1      R4    Travel    Travel Ag.  Athletes  Phone    -
P1      R5    Articles  Journalist  Athletes  Name     C1
P1      R6    Articles  Journalist  Athletes  Address  C2

CondID  Predicate
C1      “EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Name_choice = 1)”
C2      “EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Address_choice = 1)”

Query Modification
• Implemented two alternative algorithms for modifying queries to incorporate policy rules and consent information
• Queries are modified in such a way that query results follow one of our cell-level semantic models

Enforcement Models
• Row (tuple)-level enforcement is insufficient for enforcing arbitrary policies when existing database schemas are not designed with the policy in mind

An Example
Table “Athletes”:

Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information for journalists writing stories:

#  Athlete#  Name  Age  Address  Phone
1  √         √     √    √        √
2  X         X     X    X        X
3  √         X     X    √        √
4  √         √     X    X        X

Row-Level Enforcement
• Must either disclose prohibited information, or restrict information that should be available!
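
The row-level dilemma can be checked directly against the consent matrix from the example. The following is a minimal Python sketch, not code from the talk; the policy labels "all" and "any" and the helper names are hypothetical, introduced only to name the two ways a row-level filter could decide.

```python
# Consent matrix for journalists, copied from the example slide.
COLUMNS = ["Athlete#", "Name", "Age", "Address", "Phone"]
CONSENT = {
    1: [True, True, True, True, True],
    2: [False, False, False, False, False],
    3: [True, False, False, True, True],
    4: [True, True, False, False, False],
}


def rows_shown(policy):
    """Athlete ids a row-level filter would release.

    policy "all": release a row only if every cell is permitted.
    policy "any": release a row if at least one cell is permitted.
    (Both labels are hypothetical names for this sketch.)
    """
    check = all if policy == "all" else any
    return [aid for aid, cells in CONSENT.items() if check(cells)]


def leaked_cells(policy):
    """Prohibited cells that get disclosed because whole rows are released."""
    return [(aid, col)
            for aid in rows_shown(policy)
            for col, ok in zip(COLUMNS, CONSENT[aid]) if not ok]


def withheld_cells(policy):
    """Permitted cells that are lost because their row was filtered out."""
    shown = set(rows_shown(policy))
    return [(aid, col)
            for aid, cells in CONSENT.items() if aid not in shown
            for col, ok in zip(COLUMNS, cells) if ok]


for policy in ("all", "any"):
    print(policy, rows_shown(policy), leaked_cells(policy), withheld_cells(policy))
```

Under "all", only athlete 1 is released and five permitted cells of athletes 3 and 4 are withheld. Under "any", athletes 1, 3, and 4 are released and five prohibited cells are disclosed. Neither choice matches the consent matrix, which is the motivation for cell-level enforcement.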

Row-Level Enforcement
• Filter Athlete #2 because no consent is provided

Athlete#  Name            Age  Address    Phone
1         Michael Phelps  19   Baltimore  111-1111
3         Ian Thorpe      23   Sydney     333-3333
4         Jenny Thompson  31   New York   444-4444

Consent information for journalists writing stories:

#  Athlete#  Name  Age  Address  Phone
1  √         √     √    √        √
2  X         X     X    X        X
3  √         X     X    √        √
4  √         √     X    X        X

Enforcement Models
• Cell-level enforcement
  – Table Semantics model
  – Query Semantics model

Table Semantics Enforcement
1. “Mask” prohibited cells with the null value
2. Filter rows where the primary key is prohibited
3. Conceptually, query is performed on top of this “view”

Table Semantics Enforcement
• SQL’s null value represents “no value”
• Desirable semantics for prohibited values
  – Predicates applied to null never evaluate to true
  – Null does not join with other values
  – Null is not included when computing aggregates

Table Semantics Enforcement
Table “Athletes”:

Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information:

#  Athlete#  Name  Age  Address  Phone
1  √         √     √    √        √
2  X         X     X    X        X
3  √         X     X    √        √
4  √         √     X    X        X

Mask prohibited cells with null:

Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
null      null            null  null       null
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null

Filter rows where the primary key is prohibited:

Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null

Query Semantics Enforcement
1. “Mask” prohibited cells with the null value
2. Execute the query on top of the masked table
3. Filter rows that are entirely null from the result set

Query Semantics Enforcement
Mask prohibited cells with null:

Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
null      null            null  null       null
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null

Issue query: SELECT Name, Age FROM Athletes

Filter rows that are entirely null from the result set.

Result under Query Semantics:

Name            Age
Michael Phelps  19
Jenny Thompson  null

Result under Table Semantics:

Name            Age
Michael Phelps  19
null            null
Jenny Thompson  null

Query Modification Example (Table Semantics)
Original query:

    SELECT Name
    FROM Athletes
    WHERE Name = 'Michael Phelps'

Modified query:

    SELECT CASE WHEN EXISTS
             (SELECT Name_Choice
              FROM Athlete_Choices
              WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                AND Athlete_Choices.Name_Choice = 1)
           THEN Name ELSE null END
    FROM Athletes
    WHERE Name = 'Michael Phelps'
      AND EXISTS
             (SELECT Athlete#_Choice
              FROM Athlete_Choices
              WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                AND Athlete_Choices.Athlete#_Choice = 1)

Database-level disclosure control
• Database is the best place to enforce limited disclosure
• More efficient, flexible, and secure than an application-level approach
  – Need not fetch prohibited data from the database
• A naive application-level approach leads to privacy leaks when applied at the cell level
  – Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30

Example: Difficulties of application-level disclosure control
Table “Athletes”:

Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information:

#  Athlete#  Name  Age  Address  Phone
1  √         √     √    √        √
2  X         X     X    X        X
3  √         X     X    √        √
4  √         √     X    X        X

Query the database; retrieve results to application:

Name            Age
Jenny Thompson  31

Check policy and consent info; replace prohibited cells with null:

Name            Age
Jenny Thompson  null

• Based on this query, it is easy to infer that Jenny Thompson’s age is greater than 30!

Database-level disclosure control
• Database is a logical place to enforce limited disclosure
• More efficient and flexible than an application-level rule engine approach
  – Need not fetch prohibited data from the database
• A naive application-level approach leads to privacy leaks when applied at the cell level
  – Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
• Alternative approach performs much query processing in the application
  – Even more complicated to compute aggregates and joins when some cells are prohibited!

Optimized Implementation of Opt-in and Opt-out Conditions
• Important to note that SQL queries offer much flexibility for defining disclosure conditions
• In practice simple opt-in and opt-out choices are often used to express subject consent and are extremely important
  – Sufficient for expressing P3P policy rules
  – Sufficient for expressing many HIPAA-mandated policies, for example.
• Implemented several techniques for storing consent and optimizing this type of condition

Optimized Implementation of Opt-in and Opt-out Conditions
• Several alternative storage techniques
  – Internal column (inline) representation
  – External, single table representation
  – External, multiple table representation

Optimized Implementation of Opt-in and Opt-out Conditions
Internal column representation
Table “Athletes” (data columns | choice columns):

Athlete#  Name              Age  Address    Phone    | Athlete#  Name  Age  Address  Phone
1         Michael Phelps    19   Baltimore  1111111  | yes       yes   yes  yes      yes
2         Natalie Coughlin  23   Berkeley   2222222  | no        no    no   no       no
3         Ian Thorpe        23   Sydney     3333333  | yes       no    no   yes      yes
4         Jenny Thompson    31   New York   4444444  | yes       yes   no   no       no

Optimized Implementation of Opt-in and Opt-out Conditions
External, single table representation
Table “Athletes”:

Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  1111111
2         Natalie Coughlin  23   Berkeley   2222222
3         Ian Thorpe        23   Sydney     3333333
4         Jenny Thompson    31   New York   4444444

Consent Table:

ID  Athlete#  Name  Age  Address  Phone
1   yes       yes   yes  yes      yes
2   no        no    no   no       no
3   yes       no    no   yes      yes
4   yes       yes   no   no       no

Optimized Implementation of Opt-in and Opt-out Conditions
External, multiple table representation
Table “Athletes”:

Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  1111111
2         Natalie Coughlin  23   Berkeley   2222222
3         Ian Thorpe        23   Sydney     3333333
4         Jenny Thompson    31   New York   4444444

Positive Consent Tables (one table per column; each lists the athletes with positive consent):

Athlete#:  1, 3, 4
Name:      1, 4
Age:       1
Address:   1, 3
Phone:     1, 3

Overview of Performance Experiments
• Implemented Query Modification algorithms on top of DB2 version 8.1
• Focused on measuring performance for unconditional rules, and those with opt-in and opt-out choices
• Experimental setup
  – Synthetic dataset based on the Wisconsin Benchmark
  – Dual-processor 1.8 GHz AMD machine running Windows 2000 Server
  – 2 gigabytes memory
  – 50 megabyte buffer pool
  – Queries run warm and cold
  – Here we report the warm numbers (error less than ±5% with 95% confidence)

[Figure: elapsed time (seconds) versus choice selectivity (%), for Unmodified, Modified External Multiple, and Modified Internal queries]
• Measured performance of a query selecting all records from a 5-million-record table
• Compared performance of original and modified queries for varied choice selectivity
• Not surprisingly, performance is actually better for modified queries when we use privacy enforcement as an additional selection condition
  – Able to use indexes on choice values
• Shows the importance of database-level privacy enforcement for performance

[Figure: elapsed time (seconds) versus data table size (1, 5, and 10 million records), for Unmodified, Modified Internal, and Modified External Multiple queries]
• Measured overhead cost using a query that selects all records
• Choice selectivity = 100%
  – Observed worst-case scenario where no rows are filtered due to privacy constraints, but incur all costs of cell-level checking
• Full bar represents elapsed time
• Bottom portion of bar is CPU time
• Much of the cost of privacy enforcement is CPU cost, so it scales well as queries become more I/O intensive

Additional Performance Results
• Cost of rewriting queries is small
  – Must only be done once if query is pre-compiled
• Found that query semantics enforcement model is often faster than table semantics because frequently more rows are filtered
• Tradeoffs between choice storage techniques
  – Number of choices stored for a particular table: as more choices are stored, performance of internal representation suffers
  – Number of choices enforced for a particular query: as more choices are enforced, performance of external multiple representation suffers
• Tradeoffs between query modification algorithms
  – Described in paper

Conclusions
• Limited Disclosure is a necessary component of a comprehensive data privacy management system
• Proposed a framework enforcing limited disclosure at the database level
  – More efficient and flexible than application-level disclosure control
  – Techniques also have broader use for other applications requiring policy-driven fine-grained disclosure control
• Framework can be deployed to an existing environment with minimal modification to legacy applications and existing schemas
• Query modification and consent storage approaches efficient enough to be viable in practice

Questions
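
The two cell-level models can be tried out on the example data. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the DB2 setup used in the experiments. The view name Athletes_masked, the snake_case column names (standing in for Athlete#, Name_Choice, and so on), and the id_choice column are hypothetical. The sketch uses the external, single table representation of consent and a masking view instead of the talk's per-query rewriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes (
    athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
    address TEXT, phone TEXT);
-- One row of opt-in flags per athlete (1 = consent given).
CREATE TABLE Athlete_Choices (
    athlete_id INTEGER PRIMARY KEY, id_choice INTEGER, name_choice INTEGER,
    age_choice INTEGER, address_choice INTEGER, phone_choice INTEGER);
INSERT INTO Athletes VALUES
    (1, 'Michael Phelps',   19, 'Baltimore', '111-1111'),
    (2, 'Natalie Coughlin', 22, 'Berkeley',  '222-2222'),
    (3, 'Ian Thorpe',       23, 'Sydney',    '333-3333'),
    (4, 'Jenny Thompson',   31, 'New York',  '444-4444');
INSERT INTO Athlete_Choices VALUES
    (1, 1, 1, 1, 1, 1),
    (2, 0, 0, 0, 0, 0),
    (3, 1, 0, 0, 1, 1),
    (4, 1, 1, 0, 0, 0);
-- Step 1 of both models: mask every prohibited cell with NULL.
-- A missing choice row yields NULL flags, so cells default to masked.
CREATE VIEW Athletes_masked AS
SELECT CASE WHEN c.id_choice = 1      THEN a.athlete_id END AS athlete_id,
       CASE WHEN c.name_choice = 1    THEN a.name       END AS name,
       CASE WHEN c.age_choice = 1     THEN a.age        END AS age,
       CASE WHEN c.address_choice = 1 THEN a.address    END AS address,
       CASE WHEN c.phone_choice = 1   THEN a.phone      END AS phone
FROM Athletes a
LEFT JOIN Athlete_Choices c ON c.athlete_id = a.athlete_id;
""")

# Table semantics: drop rows whose primary key is prohibited, then query.
table_sem = conn.execute(
    "SELECT name, age FROM Athletes_masked "
    "WHERE athlete_id IS NOT NULL ORDER BY athlete_id").fetchall()

# Query semantics: query the masked table, then drop all-NULL result rows.
query_sem = conn.execute(
    "SELECT name, age FROM Athletes_masked "
    "WHERE name IS NOT NULL OR age IS NOT NULL ORDER BY name").fetchall()

# The predicate sees the masked age, so Jenny Thompson's row is not returned
# and her age cannot be inferred from the result.
age_over_30 = conn.execute(
    "SELECT name, age FROM Athletes_masked "
    "WHERE athlete_id IS NOT NULL AND age > 30").fetchall()

print(table_sem)
print(query_sem)
print(age_over_30)
```

For SELECT Name, Age, table semantics keeps the all-null row for athlete 3 while query semantics drops it, matching the comparison in the talk. The last query shows why masking before predicate evaluation avoids the inference leak described for the application-level approach.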