Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Securely Managing History in Database Systems Gerome Miklau Joint work with Brian Levine, Patrick Stahlberg, Wentian Lu University of Massachusetts, Amherst History has benefits History: a stored record of data and operations performed in a system. now • Arguments for preserving history • Protection against loss • Storage is cheap • History is useful: accountability History has benefits History: a stored record of data and operations performed in a system. now • Arguments for preserving history • Protection against loss • Storage is cheap • History is useful: accountability History has benefits History: a stored record of data and operations performed in a system. now • Arguments for preserving history • Protection against loss • Storage is cheap • History is useful: accountability holding people/ programs responsible for actions taken. History has risks • Arguments against preserving history • Persistence threatens privacy. • Institutions can be compelled to reveal retained data (even if they don’t want to). • There are significant benefits to institutional forgetfulness. Retention policies Privacy and accountability balanced through retention policies. Institution Collected Info Retention Policy [Mayer-Schoenberger 2007] Retention policies Privacy and accountability balanced through retention policies. Institution Russian KGB Collected Info Retention Policy хранить вечно speech, actions, etc. “to be preserved forever” U.S. credit agency late payments, defaults, etc. 7 years Google search engine queries 9 months [Mayer-Schoenberger 2007] Retention policies Privacy and accountability balanced through retention policies. Institution Russian KGB Collected Info Retention Policy хранить вечно speech, actions, etc. “to be preserved forever” U.S. credit agency late payments, defaults, etc. 7 years Google search engine queries 9 months [Mayer-Schoenberger 2007] Retention policies Privacy and accountability balanced through retention policies. Institution Russian KGB Collected Info Retention Policy хранить вечно speech, actions, etc. “to be preserved forever” U.S. credit agency late payments, defaults, etc. 7 years Google search engine queries 9 months [Mayer-Schoenberger 2007] Securing history Investigator/ Auditor Individual Problem: very little control over how and when historical data is retained in systems, who can recover and analyze it. • Part 1: Databases don’t easily “forget” history. • To support privacy: “memory-less” systems • Part 2: Databases can “remember”, but not safely. • To privacy & accountability: implementing retention policies on an audit log. Securing history Investigator/ Auditor Individual Problem: very little control over how and when historical data is retained in systems, who can recover and analyze it. • Part 1: Databases don’t easily “forget” history. • To support privacy: “memory-less” systems • Part 2: Databases can “remember”, but not safely. • To privacy & accountability: implementing retention policies on an audit log. PART 1: Databases don’t forget Databases inadvertently retain historical traces of data and operations. • INSERT sensitive record Table storage Index • (later) DELETE the record • deletion is “logical” -data is not destroyed • actual persistence of data is hard to predict, and virtually impossible to control. Log Temp PART 1: Databases don’t forget Databases inadvertently retain historical traces of data and operations. • INSERT sensitive record Table storage Index • (later) DELETE the record • deletion is “logical” -data is not destroyed • actual persistence of data is hard to predict, and virtually impossible to control. Log Temp Database forensic analysis Unintentionally retained data is recovered by forensic analysis. A forensic investigator is a powerful adversary: • access to persistent storage at time t • goal: recover expired data and/or history of operations Slack data in table storage Tuples active deleted (but recoverable) t1 t2 t3 t4 t5 t6 t1 t2 t3 t4 t5 t6 t1 t2 t7 t4 t5 t6 t2 t7 t6 t4 (1) (2) (3) (4) t5 t6 Allocated file File system Delete t3, t5 Deletion is insecure Vacuum is insecure Insert t7 Delete t1, t4 Vacuum Database slack Filesystem slack Tracing data lifetime Active Removed Ideal case Tracing data lifetime DB Slack FS Slack Active Removed Ideal case Tracing data lifetime FS Slack FS ins ert e itiz DB san DB DB delete upd ate DB Slack DB update Active Removed Ideal case DB/FS operations Tracing data lifetime FS Slack FS ins ert e itiz DB san DB DB delete upd ate DB Slack DB update Active Removed Ideal case DB/FS operations Vacuum Experiments • We studied: • Built forensic recovery tools which scan database pages, recovering expired tuples. • Table storage • deletion is insecure in all systems • database and file system slack data generated in proportion to • workload, vacuum, clustering. # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 # records in Slack (x1000) Recoverable database slack 30 25 20 15 10 5 0 0 10 20 30 operations (x1000) Expired records MySQL (MyISAM) DB2 40 SQLite MySQL (InnoDB) PostgreSQL 50 The problem with forensic data recovery • Intended interface of database (SQL) does not reliably represent the stored contents of the database • e.g. deleted tuples do not appear in query results, but are recoverable. • tuples do not have “age” or order in data model, but this info can be recovered from disk image. Transparent systems Clarity of interfaces • The system should provide users with clear, accurate bounds on the persistence of data in the system. Purposeful retention • Data retained after deletion must have a legitimate purpose, and data should be removed once that purpose is no longer valid. Complete removal • Deleted data must be destroyed, including copies and derived versions. Securing history Investigator/ Auditor Individual Problem: very little control over how and when historical data is retained in systems, who can recover and analyze it. • Part 1: Databases don’t easily “forget” history. • To support privacy: “memory-less” systems • Part 2: Databases can “remember”, but not safely. • To privacy & accountability: implementing retention policies on an audit log. Securing history Investigator/ Auditor Individual Problem: very little control over how and when historical data is retained in systems, who can recover and analyze it. • Part 1: Databases don’t easily “forget” history. • To support privacy: “memory-less” systems • Part 2: Databases can “remember”, but not safely. • To privacy & accountability: implementing retention policies on an audit log. Data Model Client Schema : S (eid, name, department, salary) Auditor Data Model Client Schema : S (eid, name, department, salary) Auditor Log table (S + audit fields) client IP time type eid name dept sal Data Model Client Schema : S (eid, name, department, salary) Auditor Log table (S + audit fields) client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 Data Model Client Schema : S (eid, name, department, salary) Auditor Log table History table (a transaction time (S + audit fields) representation derived from log table) client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 eid name dept sal from to Data Model Client Schema : S (eid, name, department, salary) Auditor Log table History table (a transaction time (S + audit fields) representation derived from log table) client IP time type eid name dept sal eid name dept sal from to Jack 1.1.1 0 ins 101 Bob Sales 10 101 Bob Sales 10 0 now Data Model Client Schema : S (eid, name, department, salary) Auditor Log table History table (a transaction time (S + audit fields) representation derived from log table) client IP time type eid name dept sal eid name dept sal from to Jack 1.1.1 0 ins 101 Bob Sales 10 101 Bob Sales 10 0 now 100 Jack 2.1.1 100 upd 101 - - 12 101 Bob Sales 12 100 200 now Kate 3.1.1 200 upd 101 - Mgmt - 101 Bob Mgmt 12 200 now 300 Kate 4.1.1 300 upd 101 - - 15 101 Bob Mgmt 15 300 now Jack 1.1.1 0 ins 201 Chris HR 8 201 Chris HR 8 0 now 300 Jack 2.1.1 300 upd 201 - Mgmt 10 201 Chris Mgmt 10 300 500 now Kate 4.1.1 500 del 201 - - - Log table, History table and Example Queries Log table History table client IP time type eid name dept sal eid name dept sal from to Jack 1.1.1 0 ins 101 Bob Sales 10 101 Bob Sales 10 0 100 Jack 2.1.1 100 upd 101 - - 12 101 Bob Sales 12 100 200 Kate 3.1.1 200 upd 101 - Mgmt - 101 Bob Mgmt 12 200 300 Kate 4.1.1 300 upd 101 - - 15 Jack 1.1.1 0 ins 201 Chris HR 8 101 Bob Mgmt 15 300 now Jack 2.1.1 300 upd 201 - Mgmt 10 201 Chris HR 8 0 300 Kate 4.1.1 500 del 201 - - - 201 Chris Mgmt 10 300 500 Queries: Q1. Q2. Retention restrictions A compliance officer is responsible for enforcing data retention Non-negotiable, including auditors We propose two kinds of declarative rules for limiting the lifetime of data Retention Policies Rule type 1: redaction Remove value of attributes, do not hide existence. E.g. Rule type 2: expunction Expunge tuple, along with all evidence of its existence. E.g. Retention Policies Rule type 1: redaction Remove value of attributes, do not hide existence. E.g. Rule type 2: expunction Expunge tuple, along with all evidence of its existence. E.g. History After Application of Policies History After Application of Policies History table eid name dept sal from to 101 Bob Sales 10 0 100 101 Bob Sales 12 100 200 101 Bob Mgmt 12 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 History After Application of Policies Sanitized History table eid name dept sal from to 101 Bob Sales 10 sx 0 100 101 Bob Sales 12 sy 100 200 101 Bob Mgmt 12 sy 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 History After Application of Policies Sanitized History table eid name dept sal from to 101 Bob Sales 10 sx 0 100 101 Bob Sales 12 sy 100 200 101 Bob Mgmt 12 sy 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 History After Application of Policies Sanitized history table History After Application of Policies Sanitized log table Sanitized history table History After Application of Policies Sanitized log table Retention policies change history Incomplete information Loss of information Sanitized history table History After Application of Policies Sanitized log table Retention policies change history Incomplete information Loss of information Now how can we audit? Sanitized history table Original History table eid name dept sal from to 101 Bob Sales 10 0 100 101 Bob Sales 12 100 200 101 Bob Mgmt 12 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 Original certain: After {Bob, Chris}. application of policies certain: {Chris}. possible: {Bob} Sanitized History table Original History table eid name dept sal from to 101 Bob Sales 10 0 100 101 Bob Sales 12 100 200 101 Bob Mgmt 12 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 Original certain: After {Bob, Chris}. application of policies certain: {Chris}. possible: {Bob} Sanitized History table Original History table eid name dept sal from to 101 Bob Sales 10 0 100 101 Bob Sales 12 100 200 101 Bob Mgmt 12 200 300 101 Bob Mgmt 15 300 now 201 Chris HR 8 0 300 201 Chris Mgmt 10 300 500 Original certain: After {Bob, Chris}. application of policies certain: {Chris}. possible: {Bob} Sanitized History table client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 Jack 2.1.1 100 upd 101 - - 12 Kate 3.1.1 200 upd 101 - Mgmt - Kate 4.1.1 300 upd 101 - - 15 Jack 1.1.1 0 ins 201 Chris HR 8 Jack 2.1.1 300 upd 201 - Mgmt 10 Kate 4.1.1 500 del 201 - - - Original certain: {(Kate,200), After (Jack,300)}. policy application certain:{(Kate,200)}. possible: {} client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 Jack 2.1.1 100 upd 101 - - 12 Kate 3.1.1 200 upd 101 - Mgmt - Kate 4.1.1 300 upd 101 - - 15 Jack 1.1.1 0 ins 201 Chris HR 8 Jack 2.1.1 300 upd 201 - Mgmt 10 Kate 4.1.1 500 del 201 - - - Original certain: {(Kate,200), After (Jack,300)}. policy application certain:{(Kate,200)}. possible: {} client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 Jack 2.1.1 100 upd 101 - - 12 Kate 3.1.1 200 upd 101 - Mgmt - Kate 4.1.1 300 upd 101 - - 15 Jack 1.1.1 0 ins 201 Chris HR 8 Jack 2.1.1 300 upd 201 - Mgmt 10 Kate 4.1.1 500 del 201 - - - Original certain: {(Kate,200), After (Jack,300)}. policy application certain:{(Kate,200)}. possible: {} client IP time type eid name dept sal Jack 1.1.1 0 ins 101 Bob Sales 10 Jack 2.1.1 100 upd 101 - - 12 Kate 3.1.1 200 upd 101 - Mgmt - Kate 4.1.1 300 upd 101 - - 15 Jack 1.1.1 0 ins 201 Chris HR 8 Jack 2.1.1 300 upd 201 - Mgmt 10 Kate 4.1.1 500 del 201 - - - Original certain: {(Kate,200), After (Jack,300)}. policy application certain:{(Kate,200)}. possible: {} Audit queries under policies Incompleteness Variables: indicates that value is removed. Status column (C or P): indicates that if current tuple is Certain or Possible in this table. Queries Based on an extended relational algebra Audit queries under policies Incompleteness Variables: indicates that value is removed. Status column (C or P): indicates that if current tuple is Certain or Possible in this table. Queries Based on an extended relational algebra name dept from to status Bob Mgmt 0 200 C Jack X 100 300 C dept manager from to status Mgmt Alice 0 150 C Audit queries under policies Incompleteness Variables: indicates that value is removed. Status column (C or P): indicates that if current tuple is Certain or Possible in this table. Queries Based on an extended relational algebra name dept from to status Bob Mgmt 0 200 C Jack X 100 300 C dept manager from to status Mgmt Alice 0 150 C name manager from to status Bob Alice 0 150 C Jack Alice 100 150 P Properties of policy application Original data Log table History table After policy application Properties of policy application After policy application Original data Log table History table policy Sanitized History table Properties of policy application After policy application Original data Log table Sanitized Log table History table Sanitized History table policy Properties of policy application After policy application Original data History table Sanitized History table policy Secrecy Log table Sanitized Log table For any query, protected information must not be disclosed. Soundness True answer to any query must be a possible answer under policy. Protected auditing -- contributions • Declarative rules for expressing retention restrictions over a historical data model • Semantics of policy application and audit queries • Implementation • Impact of retention policy on performance and accuracy of audit query Conclusion • History should be a “first-class” part of a (database) system • The safe, accurate configuration of the system’s historical memory allows needed balance between privacy and accountability. • Transparency requirements: • Interface should faithfully represent stored contents. • Auditing and retention: • A protection model for sanitizing history while preserving auditing capabilities. Questions? For more information, please see these publications: • Threats to Privacy in the Forensic Analysis of Database Systems. Patrick Stahlberg, G. Miklau, Brian Levine. SIGMOD 2007. • Auditguard: A system for database auditing under retention restrictions. Wentian Lu and G. Miklau. VLDB 2008 (Demo Program) • Auditing a database under retention restrictions. Wentian Lu and G. Miklau. ICDE 2009.