Download Securely Managing History in Database Systems Gerome Miklau University of Massachusetts, Amherst

Document related concepts

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Clusterpoint wikipedia , lookup

Functional Database Model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Securely Managing History
in Database Systems
Gerome Miklau
Joint work with Brian Levine, Patrick Stahlberg, Wentian Lu
University of Massachusetts, Amherst
History has benefits
History: a stored record of data and
operations performed in a system.
now
• Arguments for preserving history
• Protection against loss
• Storage is cheap
• History is useful: accountability
History has benefits
History: a stored record of data and
operations performed in a system.
now
• Arguments for preserving history
• Protection against loss
• Storage is cheap
• History is useful: accountability
History has benefits
History: a stored record of data and
operations performed in a system.
now
• Arguments for preserving history
• Protection against loss
• Storage is cheap
• History is useful: accountability
holding people/
programs responsible
for actions taken.
History has risks
• Arguments against preserving history
• Persistence threatens privacy.
• Institutions can be compelled to reveal retained data
(even if they don’t want to).
• There are significant benefits to institutional forgetfulness.
Retention policies
Privacy and accountability balanced through retention policies.
Institution
Collected Info
Retention Policy
[Mayer-Schoenberger 2007]
Retention policies
Privacy and accountability balanced through retention policies.
Institution
Russian KGB
Collected Info
Retention Policy
хранить вечно
speech, actions,
etc.
“to be preserved forever”
U.S. credit
agency
late payments,
defaults, etc.
7 years
Google
search engine
queries
9 months
[Mayer-Schoenberger 2007]
Retention policies
Privacy and accountability balanced through retention policies.
Institution
Russian KGB
Collected Info
Retention Policy
хранить вечно
speech, actions,
etc.
“to be preserved forever”
U.S. credit
agency
late payments,
defaults, etc.
7 years
Google
search engine
queries
9 months
[Mayer-Schoenberger 2007]
Retention policies
Privacy and accountability balanced through retention policies.
Institution
Russian KGB
Collected Info
Retention Policy
хранить вечно
speech, actions,
etc.
“to be preserved forever”
U.S. credit
agency
late payments,
defaults, etc.
7 years
Google
search engine
queries
9 months
[Mayer-Schoenberger 2007]
Securing history
Investigator/
Auditor
Individual
Problem: very little control over how and when historical data
is retained in systems, who can recover and analyze it.
• Part 1: Databases don’t easily “forget” history.
• To support privacy: “memory-less” systems
• Part 2: Databases can “remember”, but not safely.
• To privacy & accountability: implementing retention
policies on an audit log.
Securing history
Investigator/
Auditor
Individual
Problem: very little control over how and when historical data
is retained in systems, who can recover and analyze it.
• Part 1: Databases don’t easily “forget” history.
• To support privacy: “memory-less” systems
• Part 2: Databases can “remember”, but not safely.
• To privacy & accountability: implementing retention
policies on an audit log.
PART 1: Databases don’t forget
Databases inadvertently retain historical
traces of data and operations.
• INSERT sensitive record
Table storage
Index
• (later) DELETE the record
• deletion is “logical” -data is not destroyed
• actual persistence of data is
hard to predict, and virtually
impossible to control.
Log
Temp
PART 1: Databases don’t forget
Databases inadvertently retain historical
traces of data and operations.
• INSERT sensitive record
Table storage
Index
• (later) DELETE the record
• deletion is “logical” -data is not destroyed
• actual persistence of data is
hard to predict, and virtually
impossible to control.
Log
Temp
Database forensic analysis
Unintentionally retained data is recovered by
forensic analysis.
A forensic investigator is a powerful adversary:
• access to persistent storage at time t
• goal: recover expired data and/or history of operations
Slack data in table storage
Tuples
active
deleted
(but
recoverable)
t1
t2
t3
t4
t5
t6
t1
t2
t3
t4
t5
t6
t1
t2
t7
t4
t5
t6
t2
t7
t6
t4
(1)
(2)
(3)
(4)
t5
t6
Allocated file
File system
Delete t3, t5
Deletion is insecure
Vacuum is insecure
Insert t7
Delete t1, t4
Vacuum
Database slack
Filesystem slack
Tracing data lifetime
Active
Removed
Ideal case
Tracing data lifetime
DB Slack
FS Slack
Active
Removed
Ideal case
Tracing data lifetime
FS Slack
FS
ins
ert
e
itiz
DB
san
DB
DB delete
upd
ate
DB Slack
DB update
Active
Removed
Ideal case
DB/FS operations
Tracing data lifetime
FS Slack
FS
ins
ert
e
itiz
DB
san
DB
DB delete
upd
ate
DB Slack
DB update
Active
Removed
Ideal case
DB/FS operations
Vacuum
Experiments
• We studied:
• Built forensic recovery tools which scan database pages,
recovering expired tuples.
• Table storage
• deletion is insecure in all systems
• database and file system slack data generated in proportion to
• workload, vacuum, clustering.
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
# records in Slack (x1000)
Recoverable database slack
30
25
20
15
10
5
0
0
10
20
30
operations (x1000)
Expired records
MySQL (MyISAM)
DB2
40
SQLite
MySQL (InnoDB)
PostgreSQL
50
The problem with forensic data recovery
• Intended interface of database (SQL) does not reliably
represent the stored contents of the database
• e.g. deleted tuples do not appear in query results, but are
recoverable.
• tuples do not have “age” or order in data model, but this info can
be recovered from disk image.
Transparent systems
Clarity of interfaces
• The system should provide users with clear, accurate bounds
on the persistence of data in the system.
Purposeful retention
• Data retained after deletion must have a legitimate purpose,
and data should be removed once that purpose is no longer
valid.
Complete removal
• Deleted data must be destroyed, including copies and derived
versions.
Securing history
Investigator/
Auditor
Individual
Problem: very little control over how and when historical data
is retained in systems, who can recover and analyze it.
• Part 1: Databases don’t easily “forget” history.
• To support privacy: “memory-less” systems
• Part 2: Databases can “remember”, but not safely.
• To privacy & accountability: implementing retention
policies on an audit log.
Securing history
Investigator/
Auditor
Individual
Problem: very little control over how and when historical data
is retained in systems, who can recover and analyze it.
• Part 1: Databases don’t easily “forget” history.
• To support privacy: “memory-less” systems
• Part 2: Databases can “remember”, but not safely.
• To privacy & accountability: implementing retention
policies on an audit log.
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Log table
(S + audit fields)
client
IP
time
type
eid
name
dept
sal
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Log table
(S + audit fields)
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Log table
History table (a transaction time
(S + audit fields)
representation derived from log table)
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
eid
name
dept
sal
from
to
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Log table
History table (a transaction time
(S + audit fields)
representation derived from log table)
client
IP
time
type
eid
name
dept
sal
eid
name
dept
sal
from
to
Jack
1.1.1
0
ins
101
Bob
Sales
10
101
Bob
Sales
10
0
now
Data Model
 Client

Schema : S (eid, name, department, salary)
 Auditor
Log table
History table (a transaction time
(S + audit fields)
representation derived from log table)
client
IP
time
type
eid
name
dept
sal
eid
name
dept
sal
from
to
Jack
1.1.1
0
ins
101
Bob
Sales
10
101
Bob
Sales
10
0
now
100
Jack
2.1.1
100
upd
101
-
-
12
101
Bob
Sales
12
100
200
now
Kate
3.1.1
200
upd
101
-
Mgmt
-
101
Bob
Mgmt
12
200
now
300
Kate
4.1.1
300
upd
101
-
-
15
101
Bob
Mgmt
15
300
now
Jack
1.1.1
0
ins
201
Chris
HR
8
201
Chris
HR
8
0
now
300
Jack
2.1.1
300
upd
201
-
Mgmt
10
201
Chris
Mgmt
10
300
500
now
Kate
4.1.1
500
del
201
-
-
-
Log table, History table and Example Queries
Log table
History table
client
IP
time
type
eid
name
dept
sal
eid
name
dept
sal
from
to
Jack
1.1.1
0
ins
101
Bob
Sales
10
101
Bob
Sales
10
0
100
Jack
2.1.1
100
upd
101
-
-
12
101
Bob
Sales
12
100
200
Kate
3.1.1
200
upd
101
-
Mgmt
-
101
Bob
Mgmt
12
200
300
Kate
4.1.1
300
upd
101
-
-
15
Jack
1.1.1
0
ins
201
Chris
HR
8
101
Bob
Mgmt
15
300
now
Jack
2.1.1
300
upd
201
-
Mgmt
10
201
Chris
HR
8
0
300
Kate
4.1.1
500
del
201
-
-
-
201
Chris
Mgmt
10
300
500
 Queries:
 Q1.
 Q2.
Retention restrictions
A
compliance officer is responsible for enforcing data
retention

Non-negotiable, including auditors

We propose two kinds of declarative rules for limiting the
lifetime of data
Retention Policies
 Rule
type 1: redaction

Remove value of attributes, do not hide existence.

E.g.
 Rule
type 2: expunction

Expunge tuple, along with all evidence of its existence.

E.g.
Retention Policies
 Rule
type 1: redaction

Remove value of attributes, do not hide existence.

E.g.
 Rule
type 2: expunction

Expunge tuple, along with all evidence of its existence.

E.g.
History After Application of Policies
History After Application of Policies
History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
0
100
101
Bob
Sales
12
100
200
101
Bob
Mgmt
12
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
History After Application of Policies
Sanitized History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
sx
0
100
101
Bob
Sales
12
sy
100
200
101
Bob
Mgmt
12
sy
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
History After Application of Policies
Sanitized History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
sx
0
100
101
Bob
Sales
12
sy
100
200
101
Bob
Mgmt
12
sy
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
History After Application of Policies
Sanitized history table
History After Application of Policies
Sanitized log table
Sanitized history table
History After Application of Policies
Sanitized log table
 Retention
policies change history

Incomplete information

Loss of information
Sanitized history table
History After Application of Policies
Sanitized log table
 Retention
policies change history

Incomplete information

Loss of information
 Now
how can we audit?
Sanitized history table
Original History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
0
100
101
Bob
Sales
12
100
200
101
Bob
Mgmt
12
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
 Original

certain:
 After
{Bob, Chris}.
application of policies

certain:
{Chris}.

possible:
{Bob}
Sanitized History table
Original History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
0
100
101
Bob
Sales
12
100
200
101
Bob
Mgmt
12
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
 Original

certain:
 After
{Bob, Chris}.
application of policies

certain:
{Chris}.

possible:
{Bob}
Sanitized History table
Original History table
eid
name
dept
sal
from
to
101
Bob
Sales
10
0
100
101
Bob
Sales
12
100
200
101
Bob
Mgmt
12
200
300
101
Bob
Mgmt
15
300
now
201
Chris
HR
8
0
300
201
Chris
Mgmt
10
300
500
 Original

certain:
 After
{Bob, Chris}.
application of policies

certain:
{Chris}.

possible:
{Bob}
Sanitized History table
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
Jack
2.1.1
100
upd
101
-
-
12
Kate
3.1.1
200
upd
101
-
Mgmt
-
Kate
4.1.1
300
upd
101
-
-
15
Jack
1.1.1
0
ins
201
Chris
HR
8
Jack
2.1.1
300
upd
201
-
Mgmt
10
Kate
4.1.1
500
del
201
-
-
-
 Original

certain: {(Kate,200),
 After
(Jack,300)}.
policy application

certain:{(Kate,200)}.

possible:
{}
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
Jack
2.1.1
100
upd
101
-
-
12
Kate
3.1.1
200
upd
101
-
Mgmt
-
Kate
4.1.1
300
upd
101
-
-
15
Jack
1.1.1
0
ins
201
Chris
HR
8
Jack
2.1.1
300
upd
201
-
Mgmt
10
Kate
4.1.1
500
del
201
-
-
-
 Original

certain: {(Kate,200),
 After
(Jack,300)}.
policy application

certain:{(Kate,200)}.

possible:
{}
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
Jack
2.1.1
100
upd
101
-
-
12
Kate
3.1.1
200
upd
101
-
Mgmt
-
Kate
4.1.1
300
upd
101
-
-
15
Jack
1.1.1
0
ins
201
Chris
HR
8
Jack
2.1.1
300
upd
201
-
Mgmt
10
Kate
4.1.1
500
del
201
-
-
-
 Original

certain: {(Kate,200),
 After
(Jack,300)}.
policy application

certain:{(Kate,200)}.

possible:
{}
client
IP
time
type
eid
name
dept
sal
Jack
1.1.1
0
ins
101
Bob
Sales
10
Jack
2.1.1
100
upd
101
-
-
12
Kate
3.1.1
200
upd
101
-
Mgmt
-
Kate
4.1.1
300
upd
101
-
-
15
Jack
1.1.1
0
ins
201
Chris
HR
8
Jack
2.1.1
300
upd
201
-
Mgmt
10
Kate
4.1.1
500
del
201
-
-
-
 Original

certain: {(Kate,200),
 After
(Jack,300)}.
policy application

certain:{(Kate,200)}.

possible:
{}
Audit queries under policies


Incompleteness

Variables: indicates that value is removed.

Status column (C or P): indicates that if current tuple is Certain
or Possible in this table.
Queries

Based on an extended relational algebra
Audit queries under policies


Incompleteness

Variables: indicates that value is removed.

Status column (C or P): indicates that if current tuple is Certain
or Possible in this table.
Queries

Based on an extended relational algebra
name
dept
from
to
status
Bob
Mgmt
0
200
C
Jack
X
100
300
C
dept
manager
from
to
status
Mgmt
Alice
0
150
C
Audit queries under policies


Incompleteness

Variables: indicates that value is removed.

Status column (C or P): indicates that if current tuple is Certain
or Possible in this table.
Queries

Based on an extended relational algebra
name
dept
from
to
status
Bob
Mgmt
0
200
C
Jack
X
100
300
C
dept
manager
from
to
status
Mgmt
Alice
0
150
C
name manager
from
to
status
Bob
Alice
0
150
C
Jack
Alice
100
150
P
Properties of policy application
Original data
Log table
History table
After policy application
Properties of policy application
After policy application
Original data
Log table
History table
policy
Sanitized
History table
Properties of policy application
After policy application
Original data
Log table
Sanitized
Log table
History table
Sanitized
History table
policy
Properties of policy application
After policy application
Original data

History table
Sanitized
History table
policy
Secrecy


Log table
Sanitized
Log table
For any query, protected information must not be disclosed.
Soundness

True answer to any query must be a possible answer under policy.
Protected auditing -- contributions
• Declarative rules for expressing retention restrictions over a
historical data model
• Semantics of policy application and audit queries
• Implementation
• Impact of retention policy on performance and accuracy of
audit query
Conclusion
• History should be a “first-class” part of a (database) system
• The safe, accurate configuration of the system’s historical
memory allows needed balance between privacy and
accountability.
• Transparency requirements:
• Interface should faithfully represent stored contents.
• Auditing and retention:
• A protection model for sanitizing history while preserving
auditing capabilities.
Questions?
For more information, please see these publications:
• Threats to Privacy in the Forensic Analysis of Database Systems.
Patrick Stahlberg, G. Miklau, Brian Levine. SIGMOD 2007.
• Auditguard: A system for database auditing under retention
restrictions. Wentian Lu and G. Miklau. VLDB 2008 (Demo Program)
• Auditing a database under retention restrictions. Wentian Lu and
G. Miklau. ICDE 2009.