Databases with Minimal Trust

StrangerDB -Safe Data Management
with Untrusted Servers
Dennis Shasha
([email protected])
• Store private data in a public database: backup,
concurrency control, and some query processing
• Protect data from being observed (privacy)
• Make unauthorized modifications evident
• Force server to deliver a consistent picture to all
honest users or be discovered (consistency).
• Dishonest users have the same effect as users
who enter bad data.
• Encryption per user/group for privacy.
• Signatures for tamper-evidence
• SUNDR-style [1] maintenance to detect
inconsistent transaction orders.
1. "Building secure file systems out of Byzantine
storage", David Mazieres and Dennis Shasha, Principles
of Distributed Computing, 2002. pp. 108-117.
Database Setup for Privacy
• A record is a sequence of cleartext field values
plus an encrypted part that may encompass one
or more fields.
• The encryption is known to one or more users.
• Decryption is done at the most public processor
possible: user’s workstation or smartcard if
private to a user, workstation of group member if
decryption pertains to group-owned information.
• Encryption can be private key encryption.
Privacy Related Optimization
• There are classical optimization problems
to be solved in this framework.
• Example: if some data is private to a user
and other data belongs to a group, do I do
the group processing first and then bring
the result to the private workstation or do I
bring all the data to the private workstation
right away?
Work Related to Privacy
• Hakan Hacigumus, Bala Iyer, Chen Li and Sharad
Mehrotra. "Executing SQL over Encrypted Data in the
Database-Service-Provider Model." ACM SIGMOD 2002 –
advocates a field by field encryption idea; they map
queries to encrypted values. Sometimes encryption
preserves order and sometimes not.
• Matthias Fischmann and Oliver Günther “Privacy
Tradeoffs in Database Service Architectures,”
(BIZSEC'03) – points to security leaks in this model if the
adversary can ask queries. Even if not, encrypted fields
yield information about the number of distinct values and
their distribution.
Work Related to Privacy
• “Hippocratic Databases” Rakesh Agrawal, Jerry
Kiernan, Ramakrishnan Srikant, Yirong Xu,
VLDB 2002. Argues that databases should
provide mechanisms to preserve privacy
including properties like consent of information
donor, limited use, limited retention etc.
• Encryption gives consent of information donor
only. Once I give you my key, I have little further
control. Changing the key does prevent recipient
from learning new data.
Our Take on Privacy
• We are agnostic: you encrypt what you
want and issue queries and updates to
achieve your privacy goals.
• For purposes of this talk:
a database designer knows exactly which
information is revealed to non-owners:
everything that is in the clear.
• Non-owners may not issue queries to or
modify your data.
Tamper Evidence (safety)
• Every data item can be modified by
exactly one user or group. A modifier signs
a collision-resistant hash of the encrypted
result of the data after modification.
• Note: Need trusted public repository of
public signature keys (e.g. provided by
company security officer)
Collision-Resistant Hash Setup:
Inspired by Merkle Trees
[Figure: Merkle-style hash tree. The root record is sgn_user(HASH(root), ptr);
nodes below it hold a hash plus pointers (HASH1, ptrs; HASH2, ptrs).]
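A minimal sketch of the hashing side of this setup, assuming SHA-256 as the collision-resistant hash and a simple pairwise tree (both are assumptions; the slides do not fix either). The function name merkle_root and the sample pages are hypothetical, and the signature over the root is omitted.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Collision-resistant hash (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise up to a single root value."""
    if not leaves:
        return h(b"")
    # Prefix bytes keep leaf hashes and interior hashes distinct.
    level = [h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical encrypted pages owned by one user.
pages = [b"encrypted page 0", b"encrypted page 1", b"encrypted page 2"]
root = merkle_root(pages)

# Changing any page changes the root, so a signed root exposes tampering.
tampered_root = merkle_root([b"encrypted page 0", b"EVIL", b"encrypted page 2"])
print(root.hex() != tampered_root.hex())
```

The owner would sign only the root; a reader recomputes the root from the pages the server returns and compares it with the signed value.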
Malicious servers and forks
• If a user u accesses a database at time t,
the user wants to be sure that the
database is current as of time t.
• A malicious server might give u a
database state reflecting only some
previous updates.
• Users cannot prevent such “forking
attacks” but would like to discover them.
Underlying Strategy to
ensure inter-user consistency
• Periodically, every pair of users exchange
their ideas of global history. If one member
of the pair has missed an update done by
the other, the histories won’t be consistent.
• For this to work in practice, we need some
encoding of that global history.
Underlying Strategy: intuition
Bob: “Alice do you agree that
Mary said X?”
Alice: “No way. I never heard that!”
Bob: “But here is Mary’s signed
statement to that effect.”
Alice: “Well I guess the server is
messing with history then.”
Strawman Implementation:
Log of Global Operations
• Imagine that we have a sequential log
consisting of every transaction that ever hit
the database and that this sequential log is
signed by the last transaction.
• Ex: order of transactions is T1 T2 T3 (done
by users u1 u2 u3).
Log after T3:
• sgn_u3(T3 sgn_u2(T2 sgn_u1(T1)))
Ensuring Individual Consistency
• Log is held by the untrusted server.
• Every time a user appends to and signs the log,
the user first checks that the log he/she
previously signed is a prefix of log to be signed.
• Ex: if u is about to commit transaction T’ and has
previously committed T, then u makes sure that
the log now contains as a prefix the log from the
time T was committed.
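The prefix check can be sketched as follows. This is only an illustration: the log is modeled as a plain list of transaction names, signatures are left out, and the names is_prefix and append_if_consistent are hypothetical.

```python
def is_prefix(old: list[str], new: list[str]) -> bool:
    """True if `old` is a (not necessarily proper) prefix of `new`."""
    return len(old) <= len(new) and new[:len(old)] == old


def append_if_consistent(last_signed: list[str],
                         server_log: list[str],
                         txn: str) -> list[str]:
    """Append txn only if the server's log extends the log we last signed."""
    if not is_prefix(last_signed, server_log):
        raise ValueError("server log does not extend the log I last signed")
    return server_log + [txn]


# Bob last signed a log ending with his own transaction.
bob_last_signed = ["Talice1", "Tbob1"]
server_log = ["Talice1", "Tbob1", "Tmary1"]
new_log = append_if_consistent(bob_last_signed, server_log, "Tbob2")
print(new_log)
```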
Individual Log Consistency
Bob: “In my previous update, the log had (in left
to right order): Talice1, Tbob1.
Now it has Talice1, Tbob1, Tmary1.
So my previous view of the log was a prefix of the
current one. OK, so I’ll append my new
transaction: Talice1, Tbob1, Tmary1, Tbob2.”
Alice hears nothing of all this.
Ensuring Global Consistency
by detecting forking attacks
• Periodically, users exchange their ideas of
the global log. Each user verifies:
– The signatures of all the users
– Whether one global history is a prefix of the
other or not.
• If there has been a forking attack and u1
has not seen a transaction of u2, then
neither user’s log will be a prefix of the
other’s.
Global Log Consistency
Bob: “Alice, here is the log of all transactions as I
see it: Talice1, Tbob1, Tmary1, Tbob2.”
Alice: “That’s funny. Here is my log:
Talice1, Tbob1, Tjill1. It is not a prefix of yours
because I have Jill’s transaction, but yours is not a
prefix of mine because you have Mary’s.”
Bob: “Are all the signatures valid?”
Alice: “Absolutely. See for
yourself. The server is being naughty.”
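A sketch of the comparison Bob and Alice perform, assuming each log is a list of transaction names and that signature checks have already passed. The function names are hypothetical.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading entries on which the two logs agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def fork_detected(a: list[str], b: list[str]) -> bool:
    """A fork shows up when neither log is a prefix of the other."""
    n = common_prefix_len(a, b)
    return n < len(a) and n < len(b)


bob_log = ["Talice1", "Tbob1", "Tmary1", "Tbob2"]
alice_log = ["Talice1", "Tbob1", "Tjill1"]

print(fork_detected(bob_log, alice_log))      # the logs diverge
print(common_prefix_len(bob_log, alice_log))  # entries they still share
```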
Semantic Objection
• Server might fail to update the data but
assert that the transactions are executed
in the same global order.
• Fix: associate with each transaction a
collision-resistant hash of the state of the
whole database. Call these h1, h2, h3
• sgn_u3(T3 h3 sgn_u2(T2 h2 sgn_u1(T1 h1)))
• Transactions verify global hash upon data access.
Hashes of all the data
Bob: “Alice, the log says you
were the last to execute, yet
when I perform a collision-resistant hash of the database,
the result is not consistent with
that hash.”
Alice: “It’s lucky I signed the hash that
I placed in the log. That shows the state
of the database I think is present.”
Bob: “Darn server has been
changing the data again!”
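Bob's check can be sketched like this, assuming the database is a small key-value mapping, SHA-256 is the collision-resistant hash, and each log entry is a (transaction, state hash) pair. All of these representations are assumptions for illustration, and signatures are omitted.

```python
import hashlib
import json


def state_hash(db: dict[str, str]) -> str:
    """Hash a canonical serialization of the whole database state."""
    canonical = json.dumps(db, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def server_state_matches(log: list[tuple[str, str]],
                         served_db: dict[str, str]) -> bool:
    """Compare the served state with the hash in the last log entry."""
    _txn, signed_hash = log[-1]
    return state_hash(served_db) == signed_hash


# Alice executes last and records the hash of the state she produced.
honest_db = {"acct:alice": "100", "acct:bob": "50"}
log = [("Talice1", state_hash(honest_db))]

# The server later hands Bob a modified state.
tampered_db = {"acct:alice": "100", "acct:bob": "5000"}
print(server_state_matches(log, honest_db), server_state_matches(log, tampered_db))
```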
Practical objection:
space grows without bound
• In this log-based (strawman) implementation,
each user keeps a log of all transactions ever
executed.
• Alternative is to have each user update his/her
version for every access (even read-only
ones).
• A “version structure” is basically a set of user-version
pairs + a hash of the data of the signer.
• Space per version structure is proportional to the
number of users N. Because each user needs to
keep the latest version structure for each user,
the total space per user is proportional to N^2.
Version Structure Detail
• Suppose user u creates the last version
structure. Then u increments his/her
version number (and no other) and signs
the structure, which contains:
sgn_u(hash of data owned by u, (u1, n1),
(u2, n2), …)
where (ui, ni) means: ui is at version ni.
• From now on, call hash of the data owned
by u “hash(udata)”
Basic Properties of Version Structures
• Because of the signature, the server
cannot forge a version structure.
• Because of the collision-resistant hash,
each user’s data can be verified to be
what that user intended.
• Each user maintains a “version structure
list” of the most recent version structure
from each user.
Use of Version Structure List
Bob: “Alice, according to the version
structure list you were the last to
execute, yet when I perform a collision-resistant hash of the database, the
result is not consistent with that hash.”
Alice: “It’s lucky I signed the hash that
I placed in the version structure list.
That shows the state
of my data I think is present.”
Bob: “Darn server has been
changing the data again!”
Three incrementally related
version structures
Bob: sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4))
Alice: sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4))
Bob: sgn_Bob(hash(Bobdata),
(Bob,7), (Alice, 13), (Bill,4))
Ordering Properties of Version Structures
• Define a partial order on version structures:
vs1 < vs2 if the users in vs1 are a subset of the
users in vs2 and for every user u in vs1, the
version of u in vs1 (denoted vs1[u]) is less than
or equal to vs2[u] and for at least one user v,
vs1[v] < vs2[v].
• We say vs1 is “incrementally less than” vs2 if
there exists a u such that u signs vs2,
vs2[u] = vs1[u] + 1
and for all v, if v != u then vs2[v] = vs1[v].
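The two relations can be sketched in a few lines, assuming a version structure is represented as a dict from user name to version number (the data hash and signature are left out). The "at least one user v" clause is read here as ranging over the users of vs1, and the incremental relation is assumed to require the same set of users on both sides; both readings are assumptions.

```python
def less_than(vs1: dict[str, int], vs2: dict[str, int]) -> bool:
    """vs1 < vs2 in the partial order on version structures."""
    if not set(vs1) <= set(vs2):
        return False
    if any(vs1[u] > vs2[u] for u in vs1):
        return False
    return any(vs1[v] < vs2[v] for v in vs1)


def incrementally_less_than(vs1: dict[str, int],
                            vs2: dict[str, int],
                            signer: str) -> bool:
    """vs2 is signed by `signer` and differs from vs1 only in signer's number, by +1."""
    if set(vs1) != set(vs2) or signer not in vs1:
        return False
    if vs2[signer] != vs1[signer] + 1:
        return False
    return all(vs2[v] == vs1[v] for v in vs1 if v != signer)


# The three incrementally related structures from the earlier example.
bob6 = {"Bob": 6, "Alice": 12, "Bill": 4}
alice13 = {"Bob": 6, "Alice": 13, "Bill": 4}
bob7 = {"Bob": 7, "Alice": 13, "Bill": 4}

print(incrementally_less_than(bob6, alice13, "Alice"))
print(incrementally_less_than(alice13, bob7, "Bob"))
print(less_than(bob6, bob7), incrementally_less_than(bob6, bob7, "Bob"))
```

Note that bob6 < bob7 holds in the partial order even though the two are not incrementally related: Alice's number also changed in between.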
Version Structure Construction
• User u forms its new version structure vs_u as
follows. First, u examines the previous version structure
that u signed, vs_u_old, and sets vs_u[u] =
vs_u_old[u] + 1.
• Next u examines the last version structure vs_v
signed by each other user v and sets vs_u[v] =
vs_v[v].
• In this way u creates a version structure that
reflects the last signed version of every user.
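A sketch of this construction, assuming (as the version structure detail slide states) that u increments only its own number, and that u copies for every other user v the number v reported for itself in v's last signed structure. Version structures are modeled as plain dicts; the function name and Bill's sample structure are hypothetical.

```python
def build_version_structure(u: str,
                            own_old: dict[str, int],
                            latest_by_user: dict[str, dict[str, int]]) -> dict[str, int]:
    """Build u's next version structure from u's previous one and the
    last structure signed by each other user."""
    new = {u: own_old[u] + 1}          # u bumps its own number, and no other
    for v, vs_v in latest_by_user.items():
        if v != u:
            new[v] = vs_v[v]           # v's own number, as signed by v
    return new


bob_old = {"Bob": 6, "Alice": 12, "Bill": 4}
latest = {
    "Alice": {"Bob": 6, "Alice": 13, "Bill": 4},
    "Bill": {"Bob": 5, "Alice": 12, "Bill": 4},   # hypothetical older structure
}
bob_new = build_version_structure("Bob", bob_old, latest)
print(bob_new)
```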
Signing Verification Protocol: Part I
• When a user u is ready to sign the version
structure vs_u constructed as above, u checks
1. the highest version number for every user v is
in the last version structure vs_v signed by v.
(Other version structures may have the same
highest version number for v as well, but they
may not exceed vs_v[v].)
2. There is some ordering such that each
version structure in the list is incrementally less
than the next one on the list and vs_u is the
last one.
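Both checks can be sketched as below. Each signed structure is modeled as a (signer, versions) pair without the hash or signature, which is an assumption of this sketch. To find the ordering, the structures are sorted by the sum of their version numbers: in an incremental chain exactly one number rises by one per step, so the sum rises by one as well. That sorting trick is a choice made here, not something the slides prescribe, and all names and sample values are hypothetical.

```python
Signed = tuple[str, dict[str, int]]  # (signer, {user: version number})


def incrementally_less_than(vs1: dict[str, int], vs2: dict[str, int], signer: str) -> bool:
    if set(vs1) != set(vs2) or signer not in vs1:
        return False
    if vs2[signer] != vs1[signer] + 1:
        return False
    return all(vs2[v] == vs1[v] for v in vs1 if v != signer)


def verify_before_signing(vs_list: list[Signed], new: Signed) -> bool:
    everything = vs_list + [new]
    # Check 1: nobody reports a number for v higher than v's own latest.
    own: dict[str, int] = {}
    for signer, vs in everything:
        own[signer] = max(own.get(signer, 0), vs[signer])
    for _signer, vs in everything:
        if any(n > own.get(v, 0) for v, n in vs.items()):
            return False
    # Check 2: the structures form an incremental chain ending in `new`.
    chain = sorted(everything, key=lambda sv: sum(sv[1].values()))
    if chain[-1] != new:
        return False
    return all(incrementally_less_than(a[1], b[1], b[0])
               for a, b in zip(chain, chain[1:]))


bill = ("Bill", {"Bob": 5, "Alice": 12, "Bill": 4})
bob_old = ("Bob", {"Bob": 6, "Alice": 12, "Bill": 4})
alice = ("Alice", {"Bob": 6, "Alice": 13, "Bill": 4})

good = ("Bob", {"Bob": 7, "Alice": 13, "Bill": 4})
stale = ("Bob", {"Bob": 7, "Alice": 12, "Bill": 4})  # ignores Alice's update

print(verify_before_signing([bill, bob_old, alice], good))
print(verify_before_signing([bill, bob_old, alice], stale))
```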
Signing Verification Protocol: Part II
• The set of all data belonging to a user v is
hashed to a value hash(vdata) as of the last
signed version structure of v: vs_v.
• When user u reads v’s data, it checks v’s data
against hash(vdata) to verify that v’s data hasn’t
been changed since the signing of vs_v.
• If both the parts of the protocol succeed, then u
signs the version structure vs_u and commits
the transaction.
Signing Verification Protocol
Bob: “Alice, here’s the drill. You issue a
transaction. It accesses data from many people.
You check that the data you have read from each
person is consistent with the signed hash on
his/her last version structure.”
Alice: “How do I know it’s that person’s
last version structure?”
Bob: “Good question. You check that all the
version structures are incrementally related
to one another. You are checking that the
server is consistent in what it tells you.”
Alice: “That’s not enough is it?”
Bob: “No, but forking will leave traces of guilt.”
Forking attacks on honest clients
incomparable version structures
• If the server fails to show user v the
version structure vs_u produced by user u,
the version structure that v will sign, call it
vs_v, will have the property vs_v[u] <
vs_u[u]. Once v signs, vs_v[v] > vs_u[v].
• So vs_u and vs_v will be unordered by <.
• The signing verification protocol will still
succeed. So we need a global protocol.
Forking creates incomparable
version structures
Bob: sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4))
Alice: sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4))
Server forks and doesn’t show Bob this.
Bob: sgn_Bob(hash(Bobdata),
(Bob,7), (Alice, 12), (Bill,4))
Now, Bob and Alice’s last version
structures are incomparable, i.e.
unordered by <.
Version Structure Exchange I
• Users periodically perform a global version
structure exchange protocol. Let us say that
such a protocol begins at global time t. Every
user u sends the most recent version structure
that u signed before time t to every other user.
Call that vs_u.
• When a user v receives vs_u from user u, then
user v performs a “well-formedness” test: v
compares its most recent version structure
signed before t, call it vs_v, with vs_u. They
should be ordered by < and vs_v[v] >= vs_u[v]
and vs_v[u] <= vs_u[u].
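A sketch of the well-formedness test, assuming version structures are dicts from user name to version number and that "ordered by <" means one structure is strictly less than the other in the partial order defined earlier. Function names are hypothetical; signature checks are omitted.

```python
def less_than(vs1: dict[str, int], vs2: dict[str, int]) -> bool:
    """Partial order on version structures."""
    if not set(vs1) <= set(vs2):
        return False
    if any(vs1[u] > vs2[u] for u in vs1):
        return False
    return any(vs1[w] < vs2[w] for w in vs1)


def well_formed(v: str, vs_v: dict[str, int], u: str, vs_u: dict[str, int]) -> bool:
    """Test run by v on the structure vs_u received from u."""
    ordered = less_than(vs_v, vs_u) or less_than(vs_u, vs_v)
    return ordered and vs_v[v] >= vs_u[v] and vs_v[u] <= vs_u[u]


alice13 = {"Bob": 6, "Alice": 13, "Bill": 4}

# No fork: Bob saw Alice's version 13 before signing.
bob7_ok = {"Bob": 7, "Alice": 13, "Bill": 4}
# Fork: the server hid Alice's version 13 from Bob.
bob7_forked = {"Bob": 7, "Alice": 12, "Bill": 4}

print(well_formed("Bob", bob7_ok, "Alice", alice13))
print(well_formed("Bob", bob7_forked, "Alice", alice13))
```

In the forked case neither structure is less than the other, so the test fails and the fork becomes evident to Bob.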
Version Structure Exchange II
• If v performs a well-formedness test for every
user in U and the version structures from those
users are all ordered by <, then v declares those
version structures to be all well-formed.
• If every user v in some set of users U declares
the version structures it receives from users U to
be well-formed, then the global structure
exchange is said to succeed for U.
Global Version Exchange Protocol
Bob: “Alice, from time to time, a global version exchange
protocol begins. Let’s say an instance of the protocol starts
at time t. Every user sends its latest version structure
preceding t to all other users. Sending is done without
mediation by server.”
Alice: “Then what?”
Bob: “Each user checks that the
version structures are well-formed.”
Alice: “What if some user does not
send?”
Bob: “No problem. Validate the
ones that do send.”
Version Structures and Serializability
• Serializability will be based on version
structure order.
• That is, transactions will serialize in the <
order of version structures.
Role of concurrency control
in correctness
• Locking is merely a heuristic that the
server uses to delay transactions and
therefore to give a serial order to version
structures.
• If the server cheats and allows accesses
that violate locks, then the version
structures won’t be ordered by <.
• Later caught by the signing verification
protocol or the global exchange.
The interesting case of
multiversion read consistency
• Effectively, a multiversion read consistent
transaction should make its version structure
reflect its start time. So, the user associated with
such a transaction signs its version structure
when it starts, then starts reading.
• If that transaction never commits (because some
data has changed and transaction detects this
by looking at a hash), there is no damage
because the database won’t change and the
application issuing the transaction will receive a
failure as it should.
Proof Strategy
• If all users are honest, but the server may not
be, then the theorems are not that hard to prove.
• If some users could be dishonest, then we could
have major problems, e.g. they could corrupt the
data. But this is like any data corruptor.
• So, we quarantine them in our proofs: we
concern ourselves only with honest users having
no data dependency on dishonest ones.
• We call those “virtuous users.”
Serializability Lemma
• Lemma: If T1 → T2 (conflict edge from T1 to T2),
vs1 is the version structure signed by user u1 for
T1, vs2 is the version structure signed by user
u2 for T2, all version structures among some set
of virtuous users U including u1 and u2 are
ordered, then vs1 < vs2.
• Proof: Suppose user u1 issues T1 and user u2
issues T2. For any conflict, there is some data
item x such that op1(x) precedes op2(x).
Lemma continued
• write-read: there is an x such that W1(x) precedes R2(x).
Therefore R2(x) must occur after vs1 has been signed
by u1, because u2 will verify hash (u1data), so vs1[u1]
<= vs2[u1]. Moreover, the temporal ordering implies that
vs1[u2] < vs2[u2]. Finally, because vs1 and vs2 are
ordered, vs1 < vs2.
• write-write: very similar to the write-read case.
• read-write: there exists an x such that R1(x) precedes
W2(x). Therefore vs1[u2] < vs2[u2]. Otherwise R1(x)
would either read from W2(x) or from some later value.
Because version structures are ordered, vs1 < vs2.
Total Ordering Lemma
• Lemma: Suppose the global version
structure exchange begins at t and ends
successfully at t’ for some set of virtuous
users U. Assuming every user in U has
been following the signing verification
protocol up to time t’, then all version
structures among U are ordered
up to time t.
Total Ordering Lemma I
• Prove the contrapositive: Consider version
structures vs1 signed by user u1 and vs2 signed
by user u2 before time t, where both u1 and u2
belong to U such that vs1[u1] > vs2[u1] and
vs1[u2] < vs2[u2].
• That is, the version structures are incomparable.
• Then either the signing verification protocol of
some user or the global version structure
exchange that begins at time t will be
unsuccessful.
Total Ordering Lemma II
• Consider the next version structure signed
by u1, call it vs1’. At that moment u1 will
know that there has been a fork if u1 sees
vs2 during the signing verification protocol
(because vs1’[u2] < vs2[u2]). So, assume
the server will not show vs2 to u1 and
hence vs1’ > vs2 will be false.
Total Ordering Lemma III
• If server’s forking not yet discovered, then vs1’
and vs2 are unordered, so the argument of the
last slide holds for any subsequent version
structure vs1’’ signed by u1.
Symmetrically, no subsequent version structure
signed by u2 such as vs2’ will have the property
that vs2’ > vs1.
• Therefore when the global version structure
exchange occurs, u1 and u2 will discover a lack
of well-formedness. Done.
Notes on Implementation
• Server avoids being framed
• Concurrency control
• Version structure commits – how to make them fast
• Supporting cryptographic assumptions and
global verification protocol.
• View maintenance
• Read-write asymmetry.
• Indexing
Server is framed
Bob (good): sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4))
Alice (good): sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4))
Server shows this to Bob. No fork.
Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7),
(Alice, 12), (Bill,4))
Bob pretends server has forked.
Upon global exchange, server is framed.
Server can avoid being framed
• Server signs the version structures from
users if it agrees they are legitimate. In the
case of previous figure, server will refuse
to sign Bob’s second version structure.
• Bob and the server can present their
evidence before security officer.
Server Proves Innocence
Bob (good): sgn_server(sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4)))
Alice (good): sgn_server(sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4)))
Server shows this to Bob. No fork.
Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7),
(Alice, 12), (Bill,4))
Server refuses to sign. Shows that Bob is bad.
Concurrency Control
• Accessing v’s data can be done by locking
hash (vdata).
• To increase concurrency, partition v’s rows
into k parts, each with its own hash. The
user u would then write k hashdata values
in the version structure.
• Transactions will lock the appropriate hash.
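A sketch of the partitioning idea, assuming rows are assigned to parts by row id modulo k and hashed with SHA-256. Both choices, and the function names, are assumptions for illustration only.

```python
import hashlib


def part_of(row_id: int, k: int) -> int:
    """Assign a row to one of k parts (modulo assignment is an assumption)."""
    return row_id % k


def partition_hashes(rows: dict[int, bytes], k: int) -> list[str]:
    """Return one hash per part, covering that part's rows in row-id order."""
    digests = [hashlib.sha256() for _ in range(k)]
    for row_id in sorted(rows):
        d = digests[part_of(row_id, k)]
        d.update(row_id.to_bytes(8, "big"))
        d.update(len(rows[row_id]).to_bytes(4, "big"))  # length prefix avoids ambiguity
        d.update(rows[row_id])
    return [d.hexdigest() for d in digests]


rows = {i: f"row-{i}".encode() for i in range(8)}
before = partition_hashes(rows, 4)

updated = dict(rows)
updated[5] = b"changed"
after = partition_hashes(updated, 4)

# Only the part containing row 5 gets a new hash, so only that
# part's hash needs to be locked and rewritten.
changed_parts = [i for i in range(4) if before[i] != after[i]]
print(changed_parts)
```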
Making Version Structures Fast
• Verifying, signing, and sending in a version
structure may take time.
• The verification protocol involves many version
structures, so it is the expensive part.
• To make this faster, server sends signed version
structures as they appear to users that have
subscribed to this list so most computation can
be done asynchronously.
• A user sends in its signed version structure
when user is ready and all is verified.
Controlling Version Structure Size
• As it stands, as the number of users
increases, the size of storage increases as
the square.
• Fortunately, it is seldom the case that so
many people trust one another.
• It is fine to have several subpopulations
each with shared version structure lists.
• Subpopulations could overlap but at some
cost to commits.
Implementation of Cryptographic Assumptions
• Public key infrastructure to hold public
signature keys.
• Smartcards are used to authenticate user
and perhaps as a final private processing
• If attached to a cell phone, smartcards can
be used for the server-independent
communication in the global version
exchange protocol.
View Maintenance
• Deferred view maintenance (or
equivalently triggers) is usually done by
the server on behalf of a transaction. In
this case it must be done by a user that
has the right to modify the data.
• If user u changes private data but sends a
summary of those updates to a user v who
maintains that view, then u must use a
public key of v to do so.
Read Write Asymmetry
Using Public Key Cryptography
• Current use case is each user or group
wants to implement its own data.
• Sometimes, as in view maintenance, one
user wants to share some data with
another (e.g. between patient and
• Use ssh-style protocol with public keys.
Data is owned by patient but read by
Indexing
• Suppose a user wants not all of his/her
data but only the data on people in some subset
(e.g. in his/her department).
• Indexes are simply Merkle-tree-like
structures. So start at the root and get the next
page, verifying its contents against the
hash. Decrypt that page and proceed.
• Modifications propagate back up the tree.
StrangerDB achieves
• Use virtues of servers:
– Reliability, availability, historical backup
• But avoid their vices: they might cheat.
• Encryption protects privacy.
• Signatures make tampering evident.
• Histories/version structures + global
exchange make forking evident.
• Result: serializability, no forks, privacy.
• Questions and criticism welcome.
• If I’ve missed a significant reference,
please let me know now or by email.
• Thank you!
How to Access Data
• The data available to a user u is the data to
which that user has a decryption key.
• For each user v whose data u has access to, u
fetches that data using the index beginning with
pointer associated with hash(vdata).
• Decrypt the part of the index needed (i.e.
decrypt the root, then necessary child etc.)
• This gives a set of row ids.
• Fetch the rows with those rowids from database.
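The index walk can be sketched as follows. This is a simplified illustration: pages are JSON documents stored under their SHA-256 hash, interior pages list (upper bound key, child hash) pairs, and leaf pages map keys to row ids. Decryption is omitted; in the scheme described here each page would be decrypted after its hash is verified. The page layout and all names are assumptions.

```python
import hashlib
import json


def page_hash(page: bytes) -> str:
    return hashlib.sha256(page).hexdigest()


def put(store: dict[str, bytes], node: dict) -> str:
    """Store a page on the (untrusted) server and return its hash."""
    page = json.dumps(node, sort_keys=True).encode("utf-8")
    key = page_hash(page)
    store[key] = page
    return key


def fetch_verified(store: dict[str, bytes], expected: str) -> dict:
    """Fetch a page and refuse it if its hash is not the expected one."""
    page = store[expected]
    if page_hash(page) != expected:
        raise ValueError("page does not match its hash: server tampering")
    return json.loads(page)


def lookup(store: dict[str, bytes], root_hash: str, key: str) -> list[int]:
    """Walk from the root to a leaf, verifying every page on the way."""
    node = fetch_verified(store, root_hash)
    while "children" in node:
        for bound, child in node["children"]:
            if key <= bound:
                node = fetch_verified(store, child)
                break
        else:
            return []
    return node["rows"].get(key, [])


store: dict[str, bytes] = {}
leaf1 = put(store, {"rows": {"accounting": [1, 4], "biology": [2]}})
leaf2 = put(store, {"rows": {"physics": [3, 5]}})
root = put(store, {"children": [["biology", leaf1], ["physics", leaf2]]})

row_ids = lookup(store, root, "physics")
print(row_ids)
```

Here root plays the role of the value tied to hash(vdata) in v's signed version structure; if the server swaps any page on the path, fetch_verified raises instead of returning row ids.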
Modifying data
• User u modifies certain rows that u owns.
• Reflects modification in the indexes.
• Updates hash(udata) as appropriate.
• Does signing verification protocol.
• Commits the transaction.