Authentic Publication
The TRUTHSAYER Project
Chip Martel
Premkumar Devanbu
Michael Gertz
April Kwong
Glen Nuckolls
Stuart Stubblebine
Department of Computer Science,
University of California, Davis
http://truthsayer.cs.ucdavis.edu
Databases Play a Vital Role
1) Commerce: credit card data, find goods
2) Financial: Investment sites
3) Health: treatments, doctors/credentials, drugs
4) Many more
Answering queries
[Figure: a User sends a data query to the Database and receives Answers.]
Server Integrity?
Correct Query processing?
Performance? Reliability?
Goals
• Correct and complete answers
(with assurance)
• Efficient Protocols
Example Queries
• Is Credit card number 5543… Valid?
• List all Hong Kong to San Francisco flights.
• Find Digital cameras with 3-5 Mega-pixels, and
cost < $200
• List all bars within one mile of HKU
What is a Correct Answer?
• We assume a trusted Data Owner with the
official copy of the Database: Defines the
“correct answer”
• Problems with a single Data Owner:
1) May not want/be able to answer queries
2) Hard to keep online DB secure
3) Scalability
Solution: Third-Party Servers
• Third party sites (Publishers) get
information from the Data Owner and
answer queries
• Example: Travel sites (Expedia,
Travelocity, Orbitz) answer using
government airline Data (FAA)
Server Replication
[Figure: FAA data replicated to Travelocity, Orbitz, and Expedia; the user asks "Can I trust this server?"]
Trust Issues
• Sites have left out cheaper flights from
non-preferred airlines (deliberate)
• Sites may be corrupted: outside hacker or
insider
• Errors
Authentic Publication:
The TRUTHSAYER project.
Initially: for relational databases (DBSec 2000, Journal of Computer Security)
General Model for a Variety of Data (Algorithmica, 2004)
[Figure: the Owner sends the Data plus a Digest of the Data to the Publisher; the Publisher answers each Query with an Answer plus a Verification Object.]
Talk Outline
• Introduction
• Background: Merkle Trees
• Range Queries (Multi-attribute Queries)
• A General Model for Authenticated Data Structures
• Conclusion
Authentic Publication
1) A trusted Owner digests the Data Set and signs the digest.
2) Untrusted Publishers receive the data and the signature.
3) Clients submit queries to untrusted Publishers.
4) Publishers return Answers (A) and Verification Objects (VO).
5) Clients use A + VO to prove the answer is correct and complete.
The protocol is correct and secure.
Verifying answers
The protocol provides:
• Correctness: it returns exactly the elements matching the query.
• Completeness: it returns all elements matching the query.
• Security: cheating is infeasible.
• Efficiency: overhead is low.
Recall: answers carry no signatures; only the Owner's digest is signed.
Merkle hashing a data set
[Figure: leaves h1 = h(d1), h2 = h(d2), ...; each parent hashes its children, e.g. h(h1 || h2); the top node is the root hash h*.]
• Leaves: data in some lexical order.
• One-way hash function h; h1 = h(d1).
• Bottom-up hashing, starting with the data.
• Root hash value = the digest of the data set.
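The bottom-up hashing above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: SHA-256 stands in for the one-way hash h, the number of leaves is a power of two, and leaf and internal-node inputs get different one-byte prefixes (a common hardening that the slides do not specify). The helper names `leaf_hash`, `node_hash`, and `merkle_root` are hypothetical.

```python
import hashlib

def leaf_hash(d: int) -> bytes:
    """h1 = h(d1): hash one data item (the 0x00 prefix marks a leaf)."""
    return hashlib.sha256(b"\x00" + str(d).encode()).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """h(h1 || h2): hash the concatenated child hashes (0x01 marks an internal node)."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(data: list[int]) -> bytes:
    """Digest a data set: sort the leaves, then hash pairwise up to the root h*.

    Assumes len(data) is a power of two, so every level pairs up evenly.
    """
    level = [leaf_hash(d) for d in sorted(data)]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

data = [11, 3, 15, 1, 8, 6, 10, 5]
root = merkle_root(data)
print(root.hex())  # the digest the Owner would sign
```

Because the leaves are sorted first, the digest depends only on the set of values, not on the order in which the Owner supplies them.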
Merkle Trees
• Classic use: prove that data value d is in
the data set
• Solves: Is Credit card number 5543…
Valid?
• But it can also verify all items in a range,
e.g. camcorders from $400 to $900
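The classic membership proof can be sketched as follows, assuming the same hypothetical SHA-256 encoding as a plain Merkle tree with a power-of-two number of sorted leaves. The Publisher sends the sibling hash at each level on the path from the leaf to the root; the User rehashes upward and compares with the signed root. The names `build_levels`, `prove`, and `verify` are illustrative, not from the TRUTHSAYER papers.

```python
import hashlib

def leaf_hash(d: int) -> bytes:
    return hashlib.sha256(b"\x00" + str(d).encode()).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(items: list[int]) -> list[list[bytes]]:
    """All levels of the Merkle tree, leaves first (len(items) a power of two)."""
    levels = [[leaf_hash(d) for d in items]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node_hash(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(items: list[int], d: int) -> list[tuple[bytes, bool]]:
    """Publisher: the sibling hash at each level on the path from d to the root."""
    idx = items.index(d)
    path = []
    for level in build_levels(items)[:-1]:
        sibling = idx ^ 1                             # other child of the same parent
        path.append((level[sibling], sibling < idx))  # (hash, sibling is the left child)
        idx //= 2
    return path

def verify(root: bytes, d: int, path: list[tuple[bytes, bool]]) -> bool:
    """User: rehash from the leaf upward and compare with the Owner's signed root."""
    acc = leaf_hash(d)
    for sib, sib_is_left in path:
        acc = node_hash(sib, acc) if sib_is_left else node_hash(acc, sib)
    return acc == root

items = [1, 3, 5, 6, 8, 10, 11, 15]
root = build_levels(items)[-1][0]
proof = prove(items, 8)
print(verify(root, 8, proof), verify(root, 9, proof))  # True False
```

For eight leaves the proof is three hashes, one per tree level; a forged claim such as "9 is in the set" fails unless the Publisher finds a collision in h.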
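Verifying a Range
[Figure: Merkle tree over the sorted leaves 1, 3, 5, 6, 8, 10, 11, 15, with the query range q marked.]
To show that q = (5, 6, 8) is the answer to 4 < d < 10, the user takes the lower bound 3, the upper bound 10, and the starred hash values, and uses them to compute and verify the root hash.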
Query: 4 < d < 10
Answer: 5, 6, 8 (in practice, key + data)
Verification Object: [((h(1), 3), (5, 6)), ((8, 10), *)], where * is the starred hash of the subtree holding 11 and 15.
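A sketch of the range proof in Python, under stated assumptions: the same hypothetical SHA-256 leaf/node encoding, a power-of-two number of sorted leaves, and boundary values that always exist on both sides of the query (for example, sentinel leaves added by the Owner). The nested-tuple VO format (`"node"`, `"leaf"`, `"hash"`) is invented for illustration. For 4 < d < 10 it reproduces the VO shape above: h(1), 3, 5, 6, 8, 10, and one hash for the subtree (11, 15).

```python
import bisect
import hashlib

def leaf_hash(d: int) -> bytes:
    return hashlib.sha256(b"\x00" + str(d).encode()).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def subtree_hash(items: list[int]) -> bytes:
    """Merkle hash of a sorted slice whose length is a power of two."""
    if len(items) == 1:
        return leaf_hash(items[0])
    mid = len(items) // 2
    return node_hash(subtree_hash(items[:mid]), subtree_hash(items[mid:]))

def make_vo(items, lo, hi, start=0, end=None):
    """Publisher: reveal leaves lo..hi; each subtree wholly outside becomes ("hash", h)."""
    end = len(items) if end is None else end
    if end <= lo or start > hi:
        return ("hash", subtree_hash(items[start:end]))
    if end - start == 1:
        return ("leaf", items[start])
    mid = (start + end) // 2
    return ("node", make_vo(items, lo, hi, start, mid), make_vo(items, lo, hi, mid, end))

def answer_range(items, a, b):
    """Publisher: VO for a < d < b, including one boundary leaf on each side."""
    lo = bisect.bisect_right(items, a) - 1   # largest item <= a (lower bound)
    hi = bisect.bisect_left(items, b)        # smallest item >= b (upper bound)
    return make_vo(items, lo, hi)

def replay(vo, seq):
    """Recompute a VO subtree's hash; record leaves (None for hidden subtrees) in order."""
    if vo[0] == "hash":
        seq.append(None)
        return vo[1]
    if vo[0] == "leaf":
        seq.append(vo[1])
        return leaf_hash(vo[1])
    return node_hash(replay(vo[1], seq), replay(vo[2], seq))

def verify_range(root, a, b, vo):
    """User: return the verified answer, or None if the VO is rejected."""
    seq = []
    if replay(vo, seq) != root:
        return None                   # does not hash to the signed root
    while seq and seq[0] is None:
        seq.pop(0)
    while seq and seq[-1] is None:
        seq.pop()
    if not seq or None in seq:
        return None                   # revealed leaves must be contiguous (completeness)
    if not (seq[0] <= a and seq[-1] >= b):
        return None                   # boundary values must bracket the query
    return [d for d in seq if a < d < b]

items = [1, 3, 5, 6, 8, 10, 11, 15]
root = subtree_hash(items)
vo = answer_range(items, 4, 10)
print(verify_range(root, 4, 10, vo))  # [5, 6, 8]
```

The boundary leaves give completeness: since the revealed leaves are contiguous in sorted order and bracket the range, a Publisher that hides an answer (say, replaces leaf 8 by its hash) leaves a gap that the User detects.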
[Figure: Authentic Publication using a Merkle tree hash digest.]
Security Property
• If the Answer and VO are correct, the user accepts.
• The user accepts an invalid answer only if a specific collision in h is found (provable): if h(x, y) = z in the correct VO (x, y, z are hash values of tree nodes), the cheating VO uses different x', y' with h(x', y') = z.
Good Features
• Proofs are short (size proportional to tree height and answer size).
• Only hashes are used, a fast cryptographic operation.
• Proofs are as easy to compute as the answer itself.
• No secret keys on the Publisher side: the hash function and digests are all public (no insider attack once the data set is digested).
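A back-of-envelope check of "proofs are short", assuming SHA-256 (32-byte digests) and a membership proof that carries one sibling hash per level of a balanced binary Merkle tree. The function name `membership_vo_bytes` is hypothetical, and the figures cover the hashes only, not the answer data.

```python
import math

def membership_vo_bytes(n: int, digest_size: int = 32) -> int:
    """One sibling hash per tree level: ceil(log2 n) digests for n leaves."""
    return math.ceil(math.log2(n)) * digest_size

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, membership_vo_bytes(n))
```

Growing the data set a thousandfold adds only about ten hashes (320 bytes) to each proof.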
Extensions
• We want to handle more complex queries:
• Find digital cameras with 3-5 megapixels and cost < $200
• List all bars within one mile of HKU
Multi-Attribute Queries
• Model as a 2-D range query
• Find points (x, y) with a < x < b and c < y < d
[Figure: query rectangle with corners (a,c), (b,c), (a,d), (b,d); axes Cost and Pixels.]
2-Dimensional range tree
• Leaves are 2-D points, i.e. records with 2 attributes (cost, pixels), sorted by x-value in the X-tree
• A Y-tree for each internal node, holding the points of its subtree sorted by y
Searching a 2D-range Tree
• Find pairs (x, y) with 4 < x < 50 AND 4 < y < 10
• In the X-tree: the search finds the subtrees rooted at 5 and 13; all points in their associated Y-trees match the x-range
• Search in the associated Y-trees for 4 < y < 10
• Answer: (12,5) and (23,8) AND the matching values in 5's Y-tree
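The search can be sketched in Python as below. The point set is hypothetical (the slides' tree contents are not in the transcript), and each node's Y-tree is simplified to a y-sorted list searched with `bisect`, which gives the same O(log n) lookup as a balanced Y-tree. The class and function names are illustrative.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class XNode:
    lo: float                       # smallest x in this subtree
    hi: float                       # largest x in this subtree
    by_y: list                      # associated Y-structure: subtree points sorted by y
    ykeys: list                     # y-values of by_y, for binary search
    left: Optional["XNode"] = None
    right: Optional["XNode"] = None

def build(points: list) -> XNode:
    """X-tree over points sorted by x; every node keeps its points sorted by y."""
    by_y = sorted(points, key=lambda p: p[1])
    node = XNode(points[0][0], points[-1][0], by_y, [p[1] for p in by_y])
    if len(points) > 1:
        mid = len(points) // 2
        node.left, node.right = build(points[:mid]), build(points[mid:])
    return node

def query(node, x1, x2, y1, y2, out):
    """Append to out all points with x1 < x < x2 and y1 < y < y2."""
    if node is None or node.hi <= x1 or node.lo >= x2:
        return                                  # subtree lies outside the x-range
    if x1 < node.lo and node.hi < x2:
        # Every point here matches the x-range: answer from the Y-structure alone.
        i = bisect.bisect_right(node.ykeys, y1)
        j = bisect.bisect_left(node.ykeys, y2)
        out.extend(node.by_y[i:j])
        return
    query(node.left, x1, x2, y1, y2, out)       # partial overlap: descend
    query(node.right, x1, x2, y1, y2, out)

points = sorted([(2, 7), (5, 6), (7, 3), (12, 5), (23, 8), (40, 12), (55, 6), (60, 9)])
result = []
query(build(points), 4, 50, 4, 10, result)
print(sorted(result))  # [(5, 6), (12, 5), (23, 8)]
```

Only O(log n) X-tree subtrees are fully inside the x-range, and each costs one binary search in its Y-structure, which is where the O(k + log² n) bound comes from.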
Digesting a 2D-range Tree
• Digest each Y-tree as a Merkle tree
• Each internal node in the X-tree gets the hash of three values: the hashes of its two children and the digest of its associated Y-tree
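A minimal sketch of this digest, assuming SHA-256 over length-prefixed parts as h, a Merkle tree per Y-tree that carries an odd node up unchanged, and X-tree leaves that simply hash their point. These encoding choices and the function names are assumptions for illustration, not the papers' exact construction.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (an assumed, unambiguous encoding)."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big") + p)
    return m.digest()

def merkle(values: list[bytes]) -> bytes:
    """Merkle root of a Y-tree's leaves; an odd node out is carried up unchanged."""
    level = [h(b"leaf", v) for v in values]
    while len(level) > 1:
        nxt = [h(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def encode(p: tuple[int, int]) -> bytes:
    return f"{p[0]},{p[1]}".encode()

def digest_xtree(points: list[tuple[int, int]]) -> bytes:
    """points sorted by x. A leaf hashes its point; an internal X-node hashes
    its two children's digests together with its associated Y-tree digest."""
    if len(points) == 1:
        return h(b"leaf", encode(points[0]))
    mid = len(points) // 2
    ytree = merkle([encode(p) for p in sorted(points, key=lambda p: p[1])])
    return h(digest_xtree(points[:mid]), digest_xtree(points[mid:]), ytree)

points = sorted([(2, 7), (5, 6), (7, 3), (12, 5), (23, 8), (40, 12), (55, 6), (60, 9)])
print(digest_xtree(points).hex())  # the single value the Owner would sign
```

Because every Y-tree digest is folded into its X-node's hash, a Publisher cannot alter any Y-tree without changing the signed root.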
Range Trees
• Let k be the number of answers (out of n points)
• Search: O(k + log² n) time, O(n log n) space
• Improve to O(k + log n) time with extra pointers (fractional cascading); a hash digest can still be computed
• VO (proof) size is also O(k + log n)
• Extend to d dimensions (d-attribute queries): search time O(k + log^(d-1) n), VO size the same.
Authenticated Data Structures
• Problem: we may want to use a variety of efficient data structures:
  - B-trees (reduce disk access)
  - Suffix arrays (string queries)
  - Geometric data structures (items within one mile)
  - Many more
• Solution: a general method to digest a data structure (produce a single summary hash value).
• Efficient: proof size and construction time are proportional to the search time.
• Secure: a similar security property holds; it can be broken only with a specific collision in h.
Search DAGs
• Our general setting is any data structure modeled by:
  - A labeled Directed Acyclic Graph (DAG)
  - A search process that visits DAG nodes and determines which neighboring nodes to visit next (based on the labels of visited nodes)
This models a wide range of structures.
A Search DAG
[Figure: search DAG with source s, internal nodes a, b, c, and sinks u, v.]
• Search starts at the unique source node s, of in-degree zero
• Digesting starts from the sinks (here u, v): hash the associated values
• D(u): digest of u
• Node u data: du
• D(u) = h(du)
• D(v) = h(dv)
• Other digests hash a node's data together with its successors' digests:
• D(c) = h(dc, D(v))
• D(b) = h(db, D(v), D(c))
• D(s) is the DAG digest
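The sink-first digesting can be sketched in Python with a memoized recursion. The DAG follows the slides' shape (s → a, b, c; b → v, c; c → v), but the edge a → u and the placeholder data strings are assumptions, since the transcript does not list them. The hash h is assumed to be SHA-256 over length-prefixed data followed by the successors' digests.

```python
import hashlib
from functools import lru_cache

# Hypothetical DAG on the slides' shape; the edge a -> u is assumed.
SUCC = {"s": ["a", "b", "c"], "a": ["u"], "b": ["v", "c"], "c": ["v"], "u": [], "v": []}
DATA = {"s": "ds", "a": "da", "b": "db", "c": "dc", "u": "du", "v": "dv"}

def h(data: str, *child_digests: bytes) -> bytes:
    """h(d_x, D(y1), D(y2), ...): node data, then its successors' digests."""
    raw = data.encode()
    m = hashlib.sha256(len(raw).to_bytes(4, "big") + raw)
    for d in child_digests:
        m.update(d)
    return m.digest()

@lru_cache(maxsize=None)
def digest(node: str) -> bytes:
    """D(node): sinks hash only their data, so the recursion bottoms out there.
    Memoization hashes shared nodes such as v only once."""
    return h(DATA[node], *(digest(c) for c in SUCC[node]))

print(digest("s").hex())  # D(s), the DAG digest the Owner signs
```

The recursion computes exactly D(u) = h(du), D(c) = h(dc, D(v)), D(b) = h(db, D(v), D(c)), and D(s) = h(ds, D(a), D(b), D(c)).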
Verification for Search DAG
• Traditional Merkle tree verification is bottom-up (hash path values up to the root)
• We use top-down verification to simulate a correct search
• The Owner provides the search procedure P and the root digest D(s)
[Figure: Authentic Publication with a search DAG: the user holds D(s) and P; the Publisher holds the DAG and P.]
Verification Object for DAG
• VO: information that lets the User reproduce the search (and thus verify answers)
• "Lines" of the VO match the steps of P
• Each line holds the data of a node and its successors' hashes:
  - ds, D(v1), D(v2), … (successors of s)
  - dv1, D(u1), D(u2), … (successors of v1)
An Example Search
• The search starts at s, then visits b, then v
• VO:
  - ds, D(a), D(b), D(c) (line 1)
• The user checks D(s) = h(ds, D(a), D(b), D(c)), so it knows the data ds is correct.
• At s, the search processes ds and decides b is next
• VO:
  - ds, D(a), D(b), D(c) [line 1]
  - db, D(v), D(c) [line 2]
• If D(b) = h(db, D(v), D(c)) (using D(b) from line 1), then the data db is correct
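Top-down verification can be sketched in Python as follows. Everything concrete here is hypothetical: internal nodes hold B-tree-like separator keys, the search procedure `P` picks the successor index with `bisect_right`, sinks hold placeholder records, and the edge a → u is assumed. Searching for 12 visits s, then b, then v, matching the example. Each VO line is (node data, successor digests).

```python
import bisect
import hashlib

# Hypothetical search DAG on the slides' shape: internal nodes hold separator
# keys (an assumed B-tree-like rule), sinks hold records.
SUCC = {"s": ["a", "b", "c"], "a": ["u"], "b": ["v", "c"], "c": ["v"], "u": [], "v": []}
DATA = {"s": [10, 20], "a": [], "b": [15], "c": [], "u": "record-u", "v": "record-v"}

def h(data, child_digests):
    """h(d_x, D(y1), ...) over a length-prefixed repr of the data."""
    raw = repr(data).encode()
    m = hashlib.sha256(len(raw).to_bytes(4, "big") + raw)
    for d in child_digests:
        m.update(d)
    return m.digest()

def digest(node):
    return h(DATA[node], [digest(c) for c in SUCC[node]])

def P(data, target):
    """Search procedure: index of the successor to visit next."""
    return bisect.bisect_right(data, target)

def publish_vo(target):
    """Publisher: one VO line (d_x, [D(successors)]) per node the search visits."""
    vo, node = [], "s"
    while True:
        vo.append((DATA[node], [digest(c) for c in SUCC[node]]))
        if not SUCC[node]:
            return vo
        node = SUCC[node][P(DATA[node], target)]

def verify(root, target, vo):
    """User: replay P top-down, checking each line against the digest it must match."""
    expected = root
    for data, succ_digests in vo:
        if h(data, succ_digests) != expected:
            return None                   # line does not match: reject
        if not succ_digests:
            return data                   # reached a sink: verified answer
        i = P(data, target)
        if i >= len(succ_digests):
            return None
        expected = succ_digests[i]        # the next line must hash to this
    return None                           # VO ended before the search did

root = digest("s")
vo = publish_vo(12)                       # visits s, then b, then v
print(verify(root, 12, vo))               # record-v
```

The User never needs the whole DAG: it trusts only D(s), and each verified line vouches for the digest the next line must hash to, so the work matches the original search path.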
Verified Search
• The verified computation proceeds until all
nodes in the actual search are visited (the
VO has one line for each node visited).
• The correct answer is now returned by
search procedure P.
• The verified computation takes time proportional to the original search (it visits the same nodes).
• Security proof: a User accepts a wrong answer only if the VO contains a specific collision in the hash function h (e.g. D(b) = h(d'b, D'(v), D'(c)) for inputs different from db, D(v), D(c)).
Updates
• Typically, digests are updated with work similar to the data structure's update time (e.g. the length of the search paths to the updated items)
• If updates are frequent, the overall scheme does not work well (time-stamped digests can be used)
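For a plain Merkle tree, the update cost matches the search path length: changing one leaf requires rehashing only its ancestors. A minimal sketch, assuming the same hypothetical SHA-256 encoding and a power-of-two tree; it handles a value change that keeps the sort order, not insertions that would reshape the tree. The name `update_leaf` is illustrative.

```python
import hashlib

def leaf_hash(d: int) -> bytes:
    return hashlib.sha256(b"\x00" + str(d).encode()).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(items):
    """All Merkle tree levels, leaves first (len(items) a power of two)."""
    levels = [[leaf_hash(d) for d in items]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node_hash(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def update_leaf(levels, idx, new_value):
    """Replace leaf idx and rehash only the path above it: one hash per level."""
    levels[0][idx] = leaf_hash(new_value)
    rehashed = 0
    for k in range(1, len(levels)):
        idx //= 2
        levels[k][idx] = node_hash(levels[k - 1][2 * idx], levels[k - 1][2 * idx + 1])
        rehashed += 1
    return rehashed

items = [1, 3, 5, 6, 8, 10, 11, 15]
levels = build_levels(items)
print(update_leaf(levels, 4, 9))  # 3 internal nodes rehashed for 8 leaves
```

The new root, levels[-1][0], equals the digest of the updated data set, and the Owner would re-sign it.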
Generalizations
• Multiple Owners: we often want to query data collected from several owners. This can be done, but users now need to trust both the owners and the data collector.
• Privacy: VOs may reveal information about the data set. There are methods to conceal the extra data.
• I/O efficient digests/VO’s: can use a multiway tree to store multiple values in one
disk block (still logically a binary tree for
VO purposes, but stored more efficiently).
• Top-down search DAG approach may be
improved for specific data-structures (e.g.
2D range trees)
• Collections of structured data: XML
documents (can answer path queries)
• Relational operations (Joins, Selection,
Projection)
• Fancier Crypto operations (to reduce VO
size)
References
P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine. Authentic Third Party Data Publication. 14th IFIP 11.3 Working Conf. in DB Security (DBSec 2000). Original Authentic Publication paper.
A General Model for Authenticated Data Structures. Algorithmica, 2004. Many data structures and search DAGs (above group and G. Nuckolls).
Certifying Data from Multiple Sources. Proceedings of the 17th Database Security Conference, 2003. Shows how to use multiple Owners.
Flexible Authentication of XML Documents. Journal of Computer Security, 2004.
Survey Chapters
Li, Hadjieleftheriou, Kollios, Reyzin. Authenticated Index Structures for Outsourced Databases (overview of the area and efficiency issues).
R. Sion. Towards Secure Data Outsourcing.
Both in: Michael Gertz and Sushil Jajodia (eds.), "Handbook of Database Security: Applications and Trends", Springer, 2007, to appear.
A. Anagnostopoulos, M. Goodrich, R. Tamassia. Persistent Authenticated Dictionaries and Their Applications (allows queries of prior DB versions).
Authenticated Data Structures for Graph and Geometric Searching (fancy geometric data structures).
Pointer for more information
http://truthsayer.cs.ucdavis.edu
Conclusion
• A single signed digest can authenticate answers to many queries
• Secure against hackers and insiders
• Can handle a wide range of data structures
• Efficient protocols: fast query processing and small VOs
Future Work
• Better Update Mechanisms
• Integration of Database optimization
methods
• Actual implementation (partly done by
others), and evaluation