Privacy and Integrity Preserving in
Distributed Systems
Presented for Ph.D. Qualifying Examination
Fei Chen
Michigan State University
August 25th, 2009
Introduction
Data collection and publishing is a core operation in
many distributed systems
 Outsourced database systems
• Organizations outsource their databases to service providers
• Organizations can focus on their core tasks without considering
the management of their database
 Two-tiered wireless sensor networks
• Storage nodes gather data from nearby sensors and process
queries from the sink
• Saves power and storage on sensors and makes query processing more efficient
Outsourced database systems
 An outsourced database system
[Figure: organizations outsource their databases to a service provider; customers send queries to the service provider and receive results from it.]
 Outsourcing databases offers many advantages
 Significantly reduce the management cost of organizations
 Service Providers have higher bandwidths and lower latencies
 Having multiple service providers keeps the organization from being a
single point of failure
Two-tiered sensor networks
A two-tiered sensor network
[Figure: sensors send data to a storage node; the sink sends queries to the storage node and receives results from it.]
Benefits
 Power saving for sensors
 Memory saving for sensors
 Query processing becomes more efficient
Comparison between two distributed systems
 Similarity
 Three common parties
• Data owners, i.e., organizations and sensors
• Data publishers, i.e., service providers and storage nodes
• Users, i.e., customers and the sink
 The two distributed systems can be modeled as
[Figure: the data owner sends its data to the data publisher, which stores the outsourced data; the user sends queries to the data publisher and receives results.]
• There may be multiple publishers
• For outsourced database systems, there may be multiple users
• For two-tiered sensor networks, there are multiple data owners
 Difference
 For outsourced database systems, users may not be fully trusted by data owners
 For two-tiered sensor networks, the sink is fully trusted by sensors
Security Challenges
Due to the important role of data publishers, there
are two security challenges
 Preserve privacy of the data stored in a data publisher
[Figure: the data owner sends encrypted data to the untrusted data publisher; the user sends queries to the data publisher and receives results.]
How can a data publisher search the query result over the encrypted data?
Security Challenges
 Preserve integrity of a query result from a data publisher
[Figure: the untrusted data publisher may manipulate results: (1) forge data, or (2) return only a portion of the result.]
How can we prevent the misbehavior of data publishers?
Problem Statement
Design the storage scheme and query protocol in a
privacy and integrity preserving manner
 Data and query privacy
• Publishers cannot figure out the original data
• Publishers cannot figure out queries
 Queries over data
• Data publishers can search query results over the encrypted data,
e.g., range queries.
 Query result integrity
• Users can detect whether a query result contains forged data or
misses some legitimate data.
 Efficiency
• e.g. communication and computation cost
The Proposed Approaches
 To preserve the privacy of the data, the data owner encrypts the data
 To enable the searching operation for data publishers, the data owner encodes
the private data in a format which supports the searching operation
 To preserve the integrity of query results, the data owner computes verification
objects (VOs) for all possible queries
Let {t1, t2, …, tm} denote the data of a data owner. The basic idea is as follows:
• Data Owner → Data Publisher: {encrypt(t1), …, encrypt(tm)}, search(t1, …, tm), VOs(t1, …, tm)
• User → Data Publisher: search(query)
• Data Publisher → User: encrypt(ti1), …, encrypt(tig), VO(ti1, …, tig)
Previous Work
 Outsourced database systems
 Preserving Privacy
• Bucket Partition [Hacigumus et al., SIGMOD 2002]
• A Public-key system [Boneh and Waters, TCC 2007]
 Preserving Integrity
• Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]
• Signature aggregation and chaining techniques [Narasimha and Tsudik,
DASFAA 2006]
• Spatial data structures [Chen et al., ESORICS 2008]
 Two-tiered sensor networks
• S&L scheme [Infocom 2008]
• The optimized version of S&L scheme [Infocom 2009, Mobihoc 2009]
Privacy in outsourced database systems
Bucket Partition [Hacigumus et al., SIGMOD 2002]
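Data owner (Key Ki) holds the database {2,5,9,15,20,23,34,40}
Bucket ids: 1 (0 to 12), 2 (12 to 32), 3 (32 to 40), 4 (40 to 50)
Data owner → Data Publisher (outsourced database): {2,5,9} Ki, {15,20,23} Ki, {34,40} Ki
User (Key Ki) → Data Publisher: query [35,45], i.e., bucket ids 3, 4
Result: {34,40} Ki
The result returns more data than the query asked for (34 lies outside [35,45])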
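The bucket partition example above can be sketched in a few lines of Python. This is only an illustration, not the scheme from the cited paper: encryption is left out entirely (buckets hold plaintext lists), the function names are hypothetical, and bucket i is assumed to cover the half-open interval (lower boundary, upper boundary], which is one reading of the figure that keeps 34 and 40 in the same bucket.

```python
from bisect import bisect_left

# Bucket boundaries taken from the example; bucket i is ASSUMED to cover
# (BOUNDARIES[i-1], BOUNDARIES[i]].
BOUNDARIES = [0, 12, 32, 40, 50]

def bucket_id(value):
    """Return the 1-based id of the bucket containing value."""
    if not BOUNDARIES[0] < value <= BOUNDARIES[-1]:
        raise ValueError(f"{value} is outside the bucketed domain")
    return bisect_left(BOUNDARIES, value)

def partition(data):
    """Data owner side: group items by bucket id (encryption omitted)."""
    buckets = {i: [] for i in range(1, len(BOUNDARIES))}
    for v in data:
        buckets[bucket_id(v)].append(v)
    return buckets

def query_bucket_ids(lo, hi):
    """User side: translate the range [lo, hi] into bucket ids."""
    return list(range(bucket_id(lo), bucket_id(hi) + 1))

def publisher_answer(buckets, ids):
    """Publisher side: return every item in the requested buckets."""
    return [v for i in ids for v in buckets[i]]

data = [2, 5, 9, 15, 20, 23, 34, 40]
buckets = partition(data)
ids = query_bucket_ids(35, 45)
returned = publisher_answer(buckets, ids)
exact = [v for v in returned if 35 <= v <= 45]
false_positives = [v for v in returned if v not in exact]
```

The user has to filter the returned buckets locally, which is where the extra item 34 shows up as a false positive.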
Privacy in outsourced database systems
Bucket Partition
Drawbacks
 A query result may have false positive errors
 It allows data publishers to obtain a reasonable estimation
on the actual value of data items and queries
Privacy in outsourced database systems
A Public key system [Boneh and Waters, TCC 2007]
 Hidden Vector Encryption
• Using bilinear groups to produce tokens for searching conjunctive,
subset, and range queries on an encrypted database.
Drawback
 Computationally expensive
• Public key cryptography
• Requires the database owner to perform O(zD) encryptions for each
tuple, where z is the number of dimensions and D is the domain
size
Integrity in outsourced database systems
Merkle hash trees [Devanbu et al., Journal of Computer
Security 2003]
H1 = h((d1)ki)
H12 = h(H1|H2)
H14 = h(H12|H34)
H18 = h(H14|H58)
[Figure: binary hash tree with root H18, internal nodes H14, H58, H12, H34, H56, H78, and leaves H1 … H8 over the encrypted data items (d1)ki … (d8)ki.]
Integrity in outsourced database systems
Merkle hash trees
Query [10, 30]
[Figure: the Merkle hash tree over the data items (2)ki, (5)ki, (9)ki, (15)ki, (20)ki, (23)ki, (34)ki, (40)ki, with the nodes belonging to the query result and to the verification object marked.]
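A minimal Python sketch of range verification with a Merkle hash tree follows. It is an illustration under stated assumptions rather than the construction from the cited paper: the leaves are plain byte strings standing in for the encrypted items, SHA-256 stands in for h, the number of leaves is assumed to be a power of two, the root is assumed to be signed by the data owner (signing is omitted), and all function names are hypothetical. The returned range includes the two boundary items 9 and 34 so that the user can check that nothing inside [10, 30] was dropped.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from leaf hashes up to the root."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def range_proof(levels, lo, hi):
    """Sibling hashes needed to rebuild the root from leaves lo..hi."""
    proof = []
    for level in levels[:-1]:
        left = level[lo - 1] if lo % 2 == 1 else None
        right = level[hi + 1] if hi % 2 == 0 else None
        proof.append((left, right))
        lo //= 2
        hi //= 2
    return proof

def root_from_range(leaves, proof):
    """Recompute the root from returned leaves and the verification object."""
    nodes = [h(x) for x in leaves]
    for left, right in proof:
        if left is not None:
            nodes = [left] + nodes
        if right is not None:
            nodes = nodes + [right]
        if len(nodes) % 2:  # an item is missing, so pairing is impossible
            return None
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

items = [2, 5, 9, 15, 20, 23, 34, 40]
leaves = [str(v).encode() for v in items]
levels = build_levels(leaves)
signed_root = levels[-1][0]          # assumed to be signed by the data owner

lo, hi = 2, 6                        # leaves 9..34: result plus boundary items
returned = leaves[lo:hi + 1]
proof = range_proof(levels, lo, hi)
verified = root_from_range(returned, proof) == signed_root
```

For this query the verification object contains only two hashes, the leaf hash of 40 and the internal hash covering 2 and 5, which corresponds to H8 and H12 in the tree above.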
Integrity in outsourced database systems
Query [10, 14]
[Figure: the same Merkle hash tree over (2)ki, (5)ki, (9)ki, (15)ki, (20)ki, (23)ki, (34)ki, (40)ki, with the nodes belonging to the query result and to the verification object marked.]
 Drawbacks
 A query result has false positive errors
 It is hard to extend Merkle hash trees to verify the integrity for multidimensional data
Integrity in outsourced database systems
 Signature Aggregation and Chaining
 It aggregates multiple individual signatures into one unified signature
• Verifying the unified signature is equivalent to verifying all individual signatures
 It presents a signature chain that links a signature of a data item with
the signatures of the data item’s neighbors
The signature of t5 is Sig(t5) = h(h(t5) | h(t6) | h(t2) | h(t7))_k
 Drawbacks
 A query result has false positive errors
 It is computationally expensive to verify the integrity of multidimensional data
Integrity in outsourced database systems
Spatial Data Structures [Chen et al., ESORICS 2008]
Integrity in outsourced database systems
 Chen et al. proposed a Canonical Range Tree (CRT) to count the
number of data items in access control areas and query spaces.
 Advantages
 No false positive errors; boundary data items need not be provided.
 It can be used to perform access control
 Drawbacks
 Applies only to range queries, whereas SQL includes other types of
queries
Privacy and Integrity in Two-tiered sensor
networks
S&L scheme [Infocom 2008]
Sensor Si (Key Ki) collects the data {1, 4, 5, 7, 9}
Bucket IDs: 1 (0 to 4), 2 (4 to 5), 3 (5 to 9), 4 (9 to 10)
Sensor Si → Storage Node: {1,4} Ki, {5} Ki, {7, 9} Ki, h(i||4||t||Ki)
Sink (Key Ki) → Storage Node: query [9,10], i.e., bucket IDs 3, 4
Result: {7, 9} Ki (7 is out of the range) and h(i||4||t||Ki) (proves that bucket 4 is empty)
Two major drawbacks
 Storage nodes can estimate data items and queries fairly accurately
 Power and space consumption grows exponentially
with the number of dimensions.
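The empty-bucket proof in the example above can be sketched as follows. This is a simplified illustration, not the published S&L scheme: encryption of non-empty buckets is omitted (they carry plaintext lists), SHA-256 stands in for h, t is treated as an opaque epoch number, buckets are assumed to cover (lower boundary, upper boundary], and the function names are hypothetical.

```python
import hashlib

# Bucket boundaries from the example; bucket b is ASSUMED to cover
# (BOUNDARIES[b-1], BOUNDARIES[b]].
BOUNDARIES = [0, 4, 5, 9, 10]

def empty_proof(sensor_id, bucket, epoch, key):
    """h(i || bucket || t || Ki): proves that the sensor had no data there."""
    msg = f"{sensor_id}||{bucket}||{epoch}||".encode() + key
    return hashlib.sha256(msg).digest()

def sensor_submit(sensor_id, epoch, key, readings):
    """Sensor side: one entry per bucket, data or an emptiness proof."""
    submission = {}
    for b in range(1, len(BOUNDARIES)):
        lo, hi = BOUNDARIES[b - 1], BOUNDARIES[b]
        items = [v for v in readings if lo < v <= hi]
        if items:
            submission[b] = ("data", items)  # would be encrypted under Ki
        else:
            submission[b] = ("empty", empty_proof(sensor_id, b, epoch, key))
    return submission

def sink_verify(sensor_id, epoch, key, bucket_ids, reply):
    """Sink side: every queried bucket must be present and provably empty
    if it carries no data."""
    for b in bucket_ids:
        if b not in reply:
            return False
        kind, payload = reply[b]
        if kind == "empty" and payload != empty_proof(sensor_id, b, epoch, key):
            return False
    return True

key = b"shared-key-Ki"                      # hypothetical shared key
submission = sensor_submit(1, 7, key, [1, 4, 5, 7, 9])
reply = {b: submission[b] for b in (3, 4)}  # storage node answers query [9,10]
ok = sink_verify(1, 7, key, [3, 4], reply)
in_range = [v for v in reply[3][1] if 9 <= v <= 10]
```

Because only the sensor and the sink know Ki, a storage node cannot fabricate the proof for bucket 4, and silently dropping a bucket is detected as a missing entry.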
Privacy and Integrity in Two-tiered sensor
networks
 Optimized version of S&L scheme
 For one-dimensional data [Infocom 2009]
• Embed relationships among data collected by each sensor
• Define a vector where each bit indicates whether the node has data in the
corresponding bucket or not
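[Figure: Sensor 1 holds the data 2,5,9 | 15,20,23 | 34,40, giving the bucket vector V1 = 1110. V1 is sent to Sensor 2 and Sensor 3, and encrypted items such as {3, 1110}Ki and {18, 1110}Ki carry the vector to the storage node.]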
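The bucket vector itself is simple to compute. The sketch below assumes the bucket boundaries 0, 12, 32, 40, 50 from the earlier bucket partition example (this slide does not state them) and half-open buckets (lower boundary, upper boundary]; the function name is hypothetical.

```python
def bucket_vector(readings, boundaries):
    """Return one bit per bucket: '1' if the sensor has data in it."""
    bits = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        has_data = any(lo < v <= hi for v in readings)
        bits.append("1" if has_data else "0")
    return "".join(bits)

# Boundaries are an ASSUMPTION borrowed from the bucket partition example.
boundaries = [0, 12, 32, 40, 50]
v1 = bucket_vector([2, 5, 9, 15, 20, 23, 34, 40], boundaries)
```

With these inputs the vector has one bit for each of the four buckets, and only the last bucket (40 to 50) is empty.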
Privacy and Integrity in Two-tiered sensor
networks
 For Multi-dimensional data [Mobihoc 2009]
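[Figure: Sensor 1 partitions its two-dimensional data, e.g. (12,5)ki, (15,6)ki, (23,4)ki, (45,3)ki, into a grid of buckets and encodes the occupied cells as a bit matrix V1 with rows 011, 010, 010, 000. V1 is sent to Sensor 2 and Sensor 3, and the data go to the storage node.]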
 These two schemes are less secure than S&L’s scheme
 They inherit the same weakness of allowing storage nodes to
estimate the original data and queries
 The optimization technique allows a compromised sensor to easily
compromise the integrity verification functionality of the network
• Send falsified bit maps to sensors and storage nodes.
Future Research Directions
For outsourced database systems
 No complete solution yet for preserving both privacy and integrity
in outsourced database systems
 Preserving privacy and integrity for multi-dimensional data
is not well studied
For two-tiered sensor networks
 Prevent a storage node from estimating data and queries
 Multi-dimensional data
 Efficiency
Questions
Thank you!