Privacy and Integrity Preserving in Distributed Systems
Presented for Ph.D. Qualifying Examination
Fei Chen
Michigan State University
August 25th, 2009

Introduction
Data collection and publishing is a core operation in many distributed systems.
• Outsourced database systems
  • Organizations outsource their databases to service providers
  • Organizations can focus on their core tasks without managing their own databases
• Two-tiered wireless sensor networks
  • Storage nodes gather data from nearby sensors and process queries from the sink
  • Sensors save power and storage, and query processing becomes more efficient

Outsourced database systems
[Figure: an outsourced database system. Organizations outsource their databases to a service provider; customers send queries to the service provider and receive results.]
Outsourcing databases offers many advantages:
• It significantly reduces the management cost of organizations
• Service providers have higher bandwidth and lower latency
• Having multiple service providers helps prevent the organization from being a single point of failure

Two-tiered sensor networks
[Figure: a two-tiered sensor network. Sensors send data to a storage node; the sink sends queries to the storage node and receives results.]
Benefits:
• Power saving for sensors
• Memory saving for sensors
• More efficient query processing

Comparison between two distributed systems
Similarity
• Three common parties
  • Data owners, i.e., organizations and sensors
  • Data publishers, i.e., service providers and storage nodes
  • Users, i.e., customers and the sink
• The two distributed systems can be modeled as
  [Figure: the data owner sends outsourced data to the data publisher; the user sends queries to the data publisher and receives results.]
  • There may be multiple publishers
  • For outsourced database systems, there may be multiple users
  • For two-tiered sensor networks, there are multiple data owners
Difference
• For outsourced database systems, users may not be fully trusted by data owners
• For two-tiered sensor networks, the sink is fully trusted by sensors

Security Challenges
Due to the important role of data publishers, there are two security challenges.
• Preserve the privacy of the data stored in a data publisher
  [Figure: the data owner outsources encrypted data to an untrusted data publisher, which answers user queries with results.]
  How can a data publisher search for the query result over the encrypted data?

Security Challenges
• Preserve the integrity of a query result from a data publisher
  [Figure: an untrusted data publisher holding the outsourced database may manipulate results: (1) forge data, (2) return only a portion of the result.]
  How can we prevent the misbehavior of data publishers?

Problem Statement
Design the storage scheme and query protocol in a privacy- and integrity-preserving manner.
• Data and query privacy
  • Publishers cannot figure out the original data
  • Publishers cannot figure out queries
• Queries over data
  • Data publishers can search for query results over the encrypted data, e.g., range queries
• Query result integrity
  • Users can detect whether a query result contains forged data or misses some legitimate data
• Efficiency
  • e.g., communication and computation cost

The Proposed Approaches
• To preserve the privacy of the data, the data owner encrypts the data
• To enable the searching operation for data publishers, the data owner encodes the private data in a format that supports the searching operation
• To preserve the integrity of query results, the data owner computes verification objects (VOs) for all possible queries
Let {t1, t2, …, tm} denote the data of a data owner. The basic idea is illustrated as follows:
[Figure: the data owner, holding {t1, …, tm}, sends {encrypt(t1), …, encrypt(tm)}, search(t1, …, tm), and VOs(t1, …, tm) to the data publisher. The user turns a query into search(query) and sends it to the data publisher, which returns encrypt(ti1), …, encrypt(tig) together with VO(ti1, …, tig).]

Previous Work
• Outsourced database systems
  • Preserving privacy
    • Bucket partition [Hacigumus et al., SIGMOD 2002]
    • A public-key system [Boneh and Waters, TCC 2007]
  • Preserving integrity
    • Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]
    • Signature aggregation and chaining techniques [Narasimha and Tsudik, DASFAA 2006]
    • Spatial data structures [Chen et al., ESORICS 2008]
• Two-tiered sensor networks
  • S&L scheme [Infocom 2008]
  • The optimized version of S&L scheme [Infocom 2009, Mobihoc 2009]

Privacy in outsourced database systems
Bucket Partition [Hacigumus et al., SIGMOD 2002]
[Figure: the data owner and the user share key Ki. The owner's database {2, 5, 9, 15, 20, 23, 34, 40} is partitioned into buckets with IDs 1 to 4 and boundaries 0, 12, 32, 40, 50, and is outsourced to the data publisher as {2, 5, 9}Ki, {15, 20, 23}Ki, {34, 40}Ki. The user's query [35, 45] is translated to bucket IDs 3, 4. The result {34, 40}Ki returns more data than the query asked for.]

Privacy in outsourced database systems
Bucket Partition: Drawbacks
• A query result may contain false positives
• It allows data publishers to obtain a reasonable estimate of the actual values of data items and queries

Privacy in outsourced database systems
A public-key system [Boneh and Waters, TCC 2007]
Hidden Vector Encryption
• Uses bilinear groups to produce tokens for searching conjunctive, subset, and range queries on an encrypted database.
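To make the bucket partition scheme of Hacigumus et al. described above more concrete, the following is a minimal Python sketch of the slide's example. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each bucket is assumed to cover a half-open interval (lower, upper], and `toy_encrypt` / `toy_decrypt` are placeholders that only mark where encryption under the shared key Ki would happen.

```python
from bisect import bisect_left

# Bucket boundaries from the slide example. Assumption: bucket j covers
# the half-open interval (BOUNDS[j-1], BOUNDS[j]].
BOUNDS = [0, 12, 32, 40, 50]


def bucket_id(value):
    """Return the 1-based ID of the bucket that contains value."""
    if not BOUNDS[0] < value <= BOUNDS[-1]:
        raise ValueError("value outside the bucketed domain")
    return bisect_left(BOUNDS, value)


def toy_encrypt(values):
    # Placeholder only: a real system would encrypt under the key Ki.
    return ("sealed", tuple(values))


def toy_decrypt(blob):
    # Placeholder inverse of toy_encrypt.
    return list(blob[1])


def owner_outsource(data):
    """Data owner: group items by bucket and seal each non-empty bucket."""
    buckets = {}
    for v in data:
        buckets.setdefault(bucket_id(v), []).append(v)
    return {bid: toy_encrypt(vals) for bid, vals in buckets.items()}


def user_translate(lo, hi):
    """User: translate the range query [lo, hi] into bucket IDs."""
    return list(range(bucket_id(lo), bucket_id(hi) + 1))


def publisher_search(store, bucket_ids):
    """Publisher: return sealed buckets by ID, never seeing plaintext."""
    return [store[b] for b in bucket_ids if b in store]


def user_filter(blobs, lo, hi):
    """User: decrypt the returned buckets and drop false positives."""
    return [v for blob in blobs for v in toy_decrypt(blob) if lo <= v <= hi]


store = owner_outsource([2, 5, 9, 15, 20, 23, 34, 40])
ids = user_translate(35, 45)          # bucket IDs 3 and 4
blobs = publisher_search(store, ids)  # only bucket 3 is non-empty
print(user_filter(blobs, 35, 45))     # prints [40]; 34 was a false positive
```

The publisher returns the whole of bucket 3, so the item 34 travels to the user even though it lies outside [35, 45]. The bucket IDs also reveal roughly where the data and the query lie in the domain.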
Drawback of Hidden Vector Encryption: computationally expensive
• Public-key cryptography
• Requires the database owner to perform O(zD) encryptions for each tuple, where z is the number of dimensions and D is the domain size

Integrity in outsourced database systems
Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]
[Figure: a Merkle hash tree over eight encrypted data items (d1)ki, …, (d8)ki. Leaves: H1 = h((d1)ki), …, H8. Internal nodes: H12 = h(H1|H2), H14 = h(H12|H34), and the root H18 = h(H14|H58).]

Integrity in outsourced database systems
Merkle hash trees
[Figure: a Merkle hash tree over (2)ki, (5)ki, (9)ki, (15)ki, (20)ki, (23)ki, (34)ki, (40)ki, marking the query result and the verification object for the query [10, 30].]

Integrity in outsourced database systems
[Figure: the same Merkle hash tree, marking the query result and the verification object for the query [10, 14].]
Drawbacks
• A query result has false positive errors
• It is hard to extend Merkle hash trees to verify integrity for multi-dimensional data

Integrity in outsourced database systems
Signature Aggregation and Chaining
• It aggregates multiple individual signatures into one unified signature
  • Verifying the unified signature is equivalent to verifying all individual signatures
• It presents a signature chain that links the signature of a data item with the signatures of the data item's neighbors
• The signature of t5 is Sig(t5) = h(h(t5) | h(t6) | h(t2) | h(t7))k
Drawbacks
• A query result has false positive errors
• It is computationally expensive to verify the integrity of multi-dimensional data

Integrity in outsourced database systems
Spatial Data Structures [Chen et al., ESORICS 2008]

Integrity in outsourced database systems
Chen et al. proposed a Canonical Range Tree (CRT) to count the number of data items in access control areas and query spaces.
Advantages
• No false positive errors
• No need to provide the boundary data items
• Can be used to perform access control
Drawbacks
• Applies only to range queries, while SQL includes other types of queries

Privacy and Integrity in Two-tiered sensor networks
S&L scheme [Infocom 2008]
[Figure: sensor Si and the sink share key Ki. The sensor partitions its data {1, 4, 5, 7, 9} into buckets with IDs 1 to 4 and boundaries 0, 4, 5, 9, 10, and sends {1, 4}Ki, {5}Ki, {7, 9}Ki, and h(i||4||t||Ki) to the storage node; the hash proves that bucket 4 is empty. The sink's query [9, 10] is translated to bucket IDs 3, 4. The result is {7, 9}Ki and h(i||4||t||Ki), although 7 is out of the range.]
Two major drawbacks
• Storage nodes can estimate data items and queries fairly accurately
• Power and space consumption grows exponentially with the number of dimensions

Privacy and Integrity in Two-tiered sensor networks
Optimized version of S&L scheme
For one-dimensional data [Infocom 2009]
• Embed relationships among data collected by each sensor
• Define a vector where each bit indicates whether the node has data in the corresponding bucket or not
[Figure: Sensor 1 holds 2, 5, 9; 15, 20, 23; and 34, 40 in its buckets, giving bucket vector V1 = 1110, and sends V1 to Sensor 2 and Sensor 3. Sensor 2's data items 3 and 18 reach the storage node as {3, 1110}Ki and {18, 1110}Ki.]

Privacy and Integrity in Two-tiered sensor networks
For multi-dimensional data [Mobihoc 2009]
[Figure: a sensor's two-dimensional data items (12,5)ki, (15,6)ki, (23,4)ki, (45,3)ki placed on a bucket grid, with the corresponding bit matrix V1 shared among Sensor 1, Sensor 2, Sensor 3, and the storage node.]
These two schemes are less secure than the S&L scheme:
• They inherit the same weakness of allowing storage nodes to estimate the original data and queries
• The optimization technique allows a compromised sensor to easily compromise the integrity verification functionality of the network
  • e.g., by sending falsified bit maps to sensors and storage nodes

Future Research Directions
• For outsourced database systems
  • There is no complete solution for preserving both privacy and integrity
  • Preserving privacy and integrity for multi-dimensional data is not well studied
• For two-tiered sensor networks
  • Prevent a storage node from estimating data and queries
  • Multi-dimensional data
  • Efficiency

Questions

Thank you!
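
Supplementary sketch: the empty-bucket proof h(i||4||t||Ki) from the S&L slide can be illustrated with a few lines of Python. This is a minimal sketch under assumptions, not the published protocol: the function names are hypothetical, t is assumed to be a time-slot identifier, SHA-256 stands in for the unspecified hash h, each bucket is assumed to cover a half-open interval (lower, upper], and `toy_seal` / `toy_unseal` are placeholders for encryption under Ki.

```python
import hashlib
import hmac
from bisect import bisect_left

# Bucket boundaries from the slide example. Assumption: bucket j covers
# the half-open interval (BOUNDS[j-1], BOUNDS[j]].
BOUNDS = [0, 4, 5, 9, 10]


def bucket_of(value):
    """Return the 1-based ID of the bucket that contains value."""
    if not BOUNDS[0] < value <= BOUNDS[-1]:
        raise ValueError("value outside the bucketed domain")
    return bisect_left(BOUNDS, value)


def toy_seal(values):
    # Placeholder only: a real sensor would encrypt under the key Ki.
    return ("sealed", tuple(values))


def toy_unseal(blob):
    return list(blob[1])


def empty_proof(sensor_id, bucket_id, epoch, key):
    """h(i || j || t || Ki): only holders of Ki can compute this value."""
    prefix = f"{sensor_id}||{bucket_id}||{epoch}||".encode()
    return hashlib.sha256(prefix + key).digest()


def sensor_submit(sensor_id, epoch, key, data):
    """Sensor: send sealed data or an emptiness proof for every bucket."""
    buckets = {j: [] for j in range(1, len(BOUNDS))}
    for v in data:
        buckets[bucket_of(v)].append(v)
    return {
        j: ("data", toy_seal(vals)) if vals
        else ("empty", empty_proof(sensor_id, j, epoch, key))
        for j, vals in buckets.items()
    }


def sink_verify(sensor_id, epoch, key, lo, hi, answer):
    """Sink: check every queried bucket, then drop out-of-range items."""
    result = []
    for j in range(bucket_of(lo), bucket_of(hi) + 1):
        if j not in answer:
            raise ValueError(f"bucket {j} missing from the answer")
        kind, payload = answer[j]
        if kind == "empty":
            expected = empty_proof(sensor_id, j, epoch, key)
            if not hmac.compare_digest(payload, expected):
                raise ValueError(f"invalid emptiness proof for bucket {j}")
        else:
            result.extend(v for v in toy_unseal(payload) if lo <= v <= hi)
    return result


key = b"shared key Ki (illustrative)"
submission = sensor_submit(1, 7, key, [1, 4, 5, 7, 9])
answer = {j: submission[j] for j in (3, 4)}   # storage node answers IDs 3, 4
print(sink_verify(1, 7, key, 9, 10, answer))  # prints [9]; 7 is out of range
```

A storage node that withholds bucket 3, or that claims it is empty, cannot produce a matching hash without Ki, so the sink rejects the answer. The sketch does not authenticate the contents of non-empty buckets, and the bucket IDs still leak approximate data and query values, which is the first drawback listed on the slide.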