Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computer Science Integrity Assurance for Outsourced Databases without DBMS Modification DBSec 2014 Wei Wei, Ting Yu 1 Overview Motivation o Database outsourcing is a cost-effective solution o Integrity for Outsourced Databases It has been an active research area in decades Existing solutions requires modifying DBMSs No existing cloud database services support integrity checking Our Focus o Provide integrity assurance for outsourced databases without DBMS modification Basic Idea o Build a Merkle Hash Tree based Authenticated Data Structure per table o De-serialize authentication data into tables with well-designed format Support highly efficient authentication data retrieval Computer Science 2 Database Outsourcing Model •Data Owner id col1 … coln 0 Alice … 1000 10 Ben … 2000 … … … … 70 Smith … 4500 •Update Data and Authentication Data •Clients •Send Data Queries and Authentication Data Queries •Upload Data and Authentication Data •Database Service Provider (DSP) •Query Results Including Data and Authentication Data Computer Science 3 System Model Assumptions o DSPs are not fully trusted by data owners and clients o The data owner has a public/private key pair, and public key is known to all o The data owner is the only party who can update data o Public communications are through a secure channel Attacks from a DSP o Return incorrect data by tampering some data o Return incomplete data result by discarding some data o Report that data doesn’t exist or return old data Computer Science 4 System Model cont’d Goal o Provide integrity assurance for outsourced databases without DBMS modification Design Goals o Security (Integrity) Correctness, completeness, freshness o Practicability Simplicity, flexibility, efficiency Computer Science 5 Running Example id col1 col2 col3 col4 … coln 0 Alice F 20 NC … 1000 10 Ben M 30 NY … 2000 20 Cary F 42 CA … 1500 30 Lisa F 15 CA … 3000 40 Kate F 18 NY … 2300 50 Mike M 24 SC … 4000 60 Nancy F 36 VA … 2300 70 Smith M 12 TA … 4500 •A Relational Data Table Computer Science 6 System Design Authentication Data Structure Identify Authentication Data Store Authentication Data Extract Authentication Data Computer Science 7 Authentication Data Structure Signature Aggregation based ADS Merkle Hash Tree based ADS •… id col1 … coln 0 Alice … 1000 10 Ben … 2000 20 Cary … 1500 30 Lisa … 3000 40 Kate … 2300 50 Mike … 4000 60 Nancy … 2300 70 Smith … 4500 •ki •pi •hi=H(h1|…|hf) •20 •10 •0 •… •40 •30 •10 •20 •50 •30 40 •60 50 60 70 •Merkle B-tree •Data Table Computer Science 8 Identify Authentication Data Existing Approaches o Adjacency list Multiple steps to find ancestor, no order of pointers or records in a node o Path enumeration No order of pointers or records in a node, inefficient string operation o Nested set Require joining two tables to find parent node, hard to find siblings o Closure table Consume more storage, no order of pointers or records in a node Our Approach o Radix-Path Identifier Combine Radix-based labeling and Dewey labeling Computer Science 9 Identify Authentication Data 0 1 2 20 00 01 10 10 0 000 11 20 30 10 010 40 20 100 50 30 110 21 40 200 22 60 50 210 60 220 70 221 •Radix-Path Identifier Computer Science 10 Identify Authentication Data 0 1 20 0 1 4 10 0 0 40 5 8 9 30 10 4 2 20 16 50 30 20 40 32 10 60 50 36 60 40 70 41 •Radix-Path Identifier (rb = 4) Computer Science 11 Identify Authentication Data Radix-Path Identifier Properties 1. 2. 3. 4. Identifiers in a node are continuous, but not continuous between sibling nodes = Min = Max = Easy to find the index in a node, which is Computer Science 12 Store Authentication Data SAT: Single Authentication Table data_auth (max level - 2) id rpid hash level id rpid hash level -1 0 TvJtus 2 60 10 Udew 1 20 1 asdwS 2 0 0 nudg 0 40 2 DFsQ 2 10 4 Q9ej 0 -1 0 Kjdaw 1 20 16 wVi2 0 10 1 Ujrw 1 30 20 kidDs 0 -1 4 JHds 1 40 32 Kdie* 0 30 5 iueDs 1 50 36 8dFes 0 -1 8 Jdiw. 1 60 40 Iurw 0 50 9 .dkaw 1 70 41 KJdw 0 Computer Science 13 Store Authentication Data SAT: Single Authentication Table o Pros Simple and straightforward o Cons Index is built based on all records (inefficient queries) Updates could be inefficient Concurrent updates may cost more resources Computer Science 14 Store Authentication Data LBAT: Level-based Authentication Table •data_auth0 (Level 2) •data_auth1 (Level 1) •data (Level 0) id rpid hash id rpid hash id col1 … coln rpid hash -1 0 TvJtus -1 0 Kjdaw 0 Alice … 1000 0 nudg 20 1 asdwS 10 1 Ujrw 10 Ben … 2000 4 Q9ej 40 2 DFsQ -1 4 JHds 20 Cary … 1500 16 wVi2 30 5 iueDs 30 Lisa … 3000 20 kidDs -1 8 Jdiw. 40 Kate … 2300 32 Kdie* 50 9 .dkaw 50 Mike … 4000 36 8dFes 60 10 Udew 60 Nancy … 2300 40 Iurw 70 Smith … 4500 41 KJdw •data_mapping level table 2 data_auth0 1 data_auth1 0 data Computer Science 15 Store Authentication Data LBAT: Level-based Authentication Table o Pros Indexes are more efficient Updates could be more efficient (root split) Enable concurrent updates with table level lock o Cons Multiple authentication tables Relatively complicated for authentication data generation Computer Science 16 Extract Authentication Data •Retrieval of Authentication Data 20 10 0 40 30 10 20 50 30 40 60 50 60 70 •Authentication Data for 50 Computer Science 17 Extract Authentication Data SingleJoin •select l1.rpid,l1.hash from data t0 data l1 on l1.rpid/4 = t0.rpid/(4) •left join data_auth0 data_auth1 l1 on l1.rpid/4 = t0.rpid/(4*4*4) t0.rpid/(4*4) •where t0.id=50; •data_auth0 (Level 0) •data_auth1 (Level 1) •data (Level 2) id rpid hash id rpid hash id col1 … coln rpid hash -1 0 TvJtus -1 0 Kjdaw 0 Alice … 1000 0 nudg 20 1 asdwS 10 1 Ujrw 10 Ben … 2000 4 Q9ej 40 2 DFsQ -1 4 JHds 20 Cary … 1500 16 wVi2 30 5 iueDs 30 Lisa … 3000 20 kidDs -1 8 Jdiw. 40 Kate … 2300 32 Kdie* 50 9 .dkaw 50 Mike … 4000 36 8dFes 60 10 Udew 60 Nancy … 2300 40 Iurw 70 Smith … 4500 41 KJdw Computer Science 18 Extract Authentication Data RangeCondition •-- find the rpid of the data record with the id 50 •declare @rowrpid AS int; •set @rowrpid=(select top 1 rpid from data where id=50); •-- level 2, 1, 0 (from leaf level to root level) •select rpid,hash from data •where rpid>=(@rowrpid/(4))*4 and rpid<(@rowrpid/(4))*4+4; •select rpid,hash from data_auth1 •where rpid>=(@rowrpid/(4*4))*4 and rpid<(@rowrpid/(4*4))*4+4; •select rpid,hash from data_auth0 •where rpid>=(@rowrpid/(4*4*4))*4 and rpid<(@rowrpid/(4*4*4))*4+4; Computer Science 19 Extract Authentication Data SingleJoin RangeCondition Computer Science 20 Data Operations Select o Unique Select o Range Select Update o Single Record Update o Batch Update and Optimization Insert o Single Record Insert o Batch Insert and Optimization Computer Science 21 Range Select Steps o Find two boundaries o Retrieve authentication data for two boundaries 20 10 40 30 10 20 •Left Boundary 50 30 •Query Range 40 60 50 •Right Boundary •Retrieve Authentication Data for Range Select from 15 to 45 Computer Science 22 Range Select Steps o Find two boundaries o Retrieve authentication data for two boundaries •Retrieve Authentication Data for Range Select from 15 to 45 Computer Science 23 Single Record Update Steps 20 o Retrieve authentication data o Generate update statements 40 20 40 o h updates, (h – tree height) 30 o Execute updates in one transaction 30 20 VO for 20 20 VO update for 20 •Update 20 Computer Science 24 Batch Update and Optimization Update x records o x * h update statements o Some of them update the same authentication data record 20 10 0 40 30 10 20 50 30 40 60 50 60 70 •# of updates is 4 * 3 = 12 •Actually, 8 updates are necessary Computer Science 25 Batch Update and Optimization Optimization – MergeUpdate o Track all updates to each table o Find the set of updates to one authentication data record o Keep the latest one and remove others Computer Science 26 Experimental Evaluation System Implementation o o o o o Implementation based on .NET and SQL Server Merkle B-tree based on .NET MultiJoin, SingleJoin, ZeroJoin and RangeCondition Query rewrite algorithm A tool to generate authentication data Experiment Setup o o o o A synthetic database containing one table with 100,000 records Each record is about 1KB .NET3.5, SQL Server 2008 R2, Windows OS Client network with 30Mbps download and 4Mbps upload Computer Science 27 Experiments – Unique Select Setup o Run a select statement based on the primary key to retrieve one data record o Compute the overhead of our scheme when using different authentication retrieval methods Results 1. 2. RangeCondition is much more efficient than others The overhead of our scheme could be as low as 5% Computer Science 28 Experiments – Update Setup o Execute batch updates for different number of rows based on different cases D – Direct Update, C – Cached Update, RC – RangeCondition, MU – MergeUpdate o Compute the overhead of our scheme Results 1. The overhead of Direct Update could go up to 200% 2. The overhead of C-RC ranges from 3% to 30% 3. The overhead of C-RC-MU is about 3% 4. The MergeUpdate effectively reduce the number of update statements Computer Science 29 Experiments – Scalability Setup o Run a range query to select 512 rows as # of rows changes in the data table o Record time spent to complete the select query Results 1. The overhead of our scheme is about 2 ~ 3% in all cases Computer Science 30 Experiments – Comparison Setup o Run queries to retrieve authentication data as # of rows in table increases for three schemes: our scheme, OPEN-XML and DT-XML o Record time spent to retrieve authentication data Results 1. Our scheme takes about 250ms in all cases 2. Our scheme is about 100 times faster than two XML-based schemes Computer Science 31 Related Work Authentication Data Structure o Verifiable B-tree [Pang et al. ICDE 2004] o Embedded Merkle B-tree [Hakan et al. SIGMOD 2006] o Signature aggregation and chaining [Narasimha et al. DASFFA 2006] Others o o o o Protect data privacy [Hakan et al. SIGMOD 2002] Only handle read-only queries [Sion et al.VLDB 2005] Authenticated join processing [Yin et al.SIGMOD 2009] Partially materialized digest scheme [Mouratidis et al.VLDB 2009] Similar to our work o Probabilistic approaches [Xie et al.VLDB 2007] o Tamper-Evident Database System [Gerome et al. ASIAN 2005] o Authenticated Relational Tables [Giuseppe et al. DBSec 2007] Computer Science 32 Conclusion Contributions o Proposed a novel approach called Radix-Path Identifier to serialize/deserialize MHT-based authentication data o Explored the efficiency of different methods to retrieve authentication data, and optimize the update processing o Build a proof-of-concept prototype and conduct extensive experiments to evaluate the performance overhead and efficiency Computer Science 33 •Questions? •Thank you Computer Science 34