Download slides19

Lecture 17: Executing SQL over Encrypted Data in Database-Service-Provider Model Professor Chen Li Executing SQL over Encrypted Data in Database-Service-Provider Model Hakan Hacigumus University of California, Irvine Bala Iyer IBM Silicon Valley Lab. Chen Li University of California, Irvine Sharad Mehrotra University of California, Irvine SIGMOD 2002, Madison, Wisconsin, USA What do we want to do? Server User Data User Untrusted Administrator    Encrypted User Database We want to store the data on “a server” But the problem is we do not trust “the server” for sensitive information!  encrypt the data and store it  but still be able to run queries over the encrypted data  do most of the work at the server If the server is trusted, ICDE 2002 3 Why is it important anyway? Server Internet User Untrusted Administrator Encrypted User Database (Untrusted) Application Service Provider  Application Service Provider (ASP) Model for Database  DB management transferred to service provider for   backup, administration, restoration, space management, upgrades etc. use the database “as a service” provided by an ASP  use SW, HW, human resources of ASP, instead of your own 4 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion System Architecture Client Site Server Site Encrypted Results Query Executer Temporary Results Client Side Query ? Server Side Query Service Provider Query Translator Original Query Metadata ? Encrypted User Database ? Actual Results User 6 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Relational Encryption NAME SALARY etuple PID N_ID S_ID P_ID John 50000 2 fErf!$Q!!vddf>></| 50 1 10 Marry 110000 2 F%%3w&%gfErf!$ 65 2 10 James 95000 3 &%gfsdf$%343v<l 50 2 20 105000 4 %%33w&%gfs##! 65 2 20 Lisa Server Site   Store an encrypted string – etuple – for each tuple in the original table  This is called “row level encryption”  Any kind of encryption technique can be used  Blowfish encryption algorithm is used for this work Create an index for each (or selected) attribute(s) in the original table 9 Building the Index: Partition and Identification Functions  Partition function divides domain values into partitions (buckets) Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }   partitioning function has an impact on performance as well as privacy Identification function assigns a partition id to each partition of attribute A Partition (Bucket) ids 2 0 7 200 5 400 1 600 4 800 1000 Domain Values   e.g. identR.A( (200,400] ) = 7 Any function can be use as identification function, e.g., hash functions 10 Mapping Functions  Mapping function maps a value v in the domain of attribute A to the id of the partition which value v belongs to Partition (Bucket) ids 2 0 7 200 5 400 1 600 4 800 1000 Domain Values  e.g. MapR.A( 250 ) = 7, MapR.A( 620 ) = 1 11 Storing Encrypted Data R = < A, B, C >  RS = < etuple, A_id, B_id, C_id > etuple = encrypt ( A | B | C ) A_id = MapR.A( A ), B_id = MapR.B( B ), C_id = MapR.C( C ) Table: EMPLOYEES Table: EMPLOYEE NAME SALARY Etuple PID N_ID S_ID P_ID John 50000 2 fErf!$Q!!vddf>></| 50 1 10 Marry 110000 2 F%%3w&%gfErf!$ 65 2 10 James 95000 3 &%gfsdf$%343v<l 50 2 20 105000 4 %%33w&%gfs##! 65 2 20 Lisa 12 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Mapping Conditions Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k   Server stores attribute indices determined by mapping functions Client stores metadata and utilizes that to translate the query Conditions:  Condition  Attribute op Value  Condition  Attribute op Attribute  Condition  (Condition  Condition) | (Condition  Condition) | (not Condition) 15 Mapping Conditions (2) Example:  Attribute = Value  Mapcond( A = v )  AS = MapA( v )  Mapcond( A = 250 )  AS = 7 Partition Ids 2 0 7 200 5 400 1 600 4 800 1000 Domain Values 16 Mapping Conditions (3)  Attribute1 = Attribute2  Mapcond( A = B )  N (AS = identA( pk )  BS = identB( pl )) where N is pk  partition (A), pl  partition (B), pk  pl   Partitions A_id Partitions B_id [0,100] 2 [0,200] 9 (100,200] 4 (200,400] 8 (200,300] 3 C:A=B  C’ : (AS = 2  BS = 9)  (AS = 4  BS = 9)  (AS = 3  BS = 8) 17 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Relational Operators over Encrypted Relations     Partition the computation of the operators across client and server Compute (possibly) superset of answers at the server Filter the answers at the client Objective : minimize the work at the client and process the answers as soon as they arrive without requiring storage at the client Operators studied:         Selection Join Grouping and Aggregation Sorting Duplicate Elimination Set Difference Union Projection 19 Selection Operator  ( R ) =  ( D ( c c S S (R )) Mapcond(c) Example: A=250 A=250 D TABLE A_id = 7 E_TABLE 2 0 7 200 5 400 1 600 Client Query Server Query 4 800 1000 20 Join Operator R c T= (D(R c S S S Mapcond(c) T ) Client Query A=B Example: D C EMP C’ PROJ E_EMP Partitions A_id Partitions E_PROJ C:A=B B_id [0,100] 2 [0,200] 9 (100,200] 4 (200,400] 8 (200,300] 3  Server Query C’ :(A_id = 2  B_id = 9) (A_id = 4  B_id = 9) (A_id = 3  B_id = 8) 21 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Query Decomposition Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k Client Query name,pname name,pname e.pid = p.pid e.pid = p.pid salary >100k salary >100k D D Encrypted (PROJ) PROJ EMP Server Query Encrypted (EMP) 24 Query Decomposition (2) Client Query name,pname Client Query name,pname e.pid = p.pid e.pid = p.pid salary >100k salary >100k D D D E_PROJ s_id = 1 v s_id = 2 D E_PROJ E_EMP E_EMP Server Query Server Query 25 Query Decomposition (3) Client Query name,pname e.pid = p.pid Client Query name,pname salary >100k  e.pid = p.pid salary >100k D D D e.p_id = p.p_id s_id = 1 v s_id = 2 E_EMP Server Query E_PROJ s_id = 1 v s_id = 2 E_PROJ E_EMP Server Query 26 Query Decomposition (4) Client Query name,pname salary >100k  e.pid = Q: SELECT FROM WHERE p.pid e.p_id = p.p_id E_EMP Server Query emp.pid=proj.pid AND salary > 100k QS: SELECT e_emp.etuple, e_proj.etuple D s_id = 1 v s_id = 2 name, pname emp, proj E_PROJ FROM e_emp, e_proj WHERE e.p_id=p.p_id AND s_id = 1 OR s_id = 2 QC: SELECT FROM WHERE name, pname temp emp.pid=proj.pid AND salary > 100k 27 Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Talk Outline  System Architecture  How to create Metadata: Relational Encryption and Storage Model  Query Decomposition and Relational Operators  Query Decomposition – Examples  Experimental Results  Conclusion Experimental Evaluation  Data   Queries   TPC-H database, scale factor 0.1 Based on TPC-H Queries Q#6 and Q#3 Partitioning Strategy   Equi-depth histograms for the first set of experiments Equi-width histograms for the second set of experiments 30 Effect of Number of Buckets in Non-Join Query Cost Factors for Query Response Time Query Response Time 40 30 Client Side Network Server Side 20 10 0 2 8 Number of Buckets   Client and communications costs decreases with increasing number of buckets due to better filtering at the server Server cost doesn’t decrease as much, table scan remains best choice in the optimizer 31 Effect of Number of Buckets in Non-Join Query Client/Server v.s. Single Server Query Response Time 35 30 25 20 15 10 5 0 Single Server Server Side Client Side 2 8 Number of Buckets   Single Server: Server is trusted and performs all operations including decryption on site Shows that proposed query execution protocol doesn’t introduce significant overhead 32 Effect of Number of Buckets in Join Query Client Server Total 1 75 100 150 250 300 Number of Buckets   500 750 1500 Effect of Decryption Time Query Response Time Query Response Time Client, Server, and Total Response Times Client /w decryption Client w/o decryption 1 75 100 150 250 300 500 750 1500 Number of Buckets Sharp decrease in query response time with increase in the number of buckets due to better filtering at the server Client side query response time is greater than server side query response time due to dominant decryption cost on the query (second graph) 33 Effect of Number of Buckets in Join Query Query Response Time Client/Server v.s. Single Server C/S Single Server 1 75 100 150 250 300 500 750 1500 Number of Buckets   Single Server: Server is trusted and performs all operations including decryption on site Consistent with the previous results showing proposed query execution protocol doesn’t introduce significant overhead 34 Conclusion   ASP model is a promising solution for enterprise computing in Internet era We studied data privacy problem    Proposed solution    in the context of ASP model when the ASP is not trusted encrypts data, creates “coarse indexes” and stores the data at ASP allows only data owner to decrypt the data With query decomposition   most of query execution performed at ASP client only performs encryption/decryption, filtering and continues to benefit from ASP model 35

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download slides19