Download slides19

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Team Foundation Server wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

SQL wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational algebra wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Versant Object Database wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Lecture 17: Executing SQL over Encrypted
Data in Database-Service-Provider Model
Professor Chen Li
Executing SQL over Encrypted Data in
Database-Service-Provider Model
Hakan Hacigumus
University of California, Irvine
Bala Iyer
IBM Silicon Valley Lab.
Chen Li
University of California, Irvine
Sharad Mehrotra
University of California, Irvine
SIGMOD 2002, Madison, Wisconsin, USA
What do we want to do?
Server
User Data
User
Untrusted Administrator



Encrypted User
Database
We want to store the data on “a server”
But the problem is we do not trust “the server” for sensitive
information!

encrypt the data and store it

but still be able to run queries over the encrypted data

do most of the work at the server
If the server is trusted, ICDE 2002
3
Why is it important anyway?
Server
Internet
User
Untrusted Administrator
Encrypted User
Database
(Untrusted) Application Service Provider

Application Service Provider (ASP) Model for Database

DB management transferred to service provider for


backup, administration, restoration, space management, upgrades etc.
use the database “as a service” provided by an ASP

use SW, HW, human resources of ASP, instead of your own
4
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
System Architecture
Client Site
Server Site
Encrypted
Results
Query
Executer
Temporary
Results
Client Side
Query
?
Server Side
Query
Service Provider
Query
Translator
Original Query
Metadata
?
Encrypted User
Database
?
Actual Results
User
6
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Relational Encryption
NAME
SALARY
etuple
PID
N_ID
S_ID
P_ID
John
50000
2
fErf!$Q!!vddf>></|
50
1
10
Marry
110000
2
F%%3w&%gfErf!$
65
2
10
James
95000
3
&%gfsdf$%343v<l
50
2
20
105000
4
%%33w&%gfs##!
65
2
20
Lisa
Server Site


Store an encrypted string – etuple – for each tuple in the original table

This is called “row level encryption”

Any kind of encryption technique can be used

Blowfish encryption algorithm is used for this work
Create an index for each (or selected) attribute(s) in the original table
9
Building the Index:
Partition and Identification Functions

Partition function divides domain values into partitions (buckets)
Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }


partitioning function has an impact on performance as well as privacy
Identification function assigns a partition id to each partition of attribute A
Partition (Bucket) ids
2
0
7
200
5
400
1
600
4
800
1000
Domain Values


e.g.
identR.A( (200,400] ) = 7
Any function can be use as identification function, e.g., hash functions
10
Mapping Functions

Mapping function maps a value v in the domain of attribute A to the id
of the partition which value v belongs to
Partition (Bucket) ids
2
0
7
200
5
400
1
600
4
800
1000
Domain Values

e.g.
MapR.A( 250 ) = 7, MapR.A( 620 ) = 1
11
Storing Encrypted Data
R = < A, B, C >
 RS = < etuple, A_id, B_id, C_id >
etuple = encrypt ( A | B | C )
A_id = MapR.A( A ), B_id = MapR.B( B ), C_id = MapR.C( C )
Table: EMPLOYEES
Table: EMPLOYEE
NAME
SALARY
Etuple
PID
N_ID
S_ID
P_ID
John
50000
2
fErf!$Q!!vddf>></|
50
1
10
Marry
110000
2
F%%3w&%gfErf!$
65
2
10
James
95000
3
&%gfsdf$%343v<l
50
2
20
105000
4
%%33w&%gfs##!
65
2
20
Lisa
12
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Mapping Conditions
Q: SELECT name, pname FROM emp, proj
WHERE emp.pid=proj.pid AND salary > 100k


Server stores attribute indices determined by mapping functions
Client stores metadata and utilizes that to translate the query
Conditions:

Condition  Attribute op Value

Condition  Attribute op Attribute

Condition  (Condition
 Condition) | (Condition  Condition)
| (not Condition)
15
Mapping Conditions (2)
Example:

Attribute = Value

Mapcond( A = v )  AS = MapA( v )

Mapcond( A = 250 )  AS = 7
Partition Ids
2
0
7
200
5
400
1
600
4
800
1000
Domain Values
16
Mapping Conditions (3)

Attribute1 = Attribute2

Mapcond( A = B )  N (AS = identA( pk )  BS = identB( pl ))
where N is pk  partition (A), pl  partition (B), pk  pl  
Partitions
A_id
Partitions
B_id
[0,100]
2
[0,200]
9
(100,200]
4
(200,400]
8
(200,300]
3
C:A=B

C’ :
(AS = 2  BS = 9)
 (AS = 4  BS = 9)
 (AS = 3  BS = 8)
17
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Relational Operators over
Encrypted Relations




Partition the computation of the operators across client and server
Compute (possibly) superset of answers at the server
Filter the answers at the client
Objective : minimize the work at the client and process the answers
as soon as they arrive without requiring storage at the client
Operators studied:








Selection
Join
Grouping and Aggregation
Sorting
Duplicate Elimination
Set Difference
Union
Projection
19
Selection Operator
 ( R ) =  ( D (
c
c
S
S
(R ))
Mapcond(c)
Example:
A=250
A=250
D
TABLE
A_id = 7
E_TABLE
2
0
7
200
5
400
1
600
Client Query
Server Query
4
800
1000
20
Join Operator
R
c
T=
(D(R
c
S
S
S
Mapcond(c)
T )
Client Query
A=B
Example:
D
C
EMP
C’
PROJ
E_EMP
Partitions
A_id
Partitions
E_PROJ
C:A=B
B_id
[0,100]
2
[0,200]
9
(100,200]
4
(200,400]
8
(200,300]
3

Server Query
C’ :(A_id = 2  B_id =
9)
(A_id = 4  B_id = 9)
(A_id = 3  B_id = 8)
21
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Query Decomposition
Q: SELECT name, pname FROM emp, proj
WHERE emp.pid=proj.pid AND salary > 100k
Client Query
name,pname
name,pname
e.pid = p.pid
e.pid = p.pid
salary >100k
salary >100k
D
D
Encrypted
(PROJ)
PROJ
EMP
Server Query
Encrypted
(EMP)
24
Query Decomposition (2)
Client Query
name,pname
Client Query
name,pname
e.pid = p.pid
e.pid = p.pid
salary >100k
salary >100k
D
D
D
E_PROJ
s_id = 1 v s_id = 2
D
E_PROJ
E_EMP
E_EMP
Server Query
Server Query
25
Query Decomposition (3)
Client Query
name,pname
e.pid = p.pid
Client Query
name,pname
salary >100k  e.pid =
p.pid
salary >100k
D
D
D
e.p_id = p.p_id
s_id = 1 v s_id = 2
E_EMP
Server Query
E_PROJ
s_id = 1 v s_id = 2
E_PROJ
E_EMP
Server Query
26
Query Decomposition (4)
Client Query
name,pname
salary >100k  e.pid =
Q: SELECT
FROM
WHERE
p.pid
e.p_id = p.p_id
E_EMP
Server Query
emp.pid=proj.pid AND
salary > 100k
QS: SELECT e_emp.etuple, e_proj.etuple
D
s_id = 1 v s_id = 2
name, pname
emp, proj
E_PROJ
FROM e_emp, e_proj
WHERE
e.p_id=p.p_id AND
s_id = 1 OR s_id = 2
QC: SELECT
FROM
WHERE
name, pname
temp
emp.pid=proj.pid AND
salary > 100k
27
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Talk Outline

System Architecture

How to create Metadata: Relational Encryption and Storage Model

Query Decomposition and Relational Operators

Query Decomposition – Examples

Experimental Results

Conclusion
Experimental Evaluation

Data


Queries


TPC-H database, scale factor 0.1
Based on TPC-H Queries Q#6 and Q#3
Partitioning Strategy


Equi-depth histograms for the first set of experiments
Equi-width histograms for the second set of experiments
30
Effect of Number of Buckets in
Non-Join Query
Cost Factors for Query Response Time
Query Response
Time
40
30
Client Side
Network
Server Side
20
10
0
2
8
Number of Buckets


Client and communications costs decreases with increasing number of buckets
due to better filtering at the server
Server cost doesn’t decrease as much, table scan remains best choice in the
optimizer
31
Effect of Number of Buckets in
Non-Join Query
Client/Server v.s. Single Server
Query Response
Time
35
30
25
20
15
10
5
0
Single Server
Server Side
Client Side
2
8
Number of Buckets


Single Server: Server is trusted and performs all operations including
decryption on site
Shows that proposed query execution protocol doesn’t introduce significant
overhead
32
Effect of Number of Buckets in Join Query
Client
Server
Total
1
75
100
150
250
300
Number of Buckets


500
750
1500
Effect of Decryption Time
Query Response Time
Query Response Time
Client, Server, and Total Response Times
Client /w
decryption
Client w/o
decryption
1
75
100
150
250
300
500
750
1500
Number of Buckets
Sharp decrease in query response time with increase in the number of
buckets due to better filtering at the server
Client side query response time is greater than server side query response
time due to dominant decryption cost on the query (second graph)
33
Effect of Number of Buckets in Join Query
Query Response Time
Client/Server v.s. Single Server
C/S
Single Server
1
75
100
150
250
300
500
750
1500
Number of Buckets


Single Server: Server is trusted and performs all operations including
decryption on site
Consistent with the previous results showing proposed query execution
protocol doesn’t introduce significant overhead
34
Conclusion


ASP model is a promising solution for enterprise computing in Internet
era
We studied data privacy problem



Proposed solution



in the context of ASP model
when the ASP is not trusted
encrypts data, creates “coarse indexes” and stores the data at ASP
allows only data owner to decrypt the data
With query decomposition


most of query execution performed at ASP
client only performs encryption/decryption, filtering and continues to
benefit from ASP model
35