Caching with “Good Enough” Currency, Consistency, and Completeness
Hongfei Guo, University of Wisconsin
Per-Åke Larson, Microsoft Research
Raghu Ramakrishnan, University of Wisconsin

Motivation — Scaling Google
…
Motivation: Scaling A DBMS By Caching
How to tell whether the cached data is “good enough” for an application?
- NO data quality requirements from the applications!
- NO specific data quality guarantees from the caching DBMS!
[Figure: application servers (app code, …) query a caching DBMS, which receives asynchronous updates from the backend DBMS.]

The Big Picture
- Apps: Specifies data quality requirements in queries
- Cache: Enforces data quality constraint
  - View level granularity [SIGMOD 2004] [SIGMOD 2004 Demo]
  - Finer granularity (partitions of a view) [This presentation]
- Cache admin: Specify local data quality to be maintained by cache (data quality-aware database caching model)
- System performance evaluation [dissertation]
[Figure labels: Application Server, Caching DBMS, Backend DBMS]

Data Quality Metrics (informal)
- Currency: The elapsed time since this copy became stale
- Consistency: A query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database
- C&C: Currency & Consistency

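The two informal metrics above can be illustrated with a minimal Python sketch. It assumes each cached copy records the time at which it became stale and the ID of the master snapshot it reflects; `CachedCopy`, `stale_point`, and `snapshot_id` are hypothetical names, not part of the presentation. Comparing snapshot IDs is a simplification: it is sufficient for snapshot consistency but stricter than necessary, since an unchanged copy may agree with several snapshots.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CachedCopy:
    """Hypothetical cached object carrying the metadata the metrics need."""
    key: str
    snapshot_id: int              # master-database snapshot this copy reflects
    stale_point: Optional[float]  # time the copy became stale; None if current


def currency(copy: CachedCopy, now: float) -> float:
    """Elapsed time since the copy became stale (0 if it is not stale)."""
    if copy.stale_point is None:
        return 0.0
    return max(0.0, now - copy.stale_point)


def snapshot_consistent(copies: Iterable[CachedCopy]) -> bool:
    """True if all copies reflect one and the same master snapshot."""
    return len({c.snapshot_id for c in copies}) <= 1


a = CachedCopy("author:1", snapshot_id=7, stale_point=100.0)
b = CachedCopy("author:2", snapshot_id=7, stale_point=None)
c = CachedCopy("book:111", snapshot_id=8, stale_point=None)
```

With these values, `a` and `b` can be read together as one consistent result, while adding `c` mixes two snapshots.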
Roadmap
- Background
- Cache data quality properties
- Cache property specification
- Enforcing data quality constraints
- Experiments
- Future directions and conclusions

Why Define Cache Properties?
[Figure: cache properties (= contract) sit between query processing and cache maintenance.]

Cache Properties (P+3C)
- Presence: per object
- Consistency: a set of objects
- Completeness: per predicate
- Currency: object staleness

Basic Concepts
[Figure labels: Master Database (tables, objects, snapshots H1 and H2); Cache (View 1, View 2, View 3).]

Cache Property Examples
Currency = now - stale point
[Figure labels: Consistent, Complete, Present; Master Database (H1, H2, stale point); Cache (View 1, View 2, View 3).]

Specifying Cache Properties
- Specified as integrity constraints
  - Presence constraint
  - Consistency constraint
  - Completeness constraint
  - Presence correlation constraint
  - Consistency correlation constraint

Presence Constraint
[Diagram labels: Backend DBMS, Caching DBMS]

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT:
authorId
1
2
3

Presence Constraint
CREATE VIEW AuthorCopy AS
SELECT * FROM Authors            -- partially materialized view [Zhou et al 2005]

CREATE TABLE AuthorList_PCT      -- control table
(authorId int)

ALTER VIEW AuthorCopy ADD
PRESENCE ON authorId IN          -- authorId is the control key
(SELECT authorId
FROM AuthorList_PCT)             -- control table

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT:
authorId
1
2
3

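The effect of a presence constraint can be imitated in a few lines of Python with SQLite. This is only an emulation under stated assumptions: SQLite has no partially materialized views and no `ALTER VIEW ... ADD PRESENCE`, so the sketch uses an ordinary view filtered by the control table, and the helper `cached_author_ids` is a hypothetical name. It shows how inserting a control-key value into `AuthorList_PCT` makes the matching row appear in `AuthorCopy`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (authorId INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO Authors VALUES (1, 'Alice', 'Madison'),
                               (2, 'Bob', 'Madison'),
                               (3, 'Cedric', 'Seattle');

    -- Control table: lists the control-key values that must be present.
    CREATE TABLE AuthorList_PCT (authorId INTEGER PRIMARY KEY);

    -- Emulation only: an ordinary view restricted by the control table.
    CREATE VIEW AuthorCopy AS
        SELECT a.* FROM Authors a
        WHERE a.authorId IN (SELECT authorId FROM AuthorList_PCT);
""")


def cached_author_ids(connection):
    """Return the authorIds currently visible through AuthorCopy."""
    rows = connection.execute("SELECT authorId FROM AuthorCopy ORDER BY authorId")
    return [r[0] for r in rows]


conn.executemany("INSERT INTO AuthorList_PCT VALUES (?)", [(1,), (3,)])
ids_before = cached_author_ids(conn)  # only authors listed in the control table

conn.execute("INSERT INTO AuthorList_PCT VALUES (2)")
ids_after = cached_author_ids(conn)   # author 2 becomes present as well
```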
Consistency Constraint
CREATE TABLE CityList_CsCT
(city string)

ALTER VIEW AuthorCopy ADD
CONSISTENCY ON city IN
(SELECT city
FROM CityList_CsCT)

[Diagram labels: Backend DBMS, Cache Region]

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

CityList_CsCT:
city
Madison

AuthorList_PCT:
authorId
1
2
3

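One way to picture the cache regions induced by a consistency constraint is the following Python sketch. It groups the rows of `AuthorCopy` by the values listed in `CityList_CsCT`, so that all rows of a listed city form one region to be kept mutually consistent. The function name and the treatment of unlisted rows (each forms a region of its own) are assumptions of this sketch, not something the slides state.

```python
from collections import defaultdict

author_copy = [
    {"authorId": 1, "name": "Alice", "city": "Madison"},
    {"authorId": 2, "name": "Bob", "city": "Madison"},
    {"authorId": 3, "name": "Cedric", "city": "Seattle"},
]
city_list_csct = {"Madison"}  # contents of the consistency control table


def consistency_regions(rows, control_values, column="city", key="authorId"):
    """Group row keys into regions, one region per listed control value.

    Assumption of this sketch: a row whose value is not listed in the
    control table forms a region of its own.
    """
    regions = defaultdict(list)
    for row in rows:
        value = row[column]
        if value in control_values:
            regions[(column, value)].append(row[key])
        else:
            regions[(key, row[key])].append(row[key])
    return dict(regions)


regions = consistency_regions(author_copy, city_list_csct)
```

Here the two Madison authors end up in a single region, while the Seattle author stays alone.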
Completeness Constraint
CREATE TABLE CityList_CpCT
(city string)

ALTER VIEW AuthorCopy ADD
COMPLETENESS ON city IN
(SELECT city
FROM CityList_CpCT)

[Diagram label: Backend DBMS]

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

CityList_CpCT:
city
Madison

AuthorList_PCT:
authorId
1
3

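The completeness requirement can be checked with a short Python sketch: for every city listed in the completeness control table, the cache must hold all master rows with that city. The function names and the tuple layout `(authorId, name, city)` are illustrative assumptions.

```python
backend_authors = [
    (1, "Alice", "Madison"),
    (2, "Bob", "Madison"),
    (3, "Cedric", "Seattle"),
]


def is_complete(cached_rows, master_rows, city):
    """True if the cache holds every master row with the given city."""
    wanted = {row for row in master_rows if row[2] == city}
    return wanted <= set(cached_rows)


def violated_cities(cached_rows, master_rows, control_cities):
    """Cities listed in the control table whose rows are not all cached."""
    return sorted(city for city in control_cities
                  if not is_complete(cached_rows, master_rows, city))


city_list_cpct = {"Madison"}
# Presence alone (authors 1 and 3) leaves Madison incomplete.
cache_presence_only = [backend_authors[0], backend_authors[2]]
# Adding author 2 completes the Madison predicate.
cache_completed = list(backend_authors)
```

This mirrors the slide: with `AuthorList_PCT` holding 1 and 3, author 2 has to be cached too once Madison is declared complete.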
Presence Correlation Constraint
ALTER VIEW BookCopy ADD
PRESENCE ON authorId IN
(SELECT authorId
FROM AuthorCopy)

[Diagram label: Backend DBMS]

AuthorList_PCT:
authorId
1
2
3

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

Presence Correlation Constraint
Cache graph: AuthorList_PCT -(authorId)-> AuthorCopy -(authorId)-> BookCopy

AuthorList_PCT:
authorId
1
2
3

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

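A presence correlation makes the content of a child view follow the content of its parent view. The Python sketch below derives which `BookCopy` rows must be present from the rows currently held in `AuthorCopy`; `correlated_presence` is a hypothetical helper, and the in-memory lists stand in for the backend tables.

```python
def correlated_presence(parent_rows, child_master_rows, column):
    """Child rows that must be present, given the parent's cached rows."""
    present_keys = {row[column] for row in parent_rows}
    return [row for row in child_master_rows if row[column] in present_keys]


books = [
    {"isbn": 111, "authorId": 1, "title": "aaa"},
    {"isbn": 222, "authorId": 1, "title": "bbb"},
    {"isbn": 333, "authorId": 2, "title": "ccc"},
    {"isbn": 444, "authorId": 3, "title": "ddd"},
    {"isbn": 555, "authorId": 3, "title": "eee"},
]

# Suppose only authors 1 and 3 are currently cached in AuthorCopy.
author_copy = [
    {"authorId": 1, "name": "Alice", "city": "Madison"},
    {"authorId": 3, "name": "Cedric", "city": "Seattle"},
]

book_copy = correlated_presence(author_copy, books, "authorId")
cached_isbns = [row["isbn"] for row in book_copy]
```

Book 333 is left out because its author (2) is not in `AuthorCopy`.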
Consistency Correlation Constraint
ALTER VIEW BookCopy ADD
CONSISTENCY ROOT

[Diagram label: Backend DBMS]

AuthorList_PCT:
authorId
1
2
3

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

Consistency Correlation Constraint
Cache graph: AuthorList_PCT -(authorId)-> AuthorCopy -(authorId)-> BookCopy

AuthorList_PCT:
authorId
1
2
3

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

Cache Schema Example
[Figure: cache graph over AuthorList_PCT, ReviewerList_PCT, AuthorCopy, ReviewerCopy, BookCopy, and ReviewCopy, with edges labeled authorId, reviewerId, authorId, isbn, and reviewId.]

Changing The Assumptions
- Fully materialized views -> Partially materialized views: more general algorithms
- Consistent views -> Row-level consistency: run-time check for consistency constraints that cannot be validated at compile-time
- Push-based maintenance -> Pull-based maintenance

Run-time C&C Checking
When view V matches expression E:
[Figure: a ChoosePlan operator evaluates a C&C guard and selects either the local plan using V or the remote plan requesting E.]
- Currency guard: Check if local view V satisfies currency requirement
- Consistency guard: Check if local view V satisfies consistency requirement

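The guarded plan choice described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the prototype's implementation: `LocalView`, its `staleness` and `consistent` fields, and the use of seconds as the currency unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalView:
    """Hypothetical bookkeeping kept for a cached view V."""
    name: str
    staleness: float   # seconds since the view's data became stale
    consistent: bool   # whether the rows to be read form one snapshot


def currency_guard(view: LocalView, bound: float) -> bool:
    """Check whether V satisfies the query's currency bound."""
    return view.staleness <= bound


def consistency_guard(view: LocalView) -> bool:
    """Check whether V satisfies the query's consistency requirement."""
    return view.consistent


def choose_plan(view: LocalView, bound: float,
                local_plan: Callable[[], str],
                remote_plan: Callable[[], str]) -> str:
    """Run the local plan only if both guards pass; otherwise go remote."""
    if currency_guard(view, bound) and consistency_guard(view):
        return local_plan()
    return remote_plan()


fresh = LocalView("custCopy", staleness=4.0, consistent=True)
stale = LocalView("custCopy", staleness=25.0, consistent=True)
fresh_result = choose_plan(fresh, 10.0, lambda: "local", lambda: "remote")
stale_result = choose_plan(stale, 10.0, lambda: "local", lambda: "remote")
```

Passing the plans as callables keeps the guard evaluation at run time, so the same compiled plan can switch between local and remote execution as the cache ages.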
Performance Evaluation Goals
- Consistency guards overhead
  - Simple checks
  - A spectrum of checks ranging from simple to complicated

Experimental Setting
- Back-end hosts a TPCD database tpcd1gh with scale factor 1.0 (~1GB)
- Cache server has a shadow of tpcd1gh
- Two local views: custCopy, orderCopy
- LAN connection between cache and backend server

Queries Used
Qa: key select
SELECT *
FROM Customers C
WHERE c_custkey=1
CURRENCY 10 ON (C)

Qb: join query
SELECT *
FROM Customers C, Orders O
WHERE c_custkey=o_custkey and c_custkey=1
CURRENCY 10 ON (C), 20 ON (O)

Qc: nonkey select
SELECT *
FROM Customers C
WHERE c_nationkey = 1
CURRENCY 10 ON (C)

Simple Consistency Guards Overhead
[Figure: bar chart of execution time (ms) for Qa, Qb, and Qc under local and remote plans, each bar split into query time and consistency guard time. Guard overhead labels on the chart: 1.6%, 1.72%, 1.66%, 1.59%, 16.56%, 14.00%.]

Single Table Consistency Guard Overhead
[Figure: bar chart of execution time (ms) for guard variants A11a, A11b, A12, S11, and S12 under local and remote plans (Qa is used), each bar split into query time and consistency guard time. Guard overhead labels on the chart: 6.06%, 4.95%, 2.33%, 7.48%, 8.79%, 62.85%, 58.32%, 23.77%, 71.41%, 16.98%.]

Future Directions
- Improve current prototype
  - Read-write transactions?
  - Time-line constraints?
- Apply “good enough” to other forms of replication
  - Indexing data?
- Adaptive data quality-aware caching policies
  - Control-table content?
  - Refresh intervals?
- Automate cache design/tuning
  - How to get a good cache schema? (i.e., cache region granularity, assignment)

Summary
- Goal: fine-grained data quality-aware cache management
- A comprehensive solution
  - How does the cache track data quality? Four cache properties
  - How does the admin specify cache properties? Dynamic cache model
  - How to maintain the cache efficiently? Efficient cache maintenance and “safety”
  - How to enforce C&C constraints for queries? Do C&C checking efficiently
So long, and thanks for all the fish!
Questions?

Proposed SQL Syntax
BookCopy:
bid  title      author
1    databases  Raghu
2    databases  Ullman

ReviewCopy:
rid  bid  text
1    1    …
2    1    …
3    2    …

SELECT *
FROM Books B, Reviews R
WHERE B.bid = R.bid AND B.title = “Databases”
CURRENCY BOUND 10 min ON (B, R) BY B.bid
CURRENCY BOUND 10 min ON (B), 30 min ON (R)
[Slide annotations on the CURRENCY clause: currency bound, consistency class, group by. The two CURRENCY clauses appear to be overlaid variants of the same slide.]

Query result:
bid  title      author  bid  rid  text
1    databases  Raghu   1    1    …
1    databases  Raghu   1    2    …
2    databases  Ullman  2    3    …

Pull-Maintenance
- Refresh a region by pulling query results
- When refreshing a region, also refresh the affected closure
  - All overlapping regions
  - All correlated regions

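The affected closure can be computed as plain graph reachability over the "overlaps" and "correlated" relations between regions. The Python sketch below assumes both relations are given as adjacency maps; the region names in the example are hypothetical.

```python
from collections import deque


def affected_closure(start, overlaps, correlated):
    """Regions that must be refreshed together with `start`.

    overlaps:   region -> set of regions overlapping it
    correlated: region -> set of regions correlated with it
    """
    closure = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        neighbours = overlaps.get(region, set()) | correlated.get(region, set())
        for nxt in neighbours - closure:
            closure.add(nxt)
            queue.append(nxt)
    return closure


# Hypothetical regions: two presence regions that both overlap a title region.
overlaps = {
    "author=1": {"title=aaa"},
    "title=aaa": {"author=1", "author=3"},
    "author=3": {"title=aaa"},
}
correlated = {}

closure_for_author_1 = affected_closure("author=1", overlaps, correlated)
closure_for_author_4 = affected_closure("author=4", overlaps, correlated)
```

Refreshing `author=1` drags in `title=aaa` and, through it, `author=3`, whereas an isolated region is refreshed on its own.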
Theoretical Results
Definition (Safe partially materialized views):
A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints:
- For any pair of regions in V, either they don’t overlap or one is contained in the other.
- If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V and no pair of regions in X is contained in any one region defined on V.
[Slide annotation: property held for every instance]

Cache schema design rules:
Rule 1: A cache graph is a DAG.
Rule 2: Only red nodes can have independent completeness or consistency control-tables.
Rule 3: Every PMV with more than one parent must be a red circle.
Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray.
Rule 5: A PMV cannot have noncompatible control-tables.
[Slide annotation: syntactically checkable (polynomial) conditions]

Theorem:
Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.

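Two of the statements above lend themselves to a small Python sketch: Rule 1 (the cache graph is a DAG) and the first safety condition (regions are pairwise disjoint or nested). The functions below are illustrative only. `is_dag` uses Kahn's algorithm on a graph given as node and edge lists, and `nested_or_disjoint` checks a single cache instance, whereas the definition quantifies over every instance.

```python
from itertools import combinations


def is_dag(nodes, edges):
    """Rule 1: True if the cache graph (parent, child) edges has no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == len(nodes)


def nested_or_disjoint(regions):
    """Safety condition 1 on one instance: regions are disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(regions, 2))


nodes = ["AuthorList_PCT", "AuthorCopy", "BookCopy"]
edges = [("AuthorList_PCT", "AuthorCopy"), ("AuthorCopy", "BookCopy")]
acyclic = is_dag(nodes, edges)
cyclic = is_dag(nodes, edges + [("BookCopy", "AuthorCopy")])
```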
Pull-Maintenance
AuthorList_PCT:
authorId
1
3
4

TitleList_CsCT:
title
aaa

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   4         eee

Pull-Maintenance
Cache graph: AuthorList_PCT -(authorId)-> AuthorCopy -(authorId)-> BookCopy

AuthorCopy:
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   3         eee

Inefficient Pulling
Shared-row problem

AuthorCopy:
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

AuthorBookCopy:
authorId  isbn
1         111
1         222
1         333
3         111
3         555

BookCopy:
isbn  price  title
111   10     aaa
222   20     bbb
333   30     ccc
555   50     eee

Issues
- Inefficient pulling:
  - Calculation of the affected closure requires checking the rows
- Efficient pulling:
  - The affected closure does NOT depend on the instance of a view
  - Only requires forward pull among correlated views

Related Work
Relaxing data quality
- Distributed databases
  - Read-only transactions [Garcia-Molina et al. 1982]
  - Demarcation protocol [Barbará et al 1992]
  - TACC [Yu et al. 2000]
  - Epsilon-serializability [Pu et al. 1992]
- Warehousing and web views
  - WebViews [Labrinidis et al 2003]
  - FAS [Röhm et al. 2002]
  - Obsolescent views [Gal 1999]
  - Distributed views [Segev et al 1990]
- Replica management
  - Quasi-copies [Alonso et al. 1998], [Gallersdörfer et al. 1995]
  - Good-enough views [Seligman et al. 1997]
  - TRAPP [Olston et al. 2000]
Caching
- Database caching
  - DBCache [Altinel et al. 2003]
  - Constraint-based database caching [Härder et al. 2004]
  - Mid-Tier caching [TimesTen 2002]
  - Shared-storage caching [Khalil et al 2002]
- Others
  - Semantic caching [Dar et al 1996]
  - Cache in Postgres [Stonebraker et al 1990]
  - Predicate-based caching [Keller et al 1996]
  - Freshness-driven web caching [Li et al 2003]
  - WATCHMAN [Scheuermann et al 1996]
  - Cache investment [Kossmann et al 2000]
  - DECAF [Kiernan et al 2000]
  - Proxy caching [Luo et al 2001]
Uniqueness of our approach (query-centric):
- Query: Specifies fine-grained C&C constraints
- Admin: Flexible local data quality control in terms of granularity and properties
- Caching DBMS: Provides C&C guarantees for individual query