“Good Enough” Database Caching
Hongfei Guo
University of Wisconsin-Madison
Motivation: Scaling Google
…
Motivation: Scaling a DBMS by Caching
How can we tell whether the cached data is “good enough” for an application?
- NO data quality requirements from the applications!
- NO data quality guarantees from the caching DBMS!
[Figure: app code on the application server queries a caching DBMS (…), which receives asynchronous updates from the backend DBMS]
The Thesis
[Figure: Application Server → Caching DBMS → Backend DBMS]
- Apps: Specify data quality requirements in queries
- Cache: Enforces data quality constraints [SIGMOD 2004] [SIGMOD 2004 Demo]
- Cache admin: Specifies the local data quality to be maintained by the cache (data quality-centric database caching model) [TR 2005] [VLDB 2005]
- System performance evaluation
Data Quality Metrics (informal)
- Currency: the time elapsed since this copy became stale
- Consistency: a query result is (snapshot) consistent iff it is the same as if it were evaluated from a snapshot of the master database
- C&C: Currency & Consistency
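To make the two informal definitions concrete, here is a minimal Python sketch. It assumes each cached copy records a hypothetical `stale_point` (the master time at which the copy first went out of date) and that each input of a query carries a hypothetical snapshot id. Treating "all inputs share one snapshot id" as consistency is a conservative simplification of the definition above, not the formal semantics.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CachedCopy:
    # Hypothetical field: time (seconds) at which the master copy was first
    # modified after this copy was taken; None means the copy is still current.
    stale_point: Optional[float]


def currency(copy: CachedCopy, now: float) -> float:
    """Elapsed time since the copy became stale; 0 if it is not stale yet."""
    if copy.stale_point is None or copy.stale_point >= now:
        return 0.0
    return now - copy.stale_point


def snapshot_consistent(snapshot_ids: Iterable[int]) -> bool:
    """Simplification: a result counts as snapshot consistent when every input
    it read was taken from the same master snapshot."""
    return len(set(snapshot_ids)) <= 1
```

For example, a copy that went stale at t = 100 s has a currency of 30 s at t = 130 s.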
Roadmap
- Background
- Specifying data quality constraints in SQL
- Data quality-centric caching model
- Enforcing data quality constraints
- System performance evaluation
- Other research
- Conclusions and future directions
Specifying Data Quality Constraints in SQL
[Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004]
- Currency requirements
- Consistency requirements
- Extend SQL to specify relaxed C&C requirements
- Formal semantics of C&C constraints
Currency Requirements
Example 1: The caching database keeps BookCopy, a copy of the Books table.
- Customer A is about to make a purchase, so he wants the data to be exactly current (high data quality is preferred).
- Customer B is browsing, so it is OK if the data is no more than 3 days out of sync (quick response time is preferred).
Different apps may have different currency requirements for the same query.
Consistency Requirements
Example 2:
SELECT *
FROM Books B, Reviews R
WHERE B.bid = R.bid AND
B.title = 'Databases'
Different apps may have different consistency requirements for the same query:
- The whole query result must be consistent
- Books must be consistent & Reviews must be consistent
- Each book must be consistent with its reviews
BookCopy:
bid | title | author
1 | databases | Raghu
2 | databases | Ullman
ReviewCopy:
rid | bid | text
1 | 1 | …
2 | 1 | …
3 | 2 | …
Query result:
bid | title | author | bid | rid | text
1 | databases | Raghu | 1 | 1 | …
1 | databases | Raghu | 1 | 2 | …
2 | databases | Ullman | 2 | 3 | …
Proposed SQL Syntax
Over the same BookCopy and ReviewCopy data as Example 2:
SELECT *
FROM Books B, Reviews R
WHERE B.bid = R.bid AND
B.title = 'Databases'
CURRENCY BOUND 10 min ON (B, R) BY B.bid
Here (B, R) is the consistency class, 10 min is the currency bound, and BY B.bid is the group-by: each book must be consistent with its reviews.
Clauses for the other two requirements:
CURRENCY BOUND 10 min ON (B, R)    (the whole result is consistent)
CURRENCY BOUND 10 min ON (B), 30 min ON (R)    (Books and Reviews are each consistent)
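One way to read a CURRENCY clause is as a check over the provenance of a query result. The Python sketch below is a simplified model, not the formal semantics from the SIGMOD 2004 paper: it assumes each result row records, for every table alias, a hypothetical `(snapshot_id, currency_in_minutes)` pair, and it treats "consistent" as "taken from a single snapshot". `CurrencyClause` and `satisfies` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class CurrencyClause:
    bound_min: float                # currency bound, e.g. 10 (minutes)
    tables: Tuple[str, ...]         # consistency class, e.g. ("B", "R")
    group_by: Optional[str] = None  # e.g. "B.bid": consistency only within each group


def satisfies(clause: CurrencyClause, rows: List[Dict[str, object]]) -> bool:
    """Every input from a table in the class must be within the bound, and all
    inputs in the same group (or in the whole result, if there is no BY) must
    come from a single snapshot. Each row maps an alias to a hypothetical
    (snapshot_id, currency_in_minutes) pair, plus the group-by column."""
    snapshots: Dict[object, Set[int]] = {}
    for row in rows:
        key = row[clause.group_by] if clause.group_by else None
        for table in clause.tables:
            snapshot_id, staleness = row[table]
            if staleness > clause.bound_min:
                return False
            snapshots.setdefault(key, set()).add(snapshot_id)
    return all(len(ids) == 1 for ids in snapshots.values())
```

With rows shaped like the Example 2 result (two reviews of book 1 read from one hypothetical snapshot, the review of book 2 from another), the whole-result clause fails, while the `BY B.bid` clause passes because each book only needs to be consistent with its own reviews.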
Specifying Data Quality Constraints in SQL: Contributions
- Extend SQL to express C&C constraints
  - Single-block queries
  - Multi-block (i.e., nested) queries
  - Timeline constraints
- Formal semantics of C&C constraints
Provides a correctness standard for using cached or replicated data
Data Quality-Centric Caching Model
[Guo, Larson and Ramakrishnan, submitted]
- Cache data quality properties
- Cache property specification
- Maintenance and “safety”
Why Define Cache Properties?
[Figure: Query processing ↔ Cache properties (= contract) ↔ Cache maintenance]
Cache Properties (P+3C)
- Presence: per object
- Consistency: a set of objects
- Completeness: per predicate
- Currency: object staleness
Basic Concepts
[Figure: the master database consists of tables of objects and evolves through snapshots (H1, H2); the cache holds views (View 1, View 2, View 3) of the master database]
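Cache Property Examples
Currency = now – stale point
[Figure: cached views (View 1, View 2, View 3) and master database snapshots (H1, H2), annotated with the properties present, complete, and consistent, and with the stale point of a cached view]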
Specifying Cache Properties
Specified as integrity constraints:
- Presence constraint
- Consistency constraint
- Completeness constraint
- Presence correlation constraint
- Consistency correlation constraint
Presence Constraint
AuthorCopy is a partially materialized view [Zhou et al 2005] in the caching DBMS; its content is governed by the control table AuthorList_PCT, with authorId as the control key.
CREATE VIEW AuthorCopy AS
SELECT * FROM Authors

CREATE TABLE AuthorList_PCT (authorId int)

ALTER VIEW AuthorCopy ADD
PRESENCE ON authorId IN
(SELECT authorId FROM AuthorList_PCT)

AuthorCopy (copied from Authors in the backend DBMS):
authorId | name | city
1 | Alice | Madison
2 | Bob | Madison
3 | Cedric | Seattle
AuthorList_PCT:
authorId
1
2
3
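The presence constraint can be modeled in a few lines of Python. This is a sketch under the assumption that tables are in-memory lists of tuples; `materialize` and `presence_holds` are hypothetical helpers, and the author rows are the ones shown above.

```python
from typing import Dict, List, Set, Tuple

Author = Tuple[int, str, str]  # (authorId, name, city)

BACKEND_AUTHORS: List[Author] = [
    (1, "Alice", "Madison"),
    (2, "Bob", "Madison"),
    (3, "Cedric", "Seattle"),
]


def materialize(backend: List[Author], control_keys: Set[int]) -> Dict[int, Author]:
    """Populate the partially materialized view: keep exactly the backend rows
    whose control key (authorId) is listed in the presence control table."""
    return {row[0]: row for row in backend if row[0] in control_keys}


def presence_holds(cache: Dict[int, Author], backend: List[Author],
                   control_keys: Set[int]) -> bool:
    """Presence constraint: every backend row whose key is in the control table is cached."""
    return all(row[0] in cache for row in backend if row[0] in control_keys)
```

Inserting or deleting keys in the control table is thus the admin's way of deciding which rows the cache must hold.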
Consistency Constraint
CREATE TABLE CityList_CsCT (city string)

ALTER VIEW AuthorCopy ADD
CONSISTENCY ON city IN
(SELECT city FROM CityList_CsCT)

Each city in CityList_CsCT defines a cache region: all AuthorCopy rows with that city (here Alice and Bob, in Madison) must be mutually consistent.
AuthorCopy:
authorId | name | city
1 | Alice | Madison
2 | Bob | Madison
3 | Cedric | Seattle
CityList_CsCT:
city
Madison
AuthorList_PCT:
authorId
1
2
3
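A consistency constraint can be checked per cache region. The sketch below assumes each cached row is tagged with a hypothetical `snapshot_id` naming the master snapshot it was copied from, and treats a region as consistent when all its rows share one snapshot; `consistency_holds` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cached row: (authorId, city, snapshot_id).
CachedAuthor = Tuple[int, str, int]


def consistency_holds(cache: List[CachedAuthor], control_cities: Set[str]) -> bool:
    """For each city listed in the consistency control table, all cached rows with
    that city (one cache region) must come from one snapshot. Rows whose city is
    not listed are unconstrained."""
    regions: Dict[str, Set[int]] = {}
    for _author_id, city, snapshot_id in cache:
        if city in control_cities:
            regions.setdefault(city, set()).add(snapshot_id)
    return all(len(ids) == 1 for ids in regions.values())
```

Refreshing Alice's row without Bob's would split the Madison region across two snapshots and violate the constraint.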
Completeness Constraint
CREATE TABLE CityList_CpCT (city string)

ALTER VIEW AuthorCopy ADD
COMPLETENESS ON city IN
(SELECT city FROM CityList_CpCT)

AuthorList_PCT now lists only authors 1 and 3, yet Bob (authorId 2) is still cached: completeness on Madison requires every Madison author in the backend DBMS to be present in AuthorCopy.
AuthorCopy:
authorId | name | city
1 | Alice | Madison
2 | Bob | Madison
3 | Cedric | Seattle
CityList_CpCT:
city
Madison
AuthorList_PCT:
authorId
1
3
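The interplay of presence and completeness in this example can be sketched as follows, assuming in-memory tables. The rows that must be cached are the union of the presence-controlled keys and every row matching a completeness predicate. `required_keys` and `completeness_holds` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Author = Tuple[int, str, str]  # (authorId, name, city)

BACKEND_AUTHORS: List[Author] = [
    (1, "Alice", "Madison"),
    (2, "Bob", "Madison"),
    (3, "Cedric", "Seattle"),
]


def required_keys(backend: List[Author], presence_keys: Set[int],
                  complete_cities: Set[str]) -> Set[int]:
    """Keys that must be cached: those in the presence control table plus every
    row matching a completeness predicate (city IN the completeness control table)."""
    return {a for a, _name, city in backend if a in presence_keys or city in complete_cities}


def completeness_holds(cached_keys: Set[int], backend: List[Author],
                       complete_cities: Set[str]) -> bool:
    """Completeness on city: no backend row with a listed city may be missing."""
    return all(a in cached_keys for a, _name, city in backend if city in complete_cities)
```

With AuthorList_PCT = {1, 3} and CityList_CpCT = {Madison}, the required keys are {1, 2, 3}, which matches the AuthorCopy content shown above.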
Presence Correlation Constraint
ALTER VIEW BookCopy ADD
PRESENCE ON authorId IN
(SELECT authorId FROM AuthorCopy)

The presence of BookCopy rows is controlled by AuthorCopy: a book is cached when its authorId is present in AuthorCopy.
Cache graph: AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy
AuthorList_PCT:
authorId
1
2
3
AuthorCopy:
authorId | name | city
1 | Alice | Madison
2 | Bob | Madison
3 | Cedric | Seattle
BookCopy:
isbn | authorId | title
111 | 1 | aaa
222 | 1 | bbb
333 | 2 | ccc
444 | 3 | ddd
555 | 3 | eee
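A presence correlation constraint makes a child view's content follow its parent view. In this Python sketch, assuming in-memory tables, the hypothetical helper `correlated_presence` keeps the books whose authorId is present in AuthorCopy; the book rows are the ones shown above.

```python
from typing import List, Set, Tuple

Book = Tuple[int, int, str]  # (isbn, authorId, title)

BACKEND_BOOKS: List[Book] = [
    (111, 1, "aaa"), (222, 1, "bbb"), (333, 2, "ccc"), (444, 3, "ddd"), (555, 3, "eee"),
]


def correlated_presence(child_rows: List[Book], parent_keys: Set[int]) -> List[Book]:
    """A BookCopy row is present exactly when its authorId is present in the
    parent view AuthorCopy, rather than in a control table of its own."""
    return [book for book in child_rows if book[1] in parent_keys]
```

If AuthorCopy held only authors 1 and 3, BookCopy would hold books 111, 222, 444 and 555. Removing an author from AuthorList_PCT therefore cascades down the cache graph.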
Consistency Correlation Constraint
ALTER VIEW BookCopy ADD
CONSISTENCY ROOT
[Figure: cache graph AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy, over the same AuthorList_PCT (1, 2, 3), AuthorCopy, and BookCopy data as in the presence correlation example]
Cache Schema Example
[Figure: cache graph over AuthorList_PCT, AuthorCopy, BookCopy, ReviewCopy, ReviewerCopy and ReviewerList_PCT, with edges labeled authorId, isbn, reviewerId and reviewId]
Pull-Maintenance
- Refresh a region by pulling query results
- When refreshing a region, also refresh the affected closure:
  - All overlapping regions
  - All correlated regions
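The affected closure can be computed as a graph reachability problem. The sketch below assumes regions are graph nodes, overlap edges are symmetric, and correlation edges are followed forward only (parent to child, as the "Issues" slide suggests). The region labels such as `BookCopy[title=aaa]` are hypothetical names based on the example that follows, where the title 'aaa' consistency region overlaps the books of authors 1 and 3.

```python
from collections import deque
from typing import Dict, Set


def affected_closure(start: str,
                     overlaps: Dict[str, Set[str]],
                     correlated: Dict[str, Set[str]]) -> Set[str]:
    """Regions to refresh together with `start`: everything reachable through
    overlap edges (treated as symmetric) or forward correlation edges."""
    neighbors: Dict[str, Set[str]] = {}
    for a, bs in overlaps.items():
        for b in bs:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)  # overlap is symmetric
    for parent, children in correlated.items():
        neighbors.setdefault(parent, set()).update(children)  # forward only
    seen = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        for nxt in neighbors.get(region, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Refreshing author 1 then also pulls author 1's books, the overlapping 'aaa' title region, and through it author 3's books, but not AuthorCopy[3], because correlation edges are never followed backward.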
Pull-Maintenance
AuthorList_PCT:
authorId
1
3
4
TitleList_CsCT:
title
aaa
BookCopy:
isbn | authorId | title
111 | 1 | aaa
222 | 1 | bbb
333 | 1 | ccc
444 | 3 | aaa
555 | 4 | eee
Pull-Maintenance
Cache graph: AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy
AuthorCopy:
authorId | name | city
1 | Alice | Madison
3 | Cedric | Seattle
BookCopy:
isbn | authorId | title
111 | 1 | aaa
222 | 1 | bbb
333 | 1 | ccc
444 | 3 | aaa
555 | 3 | eee
Inefficient Pulling
AuthorCopy:
authorId | name | city
1 | Alice | Madison
3 | Cedric | Seattle
AuthorBookCopy links authors to books (authorId, isbn); book 111 is linked to both author 1 and author 3 (the shared-row problem).
BookCopy:
isbn | price | title
111 | 10 | aaa
222 | 20 | bbb
333 | 30 | ccc
555 | 50 | eee
Issues
- Inefficient pulling:
  - Calculating the affected closure requires checking the rows
- Efficient pulling:
  - The affected closure does NOT depend on the instance of a view
  - Only requires forward pulls among correlated views
Theoretical Results
Definition (safe partially materialized views): A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints:
1. For any pair of regions in V, either they don't overlap or one is contained in the other.
2. If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V, and no pair of regions in X is contained in any one region defined on V.
Cache schema design rules (syntactically checkable conditions):
- Rule 1: A cache graph is a DAG.
- Rule 2: Only red nodes can have independent completeness or consistency control-tables.
- Rule 3: Every PMV with more than one parent must be a red circle.
- Rule 4: If a PMV has the shared-row (polynomial) problem according to Lemma 5.2, then it cannot be gray.
- Rule 5: A PMV cannot have noncompatible control-tables.
Theorem: Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.
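Because the design rules are syntactic, two of them can be checked directly on the cache graph. This Python sketch covers only Rule 1 (acyclicity, via Kahn's topological sort) and Rule 3 (multi-parent PMVs must be red circles), with node colors and shapes supplied by the caller; Rules 2, 4 and 5 depend on definitions (node colors, Lemma 5.2) not given here. The edge list is a hypothetical reading of the cache schema example, including the assumed ReviewerCopy → ReviewCopy direction.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child) in the cache graph


def is_dag(edges: List[Edge]) -> bool:
    """Rule 1: the cache graph must be acyclic (Kahn's topological sort)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)  # some node never reached indegree 0 => cycle


def rule3_violations(edges: List[Edge], red_circles: Set[str]) -> Set[str]:
    """Rule 3: every PMV with more than one parent must be a red circle."""
    parents: Dict[str, Set[str]] = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return {node for node, ps in parents.items() if len(ps) > 1 and node not in red_circles}
```

In the example schema, ReviewCopy has two parents (BookCopy and ReviewerCopy), so it must be a red circle for the schema to pass Rule 3.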
Data Quality-Centric Caching Model: Contributions
- Four cache properties
- Cache property unit: cache region
- Safe views and efficient pulling
Specifying cache properties provides an abstraction layer (contract) between query processing and cache maintenance.
Enforcing Data Quality Constraints: Overview
- Simple case: view-level consistency
  - [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004]
  - [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004 Demo]
  - Implemented in MS SQL Server code base
- General case: row-level consistency
  - [Guo, Larson and Ramakrishnan, submitted]
Extension to MTCache Framework (MTCache Framework: [Larson et al. 2004])
[Figure: queries with relaxed C&C requirements are submitted to the caching DBMS, which contains shadow databases, the query optimizer, cache region metadata, local materialized views, the execution engine, and heartbeat tables; the caching DBMS works with the backend DBMS and returns results]
Simple Case Assumptions
- Fully materialized views
- Each view is consistent
- Push-based maintenance
  - E.g., MS replication service
C&C Tracking Mechanism
- Consistency tracking → cache region (CR)
  - The unit of update propagation
  - Data are mutually consistent all the time
  - Properties, e.g., est. delay, est. interval
- Currency tracking → heartbeat table
[Figure: cache regions CR1 (V1, V2, V3) and CR2 (V4, V5) in the cache; heartbeat tables with columns Cid and Timestamp at the backend and in the cache]
Extension to MTCache Framework
For queries with relaxed C&C requirements, the query optimizer (using shadow databases and cache region metadata) produces the best plan that:
- Satisfies consistency requirements
- Includes run-time currency checking
The execution engine runs the plan using local materialized views, heartbeat tables, and the backend DBMS, and returns the results.
Extension to the Optimizer
- Compile-time consistency checking
- Run-time currency checking
- Cost estimation
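Consistency Checking
- Enforced at optimization time
- Immediately prune a sub-plan if it violates consistency constraints
Example:
Q1: σ(Books ⋈ Reviews)
CURRENCY 5 ON (Books, Reviews)
A merge-join sub-plan that combines a local scan of Reviews with a remote query on Books reads the two tables from different sources, so it violates the consistency requirement on (Books, Reviews) and is pruned.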
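Compile-time pruning can be illustrated with a brute-force stand-in for the optimizer's search. This sketch enumerates, for each table, whether to read a local view (identified by its cache region) or the backend, and discards every assignment that reads one consistency class from more than one source. It assumes, as in the simple case, that each cache region is internally consistent and that all remote inputs are fetched together in one remote query. `consistent_plans` and the region name "CR1" are hypothetical; a real optimizer prunes sub-plans during its search rather than enumerating full assignments.

```python
from itertools import product
from typing import Dict, List, Set

BACKEND = "backend"


def consistent_plans(tables: List[str],
                     local_region: Dict[str, str],
                     consistency_class: Set[str]) -> List[Dict[str, str]]:
    """Enumerate source assignments (local cache region or backend) for each table
    and prune every assignment that reads the consistency class from more than one
    source."""
    choices = [[local_region[t], BACKEND] if t in local_region else [BACKEND]
               for t in tables]
    plans = []
    for assignment in product(*choices):
        plan = dict(zip(tables, assignment))
        sources = {plan[t] for t in consistency_class}
        if len(sources) <= 1:  # otherwise the sub-plan is pruned
            plans.append(plan)
    return plans
```

For Q1 with only Reviews cached locally, the only surviving plan reads both tables from the backend; if Books and Reviews were both in cache region CR1, the all-local plan would survive as well.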
Run-time Currency Checking
When view V matches expression E, a ChoosePlan operator selects between:
- A local plan using V, guarded by a currency guard
- A remote plan requesting E
Currency guard: Check whether the local view V satisfies the currency requirement
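A minimal sketch of the ChoosePlan switch, assuming the currency guard reads the newest heartbeat timestamp that has been propagated to V's cache region. If the cache reflects the master as of that heartbeat, the data can only have gone stale after it, so `now - last_heartbeat` is an upper bound on V's currency. `currency_guard` and `choose_plan` are hypothetical names, not the prototype's operators.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def currency_guard(last_heartbeat: float, now: float, bound: float) -> bool:
    """Accept the local branch only if the upper bound on V's currency
    (now - last_heartbeat) is within the query's currency bound."""
    return now - last_heartbeat <= bound


def choose_plan(local_plan: Callable[[], T], remote_plan: Callable[[], T],
                last_heartbeat: float, now: float, bound: float) -> T:
    """Run-time switch between the local plan using V and the remote plan for E."""
    if currency_guard(last_heartbeat, now, bound):
        return local_plan()
    return remote_plan()
```

Both branches are compiled into the plan, so the decision costs only a comparison at run time. Using the heartbeat keeps the guard conservative: it may send a query remote unnecessarily, but it never accepts data that violate the bound.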
Cost Estimation
Cost for the SwitchUnion operator:
$C = p \cdot C_{local} + (1 - p) \cdot C_{remote} + C_{cg}$
where
- $p$: probability that the local branch will be used
- $C_{local}$: cost of executing the local branch
- $C_{remote}$: cost of executing the remote branch
- $C_{cg}$: cost of currency checking
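The cost formula translates directly into code. In this sketch the function name and the cost values used below are hypothetical, in arbitrary cost units.

```python
def switch_union_cost(p: float, c_local: float, c_remote: float, c_cg: float) -> float:
    """Expected cost of the SwitchUnion operator:
    C = p * C_local + (1 - p) * C_remote + C_cg."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * c_local + (1.0 - p) * c_remote + c_cg
```

With C_local = 10, C_remote = 100 and C_cg = 1, a guarded plan with p = 0.5 costs 5 + 50 + 1 = 56. It beats a remote-only plan (cost 100) whenever 10p + 100(1 - p) + 1 < 100, that is, whenever p > 1/90, so the currency guard pays for itself even when the local branch is rarely usable.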
Estimating p
Compute p from three parameters:
- f: estimated refresh interval
- d: estimated minimal delay
- B: currency bound
$$p = \begin{cases} 0 & \text{if } B - d \le 0 \\ \dfrac{B - d}{f} & \text{if } 0 < B - d \le f \\ 1 & \text{if } B - d > f \end{cases}$$
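One way to read the formula: between refreshes, the local data's currency grows from d to d + f, so a query arriving at a uniformly random time finds the currency within B for a fraction (B - d)/f of the interval, clamped to [0, 1]. The sketch below implements the piecewise definition; `estimate_p` is a hypothetical name.

```python
def estimate_p(bound: float, min_delay: float, refresh_interval: float) -> float:
    """Probability that the local branch is used, from currency bound B,
    estimated minimal delay d and estimated refresh interval f."""
    slack = bound - min_delay           # B - d
    if slack <= 0:
        return 0.0                      # even fresh local data is too stale
    if slack <= refresh_interval:
        return slack / refresh_interval
    return 1.0                          # local data always meet the bound
```

For example, B = 10, d = 2 and f = 16 give p = 8/16 = 0.5.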
Changing The Assumptions
- Fully materialized views → partially materialized views
- Consistent views → row-level consistency
- Push-based maintenance → pull-based maintenance
These changes require more general algorithms, including a run-time check for consistency constraints that cannot be validated at compile time.
Run-time C&C Checking
When view V matches expression E, a ChoosePlan operator selects between:
- A local plan using V, guarded by a C&C guard
- A remote plan requesting E
Currency guard: Check whether the local view V satisfies the currency requirement
Consistency guard: Check whether the local view V satisfies the consistency requirement
Performance Evaluation Goals
- Currency guard overhead
- Consistency guard overhead
  - Simple checks
  - A spectrum of checks ranging from simple to complicated
Experimental Setting
- The backend hosts a TPC-D database, tpcd1gh, with scale factor 1.0 (~1 GB)
- The cache server has a shadow of tpcd1gh
- Two local views: custCopy, orderCopy
- LAN connection between the cache and the backend server
Queries Used
Qa: key select
SELECT *
FROM Customers C
WHERE c_custkey = 1
CURRENCY 10 ON (C)

Qb: join query
SELECT *
FROM Customers C, Orders O
WHERE c_custkey = o_custkey AND c_custkey = 1
CURRENCY 10 ON (C), 20 ON (O)

Qc: non-key select
SELECT *
FROM Customers C
WHERE c_nationkey = 1
CURRENCY 10 ON (C)
Currency Guards Overhead
[Chart: execution time (ms) of Qa, Qb and Qc with local and remote plans, split into query time and currency-guard time; the guard overhead labels range from 0.41% to 21.3%]
Simple Consistency Guards Overhead
[Chart: execution time (ms) of Qa, Qb and Qc with local and remote plans, split into query time and consistency-guard time; the guard overhead labels range from 1.59% to 16.56%]
Single Table Consistency Guard Overhead
[Chart (Qa is used): execution time (ms) for consistency guards A11a, A11b, A12, S11 and S12 with local and remote plans, split into query time and consistency-guard time; the guard overhead labels range from 2.33% to 71.41%]
Enforcing Data Quality Constraints: Contributions
- Algorithms for enforcing C&C constraints in query processing
- Implemented a prototype in the MS SQL Server code base for a restricted case
Provides DBMS guarantees for C&C requirements
System Performance Evaluation
- Push vs. pull maintenance
- Performance model
- Model parameters and settings
- Experiments and analysis
“Push” Maintenance
[Figure: a publication containing views V1 to V5 at the backend; updates are captured by log sniffing into a distribution database, and distribution agents push them to the caches' subscriptions to V1 to V5]
Push vs. Pull
“Push” model:
- Incremental
- Only view-level regions
- Only limited types of views
  - Selection and projection views
“Pull” model:
- Re-computing
- Maximal flexibility
Performance Model Overview
Single-site DBMS ([ACL87]) → model refinement → cache-master configuration
Model refinement:
- User model
  - Transaction model
  - Data quality requirements
- Cache region concept
- Cache-master interaction
  - Cache maintenance
  - Transaction processing
- Consistency-class based locking for the cache
- Network cost
- Sequential vs. random disk access
Logical Queuing Model [ACL87] (single-site)
[Figure: transactions cycle from terminals (think delay) through a ready queue, a CC queue (concurrency control, which may BLOCK into a blocked queue or RESTART), an object queue (ACCESS), and an update queue (UPDATE)]
Physical Queuing Model [ACL87] (single-site)
[Figure: terminals with think delay, a ready queue, and queues for the CPUs and disks]
Queuing Model for a Cache-Master Configuration
[Figure: terminals submit and commit transactions at the master and at each cache; the cache sends remote queries to the master's remote queue, and distribution agents carry updates from the master to the cache as refresh transactions]
Model Parameters for Single-Site DBMS
Parameter | Meaning
db_size | Number of objects in the database
mpl | Multiprogramming level
max_size | Size of the largest transaction
min_size | Size of the smallest transaction
write_prob | Pr(write X | read X)
read_only_percentage | Percentage of read-only transactions
ext_think_time | Mean transaction think time
obj_io | Disk time for accessing an object
obj_io_seek | Disk seek time
obj_io_transfer | Disk transfer time for an object
obj_cpu | CPU time for accessing an object
num_cpus | Number of CPUs
num_disks | Number of disks
Model Parameters for a Cache-Master Configuration (1)
Parameter | Meaning
num_terms_total | Total number of terminals
num_terms_cache | Number of terminals at a cache
num_caches | Number of caches
network_delay_query | Network delay for sending a query
network_delay_transfer | Network delay for sending an object
num_regions | Number of cache regions at each cache
max_num_classes | Maximal number of consistency classes per Xact
min_num_classes | Minimal number of consistency classes per Xact
num_classes | Number of consistency classes of the database
refresh_interval | Refresh interval
currency_bound | Currency bound
Model Parameters for a Cache-Master Configuration (2)
Parameter | Meaning
log_sniffing_fixed_cpu | Fixed part of CPU time for log sniffing a transaction
log_sniffing_unit_cpu | Unit CPU time for log sniffing a write action
log_sniffing_fixed_disk | Fixed part of disk time for log sniffing a transaction
log_sniffing_unit_disk | Unit disk time for log sniffing a write action
distribution_fixed_cpu | Fixed part of CPU time for distributing updates
distribution_unit_cpu | Unit CPU time for distributing a write action
distribution_fixed_disk | Fixed part of disk time for distributing updates
distribution_unit_disk | Unit disk time for distributing a write action
seq_prob_copier | Pr(copier reads are sequential)
seq_prob_refresh | Pr(copier writes are sequential)
Parameter Setting (1)
Parameter | Value
db_size | 10,000 pages
num_terms_total | 300
num_terms_cache | 15
mpl | 50
max_size | 12-page readset (maximum)
min_size | 4-page readset (minimum)
write_prob | 0.25
read_only_percentage | 90
ext_think_time | 1 second
obj_io | 35 milliseconds
obj_io_seek | 30 milliseconds
obj_io_transfer | 5 milliseconds
obj_cpu | 15 milliseconds
Parameter Setting (2)
Parameter | Value
num_cpus (master) | 2
num_disks (master) | 4
num_cpus (cache) | 1
num_disks (cache) | 2
num_caches | 0, 1, 3, 5, 8, 10, 13, and 15
network_delay_query | 20 milliseconds
network_delay_transfer | 5 milliseconds
seq_prob_copier | 1
seq_prob_refresh | 0 for push, 1 for pull
num_regions | 1
Parameter Setting (3)
Parameter | Value
log_sniffing_fixed_cpu | 15 milliseconds
log_sniffing_unit_cpu | 5 milliseconds
log_sniffing_fixed_disk | 20 milliseconds
log_sniffing_unit_disk | 5 milliseconds
distribution_fixed_cpu | 200 milliseconds
distribution_unit_cpu | 5 milliseconds
distribution_fixed_disk | 200 milliseconds
distribution_unit_disk | 5 milliseconds
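The fixed/unit split of the maintenance parameters suggests a linear cost per transaction or batch. The sketch below assumes cost = fixed part + unit part × number of write actions, which is a reading of the parameter names rather than a model taken from the source. It also estimates the number of writes per update transaction under an added assumption that readset sizes are uniform between min_size and max_size. All function names are hypothetical.

```python
from typing import Dict, Tuple

# Settings from the table above, in milliseconds: (fixed part, unit part per write action).
SETTINGS_MS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "log_sniffing": {"cpu": (15.0, 5.0), "disk": (20.0, 5.0)},
    "distribution": {"cpu": (200.0, 5.0), "disk": (200.0, 5.0)},
}


def maintenance_cost_ms(stage: str, resource: str, num_writes: int) -> float:
    """Assumed linear cost model: fixed part + unit part * number of write actions."""
    fixed, unit = SETTINGS_MS[stage][resource]
    return fixed + unit * num_writes


def expected_writes(min_size: int = 4, max_size: int = 12, write_prob: float = 0.25) -> float:
    """Expected writes per update transaction, assuming readset sizes are uniform
    between min_size and max_size and each read object is written with write_prob."""
    return (min_size + max_size) / 2 * write_prob
```

Under these assumptions, a typical update transaction performs about (4 + 12) / 2 × 0.25 = 2 writes. Log sniffing it costs 15 + 5 × 2 = 25 ms of CPU. The 200 ms fixed parts of distribution then dominate, which is one reason refresh frequency matters.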
Performance Metrics
- Throughput: number of transactions completed per second
- Response time: elapsed time between transaction submission and completion
- Local workload ratio: ratio of the number of reads completed at the caches to the total number of reads submitted to the caches
- Conflict ratios:
  - Blocking ratio: average number of times that a transaction has to block per commit
  - Restarting ratio: average number of times that a transaction has to restart per commit
- Utilization:
  - Disk utilization
  - CPU utilization
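The first four metrics are simple ratios over counters collected during a simulation run. This sketch assumes hypothetical counter names and reports mean response time. Utilization is omitted because it depends on per-resource busy times that the model tracks separately.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunCounters:
    # Hypothetical raw counters collected over one simulation run.
    elapsed_s: float
    commits: int
    total_response_time_s: float  # summed over committed transactions
    reads_submitted_to_caches: int
    reads_completed_at_caches: int
    blocks: int
    restarts: int


def metrics(c: RunCounters) -> Dict[str, float]:
    return {
        "throughput_tps": c.commits / c.elapsed_s,
        "mean_response_time_s": c.total_response_time_s / c.commits,
        "local_workload_ratio": c.reads_completed_at_caches / c.reads_submitted_to_caches,
        "blocking_ratio": c.blocks / c.commits,
        "restart_ratio": c.restarts / c.commits,
    }
```

For instance, 500 commits in 100 s give a throughput of 5 transactions per second. If 3,000 of the 4,000 reads submitted to the caches complete there, the local workload ratio is 0.75.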
Experiments and Analysis
- Impact of writes
- Impact of relaxing refresh interval
- Impact of relaxing data quality requirements
  (the three experiments above use only one cache region, with push maintenance)
- Impact of push vs. pull
  (equal-sized cache regions, push vs. pull)
Impact of Writes
- Scenario 1: never refresh
- Scenario 2: continuous refresh
[Chart: System throughput (∞ currency bound, ∞ refresh interval)]
[Chart: System throughput (∞ currency bound, 0 refresh interval)]
Summary
- Improvement is marginal when the read-only percentage is low (80, 70, 50)
- Cache maintenance overhead worsens the situation
Impact of Relaxing Refresh Interval
- Scenario 1: low cache maintenance overhead
- Scenario 2: high cache maintenance overhead
[Chart: System throughput (low overhead)]
[Chart: System throughput (high overhead)]
Summary
The cache maintenance overhead increases when:
- the number of caches increases
- the maintenance overhead increases
Impact of Relaxing Data Quality Requirements
- Scenario 1: 0 refresh interval
- Scenario 2: 5 s refresh interval
- Scenario 3: 50 s refresh interval
- Scenario 4: ∞ refresh interval
[Chart: System throughput (0 refresh interval)]
[Chart: System throughput (5 s refresh interval)]
[Chart: System throughput (50 s refresh interval)]
[Chart: System throughput (∞ refresh interval)]
[Chart: Local workload ratio (0 refresh interval)]
[Chart: Local workload ratio (5 s refresh interval)]
[Chart: Local workload ratio (50 s refresh interval)]
[Chart: Local workload ratio (∞ refresh interval)]
Summary
- Tradeoff between refresh interval and currency bound
  - Refresh interval → refresh overhead
  - Currency bound → local workload ratio
- Choose an appropriate refresh interval according to workload currency bounds
  - Balancing the refresh interval with the currency bound → better system performance
Impact of Push vs. Pull
Settings:
- Skewed setting (decaying currency bounds)
- Uniform setting (same currency bound)
Number of cache regions:
- Push: 1, 20, 40 and 100
- Pull: 100, 200, 500 and 1,000
[Chart: System throughput (skewed, push)]
[Chart: System throughput (skewed, pull)]
[Chart: Local workload ratio (skewed, push)]
[Chart: Local workload ratio (skewed, pull)]
[Chart: System throughput (non-skewed, push)]
[Chart: System throughput (non-skewed, pull)]
[Chart: Local workload ratio (non-skewed, push)]
[Chart: Local workload ratio (non-skewed, pull)]
Summary
- Impact of fine cache region granularity:
  - Smaller regions → more opportunity for lazy maintenance
  - More copier/refresh transactions
  - Finer granularity → worse performance for non-skewed workloads
- Choose an appropriate cache region granularity according to workload C&C requirements
Performance Modeling: Contributions
- Developed a detailed model for a complex system: a data quality-aware cache-master configuration
- Systematic performance evaluation
Provides insights into performance tradeoffs
Related Work
Relaxing data quality
- Distributed databases
  - Read-only transactions [Garcia-Molina et al. 1982]
  - Demarcation protocol [Barbará et al. 1992]
  - TACC [Yu et al. 2000]
  - Epsilon-serializability [Pu et al. 1992]
- Warehousing and web views
  - WebViews [Labrinidis et al. 2003]
- Replica management
  - Quasi-copies [Alonso et al. 1998], [Gallersdörfer et al. 1995]
- Others
  - FAS [Röhm et al. 2002]
  - Obsolescent views [Gal 1999]
  - Distributed views [Segev et al. 1990]
  - Good-enough views [Seligman et al. 1997]
  - TRAPP [Olson et al. 2000]
Caching
- Database caching
  - DBCache [Altinel et al. 2003]
  - Constraint-based database caching [Härder et al. 2004]
  - Mid-Tier caching [TimesTen 2002]
  - Shared-storage caching [Khalil et al. 2002]
- Others
  - Semantic caching [Dar et al. 1996]
  - Cache in Postgres [Stonebraker et al. 1990]
  - Predicate-based caching [Keller et al. 1996]
  - Freshness-driven web caching [Li et al. 2003]
  - WATCHMAN [Scheuermann et al. 1996]
  - Cache investment [Kossmann et al. 2000]
  - DECAF [Kiernan et al. 2000]
  - Proxy caching [Luo et al. 2001]
Uniqueness of our approach (query-centric):
- Query: specifies fine-grained C&C constraints
- Admin: flexible local data quality control in terms of granularity and properties
- Caching DBMS: provides C&C guarantees for each individual query
Other Research
- UW:
  - Indexing large-scale, dynamic one-dimensional intervals [In preparation]
    - A family of data structures
    - Differed index
  - Evaluating different locking protocols for database caching [ongoing]
  - Quality-of-service evaluation of multicast streaming protocols [SIGMETRICS 2002]
- MS: SchemaGen project [Software released]
  - Designed and implemented a relational schema generator for annotated XML schemas
- MSR-Redmond: RECYCLE project
  - Added support for update statistics for query result caching in SQL Server
Future Directions
- Improve the current prototype
  - Read-write transactions?
  - Timeline constraints?
- Adaptive data quality-aware caching policies
  - Control-table content?
  - Refresh intervals?
- Automate cache design/tuning
  - How to get a good cache schema? (i.e., cache region granularity, assignment)
- Apply “good enough” to other forms of replication
  - Indexing data?
Summary
- Problem: gap between applications and the caching DBMS
- A comprehensive solution:
  - Specifying data quality constraints
  - Data quality-centric cache model
  - Enforcing data quality constraints
  - Systematic performance evaluation
So long, and thanks for all the fish!
Questions?