Caching with “Good Enough” Currency, Consistency, and Completeness
Hongfei Guo, University of Wisconsin
Per-Åke Larson, Microsoft Research
Raghu Ramakrishnan, University of Wisconsin

Motivation: Scaling Google
…

Motivation: Scaling a DBMS by Caching
Diagram labels: Application Server, App code, Caching DBMS, Asynchronous Updates, Backend DBMS
How to tell whether the cached data is “good enough” for an application?
NO data quality requirements from the applications!
NO specific data quality guarantees from the caching DBMS!

The Big Picture
Diagram labels: Application Server, Caching DBMS, Backend DBMS
Apps: Specify data quality requirements in queries
Cache: Enforces data quality constraints
  View-level granularity [SIGMOD 2004] [SIGMOD 2004 Demo]
Cache admin: Specifies local data quality to be maintained by cache
  Finer granularity (partitions of a view)
  (Data quality-aware database caching model) [This presentation]
System performance evaluation [dissertation]

Data Quality Metrics (informal)
Currency: The elapsed time since this copy became stale
Consistency: A query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database
C&C: Currency & Consistency

Roadmap
Background
Cache data quality properties
Cache property specification
Enforcing data quality constraints
Experiments
Future directions and conclusions

Why Define Cache Properties?
Diagram labels: Query processing, Cache Properties (= contract), Cache maintenance

Cache Properties (P+3C)
Presence: per object
Consistency: a set of objects
Completeness: per predicate
Currency: object staleness

Basic Concepts
Diagram labels: Tables, Object, Master Database (H1), Snapshots, View 1, View 2, View 3, Cache (H2)

Cache Property Examples
Currency = now - stale point
Diagram labels: Consistent, Complete, Present, Stale point, Master Database (H1), View 1, View 2, View 3, Cache (H2)

Roadmap
Background
Cache data quality properties
Cache property specification
Enforcing data quality constraints
Experiments
Future directions and conclusions

Specifying Cache Properties
Specified as integrity constraints:
  Presence constraint
  Consistency constraint
  Completeness constraint
  Presence correlation constraint
  Consistency correlation constraint

Presence Constraint
Diagram labels: Backend DBMS, Caching DBMS
Partially materialized view [Zhou et al 2005], with a control key and a control table:
  CREATE VIEW AuthorCopy AS SELECT * FROM Authors
  CREATE TABLE AuthorList_PCT (authorId int)
  ALTER VIEW AuthorCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorList_PCT)

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  2         Bob     Madison
  3         Cedric  Seattle

AuthorList_PCT (authorId): 1, 2, 3

Consistency Constraint
Diagram labels: Backend DBMS, Cache Region
  CREATE TABLE CityList_CsCT (city string)
  ALTER VIEW AuthorCopy ADD CONSISTENCY ON city IN (SELECT city FROM CityList_CsCT)

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  2         Bob     Madison
  3         Cedric  Seattle

CityList_CsCT (city): Madison
AuthorList_PCT (authorId): 1, 2, 3

Completeness Constraint
Diagram label: Backend DBMS
  CREATE TABLE CityList_CpCT (city string)
  ALTER VIEW AuthorCopy ADD COMPLETENESS ON city IN (SELECT city FROM CityList_CpCT)

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  2         Bob     Madison
  3         Cedric  Seattle

CityList_CpCT (city): Madison
AuthorList_PCT (authorId): 1, 3
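The presence and completeness constraints above can be illustrated with a small in-memory model. This is a minimal sketch under one reading of the slides, not the authors' implementation: a row must be cached if its control key is in the presence control table, or if it matches a predicate value listed in the completeness control table. The names `MASTER_AUTHORS`, `required_rows`, and `satisfies_constraints` are hypothetical.

```python
# Hypothetical in-memory model of the AuthorCopy example (illustration only).
MASTER_AUTHORS = [
    (1, "Alice", "Madison"),
    (2, "Bob", "Madison"),
    (3, "Cedric", "Seattle"),
]


def required_rows(master, presence_keys, complete_cities):
    """Rows the cached view must hold under the two constraints."""
    return [
        row for row in master
        if row[0] in presence_keys      # presence constraint on authorId
        or row[2] in complete_cities    # completeness constraint on city
    ]


def satisfies_constraints(view_rows, master, presence_keys, complete_cities):
    """True if the view contains every required row."""
    needed = set(required_rows(master, presence_keys, complete_cities))
    return needed <= set(view_rows)


# Control-table contents from the Completeness Constraint slide.
author_list_pct = {1, 3}
city_list_cpct = {"Madison"}
author_copy = required_rows(MASTER_AUTHORS, author_list_pct, city_list_cpct)
```

With these control tables, Bob (authorId 2) has to be cached even though he is not listed in AuthorList_PCT, because the view must be complete for city = Madison.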
Presence Correlation Constraint
Diagram label: Backend DBMS
  ALTER VIEW BookCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorCopy)

AuthorList_PCT (authorId): 1, 2, 3

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  2         Bob     Madison
  3         Cedric  Seattle

BookCopy:
  isbn  authorId  title
  111   1         aaa
  222   1         bbb
  333   2         ccc
  444   3         ddd
  555   3         eee

Cache graph: AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy

Consistency Correlation Constraint
Diagram label: Backend DBMS
  ALTER VIEW BookCopy ADD CONSISTENCY ROOT

Cache graph: AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy

Cache Schema Example
Nodes: AuthorList_PCT, ReviewerList_PCT, AuthorCopy, ReviewerCopy, BookCopy, ReviewCopy
Edge labels: authorId, reviewerId, authorId, isbn, reviewId

Roadmap
Background
Cache data quality properties
Cache property specification
Enforcing data quality constraints
Experiments
Future directions and conclusions

Changing The Assumptions
Fully materialized views -> Partially materialized views
Consistent views -> Row-level consistency
Push-based maintenance -> Pull-based maintenance
More general algorithms
Run-time check for consistency constraints that cannot be validated at compile-time

Run-time C&C Checking
When view V matches expression E, ChoosePlan selects, based on the C&C guard, between:
  Local plan using V
  Remote plan requesting E
Currency guard: Check if local view V satisfies the currency requirement
Consistency guard: Check if local view V satisfies the consistency requirement

Performance Evaluation Goals
Consistency guards overhead
  Simple checks
  A spectrum of checks ranging from simple to complicated

Experimental Setting
Back-end hosts a TPCD database tpcd1gh with scale factor 1.0 (~1GB)
Cache server has a shadow of tpcd1gh
Two local views: custCopy, orderCopy
LAN connection between cache and backend server

Queries Used
Qa: key select
  SELECT * FROM Customers C
  WHERE c_custkey = 1
  CURRENCY 10 ON (C)
Qb: join query
  SELECT * FROM Customers C, Orders O
  WHERE c_custkey = o_custkey AND c_custkey = 1
  CURRENCY 10 ON (C), 20 ON (O)
Qc: nonkey select
  SELECT * FROM Customers C
  WHERE c_nationkey = 1
  CURRENCY 10 ON (C)

Simple Consistency Guards Overhead
[Bar chart. Y axis: execution time (ms). Series: Consistency guard, Query. X axis: Qa, Qb, Qc, each for Local and Remote. Overhead labels on the chart: 1.6%, 1.72%, 1.66%, 1.59%, 16.56%, 14.00%]

Single Table Consistency Guard Overhead
[Bar chart. Y axis: execution time (ms). Series: Consistency guard, Query (Qa is used). X axis: A11a, A11b, A12, S11, S12, each for Local and Remote. Overhead labels on the chart: 6.06%, 4.95%, 2.33%, 7.48%, 8.79%, 62.85%, 58.32%, 23.77%, 71.41%, 16.98%]

Future Directions
Adaptive data quality aware caching policies
Improve current prototype
Read-write transactions? Time-line constraints?
Apply “good enough” to other forms of replication
Indexing data? Control-table content? Refresh intervals?
Automate cache design/tuning
How to get a good cache schema? (i.e., cache region granularity, assignment)

Summary
Goal: fine-grained data quality-aware cache management
A comprehensive solution:
  How does the cache track data quality? Four cache properties
  How does the admin specify cache properties? Dynamic cache model
  How to maintain the cache efficiently? Efficient cache maintenance and “safety”
  How to enforce C&C constraints for queries? Efficient C&C checking
So long, and thanks for all the fish!
Questions?
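The run-time check described on the Run-time C&C Checking slide can be sketched as a small decision function. This is an illustrative sketch, not the prototype's code: it uses the deck's informal definition (currency = now - stale point), assumes all times share one unit, and takes the consistency guard's outcome as a boolean input. The names `currency` and `choose_plan` are hypothetical.

```python
from typing import Optional


def currency(now: float, stale_point: Optional[float]) -> float:
    """Currency = now - stale point; a copy that is not stale has currency 0."""
    if stale_point is None:
        return 0.0
    return max(0.0, now - stale_point)


def choose_plan(now: float,
                stale_point: Optional[float],
                currency_bound: float,
                consistent: bool = True) -> str:
    """Return 'local' if both guards pass, otherwise 'remote'."""
    currency_ok = currency(now, stale_point) <= currency_bound
    return "local" if currency_ok and consistent else "remote"
```

For a query with CURRENCY 10 ON (C), a view that went stale 5 time units ago passes the currency guard and is answered locally; one that went stale 20 units ago falls back to the remote plan.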
Proposed SQL Syntax
  SELECT *
  FROM Books B, Reviews R
  WHERE B.bid = R.bid AND B.title = "Databases"
followed by one of the currency clauses:
  CURRENCY BOUND 10 min ON (B, R)
  CURRENCY BOUND 10 min ON (B), 30 min ON (R)
  CURRENCY BOUND 10 min ON (B, R) BY B.bid
Clause annotations: Currency bound, Consistency class, Group by

BookCopy:
  bid  title      author
  1    databases  Raghu
  2    databases  Ullman

ReviewCopy:
  rid  bid  text
  1    1    …
  2    1    …
  3    2    …

Join result:
  bid  title      author  bid  rid  text
  1    databases  Raghu   1    1    …
  1    databases  Raghu   1    2    …
  2    databases  Ullman  2    3    …

Pull-Maintenance
Refresh a region by pulling query results
When refreshing a region, also refresh the affected closure:
  All overlapping regions
  All correlated regions

Theoretical Results
Definition (safe partially materialized views): A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints (a property that must hold for every instance):
  1. For any pair of regions in V, either they don’t overlap or one is contained in the other.
  2. If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V and no pair of regions in X is contained in any one region defined on V.
Cache schema design rules (syntactically checkable conditions, polynomial):
  Rule 1: A cache graph is a DAG.
  Rule 2: Only red nodes can have independent completeness or consistency control-tables.
  Rule 3: Every PMV with more than one parent must be a red circle.
  Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray.
  Rule 5: A PMV cannot have noncompatible control-tables.
Theorem: Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.
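Two of the conditions above lend themselves to a short sketch: the first safety condition (any two regions are disjoint or nested) and Rule 1 (the cache graph is a DAG). The code below is an illustration under assumptions, not the authors' algorithm: regions are modeled as Python sets of row identifiers, the cache graph as a list of (parent, child) edges, and the function names are hypothetical. The region check inspects a single cache instance, whereas the definition quantifies over every instance.

```python
from itertools import combinations


def regions_nested_or_disjoint(regions):
    """Safety condition 1 on one instance: each pair of regions is
    either disjoint or one region contains the other."""
    for a, b in combinations(regions, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


def is_dag(nodes, edges):
    """Rule 1: Kahn's algorithm; True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Every node is visited exactly when no cycle blocks the traversal.
    return visited == len(nodes)


# Cache graph from the correlation-constraint slides.
cache_nodes = ["AuthorList_PCT", "AuthorCopy", "BookCopy"]
cache_edges = [("AuthorList_PCT", "AuthorCopy"), ("AuthorCopy", "BookCopy")]
```

Both checks run in time polynomial in the size of their input, in line with the slide's note that the design rules are syntactically checkable.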
Pull-Maintenance
AuthorList_PCT (authorId): 1, 3, 4
TitleList_CsCT (title): aaa

BookCopy:
  isbn  authorId  title
  111   1         aaa
  222   1         bbb
  333   1         ccc
  444   3         aaa
  555   4         eee

Pull-Maintenance
Cache graph: AuthorList_PCT --authorId--> AuthorCopy --authorId--> BookCopy

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  3         Cedric  Seattle

BookCopy:
  isbn  authorId  title
  111   1         aaa
  222   1         bbb
  333   1         ccc
  444   3         aaa
  555   3         eee

Inefficient Pulling
Shared-row problem

AuthorCopy:
  authorId  name    city
  1         Alice   Madison
  3         Cedric  Seattle

AuthorBookCopy:
  authorId  isbn
  1         111
  1         222
  1         333
  3         111
  3         555

BookCopy:
  isbn  price  title
  111   10     aaa
  222   20     bbb
  333   30     ccc
  555   50     eee

Issues
Inefficient pulling: Calculation of the affected closure requires checking the rows
Efficient pulling: The affected closure does NOT depend on the instance of a view
  Only requires forward pull among correlated views

Related Work
Relaxing data quality
  Distributed databases
    Read-only transactions [Garcia-Molina et al. 1982]
    Demarcation protocol [Barbará et al. 1992]
    TACC [Yu et al. 2000]
    Epsilon-serializability [Pu et al. 1992]
  Warehousing and web views
    WebViews [Labrinidis et al. 2003]
    FAS [Röhm et al. 2002]
    Obsolescent views [Gal 1999]
    Distributed views [Segev et al. 1990]
    Freshness-driven web caching [Li et al. 2003]
  Replica management
    Quasi-copies [Alonso et al. 1998], [Gallersdörfer et al. 1995]
    Good-enough views [Seligman et al. 1997]
    TRAPP [Olston et al. 2000]
Caching
  Database caching
    DBCache [Altinel et al. 2003]
    Constraint-based database caching [Härder et al. 2004]
    Mid-Tier caching [TimesTen 2002]
    Shared-storage caching [Khalil et al. 2002]
  Others
    Semantic caching [Dar et al. 1996]
    Cache in Postgres [Stonebraker et al. 1990]
    Predicate-based caching [Keller et al. 1996]
    WATCHMAN [Scheuermann et al. 1996]
    Cache investment [Kossmann et al. 2000]
    DECAF [Kiernan et al. 2000]
    Proxy caching [Luo et al. 2001]
Uniqueness of our approach (query-centric):
  Query: Specifies fine-grained C&C constraints
  Admin: Flexible local data quality control in terms of granularity and properties
  Caching DBMS: Provides C&C guarantees for individual queries
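Returning to the Pull-Maintenance and Issues slides, the affected closure of a region can be sketched as graph reachability. This is a hedged illustration rather than the authors' procedure: it assumes the overlap and correlation relations between regions are already known and are given as dictionaries, and the region names and the function `affected_closure` are hypothetical.

```python
from collections import deque


def affected_closure(start, overlaps, correlated):
    """Regions to refresh together with `start`.

    overlaps:   dict mapping a region to the set of regions it overlaps
    correlated: dict mapping a region to the set of regions correlated with it
    """
    closure = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        neighbours = overlaps.get(region, set()) | correlated.get(region, set())
        for nxt in neighbours:
            if nxt not in closure:
                closure.add(nxt)
                queue.append(nxt)
    return closure


# Hypothetical regions: r1 and r2 overlap, and r2 is correlated with r3.
example_overlaps = {"r1": {"r2"}, "r2": {"r1"}}
example_correlated = {"r2": {"r3"}}
```

In the efficient case named on the Issues slide, these relations would follow from the cache schema alone, so the closure can be computed without inspecting the rows of any view.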