Download ETH

Symbolic Query Processing Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Hass (IBM Almaden Research Center) © ETH Zürich Symbolic Query Processing  Treat all data as symbols (think of variables)  E.g., a1 represents any value under the domain of attribute a  Table R and S are called symbolic relations ETH Zurich 2 Background – Symbolic Execution 1/3  Borrow the concept from symbolic execution  A well known program verification technique  Represent values of program variables with symbolic values instead of concrete data  Manipulate expressions based on those symbolic values ETH Zurich 3 Background – Symbolic Execution 2/3 Find a test case for path 1236 Symbolic execution – start: 1. minsalary = read_input(); 1. minsalary = ben 2. bensalary = minsalary + 2000; 2. bensalary = ben + 2000; 3. if (bensalary < 80000) 3. bensalary = ben + 2000 4. output “no kidding!”; Symbolic execution – end 5. else 6. and !(bensalary < 80000);- (  ) output “that’s right”; Instantiate (): ben = 90000  expected input “that’s right”  expected output ETH Zurich 4 Background – Symbolic Execution 3/3  Has been research for > 20 years  Still have many limitations  E.g., cannot handle highly complex software  However, many large software vendors still put hope on this technique for program verifications  E.g., Microsoft Research  No progress on database applications  involve an external database and SQL ETH Zurich 5 SQP Applications  Extend program verification and symbolic execution techniques to support database applications  For DBMS testing  focus of today ETH Zurich 6 Symbolic Query Processing  Query manipulates data according to different needs  R b=c b1 S  Want the join results to have one tuple? set c1=b1  Want the join results to have:  four tuples  Zipf distribution (t1 joins more, t2 joins less)? ETH Zurich 7 DBMS Testing  To test a DBMS, we generate a lot of test databases and execute a lot of test queries  DBMS vendors are looking for a way to control the intermediate results of a test query such that we can test an individual component of a DBMS under a particular test case ETH Zurich 8 DBMS Testing Example  Test the accuracy of a cardinality estimation component of a query optimizer under  a multi-way hash join query  a two-way join query with aggregation  If we can make sure executing the test query on the test database expected answer gives ETH Zurich 9 DBMS Testing  The test query is given  Physical join ordering can be fixed (by testers)  Evaluation algorithm (e.g., using hash-join) can be fixed too  However, the size of the intermediate results cannot be fixed easily ETH Zurich 10 DBMS Testing Problem Guarantee that executing a test query on a test database can obtain the desired intermediate query results (e.g.,. output cardinality, data distribution) ETH Zurich 11 DBMS Testing Problem  A test case T is:  a parametric query Qp  with a set of constraints C on each intermediate result  A good test database D means  Qp (D) satisfies C - if the set of parameters p is properly instantiated  Test case T D covers test case T ETH Zurich 12 Trial-and-error  Generate Database 3, 2, and 1  Using traditional database generators such as IBM Test DB generator, MSR DB generator, etc  Search for parameters  T2 is never covered  The database generation process does not care about the test queries ETH Zurich 13 Latest approach – Finding query parameters  MSR realized this problem [TKDE06]  Given the test database + the test query Qp, search parameter values for p such that Qp(D) (almost) fit the cardinality requirements defined on the test case  It is a NP-hard problem  Same as the previous approach, T2 is never covered ETH Zurich 14 QAGen – Query Aware test database Generator  Based on symbolic query processing  We can control the output size of each intermediate query result (and even more) ETH Zurich 15 QAGen – Generate a query-aware test database for each test case ETH Zurich 16 QAGen overview ETH Zurich 17 QAGen overview – Query Analyzer  Analyzer the query and assign the knob to an operator  A knob is a parameter of an operator to control the output (e.g., output cardinality, distribution)  A knob for an operator is not always available for tuning ETH Zurich 18 QAGen overview – Query Analyzer A knob for an operator is not always available for tuning join distribution? Yes ETH Zurich join distribution? No 19 QAGen overview – Query Analyzer The available knob(s) for an operator depends on its input characteristics Definition: pre-grouping data Definition: non pre-grouping data ETH Zurich 20 QAGen overview – Query Analyzer ETH Zurich 21 Symbolic Query Engine and Symbolic Database ETH Zurich 22 Symbolic Query Engine and Symbolic Database (SDB) σa>p  An SQL operator:  Add predicates to a symbol  Replace a symbol with another other >p <=p symbol (e.g., joining)  E.g., SELECT a FROM R WHERE a > p;  1 output ETH Zurich 23 Symbolic Query Engine and Symbolic Database (SDB)  How to physically store the symbolic data? >p <=p  Options:  Implement a native symbolic database  Use relational database s - How to represent “a1 > p”? - Stores all predicates that are associated with a symbol s in a separate relation called Pred. a1 a1>p a2 a2<=p PTable PTable ETH Zurich 24 Data Instantiator ETH Zurich 25 Data Instantiation • Data instantiator uses a constraint solver: • Input: a (propositional) constraint (e.g., A + B > 50) • Output: any concrete values for the constraint (e.g., A=99, B=12) ETH Zurich 26 Symbolic Query Engine ETH Zurich 27 Symbolic Query Engine  Iterator-based  open(), getNext(), close()  No naughty user  ETH Zurich Contradicting knob values 28 SQP – Table operator  Fill up the table with symbols ETH Zurich 29 SQP – σ operator ETH Zurich 30 SQP – operator (with FK constraint) Action: join key replacement ETH Zurich 31 SQP – operator (with FK constraint) Action: join key replacement ETH Zurich 32 SQP – operator (with FK constraint)  When the input of the join is pre-grouped, the world has changed  It sometimes happen, e.g.,  2-way join  Base tables A, B and C with foreign key relationships  A  B, B C ETH Zurich 33 SQP – operator (with FK constraint)  Do not support join distribution (the knob is disabled by the analyzer)  Controlling the output cardinality is a subset-sum problem (weakly NP-hard)  Subset-sum has a pseudo-polynomial time exact solution using dynamic programming ETH Zurich 34 SQP – operator (with FK constraint)  Blocking  During open()  Materialize Table S in a temporary relation  SELECT COUNT(k) From S GROUP BY k  Solve the subset-sum ETH Zurich 35 SQP – χ operator Action 1: Aggregation attribute replacement • o_date3  o_date1 • o_date4  o_date2 1st output group (o_date1) 2nd output group (o_date2) ETH Zurich 36 SQP – χ operator Action 2 (base case version): - Adding aggregation constraints to PTable, base case: <l_price1, aggsum1 = l_price1+ l_price2 + l_price3+l_price4 + l_price7> <l_price2, aggsum1 = l_price1+ l_price2 + l_price3+l_price4 + l_price7> <l_price3, aggsum1 = ‘’> <l_price4, aggsum1=‘’> <l_price7, aggsum1 = l_price1+ l_price2 + l_price3+l_price4 + l_price7> <l_price5, aggsum2 = l_price5+ l_price6 + l_price8> <l_price6, aggsum2 = l_price5+ l_price6 + l_price8> <l_price8, aggsum2 = ‘’> ETH Zurich 37 SQP – χ operator Action 2 (optimized version): - A constraint solver call is exponential to the size of predicates - Adding 2 aggregation constraints to PTable: <l_price1, aggsum1 = l_price1 x 5> <l_price5, aggsum2 = l_price5 x 3> and do l_price replacement ETH Zurich 38 Data Instantiation ETH Zurich 39 Data Instantiation  Use a constraint solver to instantiate the symbolic database  for each symbolic relation r for each tuple t for each symbol s load the related predicates P instantiate P cache P ETH Zurich 40 Experiment 1 – Operator Performance  Study the performance (and scalability) of   Individual operator during SQP The data instantiation phase  Use TPC-dbgen to generate 3 TPCH-DB  10M, 100M, 1G  Q8(TPCH-DB) to collect the intermediate results R for each operator  QAGen(Q8, R)  Q8 query aware database ETH Zurich 41 Experiments – TPC-H Query 8 ETH Zurich 42 Experiment 1 – TPC-H Query 8 ETH Zurich 43 Experiment 2 – Effects of knob values  Use TPCH Q8  6 sets of knob values    TPCH-Uniform, TPCH-Zipf Min-Uniform, Min-Zipf Max-Uniform, Max-Zipf ETH Zurich 44 Experiment 2 – Effects of knob values ETH Zurich 45 Experiment 3 – System Scalability ETH Zurich 46 Related Work, Future Work, Conclusions  Reverse Query Processing (ICDE07)  Given the result R, the query Q, reversely process Q to generate D  for function testing database applications, view maintenance, debugging SQL  Multiple SQL statements (to ACM TSE journal) ETH Zurich 47 ETH Zurich 48 Current approach 2 – Stochastically generate many test queries  Based on a given test database, RAGS/QGen generates many valid SQL queries to test the system  No guarantee that T1 can be covered  Same as the previous approach, T2 is never covered ETH Zurich 49 QAGen overview – Query Analyzer  Each knob combination (e.g., output cardinality + join distribution) for an operator may have different ways to implement it  The output is an knobannotated execution plan ETH Zurich 50

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download ETH