Download X-Data: Test Data Generation for Killing SQL Mutants

Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay   Complex SQL queries hard to get right Question: How to check if an SQL query is correct?  Formal verification is not applicable since we do not have a separate specification and an implementation  State of the art solution: Generate test databases and check if the query gives the intended result 2 Automated Test Data generation  Based on database constraints, and SQL query ▪ Agenda [Chays et al., STVR04], a tool which generates test cases for database applications which additionally uses user fed heuristics  Ensuring query result is not empty ▪ Reverse Query Processing [Binning et al., ICDE07] takes desired query output and generates relation instances ▪ Handle a subset of Select/Project/Join/GroupBy queries  None of the above guarantee anything about detecting errors in SQL queries  Question: How do you model SQL errors?  Answer: Query Mutation  3  Mutant: Variation of the given query  Mutations model common programming errors, like ▪ Join used instead of outerjoin (or vice versa) ▪ Join/selection condition errors ▪ < vs. <=, missing or extra condition ▪ Wrong aggregate (min vs. max)  Mutant may be the intended query 4 Traditional use of mutation testing has been to check coverage of dataset  Generate mutants of the original program by modifying the program in a controlled manner  A dataset kills a mutant if query and the mutant give different results on the dataset  A dataset is considered complete if it can kill all non-equivalent mutants of the given query  Prior work:  Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations  Shortcoming: do not address test data generation  Our goal: generated dataset for testing query  Test dataset and query result on the dataset are shown to human, who verifies that the query result is what is expected given this dataset  Note that we do not need to actually generate and execute mutants  5  Address the problem of test data generation for killing non-equivalent mutants  Equivalent Mutants: r(A,B) s(B,C) and r(A,B) s(B,C) where r.B is a foreign key to s, and is not null will always produce the same resultset  Define class of:  Join/outerjoin mutations  Selection predicate mutations  Algorithm for test data generation that kills all nonequivalent mutants in above class  Under some simplifying assumptions (given in the paper)  With the guarantee that generated datasets are small and realistic, to aid in human verification of results 6   Join type mutations: An occurrence of a join operator ( , , , ) is replaced by one of the other join operators Defining join mutations in SQL is complicated by the absence of a particular join order  SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)   We consider all relational algebra expressions (trees) equivalent (under inner join reordering) to the given SQL query We consider join type mutations to single join nodes in each tree above 7  Case I: Mutation at root node, with no foreign key constraints  Schema: r(A), s(B)    To kill this mutant: ensure that for an r tuple there is no matching s tuple Generated test case: r(A)={(1)}; s(B)={} Basic idea: (a) run query on given database, (b) from result extract matching tuples for r and s (c) delete s tuple to ensure no matching tuple for r 8  Case II: Extra join above mutated node  Schema: r(A,B), s(C,D), t(E) To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple  Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}  9  Given join expression on relations r1, r2, …, rn  Create dataset where all relations have a set of matching tuples  For each relation ri, generate a dataset where rest of relations match, but ri is empty ▪ Unless making ri empty makes join graph disconnected  Above procedure kills all join type mutations of given inner join tree  Outer joins complicate picture when attributes are projected out ▪ May have to make more than one ri empty at a time  Foreign keys may prevent making some ri empty 10  Case III: Mutation at root node with foreign key constraints and selection on right side  Schema: r(A), s(B,C)  Foreign key: r.A →s.B  To kill this mutant we must create an s tuple which matches with the r tuple on the foreign key reference, but which has s.C ≠ 4  Generated test case: r(A)={(2)}; s(B,C)={(2,5)}  Notion of valid nullable pattern defined in paper specifies which relations can be made null/non-matching, given foreign key constraints and join graph 11  Implemented using Java and PostgreSQL  Creates datasets by extracting and modifying tuples from given database  Currently handles join type mutation and selection predicate mutation  For creating a merged dataset ▪ Tuples having same values for join attributes must be blocked from being inserted again  Handling selection predicate mutation ▪ Eg. to distinguish r.A < 3 and r.A <= 3 we generate tuples with r.A = 2 and 3 12  Ongoing work :  Synthetic data generation taking database and query constraints into account which is non trivial ▪ Idea (from RQP [Binning et al ICDE07]): Use a model checker to generate data ▪ Under implementation using CVC3  Extend the technique to handle aggregations and sub-queries  Future work: data generation for application code with multiple queries 13 Questions   Problem: is Q equivalent to a mutant Q‘ can be reduced to query containment and vice versa in polynomial time The Chase algorithm can be used to generate datasets to show that Q and Q' are not equivalent (for SPJ queries and several extensions)  such a dataset would kill the mutant Q‘  limited work on outerjoin containment data generation  However we don't want to enumerate each mutant and generate separate datasets  too expensive 15  Under the following conditions we can generate merged datasets:  Tuples having same values for join attributes must be blocked from being inserted again  The query must not contain any equality selection on an unique key  The result of the query must contain one or more attributes which together form an unique key for any relation  Also attributes from the result forming an unique key must be guaranteed to be non-null in the result 16  Consider the three relations :  Student(name, deptcode, progcode),  Department(deptcode, deptname)  Program(progcode, progname)  And a query:  SELECT rollno, name, deptname, progname FROM student s INNER JOIN department d ON s.deptcode=d.deptcode INNER JOIN program p ON s.progcode=p.progcode 17 Query Tree 1 Query Tree 2 Query Tree 3 Generate mutants by mutating join operator of a single node for all above trees  18 Program Department Student Progcode Progname Deptcode Deptname Rollno 0 B.Tech CS Computer 501 1 M.Tech CH Chemical 2 PhD ME Mechanical Name progcode deptcode Devang 1 CS 401 Abhijeet 0 CE 701 Sandeep 5 CH 101 Aditya 4 MA Generated data shows :       A student (Devang) with valid program and department A student (Abhijeet) with invalid department A student (Sandeep) with invalid program A student (Aditya) with invalid program invalid program and department A program (PhD) with no student A department (Mechanical) with no student 19 Program Department Student Progcode Progname Deptcode Deptname Rollno Name progcode deptcode 0 B.Tech CS Computer 501 Devang 1 CS 1 M.Tech EE Electrical Foreign Keys are: Student.progcode → Program.progcode Student.deptcode → Department.deptcode Generated data shows :    A student (Devang) with valid program and department A program (B.Tech) with no student A department (Electrical) with no student 20 Case of no foreign keys 21

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download X-Data: Test Data Generation for Killing SQL Mutants