X-Data: Test Data Generation for Killing SQL Mutants

Bhanu Pratap Gupta
Devang Vira
S. Sudarshan
Dept. of Computer Science and Engineering, IIT Bombay


Complex SQL queries are hard to get right
Question: How can we check whether an SQL query is correct?
• Formal verification is not applicable, since we do not have a separate specification and implementation
• State-of-the-art solution: generate test databases and check whether the query gives the intended result

Automated Test Data Generation
• Based on database constraints and the SQL query
▪ Agenda [Chays et al., STVR04], a tool that generates test cases for database applications and additionally uses user-supplied heuristics
• Ensuring the query result is not empty
▪ Reverse Query Processing [Binnig et al., ICDE07] takes the desired query output and generates relation instances
▪ Handles a subset of Select/Project/Join/GroupBy queries
• None of the above guarantees anything about detecting errors in SQL queries
• Question: How do we model SQL errors?
• Answer: query mutation

Mutant: a variation of the given query
• Mutations model common programming errors, such as
▪ Join used instead of outer join (or vice versa)
▪ Join/selection condition errors
▪ < vs. <=, missing or extra condition
▪ Wrong aggregate (min vs. max)
• The mutant may be the intended query

Traditionally, mutation testing has been used to check the coverage of a dataset
• Generate mutants of the original program by modifying the program in a controlled manner
• A dataset kills a mutant if the query and the mutant give different results on the dataset
• A dataset is considered complete if it kills all non-equivalent mutants of the given query
• Prior work:
▪ Tuya and Suarez-Cabal [IST07] and Chan et al. [QSIC05] defined a class of SQL query mutations
▪ Shortcoming: they do not address test data generation
• Our goal: generate a dataset for testing the query
▪ The test dataset and the query result on it are shown to a human, who verifies that the result is what is expected given this dataset
▪ Note that we do not need to actually generate and execute mutants
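
To make the definition of killing concrete, the following Python sketch runs a query and one mutant against the same dataset and compares the results as multisets. It uses the standard-library sqlite3 module purely for illustration (the implementation described here uses PostgreSQL), and the helper names `kills` and `dataset` and the table r(a) are hypothetical. It executes the mutant only to illustrate the definition; as noted above, the approach itself does not need to execute mutants.

```python
import sqlite3
from collections import Counter


def kills(conn: sqlite3.Connection, query: str, mutant: str) -> bool:
    """Return True if the data in conn makes query and mutant disagree.

    SQL results are bags and are unordered without ORDER BY, so the rows
    are compared as multisets.
    """
    original = Counter(conn.execute(query).fetchall())
    mutated = Counter(conn.execute(mutant).fetchall())
    return original != mutated


def dataset(values):
    """Build a hypothetical single-column relation r(a) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (a INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?)", [(v,) for v in values])
    return conn


query = "SELECT a FROM r WHERE a < 3"
mutant = "SELECT a FROM r WHERE a <= 3"  # selection predicate mutation

print(kills(dataset([1, 3, 5]), query, mutant))  # True: a = 3 satisfies only the mutant
print(kills(dataset([1, 5]), query, mutant))     # False: the mutant survives this dataset
```

A dataset is complete in the sense above only if a check like this would return True for every non-equivalent mutant.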


Address the problem of test data generation for killing non-equivalent mutants
• Equivalent mutants: r(A,B) ⋈ s(B,C) and r(A,B) ⟕ s(B,C), where r.B is a non-null foreign key referencing s, always produce the same result set
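
A small sqlite3 sketch of this equivalence, assuming r.B is declared NOT NULL and REFERENCES s(B): every r tuple then has a matching s tuple, so the left outer join never pads with NULLs and returns the same rows as the inner join. The table contents are made up for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE s (b INTEGER PRIMARY KEY, c TEXT)")
conn.execute("CREATE TABLE r (a INTEGER, b INTEGER NOT NULL REFERENCES s(b))")
conn.executemany("INSERT INTO s VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
conn.executemany("INSERT INTO r VALUES (?, ?)", [(10, 1), (20, 2), (30, 2)])

inner = conn.execute("SELECT * FROM r JOIN s ON r.b = s.b").fetchall()
outer = conn.execute("SELECT * FROM r LEFT JOIN s ON r.b = s.b").fetchall()

# Every r tuple has a partner in s, so the outer join never pads with NULLs.
print(Counter(inner) == Counter(outer))  # True

# The only kind of r tuple that could tell the two apart is rejected.
rejected = False
try:
    conn.execute("INSERT INTO r VALUES (40, 99)")  # dangling reference
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Because no database satisfying the constraints can separate the two queries, no test data generator can kill this mutant, which is why it is excluded.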

Define a class of:
• Join/outer join mutations
• Selection predicate mutations

An algorithm for test data generation that kills all non-equivalent mutants in the above class
• Under some simplifying assumptions (given in the paper)
• With the guarantee that the generated datasets are small and realistic, to aid human verification of results


Join type mutations: an occurrence of a join operator (⋈, ⟕, ⟖, ⟗) is replaced by one of the other join operators
Defining join mutations in SQL is complicated by the absence of a particular join order
• SELECT * FROM a, b, c WHERE (a.x = b.x) AND (b.x = c.x)

We consider all relational algebra expressions (trees) that are equivalent (under inner join reordering) to the given SQL query
We consider join type mutations of single join nodes in each of these trees
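
The following Python sketch enumerates the join type mutants of one join tree by changing the operator of exactly one node. The `Join` class, the SQL keywords chosen for ⋈, ⟕, ⟖ and ⟗, and the tree used for the a, b, c example are assumptions for illustration; per the slides, the same enumeration would be repeated for every tree equivalent to the query under inner join reordering.

```python
from dataclasses import dataclass
from typing import Iterator, Union

# SQL spellings assumed for the four join operators: ⋈, ⟕, ⟖, ⟗
JOIN_TYPES = ("INNER", "LEFT", "RIGHT", "FULL")


@dataclass(frozen=True)
class Join:
    kind: str      # one of JOIN_TYPES
    left: "Tree"
    right: "Tree"
    cond: str      # join condition, kept as SQL text


Tree = Union[str, Join]  # a leaf is a relation name


def mutants(tree: Tree) -> Iterator[Tree]:
    """Yield every tree obtained by changing the type of exactly one join node."""
    if isinstance(tree, str):
        return
    for kind in JOIN_TYPES:
        if kind != tree.kind:
            yield Join(kind, tree.left, tree.right, tree.cond)
    for left in mutants(tree.left):
        yield Join(tree.kind, left, tree.right, tree.cond)
    for right in mutants(tree.right):
        yield Join(tree.kind, tree.left, right, tree.cond)


def to_sql(tree: Tree) -> str:
    """Render a tree as a fully parenthesized FROM-clause expression."""
    if isinstance(tree, str):
        return tree
    return f"({to_sql(tree.left)} {tree.kind} JOIN {to_sql(tree.right)} ON {tree.cond})"


# One of the trees for SELECT * FROM a, b, c WHERE a.x = b.x AND b.x = c.x
tree = Join("INNER", Join("INNER", "a", "b", "a.x = b.x"), "c", "b.x = c.x")
for m in mutants(tree):
    print(to_sql(m))
```

This tree has two join nodes, so it yields 2 × 3 = 6 mutants. Calling `mutants` on the other equivalent tree, a ⋈ (b ⋈ c), covers mutations that are only expressible in that join order.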

Case I: Mutation at the root node, with no foreign key constraints
• Schema: r(A), s(B)

To kill this mutant, ensure that for an r tuple there is no matching s tuple
Generated test case: r(A)={(1)}; s(B)={}
Basic idea:
(a) run the query on the given database,
(b) from the result, extract the matching tuples for r and s,
(c) delete the s tuple to ensure that there is no matching tuple for r
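
The slide's diagram of the query and its mutant did not survive, so this sketch assumes the query is r ⋈ s with join condition r.A = s.B and the mutant is r ⟕ s. It follows steps (a) to (c) with Python's sqlite3 module on a hypothetical starting database and ends with the slide's test case r(A)={(1)}, s(B)={}. The helper name `make_db` is hypothetical.

```python
import sqlite3


def make_db(r_rows, s_rows):
    """Create relations r(a) and s(b) in memory and load the given tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (a INTEGER)")
    conn.execute("CREATE TABLE s (b INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?)", r_rows)
    conn.executemany("INSERT INTO s VALUES (?)", s_rows)
    return conn


QUERY = "SELECT r.a, s.b FROM r JOIN s ON r.a = s.b"        # assumed original query
MUTANT = "SELECT r.a, s.b FROM r LEFT JOIN s ON r.a = s.b"  # assumed mutant

# (a) run the query on a given (hypothetical) database
given = make_db([(1,), (7,)], [(1,), (8,)])
a, b = given.execute(QUERY).fetchone()

# (b) the result row identifies one matching r tuple and one matching s tuple
test = make_db([(a,)], [(b,)])

# (c) delete the s tuple so that the r tuple has no match
test.execute("DELETE FROM s WHERE b = ?", (b,))

print(test.execute(QUERY).fetchall())   # []
print(test.execute(MUTANT).fetchall())  # [(1, None)]
```

Starting from tuples extracted from an existing database, rather than inventing values, is what keeps the generated dataset realistic.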

Case II: Extra join above the mutated node
• Schema: r(A,B), s(C,D), t(E)
To kill this mutant, we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple
• Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}
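
A sqlite3 sketch of why the t tuple is needed. The join conditions are not recoverable from the slide, so this assumes r.B = s.C for the mutated node and r.B = t.E for the join above it, giving the query (r ⋈ s) ⋈ t and the mutant (r ⟕ s) ⋈ t. The helper name `run` is hypothetical.

```python
import sqlite3

# Assumed join conditions: r.b = s.c (mutated node) and r.b = t.e (join above it).
# SQL joins associate left to right, so these read as (r ⋈ s) ⋈ t and (r ⟕ s) ⋈ t.
QUERY = "SELECT * FROM r JOIN s ON r.b = s.c JOIN t ON r.b = t.e"
MUTANT = "SELECT * FROM r LEFT JOIN s ON r.b = s.c JOIN t ON r.b = t.e"


def run(r_rows, s_rows, t_rows):
    """Load r(a, b), s(c, d), t(e) and return (query result, mutant result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (a INTEGER, b INTEGER)")
    conn.execute("CREATE TABLE s (c INTEGER, d INTEGER)")
    conn.execute("CREATE TABLE t (e INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?, ?)", r_rows)
    conn.executemany("INSERT INTO s VALUES (?, ?)", s_rows)
    conn.executemany("INSERT INTO t VALUES (?)", t_rows)
    return conn.execute(QUERY).fetchall(), conn.execute(MUTANT).fetchall()


# Generated test case from the slide: r has no s partner but does have a t partner
print(run([(1, 2)], [], [(2,)]))  # ([], [(1, 2, None, None, 2)])

# Emptying s alone is not enough: without a matching t tuple both results are empty
print(run([(1, 2)], [], []))      # ([], [])
```

The padded r tuple must survive the join above the mutated node; otherwise the difference introduced by the outer join is filtered out before it reaches the result.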


Given a join expression on relations r1, r2, …, rn
• Create a dataset where all relations have a set of matching tuples
• For each relation ri, generate a dataset where the rest of the relations match, but ri is empty
▪ Unless making ri empty makes the join graph disconnected

The above procedure kills all join type mutations of the given inner join tree
• Outer joins complicate the picture when attributes are projected out
▪ May have to make more than one ri empty at a time
• Foreign keys may prevent making some ri empty
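
The following Python sketch plans which datasets to generate for a join graph given as relation names and join edges. It reads "makes the join graph disconnected" as: once ri is removed, the remaining relations no longer form a connected join graph. That reading, and the function names `is_connected` and `plan_datasets`, are assumptions; the sketch ignores the outer join and foreign key refinements listed above.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def is_connected(nodes: Set[str], edges: Sequence[Edge]) -> bool:
    """True if the join graph restricted to `nodes` is connected."""
    if not nodes:
        return True
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def plan_datasets(relations: Sequence[str], edges: Sequence[Edge]) -> List[List[str]]:
    """List which relations to empty in each generated dataset.

    The first entry (nothing emptied) is the all-matching dataset; then one
    dataset per relation ri, skipped when the remaining relations would no
    longer form a connected join graph.
    """
    plan: List[List[str]] = [[]]
    for ri in relations:
        rest = set(relations) - {ri}
        if is_connected(rest, edges):
            plan.append([ri])
    return plan


# Chain-shaped join graph r1 - r2 - r3: emptying r2 would split r1 from r3
print(plan_datasets(["r1", "r2", "r3"], [("r1", "r2"), ("r2", "r3")]))
# [[], ['r1'], ['r3']]
```

Each planned entry would then be turned into concrete tuples the same way as in Case I: extract matching tuples and delete those of the relations to be emptied.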

Case III: Mutation at the root node, with foreign key constraints and a selection on the right side
• Schema: r(A), s(B,C)
• Foreign key: r.A → s.B

To kill this mutant, we must create an s tuple that matches the r tuple on the foreign key reference but has s.C ≠ 4
• Generated test case: r(A)={(2)}; s(B,C)={(2,5)}

The notion of a valid nullable pattern, defined in the paper, specifies which relations can be made null/non-matching, given the foreign key constraints and the join graph
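
A sqlite3 sketch of Case III, assuming the query is r ⋈ σ_{C=4}(s) on r.A = s.B and the mutant is r ⟕ σ_{C=4}(s); the selection constant 4 is taken from the slide, while making s.B the key of s is an added assumption (SQLite requires a foreign key to reference a unique column). It shows that the slide's test case kills the mutant and that the Case I strategy is blocked by the foreign key. The helper name `build` is hypothetical.

```python
import sqlite3

# Assumed query r ⋈ σ_{C=4}(s) and mutant r ⟕ σ_{C=4}(s); the selection on the
# right side is written in the ON clause so it applies before the outer join.
QUERY = "SELECT * FROM r JOIN s ON r.a = s.b AND s.c = 4"
MUTANT = "SELECT * FROM r LEFT JOIN s ON r.a = s.b AND s.c = 4"


def build(s_rows, r_rows):
    """Create s(b, c) and r(a) with the foreign key r.a -> s.b enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE s (b INTEGER PRIMARY KEY, c INTEGER)")  # s.b assumed to be the key
    conn.execute("CREATE TABLE r (a INTEGER REFERENCES s(b))")
    conn.executemany("INSERT INTO s VALUES (?, ?)", s_rows)
    conn.executemany("INSERT INTO r VALUES (?)", r_rows)
    return conn


# Generated test case from the slide: the s tuple matches on the key but has c = 5
conn = build([(2, 5)], [(2,)])
print(conn.execute(QUERY).fetchall())   # []
print(conn.execute(MUTANT).fetchall())  # [(2, None, None)]

# The Case I strategy (leave the r tuple without any s tuple) violates the foreign key
fk_blocks_empty_s = False
try:
    build([], [(2,)])
except sqlite3.IntegrityError:
    fk_blocks_empty_s = True
print(fk_blocks_empty_s)  # True
```

With s.C = 4 instead of 5, both queries return the same row, so the value must be chosen to fail the selection, not just to satisfy the foreign key.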

Implemented using Java and PostgreSQL
• Creates datasets by extracting and modifying tuples from the given database
• Currently handles join type mutation and selection predicate mutation

For creating a merged dataset:
▪ Tuples having the same values for join attributes must be blocked from being inserted again

Handling selection predicate mutation:
▪ E.g., to distinguish r.A < 3 and r.A <= 3, we generate tuples with r.A = 2 and r.A = 3
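
The following Python sketch generalizes the slide's example to all six comparison operators. It is an assumption, not the documented X-Data rule: for an integer attribute and a predicate A op c, it considers the boundary values c − 1, c and c + 1 and reports which of them separate the original predicate from a mutated one. The names `OPS`, `boundary_values` and `killing_values` are hypothetical.

```python
import operator
from typing import Callable, Dict, List

# Comparison operators a selection predicate "A op c" may use
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "<>": operator.ne, ">": operator.gt, ">=": operator.ge,
}


def boundary_values(c: int) -> List[int]:
    """Candidate values for A around the constant c (integer domain assumed)."""
    return [c - 1, c, c + 1]


def killing_values(op: str, mutant_op: str, c: int) -> List[int]:
    """Boundary values of A on which 'A op c' and 'A mutant_op c' disagree."""
    return [v for v in boundary_values(c) if OPS[op](v, c) != OPS[mutant_op](v, c)]


print(killing_values("<", "<=", 3))  # [3]

# Each pair of distinct operators disagrees on at least one boundary value
print(all(killing_values(p, q, 3) for p in OPS for q in OPS if p != q))  # True
```

In the slide's example, the tuple with r.A = 3 is the one that separates < from <=, while r.A = 2 satisfies both predicates, so both the query and the mutant still return a row.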

Ongoing work:
• Synthetic data generation that takes database and query constraints into account, which is non-trivial
▪ Idea (from RQP [Binnig et al., ICDE07]): use a model checker to generate data
▪ Under implementation using CVC3
• Extending the technique to handle aggregations and subqueries

Future work: data generation for application code with multiple queries

Questions


Problem: the question "Is Q equivalent to a mutant Q'?" can be reduced to query containment, and vice versa, in polynomial time
The chase algorithm can be used to generate datasets showing that Q and Q' are not equivalent (for SPJ queries and several extensions)
• Such a dataset would kill the mutant Q'
• There is limited work on data generation for outer join containment

However, we do not want to enumerate each mutant and generate separate datasets
• Too expensive

Under the following conditions, we can generate merged datasets:
• Tuples having the same values for join attributes must be blocked from being inserted again
• The query must not contain any equality selection on a unique key
• The result of the query must contain one or more attributes that together form a unique key for any relation
• Attributes from the result that form a unique key must also be guaranteed to be non-null in the result
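
A minimal Python sketch of the first condition: when datasets generated for different mutants are merged into one database, a tuple is added only if no tuple with the same join-attribute values is already present, presumably so that tuples from one dataset cannot accidentally join with tuples from another. The class name `MergedRelation` and its methods are hypothetical, and what X-Data does with a blocked tuple is not stated; this sketch simply reports it.

```python
from typing import Iterable, List, Sequence, Set


class MergedRelation:
    """Accumulates one relation's tuples across several generated datasets."""

    def __init__(self, columns: Sequence[str], join_attrs: Sequence[str]):
        self.key_idx = [list(columns).index(a) for a in join_attrs]
        self.rows: List[tuple] = []
        self._seen: Set[tuple] = set()

    def add(self, row: tuple) -> bool:
        """Insert row unless a tuple with the same join-attribute values exists."""
        key = tuple(row[i] for i in self.key_idx)
        if key in self._seen:
            return False  # blocked: a repeated join value could create unintended matches
        self._seen.add(key)
        self.rows.append(row)
        return True

    def merge(self, dataset: Iterable[tuple]) -> List[tuple]:
        """Add the rows of one generated dataset; return the rows that were blocked."""
        blocked = []
        for row in dataset:
            if not self.add(row):
                blocked.append(row)
        return blocked


# Hypothetical relation r(a, b) whose join attribute is b
r = MergedRelation(["a", "b"], ["b"])
print(r.merge([(1, 2)]))          # []
print(r.merge([(5, 2), (6, 3)]))  # [(5, 2)]
print(r.rows)                     # [(1, 2), (6, 3)]
```

The remaining conditions (no equality selection on a unique key, a non-null unique key in the result) are properties of the query and would be checked before attempting a merge at all.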

Consider the three relations:
• Student(rollno, name, deptcode, progcode)
• Department(deptcode, deptname)
• Program(progcode, progname)

And the query:
• SELECT rollno, name, deptname, progname
FROM student s INNER JOIN department d
ON s.deptcode = d.deptcode INNER JOIN program p
ON s.progcode = p.progcode

[Figure: Query Tree 1, Query Tree 2, Query Tree 3, the join trees equivalent to this query]
Generate mutants by mutating the join operator of a single node in each of the above trees

Program
Progcode  Progname
0         B.Tech
1         M.Tech
2         PhD

Department
Deptcode  Deptname
CS        Computer
CH        Chemical
ME        Mechanical

Student
Rollno  Name      progcode  deptcode
501     Devang    1         CS
401     Abhijeet  0         CE
701     Sandeep   5         CH
101     Aditya    4         MA

Generated data shows:
• A student (Devang) with a valid program and department
• A student (Abhijeet) with an invalid department
• A student (Sandeep) with an invalid program
• A student (Aditya) with an invalid program and an invalid department
• A program (PhD) with no student
• A department (Mechanical) with no student
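
Loading this generated data into SQLite (standing in for PostgreSQL) and running the example query next to two of its join type mutants shows which student tuples do the killing. The two mutants are hand-written for illustration, not taken from X-Data's output, and only use INNER and LEFT OUTER JOIN, since SQLite versions before 3.39 lack RIGHT and FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE program    (progcode INTEGER, progname TEXT);
CREATE TABLE department (deptcode TEXT, deptname TEXT);
CREATE TABLE student    (rollno INTEGER, name TEXT, progcode INTEGER, deptcode TEXT);
INSERT INTO program    VALUES (0, 'B.Tech'), (1, 'M.Tech'), (2, 'PhD');
INSERT INTO department VALUES ('CS', 'Computer'), ('CH', 'Chemical'), ('ME', 'Mechanical');
INSERT INTO student    VALUES (501, 'Devang', 1, 'CS'), (401, 'Abhijeet', 0, 'CE'),
                              (701, 'Sandeep', 5, 'CH'), (101, 'Aditya', 4, 'MA');
""")

QUERY = """SELECT rollno, name, deptname, progname
FROM student s INNER JOIN department d ON s.deptcode = d.deptcode
INNER JOIN program p ON s.progcode = p.progcode"""


def rows(sql):
    # Sort on rollno (first column, never NULL here) for a stable comparison
    return sorted(conn.execute(sql).fetchall())


original = rows(QUERY)
# Mutants of one query tree: a single join turned into a left outer join
dept_mutant = rows(QUERY.replace("INNER JOIN department", "LEFT OUTER JOIN department"))
prog_mutant = rows(QUERY.replace("INNER JOIN program", "LEFT OUTER JOIN program"))

print(original)     # [(501, 'Devang', 'Computer', 'M.Tech')]
print(dept_mutant)  # adds Abhijeet's row, with deptname None
print(prog_mutant)  # adds Sandeep's row, with progname None
```

The PhD and Mechanical tuples would play the same role for mutants that preserve Program or Department, such as those using a right outer join.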

Program
Progcode  Progname
0         B.Tech
1         M.Tech

Department
Deptcode  Deptname
CS        Computer
EE        Electrical

Student
Rollno  Name    progcode  deptcode
501     Devang  1         CS

Foreign keys:
• Student.progcode → Program.progcode
• Student.deptcode → Department.deptcode

Generated data shows:
• A student (Devang) with a valid program and department
• A program (B.Tech) with no student
• A department (Electrical) with no student
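
With the foreign keys declared (and, as an added assumption, Student.progcode and Student.deptcode declared NOT NULL), a student with an invalid program or department can no longer be generated, so the tuples that matter are the program and the department with no student. This sketch, again using SQLite as a stand-in, checks that the dataset kills the two mutants that preserve Program or Department. The mutant SQL is hand-written for illustration; each is phrased as a LEFT OUTER JOIN over a derived table instead of a RIGHT OUTER JOIN, which SQLite only supports from version 3.39.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE program    (progcode INTEGER PRIMARY KEY, progname TEXT);
CREATE TABLE department (deptcode TEXT PRIMARY KEY, deptname TEXT);
CREATE TABLE student    (rollno INTEGER PRIMARY KEY, name TEXT,
    progcode INTEGER NOT NULL REFERENCES program(progcode),
    deptcode TEXT NOT NULL REFERENCES department(deptcode));
INSERT INTO program    VALUES (0, 'B.Tech'), (1, 'M.Tech');
INSERT INTO department VALUES ('CS', 'Computer'), ('EE', 'Electrical');
INSERT INTO student    VALUES (501, 'Devang', 1, 'CS');
""")

QUERY = """SELECT rollno, name, deptname, progname
FROM student s INNER JOIN department d ON s.deptcode = d.deptcode
INNER JOIN program p ON s.progcode = p.progcode"""

# Program-preserving mutant, (student ⋈ department) ⟖ program
PROG_MUTANT = """SELECT sd.rollno, sd.name, sd.deptname, p.progname
FROM program p LEFT OUTER JOIN (
    SELECT s.rollno, s.name, s.progcode, d.deptname
    FROM student s INNER JOIN department d ON s.deptcode = d.deptcode
) sd ON sd.progcode = p.progcode"""

# Department-preserving mutant, (student ⋈ program) ⟖ department
DEPT_MUTANT = """SELECT sp.rollno, sp.name, d.deptname, sp.progname
FROM department d LEFT OUTER JOIN (
    SELECT s.rollno, s.name, s.deptcode, p.progname
    FROM student s INNER JOIN program p ON s.progcode = p.progcode
) sp ON sp.deptcode = d.deptcode"""

# Compare as multisets: rows padded with NULLs cannot be sorted next to integers
original = Counter(conn.execute(QUERY).fetchall())
killed = {}
for name, sql in [("program", PROG_MUTANT), ("department", DEPT_MUTANT)]:
    killed[name] = Counter(conn.execute(sql).fetchall()) != original
print(killed)  # {'program': True, 'department': True}

# A student like Abhijeet (unknown department CE) is rejected by the foreign key
rejected = False
try:
    conn.execute("INSERT INTO student VALUES (401, 'Abhijeet', 0, 'CE')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Under these constraints, mutants that only make the Student side an outer join become equivalent to the query, which is why this dataset needs fewer tuples than the one without foreign keys.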

Case of no foreign keys