Optimization of Real Conjunctive Queries

Surajit Chaudhuri
Hewlett Packard Laboratories, Palo Alto, CA

Moshe Y. Vardi
IBM Almaden Research Center, San Jose, CA
The
has been
studied
research
nated).
tion
queries
in
In
this
techniques
from
carry
to the
over
we
We
the
set-theoretic
seman-
are
study
not
the
show
under
that
set-theoretic
number
the
such
do not
to
of
that
optimize
is semantically
sive
to
evaluate.
queries
class
queries
that
attention
has
most
this
trast,
of
equivalence
cial
A
amount
of
queries,
arise
in
SQL,
ination
two
functions
practice
class.
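Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.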
lem
for
for
tion
ia given
of
change
used
that
con-
in commer-
semantics;
eliminated
relathat
unless
This
is,
elim-
is done
elimination
expensive.
is,
In
is rnultisets),
as COUNT)
set-
database
sets,
language
duplicate
in the
Second,
for
might
aggregate
are sensitive
to the
semantics
makes
Do
the
known
in
to the
the
the
results
queries
setting
know,
the optimization
queries
conjunctive
As we shall
underlying
to re-examine
conjunctive
set-theoretic
for Computing
requires
e fee
are
tuples.
not
men-
assumes
of tuples.
setting.
and the
above
either
queries
the
con-
about
carry
prob-
bag-theoretic
optimiza-
over
from
bag-theoretic
results
the
setting?
do not
carry
over.
queries
is typically
permission.
ACM-PODS-5/93/Washington, D.C. © 1993 ACM 0-89791-593-3/93/0005/0059...$1.50
fee
the copiee
advantage,
publication
that copying
Mechinery.
and/or
that
without
of
in real-life
The
requested.
First,
multiplicity
We
of
duplicate
term
are
(such
it necessary
Permission
query
is explicitly
reasons.
and
applying
invariably
of
be computationally
since
in
has bag-theoretic
tuples
queries
optimization
relations,
contain
the
con-
optimization
almost
are bags (another
duplicate
has
queries.
on
systems.
results
not
of
for
complexity
however,
to query
databases,
tions
of
so research
a significant
that
expen-
or
do
a result
understood.
computational
semantics;
relations
they
questions
of relational
of queries
are
a query
less
fundamental
In general,
classes
into
equivalence
class of conjunctive
number
into
but
deciding
attracted
is the
a large
equivalent
is undecidable,
on certain
queries
query
optimization.
relational
focused
relational
a given
Thus,
is one of the
in query
fall
to
well
to
is.
research
research
As
problem
conjunctive
management
tioned
on transforming
the
the
theoretic
based
of joins.
very
at ion
queries
database
Techniques
is
of
corresponds
optimization
minimize
14).
minimization
which
is a difficulty,
results
Introduction
is the
number
what
junctive
1
research
the
minimiz
There
setting.
[ASU79A,
see [U89](Chapter
queries
how
we know
bag-
conjunctive
of conjuncts,
research,
know
optimization
setting
bag-theoretic
the
for
extensively
DBS90];
of this
junctive
elimi-
optimiza-
ive queries
focus
this
studied
CM77,
The
com
problem
been
minimizing
In
bag-theoretic
conjunct
semantics.
this
95120-6099
optimization
has
ASU79B,
queries
eliminated).
duplicates
paper
for
are
have
general
problems
theoretic
assumes
duplicates
SQL
(i.e.,
conjunctive
Unfortunately,
invariably
(i.e.,
contrast,
for
extensively.
almost
semantics
tics
problem
Chaudhuri: Palo Alto, CA 94304. E-mail: chaudhuri@hpl.hp.com
Vardi: Almaden Research Center, San Jose, CA. E-mail: vardi@almaden.ibm.com
Abstract
The
Queries
Moshe
Chaudhuri
Packard
E-mail:
of Real Conjunctive
0-89791
Equivalence
D.C.
-593 -3/931000510059
approached
. ..+1 .50
co
of conjunctive
via
the
notion
of
containment.
In
the
set-theoretic
setting,
contained
in another
is always
a subset
Two
queries
in
answer
to
if they
of cent ainment.
vestigation
under
thus
start
conjunctive
bag-theoretic
query
leaves
lations
are
the middle
of conjunctive
theoretic
setting
tainment
in the
problem
is known
ASU79B,
CM77].
some
sufficient
under
We
is at
erarchy,
the
NP(=
semantics
plexity
X;)
that
does
(rather
the
theoretic
semantics.
seems
der
set-theoretic
to
fact,
portant
the
sible;
ordering
The
picture
queries
are
by
conjuncts
above
for
The
result
practical
queries.
We
indicate
that
consider
unions
present,
perhaps
of
same
from
seems
not
indicates
the
in the
a in a bag
set.
B will
B is clear
is a bag
of tuples
is an assignment
names.
which
(wing,
is
where
pre-
An
is that
“;”
the
relation
PART consists
of
tuples:
[2]), (flap,
Seattle;
Portland;
im-
Seattle;
[1]),
[1])}
~~~,
the
end
is indicated
but
tuples.
That
Seattle)l
=
of
the
tuple
in square
PART contains
Seattle),
the other
under
marks
that
(engine,
two
only
copies
one
of each
Seattle)l
I(wing,
the
This
of the tuple
copy
is, I(engine,
1, and
and
brackets.
of
= 2,
Portland)l
=
of conis posto re-
We say that
query.
ement
a fairly
bleak
than
of conjunctive
two
Let
complex-
be restricted
all
database
annotated
{(engine,
under
of conjuncts
conjunctive
2.1:
the following
means
however,
relat ton
a
of an ele-
Ial, when
A
of
Intuitively,
element
A
arity.
annota-
un-
semantics
to paint
the
(or simply
to relation
Example
prob-
queries
optimization
case
multiplicity
of an element
context).
of relations
the
the
of an element
fixed
conjunctive
in the
this
occurrences
for
by lal~
the
multiplicity
must
in
integer.
duplicate
multiplicity
[GJ79].
result
contain
be denoted
bag-
the
no optimization
removal
and bag-
that
we then
the
isomorphic.
of this
represents
elements;
called
is a positive
of duplicates
be NP-hard
two
situation
show
re-
duplicates
Semantics
also
the multiplicity
NP-complete
bag-theoretic
queries
optimization
the
ment;
is
equivalence
the
no
is possible.
of an element,
number
com-
problem,
to
semantics
any
of
the
underlying
equivalence
has
that
consequence
bag-theoretic
junctive
level
in
is still
known
under
when
may
optimization
While
CM77],
show
are equivalent
cisely
bag
hi-
to be strict,
of the
graph-isomorphism
we
element,
Since
surprisingly,
easier.
We
the database
set-theoretic
A bag is a set of annotated
the
computational
semantics
is not
Bag-Theoretic
11~-hard.
first
problem
semantics
but
2
op-
queries.
(i.e.,
this
We
optimization
tion
containment),
Here
ASU79B,
as the
the
query
than
be
bag-theoretic
In
for
equivalence
lem
NP
con-
polynomial
is believed
the
be sets
between
some
of some
concept
examine
[ASU79A,
is
[St77].
change
where
Intuitively,
semantics.
is decid-
problem.
key
equivalence
in
the
semantics.
of meaningful
of conjunctive
to
re-
for
whether
problem
is at
increase
of the
Since
know
hierarchy
hierarchy
indicates
conditions
of the
bag-theoretic
of
this
latter
we do have
semantics
level
con-
that
[ASU79A,
necessary
the
second
while
than
The
while
even
that
polynomial-time
ity
contrast,
polynomial-time
this
harder
setting.
bag-theoretic
prove
the
be
in the bag-
to be NP-complete
In
we do not
tainment
able.
to
set-theoretic
and some
containment,
II;
seems
queries
known
ground
theoretic
Containment
under
equivreduces
respectively,
the possibility
the situation
are allowed).
con-
semantics.
equivalence,
unions
also consider
and
queries
[SY8 1]. We show
hold
for
containment
conjunctive
and
open
timization
in-
of
queries
does not
This
to
our
union
containment
sult
we say
answer
semantics,
of
conjunctive
can be expressed
We
by considering
tainment
to
latter.
if the answer
of the
equivalence
alence
contained
setting,
a subbag
set-theoretic
is
former
the
are
in another
is always
and again
terms
the
of
is contained
to the former
a query
to the
Inthebag-theoretic
a query
the latter,
that
answer
are equivalent
in each other.
that
we say
if the
results
is lost.
queries.
or equal
will
we
be denoted
between
the
Under
by
=b
two
(=.).
~b.
by ~..
bags
(resp.,
of B’ if each el-
also in B!
multiplicity.
bag relationship
that
First,
a bag B is a subbag
of B is contained
We will
The
The
with
define
subset
equality
sets)
will
a greater
the sub-
relationship
relationship
be denoted
by
Example
2.2:
In this
example,
The
B ~b B’.
operational
queries
B = {(engine, Seattle; [1])}
B' = {(engine, Seattle; [2]), (flap, Seattle; [1]), (wing, Portland; [1])}
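The subbag relationship and bag union can be tried out on the bags B and B' of Example 2.2. The following is a minimal Python sketch, not part of the paper: it assumes a bag is represented as a `collections.Counter` from tuples to multiplicities, and the helper names `is_subbag` and `bag_union` are hypothetical.

```python
from collections import Counter

# A bag maps each tuple to its multiplicity (a positive integer).
B = Counter({("engine", "Seattle"): 1})
B_prime = Counter({
    ("engine", "Seattle"): 2,
    ("flap", "Seattle"): 1,
    ("wing", "Portland"): 1,
})


def is_subbag(small, large):
    """True if each tuple of `small` occurs in `large` with at least the same multiplicity."""
    return all(large[t] >= m for t, m in small.items())


def bag_union(left, right):
    """Bag union adds the multiplicities of each tuple in the two bags."""
    return left + right


union = bag_union(B, B_prime)
```

Under these assumptions B is a subbag of B', and the union assigns multiplicity 3 to (engine, Seattle), in line with Example 2.3.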
in
are
bag
also define
union
of two
multiplicity
bags.
bags
factors
The
bag
Example
B’
the
2.3:
in Example
(wing,
tuple
in either
be denoted
The
k
[3]), (flap,
the
on
Example
B
and
the
bag:
Seattle;
Let
Example
for
this
section
mantics
we describe
is the
we start
and
the
the
bag-theoretic
queries.
optimization
with
the
Since
of
description
of SQL
se-
our
“real-life”
sults
An
SQL
Conjunctive
conjunctive
the following
SELECT columnlist
FROM rellist
WHERE equalitylist
where
columnist
selected,
(called
ple
tion
the
table
queries,
conjunctive
query
is an
the
complete
is
SQL)
names
equalit
among
syntax
cross
conjunctive
3.1:
The
tuple
of the
tuples:
Portland;
SQL
query
of relation
the
[1]),
ylist
following
wing,
Seattie,
application
. ID ,
SUPPLIER
, PART
WHEItE
SUPPLIER
. CITY
re-
[1]),
[1])}
Seattle;
the
the
[2]),
Portland;
flap,
of
only
Seattle,
condition
following
engine,
Seattle,
flap,
the application
following
relation
with
in
tuples
the
qualify:
Seattle;
[2]),
Seattle;
[1]), }
of the selection
relation
as answer
list results
to the
query
3.1.
{(Boeing, engine; [2]), (Boeing, flap; [1])}
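The operational reading of an SQL conjunctive query (cross product of the relations in the FROM list, then selection by the equalities, then projection without duplicate elimination) can be sketched on the SUPPLIER and PART relations of Example 3.1. This is an illustrative Python sketch under stated assumptions: relations are `Counter` bags, the function name `select_project` is hypothetical, and the selection and projection are passed in as plain Python functions rather than parsed from SQL.

```python
from collections import Counter
from itertools import product

# Columns of both relations: (ID, CITY). Values map to multiplicities.
SUPPLIER = Counter({("Boeing", "Seattle"): 1})
PART = Counter({
    ("engine", "Seattle"): 2,
    ("flap", "Seattle"): 1,
    ("wing", "Portland"): 1,
})


def select_project(relations, condition, columns):
    """Cross product, then selection, then projection, keeping duplicates."""
    result = Counter()
    for combo in product(*(rel.items() for rel in relations)):
        rows = [row for row, _ in combo]
        multiplicity = 1
        for _, m in combo:
            multiplicity *= m  # a product tuple inherits the product of multiplicities
        if condition(*rows):
            result[columns(*rows)] += multiplicity
    return result


# SELECT SUPPLIER.ID, PART.ID FROM SUPPLIER, PART WHERE SUPPLIER.CITY = PART.CITY
answer = select_project(
    [SUPPLIER, PART],
    condition=lambda s, p: s[1] == p[1],
    columns=lambda s, p: (s[0], p[0]),
)
```

With these inputs the sketch yields (Boeing, engine) with multiplicity 2 and (Boeing, flap) with multiplicity 1, which is the answer bag stated in the example.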
tu-
is a conjuncattributes.
we refer
the
(For
reader
is an example
of a
3.2
Logical
In this
subsection,
mantics.
SUPPLIER
relations
of
query.
SELECT
Seattle;
Seattle,
of conjunctive
FROM
of the
attributes
of
(possibly
relation
of SQL,
list
product
(Boeing,
equalitylist
to [D87].)
Example
that
of the
[1])}
engine,
in the
is the list
and
conjunctive
relation:
{(Boeing,
rellist
of equalities
in the
[1])}
[2]), (wing,
Seattle,
Finally,
variables),
among
us assume
consists
{(Boeing,
After
Queries
itylist
in
are
the
Let
PART consists
Seattle;
in Example
to
that
the
tuples
moti-
form:
FROM
in
qualifying
Seattle;
Seatt/e;
(Boeing,
be
for
(Boeing,
SQL
taken.
be described
3.1.
SUPPLIER
relation
in the
queries.
3.1
is
conditions
Queries
of conjunctive
vation
will
the
cross
obtained
the
us consider
{(Boeing,
[1]),
Therefore,
In
rellist
attributes
in
relation
(flap,
Conjunctive
for
the
selection
tuple
details
3.2:
query
{(engine,
3
the
The
First,
paper.
SQL
[1])}
Portland;
each
conjunctive
[D87]
of the
bags
the
to
projected
full
in
of the
Finally,
product.
colunmlist.
by U.
the
B U B’
Seattle;
The
by adding
us consider
2.2.
{(engine,
each
will
Let
operator.
is obtained
for
union
bag unton
each
equalitylist
(see
of SQL).
relations
we apply
cross
We will
the
of SQL
as follows
semantics
product
Next,
semantics
defined
The
the previous
PART. ID
alent
= PART. CITY
(for
we describe
queries
two
and
section)
see also
the
their
approaches
a detailed
see [NPS91];
Queries
Conjunctive
are then
semantical
[MPR90]).
logical
syntax
denotational
se-
(of this
section
shown
to be equiv-
account
and
of SQL,
A logical
conjunctive
query
is a rule
The
of the form:
result
denoted
Query(X) :- C_1(X_1), ..., C_n(X_n)
assignment
result
where
the
X and the Xi’s
Cj’s
are
head
of the
body
of the
a query
relation
query
in this
that
there
rather,
by multiple
variables
will
are
the
distinguished
of the
over
mapping
bag
Example
3.3:
The
equalities
are cap-
variables,
Example
3.6:
ample
and
3.3
assignment
The
(a)
s-id
mapped
and
p_id
p.id)
Supplier(s-id,
Part(p.icl,
(b)
The
denotational
queries
mappmgs.
tive
query
of data
that
mapping
in Q.
X,
3.4:
Let
and
database
3.3
mapping
where
to Seattle
and
mapping.
Ci (Xi)
is an
tive
variables
of Q is
Section
and
let
the
X
be a
data
value
by L9(C;(X;))
query
Example
to
engine,
now
due
to
rni
10( Ci(Xi))l,
to
the
=
0 of
Q
define
an
The
over
multiplicity
i =
D
the
tuple
Boeing,
c-id
derived
mapping
1, . . . ,n.
is the
tuple
use
Example
due
tion
name
The
following
ical
syntax.
{(Boeing,
the
engine;
Consider
The
(O(X);
Example
assignment
[2])}
p-id
of
(a)
assign-
Therefore,
the
wing;
[1])}
Conjunctive
by
0.
the
result
[m])
a
the
in
3.4.
The
there
relation
in
order
that
of
no rela-
once in rellist.
every
the log-
relation
in
variable.
rellist
introduce
with
as the
con-
beginning
the
a
variables
attributes
in
the
information.
every
attribute
equalit
for
ylist,
among
equalities
there
induces
tion
on the variables.
tive
from
each
variable
corresponds
each
variables
is
of
conjunct
variable,
each
of SQL
generate
a distinct
re-
1
than
of steps
attribute
same
equality
mapping
form
in the
more
sequence
every
tinct
with
of conjunc-
we assume
introduce
3. Since
due
a transformation
syntax
canonical
is repeated
schema
Let
sketch
to logical
as introduced
every
2. For
$m = m_1 m_2 \cdots m_n$.
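The denotational reading can be sketched directly from this product rule: each assignment mapping contributes the product of the multiplicities of the tuples its conjuncts are mapped to, and the answer is the bag union of these contributions. The Python below is a rough sketch, not the paper's formulation: the function name `evaluate`, the encoding of a query as a head tuple plus a list of `(relation, arguments)` pairs, and the choice to enumerate assignments over the active domain are all assumptions of this illustration.

```python
from collections import Counter
from itertools import product


def evaluate(head, body, db):
    """Bag answer: sum, over assignments, of the product of conjunct multiplicities."""
    variables = sorted({v for _, args in body for v in args})
    domain = sorted({value for rel in db.values() for row in rel for value in row})
    result = Counter()
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        m = 1
        for relation, args in body:
            # Counter returns 0 for tuples that are not in the relation.
            m *= db[relation][tuple(theta[a] for a in args)]
        if m:
            result[tuple(theta[v] for v in head)] += m
    return result


db = {
    "Supplier": Counter({("Boeing", "Seattle"): 1}),
    "Part": Counter({("engine", "Seattle"): 2,
                     ("flap", "Seattle"): 1,
                     ("wing", "Portland"): 1}),
}
# Query(s, p) :- Supplier(s, c), Part(p, c)
answer = evaluate(("s", "p"), [("Supplier", ("s", "c")), ("Part", ("p", "c"))], db)

# Q(X) :- p(X) versus Q'(X) :- p(X), p(X) over the database {(p(a); [2])}
small_db = {"p": Counter({("a",): 2})}
q_answer = evaluate(("X",), [("p", ("X",))], small_db)
q_prime_answer = evaluate(("X",), [("p", ("X",)), ("p", ("X",))], small_db)
```

On the second database the sketch returns two copies of the tuple for Q and four for Q', so the two queries agree as sets but not as bags.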
3.5:
to
[1])}.
SQL
3. For simplicity,
These
sult
and
that
bag:
we briefly
rellist,
is an assignment
assignment
There
possible:
Seattle
result
[2]), (Boeing,
syntax
query
1. For
in Ex-
3.2.
9
can
to
in Ex-
of assignment
The
vs.
section,
in
We
are
except
result
wing;
in the
corresponding
query
that
c-id
3.5.
engine;
will
D,
the
3.2.
queries.
We
is mapped.
in
query
as (l),
The
is {(Boeing,
the SQL
junctive
is mapped
to
In this
from
@ be an assignment
us consider
s-id
p-id
D
to the
we denote
the
Example
Queries
of a conjunc-
by O(X)
and
Same
Example
Logical
conjunc-
in the body
Let
database
to which
Example
ample
D
in
assignment
a database
in
conjunct
We denote
O maps
tuple
into
in D.
of Q into
variable
mapping
values
every
of
us consider
c-id)
of logical
terms
assignment
to a tuple
to which
in
Q as above
of Q such
the
defined
An
assignment
mapped
semantics
is
mappings
mappings
results
3.2.1
tive
assignment
(b)
{(Boeing,
: –
c-id),
to these
to wing.
in
evaluation
(s-id,
by takmappings
to Boeing,
to engine;
ment
3.2 can be
as:
Query
re is the
assignment
Let
two
is mapped
in Example
and
is obtained
all
database
are
van’ables.
query
due
D,
O is any
D
is, Q(D)
over
a database
where
D.
is given
expressed
over
1-1oro,
of Q into
union
results
Q
by
upon
in the
of variables.
selected
the
is the
look
of literals
equalities
occurrences
X
be called
sometimes
or set
ing
a query
is given
due to 0. That
is the
. . . . C~(X~)
are no explicit
representation;
tured
head
will
bag
and
Query(X)
C’l(X1),
(we
as the
Note
of variables,
names.
and
query
itself
body).
are tuples
of
Q(D),
equality
a dis-
is a corresponding
such
as X
an equivalence
We select
equivalence
to
predicate
class,
by its representative.
=
Y.
rela-
a represent
and
a-
replace
4. The
the
distinguished
variables
representatives
of the
respond
in the
head
variables
to the attributes
in the
An
are
that
important
ment
cor-
for
relationship
colundist.
observation
conjunctive
than
is that
queries
bag
contain-
is a strictly
set containment
for
stronger
conjunctive
queries.
Example
3.7:
ample
3.1.
CITY
Assume
as the first
SUPPLIER
step,
the
the
the
of SUPPLIER
attributes
query
in
Ex-
has
ID
and
schema
second
attributes
relations.
s-id,
variables
of Part.
In
the
we create
the
c-id
p-id
(a)
first
for
and
second
the
Proposition
4.1:
of the
In
variables
and
transformation,
SQL
the
the
PART
we introduce
the
the
that
and
and
tributes
for
Consider
at-
For
any
ifQ
<b Q’
c-id),
step
(b)
.,
of
There
exist
such
body
Part(p-id,
that
the
cl.-id
third
step,
to obtain
we apply
the
the
The
c.id),
head
result
Query
the
equality
c-id
=
ing
Query
Part(p.id,
(s-id,
c-id).
(s-id,
us assume
p-id)
p-id)
in the
fourth
Part
(p-id,
c-id)
Q
database
Z’ransform(Q)
obtained
from
denotes
an SQL
the
query
3.8:
Let Q be an SQL
the
Transform(Q)
about
results
to any
be easier
the
of
of
Therefore,
for
D
Q
the
the
and
prove
our
rest
syntax
of this
semantics.
paper,
of
Q is bag contained
denoted
by Q <b Q’,
we use the
D.
In contrast,
be denoted
~b Q’(D)
set containment
by Q ~.
result
of
Q
has
two
of
Q’
has
exactly
(bag)
us consider,
result
Q ~b Q’.
however,
[1]), (q(a);
a
[2])}.
tuples,
one
B
Q’,
any
would
the
that
a conjunct
distinguished
next
by
be
of set
mappings.
each
conjunct
to a
It
in the
of Q.
an identity
there
Q’
of Q’ to vari-
in the body
The
mapping
is known
that
is a containment
to Q (see [CM77]).
can we strengthen
proposition
queries.
to
a characterization
Proposition
of Q in
Q’.
u maps
when
Q
obtained
a query
variables.
precisely
from
from
u of variables
of Q such
bag
a characterizabe
of cent ainment
of Q’ into
Q’
set contain-
characterization
mapping
is required
that
than
expect
known
in terms
subsection
would
ables
How
over
one
containment
Q is a mapping
the
Containment
stronger
body
for
query
for
is strictly
Thus,
query
Results
in another
if Q(D)
{(p(a);
A containment
Conjunc-
and
Let
tuples
seen in the previous
of bag
Q <s
Definitions
A query
will
q(X)
Q’.
Conditions
get
Q’
following
the
(bag)
containment
problems
and
S,
4.2
mapping
database
p(x)
strengthening
results
Queries
Basic
P(X),
Therefore,
ment.
are equal.
and equivalence
logical
Containment
4.1
the
consider
:-
tuple.
mapping
tive
us
:-
containment
and
latter.
4
Let
Q(X)
the
We have
conjunctive
applying
database
to state
containment
terms
by the follow-
4.l(b)
Q’(X)
with
the
tion
in
Q’,
Q
transformation.
Then
It will
and
: –
c.id),
that
query
query.
Proposition
4.2:
whereas
Theorem
Q
<b Q’ does
but Q
example.
Then,
above
queries
holds,
queries:
1
by the
Q’
cl-id).
Clearly,
Let
Q’,
is:
Supplier(s-id,
conjunctive
and
Q’.
hold.
Example
We add
Q
also Q <.
body
Suppiier(s-id,
step.
queries
then
conjunctive
Q <,
We illustrate
In
conjunctive
holds,
cl-id
not
Suppiier(s-id,
two
this
of bag
provides
4.3:
Let
characterization
containment?
to
The
a clue.
Q
and
Q’
be conjunctive
(a)
If
Q
<h
Q’,
then
the number
no less than
query
(b)
for
any
relation
ofp-conjuncts
name
in the query
the corresponding
p,
Theorem
is
queries.
IfQ
relation
name,
Q’
number
in the
Q.
tainment
If Q <h Q’,
there
then
for
every
M a containment
to Q such
that
conjunct
mapping
The
above
Q’
1 E u(Q’).
proposition
between
tainment.
If
set
Q
<,
less constraining
Q’,
the
there
are
conjuncts
is for
Q to be contained
Q’,
however,
be fewer
Q’
To get
Q’
result
bag
yields
and
containment,
juncts.
As
independently
nan
[I R92].
We riow
of “coverage”
show
yields
that
ple,
Q <~
Example
4.fi
enough
that
a strong
It
con-
requires
enough
queries.
Q’
If there
onto
Q,
Q
Q’
for
the following
two
r(Z)
Q(X,
Z)
a
p(x),
q(U, X),
q(V, Z),
T(Z)
is easy
that
that
from
there
Q’
is no
to Q.
onto
We
con-
can
show,
Q <b Q’.
definition
of containment
over
show
all
that
this
for
a method
this
bag
to test
such
as the
quan-
full
paper,
can
over
somea finite
the multiplicities
While
procedure
the
quantification
where
symbolically.
cases,
In
quantification
by
of databases,
given
involves
databases.
be replaced
number
from
following
observe
however,
tification
to Q is onto
mapping
to
mapping
provide
the
us consider
q(V, Y),
decision
Consider
containment
H
Q <h Q’.
4.5:
is no onto
q(U, Y),
tain
Example
there
exam-
bag
be conjunctive
is a containment
then
condi-
In the following
to Q.
Let
tainment
notion
condition
and
Q’
of an onto
a necessary
p(X),
we will
Q
but
existence
is not
discov-
Ramakrish-
z
times
Let
4.6 were
and
Z)
that
if Q ~b a(Q’).
4.4:
Q.
Theorem
the
the same
M a con-
Q’(X,
The
Proposition
onto
with
there
multiplicity,
shows,
u from
iff
queries.
would
in Q by the conjuncts
mapping
Q’,
be conjunctive
in Ioannidis
cent ainment.
from
Q’
Q’
however,
mapping
containment.
A containment
from
mapping
bag
Q’
it
to ensure
have
a sufficient
general,
for
multiplicity.
enough
4.3
of the conjuncts
Q <b
4.4 and
ered
tion
and
conjuncts
consequently
lower
Q’ should
is
the
conjuncts
there
and
high
Proposition
‘(coverage”
in Q.
that
“easier”
we need
with
means
the
that
of
of Q’
Thus,
Fewer
means
tuples
body
con-
Q
then
mapping
containment
dif-
bag
of Q.
Q’,
mappings,
in
t uples
that
the
body
in
a basic
and
in Q’.
would
assignment
would
out
then
than
fewer
in
brings
containment
Let
has no two
Proposition
1 in Q,
u from
In
ference
4.6:
does
not
containment,
bag
are
yield
it
containment
queries
a
does
in cer-
in Example
4.7.
three
queries:
Complexity
4.3
Q(X, Y) :- s(X, Z), t(W, Y)
Q'(X, Y) :- s(X, Z), t(Z, Y)
Q''(X, Y) :- s(X, Z), t(Z, Y), t(Z,
The
complexity
from
Q
Q onto
to
Q“
that
Q“.
~,
also
Q’ and
Thus,
Q.
tainment
there
On
u(Y,
out
that
a containment
a containment
we
the
mapping
turns
is
have
that
W)
Proposition
Q’
hand,
there
from
Q onto
Q“.
~b Q.
mapping
mapping
other
Q“
of set
be NP-complete
lated
Observe
<b
of
Containment
W’)
from
ilar
Q
and
is no
con-
Indeed,
Since
Theorem
characterization
is known
the
4.6
to
condition
are
of
closely
re-
of set containment,
that
this
condition
The
problem
has a sim-
complexity.
4.8:
whether
1
and
surprising
Theorem
it
4.4
to the
it is not
containment
[CM77].
there
conjunctive
ts a conjunctive
query
Q’
onto
of
determining
mapping
a conjunctive
from
a
query
Q
is NP-complete.
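Since the problem is NP-complete, a direct test has to search over candidate mappings. The following brute-force Python sketch illustrates such a search; it is not the algorithm of the paper. It assumes that a containment mapping sends the head of the source query to the head of the target query and every conjunct of the source to a conjunct of the target, and that "onto" is counted per occurrence, so a conjunct repeated k times in the target needs at least k preimages. The function names and the query encoding are hypothetical.

```python
from collections import Counter
from itertools import product


def onto_containment_mappings(q_from, q_to):
    """Yield every variable mapping from q_from onto q_to (exponential search)."""
    head_f, body_f = q_from
    head_t, body_t = q_to
    vars_f = sorted({v for _, args in body_f for v in args} | set(head_f))
    vars_t = sorted({v for _, args in body_t for v in args} | set(head_t))
    needed = Counter(body_t)
    for image in product(vars_t, repeat=len(vars_f)):
        sigma = dict(zip(vars_f, image))
        if tuple(sigma[v] for v in head_f) != tuple(head_t):
            continue  # the head must be preserved
        mapped = Counter((rel, tuple(sigma[a] for a in args)) for rel, args in body_f)
        if any(atom not in needed for atom in mapped):
            continue  # some conjunct is not mapped to a conjunct of the target
        if all(mapped[atom] >= k for atom, k in needed.items()):
            yield sigma  # every target occurrence is covered


def has_onto_containment_mapping(q_from, q_to):
    return any(True for _ in onto_containment_mappings(q_from, q_to))


# Two small queries chosen for illustration.
Q = (("X", "Y"), [("s", ("X", "Z")), ("t", ("W", "Y"))])
Q_prime = (("X", "Y"), [("s", ("X", "Z")), ("t", ("Z", "Y"))])
# One conjunct versus the same conjunct repeated twice.
single = (("X",), [("p", ("X",))])
double = (("X",), [("p", ("X",)), ("p", ("X",))])
```

The search enumerates every function from the variables of one query to the variables of the other, so its cost grows exponentially with the size of the queries, not with the size of the database.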
The
be
condition
necessary
conjunctive
of Proposition
and
sufficient
4.4
for
turns
a large
out
to
class
of
The
queries.
tainment
suggestion
given
of
by the
intractability
NP-completeness
of
set
con-
result
if
[CM77]
is somewhat
plexity
is
is in terms
typically
database.
misleading,
much
smaller
test
if Q <,
To
apply
Q’
yields
the goal
arise
to the
in practice,
body
of Q.
this
full
paper
rithm
for
testing
than
Q’,
the
com-
the
of
the
have
is quite
describe
Lemma
this
queries
of
algo-
of onto
containment
existence
of onto
containment
that
mapping
the
is in
general
so Theorem
complexity
now
of bag
describe
in terms
tells
oracle
Turing
St77]
second
bound
level
reader
It
Q’(X)
Clearly,
The
is referred
is in
is believed
of
returns
to
cates.
that
bound
4.9:
The
for bag con-
bag containment
Q’
problem
is
saw
harder
than
complexity
of
problem.
set
containment
containment
even
know
is
The
pre-
is an
open
containment.
bag
We do not
bag
if the
4.3)
Q’.
Over
but
the
this
Q’ returns
Q #b Q’.
database
database,
four
Q
dupli-
,
problem
for
bag
containment
in
a conjunctive
of
Q
by
such
a sufficient
Q’.
We
show
coverage
had
(Proposi-
for such
We now
of
query
coverage
condition
4.4).
two
zsomorphac
ment
4 that
Q
that
coverfor
condition
bag
is neces-
sufficient.
conjunctive
iff
mappings
there
from
queries
are
Q and
one-to-one
Q’ onto
Q and
Q’
containvice
versa.
is
Theorem 5.2: Let $Q$ and $Q'$ be conjunctive queries. Then $Q \equiv_b Q'$ iff $Q$ and $Q'$ are isomorphic.
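The isomorphism test can be sketched as a search for a one-to-one renaming of variables that carries the head to the head and the bag of conjuncts to the bag of conjuncts. The Python below is a naive sketch under stated assumptions: all arguments are treated as variables (no constants), the function name `are_isomorphic` and the query encoding are hypothetical, and the search simply tries every permutation.

```python
from collections import Counter
from itertools import permutations


def are_isomorphic(q1, q2):
    """True if some bijective renaming of variables turns q1 into q2."""
    head1, body1 = q1
    head2, body2 = q2
    vars1 = sorted({v for _, args in body1 for v in args} | set(head1))
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    if len(vars1) != len(vars2) or len(body1) != len(body2):
        return False
    target = Counter(body2)  # conjuncts compared as a bag, so repeats matter
    for image in permutations(vars2):
        rename = dict(zip(vars1, image))
        if tuple(rename[v] for v in head1) != tuple(head2):
            continue
        renamed = Counter((rel, tuple(rename[a] for a in args)) for rel, args in body1)
        if renamed == target:
            return True
    return False


q_a = (("X",), [("p", ("X",)), ("q", ("X", "Y"))])
q_b = (("A",), [("q", ("A", "B")), ("p", ("A",))])
single = (("X",), [("p", ("X",))])
double = (("X",), [("p", ("X",)), ("p", ("X",))])
```

In this sketch `q_a` and `q_b` differ only by renaming and reordering, so they are isomorphic, while `single` and `double` are not, because the repeated conjunct changes the multiplicities of the answer.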
ConjuncCorollary
tive
us consider
[2])}.
a simple
and
queries.
of
P(X)
“coverage”
and
Theorem
Equivalence
se-
Q and
p(x), p(x)
conditions
We say that
are
decidable.
5
queries
: –
query
requires
sary
that
is
equivalence
:-
in Section
equivalence
4.9 suggests
semantics
the
set-theoretical
the
duplicates
age (Proposition
11~-hard.
Theorem
under
Let
Therefore,
We
tion
cise
two
necessary
Theorem
Q’.
of {(p(a);
a conjunctive
lower
than
to
of con-
the
tainment.
indeed
Q =,
consisting
in H;.
the
similar
equivalence
bag-theoretic
Consider
Q(X)
problem
II;
when
We
in terms
class
hierarchy.
state
this
5.1:
the
general.
results,
the
property
queries
neces-
about
hierarchy.
The
contained
We can now
for
the
details.
of the
not
is defined
machines;
for
is strictly
in
polynomial-time
hierarchy
[GJ79,
but
us nothing
containment
a lower
of the
polynomial-time
NP
sufficient
4.8
that
under
conjunctive
Example
Recall
to prove
stronger
precisely
<~ Q hold.
mantics.
mappings.
sary,
queries
a strictly
practical.
existence
Q’
4.1, showing
junctive
that
a similar
Q -b Q’ holds
Q’. clearly,
Q <h Q’ and
It is straightforward
to
see whether
algorithm
we will
size
For many
Q =
both
which
we simply
of Q and
tuple
In the
since
of the size of the queries,
Queries
Corollary 5.3: Bag equivalence of conjunctive queries is polynomially equivalent to graph isomorphism.
The
focus
that
In
on
the
and
containment
tion
vates
the
A query
Q’(D).
setting
both
We
is quite
equivalence
any database
If
denote
equivalence
Q
and
it
by
fact
Corollary
equivalence
D,
queries
are
NP-
equivalence
in
Sec-
ter
problem
in the
bag-
morphism
difficult.
This
moti-
are
•~
of Q and
to another
bag
Q’.
Q’
In
will
The
query
crux
tive
equivalent,
we
placement
contrast,
the
denoted
lent
queries
to
here,
of
the
in
NP,
[GJ79].
misses,
in
of a conjunctive
optimizing
query
it is known
with
that
setting
query
by
a smaller
for
it
isois not
Focusing
however,
set-theoretic
set
the lat-
graph
but
of
than
While
[CM77],
be
matter
in the
equivalence
easier
queries.
be NP-complete
conjunctive
conjuncts:
by
bag
perhaps
is NP-complete
is known
to
us that
is
of conjunctive
complexity
=,
be
Q(D)
tells
queries
problem
known
the
we have that
5.3
conjunctive
directly.
Q’
Q
the
containment.
however,
saw,
Q is bag equivalent
Q’ iff over
from
to
containment
semantics
studying
stems
reducible
of conjunctive
[CM77].
4 that
theoretic
set
is
set-theoretic
complete
will
containment
equivalence
every
the
on
point.
conjuncis the
re-
an equivanumber
conjunctive
of
—
query
Q there
tive
query
other
is a mmzmally
Q’,
i.e.,
conjunctive
than
Q’
orem
however,
bag-equivalent
up
optimization
the
and
tion
class
ttes
conjunct
An
Theorem
5.2
of conjunctive
the
{(Boeing,
are
and the relation
the
ive queries
not
carry
in
over
with
Then,
ques-
Seatt/e;
[1]),
to
the
first
SQL
query
SQL
provides
obtained
SQL
and
the ability
by
second
SQL
the
Therefore,
statement
for
Semantics
to take
evaluating
union
individual
bag
union
of the bags
queries.
is given
ALL
A and B are SQL
compatible,
tributes
i.e.,
(the
quired
corresponding
to be type
A and
B are SQL
sume
that
evaluating
lations
the
yields
TA
A UNION
union
same
TB
are
by:
A
union
our
queries.
the
U
a database.
Then,
and
TB.
6.1:
Consider
schema
three
relations
PART (ID,
CITY)
, CAPITAL
(CITY,
the
of the
[1])}.
and
Semantics
expressions
is an ex-
u . . . UQn(X),
tween
by
for
the re-
SQL
query
database
CITY),
arity
the SQL
The
for
result
of
all
set of
U over
equivalence
and the logical
a
be-
approach
extended
of conjunctive
to
queries.
We can represent
in Example
and
same
can be easily
union
6.2:
the
The
a~p~oach
queries
query
and
the
SQL
6.1 by QI (1) U Q2(I),
query
where
given
consists
SUPPLIER
a conjunctive
same
variables.
conjunctive
equivalence
by taking
is
the
D is U1<i<nQi(D).
Example
The
in
form
Qi
has
database
us as-
obtained
each
Qi’s
given
Example
results
[2]), (.fiap;
of conjunctive
distinguished
purpose,
Let
relations
B is obtained
below.
query
Syntax
of the
where
of at-
are also re-
For
ALL
of TA
relation
[11)}
combined
Logical
6.2
The
number
attributes
conjunctive
and
the
and are union-
compatible).
A and B over
for
bag
the
relations
B
statements
contains
the
[2])}.
query
Q1(X)
where
[1])}
I
pression
A UNION
Pittsburgh;
yields
{(engine;
{(engine;
Syntax
[1]),
the
Queries
SQL
tuples:
Portland;
[1]), (brake,
{Uk
6.1
[1])}
(~iap,
Seattle;
inequalz-
Conjunctive
of
SUPPLIER
for PART has the following
Seatfie;
(engine,
The
Union
for
applicable
interesting
queries
{(engine,
bssic
[K88].
6
relation
are identical
Thus,
is simply
setting.
is whether
larger
for
that
tuple
The-
queries
they
reordering.
bag-theoretic
to
conjunctive
setting
us assume
has the
no
to Q has fewer
According
when
technique
set-theoretic
in the
two
Let
conjunc-
to Q and
equivalent
[CM77].
precisely
to renaming
equivalent
is equivalent
query
conjuncts
5.2,
Q’
of
Q~(l):-part(l,
(ID,
C), supplier(l,
QZ(l):-part(l,
C)
C), capita/(C,
N)
COUNTRY).
9
SELECT PART.ID
FROM PART, SUPPLIER
WHERE PART.CITY = SUPPLIER.CITY
UNION ALL
SELECT PART.ID
FROM PART, COUNTRY
WHERE PART.CITY = COUNTRY.CAPITAL

6.3 Equivalence and Containment

Sagiv and Yannakakis [SY81] have shown that if $Q \le_s \bigcup_i Q_i$, then there must exist some $Q_j$ in the union such that $Q \le_s Q_j$. This suggests the following approach to optimizing a union of conjunctive queries $\bigcup_i Q_i$ in the set-theoretic setting: (a) eliminate redundant conjunctive queries, i.e., eliminate $Q_i$ if $Q_i \le_s Q_j$ for some $j \neq i$, and then (b) replace each remaining conjunctive query by a minimally equivalent conjunctive query.
Assume
giv
The
query.
that
did
<~
answers
(b)
junctive
mizable
Qj
applicable,
query
in the
union
theoretic
that
does
carry
not
assume
that
plicates
and consists
ated
due
cates
that
Q’
and
6.3:
and Q“
First,
to
Q’(X,
Q“(X,
each
such
duplilater
by
as COUNT or AVERAGE.
: –Student(id,
age)
7.1
not
Containment
conA query
mini-
Q is bag-set
denoted
any set-valued
of Sagiv
to the
set
bag-
Q’,
database
containment
between
contained
by Q Sbs
in another
if Q(D)
D.
is an
bag
There
such
that
are conjunctive
queries
It turns
and
query
Lb Q’(D)
out
intermediate
containment
Q <h Q’ LIQ”,
7.2
Example
but Q -fb
only
Consider
7.1.
those
The
over
that
bag-
relationship
set containment.
a variant
query,
students
following
queries
satisfy
the
given
who
of the query
below,
in
considers
are also employed.
Q(X, Z) :- p(X), q(U, X), q(V, Z), r(Z)
Q'(X, Z) :- p(X), q(U, X), q(V, X), r(Z)
Q''(X, Z) :- p(X), q(U, Z), q(V, Z), r(Z)
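These three queries can be compared numerically. For an answer tuple (x, z), write a for the total multiplicity of q-tuples whose second component is x, and b for those whose second component is z. The three bodies then contribute a·b, a·a and b·b respectively (each times the multiplicities of p(x) and r(z)), and a·b ≤ a² + b² always holds, while a·b can exceed either square alone. The Python sketch below checks this on one small database; the function name `answer`, the sample database, and the naming of the two non-Q queries are assumptions of this illustration.

```python
from collections import Counter


def answer(p, q, r, first, second):
    """Bag answer of H(X, Z) :- p(X), q(U, first), q(V, second), r(Z),
    where `first` and `second` each name either "X" or "Z"."""
    in_degree = Counter()
    for (_, target), m in q.items():
        in_degree[target] += m  # total multiplicity of q-tuples ending in `target`
    out = Counter()
    for (x,), mp in p.items():
        for (z,), mr in r.items():
            env = {"X": x, "Z": z}
            m = mp * mr * in_degree[env[first]] * in_degree[env[second]]
            if m:
                out[(x, z)] = m
    return out


p = Counter({("a",): 1, ("b",): 1})
r = Counter({("a",): 1, ("b",): 1})
# Two q-tuples end in a, one ends in b.
q = Counter({("c", "a"): 1, ("d", "a"): 1, ("c", "b"): 1})

Q = answer(p, q, r, "X", "Z")
Q_first = answer(p, q, r, "X", "X")    # body q(U, X), q(V, X)
Q_second = answer(p, q, r, "Z", "Z")   # body q(U, Z), q(V, Z)
union = Q_first + Q_second             # bag union adds multiplicities
```

On this database the answer of Q is a subbag of the union, yet it is a subbag of neither query taken alone, which is why the elimination step (a) from the set-theoretic setting cannot be applied as it stands.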
a student
Example
failure
open
mization
of
the
for
step
in
Sagiv-Yannakakis’
possibility
unions
that
a characterization
of conjunctive
of
of
Theorem
meaningful
conjunctive
would
be
of bag
equivalence
may
7.3:
Consider
will
use the
obtain
for
unions
no
duplicates.
tainment
An
and
tions
in
the
arise
often
set-valued
is a set,
important
equivalence
database
but
the
Q
jobs.
following
age)
Student(id,
age),
It is not
hard
to verify
that
database
relations
<~,
Q’,
S
queries.
Female(id)
Q <b$ Q’,
but
Q <b
1
When
the
condition
of Proposition
are
4.3 has
set-valued,
to be weak-
ened.
Set-valued
analogously.
Q’,
multiple
Student(id,
the
Databases
term
that
<.
:-
Proposition
a relation
jobtitle)
A
to
queries.
Set-Valued
to
Q
have
: –
Q’.
opti-
queries.
direction
Emp(id,
Q(id)
I
The
age),
It is easy to see that
Z)
Z)
: –Student(id,
claims
Q’(id)
We
the
the
Second,
proposition.
Q(X,
7
that
I
since
first
and Age
be gener-
if
Q(age)
The
leaves
ID
may
Q $b Q“.
Proof:
of the
attributes
can
no du-
can be processed
function
Example
Q’
contains
Observe
are generated
query,
We
semantics.
Proposition
Q,
following
students.
duplicates
to projection.
Q’(age)
Q’,
over
of the
However,
the
even
the result
the
of all
the STUDENT relation
an aggregate
5.2.
however,
age
set-
answer.
since
Consider
the
respectively.
bag-
over
is by itself
to Theorem
Yannakakis
carry
contributes
is not
It so happens,
the
since,
in the
above
according
to
to be negative.
tuples
by
of Sa-
bag-theoretic
applicable,
Q~ and
of the
over
the
seems
Qj , both
result
we then
to
is not
the
carry
Could
above
multiplicity
and
conjunctive
technique
(a)
step
equivalent
setting.
optimization
Qi
query
Yannakakis
theoretic
ting?
conjunctive
hypothetically
and
step
remaining
7.1:
returns
are
i.e.,
relation
databases
special
arises
are
case
when
set-valued.
Let
If Q <b,
Q,
is a containment
there
Q such
to refer
a relation
7.4:
queries.
that
Q’,
Q
and
then for
Q’
be conjunctive
every
mapping
vartable
u from
v in
Q’
to
v E u(Q’).
with
Example
defined
vide
of conthe
This
rela-
ment.
case
says that
then
in practice,
7.3
provides
a sufficient
The
sufficient
for
Q <b Q’.
us
condition
if the
with
for
condition
containment
Certainly,
this
a clue
bag-set
to
in Proposition
mapping
is also
pro-
contain4.4
is onto
a sufficient
condition
for the restricted
weaken
the
condition
containment
onto
mapping
the
where
in the
query
However,
7.2
we can
restricted
u from
if V ~~ u(V’)
of variables
caee.
for
case.
A query
Q’ to Q is variable-
V And
Q and
V’
Equivalence
A
are the
Q’,
set
Q’(D)
Q’ respectively.
Q is bag-set
denoted
for
any
Proposition 7.5: Let Q and Q' be conjunctive queries. If there is a variable-onto containment mapping from Q' to Q, then $Q \le_{bs} Q'$.
Example
Q
7.6:
the
is
Consider
only
from
Example
Therefore,
variable-onto.
Proposition
7.5 that
We note
that
Proposition
bag-set
7.3.
mapping
it
$Q \le_{bs} Q'$.
it follows
7.5 is not
Observe
from
Q’
follows
from
Example
7.10:
from
complexity
is similar
in nature
whether
Thus,
there
the following
there
one query
to the problem
is an onto
is a
following
two
bag-set
:- P(X, Z), P(X, Y)
:- P(X, Y)
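The gap between set equivalence and bag-set equivalence in this example can be checked by counting satisfying assignments. The following is a minimal Python sketch, not the paper's procedure. It assumes the two queries are Q(X, Y) :- P(X, Z), P(X, Y) and Q'(X, Y) :- P(X, Y), and that under bag-set semantics each satisfying assignment of the body variables over a set-valued database contributes one copy of the head tuple. The function name `eval_bag_set`, the query encoding, and the sample relation are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def eval_bag_set(head, body, db):
    """Evaluate a conjunctive query over set-valued relations, returning a bag.

    head: tuple of variable names in the query head.
    body: list of (relation_name, tuple_of_variable_names) literals.
    db:   dict mapping a relation name to a set of tuples (no duplicates).
    Assumption: every satisfying assignment of the body variables adds one
    copy of the corresponding head tuple to the result.
    """
    variables = sorted({v for _, args in body for v in args})
    # Active domain: every constant that occurs somewhere in the database.
    domain = list({c for rel in db.values() for t in rel for c in t})
    result = Counter()
    # Brute force over all assignments; fine for a toy example only.
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in args) in db[rel] for rel, args in body):
            result[tuple(env[v] for v in head)] += 1
    return result


# Hypothetical set-valued instance of P.
db = {"P": {("1", "a"), ("1", "b")}}

q = eval_bag_set(("X", "Y"), [("P", ("X", "Z")), ("P", ("X", "Y"))], db)
q_prime = eval_bag_set(("X", "Y"), [("P", ("X", "Y"))], db)

same_tuples = set(q) == set(q_prime)   # the two results agree as sets
same_bags = q == q_prime               # but the multiplicities differ
```

On this instance both queries return the same tuples, but the extra literal P(X, Z) doubles each multiplicity in Q, so the two results differ as bags.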
difference
between
equivalence
is that
under
tuple
Q’
bag-set
equiv-
v
bag
duplicate
equivalence
literals
are
equivalence,
since
each
one.
We
representation
literals
and
duplicate
has multiplicity
is a canonical
Q if all
of determining
is not
the
not
will
say
of a query
are removed
from
Q.
to another
containment
result
that
but
Y)
database
whether
from
and
for
of Containment
of determining
Observe
Q(X,
A key
mapping
equivalence
4.7 that
condition
that
variable-onto
with
is an interme-
bag
Q(X,
redundant
The
=h
As
alent.
bag-set
Complexity
equivalence
between
query
Q(D)
D.
to
containment.
7.1.1
that
database
are set equivalent
~
a necessary
to another
if we have
set equivalence.
Q’.
Example
containment
bag-set
relationship
queries
that
Q’,
set-valued
containment,
diate
equivalent
Q =h~
Theorem
mapping.
7.11:
queries.
surprising.
Q;
QI
are
~b,
Let
Q2
canonical
Q1
iff
Q;
and
Q2
~h
Q;
be conjunctive
representations
where
Q;
of QI
and
and
Q2
respectively.
Theorem
7.7:
whether
there
conjunctive
query
problem
of
a conjunctive
query
Q’
determining
mapping
variable-onto
from
Corollary
a
tive
a conjunctive
that
is a sufficient
condition
complexity
lowing
condition
and
of bag-set
Proposition
tween
the
of Theorem
does not
tell
and
The
a connection
bag-set
Theorem
7.7
timization,
us about
containment.
establishes
bag containment
7.12
queries
Bag-set
equivalence
is polynomially
of conjunc-
equivalent
to
graph
isomorphism.
Q is NP-complete.
We observe
the
The
is
erals,
fol-
is possible
set-valued.
be-
containment.
7.11 shows
namely
of conjunctive
only
some
of the
only
very
of removing
in the
In the full
tion
that
that
limited
lit-
relations
are
case where
paper,
queries
we discuss
over
relations
are
op-
duplicate
optimiza-
databases
known
where
to
be set-
valued.
Proposition 7.8: There is a polynomial reduction of bag containment to bag-set containment.
8
From
lowing
Proposition
result
Theorem
7.8 and
Theorem
follows.
7.9
The
bag-set
Related
containment
problem
is $\Pi_2^p$-hard.
Bag
containment
tive
queries
As in the
for
unrestricted
bag-set
containment
case,
the
remains
decision
prob-
and
equivalence
first
addressed
[DGK82].
These
notions
by
Klausner
[K86]
1 in
relational
1Klausner
et al.
open.
were
al.
tended
lem
Work
4.9 the fol-
algebra
also corrected
the
with
were
for
conjunc-
by
also
context
Dayal
of
additional
the earlier
et
addressed
results
an
ex-
control
by Dayal
over
duplicate
the
query
are harder
tained
elimination
than
by
Recently,
in our
model,
and
the
conditions
consider
the results
than
setting
our
aspect
ob-
[IR92]
of
ours.
[ASU79A]
Aho
results.
in-
V.,
SIAM
Sagiv
of
Journal
[ASU79B]
do
and
Aho
[CM77]
V.,
Sagiv
Queries,”
Chandra
Remarks
paper
we studied
conjunctive
the
semantics.
We
niques
the set-theoretic
from
over
to the
bag
containment
to
be
showed
on
Merlin
As
queries
sible
in the
setting
is an a posteriori
on join
nation
in commercial
(See
We
rem
[S*79,
have
shown
does
leaving
for
lence
the
database
that
open
of
the
of queries
than
database
not
unions
we discussed
rather
[DBS90]
pos-
on join
the
[GJ79]
em-
have
a discussion
our
attention
problem
with
the
need
of optimizing
tics.
Umesh
Dayal
work
in this
area.
comments
This
Waqar
to
[IR92]
Garey
on an earlier
a
M.,
and
equiva-
1992.
setting
where
[JK84]
was
inspired
by
who
brought
to
version
[K86]
on
pp.
Sagiv
Y.,
of
3rd
1990,
Johnson
117-
“Op-
conjunc-
Int'l Conf.
pp.
bag
D.
on
455-469.
M.,
E.,
Surveys
16:2,
Klausner
A.,
1986.
Klug
A.,
and
Theory
and
of
Co.,
Ramakrishnan
Technical
Re-
Wisconsin-Madison,
J.,
“Query
Systems,”
pp.
R.,
of Conjunctive
Science
of
Koch
Database
the
Freeman
Containment
Computer
sity,
to
1979.
University
Jarke
S., Computers
Guide
W.
Y.
Databases,”
practical
us by pointing
Albert
H.,
with
Elimination,”
J.,
Proc.
A
Queries,”
work
with
R.
Symposium
subclass
Theory,
Ioannidis
port,
the
Katz
Systems,
Biskup
of
queries,”
Finally,
Hasan
queries
helped
Joseph
P.,
“Generalized
no duplicates.
address
Standard,
Algebra
ACM
of Database
Dublish
in
Acknowledgement
N.,
Duplicate
NP-completeness,
sys-
of optimiza-
case of containment
to the SQL
First
Intractability:
Theo-
queries.
in the bag-theoretic
relations
over
Database
bag-theoretic
possibility
New
1982.
tive
elimi-
Sagiv-Yannakakis’
conjunctive
in
ACM
This
current
management
to the
9th
Computing,
Relational
San Francisco,
extend
Proc.
Goodman
the
timization
JK84]).
[SY81]
setting,
tion
ordering
are iso-
bag-
of the
U.,
of
123,
in the
of conjuncts.
justification
“Optimal
queries
1987.
Extended
Principles
it is not
by
Dayal
Proc.
conjunctive
queries
phasis
tems
setting,
Wesley,
Control
further
they
conjunctive
removal
that
conjunctive
when
P. M.,
of
on
77-90.
C. J., A Guide
“An
contain-
found
unlike
set-theoretic
[DGK82]
seems
set
two
precisely
a consequence,
to minimize
theoretic
We
setting
are equivalent
morphic.
We found
than
Date
Addison
carry
queries
queries.
bag-theoretic
[D87]
Theory
pp.
D.,
435-454.
of conjunctive
1977,
J.
of Re-
tech-
do not
conjunctive
harder
of conjunctive
queries
optimization
setting.
of
prob-
bag-theoretic
setting
bag-theoretic
in the
under
that
computationally
ment
that
optimization
queries
Ullman
pp.
databases,”
Symp.
York,
for
218-
Transactions
4:4,
A. K.,
relational
this
D.,
pp.
of a Class
ACM
Systems,
Implementation
In
8:2,
Y,
Optimization
lational
com-
plexity.
lems
J.
Expressions,”
of Computing
A.
“Efficient
nor
of computational
Concluding
Ullman
246.
Database
9
Y,
Relational
some
They
problem,
A.
“Equivalence
contain-
and found
to
equivalence
the
problems
problem
similar
the
study
and
References
As a result,
Ramakrishnan
addressed
in the bag-theoretic
do they
SQL.
equivalence
are weaker
Ioannidis
sufficient
not
and
Klausner
dependently
ment
than
containment
Optimization
ACM
Computing
111-152.
“Multirelations
Ph.D. thesis,
in Relational
Harvard
Univer-
semanto past
has given
[K88]
containing
useful
pp.
of the draft.
146-160.
“On
Conjunctive
Inequalities,”
J. ACM
Queries
35:1,
[MPR90]
Mumick
nan
R.,
I. S., Pirahesh
“The
gregates,”
Proc.
ference,
[NPS91]
Negri
M.,
of the
1990,
Pelagatti
Transactions
16:3,
1991,
pp.
P.
Selinger
lection
ment.
Proc.
and
VLDB
pp.
264-277.
of
AgCon-
Sbattella
SQL
L.,
Queries,”
on Database
Systems,
513-534.
G.
et al.:
in a Relational
ference
Ramakrish-
16th
G.,
Semantics
ACM
H.,
of Duplicates
Brisbane,
“Formal
[S*79]
Magic
Access
Path
Database
of the
ACM
on Management
Se-
Manage-
SIGMOD
of Data,
ConJune
79,
pp. 23-34.
[St77]
Stockmeyer
L. J.,
hierarchy”,
ence,
[SY81]
Vol
Sagiv
pp.
J. D.,
Science
Computer
M.,
Sci-
“Equivalences
expressions
difference
Knowledge-base
puter
polynomial-time
1–22.
Yannakakis
and
27, 1980,
Ullman
pp.
Relational
union
[U89]
3, 1977,
Y.,
among
“The
Theoretical
with
operators,”
the
JACM
633-655.
Principles
Systems,
Press,
of Database
Vol
2,
and
Com-
1989.