Download Containment of conjunctive queries

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

PL/SQL wikipedia , lookup

SQL wikipedia , lookup

Clusterpoint wikipedia , lookup

Concurrency control wikipedia , lookup

Versant Object Database wikipedia , lookup

Relational algebra wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Containment of Conjunctive
Beyond Relations as Sets
YANNIS E. IOANNIDIS
University of Wisconsin
Conjwzctiue
queries
languages
of
such
many
tion.
however,
as multisets
Thus,
in order
necessary
In
about
we
associated
label.
This
that
or “false”
holds
for label
systems
of a type
all
and
that
as part
views,
query
optimiza-
over
sets
(and
equivalence)
of tuples,
has been
shown
In fact,
relations
for
systems
between
The
work
viewing
of
to be
of Y, E. Ioannidis
Informix.
The
work
Grant
of R
Young
to make
digitaI/hard
by the
was
conditions.
For
and
queries
fuzzy
sets. We
class of
Finally,
we show
of the
first
the
motivates
National
partially
and Engmeermg,
Sciences
a
for
of a restricted
underscores
and by grants
Award
our
that
type
and
fundamental
a closer
examina-
in SQL.
supported
Investigator
and Xerox.
Computer
the tuple
to keep
is established
these
systems
and
an
are
traditional
that
of conjunctive
as multlsets.
result
has
is essentially
a result
set-relations
as multmets,
IRI-9157368
in Science
Tandem,
address
and
This
importance
Ramakrishnan
Fellowship
a Presldentlal
them
was partially
Investigator
Foundation
as sets
given
system
satisfy
for label
it is
that
model
In order
label
Once
tuple
relations
also
for containment
relations
type
each
relation).
each
traditional
queries,
are interpreted
(meaning
for containment
is decidable
second
which
“true”
that
are
requests
be generahzed.
we can
tuples.
condition
abstracts
queries
of the
where
of relatlons
the
of SQL
us to consider
case,
with
explicit
relations
in
be either
condition
both
that
must
is not in the
systems,
sufficient
relations
as multisets,
Young
of label
system
undecidable
of relatlons
tuple
after
class
in which
allows
a tuple
and sufficient
of conjunctwe
tion
the
only
databases
a special
with
abstracts
of unions
label
sets. As
that
and
be posed.
of a large
over
mterpretations
necessary
for a label
may
queries,
can be associated
a necessary
a different
queries
from DEC,
Authors’
query
arises
rnateriahzed
global
queries
ehmmated
of a database
fuzzy
that
for
of tuples
as a set of tuples
associated
containment
under
using
and
on set mcluslon
being
queries
a variety
it
system,
we present
and
based
of conjunctive
(meaning
we consider
example,
Packard
queries
sets to sets. Containment
multzsets
notion
are
a label
Presidential
conjunctive
from
of a relation
the label
on the labels
difference
examined
duplicates
conjunctive
set of conditions
conjunctive
example,
optimization,
containment/equivalence
generalized
relatlons
general,
also present
has
over
with
The view
study
by making
is in the relation)
results
for
of such
query
defined
a generalization
of tuples:
and
queries
to reason
paper
set-relatlons
topic
and are at the core of relational
equivalence)
optimization,
semantic
naturally
default,
this
mcsltzsets
(and
as a function
by
to consider
as multmets
query
on the
has been
database
problem.
in SQL,
treated
a relational
queries,
can be viewed
queries
an NP-complete
Even
work
RAMAKRISHNAN
for containment
of
nested
formal
each query
conjunctive
over
Testing
features
correlated
Earlier
where
are queries
as SQL.
advanced
processing
and RAGHU
Queries:
and
supported
Grant
Foundation
DEC,
by
by the National
under
Department,
Science
from
IBM,
a David
Unlvermty
and
Science
IRI-9011563,
under
HP, AT&T,
and
of Wisccms]n,
Lucde
Foundation
by grants
Madison,
WI
53706.
Permission
granted
without
fee provided
advantage,
the
given
copying
that
copyright
copy of part
that
notice,
TransactIons
the
is by permmsion
servers, or to redistribute
to lists,
G 1995 ACM 0362-5915/95/0900–0288
ACM
copies
on Database
are
title
or all of thm
not
of the
of ACM,
requires
Systems,
prior
$03.50
Vol
made
work
publication,
Inc.
To copy
specific
20, No
for personal
or distributed
and
its
otherwise,
permission
3, September
for
date
or classroom
profit
appear,
to republish,
and/or
a fee
1995, Pages 288-324
use IS
or commercial
and
notice
is
to post
on
Containment
of Conjunctive
Queries
289
.
Categories and Subject Descriptors: l?,m [Theory of Computation]:
Miscellaneous; H.1.l
[Models and Principles]:
Systems and Information Theory; H.2.3 [Database Management]:
H.2.4 [Database Management]:
Systems—query
processing
Languages—query languages;
General Terms: Algorithms, Languages, Theory
Additional
Key
equivalence,
Words
query
and
Phrases:
Conjunctive
queries,
multisets,
query
containment
and
containment
and
optimization
1. INTRODUCTION
The
problem
conjunctive
and Merlin
of syntactically
queries
[1977]
characterizing
was addressed
in the late
and by the tableau
work
pioneering
papers,
conjunctive
t~ples) to sets, and containment
queries
were
was naturally
Even
over
in SQL,
relations
nated
are
only
of a large
conjunctive
Our view
tn this
however,
treated
after
queries
as multisets
explicit
requests.
class of SQL queries,
queries,
in which
multisets
by
of tuples
with
in order
it is necessary
relations
to consider
elimi-
equivalence
a generalization
as multisets
of
of tuples:
in which
each
1991]. This generalized
that are fuzzy sets or
We develop results
on
different
label systems,
that can be associated
with tuples.
our work, and describe our results
significance.
1.-1 Motivation
for Generalized
The unit
which
being
about
of a relation
as a set of types must be generalized.
paper we study conjunctive
queries
over databases
tuples has an associated
label [Ioannidis
and Wong
notion
of a database
allows us to consider
relations
multisets,
in addition
to the case of relations
as sets.
the decidability
of conjunctive
query containment
for
and their
may be posed. In fact,
duplicates
to reason
are interpreted
that is, different
conditions
on the labels
In the rest of this section, we motivate
of
seen as functions
from sets (of
defined based on set inclusion.
default,
Thus,
equivalence
1970s by the work of Chandra
of Aho et al. [1979].
In these
of optimization
corresponds
closely
Conjunctive
Queries
in current
relational
optimizers
is an SQL block,
to a conjunctive
query. The basic idea behind most
query optimizers
is to examine
a collection
of equivalent
evaluation
plans for
a given query and to use the plan with the least estimated
cost. In addition,
many advanced
optimization
techniques
require
an earlier,
rewriting,
step,
where declarative
the opportunities
queries are rewritten
into equivalent
ones, thus expanding
for optimization
at the lower, algebraic
(plan-based),
level.
The rewriting
is often based on comparing
the original
query with (possibly
pieces of) other queries with respect to equivalence
or containment.
As a first example
of such advanced
query optimization
techniques,
consider the use of materialized
views. In this case, a given query is compared
against
multiple
view definitions
(which
may be seen as queries
as well) to
identify
which of them may be combined
to answer the query. This comparison is essentially
testing
for query
containment.
Supporting
materialized
ACM Transactions
on Database
Systems,
Vol. 20, No. 3, September
1995.
290
Y
.
views
to
E. Ioannidls
process
[Blakeley
et
al.
and
and
R. Ramakrlshnan
optimize
1986;
queries
Chaudhuri
et
designers
example,
are considering
offering
Microsoft
is contemplating
in
system
their
problem
of
[Graefe
processing
1995].
is
al.
the
topic
1995;
of much
Tsatalos
et
recent
al.
this option
in their
future
the use of GMAPs [Tsatalos
As
correlated
a second
nested
example,
SQL
and
systems;
for
et al. 1994]
consider
queries,
work
1994],
the
where
related
one
of the
alternatives
is to process a more general
nested query whose result can then
be used to process the outer query. Figuring
out what the appropriate
more
general
TPC-D
nested query is involves
query containment.
With the release of the
benchmark,
in the last year or so much research
has started
on such
transformations
As a third
system
tries
queries)
in the context
example,
of processing
consider
to figure
can be used
out
semantic
if integrity
to modify
and optimizing
query
constraints
a given
query
aggregate
queries.
In this
case, the
optimization.
into
(which
may
be seen
a semantically
as
equivalent
one that may be more efficiently
executed [King 1981; Shenoy and Ozsoyoglu
1987]. This is only possible if the antecedent
of the integrity
constraint,
seen
is contained
in the given
query.
As a final
example,
in
as a query,
multiple/global
query optimization,
several queries are compared
pairwise
to
identify
equivalent
common
subqueries
(subexpressions).
This gives us the
option
overall
of evaluating
such subexpressions
only once, hopefully
resulting
more efficient
combination
of plans [Grant
and Minker
1981;
1986].
Again,
common
examples,
it
equivalence
tion
these
comparisons
subexpressions
of many
should
involve
contain
become
of declarative
advanced
each
clear
queries
query
query
of the
that
the
are central
full
containment
queries.
questions
tests,
From
all
in an
Sellis
as the
of these
of containment
to the design
and
and implementa-
optimizers.
Currently,
the formal
results
on conjunctive
query equivalence
and containment
apply only when we regard
a relation
as a set of tuples [Chandra
and Merlin
1977; Sagiv and Yannakakis
1980]. As mentioned,
however,
the
defaz~lt in SQL (unless distinct
is used) is that duplicates
are not eliminated,
and
the
reason
cardinal
rigorously
ities
of the
about
tuples
in the
answer
containment/equivalence
are significant.
Hence,
in the
of current
context
to
SQL query optimizers,
we must consider
conjunctive
queries
over relations
that
are multisets
of tuples.
This is a major
motivation
for the results
presented
in this paper, which provide
a theoretical
foundation
for conjunctive query equivalence
and containment
over relations
that are multisets.
Moreover,
these results
are cast in a more general
framework,
and in fact
deal with relations
as fuzzy sets and with the traditional
case of relations
as
sets as well. Our results
demonstrate
that the containment
problems
for sets
and multisets
are quite
different
in nature.
Thus,
we believe
that
they
represent
a necessary
complement
to the earlier
results
on conjunctive
query
containment
[Chandra
and Merlin
1977; Sagiv and Yannakakis
1980], which
are widely recognized
as foundational
from a theoretical
standpoint.
Next, we present
two examples,
one from view materialization
and
from multiple
query optimization,
illustrating
the query containment/equivalence problem
and the significance
of addressing
it when
relations
treated
as multisets.
ACM
Transactmns
on Database
Systems,
Vol
20, No
3, September
1995
one
are
Containment
1.1. Consider
Example
create
select
from
create
select
from
The
the following
of Conjunctive
SQL view
Queries
291
.
definitions:
view EMPHIWl(ename,
hours) as
distinct.
e.ename, e.hours
EMP e,
view EMPHRS2(ename,
hours) as
e.ename, e.hours
EMP e.
database
relation
EIW?
has
columns
“ename,”
“dname,”
and
“hours”;
intuitively,
each employee
works in one or more departments
for a certain
number
of hours. Users write queries that refer to the views EMPHRS
1 and
EMPHRS2,
and we wish to evaluate
these queries
efficiently.
If queries
on
EMPHRSI
are common,
Now,
a query
given
1 to answer
of IEMPHRS
tant
the
system
on EMPHRS2,
it? Such
in an environment
might
when
choose
an optimization
containing
to materialize
this
can we use the materialized
large
becomes
numbers
especially
of complex,
view.
version
impor-
related
view
definitions,
as is the case for instance
in decision support
applications.
TO answer this question,
we must determine
when the two view definitions
are equivalent.
If EMPHRS1
they are equivalent.
However,
and EMPHRS2
if we take into
are regarded
as sets of tuples,
account the number
of copies of
each tuple,
they are different.
For example,
an employee
hours each in the Toy Department
and the Sales Department,
identical
tuples in EMPHRS2,
but
Consider
the following
query:
only
one tuple
could work five
leading
to two
in EMPHRS1.
select ename, max(hours)
from EMPHRS2.
The
difference
between
EMPHRS1
and EMPHRS2
is not important
query, and we can use a materialized
version of EMPHRS1,
in place of the reference
to EMPHRS2.
On the other hand,
max(hours)
carmot
rectly,
different
The
is replaced
by sum(hours),
the difference
becomes
substitute
EMPHRS
1 for EMPHRS2.
To make
the SQL optimizer
must recognize
that the two
under
next
which,
central
example
again,
role.
Example
database,
multiset
1.2.
which
for this
if it is available,
if the aggregate
crucial,
and we
this distinction
view definitions
corare
semantics.
illustrates
equivalence
Consider
includes
a more
of queries
the
complex
over
relational
the following
optimization
relations
schema
three
of
scenario
in
as multi
sets
plays
a large
corporation’s
a
relations:
MANUF( mrzo, location,...
)
TRANSP(tno,
location,...
)
WAREFIS(zorzo, location,.. . ).
is the city where the manuAttributes
in italic are primary
keys, “location”
facturing
plant, the warehouse,
or the transportation
unit’s headquarters
are
located,
and other attributes
are omitted
for simplicity.
Assume
that, in a
ACM
Transactions
on Database
Systems,
Vol. 20, No. 3, September
1995
292
Y E Ioannldis
.
and R i3amakrishnan
downsizing
effort of the company,
the following
three
SQL queries
optimize
and then
Ql:
execute
them
together:
selectm.mno,
count(m.location)
fmm
m,
MANUF
vvhere
to decide relocations
of’ various
operations,
are submitted,
and the system
tries to
TRANSP
m.location
t
= t.locatlon
@oup-by
m.mno,
select w.wno, count(w.location)
Q2:
from
WAREHS
where
W, TRANSP
w.location
group-by
t
= t.location
w.wno,
SAX% m.mno, w.wno, count(”)
from MANUF m, WAREHS w, TRANSP
where m.location = t.location
and w.location = t.location
group-by
m.mno, w.wno.
Q3:
QI requests
manufacturing
t
the number
of transportation
units
based on the
plant. Q2 requests
the number
of transportation
city of each
units based
on the city of each warehouse.
Finally,
Q3 requests the number
of transportation units based on the city of each pair of colocated manufacturing
plant and
warehouse.
Most relational
gregate-free
sary
data
queries
SQ1:
SQ2:
SQ3:
systems
subqueries
and
then
respectively
process
aggregate
queries
to generate
temporary
relations
by performing
corresponding
the
to Ql,
appropriate
by first
processing
containing
aggregations.
ag-
the necesThe
sub-
Q2, and Q3 are as follows:
select m.mno,
m.locations
from MANUF m, TRANSP t
where m.location = t.location
select w.wno, w.location
from WAREHS w, TRANSP t
where w.location = t.location
select m.mno, w.wno
from MANUF m, WAREHS w, TRANSP
where m.location = t.location
and w.location = t.location,
t
In the above SQL queries, it is natural
to consider the relations
named
from
clause to be sets of tuples. However,
since the keyword
distinct
in the
is not
used in their formulation
the resulting
relations
are m uztisets. For example,
if there are three transportation
units in San Francisco,
which is the city of
warehouse
#34, there are three copies of the tuple (#34, San Francisco)
in
the result
of SQ2. Retention
of duplicates
is critical
in this case, since it
enables us to count these tuples on a per-warehouse
basis and to generate
the
information
requested
by Q2.
Instead
of directly
processing
SQ3, a seemingly
plausible
alternative
for a
multiple
query optimizer
is to join the results
of SQ1 and SQ2 on “location,”
thus reducing
the number
of required
join operations.
The query correspond-
ACM
Transactions
on Database
Systems,
Vol
20, No
3, September
1995
Containment
ing to this
approach
SQ4:
of Conjunchve
that
293
.
is the following:
select m.mno, w.wno
from iVMNUF m, WAREHS w, TRANSP
where m.location = t.location
and w.location = t’.location
and m.location = w.location.
It is clear
Queries
SQ3 and
SQ4 generate
the
t, TRANSP
exact
same
t’
sets of (mno, wno)
pairs, that is, pairs of colocated
manufacturing
plants
and warehouses
with
at least a transportation
unit
based in the same city. (This may also be
formally
proved
by applying
the results
mentioned
earlier
[Chandra
and
Merlin
1977].) Nevertheless,
executing
SQ4 instead
of SQ3 is wrong in this
case, because
be retained,
each
the
and
(mno, wno)
many
times
subsequent
aggregations
SQ3 and
pair.
as there
SQ4 generate
Specifically,
require
different
SQ3
that
generates
are transportation
units
the
duplicates
numbers
each
based
must
of duplicates
(mno, wno)
on the
city
of
pair
of the
as
pair,
whereas
SQ4 generates
each (mno, wno) pair as many times
as there
are
pairs of transportation
units based on the city of the pair, a rather
meaningin SQ4 when
less result.
Viewed
as conjunctive
queries,
SQ3 is contained
relations
are treated
as multisets,
but not vice versa,
whereas
they are
equivalent
when relations
are treated
as sets. Thus,
ent depending
on how relations
are treated.
If this
systems
may
Example
t?W_lCe
1.2 illustrates
of the presented
query
optimization
compare
the
results.
scenarios
queries
are treated
interpretations
in this
end up producing
with
main
results.
motivation
As we have
depends
respect
as multisets.
of relations,
erroneous
these notions
are differdistinction
is not made,
for this
work
seen, correctness
on the
to containment
existence
and
realistic
of techniques
or equivalence
Similar
needs also
for example,
fuzzy
the impor-
of many
when
that
relations
arise for a variety
of other
sets, which are also included
study.
In developing
our results,
we have
chosen
to model
generalized
relations
as
sets of tuples
with
associated
labels
[Ioannidis
and Wong
1991]. As an
example,
a positive
integer
multiplicity
(or number
of copies, intuitively)
is
associated
with every tuple in a multiset.
As noted, current
relational
query
languages
such
as SQL
support
a multiset
to be extremely
useful
fuzzy
sets,
arbitrary
itively)
The
is associated
with every tuple in a fuzzy set.
conditions
under which our results
are applicable
where
an
of the algebraic
properties
Although
it is true that
without
the abstraction
tant
advantages.
First,
for many
semantics,
found
number
common
in
and
queries.
[0, 1] (or
this
Another
certainty
has
been
example
factor,
are stated
is
intu-
in terms
of the set of labels and associated
label operations.
the results
for, say, multisets
could be obtained
of label systems, the framework
has several imporour results
have more generality;
for example,
a
single label system, type A, defined
later, covers both fuzzy sets and traditio nal sets, and thus, our results about type A systems apply to both. Without
the abstraction
of labels and label systems, the same results would have to be
ACM TransactIons
on Database
Systems.
Vol
20, No. 3, September
1995.
294
Y. E, Ioannldis and R. Ramakrishnan
.
established
independently
although
essential
for relations
as sets and for relations
the essence of the proofs is the same. Second,
properties
upon which
our proofs depend,
the
as fuzzy
sets,
by abstracting
the
framework
of label
systems
allows
us to understand
more fully
the basic differences
between
queries over different
kinds of relations.
Finally,
it is possible that our results
are applicable
to other
kinds
of relational
extensions
to the label systems that we study; the formulation
label systems makes it easy to extend our results,
show
that
the
given
label
system,
new
kind
of relations
satisfies
with
properties
similar
of our results in terms
since it is only necessary
the
(simple)
conditions
of
to
for
a
1.2 Results
Our
main
results
are the following:
First,
we show
that
containment
of both
individual
conjunctive
queries
and unions
of them is decidable
but NP-complete for a label system that generalizes
relations
as sets and relations
as
fuzzy sets. For the specific case of relations
as sets, these results
had been
shown
earlier
1980].
Second,
undecidable
individual
as well
[Chandra
we show
for
a label
conjunctive
conditions
for
that
and
Merlin
containment
system
queries
containment,
that
within
which
1977;
Sagiv
of unions
captures
that
are
label
and
relations
system,
necessary
Yannakakis
of conjunctive
queries
we present
and
Recently,
Chaudhuri
and Vardi
some of our results.
1.3 Outline
[ 1993]
studied
(We discuss
this
multisets
further
For
sufficient
sufficient
when
queries
are in a restricted
(but very common)
form. Decidability
ment in the general
case, however,
remains
an open problem.
discovered
is
as multisets.
the
of contain-
and independently
in Section
8.)
of the Paper
The paper is organized
as follows:
In Section 2 we present
some background
material
and also develop
our generalized
notion
of a database,
which
we
consider
to be one of our important
contributions.
We introduce
label
systems
and identify
two types of label systems (types A and B), for which results
are
presented
in this paper.
In Section
3 we establish
a general
necessary
condition
for conjunctive
query containment
that is satisfied
by a large class
of label systems, including
type A and type B systems. In Section 4 we show
that
the necessary
condition
identified
in Section
3 is also sufficient
for
conjunctive
query containment
over databases
with label systems of type A.
This result
strictly
generalizes
the condition
of Chandra
and Merlin
[1977]
sets.
In Section
5 we
and is also applicable
to databases
that deal with fuzzy
present
a
systems
of type
are
as
no
sufficient
repeated
sufficient.
condition
B.
For
predicates,
These
for
restricted
we
results
containment
classes
prove
of
that
are applicable
over
conjunctive
this
databases
queries
condition
to databases
is
that
in
with
label
which
there
necessary
deal with
as
well
multi-
sets. In Section 6 we study unions of conjunctive
queries. For label systems of
type A, we present
a necessary
and sufficient
condition
for containment
of
unions
of conjunctive
queries,
which
generalizes
a similar
result
for sets
[Sagiv and Yannakakis
1980]. For label systems of type B, we prove that the
ACM
Transactions
on Database
Systems,
Vol. 20, No
3, September
1995
Containment
problem
is, in generaI,
concerning
a class
reasoning
used in expert
are motivated
systems
9 we outline
summarize
In
Section
systems
by graph-theory
8. In Section
that
like
several
7 we establish
captures
MYCIN,
problems.
the
as well
We discuss
interesting
Queries
some
form
directions
results
of fuzzy-set
as label
related
295
.
systems
work
that
in Section
for future
work
and
our results.
2. BACKGROUND
In this
undecidable.
of label
of Conjunctwe
section
databases
we review
and
systems,
AND BASIC DEFINITIONS
databases,
the usual
some
conjunctive
standard
queries.
concepts
In
and conjunctive
and
particular,
query
develop
the
containment
our
model
definitions
of
of label
are generalizations
of
definitions.
2.1 Label Systems
We first introduce
label systems,
of conjunctive
queries. Intuitively,
by attaching
factors
a label
each tuple.
to
or multiplicity,
which are at the heart
the ideal is to extend
for example.
Labels
of our generalization
the relational
model
can be used
In our framework,
to capture
labels
certainty
are drawn
from
a special domain,
and their usage must conform
to certain
rules that enable
us to interpret
labels using an appropriate
semantics.
It is possible to place
different
sets of rules upon labels,
leading
to different
semantics
and to
differences
queries.
in the difficulty
In this
by a small
paper
collection
Definition
2.1.
of deciding
issues
we study
two types
of algebraic
rules.
A label
system
9
like
containment
of label
systems,
is a quintuple
1% z
of conjunctive
each characterized
(L,
*,
O, < ) such
+,
that
L
is a domain
*
is a binary
of at least
operation
two labels
(called
equipped
with
multiplication)
a partial
on L that
order
< ;
is associative
and
commutative;
+
is a binary
commutative;
O
is the additive
multiplication
thatis,
operation
and
(called
addition)
on
L
a+
O=a,
a*O
=0,
(Al)
associative
2.2.
A
label
system
5?’=
and
respect to
order
< ,
and O<a.
When
a s b and a # b, then we use the notation
system studied
in this paper is presented
here:
satisfies
is
identity
in L and is also an annihilator
with
and the least element
with respect to the partial
Va~L,
Definition
that
(L,
a < b. The
*, +, O, < ) is
of
first
type
A
label
if
it
the following:
b’a, b~L-{O},
O<a*b
(A2)Va
GL,
(A3)Va,
a’, b, b’~L,
(A4)Va,
b~L,
<a.
a*a=a.
a+
(a<a’andb
b<aora+b<
ACM
Transactions
<b’)
=a+b<a’+
b’.
b.
cm Database
Systems,
Vol
20, No 3, September
1995
296
Y. E, Ioannidis
.
Note
that
conditions
and
R, Ramaknshnan
condition
(A2) states
that
multiplication
(A3) and (A4) are equivalent
to stating
that
is idempotent.
Also,
< is a total order and
that
+ is equal to max with respect to that total order. We chose the given
presentation
to make the analogy with condition
(B2) of label systems of type
B clearer.
Example
important
2.1.
Binary
truth
values
as well as certainty
factors
examples
of label systems of type A. For the case of truth
L = {“false”,
“true”}.
or. The label
the annihilator
is easy
The operation
* is logical
“false”
corresponds
for multiplication.
to verify
satisfied.
For certainty
The operation
that
all
factors,
* is min,
order
the operation
+ is logical
to O and serves as the additive
Finally,
< is such that “false”
of the
conditions
for
a type
identity
< “true.”
A label
system
of
It
are
L is equal to the set of real numbers
between O and 1.
and the operation
+ is max. (This is the case, e.g., in
the system
proposed
by van Emden
additive
identity
and the multiplicative
total
and,
are two
values,
on numbers.
Again,
all
[ 1986].) The
annihilator.
conditions
for
number
Finally,
a type
O serves as the
< is the regular
A label
system
are
satisfied.
Definition
2.3.
fies the following.
A label system L?= (L,
The second label system
*,+ ,0, < ) is of type B if it satisstudied
in this paper is presented
next:
(Bl)Va,
b~L–{O},
a<a*b.
(B2)Va,
a’, b, b’=L,
(a<a’andb
(B3)
<b’)+
a+
b<a’+
most
important
Example
2.2.
Specifically,
and
+
Arithmetic
L is equal
are the
The number
to the
usual
O serves
is the
set of nonnegative
multiplication
and
as the additive
identity
Relation
Instances
with Respect
In order to deal with
requires
that a relation
presenting
the formal
multiple
instance
definition,
label
integers.
addition
to a Label
system
The
of numbers,
of type
B.
operations
*
respectively.
and the multiplicative
tor. Finally,
< is the usual total order on numbers.
that all of the conditions
for a type B label system
2.2
b’.
Va @ L, ~a’ G L, a < a’.
annihila-
Again, it is easy to verify
are satisfied.
System
interpretations
of relations,
our treatment
is defined in a new, generalized
way. Before
we illustrate
it with
an example
from a
traditional
SQL-based
context to provide intuition.
After the definition,
additional examples
focus on aspects of the definition
itself.
For the purposes
of this work, a relation
instance
is defined
as a function
from
all possible
tuples
that
could be formed
from
the domains
of the
relation’s
attributes
to the labels of a system according
to which the relation
is interpreted.
For example,
consider
the MANUF
relation
of Example
1.2
interpreted
as a multiset.
Assuming
that
there
are three
manufacturing
plants
ACM
#1,
#7,
TransactIons
and
#9
located,
on Database
Systems,
respectively,
in Boston,
Vol. 20, No, 3, September
1995
San
Francisco,
and
Containment
Boston,
the
corresponding
the following
MANUF
of Conjunctive
instance
would
(#1,
San Francisco)
(#9,
San Francisco)
Boston
units,
as
(#l,
(#7,
(#9,
indicating
tuple
the
~
units
and San Francisco
has 5
SQ1 would be formally
specified
as
Boston)
~ 3,
~ O,
Boston)
-0,
~ 5,
Boston)
~ 3,
San Francisco)
is associated
number
1,
+ O.
San Francisco)
(#9,
is, each possible
-+ 1,
San Francisco)
(#7,
1,
~ O,
Boston)
has 3 transportation
the result of query
(#1,
formalized
specified
A O,
Boston)
(#7,
(#9,
Similarly,
if
transportation
follows:
~
San Francisco)
(#7,
plicity,
be formally
297
.
function:
(# I, Boston)
That
Queries
-0.
with
of times
it
a nonnegative
appears
in
the
integer
relation.
multiThis
is
in the following:
Definition
the domains
2.4.
Let Q be an n-ary predicate
symbol, and let .DI,...
, D. be
of values
of the arguments
of Q. Predicate
symbols
over the
sa,me cross product
of domains
are called
compatible.
Also,
let 2? be a label
system with domain
L. A relation
instance
for Q with respect to 2 is a total
instances
for compatible
predicate
functionl
Q: DI X . . . x D. ~ L. Relation
symbols
are called compatible.
A database
instance
with respect to a label
sy~tem
&
x . . . X D.
l:$i
is a set
is called
<n.
Example
the
of relation
instances.
Any element
of the domain
DI
a tuple
and is denoted
by (dl,.
. . . dn), for d, ● D,,
first
2.3.
label
The traditional
system
of type
view
of relations
A given
in
as sets is captured
Example
2.1. Usually,
through
a relation
instance
is compactly
represented
as the subset of its domains
containing
the
Fuzzy sets may be captured
through
the second
tulples that map to “true.”
label system of type A given in Example
2.1, where each possible
tuple is
mapped
to a certainty
factor between
O and 1. Finally,
multisets
may be
calptured
through
the label system of type B given in Example
2.2. In this
case, as illustrated
earlier,
each tuple is mapped
to a nonnegative
integer,
which indicates
the number
of copies of the given tuple in the multiset.
lFunctions
that
denote
relation
instances
appear
ACM Transactions
in boldface.
on Database
Systems,
Vol. 20, No. 3, September
1995
298
Y E Ioannldis
.
In the
that
it
sequel,
is with
and
whenever
respect
R, Ramakrishnan
we refer
to a database
to a given
label
instance
system.
One
it is understood
could
argue
equivalent,
and perhaps
more natural,
formulation
of different
tions of relations
is obtained
by adding
an explicit
column (field)
the label assigned to each tuple. The resulting
extended
relations
be treated
as regular
sets. For example,
instead
of using atomic
that
an
interpretathat stores
would then
formulas
of
the form Q(tl, t2,....tn),one could use Q(tl, t2,....tn,1),where the domains
unchanged
and the domain
of 1 is L. In fact, this alternative
of t, remain
formulation
would
instances
clear
based
shortly,
junctive
explicit
queries
status
and
most
likely
be
on nontraditional
representation
is not
must
used
enough.
be treated
in
as the
label
systems.
of the
The
label
Conjunctive
Conjunctive
mentioned
them.
additional
column
a special
way
associated
with it. Otherwise,
wrong results
The formalism
of label systems, introduced
semantics
of labels explicit,
and is therefore
2.3
basis
for
However,
relation
be made
label
must
that
storing
as will
column
be given
respects
the
in cona special
semantics
will be obtained
in several cases.
in this section, makes the special
adopted for the rest of the paper.
Queries
queries are formally
defined in this subsection.
We have already
that SQL blocks, that is, fZat SQL queries,
correspond
closely to
Specifically,
conjunctive
queries
are
essentially
logic-based
specifica-
tions of such SQL queries,
whose syntax
is more natural
to deal with
in
formal
work, as earlier
studies of containment
and equivalence
have demonstrated. For example,
consider the SQ1 query, which
The equivalent
logic-based
syntax for this query is
MANUF(mno,
10C) ~ TRANSP(tno,
The join
on location
MANUF
and TRANSP
is captured
10C) +
is expressed
SQlresult(mno,
by the use of the common
predicates,
whereas
in flat
SQL.
10C).
variable
10C in the
of mno
and 10C on
the appearance
the right-hand
side of ~
require
a predicate
name
indicates
the query projections.
Conjunctive
queries
for the result,
which was arbitrarily
chosen here to
be SQlresult.
syntax
Definition
Al
~Az
The formal
2.5.
of conjunctive
queries
A
conjunctive
query is a first-order
t“
~ . . . ~ Am ~ C. All of the variables
appearing
is defined
formula
in
the
as follows:
of the
formula
form
are
(implicitly)
universally
quantified.
The formula
to the left of is called the
Each one of
antecedent,
and that to the right of ~ is called the consequent.
formula
of the form Q(tl, tz, . . . . t.), where Q
Am is an atomic
C, AI, A,,...,
The predicate
is a relation
(predicate)
symbol and t,,1 < i < n, is a variable.
symbol in the consequent
does not appear in the antecedent
(i.e., recursion
is
distinnot allowed).
Variables
that
appear
in the consequent
are called
guished
and must
appear
in the antecedent
as well, while
all others
are
called nondistinguished.
Finally
a conjunctive
query has an associated
unary
function
f: M e M, for some set M equipped
with a partial
order
< such
that 30 ~ M, ‘da ● M – {O}, O < a, and O <f(a).
ACM
TransactIons
on Database
Systems,
Vol
20, No
3, September
1995
Containment
Based
on Definition
It obtains
2.5, a conjunctive
semantics
only
system
and processed
to the
same
systems.
when
based
conjunctive
The
only
query
on that.
query
Several
by
conjunctive
query associated
with
Va = L, f(a) ~ L. The role of the
distinct
using
it
construct.
on a particular
semantics
based
a label
query
on different
system
without
label
can be given
label
L? to interpret
a function
f: M ~ M is that
function
f will be clear when
queries.
a conjunctive
299
.
syntactic
based
interpreting
for
Queries
is a purely
it is interpreted
requirements
valuations
of conjunctive
‘Whenever
we present
of Conjunctive
a
L c M and
we discuss
an associated
function
f,
then it is assumed that ~ is the identity
function.
Also, we use the convention
that the atomic formula
in the consequent
of a conjunctive
query always has
the distinguished
predicate
symbol P, possibly subscripted
with an indicator
of the specific
atomic
conjunctive
formulas
are used to refer
2.4 The Rewlt
To define
we first
Such
the
to those
formula
otherwise
predicates
noted,
the phrases
of a conjunctive
query
Query
a conjunctive
the essence
inference,
in the
unless
and
in the antecedent.
of applying
to capture
a single-tuple
atomic
Finally,
query
of a Conjunctive
result
have
query.
of a conjunctive
or derivation,
antecedent
query
of inferring
to a database
a single
is realized
of a conjunctive
tuple
instance,
in that
result.
by instantiating
query
with
each
a single
tuple.
For the purposes
of this paper,
such a derivation
corresponds
to the
traditional
combination
of tuples
(one from each participating
relation)
enhanced
with
a multiplication
of their
labels,
which
results
in the label
assigned
to the derived
tuple. For instance,
consider
query SQ 1 of Example
1.2 with its relations
interpreted
as multisets
and instantiated
as mentioned
earlier.
Combining
the
(#1, Boston)
tuple
of MANUF,
with, say, the (#24, Boston) tuple of TRANSP,
which
the result
tuple (#1, Boston)
with label 1, capturing
transportation
ton)
tuple
of TRANSP,
unit
in the city of plant
of MANUF,
which
generates
the result
#r/ is not even in Boston.
Valuations
of conjunctive
single-tuple
definition,
A
is from
2.6.
examples
Consider
O, with
tuple
queries
in the
additional
Definition
... A
derivations
#1. Similarly,
is labeled
focus
combining
the above
(#7, Boston)
are
query
used
result
a conjunctive
with
as the
and
on aspects
which
1,
tuple
label
O, since
plant
formal
of
Bos-
Boston)
of the definition
a
the (#7,
(#24,
are defined
query
is labeled
is labeled
1, generates
the existence
of some
the
mechanism
for
next.
the
After
itself.
form
Al
A Az
Am ~ C. A valuation
0 of a is a pair of functions
( @u,01). Function
@v
the variables
of a to some set of constants,
and function
91 is from
the atomic
formulas
of a to labels
such that
0Jc)=f(1cP4
Applying
0 on a gives
an instance
Example
2.4.
To illustrate
intuition
when dealing
with
of a.
Definition
2.6 and the fact that it captures
the
familiar
label systems,
we discuss the cases of
ACM Transactions on llatabase
Systems,
Vol. 20, No. 3, September
1995
300
.
Table
I
ConJunctzue
Y E. Ioannldls
and
TWO Valuatlons
d and
query
H’ In a Label
ztem
System
that
Interprets
Relatlons
e
as Sets
(i’
.!
o,(x)
6:(x)
= K
z
HC,(Z)=L
til:(z)
= L
-v
H,,(y) = N
o;,(y)
= N
Q(,r,
z)
R(z,.v)
relations
tive
R. Ramaknshnan
= K
til( Q( x, z)) =“true”
fi~(Q(x,
z)) =“false”
Ol(R(z.
O~(R(z,
y))
as sets and relations
y))
=“true”
as multisets.
Consider
=“true”
the following
conjunc-
query:
Q(x,
Assume
that
the
{K, L, M, N}.
Consider
the
domain
label
system in Example
Table I. Valuation
system,
system
tuple
since,
dl applied
til(P(x,
On the other
of all
y)
argament
that
y).
+~(X,
positions
interprets
of all
relations
relations
as sets
is D =
(first
label
2.1), and let the two valuations
o and 6’ be as shown in
O represents
the case where tuple (K, L) is in Q and tuple
(L, N) is in R. Then,
captures
this intuition
set label
A~(z,
z)
hand,
y))
(K, N) should
be in the result.
Definition
2.6
based on the definition
of multiplication
in the
on the consequent
= Ol(Q(x,
valuation
z))
and
(1’ represents
y))
= Ol(Q(x,
Ol(R(z,
y))
.z)) and
query
yields
=“true”.
the case where
in Q and tuple (L, N) is in R, and therefore,
result.
Again,
Definition
2.6 captures
this
consequent
of the conjunctive
query yields
OI(P(X,
of the conjunctive
tuple
(K, L) is not
tuple (K, N) should not be in the
intuition
since (I1 applied
on the
Ol(R(z,
y))
=“false”.
We now turn our attention
to the label system that interprets
multisets.
Consider
the two valuations
o and d’ shown in Table
relations
as
II. Valuation
0 represents
the case where tuple (K, L) appears in Q twice and tuple (L, N)
appears in R five times. Then, tuple (K, N) should appear in the result ten
times. Definition
2.6 captures
this intuition
since, based on the definition
of
multiplication
the conjunctive
in the multiset
query yields
dl(~(~,y))
On the other
hand,
label
= 01( Q(x,
valuation
system,
z))*
01 applied
61( R(z,
0’ represents
y))
on the
=2*5=
the case where
consequent
of
10.
tuple
(K, L) does
not appear in Q at all, and therefore,
independent
of the number
of times
tuple (L, N) appears
in R, tuple (K, N) should
not be in the result.
Again,
Definition
2.6 captures
this intuition
since (ll applied
on the consequent
of
the conjunctive
query yields
dl(P(.x,
ACM
‘I%msactlons
y))
on Database
= 61( Q(x,
Systems,
Vol
z))*
01( R(z,
y))
20, No, 3, September
= O*5
1995
= O.
Containment
Table
II.
Conjunctl
Two Valuations
ve query
() and
f)’ in a Label
item
System
that
Queries
Interprets
as Multlsets
w
f),(x)
K
=
e:(x)=K
o:(z) = L
q:(y) = N
z
Y
Q(x, z)
f),(z)=L
~,>(y)= N
Ol(Q(X, Z)) = 2
o;(Q(.z,
z)) = O
R(z,
@l(R(z,
O;(R(z,
y))
y)
Definition
consequent
i < n, are a,
b,, respectively.
and
If,
for
all
Note that in Definition
of the same conjunctive
equivalence
because
.v)) = 5
2.7.
Consider
two conjunctive
whose distinguished
variables
D, respectively.
compatible.
relation
they
1<
301
.
Relatlons
(1
x
1<
of Conjunctive
queries
a and ,B with compatible
in the ith argument
position,
Let
i < n,
= 5
/3” and
6;(a,
6 ~ be valuations
) = Ot?(b,),
then
6”
of a and
and
o ~ are
2.7 a and ~ do not have to be distinct.
For valuations
query, it is easy to show that compatibility
is an
over
capture
valuations.
derivations
Compatible
of the
valuations
same
tuple.
are
This
important
is a key
concept
needed in defining
the result
of applying
a conjunctive
query to a database
instance,
because the labels
assigned
to a tuple
by compatible
valuations
need to be combined
to determine
the label assigned to it in the overall result.
Definition
to
2.8.
A valuation
o of a conjunctive
a database
Q((l(x,
instance
if, for every
. . . . du(xn))) = f31(Q(x1,...,
),
We finally
query
to
individual
tions
tuple
move
tuple
of the
in
tuple
of the
This
The
result
earlier.
In
the
of the
result.
For
done
by
particular,
by adding
outcome
in the overall
with
respect
Q( xl, . . . , x.)
of applying
is essentially
defined
a is true
formula
in
a,
x~)).
are combined
derivation.
to the tuple
of Example
mentioned
definition
instance.
derivations
same
each
assigned
once for
to the
a database
query
atomic
separate
labels
addition
example,
a conjunctive
combining
the
deriva-
assigned
is
the
consider
to the
final
query
label
SQ 1
1.2 with its relations
interpreted
as multisets
and instantiated
as
earlier.
The (#1, Boston) tuple of its result is derived three times,
each
of the
the (#1, Boston)
tuple
TRANSP
tuples
of MANUF.
with
Each
location
derivation
= Boston
assigns
combined
the label
with
1 to the
resulting
tuple, whose final label is therefore
equal to 3, capturing
the fact
that three transportation
units are based in the city of plant #1, as expected.
The result
of a query is formally
defined
in Definition
2.9. After the definition,
additional
Definition
A
. . . A Am
f
~
examples
2.9.
~
and
focus
Consider
a &tabaSe
on aspects
a conjunctive
instance
of the definition
query
1. Let
of a that are true with respect to 1. Partition
relation
of valuation
compatibility,
and let
ACM TransactIons
on Database
a
@ be
of
the
@ based
@~ denote
Systems,
Vol
itself.
the
set
form
Al
~ Az
of all valuations
on the equivalence
the partition
that
20, No 3, September
1995.
302
Y. E. Ioannidis
.
Table
Domam
III
Instances
?nenlber
and
R, Ramaknshnan
of Q and R m a Label
d
System
that
Interprets
Relations
R(d)
Q(d)
(K, L)
“true”
“false”
(K, M)
“true”
“false”
(L, N)
“false”
“true”
(M, N)
“false”
“true”
All
“false”
“false”
Table
others
IV,
Four
Valuations
that
are Compatlble
in Generating
of the Conjunctive
Item
o
x
(+,(x)
fJ,(z)=K
f),(y)
= K
til(Q( x, z)) =“false”
R(z,
01(R(2,
generates
y))
tuple
a to 1 (denoted
=<’false”
d:(z)
= L
o
d:(y)
= N
fl[v(~)
= K
()::(z)=M
H;!(z)
= N
d;’’(y)
= N
e;(y)
= N
z)) =“true”
,9j(Q(x,
z)) =“trUe”
o?(Q(
#~(R(.z,
y))
(3j’(R(z,
y))
t?j’’(R(z,
=< ’true”
variables
is a relation
instance
(,!
~;:(x)=K
6~(Q(x,
t in the distinguished
by a(l))
(K, N) m the Consequent
(i’{
ti:(x)=K
= N
Y
Q( x, z)
y)
Tuple
Query
6’
z
as Sets
=’<true”
of a. The
P, that
X, z)) =’<false”
result
y))
=“false”
of applying
is, a function,
such that
‘(t)=,Loz(c)=
2“),UP(A4$
The
intuition
based
behind
Definition
on the conjunctive
query
2.9 is that
are grouped
a different
label to it, and these labels
the label of the tuple in the result.
all
derivations
together.
are combined
Each
(through
of a single
derivation
tuple
assigns
addition)
to find
Example
2.5.
To illustrate
Definition
2.9, we discuss the cases of relations
as sets and relations
as multisets.
We use the same conjunctive
query as in
Example
2.4,
Q(x,
z) //~(Z,
y)
+~(X,
uV),
with
the same domain
for all argument
positions
of all relations,
D =
{K, L, M, N}.
Consider
the label system
that interprets
relations
as sets, and let the
instances
of Q and R be as shown in Table III. There are 81 valuations
that
are true with respect to Table 111 (three variables,
each assigned
to one of
four possible elements
in its domain).
Four of these valuations
are compatible
in generating
tuple (K, N) in the consequent
of the conjunctive
query, and are
given in Table IV. Valuations
(I and 0’” generate
the label “false”
for tuple
(K, N), while
is, by taking
tuple
valuations
the logical
(K, N). This
was
8’ and O“ generate
the label “true.”
By adding,
that
or of these four labels, we obtain the label “true”
for
to be intuitively
expected,
because
tuples
(K, L) and
(L, N) are in Q and R, respectively,
and therefore
their
composition
should be in the result.
Similarly,
tuple (K, N) is placed in the result
ACM
‘J1-ansactlons
on Database
Systems,
Vol
20, No. 3, September
1995
(K, N)
as the
Containment
Table
V.
Four
Valuations
Compatible
in Generating
of the Conjunctive
Item
o
O[(Q(x,
dl(l?(z,y)
y)
‘lrable VI.
Domain
z))=
’’false”
Instances
member
6j(Q(x,
)=’’false”
z))=
of Qand
Lebel
d
z))=
fl~(lt(z,y)
System
that
’’true”
)=’’false”
Interprets
@j’’(Q(x,
z))=
’’false”
O~(R(z,
y))=
’’false”
Relations
0
0
5
o
o
o
Four
Compatible
Valuations
2
0
that
Generate
the Conjunctive
Zte?n
o
as Multisets
2
3
VII.
,!,
o;(x) = K
H:(z) = N
e~(y) = M
R(d)
others
Table
o
Q(d)
(K, L)
(K, M)
(IJ,N)
(M, N)
All
@f(Q(x,
)=’’false”
Rina
(K, M) in the Consequent
o;(x) = K
6((z) = M
e:(y) = M
’’true”
tij(fi(z,y)
303
o
0“
o;(x) = K
0;(2)
= L
o:(y) = M
d,,(x)
R(z,
Tuple
Queries
Query
6’
= K
0“(2) = K
(1,,(y)= M
x
z
Y
Q(x, z)
of Conjunctwe
Tuple(K,
N)inthe
Consequentof
Query
0)
Q,!,
w
x
OU(x)=
K
f);(x)
= K
o;(x)
== K
o:(x)
z
o,)(z)
= K
o;(z)
= L
o;(z)
= M
~~(z)
= N
Y
Q(x;
o,(y)
= N
f);(y)
= N
e:(y)
= N
o:(y)
= N
Z)
dl(Q(x,
z)) = O
R(z,
y)
6’l(R(Z,
y))
composition
effect
since
=
o
of tuples
relations
6:(Q(x,
z)) = 2
@fl(Q(x,
z)) = 3
(Ij’’(Q(z,
z)) = O
@j(R(z,
y))
Oj’(R(z,
y))
Oj’’(R(z,
y))
(K, M)
and
= 5
(M, N),
are interpreted
but
of the conjunctive
this
for tuple
query,
as given
whose
in Table
(K, M), and therefore,
was to be intuitively
composition
expected,
could
= 2
multiple
derivations
= O
have
no
as sets.
Now consider
the four compatible
valuations
the ,given database
instance
and that generate
“false”
= K
generate
V. All
its final
because
that
tuple
are true with respect to
(K, M) in the consequent
of these
label
there
generate
is “false”
the label
as well.
are no tuples
in
Again,
Q and R
(K, M).
We now turn our attention
to the label system that interprets
relations
as
multisets
(Example
2.2). Let the instances
of Q and R be as shown in Table
VI. The four compatible
valuations
that are true with the instance
given in
Table VI and that generate
tuple (K, N) in the consequent
of the conjunctive
query
are given
Definition
in Table
2.6, the four
VII.
Based
valuations
on the multiplication
generate
the labels
semantics
of * and
O, 10, 6, and O for tuple
(K, N). By adding these four labels, we obtain the label 16 for the tuple, which
is exactly
the number
of copies one would
intuitively
expect for it. The
composition
of tuples (K, L) and (L, N) of Q and R, respectively,
generates
10
of these
copies,
and
the
composition
of (K, M) and
(M, N) generates
6 more.
ACM Transa&,ons on Database Systems,Vol. 20. No. 3, September 1995.
304
.
Table
Y, E. Ioannldls
VIII,
Four
and R, Ramakrishnan
Compatible
Valuations
that
Generate
the Coniunctlve
Item
o
Tuple
(K, M) in the Consequent
tl,,/
e“
(1’
x
(9,(x)=K
(?:(x)
= K
6;(x)
z
d,,(z)
= K
d;(z)
= L
8;(.
#u(.Y)
=
e;(y)
= M
M
= K
(l;:(x)
) = M
o::(y)
01’’(y)
= M
z.)
(Ir(Q(X,
Z)) = O
o;(Q(x,
z)) = 2
6j’(Q(x,
z)) = 3
(3:( Q(x,
R(z,
y)
tll(R(z,
y))
o
Oj(l?(z,
y))
o
tIj’(R(z,
y))
Hj’’(l?(z,
Since
we
dealing
are
with
multisets,
To conclude
with
the example,
to the
respect
each
= O
separate
copy
counts,
consider
given
the four
database
compatible
instance
y))=
o
there-
of copies of
valuations
and generate
= M
z)) = O
and
the number
Indeed, this is precisely
the above query in SQL.2
fore, 10 and 6 must be added.
(K, N) generated
by evaluating
true
=
= K
@J’’(z)=N
Y
Q(x)
=
of
Query
that
tuple
are
(K, M) in
the consequent
of the conjunctive
query, as shown in Table VIII. All of these
valuations
generate
the label
O for tuple
(K, M), since E?(z, y) is always
mapped
to
intuition
At
this
point,
additional
used
O. Thus,
the
final
label
of tuple
(K, M) is O, which
again
is what
requires.
in
let
relation
the
consider
us
columns
previous
why
is not
examples,
explicit
enough.
and
representation
Consider
add a label
the
column
of labels
conjunctive
in the
as
query
participating
relations:
Q(x,
We claim that,
capture
arbitrary
final
can
label
only
z,lQ)
~l~=g(lQ,zR)
~R(Z,Y,lR)
‘f’(x,Y,lP).
independent
of our choice of function
g, the above cannot
interpretations
of relations.
The main problem
is that the
assigned
capture
to
a tuple
depends
relationships
on
of labels
multiple
associated
derivations,
with
a single
whereas
tuple
g
deriva-
tion. Thus, for example,
if g is equal to *, then g captures
the treatment
of
labels for a single valuation,
but cannot capture
the combination
of multiple
compatible
valuations.
For conjunctive
queries where all variables
are distinare
projected
out of the result,
howguished,
this suffices.
When variables
ever,
additional
as part
machinery
of a single
needs
valuation.
to be invoked,
The formulation
which
cannot
of the problem
be represented
as presented
in
the previous
definitions
achieves what is desired. Moreover,
in our opinion,
it
syntax
from semantics.
is also aesthetically
pleasing
because it separates
There is a single syntactic
definition
of conjunctive
queries,
on top of which
one may impose arbitrary
semantics
captured
via label systems.
We now present
some simple
queries
that
illustrate
the power
of the
generalizations
that we have proposed.
These complement
the examples
in
in light
of the formal
definitions
that we have
the Introduction
and are given
example
demonstrates
the importance
of multiestablished
earlier.
Our first
2 In
SQL,
requested
ACM
multiset
semantics
by using
the keyword
Transactions
on Database
is
the
default,
and
duplicate
elimination
distinct.
Systems,
Vol
20, No
3, September
1995
must
be
explicitly
Containment
sets
in
languages
compelling
like
reason
SQL.
to
2.6.
The
already
generalized
such as the ones discussed
E,~ample
As we have
study
here
of Conjunctive
following
query
argued,
conjunctive
are already
Queries
this
queries,
widely
305
.
is the
since
most
queries
used.
can be considered
a view
definition
in
SQL:
Department
(dname,
~ Deptsals(
This
query
2.2,
where
example,
tuples.
it is natural
However,
appears
salary.
all
This
tion.
If the view
over
not
and
the
diminish
relations
formulation,
salary
relation
as there
such
relation
to compute
Department
does
keyword
us to query
features
and sums the (multiset
it
the
authorized
and
in the
of tuples
In
formulation
in which
budget
the
that
the sum of
of each
sum
budgets,
depart-
aggregate
to be a set of tuples,
since it is possible
Employee
relations
of) salaries
the
that
opera-
we could
for two
reasons.
Employee
a multiset
relations
view
First,
that
there
this
is true,
conjunctive
in this
is an implicit
of salaries,
and
not
over
by department
Although
even
not
is
an SQL query
groups
of understanding
to recognize
to see the Deptsals
directly
in each department.
importance
generates
to write
of
a salary
earning
to compute
is, the
this
to be sets of
in the department
and
in Example
cardinality.
is used
for example,
that
considered
defined
associated
distinct
basis,
2’
sal)
in the antecedents
are employees
account rigorously
for the cardinality
for a variety
of reasons,
it may not
Department
system
is a multiset
the view,
dname,
of a department
by examining
the view.
that
explicit
construction
of multiset
relations
as multisets,
that
an
as GROUP-BY
was
it is important
field
label
has
the relations
on a per-department
compute
the budget
One could argue
the
on the
a relation
to consider
as often
by using
necessary
based
the Deptsals
enables
salaries
ment,
sal ).
in
unless
the SQL query,
value
tuple
A Employee(ename,
dname,
is interpreted
each
fZoor)
queries
single
query
projection
thus,
of
we need
to
of each salary in the projection.
Second,
be desirable
to give users access to the
directly.
For
may nonetheless
example,
a user
who
not be authorized
what each individual
earns. In such a situation,
the
multiset
relation
is the most natural
solution.
The following
example
illustrates
the generalization
explicit
creation
to relations
is
to see
of a
as fuzzy
sets:
Example
2.7.
The following
queries attempt
to find promising
candidates
in a football
draft. The first query identifies
linebackers
who are big and have
experience,
and the second identifies
receivers
with speed and skill. The data
have a degree of uncertainty,
in that each fact has an associated
“certainty
factor” between
O and 1. These factors are to be interpreted
in terms of fuzzy
logic, rather than as probabilities.
That is, a fact llig~ohn)
with an associated
label of 0.7 is to be read as “John is quite big,” rather
than “It is quite likely
that John is big.” Furthermore,
our confidence
in the criteria
expressed in the
two rules is also limited,
and thus, each rule has an associated
“rule certainty
factor.” The certainty
of a deduced fact is given by the minimum
value of the
ACM
Transactions
on Database
Systems,
Vol. 20, No. 3, September
1995.
306
Y
.
Ioanrudls and R, Ramaknshnan
E,
certainties
associated
multiplied
(in
the
with
deduced using several
given by the maximum
about
certainty
the facts
arithmetic
different
of
factors
probability-based
used
sense)
deduced
why
reasoning
in
antecedent
rule
the
are
associated
factors
it.
more
elsewhere
the
valuation
If a fact
is
factor
is
certainty
for
sometimes
discussed
of
factor.
certainty
certainty
they
are
the
the
valuations,
the
and
by
(General
rules
appropriate
[Parsaye
than
and
Chignell
1988].)
Linebacker
x)
A Big(x)
Recei~er(
The above desired
x ) A Fast(x)
semantics
the second label
system
tive
respectively
queries
fz(a)
This
are
The
label
X),
‘-G Candidate(
X).
A Skilled(x)
captured
by
sets) defined
associated
system
certainty
tion
and
interpreting
this
in Example
with
framework
and
note
respect
that
the
the
labels
the
required
Example
kuples,
label
assigned
Although
van
we
Emden
of
but
define
do
trees
for
captures
[ 1986];
treatment
consider
a and
domain.
above.
Emden
the
ignoring
of
recursive
intersecqueries,
a correspondence
two-person
to
the
same
of
it
between
his
games.
in
label
SQ3
contains
the
the
a more
of
For
SQ4
potentially
is
results
system.
and
is formalized
definition,
containment
necessarily
containment
tuple
the
Introduction,
sQ4
not
Intuitively,
order
the
query
interpreted
sets
based
on
two
queries
and
exactly
certainty
a
general
but
with
comparing
based
as
the
we
saw
same
never
on
in
set
fewer
based on a comparison
of the
in SQ4, but not vice versa.
for a general
formal
in
as
instance,
generate
more
of each tuple
than
SQ3. Thus,
of each tuple, SQ3 is contained
After
1 as their
essentially
van
[1965]
not
showed
conjunctive
are
The above illustration
2.10.
Zadeh’s
essentially
system.
partial
duplicates
multiplicities
2.7
by
on
conjunc-
fl( a) = 0.7*
functions
Example
based
query
2.1. The two
Query Containment
some
1.2
in
proposed
search
now ready
to
where
relations
to
is
sets.
alpha–beta
Conjunctive
We are
setting,
it
of fuzzy
to
A used
framework
factors,
union
interesting
2.5
of type
deduction
rule
of
are
(fuzzy
x) ‘~ Candidate(
= 0.6* a, having
the real numbers
between
O and
mentioned
gives the rule certainty
factor semantics
quantitative
is
A E~perienced(
example
is
label
system
also
provided.
in
Definition
Definition
2.10.
Consider
two relation
instances
R ~ and R ~ for a predirespect
to a label
system
~.
R ~ is
cate R with domain
D ~ x “”” x D. with
in Rz, denoted
by RI <, Rz, if, for each tuple
t ● Dl x ““” xD.,
contained
R.l(t ) s Rz(t).
Clearly,
S, is a partial
order.
Example
multisets
2.8.
to
We again
illustrate
the
use the
above
label
definition
systems
corresponding
to sets and
If R 1 <, R ~, then,
of containment.
of R ~ and R ~
based on the above definition,
there is no tuple t in the domain
such that Rl(t)
> R.g(t).
this is equivalent
to the absence of a tuple
t for which
For the set case,
and
R2(t) = “false,” that is, which is in R ~ but not in R ~. This
Rl(t)
= “true”
captures
precisely
the traditional
notion of set inclusion,
R 1 c R z.
ACM
TransactIons
on Database
System.,
Vol
20. No
3. September
1995
Containment
For
the
which
t
multiset
has
traditional
notion
Definition
tive
case,
strictly
is equivalent
copies
of inclusion
2.11.
than
this
more
For
~, denoted
two
a
<,
of Conjunctive
in
to the
RI
than
absence
in
Rz.
between
multisets
based
conjunctive
queries
a and
if, for any
~,
database
Queries
latter
is defined
in
partial
order over both
conjunctive
queries,
2.6
on number
j3,
of the
former.
‘l’he
is
I,
the
of copies.
is more
a
instance
a(I)
restric<,
f?(I).
containment
of
is natural,
since
ordering
relation
t for
this
<,
instances
denotes
a
and the set of
Homomorphisms
We close this
tools
section
with
in characterizing
Definition
ble
terms
the set of compatible
307
of a tuple
Again,
Note that the symbol
S, is overloaded
in that it signifies
relations
as well as containment
of conjunctive
queries. This
the
.
2.12.
Consider
consequent.
variables
A
of ~ into
(1) if x, y
position
that
x.)
two
contains
function.
appears
this
in
D, then
h:
a and
is a total
appearing
in
a, respectively,
Q(h(xl),
~ -~ a
this
definition
...,
~ with
compati-
function
from
the
induces
the same argument
then h(x) = y; and
h(x.))
appears
a total
mapping
is not
in
mapping
a function,
of a homomorphism
extension,
we can define
homomorphism
in
which
atomic
in
a. We also define
formulas
queries
~ -+ a
variables
of f? and
formulas,
the
With
are useful
a.
from
the
of ~ to the atomic
formulas
of a. Occasionally,
when no
mapping
as well. If a
we use h to denote that induced
repeated
extend
conjunctive
h:
which
containment.
of a, such that
a homomorphism
atomic
formulas
confusion
arises,
readily
those
of homomorphisms,
query
homomorphism
are distinguished
in the consequent
(2) if Q(xl,...,
Note
the definition
conjunctive
this
function
to
an onto
is onto,
but
we
that
it
homomorphism
with
a variable-onto
require
respect
can
is
a
to be a
to the
homomorphism
set
to
of
be
a
homomorphism
that is onto with respect to the set of variables
in a. Note
that every onto homomorphism
is also variable-onto,
but that the converse is
not true.
2.13.
Definition
subset
of the
defined
as fl
3. TWO
For two functions
domain
o
fz(~)
GENERAL
of fl,
their
= fl( fz(~))
fl
3.1.
fz such that
PROOF. Let
argument
by
x in the domain
fl
of fz is a
o
fz
and
is
of fz.
RESULTS
Consider
a label
system
that are applicable
to almost
in the rest of the paper.
-Y such that
For two conjunctive
queries a and ~, the inequality
to 2 only if there exists a homomorphism
h: ~ -
ith
the range
is denoted
for any member
In this section we establish
two results
systems and that are used extensively
LEMMA
and
composition
all label
Vu, b ~ L – {O}, a * b # O.
a <,
~ holds
with
respect
a.
a, (resp., b,), 1< i < n, be the distinguished
variable
in the
position
of a (resp.,
~). Assume
that
a <, (3. Consider
a
ACM
Transactions
on Database
Systems,
Vol. 20, No, 3, September
1995.
308
Y. E. Ioannldls
.
valuation
set
O Of a
of constants
SUCh
C
of L. Consider
is satisfied:
R. Ramakrlshnan
0,, is
one-to-one
(+1 maps
a database
all
instance
Let
I?fi ) be the
result
in
to
a
onto
nonzero
for any relation
some
elements
Q the following
, . . . . .~,. ) in the antecedent
of applying
on the requirements
on
f
~t((~,(al),
in
the
of a;
L
with
... (1,,(a,,)))
true
and
respect to the given
that for any atomic
~ ) on that
a <,
respect
to
<
+ O as well. Thus,
database
formula
instance.
of the lemma
of conjunctive
. . . . ti,, (a~))) # O. Since
of
a (resp..
on .& in the premise
definition
element
Pfl((dt(al),.
with
such
a
t = (f),,
(.rl
),...,fl,
(x,,l))
for some
if
Q(xI
requirements
holds:
variables
in
otherwise.
P,, (resp.,
based
the
formulas
such that
\ 4),
Then,
from
atomic
. . ..xm )).
1
=
that
and
6/( Q(.rl,
Q(t)
and
/3 and
and the
queries,
the
following
because
O is the
least
, it
must
a valuation
be
the
case
that
6’ of ~ exists that is
instance
that
Q(.Y1, ..., y~)
is compatible
with
o
in 6, Q((fl~(.Yl ),...,
6:( Y., ))) # O. By the construction
of the database
instance,
the above implies
that d; maps the variables
of P into the set of constants
C. Valuation
O, is
Taking
the composition
one-to-one
and onto, so its inverse
(3,T1 is defined.
h = 0,, 10 ():,
is easy to verify
it
of ~ to the variables
L~iWWA 3.2.
there
exists
of a.
Consider
that
it is a homomorphism
from
the variables
❑
two conjunctive
a honlomorphisrn
h: ~ ~
queries
a and
a. Consider
~, and
a database
assume
instance
that
I and
the set 0,, ( resp., @P) of all valuations
of a ( resp., @) that are true with
respect to I. Let P be a total function
on 6)0 ranging
over the set of valuations
of ~ and
defined
us F( O) = O ~ h. Function
o ~ (OC,, P( e ) ● (3D
For all
PROOF.
phisms.
Q(yl,
The
By
proof
of
property
. . . . -vk’) in
/3,
Q(h(
the
lemma
and
is
(2)
of
yl),
. . . . A(yk))
F has the following
P(
(ii)
based
Definition
is compatible
on
the
2.12,
appears
with
definition
for
in
property:
a
each
as
well.
(i
of
homomor-
atomic
This
formula
implies
the
following:
z Oh(Q(yI,...,
Y,n)) = fl(h(Q(
yl,
. . ..ym)))
= O1(Q(h(y l),.
... h(y,n)))
=Q((o,,oi(yl)
),...,
ti,,(h(ym)})
since
=Q((O,
Hence,
by Definition
argument
position
ti,, o h(b, ), 1 < i <
ACM
TransactIons
with
respect
to 1
F( (3) = 60 h of /3 is true
with
respect
to 1,
ok(yl),...,
2.8 valuation
that is, it is in (Op.
Let a, (resp.,
b,),
(I is true
1 < z < n,
be
d,, Oh(y~)))
the
distinguished
variable
in
the
ith
of a (resp., ~). By property
(1) of Definition
2.12, /3C(aZ) ==
n, and therefore,
H and F(fl ) = @o h are compatible.
❑
on Database
Systems,
Vol
20, No
3, September
1995
Containment
4. LABEL
The
SYSTEMS
following
conjunctive
THEOREM
and
p:
Then,
exists
Let
PROOF.
argument
it follows
two
a <,
Assume
a homomorphism
Then,
h: P ~
/3) that
instance
are true
with
F has the following
of type
label
queries
that
with
suff~cient
with
Ya, b =L,
respect
condition
systems
a:
Al
A ~.” A A.,l
system
for
of type
a s b = fa(a)
to a label
A:
~ c.
<fP(b).
.S? of type A
a.
Va, b e L – {O}, a * b # O. Therefore,
a database
The proof
309
a, (resp,, b,), 1 s i s n, be the distinguished
variable
in the
position
of a (resp., ~). Assume that a <, /3, By property
(Al),
that
a (resp.,
and
databases
conjunctive
~ holds
the “only-if’
direction
is proved.
For the “if” direction,
assume
Consider
a necessary
over
B~z ~c~.
A
the inequality
iff there
ith
Consider
A ...
BI
identifies
containment
4.1,
.
OF TYPE A
theorem
query
Queries
of Conjunctive
of property
A. Let
that
respect
additional
1 s
exists
i <
to 1. Let
Lemma
a homomorphism
1 and the set 6). (resp.,
(a) is based
A = {A,:
there
by applying
h: ~ -
@P) of all valuations
F be defined
as in Lemma
3.1,
a.
of
3.2.
property:
on specific
ml} and
characteristics
B = {B,:
of label
systems
1 < i < rn2} represent
the
set of atomic formulas
in a and ~, respectively.
Consider
the subset A~ of A
consisting
of the atomic formulas
in a that are images of atomic formulas
in
B under h. Without
loss of generality,
assume that A~ = {Al:
some k z 1.Then, since til o h ~ @fi (Lemma
3.2), the following
1<
i < k} for
holds:
(1)
The last
h(llj
) for
following
equality
some
is due to property
i #j,
the
product
(A2),
is not
which
implies
affected.
that,
On
the
even if h(B, ) =
other
hand,
the
is also true:
From the above, we conclude
that
Ol(c,t ) s 610 h(cP ), using property
(Al) of
multiplication
and the conditions
on f. and fp in the theorem.
Thus, property
(a) of F holds.
We proceed with the
the equivalence
relation
proof of the
of valuation
theorem.
Partition
(1,, and @v based on
compatibility,
and let @ffl and @~P be
the corresponding
partitions
that
generate
tuple
t in the distinguished
variables
of a or ~. Clearly,
there is a one-to-one
and onto correspondence
between the partitions
obtained
for a and those obtained
for B (since for both
of them
there
is a single
partition
for each tuple
in the domain
of the
consequent
relation).
Let V,, and Vfi be multisets
defined
as follows:
V,, =
ACM
TransactIons
on Datiabase
Systems,
Vol. 20, No 3, September
1995
310
Y, E Ioannidis
.
{6?(c. ): 9 = @t.} and
there is some element
Suppose
that
R Ramaknshnan
Vp = {O~(cD ): 0’ = @fP}. By
UO = V’ such that
property
(A4)
of addition,
UO = 61(c0 ) and v~ = IN 01)(cP ), for some O = ofa. From
(a), UO < v~, and from
following
holds:
Combining
and
Lemma
(2) and (3) yields
3.2, u~ = VP. By property
E,, ~ ~ u < ZU
(A3)
~ U. If a(~)
●
and
property
of addition,
~(l)
the
are equal
to
the relations
Pa and PD, respecti~ely,
by ~efinition
2.9, the above
that, for all tuples
t in the result of a or ~, P.(t)
< PP (t ). Therefore,
implies
for an
arbitrary
a <,
database
instance
1, a(1)
<,
P(1),
which
also implies
that
~.
❑
As we have
mentioned
tional
databases,
A systems. Thus,
[ 1977],
so that
example,
ward
it
is now
is that
systems
5. LABEL
The
Testing
SYSTEMS
following
over
traditional
alternative
semantics
sets [Zadeh
query
1965].
the following
[Chandra
conjunctive
query
as well,
identifies
databases
5.1 illustrates,
5.1.
with
PROOF.
a sufficient
with
label
and Merlin
containment
Consider
two
condition
systems
it is not necessary,
Let
conjunctive
a,
(resp.,
6,),
1 <
i <
position
of a (resp.,
h: /? + a. Consider
(resp., fl~ ) of all valuations
3’ be defined
as in Lemma
1977]:
with
all
TransactIons
to
n,
queries
be
from
a:
Al
A
... A
the
distinguished
on Database
is, @l(c. ) <010
6). to 6)P.
Systems,
Vol. 20, No
as
variable
3, September
Anl
~ CW
< ffl(b).
a <,
in
/?
the
that
there
exists an onto
instance
I and the set 6).
of a (resp., ~) that are true with respect
3.2. Then, F has the following
additional
that
query
B. Unfortunately,
in general;
/3 ‘). Assume
a database
O E @a, 191(c.) s (A);
(b) F is one-to-one
ACM
respect
for conjunctive
of type
ties:
(a) For
respect
for the traditional
rfi
Assume
that Va, b = L, a < b * f.(a)
and P: Bl A . . . ~B~z~cp.
If there exists an onto homomorphism
h: /? d a, then the inequality
holds with respect to a label system S7 of type B.
ith argument
homomorphism
for
A straightfor-
containment
to the same problem
rela-
case of type
and Merlin
OF TYPE B
theorem
over
THEOREM
for
as fuzzy
for conjunctive
for
queries
as sets, are a special
the result of Chandra
of type A is NP-complete.
containment
Example
testing
we have
COROLLARY 4.1.
systems
relations
of type A is identical
case of sets. Hence,
label
conjunctive
applicable
for interpreting
corollary
to label
earlier,
which interpret
relations
Theorem
4.1 generalizes
1995
h(cp).
to 1. Let
proper-
Containment
The
proof
systems
of property
of
type
B.
(a) is based
Let
the
set of atomic
onto,
set
.?3 can be partitioned
that
the
in
a and
into
Without
two
Queries
B = {Bi:
say,
B ~={BZ:
Because
B~ and
assume
ml
of label
1 < i < rrz2}
~, respectively.
subsets,
311
.
characteristics
and
loss of generality,
1 s i s ml} and
B~ = {B,:
holds:
specific
1 < i < ml}
formulas
h: B~ a A is a bijection,
subsets are
the following
on the
A = {A,:
represent
of Conjunctive
h is
BQ, such
that
the two
m2}. Thus,
+ 1 .s i <
m2
=fp
fi6’1(
[ t=l
~
Al)*
6,0h(Bz)
.
2=7721+1
)
Also,
From
the above,
because
of property
(Bl)
of multiplication
on fa and ~P in the theorem,
we conclude
fore that property
(a) of F holds,
The
proof
of property
(b) consists
h is onto,
of some variable
~. The above
it is also variable-onto,
combined
of /3. Assume
with
(4) implies
Ouo
k(y)
Therefore,
0,,0 M P) # O; oh(B),
We proceed
with
the
proof
given
two
valuations
O(a)
+ O’(a),
(4)
and
that
so every
x = h(y),
variable
of a is an
for some variable
y of
that
# 6); 0/i(y).
which
of the
that,
# O’Oh( @). Because
a such that
# f3;(x).
tic(x)
Because
and the conditions
01(Ca) s 010 h(cP ) and there-
of showing
19,f3’ = @a, if o(a) # ~’(a), then Ooh(P)
there must be at least one variable
x in
h-image
that
implies
theorem.
(5)
that
Partition
property
~.
(b) of F holds.
and @p based on
the equivalence
relation
of valuation
compatibility,
and let (l~e and @fP be
t in the distinguished
the corresponding
partitions
that
generate
tuple
variables
of a or ~. As in the proof of Theorem
4.1, there is a one-to-one
and
onto
tained
correspondence
for
~. Let
between
V.
and
the
partitions
VP be multisets
obtained
defined
for
a and
as follows:
those
ob-
Vu = {61(c0 ):
@E @ta} and VP = {O;(CP ): (3’ e @tP}. Let VP” be defined
as follows:
VP” =
of F imply
{O{(cp): 0’ e @tp and 0’ = /3 o h for some 19= (30}. The properties
that for every
to u (Lemma
other
imply
element
u = V. there is an element
U’ ● V: that corresponds
u < cl’ (property
(a)), which
corresponds
to no
3.2) such that
element
of Va (property
the following:
ACM
(b)).
TransactIons
By property
on Database
(B2)
Systems,
of addition,
the
Vol. 20, No. 3, September
above
1995.
312
Y, E. Ioannidls
.
and
R Ramakrishnan
by
If a(l)
and P(1) are equal to the relations
P. and PP, respectively,
t in the result
of a or /3,
Definition
2.9, (6) implies
that,
for all tuples
Pa(t) < Po(t). Therefore,
for an arbitrary
database
instance
1, a(l)
<, /3(1),
which
also implies
The
following
query
PROPOSITION
with
homomorphism
provides
For
two
respect
h: /3 ~
Assume
❑
/3.
containment
5.1,
a ST ~ holds
a <,
proposition
for conjunctive
PROOF.
that
a straightforward
with
respect
conjunctive
to a label
queries
system
necessary
to label
a
systems
and
condition
of type
(3, the
-5? of type B only
B:
inequality
if there
exists
a
a.
that
CY S,
{O}, a * b # O. Therefore,
~.
By
property
by applying
(Bl),
Lemma
it
follows
3,1, the
that
VCL,
proposition
b = L –
is proved.
❑
By restricting
that
the form
the condition
THEOREM
of conjunctive
of Theorem
Consider
5.2,
queries,
5.1 is both
two
conjunctive
D +
to a label
system
shows
Vu, b ● L,
Al
~ A A~l
A
a < b + f.(a)
a <,
~ c.
< fP(b).
fl holds
with
an onto homomorphism
h:
a!,
PROOF.
Suppose
The
that
“if”
direction
a <,
/3. By
h: D s
homomorphism
such
a:
the inequality
$?’ of type B iff there exists
theorem
and sufficient:
queries
that
and ~: BI A .“ A B~z ~ C@. Assume
If a does not contain
repeated predicates,
respect
the following
necessary
homomorphism.
is
an
immediate
Proposition
5.1,
cr. We first
Since
there
show
consequence
we
know
that,
that
in this
are no repeated
of
there
Theorem
must
case, there
predicates
in
5,1,
be
some
is a unique
a, for each
atomic formula
of ~, there is a unique
atomic formula
in a that can be its
image under any homomorphism.
Therefore,
for each variable
in ~ its image
is uniquely
determined;
that is, there is a unique
homomorphism
h: /3 4 a.
It remains
the contrary
not
the
image
to be shown
that
that
this
h is not onto.
of any
atomic
unique
Then,
formula
atomic formula
of a. Clearly,
Q cannot
ity, assume that all arguments
of all
have the same domain.
Consider
instance
such that each relation
R(t)
# O,
R(t) = o,
Let P. (resp.,
PP ) be the result
homomorphism
there
in
is an atomic
/?. Let
is onto. Assume
formula
Q be the
in
to
a that
predicate
is
in that
appear in ~. Without
loss of generalpredicates
in the conjunctive
queries
a constant
R satisfies
c in that domain
the following:
and a database
if t has c in all its arguments;
otherwise.
of applying
a (resp.,
B ) on that
instance.
Let
s
be the tuple in these results that has c in all of its arguments.
Let Pp(s) = 1,
where, by the construction
of the database
instance,
1 G L – {O}. By property
(B3), there is a label m ● L such that m >1. Suppose that t’is the tuple in Q
that has c in all of its arguments,
and choose Q( t’) = m. Then, by property
ACM
TransactIons
on Database
Systems,
Vol
20, No
3, September
1995
Containment
(Bl),
tion.
l’.(t)
> m >1 = PP(t), which
❑
Hence, h must be onto.
Recently,
Chaudhuri
5.1, Proposition
ask whether
not
contain
following
Let
Y
5.1.
Definition
satisfies
(El)
A Q(x)
P:
Q(x)
+P(x).
L = N (the
addition,
a s,
A
does).
to
~ does
Unfortunately,
the
-P(x),
<
there
B label
numbers
is the usual
~, although
so that
version
-!%= (L,
total
including
order
is no onto
systems
a stronger
system
queries:
set of natural
and
of type
label
the case that
two conjunctive
Q(x)
we can prove
5.1.
a possibly
Theorem
It is natural
is not possible:
a:
Clearly,
is a contradic-
discovered
to cover
313
.
homomorphism
examples
of Theorem
*, +, O, < ) is
O),
over the
of
like
the
5.2.
type
B’
if
it
the following:
Va, b~L–{O},
(B’2)Va,
(B’3)
this
as follows:
p, which
independently
the following
+ is the usual
are excluded,
(while
that
from ~ to a.
By limiting
the definition
above
[1993]
a <,
Queries
5.2 for the case of multisets.
predicates
Consider
numbers.
that
5.2 can be strengthened
shows
be defined
* is ma-c,
natural
and Vardi
repeated
example
Example
implies
5.1, and Theorem
Theorem
of Conjunctive
a<a*b,
a’, b, b’=L,
and3a
(a<a’andb
=L,
<b’)
Vkkl,
ah<ak+l.
+a+b<a’+
b’.
Va G L, ~a’ ~ L, a < a’.
Observe
that
B’ is that
the only
there
than the element
following
result:
THEOREM
difference
is an element
itself
(property
Consider
5.3.
between
in L whose
label
systems
product
(B l)). For these
two
conjunctive
of type
with
itself
label
systems,
queries
a:
Al
B and of type
is strictly
larger
we have
the
~ . . . ~ Am ~ 5 c.
tp
and
P: BI
If either
holds
A . . . ~B~z-cfl.
Assume
a or ~ does not contain
with
homomorphism
respect
to a label
h: p d
that
repeated
system
Va, b = L, a < b ~ f.(a)
predicates,
J? of type
B’
the inequality
iff
there
exists
< f~(b).
a <,
an
/3
onto
a.
PROOF. The “if’
direction
where there are no repetitions
as well as the “only-if’
direction
in a are immediate
consequences
for the case
of Theorems
5.1 and 5,2, respectively.
Suppose that (1 does not contain repeated
predicates
and that
a <, ~, By Proposition
5,1, we know that
there
must be some
homomorphism
h: D ~ a. Clearly,
every such homomorphism
must be oneto-one with respect to atomic formulas,
since there is no repetition
or predicates in ~. Assume that h is not onto. If m (resp., n) is the number
of atomic
formulas
in a (resp., ~), then clearly
m > n.
Let 1 be an element
of L such that
Vk >1,
lk < lk +‘. Property
(B 1)
ensures
the existence
of such a label.
ACM
Transactions
Without
on Database
loss of generality,
Systems,
assume
Vol. 20, No. 3, September
that
1995.
Y. E, Ioannidis
314
.
all
arguments
domain.
that
of all
Consider
R. Ramakrishnan
predicates
conjunctive
R satisfies
the following:
domain
queries
= 1,
if t has c in all its arguments,
R(t)
= O,
otherwise.
of applying
results
P.(s)
P.(s)
= lm and PP(s) = l“,
> PP (s ), which implies
must
be onto,
that
have
and a database
R(t)
in these
We remark
the
c in that
PP ) be the result
be the tuple
in
a constant
each relation
Let Pe (resp,,
and
a (resp.,
~ ) on that
has c in all of its arguments.
Since
that
the
same
instance
such
instance.
Let
We note
s
that
m > n, by property
(B 1), it follows
that
a K, /3, which is a contradiction.
Hence, h
❑
that
the
choice
of h in
the
above
proof
was
arbitrary.
Hence,
a <, ~ only if every homomorphism
h: P ~ a is onto.
As mentioned
earlier,
commercial
relational
database
systems
treat relations as multisets
and therefore,
operate under the semantics
of type B (and,
in fact, type B ) label systems. Hence, Theorems
5. 1–5.3 are applicable
to this
case.
6. UNIONS
In this
ment
OF CONJUNCTIVE
section
we generalize
not of individual
QUERIES
the results
conjunctive
queries,
classical
case, where
relations
are
addressed
by Sagiv and Yannakakis
zation
to the containment
presented
problem,
but
viewed
[1980],
After
introducing
holds
show
undecidable
B, thus
for label
between
systems
of type
the two label
of collections
contain-
of them.
as sets, the problem
who gave a syntactic
ogy, we first show that their theorem
which
the set-case is one). We then
differences
above and discuss
the necessary
For the
has been
characteriterminol-
for all label systems of type A (of
that the problem
is, in general,
providing
more
evidence
of the
systems.
6.1 Basic Definitions
The following
junctive
query
definitions
case:
are
straightforward
Definition
6.1.
A union of conjunctive
formulas
with
identical
consequent.
If
queries in the collection,
then their union
extensions
of the
single
con-
queries is a collection
of first-order
a,, 1< i < n, are the conjunctive
is denoted by U ~= ~al.
Definition
6.2.
Consider
a union
of conjunctive
queries
U ;. ~ a, and a
database
instance
1. Let P, be the relation
instance
that is the result
of
applying
a, to 1. The result of applying
U ~. ~ a, to 1 (denoted by U ~. ~ a,(l))
is a relation
instance
P, that is, a function,
such that, for each tuple t in the
domain
of the consequent
of the conjunctive
queries,
P(t)
Definition
with
6.3.
compatible
AC!M TransactIons
For two unions
consequent,
on Database
Systems,
=
fi P,(t).
[=1
of conjunctive
queries
I.J ~~~ a, and I.J~j ~ ~~
is more restrictive
than
U ~~ 1 BJ,
U ~~~ al
Vol
20, No
3, September
1995,
Containment
6.2
of Conjunctive
Queries
315
.
Label Systems of Type A
The following
theorem
containment
systems
of
of type
‘1’HEOREM
of
a necessary
conjunctive
and
queries
sufficient
over
condition
databases
with
for
label
A:
For two unions
6.1.
the inequality
u ~~~ a, <,
type A iff for all
PROOF.
identifies
unions
For
1 s i s nl
the
“if’
of conjunctive
U ~~~ a, and
queries
U ~~~ B,,
U ~! ~ /3, holds
there
with respect to a label system ~
1 s j < n2 such that a, <, (31.
exists
direction,
assume
that
all
for
1<
i s nl
there
of
exists
1< j < n2 such that
a, <, ~j. Consider
a database
instance
and let Pa, PP,
Pai, and PPj denote the results
of applying
U ~~~a,, (J~~ ~ ~,, al, and ~~ tO
that
instance,
respective]
of the conjunctive
t in the domam
y. For any tuple
queries,
Definition
6.2 implies
of the consequent
that
: Paz(t).
Pa(t) =
(7)
1=1
Properties
(A3)
imply
there
that
and (A4) together
exists
some
with
associativity
1 < -I < nl
;
Pa,(t)
and commutativity
of +
such that
< Pa,(t).
(8)
~=1
By the
aI <,
premises
of
PJ, which
Finally,
imply
the
implies
property
(A3)
“if’
direction,
there
exists
some
J < TZ2 such that
1<
that
and
associativity
of
+
together
with
Definition
6.2
that
PpJ(t)
<
;
PFJ(t)
(lo)
= Pp(t).
,j=l
The
combination
consequent
of (7)-(10)
of the
yields
conjunctive
that
queries
for any tuple
for arbitrary
tu-pies t and arbitrary
database
2.11 imply that l-l~!l
a, <, U~!l
(3,.
The proof of the “only-if’
direction
is very
3.1, but
is given
here
in full
detail
for
t in the
Pc,( t ) g PD (t),Since
clarity.
instances,
similar
domain
the
Definitions
to the
Assume
that
of the
above
holds
2.10 and
proof
of Lemma
the
inequality
U ~1, a, <, U ~1, ~, holds. Let al (resp., b,), 1< i < n, be the distinguished
variable
in the i th argument
position
of the conjunctive
queries
in U :1 ~ a,
(resp.,
U ~! ~ ,t3~). Consider
ACM
an
arbitrary
TransactIons
conjunctive
on Database
Systems,
query
ar
in
U:!
~ a,.
Vol. 20, No, 3, September
1995
316
Y
.
E. Ioannldls
Furthermore,
variables
in
aI
consider
to nonzero
elements
. . ..x7n )).
and
I?pj
Since
if
that
instance,
premise
of
denote
instance
the
formulas
such
that
t = (6L,
(x1),....OL.
Tin))))for
some
,..
.,xm)
for
a~ ;
in
that
of applying
direction
(.J ~1 ~ a,,
(Al)
imply
and
(A3),
U ~! ~ PJ,
Definition
6.2,
O,,(an)))
eL’(an))).
of L with
there
respect
must
exist
to
<
some
and the additive
~,1 in
U ~! ~ ~
identity,
such
0< Pp.l((f),,(al)....) (),(a.)))
true
with
respect
and
such
that,
to the
for
any
al.
that
< Pfl((6L, (a1),...,
q(a,t)))
element
implies
results
Properties
“only-if”
Pal((oL, (al),...,
O is the least
above
the
respectively.
the
< pb((o,, (al)....>
the
a database
from
all atomic
is satisfied:
Q(x,
o<
0, is one-to-one
Ot maps
otherwise.
Let P<,, PP, P.,
to
C and
of L. Consider
o,
\
/3j
6 of aI such that
\
=
the
a valuation
Q the following
i?l(Q(xl,
Q(t)
and
R Ramakrishnan
in aI onto some set of constants
any relation
and
and
as well. Thus, a valuation
0’ of BJ exists
given database
instance
that is compatible
atomic
formula
Q( yl, . . . . v,. ) in
that
that
with
PJ, Q({ ~~(yl),
is
d
....
O:( y~ )) ) + O. By the construction
of the database
instance,
the above implies
that d; maps the variables
of BJ into the set of constants
C. Valuation
0,, is
Taking
the composition
one-to-one
and onto, so its inverse
d,: 1 is defined.
= (3C 10 (3;, it is easy to verify
that it is a homomorphism
from the variables
of PJ to the variables
of a{. Theorem
4.1 implies
that
aI <, BJ. Since the
above was true for an arbitrary
conjunctive
query cr~ in U ~1 ~ a,, the theorem
h
❑
follows.
When
considering
the
label
system
that
captures
of relations
as sets, Theorem
6.1 is identical
Sagiv and Yannakakis
[ 1980] on containment
Its
significance
containment
6.3
is that
exists
Label Systems
We now
turn
containment
our
of
it
demonstrates
for arbitrary
label
the usual
interpretation
to the corresponding
theorem
of
of unions of conjunctive
queries.
that
systems
the
same
of type
characterization
of
A.
of Type B
attention
unions
of
to
label
conjunctive
systems
queries.
of
We
type
show
B
and
that
the
there
label system
for which
the problem
is undecidable,
specifically,
system that captures
the interpretation
of relations
as multisets.
tion is from a variant
from the Diophantine
equation
problem,
problem
of
is one such
the label
The reducso we first
state some definitions
related
to that.
Let .&( x ~, Xz, . . . . x~ ) be a polynomial
in k variables
with integer
coefficients. The equation
AC xl, Xz, . . . . x~ ) = O is referred
to as a Diophantine
equation
when only its integer
solutions
are being sought.
ACM
TransactIons
on Database
Systems,
Vol
20, No. 3, September
1995
Containment
The
problem
of determining
whether
solution
to such an equation
there is no decision procedure
The following
equivalences
(m3x1,
there
xk)A(xl,
‘=’(VX1,
X’2, ...,
x2, xk), xk)
a nonnegative
1982].
317
.
More
integer3
formally,
a variable
=0
x~)A(xl,
xz, ...,
x~)l–(A(
xl,
Therefore,
the problem
of determining
for all nonnegative
integer
assignments
for
Queries
are straightforward:
x2,...,
1 s i s h, the following
exists
is undecidable
[Davis
for the statement
=(v-r~, xz, ...,
addition,
of Conjunctive
XO that
xz, . . ..x~))2
<0.
whether
a polynomial
is nonpositive
to its variables
is also undecidable.
In
is
equivalence
+0,
Xk)
distinct
from
all
other
variables
x,,
holds:
~(vx~,x~,...,
xk)xo
A(.xl
xl, ) <
,x2,...,
0.
Note that
XOA(XI, Xz, ..., Xh ) is a polynomial
without
a constant
term.
Hence, the above equivalence
implies that the problem of determining
whether
a polynomial
is nonpositive
for all nonnegative
integer
assignments
to its
variables
remains
undecidable
even if the polynomial
has no constant
terms.
This final undecidable
problem
related
to Diophantine
equations
is the one
that
we use in the following
result:
6.2.
Consider
the following
label
system
of
* , + , (), ~ ), where L = No, the set of nonnegative
integers,
THEOREM
(L,
the usual
integer
multiplication
on the integers.
Also
consider
and addition,
two unions
and
<
PROOF.
Let
Given the above problem
problem
of containment
is
undecidable
positive,
or arbitrary
4Note that we have
two
polynomials
U ?I ~ a, ~,
total
order
U ~~~ a, and
~!:
1 P, with
X2, ...,
xk ) and
Wxl,
X2, . . . . Xh) be two
polynomials
in
positive
integer
coefficients
and with no constant
terms. We
above that there is no decision procedure
for the statement~
(VX1, X2,...,
problem
queries
@(xl,
k variables
with
have established
3The
is the usual
of conjunctive
lJ:: I B,. The problem
of deciding
whether
or not
respect to the above label system F is undecidable.
type B: -Y’=
* and
+ are
with
xk)(@(
x1 ,x2,...,
instance,
of unions
we construct
of conjunctive
independent
integers.
rewritten
a polynomial
positive
coefficients.
xk)sw(xl,
of whether
with
ACM Transactions
arbitrary
on Database
x~, ...,
an equivalent
queries.
the
solutions
integer
Systems,
sought
coefficients
Vol
Xk )).
instance
are
of the
nonnegative,
as the difference
20, No 3, September
of
1995
318
Y E, Ioannidis
.
Assume
that
and R, Ramaknshno.n
the polynomial
@(x1,
@ is of the form
x2, . . ..xk)
s
~a,x:’’x$”
=
““” .x;”,
L=l
where
there
are s terms
in @, which
have
been numbered
in some arbitrary
way from
1 to s. For all 1 < i < s, a, is a positive
integer,
and for all
1 <J’ < k, c,] is a nonnegative
integer with at least one of them being positive
(cD has
no constant
queries
as follows:
variable
xi,
terms).
There
1 s z s k,
From
0,
we
is a unique,
plus
the
construct
unary
unary
a union
relation
relation
of conjunctive
symbol
symbol
P
Xl
for
for
the
each
query
consequent.
All of these relations
share the same domain
D. For each term
conjunctive
queries in the union; that is,
ajx;’’x; ~ ““” x~~, we put a, identical
there are a total of n ~ = Z:. ~ a, conjunctive
queries.
Each such conjunctive
query
is of the form
c, ~ times
Xl(u)
Let
Ax’l(u)
c,,, times
A . . AXI(V)
A . . . A Xh(U)
AX1(V)
A ““
~X~(U)
be the union of conjunctive
queries thus constructed.
of conjunctive
queries
U ~~ ~ ~j can be constructed
from
lJ ~1 ~ al
the union
mial
Similarly,
the polyno-
W.
What
remains
to be shown
(b’xl,
For the “only-if”
P,, denote
associated
,x%, . . ..x~
u;!lal<r
U;!l
direction,
consider
)<Wxl,x2,...,xh)
p,.
an arbitrary
database
instance
1, and let
the result of applying
U ~~~ a, to 1. If X, is the relation
(function)
with the symbol Xl in this instance,
then, for any element
d = D,
Pa(d)
above
structed
is that
x2, . ..rk)(@(xlx1
*
The
~P(u).
= @(Xl(cZ),
is a straightforward
from
the polynomial
U 75 ~ P] to 1, then,
consequence
@. Likewise,
for any element
PP(d)
Xz(d),...,
= WX1(d),
X~(d)).
of the
(11)
way
if PD denotes
IJ ~~~ a,
the result
was
con-
of applying
d = D,
X,(d),,,.,
X~(d)).
(12)
If (Vxl, xz, . . ..x~)(@(xl.
xz, x~), x~) <IP(Zl,
Xz, . . ..xk)).
then by (11) and
(12), for any element
of d = D, P.(d)
< PP(d), which implies
that P. <, Pp.
Since the above holds for an arbitrary
database
instance,
we conclude
that
u;!,
ffzsr
U;?l
P,.
For the “if” direction,
consider
an arbitrary
set of nonnegative
integers
Xl
assigned
to the variables
xl, respectively,
of CDand ~. Let I be a database
= x,. If P. and P~
instance
such that, for some d G D, for all 1 < i s k, X,(d)
then
denote the results
of applying
U ~~~ a, and U ~j, ~j to 1, respectively,
clearly
ACM
TransactIons
on Database
Pw(d)=@(
X1, X2, . . .. x}.),
(13)
PP(d)=T(
Xl, Xz, . . .. X~).
(14)
Systems.
Vol
20, No
3, September
1995
Containment
If U ~~1 a, S, U ;~l DJ, then
that @(xl, xZ, ..., Xk)swxl,
trary
set
of
nonnegative
of Conjunctive
Queries
p.(d) s J?@(d), which by (13) and (14) implies
xz, ..., Xk ). Since the above holds for an arbiintegers
x,,
we
conclude
that
(Vx ~, x ~, . . . .
Xk)).
x2,...,
x~)(~(xl,
xz, ..., xh)<q?(xl,
The two problem
instances
are equivalent,
and the polynomial
problem
is undecidable.
Therefore,
the problem
of containment
conjunctive
queries with respect
theorem
is undecidable
as well,
We
should
important
there
emphasize
label
could
that
system
of type
be some such label
to the label
system
for
unions
inequality
of unions
of
of type B described
in the
❑
Theorem
6.2
B from
system
deals
explicitly
a practical
for which
with
perspective.
of conjunctive
queries
in
the
the
most
In principle,
the problem
of unions
of conjunctive
queries
is decidable,
but Theorem
there is no general
decision
procedure
for all such label
particular,
multisets,
319
.
of containment
6,2 shows that
systems
and, in
case
of relations
as
7. DISCUSSION
One of the benefits
useful
types
syntactically.
of our general
detail.
This is a variant
fuzzy-set
reasoning.
7.1.
Definition
satisfies
(AI’)
is the
A
of our
label
type
system
2
A
system
= (L,
ability
to identify
other
containment
for them
new label system in some
and
is
also
*, +, O, < ) is
of
motivated
type
by
A’
if
it
the following:
Va, bGL-{O},
O<a*b
(A2’)vcze
L,a*a=a.
(A3’)Va,
a’, b, b’~L,
With
label
framework
of label
systems
and to characterize
As an example,
we consider a specific
respect
system
Example
<a.
(a<cz’andb
<b’)
~a+b<a’+
b’.
to type A, condition
(A4) has been dropped.
is also a type A’ label system.
7.1.
For
certainty
factors,
L is equal
between
O and 1, as noted earlier.
However,
rein, the operation
+ is not always
taken
expert
system
denote addition
of type A, but
Clearly,
to the
every
set of real
type A
numbers
while the operation
* is usually
to be max. For instance,
in the
iMYCIN,
a + b is defined
as a + b – a. b (where
+ and “
and multiplication
over numbers,
resp.). This label system is
not of type A. Many
other choices for + also yield type A’
systems.
Lemma
tion
3.1 clearly
for conjunctive
sufficient
holds
query
for type
A’ systems,
containment.
giving
us a necessary
The following
theorem
condi-
establishes
a
condition:
f<,
THEOREM
and
~:
7.1.
Consider
r
two
conjunctive
BI ~ . . . ~ B~z ~ CP. Assume
ACM
Transactions
that
queries
Va, b ~ L,
on Database
Systems,
a:
Al
A
... A
a < b *
fi,(a)
A~l
~ c.
< f~(b).
Vol. 20, No 3. September
1995
320
Y.
.
E, Ioannidis
and
R, Ramakrlshnan
Then, the inequality
a <, P holds with respect to a label
if there exists a variable-onto
homomorphism
h: /3 - a.
PROOF.
ith
Let
argument
a,
(resp.,
position
homomorphism
h:
system
J? of type A
b,), 1 s i s n, be the distinguished
variable
in the
of a (resp., /3). Assume that there exists a variable-onto
~ ~
a. Consider
(resp., @p) of all valuations
F be defined
as in Lemma
a database
instance
I
and
the
of a (resp., B) that are true with respect
3.2. Then, F has the following
additional
set
O.
to ~. Let
proper-
ties.
o ● @a, Ol(cti)
(a) For all
(b) F is one-to-one
The proof
from
of property
Theorem
< (F(O~))(cp);
(a) is identical
4.1. The proof
that
is, OZ(ca) < 1910h(cP).
(9. to (3P.
to the corresponding
of property
(b) is identical
part
of the proof
to the corresponding
of
part
of
the proof of Theorem
5.1. The rest of the proof is identical
to the corresponding part of the proof of Theorem
5.1, observing
that condition
(B2) is identical
to condition
The proof
framework:
of Theorem
The exact
identified
of interest
This
❑
(A3’),
7.1 illustrates
properties
upon
an important
benefit
of our algebraic
which each part of a proof rests can be
precisely,
and this is of great help in identifying
for which useful results
can be established.
still
leaves
be strengthened
studying
open the interesting
to provide
equivalence
and
various
label systems
systems, pose further
A more
basic
the
of whether
sufficient
In particular,
be both the
of conjunctive
or not our conditions
systems
Theorem
condition.
here, and possibly
other
problems
in this area.
is whether
liberal.
should
and
case of unions
considered
interesting
question
be made more
least element
question
a necessary
new label
7.1 can
Of course,
queries
for
interesting
on label
the
label
systems
can
can we relax the requirement
that the
additive
identity
and the multiplicative
annihilator?
Our final example
provides
some motivation
for such fundamental extensions
and also illustrates
that
our generalization
of conjunctive
queries
queries.
using
Example
label
7.2.
systems
Several
generalizations
seek to perform
path
performing
computations
label
is equally
computations
useful
of the
in
transitive
by associating
as paths
the
labels
are enumerated,
context
of recursive
closure
with
of a graph
paths
Typically,
and by
the set of
labels is either the set of nonnegative
integers
or the set of nonnegative
reals.
The operations
correspond
to computing
a new label for a path by “multiplying”
( * ) the labels
for the subpaths
used to generate
the path
and to
computing
a new label for a set of paths by “adding”
( + ) the labels for the
paths in the set. By making
different
choices for * and +, a variety
of path
problems
can be stated in this framework.
Table IX contains
some characteristic examples.
Although
we do not consider
recursive
queries,
as Table IX shows, we can
use any of these label systems to define useful
databases
and queries.
For
example,
the label system in reachability
is the traditional
relation-as-set
ACM
TransactIons
on Database
Systems,
Vol
20, No
3, September
1995
Containment
Table
IX.
Characteristic
*
Problem
Reachability
Maximum
Bill
capacity
path
of materials
Shortest
Most
path
reliable
Critical
path
(longest)
path
Examples
+
and
L
max
Nonnegative
integers
A
sum
Nonnegative
integers
B
integers
?
min
Nonnegative
max
[0,
sum
max
of conjunctive
Sagiv
and
it in the
case of Theorem
multisets,
which
6.1 (in
we prove
Chaudhuri
the
not
while
and Vardi
[1993]
solved
by Chan-
and
Klug
dependencies.
of conjunctive
essentially
same
problem
independently
satisfy
because
in the presence
Johnson
the union
the
systems,
the problem
is decidable,
contrast
to
undecidable).
it does not
label
and inclusion
addressed
or even as label
because
* is not
for sets was
studied
dependencies,
problem
because
even
containment
of functional
[1980]
that
system
are
et al. [1979]
multivalued
sets and showed
Recently,
?
integers
restricted
to nonnegative
edge labels,
Unfortunately,
not all proposed path
B label
query
Aho
presence
and Yannakakis
?
11
Nonnegative
critical
path
annihilator.
The
studied
a type
type
A
as type A or type B label systems,
path is not a type A label system
WORK
of functional
“true”}
sum
it is not
[1977].
system
product
8. RELATED
problem
Label
min
(Bl).
Shortest
path and
neither
has a multiplicative
dra and Merlin
Problems
(“false,”
product
321
●
or
can be viewed
Most reliable
idempotent,
of Path
Queries
and
view, whereas
bill of materials,
when
essentially
treats relations
as multisets.
systems
systems!
of Conjunctive
queries
proving
over
[19821
Finally,
for
a special
relations
discovered
as
Theorem
5.1, Proposition
5.1, and Theorem
5.2 (i.e., similar
sufficient
conditions
for
conjunctive
query containment)
for the case of multiset
queries.
In terms of
differences
with our work, they
query containment
for multiset
conjunctive
query
fore in NP, but
equivalence
not known
also showed that the problem
queries
is II # hard, whereas
is the same as graph
to be NP-complete).
additionally
deal with unions
of conjunctive
ment is undecidable
in that case. Moreover,
multisets
alone,
we present
systems),
which
captures
a much
a variety
more
isomorphism
(and thereOn the other
hand,
we
queries
whereas
general
of possible
of conjunctive
surprisingly,
and show that containtheir entire work is on
formalism
interpretations
(that
of label
of relations
(including
sets and fuzzy sets). Hence, even the results that are common with
those of Chaudhuri
and Vardi are in essence more general,
since they hold for
all label systems of type B.
The algebraic
framework
that we have used in this paper is inspired
by an
idea of Ioannidis
and Wong [1991]. Specifically,
that earlier
work talks about
associating
a real number
with each tuple of a relation,
so that an algebra
ACM
TransactIons
on Database
Systems,
Vol
20, No 3, September
1995.
322
Y E. Ioannidls
.
and
R. Ramakrishnan
can be formed and a theorem
on Iinearizability
relations
as sets of tuples)
can be obtained.
real-number
idea
and
have
generalized
of multilineal
recursion
(over
In this paper we have taken the
it to attach
an arbitrary
label
to a
tuple. Based on that, we have then developed
the entire
formalism
of label
systems, their properties,
and the results
on query containment.
There have been a few other pieces of work that deal with relations
as
multisets,
but
address
problems
other
than
containment
of conjunctive
queries.
Dayal
et al. [ 1982] were probably
the first
to consider
multiset
queries in SQL, using a model of relational
algebra
with control
over duplicate
elimination.
mantics
junctive
Maher
and
Ramakrishnan
[1989]
for logic programs
that generalizes
queries
studied
in this paper. The
proposed
the multiset
main results
a multiset
se-
semantics
for conin that paper were
that checking
if a general
logic program
and that there is a sufficient
condition
generated.
Mumick
et al. [1990] studied
generated
duplicates
is undecidable
for ensuring
that no duplicates
are
the issue of multiset
queries in SQL
and
nested
showed
using
the
how
magic
selection-pushing
sets technique.
into
ered the semantics
of general
SQL queries.
The area of fuzzy sets has been studied
work by
programs
sets and
1988].
9. SUMMARY
We have
multisets,
reasoning
with
expert
recently
following
the
proposed
an extension
that is closely related
systems
[Parsaye
and
considseminal
of logic
to fuzzy
Chignell
AND FUTURE WORK
generalized
and other
the notion
refinements
examined
the problem
types of label systems.
deal
in
can be accomplished
et al. [1991]
extensively
Zadeh
[1965].
Van Emden
[ 1986]
to deal with quantitative
deduction
uncertainty
queries
Final ly, Negri
relations
of a relational
to the concept
database
to cover fuzzy sets,
of a relation
as a set. We have
of conjunctive
query containment
for
Specifically,
we have shown that earlier
as sets
are generalized
for
all
label
two important
theorems
that
systems
of type
A,
which
include
sets and fuzzy sets as special
cases. We have also provided
sufficient
conditions,
and in special cases necessary
and sufficient
conditions,
for label systems of type B, which include
relations
as multisets,
an important special case due to its significance
in SQL.
problem
of containment
of sets of conjunctive
earlier
set-based
results
for all label
systems
We have
queries,
of type
also addressed
the
generalized
again
A and proving
undecid-
ability
for label systems of type B.
An interesting
open problem
is that of containment
of individual
conjunctive queries for type B systems. We have presented
a necessary
and sufficient
condition
for queries
with no repeated
predicates
and a sufficient
condition
for the general
case. Is there a general
necessary
and sufficient
condition?
Is
the problem
decidable?
(Note that Chaludhuri
and Vardi [1993] established
a
lower
bound
for this problem,
but not an upper
bound.)
Given
that
the
problem
is decidable
for both individual
queries
and sets of queries
over
relations
ACM
as
TransactIons
sets
and
on Database
our
undecidability
Systems,
Vol
20, No
result
3, September
for
1995
sets
of
queries
over
Containment
multi sets, this raises
individual
queries.
There
are several
include
dealing
formalizing
ties
the possibility
a label
the techniques
of query
additional
areas
recursive
queries
with
system
and addressing
that
that
developed
the problem
that
require
the
containment
in this
work
further
even for
exploration.
traditional
query
323
.
of deductive
problem
into
Queries
is undecidable
in the context
captures
the query
of Conjunctive
These
databases,
notion
of probabili-
for it, and incorporating
optimizers
that
face issues
containment.
REFERENCES
AHo,
A., SAGIV, Y., AND ULLMAN,
Comput.
8, 2 (May),
BLAKELEY,
J. A.,
views.
In
LARSON,
Proceedings
(Washington,
P. A.,
of the
CHANDRA, A. K., AND MERLIN,
data
Computing
bases.
(Boulder,
CHAUDHURI,
with
neering
DAVIS, M.
DAYAL,
over
1993.
Calif.,
In
GRANT,
ACM,
J.,
Mar.).
New
Nicolas,
Eds.,
In
York,
(Apr.),
J.
updating
materialized
Management
of Data
of conjunctive
ACM
Symposium
queries
in
Theory
of
on
of real
DC.,
conjunctive
June).
queries.
ACM,
S., AND SHIM,
New
K.
1995.
International
In
York,
Proceedings
59–70.
Optimizing
Conference
queries
on Data
Engi-
190–200.
Dover
1982.
An
of the
Publications,
extended
New
relational
1st ACM–PODS
York.
algebra
Conference
with
(Los
control
Angeles,
117-123.
1981.
Advances
Press,
May.
Optimization
New
York,
1991.
in
deductive
Base Tlzeor.y, Vol.
m Data
and
conventional
1, J. Minker.
relational
H. Gallawe,
and J. M.
195-234.
Towards
and algebraic
theory
of recursion.
J. ACM
38, 2
329-381.
JOHNSON, D. S., AND KLUG,
tional
and
Angeles,
KING, J.
inclusion
Calif.,
J.
MAHER,
A.
1982.
ACM,
Quist:
New
I.
S.,
deductive
Symposwm
PIRAHESH,
databases.
Australia,
Aug.).
M.,
In
union
SELLIS,
and difference
T, K.
SHENOY,
S.,
queries
1st ACM–PODS
under
func-
Conference
(Los
164-169.
query
optimization
Conference
1989.
(Cannes,
of the
R.
16th
Aug
of logic
(Cleveland,
AND RAMAKRISHNAN,
in relational
France,
Di2j6 vu in fixpoints
Programmmg
G., AND SBATTELLA,
Syst.
16, 3 (Sept.),
M.
AND YANNAKAKIS,
Conference
R.
Proceedings
PARSAYE, K., AND CHIGNELL,
Y.,
VLDB
of conjunctive
of the
Ohio,
1990.
databases.
programs.
Oct.
In
Proceed-
1), 963–980.
Duplicates
and
VLDB
Con&ence
International
In
). Pp. 510-517
aggregates
in
(Brisbane,
264-277.
PELAGATTI,
Database
pp
for semantic
on Logic
H.,
containment
Proceedings
York,
A system
M. J., AND RAMAKRISHNAN,
Trans.
In
of the 7th International
of the ALP
MUIVIICK,
Testing
dependencies.
Mar.).
1981.
Proceedings
SAGIV,
SIAM
77-90
Unsolc]abJ@.
communication,
J.
Plenum
York,
York,
Proceedings
IOANNIDIS, Y. E., AND WONG, E.
NEGRI,
New
and
Personal
systems.
implementation
Annual
9th
of the IEEE
IEEE,
In
AND MINKDR,
database
Etllciently
the
R., POTAMIANOS,
Computabdity
1995.
expressions.
Lyon feren ce on the
(Washington,
Proceedings
elimination.
Mar.).
1986.
Optimization
Conference
Taiwan,
duplicate
relational
Optimal
New
ACM,
U., GOODMAN, N., AND KATZ, R. H.
GRAEFE, G.
ings
1977.
May)
Colo.,
views.
1982.
F. W.
of
S., KRISHNAMURTHY,
(Taipei,
among
ACM–SIGMOD
proceedings
S., AND VARDI, M.
materialized
Equivalence
AND TOMPA,
1986
P. M.
In
of the 12th ACM–PODS
CHAUDHURI,
1979.
ACM, New York, 61-71.
D. C., May).
relational
J.
218-246.
1986,
1988.
M.
Global
query
AND OZSOYOGLU,
Proceedings
of the
1987
cisco, Calif.,
May).
ACM,
M.
ACM
York,
semantics
for Experts.
among
27, 4 (Oct.),
of SQL
queries.
ACM
In
1987.
A
Proceedings
D. C., May).
system
Conference
Wiley,
relational
New
York.
expressions
with
the
633–655.
(Washington,
ACM–SIGMOD
New
Systems
optimization.
of Data
Z.
Formal
Eqmvalences
J. ACM
on the Management
1991.
Expert
1980.
operators.
L.
513-534.
for
of the
ACM,
semantic
1986
New
query
on the Management
ACM-SIGMOD
York,
191-205.
optimization.
of Data
(San
In
Fran-
181–195.
Transactions
on Database
Systems,
Vol
20, No 3, September
1995.
324
.
Y, E. Ioannidis
M,, ANDIOANNIDIS,Y.
‘1’S.4TALOS, O., SOLOMON,
data
independence.
Chile,
M. H.
1 (Jan,),
37-53,
ZArYJFI, L.
1965.
ACM
In
Proceechizgs
1986.
Quantitative
nf the
1994.
20th
The
grnap:
Intw-natmnal
A versatile
VLDB
tool
for
C’on ferenw
phwcal
(Santiago,
Sept ), 367-378
VAN EMIXN.
Received
and R. Ramakrishnan
January
Transactions
Fuzzy
1992;
sets
revised
on Database
Inf.
deduction
Control
June
Systems,
1993;
Vol
8.3
and Its fixpoint
(,June),
August
20, No
theory
J. Logrc
Program
338-353.
1994,
and
3, September
May
1995;
1995
accepted
May
1995
3,