Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Concurrency control wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational algebra wikipedia , lookup
Containment of Conjunctive Beyond Relations as Sets YANNIS E. IOANNIDIS University of Wisconsin Conjwzctiue queries languages of such many tion. however, as multisets Thus, in order necessary In about we associated label. This that or “false” holds for label systems of a type all and that as part views, query optimiza- over sets (and equivalence) of tuples, has been shown In fact, relations for systems between The work viewing of to be of Y, E. Ioannidis Informix. The work Grant of R Young to make digitaI/hard by the was conditions. For and queries fuzzy sets. We class of Finally, we show of the first the motivates National partially and Engmeermg, Sciences a for of a restricted underscores and by grants Award our that type and fundamental a closer examina- in SQL. supported Investigator and Xerox. Computer the tuple to keep is established these systems and an are traditional that of conjunctive as multlsets. result has is essentially a result set-relations as multmets, IRI-9157368 in Science Tandem, address and This importance Ramakrishnan Fellowship a Presldentlal them was partially Investigator Foundation as sets given system satisfy for label it is that model In order label Once tuple relations also for containment relations type each relation). each traditional queries, are interpreted (meaning for containment is decidable second which “true” that are requests be generahzed. we can tuples. condition abstracts queries of the where of relatlons the of SQL us to consider case, with explicit relations in be either condition both that must is not in the systems, sufficient relations as multisets, Young of label system undecidable of relatlons tuple after class in which allows a tuple and sufficient of conjunctwe tion the only databases a special with abstracts of unions label sets. As that and be posed. of a large over mterpretations necessary for a label may queries, can be associated a necessary a different queries from DEC, Authors’ query arises rnateriahzed global queries ehmmated of a database fuzzy that for of tuples as a set of tuples associated containment under using and on set mcluslon being queries a variety it system, we present and based of conjunctive (meaning we consider example, Packard queries sets to sets. Containment multzsets notion are a label Presidential conjunctive from of a relation the label on the labels difference examined duplicates conjunctive set of conditions conjunctive example, optimization, containment/equivalence generalized relatlons general, also present has over with The view study by making is in the relation) results for of such query defined a generalization of tuples: and queries to reason paper set-relatlons topic and are at the core of relational equivalence) optimization, semantic naturally default, this mcsltzsets (and as a function by to consider as multmets query on the has been database problem. in SQL, treated a relational queries, can be viewed queries an NP-complete Even work RAMAKRISHNAN for containment of nested formal each query conjunctive over Testing features correlated Earlier where are queries as SQL. advanced processing and RAGHU Queries: and supported Grant Foundation DEC, by by the National under Department, Science from IBM, a David Unlvermty and Science IRI-9011563, under HP, AT&T, and of Wisccms]n, Lucde Foundation by grants Madison, WI 53706. Permission granted without fee provided advantage, the given copying that copyright copy of part that notice, TransactIons the is by permmsion servers, or to redistribute to lists, G 1995 ACM 0362-5915/95/0900–0288 ACM copies on Database are title or all of thm not of the of ACM, requires Systems, prior $03.50 Vol made work publication, Inc. To copy specific 20, No for personal or distributed and its otherwise, permission 3, September for date or classroom profit appear, to republish, and/or a fee 1995, Pages 288-324 use IS or commercial and notice is to post on Containment of Conjunctive Queries 289 . Categories and Subject Descriptors: l?,m [Theory of Computation]: Miscellaneous; H.1.l [Models and Principles]: Systems and Information Theory; H.2.3 [Database Management]: H.2.4 [Database Management]: Systems—query processing Languages—query languages; General Terms: Algorithms, Languages, Theory Additional Key equivalence, Words query and Phrases: Conjunctive queries, multisets, query containment and containment and optimization 1. INTRODUCTION The problem conjunctive and Merlin of syntactically queries [1977] characterizing was addressed in the late and by the tableau work pioneering papers, conjunctive t~ples) to sets, and containment queries were was naturally Even over in SQL, relations nated are only of a large conjunctive Our view tn this however, treated after queries as multisets explicit requests. class of SQL queries, queries, in which multisets by of tuples with in order it is necessary relations to consider elimi- equivalence a generalization as multisets of of tuples: in which each 1991]. This generalized that are fuzzy sets or We develop results on different label systems, that can be associated with tuples. our work, and describe our results significance. 1.-1 Motivation for Generalized The unit which being about of a relation as a set of types must be generalized. paper we study conjunctive queries over databases tuples has an associated label [Ioannidis and Wong notion of a database allows us to consider relations multisets, in addition to the case of relations as sets. the decidability of conjunctive query containment for and their may be posed. In fact, duplicates to reason are interpreted that is, different conditions on the labels In the rest of this section, we motivate of seen as functions from sets (of defined based on set inclusion. default, Thus, equivalence 1970s by the work of Chandra of Aho et al. [1979]. In these of optimization corresponds closely Conjunctive Queries in current relational optimizers is an SQL block, to a conjunctive query. The basic idea behind most query optimizers is to examine a collection of equivalent evaluation plans for a given query and to use the plan with the least estimated cost. In addition, many advanced optimization techniques require an earlier, rewriting, step, where declarative the opportunities queries are rewritten into equivalent ones, thus expanding for optimization at the lower, algebraic (plan-based), level. The rewriting is often based on comparing the original query with (possibly pieces of) other queries with respect to equivalence or containment. As a first example of such advanced query optimization techniques, consider the use of materialized views. In this case, a given query is compared against multiple view definitions (which may be seen as queries as well) to identify which of them may be combined to answer the query. This comparison is essentially testing for query containment. Supporting materialized ACM Transactions on Database Systems, Vol. 20, No. 3, September 1995. 290 Y . views to E. Ioannidls process [Blakeley et al. and and R. Ramakrlshnan optimize 1986; queries Chaudhuri et designers example, are considering offering Microsoft is contemplating in system their problem of [Graefe processing 1995]. is al. the topic 1995; of much Tsatalos et recent al. this option in their future the use of GMAPs [Tsatalos As correlated a second nested example, SQL and systems; for et al. 1994] consider queries, work 1994], the where related one of the alternatives is to process a more general nested query whose result can then be used to process the outer query. Figuring out what the appropriate more general TPC-D nested query is involves query containment. With the release of the benchmark, in the last year or so much research has started on such transformations As a third system tries queries) in the context example, of processing consider to figure can be used out semantic if integrity to modify and optimizing query constraints a given query aggregate queries. In this case, the optimization. into (which may be seen a semantically as equivalent one that may be more efficiently executed [King 1981; Shenoy and Ozsoyoglu 1987]. This is only possible if the antecedent of the integrity constraint, seen is contained in the given query. As a final example, in as a query, multiple/global query optimization, several queries are compared pairwise to identify equivalent common subqueries (subexpressions). This gives us the option overall of evaluating such subexpressions only once, hopefully resulting more efficient combination of plans [Grant and Minker 1981; 1986]. Again, common examples, it equivalence tion these comparisons subexpressions of many should involve contain become of declarative advanced each clear queries query query of the that the are central full containment queries. questions tests, From all in an Sellis as the of these of containment to the design and and implementa- optimizers. Currently, the formal results on conjunctive query equivalence and containment apply only when we regard a relation as a set of tuples [Chandra and Merlin 1977; Sagiv and Yannakakis 1980]. As mentioned, however, the defaz~lt in SQL (unless distinct is used) is that duplicates are not eliminated, and the reason cardinal rigorously ities of the about tuples in the answer containment/equivalence are significant. Hence, in the of current context to SQL query optimizers, we must consider conjunctive queries over relations that are multisets of tuples. This is a major motivation for the results presented in this paper, which provide a theoretical foundation for conjunctive query equivalence and containment over relations that are multisets. Moreover, these results are cast in a more general framework, and in fact deal with relations as fuzzy sets and with the traditional case of relations as sets as well. Our results demonstrate that the containment problems for sets and multisets are quite different in nature. Thus, we believe that they represent a necessary complement to the earlier results on conjunctive query containment [Chandra and Merlin 1977; Sagiv and Yannakakis 1980], which are widely recognized as foundational from a theoretical standpoint. Next, we present two examples, one from view materialization and from multiple query optimization, illustrating the query containment/equivalence problem and the significance of addressing it when relations treated as multisets. ACM Transactmns on Database Systems, Vol 20, No 3, September 1995 one are Containment 1.1. Consider Example create select from create select from The the following of Conjunctive SQL view Queries 291 . definitions: view EMPHIWl(ename, hours) as distinct. e.ename, e.hours EMP e, view EMPHRS2(ename, hours) as e.ename, e.hours EMP e. database relation EIW? has columns “ename,” “dname,” and “hours”; intuitively, each employee works in one or more departments for a certain number of hours. Users write queries that refer to the views EMPHRS 1 and EMPHRS2, and we wish to evaluate these queries efficiently. If queries on EMPHRSI are common, Now, a query given 1 to answer of IEMPHRS tant the system on EMPHRS2, it? Such in an environment might when choose an optimization containing to materialize this can we use the materialized large becomes numbers especially of complex, view. version impor- related view definitions, as is the case for instance in decision support applications. TO answer this question, we must determine when the two view definitions are equivalent. If EMPHRS1 they are equivalent. However, and EMPHRS2 if we take into are regarded as sets of tuples, account the number of copies of each tuple, they are different. For example, an employee hours each in the Toy Department and the Sales Department, identical tuples in EMPHRS2, but Consider the following query: only one tuple could work five leading to two in EMPHRS1. select ename, max(hours) from EMPHRS2. The difference between EMPHRS1 and EMPHRS2 is not important query, and we can use a materialized version of EMPHRS1, in place of the reference to EMPHRS2. On the other hand, max(hours) carmot rectly, different The is replaced by sum(hours), the difference becomes substitute EMPHRS 1 for EMPHRS2. To make the SQL optimizer must recognize that the two under next which, central example again, role. Example database, multiset 1.2. which for this if it is available, if the aggregate crucial, and we this distinction view definitions corare semantics. illustrates equivalence Consider includes a more of queries the complex over relational the following optimization relations schema three of scenario in as multi sets plays a large corporation’s a relations: MANUF( mrzo, location,... ) TRANSP(tno, location,... ) WAREFIS(zorzo, location,.. . ). is the city where the manuAttributes in italic are primary keys, “location” facturing plant, the warehouse, or the transportation unit’s headquarters are located, and other attributes are omitted for simplicity. Assume that, in a ACM Transactions on Database Systems, Vol. 20, No. 3, September 1995 292 Y E Ioannldis . and R i3amakrishnan downsizing effort of the company, the following three SQL queries optimize and then Ql: execute them together: selectm.mno, count(m.location) fmm m, MANUF vvhere to decide relocations of’ various operations, are submitted, and the system tries to TRANSP m.location t = t.locatlon @oup-by m.mno, select w.wno, count(w.location) Q2: from WAREHS where W, TRANSP w.location group-by t = t.location w.wno, SAX% m.mno, w.wno, count(”) from MANUF m, WAREHS w, TRANSP where m.location = t.location and w.location = t.location group-by m.mno, w.wno. Q3: QI requests manufacturing t the number of transportation units based on the plant. Q2 requests the number of transportation city of each units based on the city of each warehouse. Finally, Q3 requests the number of transportation units based on the city of each pair of colocated manufacturing plant and warehouse. Most relational gregate-free sary data queries SQ1: SQ2: SQ3: systems subqueries and then respectively process aggregate queries to generate temporary relations by performing corresponding the to Ql, appropriate by first processing containing aggregations. ag- the necesThe sub- Q2, and Q3 are as follows: select m.mno, m.locations from MANUF m, TRANSP t where m.location = t.location select w.wno, w.location from WAREHS w, TRANSP t where w.location = t.location select m.mno, w.wno from MANUF m, WAREHS w, TRANSP where m.location = t.location and w.location = t.location, t In the above SQL queries, it is natural to consider the relations named from clause to be sets of tuples. However, since the keyword distinct in the is not used in their formulation the resulting relations are m uztisets. For example, if there are three transportation units in San Francisco, which is the city of warehouse #34, there are three copies of the tuple (#34, San Francisco) in the result of SQ2. Retention of duplicates is critical in this case, since it enables us to count these tuples on a per-warehouse basis and to generate the information requested by Q2. Instead of directly processing SQ3, a seemingly plausible alternative for a multiple query optimizer is to join the results of SQ1 and SQ2 on “location,” thus reducing the number of required join operations. The query correspond- ACM Transactions on Database Systems, Vol 20, No 3, September 1995 Containment ing to this approach SQ4: of Conjunchve that 293 . is the following: select m.mno, w.wno from iVMNUF m, WAREHS w, TRANSP where m.location = t.location and w.location = t’.location and m.location = w.location. It is clear Queries SQ3 and SQ4 generate the t, TRANSP exact same t’ sets of (mno, wno) pairs, that is, pairs of colocated manufacturing plants and warehouses with at least a transportation unit based in the same city. (This may also be formally proved by applying the results mentioned earlier [Chandra and Merlin 1977].) Nevertheless, executing SQ4 instead of SQ3 is wrong in this case, because be retained, each the and (mno, wno) many times subsequent aggregations SQ3 and pair. as there SQ4 generate Specifically, require different SQ3 that generates are transportation units the duplicates numbers each based must of duplicates (mno, wno) on the city of pair of the as pair, whereas SQ4 generates each (mno, wno) pair as many times as there are pairs of transportation units based on the city of the pair, a rather meaningin SQ4 when less result. Viewed as conjunctive queries, SQ3 is contained relations are treated as multisets, but not vice versa, whereas they are equivalent when relations are treated as sets. Thus, ent depending on how relations are treated. If this systems may Example t?W_lCe 1.2 illustrates of the presented query optimization compare the results. scenarios queries are treated interpretations in this end up producing with main results. motivation As we have depends respect as multisets. of relations, erroneous these notions are differdistinction is not made, for this work seen, correctness on the to containment existence and realistic of techniques or equivalence Similar needs also for example, fuzzy the impor- of many when that relations arise for a variety of other sets, which are also included study. In developing our results, we have chosen to model generalized relations as sets of tuples with associated labels [Ioannidis and Wong 1991]. As an example, a positive integer multiplicity (or number of copies, intuitively) is associated with every tuple in a multiset. As noted, current relational query languages such as SQL support a multiset to be extremely useful fuzzy sets, arbitrary itively) The is associated with every tuple in a fuzzy set. conditions under which our results are applicable where an of the algebraic properties Although it is true that without the abstraction tant advantages. First, for many semantics, found number common in and queries. [0, 1] (or this Another certainty has been example factor, are stated is intu- in terms of the set of labels and associated label operations. the results for, say, multisets could be obtained of label systems, the framework has several imporour results have more generality; for example, a single label system, type A, defined later, covers both fuzzy sets and traditio nal sets, and thus, our results about type A systems apply to both. Without the abstraction of labels and label systems, the same results would have to be ACM TransactIons on Database Systems. Vol 20, No. 3, September 1995. 294 Y. E, Ioannldis and R. Ramakrishnan . established independently although essential for relations as sets and for relations the essence of the proofs is the same. Second, properties upon which our proofs depend, the as fuzzy sets, by abstracting the framework of label systems allows us to understand more fully the basic differences between queries over different kinds of relations. Finally, it is possible that our results are applicable to other kinds of relational extensions to the label systems that we study; the formulation label systems makes it easy to extend our results, show that the given label system, new kind of relations satisfies with properties similar of our results in terms since it is only necessary the (simple) conditions of to for a 1.2 Results Our main results are the following: First, we show that containment of both individual conjunctive queries and unions of them is decidable but NP-complete for a label system that generalizes relations as sets and relations as fuzzy sets. For the specific case of relations as sets, these results had been shown earlier 1980]. Second, undecidable individual as well [Chandra we show for a label conjunctive conditions for that and Merlin containment system queries containment, that within which 1977; Sagiv of unions captures that are label and relations system, necessary Yannakakis of conjunctive queries we present and Recently, Chaudhuri and Vardi some of our results. 1.3 Outline [ 1993] studied (We discuss this multisets further For sufficient sufficient when queries are in a restricted (but very common) form. Decidability ment in the general case, however, remains an open problem. discovered is as multisets. the of contain- and independently in Section 8.) of the Paper The paper is organized as follows: In Section 2 we present some background material and also develop our generalized notion of a database, which we consider to be one of our important contributions. We introduce label systems and identify two types of label systems (types A and B), for which results are presented in this paper. In Section 3 we establish a general necessary condition for conjunctive query containment that is satisfied by a large class of label systems, including type A and type B systems. In Section 4 we show that the necessary condition identified in Section 3 is also sufficient for conjunctive query containment over databases with label systems of type A. This result strictly generalizes the condition of Chandra and Merlin [1977] sets. In Section 5 we and is also applicable to databases that deal with fuzzy present a systems of type are as no sufficient repeated sufficient. condition B. For predicates, These for restricted we results containment classes prove of that are applicable over conjunctive this databases queries condition to databases is that in with label which there necessary deal with as well multi- sets. In Section 6 we study unions of conjunctive queries. For label systems of type A, we present a necessary and sufficient condition for containment of unions of conjunctive queries, which generalizes a similar result for sets [Sagiv and Yannakakis 1980]. For label systems of type B, we prove that the ACM Transactions on Database Systems, Vol. 20, No 3, September 1995 Containment problem is, in generaI, concerning a class reasoning used in expert are motivated systems 9 we outline summarize In Section systems by graph-theory 8. In Section that like several 7 we establish captures MYCIN, problems. the as well We discuss interesting Queries some form directions results of fuzzy-set as label related 295 . systems work that in Section for future work and our results. 2. BACKGROUND In this undecidable. of label of Conjunctwe section databases we review and systems, AND BASIC DEFINITIONS databases, the usual some conjunctive standard queries. concepts In and conjunctive and particular, query develop the containment our model definitions of of label are generalizations of definitions. 2.1 Label Systems We first introduce label systems, of conjunctive queries. Intuitively, by attaching factors a label each tuple. to or multiplicity, which are at the heart the ideal is to extend for example. Labels of our generalization the relational model can be used In our framework, to capture labels certainty are drawn from a special domain, and their usage must conform to certain rules that enable us to interpret labels using an appropriate semantics. It is possible to place different sets of rules upon labels, leading to different semantics and to differences queries. in the difficulty In this by a small paper collection Definition 2.1. of deciding issues we study two types of algebraic rules. A label system 9 like containment of label systems, is a quintuple 1% z of conjunctive each characterized (L, *, O, < ) such +, that L is a domain * is a binary of at least operation two labels (called equipped with multiplication) a partial on L that order < ; is associative and commutative; + is a binary commutative; O is the additive multiplication thatis, operation and (called addition) on L a+ O=a, a*O =0, (Al) associative 2.2. A label system 5?’= and respect to order < , and O<a. When a s b and a # b, then we use the notation system studied in this paper is presented here: satisfies is identity in L and is also an annihilator with and the least element with respect to the partial Va~L, Definition that (L, a < b. The *, +, O, < ) is of first type A label if it the following: b’a, b~L-{O}, O<a*b (A2)Va GL, (A3)Va, a’, b, b’~L, (A4)Va, b~L, <a. a*a=a. a+ (a<a’andb b<aora+b< ACM Transactions <b’) =a+b<a’+ b’. b. cm Database Systems, Vol 20, No 3, September 1995 296 Y. E, Ioannidis . Note that conditions and R, Ramaknshnan condition (A2) states that multiplication (A3) and (A4) are equivalent to stating that is idempotent. Also, < is a total order and that + is equal to max with respect to that total order. We chose the given presentation to make the analogy with condition (B2) of label systems of type B clearer. Example important 2.1. Binary truth values as well as certainty factors examples of label systems of type A. For the case of truth L = {“false”, “true”}. or. The label the annihilator is easy The operation * is logical “false” corresponds for multiplication. to verify satisfied. For certainty The operation that all factors, * is min, order the operation + is logical to O and serves as the additive Finally, < is such that “false” of the conditions for a type identity < “true.” A label system of It are L is equal to the set of real numbers between O and 1. and the operation + is max. (This is the case, e.g., in the system proposed by van Emden additive identity and the multiplicative total and, are two values, on numbers. Again, all [ 1986].) The annihilator. conditions for number Finally, a type O serves as the < is the regular A label system are satisfied. Definition 2.3. fies the following. A label system L?= (L, The second label system *,+ ,0, < ) is of type B if it satisstudied in this paper is presented next: (Bl)Va, b~L–{O}, a<a*b. (B2)Va, a’, b, b’=L, (a<a’andb (B3) <b’)+ a+ b<a’+ most important Example 2.2. Specifically, and + Arithmetic L is equal are the The number to the usual O serves is the set of nonnegative multiplication and as the additive identity Relation Instances with Respect In order to deal with requires that a relation presenting the formal multiple instance definition, label integers. addition to a Label system The of numbers, of type B. operations * respectively. and the multiplicative tor. Finally, < is the usual total order on numbers. that all of the conditions for a type B label system 2.2 b’. Va @ L, ~a’ G L, a < a’. annihila- Again, it is easy to verify are satisfied. System interpretations of relations, our treatment is defined in a new, generalized way. Before we illustrate it with an example from a traditional SQL-based context to provide intuition. After the definition, additional examples focus on aspects of the definition itself. For the purposes of this work, a relation instance is defined as a function from all possible tuples that could be formed from the domains of the relation’s attributes to the labels of a system according to which the relation is interpreted. For example, consider the MANUF relation of Example 1.2 interpreted as a multiset. Assuming that there are three manufacturing plants ACM #1, #7, TransactIons and #9 located, on Database Systems, respectively, in Boston, Vol. 20, No, 3, September 1995 San Francisco, and Containment Boston, the corresponding the following MANUF of Conjunctive instance would (#1, San Francisco) (#9, San Francisco) Boston units, as (#l, (#7, (#9, indicating tuple the ~ units and San Francisco has 5 SQ1 would be formally specified as Boston) ~ 3, ~ O, Boston) -0, ~ 5, Boston) ~ 3, San Francisco) is associated number 1, + O. San Francisco) (#9, is, each possible -+ 1, San Francisco) (#7, 1, ~ O, Boston) has 3 transportation the result of query (#1, formalized specified A O, Boston) (#7, (#9, Similarly, if transportation follows: ~ San Francisco) (#7, plicity, be formally 297 . function: (# I, Boston) That Queries -0. with of times it a nonnegative appears in the integer relation. multiThis is in the following: Definition the domains 2.4. Let Q be an n-ary predicate symbol, and let .DI,... , D. be of values of the arguments of Q. Predicate symbols over the sa,me cross product of domains are called compatible. Also, let 2? be a label system with domain L. A relation instance for Q with respect to 2 is a total instances for compatible predicate functionl Q: DI X . . . x D. ~ L. Relation symbols are called compatible. A database instance with respect to a label sy~tem & x . . . X D. l:$i is a set is called <n. Example the of relation instances. Any element of the domain DI a tuple and is denoted by (dl,. . . . dn), for d, ● D,, first 2.3. label The traditional system of type view of relations A given in as sets is captured Example 2.1. Usually, through a relation instance is compactly represented as the subset of its domains containing the Fuzzy sets may be captured through the second tulples that map to “true.” label system of type A given in Example 2.1, where each possible tuple is mapped to a certainty factor between O and 1. Finally, multisets may be calptured through the label system of type B given in Example 2.2. In this case, as illustrated earlier, each tuple is mapped to a nonnegative integer, which indicates the number of copies of the given tuple in the multiset. lFunctions that denote relation instances appear ACM Transactions in boldface. on Database Systems, Vol. 20, No. 3, September 1995 298 Y E Ioannldis . In the that it sequel, is with and whenever respect R, Ramakrishnan we refer to a database to a given label instance system. One it is understood could argue equivalent, and perhaps more natural, formulation of different tions of relations is obtained by adding an explicit column (field) the label assigned to each tuple. The resulting extended relations be treated as regular sets. For example, instead of using atomic that an interpretathat stores would then formulas of the form Q(tl, t2,....tn),one could use Q(tl, t2,....tn,1),where the domains unchanged and the domain of 1 is L. In fact, this alternative of t, remain formulation would instances clear based shortly, junctive explicit queries status and most likely be on nontraditional representation is not must used enough. be treated in as the label systems. of the The label Conjunctive Conjunctive mentioned them. additional column a special way associated with it. Otherwise, wrong results The formalism of label systems, introduced semantics of labels explicit, and is therefore 2.3 basis for However, relation be made label must that storing as will column be given respects the in cona special semantics will be obtained in several cases. in this section, makes the special adopted for the rest of the paper. Queries queries are formally defined in this subsection. We have already that SQL blocks, that is, fZat SQL queries, correspond closely to Specifically, conjunctive queries are essentially logic-based specifica- tions of such SQL queries, whose syntax is more natural to deal with in formal work, as earlier studies of containment and equivalence have demonstrated. For example, consider the SQ1 query, which The equivalent logic-based syntax for this query is MANUF(mno, 10C) ~ TRANSP(tno, The join on location MANUF and TRANSP is captured 10C) + is expressed SQlresult(mno, by the use of the common predicates, whereas in flat SQL. 10C). variable 10C in the of mno and 10C on the appearance the right-hand side of ~ require a predicate name indicates the query projections. Conjunctive queries for the result, which was arbitrarily chosen here to be SQlresult. syntax Definition Al ~Az The formal 2.5. of conjunctive queries A conjunctive query is a first-order t“ ~ . . . ~ Am ~ C. All of the variables appearing is defined formula in the as follows: of the formula form are (implicitly) universally quantified. The formula to the left of is called the Each one of antecedent, and that to the right of ~ is called the consequent. formula of the form Q(tl, tz, . . . . t.), where Q Am is an atomic C, AI, A,,..., The predicate is a relation (predicate) symbol and t,,1 < i < n, is a variable. symbol in the consequent does not appear in the antecedent (i.e., recursion is distinnot allowed). Variables that appear in the consequent are called guished and must appear in the antecedent as well, while all others are called nondistinguished. Finally a conjunctive query has an associated unary function f: M e M, for some set M equipped with a partial order < such that 30 ~ M, ‘da ● M – {O}, O < a, and O <f(a). ACM TransactIons on Database Systems, Vol 20, No 3, September 1995 Containment Based on Definition It obtains 2.5, a conjunctive semantics only system and processed to the same systems. when based conjunctive The only query on that. query Several by conjunctive query associated with Va = L, f(a) ~ L. The role of the distinct using it construct. on a particular semantics based a label query on different system without label can be given label L? to interpret a function f: M ~ M is that function f will be clear when queries. a conjunctive 299 . syntactic based interpreting for Queries is a purely it is interpreted requirements valuations of conjunctive ‘Whenever we present of Conjunctive a L c M and we discuss an associated function f, then it is assumed that ~ is the identity function. Also, we use the convention that the atomic formula in the consequent of a conjunctive query always has the distinguished predicate symbol P, possibly subscripted with an indicator of the specific atomic conjunctive formulas are used to refer 2.4 The Rewlt To define we first Such the to those formula otherwise predicates noted, the phrases of a conjunctive query Query a conjunctive the essence inference, in the unless and in the antecedent. of applying to capture a single-tuple atomic Finally, query of a Conjunctive result have query. of a conjunctive or derivation, antecedent query of inferring to a database a single is realized of a conjunctive tuple instance, in that result. by instantiating query with each a single tuple. For the purposes of this paper, such a derivation corresponds to the traditional combination of tuples (one from each participating relation) enhanced with a multiplication of their labels, which results in the label assigned to the derived tuple. For instance, consider query SQ 1 of Example 1.2 with its relations interpreted as multisets and instantiated as mentioned earlier. Combining the (#1, Boston) tuple of MANUF, with, say, the (#24, Boston) tuple of TRANSP, which the result tuple (#1, Boston) with label 1, capturing transportation ton) tuple of TRANSP, unit in the city of plant of MANUF, which generates the result #r/ is not even in Boston. Valuations of conjunctive single-tuple definition, A is from 2.6. examples Consider O, with tuple queries in the additional Definition ... A derivations #1. Similarly, is labeled focus combining the above (#7, Boston) are query used result a conjunctive with as the and on aspects which 1, tuple label O, since plant formal of Bos- Boston) of the definition a the (#7, (#24, are defined query is labeled is labeled 1, generates the existence of some the mechanism for next. the After itself. form Al A Az Am ~ C. A valuation 0 of a is a pair of functions ( @u,01). Function @v the variables of a to some set of constants, and function 91 is from the atomic formulas of a to labels such that 0Jc)=f(1cP4 Applying 0 on a gives an instance Example 2.4. To illustrate intuition when dealing with of a. Definition 2.6 and the fact that it captures the familiar label systems, we discuss the cases of ACM Transactions on llatabase Systems, Vol. 20, No. 3, September 1995 300 . Table I ConJunctzue Y E. Ioannldls and TWO Valuatlons d and query H’ In a Label ztem System that Interprets Relatlons e as Sets (i’ .! o,(x) 6:(x) = K z HC,(Z)=L til:(z) = L -v H,,(y) = N o;,(y) = N Q(,r, z) R(z,.v) relations tive R. Ramaknshnan = K til( Q( x, z)) =“true” fi~(Q(x, z)) =“false” Ol(R(z. O~(R(z, y)) as sets and relations y)) =“true” as multisets. Consider =“true” the following conjunc- query: Q(x, Assume that the {K, L, M, N}. Consider the domain label system in Example Table I. Valuation system, system tuple since, dl applied til(P(x, On the other of all y) argament that y). +~(X, positions interprets of all relations relations as sets is D = (first label 2.1), and let the two valuations o and 6’ be as shown in O represents the case where tuple (K, L) is in Q and tuple (L, N) is in R. Then, captures this intuition set label A~(z, z) hand, y)) (K, N) should be in the result. Definition 2.6 based on the definition of multiplication in the on the consequent = Ol(Q(x, valuation z)) and (1’ represents y)) = Ol(Q(x, Ol(R(z, y)) .z)) and query yields =“true”. the case where in Q and tuple (L, N) is in R, and therefore, result. Again, Definition 2.6 captures this consequent of the conjunctive query yields OI(P(X, of the conjunctive tuple (K, L) is not tuple (K, N) should not be in the intuition since (I1 applied on the Ol(R(z, y)) =“false”. We now turn our attention to the label system that interprets multisets. Consider the two valuations o and d’ shown in Table relations as II. Valuation 0 represents the case where tuple (K, L) appears in Q twice and tuple (L, N) appears in R five times. Then, tuple (K, N) should appear in the result ten times. Definition 2.6 captures this intuition since, based on the definition of multiplication the conjunctive in the multiset query yields dl(~(~,y)) On the other hand, label = 01( Q(x, valuation system, z))* 01 applied 61( R(z, 0’ represents y)) on the =2*5= the case where consequent of 10. tuple (K, L) does not appear in Q at all, and therefore, independent of the number of times tuple (L, N) appears in R, tuple (K, N) should not be in the result. Again, Definition 2.6 captures this intuition since (ll applied on the consequent of the conjunctive query yields dl(P(.x, ACM ‘I%msactlons y)) on Database = 61( Q(x, Systems, Vol z))* 01( R(z, y)) 20, No, 3, September = O*5 1995 = O. Containment Table II. Conjunctl Two Valuations ve query () and f)’ in a Label item System that Queries Interprets as Multlsets w f),(x) K = e:(x)=K o:(z) = L q:(y) = N z Y Q(x, z) f),(z)=L ~,>(y)= N Ol(Q(X, Z)) = 2 o;(Q(.z, z)) = O R(z, @l(R(z, O;(R(z, y)) y) Definition consequent i < n, are a, b,, respectively. and If, for all Note that in Definition of the same conjunctive equivalence because .v)) = 5 2.7. Consider two conjunctive whose distinguished variables D, respectively. compatible. relation they 1< 301 . Relatlons (1 x 1< of Conjunctive queries a and ,B with compatible in the ith argument position, Let i < n, = 5 /3” and 6;(a, 6 ~ be valuations ) = Ot?(b,), then 6” of a and and o ~ are 2.7 a and ~ do not have to be distinct. For valuations query, it is easy to show that compatibility is an over capture valuations. derivations Compatible of the valuations same tuple. are This important is a key concept needed in defining the result of applying a conjunctive query to a database instance, because the labels assigned to a tuple by compatible valuations need to be combined to determine the label assigned to it in the overall result. Definition to 2.8. A valuation o of a conjunctive a database Q((l(x, instance if, for every . . . . du(xn))) = f31(Q(x1,..., ), We finally query to individual tions tuple move tuple of the in tuple of the This The result earlier. In the of the result. For done by particular, by adding outcome in the overall with respect Q( xl, . . . , x.) of applying is essentially defined a is true formula in a, x~)). are combined derivation. to the tuple of Example mentioned definition instance. derivations same each assigned once for to the a database query atomic separate labels addition example, a conjunctive combining the deriva- assigned is the consider to the final query label SQ 1 1.2 with its relations interpreted as multisets and instantiated as earlier. The (#1, Boston) tuple of its result is derived three times, each of the the (#1, Boston) tuple TRANSP tuples of MANUF. with Each location derivation = Boston assigns combined the label with 1 to the resulting tuple, whose final label is therefore equal to 3, capturing the fact that three transportation units are based in the city of plant #1, as expected. The result of a query is formally defined in Definition 2.9. After the definition, additional Definition A . . . A Am f ~ examples 2.9. ~ and focus Consider a &tabaSe on aspects a conjunctive instance of the definition query 1. Let of a that are true with respect to 1. Partition relation of valuation compatibility, and let ACM TransactIons on Database a @ be of the @ based @~ denote Systems, Vol itself. the set form Al ~ Az of all valuations on the equivalence the partition that 20, No 3, September 1995. 302 Y. E. Ioannidis . Table Domam III Instances ?nenlber and R, Ramaknshnan of Q and R m a Label d System that Interprets Relations R(d) Q(d) (K, L) “true” “false” (K, M) “true” “false” (L, N) “false” “true” (M, N) “false” “true” All “false” “false” Table others IV, Four Valuations that are Compatlble in Generating of the Conjunctive Item o x (+,(x) fJ,(z)=K f),(y) = K til(Q( x, z)) =“false” R(z, 01(R(2, generates y)) tuple a to 1 (denoted =<’false” d:(z) = L o d:(y) = N fl[v(~) = K ()::(z)=M H;!(z) = N d;’’(y) = N e;(y) = N z)) =“true” ,9j(Q(x, z)) =“trUe” o?(Q( #~(R(.z, y)) (3j’(R(z, y)) t?j’’(R(z, =< ’true” variables is a relation instance (,! ~;:(x)=K 6~(Q(x, t in the distinguished by a(l)) (K, N) m the Consequent (i’{ ti:(x)=K = N Y Q( x, z) y) Tuple Query 6’ z as Sets =’<true” of a. The P, that X, z)) =’<false” result y)) =“false” of applying is, a function, such that ‘(t)=,Loz(c)= 2“),UP(A4$ The intuition based behind Definition on the conjunctive query 2.9 is that are grouped a different label to it, and these labels the label of the tuple in the result. all derivations together. are combined Each (through of a single derivation tuple assigns addition) to find Example 2.5. To illustrate Definition 2.9, we discuss the cases of relations as sets and relations as multisets. We use the same conjunctive query as in Example 2.4, Q(x, z) //~(Z, y) +~(X, uV), with the same domain for all argument positions of all relations, D = {K, L, M, N}. Consider the label system that interprets relations as sets, and let the instances of Q and R be as shown in Table III. There are 81 valuations that are true with respect to Table 111 (three variables, each assigned to one of four possible elements in its domain). Four of these valuations are compatible in generating tuple (K, N) in the consequent of the conjunctive query, and are given in Table IV. Valuations (I and 0’” generate the label “false” for tuple (K, N), while is, by taking tuple valuations the logical (K, N). This was 8’ and O“ generate the label “true.” By adding, that or of these four labels, we obtain the label “true” for to be intuitively expected, because tuples (K, L) and (L, N) are in Q and R, respectively, and therefore their composition should be in the result. Similarly, tuple (K, N) is placed in the result ACM ‘J1-ansactlons on Database Systems, Vol 20, No. 3, September 1995 (K, N) as the Containment Table V. Four Valuations Compatible in Generating of the Conjunctive Item o O[(Q(x, dl(l?(z,y) y) ‘lrable VI. Domain z))= ’’false” Instances member 6j(Q(x, )=’’false” z))= of Qand Lebel d z))= fl~(lt(z,y) System that ’’true” )=’’false” Interprets @j’’(Q(x, z))= ’’false” O~(R(z, y))= ’’false” Relations 0 0 5 o o o Four Compatible Valuations 2 0 that Generate the Conjunctive Zte?n o as Multisets 2 3 VII. ,!, o;(x) = K H:(z) = N e~(y) = M R(d) others Table o Q(d) (K, L) (K, M) (IJ,N) (M, N) All @f(Q(x, )=’’false” Rina (K, M) in the Consequent o;(x) = K 6((z) = M e:(y) = M ’’true” tij(fi(z,y) 303 o 0“ o;(x) = K 0;(2) = L o:(y) = M d,,(x) R(z, Tuple Queries Query 6’ = K 0“(2) = K (1,,(y)= M x z Y Q(x, z) of Conjunctwe Tuple(K, N)inthe Consequentof Query 0) Q,!, w x OU(x)= K f);(x) = K o;(x) == K o:(x) z o,)(z) = K o;(z) = L o;(z) = M ~~(z) = N Y Q(x; o,(y) = N f);(y) = N e:(y) = N o:(y) = N Z) dl(Q(x, z)) = O R(z, y) 6’l(R(Z, y)) composition effect since = o of tuples relations 6:(Q(x, z)) = 2 @fl(Q(x, z)) = 3 (Ij’’(Q(z, z)) = O @j(R(z, y)) Oj’(R(z, y)) Oj’’(R(z, y)) (K, M) and = 5 (M, N), are interpreted but of the conjunctive this for tuple query, as given whose in Table (K, M), and therefore, was to be intuitively composition expected, could = 2 multiple derivations = O have no as sets. Now consider the four compatible valuations the ,given database instance and that generate “false” = K generate V. All its final because that tuple are true with respect to (K, M) in the consequent of these label there generate is “false” the label as well. are no tuples in Again, Q and R (K, M). We now turn our attention to the label system that interprets relations as multisets (Example 2.2). Let the instances of Q and R be as shown in Table VI. The four compatible valuations that are true with the instance given in Table VI and that generate tuple (K, N) in the consequent of the conjunctive query are given Definition in Table 2.6, the four VII. Based valuations on the multiplication generate the labels semantics of * and O, 10, 6, and O for tuple (K, N). By adding these four labels, we obtain the label 16 for the tuple, which is exactly the number of copies one would intuitively expect for it. The composition of tuples (K, L) and (L, N) of Q and R, respectively, generates 10 of these copies, and the composition of (K, M) and (M, N) generates 6 more. ACM Transa&,ons on Database Systems,Vol. 20. No. 3, September 1995. 304 . Table Y, E. Ioannldls VIII, Four and R, Ramakrishnan Compatible Valuations that Generate the Coniunctlve Item o Tuple (K, M) in the Consequent tl,,/ e“ (1’ x (9,(x)=K (?:(x) = K 6;(x) z d,,(z) = K d;(z) = L 8;(. #u(.Y) = e;(y) = M M = K (l;:(x) ) = M o::(y) 01’’(y) = M z.) (Ir(Q(X, Z)) = O o;(Q(x, z)) = 2 6j’(Q(x, z)) = 3 (3:( Q(x, R(z, y) tll(R(z, y)) o Oj(l?(z, y)) o tIj’(R(z, y)) Hj’’(l?(z, Since we dealing are with multisets, To conclude with the example, to the respect each = O separate copy counts, consider given the four database compatible instance y))= o there- of copies of valuations and generate = M z)) = O and the number Indeed, this is precisely the above query in SQL.2 fore, 10 and 6 must be added. (K, N) generated by evaluating true = = K @J’’(z)=N Y Q(x) = of Query that tuple are (K, M) in the consequent of the conjunctive query, as shown in Table VIII. All of these valuations generate the label O for tuple (K, M), since E?(z, y) is always mapped to intuition At this point, additional used O. Thus, the final label of tuple (K, M) is O, which again is what requires. in let relation the consider us columns previous why is not examples, explicit enough. and representation Consider add a label the column of labels conjunctive in the as query participating relations: Q(x, We claim that, capture arbitrary final can label only z,lQ) ~l~=g(lQ,zR) ~R(Z,Y,lR) ‘f’(x,Y,lP). independent of our choice of function g, the above cannot interpretations of relations. The main problem is that the assigned capture to a tuple depends relationships on of labels multiple associated derivations, with a single whereas tuple g deriva- tion. Thus, for example, if g is equal to *, then g captures the treatment of labels for a single valuation, but cannot capture the combination of multiple compatible valuations. For conjunctive queries where all variables are distinare projected out of the result, howguished, this suffices. When variables ever, additional as part machinery of a single needs valuation. to be invoked, The formulation which cannot of the problem be represented as presented in the previous definitions achieves what is desired. Moreover, in our opinion, it syntax from semantics. is also aesthetically pleasing because it separates There is a single syntactic definition of conjunctive queries, on top of which one may impose arbitrary semantics captured via label systems. We now present some simple queries that illustrate the power of the generalizations that we have proposed. These complement the examples in in light of the formal definitions that we have the Introduction and are given example demonstrates the importance of multiestablished earlier. Our first 2 In SQL, requested ACM multiset semantics by using the keyword Transactions on Database is the default, and duplicate elimination distinct. Systems, Vol 20, No 3, September 1995 must be explicitly Containment sets in languages compelling like reason SQL. to 2.6. The already generalized such as the ones discussed E,~ample As we have study here of Conjunctive following query argued, conjunctive are already Queries this queries, widely 305 . is the since most queries used. can be considered a view definition in SQL: Department (dname, ~ Deptsals( This query 2.2, where example, tuples. it is natural However, appears salary. all This tion. If the view over not and the diminish relations formulation, salary relation as there such relation to compute Department does keyword us to query features and sums the (multiset it the authorized and in the of tuples In formulation in which budget the that the sum of of each sum budgets, depart- aggregate to be a set of tuples, since it is possible Employee relations of) salaries the that opera- we could for two reasons. Employee a multiset relations view First, that there this is true, conjunctive in this is an implicit of salaries, and not over by department Although even not is an SQL query groups of understanding to recognize to see the Deptsals directly in each department. importance generates to write of a salary earning to compute is, the this to be sets of in the department and in Example cardinality. is used for example, that considered defined associated distinct basis, 2’ sal) in the antecedents are employees account rigorously for the cardinality for a variety of reasons, it may not Department system is a multiset the view, dname, of a department by examining the view. that explicit construction of multiset relations as multisets, that an as GROUP-BY was it is important field label has the relations on a per-department compute the budget One could argue the on the a relation to consider as often by using necessary based the Deptsals enables salaries ment, sal ). in unless the SQL query, value tuple A Employee(ename, dname, is interpreted each fZoor) queries single query projection thus, of we need to of each salary in the projection. Second, be desirable to give users access to the directly. For may nonetheless example, a user who not be authorized what each individual earns. In such a situation, the multiset relation is the most natural solution. The following example illustrates the generalization explicit creation to relations is to see of a as fuzzy sets: Example 2.7. The following queries attempt to find promising candidates in a football draft. The first query identifies linebackers who are big and have experience, and the second identifies receivers with speed and skill. The data have a degree of uncertainty, in that each fact has an associated “certainty factor” between O and 1. These factors are to be interpreted in terms of fuzzy logic, rather than as probabilities. That is, a fact llig~ohn) with an associated label of 0.7 is to be read as “John is quite big,” rather than “It is quite likely that John is big.” Furthermore, our confidence in the criteria expressed in the two rules is also limited, and thus, each rule has an associated “rule certainty factor.” The certainty of a deduced fact is given by the minimum value of the ACM Transactions on Database Systems, Vol. 20, No. 3, September 1995. 306 Y . Ioanrudls and R, Ramaknshnan E, certainties associated multiplied (in the with deduced using several given by the maximum about certainty the facts arithmetic different of factors probability-based used sense) deduced why reasoning in antecedent rule the are associated factors it. more elsewhere the valuation If a fact is factor is certainty for sometimes discussed of factor. certainty certainty they are the the valuations, the and by (General rules appropriate [Parsaye than and Chignell 1988].) Linebacker x) A Big(x) Recei~er( The above desired x ) A Fast(x) semantics the second label system tive respectively queries fz(a) This are The label X), ‘-G Candidate( X). A Skilled(x) captured by sets) defined associated system certainty tion and interpreting this in Example with framework and note respect that the the labels the required Example kuples, label assigned Although van we Emden of but define do trees for captures [ 1986]; treatment consider a and domain. above. Emden the ignoring of recursive intersecqueries, a correspondence two-person to the same of it between his games. in label SQ3 contains the the a more of For SQ4 potentially is results system. and is formalized definition, containment necessarily containment tuple the Introduction, sQ4 not Intuitively, order the query interpreted sets based on two queries and exactly certainty a general but with comparing based as the we saw same never on in set fewer based on a comparison of the in SQ4, but not vice versa. for a general formal in as instance, generate more of each tuple than SQ3. Thus, of each tuple, SQ3 is contained After 1 as their essentially van [1965] not showed conjunctive are The above illustration 2.10. Zadeh’s essentially system. partial duplicates multiplicities 2.7 by on conjunc- fl( a) = 0.7* functions Example based query 2.1. The two Query Containment some 1.2 in proposed search now ready to where relations to is sets. alpha–beta Conjunctive We are setting, it of fuzzy to A used framework factors, union interesting 2.5 of type deduction rule of are (fuzzy x) ‘~ Candidate( = 0.6* a, having the real numbers between O and mentioned gives the rule certainty factor semantics quantitative is A E~perienced( example is label system also provided. in Definition Definition 2.10. Consider two relation instances R ~ and R ~ for a predirespect to a label system ~. R ~ is cate R with domain D ~ x “”” x D. with in Rz, denoted by RI <, Rz, if, for each tuple t ● Dl x ““” xD., contained R.l(t ) s Rz(t). Clearly, S, is a partial order. Example multisets 2.8. to We again illustrate the use the above label definition systems corresponding to sets and If R 1 <, R ~, then, of containment. of R ~ and R ~ based on the above definition, there is no tuple t in the domain such that Rl(t) > R.g(t). this is equivalent to the absence of a tuple t for which For the set case, and R2(t) = “false,” that is, which is in R ~ but not in R ~. This Rl(t) = “true” captures precisely the traditional notion of set inclusion, R 1 c R z. ACM TransactIons on Database System., Vol 20. No 3. September 1995 Containment For the which t multiset has traditional notion Definition tive case, strictly is equivalent copies of inclusion 2.11. than this more For ~, denoted two a <, of Conjunctive in to the RI than absence in Rz. between multisets based conjunctive queries a and if, for any ~, database Queries latter is defined in partial order over both conjunctive queries, 2.6 on number j3, of the former. ‘l’he is I, the of copies. is more a instance a(I) restric<, f?(I). containment of is natural, since ordering relation t for this <, instances denotes a and the set of Homomorphisms We close this tools section with in characterizing Definition ble terms the set of compatible 307 of a tuple Again, Note that the symbol S, is overloaded in that it signifies relations as well as containment of conjunctive queries. This the . 2.12. Consider consequent. variables A of ~ into (1) if x, y position that x.) two contains function. appears this in D, then h: a and is a total appearing in a, respectively, Q(h(xl), ~ -~ a this definition ..., ~ with compati- function from the induces the same argument then h(x) = y; and h(x.)) appears a total mapping is not in mapping a function, of a homomorphism extension, we can define homomorphism in which atomic in a. We also define formulas queries ~ -+ a variables of f? and formulas, the With are useful a. from the of ~ to the atomic formulas of a. Occasionally, when no mapping as well. If a we use h to denote that induced repeated extend conjunctive h: which containment. of a, such that a homomorphism atomic formulas confusion arises, readily those of homomorphisms, query homomorphism are distinguished in the consequent (2) if Q(xl,..., Note the definition conjunctive this function to an onto is onto, but we that it homomorphism with a variable-onto require respect can is a to be a to the homomorphism set to of be a homomorphism that is onto with respect to the set of variables in a. Note that every onto homomorphism is also variable-onto, but that the converse is not true. 2.13. Definition subset of the defined as fl 3. TWO For two functions domain o fz(~) GENERAL of fl, their = fl( fz(~)) fl 3.1. fz such that PROOF. Let argument by x in the domain fl of fz is a o fz and is of fz. RESULTS Consider a label system that are applicable to almost in the rest of the paper. -Y such that For two conjunctive queries a and ~, the inequality to 2 only if there exists a homomorphism h: ~ - ith the range is denoted for any member In this section we establish two results systems and that are used extensively LEMMA and composition all label Vu, b ~ L – {O}, a * b # O. a <, ~ holds with respect a. a, (resp., b,), 1< i < n, be the distinguished variable in the position of a (resp., ~). Assume that a <, (3. Consider a ACM Transactions on Database Systems, Vol. 20, No, 3, September 1995. 308 Y. E. Ioannldls . valuation set O Of a of constants SUCh C of L. Consider is satisfied: R. Ramakrlshnan 0,, is one-to-one (+1 maps a database all instance Let I?fi ) be the result in to a onto nonzero for any relation some elements Q the following , . . . . .~,. ) in the antecedent of applying on the requirements on f ~t((~,(al), in the of a; L with ... (1,,(a,,))) true and respect to the given that for any atomic ~ ) on that a <, respect to < + O as well. Thus, database formula instance. of the lemma of conjunctive . . . . ti,, (a~))) # O. Since of a (resp.. on .& in the premise definition element Pfl((dt(al),. with such a t = (f),, (.rl ),...,fl, (x,,l)) for some if Q(xI requirements holds: variables in otherwise. P,, (resp., based the formulas such that \ 4), Then, from atomic . . ..xm )). 1 = that and 6/( Q(.rl, Q(t) and /3 and and the queries, the following because O is the least , it must a valuation be the case that 6’ of ~ exists that is instance that Q(.Y1, ..., y~) is compatible with o in 6, Q((fl~(.Yl ),..., 6:( Y., ))) # O. By the construction of the database instance, the above implies that d; maps the variables of P into the set of constants C. Valuation O, is Taking the composition one-to-one and onto, so its inverse (3,T1 is defined. h = 0,, 10 ():, is easy to verify it of ~ to the variables L~iWWA 3.2. there exists of a. Consider that it is a homomorphism from the variables ❑ two conjunctive a honlomorphisrn h: ~ ~ queries a and a. Consider ~, and a database assume instance that I and the set 0,, ( resp., @P) of all valuations of a ( resp., @) that are true with respect to I. Let P be a total function on 6)0 ranging over the set of valuations of ~ and defined us F( O) = O ~ h. Function o ~ (OC,, P( e ) ● (3D For all PROOF. phisms. Q(yl, The By proof of property . . . . -vk’) in /3, Q(h( the lemma and is (2) of yl), . . . . A(yk)) F has the following P( (ii) based Definition is compatible on the 2.12, appears with definition for in property: a each as well. (i of homomor- atomic This formula implies the following: z Oh(Q(yI,..., Y,n)) = fl(h(Q( yl, . . ..ym))) = O1(Q(h(y l),. ... h(y,n))) =Q((o,,oi(yl) ),..., ti,,(h(ym)}) since =Q((O, Hence, by Definition argument position ti,, o h(b, ), 1 < i < ACM TransactIons with respect to 1 F( (3) = 60 h of /3 is true with respect to 1, ok(yl),..., 2.8 valuation that is, it is in (Op. Let a, (resp., b,), (I is true 1 < z < n, be d,, Oh(y~))) the distinguished variable in the ith of a (resp., ~). By property (1) of Definition 2.12, /3C(aZ) == n, and therefore, H and F(fl ) = @o h are compatible. ❑ on Database Systems, Vol 20, No 3, September 1995 Containment 4. LABEL The SYSTEMS following conjunctive THEOREM and p: Then, exists Let PROOF. argument it follows two a <, Assume a homomorphism Then, h: P ~ /3) that instance are true with F has the following of type label queries that with suff~cient with Ya, b =L, respect condition systems a: Al A ~.” A A.,l system for of type a s b = fa(a) to a label A: ~ c. <fP(b). .S? of type A a. Va, b e L – {O}, a * b # O. Therefore, a database The proof 309 a, (resp,, b,), 1 s i s n, be the distinguished variable in the position of a (resp., ~). Assume that a <, /3, By property (Al), that a (resp., and databases conjunctive ~ holds the “only-if’ direction is proved. For the “if” direction, assume Consider a necessary over B~z ~c~. A the inequality iff there ith Consider A ... BI identifies containment 4.1, . OF TYPE A theorem query Queries of Conjunctive of property A. Let that respect additional 1 s exists i < to 1. Let Lemma a homomorphism 1 and the set 6). (resp., (a) is based A = {A,: there by applying h: ~ - @P) of all valuations F be defined as in Lemma 3.1, a. of 3.2. property: on specific ml} and characteristics B = {B,: of label systems 1 < i < rn2} represent the set of atomic formulas in a and ~, respectively. Consider the subset A~ of A consisting of the atomic formulas in a that are images of atomic formulas in B under h. Without loss of generality, assume that A~ = {Al: some k z 1.Then, since til o h ~ @fi (Lemma 3.2), the following 1< i < k} for holds: (1) The last h(llj ) for following equality some is due to property i #j, the product (A2), is not which implies affected. that, On the even if h(B, ) = other hand, the is also true: From the above, we conclude that Ol(c,t ) s 610 h(cP ), using property (Al) of multiplication and the conditions on f. and fp in the theorem. Thus, property (a) of F holds. We proceed with the the equivalence relation proof of the of valuation theorem. Partition (1,, and @v based on compatibility, and let @ffl and @~P be the corresponding partitions that generate tuple t in the distinguished variables of a or ~. Clearly, there is a one-to-one and onto correspondence between the partitions obtained for a and those obtained for B (since for both of them there is a single partition for each tuple in the domain of the consequent relation). Let V,, and Vfi be multisets defined as follows: V,, = ACM TransactIons on Datiabase Systems, Vol. 20, No 3, September 1995 310 Y, E Ioannidis . {6?(c. ): 9 = @t.} and there is some element Suppose that R Ramaknshnan Vp = {O~(cD ): 0’ = @fP}. By UO = V’ such that property (A4) of addition, UO = 61(c0 ) and v~ = IN 01)(cP ), for some O = ofa. From (a), UO < v~, and from following holds: Combining and Lemma (2) and (3) yields 3.2, u~ = VP. By property E,, ~ ~ u < ZU (A3) ~ U. If a(~) ● and property of addition, ~(l) the are equal to the relations Pa and PD, respecti~ely, by ~efinition 2.9, the above that, for all tuples t in the result of a or ~, P.(t) < PP (t ). Therefore, implies for an arbitrary a <, database instance 1, a(1) <, P(1), which also implies that ~. ❑ As we have mentioned tional databases, A systems. Thus, [ 1977], so that example, ward it is now is that systems 5. LABEL The Testing SYSTEMS following over traditional alternative semantics sets [Zadeh query 1965]. the following [Chandra conjunctive query as well, identifies databases 5.1 illustrates, 5.1. with PROOF. a sufficient with label and Merlin containment Consider two condition systems it is not necessary, Let conjunctive a, (resp., 6,), 1 < i < position of a (resp., h: /? + a. Consider (resp., fl~ ) of all valuations 3’ be defined as in Lemma 1977]: with all TransactIons to n, queries be from a: Al A ... A the distinguished on Database is, @l(c. ) <010 6). to 6)P. Systems, Vol. 20, No as variable 3, September Anl ~ CW < ffl(b). a <, in /? the that there exists an onto instance I and the set 6). of a (resp., ~) that are true with respect 3.2. Then, F has the following additional that query B. Unfortunately, in general; /3 ‘). Assume a database O E @a, 191(c.) s (A); (b) F is one-to-one ACM respect for conjunctive of type ties: (a) For respect for the traditional rfi Assume that Va, b = L, a < b * f.(a) and P: Bl A . . . ~B~z~cp. If there exists an onto homomorphism h: /? d a, then the inequality holds with respect to a label system S7 of type B. ith argument homomorphism for A straightfor- containment to the same problem rela- case of type and Merlin OF TYPE B theorem over THEOREM for as fuzzy for conjunctive for queries as sets, are a special the result of Chandra of type A is NP-complete. containment Example testing we have COROLLARY 4.1. systems relations of type A is identical case of sets. Hence, label conjunctive applicable for interpreting corollary to label earlier, which interpret relations Theorem 4.1 generalizes 1995 h(cp). to 1. Let proper- Containment The proof systems of property of type B. (a) is based Let the set of atomic onto, set .?3 can be partitioned that the in a and into Without two Queries B = {Bi: say, B ~={BZ: Because B~ and assume ml of label 1 < i < rrz2} ~, respectively. subsets, 311 . characteristics and loss of generality, 1 s i s ml} and B~ = {B,: holds: specific 1 < i < ml} formulas h: B~ a A is a bijection, subsets are the following on the A = {A,: represent of Conjunctive h is BQ, such that the two m2}. Thus, + 1 .s i < m2 =fp fi6’1( [ t=l ~ Al)* 6,0h(Bz) . 2=7721+1 ) Also, From the above, because of property (Bl) of multiplication on fa and ~P in the theorem, we conclude fore that property (a) of F holds, The proof of property (b) consists h is onto, of some variable ~. The above it is also variable-onto, combined of /3. Assume with (4) implies Ouo k(y) Therefore, 0,,0 M P) # O; oh(B), We proceed with the proof given two valuations O(a) + O’(a), (4) and that so every x = h(y), variable of a is an for some variable y of that # 6); 0/i(y). which of the that, # O’Oh( @). Because a such that # f3;(x). tic(x) Because and the conditions 01(Ca) s 010 h(cP ) and there- of showing 19,f3’ = @a, if o(a) # ~’(a), then Ooh(P) there must be at least one variable x in h-image that implies theorem. (5) that Partition property ~. (b) of F holds. and @p based on the equivalence relation of valuation compatibility, and let (l~e and @fP be t in the distinguished the corresponding partitions that generate tuple variables of a or ~. As in the proof of Theorem 4.1, there is a one-to-one and onto tained correspondence for ~. Let between V. and the partitions VP be multisets obtained defined for a and as follows: those ob- Vu = {61(c0 ): @E @ta} and VP = {O;(CP ): (3’ e @tP}. Let VP” be defined as follows: VP” = of F imply {O{(cp): 0’ e @tp and 0’ = /3 o h for some 19= (30}. The properties that for every to u (Lemma other imply element u = V. there is an element U’ ● V: that corresponds u < cl’ (property (a)), which corresponds to no 3.2) such that element of Va (property the following: ACM (b)). TransactIons By property on Database (B2) Systems, of addition, the Vol. 20, No. 3, September above 1995. 312 Y, E. Ioannidls . and R Ramakrishnan by If a(l) and P(1) are equal to the relations P. and PP, respectively, t in the result of a or /3, Definition 2.9, (6) implies that, for all tuples Pa(t) < Po(t). Therefore, for an arbitrary database instance 1, a(l) <, /3(1), which also implies The following query PROPOSITION with homomorphism provides For two respect h: /3 ~ Assume ❑ /3. containment 5.1, a ST ~ holds a <, proposition for conjunctive PROOF. that a straightforward with respect conjunctive to a label queries system necessary to label a systems and condition of type (3, the -5? of type B only B: inequality if there exists a a. that CY S, {O}, a * b # O. Therefore, ~. By property by applying (Bl), Lemma it follows 3,1, the that VCL, proposition b = L – is proved. ❑ By restricting that the form the condition THEOREM of conjunctive of Theorem Consider 5.2, queries, 5.1 is both two conjunctive D + to a label system shows Vu, b ● L, Al ~ A A~l A a < b + f.(a) a <, ~ c. < fP(b). fl holds with an onto homomorphism h: a!, PROOF. Suppose The that “if” direction a <, /3. By h: D s homomorphism such a: the inequality $?’ of type B iff there exists theorem and sufficient: queries that and ~: BI A .“ A B~z ~ C@. Assume If a does not contain repeated predicates, respect the following necessary homomorphism. is an immediate Proposition 5.1, cr. We first Since there show consequence we know that, that in this are no repeated of there Theorem must case, there predicates in 5,1, be some is a unique a, for each atomic formula of ~, there is a unique atomic formula in a that can be its image under any homomorphism. Therefore, for each variable in ~ its image is uniquely determined; that is, there is a unique homomorphism h: /3 4 a. It remains the contrary not the image to be shown that that this h is not onto. of any atomic unique Then, formula atomic formula of a. Clearly, Q cannot ity, assume that all arguments of all have the same domain. Consider instance such that each relation R(t) # O, R(t) = o, Let P. (resp., PP ) be the result homomorphism there in is an atomic /?. Let is onto. Assume formula Q be the in to a that predicate is in that appear in ~. Without loss of generalpredicates in the conjunctive queries a constant R satisfies c in that domain the following: and a database if t has c in all its arguments; otherwise. of applying a (resp., B ) on that instance. Let s be the tuple in these results that has c in all of its arguments. Let Pp(s) = 1, where, by the construction of the database instance, 1 G L – {O}. By property (B3), there is a label m ● L such that m >1. Suppose that t’is the tuple in Q that has c in all of its arguments, and choose Q( t’) = m. Then, by property ACM TransactIons on Database Systems, Vol 20, No 3, September 1995 Containment (Bl), tion. l’.(t) > m >1 = PP(t), which ❑ Hence, h must be onto. Recently, Chaudhuri 5.1, Proposition ask whether not contain following Let Y 5.1. Definition satisfies (El) A Q(x) P: Q(x) +P(x). L = N (the addition, a s, A does). to ~ does Unfortunately, the -P(x), < there B label numbers is the usual ~, although so that version -!%= (L, total including order is no onto systems a stronger system queries: set of natural and of type label the case that two conjunctive Q(x) we can prove 5.1. a possibly Theorem It is natural is not possible: a: Clearly, is a contradic- discovered to cover 313 . homomorphism examples of Theorem *, +, O, < ) is O), over the of like the 5.2. type B’ if it the following: Va, b~L–{O}, (B’2)Va, (B’3) this as follows: p, which independently the following + is the usual are excluded, (while that from ~ to a. By limiting the definition above [1993] a <, Queries 5.2 for the case of multisets. predicates Consider numbers. that 5.2 can be strengthened shows be defined * is ma-c, natural and Vardi repeated example Example implies 5.1, and Theorem Theorem of Conjunctive a<a*b, a’, b, b’=L, and3a (a<a’andb =L, <b’) Vkkl, ah<ak+l. +a+b<a’+ b’. Va G L, ~a’ ~ L, a < a’. Observe that B’ is that the only there than the element following result: THEOREM difference is an element itself (property Consider 5.3. between in L whose label systems product (B l)). For these two conjunctive of type with itself label systems, queries a: Al B and of type is strictly larger we have the ~ . . . ~ Am ~ 5 c. tp and P: BI If either holds A . . . ~B~z-cfl. Assume a or ~ does not contain with homomorphism respect to a label h: p d that repeated system Va, b = L, a < b ~ f.(a) predicates, J? of type B’ the inequality iff there exists < f~(b). a <, an /3 onto a. PROOF. The “if’ direction where there are no repetitions as well as the “only-if’ direction in a are immediate consequences for the case of Theorems 5.1 and 5,2, respectively. Suppose that (1 does not contain repeated predicates and that a <, ~, By Proposition 5,1, we know that there must be some homomorphism h: D ~ a. Clearly, every such homomorphism must be oneto-one with respect to atomic formulas, since there is no repetition or predicates in ~. Assume that h is not onto. If m (resp., n) is the number of atomic formulas in a (resp., ~), then clearly m > n. Let 1 be an element of L such that Vk >1, lk < lk +‘. Property (B 1) ensures the existence of such a label. ACM Transactions Without on Database loss of generality, Systems, assume Vol. 20, No. 3, September that 1995. Y. E, Ioannidis 314 . all arguments domain. that of all Consider R. Ramakrishnan predicates conjunctive R satisfies the following: domain queries = 1, if t has c in all its arguments, R(t) = O, otherwise. of applying results P.(s) P.(s) = lm and PP(s) = l“, > PP (s ), which implies must be onto, that have and a database R(t) in these We remark the c in that PP ) be the result be the tuple in a constant each relation Let Pe (resp,, and a (resp., ~ ) on that has c in all of its arguments. Since that the same instance such instance. Let We note s that m > n, by property (B 1), it follows that a K, /3, which is a contradiction. Hence, h ❑ that the choice of h in the above proof was arbitrary. Hence, a <, ~ only if every homomorphism h: P ~ a is onto. As mentioned earlier, commercial relational database systems treat relations as multisets and therefore, operate under the semantics of type B (and, in fact, type B ) label systems. Hence, Theorems 5. 1–5.3 are applicable to this case. 6. UNIONS In this ment OF CONJUNCTIVE section we generalize not of individual QUERIES the results conjunctive queries, classical case, where relations are addressed by Sagiv and Yannakakis zation to the containment presented problem, but viewed [1980], After introducing holds show undecidable B, thus for label between systems of type the two label of collections contain- of them. as sets, the problem who gave a syntactic ogy, we first show that their theorem which the set-case is one). We then differences above and discuss the necessary For the has been characteriterminol- for all label systems of type A (of that the problem is, in general, providing more evidence of the systems. 6.1 Basic Definitions The following junctive query definitions case: are straightforward Definition 6.1. A union of conjunctive formulas with identical consequent. If queries in the collection, then their union extensions of the single con- queries is a collection of first-order a,, 1< i < n, are the conjunctive is denoted by U ~= ~al. Definition 6.2. Consider a union of conjunctive queries U ;. ~ a, and a database instance 1. Let P, be the relation instance that is the result of applying a, to 1. The result of applying U ~. ~ a, to 1 (denoted by U ~. ~ a,(l)) is a relation instance P, that is, a function, such that, for each tuple t in the domain of the consequent of the conjunctive queries, P(t) Definition with 6.3. compatible AC!M TransactIons For two unions consequent, on Database Systems, = fi P,(t). [=1 of conjunctive queries I.J ~~~ a, and I.J~j ~ ~~ is more restrictive than U ~~ 1 BJ, U ~~~ al Vol 20, No 3, September 1995, Containment 6.2 of Conjunctive Queries 315 . Label Systems of Type A The following theorem containment systems of of type ‘1’HEOREM of a necessary conjunctive and queries sufficient over condition databases with for label A: For two unions 6.1. the inequality u ~~~ a, <, type A iff for all PROOF. identifies unions For 1 s i s nl the “if’ of conjunctive U ~~~ a, and queries U ~~~ B,, U ~! ~ /3, holds there with respect to a label system ~ 1 s j < n2 such that a, <, (31. exists direction, assume that all for 1< i s nl there of exists 1< j < n2 such that a, <, ~j. Consider a database instance and let Pa, PP, Pai, and PPj denote the results of applying U ~~~a,, (J~~ ~ ~,, al, and ~~ tO that instance, respective] of the conjunctive t in the domam y. For any tuple queries, Definition 6.2 implies of the consequent that : Paz(t). Pa(t) = (7) 1=1 Properties (A3) imply there that and (A4) together exists some with associativity 1 < -I < nl ; Pa,(t) and commutativity of + such that < Pa,(t). (8) ~=1 By the aI <, premises of PJ, which Finally, imply the implies property (A3) “if’ direction, there exists some J < TZ2 such that 1< that and associativity of + together with Definition 6.2 that PpJ(t) < ; PFJ(t) (lo) = Pp(t). ,j=l The combination consequent of (7)-(10) of the yields conjunctive that queries for any tuple for arbitrary tu-pies t and arbitrary database 2.11 imply that l-l~!l a, <, U~!l (3,. The proof of the “only-if’ direction is very 3.1, but is given here in full detail for t in the Pc,( t ) g PD (t),Since clarity. instances, similar domain the Definitions to the Assume that of the above holds 2.10 and proof of Lemma the inequality U ~1, a, <, U ~1, ~, holds. Let al (resp., b,), 1< i < n, be the distinguished variable in the i th argument position of the conjunctive queries in U :1 ~ a, (resp., U ~! ~ ,t3~). Consider ACM an arbitrary TransactIons conjunctive on Database Systems, query ar in U:! ~ a,. Vol. 20, No, 3, September 1995 316 Y . E. Ioannldls Furthermore, variables in aI consider to nonzero elements . . ..x7n )). and I?pj Since if that instance, premise of denote instance the formulas such that t = (6L, (x1),....OL. Tin))))for some ,.. .,xm) for a~ ; in that of applying direction (.J ~1 ~ a,, (Al) imply and (A3), U ~! ~ PJ, Definition 6.2, O,,(an))) eL’(an))). of L with there respect must exist to < some and the additive ~,1 in U ~! ~ ~ identity, such 0< Pp.l((f),,(al)....) (),(a.))) true with respect and such that, to the for any al. that < Pfl((6L, (a1),..., q(a,t))) element implies results Properties “only-if” Pal((oL, (al),..., O is the least above the respectively. the < pb((o,, (al)....> the a database from all atomic is satisfied: Q(x, o< 0, is one-to-one Ot maps otherwise. Let P<,, PP, P., to C and of L. Consider o, \ /3j 6 of aI such that \ = the a valuation Q the following i?l(Q(xl, Q(t) and R Ramakrishnan in aI onto some set of constants any relation and and as well. Thus, a valuation 0’ of BJ exists given database instance that is compatible atomic formula Q( yl, . . . . v,. ) in that that with PJ, Q({ ~~(yl), is d .... O:( y~ )) ) + O. By the construction of the database instance, the above implies that d; maps the variables of BJ into the set of constants C. Valuation 0,, is Taking the composition one-to-one and onto, so its inverse d,: 1 is defined. = (3C 10 (3;, it is easy to verify that it is a homomorphism from the variables of PJ to the variables of a{. Theorem 4.1 implies that aI <, BJ. Since the above was true for an arbitrary conjunctive query cr~ in U ~1 ~ a,, the theorem h ❑ follows. When considering the label system that captures of relations as sets, Theorem 6.1 is identical Sagiv and Yannakakis [ 1980] on containment Its significance containment 6.3 is that exists Label Systems We now turn containment our of it demonstrates for arbitrary label the usual interpretation to the corresponding theorem of of unions of conjunctive queries. that systems the same of type characterization of A. of Type B attention unions of to label conjunctive systems queries. of We type show B and that the there label system for which the problem is undecidable, specifically, system that captures the interpretation of relations as multisets. tion is from a variant from the Diophantine equation problem, problem of is one such the label The reducso we first state some definitions related to that. Let .&( x ~, Xz, . . . . x~ ) be a polynomial in k variables with integer coefficients. The equation AC xl, Xz, . . . . x~ ) = O is referred to as a Diophantine equation when only its integer solutions are being sought. ACM TransactIons on Database Systems, Vol 20, No. 3, September 1995 Containment The problem of determining whether solution to such an equation there is no decision procedure The following equivalences (m3x1, there xk)A(xl, ‘=’(VX1, X’2, ..., x2, xk), xk) a nonnegative 1982]. 317 . More integer3 formally, a variable =0 x~)A(xl, xz, ..., x~)l–(A( xl, Therefore, the problem of determining for all nonnegative integer assignments for Queries are straightforward: x2,..., 1 s i s h, the following exists is undecidable [Davis for the statement =(v-r~, xz, ..., addition, of Conjunctive XO that xz, . . ..x~))2 <0. whether a polynomial is nonpositive to its variables is also undecidable. In is equivalence +0, Xk) distinct from all other variables x,, holds: ~(vx~,x~,..., xk)xo A(.xl xl, ) < ,x2,..., 0. Note that XOA(XI, Xz, ..., Xh ) is a polynomial without a constant term. Hence, the above equivalence implies that the problem of determining whether a polynomial is nonpositive for all nonnegative integer assignments to its variables remains undecidable even if the polynomial has no constant terms. This final undecidable problem related to Diophantine equations is the one that we use in the following result: 6.2. Consider the following label system of * , + , (), ~ ), where L = No, the set of nonnegative integers, THEOREM (L, the usual integer multiplication on the integers. Also consider and addition, two unions and < PROOF. Let Given the above problem problem of containment is undecidable positive, or arbitrary 4Note that we have two polynomials U ?I ~ a, ~, total order U ~~~ a, and ~!: 1 P, with X2, ..., xk ) and Wxl, X2, . . . . Xh) be two polynomials in positive integer coefficients and with no constant terms. We above that there is no decision procedure for the statement~ (VX1, X2,..., problem queries @(xl, k variables with have established 3The is the usual of conjunctive lJ:: I B,. The problem of deciding whether or not respect to the above label system F is undecidable. type B: -Y’= * and + are with xk)(@( x1 ,x2,..., instance, of unions we construct of conjunctive independent integers. rewritten a polynomial positive coefficients. xk)sw(xl, of whether with ACM Transactions arbitrary on Database x~, ..., an equivalent queries. the solutions integer Systems, sought coefficients Vol Xk )). instance are of the nonnegative, as the difference 20, No 3, September of 1995 318 Y E, Ioannidis . Assume that and R, Ramaknshno.n the polynomial @(x1, @ is of the form x2, . . ..xk) s ~a,x:’’x$” = ““” .x;”, L=l where there are s terms in @, which have been numbered in some arbitrary way from 1 to s. For all 1 < i < s, a, is a positive integer, and for all 1 <J’ < k, c,] is a nonnegative integer with at least one of them being positive (cD has no constant queries as follows: variable xi, terms). There 1 s z s k, From 0, we is a unique, plus the construct unary unary a union relation relation of conjunctive symbol symbol P Xl for for the each query consequent. All of these relations share the same domain D. For each term conjunctive queries in the union; that is, ajx;’’x; ~ ““” x~~, we put a, identical there are a total of n ~ = Z:. ~ a, conjunctive queries. Each such conjunctive query is of the form c, ~ times Xl(u) Let Ax’l(u) c,,, times A . . AXI(V) A . . . A Xh(U) AX1(V) A ““ ~X~(U) be the union of conjunctive queries thus constructed. of conjunctive queries U ~~ ~ ~j can be constructed from lJ ~1 ~ al the union mial Similarly, the polyno- W. What remains to be shown (b’xl, For the “only-if” P,, denote associated ,x%, . . ..x~ u;!lal<r U;!l direction, consider )<Wxl,x2,...,xh) p,. an arbitrary database instance 1, and let the result of applying U ~~~ a, to 1. If X, is the relation (function) with the symbol Xl in this instance, then, for any element d = D, Pa(d) above structed is that x2, . ..rk)(@(xlx1 * The ~P(u). = @(Xl(cZ), is a straightforward from the polynomial U 75 ~ P] to 1, then, consequence @. Likewise, for any element PP(d) Xz(d),..., = WX1(d), X~(d)). of the (11) way if PD denotes IJ ~~~ a, the result was con- of applying d = D, X,(d),,,., X~(d)). (12) If (Vxl, xz, . . ..x~)(@(xl. xz, x~), x~) <IP(Zl, Xz, . . ..xk)). then by (11) and (12), for any element of d = D, P.(d) < PP(d), which implies that P. <, Pp. Since the above holds for an arbitrary database instance, we conclude that u;!, ffzsr U;?l P,. For the “if” direction, consider an arbitrary set of nonnegative integers Xl assigned to the variables xl, respectively, of CDand ~. Let I be a database = x,. If P. and P~ instance such that, for some d G D, for all 1 < i s k, X,(d) then denote the results of applying U ~~~ a, and U ~j, ~j to 1, respectively, clearly ACM TransactIons on Database Pw(d)=@( X1, X2, . . .. x}.), (13) PP(d)=T( Xl, Xz, . . .. X~). (14) Systems. Vol 20, No 3, September 1995 Containment If U ~~1 a, S, U ;~l DJ, then that @(xl, xZ, ..., Xk)swxl, trary set of nonnegative of Conjunctive Queries p.(d) s J?@(d), which by (13) and (14) implies xz, ..., Xk ). Since the above holds for an arbiintegers x,, we conclude that (Vx ~, x ~, . . . . Xk)). x2,..., x~)(~(xl, xz, ..., xh)<q?(xl, The two problem instances are equivalent, and the polynomial problem is undecidable. Therefore, the problem of containment conjunctive queries with respect theorem is undecidable as well, We should important there emphasize label could that system of type be some such label to the label system for unions inequality of unions of of type B described in the ❑ Theorem 6.2 B from system deals explicitly a practical for which with perspective. of conjunctive queries in the the most In principle, the problem of unions of conjunctive queries is decidable, but Theorem there is no general decision procedure for all such label particular, multisets, 319 . of containment 6,2 shows that systems and, in case of relations as 7. DISCUSSION One of the benefits useful types syntactically. of our general detail. This is a variant fuzzy-set reasoning. 7.1. Definition satisfies (AI’) is the A of our label type system 2 A system = (L, ability to identify other containment for them new label system in some and is also *, +, O, < ) is of motivated type by A’ if it the following: Va, bGL-{O}, O<a*b (A2’)vcze L,a*a=a. (A3’)Va, a’, b, b’~L, With label framework of label systems and to characterize As an example, we consider a specific respect system Example <a. (a<cz’andb <b’) ~a+b<a’+ b’. to type A, condition (A4) has been dropped. is also a type A’ label system. 7.1. For certainty factors, L is equal between O and 1, as noted earlier. However, rein, the operation + is not always taken expert system denote addition of type A, but Clearly, to the every set of real type A numbers while the operation * is usually to be max. For instance, in the iMYCIN, a + b is defined as a + b – a. b (where + and “ and multiplication over numbers, resp.). This label system is not of type A. Many other choices for + also yield type A’ systems. Lemma tion 3.1 clearly for conjunctive sufficient holds query for type A’ systems, containment. giving us a necessary The following theorem condi- establishes a condition: f<, THEOREM and ~: 7.1. Consider r two conjunctive BI ~ . . . ~ B~z ~ CP. Assume ACM Transactions that queries Va, b ~ L, on Database Systems, a: Al A ... A a < b * fi,(a) A~l ~ c. < f~(b). Vol. 20, No 3. September 1995 320 Y. . E, Ioannidis and R, Ramakrlshnan Then, the inequality a <, P holds with respect to a label if there exists a variable-onto homomorphism h: /3 - a. PROOF. ith Let argument a, (resp., position homomorphism h: system J? of type A b,), 1 s i s n, be the distinguished variable in the of a (resp., /3). Assume that there exists a variable-onto ~ ~ a. Consider (resp., @p) of all valuations F be defined as in Lemma a database instance I and the of a (resp., B) that are true with respect 3.2. Then, F has the following additional set O. to ~. Let proper- ties. o ● @a, Ol(cti) (a) For all (b) F is one-to-one The proof from of property Theorem < (F(O~))(cp); (a) is identical 4.1. The proof that is, OZ(ca) < 1910h(cP). (9. to (3P. to the corresponding of property (b) is identical part of the proof to the corresponding of part of the proof of Theorem 5.1. The rest of the proof is identical to the corresponding part of the proof of Theorem 5.1, observing that condition (B2) is identical to condition The proof framework: of Theorem The exact identified of interest This ❑ (A3’), 7.1 illustrates properties upon an important benefit of our algebraic which each part of a proof rests can be precisely, and this is of great help in identifying for which useful results can be established. still leaves be strengthened studying open the interesting to provide equivalence and various label systems systems, pose further A more basic the of whether sufficient In particular, be both the of conjunctive or not our conditions systems Theorem condition. here, and possibly other problems in this area. is whether liberal. should and case of unions considered interesting question be made more least element question a necessary new label 7.1 can Of course, queries for interesting on label the label systems can can we relax the requirement that the additive identity and the multiplicative annihilator? Our final example provides some motivation for such fundamental extensions and also illustrates that our generalization of conjunctive queries queries. using Example label 7.2. systems Several generalizations seek to perform path performing computations label is equally computations useful of the in transitive by associating as paths the labels are enumerated, context of recursive closure with of a graph paths Typically, and by the set of labels is either the set of nonnegative integers or the set of nonnegative reals. The operations correspond to computing a new label for a path by “multiplying” ( * ) the labels for the subpaths used to generate the path and to computing a new label for a set of paths by “adding” ( + ) the labels for the paths in the set. By making different choices for * and +, a variety of path problems can be stated in this framework. Table IX contains some characteristic examples. Although we do not consider recursive queries, as Table IX shows, we can use any of these label systems to define useful databases and queries. For example, the label system in reachability is the traditional relation-as-set ACM TransactIons on Database Systems, Vol 20, No 3, September 1995 Containment Table IX. Characteristic * Problem Reachability Maximum Bill capacity path of materials Shortest Most path reliable Critical path (longest) path Examples + and L max Nonnegative integers A sum Nonnegative integers B integers ? min Nonnegative max [0, sum max of conjunctive Sagiv and it in the case of Theorem multisets, which 6.1 (in we prove Chaudhuri the not while and Vardi [1993] solved by Chan- and Klug dependencies. of conjunctive essentially same problem independently satisfy because in the presence Johnson the union the systems, the problem is decidable, contrast to undecidable). it does not label and inclusion addressed or even as label because * is not for sets was studied dependencies, problem because even containment of functional [1980] that system are et al. [1979] multivalued sets and showed Recently, ? integers restricted to nonnegative edge labels, Unfortunately, not all proposed path B label query Aho presence and Yannakakis ? 11 Nonnegative critical path annihilator. The studied a type type A as type A or type B label systems, path is not a type A label system WORK of functional “true”} sum it is not [1977]. system product 8. RELATED problem Label min (Bl). Shortest path and neither has a multiplicative dra and Merlin Problems (“false,” product 321 ● or can be viewed Most reliable idempotent, of Path Queries and view, whereas bill of materials, when essentially treats relations as multisets. systems systems! of Conjunctive queries proving over [19821 Finally, for a special relations discovered as Theorem 5.1, Proposition 5.1, and Theorem 5.2 (i.e., similar sufficient conditions for conjunctive query containment) for the case of multiset queries. In terms of differences with our work, they query containment for multiset conjunctive query fore in NP, but equivalence not known also showed that the problem queries is II # hard, whereas is the same as graph to be NP-complete). additionally deal with unions of conjunctive ment is undecidable in that case. Moreover, multisets alone, we present systems), which captures a much a variety more isomorphism (and thereOn the other hand, we queries whereas general of possible of conjunctive surprisingly, and show that containtheir entire work is on formalism interpretations (that of label of relations (including sets and fuzzy sets). Hence, even the results that are common with those of Chaudhuri and Vardi are in essence more general, since they hold for all label systems of type B. The algebraic framework that we have used in this paper is inspired by an idea of Ioannidis and Wong [1991]. Specifically, that earlier work talks about associating a real number with each tuple of a relation, so that an algebra ACM TransactIons on Database Systems, Vol 20, No 3, September 1995. 322 Y E. Ioannidls . and R. Ramakrishnan can be formed and a theorem on Iinearizability relations as sets of tuples) can be obtained. real-number idea and have generalized of multilineal recursion (over In this paper we have taken the it to attach an arbitrary label to a tuple. Based on that, we have then developed the entire formalism of label systems, their properties, and the results on query containment. There have been a few other pieces of work that deal with relations as multisets, but address problems other than containment of conjunctive queries. Dayal et al. [ 1982] were probably the first to consider multiset queries in SQL, using a model of relational algebra with control over duplicate elimination. mantics junctive Maher and Ramakrishnan [1989] for logic programs that generalizes queries studied in this paper. The proposed the multiset main results a multiset se- semantics for conin that paper were that checking if a general logic program and that there is a sufficient condition generated. Mumick et al. [1990] studied generated duplicates is undecidable for ensuring that no duplicates are the issue of multiset queries in SQL and nested showed using the how magic selection-pushing sets technique. into ered the semantics of general SQL queries. The area of fuzzy sets has been studied work by programs sets and 1988]. 9. SUMMARY We have multisets, reasoning with expert recently following the proposed an extension that is closely related systems [Parsaye and considseminal of logic to fuzzy Chignell AND FUTURE WORK generalized and other the notion refinements examined the problem types of label systems. deal in can be accomplished et al. [1991] extensively Zadeh [1965]. Van Emden [ 1986] to deal with quantitative deduction uncertainty queries Final ly, Negri relations of a relational to the concept database to cover fuzzy sets, of a relation as a set. We have of conjunctive query containment for Specifically, we have shown that earlier as sets are generalized for all label two important theorems that systems of type A, which include sets and fuzzy sets as special cases. We have also provided sufficient conditions, and in special cases necessary and sufficient conditions, for label systems of type B, which include relations as multisets, an important special case due to its significance in SQL. problem of containment of sets of conjunctive earlier set-based results for all label systems We have queries, of type also addressed the generalized again A and proving undecid- ability for label systems of type B. An interesting open problem is that of containment of individual conjunctive queries for type B systems. We have presented a necessary and sufficient condition for queries with no repeated predicates and a sufficient condition for the general case. Is there a general necessary and sufficient condition? Is the problem decidable? (Note that Chaludhuri and Vardi [1993] established a lower bound for this problem, but not an upper bound.) Given that the problem is decidable for both individual queries and sets of queries over relations ACM as TransactIons sets and on Database our undecidability Systems, Vol 20, No result 3, September for 1995 sets of queries over Containment multi sets, this raises individual queries. There are several include dealing formalizing ties the possibility a label the techniques of query additional areas recursive queries with system and addressing that that developed the problem that require the containment in this work further even for exploration. traditional query 323 . of deductive problem into Queries is undecidable in the context captures the query of Conjunctive These databases, notion of probabili- for it, and incorporating optimizers that face issues containment. REFERENCES AHo, A., SAGIV, Y., AND ULLMAN, Comput. 8, 2 (May), BLAKELEY, J. A., views. In LARSON, Proceedings (Washington, P. A., of the CHANDRA, A. K., AND MERLIN, data Computing bases. (Boulder, CHAUDHURI, with neering DAVIS, M. DAYAL, over 1993. Calif., In GRANT, ACM, J., Mar.). New Nicolas, Eds., In York, (Apr.), J. updating materialized Management of Data of conjunctive ACM Symposium queries in Theory of on of real DC., conjunctive June). queries. ACM, S., AND SHIM, New K. 1995. International In York, Proceedings 59–70. Optimizing Conference queries on Data Engi- 190–200. Dover 1982. An of the Publications, extended New relational 1st ACM–PODS York. algebra Conference with (Los control Angeles, 117-123. 1981. Advances Press, May. Optimization New York, 1991. in deductive Base Tlzeor.y, Vol. m Data and conventional 1, J. Minker. relational H. Gallawe, and J. M. 195-234. Towards and algebraic theory of recursion. J. ACM 38, 2 329-381. JOHNSON, D. S., AND KLUG, tional and Angeles, KING, J. inclusion Calif., J. MAHER, A. 1982. ACM, Quist: New I. S., deductive Symposwm PIRAHESH, databases. Australia, Aug.). M., In union SELLIS, and difference T, K. SHENOY, S., queries 1st ACM–PODS under func- Conference (Los 164-169. query optimization Conference 1989. (Cannes, of the R. 16th Aug of logic (Cleveland, AND RAMAKRISHNAN, in relational France, Di2j6 vu in fixpoints Programmmg G., AND SBATTELLA, Syst. 16, 3 (Sept.), M. AND YANNAKAKIS, Conference R. Proceedings PARSAYE, K., AND CHIGNELL, Y., VLDB of conjunctive of the Ohio, 1990. databases. programs. Oct. In Proceed- 1), 963–980. Duplicates and VLDB Con&ence International In ). Pp. 510-517 aggregates in (Brisbane, 264-277. PELAGATTI, Database pp for semantic on Logic H., containment Proceedings York, A system M. J., AND RAMAKRISHNAN, Trans. In of the 7th International of the ALP MUIVIICK, Testing dependencies. Mar.). 1981. Proceedings SAGIV, SIAM 77-90 Unsolc]abJ@. communication, J. Plenum York, York, Proceedings IOANNIDIS, Y. E., AND WONG, E. NEGRI, New and Personal systems. implementation Annual 9th of the IEEE IEEE, In AND MINKDR, database Etllciently the R., POTAMIANOS, Computabdity 1995. expressions. Lyon feren ce on the (Washington, Proceedings elimination. Mar.). 1986. Optimization Conference Taiwan, duplicate relational Optimal New ACM, U., GOODMAN, N., AND KATZ, R. H. GRAEFE, G. ings 1977. May) Colo., views. 1982. F. W. of S., KRISHNAMURTHY, (Taipei, among ACM–SIGMOD proceedings S., AND VARDI, M. materialized Equivalence AND TOMPA, 1986 P. M. In of the 12th ACM–PODS CHAUDHURI, 1979. ACM, New York, 61-71. D. C., May). relational J. 218-246. 1986, 1988. M. Global query AND OZSOYOGLU, Proceedings of the 1987 cisco, Calif., May). ACM, M. ACM York, semantics for Experts. among 27, 4 (Oct.), of SQL queries. ACM In 1987. A Proceedings D. C., May). system Conference Wiley, relational New York. expressions with the 633–655. (Washington, ACM–SIGMOD New Systems optimization. of Data Z. Formal Eqmvalences J. ACM on the Management 1991. Expert 1980. operators. L. 513-534. for of the ACM, semantic 1986 New query on the Management ACM-SIGMOD York, 191-205. optimization. of Data (San In Fran- 181–195. Transactions on Database Systems, Vol 20, No 3, September 1995. 324 . Y, E. Ioannidis M,, ANDIOANNIDIS,Y. ‘1’S.4TALOS, O., SOLOMON, data independence. Chile, M. H. 1 (Jan,), 37-53, ZArYJFI, L. 1965. ACM In Proceechizgs 1986. Quantitative nf the 1994. 20th The grnap: Intw-natmnal A versatile VLDB tool for C’on ferenw phwcal (Santiago, Sept ), 367-378 VAN EMIXN. Received and R. Ramakrishnan January Transactions Fuzzy 1992; sets revised on Database Inf. deduction Control June Systems, 1993; Vol 8.3 and Its fixpoint (,June), August 20, No theory J. Logrc Program 338-353. 1994, and 3, September May 1995; 1995 accepted May 1995 3,