Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Fast Algorithms for Universal Quantification in Large Databases GOETZ GRAEFE Portland State Unwersity and RICHARD L. COLE Redbrick Systems Universal quantification it adds significant for analysis the reasons for mance omission that and been embodies effects of explicit inputs larger than performance comparisons comparisons demonstrate and that the predicate over two relations over the same system. expressed the very directly rather and H.2.4 General Terms: research has and 18. monitored Sequent Computer author’s (goetzg Permission to make granted without advantage, given that servers, supported IRI-91 Digital Systems, 98052-6399 by the 16547, the for experimental Moreover, evaluates a universal join evaluates an existential and universal quantification algorithm to a query quantification query optimizers division, should be do not recognize and [Database H.2.3 therefore produce Management]: Lan- processing ADP, address: and the Intel Advanced Microsoft Science Foundation Research Laboratory Corp., Oregon G. Graefe, National Advanced Research Equipment Projects under contract Supercomputer Computing Corp., with Institute Way, (ARPA DAAB-07-9 Systems One Microsoft grants Agency 1- Division, (OACIS). Redmond, WA @ microsoft.tom). digital/hard fee provided copy of part that copies the copyright notice, the copying M by permission or to redistribute Q 1995 ACM perfor- queries. Systems—query by the US Army Instruments, current Files; the Experimentation partially Contact [Data]: E.5 three algebra variants and proposed as relational quantification the the algorithms. universal most main perfor- and compare enforcement, algorithm recently of the their division, Analytical (semi-) that because in SQL Management]: is One and we investigate existential the study universal Descriptors: as hash attributes. among division Thus, language available been Texas of our query IRI-8912618, number C-Q518), Subject as fast by adding result for many strategies. differences relations. efficiency Algorithms, IRI-8996270, order plans [Database guages; This second formulations evaluation Categories equal A in a database indirect poor with integrity the fact that in particular algorithms relational algorithm, proposed despite capabilities, we describe for each substantial recently article, referential execution predicate be supported For parallel quantification can In this and the systems of set-valued algorithm removal illustrate database and inference quantification quantification. quantification evaluation and universal proposed duplicate memory, in most processing databases. recently universal mance that for large one directly query relationships has not been explored algorithms operator to a system’s of many-to-many this have known is not supported power to lists, 0362-5915/95/0600-0187 ACM Transactions are or all of this not made work title of the publication, of ACM, Inc. To copy requires prior specific for personal or distributed for or classroom profit use is or commercial and its date appear, and notice otherwise, to republish, to post permission and/or is on a fee. $03.50 on Database Systems, Vol. 20, No 2, June 1995, Pages 1S7-236 188 . G. Graefe and R. L. Cole 1. INTRODUCTION Quantification database which Similarly, the queries. division example research request using a relation query is to find list, there R that those “for-all” algebra evaluation customers indicates who which all do not meets directly or efficiently provide the it, consider which A such that for each criterion in R matching A and B. In relational of R by the list of interesting of universal quantification relational quantification of a given customer i.e., predicates includes universal systems satisfy algorithms. quantification, complex complex and quantification, semi-join universal evaluate most databases, customers a tuple supported not well-known complete for query most to find exists to and existential or algorithms to implement operation of a universal quantification query, query is a division application a typical are system sets support using required Nonetheless, querying also requires a relationally operator relational As an for as SQL completeness of a database 1972]. division, concept such be implemented relational ability [Codd languages easily can However, the is a powerful query a market list of criteria, criteria. This B in the given algebra, this criteria. In fact, the query is and relational division, which in any commercial relational database system. Other computer can easily be imagined. Which students have taken all courses required to graduate? Which courses have been examples science can provide all by all students of a research group? Which suppliers assembly? Which parts are available from all regional for a certain taken parts suppliers? Which Which skills applicants common are possess all to all skills recent required hires? for a new job opening? In order to illustrate the ubiquity of useful universal quantification and relational division queries, are intentionally based on the relational schemas frequently these questions used for simple database examples. In fact, quantification induces a pair of universal many relationships underlying the first set-valued only the queries with a restriction and to “recent Given many-to-many relationship only the two many-toabove. Similarly, any a pair of universal quantification queries, not attribute suggests “skills’’-attribute underlying the last two questions. Many of these answer very useful questions in the real world, in particular when combined science” any queries, not four questions the power on the divisor, e.g., the restrictions to “computer hires.” of universal quantification and relational division to analyze many-to-many relationships and set-valued attributes, why have they been neglected in the past? We believe that this omission has largely implementation algorithms. This article been due to the lack of efficient attempts to remedy this problem by surveying algorithms for relational (all data fit in division and comparing their suitability for small data volumes memory), results to all complex ments ACM for large is that prior algorithms cases as demonstrate Transactions data volumes, a recently well that on Database proposed by its as in and for algorithm generality parallel relational Systems, Vol parallel and query its performance execution division 20, No execution. 2, June Our hash-diuision called can 1995 be systems. computed primary is superior in simple Our as fast and experias a Fast Algorithms hash-based join performance enable the mented or semi-join can be of the achieved optimizer to by relational same only detect for lJniversal Quantification if pair and relations. language universal division, of input query and fastest this formulation that relation 189 However, query quantifications if the . can division be imple- algorithm is used. Because universal glected in for supporting not in past database tional universal first rationale tend R in a difference product plan ignore four and relational be very and alternative rela- operators. negated existential functions. Fourth, bother? B) and S(B) are relations, then [Maier 1983]. However, Cartesian this one must We require matched will present relational against as a join. expression Cartesian to be is as expensive equivalent inefficient. this so why which on the do not explicitly First, relational aggregate ne- I?( A, operation, that division using using been rationales systems. other expressed frequently, if large, of have traditional evaluation terms x S) – R) based is plans very correct: to be very operator therefore in division address query can used is products evaluation in be expressed – ~~((m~(l?) dividend evaluation and expressed not relational and quantification it can are R + S = m~(R) query name quantification Third, predicates The be and us languages can quantifications. for-all let universal query division Second, quantification research, using more the Thus, a a Cartesian efficient product operations, division in our query and will performance comparisons. The second queries as we will clauses the rationale expressed see in and tifications are much systems, the in algebra expression above. B-trees, the plan EXISTS” Section The query clauses third two simple specify this such proposed not in At if both plan inputs are naive division In fact, many in two to quan- evaluation subqueries; similar quantification to the query using in Furthermore, as universal article. any query even universal SQL. choose this quantification best, in unnest complex, EXISTS” queries, queries will very “NOT complexity, correctly therefore a universal nested example Given evaluation be similar rationale queries be discussed and complex. ing our a query for quantification are those “NOT the EX- relational indexed with clustering query with two algorithm “NOT discussed in 2. division cation will for do a universal results example, recognize ones universal quantifications for and optimizers formulating clauses to not divisions, than although predicates. hard will relational slower SQL, required are relational correct, existential query typically and commercial ISTS” 2. In queries optimizers plans Section of additional quantification also negated subqueries absence query is using is based in Section relational Moreover, division by on division means in rare inputs be duplicate-free, valid, and SQL formulations and their query 2. Unfortunately, as we except also aggregation of will formulations queries see in based the aggregate referential ACM Transactions aggregation results in integrity on Database between Systems, poor requires the will quantifi- are evaluation, performance relational plans of universal performance functions circumstances—competitive that on for evaluation also very implementperformance that inputs Vol. 20, No. 2, June both hold, 1995. 190 . and G Graefe and R. L, Cole that the Whenever the query the processor divisor relational division a separate step, systems. Thus, important for use is the result inputs which hash-based, will not hold is particularly direct not of a selection, and must expensive algorithms that modern, parallel database rationale seems to be the sort-based, referential in avoid aggregation. integrity parallel than explicitly query aggregation systems between be enforced they in evaluation are even have been more in the past. The fourth tification not and at all relational clear what quantification SQL not cation if used, or support on-line the in following relational algebra because algorithms’ decision support, context that the quantification. also (as opposed universal Moreover, in constraints bases, and or universal the to other data with as by Carlis [ 1986] its model. about that serve those and that models can such by model conclusions algorithms level quantification data familiar and operators to algorithmic) quantifi- knowledge relational is most here the in many-to-many in general for reader Furthermore, express integrity mining, of the applicable quantification to to grow. algorithms considered are data is quantification earlier. complex it universal universal discussed be expected the complex analyzing of quantification must assume sophisticated conceptual as enforce that universal in quan- However, possible are While universal used, environments, that algorithms Is it valuable attributes, chose one: and they time? very examine performance universal more We we the because systems particular important taught is effect. a long importance sections division. However, what are set-valued programming, The run most rarely shunned to database quantification and division and next-generation are in transaction-processing relational relationships on sets are tend be useful and logic is cause queries and, may division the require as a basis explored and for on a by Whang et al, [1990], Section 2 gives relational division. discussed in an algorithms using memory. parallel Section cost for both Section final files 5 4, for comparison. contains been called and proposed intermediate of plans 7 uses the experimental a summary and results the these larger algorithms to systems. and derive cost formulas comparison our is of distributed-memory evaluation for hash-di~ision modifications adaptations Section An have consider and query algorithms. section we inputs shared-memory four that algorithm describes complete the performance 8. The In algorithms proposed Section 6, we illustrate analytical Section 3. temporary machines, formulas of recently Section than main overview A conclusions In analytical for follows from any in this research. 2. EXISTING ALGORITHMS most clearly, we introduce two example In order to describe the algorithms database queries that will be used throughout this paper. Assume a university title ) and Transcript ( student-id, with two relations, Course ( course-~zo, course-no, are interested grade) in with finding the the obvious students key attributes. who have For taken ACM Tramsact,ons on D~t~base Systems, Vol 20, NO 2, June 1995 the all first courses example, offered we by Fast Algorithms the university. As an English query relational In this algebra, this example, projection The tient. the query is the attributes of the in the 191 . database, this is courses (courseappears in the is of Transcript The divisor, divisor the example. The quotient that are not in the divisor, relations in projection of Courses the Transcript such that for all this student-id and course-no find the students ( student-id’s) no’s) in Courses, a tuple with relation. Transczzpt In over for Universal Quantification in is the division are called and dividend result is called attributes, divisor the the quo- course-no are the attributes of the dividend in the example. The set of quotient attributes student-id attribute values in the dividend, n,t .~,, ~.,~ (Transcript) in this example, is candidates. The set of all dividend tuples with the called the set of quotient candidate group. same quotient attribute value is called a quotient It is important to notice that both relations are projected on their key attributes in dividend or the take same the this an additional key division attribute be properly employed desirable We also assume, for this integrity tion. values a priori; semi-join) must As our second taken all the holds It is important known string have queries above the that same seem does not may very well dividend the validity of the division explicitly enforce it. example, we courses, For restrictive effect almost hold be transcript courses for this example, a in finding the which title the e.g., selection considered entries referential Courses that integrity do not refer the by two constraint difference has ramifications when division is implemented tions, which will be described shortly. Table I summarizes the cases that must be considered have a prior would example assumed In the example, to database (a contains a semi-join, While relation, who attribute is restricted be step students divisor here. all constraint a preprocessing divisor, the that Cozu-se rela- the the restricted a referential namely in on the identical, the interested i.e., represents integrity or expenin Thus, divisor, referential the must to duplicates relation appear algorithm either can be quite operations as the for are also require or the inputs courses. and relation the “Database.” Other between and to would is insensitive the Transcript students the duplicates, them removal that in permitted of Transcript on studentremoval step. As we will contain duplicate that Transcript otherwise, database selection. example, between in the may tuples were relation be able to handle an algorithm relationship constraint course-no Transcript division must Performing very many-to-many the of duplicate if students term and the projection require an explicit duplicate preprocessed. sive, making its inputs. problems However, times, of relational algorithm the arise. multiple would inputs Thus, do not course id and course-no see, if the example. divisor there courses. using This aggrega- separately in a complete analysis of relational division algorithms. In order to shorten the discussion, we only consider cases in which either both or none of the two inputs may contain duplicates. Thus, in the analytical and experimental ACM TransactIons on Database Systems, Vol. 20, NO 2. June 1995. 192 . G. Graefe and R. L. Cole Table I. Cases to Consider Vlolatlons referential Case for Relational of Division Duplicates mtegrlty Algorithms Duphcates in the diutdend 1 No No No 2 No No Yes 3 No Yes No 4 No Yes Yes 5 Yes No No 6 Yes No Yes Yes Yes No Yes Yes Yes 8 performance comparisons, we only consider four in the dwmor cases, namely cases 1, 4, 5, and 8. Let us briefly speculate which practice, a dividend relation will many relationship quantification between queries of these cases will often be a relation two entity will find the types. In other an entire set or stored query. frequently, into will e.g., our the relational foreign keys, often divisor duplicate where be students key Before point not are attribute discussing out that example there are not query. permitted arise, to take useful, from the e.g., the in with between other the same hand, the modified course keys require relational division tions Student (student-id, but actually name, major) and a query for the students major. The crucial difference quantification queries do not. Consider and Requirement who have taken all courses here is that the requirements and in the inputs dividend and problem example twice, more E-R-model of above requiring term that is not present in the dividend. specific relational division algorithms, we would are some universal e.g., much the I, duplicates integrity On universal type, e.g., our first be used are represented of Table referential assumed. sometimes often will Translated relationships while be many of one entity subset of instances of another We believe that queries using alternative classification a problem, does subset where the typically removal additional second into as divisor the model, and not can relation Instead, words, set of instances students, that are related to all or a selected entity type, e.g., courses or database courses. example arise most frequently. In representing a many-to- that an like to seem to two additional rela(major, course-no) required for their are different for each student, depending on his or her major. This query can be answered with a sequence of binary matching operations. A join of the Student and Requirement relations projected on the student-id and course-no minus the relation an be projected on student-id’s to obtain a set of students Transcript Student who have not taken all their requirements. The difference of the relation ments. with this finds the The relational algebra students who have satisfied all their expression that describes this plan is ‘?,tudent.,d,course-no( ACM TransactIons on Database Systems, Vol Transcript)) 20, No. 2, June 1995 require- Fast Algorithms for Universal Quantification 193 . This sequence of operations will have acceptance performance because it does not contain a Cartesian product operation and its required set matching algorithms (i.e., join and difference) all belong to the family of one-to-one match operations etrized join 1986]. In that fact, student-id can be implemented algorithm, e.g., hybrid since the attributes, by a single matching The following difference these join could two operations operation division efficiently hash for three algorithms When sort operations and hash-based suitable index et al. be executed could inputs by [Graefe more Shapiro on the efficiently 1993]. on the existence of indices might provide a mechanism indices could be used instead in the sort-based algorithms, in particular indices could be used to the hash-based structures param- 1986; hashing be executed do not depend or other associative access structures that fast or ordered access to tuples. Sort-based explicit indices, by any suitably [DeWitt are available, they for of clustering algorithms. can speed up the affected algorithms, e.g., by rendering a sort operation unnecessary. However, it is not always possible to choose a suitable clustering. For example, keeping the Transcript useful, table but in as for alternative sort example) is equally queries. Any our any order sorted representing on the likely table example table other part to be chosen representing and clustered on a many-to-many of the table’s to enhance a many-to-many join student-id’s is relationship, key (cow-se-no performance relationship the in the for other implies the same problem. Similarly, ing) would multi-table clustering (sometimes called “master-detail” be useful and could improve the performance of some clusterrelational division algorithms. Unfortunately, it is available in only a very few database systems, and it has the same problem for many-to-many relationships as clustering by sort order: one-to-many relationship, can be clustered in two while a master-detail a table representing ways, For generality, relationship is inherently a many-to-many relationship we therefore clustered relations. Moreover, and more importantly, indices typically diate query results. In order to ensure that our apply to both permanent, ing results, we discuss advantages. database inferred cessing division in database systems and physical and orderings can be the cost of prepro- as sorting. 2.1 A Naive Scwt-Based The first dividend attributes stored relations and to intermediate query processand analyze division algorithms without any such The cost of relational such presume do not exist on intermealgorithms and analyses designs that could include suitable structures from the provided cost formulas by subtracting steps do not a Algorithm algorithm directly implements the calculus predicate. is sorted using the quotient attributes as major and as minor sort keys. In the examples, the Transcript First, the the divisor relation is sorted on student-id’s and, for equal student-id’s, on course-no’s. Second, the divisor is sorted on all its attributes. Third, the two sorted relations are scanned in a fashion reminiscent of both nested loops and merge-join. The dividend serves as outer, the divisor as inner, relation. The dividend is ACM TransactIons on Database Systems, Vol 20, No 2. June 1995 194 . G. Graefe and R. L. Cole 1+ Student Student Course Jack Intro Jill Database Jill Intro to Databases Jill Intro to Graphics Jill Readings in Databases Joe Database Performance Joe Intro to Compilers Joe Intro to Databases Jill to Databases Performance I Readings + Dividend Fig. in Databases I 1. Sorted division exactly tuple, and once partially for each candidate quotient tuple which does not participate in the quotient, Differently than in merge-join, both scans can be advanced when an equality match has been The second dividend divisor example. Figure string and values tion, This 1 shows dividend these relation tuples, in would no. Concurrent tables divisor Figure contain scan marked are sorted 1 because typically scans is scanned a tuple e.g., a Transcript algorithm’s three the may divisor naive scanned any of the the into Quotient quotient actually however, found, once, whereas inputs = Divisor of the “Jack and of the divisor determine has not taken the “Database such Dividend, are be identifying Divisor, for tuples naive easier keys does not of a graphics ignores properly they that tuple logic once entirely to read; such in the as match course extraneous and in with in Quotient. The We a real (there the records. division. student-id dividend for each and show applica- course- is only one) that “Jack is not part of the quotient because he Performance” course. A continuing scan through the “Jill” tuples in the dividend and a new scan of the entire divisor includes “Jill” in the output of the naive division. The fact that “Jill” has also taken an “Intro to Graphics” course is ignored by a suitably general scan logic for naive division. Finally, the “Joe” tuples in the dividend are matched in a new scan of the divisor, and “Joe” is not included in the output. Essentially this algorithm was proposed by Smith and Chang [19751 and other early papers on algorithms for executing relational algebra expressions. analyzed in the performance comparisons later in this It is the first algorithm article. SQL Formulations In the introduction, quantification and existential very poor ACM Using EXISTS” Clauses we claimed that the typical relational division queries, quantifications, performance. TransactIons Two “NOT on Database SQL formulation which employs leads not only to confusing complexity For example, presuming that “NULL” Systems, Vol. 20, No. 2, June 1995 of universal two negated but also to values are Fast Algorithms prohibited, [1994] the first example query for Universal Quantification is Date tl,student-id FROM SELECT DISTINCT WHERE NOT EXISTS ( SELECT’ FROM Course c WHERE NOT EXISTS ( SELECT * FROM Transcript t2 WHERE t2.student-id = tl.student-id AND t2.course-no = c.course-no)). In order to process this SQL query, and Darwen Transcript as many 195 o [1993]; and O’Neil t] as three nested loops will be The used. The outermost one loops over all unique values for t1.student-id. middle one loops over the values for c. course-no. Notice that these two loops correspond to the Cartesian product in the relational algebra expression given in the introduction as well as to the two loops in naive division. The with innermost loop determines whether or not there is a tuple in Transcript and c. course-no. Since this loop is essentially a values of tl.student-id search, an index on either attribute of Transcript can be very useful. using B-tree indices, many of the If Transcript and Course are clustered the index searches run-time Thus, when compare for 2.2 of the be two “NOT fast. other algorithms Since the naive scans of the with Division division divisor, the sity into counts enforcing example as many (tuples SELECT best case, will the against case execution be similar of the naive typical logic to naive division, SQL” and division. we also work-around by Aggregation algorithm it may requires be rather in the both large inputs and repeated inputs, One such In fact, in and it seems alternative uses most relational counting is the most efficient way to express is left to the user to rephrase all universal which is a likely of duplicate tuples referential integrity. query can be expressed courses for algorithms. counting. courses as there relation).” In as “find the are courses SQL, relation, that all Transcript are no “NULL” values, this t.student-id FROM Transcript t BY t.student-id COUNT (t.course-no) = (SELECT This query university are the subquery courses taken sorting slow aggregate functions, due to the presence (different) duplicates in either no’s, and that there GROUP HAVING best to search for alternative or, more specifically, quantifications e.g., incorrect taken this clauses algorithms database management systems, for-all predicates. However, it a predicate The first In EXISTS” quantification, Implementing worthwhile aggregations very comparing these universal can source of errors, or the omission of students offered assuming that tuples refer query is COUNT (course-no) who have by the univerthere to valid FROM are no course- Course). is evaluated in three steps. First, the courses offered by the counted using a scalar aggregate operator. This step replaces expression with a constant. Second, for each student, the are counted using an aggregate function operator. Third, only ACM TransactIons on Database Systems, Vol 20, No. 2. June 1995. 196 . those G, Graefe and R. L. Cole students whose number of courses taken is equal to the courses offered are selected to be included in the quotient. If the dividend and divisor relations do not contain unique example, it would dent-id’s and same course-no’s database that case, would course the be be necessary part to explicitly counted. twice, For if the example, first time Transcript relation should of the However, the key. request include before grouping The second in dividend is the SELECT t.student-id t.course-no BY query can be expressed Transcript (DISTINCT COUNT c.title as “find t, Course AND c.title t.course-no) (DISTINCT LIKE illustration would would words FROM = c.course-no COUNT WHERE For the grade. attribute, In which projection Tran- of the students courses who have offered by the c LIKE ‘?% Database’%” t.student-id (SELECT they a failing and counting. example WHERE HAVING stu- take of the key is being removed, in the dividend is required taken as many database courses as there are database university.” In SQL, this query can be expressed by GROUP may a term of as in our of the students resulted script on student-id and course-no; since part duplicates are possible and duplicate removal keys uniqueness some number = course-no) FROM Course “7cDatabase%”). purposes, be necessary we included if duplicates two “DISTINCT” could exist. key words A good query where optimizer infer key and uniqueness properties and ignore these “DISTINCT” key where appropriate. More important, however, are the additional “WHERE” clauses. First, since referential integrity between dividend and divisor does not hold, an additional join clause must be specified. Second, the musk be specified on both levels restriction on the divisor (on “04DatabaseVO”) of subquery nesting. Comparing the and the second count third step only courses, query steps of becomes those evaluation the plans plan above significantly tuples the aggregate and Courses restricted The scalar aggregate from function for remain more the Quantification Queries SQL virtually relation be preceded queries, the Since that it can the same, is but first the important refer by a semi-join to database courses. operator for the divisor easily, e.g., using a file scan, and similarly function and the possible semi-join require ourselves only this section, we will concern Can All Universal two complex. Transcript must the to to database of Transcript be implemented quite the final selection. The aggregate more effort; in the remainder of with these be Expressed operators. by Counting? While it seems to work for the example queries, one might ask whether all universal quantification can be expressed by counting. The essential, basic ideas for a proof are that (1) universal quantification and existential quantification are related to each other as each is negated using the other, and (2) existential quantification is equivalent to a count greater than zero. Negated, ACM Transactions on Database Systems, Vol 20, No 2, June 1995 Fast Algorithms existential quantification universal equal to to zero. the count all This, maximum to a count is equivalent in turn, to the is equivalent possible, i.e., the in the set over which of items universal tion is equivalent quantification quantification for Universal Quantification to a count count of to zero. of qualifying Similarly, items items items is universally can indeed 197 non-qualifying of qualifying the query queries equal count . is equal equal to quantified. be rewritten using the Thus, aggrega- and counting. Since this proof idea is quite simple and very intuitive, formal proof here. In order to capture all subtleties “NULL” universal values must quantification be taken into and relational proper division consideration. queries will pattern given in the introduction, i.e., analyze and set-valued attributes, and therefore involve “NULL” values in the rest of this article. Division The Using Sort-Based traditional way we do not develop a of SQL semantics, Because most follow the query many-to-many key attributes, relationships we will ignore Aggregation of implementing aggregate functions relies on sorting Afterwards, the examples, the count of courses Transcript is sorted on attribute student-id. taken by each student can be determined in a single scan. The exact of this [Epstein 1979]. file In logic scan is left to the reader. An obvious optimization of this algorithm is to perform aggregation during sorting, i.e., whenever two tuples with equal sort keys are found, they are aggregated into one tuple, thus reducing the number of tuples written to temporary files [Bitten and DeWitt 1983], We call this optimization early aggregation. If the query requires a semi-join prior to the aggregation as in the second example, any of the semi-join algorithms available in the system can be used, typically merge-join, semi-join versions, must be sorted from the must student-id’s loops exist. on the divisor grouping relation nested if they attributes (quotient) be sorted join, first index If merge-join nested loops is used, notice for the semi-join, attributes. In on course-no’s the for join, or their that the dividend which are different example, the the semi-join has been used in most database Transcript and then on for the aggregation. Since sort-based INGRES [Epstein aggregation 1979] and DB2 [Cheng et al. 1985], it systems, e.g., is important to understand its performance when used for relational division. Division using sort-based aggregation is the second algorithm analyzed in the performance comparisons below. Division Using Hash-Based Aggregation Sorting actually results in more order than necessary for many relational algebra operations. In the examples, it is not truly required that the Transcript tuples be rearranged in ascending student-id order; it is only necessary that Transcript tuples with equal student-id attribute values be brought together. The fastest way to achieve this uses hashing. Thus, several hashbased algorithms have been proposed for join, semi-join, intersection, duplietc. (e.g., in Bratbergsengen [1984], cate removal, aggregate functions, ACM Transactions on Database Systems, Vol. 20, No. 2, June 1995. 198 . DeWitt G. Graefe and R. I_. Cole et al. Kitsuregawa [1984], others ). The most hashing modified and Shapiro efficient Gerber [1986], technique [DeWitt et al. with techniques [Kitsuregawa Hash-based main DeWitt et al. [1983], for inputs 1984; DeWitt for managing et al. 1989; Nakayama aggregate functions memory hash table, but [1985], Zeller F’ushimi and Gray larger than and Gerber non-uniform tuples, al. [1986], and many memory is hybrid 1985; Shapiro 19861 hash value distributions et al. 1988]. keep tuples of the not input et [1990], output The output relation relation if in a contains the grouping attributes. student-id in the examples, and one or more aggreor both in the case of an average gation values, e.g., a sum, a count, computation. Each tuple with tuple. When input matching is either aggregated attributes, input is consumed, the entire is contained tuple grouping in the hash into an existing or it is used to create the result output a new output of the aggregate function table. If the aggregate function output does not fit into main memory, hash it. Since the Oz)erflow occurs and temporary files must be used to resolve table contains tion input with only fit a total tuples. of Thus, files, for on those of larger sorting than If using et al. the than on hybrid necessary i.e., sort-based 1/0 overflow Shapiro If can students table need hold only 1/0 for temporary based [Knuth the size be used for 500 sorting aggregation are is as considered also on of 1973] output techniques 1986] aggrega- 500 requirements selection hashing. the are without aggregation The that if there hash well, replacement 1984; is not examples, tuples, quicksort. based memory, it the performs files using aggregation [DeWitt In Transcript aggregation runs tions output, memory. 10,000 much larger duplicate main hash memory-sized based aggregation into table hash similar only to slightly for join opera- aggregation and removal. the aggregate function is preceded by a semi-join as in the second example query, the semi-join can also be implemented using hashing, The hash table used for the semi-join is different from the one used for aggregation, just as sort-based semi-join and aggregation require two separate sorts on different attributes. The hash hashing on course-no’s, whereas on hashing studen t-id’s. Let us briefly duplicates the sort filter that sort can in consider either operation compares be duplicates used and duplicates input or, relation if the aggregate remove tuples again. can inputs consecutive to table in the semi-join is built and probed by the hash table for the aggregation is based are tuples. be the already In duplicates, at the In naive eliminated sorted, sort-based although same time. division algorithm, conveniently by other the cannot words, of a simple aggregation, sorting In as part inserting initial remove if duplicate Transcript relation is required, the early aggregation can be used for duplicate removal but the actual aggregation (counting course-no’s) cannot be integrated into the sort operator. Similarly, hash-based aggregation cannot include duplicate removal, since only one tuple is kept in the hash table for each group. While efficient duplicate removal schemes based on removal in hashing in main exist, they require that the entire duplicate-free output must be kept memory hash tables or in overflow files. Thus, duplicate removal ACM the TransactIons on Database Systems, Vol 20, No, 2, June 1995 Fast Algorkhms based on about the sorting hashing and in of lower and hash-based et costs large as used 1986; only in 199 relation, removal a small et [Zeller third dividend duplicate . with based on 1993]. DeWitt SQL is the a very [Graefe been al. NonStop aggregation for operations has [DeWitt Tandem’s 1/0 CPU aggregation Gamma 1994], be expensive number slightly Hash-based e.g., may same for Unwersal Quantification al, and algorithm number 1990], Gray 1990]. analyzed of systems, Volcano in [Graefe Division using performance com- parisons. 3. HASH-DIVISION This section duced as algorithm. algorithm naive contains a description has?l-diuision The design for universal division but in proposed algorithm division Algorithm Description followed was motivated quantification fits on and hashing. into II a classification pseudo-code for the hash-division one the and kept divisor divisor table, the each tuple in the with of for division, shows intro- discussion by a search the a direct similar where of universal 2 gives the algorithm a the to recently quantification and algorithms. Figure called by relational Table tables, for proposed [ 1989], of hash-division based relational of a recently Graefe one for the algorithm. quotient. quotient table, called second first hash table is table. An integer value is the divisor number. With the divisor It uses two hash The each tuple in the quotient table, a bit map is kept with one bit for each divisor tuple. Bit maps are common data structures for keeping track of the composition of a set; since quotient candidates natural component FiDwre of division 3 illustrates are the that the purpose of relational division is to ascertain which are associated with the entire divisor set, bit maps are a same the as those inputs algorithms. the two hash in Figure be sorted. The tables used in hash-division. 1; hash-division, divisor table tuples and associates a divisor number the right contains quotient candidates, however, on the left The inputs does not contains taken incompletely not appear there hashing on tuple. the left either and two “Database” maps. The courses hash no compilers tuples tuple bit hash-division divisor tuples, one filled in was The only and and graphics the attributes. assigns the In the current a unique At of this first end Figure step, in the the divisor than First, The hash bucket is number divisor table their do that relation steps. algorithm by databases determined three the for each and “Joe” indicated immediately as a tuple’s divisor is other in process, count Thus, the course table. is inserted, in it was proceeds divisor courses on subjects because algorithm into all table, divisor with each item. The quotient table on obtained by projecting dividend tuples on their quotient attributes, and a bit map for each item indicating divisor tuples there has been a dividend tuple. The fact that “Jack’ have require all it counts divisor number is assigned is complete, inserts determined to the by divisor when each all the divisor as shown on 20. No 2, June 1995 3. ACM TransactIons on Database Systems, Vol 200 G. Graefe and R. L. Cole . Table 11. Classification of Universal Based Du-ect Indirect Nawe (counting) semijoin Sorting by aggregation // step 1: build assign divisor with the divisor on hashing Hash -du,ision Hash-based duplicate duplicate hybrid hash join, based aggregation aggregation removal, hash- table count +-- zero for each divisor tuple insert divisor assign tuple’s increment Based merge-join sorting Algorithms divwon with removal, and Quantification on sorting tuple into the divisor divisor number the divisor table hash bucket divisor scan appropriate if no matching in the divisor table tuple is found hash bucket in the quotient quotient (candidate) create new quotient from quotient candidate attributes of dividend with to divisor /i step 3: find result in the quotient in quotient tuple’s tuple zeroes insert it into hash bucket of quotient set bit corresponding table tuple is found tuple incl. a bit map initialized for each bucket bucket tuple scan appropriate if a matching appropriate count count // step 2: build the quotient for each dividend table’s +-- divisor table divisor number table table for each tuple in bucket if the associated print quotient Fig, 2. bit map contains no zero tuple The hash-division algorithm, In the second step, the algorithm consumes the dividend relation, For each dividend tuple, the algorithm first checks whether or not the dividend tuple corresponds to a divisor tuple in the divisor table by hashing and matching the dividend tuple on the divisor attributes. If no matching divisor tuple exists, the dividend tuple is immediately discarded. In the second example query, a student’s Transcript tuple for an “Intro to Graphics” course does not pass this test and is not considered further, and the algorithm advances to the next dividend tuple. If a matching divisor tuple is found, its divisor number is kept and the dividend tuple is considered a quotient candidate. Next, the algorithm determines whether or not a matching quotient candiACM TransactIons on Database Systems, Vol. 20, No. 2, June 1995 Fast Algorithms Divisor Database for Universal Quanhfication . 201 Bitmap Number Performance ’21 Fig. 3 date already dividend exists tuple exists, e.g., Divisor in on the because table the quotient the tuple is quotient attributes, and the the new Together tuple for bit the that matching to set step, the with in the one bit in divisor table Finally, the can third exactly contains no zero. quotient Algorithm by tuple, hashing no empty map is created map is initialized number bit all At on the the quotient If, needs the end right bit for zero’s, earlier. that map. as shown on one with a new tuple into with kept exists, candidate’s beginning, is inserted the candidate dividend bit divisor matching quotient the the tuple and such at projecting already is complete, division. a bit This tuple quotient is by in hash If quotient to the table consists the the table new candidate quotient table created table. corresponds quotient the divisor table attributes. quotient candidate divisor and quotient quotient quotient table. ~ in the each except however, a to be done is of this second Figure 3, and be discarded. step determines of those This tuples set can in easily the the quotient quotient of the table be determined for two inputs, which the by scanning all which bit map buckets in table. Discussion A number the divisor so, the first of observations can be made on hash division. can be eliminated while building the divisor step of the algorithm must be augmented duplicate in the divisor table’s hash bucket the divisor table, which is the standard duplicate removal and aggregate functions. First, duplicates in table. In order to do to test for an existing before inserting a new tuple into technique used for hash-based Second, duplicates in the dividend are ignored automatically. Two identical dividend items map to the same divisor tuple and therefore the same divisor number; moreover, they map to the same quotient candidate, and therefore to exactly the same bit in the same bit map. To appreciate this maps, compare this method of detecting and ignoring duplicates effect of bit in the large input, the dividend, with duplicate removal in the indirect strategy using hash-based duplicate removal. The latter strategy requires building a hash table that contains all fields of all unique tuples from the entire dividend. In hash-division, dividend tuples are split into two sections (vertically partitioned into the divisor and the quotient attributes), and the repetition factors ACM Transactions on Database Systems, Vol 20, No 2, June 1995 202 . in each G. Graefe and R. L. Cole section relate are entries than the hash instructive is In hash-based divisor table to relation step, for counter used employed in in with semi-join. aggregation the are The second holds instead if the between entirely replaced maps, the based dividend and however, prior used example queries classes. Typically, join is needed majors to Student sion permit step that and output mits the the the isons, the the big product of divisor contain more expect that TransactIons count with and memory on Database on divisor is the the requirements Systems, the dividend quotient, Vol in not sufficient and Even relation see the really as it though namely do not 20, No. 2, June performance the quotient pose 1995 of memory are all not the of quotient if algorithms, is a superset the the do the naive delivering compar- competitive. main quotient effect student-id’s, which perwithout such as Student e.g., are hash-divi- disadvantage property of the across counts sort-based useful e.g., and function The will algorithms quotient. than we input, and end typically a clear the a third as depends the relation have the of the individual attributes, names at input has the a set of a subsequent a third aggregate the relation. with However, that the table) consider available a third division quotient students’ implementations for quotient useful; and the taken modification permit built a third not with efficient (the have table aggregation division tuples the and could far table the hash-division involved; simple merge-join Recall eliminated to example, as the hash-join aggregation, on quotient. tables. func- integrity be removal, who are such aggregation with sort-based Finally, hash map are known to employ similar For quotient so divisor sort-based an immediate sorting The bit additional can on its result themselves The hash-based sorted table immediate hash is to be joined division their the referential and duplicate information Hash-based Thus, quotient useful an compares candidates. and of students stz~dent-id’s A fairly retaining as table the aggrega- function surprisingly, operations. student-id’s discussed join. in the candidates. the In contains quotient provide divisor not a hash student-id’s, relation. subsequent ACM the enables algorithm is, subsequent attach these hash-division for to find to the semi-join algorithm using hash-based aggregation. Fifth, hash-division creates be table used two tables. are ignored. If duplicates algorithm can be modified divisor, without hash-division can same are maps associated with quotient candidate tuples can be These are precisely the conditions required for division on aggregation that table, the using there quotient hash of divisor numbers and bit maps. divisor is also free of duplicates and the bit by counters. resulting hash maps division cases, and first i.e., serves tionality that duplicates in the dividend not to be a problem, the hash-division counters Fourth, the output, Bit hash-division. divisor bit removal, against both the memory-efficient duplicate In the and more hash-division semi-join, aggregation numbers much semi-join. prior the is hash-based they the contains in prior hash-division, divisor which compare with aggregation The tables, required aggregation tables. to be high. hash it hash-based tion these large Third, hash expected in quotient a major to hold both smaller relations of the Cartesian table may actually candidates, problem in we most Fast Algorithms for Unwersal Quanhfication If, however, the divisor table or the quotient table are larger occurs and portions available main memory, hash table overflow than cases. both tables must hash-division section, can we processor take consider spooled advantage for system. In this rithms divisor) TEMPORARY to secondary hash of the systems are FOR LARGE FILES storage, a multi-processor handling Adaptations to multi-processor USING of techniques database algorithms 4. be temporarily In overflow hash-division discussed in of the one or Alternatively, system. table 203 . the in and Section next a single the other 5. INPUTS section we consider the modifications required for each of the algodiscussed in the last two sections if one of the inputs (dividend or or the quotient candidates do not fit into the available memory. For sort-based algorithms, semi-join, i.e., the underlying as described in the naive division sort operator literature, Salzberg et al, [19901. overflow techniques [Fushimi et al. 1986; For e.g., hash-based can be used, and sort-based aggregation e.g., overflow Sacco 1986] aggregation must perform a disk-based in Graefe [1993], Knuth (possibly and semi-join, avoidance combined and merge-sort [1973], and standard or fragmentation with bucket tuning and dynamic destaging [Kitsuregawa et al. 1989; Nakayama et al. 1988]) or hybrid hash overflow resolution [DeWitt et al. 1984; DeWitt and Gerber 1985; Shapiro 1986]. In the following, we outline the alternatives for overflow management in hash-division. If the available memory is not sufficient for divisor table and quotient table, the input tions that data must one at a time. The partitions spooled are first first partitioned the partition called quotient range-partitioning or into disjoint phases. subsets The partitions is kept in main memory files, one for each quotient partitioning, attributes using hash-partitioning. For while partition, the parti- the other in a way there dividend a partitioning the called are processed join [Shapiro 1986]. For hash-division, which can be used alone or together. strategy, on in multiple to temporary similar to hybrid hash partitioning strategies, In the be partitioned can be processed are two relation strategy example is such queries, the as set of student-id’s would be partitioned, e.g., into odd and even values. Each phase which is the quotient of one dividend partition produces a quotient partition, and the divisor, The quotient (disjoint union) of example, the of students students with set set of students Since table of quotients who the entire division partitions. have The the taken all dividend the concatenation into database the courses concrete is the student-id’s who have taken all database courses student-id’s who have taken all database with even divisors, strategy, relations it certainly called divisor using ACM the the entire divisor, all phases. While can be a problem partitioning, same Transactions partitioning on Database both function Systems, Vol of the courses. the divisor this is no if the divisor partitions set plus odd for small second is Translated all dividend partitions are dividend with must be kept in main memory during problem large. and all the applied is very divisor to the 20, No. 2. June 1995 204 G. Graefe and R. L Cole . divisor tions attributes. are For partitioned performs the division dividend and one Notice ing that and must the algorithm from the in the Courses and for divisor are For divisor of have taken courses all have mined undergraduate really taken partitions of finding is exactly the quotient the division the quotient partition- quotient partitions quotient tuples that that appear in all only students who all graduate database set can easily candidates problem from partition. for the This phase the and courses. one quotient i.e., the tuples result. Indeed, courses all database since the problem quotient database Only rela- Each files, different phase. Transcript courses. one partitioning, were produced by all single-phase divisions, quotient partitions, participate in the final the partition producing quite collection a final and graduate a pair input, partitions partitioning. gathered both undergraduate quotient divisor be example, into that again. be deter- appear Thus, in all in order to obtain the final result, each quotient tuple produced by a single-phase division is tagged with the phase number. The collection phase divides (in the sense of relational division) the union (concatenation) of all quotient partitions over the set of phase numbers. However, instead of using a divisor table to determine which Thus, the collection phase number Let bit to set in the bit maps, phase replace us compare tioning the first the divisor these strategies can skip two division strategies with is executed partitioning the is function on methods are somewhat similar grouping can be used. because to partitioning course-no partitioning the student-id in in their algorithms attribute, similar overflow by hash-based with a preceding hash-based semi-join. Divisor partitioning in the semi-join on the join attribute, Quotient number the numbers. partitioning used when the phase step of hash-division, input and partiaggregation is in to similar to the examples. the aggregate our examples. Since the and similar partitioning strategies could be used for hash-based aggregation and hash-division, we expect that the two methods exhibit similar performance for small and large files. In fact, the similar partitioning strategies also suggest that similar parallel algorithms can be designed, 5. MULTIPROCESSOR In this section adapted we as we will consider for multiprocessor how well systems. partitioning of partitioning processing partitions ACM methods can algorithms, are Transactmns practically section. on Database are assume equal Systems, four division discussion, parallelism of sets can include be those hash- and freely here that (pipelining between operaThe into disjunct subsets). range-partitioning; with either sizes. Vol algorithms we will (distributed memory) and shared-every[Stonebraker 1986] that go beyond the and synchronization overhead. on operators and sets, parallelism can be be combined we the In our exploited effectively by using program tors) and data parallelism (partitioning standard in the next IMPLEMENTATIONS differences between shared-nothing thing (shared memory) architectures obvious differences in communication As for all query processing based forms discuss 20, No 2, June 1995 sortone and since hash-based is used such both query that the Fast Algonthrns for Universal Quantification Parallel For Naive naive sort division, program and the parallelism division partitioning can be employed Parallel Aggregation Algorithms For algorithms based using outlined in DeWitt to be a promising referential integrity have between quotient below program for parallel the two partitioning and for hash-division. and data query parallelism execution, can e.g., those et al. [1990] and Graefe [1993]. While partitioning seems approach, there is a problem if a semi-join is needed for enforcement. Recall that the join attribute in the semi-join from the grouping Thus, the large to be partitioned for the aggregation. expensive. Parallel both techniques (e.g., course-no) is different aggregation (e.g., student-id). ) may be used Both as described on aggregation, standard can only operator. divisor script 205 13wision operators be applied . twice, Partitioning aggregation permits the once large a number attribute dividend for in the relation the dividend subsequent (e.g., Tran- semi-join relation of optimizations and twice that once can be are of interest here. input First, parallel aggregation does not require that the entire aggregation be shipped between machines. It is frequently more effective to perform local aggregations first (e.g., count within partition and finalize the aggregation and Courses relations the Transcript course-no, each student’s number on partition, local and counts dividend, then are can be summed but each machine Second, only only local have is partitioned about the (moved aggregation dividend partitions shipped ization of the aggregation For partitioned courses is partitioned The effect is employed and then after semi-joined within each student-id such that is that not the entire aggregation machines) and example, counted on size of the between partition) counts). been of counts existing local for each student, a relation if sort-based each (sum output from for the aggregation. to derive local counts, the across the network will already be sorted. Finalcan take advantage of this fact by merging a partitions and summing the local counts within a particular partition, avoiding another complete sort and aggregation of the partially aggregated input. Third, a worthwhile idea for hash table overflow resolution in parallel aggregation sites rather records without additional and duplicate than writing can removal is to send overflow buckets to their final them to overflow files. On each receiving site, the be aggregated creating memory. any new directly hash Considering into table that the existing entries and certainly hash without table, possibly requiring all shared-memory any machines but also many modern distributed-memory machines provide faster communication than disk 1/0, this seems to be a viable alternative that deserves investigation. To summarize, division using sort- or hash-based aggregate functions is most likely to be competitive if semi-join is not required. However, if a semi-join is required, the dividend relation must be partitioned twice across the interconnection network, once for the semi-join ACM Transactions on Database Systems, and shipped and once for Vol. 20, No 2, June 1995 206 . G. Graefe and R. L Cole aggregation, the although the Parallel thus significantly optimization increasing using local the cost aggregation in can most environments, be used. Hash-Division For hash-division, program separation of the division operator, division because operator. However, parallelism has only limited promise inputs’ producers and consumer the entire division is performed both partitioning strategies beyond the from the actual within a single discussed earlier for hash table overflow, i.e., quotient partitioning and divisor partitioning, can also be employed to parallelize hash-division. For hash-division with quotient partitioning, the divisor table must be replicated all in local and each result. lar the memories hash-division local division Clearly, since divisor processes In divisor instead tagging once dresses the central ized using the to the over quotient with this in can sizes. and ing the duplicate detailed over the cost solely relational of the number ACM indicated and of tuples TransactIons site of network the In step parallel instead divides addresses. collection we the can ad- set of all case that be decentral- performance claim that not database the poor query of strate- for appropriately quantification efficient hybrid languages is universal performance are universal and four comparison power algorithms optimizer, simple This imply division query of the division. and hash without join and creating a problem. query evaluation discussed above, or referential analysis integrity tedious—we as query well earlier as we in Table data paths each Systems, plans derivations plans in this up to four query flow from between operators 20, No 2, June 1995 For readers may readers skip each evaluation the are cost provid- explanations, section. path. Vol those and their variants Some that I. Tuples develop their enforcement. their show and including recommend evaluation algorithms, on Database the does if as the illustrate the FORMULAS relational our fact, in algorithms traveling COST division In included on the plans, collection analyze by the functions, division cases in However, processor network the and as fast removal this the are section, find focus chosen be for the AND support or throughput this and we to relational therefore In order division formulas section, and performance particuamong processed overflow. numbers, of processor PLANS database implemented relational tuples, quantification quantification non-trivial in partitioning. universal detailed other, division synchronization are table phase is a bottleneck, Beginning very with site EVALUATION for hash set 6. QUERY gies global machine, without partitions for the collection of the a shared-memory resulting tuples attached tuples be part shared replication, of each is complete. as discussed quotient are incoming in be After independently immediately can it processors. completely is trivial partitioning. the will table of in phases participating work result replication the multiple of all operators leaves labeled and of the plans to the with four for roots the Fast Algorithms Table III. Variables for Unwersal Quant}flcation Used . 207 in the Analysls Vartable DescrlptLon IRI Dividend cardinality Dividend kl size in pages Divisor cardinahty Divisor size in pages Quotient ~Ql cardinality Quotient size in pages Quotient candidate cardinality Quotient csndidate size in pages Memory size in pages Size reduction of divisor Size reduction of dividend by duplicate Size reduction of dwidend by referential Degree by duplicate removal removal integrity enforcement of parallelism We assume dividend relation R (I RI tuples in r pages) and divisor relation S (IS] tuples in s pages) with quotient relation Q (IQ I tuples in q pages). The projection of R on the quotient attributes is the quotient candidate relation C (ICI tuples in c pages), Obviously, IQI < ICI < IRI and q < c s r. Further, we assume m memory pages, with s + c < m. In other words, we only analyze cases in which divisor and quotient are smaller than memory. Given that the dividend relation is the Cartesian product of quotient and divisor, this restriction will cover most practical cases. All variables are summarized in Table III, together with those used in the analyses of duplicate removal, referential Our integrity enforcement, cost measure combines and parallelism. CPU and 1/0 without overlap of CPU and 1/0 activity. essential operations such as 1/0 operations to roughly match a contemporary workstation cost units. The cost units, their values are given in Table IV. For each of the four algorithms aggregation and semi-join, division and hash-division), duplicate removal then analyze also estimate Costs we first nor the some common for Original Inputs both measured (in milliseconds), and their (naive division, division by hash-based aggregation derive referential cost for costs, cost formulas integrity these for the enforcement preprocessing description by sort-based and semi-join, case that are steps. neither required However, and we first costs, and Final Output The cost of reading the divisor and dividend relations, the quotient file (copying), and writing the quotient is (r + s) X as times The cost formulas capture all and record comparisons. We tried when setting the weights of the SeqIO x WrRec + (IR + 1S1) X RdRec assembling + q X Moue pages + IQI + q x SeqIO. ACM Transactions of (2) on Database Systems, Vol 20, No. 2, June 1995. 208 G . Graefe and R L. Cole Table Unit Tune SeqIO RdRec WrRec RdTmp WrTmp Comp Aggr Hash Move Bit Map SMXfer Description random I/ O,onepagefrom sequential CPU 1/0, or to disk one page from cost per record when (incl. or to disk reading seek and latency) (incl. latency) a permanent file CPU cost per record when writing a permanent file file CPU cost per record when reading a temporary CPU cost per record when writing a temporary file comparison of two tuples initializing aggregation, or aggregation of two tuples calculation of a hash value from a tuple memory to memory copy of one page setting a bit in a bit map initializing or scanning a bit in a bit map transfer of one page between processes in shared memory transfer of one page in distributed memory 4 DMXfer Cost Units [ rns] 20 10 0.2 0,2 0.1 01 0.03 0.03 0.03 0.4 0.003 0.0001 0.1 RndIO IV. We omit this cost in the following cost formulas, because it is common to all algorithms and does not contribute to their comparison. In the analytic performance evaluation in the following section, we include this base cost in the diagrams to provide a reference point for the algorithms’ costs. Sort Costs Since several of the algorithms require sorting, we separate the formulas for the cost of sorting. We distinguish between input relations that tit in memory and those that do not. For the former mate cost function of Sort(S) := 2 x ISI x logz for relation S since it fits in memory. For relations larger than memory, we assume (ISI) we quiclcsort with X Comp + s X Moue assume a disk-based an approxi- (3) merge-sort with sequential 1/0 and algorithm with fan-in F, in which runs are written read with random 1/0. Its sort cost is the sum of sorting the initial runs using quicksort one memory load at a time and the product of the number of merge passes with the cost of each merge, both 1/0 and CPU costs, which is Sort(R) := 2 x Il?l X(r x log2 (lhl/(r/m)) x (Moue X (WrTmp + SeqIO + RdTmp X Cornp + RndIO) + logz (F) + hg~ (r/m) + II?] (4) X Comp)). Quicksort will be invoked r/m times, each time for lR1/(r,/m) tuples. The merge depth is approximated by the logarithm without ceiling to reflect the advantages of optimized merging [ Graefe 1993]. The cost of writing a run, e.g., an initial 6.1 Naive run, is included in the cost of reading and merging it. Division Figure 4 shows three query evaluation plans for relational division using naive division. The labels on the data paths between operators indicate the cardinalities of intermediate results, ACM Transactions on Database Systems, Vol. 20, No 2, June 1995. Fast Algorlthrns for Universal Quantification Naive Division %-’ Scan S Naive Division IRI Sort ISI Scan R If referential removal is not integrity required, replaced by counting. equation quotient This formula reflects dates groups, counting plan is case, size quite which Referential Query plans for naive It is only necessary in the Scan R division algorithm the of each similar will Integrity size of each group to the count plan the tuples is shown with in Figure case is R, dividing group, plan and R. This + ISI XAggr to the be discussed for the inputs and duplicate can actually be omitted and in this costs of sorting the division. to sort dividend + IRI x (Con-zp + Aggr) Sort(R) this Scan S (c) is known to hold sorting the divisor attribute 4a. The cost of naive same I IRI (b) Fig. 4(a–c). the Son-Unique 1s1 Scan R (a) comparing ‘=Jq/p Sort-Unique IRI Scan S 209 Division lsl/y/ %x’ Sort sort Naive . R into counting number using (5) + ICI x Comp. the of divisor sort-based quotient divisor candi- tuples, tuples. and Note aggregation that for the shortly. Enforcement Naive division can enforce referential integrity as part of its nested merging scans, if the two inputs are sorted properly, e.g., as shown in Figure 1. The plan this in Figure plan 4b shows naive division with two sort operations. The cost of is Sort(S) + Sort(R) + IRI x C’omp +( ICI – IQI) x The new cost components comparing consecutive candidate groups, and enforcement. The latter reflect + IQI x ISI X Comp 1S1/2 x Comp. sorting divisor (6) S in addition to dividend R, R tuples to divide the sorted dividend into quotient comparing R and S tuples for referential integrity cost is calculated separately for successful and unsuc- cessful quotient candidates. We presume that half of the divisor file is scanned for each unsuccessful quotient candidate before the failure is identified and the remaining dividend tuples of the quotient candidate are skipped over without pursuing the scan in the divisor. Note that we assumed that s < m; therefore, the repeated scans of the divisor can be performed without 1/0, Duplicate Removal If the inputs them, which contain duplicates, the sort operations must identify requires comparisons, costing (IRI + IS 1) x Comp. ACM TransactIons on Database Systems, and remove We presume Vol. 20. No. 2, June 1995. 210 G. Graefe and R. L Cole . that duplicate removal reduces the divisor size by a factor a and the dividend by a factor /3, as shown in Figure 4c. Thus, the cost for naive division with duplicate removal Sort(S) The cost integrity but without + Sort(R) + (IN + lS1/a X Aggr naive division for enforcement Sort(S) referential + ISI) X integrity Comp enforcement + lR1/~ is X (Comp +Aggr) (7) + ICI X Comp. with both duplicate removal and referential is + Sort(R) + (IRI + ISI) X Comp + lR1//3 X Comp (8) + IQI 6.2 Division ]S1/a X by Sort-Based X Comp + (ICI – IQI) X lS\/a/2 X Comp. Aggregation Performing a division by a sort-based aggregate function requires the elements in S, sorting and counting groups in R, and verifying counts for the groups the query evaluation the cardinality of R equal the count of S. Figure 5a (upper left) shows plan. Determining the scalar aggregate, i.e., counting of the divisor, and verifying that the correct for each group in the aggregate function costs IS I x Aggr on assume for simplicity that the aggregate function output procedure counting that the of the sort’s final merge step, the count was found + ICI X Comp. If we R is formed in the costs of comparing each its predecessor and the actual aggregation must be added to the cost of sorting, i.e., IR I x (Comp + Aggr). Thus, the total cost for division tuple of R with by sort-based Sort(R) Referential aggregation + IRI Integrity For sort-based is (Comp X +Aggr) + ISI XAggr + ICI X Comp, (9) Enforcement referential integrity enforcement, the inputs must be sorted on the divisor attributes and followed by a semi-join version of merge-join. Figure 5b (upper right) shows the corresponding query evaluation plan. The reduction factor of referential integrity enforcement is indicated by S. We presume that the scalar aggregate can be performed as a side-effect of the sort or at least can obtain its input from the sort; thus, we do not count the cost of two divisor scans, only of one. Thus, only the costs of two sort operations and of a merge-join must be added, which is (1R I + IS 1) X Comp. We assume that the entire S relation is kept in buffer memory during sort and join processing and ignore any cost of memory-to-memory copying in the merge-join since the join is actually a semi-join, which can be implemented without cop ying. Thus, the cost for sort-based aggregation with referential integrity enforcement is Sort(S) + Sort(R) + lR1/8 ACM Transactmns on Database + (IRI + ISI) x (Comp Systems, +Aggr) Vol X Comp + Sort(R/8) + IS I X Aggr 20, No. 2, June 1995 + ICI X Comp. (lo) Fast Algorithms for Unwersal Quanhficatlon 211 . Verify vScalarAggr Sort-AggrFct I IRIIc$ 1s1I Scal~Aggr Merge-Join Scan S Verify Ww sort Sort-AggrFct Scan R Scan S (b) Scan R Scan S IRI 1s1 IRI 1s1 sort (a) Verify J’-’”+!Sort-AggrFct ScalarAggr Verify “v J“” Scala~Aggr Islla AggrFct I IsI@/ 1s1 Scan R (d) Fig. 5(a-d). Query plans for division by sort-based aggregations Removal When the dividend must be performed Figure 5c-d (lower removal. IRI Scan S Scan R (c) Duplicate Sort-Unique 1s1 IRI Scan S plp Scan S Sort-Unique Sort-Unique 1s1 Merge-Join Sort-Unique I lR1lP Sort-Unique lRll@6 Islla A sort and divisor contain duplicate records, duplicate removal prior to aggregation in the aggregation-based algorithms. left and right) show the query plans including duplicate operator can perform either duplicate removal or an aggre- gate function, but not both. Thus, the figures show a separate function, which relies on sorted input and thus can identify groups aggregate by simply comparing tuples immediately following each other in the input stream. The common subexpression (sorting and duplicate removal from S) is shown twice but charged only once in the following formulas. The cost of sort-based aggregation with duplicate removal but without referential integrity enforcement is Sort(S) + Sort(R) + (ISI + + /S//cz X Aggr IRI) x C“omp + lR1/~ X (Comp + Aggr) ~11) + ICI X Comp. ACM TransactIons on Database Systems, Vol 20, No 2, June 1995. 212 G. Graefe and R L Cole . The cost ential of sort-based integrity aggregation enforcement SOrt(S) + Sort(R) + ISI) + Sort(R/~/6) 6.3 Division The is query to X Aggr one aggregation in the number average Recall that occurs in verifying that function is the in Figure the aggregation, same and + lS1/a) refer- x Comp +Aggr) (12) + hc x Comp required The value in Figure The to cost was found sort-based for each no hash the each aggregation. left), of hash-based + Aggr ), where that of finding 6a (upper cost traverse s + q s m, meaning that as for shown aggregation. IRI x (Hash function. same table scalar group Thus, he is bucket. overflow aggregate in the hash the cost and aggregate for the plan 6a is IR] X (Hash Referential + he x Comp Integrity For hash-based in is assumed + (lRl\B (Comp X sort-based of comparisons aggregate removal Aggregation for memory we the duplicate + ICI X Comp. for hash-based the x Comp + lR1//3/8 by Hash-Based plan similar both is + (IRI + lS1/a with Figure hash-based aggregation, (13) cost of a prior is IS I x Hash + IR I x (Hash + he x Comp) table with the tuples from the total cost for division S and probing by hash-based it with the tuples aggregation and + X(Hash Duplicate + ICI X Comp. the additional right) semi-join ISI X Hash + ISI X Aggr Enforcement 6b (upper building a hash from R. Thus, +Aggr) for referential IRI X (Hash + hc X Comp integrity enforcement + he x Com,p) +Aggr) semi-join as shown is + lR1/8 + IS I X Aggr for (14) + ICI X Comp. Removal The next two figures show the previous two query evaluation plans modified to include duplicate removal. If the implementation of hash-based duplicate removal were very smart, it would be possible to ensure that the duplicate removal operator in Figure 6C (lower left) produces its output grouped suitably for the subsequent aggregate function, making a separate hash table in the aggregate function obsolete. Similarly, in Figure 6d (lower right), the duplicate removal operation for the divisor could be integrated into the hybrid hash join operation. However, we ignore these optimizations in our cost formulas. Since the duplicate-free dividend may be larger than memory, i.e., IR1/~ > m, hash case, can be divided ing table overflow recursive may hybrid approximated by fan-out, [Graefe and the which 1993; ACM Transactions by memory we Graefe occur the size, assume et in hashing al. logarithm with the duplicate must the be of the duplicate logarithm to be equal 19941. removal employed. Thus based to the the cost of In recursion removal equal fan-in on Database Systems, Vol. 20, No 2, June 1995 operation. The output to the during hash-based this depth size partition- merge-sort duplicate 1? Fast Algorithms for Universal Quantification 213 . Verify /w ScalarAggr Verify Y ScalarAggr 1s1 w’ Hash-AggrFct 1s1 lRll~ Scan S IRI Hybrid Hash Join Y“+! Scan S Scan R Scan S Hash-AggrFct (a) Scan R (b) Verify YScalarAggr Verify y’ <cl Scala~Aggr Hash-AggrFct Islla Alp IRI [s1 I Scan R Query plans for division removal for the large dividend step in the deepest recursion reading a partition is—considering level first and file together aggregations. the actual duplicate removal calculating the cost of (sequen- with := IRI x (Hash the cost of writing it (using + he x Cornp) + logF(r/rn) x ( WrTmp Thus, the cost of relational duplicate removal HashUnique(R) + lR1/Q If by hash-based writes): Hashuniqzte(li) both quired, Scan R Scan S (et) Fig. 6(a–d). random Hash-Unique Scan S Hash-Unique (c) tially) Hash Join Is@” IRI Scan S Hybrid 1s1I Hash-Unique 1s11 I lRllflc$ Hash-Unique I lRll~ Hash-Unique Hash-AggrFct Is I/a duplicate using but + ISI X (Hash x (Mm+ + RdTmp) division on the inputs not X Hash (15) -t- Ill + r X (RndIO hash-based referential and + Aggr) referential + SeqIO)). aggregation integrity + [S1/a x Aggr integrity including enforcement + he X Comp) + hc x Comp removal (IRI X is (16) + ICI X Comp. enforcement are re- the cost is IlczshUnique(R) + ISI X (Hash + lR1//3 x + lR1/p/13 + lS\/a ACM + he X Comp) (Hash X Aggr Transactions X Hash + he x Comp) (Hash x + lS1/a + hc X Comp (17) + Aggr) + ICI X Comp. on Database Systems, Vol. 20, No. 2, June 1995. 214 6.4 G Graefe and R, L. Cole . Hash-Division When hash-division is used, a prior semi-join for referential integrity enforce- ment or an explicit duplicate removal step is never necessary. Thus, the query evaluation plan shown in Figure 7 always applies, although the actual hash-division algorithm might have the divisor table and the bit maps enabled or disabled. If it is known that neither duplicate removal nor referential integrity enforcement is required, the simplest version of the algorithm hash-based can be used. aggregation enforcement. tuples This version mimics the without duplicate removal It uses neither and dividend divisor tuples table by groups. behavior of division by or referential integrity nor bit maps The cost of this and just variant counts divisor of hash-division is ISI xAggr which and is the sum counting is done algorithm, not in hash-based data type Referential the costs very table. efficiently as a more facility, even Integrity divisor use this program may tasks (18) and formula of inserting although of the of a query invoke as simple tuples cost variables interpretation which +Aggr), evaluation a relatively must of the referential be used. IS] xHash In this plan expensive as abstract as counting. integrity constraint is not known, case, the cost of hash-division + IRI X (Hash x (Hash + hc x Comp + he X Comp) a is + lR1/8 + Aggr). (19) The differences to the cost for hash-division without referential enforcement reflect building and probing the divisor table. Duplicate the hash-division Enforcement the validity table the We with general for + hc X Comp of counting quotient aggregation, If, however, divisor of the in counting + IRI X (Hash integrity Removal If the inputs contain duplicates, the divisor table must be probed while it is being built and the quotient table must contain bit vectors instead of counters, with IS I/a bits per bit vector. If duplicates exist but referential integrity is known to hold, the divisor table need not be probed with dividend tuples, and the cost of hash-division ISI x (Hash + 2 X ICI + k X is X Comp) lS1/a The first two terms account table. The third term reflects + IRI X (Hash + hc X Comp + IRI) X (Hash for building initializing + he X Comp) + 2 x ICI x lS1/a ACM Transactions on Database ~20) Map. X the divisor table and the quotient and scanning the bit maps. The cost of the most powerful variant of hash-division in eludes both duplicate removal and referential integrity (IS + Bit) + lR1/8 X (Hash x Map. Systems, algorithm, enforcement, + hc X Comp which is + Bit) (21) Vol 20, No. 2, June 1995 Fast Algorithms for Universal Quantification . 215 Hash-Division Iy Fig. 7. Query plan for hash division. WI Scan R Scan S The first term reflects building the divisor probing and building maps, 6.5 which probing table the quotient can be very Parallel the Relational divisor table, and by the dividend. table, both by the The second which includes fast if it is done word divisor term setting by word, while accounts bits for in the bit not bit by bit. Division The four algorithms for relational division and all their variants can readily be parallelized with both pipelining and partitioning, i.e., inter-operator and intra-operator parallelism, as discussed earlier. Figure 8 shows the most complex plan for repartitioning algorithms, tioning, each and i.e., naive which of the replication division requires algorithms, marked and that the with datapaths by “part.” and hash-division, divisor requiring “repl.” For we assume be broadcast data the direct quotient to all sites. partiFor the component algorithms of the indirect strategies, i.e., duplicate removal, semijoin, and aggregation, we assume partitioning on the join attributes or the grouping attributes. Moreover, we assume a shared-nothing (distributedmemory) machine with local disk drives or a shared-everything (shared-memory) machine with disk drives dedicated to processors in order to reduce contention bandwidth and interference. is sufficient that pairs of sites; thus, the amount of data that dividend In order communication delay can be estimated sent from each site to other processors. and divisor the partitioning robin). In any case, we assume that communication can occur in parallel cannot are partitioned be used to determine the for cost originally processing of each by determining We also assume across (e.g., plan, the the available between any all disks partitioning cost of the but that is roundunderlying sequential plan is increased by the cost of repartitioning data. If P is the some data remain with the same processor, namely the degree of parallelism, I/P. The cost of exchanging data across process and machine boundfraction aries for a relation R is r x X~er, or distributed memory, i.e., either Table IV. Thus, the costs of naive partitioning are increased where Xs) X (1 – l/P) where B represents the relative sending it to only one recipient (B no support). The referential enforce same cost should reflect shared memory given in quotient by (r+l? not X~er or Xfer = DMXfer Xfer = SMXfer division and of hash-division using is added integrity ACM (22) XXfer, cost of broadcasting = 1 for full broadcast for explicitly Transactions aggregation-based by means on Database the divisor versus B = P for support, algorithms of a semi-join. Systems, that do Parallel Vol. 20, No. 2, June 1995. 216 . G. Graefe and R. L. Cole Naive Division Isy “=-JQ/p Hash-Division Sort-Unique Sort-Unique ISI part. part. Scan S IRI -’~Scan S Scan R Scan R Verify Verify Y@ ScalarAggr HP’. ScalarAggr v Sort-AggrFct part. I lR1//3’?i IsI/a 1s1 Islp-=’ Hash-Unique ‘-’-qIp ISI part. part. Scan S plans for relational X (1 – I/P) by splitting the before. X course, Moreover, smaller For by costs the divisor incurred cost even at the a final to each ANALYTICAL plan. the In this site suitable to PERFORMANCE the operations case, the data sites must and such cost illustrations be not local and cost for the cost (division cost to global, can reflect memory easily factor by response increases. modified the their and that consumption (remove its into transfer (24) resource local, plans and omit (23) XXfer. repartitioning operator We above and require by Xfer. multiple total derived each broadcast collection at if formulas size partitioning, modifying identifiers) are decreases dividend adding 7. all actually division aggregation P X c X (1 – I/P) Of Scan R with referential integrity enforcement result, and their costs are increased global components, as discussed aggregation is about time part. I IRI Scan S r/~/8 can be reduced Hash-Unique ISI I part. query aggregation-based algorithms repartitioning the intermediate “-+/p Scan S Hash-Unique IRI Parallel Hash Join By+ Scan R Fig. 8. This Hybrid 1s1I Sort-Unique Scan S Sort-Unique part. I lR1/f18 Islla Merge-Join Sort-Unique ‘-+ Hash-AggrFct B the formulas be above) set the size. of found and node here. COMPARISONS In this section we compare the algorithms’ performance by applying the cost formulas derived above to various input sizes and to situations with and without explicit referential integrity enforcement and duplicate removal. In this analysis, B. We chose ACM Transactions we assume that the tuple size for R is 32 B and for S and Q 16 fairly small tuples because in most practical cases, the tuples on Database Systems, Vol. 20, No 2, June 1995. Fast Algorithms will consist The memory which of keys only used used to serve partitioning a hash 7.1 size for or hash a single clients. 16. The student-id’s is operator The average with tables fan-in 1,024 of query merging that each, a machine the fan-out required we assume 217 4 KB on and of comparisons . and course-no’s. pages of a single for number is he = 2. Furthermore, bucket sorted sorting many is as in our examples for is a realistic for Unwersal Quantification for neither for scanning R nor S are originally. Sequential Since there Algorithms is no typical or average case for division algorithms, the conceptu- ally simplest case will be our first comparison: the dividend is the crossproduct of the divisor and quotient, i.e., R = Q X S, and neither duplicate removal nor referential integrity enforcement are required. Figure 9a shows the performance of the four division varying from 16 to 4,096 with constant \QI = 256. Since I R [ varies from 4,096 to 1,048,576. The bottom data points, as in all and subsequent other pertinent indicates the output, The is base alike; divisor. The The reason dominates both cost, cost two is that in margin (a factor inputs (dividend quotient). of 2 to 5) and and the sort-based whereas sorting Referential Integrity 9b shows constraint requires N in their the previous hash dividend are table. base final the R while However, the by quite costs, output performance R relation sorting algorithms the perform performance. i.e., to a large scanning disk (the deteriorates algorithms consistently are that hash-aggregation division direct the quick-sorting require a algorithms’ If? grow while the hash-based to the base cost. The reasons alike for of the costs curve writing algorithms cost sort-based writing line cost. treatment of their and in also probing the most divisor) derived algorithms require outperform was IS I all 9a, first unmarked and small are the The the inputs sort-based the in Figure in the algorithm’s sort-based without overflow whereas naive files, and that hashing permits Figure is algorithms, algorithms Moreover, IS I and relative all line. cost two algorithms cost, and both algorithms each the difference hash-based hash-based hash-based in second this label parameter of reading when that varying the cost is included only the in the obvious their the i.e., mentioned immediately almost explains parameters base as previously section. It figures, algorithms for R = Q x S for and sort-aggregation access to the correct as perform well can proceed must use run hash bucket log N comparisons. Enforcement the analytical is explicitly enforced. cost of division The when cardinalities the referential of the dividend, integrity divisor, and quotient are IQ I = 256, IS I = 256, and IRI = IQ I x IS I x 8. In other words, this comparison expands upon the midpoint of the previous comparison shown in Figure 9a with the size of the dividend relation increased by a factor which appears as reduction factor S in the cost functions for query evaluation plans with explicit referential integrity enforcement. For example, if 8 = 1, then lR/ = 65,536, and when 8 = 8, then IF? = 524,288. ACM TransactIons on Database Systems, Vol. 20, No. 2, June 1995 218 . G. Graefe and R. L. Cole ❑ Naive Dwlslon 1500 – cost x Sort-Based Aggregation + Hash-Based Aggregation o Hash-Division 1000 _ [see] 500 – o– I I I 16 64 256 Divisor I 1024 Cardinality lQl=256and I 4096 1S1 R=Sx Q (a) 2000 -+ 1500 – cost ❑ Naive Division x Sort-Based Aggregation + Hash-Based Aggregation o Hash-Division ~ooo” _ [see] 500 – o– I 1 Size Reduction I 8 I 4 I 2 by Referential Integrity I 16 Enforcement 6 IQI = ISI = 256, IRl = ISI X IQI X d (b) Fig. 9. (a) Cost of division algorithms. (b) Division with referential integrity enforcement When the referential integrity constraint is explicitly enforced, as shown in Figure 9b, the naive-division algorithm increases in a familiar IV log IV pattern with the size of the dividend before the referential integrity enforcement and size reduction. The cost of sort-based aggregation significantly varies within Figure 9b because of two sort operations for the semi-join and the aggregation. For 8 = 1, i.e., if the semi-join does not remove any tuples, the cost of sort-based aggregation with explicit referential integrity enforcement is almost twice that of naive division if the base cost is ignored. Thus, if referential integrity must be enforced, sort-based aggregation is a particularly poor query evaluation plan, although it is the only one available in most current systems. For larger before ACM values of 8, the semi-join the second dividend sort, TransactIons on Database Systems, removes a large and the second Vol. 20, No. 2, June 1995 fraction of the dividend sort becomes relatively Fast Algorithms for Universal Quantification . 219 ❑ Nawe l.huslon 1500 – cost x Sort-Based Aggregation + Hash-Based Aggregation o Hash-Division 1000” _ [see] 500 – o– I 1 I 2 I 4 Size Reduction P = CZ,IQI= I 8 by Duplicate 256,Isl = 256 I 16 Removal a x a, IRI = 65,536 x /3 (c) ❑ Naive Division 8000 – x Sort-Based Aggregation 6000 – + Hash-Based Aggregation cost [see] o Hash-Division 4000 – 2000 – o– Size Reduction cz=~=4, by Referential 1Ql=256,1Sl =256X Integrity Enforcement 6 a,lRl=65,536x/?x6 (d) Fig. 9. (c) integrity Division with duplicate removal. (d) Division with duplicate removal and referential enforcement. insignificant compared to the first dividend sort. Since the first dividend sort has the same input size and the same cost as the sort of the dividend required for naive division, the costs of the sort-based direct and indirect algorithms become comparable for large values of 6, although the sort-based algorithms are still much more expensive than the hash-based division algorithms. In hash-based aggregation with referential integrity enforcement, the semi-join reduces the number of dividend tuples by a factor of 6 for a relatively small cost. In hash-division, the algorithm’s divisor table is enabled, larger which has exactly values of 8, the the same effect as a hash-based semi-join. overhead of hash-division relative to the actually decreases very slowly. The hash-based algorithms have a significant performance their sort-based counterparts: because none of the situations ACM TransactIons on Database Systems, Thus, for base cost advantage illustrated Vol. 20, No. 2, June over in 1995. 220 . G. Graefe and R. L. Cole Figure 9b causes hash reduction of the larger in-memory hash table table overflow, 1/0 to temporary files is avoided. input R can be achieved by filtering it through built on S. This is an example of the superiority hash-based join algorithms for different sizes of the two inputs in earlier research {Bratbergsengen 1984; Graefe et al, 1994]. Duplicate The the of also observed Removal Figure 9C shows the costs of the four relational division algorithms when the inputs require duplicate removal, For the duplication factors a and B, we assumed that a = @ and varied their value from 1 to 16. Again we see that the sort-based algorithms are more expensive because of the duplicate removal hash-based aggregation is because requiring the processing prior to aggregation. In addition, the cost of can now be discerned from that of hash-division. This cost of duplicate recursive processing removal prior to aggregation since the duplicate-free be held by the in-memory hash fastest algorithm. Duplicates in dividend is quite high, can no longer table. Hash-division continues to be the the inputs only require enabling of the quotient table bit map processing, at only a small additional cost. Figure 9d shows the midpoint of Figure 9C (a = P = 4) augmented with referential integrity enforcement. The reduction factor for the dividend relation 6 varies from 1 to 16, Naive division outperforms sort-based aggregation if referential integrity enforcement is required, as had been observed earlier in Figure 9b. As the referential integrity reduction factor increases and the input small to the second dividend sort in sort-based compared to the input to the fh-st dividend aggregation becomes quite sort, the difference between naive division and sort-based aggregation all but vanishes. With sizes of the original dividend relation, the cost of hash-based removal becomes larger and larger. However, it does not approach sort-based aggregation. The performance of hash-division is very increasing duplicate the cost of steady. Over the entire range, the cost of hash-division relative to the base cost is constant. For both duplicate removal and referential integrity reduction factors equal to 4, hash-division is almost 5 times than 5 times faster than sort-based hash-based aggregation. Preliminary faster than aggregation, naive division, a little and 2+ times faster more than Conclusions Since we realize that the formulas are not precise, we will also report on experimental measurements in the next section. Nevertheless, we believe that these analytical comparisons already permit some conclusions. First, division by sort-based aggregation does not perform much better than the naive algorithm. In fact, if a semi-join is necessary to ensure that only valid dividend tuples are counted, the additional sort cost makes division by aggregation significantly more expensive. Second, if division is executed by aggregation, the division can be performed much and semi-join, faster if hash-based algorithms because the two input sizes, ACM on Database TransactIons Systems, Vol 20, No. 2, June are used for the aggregation dividend and divisor, are very 1995 Fast Algorithms for Universal Quantification different. This parallels the observations sort- and hash-based matching operators sengen 1984; Ck-aefe et al. 1994; Schneider In sort-based for each file . 221 by Bratbergsengen and others on (join, intersection, etc.) [Bratbergand DeWitt 1989; Shapiro 1986]. division algorithms, the number individually by the file’s size. of merge levels is determined Therefore, sorting the large dividend with multiple merge levels dominates the cost for both the aggregation and the join. In hash-based algorithms, on the other hand, the recursion depth of overflow avoidance or resolution determined by the smaller one, in our relation. Since the inputs of the semi-join, is equal case the dividend for both inputs and relatively small divisor and divisor, differ by a factor of at least the cardinality of the quotient, using a hash-based semi-join operator is significantly faster. This conclusion can also be stated in a different way: if a database system’s only means to evaluate universal quantification and relational division queries is aggregation with prior duplicate removal and semi-join, the semi-join as well as the aggregation should be hash-based because in the case of relational division, the semi-join inputs have very different sizes and therefore give hash-based join algorithms a significant performance advantage. Third, the new hash-division algorithm better than division by hash-based consistently aggregation. performs However, as well as or if a semi-join duplicate removal is needed, hash-division significantly outperforms by hash-based aggregation as well as both sort-based algorithms. Finally, hash-based algorithms outperform sort-based ones not or division only for existential quantification but also for universal quantification problems and algorithms. This result is different from the earlier observation cited above because those papers pertained only to one-to-many relationship operations such as join, semi-join, intersection, difference, and union, while this paper is the first to demonstrate this fact for relational division. Most importantly, it outperforms sort-based techniques consistently in a wide variety of situa- tions. Effect of Clustering Indices Let us briefly consider the effect of clustering B-tree indices, despite our general focus on algorithms that work on all relations, including intermediate results. There are two effects that are B-tree indices may save sort attributes relevant for a query, operations. the index file, which permits base file records. The latter effect record sizes in our 1/0 significant interesting here. First, clustering Second, if an index contains all may be scanned instead of the base savings if index entries is not pertinent here, because we analysis. The former effect, however, are shorter assumed is worth than very short analyzing, which we do in Figure 10. For this comparison, we modified the last comparison (Figure 9d) to presume sorted relations suitable for the first processing step. Thus, the sort-based plans save the initial sort step for each input, whereas the cost of the hash-based algorithms remains unchanged. Most strikingly, hash-based aggregation is not competitive with the outer three algorithms, because it requires temporary files for hash table overflow. ACM Transactions on Database Systems, Vol. 20, No, 2, June 1995. 222 G. Graefe and R. L. Cole . 4000 ❑ Naive Division 3000 x Sort-Based Aggregation + Hash-Based Aggregation cost [see] 0 Hash-Division 2000 1000 0 i.J1 I 1 I 2 Size Reduction a=~=4,1Ql Fig. 10 Diwsion with I 4 by Referential =256, duphcate removal Integrity 1Sl=256xct, and referential I 16 I 8 Enforcement 6 lRl=65,536x/!?x6 integrity enforcement; no initial sorts The other three algorithms are fairly close to each other as well as to the base cost, i.e., the cost of reading the inputs and writing the final output. In other words, reading the inputs these algorithms. If Figure 10 had finer and writing resolution, the output the are the following detailed dominant trends costs for would be apparent. Naive division is consistently about 15% more expensive than the base cost. Hash-division is 18–43% slower than naive division, with improving relative performance for higher reduction factors 8 and increasing dividend sizes IR 1. Sort-based aggregation shows the most interesting behavior, with costs 20– 112°A higher than the base cost. For small reduction factors 8, the first sort step between merge-join because the size of the dividend large 6, i.e., referential integrity second sort operation becomes even outperforms hash-division. Thus, Figure 10 demonstrates and aggregation weighs in heavily has not been reduced by the merge-join. enforcement with large size reduction, almost that negligible, sorting and the large sort-based dividend For the aggregation is the domi- nating cost for sort-based relational division algorithms. If sorting can be sort-based relational division avoided by use of clustering B-tree indices, algorithms become competitive with hash-division. Non-clustering indices would not have the same effect, unless they contain all required attributes and permit index-only retrievals [Graefe 19931. On the other hand, if the dividend is an intermediate query result of suitable B-tree indices do not exist, hash-division is the algorithm of choice. Seen in a different way, the hash tables of hash-division organize the divisor and dividend tuples into associative structures, just like permanent indices on disk. However, since they are in memory, they are faster than non-clustering indices on disk but which can deliver the input data not quite as fast as clustering indices, immediately ACM TransactIons in the proper on Database organization Systems, Vol 20, No (for naive 2. June 1995 division). Fast Algorithms 400 – ‘ 300 – for Universal Quantification . 223 ❑ Naive Division x Sort-Based Aggregation + Hash-Based cost Aggregation o Hash-Division [see] 200 – loo– o– I 1 2 Size Reduction P=16, I 4 I cr=fl=4, by Referential 1Ql=256,1Sl I 16 I 8 Integrity =256 Enforcement xa, 1 6 lRl=65,536x/?xd (a) ❑ Naive Division – 16 x Sort-Based Aggregation + Hash-Based Aggregation –8 o Hash-Division 4 Speedup –2 –1 I I 2 1 I 4 I 8 Degree of Parallelism a=~=J=4, 1Ql=256,1Sl =256 I 16 P xa, lRl=65,536x/lx6 (b) Fig. 11. (a) Parallel division on distributed memory. (b) Speed-up for parallel division alga. rithms. 7.2 Parallel If division Algorithms is executed on a parallel machine, inputs and intermediate data must be partitioned across the processors, which add a new cost to all cost formulas above. Figure 1 la shows the cost of the algorithms with duplicate removal and referential integrity enforcement for a distributed-memory database machine with P = 16 processors and no broadcast assistance (B = P). Each processor is assumed to be equipped with CPU, disk, and memory like the single processor in the previous subsection. The indicated relation sizes pertain l/P of these costs. to the entire system, i.e., each processor stores and processes sizes, The base cost of Figure 1 la does not include partitioning ACM TransactIons on Database Systems, VO1 20, No 2, June 1995. 224 . G. Graefe and R. L. Cole The curves for the sort-based aggregation and semi-join, 9d. The same is true is different total from memory. algorithms, are similar for hash-division. that in Figure Therefore, i.e., naive division to the corresponding The curve 9d because hash-based for hash-based the parallel duplicate and sort-based curves removal machine in Figure aggregation has a larger is possible without overflow even for the larger dividend relation, saving significantly on 1/0 costs. In the cases where in-memory hash-based aggregation is feasible, i.e., 8<2, it outperforms hash-division by a slight margin because counting is faster than manipulating and scanning bit maps. The similarity appear to permit of Figure 9d and Figure 1 la indicates linear speedup. Figure 1 lb illustrates that the algorithms the speedup behavior of the four parallel division algorithms with 1 to 16 processors in a distributed-memory architecture. While hash-division comes very close to linear speedup, the other algorithms exhibit super-linear speedup. The reason is the same as discussed for hash-division in Figure 1 la: If only CPU and disk bandwidth were added, the speedup should be at most linear. However, memory is also added as the system grows, permitting not only a higher total bandwidth but also more effective use of it, because the fraction of data that fits in memory in hybrid hashing as well as the fan-in for sorting and partitioning increases. However, as demonstrated in Figure 1 la, hash-division is still the most efficient division algorithm. Unfortunately, super-linear speed-up could not be achieved for any of the algorithms in our real experiments, as we will see in the next section. 8. EXPERIMENTAL To validate the algorithms extensible PERFORMANCE analytical COMPARISON comparisons, in the Volcano set of algorithms we implemented query processing and mechanisms the four division system. Volcano provides an for efficient query processing [Graefe 1994]. All operators are designed, implemented, debugged, and tuned on sequential systems but can be executed on parallel systems by combining operator in them with a special “parallelism” operator, called the exchange Volcano [Graefe algorithm Graefe In i.e., et al. all 1993]. Graefe Volcano [ 1989], is left. If the The entire algebra next, a sort operator step function. as relational of open, by means merge Davison such has been Graefe and used for Thakkar a number [ 1992], of and [1994]. Volcano, opening and studies, and prepares final input operators are close procedures sorted runs and implemented [Graefe 1994]. merges them as iterators, For until example, only one merge is performed on demand by the next fits into the sort buffer, it is kept there until demanded by the next function. Final house-keeping, e.g., deallocating a merge heap, is done by the close function. Operators implemented in this model are the standard in commercial relational systems [Graefe 1993]. A tree-structured query evaluation plan is used to execute queries by demand-driven dataflow passing record identifiers and addresses in the buffer pool between operators. All functions on data records, e.g., comparison ACM TransactIons on Database Systems, Vol 20, No. 2, June 1995 Fast Algorithms for Universal Quantification . 225 and hashing, are compiled prior to execution and passed to the processing algorithms by means of pointers to the function entry points. The naive division algorithm was implemented in such a way that it first consumes the entire sorted divisor relation, building a linked list of divisor tuples fixed in the buffer pool. It then consumes the sorted dividend relation, advancing in the linked list of divisor tuples as matching dividend tuples are produced by the dividend input, of the divisor list is reached. Sort-based aggregation and producing a quotient duplicate removal tuple are each time the end implemented Volcano’s sort operator. It performs aggregation and early as possible, that is, no intermediate run contains duplicate duplicate within removal as sort keys. A merge-join consists of a merging scan of both inputs, in which tuples from the inner relation with equal key values are kept in a linked list of tuples pinned in the buffer pool. For semi-joins in which the outer relation produces the result, no linked lists are used. For all algorithms involving sort, the runlength was 4,099 records and the fan-in was 20. The hash-based hash tables, space for hash data bucket, a tuple’s and chosen, providing sizes for the generated unique experiments CDC Wren for system and Dividend and distributed skew was main size conflict Chain to the memory bit (number resolution manager elements. to the sort-based were for memory a pointer pointer table the data chain and or the with test uniformly no data 8.1 Sequential The parity chaining Volcano’s contain a hash quotient. and and maps, identifier count of the bucket use that algorithms, Record bytes bit record divisor use algorithms structures the hash-based hash tables, auxiliary pool, algorithms The elements next address map are tuple in in the respectively. of buckets) in to allocate the buffer For of 4,099 all was algorithms. 32 bytes divisor for divisor attributes integers such and were that dividend and 8 pseudo-randomly all attributes were present. Algorithms were run on a Sun SparcStation running VI disk drives. One disk was used for normal files, source code, executable, etc., while the SunOS with two UNIX file systems other was accessed directly by Volcano as a raw device. Of the system’s 24 MB of main memory, 4 MB was allocated for use by Volcano’s buffer manager, in order to guarantee that the experiments would not be affected by virtual-memory thrashing. The unit of 1/0, between buffer and disk, was chosen to be 4 KB, which was also the page-size. Simplest Case Figure 12a-b shows the measured performance of the four algorithms for quotient cardinalities IQI of 16 and 1,024. Each figure illustrates division performance for a fixed number of quotient records. The divisor cardinality IS’I varies from 16 to 4,096 or from 16 to 1,024. In all cases, the dividend is the i.e., R = Q x S. This Cartesian product of the divisor and the quotient, ACM Transactions on Database Systems, Vol. 20, No. 2, June 1995 226 G. Graefe and R L. Cole . 50 – ❑ Naive Division 40 – Elapsed x Sort-Based Aggregation + Hash-Based Aggregation 30_ Time [see] o Hash-Division 20 – lo– o– Divisor ISI Cardinality IQI=16, 4096 1024 256 64 16 R=SXQ (a) ❑ Naive Division 1250 – 1000 – Elapsed Time 750- [see] 500 – x Sort-Based Aggregation + Hash-Based Aggregation 0 Hash-Division 250 – o– 1024 256 64 16 Divisor Cardinality IQI=1,024, IN R=SXQ (b) Fig. 12. implies are (a) Measured that neither required, These and the the The outperforms the division the the in for the case not algorithm integrity in the (R Figure early in can be one no the less algorithm. to naive significantly faster 12 (although of the particu- analytical well of division equal on Database Systems, Vol 20, No 2, June 1995 duplicates) In as in significantly attributed aggregation enforcement be applied. and aggregation illustrated quotient, comparisons. appeared was reflect holds implementation = Q X S) but This integrity same perform direct (b) Large can analytical is the aggregation algorithms). did referential hash-based sort-based experiments model ACM TransactIons the quotient. nor algorithms and small of each here general-purpose this hash-based analytical sort-based very (referential made algorithms algorithms, is that analysis form results of the hash-based “surprise” removal simplest observations ranking comparison. the performance, duplicate the experimental substantiate lar, sequential to the individual not fact than slightly The one division than in naive as fast that sort as our runs, Fast Algorithms for Universal Quantlficatlon 1;& Elapsed Time Relative To + Hash-Based 4 . 227 Aggregation Hash-Div. J Divisor Cardinality IQI=1,024, Fig. 13. Performance relative 1024 256 64 16 ISI R=SXQ to hash division, large quotient, and therefore the cost of sort-based aggregation is overestimated in the cost formulas. The difference in performance for small relation sizes are small, but the difference grows fast as the relations grow. Figure 12b with all elapsed times divided for the same input sizes. The difference algorithms shown in Figure Figure 13 shows the same data by the elapsed time between the slowest 12b and Figure 13 ranges as of hash-division and the fastest from a factor of 5 to a factor of 10. Differences of this magnitude occur for all input sizes in Figure 12a-b. For very small relation sizes, e.g., Ilil = 1,024, ISI = 64, and IQI = 16 in Figure 12a, the performance fastest division differed algorithms; by a factor 0.53 versus of 3 between 0.17 seconds. the slowest and relations, e.g., For larger Il?l = 4,194,304, ISI = 4,096, and IQI = 1,024 in Figure 12b, the performance differed by a factor of almost 10: 1,380 versus 140.2 seconds. Naive division became relatively dividend, worse be sorted; because however, it requires the sort gation tation like sort-based aggregation [Bitten of division may not be crucially divisor relations, but if the relations The same operations possible error in the selective and dividend sizes accurately. estimate Figure 14, shows referential itly enforced IR] = 65,536 points while which Integrity the the larger take input advantage relation, of early the aggre- and DeWitt 1983]. The implemenimportant for small dividend and are large, division algorithm carefully. are results of other database Referential that cannot it is imperative to choose the is true if the dividend or the divisor such as selections and joins and the makes it impossible to predict divisor Enforcement the measured integrity during constraint query performance between execution. of the four dividend In Figure algorithms and divisor 14, the when must dividend the be expliccardinality and the quotient cardinality IQ] = 256 are fixed for all data the divisor cardinality IS I = 256/8 varies. 8 is the factor by dividend size decreases ACM in Transactions the referential on Database integrity Systems. Vol enforcement 20, No 2, June 1995 228 G Graefe and R. L Cole . 504 K ❑ n ~ 40 – Elapsed ❑ Naive Division ~. Aggregation + Hash-Based Aggregation Time [see] ~._ x x Sort-Based o Hash-Division lo– 6 1 I 1.6 1 Size Reduction by Referential 1S1= 256/6, Fig. step are 14. and Measured varies the from of enforcement. leftmost measurements integrity already four for tial integrity the additional range from 7.95 Note for than did to speeds ments to Duplicate Figure records up sort-based added for merge-join. 7.09 ( 6 = sort from 8.67 to the left explicit elapsed of of 6 = times from the referen- integrity actually the the actual affected, 1) to costs to ( 8 = 8) without The relatively sort- referen- seconds, due to non-matching also is requires this less because it Its elapsed times 1/0. hash-division little (no the algorithm and The 20.66 remove performance referenimproves, division. from 32.77 for performance algorithm performance can must degradation, seconds. results the parallel fared into aggregation. a second integrity more than the worse experimental aggregation aggregation 1 when referential analytical in the results. Again, the analytical This also double results, analytical this model, explains from the except comparison is because which why leftmost the we signifielapsed measure- dividend sort without early aggregation enforcement using a semi-join version Removal 15 shows the run-time performance are present in the dividend and Transactmns 1 referential a redundant significant 8, its seconds. incurs sort-based 8 = of required on which run show in overflow to but justified sort 6 enforcement. referential aggregation experimental include of 16.59 to 48.18 its impact aggregation, seems cantly ACM to sort-based times the table table, the in cost where strongly hash-based but 10.83 that not to hash 9.56 rise participate and The divisor indicated the values is most semi-join from the integrity without the division large tuples algorithm without times cases naive For dividend semi-join, proceed in Enforcement inputs. except records. referential 1 represents step Integrity algorithms words, 8 = enforcement) additional use the aggregation dividend other to with elapsed division enforcement. fewer based In algorithms integrity because 8. The I 8 IQI = 256, IRI = 65,536 performance the enforcement held All tial 1 to performance integrity tial sequential I 4 I 2 on Database Systems, Vol of the algorithms divisor. In Figure 20, No 2, June 1995, when duplicate 15, the quotient is of Fast Algorithms for Universal Quantification . 229 500 – ❑ Naive Division Elapsed 400 – x Sort-Based Aggregation 300 – + Hash-Based Aggregation Time o Hash-Division 200 – [see] loo– I 1 Q “ u o-l , I I I 2 4 8 Size Reduction by Duplicate Removal a D = a, IQI = 256, ISI = 256 X a, IRI = 65,536 Fig. cardinality 15. Sequential performance IQI = 256 is constant and dividend and ~ ranging cardinality from comparison while the IR I = 65,536 with divisor x @ vary 1 to 8. In all experiments, X /3 duplicates. cardinality with /S I = 256 X a replication factors a = ~. For example, a if a = 1, Il? = 65,536 and ISI = 256, and if a = 8, IRI = 524,288 and ISI = 2.048. The performance difference between the worst and best algorithms is again dramatic. seconds. For 8 = 8, the difference is a factor 6,500.98 Here the most important result is the difference aggregation duplicate whereas seconds, and hash-division. In the seconds versus 76.20 between hash-based measurements removal, the difference between for 8 = 8, it is more than a factor for division these two algorithms of 4,326.63 seconds without was small, versus 76.20 Differences in the relative performance of the algorithms between the analytical and the experimental results are due to the omission of early aggregation in the analysis and the fact that the buffer size and management is different than assumed in the analytical comparison. In several cases in the experiment, 1/0 the costs occur entire buffer dividend entire dividend due to sorting. relation Furthermore, buffer is used as sort space, so that pool from run creation to merging relation has to be sorted fits only into the buffer, a small so that percentage no of the temporary file pages remain in the and deletion. Only when a large twice, i.e., in the case of sort-based aggregation with preceding semi-join, is this effect not observed, and the preceding semi-join and additional sort almost double the cost of sort-based aggregation, e.g., for IRI = 65,536, IS I = 256, and IQ I = 256, the run-time is 20,48 versus 39.63 seconds. It is important to realize terms of an aggregate that function if a universal with quantification preceding semi-join is expressed and the in query optimizer does not rewrite the query to use a direct division algorithm, the query may be evaluated using an inferior strategy. This is true for both sort-based query evaluation systems such as System R, Ingres, and most commercial database systems (sort-based aggregation versus naive division) ACM TransactIons on Database Systems, Vol. 20, No, 2, June 1995. 230 G. Graefe and R. L, Cole 0 and for hash-based systems such as Gamma (hash-based aggregation versus hash-division). Since it is much easier to implement a query optimizer that recognizes a division operation and rewrites it into an aggregation expression than vice versa, universal quantification should be included as a language construct in query languages, for example, as the “contains” clause between table expressions originally included in SQL [Astrahan et al. 1976]. In summary, hash-division is always faster than any of the other algorithms presented here. Even if we compare the most complex version of hash-division, i.e., the version with automatic duplicate removal and referential integrity enforcement, against hash-based removal or referential integrity enforcement, slower than cate records faster 8.2 hash-based aggregation. must be removed from than the next Parallel However, the inputs, best algorithm, hash-based aggregation hash-division without duplicate is only slightly when non-matching or duplihash-division is significantly aggregation. Algorithms Considering of-magnitude that parallel processing speed-ups in database optimization technique should has been demonstrated to permit orderquery processing, any new algorithm or also be evaluated in parallel settings. In addition to the comparisons of our four algorithms on a sequential machine, we also measured speed-up and scale-up on a shared-memory parallel machine. Speed-up compares elapsed times for a constant problem size, whereas scale-up compares elapsed times for a constant ratio of data volume to processing resources. Linear speed-up is achieved if N times more processing resources reduce the elapsed time to l/Nt}’ of the time. Linear scale-up is achieved if the provided following the ratio diagrams, 100% parallel The 80386 elapsed time is constant and independent of the input of data volume to processing power is constant. In both linear speed-up and linear scale-up are indicated size, the as efficiency. experimental environment was a Sequent Symmetry with twenty processors running at 16 MHz (about 3 MIPS each) and twenty disk drives connected to four dual-channel controllers. Of these, we used up to eight processors and eight disks, because we could not obtain more disks for our experiments. In each experiment, we use the same number of disks as processors. The data are round-robin partitioned at the start of each experiment; thus, operations such as aggregation, join, and division require that the first query execution step be repartitioning. Figures 16a-b show run-times and speed-ups of the four algorithms using two different partitioning schemes. (The mapping of tags on the curves to algorithms is the same as in the previous figures; we omitted it here to keep the figure readable.) While none of the experiments show linear speed-up, all algorithms become faster through parallelism. In other words, the basis for intra-operator parallelism, namely the division of sets (relations) into disjoint subsets, applies not only to the known algorithms for sorting, merge-join, hash-aggregation, and hash join but also to the new universal quantification algorithm, hash-division. This observation holds not only in theory, as reflected in our analytical cost comparisons, but also in practice. ACM Transactions on Database Systems, Vol 20, No 2, June 1995 Fast Algorithms for Universal Quantification 231 . 1 Elapsed ====– --0_____ ‘8+..==::: 100 0.8 -++-.--. –– ----a Parallel Time 0.6 Efficiency 0.4 (dashed [see] (solid 50 lines) lines) 0.2 0 I , I I I I 1 2 4 8 0 .-,---. = =_ i&~ : Degree of Parallelism ISI=IQI=256, P R=SXQ (a) 1 ‘.. Elapsed ‘. 100 Time [see] (solid -#=...-_ 50 lines) -. ‘% ----- --=== =#... ---- = +.. - 0.8 ‘.-. -. -. :--.- -w - e -. + Parallel 0.6 Efficiency 0.4 (dashed lines) 0.2 –0 o– 2 1 8 4 Degree of Parallelism ISI=IQI=256, P R=SXQ (b) Fig. 16. (a) Speed-up In fact, it would relatively small with divisor not have been realistic data volumes. ory machine, only processors constant. Thus, the effect that cal comparison operators and partitioning. is missing. buffers has First, (b) Speed-up to expect in this with quotient linear experiment and disks are added permitted super-linear partitioning speed-up using with these a shared-mem- while memory remains speed-up in the analyti- Second, since the Volcano implementation of been tuned very carefully and is quite efficient [Graefe and Thakkar 1992], the CPU effort for manipulating records is very low and a large fraction of all CPU effort is due to pre-process overhead and protocols for data exchange. For divisor scans as few as 32 tuples. helps, but and closes Volcano’s process to example, for P = 8, a scan process for the Using a pool of “primed” Volcano processes process allocation is still not free. In addition, each process opens its own partition file. With respect to data exchange note that, in shared-memory implementation, each data packet going from one another requires a synchronization, i.e., a semaphore system call, ACM TransactIons on Database Systems, Vol 20, No 2. June 1995 232 G. Graefe and R. L. Cole . Elapsed Parallel Time Efficiency [see] (dashed (solid lines) lines) Degree of Parallelism ISI=PX256, 1QI=256, P R=SXQ (a) 150 Elapsed Parallel Time Efficiency 100 [see] (dashed (solid lines) 50 lines) 0 8 4 2 1 Degree of Parallelism ISI=PX256,1QI =256, P R=SXQ (b) Fig. 17. (a) Scale-up in each process. data Since exchange step, slow interprocess gree of parallelism faster 17a-b. with constant strategies. only the 17a-b ratios All partitioning. l\P of all fraction (b) Scale-up data of data stay that system apply show of algorithms to the input suffer size and to slight quotient on the same partitioning. processor go through increases in the with are supported any relatively increasing de- by the fact that the efficiency. scale-up run-times with must calls P. These observations the poorer its parallel observations Figures divisor communication an algorithm, Similar with experiments scale-ups processing increases shown of power in the for processing four two time in Figures algorithms partitioning for larger volumes and degrees of parallelism. None of them exhibits truly linear scale-up in our experiments. Nonetheless, the speed-ups and scale-ups obtained are sufficient to support parallel processing for large universal quandata tification ACM and relational Transactions on Database division Systems, problems. Vol. 20, No. 2, June 1995. Fast Algorithms for Universal Quantification 233 -1 Elapsed ~: 15 Time ~ x Sort and Merge-Join + Hybrid [see] Hash Join o Hash-Division lo– 5– I I I 1248 Size Reduction by Referential ISI = 256 /6, Fig. 8.3 Compariscm Existential 18. Existential of Existential quantification by semi-join algorithms A A I I 6 I 64 128 256 I Integrity Enforcement 6 IQI = 256, IRI = 65,536 versus universal quantification, and Universal Quantification is well-supported such in most as merge-join and query hybrid processing hash join. systems Figure 18 shows the measured performance of existential and universal quantification, i.e., hash-based semi-join and hash-division, over the same inputs as those Figure 14, i.e., Il? 8 represents the = 65,536 dividend’s in and IRI = 256 are fixed while ISI = 256/8 varies. “reduction factor” due to referential integrity enforcement, that is, there are increasing numbers of non-matching records in the dividend. When 8 = 1, the semi-join outputs all 65,536 records, whereas the size of the quotient produced by hash-division is only IQI = 256. However, when 8 = 256, the output of the semi-join and IQI are both equal to 256 records, and the algorithms for existential and universal quantification have the same costs for output record processing. We can see that the run-times for hash-based semi-join and hash-division are nearly the same regardless of the number of output records, In fact, the difference between hash-division and hash-join join. This is much smaller than the difference between merge-join and hashis another strong argument for directly supporting universal quan- tification in today’s tions need quantification no database quantification and relational 9. SUMMARY AND We have namely recently query longer be an issue and relational division described languages—performance of such because the performance can be equivalent to that opera- of universal of existential semi-join. CONCLUSIONS and compared four algorithms for naive division, sort- and hash-based aggregation proposed algorithm called hash-division. Analytical relational division, (counting), and a and experimen- tal performance comparisons demonstrate the competitiveness of hash-division. Naive division is slow because it requires sorted inputs; it is competitive only if neither input incurs costs for sorting. Division using sort-based ACM TransactIons on Database Systems, Vol 20, No. 2. June 1995 234 G. Graefe and R. L Cole . aggregate functions is almost as expensive and an additional sort are required to ensure are counted, sion. sort-based Division using hash-division step must can be more hash-based when are required. uniqueness aggregation neither as naive division; if a semi-join that only proper dividend tuples aggregate a preceding expensive functions semi-join If either one is required be enforced, hash-division is nor because than divi- as fast a duplicate referential outperforms naive exiidly as removal integrity all other or algorithms because it automatically enforces referential integrity and ignores duplicates. However, as in our second example (students who have taken all database courses), referential integrity is unlikely to hold if the divisor is the result of a prior selection or other restrictive preprocessing step. On the other hand, if referential integrity holds and both dividend and divisor are duplicate-free, the basic hash-division table and substituting candidates, thus algorithm counters making can be simplified for the bit maps it similar rithm and performance. Like the other algorithms to hash-based consider here, by omitting the divisor associated with quotient aggregation hash-division in both is easy algo- to paral- lelize. Its speed-up and scale-up behavior is similar to that of hash-based aggregation and join, while its absolute run-times are consistently lower than those of universal since hash-division requires partitioning quantification using hash-based aggregation. Moreover, does not require referential integrity to hold, it never the dividend input twice, as is necessary in indirect division algorithms based on subsequent aggregate function. able in primary relational parallel systems, conclusion from division aggregation, Therefore, i.e., for the semi-join and the hash-division is particularly valu- where it outperforms all other algorithms, this study is that universal quantification can be performed very efficiently database systems by using hash-division. An additional result of our comparisons is that in sequential direct Our and and parallel algorithms tend to outperform indirect strategies based on aggregate functions, both in sortbased and in hash-based query evaluation systems. In direct algorithms, the larger input, the dividend, has to be partitioned, sorted, or hashed only once, not twice as in indirect strategies based on aggregation. Thus, division problems should be evaluated by direct division algorithms. However, it is much easier for a query optimizer to detect a “for-all” or “contains” expression and to rewrite it into a query evaluation is to determine that a complex aggregation plan using expression aggregation than it or a pair of nested “NOT EXISTS” clauses are actually “work-arounds” for a universal quantification problem. Thus, query languages should include “for-all” quantifiers and “contains” predicts, e.g., as the “contains” clause between table expressions originally included in SQL [Astrahan et al. 1976]. In other words, it was a mistake to remove that clause from the original language design, and it should be restored in future versions of SQL and in other database query languages. Finally, hash-division is fast not only when compared to other division algorithms but also when compared to existential quantification algorithms, i.e., semi-joins. We have shown experimentally that hash-division is as fast as ACM TransactIons on Database Systems, Vol 20, No 2, June 1995 Fast Algorithms hash-based semi-join for the same two inputs, sort-based semi-join. Thus, the permits database implementors tion and for Universal Quantification relational division and performs . much 235 better than recently proposed algorithm, hash-division, and users to consider universal quantifica- with a justified expectation of excellent perfor- mance< ACKNOWLEDGMENTS Dale Favier, many David excellent Maier, Barb suggestions Peters, and the anonymous for the presentation of this reviewers made paper. REFERENCES ASTRAHAN, M. M., BLASGEN, M. W., CHAMBERLAIN, D. D., ESWARAN, K. P., GRAY, J. N., GRIFFITHS, P. P., KING, W. F., LORIE, R. A., MCJONES, WADE, B. W., AND WATSON, V. 1976. ment. Syst. ACM Readings BITTON, Trans. Database in Database Systems, D., AND DEWITT, Trans. Database Syst. BRATBERGSENGEN, Conf. K. CARLIS, J. V. Proc. 1986. IEEE CHENG, mance: design, F. Hashing Reprinted record methods Bases HAS: A relational (Singapore, and in to database I. L., manage- Stonebraker, M.. 1988 files. ACM Proc. Znt ‘1. Calif, elimination relational August), algebra Eng. in large operator, data IBM Syst. operations. Endowment, or divide Calif., A., AND WORTHINGTON, Completeness algebra VLDB (Los Angeles, and tuning. Relational approach San Mateo, Duplicate Data implementation, PUTZOLCT, G. R., TRAIGER, 255. C., SHIBAMIYA, 1972. 97. Kaufmann, 1983. Znt ‘1. Con f, on Data J., LOOSELY, CODD, E. D. J. J. W., R: A relational 1, 2 (June), Morgan 8, 2 (June), 1984. on Very Large P. R., MEHL, System 323. is not enough to conquer. February), IEEE, New P. IBM database 1985. York. 254 2 perfor- J. 23, 2. of Database Sublanguages. Prentice-Hall, New York. DATE, C. J., AND DARWEN, H. Reading, DEWITT, 1993, A Guide to The SQL Standard, 3rd cd., Addison-Wesley, Mass. D. J., KATZ, Implementation R., OLKRN, techniques F., SHAPIRO, for main L., memory STONEBRARER, database M., systems. D. J., GERBER, R. H., GRAEFE, G., HEYTENS, 1986. GAMMA-A Large Data Bases in Database DEWITT, R. performance (Kyoto, Japan, Systems, Morgan The (March), EPSTEIN, GAMMA August), R. 1979. Proc. 151. M. L., KUMAR, K. B., AND MURALIRRISHNA, database machine. 228. Reprinted San Mateo, Proc. Int’1 in Stonebraker, M., Conf. 1988. M. on Very Readings Calif. machme project. IEEE relational for at Berkeley, processing UCB/ERL Trans. of aggregates Memorandum Knowledge M., AND TANARA, H. database machine 1986. GRACE. Znt ‘t. Conf. on Data Eng. G., AND THAKKAR, memory multiprocessor. GRAEFE, G. 1993. Query (Los Angeles, S. S. Calif., 1992. relational Data Eng. 2, 1 Int’1 February), Exper. 22, 7 (July), evaluation techniques for large on and their a parallel Softw.-Pract. of the Conf. IEEE, database systems. (February). An overview Proc. Tuning in M79/8, (Kyoto, Japan, August), VLDB Endowment, 209. GRAEFE, G. 1989. Relational division: Four algorithms (June), algorithms. Endowment, S., SCHNEIDER, D., BRICKSR, A., HSIAO, H. I., AND RASMUSSEN, database Techniques S., KITSUREGAWA, parallel GIWEFE 1984. Corzf. 44. of California FUSHIMI, dataflow Kaufmann, D. J., GHANDEHARIZADEH, 1990. Univ. high SZGMOD ACM (Boston, Mass., June), 1. DEWITT, D. J., AND GERBER, R. H. 1985. Multiprocessor hash-based join Int’1. Conf. on Very Large Data Bases (Stockholm, Sweden, August) VLDB DEWITT, D. AND WOOD, Proc, system Very Large performance. New York, database software of a Data Bases Proc. IEEE 94. algorithm on a shared- 495. databases. ACM Comput. Surl,. 25, 2 73-170. ACM Transactions on Database Systems, Vol 20, No. 2, June 1995 236 G, Graefe and R. L, Cole . GRAEFE, G., AND DAVISON, D. L. dence m extensible GRAEFE, G IEEE 1994. Trans. GRAEFE, Knowl. Data machine M., dynamic KNUTH, MAIER, D. D GRACE 1973. The Art Reading, 1983, NAKAYAMA, M., Tlle dynamic destaging VLDB O’NEIL, 19, 8 (August), dataflow query processing 749. system. Sort versus hash revisited. IEEE Trans 934, T. Gene~ation join 1983. of hash to data base 1, 163. 1989. Proc. VLDB of Computer Application Comput. method. August), Theory The effect of bucket Znt ‘1. Conf. Endowment, Programming, of Relational M., strategy. on size tuning Very Large in the Data Bases 257 Vol. Databases. AND TARAGI, Proc. Endowment, P. E. architecture-indepen- Eng Softw III: Sorting and Searching. Addi- Mass. KITSURECAWA, August), and Trans. 120. 1994. M., AND TARAGI, M. hash of parallelism parallel 1 (February), New The Netherlands, son-Wesley, and H , AND MOTOOKA, M., NARAYAMA, (Amsterdam, 6, 6, 6 (December), TANAKA, hybrid IEEE extensible Eng. and Its architecture. KITSUREG~WA, processing A., AND SHAPIRO, L. D. Eng. KITSUREGAWA, An Data Encapsulation query Volcano: G., LINVILLE, Knowl. 1993. database M. Int ‘1, Conf. Computer 1983. Science Press, Hash-partitioned on Very Large Data Rockville, join Bases Md. method using (Los Angeles, Calif., 468. 1994 ~ntroductzon 1986. Fragmentation: to Relational Databases. Morgan Kaufmann, San Mateo, query processing. ACM Trans. Cahf, SACCO, G. M. Database Syst 11, 2 (June), SALZBERG, B., TSURERMAN, Sort: A distributed (Atlantic City, N. J., May), in a shared-nothing ACM, SHAPIRO, L. D. Trans. 1986 Database SMITH, J. M., database ACM, Join Syst. 1986. evaluation Proc. in database 1975. of four ACM systems Optimizing the 18, 10 (October), database IEEE, 1990. ACM April TransactIons 1994; revised on Database parallel SIGMOD An Data with 1990. Fast- Conference join algorithms Con ference (Portland January Systems, large main memories. ACM query New IEEE language, York, adaptive hash Bases. performance 1995; Vol 20, No algebra Data 1990. Proc, Eng. Supporting IEEE Bull, 9, 1 (March). universal Int ‘1. Conf. quantifi- on Data Eng. 68. join (Brisbane, accepted of a relational 568. algorithm for multiuser Australia, August) 186 Received B. SIGMOD 239. case for shared-nothing. on Very Large ACM 110. ACM February), Proc. 94. environment. processing The S., AND VAUGHAN, sort. A., SOCIWT, G., AND BURNS, L. H. AND GRAY, J. Int ‘1, Conf, York, 11, 3 (September), Commun. Calif., efficient external A performance York, in a two-dimensional (LOS Angeles, Proc. New 1989. AND CHANG, P. Y. T. interface. STONEBRAICZ~, M.. ZELLER, D. New WHANG, K. Y., MALHOTRA, cation single-output multiprocessor May-June), for A., GRAY, J , STEWART, M., UREN, single-input SCHN~lDER D., AND DEWITT, Ore., A techmque 113 March 2, June 1995 1995 VLDB environments. Endowment,