Download Fast algorithms for universal quantification in large databases

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Entity–attribute–value model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational algebra wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Fast Algorithms for Universal
Quantification in Large Databases
GOETZ
GRAEFE
Portland
State
Unwersity
and
RICHARD
L. COLE
Redbrick
Systems
Universal
quantification
it adds
significant
for
analysis
the
reasons
for
mance
omission
that
and
been
embodies
effects
of explicit
inputs
larger
than
performance
comparisons
comparisons
demonstrate
and
that
the
predicate
over
two
relations
over
the
same
system.
expressed
the
very
directly
rather
and
H.2.4
General
Terms:
research
has
and
18. monitored
Sequent
Computer
author’s
(goetzg
Permission
to make
granted
without
advantage,
given that
servers,
supported
IRI-91
Digital
Systems,
98052-6399
by the
16547,
the
for
experimental
Moreover,
evaluates
a universal
join
evaluates
an existential
and
universal
quantification
algorithm
to a query
quantification
query
optimizers
division,
should
be
do not recognize
and
[Database
H.2.3
therefore
produce
Management]:
Lan-
processing
ADP,
address:
and the
Intel
Advanced
Microsoft
Science
Foundation
Research
Laboratory
Corp.,
Oregon
G. Graefe,
National
Advanced
Research
Equipment
Projects
under
contract
Supercomputer
Computing
Corp.,
with
Institute
Way,
(ARPA
DAAB-07-9
Systems
One Microsoft
grants
Agency
1-
Division,
(OACIS).
Redmond,
WA
@ microsoft.tom).
digital/hard
fee provided
copy of part
that
copies
the copyright
notice, the
copying
M by permission
or to redistribute
Q 1995 ACM
perfor-
queries.
Systems—query
by the US Army
Instruments,
current
Files;
the
Experimentation
partially
Contact
[Data]:
E.5
three
algebra
variants
and
proposed
as relational
quantification
the
the algorithms.
universal
most
main
perfor-
and compare
enforcement,
algorithm
recently
of the
their
division,
Analytical
(semi-)
that
because
in SQL
Management]:
is
One
and
we investigate
existential
the
study
universal
Descriptors:
as hash
attributes.
among
division
Thus,
language
available
been
Texas
of our
query
IRI-8912618,
number
C-Q518),
Subject
as fast
by adding
result
for many
strategies.
differences
relations.
efficiency
Algorithms,
IRI-8996270,
order
plans
[Database
guages;
This
second
formulations
evaluation
Categories
equal
A
in a database
indirect
poor
with
integrity
the fact that
in particular
algorithms
relational
algorithm,
proposed
despite
capabilities,
we describe
for
each
substantial
recently
article,
referential
execution
predicate
be supported
For
parallel
quantification
can
In this
and
the
systems
of set-valued
algorithm
removal
illustrate
database
and inference
quantification
quantification.
quantification
evaluation
and
universal
proposed
duplicate
memory,
in most
processing
databases.
recently
universal
mance
that
for large
one
directly
query
relationships
has
not been explored
algorithms
operator
to a system’s
of many-to-many
this
have
known
is not supported
power
to lists,
0362-5915/95/0600-0187
ACM
Transactions
are
or all of this
not
made
work
title of the publication,
of ACM,
Inc. To copy
requires
prior
specific
for personal
or distributed
for
or classroom
profit
use is
or commercial
and its date appear,
and notice
otherwise,
to republish,
to post
permission
and/or
is
on
a fee.
$03.50
on Database
Systems,
Vol. 20, No 2, June
1995, Pages 1S7-236
188
.
G. Graefe and R. L. Cole
1. INTRODUCTION
Quantification
database
which
Similarly,
the
queries.
division
example
research
request
using
a relation
query
is to find
list,
there
R that
those
“for-all”
algebra
evaluation
customers
indicates
who
which
all
do not
meets
directly
or efficiently
provide
the
it,
consider
which
A such that for each criterion
in R matching
A and B. In relational
of R by the list of interesting
of universal
quantification
relational
quantification
of a given
customer
i.e.,
predicates
includes
universal
systems
satisfy
algorithms.
quantification,
complex
complex
and
quantification,
semi-join
universal
evaluate
most
databases,
customers
a tuple
supported
not
well-known
complete
for
query
most
to find
exists
to
and
existential
or algorithms
to implement
operation
of a universal
quantification
query,
query is a division
application
a typical
are
system
sets
support
using
required
Nonetheless,
querying
also requires
a relationally
operator
relational
As an
for
as SQL
completeness
of a database
1972].
division,
concept
such
be implemented
relational
ability
[Codd
languages
easily
can
However,
the
is a powerful
query
a market
list
of criteria,
criteria.
This
B in the given
algebra,
this
criteria.
In fact, the query is
and relational
division,
which
in any commercial
relational
database
system.
Other
computer
can easily
be imagined.
Which
students
have taken
all
courses required
to graduate?
Which
courses have been
examples
science
can provide
all
by all students
of a research
group? Which
suppliers
assembly?
Which
parts are available
from all regional
for a certain
taken
parts
suppliers?
Which
Which
skills
applicants
common
are
possess
all
to
all skills
recent
required
hires?
for a new job opening?
In
order
to
illustrate
the
ubiquity
of useful
universal
quantification
and relational
division
queries,
are intentionally
based on the relational
schemas frequently
these questions
used
for simple
database
examples.
In fact,
quantification
induces a pair of universal
many relationships
underlying
the first
set-valued
only
the
queries
with
a restriction
and to “recent
Given
many-to-many
relationship
only the two many-toabove. Similarly,
any
a pair of universal
quantification
queries,
not
attribute
suggests
“skills’’-attribute
underlying
the last two questions.
Many of these
answer
very useful
questions
in the real world,
in particular
when
combined
science”
any
queries,
not
four questions
the
power
on the
divisor,
e.g., the restrictions
to “computer
hires.”
of universal
quantification
and
relational
division
to
analyze
many-to-many
relationships
and set-valued
attributes,
why have
they been neglected
in the past? We believe that this omission
has largely
implementation
algorithms.
This article
been due to the lack of efficient
attempts
to remedy
this problem
by surveying
algorithms
for relational
(all data fit in
division
and comparing
their suitability
for small data volumes
memory),
results
to
all
complex
ments
ACM
for
large
is that
prior
algorithms
cases
as
demonstrate
Transactions
data
volumes,
a recently
well
that
on Database
proposed
by
its
as
in
and
for
algorithm
generality
parallel
relational
Systems,
Vol
parallel
and
query
its
performance
execution
division
20, No
execution.
2, June
Our
hash-diuision
called
can
1995
be
systems.
computed
primary
is superior
in
simple
Our
as
fast
and
experias
a
Fast Algorithms
hash-based
join
performance
enable
the
mented
or semi-join
can
be
of the
achieved
optimizer
to
by relational
same
only
detect
for lJniversal Quantification
if
pair
and
relations.
language
universal
division,
of input
query
and
fastest
this
formulation
that
relation
189
However,
query
quantifications
if the
.
can
division
be
imple-
algorithm
is
used.
Because
universal
glected
in
for
supporting
not
in
past
database
tional
universal
first
rationale
tend
R in a difference
product
plan
ignore
four
and
relational
be
very
and
alternative
rela-
operators.
negated
existential
functions.
Fourth,
bother?
B) and S(B)
are relations,
then
[Maier
1983].
However,
Cartesian
this
one
must
We
require
matched
will
present
relational
against
as a join.
expression
Cartesian
to
be
is as expensive
equivalent
inefficient.
this
so why
which
on the
do not
explicitly
First,
relational
aggregate
ne-
I?( A,
operation,
that
division
using
using
been
rationales
systems.
other
expressed
frequently,
if
large,
of
have
traditional
evaluation
terms
x S) – R)
based
is
plans
very
correct:
to be very
operator
therefore
in
division
address
query
can
used
is
products
evaluation
in
be expressed
– ~~((m~(l?)
dividend
evaluation
and
expressed
not
relational
and
quantification
it can
are
R + S = m~(R)
query
name
quantification
Third,
predicates
The
be
and
us
languages
can
quantifications.
for-all
let
universal
query
division
Second,
quantification
research,
using
more
the
Thus,
a
a Cartesian
efficient
product
operations,
division
in
our
query
and
will
performance
comparisons.
The
second
queries
as we
will
clauses
the
rationale
expressed
see in
and
tifications
are
much
systems,
the
in
algebra
expression
above.
B-trees,
the
plan
EXISTS”
Section
The
query
clauses
third
two
simple
specify
this
such
proposed
not
in
At
if both
plan
inputs
are
naive
division
In
fact,
many
in
two
to
quan-
evaluation
subqueries;
similar
quantification
to the
query
using
in
Furthermore,
as universal
article.
any
query
even
universal
SQL.
choose
this
quantification
best,
in
unnest
complex,
EXISTS”
queries,
queries
will
very
“NOT
complexity,
correctly
therefore
a universal
nested
example
Given
evaluation
be similar
rationale
queries
be discussed
and
complex.
ing
our
a query
for
quantification
are
those
“NOT
the
EX-
relational
indexed
with
clustering
query
with
two
algorithm
“NOT
discussed
in
2.
division
cation
will
for
do
a universal
results
example,
recognize
ones
universal
quantifications
for
and
optimizers
formulating
clauses
to
not
divisions,
than
although
predicates.
hard
will
relational
slower
SQL,
required
are
relational
correct,
existential
query
typically
and
commercial
ISTS”
2. In
queries
optimizers
plans
Section
of additional
quantification
also
negated
subqueries
absence
query
is
using
is
based
in
Section
relational
Moreover,
division
by
on
division
means
in
rare
inputs
be
duplicate-free,
valid,
and
SQL
formulations
and
their
query
2. Unfortunately,
as we
except
also
aggregation
of
will
formulations
queries
see in
based
the
aggregate
referential
ACM Transactions
aggregation
results
in
integrity
on Database
between
Systems,
poor
requires
the
will
quantifi-
are
evaluation,
performance
relational
plans
of universal
performance
functions
circumstances—competitive
that
on
for
evaluation
also
very
implementperformance
that
inputs
Vol. 20, No. 2, June
both
hold,
1995.
190
.
and
G Graefe and R. L, Cole
that
the
Whenever
the
query
the
processor
divisor
relational
division
a separate
step,
systems.
Thus,
important
for
use
is the
result
inputs
which
hash-based,
will
not
hold
is particularly
direct
not
of a selection,
and
must
expensive
algorithms
that
modern,
parallel
database
rationale
seems
to be the
sort-based,
referential
in
avoid
aggregation.
integrity
parallel
than
explicitly
query
aggregation
systems
between
be enforced
they
in
evaluation
are
even
have
been
more
in
the
past.
The
fourth
tification
not
and
at
all
relational
clear
what
quantification
SQL
not
cation
if
used,
or support
on-line
the
in
following
relational
algebra
because
algorithms’
decision
support,
context
that
the
quantification.
also
(as opposed
universal
Moreover,
in
constraints
bases,
and
or
universal
the
to other
data
with
as
by Carlis
[ 1986]
its
model.
about
that
serve
those
and
that
models
can
such
by
model
conclusions
algorithms
level
quantification
data
familiar
and
operators
to algorithmic)
quantifi-
knowledge
relational
is most
here
the
in
many-to-many
in general
for
reader
Furthermore,
express
integrity
mining,
of the
applicable
quantification
to
to grow.
algorithms
considered
are
data
is
quantification
earlier.
complex
it
universal
universal
discussed
be expected
the
complex
analyzing
of quantification
must
assume
sophisticated
conceptual
as
enforce
that
universal
in
quan-
However,
possible
are
While
universal
used,
environments,
that
algorithms
Is it
valuable
attributes,
chose
one:
and
they
time?
very
examine
performance
universal
more
We
we
the
because
systems
particular
important
taught
is effect.
a long
importance
sections
division.
However,
what
are
set-valued
programming,
The
run
most
rarely
shunned
to
database
quantification
and
division
and
next-generation
are
in transaction-processing
relational
relationships
on sets
are
tend
be useful
and
logic
is cause
queries
and,
may
division
the
require
as a basis
explored
and
for
on
a
by Whang
et
al, [1990],
Section
2 gives
relational
division.
discussed
in
an
algorithms
using
memory.
parallel
Section
cost
for
both
Section
final
files
5
4,
for
comparison.
contains
been
called
and
proposed
intermediate
of
plans
7 uses
the
experimental
a summary
and
results
the
these
larger
algorithms
to
systems.
and
derive
cost
formulas
comparison
our
is
of
distributed-memory
evaluation
for
hash-di~ision
modifications
adaptations
Section
An
have
consider
and
query
algorithms.
section
we
inputs
shared-memory
four
that
algorithm
describes
complete
the
performance
8. The
In
algorithms
proposed
Section
6, we illustrate
analytical
Section
3.
temporary
machines,
formulas
of
recently
Section
than
main
overview
A
conclusions
In
analytical
for
follows
from
any
in
this
research.
2. EXISTING
ALGORITHMS
most
clearly,
we introduce
two example
In order to describe the algorithms
database
queries that will be used throughout
this paper. Assume a university
title
) and
Transcript
( student-id,
with
two relations,
Course
( course-~zo,
course-no,
are
interested
grade)
in
with
finding
the
the
obvious
students
key
attributes.
who
have
For
taken
ACM Tramsact,ons on D~t~base Systems, Vol 20, NO 2, June 1995
the
all
first
courses
example,
offered
we
by
Fast Algorithms
the university.
As
an English
query
relational
In
this
algebra,
this
example,
projection
The
tient.
the
query
is the
attributes
of the
in
the
191
.
database,
this
is
courses (courseappears
in the
is
of
Transcript
The
divisor,
divisor
the example.
The quotient
that are not in the divisor,
relations
in
projection
of Courses
the
Transcript
such that for all
this
student-id
and course-no
find the students
( student-id’s)
no’s) in Courses,
a tuple
with
relation.
Transczzpt
In
over
for Universal Quantification
in
is
the
division
are called
and
dividend
result
is called
attributes,
divisor
the
the
quo-
course-no
are the attributes
of the dividend
in the example.
The set of quotient
attributes
student-id
attribute
values in the dividend,
n,t .~,, ~.,~ (Transcript)
in this example,
is
candidates.
The set of all dividend
tuples with the
called the set of quotient
candidate
group.
same quotient
attribute
value is called a quotient
It is important
to notice that both relations
are projected
on their
key
attributes
in
dividend
or the
take
same
the
this
an additional
key
division
attribute
be properly
employed
desirable
We also assume,
for this
integrity
tion.
values
a priori;
semi-join)
must
As
our
second
taken
all
the
holds
It is important
known
string
have
queries
above
the
that
same
seem
does
not
may very well
dividend
the
validity
of the
division
explicitly
enforce
it.
example,
we
courses,
For
restrictive
effect
almost
hold
be transcript
courses
for
this
example,
a
in finding
the
which
title
the
e.g.,
selection
considered
entries
referential
Courses
that
integrity
do not refer
the
by
two
constraint
difference
has ramifications
when division
is implemented
tions, which will be described
shortly.
Table I summarizes
the cases that must be considered
have
a prior
would
example
assumed
In the example,
to database
(a
contains
a semi-join,
While
relation,
who
attribute
is restricted
be
step
students
divisor
here.
all
constraint
a preprocessing
divisor,
the
that
Cozu-se rela-
the
the
restricted
a referential
namely
in
on the
identical,
the
interested
i.e.,
represents
integrity
or
expenin
Thus,
divisor,
referential
the
must
to duplicates
relation
appear
algorithm
either
can be quite
operations
as the
for
are
also
require
or the inputs
courses.
and
relation
the
“Database.”
Other
between
and
to
would
is insensitive
the Transcript
students
the
duplicates,
them
removal
that
in
permitted
of Transcript
on studentremoval
step. As we will
contain
duplicate
that
Transcript
otherwise,
database
selection.
example,
between
in the
may
tuples
were
relation
be able to handle
an algorithm
relationship
constraint
course-no
Transcript
division
must
Performing
very
many-to-many
the
of duplicate
if students
term and the projection
require
an explicit
duplicate
preprocessed.
sive, making
its inputs.
problems
However,
times,
of relational
algorithm
the
arise.
multiple
would
inputs
Thus,
do not
course
id and course-no
see, if the
example.
divisor
there
courses.
using
This
aggrega-
separately
in
a
complete
analysis
of relational
division
algorithms.
In order to shorten
the
discussion,
we only consider
cases in which
either
both or none of the two
inputs
may contain
duplicates.
Thus,
in the analytical
and experimental
ACM TransactIons
on Database
Systems,
Vol. 20, NO 2. June
1995.
192
.
G. Graefe and R. L. Cole
Table
I.
Cases to Consider
Vlolatlons
referential
Case
for Relational
of
Division
Duplicates
mtegrlty
Algorithms
Duphcates
in
the diutdend
1
No
No
No
2
No
No
Yes
3
No
Yes
No
4
No
Yes
Yes
5
Yes
No
No
6
Yes
No
Yes
Yes
Yes
No
Yes
Yes
Yes
8
performance
comparisons,
we only
consider
four
in
the dwmor
cases, namely
cases 1, 4, 5,
and 8.
Let us briefly
speculate
which
practice,
a dividend
relation
will
many
relationship
quantification
between
queries
of these cases will
often be a relation
two entity
will
find
the
types.
In other
an entire
set or stored
query.
frequently,
into
will
e.g., our
the relational
foreign
keys,
often
divisor
duplicate
where
be
students
key
Before
point
not
are
attribute
discussing
out that
example
there
are not
query.
permitted
arise,
to
take
useful,
from
the
e.g.,
the
in
with
between
other
the
same
hand,
the
modified
course
keys
require
relational
division
tions Student
(student-id,
but actually
name, major)
and a query for the students
major. The crucial
difference
quantification
queries
do not. Consider
and Requirement
who have taken all courses
here is that the requirements
and
in the
inputs
dividend
and
problem
example
twice,
more
E-R-model
of
above
requiring
term that is not present in the dividend.
specific relational
division
algorithms,
we would
are some universal
e.g.,
much
the
I, duplicates
integrity
On
universal
type,
e.g., our first
be used
are represented
of Table
referential
assumed.
sometimes
often
will
Translated
relationships
while
be
many
of one entity
subset of instances
of another
We believe that queries using
alternative
classification
a problem,
does
subset
where
the
typically
removal
additional
second
into
as divisor
the
model,
and
not
can
relation
Instead,
words,
set of instances
students,
that are related
to all or a selected
entity
type, e.g., courses or database
courses.
example
arise most frequently.
In
representing
a many-to-
that
an
like
to
seem to
two additional
rela(major,
course-no)
required
for their
are different
for
each student,
depending
on his or her major.
This query can be answered
with a sequence
of binary
matching
operations.
A join of the Student
and
Requirement
relations
projected
on the student-id
and course-no
minus the
relation
an be projected
on student-id’s
to obtain
a set of students
Transcript
Student
who
have
not taken
all their
requirements.
The
difference
of the
relation
ments.
with
this finds the
The relational
algebra
students
who have satisfied
all their
expression
that describes this plan is
‘?,tudent.,d,course-no(
ACM
TransactIons
on Database
Systems,
Vol
Transcript))
20, No. 2, June
1995
require-
Fast Algorithms
for Universal Quantification
193
.
This sequence of operations
will have acceptance
performance
because it does
not contain
a Cartesian
product
operation
and its required
set matching
algorithms
(i.e., join and difference)
all belong
to the family
of one-to-one
match
operations
etrized
join
1986].
In
that
fact,
student-id
can be implemented
algorithm,
e.g., hybrid
since
the
attributes,
by a single
matching
The following
difference
these
join
could
two operations
operation
division
efficiently
hash
for three
algorithms
When
sort operations
and hash-based
suitable
index
et al.
be executed
could
inputs
by
[Graefe
more
Shapiro
on the
efficiently
1993].
on the existence
of indices
might
provide
a mechanism
indices could be used instead
in the sort-based
algorithms,
in particular
indices could be used to the hash-based
structures
param-
1986;
hashing
be executed
do not depend
or other associative
access structures
that
fast or ordered
access to tuples.
Sort-based
explicit
indices,
by any suitably
[DeWitt
are available,
they
for
of
clustering
algorithms.
can speed up the affected
algorithms,
e.g., by rendering
a sort operation
unnecessary.
However,
it is not
always
possible
to choose a suitable
clustering.
For example,
keeping
the
Transcript
useful,
table
but
in
as for
alternative
sort
example)
is equally
queries.
Any
our
any
order
sorted
representing
on the
likely
table
example
table
other
part
to be chosen
representing
and
clustered
on
a many-to-many
of the
table’s
to enhance
a many-to-many
join
student-id’s
is
relationship,
key
(cow-se-no
performance
relationship
the
in the
for other
implies
the
same problem.
Similarly,
ing) would
multi-table
clustering
(sometimes
called “master-detail”
be useful and could improve
the performance
of some
clusterrelational
division
algorithms.
Unfortunately,
it is available
in only a very few database
systems,
and it has the same problem
for many-to-many
relationships
as
clustering
by sort order:
one-to-many
relationship,
can be clustered
in two
while
a master-detail
a table representing
ways,
For
generality,
relationship
is inherently
a many-to-many
relationship
we therefore
clustered
relations.
Moreover,
and more importantly,
indices typically
diate query results.
In order to ensure
that our
apply to both permanent,
ing results,
we discuss
advantages.
database
inferred
cessing
division
in database
systems
and physical
and orderings
can be
the cost of prepro-
as sorting.
2.1 A Naive Scwt-Based
The first
dividend
attributes
stored relations
and to intermediate
query processand analyze
division
algorithms
without
any such
The cost of relational
such
presume
do not exist on intermealgorithms
and analyses
designs that could include
suitable
structures
from the provided
cost formulas
by subtracting
steps
do not
a
Algorithm
algorithm
directly
implements
the calculus
predicate.
is sorted using the quotient
attributes
as major
and
as minor
sort keys. In the examples,
the Transcript
First,
the
the divisor
relation
is
sorted on student-id’s
and, for equal student-id’s,
on course-no’s.
Second, the
divisor
is sorted
on all its attributes.
Third,
the two sorted relations
are
scanned in a fashion
reminiscent
of both nested loops and merge-join.
The
dividend
serves as outer,
the divisor
as inner,
relation.
The dividend
is
ACM
TransactIons
on Database
Systems,
Vol
20, No 2. June
1995
194
.
G. Graefe and R. L. Cole
1+
Student
Student
Course
Jack
Intro
Jill
Database
Jill
Intro
to Databases
Jill
Intro
to Graphics
Jill
Readings
in Databases
Joe
Database
Performance
Joe
Intro
to Compilers
Joe
Intro
to Databases
Jill
to Databases
Performance
I
Readings
+
Dividend
Fig.
in Databases
I
1.
Sorted
division
exactly
tuple,
and once partially
for each candidate
quotient
tuple which
does not participate
in the quotient,
Differently
than in merge-join,
both scans can be advanced
when
an equality
match
has been
The
second
dividend
divisor
example.
Figure
string
and
values
tion,
This
1 shows
dividend
these
relation
tuples,
in
would
no. Concurrent
tables
divisor
Figure
contain
scan
marked
are
sorted
1 because
typically
scans
is scanned
a tuple
e.g., a Transcript
algorithm’s
three
the
may
divisor
naive
scanned
any of the
the
into
Quotient
quotient
actually
however,
found,
once, whereas
inputs
=
Divisor
of the “Jack
and of the divisor determine
has not taken the “Database
such
Dividend,
are
be identifying
Divisor,
for
tuples
naive
easier
keys
does not
of a graphics
ignores
properly
they
that
tuple
logic
once entirely
to read;
such
in the
as
match
course
extraneous
and
in
with
in
Quotient.
The
We
a real
(there
the
records.
division.
student-id
dividend
for each
and
show
applica-
course-
is only
one)
that “Jack
is not part of the quotient
because he
Performance”
course. A continuing
scan through
the “Jill”
tuples in the dividend
and a new scan of the entire divisor includes
“Jill” in the output
of the naive division.
The fact that “Jill” has also taken an
“Intro
to Graphics”
course is ignored by a suitably
general scan logic for naive
division.
Finally,
the “Joe”
tuples
in the dividend
are matched
in a new scan
of the divisor,
and “Joe” is not included
in the output.
Essentially
this algorithm
was proposed
by Smith
and Chang [19751 and
other early papers on algorithms
for executing
relational
algebra expressions.
analyzed
in the performance
comparisons
later
in this
It is the first algorithm
article.
SQL Formulations
In the introduction,
quantification
and
existential
very poor
ACM
Using
EXISTS”
Clauses
we claimed that the typical
relational
division
queries,
quantifications,
performance.
TransactIons
Two “NOT
on Database
SQL formulation
which
employs
leads not only to confusing
complexity
For example,
presuming
that
“NULL”
Systems,
Vol. 20, No. 2, June
1995
of universal
two negated
but also to
values
are
Fast Algorithms
prohibited,
[1994]
the
first
example
query
for Universal Quantification
is Date
tl,student-id
FROM
SELECT DISTINCT
WHERE
NOT EXISTS
(
SELECT’
FROM
Course c
WHERE
NOT EXISTS
(
SELECT
* FROM Transcript
t2
WHERE
t2.student-id
= tl.student-id
AND t2.course-no
= c.course-no)).
In order
to process
this
SQL
query,
and
Darwen
Transcript
as many
195
o
[1993];
and
O’Neil
t]
as three
nested
loops
will
be
The
used. The outermost
one loops over all unique
values for t1.student-id.
middle
one loops over the values for c. course-no. Notice that these two loops
correspond
to the Cartesian
product
in the relational
algebra
expression
given in the introduction
as well as to the two loops in naive division.
The
with
innermost
loop determines
whether
or not there is a tuple in Transcript
and c. course-no.
Since this loop is essentially
a
values
of tl.student-id
search, an index on either attribute
of Transcript
can be very useful.
using
B-tree
indices,
many
of the
If Transcript
and Course are clustered
the
index
searches
run-time
Thus,
when
compare
for
2.2
of the
be
two
“NOT
fast.
other
algorithms
Since
the naive
scans
of the
with
Division
division
divisor,
the
sity
into
counts
enforcing
example
as many
(tuples
SELECT
best
case,
will
the
against
case
execution
be similar
of the
naive
typical
logic
to naive
division,
SQL”
and
division.
we
also
work-around
by Aggregation
algorithm
it
may
requires
be rather
in the
both
large
inputs
and repeated
inputs,
One such
In fact, in
and
it
seems
alternative
uses
most
relational
counting
is the most efficient
way to express
is left to the user to rephrase
all universal
which is a likely
of duplicate
tuples
referential
integrity.
query can be expressed
courses
for
algorithms.
counting.
courses
as there
relation).”
In
as “find
the
are courses
SQL,
relation,
that all Transcript
are no “NULL”
values, this
t.student-id
FROM Transcript
t
BY t.student-id
COUNT
(t.course-no)
= (SELECT
This query
university
are
the subquery
courses taken
sorting
slow
aggregate
functions,
due to the presence
(different)
duplicates
in either
no’s, and that there
GROUP
HAVING
best
to search for alternative
or, more
specifically,
quantifications
e.g., incorrect
taken
this
clauses
algorithms
database
management
systems,
for-all
predicates.
However,
it
a predicate
The first
In
EXISTS”
quantification,
Implementing
worthwhile
aggregations
very
comparing
these
universal
can
source of errors,
or the omission
of
students
offered
assuming
that
tuples refer
query is
COUNT
(course-no)
who
have
by the univerthere
to valid
FROM
are
no
course-
Course).
is evaluated
in three steps. First,
the courses offered by the
counted
using a scalar aggregate
operator.
This step replaces
expression
with
a constant.
Second,
for each student,
the
are counted
using an aggregate
function
operator.
Third,
only
ACM
TransactIons
on Database
Systems,
Vol
20, No. 2. June
1995.
196
.
those
G, Graefe and R. L. Cole
students
whose
number
of courses
taken
is equal
to the
courses offered are selected to be included
in the quotient.
If the dividend
and divisor
relations
do not contain
unique
example,
it would
dent-id’s
and
same
course-no’s
database
that
case,
would
course
the
be
be necessary
part
to explicitly
counted.
twice,
For
if
the
example,
first
time
Transcript
relation
should
of the
However,
the
key.
request
include
before
grouping
The second
in
dividend
is
the
SELECT
t.student-id
t.course-no
BY
query
can be expressed
Transcript
(DISTINCT
COUNT
c.title
as “find
t, Course
AND
c.title
t.course-no)
(DISTINCT
LIKE
illustration
would
would
words
FROM
= c.course-no
COUNT
WHERE
For
the
grade.
attribute,
In
which
projection
Tran-
of
the students
courses
who have
offered
by the
c
LIKE
‘?% Database’%”
t.student-id
(SELECT
they
a failing
and counting.
example
WHERE
HAVING
stu-
take
of the key is being removed,
in the dividend
is required
taken as many database
courses as there are database
university.”
In SQL, this query can be expressed
by
GROUP
may
a term
of
as in our
of the
students
resulted
script
on student-id
and course-no;
since part
duplicates
are possible
and duplicate
removal
keys
uniqueness
some
number
=
course-no)
FROM
Course
“7cDatabase%”).
purposes,
be necessary
we included
if duplicates
two
“DISTINCT”
could
exist.
key words
A good
query
where
optimizer
infer key and uniqueness
properties
and ignore these “DISTINCT”
key
where
appropriate.
More
important,
however,
are the additional
“WHERE”
clauses.
First,
since
referential
integrity
between
dividend
and
divisor does not hold, an additional
join clause must be specified.
Second, the
musk
be specified
on both
levels
restriction
on the divisor (on “04DatabaseVO”)
of
subquery
nesting.
Comparing
the
and
the
second
count
third
step
only
courses,
query
steps
of
becomes
those
evaluation
the
plans
plan
above
significantly
tuples
the aggregate
and Courses restricted
The scalar aggregate
from
function
for
remain
more
the
Quantification
Queries
SQL
virtually
relation
be preceded
queries,
the
Since
that
it
can
the
same,
is
but
first
the
important
refer
by a semi-join
to database
courses.
operator
for the divisor
easily, e.g., using a file scan, and similarly
function
and the possible
semi-join
require
ourselves
only
this section, we will concern
Can All Universal
two
complex.
Transcript
must
the
to
to database
of Transcript
be implemented
quite
the final selection.
The aggregate
more effort; in the remainder
of
with
these
be Expressed
operators.
by Counting?
While it seems to work for the example
queries,
one might
ask whether
all
universal
quantification
can be expressed
by counting.
The essential,
basic
ideas for a proof are that (1) universal
quantification
and existential
quantification are related
to each other as each is negated
using the other, and (2)
existential
quantification
is equivalent
to a count greater than zero. Negated,
ACM
Transactions
on Database
Systems,
Vol
20, No 2, June
1995
Fast Algorithms
existential
quantification
universal
equal
to
to zero.
the
count
all
This,
maximum
to a count
is equivalent
in
turn,
to the
is equivalent
possible,
i.e.,
the
in the set over which
of items
universal
tion
is equivalent
quantification
quantification
for Universal Quantification
to a count
count
of
to zero.
of qualifying
Similarly,
items
items
items
is universally
can indeed
197
non-qualifying
of qualifying
the query
queries
equal
count
.
is
equal
equal
to
quantified.
be rewritten
using
the
Thus,
aggrega-
and counting.
Since this proof idea is quite simple and very intuitive,
formal
proof here. In order
to capture
all subtleties
“NULL”
universal
values
must
quantification
be taken
into
and relational
proper
division
consideration.
queries will
pattern
given in the introduction,
i.e., analyze
and set-valued
attributes,
and therefore
involve
“NULL”
values in the rest of this article.
Division
The
Using
Sort-Based
traditional
way
we do not develop a
of SQL semantics,
Because
most
follow the query
many-to-many
key attributes,
relationships
we will ignore
Aggregation
of implementing
aggregate
functions
relies
on sorting
Afterwards,
the examples,
the count of courses
Transcript
is sorted on attribute
student-id.
taken by each student
can be determined
in
a single
scan. The exact
of this
[Epstein
1979].
file
In
logic
scan is left
to the reader.
An obvious
optimization
of this algorithm
is to perform
aggregation
during
sorting,
i.e.,
whenever
two tuples with equal sort keys are found, they are aggregated
into
one tuple,
thus reducing
the number
of tuples
written
to temporary
files
[Bitten
and DeWitt
1983], We call this optimization
early aggregation.
If the query requires
a semi-join
prior to the aggregation
as in the second
example,
any of the semi-join
algorithms
available
in the system can be used,
typically
merge-join,
semi-join
versions,
must
be sorted
from
the
must
student-id’s
loops
exist.
on the divisor
grouping
relation
nested
if they
attributes
(quotient)
be sorted
join,
first
index
If merge-join
nested
loops
is used, notice
for the semi-join,
attributes.
In
on course-no’s
the
for
join,
or
their
that
the dividend
which
are different
example,
the
the
semi-join
has been used in most
database
Transcript
and
then
on
for the aggregation.
Since
sort-based
INGRES
[Epstein
aggregation
1979]
and
DB2
[Cheng
et al.
1985],
it
systems,
e.g.,
is important
to
understand
its performance
when used for relational
division.
Division
using
sort-based
aggregation
is the second algorithm
analyzed
in the performance
comparisons
below.
Division
Using
Hash-Based
Aggregation
Sorting
actually
results
in more order than necessary
for many relational
algebra
operations.
In the examples,
it is not truly
required
that the Transcript tuples be rearranged
in ascending
student-id
order; it is only necessary
that
Transcript
tuples
with
equal
student-id
attribute
values
be brought
together.
The fastest way to achieve this uses hashing.
Thus, several hashbased algorithms
have been proposed
for join, semi-join,
intersection,
duplietc. (e.g., in Bratbergsengen
[1984],
cate removal,
aggregate
functions,
ACM
Transactions
on Database
Systems,
Vol. 20, No. 2, June
1995.
198
.
DeWitt
G. Graefe and R. I_. Cole
et
al.
Kitsuregawa
[1984],
others ). The most
hashing
modified
and
Shapiro
efficient
Gerber
[1986],
technique
[DeWitt
et al.
with techniques
[Kitsuregawa
Hash-based
main
DeWitt
et al. [1983],
for inputs
1984; DeWitt
for managing
et al. 1989; Nakayama
aggregate
functions
memory
hash
table,
but
[1985],
Zeller
F’ushimi
and Gray
larger
than
and Gerber
non-uniform
tuples,
al.
[1986],
and many
memory
is hybrid
1985; Shapiro
19861
hash value distributions
et al. 1988].
keep tuples
of the
not input
et
[1990],
output
The output
relation
relation
if
in
a
contains
the grouping
attributes.
student-id
in the examples,
and one or more aggreor both in the case of an average
gation
values,
e.g., a sum, a count,
computation.
Each
tuple
with
tuple.
When
input
matching
is either
aggregated
attributes,
input
is consumed,
the entire
is contained
tuple
grouping
in the hash
into
an existing
or it is used to create
the result
output
a new output
of the aggregate
function
table.
If the aggregate
function
output
does not fit into main memory,
hash
it. Since the
Oz)erflow occurs and temporary
files must be used to resolve
table
contains
tion
input
with
only
fit
a total
tuples.
of
Thus,
files,
for
on
those
of
larger
sorting
than
If
using
et al.
the
than
on
hybrid
necessary
i.e.,
sort-based
1/0
overflow
Shapiro
If
can
students
table
need
hold
only
1/0
for
temporary
based
[Knuth
the
size
be used
for
500
sorting
aggregation
are
is
as considered
also
on
of
1973]
output
techniques
1986]
aggrega-
500
requirements
selection
hashing.
the
are
without
aggregation
The
that
if there
hash
well,
replacement
1984;
is not
examples,
tuples,
quicksort.
based
memory,
it
the
performs
files
using
aggregation
[DeWitt
In
Transcript
aggregation
runs
tions
output,
memory.
10,000
much
larger
duplicate
main
hash
memory-sized
based
aggregation
into
table
hash
similar
only
to
slightly
for join
opera-
aggregation
and
removal.
the
aggregate
function
is
preceded
by
a semi-join
as in
the
second
example
query,
the semi-join
can also be implemented
using hashing,
The
hash table used for the semi-join
is different
from the one used for aggregation, just as sort-based
semi-join
and aggregation
require
two separate
sorts
on different
attributes.
The hash
hashing
on course-no’s,
whereas
on hashing
studen t-id’s.
Let
us
briefly
duplicates
the
sort
filter
that
sort
can
in
consider
either
operation
compares
be
duplicates
used
and
duplicates
input
or,
relation
if the
aggregate
remove
tuples
again.
can
inputs
consecutive
to
table in the semi-join
is built and probed by
the hash table for the aggregation
is based
are
tuples.
be
the
already
In
duplicates,
at the
In
naive
eliminated
sorted,
sort-based
although
same
time.
division
algorithm,
conveniently
by
other
the
cannot
words,
of
a simple
aggregation,
sorting
In
as part
inserting
initial
remove
if duplicate
Transcript
relation
is required,
the early aggregation
can be
used for duplicate
removal
but the actual aggregation
(counting
course-no’s)
cannot be integrated
into the sort operator.
Similarly,
hash-based
aggregation cannot include duplicate
removal,
since only one tuple is kept in the hash
table for each group. While
efficient
duplicate
removal
schemes based on
removal
in
hashing
in main
exist, they require
that the entire duplicate-free
output must be kept
memory
hash tables
or in overflow
files. Thus, duplicate
removal
ACM
the
TransactIons
on Database
Systems,
Vol
20, No, 2, June
1995
Fast Algorkhms
based
on
about
the
sorting
hashing
and
in
of
lower
and
hash-based
et
costs
large
as
used
1986;
only
in
199
relation,
removal
a small
et
[Zeller
third
dividend
duplicate
.
with
based
on
1993].
DeWitt
SQL
is the
a very
[Graefe
been
al.
NonStop
aggregation
for
operations
has
[DeWitt
Tandem’s
1/0
CPU
aggregation
Gamma
1994],
be expensive
number
slightly
Hash-based
e.g.,
may
same
for Unwersal Quantification
al,
and
algorithm
number
1990],
Gray
1990].
analyzed
of systems,
Volcano
in
[Graefe
Division
using
performance
com-
parisons.
3. HASH-DIVISION
This
section
duced
as
algorithm.
algorithm
naive
contains
a description
has?l-diuision
The
design
for
universal
division
but
in
proposed
algorithm
division
Algorithm
Description
followed
was
motivated
quantification
fits
on
and
hashing.
into
II
a classification
pseudo-code
for the hash-division
one
the
and
kept
divisor
divisor
table, the
each tuple in the
with
of
for
division,
shows
intro-
discussion
by a search
the
a direct
similar
where
of universal
2 gives
the
algorithm
a
the
to
recently
quantification
and
algorithms.
Figure
called
by
relational
Table
tables,
for
proposed
[ 1989],
of hash-division
based
relational
of a recently
Graefe
one
for
the
algorithm.
quotient.
quotient
table, called
second
first
hash
table
is
table. An integer
value is
the divisor
number.
With
the
divisor
It uses two hash
The
each tuple in the quotient
table, a bit map is kept with one bit for each divisor
tuple. Bit maps are common data structures
for keeping track of the composition of a set; since
quotient
candidates
natural
component
FiDwre
of division
3 illustrates
are the
that
the purpose
of relational
division
is to ascertain
which
are associated
with the entire divisor set, bit maps are a
same
the
as those
inputs
algorithms.
the two
hash
in Figure
be sorted.
The
tables
used
in hash-division.
1; hash-division,
divisor
table
tuples and associates
a divisor number
the right contains
quotient
candidates,
however,
on the
left
The inputs
does not
contains
taken
incompletely
not
appear
there
hashing
on
tuple.
the
left
either
and
two
“Database”
maps.
The
courses
hash
no compilers
tuples
tuple
bit
hash-division
divisor
tuples,
one
filled
in
was
The
only
and
and
graphics
the
attributes.
assigns
the
In
the
current
a unique
At
of this
first
end
Figure
step,
in
the
the
divisor
than
First,
The
hash
bucket
is
number
divisor
table
their
do
that
relation
steps.
algorithm
by
databases
determined
three
the
for each
and “Joe”
indicated
immediately
as a tuple’s
divisor
is
other
in
process,
count
Thus,
the
course
table.
is inserted,
in
it was
proceeds
divisor
courses
on subjects
because
algorithm
into
all
table,
divisor
with each item. The quotient
table on
obtained
by projecting
dividend
tuples
on their quotient
attributes,
and a bit map for each item indicating
divisor tuples there has been a dividend
tuple. The fact that “Jack’
have
require
all
it
counts
divisor
number
is assigned
is complete,
inserts
determined
to
the
by
divisor
when
each
all
the
divisor
as shown
on
20. No 2, June
1995
3.
ACM
TransactIons
on Database
Systems,
Vol
200
G. Graefe and R. L. Cole
.
Table
11.
Classification
of Universal
Based
Du-ect
Indirect
Nawe
(counting)
semijoin
Sorting
by
aggregation
// step 1: build
assign divisor
with
the divisor
on hashing
Hash -du,ision
Hash-based
duplicate
duplicate
hybrid
hash join,
based
aggregation
aggregation
removal,
hash-
table
count +-- zero
for each divisor
tuple
insert divisor
assign tuple’s
increment
Based
merge-join
sorting
Algorithms
divwon
with
removal,
and
Quantification
on sorting
tuple into the divisor
divisor
number
the divisor
table
hash bucket
divisor
scan appropriate
if no matching
in the divisor
table
tuple is found
hash bucket in the quotient
quotient
(candidate)
create new quotient
from quotient
candidate
attributes
of dividend
with
to divisor
/i step 3: find result in the quotient
in quotient
tuple’s
tuple
zeroes
insert it into hash bucket of quotient
set bit corresponding
table
tuple is found
tuple
incl. a bit map initialized
for each bucket
bucket
tuple
scan appropriate
if a matching
appropriate
count
count
// step 2: build the quotient
for each dividend
table’s
+-- divisor
table
divisor
number
table
table
for each tuple in bucket
if the associated
print
quotient
Fig, 2.
bit map contains
no zero
tuple
The hash-division
algorithm,
In the second step, the algorithm
consumes the dividend
relation,
For each
dividend
tuple, the algorithm
first checks whether
or not the dividend
tuple
corresponds
to a divisor
tuple in the divisor
table by hashing
and matching
the dividend
tuple
on the divisor
attributes.
If no matching
divisor
tuple
exists, the dividend
tuple is immediately
discarded.
In the second example
query, a student’s
Transcript
tuple for an “Intro
to Graphics”
course does not
pass this test and is not considered
further,
and the algorithm
advances
to
the next dividend
tuple.
If a matching
divisor
tuple
is found,
its divisor
number
is kept and the dividend
tuple is considered
a quotient
candidate.
Next, the algorithm
determines
whether
or not a matching
quotient
candiACM
TransactIons
on Database
Systems,
Vol. 20, No. 2, June
1995
Fast Algorithms
Divisor
Database
for Universal Quanhfication
.
201
Bitmap
Number
Performance
’21
Fig. 3
date
already
dividend
exists
tuple
exists,
e.g.,
Divisor
in
on
the
because
table
the
quotient
the
tuple
is
quotient
attributes,
and
the
the
new
Together
tuple
for
bit
the
that
matching
to
set
step,
the
with
in the
one
bit
in
divisor
table
Finally,
the
can
third
exactly
contains
no zero.
quotient
Algorithm
by
tuple,
hashing
no
empty
map
is created
map
is initialized
number
bit
all
At
on
the
the
quotient
If,
needs
the
end
right
bit
for
zero’s,
earlier.
that
map.
as shown
on
one
with
a new
tuple
into
with
kept
exists,
candidate’s
beginning,
is inserted
the
candidate
dividend
bit
divisor
matching
quotient
the
the
tuple
and
such
at
projecting
already
is complete,
division.
a bit
This
tuple
quotient
is
by
in hash
If
quotient
to the
table
consists
the
the
table
new
candidate
quotient
table
created
table.
corresponds
quotient
the
divisor
table
attributes.
quotient
candidate
divisor
and quotient
quotient
quotient
table.
~
in
the
each
except
however,
a
to be done
is
of this
second
Figure
3, and
be discarded.
step
determines
of those
This
tuples
set
can
in
easily
the
the
quotient
quotient
of the
table
be determined
for
two
inputs,
which
the
by scanning
all
which
bit
map
buckets
in
table.
Discussion
A number
the divisor
so, the first
of observations
can be made on hash division.
can be eliminated
while building
the divisor
step of the
algorithm
must
be augmented
duplicate
in the divisor table’s hash bucket
the divisor
table,
which
is the standard
duplicate
removal
and aggregate
functions.
First, duplicates
in
table. In order to do
to test
for an existing
before inserting
a new tuple into
technique
used for hash-based
Second, duplicates
in the dividend
are ignored
automatically.
Two identical
dividend
items map to the same divisor tuple and therefore
the same divisor
number;
moreover,
they map to the same quotient
candidate,
and therefore
to
exactly
the same bit in the same bit map. To appreciate
this
maps, compare this method
of detecting
and ignoring
duplicates
effect of bit
in the large
input,
the dividend,
with
duplicate
removal
in the indirect
strategy
using
hash-based
duplicate
removal.
The latter
strategy
requires
building
a hash
table that contains
all fields of all unique
tuples from the entire dividend.
In
hash-division,
dividend
tuples
are split into two sections
(vertically
partitioned into the divisor and the quotient
attributes),
and the repetition
factors
ACM
Transactions
on Database
Systems,
Vol
20, No 2, June
1995
202
.
in each
G. Graefe and R. L. Cole
section
relate
are
entries
than
the
hash
instructive
is
In
hash-based
divisor
table
to
relation
step,
for
counter
used
employed
in
in
with
semi-join.
aggregation
the
are
The
second
holds
instead
if the
between
entirely
replaced
maps,
the
based
dividend
and
however,
prior
used
example
queries
classes.
Typically,
join
is needed
majors
to
Student
sion
permit
step
that
and
output
mits
the
the
the
isons,
the
the
big
product
of divisor
contain
more
expect
that
TransactIons
count
with
and
memory
on Database
on
divisor
is the
the
requirements
Systems,
the
dividend
quotient,
Vol
in
not
sufficient
and
Even
relation
see
the
really
as it
though
namely
do not
20, No. 2, June
performance
the
quotient
pose
1995
of
memory
are
all
not
the
of quotient
if
algorithms,
is a superset
the
the
do
the
naive
delivering
compar-
competitive.
main
quotient
effect
student-id’s,
which
perwithout
such as Student
e.g.,
are
hash-divi-
disadvantage
property
of
the
across
counts
sort-based
useful
e.g.,
and
function
The
will
algorithms
quotient.
than
we
input,
and
end
typically
a clear
the
a third
as
depends
the
relation
have
the
of the
individual
attributes,
names
at
input
has
the
a set of
a subsequent
a third
aggregate
the
relation.
with
However,
that
the
table)
consider
available
a third
division
quotient
students’
implementations
for
quotient
useful;
and the
taken
modification
permit
built
a third
not
with
efficient
(the
have
table
aggregation
division
tuples
the
and
could
far
table
the
hash-division
involved;
simple
merge-join
Recall
eliminated
to
example,
as the
hash-join
aggregation,
on
quotient.
tables.
func-
integrity
be
removal,
who
are
such
aggregation
with
sort-based
Finally,
hash
map
are known
to employ
similar
For
quotient
so
divisor
sort-based
an immediate
sorting
The
bit
additional
can
on its result
themselves
The
hash-based
sorted
table
immediate
hash
is to be joined
division
their
the
referential
and duplicate
information
Hash-based
Thus,
quotient
useful
an
compares
candidates.
and
of students
stz~dent-id’s
A fairly
retaining
as
table
the
aggrega-
function
surprisingly,
operations.
student-id’s
discussed
join.
in the
candidates.
the
In
contains
quotient
provide
divisor
not
a hash
student-id’s,
relation.
subsequent
ACM
the
enables
algorithm
is,
subsequent
attach
these
hash-division
for
to find
to
the
semi-join
algorithm
using hash-based
aggregation.
Fifth,
hash-division
creates
be
table
used
two
tables.
are ignored.
If duplicates
algorithm
can be modified
divisor,
without
hash-division
can
same
are
maps associated
with quotient
candidate
tuples
can be
These are precisely
the conditions
required
for division
on aggregation
that
table,
the
using
there
quotient
hash
of divisor numbers
and bit maps.
divisor
is also free of duplicates
and the bit
by counters.
resulting
hash
maps
division
cases,
and
first
i.e.,
serves
tionality
that duplicates
in the dividend
not to be a problem,
the hash-division
counters
Fourth,
the
output,
Bit
hash-division.
divisor
bit
removal,
against
both
the
memory-efficient
duplicate
In
the
and
more
hash-division
semi-join,
aggregation
numbers
much
semi-join.
prior
the
is
hash-based
they
the
contains
in
prior
hash-division,
divisor
which
compare
with
aggregation
The
tables,
required
aggregation
tables.
to be high.
hash
it
hash-based
tion
these
large
Third,
hash
expected
in
quotient
a major
to
hold
both
smaller
relations
of the
Cartesian
table
may
actually
candidates,
problem
in
we
most
Fast Algorithms
for Unwersal Quanhfication
If, however,
the divisor
table or the quotient
table are larger
occurs
and portions
available
main memory,
hash table overflow
than
cases.
both
tables
must
hash-division
section,
can
we
processor
take
consider
spooled
advantage
for
system.
In this
rithms
divisor)
TEMPORARY
to secondary
hash
of the
systems
are
FOR
LARGE
FILES
storage,
a multi-processor
handling
Adaptations
to multi-processor
USING
of
techniques
database
algorithms
4.
be temporarily
In
overflow
hash-division
discussed
in
of
the
one
or
Alternatively,
system.
table
203
.
the
in
and
Section
next
a single
the
other
5.
INPUTS
section we consider
the modifications
required
for each of the algodiscussed
in the last two sections
if one of the inputs
(dividend
or
or the quotient
candidates
do not fit into the available
memory.
For
sort-based
algorithms,
semi-join,
i.e.,
the underlying
as described
in
the
naive
division
sort operator
literature,
Salzberg
et al, [19901.
overflow
techniques
[Fushimi
et al. 1986;
For
e.g.,
hash-based
can be used,
and
sort-based
aggregation
e.g., overflow
Sacco 1986]
aggregation
must perform
a disk-based
in Graefe
[1993],
Knuth
(possibly
and
semi-join,
avoidance
combined
and
merge-sort
[1973],
and
standard
or fragmentation
with
bucket
tuning
and
dynamic
destaging
[Kitsuregawa
et al. 1989; Nakayama
et al. 1988])
or
hybrid hash overflow
resolution
[DeWitt
et al. 1984; DeWitt
and Gerber 1985;
Shapiro
1986]. In the following,
we outline
the alternatives
for overflow
management
in hash-division.
If the available
memory
is not sufficient
for divisor
table and quotient
table,
the input
tions
that
data
must
one at a time.
The
partitions
spooled
are
first
first
partitioned
the
partition
called
quotient
range-partitioning
or
into
disjoint
phases.
subsets
The partitions
is kept
in main
memory
files,
one for
each
quotient
partitioning,
attributes
using
hash-partitioning.
For
while
partition,
the
parti-
the
other
in
a way
there
dividend
a partitioning
the
called
are processed
join [Shapiro
1986]. For hash-division,
which can be used alone or together.
strategy,
on
in multiple
to temporary
similar
to hybrid
hash
partitioning
strategies,
In the
be partitioned
can be processed
are two
relation
strategy
example
is
such
queries,
the
as
set
of
student-id’s
would be partitioned,
e.g., into odd and even values. Each phase
which
is the quotient
of one dividend
partition
produces a quotient
partition,
and
the
divisor,
The
quotient
(disjoint
union)
of
example,
the
of students
students
with
set
set
of students
Since
table
of
quotients
who
the
entire
division
partitions.
have
The
the
taken
all
dividend
the
concatenation
into
database
the
courses
concrete
is the
student-id’s
who have taken
all database
courses
student-id’s
who have
taken all database
with
even
divisors,
strategy,
relations
it certainly
called
divisor
using
ACM
the
the entire divisor,
all phases. While
can be a problem
partitioning,
same
Transactions
partitioning
on Database
both
function
Systems,
Vol
of
the
courses.
the divisor
this is no
if the divisor
partitions
set
plus
odd
for small
second
is
Translated
all dividend
partitions
are dividend
with
must be kept in main
memory
during
problem
large.
and
all
the
applied
is very
divisor
to the
20, No. 2. June
1995
204
G. Graefe and R. L Cole
.
divisor
tions
attributes.
are
For
partitioned
performs
the
division
dividend
and
one
Notice
ing
that
and
must
the
algorithm
from
the
in
the
Courses
and
for
divisor
are
For
divisor
of
have
taken
courses
all
have
mined
undergraduate
really
taken
partitions
of finding
is exactly
the quotient
the
division
the
quotient
partition-
quotient
partitions
quotient
tuples
that
that appear in all
only students
who
all
graduate
database
set can easily
candidates
problem
from
partition.
for
the
This
phase
the
and
courses.
one
quotient
i.e., the tuples
result.
Indeed,
courses
all database
since the problem
quotient
database
Only
rela-
Each
files,
different
phase.
Transcript
courses.
one
partitioning,
were produced
by all single-phase
divisions,
quotient
partitions,
participate
in the final
the
partition
producing
quite
collection
a final
and
graduate
a pair
input,
partitions
partitioning.
gathered
both
undergraduate
quotient
divisor
be
example,
into
that
again.
be deter-
appear
Thus,
in all
in order
to
obtain
the final
result,
each quotient
tuple
produced
by a single-phase
division
is tagged with the phase number.
The collection
phase divides (in the
sense of relational
division)
the union
(concatenation)
of all quotient
partitions over the set of phase numbers.
However,
instead
of using a divisor table
to determine
which
Thus,
the collection
phase
number
Let
bit to set in the bit maps,
phase
replace
us compare
tioning
the first
the divisor
these
strategies
can skip
two
division
strategies
with
is executed
partitioning
the
is
function
on
methods
are somewhat
similar
grouping
can be used.
because
to
partitioning
course-no
partitioning
the
student-id
in
in their
algorithms
attribute,
similar
overflow
by hash-based
with
a preceding
hash-based
semi-join.
Divisor
partitioning
in the semi-join
on the join attribute,
Quotient
number
the
numbers.
partitioning
used when
the phase
step of hash-division,
input
and partiaggregation
is
in
to
similar
to
the
examples.
the
aggregate
our examples.
Since the
and similar
partitioning
strategies
could be used for hash-based
aggregation
and hash-division,
we
expect that the two methods
exhibit
similar
performance
for small and large
files. In fact, the similar
partitioning
strategies
also suggest
that
similar
parallel
algorithms
can be designed,
5. MULTIPROCESSOR
In
this
section
adapted
we
as we will
consider
for multiprocessor
how
well
systems.
partitioning
of partitioning
processing
partitions
ACM
methods
can
algorithms,
are
Transactmns
practically
section.
on Database
are
assume
equal
Systems,
four
division
discussion,
parallelism
of sets
can
include
be
those
hash-
and
freely
here
that
(pipelining
between
operaThe
into disjunct
subsets).
range-partitioning;
with
either
sizes.
Vol
algorithms
we will
(distributed
memory)
and shared-every[Stonebraker
1986] that go beyond the
and synchronization
overhead.
on operators
and sets, parallelism
can be
be combined
we
the
In our
exploited
effectively
by using program
tors) and data parallelism
(partitioning
standard
in the next
IMPLEMENTATIONS
differences
between
shared-nothing
thing (shared
memory)
architectures
obvious differences
in communication
As for all query processing
based
forms
discuss
20, No
2, June
1995
sortone
and
since
hash-based
is used
such
both
query
that
the
Fast Algonthrns for Universal Quantification
Parallel
For
Naive
naive
sort
division,
program
and
the
parallelism
division
partitioning
can be employed
Parallel
Aggregation
Algorithms
For algorithms
based
using
outlined
in DeWitt
to be a promising
referential
integrity
have
between
quotient
below
program
for parallel
the
two
partitioning
and
for hash-division.
and data
query
parallelism
execution,
can
e.g., those
et al. [1990] and Graefe [1993]. While partitioning
seems
approach,
there is a problem
if a semi-join
is needed for
enforcement.
Recall that the join attribute
in the semi-join
from the grouping
Thus, the large
to be partitioned
for the aggregation.
expensive.
Parallel
both
techniques
(e.g., course-no)
is different
aggregation
(e.g., student-id).
) may
be used
Both
as described
on aggregation,
standard
can only
operator.
divisor
script
205
13wision
operators
be applied
.
twice,
Partitioning
aggregation
permits
the
once
large
a number
attribute
dividend
for
in the
relation
the
dividend
subsequent
(e.g., Tran-
semi-join
relation
of optimizations
and
twice
that
once
can
be
are of interest
here.
input
First, parallel
aggregation
does not require
that the entire aggregation
be shipped between machines.
It is frequently
more effective to perform
local
aggregations
first
(e.g., count
within
partition
and finalize
the aggregation
and Courses
relations
the Transcript
course-no,
each
student’s
number
on
partition,
local
and
counts
dividend,
then
are
can be summed
but
each machine
Second,
only
only
local
have
is partitioned
about
the
(moved
aggregation
dividend
partitions
shipped
ization
of the aggregation
For
partitioned
courses
is
partitioned
The effect
is employed
and
then
after
semi-joined
within
each
student-id
such that
is that not the entire
aggregation
machines)
and
example,
counted
on
size of the
between
partition)
counts).
been
of
counts
existing
local
for each student,
a relation
if sort-based
each
(sum
output
from
for the aggregation.
to derive
local
counts,
the
across the network
will already
be sorted. Finalcan take advantage
of this fact by merging
a
partitions
and summing
the local counts within
a particular
partition,
avoiding another
complete
sort and aggregation
of the partially
aggregated
input.
Third,
a worthwhile
idea for hash table overflow
resolution
in parallel
aggregation
sites rather
records
without
additional
and duplicate
than writing
can
removal
is to send overflow
buckets
to their final
them to overflow
files. On each receiving
site, the
be aggregated
creating
memory.
any
new
directly
hash
Considering
into
table
that
the
existing
entries
and
certainly
hash
without
table,
possibly
requiring
all shared-memory
any
machines
but also many modern
distributed-memory
machines
provide
faster communication
than disk 1/0,
this seems to be a viable alternative
that deserves
investigation.
To summarize,
division
using sort- or hash-based
aggregate
functions
is
most likely
to be competitive
if semi-join
is not required.
However,
if a
semi-join
is required,
the dividend
relation
must be partitioned
twice across the interconnection
network,
once for the semi-join
ACM
Transactions
on Database
Systems,
and shipped
and once for
Vol. 20, No 2, June
1995
206
.
G. Graefe and R. L Cole
aggregation,
the
although
the
Parallel
thus
significantly
optimization
increasing
using
local
the
cost
aggregation
in
can
most
environments,
be used.
Hash-Division
For hash-division,
program
separation
of the
division
operator,
division
because
operator.
However,
parallelism
has only
limited
promise
inputs’
producers
and consumer
the entire
division
is performed
both
partitioning
strategies
beyond
the
from the actual
within
a single
discussed
earlier
for
hash
table overflow,
i.e., quotient
partitioning
and divisor partitioning,
can also be
employed
to parallelize
hash-division.
For hash-division
with
quotient
partitioning,
the divisor
table
must be
replicated
all
in
local
and
each
result.
lar
the
memories
hash-division
local
division
Clearly,
since
divisor
processes
In
divisor
instead
tagging
once
dresses
the
central
ized
using
the
to the
over
quotient
with
this
in
can
sizes.
and
ing
the
duplicate
detailed
over
the
cost
solely
relational
of
the
number
ACM
indicated
and
of tuples
TransactIons
site
of
network
the
In
step
parallel
instead
divides
addresses.
collection
we
the
can
ad-
set
of all
case
that
be decentral-
performance
claim
that
not
database
the
poor
query
of
strate-
for
appropriately
quantification
efficient
hybrid
languages
is
universal
performance
are
universal
and
four
comparison
power
algorithms
optimizer,
simple
This
imply
division
query
of the
division.
and
hash
without
join
and
creating
a
problem.
query
evaluation
discussed
above,
or referential
analysis
integrity
tedious—we
as
query
well
earlier
as
we
in Table
data
paths
each
Systems,
plans
derivations
plans
in this
up
to four
query
flow
from
between
operators
20, No
2, June
1995
For
readers
may
readers
skip
each
evaluation
the
are
cost
provid-
explanations,
section.
path.
Vol
those
and
their
variants
Some
that
I. Tuples
develop
their
enforcement.
their
show
and
including
recommend
evaluation
algorithms,
on Database
the
does
if
as the
illustrate
the
FORMULAS
relational
our
fact,
in
algorithms
traveling
COST
division
In
included
on the
plans,
collection
analyze
by the
functions,
division
cases
in
However,
processor
network
the
and
as fast
removal
this
the
are
section,
find
focus
chosen
be
for
the
AND
support
or throughput
this
and
we
to
relational
therefore
In
order
division
formulas
section,
and
performance
particuamong
processed
overflow.
numbers,
of processor
PLANS
database
implemented
relational
tuples,
quantification
quantification
non-trivial
in
partitioning.
universal
detailed
other,
division
synchronization
are
table
phase
is a bottleneck,
Beginning
very
with
site
EVALUATION
for
hash
set
6. QUERY
gies
global
machine,
without
partitions
for
the
collection
of the
a shared-memory
resulting
tuples
attached
tuples
be part
shared
replication,
of each
is complete.
as discussed
quotient
are
incoming
in
be
After
independently
immediately
can
it
processors.
completely
is trivial
partitioning.
the
will
table
of in phases
participating
work
result
replication
the
multiple
of all
operators
leaves
labeled
and
of the
plans
to the
with
four
for
roots
the
Fast Algorithms
Table
III.
Variables
for Unwersal Quant}flcation
Used
.
207
in the Analysls
Vartable
DescrlptLon
IRI
Dividend
cardinality
Dividend
kl
size in pages
Divisor
cardinahty
Divisor
size in pages
Quotient
~Ql
cardinality
Quotient
size in pages
Quotient
candidate
cardinality
Quotient
csndidate
size in pages
Memory
size in pages
Size reduction
of divisor
Size reduction
of dividend
by duplicate
Size reduction
of dwidend
by referential
Degree
by duplicate
removal
removal
integrity
enforcement
of parallelism
We assume dividend
relation
R (I RI tuples in r pages) and divisor relation
S (IS] tuples in s pages) with quotient
relation
Q (IQ I tuples in q pages). The
projection
of R on the quotient
attributes
is the quotient
candidate
relation
C
(ICI tuples in c pages), Obviously,
IQI < ICI < IRI and q < c s r. Further,
we
assume m memory
pages, with
s + c < m. In other words, we only analyze
cases in which divisor
and quotient
are smaller
than memory.
Given that the
dividend
relation
is the Cartesian
product
of quotient
and divisor,
this
restriction
will cover most practical
cases. All variables
are summarized
in
Table III, together
with
those used in the analyses
of duplicate
removal,
referential
Our
integrity
enforcement,
cost measure
combines
and parallelism.
CPU
and
1/0
without
overlap
of CPU and 1/0
activity.
essential
operations
such as 1/0 operations
to roughly
match a contemporary
workstation
cost units.
The cost units,
their
values
are given in Table IV.
For each of the four algorithms
aggregation
and semi-join,
division
and
hash-division),
duplicate
removal
then
analyze
also
estimate
Costs
we first
nor
the
some common
for Original
Inputs
both
measured
(in milliseconds),
and their
(naive
division,
division
by hash-based
aggregation
derive
referential
cost for
costs,
cost
formulas
integrity
these
for
the
enforcement
preprocessing
description
by sort-based
and semi-join,
case that
are
steps.
neither
required
However,
and
we first
costs,
and Final Output
The cost of reading
the divisor
and dividend
relations,
the quotient
file (copying),
and writing
the quotient
is
(r + s) X
as times
The cost formulas
capture
all
and record comparisons.
We tried
when setting the weights
of the
SeqIO
x WrRec
+
(IR
+
1S1)
X
RdRec
assembling
+ q X Moue
pages
+ IQI
+ q x SeqIO.
ACM
Transactions
of
(2)
on Database
Systems,
Vol
20, No. 2, June
1995.
208
G
.
Graefe and R L. Cole
Table
Unit
Tune
SeqIO
RdRec
WrRec
RdTmp
WrTmp
Comp
Aggr
Hash
Move
Bit
Map
SMXfer
Description
random
I/ O,onepagefrom
sequential
CPU
1/0,
or to disk
one page from
cost per record
when
(incl.
or to disk
reading
seek and latency)
(incl.
latency)
a permanent
file
CPU cost per record when writing a permanent file
file
CPU cost per record when reading a temporary
CPU cost per record when writing a temporary file
comparison of two tuples
initializing
aggregation, or aggregation of two tuples
calculation of a hash value from a tuple
memory to memory copy of one page
setting a bit in a bit map
initializing
or scanning a bit in a bit map
transfer of one page between processes in shared memory
transfer of one page in distributed
memory
4
DMXfer
Cost Units
[ rns]
20
10
0.2
0,2
0.1
01
0.03
0.03
0.03
0.4
0.003
0.0001
0.1
RndIO
IV.
We omit this cost in the following
cost formulas,
because it is common to all
algorithms
and does not contribute
to their
comparison.
In the analytic
performance
evaluation
in the following
section, we include
this base cost in
the diagrams
to provide
a reference
point
for the algorithms’
costs.
Sort Costs
Since several of the algorithms
require
sorting,
we separate
the formulas
for
the cost of sorting. We distinguish
between
input relations
that tit in memory
and those that do not. For the former
mate cost function
of
Sort(S)
:= 2 x ISI x logz
for relation
S since it fits in memory.
For relations
larger
than
memory,
we assume
(ISI)
we
quiclcsort
with
X Comp
+ s X Moue
assume
a disk-based
an approxi-
(3)
merge-sort
with sequential
1/0 and
algorithm
with fan-in
F, in which runs are written
read with random
1/0.
Its sort cost is the sum of sorting
the initial
runs
using quicksort
one memory
load at a time and the product
of the number
of
merge passes with the cost of each merge, both 1/0 and CPU costs, which is
Sort(R)
:= 2 x Il?l
X(r
x log2 (lhl/(r/m))
x (Moue
X (WrTmp
+ SeqIO
+ RdTmp
X Cornp
+ RndIO)
+ logz (F)
+ hg~
(r/m)
+ II?]
(4)
X Comp)).
Quicksort
will
be invoked
r/m
times,
each time
for lR1/(r,/m)
tuples.
The
merge depth is approximated
by the logarithm
without
ceiling
to reflect the
advantages
of optimized
merging
[ Graefe
1993]. The cost of writing
a run,
e.g., an initial
6.1
Naive
run,
is included
in the cost of reading
and merging
it.
Division
Figure
4 shows three
query evaluation
plans for relational
division
using
naive division.
The labels on the data paths between
operators
indicate
the
cardinalities
of intermediate
results,
ACM
Transactions
on Database
Systems,
Vol. 20, No 2, June
1995.
Fast Algorlthrns for Universal Quantification
Naive Division
%-’
Scan S
Naive Division
IRI
Sort
ISI
Scan R
If referential
removal
is not
integrity
required,
replaced
by counting.
equation
quotient
This
formula
reflects
dates
groups,
counting
plan
is
case,
size
quite
which
Referential
Query
plans
for naive
It is only
necessary
in the
Scan R
division
algorithm
the
of each
similar
will
Integrity
size
of each
group
to
the
count
plan
the
tuples
is shown
with
in Figure
case is
R, dividing
group,
plan
and
R. This
+ ISI XAggr
to the
be discussed
for the inputs
and duplicate
can actually
be omitted
and
in this
costs of sorting
the
division.
to sort
dividend
+ IRI x (Con-zp + Aggr)
Sort(R)
this
Scan S
(c)
is known
to hold
sorting
the divisor
attribute
4a. The cost of naive
same
I IRI
(b)
Fig. 4(a–c).
the
Son-Unique
1s1
Scan R
(a)
comparing
‘=Jq/p
Sort-Unique
IRI
Scan S
209
Division
lsl/y/
%x’
Sort
sort
Naive
.
R into
counting
number
using
(5)
+ ICI x Comp.
the
of divisor
sort-based
quotient
divisor
candi-
tuples,
tuples.
and
Note
aggregation
that
for
the
shortly.
Enforcement
Naive division
can enforce referential
integrity
as part of its nested merging
scans, if the two inputs
are sorted properly,
e.g., as shown in Figure
1. The
plan
this
in Figure
plan
4b shows
naive
division
with
two
sort
operations.
The
cost of
is
Sort(S)
+ Sort(R)
+ IRI x C’omp
+( ICI – IQI) x
The new cost components
comparing
consecutive
candidate
groups,
and
enforcement.
The latter
reflect
+ IQI x ISI X Comp
1S1/2 x Comp.
sorting
divisor
(6)
S in addition
to dividend
R,
R tuples to divide the sorted dividend
into quotient
comparing
R and S tuples
for referential
integrity
cost is calculated
separately
for successful and unsuc-
cessful
quotient
candidates.
We presume
that
half
of the divisor
file is
scanned for each unsuccessful
quotient
candidate
before the failure
is identified and the remaining
dividend
tuples of the quotient
candidate
are skipped
over without
pursuing
the scan in the divisor.
Note that we assumed
that
s < m; therefore,
the repeated
scans of the divisor
can be performed
without
1/0,
Duplicate
Removal
If the inputs
them, which
contain duplicates,
the sort operations
must identify
requires
comparisons,
costing (IRI + IS 1) x Comp.
ACM
TransactIons
on Database
Systems,
and remove
We presume
Vol. 20. No. 2, June
1995.
210
G. Graefe and R. L Cole
.
that duplicate
removal
reduces the divisor size by a factor a and the dividend
by a factor /3, as shown in Figure
4c. Thus, the cost for naive division
with
duplicate
removal
Sort(S)
The
cost
integrity
but
without
+ Sort(R)
+ (IN
+ lS1/a
X Aggr
naive
division
for
enforcement
Sort(S)
referential
+ ISI)
X
integrity
Comp
enforcement
+ lR1/~
is
X (Comp
+Aggr)
(7)
+ ICI X Comp.
with
both
duplicate
removal
and
referential
is
+ Sort(R)
+ (IRI + ISI) X Comp
+ lR1//3
X Comp
(8)
+ IQI
6.2
Division
]S1/a
X
by Sort-Based
X
Comp
+ (ICI – IQI) X lS\/a/2
X Comp.
Aggregation
Performing
a division
by a sort-based
aggregate
function
requires
the elements
in S, sorting
and counting
groups in R, and verifying
counts for the groups
the query evaluation
the cardinality
of R equal the count of S. Figure
5a (upper left) shows
plan. Determining
the scalar aggregate,
i.e., counting
of the divisor,
and verifying
that
the correct
for each group in the aggregate
function
costs IS I x Aggr
on
assume
for simplicity
that the aggregate
function
output
procedure
counting
that the
of the
sort’s
final
merge
step,
the
count
was found
+ ICI X Comp. If we
R is formed
in the
costs
of comparing
each
its predecessor
and the actual aggregation
must be added to
the cost of sorting,
i.e., IR I x (Comp + Aggr). Thus, the total cost for division
tuple
of R with
by sort-based
Sort(R)
Referential
aggregation
+ IRI
Integrity
For sort-based
is
(Comp
X
+Aggr)
+ ISI XAggr
+ ICI X Comp,
(9)
Enforcement
referential
integrity
enforcement,
the inputs
must
be sorted
on
the divisor
attributes
and followed
by a semi-join
version
of merge-join.
Figure
5b (upper right)
shows the corresponding
query evaluation
plan. The
reduction
factor
of referential
integrity
enforcement
is indicated
by S. We
presume
that the scalar aggregate
can be performed
as a side-effect
of the
sort or at least can obtain its input from the sort; thus, we do not count the
cost of two divisor
scans, only of one. Thus,
only the costs of two sort
operations
and of a merge-join
must be added, which is (1R I + IS 1) X Comp.
We assume that the entire S relation
is kept in buffer memory
during
sort
and join processing
and ignore any cost of memory-to-memory
copying in the
merge-join
since the join is actually
a semi-join,
which can be implemented
without
cop ying. Thus, the cost for sort-based
aggregation
with referential
integrity
enforcement
is
Sort(S)
+ Sort(R)
+ lR1/8
ACM
Transactmns
on Database
+ (IRI + ISI)
x
(Comp
Systems,
+Aggr)
Vol
X
Comp
+ Sort(R/8)
+ IS I X Aggr
20, No. 2, June
1995
+ ICI X Comp.
(lo)
Fast Algorithms
for Unwersal Quanhficatlon
211
.
Verify
vScalarAggr
Sort-AggrFct
I IRIIc$
1s1I
Scal~Aggr
Merge-Join
Scan S
Verify
Ww
sort
Sort-AggrFct
Scan R
Scan S
(b)
Scan R
Scan S
IRI
1s1
IRI
1s1
sort
(a)
Verify
J’-’”+!Sort-AggrFct
ScalarAggr
Verify
“v
J“”
Scala~Aggr
Islla
AggrFct
I
IsI@/
1s1
Scan R
(d)
Fig. 5(a-d).
Query
plans
for division
by sort-based
aggregations
Removal
When the dividend
must be performed
Figure
5c-d (lower
removal.
IRI
Scan S
Scan R
(c)
Duplicate
Sort-Unique
1s1
IRI
Scan S
plp
Scan S Sort-Unique
Sort-Unique
1s1
Merge-Join
Sort-Unique
I lR1lP
Sort-Unique
lRll@6
Islla
A sort
and divisor
contain
duplicate
records, duplicate
removal
prior to aggregation
in the aggregation-based
algorithms.
left and right)
show the query plans including
duplicate
operator
can perform
either
duplicate
removal
or an aggre-
gate function,
but not both. Thus, the figures
show a separate
function,
which relies on sorted input and thus can identify
groups
aggregate
by simply
comparing
tuples immediately
following
each other in the input stream.
The
common subexpression
(sorting
and duplicate
removal
from S) is shown twice
but charged
only once in the following
formulas.
The cost of sort-based
aggregation
with duplicate
removal
but without
referential
integrity
enforcement
is
Sort(S)
+ Sort(R)
+ (ISI +
+ /S//cz X Aggr
IRI) x C“omp + lR1/~
X (Comp
+ Aggr)
~11)
+ ICI X Comp.
ACM
TransactIons
on Database
Systems,
Vol
20, No 2, June
1995.
212
G. Graefe and R L Cole
.
The
cost
ential
of sort-based
integrity
aggregation
enforcement
SOrt(S)
+ Sort(R)
+ ISI)
+ Sort(R/~/6)
6.3
Division
The
is
query
to
X Aggr
one
aggregation
in
the
number
average
Recall
that
occurs
in
verifying
that
function
is the
in
Figure
the
aggregation,
same
and
+ lS1/a)
refer-
x Comp
+Aggr)
(12)
+ hc x Comp
required
The
value
in Figure
The
to
cost
was
found
sort-based
for
each
no hash
the
each
aggregation.
left),
of hash-based
+ Aggr ), where
that
of finding
6a (upper
cost
traverse
s + q s m, meaning
that
as for
shown
aggregation.
IRI x (Hash
function.
same
table
scalar
group
Thus,
he is
bucket.
overflow
aggregate
in
the
hash
the
cost
and
aggregate
for
the
plan
6a is
IR] X (Hash
Referential
+ he x Comp
Integrity
For hash-based
in
is
assumed
+ (lRl\B
(Comp
X
sort-based
of comparisons
aggregate
removal
Aggregation
for
memory
we
the
duplicate
+ ICI X Comp.
for hash-based
the
x Comp
+ lR1//3/8
by Hash-Based
plan
similar
both
is
+ (IRI
+ lS1/a
with
Figure
hash-based
aggregation,
(13)
cost of a prior
is
IS I x Hash
+ IR I x (Hash
+ he x Comp)
table with the tuples from
the total
cost for division
S and probing
by hash-based
it with the tuples
aggregation
and
+
X(Hash
Duplicate
+ ICI X Comp.
the additional
right)
semi-join
ISI X Hash
+ ISI X Aggr
Enforcement
6b (upper
building
a hash
from
R. Thus,
+Aggr)
for referential
IRI X (Hash
+ hc X Comp
integrity
enforcement
+ he x Com,p)
+Aggr)
semi-join
as shown
is
+ lR1/8
+ IS I X Aggr
for
(14)
+ ICI X Comp.
Removal
The next two figures
show the previous
two query evaluation
plans modified
to include
duplicate
removal.
If the implementation
of hash-based
duplicate
removal
were very smart,
it would be possible
to ensure that the duplicate
removal
operator
in Figure
6C (lower
left)
produces
its output
grouped
suitably
for the subsequent
aggregate
function,
making
a separate hash table
in the aggregate
function
obsolete.
Similarly,
in Figure
6d (lower right),
the
duplicate
removal
operation
for the divisor
could be integrated
into the
hybrid
hash join operation.
However,
we ignore
these optimizations
in our
cost formulas.
Since the duplicate-free
dividend
may be larger than memory,
i.e., IR1/~ >
m, hash
case,
can
be
divided
ing
table
overflow
recursive
may
hybrid
approximated
by
fan-out,
[Graefe
and
the
which
1993;
ACM Transactions
by
memory
we
Graefe
occur
the
size,
assume
et
in
hashing
al.
logarithm
with
the
duplicate
must
the
be
of the
duplicate
logarithm
to be equal
19941.
removal
employed.
Thus
based
to the
the
cost
of
In
recursion
removal
equal
fan-in
on Database Systems, Vol. 20, No 2, June 1995
operation.
The
output
to the
during
hash-based
this
depth
size
partition-
merge-sort
duplicate
1?
Fast Algorithms
for Universal Quantification
213
.
Verify
/w
ScalarAggr
Verify
Y
ScalarAggr
1s1
w’
Hash-AggrFct
1s1
lRll~
Scan S
IRI
Hybrid
Hash Join
Y“+!
Scan S
Scan R
Scan S
Hash-AggrFct
(a)
Scan R
(b)
Verify
YScalarAggr
Verify
y’
<cl
Scala~Aggr
Hash-AggrFct
Islla
Alp
IRI
[s1 I
Scan R
Query plans for division
removal
for the large dividend
step in the deepest recursion
reading
a partition
is—considering
level first and
file
together
aggregations.
the actual duplicate
removal
calculating
the cost of (sequen-
with
:= IRI
x (Hash
the
cost
of writing
it
(using
+ he x Cornp)
+ logF(r/rn)
x ( WrTmp
Thus,
the cost of relational
duplicate
removal
HashUnique(R)
+ lR1/Q
If
by hash-based
writes):
Hashuniqzte(li)
both
quired,
Scan R
Scan S
(et)
Fig. 6(a–d).
random
Hash-Unique
Scan S Hash-Unique
(c)
tially)
Hash Join
Is@”
IRI
Scan S
Hybrid
1s1I
Hash-Unique
1s11
I lRllflc$
Hash-Unique
I lRll~
Hash-Unique
Hash-AggrFct
Is I/a
duplicate
using
but
+ ISI X (Hash
x (Mm+
+ RdTmp)
division
on the inputs
not
X Hash
(15)
-t- Ill
+ r X (RndIO
hash-based
referential
and
+ Aggr)
referential
+ SeqIO)).
aggregation
integrity
+ [S1/a
x Aggr
integrity
including
enforcement
+ he X Comp)
+ hc x Comp
removal
(IRI
X
is
(16)
+ ICI X Comp.
enforcement
are
re-
the cost is
IlczshUnique(R)
+ ISI X (Hash
+
lR1//3 x
+ lR1/p/13
+ lS\/a
ACM
+ he X Comp)
(Hash
X Aggr
Transactions
X Hash
+ he x Comp)
(Hash
x
+ lS1/a
+ hc X Comp
(17)
+ Aggr)
+ ICI X Comp.
on Database
Systems,
Vol. 20, No. 2, June
1995.
214
6.4
G Graefe and R, L. Cole
.
Hash-Division
When
hash-division
is used,
a prior
semi-join
for referential
integrity
enforce-
ment
or an explicit
duplicate
removal
step is never necessary.
Thus, the
query evaluation
plan shown in Figure
7 always applies,
although
the actual
hash-division
algorithm
might
have the divisor
table
and the bit maps
enabled
or disabled.
If it is known
that
neither
duplicate
removal
nor
referential
integrity
enforcement
is required,
the simplest
version
of the
algorithm
hash-based
can be used.
aggregation
enforcement.
tuples
This version
mimics
the
without
duplicate
removal
It uses neither
and dividend
divisor
tuples
table
by groups.
behavior
of division
by
or referential
integrity
nor bit maps
The cost of this
and just
variant
counts
divisor
of hash-division
is
ISI xAggr
which
and
is the
sum
counting
is done
algorithm,
not
in
hash-based
data
type
Referential
the
costs
very
table.
efficiently
as a more
facility,
even
Integrity
divisor
use
this
program
may
tasks
(18)
and
formula
of inserting
although
of the
of a query
invoke
as simple
tuples
cost
variables
interpretation
which
+Aggr),
evaluation
a relatively
must
of the referential
be used.
IS] xHash
In this
plan
expensive
as
abstract
as counting.
integrity
constraint
is not known,
case, the cost of hash-division
+ IRI X (Hash
x (Hash
+ hc x Comp
+ he X Comp)
a
is
+ lR1/8
+ Aggr).
(19)
The differences
to the cost for hash-division
without
referential
enforcement
reflect building
and probing
the divisor table.
Duplicate
the
hash-division
Enforcement
the validity
table
the
We
with
general
for
+ hc X Comp
of counting
quotient
aggregation,
If, however,
divisor
of the
in
counting
+ IRI X (Hash
integrity
Removal
If the inputs
contain
duplicates,
the divisor
table must be probed while it is
being built
and the quotient
table must contain
bit vectors instead
of counters, with IS I/a bits per bit vector. If duplicates
exist but referential
integrity
is known to hold, the divisor
table need not be probed with dividend
tuples,
and the cost of hash-division
ISI x (Hash
+ 2
X
ICI
+ k
X
is
X Comp)
lS1/a
The first two terms account
table. The third term reflects
+ IRI X (Hash
+ hc X Comp
+ IRI) X (Hash
for building
initializing
+ he X Comp)
+ 2 x ICI x lS1/a
ACM
Transactions
on Database
~20)
Map.
X
the divisor
table and the quotient
and scanning
the bit maps.
The cost of the most powerful
variant
of hash-division
in eludes both duplicate
removal
and referential
integrity
(IS
+ Bit)
+ lR1/8
X (Hash
x Map.
Systems,
algorithm,
enforcement,
+ hc X Comp
which
is
+ Bit)
(21)
Vol
20, No. 2, June
1995
Fast Algorithms
for Universal Quantification
.
215
Hash-Division
Iy
Fig. 7. Query plan for hash division.
WI
Scan R
Scan S
The
first
term
reflects
building
the divisor
probing
and building
maps,
6.5
which
probing
table
the quotient
can be very
Parallel
the
Relational
divisor
table,
and by the dividend.
table,
both
by the
The second
which
includes
fast if it is done word
divisor
term
setting
by word,
while
accounts
bits
for
in the bit
not bit by bit.
Division
The four algorithms
for relational
division
and all their variants
can readily
be parallelized
with both pipelining
and partitioning,
i.e., inter-operator
and
intra-operator
parallelism,
as discussed
earlier.
Figure
8 shows the most
complex
plan
for
repartitioning
algorithms,
tioning,
each
and
i.e., naive
which
of the
replication
division
requires
algorithms,
marked
and
that
the
with
datapaths
by “part.”
and
hash-division,
divisor
requiring
“repl.”
For
we assume
be broadcast
data
the
direct
quotient
to all
sites.
partiFor
the
component
algorithms
of the indirect
strategies,
i.e., duplicate
removal,
semijoin, and aggregation,
we assume partitioning
on the join attributes
or the
grouping
attributes.
Moreover,
we assume
a shared-nothing
(distributedmemory)
machine
with local disk drives or a shared-everything
(shared-memory) machine
with
disk drives
dedicated
to processors
in order to reduce
contention
bandwidth
and interference.
is sufficient
that
pairs of sites; thus,
the amount
of data
that
dividend
In
order
communication
delay can be estimated
sent from each site to other processors.
and divisor
the partitioning
robin).
In any case, we assume
that
communication
can occur in parallel
cannot
are partitioned
be used
to determine
the
for
cost
originally
processing
of each
by determining
We also assume
across
(e.g.,
plan,
the
the available
between any
all disks
partitioning
cost
of the
but
that
is roundunderlying
sequential
plan is increased
by the cost of repartitioning
data. If P is the
some data remain
with
the same processor,
namely
the
degree of parallelism,
I/P.
The cost of exchanging
data across process and machine
boundfraction
aries
for a relation
R is r x X~er,
or distributed
memory,
i.e., either
Table IV. Thus, the costs of naive
partitioning
are increased
where
Xs)
X (1 – l/P)
where
B represents
the relative
sending
it to only one recipient
(B
no support).
The
referential
enforce
same
cost
should
reflect
shared
memory
given in
quotient
by
(r+l?
not
X~er
or Xfer = DMXfer
Xfer = SMXfer
division
and of hash-division
using
is added
integrity
ACM
(22)
XXfer,
cost of broadcasting
= 1 for full broadcast
for
explicitly
Transactions
aggregation-based
by
means
on Database
the
divisor
versus
B = P for
support,
algorithms
of a semi-join.
Systems,
that
do
Parallel
Vol. 20, No. 2, June
1995.
216
.
G. Graefe and R. L. Cole
Naive Division
Isy
“=-JQ/p
Hash-Division
Sort-Unique
Sort-Unique
ISI part.
part.
Scan S
IRI
-’~Scan S
Scan R
Scan R
Verify
Verify
Y@
ScalarAggr
HP’.
ScalarAggr
v
Sort-AggrFct
part. I lR1//3’?i
IsI/a
1s1
Islp-=’
Hash-Unique
‘-’-qIp
ISI part.
part.
Scan S
plans
for relational
X (1 – I/P)
by splitting
the
before.
X
course,
Moreover,
smaller
For
by
costs
the
divisor
incurred
cost
even
at
the
a final
to each
ANALYTICAL
plan.
the
In this
site
suitable
to
PERFORMANCE
the
operations
case, the data
sites
must
and
such
cost
illustrations
be
not
local
and
cost for
the
cost
(division
cost
to
global,
can
reflect
memory
easily
factor
by
response
increases.
modified
the
their
and
that
consumption
(remove
its
into
transfer
(24)
resource
local,
plans
and
omit
(23)
XXfer.
repartitioning
operator
We
above
and
require
by
Xfer.
multiple
total
derived
each
broadcast
collection
at
if
formulas
size
partitioning,
modifying
identifiers)
are
decreases
dividend
adding
7.
all
actually
division
aggregation
P X c X (1 – I/P)
Of
Scan R
with referential
integrity
enforcement
result,
and their costs are increased
global components,
as discussed
aggregation
is about
time
part. I IRI
Scan S
r/~/8
can be reduced
Hash-Unique
ISI I part.
query
aggregation-based
algorithms
repartitioning
the intermediate
“-+/p
Scan S Hash-Unique
IRI
Parallel
Hash Join
By+
Scan R
Fig. 8.
This
Hybrid
1s1I
Sort-Unique
Scan S Sort-Unique
part. I lR1/f18
Islla
Merge-Join
Sort-Unique
‘-+
Hash-AggrFct
B
the
formulas
be
above)
set
the
size.
of
found
and
node
here.
COMPARISONS
In this section we compare
the algorithms’
performance
by applying
the cost
formulas
derived
above to various
input
sizes and to situations
with
and
without
explicit
referential
integrity
enforcement
and duplicate
removal.
In
this analysis,
B. We chose
ACM
Transactions
we assume that the tuple size for R is 32 B and for S and Q 16
fairly
small tuples
because in most practical
cases, the tuples
on Database
Systems,
Vol. 20, No
2, June
1995.
Fast Algorithms
will
consist
The
memory
which
of keys only
used
used
to
serve
partitioning
a hash
7.1
size
for
or
hash
a single
clients.
16. The
student-id’s
is
operator
The
average
with
tables
fan-in
1,024
of
query
merging
that
each,
a machine
the
fan-out
required
we assume
217
4 KB
on
and
of comparisons
.
and course-no’s.
pages
of a single
for
number
is he = 2. Furthermore,
bucket
sorted
sorting
many
is
as in our examples
for
is a realistic
for Unwersal Quantification
for
neither
for
scanning
R nor
S are
originally.
Sequential
Since there
Algorithms
is no typical
or average
case for division
algorithms,
the conceptu-
ally simplest
case will be our first
comparison:
the dividend
is the crossproduct
of the divisor
and quotient,
i.e., R = Q X S, and neither
duplicate
removal
nor referential
integrity
enforcement
are required.
Figure
9a shows the performance
of the four division
varying
from 16 to 4,096 with constant
\QI = 256. Since
I R [ varies
from
4,096
to 1,048,576.
The bottom
data points,
as in
all
and
subsequent
other
pertinent
indicates
the
output,
The
is
base
alike;
divisor.
The
The
reason
dominates
both
cost,
cost
two
is that
in
margin
(a factor
inputs
(dividend
quotient).
of 2 to 5) and
and
the
sort-based
whereas
sorting
Referential
Integrity
9b shows
constraint
requires
N
in
their
the
previous
hash
dividend
are
table.
base
final
the
R while
However,
the
by quite
costs,
output
performance
R
relation
sorting
algorithms
the
perform
performance.
i.e.,
to
a large
scanning
disk
(the
deteriorates
algorithms
consistently
are that hash-aggregation
division
direct
the
quick-sorting
require
a
algorithms’
If? grow while the hash-based
to the base cost. The reasons
alike
for
of the
costs
curve
writing
algorithms
cost
sort-based
writing
line
cost.
treatment
of their
and
in
also
probing
the
most
divisor)
derived
algorithms
require
outperform
was
IS I
all
9a,
first
unmarked
and
small
are
the
The
the
inputs
sort-based
the
in Figure
in
the
algorithm’s
sort-based
without
overflow
whereas
naive
files, and that
hashing
permits
Figure
is
algorithms,
algorithms
Moreover,
IS I and
relative
all
line.
cost
two
algorithms
cost, and both
algorithms
each
the
difference
hash-based
hash-based
hash-based
in
second
this
label
parameter
of reading
when
that
varying
the
cost
is included
only
the
in
the
obvious
their
the
i.e.,
mentioned
immediately
almost
explains
parameters
base
as previously
section.
It
figures,
algorithms
for
R = Q x S for
and sort-aggregation
access to the correct
as
perform
well
can proceed
must use run
hash bucket
log N comparisons.
Enforcement
the analytical
is explicitly
enforced.
cost of division
The
when
cardinalities
the referential
of the
dividend,
integrity
divisor,
and quotient
are IQ I = 256, IS I = 256, and IRI = IQ I x IS I x 8. In other words,
this comparison
expands
upon the midpoint
of the previous
comparison
shown in Figure 9a with the size of the dividend
relation
increased
by a factor
which appears as reduction
factor S in the cost functions
for query evaluation
plans with explicit
referential
integrity
enforcement.
For example,
if 8 = 1,
then lR/ = 65,536, and when 8 = 8, then IF? = 524,288.
ACM
TransactIons
on Database
Systems,
Vol. 20, No. 2, June
1995
218
.
G. Graefe and R. L. Cole
❑ Naive Dwlslon
1500 –
cost
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
o Hash-Division
1000 _
[see]
500 –
o–
I
I
I
16
64
256
Divisor
I
1024
Cardinality
lQl=256and
I
4096
1S1
R=Sx
Q
(a)
2000 -+
1500 –
cost
❑ Naive Division
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
o Hash-Division
~ooo” _
[see]
500 –
o–
I
1
Size Reduction
I
8
I
4
I
2
by Referential
Integrity
I
16
Enforcement
6
IQI = ISI = 256, IRl = ISI X IQI X d
(b)
Fig. 9.
(a) Cost of division
algorithms.
(b) Division
with
referential
integrity
enforcement
When the referential
integrity
constraint
is explicitly
enforced,
as shown in
Figure
9b, the naive-division
algorithm
increases
in a familiar
IV log IV
pattern
with the size of the dividend
before the referential
integrity
enforcement and size reduction.
The cost of sort-based
aggregation
significantly
varies
within
Figure
9b
because of two sort operations
for the semi-join
and the aggregation.
For
8 = 1, i.e., if the semi-join
does not remove any tuples, the cost of sort-based
aggregation
with
explicit
referential
integrity
enforcement
is almost
twice
that of naive division
if the base cost is ignored.
Thus, if referential
integrity
must be enforced,
sort-based
aggregation
is a particularly
poor query evaluation plan, although
it is the only one available
in most current
systems. For
larger
before
ACM
values
of 8, the semi-join
the second dividend
sort,
TransactIons
on Database
Systems,
removes
a large
and the second
Vol. 20, No. 2, June
1995
fraction
of the dividend
sort becomes
relatively
Fast Algorithms
for Universal Quantification
.
219
❑ Nawe l.huslon
1500 –
cost
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
o Hash-Division
1000” _
[see]
500 –
o–
I
1
I
2
I
4
Size Reduction
P = CZ,IQI=
I
8
by Duplicate
256,Isl = 256
I
16
Removal
a
x a, IRI = 65,536
x /3
(c)
❑ Naive Division
8000 –
x Sort-Based
Aggregation
6000 – + Hash-Based
Aggregation
cost
[see]
o Hash-Division
4000 –
2000 –
o–
Size Reduction
cz=~=4,
by Referential
1Ql=256,1Sl
=256X
Integrity
Enforcement
6
a,lRl=65,536x/?x6
(d)
Fig. 9. (c)
integrity
Division
with
duplicate
removal.
(d) Division
with
duplicate
removal
and
referential
enforcement.
insignificant
compared
to the first dividend
sort. Since the first dividend
sort
has the same input size and the same cost as the sort of the dividend
required
for naive division,
the costs of the sort-based
direct and indirect
algorithms
become comparable
for large values of 6, although
the sort-based
algorithms
are still much more expensive
than the hash-based
division
algorithms.
In hash-based
aggregation
with
referential
integrity
enforcement,
the
semi-join
reduces
the number
of dividend
tuples
by a factor
of 6 for a
relatively
small
cost. In hash-division,
the algorithm’s
divisor
table is enabled,
larger
which has exactly
values
of 8, the
the same effect as a hash-based
semi-join.
overhead
of hash-division
relative
to the
actually
decreases very slowly.
The hash-based
algorithms
have a significant
performance
their
sort-based
counterparts:
because none of the situations
ACM
TransactIons
on Database
Systems,
Thus, for
base cost
advantage
illustrated
Vol. 20, No. 2, June
over
in
1995.
220
.
G. Graefe and R. L. Cole
Figure
9b causes hash
reduction
of the larger
in-memory
hash table
table overflow,
1/0 to temporary
files is avoided.
input
R can be achieved
by filtering
it through
built
on S. This is an example
of the superiority
hash-based
join algorithms
for different
sizes of the two inputs
in earlier
research
{Bratbergsengen
1984; Graefe et al, 1994].
Duplicate
The
the
of
also observed
Removal
Figure 9C shows the costs of the four relational
division
algorithms
when the
inputs
require
duplicate
removal,
For the duplication
factors
a and B, we
assumed that a = @ and varied their value from 1 to 16.
Again we see that the sort-based
algorithms
are more expensive
because of
the duplicate
removal
hash-based
aggregation
is because
requiring
the
processing
prior to aggregation.
In addition,
the cost of
can now be discerned
from that of hash-division.
This
cost of duplicate
recursive
processing
removal
prior
to aggregation
since the duplicate-free
be held by the in-memory
hash
fastest
algorithm.
Duplicates
in
dividend
is quite
high,
can no longer
table.
Hash-division
continues
to be the
the inputs
only require
enabling
of the
quotient
table bit map processing,
at only a small additional
cost.
Figure
9d shows the midpoint
of Figure
9C (a = P = 4) augmented
with
referential
integrity
enforcement.
The reduction
factor for the dividend
relation 6 varies from 1 to 16, Naive division
outperforms
sort-based
aggregation
if referential
integrity
enforcement
is required,
as had been observed earlier
in Figure
9b. As the referential
integrity
reduction
factor increases
and the
input
small
to the second dividend
sort in sort-based
compared
to the input to the fh-st dividend
aggregation
becomes quite
sort, the difference
between
naive division
and sort-based
aggregation
all but vanishes.
With
sizes of the original
dividend
relation,
the cost of hash-based
removal
becomes larger and larger. However,
it does not approach
sort-based
aggregation.
The performance
of hash-division
is very
increasing
duplicate
the cost of
steady.
Over
the entire range, the cost of hash-division
relative
to the base cost is constant.
For both duplicate
removal
and referential
integrity
reduction
factors equal
to 4, hash-division
is almost 5 times
than 5 times faster than sort-based
hash-based
aggregation.
Preliminary
faster than
aggregation,
naive division,
a little
and 2+ times faster
more
than
Conclusions
Since we realize
that the formulas
are not precise,
we will also report
on
experimental
measurements
in the next section.
Nevertheless,
we believe
that these analytical
comparisons
already
permit
some conclusions.
First,
division
by sort-based
aggregation
does not perform
much better
than the naive algorithm.
In fact, if a semi-join
is necessary
to ensure that
only valid dividend
tuples are counted, the additional
sort cost makes division
by aggregation
significantly
more expensive.
Second, if division
is executed
by aggregation,
the division
can be performed
much
and semi-join,
faster if hash-based
algorithms
because the two input
sizes,
ACM
on Database
TransactIons
Systems,
Vol
20, No. 2, June
are used for the aggregation
dividend
and divisor,
are very
1995
Fast Algorithms
for Universal Quantification
different.
This parallels
the observations
sort- and hash-based
matching
operators
sengen 1984; Ck-aefe et al. 1994; Schneider
In sort-based
for each file
.
221
by Bratbergsengen
and others on
(join, intersection,
etc.) [Bratbergand DeWitt
1989; Shapiro
1986].
division
algorithms,
the number
individually
by the file’s
size.
of merge levels is determined
Therefore,
sorting
the large
dividend
with multiple
merge levels dominates
the cost for both the aggregation and the join. In hash-based
algorithms,
on the other hand, the recursion
depth
of
overflow
avoidance
or
resolution
determined
by the smaller
one, in our
relation.
Since the inputs
of the semi-join,
is
equal
case the
dividend
for
both
inputs
and
relatively
small
divisor
and divisor,
differ by a
factor of at least the cardinality
of the quotient,
using a hash-based
semi-join
operator
is significantly
faster.
This
conclusion
can also be stated
in a
different
way: if a database
system’s
only means
to evaluate
universal
quantification
and relational
division
queries is aggregation
with prior duplicate removal
and semi-join,
the semi-join
as well as the aggregation
should be
hash-based
because in the case of relational
division,
the semi-join
inputs
have very different
sizes and therefore
give hash-based
join algorithms
a
significant
performance
advantage.
Third,
the new hash-division
algorithm
better
than
division
by hash-based
consistently
aggregation.
performs
However,
as well
as or
if a semi-join
duplicate
removal
is needed, hash-division
significantly
outperforms
by hash-based
aggregation
as well as both sort-based
algorithms.
Finally,
hash-based
algorithms
outperform
sort-based
ones not
or
division
only
for
existential
quantification
but also for universal
quantification
problems
and
algorithms.
This result
is different
from the earlier
observation
cited above
because those papers pertained
only to one-to-many
relationship
operations
such as join, semi-join,
intersection,
difference,
and union, while this paper is
the first to demonstrate
this fact for relational
division.
Most importantly,
it
outperforms
sort-based
techniques
consistently
in
a wide
variety
of situa-
tions.
Effect of Clustering
Indices
Let us briefly
consider
the effect of clustering
B-tree
indices,
despite
our
general focus on algorithms
that work on all relations,
including
intermediate
results.
There
are
two
effects
that
are
B-tree
indices
may save sort
attributes
relevant
for a query,
operations.
the index
file, which permits
base file records.
The latter
effect
record sizes in our
1/0
significant
interesting
here.
First,
clustering
Second, if an index
contains
all
may be scanned instead
of the base
savings
if index
entries
is not pertinent
here, because we
analysis.
The former
effect, however,
are shorter
assumed
is worth
than
very short
analyzing,
which we do in Figure
10. For this comparison,
we modified
the last comparison (Figure
9d) to presume
sorted relations
suitable
for the first processing
step. Thus, the sort-based
plans save the initial
sort step for each input,
whereas
the cost of the hash-based
algorithms
remains
unchanged.
Most strikingly,
hash-based
aggregation
is not competitive
with the outer
three algorithms,
because it requires
temporary
files for hash table overflow.
ACM
Transactions
on Database
Systems,
Vol. 20, No, 2, June
1995.
222
G. Graefe and R. L. Cole
.
4000
❑ Naive Division
3000
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
cost
[see]
0 Hash-Division
2000
1000
0 i.J1
I
1
I
2
Size Reduction
a=~=4,1Ql
Fig.
10
Diwsion
with
I
4
by Referential
=256,
duphcate
removal
Integrity
1Sl=256xct,
and referential
I
16
I
8
Enforcement
6
lRl=65,536x/!?x6
integrity
enforcement;
no initial
sorts
The other three algorithms
are fairly close to each other as well as to the base
cost, i.e., the cost of reading
the inputs
and writing
the final output.
In other
words, reading
the inputs
these algorithms.
If Figure
10 had finer
and writing
resolution,
the output
the
are the
following
detailed
dominant
trends
costs for
would
be
apparent.
Naive division
is consistently
about 15% more expensive
than the
base cost. Hash-division
is 18–43%
slower than naive division,
with improving relative
performance
for higher
reduction
factors
8 and increasing
dividend sizes IR 1. Sort-based
aggregation
shows the most interesting
behavior,
with costs 20– 112°A higher than the base cost. For small reduction
factors
8,
the
first
sort
step
between
merge-join
because the size of the dividend
large 6, i.e., referential
integrity
second sort operation
becomes
even outperforms
hash-division.
Thus, Figure
10 demonstrates
and
aggregation
weighs
in
heavily
has not been reduced by the merge-join.
enforcement
with large size reduction,
almost
that
negligible,
sorting
and
the large
sort-based
dividend
For
the
aggregation
is the domi-
nating
cost for sort-based
relational
division
algorithms.
If sorting
can be
sort-based
relational
division
avoided
by use of clustering
B-tree
indices,
algorithms
become
competitive
with
hash-division.
Non-clustering
indices
would not have the same effect, unless they contain
all required
attributes
and permit
index-only
retrievals
[Graefe
19931. On the other hand, if the
dividend
is an intermediate
query result
of suitable
B-tree
indices
do not
exist, hash-division
is the algorithm
of choice. Seen in a different
way, the
hash tables
of hash-division
organize
the divisor
and dividend
tuples
into
associative
structures,
just like permanent
indices
on disk. However,
since
they are in memory,
they are faster than non-clustering
indices on disk but
which
can deliver
the input
data
not quite
as fast as clustering
indices,
immediately
ACM
TransactIons
in the proper
on Database
organization
Systems,
Vol
20, No
(for naive
2. June
1995
division).
Fast Algorithms
400 – ‘
300 –
for Universal Quantification
.
223
❑ Naive Division
x Sort-Based
Aggregation
+ Hash-Based
cost
Aggregation
o Hash-Division
[see]
200 –
loo–
o–
I
1
2
Size Reduction
P=16,
I
4
I
cr=fl=4,
by Referential
1Ql=256,1Sl
I
16
I
8
Integrity
=256
Enforcement
xa,
1
6
lRl=65,536x/?xd
(a)
❑ Naive Division
– 16
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
–8
o Hash-Division
4
Speedup
–2
–1
I
I
2
1
I
4
I
8
Degree of Parallelism
a=~=J=4,
1Ql=256,1Sl
=256
I
16
P
xa,
lRl=65,536x/lx6
(b)
Fig.
11.
(a) Parallel
division
on distributed
memory.
(b)
Speed-up
for
parallel
division
alga.
rithms.
7.2
Parallel
If division
Algorithms
is executed
on a parallel
machine,
inputs
and
intermediate
data
must be partitioned
across the processors,
which add a new cost to all cost
formulas
above. Figure
1 la shows the cost of the algorithms
with duplicate
removal
and referential
integrity
enforcement
for a distributed-memory
database
machine
with P = 16 processors
and no broadcast
assistance
(B =
P). Each processor
is assumed
to be equipped
with CPU, disk, and memory
like the single processor
in the previous
subsection.
The indicated
relation
sizes pertain
l/P
of these
costs.
to the entire
system,
i.e., each processor
stores and processes
sizes, The base cost of Figure
1 la does not include
partitioning
ACM
TransactIons
on Database
Systems,
VO1 20, No 2, June
1995.
224
.
G. Graefe and R. L. Cole
The curves
for the sort-based
aggregation
and semi-join,
9d. The same is true
is different
total
from
memory.
algorithms,
are similar
for hash-division.
that
in Figure
Therefore,
i.e., naive
division
to the corresponding
The curve
9d because
hash-based
for hash-based
the parallel
duplicate
and sort-based
curves
removal
machine
in Figure
aggregation
has a larger
is possible
without
overflow
even for the larger
dividend
relation,
saving
significantly
on 1/0
costs. In the cases where in-memory
hash-based
aggregation
is feasible,
i.e.,
8<2,
it outperforms
hash-division
by a slight
margin
because counting
is
faster than manipulating
and scanning
bit maps.
The similarity
appear to permit
of Figure
9d and Figure
1 la indicates
linear speedup. Figure
1 lb illustrates
that the algorithms
the speedup behavior
of the four
parallel
division
algorithms
with
1 to 16 processors
in a
distributed-memory
architecture.
While
hash-division
comes very close to
linear
speedup,
the other algorithms
exhibit
super-linear
speedup.
The reason is the same as discussed for hash-division
in Figure
1 la: If only CPU and
disk bandwidth
were added, the speedup should be at most linear.
However,
memory
is also added as the system grows, permitting
not only a higher total
bandwidth
but also more effective
use of it, because the fraction
of data that
fits in memory
in hybrid
hashing
as well as the fan-in
for sorting
and
partitioning
increases.
However,
as demonstrated
in Figure
1 la, hash-division is still the most efficient
division
algorithm.
Unfortunately,
super-linear
speed-up
could not be achieved
for any of the algorithms
in our real experiments, as we will see in the next section.
8. EXPERIMENTAL
To validate
the
algorithms
extensible
PERFORMANCE
analytical
COMPARISON
comparisons,
in the Volcano
set of algorithms
we implemented
query
processing
and mechanisms
the
four
division
system.
Volcano
provides
an
for efficient
query processing
[Graefe 1994]. All operators
are designed,
implemented,
debugged,
and tuned
on sequential
systems but can be executed
on parallel
systems by combining
operator
in
them with a special “parallelism”
operator,
called the exchange
Volcano
[Graefe
algorithm
Graefe
In
i.e.,
et al.
all
1993].
Graefe
Volcano
[ 1989],
is
left.
If the
The
entire
algebra
next,
a sort operator
step
function.
as
relational
of open,
by means
merge
Davison
such
has
been
Graefe
and
used
for
Thakkar
a number
[ 1992],
of
and
[1994].
Volcano,
opening
and
studies,
and
prepares
final
input
operators
are
close procedures
sorted
runs
and
implemented
[Graefe
1994].
merges
them
as iterators,
For
until
example,
only
one
merge is performed
on demand
by the next
fits into the sort buffer,
it is kept there until
demanded
by the next function.
Final
house-keeping,
e.g., deallocating
a
merge heap, is done by the close function.
Operators
implemented
in this
model are the standard
in commercial
relational
systems [Graefe
1993].
A tree-structured
query
evaluation
plan is used to execute
queries
by
demand-driven
dataflow
passing
record
identifiers
and addresses
in the
buffer pool between
operators.
All functions
on data records, e.g., comparison
ACM
TransactIons
on Database
Systems,
Vol
20, No. 2, June
1995
Fast Algorithms
for Universal Quantification
.
225
and hashing,
are compiled
prior to execution
and passed to the processing
algorithms
by means of pointers
to the function
entry points.
The naive division
algorithm
was implemented
in such a way that it first
consumes
the
entire
sorted
divisor
relation,
building
a linked
list
of divisor
tuples fixed in the buffer pool. It then consumes the sorted dividend
relation,
advancing
in the linked list of divisor tuples as matching
dividend
tuples are
produced
by the dividend
input,
of the divisor list is reached.
Sort-based
aggregation
and
producing
a quotient
duplicate
removal
tuple
are
each time
the end
implemented
Volcano’s
sort operator.
It performs
aggregation
and
early as possible, that is, no intermediate
run contains
duplicate
duplicate
within
removal
as
sort keys. A
merge-join
consists of a merging
scan of both inputs,
in which tuples from the
inner relation
with equal key values are kept in a linked list of tuples pinned
in the buffer
pool. For semi-joins
in which
the outer relation
produces
the
result,
no linked
lists are used. For all algorithms
involving
sort, the runlength was 4,099 records and the fan-in was 20.
The
hash-based
hash
tables,
space
for
hash
data
bucket,
a tuple’s
and
chosen,
providing
sizes
for
the
generated
unique
experiments
CDC Wren
for system
and
Dividend
and
distributed
skew
was
main
size
conflict
Chain
to
the
memory
bit
(number
resolution
manager
elements.
to the
sort-based
were
for
memory
a pointer
pointer
table
the
data
chain
and
or the
with
test
uniformly
no data
8.1 Sequential
The
parity
chaining
Volcano’s
contain
a hash
quotient.
and
and
maps,
identifier
count
of the
bucket
use
that
algorithms,
Record
bytes
bit
record
divisor
use
algorithms
structures
the
hash-based
hash
tables,
auxiliary
pool,
algorithms
The
elements
next
address
map
are
tuple
in
in
the
respectively.
of buckets)
in
to allocate
the
buffer
For
of 4,099
all
was
algorithms.
32 bytes
divisor
for
divisor
attributes
integers
such
and
were
that
dividend
and
8
pseudo-randomly
all
attributes
were
present.
Algorithms
were
run
on a Sun
SparcStation
running
VI disk drives. One disk was used for normal
files, source code, executable,
etc., while the
SunOS
with
two
UNIX
file systems
other was accessed
directly
by Volcano as a raw device. Of the system’s 24 MB of main memory,
4
MB was allocated
for use by Volcano’s
buffer manager,
in order to guarantee
that the experiments
would not be affected by virtual-memory
thrashing.
The
unit of 1/0, between buffer and disk, was chosen to be 4 KB, which was also
the page-size.
Simplest
Case
Figure
12a-b shows the measured
performance
of the four algorithms
for
quotient
cardinalities
IQI of 16 and 1,024. Each figure
illustrates
division
performance
for a fixed number
of quotient
records. The divisor
cardinality
IS’I varies from 16 to 4,096 or from 16 to 1,024. In all cases, the dividend
is the
i.e.,
R = Q x S. This
Cartesian
product
of the divisor
and the quotient,
ACM
Transactions
on Database
Systems,
Vol. 20, No. 2, June
1995
226
G. Graefe and R L. Cole
.
50 –
❑ Naive Division
40 –
Elapsed
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
30_
Time
[see]
o Hash-Division
20 –
lo–
o–
Divisor
ISI
Cardinality
IQI=16,
4096
1024
256
64
16
R=SXQ
(a)
❑ Naive Division
1250 –
1000 –
Elapsed
Time
750-
[see]
500 –
x Sort-Based
Aggregation
+ Hash-Based
Aggregation
0 Hash-Division
250 –
o–
1024
256
64
16
Divisor
Cardinality
IQI=1,024,
IN
R=SXQ
(b)
Fig.
12.
implies
are
(a) Measured
that
neither
required,
These
and
the
the
The
outperforms
the
division
the
the
in
for
the
case
not
algorithm
integrity
in the
(R
Figure
early
in
can
be
one
no
the
less
algorithm.
to naive
significantly
faster
12 (although
of the
particu-
analytical
well
of division
equal
on Database Systems, Vol 20, No 2, June 1995
duplicates)
In
as in
significantly
attributed
aggregation
enforcement
be applied.
and
aggregation
illustrated
quotient,
comparisons.
appeared
was
reflect
holds
implementation
= Q X S) but
This
integrity
same
perform
direct
(b) Large
can
analytical
is the
aggregation
algorithms).
did
referential
hash-based
sort-based
experiments
model
ACM TransactIons
the
quotient.
nor
algorithms
and
small
of each
here
general-purpose
this
hash-based
analytical
sort-based
very
(referential
made
algorithms
algorithms,
is that
analysis
form
results
of the
hash-based
“surprise”
removal
simplest
observations
ranking
comparison.
the
performance,
duplicate
the
experimental
substantiate
lar,
sequential
to
the
individual
not
fact
than
slightly
The
one
division
than
in
naive
as fast
that
sort
as
our
runs,
Fast Algorithms
for Universal Quantlficatlon
1;&
Elapsed
Time
Relative
To
+ Hash-Based
4
.
227
Aggregation
Hash-Div.
J
Divisor
Cardinality
IQI=1,024,
Fig.
13.
Performance
relative
1024
256
64
16
ISI
R=SXQ
to hash
division,
large
quotient,
and therefore
the cost of sort-based
aggregation
is overestimated
in the cost
formulas.
The difference
in performance
for small relation
sizes are small, but the
difference
grows
fast as the relations
grow.
Figure
12b with all elapsed times divided
for the same input
sizes. The difference
algorithms
shown
in Figure
Figure
13 shows
the same data
by the elapsed time
between
the slowest
12b and Figure
13 ranges
as
of hash-division
and the fastest
from
a factor
of 5 to a
factor of 10. Differences
of this magnitude
occur for all input sizes in Figure
12a-b. For very small relation
sizes, e.g., Ilil = 1,024, ISI = 64, and IQI = 16 in
Figure
12a, the performance
fastest
division
differed
algorithms;
by a factor
0.53 versus
of 3 between
0.17 seconds.
the slowest
and
relations,
e.g.,
For larger
Il?l = 4,194,304,
ISI = 4,096, and IQI = 1,024 in Figure
12b, the performance
differed
by a factor of almost
10: 1,380 versus 140.2 seconds. Naive division
became
relatively
dividend,
worse
be sorted;
because
however,
it requires
the
sort
gation
tation
like sort-based
aggregation
[Bitten
of division
may not be crucially
divisor
relations,
but
if the relations
The same
operations
possible error in the selective
and dividend
sizes accurately.
estimate
Figure
14, shows
referential
itly
enforced
IR] = 65,536
points
while
which
Integrity
the
the larger
take
input
advantage
relation,
of early
the
aggre-
and DeWitt
1983]. The implemenimportant
for small
dividend
and
are large,
division
algorithm
carefully.
are results
of other database
Referential
that
cannot
it is imperative
to choose the
is true if the dividend
or the divisor
such as selections
and joins and the
makes
it impossible
to predict
divisor
Enforcement
the measured
integrity
during
constraint
query
performance
between
execution.
of the four
dividend
In Figure
algorithms
and divisor
14, the
when
must
dividend
the
be expliccardinality
and the quotient
cardinality
IQ] = 256 are fixed for all data
the divisor
cardinality
IS I = 256/8
varies.
8 is the factor by
dividend
size
decreases
ACM
in
Transactions
the
referential
on Database
integrity
Systems.
Vol
enforcement
20, No 2, June
1995
228
G Graefe and R. L Cole
.
504
K
❑
n
~
40 –
Elapsed
❑ Naive Division
~.
Aggregation
+ Hash-Based
Aggregation
Time
[see]
~._
x
x Sort-Based
o Hash-Division
lo–
6
1
I
1.6
1
Size Reduction
by Referential
1S1= 256/6,
Fig.
step
are
14.
and
Measured
varies
the
from
of
enforcement.
leftmost
measurements
integrity
already
four
for
tial
integrity
the
additional
range
from
7.95
Note
for
than
did
to
speeds
ments
to
Duplicate
Figure
records
up
sort-based
added
for
merge-join.
7.09
( 6 =
sort
from
8.67
to
the
left
explicit
elapsed
of
of
6 =
times
from
the
referen-
integrity
actually
the
the
actual
affected,
1)
to
costs
to
( 8 =
8)
without
The
relatively
sort-
referen-
seconds,
due
to
non-matching
also
is
requires
this
less
because
it
Its
elapsed
times
1/0.
hash-division
little
(no
the
algorithm
and
The
20.66
remove
performance
referenimproves,
division.
from
32.77
for
performance
algorithm
performance
can
must
degradation,
seconds.
results
the
parallel
fared
into
aggregation.
a second
integrity
more
than
the
worse
experimental
aggregation
aggregation
1 when
referential
analytical
in
the
results.
Again,
the
analytical
This
also
double
results,
analytical
this
model,
explains
from
the
except
comparison
is
because
which
why
leftmost
the
we
signifielapsed
measure-
dividend
sort without
early aggregation
enforcement
using
a semi-join
version
Removal
15 shows the run-time
performance
are present
in the dividend
and
Transactmns
1
referential
a redundant
significant
8,
its
seconds.
incurs
sort-based
8 =
of
required
on
which
run
show
in
overflow
to
but
justified
sort
6
enforcement.
referential
aggregation
experimental
include
of
16.59
to
48.18
its
impact
aggregation,
seems
cantly
ACM
to
sort-based
times
the
table
table,
the
in
cost
where
strongly
hash-based
but
10.83
that
not
to
hash
9.56
rise
participate
and
The
divisor
indicated
the
values
is most
semi-join
from
the
integrity
without
the
division
large
tuples
algorithm
without
times
cases
naive
For
dividend
semi-join,
proceed
in
Enforcement
inputs.
except
records.
referential
1 represents
step
Integrity
algorithms
words,
8 =
enforcement)
additional
use
the
aggregation
dividend
other
to
with
elapsed
division
enforcement.
fewer
based
In
algorithms
integrity
because
8. The
I
8
IQI = 256, IRI = 65,536
performance
the
enforcement
held
All
tial
1 to
performance
integrity
tial
sequential
I
4
I
2
on Database
Systems,
Vol
of the algorithms
divisor.
In Figure
20, No 2, June
1995,
when duplicate
15, the quotient
is
of
Fast Algorithms
for Universal Quantification
.
229
500 –
❑ Naive Division
Elapsed
400 –
x Sort-Based
Aggregation
300 –
+ Hash-Based
Aggregation
Time
o Hash-Division
200 –
[see]
loo–
I
1
Q
“
u
o-l ,
I
I
I
2
4
8
Size Reduction
by Duplicate
Removal
a
D = a, IQI = 256, ISI = 256 X a, IRI = 65,536
Fig.
cardinality
15.
Sequential
performance
IQI = 256 is constant
and
dividend
and
~ ranging
cardinality
from
comparison
while
the
IR I = 65,536
with
divisor
x @ vary
1 to 8. In all experiments,
X /3
duplicates.
cardinality
with
/S I = 256 X a
replication
factors
a = ~. For example,
a
if a = 1,
Il? = 65,536 and ISI = 256, and if a = 8, IRI = 524,288 and ISI = 2.048.
The performance
difference
between the worst and best algorithms
is again
dramatic.
seconds.
For 8 = 8, the difference
is a factor 6,500.98
Here the most important
result is the difference
aggregation
duplicate
whereas
seconds,
and
hash-division.
In
the
seconds versus 76.20
between hash-based
measurements
removal,
the difference
between
for 8 = 8, it is more than a factor
for
division
these two algorithms
of 4,326.63
seconds
without
was small,
versus 76.20
Differences
in the relative
performance
of the algorithms
between
the
analytical
and the experimental
results
are due to the omission
of early
aggregation
in the analysis
and the fact that the buffer size and management
is different
than assumed
in the analytical
comparison.
In several cases in
the
experiment,
1/0
the
costs occur
entire
buffer
dividend
entire
dividend
due to sorting.
relation
Furthermore,
buffer is used as sort space, so that
pool from run creation
to merging
relation
has
to
be
sorted
fits
only
into
the
buffer,
a small
so that
percentage
no
of the
temporary
file pages remain
in the
and deletion.
Only when a large
twice,
i.e.,
in
the
case
of
sort-based
aggregation
with
preceding
semi-join,
is this effect not observed,
and the
preceding
semi-join
and additional
sort almost double the cost of sort-based
aggregation,
e.g., for IRI = 65,536,
IS I = 256, and IQ I = 256, the run-time
is
20,48 versus 39.63 seconds.
It is important
to realize
terms
of an
aggregate
that
function
if a universal
with
quantification
preceding
semi-join
is expressed
and
the
in
query
optimizer
does not rewrite
the query to use a direct division
algorithm,
the
query may be evaluated
using
an inferior
strategy.
This is true for both
sort-based
query evaluation
systems
such as System
R, Ingres,
and most
commercial
database
systems (sort-based
aggregation
versus naive division)
ACM
TransactIons
on Database
Systems,
Vol. 20, No, 2, June
1995.
230
G. Graefe and R. L, Cole
0
and for hash-based
systems such as Gamma
(hash-based
aggregation
versus
hash-division).
Since it is much easier to implement
a query optimizer
that
recognizes
a division
operation
and rewrites
it into
an aggregation
expression
than vice versa, universal
quantification
should be included
as a language
construct
in query languages,
for example,
as the “contains”
clause between
table expressions
originally
included
in SQL [Astrahan
et al. 1976].
In summary,
hash-division
is always
faster
than any of the other algorithms
presented
here. Even if we compare
the most complex
version
of
hash-division,
i.e., the version with automatic
duplicate
removal
and referential integrity
enforcement,
against
hash-based
removal
or referential
integrity
enforcement,
slower than
cate records
faster
8.2
hash-based
aggregation.
must be removed
from
than
the next
Parallel
However,
the inputs,
best algorithm,
hash-based
aggregation
hash-division
without
duplicate
is only slightly
when non-matching
or duplihash-division
is significantly
aggregation.
Algorithms
Considering
of-magnitude
that parallel
processing
speed-ups
in database
optimization
technique
should
has been demonstrated
to permit
orderquery processing,
any new algorithm
or
also
be
evaluated
in
parallel
settings.
In
addition
to the comparisons
of our four algorithms
on a sequential
machine,
we also measured
speed-up
and scale-up
on a shared-memory
parallel
machine. Speed-up compares
elapsed times for a constant
problem
size, whereas
scale-up
compares
elapsed
times
for a constant
ratio
of data volume
to
processing
resources.
Linear
speed-up is achieved if N times more processing
resources
reduce the elapsed time to l/Nt}’
of the time. Linear
scale-up
is
achieved
if the
provided
following
the ratio
diagrams,
100%
parallel
The
80386
elapsed
time
is constant
and
independent
of the
input
of data volume
to processing
power is constant.
In
both linear speed-up
and linear
scale-up are indicated
size,
the
as
efficiency.
experimental
environment
was a Sequent
Symmetry
with
twenty
processors
running
at 16 MHz (about 3 MIPS each) and twenty
disk
drives connected
to four dual-channel
controllers.
Of these, we used up to
eight processors
and eight disks, because we could not obtain more disks for
our experiments.
In each experiment,
we use the same number
of disks as
processors.
The data are round-robin
partitioned
at the start of each experiment; thus, operations
such as aggregation,
join, and division
require
that
the first
query
execution
step be repartitioning.
Figures
16a-b show run-times
and speed-ups
of the four algorithms
using
two different
partitioning
schemes. (The mapping
of tags on the curves to
algorithms
is the same as in the previous
figures;
we omitted
it here to keep
the figure readable.)
While none of the experiments
show linear speed-up,
all
algorithms
become faster through
parallelism.
In other words, the basis for
intra-operator
parallelism,
namely the division
of sets (relations)
into disjoint
subsets,
applies
not only to the known
algorithms
for sorting,
merge-join,
hash-aggregation,
and hash join but also to the new universal
quantification
algorithm,
hash-division.
This observation
holds not only in theory,
as reflected in our analytical
cost comparisons,
but also in practice.
ACM
Transactions
on Database
Systems,
Vol
20, No
2, June
1995
Fast Algorithms
for Universal Quantification
231
.
1
Elapsed
====– --0_____
‘8+..==:::
100
0.8
-++-.--. –– ----a
Parallel
Time
0.6
Efficiency
0.4
(dashed
[see]
(solid
50
lines)
lines)
0.2
0
I
,
I
I
I
I
1
2
4
8
0
.-,---.
= =_
i&~ :
Degree of Parallelism
ISI=IQI=256,
P
R=SXQ
(a)
1
‘..
Elapsed
‘.
100
Time
[see]
(solid
-#=...-_
50
lines)
-.
‘% -----
--===
=#...
----
= +..
-
0.8
‘.-.
-.
-. :--.-
-w
- e
-. +
Parallel
0.6
Efficiency
0.4
(dashed
lines)
0.2
–0
o–
2
1
8
4
Degree of Parallelism
ISI=IQI=256,
P
R=SXQ
(b)
Fig.
16.
(a) Speed-up
In fact, it would
relatively
small
with
divisor
not have been realistic
data
volumes.
ory machine,
only processors
constant.
Thus, the effect that
cal comparison
operators
and
partitioning.
is missing.
buffers
has
First,
(b) Speed-up
to expect
in this
with
quotient
linear
experiment
and disks are added
permitted
super-linear
partitioning
speed-up
using
with
these
a shared-mem-
while
memory
remains
speed-up in the analyti-
Second,
since the Volcano
implementation
of
been tuned
very carefully
and is quite efficient
[Graefe and Thakkar
1992], the CPU effort for manipulating
records is very
low and a large fraction
of all CPU effort is due to pre-process
overhead
and
protocols
for data exchange.
For
divisor scans as few as 32 tuples.
helps, but
and closes
Volcano’s
process to
example,
for P = 8, a scan process for the
Using a pool of “primed”
Volcano processes
process allocation
is still not free. In addition,
each process opens
its own partition
file. With respect to data exchange
note that, in
shared-memory
implementation,
each data packet going from one
another
requires
a synchronization,
i.e., a semaphore
system call,
ACM
TransactIons
on Database
Systems,
Vol
20, No 2. June
1995
232
G. Graefe and R. L. Cole
.
Elapsed
Parallel
Time
Efficiency
[see]
(dashed
(solid
lines)
lines)
Degree of Parallelism
ISI=PX256,
1QI=256,
P
R=SXQ
(a)
150
Elapsed
Parallel
Time
Efficiency
100
[see]
(dashed
(solid
lines)
50
lines)
0
8
4
2
1
Degree of Parallelism
ISI=PX256,1QI
=256,
P
R=SXQ
(b)
Fig.
17.
(a) Scale-up
in each process.
data
Since
exchange
step,
slow
interprocess
gree
of parallelism
faster
17a-b.
with
constant
strategies.
only
the
17a-b
ratios
All
partitioning.
l\P
of all
fraction
(b) Scale-up
data
of data
stay
that
system
apply
show
of
algorithms
to
the
input
suffer
size
and
to
slight
quotient
on the
same
partitioning.
processor
go through
increases
in
the
with
are supported
any
relatively
increasing
de-
by the fact that
the
efficiency.
scale-up
run-times
with
must
calls
P. These observations
the poorer its parallel
observations
Figures
divisor
communication
an algorithm,
Similar
with
experiments
scale-ups
processing
increases
shown
of
power
in
the
for
processing
four
two
time
in
Figures
algorithms
partitioning
for
larger
volumes
and degrees of parallelism.
None of them exhibits
truly linear
scale-up
in our experiments.
Nonetheless,
the speed-ups
and scale-ups
obtained
are sufficient
to support
parallel
processing
for large universal
quandata
tification
ACM
and relational
Transactions
on Database
division
Systems,
problems.
Vol. 20, No. 2, June
1995.
Fast Algorithms
for Universal Quantification
233
-1
Elapsed
~:
15
Time
~
x Sort and Merge-Join
+ Hybrid
[see]
Hash Join
o Hash-Division
lo–
5–
I
I
I
1248
Size Reduction
by Referential
ISI = 256 /6,
Fig.
8.3
Compariscm
Existential
18.
Existential
of Existential
quantification
by semi-join
algorithms
A
A
I
I
6
I
64
128
256
I
Integrity
Enforcement
6
IQI = 256, IRI = 65,536
versus
universal
quantification,
and Universal Quantification
is well-supported
such
in most
as merge-join
and
query
hybrid
processing
hash
join.
systems
Figure
18
shows the measured
performance
of existential
and universal
quantification,
i.e., hash-based
semi-join
and hash-division,
over the same inputs
as those
Figure
14, i.e., Il?
8 represents
the
= 65,536
dividend’s
in
and IRI = 256 are fixed while ISI = 256/8
varies.
“reduction
factor”
due to referential
integrity
enforcement,
that is, there are increasing
numbers
of non-matching
records
in the dividend.
When 8 = 1, the semi-join
outputs
all 65,536 records, whereas
the size of the quotient
produced
by hash-division
is only IQI = 256. However,
when
8 = 256, the output
of the semi-join
and IQI are both equal to 256
records, and the algorithms
for existential
and universal
quantification
have
the same costs for output record processing.
We can see that the run-times
for
hash-based
semi-join
and hash-division
are nearly the same regardless
of the
number
of output
records,
In fact,
the
difference
between
hash-division
and
hash-join
join. This
is much smaller
than the difference
between
merge-join
and hashis another
strong argument
for directly
supporting
universal
quan-
tification
in today’s
tions
need
quantification
no
database
quantification
and relational
9. SUMMARY
AND
We
have
namely
recently
query
longer
be an issue
and relational
division
described
languages—performance
of such
because
the performance
can be equivalent
to that
opera-
of universal
of existential
semi-join.
CONCLUSIONS
and
compared
four
algorithms
for
naive division,
sort- and hash-based
aggregation
proposed
algorithm
called hash-division.
Analytical
relational
division,
(counting),
and a
and experimen-
tal performance
comparisons
demonstrate
the competitiveness
of hash-division. Naive division
is slow because it requires
sorted inputs;
it is competitive
only if neither
input
incurs
costs for sorting.
Division
using
sort-based
ACM
TransactIons
on Database
Systems,
Vol
20, No. 2. June
1995
234
G. Graefe and R. L Cole
.
aggregate
functions
is almost
as expensive
and an additional
sort are required
to ensure
are counted,
sion.
sort-based
Division
using
hash-division
step
must
can be more
hash-based
when
are required.
uniqueness
aggregation
neither
as naive division;
if a semi-join
that only proper dividend
tuples
aggregate
a preceding
expensive
functions
semi-join
If either
one is required
be enforced,
hash-division
is
nor
because
than
divi-
as fast
a duplicate
referential
outperforms
naive
exiidly
as
removal
integrity
all other
or
algorithms
because it automatically
enforces referential
integrity
and ignores duplicates.
However,
as in our second example
(students
who have taken all database
courses), referential
integrity
is unlikely
to hold if the divisor is the result of
a prior selection
or other restrictive
preprocessing
step. On the other hand, if
referential
integrity
holds and both dividend
and divisor
are duplicate-free,
the basic hash-division
table and substituting
candidates,
thus
algorithm
counters
making
can be simplified
for the bit maps
it similar
rithm
and performance.
Like the other algorithms
to hash-based
consider
here,
by omitting
the divisor
associated
with
quotient
aggregation
hash-division
in both
is easy
algo-
to paral-
lelize.
Its speed-up
and scale-up
behavior
is similar
to that of hash-based
aggregation
and join, while its absolute run-times
are consistently
lower than
those of universal
since hash-division
requires
partitioning
quantification
using
hash-based
aggregation.
Moreover,
does not require
referential
integrity
to hold, it never
the dividend
input
twice,
as is necessary
in indirect
division
algorithms
based on
subsequent
aggregate
function.
able in
primary
relational
parallel
systems,
conclusion
from
division
aggregation,
Therefore,
i.e., for the semi-join
and the
hash-division
is particularly
valu-
where
it outperforms
all other
algorithms,
this study
is that
universal
quantification
can be performed
very
efficiently
database
systems by using hash-division.
An additional
result
of our comparisons
is that
in sequential
direct
Our
and
and parallel
algorithms
tend
to
outperform
indirect
strategies
based on aggregate
functions,
both in sortbased and in hash-based
query evaluation
systems. In direct algorithms,
the
larger input, the dividend,
has to be partitioned,
sorted, or hashed only once,
not twice
as in indirect
strategies
based on aggregation.
Thus,
division
problems
should be evaluated
by direct
division
algorithms.
However,
it is
much easier for a query optimizer
to detect a “for-all”
or “contains”
expression and to rewrite
it into a query evaluation
is to determine
that a complex
aggregation
plan using
expression
aggregation
than it
or a pair of nested
“NOT EXISTS”
clauses are actually
“work-arounds”
for a universal
quantification
problem.
Thus, query languages
should
include
“for-all”
quantifiers
and “contains”
predicts,
e.g., as the “contains”
clause between
table expressions originally
included
in SQL [Astrahan
et al. 1976]. In other words, it was
a mistake
to remove
that clause from the original
language
design,
and it
should
be restored
in future
versions
of SQL and in other database
query
languages.
Finally,
hash-division
is fast not only when compared
to other
division
algorithms
but also when compared
to existential
quantification
algorithms,
i.e., semi-joins.
We have shown experimentally
that hash-division
is as fast as
ACM
TransactIons
on Database
Systems,
Vol
20, No
2, June
1995
Fast Algorithms
hash-based
semi-join
for the same two inputs,
sort-based
semi-join.
Thus, the
permits
database
implementors
tion
and
for Universal Quantification
relational
division
and performs
.
much
235
better
than
recently
proposed
algorithm,
hash-division,
and users to consider
universal
quantifica-
with
a justified
expectation
of excellent
perfor-
mance<
ACKNOWLEDGMENTS
Dale
Favier,
many
David
excellent
Maier,
Barb
suggestions
Peters,
and the
anonymous
for the presentation
of this
reviewers
made
paper.
REFERENCES
ASTRAHAN, M. M., BLASGEN, M. W., CHAMBERLAIN, D. D., ESWARAN, K. P., GRAY, J. N., GRIFFITHS, P.
P., KING,
W. F., LORIE,
R. A.,
MCJONES,
WADE, B. W., AND WATSON, V.
1976.
ment.
Syst.
ACM
Readings
BITTON,
Trans.
Database
in Database
Systems,
D., AND DEWITT,
Trans.
Database
Syst.
BRATBERGSENGEN,
Conf.
K.
CARLIS, J. V.
Proc.
1986.
IEEE
CHENG,
mance:
design,
F.
Hashing
Reprinted
record
methods
Bases
HAS:
A relational
(Singapore,
and
in
to database
I. L.,
manage-
Stonebraker,
M..
1988
files.
ACM
Proc.
Znt ‘1.
Calif,
elimination
relational
August),
algebra
Eng.
in large
operator,
data
IBM
Syst.
operations.
Endowment,
or divide
Calif.,
A., AND WORTHINGTON,
Completeness
algebra
VLDB
(Los Angeles,
and tuning.
Relational
approach
San Mateo,
Duplicate
Data
implementation,
PUTZOLCT, G. R., TRAIGER,
255.
C., SHIBAMIYA,
1972.
97.
Kaufmann,
1983.
Znt ‘1. Con f, on Data
J., LOOSELY,
CODD, E.
D. J.
J. W.,
R: A relational
1, 2 (June),
Morgan
8, 2 (June),
1984.
on Very Large
P. R., MEHL,
System
323.
is not
enough
to conquer.
February),
IEEE,
New
P.
IBM
database
1985.
York.
254
2 perfor-
J. 23, 2.
of Database
Sublanguages.
Prentice-Hall,
New
York.
DATE, C. J., AND DARWEN, H.
Reading,
DEWITT,
1993,
A Guide
to The SQL
Standard,
3rd
cd., Addison-Wesley,
Mass.
D.
J.,
KATZ,
Implementation
R.,
OLKRN,
techniques
F.,
SHAPIRO,
for main
L.,
memory
STONEBRARER,
database
M.,
systems.
D. J., GERBER, R. H., GRAEFE, G., HEYTENS,
1986.
GAMMA-A
Large
Data
Bases
in Database
DEWITT,
R.
performance
(Kyoto,
Japan,
Systems, Morgan
The
(March),
EPSTEIN,
GAMMA
August),
R.
1979.
Proc.
151.
M. L., KUMAR, K. B., AND MURALIRRISHNA,
database
machine.
228. Reprinted
San Mateo,
Proc.
Int’1
in Stonebraker,
M.,
Conf.
1988.
M.
on Very
Readings
Calif.
machme
project.
IEEE
relational
for
at Berkeley,
processing
UCB/ERL
Trans.
of aggregates
Memorandum
Knowledge
M., AND TANARA, H.
database
machine
1986.
GRACE.
Znt ‘t. Conf.
on Data
Eng.
G., AND THAKKAR,
memory
multiprocessor.
GRAEFE, G.
1993.
Query
(Los Angeles,
S. S.
Calif.,
1992.
relational
Data
Eng.
2,
1
Int’1
February),
Exper.
22, 7 (July),
evaluation
techniques
for large
on
and their
a parallel
Softw.-Pract.
of the
Conf.
IEEE,
database
systems.
(February).
An overview
Proc.
Tuning
in
M79/8,
(Kyoto, Japan, August), VLDB Endowment, 209.
GRAEFE, G. 1989. Relational
division: Four algorithms
(June),
algorithms.
Endowment,
S., SCHNEIDER, D., BRICKSR, A., HSIAO, H. I., AND RASMUSSEN,
database
Techniques
S., KITSUREGAWA,
parallel
GIWEFE
1984.
Corzf.
44.
of California
FUSHIMI,
dataflow
Kaufmann,
D. J., GHANDEHARIZADEH,
1990.
Univ.
high
SZGMOD
ACM
(Boston, Mass., June), 1.
DEWITT, D. J., AND GERBER, R. H.
1985. Multiprocessor
hash-based join
Int’1. Conf. on Very Large Data Bases (Stockholm,
Sweden,
August)
VLDB
DEWITT,
D.
AND WOOD,
Proc,
system
Very
Large
performance.
New
York,
database
software
of a
Data
Bases
Proc.
IEEE
94.
algorithm
on
a shared-
495.
databases.
ACM
Comput.
Surl,.
25, 2
73-170.
ACM
Transactions
on Database
Systems,
Vol
20, No. 2, June
1995
236
G, Graefe and R. L, Cole
.
GRAEFE, G., AND DAVISON, D. L.
dence
m extensible
GRAEFE,
G
IEEE
1994.
Trans.
GRAEFE,
Knowl.
Data
machine
M.,
dynamic
KNUTH,
MAIER,
D.
D
GRACE
1973.
The Art
Reading,
1983,
NAKAYAMA,
M.,
Tlle
dynamic
destaging
VLDB
O’NEIL,
19, 8 (August),
dataflow
query
processing
749.
system.
Sort
versus
hash
revisited.
IEEE
Trans
934,
T.
Gene~ation
join
1983.
of
hash
to
data
base
1, 163.
1989.
Proc.
VLDB
of Computer
Application
Comput.
method.
August),
Theory
The effect
of bucket
Znt ‘1. Conf.
Endowment,
Programming,
of Relational
M.,
strategy.
on
size tuning
Very
Large
in the
Data
Bases
257
Vol.
Databases.
AND TARAGI,
Proc.
Endowment,
P. E.
architecture-indepen-
Eng
Softw
III:
Sorting
and
Searching.
Addi-
Mass.
KITSURECAWA,
August),
and
Trans.
120.
1994.
M., AND TARAGI, M.
hash
of parallelism
parallel
1 (February),
New
The Netherlands,
son-Wesley,
and
H , AND MOTOOKA,
M., NARAYAMA,
(Amsterdam,
6,
6, 6 (December),
TANAKA,
hybrid
IEEE
extensible
Eng.
and Its architecture.
KITSUREG~WA,
processing
A., AND SHAPIRO, L. D.
Eng.
KITSUREGAWA,
An
Data
Encapsulation
query
Volcano:
G., LINVILLE,
Knowl.
1993.
database
M.
Int ‘1, Conf.
Computer
1983.
Science
Press,
Hash-partitioned
on Very
Large
Data
Rockville,
join
Bases
Md.
method
using
(Los Angeles,
Calif.,
468.
1994
~ntroductzon
1986.
Fragmentation:
to Relational
Databases.
Morgan
Kaufmann,
San
Mateo,
query
processing.
ACM
Trans.
Cahf,
SACCO, G. M.
Database
Syst
11, 2 (June),
SALZBERG, B., TSURERMAN,
Sort:
A distributed
(Atlantic
City,
N. J., May),
in a shared-nothing
ACM,
SHAPIRO, L. D.
Trans.
1986
Database
SMITH, J. M.,
database
ACM,
Join
Syst.
1986.
evaluation
Proc.
in
database
1975.
of four
ACM
systems
Optimizing
the
18, 10 (October),
database
IEEE,
1990.
ACM
April
TransactIons
1994;
revised
on Database
parallel
SIGMOD
An
Data
with
1990.
Fast-
Conference
join
algorithms
Con ference
(Portland
January
Systems,
large
main
memories.
ACM
query
New
IEEE
language,
York,
adaptive
hash
Bases.
performance
1995;
Vol
20, No
algebra
Data
1990.
Proc,
Eng.
Supporting
IEEE
Bull,
9, 1 (March).
universal
Int ‘1. Conf.
quantifi-
on Data
Eng.
68.
join
(Brisbane,
accepted
of a relational
568.
algorithm
for multiuser
Australia,
August)
186
Received
B.
SIGMOD
239.
case for shared-nothing.
on Very Large
ACM
110.
ACM
February),
Proc.
94.
environment.
processing
The
S., AND VAUGHAN,
sort.
A., SOCIWT, G., AND BURNS, L.
H. AND GRAY, J.
Int ‘1, Conf,
York,
11, 3 (September),
Commun.
Calif.,
efficient
external
A performance
York,
in a two-dimensional
(LOS Angeles,
Proc.
New
1989.
AND CHANG, P. Y. T.
interface.
STONEBRAICZ~, M..
ZELLER,
D.
New
WHANG, K. Y., MALHOTRA,
cation
single-output
multiprocessor
May-June),
for
A., GRAY, J , STEWART, M., UREN,
single-input
SCHN~lDER D., AND DEWITT,
Ore.,
A techmque
113
March
2, June
1995
1995
VLDB
environments.
Endowment,