Download Object operations benchmark

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

DBase wikipedia , lookup

Global serializability wikipedia , lookup

Commitment ordering wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Microsoft Access wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

IMDb wikipedia , lookup

Btrieve wikipedia , lookup

Oracle Database wikipedia , lookup

Serializability wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Ingres (database) wikipedia , lookup

Functional Database Model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Versant Object Database wikipedia , lookup

Concurrency control wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
Object Operations
R. G. G. CATTELL
Benchmark
and J. SKEEN
Sun Microsystems
Performance is a major issue in the acceptance of object-oriented
and relational
database systems
aimed at engineering
applications
such as computer-aided
software engineering
(CASE) and
computer-aided
design (CAD). Because traditional
database systems benchmarks are inappropri ate to measure performance for operations on engineering
objects, we designed a new benchmark
Object Operations version 1 (001) to focus on important
characteristics
of these applications.
001 is descended from an earlier benchmark
for simple database operations and is based on
several years experience with that benchmark.
In this paper we describe the 001 benchmark
and results we obtained running
it on a variety of database systems. We provide a careful
specification
of the benchmark,
show how it can be implemented
on database systems, and
present evidence that more than an order of magnitude
difference in performance
can result
from a DBMS implementation
quite different from current products: minimizing
overhead per
database call, offloading
database server functionality
to workstations,
taking
advantage of
large main memories, and using link-based methods.
and Logic Structures]: Performance
Categories and Subject Descriptors:
B.2.2 [Arithmetic
Analysis and Design Aids; D. 1.5 [Programming
Techniques]:
Object-oriented
Programmin~
measures;
H.2. 1 [Database
ManageD.2.8 [Software
Engineering]:
Metrics–performance
Languages— data dement]: Logical Design— data models; H. 2.3 [Database Management]:
scription
languages
(DDL),
data manipulation
languages (DML), database
(persistent)
programming languages;
H. 2,8 [Database Management]: Database Applications;
K. 6.2 [Management
perforof Computing
and Information
Systems]:
Installation
Management— benchmarks,
mance
and
General
usage
measurement
Terms: Algorithms,
Design, Experimentation,
Languages,
Measurement,
Performance
Additional
Key Words and Phrases: Benchmark,
CAD, CASE, client-server
architecture,
engineering database benchmark, hypermodel, object operations benchmark, object-oriented DBMS’s,
relation of DBMS’s, workstations
1. INTRODUCTION
We
believe
that
the
additional
functionality
that
the
new
generation
of
object-oriented
and extended
relational
DBMSS provide for engineering
applications
such as CASE and CAD is important,
but these new DBMSS will
not be widely
accepted unless a factor of ten to one hundred
times the performance
of traditional
DBMSS
is achieved
for common
operations
that
Authors’ address: Sun Microsystems,
2550 Garcia Ave., Mountain View, CA 94043.
Permission to copy without fee all or part of this material is granted provided that the copies are
not made or distributed
for direct commercial advantage, the ACM copyright notice and the title
of the publication
and its date appear, and notice is given that copying is by permission of the
Association for Computing Machinery.
To copy otherwise, or to republish, requires a fee and/or
specific permission.
@ 1992 ACM 0362-5915/92/0300-0001
$01.50
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992, Pages 1-31.
R G G
2.
Cattell
and J. Skeen
engineering
applications
perform.
Engineering
interest
to us, so we are attempting
to focus
in this
applications
some attention
are of particular
on performance
area.
Measuring
engineering
DBMS performance
in a generic way is very difficult, since every application
has somewhat
different
requirements.
However,
our goal is simply
to measure
common
business applications,
operations
that
provided
by SQL and other high-level
cally utilize
a programming
language
operations
that
differ
cannot be expressed
DMLs.
Engineering
interspersed
with
from
traditional
in the abstractions
applications
typioperations
on indi-
vidual
persistent
data objects. Although
database
access at this
level is in some ways a step backward
from high-level
operations
relations
in modern
DMLs,
it is a step forward
mentations
and is the best solution
As an example,
consider
a CAD
connections
on a circuit
tion algorithm
might
them to reduce wire
their
interdependencies
board
are accessed
current
might
be stored
complex.
a programming
possible
2. ENGINEERING
-
inter-
and an optimiza-
the dependency
graph, examining
plan.
In both cases, hundreds
per second,
The
perhaps
DBMS
language
to efficiently
and their
in a database
executing
the
tional
join for each object.
Simply
adding
transitive
language
is inadequate,
as the steps performed
at
arbitrarily
ad hoc imple
follow connections
between
components
and rearrange
lengths.
In a CASE application,
program
modules
and
might
be stored in a database,
and a system-build
algorithm
might traverse
to construct
a compilation
objects
from
available
at present.
application:
components
fine-grain
on entire
mix
in
must
the
either
provide
database
query
the programming
DATABASE
version numbers
or thousands
of
equivalent
of a rela-
closure
to the query
each junction
can be
the full
language,
and query
functionalist
or it
y of
must
be
languages.
PERFORMANCE
We do not believe
that traditional
measures
of DBMS system performance,
the Wisconsin
benchmarks
[51 and TPC-A
(Transaction
Processing
Performance
Council
[101),
adequately
measure
engineering
application
performance.
The TPC-A
benchmark
is designed
to measure
transaction
throughput
with
large
measures
are relevant
sures
too
at
optimizer
coarse
on complex
numbers
of
to engineering
a grain
queries
and
that
users.
While
some
performance,
focus
are rare
on
the
they
of the
generally
intelligence
in our applications.
Wisconsin
are mea-
of the
Current
query
query
languages
are not capable of expressing
the operations
that our applications
when
this
problem
is remedied,
a benchmark
for the more
complex
require;
queries will be appropriate.
The most accurate
measure
of performance
for engineering
applications
would be to run an actual application,
representing
the data in the manner
best suited
to each potential
DBMS.
However,
we feel it is difficult
or
impossible
to design an application
whose performance
would be representative of many
different
engineering
applications,
and we want
a generic
measure.
Perhaps
in the future
someone
will
be successful
at this more
difficult
task
ACM Transactions
[6, 11].
on Database Systems, Vol
17, No. 1, March 1992
Object Operations Benchmark
Our generic
benchmark
measures
frequent
in engineering
applications,
CAD
engineers
as well
as feedback
are the operations
based on interviews
on our previous
.
3
we expect to be
with
CASE and
paper
[9]. The benchmark
we describe, 001,1 substantially
improves
upon our earlier
work, simplifying
and focusing
the measurement
on important
object operations.
The Hyper
Model benchmark
[1] was also designed to improve
on our earlier benchmark;
however,
measures
Section
-
that work has gone in the direction
of a more comprehensive
set of
rather
than simpler
ones. We contrast
these two benchmarks
in
5.
The 001
benchmark
measures
include
inserting
objects,
looking
up objects, and traversing
connections
between objects. The benchmark
is designed
to be applied
to object-oriented,
systems,
even B-tree packages
tems. Engineering
applications
queries,
so performance
system
some
acceptable.
engineering
orders
of magnitude
applications,
is required,
transaction
There
Also,
relational,
network
or custom
may also
application-specific
database
sysrequire
more complex
or ad hoc
on the
001
benchmark
the 001
mix
of operations
applications.
away
However,
from
acceptable
so we simply
aim
just as the TPC-A
we
or hierarchical
alone
does
is only
believe
performance
database
not
make
a
representative
current
levels
systems
of
are
for engineering
to focus attention
where high performance
benchmark
focuses on commercial
on-line
processing.
is a tremendous
gap between
ory programming
language
structures
in a conventional
the performance
provided
by in-mem-
data structures
and that provided
by disk-based
database management
system. Existing
database
systems typically
respond
to queries
in tenths
of a second. Simple
lookups
using in-memory
structures
can be performed
in microseconds.
This factor of
100,000 difference
in response time is the result of many factors, not simply
disk access. We believe there is a place fbr a database system
gap between
these two systems,
performing
close to 1,000
second on typical
How
this
can much
performance?
engineering
higher
databases.
performance
We believe
that spans the
operations
per
be achieved?
a number
of DBMS
What
do we give up to get
features
used in commer-
cial applications
are not necessary
in the engineering
applications
that
require
high performance.
For example,
we believe that the level of concurrency is quite different
— an engineer
may “check out” part of a design and
work
on it for days, with no requirements
for concurrent
updates
by other
and the data can largely
be cached in main
memory
on the
engineers,
engineer’s
workstation.
Large
improvements
in engineering
database
performance
are probably
not going to be achieved
through
minor
cal representation,
or query languages.
improvements
in data model, physi In our experience,
the substantial
1 Note that our benchmark
was originally
performed in 1988; this paper, in slightly
revised
forms, has been informally
available since then. The current version, containing further updates
and more recent results, supersedes the previous ones. More recent information
on DBMSS for
Addisonengineering
applications
can be found in Cattell’s
book, ObJ’ect Data Management,
Wesley, 1991.
ACM Transactions
on Database Systems, Vol
17, No 1, March 1992.
R. G. G. Cattell and J. Skeen
4.
improvements
efficient
result
from
major
changes
in DBMS
remote
access to data, caching
a large
mismatch”
avoiding
an “impedance
memory,
query languages,
and providing
new
logarithmic
access time.
Differences
We want
tion
database
The following
of the
list-valued
information
fields,
We define
differences,
as long
than
and
as the
DATABASE
a benchmark
the DBMS.
providing
set of data in main
programming
and
access methods
with fixed rather
in data models
(e.g., relational
object oriented)
can be dwarfed
by architectural
data model does not dictate implementation.
3. BENCHMARK
architecture:
working
between
should
independent
therefore
to be stored,
possibly
the database
of the
possibly
as two or more
as two logical
data
be regarded
record
model
as a single
types
provided
as an abstract
object
by
defini-
type
in a relational
with
system.
records:
Part: RECORD[ld: INT, type: STRING [1 01, x, Y: INT, build. DATE]
ConnectIon. RECORD[from” Part-id, to Part-id, type” STRING [1 O], length. INT]
We define
a database
of parts with
unique
id’s 1 through
20,000,
and
60,000 connections,
with exactly
three connections
from each part to other
(randomly
selected)
parts.
The x, y, and length field values
are randomly
distributed
selected
in
from
is randomly
tion
specified
of a date
mum
range
strings
distributed
We assume
types
the
the
that
the
in a 10-year
the
above,
[0..999991,
{ “part-typeO”
database
where
and time,
and
type
have
values
and the
randomly
build date
range.
system
allows
INT k a 32-bit
STRINGIN]
size N. The from and to fields
fields
. . . “part-type9”},
integer,
data
fields
is a variable-length
are references
with
the
scalar
DATE is some representato specific
string
parts;
of maxithey
might
be implemented
by part id foreign
keys in a relational
DBMS,
or by unique
identifiers
in an object-oriented
DBMS. Also, it is permissible
to combine part
and connection
information
in a single
object
if the DBMS
permits
it.
However,
the information
associated with connections
(type and length)
must
be preserved.
Also,
it
must
be possible
direction.
The random
connections
between
parts
ity of reference.
Specifically,
90 percent
to traverse
connections
in
either
are selected to produce some localof the connections
are randomly
selected among the 1 percent
of parts that are “closest,”
and the remaining
connections
are made to any randomly
selected part.
Closeness
is defined
using the parts with the numerically
closest part ids. We use the following
algorithm
to select part connections,
where rand[ 1.. k] is a random
integer
in
the range 1 to k, and N is the number
of parts in the database.
for part in [1
N]
for connechon in [1
3]
if rand[l
10] > 1 then
/* 90% of time create connection
ACM Transactions
on Database Systems, Vol
to the closest 1% of parts * /
17, No 1, March 1992
Object Operations Benchmark
.
5
let cpart = part + rand[l
N/ 100] – 1 – N/200;
I* “double up” at the ends so stay in part id range * /
if cpart < N / 200 then let cpart = cpart + N / 200;
if cpart > N – N / 200 then let cpart = cpart – N / 200;
connect(part, cpart)
else
/“ 10% of time create connection to any part 1.. N * /
let cpart = rand[l
N]
connect(part, cpart)
Using part id’s for locality
this is as good as any other
is quite
criterion.
arbitrary;
in Section 6, we justify
that
The choice of 90 percent
colocality
of
reference
is also somewhat
arbitrary;
however,
we found that unless
is as high as 99 percent or as low as 1 percent it does not substantially
the result
(we return
Our database
space overhead
bles
this
amount.
altogether
set that
to this
issue
in Section
comprises
approximately
for access methods
and
As
such,
the
in main
9).
2 megabytes
of attribute
data,
record storage, which typically
database
and is a good representative
fits entirely
locality
affect
is
approximately
of an engineering
memory—for
example,
plus
dou-
4 megabytes
database
the data
working
associated
with
a CAD drawing
shown on the screen, the data in one engineer’s
portion
of a
larger engineering
design or software
project, or a working
set from a larger
knowledge
base used by an expert system.
A DBMS
must
also scale up to larger
databases
and working
sets. We refer
to the database we just described
as the small database,
and include
a second
database,
the large database,
that is identical
except that the record counts
are scaled up by a factor of ten. This database
then requires
approximately
40 megabytes
a substantial
of storage and overhead.
performance
difference
Some database systems
on the larger
database,
will not show
while
others,
particularly
those that use a main memory cache, will run much
small
database.
Other
systems,
e.g., a persistent
programming
may not even permit
the larger database
size.
As an experiment,
test larger
real-world
our benchmark
first
we also scaled
databases.
because
up to a huge
However,
the results
400 megabyte
we do not require
seem to differ
faster on the
language,
this
less dramatically
database
database
than
to
in
the
two.
We will see that the most important
distinction
between
the databases
is
the different
working
set size, not the need for access methods
that scale up
efficiently
with
larger
data sets on disk. Using
the benchmark
machine
configuration
we define in Section 6, the small database
working
set fits in
main memory,
while the large and huge databases
do not.
We assume that the database itself resides on a server on a network,
while
the engineering
application
runs on a workstation.
We feel this
remote
database
configuration
is the most realistic
representation
of engineering
and
office
databases,
and
is consistent
with
the
move
towards
database
servers
on a network.
Performance
for remote
databases
may be significantly impacted
by the caching
and concurrency
control
architecture
of the
DBMS. We also report performance
for a local database, with the benchmark
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
R. G. G. Cattell and J. Skeen
6.
application
running
on the same machine
where
we do not require
the local database
configuration
the database
is stored, but
for purposes of reproduc-
ing the benchmark.
4. BENCHMARK
We now
and is run
“checking
we require
highly
DBMS
data
describe
the benchmark.
The benchmark
measures
response time
with our model of an engineer
by a single user. 2 This is consistent
out” a segment
of data for exclusive
use; indeed, the performance
may
not
used be locked
each
even
be achievable
following
measure
consistency
or versioned
three
10 times,
and
and caching
any host
the part.
current
technology
if there
Traversal.
Find
to it,
are
measure
to support
our
multiple
benchmark
response
users.
measures.
time
for
each
(3)
Insert.
run
to check
all parts
and
random
part id’s and fetch the corresponding
For each part, call a null procedure
written
in
language,
passing
connected
the
x, y position
to randomly
so on, up to seven
the results
Enter
We
run
selected
hops
(total
and
part,
of 3,280
possible
duplicates).
For each part, call a null programming
procedure
with the value of the x and y fields and the part
measure time for reverse traversal,
swapping
“from”
and “to”
to compare
is
for the
that the
behavior.
programming
connected
in order
operations
Lookup.
Generate
1,000
parts from the database.
(2)
with
concurrent
access by multiple
users. However,
it is important
to allow multiuser
access to the database,
and we do require
The
(1)
MEASURES
type
of
or to a part
parts,
with
language
type. Also
directions,
obtained.
100 parts
and
three
connections
from
each
domly selected parts. Time must be included
to update
access structures
used in the execution
of (1) and (2).
gramming
language
procedure
to obtain
Commit
the changes to the disk.
The benchmark
measures
were selected
operations
in many engineering
applications.
the
x, y position
to other
ran-
indices or other
Call a null profor each insert.
to be representative
of the data
For example,
in a CASE appli-
cation,
an operation
very similar
to the traversal
must be performed
to
determine
a compilation
plan for a system configuration,
an operation
like
the lookup is needed to find programs
associated
with a particular
system or
range
of
information
traversal
dates,
and
an
operation
like
the
a new module
is compiled.
is needed to optimize
a circuit
layout,
after
with particular
board.
types,
and the
2 Response time is the real (wall
database system with a particular
mto the program’s variables.
ACM Transactions
insertion
insert
is
needed
to
enter
new
In an ECAD
application,
the
the lookup to find components
to add new
components
to a circuit
clock) time elapsed from the point where a program calls the
query, until the results of the query, if any, have been placed
on Database Systems, Vol
17, No
1. March 1992
Object Operations Benchmark
.
7
We feel there is some value to a single “overall”
performance
time, as with
the TPC-A
benchmark.
We designed
the benchmark
such that
the three
measures
were
executed
approximately
in proportion
to their
frequency
of
occurrence
in representative
applications
(an order of magnitude
more reads
than writes,
and several times more traversal
operations
than Iookups,
as a
very
rough
a single
However,
estimate
from
interviews
overall
number
we warn against
frequency
of occurrence
with
CASE
can be computed
the abuse of the
of these
operations
and CAD
engineers),
so that
by adding
the three
results.
“overall”
number,
because the
will
differ
widely
between
appli-
cations: the individual
measures
are important.
Note that we require
that the host application
program
be called on each
simple operation
above or that each database
operation
be called from the
host programming
language
to simulate
an engineering
application
where
arbitrary
programming
operations
must be mixed with database
operations.
For the three operations
above, for example,
the corresponding
procedures
might
(1) display
different
paths,
the part
and
on the screen,
(3) compute
the
(2) compute
x, y position
total
wire
lengths
for placing
the
along
part
in a
diagram.
Our requirement
to call the host application
programming
language
may
be controversial:
it could be argued that all of the data should be fetched by a
single query before returning
to the application
program
or that embedded
procedures
or procedural
constructs
in a query
language
should
be used
instead of calling
the application
program
itself with the data. However,
this
requirement
of intermixing
database
access and programming
is one of the
factors that makes our benchmark
interesting.
The restriction
is necessary to
allow
an application
programmer
to take advantage
of the computation
power
available
on workstations
our simple
benchmark—
language
and embedded
“traversal”
would
and to encode
operations
procedures.
probably
not
that
For
operations
cannot
example,
traverse
every
more
complex
node
to a fixed
depth;
application
might
perform
arbitrary
computation
based on previous
visited,
global data structures,
and perhaps
even user input to decide
nodes to traverse
Because
than
be encoded
in the query
in a real application
the
the
parts
which
next.
of our
requirement
for
intermixing
database
and
programming
operations
at a fine grain, we refer to the 001 benchmark
leave to future
research the exploration
of a batch version
as interactive.
We
of 001,
in which
all
lookups
repetitions
of
an
operation,
for
example
the
traversals,
may be executed in a single database
batch versus interactive
issues in Section 7.
1,000
call.
We further
or
3,280
discuss
the
We require
that the benchmark
time with no data cached. These
measures
repetitions
be performed
ten times, the first
provide
a consistency
check on
the
also
the
variation
on the
results
and
show
performance
(on the
second,
third,
and remaining
iterations)
as successively
more data is in memory.
Of
course, there will be some cache hits even on the first iteration,
depending
on
database system parameters
such as page size, clustering,
and caching strat egy. As a result,
a DBMS can achieve a cache hit
as low as 10 percent or as large as 90 percent.
ACM Transactions
ratio
for the first
iteration
on Database Systems, Vol. 17, No. 1, March 1992.
R. G. G. Cattell and J Skeen
8.
We refer
to the results
asymptotic
best times
of the first
(the tenth
iteration
iteration,
as the
when
cold start
results
the cache is fully
and the
initialized)
as the warm start results.
In the systems we measured,
we found the
asymptote
was achieved
in only two or three
iterations,
with
only
(under 10 percent)
random
variations
after that.
warm
small
A different
set of random
parts
is fetched
with
each iteration
of the
measures.
Thus, most or all of the database must be cached in order to obtain
good performance.
The small database
constitutes
a working
set that can fit
in memory,
the
large
The benchmark
the forward
traversal
sal. We leave
database
constitutes
implementor
may
in measure
(2) may
it up to the user to decide
would be acceptable
for their
no access method
for reverse
a working
colocate
parts
set that
go faster
whether
does not.
in the database
than
such that
the reverse
the slower
reverse
travertraversal
application.
It is not acceptable
to implement
traversal
(e. g., do linear
search to find parts
connected to a part); this would take too long for almost any application.
For
example,
in a CASE
application
it must
be possible
to find
dependent
modules as well as modules
dependent
upon the current
module.
Note that the results
for the reverse traversal
may
but a random
part has three parts it is connected
“to,”
connected
to the
“from”
forward
(three
on average).
traversal.
We
However,
found
it
nodes visited
on the reverse
traversal
the
comparable
figures
(i. e., multiply
vary widely,
as each
number
of parts it is
the average
convenient
should
to count
the
and normalize
the result
time
by 3280/N,
where
be similar
number
number
of nodes actually
visited).
It is not acceptable
to take advantage
of the fact that the benchmark
id’s are numbered
sequentially,
computing
disk address directly
from
id—some
sort of hash
index,
B-tree
index,
structure
is necessary for a realistic
Engineering
applications
typically
trol,
but
there
is currently
much
and/or
record
implementation.
require
some form
debate
as to whether
(object)
part
part
ID based link
of concurrency
they
of
to obtain
N is the
need the
conatomic
transactions
of commercial
DBMSS
or whether
some form
of versions
or
locking
plus logging
of updates is better.
We side-step this debate by specifying that the benchmark
numbers
should
be measured
and reported
with
whatever
provides
important
concurrency
more than
to report.
control mechanisms
the DBMS provides.
If the DBMS
one choice, the performance
with
each alternative
is
We leave it to the users to decide whether
the concur-
rency control and recovery
mechanisms
provided,
and their performance
cost,
is adequate
for their application.
Although
we do not specify which
type of concurrency
control
must be
provided,
we do insist
that the DBMS
minimally
guarantee
physical
and
logical
integrity
of the data with
multiple
readers
and writers
on a network with
the concurrency
control
mechanism
(locking,
versions,
transactions) provided.
Furthermore,
the insert
measure
requires
that updates
be
committed
to disk after every 100 parts.
We require
that the benchmark
measures
be performed
on both the small
and large database,
as described
earlier.
The space as well as time requirements must be reported
for the databases.
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992
Object Operations Benchmark
5. BENCHMARK
We made
object
choices
In
contrast
the 001
benchmark.
Earlier
in selecting
this
that
we incorporated
connections
to our earlier
defined
instead
it
two
has
and
been
measures
001
[9].
meaningful
many
for
choices,
and
and the HyperModel
benchmark
The
benchmark
of these
benchmark
of reference
of authors
but
the
in
locality
but
we justify
We believe
than
minor
point,
applications).
We dropped
a simple
section,
benchmark
Benchmark.
realistic
though
9
JUSTIFICATION
many
operations.
more
.
is both
database
itself
simpler
al-
of parts
and
and use a database
documents
a source
(the
database
of confusion
we included
in our
contents
about
original
and
is similar,
our
paper.
are
a
intended
The
range
lookup measure
found the parts in a ten-day
range of “build”
dates. In our
experience
the result of this measure was very similar
to the lookup measure;
in fact,
they
differed
only
when
case the time was dominated
lookup).
We also dropped the
the database
to fetch
the
more
by the
sequential
x, y position
than
a few parts
were
found
(in which
copying
time rather
than the index
scan, which enumerated
all parts in
for each.
We also dropped
the reference
lookup
and group lookup
measure
in the
original
benchmark.
These measures
involved
following
references
(e. g.,
from a connection
record to a part) or back-references
(e.g., from a part to the
connections
that
reference
it). We had hoped
complex
operations,
such as the traversal
we
individual
reference
and group
lookup
times.
specify these measurements
in such a way
consistent
numbers
for their DBMS without
to fetch the record from which the reference
with
the single
trauersal
We considered
found
tant
than
a simple,
important
range
update
for update
we included
only the
Thus, we chose the
insert.
lookup,
lookup,
that different
implementors
including
the set-up time
emanated).
So we replaced
(in addition
and insert
traversal,
sequential
to insert)
to be closely
and
scan, and
focused benchmark.
However,
to obtain for some applications,
insert
HyperModel
Benchmark.
The
operations
update
measures
these additional
so we recommend
HyperModel
of part
correlated,
in a complete
application
study (e. g., [11). In particular,
DBMS
include
a B-Tree
or other access method
for
these occur in many applications.
original
benchmark
ing in 17 benchmark
got
(e. g.,
them
measure.
including
the results
to estimate
the time to do
now propose,
by composing
In practice
it was hard to
records.
however,
as more
We
so
impor-
for purposes
of
results
may be
they be included
it is essential
that a
range lookup
because
Benchmark
[11 added
to our
and database
rather
than simplifying
it, thereby
result measures,
four types of database entities
and four types
of relationships.
Although
the HyperModel
benchmark
is more complex
and
therefore
harder
to implement,
we believe it is a more thorough
measure
of
engineering
database performance
than 001.
The difference
between
001
and HyperModel
might
be likened
to the
difference
between
TPC-A
[101 and the Wisconsin
[51 benchmarks–
our intent
ACM TransactIons on Database Systems, Vol. 17, No. 1, March 1992.
10
R. G. G. Cattell and J. Skeen
.
is to focus
HyperModel
believe
on overall
performance
benchmark
provides
there
is a place
for
both
for a few important
operations,
while
many
different
measures.
As such,
the
HyperModel
benchmark
and
the
we
001
in
comparing
systems. It is easy to produce 001 results on any DBMS for which
the HyperModel
measures
have been implemented
because our database
and
measures
are more or less a subset of theirs (mapping
“nodes”
to parts, and
the “MtoN”
relationship
We do not enumerate
to connections).
all of the differences
in the HyperModel
benchmark,
but we note the following:
(1) The HyperModel
jects, including
tionship.
database
includes
multiple
relationships
between
oba one-to-many
subpart
relationship,
and a subtype
rela-
It also includes
a “blob”
field
that
stores
a multi-kilobyte
(2) The EIyperModel
benchmark
includes
the same six measures
original
paper, plus some new measures:
a traversal
measure
(they retain
the group
and reference
measures
rather
than
them), and read/write
operations
on the “blob”
field.
These
additional
However,
extensions
to the
those we have included,
“blob”
operations
are
familiar
original
benchmark
data about a DBMS and we think
we feel that all of these measures
with—blob
so we have
not DBMS
operations
all
useful
in
from our
like ours
replacing
providing
the choice of additions
is good.
are of secondary
importance
to
retained
only our simpler
measures.
The
bottlenecks
in the applications
we are
are typically
include
only small STRING
and INT
relationships
do seem to represent
plausible
that they all measure
the
methods
in the database;
perhaps
benchmark
will show whether
their
In order to make the HyperModel
are
value.
limited
by disk
speed.
Thus
we
fields in our database.
The additional
a more realistic
database,
but it is
same kinds
of operations
and access
future
experience
with
the HyperModel
traversal
times are directly
correlated.
benchmark
more useful, we would like to
see a few changes in its definition.
Some work may already
be underway
these changes at the time of this writing.
First, we feel it is important
to precisely
specify and control
the details
on
of
implementation
so that the benchmark
results
can be accurately
compared.
These details include
those we cover in the next section: the amount
of main
memory
dedicated
to the DBMS
and cache, the processor
and disk performance, and control of initialization
timing
and cache.
Second, since the HyperModel
benchmark
is intended
to be more complete
databases,
it should
include
some representain its coverage of engineering
CAD,
CASE,
and other
tion
of versions,
a concept
important
in
many
engineering
applications.
Versions
and the concurrency
model
should
be
tested in the context of a multiuser
benchmark.
Third,
the HyperModel
benchmark
introduces
multiple
connection
types
but defines no benchmark
measure
or guidelines
for how the data should
be clustered
according
complex data structures
to the various
connections.
are not very useful.
ACM TransactIons on Database Systems, Vol
17, No 1, March 1992
Without
this,
the
more
Object Operations Benchmark
Finally,
and
benchmark
most
be
database
is more
Summary.
earlier
We have
and
users.
cascaded
list
evolution,
utility
harder
and
“range”
order
that
made
on our
a list
referees
and
the
after
of suggesbenchmark
composite
multiuser
we removed
remote
provides
inheritance,
versions,
except
to
a good
where
reproduce
and
more
objects
measures,
our first
we felt
with
schema
paper.
as
the
it essential.
be precisely
detailed
comparable
DBMS
on
a
we used
the database
issues
results,
similar
particulars
of the
should be reported.
benchmarks,
where
benchmark
we clarify
number
of pertinent
implementation
that
our
of suggestions
in itself
from
A
applications.
ON BENCHMARKS
benchmarks
For
queries
HyperModel
benchmark
focused on basic performance:
they make it
and reproduce.
We have therefore
resisted
the tempta-
therefore,
machine
operations,
the benchmark
is essential
In
ideas
fields,
the
11
that all of these features
belong in a complete
benchmark,
in the HyperModel
critique
above, but that they reduce
to expand
section,
and office
work
that
architecture.
to a number
other
“blob”
other
of a simple
to implement
6. NOTES
It
explored
and the
We believe
we discussed
responded
it essential
client/server
of engineering
includes
delete
a
The HyperModel
we have
The
we feel
using
realistic
benchmark.
tions,
tion
importantly,
performed
.
the
reproducible.
is
necessary
configuration.
it
Table
DBMS
architecture
following
In
this
to
run
the
I
shows
on its implementation.
and
configuration
on the
server
resides:
Sun 3/280 (Motorola 68020, 4 MIPs) with 16 megabytes memory
SMD-4 disk controller (2.4 MB/see transfer rate)
2 Hitachi disks, 892 MB each, 15-ms avg. seek time; 1 disk used for database,
for system
Sun UNIX 0/S 4.0.3
The
client
workstation,
was a diskfull
892 MB disks
Both
the client
for benchmarking
was generous,
benchmark
where
and
server
during
and was
program
to swap
machines,
benchmark
the
remote
benchmark
programs,
machines
were
stand
alones
reserved
exclusively
the time of our runs. Swap space on both machines
not an issue in any event,
as we only ran the
process.
measures,
and that any schema or system
tion
time
is included
in the
client
we ran
1
Sun 3/260 with 8 MB memory,
SMD-4 disk controller,
and two
(15-ms avg. seek time). This system ran Sun UNIX
0/S 4.0.3.
In all of our benchmark
permitted
a
benchmark
in as much
so that
code
we assume
information
database
of the
swap
DBMS
time
that
the DBMS
is already
open. The
code as desired
will
not
be
is initialized
cached. The initializadatabase
open call is
counted
on the
in
server
the
or
actual
measures.
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992.
12
.
R. G. G Cattell and J Skeen
Table I.
Reportable
Information
Types of reportable
Area
Hardware
Software
DBMS
“wall-clock”
also useful;
We want
information
CPU type, amount of memory, controller, disk type/size
0/S version, size of cache
Transaction
properties
(whether
atomic, level of supported concur.
rency, level of read consistency, lock modes used, etc, ); recovery and
logging
properties;
size of database cache; process architecture
(comrnumcation
protocol, number of processes involved, etc ); security features (or lack thereof); network protocols; access methods
used
Any discrepancies
from the implementation
described here; real
Benchmark
ments
Benchmark
to include
any
running time 3, CPU time and disk utilization
size of database
disk
for our benchmarks.
1/0
overhead
On the
other
in the response
hand,
many
time are
time
database
measure-
systems
can
validly
obtain
better
performance
for engineering
applications
by caching
data between
the multiple
lookups
performed
within
one transaction
or
session.
We allowed
specifically
for repetitions
in the measures,
to allow
caching,
and we also report
may be cached
at the time
database
be closed
empty
must
the
necessary
both
major
the
cache.
to run
client
difference
command
megabytes,
Since
a program
and
in
server
the
the
asymptotic
warm
the benchmark
and
reopened
before
the
operating
system
to clear
each
all database
machines.
results:
we
results.
is started
This
may
file
is a subtle
suggest
running
for the
cache
pages
point
order
of executing
a specific
file
pages,
We allow
the client
workstation
to cache as much
workstation
about 5MB
that
is
on
can make
file,
at least
measure
of the database
a
or system
20
for ten
report
as it can,
memory.
For most DBMS
and 0/S
for a cache (the cache may be in the 0/S
3 The real-time clock on Sun machines “ticks”
at a jyanularlty
accurate enough for the measurements
we require
ACM TransactIons
to
it
out of memory
= Exit the previous benchmark measure, if any
= Clear the operating system cache
= Begin execution of the new benchmark measure
Open the database
Start timer
Report cold time for the first benchmark (when cache is being filled)
Include time for transaction checkpoint for ‘(insert” measure
Continue execuhon for remaming 9 loops
If clear asymptote emerges, report It as WARM time; otherwise
average of the 9 runs
Stop timer
Close the database
= Exit the benchmark
= For the insert measure, restore the database to its original state
given
8MB
of physical
working
sets, this leaves
The
measure,
a program
benchmark
no data
cold results.
benchmark
that sequentially
reads all the pages of a large
to fill the system cache with other data.
Thus, the general
loops is:
However,
on Database Systems, Vol. 17, No. 1, March 1992
of 1/60 of a second, which
is
Object Operations Benchmark
or DBMS,
available
or both). We conceive this amount
for application
use on contemporary
increase
as technology
advances
and
.
13
of memory
as typical
of what is
workstations;
it will no doubt
manufacturing
techniques
improve.
However,
the amount
of memory
used by applications
and other software
will
also increase,
so approximately
5MB of cache may be realistic
for some time
to come.
For our
For our
than
small
large
database,
and
the small
huge
the entire
database,
space required
database
there
to hold
typically
is little
root
fits
advantage
in main
storage.
to a cache
pages for indices,
system
larger
informa-
tion, data schema, and similar
information.
However,
for the large database,
a significant
portion
of the Part table may fit into memory
for the lookup
measure.
On the server,
the memory
used by 0/S,
DBMS,
and cache should
be
limited
that
to 16 MB.
server
cache
If it is not
our implementation,
configuration.
There are several
Generating
60,000
IDs.
associated
respectively,
possible
size be limited
since
we removed
issues
memory
the
connections;
large
small
these
database.
sive records are given successive
to select a random
part, which
total
instead.
in the generation
We populate
for the
to limit
to 12MB
memory,
This
boards
we recommend
was not necessary
to produce
of the benchmark
database
numbers
When
database:
20,000
are 200,000
the
ID numbers.
is guaranteed
with
parts
in
the proper
parts
and
are loaded,
and
600,000,
succes-
This allows the benchmark
to exist, by calculating
a
random
integer.
We choose indices or other physical
structures
to achieve
the best performance.
The physical
structure
may not be changed
between benchmarks
(i.e., the same index or link
overhead
must be included in the database
size, update time, and read time).
Connections.
“from”
field
There should be exactly
three logical
is set to each part, and the “to” fields
should be randomly
selected
connections
to the same “to”
“from”
are
acceptable,
using the algorithm
part, or a connection
as long
as they
arise
“connections”
whose
for these connections
in Section 3. Multiple
“to” the same part as
through
normal
random
assignment.
references;
It is not acceptable
to use a fixed-size
array for the “from”
the schema must work with any possible random
database.
Clustering.
according
We allow the connections
to be clustered
in physical
storage
to the parts they connect. They may also be clustered
with the
parts themselves,
if this gives better performance.
We have made it easy
for the benchmark
implementor
to do this collocation,
because part connections
have colocality
based on part id: the best way to group parts is
sequentially
by part id. In fact, when we
thought
we had made this
too easy, that
designed
we gave
the benchmark
no advantage
we
to a
DBMS
that
could automatically
colocate
by connections.
However,
we
realized
that
any benchmark
implementor
could build
the equivalent
ordering
from any underlying
collocation
mechanism
we devise and group
may seem
the parts accordingly.
So while
using part id’s for collocation
odd, it is not unrealistic
or unfair,
as we see it.
ACM Transactions
on Database Systems, Vol. 17, No. 1, March 1992.
14
R. G G. Cattell and J Skeen
.
As mentioned
earlier,
Id’s in a way that would
example,
using
permissible
them
to
use
it is not
not work
to index
physical
permissible
to exploit
the consecutive
part
with a more sparse assignment
of id’s—for
into
an
pointers
array
of addresses.
instead
of part
However,
ids to
it
represent
is
the
connections,
as long as the benchmarks
are faithfully
reproduced
(e. g., the
parts are actually
fetched in the traversal
benchmark).
In the insertion
benchmark
(3), the 100 parts and 300 connections
we
create are added to the original
database,
i.e., there will be a total of 20,100
parts in the database
after the first execution
of the benchmark.
The three
connections
for each of these parts are selected using the same algorithm
as
the others,
i.e., they are 90 percent
likely
to be to one of the parts with
largest
part id’s, 10 percent
likely
to be to any existing
part.
The newly
inserted
parts
and
connections
should
actually
be deleted
in order
to make
the insert measure
idempotent,
i.e., so that there are not 20,200 parts after
two executions.
However,
in our measurements
we found the results were not
noticeably
larger
with
21,000
parts,
so ten iterations
could probably
be
executed together.
Network
load between
the workstation
and server
machines
should
be
minimized,
though
in our experience
this had no measurable
effect on the
results.
This result may initially
be surprising,
but is consistent
with experience
at
Sun—the
workstations
Be careful
vide
Ethernet
randomizing
utilities
bits. We implemented
Miller
[71.
7. NOTES
We
rarely
with
close
generation;
nonrandom
the random
to
number
saturation
with
dramatically
some operating
characteristics
generator
in
three
their
different
overall
DBMS
design,
include
with
100
systems
pro-
in the
suggested
products.
One system
the balance
it to show
what
The
implementation
control,
and other
“traditional”
DBMS
properties,
something
about architectural
performance
factors
now review specifics of each product.
INDEX.
Throughout
even
low-order
by Park
and
ON DBMSS
experimented
quite
is
on a single wire.
with random
number
products
so we hoped to learn
by comparing
them.
We
is a B-tree
package
we had available
at
of the paper,
we label this system
INDEX.
can be done
under
ideal
vary
of concurrency
conditions,
using
Sun.
We
a highly
optimized
access method with no high-level
semantics.
We implemented
the benchmark
database
with two record files, one for
parts and one for connections.
We created a B-tree on the part id’s in the parts
file. We used this B-tree to perform
the lookup operation.
In addition
to B-trees, the INDEX
package provides
efficient
lookup operations on record ids; record ids consist of a physical
page number
plus
number
on a page. For the traversal
measures,
we used the record ids
parent-child
links as in System R [31. Our connection
records contained
type and length
attributes,
plus four record id attributes:
two containing
record ids of the “from”
and “to” parts it connects (its “parent”
records),
ACM Transactions
on Database Systems, Vol
17, No 1, March 1992
slot
for
the
the
and
Object Operations Benchmark
two
used
“from”
to circularly
and “to”
id, type,
link
parts
x, y, and
together
(the
“child”
all
of the
records).
build attributes,
plus
connection
The part
two
records
records
attributes
.
on parts.
of B-trees
against
for
the
contained
containing
the
ids for the first “from”
and “to” connection
records for the part,
Thus our INDEX
version
of the benchmark
database
consisted
file, a connection
file interconnected
to the parts file with record
B-tree file used for the lookup
To compare the performance
15
parent-child
the
record
of a parts
ids, and a
links,
we also
tried performing
the traversal
measure
using B-trees: we created two more
B-trees, one on the “from”
field and one on the “to” field in the connections
file.
Because
approach,
B-tree
the parent-child
we report
results
creation
are shown
time,
the B-trees
the
link
results
implementation
from
the
in the following
marked
with
“*”
and links.
The
insert
section.
in our tables,
times
was faster
former
in
Note:
reflects
reflect
the
only
than
next
the B-tree
section.
the overall
the time
the
The
database
to build
both
access method
used.
The INDEX
package works over a network,
and is designed to be particularly fast when the data can be cached in workstation
main memory,
treating
read-only
data
utilizing
a novel
network
lock
service
to control
access.
The
INDEX
package
caches an 8K contiguous
page of records on each remote
procedure
call. The package provides
update record and table level locks, but
no update
level
logging
locks.
are native
duty
for
concurrency
Multistatement
security
of the
features.
applications
are linked
programmer
with
from disk to the applications’s
the lock manager
only, and
required
for the table
ture even in the
System
(NFS) to
products
use
for every
memory.
The
data
INDEX
data
the
and recovery.
transactions
Of course,
access paths, as described.
INDEX
uses a single-process
libraries
control
atomic
there
is no query
to compute
program,
yet
optimizer;
program
That
and
the
data
table
supported,
nor
it is the
the
fastest
is, the INDEX
is read
directly
virtual
memory.
A separate process is used for
only one interprocess
remote
procedure
call is
lock used in the benchmark.
a client/server
system,
and
access architecture.
application
remote
database
access database
access,
We used
are not
copying
although
case,
pages.
architecture,
data
from
powerful
INDEX
by utilizing
In contrast,
requiring
the
DBMS
within
its
uses this
architec-
Sun’s Network
File
most UNIX
DBMS
an interprocess
call
buffer
intended
to application
application
domain,
is not strictly
comparable
to full-function
DBMSS. In an application
for which a query language
or transaction
semantics
are essential,
it is not
possible to use the INDEX
package.
For other applications,
gross-level
locks
and direct use of access methods
may be quite adequate.
The INDEX
are interesting
because
they represent
the best performance
that
conceivably
be obtained,
made negligible.
in the case where
the cost of high-level
OQBMS.
A second system we used is a prerelease
mercial
object-oriented
DBMS.
We believe
this system
of the current
state of the industry:
it supports
objects,
results
might
semantics
is
(“beta”)
of a comto be representative
classes, inheritance,
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
16
R. G G. Cattell and J. Skeen
.
persistent
tions
storage,
for
multiuser
concurrency
sharing
and
object-oriented
products
should not be considered
hash index
We
call
remote
this
are new to the market,
indicative
of what will
an object-oriented
system.
We implemented
the 00DBMS
taining
variable-size
arrays with
reverse
contain
of objects,
recovery.
benchmark
with
an object
the connection
information.
on the part id, used for the lookup
to the
physical
00DBMS.
page
and
measure.
offset
Object
OIDS to memory
address
is very
requiring
only table lookups.
The type
declaration
for
references
something
like
this
parts
and
when
the
(and
as object identifiers
that
bits; the OID can normally
of the
fast
here
with
type Part conWe created
a
referenced
object
extra disk accesses. The way we implemented
the benchmark,
automatically
swizzled into pointers
by the 00DBMS
the first
is used, as in some object-oriented
DBMS products.
However,
from
Since
the numbers
reported
eventually
be achieved
references)
are stored by the 00DBMS
a logical page address in the upper-order
be mapped
access, and transac-
system
the
object
associated
with
no
OIDS are not
time an object
the mapping
has been
cached,
connections
looks
in the 00 DBMS:4
Part. {
id: INT,
type” STRING[l O],
x, y INT,
build: DATE,
to. LIST OF {p: Part, type: STRINGII O], length. INT}
from. LIST OF Part}
Like
the INDEX
architecture:
tion.
tion
the
In addition,
programming
programmer
the
used a single-process
directly
with
the
data
benchmark
access
applica-
the 00DBMS
provides
close integration
with the applicalanguage,
C++
in this case. Part objects appear to the
transaction
identical
into
00DBMS
is bound
as conventional
and support
is essentially
be paged
package,
00DBMS
C++
objects,
except,
that
they
semantics.
The disk representation
to the in-memory
representation,
application
memory
with
a minimum
are persistent
of a C++
object
so that objects can
of overhead.
We used the transaction
mechanism
provided
by the
mented using shadow pages and journalling.
Transaction
00DBMS,
rollback
impleand roll-
forward
are supported.
The 00DBMS
system works remotely
over a network,
fetching
pages rather
than objects, and caches pages on the workstation.
A
variable
page size was supported;
we used 2K pages for the results
shown
here, but the results
did not differ much with lK or 4K pages. Objects may
be grouped
we put all
is being
in segments,
of the parts
implemented;
with locking
at a segment level. For the benchmark,
and connections
in one segment.
Page-level
locking
it
4 The syntax has been simplified
ACM Transactions
will
be informative
to run
for readability
on Database Systems, Vol
17, No 1, March 1992
the
benchmark
with
Object Operations Benchmark
these
finer-grain
locks.
We
predict
there
will
not
be substantially
performance,
because the implementation
will use a page
lock calls can be piggy-backed
on the page reads.
RDBMS.
production
.
17
slower
server
in which
The final
system we used, named
RDBMS,
is a UNIX-based
release of a commercial
relational
DBMS.
The benchmark
meas-
ures were implemented
in C with SQL calls for the lookup,
traversal,
and
insert operations.
The RDBMS,
like most current
relational
DBMS products,
is implemented
using
a client/server
architecture,
i .e., each
database
call
from
the application
requires
an interprocess
call to the DBMS
server.
Queries
were precompiled
wherever
possible,
thus bypassing
most of the
usual query parsing
and optimizations,
or at least incurring
these costs only
on the first execution.
We set up two tables,
tively.
A single
Part and Connection,
secondary
index
indexed
was also created,
Connection
table; this permitted
reasonable
traversal
measure.
We used B-trees
as the
tables;
the
RDBMS
did
not
provide
on id and from,
on the
respec-
to attribute
of the
response
times on the reverse
storage
structure
for all three
parent-child
links
or
another
more
efficient
access method
for the traversal
measure.
We used 100 percent
fill
factors on both index and leaf pages, and the page size for both indexes and
data was 2K.
The RDBMS
provided
a relatively
full
set of DBMS
features:
atomic
transactions,
full concurrency
support,
network
architecture
(TCP/IP
for our
network
lookup
read
benchmarks),
and traversal
locks,
features
and
turned
engineering
We refer
and
support
benchmarks,
we measured
off.
the
We believe
for
journalling
implementation,
recovery.
the
system
roll-forward
these
with
settings
interaction
that we envision.
to our implementation
of the
“straightforward”
and
we programmed
in
the
the
make
For
to not
on the
that
we
model
RDBMS
used
the
obvious implementation
we would expect an application
programmer
In particular,
there are two important
limitations
of our implementation
may or may
programmers
not be justified,
involved:
(1) The RDBMS
and INDEX.
depending
on the
application
provides
no way to cache data locally
We considered
implementing
a cache
the
place
journalling
sense for the
benchmark
sense
system
of
as a
most
to try.
that
environment
and
as in the OODBMS
ourselves,
as a layer
between the RDBMS
and the application.
However,
in so doing we would
effectively
have to duplicate
much of the functionality
of the DBMS:
index lookups by part id, connection
links between
parts, and so on. In a
real application,
which presumably
requires
much more than the three
operations
in the benchmark,
the application
programmer
would have to
implement
even more — in the worst case duplicating
most of the DBMS
on top of the application
cache: access methods,
query language,
referential
integrity,
special
types such as blobs, concurrency
nisms to keep the cache
application
programmers
versions,
mecha-
consistent
in a multiuser
context,
and so on. If
are to convert CAD or CASE application
to use
ACM Transactions on Database Systems, Vol. 17, No, 1, March 1992.
18
R. G G. Cattell and J. Skeen
.
a DBMS,
work
on
acceptable
they would
in our opinion,
demand
cached
data.
However,
a cache in
to some users.
that the DBMS
the application
(2) DBMS calls are invoked
many times for each measure,
mark requires
that we mix C and database
operations,
uses a client/server
architecture.
traversal
as a single
operations
DBMS
calls,
traversal
instead
in just
We considered
database
of 3280
seven
calls.
database
call,
We
calls,
since the benchand the RDBMS
ways
to execute
or as a smaller
could
returning
perform
the
product
may be
many
number
of
a breadth-first
3 N parts
at the
Nth
stage of the expansion
to the application
program.
However,
this might
not be very efficient,
since the 3 N parts must be passed back again in the
next query step, or a temporary
table must be used to retain
the parts on
the server,
invoking
substantial
update
overhead
(writes
are many
times
more expensive
than reads).
A relational
DBMS
with
procedural
constructs
in SQL and the ability
to link
application
procedures
with the
DBMS
could be used to perform
the entire
traversal
as a single DBMS
call that performs
the required
null application
procedure
calls on the
server.
On
interactive
tion
the
other
benchmark,
would
not
execute
hand,
on the
dure have access to global
workstation.
In
any
case,
we
left
this
approach
as defined
in Section
workstation,
variables,
these
more
violates
the
4, since much
nor
the screen,
would
or other
sophisticated
the
intent
of the
of the applicaserver
proce-
resources
on the
implementations,
and
arguments
about
whether
engineering
applications
could realistically
be
decomposed
in these ways, to future
research.
At the least, we believe
our
is representative
of what
a general
straightforward
implementation
industry
user would expect of a DBMS.
8. RESULTS
Table II shows the most important
(i.e., 20,000 parts accessed over the
performed
the benchmark
measures
out any cached data in the operating
The table also shows the warm
after
ngs
running
of the
the
cold
benchmark
and warm
results,
for the small
remote database
network).
As per our specifications,
we
cold, after running
a program
to clear
system page cache as well as the DBMS.
case, the asymptotic
results
we obtained
ten times,
as specified. 5 The relative
weighiresults
in choosing
a database
system
are
application-dependent.
Most of the applications
we encounter
would
behavior
between
these two extremes.
The INDEX
and 00DBMS
results
are clearly
fastest;
both come
to
our
goal
of ten
seconds
for
executing
each
cold
measure.
Though
have
close
the
5 These asymptotic results were generally
reached very quickly: with INDEX
and 00 DBMS,
where entire pages are added to the local cache with each data operation, we found that nearly
the entire small database M in the cache after the first iteration
of the lookup or traversal
measures
ACM TransactIons
on Database Systems, Vol
17, No 1, March 1992
Object Operations Benchmark
Table II.
Benchmark
Measure
Traversal
Insert
Total
cold
cold
warm
cold
warm
cold
warm
23
7.6
2.4
17
8.4
8.2
7.5
cold
warm
33
18
Results in seconds unless otherwise
results
are close, INDEX
wins
3.7MB
of a product
with
4.5MB
267
370
22
95
20
29
1.0
19
94
84
17
1.2
3.6
2.9
20
20
41
5
143
123
noted
on the
cold
substantial
margin
on the warm results.
The 00DBMS
results
are surprisingly
version
RDBMS
OODBMS
3.3MB
11OO*
19
Database
Remote
INDEX
Cache
DB size
Load (local)
Reverse traverse
Lookup
Note:
Results for Small
.
little
results.
good,
The
00DBMS
considering
performance
tuning.
wins
this
The
by a
is an early
00DBMS
uses
efficient
access methods
(links
for traversal,
B-trees for lookup),
minimum
concurrency
overhead
(table locking),
no overhead
for higher-level
semantics,
a local cache, and no interprocess
communication
to make database
calls. We
speculate
that
efficient
00DBMS
does so well
representation
to find
on the
objects
in
warm
results
memory
because
(different
it uses an
from
the
disk
representation).
Note
on the
that
the only
cold results
learned
from
measure
is lookup.
discussions
on which
There
with
the
00DBMS
is no obvious
the
implementors
that
lease had a poor hash index implementation.
Although
the lack of more complex concurrency
in INDEX
esting
makes
it inapplicable
in demonstrating
access methods
the
of performance
directly.
The
for this,
00DBMS
and higher-level
to some applications,
the kind
can be applied
does not beat
explanation
the
that
overall
section),
in addition
The INDEX
to the
access methods
to cache the database
The RDBMS
times
parent-child
links
used are similar
on the workstation.
are slowest,
with
reflect the overhead
of accessing the
by our straightforward
benchmark
for the
are inter-
are quite
total
where
good.
time (marked
(results,
next
connection
to the 00DBMS,
an overall
prere-
can be obtained
The load time is much higher
than the other systems, but that
the overhead
to redundantly
create B-trees
with “*”) included
but we
semantics
results
results
INDEX
traversals.
as is its ability
of 143.
The
results
DBMS for each operation
as demanded
implementation,
approximately
5,000
calls altogether.
At 20 ms for a roundtrip
network
call, this
overhead
amounts
for most of the time.
As we noted earlier,
these results
are not
inherent
in the relational
data model, only in the database
server architecture and programming
language/database
environment
boundary
overhead.
Both
the 00DBMS
and INDEX
come close to our goal of 1,000 read
operations
per second.
Ideally,
we would
ACM Transactions
like
all three
measurements,
lookup,
on Database Systems, Vol. 17, No. 1, March 1992.
20
R. G. G. Cattell and J, Skeen
.
traversal,
interaction
and insert,
in the 1-10
performance.
Generally
an upper
limit
for performance
second range for good levels of human
speaking,
1,000 operations
per second is
on a database
even with a 90 percent
cache hit rate,
more than
10 ms to fetch data from
perform
much
traversal
better
than
benchmarks
a 1-10
(1,000
second
lookups,
that
does not
since good disk
the disk.
Thus,
range
3,000
fit
in memory,
technology
we cannot
on the
traversals)
cold
requires
hope to
lookup
on current
and
technol-
ogy
Table
Now,
III
none
shows
the results
of the products
when
we ran
can make
much
against
a large
advantage
remote
of a local
database.
cache.
Note that now there is much less difference
between
the three systems on
either the cold or warm results.
Nearly
all of the results
are within
a factor
of two. Furthermore,
there is less speedup in the warm cases.
We believe that most of the
are due to the larger
working
speed decrease in the larger
set, i.e., that the database
database
results
can no longer be
cached, and not to scaling
up of access methods
with database
size. Indeed,
the link access methods
used for traversal
in INDEX
and 00DBMS
should
require
only one disk access regardless
of database
size, unlike
the indexes
used for the lookup.
The most
significant
is on the traversal:
RDBMS.
benefits
00DBMS
Since
differences
the INDEX
the locality
between
systems
and 00DBMS
of reference
on the
results
inherent
large
are faster
in the traversal
remote
than
case
for the
can provide
even when the database
cannot
be fully
cached, and INDEX
utilize
links instead of index lookups to traverse
connections,
and
these
differences
do not surprise
us.
As an experiment,
we ran the results for the huge remote database as well.
Because the time to create the database
is prohibitive,
we did this for the
INDEX
package
The overall
only.
results
case, and over four
The results
are shown
are over twice
times
as slow
as slow
in the
in Table
as the
warm
IV.
large
database
case. Again,
in the
we think
cold
most
of
this result is due to inability
to cache the larger working
set. Note that there
times—a
fact that
is now essentially
no difference
between
cold and warm
supports
this hypothesis.
With a O percent cache hit rate, 20-25
ms to read data off the disk, 15-20
ms to fetch a page over the network,
one page access for the traversal
operation,
and two page accesses for the lookup
(one for the index,
one to
fetch the data fields),
the lookup
measure
would
take 80 seconds and the
traversal
about 120 seconds. Note that this is very close to the results
we
actually
obtained
for the huge database,
This suggests that the results might
not get too much worse with a “very
huge” database,
but we do not have the
patience
to try
that!
9. VARIATIONS
In this section we consider
some variations
of the benchmark
configuration,
beyond the results
required
by the benchmark
definition,
to better
understand the DBMSS we compared.
ACM Transactions
on Database Systems, Vol. 17, No, 1, March 1992
Object Operahons Benchmark
Table III.
Benchmark
Measure
Cache
DB size
Load (local)
Reverse traverse
Lookup
Insert
Total
cold
cold
warm
cold
warm
cold
warm
100
cold
warm
123
77
Database
00DBMS
RDBMS
37.OMB
6000
84
49
43
68
59
10
7.5
47
18
56
41
20
18
Benchmark
Measure
DB size
Load
Lookup
Traversal
Insert
Total
44.6MB
4500
212
49
24
135
107
24
24
127
110
208
155
Results
Results for Huge
Remote
Database
Cache
INDEX
cold
warm
cold
warm
cold
330MB
78000 +
109
106
143
142
53
warm
cold
warm
53
305
301
in seconds
Table V shows the results
for a local database,
i.e., a database
stored
disks attached
directly
to the user’s machine.
These results are important
some engineering
applications,
remote data access.
These
much
21
Results in seconds
Table IV.
Note:
Remote
INDEX
33.lMB
16400*
Traversal
Note:
Results for Large
.
results
better
than
even
though
surprised
us; we expected
they
especially
were,
we
claim
the results
most
in the
will
local
on
for
require
case to be
for the RDBMS.
The RDBMS
results
were essentially
the same as the
expected them to be better because it is no longer necessary
remote case. We
to make a remote
call to the DBMS for every operation.
However,
the network
call may not be
the largest portion
of the overhead
in an application
DBMS call. Even on one
machine
this RDBMS,
as with
nearly
all other RDBMS
products,
requires
communication
between
an application
process and the DBMS process, and
copying
data between
the two. There is generally
significant
overhead
in
such an interprocess
call. In addition,
the benchmark
application
is compet ing for resources on the server machine
in the local case.
ACM Transactions
on Database Systems, Vol
17, No. 1, March 1992.
22
R. G. G Catteli and J. Skeen
.
Table V.
Measure
Lookup
Traversal
Insert
Total
Benchmark
Results for Local Databases (Cold)
RDBMS
Size
INDEX
00DBMS
small
large
small
large
small
large
5.4
24
13
32
74
15
12.9
32
98
37
1,5
3.6
2’7
44
90
124
22
28
small
large
26
71
24
73
139
196
Note: Results in seconds
We also guessed that the INDEX
and 00DBMS
results
would be much
better than for the remote database,
but they are only 30-40 percent faster.
This illustrates
that remote access to files is not much more expensive
than
local access. We performed
some rough measurements
running
an application reading
pages locally
and remotely
through
NFS, and found only a 30
measurement
does not
percent
difference
in speed. Of course, this rough
include time for locking
or to flush pages back remotely
on the insert results.
Note that 00DBMS
beats INDEX
by a small margin
in the small local case,
though
it lost in the small remote
minor differences
in implementation.
In
order
to better
tional
experiments.
results
were good,
understand
case; this
benchmark
switch
is probably
performance,
We used INDEX
for these
it was easy to work with,
we did
experiments
and we had
a result
two
of
addi-
since its overall
the source code
available
so that we could be certain
of its internal
architecture.
First, we examined
the effects of the locality
of reference
that exists in the
connections
between
parts, to see its effects on the the traversal
and other
measures.
Table
VI shows
these
results.
Note that the result for the traversal
(24 in Table VI versus 17 seconds
Table II, for the cold case) and insert (36 versus 8 seconds) differ significantly
without
locality
of reference
in
the
small
remote
database.
The
in
traversal
benefits
from the likelihood
that connected
parts are frequently
nearby, with
the page-level
caching provided
by INDEX;
the insert benefits
because it also
must fetch connected
parts to create the parent-child
links that we use for
traversal.
Since the results
without
locality
differ
so much, the addition
of
reference
locality
to our benchmark
would seem to be an important
one.
Note there is little
difference
on the lookup
numbers;
this is as expected,
since
little
the lookups
are randomly
distributed
difference
in the case of the warm
since
the
entire
database
is cached
over all the parts. There is also
numbers:
this is not surprising,
in memory
at this
point
and
locality
is
irrelevant.
On the large or huge database,
we would
expect there to be similar
or
larger differences
without
locality
of reference,
though
we did not test these
cases. Note that the large and huge database
results
are worse for at least
two reasons: the database
does not fit in memory
and (in some cases) access
ACM Transactions
on Database Systems, Vol
17, No. 1, March 1992
Object Operations Benchmark
Table VI.
Small
Database Without
Remote
Measure
DB size
23
former
is
of Reference
INDEX
Cache
5 .OMB
Lookup
Traversal
Insert
Total
Note:
Locality
.
cold
warm
cold
warm
cold
warm
7.9
2.4
24
7.9
36
10
cold
68
warm
20
Results in seconds
structures
such
substantially
as
B-trees
affected
take
longer
to
traverse.
Only
the
by locality.
We were curious
as to the
links to perform
the traversal
effect of using B-trees instead
of parent-child
measure.
Intuitively,
we expect the links to be
faster
the
since
they
can
obtain
connection
records
and
part
records
in
a
single disk access (or a few memory
references,
if cached), while the B-trees
require
at least one more page access and significantly
more computation
(to
compare keys).
B-Trees on the
traversal.
As expected,
about
twice
(the time
connected
In Table VII, we show the results we found when we created
“from”
and “to” fields of connections
and used them for the
the traversal
as fast
for inserts
parts
into
that
DBMSS,
require
links
result,
B-trees,
traversal
The
slower:
insert
connection
and create
not affected.
This is a significant
relational
is significantly
as B-trees.
time
B-trees
to them)
as it shows
is comparable
that
objects.
than
links
to find
are of course
access method
links
a larger
are
substantially
times
the mainstay
worse
With
differ
to the time
and the lookup
is substantially
of many
the parent-child
does not
of
for applications
database
we would
expect the difference
to be even larger,
since links take a constant
time to
traverse,
while B-tree lookup time grows logarithmically
with database
size,
Thus DBMSS for engineering
applications
should provide link access methods
or some other
access method
that
is efficient
for traversal
indexes might be adequate).
It should also be noted that
B-trees,
as used by the relational
preserve
entries
as do the
an ordering
connections
on the
in the 00DBMS
and the parent-child
(extendible
hash
DBMS,
do not
parent-child
link
links.
version
The
of INDEX
part
can
be maintained
in any desired order; in our benchmark
implementation,
the
connections
are traversed
in the same order in which they were created. If we
added ordering
to our 001
specification,
the RDBMS
performance
would
be even slower than
in our results–the
RDBMS
would
have to sort the
connections
during
the traversal
operation.
In the current
version of 001, we do not require
that an ordering
be maintained
on the
connections,
nor
do we require
that
the
traversal
operation
ACM Transactions on Database Systems, Vol. 17, No 1, March 1992.
24
R. G. G, Cattell and J. Skeen
.
Table VII.
Small
Database Using B-Trees instead of Links
Remote
Measure
DB size
Lookup
Traversal
Insert
Total
Results
Note:
in a particular
to measure
an “ordered”
At
order.
version
where such an ordering
portions
of a document
paragraphs,
10.
or if the parts
SOME
the
time
A
cold
warm
cold
warm
cold
warm
7.9
24
33
21
12
91
cold
warm
53
33
It would
of the
would
with
were
be interesting,
traversal.
There
be important,
a hierarchy
statements
for future
are many
work,
applica-
for example
if the parts
of chapters,
sections,
and
and blocks
in a program.
COMPARISONS
of this
writing,
object-oriented
databases,
results
we have seen have
pendix
INDEX
3.9MB
in seconds
be performed
tions
were
Cache
shows
some
other
relational
been
results
groups
have
databases,
consistent
reported
run
and
with
on
001
on a variety
of
access methods.
The
those
products
reported
from
here.
Object
Ap-
Design,
Objectivity,
Ontologic,
and Versant
on the small
database.
Note that the
results on these object-oriented
database
systems are very close, but a couple
of these systems
are faster
for lookup
In fact, one object-oriented
DBMS that
run
at
essentially
structures.
In comparing
is intentionally
it is only
the
same
speed
and traversals
in the warm
case.
incorporates
swizzling
was found to
as conventional
in-memory
C++
data
DBMSS, it is important
to keep in mindl that our benchmark
simple, in order to make it easy to be widely reproduced—and
a rough
tions.
The only
actually
execute
approximation
truly
the
accurate
application;
to the requirements
of engineering
representation
of perfcm-mance
our goal is merely
to focus
applica-
would
be to
attention
on
gross-level
issues in the performance
of different
database
architectures.
However,
001
is based on past experience
and interviews
with CASE and
CAD engineers,
and bears close resemblance
to other engineering
benchmarks
we have examined
[4, 8]. We computed
a correlation
coefficient
of
.981 between
the latter
benchmark
object-oriented
DBMS products.
We would like to emphasize
that
and
our
001,
results
based
do not
on the
show
results
that
of five
an object-
oriented
data model is better
than a relational
data model for engineering
applications.
We can think
of only two limitations
of the relational
model
itself that might be a problem
for engineering
applications:
(1) the relational
model provides
no “manual”
ordering
capability,
as mentioned
earlier,
and
ACM Transactions
on Database Systems, VO1 17, No 1, March 1992
Object Operations Benchmark
(2) using
the
overhead
between
structure
relational
model
the
representation
benchmark
provides
for the
tabular
in
the
evidence
no
database
may
representation
order of magnitude
difference
from
results
appear to result
necessitate
in the
application
programming
to validate
or
and
these
different
DBMS.
the relational
data
Our
effects.
The
our
rather
and RDBMS
architecture
addition
of other operations
the balance
of the results,
of using
the
language.
quantify
we see in the 00DBMS
a difference
in the DBMS
25
translation
database
than data model. Furthermore,
the
to 001
might
change
hoc queries)
way
.
in
(such as ad
would
a
as
In this regard,
Winslett
and Chu [11] report
results
that might
to those
presented
here. They ported
a VLSI
CAD
contradictory
DBMS, UNIFY,
and found that
MAGIC,
to run on top of a relational
ran less than 25 percent
slower. On 001,
we found that a relational
appear
system,
MAGIC
DBMS
ran
in these
several
results
times
actually
slower
than
support
different
than Winslett
One important
difference
with
the DBMS
into
the
its
However,
we have
the contrast
made;
our experiments
is that
DBMS,
which
in
in-memory
own
systems.
of the claims
and Chu’s in several important
ways.
Winslett
and Chu used a batch
were
interaction
other
many
the
application
structures,
fetched
operated
on the
period of time, then dumped
the data back to the relational
As explained
in Section 4, the results reported
in this paper
interactive
version of 001,
operating
In addition,
UNIFY
differs
from
all
model
data
data
of
from
for some
representation.
are based on an
directly
on objects in the database.
the RDBMS
we used in two important
ways:
most
UNIX
DBMSS (and a newer UNIFY
product
(1) UNIFY,
unlike
is bound
directly
with the application
program,
the same company)
minimizing
overhead
to call the DBMS.
It also keeps data in the
address
unlike
current
with
in
memory,
11.
SUMMARY
would
like
of such
001
ways
located,
tions
at
for
grain,
can
or other
sometimes
propose
lookup,
that
on
three
with
fit
in
experience
simple
and
insert
database
must
conventional
control,
memory.
and
(4) the
These
performed
set.
benchmarks
database
be intermixed
of engi-
representative
working
other
the
transactions
engineering
study
operations,
small
(1)
for
and
measures
from
applications:
main
by directly
of the
performance
architecturally
concurrency
on 001
amounts
a relatively
language
(3)
links.
methods.
traversal,
engineering
programming
fine
emphasis
differs
a
access
better
large
benchmarking
database
(2) the
versions
set
are
located
benchmark
important
more
parent-child
perform
caching
new
of our
we
applications
a remotely
Our
ing
see
applications,
upon
with
to
provides
should
program,
providing
On the basis
databases.
neering
and
DBMSS,
products
application
an
We
UNIX
most
relational
binding
main
same
space.
(2) UNIFY,
Thus,
from
thus
is
with
may
entire
differences
data
be
database
in
remotely
operareplaced
work-
necessitate
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.
R. G. G. Cattell and J. Skeen
Table VIII.
Small Remote Database Results with Different
Systems
f-
f)l
violations
of conventional
wisdom
obtain acceptable
performance.
results
of the paper,
The main
in
VIII.
VIII
Table
Table
performed
that
our
The
goals
The
Notice
the
we
tributable
The
INDEX
small
00DBMS.
results
remote
other
worse
alternatives
to
Without
somewhat
benchmark
locality
worse.
of
This
is important
ACM Transactions
local
00DBMS
demonstrating
impossibly
benchmark
some
high.
using
of these
from
the
conclusions
Section
our
for
hit
rate
9,
the
(Table
are
shown
in
links
that
the
connections
found
local
it
than
with
for
the
cache
per
is
DBMS
case.
benchmark
only
that
at-
databases.
overhead
?ocal
the
with
claim
a
size
directly
larger
was
the
for
database
X) are better
contention
even
logical
the
was
with
to minimize
the
as
difference
traversal,
supports
IX
deterioration
database
slowly
for
Table
a similar
parent-child
B-trees
type
this
IX);
ran
of
This
in
of
important
results
measures.
results
support
RDBMS
not
to
systems.
the
to
taken
in cache
(Table
data
instead
“transitive-closure”
the
it is also
the
B-trees
on
support
majority
a small
with
are
order
graphically
commercial
these
performed
results,
reduction
these
general;
significantly
sider
the
database
the
access method,
than
in
X.
in
that
However,
Using
and
deterioration
found
All
important.
The
IX
to a dramatic
the
in
Tables
servers)
summarized
performance
further
architecture.
in
are
of
INDEX
slowly
we
lend
II,
prerelease
more
variations
database
Table
DBMS
ran
package
database
grows.
the
engineering
RDBMS
INDEX
about
the
for
experimental
graphically
that
from
as did the home-grown
commercial
in-house
call
illustrates
well,
for
(e. g.,
produces
minor
is
that
effects
on
to
con-
important
are
traversed
in
a
operation.
reference,
shows
where
the
that
the
results
on
inclusion
applications
the
traversal
of locality
exhibit
on Database Systems, Vol. 17, No. 1, March 1992.
such
measure
of reference
locality.
are
in
a
Object Operations Benchmark
Table IX.
INDEX
.
Results for Remote Access to Small, Large, and Huge Working
27
Sets
l-,-
J-,
—
,’
—
_=: ; !:,l-l
~
.
i
,1
-i
-—
>.
...
_l.1 .’.
,.
.,
+–
.’
—
‘;
II. -l.
,-,
,1.
Table X. INDEX Results for Small Database Locally, Remotely, Using B-Trees Instead
of Links for Connections, and Without Locality of Reference in Connections
,L—
..:,
—
,,.,
-i
~
~n
J
,1
,-
I
:)
..
–’,
1.
—,
,—
;,
,_,
.’
!-l”-
,,-
Y, I r 1--,
–!’.
To summarize:
operations
in
Good
001
,“1!
performance
requires
language
data
long-term
check-out,
space,
on
effective
new
and
access
caching
database
systems
will
they
perform
benchmark
can
‘. ‘. I’j
not
be used
use
the
and
methods,
on
in
measures
ACM Transactions
a local
many
such
~:
‘,’,’! ’,.1
lookup,
traversal,
integration
engineering
We
1-10
allowing
contend
applications
in
insertion
programming
control
workstation.
as ours
and”
of the
concurrency
,1, II’,
seconds
that
unless
with
a
on Database Systems, Vol. 17, No 1, March 1992.
28
R. G. G Cattell and J. Skeen
.
90 percent
have
cache
hit
that
such
shown
can
fall
short
essential
of
mance
needs
define
The 001
show
and
not
to use
different
the
Sun-3
results
half-a-dozen
and/or
the
could
the
not
Measure
were
as
00DB2
13
13
6,7
13
33
The
results
and
to the
set
of
the
perfor-
measures
we
used.
The
numbers
Sun
but
we
did
not
their
not
and
and
001
has
verify
the
results
is
DBMSS,
are
premises,
shown
numbers
relational
RDBMS1,
comparison.
are
of these
to
on
ObjectStore,
results
relative
00DB4,
for
00DB3
22
18
3.7
16
44
strictly
slightly
INDEX
been
are
run
source
on
code
here.
00DB4
RDBMS1
20
17
36
22
41
29
94
20
95
143
7.6
17
8.2
23
33
19
84
20
123
2.4
8.4
7.5
18
1,1
12
1.0
3.3
00DB1
00DB2
00DB3
5 6MB
133
3.4MB
50
7.OMB
47
10
83
19
20
L+ T+ Icold
39
25
essentially
ObjectStore,
purpose
them.
verified
Design
The
1.0
12
2.9
5.1
INDEX
Results in Seconds: Small Local Database
10
10
5.3
ACM Transactions
the
products
0,03
01
31
3.2
12
18
9
‘M Objectivity/DB,
1990.
to reproduce
18
27
14
34
59
Benchmark
results
We
products
magnitude.
attention
the
servers.
that
Results in Seconds: Small Remote Database
Lookup cold
Traverse cold
Insert cold
Note: local warm
well,
permission
0.1
0.7
3.7
4,5
001
of
the
Object
early
among
not
00DB1
Lookup warm
Traverse warm
Insert warm
L+ T+ Iwarm
order
through
since
repeated
Benchmark
Lookup cold
Traverse cold
Insert cold
R-traverse cold
L+ T+ Icold
Measure
were
paper,
obtain
new
to choose
systems
001
Database size
Build time
of the
some
other
shown
than
is to bring
in
products,
configurations
from
an
on Objectivity\DB,
VERSANTTM
as a basis
because
as
important
architectures,
was run
and
performance
comparable
much
and
also
systems.
identifying
the
as
more
have
RESULTS
benchmark
without
by
workstations
We
applications
DBMS
ONTOS,
used
possible.
paper,
engineering
A: OTHER
Ontologic
to
this
database
to compare
APPENDIX
here
of
commonly
are
results
in
about
on
results
these
message
observations
rate
UNIFY,
RDBMS1
INDEX
3 7MB
267
4.5MB
370
3. 3MB
1100 *
13
9.8
1,5
24
same as remote
ONTOS,
00DB4
27
90
22
139
5.4
13
74
26
warm
and VERSANT
on Database Systems, Vol. 17, No 1, March 1992
are registered
trademarks,
Object Operations Benchmark
APPENDIX
Database
B: SQL
.
29
DEFINITIONS
Definition
create table part (
id int,
part_type
char(lO),
x int,
y int,
build datetime)
create table connection
(
frm int,
too int,
conn_type
char(lO),
length
int)
create unique
clustered
index ipart on part(id)
with fillfactor
= 100
create clustered
index iconn on connection(frm)
with fillfactor
= 100
create nonclustered
index rconn on connection(too)
with fillfactor
= 100
Stored
Procedures
CREATE
PROCEDURE
(@urn
int,
@id int = O
@x int = O
@y int = O
@part_type
OUTPUT,
OUTPUT,
OUTPUT,
char(lO)
engl _lookup
= “abc”
OUTPUT
AS
SELECT id, x, y, part_type
FROM part
WHERE id = @num
CREATE
PROCEDURE
engl
@partial
int,
@frm int = O,
@too int = O OUTPUT,
@length
int = O OUTPUT,
AS
_ftrav_conn
SELECT too, length
FROM connection
WHERE
frm
= @partial
CREATE
PROCEDURE
@id int,
@part_type
char(lO),
@x int,
@y int,
@build
datetime
AS
engl
_part_
insert
INSERT part
(id,
part _ type,
x,
Y?
build
ACM Transactions on Database Systems, Vol
17, No. 1, March 1992
30
R, G. G. Cattell and J. Skeen
.
\ALUES
(@id,
@part_
(@X,
type,
COY,
@build
)
Example
of C Code:
Lookup
lp_start
= time100(
);
for (i = O; i < nrepeats;
i + + ) {
genrandom(nparts,
nrecords);
/“ Generate
random
numbers
“/
starttime
= time100(
);
for(j=O;
j<nparts;
j++){
dbrpcinit(dbproc,
“engl _ lookup”,
(DBSMALLINT)O);
dbrpcparam(dbproc,
NULL,
(BYTE)O,
SYBINT4,
– 1, -1,
&rand_
numberstil);
dbrpcsend(dbproc);
dbsqlok(dbproc);
dbresults(dbproc);
dbnextrow(dbproc);
p_id
= *((DBINT*)dbdata(dbproc,
l));
p_x
= *((DBINT*)dbdata(dbproc,
2));
p_y
= “((DBINT*)dbdata(dbproc,
3));
strcpy(part
_t ype, (DBCHAR*)dbdata(dbproc,
4));
nullproc(p_x,
p_y, part _type);
)
~ndtime
= time100(
);
printf(’’Total
running
time for loop %d is %d. %02d
(endtime
– starttime)/100,
(endtime
- starttime)%lOO);
starttime
= time100(
);
seconds”,
i + 1,
}
dbexit( ):
endtime
= time100(
);
printf(’’Total
running
time
(endtime
– lp_start)/100,
(endtime
- lp_start)%100);
%d. %02d
seconds”,
REFERENCES
1 ANDERSON, T,,
BERRE, A., MALLISON, M., PORTER, H , AND SCHNEIDER, B.
The HyperModel
Conference
on Extending
Database
Technology
(Venice, March
benchmark,
In Proceedings
1990), Springer-Verlag
Lecture Notes 416
2 ANON ET AL, A measure of transaction processing power. Datarnatzon.
31, 7 (April 1, 1985).
ACM
3 ASTRAHAN, M , ET AL. System R: Relational
approach to database management
Trans.
Database
Syst.
1,
1 (1976).
BARRETT, M, C + + Test driver for object-oriented
databases, Prime Computervlsion,
personal commurucatlon,
1990.
database systems: A system5 BRITTON, D., DEWITT, D. J., AND TURBYFILL, C. Benchmarking
VLDB
Conference
(October, 1983) pp. 000 000. Expanded
atic approach. In Proceedings
and revised version available as Wisconsin Computer Science TR 526.
6 M.MER, D. Making database systems fast enough for CAD applications.
Tech Rep CS/E87-016, Oregon Graduate Center, Beaverton, Oregon, 1987
4
ACM TransactIons on Database Systems, Vol. 17, No 1, March 1992
Object Operations Benchmark
7. PARK, S., MILLER, K.
ACM.
Random
number
generators:
Good ones are hard to find,
.
31
Conwum.
31, 10 (Oct. 1988).
8. SCHWARTZ, J., SAMCO, R.
communication,
1990.
Object-oriented
database
evaluation,
Mentor
Graphics,
personaI
9. RUBENSTEIN, W. B., KUBICAR, M. S., AND CATTELL, R. G. G. Benchmarking
simple database
ACM SIGMOD
(1987).
operations. In Proceedings
10. TRANSACTION PROCESSINGPERFORMANCE COUNCIL (TPC),
“TPC Benchmark
A Standard”,
Shanley Public Relations, 777 N. First St., Suite 600, San Jose, California,
November, 1989.
11. WINSLETT, M., AND CHU, S. Using a relational
R-90-1649, Computer Science, Univ. of Illinois,
Received December
1988; revised January
DBMS for CAD data, Tech. Rep. UIUCDCSUrbana, 1990.
1991; accepted January
1991
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992.