File: clinsys-article.tex, Revision Date: Sep 1, 2011
A Database System for Time-Stamped Data
By Gary D. Knott, George Pick and Judy Graves
Civilized Software, Inc.
12109 Heritage Park Circle
Silver Spring, MD 20906
www.civilized.com
[email protected]
301-962-3711
Introduction
This paper describes the design of a time-aware database system. The system is called CLINSYS since we originally designed it to be used to manage
clinical trial data. Such data often includes time series such as a patient’s
blood pressure at various times. CLINSYS is, however, not restricted to any
particular domain of application. The ideas involving time are generally applicable, as is the CLINSYS system itself. CLINSYS is especially suitable
for storing and retrieving data about events sequenced in time.
There are several noteworthy features in CLINSYS. These include the handling of time-stamped data-values, the use of three-valued logic for dealing
with unknown (e.g. missing) data values in order to support various statistical procedures, and the particular mixture of set-theoretic and relational ideas
employed as the fundamental database model in CLINSYS. We are presenting this description of CLINSYS in the hope that some of its design features
will be useful to others, and as a précis for potential supporters
of our implementation effort.
CLINSYS is an interpreter which implements the CLINSYS language. The
preliminary version of the CLINSYS interpreter has been programmed by
George Pick and Judy Graves. The CLINSYS language contains facilities
for general computation (e.g. loop-constructs, assignment statements) and
graphics (e.g. display of data and function graphs), together with a rich collection of built-in operators, including particular operators for constructing
and querying a CLINSYS database. The structure of a CLINSYS database
is described below.
Time is an important aspect when modeling real world phenomena. Events
occur at certain points in time. Data is valid within certain time periods.
The ability to record the time when an event occurred or the time when the
data is valid along with the data itself is essential for meaningful interpretation of this data. Traditional databases do not directly support storage
and management of time-dependent data. In these databases, as data is
updated, past information is lost and becomes inaccessible.
As an example, consider a relation called CITY specifying the state, population, mayor, rainfall, and temperature for each city stored in the relation.
name   state   population  mayor      rainfall  high-temp
City1  State1  100,000     Johnson    2         35
City2  State2  400,000     Jackson    4         55
City3  State1  320,000     Jefferson  1         44
This data was valid on certain dates or for certain time intervals, but this is
not reflected in any way in this relation. If we now have new values for the
population attribute or a new mayor has been elected, we need to replace
the existing data. This historical data will now be inaccessible. However, it
is sometimes important to keep track of such historical information in order
to maintain data about time-varying phenomena. We would like to store
the date the population was recorded along with its value. For the rainfall
attribute, it would be useful to record the time interval for which the value
corresponds. In addition we might want to store many such values for any
given city along with the relevant time-stamps.
In recent years there has been much interest in developing database management systems that support time-dependent information. Three types of time
have been considered [SNODGRASS 86A]. These are Valid time – the clock
time when the event occurred, Transaction time – the time when the data
was stored in the database, and User-defined time – an arbitrary attribute
of type time, for which the allowed operations include input, output, and
comparison. Most database systems can deal with user-defined time data.
A time-aware system could support both valid time and transaction time
and offer special functions that deal with this dimension of data. Snodgrass
calls a database that supports only valid time a historical database, and one
that supports only transaction time a rollback database. A database that
supports both valid and transaction time is called a temporal database. A
conventional database that does not support time at all is called a snapshot
database [SNODGRASS 86A].
There are some basic design decisions that characterize the types of objects
that a temporal database manages [MCKENZIE 91B]. Is valid time associated with attribute values or with tuples? Is time represented as points
(single values), intervals or sets of intervals?
The models that have been described in the literature [MCKENZIE 91B]
assume that the value of the data is constant over the interval for which it is
defined. This is the case regardless of whether time is represented as single
values or as intervals, or whether time is associated with attribute values or
with whole tuples.
This constancy assumption is not always appropriate. Some data is only
valid at one point in time, and may differ at other times. For example, the
population of a city is valid only at the time the information was collected.
Suppose that the population of Washington DC is recorded on January 1,
1988 and then again on January 1, 1990. If we want to know the population
of Washington DC on January 1, 1989, then it may be incorrect to assume
that the population remained the same from 1988 until 1990. Instead, some
form of interpolation could be used to derive this value. Similarly, suppose
that the temperature is recorded on a certain day representing each month
of the year 1990 in Washington DC. If we want to know the temperature in
the time interval May 1990 through October 1990, then it is necessary to
take the average of these values. For rainfall over a time period, the sum
of the recorded values could be used. On the other hand, suppose that the
name of the mayor of Washington DC is recorded on January 1986 and then
on January 1990. If we want to know who was the mayor in January 1988,
then the latest value prior to the query time should be used. Thus it should
be possible to store attribute values with just single time-stamp values and
define an impute rule such as linear interpolation for each such attribute
which specifies how to assign a value to the attribute at a point where no
time-value pair is stored.
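The interval queries in the temperature and rainfall examples above can be sketched as follows. This is a minimal illustration in Python rather than in the CLINSYS language; the function names and the sample readings are hypothetical and are not taken from CLINSYS.

```python
from datetime import date

def values_in(series, start, end):
    """Return the values whose time-stamps fall in the closed interval [start, end]."""
    return [v for t, v in series if start <= t <= end]

def interval_mean(series, start, end):
    """Average of the recorded values in [start, end]; None if nothing was recorded."""
    vals = values_in(series, start, end)
    return sum(vals) / len(vals) if vals else None

def interval_sum(series, start, end):
    """Total of the recorded values in [start, end] (0 if nothing was recorded)."""
    return sum(values_in(series, start, end))

# Hypothetical (time-stamp, value) readings; these are not data from the paper.
high_temp = [(date(1990, m, 15), t) for m, t in
             [(4, 60), (5, 70), (6, 80), (7, 90), (8, 88), (9, 80), (10, 66), (11, 55)]]
rainfall = [(date(1990, 5, 31), 4), (date(1990, 6, 30), 3), (date(1990, 7, 31), 5)]

# Temperature over May 1990 through October 1990: average the recorded values.
may_to_oct_temp = interval_mean(high_temp, date(1990, 5, 1), date(1990, 10, 31))
# Rainfall over a time period: sum the recorded values.
may_to_jul_rain = interval_sum(rainfall, date(1990, 5, 1), date(1990, 7, 31))
```

The point of the sketch is only that the appropriate aggregate depends on the attribute: a mean suits temperature, while a sum suits rainfall.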
CLINSYS employs such single time-stamp values. CLINSYS provides a
mechanism for defining an impute rule for each attribute when defining
the schema of the database along with operators that can use the defined
impute rules to provide a wide range of meaningful context-sensitive temporal queries. These include queries based upon relative time and upon
time-intervals.
Previous Work
Much effort has been devoted in recent years to defining a data model that supports
the time dimension. Most of this work is an extension of the relational
model. Some work has also been done in extending other models. TERM
[KLOPPROGGE 81] is an extension of the entity-relationship model. PDM
[MANOLA 86A] is an extension of the DAPLEX functional model. ADM
[THOMPSON 91A] is an extension of the relational model incorporating
sets of synchronized time-varying relations.
Ben-Zvi [BENZVI 82] defines the Time Relational Model, which supports
both transaction time and valid time. Time is associated with tuples. Static
relations are extracted and manipulated using the traditional relational algebra operators. Navathe and Ahmed [NAVATHE 89] define a temporal
relational model. In this model, attributes may be non-time-varying or
time-varying. All attributes of a relation must be of the same type. Time
is associated with tuples. Clifford and Warren [CLIFFORD 83] were the
first to suggest associating time-stamps with attributes. In the historical
database model [CLIFFORD 87A], Clifford and Crocker suggest associating time-stamps with both the tuple and with each attribute value. The
time-stamp associated with the tuple represents the life-span of the tuple,
whereas the time-stamps associated with the attributes represent the valid-time of the value stored for that attribute. The attribute’s valid time must
be a subinterval of the tuple’s life-span. In their data model, Clifford and
Crocker employ non-1NF (first normal form) attributes. Tansel’s historical
algebra [TANSEL 86B] also uses non-1NF attributes. Four types of attribute
values are defined by this model. Attributes may be either non-time varying
or time-varying and they may be atomic-valued or non-atomic valued. The
conventional relational operators are modified to handle the historical data
directly. In Gadia and Yeung’s [GADIA 88A] model, attribute values are
functions from temporal elements onto attribute value domains. Static relations are extracted in order to apply the conventional relational operators.
All of the relational algebras associated with these models introduce some
new operators to deal with temporal selection and temporal join as well as
other time operations. Navathe and Ahmed [NAVATHE 89] define timeslice, inner-time-view and outer-time-view as forms of temporal selection,
and tcjoin and tcnjoin as forms of temporal join. Clifford and Crocker
[CLIFFORD 87A] define the following operations: when which maps a relation instance onto its lifespan; select-if which is a form of temporal selection;
select-when which is a combination of temporal selection and temporal join;
time-slice which is a form of temporal join. Tansel [TANSEL 86B] defines
slice, uslice, and dslice which are forms of temporal selection. Some of these
algebras provide conversion operators that transform the non-1NF attributes
to normal form attributes and vice versa by adding or removing attributes
and tuples in the relation and by splitting or coalescing the existing tuples as
needed. Tansel’s [TANSEL 86B] pack, unpack, t-dec, t-form and drop-time
operators are examples of these operations.
We don’t know of any prior work in defining attributes with impute rules
for attribute value computation for unrecorded times, as in the population
example above. Tansel [TANSEL 87A] proposes a statistical interface for historical relational databases. This is in fact a set of operators whose domain
is the values of time-dependent attributes. These operators perform various
statistical operations such as max, average etc. on these time-dependent
variables. However, this work does not exploit the fact that different kinds of data values inherently behave differently over time. That is, data is still assumed to be
constant over the time period for which it is defined, and there is no way to
impute values for variables for which time was not explicitly defined.
CLINSYS Database Concepts
A simple record-oriented file is a collection of records, where each record
consists of a list of variable values for a specified set of variables. Generally,
each record corresponds to a unique individual or event. Let us call the objects described by records subjects. A simple record-oriented file is a special
example of a CLINSYS subject-class. Alternatively, a CLINSYS subject-class
corresponds to a single relational table in a relational database schema.
A CLINSYS database consists of a group of one or more subject-classes.
Each subject-class has an associated set of database variables whose values
constitute properties of, or observations about, the subjects of the associated
subject-class. Within each subject-class, a unique internal integer subject
number is assigned by CLINSYS to each subject of the class.
Each database variable is associated with a unique subject-class of the
database. Thus to define a CLINSYS subject-class, you define a collection
of database variables to be associated with the subjects of the subject-class.
A subject number is specified by writing the corresponding integer preceded
by a number-sign. For example #1932 refers to subject number 1932 of the
current subject-class. The subject-class can also be specified explicitly; thus
sc#30 denotes subject number 30 of the subject-class named sc. Usually
subject numbers are not specified by the user; they are produced by query-expressions.
A CLINSYS database is thus defined by a collection of database variables
which are partitioned among the specified subject-classes of the database.
Unlike a simple record-oriented file, a CLINSYS database variable may be
time-dependent. A time-dependent database variable v may have zero or
more values recorded in the database for each subject belonging to the
subject-class associated with the database variable v. Each such value has a
distinct date and time-of-day associated with that value. A date and a time-of-day are combined in one fractional Julian day value. The integer part is
the day, and the fraction part is the time-of-day within the associated day.
A date and time-of-day fractional Julian day value is simply called a time-value for short. Such Julian day values are used internally, while ordinary
dates and times are specified as input and given as output.
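The encoding of a date and time-of-day as a single fractional day value can be sketched as follows. This is a minimal Python illustration under a loudly stated assumption: the day number used here is Python's proleptic Gregorian ordinal, not the astronomical Julian Day number (whose epoch differs and whose days begin at noon). It is chosen only because its fraction part is 0 at midnight, as the description above requires. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def to_time_value(dt):
    """Encode a datetime as day-number + fraction-of-day (fraction 0.0 = midnight).

    ASSUMPTION: the integer part is Python's proleptic Gregorian ordinal,
    standing in for the day number that CLINSYS uses internally.
    """
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second
    return dt.toordinal() + seconds / 86400.0

def from_time_value(tv):
    """Decode a time-value back to a datetime, rounded to the nearest second."""
    day = int(tv)                          # integer part: the day
    seconds = round((tv - day) * 86400)    # fraction part: the time-of-day
    return datetime.fromordinal(day) + timedelta(seconds=seconds)
```

Because a time-value is an ordinary real number, comparisons and expressions such as surgerydate+30 reduce to floating-point arithmetic.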
The time-value t associated with a given value x of a time-dependent database
variable v is not semantically interpreted by CLINSYS. It is generally expected to be the case that t is the time at which v assumes the value x, and
certain CLINSYS functions make most sense in this context. However, any
other interpretation, such as the time that v is updated to have the value x
in the database may alternatively be intended.
Every subject-class may have a specific database variable of that subject-class designated as the key-variable of the subject-class. The key-variable of
a subject-class is a time-independent variable whose value is distinct for each
subject. Thus a key-variable value uniquely determines a subject number of
the associated subject-class. For example, a key-variable for a subject-class
whose subjects are individual people might be the social security number.
A value of a CLINSYS database variable can be any one of the following
datatypes:
1. real number.
2. complex number.
3. subject.
4. string of characters.
5. matrix of reals.
6. time-value.
7. code-value.
1. A real number is a floating-point value. Integers may conceptually
be stored in a CLINSYS database variable, but they are converted
to reals when input occurs and stored as reals in the database. The
logical constants are a subset of the reals (TRUE=1, FALSE=0).
2. A complex number is a pair of floating-point values.
3. A subject is a subject-number and subject-class pair. A subject-valued
database variable in a given subject class S may have values which are
subjects of other subject classes.
4. A string is a sequence of characters.
5. A matrix is a 1-origin indexed, n row by m column array of real numbers,
where the dimensions n and m are recorded with the matrix.
6. A time-value is a floating-point value, interpreted as a fractional Julian
day value; the integer part denotes the date, which corresponds to an
ordinary Gregorian calendar date, and the fraction part denotes the
exact time of day. A time-value constant is a time-of-day and/or date
enclosed in angle brackets (’<’,’>’). The time-of-day and the date may
appear in any order. If the time-of-day part is omitted it defaults to
midnight (0:0:0). A time-of-day is written as: hh : mm or hh : mm : ss
where hh, mm and ss are one or two digit numbers representing hours,
minutes and seconds, respectively, with 0 ≤ hh < 24, 0 ≤ mm < 60,
and 0 ≤ ss < 60. A date is written as: dd − mmm − [yy]yy or
nn/dd/[yy]yy where dd is a one or two digit number representing the
day of the month with 1 ≤ dd ≤ 31, mmm is a three letter month
name, nn is a one or two digit month number with 1 ≤ nn ≤ 12, yy
and yyyy are two or four digit numbers, representing the year of this
century or an absolute year, respectively.
7. A code-value is an integer which corresponds to one member of a fixed,
specified set of textual descriptors, for example: MALE, FEMALE,
OTHER. In the internal CLINSYS symbol table, a code-value is stored
as an integer representing the index of the code name in the symbol
table. In a CLINSYS database, a code-value is stored as a string
holding the code name. This makes it difficult to change the code
name after its introduction. If, however, we stored the index into a
code table for the corresponding variable instead, we wouldn’t be able
to easily change the order of the codes, or insert new codes in the code
table afterwards.
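The time-value constant syntax of item 6 above can be illustrated with a small parser. This is a sketch in Python, not the CLINSYS parser, and it makes several assumptions that the text does not settle: a two-digit year yy is read as century + yy with a hypothetical century parameter, components contain no internal spaces, and a date part is required (the default date when only a time-of-day is given is not stated above). The name parse_time_constant is hypothetical.

```python
import re
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

TIME_RE = re.compile(r"^(\d{1,2}):(\d{1,2})(?::(\d{1,2}))?$")      # hh:mm or hh:mm:ss
DMY_RE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})-(\d{4}|\d{2})$")    # dd-mmm-[yy]yy
MDY_RE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})$")        # nn/dd/[yy]yy

def parse_time_constant(text, century=1900):
    """Parse '<time-of-day date>' (parts in either order) into a datetime.

    ASSUMPTIONS: a two-digit year means century + yy; a date part is required.
    """
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("a time-value constant must be enclosed in < >")
    hh = mm = ss = 0            # an omitted time-of-day defaults to midnight
    date_parts = None
    for part in text[1:-1].split():
        t = TIME_RE.match(part)
        d = DMY_RE.match(part)
        n = MDY_RE.match(part)
        if t:
            hh, mm, ss = int(t.group(1)), int(t.group(2)), int(t.group(3) or 0)
        elif d:
            month = MONTHS.index(d.group(2).lower()) + 1   # ValueError on a bad month name
            date_parts = (d.group(3), month, int(d.group(1)))
        elif n:
            date_parts = (n.group(3), int(n.group(1)), int(n.group(2)))
        else:
            raise ValueError("unrecognized component: " + part)
    if date_parts is None:
        raise ValueError("this sketch requires a date part")
    year_text, month, day = date_parts
    year = int(year_text) + (century if len(year_text) == 2 else 0)
    return datetime(year, month, day, hh, mm, ss)   # datetime() range-checks each field
```

For instance, under these assumptions parse_time_constant("<7/4/88>") denotes midnight on July 4, 1988.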
Explicit data-missing codes may be recorded as the values of CLINSYS
database variables. The database data-missing codes are: UNASKED, NOTAPPLICABLE, SUSPICIOUS, REFUSED, NOVALUE. These names are
the names of special constants of polymorphic type. A data-missing code
is recorded together with the datatype of the missing data in the datatype
field which is associated with a data value entry in the database.
In CLINSYS, retrieval of stored data is done in two steps. First a set
of subject-numbers (all from the same subject class) is constructed, and
then a table of database-variable values is constructed with a row for each
subject-number in the retrieved set. For example, the statement S = all(P)
|(bloodpressure>200 in [surgerydate,surgerydate+30]) creates S as
the set of subject-numbers of those patients who had a blood pressure greater
than 200 at least once in the first 30 days after their surgery. The statement
T = extract age, weight, cholesterol from S builds a table of three
columns called T with a row for each subject-number in S. Such tables may
be joined or projected to form other tables according to the usual relational
database algebra.
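The two retrieval steps can be mimicked in a few lines of Python. This is only an illustrative sketch of the idea, not CLINSYS code: the dictionary patients, the helper names, and all data values are hypothetical, and times are plain day numbers.

```python
# Hypothetical in-memory stand-in for a subject-class: subject number -> variables.
# The time-dependent variable bloodpressure holds (time-value, value) pairs.
patients = {
    1: {"age": 61, "weight": 82.0, "cholesterol": 210, "surgerydate": 100.0,
        "bloodpressure": [(95.0, 180), (110.0, 205)]},
    2: {"age": 47, "weight": 70.5, "cholesterol": 190, "surgerydate": 200.0,
        "bloodpressure": [(205.0, 150), (260.0, 220)]},
    3: {"age": 58, "weight": 91.2, "cholesterol": 240, "surgerydate": 300.0,
        "bloodpressure": [(330.0, 201)]},
}

def select_subjects(subject_class, predicate):
    """Step 1: build the set of subject numbers whose data satisfy the predicate."""
    return {num for num, data in subject_class.items() if predicate(data)}

def extract(subject_class, variables, subjects):
    """Step 2: build a table with one row per selected subject number."""
    return [[subject_class[num][v] for v in variables] for num in sorted(subjects)]

def high_bp_after_surgery(data):
    """True if blood pressure exceeded 200 at least once in the 30 days after surgery."""
    start, end = data["surgerydate"], data["surgerydate"] + 30
    return any(start <= t <= end and bp > 200 for t, bp in data["bloodpressure"])

S = select_subjects(patients, high_bp_after_surgery)
T = extract(patients, ["age", "weight", "cholesterol"], S)
```

Separating the subject set from the extracted table means the same set S can be reused to extract several different tables.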
CLINSYS is an interpreter. CLINSYS reads successive commands, parses
them, evaluates them, and presents any generated output. Ordinary internal variables may be used in CLINSYS. They come into existence and are
entered into the internal symbol table when they are defined, i.e. when a
value is first assigned to them. CLINSYS interpreter variables are data-typed. The type of a variable is determined by the type of the value first
assigned to it; this type cannot be subsequently changed by another assignment. In order to reuse that variable to hold values of other types, it must
first be deleted.
Expressions in CLINSYS are made-up of constants and variables, and of
functions that act on these values. The functions have differing syntactical
forms (e.g. a+b, sin(x), if a then b else c). Also, functions may be
either built-in, or user-defined.
Internal Database Architecture. A CLINSYS database is assigned a name
N. Then the database itself is kept in the file named N.db. In addition to
the database file, we have two kinds of ancillary input files. The Database
Variable Definition file, named N.def, is used to store the definitions of
database variables. A file named N.inp is called a data input file; such files
contain variable values to be either initially loaded into a database or to be
added to an existing database.
A CLINSYS database file is treated as an extensible linear array. Extensible-array caching routines [Knott1] are used to read from and write to this file,
allowing it to be viewed as an arbitrarily-large array. Above this first layer
of routines that emulate an array, we have a second layer of routines that
handle the dynamic allocation of blocks of cells belonging to this array. The
head of the free block list for the space managed by the dynamic allocation
system is stored in the first 4 bytes of the file as a 32 bit integer index into
the array.
Each database variable of each subject-class has an associated hash table
which is stored in the database file. The key for any one of these hash tables
is the subject number for which the values of the variable corresponding to
the hash table are sought.
Each hash table cell, which is 32 bits wide, holds the list-head index into
the extensible array .db file of the first member of the chain of hash table
entries that have the common hash value associated with that cell. The
hash function is the subject-number modulo the hash table size. Each entry
in the hash table has a subject number (a 32 bit integer), a link (32-bit
extensible array index) to the start of the list of values for that subject in
the database file, and a link to the next chain entry with the same hash value.
(It is feasible, and in view of current processor and disk-drive technology,
desirable, to expand CLINSYS hash-tables to 64-bit cells.)
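The chained hash table described above can be sketched as follows. This Python illustration keeps the entries in an ordinary list as a stand-in for the extensible array in the .db file. The class name, the NIL end-of-chain marker, and the choice to insert new entries at the head of a chain are assumptions; only the hash function (subject number modulo table size) and the three fields of an entry come from the description.

```python
NIL = -1  # ASSUMPTION: end-of-chain marker; the text does not say how this is encoded

class VariableIndex:
    """Chained hash table for one database variable, keyed by subject number."""

    def __init__(self, size):
        self.cells = [NIL] * size   # each cell: index of the first entry of its chain
        self.entries = []           # stand-in for the extensible array in the .db file

    def _hash(self, subject_number):
        return subject_number % len(self.cells)   # subject number modulo table size

    def insert(self, subject_number, values_head):
        """Add an entry: [subject number, link to value list, link to next chain entry]."""
        cell = self._hash(subject_number)
        self.entries.append([subject_number, values_head, self.cells[cell]])
        self.cells[cell] = len(self.entries) - 1   # ASSUMPTION: new entry heads the chain

    def lookup(self, subject_number):
        """Follow the chain; return the value-list link, or NIL if the subject is absent."""
        index = self.cells[self._hash(subject_number)]
        while index != NIL:
            number, values_head, next_index = self.entries[index]
            if number == subject_number:
                return values_head
            index = next_index
        return NIL
```

With a table of size 8, subjects 3 and 11 hash to the same cell and therefore share one chain.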
Each list of values is stored in the database file. Each list-entry has a time
value, a type value, a data value, and a link to the next entry. The values
are ordered by time in decreasing order – most recent first. This makes
retrieval easier for the latest values, and a little harder for retrievals based
on sequence number.
Each data value field is 8 bytes wide, and may conceptually hold a real value,
a code value, a time value, a subject, a matrix or a string. Real, code, and
time values are stored directly. Strings, matrices, codes and subjects are
stored indirectly elsewhere in the same database file. The data value field
for a string-value or a matrix-value is a 32-bit index to the byte elsewhere
in the .db file where the matrix or string is placed. A matrix m[1 : n, 1 : k]
is stored as a sequence of bytes. The first 4 bytes hold n, the number of rows.
The second 4 bytes hold k, the number of columns. The remaining 8 · n · k bytes
hold the n · k doubles of the matrix, stored by rows. A string is stored as a
null(0)-terminated sequence of characters.
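The matrix and string layouts just described can be reproduced with Python's struct module. This is a sketch only: the byte order (little-endian), the use of unsigned 32-bit integers for the dimensions, and the ASCII encoding are assumptions, since the text gives only the field widths. The function names are hypothetical.

```python
import struct

def pack_matrix(rows):
    """Serialize a matrix as: n (4 bytes), k (4 bytes), then n*k doubles stored by rows."""
    n = len(rows)
    k = len(rows[0]) if rows else 0
    flat = [x for row in rows for x in row]
    # ASSUMPTION: little-endian byte order and unsigned 32-bit dimensions.
    return struct.pack("<II%dd" % (n * k), n, k, *flat)

def unpack_matrix(buf):
    """Inverse of pack_matrix: read the dimensions, then the row-major doubles."""
    n, k = struct.unpack_from("<II", buf, 0)
    flat = struct.unpack_from("<%dd" % (n * k), buf, 8)
    return [list(flat[r * k:(r + 1) * k]) for r in range(n)]

def pack_string(s):
    """Serialize a string as its characters followed by a terminating 0 byte."""
    return s.encode("ascii") + b"\x00"   # ASSUMPTION: ASCII text

def unpack_string(buf, offset=0):
    """Read a null-terminated string starting at the given byte offset."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii")
```

A 2 by 3 matrix therefore occupies 8 + 8 · 2 · 3 = 56 bytes in this layout.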
One extra byte is kept with each data value for type information. The type
information consists of two fields: the value type (Matrix, Code, String,
Time, Real, Complex, Subject) which is 5 bits wide, and the data-missing
code, (UNASKED, NOTAPPLICABLE, SUSPICIOUS, REFUSED, NOVALUE), which
is 3 bits wide. The existence of data-missing codes requires that a type byte
be stored with each value; otherwise this would not be necessary.
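The type byte can be pictured as two packed bit fields. In the Python sketch below, the numeric codes, the bit positions (type in the high 5 bits, data-missing code in the low 3 bits), and the extra NONE entry for a value that is present are all assumptions made for illustration; the text above fixes only the two field widths.

```python
# ASSUMPTION: these numeric codes and bit positions are illustrative only.
VALUE_TYPES = ["Matrix", "Code", "String", "Time", "Real", "Complex", "Subject"]
MISSING_CODES = ["NONE", "UNASKED", "NOTAPPLICABLE", "SUSPICIOUS", "REFUSED", "NOVALUE"]

def pack_type_byte(value_type, missing="NONE"):
    """Combine a value type (5 bits) and a data-missing code (3 bits) into one byte."""
    return (VALUE_TYPES.index(value_type) << 3) | MISSING_CODES.index(missing)

def unpack_type_byte(byte):
    """Split a type byte back into (value type, data-missing code)."""
    return VALUE_TYPES[byte >> 3], MISSING_CODES[byte & 0b111]
```

Keeping the datatype alongside the data-missing code lets a missing entry still report what type of value was expected.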
Database Schema Format. The CLINSYS database variable definition file
for the database named N is the file named N.def. It is an ASCII file which
contains the definitions of each database variable and each subject-class of
the database.
A CLINSYS database variable for a given subject-class is defined by specifying the following attributes in the database variable definition .def text
file.
1. The variable name
2. The subject class for this variable.
3. The variable description (a description text string)
4. The value datatype (real, complex, subject, string, matrix, time, code)
5. Constraints (edit-check rules, range limits, etc.)
6. Unit (examples: ml, grams, count)
7. Impute rule for unknown values
8. Code table (in case of a codev datatype)
9. Unique/time-dependent (time-dependent yes/no)
10. Computation rule for computed variables
All these attributes except the variable name are optional with established
defaults where required.
A specified variable whose values are unique for each subject may be declared
to be the key-variable for its subject-class. There must be at most one key-variable for each subject-class. It is generally more convenient to refer to
a particular subject of a subject-class by specifying its key-variable value
rather than its subject number.
Impute rules. The impute rule for a time-dependent database variable is
a specification of how to impute a value to a variable which is either not
recorded at a specified time, or is explicitly assigned an unknown descriptor
for that time. The possible impute-rules are:
1. exact: the returned value associated with a time which has no associated
explicit value will be the unknown code UNKNOWN. The default is
the exact rule.
2. nearest: the value associated with the time nearest to the requested
time is returned. UNKNOWN is returned for times less than the least
recorded time.
3. bounded-nearest: same as nearest with UNKNOWN returned for
times outside the time range of the collection of values.
4. earliest: (early step-function interpolation) the value associated with
the nearest time earlier than the requested time is returned. UNKNOWN is returned for times less than the earliest recorded time.
5. bounded-earliest: early step-function interpolation with UNKNOWN
returned for times outside the time range of the collection of values.
6. latest: (late step-function interpolation) the value associated with the
nearest time later than the requested time is returned. UNKNOWN
is returned for times greater than the greatest recorded time.
7. bounded-latest: late step-function interpolation with UNKNOWN
returned for times outside the time range of the collection of values.
8. linear: linear interpolation is used to compute the returned value.
9. mean: the mean value is returned.
10. median: the median value is returned.
Impute rules 8, 9, and 10 apply only to real-valued and time-valued database
variables.
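The impute rules listed above can be sketched as a single lookup function. This is a Python illustration, not the CLINSYS implementation, and it rests on assumptions where the list is silent: a time with a recorded value always returns that value, linear returns UNKNOWN outside the recorded range, and ties under nearest go to the earlier time. The name impute and the string stand-in for UNKNOWN are hypothetical.

```python
from bisect import bisect_left
from statistics import mean, median

UNKNOWN = "UNKNOWN"  # stand-in for the CLINSYS unknown code

def impute(series, t, rule="exact"):
    """Value of a time-dependent variable at time t under the named impute rule.

    series is a list of (time, value) pairs sorted by increasing time.
    """
    if not series:
        return UNKNOWN
    times = [time for time, _ in series]
    values = [value for _, value in series]
    i = bisect_left(times, t)                 # first index with times[i] >= t
    if i < len(times) and times[i] == t:
        return values[i]                      # a value is recorded at t itself
    if rule == "exact":
        return UNKNOWN
    if rule == "mean":
        return mean(values)
    if rule == "median":
        return median(values)
    before = i - 1 if i > 0 else None         # nearest recorded time earlier than t
    after = i if i < len(times) else None     # nearest recorded time later than t
    if before is None or after is None:       # t lies outside the recorded range
        if rule.startswith("bounded-") or rule == "linear":
            return UNKNOWN
    if rule in ("earliest", "bounded-earliest"):
        return UNKNOWN if before is None else values[before]
    if rule in ("latest", "bounded-latest"):
        return UNKNOWN if after is None else values[after]
    if rule in ("nearest", "bounded-nearest"):
        if before is None:
            return UNKNOWN                    # before the least recorded time
        if after is None:
            return values[before]
        closer_before = t - times[before] <= times[after] - t   # ties: earlier time
        return values[before] if closer_before else values[after]
    if rule == "linear":
        t0, t1 = times[before], times[after]
        v0, v1 = values[before], values[after]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("unknown impute rule: " + rule)
```

For a hypothetical population series [(1988.0, 100000), (1990.0, 120000)], the linear rule yields 110000.0 at time 1989.0, while the exact rule yields UNKNOWN.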
An example subject class is given in the table below. In this table, italicized
database variables are time-independent and bold-face database variables
are time-dependent. The time-dependent attributes of this subject-class,
i.e., population, mayor, rainfall, and high-temp could be assigned the impute
rules: linear, earliest, sum, and mean, respectively.
name   state   population        mayor            rainfall     high-temp
ACity  AState  100,000 1/1/88    Johnson 1/1/91   2 1/01/91    35 1/15/91
               120,000 7/1/90    Jackson 1/1/79   4 2/28/91    40 2/15/91
                                                  3 3/31/91    43 3/15/91
Each database variable is specified to be either a time-independent variable,
or a time-dependent variable. A time-dependent variable is further classified
as an exact time-dependent variable, for which a sequence of data-value/time-value pairs is stored, or as a day-precision time-dependent variable,
for which a sequence of data-value/time-value pairs is stored, where the
times associated with data values are recorded with the time-of-day part
fixed to be 0.
Creating and Changing Variable Definitions. CLINSYS provides the vdefine
statement to add a new variable, possibly in a new subject class, for a
database by appending the derived variable definition to the end of the
specified variable definition .def file. The vdefine statement works interactively by presenting a form with appropriate fields to be filled-in. The user
may edit the form entries by using standard cursor control keys. A carriage
return is used to signal the completion of each entry. Each new database
variable definition is written out to the appropriate .def file and added to
the symbol table used in the on-going CLINSYS session.
There is also the revdefine statement which works interactively in a manner
similar to the enter statement. The revdefine statement is used to change
the definition of an already-defined variable in the .def file associated with
the database given as the argument to the statement. It should be noted
that a variable’s name, subject class and type cannot be changed. Existing
code values for a code-valued variable also cannot be changed, but new code
values can be added. A key variable can be made an ordinary variable only
by explicitly selecting a new key variable.
Adding and Modifying Stored Data. Data entry into a CLINSYS database
N.db is a matter of specifying that specifically-provided variable values be
entered for specific subjects. This includes the case of constructing a new
subject of a subject-class. The specification of values for database variables
is done by constructing a database update input file, N.inp, which is used
to do an update of the N.db file when the dataload statement is given.
A database-update .inp file is constructed from statements which specify database variables and corresponding data-values and time-stamp values to be stored. The basic value-specification statement is of the form:
<variablename> : <value> [; <time>] [<modifier>]
This input statement specifies a value for a variable, together with an optional time and modifier. If
the time is not given, the time at update is used.
If a value is specified for a time-independent variable which already has a
value, or if a value is specified for a time-dependent variable with an associated time which matches the existing time-value, then the value-specification
statement is taken as a change request when the update modifier is present;
or as a delete request when the delete modifier is present. Otherwise the
add modifier is taken to be present. When the add modifier is used, an
attempt to change an existing variable value is ignored and reported as an
error.
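The add, update, and delete semantics of value-specification statements can be sketched as follows for time-dependent variables. This is a Python illustration under stated assumptions: the modifier, when present, is taken to be the final whitespace-separated word; values and times are kept as uninterpreted strings; and the behavior of update or delete when no matching value exists, which the text does not specify, is chosen arbitrarily here (update stores the value, delete does nothing). All names are hypothetical.

```python
MODIFIERS = ("add", "update", "delete")

def parse_statement(line):
    """Split '<variablename> : <value> [; <time>] [<modifier>]' into four parts.

    ASSUMPTION: the modifier, when present, is the last whitespace-separated word.
    """
    name, _, rest = line.partition(":")
    rest = rest.strip()
    modifier = "add"                      # add is taken to be present by default
    words = rest.split()
    if words and words[-1] in MODIFIERS:
        modifier = words[-1]
        rest = rest[:len(rest) - len(modifier)].strip()
    value, _, time = rest.partition(";")
    return name.strip(), value.strip(), time.strip() or None, modifier

def apply_statement(store, statement, now):
    """Apply one parsed statement to store, a dict keyed by (variable, time).

    Returns an error message, or None on success.
    """
    name, value, time, modifier = statement
    key = (name, time if time is not None else now)   # no time given: time at update
    if modifier == "delete":
        store.pop(key, None)              # ASSUMPTION: deleting a missing value is a no-op
    elif modifier == "update" or key not in store:
        store[key] = value
    else:
        # add modifier with an existing value: ignore the change and report an error
        return "error: %s already has a value at %s" % key
    return None
```

Defaulting to add makes accidental overwrites of recorded data visible as errors rather than silent changes.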
The enter s1, s2, . . . statement invokes the CLINSYS on-line data-entry
facility in order to create a new .inp file or to add to an existing .inp file.
Each of the names s1, s2, etc. can be the name of an explicitly-defined
database-variable list, or it can be the name of a subject-class, thereby
specifying the list of all the database variables of the subject class. If no list
is given, then the currently-selected subject class is assumed to be the only
input by default. The CLINSYS data entry function will prompt for the
database-variable values of the database-variables in the database-variable
list s1, and then for the database variables in the database-variable list s2,
and so on. The appropriate set-value statements are constructed with the
appropriate times and modifiers, and these set-value statements are placed
in the specified .inp file.
The actual act of updating a database is initiated with the dataload statement. Values of the variables of a database are both initially loaded into a
database .db file and later used to update a database file with the dataload
statement. dataload loads data from the specified data-input (.inp) file
and stores it in the specified database, thus adding to and/or updating any
stored data which is already present.
When CLINSYS is started, it prints an identification message and then it
enters its command loop. No database is open and CLINSYS acts as a function interpreter. One or more databases are opened with the start command: start name1, name2, ... For each specified database, the associated variable definition .def file is read. Then, for each specified database,
the associated .db files are opened (or created as empty databases, if they
do not exist).
Super-Databases. Several databases may be open simultaneously; the effect
of this is to join the subject-classes of all the open databases together as one
“super-database”. A super-database can be queried and updated. However,
as always, only one of the subject-classes is currently selected. The currently
selected subject-class is used by some database handling statements, and to
disambiguate database-variable names that are not fully-specified.
CLINSYS Operators and Additional Datatypes
We use the terms operator and function interchangeably, since the distinction between them is at most a matter of syntax. CLINSYS operators generally have their arguments evaluated in advance, i.e. before the function
that uses them is called; however, some operators receive their arguments unevaluated. This latter form is used for functions that conditionally evaluate
some, but not always all, of their arguments (e.g. if then else, and,
or).
Functions may have implicit arguments. This means that some actual arguments may be left out from the function call, in which case they are implicit;
that is: values for the missing arguments are supplied by CLINSYS, where
the supplied values may depend upon both the function and the context
in which the function is used. Both user functions and builtin functions
may be invoked with missing arguments. User-defined functions must have
default values for implicit arguments specified in their definitions for each
potentially-missing argument.
Usually, the arguments of CLINSYS operators are coerced to be of the correct type, and, for matrices and tables, the correct row and column dimension, as follows. If a matrix or table is given where a scalar is expected, the
(1,1)-element of the matrix or table will be used. The value 0 will be used if
the matrix or table is empty. If a matrix is given where a table is expected,
the matrix is converted into a matching table. If a scalar is given where a
matrix or table is expected, it is coerced into a 1 by 1 matrix or table whose
(1,1)-element is the given scalar value. If an incorrectly-dimensioned matrix
or table is given, its rows and/or columns are truncated to obtain a desired
smaller size, and its rows and/or columns are cyclically repeated to obtain
a desired larger size.
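The dimension-coercion rules above can be sketched in Python. This is only an illustration, not the CLINSYS implementation: the names coerce_matrix and coerce_scalar are hypothetical, matrices are modeled as lists of rows, and the treatment of an empty matrix where a matrix is expected is an assumption made by analogy with the scalar rule.

```python
def coerce_matrix(value, rows, cols):
    """Coerce a scalar or a list-of-rows matrix to shape (rows, cols)."""
    # A scalar becomes a 1-by-1 matrix holding that scalar.
    if not isinstance(value, list):
        value = [[value]]
    # Assumption: an empty matrix contributes the value 0, by analogy with
    # the scalar rule in the text.
    if not value or not value[0]:
        value = [[0]]
    src_rows, src_cols = len(value), len(value[0])
    # Modular indexing truncates when the target is smaller and repeats
    # rows/columns cyclically when the target is larger.
    return [[value[i % src_rows][j % src_cols] for j in range(cols)]
            for i in range(rows)]


def coerce_scalar(value):
    """Use the (1,1)-element of a matrix, or 0 if the matrix is empty."""
    if isinstance(value, list):
        return value[0][0] if value and value[0] else 0
    return value
```

For instance, coercing the 2 by 2 matrix with rows (1,2) and (3,4) to 3 by 3 repeats the first row and the first column cyclically.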
In addition to the data types of the values of variables stored in the database:
real, string, code, time-value, matrix of doubles, CLINSYS provides for the
use of the following internal data types:
1. Database Variable (access function). A database variable name has an
internal CLINSYS value which is of type “database variable access function”.
This is a data-type because this value is in fact a function-like object used
to access the stored values of the associated database variable. Recall that
each database variable is associated with a unique subject-class.
Database variable values are accessed by database-variable access functions
whose names are the same as the names of the database variables. Let v be
a database-variable, let s be a subject number, and let t be a time value.
Then both v(s,t) and v(t,s) denote the value of the database variable v
for subject s at time t. If no such value exists, the impute rule associated
with the database variable v is used to determine the value returned. v(s,t)
may be written explicitly with two arguments, or alternatively, we may write
v(s) (t implicit), v(t) (s implicit), or v (both s and t implicit). Usually
when t is implicit, the latest time-value for which v(s) exists is the default
value of t that is supplied.
2. Set of Subject-numbers. In CLINSYS, data retrieval consists of constructing a set of subject-numbers of a particular subject-class. Given such a set
of subject-numbers S, variable values and/or associated time-values for the
subject-numbers in S may be extracted from the database and arranged in a
matrix or table which may, in turn, be the input to a statistical calculation.
A set of subject-numbers is a CLINSYS data object of type subject-set.
Subject-sets may be created, operated on, and destroyed. The set of all
subjects of a particular subject-class may be obtained with the function
all(). A set of subject numbers from the subject class S may be explicitly
specified by writing S{a1 , a2 , . . . , an } where a1 , a2 , . . . , an are the subject
numbers to be included in the set. For example, sc1{#1,#2} denotes the
set of subject-numbers 1 and 2 of the subject-class sc1. The subject class
argument is optional. If the subject-class for the set is present, then all the
subjects occurring inside the braces must be subject-numbers of the specified
subject-class for the set, either implicitly or explicitly. If the subject-class
for the set is missing, then if one of the subjects has the class specified,
it is taken as the class for the set. If no subject has a class specified, the
currently selected subject-class is assumed. s{} denotes the empty set for
the subject-class s.
3. Table (2-dimensional array with elements of any mixed data types, including table). In CLINSYS, a table is a 2-dimensional array, each element of
which is of any desired data-type, including the type table. It is thus a generalized matrix, allowing hierarchical construction of data objects. Tables
are generally constructed by data extraction.
An important special case of a table is a time-interval table. A time-interval
is specified by a starting time-value and an ending time-value; such an interval is interpreted as a closed interval. In general, we may specify a sequence
of one or more time-intervals. These are given as the rows of a 2-column table whose entries are time-values. Such a table is called a time-interval table.
CLINSYS contains operators for computing the union and the intersection
of two time-interval tables.
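As a rough sketch of what union and intersection of time-interval tables involve, the following Python models a time-interval table as a list of closed [start, end] rows whose entries are real-valued times in days. The function names union and intersection are hypothetical stand-ins, not the names of the CLINSYS operators.

```python
def union(a, b):
    """Union of two interval tables (lists of closed [start, end] rows)."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching rows are merged into one row.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersection(a, b):
    """Intersection of two interval tables, computed row against row."""
    result = []
    for s1, e1 in a:
        for s2, e2 in b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:  # closed intervals: a shared endpoint counts
                result.append([lo, hi])
    # Normalize the result (sort and merge any overlapping rows).
    return union(result, [])
```

The pairwise intersection is quadratic in the number of rows; a sweep over sorted endpoints would be the usual refinement for long tables.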
4. List of Database Variables. A list of database variables is constructed
with the function varlist() whose arguments may be of type database
variable, or of type subject class in which case all the database variables of
the subject class are specified. Database variable lists are represented as
1-row tables whose elements are database variables. The main purpose for
variable lists is in the enter statement, and in the data extraction operators. Variable lists are automatically saved when CLINSYS is exited, and
restored when CLINSYS is invoked. Thus they exist across CLINSYS sessions. In many contexts, a subject-class is treated as a variable list, defined
extensionally as the database variables of the class.
5. User-defined Functions. The user may define any function using compositions of built-in functions, operators, and other user-defined functions.
The arguments to, and the values returned by a user function may be of any
CLINSYS datatype. A user function may have optional arguments. Moreover, in contrast to the built-in functions, the arguments of a user-defined
function do not need to have a fixed type. Their type is dynamically determined at evaluation time. However, type checking may be requested for
arguments to user-defined functions by specifying their type in the function
definition.
6. Void. Void is a special data-object provided in CLINSYS. The void value
is the value returned by statements; it may possibly be entered into tables
by the join operations.
Unknowns
CLINSYS represents each “missing” data value in the database with one of
the following database “unknown” codes: UNASKED, NOTAPPLICABLE,
SUSPICIOUS, REFUSED, NOVALUE. These codes are collectively called
database unknowns. These unknown values are of polymorphic type (e.g.
NOVALUE, etc. may be treated as a scalar, a matrix or any other type).
All database unknowns are special constants defined by CLINSYS. An additional special internal constant called UNKNOWN is also defined. UNKNOWN is used as the value within CLINSYS which is returned to indicate
an unknown result for an operation.
A special meta-constant, UNKN, is also defined which refers to any of the
database unknowns, as well as the “internal” unknown code UNKNOWN.
Thus, the following predicates are true: REFUSED==UNKN, UNASKED==UNKN,
12.5!=UNKN, UNKNOWN==UNKN. There is also a predicate unknownp()
defined in CLINSYS such that unknownp(x) has the same logical value as
x==UNKN.
Unknowns travel through CLINSYS as follows: Database unknowns are
stored in the database by the database load routine. Such unknowns are
retrieved by the query functions. When an unknown occurs in a computation, the result of the operation is the computational unknown code: UNKNOWN. There are two categories of exceptions to this, where unknowns
have a special treatment. The first exception category consists of the logical
operators and, or, not, the in and during predicates, and the quantifiers
for all and there exists. The second exception category is made up of
the equality test and the inequality test. These operations are explicitly
defined for unknown inputs.
CLINSYS employs three-valued logic, i.e. a logical operator can produce
one of three possible values: TRUE, FALSE, and UNKNOWN. TRUE and
FALSE are the usual boolean logic values represented by the real numbers
1 and 0.
Real number arguments of boolean operations are coerced to logical values
as follows: the real value 0 is interpreted as FALSE (i.e. as 0 itself),
every non-zero real value is interpreted as TRUE (i.e. as 1), and any kind
of unknown is interpreted as UNKNOWN. The values output
by the logical operators are: TRUE (1), FALSE (0), and UNKNOWN.
The usual conjunction, disjunction and negation boolean operators and, or,
and not are extended to tri-valued logic by the following truth tables. In
the truth tables below, T denotes TRUE, F denotes FALSE, and U denotes
UNKNOWN.
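And | T  F  U        Or | T  F  U        Not |
 T  | T  F  U        T  | T  T  T         T  | F
 F  | F  F  F        F  | T  F  U         F  | T
 U  | U  F  U        U  | T  U  U         U  | U
In the And and Or tables, the row label is one argument and the column label is the other.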
The logical operators and and or are computed with lazy-evaluation, which
means the second argument is evaluated only if necessary.
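The truth tables and the lazy-evaluation rule can be sketched in Python as follows. This is an illustration under stated assumptions: UNKNOWN is modeled by None, the function names and3, or3, not3 and to_logical are hypothetical, and, because Python evaluates arguments eagerly, the second operand is passed as a zero-argument callable so that it is evaluated only when needed.

```python
UNKNOWN = None  # stand-in for the CLINSYS UNKNOWN value


def to_logical(x):
    """Coerce a real or unknown to 1 (TRUE), 0 (FALSE), or UNKNOWN."""
    if x is UNKNOWN:
        return UNKNOWN
    return 1 if x != 0 else 0


def and3(a, b):
    """Three-valued AND; b is a zero-argument callable, evaluated lazily."""
    a = to_logical(a)
    if a == 0:
        return 0  # FALSE and anything is FALSE; b is never evaluated
    b = to_logical(b())
    if b == 0:
        return 0
    return 1 if a == 1 and b == 1 else UNKNOWN


def or3(a, b):
    """Three-valued OR; b is a zero-argument callable, evaluated lazily."""
    a = to_logical(a)
    if a == 1:
        return 1  # TRUE or anything is TRUE; b is never evaluated
    b = to_logical(b())
    if b == 1:
        return 1
    return 0 if a == 0 and b == 0 else UNKNOWN


def not3(a):
    """Three-valued NOT: UNKNOWN stays UNKNOWN."""
    a = to_logical(a)
    return UNKNOWN if a is UNKNOWN else 1 - a
```

Note that and3(UNKNOWN, ...) must still evaluate its second operand, since a FALSE there decides the result.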
Database Retrieval Functions
In addition to the database-variable access functions, CLINSYS contains
a collection of other specialized database access functions; these functions
are listed below. (CLINSYS also has hundreds of functions for mathematics, statistics, and graphics inherited from the MLAB software (see
www.civilized.com), which are not discussed here.)
In the following function descriptions, each function refers to the value of
the database variable whose name is v for the subject with subject number s.
Also, if not otherwise stated, each function works with database variables
of any type. If the specified database variable doesn’t exist, then an error is
generated. If the specified subject does not exist, or has no entries for that
database variable, UNKNOWN is returned.
valof(v,t,s) [returns a data-value]: Given a time value t, valof returns
the corresponding data value for database variable v of subject s. The
evaluation imputation rule assigned to v is used. For example, if the impute
rule for v is exact, then if no such data value is found, UNKNOWN is
returned. Note that valof(v,t,s) is equivalent to v(t,s) or v(s,t).
timeof(v,d,s) [returns a time-value]: Given a data value d of the database
variable v, timeof returns the latest corresponding time value for the database
variable v of subject s. The data value d must be of the same type as that
specified for v.
first(v,s) [returns a data-value]: first returns the first (in time) data
value for the database variable v of subject s.
last(v,s) [returns a data-value]: last returns the last (in time) data value
for the database variable v of subject s. Note that last(v,s) is equivalent
to v(s), except when an implicit time argument other than the greatest
time is provided in the context where v(s) occurs.
timemin(v,s) [returns a time-value]: timemin returns the first time associated with a stored value for the database variable v of subject s.
timemax(v,s) [returns a time-value]: timemax returns the last time associated with a stored value for the database variable v of subject s.
vmin(v,s) [returns a data-value]: vmin returns the minimum data value for
the database variable v of subject s. It is only valid for database variables
of type real or timeval.
vmax(v,s) [returns a data-value]: vmax returns the maximum data value
for the database variable v of subject s. It is only valid for the database
variables of type real or timeval.
vave(v,s) [returns a data-value]: vave returns the average data value for
database variable v of subject s. It is only valid for the database variables
of type real or timeval.
vmedian(v,s) [returns a data-value]: vmedian returns the median data
value for database variable v of subject s. It is only valid for the database
variables of type real or timeval.
number(v,s) [returns a real]: number returns the number of values for
database variable v of subject s.
table(v,s) [returns a table]: table returns all the values, for database
variable v of subject s, arranged in a 2-column table, where for each row,
the first element is the time value, and the second is the corresponding data
value.
seq(v,i,s) [returns a data-value]: seq returns the i-th (in time-sequence)
data value for database variable v of subject s.
timeseq(v,i,s) [returns a time-value]: timeseq returns the i-th (in time-sequence) time value for database variable v of subject s.
stable(v,y,t) [returns a 1 column table of subject numbers]: stable(v,y,t)
returns the list of subject numbers s such that valof(v,t,s) = y. When
CLINSYS is extended to have indices for specified database variables, then
stable can be used in transformed expressions produced by a query optimization algorithm. Currently, query optimization for CLINSYS subset-selection predicates is an unresolved research problem.
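To make the behaviour of several retrieval functions concrete, here is a minimal Python sketch. It assumes a hypothetical storage layout, one time-sorted list of (time, value) pairs per database variable and subject, and models UNKNOWN as None; the functions take that list directly rather than the (v,s) arguments used in CLINSYS, and the 1-based index in seq is an assumption.

```python
from statistics import mean, median

UNKNOWN = None  # stand-in for the CLINSYS UNKNOWN value

# Hypothetical stored values of one variable for one subject:
# (time in days, value), sorted by time.
bp_subject1 = [(100.0, 120.0), (101.5, 135.0), (103.0, 120.0)]


def first(series):
    return series[0][1] if series else UNKNOWN


def last(series):
    return series[-1][1] if series else UNKNOWN


def timemin(series):
    return series[0][0] if series else UNKNOWN


def timemax(series):
    return series[-1][0] if series else UNKNOWN


def vave(series):
    return mean(v for _, v in series) if series else UNKNOWN


def vmedian(series):
    return median(v for _, v in series) if series else UNKNOWN


def number(series):
    return len(series)


def seq(series, i):
    """i-th value in time sequence; i is assumed to be 1-based."""
    return series[i - 1][1] if 1 <= i <= len(series) else UNKNOWN


def timeof(series, d):
    """Latest time at which the stored value equals d."""
    times = [t for t, v in series if v == d]
    return times[-1] if times else UNKNOWN
```

With the sample data, timeof(bp_subject1, 120.0) picks the later of the two times at which the value 120 was recorded, matching the "latest corresponding time value" rule.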
Time-related Operators and Functions
CLINSYS has a large collection of builtin operators including a large collection of mathematics and statistics functions. CLINSYS also contains operators used for manipulating time-values and also operators used to form
the query expressions used to do retrievals. The essential time-awareness
of CLINSYS stems from the existence of this collection of special functions
which allow the user to form predicates which deal with time in a straightforward way. We shall focus on these functions below.
Recall that time-values are represented as fractional Julian day values. These
are floating point numbers with the Julian day (the date) as the integer part
and the time-of-day as the fractional part. Since time-values are thus reduced to real numbers, we may directly perform operations like comparisons
and additions/subtractions.
Time Calculations. CLINSYS provides all the usual comparison operators
on time-values. The value returned for a comparison operation on time-values, such as tv1>tv2, is either TRUE (1) or FALSE (0). Note, however, that
tv1>UNKNOWN equals UNKNOWN. The comparison is performed with
a precision of fractions of a second. Such fractions of a second may not be
entered as constants or printed out in CLINSYS, but they may result from
operations on time values. Thus two time-values apparently equal to the
second may prove to be unequal when comparing them.
Time-real addition and subtraction can be done in CLINSYS. For example, <9/12/90 10:15> + 2 = <9/14/90 10:15> and <9/12/90 10:15>
- .5 = <9/11/90 22:15>. Time-real addition and subtraction operations
are used to obtain a time value that is a number of days after or before
another time-value, respectively. Time-value units are days, so the real
number added to or subtracted from a time value is interpreted as a length of time expressed in days. Similarly, time-time subtraction such as <9/14/90 10:15> − <9/10/90 10:15> = 4 or <9/14/90> −
<9/14/90 12:00> = −.5 can be done in CLINSYS. A time-time subtraction operation calculates the signed distance, in days, between two time points.
The result is a real number.
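The same day-based arithmetic can be reproduced with Python's standard datetime module. This is only a worked illustration: CLINSYS stores fractional Julian day values rather than datetime objects, the helper names add_days and diff_days are hypothetical, and the dates are read as month/day/year.

```python
from datetime import datetime, timedelta


def add_days(t, days):
    """Time-real addition: shift t by a (possibly fractional) number of days."""
    return t + timedelta(days=days)


def diff_days(t1, t2):
    """Time-time subtraction: signed distance from t2 to t1, in days."""
    return (t1 - t2) / timedelta(days=1)


t = datetime(1990, 9, 12, 10, 15)
print(add_days(t, 2))     # 1990-09-14 10:15:00
print(add_days(t, -0.5))  # 1990-09-11 22:15:00
print(diff_days(datetime(1990, 9, 14), datetime(1990, 9, 14, 12)))  # -0.5
```

Subtracting half a day from 10:15 on 12 September crosses midnight and lands at 22:15 on 11 September.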
CLINSYS contains a collection of selection functions which
may be used to select a certain element of a date-and-time time-value. For
example, year(t) returns the year for the date-and-time t as a real value,
and day(t) returns the day of the month for the date-and-time t as a real
value. CLINSYS also provides functions which truncate a date-and-time
value argument to the day, hour, minute or second.
Quantifiers. CLINSYS contains finite forms of the traditional quantifiers of the predicate
calculus. These forms are provided as predicates. The predicate corresponding to ∀ is written as for all v in c :: E. The predicate corresponding to
∃ is written as there exists v in c :: E.
The expression c must evaluate to an object of one of the following data
types: set of subject numbers, matrix, or table; this value is the control-list
of the expression. In all cases, the control-list expression c determines a
collection of values which are the elements of a matrix, a table, or a set.
The quantified variable v must either be undefined or it must be of the type
corresponding to a member of the control-list given by c.
for all returns TRUE if E evaluates to TRUE for all members of the
control-list, and FALSE if E evaluates to FALSE for some members of the
control list. Otherwise UNKNOWN results. there exists returns TRUE if
E evaluates to TRUE for at least one of the members of the control-list, and
FALSE if E evaluates to FALSE for all members of the control list. Otherwise UNKNOWN results. The members of the control-list are assigned
to the variable v in the order in which they occur in that list. As soon as
a value of FALSE, for for all, or TRUE, for there exists, is found, the
loop on the control-list members is terminated. If the control-list is empty,
for all returns TRUE, and there exists returns FALSE. The value of the
quantified variable v after the test-loop is that of the member that terminated the loop. For for all, it is thus the member of the control-list that
makes E false. For there exists, it is the member that makes E true.
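A minimal Python sketch of the finite quantifiers, including early termination and the empty control-list cases, is given below. The names for_all and there_exists are hypothetical, UNKNOWN is modeled as None, and each function returns both the three-valued result and the final value of the quantified variable (None when the control-list is empty, which is an assumption).

```python
UNKNOWN = None  # stand-in for the CLINSYS UNKNOWN value


def for_all(control_list, pred):
    """Return (result, v): the three-valued result and the final value of v."""
    result, v = 1, None
    for v in control_list:
        e = pred(v)
        if e is UNKNOWN:
            result = UNKNOWN  # no longer provably TRUE
        elif not e:
            return 0, v  # the first FALSE terminates the loop
    return result, v


def there_exists(control_list, pred):
    """Return (result, v): the three-valued result and the final value of v."""
    result, v = 0, None
    for v in control_list:
        e = pred(v)
        if e is UNKNOWN:
            result = UNKNOWN  # no longer provably FALSE
        elif e:
            return 1, v  # the first TRUE terminates the loop
    return result, v
```

When the loop terminates early, the returned v is the member that decided the result: a member making E false for for_all, and a member making E true for there_exists.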
Forming sets of subject numbers. As mentioned above, the construction
of sets of subject numbers is the main device by which retrieval is done
in CLINSYS. There are a group of set operations provided in CLINSYS
to allow sets of subject numbers to be constructed and manipulated. All
the subjects and sets specified in a set operation must belong to the same
subject-class. Inter-subject-class set operations are not allowed.
Two sets may be tested for equality using the equal predicate (==) or the
not-equal predicate (!=). If two sets contain the same elements they are
equal, otherwise they are not equal. The overloaded binary infix operators
+, ∗, and − denote the union, intersection and difference of two sets, respectively. The predicate member(x,S) tests the membership of the element
x in the set S. If x ∈ S, member(x,S) = TRUE, otherwise member(x,S) =
FALSE.
The function subjects(v) returns the set of subjects that have values for
the database variable v. The function all(sc) returns the set of all the
subjects of the subject-class sc. When all is written without an argument,
the currently-selected subject class is implied. Since all(sc) may be a very
large set, it is desirable to employ an optimization strategy which avoids
the explicit construction of the set all(sc).
The subset selection operator S|B creates the subset of all the subjects of a
given set S with a certain property. This property is expressed by a boolean
expression B which may contain one or more functions with an implied
subject-number argument.
A subject set is specified in traditional mathematical notation as “{x ∈ S|B(x)}”.
This is read as “the set of all subjects x in the subject set S such that the
selection predicate B(x) is TRUE”. In CLINSYS, this set is written as S|B,
where B is an implicit-parameter form of the selection predicate B(x). The
expression S|B is called a subject set retrieval expression. For example, we
may write: all|age>30 to specify the set of all subjects whose (latest)
age-value is greater than 30.
An implicit-parameter-form expression is an expression in which database
variable access functions, such as age, and database retrieval functions, such
as first(), may appear without a subject number argument. Such database
variables and database retrieval functions are interpreted as having an implicit subject number argument which is supplied when B is being evaluated
to construct a subject set corresponding to the subject set retrieval expression in which B appears.
If we wanted to construct the set {x ∈ S |B(x) is TRUE or UNKNOWN},
we could write S |(B!=FALSE).
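The difference between S|B and S|(B!=FALSE) can be illustrated with a small Python sketch. Everything here is hypothetical scaffolding: the age dictionary stands in for the latest stored age values, age_gt_30 plays the role of the predicate age>30 with its implicit subject argument made explicit, and UNKNOWN is modeled as None.

```python
UNKNOWN = None  # stand-in for the CLINSYS UNKNOWN value

# Hypothetical latest age values per subject number; subject 3 has no age.
age = {1: 42, 2: 25, 3: UNKNOWN}


def age_gt_30(s):
    """Selection predicate B(x) for 'age>30', three-valued."""
    a = age.get(s, UNKNOWN)
    return UNKNOWN if a is UNKNOWN else int(a > 30)


def select(subjects, pred, keep_unknown=False):
    """S|B keeps subjects where B is TRUE; keep_unknown mimics S|(B!=FALSE)."""
    kept = set()
    for s in subjects:
        b = pred(s)  # the subject number is supplied by the selection loop
        if b == 1 or (keep_unknown and b is UNKNOWN):
            kept.add(s)
    return kept
```

Here select({1, 2, 3}, age_gt_30) keeps only subject 1, while passing keep_unknown=True also keeps subject 3, whose age is unknown.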
For example, to construct the set Q of all patients who are also doctors,
we may use the assignment statement Q= all(patients) | there exists
d in all(doctors) :: patients.socsec == doctors.socsec(d).
Special Operators for Time-Based Queries. CLINSYS contains several specialized operators for expressing predicates involving time; these are the
in-predicate, the during-predicate, and the when function.
The in-predicate is a special time-aware predicate written B in I, where
B denotes a boolean expression and I denotes a time-interval table. Recall
that a time-interval table is a 2-column table, each row of which specifies
a time-interval. In the special case where a single time-interval is desired,
the notation [a,b] may be used. This is the constructor for a 1-row table
with elements a and b. The single-row time-interval table [a, a + 1] may be
written as [a].
The in-predicate B in I is TRUE if there exists at least one time point in I
for which B is TRUE. The in-predicate B in I is FALSE if there is no time
point in I for which B is TRUE or UNKNOWN, but there exists at least
one time point for which B is FALSE. Otherwise, if there is no time point
in I for which B is TRUE and there exists a time point in I for which B is
UNKNOWN, then the in-predicate B in I is UNKNOWN.
If no variables occur in B, or if all the variables in B are time-independent,
then B is evaluated exactly once, since it is time-independent. Any time-independent database variables, and all internal variables, which are necessarily time independent, are assumed to have their specified values at all
times.
The during-predicate is a special time-aware predicate written B during
I, where B denotes a boolean expression and I denotes a time-interval table.
The during-predicate B during I is FALSE if there exists at least one time
point in I for which B is FALSE. The during-predicate B during I is TRUE
if there is no time point in I for which B is FALSE or UNKNOWN, but there
exists at least one time point in I for which B is TRUE. Otherwise, if there
is no time point in I for which B is FALSE and there exists a time point
in I for which B is UNKNOWN, then the during-predicate B during I is
UNKNOWN. If no variables occur in B, or if all the variables in B are time-independent, then B is evaluated exactly once, since it is time-independent.
Note that not((not B) in I) = B during I.
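The in and during rules can be sketched in Python by assuming that B has already been evaluated at a finite collection of time points covering I, giving a list of three-valued results. The names in_pred, during_pred and not3 are hypothetical, UNKNOWN is modeled as None, and returning UNKNOWN for an empty list of time points is an assumption, since the text does not define that case.

```python
UNKNOWN = None  # stand-in for the CLINSYS UNKNOWN value


def in_pred(values):
    """B in I, given B's three-valued results at the time points of I."""
    values = list(values)
    if any(v == 1 for v in values):
        return 1  # TRUE at some time point
    if any(v is UNKNOWN for v in values):
        return UNKNOWN  # never TRUE, but UNKNOWN somewhere
    return 0 if values else UNKNOWN  # FALSE everywhere (empty: assumption)


def during_pred(values):
    """B during I, given B's three-valued results at the time points of I."""
    values = list(values)
    if any(v == 0 for v in values):
        return 0  # FALSE at some time point
    if any(v is UNKNOWN for v in values):
        return UNKNOWN  # never FALSE, but UNKNOWN somewhere
    return 1 if values else UNKNOWN  # TRUE everywhere (empty: assumption)


def not3(v):
    """Three-valued NOT."""
    return UNKNOWN if v is UNKNOWN else 1 - v
```

The duality between the two predicates can be checked on any sample: negating each result, applying in_pred, and negating again gives the same value as during_pred.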
For example, in order to see if the blood pressure of subject #1 was greater
than 100 in the given day 10/12/90, we may use the predicate bp(#1) >100
in <10/12/90>. Note that the date <10/12/90> is automatically converted to the interval [<10/12/90>,<10/12/90 23:59:59>].
To ask if more than 10 subjects simultaneously had blood pressure greater
than 100 in a given time interval I, we may use the predicate card(all |
bp>100 in I)>10.
Suppose that subject 1 has only the following values for A and B. At time
<1-JUN-89> A=2 and at time <2-JUN-89> B=4. Then note that {#1}
| A<3 and B>3 in [<1-JAN-89>,<1-JAN-90>] is empty because (A<3
and B>3) is not TRUE at any single time point in [<1-JAN-89>,<1-JAN-90>].
In order to retrieve all male subjects whose white-blood count (wbc) is
less than 3000 during the 10 days following each chemotherapy treatment,
we construct the set m with the following assignment statement. m=(all |
sex=MALE)| for all t in (table(CHEMOTHERAPY) col 1) :: (wbc<3000
during[t,t+10])
An in-predicate or during-predicate may involve database access functions
with missing arguments which are supplied from an enclosing context established by a for all, there exists, or subset selection operation.
For various builtin functions such as first(), missing optional arguments
are supplied in a way which depends upon the context of the function.
Notably, in a for all or there exists predicate, in a set expression S|B, and
in during-clauses and in-clauses, the supplied value for a missing argument
is provided from the dynamic context established by these operators.
In the subset-selection ( | ) and extract operations, only subject numbers may be supplied as implicit arguments; in the in/during operations, only time values may be. The missing implicit arguments are supplied as values which depend
upon the context in which the implicit-argument-function calls occur. Several operation-defined contexts that supply the implicit arguments may be
nested. In this case the most recent implicit argument of the type required, is
used. For example consider: R | ((cholesterol > 200) and (bp > 100
during I)) in J. Here both the database variable bp and the database
variable cholesterol take an implicit subject-number argument supplied
by the subset-selection operator ( | ) from the subject set R. However the
implicit time-value argument of bp and the implicit time-value argument of
cholesterol are different. bp takes the time-value from the time interval I
supplied by the during operator, while cholesterol takes the time-value
from the time interval J supplied by the in operator.
CLINSYS contains a useful time-aware operator called the when-operator
which produces time-interval tables. The when operator takes a boolean
expression B as its argument, and returns the list of time-intervals, as a
time-interval-table, during which the given expression B is true (i.e. not 0
and not UNKNOWN).
A list of time-intervals is a sequence of pairs of real numbers to be interpreted as time values. In CLINSYS such sets are represented as lists of
closed time-intervals where a time-interval list is, in turn, represented by
a corresponding two-column table. The representation of a set S of time
values as a list of closed time intervals entails taking the closure of the set S.
The implementation of the when operator compensates for this by producing
the closed interval [p, q − ε] in place of the half-open interval [p, q) for a small
value ε.
The when operator satisfies the following identities.
when(a and b) ≡ when(a) ∩ when(b)
when(a or b) ≡ when(a) ∪ when(b)
when(not a) ≡ when(a == 0) ≡ complement(when(a)) ∩ when(a ≠ UNKNOWN)
when(for all x in S :: E(x)) ≡ ⋂_{x∈S} when(E(x))
when(there exists x in S :: E(x)) ≡ ⋃_{x∈S} when(E(x))
In general, when(b) is the set of time values for which b = 1; thus computing when(b) is a root-finding operation which consists of finding all the
time-values t such that b(t)=1. When the impute rules used with the
database variables occurring in the expression b correspond to having each
of these variables assume a finite number of values, as for example with a
step-function-interpolation impute rule, then the required time-values can
be found by enumerating the time points corresponding to all distinct co-temporal combinations of the database variable values and checking the
expression b at each of these time-points.
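For the step-function case, the enumeration described above can be sketched in Python as follows. This is a simplified illustration, not the CLINSYS algorithm: each variable is a time-sorted list of (time, value) pairs, a value is assumed to hold from its time-stamp until the next one, the search is restricted to a caller-supplied window [start, end], EPS is an assumed value for the small closing offset, and a variable with no value yet is treated as unknown (so the expression is not TRUE there).

```python
import bisect

EPS = 1e-6  # assumed "small value" used to close half-open intervals


def step_value(series, t):
    """Step-function impute rule: latest stored value at or before t."""
    times = [ts for ts, _ in series]
    i = bisect.bisect_right(times, t)
    return series[i - 1][1] if i else None


def when(pred, series_list, start, end):
    """Closed intervals inside [start, end] on which pred(values) is TRUE."""
    # Candidate points: the window start plus every stored time inside it.
    cuts = sorted({start} | {ts for s in series_list for ts, _ in s
                             if start < ts <= end})
    rows = []
    for k, p in enumerate(cuts):
        values = [step_value(s, p) for s in series_list]
        if None in values or not pred(*values):
            continue  # unknown or FALSE on this segment
        is_last = k + 1 == len(cuts)
        q = end if is_last else cuts[k + 1] - EPS
        if rows and rows[-1][1] == p - EPS:
            rows[-1][1] = q  # extend the previous row across the cut
        else:
            rows.append([p, q])
    return rows
```

Because all variables are constant between consecutive cut points, checking the expression once per cut point is enough to decide it for the whole segment.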
For example, the set of all patients receiving a chemotherapy dose of 150 during the two weeks following surgery whose weight was greater than 200 whenever
their blood pressure was less than 120 can be constructed as: S = all
| dose==150 during [surgdate,surgdate+14] and weight>200 during when(bp<120).
A more elaborate example of the when operator is shown next; we compute Q as the table of time-intervals when every patient was receiving a
chemotherapy dose of at most 100 as follows: Q = when(for all s in all
:: dose(s) <= 100) .
Devising an efficient algorithm to compute when(b) where b contains complex predicates involving database variables using linear interpolation as
their impute rule is an open research problem.
The data extraction operators in CLINSYS are written: extract E1,...,En
from S and extractall V1,...,Vn from S.
Creating Tables. The extract operator evaluates the sequence of expressions E1 , E2 , . . . , En for each element of the set of subjects S to produce a
table of values. The rows of the result table correspond to the subjects in
the set S and the columns correspond to the given list of expressions.
The extractall form takes a list of database variables V1 , V2 , . . . , Vn as
arguments, and produces a table of times and values of the specified variables
for all times and every subject in the set S.
Both extract and extractall are implicit parameter forms, with a subject-number parameter x ranging over the set S. That is: extract v1+v2,v3
from S is short for: for x in S :: R = R & (v1(x)+v2(x) &’ v3(x)),
where & is the row-concatenation operator and &’ is the column-concatenation
operator. R is the final result table.
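The loop that extract abbreviates can be written out in Python as a rough sketch. The name extract and the dictionaries v1, v2, v3 are hypothetical stand-ins for database-variable access functions; each expression is passed as a callable of the subject number, and visiting the subjects in sorted order is an assumption, since the text does not state the row order.

```python
# Hypothetical latest values of three database variables, keyed by subject.
v1 = {1: 10, 2: 20}
v2 = {1: 1, 2: 2}
v3 = {1: "a", 2: "b"}


def extract(expressions, subjects):
    """One row per subject, one column per expression (a callable of x)."""
    table = []
    for x in sorted(subjects):
        row = [e(x) for e in expressions]  # column concatenation (&')
        table.append(row)                  # row concatenation (&)
    return table


# Counterpart of: extract v1+v2,v3 from S
R = extract([lambda x: v1[x] + v2[x], lambda x: v3[x]], {1, 2})
```

The implicit subject-number parameter of the CLINSYS form shows up here as the explicit argument x of each callable.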
For example, we can construct a table H of four columns holding the values
of 1, x + y ∗ z, age, and wbc with one row for each subject with weight > 100
sometime between 9/12/90 and 12/12/90 as follows: H=extract 1,x+y*z,age,wbc
from (all| weight>100 in [<9/12/90>,<12/12/90>]) .
As another example, the statement L = extract timemax(var1),var1 from
all extracts the latest time-value pairs of var1 for all the subjects.
Tables may be interpreted as relational tables. Row concatenation, column
concatenation, row selection and column selection as well as the usual projection and join operators of the relational algebra are provided in CLINSYS
in order to manipulate tables. At this level CLINSYS has no special facilities for time-stamped data; time values are merely another data-type which
may occur as a table element.
Note that the following two forms are identical:
extractall V1,...,Vn from S
extract table(V1) &+ ... &+ table(Vn) from S
where X &+ Y stands for ujoin(X,Y). This is because table(v) produces
a 2 column table, where column 1 holds the time-values for the database
variable v and column 2 holds the associated values.
Conclusion
CLINSYS is a historical time-aware database management system that provides flexible time-queries, based on attaching time-stamps to data. Time-stamps are associated with attribute values. Not all attributes of a tuple
change at the same time; thus, associating time-values with tuples is not as
powerful.
The meaning of a time-stamp varies according to the context, i.e. according
to the attribute value it is describing. The use of time-stamps associated
with attribute values together with associated impute rules gives CLINSYS
the ability to model reality in many practical situations.
CLINSYS retains the semantics of the standard database operators. CLINSYS provides special time-aware query operators that result in snapshot
relations. The standard database operators can then be used on these snapshot relations in the customary way. Redefining, and thereby extending, the
semantics of the standard database operators is not trivial and there are
many ways this can be done, not all of which will match the user’s notion
of these extended operators. We feel that it is better to leave the standard
operators used in snapshot databases unchanged and to provide a new set
of operators that deal directly with the added time dimension.
References
[BENZVI 82] J. Ben-Zvi. The Time Relational Model. PhD thesis, Computer Science Department, UCLA, 1982.
[BLUM 82] Blum, Robert L.: “Discovery, Confirmation, and Incorporation of Causal Relationships from a Large Time-Oriented Clinical Data
Base: The RX Project”, Computers and Biomedical Research, Vol. 15,
pp. 164–187, 1982.
[CLIFFORD 87A] J. Clifford and A. Croker. The historical relational data
model (hrdm) and algebra based on lifespans. In Proceedings of the International Conference on Data Engineering, pages 528–537, Los Angeles, CA,
Feb. 1987. IEEE Computer Society, IEEE Computer Society Press.
[CLIFFORD 83] J. Clifford and D.S. Warren. Formal semantics for time in
databases. ACM Transactions on Database Systems, 8(2):214–254, June 1983.
[GADIA 88A] S.K. Gadia and C.S. Yeung. A generalized model for a relational temporal database. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 251–259, Chicago, IL, June
1988. ACM.
[KLOPPROGGE 81] M.R. Klopprogge. Term: An approach to include
the time dimension in the entity-relationship model. In Proceedings of the
Second International Conference on the Entity Relationship Approach, pages
477–512, Washington, DC, Oct. 1981.
[KNOTT 81] Knott, G.D., Procedures for Managing Extensible Array Files,
Software Practice and Experience, Vol. 11, pp. 63–84, 1981.
[LAYARD 83] Layard, Maxwell W., McShane, Dennis J.: “Applications of
Medlog, A microcomputer-Based System for Time-Oriented Clinical Data”,
Proc. Seventh Annual IEEE Symposium on Computer Applications in Medical Care, pp. 731–734, 1983.
[MCKENZIE 91B] E. McKenzie and R. Snodgrass. An evaluation of relational algebras incorporating the time dimension in databases. ACM Computing Surveys,
23(4):505–543, Dec. 1991.
[NAVATHE 89] S. B. Navathe and R. Ahmed. A temporal relational model
and a query language. Information Sciences, 49:147–175, 1989.
[SNODGRASS 86A] R. Snodgrass and I. Ahn. Temporal databases. IEEE
Computer, 19(9):35–42, Sept. 1986.
[TANSEL 86B] A.U. Tansel. Adding time dimension to relational model and
extending relational algebra. Information Systems, 11(4):343–355, 1986.
[TANSEL 87A] A.U. Tansel. A statistical interface for historical relational
databases. In Proceedings of the International Conference on Data Engineering, pages 538–546, Los Angeles, CA, Feb. 1987. IEEE Computer
Society, IEEE Computer Society Press.
[THOMPSON 91A] P.M. Thompson. A Temporal Data Model Based on
Accounting Principles. PhD thesis, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada, Mar. 1991.