Calhoun: The NPS Institutional Archive
Theses and Dissertations, Thesis and Dissertation Collection
1989-06

Database creation and/or reorganization over multiple database backends
McGhee, Deborah A.
Monterey, California. Naval Postgraduate School
http://hdl.handle.net/10945/25800
NAVAL POSTGRADUATE SCHOOL
Monterey, California

DATABASE CREATION AND/OR REORGANIZATION
OVER MULTIPLE DATABASE BACKENDS

by

Deborah A. McGhee
June 1989

Thesis Advisor: David K. Hsiao

Approved for public release; distribution unlimited.
UNCLASSIFIED
Security Classification of this page

REPORT DOCUMENTATION PAGE

1a  Report Security Classification: UNCLASSIFIED
3   Distribution/Availability of Report: Approved for public release;
    distribution is unlimited.
6a  Name of Performing Organization: Naval Postgraduate School
6b  Office Symbol (If Applicable): 52
6c  Address (city, state, and ZIP code): Monterey, CA 93943-5000
7a  Name of Monitoring Organization: Naval Postgraduate School
7b  Address (city, state, and ZIP code): Monterey, CA 93943-5000
11  Title (Include Security Classification): DATABASE CREATION AND/OR
    REORGANIZATION OVER MULTIPLE DATABASE BACKENDS
12  Personal Author(s): McGhee, Deborah A.
13a Type of Report: Master's Thesis
14  Date of Report (year, month, day): June 1989
15  Page Count: 105
16  Supplementary Notation: The views expressed in this thesis are those of
    the author and do not reflect the official policy or position of the
    Department of Defense or the U.S. Government.
18  Subject Terms: Multi-Lingual Database System (MLDS), Multi-Model Database
    System (MMDS), Multi-Backend Database System (MBDS)
19  Abstract: To create a record in a database, one uses the INSERT command.
    However, in the Multi-Backend Database System (MBDS), the INSERT command
    only inserts one record at a time. When creating very large databases
    consisting of thousands or millions of records, the use of the INSERT
    command is a time-consuming process. Once a database is created, some of
    the records of the database may be tagged for deletion. MBDS uses the
    DELETE command to tag these records. Over some period of time, those
    records tagged for deletion should be physically removed from the
    database. Hence, removing tagged records is in essence creating new
    databases from untagged records.
        In this thesis, we present a methodology to efficiently create very
    large databases in gigabytes on parallel computers and to reorganize them
    when they have been tagged for deletion. Specifically, we design a
    utility program to by-pass the system's INSERT command, to load the data
    of the database directly onto disks and create all the necessary base
    data and meta data of the database.
20  Distribution/Availability of Abstract: unclassified/unlimited
21  Abstract Security Classification: UNCLASSIFIED
22a Name of Responsible Individual: Prof. David K. Hsiao
22b Telephone (Include Area code): (408) 646-2253
22c Office Symbol: Code 52Hq

DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted;
all other editions are obsolete.    S/N 0102-LF-014-6601    UNCLASSIFIED
Approved for public release; distribution is unlimited.

Database Creation and/or Reorganization over Multiple
Database Backends

by

Deborah A. McGhee
Lieutenant, United States Navy
B.S., University of South Carolina, 1982

Submitted in partial fulfillment of the
requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

from the

NAVAL POSTGRADUATE SCHOOL
June 1989
ABSTRACT

To create a record in a database, one uses the INSERT command. However, in the Multi-Backend Database System (MBDS), the INSERT command only inserts one record at a time. When creating very large databases consisting of thousands or millions of records, the use of the INSERT command is a time-consuming process. Once a database is created, some of the records of the database may be tagged for deletion. MBDS uses the DELETE command to tag these records. Over some period of time, those records tagged for deletion should be physically removed from the database. Hence, removing tagged records is in essence creating new databases from untagged records.

In this thesis, we present a methodology to efficiently create very large databases in gigabytes on parallel computers and to reorganize them when they have been tagged for deletion. Specifically, we design a utility program to by-pass the system's INSERT command, to load the data of the database directly onto disks, and to create all the necessary base data and meta data of the database.
TABLE OF CONTENTS

I.    INTRODUCTION                                                    1
      A. MOTIVATION                                                   1
      B. BACKGROUND                                                   2
         1. The Multi-Lingual Database System                         4
         2. The Multi-Backend Database System                         7
      C. THESIS ORGANIZATION                                          8

II.   THE KERNEL SYSTEM                                              11
      A. THE KERNEL DATA MODEL AND THE KERNEL DATA LANGUAGE          11
         1. The Attribute-Based Data Language                        11
         2. The Attribute-Based Data Model                           14
            a. The Meta Data                                         14
            b. The Base Data                                         15
      B. THE TEMPLATE SPECIFICATION                                  17
         1. Template Descriptions                                    19
         2. Descriptor Specifications                                19
         3. Database Records                                         22
      C. LIMITATIONS                                                 24
      D. POSSIBLE SOLUTIONS                                          29
         1. One-Pass Approach                                        29
         2. Two-Pass Approach                                        30
         3. The Selected Solution                                    31

III.  THE PROPOSED DESIGN                                            34
      A. THE DESIGN OF DATABASE CREATION/REORGANIZATION              34
      B. MULTIPLE PHASES OF THE ALGORITHM                            35
         1. Phase-One                                                35
         2. Phase-Two                                                41
      C. SEQUENTIAL VS. PARALLEL OPERATIONS                          43
      D. THE TIME-COMPLEXITY ANALYSIS                                46
      E. DETAILED DESIGN SPECIFICATION OF ALGORITHM                  51

IV.   IMPLEMENTATION ISSUES                                          56

V.    CONCLUSIONS                                                    62
      A. A REVIEW OF THE RESEARCH                                    63
      B. SOME OBSERVATIONS AND INSIGHT                               64

APPENDIX A - ATTRIBUTE TABLE PROGRAM SPECIFICATIONS                  65
APPENDIX B - DESCRIPTOR TABLE PROGRAM SPECS                          72
APPENDIX C - CLUSTER DEFINITION TABLE PROGRAM SPECS                  75
APPENDIX D - RECORD CHECKER PROGRAM SPECS                            79
LIST OF REFERENCES                                                   92
INITIAL DISTRIBUTION LIST                                            94

LIST OF FIGURES

Figure 1.1  The Multi-Lingual Database System                         5
Figure 1.2  Multiple Language Interfaces for the Same KDS             7
Figure 1.3  The Multi-Backend Database System                         9
Figure 2.1  Sample Directory Tables                                  16
Figure 2.2  Sample Record Relationship                               18
Figure 2.3  Sample Template File                                     20
Figure 2.4  Sample Descriptor File                                   23
Figure 2.5  Sample Record File                                       25
Figure 3.1  Phase-One                                                40
Figure 3.2  Phase-Two                                                44
I. INTRODUCTION

A. MOTIVATION

In order to create a record in a database, an INSERT command is used. Inserting records into any database requires input/output (I/O) operations. The record is read from one secondary storage, processed by the database system, then stored onto another secondary storage. We notice that there are three operations per record. The INSERT command is usually efficient for single-record insertion, as well as for a small batch of insert operations. The problems arise when massive amounts of records are to be loaded, generating thousands upon thousands of I/O requests. When this occurs, the loading of a new database is a time-consuming process. Hence, the building of a utility package that by-passes the system's INSERT command and loads the data directly onto the secondary storage would be a viable solution to help create databases more quickly.

Over the course of time, some records in the database are no longer required and are deleted from the database. Most operational systems do not physically remove records at the time of deletion. Instead, systems tag them so that they may be deleted later. Most systems employ a "garbage collection" routine, run at non-prime times, which identifies all the unused/unnecessary cells (i.e., storage space occupied by tagged records) and returns them to free storage [Ref. 2:p. 432]. Although the garbage-collection process imposes additional overhead and complexity on the system [Ref. 1:p. 348], it does not remove a tagged record at the instant the record is identified for deletion; the record is only tagged, and its space is not reclaimed at every instance of deletion.

The recognition and disposal [Ref. 3:p. 396] of tagged records within any database is essential to the optimization of the records on the secondary storage. Good optimization produces good data organization, which, in turn, provides efficient data retrieval. Records that are tagged for deletion are removed from the database, and the active records are reorganized on the secondary storage to fill the space vacated by the removed records. Removing tagged records is in essence creating a new database from the untagged records.

This thesis will provide a detailed design specification for a utility package capable of creating and/or reorganizing large databases over multiple disks. The development of this utility package is in direct support of the multi-backend database system (MBDS), which will be discussed in more detail in the next section. This utility package will provide a more efficient means for generating new databases, as well as provide a garbage-collection capability.
B. BACKGROUND

Over the past two decades, database design and implementation methods were fairly standard. The general approach was to specify a data model, define a data language for that particular model, and develop a system to manage and execute transactions written in the data language. This approach led to the development of homogeneous database systems, which restrict the user to a single data model with its corresponding data language. Some examples of systems using the homogeneous database system approach are IBM's Information Management System (IMS), which supports the hierarchical data model and the Data Language I (DL/I); Sperry Univac's DMS-1100, supporting the network data model and the CODASYL data language; IBM's SQL/Data System, supporting the relational data model and the Structured English Query Language (SQL); and Computer Corporation of America's Local Data Manager (LDM), which supports the functional data model and the Daplex data language.

A revolutionary approach to database management system development is the multi-lingual database system (MLDS), which eliminates the restrictions discussed previously [Ref. 4]. The design of MLDS affords the user the ability to access and manipulate several different databases, using their corresponding data models with their respective data languages. The major design goal of MLDS is the development of a system which can be accessed via different data models and their model-based data languages (e.g., relational/SQL, hierarchical/DL/I, network/CODASYL, and functional/Daplex). MLDS will function as a heterogeneous collection of databases vice a single database system. The many advantages of MLDS are its ability to support a wide variety of databases using different data models and languages, economy and efficiency of hardware upgrades, and reusability of database transactions developed on a conventional system.

MLDS has taken further steps toward a more complete utilization of its resident databases. Currently, all data models are allowed access to the database only through their corresponding languages: hierarchical databases are only accessible through DL/I, network databases are only accessible through CODASYL, functional databases are only accessible through Daplex, and relational databases are only accessible through SQL. MLDS extends the concept of a multi-lingual database system to a multi-model database system (MMDS), in which the databases based on different data models can be accessed by data languages based on different data models. This type of environment allows the user of one data model to access and manipulate data stored in another data model. The obvious benefit of the cross-access of databases based on different models is true sharing of data over multiple databases. The following subsections will give the reader an overview of the structure and function of MMDS, and also introduce the reader to the architecture of the multi-backend database system (MBDS). MBDS is the database system used by MLDS to support database transaction processing.
1. The Multi-Lingual Database System

A block diagram of the multi-lingual database system is shown in Figure 1.1. To access or modify the database, the user issues transactions through the language interface layer (LIL) using a user data language (UDL), i.e., the data language written for that particular user data model (UDM). LIL routes the transaction to the kernel mapping system (KMS). KMS performs one of two tasks, depending on the type of database transaction requested.

If the transaction specified by the user is for the creation of a new database, KMS transforms the UDM database definition to an equivalent kernel data model (KDM) database definition. The KDM database definition is then sent to the kernel controller (KC), which routes the request to the kernel database system (KDS) for processing. Upon completion, KDS notifies KC, which in turn forwards its results to the kernel formatting system (KFS). KFS transforms the results from the KDM structure to its UDM equivalent via LIL. The user is then informed that the request has been processed and more requests can be accepted.

[Figure 1.1 is a block diagram of the system modules, data models, data languages, and information flow, with the following key:]

UDM : User Data Model
UDL : User Data Language
LIL : Language Interface Layer
KMS : Kernel Mapping System
KC  : Kernel Controller
KFS : Kernel Formatting System
KDM : Kernel Data Model
KDL : Kernel Data Language
KDS : Kernel Database System

Figure 1.1  The Multi-Lingual Database System

If the transaction specified by the user is a database manipulation request, KMS translates the UDL transaction to its kernel data language (KDL) equivalent and sends the KDL transaction to KC. KC sends the KDL transaction to KDS for execution. Once processing is complete, KDS returns the results of the transaction in KDM form to KC, which sends the results in KDM form to KFS. The KFS is responsible for transforming the results from the KDM form to the UDM form; it then returns the transformed data to the user via LIL.
The LIL, KMS, KC and KFS for a data model and its language are known collectively as a language interface. Four similar modules are required for each language interface of the MLDS; that is, there is a separate and unique set of LIL, KMS, KC and KFS for each of the various language interfaces. Currently, these models and their corresponding languages are relational/SQL, hierarchical/DL/I, network/CODASYL-DML, and functional/Daplex. On the other hand, KDS is a single, major component shared and accessed by all the language interfaces. Figure 1.2 depicts this concept.

The attribute-based data model and the attribute-based data language (ABDL) have been selected and implemented as the KDM and the KDL, respectively, for MLDS. A series of reports [Refs. 5,6,7] shows how transactions in the various user data languages are transformed into the corresponding ABDL transactions. More recent works provide a complete set of algorithms for the data-language translations to ABDL from SQL [Ref. 8], from DL/I [Ref. 9], from CODASYL-DML [Ref. 10], and from Daplex [Ref. 11]. The language interface software has been completed for the relational [Ref. 12], hierarchical [Ref. 13], and network [Ref. 14] data models. The language interface software for the functional model has not been completed at the time of this thesis, but the detailed design for its implementation has been documented [Ref. 15]. Additionally, there is also research on the language interface for an object-oriented data model and its specified data language: the Graphic LAnguage for Database (GLAD) model, using the Actor language, currently under development at the Naval Postgraduate School. Associate Professor of Computer Science Thomas C. Wu is in charge of this project.

Figure 1.2  Multiple Language Interfaces for the Same KDS
2. The Multi-Backend Database System

To overcome the performance and upgrade problems associated with the traditional approach to database system design, the multi-backend database system (MBDS) was designed. MBDS has solved these problems by using multiple backend processors connected in parallel to a single controller. Each backend has its own hardware, software and disk system, as shown in Figure 1.3. Hardware and software are not unique to each backend, but are replicated over all backends. The backend processors are connected to the backend controller via a communication bus. The backend controller can be accessed by the user either directly or through the host computer. The backend controller is responsible for supervising the execution of database transactions.

Performance gains are realized by increasing the number of backend processors in the system. If the database size remains constant, then the response time for the user transaction is inversely proportional to the number of backend processors. If the number of backend processors increases in direct proportion to the database size, then MBDS produces nearly invariant response time for the same transaction. For a more detailed discussion on MBDS, the reader is referred to [Refs. 16 and 17].
C. THESIS ORGANIZATION

Under the current mode of operation there is no efficient manner in which to generate or reorganize large databases. The process to actually insert a record into the database is quick and simple. Each insert operation generates a set of I/O operations to determine where a record should be inserted. This in itself is not bad, but when performed several thousand times, it can take several days to load a new and large database. When the delete operation is used, records are not physically removed from the database, but are actually tagged for deletion. Over a period of time, these tagged records grow in numbers. Current operations are satisfactory for single-record insertions, but tagged records should be periodically collected and removed from the database to obtain the maximum use of disk space for active records. This thesis will provide an algorithm which is flexible enough to create large databases and provide a "garbage collection" feature to get rid of the tagged records and reorganize the remaining active records.

[Figure 1.3 is a block diagram showing backend processors 1 through n, each with its own backend store, connected with the backend controller (and, through it, a host) over a communications bus.]

Figure 1.3  The Multi-Backend Database System
In Chapter II, we look at the kernel software and describe the system's limitations in regard to creating new databases and deleting records from a database. We also describe two approaches for an algorithm designed to create and/or reorganize large databases over multiple backends, then select the appropriate approach. Chapter III outlines the multiple phases of the algorithm. A complexity analysis of the new algorithm versus the complexity analysis of the insert operation is given, indicating the efficiency of the new algorithm. Also included in this chapter is a pseudo-code design of the actual algorithm. In Chapter IV, we discuss implementation issues. Finally, in Chapter V, we make our conclusions about the proposed design. Appendices A, B, C, and D provide the program design specifications for building the Attribute Table, the Descriptor-to-Descriptor-Id Table, and the Cluster Definition Table, and the program specifications for the Record Error Checker routine, respectively.
II. THE KERNEL SYSTEM

In this chapter we discuss the overall configuration of the kernel system. In the first section, we discuss the kernel data language and the kernel data model. In the second section, we discuss the specifications for the input files to database creation (i.e., template, descriptor and record). In the third section, we discuss the limitations of the system with respect to database creation and garbage collection. In the final section we discuss possible solutions and select a proposed solution for overcoming the limitations discussed in the third section.

A. THE KERNEL DATA MODEL AND THE KERNEL DATA LANGUAGE

The kernel system is composed of two parts: the kernel data model and its model-based data language. The kernel data model used in the multi-backend database system (MBDS) is the attribute-based data model. The kernel data language that supports the attribute-based data model is the attribute-based data language (ABDL). The next two sections introduce the concepts and terminology of the kernel system.
1. The Attribute-Based Data Language

ABDL supports five primary database operations: INSERT, DELETE, UPDATE, RETRIEVE and RETRIEVE-COMMON. A request in ABDL is a primary operation with a qualification. A qualification is used to specify the part of the database on which to operate.

The INSERT command is used to insert a new record into the database. The qualification of the INSERT request is a list of keywords and a record body being inserted. For example, the following INSERT command

INSERT (<FILE,USCensus>, <CITY,Cumberland>, <POPULATION,4000>)

will insert a record into the US Census file for the city of Cumberland with a population of 4,000.

The DELETE request is used to remove one or more records from the database. The qualification of a delete request is a query. A query, in the DELETE operation, specifies which records in the database will be deleted. For example, the following DELETE command

DELETE ((FILE = USCensus) and (POPULATION > 100000))

will delete all records from the US Census file with a population greater than 100,000.

The UPDATE request is used to modify records of the database. The qualification of the UPDATE request consists of two parts: the query and the modifier. The query specifies which records of the database are to be modified, and the modifier specifies how the record to be updated will be changed. For example, the following UPDATE command

UPDATE (FILE = USCensus) (POPULATION = POPULATION + 5000)

will modify all records in the US Census file by increasing all populations by 5,000. In this example the query is (FILE = USCensus) and the modifier is (POPULATION = POPULATION + 5000).

The RETRIEVE request is used to retrieve records in the database. The RETRIEVE request consists of three parts: the query, a target list, and a by-clause. The query identifies the records to be retrieved, and the target list consists of the attributes (fields in the record) whose values are to be output to the user. The by-clause, which is optional, is used to group records. Also, the target list may consist of an aggregate operation (i.e., COUNT, SUM, AVG, MIN, MAX) on one or more output attribute values. For example, the following RETRIEVE command

RETRIEVE ((FILE = USCensus) and (POPULATION > 50000)) (CITY)

will output the city of all records in the US Census file with populations greater than 50,000.

The RETRIEVE-COMMON request is used to merge two files by a common attribute value. The RETRIEVE-COMMON request consists of three parts: the query, the target list, and an attribute in common between the two files. The query and target list are used as specified above, while the common attribute is the key used to join the two files. The RETRIEVE-COMMON command functions the same as the JOIN command in SQL. For example, the following RETRIEVE-COMMON command

RETRIEVE ((FILE = CanadaCensus) and (POPULATION > 50000)) (CITY)
COMMON (POPULATION, POPULATION)
RETRIEVE ((FILE = USCensus) and (POPULATION > 50000)) (CITY)

will find all records in the Canada Census file with a population greater than 50,000, find all records in the US Census file with a population greater than 50,000, identify the records from these files whose population figures are common, and return the names of the cities that have the same population figures. With these five simple commands, ABDL provides the user with the means to access and manipulate the database.
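The thesis gives no code for ABDL itself; purely as an illustration, a conjunctive query such as the DELETE qualification above can be modeled as a list of keyword predicates that a record's keywords must satisfy. The following minimal C sketch is our own (all names are invented, and attribute values are simplified to integers):

    #include <stdio.h>
    #include <string.h>

    /* An attribute-value pair (keyword), e.g. <POPULATION, 4000>. */
    typedef struct { const char *attr; long value; } Keyword;

    /* A keyword predicate, e.g. (POPULATION > 100000). */
    typedef enum { EQ, NE, LT, GT, LE, GE } RelOp;
    typedef struct { const char *attr; RelOp op; long value; } Predicate;

    static int satisfies(const Keyword *k, const Predicate *p)
    {
        switch (p->op) {
        case EQ: return k->value == p->value;
        case NE: return k->value != p->value;
        case LT: return k->value <  p->value;
        case GT: return k->value >  p->value;
        case LE: return k->value <= p->value;
        case GE: return k->value >= p->value;
        }
        return 0;
    }

    /* A conjunctive query matches when every predicate is satisfied
       by some keyword of the record with the same attribute name.   */
    static int record_matches(const Keyword *rec, int nkeys,
                              const Predicate *query, int npreds)
    {
        for (int i = 0; i < npreds; i++) {
            int hit = 0;
            for (int j = 0; j < nkeys; j++)
                if (strcmp(rec[j].attr, query[i].attr) == 0 &&
                    satisfies(&rec[j], &query[i]))
                    hit = 1;
            if (!hit) return 0;
        }
        return 1;
    }

    int main(void)
    {
        Keyword rec[] = { { "POPULATION", 150000 } };
        Predicate del[] = { { "POPULATION", GT, 100000 } };
        printf("matches: %d\n", record_matches(rec, 1, del, 1));  /* 1 */
        return 0;
    }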
2. The Attribute-Based Data Model

In the attribute-based data model, data is considered in the following constructs: the database, files, records, attribute sets, value domains, attribute-value pairs, attribute-value ranges, keywords, directories, directory keywords, non-directory keywords, keyword predicates, record bodies, and queries. These constructs are applied to two types of data: meta data and base data.

a. The Meta Data

The meta data is the stored data about the structure and form of the base data. The various meta data constructs form the directory of the database. The directory contains the following constructs: attributes, descriptors, and clusters.
An attribute is used to represent a category of certain common property of the base data. A descriptor is used to describe a unique range of values or a distinct value for the same attribute. A cluster is a group of records in which every record in the cluster is described by a unique set of descriptors. More specifically, the directory is organized in three tables: the attribute table (AT), the descriptor-to-descriptor-id table (DDIT), and the cluster-definition table (CDT). AT maps directory attributes to descriptors, while DDIT maps each descriptor to a unique descriptor-id. CDT maps descriptor-id sets to cluster-ids.

There are three classifications of descriptors. A type-A descriptor is a conjunction of less-than-or-equal-to and greater-than-or-equal-to predicates, where the same attribute appears in each predicate. A type-B descriptor consists of equality predicates. A type-C descriptor defines a set of type-C sub-descriptors. The type-C sub-descriptors are equality predicates defined over all unique attribute-values which exist in the database. A sample of each directory table is provided in Figure 2.1.
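Before turning to the base data, it may help to see operationally what it means for a descriptor to describe, or "cover", a keyword. The following C fragment is our own minimal sketch (not the thesis's code, which appears in the appendices): a type-A descriptor covers a numeric keyword that falls in its range, and a type-B descriptor covers a keyword with exactly its value.

    #include <stdio.h>
    #include <string.h>

    typedef enum { TYPE_A, TYPE_B } DescType;  /* a type-C value behaves like B once defined */

    typedef struct {
        DescType type;
        char     attr[32];
        long     low, high;     /* type-A: low <= value <= high */
        char     value[32];     /* type-B: exact value          */
    } Descriptor;

    /* Does descriptor d cover keyword <attr, val> (numeric) or <attr, sval> (string)? */
    static int covers(const Descriptor *d, const char *attr,
                      long val, const char *sval)
    {
        if (strcmp(d->attr, attr) != 0) return 0;
        if (d->type == TYPE_A) return d->low <= val && val <= d->high;
        return sval != NULL && strcmp(d->value, sval) == 0;
    }

    int main(void)
    {
        Descriptor d11 = { TYPE_A, "POPULATION", 0, 50000, "" };
        Descriptor d32 = { TYPE_B, "FILE", 0, 0, "USCensus" };
        printf("%d\n", covers(&d11, "POPULATION", 4000, NULL)); /* 1 */
        printf("%d\n", covers(&d32, "FILE", 0, "USCensus"));    /* 1 */
        return 0;
    }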
b. The Base Data

The base data is the actual raw data of the database, kept in files. The files contain groups of records that are related. An attribute-value pair (or keyword) is a member of the Cartesian product of the attribute name and the value domain of the attribute. For example, <POPULATION, 50000> is an attribute-value pair having 50,000 as the value for the population attribute. A record is composed of two parts: the attribute-value pairs and textual information (the record body). A record contains at most one attribute-value pair for each attribute defined in the database. Certain attribute-value pairs of a record or file are called directory keywords and are kept in the directory for identifying records or files.
Attribute     Attribute Type   DDIT Entry
POPULATION    A                D11
CITY          C                D21
FILE          B                D31

An Attribute Table (AT)

Descriptor                                ID
      1 <= POPULATION <=    50,000        D11
 50,001 <= POPULATION <=   100,000        D12
100,001 <= POPULATION <=   250,000        D13
250,001 <= POPULATION <= 1,000,000        D14
CITY = Columbus                           D21
CITY = Monterey                           D22
CITY = Toronto                            D23
CITY = Detroit                            D24
FILE = Canada Census                      D31
FILE = US Census                          D32

A Descriptor-to-Descriptor-ID Table (DDIT)

Desc-ID Set       ID    Address List
D11, D21, D32     C1    A1
D14, D22, D32     C2    A2
D12, D23, D32     C3    A3, A5, A6
D14, D24, D31     C4    A7, A8

A Cluster-Definition Table (CDT)

Figure 2.1  Sample Directory Tables
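As a rough C illustration of Figure 2.1 (ours, not from the thesis; field names and sizes are arbitrary, and the declarations ignore how the tables are distributed over the backends), the three directory tables might be declared along these lines:

    #define MAX_DESC_PER_SET 8
    #define MAX_ADDRS        8

    typedef enum { TYPE_A, TYPE_B, TYPE_C } DescType;

    /* Attribute table (AT): maps a directory attribute to its descriptors. */
    typedef struct {
        char     attr[32];               /* e.g. "POPULATION"              */
        DescType type;                   /* A, B or C                      */
        int      first_ddit_entry;       /* e.g. index of D11              */
    } ATEntry;

    /* Descriptor-to-descriptor-id table (DDIT): maps each descriptor
       (a value range or an exact value) to a unique descriptor-id.        */
    typedef struct {
        int  desc_id;                    /* e.g. D12                       */
        long low, high;                  /* range for type-A               */
        char value[32];                  /* exact value for type-B and -C  */
    } DDITEntry;

    /* Cluster-definition table (CDT): maps a descriptor-id set to a
       cluster-id and the addresses of the cluster's records.              */
    typedef struct {
        int  cluster_id;                 /* e.g. C3                        */
        int  desc_ids[MAX_DESC_PER_SET]; /* e.g. {D12, D23, D32}           */
        int  ndesc;
        long addrs[MAX_ADDRS];           /* address list, e.g. {A3, A5, A6} */
        int  naddrs;
    } CDTEntry;

Phase-One of the proposed algorithm, described in Chapter III, is essentially the construction of these three tables.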
Those attribute-value pairs not kept in the directory are called non-directory keywords. The records in the files of the database constitute the base data of the database. Below is an example of a record.

(<FILE,USCensus>, <CITY,Marina>, <POPULATION,50000>, {Moderate Climate})

Angle brackets, <,>, enclose an attribute-value pair; curly brackets, {,}, enclose the record body; and the record itself is enclosed in parentheses. The records of the database may be identified by keyword predicates. A keyword predicate is a tuple consisting of a directory attribute, a relational operator (i.e., =, !=, <, >, <=, >=), and an attribute-value. Combining keyword predicates in a disjunctive normal form characterizes a query of the database. The query specifies which records are to be accessed and/or modified.
B. THE TEMPLATE SPECIFICATION

The construction of a database is controlled by three input files of the database: the template file, the descriptor file, and the record file. These files are used by MBDS to form the directory and record structures of a database. The template file specifies the name of each template (file) in the database and its attributes. The descriptor file contains the directory attributes and their descriptor definitions. The records of the record file contain the actual data and form the record clusters.

To better describe the input files created, consider a purchasing system within some large business. Each record in the database consists of purchase-order, part, and supplier records. Figure 2.2 describes the relationship of the records. Figure 2.2a shows the record schema, and Figure 2.2b shows how this schema would be normalized. The process of normalization captures the data relationship from one file to another, as well as provides a form of representation that is suitable for the template specification.
(a) Schema for a Purchase Order System:

Purchase Order (Order Number, Supplier Number, Order Date, Delivery Date, Total Cost)
Supplier (Supplier Name, Supplier City)
Part (Part Number, Quantity, Part Price)

(b) Normalized Purchase Order System Schema:

Purchase-Order (order-number, supplier-number, order-date, delivery-date, total-cost)
Part (order-number, part-number, quantity, price)
Supplier (supplier-number, supplier-name, city)

Figure 2.2  Sample Record Relationship

Normalization requires that certain attributes appear in more than one file in order to form a unique identifier. The supplier number in the purchase-order file is repeated in the supplier file and combined with the supplier name. This duplication does not imply that the value is redundantly stored, because normalization is concerned with logical structures rather than physical organization.
1. Template Descriptions

The template file contains the descriptions of the templates defined in the database. In general, there may be many databases in MBDS. So keeping this in mind, the template descriptions for the records in the different databases must be kept separate. The format of a template file for a given database with n templates could be as follows:

Database name
Number of templates in the database
Template description for template #1
Template description for template #2
...
Template description for template #n

A typical template description with m attributes could be as follows:

Number of attributes in a template
Template name
Attribute #1 data type
Attribute #2 data type
...
Attribute #m data type

There are three different data types: integer (i), string (s), and floating point (f). The name of this database will be PURCHASING. The template names will be Purchase-order, Part, and Supplier. With this information we can now generate the template file for the PURCHASING database, as shown in Figure 2.3.
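A hypothetical reader for this format might look like the following C sketch (ours; the file name and buffer sizes are made up, and error handling is minimal):

    #include <stdio.h>

    #define MAX_ATTRS 16

    typedef struct { char name[32]; char type; } Attr;   /* type: 'i', 's' or 'f' */
    typedef struct { char name[32]; int nattrs; Attr attrs[MAX_ATTRS]; } Template;

    /* Read one template description: attribute count, name, then one
       "attribute-name data-type" pair per attribute.                  */
    static int read_template(FILE *fp, Template *t)
    {
        if (fscanf(fp, "%d %31s", &t->nattrs, t->name) != 2) return 0;
        if (t->nattrs > MAX_ATTRS) return 0;
        for (int i = 0; i < t->nattrs; i++)
            if (fscanf(fp, "%31s %c", t->attrs[i].name, &t->attrs[i].type) != 2)
                return 0;
        return 1;
    }

    int main(void)
    {
        FILE *fp = fopen("purchasing.tpl", "r");   /* hypothetical file name */
        char db[32];
        int ntemplates;
        Template t;

        if (!fp || fscanf(fp, "%31s %d", db, &ntemplates) != 2) return 1;
        for (int i = 0; i < ntemplates && read_template(fp, &t); i++)
            printf("template %s: %d attributes\n", t.name, t.nattrs);
        fclose(fp);
        return 0;
    }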
2. Descriptor Specifications

A descriptor is a keyword predicate of the form, for example, (supplier-name = WANG) or (price = $200). MBDS recognizes two kinds of keywords: non-directory keywords, used for search and retrievals, and directory keywords, used for search and retrievals as well as for forming clusters.
PURCHASING
3
6
Purchase-order
template
s
order-#
s
supplier-#
s
order-date
s
delivery -date
s
total-cost
f
5
Part
template
s
order-#
s
part-#
s
quantity
i
price
f
4
Supplier
Figure 2.3
retrievals as well as
directory
keyword
template
s
supplier-#
s
supplier-name
s
city
s
Sample Template
Cluster formulations are based on the attribute values and value ranges of the descriptors. The descriptor file is an input file that contains the directory keyword descriptors only. As an example, a cluster containing records in the purchasing database for purchase orders from a supplier WANG, with a total cost of $10,000 and up to $50,000, since 1988, is derivable with a set of three descriptors: (supplier-name = WANG), (10,000 <= total-cost <= 50,000), and (order-date = 1988).

There are three types of descriptors: type-A, type-B, and type-C. A type-A descriptor is a conjunction of two predicates: less-than-or-equal-to and greater-than-or-equal-to. An example of a type-A descriptor is (10,000 <= total-cost <= 50,000). For creating a type-A descriptor, the attribute (i.e., total-cost) and the value range (i.e., of 10,000 and up to 50,000) must be specified. The value range is expressed in terms of upper and lower limits. A type-B descriptor is an equality predicate. An example of a type-B descriptor is (supplier-name = WANG). The type-C predicate is also an equality predicate. However, its values are provided later by the input records. These descriptors are then automatically converted to a set of type-B descriptors with the same attribute name and values corresponding to the value range. As an example, if the template has a type-C descriptor with values of purchase-order, part, and supplier provided by the records, the set of type-B descriptors generated is (template = purchase-order), (template = part), and (template = supplier).

When specifying descriptors, the attributes of the given descriptors must be unique, and the specification of the values and the value ranges must be mutually exclusive. Below is an example of a descriptor file with n descriptors.

Database name
Descriptor definition #1
Descriptor definition #2
...
Descriptor definition #n
$

The "$" indicates the end of the descriptor file. Each descriptor definition is expressed in terms of the attribute and its associated descriptor type and data type, followed by the value ranges, as shown below.

Attribute  Descriptor-type  Data-type
Value range 1
Value range 2
...
Value range k
@

Value ranges are expressed in terms of upper and lower limits. Since type-B and type-C descriptors are equality values, which are exact, the lower limit is not used. The placeholder "!" is therefore indicated for the lower limit, with the upper limit holding the exact value. The "@" indicates the end of the descriptor definition. An example of a descriptor file is depicted in Figure 2.4.

PURCHASING
template  C  s
Purchase-order
Part
Supplier
@
total-cost  A  f
1000.00    100000.00
100000.00  500000.00
@
order-#  A  s
#1    #50
#50   #100
#100  #1000
@
price  A  f
1000.00   50000.00
50000.00  500000.00
@
supplier-name  B  s
WANG
IBM
DEC
@
city  B  s
Monterey
San Jose
Carmel
@
$

Figure 2.4  Sample Descriptor File
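The type-C mechanism can be pictured with a small sketch. The following C fragment is our illustration only: it remembers each new value seen for a type-C attribute and announces the type-B descriptor the value induces.

    #include <stdio.h>
    #include <string.h>

    #define MAX_B 16

    /* A type-C descriptor: attribute known up front, values supplied by records. */
    typedef struct {
        char attr[32];              /* e.g. "template"             */
        int  nvalues;
        char values[MAX_B][32];     /* distinct values seen so far */
    } TypeCDescriptor;

    /* On each incoming record value, emit a type-B descriptor (attr = value)
       the first time the value is seen. Returns 1 if a new type-B was made. */
    static int note_value(TypeCDescriptor *c, const char *value)
    {
        for (int i = 0; i < c->nvalues; i++)
            if (strcmp(c->values[i], value) == 0) return 0;  /* already known */
        if (c->nvalues == MAX_B) return 0;
        strcpy(c->values[c->nvalues++], value);
        printf("new type-B descriptor: (%s = %s)\n", c->attr, value);
        return 1;
    }

    int main(void)
    {
        TypeCDescriptor c = { "template", 0, {{0}} };
        note_value(&c, "Purchase-order");  /* (template = Purchase-order) */
        note_value(&c, "Part");            /* (template = Part)           */
        note_value(&c, "Supplier");        /* (template = Supplier)       */
        note_value(&c, "Part");            /* duplicate: no new descriptor */
        return 0;
    }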
3. Database Records

Once the template and descriptor files are defined, the data record format will be specified. Data records can be prepared in separate files for loading. The format for a typical record file is as follows:

Database name
@
Template name #1
Record #1 template #1
Record #2 template #1
...
Record #n template #1
@
Template name #2
Record #1 template #2
Record #2 template #2
...
Record #n template #2
...
$

The database name identifies the database to which the templates and the records belong. The "@" indicates the beginning of a new template, followed by the template's name. All records following the template name belong to that template until either the "@" or "$" is encountered. The "$" indicates the end of the entire record file. Values in the record are separated by a space. An example of a record in the Purchasing database in the purchase-order template would look like

<order-#,26>, <supplier-#,51>, <order-date,18-May-1988>, <delivery-date,22-Jul-1988>, <total-cost,200,500.00>

If we extract the values of each attribute-value pair, we have

26  51  18-May-1988  22-Jul-1988  200,500.00

which is a record of the purchase-order template in the Purchasing database. Figure 2.5 gives a more complete record file.

PURCHASING
@
Purchase-order
26 51 18-May-1988 22-Nov-1988 191500.00
31 51 25-Jun-1988 14-Apr-1989 381900.00
@
Part
26 V780 VAX-11/780 1 91500.00
26 M780 Memory 8 42000.00
26 D81 RA81 1 58000.00
31 V8600 VAX-8600 1 381900.00
@
Supplier
51 DEC Carmel
$

Figure 2.5  Sample Record File
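A minimal scanner for this record-file layout could look like the following C sketch (ours; "purchasing.rec" is a hypothetical file name):

    #include <stdio.h>
    #include <string.h>

    /* Walk a record file: "@" starts a new template section, "$" ends the
       file, and every other line is one record whose values are separated
       by spaces.                                                          */
    int main(void)
    {
        FILE *fp = fopen("purchasing.rec", "r");
        char line[256], db[32], tmpl[32] = "";
        int expect_name = 0, nrecs = 0;

        if (!fp || !fgets(db, sizeof db, fp)) return 1;    /* database name */
        while (fgets(line, sizeof line, fp)) {
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, "$") == 0) break;             /* end of file marker */
            if (strcmp(line, "@") == 0) { expect_name = 1; continue; }
            if (expect_name) {                             /* template name line */
                strncpy(tmpl, line, sizeof tmpl - 1);
                expect_name = 0;
                continue;
            }
            nrecs++;                                       /* a data record */
            printf("record %d (template %s): %s\n", nrecs, tmpl, line);
        }
        fclose(fp);
        return 0;
    }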
C. LIMITATIONS

The kernel system described in the earlier sections provides its users with the same functional capabilities as most other database systems. It provides its users with a clean, easy way to access and manipulate data. It is a simple, but powerful, database system. Yet for all its simplicity, the system lacks an efficient means to generate very large databases. The capability to create large databases does exist, but it is slow, cumbersome, and quite tedious. The INSERT command is the means by which records are inserted into the database. Let us briefly discuss how this process works.

In order to add a record to the database, an insert operation is issued by the user. The kernel system receives the command, recognizes it as an insert operation, and proceeds to process the request. The record undergoes a record processing phase where it is checked for validity prior to insertion on a backend. Record processing is a necessary and very involved process and is outlined here. The INSERT operation requires considerable work on the part of the kernel system. The complication is due to the type of directory keywords of the record to be inserted. If the attributes of the directory keywords are of type-A or type-B, then there are no complications. The complications arise when the attributes of the directory keywords are of type-C. We first look at the steps required to insert a record without type-C attributes.
In step one, the attribute-value pairs (or keywords) of the record are identified. The attribute, of an attribute-value pair, is used for matching the attributes in AT. A successful match indicates that the keyword is a directory keyword. There may be more than one match if the record contains more than one directory keyword. If there is not a match, then the keyword of the record is not a directory keyword. If none of the attributes of a record is a directory keyword, the record for insertion is rejected by the system.

In step two, once the attribute-ids are obtained, the descriptors in DDIT are searched to determine whether a descriptor covers a keyword of the record for insertion having the same attribute. If the descriptor covers the keyword, then the corresponding descriptor-id is used to form the descriptor-id group.

In step three, a search for a descriptor-id set in CDT which is identical to the descriptor-id group is performed. If a descriptor-id set is found to be identical to the descriptor-id group, then a cluster has been identified to receive the record to be inserted. If a descriptor-id set is not found during the search of CDT, this implies either (1) the descriptor-id group is a new descriptor-id set, or (2) the descriptor-id group cannot be a descriptor-id set. In case (1), this indicates that a new cluster is to be formed for the record to be inserted. A new entry in CDT is made for the new descriptor-id set and its associated cluster-id. In case (2), the descriptor-id group cannot form a new descriptor-id set. In this case, the record is rejected.
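Steps two and three amount to a set comparison. The C sketch below (our illustration; descriptor-ids are reduced to integers and the tables to arrays) shows the CDT search for a descriptor-id group:

    #include <stdio.h>

    #define GROUP_MAX 8

    /* One CDT entry: a descriptor-id set and its cluster-id. */
    typedef struct { int ids[GROUP_MAX]; int n; int cluster_id; } CDTEntry;

    /* Compare a descriptor-id group with a descriptor-id set (both sorted). */
    static int same_set(const int *a, int na, const int *b, int nb)
    {
        if (na != nb) return 0;
        for (int i = 0; i < na; i++)
            if (a[i] != b[i]) return 0;
        return 1;
    }

    /* Step three: find the cluster whose descriptor-id set equals the group;
       return the cluster-id, or -1 so the caller can form a new cluster.    */
    static int find_cluster(const CDTEntry *cdt, int ncdt,
                            const int *group, int ngroup)
    {
        for (int i = 0; i < ncdt; i++)
            if (same_set(cdt[i].ids, cdt[i].n, group, ngroup))
                return cdt[i].cluster_id;
        return -1;
    }

    int main(void)
    {
        /* Descriptor-ids as small integers here: D11=11, D21=21, D32=32. */
        CDTEntry cdt[] = { { {11, 21, 32}, 3, 1 } };
        int group[] = { 11, 21, 32 };        /* from steps one and two */
        int cid = find_cluster(cdt, 1, group, 3);
        printf(cid < 0 ? "new cluster needed\n" : "cluster C%d\n", cid);
        return 0;
    }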
In step four, we are concerned with the placement of the record into the cluster identified in step three. The record-id is transformed into a disk address, and the record is placed at the secondary storage so addressed. This process of record placement is more complicated than it sounds. The records of a cluster must be evenly distributed across the multiple backends, so the placement of the record on a backend is not a simple task. The following steps are taken:

(1) In the beginning, the controller does not know the cluster into which the record is to be inserted. The controller tasks all the backends to work on the meta data and to identify the cluster, by broadcasting the operation and record to all backends via the communication bus. Meta data processing by the backends is known as parallel operations. Once the cluster-id is determined by the backends, the controller receives the same cluster-id from all the backends, again via the communication bus. A backend then receives from the controller the information that the record is to be inserted on the backend whose disk has the available block for the cluster. This information is obtained from the table known as the cluster-to-next-backend table (CINBT), which is maintained by the controller.

(2) The controller locates the entry in the CINBT whose cluster-id is identical to the cluster-id received. Once the cluster is identified, it is checked to determine if enough space is available to receive the record. If the space is available, the cluster space is decreased by the length of the record to be inserted. If the space is not large enough, the controller forfeits the space and allocates a new block of secondary storage from the next backend. The controller then sends a message over the communication bus, simply instructing the backend to write the record into the backend's available secondary space.

To write a record, we retrieve the backend's portion of the cluster from secondary storage, write the record into that portion, and then store the cluster portion back onto the secondary storage. We note this operation is not parallel, since there is only one backend performing the writing.
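The controller's CINBT bookkeeping in step (2) can be sketched as follows. This is our own simplification: fixed-size blocks and a round-robin choice of the "next backend" are assumptions, not details given here.

    #include <stdio.h>

    #define NBACKENDS 4
    #define BLOCK 4096L

    /* One CINBT entry: for a cluster, which backend gets the next record
       and how much space is left in that backend's current block.        */
    typedef struct { int cluster_id; int next_backend; long space_left; } CINBTEntry;

    /* Return the backend to write to; allocate a new block on the next
       backend when the current block cannot hold the record.            */
    static int place_record(CINBTEntry *e, long reclen)
    {
        if (e->space_left < reclen) {                    /* block is full       */
            e->next_backend = (e->next_backend + 1) % NBACKENDS;
            e->space_left = BLOCK;                       /* new block allocated */
        }
        e->space_left -= reclen;
        return e->next_backend;
    }

    int main(void)
    {
        CINBTEntry e = { 1, 0, BLOCK };
        for (int i = 0; i < 6; i++)
            printf("record %d -> backend %d\n", i, place_record(&e, 1500L));
        return 0;
    }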
The entire process described above is very efficient when inserting a single record, but not when creating a very large database consisting of thousands or millions of records. Each record generates two I/O requests, and the hardware is idle during I/O processing. Thousands of records will generate thousands of separate I/O requests. Thus, we can see where the inefficiency lies when using the current mode of operation to create very large databases. What we attempt to do in this thesis is to minimize the involved process on the record-by-record basis, where the entire kernel system is tied up by the large number of insert operations for a large number of records.

The kernel system also lacks the capability to reclaim storage space once a record has been identified for deletion. The DELETE command does not physically remove a record from the database. The DELETE command simply tags the record, identifying it as no longer active. The system no longer recognizes
this record as a part of the database, but the record is still a physical part of the database, since it continues to reside on the secondary storage medium. What now resides on secondary storage are active records and "garbages" (i.e., tagged records). This results in fragmentation within the storage. If the new record to be added to the database is smaller than the record that previously lived at that spot, then the new record could be inserted into this space. But a smaller record will not solve the problem of fragmented space on disk. It only creates smaller fragments. If the new record to be inserted is larger than the previous record, then it cannot be inserted into that slot and must be placed someplace else on the medium. If all records were of the same size, then a new record would fit nicely into the space occupied by the tagged record and there would be no fragmentation. Unfortunately, all records are not fixed-length; in most databases the records are variable-length.

When most large systems want to reclaim memory that is no longer needed, they perform an operation called storage reclamation or compaction. This type of function involves gathering occupied areas of storage into one end or the other of secondary storage. This leaves a single large free storage hole instead of numerous small holes. This concept can be applied to the database problem, where records tagged for deletion are not required to remain in the database. If all the active records are collected and then redistributed onto secondary storage, this will rid the storage medium of fragmentation. Compressing the active records remaining in storage is the creation of a new database with only active records. In other words, the reorganization of the database is the route that should be taken.
D. POSSIBLE SOLUTIONS

In this section we propose two solutions to resolve the problem of the lack of an efficient database creation and reorganization function. After each proposal is presented, the best possible solution will be selected. The basic idea behind each solution is to bypass the system and to load the data directly to the disks. With both proposed solutions, the intention is to speed up the process of creating very large databases. Both approaches use the same type of input and produce the same output. The inputs to the algorithm are either (1) the template file, the descriptor file, and a record file (on tape medium) containing the data to be loaded, if creating a new database, or (2) the AT and DDIT of the database to be reorganized, along with a record file (on tape medium) containing the data to be redistributed. The output, once the algorithm is used, will be a database over multiple disks consisting of both base and meta data.
1. One-Pass Approach

The One-Pass approach is so named because of the number of times each record will be handled before it is placed onto secondary storage. The basic strategy for this algorithm is as follows:

(1) Generate the AT and DDIT
(2) Load records from tape into temporary work space
(3) Process each record - check syntax and format record
(4) Update DDIT - adding new type-C attributes
(5) Build CDT if first record; otherwise update CDT
(6) Assign records to cluster
(7) Write cluster to backend
(8) Repeat steps three through seven
(9) Write meta data to secondary storage

The advantage of this algorithm is that each record is handled only once. The disadvantage of this algorithm is that clusters of records are written onto the secondary storage many times as the clusters grow in size.
Two-Pass Approach
The two-pass approach informs us
will be placed onto secondary storage.
The
that data will
be handled twice before
basic strategy for this algorithm
follows:
Phase -One
AT
and
DDIT
(1)
Generate
(2)
Load records
(3)
Process each records
(4)
Update
(5)
Build/Update
(6)
Repeat steps three through five in processing
(7)
Determine record distribution
If
good
go
into a
temporary work space
check record syntax and format record
-
DDIT
CDT
distribution and/or small percent of
to
all
records
bad records then
Phase-Two
else redefine descriptor file and repeat Phase-One.
Phase -Two
(1)
The
Scan and
sort records
-
bv each cluster-id number
30
is
it
as
CINBT
(2)
Build
(3)
Load records onto
(4)
Write meta data to their disk
(5)
Send CINBT, IIGAT and
The advantages
to this
the clusters, (2)
it
their disks
approach
sorts records
nGDDIT
are: (1)
by
it
enforces even distribution of records
cluster-id
number, which
loading of clusters of records onto disks, (3)
pass
if
to controller disk.
it
a large percentage of the input records are corrupted and (4)
each record
that
assists in the building
is
handled twice.
The disadvantage
IIGAT and nGDDIT
and are used for generating new type-C descriptors.
entry for a
descriptor values.
writes clusters
to this algorithm
are insert-information-
These tables are part of the meta data residing on the controller
table, respectively.
DDIT
it
and insert-information-generation-descriptor-to-descriptor-id-
generation-attribute-table
the next
and
disallows proceedings into the second
of records only once onto the permanent storage.
is
among
new type-C
These two
descriptor.
IIGAT houses
the attribute and
IIGDDIT houses
tables are used to generate
new type-C
all
the type-C
descriptor-ids for
pre-defined type-C attributes.
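To make the control flow of the two passes concrete, here is a skeleton in C (ours, with stub routines standing in for the real table builders and record handlers described above and in Chapter III):

    #include <stdio.h>

    /* Stubs standing in for the real table builders and record routines. */
    static void generate_at_and_ddit(void) { puts("build AT and partial DDIT"); }
    static int  load_block(void)           { static int n = 2; return n--; }
    static void process_record(void)       { puts("check syntax, format, update DDIT/CDT"); }
    static int  distribution_is_good(void) { return 1; }
    static void sort_by_cluster_id(void)   { puts("sort records by cluster-id"); }
    static void build_cinbt(void)          { puts("build CINBT"); }
    static void load_clusters(void)        { puts("write each cluster once to its backends"); }
    static void write_meta_data(void)      { puts("write AT, DDIT, CDT, CINBT"); }

    static int phase_one(void)
    {
        generate_at_and_ddit();
        while (load_block() > 0)        /* a block of records at a time */
            process_record();
        return distribution_is_good();  /* else redefine descriptors, rerun */
    }

    static void phase_two(void)
    {
        sort_by_cluster_id();
        build_cinbt();
        load_clusters();
        write_meta_data();
    }

    int main(void)
    {
        if (phase_one())
            phase_two();
        else
            puts("redefine descriptor file and repeat Phase-One");
        return 0;
    }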
3. The Selected Solution

In selecting the best solution to the database creation and reorganization problem, we must weigh the advantages and disadvantages of each proposed solution. As it appears in our case, the advantage of one approach is the disadvantage of the other. So based on this, we must determine which approach will yield to us the most effective and efficient manner in which to store large databases. The major considerations mentioned are how often a single record is handled, getting a good record distribution within the clusters, and how often a cluster is written onto the secondary storage. We look at these considerations in more detail and determine the relative importance of each.

In the One-Pass approach the record is processed and then placed onto secondary storage. In the Two-Pass approach the record is processed in pass-one and then, during pass-two, it is read, sorted and placed onto secondary storage; the record is read and handled twice. It is important to understand, in the design of the two algorithms, what impact, if any, handling each record twice has on the overall efficiency of the algorithm. As mentioned earlier, the problem that arises with large database creation, using the INSERT command, is the large number of I/O operations that must be performed. The problem is encountered because of the way in which a record is individually inserted into a database. Single record insertion is quick and easy, and the INSERT command is the only way a record is added to a database.

If we look at the One-Pass approach, we see that in general each record processed generates an I/O operation for writing the record onto the secondary storage. This is similar to the current design, and no performance gain is encountered. In the Two-Pass approach each record does not generate an I/O request of its own. The placing of records onto secondary storage is deferred to the second pass, where records are written as clusters of records and not individually. Since records are written in clusters vice each record generating its own I/O request, the number of I/O requests is reduced greatly. For example, writing one million records one record at a time costs on the order of a million write requests, while writing the same records as one thousand clusters costs on the order of a thousand cluster writes.

Since records of a cluster are available for distribution on the parallel disks, an even distribution of records within a cluster will produce better disk utilization and speedier data retrieval. The numbers of records within clusters may vary greatly. To facilitate even distribution, we should choose smaller block sizes, say, one-half or one-quarter track. On the other hand, we may have difficulty accommodating large record sizes, since we do not split a record over different blocks. We note that it is easier to have a good record distribution when all records of a given cluster are ready for distribution. The One-Pass approach writes each cluster of records repeatedly onto the disks as the cluster grows, which induces many computations and I/O operations. We have shown that the number of times a record is handled does not necessarily imply the processing will be slower. Since the Two-Pass approach embodies both of the desired concepts, it is our choice as the best approach. These concepts are (1) fewer I/O operations for cluster creation, and (2) better distribution of clustered records on parallel disks due to one-time record collection and distribution.
III. THE PROPOSED DESIGN

In this chapter we discuss the algorithm in detail. Specifically, we look at the multiple phases of the Two-Pass approach. Also in this chapter, we provide a time-complexity analysis for this algorithm to show that it does provide a more efficient means, over current operations, of generating large databases. Finally, we provide a Pascal-like pseudo code of the actual design specifications for the algorithm.

A. THE DESIGN OF DATABASE CREATION/REORGANIZATION
The INSERT command is used to insert a single record into the database. In order to create a large database holding thousands or millions of records under the current mode of operation, there must be as many INSERT commands executed as there are records to be inserted. Although it is possible to generate a large database under this method, it is a very laborious and inefficient process. The DELETE command is used to delete records from the database. These records are not physically removed from the database but are actually tagged as "garbages". Records tagged in the database are considered no longer active and lead to fragmentation on the medium. We discovered in the previous chapter that ridding the database of this fragmentation means we must collect all the active records and then redistribute them evenly by clusters across the multiple disks, in essence creating a new database with its active records. We have designed an algorithm which is flexible enough for both database creation and reorganization (garbage collection).

The user will specify in the beginning stage what type of operation he/she will be performing. Based on this input, the algorithm will proceed to perform either a creation or a reorganization function. The input, output, and internal processing for the two types of operations are slightly different, but in either case this algorithm will produce a database that is evenly distributed across the multiple backends per cluster.
B. MULTIPLE PHASES OF THE ALGORITHM

The proposed algorithm has two phases: Phase-One and Phase-Two. The first phase is concerned mostly with record processing, while the second phase is concerned with loading data, both meta and base, onto the backends. The logic used in the two-phase approach is similar to the logic used by MBDS. The main difference is when the record is actually placed on disks. The next two sections will discuss in detail Phases One and Two of the Two-Phase approach.

1. Phase-One

Phase-One is concerned primarily with record processing and building the three main directory tables: the attribute table (AT), the descriptor-to-descriptor-id table (DDIT) and the cluster-definition table (CDT). Upon entering this phase, the user must have specified the type of function he/she wishes to perform (i.e., create a new database or reorganize active data). If the user has specified the creation of a new database, then the template and descriptor files for the new database will be retrieved. Once these files are retrieved, the AT and a partial DDIT will be generated. The AT maps directory attributes to descriptors, and the DDIT maps each descriptor to a unique descriptor-id. The descriptor file contains all the directory keyword descriptors. Each descriptor definition is expressed in terms of the attribute, its descriptor type and data type, followed by the value ranges.

The descriptor file is read in a single pass, and the information obtained is used to construct both AT and DDIT. To construct AT, the keywords and their corresponding attribute types are extracted from the descriptor file. Once an attribute is known, it is assigned a unique AT entry value. At the same time the AT is being constructed, the DDIT is also being built. Each AT entry value is used as an index into DDIT, since each attribute is followed by its group of range values in the descriptor file. When the descriptor has been read, all its corresponding range values can then be placed in DDIT. The end of the descriptor file is reached when all attributes and all value ranges have been extracted and placed into AT and DDIT. The DDIT is not complete at this time, because new type-C descriptors are introduced by the records.

If the user has specified a reorganization of the active records (i.e., garbage collection), then the AT and DDIT already exist and will be retrieved. Once the DDIT is retrieved, the type-C attribute values will be removed from DDIT. The reasoning behind this is to help ensure that the DDIT for the reorganization of the active records contains only attributes that are active. It is conceivable that, once records are tagged for deletion, all the records carrying some type-C attribute value could have been deleted; if not, the values will be re-introduced when we encounter them in the active records. The program specifications for building AT and DDIT are attached as Appendices A and B, respectively.
from
is
and DDIT. To construct
their corresponding attributes types are extracted
descriptor
AT
It
associated descriptor
its
all
when we
and provide
Once we have
built
record processing phase.
At
function being performed
is
AT
and a
DDIT we
partial
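As a hypothetical illustration (the exact file layout is fixed by the system's own utilities, so this sketch only mirrors the bound-entry conventions shown in the Chapter IV dialogue), a type-A descriptor definition for an integer SALARY attribute might appear in the descriptor file as:

    SALARY, A, i
    !, 15000
    15001, 30000
    30001, 50000
    @

Here "!" indicates that no lower bound exists for the first range, and "@" terminates the list of value ranges for the attribute.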
At this point, records are processed the same way, whether the function being performed is creation or garbage collection. We start by loading a block of records to be processed from the record file into main memory (or a work space). At this point the record processing phase begins. We then get a single record from our work space, keeping a running total of all records in the database. This number will be used later when we determine the percentage of bad records processed.

After a record is obtained, it is scanned and its syntax is examined. A check is made to determine the record template. First, the first attribute of the record is checked against specific data structures; these data structures hold the record template descriptions and reflect the information contained in the template file. Once the template is identified as valid, the remaining attributes are checked to determine if the record contains the correct number of attributes and that each attribute also has its correct associated data type. If the record is found to contain errors, it will be sent to a record handler for processing bad records, and a total count of the number of bad records will be maintained. If the record is without errors, it will be formatted at this time for subsequent placement onto the disk in Phase-Two.

Formatting involves embedding special characters within the record. The "#" special character is placed between each attribute in the record, and the "&" is the special character used to mark the end of the record. A new DDIT entry is made whenever the record introduces a new descriptor, and the CDT is updated with a count of the records belonging to each cluster (no cluster addresses will be placed in the table). The CDT is not fully built at this time, because the address of each cluster is not yet known. Recall that the CDT provides us with a count of the records in each cluster; this information will be used later, when we determine whether or not a good record distribution within the clusters has been achieved.

Once the record is processed, it is placed into a temporary work space. This process is repeated until all the records loaded into the work space are processed. After all the records in the work space are processed, we check to see if there are any more records in the record file. If there are more records to be processed, then another block of records is read into the work space and the procedure is repeated. This entire procedure continues until all records in the record file have been processed. After all records are processed, a tape is produced containing all of the error-free records that compose the database. Recall that records are not placed onto secondary storage until Phase-Two. Appendices C and D are attached and provide the program specifications for building the partial CDT and the record-error-checking routine, respectively.
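The following is a minimal sketch of the formatting step. The "#" and "&" delimiters are those named above; the function name, buffer size and calling convention are illustrative assumptions.

#include <string.h>

#define MAX_REC 512

/* Join the attribute values of one record into a single formatted   */
/* string: values separated by '#', record terminated by '&'.        */
int format_record(char *out, char *attrs[], int n_attrs)
{
    int k;

    out[0] = '\0';
    for (k = 0; k < n_attrs; ++k) {
        if (strlen(out) + strlen(attrs[k]) + 2 > MAX_REC)
            return(-1);              /* formatted record would overflow */
        strcat(out, attrs[k]);
        if (k < n_attrs - 1)
            strcat(out, "#");        /* separate adjacent attributes */
    }
    strcat(out, "&");                /* mark the end of the record */
    return(0);
}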
Once the record processing is completed, we determine if we should proceed onto Phase-Two. We can proceed to Phase-Two under two conditions: if there is a good record distribution within the clusters, and if the percentage of records containing errors is relatively small (e.g., less than 10% of the total records processed).

We define a good record distribution within the clusters as clusters that are relatively the same size. The ideal size of a cluster is the number of backends times the track size. During the record processing phase a partial CDT is built, and the CDT provides us with a count for the records in each cluster. We can now look at the partial CDT and judge whether or not we have obtained a good distribution of the records. The average cluster size is the criterion used to determine if a good distribution has been achieved: the average cluster size should be approximately the ideal cluster size. If the average cluster size deviates greatly from the ideal cluster size, then a good distribution was not obtained. If it is determined that the records within the clusters are not distributed evenly, then the algorithm will not proceed to Phase-Two.

Cluster formulations are based on the attribute values and value ranges of the descriptors. One possible solution to this problem would be the automatic division of the range attributes. At first glance this might be an appropriate solution. Problems may be encountered, however, when dividing range attributes that are non-numeric, or when even this division would result in clusters that are much smaller than the ideal cluster size. In any case, this division would guarantee a definite redistribution of the records within the clusters. If a good record distribution is not obtained, the range attributes should be adjusted: the ranges should be either decreased in size or combined to create a larger interval. The procedures described above can be taken in a number of combinations. Adjusting all range variables may not be required or desired; the readjustment of the range attributes for a particular database should be handled on a case-by-case basis.

The second condition under which we proceed to Phase-Two is if only a small percent (say 10%) of the records contain errors. If for some reason data is corrupted and a large percent (e.g., 30%) of the data processed is bad, then this algorithm will not proceed onto Phase-Two. At that time, the user can reexamine the input data, take whatever actions are needed to correct the problem with the data, and then reenter the algorithm.
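A small sketch of the distribution test is given below. The formula for the ideal cluster size (number of backends times the track size) follows the text; the tolerance parameter and the function name are assumptions, since the thesis does not fix a numeric threshold for the deviation.

#define TRACK_SIZE 8192   /* an 8K-byte track, as assumed in Section D */

/* Return 1 when the average cluster size is within 'tolerance'      */
/* (a fraction, e.g. 0.10) of the ideal size, and 0 otherwise.       */
int good_distribution(long total_bytes, int n_clusters,
                      int n_backends, double tolerance)
{
    double ideal = (double)n_backends * TRACK_SIZE;
    double avg   = (double)total_bytes / n_clusters;
    double dev   = (avg > ideal) ? (avg - ideal) : (ideal - avg);

    return((dev / ideal) <= tolerance);
}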
If Phase-One is a success (both a good record distribution and a small percent of bad records), then the algorithm will proceed to Phase-Two. Figure 3.1 provides a flow-chart diagram of Phase-One.
[Figure 3.1: Flow-chart diagram of Phase-One. The chart shows the create/reorganize branch (get files and build AT and DDIT, or get tables and remove type-C entries from DDIT), the record loop (load records, get a record, scan the record, error-check, format the record or send it to the bad-record handler, update DDIT, build CDT), and the final tests on cluster distribution and bad-record percentage before entering Phase-Two.]
2. Phase-Two

Phase-Two is mainly concerned with loading the meta data and base data onto disks. The records that make up the database have now been stored on tape. This tape was built during Phase-One. The first function to be performed in Phase-Two is a tape sort. The records on the tape will be sorted by their cluster-id numbers; in other words, all the records whose cluster-ids are the same will be placed next to each other. There are a number of sort algorithms available (i.e., Quick sort, Bubble sort, etc.), so we will not specify a sort algorithm but leave this decision to the implementor. Most systems provide a tape-sort capability.

After the tape has been sorted, a portion of the tape is placed into either main memory or a work space. Since the records are sorted by their cluster-id numbers, the building of the clusters is made simpler: we need only read the records sequentially, since the records of a given cluster are next to each other.
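For an in-memory work space, the sort step might look like the sketch below. The thesis deliberately leaves the choice of sort algorithm to the implementor; this version simply leans on the C library's qsort(), and the record layout is an assumption.

#include <stdlib.h>

struct tape_rec {
    int  cluster_id;          /* assigned during Phase-One        */
    char body[512];           /* formatted record ('#'/'&' form)  */
};

static int by_cluster(const void *a, const void *b)
{
    return(((const struct tape_rec *)a)->cluster_id -
           ((const struct tape_rec *)b)->cluster_id);
}

/* After this call, records of the same cluster are adjacent, so the */
/* cluster-building pass can read them sequentially.                 */
void sort_work_space(struct tape_rec *recs, size_t n)
{
    qsort(recs, n, sizeof(struct tape_rec), by_cluster);
}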
When building a cluster, we concern ourselves with two things: cluster size and cluster number. Records of a cluster are loaded to disks in blocks. A block (track) is made when the total number of bytes in the records is as close to the track size as possible without surpassing it. The first two bytes of each block refer to the total number of bytes in the block (track). Each time a sufficient number of records of a cluster have been collected, the block will be written to a disk, and the address of each record of the cluster is loaded into the CDT. The record address, called the record-id, therefore consists of the track address and the offset of the record within the track in which the blocked record resides.
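A sketch of this addressing scheme follows (field and macro names are illustrative, not the system's actual definitions):

#define TRACK_SIZE 8192              /* assumed 8K-byte track         */
#define TRACK_HDR  2                 /* byte count kept in bytes 0-1  */
#define TRACK_CAP  (TRACK_SIZE - TRACK_HDR)

/* The record address ("record-id"): the track address plus the      */
/* record's byte offset within that track.                           */
struct record_id {
    long track_addr;                 /* disk address of the track     */
    int  offset;                     /* offset of the record in track */
};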
It is at this point that a CDT is being built for each backend. Each backend has its own CDT. Although all the backends have identical cluster-definition information (i.e., descriptor-id sets), the CDTs contain only those record-ids that refer to records on their own backends. The records of a cluster may be loaded onto one or more disks; if the record tracks of a cluster are not on the disks of a backend, the backend's CDT does not have their record-ids in its CDT entries. Also during this loading process, the cluster-id-to-next-backend table (CINBT) is being constructed. The CINBT is part of the controller meta data, and this table is used to identify the next backend to receive the next block of records of a given cluster to be placed on the backend's disk. This process is repeated until the entire input tape has been processed. After all the records on the tape have been processed, a unique CDT will have been built for each backend.
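An illustrative sketch of the CINBT bookkeeping follows. The table's purpose, remembering which backend receives the next block of a given cluster, is from the text; the array layout and names are assumptions.

#define MAX_CLUSTERS 1024            /* illustrative capacity */

static int cinbt[MAX_CLUSTERS];      /* cluster-id -> next backend no. */

/* Return the backend that should receive the next block of the      */
/* cluster, and advance the entry round-robin over the backends.     */
int next_backend(int cluster_id, int n_backends)
{
    int be = cinbt[cluster_id];

    cinbt[cluster_id] = (be + 1) % n_backends;
    return be;
}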
The final two tables to be built are the insert-information-generation-attribute-table (IIGAT) and the insert-information-generation-descriptor-to-descriptor-id-table (IIGDDIT). These tables are used for generating new descriptor-ids for pre-defined type-C attributes, and they are also part of the meta data residing on the controller. IIGAT houses all of the type-C attributes. IIGDDIT houses, for each type-C attribute, the attribute and the next global DDIT entry for a new type-C descriptor; these values are used to generate the new type-C descriptor values. Once AT and DDIT are built, all the information is available to create these two tables. From AT we extract all the type-C attributes, and from DDIT we get the last descriptor-id of any particular type-C descriptor. We extract this information and then insert this data into the respective tables. These tables exist to provide information for keeping track of type-C attributes, which in turn helps keep the descriptor information on the multiple backends consistent.
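A small sketch of how the two tables could be used to hand out a new descriptor-id for a type-C attribute is shown below. The structure and field names are assumptions, not the system's actual definitions.

/* One IIGDDIT entry: a type-C attribute and the next available      */
/* global descriptor-id for it.                                      */
struct iigddit_entry {
    char attr_name[32];
    int  next_desc_id;
};

/* Allocate the next descriptor-id for the attribute, so that every  */
/* backend generates descriptor information consistently.            */
int new_type_c_desc_id(struct iigddit_entry *e)
{
    return e->next_desc_id++;
}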
All the meta data has now been generated and is ready to be placed on the disks. CINBT, IIGAT and IIGDDIT can be loaded onto the controller disk. If the original function in entering Phase-One was a database creation, then AT and DDIT will be replicated onto each backend, and each individual CDT will be loaded to its appropriate backend. If the original function in entering Phase-One was a reorganization of the active records, the DDIT will be replicated onto each backend and the individual CDTs will be loaded to their respective backends; the AT in this function is not affected. Figure 3.2 provides a flow-chart diagram of Phase-Two.

C. SEQUENTIAL VS. PARALLEL OPERATIONS
Sequential operations imply that all tasks will be performed one at a time, in sequence: no operation will be performed until the operation before it has been completed. In most cases, sequential processing implies some sort of task dependency, i.e., one task cannot start until another task has been completed. Parallel operations imply that one or more tasks can be performed at the same time. The start of one task does not depend on the completion of another task, so these tasks can be processed in unison. In most cases, parallel processing provides a quicker response than sequential processing.
The proposed Two-Phase approach algorithm performs its tasks sequentially. The algorithm reads a flat file (i.e., the record file). It retrieves "chunks" of records from the file and then processes these records. Once those records are processed, it retrieves another chunk, and it continues to repeat this process until all records in the file have been processed. During record processing another flat file is created; this clustered flat file will contain the records of the clusters to be loaded onto the disk space. This file is also processed sequentially. Records of a cluster are stripped from the file and blocked into tracks, and these blocks are then placed onto the backends' disks. The placing of the clusters onto the disks is not performed randomly: a round-robin approach is used.
[Figure 3.2: Flow-chart diagram of Phase-Two. The chart shows the tape sort, the cluster-building loop (load and get records, accumulate record sizes into a track-sized cluster, consult and build the CINBT, load the full block to the chosen backend, and build that backend's CDT), the construction of IIGAT and IIGDDIT, the loading of IIGAT, IIGDDIT and CINBT to the controller, and the final create/reorganize branch that loads AT, DDIT and the individual CDTs to the backends.]
The first backend's disk receives the first block of records of a cluster, the second backend's disk receives the next block of records of the cluster, and so on. When the last backend's disk has received its block of the cluster, and if there are more records remaining in the cluster, then the next backend to receive a block of records of the cluster is the first backend's disk again. This procedure is repeated until the records of all clusters are placed on the disks.

There seems to be no overlapping of operations; all tasks are seen to be performed sequentially. Records cannot be processed until they are read into the main memory, and they cannot be loaded onto the disks until they are processed. Nevertheless, even though blocking the records of a given cluster is done sequentially, different blocks of a given cluster may be sent to different backends to be placed on their respective disks in parallel. Here, we have record-serial processing and block-parallel storing operations.
MBDS, when loading records onto its respective disks, does perform some parallel operations, but in a limited capacity. After a record has been assigned a cluster number, the record is ready to be placed onto a backend. The controller polls each backend, over the communication bus, searching for the backend on which the record will be placed. All backends simultaneously search their meta data, specifically their CDTs, to determine if the record belongs to one of their clusters. This is where the parallel operations actually take place. All backends respond to the controller, identifying the cluster to which the record belongs. The controller takes the backends' acknowledgements and identifies the backend to receive the record. The portion of the cluster on that backend is retrieved into main memory, the record is written into that portion of the cluster, and the portion of the cluster is placed back onto the disk.

The procedure just described is sequential. When using the INSERT command, MBDS offers very little toward parallel capabilities. The remaining four commands take full advantage of the architectural design of MBDS to utilize parallel processing to its fullest. If the reader requires more information on the benefits of parallel operations using the other four commands, he/she is referred to the references.

D. THE TIME-COMPLEXITY ANALYSIS
In this section we present a complexity analysis of the new algorithm versus a complexity analysis of the INSERT operation, indicating the efficiency of the new algorithm. The new algorithm makes use of the basic strategy behind the INSERT operation; indeed, the new algorithm functions almost identically to the INSERT operation. The difference between the two strategies is how the algorithm handles the placement of the data onto the multiple disks. Under the current mode of operation for multiple-record inserts, each record is processed and then generates its own I/O request. The new algorithm, on the other hand, processes all records and defers the loading of the records until its second phase. Phase-One is bound to the CPU: it concerns itself mainly with computations, although there are some I/O requests which read the records from tape into main memory for initial record processing. The second phase is concerned with operations that are mainly I/O-bound.

We will consider the following observations that will be used to simplify our calculations. First, we look at the sequence of operations for both the INSERT command and the new algorithm. Below is the sequence of operations for the INSERT command.
load records from the record file
repeat
  process a record:
    - get a record
    - check record syntax and format record
    - update DDIT -- new type-C descriptor
    - update IIGAT and IIGDDIT -- new type-C descriptor
    - update CDT -- determine the cluster to receive the record
    - check CINBT -- determine the backend whose available disk is to
      receive the record
    - I/O -- retrieve the cluster to receive the record into main memory
    - write the record to its cluster
    - I/O -- place the block of the cluster onto a specific backend's disk
until all records are processed
The general sequence of operations for the new algorithm is as follows:

Phase-One
  load records from the record file
  repeat
    process a record:
      - get a record
      - check record syntax and format record
      - update DDIT -- new type-C attributes
      - update CDT -- determine the cluster to receive the record
  until all records are processed

Phase-Two
  sort the record tape by cluster-ids
  load in records from the sorted tape
  repeat
    build clusters
    update CINBT
    I/O -- write a block of records per cluster to a specific backend's disk
  until all clusters are placed on the backends' disks
  generate IIGAT
  generate IIGDDIT
  I/O -- write AT and DDIT to all backends' meta data disks
  I/O -- write CINBT, IIGAT and IIGDDIT to the controller
  I/O -- write the different CDTs to the different backends' meta data disks
We can see that both strategies contain similar operations. The major differences are the order in which they are executed and the degree of parallelism with which they are achieved. We pay close attention to where the I/O operations are executed, since this is the area we are seeking to optimize. Secondly, both the INSERT command and the new algorithm require that the AT and DDIT be constructed. Since both strategies use exactly the same process to generate these tables, their initial construction will not be included in the comparison study. We also assume that the tables will be in main memory at all times, so that no I/O operations are required to read the directory tables. With these considerations, let us proceed with our comparison study. We define the following variables:

n:          total number of records
i:          total number of backends
x:          total number of tracks placed on the disks
t_load:     time to read a block of records into main memory
t_get:      time to get a single record
t_process:  time to process a record
t_ddit:     time to insert a new type-C attribute into DDIT
t_cdt:      time to build/update CDT
t_cinbt:    time to access CINBT -- determine the backend to receive the
            next cluster block
t_bcinbt:   time to build/update CINBT
t_i/o:      time to retrieve/return a cluster from/to a backend
t_write:    time to write record(s) to a cluster
t_bc:       time to build a cluster of track size (8K bytes in the system)
t_iigat:    time to build/update IIGAT
t_iigddit:  time to build/update IIGDDIT
t_sort:     time to perform the tape sort

Let us calculate the total time required to process and load n records into the database.
The time required to process a single record through the system using the INSERT command is

  (t_load + t_get + t_process + t_ddit + t_iigat + t_iigddit + t_cdt
    + t_cinbt + 2(t_i/o) + t_write).

The time required to process and load n records into the database is therefore

  n(t_load + t_get + t_process + t_ddit + t_iigat + t_iigddit + t_cdt
    + t_cinbt + 2(t_i/o) + t_write)                               -- (1)

We now calculate the total time required, using the new two-pass approach, to process and load n records into the database. The time required to process n records in Phase-One is

  n(t_load + t_get + t_process + t_ddit + t_cdt).

The time required to load the n records in Phase-Two is

  t_sort + n(t_write) + x(t_bc + t_bcinbt + t_i/o) + t_iigat + t_iigddit
    + i(t_i/o) + 3(t_i/o),

where i(t_i/o) is the time for writing each individual CDT to its respective backend and 3(t_i/o) is the time for writing the CINBT, IIGAT and IIGDDIT to the controller disk. The total time required to generate a large database using the two-pass approach is therefore

  n(t_load + t_get + t_process + t_ddit + t_cdt + t_write) + t_sort
    + x(t_bc + t_bcinbt + t_i/o) + t_iigat + t_iigddit
    + i(t_i/o) + 3(t_i/o)                                         -- (2)

We observe that there are some similar constants that appear in both equations (1) and (2) (i.e., t_load, t_get, t_ddit, and t_cdt). They are calculated the same way against the same number of records. Since they are constant factors that appear in both equations, removing them from each equation will not impact the overall comparative analysis, so t_load, t_get, t_ddit, and t_cdt can be eliminated from both equations. The term t_process also appears in both equations. This constant cannot be removed, because in equation (1) it is linked to two I/O operations per record, whereas in equation (2) it is not. The resulting equations are:

  n(t_process + t_iigat + t_iigddit + t_cinbt + 2(t_i/o) + t_write)  -- (1')

  n(t_process + t_write) + t_sort + x(t_bc + t_bcinbt + t_i/o)
    + t_iigat + t_iigddit + (i + 3)(t_i/o)                           -- (2')

If we ignore the references to the meta data and concentrate solely on record processing and data loading, we can simplify the equations to the following:

  n(t_process + t_write + 2(t_i/o))                                  -- (1")

  n(t_process + t_write) + t_sort + x(t_bc + t_i/o)                  -- (2")

The number of I/O requests in the new algorithm is drastically reduced: it depends largely on the number of tracks that will be written to the backends instead of the number of records. If we view these two equations solely on the basis of loading the records, equations (1") and (2") will appear as

  n(t_write + 2(t_i/o))                                              -- (3)

  n(t_write) + x(t_bc + t_i/o) + t_sort                              -- (4)

respectively. Notice that from equation (4) we have removed t_iigat + t_iigddit + (i + 3)(t_i/o). These I/O operations load the meta data onto the disks; if the database is sufficiently large, then these I/O operations can be ignored. These two equations are now in terms of record processing and I/O operations. We can see that deferring the I/O operations until Phase-Two reduces the number of I/O operations greatly. When generating large databases, equation (3) will perform on the order of (3n), while equation (4) performs on the order of (n + x). Since records are loaded in blocks, less I/O is required. This algorithm does prove to be advantageous over the current mode of operation.
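To put rough, purely illustrative numbers on this comparison (they are not measurements from the thesis): suppose n = 1,000,000 records of 200 bytes each are loaded onto 8K-byte tracks. Each track then holds about 40 records, so x is roughly 25,000. Equation (3) is then on the order of 3n = 3,000,000 operation units, while equation (4) is on the order of n + x = 1,025,000, roughly a threefold reduction in the record-loading work.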
E. DETAILED DESIGN SPECIFICATION OF ALGORITHM

program Create_Reorg(input,output);
/**********************************************************/
/* This utility program can be used to generate a data-   */
/* base over multiple backends/disks. A two-pass approach */
/* is used in processing the data. Phase-One will scan    */
/* the data, process the data, load this data to a tape   */
/* and build three tables. Phase-Two will sort the tape   */
/* produced in Phase-One, load the sorted data onto the   */
/* disk, then load the tables built in Phase-One onto     */
/* the meta data disk.                                    */
/**********************************************************/
begin
  Phase-One(...);
  if ((Good_Cluster_Distribution) and (Less_N%_Bad_Records)) then
    Phase-Two(...)
  else
    Exit -- Redefine Descriptor File;
end. /* main */
procedure Phase-One(...);
begin
  case TypeOperation of
    Create:
      begin
        get template file;
        get descriptor file;
        Build_AT(...);
        Build_DDIT(...);
      end;
    Reorg:
      begin
        get AT;
        get DDIT;
        remove Type-C attributes from DDIT;
      end;
  end; /* case */
  repeat /* process group of records */
    load in records;
    get a record;
    repeat /* process each record */
      count record;
      scan record;
      Error_Check_Record(...);
      if good_record then
        begin /* process good records */
          format record;
          Build_DDIT(...);
          Build_CDT(...);
          write record to work space;
        end
      else
        begin /* process bad records */
          Bad_Record_Handler;
          count bad records;
        end;
      get record;
    until no_more_records;
    write block of records to tape;
  until end_of_tape;
  determine if there is a good cluster distribution;
  /* CRITERIA: even distribution of records among clusters */
  calculate percentage of bad records;
end; /* Phase-One */
procedure Phase-Two(...);
begin
  tape sort by cluster id;
  repeat /* load records to disk */
    load records;
    get a record;
    repeat
      if (record_in_same_cluster) then
        begin
          if (recordsize + clustersize <= tracksize - 2bytes) then
            begin /* build a cluster */
              add recordsize to clustersize;
              build cluster;
            end
          else /* new cluster, same cluster id */
            begin
              determine backend to receive next cluster;
              load backend;
              Build_CDT(...); /* for each individual backend */
              add recordsize to clustersize;
            end;
        end
      else /* new cluster, different cluster id */
        begin
          build CINBT;
          determine backend to receive next cluster;
          load backend;
          Build_CDT(...);
          add recordsize to clustersize;
        end;
      get a record;
    until no_more_records;
  until end_of_tape;
  /* load last cluster to backend */
  build CINBT;
  determine backend to receive next cluster;
  load backend;
  Build_CDT(...);
  /* load controller meta data */
  build IIGAT;
  build IIGDDIT;
  load IIGAT to controller;
  load IIGDDIT to controller;
  load CINBT to controller;
  /* load meta data to disk */
  case TypeOperation of
    Create:
      begin
        load AT to each backend;
        load DDIT to each backend;
        load individual CDT to appropriate backend;
      end;
    Reorg:
      begin
        load DDIT to each backend;
        load individual CDT to appropriate backend;
      end;
  end; /* case */
end; /* Phase-Two */
IV. IMPLEMENTATION ISSUES

The user interface is the avenue by which the user accesses the database and interacts with the system. The user interface is usually characterized by the data model and its model-based data language. The data model allows the user to refer to the database in terms of its logical representation, and the data language allows the user to write generic transactions and queries against the database. The user data model and language is always a high-level construct which is abstract enough so that the user is not "bogged down" with the details of the database and the database system.

The kernel software of the multi-lingual, multi-model, multi-backend database system (MLDS, MMDS, MBDS) is the attribute-based data model (ABDM) and the attribute-based data language (ABDL). ABDL provides the user with a means of accessing and manipulating the database. ABDL provides the user with five primary operations and an interactive dialogue; it is through these operations and dialogue that the user interacts with the system. It will be through the interactive dialogue that the algorithm will be incorporated into the system.

The interactive dialogue is menu driven. The menus are extensive, but organized simply. All menu items are organized as a multi-level hierarchy, with the top levels indicating the type of dialogues and the low level indicating the specific dialogue to follow.

The algorithm embodies many of the capabilities already existing in the system, so integration of the algorithm into MBDS should be done easily. The inputs needed to generate the database are the template, descriptor and record files. MBDS has the capability, through the user interface, to prompt the user for this information. The inputs needed to reorganize a database are the attribute table (AT) and the descriptor-to-descriptor-id table (DDIT); these tables are already in existence. As stated earlier, the interactive dialogue is menu driven. The algorithm will take advantage of the functions already developed. We propose to modify some of the existing menus of MBDS to integrate the algorithm into the system. The next few pages reflect how the menus should be changed and what options should be added to facilitate the incorporation of the algorithm. Upon entering MBDS, the user will have the following menu appear on the screen:
The Multi-Lingual/Multi-Backend Database System
Select an operation:
  (a) - Execute the attribute-based/ABDL interface
  (r) - Execute the relational/SQL interface
  (h) - Execute the hierarchical/DL/I interface
  (n) - Execute the network/CODASYL interface
  (f) - Execute the functional/DAPLEX interface
  (x) - Exit to the operating system
Select-> a
If the user is creating a new database or reorganizing an existing database, then he/she will select "a" as shown above. The next menu to appear will be:

The attribute-based/ABDL interface:
  (g) - Generate a database
  (o) - Reorganize an existing database
  (l) - Load a database
  (r) - Request interface
  (x) - Exit to operating system
Select-> g
If the user is creating a new database, then he/she will select "g" as shown above. Next the user will be prompted for the number of backends:

Enter number of backends->

The user will enter a numeric value, such as "8". Next the system will provide the user with a series of prompts. These prompts are used to collect information about the template, descriptor and record files (meta data files). The user can enter data into the system two ways: he/she can have a file already built containing the meta data files, or can generate the meta data files at this point. The following prompts will appear:

What operation would you like to perform:
  (p) - Load template, descriptor and record files (predefined files)
  (g) - Load template, descriptor and record files (generate)
  (x) - Exit to operating system
  (z) - Exit and stop MBDS

If the user has already built the meta data files, he/she will select "p" and the following prompts will appear, in succession, requesting the meta data file names:

Enter name of the file containing Template information:
Enter name of the file containing Descriptor information:
Enter name of the file containing Records to be loaded:

If the user does not have predefined meta data files, then he/she will select "g" and the following series of prompts will appear:

For building the Template file:
Enter name of the file to be used to store template information:
Enter database id:
Enter number of templates for database-name:
Enter number of attributes for template #1:
Enter name of template #1:
Enter attribute name #1 for template #1:
Enter value type:
Enter attribute name #2 for template #1:
Enter value type:
Enter number of attributes for template #2:
Enter name of template #2:
Enter attribute name #1 for template #2:
Enter value type:
Enter attribute name #2 for template #2:
Enter value type:
Enter number of attributes for template #n:
Enter name of template #n:
Enter attribute name #1 of template #n:
Enter value type:
For building the Descriptor file:
Enter name of the file used to store descriptor information:
Enter name of template file:
Do you want attribute "attribute-name" to be a directory attribute:
Enter the descriptor type for "attribute-name" (A,B,C):
Enter upper bound for each descriptor in turn -- Enter "@" to stop:
Upper bound:
Enter lower bound -- Use "!" to indicate no lower bound exists
  -- Enter "@" to stop:
Lower bound:
For building the record file:
Enter name of the file containing records to be loaded:

The system now has the three files that it needs to actually begin using the algorithm. A message will appear on the screen informing the user that all inputs are received and that database generation has begun. The algorithm will proceed to process the records. If Phase-One is successful, the algorithm will proceed on to Phase-Two. If Phase-One fails, the system will send a message to the user. The message will read one of the following:

PHASE-ONE FAILED!!! - BAD RECORD DISTRIBUTION
PHASE-ONE FAILED!!! - DATA CORRUPT

The user will then correct the problem. If Phase-One failed because of a bad record distribution, the user should redefine his/her descriptor file; specifically, the type-A range variables should be adjusted. If Phase-One failed because of corrupt data, then the user should check the file containing the bad data. This file was produced during the first pass. After the user has made the correction needed, he/she should re-start the process to generate the new database.
If the user's operation is to reorganize an existing database, then at the menu shown below the user should select "o":

The attribute-based/ABDL interface:
  (g) - Generate a database
  (o) - Reorganize an existing database
  (l) - Load a database
  (r) - Request interface
  (x) - Exit to operating system
Select-> o

The user will receive a series of prompts requesting information about the database to be reorganized. The following prompts will appear:

Enter number of backends:
Enter name of database to be reorganized:
Enter name of the file containing the records to be loaded:

Once the user enters the name of the database to be reorganized, the algorithm can identify the tables needed as input. A message will appear on the screen informing the user that reorganization of the specified database has begun. The system will now actually start using the algorithm, and the algorithm will process the records. If Phase-One is successful, the algorithm will proceed on to Phase-Two. If Phase-One fails, the system will send a message to the user. The message will read one of the following:

PHASE-ONE FAILED!!! - BAD RECORD DISTRIBUTION
PHASE-ONE FAILED!!! - DATA CORRUPT

The user will take whatever action is appropriate to eliminate the problem. Once the problem has been corrected, the user will re-enter the system and restart the process.
V. CONCLUSIONS

We have found that all database systems have four major components: a data model and its model-based data language, a database, software and hardware. We have discussed these components in some detail over the past few chapters. The kernel system is the underlying system used to support the multi-backend database system (MBDS). The kernel system has its own data model and language, known as the attribute-based data model and language (ABDM, ABDL). Like most database systems, it provides its users with the capability to access and manipulate its data.

Unfortunately, as mentioned earlier, the kernel system does not lend itself to providing a fast and efficient method of generating and storing large databases onto disk. We know that we can create these large databases with the use of the INSERT command through the kernel data language, but in order to do so we must use the one-record-at-a-time methodology. This methodology is not very efficient, but it does work. We have also learned that the DELETE command does not physically remove a record from the database but, in reality, tags the record as no longer active; it, too, is a one-record-at-a-time operation. These tagged records leave "holes" in the database, creating fragmentation within the medium. If we collect the active records of the large database and redistribute them over the backends, we eliminate the problem of fragmentation. The entire thrust of this thesis is to provide a solution to the problem of database creation and garbage collection.

A. A REVIEW OF THE RESEARCH
In this thesis, we have addressed the topic of database creation and/or reorganization (garbage collection) over multiple backends. Specifically, we have presented a methodology that will efficiently create very large databases of gigabytes on parallel computers and reorganize them when they have records that have been tagged for deletion. We have designed a utility program to by-pass the system's INSERT command, to load the data directly onto disk and to create all the necessary base data and meta data of the database.

In this thesis, we recognized two approaches (i.e., the One-Pass and Two-Pass approaches) that may be taken with respect to database creation or garbage collection. We discussed the two methods and gave our reasons for selecting the Two-Pass approach as the best alternative.

The chosen Two-Pass methodology entailed two phases. The first phase is known as Phase-One; it deals mostly with record processing. Also performed in the first pass is the initial building of the three directory tables. There is very little I/O processing performed during the first phase. The second phase is known as Phase-Two; it is concerned primarily with loading data, both base and meta, onto the backends' disks. The second phase is I/O-bound. The loading of the data generates many I/O requests, so the separation of the data being processed from the data being loaded has reduced the number of I/O operations. Records are now loaded in clusters instead of individually.

We feel that the methodology presented in this thesis is sufficient for implementation. With the implementation, an efficient means of generating large databases with even distribution over multiple backends will become a reality.

B. SOME OBSERVATIONS AND INSIGHT
The multi-lingual, multi-model, multi-backend database system (MLDS, MMDS, MBDS) is a very powerful, yet simple system. The power of the system lies in its ability to provide one user access to a neighboring database, which was created under a different database model, using his/her own data model and data language. The attribute-based data model and language is the kernel system which supports the MLDS, MMDS and MBDS. There are considerable design, development and testing efforts in making this system a research vehicle for new research undertakings. This thesis, once implemented, will provide an additional capability to an already powerful system.

Some areas of future research in database creation would be (1) expanding this algorithm to generate more than one database at a time, and (2) placing a new database on backends that already have resident databases.
APPENDIX A - ATTRIBUTE TABLE PROGRAM SPECIFICATIONS

#include <stdio.h>
#include "flags.def"
#include "dblocal.def"
#include "beno.def"
#include "commdata.def"
#include "dirman.def"
#include "tmpl.def"
#include "dirman.ext"

struct ddit_definition *ATM_FIND(attribute, desc_type, AT)
/* Find an attribute in AT and return the pointer to its DDIT. Set */
/* desc_type to the type of descriptors defined on the attribute.  */
char attribute[],
     *desc_type;                 /* not C, C, AT_NOTFOUND or AT_DELETED */
struct at_tbl_definition *AT;    /* attribute table */
{
    int pos;                     /* position of attribute in Attribute Table */

#ifdef EnExFlagg
    printf("Enter ATM_FIND\n");
    fflush(stdout);
#endif

    /* find attribute in AT */
    pos = AT_binsearch(AT, attribute);

    /* if attribute not in AT */
    if (pos == -1 || AT->at_entry[pos].at_desc_type == AT_DELETED) {
        *desc_type = AT_NOTFOUND;
#ifdef EnExFlagg
        printf("Exit1 ATM_FIND\n\n");
        fflush(stdout);
#endif
        return(NULL);
    }
    else {
        *desc_type = AT->at_entry[pos].at_desc_type;
#ifdef EnExFlagg
        printf("Exit2 ATM_FIND\n\n");
        fflush(stdout);
#endif
        return(AT->at_entry[pos].at_ddit_ptr);
    }
}   /* end ATM_FIND */
ATM_INSERT(attr_name, attr_id, desc_type, ddit_ptr, AT)
/* Insert an attribute into AT. */
char attr_name[];
int attr_id;
char desc_type;                   /* not C, C, AT_NOTFOUND or AT_DELETED */
struct ddit_definition *ddit_ptr; /* pointer to first DDIT element */
struct at_tbl_definition *AT;     /* attribute table */
{
    int position = 0,  /* index to AT for attribute */
        compare,       /* return value from string comparison of attributes */
        i;

#ifdef EnExFlagg
    printf("Enter ATM_INSERT\n");
    fflush(stdout);
#endif

    /* the attribute table is maintained sorted by attribute name; if not  */
    /* the first entry, the table must be checked for fullness; if the     */
    /* table is full, attributes marked for deletion are removed; if the   */
    /* table is still full (no attributes were marked for deletion) an     */
    /* error condition exists                                              */
    if (!AT->at_no_entry) {
        /* if first entry into table simply insert */
        strcpy(AT->at_entry[0].at_AttrName, attr_name);
        AT->at_entry[0].at_AttrId = attr_id;
        AT->at_entry[0].at_desc_type = desc_type;
        AT->at_entry[0].at_ddit_ptr = ddit_ptr;
        ++AT->at_no_entry;
    }
    else {
        /* if full table remove deleted entries */
        if (AT->at_no_entry == AT_MAX_ENTRIES)
            AT_remove_del(AT);

        /* if table still full */
        if (AT->at_no_entry == AT_MAX_ENTRIES) {
            printf("ERROR: AT is full in ATM_INSERT().\n");
            sleep(ErrDelay);
        }
        else {
            /* find correct position */
            while ((position < AT->at_no_entry) &&
                   (compare = strcmp(attr_name,
                        AT->at_entry[position].at_AttrName)) > 0)
                ++position;

            /* check for duplicate entries */
            if (!compare &&
                (AT->at_entry[position].at_desc_type != AT_DELETED))
                printf("ATM_INSERT: attribute is already in AT\n");
            else {
                /* shift down table entries */
                for (i = AT->at_no_entry - 1; i >= position; --i) {
                    strcpy(AT->at_entry[i+1].at_AttrName,
                           AT->at_entry[i].at_AttrName);
                    AT->at_entry[i+1].at_AttrId = AT->at_entry[i].at_AttrId;
                    AT->at_entry[i+1].at_desc_type =
                        AT->at_entry[i].at_desc_type;
                    AT->at_entry[i+1].at_ddit_ptr =
                        AT->at_entry[i].at_ddit_ptr;
                }

                /* insert new entry */
                strcpy(AT->at_entry[position].at_AttrName, attr_name);
                AT->at_entry[position].at_AttrId = attr_id;
                AT->at_entry[position].at_desc_type = desc_type;
                AT->at_entry[position].at_ddit_ptr = ddit_ptr;
                ++AT->at_no_entry;
            }
        }
    }

#ifdef EnExFlagg
    printf("Exit ATM_INSERT\n\n");
    fflush(stdout);
#endif
    return (position);
}   /* end ATM_INSERT */
static ATM_DELETE(attribute, AT)
/* Mark an attribute in AT for deletion. */
char attribute[];
struct at_tbl_definition *AT;    /* attribute table */
{
    int position;                /* position of attribute in AT table */

    /* find attribute in AT */
    position = AT_binsearch(AT, attribute);
    if (position != -1)
        /* mark the attribute for deletion */
        AT->at_entry[position].at_desc_type = AT_DELETED;
    else
        SysError(5, "ATM_DELETE");
}   /* end ATM_DELETE */
ATM_UPDATE(attribute, new_ddit_ptr, AT)
/* Update the ddit_ptr for an attribute in AT. */
char attribute[];
struct ddit_definition *new_ddit_ptr; /* ptr to first element of DDIT */
struct at_tbl_definition *AT;         /* attribute table */
{
    int position;                     /* index to AT */

#ifdef EnExFlagg
    printf("Enter ATM_UPDATE\n");
    fflush(stdout);
#endif

    /* find attribute in AT */
    position = AT_binsearch(AT, attribute);
    if (position != -1)
        /* update ddit ptr in AT */
        AT->at_entry[position].at_ddit_ptr = new_ddit_ptr;
    else {
        SysError(5, "ATM_UPDATE");
        sleep(ErrDelay);
    }

#ifdef EnExFlagg
    printf("Exit ATM_UPDATE\n\n");
    fflush(stdout);
#endif
}   /* end ATM_UPDATE */
struct at_tbl_definition *AT_lookuptbl(dbid)
/* Find the AT for a database and return a pointer to it. */
char dbid[];
{
    struct db_info *db_ptr, *DB_find();

#ifdef EnExFlagg
    printf("Enter AT_lookuptbl\n");
    fflush(stdout);
#endif

    /* find AT for database dbid */
    db_ptr = DB_find(dbid);
    if (db_ptr) {
        /* AT is found */
#ifdef EnExFlagg
        printf("Exit1 AT_lookuptbl\n\n");
        fflush(stdout);
#endif
        return(db_ptr->db_at_pointer);
    }
    else {
        /* AT not found */
#ifdef EnExFlagg
        printf("Exit2 AT_lookuptbl\n\n");
        fflush(stdout);
#endif
        return(NULL);
    }
}   /* end AT_lookuptbl */
AT_binsearch(AT, attribute)
/* Find an attribute in AT using binary search and return its position */
struct at_tbl_definition *AT;    /* Attribute Table */
char attribute[];
{
    int high,        /* highest index in portion of tbl being searched  */
        low = 0,     /* lowest index in portion of tbl being searched   */
        mid,         /* index whose attr name is being compared         */
        compare;     /* return value from string comparison of attributes */

#ifdef EnExFlagg
    printf("Enter AT_binsearch\n");
    fflush(stdout);
#endif

    high = AT->at_no_entry - 1;
    while (low <= high) {
        mid = (low + high) / 2;
        compare = strcmp(attribute, AT->at_entry[mid].at_AttrName);
        if (!compare) {
#ifdef EnExFlagg
            printf("Exit1 AT_binsearch\n\n");
            fflush(stdout);
#endif
            return(mid);
        }
        else if (compare < 0)
            high = mid - 1;
        else
            low = mid + 1;
    }   /* end while */

    /* attribute not found in AT */
#ifdef EnExFlagg
    printf("Exit2 AT_binsearch\n\n");
    fflush(stdout);
#endif
    return(-1);
}   /* end AT_binsearch */
static AT_remove_del(AT)
/* Remove attributes marked for deletion from AT. */
struct at_tbl_definition *AT;    /* attribute table */
{
    int i, j = 0;                /* indexes to AT table */

    for (i = 0; i < AT->at_no_entry; ++i)
        if (AT->at_entry[i].at_desc_type != AT_DELETED) {
            /* keep this attribute */
            strcpy(AT->at_entry[j].at_AttrName, AT->at_entry[i].at_AttrName);
            AT->at_entry[j].at_AttrId = AT->at_entry[i].at_AttrId;
            AT->at_entry[j].at_desc_type = AT->at_entry[i].at_desc_type;
            AT->at_entry[j].at_ddit_ptr = AT->at_entry[i].at_ddit_ptr;
            ++j;
        }

    /* update number of entries in AT */
    AT->at_no_entry = j;
    AT->at_write_required = TRUE;
}   /* end AT_remove_del */
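The following usage sketch is not part of the thesis code; it merely illustrates, under assumed initialization, how the routines above fit together: an attribute is inserted into the AT and then looked up again.

/* Hypothetical driver for the AT routines (illustration only). The  */
/* AT and the DDIT list are assumed to have been allocated and       */
/* initialized by the caller.                                        */
void at_demo(struct at_tbl_definition *AT, struct ddit_definition *ddit)
{
    char desc_type;
    struct ddit_definition *found;

    ATM_INSERT("SALARY", 1, 'A', ddit, AT);  /* add a type-A attribute */
    found = ATM_FIND("SALARY", &desc_type, AT);
    if (found != NULL)
        printf("SALARY has descriptor type %c\n", desc_type);
}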
APPENDIX B - DESCRIPTOR TABLE PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "dblocal.def"
#include "beno.def"
#include "commdata.def"
#include "dirman.def"
#include "dirman.ext"

struct ddit_definition *
DM_INSERT_DDIT(descriptor, val_type, ddit_list_header, new_desc_id)
/* Add a descriptor to DDIT. If the descriptor is added to the */
/* beginning, return a pointer to it.                          */
struct desc_definition *descriptor;
char val_type;
struct ddit_definition *ddit_list_header; /* first element in DDIT list */
struct DescId *new_desc_id;
{
    int compare;
    struct ddit_definition *create_ddit_node(),
                           *new_ddit, *next_ddit, *prev_ddit;

#ifdef EnExFlag
    printf("Enter DM_INSERT_DDIT\n");
#endif

    new_ddit = create_ddit_node();
#ifdef m_pr_flag
    printf("new_ddit = %o\n", new_ddit);
#endif

    /* place descriptor into ddit element */
    strcpy(new_ddit->lower, descriptor->lower);
    strcpy(new_ddit->upper, descriptor->upper);
    di_cpy(&(new_ddit->ddit_did), new_desc_id);

    if (!ddit_list_header) {
        /* creating new list */
#ifdef EnExFlag
        printf("Exit1 DM_INSERT_DDIT\n");
#endif
        return(new_ddit);
    }
    else {
        /* find proper place in existing list */
        prev_ddit = NULL;
        next_ddit = ddit_list_header;
        if (next_ddit->lower[0] == NOBOUND &&
            next_ddit->upper[0] == NOBOUND) {
            /* the first one is a catchall descriptor; skip it */
            prev_ddit = next_ddit;
            next_ddit = next_ddit->next_ddit_definition;
        }
        while (next_ddit) {
            compare = datacmp(next_ddit->upper, new_ddit->upper, val_type);
            if (compare < 0) {
                prev_ddit = next_ddit;
                next_ddit = next_ddit->next_ddit_definition;
            }
            else
                break;
        }

        /* if new ddit is at beginning of list */
        if (!prev_ddit) {
            new_ddit->next_ddit_definition = ddit_list_header;
#ifdef EnExFlag
            printf("Exit2 DM_INSERT_DDIT\n");
#endif
            return(new_ddit);
        }
        else {
            /* new ddit somewhere after first element */
            new_ddit->next_ddit_definition = prev_ddit->next_ddit_definition;
            prev_ddit->next_ddit_definition = new_ddit;
#ifdef EnExFlag
            printf("Exit3 DM_INSERT_DDIT\n");
#endif
            return(NULL);
        }
    }   /* end if (ddit_list_header) */
}   /* end DM_INSERT_DDIT */
APPENDIX C - CLUSTER DEFINITION TABLE PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "dblocal.def"
#include "beno.def"
#include "commdata.def"
#include "dirman.def"
#include "dirman.ext"

struct cdt_definition *CDTM_INSERT(DT, d_i_s, cid)
/* Add an entry to the cluster-definition table and return a pointer to */
/* it. It will also update DT if necessary (this happens if one or more */
/* of the descriptor ids defining the cluster are not in DT). Update    */
/* DTCT.                                                                */
struct dt_definition *DT;        /* descriptor table */
struct des_id_set *d_i_s;
int cid;
{
    struct cdt_definition *create_cdt_node(), *new_cdt_ptr;
    struct did_link_definition *create_did_link_node(), *desc_ptr,
                               *prev_desc_ptr;
    struct dtct_definition *create_dtct_node(), *new_dtct_ptr;
    int k, ind;

#ifdef EnExFlag
    printf("Enter CDTM_INSERT\n");
#endif

    /* sort the descriptor-id set */
    DescSort(d_i_s);

    /* allocate new CDT entry */
    new_cdt_ptr = create_cdt_node();
#ifdef m_pr_flag
    printf("new_cdt_ptr = %o\n", new_cdt_ptr);
#endif

    /* store the information in the new entry */
    new_cdt_ptr->cdt_clus_no = cid;

    /* copy the descriptor ids into the cdt entry */
    new_cdt_ptr->cdt_no_desc = d_i_s->dis_desc_count;
    desc_ptr = create_did_link_node();
#ifdef m_pr_flag
    printf("desc_ptr = %o\n", desc_ptr);
#endif
    di_cpy(&(desc_ptr->dld_did), &(d_i_s->dis_dids[0]));
    new_cdt_ptr->cdt_did_pointer = desc_ptr;
    for (k = 1; k < d_i_s->dis_desc_count; ++k) {
        prev_desc_ptr = desc_ptr;
        desc_ptr = create_did_link_node();
#ifdef m_pr_flag
        printf("desc_ptr = %o\n", desc_ptr);
#endif
        di_cpy(&(desc_ptr->dld_did), &(d_i_s->dis_dids[k]));
        prev_desc_ptr->next_did_link_definition = desc_ptr;
    }   /* end for */
    desc_ptr->next_did_link_definition = NULL;

    /* update DT and DTCT */
    for (k = 0; k < d_i_s->dis_desc_count; ++k) {
        /* find the descriptor id in DT */
        if ((ind = binsr_dt(&(d_i_s->dis_dids[k]), DT)) == -1)
            /* the descriptor id is not in DT; add it to DT */
            ind = CDTM_DT_INSERT(&(d_i_s->dis_dids[k]), DT);
        new_dtct_ptr = create_dtct_node();
#ifdef m_pr_flag
        printf("new_dtct_ptr = %o\n", new_dtct_ptr);
#endif
        new_dtct_ptr->dtct_cdt_pointer = new_cdt_ptr;
        new_dtct_ptr->next_dtct_definition =
            DT->dt_entry[ind].dt_dtct_pointer;
        DT->dt_entry[ind].dt_dtct_pointer = new_dtct_ptr;
        DT->dt_entry[ind].dt_clus_count++;
    }   /* end for */
    DT->dt_write_required = TRUE;

#ifdef EnExFlag
    printf("Exit CDTM_INSERT\n\n");
#endif
    return (new_cdt_ptr);
}   /* end CDTM_INSERT */
CDTM_DT_INSERT(d_id, DT)
/* Add a descriptor id to the descriptor table (DT) if it is not */
/* already there. Return the position of the descriptor id in DT. */
struct DescId *d_id;
struct dt_definition *DT;        /* descriptor table */
{
    int pos, k;

#ifdef EnExFlag
    printf("Enter CDTM_DT_INSERT\n");
#endif

    /* check to see if the descriptor id is already in DT --  */
    /* this check is done where CDTM_DT_INSERT is called      */
#if 0
    if ((pos = binsr_dt(d_id, DT)) != -1) {
        /* descriptor id is already in DT */
#ifdef EnExFlag
        printf("Exit1 CDTM_DT_INSERT\n\n");
#endif
        return (pos);
    }
#endif

    /* the descriptor id is not in DT; add it */
#ifdef special_flag
    printf("DT->dt_no_did = %d\n", DT->dt_no_did);
#endif

    /* check for room */
    if (DT->dt_no_did >= DT_MAX_DESC) {
        SysError(9, "CDTM_DT_INSERT");
        sleep(ErrDelay);
    }

    /* find the appropriate position in DT for the descriptor id (DT is */
    /* ranked in ascending order by descriptor ids)                     */
    for (pos = 0; pos < DT->dt_no_did &&
         strcmp(DT->dt_entry[pos].dt_did.did, d_id->did) < 0; ++pos)
        ;

    /* the new descriptor id should be added to DT at position 'pos';   */
    /* move down the descriptor ids in DT to make space for the new one */
    for (k = DT->dt_no_did; k > pos; --k) {
        di_cpy(&(DT->dt_entry[k].dt_did), &(DT->dt_entry[k-1].dt_did));
        DT->dt_entry[k].dt_clus_count = DT->dt_entry[k-1].dt_clus_count;
        DT->dt_entry[k].dt_dtct_pointer = DT->dt_entry[k-1].dt_dtct_pointer;
    }

    /* add the new descriptor id */
    di_cpy(&(DT->dt_entry[pos].dt_did), d_id);
    DT->dt_entry[pos].dt_clus_count = 0;
    DT->dt_entry[pos].dt_dtct_pointer = NULL;

    /* increment the number of descriptor ids in DT */
    ++DT->dt_no_did;

#ifdef EnExFlag
    printf("Exit2 CDTM_DT_INSERT\n\n");
#endif
    return (pos);
}   /* end CDTM_DT_INSERT */
APPENDIX D - RECORD CHECKER PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "beno.def"
#include "commdata.def"
#include "reqprep.def"
#include "reqprep.ext"

chk_ParsedTrafUnit(ParsedRequestsPtr, dbid, err_msg)
/* Check all the requests in a traffic unit against the record template */
struct req_index_definition *ParsedRequestsPtr;
char err_msg[], dbid[];
{
    int k, tmpl_index;
    struct REQtbl_definition *req_ptr;
    struct rtemp_definition *tmpl_ptr, *get_tmpl_ptr();
#ifdef pr_flag
    int z = 0;
#endif

#ifdef EnExFlag
    printf("Enter chk_ParsedTrafUnit\n");
#endif

    for (k = 0;
         (req_ptr = ParsedRequestsPtr->req_tbl_pointer[k]) != NULL; ++k) {
#ifdef pr_flag
        while (req_ptr->req_tbl[z][0] != EORequest)
            printf("%s\n", req_ptr->req_tbl[z++]);
        printf("%s\n", req_ptr->req_tbl[z+1]);
#endif
        /* Get the record template. The template name is found as follows:
         *   req_tbl[4][0]       = request type
         *   req_tbl[tmpl_index] = template name
         */
        if ((req_ptr->req_tbl[4][0] - '0') == INSERT)
            tmpl_index = 7;
        else
            tmpl_index = 8;
        /*
        switch (req_ptr->req_tbl[4][0] - '0') {
        case INSERT:
            tmpl_index = 7;
            break;
        case RETRIEVE:
        case RET_COM:
        case DELETE:
        case UPDATE:
            tmpl_index = 8;
        }
        */
        tmpl_ptr = get_tmpl_ptr(dbid, req_ptr->req_tbl[tmpl_index]);
        if (!chk_request(ParsedRequestsPtr->req_tbl_pointer[k],
                         tmpl_ptr, err_msg)) {
#ifdef EnExFlag
            printf("Exit1 chk_ParsedTrafUnit\n");
#endif
            return(FALSE);
        }
    }   /* end of for loop */

    /* all the parsed requests are ok */
#ifdef EnExFlag
    printf("Exit2 chk_ParsedTrafUnit\n");
#endif
    return(TRUE);
}   /* end chk_ParsedTrafUnit */
chk_request(req_ptr, rtemp_ptr, err_msg)
/* Purpose:                                                           */
/*   This procedure checks the validity of a request by comparing it */
/*   against the record template. It returns TRUE if the request is  */
/*   valid; otherwise, it returns FALSE and an error message.        */
struct REQtbl_definition *req_ptr;
struct rtemp_definition *rtemp_ptr;
char err_msg[];
{
    char flag_attr;
    int flag_insrt, flag_dlt, flag_rtrv, flag_updt;
    int j, mod_type;
    char chk_attr_name();

#ifdef EnExFlag
    printf("Enter chk_request\n");
#endif

    /* check request type */
    switch (req_ptr->req_tbl[4][0] - '0') {

    case INSERT:
        flag_insrt = chk_insrt_rec(req_ptr, rtemp_ptr, err_msg);
#ifdef EnExFlag
        printf("Exit1 chk_request\n");
#endif
        return(flag_insrt);
        break;

    case DELETE:
        j = 6;
        flag_dlt = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
#ifdef EnExFlag
        printf("Exit2 chk_request\n");
#endif
        return(flag_dlt);
        break;

    case RET_COM:
    case RETRIEVE:
        j = 6;
        flag_rtrv = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
        if (!flag_rtrv) {
#ifdef EnExFlag
            printf("Exit3 chk_request\n");
#endif
            return(FALSE);
        }
        j++;  /* skip EOQuery */

        /* check target list */
        while (req_ptr->req_tbl[j][0] != ETList) {
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0') {
                concat("invalid attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit4 chk_request\n");
#endif
                return(FALSE);
            }
            j++;  /* skip attribute name */
            j++;  /* skip the aggregate */
        }   /* end while */
        j++;  /* skip ETList */

        /* check the attribute for the by clause */
        if (strcmp(req_ptr->req_tbl[j], "000") != 0) {
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0') {
                concat("invalid attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit5 chk_request\n");
#endif
                return(FALSE);
            }
        }
#ifdef EnExFlag
        printf("Exit6 chk_request\n");
#endif
        return(TRUE);
        break;

    case UPDATE:
        j = 6;
        flag_updt = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
        if (!flag_updt) {
#ifdef EnExFlag
            printf("Exit7 chk_request\n");
#endif
            return(FALSE);
        }
        j++;  /* skip EOQuery */
        mod_type = req_ptr->req_tbl[j][0] - '0';
        j++;  /* skip modifier type */

        /* check the attr-being-modified */
        flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
        if (flag_attr == '0') {
            concat("invalid attribute name being modified: ",
                   req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
            printf("Exit8 chk_request\n");
#endif
            return(FALSE);
        }
        j++;  /* skip attribute being modified */

        /* check modifier type */
        switch (mod_type) {

        case MT0:
            /* check the new value for valid value type */
            /* <<< TBC >>> */
            break;

        case MT1:
        case MT2:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0') {
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit9 chk_request\n");
#endif
                return(FALSE);
            }
            break;

        case MT3:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0') {
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit10 chk_request\n");
#endif
                return(FALSE);
            }
            j++;  /* skip base attribute */
            while (req_ptr->req_tbl[j][0] != EOExpr)
                j++;
            j += 2;  /* skip EOExpr and number of predicates */
            flag_attr = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
            if (!flag_attr) {
#ifdef EnExFlag
                printf("Exit11 chk_request\n");
#endif
                return(FALSE);
            }
            break;

        case MT4:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0') {
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit12 chk_request\n");
#endif
                return(FALSE);
            }
            break;

        default:
            concat("invalid modifier type", (mod_type + '0'), err_msg);
#ifdef EnExFlag
            printf("Exit13 chk_request\n");
#endif
            return(FALSE);
            break;
        }   /* end switch (mod_type) */

#ifdef EnExFlag
        printf("Exit14 chk_request\n");
#endif
        return(TRUE);
        break;

    default:
        concat("invalid request type: ", req_ptr->req_tbl[4], err_msg);
#ifdef EnExFlag
        printf("Exit15 chk_request\n");
#endif
        return(FALSE);
        break;
    }   /* end switch (req_ptr->req_tbl[4][0] - '0') */

#ifdef EnExFlag
    printf("Exit chk_request\n");
#endif
}   /* end chk_request */
chk_insrt_rec(req_ptr, rtemp_ptr, err_msg)
/* Purpose:                                                           */
/*   This routine checks if all (and only) attribute names in the    */
/*   record template are in an insert request. It also checks the    */
/*   value type associated with each attribute. It returns TRUE if   */
/*   ok; otherwise it returns FALSE and an error message.            */
struct REQtbl_definition *req_ptr;
struct rtemp_definition *rtemp_ptr;
char err_msg[];
{
    int flag_ins;
    int k;
    int no_kwrd;
    char str_no_attr[5];

#ifdef EnExFlag
    printf("Enter chk_insrt_rec\n");
#endif

    no_kwrd = str_to_num(req_ptr->req_tbl[5]);
    if (no_kwrd != rtemp_ptr->no_entries) {
        num_to_str(rtemp_ptr->no_entries, str_no_attr);
        concat("number of keywords in the insert request should be: ",
               str_no_attr, err_msg);
#ifdef EnExFlag
        printf("Exit1 chk_insrt_rec\n");
#endif
        return(FALSE);
    }

    for (k = 0; k < rtemp_ptr->no_entries; ++k) {
        flag_ins = insrt_attr_name(rtemp_ptr->rt_entry[k].attr_name,
                                   rtemp_ptr->rt_entry[k].value_data_type,
                                   req_ptr, err_msg);
        if (!flag_ins) {
#ifdef EnExFlag
            printf("Exit2 chk_insrt_rec\n");
#endif
            return(FALSE);
        }
    }   /* end for */

#ifdef EnExFlag
    printf("Exit3 chk_insrt_rec\n");
#endif
    return(TRUE);
}   /* end chk_insrt_rec */
insrt_attr_name(att_name, att_val_type, req_ptr, err_msg)
/* Purpose:                                                           */
/*   This routine checks if an attribute name in the record template  */
/*   is in the insert request, and also checks the validity of the    */
/*   value type associated with the attribute name.  It returns TRUE  */
/*   if the attribute name exists with a valid value type;            */
/*   otherwise, it returns FALSE.                                     */
char   att_name[];
char   att_val_type;
struct REQtbl_definition *req_ptr;
char   err_msg[];
{
  int j;
  int flag_value;

#ifdef EnExFlag
  printf("Enter insrt_attr_name \n");
#endif
  j = 6;
  while ( req_ptr->req_tbl[j][0] != EORecord )
  {
    if ( strcmp(att_name, req_ptr->req_tbl[j]) == 0 )
    {
      ++j; /* advance to the attribute value */
      flag_value = chk_value_type(att_val_type, req_ptr->req_tbl[j]);
      if ( !flag_value )
      {
        concat("invalid value type for attribute: ",
               req_ptr->req_tbl[j-1], err_msg);
#ifdef EnExFlag
        printf("Exit1 insrt_attr_name \n");
#endif
        return(FALSE);
      }
#ifdef EnExFlag
      printf("Exit2 insrt_attr_name \n");
#endif
      return(TRUE);
    }
    j = j + 2; /* skip attribute name and attribute value */
  } /* end while */
  concat("missing attribute name: ", att_name, err_msg);
#ifdef EnExFlag
  printf("Exit3 insrt_attr_name \n");
#endif
  return(FALSE);
} /* end insrt_attr_name */
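/* Editor's worked example (hypothetical data, for illustration        */
/* only): with a record template {FILE:s, SIZE:i} and an insert        */
/* request whose keyword pairs are ("FILE","abc") and ("SIZE","10"),   */
/* the call insrt_attr_name("SIZE", 'i', req_ptr, err_msg) scans the   */
/* pairs from j = 6, matches "SIZE" at the second pair, advances to    */
/* the value "10", passes the chk_value_type('i', "10") check, and     */
/* returns TRUE.  For an attribute absent from the request it would    */
/* fall out of the loop and return FALSE with a "missing attribute     */
/* name" message.                                                      */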
chk_non_insrt_q(req_ptr, i, rtemp_ptr, err_msg)
/* Purpose:                                                           */
/*   This procedure checks the validity of the query part of a        */
/*   non-insert request.  It returns TRUE if valid; otherwise, it     */
/*   returns FALSE and an error message.                              */
struct REQtbl_definition *req_ptr;
int    *i;
struct rtemp_definition  *rtemp_ptr;
char   err_msg[];
{
  int  flag_value;
  char flag_attr;
  char chk_attr_name();

#ifdef EnExFlag
  printf("Enter chk_non_insrt_q \n");
#endif
  while ( req_ptr->req_tbl[*i][0] != EOQuery )
  {
    flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[*i]);
    if ( flag_attr == '0' )
    {
      concat("invalid attribute name: ", req_ptr->req_tbl[*i], err_msg);
#ifdef EnExFlag
      printf("Exit1 chk_non_insrt_q \n");
#endif
      return(FALSE);
    }
    ++*i; /* skip attribute name */
    ++*i; /* skip relational operator */
    flag_value = chk_value_type(flag_attr, req_ptr->req_tbl[*i]);
    if ( !flag_value )
    {
      concat("invalid value for attribute: ",
             req_ptr->req_tbl[*i-2], err_msg);
#ifdef EnExFlag
      printf("Exit2 chk_non_insrt_q \n");
#endif
      return(FALSE);
    }
    ++*i; /* skip attribute value */
    if ( req_ptr->req_tbl[*i][0] == EOConj )
      ++*i; /* skip EOConj */
  } /* end while */
#ifdef EnExFlag
  printf("Exit3 chk_non_insrt_q \n");
#endif
  return(TRUE);
} /* end chk_non_insrt_q */
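/* Editor's sketch (assumed layout, inferred from the skips above):    */
/* the query part walked by chk_non_insrt_q appears to be a flat       */
/* token list of <attribute> <relational operator> <value> triples,    */
/* each optionally followed by an EOConj marker and the whole list     */
/* terminated by EOQuery, e.g. conceptually:                           */
/*                                                                     */
/*   "SIZE"  ">"  "100"  EOConj  "FILE"  "="  "abc"  EOQuery           */
/*                                                                     */
/* Each attribute is validated against the record template, and each   */
/* value against that attribute's declared value type.                 */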
char chk_attr_name(rtemp_ptr, attribute)
/* Purpose:                                                           */
/*   This procedure checks if an attribute name exists in the         */
/*   record template.  If the attribute is in the record template,    */
/*   it returns the type of value (i, s, f, ...) for the attribute.   */
/*   If the attribute is not in the record template, it returns '0'.  */
struct rtemp_definition *rtemp_ptr;
char   attribute[];
{
  int i;

#ifdef EnExFlag
  printf("Enter chk_attr_name \n");
#endif
  for ( i = 0; i < rtemp_ptr->no_entries; ++i )
    if ( strcmp(rtemp_ptr->rt_entry[i].attr_name, attribute) == 0 )
    {
#ifdef EnExFlag
      printf("Exit1 chk_attr_name \n");
#endif
      return(rtemp_ptr->rt_entry[i].value_data_type);
    }
#ifdef EnExFlag
  printf("Exit2 chk_attr_name \n");
#endif
  return('0');
} /* end chk_attr_name */
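/* Editor's sketch (not part of the original listing): a minimal,      */
/* hypothetical driver for chk_attr_name.  It assumes the              */
/* rtemp_definition declaration used above is in scope and that        */
/* rt_entry[k].attr_name is a character array; compile with            */
/* -DATTR_NAME_DEMO to try it.                                         */
#ifdef ATTR_NAME_DEMO
#include <stdio.h>
#include <string.h>
int main()
{
  struct rtemp_definition rt;

  /* hand-built two-entry record template (illustrative data) */
  rt.no_entries = 2;
  strcpy(rt.rt_entry[0].attr_name, "FILE");
  rt.rt_entry[0].value_data_type = 's';
  strcpy(rt.rt_entry[1].attr_name, "SIZE");
  rt.rt_entry[1].value_data_type = 'i';

  printf("%c\n", chk_attr_name(&rt, "SIZE"));  /* expected: i */
  printf("%c\n", chk_attr_name(&rt, "OWNER")); /* expected: 0 */
  return 0;
}
#endif /* ATTR_NAME_DEMO */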
chk_value_type(value_type, val)
/* Purpose:                                                           */
/*   This procedure checks the validity of a value.  It returns       */
/*   TRUE if valid; otherwise, it returns FALSE.                      */
char value_type;
char val[];
{
  int i, negflag = 0;

#ifdef EnExFlag
  printf("Enter chk_value_type \n");
#endif
  switch ( value_type )
  {
    case 'i':
      /* allow for negative numbers */
      if ( (val[0] < '0' || val[0] > '9') && (val[0] != '-') )
        return(FALSE);
      else if ( val[0] == '-' )
        /* since it is negative, make sure numbers follow, */
        /* not just '-'                                    */
        negflag = 1;
      for ( i = 1; val[i] != '\0'; ++i )
        if ( val[i] < '0' || val[i] > '9' )
        {
#ifdef EnExFlag
          printf("Exit1 chk_value_type \n");
#endif
          return(FALSE);
        }
      /* scold the user if just a negative sign */
      if ( (negflag) && (i == 1) )
        return(FALSE);
#ifdef EnExFlag
      printf("Exit2 chk_value_type \n");
#endif
      return(TRUE);
      break;

    case 's':
#ifdef EnExFlag
      printf("Exit3 chk_value_type \n");
#endif
      return(TRUE);
      break;

    case 'f':
#ifdef EnExFlag
      printf("Exit4 chk_value_type \n");
#endif
      return(TRUE);
      break;

    default:
#ifdef EnExFlag
      printf("Exit5 chk_value_type \n");
#endif
      return(FALSE);
      break;
  } /* end switch */
#ifdef EnExFlag
  printf("Exit6 chk_value_type \n");
#endif
} /* end chk_value_type */
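/* Editor's sketch (not part of the original listing): a minimal,      */
/* hypothetical test driver for chk_value_type.  The value-type codes  */
/* 'i' and 's' follow the cases above, and TRUE/FALSE are assumed to   */
/* be the usual 1/0 macros.  Compile with -DVALUE_TYPE_DEMO to try it. */
#ifdef VALUE_TYPE_DEMO
#include <stdio.h>
int main()
{
  printf("%d\n", chk_value_type('i', "-42"));  /* expected: TRUE  */
  printf("%d\n", chk_value_type('i', "-"));    /* expected: FALSE */
  printf("%d\n", chk_value_type('i', "12a"));  /* expected: FALSE */
  printf("%d\n", chk_value_type('s', "abc"));  /* expected: TRUE  */
  printf("%d\n", chk_value_type('x', "1"));    /* expected: FALSE */
  return 0;
}
#endif /* VALUE_TYPE_DEMO */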
INITIAL DISTRIBUTION LIST

                                                            No. Copies

1.  Defense Technical Information Center                        2
    Cameron Station
    Alexandria, Virginia 22304-6145

2.  Library, Code 0142                                          2
    Naval Postgraduate School
    Monterey, California 93943-5002

3.  Chief of Naval Operations                                   1
    Director, Information Systems (OP-945)
    Navy Department
    Washington, D.C. 20350-2000

4.  Department Chairman, Code 52                                2
    Department of Computer Science
    Naval Postgraduate School
    Monterey, California 93943-5000

5.  Curriculum Officer, Code 37                                 1
    Computer Technology
    Naval Postgraduate School
    Monterey, California 93943-5000

6.  Professor David K. Hsiao, Code 52Hq                         2
    Computer Science Department
    Naval Postgraduate School
    Monterey, California 93943-5000

7.  Professor Steven A. Demurjian                               2
    Computer Science & Engineering Department
    The University of Connecticut
    260 Glenbrook Road
    Storrs, Connecticut 06268

8.  Deborah A. McGhee                                           2
    232 Foxhunt Road
    Columbia, S.C. 29223