Calhoun: The NPS Institutional Archive
Theses and Dissertations
Thesis and Dissertation Collection
1997-03
Techniques for multiple database integration
Whitaker, Barron D
Monterey, California. Naval Postgraduate School
http://hdl.handle.net/10945/9058
NAVAL POSTGRADUATE SCHOOL
Monterey, California

THESIS

TECHNIQUES FOR MULTIPLE DATABASE INTEGRATION
by
Barron D. Whitaker
March 1997

Thesis Advisor: C. Thomas Wu

Approved for public release; distribution is unlimited.
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503.

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: March 1997
3. REPORT TYPE AND DATES COVERED: Master's Thesis
4. TITLE AND SUBTITLE: TECHNIQUES FOR MULTIPLE DATABASE INTEGRATION
5. FUNDING NUMBERS
6. AUTHOR(S): Whitaker, Barron D.
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School, Monterey, CA 93943-5000
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)
10. SPONSORING / MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION / AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
12b. DISTRIBUTION CODE
13. ABSTRACT (maximum 200 words)
There are several graphic client/server application development tools which can be
used to easily develop powerful relational database applications. However, they do not
provide a direct means of performing queries which require relational joins across multiple
database boundaries.
This thesis studies ways to access multiple databases. Specifically, it examines how
a "cross-database join" can be performed. A case study of techniques used to perform joins
between academic department financial management system and course management system
databases was done using PowerBuilder 5.0.
Although we were able to perform joins across database boundaries, we found that
PowerBuilder is not conducive to cross-database join access because no relational database
engine is available to execute cross-database queries.
14. SUBJECT TERMS: Relational Database, Object Based Database Systems Design, Multiple Database Integration
15. NUMBER OF PAGES: 22
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18 298-102
Approved for public release; distribution is unlimited.

TECHNIQUES FOR MULTIPLE DATABASE INTEGRATION

Barron D. Whitaker
Major, United States Marine Corps
B.S., University of Texas A & M, 1984

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

from the

NAVAL POSTGRADUATE SCHOOL
March 1997
ABSTRACT

There are several graphic client/server application development tools which can be used to easily develop powerful relational database applications. However, they do not provide a direct means of performing queries which require relational joins across multiple database boundaries.

This thesis studies ways to access multiple databases. Specifically, it examines how a "cross-database join" can be performed. A case study of techniques used to perform joins between academic department financial management system and course management system databases was done using PowerBuilder 5.0.

Although we were able to perform joins across database boundaries, we found that PowerBuilder is not conducive to cross-database join access because no relational database engine is available to execute cross-database queries.
TABLE OF CONTENTS

I. INTRODUCTION
   A. BACKGROUND
   B. THESIS OBJECTIVES
   C. METHODOLOGY
   D. ORGANIZATION OF THE THESIS
II. RELATIONAL DATABASE MODEL
   A. RELATIONAL MODEL OVERVIEW
   B. STRENGTHS
   C. LIMITATIONS
III. INTEGRATION METHODS
   A. MULTIPLE DATABASE INTEGRATION
   B. PHYSICAL INTEGRATION
   C. LOGICAL INTEGRATION
      1. Database Level
      2. Application Level
   D. INTEGRATION STRATEGY CHALLENGES
      1. Data Type Conflicts
      2. Duplicate Data
      3. Concurrency and Consistency
IV. CASE STUDY: INTEGRATION OF COURSE MANAGEMENT AND FINANCIAL MANAGEMENT DATABASES
   A. BACKGROUND
      1. Financial Management System
      2. Course Management System
      3. InterDatabase Relationships
   B. INTEGRATION TECHNIQUE
      1. Create CMS
      2. Multiple Database Connections
      3. Cross-database Join Query
      4. Query Subcomponents
      5. Data Window Objects
      6. Putting the Pieces Together with Windows and Scripts
      7. Performance Drag
      8. A More Efficient Technique
V. PERFORMANCE ISSUES
   A. QUERY OPTIMIZATION
      1. Relational Join Minimization
      2. Table Size
   B. PARALLELISM
VI. CONCLUSIONS
APPENDIX A. ENTITY RELATIONSHIP DIAGRAM FOR THE FMS DATABASE
APPENDIX B. FMS DATABASE SCHEMA
APPENDIX C. CMS ENTITY-RELATIONSHIP DIAGRAM
APPENDIX D. CMS DATABASE SCHEMA
APPENDIX E. CMS SCRIPTS AND SAMPLE SCREEN CAPTURES
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
LIST OF FIGURES

1. Integrated view of tables from disparate databases
2. Physical Database Integration
3. Logical Integration at the Database Level
4. Application Level Integration
5. Application script for concurrent database connections
6. Cross-database query between CMS and FMS
7. First subcomponent query
8. Second subcomponent query
9. Data source query for "d_teach_less_than_2" data window
10. Data source query for "d_pi_less_than_2" data window
11. Cross-database join illustration using data windows
12. Nested for loop script to perform cross-database join
13. Improved data source query for data window
14. Single for loop script to perform cross-database join
I. INTRODUCTION

A. BACKGROUND

Organizations use database systems and applications to process information needed to conduct daily business and operations. Historically, these database systems were individually developed to meet specific needs. As an organization's information requirements changed over time, additional applications were written to maintain, process, and extract information from the individual databases. Typically, however, each of these databases provided only a part of the total picture needed by an organization. An example of this is illustrated in Figure 1. In this example, integration of the two databases makes it possible to perform a query to retrieve the answer to the question, "How many sophomores have books checked out?". However, this information is not available from either of the preexisting underlying databases in this example.

Additionally, as a result of the way that databases in organizations tend to evolve, it often becomes necessary to maintain databases that contain data which is duplicated in other databases. This leads to the additional problem of how to deal with data inconsistencies between the differing systems. This situation is inefficient both from a management perspective and from a data processing perspective since it requires the additional overhead associated with maintaining the same information in more than one database. It also makes it difficult, if not impossible, to maintain consistency and concurrency of the duplicate data. Manpower intensive solutions for eliminating inconsistencies and managing concurrency are costly and inefficient when compared to the potential of automated solutions to the problem. If an organization could identify effective and efficient automated methods for integrating existing databases, it could significantly enhance the usefulness of the information it already maintains by eliminating database organizational inefficiencies which cause data duplication and inconsistencies.
[Figure 1. Integrated view of tables from disparate databases. The integrated view combines a "book checkout" table (book, student id, due date) with a "student info" table (student id, classification), linked on student id.]
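The Figure 1 scenario can be made concrete with a small sketch. It is not taken from the thesis: it assumes Python's standard `sqlite3` module, uses two in-memory databases as stand-ins for the library and student databases, and the table and column names are hypothetical names modeled on the figure labels. Neither database alone holds both the checkout records and the classifications; attaching one to the other is what lets a single query answer the question.

```python
import sqlite3


def build_integrated_view() -> sqlite3.Connection:
    """Open a stand-in library database and attach a stand-in student database."""
    conn = sqlite3.connect(":memory:")                      # hypothetical library database
    conn.execute("ATTACH DATABASE ':memory:' AS admin")     # hypothetical student database
    conn.execute(
        "CREATE TABLE book_checkout (book TEXT, student_id INTEGER, due_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE admin.student_info "
        "(student_id INTEGER PRIMARY KEY, classification TEXT)"
    )
    # Illustrative sample data only.
    conn.executemany(
        "INSERT INTO book_checkout VALUES (?, ?, ?)",
        [("Book A", 1, "1997-03-10"), ("Book B", 1, "1997-03-12"),
         ("Book C", 3, "1997-03-15"), ("Book D", 4, "1997-03-20")],
    )
    conn.executemany(
        "INSERT INTO admin.student_info VALUES (?, ?)",
        [(1, "sophomore"), (2, "sophomore"), (3, "junior"), (4, "sophomore")],
    )
    return conn


def sophomores_with_books(conn: sqlite3.Connection) -> int:
    """Count distinct sophomores who have at least one book checked out."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT s.student_id)
        FROM admin.student_info AS s
        JOIN main.book_checkout AS b ON b.student_id = s.student_id
        WHERE s.classification = 'sophomore'
        """
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(sophomores_with_books(build_integrated_view()))
```

The join runs here only because one engine can see both databases at once, which is the capability the thesis goes on to investigate.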
B. THESIS OBJECTIVES

The purpose of this thesis is to determine if it is possible to concurrently access multiple independent databases within an organization in a way that provides an integrated view of the information they contain. Furthermore, assuming that this is possible, can multiple databases be effectively and efficiently integrated into a single cohesive database management system to provide a focal point of access for database applications? What are the limitations in the context of the relational model and in the context of the current database application development technology? In the process of addressing these questions this thesis also provides some analysis of the different methods which might be used to integrate multiple databases and lists the relative advantages and disadvantages of these methods.
C. METHODOLOGY

In order to test the actual ability to integrate multiple databases, a relatively simple subset of the overall larger problem was selected. The problem model was built around a university academic department organization with only two specific database applications, each supported by a dedicated database. PowerBuilder 5.0, which is a relational database management system front-end development tool, was used to develop a course scheduling and management application and supporting database. This application was designed to provide a university academic department with a tool to manage and plan course scheduling and course assignments to teachers. A method was then developed to integrate the course scheduling and management database with a pre-existing academic department financial management system database which was also developed using PowerBuilder 5.0. This method was then evaluated and conclusions were developed.
D. ORGANIZATION OF THE THESIS

This thesis is organized as follows: Chapter II presents a brief discussion of the relational database model, its impact on database design, its strengths for managing and accessing data, as well as its limitations. The primary reason for this discussion is to set the framework for any theoretical limitations which may occur in implementing a multiple database solution. Chapter III provides an overview and some discussion of the various approaches which might be taken to develop an integration solution as well as potential advantages and disadvantages. Chapter IV presents a case study of one of the approaches discussed in Chapter III and evaluates this solution against the desired goals of database integration. Chapter V looks at the various database integration performance issues in the context of the relational model and considers the impact of these issues in terms of the model's strengths as well as its limitations. Finally, Chapter VI will present conclusions drawn from this research.
II. RELATIONAL DATABASE MODEL

A. RELATIONAL MODEL OVERVIEW

The relational model has been the subject of a great deal of research since its introduction by Codd in the early 70's. In the 80's it began to replace many preexisting networking and hierarchical database systems and is currently the basis of most commercially available database management systems. This may be changing, however, as more and more organizations are looking for ways to improve their information management and access capabilities through the use of object oriented solutions such as the ODMG database model. [Ref. 1]

In the relational model, data is stored as relations which are frequently referred to as tables. However, it is important to remember that these two terms are not completely interchangeable since a relation is best represented as a mathematical set of tuples which has no ordering on its members. A table, however, usually has some sort of order imposed on its members since it represents an actual file or some other physical implementation of an instance of a relation. A relation or table consists of rows or tuples. Because, by definition, a relation is a set of tuples, each row is distinct and corresponds to an instance of an entity or a relationship which exists between two or more entity instances. The members in a row or tuple are the set of attributes of the entity or relationship instance. The attributes, which correspond to the columns of a table, are the properties of the entity or relationship instance.

Using this basic structure, database schemas can be defined by a set of relation schemas or definitions and a set of integrity constraints placed upon those schemas. There are three fundamental update operations for a relation and they are limited by the integrity constraints on the relation schemas. They are INSERT, MODIFY (also called UPDATE), and DELETE. Although Codd originally defined eight relational algebra operations in the relational model for manipulation of relation instances, only five of these operations are essentially primitive. These operations are SELECT, PROJECT, UNION, DIFFERENCE, and CARTESIAN PRODUCT. [Refs. 2, 3] A more detailed discussion of relation schemas, integrity constraints, update operations, and relational algebra operations can be found in [Ref. 4]. It is this relatively simple framework which has formed the basis of a large number of relational database management systems in recent years.
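As an illustration of why five operations suffice as primitives, the sketch below models a relation as a set of tuples and expresses a join as a CARTESIAN PRODUCT followed by a SELECT. It is a teaching aid under stated assumptions, not code from the thesis: the `Relation` type, the function names, and the sample attribute names are all hypothetical.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    """A relation: attribute names plus an unordered set of distinct tuples."""
    attrs: tuple[str, ...]
    rows: frozenset


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """SELECT: keep the tuples that satisfy the predicate."""
    kept = frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t))))
    return Relation(r.attrs, kept)


def project(r: Relation, attrs: tuple[str, ...]) -> Relation:
    """PROJECT: keep only the named attributes (duplicates collapse in the set)."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))


def union(r: Relation, s: Relation) -> Relation:
    """UNION: requires union-compatible relations (same attributes here)."""
    if r.attrs != s.attrs:
        raise ValueError("relations are not union compatible")
    return Relation(r.attrs, r.rows | s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    """DIFFERENCE: tuples in r that are not in s."""
    if r.attrs != s.attrs:
        raise ValueError("relations are not union compatible")
    return Relation(r.attrs, r.rows - s.rows)


def product(r: Relation, s: Relation) -> Relation:
    """CARTESIAN PRODUCT: every pairing of a tuple from r with a tuple from s."""
    if set(r.attrs) & set(s.attrs):
        raise ValueError("attribute names must be distinct")
    return Relation(r.attrs + s.attrs, frozenset(a + b for a in r.rows for b in s.rows))


def theta_join(r: Relation, s: Relation, pred: Callable[[dict], bool]) -> Relation:
    """JOIN is not primitive: it is a product followed by a select."""
    return select(product(r, s), pred)
```

For example, joining a hypothetical `student(sid, classification)` relation with `checkout(book, borrower)` on `sid == borrower` and projecting onto `(sid, book)` yields one tuple per checked-out book. Note that `product` materializes every pairing before filtering, which hints at why the join is singled out later as a costly operation.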
B. STRENGTHS

• The relational model is based on a simple and uniform data structure and is therefore easier to understand than other database models such as the hierarchical and network data models.

• It is strongly founded on the mathematical concepts of predicate logic and set theory, allowing the representation and manipulation of data to be formalized. [Ref. 3] This is the basis of SQL, which is one of the most widely used query languages in database management systems.

• It allows a more abstract representation of a database than previous models such as the hierarchical and network data models. [Ref. 3] Therefore it is easier to design a database model which more closely resembles real world instances and relationships.
C. LIMITATIONS

• Insufficient semantic completeness or expressiveness. [Ref. 5] This has been a very popular topic for discussion and many have proposed semantic modeling extensions to the relational model to improve it. [Refs. 5, 6, 7] A good example of a semantic modeling extension is the Entity-Relationship (ER) model. It is one of the most well known semantic data models because it is simple and easy to understand. However, when an ER model is converted into a database schema it may often lose its resemblance to the real world entities and relationships it was intended to represent. [Refs. 3, 4]

• Mathematical aggregate functions cannot be expressed in the relational algebra. In order to specify operations such as SUM or AVERAGE, additional operators must be defined by the DBMS. [Ref. 4]

• The recursive closure operation cannot be specified in relational algebra. In order to specify a query to retrieve all instances in a recursive relationship, some form of looping mechanism is needed as well as a means to specify the number of recursive levels required for the specific query based on some base condition. [Ref. 4]

• OUTER JOIN and OUTER UNION operations are required to extend the relational JOIN and UNION operations to handle joins between relations or tables which either have some tuples or records which contain null values in the join attributes or have tuples or records which are not union compatible. [Ref. 4]

• The relational JOIN can be a very time consuming operation and is frequently the cause of performance degradation. This is because it requires the cross-referencing of a tuple at a time between two relations and there are many possible ways to execute this operation. As a result, complex queries are usually a performance bottleneck for large database systems. [Refs. 4, 7]
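The recursive closure limitation listed above is easiest to see in code. The sketch below is an illustration only, written in Python rather than in any DBMS facility; the course-prerequisite relation and the names `recursive_closure` and `max_levels` are hypothetical. It shows the two ingredients the text says are needed: a loop, and a way to bound the number of recursive levels.

```python
from typing import Optional


def recursive_closure(
    edges: set,
    start: str,
    max_levels: Optional[int] = None,
) -> set:
    """Return everything reachable from `start` by following (source, target) pairs.

    `edges` plays the role of a recursive relationship stored as a relation,
    for example (course, prerequisite). `max_levels` bounds the depth; None
    means "repeat until no new tuples appear".
    """
    found = set()
    frontier = {start}
    level = 0
    while frontier and (max_levels is None or level < max_levels):
        # One pass corresponds to one self-join of the relation with the frontier.
        next_frontier = {dst for (src, dst) in edges if src in frontier} - found
        found |= next_frontier
        frontier = next_frontier
        level += 1
    return found


if __name__ == "__main__":
    prereqs = {("CS3", "CS2"), ("CS2", "CS1"), ("CS1", "MA1")}
    print(sorted(recursive_closure(prereqs, "CS3")))
    print(sorted(recursive_closure(prereqs, "CS3", max_levels=1)))
```

Each loop iteration is a single join step; no fixed relational algebra expression can express "repeat until nothing new is found", which is why the loop lives outside the algebra.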
III. INTEGRATION METHODS

A. MULTIPLE DATABASE INTEGRATION

What is meant by the term multiple database integration? The term integration itself means to make into a whole by bringing all parts together or to make one thing a part of something else. This definition is applied in the context of database design in this thesis to mean the logical or physical combination of multiple databases into one database. Integration of multiple databases can be particularly useful when real world relationships exist between entities represented by data in multiple databases. In fact, within the context of a relational database model, the duplication of data in disparate databases is actually necessary in order to create logical relationships which model the real world where the same information is represented by data spread over multiple databases.

Given the above meaning for integration, data in disparate databases can then be accessed from a single vantage point. It is then possible to present new and more expanded information views derived from some real world relationships not yet realized within the organization's information infrastructure, or to automate more efficient procedures for eliminating inconsistencies between them. Following is a taxonomy of strategies which might provide database integration solutions as well as a brief discussion of probable advantages and disadvantages.
B. PHYSICAL INTEGRATION

The physical integration of a database is conceptually a straightforward process and consists of designing a single database which contains the union of all information contained in the preexisting databases which are to be integrated (Figure 2). The actual process of designing this new database can be very difficult because it requires a complete system redesign. It will almost always require changes to any preexisting applications and, in some cases, these changes will be substantial. The most obvious benefits to this approach over other integration solutions involving a more loosely coupled arrangement are better performance and easier and less expensive application development for data access and retrieval after a physical integration solution is achieved.

[Figure 2. Physical Database Integration: preexisting databases (labeled db1 through db3 in the figure) are combined into a single database, dbSum.]

In a large organization with very large information management, storage, and access requirements, the physical integration will result in what is now most frequently referred to as a Very Large Database (VLDB). [Ref. 8] In this case redesign will require special attention to ensure that a large number of new challenges are handled. One very possible downside to this strategy is that it may not be as extensible as other more modular strategies which maintain the existence of separate databases. Special care is also required to achieve a solution which is scalable in order to accommodate future expansion to handle new information requirements and avoid the need for costly changes and/or complete system redesign. Therefore, in terms of system maintenance and updates as a result of changes in the organization's business and management practices, this solution could easily become cost prohibitive. In other words, as soon as the next new requirement for information handling and access is identified, a strategy of physical integration could result in the organization being back in the same situation it was in when it started out on the path to integration.

C. LOGICAL INTEGRATION

1. Database Level

The strategy behind logical integration at the database level is the use of a middleware layer which connects to each database separately, yet presents a single interface to the application layer so that the multiple databases appear to applications as one large database (Figure 3).
A significant advantage to this approach is that it effectively maintains the same level of simplicity in query formulation as the physical integration strategy while at the same time allowing the use of existing databases with fewer or no changes to existing structure. This strategy has little to no effect on existing applications and only requires changes where necessary to accommodate changes in their underlying databases due to field type inconsistencies between duplicate data. The connections between preexisting databases and their applications are not affected.

[Figure 3. Logical Integration at the Database Level]

This strategy could be implemented through a kind of universal database engine which establishes a single connection to each database. Its job is to receive cross-database queries from an application and parse them into the appropriate subquery components for each database involved. For example, using the example of the university library given in Chapter I, the universal engine would take the query based on the question "How many sophomores have books checked out?" and divide it into two subqueries. The subquery for the library database would be based on the smaller question "Does student A, who is a sophomore, have any books checked out?" and the subquery for the student admin database would be "Retrieve a list of all students who are sophomores." The universal engine then performs the join of the two intermediate result sets and applies an aggregate function to return the final answer to the original question. At the same time the engine should perform query optimization based on some acquired knowledge of the underlying database structures. For example, it is probably faster to get the result above by checking the library database for the existence of a particular "student_id" after first retrieving a list of sophomores if the university has a very large number of students in comparison to the total number of books checked out.

This logical integration strategy benefits largely from its more modular approach and its ability to leverage existing information infrastructure. Its modularity allows the use of databases which may be distributed, and performance can benefit from a greater degree of parallelism since different parts of a cross-database query can be processed simultaneously. Additionally, this more modular and loosely coupled approach allows easier modification without necessarily affecting other parts of the organization's information infrastructure. This can allow an organization to upgrade its systems gradually, and has more flexibility to accommodate changes as the organization's information needs change over time.

The key to the success in this strategy is the universal engine itself. It sounds nice theoretically, but it requires features that are difficult to achieve. Fortunately the first of these is already here. The open database connectivity (ODBC) standard established by Microsoft is the kind of common interface necessary for an application to connect to multiple relational database management systems. The more difficult part of the universal engine concept is the capability to establish a knowledge base from the connected databases, making it possible to perform intelligent and effective query optimization.

Because this solution carries the additional overhead of the middleware layer, it might not be expected to perform as well as a more tightly coupled approach, yet its distributed architecture may actually yield better overall performance, particularly in situations where there is heavy user load or when handling large queries which span multiple databases. Resolving data type mismatches where cross-database queries are involved is more of a challenge with this integration strategy, as is the question of how to deal with duplicate data, which leads to concurrency and consistency problems. Maintenance of concurrency and consistency between duplicate and related data could be handled by the implementation of business rules in the middleware layer for data which is related across database boundaries. How this would be implemented and how well it would work is not clear.
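The division of labor described for the universal engine can be sketched in a few lines. This is a hypothetical illustration, not an implementation from the thesis: it assumes Python's `sqlite3` module, two separate in-memory connections as stand-ins for the library and student admin databases, and invented table and function names. A simple row-count comparison stands in for the "acquired knowledge" used in optimization.

```python
import sqlite3


def make_library() -> sqlite3.Connection:
    """Stand-in library database (hypothetical schema and sample data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE book_checkout (book TEXT, student_id INTEGER, due_date TEXT)")
    db.executemany(
        "INSERT INTO book_checkout VALUES (?, ?, ?)",
        [("Book A", 1, "1997-03-10"), ("Book B", 1, "1997-03-12"),
         ("Book C", 3, "1997-03-15"), ("Book D", 4, "1997-03-20")],
    )
    return db


def make_admin() -> sqlite3.Connection:
    """Stand-in student admin database (hypothetical schema and sample data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE student_info (student_id INTEGER PRIMARY KEY, classification TEXT)")
    db.executemany(
        "INSERT INTO student_info VALUES (?, ?)",
        [(1, "sophomore"), (2, "sophomore"), (3, "junior"), (4, "sophomore")],
    )
    return db


def count_sophomores_with_books(library: sqlite3.Connection,
                                admin: sqlite3.Connection) -> int:
    """Split the question into per-database subqueries, then join and aggregate."""
    # "Acquired knowledge": compare the sizes of the two candidate driving sets.
    n_sophomores = admin.execute(
        "SELECT COUNT(*) FROM student_info WHERE classification = 'sophomore'"
    ).fetchone()[0]
    n_borrowers = library.execute(
        "SELECT COUNT(DISTINCT student_id) FROM book_checkout"
    ).fetchone()[0]

    if n_sophomores <= n_borrowers:
        # Drive from the sophomore list and probe the library for each student.
        ids = [r[0] for r in admin.execute(
            "SELECT student_id FROM student_info WHERE classification = 'sophomore'")]
        probe_db = library
        probe_sql = "SELECT EXISTS (SELECT 1 FROM book_checkout WHERE student_id = ?)"
    else:
        # Drive from the borrowers and probe the admin database for each one.
        ids = [r[0] for r in library.execute(
            "SELECT DISTINCT student_id FROM book_checkout")]
        probe_db = admin
        probe_sql = ("SELECT EXISTS (SELECT 1 FROM student_info "
                     "WHERE student_id = ? AND classification = 'sophomore')")

    # The join and the aggregate both happen in the middleware layer.
    return sum(probe_db.execute(probe_sql, (sid,)).fetchone()[0] for sid in ids)
```

Both branches return the same count; only the amount of probing differs. A real engine would need far richer statistics than two row counts, which is the difficulty the text points to.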
2. Application Level

Integration at the application level could be accomplished by defining connections within an application to preexisting databases as illustrated in Figure 4. This approach is similar to the previous logical integration solution in that it is a more modular approach, but results in a little more tightly coupled solution since the boundaries become more blurred. Integration of the information contained in the disparate organizational databases is handled completely at the application level. Therefore, all cross-database query optimization must be handled by the application developer. This makes the application development process much more complex. The effectiveness of this approach will also depend to a great extent on the robustness of the application development environment.

[Figure 4. Application Level Integration: the Academic Dept Course Mgmt Application and the Academic Dept Financial Mgmt Application, each connected to the underlying databases.]

Because this is a loosely coupled approach, it does not require significant redesign of the existing system; however, it will require a more deliberate application design effort to resolve data conflicts, minimize duplicate data, and maintain data concurrency and consistency between the different databases.
D. INTEGRATION STRATEGY CHALLENGES

1. Data Type Conflicts

A challenge that is common to all of the above integration strategies is how to resolve differences in data types between cross-database join fields. The differences can range from differences in field length to different types to different field masks. For example, dates might be defined in "dd/mm/yy" format in one database and in "dd/mmm/yyyy" format in another. Using a physical integration strategy, these differences are resolved in the design of the integrated database and resulting changes in data types and field lengths will most likely require changes to existing applications. However, special care must be taken to avoid corruption of data during the process of migrating data from preexisting databases. Resolution of data conflicts in the logical integration strategies will most likely require changes to both existing databases and applications. However, data conflict resolution should only be necessary where actual joins across database boundaries are required. These cases can then be resolved as they are encountered during the integration design process.
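The date mask example above can be worked through in a short sketch. It is an illustration under assumptions, not part of the thesis: it uses Python's standard `datetime` module, the function names and sample rows are hypothetical, and the month abbreviation parsing (`%b`) assumes an English locale. Both text formats are converted to a common date value before the join fields are compared.

```python
from datetime import date, datetime

FORMAT_A = "%d/%m/%y"   # the "dd/mm/yy" mask, e.g. "05/03/97"
FORMAT_B = "%d/%b/%Y"   # the "dd/mmm/yyyy" mask, e.g. "05/Mar/1997" (English month names assumed)


def to_date(text: str, fmt: str) -> date:
    """Convert a date string in the given mask to a comparable date value.

    Note: two-digit years are expanded by the library's own pivot rule,
    which is itself a possible source of data conflicts.
    """
    return datetime.strptime(text, fmt).date()


def join_on_date(rows_a: list, rows_b: list) -> list:
    """Join (date_text, payload) rows from two databases on the normalized date."""
    index: dict = {}
    for text, payload in rows_b:
        index.setdefault(to_date(text, FORMAT_B), []).append(payload)
    result = []
    for text, payload_a in rows_a:
        key = to_date(text, FORMAT_A)
        for payload_b in index.get(key, []):
            result.append((key, payload_a, payload_b))
    return result


if __name__ == "__main__":
    a = [("05/03/97", "row from database A"), ("06/03/97", "another row")]
    b = [("05/Mar/1997", "row from database B")]
    print(join_on_date(a, b))
```

Comparing the raw strings would never match "05/03/97" with "05/Mar/1997"; the normalization step is the conflict resolution, and it is only needed on fields that actually participate in a cross-database join.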
2. Duplicate Data

Another common problem related to that of data conflicts is the existence of duplicate data. As we stated earlier, one of the goals of database integration is to minimize duplication between databases. This is automatically resolved through proper normalization in the physical integration strategy, but requires special attention with both logical integration approaches. In both application level and database level logical integration approaches, minimizing duplicate data will require changes to existing databases and applications. Integration at the application level will require a more substantial redesign effort but can be done as part of the actual integration implementation, while at the database level fewer changes to applications should be required.

3. Concurrency and Consistency
Concurrency and consistency issues arise primarily from the existence of duplicate data and are therefore not expected to present any special problems for the physical integration strategy. Apart from concerns over wasted storage and performance inefficiencies, concurrency and consistency issues are the primary reason for minimizing duplicate data and they present significant challenges for the logical integration strategies. In a single database, consistency is handled by integrity constraints on the database schema. The problem with the logical strategies is there is no way to impose actual integrity constraints on related fields in different databases. Instead, business rules must be used to maintain consistency and must be defined in the application for application level integration if this is possible with the application development tool in use. If it is not possible, then the benefits of integration may be limited to read only reports for decision support. Business rules to maintain consistency should be defined in the universal engine for logical integration at the database level. Concurrency controls will most likely have to be implemented using procedural constraints in the application layer and may also be limited by the application development tool in use.
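Because no schema-level constraint can span two databases, a consistency rule has to be checked procedurally. The sketch below shows one minimal form such a business rule might take. It is an assumption-laden illustration rather than anything described in the thesis: it uses Python's `sqlite3` module, and the `faculty` table, its columns, and the sample names are hypothetical.

```python
import sqlite3


def make_db(rows: list) -> sqlite3.Connection:
    """Create a stand-in database holding a duplicated faculty table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, last_name TEXT)")
    db.executemany("INSERT INTO faculty VALUES (?, ?)", rows)
    return db


def find_inconsistencies(cms: sqlite3.Connection, fms: sqlite3.Connection) -> list:
    """Report faculty ids present in both databases whose duplicated name differs.

    Returns (id, name_in_cms, name_in_fms) tuples, sorted by id. This is a
    read-only check; it detects drift but does not repair it.
    """
    cms_names = dict(cms.execute("SELECT id, last_name FROM faculty"))
    fms_names = dict(fms.execute("SELECT id, last_name FROM faculty"))
    shared_ids = cms_names.keys() & fms_names.keys()
    return sorted(
        (fid, cms_names[fid], fms_names[fid])
        for fid in shared_ids
        if cms_names[fid] != fms_names[fid]
    )


if __name__ == "__main__":
    cms = make_db([(1, "Adams"), (2, "Baker"), (3, "Clark")])
    fms = make_db([(1, "Adams"), (2, "Bakker"), (4, "Davis")])
    print(find_inconsistencies(cms, fms))
```

A check like this supports the read-only reporting case mentioned above. Preventing the inconsistency in the first place would require coordinating updates across both connections, which is the harder concurrency problem.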
IV. CASE STUDY: INTEGRATION OF COURSE MANAGEMENT AND FINANCIAL MANAGEMENT DATABASES

A. BACKGROUND

1. Financial Management System

The Financial Management System (FMS) database is a preexisting application developed using PowerBuilder 5.0. Its purpose is to provide a university academic department with an automated decision support tool for managing the department's financial resources and fiscal responsibilities. The Entity-Relationship diagram for the relevant entities of this database is illustrated in Appendix A and the associated schema is contained in Appendix B. See [Ref. 9] for full details of this application and supporting database.
2. Course Management System

The Course Management System (CMS) database application manages course scheduling and course assignments to professors and instructors. It was developed using PowerBuilder 5.0. The Entity-Relationship diagram for the CMS database is illustrated in Appendix C and the associated schema is contained in Appendix D.

3. InterDatabase Relationships

In the case of the FMS and CMS databases, there is a need to accomplish some level of integration since many of the professors who are assigned to teach classes are also principal investigators (PIs) for one or more research projects. This results in a natural intersection of the information contained in the two databases. The academic department would like to be able to track the relationship of courses taught by PIs since there is a business rule which links the research dollars for each PI to the number of classes he or she is required to teach. On further inspection there may be other relationships between the two databases which could be taken advantage of, but the situation described above was sufficient to answer the primary questions of this thesis. We wanted to first establish whether or not it was possible to demonstrate client/server connections to multiple databases simultaneously and, if so, to perform inner joins on tables from each of those databases.
B. INTEGRATION TECHNIQUE

1. Create CMS

Design and create the CMS application and supporting database based on the need for an academic department to schedule courses and make course assignments to professors and instructors. The CMS application also serves as the integration application in order to test cross-database join techniques which can be used in logical integration at the application level.
2. Multiple Database Connections

Initially it was not clear if and how multiple simultaneous database connections could be created within an integrated database application. We discovered that this is a relatively simple process since PowerBuilder provides for the creation of multiple "transaction" objects within an application, each of which can then be used to represent a client connection to a database on the server. The code in Figure 5, an excerpt from the "application open" event script contained in Appendix E, shows how this was done. The "cms_trCursor" and "fms_trCursor" are declared as global "Transaction" type variables in the PowerBuilder script painter. The script statement "cms_trCursor = SQLCA" assigns the "cms_trCursor" to point to the default database connection profile for the CMS application. The script statement "fms_trCursor = CREATE transaction" creates a new "Transaction" object and assigns it to the "fms_trCursor". The necessary database connection profile parameters are retrieved from the database profile sections of the "PB.ini" file using the "ProfileString()" function. The two Transaction objects are then connected to their respective databases using the "CONNECT" statement.
    //Initial default database transaction
    //In this application this transaction
    //is used to connect to the CMS database
    SQLCA.DBMS = ProfileString("PB.INI", "Database", "DBMS", " ")
    SQLCA.DbParm = ProfileString("PB.INI", "Database", "DbParm", " ")

    //Transaction cms_trCursor declared in "declare:global" menu
    cms_trCursor = SQLCA
    CONNECT using cms_trCursor;
    IF cms_trCursor.SQLCODE <> 0 THEN
        MessageBox("Connect Error", "CMS connection failed")
        HALT
    END IF

    //Transaction fms_trCursor declared in "declare:global" menu
    fms_trCursor = CREATE transaction
    fms_trCursor.DBMS = ProfileString("PB.INI", "PROFILE Fms", "DBMS", " ")
    fms_trCursor.DbParm = ProfileString("PB.INI", "PROFILE Fms", "DbParm", " ")
    CONNECT using fms_trCursor;
    IF fms_trCursor.SQLCODE <> 0 THEN
        MessageBox("Connect Error", "FMS connection failed!")
        HALT
    END IF

Figure 5. Application script for concurrent database connections.
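
For readers without PowerBuilder, the idea of holding two client connections open at once can be sketched in Python. This is only an analogue under stated assumptions: two in-memory SQLite databases stand in for the CMS and FMS servers, and the names `open_connections`, `cms_conn`, and `fms_conn` are hypothetical and do not appear in the thesis scripts.

```python
import sqlite3


def open_connections():
    """Open one connection per database and keep both alive at once."""
    # Assumption: in-memory SQLite databases stand in for the CMS and FMS servers.
    cms_conn = sqlite3.connect(":memory:")
    fms_conn = sqlite3.connect(":memory:")
    return cms_conn, fms_conn


cms_conn, fms_conn = open_connections()

# Each connection sees only the tables of its own database.
cms_conn.execute("CREATE TABLE faculty (faculty_id TEXT, l_name TEXT)")
fms_conn.execute("CREATE TABLE employee (emp_id_code TEXT, last_name TEXT)")

TABLES_QUERY = "SELECT name FROM sqlite_master WHERE type = 'table'"
cms_tables = [row[0] for row in cms_conn.execute(TABLES_QUERY)]
fms_tables = [row[0] for row in fms_conn.execute(TABLES_QUERY)]
print(cms_tables, fms_tables)
```

As with the two Transaction objects, each connection is independent: a statement issued on one connection has no visibility into the other database.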
3. Cross-database Join Query

The first step to developing a method or technique for performing a cross-database join is to create a standard SQL query which represents the question you wish to answer and is based on the tables in the relevant databases. The tables are identified in a query using dot notation following the database name. The query in Figure 6 reflects the previous example relationship between the CMS and FMS databases and returns a list of faculty members who are PIs for only one research project and who are teaching less than two courses.
    SELECT faculty_id, l_name, f_name, mi
    FROM CMS.faculty f, FMS.employee e
    WHERE f.faculty_id = e.emp_id_code
    and EXISTS (SELECT f.faculty_id
                FROM CMS.teach t
                WHERE t.teach_fac_id = f.faculty_id
                GROUP BY f.faculty_id
                HAVING COUNT(f.faculty_id) < 2)
    and EXISTS (SELECT e.emp_id_code
                FROM FMS.pi p
                WHERE p.emp_id_code = e.emp_id_code
                GROUP BY e.emp_id_code
                HAVING COUNT(e.emp_id_code) = 1)

Figure 6. Cross-database query between CMS and FMS
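
To make the intent of this query concrete, the sketch below runs an equivalent statement in SQLite, where `ATTACH DATABASE` lets a single engine resolve database-qualified table names. This is an assumption-laden stand-in, not the thesis environment: the schema names `main` (playing CMS) and `fms`, the sample rows, and the trimmed column lists are all hypothetical.

```python
import sqlite3

# Assumption: "main" plays the role of CMS; a second in-memory database is attached as fms.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS fms")

conn.executescript("""
CREATE TABLE main.faculty (faculty_id TEXT, l_name TEXT, f_name TEXT, mi TEXT);
CREATE TABLE main.teach (teach_fac_id TEXT, teach_course_nbr TEXT);
CREATE TABLE fms.employee (emp_id_code TEXT, last_name TEXT);
CREATE TABLE fms.pi (emp_id_code TEXT, jon TEXT);

INSERT INTO main.faculty VALUES ('A1', 'Adams', 'Ann', 'B'), ('B2', 'Baker', 'Bob', 'C');
INSERT INTO main.teach VALUES ('A1', 'CS101'), ('B2', 'CS201'), ('B2', 'CS202');
INSERT INTO fms.employee VALUES ('A1', 'Adams'), ('B2', 'Baker');
INSERT INTO fms.pi VALUES ('A1', 'J0001'), ('B2', 'J0002');
""")

CROSS_DB_QUERY = """
SELECT f.faculty_id, f.l_name, f.f_name, f.mi
FROM main.faculty f, fms.employee e
WHERE f.faculty_id = e.emp_id_code
  AND EXISTS (SELECT 1 FROM main.teach t
              WHERE t.teach_fac_id = f.faculty_id
              GROUP BY t.teach_fac_id
              HAVING COUNT(*) < 2)
  AND EXISTS (SELECT 1 FROM fms.pi p
              WHERE p.emp_id_code = e.emp_id_code
              GROUP BY p.emp_id_code
              HAVING COUNT(*) = 1)
"""

# A1 teaches one course and is PI on one project; B2 teaches two courses.
rows = conn.execute(CROSS_DB_QUERY).fetchall()
print(rows)  # [('A1', 'Adams', 'Ann', 'B')]
```

When no single engine can see both databases, as in the case study, the same question has to be answered by splitting the query and joining the partial results in the application.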
4. Query Subcomponents

Once the proper cross database query is identified, it is broken into separate subcomponent queries necessary to retrieve data from the databases. These subcomponents will form the basis for the PowerBuilder "data window" objects which will have to be defined and created. The first subcomponent identified is based on data contained in the CMS database and is the part of the above query which retrieves all faculty members who are teaching less than 2 courses. (Figure 7)

    SELECT f.faculty_id
    FROM CMS.teach t, CMS.faculty f
    WHERE t.teach_fac_id = f.faculty_id
    GROUP BY f.faculty_id
    HAVING COUNT(f.faculty_id) < 2

Figure 7. First subcomponent query

The second and only other subcomponent in this example is based on data contained in the FMS database and retrieves all employees who are PIs for only one research project. (Figure 8)

    SELECT e.emp_id_code
    FROM FMS.pi p, FMS.employee e
    WHERE p.emp_id_code = e.emp_id_code
    GROUP BY e.emp_id_code
    HAVING COUNT(e.emp_id_code) < 2

Figure 8. Second subcomponent query

5. Data Window Objects
Using PowerBuilder's SQL select painter, the actual queries representing the subcomponents are created and used as the data source for separate "data window" objects. Figure 9 and Figure 10 show the resulting data source queries for the "d_teach_less_than_2" and "d_pi_less_than_2" data window objects respectively.
    SELECT "faculty"."faculty_id",
           "faculty"."l_name",
           "faculty"."f_name",
           "faculty"."mi"
    FROM "faculty",
         "teach"
    WHERE ( "teach"."teach_fac_id" = "faculty"."faculty_id" )
    GROUP BY "faculty"."faculty_id",
             "faculty"."l_name",
             "faculty"."f_name",
             "faculty"."mi"
    HAVING ( count("teach"."teach_fac_id") < 2 )
    ORDER BY "faculty"."faculty_id" ASC

Figure 9. Data source query for "d_teach_less_than_2" data window.
    SELECT "employee"."emp_id_code",
           "employee"."first_name",
           "employee"."mi",
           "employee"."last_name"
    FROM "employee",
         "pi"
    WHERE ( "pi"."emp_id_code" = "employee"."emp_id_code" )
    GROUP BY "employee"."emp_id_code",
             "employee"."first_name",
             "employee"."mi",
             "employee"."last_name"
    HAVING ( count("employee"."emp_id_code") = 1 )
    ORDER BY "employee"."emp_id_code" ASC

Figure 10. Data source query for "d_pi_less_than_2" data window.
6. Putting the Pieces Together with Windows and Scripts

Once the data window objects are completed, they are placed in a PowerBuilder "window" using the "window painter" and code is added to the window "open" event script which implements the cross-database join condition between the FMS and CMS databases as reflected in the WHERE clause (f.faculty_id = e.emp_id_code). This approach is illustrated graphically in Figure 11. The window script to implement this approach uses nested "for loops" in order to accomplish the join cross-product of the "dw_teach_less_than_2" and "dw_pi_less_than_2" result sets. (Figure 12)
    dw_teach_less_than_2
        SELECT f.faculty_id
        FROM CMS.teach t, CMS.faculty f
        WHERE t.teach_fac_id = f.faculty_id
        GROUP BY f.faculty_id
        HAVING COUNT(f.faculty_id) < 2

    dw_pi_less_than_2
        SELECT e.emp_id_code
        FROM FMS.pi p, FMS.employee e
        WHERE p.emp_id_code = e.emp_id_code
        GROUP BY e.emp_id_code
        HAVING COUNT(e.emp_id_code) < 2

    Join condition linking the two data windows:
        f.faculty_id = e.emp_id_code

Figure 11. Cross-database join illustration using data windows.
    long pi_number_of_rows
    long teach_number_of_rows
    long pi_loopcount
    long teach_loopcount
    string piNum
    string teachNum

    pi_number_of_rows = dw_pi_less_than_2_list.RowCount()
    teach_number_of_rows = dw_fac_teach_less_than_2_list.RowCount()

    for pi_loopcount = 1 to pi_number_of_rows
        piNum = dw_pi_less_than_2_list.GetItemString(pi_loopcount, 1)
        for teach_loopcount = 1 to teach_number_of_rows
            teachNum = dw_fac_teach_less_than_2_list.GetItemString(teach_loopcount, 1)
            IF dw_employee_link_faculty_list.Retrieve(piNum, teachNum) >= 0 THEN
                COMMIT using SQLCA;
            ELSE
                ROLLBACK using SQLCA;
                MessageBox("Retrieve", "Retrieve error - employee join faculty detail")
            END IF
        next
    next

Figure 12. Nested for loop script to perform cross-database join
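
The cost of this nested structure can be seen in a small Python sketch that counts the work done. It is an illustration under assumptions, not the PowerBuilder script itself: the identifiers are invented, the result sets are sized like the case study (13 PI ids, 7 teaching ids, 74 faculty rows), and `retrieve` is a hypothetical stand-in for the data window `Retrieve` call that simply scans a list.

```python
def nested_loop_join(pi_ids, teach_ids, retrieve):
    """Call retrieve() once for every (PI id, teaching id) pair."""
    matches, calls = [], 0
    for pi_id in pi_ids:            # outer loop: FMS intermediate result
        for teach_id in teach_ids:  # inner loop: CMS intermediate result
            calls += 1
            matches.extend(retrieve(pi_id, teach_id))
    return matches, calls


# Hypothetical ids sized like the case study.
faculty_table = [f"F{n:03d}" for n in range(74)]   # 74 faculty rows
pi_ids = [f"F{n:03d}" for n in range(13)]          # F000..F012
teach_ids = [f"F{n:03d}" for n in range(12, 19)]   # F012..F018

stats = {"comparisons": 0}


def retrieve(pi_id, teach_id):
    """Scan the faculty table for a row matching both arguments."""
    rows = []
    for faculty_id in faculty_table:
        stats["comparisons"] += 1
        if faculty_id == pi_id and faculty_id == teach_id:
            rows.append(faculty_id)
    return rows


matches, calls = nested_loop_join(pi_ids, teach_ids, retrieve)
print(matches, calls, stats["comparisons"])  # ['F012'] 91 6734
```

The sample ids were chosen so that exactly one id appears in both intermediate results, which mirrors a join that does a great deal of work to return a single row.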
7. Performance Drag

At this point the original goal of performing a cross-database join has been accomplished with little difficulty; however, the use of nested "for loops" has a significant impact on performance. In this example, the result in the "teach less than 2" data window contains 7 records and the result in the "pi less than 2" contains 13. This results in 91 retrieval calls against the CMS.faculty table which contains 74 records. This means that 6734 join comparisons are required to get the final result which in this case yields only one name. For queries over large tables and with large intermediate result sets, this approach quickly becomes unusable.
8. A More Efficient Technique

What is needed is a more efficient method of performing the same task. If the work done in the script can be reduced to a single loop instead of the nested dual loop, the performance will be significantly improved. This can be done by combining the supporting subquery for the "teach less than 2" data window with the supporting query for the "faculty_link_employee" data window and making the resulting query the data source for the "faculty_link_employee" data window. The new result shown in Figure 13 is basically the same subquery for the condition "teach less than 2", but we have added the join condition "faculty_id = :employee_id" to the query's "where" clause. The string ":employee_id" is the retrieval argument passed to the data window inside a single loop in the "window" script, as shown in Figure 14. The number of retrieval calls now required against the CMS.faculty table for the cross-database join has been reduced to 13 from 91 and the total number of join comparisons required inside the loop in the window script is reduced to 962 from 6734. When the two different windows are actually run in the application, there is a very noticeable improvement in response speed of the second technique over that of the first.

    SELECT "faculty"."faculty_id",
           "faculty"."l_name",
           "faculty"."f_name",
           "faculty"."mi"
    FROM "faculty",
         "teach"
    WHERE ( "teach"."teach_fac_id" = "faculty"."faculty_id" ) and
          ( "faculty"."faculty_id" = :employee_id )
    GROUP BY "faculty"."faculty_id",
             "faculty"."l_name",
             "faculty"."f_name",
             "faculty"."mi"
    HAVING ( count("teach"."teach_fac_id") < 2 )

Figure 13. Improved data source query for data window
    long number_of_rows
    long loopcount
    string piNum

    number_of_rows = dw_pi_less_than_2_list.RowCount()

    FOR loopcount = 1 to number_of_rows
        piNum = dw_pi_less_than_2_list.GetItemString(loopcount, 1)
        IF dw_employee_link_faculty_list.Retrieve(piNum) >= 0 THEN
            COMMIT using SQLCA;
        ELSE
            ROLLBACK using SQLCA;
            MessageBox("Retrieve", "Retrieve error - employee join faculty detail")
        END IF
    NEXT

Figure 14. Single for loop script to perform cross-database join.
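
The single-loop idea, where the "teach less than 2" condition is folded into a query that takes one retrieval argument, can be sketched as follows. The sketch assumes an in-memory SQLite database in place of CMS, a trimmed two-column `faculty` table, invented sample rows, and a hypothetical helper `single_loop_join`; only the `:employee_id` argument name comes from Figure 13.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the CMS database.
cms = sqlite3.connect(":memory:")
cms.executescript("""
CREATE TABLE faculty (faculty_id TEXT, l_name TEXT);
CREATE TABLE teach (teach_fac_id TEXT, teach_course_nbr TEXT);
INSERT INTO faculty VALUES ('A1', 'Adams'), ('B2', 'Baker'), ('C3', 'Clark');
INSERT INTO teach VALUES ('A1', 'CS101'), ('B2', 'CS201'), ('B2', 'CS202'), ('C3', 'CS301');
""")

# Same shape as Figure 13: the join condition is a named retrieval argument.
LINK_QUERY = """
SELECT f.faculty_id, f.l_name
FROM faculty f, teach t
WHERE t.teach_fac_id = f.faculty_id
  AND f.faculty_id = :employee_id
GROUP BY f.faculty_id, f.l_name
HAVING COUNT(t.teach_fac_id) < 2
"""


def single_loop_join(conn, pi_ids):
    """Run the parameterized query once per PI id from the other database."""
    matches, calls = [], 0
    for pi_id in pi_ids:
        calls += 1
        matches.extend(conn.execute(LINK_QUERY, {"employee_id": pi_id}).fetchall())
    return matches, calls


# Hypothetical ids returned by the FMS-side "pi less than 2" query.
pi_ids = ["A1", "B2", "Z9"]
matches, calls = single_loop_join(cms, pi_ids)
print(matches, calls)  # [('A1', 'Adams')] 3
```

The number of retrieval calls now equals the size of the PI result set alone, which is the source of the reduction from 91 calls to 13 in the case study.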
V. PERFORMANCE ISSUES

A. QUERY OPTIMIZATION

1. Relational Join Minimization

As in any relational database the join operator will have a significant impact on the performance of any integration solution. In the case of the physical integration strategy this is handled to a large extent by the query optimization rules built into the database engine. However, some attention is still required by the application developer to ensure that queries are formulated in a way that minimizes the number of intermediate result sets. In a logical integration strategy at the database level, the implementation of query optimization rules in the universal engine is a means of minimizing this as a performance bottleneck. Integration at the application level, as we demonstrated in Chapter IV, requires special attention to query design when performing cross-database joins. As a result, large queries are probably not practical as a rule with this approach, which probably limits the usefulness of application level integration to smaller organizations and applications based on smaller databases.
2. Table Size

Although there are many ways to perform a join, one of the most common is the nested (inner-outer) loop approach, often referred to as the brute force approach. This is the method used in both techniques demonstrated in Chapter IV. We are forced to use this method in the case study because it deals with intermediate result sets or tables represented by data windows and the join is achieved through the use of a looping construct where the retrieval arguments are passed to the data window for a record by record comparison and retrieval. When using the nested loop approach, query optimization also becomes a function of which table is chosen for the outer loop and which for the inner loop. A factor known as the "join selection factor" affects the performance of a join and depends on the equijoin condition of the two tables and their size. The equijoin condition affects the percentage of records in a table which will be joined with records in the other file. In the example given in the case study, as illustrated by the "faculty" window screen capture in Appendix E, there are 13 PI employee records with less than 2 research projects retrieved from a table of 88 records in the FMS database while there are only 7 faculty members teaching fewer than 2 courses retrieved from a table of 22 teach records in the CMS database. Therefore, we must perform 616 (7 x 88) join comparisons if we use the "PI less than 2" intermediate result as the outer loop versus 286 (13 x 22) if we use the "teach less than 2" intermediate result set as the outer loop. Even though the numbers in this example are relatively small, they are sufficient to show how performance can be significantly affected if one table is significantly larger than the other and only a small percentage of the records in that table are selected for join.
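
The comparison counts in this example are simple products, which the following sketch reproduces. The function name `nested_loop_comparisons` is hypothetical, and the model assumes every row on one side is compared with every row on the other, as in a brute force nested loop.

```python
def nested_loop_comparisons(rows_a: int, rows_b: int) -> int:
    """Comparisons made when every row of one set is checked against every row of the other."""
    return rows_a * rows_b


# Figures from the case study: 7 faculty ids against 88 FMS records,
# and 13 PI ids against 22 CMS teach records.
cost_7_by_88 = nested_loop_comparisons(7, 88)
cost_13_by_22 = nested_loop_comparisons(13, 22)
print(cost_7_by_88, cost_13_by_22)  # 616 286
```

Even at this scale one ordering costs more than twice the other, and the gap widens as the size difference between the two tables grows.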
B. PARALLELISM

Logical integration strategies offer the ability to take advantage of parallelism which can be achieved from multiple concurrent database connections and can realize an overall performance advantage over a physical integration strategy depending on the overall system network configuration and performance. The inherently distributed nature of the logical approach avoids the problem of server overload which is more likely to occur with a single physical integrated database on a centralized server.
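
One way to picture this parallelism is to issue the two subcomponent queries at the same time, each over its own connection, and combine the results afterwards. The sketch below is an assumption-heavy illustration: each worker builds a small in-memory SQLite database in place of a real server, the function names and sample rows are invented, and the final combination uses a set intersection rather than the looping construct of Chapter IV.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_teach_less_than_2():
    """CMS side: faculty ids teaching fewer than two courses."""
    conn = sqlite3.connect(":memory:")  # created inside the worker thread that uses it
    conn.executescript("""
    CREATE TABLE teach (teach_fac_id TEXT, teach_course_nbr TEXT);
    INSERT INTO teach VALUES ('A1', 'CS101'), ('B2', 'CS201'), ('B2', 'CS202'), ('C3', 'CS301');
    """)
    rows = conn.execute(
        "SELECT teach_fac_id FROM teach GROUP BY teach_fac_id HAVING COUNT(*) < 2"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def fetch_pi_exactly_1():
    """FMS side: employee ids that are PI on exactly one project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE pi (emp_id_code TEXT, jon TEXT);
    INSERT INTO pi VALUES ('A1', 'J0001'), ('C3', 'J0002'), ('C3', 'J0003');
    """)
    rows = conn.execute(
        "SELECT emp_id_code FROM pi GROUP BY emp_id_code HAVING COUNT(*) = 1"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Both subcomponent queries run concurrently on separate connections.
with ThreadPoolExecutor(max_workers=2) as pool:
    teach_future = pool.submit(fetch_teach_less_than_2)
    pi_future = pool.submit(fetch_pi_exactly_1)
    teach_ids = teach_future.result()
    pi_ids = pi_future.result()

joined = sorted(set(teach_ids) & set(pi_ids))
print(joined)  # ['A1']
```

Whether this yields a real speedup depends on the servers and network involved, as noted above; the sketch only shows that the two retrievals do not have to wait on each other.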
VI. CONCLUSIONS

Since the introduction of the relational data model and the tremendous growth in the use of computers in recent years, the use of database applications as a tool for managing information has grown in popularity. Organizations have discovered that they now have many databases and applications which manage access to them, but they find it increasingly difficult to sift through the large amounts of information at their disposal because many of these databases contain overlapping data which is not coordinated in any useful or meaningful way. The challenge now is how to integrate existing databases so that information access and retrieval among numerous data systems is efficient and effective. We have considered three strategies for accomplishing database integration and discussed some of the probable challenges, benefits, and limitations of each in the context of the relational data model. As a result, we are reminded that one of the primary sources of difficulty in relational database retrieval is the join operation. This is even more significant when seeking an integration solution since the relational model is not conducive to cross-database join access. Using PowerBuilder 5.0 we showed techniques for performing a join across multiple database boundaries as a method of integrating existing databases at the application level. This thesis demonstrates that it is possible to perform some level of integration at the application level. However, this approach is not practical for use with large or complex queries for performance reasons, making it unsuitable for use in large systems. It can be a useful means of performing limited database integration between smaller applications to leverage existing information resources to provide greater decision support utility or administrative functionality.
APPENDIX A. ENTITY RELATIONSHIP DIAGRAM FOR THE FMS DATABASE

This entity relationship diagram is a screen capture of PowerBuilder 5.0's graphic representation of the FMS database described in Chapter IV. Only tables which are relevant to this work are included.

[Figure: PowerBuilder 5.0 screen capture of the FMS database tables.]

Entity relationship diagram of FMS created using S-designor 5.0 AppModeler Desktop.

[Figure: S-Designor diagram of the entities FACULTY, MILITARY, STAFF, DEPARTMENT, EMPLOYEE, PI, and ACCOUNT. FACULTY, MILITARY, STAFF, and PI each reference EMPLOYEE (EMP_ID_CODE = EMP_ID_CODE); EMPLOYEE references DEPARTMENT (DEPT_CODE = DEPT_CODE); PI references ACCOUNT (JON = JON). Column names and types are listed in Appendix B.]
APPENDIX B. FMS DATABASE SCHEMA

EMPLOYEE

Column List
Name             Code             Type            P     M
ACCEL_RATE       ACCEL_RATE       numeric(3,2)    No    No
BASE_SALARY      BASE_SALARY      numeric(10,2)   No    No
BLDG_#           BLDG_#           char(3)         No    No
CATEGORY         CATEGORY         char(1)         No    Yes
CITY             CITY             char(15)        No    No
DEPT_CODE        DEPT_CODE        char(2)         No    No
EFF_SAL_DATE     EFF_SAL_DATE     date            No    No
EMP_CODE         EMP_CODE         char(2)         No    Yes
EMP_ID_CODE      EMP_ID_CODE      char(4)         Yes   Yes
FIRST_NAME       FIRST_NAME       char(15)        No    No
HOME_PHONE       HOME_PHONE       char(13)        No    No
LAST_NAME        LAST_NAME        char(15)        No    Yes
MI               MI               char(1)         No    No
ROOM_#           ROOM_#           char(5)         No    No
SPOUSE_FNAME     SPOUSE_FNAME     char(15)        No    No
SSN              SSN              char(11)        No    No
STATE            STATE            char(2)         No    No
STREET_ADDRESS   STREET_ADDRESS   char(20)        No    No
TERM_DATE        TERM_DATE        date            No    No
WORK_PHONE       WORK_PHONE       char(13)        No    No
ZIPCODE          ZIPCODE          char(10)        No    No

Reference to List
Reference to     Primary Key      Foreign Key
DEPARTMENT       DEPT_CODE        DEPT_CODE

Reference by List
Referenced by    Primary Key      Foreign Key
PI               EMP_ID_CODE      EMP_ID_CODE
FACULTY          EMP_ID_CODE      EMP_ID_CODE
STAFF            EMP_ID_CODE      EMP_ID_CODE
MILITARY         EMP_ID_CODE      EMP_ID_CODE
FACULTY

Column List
Name             Code             Type            P     M
CIV_GRADE        CIV_GRADE        char(5)         No    No
EMP_ID_CODE      EMP_ID_CODE      char(4)         Yes   Yes
STEP             STEP             char(2)         No    No

Reference to List
Reference to     Primary Key      Foreign Key
EMPLOYEE         EMP_ID_CODE      EMP_ID_CODE
MILITARY

Column List
Name             Code             Type            P     M
EMP_ID_CODE      EMP_ID_CODE      char(4)         Yes   Yes
MIL_GRADE        MIL_GRADE        char(5)         No    No
SERVICE          SERVICE          char(4)         No    No

Reference to List
Reference to     Primary Key      Foreign Key
EMPLOYEE         EMP_ID_CODE      EMP_ID_CODE
STAFF

Column List
Name             Code             Type            P     M
CIV_GRADE        CIV_GRADE        char(5)         No    No
EMP_ID_CODE      EMP_ID_CODE      char(4)         Yes   Yes
STEP             STEP             char(2)         No    No

Reference to List
Reference to     Primary Key      Foreign Key
EMPLOYEE         EMP_ID_CODE      EMP_ID_CODE
DEPARTMENT

Column List
Name             Code             Type            P     M
CHAIR_CODE       CHAIR_CODE       char(4)         No    No
DEPT_CODE        DEPT_CODE        char(2)         Yes   Yes
DEPT_NAME        DEPT_NAME        char(40)        No    No

Reference by List
Referenced by    Primary Key      Foreign Key
EMPLOYEE         DEPT_CODE        DEPT_CODE
PI

Column List
Name             Code             Type            P     M
EMP_ID_CODE      EMP_ID_CODE      char(4)         Yes   Yes
JON              JON              char(5)         Yes   Yes

Reference to List
Reference to     Primary Key      Foreign Key
EMPLOYEE         EMP_ID_CODE      EMP_ID_CODE
ACCOUNT          JON              JON
ACCOUNT

Column List
Name               Code               Type            P     M
BAL_CONT_IPA       BAL_CONT_IPA       numeric(12,2)   No    No
BAL_CONT_MIPR      BAL_CONT_MIPR      numeric(12,2)   No    No
BAL_CONT_OTH       BAL_CONT_OTH       numeric(12,2)   No    No
BAL_FAC_LABOR      BAL_FAC_LABOR      numeric(12,2)   No    No
BAL_OPTAR          BAL_OPTAR          numeric(12,2)   No    No
BAL_SPT_LABOR      BAL_SPT_LABOR      numeric(12,2)   No    No
BAL_TRAVEL         BAL_TRAVEL         numeric(12,2)   No    No
BUDGET_PAGE_DATE   BUDGET_PAGE_DATE   date            No    No
DATE_RECEIVED      DATE_RECEIVED      date            No    No
EXPIR_DATE         EXPIR_DATE         date            No    No
FUND_TYPE          FUND_TYPE          char(2)         No    Yes
INDIRECT_COST      INDIRECT_COST      numeric(12,2)   No    Yes
INIT_CONT_IPA      INIT_CONT_IPA      numeric(12,2)   No    Yes
INIT_CONT_MIPR     INIT_CONT_MIPR     numeric(12,2)   No    Yes
INIT_CONT_OTH      INIT_CONT_OTH      numeric(12,2)   No    Yes
INIT_FAC_LABOR_$   INIT_FAC_LABOR_$   numeric(12,2)   No    Yes
INIT_OPTAR_$       INIT_OPTAR_$       numeric(12,2)   No    Yes
INIT_SPT_LABOR_$   INIT_SPT_LABOR_$   numeric(12,2)   No    Yes
INIT_TRAVEL_$      INIT_TRAVEL_$      numeric(12,2)   No    Yes
JON                JON                char(5)         Yes   Yes
LABOR_JON          LABOR_JON          char(5)         No    No
REMARKS            REMARKS            char(50)        No    No
TITLE              TITLE              char(30)        No    No

Reference by List
Referenced by      Primary Key        Foreign Key
PI                 JON                JON
APPENDIX C. CMS ENTITY-RELATIONSHIP DIAGRAM

This entity relationship diagram is a screen capture of PowerBuilder 5.0's graphic representation of the CMS database described in Chapter IV.

[Figure: PowerBuilder 5.0 screen capture of the CMS database tables.]

Entity relationship diagram of CMS created using S-designor 5.0 AppModeler Desktop.

[Figure: S-Designor diagram of the entities section, teach, offering, faculty, qtr, can_teach, course, and prerequisite. Relationships shown: section to teach (section_nbr = teach_sec_nbr, s_course_nbr = teach_course_nbr, s_ay = teach_ay, s_qtr = teach_qtr); offering to section (offered_course = s_course_nbr, ay_offered = s_ay, qtr_offered = s_qtr); faculty to teach (faculty_id = teach_fac_id); qtr to offering (ay = ay_offered, qtr_nbr = qtr_offered); course to offering (course_nbr = offered_course); faculty to can_teach (faculty_id = ct_faculty_id); course to can_teach (course_nbr = ct_course_nbr); course to prerequisite (course_nbr = p_course_nbr, course_nbr = prerequisite_c_nbr). Column names and types are listed in Appendix D.]
APPENDIX D. CMS DATABASE SCHEMA

CAN_TEACH

Column List
Name               Code               Type         P     M
ct_course_nbr      ct_course_nbr      char(6)      Yes   Yes
ct_faculty_id      ct_faculty_id      char(5)      Yes   Yes

Reference to List
Reference to       Primary Key        Foreign Key
course             course_nbr         ct_course_nbr
faculty            faculty_id         ct_faculty_id
COURSE

Column List
Name               Code               Type         P     M
c_description      c_description      char(2000)   No    No
course_nbr         course_nbr         char(6)      Yes   Yes
course_title       course_title       char(40)     No    Yes
instr_hours        instr_hours        char(3)      No    No

Index List
Index Code         P     F     U     C     Column Code    Sort
idx_course_nbr     No    Yes   Yes   No    course_nbr     ASC

Reference by List
Referenced by      Primary Key        Foreign Key
can_teach          course_nbr         ct_course_nbr
offering           course_nbr         offered_course
prerequisite       course_nbr         p_course_nbr
prerequisite       course_nbr         prerequisite_c_nbr
FACULTY

Column List
Name               Code               Type         P     M
f_name             f_name             char(20)     No    No
faculty_id         faculty_id         char(5)      Yes   Yes
l_name             l_name             char(20)     No    No
mi                 mi                 char(1)      No    No
rank               rank               char(5)      No    No

Reference by List
Referenced by      Primary Key        Foreign Key
can_teach          faculty_id         ct_faculty_id
teach              faculty_id         teach_fac_id
OFFERING

Column List
Name               Code               Type         P     M
ay_offered         ay_offered         char(2)      Yes   Yes
offered_course     offered_course     char(6)      Yes   Yes
qtr_offered        qtr_offered        char(1)      Yes   Yes

Reference to List
Reference to       Primary Key        Foreign Key
qtr                ay                 ay_offered
                   qtr_nbr            qtr_offered
course             course_nbr         offered_course

Reference by List
Referenced by      Primary Key        Foreign Key
section            offered_course     s_course_nbr
                   ay_offered         s_ay
                   qtr_offered        s_qtr
PREREQUISITE

Column List
Name                 Code                 Type       P     M
p_course_nbr         p_course_nbr         char(6)    No    Yes
prerequisite_c_nbr   prerequisite_c_nbr   char(6)    No    Yes

Table prerequisite Index List
Index Code         P     F     U     C     Column Code           Sort
pk_prereq          No    Yes   Yes   No    prerequisite_c_nbr    ASC
                                           p_course_nbr          ASC

Reference to List
Reference to       Primary Key        Foreign Key
course             course_nbr         p_course_nbr
course             course_nbr         prerequisite_c_nbr
QTR

Column List
Name               Code               Type         P     M
ay                 ay                 char(2)      Yes   Yes
end_date           end_date           date         No    Yes
qtr_nbr            qtr_nbr            char(1)      Yes   Yes
start_date         start_date         date         No    Yes

Reference by List
Referenced by      Primary Key        Foreign Key
offering           ay                 ay_offered
                   qtr_nbr            qtr_offered
SECTION

Column List
Name               Code               Type         P     M
s_ay               s_ay               char(2)      Yes   Yes
s_course_nbr       s_course_nbr       char(6)      Yes   Yes
s_qtr              s_qtr              char(1)      Yes   Yes
section_nbr        section_nbr        char(2)      Yes   Yes

Reference to List
Reference to       Primary Key        Foreign Key
offering           offered_course     s_course_nbr
                   ay_offered         s_ay
                   qtr_offered        s_qtr

Reference by List
Referenced by      Primary Key        Foreign Key
teach              section_nbr        teach_sec_nbr
                   s_course_nbr       teach_course_nbr
                   s_ay               teach_ay
                   s_qtr              teach_qtr
TEACH

Column List
Name               Code               Type         P     M
teach_ay           teach_ay           char(2)      Yes   Yes
teach_course_nbr   teach_course_nbr   char(6)      Yes   Yes
teach_fac_id       teach_fac_id       char(5)      Yes   Yes
teach_qtr          teach_qtr          char(1)      Yes   Yes
teach_sec_nbr      teach_sec_nbr      char(2)      Yes   Yes

Reference to List
Reference to       Primary Key        Foreign Key
faculty            faculty_id         teach_fac_id
section            section_nbr        teach_sec_nbr
                   s_course_nbr       teach_course_nbr
                   s_ay               teach_ay
                   s_qtr              teach_qtr
APPENDIX E. CMS SCRIPTS AND SAMPLE SCREEN CAPTURES

Script for Application Open Event

    //Initial default database transaction
    //In this application this transaction
    //is used to connect to the CMS database
    SQLCA.DBMS = ProfileString("PB.INI", "Database", "DBMS", " ")
    SQLCA.DbParm = ProfileString("PB.INI", "Database", "DbParm", " ")

    //Transaction cms_trCursor declared in "declare:global" menu
    cms_trCursor = SQLCA
    CONNECT using cms_trCursor;
    IF cms_trCursor.SQLCODE <> 0 THEN
        MessageBox("Connect Error", "CMS connection failed")
        HALT
    END IF

    //Transaction fms_trCursor declared in "declare:global" menu
    fms_trCursor = CREATE transaction
    fms_trCursor.DBMS = ProfileString("PB.INI", "PROFILE Fms", "DBMS", " ")
    fms_trCursor.DbParm = ProfileString("PB.INI", "PROFILE Fms", "DbParm", " ")
    CONNECT using fms_trCursor;
    IF fms_trCursor.SQLCODE <> 0 THEN
        MessageBox("Connect Error", "FMS connection failed!")
        HALT
    END IF

    /* let user control the toolbar */
    toolbarusercontrol = TRUE
    toolbartext = TRUE
    open(w_main)
Script for Application Close Event

    transaction cms_trCursor
    cms_trCursor = SQLCA
    DISCONNECT using cms_trCursor;
    IF cms_trCursor.SQLCODE <> 0 THEN
        ROLLBACK using cms_trCursor;
        MessageBox("Disconnect", cms_trCursor.SQLERRTEXT)
    END IF
Script for w_faculty "window" Open Event with nested loop

    // associate each data window with a database connection
    // transaction object
    dw_fac_teach_less_than_2_list.settransobject(sqlca)
    dw_pi_less_than_2_list.settransobject(fms_trcursor)
    dw_employee_link_faculty_list.settransobject(sqlca)

    // disable edit control for each data window
    dw_pi_less_than_2_list.Modify("datawindow.readonly = yes")
    dw_fac_teach_less_than_2_list.Modify("datawindow.readonly = yes")
    dw_employee_link_faculty_list.Modify("datawindow.readonly = yes")

    // retrieve list of faculty from CMS for which teach instances
    // are less than two in dw_fac_teach_less_than_2_list data window
    IF dw_fac_teach_less_than_2_list.Retrieve() = -1 THEN
        ROLLBACK using SQLCA;
        MessageBox("Retrieve", "Faculty Teach Retrieval Error")
    ELSE
        COMMIT using SQLCA;
    END IF

    // retrieve list of employees from FMS for which PI instances
    // are less than 2 in dw_pi_less_than_2_list data window
    IF dw_pi_less_than_2_list.Retrieve() = -1 THEN
        ROLLBACK using fms_trCursor;
        MessageBox("Error", "PI Employee List Retrieval Error")
    ELSE
        COMMIT using fms_trCursor;
    END IF

    long pi_number_of_rows
    long teach_number_of_rows
    long pi_loopcount
    long teach_loopcount
    string piNum
    string teachNum

    // get number of rows in pi result data window
    pi_number_of_rows = dw_pi_less_than_2_list.RowCount()
    // get number of rows in teach result data window
    teach_number_of_rows = dw_fac_teach_less_than_2_list.RowCount()

    for pi_loopcount = 1 to pi_number_of_rows
        piNum = dw_pi_less_than_2_list.GetItemString(pi_loopcount, 1)
        for teach_loopcount = 1 to teach_number_of_rows
            teachNum = dw_fac_teach_less_than_2_list.GetItemString(teach_loopcount, 1)
            IF dw_employee_link_faculty_list.Retrieve(piNum, teachNum) >= 0 THEN
                COMMIT using SQLCA;
            ELSE
                ROLLBACK using SQLCA;
                MessageBox("Retrieve", "Retrieve error - employee join faculty detail")
            END IF
        next
    next
Script for w_faculty "window" Open Event with single for loop

    // associate each data window with a database connection
    // transaction object
    dw_fac_teach_less_than_2_list.settransobject(sqlca)
    dw_pi_less_than_2_list.settransobject(fms_trcursor)
    dw_employee_link_faculty_list.settransobject(sqlca)

    // disable edit control for each data window
    dw_fac_teach_less_than_2_list.Modify("datawindow.readonly = yes")
    dw_pi_less_than_2_list.Modify("datawindow.readonly = yes")
    dw_employee_link_faculty_list.Modify("datawindow.readonly = yes")

    // retrieve list of faculty from CMS for which teach instances
    // are less than two in dw_fac_teach_less_than_2_list data window
    IF dw_fac_teach_less_than_2_list.Retrieve() = -1 THEN
        ROLLBACK using SQLCA;
        MessageBox("Retrieve", "Faculty Teach Retrieval Error")
    ELSE
        COMMIT using SQLCA;
    END IF

    // retrieve list of employees from FMS for which PI instances
    // are less than 2 in dw_pi_less_than_2_list data window
    IF dw_pi_less_than_2_list.Retrieve() = -1 THEN
        ROLLBACK using fms_trCursor;
        MessageBox("Error", "Employee List Retrieval Error")
    ELSE
        COMMIT using fms_trCursor;
    END IF

    long number_of_rows
    long loopcount
    string piNum

    number_of_rows = dw_pi_less_than_2_list.RowCount()

    FOR loopcount = 1 to number_of_rows
        piNum = dw_pi_less_than_2_list.GetItemString(loopcount, 1)
        IF dw_employee_link_faculty_list.Retrieve(piNum) >= 0 THEN
            COMMIT using SQLCA;
        ELSE
            ROLLBACK using SQLCA;
            MessageBox("Retrieve", "Retrieve error - employee join faculty detail")
        END IF
    NEXT
Screen capture of w_faculty window

[Figure: Course Management System "faculty" window with Faculty, Courses, Scheduling, and Help menus, listing Faculty Id, Last Name, First Name, and Middle Initial for the retrieved faculty members, with a Close button.]
LIST OF REFERENCES
1.
Loomis, Mary E.
S.
,
Object Databases The Essentials, Reading,
MA:
Addison-
Wesley Publishing Company, 1996.
2.
3.
J., An Introduction to Database Systems, Fourth
Wesley Publishing Company, Reading, MA, 1986.
Date, C.
Edition,
Volume
1,
Addison-
Spear, R. L., "A Relational/Object-Oriented Database Management System:
" Master's Thesis, Naval Postgraduate School, Monterey, CA,
September, 1992.
R/OODBMS,
4.
Elmasri, R. and Navathe, S. B., Fundamentals ofDatabase Systems, Second Edition,
Inc., Redwood City, CA, 1994.
Benjamin/Cummings Publishing Company,
5.
Kent,
W.
Limitations of Record-Based Information Models,
4(1); 1979; 107-131.
ACM Transactions on
Database Systems,
6.
Codd, E.
J.,
Extending the Database Relational Model to Capture More Meaning,
ACM Transactions on Database Systems, 4(4);
7.
8.
1979.
C. T., An Effect of Set Type to Query Formulation in Relational Database
Systems, Naval Postgraduate School, Monterey, CA, Report No NPS52-88-018, July,
1988.
Wu,
Jernigan, Kevin,
"Making
the
Move to VLDBs," Databased Advisor, p.
36,
March, 1997.
9.
Pires, S. Alan,
FMS Thesis reference, Master's Thesis, Naval Postgraduate School,
Monterey, CA, March, 1997.
BIBLIOGRAPHY
Casanova, M. A., Furtado, A. L., and Tucherman, L., A Software Tool for Modular Design, ACM Transactions on Database Systems, Vol. 16, No. 2, June 1991, pp. 209-234.
Codd, E. F., The Relational Model for Database Management, Version 2, Reading, MA: Addison-Wesley Publishing Company, 1990.
Date, C. J., with Warden, A., Relational Database Writings, 1985-1989, Reading, MA: Addison-Wesley Publishing Company, 1990.
Date, C. J., with Darwen, H., Relational Database Writings, 1989-1991, Reading, MA: Addison-Wesley Publishing Company, 1992.
Grahne, G., The Problem of Incomplete Information in Relational Databases, Lecture Notes in Computer Science, Vol 554, Springer-Verlag Berlin Heidelberg, 1991.
Kuper, G. M. and Moshe, Y. V., The Logical Data Model, ACM Transactions on Database Systems, Vol. 15, No. 1, September 1990, pp. 379-413.
Langerak, R., View Updates in Relational Databases with an Independent Scheme, ACM Transactions on Database Systems, Vol. 15, No. 1, March 1990, pp. 40-66.
Shasha, D. and Wang, T., Optimizing Equijoin Queries in Distributed Databases Where Relations Are Hash Partitioned, ACM Transactions on Database Systems, Vol. 16, No. 2, June 1991, pp. 279-308.
Markowitz, V. M. and Shoshani, A., Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach, ACM Transactions on Database Systems, Vol. 17, No. 3, September 1992, pp. 423-464.
INITIAL DISTRIBUTION LIST
1. Defense Technical Information Center
   8725 John J. Kingman Road, Ste 0944
   Ft. Belvoir, VA 22060-6218
2. Dudley Knox Library
   Naval Postgraduate School
   411 Dyer Rd.
   Monterey, CA 93943-5101
3. Director, Training and Education
   MCCDC, Code C46
   1019 Elliot Rd.
   Quantico, VA 22134-5027
4. Director, Marine Corps Research Center
   MCCDC, Code C40RC
   2040 Broadway Street
   Quantico, VA 22134-5107
5. Director, Studies and Analysis Division
   MCCDC, Code C45
   300 Russell Road
   Quantico, VA 22134-5130
6. Chairman, Code CS
   Computer Science Department
   Naval Postgraduate School
   Monterey, CA 93943-5101
7. C. Thomas Wu, Code CS/Wq
   Computer Science Department
   Naval Postgraduate School
   Monterey, CA 93943-5101
8. Mark H. Barber, Code CS/Bm
   Computer Science Department
   Naval Postgraduate School
   Monterey, CA 93943-5101
9. Maj. Barron D. Whitaker
   375C Bergin Dr.
   Monterey, CA 93940