Download Towards a solution to the proper integration of a

Document related concepts

DBase wikipedia , lookup

Serializability wikipedia , lookup

Microsoft Access wikipedia , lookup

Relational algebra wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

IMDb wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

SQL wikipedia , lookup

Oracle Database wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Ingres (database) wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Functional Database Model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Versant Object Database wikipedia , lookup

Database wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Calhoun: The NPS Institutional Archive
Theses and Dissertations
Thesis and Dissertation Collection
1987-12
Towards a solution to the proper integration of a
logic programming system and a large knowledge
based management system
Gorman, John Patrick
http://hdl.handle.net/10945/22557
NAVA...
-
..-.
'^-'
^'-^^
^^^
MONTKRE^, CA..i50BWIA 93943-6002
V.
NAVAL POSTGRADUATE SCHOOL
Monterey ,
California
THESIS
TOWARDS A SOLUTION TO THE PROPER INTEGRATION
OF A LOGIC PROGRAMMING SYSTEM AND A LARGE
KNOWLEDGE BASED MANAGEMENT SYSTEM
1
|
by
John Patrick Gorman
December 1987
Thesis Advisor
C.T. Wu
Approved for public release; distribution is unlimited
T238936
^CLASSIFIED
REPORT DOCUMENTATION PAGE
lb
PO«I StCuRilV CL>SSifiCATlON
RESTRICTIVE
MARKINGS
iclassif ied
OlSTRiBUTlON/AVAILAaiLITV O^ REPORT
CURiTV ClASSiHCAnON AUTHORiTV
)
tClASSifiCATiON/OOWNGRAOiNG SCHEDULE
Approved for public release;
distribution is unlimited.
IfORMiNG ORGANi/AnON REPORT NUMBER(S)
S
6b OfUCi SrMBOl
(If tpplntblt)
^ME or PERFORMING ORGANIZATION
il
Postgraduate School
1»
lb
93943-5000
California
ORESS(Ciry
ill
AOORESStCry.
5(«r«.
»nd li^Codt)
Monterey, California
8b OFfiCE SrwSOL
ME OF Funding /SPONSORING
GANiZATiON
NAME 0» MONITORING ORGANIZATION
Naval Postgraduate School
Code 52
•ORESS (Cry. Srjr«. tmiliPCodt)
:erey,
MONITORING ORGANISATION REPORT NUK/8ER(S)
PROCUREMENT INSTRUMENT
9
93943-5000
lOENTif iCAIlON
NUMBER
4P0llCSblt)
%utt. tnd llPCodt)
10
SOURCE OF FUNDING NUMBERS
PROGRAM
PROJECT
rAS<
WORK
ELEMENT NO
NO
NO
ACCESSION
UNIT
NO
SOLUTION TO THE PROPER INTEGRATION OF A LOGIC PROGRAMMING SYSTEM
A LARGE KNOWLEDGE BASED MANAGEMENT SYSTEM (u)
.RDS A
ISO^A^,
lan,
»t
auth0R(U
,
.
John Patrick
Of RfPQRr
i)D
er^s Tiiesi:
COVERED
T'ME
ftVE. OF REPORT, [Ytst. Monrft Dty)
'<,
fPOM
1987 December
TO
IS
PAGE COuNI
4l
l>9lE^nNTARV NOTATION
COSATi COOES
GROUP
iD
TRACT
(COrtl<ouf
18
on rfv«a«
(/
if
ntttliny *nd
idtntify by b'ock
numbtr)
Expert Sys tems; Knowledge Based Management Sys
tems, Expert Database Systems; Integration of
"
Expert Dat;
^base Systems
GROUP
SUB
SUBJECT TERMS {Conimu* on tt^tnt
necfUJ'y »nti lOtntify by blotk numt>*rt
designing the interface between a database and a logic system with
rence such as Prolog, efficiency is the major issue.
Presented here are
e of the methods that are conside red most promising and in which much
arch is focused.
The first metho d explores extending an inference
ine to include a database manager
the second couples the inference
anism with a database management system; and the third extends a datamanagement mechanism to include inference
;
.
«'3UTi0N' AVAILABILITY QF ABSTRACT
iCLASSiFi(0/\jNLiMiTED
*VI(
'.
M
21
SAME AS RPT
Q OTiC
OF RESPONSIBLE iNO'ViOUAL
C.T.
(
Q
'^siWSK'-mt""°*'
Wu
1473. 04 MAR
ABSTRACT SECURITY CL-ASSiFlCATlON
Unclassified
USERS
8)
APR rd'don
All
in«y bt ultd until
c^tulttd
oth«r td'tioni »tt obtolatt
'&Sc"U\rq'°'
SECURITY CLASSIFICATION QF TmiS PACE
UNCLASSIFIED
SECURITY CLASSiriCATION OF THIS PAGE (Vian Dmtm
Item
Sn(«r*40
# 19
Ackncwledging up front that no method can be claimed best, the major
emphasis of this study will be to determine the strengths and
limitations of all three methods and thereby help to clarify many
uncertain and sometimes conflicting issues caused by the parallel lines
of development from the database and artificial intelligence
conmunities
S
N 0102-
LF- 014-
6601
SECURITY CLASSIFICATION OF THIS RAGErWh.n Dmim SnUrad)
Approved
for public release; distribution is unlimited.
A Solution To The Proper Integration
Of A Logic Programming System And A
Towards
Large Knowledge Based Management System
by
John
Lieutenant
P.
Gorman
Commander, United
States
Navy
B.S., Santa Clara University, 1975
Submitted in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE IN COMPUTER SCIENCE
from the
NAVAL POSTGRADUATE SCHOOL
December 1987
^
ABSTRACT
In designing the interface between a database and a logic system with inference such
as Prolog, efficiency is the major issue. Presented here are three of the
considered most promising and in which
much
research
is
focused.
methods
The
first
that are
method
explores extending an inference machine to include a database manager, the second
couples the inference mechanism with a database management system; and the third
extends a database management mechanism to include inference.
Acknowledging up
this
front that
no method can be claimed
best, the
study will be to determine the strengths and limitations of
thereby help to clarify
parallel lines
many
all
major emphasis of
three
methods and
uncertain and sometimes conflicting issues caused by the
of development from the database and
artificial intelligence
communities.
-'S'iS
TABLE OF CONTENTS
I.
INTRODUCTION
OVERVIEW
PROOF THEORY VS MODEL THEORY
A.
B.
8
9
INDEXING
PROLOG AS AN IMPLEMENTATION OF LOGIC PROGRAM-
C.
D.
n.
8
1
MING
EXTENDING PROLOG TO HANDLE SECONDARY STORAGE
11
15
A.
OVERVIEW
15
B.
CRITICISMS OF PROLOG
15
Lacks Support
1.
2.
15
Weak Typing
16
Evaluation Strategies
Strict
4.
No Look Ahead
No Secondary Storage Management
5.
16
"...
3.
16
16
ADVANTAGES OF PROLOG
C.
17
Augmented
17
1.
Easily
2.
Straightforward
3.
Simple Virtual Data Definition
18
4.
Simplicity
18
5.
Use Of Rules
19
And Flexible
17
D.
CLAUSE INDEXING
19
E.
QUERY OPTIMIZATION
20
1.
Reorder Clause
20
2.
Group Independent Clauses
21
m. THE COUPLED MECHANISM
22
A.
B.
C.
OVERVIEW
LOOSELY COUPLED
TIGHTLY COUPLED METHODS
22
Methods
32
1.
Interpretive
22
27
Set Oriented Approach
33
SUMMARY
EXTENDING AN RDBMS
34
a.
D.
IV.
35
OVERVIEW
B. DATABASE ADVANTAGES
C.QUERY EVALUATION
A.
D.POSTGRES SUPPORT
35
35
36
-
5
37
V.
TOWARDS A NEW APPROACH
A. OVERVIEW
B.
THE COMBINED APPROACH
C SUMMARY AND SCOPE
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST
40
40
42
44
45
47
LIST OF
1.
Loosely Coupled Mechanism
2.
Tightly Coupled
3.
Mechanism
Combined Approach
nOURES
25
27
42
I.
A.
INTRODUCTION
OVERVIEW
In recent years, great emphasis has been placed on the relationship between logic
programming and
A
databases.
relational
between logic programming and expressions of
between Prolog
on the
raised
facts
and tuples of relations
possibility of
An
storage.
outstanding
progammming an
[1].
managing large
programming using some means of
recent
is to
relational algebra
and
in particular
Therefore, great expectations have been
collections of facts represented in logic
retrieving those facts that are stored in secondary
which
development
area of major importance
project The aim of the project
correspondence has been shown
natural
is
Japan's
promises
fifth
to
make
logic
generation computer system
develop computer systems for the 1990's. This will
involve bringing together four currently separate research areas: expert systems, very
high
programming languages,
level
integration technology.
and
that the
How
(RDB)
It is
distributed
computer achitecture be designed
receiving
great
interest
in
programming and
both the database and
that
However, such a marriage was not conceived
thesis is to
large
scale
to directiy support such languages.
communities because of the widespread appUcations
marriage.
and very
intended that the very high level languages be based on logic
to best represent the integration of logic
is
computing
examine possible
interfaces
relational databases
artificial
intelligence
can be drawn from such a
in heaven.
The
intent of this
between an expert system (ES) and database
management system(DBMS). The sum of
the
8
two
is
called a
knowledge base database
system(KBMS). This chapter
programming and
Prolog and
why
will discuss
some fundamental
relational logic/databases.
it is
viewed
differences between logic
Also, presented will be a discussion of
as the premier logic
programming language. Chapter
Two
will discuss the issues involved in expanding Prolog to include database facilities.
Chapter Three will present an in-depth study of coupling an inference machine with that
of a complete database management system (DBMS). Chapter Four presents some issues
of expanding
DBMS's
to include inference
and Chapter Five discusses future issues of
integrated expert database systems.
B.
PROOF THEORY VS MODEL THEORY
Fundamental differences do
respective
management of
exist
data.
between logic programming and databases
in their
While some issues can be readily resolved, more
fundamental issues are based on theoretical differences such as the model theoretic
versus proof theoretic views.
The
basis
Such features
many
from which
relational database
as integrity constraints
and database theory are founded
and views direcdy apply to notions of
instances however, databases are viewed in a
model
theoretic
is logic.
logic. In
way. In a model
theoretic definition of a database, the state of the database is an interpretation (which
must be a model) of the theory. Hence, every evaluation
truth value for the
constitutes the computation of a
query over the current model. During updates, the model changes but
not the underlying database schema; consequendy, update transactions must preserve
database integrity such that the
new database
Logic programming views databases
will be a
in light
model of the theory
of the proof theoretic view.
proof theoretic view, facts and deduction rules constitute the theory
9
[1].
itself rather
In the
than a
model of
more general underlying
the
theory. Queries
make up theorems
that
have to be
proven from the theory by using a small number of well founded proof techniques such
as resolution.
Consequendy, update operations change the theory. Time invariance, as
discussed above in conjunction with model theoretic, does not exist Therefore, the
theory changes after update operations
The question then
is
how
[2].
best to reconcile the
two views of databases
of efficient and intelligent knowledge base management.
differences between the
is
two
is
through the representation of tables
a set of assertions, procedures and a goal.
name can be viewed
applying
all
assertions
at
The
set
in relational
[3].
to resolve the
A
logic
program
of assertions with the same predicate
where each row represents a
as a table
conforms with a relation
One way
for the benefit
fact.
This representation
database. This representation also facilitates
once to an atomic formula and hence the table represents
all
possible substitution sets for an individual atomic formula [4]. Extending this concept, a
conjunction of atomic formulae can also be represented by a table where the individual
columns represent the
distinct variables/constants in the conjunction
the set of values the variables can take.
Theorem proving, using
and rows represent
the resolution principle,
can be viewed in terms of table operations producing a new table when one of the atomic
formulae in the conjunction
1.
is
resolved.
Tables can be used to handle
corresponds to a substitution
2.
sets,
The advantages of table representation
where each row
set.
A relation is often conceived as a table and hence this
representation enables a smoother interface with
relational database
management system (RDBMS).
10
are:
3.
Resolution
is in
terms of well defined set operations and
the possibility of optimization.
4.
Negation by failure requires that
before success or failure
is
declared. Table
representation facilitates obtaining
the
C.
same time
solutions be sought
all
all
table sharing reduces
answers and
memory
at
space needed.
INDEXING
Standard relational database systems make extensive use of indices to help search
through
many
facts.
access method
files,
A
number of
are
techniques, such as
known and used
in the
B+
trees
and indexed sequential
implementation of these sophisticated
database systems. The issue of indices and their associated access-methods
is
necessary
to explore in this thesis because of their uses in improving query processing efficiency of
integrated logic
programming and database systems
[5].
Logic programming systems support a limited and fixed amount of indexing. They
may implement
their
own
access methods which results in an extension of the logic
programming language. This
may be
will be discussed in Chapter
coupled with an existing
DBMS
resulting
Two. Secondly,
in
two
distinct
the interface
solutions,
the
"compiled" and "interpreted" approaches. These will be discussed in Chapter Three.
D.
PROLOG AS AN IMPLEMENTATION OF LOGIC PROGRAMMING
Prolog interpreter and compilers are the most
available
today.
In
virtually
all
studies
programming and DBMS's, Prolog
is
common
logic
programming systems
concerning the integration of knowledge
used as the defacto knowledge programming
11
language. Prolog has been demonstrated to be easy to learn, write and modify.
efficient: interpreted
comparable
Prolog
Prolog
is
It is
also
about as fast as interpreted Lisp and compiled Prolog
in
speed with more conventional languages.
is
a class of implementation of the positive
programming. Prolog programs correspond
to
Hom
hypotheses.
theorems to be proven by the theorem prover using unification.
how Prolog works follows. An example
of a Prolog program
is
clause subset of logic
Queries
A
correspond to
simple explanation of
is:
grandparent(x,y):-parent(x,z),parent(z,y).
parent(x,y):- mother(x,y).
parent(x,y):- father(x,y).
ancestor(x,y):- parent(z,y), ancestor(x,z).
ancestor(x,y):- parent(x,y).
father(johnJane).
father(jim,george).
father(chris,john)
mother(april,jane).
mother(heather,april).
mother(mo,john).
mother(april,george).
The program
parent(x,y).
consists of a collection of clauses or statements such as ancestor(x,y):-
An atom is
Each clause
is
a construct such as ancestor(x,y).
shorthand for a
implication and the
comma
first
order logic formula.
between atoms
12
to the right
The
of the
-: is
-:
ordinary logical
stands for logical
Qauses with an empty body
conjunction.
that
jim
is
the father of george.
called rules.
The clauses with non-empty body
are conditional and are
A Prolog program is simply a collection of facts and rules.
To compute an
The
are called facts. Thus, father (jim,george) states
answer
to
answer, a query, a clause with no head,
a
query
such
as
is
?grandparent(chris,x)
given to run the program.
has
the
property
grandparent(chris,x), with x replaced by the value in the answer, can be deduced
The answer,
program.
parent(x,y)
:-
jane,
and the
father(x,y)
is
During the answering of a query, Prolog
statement to be proved
program
clauses.
is
Each
from the
deduced using the rule for grandparent, the
facts father(chris John)
is
that
rule
and father(john,jane).
actually proving a theorem.
The
the query and the axioms used to prove the statement are the
step in the query answering process uses resolution inference
towards proving the query.
Standard Prolog differs from, and complements, pure logic programming in several
ways.
First, built-in predicates are
provided that permit clauses to be read and written to
and from terminals and the database, the order of which
in the database is not important.
For control of the search mechanism, Prolog provides
features such as cut
few
utilities for
and
fail.
and other
And, for support of program development, Prolog provides a
debugging and tracing programs.
Second, Prolog's proof theoretic resolution
clause
built-in predicates
subset of logic
is
incomplete for the positive Horn
programming. Prolog matching
differs
in
two ways from
unification used in resolution. First, Prolog's resolution uses left to right
bottom matching. Second, resolution requires
something containing
itself.
that a variable
to
cannot be instantiated to
For example, the Prolog expressions
13
and top
matches(x,x).
equal(example(y,y)).
very well could produce
infinite
terms and should be avoided.
Prolog evaluation mechanism uses model theoretic as well as proof theoretic views
since Prolog does not distinguish facts from predicates and
and
printed.
all
possible matches are
made
Hence, model theoretic answers are obtained simultaneously with proof
theoretic answers.
For reasons given above, the
discussing
relational
model
DBMS's. However, many advanced
sparse, so that
is
used throughout
this thesis
applications find the relational
when
model too
models such as the Entity-Relationship model have been developed
capture the semantics of the relationships
[6].
It is
beyond the scope of
discuss the issues involved in choosing which model to adopt.
14
to
this thesis to
n.
A.
EXTENDING PROLOG TO HANDLE SECONDARY STORAGE
OVERVIEW
The
between a Prolog program and knowledge based
conceptual difference
program with secondary storage management
secondary storage. This implies that
reading
at
it all
once into main
it is
store.
is
number of
the
facts
that reside in
impossible to load a deductive database by
Prolog programs are usually small enough to be
kept in certain data structures in main store so that they can be easily accessed by the
interpreter.
In the approach being addressed here, the database
and only those parts of it required
Thus,
this
approach requires
that the interpreter
Unlike
file
to
answer the current query are read into main
it
other approaches in which queries are
relational database or queries are first expressed
this
requires.
either
handled direcdy
B.
relational
by a
and controlled by Prolog or some other
approach senses the close similarities between Prolog
and a relational database and seeks to streamline accesses to secondary
doing away with the
store.
a relational database system so
structures similar to
can quickly access any fact or rule
knowledge based language,
must be kept on disk
and implementing
all
store
by
storage in a Prolog framework.
CRITICISMS OF PROLOG
1.
Lacks Support
The lack of standard database
authorization,
and
support,
such
as
concurrency,
recovery,
integrity checking, is a criticism of Prolog as a database system, not
as a database language.
15
2.
Weak Typing
Prolog does not provide for data definition and typing. The flexibility this
provides
is
desireable in an intensional database
However, there
is
no mechanism
to
types of attributes. Such information
management, optimization and
3.
convey
where processes are kept
in
main
store.
relation schemes, database constraints, or
is essential
for database design, secondary storage
efficient evaluation.
Strict Evaluation Strategies
The evaluation
evaluation
may be
strategies
used by Prolog are limited.
impractical for general clauses,
it is
While bottom-up
quite tractable for
many of
the
clauses encountered in database querying.
4.
No Look Ahead
Prolog lacks a mechanism that
lets the
operating system
so that the operating system can decide which pages to
know of program plans
bump when
bringing in
new
pages.
5.
No
By
Secondarv Storage Management
far the biggest
problem with using Prolog for databases
secondary storage management.
allowances must be
made
If
for cases
Prolog
is
to
is
its
lack of
be used as a database language,
where the database resides
because of size and the case where the database system resides
from the one where Prolog
is
at
in
secondary storage
a processor different
running.
Concentration on the management of large transfers from secondary storage
partial solution to to this
problem. Consider a simple join statement in Prolog:
ans(A,B,C)
r(A,B), s(B,C).
:-
16
is
a
If relations r
and
s reside in secondary storage, the
programmer must be cognizant of the
physical organization of the data. If the relations are not physically clustered, then a
block access per tuple
is likely.
When
using Prolog's looping join strategy, a block
access could be assumed for each time an s -tuple joins with
some
r -tuple, even if
.s
is
indexed on B.
Common
sense strategies would be to read a block of r -tuples, to read as
blocks of s -tuples as will
currentiy in
into the remaining
memory. More blocks of s
The process
s is considered.
that
fit
is
-tuples are brought into
5, is
-tuples or
many
moving
to sort both relations
on B,
C.
if
until all
of
is
virtual
memories
Another strategy,
not ah-eady sorted, and
made.
grouped well or are not retrieved as grouped,
and out of core and large
in
main memory
s -tuples.
perform a merge join where only a single pass through r and s
If tuples are not
to try all joins tuples
repeated for each block of r -tuples. This strategy assumes
most blocks accessed contain many r
based on the sizes of r and
memory and
many
alot of data will
be
will not help.
ADVANTAGES OF PROLOG
Some
1.
of the advantages of Prolog for database programming follow.
Augmented
Easily
The language can be
operations,
new
by defining new
2.
to write parsers
and
in syntax. Special
data structures, and special functions for querying can be readily added
predicates.
Straightforward
Prolog
easily augmented, both in function
is
And Rexible
a good language for query parsers and translators.
and tree-transformation programs which makes
17
this
It is
straightforward
language attractive
because of
its flexiblity
the language a
good
target for query translation.
Simple Virtual Data Definition
3.
Views can be
Views
with indices. Programs are easily manipulated as data, making
readily added to the database and
are conceptually the
same
may
reference other views.
as relations for querying. There is
views before using them, and computation and
no need
retrieval are easily
mixed
to evaluate
in a
view
definition.
Simplicity
4.
One of
from main
the advantages cited of such a system is
to secondary store because of Prolog's
example,
in state-of-the-art
embodied
in
a powerful,
systems, PL/1 and
relational
databases,
simplicity
traversing
good impedence match
most of the
[2].
For
relational ideas are
elegant and practical database system.
COBOL for example,
when
have host languages. Access
However,
most
to the database is
provided by embedded query language statements in a host language such as SQL.
SQL however,
is
so different from PL/1 and
COBOL that
the resulting combination can
be considered unsatisfactory. In addition, programmers will need to know both
one of the procedural system languages. In the approach taken
programmers need
to
know
only one language,
PROLOG, which
in
this
SQL and
chapter,
can be used to ask and
answer queries and also used for running system language programs. Being based on
logic,
PROLOG
is
a
much more
procedural languages such as PL/1 or
suitable
COBOL.
system language
Furthermore,
interpreter sufficient sophistication to optimize queries so that
with the
minimum number of
disk accesses.
18
it is
for
databases
than
possible to give the
they can be answered
Use Of Rules
5.
This
similar facility,
share
called a view,
which
more
RDBMS's do
of rules.
likened to a special case of a rule
is
the advantages that views bring.
all
rather
also has the advantage
approach
However,
it
provide a
[3].
Rules
can be argued that rules are
useful and powerful than views. First, rules are at the heart of a deductive
database rather than on the outside, as views are in a relational database. Rules can be
used in an important
way
in the data modelling stage of database creation
an essential part of the data model
defining a relation partly by
can be recursive. This
In
general,
express
directly
is
relational
general
say, in the
facts
For example, they provide the
and partly by some
a very useful property which
programming languages.
single nile,
some
itself.
is
rules.
and become
flexibility
Furthermore, niles
not shared by views.
database systems compensate for their inability
laws
about
data
of
by interfacing
In other words, a general law,
with
to
conventional
which can be expressed by a
system being described, will nonnally need to be expressed
procedurally in a relational database system by a program written in a host language.
The
rule is
more
explicit,
more
concise,
easier
to
understand and easier to modify
than the con-esponding program.
D.
CLAUSE INDEXING
In
order to efficiently retrieve large amounts
suitable file
stmcture
must be found. This
is
of facts
in
secondary storage a
generally called the clause indexing
problem.
There are two
the other
is
distinct solutions to this
the interpreted approach.
problem.
One
is
the compiled approach and
Both solutions are examined
19
in detail in
another
section
first
of
suffice
this thesis;
it
compiled approach, the compiler
to say that in the
uses the rules to generate a number of subsidiary queries which can then be
answered by looking
at the facts.
query answering process.
followed by one of pure
Thus, no fact
In other words,
retrieval.
On
is
accessed until the very last step of the
the other hand, in the interpreted approach,
accessing of facts and the computation are interleaved.
a fact to continue the computation,
because the required page
may
a disk access
Whenever
the interpreter needs
potentially required
is
the
(potentially,
already be in buffer).
QUERY OPTIMIZATION
E.
This problem
rule,
is
essentially that of finding an
appropriate order, or computation
for solving subqueries so that the total cost of answering the query
Standard
PROLOG
systems have a fixed
always answer the leftmost query
are essentially
1.
first,
left to right
computation
then the second to the
must be done before the query
the query optimization
two main techniques involved
is
left,
rule.
etc.
is
minimized.
That
is,
they
In such systems,
given to the interpreter.
There
in planning a query.
Reorder Clause
An
estimate
is
made of
the
number of
query, based on the sizes of rules and the
A
one of pure computation
the process is
greedy algorithm
This optimization
is
is
successful matches for each goal in a
number of
different values for each attribute.
used to reorder the goals by increasing the number of matches.
similar to that used in
orderings of clauses, and
it
SQL, but
doesn't use information
method depends on goals being
satisfied
by
example.
20
unit
it is
not exhaustive in trying
on index
availablity [7].
ground clauses. The following
all
This
is
an
answer(C):- country (C), borders(C,mediterranean),
country(Cl),asian(Cl),borders(C,Cl).
Which
is
ordered
to:
answer(C):- borders(C,mediterranean),country(C),
borders(C,Cl), asian(Cl), country(Cl).
Group Independent Clauses
2.
portions of queries
Certain
may be
independent, in that
uninstantiated variables at the time of invocation. If a query has
there is
fails.
no point
in trying to resatisfy the first part
mechanism
is
changed so
if
braces and
that portions in braces are skipped
share no
two independent
upon backtracking,
Independent portions of queries are grouped with
they
parts,
the second part
the
evaluation
over on backtracking. For
example, braces can be added to the reordered query above to yield:
answer(C):- borders(C.mediterranean), { country (C) )
{borders(C,Cl),(asian(Cl)}, {country (CI)}
Although similar
evaluation
to
QUEL
in
query
decomposition,
[7].
21
it
is
).
not
interleaved
with
m. THE COUPLED MECHANISM
A.
OVERVIEW
The coupled approach
knowledge management
fii
this
refers to the cooperation of
(the
distinct systems,
ES) and one for massive data management
the key issue is the interface
approach,
two
that allows
the
(the
one for
DBMS).
two systems
to
communicate.
Broadly speaking, the key attraction of coupling the
machine and one for the
from
the performance advantages of
possibilities
B.
when designing
facts
(EDB). This
store or extensional database
DBMS
appears to be
by running two intercommunicating processes: one for
the production of fast solutions
the inference
ES and
this
and statements
sort of coupling
that reside in
secondary
could also greatiy benefit
database machines. There are essentially two
kind of interface: the loosely and tighdy coupled.
LOOSELY COUPLED
Intuitively, in this approach, the
the
compiled approach
It
DBMS.
processes demands for
performed):
first
[8].
serves at the request
of the
based on two distinct phases (eventually
DBMS;
and the delivery of the
ES which
This approach has also been referred to as the
a computation on the side of the ES, which, using
generates queries for the
DBMS
is
DBMS
its
repeatedly
knowledge,
then the execution of the queries on the side of the
result to the former. Ideally,
compiling queries makes use
of two processors, one each for the extensional and the other for the intensional so that
each machine mzdntains
its
own
identity.
The
22
logic language is essentially devoted to
deductive function and employs theorem proving by using only the intensional axioms
IDB
without interleaving access to the extensions of the database. The output of the
processor
is
a set of axioms,
all
of which reference only relations stored explicitly
extensional database (EDB). After the compiled axioms are produced, deduction
in the
is
no
longer necessary to answer any query on the database and the theorem prover, used to
compile the axioms,
IDB
is
no longer necessary. In
processors relegates the
effect,
this
decoupling of the
and
task over the intensional database (IDB) to the
search
theorem prover, and the inference free computation task over the
which uses conventional
EDB
EDB
to the
DBMS
relational techniques [9].
In order to take advantage of the compiled approach most implementations use
some
sort
of delay
tactic in
the query belonging to the
facilitates the
which the query
EDB
is
of a
in order to control access to the
also necessary for each specific database.
that
IDB from
And, some modifications
to
the
Prolog
and other optimizations.
simple example of a use of compilation
set
DBMS
unloading of queries or predicates in optimal form would be required.
are then necessary to permit delay
A
evaluated either to identify conjuncts of
or to further optimize. Perhaps a link to the
Automatic loading into a meta-dictionary
EDB
is
is
as follows.
Given a program consisting
of rules:
Ql
Ql
:-
P2, P3.
:-
P2, P3, P4.
Q2:-P1,P6.
We
identify those conjuncts of a query that are directiy definable in the
predefined predicate
ExtDB
so that
we have
23
EDB
with a
PI
:-
ExtDB(Pl).
P2
:-
ExtDB(P2).
P3
:-
ExtDB(P3).
P4
EXTDB(P4).
:-
P6
:-
ExtDB(P6).
During the deduction of a query, when an
goal,
it is
ExtDB
placed at the end of the query. Thus,
if
predicate
we have
is to
be added to form a new
a goal statement of
:.Q1,Q2,Q3.
and a statement with an ExtDB pedicate of
Ql
:-
ExtDB(Ql),
then the derived goal becomes
:-
So
that
the entire query is analyzed,
conjuncts are
moved
to the
EDB
all
conjuncts that
end of the query. In
Prolog are made. The query
All
Q2, Q3, ExtDB(Ql).
is
delayed until
all
this
are identified as
ExtDB
approach, the fewest changes to
possible
EDB
identifications are
made.
conjuncts are passed at once resolving the tuple vs set at a time difference
inherent between Prolog and
RDBMS.
In modifying the control structure and termination condition, the
are incorporated into the
new
ExtDB
predicates
goal statement being placed at the end of the
list
of
conjuncts in contrast to other predicates which are added to the front of the goal
statement. Also, goal statements
the
list
which
start
with an
ExtDB
predicate as the
of conjuncts must be recognized as a halt condition.
arises, the goal
statement must be transmitted to the
24
EDB
and the
When
first
a halt
RDBMS
goal in
condition
to obtain the
A
answers.
conjuncts
Figure
meta-dictionary
does a match against the
from which Prolog
this
mechanism
is
given
in
1.
this
approach and the obvious simplicity
serious drawbacks.
new EDB
large, requiring
makes
utilized
and the ExtDB predicates. The architecture of
Against
each
is
it
.
The meta-dictionary would have
it
to
provides,
we
take note of
some
be loaded prior to processing for
Also, for large EDB's, the meta-dictionary would be proportionately
update in order to sufficientiy satisfy
constant
unfeasible.
It is
also not clear
Prolog
how
—
Met
ts
Pi
EDB goal
to
c
t
a
iona ry
quer ies
RDBMS
1
RDBMS
Figure
1.
goal queries. This
the concept of views in the
con June
sent
all
Loosely Coupled Mechanism
25
IDE
side
would
be supported.
Finally,
a problem of consistency
may
arise
the data collection
if
extracted from the database (which essentially represents a snapshot) is used while
the original version is updated.
The
coupling
attraction of loosely coupled systems is the possibility of optimization in the
itself. If
query conditions were converted into some nonnal fonn, previous to
passing these queries to the
to
them
DBMS,
to obtain a simplified
a
number of transfonnation
fonn of the
steps could be applied
The
original expressions.
extra infonnation
available at this level and the pattern matching facilities of the logic language provide
an optimization mechanism that compliments the one in the
The
DBMS.
role of integrity constraints as an optimization technique applies nicely to the
loosely coupled mechanism. Integrity
constraints play
answers to a knowledge based query in a
Hom
no
part in the
database. That
derivation
of
integrity constraints
is,
are not necessary to obtain answers to a given existential query in a
Hom database
[10].
Nevertheless, the role of integrity constraints as semantics (knowledge) expressed over
the database has been recognized
possibilities [11].
the
by Vassiliou and Jarke for
Integrity constraints
its
potential optimization
can aid/guide and act as heuristics in searching
space for answers. In some cases, they can help terminate queries that have no
answers
search
as they violate
over
semantically
the
some database
database.
Integrity
integrity
constraints
constraints
and hence require
no
can also be used to generate
equivalent queries which can be executed
more
efficiently
database than the original query. Thus, integrity constraints are valuable to
over the
aid the
search process or to transform the original query into a set of semantically equivalent
queries.
26
The
representation of integrity constraints in clause form
examples will
show
suffice to
"The school
its
relatively
is
Some
easy.
simplicity.
that offers
knowledge also
offers challenges."
offer(X,knowledge), offerpc.challenges).
:-
"The course
is
either in this quarter's schedule or in the
yearly schedule."
spring, schedule (Y), yearly. schedule(Y)
"Only
axioms
is
the
axiom
EDB
teach(X,CS,Z).
work by Chakravarthy, Fishman and Minker,
integrity constraints. If
the
:-
complete, a search for predicate matches
the matching
scheduled(X,Y).
WU teaches computer science courses."
(X = wu)
Summing
:-
is
after compilation of the
made between
matches are found, the pertinent integrity constraint
axiom by using a subsumption algorithm
[8].
Ideally,
are instantiated using the values in the integrity constraint
will not be necessary.
considering very large
substantial.
Partial
goal
The method
integrity constraints
the axioms and the
is
matches
queries,
also
to goal queries are
the time saved
even
efficient
and axioms since the
from
added
to
open variables of
and accesses
much more
calls to the
in the presence
integrity
is
to the
likely and,
EDB
will be
of a large number of
constraints are
compiled into the
axioms only once.
C.
TIGHTLY COUPLED METHODS
In
the
example
implementation.
above,
One way
some
to avoid
modification
was
necessary
modifying the Prolog compiler
for
proper
(interpreter)
introduce the concept of a meta- language. See Figure 2 for a generic architecture.
27
is to
The
Pro log
collect database
reque s t
X
Meta- anguage
1
t
generate target
rans late:
query
I
DBMS
Figure
2.
Tightly Coupled
Mechanism
sentences in the knowledge language (Prolog) are expressed as terms
language, along with the
the simulation of the
constructs.
new meta-language
,
the meta-
predicates. Basically, this corresponds to
knowledge based language through
In Figure 2
in
the high level meta-language
the meta-language acts as an efficient translator to
oriented queries and optimizes the processing of parameterized views.
The
form
flexibility
of the meta-language representation arises from the fact that the computation
knowledge
of
defining the
meta- language predicates,
provides
the
added advantage of
and
based language(Prolog) can be modified by suitably
search rules
the
set-
such as
Select,
altering the control
28
Add, Member,
etc.
structure without effecting
This
any
changes
to
Prolog
itself,
by
only
but
meta-language
of the
redefinition
the
predicates [12].
Perhaps the most obvious advantage of close coupling to the expert system user
the complete transparency of the
RDBMS. To
'employee' which has two attributes
Madison, the question
is
'name' and
,
this point
illustrate
To
'salary'.
employee(Madison,
of
obtain the salary
salary),
regardless of the type of storage used by the relation employee.
might be kept as a relation by a
RDBMS
close coupling a very attractive
is
choice.
in the first instance without
The
fact that
employee
completely hidden. This transparency makes
In
particular, prototype
DBMS facility.
Later,
when
be
systems could
the
amount of data
DBMS
facility
could be added. This would not cause changes
programs on account of the
DBMS
addition.
exceed certain
to the
consider the relation
asked:
?-
implemented
is
limits,
the
In this strategy, the
In this case an
required.
on
Queries
line
are
ES
consults the
DBMS
at
various points during
operation.
its
communication channel between the ES and the
generated
and transmitted
to
the
DBMS
DBMS
is
dynamically, and
answers can be received and transformed into the intemal knowledge representation.
Thus, in tight coupling the
ES must know when and how
to consult
DBMS
the
and
must be able to understand the answers.
In a loosely coupled system,
the
ES
has a
window on
direcdy only those facts currentiy loaded on the window.
the facts and
When
DBMS
can access
the data actually held
in the
window have been
tightly
coupled system, the limitations due to the presence of the window are overcome,
processed, the
ES
29
asks the
for
new
data.
In the
and access
problem
to data
can be performed during the deductive process. The consistency
identified in loosely coupled
systems
is
avoided since accesses are performed
in real time relative to the status of the database.
A
ES
naive use of the communication channel would assume the redirection of
queries to the
difficulties.
The
DBMS. Any
such approach
the
first difficulty is
is
bound
number of database
to face at least
Since the
calls.
operates with one piece of information at a time(record), a large
database
may be
query language
DBMS
calls will
required for each
level,
ES
goal.
Assuming
rather than at an internal
result
two major
ES normally
number of
that the coupling is
DBMS
all
calls
made
to a
at the
such a large number of
level,
The number of
in unacceptable system performance.
calls at
the query language level could be reduced if these calls result in a collection or set of
The second
records.
difficulty is the
usually have limited coverage.
complexity of database
calls.
For instance, the majority of query languages do not
support recursion. For reasons of transportability and simplicity,
to include in the coupling
or
COBOL,
and
a language
mechanism
that
Database languages
it
may
not be desired
embedding programming language, say PL/1
the
would solve
the discrepancies in
power between
the
ES
DBMS representations and languages.
The
DBMS
is
basic scenario for tight coupling a Prolog based
as follows.
The user
consults the
ES
with
ES
with an existing relational
a problem
to
be solved or a
decision to be made; typically this can be expressed as any user friendly language
available in
ES
shells but Prolog predicates will
be assumed. Rather than evaluate
this
user request directly, in a tightly coupled framework the predicate would be massaged
into
a slightiy modified
form whose
evaluation can
30
be
delayed
while
various
transformations are performed upon
This process
it.
is
analogous to a *pre-pnx;essing'
stage.
Because of the
flexibility the
meta-language provides, the design of such a system
can take on any aspect. In a general way, such a system would typically be comprised of
a translator which would translate the knowledge based language(Prolog) either into a
DBMS
would
language(SQL) or one more adaptable towards optimizations. Optimizations
logically be
execution of
Also during
DBMS.
this
in
strategies.
this stage,
instantiated
to
eliminate the
the query could be evaluated to determine
by the database
particularly
-
Prolog. Finally,
in
important
main memory and which
when concerning how
to
which conjuncts
be passed on to the
coupling
as implied
is
not without
if
an intermediate language
some
setbacks.
When
made
at
accommodate recursion
to
another translator must be implemented
from the data manipulation language
Close
to
[13]
Decisions for storage of query results for future reference could be
point
relations
Removal of redundancies
this stage.
unnecessary operations could be implemented using techniques similar
view processing
would be
done during
for translation to and
is
used.
facts
are
mapped
into
by the definition of close coupling, operating system resources
may be
extensively utilized. Processes and pipes could be used to the detriment of the
system
by
slowing the overall response time of
consuming a considerable amount of
a very efficient access
its
the
local operating
system
by
limited resources. Gains in time, obtained by
mechanism, would be
almost
completely
lost
in
the
communication between processes.
Another
detriment of close coupling
31
is
the overhead
required for conversion
between a database source language and Prolog. As an example, consider a database
with a relation
that
stores
employees and
given a single employee, asks
who
relational calculus query, the
as described
above evaluate
managers. Consider a query which,
the highest boss over that employee. This is a
is
standard transitive closure example
their
and,
since
it
cannot
need for Prolog becomes obvious.
this
parse, optimize
and execute
ship
system
it
off to the database, which
and ship the single tuple answer back to the
it
Prolog component. Prolog would then check to see
if so,
How would a
query? The Prolog component would construct a
database query to retrieve a single employee tuple, ship
would
be answered by a simple
if
the
employee had a manager, and
construct a database query to retrieve a single employee tuple (the manager's),
it
off to the database,
etc.
The
point
is that
there
is
a tremendous
amount of
overhead for what ought to be a simple running of a chain of pointers on disk.
In contrast, loose coupling seems to avoid
To
start with, it requires
a
minimum
coupling could be implemented
DBMS
effort to
many of
implement. As suggested
by mapping requests
into equivalent expressions in
the pitfalls of close coupling.
QUEL.
earlier,
in Prolog for service
Unfortunately,
in
loose
by the
a loose coupling,
regardless of the data manipulation used, no obvious solution exists to the problem of
recursion and multiple queries.
replies to
their
The problem
respective queries.
is
rooted in the very large numbers of
Depending on the number of communication
channels available in the hardware, the problem could be
1.
Interpretive
In
the
deductive steps.
made
worse.
Methods
interpretive
approach,
The generic approach
one interleaves searches of the
to this
32
RDB
with
method can be described as having a
Prolog interpreter modified to operate on sets in which a sentence becomes an ordered
ordinary sentence that contains only variables and a
pair consisting of an
Thus the
substitutions.
set
of
assertions,
P(al,bl).
P(a2,b2).
can be represented as a single sentence:
(p(x,y), {[al,a2]/x,[bl,b2]/y}).
Procedurally speaking, given a goal statement, an atom
sent to the
RDBMS
to retrieve all instances that
on a stack whereupon Prolog defines
deduction, forms a
sends
it
back
exhausted
a.
new
to the
goal clause
RDBMS
if
it
match
as a statement
whatever
is
is
the atom.
if
selected
and a query
The answer
the stack is
empty
is
placed
or,
using
found on the stack applies and then
for further searches. This process is continued until
for
[3].
Set Oriented Approach
This approach can be considered the standard of the interpreted methods.
In this approach, the task of generating, including intersection and union of unification
sets are relegated to
The knowledge
the
DBMS
based machine
substitution at each
many
unifiers applicable to the
necessary only
has a
set
of
in
an optimized fashion.
substitutions
instead
node and the operations are performed on the
approach fuses together the
many
which can generate them
when
The implementation of
multiple
branches
at the
node which
same atomic formula.
procedures
arises
of a single
entire set. This
on account of
In this approach backtracking
is
are applicable to a single procedure call.
the set-oriented approach can be conceived in terms of tables at
33
each node and discussed previously. These tables represent the partial substitution
sets
applicable at that node.
Chakravarthy and Tran present a technique called table sharing which
reduces the
amount of stack space necessary
total
[3].
It is left
to the reader to inquire
further if interested.
D.
SUMMARY
Two modes
with a
RDBMS.
were distinguished
These are the compiled and the
approach, a theorem prover
is
to generate conjuncts all of
how
interpretive approaches. In the
which must be looked up
The
first
way
in a
RDBMS. Two ways
are
required a modification to a Prolog
goal atoms are added to a clause. In the second way, the operation
simulated by using meta-level statements directly within Prolog
The
compiled
used basically to expand an atom in a goal statement so as
described in which this can be done.
system in
which a logic programming language can couple
in
interpretive approach solves
Modifications can be
made
to a logic
is
itself.
problems by interleaving searches and deduction.
language to handle and maintain a
means of tables.
34
set
of answers by
IV.
A.
OVERVIEW
As
previously discussed in Chapter Three, two chief problems of a coupled system
are duplication of effort
and so called inefficiency stemming from the interface of the
expert system front-end and the
gain
EXTENDING AN RDBMS
if
the
RDBMS
included in a
is
never
RDBMS, why
RDBMS. The
left?
That
question follows then,
is
there
much
to
since optimization techniques are already
is,
not extend these systems with
new
operators to get
more
expressive power?
The
position taken in this chapter
knowledge bases of a
and timely
fact
is that
certain sort. Databases
management
databases are (or at least can be viewed as)
and knowledge bases
services but with different views.
try to
By
provide reliable
relating
what has
already been learned concerning traditional relational databases and logic programs and
by considering a database
extended
B.
as a very large but limited
knowledge base,
RDBMS becomes an attractive option presented to us.
DATABASE ADVANTAGES
Relational
database systems already have
programming and vice
integrity constraints
inference engine.
versa.
For example,
it
many
capabilities
The
that
mimic
logic
has already been shown in this thesis that
and views can be made to enhance and assume properties of an
The functions and
capabilities in an
RDBMS
attractive choice are listed:
a.
the use of an
DBMS handles accesses to large databases efficiendy.
35
that
make
this
system an
This has traditionally been the domain of the database system.
b.
Multiple users can share the database. Extra concurrency
control
if
c.
must be provided
in a
knowledge base front-end
multiple users are expected.
DBMS
supports multiple views in a well understood fashion.
Prolog supports multiple views of facts through
its
deduction
rules, but their interplay with consistency constraints is
not well understood.
d.
The
structures in a
DBMS provide efficient management and
programming of large databases. Such
structures are entities
and relationships and hierarchies.
e.
Recovery and security are basic
capabilities provided
and are
mentioned for completeness.
f Prolog provides
no concept of update or
transaction
consistency. Database transactions can include any
number
of actions over objects of the database and provide
transaction consistency.
C.
QUERY EVALUATION
Query evaluation stands
at the
center of most database power.
Database query
evaluation sacrifices flexibilty in exchange for increased performance.
solution is to increase
the
management
RDBMS
flexibilty
The proposed
without giving up any performance by allowing
facility inference capabilities.
Such a system would be a solution
to the
36
problem of incompleteness as allowed by
first
order logic and consistently seen in Prolog.
state that
one of two conditions
is
The problem stems from
true without saying
the ability to
which (using a disjunction
operator), or to state that something satisfies a certain condition without saying
thing
what
that
is.
Consider, for example, a database of employees, their salaries and department.
Queries for the number of employees making over
fifty
thousand dollars are particularly
well suited for traditional database applications but not for Prolog. This brings out an
important difference this approach
is
trying to capitalize on.
The Prolog approach
to
resolve queries leads to resolution by finding solutions through correct representations
and indexing. The problem of narrowing
Much
in
on the data
itself
has been ignored.
research in this area has been centered around the extensions of a database
query language such as
collections of
DBMS
QUEL
to
make
commands and
possible the expression of search algorithms as
to support the selection
of an algorithm based on
performance considerations.
D.
POSTGRES SUPPORT
Most
recently, the
work by Stonebraker and
the implementation of his Postgres
system stands out Postgres supports a collection of altemate algorithms and various
implementations of the same algorithm which reside in a library in the
DBMS. When
query
is
is
introduced, optimization proceeds and a shortest-path algorithm
Postgres though
theoretic view.
The
is
an
RDBMS
that
chosen
a
[14].
remains within the confines of the model
functions favored by knowledge engineers in a knowledge base, most
notably deduction, are
still
lacking.
It is
understandable that the proof theoretic view
never be implemented in a database because the proof theoretic approach
37
is
may
not
appropriate
to provide
for complete
To be
algorithms over large databases.
supplant the expert system but
data services and for implementing retrieval
is
fair,
this
available to be
type of system
employed by
is
designed not to
the expert system to
provide unified management of data in the knowlege base that resides both intensionally
and extensionally.
Postgres represents Stonebraker's implementation of a
managing and evaluating
rules
DBMS
using either forward or backward chaining.
Postgres' rules optimization procedures allow greater flexibility of the
this thesis
crosses the boundary between the
ES and
this in the
compiled
Expert
stage. This
be discussed further in Chapter Five.
Problem areas
more complex
the
DBMS, now rule
DBMS in either direction.
systems using Postgres can perhaps take advantage of
Also,
RDBMS. Whereas
has been concerned with optimizing predicate matching in a
management
will
that is capable of
to
still
remain
QUEL, making
master than
knowledge engineer who
accompanying baggage
only one
is
The
it,
interfacing query language
this
more comfortable
in Prolog or Lisp.
Despite
its
Moreover, the
mapping, optimization
system to be slow overall. Stonebraker acknowledges
the base relation to the
view
is
far,
provided without requiring
an extra semantic definition of what the inverse mapping should be [14].
is
even
system cannot be used to provide view update semantics. So
way mapping from
way mapping
is
as a stand alone system, unattractive to
that supports lock implementation, rule
and priorization may cause
that the rule update
in the system.
Work
for
two
in progress.
self-imposed restrictions that prevents
full
knowledge base
utilization,
Postgres represents a large stride towards the correct integration of expert systems and
38
DBMS'.
Its
portability
and anticipated widespread acceptance provide greater
and enhancements for a complete knowledge base.
39
flexibility
V.
TOWARDS A NEW APPROACH
OVERVIEW
A.
In the evolution of the integration of expert systems and the
that additional functions
machine
DBMS, it becomes clear
and new object types must inevitably be represented
to allow optimization
in either
of the management of knowledge while maintaining
highly deductive powers. In whatever form they take, these
new systems
should offer a
broad range of capabilities without major distruption to the end user. With these goals in
mind, these systems should have:
a.
Application Independence.
The systems of the
future should be general purpose with a
spectrum of applicability covering a large
set
of different
application domains.
b.
Expandability.
The management of the knowledge,
facts, rules
and
heuristics,
data and schemas are
c.
in
different forms as
its
must be managed as updatable
managed
in a
modem DBMS
today.
Alternative Strategies.
The system should have
strategies.
To
a choice of different search
help decide which strategy
what problem, accessible
sets
is
good for
of rules must be able to
control search strategy invocation.
40
d.
Multiuser Environment.
Similar to
many DBMS',
different purposes
by
the system should be used for
different users.
To achieve
this
concept, views must be imported from the database
with the caveat that knowledge views are
field,
much more
complex than data views. Multi-user support permits
to
work both
specialists
cooperatively, through the knowledge and
database, and independently.
e.
Efficiency.
This
is
which
perhaps the most important factor against
all
that the
systems are compared.
most
efficient
It
seems reasonable
to say
system to come out will be
the one accepted.
In light of the integration aspects already discussed, this chapter explores
further issues related to efficiency and presents
The
efficient interface
its
own
some
design for an efficient interface.
between a knowledge base and a conventional database
system strives to save the most accesses to secondary storage database. The systems
discussed in previous chapters use the paradigm of mapping each pattern matching
operation between a knowledge based predicate and the related facts to one query on the
relational database. Optimizing queries takes place prior to
storage, e.g., the
compiled approach.
Another approach
repeating the
any interface with secondary
is
to
save accesses to secondary storage database by never
same database query. This approach requires
41
storing in
main memory,
in a
compact and
efficient
way, information about the past interaction with the database
elements which have been retreived [15]. Simply speaking,
the query and corresponding rules residing in
relations
from secondary storage
that
this
system would analyze
main memory and load those simple
might fully or
partially fulfill the relations
making
up the query. For example, suppose the query was posed:
salary(cs,Y):- dept(cs,a,-,C), level(a,Y).
level(Q,A):-dept(Q,a,t,m).
And the database consisted of:
a.dept(cs,a,djc).
b.dept(cs,b,d,m).
c.dept(ae,a,d,l).
d.dept(cs,a,f,g).
e.dept(oa,a,d,l).
Then a and d would automatically be
fulfilling
B.
loaded. Efficiency
is
gained by partially or fully
query requirements without going through the overhead of analysis.
THE COMBINED APPROACH
On
the face of
it,
this
method seems reasonable. However, consider
this scenario.
An
optimizer would have to be employed in the 'pre-analysis' phase to ferret out duplicate
predicates and, although the real
secondary storage are
place in the
still
work of
taking place.
name of efficiency. Perhaps
It
analysis has not yet started, accesses to
appears then, that
if the
work of the
much
duplication
is
taking
pre-analysis phase were done
concurrently with the compilation work in a coupled design then real efficiency would be
gained.
42
conjunc
ts
Me
Prolog
ta
Diet ionary
t
uns at
i
sf ied
T
goal queries
sent
to
RDBMS
i
RDBMS
pre -anal ys
and opt imi za
Figure
In Figure three,
it
3.
in
is
really not
Chapter Three of
combined compiled/pre-analyzed system could look
set
up
in the
coupled approach
is
s
ion
Combined Approach
can be seen that there
compiled approach described
t
i
this
like.
much
difference between the
thesis
and what a possible
The meta-dictionary,
already
crucial in a prc-analysis system for optimization such as
keeping track of predicates that have been
Structures in the meta-dictionary
satisfied
by instances
would be necessary
shorthand correspondence referring to
EDB
secondary storage.
in
to assist in
its
performance.
A
predicate formulae should be maintained.
system could be as a stack that allows the ordered handling of
Implementation of
this
mutliple formulas.
Once
initiated
by the compiled mechanism
litde
upkeep
is
needed.
43
__,j
Other necessary structures include access-methods to indicate the database access-
method
for any particular formula
the database
and page-pointers for pointing
with respect to the access-method.
some
sequential scan or
dictionary.
The more
environment already
left at
built
Access-methods would be either
DBMS level to allow greater flexiblity by the meta-
DBMS
the
page of
Both the access-method device and page-
other technique.
pointers should be maintained at the
to the current
and for better
level the better overall because of the favorable
portability.
The meta-dictionary must
still
keep
track of data-base relations already retrieved.
C.
SUMMARY AND SCOPE
This thesis has sketched the concept of integration between a deductive system and a
relational database
shown
that the
effectively
a
management system
illustrating
each approach with examples.
It
was
spectrum of possible mechanisms to link these two components
continuum from,
implements both components,
at
to, at
one extreme, a single logic-based system
the other extreme,
is
that
two completely separate systems
with a strong channel of communication. Which system to employ ultimately
dependent on characteristics attractive to the designer and components already
at
is
hand.
Questions continued to be raised by this spectrum of possible mechanisms include:
in a coupled system,
what
is
between the two components?
the general architecture for the
How
into the query language of the
queries?
And
finally,
would
thesis into a meta-expert
expertise about
all
it
communication channel
can the expert system database calls be translated
DBMS? How
far
can one go in the optimization of
be worthwhile to integrate
all
methods discussed
in this
system that combines the expertise of the problem domain with
connected types.
44
LIST OF REFERENCES
[1]
Gallaire, H., Minker,
and Nicolas,
J.,
J.
M.,
"An Overview
and Introduction To
Logic and Data Bases," in Logic and Data Bases, ed. by H. Gallaire (Plenum
Press,
[2]
New York,
Brodie,
M. and
1978).
Jarke, M.,
"On Integrating Logic Programming and Databases,"
in
Proceedings First International Workshop for Expert Database Systems, ed. by L.
Kerschberg (The Benjamin/Cummings Publishing Company,
Menlo
Inc.,
Park,
1986).
[3]
Chakravarthy, U.
S.,
Minker,
and Tran, D., "Interfacing Predicate Logic
J.,
Languages and Relational Databases," Technical Report, TR-1019, University of
Maryland, College Park, Maryland, 1981.
[4]
Minker,
J.,
"An
Experimental Relational Data Base System Based on Logic," in
Logic and Data Bases, ed. by H. Gallaire (Plenum Press,
[5]
Sciore, E.
in
and Warren, D.
S.,
"Towards An
New York,
1978).
Integrated Database-Prolog System,"
Proceedings First International Workshop for Expert Database Systems, ed. by
L. Kerschberg (The
Benjamin/Cummings Publishing Company,
Inc.,
Menlo
Park,
1986).
and Lochovsky,
Data Models,
[6]
Tsichritzis, D.
[7]
Warren, D. H. D., "Efficient Processing of Interactive Relational Data Base
Queries Expressed in Logic,"
F., in
IEEE
(Prentice-Hall, Inc., 1982).
Seventh International Conference on Very
Large Data Bases 272-281 (1981).
[8]
Chakravarthy,
U.
S.,
Fishman,
D.
H.,
and Minker,
J.,
"Semantic
Query
Optimization in Expert Systems and Database Systems," in Proceedings First
International
Workshop for Expert Database Systems,
Benjamin/Cummings Publishing Company,
45
Inc.,
Menlo
ed.
by
L.
Kerschberg (The
Park, 1986).
[9]
Lloyd,
J.
W.,
"An
Introduction
Computer Journal Volume
[10]
[11]
Jarke,
(Plenum Press,
M.
and
[12]
Missikoff,
(May
2,
New York,
Vassiliou,
Management Systems,"
W. Reitman
No.
1983).
Closed World Data Bases," in Logic and Data Bases, ed. by H.
Reiter, R., **0n
Gallaiie
15,
To Deductive Database Systems," The Australian
1978).
"Coupling
Y.,
Expert
Systems
with
Database
in Artificial Intelligence Applications for Business, ed.
by
(Ablix Publishing Corporation, Norwood, 1983).
M. and Wiederhold,
G.,
"Towards
A Unified Approach For Expert And
Database Systems," in Proceedings First International Workshop for Expert
Database Systems,
Company,
[13]
Inc.,
Rosenthal, A.
Menlo
by L. Kerschberg (The Benjamin/Cummings Publishing
Park, 1986
).
and Reiner, D., "Querying Relational Views of Networks,"
Proceeding IEEE
[14]
ed.
COMPSAC (1982).
Stonebraker,ed., Michael and Rowe,ed.,
Memorandum No. UCB/ERL M86/85,
Lawrence A., "The Postgres Papers,"
University
of
California,
Berkeley,
Berkeley, June 1987.
[15]
Ceri, S., Gottiob, G.,
and Wiederhold, G., "Interfacing Relational Databases and
Prolog Efficientiy,"
in
Expert Database Systems, Proceedings of the First
International Conference on Expert Database Systems, ed. by L. Kerschberg
(Insitute
of Information Management, Technology and Policy, University of South
Carolina, Charleston, 1986), p. 141-153.
46
INITIAL DISTRIBUTION LIST
No, Copies
1.
Defense Technical Information Center
Cameron
Station
Alexandria, Virginia 22304-6145
2.
Library,
Code 0142
Naval Postgraduate School
Monterey, California 93943-5002
3.
Chief of Naval Operations
Director, Information Systems (OP-945)
Navy Department
Washington, D.C. 20350-2000
4.
Department Chairman, Code 52
Department of Computer Science
Naval Postgraduate School
Monterey, Cahfomia 93943-5000
5.
Curriculum Officer, Code 37
Computer Technology
Naval Postgraduate School
Monterey, California 93943-5000
6.
Professor C.
T WU, Code 52Wq
Computer Science Department
Naval Postgraduate School
Monterey, California 93943-5000
47
^j
3.
- 5-9 9
th^r^'^^
a
a
i^;:- itit^^^:^'' to
ing
syst ein. iSGd
1
Thesis
G57647
c.l
Govman
to
Towards a solution
the proper integration
ptogramming
o£ a logic
knowsystem and a large
management
ledge based
system.