Download Paraphrasing Using Given and New Information in a Question

Document related concepts

Georgian grammar wikipedia , lookup

Portuguese grammar wikipedia , lookup

Modern Hebrew grammar wikipedia , lookup

French grammar wikipedia , lookup

Kannada grammar wikipedia , lookup

Scottish Gaelic grammar wikipedia , lookup

English clause syntax wikipedia , lookup

Ancient Greek grammar wikipedia , lookup

Chinese grammar wikipedia , lookup

Serbo-Croatian grammar wikipedia , lookup

Icelandic grammar wikipedia , lookup

Spanish grammar wikipedia , lookup

Esperanto grammar wikipedia , lookup

Antisymmetry wikipedia , lookup

Polish grammar wikipedia , lookup

Yiddish grammar wikipedia , lookup

Latin syntax wikipedia , lookup

Lexical semantics wikipedia , lookup

English grammar wikipedia , lookup

Pipil grammar wikipedia , lookup

Transcript
University of Pennsylvania
ScholarlyCommons
Technical Reports (CIS)
Department of Computer & Information Science
August 1980
Paraphrasing Using Given and New Information in
a Question-Answer System
Kathleen R. McKeown
University of Pennsylvania
Follow this and additional works at: http://repository.upenn.edu/cis_reports
Recommended Citation
Kathleen R. McKeown, "Paraphrasing Using Given and New Information in a Question-Answer System", . August 1980.
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-80-13.
This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_reports/723
For more information, please contact [email protected].
Paraphrasing Using Given and New Information in a Question-Answer
System
Abstract
The design and implementation of a paraphrase component for a natural language question-answer system
(CO-OP) is presented. A major point made is the role of given and new information in formulating a
paraphrase that differs in a meaningful way from the user's question. A description is also given of the
transformational grammar used by the paraphraser to generate questions.
Comments
University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-80-13.
This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/723
MS-CIS-80-13
UNIVERSITY OF PENNSYLVANIA
THE MOORE SCHOOL
Paraphrasing Using
Given and New Information
in a Question-Answer System
Kathleen R. McKeown
This work was partially supported by an IBM fellowship,
NSF grant MCS 78-08401, and NSF grant MCS 79-19171.
Presented to the Faculty of the College of Engineering
and Applied
Science (Department
of Computer
and
Information Science) in partial fulfillment of the
requirements for the degree of Master of Science in
Engineering.
Philadelphia, Pennsylvania
August, 1979
ABSTRACT
Paraphrasing Using Given and New Information
in a Question-Answer System
Kathleen R. McKeown
Supervisor: Dr. Aravind K. Joshi
The design and implementation
of a paraphrase component
for a natural language question-answer system (CO-OP) is
presented.
A major point made is
the role of given and
new information in formulating a paraphrase that differs
in
a
meaningful
description
grammar used
is
way
also
from
the
given
of
by the paraphraser to
user's
the
question.
A
transformational
generate questions.
PAGE
-
-
TABLE OF CONTENTS
.
1.INTRODUCTION.
.
.
11. OVERVIEW OF THE CO-OP SYSTEM
111, THE CO-OP PARAPHRASER
.
IV. LINGUISTIC BACKGROUND.
V. FORMULATION
VI. IMPLEMENTATION OVERVIEW
..........
.
A. THE PHRASE STRUCTURE TREE
.
C.FLATTENINGo
E. CONJUNCTION AND DISJUNCTION
...
F. NUMERICAL MODIFICATION
. ..... . .
VII.GENERATION.
. . . . . . . . .
VIII. FUTURE RESEARCH . . . . .
. .
G. Translation.
,
. . . . . . . .
... ... . .. .
. . . . , . . . . .
IX. RELATED RESEARCH
X-CONCLUSIONS
XI. APPENDIX A
.
. . ..
.
. . . .
. . . .
.. . .
. . ..
. . . .
. . .
XIV. REFERENCES
8
10
14
16
17
21
24
29
35
37
39
44
48
51
53
58
XII.APPENDIXB.
XIII.APPENDIXC.
4
19
B . DIVIDING THE TREE.
D. TRANSFORYATIONS
2
.
.
60
ti3
ACKNOWLEDGEMENTS
by an IBM fellowship,
This work was partially supported
NSF grant MCS 78-08401, and NSF grant MCS 79-19171.
I would like to thank my advisor Dr.
Aravind K.
Joshi,
for the tremendous amount of time and advice he has given me
during the
making of
this thesis.
I would
also like
to
thank Jerry Kaplan for his
advice during the implementation
stages of
and Dr.
the paraphraser
invaluable comments on the style
Some
special
graduate
thanks
students
for
reading
Bonnie Webber
for her
and content of the thesis.
of the
and
Moore
School
commenting
on
deserve
various
versions of the thesis and discussing critical points.
are: Tom
Williams, David
Bourne, Joe
O'Rourke, and
They
David
Raab.
Finally, thanks goes to the members of the house at 4103
Chestnut St.
for their support
entire period of the project.
and friendship
during the
I. INTRODUCTION
a natural
In
language interface
system, a paraphraser can be used
has correctly understood
a database
to
query
to ensure that the system
the user.
Such a
paraphraser has
been developed as part of the CO-OP system (KAPLAN
79).
In
the user's question is
CO-OP, an internal representation of
passed to the paraphraser which then generates a new version
of the question
the
user has
before
for the user.
the
the system
question was
seeing the paraphrase,
of rephrasing
attempts
to answer
not interpreted
caught before a
initiated.
option
Upon
her/his
it.
Thus, if
correctly, the
possibly lengthy search of
question
error can
the
be
the database is
Furthermore, the user is assured that the answer
s/he receives is an answer to
the question asked and not to
a deviant interpretation of it.
The idea of using a paraphraser
new.
T o date,
in the above way is not
other systems have used
canned templates to
form paraphrases, filling in empty slots in the pattern with
information
from the
user's question
The CO-OP paraphraser differs from
that a
adopted.
systematic method to
In
generate the
the question.
(WALTZ 78; CODD 78).
these earlier systems in
generate paraphrases
CO-OP, a transformational
paraphrase from an internal
grammar is
has been
used to
representation of
Moreover, the CO-OP paraphraser
generates a
question whose form differs in a meaningful way from that of
the
original
between given
question.
It makes
and new information
use
of
a
distinction
to indicate to
the user
the existential presuppositions made in her/his question.
---
11. OVERVIEW O F THE CO-OP SYSTEM
The
CO-OP
system
is
database query systems.
aimed
at
infrequent
users
of
These casual users are likely t o be
unfamiliar with computer systems and unwilling t o invest the
t i m e needed t o learn a formal query language.
converse naturally
in English enables
Being able to
such persons
to tap
t h e information available in a database.
In order t o allow the question-answer process t o proceed
naturally,
CO-OP
principles" o f
using
these
follows
some
of
the
conversation (GRICE 75).
principles,
meaningful answers
the
'co-operative
In
system
particular, by
attempts
t o questions having
to
find
negative responses.
T h e motivation for the approach was based on the observation
that people expect a non-trivial response t o their questions
-
(i.e.
than a
more informative
correct direct
response is
simple 'no').
negative, an
When the
indirect response
c a n be more informative.
T h e CO-OP
responses
questioner
by
system w a s developed t o
addressing
may have
made in
to a
direct
response
'none",
CO-OP gives a more
correcting
example,
the
wrong,
incorrect
her/his
question
question.
would
(A) below
mistaken
the
the
When
the
"no'
or
be simply
assumptions.
is posed,
projects in oceanography
CO-OP g i v e s
assumptions
informative indirect response by
questioner's
if question
assuming that
any
provide cooperative
corrective
the speaker
exist.
If
For
is
s/he is
indirect response
rather than the less cooperative direct response onone".
(B)
(A) Which users work o n projects in oceanography?
(B) I don't know of any projects in oceanography,
The
false
assumptions
that
existential presuppositions
(A) above,
in question
presupposition
that
CO-OP
corrects
of the question.
the speaker
there are
the
For example,
makes the
projects
are
in
existential
oceanography.
Since these presuppositions can be computed from the sutface
of
structure
the
question,
a
large
knowledge for inferencing purposes is
a
lexicon and
contain
database
schema
of
semantic
not needed,
In fact,
are the
domain-specific information,
the CO-OP system
store
only
it.ems which
Although this
is a portable one, it also
means
means that the
system does a minimum of semantic analysis.
The modules
morphological
in the
CO-OP system
analyzer,
phase translator,
the paraphraser,
and a
The
flow
an
control structure
presuppositional analysis when questions
responses.
include a
of control
is
parser, a
intermediate
which does
result in negative
initiated
with
morphological analyzer
which processes
the input
and passes
to the parser,
In this
the result
the
the
question
stage, the
morphological analyzer strips plural endings, determines the
root form for verbs, etc.
The parser uses an Augmented Transition Network (ATN by
(WOODS 73)) t o
parse the question
structure of the question in
is at
this point that the
and outputs
a syntactic
Meta Query Language ( M Q L ) .
paraphraser is invoked
It
with the
MQL
structure
as
input.
The
paraphraser
rephrases
the
question in English and the result is presented to the user.
If
the
user
perceives
that
the
system
incorrectly
interpreted her/his question, s/he can rephrase the question
and try again before the database is searched for an answer.
Once
the
user
interpretation,
translated
the
system
satisfied
MQL
Q,
into
interrogating the
CO-OP
is
version
the formal
database.
was
supplied
Atmospheric Research (NCAR).
compatible with
with
of
query
the
the
system's
question
language
used
is
for
(The database for
testinq the
by
Center
the
National
for
It is in CODASYL format and is
the commercially
available database
system SEED (GERRITSEN 78) which was
used.)
query
If the database
search results in a negative response, the control structure
does a presuppositional analysis, checking
to see if any of
the
database
presuppostions
results
were
an answer
in
to
false,
the
If
the
question, then
the
search
report
formatter is called to comprehensibly present the results to
the user,
A s input t o the paraphraser,
the
information available
question.
The
the YQL structure contains
for its
MQL representation is
use
in rephrasing
composed of
sets and
arcs and encodes the surface structure of the question.
sets
denote entities
in
the
database while
binary relations between those entities.
of the arcs
and sets are drawn from words
As such, sentences in the
arcs
the
The
denote
The lexical labels
in the question.
system are treated extensionally,
each word in the question pointing to its actual counterpart
in the database.
Figure 1
shows the MQL interpretation for
the sample question (A) above,
Attached
to the
sets
and
arcs are
properties
which
provide additional syntactic information about the question.
For
example, each
set
or arc
indfcates the word's syntactic
available as
has
a
category.
properties includes
property C9T
which
Other information
the number
of a
noun or
verb, the topic of the question, the main verb, the tense of
a verb, etc,
For a full description of MQL
see the system
documentation on the language and the macros which access it
in Appendix A.
ocean-
Figure 1
-
111. THE CO-OP PARAPHRASER
CO-OP'S
paraphraser
provides
the
error-checking for the casual user.
with
the system,
s/he
results printed, in
formal
database
can ask
only
will
to
be
have the
intermediate
output and the
shown.
The
naive
however, is unlikely to understand these results.
this reason that the paraphraser
of
If the user is familiar
which case the parser's
query
means
user
It is for
was designed to respond in
English.
The use of English to paraphrase queries creates several
problems.
The first is that
A
ambiguous.
natural language is inherently
paraphrase
must
clarify
the
system's
interpretation of possible ambiguous phrases in the question
without introducing additional ambiguity.
One particular type of ambiquity that a paraphraser must
address
is caused
by the
linear nature
of sentences.
A
modifying relative clause, for example, frequently cannot be
placed directly after the noun
phrase it modifies.
cases,
sentence
the
semantics
correct choice
the
sentence
question
of modified
may
(C) below
plausible.
of the
noun phrase,
be genuinely
has t w o
The speaker
may
In such
indicate
the
but occasionally,
ambiguous.
For
interpretations, both
could be referring to
example,
equally
books datinq
from the '60s or to computers dating from the '60s.
(C) Which students
the '6Os?
read hooks on computers
dating from
A second problem in paraphrasing
possibility
of
generating
originally asked.
If a
generate English
from an
question
the
English queries is the
exact
grammar were
question
that
was
developed t o
simply
underlying representation
of the
this possibility
could be
method must be devised which
realized.
Instead,
a
can determine how the phrasinq
should differ from the original.
The
CO-OP paraphraser
addresses
ambiguity and the rephrasing of
both
the problem
the question.
of
It makes the
system's interpretation of the question explicit by breaking
down
the
clauses
dependent
upon
of
their
question (C) above
the
question
function
in
and
reordering
the
sentence.
will result in either
them
Thus,
paraphrase (D) or
(E), reflecting the interpretation the system has chosen.
(D) Assuming that there are books on computers (those
computers date from the '60s), which students read
those books?
(E) Assuming that there are books on computers (those
books date from the '60s), which students read those
books?
The method
adopted quarantees that the
differ from the
clauses
or
prepositional
formulated on the
new
original except in cases
information
phrases
were
basis of a distinction
and
indicates
to
presuppositions
s/he
"assuming that"
clause), while focussing
has
made in
the
paraphrase will
where no relative
used.
It
was
between given and
the
question
user
(in
the
the
her/his attention
on the attributes of the class s/he is interested in.
IV. LINGUISTIC BACKGROUND
As mentioned earlier,
the lexicon and the
the sole sources
of world knowledge for
design increases
CO-OP'S
database are
CO-OP.
While this
portability, it means
that little
semantic information is available for the paraphraser's
Contextual
information is
history or context
parser
question.
limited
since no
is maintained for a user
current version.
the
also
The input
is basically
Using
syntactic
this information,
receives from
parse tree
the
running
session in the
the paraphraser
a
use.
of
paraphraser
the
must
reconstruct the question to ol,tain a phrasing different from
oriqinal.
the
The
following
question
must therefore
be
addressed:
What reasons are there for choosin? one syntactic form
of expression over another?
Some linguists maintain
functional
roles
Terminology use?
elements
that word order is
play
t o describe
within
the types
affected by
the
sentence.*
of roles
that can
............................................................
*
Some
other
influences
discussed
in
stylistic
reasons, in
discussed
(MORGAN
here,
on
and G R E E N 73).
addition to
determine
constructions are to be used.
that the
avoid
passive tense is often
identification of
flavor t o the text.
syntactic
agent
when
They
some
expression
are
suggest
that
of the
different
functions
syntactic
They point out, for example,
used in academic
and
to lend
a
prose to
scientific
occur varies widely.
Some of the distinctons that have been
described include qiven/new, topic/comment, theme/rheme, and
presupposition/focus.
are
not consistent
Definitions
(for
of these
terms however,
see (PRINCE 79)
example,
for
a
discussion of various usages of "given/newm).
Nevertheless, one influence on expression does appear to
be the
interaction of sentence
content and the
the speaker concerning the knowledge
elements in
of the listener.
the sentence function in
which the speaker assumes is
of the listener (CHAFE 76).
present in the "consciousness"
This
information is said to be
the preceding discourse or because it
knowledge
question-answer
information
database.
of
the
Some
conveyinq information
contextually dependent, either by virtue
world
beliefs of
is part of the shared
dialog
system, shared
of its presence in
In
a
knowledge refers
to
participants.
world
which the
speaker assumes
Information
functioning
is
in
present in
the
role
the
just
described has been termed "given'.
.Newn labels
presented
as
all information in
not
retrievable
declarative, elements
that the
the sentence
from
functioning in
listener is presumed not
represent
listener
wants
to
know
information
is
not
(i.e.which
already
to know are
additional functions
in the
the
In
the
called new.
in conveyinq what the
what
aware
context.
asserting information
In the question, elements functioning
speaker
which is
s/he
speaker
of.
question.
doesn't
presumes
Firbas
Of
know)
the
identifies
these, (ii)
is
used here to augment the
interpretation of new i-nformation.
He says:
.(i)
it indicates the want of knowledge on the part of
the inquirer and appeals to the informant to
satisfy this want.
(ii) [a] it imparts knowledge to the informant in that
it informs him what the inquirer is interested in
(what is on his mind) and [b] from what particular
angle the intimated want of knowledge is to be
satisfied."
(FIRBAS 74; p.31)
Although
word
order
vis-a-vis
distinctions has been discussed in
sentence, less has
analyzed the
arrive at
of the
Despite
different conclusions**,
lines of reasoning.
among the
the
the finite
few to
fact that
the two
they
follow similar
Krizkova argues that 50th
wh-question and
related
interrogative form.
and Krizkova* are
question.
and
light of the declarative
been said about the
Halliday (HALLIDAY 67)
have
these
the wh-item
-
verb (e.g.
"do"
or
"be") of the yes/no question point to the new information to
be disclosed
in the response.
These elements
she claims,
---------------
*
Summary
by ( F I R B A S 74) of
Interrogative Sentence
the untranslated
and Some
Functional Sentence Perspective
Problems of
article "The
the So-called
(Contextual Organization of
the Sentence), Nasa rec 4, 1968.
** It should be noted that Halliday and Krizkova discuss the
unknowns in
rheme of a
the question in order
question.
to define the
Appendix C contains
theme and
a description of
this concept and the analyses made by Halliday and Krizkova.
are
the only
unknowns
to
the questioner.
discussing the yes/no question, also
verb is
the only unknown.
The
Halliday,
in
argues that the finite
polarity of the text, is in
question and the finite element indicates this.
In this paper the interpretation of the unknown elements
in
the question
followed.
The
as
defined by
wh-items, in defining the
of knowledge, act
as new information,
of
new
information
and Halliday
questioner's
Firbas'
is
lack
analysis of
used to further elucidate the
the functions in questions is
role
Krizkova
in
elements are given information.
questions.
The
remaining
They represent information
assumed by the questioner to be true of the database domain,
This labeling of information within
the
construction
ambiquity.
of
a
natural
the question will allow
parsphrase,
avoidinq
V. FORMULATION
Following
paraphraser
the
analysis
breaks
information.
More
down
described
questions
specifically,
divided into three parts, of which
above,
into
an
the
given
input
CO-OP
and
new
question
is
(2) and (3) form the new
information.
(1) given information
(2) Function ii[a] from Firbas above
(3) Function ii[b] from Firbas a5ove
In
terms of
question
the
with
question
no subclauses
knowledge for the hearer.
indirect
components, (2)
modifiers
of
hearer.
it
defines the
the
lack
of
Part (3) comprises the direct and
the
indicate the angle from which
define the
as
comprises
attributes of
interragative
as
they
the question was asked.
They
the missing
words
information for
the
Part (1) is formed from the remaininq clauses.
As an example, consider question (F):
(F) Which division of the computinq facility
projects usinq oceanography research?
works on
Following the outline above, part (2) of the paraphrase will
be the question
minus subclauses: .Which division
projects?",
Part
words, will
be .of the
.which
division".
(3), the
modifiers of
works on
the interrogative
computing facilitym which modifies
The remaininq
clause
.projects
oceanoqraphy researchm is considered given information.
usinq
The
three parts can then he assembled into a natural sequence:
(G) Assuming that there are projects using oceanography
research, which division works on those projects?
Look for a division of the computing facility.
(I?),
In question
information belonging
three categories occurred in the
types
of
information
presented
minus the
part (2) of
than
one
question
clause
will
the
question
further
a
particular
splintered.
containinq
several
Clauses
characterized by category (3) will
sentences following
Only
category,
the
~ d d i t i o n a l given
the *assuming
t5at
the paraphrase
clauses
no clauses defining specific
information.
be
If more
occur.
Example (H) below illustrates
question
missing
in
will
concluding clauses.
parenthesized following
infarmation and
the
initial or
occurs
be
...' clause.
a
is missing,
of the
If one of these
question.
the paraphrase will invariably
information is
for
to each
of
given
attributes of
containing
information
be presented as separate
the stripped-down question.
(I) below
demonstrates a paraphrase containing more than one clause of
this type of information.
(H) Q: Which users work on projects in oceanography that
are sponsored by N4SA?
P: Assuming that there are projects in oceanography
(those projects are sponsored by NASA), which
users work on those projects?
(I) Q: Which programmers in superdivision 5000
ASD group are advised by Thomas Wirth?
from the
P: Which programmers are advised by Thomas Wirth?
Look for programmers in superdivision 5000. The
programmers must be from the ASD group.
VI. IMPLEMENTATION OVERVIEW
The paraphraser's
tree structure
tree is
from the
The design
the paraphraser
trees reflecting
given and new information
in the question.
tree allows
the tree.
The
for a
the parser's
In the
transformations
the
During this
grammar
are
of rules
processing in
translation phase,
representation are
words.
of
simple set
final stage of
is translation.
their corresponding
string.
The
three separate
of the
which flatten
is given.
representation it
then divided into
the division of
labels in
first step in processing is to build a
translated into
process, necessary
performed
upon
the
A. THE PHRASE STRUCTURE TREE
In its
initial processing,
the parser's
the paraphraser
representation into one that is more convenient
for generation purposes.
The resultant
that highlights certain syntactic
This
initial
independence
processing
from
representation
the
gives
changed or
the
The paraphraser's
the
component
the question
as the
root node
subject
of the
verb is
the root
right subtree.
relations in
a
In the
object.
The
of the
of Meta
paraphraser's
Query
new
node
of the
left
node of the
the use
Language)
does
other constructions
the main
The
of binary
(KAPLAN 79)
creates
preposition has a
tree
a
tree.
representation (see
every verb or
representation of
tree uses
current system,
the parser's
parser's
moved to
(if there is one) the root
description
illusion that
the
some
phase need be modified.
verb of
subtree, the object
paraphraser
Were
phrase structure
main
structure is a tree
features of the question.
CO-OP system.
system, only the initial processing
for
transforms
the
subject and
allow
should the
for
the
incoming
language use them.
Each of
the subtrees
represents other
question.
Both the subject and the
will have
a subtree for
in.
If a noun in one
another clause in
As
an example,
clauses in
the
object of the main verb
each other clause
it participates
of these clauses also participates in
the sentence, it will
consider
the
users advised by Thomas Wirth work
have subtrees too.
question: .Which
active
on projects in area 3?".
The phrase structure
in Figure 2.
tree used in the
Since 'work'
root node of the tree.
.projectsw
of the
other clause and
adjective
structure.
'active'
is the
'usersw
right.
paraphraser is shown
main verb, it will be the
is root of the left subtree,
Each
noun
participates in
therefore has one subtree.
does
Instead, it
not
appear as
is closely
part
bound to
Note that the
of the
the noun
modifies and is treated as a property of the noun.
users
projects
in
area
object
advised by
Thomas Wirth
object
Figure 2
one
tree
it
B. DIVIDING THE TREE
The constructed
tree is computationally suited
three-part paraphrase.
The
tree is flattened after
been divided into subtrees
accomplished by
portion of
the tree
least, this
right subtree root
stripped
down
The splitting of the tree
first extracting
the topmost
containing the
will include
it has
containing given information and
the two types of new information,
is
for the
wh-item.
the root node
nodes.
question.
smallest
At
the very
plus the
left and
This portion of the
tree is the
The
define
clauses
which
the
particular aspect from which the question is asked are found
by searching the left and right
The subtree whose root node is the wh-item
questioned noun,
contains these
clauses.
left or
right subtree or
these.
The
information.
subtrees for the wh-item or
remainder
Figure 3
previous example,
Note that
this may be
may only be
of
the
the entire
a subtree of
tree
illustrates this
one of
represents
division for
given
the
Q: Which active users
advised by Thomas Wirth
work on
projects in area 3?
P: Assuming that
there are projects
active users work on those
advised by Thomas Wirth.
Figure 3
in area
3, which
projects? Look for users
C. FLATTENING
If
the structure
of the
phrase structure
tree is
as
left subtree and B the right,
shown in Figure 4, with A the
then the following rules define the flattening process:
TREE-> A R B
SUBTREE -> R' A' B'
In other words,
each of the subtrees will
be linearized by
doing a pre-order traversal of that subtree.
subtree has three pieces of
As a node in a
information associated with it,
one more rule is required t o expand a node.
A node consists
of:
(1) arc-label
(2) set-label
(3) subject/object
where arc-label is the label of the verb or preposition used
in the parse tree and set-label
Subject/object indicates
the label of a noun phrase.
whether the
functions as subject or object in
sub-node noun
the clause; it is used by
the subject-aux
transformation and
expansion rule.
The following rule expands a node:
NODE
Two
They
are
does not
apply to
the
-> ARC-LABEL SET-LABEL
transformations
process.
phrase
They
are
applied
are wh-fronting
further
described
during
the
and subject-aux
in
the
flattening
inversion.
section
on
transformations.
Figure 4
The tree of given information is flattened first.
part of
the left or right
tree and
therefore is flattened
It is during
...
" are
of given information.
the clause.
by a
the flattening stage that
that there (be)
are
subtree of the
If
inserted
the
phrase structure
pre-order traversal.
the words "Assuming
inserted to introduce the clause
will agree with
there is more than
around
representing the
'Bea
the subject of
one clause, parentheses
additional
ones.
stripped down question is
The
modifiers.
form is
inserted before
tree
flattened next.
It is followed by the modifiers of the questioned noun.
phrase 'Look
It is
the first
The
clause of
D. TRANSFORMATIONS
The
grammar
used
transformational one.
rules
in
the
paraphraser
In addition t o the
described'above,
the
is
a
basic flatteninq
followinq transformations
are
used:
wh-f ronting
negation
do-support
subject-aux inversion
affix-hopping
contraction
has deletion
T h e curved lines indicate
t h e ordering restrictions.
are two connected groups of transformations.
There
If wh-fronting
applies, then s o will do-support, subject-aux inversion, and
affix-hopping.
The
second
group
invoked through
the application
of
transformations
o f negation.
do-support, contraction, and affix-hopping.
not
affected
by
The
described by
absence
or
A description of
transformations.
follows.
the
rules
used
here
is
It includes
Has-deletion is
presence
of
other
the transformation rules
are
(AKMAJIAN and HENY 75) and
based
on
analyses
analyses described
by (CULLICOVER 76).
T h e rule for wh-fronting is
SD
abbreviates structural
specified a s follows, where
description
change:
SD:
-
-
X
NP
Y
1
2
3
SC: 2+1 0
3
condition: 2 dominates wh
and SC,
structural
The first
step in
search of
the tree for
approach
is
generation,
the implementation
used
The
of wh-fronting
the wh-item.
for
A
paraphrasing
slightly different
than
clauses must
Since the clauses
for
the
be fronted
paraphrasing
that
it
along with
the
will
never
NP to
paraphrase
mode
wh-fronting is tested
is
be
be
the
also
case
when
dominates
For this reason,
applicability
the
for and is applied
question,
noun.
are broken down
fronted
used,
process of the stripped down
the original
the head
relative clauses or prepositional phrases.
when
for
When generating,
of the original question
paraphrase,
used
be the head noun of some
relative clauses or prepositional phrases.
these
is
difference occurs because in
question, the NP to be fronted may
is a
of
in the flattening
If it applies, only
one word need be moved to the initial position.
When
generation is
being
done,
the applicability
of
wh-fronting is tested for immediately before flattening.
If
the transformation applies, the tree
of which
the wh-item
from the
remainder of the tree
position to the
is the
is split,
root is
The subtree
flattened separately
and is attached
string resulting from flattening
in fronted
the other
part.
After
invoked.
wh-fronting
In CO-OP,
has
been
the underlying
question does not contain modals
fronting the
applied,
wh-item necessitates
do-support
representation of
or auxiliary verbs.
supplying an
The following rule is used for do-support:
is
the
Thus,
auxiliary,
-
- -4
-
NP
NP
tense
V
1
2
3
SC: 1
do+2
3
condition: 1 dominates w h
SD:
Subject-aux
afterwards.
inversion
Again,
if
4
activated
wh-fronting
inversion will apply also.
-
is
X
immediately
applied,
subject-aux
T h e rule 'is:
-
-
AUX
X
NP
NP
1
2
3
4
0
4
SC: 1
3+2
condition: 1 dominates wh
SD:
Affix-hopping
follows
subject-aux inversion.
paraphraser it is a combination
o f a s affix-hopping and
auxiliary is
'hopped'
from the verb t o the auxiliary.
X
SC:
1
1
generated, the
- AUX - Y 3
3
2
2+4
well
as
the
head-noun is
and
inversion apply t o
question.
In
the
not be
however, may be applicable.
used.
number are
Formally:
-
2
6
6
propose that wh-fronting
the relative
CO-OP
properly positioned by the
wh-fronting need
representation.
tense and
tense-num-V
4
5
0
5
S o m e transformational analyses
and subject-aux
Tense and number
in the parser's
When a n
SD:
the
o f what is commonly thought
number-agreement.
are attributes of all verbs
In
clause a s
paraphraser,
the
flattening process
Subject-aux
inversion
In c a s e s where the head noun of
t h e clause is not its subject, subject-aux inversion results
in t h e proper order.
T h e rule for
negation is tested during
t h e translation
phase of execution.
It has been formalized as:
-
-
-
Y
tense-V
NP
X
3
4
1
2
3
4
SC: 1
2+no
condition: 3 marked as negative
SD:
In the
CO-OP representation, an
carried
on
the
(KAPLAN 7 9 ) ) .
the
part
it is
a
possible
as modification
below),
of
of
the
Therefore,
verb
when
(see
the
negation is
relation
(see
English representation of
in
the
In all cases however,
of
binary
When generating an
question,
negation
object
indication of
some cases
noun (see
to
express
question
(J)
negation can be indicated as
version
object is
(K)
of
marked
paraphraser moves the negation to
as
question
(J)).
negative,
the
become part of the verbal
element.
(J) Which students have no advisors?
(K) Which students don't have advisors?
In
English, the
auxiliary of
case
for
negative
the verbal element
questions,
Do-support
is used.
an
The rule
presented this
way for
used
X
SC:
1
1
-
tense-V-no
2
do+2
must
the
was the
generated.
after
used after wh-fronting.
They
clarity, but
-
be
to
for do-support
combined into one rule.
SD:
is attached
and therefore, as
auxiliary
negation differs from the one
are
marker
Y
3
3
could have
been
Affix-hoppinq,
number, and
as
described
negation from the
The cycle of transformations
negation is
above,
hops
verb to the
the
tense,
auxiliary verb.
invoked through application of
completed with the
contraction transformation.
The statement of the contraction transformation is:
SD:
SC:
X
1
1
- do+tense
2
-no
3
-Y
#2+ngt# 0
where # indicates that the result
for further transformations.
4
4
must be treated as a unit
E. CONJUNCTION AND DISJUNCTION
The
use of
affects
both
three-part
conjunction
the
design and
paraphrase.
appears as part
and
When
disjunction in
the
questions
implementation
conjunction
or
of the new information in
of
disjunction
the question, no
changes need be made in the
design of the paraphrase.
conjunction
appears as
or disjunction
information however,
to
the already
method
of referring
to
given
plurals,
abovea to
"eacha to
of eacha is
items.
of the
given
will refer
standard
information which
contains
adopted.
In the CO-OP
used to refer
to conjoined
conjoined singulars,
disjoined entities.
When
Some
conjunction (or disjunction) must be
paraphraser, "some
part
the stripped-down question
mentioned conjoined
the
and
For example,
"any of
(M) below
the
is
used to paraphrase question (L):
(L) Which
users
work
on
projects
advised
Clayton-Paulsen and projects sponsored by NASA?
by
(M) Assuming that
there are
projects advised
by
Clayton-Paulsen and assuming that there are projects
sponsored by NASA, which'users work on some of each?
Note that since any number of items could potentially be
conjoined, expressions
like
.botha, had
plurals do
be
avoided.
not necessarily imply
are indicated.
the speaker
projects
to
which implicitly
For
Furthermore,
that all of
example, (L.) above does
is interested only in
advised
limit the
by
number,
conjoined
the entities
not imply that
users who worked
Clayton-Paulsen
and
all
on all
projects
sponsored by
NASA.
Again, a
referring term that
does not
carry this connotation must be used.
The features of
the system affected by
conjunction and disjunction are
paraphraser's
phrase
process.
MQL,
In
of its objects.
in the
structure tree,
its
currently
since
occurs
flattening
conjunction
is
verb or one
does not
in relative
in the
clauses
each of
the paraphraser (see question
provide for
not occur
into two,
(0) below).
additional modification of
in YQL.
of
question is split
paraphrase
it does
the
When conjunction occurs around the wh-items
question, the
and
and
it occurs around the main
which is passed separately to
(N)
the MQL representation, the
the representation
implicit except when
the addition of
this
The
paraphraser
type of
conjunction
input.
is treated
Conjunction
by
the head noun and
the parser
that
as
is so encoded
Question (P) below would be represented in the same
way as if question (Q) had been asked.
(N) Which users and advisors are sponsored by NASA?
( 0 ) Which users
are sponsored
are sponsored by N A S A ?
by N A S A ?
(P) Which users work on projects
sponsored by N A S A ?
(Q) Which users
in oceanoqraphy
work on projects in
are sponsored by N A S A ?
As
was
the
conjunction
case
in the
for conjoined
user's
and
oceanography that
wh-items,
question
Which advisors
this
type
of
is invisible
to
the
is visible
to
the
paraphraser.
Conjunction
around the
main
30
verb
paraphraser since more
than one verb is marked
verb in the MQL representation.
the main verb is
Although conjunction around
treated in the same way as
- as
conjunction (i.e.
a s the main
modification
other types of
of the
subject), the
additional modifying arc happens to be the main verb in this
case.
Conjunction
around the objects
of the main
verb is
also visible to the paraphraser.
Conjunction around objects
of verbs or prepositions results
in duplication of the verh
or preposition in the
M Q L representation.
occurs around the object of the
When conjunction
main verb, the main verh is
duplicated.
Disjunction is always explicitly represented.
type of
nouns
arc called
a disjunct
are disjoined
nouns in the
in the
arc is
question.
A special
used when
Disjunction
of.
around
in MQL of
question results in the duplication
the relation the noun is part
verbs or
Figure 5 depicts the MQL
representation of a question usinq disjunction.
The
paraphraser's
phrase
structure
tree
currently
handles any type of disjunction and the types of conjunction
which
are
visible
replicating
node
in
syntactic
unit.
replication is also
each label
This
conjunction
Each node
The qroup
is
or
achieved
by
disjunction
in the tree functions as
of
labels
treated as a syntactic
in the group can
thus allowing
YQL.
labels when
occurs in the question.
a
the
resulting
unit.
have its own set
for the representation
from
However,
of subtrees,
of questions
(R) below (its representation is shown in Figure 6).
such as
(R) Which users work on projects
reports sponsored by NASA?
in oceanography
T o distinguish between conjunction
and disjunction at a
node, the
list of
corresponding
labels is
nested with
to conjunction
is a
list of
level and
list.
disjoined items.
ends when an
item is a
any
combination of
than
the M Q L
any
not a
does not appear at any
occur in the list.
nested
for a greater range
currently
is
conjunction
In
and
In fact, the paraphraser's
disjunction can be represented.
representation allows
occur to
lexical label and
If conjunction or disjunction
manner,
A node
Each item in the list
Nesting can
particular level, only one item will
this
alternate levels
and disjunction.
labeled by a list of conjoined items.
and
provides for,
of possibilities
as
it has
limited
nesting capabilities.
Flatteninq rules for expansion of
nodes also need to be
modified to accomodate conjunction and disjunction.
node is
now a
list of
labels instead
ordering rules for expansion of a
An item
in a list
conjoined or
and its
until a
its subtrees are expanded by
VI. C above.
-
The rule
single node must be used.
neighbors in the
it is applied to
the bottom level is
reached.
list.
The
label and
the rules presented in Section
for expansion of
when the node has already
the
In
such
and then
successive levels
A lexical
does not apply
question.
single label,
subtrees are expanded
disjoined to its
procedure is recursive;
of a
Since a
cases,
a list
of labels
been mentioned in
the appropriate
referring
phrase is used.
....
(1) (item-1 item-2
item-n
...
item-1 and item-2 an3
itemon)->
....
.,..
(2) If item-n = (item-nl item-n2
item-n->item-nl or item-n2 or
Else item-n is a lexical label
item-nm)
item-nm
(3) if item-nm is not a lexical label, repeat
c3
users
u
Which users work in division 3 or 5 ?
Figure 5
users
((projects)
in
oceanography
object
Figure 6
(reports))
sponsored by
NASA
object
F. NYMERICAL MODIFICATION
T h e CO-OP
system incorporates a very
o f quantification, that
modification
question
interpreted a s
are represented
79)
for
in fact, looks more
o f nouns,
(like
Any
modifiers of the
in M Q L a s
on
questions in parsing).
(3 5) means
or
moren,
interpretation
Each property
etc,),
precede.
the
are
They
(see (KAPLAN
of
quantified
is a numerical range;
first indicating the lower bound,
"uw is used
in the universe.
"from 3 to S " ,
(0
(1 u)
(indicating negation),
quantifiers in
nouns they
indicating the upper,
all, or everything
like numerical
properties of sets
the
two numbers are used, the
the second
explicit
"every", "3
mall",
details
limited treatment
0)
,
'from
For
'from
1
to indicate
example, the pair
0 to
O n or "nonew
to alln,
or "somen,
etc.
The
paraphraser treats
same way
it treats
from numerical
stage of
numerical
adjectives,
of the
Properties
representation to
As
the
are translated
English during
processing, translation,
from unique label to lexical
modification in
a set
the third
is translated
entry, any numerical modifiers
set are introduced into
the string.
Table
1 below
shows how the numerical pairs are translated into words,
When numerically
the given
information in a
strategy is
presented
modified expressions occur as
used,
as
part
representing given
In such
of
question, a
slightly different
cases, the modification
the
information.
part of
existential
is not
presupposition
When the noun
appears for
the
second
Paraphrases
time,
(T)
the
and
(V)
numerical
of
modifier
(S) and
(U)
is
used.
respectively,
demonstrate this for two cases:
(S) Which students work on 5 or more projects advised by
NASA?
(T) Assuming that there are
which students work on 5
projects advised by NASA,
or more of those projects?
(U) Which users work on every project advised by NASA?
(V) Assuming that there is at least one project advised
by NASA, which user works on every such project?
----------+--------------+-----------------------Number
1
1
Plural
I
I
all
I
I
Singular
----------+--------------+-----------------------1
I
----------+--------------+-------------------------( U U)
I
every
1
I
----------+--------------+-----------------------(1~-1) I
no all
no every
I
I
no
----------+--------------------------------------( 0 0)
I
(n m)
I
n to
----------+---------------------------------------
I
I
I
I
I
m
(n n)
I
1
n
(n>l U )
I
I
n or more
exactly
----------+---------------------------------------
----------+--------------------------------------Table 1
I
I
I
I
1
I
I
I
I
I
1
I
G. Translation
I n the translation phase, the final cosmetic changes are
made to produce the paraphrase.
a string of
a
for
The input to this module is
word in
the
original
question.
includes the words which were
...
","Look
of the string
for
The
string
also
introduced for the three-part
-
process (e.g.
paraphrase during the flattening
that
a unique label
Lisp Gensyms, each Gensym being
...",etc.).
The syntactic structure
is essentially that of
in some cases, transformatons are
"Assurninq
the final paraphrase;
performed upon the strinq
during this stage.
T h e major
bulk of
work done during
translation of labels into
nouns
and
adjectives,
translation
as a property of the Gensym.
contains
the
root
syntactic information.
the
verb
has
paraphraser
a
the
is
a
simple
For
look-up
for these items is stored
The lexical property of a Verb
form
for
the
verb
The tense, the number,
regular
calls
is the
their English counterparts.
procedure since the lexical entry
Gensym
this stage
conjugation
morphological
are
and
some
and whether
stored.
routines
with
The
this
information to conjugate the verb.
The nouns that were used
first.
A list
of
in the question are translated
set-labels
maintained as part of the MQL.
for each
set-label in
the list
string is replaced by the
representing the
is
The input strinq is searched
and the
set-label in
appropriate word.
any adjectives that modify the
nouns
the
At this point,
noun are introduced into the
question.
As
mentioned earlier,
phase, adjectives are
not appear a s
order
are
splitting the
translating,
bound
each
post-modification.
to
nouns and do
This
is done in
adjective modification
given and new information.
closely
tree-building
stored as properties of
part of the tree structure.
t o avoid
clauses of
during the
the
Instead, adjectives
nouns
set-label
is
they
modify.
checked
Quantification
into
on
a
for
When
pre-
set
is
or
also
translated into words (see Section VI F for details),
The relations
in a
question are
nouns have been translated.
translated after
the
The relations include verbs and
prepositions.
A search of the input string is made for each
arc-label and
its occurrence in
the string is
the preposition
or conjugated verb.
translated into
a verb, the
transformation
is
verb (see
When an
arc-label is
applicability of
the negation
tested.
transformation to apply, 'no"
Section VI.
the
D).
In
order
This is
on
translated and
indicates negation.
verb's
that the verbs are only translated
Following
paraphrase is complete,
for
the
negation
must appear directly after the
quantification
translated.
replaced by
only possible
object
It
has
is for
if the
already
been
this reason
once the nouns have be'en
translation
of
the
verbs,
the
VII
.
GENERATION
The paraphrase component has been given a dual function.
It
can
generate
an
English
version
of
the
parser's
representation as well as paraphrase in the three-part form.
This function
uses the same
procedures and grammar
as the
three-part paraphraser, but the tree is not split into three
separate
trees
before
beinq
flattened.
function could be used to produce
were, there is
The
generation
the paraphrase, but if it
no guarantee that the
question would differ
from the user's.
In
CO-OP, generation
is
used
to produce
A corrective response
suggestions and corrective responses.
is used to
an
correct the user's
false
existential presupposition
incorrect, the
presuppositions.
encoded in
When
the question
M Q L representing
portion of
alternative
is
the particular
presupposition is passed to
the paraphraser which generates
the
For example,
corrective
corrective
response.
response
that
could
be
below
(X)
generated
is
by
a
the
paraphraser if (W) were asked:
(W) Which programmers in division 3
oceanography?
(X)
I don't
Alternative
system when
work on projects in
know of any projects in oceanography.
suggestions
are
the direct response
also used
to the user's
negative.
If an incorrect presupposition
question,
the
resulting
by
question may
the
CO-OP
question is
is removed from a
no
longer
have
a
negative response.
wider class
such cases, the system
question to
Alternative
above
as a
the user
suggestions
corrective response
(W),(X)
In
are
only
has been made.
might
be
suggests the
possible interest.
after'
presented
Thus, a
followed
sequence like
the
by
a
alternative
suggest ion (Y) :
(Y) But you might be interested in programmers
division 3 that work on any projects.
For corrective
portion of
the MQL
responses, the paraphraser
representation of
in
receives the
the user's
question
which encodes the incorrect presupposition.
For alternative
suggestions, the paraphraser also receives
a portion of the
original MQL.
the user's
In this case,
it is the portion representing
question minus the incorrect presupposition.
For
both types of response, the paraphraser generates a question
M Q L representation
from the
it is
given.
The
wh-item is
then stripped from the front of the question and a phrase is
attached
to
the
'I
statement.
corrective
depending
front,
don't
responses,
on
the
"But
noun
which
presupposition
above,
Slight
in
was
the
"anyw modifies
the
...
of any
question
used,
Table
...
The adjective 'any'
restricted
by
original question,
"projectsm
which,
are
2
particular wh-items and
in
to
' is used
modifications
you might be interested
alternative suggestions.
the
know
wh-item
correspondence between
used,
converting
a
for
made,
shows
the
the phrases
is
used for
is used before
the
Thus,
in the
incorrect
in
(Y)
original
question, was restricted by .in
T h e flattening process for
used
for
subtrees
paraphrasing.
representing
therefore, the tree
is the
flattened
by pre-order
however, is
The
given
same;
traversal.
generation differs from that
tree
is
and
new
not
divided
into
information
and
is flattened as a whole.
traversal
inorder
oceanography".
the left
and
traversals, the
The
rule
not identical.
for
It
into the question in order that
T h e order of
right subtrees
total
tree by
expansion of
introduces the
are
an
sub-nodes
word "that"
each subtree be expanded as
a relative clause, and not as a separate sentence.
The rule
is:
SUB-NODE->that ARC-LABEL SET-LABEL
The transformational grammar also
applies to the generation
process, with
being the point
the one difference
the applicability of wh-fronting is
these
process
changes and
is
the
the flattening
same
as
the
at which
tested for.
Other than
process, the
generation
paraphrase
process.
The
generation function is general enough that it m a y eventually
be used for other types of responses in cases when something
other than a direct response is needed.
1--------+-------------------------+-------------------------
I
1
I
1WH-ITEM 1
1
I
1
I
I
1
1
1
and
1
1 -------- -------------------------+-------------------------
I
lwhich
I
1
EXAMPLE
I
I (question
1
I
I
I
1
1
PHRASE
I don't
know of any
(corrective response)
I
I
1
I
I
IWhich
I
users
work
1
on!
1
Iprojects?
1
1
( I don't
know
of
I
lusers
that work
I
Iprojects.
-------- .-------------------------.+----------------------I
I don't
who
know of anyone
any!
I
on1
1
I
1
I
(Who works on
projectsl
I
( i n oceanography?
f
I
1
I
I
1
i
I1 don't know of anyone!
1
1
lthat works on projects1
I
l in oceanography.
I
1
--------+-------------------------+------------------------Table 2
I
1
I
..
--------+------------------------I
WH-ITEM I
PHRASE
I
I
I
1
what
I
I
I
I
I
I
1
.
I
I
I
I
I
I
I
EXAMPLE
I
1
corrective response)l
.--------------------- I
I
I 1 don't know
I
I
I
of anything What is
I
I
INASA?
sponsored by1
I
I
I
1
(I
I
I
don't
know
of
1
(anything
I
I
I
1 sponsored
1
I
\--------+-------------------------+--------------------I
I
I
lwhere
I1 don't know of anyplacelwhere do
I
I
I
I
1
I
I
(question and
--------+------------------------I
.---------------------
I
I
that
is
by NASA.
users
have
laccounts?
I
I
1
1
I
1
11
don't
I
l anyplace
I
I
know
that
of
users
lhave accounts.
I
I
I--------+-------------------------+----------------------1
I
1
(How many11 don't
I
I
know of any
\How many users
I
have1
1
laccounts?
I
1
I
I
1
1
I
,
I
I1 don't
I
lusers
I
know of
any1
1
that
have I
I
I
laccounts.
I--------+-------------------------+-----------------------
I
I
I
Table 2 (continued)
I
1
FUTURE RESEARCH
VIII.
The
CO-OP
semantics.
to
the
paraphraser
is
lacking
in
the
area
of
Although a good deal of attention has been given
form
of
the question
and
reasons
for
using
a
different form than was used in the original, the words used
in the
paraphrase are the same
words that occurred
user's question (with the exception
in the
paraphrase to introduce
development
of
a
particular words
method
of words that are added
given or
The next step in paraphrasing in
in the
new information).
the database system is the
to determine
in the paraphrase
why
and
should differ
how
the
from the
user's.
A
logical schema
developed (see (MAYS
of the
is currently
79)) which will provide
information for the system.
of the
database
implementaion of
The
about the
information could
be used
some semantic
schema will be independent
any particular
contain knowledge
being
database and
will
the data.
This
structure of
by the
paraphraser in
choosing
words that reflect the structure and content of the database
t o replace words used in the original question.
The current version
disambiguation.
Nouns in
occur in
the lexicon
category.
The word's
any
semantic
of the system does
are assigned
example, in question
question which
to a
do not
specific database
syntactic function in the question and
constraints
determine its category
the user's
do some lexical
placed
upon
it
(see (KAPLAN 79) for
(2) below, the name
are
used
details).
to
For
Thomas Wirth does
not appear
in t h e
lexicon.
(AA)
indicating t h e category that h a s
is output before
is output
t o the
been assigned.
t h e paraphrase and is done
user,
T h e phrase
by t h e control
structure.
( 2 ) Which
programmers advised
division 3?
by T h o m a s
W i r t h are
in
(AA) I a m assuming that T h o m a s Yirth is an advisor name.
Another
change
which
would
provide
the
user
with
additional semantic information would be t o generate part of
t h e paraphrase from
the formal database query.
T h i s would
provide t w o specific t y p e s of information for the user.
The
first of these h a s t o d o w i t h t h e verbs used in t h e question
and t h e relations
not
occur a s
in the database.
a relation
in t h e
For
form a
example,
'checks
whose
'newn
"checks
relation
correspondinq t o
night
the
In
the verb.
correspond
than t h e account
s o m e fictitious banking database.
helpful for t h e user t o s e e
used in
from t h e database may be
that bounced"
amount is less
verb w h i c h does
database is
question, a composite o f relations
used t o
When a
to
balancen in
such cases, it may be
h o w the verb w a s interpreted in
t e r m s o f t h e database concepts.
A second
t y p e of
information that
from t h e formal database query
sets
restricted by
are
retrieved
In some cases, t h e order in
from
the
modifying c l a u s e s makes
t i m e t a k e n t o find t h e answer.
generated
concerns the method by which
t h e database is t o be searched.
which
could be
database
and
a difference
then
in t h e
Although such information is
not
necesgary
for
interpretation
the user
of
her/his
to
understand
question,
the
some
system's
users
may
be
interested in the efficiency of the database search.
Additions to the
area
paraphraser could also be
of inferencing.
indicate
the
The
system's
paraphraser
made in the
could be
interpretation
of
used
the
to
user's
intentions if the system were to address the question of why
a particular
would
question was asked.
contain more
content
of
the
This type
information than
user's
question.
of paraphrase
just the
substantial
Some
the
of
user's
intentions or motives may be deduced on the basis of her/his
question alone,
For example, when
a question is asked that
requires a list for an answer, the questioner may not really
be interested in all items on the list.
Recognition of such
motives
a question
is useful
unmanageably large.
the user about
when the
answer to
The paraphraser could
be used
underlying motives which would
becomes
to ask
restrict the
list.
If a
of the user
model were maintained
durinq her/his
session with the system, more information would be available
to aid the paraphraser in determining the user's
It
is
often the
questions
on
case
that
one topic,
a
S/he
questions t o get the information
question,
In
such cases,
may
asks a
have to
series
ask
of
several
needed to ask a particular
the paraphraser
deduce what the user is aiming at
the paraphrase.
person
intentions.
may be
able to
and include it as part of
Moreover, a question asked
at the end of a
series of questions may take on a slightly different meaning
if viewed in
in isolation.
provide the
light of previous questions
A running
history of
paraphraser with
rather than taken
a user
the necessary
generate these nuances of meaning.
session would
information to
IX. RELATED RESEARCH
At the
aware of
present time, two
exist for
other paraphrasers that
database question-answer
was devel.oped by David Waltz
et.
al.
PLANES
Ted
Codd et.
system, the
Rendezvous Version
generates
using
other by
the paraphrase
templates.
actions.
The
English
query.
An
from
the
process
words
abbreviations or code
(WALTZ 78)
(CODD 7 8 ) .
1 System
systems.
al.
The
are
One
for the
for
the
PLANES system
formal database
involves
I am
three
query
specific
substituted
for
any
names which appeared in
the database
appropriate paraphrase template is
selected for
use and the slots in the template are then filled with words
and phrases from
per se.
the query.
It involves the
suitable for
The process
is not generation
formation of templates
the particular database
which are
and for the
types of
questions which can be asked.
The Rendezvous
is
slightly more
three parts
used.
A
System also uses templates,
sophisticated
to generation
the system
EVERY
and two
header template which
query is chosen first.
most
than
frequently.
...,
The
where the
header
types of
There
type of
three types of queries in
COUNT), of which
for FIND
dots must be
are
templates are
corresponds to the
There are
(FIND, EXIST, and
WALTZ'S.
althouqh it
is
FIND occurs
PRINT THE
filled in,
...
The second
part of the paraphrase is the target list and it occurs only
in PRINT type queries.
called the body.
The third
It is formed by
part of the paraphrase is
extracting patterns from
tables
that are
associated with
particular
items in
the
generation component
are
database.
The
goals of
important
easy
ones.
to
the Rendezvous
The
generated
understand,
(CODD 7 8 ) .
achieve
discriminating,
Instead of
these
English must
goals
developing
and
a
however, the
unambiguous,
not
general solution
research
seems
concentrated on
particular examples which don't
criteria.
results in
This
misleading
part from
the use
to
to
be
meet these
of patterns
which are essentially fragments of English t o be inserted in
the sentence.
for a
constructed beforehand
The patterns must be
particular database and great
choose phrases
that can be
variety
of other
looking
at
phrases.
particular
care must be
taken to
easily patched together
Such
a solution
examples, instead
with a
necessitates
of
the
general
framework.
Goldman (GOLDMAN 75)
although it
system,
is not part
MARGIE,
network
paraphrase
mode.
Unlike
the
and
In
English
operates in
either
paraphrase mode,
paraphraser,
system.
a
from
knows of expressing a
CO-OP
a paraphraser,
of a question-answer
generates
dependency
possible ways it
has also developed
The
conceptual
inference
MARGIE outputs
or
all
particular concept,
MARGIE
is
a
semantic
paraphraser; it uses different idioms and phrases to express
the same idea.
Other work
Slocum
has been done
in generation by
(SIMMONS and SLOCUM 72), Heidorn
Simmons and
(HEIDORN 7 5 ) ,
and
McDonald (MCDONALD 78).
Simmmons and
a system t o generate English
a n ATN
(WOODS 73).
structure grammar
system
can
be
used
pronouns versus
proper nouns.
Heidorn
uses
an augmented
with a n interpreter
McDonald h a s examined
of
from semantic networks using a
T h e formalism they use is similar
transformational grammar.
to
Slocum have developed
for
both
for t h e
generation
and
t h e more s ~ e c i f i cproblem
naming
through the
phrase
rules.
His
analysis.
of t h e use
use - o f nouns
and
H e has developed a system for generation that
incorporates t h e constraints he has observed.
X.
CONCLUSIONS
The paraphraser
this work
a syntactic
described here is
has examined the
one.
reasons for different
While
forms of
expression, additions must be made in the area of semantics.
The
substitution
portions or all
of
synonyms,
phrases,
of the question requires
the effect of context on word
base
and
contextual
used here,
extended
wider
a
for
meaning and of the intentions
syntactic approach
once
idioms
an examination of
of the speaker on word or phrase choice.
semantic
or
The lack of a rich
information
but the
ranqe
of
has
been
dictated
paraphraser can
information
the
be
becomes
available.
The
paraphraser
CO-OP
domain-independent
and
requires no changes in
use
the template
thus
a
designed
change
the paranhraser.
form however,
do
of
to
the
be
database
Paraphrasers which
require such
changes.
This is because the templates, or patterns, which constitute
the
type of
dependent
on
question that
the
domain.
can be
For
asked, are
different
necessarily
databases,
a
different set of templates must be used.
The CO-OP
that
paraphraser also
it qenerates
grammar of
the
questions.
differs from
question
It
using a
addresses two
other systems
in
transformational
specific problems
involved in qenerating paraphrases:
1. ambiguity in
determining
relative clause modifies
which
noun
phrases
a
2. the production
user's
of a question
These goals have been achieved
that differs
from the
for questions using relative
clauses through the application of a theory of given and new
information to the generation process.
XI.
APPENDIX A
MACROS and MQL Representation
MQL representation:
(
sets
(
. . . . . ) . relations ....
))
a list, the
MQL is
that identify the
lists,
Each
CAR
is a list
of which
sets in a graph.
list in the
of Gensyms
The CADR is a
list represents a
list of
relation.
Its
format is as follows:
The
CAR
of
the
list
is
a
identifies a relation in the graph.
list
of Gensyms
involved in.
which identify
The first item in
Gensym
The
which
CADR
the sets
of the list is
the relation
the list will be
order set (or the subject) of the relation.
uniquely
is
the hiqh
The second item
in the list will be the low order set (or the object) of the
relation.
If the arc is a
disjunct arc, there will be more
than one low order set.
All other
the
Gensyms.
information is located
Each
node
in a
qraph
on property
has
the
lists of
following
properties associated with it:
CAT
The lexical category of the node. Will be either
N (noun), P N (proper noun), ADJ (adjective) ,
I
WH (wh-word).
QUANT
The quantification on
the node.
This will
(u
(e.g.
be a list
u) for
universal quantification).
NUM
The number of the
node,
LEX
Will be either PLUR or SING.
The lexical word associated
- USER).
with the Gensym (e.9.
TOPIC
This will be true if the
node is the topic of the question.
Each arc has the properties associated with it:
LEX
All lexical information associated
with an
arc will
form is
a list of lists.
for each
be on
low order
this property
list.
There will be
set associated
Its
one list
with the
arc.
Each list will have the following form:
<vrb> will be the lexical
arc.
NIL.
(e.g.
WORK).
<PREP>
will
verb associated with the
If there is none,
be
the
associated with the arc (e.g.
none,
it will
be NIL.
lexical
-
1
Note that
it will he
preposition
If there is
both of
these
slots can be non-NIL or one of them can be NIL, but
they both
can not
next item
is a
be NIL in
list of
the same
tense and
list.
The
number of
the
verb.
can
If <vrb> is nil, these will be also.
be
either
PAST,
PRES,
PRESP
participle), or PASTP (past participle).
be SING or PLURo
is conjugated
will be
T,
<reg>
(present
<num> can
indicates whether the verb
regularly or
If
not,
not,<reg> will
If
be a
it is
be T
if this
arc is
the main
<reg>
list of
proper conjugations taken from the lexicon.
will
<tnse>
<Main>
verb of
question,
MACROS
ARC:HIORDSET
Arguments:
1. The arc for which the high order
set is needed.
2. The list of relations and their
associated sets.
(the CADR of the M Q L graph),
Returns:
The Gensym identifying the high order
set or subject of the arc.
ARC:LOWORDSET
Arguments
1. the arc for which the low order set
is needed.
2, the list of relations and their
associated sets,
Returns:
the first low order set of the arc,
ARC :LEX
Arguments:
1. the Gensym identifying the arc to
be translated.
Returns:
A list containing the lexical translation
the
the
of the arc. T h i s will be either a preposition
a verb, o r a verb and preposition. T h e
verb will be properly conjugated.
ARC :VRB
Arguments:
1. O n e list of the property list of
a n arc. i.e.
T h e lexical information corresponding t o one arc,
whether it be part of a disjunct
a r c or a simple arc.
Returns:
T h e unconjuqated verb from that list.
-
ARC: P R E P
Arguments:
1. O n e lexical list of a n arc.
Returns:
A list of the preposition or NIL if there
is no prepositon.
ARC: TNSE
Arquments:
1. o n e lexical list of a n arc
Returns:
t h e tense of the verb from that list.
ARC :REG
Arqurnents:
1. o n e lexical list of a n arc.
Returns:
T if the verb is regularly conjugated.
Otherwise, it returns the list of
conjugations for the irregular
verb.
ARC :C A T
Arguments:
1. T h e Gensym identifying the arc.
Returns:
T h e lexical category of an arc. Note
that this will be either PREP or VERB.
NODE :LEX
Arguments:
1. T h e Gensym identifying the node.
Returns:
T h e lexical translation of a node.
NODE: QUANT
Arguments:
1. The Gensym identifying the node.
Returns:
the quantification on a node. This
will be in the form of a list (e,g.
(u u) for universal quantification).
-
NODE :CAT
Arguments:
1. The Gensym identifying the node.
Returns:
T h e category of a node.
This will be
either N, PN, ADJ, or WHO
NODE: NUM
Arguments:
1. The Gensym identifying the node.
Returns:
T h e number of the node, This will
be either PLUR or SING.
N0DE:TOPIC
Arguments:
1. The set of nodes in the graph.
Returns:
The topic of the set of nodes.
ARCS :
Arguments:
1. The Y Q L qraph
Returns:
The set of Gensyms identifying
the arcs in the qraph.
NODES:
Arguments:
1. The M Q L graph
Returns:
T h e set of Gensyms identifyinq
the nodes in the graph. This is
the car of the MQL.
-
XII. APPENDIX B
Sample Questions and their Paraphrases
Q: Which programmers from the ASD group are in superdivision
SOOO?
P: Which programmers
are in
superdivision #5000?
Look for
programmers from ASD group.
Q: Who advises projects in area 36?
P: Assuming that there are projects in area 135, who advises
those projects?
Q: Which
advisors
have
accounts
for
projects
in
oceanography?
P: Assuminq
that there
are
accounts
projects are in oceanography.),
for projects
(those
which advisors have those
accounts?
Q: Which users that work on
projects that NASA sponsors are
in division 3?
P: Which users are in division #3?
sponsors.
Q: Which
look for users that N A S A
The users must work on projects.
users work
on
projects in
area
3
that are
in
division 200?
P: Assuming that there are projects
work
on
those projects?
Look
in area 4 3 , which users
for
users that
are
in
division #200.
Q: What are the groups?
P: Display the groups.
Q: Which users work on every project in area 5 5 3
P: Assuming that there is at least
one project in area t 5 5 ,
which users work on every such project?
Q: Who sponsors more than 3 projects?
P: Who sponsors 4 or more projects?
Q: Which programmers work in division 3 or 4?
P: Which programmers work in division # 3 or division #4?
Q: Which users work on all projects in oceanography?
P: Assuming that
there are projects in
users work on all of those projects?
oceanography, which
-
XIII. APPENDIX C
A Note on Theme and Rheme
The concept
relation to
of theme
and rheme
its affect on the
more detail than have other
Prague School
has been
discussed in
word order of a
distinctions.
postulate that the
sentence in
Linguists of the
sentence is
divided into
elements providing common ground for the conversants (theme)
and elements which function in
be imparted (rheme).
is
governed by
conveying the information to
The definition of elements as thematic
two constraints.
In sentences
containinq
-
elements which are contextually dependent e
.
information
context),
known
or
contextually dependent
In sentences
defined
as
determined
elements always
lacking known or
those
Since
hand, is
(CD), a
degree of
the
new
C D than those that
lowest
don't,
theme.
theme is
of
Rheme, on
high deqree
information
the
degree
vague concept.
characterized by a
elements conveying
function as
given information,
elements having
communicative dynamism*
the other
from
conveying
of CD.
carry a
higher
rheme is close
to the
concept of new information.
The Prague
School contends that
tendancy for theme to appear as
for rheme
in English there
subject of the sentence and
to appear towards the
end of the
sentence.
p
*
of
(FIRBAS 74)
an
element
defines the deqree of
as
"the
is a
extent
p
p
p
p
The
-
communicative dynamism
to
which
contributes towards the development of the
the
element
reverse order is possible, though less likely,
According to
these observations, if context indicates that the transitive
subject of the
the
sentence conveys the new
theme occurs
as
object, then
passivized to re-establish
the
information, while
sentence would
the order of theme
be
first, rheme
last.
An
analysis of
question
has
theme and rheme and its
been
made by
both
Although they
agree about the
(see
IV),
Section
function
as theme
they
function in the
Halliday
and
unknowns for
disagree
and which
funcion
about
Krizkova.
the questioner
which
elements
as rheme.
Halliday
defines the theme of a question as a demand for information,
The wh-item of a question is interpreted as theme because of
its position
in the question
and because it
indicates the
speaker's want of knowledge and desire to fill it,
He says:
"In a non-polar interrogative for example, the wh-item
is the theme by virtue of its being the point of
departure for the message; it is precisely what is
being talked about,"
(HALLIDAY 67; p, 212)
Krizkova criticizes this
the interrogative
Krizkova,
question.
Rrizkova's
words of the
the rhematic
The
analysis, instead interpreting
elements are
remaining
definition of
question as
elements
the
rhematic.
For
unknowns in
the
function
theme and rheme is
as
theme.
similar to the
given/new distinction since she only considers what is known
or unknown in the question.
Halliday, o n the other hand, is closer to the concept of
topic/comment articulation
rheme.
Describing
in his
definition of
theme and
the difference
between theme
and given
information, he defines. theme as that which
talking
about now,
speaker
was talking
ascribes the
the sentence.
rheme.
as
opposed to
about.
term theme to
given,
the speaker is
that which
Furthermore, Halliday
the element occurring
He labels the
remainder of the
the
always
first in
sentence as
For him, whether elements function as theme or rheme
is determined
by the
difference in
interpretation of
order of
the sentence.
theme and
rheme that accounts for
Krizkova's
analysis.
the meaning
It is
of the
the conflict in
this
terms
his and
XIV. REFERENCES
Akmajian, A,
and Heny, F., An
1,
(AKMAJIAN and HENY 75).
Introduction to the Principles of Transformational Syntax,
MIT Press, 1975.
-
-
2.
(CHAFE 76).
Chafe, W, L,, OGivenness, Contrastiveness,
Definiteness, Subjects, Topics, and Points of View", Subject
and Topic (ed, C. N. Li), Academic Press, 1976.
3.
An
(CODD 78).
Codd, E.
F.,
et al., Rendezvous Version 1:
Ex erimental English-language Query Formulation
G rs *
of Relational
Report ~~2144(29407), IBM ~ e s e a r ~ a b o r a t b rSan
~ , Jose,
Ca., 1978,
-
4.
(CULLICOVER 76).
Press, N. Y., 1976.
Cullicover,
P. W.,
Syntax,
Academic
5.
(DANES 74). Danes, F,
(ed. ) , Pa ers on ~ u n c t i o n a l
Sentence Perspective, Academia, Prague, -&T6.
(FIRBAS 66).
Firbas, Jan, "On Defining the Theme
Functional Sentence Analysis",
Travaux Linquistiques
Prague 1, Univ. of Alabama Press, 1966,
in
de
-
.
7
(FIRBAS 74).
Firbas,Jan,
"Some
Aspects of
the
Czechoslovak Approach to Problems of Functional Sentence
Perspective", papers on Functional Sentence Perspective,
Academia, Prague, 1974.
-
8,
(GERRITSEY 75).
Gerritsen, R., SEED Reference Manual,
Version COO-B04 draft, ~ n t e r n a t i o n a l a t a Base Systems,
Inc., Philadelphia, Pa., 19104, 1978,
(GOLDMAN 75).
Goldman, N.,
"Conceptual Generation",
9.
Conceptual
Information
Processing (R.
C.
Schank) ,
North-Holland Publishing Co., Amsterdam, 1975,
10.
(GRICE 75). Grice, H. P., .Logic and conversation", in
3, (P. Cole and
Syntax and Semantics: Speech Acts, Vol.
J. L. Morgan, Ed.), Academic Press, N. Y., 1975.
-
11.
(HALLIDAY 67).
Halliday,
M.A.K.,
Transitivity and Theme in ~ n g l i s h " , Journal
3, 1957.
"Notes
on
of Linguistics
12.
(HIEDORN 75). Heidorn, G., "Augmented Phrase Structure
Grammar", TINLAP-1 Proceedings, June 1975.
(JOSH1 79).
Joshi, A. K., "Centered Logic: the Role of
13.
Entity Centered Sentence Representation in Natural Language
Inferencing", to appear in IJCAI Proceedinqs 79.
14,
(KAPLAN 79).
Kaplan, S. J., "Cooperative Responses
from a Portable Natural Language Data Base Query Systemu,
Ph,D. Dissertation, Univ. of Pennsylvania, Philadelphia,
Pa., 1979.
15. (MAYS 79). Mays, E., forthcominq Masters ~ h e s i s ,Dept.
of Computer and Information Science, U.
of ~ennsylvania,
Philadelphia, Pa., 1979,
16. (MCDONALD 78). McDonald, D. D., "Subsequent Reference:
Syntactic and Rhetorical Constraintsn, TINLAP-2 Proceedinqs,
1978.
17. (MORGAN and GREEN 77).
Morgan,J.L.
and Green, G.M.:
"Pragmatics and
Reading Comprehension",
University of
Illinois, 1977.
18,
(PRINCE 79).
Prince,
E,,
"On
Distinctionw, to appear in CLS 15, 1979,
the
Given/New
19.
(Sgall, Hajicova, and
Benesova 73)Sqal1,P. ,
and Generative
Hajicova,-E., and ~enesova,E,, Topic, Focus Semantics, Scriptor Verlag Gmbh, Kronberg Taunus, 1973.
20. (SIMMONS and SLOCUM 72). Simmons, R.
and Slocurn, J.,
"Generating English Discourse from Semantic Networksn, Univ.
of Texas at Austin, CACM, Vol. 5, #lo, October 1972.
21. (WALTZ 78). Waltz, D,L., 'An English Language Question
Answering System for a Large Relational Database", CACM,
Vol. 21 #7, July 1978.
-
"An Experimental Parsing
22.
(WOODS 73). woods, W. A.,
System for Transition Network Grammars", in Natural Language
Processign, Courant Computer Science
Symposium #8, R.
Rustin (Ed,), Algorithmics Press, Inc., N.Y., 1973.