* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Paraphrasing Using Given and New Information in a Question
Georgian grammar wikipedia , lookup
Portuguese grammar wikipedia , lookup
Modern Hebrew grammar wikipedia , lookup
French grammar wikipedia , lookup
Kannada grammar wikipedia , lookup
Scottish Gaelic grammar wikipedia , lookup
English clause syntax wikipedia , lookup
Ancient Greek grammar wikipedia , lookup
Chinese grammar wikipedia , lookup
Serbo-Croatian grammar wikipedia , lookup
Icelandic grammar wikipedia , lookup
Spanish grammar wikipedia , lookup
Esperanto grammar wikipedia , lookup
Antisymmetry wikipedia , lookup
Polish grammar wikipedia , lookup
Yiddish grammar wikipedia , lookup
Latin syntax wikipedia , lookup
Lexical semantics wikipedia , lookup
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science August 1980 Paraphrasing Using Given and New Information in a Question-Answer System Kathleen R. McKeown University of Pennsylvania Follow this and additional works at: http://repository.upenn.edu/cis_reports Recommended Citation Kathleen R. McKeown, "Paraphrasing Using Given and New Information in a Question-Answer System", . August 1980. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-80-13. This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_reports/723 For more information, please contact [email protected]. Paraphrasing Using Given and New Information in a Question-Answer System Abstract The design and implementation of a paraphrase component for a natural language question-answer system (CO-OP) is presented. A major point made is the role of given and new information in formulating a paraphrase that differs in a meaningful way from the user's question. A description is also given of the transformational grammar used by the paraphraser to generate questions. Comments University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-80-13. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/723 MS-CIS-80-13 UNIVERSITY OF PENNSYLVANIA THE MOORE SCHOOL Paraphrasing Using Given and New Information in a Question-Answer System Kathleen R. McKeown This work was partially supported by an IBM fellowship, NSF grant MCS 78-08401, and NSF grant MCS 79-19171. Presented to the Faculty of the College of Engineering and Applied Science (Department of Computer and Information Science) in partial fulfillment of the requirements for the degree of Master of Science in Engineering. Philadelphia, Pennsylvania August, 1979 ABSTRACT Paraphrasing Using Given and New Information in a Question-Answer System Kathleen R. McKeown Supervisor: Dr. Aravind K. Joshi The design and implementation of a paraphrase component for a natural language question-answer system (CO-OP) is presented. A major point made is the role of given and new information in formulating a paraphrase that differs in a meaningful description grammar used is way also from the given of by the paraphraser to user's the question. A transformational generate questions. PAGE - - TABLE OF CONTENTS . 1.INTRODUCTION. . . 11. OVERVIEW OF THE CO-OP SYSTEM 111, THE CO-OP PARAPHRASER . IV. LINGUISTIC BACKGROUND. V. FORMULATION VI. IMPLEMENTATION OVERVIEW .......... . A. THE PHRASE STRUCTURE TREE . C.FLATTENINGo E. CONJUNCTION AND DISJUNCTION ... F. NUMERICAL MODIFICATION . ..... . . VII.GENERATION. . . . . . . . . . VIII. FUTURE RESEARCH . . . . . . . G. Translation. , . . . . . . . . ... ... . .. . . . . . , . . . . . IX. RELATED RESEARCH X-CONCLUSIONS XI. APPENDIX A . . . .. . . . . . . . . . .. . . . . .. . . . . . . . XIV. REFERENCES 8 10 14 16 17 21 24 29 35 37 39 44 48 51 53 58 XII.APPENDIXB. XIII.APPENDIXC. 4 19 B . DIVIDING THE TREE. D. TRANSFORYATIONS 2 . . 60 ti3 ACKNOWLEDGEMENTS by an IBM fellowship, This work was partially supported NSF grant MCS 78-08401, and NSF grant MCS 79-19171. I would like to thank my advisor Dr. Aravind K. Joshi, for the tremendous amount of time and advice he has given me during the making of this thesis. I would also like to thank Jerry Kaplan for his advice during the implementation stages of and Dr. the paraphraser invaluable comments on the style Some special graduate thanks students for reading Bonnie Webber for her and content of the thesis. of the and Moore School commenting on deserve various versions of the thesis and discussing critical points. are: Tom Williams, David Bourne, Joe O'Rourke, and They David Raab. Finally, thanks goes to the members of the house at 4103 Chestnut St. for their support entire period of the project. and friendship during the I. INTRODUCTION a natural In language interface system, a paraphraser can be used has correctly understood a database to query to ensure that the system the user. Such a paraphraser has been developed as part of the CO-OP system (KAPLAN 79). In the user's question is CO-OP, an internal representation of passed to the paraphraser which then generates a new version of the question the user has before for the user. the the system question was seeing the paraphrase, of rephrasing attempts to answer not interpreted caught before a initiated. option Upon her/his it. Thus, if correctly, the possibly lengthy search of question error can the be the database is Furthermore, the user is assured that the answer s/he receives is an answer to the question asked and not to a deviant interpretation of it. The idea of using a paraphraser new. T o date, in the above way is not other systems have used canned templates to form paraphrases, filling in empty slots in the pattern with information from the user's question The CO-OP paraphraser differs from that a adopted. systematic method to In generate the the question. (WALTZ 78; CODD 78). these earlier systems in generate paraphrases CO-OP, a transformational paraphrase from an internal grammar is has been used to representation of Moreover, the CO-OP paraphraser generates a question whose form differs in a meaningful way from that of the original between given question. It makes and new information use of a distinction to indicate to the user the existential presuppositions made in her/his question. --- 11. OVERVIEW O F THE CO-OP SYSTEM The CO-OP system is database query systems. aimed at infrequent users of These casual users are likely t o be unfamiliar with computer systems and unwilling t o invest the t i m e needed t o learn a formal query language. converse naturally in English enables Being able to such persons to tap t h e information available in a database. In order t o allow the question-answer process t o proceed naturally, CO-OP principles" o f using these follows some of the conversation (GRICE 75). principles, meaningful answers the 'co-operative In system particular, by attempts t o questions having to find negative responses. T h e motivation for the approach was based on the observation that people expect a non-trivial response t o their questions - (i.e. than a more informative correct direct response is simple 'no'). negative, an When the indirect response c a n be more informative. T h e CO-OP responses questioner by system w a s developed t o addressing may have made in to a direct response 'none", CO-OP gives a more correcting example, the wrong, incorrect her/his question question. would (A) below mistaken the the When the "no' or be simply assumptions. is posed, projects in oceanography CO-OP g i v e s assumptions informative indirect response by questioner's if question assuming that any provide cooperative corrective the speaker exist. If For is s/he is indirect response rather than the less cooperative direct response onone". (B) (A) Which users work o n projects in oceanography? (B) I don't know of any projects in oceanography, The false assumptions that existential presuppositions (A) above, in question presupposition that CO-OP corrects of the question. the speaker there are the For example, makes the projects are in existential oceanography. Since these presuppositions can be computed from the sutface of structure the question, a large knowledge for inferencing purposes is a lexicon and contain database schema of semantic not needed, In fact, are the domain-specific information, the CO-OP system store only it.ems which Although this is a portable one, it also means means that the system does a minimum of semantic analysis. The modules morphological in the CO-OP system analyzer, phase translator, the paraphraser, and a The flow an control structure presuppositional analysis when questions responses. include a of control is parser, a intermediate which does result in negative initiated with morphological analyzer which processes the input and passes to the parser, In this the result the the question stage, the morphological analyzer strips plural endings, determines the root form for verbs, etc. The parser uses an Augmented Transition Network (ATN by (WOODS 73)) t o parse the question structure of the question in is at this point that the and outputs a syntactic Meta Query Language ( M Q L ) . paraphraser is invoked It with the MQL structure as input. The paraphraser rephrases the question in English and the result is presented to the user. If the user perceives that the system incorrectly interpreted her/his question, s/he can rephrase the question and try again before the database is searched for an answer. Once the user interpretation, translated the system satisfied MQL Q, into interrogating the CO-OP is version the formal database. was supplied Atmospheric Research (NCAR). compatible with with of query the the system's question language used is for (The database for testinq the by Center the National for It is in CODASYL format and is the commercially available database system SEED (GERRITSEN 78) which was used.) query If the database search results in a negative response, the control structure does a presuppositional analysis, checking to see if any of the database presuppostions results were an answer in to false, the If the question, then the search report formatter is called to comprehensibly present the results to the user, A s input t o the paraphraser, the information available question. The the YQL structure contains for its MQL representation is use in rephrasing composed of sets and arcs and encodes the surface structure of the question. sets denote entities in the database while binary relations between those entities. of the arcs and sets are drawn from words As such, sentences in the arcs the The denote The lexical labels in the question. system are treated extensionally, each word in the question pointing to its actual counterpart in the database. Figure 1 shows the MQL interpretation for the sample question (A) above, Attached to the sets and arcs are properties which provide additional syntactic information about the question. For example, each set or arc indfcates the word's syntactic available as has a category. properties includes property C9T which Other information the number of a noun or verb, the topic of the question, the main verb, the tense of a verb, etc, For a full description of MQL see the system documentation on the language and the macros which access it in Appendix A. ocean- Figure 1 - 111. THE CO-OP PARAPHRASER CO-OP'S paraphraser provides the error-checking for the casual user. with the system, s/he results printed, in formal database can ask only will to be have the intermediate output and the shown. The naive however, is unlikely to understand these results. this reason that the paraphraser of If the user is familiar which case the parser's query means user It is for was designed to respond in English. The use of English to paraphrase queries creates several problems. The first is that A ambiguous. natural language is inherently paraphrase must clarify the system's interpretation of possible ambiguous phrases in the question without introducing additional ambiguity. One particular type of ambiquity that a paraphraser must address is caused by the linear nature of sentences. A modifying relative clause, for example, frequently cannot be placed directly after the noun phrase it modifies. cases, sentence the semantics correct choice the sentence question of modified may (C) below plausible. of the noun phrase, be genuinely has t w o The speaker may In such indicate the but occasionally, ambiguous. For interpretations, both could be referring to example, equally books datinq from the '60s or to computers dating from the '60s. (C) Which students the '6Os? read hooks on computers dating from A second problem in paraphrasing possibility of generating originally asked. If a generate English from an question the English queries is the exact grammar were question that was developed t o simply underlying representation of the this possibility could be method must be devised which realized. Instead, a can determine how the phrasinq should differ from the original. The CO-OP paraphraser addresses ambiguity and the rephrasing of both the problem the question. of It makes the system's interpretation of the question explicit by breaking down the clauses dependent upon of their question (C) above the question function in and reordering the sentence. will result in either them Thus, paraphrase (D) or (E), reflecting the interpretation the system has chosen. (D) Assuming that there are books on computers (those computers date from the '60s), which students read those books? (E) Assuming that there are books on computers (those books date from the '60s), which students read those books? The method adopted quarantees that the differ from the clauses or prepositional formulated on the new original except in cases information phrases were basis of a distinction and indicates to presuppositions s/he "assuming that" clause), while focussing has made in the paraphrase will where no relative used. It was between given and the question user (in the the her/his attention on the attributes of the class s/he is interested in. IV. LINGUISTIC BACKGROUND As mentioned earlier, the lexicon and the the sole sources of world knowledge for design increases CO-OP'S database are CO-OP. While this portability, it means that little semantic information is available for the paraphraser's Contextual information is history or context parser question. limited since no is maintained for a user current version. the also The input is basically Using syntactic this information, receives from parse tree the running session in the the paraphraser a use. of paraphraser the must reconstruct the question to ol,tain a phrasing different from oriqinal. the The following question must therefore be addressed: What reasons are there for choosin? one syntactic form of expression over another? Some linguists maintain functional roles Terminology use? elements that word order is play t o describe within the types affected by the sentence.* of roles that can ............................................................ * Some other influences discussed in stylistic reasons, in discussed (MORGAN here, on and G R E E N 73). addition to determine constructions are to be used. that the avoid passive tense is often identification of flavor t o the text. syntactic agent when They some expression are suggest that of the different functions syntactic They point out, for example, used in academic and to lend a prose to scientific occur varies widely. Some of the distinctons that have been described include qiven/new, topic/comment, theme/rheme, and presupposition/focus. are not consistent Definitions (for of these terms however, see (PRINCE 79) example, for a discussion of various usages of "given/newm). Nevertheless, one influence on expression does appear to be the interaction of sentence content and the the speaker concerning the knowledge elements in of the listener. the sentence function in which the speaker assumes is of the listener (CHAFE 76). present in the "consciousness" This information is said to be the preceding discourse or because it knowledge question-answer information database. of the Some conveyinq information contextually dependent, either by virtue world beliefs of is part of the shared dialog system, shared of its presence in In a knowledge refers to participants. world which the speaker assumes Information functioning is in present in the role the just described has been termed "given'. .Newn labels presented as all information in not retrievable declarative, elements that the the sentence from functioning in listener is presumed not represent listener wants to know information is not (i.e.which already to know are additional functions in the the In the called new. in conveyinq what the what aware context. asserting information In the question, elements functioning speaker which is s/he speaker of. question. doesn't presumes Firbas Of know) the identifies these, (ii) is used here to augment the interpretation of new i-nformation. He says: .(i) it indicates the want of knowledge on the part of the inquirer and appeals to the informant to satisfy this want. (ii) [a] it imparts knowledge to the informant in that it informs him what the inquirer is interested in (what is on his mind) and [b] from what particular angle the intimated want of knowledge is to be satisfied." (FIRBAS 74; p.31) Although word order vis-a-vis distinctions has been discussed in sentence, less has analyzed the arrive at of the Despite different conclusions**, lines of reasoning. among the the the finite few to fact that the two they follow similar Krizkova argues that 50th wh-question and related interrogative form. and Krizkova* are question. and light of the declarative been said about the Halliday (HALLIDAY 67) have these the wh-item - verb (e.g. "do" or "be") of the yes/no question point to the new information to be disclosed in the response. These elements she claims, --------------- * Summary by ( F I R B A S 74) of Interrogative Sentence the untranslated and Some Functional Sentence Perspective Problems of article "The the So-called (Contextual Organization of the Sentence), Nasa rec 4, 1968. ** It should be noted that Halliday and Krizkova discuss the unknowns in rheme of a the question in order question. to define the Appendix C contains theme and a description of this concept and the analyses made by Halliday and Krizkova. are the only unknowns to the questioner. discussing the yes/no question, also verb is the only unknown. The Halliday, in argues that the finite polarity of the text, is in question and the finite element indicates this. In this paper the interpretation of the unknown elements in the question followed. The as defined by wh-items, in defining the of knowledge, act as new information, of new information and Halliday questioner's Firbas' is lack analysis of used to further elucidate the the functions in questions is role Krizkova in elements are given information. questions. The remaining They represent information assumed by the questioner to be true of the database domain, This labeling of information within the construction ambiquity. of a natural the question will allow parsphrase, avoidinq V. FORMULATION Following paraphraser the analysis breaks information. More down described questions specifically, divided into three parts, of which above, into an the given input CO-OP and new question is (2) and (3) form the new information. (1) given information (2) Function ii[a] from Firbas above (3) Function ii[b] from Firbas a5ove In terms of question the with question no subclauses knowledge for the hearer. indirect components, (2) modifiers of hearer. it defines the the lack of Part (3) comprises the direct and the indicate the angle from which define the as comprises attributes of interragative as they the question was asked. They the missing words information for the Part (1) is formed from the remaininq clauses. As an example, consider question (F): (F) Which division of the computinq facility projects usinq oceanography research? works on Following the outline above, part (2) of the paraphrase will be the question minus subclauses: .Which division projects?", Part words, will be .of the .which division". (3), the modifiers of works on the interrogative computing facilitym which modifies The remaininq clause .projects oceanoqraphy researchm is considered given information. usinq The three parts can then he assembled into a natural sequence: (G) Assuming that there are projects using oceanography research, which division works on those projects? Look for a division of the computing facility. (I?), In question information belonging three categories occurred in the types of information presented minus the part (2) of than one question clause will the question further a particular splintered. containinq several Clauses characterized by category (3) will sentences following Only category, the ~ d d i t i o n a l given the *assuming t5at the paraphrase clauses no clauses defining specific information. be If more occur. Example (H) below illustrates question missing in will concluding clauses. parenthesized following infarmation and the initial or occurs be ...' clause. a is missing, of the If one of these question. the paraphrase will invariably information is for to each of given attributes of containing information be presented as separate the stripped-down question. (I) below demonstrates a paraphrase containing more than one clause of this type of information. (H) Q: Which users work on projects in oceanography that are sponsored by N4SA? P: Assuming that there are projects in oceanography (those projects are sponsored by NASA), which users work on those projects? (I) Q: Which programmers in superdivision 5000 ASD group are advised by Thomas Wirth? from the P: Which programmers are advised by Thomas Wirth? Look for programmers in superdivision 5000. The programmers must be from the ASD group. VI. IMPLEMENTATION OVERVIEW The paraphraser's tree structure tree is from the The design the paraphraser trees reflecting given and new information in the question. tree allows the tree. The for a the parser's In the transformations the During this grammar are of rules processing in translation phase, representation are words. of simple set final stage of is translation. their corresponding string. The three separate of the which flatten is given. representation it then divided into the division of labels in first step in processing is to build a translated into process, necessary performed upon the A. THE PHRASE STRUCTURE TREE In its initial processing, the parser's the paraphraser representation into one that is more convenient for generation purposes. The resultant that highlights certain syntactic This initial independence processing from representation the gives changed or the The paraphraser's the component the question as the root node subject of the verb is the root right subtree. relations in a In the object. The of the of Meta paraphraser's Query new node of the left node of the the use Language) does other constructions the main The of binary (KAPLAN 79) creates preposition has a tree a tree. representation (see every verb or representation of tree uses current system, the parser's parser's moved to (if there is one) the root description illusion that the some phase need be modified. verb of subtree, the object paraphraser Were phrase structure main structure is a tree features of the question. CO-OP system. system, only the initial processing for transforms the subject and allow should the for the incoming language use them. Each of the subtrees represents other question. Both the subject and the will have a subtree for in. If a noun in one another clause in As an example, clauses in the object of the main verb each other clause it participates of these clauses also participates in the sentence, it will consider the users advised by Thomas Wirth work have subtrees too. question: .Which active on projects in area 3?". The phrase structure in Figure 2. tree used in the Since 'work' root node of the tree. .projectsw of the other clause and adjective structure. 'active' is the 'usersw right. paraphraser is shown main verb, it will be the is root of the left subtree, Each noun participates in therefore has one subtree. does Instead, it not appear as is closely part bound to Note that the of the the noun modifies and is treated as a property of the noun. users projects in area object advised by Thomas Wirth object Figure 2 one tree it B. DIVIDING THE TREE The constructed tree is computationally suited three-part paraphrase. The tree is flattened after been divided into subtrees accomplished by portion of the tree least, this right subtree root stripped down The splitting of the tree first extracting the topmost containing the will include it has containing given information and the two types of new information, is for the wh-item. the root node nodes. question. smallest At the very plus the left and This portion of the tree is the The define clauses which the particular aspect from which the question is asked are found by searching the left and right The subtree whose root node is the wh-item questioned noun, contains these clauses. left or right subtree or these. The information. subtrees for the wh-item or remainder Figure 3 previous example, Note that this may be may only be of the the entire a subtree of tree illustrates this one of represents division for given the Q: Which active users advised by Thomas Wirth work on projects in area 3? P: Assuming that there are projects active users work on those advised by Thomas Wirth. Figure 3 in area 3, which projects? Look for users C. FLATTENING If the structure of the phrase structure tree is as left subtree and B the right, shown in Figure 4, with A the then the following rules define the flattening process: TREE-> A R B SUBTREE -> R' A' B' In other words, each of the subtrees will be linearized by doing a pre-order traversal of that subtree. subtree has three pieces of As a node in a information associated with it, one more rule is required t o expand a node. A node consists of: (1) arc-label (2) set-label (3) subject/object where arc-label is the label of the verb or preposition used in the parse tree and set-label Subject/object indicates the label of a noun phrase. whether the functions as subject or object in sub-node noun the clause; it is used by the subject-aux transformation and expansion rule. The following rule expands a node: NODE Two They are does not apply to the -> ARC-LABEL SET-LABEL transformations process. phrase They are applied are wh-fronting further described during the and subject-aux in the flattening inversion. section on transformations. Figure 4 The tree of given information is flattened first. part of the left or right tree and therefore is flattened It is during ... " are of given information. the clause. by a the flattening stage that that there (be) are subtree of the If inserted the phrase structure pre-order traversal. the words "Assuming inserted to introduce the clause will agree with there is more than around representing the 'Bea the subject of one clause, parentheses additional ones. stripped down question is The modifiers. form is inserted before tree flattened next. It is followed by the modifiers of the questioned noun. phrase 'Look It is the first The clause of D. TRANSFORMATIONS The grammar used transformational one. rules in the paraphraser In addition t o the described'above, the is a basic flatteninq followinq transformations are used: wh-f ronting negation do-support subject-aux inversion affix-hopping contraction has deletion T h e curved lines indicate t h e ordering restrictions. are two connected groups of transformations. There If wh-fronting applies, then s o will do-support, subject-aux inversion, and affix-hopping. The second group invoked through the application of transformations o f negation. do-support, contraction, and affix-hopping. not affected by The described by absence or A description of transformations. follows. the rules used here is It includes Has-deletion is presence of other the transformation rules are (AKMAJIAN and HENY 75) and based on analyses analyses described by (CULLICOVER 76). T h e rule for wh-fronting is SD abbreviates structural specified a s follows, where description change: SD: - - X NP Y 1 2 3 SC: 2+1 0 3 condition: 2 dominates wh and SC, structural The first step in search of the tree for approach is generation, the implementation used The of wh-fronting the wh-item. for A paraphrasing slightly different than clauses must Since the clauses for the be fronted paraphrasing that it along with the will never NP to paraphrase mode wh-fronting is tested is be be the also case when dominates For this reason, applicability the for and is applied question, noun. are broken down fronted used, process of the stripped down the original the head relative clauses or prepositional phrases. when for When generating, of the original question paraphrase, used be the head noun of some relative clauses or prepositional phrases. these is difference occurs because in question, the NP to be fronted may is a of in the flattening If it applies, only one word need be moved to the initial position. When generation is being done, the applicability of wh-fronting is tested for immediately before flattening. If the transformation applies, the tree of which the wh-item from the remainder of the tree position to the is the is split, root is The subtree flattened separately and is attached string resulting from flattening in fronted the other part. After invoked. wh-fronting In CO-OP, has been the underlying question does not contain modals fronting the applied, wh-item necessitates do-support representation of or auxiliary verbs. supplying an The following rule is used for do-support: is the Thus, auxiliary, - - -4 - NP NP tense V 1 2 3 SC: 1 do+2 3 condition: 1 dominates w h SD: Subject-aux afterwards. inversion Again, if 4 activated wh-fronting inversion will apply also. - is X immediately applied, subject-aux T h e rule 'is: - - AUX X NP NP 1 2 3 4 0 4 SC: 1 3+2 condition: 1 dominates wh SD: Affix-hopping follows subject-aux inversion. paraphraser it is a combination o f a s affix-hopping and auxiliary is 'hopped' from the verb t o the auxiliary. X SC: 1 1 generated, the - AUX - Y 3 3 2 2+4 well as the head-noun is and inversion apply t o question. In the not be however, may be applicable. used. number are Formally: - 2 6 6 propose that wh-fronting the relative CO-OP properly positioned by the wh-fronting need representation. tense and tense-num-V 4 5 0 5 S o m e transformational analyses and subject-aux Tense and number in the parser's When a n SD: the o f what is commonly thought number-agreement. are attributes of all verbs In clause a s paraphraser, the flattening process Subject-aux inversion In c a s e s where the head noun of t h e clause is not its subject, subject-aux inversion results in t h e proper order. T h e rule for negation is tested during t h e translation phase of execution. It has been formalized as: - - - Y tense-V NP X 3 4 1 2 3 4 SC: 1 2+no condition: 3 marked as negative SD: In the CO-OP representation, an carried on the (KAPLAN 7 9 ) ) . the part it is a possible as modification below), of of the Therefore, verb when (see the negation is relation (see English representation of in the In all cases however, of binary When generating an question, negation object indication of some cases noun (see to express question (J) negation can be indicated as version object is (K) of marked paraphraser moves the negation to as question (J)). negative, the become part of the verbal element. (J) Which students have no advisors? (K) Which students don't have advisors? In English, the auxiliary of case for negative the verbal element questions, Do-support is used. an The rule presented this way for used X SC: 1 1 - tense-V-no 2 do+2 must the was the generated. after used after wh-fronting. They clarity, but - be to for do-support combined into one rule. SD: is attached and therefore, as auxiliary negation differs from the one are marker Y 3 3 could have been Affix-hoppinq, number, and as described negation from the The cycle of transformations negation is above, hops verb to the the tense, auxiliary verb. invoked through application of completed with the contraction transformation. The statement of the contraction transformation is: SD: SC: X 1 1 - do+tense 2 -no 3 -Y #2+ngt# 0 where # indicates that the result for further transformations. 4 4 must be treated as a unit E. CONJUNCTION AND DISJUNCTION The use of affects both three-part conjunction the design and paraphrase. appears as part and When disjunction in the questions implementation conjunction or of the new information in of disjunction the question, no changes need be made in the design of the paraphrase. conjunction appears as or disjunction information however, to the already method of referring to given plurals, abovea to "eacha to of eacha is items. of the given will refer standard information which contains adopted. In the CO-OP used to refer to conjoined conjoined singulars, disjoined entities. When Some conjunction (or disjunction) must be paraphraser, "some part the stripped-down question mentioned conjoined the and For example, "any of (M) below the is used to paraphrase question (L): (L) Which users work on projects advised Clayton-Paulsen and projects sponsored by NASA? by (M) Assuming that there are projects advised by Clayton-Paulsen and assuming that there are projects sponsored by NASA, which'users work on some of each? Note that since any number of items could potentially be conjoined, expressions like .botha, had plurals do be avoided. not necessarily imply are indicated. the speaker projects to which implicitly For Furthermore, that all of example, (L.) above does is interested only in advised limit the by number, conjoined the entities not imply that users who worked Clayton-Paulsen and all on all projects sponsored by NASA. Again, a referring term that does not carry this connotation must be used. The features of the system affected by conjunction and disjunction are paraphraser's phrase process. MQL, In of its objects. in the structure tree, its currently since occurs flattening conjunction is verb or one does not in relative in the clauses each of the paraphraser (see question provide for not occur into two, (0) below). additional modification of in YQL. of question is split paraphrase it does the When conjunction occurs around the wh-items question, the and and it occurs around the main which is passed separately to (N) the MQL representation, the the representation implicit except when the addition of this The paraphraser type of conjunction input. is treated Conjunction by the head noun and the parser that as is so encoded Question (P) below would be represented in the same way as if question (Q) had been asked. (N) Which users and advisors are sponsored by NASA? ( 0 ) Which users are sponsored are sponsored by N A S A ? by N A S A ? (P) Which users work on projects sponsored by N A S A ? (Q) Which users in oceanoqraphy work on projects in are sponsored by N A S A ? As was the conjunction case in the for conjoined user's and oceanography that wh-items, question Which advisors this type of is invisible to the is visible to the paraphraser. Conjunction around the main 30 verb paraphraser since more than one verb is marked verb in the MQL representation. the main verb is Although conjunction around treated in the same way as - as conjunction (i.e. a s the main modification other types of of the subject), the additional modifying arc happens to be the main verb in this case. Conjunction around the objects of the main verb is also visible to the paraphraser. Conjunction around objects of verbs or prepositions results in duplication of the verh or preposition in the M Q L representation. occurs around the object of the When conjunction main verb, the main verh is duplicated. Disjunction is always explicitly represented. type of nouns arc called a disjunct are disjoined nouns in the in the arc is question. A special used when Disjunction of. around in MQL of question results in the duplication the relation the noun is part verbs or Figure 5 depicts the MQL representation of a question usinq disjunction. The paraphraser's phrase structure tree currently handles any type of disjunction and the types of conjunction which are visible replicating node in syntactic unit. replication is also each label This conjunction Each node The qroup is or achieved by disjunction in the tree functions as of labels treated as a syntactic in the group can thus allowing YQL. labels when occurs in the question. a the resulting unit. have its own set for the representation from However, of subtrees, of questions (R) below (its representation is shown in Figure 6). such as (R) Which users work on projects reports sponsored by NASA? in oceanography T o distinguish between conjunction and disjunction at a node, the list of corresponding labels is nested with to conjunction is a list of level and list. disjoined items. ends when an item is a any combination of than the M Q L any not a does not appear at any occur in the list. nested for a greater range currently is conjunction In and In fact, the paraphraser's disjunction can be represented. representation allows occur to lexical label and If conjunction or disjunction manner, A node Each item in the list Nesting can particular level, only one item will this alternate levels and disjunction. labeled by a list of conjoined items. and provides for, of possibilities as it has limited nesting capabilities. Flatteninq rules for expansion of nodes also need to be modified to accomodate conjunction and disjunction. node is now a list of labels instead ordering rules for expansion of a An item in a list conjoined or and its until a its subtrees are expanded by VI. C above. - The rule single node must be used. neighbors in the it is applied to the bottom level is reached. list. The label and the rules presented in Section for expansion of when the node has already the In such and then successive levels A lexical does not apply question. single label, subtrees are expanded disjoined to its procedure is recursive; of a Since a cases, a list of labels been mentioned in the appropriate referring phrase is used. .... (1) (item-1 item-2 item-n ... item-1 and item-2 an3 itemon)-> .... .,.. (2) If item-n = (item-nl item-n2 item-n->item-nl or item-n2 or Else item-n is a lexical label item-nm) item-nm (3) if item-nm is not a lexical label, repeat c3 users u Which users work in division 3 or 5 ? Figure 5 users ((projects) in oceanography object Figure 6 (reports)) sponsored by NASA object F. NYMERICAL MODIFICATION T h e CO-OP system incorporates a very o f quantification, that modification question interpreted a s are represented 79) for in fact, looks more o f nouns, (like Any modifiers of the in M Q L a s on questions in parsing). (3 5) means or moren, interpretation Each property etc,), precede. the are They (see (KAPLAN of quantified is a numerical range; first indicating the lower bound, "uw is used in the universe. "from 3 to S " , (0 (1 u) (indicating negation), quantifiers in nouns they indicating the upper, all, or everything like numerical properties of sets the two numbers are used, the the second explicit "every", "3 mall", details limited treatment 0) , 'from For 'from 1 to indicate example, the pair 0 to O n or "nonew to alln, or "somen, etc. The paraphraser treats same way it treats from numerical stage of numerical adjectives, of the Properties representation to As the are translated English during processing, translation, from unique label to lexical modification in a set the third is translated entry, any numerical modifiers set are introduced into the string. Table 1 below shows how the numerical pairs are translated into words, When numerically the given information in a strategy is presented modified expressions occur as used, as part representing given In such of question, a slightly different cases, the modification the information. part of existential is not presupposition When the noun appears for the second Paraphrases time, (T) the and (V) numerical of modifier (S) and (U) is used. respectively, demonstrate this for two cases: (S) Which students work on 5 or more projects advised by NASA? (T) Assuming that there are which students work on 5 projects advised by NASA, or more of those projects? (U) Which users work on every project advised by NASA? (V) Assuming that there is at least one project advised by NASA, which user works on every such project? ----------+--------------+-----------------------Number 1 1 Plural I I all I I Singular ----------+--------------+-----------------------1 I ----------+--------------+-------------------------( U U) I every 1 I ----------+--------------+-----------------------(1~-1) I no all no every I I no ----------+--------------------------------------( 0 0) I (n m) I n to ----------+--------------------------------------- I I I I I m (n n) I 1 n (n>l U ) I I n or more exactly ----------+--------------------------------------- ----------+--------------------------------------Table 1 I I I I 1 I I I I I 1 I G. Translation I n the translation phase, the final cosmetic changes are made to produce the paraphrase. a string of a for The input to this module is word in the original question. includes the words which were ... ","Look of the string for The string also introduced for the three-part - process (e.g. paraphrase during the flattening that a unique label Lisp Gensyms, each Gensym being ...",etc.). The syntactic structure is essentially that of in some cases, transformatons are "Assurninq the final paraphrase; performed upon the strinq during this stage. T h e major bulk of work done during translation of labels into nouns and adjectives, translation as a property of the Gensym. contains the root syntactic information. the verb has paraphraser a the is a simple For look-up for these items is stored The lexical property of a Verb form for the verb The tense, the number, regular calls is the their English counterparts. procedure since the lexical entry Gensym this stage conjugation morphological are and some and whether stored. routines with The this information to conjugate the verb. The nouns that were used first. A list of in the question are translated set-labels maintained as part of the MQL. for each set-label in the list string is replaced by the representing the is The input strinq is searched and the set-label in appropriate word. any adjectives that modify the nouns the At this point, noun are introduced into the question. As mentioned earlier, phase, adjectives are not appear a s order are splitting the translating, bound each post-modification. to nouns and do This is done in adjective modification given and new information. closely tree-building stored as properties of part of the tree structure. t o avoid clauses of during the the Instead, adjectives nouns set-label is they modify. checked Quantification into on a for When pre- set is or also translated into words (see Section VI F for details), The relations in a question are nouns have been translated. translated after the The relations include verbs and prepositions. A search of the input string is made for each arc-label and its occurrence in the string is the preposition or conjugated verb. translated into a verb, the transformation is verb (see When an arc-label is applicability of the negation tested. transformation to apply, 'no" Section VI. the D). In order This is on translated and indicates negation. verb's that the verbs are only translated Following paraphrase is complete, for the negation must appear directly after the quantification translated. replaced by only possible object It has is for if the already been this reason once the nouns have be'en translation of the verbs, the VII . GENERATION The paraphrase component has been given a dual function. It can generate an English version of the parser's representation as well as paraphrase in the three-part form. This function uses the same procedures and grammar as the three-part paraphraser, but the tree is not split into three separate trees before beinq flattened. function could be used to produce were, there is The generation the paraphrase, but if it no guarantee that the question would differ from the user's. In CO-OP, generation is used to produce A corrective response suggestions and corrective responses. is used to an correct the user's false existential presupposition incorrect, the presuppositions. encoded in When the question M Q L representing portion of alternative is the particular presupposition is passed to the paraphraser which generates the For example, corrective corrective response. response that could be below (X) generated is by a the paraphraser if (W) were asked: (W) Which programmers in division 3 oceanography? (X) I don't Alternative system when work on projects in know of any projects in oceanography. suggestions are the direct response also used to the user's negative. If an incorrect presupposition question, the resulting by question may the CO-OP question is is removed from a no longer have a negative response. wider class such cases, the system question to Alternative above as a the user suggestions corrective response (W),(X) In are only has been made. might be suggests the possible interest. after' presented Thus, a followed sequence like the by a alternative suggest ion (Y) : (Y) But you might be interested in programmers division 3 that work on any projects. For corrective portion of the MQL responses, the paraphraser representation of in receives the the user's question which encodes the incorrect presupposition. For alternative suggestions, the paraphraser also receives a portion of the original MQL. the user's In this case, it is the portion representing question minus the incorrect presupposition. For both types of response, the paraphraser generates a question M Q L representation from the it is given. The wh-item is then stripped from the front of the question and a phrase is attached to the 'I statement. corrective depending front, don't responses, on the "But noun which presupposition above, Slight in was the "anyw modifies the ... of any question used, Table ... The adjective 'any' restricted by original question, "projectsm which, are 2 particular wh-items and in to ' is used modifications you might be interested alternative suggestions. the know wh-item correspondence between used, converting a for made, shows the the phrases is used for is used before the Thus, in the incorrect in (Y) original question, was restricted by .in T h e flattening process for used for subtrees paraphrasing. representing therefore, the tree is the flattened by pre-order however, is The given same; traversal. generation differs from that tree is and new not divided into information and is flattened as a whole. traversal inorder oceanography". the left and traversals, the The rule not identical. for It into the question in order that T h e order of right subtrees total tree by expansion of introduces the are an sub-nodes word "that" each subtree be expanded as a relative clause, and not as a separate sentence. The rule is: SUB-NODE->that ARC-LABEL SET-LABEL The transformational grammar also applies to the generation process, with being the point the one difference the applicability of wh-fronting is these process changes and is the the flattening same as the at which tested for. Other than process, the generation paraphrase process. The generation function is general enough that it m a y eventually be used for other types of responses in cases when something other than a direct response is needed. 1--------+-------------------------+------------------------- I 1 I 1WH-ITEM 1 1 I 1 I I 1 1 1 and 1 1 -------- -------------------------+------------------------- I lwhich I 1 EXAMPLE I I (question 1 I I I 1 1 PHRASE I don't know of any (corrective response) I I 1 I I IWhich I users work 1 on! 1 Iprojects? 1 1 ( I don't know of I lusers that work I Iprojects. -------- .-------------------------.+----------------------I I don't who know of anyone any! I on1 1 I 1 I (Who works on projectsl I ( i n oceanography? f I 1 I I 1 i I1 don't know of anyone! 1 1 lthat works on projects1 I l in oceanography. I 1 --------+-------------------------+------------------------Table 2 I 1 I .. --------+------------------------I WH-ITEM I PHRASE I I I 1 what I I I I I I 1 . I I I I I I I EXAMPLE I 1 corrective response)l .--------------------- I I I 1 don't know I I I of anything What is I I INASA? sponsored by1 I I I 1 (I I I don't know of 1 (anything I I I 1 sponsored 1 I \--------+-------------------------+--------------------I I I lwhere I1 don't know of anyplacelwhere do I I I I 1 I I (question and --------+------------------------I .--------------------- I I that is by NASA. users have laccounts? I I 1 1 I 1 11 don't I l anyplace I I know that of users lhave accounts. I I I--------+-------------------------+----------------------1 I 1 (How many11 don't I I know of any \How many users I have1 1 laccounts? I 1 I I 1 1 I , I I1 don't I lusers I know of any1 1 that have I I I laccounts. I--------+-------------------------+----------------------- I I I Table 2 (continued) I 1 FUTURE RESEARCH VIII. The CO-OP semantics. to the paraphraser is lacking in the area of Although a good deal of attention has been given form of the question and reasons for using a different form than was used in the original, the words used in the paraphrase are the same words that occurred user's question (with the exception in the paraphrase to introduce development of a particular words method of words that are added given or The next step in paraphrasing in in the new information). the database system is the to determine in the paraphrase why and should differ how the from the user's. A logical schema developed (see (MAYS of the is currently 79)) which will provide information for the system. of the database implementaion of The about the information could be used some semantic schema will be independent any particular contain knowledge being database and will the data. This structure of by the paraphraser in choosing words that reflect the structure and content of the database t o replace words used in the original question. The current version disambiguation. Nouns in occur in the lexicon category. The word's any semantic of the system does are assigned example, in question question which to a do not specific database syntactic function in the question and constraints determine its category the user's do some lexical placed upon it (see (KAPLAN 79) for (2) below, the name are used details). to For Thomas Wirth does not appear in t h e lexicon. (AA) indicating t h e category that h a s is output before is output t o the been assigned. t h e paraphrase and is done user, T h e phrase by t h e control structure. ( 2 ) Which programmers advised division 3? by T h o m a s W i r t h are in (AA) I a m assuming that T h o m a s Yirth is an advisor name. Another change which would provide the user with additional semantic information would be t o generate part of t h e paraphrase from the formal database query. T h i s would provide t w o specific t y p e s of information for the user. The first of these h a s t o d o w i t h t h e verbs used in t h e question and t h e relations not occur a s in the database. a relation in t h e For form a example, 'checks whose 'newn "checks relation correspondinq t o night the In the verb. correspond than t h e account s o m e fictitious banking database. helpful for t h e user t o s e e used in from t h e database may be that bounced" amount is less verb w h i c h does database is question, a composite o f relations used t o When a to balancen in such cases, it may be h o w the verb w a s interpreted in t e r m s o f t h e database concepts. A second t y p e of information that from t h e formal database query sets restricted by are retrieved In some cases, t h e order in from the modifying c l a u s e s makes t i m e t a k e n t o find t h e answer. generated concerns the method by which t h e database is t o be searched. which could be database and a difference then in t h e Although such information is not necesgary for interpretation the user of her/his to understand question, the some system's users may be interested in the efficiency of the database search. Additions to the area paraphraser could also be of inferencing. indicate the The system's paraphraser made in the could be interpretation of used the to user's intentions if the system were to address the question of why a particular would question was asked. contain more content of the This type information than user's question. of paraphrase just the substantial Some the of user's intentions or motives may be deduced on the basis of her/his question alone, For example, when a question is asked that requires a list for an answer, the questioner may not really be interested in all items on the list. Recognition of such motives a question is useful unmanageably large. the user about when the answer to The paraphraser could be used underlying motives which would becomes to ask restrict the list. If a of the user model were maintained durinq her/his session with the system, more information would be available to aid the paraphraser in determining the user's It is often the questions on case that one topic, a S/he questions t o get the information question, In such cases, may asks a have to series ask of several needed to ask a particular the paraphraser deduce what the user is aiming at the paraphrase. person intentions. may be able to and include it as part of Moreover, a question asked at the end of a series of questions may take on a slightly different meaning if viewed in in isolation. provide the light of previous questions A running history of paraphraser with rather than taken a user the necessary generate these nuances of meaning. session would information to IX. RELATED RESEARCH At the aware of present time, two exist for other paraphrasers that database question-answer was devel.oped by David Waltz et. al. PLANES Ted Codd et. system, the Rendezvous Version generates using other by the paraphrase templates. actions. The English query. An from the process words abbreviations or code (WALTZ 78) (CODD 7 8 ) . 1 System systems. al. The are One for the for the PLANES system formal database involves I am three query specific substituted for any names which appeared in the database appropriate paraphrase template is selected for use and the slots in the template are then filled with words and phrases from per se. the query. It involves the suitable for The process is not generation formation of templates the particular database which are and for the types of questions which can be asked. The Rendezvous is slightly more three parts used. A System also uses templates, sophisticated to generation the system EVERY and two header template which query is chosen first. most than frequently. ..., The where the header types of There type of three types of queries in COUNT), of which for FIND dots must be are templates are corresponds to the There are (FIND, EXIST, and WALTZ'S. althouqh it is FIND occurs PRINT THE filled in, ... The second part of the paraphrase is the target list and it occurs only in PRINT type queries. called the body. The third It is formed by part of the paraphrase is extracting patterns from tables that are associated with particular items in the generation component are database. The goals of important easy ones. to the Rendezvous The generated understand, (CODD 7 8 ) . achieve discriminating, Instead of these English must goals developing and a however, the unambiguous, not general solution research seems concentrated on particular examples which don't criteria. results in This misleading part from the use to to be meet these of patterns which are essentially fragments of English t o be inserted in the sentence. for a constructed beforehand The patterns must be particular database and great choose phrases that can be variety of other looking at phrases. particular care must be taken to easily patched together Such a solution examples, instead with a necessitates of the general framework. Goldman (GOLDMAN 75) although it system, is not part MARGIE, network paraphrase mode. Unlike the and In English operates in either paraphrase mode, paraphraser, system. a from knows of expressing a CO-OP a paraphraser, of a question-answer generates dependency possible ways it has also developed The conceptual inference MARGIE outputs or all particular concept, MARGIE is a semantic paraphraser; it uses different idioms and phrases to express the same idea. Other work Slocum has been done in generation by (SIMMONS and SLOCUM 72), Heidorn Simmons and (HEIDORN 7 5 ) , and McDonald (MCDONALD 78). Simmmons and a system t o generate English a n ATN (WOODS 73). structure grammar system can be used pronouns versus proper nouns. Heidorn uses an augmented with a n interpreter McDonald h a s examined of from semantic networks using a T h e formalism they use is similar transformational grammar. to Slocum have developed for both for t h e generation and t h e more s ~ e c i f i cproblem naming through the phrase rules. His analysis. of t h e use use - o f nouns and H e has developed a system for generation that incorporates t h e constraints he has observed. X. CONCLUSIONS The paraphraser this work a syntactic described here is has examined the one. reasons for different While forms of expression, additions must be made in the area of semantics. The substitution portions or all of synonyms, phrases, of the question requires the effect of context on word base and contextual used here, extended wider a for meaning and of the intentions syntactic approach once idioms an examination of of the speaker on word or phrase choice. semantic or The lack of a rich information but the ranqe of has been dictated paraphraser can information the be becomes available. The paraphraser CO-OP domain-independent and requires no changes in use the template thus a designed change the paranhraser. form however, do of to the be database Paraphrasers which require such changes. This is because the templates, or patterns, which constitute the type of dependent on question that the domain. can be For asked, are different necessarily databases, a different set of templates must be used. The CO-OP that paraphraser also it qenerates grammar of the questions. differs from question It using a addresses two other systems in transformational specific problems involved in qenerating paraphrases: 1. ambiguity in determining relative clause modifies which noun phrases a 2. the production user's of a question These goals have been achieved that differs from the for questions using relative clauses through the application of a theory of given and new information to the generation process. XI. APPENDIX A MACROS and MQL Representation MQL representation: ( sets ( . . . . . ) . relations .... )) a list, the MQL is that identify the lists, Each CAR is a list of which sets in a graph. list in the of Gensyms The CADR is a list represents a list of relation. Its format is as follows: The CAR of the list is a identifies a relation in the graph. list of Gensyms involved in. which identify The first item in Gensym The which CADR the sets of the list is the relation the list will be order set (or the subject) of the relation. uniquely is the hiqh The second item in the list will be the low order set (or the object) of the relation. If the arc is a disjunct arc, there will be more than one low order set. All other the Gensyms. information is located Each node in a qraph on property has the lists of following properties associated with it: CAT The lexical category of the node. Will be either N (noun), P N (proper noun), ADJ (adjective) , I WH (wh-word). QUANT The quantification on the node. This will (u (e.g. be a list u) for universal quantification). NUM The number of the node, LEX Will be either PLUR or SING. The lexical word associated - USER). with the Gensym (e.9. TOPIC This will be true if the node is the topic of the question. Each arc has the properties associated with it: LEX All lexical information associated with an arc will form is a list of lists. for each be on low order this property list. There will be set associated Its one list with the arc. Each list will have the following form: <vrb> will be the lexical arc. NIL. (e.g. WORK). <PREP> will verb associated with the If there is none, be the associated with the arc (e.g. none, it will be NIL. lexical - 1 Note that it will he preposition If there is both of these slots can be non-NIL or one of them can be NIL, but they both can not next item is a be NIL in list of the same tense and list. The number of the verb. can If <vrb> is nil, these will be also. be either PAST, PRES, PRESP participle), or PASTP (past participle). be SING or PLURo is conjugated will be T, <reg> (present <num> can indicates whether the verb regularly or If not, not,<reg> will If be a it is be T if this arc is the main <reg> list of proper conjugations taken from the lexicon. will <tnse> <Main> verb of question, MACROS ARC:HIORDSET Arguments: 1. The arc for which the high order set is needed. 2. The list of relations and their associated sets. (the CADR of the M Q L graph), Returns: The Gensym identifying the high order set or subject of the arc. ARC:LOWORDSET Arguments 1. the arc for which the low order set is needed. 2, the list of relations and their associated sets, Returns: the first low order set of the arc, ARC :LEX Arguments: 1. the Gensym identifying the arc to be translated. Returns: A list containing the lexical translation the the of the arc. T h i s will be either a preposition a verb, o r a verb and preposition. T h e verb will be properly conjugated. ARC :VRB Arguments: 1. O n e list of the property list of a n arc. i.e. T h e lexical information corresponding t o one arc, whether it be part of a disjunct a r c or a simple arc. Returns: T h e unconjuqated verb from that list. - ARC: P R E P Arguments: 1. O n e lexical list of a n arc. Returns: A list of the preposition or NIL if there is no prepositon. ARC: TNSE Arquments: 1. o n e lexical list of a n arc Returns: t h e tense of the verb from that list. ARC :REG Arqurnents: 1. o n e lexical list of a n arc. Returns: T if the verb is regularly conjugated. Otherwise, it returns the list of conjugations for the irregular verb. ARC :C A T Arguments: 1. T h e Gensym identifying the arc. Returns: T h e lexical category of an arc. Note that this will be either PREP or VERB. NODE :LEX Arguments: 1. T h e Gensym identifying the node. Returns: T h e lexical translation of a node. NODE: QUANT Arguments: 1. The Gensym identifying the node. Returns: the quantification on a node. This will be in the form of a list (e,g. (u u) for universal quantification). - NODE :CAT Arguments: 1. The Gensym identifying the node. Returns: T h e category of a node. This will be either N, PN, ADJ, or WHO NODE: NUM Arguments: 1. The Gensym identifying the node. Returns: T h e number of the node, This will be either PLUR or SING. N0DE:TOPIC Arguments: 1. The set of nodes in the graph. Returns: The topic of the set of nodes. ARCS : Arguments: 1. The Y Q L qraph Returns: The set of Gensyms identifying the arcs in the qraph. NODES: Arguments: 1. The M Q L graph Returns: T h e set of Gensyms identifyinq the nodes in the graph. This is the car of the MQL. - XII. APPENDIX B Sample Questions and their Paraphrases Q: Which programmers from the ASD group are in superdivision SOOO? P: Which programmers are in superdivision #5000? Look for programmers from ASD group. Q: Who advises projects in area 36? P: Assuming that there are projects in area 135, who advises those projects? Q: Which advisors have accounts for projects in oceanography? P: Assuminq that there are accounts projects are in oceanography.), for projects (those which advisors have those accounts? Q: Which users that work on projects that NASA sponsors are in division 3? P: Which users are in division #3? sponsors. Q: Which look for users that N A S A The users must work on projects. users work on projects in area 3 that are in division 200? P: Assuming that there are projects work on those projects? Look in area 4 3 , which users for users that are in division #200. Q: What are the groups? P: Display the groups. Q: Which users work on every project in area 5 5 3 P: Assuming that there is at least one project in area t 5 5 , which users work on every such project? Q: Who sponsors more than 3 projects? P: Who sponsors 4 or more projects? Q: Which programmers work in division 3 or 4? P: Which programmers work in division # 3 or division #4? Q: Which users work on all projects in oceanography? P: Assuming that there are projects in users work on all of those projects? oceanography, which - XIII. APPENDIX C A Note on Theme and Rheme The concept relation to of theme and rheme its affect on the more detail than have other Prague School has been discussed in word order of a distinctions. postulate that the sentence in Linguists of the sentence is divided into elements providing common ground for the conversants (theme) and elements which function in be imparted (rheme). is governed by conveying the information to The definition of elements as thematic two constraints. In sentences containinq - elements which are contextually dependent e . information context), known or contextually dependent In sentences defined as determined elements always lacking known or those Since hand, is (CD), a degree of the new C D than those that lowest don't, theme. theme is of Rheme, on high deqree information the degree vague concept. characterized by a elements conveying function as given information, elements having communicative dynamism* the other from conveying of CD. carry a higher rheme is close to the concept of new information. The Prague School contends that tendancy for theme to appear as for rheme in English there subject of the sentence and to appear towards the end of the sentence. p * of (FIRBAS 74) an element defines the deqree of as "the is a extent p p p p The - communicative dynamism to which contributes towards the development of the the element reverse order is possible, though less likely, According to these observations, if context indicates that the transitive subject of the the sentence conveys the new theme occurs as object, then passivized to re-establish the information, while sentence would the order of theme be first, rheme last. An analysis of question has theme and rheme and its been made by both Although they agree about the (see IV), Section function as theme they function in the Halliday and unknowns for disagree and which funcion about Krizkova. the questioner which elements as rheme. Halliday defines the theme of a question as a demand for information, The wh-item of a question is interpreted as theme because of its position in the question and because it indicates the speaker's want of knowledge and desire to fill it, He says: "In a non-polar interrogative for example, the wh-item is the theme by virtue of its being the point of departure for the message; it is precisely what is being talked about," (HALLIDAY 67; p, 212) Krizkova criticizes this the interrogative Krizkova, question. Rrizkova's words of the the rhematic The analysis, instead interpreting elements are remaining definition of question as elements the rhematic. For unknowns in the function theme and rheme is as theme. similar to the given/new distinction since she only considers what is known or unknown in the question. Halliday, o n the other hand, is closer to the concept of topic/comment articulation rheme. Describing in his definition of theme and the difference between theme and given information, he defines. theme as that which talking about now, speaker was talking ascribes the the sentence. rheme. as opposed to about. term theme to given, the speaker is that which Furthermore, Halliday the element occurring He labels the remainder of the the always first in sentence as For him, whether elements function as theme or rheme is determined by the difference in interpretation of order of the sentence. theme and rheme that accounts for Krizkova's analysis. the meaning It is of the the conflict in this terms his and XIV. REFERENCES Akmajian, A, and Heny, F., An 1, (AKMAJIAN and HENY 75). Introduction to the Principles of Transformational Syntax, MIT Press, 1975. - - 2. (CHAFE 76). Chafe, W, L,, OGivenness, Contrastiveness, Definiteness, Subjects, Topics, and Points of View", Subject and Topic (ed, C. N. Li), Academic Press, 1976. 3. An (CODD 78). Codd, E. F., et al., Rendezvous Version 1: Ex erimental English-language Query Formulation G rs * of Relational Report ~~2144(29407), IBM ~ e s e a r ~ a b o r a t b rSan ~ , Jose, Ca., 1978, - 4. (CULLICOVER 76). Press, N. Y., 1976. Cullicover, P. W., Syntax, Academic 5. (DANES 74). Danes, F, (ed. ) , Pa ers on ~ u n c t i o n a l Sentence Perspective, Academia, Prague, -&T6. (FIRBAS 66). Firbas, Jan, "On Defining the Theme Functional Sentence Analysis", Travaux Linquistiques Prague 1, Univ. of Alabama Press, 1966, in de - . 7 (FIRBAS 74). Firbas,Jan, "Some Aspects of the Czechoslovak Approach to Problems of Functional Sentence Perspective", papers on Functional Sentence Perspective, Academia, Prague, 1974. - 8, (GERRITSEY 75). Gerritsen, R., SEED Reference Manual, Version COO-B04 draft, ~ n t e r n a t i o n a l a t a Base Systems, Inc., Philadelphia, Pa., 19104, 1978, (GOLDMAN 75). Goldman, N., "Conceptual Generation", 9. Conceptual Information Processing (R. C. Schank) , North-Holland Publishing Co., Amsterdam, 1975, 10. (GRICE 75). Grice, H. P., .Logic and conversation", in 3, (P. Cole and Syntax and Semantics: Speech Acts, Vol. J. L. Morgan, Ed.), Academic Press, N. Y., 1975. - 11. (HALLIDAY 67). Halliday, M.A.K., Transitivity and Theme in ~ n g l i s h " , Journal 3, 1957. "Notes on of Linguistics 12. (HIEDORN 75). Heidorn, G., "Augmented Phrase Structure Grammar", TINLAP-1 Proceedings, June 1975. (JOSH1 79). Joshi, A. K., "Centered Logic: the Role of 13. Entity Centered Sentence Representation in Natural Language Inferencing", to appear in IJCAI Proceedinqs 79. 14, (KAPLAN 79). Kaplan, S. J., "Cooperative Responses from a Portable Natural Language Data Base Query Systemu, Ph,D. Dissertation, Univ. of Pennsylvania, Philadelphia, Pa., 1979. 15. (MAYS 79). Mays, E., forthcominq Masters ~ h e s i s ,Dept. of Computer and Information Science, U. of ~ennsylvania, Philadelphia, Pa., 1979, 16. (MCDONALD 78). McDonald, D. D., "Subsequent Reference: Syntactic and Rhetorical Constraintsn, TINLAP-2 Proceedinqs, 1978. 17. (MORGAN and GREEN 77). Morgan,J.L. and Green, G.M.: "Pragmatics and Reading Comprehension", University of Illinois, 1977. 18, (PRINCE 79). Prince, E,, "On Distinctionw, to appear in CLS 15, 1979, the Given/New 19. (Sgall, Hajicova, and Benesova 73)Sqal1,P. , and Generative Hajicova,-E., and ~enesova,E,, Topic, Focus Semantics, Scriptor Verlag Gmbh, Kronberg Taunus, 1973. 20. (SIMMONS and SLOCUM 72). Simmons, R. and Slocurn, J., "Generating English Discourse from Semantic Networksn, Univ. of Texas at Austin, CACM, Vol. 5, #lo, October 1972. 21. (WALTZ 78). Waltz, D,L., 'An English Language Question Answering System for a Large Relational Database", CACM, Vol. 21 #7, July 1978. - "An Experimental Parsing 22. (WOODS 73). woods, W. A., System for Transition Network Grammars", in Natural Language Processign, Courant Computer Science Symposium #8, R. Rustin (Ed,), Algorithmics Press, Inc., N.Y., 1973.