Syntax and Processing it: Definite Clause Grammars in Prolog
(optional material)

John Barnden
School of Computer Science, University of Birmingham
Natural Language Processing 1, 2014/15 Semester 2

DCGs: Introduction

• A way of writing syntactic recognizers and parsers directly in Prolog.
• We write Prolog rules of a special type. These look very much like context-free (CF) grammar productions.
• Recognition or parsing happens by the normal Prolog computation process.
• Different structures can be recognized/created for the same sentence by the normal alternative-answer process of Prolog: i.e., natural handling of syntactic ambiguity.
• In the parsing case, syntax trees are produced. Grammatical constraints such as agreement are also easy to include.
• The rules can be translated into ordinary Prolog, but with a lot of extra parameters that are tedious to write and that obscure the main information. The Prolog system does this translation into normal Prolog automatically.
• Caution: DCGs provide only top-down depth-first parsing, because of Prolog's approach to using rules. But other strategies may be better. More on this later.

DCGs, contd: Recognition

• See link on Slides page to a toy recognizer in DCG that you can examine and play with.
• Example DCG rules for recognition of non-terminal categories:
    s --> np, vp.
    np --> noun, pp.
    np --> det, adj, noun, pp.
• Example DCG rules for recognition of terminal categories:
    det --> [a].
    det --> [an].
    det --> [the].
    noun --> [cat].
    noun --> [dog].
    noun --> [dogs].
    verb --> [dogs].
  (There is another, more economical method.)
• The program can be run in two ways:
    s([a, dog, sits, on, a, mat], []).
    phrase(s, [a, dog, sits, on, a, mat]).
• Other categories can be called the same way:
    np([a, dog], []).
    phrase(np, [a, dog]).
• The second argument for s, np etc. is for catching extra words:
    np([a, dog, sits, on, a], X).
  gives X = [sits, on, a].
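The recognition rules above can be assembled into a small self-contained program. The following is only a minimal sketch, not the toy recognizer linked from the Slides page: the rule np --> det, noun and all the vp, pp, prep, verb and mat entries are assumptions, added so that the example sentence is accepted.

```prolog
% Minimal DCG recognizer sketch. Rules marked "assumed" are not on the slides.
s  --> np, vp.
np --> det, noun.          % assumed
np --> det, noun, pp.      % assumed (slide version also has an adj)
vp --> verb.               % assumed
vp --> verb, pp.           % assumed
pp --> prep, np.           % assumed

% Terminal categories: each rule consumes exactly one word.
det  --> [a].
det  --> [the].
noun --> [dog].
noun --> [mat].            % assumed
verb --> [sits].           % assumed
prep --> [on].             % assumed

% Example queries:
% ?- phrase(s, [a, dog, sits, on, a, mat]).    % succeeds
% ?- s([a, dog, sits, on, a, mat], []).        % same call, written out
% ?- np([a, dog, sits, on, a], X).             % X = [sits, on, a]
```

Note that none of these rules is left-recursive. With top-down depth-first execution, a rule such as np --> np, pp would loop, which is one practical consequence of the caution on the Introduction slide.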

Advantage of DCGs over ordinary Prolog

• Consider the abstract grammar rules
    S → NP VP
    NP → Det Noun
• Here's how they could be implemented in ordinary Prolog (for just recognition, but syntax-tree constructing and grammatical-category checking [see later] can be added):
    s(WordList, Residue) :-
        np(WordList, Residue_to_pass_on),
        vp(Residue_to_pass_on, Residue).
    np(WordList, Residue) :-
        det(WordList, Residue_to_pass_on),
        noun(Residue_to_pass_on, Residue).
    det([the | Residue], Residue).
    noun([dog | Residue], Residue).
• Can be called as in:
    s([a, dog, sits, on, a, dog], []).
• Exercise: See the ordinary-Prolog version of the recognizer linked from Slides page.
• Compared to the DCG form, we have the extra WordList and Residue arguments in every syntactic-category predicate. Tedious, error-prone.

DCGs: Additions

• Can embed ordinary Prolog within grammar rules.
• Can use disjunction and cuts.
• Can add arguments to the category symbols (np, det, etc.) so as to
  – Build syntax trees, i.e. do parsing, not just recognition
  – Include "grammatical categories" (used to enforce constraints such as agreement)
  – Build semantic structures.
• Will see some of this in following slides.

DCGs: Parsing

• Add a parameter to each category symbol, delivering a node of the syntax tree:
    vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node).
    verb(verb_node(sits)) --> [sits].
• The program can again be run in two ways:
    s(ST, [a, dog, sits, on, a, mat], []).
    phrase(s(ST), [a, dog, sits, on, a, mat]).
• See links on Slides page to toy parsers in DCG that you can examine and play with. So far: "basic" parser1. An initial exercise: add new words and new NP rules.

DCGs: Syntactic Ambiguity

• Suppose we add two extra rules:
    vp(vp_node(Verb_node, PP_node1, PP_node2)) -->
        verb(Verb_node), pp(PP_node1), pp(PP_node2).
    np(np_node(Det_node, N_node, PP_node)) -->
        det(Det_node), noun(N_node), pp(PP_node).
• Then we get two different structures for
    A dog sits on the mat with the flowers.
• Exercise:
  – Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms.
  – Try it out using the relevant parser on the Slides page.

Terminals: A Better Implementation

• Use a single rule per terminal category:
    verb(verb_node(Word)) --> [Word], {verb_pred(Word)}.
  The part in braces is ordinary Prolog.
• Individual verbs are included as follows:
    verb_pred(sit).
    verb_pred(sits).
    verb_pred(hates).
• This is less writing per individual verb, and concentrates the node-building into one place.
• Looks possibly less efficient, because of the extra step. BUT in modern Prologs it speeds up execution: the call of the lexical predicates (verb_pred, etc.) becomes deterministic, which in turn makes the DCG terminal-symbol call (verb in the top line above) deterministic.
• Exercise: amend one of the toy parsers by using the above method.

Grammatical Categories

• A grammatical category is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English):
    Number   singular or plural: lexically, nouns, verbs, determiners, numerals
    Person   first, second and third: lexically, only for verbs, nouns and some pronouns
    Tense    present, past (various forms), future: lexically, only for verbs
    Gender   M, F, N [neither/neuter]: lexically, only some pronouns and some nouns
• Syntactic constituents can sometimes inherit grammatical category values from their components, e.g. (without showing all possible GC values):
    the big dog: 3rd person M/F/N singular NP
    the big dogs: 3rd person M/F/N plural NP
    we in the carpet trade: 1st person M/F plural NP
    you silly idiot: 2nd person M/F singular NP
    eloped with the gym teacher: past-tense VP
    will go: future-tense VP
    the woman with the long hair: female NP
    the radio with the red knobs: neuter NP
• A lexical or syntactic constituent can be ambiguous as to a GC value: e.g.
    sheep: singular/plural; manage: singular/plural 1st/2nd person

Grammatical Categories in DCGs, contd

• Or, using the better lexicon representation:
    noun(n_node(Word), gcs(numb(Numb), person(third))) -->
        [Word], {noun_pred(Word, Numb)}.
    noun_pred(dog, singular).
    noun_pred(dogs, plural).

Grammatical Categories in DCGs, contd

• Enforcing agreement in an NP syntax rule:
    np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc)) -->
        det(Det_node, gcs(Number_gc, Person_gc)),
        noun(N_node, gcs(Number_gc, Person_gc)).
  OR more simply, if we don't need to enforce a particular shape for gcs(...):
    np(np_node(Det_node, N_node), GCs) -->
        det(Det_node, GCs), noun(N_node, GCs).
• Enforcing subject-NP / VP agreement (NB: doesn't handle the case GC):
    s(s_node(NP_node, VP_node), GCs) -->
        np(NP_node, GCs), vp(VP_node, GCs).

Grammatical Categories in DCGs, contd

• Not enforcing agreement within part of a VP rule:
    vp(vp_node(Verb_node, PP_node), GCs) -->
        verb(Verb_node, GCs), pp(PP_node).
  OR if you needed PP to return some GCs that didn't matter:
    vp(vp_node(Verb_node, PP_node), GCs) -->
        verb(Verb_node, GCs), pp(PP_node, _).
• Exercise: understand and play around with the GC version of the parser linked from Slides page.
• The program can again be run in two ways:
    s(ST, GCs, [a, dog, sits, on, a, mat], []).
    phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
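
The agreement rules above can be put together into a small runnable parser. This is a minimal sketch under stated assumptions, not the GC parser linked from the Slides page: the det_pred, verb_pred and prep_pred predicates, the two-argument form of verb_pred, the pp and prep rules, and every lexicon entry except noun_pred(dog, ...) and noun_pred(dogs, ...) are illustrative. Person is fixed to third throughout for simplicity.

```prolog
% Sketch of a DCG parser with number agreement (assumed lexicon and helpers).
s(s_node(NP, VP), GCs)    --> np(NP, GCs), vp(VP, GCs).     % subject/VP agree
np(np_node(Det, N), GCs)  --> det(Det, GCs), noun(N, GCs).  % det/noun agree
vp(vp_node(V), GCs)       --> verb(V, GCs).
vp(vp_node(V, PP), GCs)   --> verb(V, GCs), pp(PP).         % PP not constrained
pp(pp_node(P, NP))        --> prep(P), np(NP, _).           % NP's GCs ignored

% One rule per terminal category; braces hold ordinary Prolog lexicon lookups.
det(det_node(W), gcs(numb(Numb), person(third)))   --> [W], {det_pred(W, Numb)}.
noun(n_node(W), gcs(numb(Numb), person(third)))    --> [W], {noun_pred(W, Numb)}.
verb(verb_node(W), gcs(numb(Numb), person(third))) --> [W], {verb_pred(W, Numb)}.
prep(prep_node(W)) --> [W], {prep_pred(W)}.

% Lexicon (assumed, apart from dog/dogs).
det_pred(a, singular).
det_pred(the, _).            % "the" is compatible with either number
noun_pred(dog, singular).
noun_pred(dogs, plural).
noun_pred(mat, singular).
verb_pred(sits, singular).
verb_pred(sit, plural).
prep_pred(on).

% Example queries:
% ?- phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
%    GCs = gcs(numb(singular), person(third))
% ?- phrase(s(_, _), [the, dogs, sits, on, a, mat]).   % fails: number clash
```

Agreement costs nothing beyond sharing the GCs variable: unification rejects a sentence as soon as two constituents bind numb(...) to different values. Leaving the number of "the" unbound is one simple way to represent a word that is ambiguous as to a GC value.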