Syntax and Processing it:
Definite Clause Grammars in Prolog
(optional material)
John Barnden
School of Computer Science
University of Birmingham
Natural Language Processing 1
2014/15 Semester 2
DCGs: Introduction
• A way of writing syntactic recognizers and parsers directly in Prolog.
• We write Prolog rules of a special type. These look very much like CF grammar productions.
  Recognition or parsing happens by the normal Prolog computation process.
  Different structures can be recognized/created for the same sentence, by the normal alternative-answer process of Prolog: i.e., natural handling of syntactic ambiguity.
• In the parsing case, syntax trees are produced.
  Grammatical constraints such as agreement are also easy to include.
• The rules can be translated into ordinary Prolog, but with a lot of extra parameters that are tedious to write and that obscure the main information.
  The Prolog system does this translation automatically when it loads the rules.
• Caution: DCGs provide only top-down depth-first parsing, because of Prolog’s approach to using rules.
  But other strategies may be better. More on this later.
DCGs, contd: Recognition
• See link on Slides page to a toy recognizer in DCG that you can examine and play with.
• Example DCG rules for recognition of non-terminal categories:
    s --> np, vp.
    np --> noun, pp.
    np --> det, adj, noun, pp.
• Example DCG rules for recognition of terminal categories:
    det --> [a].
    det --> [an].
    det --> [the].
    noun --> [cat].
    noun --> [dog].
    noun --> [dogs].
    verb --> [dogs].
  (There is another, more economical method.)
• The program can be run in two ways:
    s([a, dog, sits, on, a, mat], []).
    phrase(s, [a, dog, sits, on, a, mat]).
  and likewise for other categories:
    np([a, dog], []).
    phrase(np, [a, dog]).
• The second argument for s, np etc. is for catching extra words:
    np([a, dog, sits, on, a], X).
  gives X = [sits, on, a].
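To make the calls above concrete, here is a minimal self-contained recognizer sketch. It is not the toy recognizer from the Slides page: the rules for np, vp and pp and the small word list are assumptions chosen so that the example queries go through.

```prolog
% Toy recognizer (assumed rules and lexicon, for illustration only).
s  --> np, vp.
np --> det, noun.
vp --> verb, pp.
pp --> prep, np.

det  --> [a].
det  --> [the].
noun --> [dog].
noun --> [mat].
verb --> [sits].
prep --> [on].

% Example queries:
% ?- phrase(s, [a, dog, sits, on, a, mat]).   % succeeds
% ?- s([a, dog, sits, on, a, mat], []).       % the same call, written out
% ?- np([a, dog, sits, on, a], X).            % X = [sits, on, a]
```

The last query shows the second argument collecting the words that np did not consume.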
Advantage of DCGs over ordinary Prolog
• Consider the abstract grammar rules
    S → NP VP
    NP → Det Noun
• Here’s how they could be implemented in ordinary Prolog (for just recognition, but syntax-tree constructing and grammatical-category checking [see later] can be added):
    s(WordList, Residue) :-
        np(WordList, Residue_to_pass_on), vp(Residue_to_pass_on, Residue).
    np(WordList, Residue) :-
        det(WordList, Residue_to_pass_on), noun(Residue_to_pass_on, Residue).
    det([the | Residue], Residue).
    noun([dog | Residue], Residue).
• Can be called as in:
    s([a, dog, sits, on, a, dog], []).
  Exercise: See ordinary-prolog version of the recognizer linked from Slides page.
• Compared to DCG form, we have the extra WordList and Residue arguments in every syntactic-category predicate. Tedious, error-prone.
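The comparison can be seen side by side in the following sketch. The `_plain` predicate names are hypothetical, used only so that the hand-written clauses do not clash with the predicates generated from the DCG rules; the one-word VP is also an assumption, to keep the example short.

```prolog
% Hand-written ordinary Prolog: every predicate threads two list arguments.
s_plain(Words, Rest)  :- np_plain(Words, Mid), vp_plain(Mid, Rest).
np_plain(Words, Rest) :- det_plain(Words, Mid), noun_plain(Mid, Rest).
vp_plain([sits | Rest], Rest).      % assumed one-word VP, for brevity
det_plain([the | Rest], Rest).
noun_plain([dog | Rest], Rest).

% The same grammar in DCG form: the two list arguments are added for us.
s    --> np, vp.
np   --> det, noun.
vp   --> [sits].
det  --> [the].
noun --> [dog].

% Both versions accept the same word list:
% ?- s_plain([the, dog, sits], []).
% ?- phrase(s, [the, dog, sits]).
```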
DCGs: Additions
• Can embed ordinary Prolog within grammar rules.
• Can use disjunction and cuts.
• Can add arguments to the category symbols (np, det, etc.) so as to
  – Build syntax trees, i.e. do parsing, not just recognition
  – Include “grammatical categories” (used to enforce constraints such as agreement)
  – Build semantic structures.
• Will see some of this in following slides.
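As a small sketch of the first two additions, the rules below use a disjunction in a rule body and an ordinary Prolog goal inside braces. The word lists are assumptions, and the example assumes a system such as SWI-Prolog where member/2 is provided by library(lists). Cuts are not shown here.

```prolog
:- use_module(library(lists)).   % for member/2 (assumes SWI-Prolog or similar)

np   --> det, noun.
det  --> [a] ; [the].                                  % disjunction in a rule body
noun --> [Word], { member(Word, [dog, cat, mat]) }.    % ordinary Prolog in braces

% ?- phrase(np, [the, cat]).    % succeeds
% ?- phrase(np, [the, sits]).   % fails: sits is not in the noun list
```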
DCGs: Parsing
• Add a parameter to each category symbol, delivering a node of the syntax tree:
    vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node).
    verb(verb_node(sits)) --> [sits].
• The program can again be run in two ways:
    s(ST, [a, dog, sits, on, a, mat], []).
    phrase(s(ST), [a, dog, sits, on, a, mat]).
• See links on Slides page to toy parsers in DCG that you can examine and play with.
  So far: “basic” parser1.
  An initial exercise: add new words and new NP rules.
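A complete small parser along these lines might look as follows. This is a sketch rather than the parser1 from the Slides page: the node names s_node, np_node, det_node, n_node, pp_node and prep_node, the rules for s, np and pp, and the lexicon are assumptions modelled on the vp and verb rules above.

```prolog
% Each category symbol carries one extra argument: its syntax-tree node.
s(s_node(NP_node, VP_node))       --> np(NP_node), vp(VP_node).
np(np_node(Det_node, N_node))     --> det(Det_node), noun(N_node).
vp(vp_node(Verb_node, PP_node))   --> verb(Verb_node), pp(PP_node).
pp(pp_node(Prep_node, NP_node))   --> prep(Prep_node), np(NP_node).

det(det_node(a))      --> [a].
det(det_node(the))    --> [the].
noun(n_node(dog))     --> [dog].
noun(n_node(mat))     --> [mat].
verb(verb_node(sits)) --> [sits].
prep(prep_node(on))   --> [on].

% ?- phrase(s(ST), [a, dog, sits, on, a, mat]).
% ST = s_node(np_node(det_node(a), n_node(dog)),
%             vp_node(verb_node(sits),
%                     pp_node(prep_node(on),
%                             np_node(det_node(a), n_node(mat))))).
```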
DCGs: Syntactic Ambiguity
• Suppose we add two extra rules:
    vp(vp_node(Verb_node, PP_node1, PP_node2)) -->
        verb(Verb_node), pp(PP_node1), pp(PP_node2).
    np(np_node(Det_node, N_node, PP_node)) -->
        det(Det_node), noun(N_node), pp(PP_node).
• Then we get two different structures for
    A dog sits on the mat with the flowers.
• Exercise:
  – Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms.
  – Try it out using the relevant parser on the Slides page.
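One way to check your answer to the exercise is to let Prolog count the alternative parses. The sketch below assumes a basic grammar (the s, pp and two-argument np and vp rules, plus a tiny lexicon) around the two extra rules; count_parses/2 is a hypothetical helper, not part of the course parsers.

```prolog
s(s_node(NP, VP))         --> np(NP), vp(VP).
np(np_node(Det, N))       --> det(Det), noun(N).
np(np_node(Det, N, PP))   --> det(Det), noun(N), pp(PP).       % extra rule
vp(vp_node(V, PP))        --> verb(V), pp(PP).
vp(vp_node(V, PP1, PP2))  --> verb(V), pp(PP1), pp(PP2).       % extra rule
pp(pp_node(P, NP))        --> prep(P), np(NP).

det(det_node(a))       --> [a].
det(det_node(the))     --> [the].
noun(n_node(dog))      --> [dog].
noun(n_node(mat))      --> [mat].
noun(n_node(flowers))  --> [flowers].
verb(verb_node(sits))  --> [sits].
prep(prep_node(on))    --> [on].
prep(prep_node(with))  --> [with].

% Collect every syntax tree Prolog finds on backtracking, and count them.
count_parses(Words, Count) :-
    findall(ST, phrase(s(ST), Words), Trees),
    length(Trees, Count).

% ?- count_parses([a, dog, sits, on, the, mat, with, the, flowers], N).
% N = 2.
```

Replacing count_parses with a plain phrase(s(ST), ...) query and asking for more answers with ; shows the individual trees.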
Terminals: A Better Implementation
• verb(verb_node(Word)) --> [Word], {verb_pred(Word)}.
  The part in braces is ordinary Prolog.
• Individual verbs are included as follows:
    verb_pred(sit).
    verb_pred(sits).
    verb_pred(hates).
• This is less writing per individual verb, and concentrates the node-building into one place.
• Looks possibly less efficient, because of the extra step.
  BUT in modern Prologs it speeds up execution:
  – by making the DCG terminal symbol call (verb in top line above) deterministic
  – by making the call of the lexical predicates (verb_pred, etc.) deterministic.
• Exercise: amend one of the toy parsers by using the above method.
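As a runnable sketch of this method, the verb rule and facts above are repeated below together with a noun rule built the same way. noun_pred/1 and its two facts are assumptions added by analogy; they are not taken from the course parsers.

```prolog
% One grammar rule per terminal category; the words live in plain facts.
verb(verb_node(Word)) --> [Word], { verb_pred(Word) }.
noun(n_node(Word))    --> [Word], { noun_pred(Word) }.   % hypothetical, by analogy

verb_pred(sit).
verb_pred(sits).
verb_pred(hates).

noun_pred(dog).
noun_pred(cat).

% ?- phrase(verb(Node), [hates]).   % Node = verb_node(hates)
% ?- phrase(verb(Node), [dog]).     % fails: dog is not a verb here
```

Adding a new verb now means adding one verb_pred fact, with no change to the grammar rule.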
Grammatical Categories
• A grammatical category is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English):
  – Number (singular or plural): lexically, nouns, verbs, determiners, numerals
  – Person (first, second and third): lexically, only for verbs, nouns and some pronouns
  – Tense (present, past (various forms), future): lexically, only for verbs
  – Gender (M, F, N [neither/neuter]): lexically, only some pronouns and some nouns
• Syntactic constituents can sometimes inherit grammatical category values from their components, e.g. (without showing all possible GC values):
  – the big dog: 3rd person M/F/N singular NP // the big dogs: 3rd person M/F/N plural NP
  – we in the carpet trade: 1st person M/F plural NP // you silly idiot: 2nd person M/F singular NP
  – eloped with the gym teacher: past-tense VP // will go: future-tense VP
  – the woman with the long hair: female NP // the radio with the red knobs: neuter NP
• A lexical or syntactic constituent can be ambiguous as to a GC value:
  e.g. sheep: singular/plural; manage: singular/plural 1st/2nd person
Grammatical Categories in DCGs, contd
• Or, using the better lexicon representation:
    noun(n_node(Word), gcs(numb(Numb), person(third))) -->
        [Word], {noun_pred(Word, Numb)}.
    noun_pred(dog, singular).
    noun_pred(dogs, plural).
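The sketch below runs this noun rule and also shows one possible way to encode a word that is ambiguous as to number, such as sheep: one noun_pred fact per reading. That encoding and the helper noun_numbers/2 are assumptions for illustration.

```prolog
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.

noun_pred(dog,   singular).
noun_pred(dogs,  plural).
noun_pred(sheep, singular).   % ambiguous word: one fact per reading (assumed encoding)
noun_pred(sheep, plural).

% Hypothetical helper: list every number value the noun rule allows for Word.
noun_numbers(Word, Numbs) :-
    findall(Numb, phrase(noun(_, gcs(numb(Numb), _)), [Word]), Numbs).

% ?- phrase(noun(Node, GCs), [dogs]).
% Node = n_node(dogs), GCs = gcs(numb(plural), person(third)).
% ?- noun_numbers(sheep, Ns).
% Ns = [singular, plural].
```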
Grammatical Categories in DCGs, contd
• Enforcing agreement in an NP syntax rule:
    np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc)) -->
        det(Det_node, gcs(Number_gc, Person_gc)),
        noun(N_node, gcs(Number_gc, Person_gc)).
  OR more simply, if you don’t need to enforce a particular shape to gcs(...):
    np(np_node(Det_node, N_node), GCs) -->
        det(Det_node, GCs), noun(N_node, GCs).
• Enforcing subject-NP / VP agreement (NB: doesn’t handle the case GC):
    s(s_node(NP_node, VP_node), GCs) -->
        np(NP_node, GCs), vp(VP_node, GCs).
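To see agreement at work inside an NP, here is a minimal sketch using the simpler np rule. The det rule, det_pred/2 and the lexicon are assumptions; in particular, leaving the number of the unspecified (an anonymous variable) is just one possible way to let it combine with either singular or plural nouns.

```prolog
% Sharing GCs between det and noun forces their values to unify.
np(np_node(Det_node, N_node), GCs) -->
    det(Det_node, GCs), noun(N_node, GCs).

det(det_node(Word), gcs(numb(Numb), person(_))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.

det_pred(a,   singular).
det_pred(the, _).             % "the" fits either number (assumed encoding)
noun_pred(dog,  singular).
noun_pred(dogs, plural).

% ?- phrase(np(_, GCs), [the, dogs]).   % GCs = gcs(numb(plural), person(third))
% ?- phrase(np(_, _), [a, dogs]).       % fails: singular det, plural noun
```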
Grammatical Categories in DCGs, contd
• Not enforcing agreement within part of a VP rule:
    vp(vp_node(Verb_node, PP_node), GCs) -->
        verb(Verb_node, GCs), pp(PP_node).
  OR if you needed PP to return some GCs that didn’t matter:
    vp(vp_node(Verb_node, PP_node), GCs) -->
        verb(Verb_node, GCs), pp(PP_node, _).
• Exercise: understand and play around with the GC version of the parser linked from Slides page.
• The program can again be run in two ways:
    s(ST, GCs, [a, dog, sits, on, a, mat], []).
    phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
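Putting the GC rules together gives a small parser that rejects subject/verb disagreement. This is a sketch under assumptions, not the GC parser from the Slides page: the pp, det, verb and prep rules, the three-argument verb_pred/3, and the simplified lexicon (for instance, sit is listed only as plural) are all made up for illustration.

```prolog
s(s_node(NP, VP), GCs)    --> np(NP, GCs), vp(VP, GCs).    % subject and VP must agree
np(np_node(Det, N), GCs)  --> det(Det, GCs), noun(N, GCs).
vp(vp_node(V, PP), GCs)   --> verb(V, GCs), pp(PP, _).     % the PP's GCs are ignored
pp(pp_node(P, NP), GCs)   --> prep(P), np(NP, GCs).

det(det_node(W), gcs(numb(Numb), person(_)))       --> [W], { det_pred(W, Numb) }.
noun(n_node(W), gcs(numb(Numb), person(third)))    --> [W], { noun_pred(W, Numb) }.
verb(verb_node(W), gcs(numb(Numb), person(Pers)))  --> [W], { verb_pred(W, Numb, Pers) }.
prep(prep_node(W))                                 --> [W], { prep_pred(W) }.

det_pred(a,   singular).
det_pred(the, _).
noun_pred(dog,  singular).
noun_pred(dogs, plural).
noun_pred(mat,  singular).
verb_pred(sits, singular, third).
verb_pred(sit,  plural,   _).      % simplified: singular 1st/2nd person omitted
prep_pred(on).

% ?- phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
%    succeeds with GCs = gcs(numb(singular), person(third))
% ?- phrase(s(ST, GCs), [the, dogs, sits, on, a, mat]).
%    fails: plural subject, singular verb
```

Note that the plural subject in "the dogs sit on a mat" is accepted even though the NP inside the PP is singular, because the vp rule discards the PP's GCs.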