PADL Talk 2008-01-04 - School of Computer Science


1. My name is Richard Frost. My co-authors are Rahmatullah
Hafiz from the University of Windsor in Canada, and Paul
Callaghan from the University of Durham in the U.K. Some
of you may remember Paul’s work on the LOLITA natural
language processing system, which for a few years was the
largest program written in a lazy functional programming
language.
2. The work that is described in our paper is a continuation of a
project which began in 1985 when I was working on natural-language
database-query processing at the University of
Glasgow. At that time I met two professors who had a
significant influence on my subsequent work. One was a
retired Professor of Linguistics whom I met in the Faculty Club
adjoining the cloisters. The other had just been appointed as
one of the youngest Full Professors of Computer Science in
the U.K. I met him in the newly renovated Computer Science
building located in Lilybank Gardens.
3. The linguist introduced me to a compositional semantics for
natural language that had been developed by Richard
Montague in the late sixties. One feature of Montague
semantics is that all phrases of the same syntactic category
(such as “every moon” and “mars”) have denotations of the
same semantic type. Here the denotation of “every moon”,
given in red, is of the same type as the denotation of the
proper noun “mars”.
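To make the typing point concrete, here is a minimal extensional sketch in Haskell. It assumes a toy model in which entities are strings and predicates are characteristic functions; the model, the entity names, and the function names are hypothetical, and the sketch ignores the intensional part of Montague's semantics.

```haskell
-- Hypothetical toy model: entities are plain strings.
type Entity = String

-- Predicates (nouns, intransitive verbs) are characteristic functions.
type Pred = Entity -> Bool

-- Every term phrase, whether a proper noun or a quantified phrase,
-- denotes a function from predicates to truth values.
type TermPhrase = Pred -> Bool

universe :: [Entity]
universe = ["mars", "earth", "phobos", "deimos", "hall"]

moon, spins :: Pred
moon e = e `elem` ["phobos", "deimos"]
spins e = e `elem` ["mars", "earth", "phobos", "deimos"]

-- A proper noun denotes the set of properties its bearer has.
mars :: TermPhrase
mars p = p "mars"

-- "every" maps a noun denotation to a term-phrase denotation.
every :: Pred -> TermPhrase
every noun p = all p (filter noun universe)

-- "every moon" has exactly the same type as "mars".
everyMoon :: TermPhrase
everyMoon = every moon
```

In GHCi, both mars spins and everyMoon spins evaluate to True, and either phrase can be passed to any function that expects a TermPhrase.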
4. Montague semantics is polymorphic. For example, the
denotation of the word “and” can be used when conjoining
nouns, verbs, and term phrases. Montague also accounts for
intensionality and modality, allowing the interpretation of
phrases such as the title of this slide. The semantics is also
highly compositional, with one exception: Montague did not
provide a direct denotation for transitive verbs. Montague
semantics also had a few other shortcomings, which needed to
be addressed before it could be used as the basis for natural-language database-query processing.
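A small illustration of that polymorphism, assuming a toy model in which entities are strings: if nouns and intransitive verbs denote predicates over entities, and term phrases denote predicates over predicates, then a single pointwise conjunction serves all of them. The name conj and the toy facts are hypothetical.

```haskell
type Entity = String

-- One polymorphic "and": conjoin any two predicates over the same type a.
conj :: (a -> Bool) -> (a -> Bool) -> (a -> Bool)
conj f g x = f x && g x

-- Nouns and intransitive verbs: predicates over entities (a = Entity).
moon, spins, orbitsMars :: Entity -> Bool
moon e = e `elem` ["phobos", "deimos"]
spins e = e `elem` ["mars", "earth", "phobos", "deimos"]
orbitsMars e = e `elem` ["phobos", "deimos"]

-- Term phrases: predicates over predicates (a = Entity -> Bool).
phobos, deimos, mars :: (Entity -> Bool) -> Bool
phobos p = p "phobos"
deimos p = p "deimos"
mars p = p "mars"
```

Here conj moon spins "phobos" ("phobos is a moon and spins") and conj phobos deimos orbitsMars ("phobos and deimos orbit mars") both use the same function at different types.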
5. The young professor whom I met at Glasgow has not changed
much over the years, and I am sure that many of you will
recognize him. I attended a few of John Hughes’ lectures and
I was quickly convinced that functional programming was
ideally suited to my research. For example, according to
Montague, the sentence “Hall discovered Phobos” would,
after a convoluted process, be interpreted as “discover_pred
(hall, Phobos)”. However, we can easily derive a direct
denotation for transitive verbs, as shown here.
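A direct denotation of this kind can be sketched in Haskell as follows, again assuming a hypothetical model with string entities. The verb takes the denotation of its object term phrase and returns a predicate on the subject, so a quantified object such as “every moon” needs no special treatment. discoverRel and its contents are invented for illustration.

```haskell
type Entity = String
type Pred = Entity -> Bool
type TermPhrase = Pred -> Bool

universe :: [Entity]
universe = ["mars", "phobos", "deimos", "hall"]

-- Hypothetical extension of "discovered" as (discoverer, discovered) pairs.
discoverRel :: [(Entity, Entity)]
discoverRel = [("hall", "phobos"), ("hall", "deimos")]

discoverPred :: (Entity, Entity) -> Bool
discoverPred pair = pair `elem` discoverRel

-- Direct denotation of the transitive verb: apply the object term phrase
-- to the set of things that the subject x discovered.
discovered :: TermPhrase -> Pred
discovered obj x = obj (\y -> discoverPred (x, y))

hall, phobos, mars :: TermPhrase
hall p = p "hall"
phobos p = p "phobos"
mars p = p "mars"

moon :: Pred
moon e = e `elem` ["phobos", "deimos"]

every :: Pred -> TermPhrase
every noun p = all p (filter noun universe)
```

Evaluating hall (discovered phobos) reduces to phobos (\y -> discoverPred ("hall", y)) and then to discoverPred ("hall", "phobos"), which is True; hall (discovered (every moon)) is also True, while hall (discovered mars) is False.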
6. We now have a straightforward compositional method for
interpreting phrases involving transitive verbs, as illustrated
on this slide. In addition, John introduced me to higher-order
functions, which can be used to construct complex parsers
from simpler components. Such functions are now commonly
known as “parser combinators”.
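As a reminder of the idea, here is a minimal sketch of such combinators in Haskell, using the “list of successes” representation mentioned later in the talk: a parser returns every (result, remaining input) pair, and failure is the empty list. The names term, orelse, thenP, and using are chosen for illustration and are only loosely modeled on the combinators discussed in this talk.

```haskell
-- A parser maps the remaining input (a list of words) to a list of
-- successes: (result, remaining input) pairs. Failure is the empty list.
type Parser a = [String] -> [(a, [String])]

-- Recognise one specific word.
term :: String -> Parser String
term w (t : ts) | t == w = [(w, ts)]
term _ _ = []

-- Alternation: keep the successes of both parsers, so ambiguity is preserved.
orelse :: Parser a -> Parser a -> Parser a
orelse p q inp = p inp ++ q inp

-- Sequencing: run q on every remainder left by p and pair the results.
thenP :: Parser a -> Parser b -> Parser (a, b)
thenP p q inp = [((x, y), r2) | (x, r1) <- p inp, (y, r2) <- q r1]

-- Semantic action: apply a function to every result.
using :: Parser a -> (a -> b) -> Parser b
using p f inp = [(f x, r) | (x, r) <- p inp]

-- Example:
--   thenP (term "every") (term "moon") ["every", "moon", "spins"]
--     == [(("every", "moon"), ["spins"])]
```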
7. From 1987 to 1989, I extended the combinators to
accommodate simple ambiguous attribute grammars,
converted a subset of Montague semantics to a more efficient
form, and, together with John Launchbury, built a prototype
database query processor in Miranda.
8. During the last few years, I have improved the efficiency of
the combinators and have encapsulated them in an attribute
grammar programming environment. I have also extended
the denotational semantics to accommodate arbitrarily nested
quantification and negation. Recently, we have created some
natural-language applications and have deployed them in a
Public-Domain SpeechWeb.
9. Here is an extract from an application which uses our old
combinators. The notation is a bit messy and we are
improving it in our Haskell implementation.
The dictionary consists of a list of words together with their
syntactic category and a list of attributes which constitute
their meaning.
The rules of the attribute grammar define the context-free
structure of compound syntactic categories, together with rules
defining the relationships between attributes.
Here a determiner phrase is defined as consisting of an
indefinite pronoun, or else a determiner followed by a
nounclause. In the latter case, the value of the detphrase is
obtained by applying the function applydet to the value of the
determiner and the value of the nounclause.
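A rough Haskell rendering of that rule is sketched below. It is not the Miranda code from the application: it assumes a dictionary of (word, category, meaning) triples, a tagged union Attr for the different semantic types, and a nounclause simplified to a single common noun. Attr, cat, and the toy dictionary entries are hypothetical.

```haskell
type Entity = String
type Pred = Entity -> Bool
type TermPhrase = Pred -> Bool

-- Hypothetical attribute values: one constructor per semantic type used here.
data Attr = NounA Pred | DetA (Pred -> TermPhrase) | TermA TermPhrase

-- List-of-successes parsers over a list of words.
type Parser a = [String] -> [(a, [String])]

universe :: [Entity]
universe = ["mars", "earth", "phobos", "deimos"]

-- Dictionary: each word with its syntactic category and its meaning.
dictionary :: [(String, String, Attr)]
dictionary =
  [ ("moon", "cnoun", NounA (`elem` ["phobos", "deimos"]))
  , ("planet", "cnoun", NounA (`elem` ["mars", "earth"]))
  , ("every", "det", DetA (\n p -> all p (filter n universe)))
  , ("a", "det", DetA (\n p -> any p (filter n universe)))
  , ("something", "indefpron", TermA (\p -> any p universe))
  ]

-- Parse one word of the given category, yielding its meaning.
cat :: String -> Parser Attr
cat c (w : ws) = [(a, ws) | (w', c', a) <- dictionary, w' == w, c' == c]
cat _ [] = []

orelse :: Parser a -> Parser a -> Parser a
orelse p q inp = p inp ++ q inp

thenP :: Parser a -> Parser b -> Parser (a, b)
thenP p q inp = [((x, y), r2) | (x, r1) <- p inp, (y, r2) <- q r1]

using :: Parser a -> (a -> b) -> Parser b
using p f inp = [(f x, r) | (x, r) <- p inp]

-- Semantic rule: apply the determiner's meaning to the noun clause's meaning.
applydet :: (Attr, Attr) -> Attr
applydet (DetA d, NounA n) = TermA (d n)
applydet _ = error "applydet: unexpected attribute types"

nounclause :: Parser Attr
nounclause = cat "cnoun"

-- detphrase ::= indefpron | det nounclause
detphrase :: Parser Attr
detphrase = cat "indefpron" `orelse` ((cat "det" `thenP` nounclause) `using` applydet)
```

Parsing ["every", "moon"] with detphrase yields a single TermA whose function, applied to a predicate, tests whether every moon satisfies it.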
The dictionary can be extended by defining new words in
terms of phrases already covered by the grammar.
Although the application can evaluate relatively complex
queries, it does not accommodate left recursion and therefore
misses some parses for ambiguous input. It does not allow
“attribute inheritance from the right” for ambiguous
grammars, and it does not represent parses efficiently.
Why are we doing all this? Our long-term goal is to
create tools and techniques that will facilitate the
construction of natural-language applications to be deployed
on a Public-Domain SpeechWeb, for which we have already
created the framework.
Our current objective is to extend our combinators to
accommodate left recursion, allow fully-general attribute
dependencies, and represent parse trees efficiently.
Our PADL paper describes the progress that we have
made. Here is the Haskell code for an example parser
constructed with our new combinators. Note that the rule for
sentence, s, is left recursive. Note also that we have not yet
incorporated attribute rules (we are currently doing that).
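The paper's parser is not reproduced here. As a rough illustration of why a left-recursive rule needs special treatment, and of the curtailment idea described later in the talk, the sketch below counts how often each nonterminal has been entered at each input position on the current path. It assumes the grammar has no empty productions and no cycles, so each level of left recursion at a position must eventually consume at least one token; a nonterminal entered at the same position more times than there are tokens left can therefore be cut off. Unlike the combinators described in the talk, it does no memoization (so it can take exponential time) and builds ordinary trees rather than a shared forest. The toy grammar, lexicon, and the names nt, word, thenP, and one are hypothetical.

```haskell
import qualified Data.Map as Map

-- Context: how many times each nonterminal has been entered at each
-- position on the current path. A position is identified by the number
-- of tokens still to be consumed.
type Ctx = Map.Map (String, Int) Int

type Parser a = Ctx -> [String] -> [(a, [String])]

data Tree = Leaf String | Node String [Tree] deriving (Eq, Show)

-- Recognise one word belonging to a lexical category.
word :: String -> [String] -> Parser Tree
word c ws _ (t : ts) | t `elem` ws = [(Node c [Leaf t], ts)]
word _ _ _ _ = []

orelse :: Parser a -> Parser a -> Parser a
orelse p q ctx inp = p ctx inp ++ q ctx inp

-- Sequence two parsers; their results become the children of one node.
thenP :: Parser Tree -> Parser Tree -> Parser [Tree]
thenP p q ctx inp = [([x, y], r2) | (x, r1) <- p ctx inp, (y, r2) <- q ctx r1]

one :: Parser Tree -> Parser [Tree]
one p ctx inp = [([x], r) | (x, r) <- p ctx inp]

-- A named nonterminal with curtailment: fail once it has been entered at
-- this position more times than there are tokens left.
nt :: String -> Parser [Tree] -> Parser Tree
nt name body ctx inp
  | count > length inp = []
  | otherwise = [(Node name kids, r) | (kids, r) <- body (Map.insert key count ctx) inp]
  where
    key = (name, length inp)
    count = Map.findWithDefault 0 key ctx + 1

noun, det, verb, prep :: Parser Tree
noun = word "noun" ["i", "man", "park"]
det = word "det" ["a", "the"]
verb = word "verb" ["saw"]
prep = word "prep" ["in"]

-- Left-recursive toy grammar:
--   s  ::= np vp | s pp        np ::= noun | det noun | np pp
--   vp ::= verb np             pp ::= prep np
s, np, vp, pp :: Parser Tree
s = nt "s" ((np `thenP` vp) `orelse` (s `thenP` pp))
np = nt "np" (one noun `orelse` (det `thenP` noun) `orelse` (np `thenP` pp))
vp = nt "vp" (verb `thenP` np)
pp = nt "pp" (prep `thenP` np)
```

Applying s Map.empty to ["i", "saw", "a", "man", "in", "the", "park"] terminates and returns two complete parses (those with no remaining input): one attaches the prepositional phrase to “a man”, the other to the whole sentence. Without the count check, the call to s inside s would loop forever.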
This shows part of the output when the parser s is
applied to an ambiguous sentence. This shows that a sentence
has been recognized starting at position 1 and ending at
position 5, consisting of a nounphrase followed by a
verbphrase. Two other sentences have been recognized, both
finishing at position 8. The latter of these consists of a
sentence from position 1 to 5 followed by a prepositional
phrase starting at 5 and finishing at 8. Note the sharing of the
parse tree for the sentence finishing at position 5. More
explanation is given in the paper.
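One way to picture this sharing is a packed forest in which each node is identified by its label and the span it covers, and each alternative lists its children by reference. The sketch below hand-encodes such a forest for the hypothetical sentence “i saw a man in the park”, with positions counted from 0 rather than from 1 as on the slide; it illustrates the idea and is not the data structure used in the paper. Tree counting uses a lazily built table, so each shared node is counted once.

```haskell
import qualified Data.Map as Map

-- A node is identified by its label and the span (start, end) it covers.
type Key = (String, Int, Int)

-- Packed forest: each node maps to its alternative lists of children.
-- Children are referenced by key, so a shared subtree is stored once.
-- Keys that are not in the table (here, the words) are treated as leaves.
type Forest = Map.Map Key [[Key]]

-- Hypothetical forest for "i saw a man in the park" (positions 0..7).
toyForest :: Forest
toyForest = Map.fromList
  [ (("s", 0, 7), [ [("np", 0, 1), ("vp", 1, 7)]
                  , [("s", 0, 4), ("pp", 4, 7)] ])
  , (("s", 0, 4), [[("np", 0, 1), ("vp", 1, 4)]])
  , (("np", 0, 1), [[("noun", 0, 1)]])
  , (("vp", 1, 7), [[("verb", 1, 2), ("np", 2, 7)]])
  , (("vp", 1, 4), [[("verb", 1, 2), ("np", 2, 4)]])
  , (("np", 2, 7), [[("np", 2, 4), ("pp", 4, 7)]])
  , (("np", 2, 4), [[("det", 2, 3), ("noun", 3, 4)]])
  , (("pp", 4, 7), [[("prep", 4, 5), ("np", 5, 7)]])
  , (("np", 5, 7), [[("det", 5, 6), ("noun", 6, 7)]])
  ]

-- Number of distinct trees rooted at a node. The lazy table 'counts'
-- refers to itself, so each shared node is counted only once.
-- Assumes the forest is acyclic.
countTrees :: Forest -> Key -> Integer
countTrees forest = look
  where
    counts = Map.map (\alts -> sum [product (map look kids) | kids <- alts]) forest
    look k = Map.findWithDefault 1 k counts
```

Here np (2, 4) and pp (4, 7) are each stored once, yet both occur in both readings of the sentence; countTrees toyForest ("s", 0, 7) gives 2.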
We have maintained the modularity associated with
top-down parser combinators, and can parse ambiguous left-recursive grammars in polynomial time and space. We have
tested our approach on grammars used by Tomita. We have
also tested our approach on small, massively ambiguous
grammars from Aho and Ullman. Finally, we have tested our
approach on a natural-language grammar with 5,200 rules.
Our new parsing method makes use of techniques
developed by other researchers over the last forty years. We
accommodate left recursion by curtailing recursive descent
when no parse is possible. Our combinators are based on a
top-down approach, and we use Wadler’s notion of “failure
as an empty list of successes”. We represent forests of parse
trees in a similar way to Tomita. We achieve polynomial time
complexity for ambiguous grammars through memoization.
We had to develop a new technique to ensure the correct
reuse of results in the presence of indirect left recursion. We
use monads to structure our implementation.
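The effect of memoization on an ambiguous grammar can be seen in isolation with a small recognizer. The sketch assumes the hypothetical grammar s ::= 'a' s s | 'a', which has exponentially many parses of a long string of a's; for each start position it records the set of positions where an s can end, computed once in a lazily built table. The grammar is not left recursive, so this sidesteps the harder problem the talk describes: combining memo tables with curtailment so that results are reused correctly under indirect left recursion.

```haskell
import Data.Array (listArray, (!))
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Recognizer for the hypothetical ambiguous grammar  s ::= 'a' s s | 'a'.
-- endsFrom input i lists every position at which an s starting at i can end.
endsFrom :: String -> Int -> [Int]
endsFrom input = look
  where
    n = length input
    tokens = listArray (0, n - 1) input
    isA i = i < n && tokens ! i == 'a'

    -- Memo table: one lazily evaluated entry per start position, so the
    -- result for each position is computed at most once.
    table = Map.fromList [(i, parseS i) | i <- [0 .. n]]
    look i = Map.findWithDefault [] i table

    parseS i
      | isA i = Set.toAscList (Set.fromList (single ++ recursive))
      | otherwise = []
      where
        single = [i + 1]
        recursive = [k | j <- look (i + 1), k <- look j]
```

For "aaaaaaa", endsFrom gives [1, 3, 5, 7] from position 0. Without the table, the recursive alternative would re-derive the same sub-results many times over.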
We have named our project “X-SAIGA”, for
executable specifications of grammars, and have constructed
a website containing code, examples, documentation, and
proofs of termination and complexity.
We have begun to extend the new combinators to
accommodate fully-general attribute dependencies. We are
also extending our compositional semantics to accommodate
general transitive verbs; our approach is based on binary relations.
We are also encouraging the use of our tools to create content
for our SpeechWeb, and have recruited a number of students
who have just begun work on this.