PADL Talk 2008-01-04

1. My name is Richard Frost. My co-authors are Rahmatullah Hafiz from the University of Windsor in Canada, and Paul Callaghan from the University of Durham in the U.K. Some of you may remember Paul’s work on the LOLITA natural-language processing system, which for a few years was the largest program written in a lazy functional programming language.

2. The work described in our paper continues a project that began in 1985, when I was working on natural-language database-query processing at the University of Glasgow. At that time I met two professors who had a significant influence on my subsequent work. One was a retired Professor of Linguistics whom I met in the Faculty Club adjoining the cloisters. The other had just been appointed as one of the youngest Full Professors of Computer Science in the U.K. I met him in the newly renovated Computer Science building in Lilybank Gardens.

3. The linguist introduced me to a compositional semantics for natural language that Richard Montague had developed in the late sixties. One feature of Montague semantics is that all phrases of the same syntactic category (such as “every moon” and “mars”) have denotations of the same semantic type. Here the denotation of “every moon”, given in red, is of the same type as the denotation of the proper noun “mars”.

4. Montague semantics is polymorphic. For example, the denotation of the word “and” can be used when conjoining nouns, verbs, and term phrases. Montague also accounts for intensionality and modality, allowing the interpretation of phrases such as the title of this slide. The semantics is also highly compositional, with one exception: Montague did not provide a direct denotation for transitive verbs. Montague semantics also had a few other shortcomings that needed to be addressed before it could be used as the basis for natural-language database-query processing.

5.
The young professor whom I met at Glasgow has not changed much over the years, and I am sure that many of you will recognize him. I attended a few of John Hughes’ lectures and was quickly convinced that functional programming was ideally suited to my research. For example, according to Montague, the sentence “Hall discovered Phobos” would, after a convoluted process, be interpreted as “discover_pred (hall, Phobos)”. However, we can easily derive a direct denotation for transitive verbs, as shown here.

6. We now have a straightforward compositional method for interpreting phrases involving transitive verbs, as illustrated on this slide. In addition, John introduced me to higher-order functions, which can be used to construct complex parsers from simpler components. Such functions are now commonly known as “parser combinators”.

7. From 1987 to 1989, I extended the combinators to accommodate simple ambiguous attribute grammars, converted a subset of Montague semantics to a more efficient form, and, together with John Launchbury, built a prototype database query processor in Miranda.

8. During the last few years, I have improved the efficiency of the combinators and have encapsulated them in an attribute grammar programming environment. I have also extended the denotational semantics to accommodate arbitrarily nested quantification and negation. Recently, we have created some natural-language applications and have deployed them in a Public-Domain SpeechWeb.

9. Here is an extract from an application which uses our old combinators. The notation is a bit messy and we are improving it in our Haskell implementation. The dictionary consists of a list of words, each with its syntactic category and a list of attributes which constitute its meaning. The rules of the attribute grammar define the context-free structure of compound syntactic categories, together with rules defining the relationships between attributes.
Here a determiner phrase is defined as consisting of an indefinite pronoun, or else a determiner followed by a nounclause. In the latter case, the value of the detphrase is obtained by applying the function applydet to the value of the determiner and the value of the nounclause. The dictionary can be extended by defining new words in terms of phrases already covered by the grammar. Although the application can evaluate relatively complex queries, it does not accommodate left recursion and therefore misses some parses for ambiguous input. It does not allow “attribute inheritance from the right” for ambiguous grammars, and it does not represent parses efficiently.

10. Why are we doing all this? Our long-term goal is to create tools and techniques that will facilitate the construction of natural-language applications to be deployed on a Public-Domain SpeechWeb, the framework for which we have already created.

11. Our current objective is to extend our combinators to accommodate left recursion, allow fully general attribute dependencies, and represent parse trees efficiently.

12. Our PADL paper describes the progress that we have made. Here is the Haskell code for an example parser constructed with our new combinators. Note that the rule for sentence, s, is left recursive. Note also that we have not yet incorporated attribute rules (we are currently doing that).

13. This shows part of the output when the parser s is applied to an ambiguous sentence. It shows that a sentence has been recognized starting at position 1 and ending at position 5, consisting of a noun phrase followed by a verb phrase. Two other sentences have been recognized, both finishing at position 8. The latter of these consists of a sentence from position 1 to 5 followed by a prepositional phrase starting at 5 and finishing at 8. Note the sharing of the parse tree for the sentence finishing at position 5. More explanation is given in the paper.

14.
We have maintained the modularity associated with top-down parser combinators, and can parse ambiguous left-recursive grammars in polynomial time and space. We have tested our approach on grammars used by Tomita. We have also tested our approach on small, massively ambiguous grammars from Aho and Ullman. Finally, we have tested our approach on a natural-language grammar with 5,200 rules.

15. Our new parsing method makes use of techniques developed by other researchers over the last forty years. We accommodate left recursion by curtailing recursive descent when no parse is possible. Our combinators are based on a top-down approach, and we use Wadler’s notion of “failure as an empty list of successes”. We represent forests of parse trees in a similar way to Tomita. We achieve polynomial time complexity for ambiguous grammars through memoization. We had to develop a new technique to ensure the correct reuse of results in the presence of indirect left recursion. We use monads to structure our implementation.

16. We have found a name for our project, “X-SAIGA”, for executable specifications of grammars, and have constructed a website containing code, examples, documentation, and proofs of termination and complexity.

17. We have begun to extend the new combinators to accommodate fully general attribute dependencies. We are also extending our compositional semantics to accommodate general transitive verbs; our approach is based on binary relations. We are also encouraging the use of our tools to create content for our SpeechWeb, and have recruited a number of students who have just begun work on this.
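
To make the combinator style referred to in note 15 more concrete, here is a minimal Haskell sketch of top-down parser combinators built on “failure as an empty list of successes”. It is not the X-SAIGA code: the names Tree, Parser, word, rule, orelse and parses, and the toy grammar and vocabulary, are all assumptions made for illustration. The sketch has no memoization and no handling of left recursion, so the grammar is deliberately written without left-recursive rules.

```haskell
-- Illustrative sketch only: hypothetical names, not the X-SAIGA API.

-- A parse tree: a word, or a labelled node with children.
data Tree = Leaf String | Node String [Tree]
  deriving (Eq, Show)

-- A parser maps the remaining tokens to every (tree, leftover tokens) pair.
-- An empty list means failure; more than one entry means ambiguity.
type Parser = [String] -> [(Tree, [String])]

-- Recognise a single token belonging to the given vocabulary.
word :: String -> [String] -> Parser
word cat vocab (t : ts)
  | t `elem` vocab = [(Node cat [Leaf t], ts)]
word _ _ _ = []

-- Alternative: keep the successes of both parsers.
orelse :: Parser -> Parser -> Parser
orelse p q inp = p inp ++ q inp

-- Sequence: run the parsers left to right, feeding each leftover
-- into the next parser, and label the resulting subtree.
rule :: String -> [Parser] -> Parser
rule cat ps inp = [(Node cat kids, rest) | (kids, rest) <- go ps inp]
  where
    go [] i = [([], i)]
    go (q : qs) i = [(k : ks, r') | (k, r) <- q i, (ks, r') <- go qs r]

-- A toy vocabulary (assumed for this example).
pron, det, noun, verb, prep :: Parser
pron = word "pron" ["i"]
det  = word "det"  ["a", "the"]
noun = word "noun" ["man", "park"]
verb = word "verb" ["saw"]
prep = word "prep" ["in"]

-- A small ambiguous grammar with no left recursion.
s, np, vp, pp :: Parser
s  = rule "s" [np, vp]
np = rule "np" [pron] `orelse` rule "np" [det, noun] `orelse` rule "np" [det, noun, pp]
vp = rule "vp" [verb, np] `orelse` rule "vp" [verb, np, pp]
pp = rule "pp" [prep, np]

-- All parses that consume the whole input.
parses :: Parser -> String -> [Tree]
parses p str = [t | (t, []) <- p (words str)]
```

In GHCi, `length (parses s "i saw a man in the park")` evaluates to 2, one parse for each attachment of the prepositional phrase, while `parses s "saw i"` is the empty list. Adding a left-recursive rule such as an np that begins with np would make this naive version loop forever, which is the limitation the new combinators are described as removing.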