CC437 Lab 7: Parsing
12th December 2004

The goal of this lab is to get some practice with parsing, using the Connexor `Machinese Syntax’ tool.

1. Machinese Syntax

Machinese Syntax is the latest name for the software developed by the Finnish company Connexor (formerly Lingsoft), which also developed the hand-coded approach to POS tagging discussed in the lectures.

Machinese Syntax performs a number of tasks. First of all, it is an example of a parser: a program that analyzes a text into its constituents. Machinese Syntax uses a type of dependency grammar called Functional Dependency Grammar (FDG), which identifies a head for each constituent and assigns to each of the other elements of that constituent a relation to the head. Machinese Syntax also does POS tagging and morphological analysis, using traditional grammarian tags and rules like those seen in class.

The Connexor software is installed in the C:\Conexor folder, which also contains documentation (in .pdf format). Demos of the software can be accessed from the Web, at: www.connexor.com

1.1 Functional Dependency Grammar: A Quick Intro

In the lectures we only discussed one formalism for describing grammars: Context Free Grammar (CFG). Dependency grammar is another formalism with the same goal; it developed at about the same time as CFG and has always been very popular in Computational Linguistics. The basic idea of dependency grammar is that every subtree of the parse tree must have a lexical head; all other constituents are dependent on this head. This is easier to see using the online demo, from the website listed above.
Go there, find the demo for Machinese Syntax, and then type the following sentence:

    Kim saw the postman

As you’ll notice, the demo produces a fancy parse tree, which is however very different from the trees we saw in the lectures:

              S
            /   \
          NP     VP
          |     /  \
         Kim   V    NP
               |   /  \
              saw Det   N
                   |    |
                  the  postman

Even though the constituents it identifies are similar, in a dependency structure all the nodes are words; the mother of a subtree is the main word of that subtree. For example, the NP “the postman” is analyzed as having postman as its root, and the determiner the as a dependent of postman. The verb saw is considered the main element (the head) of the overall sentence.

The second new aspect of the structure that you just saw is the labels on the arcs: this is what the term `functional’ refers to. The idea is that every dependency expresses a particular grammatical function: Kim is the subject of the verb saw, whereas the postman is the object. (The output of Machinese Syntax is discussed in more detail below.) For more details about the formalism, and for a complete list of the function tags, see the file cnxfdgen.pdf in the Conexor folder.

Exercise 1: Type the following text in the web demo:

    I shot an elephant in my pajamas.

Which of the two readings of the sentence discussed in the lectures does Machinese Syntax produce?

Exercise 2: Now try the graphical interface. To do this, click on the `Connexor’ icon on your desktop. Then type the “Kim saw a postman” sentence and, using the documentation in cnxgen.pdf, try to understand the parse it produces.

Exercise 3: Now type

    The aim of research in Natural Language Engineering (NLE) is to endow computer systems with the ability to process natural language. This ability is essential for applications such as information retrieval and web search, information extraction and data mining, text summarization, and speech technology.

and write, for both sentences, the context-free parse trees equivalent to the analyses produced by Machinese Syntax.
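The dependency structure described in section 1.1 can be sketched as a small data structure. The sketch below is only an illustration under stated assumptions: the tuple layout (id, word, head id, function) and the function names are made up for this example and are not part of the Connexor software. The heads and the subj/obj functions follow the description of "Kim saw the postman" above; the main and det labels follow the sample client output in section 1.2.

```python
from collections import defaultdict

# (id, word, head_id, function); head_id 0 marks the root of the sentence.
# This layout is an assumption made for illustration, not a Connexor format.
TOKENS = [
    (1, "Kim", 2, "subj"),
    (2, "saw", 0, "main"),
    (3, "the", 4, "det"),
    (4, "postman", 2, "obj"),
]


def children_by_head(tokens):
    """Map each head id to the list of tokens that depend on it."""
    children = defaultdict(list)
    for token in tokens:
        children[token[2]].append(token)
    return children


def subtree_words(tokens, head_id):
    """Return the words of the subtree rooted at head_id, in sentence order."""
    children = children_by_head(tokens)
    by_id = {token[0]: token for token in tokens}
    collected = []

    def visit(node_id):
        collected.append(by_id[node_id])
        for child in children[node_id]:
            visit(child[0])

    visit(head_id)
    return [token[1] for token in sorted(collected)]


def render(tokens, head_id=0, depth=0):
    """Return indented lines showing each word underneath its head."""
    children = children_by_head(tokens)
    lines = []
    for token_id, word, _, function in children[head_id]:
        lines.append("  " * depth + f"{function}: {word}")
        lines.extend(render(tokens, token_id, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(TOKENS)))
    print(subtree_words(TOKENS, 4))
```

Here every node is a word, and the subtree rooted at postman covers exactly the words "the postman", which is the sense in which the head of a dependency subtree stands for the whole constituent.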
1.2 Using Machinese Syntax from the command line

Machinese Syntax has a client-server architecture. The server runs permanently on sh721. To use Machinese Syntax in your programs, you can call the client, main-vanilla-client, passing sh721 and the port number (19001) as arguments. The client can also be called from the command line, as in the following example. (You’ll need to set the Path variable first. Machinese Syntax lives in C:\Conexor.)

    C:\Conexor>echo "Kim saw the postman" | main-vanilla-client sh721 19001
    1 Kim kim subj:>2 @SUBJ %NH N NOM SG
    2 saw see main:>0 @+FMAINV %VA V PAST
    3 the the det:>4 @DN> %>N DET
    4 postman postman obj:>2 @OBJ %NH N NOM SG
    5 <p> <p>
    1 <s> <s>

Exercise 4: Use main-vanilla-client as just shown to parse the example sentences seen in the previous exercises.

1.3 Java API

Machinese Syntax can also be called using a socket interface. This is discussed in the document cnxfdglinux.pdf in the Conexor folder, section “Using Java client”.

2. Publicly available parsers

A number of parsers can be freely downloaded from the Web. See, e.g.,

    The OpenNLP site at Sourceforge: http://opennlp.sourceforge.net/
    Chris Manning’s list of NLP resources: http://www-nlp.stanford.edu/fsnlp/probparse/
    The Natural Language Software Registry: http://registry.dfki.de/

In our group we use primarily the Charniak parser (see Chris Manning’s pointers) and the RASP parser developed by Cambridge & Sussex. The Charniak parser in particular is a good way to get the kind of representations discussed in the lectures, although in Lisp format.
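A Lisp-style bracketed tree can be read back into a program with a few lines of code. The following is a minimal sketch, not the Charniak parser's own tooling: the function names are hypothetical, and the example string simply writes the lecture-style tree for "Kim saw the postman" in bracketed form, with this handout's category labels rather than the tag set an actual parser would print.

```python
import re


def tokenize(text):
    """Split a bracketed tree into parentheses and symbols."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def read_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = tokenize(text)
    position = 0

    def parse():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[position]
        position += 1
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        if token != "(":
            return token  # a bare symbol: a category label or a word
        node = []
        while position < len(tokens) and tokens[position] != ")":
            node.append(parse())
        if position >= len(tokens):
            raise ValueError("missing closing parenthesis")
        position += 1  # skip the closing parenthesis
        return node

    tree = parse()
    if position != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the category label
        words.extend(leaves(child))
    return words


# Lecture-style labels from this handout, written in bracketed form.
EXAMPLE = "(S (NP Kim) (VP (V saw) (NP (Det the) (N postman))))"

if __name__ == "__main__":
    tree = read_tree(EXAMPLE)
    print(tree)
    print(leaves(tree))
```

Nested lists keep the sketch short; a real program would probably wrap each node in a small class so that labels and children can be accessed by name.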