CC437
Lab 7: Parsing
12th December 2004
The goal of this lab is to get some practice with parsing, using the Connexor ‘Machinese Syntax’ tool.
1. Machinese Syntax
Machinese Syntax is the latest name for the software developed by the Finnish company Connexor (formerly Lingsoft), which also developed the hand-coded approach to POS tagging discussed in the lectures. Machinese Syntax performs a number of tasks. First of all, it is an example of a parser: a program that analyzes a text into its constituents. Machinese Syntax uses a dependency-based formalism called Functional Dependency Grammar (FDG), which identifies a head for each constituent and assigns to each of the other elements of that constituent a relation with respect to the head. Machinese Syntax also performs POS tagging and morphological analysis, using traditional grammatical tags and rules like those seen in class.
The Connexor software is installed in the C:\Conexor folder, which also
contains documentation (in .pdf format). Demos of the software can be
accessed from the Web, at:
www.connexor.com
1.1 Functional Dependency Grammar: A Quick Intro
In the lectures we only discussed one formalism for describing grammars: Context-Free Grammar (CFG). Dependency grammar is another formalism with the same goal; it was developed at about the same time as CFG and has always been very popular in Computational Linguistics. The basic idea of dependency grammar is that every subtree of the parse tree must have a lexical head, and all other constituents are dependents of this head. This is easier to see using the online demo on the website listed above. Go there, find the demo for Machinese Syntax, and then type the following sentence:
Kim saw the postman
As you’ll notice, the demo produces a fancy parse tree which, however, is very different from the trees we saw in the lectures. In the lectures, this sentence would be given a constituency tree such as:

[S [NP Kim] [VP [V saw] [NP [Det the] [N postman]]]]

Even though the constituents the demo identifies are similar, in a dependency structure all the nodes are words: the mother of a subtree is the main word of that subtree. For example, the NP “the postman” is analyzed as having postman as its root and the determiner the as a dependent of postman. The verb saw is considered the main element (the head) of the whole sentence.
The second new aspect of the structure you just saw is the labels on the arcs: this is what the term ‘functional’ refers to. The idea is that every dependency expresses a particular grammatical function: Kim is the subject of the verb saw, whereas the postman is its object. (The output of Machinese Syntax is discussed in more detail below.) For more details about the formalism, and for a complete list of the function tags, see the file cnxfdgen.pdf in the Conexor folder.
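To make the idea concrete, here is a minimal Python sketch (an illustration only, not part of the Connexor software) of one way to store a dependency structure with function labels: each word records the position of its head and the function of the arc linking them. The labels subj, main, det and obj are the ones Machinese Syntax assigns to “Kim saw the postman” in its command-line output (section 1.2); the Dep class and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dep:
    index: int     # 1-based position of the word in the sentence
    word: str
    head: int      # position of the head word; 0 marks the root of the sentence
    function: str  # grammatical function labelling the arc to the head


# "Kim saw the postman" as a dependency structure: every node is a word.
SENTENCE = [
    Dep(1, "Kim", 2, "subj"),
    Dep(2, "saw", 0, "main"),
    Dep(3, "the", 4, "det"),
    Dep(4, "postman", 2, "obj"),
]


def root(deps):
    """Return the head of the whole sentence (the word whose head is 0)."""
    return next(d for d in deps if d.head == 0)


def dependents(deps, head_index):
    """Return (function, word) pairs for all words attached to head_index."""
    return [(d.function, d.word) for d in deps if d.head == head_index]


if __name__ == "__main__":
    print(root(SENTENCE).word)      # saw
    print(dependents(SENTENCE, 2))  # [('subj', 'Kim'), ('obj', 'postman')]
    print(dependents(SENTENCE, 4))  # [('det', 'the')]
```

Note that there are no phrase nodes at all: “the postman” exists only as the word postman together with its det dependent.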
Exercise 1: Type the following text into the web demo:
I shot an elephant in my pajamas.
Which of the two readings of the sentence discussed in the lectures does Machinese
Syntax produce?
Exercise 2: Now try the graphical interface. To do this, click on the ‘Connexor’ icon on your desktop. Then type the “Kim saw the postman” sentence and, using the documentation in cnxgen.pdf, try to understand the parse it produces.
Exercise 3: Now type the following text:
The aim of research in Natural Language Engineering (NLE) is to endow
computer systems with the ability to process natural language. This ability is
essential for applications such as information retrieval and web search,
information extraction and data mining, text summarization, and speech
technology.
Then, for both sentences, write the context-free parse trees equivalent to the analyses produced by Machinese Syntax.
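One way to go from a dependency analysis towards a constituency tree is to let every head ‘project’ a phrase containing the head together with the phrases of its dependents, in sentence order. The minimal Python sketch below assumes you have written the analysis down as (index, word, head) triples, with head 0 for the root; the function name project is hypothetical. It produces flat, unlabelled brackets, so you still have to choose the category labels and any extra structure yourself (for instance, a VP node grouping the verb with its object but not its subject). The mapping from dependency trees to constituency trees is not unique, so treat this as a starting point rather than as the answer to the exercise.

```python
def project(deps, head=0):
    """Bracket the subtree rooted at position `head` (0 = artificial root).

    deps is a list of (index, word, head_index) triples. A word with no
    dependents is returned bare; otherwise it is grouped with the
    projections of its dependents, in sentence order.
    """
    words = {i: w for i, w, _ in deps}
    children = [i for i, _, h in deps if h == head]
    if head == 0:
        if len(children) != 1:
            raise ValueError("expected exactly one word attached to the root")
        return project(deps, children[0])
    if not children:
        return words[head]
    parts = [words[i] if i == head else project(deps, i)
             for i in sorted(children + [head])]
    return "[" + " ".join(parts) + "]"


# "Kim saw the postman", with heads as in the Machinese Syntax output
KIM = [(1, "Kim", 2), (2, "saw", 0), (3, "the", 4), (4, "postman", 2)]

if __name__ == "__main__":
    print(project(KIM))  # [Kim saw [the postman]]
```

The result keeps the subject and the object at the same level under saw; in the trees from the lectures, the object would instead be grouped with the verb inside a VP.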
1.2 Using Machinese Syntax from the command line
Machinese Syntax has a client-server architecture. The server runs permanently on sh721. To use Machinese Syntax in your programs, you can call the client, main-vanilla-client, passing the host name (sh721) and the port number (19001) as arguments. The client can also be called from the command line, as in the following example. (You’ll need to set the Path variable first; Machinese Syntax lives in C:\Conexor.)
C:\Conexor>echo "Kim saw the postman" | main-vanilla-client sh721 19001
1   Kim       kim       subj:>2   @SUBJ %NH N NOM SG
2   saw       see       main:>0   @+FMAINV %VA V PAST
3   the       the       det:>4    @DN> %>N DET
4   postman   postman   obj:>2    @OBJ %NH N NOM SG
5   <p>       <p>
1   <s>       <s>
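Each line of this output gives the word’s position in the sentence, the word form, its base form (for example, see for saw), the dependency function together with the position of its head after ‘:>’ (0 marks the main element of the sentence), and finally the syntactic and morphological tags; the <p> and <s> lines appear to be paragraph and sentence markers. Below is a minimal Python sketch for reading such lines into records. It assumes fields are separated by whitespace and that word forms contain no spaces, which may not hold for every input; the Token class and the parse_fdg_output function are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# "function:>head", e.g. "subj:>2" (format assumed from the sample output)
RELATION = re.compile(r"^([^:\s]+):>(\d+)$")


@dataclass
class Token:
    index: int               # position of the word in the sentence
    form: str                # word form as it appears in the text
    lemma: str               # base form, e.g. "see" for "saw"
    function: Optional[str]  # dependency function, e.g. "subj"; None for markers
    head: Optional[int]      # position of the head word; 0 = main element
    tags: List[str] = field(default_factory=list)  # e.g. ["@SUBJ", "%NH", ...]


def parse_fdg_output(text: str) -> List[Token]:
    """Turn the client's one-word-per-line output into Token records."""
    tokens = []
    for line in text.splitlines():
        fields = line.split(maxsplit=4)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank lines and anything that is not a word line
        function = head = None
        rest = fields[3:]
        if rest:
            match = RELATION.match(rest[0])
            if match:
                function, head = match.group(1), int(match.group(2))
                rest = rest[1:]
        tags = " ".join(rest).split()
        tokens.append(Token(int(fields[0]), fields[1], fields[2], function, head, tags))
    return tokens


SAMPLE = """\
1 Kim kim subj:>2 @SUBJ %NH N NOM SG
2 saw see main:>0 @+FMAINV %VA V PAST
3 the the det:>4 @DN> %>N DET
4 postman postman obj:>2 @OBJ %NH N NOM SG
5 <p> <p>
1 <s> <s>
"""

if __name__ == "__main__":
    for tok in parse_fdg_output(SAMPLE):
        print(tok.index, tok.form, tok.function, tok.head, tok.tags)
```

Splitting with maxsplit=4 keeps the tag column together until the end, so a word line without a relation still yields all of its tags.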
Exercise 4: Use main-vanilla-client as just shown to parse the example sentences from the previous exercises.
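To call the client from your own programs rather than from the command line, one option is to run it as a subprocess and pipe the text into it, just as echo does. The minimal Python sketch below only uses the host and port given in this handout; it assumes main-vanilla-client can be found on your Path, and the function names client_command and run_client are hypothetical.

```python
import subprocess

# Server host and port as given in this handout; assumes main-vanilla-client
# can be found on the Path (see the note about the Path variable above).
HOST = "sh721"
PORT = 19001


def client_command(host: str = HOST, port: int = PORT) -> list:
    """Argument list equivalent to typing: main-vanilla-client sh721 19001"""
    return ["main-vanilla-client", host, str(port)]


def run_client(text: str, host: str = HOST, port: int = PORT) -> str:
    """Pipe `text` into the client, as `echo ... | main-vanilla-client` does,
    and return the analysis it prints on standard output."""
    result = subprocess.run(
        client_command(host, port),
        input=text,
        capture_output=True,  # collect stdout/stderr (Python 3.7+)
        text=True,            # send and receive str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, run_client("Kim saw the postman\n") should return the same lines as the command-line call shown earlier in this section, provided the server on sh721 is reachable.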
1.3 Java API
Machinese Syntax can also be called using a socket interface. This is discussed
in the document cnxfdglinux.pdf in the Conexor folder, section “Using Java
client”.
2. Publicly available parsers
A number of parsers can be freely downloaded from the Web. See, e.g.:
- The OpenNLP site at SourceForge: http://opennlp.sourceforge.net/
- Chris Manning’s list of NLP resources: http://www-nlp.stanford.edu/fsnlp/probparse/
- The Natural Language Software Registry: http://registry.dfki.de/
In our group we primarily use the Charniak parser (see Chris Manning’s pointers) and the RASP parser developed by Cambridge & Sussex. The Charniak parser in particular is a good way to obtain the kind of representations discussed in the lectures, although its output is in Lisp format (nested brackets).
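Bracketed output of this kind is easy to read into a tree in your own programs. The minimal Python sketch below assumes a bracketing such as (S (NP Kim) (VP (V saw) (NP (Det the) (N postman)))), in which every opening bracket is followed by a label; real parser output may wrap the whole tree in an extra top-level node, which this reader simply treats as one more label. The names read_tree and leaves are hypothetical.

```python
def read_tree(text):
    """Parse a bracketed tree such as '(S (NP Kim) (VP ...))' into nested
    (label, children) tuples; leaves (words) are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())  # a subtree
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected material after the end of the tree")
    return tree


def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


if __name__ == "__main__":
    t = read_tree("(S (NP Kim) (VP (V saw) (NP (Det the) (N postman))))")
    print(t[0], leaves(t))  # S ['Kim', 'saw', 'the', 'postman']
```

A hand-written recursive reader is enough here because the format is just nested brackets; unbalanced input raises an exception instead of producing a partial tree.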