A Link Grammar for
an Agglutinative Language
Ozlem Istek & Ilyas Cicekli
Bilkent University, TURKEY
Outline

Link Grammar Formalism
Some Distinctive Features of Turkish Syntax
The System Architecture of the Turkish Parser and Our Adapted Link Grammar Formalism
Method for Handling the Syntactic Roles of Words with Derivations
Evaluation
Concluding Remarks
RANLP-2007
Link Grammar


Link grammar is a formal grammatical system developed by Sleator and Temperley.
The syntax of a language is defined by a grammar that includes the words and their linking requirements.
The grammar is defined in a dictionary file, and each linking requirement of a word is expressed in terms of connectors.
A given sentence is accepted by the system if:
  the linking requirements of all the words are satisfied (connectivity),
  none of the links between the words cross each other (planarity), and
  there is at most one link between any pair of words (exclusion).
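The planarity and exclusion conditions can be checked mechanically once a linkage is written down as a list of links. The sketch below is only an illustration: the `(left_index, right_index, label)` representation and the function names are assumptions made here, not part of the original system. Checking that each word's linking requirements are satisfied needs the dictionary and is not covered.

```python
from itertools import combinations

# A link is assumed to be (word_index_a, word_index_b, label); this
# representation is hypothetical and chosen only for illustration.

def is_planar(links):
    """Return True if no two links cross when drawn above the sentence."""
    for (a, b, _), (c, d, _) in combinations(links, 2):
        a, b = sorted((a, b))
        c, d = sorted((c, d))
        # Two links cross when exactly one end of one link lies strictly
        # inside the span of the other.
        if a < c < b < d or c < a < d < b:
            return False
    return True

def satisfies_exclusion(links):
    """Return True if no pair of words is joined by more than one link."""
    pairs = [tuple(sorted((i, j))) for i, j, _ in links]
    return len(pairs) == len(set(pairs))
```

For a three-word sentence, `[(0, 2, "S"), (1, 2, "O")]` passes both checks, while `[(0, 2, "A"), (1, 3, "B")]` fails planarity because the two spans interleave.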
Link Grammar – Example

The linkage requirements of three Turkish words:
yedi : O- & S-;        (ate)
kadın : S+;            (the woman)
portakalı : O+;        (the orange)

A linkage for a sentence containing these three words:

    +-------------S-------------+
    |              +-----O------+
    |              |            |
  Kadın        portakalı      yedi
  The woman    the orange     ate

(The woman ate the orange)
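To make the example concrete, the following toy matcher derives the two links from the three dictionary entries above. It is a deliberately simplified sketch, not the Sleator and Temperley parsing algorithm: it assumes every requirement is a plain conjunction of connectors, pairs each `X-` with the nearest unmatched `X+` to its left, and does not check planarity. The function and variable names are hypothetical.

```python
def find_links(words, dictionary):
    """Pair each left-pointing connector with an earlier right-pointing one.

    Returns a list of (left_index, right_index, label) tuples, or None when
    some connector cannot be matched.
    """
    pending = {}  # label -> positions of words with an unmatched "label+"
    links = []
    for pos, word in enumerate(words):
        for connector in dictionary[word]:
            label, direction = connector[:-1], connector[-1]
            if direction == "+":
                pending.setdefault(label, []).append(pos)
            else:
                if not pending.get(label):
                    return None  # nothing to the left offers this connector
                links.append((pending[label].pop(), pos, label))
    if any(pending.values()):
        return None  # a right-pointing connector was never used
    return links

# Entries copied from the example: yedi : O- & S-; kadın : S+; portakalı : O+
dictionary = {"yedi": ["O-", "S-"], "kadın": ["S+"], "portakalı": ["O+"]}
links = find_links(["kadın", "portakalı", "yedi"], dictionary)
```

Here `links` contains an O link between `portakalı` and `yedi` and an S link between `kadın` and `yedi`. Dropping the subject (`["portakalı", "yedi"]`) makes the function return `None`, because the `S-` requirement of `yedi` stays unsatisfied.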
Turkish Syntax


The basic word order is SOV, but the order of constituents may change according to the discourse context.
Turkish is head-final: modifiers precede the modified item.
  An adjective (modifier) precedes the head noun (modified item) in a noun phrase.
  In the basic word order of the sentence, the subject and the object (modifiers) precede the verb (modified item).
  Although the head-final property can be violated at the major constituent level (SOV) of a sentence, it is preserved at sub-clause levels and in smaller syntactic structures.

  kırmızı   şapkalı    kız
  red       with hat   girl
  (the girl with the red hat)
Turkish Syntax (cont.)




Turkish is agglutinative.
Words can take many derivational suffixes, and each of these derivations can take its own inflectional suffixes.
Inflectional suffixes have important grammatical roles.
There is a significant amount of interaction between syntax and morphotactics.

uygarlaştı  (He got civilized.)
uygar-laş-tı
uygar+Noun+A3sg+Pnon+Nom^DB+Verb+Become+Pos+Past+A3sg
Motivation for New Formalism

In the standard link grammar formalism, linking requirements are defined for words.
When we consider all possible derivations and inflections of Turkish words, the number of possible word forms is huge.
Words in the same category behave similarly at the syntactic level.
We therefore define linking requirements for classes of words and their inflections, and treat derivations as separate words.
System Architecture of Turkish Parser
Input Sentence
  -> Morphological Analysis
  -> Stripping Lexical Parts
  -> Separating Derivation Boundaries
  -> Create Sentence List
  -> Parse Sentences with Link Grammar
     (uses: Linking Requirements for Turkish Word Classes and Derivations)
  -> All possible linkages
System Architecture (cont.)

Morphological Analysis:
  All the words in the input sentence are analyzed by a fully functional Turkish morphological analyzer.
  oku (read) -> oku+Verb+Pos+Past+A2sg
  uygarlaşmak (to get civilized) -> uygar+Noun+A3sg+Pnon+Nom^DB+Verb+Become+Pos^DB+Noun+Inf1+A3sg+Pnon+Nom

Stripping Lexical Parts:
  Lexical parts of the words are removed for all types of words except conjunctions.
  The Turkish link grammar is designed for classes of word types and their feature structures.
  oku+Verb+Pos+Past+A2sg -> Verb+Pos+Past+A2sg
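The stripping step can be sketched as a small string operation on the analyzer output. This is an illustrative guess at the mechanics, not the authors' code: it assumes the lexical part is everything before the first `+`, and that conjunctions carry a `Conj` tag (the tag name and the sample analysis `ve+Conj` are assumptions, since the slides do not show a conjunction).

```python
def strip_lexical_part(analysis, keep_tags=("Conj",)):
    """Drop the root from 'root+Tag+Feat...' unless the word is a conjunction.

    keep_tags lists part of speech tags whose lexical part is kept; the
    default ("Conj",) is an assumed tag name for conjunctions.
    """
    root, sep, features = analysis.partition("+")
    if not sep:
        return analysis  # no feature structure to strip down to
    pos_tag = features.split("+", 1)[0]
    if pos_tag in keep_tags:
        return analysis
    return features

stripped = strip_lexical_part("oku+Verb+Pos+Past+A2sg")
```

With the slide's example, `stripped` is `Verb+Pos+Past+A2sg`. Derivation boundaries inside the analysis are left untouched, so they can be handled in the next step.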
System Architecture (cont.)

Separating Derivation Boundaries:
  The words are separated at derivational boundaries, and the part of speech tag of each derived form is marked to indicate its position in the word.
  Each token starts with a part of speech tag together with a position mark, and continues with inflectional feature structures.

  Noun+A3sg+P1pl+Loc ^DB+Adj+Rel ^DB+Noun+Zero+A3sg+Pnon+Gen
    -> NounRoot+A3sg+P1pl+Loc
       AdjDB
       NounDBEnd+A3sg+Pnon+Gen
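The separation step can be sketched as follows. The rule for dropping the derivation type (`Rel`, `Zero`) right after the part of speech tag of a derived form is inferred from the single example above, so it should be read as an assumption rather than a documented rule. Words without a derivation boundary are returned unchanged, with no position mark.

```python
def split_derivations(analysis):
    """Split a stripped analysis at ^DB and mark each token's position.

    The first token is marked Root, the last DBEnd, and any token in
    between DB. For derived forms, the feature right after the part of
    speech tag is treated as the derivation type and dropped (an
    assumption based on the slide's example).
    """
    segments = analysis.replace(" ", "").split("^DB+")
    if len(segments) == 1:
        return [segments[0]]  # no derivation: keep the plain tag
    tokens = []
    last = len(segments) - 1
    for i, segment in enumerate(segments):
        parts = segment.split("+")
        if i == 0:
            mark, features = "Root", parts[1:]
        else:
            mark = "DBEnd" if i == last else "DB"
            features = parts[2:]  # skip the derivation type
        tokens.append("+".join([parts[0] + mark] + features))
    return tokens

tokens = split_derivations(
    "Noun+A3sg+P1pl+Loc ^DB+Adj+Rel ^DB+Noun+Zero+A3sg+Pnon+Gen"
)
```

For the slide's input, `tokens` holds the three tokens shown above, in order.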
System Architecture (cont.)

Parsing Sentences:
  Each representation of the sentence is fed into the parser.
  A sentence is parsed with respect to the designed Turkish link grammar.
  The Turkish link grammar contains linking requirements for:
    each part of speech tag, and
    each part of speech tag followed by one of the strings “Root”, “DB”, or “DBEnd”.
  The linking requirement for a token depends on:
    the part of speech tag of the token, and
    the inflectional suffixes in that token.
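The slides do not spell out how the list of sentence representations is built. One plausible reading, given that no POS tagger is used, is that a word with several morphological analyses yields several representations of the sentence, one per combination. The sketch below illustrates that reading only; the function name and the sample analyses are hypothetical.

```python
from itertools import product

def sentence_list(alternatives_per_word):
    """Enumerate every combination of per-word analyses.

    alternatives_per_word[i] is a list of alternatives for word i, and each
    alternative is itself a list of tokens (a derived word contributes
    several tokens).
    """
    return [
        [token for word_tokens in combo for token in word_tokens]
        for combo in product(*alternatives_per_word)
    ]

# Hypothetical input: the first word is ambiguous, the second is not.
alternatives = [
    [["Noun+A3sg+Pnon+Nom"], ["Verb+Pos+Imp+A2sg"]],
    [["Verb+Pos+Past+A3sg"]],
]
representations = sentence_list(alternatives)
```

Each element of `representations` is one token sequence that would be handed to the parser, so the number of representations is the product of the per-word ambiguity counts.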
Turkish Link Grammar

Linking requirements are defined for a part of speech tag and
inflectional suffixes.
Noun+A3sg+Pnon+Nom : linking requirements for nouns with
+A3sg+Pnon+Nom inflections
Noun+A3sg+Pnon+Acc : linking requirements for nouns with
+A3sg+Pnon+Acc inflections
Verb+Pos+Past+A1sg : linking requirements for verbs with
+Pos+Past+A1sg inflections
Verb+Pos+Past+A2sg : linking requirements for verbs with
+Pos+Past+A2sg inflections
Linking Requirements for Derivations


In order to preserve the syntactic roles that the intermediate
derived forms of a word play, they are treated as separate
words in the grammar.
In order to indicate that they are the intermediate derivations
of the same word, all of them are linked with the special “DB”
(derivational boundary) connector.
Noun+A3sg+P1pl+Loc ^DB+Adj+Rel ^DB+Noun+Zero+A3sg+Pnon+Gen

  +----------DB----------+----DB----+
  |                      |          |
NounRoot+A3sg+P1pl+Loc  AdjDB  NounDBEnd+A3sg+Pnon+Gen
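The chain of DB links in the diagram follows directly from the token order: each token of a derived word is linked to the next one. A minimal sketch, using an assumed `(left_index, right_index, label)` tuple for a link and a hypothetical `start` offset for the word's position in the sentence:

```python
def db_links(tokens, start=0):
    """Link each token of one derived word to its right neighbour with DB."""
    return [(start + i, start + i + 1, "DB") for i in range(len(tokens) - 1)]

chain = db_links(["NounRoot+A3sg+P1pl+Loc", "AdjDB", "NounDBEnd+A3sg+Pnon+Gen"])
```

For the three tokens above this gives two DB links, matching the diagram.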
Linking Requirements for Derivations (cont.)




A derived word consists of a root word, intermediate derived forms, and a last derived form.
The root word contributes only the left linking requirements of that word, and it is connected to the right with a DB connector.
Intermediate derived forms also contribute only the left linking requirements of that word, and each is connected to both the left and the right with DB connectors.
The last derived form contributes both the left and the right linking requirements of that word, and it is connected to the left with a DB connector.
Linking Requirements for Derivations (cont.)

For each part of speech tag, we need three additional linking requirements, one for each position in a derived word (root, intermediate, and last).
Example:
Noun Inflections : LeftLinkingRs & RightLinkingRs
NounRoot Inflections : LeftLinkingRs & DB+
NounDB Inflections : LeftLinkingRs & DB- & DB+
NounDBEnd Inflections : LeftLinkingRs & RightLinkingRs & DB-
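The four entries follow one pattern, so they can be generated from a tag's left and right requirements. The sketch below mirrors the notation of the example rather than any real dictionary file syntax; `position_entries` and the placeholder strings are illustrative assumptions.

```python
def position_entries(pos_tag, left, right):
    """Build the plain entry and the three positional entries for one tag.

    left and right are the formulas for the tag's left and right linking
    requirements, passed as opaque strings.
    """
    return {
        pos_tag: f"{left} & {right}",
        pos_tag + "Root": f"{left} & DB+",            # linked to the right
        pos_tag + "DB": f"{left} & DB- & DB+",        # linked on both sides
        pos_tag + "DBEnd": f"{left} & {right} & DB-",  # linked to the left
    }

entries = position_entries("Noun", "LeftLinkingRs", "RightLinkingRs")
```

Only the plain and `DBEnd` entries include the right linking requirements, which reflects the rule that the last derived form alone determines how the word attaches to its right.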
Evaluation

We tested the developed Turkish parser with a set of 250 sentences.
  The average number of words per sentence is 5.19.
  The average number of parses per sentence is 7.49.
  For 84.31% of the sentences, the result set contains the correct parse.
  The average rank of the correct parse in the result set is 1.78.
  For 62.39% of the sentences, the first parse is the correct parse.
  For 80.94% of the sentences, one of the first three parses is correct.
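Measures of this kind can be computed from the rank of the correct parse for each test sentence. The sketch below shows one way to do so; the slides do not state the denominators, so it assumes the percentages are taken over all sentences and the average rank over sentences whose result set contains the correct parse. The input values are made up for illustration and are not the paper's data.

```python
def summarize(ranks):
    """Summarize parser output from per-sentence ranks of the correct parse.

    ranks holds the 1-based rank of the correct parse for each sentence, or
    None when the correct parse is missing from the result set.
    """
    total = len(ranks)
    found = [r for r in ranks if r is not None]
    return {
        "coverage": len(found) / total,
        "mean_rank": sum(found) / len(found) if found else None,
        "top1": sum(r == 1 for r in found) / total,
        "top3": sum(r <= 3 for r in found) / total,
    }

# Toy input for five sentences (hypothetical values).
summary = summarize([1, 2, None, 1, 4])
```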
Conclusions


A Turkish grammar was developed in the link grammar formalism.
The developed Turkish link grammar is not a lexical grammar.
  We used morphological feature structures and word classes.
  We preserved the syntactic roles of the intermediate derived forms of words by separating derived words at their derivational boundaries and treating each intermediate form as a distinct word.
  Our linking requirements are defined for morphological categories.
Our current system does not use a POS tagger; adding one is expected to improve performance in terms of both time and precision.