EAGLES compliant tagset for the morphosyntactic tagging of Esperanto
Antonio Toral, Sergio Ferrández and Andrés Montoyo
Natural Language Processing and Information Systems Group
Department of Software and Computing Systems
University of Alicante, Spain
{atoral, sferrandez, montoyo}@dlsi.ua.es
Abstract
This paper presents the first stage of research on automatic morphosyntactic annotation of Esperanto. We present and justify a
tagset which fulfils the EAGLES standard. This standard allows us to map our tagset to the tagsets developed for other languages. In
future studies, an automatic tagger and a corpus will be developed using the proposed tagset.
Background
Esperanto
Human-created International Language
Neutral tool for global communication
Supported by UNESCO resolutions (1954 and 1985)
Its regular word formation facilitates its morphosyntactic
annotation
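As a rough illustration of why regular endings help, the following sketch guesses a word's category from its endings alone, using forms that appear as examples in the tag table (knabojn, belan, amos, tien). The function name `analyse` and the feature labels are assumptions made for this example, not part of the proposed tagset. The sketch also ignores closed-class words such as la, kaj or mi, which a real tagger would need a lexicon for.

```python
# Toy analyser based only on Esperanto's regular endings (illustrative assumption).
VERB_ENDINGS = {
    "as": "indicative present",
    "is": "indicative past",
    "os": "indicative future",
    "us": "conditional",
    "u": "imperative",
    "i": "infinitive",
}


def analyse(word):
    """Return (category, features) guessed from the word's endings alone."""
    features = []
    stem = word
    if stem.endswith("n"):  # -n marks the accusative
        features.append("accusative")
        stem = stem[:-1]
    if stem.endswith("j"):  # -j marks the plural
        features.append("plural")
        stem = stem[:-1]
    if stem.endswith("o"):
        return "noun", features
    if stem.endswith("a"):
        return "adjective", features
    if stem.endswith("e"):
        # On adverbs, -n marks direction (tie -> tien) rather than case.
        return "adverb", (["direction"] if "accusative" in features else [])
    if not features:
        for ending, label in VERB_ENDINGS.items():
            if word.endswith(ending):
                return "verb", [label]
    return "unknown", []


print(analyse("knabojn"))
print(analyse("amos"))
```

Closed-class words are misclassified by this sketch (for instance, la ends in -a), which is why it is only a starting point and not a tagger.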
PoS Tagging
Important step for Natural Language Processing tasks (Question
Answering, Information Extraction, etc.)
Classifies the words of a natural language text according to their Parts of Speech
EAGLES
European standard for PoS annotation
Provides an intermediate tagset with three kinds of features:
- mandatory (the PoS of each word)
- recommended
- optional
Tagsets defined according to the standard must map onto the
intermediate tagset
Tagsets defined with this standard can be linked to one another
through the intermediate tagset
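The linking idea can be sketched as a composition of two mappings that share the intermediate tagset. In the example below, the Esperanto entries are copied from the tag table of this poster, while the second tagset and all of its tag names (`NOUN-M-SG`, `ADJ-SG`, `DET-ART`) are invented purely for illustration. The helper `link` is likewise a hypothetical name.

```python
# Final tag -> intermediate tag, copied from four rows of the poster's table.
ESPERANTO_TO_INTERMEDIATE = {
    "NCMS": "N1110",
    "NCMSA": "N1114",
    "AJS": "AJ0010",
    "AT": "AT0000",
}

# Hypothetical second tagset: these tag names are invented for this sketch.
OTHER_TO_INTERMEDIATE = {
    "NOUN-M-SG": "N1110",
    "ADJ-SG": "AJ0010",
    "DET-ART": "AT0000",
}


def link(source_to_inter, target_to_inter):
    """Map each source tag to the target tags sharing its intermediate tag."""
    inter_to_target = {}
    for tag, inter in target_to_inter.items():
        inter_to_target.setdefault(inter, []).append(tag)
    return {
        tag: inter_to_target.get(inter, [])
        for tag, inter in source_to_inter.items()
    }


links = link(ESPERANTO_TO_INTERMEDIATE, OTHER_TO_INTERMEDIATE)
for tag, targets in links.items():
    print(tag, "->", targets)
```

A source tag with no counterpart (here `NCMSA`, since the invented tagset has no accusative noun) maps to an empty list, which makes gaps between the two tagsets visible.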
Tagset
Follows the EAGLES standard
The 13 mandatory attributes of EAGLES suit the Parts of Speech
defined in Esperanto
Only a subset of the recommended attributes defined by EAGLES
has been necessary:
- EAGLES offers several values for the attribute case, but we
only need one (accusative)
- Gender and number are not necessary for verbs or
adjectives
- The attribute person is not needed for verbs
- Grade is not needed for adjectives
- Only two types of pronouns are needed
- The attributes for the PoS article are not used, as
Esperanto has only one article (la)
No optional attributes have been needed
EAGLES does not provide any attribute for a feature of Esperanto called
direction adverbs
We have added tags to account for them, but we do not
include them in the intermediate tagset, in order to maintain
compatibility with the standard
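One consequence of this choice can be shown with a minimal sketch: the final tags AV and AVD (both taken from the tag table) share the intermediate tag AV0, so the conversion to the intermediate tagset is one-way for adverbs. The function names `to_intermediate` and `candidates` are assumptions made for this example.

```python
# Both adverb tags map to the same intermediate tag, as in the poster's table.
FINAL_TO_INTERMEDIATE = {"AV": "AV0", "AVD": "AV0"}


def to_intermediate(final_tag):
    """Convert a final tag to its intermediate tag."""
    return FINAL_TO_INTERMEDIATE[final_tag]


def candidates(intermediate_tag):
    """List every final tag that could have produced an intermediate tag."""
    return sorted(
        tag
        for tag, inter in FINAL_TO_INTERMEDIATE.items()
        if inter == intermediate_tag
    )


print(to_intermediate("AVD"))
print(candidates("AV0"))
```

Going from tien (AVD) to AV0 drops the direction information, so mapping back from AV0 yields two candidate tags and the distinction cannot be recovered.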
Final tag | Description | Example | Intermediate tag
NCMS | Noun common singular masculine | knabo | N1110
NCMSA | Noun common singular masculine accusative | knabon | N1114
NCMP | Noun common plural masculine | knaboj | N1120
NCMPA | Noun common plural masculine accusative | knabojn | N1124
NCFS | Noun common singular feminine | knabino | N1210
NCFSA | Noun common singular feminine accusative | knabinon | N1214
NCFP | Noun common plural feminine | knabinoj | N1220
NCFPA | Noun common plural feminine accusative | knabinojn | N1224
NCNS | Noun common singular neuter | domo | N1310
NCNSA | Noun common singular neuter accusative | domon | N1314
NCNP | Noun common plural neuter | domoj | N1320
NCNPA | Noun common plural neuter accusative | domojn | N1324
NP | Noun proper | Karlo | N2000
NPA | Noun proper accusative | Karlon | N2004
VP | Verb indicative present | amas | V00001100
VF | Verb indicative future | amos | V00001300
VPA | Verb indicative past | amis | V00001400
VIM | Verb imperative | amu | V00003000
VC | Verb conditional | amus | V00004000
VIN | Verb infinitive | ami | V00005000
VPTPA | Verb participle present active | amanta | V00006110
VPTFA | Verb participle future active | amonta | V00006310
VPTPAA | Verb participle past active | aminta | V00006410
VPTPP | Verb participle present passive | amata | V00006120
VPTFP | Verb participle future passive | amota | V00006320
VPTPAP | Verb participle past passive | amita | V00006420
VGPA | Verb gerund present active | amante | V00007110
VGFA | Verb gerund future active | amonte | V00007310
VGPAA | Verb gerund past active | aminte | V00007410
VGPP | Verb gerund present passive | amate | V00007120
VGFP | Verb gerund future passive | amote | V00007320
VGPAP | Verb gerund past passive | amite | V00007420
AJS | Adjective singular | bela | AJ0010
AJSA | Adjective singular accusative | belan | AJ0014
AJP | Adjective plural | belaj | AJ0020
AJPA | Adjective plural accusative | belajn | AJ0024
AT | Article | la | AT0000
AV | Adverb | tie | AV0
AVD | Adverb direction | tien | AV0
AP | Preposition | kun | AP1
CC | Conjunction coordinating | kaj | C1
CS | Conjunction subordinating | kvankam | C2
NUMC | Numeral cardinal | unu | N10000
NUMCA | Numeral cardinal accusative | unun | N10040
NUMO | Numeral ordinal | unua | N20000
NUMOA | Numeral ordinal accusative | unuan | N20040
PDP1S | Pronoun personal 1st pers. singular | mi | PD10100150
PDP2S | Pronoun personal 2nd pers. singular | vi | PD20100150
PDP3SM | Pronoun personal 3rd pers. sing. masc. | li | PD31100150
PDP3SF | Pronoun personal 3rd pers. sing. fem. | sxi | PD32100150
PDP3SN | Pronoun personal 3rd pers. sing. neuter | gxi | PD33100150
PDP1P | Pronoun personal 1st pers. plural | ni | PD10200150
PDP2P | Pronoun personal 2nd pers. plural | vi | PD20200150
PDP3P | Pronoun personal 3rd pers. plural | ili | PD30200150
PDP1SA | Pronoun personal 1st pers. singular accusative | min | PD10104150
PDP2SA | Pronoun personal 2nd pers. singular accusative | vin | PD20104150
PDP3SMA | Pronoun personal 3rd pers. sing. masc. accusative | lin | PD31104150
PDP3SFA | Pronoun personal 3rd pers. sing. fem. accusative | sxin | PD32104150
PDP3SNA | Pronoun personal 3rd pers. sing. neuter accusative | gxin | PD33104150
PDP1PA | Pronoun personal 1st pers. plural accusative | nin | PD10204150
PDP2PA | Pronoun personal 2nd pers. plural accusative | vin | PD20204150
PDP3PA | Pronoun personal 3rd pers. plural accusative | ilin | PD30204150
PDPO1S | Pronoun possessive 1st pers. singular | mia | PD10100130
PDPO2S | Pronoun possessive 2nd pers. singular | via | PD20100130
PDPO3SM | Pronoun possessive 3rd pers. sing. masc. | lia | PD31100130
PDPO3SF | Pronoun possessive 3rd pers. sing. fem. | sxia | PD32100130
PDPO3SN | Pronoun possessive 3rd pers. sing. neuter | gxia | PD33100130
PDPO1P | Pronoun possessive 1st pers. plural | nia | PD10200130
PDPO2P | Pronoun possessive 2nd pers. plural | via | PD20200130
PDPO3P | Pronoun possessive 3rd pers. plural | ilia | PD30200130
PDPO1SA | Pronoun possessive 1st pers. singular accusative | mian | PD10104130
PDPO2SA | Pronoun possessive 2nd pers. singular accusative | vian | PD20104130
PDPO3SMA | Pronoun possessive 3rd pers. sing. masc. accusative | lian | PD31104130
PDPO3SFA | Pronoun possessive 3rd pers. sing. fem. accusative | sxian | PD32104130
PDPO3SNA | Pronoun possessive 3rd pers. sing. neuter accusative | gxian | PD33104130
PDPO1PA | Pronoun possessive 1st pers. plural accusative | nian | PD10204130
PDPO2PA | Pronoun possessive 2nd pers. plural accusative | vian | PD20204130
PDPO3PA | Pronoun possessive 3rd pers. plural accusative | ilian | PD30204130
I | Interjection | aj | I
U | Particles | ne, cxu | U
RFW | Foreign words | show | R100
RSY | Symbols | $ | R300
PUE | Punctuation sentence-final | ., ?, ! | PU1
PUB | Punctuation sentence-medial | ,, ;, : | PU2
PUL | Punctuation left-parenthetical | (, {, [ | PU3
PUR | Punctuation right-parenthetical | ), }, ] | PU4
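The intermediate tags for nouns appear to be positional: judging only from the noun rows of the tag table, the digits after N encode type, gender, number and case, in that order. The following sketch decodes them under that reading. The digit meanings are inferred from the table and not taken from the EAGLES documentation, and the name `decode_noun` is an assumption made for this example.

```python
# Digit meanings inferred from the noun rows of the table (an assumption).
NOUN_TYPE = {"1": "common", "2": "proper"}
GENDER = {"0": None, "1": "masculine", "2": "feminine", "3": "neuter"}
NUMBER = {"0": None, "1": "singular", "2": "plural"}
CASE = {"0": None, "4": "accusative"}


def decode_noun(tag):
    """Decode a five-character noun intermediate tag such as 'N1224'."""
    # Numeral tags in the table also start with N but are six characters long.
    if len(tag) != 5 or not tag.startswith("N"):
        raise ValueError(f"not a noun intermediate tag: {tag!r}")
    fields = {
        "type": NOUN_TYPE[tag[1]],
        "gender": GENDER[tag[2]],
        "number": NUMBER[tag[3]],
        "case": CASE[tag[4]],
    }
    # Drop attributes whose digit is 0 (not applicable or unmarked).
    return {name: value for name, value in fields.items() if value is not None}


print(decode_noun("N1224"))
print(decode_noun("N2004"))
```

The same positional reading could be extended to the other categories, but their digit meanings would need to be checked against the standard before relying on them.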
Conclusions and further work
A standard compliant tagset for the morphosyntactic tagging of
Esperanto has been defined
Useful as a starting point to build NLP systems for this language
EAGLES provides a set of attributes that are able to represent the
morphosyntactic features of Esperanto
One aspect of Esperanto (direction adverbs) is not considered by
EAGLES
Owing to the morphosyntactic features of Esperanto, it has been possible to
design a small tagset (86 tags). EAGLES compliant tagsets for other
languages are larger (e.g. 114 tags for English, 274 for Italian and 280 for
Urdu)
The next steps are to develop a PoS tagger for Esperanto using this tagset
and to build an annotated corpus