EAGLES compliant tagset for the morphosyntactic tagging of Esperanto

Antonio Toral, Sergio Ferrández and Andrés Montoyo
Natural Language Processing and Information Systems Group
Department of Software and Computing Systems
University of Alicante, Spain
{atoral, sferrandez, montoyo}@dlsi.ua.es

Abstract

This paper presents the first stage of research on automatic morphosyntactic annotation of Esperanto. We present and justify a tagset that complies with the EAGLES standard. This standard allows us to map our tagset to the tagsets developed for other languages. In future work, an automatic tagger and a corpus will be developed using the proposed tagset.

Background

Esperanto
- Human-created international language
- Neutral tool for global communication
- Supported by UNESCO resolutions (1954 and 1985)
- Its word-formation features facilitate its morphosyntactic annotation

PoS Tagging
- Important step for Natural Language Processing tasks (Question Answering, Information Extraction, etc.)
- Classifies the words of a natural language according to their Parts of Speech

EAGLES
- European standard for PoS annotation
- Provides an intermediate tagset with three kinds of attributes:
  - mandatory (the PoS of the word)
  - recommended
  - optional
- Tagsets defined according to the standard must match the intermediate tagset
- Tagsets defined with this standard can be mapped to one another through the intermediate tagset

Tagset

- Follows the EAGLES standard
- The 13 mandatory attributes of EAGLES suit the Parts of Speech defined in Esperanto
- Only a subset of the recommended attributes that EAGLES considers has been necessary:
  - EAGLES defines several values for the attribute Case, but we only need one (accusative)
  - Gender and Number are not necessary for verbs, nor Gender for adjectives
  - The attribute Person is not needed for verbs
  - Grade (degree) is not needed for adjectives
  - Only two types of pronouns are needed (personal and possessive)
  - The attributes for the PoS article are not used, as Esperanto has only one article (la)
- No optional attributes have been needed
- EAGLES does not provide any attribute for a feature of Esperanto called direction adverbs. We have added a tag to account for it, but we do not distinguish it in the intermediate tagset, in order to maintain compatibility with the standard

Final tag Description                                           Example     Intermediate tag
NCMS      Noun common singular masculine                        knabo       N1110
NCMSA     Noun common singular masculine accusative             knabon      N1114
NCMP      Noun common plural masculine                          knaboj      N1120
NCMPA     Noun common plural masculine accusative               knabojn     N1124
NCFS      Noun common singular feminine                         knabino     N1210
NCFSA     Noun common singular feminine accusative              knabinon    N1214
NCFP      Noun common plural feminine                           knabinoj    N1220
NCFPA     Noun common plural feminine accusative                knabinojn   N1224
NCNS      Noun common singular neuter                           domo        N1310
NCNSA     Noun common singular neuter accusative                domon       N1314
NCNP      Noun common plural neuter                             domoj       N1320
NCNPA     Noun common plural neuter accusative                  domojn      N1324
NP        Noun proper                                           Karlo       N2000
NPA       Noun proper accusative                                Karlon      N2004
VP        Verb indicative present                               amas        V00001100
VF        Verb indicative future                                amos        V00001300
VPA       Verb indicative past                                  amis        V00001400
VIM       Verb imperative                                       amu         V00003000
VC        Verb conditional                                      amus        V00004000
VIN       Verb infinitive                                       ami         V00005000
VPTPA     Verb participle present active                        amanta      V00006110
VPTFA     Verb participle future active                         amonta      V00006310
VPTPAA    Verb participle past active                           aminta      V00006410
VPTPP     Verb participle present passive                       amata       V00006120
VPTFP     Verb participle future passive                        amota       V00006320
VPTPAP    Verb participle past passive                          amita       V00006420
VGPA      Verb gerund present active                            amante      V00007110
VGFA      Verb gerund future active                             amonte      V00007310
VGPAA     Verb gerund past active                               aminte      V00007410
VGPP      Verb gerund present passive                           amate       V00007120
VGFP      Verb gerund future passive                            amote       V00007320
VGPAP     Verb gerund past passive                              amite       V00007420
AJS       Adjective singular                                    bela        AJ0010
AJSA      Adjective singular accusative                         belan       AJ0014
AJP       Adjective plural                                      belaj       AJ0020
AJPA      Adjective plural accusative                           belajn      AJ0024
AT        Article                                               la          AT0000
AV        Adverb                                                tie         AV0
AVD       Adverb direction                                      tien        AV0
AP        Preposition                                           kun         AP1
CC        Conjunction coordinating                              kaj         C1
CS        Conjunction subordinating                             kvankam     C2
NUMC      Numeral cardinal                                      unu         N10000
NUMCA     Numeral cardinal accusative                           unun        N10040
NUMO      Numeral ordinal                                       unua        N20000
NUMOA     Numeral ordinal accusative                            unuan       N20040
PDP1S     Pronoun personal 1st pers. singular                   mi          PD10100150
PDP2S     Pronoun personal 2nd pers. singular                   vi          PD20100150
PDP3SM    Pronoun personal 3rd pers. sing. masc.                li          PD31100150
PDP3SF    Pronoun personal 3rd pers. sing. fem.                 sxi         PD32100150
PDP3SN    Pronoun personal 3rd pers. sing. neuter               gxi         PD33100150
PDP1P     Pronoun personal 1st pers. plural                     ni          PD10200150
PDP2P     Pronoun personal 2nd pers. plural                     vi          PD20200150
PDP3P     Pronoun personal 3rd pers. plural                     ili         PD30200150
PDP1SA    Pronoun personal 1st pers. singular accusative        min         PD10104150
PDP2SA    Pronoun personal 2nd pers. singular accusative        vin         PD20104150
PDP3SMA   Pronoun personal 3rd pers. sing. masc. accusative     lin         PD31104150
PDP3SFA   Pronoun personal 3rd pers. sing. fem. accusative      sxin        PD32104150
PDP3SNA   Pronoun personal 3rd pers. sing. neuter accusative    gxin        PD33104150
PDP1PA    Pronoun personal 1st pers. plural accusative          nin         PD10204150
PDP2PA    Pronoun personal 2nd pers. plural accusative          vin         PD20204150
PDP3PA    Pronoun personal 3rd pers. plural accusative          ilin        PD30204150
PDPO1S    Pronoun possessive 1st pers. singular                 mia         PD10100130
PDPO2S    Pronoun possessive 2nd pers. singular                 via         PD20100130
PDPO3SM   Pronoun possessive 3rd pers. sing. masc.              lia         PD31100130
PDPO3SF   Pronoun possessive 3rd pers. sing. fem.               sxia        PD32100130
PDPO3SN   Pronoun possessive 3rd pers. sing. neuter             gxia        PD33100130
PDPO1P    Pronoun possessive 1st pers. plural                   nia         PD10200130
PDPO2P    Pronoun possessive 2nd pers. plural                   via         PD20200130
PDPO3P    Pronoun possessive 3rd pers. plural                   ilia        PD30200130
PDPO1SA   Pronoun possessive 1st pers. singular accusative      mian        PD10104130
PDPO2SA   Pronoun possessive 2nd pers. singular accusative      vian        PD20104130
PDPO3SMA  Pronoun possessive 3rd pers. sing. masc. accusative   lian        PD31104130
PDPO3SFA  Pronoun possessive 3rd pers. sing. fem. accusative    sxian       PD32104130
PDPO3SNA  Pronoun possessive 3rd pers. sing. neuter accusative  gxian       PD33104130
PDPO1PA   Pronoun possessive 1st pers. plural accusative        nian        PD10204130
PDPO2PA   Pronoun possessive 2nd pers. plural accusative        vian        PD20204130
PDPO3PA   Pronoun possessive 3rd pers. plural accusative        ilian       PD30204130
I         Interjection                                          aj          I
U         Particles                                             ne, cxu     U
RFW       Foreign words                                         show        R100
RSY       Symbols                                               $           R300
PUE       Punctuation sentence-final                            . ? !       PU1
PUB       Punctuation sentence-medial                           , ; :       PU2
PUL       Punctuation left-parenthetical                        ( { [       PU3
PUR       Punctuation right-parenthetical                       ) } ]       PU4

Conclusions and further work

- A standard-compliant tagset for the morphosyntactic tagging of Esperanto has been defined
- It is useful as a starting point to build NLP systems for this language
- EAGLES provides a set of attributes that are able to represent the morphosyntactic features of Esperanto
- One aspect of Esperanto (direction adverbs) is not considered by EAGLES
- Due to the morphosyntactic features of Esperanto, it has been possible to design a small tagset (86 tags). EAGLES compliant tagsets for other languages are bigger (e.g. 114 tags for English, 274 for Italian or 280 for Urdu)
- Next steps are to develop a PoS tagger for Esperanto using this tagset and to build an annotated corpus
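
The final and intermediate tags listed for common nouns and adjectives follow a regular pattern, so the link between the two tagsets can be computed instead of looked up. The sketch below is only an illustration: the function name `to_intermediate` and the digit tables are assumptions read off the listed tag pairs (for example NCMS and N1110, AJPA and AJ0024), not part of the EAGLES specification or of the authors' software, and it covers only these two word classes.

```python
import re

# Digit codes inferred from the listed tag pairs (an assumption, not an official API).
GENDER = {"M": "1", "F": "2", "N": "3"}
NUMBER = {"S": "1", "P": "2"}

# Final-tag shapes: NC + gender + number (+ A), and AJ + number (+ A).
_NOUN = re.compile(r"NC([MFN])([SP])(A?)")
_ADJ = re.compile(r"AJ([SP])(A?)")


def to_intermediate(tag: str) -> str:
    """Map a common-noun or adjective final tag to its intermediate tag."""
    m = _NOUN.fullmatch(tag)
    if m:
        gender, number, acc = m.groups()
        # N, type 1 (common), gender, number, case (4 = accusative, 0 = unmarked)
        return "N1" + GENDER[gender] + NUMBER[number] + ("4" if acc else "0")
    m = _ADJ.fullmatch(tag)
    if m:
        number, acc = m.groups()
        # AJ, two unused positions, number, case
        return "AJ00" + NUMBER[number] + ("4" if acc else "0")
    raise ValueError(f"unsupported tag: {tag}")


print(to_intermediate("NCFPA"))  # N1224
print(to_intermediate("AJP"))    # AJ0020
```

The remaining word classes (verbs, pronouns, numerals) could be handled the same way, but their digit positions would have to be checked against the tag list first.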