An Outline of Turkish Morphology

Kemal Oflazer, Elvan Göçmen, Cem Bozşahin

October 1994

Contents

1 Introduction
2 Current orthography of Turkish
3 Morphophonemic processes
  3.1 Vowel Harmony
    3.1.1 Resolving low-unrounded vowels
    3.1.2 Resolving high vowels
  3.2 Vowel drops
  3.3 Consonant drops
  3.4 Consonant changes
  3.5 Words ending with (su)
  3.6 Gemination
  3.7 s-drop
4 Affix Inventory
  4.1 Coding scheme
  4.2 Noun Inflections (NNI xxxxxx)
  4.3 Derivations producing nouns (NxDxxxxx)
    4.3.1 Nouns from nouns (NND xxxxxx) or adjectives (NJD xxxx)
    4.3.2 Nouns from verbs (NVD xxxxxx)
    4.3.3 Adjectives from nouns (JND xxxxxx) or adjectives (JJD xxxxx)
    4.3.4 Adjectives from verbs (JVD xxxx)
  4.4 Verb Inflections (VVI xxxxxxx)
  4.5 Derivations producing verbs
    4.5.1 Verbs from nouns (VND xxxxx) or adjectives (VJD xxxx)
    4.5.2 Verbs from verbs (VVD xxxxx)
    4.5.3 Adverbs from nouns (AND xxxxx) or adjectives (AJD xxxxx)
    4.5.4 Adverbs from verbs (AVD xxxxx)
    4.5.5 Adverbs from adverbs (AAD xxx)
5 Morphotactics
  5.1 Paradigms
    5.1.1 Finite State Machine for Nominal Morphotactics
    5.1.2 Finite State Machine for Verbal Morphotactics
6 Multiple-word constructs

              +Round              -Round
          +Front   -Front     +Front   -Front
  +High     ü        u          i        ı
  -High     ö        o          e        a

Table 1: Turkish Vowels

1 Introduction

Turkish is an agglutinative language with word structures formed by productive affixations of derivational and inflectional morphemes to root words.¹ A popular (and rather exaggerated) example of a Turkish word formation is:

  OSMANLILAŞTIRAMAYABİLECEKLERİMİZDENMİŞSİNİZ

which can be broken down into morphemes as follows:

  OSMAN-LI-LAŞ-TIR-AMA-YABİL-ECEK-LER-İMİZ-DEN-MİŞ-SİNİZ

where the -'s indicate morpheme boundaries. This adverb can be translated into English as "(behaving) as if you were of those whom we might consider not converting into an Ottoman." For the details of Turkish grammar and word formation rules one can refer to a number of books [5, 9].

Turkish has clear but rather complex morphotactics. Morphemes added to a stem can convert the word from a nominal to a verbal structure or vice versa, or can create adverbial constructs as above. The surface realizations of morphological constructions are constrained and modified by a number of morphophonemic rules. Vowels in the affixed morpheme have to agree with the preceding vowel in certain aspects to achieve vowel harmony. Under certain circumstances vowels in the roots and morphemes are deleted. Similarly, consonants in the root words or in the affixed morphemes undergo certain modifications, and may sometimes be deleted.
Furthermore, the assimilation of a large number of words into the language from various foreign languages (most notably Arabic, Persian, and French) resulted in word formations which behave as exceptions.

Turkish morphology has been investigated from a computational point of view by Köksal [4], Hankamer [2], Solak and Oflazer [7], and Oflazer [6].

¹ Turkish is an exclusively suffixing language. There are however a few very unproductive prefixes of foreign origin, such as na- (un-). Words with such prefixes can be treated as separate lexical items.

2 Current orthography of Turkish

The Turkish language has an alphabet of 29 letters in its current orthography, based on the Latin characters. There are 8 vowels: a, e, ı, i, o, ö, u, ü, and 21 consonants: b, c, ç, d, f, g, ğ, h, j, k, l, m, n, p, r, s, ş, t, v, y, z. Tables 1 and 2 show the phonetic features of Turkish vowels and consonants.

Table 2: Turkish Consonants. Rows: Stop (voiceless, voiced), Fricative (voiceless, voiced), Nasal, Liquid (lateral, non-lateral), Glide. Columns: Bilabial, Labio-dental, Dental, Palato-alveolar, Alveolar, Palatal, Velar, Glottal. Entries: p, b, m, f, v, t, d, ç, c, s, z, n, ş, l, r, k, g, y, ğ, h.

There are however phonemes not covered by these. Certain long vowels are mainly used in words borrowed from foreign languages, most notably Arabic and Persian. Such vowels are sometimes distinguished in older orthography by various means (such as with a ^ on top of the vowel). In modern orthography such distinctions are almost never used. There is also a certain phoneme known as "yumuşak g" (soft g, denoted as ğ in orthography) which creates bisyllabic two-vowel sequences. At the end of a syllable, this phoneme causes the lengthening of the preceding vowel [10]. Consonants k, g, and l have palatal and non-palatal allophones. In certain cases the palatalization process has an impact on the vowel harmony.

We can partition the vowels as follows to aid in the description of the vowel harmony processes.

1. Back vowels: {a, ı, o, u}
2. Front vowels: {e, i, ö, ü}
3. Front unrounded vowels: {e, i}
4. Front rounded vowels: {ö, ü}
5. Back unrounded vowels: {a, ı}
6. Back rounded vowels: {o, u}
7. High vowels: {ı, i, u, ü}
8. Low unrounded vowels: {a, e}

In Turkish, proper nouns are separated from certain suffixes by an apostrophe ('). All vowel harmony rules and some of the consonant change rules are in effect in the orthography of proper nouns.

Below we present the rules in two-level notation. In giving examples we refer to lexical and surface forms, where the former refers to the structural form of a word, and the latter refers to the phonological realization of the word. The 0's in the examples denote the phonemes or morpheme boundary symbols which get deleted in the surface realizations.

3 Morphophonemic processes

We use the following meta-phonemes in our descriptions:

1. D : voiced (d) or voiceless (t)
2. A : back (a) or front (e)
3. H : high vowel (ı, i, u, ü)
4. R : vowel except o, ö
5. C : voiced (c) or voiceless (ç)
6. G : voiced (g) or voiceless (k)

3.1 Vowel Harmony

Turkish has vowel harmony processes that force certain vowels in suffixes to agree with the last vowel in the stems or roots they are being affixed to. Some of these phenomena are exemplified below using two-level notation.

3.1.1 Resolving low-unrounded vowels

Let A be a vowel in a suffix, which may be either an a or an e. A is resolved as follows:

A is resolved as an a if the last vowel in the stem is a back vowel. For example:

  Lexical:  masa-lAr     N(table)-PLU
  Surface:  masa0lar     masalar

  Lexical:  satır-lAr    N(hatchet)-PLU
  Surface:  satır0lar    satırlar

  Lexical:  oto-lAr      N(car)-PLU
  Surface:  oto0lar      otolar

  Lexical:  kutu-lAr     N(box)-PLU
  Surface:  kutu0lar     kutular

A is resolved as an e if the last vowel in the stem is a front vowel.
For example:

  Lexical:  ev-lAr       N(house)-PLU
  Surface:  ev0ler       evler

  Lexical:  kedi-lAr     N(cat)-PLU
  Surface:  kedi0ler     kediler

  Lexical:  göl-lAr      N(lake)-PLU
  Surface:  göl0ler      göller

  Lexical:  gül-lAr      N(rose)-PLU
  Surface:  gül0ler      güller

A is also resolved as an e if the last vowel is a long a (â), a long u (û), or an o followed by a palatalized l (ô, mostly in words of French origin). The long vowels are almost always realized on the surface as their short counterparts. For example:

  Lexical:  saât-lAr     N(hour)-PLU
  Surface:  saat0ler     saatler

  Lexical:  usûl-lAr     N(method)-PLU
  Surface:  usul0ler     usuller

  Lexical:  gôl-lAr      N(goal)-PLU
  Surface:  gol0ler      goller

Note that between the harmonizing and the harmonized vowel, there may be one or more consonants.

3.1.2 Resolving high vowels

Let H denote a high vowel in a suffix. It is resolved as follows:

H is resolved as a u if the last vowel in the stem is a back rounded vowel.

  Lexical:  okul-Hm          N(school)-1SG-POSS
  Surface:  okul0um          okulum

  Lexical:  gel-Hyor-yHm     V(come)-PROG-1SG
  Surface:  gel0iyor00um     geliyorum

H is resolved as a ü if the last vowel in the stem is a front rounded vowel, or a long û or ô as defined above.

  Lexical:  gün-Hm       N(day)-1SG-POSS
  Surface:  gün0üm       günüm

  Lexical:  göl-Hm       N(lake)-1SG-POSS
  Surface:  göl0üm       gölüm

  Lexical:  alkôl-Hm     N(alcohol)-1SG-POSS
  Surface:  alkol0üm     alkolüm

  Lexical:  usûl-Hm      N(method)-1SG-POSS
  Surface:  usul0üm      usulüm

H is resolved as an ı if the last vowel in the stem is a back unrounded vowel.

  Lexical:  masal-Hm     N(tale)-1SG-POSS
  Surface:  masal0ım     masalım

  Lexical:  yıldız-Hm    N(star)-1SG-POSS
  Surface:  yıldız0ım    yıldızım

H is resolved as an i if the last vowel in the stem is a front unrounded vowel, or a long a.

  Lexical:  ev-Hm        N(house)-1SG-POSS
  Surface:  ev0im        evim

  Lexical:  pir-Hm       N(master)-1SG-POSS
  Surface:  pir0im       pirim

  Lexical:  saât-Hm      N(watch)-1SG-POSS
  Surface:  saat0im      saatim

There are a very small number of special cases which present some problems with respect to vowel harmony.
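
The regular cases of the two resolution rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `resolve_A`, `resolve_H` and `attach` are hypothetical, the stem is assumed to be given in its surface spelling, and the long-vowel and palatalized-l exceptions (saat, usul, gol, alkol), as well as vowel and consonant drops, are deliberately not modeled.

```python
# Vowel classes as partitioned in Section 2.
BACK = set("aıou")
FRONT = set("eiöü")
ROUNDED = set("oöuü")
VOWELS = BACK | FRONT


def last_vowel(stem):
    """Return the last vowel of the stem (the harmonizing vowel)."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem {stem!r}")


def resolve_A(stem):
    """A -> a after a back vowel, e after a front vowel."""
    return "a" if last_vowel(stem) in BACK else "e"


def resolve_H(stem):
    """H -> ı, i, u or ü, agreeing in backness and rounding."""
    v = last_vowel(stem)
    if v in BACK:
        return "u" if v in ROUNDED else "ı"
    return "ü" if v in ROUNDED else "i"


def attach(stem, suffix):
    """Append a suffix, resolving the meta-phonemes A and H left to right.

    Each resolved vowel becomes the harmonizing vowel for the next one.
    """
    out = stem
    for ch in suffix:
        if ch == "A":
            out += resolve_A(out)
        elif ch == "H":
            out += resolve_H(out)
        else:
            out += ch
    return out
```

For instance, `attach("masa", "lAr")` yields masalar and `attach("ev", "HmHz")` yields evimiz; a word such as saat would come out as saatlar rather than saatler, which is exactly why such roots need the lexical long-vowel marking described above.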
These special cases occur with the verbal roots de (say) and ye (eat), where the only vowel in the root may be deleted under certain circumstances. In these cases, we assume the suffix vowel harmonizes with respect to the undropped lexical vowel.

3.2 Vowel drops

An H denoting a high vowel at the beginning of a suffix is deleted if the last phoneme of the stem is a vowel. For example:

  Lexical:  masa-Hm      N(table)-1PS-POSS
  Surface:  masa00m      masam

However, this drop does not occur if the high vowel is a part of the tense suffix (-Hyor) and the verbal stem ends with a vowel, in which case it is the stem-final vowel that drops. (This may also be viewed as the H actually dropping and the stem-final vowel becoming a high vowel if necessary.)

  Lexical:  kapa-Hyor    V(close)-PR-CON-3PS
  Surface:  kap00ıyor    kapıyor

The last vowel in certain roots is deleted when those roots take certain suffixes that start with either a vowel or a consonant that also drops. There is no uniform way of specifying such words, except possibly by explicitly enumerating them. Here we indicate such vowels by prefixing them with a certain marker ($).

  Lexical:  bur$un-Hm    N(nose)-1SG-POSS
  Surface:  bur00n0um    burnum

3.3 Consonant drops

The consonants n, s and y at the beginning of a suffix drop if the last phoneme of the stem is a consonant. However, if the suffix is -sHz, with H representing a high vowel, then s does not drop even if the preceding phoneme is a consonant.

  Lexical:  ev-nHn       N(house)-GEN
  Surface:  ev00in       evin

  Lexical:  kalem-sH     N(pencil)-3PS-POSS
  Surface:  kalem00i     kalemi

  Lexical:  kalem-sHz    N(pencil)-WITHOUT
  Surface:  kalem0siz    kalemsiz

  Lexical:  ağ$ız-yH     N(mouth)-ACC
  Surface:  ağ00z00ı     ağzı

3.4 Consonant changes

Let D denote a suffix-initial dental consonant that may be resolved as either a d or a t. It is resolved as a t if the last phoneme in the stem is one of {ç, f, h, k, p, s, ş, t}. Otherwise, D is resolved as a d. Some examples are:

  Lexical:  kitab-DA     N(book)-LOC
  Surface:  kitap0ta     kitapta

  Lexical:  yulaf-DAn    N(oat)-ABL
  Surface:  yulaf0tan    yulaftan

  Lexical:  aç-DHk       V(open)-PERF
  Surface:  aç0tık       açtık

Voiced stops b, d are realized as p, t respectively when they end the word, or when they are the last consonant of a stem that takes a morpheme starting with a consonant that does not drop. Some examples are:

  Lexical:  kitab-lAr    N(book)-PLU
  Surface:  kitap0lar    kitaplar

  Lexical:  kitab-cH     N(book)-NtoN(ci)
  Surface:  kitap0çı     kitapçı

  Lexical:  dolab-nHn    N(closet)-GEN
  Surface:  dolab00ın    dolabın

  Lexical:  tad-DHk      V(taste)-PERF
  Surface:  tat0tık      tattık

There are however some exceptions to this rule. These exceptions are: ab (water), kalb (heart), balad (ballad), hemoroid, önad (forename), soyad (last name), yad (remembrance), etc.

c is another voiced obstruent like those above, except that it also appears in certain suffixes as the first consonant, where it gets modified to a ç due to a reciprocal assimilation process. So the suffix-initial c is resolved as a ç if the last consonant of the stem is one of {ç, f, h, k, p, s, ş}. A stem-final c is resolved to a ç if it is also word-final or is followed by a consonant that does not drop. Some examples are:

  Lexical:  harac-cH     N(tribute)-NtoN(ci)
  Surface:  haraç0çı     haraççı

  Lexical:  yaş-cA       N(age)-NtoAdv(ca)
  Surface:  yaş0ça       yaşça

  Lexical:  harac-yA     N(tribute)-DAT
  Surface:  harac00a     haraca

There are however some exceptions to this rule. These exceptions are the following monosyllabic forms, and compound forms written together using these as the second component: aç (hungry/to open), çeç (wheat pile), göç (migration), haç (cross), iç (interior), etc.

A velar stop k at the end of a stem becomes a ğ when a suffix starting with a vowel is affixed. There may be a dropping consonant before the suffix.
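
The interaction of the suffix-initial consonant drop (Section 3.3), the resolution of D, and the devoicing of stem-final b, d and c can be sketched as follows. This is a hedged illustration rather than the authors' rule set: the function name `join` is hypothetical, suffix vowels are assumed to be already harmonized, a droppable suffix-initial consonant is written in parentheses as in the affix tables of Section 4, and neither the listed exceptions (ab, kalb, aç, ...) nor the changes to stem-final k and g are handled.

```python
import re

VOWELS = set("aeıioöuü")
# Stem-final phonemes after which D surfaces as t (set given in Section 3.4).
VOICELESS = set("çfhkpsşt")
# Stem-final voiced obstruents and their devoiced counterparts.
DEVOICE = {"b": "p", "d": "t", "c": "ç"}


def join(stem, suffix):
    """Join a lexical stem and a suffix whose vowels are already resolved.

    The suffix may start with a droppable consonant in parentheses,
    e.g. "(n)ın", or with the meta-phoneme D, e.g. "Da".
    An empty suffix stands for the word-final position.
    """
    # Suffix-initial n, s, y surface only after a vowel-final stem.
    m = re.match(r"\(([nsy])\)(.*)", suffix)
    if m:
        keep = m.group(1) if stem[-1] in VOWELS else ""
        suffix = keep + m.group(2)

    # Stem-final b, d, c devoice word-finally or before a consonant
    # that did not drop.
    if suffix == "" or suffix[0] not in VOWELS:
        stem = stem[:-1] + DEVOICE.get(stem[-1], stem[-1])

    # D agrees in voicing with the (possibly devoiced) stem-final phoneme.
    if suffix.startswith("D"):
        suffix = ("t" if stem[-1] in VOICELESS else "d") + suffix[1:]

    return stem + suffix
```

With this sketch, `join("kitab", "Da")` gives kitapta, while `join("dolab", "(n)ın")` gives dolabın because the n drops and the b is then followed by a vowel. Writing -sHz without parentheses (`join("kalem", "siz")`) keeps its s, matching the exception noted in Section 3.3.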
Examples of the k to ğ change are:

  Lexical:  ayak-nHn     N(foot)-GEN
  Surface:  ayağ00ın     ayağın

  Lexical:  tarak-Hm     N(comb)-1PS-POSS
  Surface:  tarağ0ım     tarağım

However, a stem-final k preceded by an n becomes a g under the same circumstances.

  Lexical:  renk-yH      N(color)-ACC
  Surface:  reng00i      rengi

  Lexical:  ahenk-yA     N(harmony)-DAT
  Surface:  aheng00e     ahenge

There are some exceptions to these, where the k does not change. These exceptions are the following monosyllabic forms and some polysyllabic words of foreign origin, and compound forms written together using these as the second component: afak (Arabic plural version of ufuk (horizon)), ahlak (ethics), arabesk, ark (water canal), aşk (love), bank (chair), etc.

A stem-final g (in words of foreign origin) also becomes a ğ when a suffix starting with a vowel is affixed. There may be a dropping consonant before the suffix.

  Lexical:  radyolog-yA    N(radiologist)-DAT
  Surface:  radyoloğ00a    radyoloğa

However, under the circumstances above, if the stem-final g is preceded by another consonant (only n and r seem to be such consonants) then the g does not become a ğ. Some examples are:

  Lexical:  brifing-Hm       N(briefing)-1PS-POSS
  Surface:  brifing0im       brifingim

  Lexical:  aysberg-HnHz     N(iceberg)-2PP-POSS
  Surface:  aysberg0iniz     aysberginiz

There are again some exceptions to these rules. These are: demagog, füg (fugue), gag, lig (league), pedagog, sinagog.

3.5 Words ending with (su)

Turkish has a large number of nominal roots ending with su (water), e.g., akarsu (river). Su, along with ne (what), does not obey the standard inflection rules. For example su-sH (water-3PS-POSS) is suyu and not susu, and su-nHn (water-GEN) is suyun and not sunun.² Thus a stem-final y is inserted into such stems when a suffix starting with a vowel or a dropping consonant is affixed.
Here are some examples:

  Lexical:  akarsu0-yHnHz    N(river)-2PP-POSS
  Surface:  akarsuy00unuz    akarsuyunuz

  Lexical:  akarsu-lAr       N(river)-PLU
  Surface:  akarsu0lar       akarsular

² For ne the normal inflections are also valid.

3.6 Gemination

In certain nominal forms of Arabic or Persian origin, there is a gemination process whereby the last consonant is duplicated when certain suffixes are added. Some examples are:

  Lexical:  üs0-sH       N(base)-3SG-POSS
  Surface:  üss00ü       üssü

  Lexical:  hak0-yH      N(right)-ACC
  Surface:  hakk00ı      hakkı

  Lexical:  hak-lAr      N(right)-PLU
  Surface:  hak0lar      haklar

The suffixes that cause this gemination are those that start with a dropping consonant. The words that undergo this gemination process are: hak, tıb, med, hal (solution), şık, ad (recognition), had, üs, zam, af, sır, hat.

3.7 s-drop

The first consonant of the 3SG-POSS suffix (-sH) drops, in exception to the general rule, when the suffix is added to certain words of Arabic origin ending with a vowel. The forms with and without the s are both considered legal, though two-vowel sequences are not at all common in Turkish.

  Lexical:  cami-sH      N(mosque)-3SG-POSS
  Surface:  cami00i      camii
  Surface:  cami0si      camisi

The following words have this property: bayi, cami, c ma, enva, filvaki, ibda, içtima, ifşa, ihtira, ikna, imtina, indifa, inkıta, intiba, irca, irtica, irtifa, ıttıla, kablelvuku, kanı, maktu, mani, matbu, mayi, mebde, mecmu, memba, menşe, merci, meta, mevdu, mevki, mevzi, mevzu, mısra, mudi, murabba, mürteci, muti, muttali, müvezzi, niza, rükü, sanayi, şayi, şeci, şema, şua, şuyu, tabı, teberru, terfi, teşci, teşri, teşyi, tetebbu, tevabi, tevazu, tevdi, tevessü, tevsi, tulu, vaki, vasi, veda, vuku, zayi, zıya, zürra.

4 Affix Inventory

In this section we present the set of suffixes that are available in Anatolian Turkish for word formation via derivational or inflectional means.
Many words derived with derivational suffixes are lexicalized, in the sense that their meaning is no longer related to the meaning of the stem in a predictable way. The suffixes marked with – in the tables are the ones which are relatively more productive and compositional in this sense.

4.1 Coding scheme

Upper-case letters in morphemic representations denote meta-phonemes.³ Parentheses indicate insertion/deletion depending on the previous segment. Codes for morphemes are of the form P0 P1 P2 P3 ..., where:

  P0        : Final grammatical category.
  P1        : Source grammatical category.
  P2        : Type of process. I for inflection and D for derivation. An N means not applicable.
  remainder : The mnemonic name of the morpheme.

Grammatical category codes are:

  N  Noun
  V  Verb
  A  Adverb
  J  Adjective
  R  Pronoun
  P  Postposition
  C  Conjunction
  X  Exclamation

For instance, NVD xyz means the xyz affix produces a noun from a verb via a derivation.

³ See the section on morphophonemic processes.

4.2 Noun Inflections (NNI xxxxxx)

Elements, in order, are given below. All except N are optional.

1. Noun stem (N)
2. Plural (NNI PLU)
3. Possessive (NNI POSSxx)
4. Case (NNI xxx)
5. Relative (NNI REL)

  Morphemic
  Representation  Code            Gloss                            Examples
  -lAr            – NNI PLU       Plural                           arabalar, evler
  -(H)m           – NNI POSS1s    1st person singular possessive   arabam, evim
  -(H)mHz         – NNI POSS1p    1st person plural possessive     arabamız, evimiz
  -(H)n           – NNI POSS2s    2nd person singular possessive   araban, evin
  -(H)nHz         – NNI POSS2p    2nd person plural possessive     arabanız, eviniz
  -(s)H           – NNI POSS3s    3rd person singular possessive   arabası, evi
  -lArH           – NNI POSS3p    3rd person plural possessive     arabaları, evleri
  -(y)H           – NNI OBJ       Objective (accusative) case      arabayı, evi
  -nH             – NNI OBJ3      Objective case (after 3P poss)   masasını
  0                 NNI NOM       Nominative case                  araba, ev
  -(n)Hn          – NNI GEN       Genitive case                    arabanın, evin
  -(y)A           – NNI DAT       Dative case                      arabaya, eve
  -nA             – NNI DAT3      Dative case (after 3P poss)      masasına
  -DA             – NNI LOC       Locative case                    arabada, evde
  -nDA            – NNI LOC3      Locative case                    masasında
  -DAn            – NNI ABL       Ablative case                    arabadan, evden
  -nDAn           – NNI ABL3      Ablative case                    masasından
  -(y)lA          – NNI INC       Instrumental/comitative case     arabayla, evle
  -ki             – NNI REL       Relative                         evdeki, arabadakilerinki

4.3 Derivations producing nouns (NxDxxxxx)

The 'adjective' and noun distinction in Turkish is a difficult one.⁴ Most adjectives can be used as nouns, and undergo the derivations from a noun. Nouns can perform the function of an adjective as noun modifier in noun-noun groups (izafet). See the chapter on syntax.
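
The coding scheme of Section 4.1, which labels every row of the affix tables, can be decoded mechanically. The following is a minimal sketch under stated assumptions: the names `MorphemeCode` and `decode` are hypothetical, and the three-letter prefix is assumed to be separated from the mnemonic by whitespace, as the codes appear in this text.

```python
from typing import NamedTuple

# Grammatical category codes from Section 4.1.
CATEGORIES = {
    "N": "Noun", "V": "Verb", "A": "Adverb", "J": "Adjective",
    "R": "Pronoun", "P": "Postposition", "C": "Conjunction",
    "X": "Exclamation",
}
# Process codes (position P2).
PROCESSES = {"I": "inflection", "D": "derivation", "N": "not applicable"}


class MorphemeCode(NamedTuple):
    result: str    # P0: final grammatical category
    source: str    # P1: source grammatical category
    process: str   # P2: type of process
    name: str      # remainder: mnemonic name of the morpheme


def decode(code):
    """Split a code such as 'NVD xyz' into its four components."""
    head, _, name = code.strip().partition(" ")
    if len(head) != 3:
        raise ValueError(f"expected a three-letter prefix in {code!r}")
    p0, p1, p2 = head
    return MorphemeCode(CATEGORIES[p0], CATEGORIES[p1], PROCESSES[p2], name)
```

For example, `decode("NVD xyz")` returns the components Noun, Verb, derivation and xyz, which matches the reading given in Section 4.1; an unknown category or process letter raises a `KeyError`.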
⁴ Nouns and adjectives are sometimes collectively called substantives.

4.3.1 Nouns from nouns (NND xxxxxx) or adjectives (NJD xxxx)

  Morphemic
  Representation  Code          Example
  -CA               NJD CA      akça, karaca, Hintçe
  -CA               NND CA      Türkçe, Arapça
  -cAğIz            NND CAGZ    adamcağız, köyceğiz
  -cAk              NND CAK     oyuncak, yavrucak
  -CH               NND CI      ekmekçi, odacı, işçi, çiftçi
                                yolcu, öncü, yabancı, aracı, konuşmacı
                                kaderci, akılcı, milliyetçi, Atatürkçü
  -CHk              NND CIK     bademcik, kızılcık, maymuncuk
  -CHl              NND CIL     öncül, balıkçıl
  -dAş              NND DAS     sırdaş, arkadaş, meslekdaş, gönüldeş
  -gen              NND GEN     üçgen, altıgen, köşegen
  -lHk              NND LIK     günlük, gözlük, sabunluk, salatalık, gecelik
                                kitaplık, delik, kömürlük, odunluk
                                ebelik, doktorluk, taşçılık
                                halkçılık, maddecilik, ırkçılık
  -lHk            – NJD LIK     bolluk, güzellik, titizlik, sıkışıklık

4.3.2 Nouns from verbs (NVD xxxxxx)

  Morphemic
  Representation  Code            Example
  -AcAk             NVD ACAK      alacak, verecek, içecek, yakacak, çekecek, kıracak
  -Ak               NVD AK        durak, yatak, batak, konak, yunak, sığınak, tapınak
                                  ölçek, kayak, kaydırak, uçak, yutak, saçak
  -amak             NVD AMAK      basamak, kaçamak, tutamak
  -An             – NVD AN        bakan, kapan, kalan, çağlayan, bölünen, bölen
  -AnAk             NVD ANAK      gelenek, görenek, yetenek, olanak, tutanak
                                  sağanak, ödenek
  -cA               NVD CA        sakınca, düşünce, eğlence, dinlence, söylence
  -gA               NVD GA        dalga, bilge, süpürge, önerge, bildirge, gösterge
  -(G)Aç            NVD GAC       kıskaç, süzgeç, sayaç, büyüteç
  -GAn              NVD GAN       ısırgan, sergen, yelken, ergen
  -GH               NVD GI        silgi, sargı, atkı, keski, süngü
                                  çatkı, dolgu, çizgi, içki, bitki
                                  sevgi, saygı, ilgi, etki, görgü
  -gHç              NVD GIC       dalgıç
  -gHn              NVD GIN       yangın, salgın, düzgün, bilgin, bozgun
  -H                NVD I         ölü, dolu, soru, korku, sanı
                                  yapı, dizi, sürü, batı, doğu, koşu
  -(y)HcH           NVD ICI       satıcı, yüzücü, okuyucu
                                  ısıtıcı, susturucu, uyuşturucu, taşıyıcı
  -(H)k             NVD IK        tanık, delik, kırık, göçük, bölük, katık
                                  ayrık, konuk, öksürük, buyruk, sarık, açık
  -(H)m             NVD IM        doğum, ölüm, yudum, atım, sayım, seçim
                                  dönüm, yarım, ekim, pişirim, içim, tadım
                                  bağlam, kavram, sağlam, uçurum, oturum, bitirim, kaldırım
  -(H)n             NVD IN        ekin, yığın, tütün, akın, ışın, gelin
  -(Hn)ç            NVD INC       kazanç, ilenç, sapınç, bilinç
  -HntH             NVD INTI      akıntı, kesinti, döküntü, kuruntu
  -(y)Hş            NVD YIS       dalış, geçiş, uçuş, yürüyüş
  -(H)t             NVD IT        ayırt, geçit, umut, yoğurt, yakıt, kesit, taşıt
  -mA               NVD MA        korunma, bekleme, araştırma
  -mAcA             NVD MACA      bilmece, kandırmaca, koşmaca, çekmece, atmaca
  -mAk            – NVD MAK       yapmak, yemek
  -mAn              NVD MAN       sayman, göçmen, okutman, eğitmen
  -mAzlHk         – NVD MAZLIK    aldırmazlık, dinlemezlik, uyuşmazlık
  -tH               NVD TI        belirti, kızartı

4.3.3 Adjectives from nouns (JND xxxxxx) or adjectives (JJD xxxxx)

  Morphemic
  Representation  Code            Example
  -CA               JJD CA        mertçe, güzelce, yaşlıca
  -CH               JND CI        şakacı, inatçı, uykucu, karahaberci
  -cHk              JND CIK       incecik, ufacık, küçücük
  -cHl              JJD CIL       evcil, bencil, ölümcül, insancıl
  -(H)msH           JJD IMSI      tatlımsı, mavimsi, sertimsi, hamurumsu
  -(H)mtırak        JJD MTRAK     ekşimtırak, yeşilimtırak
  -(H)ncH           JND INCI      birinci, ikinci, üçüncü, onuncu
  -(H)z             JND IZ        ikiz, üçüz
  -lH               JND LI        tatlı, sesli, uslu, türlü, Asyalı, Çinli
  -lHk              JND LIK       yemeklik, kiralık, turşuluk
  -mAn              JJD MAN       kocaman, küçümen, şişman, delişmen
  -sAl              JND SAL       anayasal, sorunsal, biçimsel
  -msAr             JJD MSAR      iyimser, kötümser, karamsar
  -(m)sH            JND MSI       erkeksi, yılansı, çocuksu, budalamsı
  -sHz            – JND SIZ       tatsız, evsiz, sonsuz, köksüz
  -(s)Ar            JND SER       birer, ikişer, dokuzar

4.3.4 Adjectives from verbs (JVD xxxx)

  Morphemic
  Representation  Code            Example
  -dHk              JVD MADIK     görülmedik, olmadık, ummadık
  -AcAk           – JVD ACAK      oturacak, içecek, söylenecek, olmayacak
  -AgAn             JVD AGAN      olağan, süreğen, durağan, gezegen
  -(A)k             JVD AK        korkak, sarsak, dişlek, yuvarlak, uzak
  -An             – JVD AN        yaratan, gören, sevilen, tüten
  -gAç              JVD GAC       utangaç, üşengeç
  -GAn              JVD GAN       çalışkan, alıngan, çekingen, dövüşken
  -giç              JVD GIC       bilgiç
  -GHn              JVD GIN       dalgın, yorgun, üzgün, bezgin, şaşkın
                                  yetişkin, alışkın
  -H                JVD I         sıkı, ölü, katı, dolu, duru
  -(y)HcH           JVD ICI       yırtıcı, bölücü, geçici, üzücü
  -(H)k             JVD IK        batık, kesik, çizik, tutuk, yanık, çatık
                                  işlek, oynak, patlak, büyük, soğuk, bulanık
  -(H)lH            JVD ILI       ekili, sarılı, örtülü, kurulu
  -Hn               JVD IN        uzun, sayın
  -(Hn)ç            JVD INC       kıskanç, iğrenç, korkunç, gülünç
  -(H)ntH           JVD INTI      uyuntu, özenti, süprüntü
  -Hr (Ar)          JVD IR        olur, okur, düşünür, bilir, içilir
                                  akar, bakar, keser, döner
  -mA               JVD MA        dökme, yapma, kaplama, serpme
  -mAz            – JVD MAZ       görünmez, yılmaz, utanmaz
  -mHş            – JVD MIS       geçmiş, okumuş, pişmiş

4.4 Verb Inflections (VVI xxxxxxx)

Elements are given below. VVI Txxx and VVI PERS are required if the verb is finite.

1. Verb stem (V)
2. Reflexive (VVI REFX)
3. Reciprocal/Collective (VVI RECP)
4. Causative (VVI CAUSx)
5. Passive (VVI PASSx)
6. Impossible (VVI IMP)
7. Negative (VVI NEG)
8. Tense-aspect (VVI Txxxx)
9. Auxiliary (VVI Xxxxx)
10. Person (VVI PERSxxx)

  Morphemic
  Representation  Code              Gloss                       Examples
  -(H)n             VVI REFX        Reflexive                   kapan, kaçın, örtün, vurun, edin
  -Hş               VVI RECP        Reciprocal/Collective       kaçıştır, büzüştür, koşuşmak
  -DHr            – VVI CAUSD       Causative                   kaldır, arttır, güldür, sektir
  -t              – VVI CAUST       Causative                   çıkart, küçült
  -(H)r           – VVI CAUSR       Causative                   çıkar, batır
  -Hl             – VVI PASSL       Passive                     yapılmış, küçüldü
  -(H)n           – VVI PASSN       Passive                     vidalandı
  -(y)AmA         – VVI IMP         Impossible                  geleme, kalama
  -mA             – VVI NEG         Negative                    gelme, kalma
  -(H)r           – VVI TAORSH      Aorist tense                kalır, bulur, büyür, gelir
  -(A)r           – VVI TAORSA      Aorist tense                geçer, kaçar
  -(H)yor         – VVI TPROG       Progressive                 geçiyor, kalıyor, buluyor, gülüyor
  -DH             – VVI TPAST       Past tense                  kaldı, geçti, buldu, güldü
  -mHş            – VVI TNARR       Narrative past              kalmış, bulmuş, olmuş
  -(y)AcAk        – VVI TFUTR       Future                      kalacak, gelecek, isteyecek
  -(y)A           – VVI TOPTA       Optative                    gelmiyeydi, kazmıyaydı
  -mAlI           – VVI TNECE       Necessitative               gelmeli, bulmalı, bilmeli
  -sA             – VVI TCOND       Conditional                 gelse, vursa, bulasa
  -yAbil          – VVI TABIL       Abilitative                 gidebil, kalamayabil
  -yAmA           – VVI ANEG        Negative abilitative        gideme, okuyama
  -yAdur          – VVI TDUR        Continuous                  gidedur, çalışadur
  -yAkal          – VVI TKAL                                    bakakal
  -yAyaz          – VVI TYAZ                                    düşeyaz, unutayaz
  -yAgör          – VVI TGOR                                    yapagör
  -yAgel          – VVI TGEL                                    yapagel
  -yAkoy            VVI TKOY                                    alıkoy
  -(y)DI          – VVI XPAST       Past aux                    yapsaydı, gelmişti, gelecekti
  -(y)mHş         – VVI XDUBT       Dubitative aux              tembelmiş, gitmişmiş, buradaymış
  -(y)sA          – VVI XCOND       Conditional aux             buradaysa, bulduysa, gelmişse
  -(y)ken         – VVI XADV1       Adverbial aux               gelmişken, buradayken
  -ArAk           – VVI XADV2       Adverbial aux               bakarak, gelerek
  -cAsInA         – VVI XADV3       Adverbial aux               bilmişçesine, uçarcasına
  -(H)m           – VVI PERS1s      1st person singular         geldim, bulmuşum
  -(H)z           – VVI PERS1p1     Type I 1st person plural    geliriz, bulmuşuz
  -k              – VVI PERS1p2     Type II 1st person plural   geldik, baksak
  -(sH)n          – VVI PERS2s      2nd person singular         gelsen, bulursun
  -(sH)nHz        – VVI PERS2p      2nd person plural           gelseniz, bulursunuz
  -DHr            – VVI DHR         copula                      buradadır, gelmişizdir
  -0              – VVI PERS3st1    Type I 3rd singular         okurlar, gelmiş
  -z              – VVI PERS3st2    Type II 3rd singular        yapamaz, gelemez
  -lAr            – VVI PERS3p      3rd plural                  okurlar, gelmişler

4.5 Derivations producing verbs

4.5.1 Verbs from nouns (VND xxxxx) or adjectives (VJD xxxx)

  Morphemic
  Representation  Code            Example
  -dA               VND DA        parıldamak, höpürdemek
  -A                VND A         yaşamak, kanamak, dilemek, türemek, kocamak
  -(A)l             VJD AL        azalmak, incelmek, düzelmek, daralmak, körelemek
                                  yükselmek, ufalmak
  -Ar               VJD AR        ağarmak, kararmak, göğermek, yeşermek
  -et               VND ET        gözetmek, yönetmek
  -HmsA             VJD IMSE      küçümsemek
  -lA               VND LA        sepetlemek, çuvallamak, kalaylamak, kazıklamak
                                  tuzlamak, katlamak, öğütlemek, sabunlamak
                                  patlamak, gürlemek, melemek, havlamak
                                  atlamak, saplamak, yoklamak
  -lA               VJD LA        sollamak, genişlemek, serinlemek, ucuzlamak
  -lAn              VND LAN       evlenmek, yaşlanmak, uslanmak, ayaklanmak, kibirlenmek
  -lAş            – VJD LAS       güzelleşmek, yoksullaşmak, başkalaşmak
  -sA               VND SA        susamak, tavsamak, kapsamak, umursamak, önemsemek
  -sA               VJD SA        garipsemek, ıraksamak, yakınsamak

4.5.2 Verbs from verbs (VVD xxxxx)

  Morphemic
  Representation  Code            Example
  -DAr              VVD DAR       aktarmak, kaytarmak, kotarmak
  -AklA             VVD AKLA      duraklamak, iteklemek, tartaklamak
  -AlA              VVD ALA       eşelemek, oğalamak, kovalamak, şaşalamak, silkelemek
  -(H)klA           VVD IKLA      uyuklamak, didiklemek, dürtüklemek, sayıklamak
  -mAk              VVD MAK       yapmak, gelmek
  -HştHr            VVD USTUR     itiştirmek, veriştirmek, atıştırmak, oğuşturmak

4.5.3 Adverbs from nouns (AND xxxxx) or adjectives (AJD xxxxx)

  Morphemic
  Representation  Code            Example
  -CA               AND CA        dostça, sınıfça
  -CA               AJD CA        usulca, ayrıca, böylece
  -cAk              AND CAK       evcek
  -(y)A             AND YA        beriye, uca, yarına, aşağıya
  -(y)A             AJD YA        temize, ucuza
  -Hn               AND IN        kışın, güzün
  -Hn               AJD IN        ilkin, ansızın
  -lA               AND LA        hızla, güçlükle, zamanla, öncelikle
  -leyin            AND LAYIN     sabahleyin, akşamleyin

4.5.4 Adverbs from verbs (AVD xxxxx)

  Morphemic
  Representation  Code            Example
  -dHkçA            AVD DIKCA     oldukça, gittikçe
  -ArAk(tAn)        AVD ARAK      olarak, giderekten, bakaraktan, gülerek, bilmeyerek

4.5.5 Adverbs from adverbs (AAD xxx)

  Morphemic
  Representation  Code            Example
  -cek              AAD CEK       demincek

5 Morphotactics

5.1 Paradigms

Turkish has two main paradigms for word formation. The nominal paradigm applies to nouns and adjectives and describes the order of the inflectional suffixes. This paradigm is described in Figure 1. The verbal paradigm applies to verbs and describes the order of the inflectional suffixes that are applicable to verbal roots. It is shown in Figure 2. These paradigms do not however describe cross-paradigm derivations, which will be described in the next section on suffix sequencing.

Turkish morphotactics allow productive formation of words whose part-of-speech categories may change a number of times during affixation. One can start with a nominal root, then form a verbal form with a suffix, which can then take an aspect suffix and then become a nominal form again through, for example, a gerund suffix, and then take the standard nominal suffixes (plural, possessive, case, etc.). It is also possible to have circular constructions (an example of which is given later). This however does not mean that there are no restrictions on such formations. In fact there are semantic restrictions on the formations.
It is possible to enforce such restrictions in the morphotactics, but the mechanisms one would need would have to be much more sophisticated than the simple provisions provided by most morphological analyzers.

  nominal root + plural suffix + possessive suffix + case suffix + relative suffix

  plural suffix:        -lAr
  possessive suffixes:  -(H)m, -(H)mHz, -(H)n, -(H)nHz, -(s)H, -lArH
  case suffixes:        -(y)H, -(y)A, -DAn, -nH, -nDA, -(y)lA, -DA, -(n)Hn, -nA, -nDAn
  relative suffix:      -ki

Figure 1: The nominal model

In this section we will present the morphotactics of Turkish word paradigms by means of finite state machines. In the figures describing our morphotactic component (such as Figure 3), the boxes indicate suffixation states, and the arrows indicate the next states which can be reached when a suffix matching one of the labels is found. The circles, labeled End, indicate the final states for complete and valid word formations; the labels in parentheses near these states indicate the class of the word construction when the machine ends up in that final state. The 0 on a transition indicates that the transition can be taken with null input. The states drawn in bold correspond to references to states in other figures. For example, the state labeled Possessive-3 indicates the state of a nominal construction to which a third person possessive suffix has been affixed. From that state one can go to a final state indicating a nominal in accusative case with the suffix -nH, or to the states labeled Case-1 or Case-2 with the relevant case suffixes, or to another final state with the suffix -cA.

5.1.1 Finite State Machine for Nominal Morphotactics

Figure 3 shows the finite state machine for the nominal paradigm. The morphotactics for the nominal paradigm is relatively simple. There are mainly two parts. The top part corresponds to nominal constructions with plural, possessive, case and relativization suffixes. It is technically possible to go around the loop through the state labeled Relative a number of times, though in practice such constructions are rarely used.
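
The top part of the nominal machine, including the loop through the Relative state, can be approximated with a small nondeterministic finite state acceptor over lexical suffix labels. The sketch below is an assumption-laden illustration, not a transcription of Figure 3: the state names and the transition table are hypothetical simplifications based on the description above, -ki is assumed to follow only the locative and genitive suffixes, and the nominal-verb part, -cA, and all morphophonemic rules are left out.

```python
EPS = ""  # stands for the 0 (null) transitions of the figures

# Hypothetical, simplified transition table: state -> {suffix: [next states]}.
TRANSITIONS = {
    "Root":    {"lAr": ["Plural"], EPS: ["Plural"]},
    "Plural":  {"Hm": ["Poss"], "Hn": ["Poss"], "HmHz": ["Poss"],
                "HnHz": ["Poss"], EPS: ["Poss"],
                "sH": ["Poss3"], "lArH": ["Poss3"]},
    "Poss":    {"yH": ["End"], "yA": ["Case"], "DAn": ["Case"],
                "ylA": ["Case"], EPS: ["Case"],
                "DA": ["CaseRel"], "nHn": ["CaseRel"]},
    "Poss3":   {"nH": ["End"], "nA": ["Case"], "nDAn": ["Case"],
                "ylA": ["Case"], EPS: ["Case"],
                "nDA": ["CaseRel"], "nHn": ["CaseRel"]},
    # Assumption: only locative and genitive forms can take -ki.
    "CaseRel": {"ki": ["Relative"]},
    # After -ki the word can be pluralized again or take a case suffix.
    "Relative": {"lAr": ["Plural"], "nH": ["End"], "nA": ["Case"],
                 "nDAn": ["Case"], "nDA": ["CaseRel"], "nHn": ["CaseRel"]},
    "Case":    {},
    "End":     {},
}
FINAL = {"Case", "CaseRel", "Relative", "End"}


def closure(states):
    """All states reachable from `states` through null transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in TRANSITIONS[stack.pop()].get(EPS, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(suffixes):
    """True if the suffix sequence is a valid path from Root to a final state."""
    current = closure({"Root"})
    for suffix in suffixes:
        step = {t for s in current for t in TRANSITIONS[s].get(suffix, [])}
        if not step:
            return False
        current = closure(step)
    return bool(current & FINAL)
```

Under these assumptions, `accepts(["lAr", "Hm", "DA", "ki", "lAr", "nHn", "ki", "nDA"])` holds, since the sequence passes through the Relative state twice, whereas `accepts(["Hm", "lAr"])` fails because a possessive cannot precede the plural.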
For example, going through the Relative state more than once makes it possible to have a word structure like:

  MASA-LAR-IM-DA-Kİ-LER-İN-Kİ-NDE

which roughly means "at those (things) which belong to those (other things) at my tables."

  verbal root + voice suffixes + negation suffix + compound verb suffix + main tense suffix + question suffix + second tense suffix + person suffix

  voice suffixes:
    reflexive:   -(H)n
    reciprocal:  -(H)ş
    causative:   -DHr, -Ht, -t, -Hr, -Ar
    passive:     -Hl, -Hn, -n
  negation suffixes:       -mA, -(y)AmA
  compound verb suffixes:  -(y)Abil, -(y)Ayaz, -(y)Adur, -(y)Akal, -(y)Hver, -(y)Agel, -(y)Akoy, -(y)Agör
  main tense suffixes:     -DH, -mHş, -(y)AcAk, -(H)r, -Ar, -(H)yor, -mAktA, -sA, -(y)A, -mAlH, -0
  question suffix:         -mH
  second tense suffixes:   -(y)DH, -(y)mHş, -(y)sA
  person suffixes:         -m, -n, -0, -k, -nHz, -lAr, -(y)Hm, -sHn, -(y)Hz, -sHnHz, -lHm, -(y)Hn, -(y)HnHz, -sHnlAr

Figure 2: The verbal model

The bottom part of the nominal morphotactics state diagram corresponds to the nominal verb and adverbial constructions like:

  evdeydi:        (S/he/it) was at the house.
  evdeyse:        If (s/he/it) is at the house.
  evdeymiş:       (S/he/it) was at the house. (Narrative)
  evdeyim:        I am at the house.
  evdedirler:     They are (definitely) at the house.
  evdeyken:       While (someone) is (was) at the house.
  evdeymişçesine: (behaving) as if he is at the house.

The nominal morphotactics are a bit different for compound nouns. The additional states required by these compound nouns are shown in Figure 4. Compound nouns which are treated as a single lexical unit have two components, both of which are nominal roots. Thus Turkish does not have a productive compounding paradigm such as in German. The second component in such compound nouns is always affixed with a compound marker, which is the same as the third person possessive suffix, when the compound noun is used in the nominative case. For example bitpazarı (flea market) (lexical bit-pazar-sH) is used as both the nominative form and the third person possessive form. However, further affixation does not proceed as in other nominals.
For example, the plural of bitpazarı is bitpazarları, where the plural suffix is affixed to the nominative form of the second part of the compound and then the third person possessive is added. Similarly, in bitpazarım (my flea market) or bitpazarın (your flea market), the affixation is onto the nominative form of the second component and not onto the nominative form of the compound noun.

Some lexical elements are already in plural form. For those cases the plural suffix and/or the possessive suffixes are skipped in the morphotactics. For example:

amcamlar (the family/home of my uncle): (5) This is already in plural form and does not take any possessive suffix either. Hence the suffix lexicon that follows this is the CASE-1 lexicon.
bakliyat (legumes), baklagiller (leguminous plants) are already in plural form.

(5) Note that amcamlar looks like it has a possessive suffix (-Hm) followed by the plural suffix (-lAr). However, the morphotactics puts the possessive after the plural, hence it cannot be parsed as such within the nominal paradigm.

For nouns already in plural form and ending in -lAr, the possessive suffix -sH can be interpreted as either the third person singular possessive or the third person plural possessive.

[Figure 3: Finite State Machine for Nominal Morphotactics. States: Nominal Root, Plural, Possessive, Possessive-3, Case-1, Case-2, Relative, Nominal Verb 1, Nominal Verb 2, Nominal Verb 4, Nominal Verb-2 Person, End.]

[Figure 4: Finite State Machine for Compound Noun Morphotactics. States: Compound Noun Root, Nominal Root, No Possessive, Plural, Possessive, Possessive-3, Case-1, Case-2, Nominal Root Plural, Nominal Root Plural/lAr. Note on +sH*: this possessive has both singular and plural interpretation.]

5.1.2 Finite State Machine for Verbal Morphotactics

Figures 5 and 6 show the finite state machine for the verbal paradigm. The verbal morphotactics is significantly more complicated than the nominal morphotactics. Turkish verbal structures can take a sequence of reflexive, reciprocal, causative and passive suffixes, which can then be followed by a compound verb, and then by aspect, tense and person suffixes. Verbal structures can also be made into nominal or adverbial structures with the addition of yet other suffixes. When a verbal root takes no reflexive or reciprocal suffix, the causative or the passive suffixes can take a variety of forms depending on a number of criteria on the roots. If, however, the root takes either the reflexive or the reciprocal suffix (which are mutually exclusive), then the causative and passive formations are very simple, as shown in Figure 5. After the state labeled Passive Hn, which corresponds to a verbal stem for which all the reflexive/reciprocal, causative, and passive suffixes are accounted for, we can construct a negative form with -mA or -yAmA, or go directly into positive verb construction.
In any case, we can optionally add one of a small number of auxiliary (or compound) verbs (the most common being -yAbil, indicating potentiality) to get a verbal stem to which we can now add tense and person suffixes, or suffixes which form nominal structures, infinitives and adverbs.

Turkish verbs can have at most two suffixes indicating aspect and tense. The first one can be one of the narrative, future, aorist, present continuous, necessitative, optative, imperative, perfect and conditional suffixes. These can take possibly different sets of person suffixes to form a verbal structure, or take a second morpheme indicating perfect, conditional or narrative. As can be seen from the morphotactics diagram, not all combinations of the aspect and tense suffixes are possible. The second set of suffixes is only allowed if the first suffix is one of narrative, future, aorist, present continuous and necessitative. There are a number of nonstandard cases, especially involving the third person plural, and these are accounted for in the state diagrams.

An example will clarify the general idea behind verbal constructions. Consider the verb

görülemiyormuşum

which can be translated into English as "(it is said that) I was not able to be seen." The morpheme structure is:

gör   -Hl     -yAmA   -Hyor        -ymHş    -yHm
gör   -ül     -0em0   -iyor        -0muş    -0um
see   -PASS   -NEG    -PRES-CONT   -NARR    -1PS

The verbal root gör generates the structure above by going through the states labeled:

1. Verbal Root (root)
2. Passive Hl with -Hl
3. Passive Hn with 0
4. Negative yama with -yAmA
5. Verbal Stem with 0
6. Other Tense with -Hyor
7. Second Tense Other with -ymHş
8. End with -yHm

Readers familiar with the details of verb formation in Turkish will note that our morphotactic model does not deal with the three groups, totaling 13 verbal roots, whose aorist forms are exceptions to the rules.
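The state sequence of the görülemiyormuşum example can be reproduced with a small transition table that takes a 0 (null) transition whenever the current state has no arc for the next suffix. The sketch below encodes only the arcs named in that example, not the full machines of Figures 5 and 6; the names TRANSITIONS and trace are hypothetical, suffix labels are written in ASCII, and the input is assumed to be already segmented into lexical suffixes.

```python
# Only the arcs used by the worked example; None stands for the 0 (null) input.
TRANSITIONS = {
    ("Verbal Root", "Hl"): "Passive Hl",
    ("Passive Hl", None): "Passive Hn",
    ("Passive Hn", "yAmA"): "Negative yama",
    ("Negative yama", None): "Verbal Stem",
    ("Verbal Stem", "Hyor"): "Other Tense",
    ("Other Tense", "ymHs"): "Second Tense Other",
    ("Second Tense Other", "yHm"): "End",
}


def trace(suffixes, start="Verbal Root", final="End"):
    """Return the list of states visited, or None if the input is rejected."""
    state = start
    path = [state]
    for suffix in suffixes:
        # Take null transitions until an arc labeled with this suffix exists.
        while (state, suffix) not in TRANSITIONS:
            if (state, None) not in TRANSITIONS:
                return None  # no labeled arc and no null arc: reject
            state = TRANSITIONS[(state, None)]
            path.append(state)
        state = TRANSITIONS[(state, suffix)]
        path.append(state)
    return path if state == final else None


# gor -Hl -yAmA -Hyor -ymHs -yHm
for number, name in enumerate(trace(["Hl", "yAmA", "Hyor", "ymHs", "yHm"]), 1):
    print(number, name)
```

The loop prints the same eight states, in the same order, as the numbered list in the example. A fuller model would need nondeterministic search, since in the real diagrams several arcs can share a label.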
[Figure 5: Finite State Machine for Verbal Morphotactics. States: Verbal Root, Reflexive, Reciprocal, Causative DHr, Causative t, Passive Hl, Passive Hn, Negative ma, Negative yama, Negative Aorist, Verbal Stem, Adverb-1, Adverb-2, Infinitive, Nominal Root, Case-1, End.]

[Figure 6: Finite State Machine for Verbal Morphotactics (cont.). States: Negative Aorist, Verbal Stem, Other Tense, Optative, Imperative, Perfect/Cond, Other Tense 3rd Person, Other Tense Person, Other Tense Sec Tense-3PP, Second Tense Perfect/Cond, Second Tense Other, End.]

6 Multiple-word constructs

Analyzing text on a lexical item basis may generate spurious analyses when multiple lexical items act as a single syntactic or semantic entity. For example, in the sentence

Şirin mi şirin bir köpek koşa koşa geldi. (A very cute dog came running.)

the fragment şirin mi şirin constitutes a duplicated emphatic adjective in which there is an embedded question suffix mi (written separately in Turkish), and the fragment koşa koşa is a duplicated verbal construction where each form has the morphological parse:

koşa                           English
1. N(koşa)            N        twin
2. V(koş)-OPT-3SG     V        let him run

yet the duplicated form has the grammatical role of manner adverb in the sentence.
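The two patterns in this sentence, an adjective repeated around the question suffix and an immediately repeated word form, can be grouped by a single pass over the tokens. The following is a rough string-level sketch only: the function name group_duplications and the category labels are hypothetical, tokens are assumed to be lower-cased and written without diacritics, and a real post-morphological phase would inspect the morphological parses rather than compare surface strings.

```python
# Harmonized forms of the question suffix: mi, mı, mu, mü
# (escapes keep the source file ASCII-only).
QUESTION_FORMS = {"mi", "m\u0131", "mu", "m\u00fc"}


def group_duplications(tokens):
    """Group 'X mi X' and 'X X' token sequences into single units.

    Returns a list of (text, category) pairs in sentence order.
    """
    units = []
    i = 0
    while i < len(tokens):
        if (i + 2 < len(tokens)
                and tokens[i + 1] in QUESTION_FORMS
                and tokens[i] == tokens[i + 2]):
            # e.g. "sirin mi sirin": emphatic adjectival form
            units.append((" ".join(tokens[i:i + 3]), "emphatic"))
            i += 3
        elif i + 1 < len(tokens) and tokens[i] == tokens[i + 1]:
            # e.g. "kosa kosa": plain duplication
            units.append((" ".join(tokens[i:i + 2]), "duplication"))
            i += 2
        else:
            units.append((tokens[i], "word"))
            i += 1
    return units


sentence = "sirin mi sirin bir kopek kosa kosa geldi"
for text, category in group_duplications(sentence.split()):
    print(category, text)
```

Exact string comparison only covers constructs whose two halves are identical; pairs such as yapar yapmaz or gitti gideli share a root but not a surface form, so they would need root information from the morphological analyzer.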
The following is a set of multi-word constructs in Turkish that can be handled in a post-morphological, pre-syntactic analysis phase. This list is not meant to be comprehensive, and new construct specifications can easily be added.

1. duplicated optative and 3SG verbal forms functioning as manner adverbs, e.g., koşa koşa,
2. aorist verbal forms with root duplications and sense negation functioning as temporal adverbs, e.g., yapar yapmaz (an exception is olur olmaz, which may also function as a manner adverb),
3. duplicated verbal and derived adverbial forms with the same verbal root acting as temporal adverbs, e.g., gitti gideli,
4. duplicated compound adjectival form constructions that act as adjectives, e.g., güzeller güzeli,
5. adjective or noun duplications that act as manner adverbs, e.g., hızlı hızlı, ev ev,
6. emphatic adjectival forms involving the question suffix, e.g., güzel mi güzel,
7. word sequences with specific usage whose semantics is not compositional, e.g., yanı sıra, hiç olmazsa,
8. proper nouns, e.g., Süleyman Demirel, Topkapı Sarayı,
9. idiomatic forms and duplications which are never used alone, e.g., gurul gurul,
10. other idiomatic forms.

Recognizing and appropriately marking these prior to syntactic analysis substantially aids in parsing.

References

[1] R. Şimşek. Örneklerle Türkçe Sözdizimi (Turkish Syntax with Examples). Kuzey Matbaacılık, 1987.
[2] Jorge Hankamer. Finite state morphology and left to right phonology. In Proceedings of the West Coast Conference on Formal Linguistics, volume 5. Stanford University, 1986.
[3] L. E. Knecht. Subject and Object in Turkish. PhD thesis, Department of Linguistics and Philosophy, Massachusetts Institute of Technology, Cambridge, Massachusetts, February 1986.
[4] Aydın Köksal. Automatic Morphological Analysis of Turkish. PhD thesis, Hacettepe University, Ankara, Turkey, 1975.
[5] G. L. Lewis. Turkish Grammar. Oxford University Press, 1991.
[6] Kemal Oflazer. Two-level description of Turkish morphology. Linguistic and Literary Computing, 1994.
[7] Ayşın Solak and Kemal Oflazer. Design and implementation of a spelling checker for Turkish. Linguistic and Literary Computing, 1993.
[8] Richard Sproat. Morphology and Computation. MIT Press, 1992.
[9] Robert Underhill. Turkish Grammar. MIT Press, 1976.
[10] Harry van der Hulst and Jeroen van de Weijer. Topics in Turkish phonology. In Hendrik Boeschoten and Ludo Verhoeven, editors, Turkish Linguistics Today. E. J. Brill, 1991.