Toward an Ontology of the Sumerian Language

Part 1. The Sumerian Language

§ 1. The Sumerian language has characteristics which it shares with many other languages of the world, living or dead. Unfortunately, it does not seem to share its whole set of characteristics with any other known language, living or dead. In other words, it has contradictory and conflicting aspects and inconsistent or ambiguous relationships between its parts, and it stands as a unicum among languages. Of course, that simply means that we Assyriologists do not yet have a reliable reconstruction of Sumerian grammar. In fact, it is common knowledge among scholars of third-millennium B.C. Mesopotamia that every Sumerologist has his own grammar in mind, which differs from that of every other scholar in the field on this or that point. So, when speaking about Sumerian, we have to deal with a sort of strange grammatical monstrum (slide n. 1-3, showing the area in which Sumerian developed).

Joking aside, there is one very good reason for the difficulties Sumerologists have in reconstructing a reliable grammar of Sumerian, and it is historical in nature. Throughout their history, from the very first attestations on the tablets of Uruk at the end of the IVth millennium until the collapse of Sumerian culture at the beginning of the IInd, the Sumerians never strove to write down everything that was necessary (or that we feel to be necessary) to read a text. To put it differently, they wrote only the nucleus of the information, hence the term “nuclear writing”; and even though they added more and more grammatical elements over time, they never considered it obligatory to give the reader of a text all the elements needed to read it as the writer had conceived it.
The word which best describes the attitude of the reader of a Sumerian text is “hospitalization”: the reader had to supply, where necessary, the grammatical elements the writer had left unwritten, because the writer considered what he had written sufficient for anybody to understand his message and consequently to “hospitalize” it grammatically (slide n. 4-11, examples of Sumerian texts from each period).

§ 2. Typologically, Sumerian is an agglutinative language: the word (verb, substantive, adjective, etc.) we read in a given text is identical to the word we find in the vocabulary. In other words, Sumerian has no notion of a “root”, that is, a linguistic unit which bears a basic semantic value but which, as such, does not itself form part of the vocabulary; Sumerian has no inflection. Meaning is specified by morphemes added, agglutinated, to the unchanging word in a fixed order: after it in the case of the substantive, before and after it in the case of the verb. Moreover, generally speaking, each morpheme has only one meaning and is therefore transparent and immediately recognizable (slide n. 12). For instance, in the Latin expression filiis, “to the children”, we can detect a root, *fili-, and a suffix, *-is, which has at least three different meanings: masculine, plural and dative. In Sumerian each of these specifications requires a different morpheme; in this specific case we would have *dumu-nita2-ene-ra, that is, *child-male-plural-dative, “to the children”. Together with the noun they refer to, the morphemes form a group which is often called a “chain”, so we can speak of noun chains and, as we shall see, of verbal chains. It is worth noting that Sumerian has a two-class system for the substantive: class A for gods and human beings, and class B for everything else.
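The agglutinative principle of § 2 can be illustrated with a minimal Python sketch: each grammatical specification maps to its own single-meaning morpheme, appended in a fixed order to a base that never changes. The function name, the feature labels and the three-entry inventory are hypothetical and cover only the dumu-nita2-ene-ra example; they are not a model of Sumerian nominal morphology as a whole.

```python
from __future__ import annotations

# Hypothetical inventory for the § 2 example: each suffix has exactly one meaning.
SUFFIXES = {"male": "nita2", "plural": "ene", "dative": "ra"}
# Assumed fixed order after the noun base, as in dumu-nita2-ene-ra.
ORDER = ("male", "plural", "dative")


def noun_chain(base: str, base_gloss: str, features: set[str]) -> tuple[str, str]:
    """Agglutinate one morpheme per requested feature after an unchanging base.

    Returns the segmented form and its one-to-one gloss.
    """
    # The fixed order is imposed here, whatever order the features arrive in.
    chosen = [f for f in ORDER if f in features]
    form = "-".join([base] + [SUFFIXES[f] for f in chosen])
    gloss = "-".join([base_gloss] + chosen)
    return form, gloss


form, gloss = noun_chain("dumu", "child", {"dative", "plural", "male"})
print(form)   # dumu-nita2-ene-ra
print(gloss)  # child-male-plural-dative
```

Latin filiis cannot be built this way: its single suffix -is encodes masculine, plural and dative at once, so there is no one-to-one mapping from features to morphemes to exploit.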
§ 3. Sumerian is an ergative language. This means that the subject of a sentence with one participant is morphologically identical to the patient of a sentence with two participants. For example (slide n. 13):

    lugal-e e2-Ø in(=i+n)-du3-Ø     The king built a temple
    lugal-Ø i3-gin-Ø                The king went

In the first sentence we have two participants, the “temple” and the “king”, while in the second only the “king”, so to speak, participates in the action. The way in which the participants are treated morphologically is quite different from what happens, for instance, in Latin, which is not an ergative language but has a nominative-accusative structure:

    rex templum exstruxit           The king built a temple
    rex ivit                        The king went

Here the word rex carries its nominative case-marker, *-s (*reg-s), which marks it as the subject of the action in both sentences. In Sumerian, on the other hand, the word lugal, “king”, is treated differently in the two sentences, as we have seen. In the first sentence lugal takes the case-marker *-e, called “ergative”, which marks it as the actor (from Greek ergázomai, “I work, I act”), while the word e2, “temple”, is left unmarked, or rather is marked with the zero case-marker, called absolutive (in many ergative languages the patient is left unmarked). In the second sentence lugal has no case-marker, that is, it is in the absolutive, because the king does not act upon something else: so to speak, he exercises his power only on himself.

§ 4. The verb is the part of the grammar where it is hardest to reconstruct reliably the function of the many morphemes we can detect in its structure.
First of all, Sumerian has a series of verbal prefixes: one of them seems to be required in a finite verbal phrase and is always present in a finite verbal form, perhaps indicating that the structure is to be understood as having an acting subject. To date there is no satisfactory explanation for these prefixes, and they remain the most troublesome aspect of the entire Sumerian grammar. Moreover, the Sumerian verb has a characteristic which is admittedly rather rare among the languages of the world and which can be labeled “verbal incorporation”. This simply means that the verb, which always stands at the end of the sentence according to the SOV (subject-object-verb) pattern, incorporates or, so to speak, absorbs into its structure (we shall soon see how) references not only to the ergative and the absolutive but also to a series of other noun phrases used by the speaker (slide n. 14).

For instance, in the sentence “The king drank beer in the garden with the general”, the verbal chain at the end of the sentence should contain, after a prefix which always opens a verbal form: the morpheme for the ergative (lugal-e), *-n-, in preverbal position; the one for the absolutive (kaš-Ø), *-Ø, in postradical position; and the infixes, called dimensional in Sumerian grammar, of the comitative, *-n+da-, and of the locative, *-b+a-. The form should therefore be translated literally:

    (FINITE VERB)-him+with-it+in-he(past)-drink-(it)

At the beginning we have the prefix *i-, one of the series of prefixes, chosen here purely for didactic purposes. Then come the so-called dimensional infixes, which appear in a fixed order (a verbal form can contain up to four of them, although two is the average); in this case comitative (with him) and locative (in it), recalling the two noun phrases in the sentence.
Then we have the morpheme *-n-, in preverbal position, recalling the ergative and indicating that 1) the agent is a third person of class A (a human being); 2) it is singular; 3) the action is in the past. Finally, we find a zero morpheme in postradical position recalling the absolutive (kaš, “beer”). This shows how central the verbal chain is to the Sumerian language.

Let us now turn to the ontology created to represent, in a knowledge base, the characteristics briefly described above.

Part 2. The Ontology of the Sumerian Language

The construction of an ontology of the Sumerian language was part of a wider project by the company Epistematica s.r.l. to test Semantic Web technology and its application to computational linguistics. In this context, the company had already completed an ontology of Esperanto, aiming to demonstrate that a linguistic parser with reasoning capabilities could be built. After this experience, it seemed appropriate to test the procedure on a language that was not artificial like Esperanto but natural, with an evolving grammar and a more complex history. Sumerian was chosen for this next step for two reasons. First, it is a natural language, with all the problems of "irrationality" this entails, yet its textual corpus is closed and can be treated as a unit that cannot be further modified. Second, the transparency of its grammar, owing to its agglutinative character, makes morphemes easier to identify. It is important to stress that the following step should be the formal description of a living language (e.g. Turkish). I began writing the ontology of Sumerian with Dr. Marco Romano, an expert in Semantic Web theory, who helped me write it with two programs for managing ontologies in OWL format: Protégé 3.2, an ontology editor, and RacerPro 1.8, a description-logic reasoner.
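Before the T-Box and A-Box are introduced, it may help to see the grammar of Part 1 written down as plain data. The following is a minimal Python sketch, not the OWL model built in Protégé: it encodes the ergative marking of § 3 and the slot order of the verbal chain of § 4 (prefix, dimensional infixes, preverbal agent marker, root, postradical absolutive marker). The class and function names are hypothetical, the slot inventory covers only the morphemes mentioned in the text, and DRINK is a placeholder for the verbal root of the didactic example, which the text does not spell out. Forms are shown in segmented notation (i-n-du3-Ø rather than the written in-du3).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed fixed order of the two dimensional infixes discussed in § 4
# (comitative before locative); a full grammar has more slots than this.
DIMENSIONAL_ORDER = ("comitative", "locative")


@dataclass
class VerbalChain:
    """Hypothetical slot model of a finite Sumerian verbal form."""
    prefix: str                    # e.g. "i", "i3", "mu": always present, function debated
    root: str                      # the unchanging verbal base, e.g. "du3" (build)
    dimensional: dict[str, str] = field(default_factory=dict)  # case name -> infix
    agent: str = ""                # preverbal marker recalling the ergative, e.g. "n"
    absolutive: str = ""           # postradical marker recalling the absolutive, e.g. "Ø"

    def morphemes(self) -> list[str]:
        slots = [self.prefix]
        # Infixes are emitted in the fixed grammatical order, not the order given.
        slots += [self.dimensional[c] for c in DIMENSIONAL_ORDER if c in self.dimensional]
        if self.agent:
            slots.append(self.agent)
        slots.append(self.root)
        if self.absolutive:
            slots.append(self.absolutive)
        return slots

    def __str__(self) -> str:
        return "-".join(self.morphemes())


def mark_case(noun: str, role: str) -> str:
    """Ergative alignment (§ 3): only the agent of a two-participant clause takes -e;
    the patient and the sole participant of a one-participant clause take zero."""
    return f"{noun}-e" if role == "agent" else f"{noun}-Ø"


def transitive(agent: str, patient: str, verb: VerbalChain) -> str:
    return " ".join([mark_case(agent, "agent"), mark_case(patient, "patient"), str(verb)])


def intransitive(subject: str, verb: VerbalChain) -> str:
    return " ".join([mark_case(subject, "subject"), str(verb)])


built = VerbalChain(prefix="i", root="du3", agent="n", absolutive="Ø")
went = VerbalChain(prefix="i3", root="gin", absolutive="Ø")
# Didactic example of § 4; "DRINK" is a placeholder for the root.
drank = VerbalChain(prefix="i", root="DRINK", agent="n", absolutive="Ø",
                    dimensional={"locative": "b+a", "comitative": "n+da"})

print(transitive("lugal", "e2", built))  # lugal-e e2-Ø i-n-du3-Ø
print(intransitive("lugal", went))       # lugal-Ø i3-gin-Ø
print(drank)                             # i-n+da-b+a-n-DRINK-Ø
```

The point of the design is that the agent "lugal" is marked on the noun only in the transitive clause, while the verb carries the cross-reference (-n- for the agent, -Ø for the absolutive) in fixed positions, which is what makes the verbal chain a natural candidate for a class with ordered subclasses in an ontology.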
An ontology is composed of two elements: a T-Box (Terminological Box) and an A-Box (Assertion Box) (slide n. 15). The T-Box is the taxonomic part of the ontology, where concepts and the hierarchy that exists among them are defined, and where the relationships among different concepts are formalized. It is the "shape" of the ontology. The A-Box is the part of the ontology that contains the facts: individual instances are assigned to specific classes and their properties are stated. It is the "substance" of the ontology. In the case of Sumerian (slide n. 16), the T-Box is the Sumerian grammar, while the A-Box consists of every Sumerian text that can be formalized in the grammar described in the T-Box.

The possibility of applying this system to the Sumerian language, which is still so problematic for scholars, could be very important: once the T-Box is formalized and a large number of Sumerian texts are applied to it, we could let the machine tell us where our reconstruction of the grammar is right and where, on the contrary, the reconstruction formalized in the T-Box cannot be applied to a text. It would thus be possible to add new texts (and there are hundreds of thousands of them in museums all over the world) to see where our reconstruction of the language holds and where we need to seek new grammatical solutions.

Since the ontology was an "experiment", we decided not to use a long text, but one which could show some of the most common grammatical features of Sumerian, in order to obtain a small but consistent and fully instantiated A-Box. After analyzing several texts, the choice fell on a foundation brick of Ur-Namma, founder of the Third Dynasty of Ur, who ruled in Mesopotamia between 2112 and 2095 B.C. (slide n. 17). This means (slide n. 18) that in the T-Box I use, for example, the class of nominal chain (which has two subclasses: possessive and case-marker) and the class of verbal chain (which has subclasses such as prefix, dimensional infix and so on). The transliteration (that is, the rendering of each cuneiform sign in Latin characters) and the translation of the text are as follows (slide n. 19-20):

    dNanna                  to the god Nanna
    lugal-a-ni              his king
    Ur-dNamma               Ur-Namma
    lugal-Urim5ki-ma-ke4    king of Ur
    e2-a-ni                 his temple
    mu-na-du3               he built
    bad3-Urim5ki-ma         the walls of Ur
    mu-na-du3               he built

This text is our A-Box. The T-Box was formalized using the grammar found in this text, and after about two months of work the program could distinguish each grammatical element of the text (slide n. 21); for example, it recognized that lugal-a-ni is a substantive + possessive adjective:

    dNanna                  noun (= god) + (dative case)
    lugal-a-ni              noun (= substantive) + possessive adjective (a-ni)
    Ur-dNamma               noun (= personal name)
    lugal-Urim5ki-ma-ke4    noun (= substantive) + genitive (city name + ak) + ergative
    e2-a-ni                 noun (= substantive) + possessive adjective (a-ni)
    mu-na-du3               verbal chain: prefix (mu) + dimensional infix (na) + verbal root (du3)
    bad3-Urim5ki-ma         noun (= substantive) + genitive (city name + a<k>)
    mu-na-du3               verbal chain: prefix (mu) + dimensional infix (na) + verbal root (du3)

As I said above, this ontology is only an experiment, an attempt, but it shows that Semantic Web technologies can be applied to a natural language as well. This seems to be the right track, and I am confident that these technologies will provide important new tools not only for Sumerian but also for many other areas of linguistics. This work has produced a final report (slide n. 22), available at http://dx.doi.org/10.1683/ab0002, and an ontology in OWL format (Ur_Namma.owl), available at http://dx.doi.org/10.1683/me0004.
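The procedure described above, in which texts are applied to a formalized T-Box and the machine reports where the grammar does and does not fit, can be imitated in miniature. The Python sketch that follows is neither the OWL ontology (Ur_Namma.owl) nor a description-logic reasoner. It is a hypothetical stand-in in which the T-Box is a handful of regular expressions named after the classes in the Ur-Namma segmentation and the A-Box is the list of words of the brick. A word that no class matches is reported, which is the toy analogue of a text to which the reconstructed grammar cannot be applied.

```python
from __future__ import annotations

import re

# Toy stand-in for the T-Box: each grammatical class is a regular expression over the
# hyphenated transliteration. Class names and patterns are illustrative assumptions
# covering only the Ur-Namma brick; the real ontology uses OWL classes, not regexes.
T_BOX = {
    "divine name + (dative case)": r"dNanna",
    "personal name": r"Ur-dNamma",
    "noun + possessive adjective (a-ni)": r"[^-]+-a-ni",
    "noun + genitive (city name + ak) + ergative": r"[^-]+-Urim5ki-ma-ke4",
    "noun + genitive (city name + a<k>)": r"[^-]+-Urim5ki-ma",
    "verbal chain: prefix (mu) + dimensional infix (na) + root": r"mu-na-[^-]+",
}

# Toy stand-in for the A-Box: the words of the inscription, in reading order.
A_BOX = ["dNanna", "lugal-a-ni", "Ur-dNamma", "lugal-Urim5ki-ma-ke4",
         "e2-a-ni", "mu-na-du3", "bad3-Urim5ki-ma", "mu-na-du3"]


def classify(word: str) -> str | None:
    """Return the first T-Box class whose pattern matches the whole word, or None."""
    for name, pattern in T_BOX.items():
        if re.fullmatch(pattern, word):
            return name
    return None


def uncovered(words: list[str]) -> list[str]:
    """Words the current T-Box cannot account for: candidates for revising the grammar."""
    return [w for w in words if classify(w) is None]


for word in A_BOX:
    print(f"{word:22} {classify(word)}")

print(uncovered(A_BOX))               # []
print(uncovered(A_BOX + ["in-du3"]))  # ['in-du3']
```

Adding in-du3, the form analysed in § 3, shows the intended behaviour: the toy T-Box, built only from this inscription, has no class for a verbal chain with the prefix i- and the agent marker -n-, so the word is flagged and the grammar would need extending.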