Annotation guidelines

The following guidelines are rules for annotating a corpus with XML tags in order to delimit multiword expressions with adverbial function.

1. General definition

A multiword expression is defined as an expression made of several words with several of its elements frozen together. For example, de nos jours ‘nowadays’ should be tagged as <ADV fs='PDETC'>de nos jours</ADV>. It is multiword because it is made of three words; it is frozen because the words do not belong to a paradigm of words which could be freely substituted for them.

The criterion for being made of several words instead of one is the presence of at least one character which is not a letter inside the expression. Thus, adverbs with an internal apostrophe, such as d'ailleurs ‘by the way’, should be tagged. Discontinuous expressions should not be tagged, except if the discontinuity consists of an embedded phrase (cf. section 3 below).

The criterion of frozenness is the fact that the combination of elements in the multiword expression does not obey productive rules of syntactic and semantic compositionality. For example, de nos jours ‘nowadays’ is frozen in Il est facile de nos jours de s'informer ‘Getting informed is easy nowadays’, and therefore it should be tagged; the same phrase is not frozen, and therefore should not be tagged, in Voici la liste de nos jours de fermeture ‘Here is the list of our closing days’.

An expression has an adverbial function if it is a complement (of a predicative expression or of an adverb), but not an object. For example, au hasard has an adverbial function in Ils erraient au hasard ‘They were wandering at random’, but not in Ils faisaient confiance au hasard ‘They trusted chance’. In the latter sentence, au hasard is an object of the predicative expression faisaient confiance ‘trusted’.

In the following, we give more detailed guidelines on the application of these rules in various cases of doubt.

2. Adverbial function

In these guidelines, a complement of a predicative expression or of an adverb is said to have an adverbial function if and only if it is not an object. The distinction between objects (or essential complements) and complements with adverbial function should be made on the basis of criteria (Gross, 1986, 1990a, 1990b) involving the fact that
- complements with adverbial function are optional (but some objects are optional too),
- they combine freely with a wide variety of predicates,
- and some of them pronominalize with specific forms.

For example, à neuf heures ‘at nine’ has an adverbial function and should be tagged in Nous avons résolu le problème à neuf heures ‘We fixed the problem at nine’, but not in Nous avons fixé la réunion à neuf heures ‘We set the meeting at nine’, because the first sentence is paraphrased by Nous avons résolu le problème et cela s'est produit à neuf heures ‘We fixed the problem and that happened at nine’, whereas the second is not paraphrased by Nous avons fixé la réunion et cela s'est produit à neuf heures ‘We set the meeting and that happened at nine’.

In French, the essential/adverbial distinction is particularly difficult in the case of locative complements. In case of doubt, annotators should use the criterion of support sentences (Guillet, Leclère, 1992). For example, au fond ‘at/to the bottom’ has an adverbial function and should be tagged in
(1) Un ruisseau coule au fond ‘A stream flows at the bottom’
but not in
(2) Nous sommes descendus au fond ‘We descended to the bottom’
To check this, construct both support sentences Nous sommes au fond ‘We are at the bottom’ and Nous ne sommes pas au fond ‘We are not at the bottom’ and observe that one of them holds before the process denoted by sentence (2) and the other holds after it. The same is not observed with (1).

Complements of predicative expressions should be analysed in order to determine whether they have an adverbial function.
This includes complements:
- of verbs, as tous les jours ‘every day’ in Il se promène tous les jours ‘He takes a walk every day’;
- of adjectives, as de plus en plus ‘more and more’ in L'eau est de plus en plus froide ‘The water is colder and colder’;
- and of support-verb constructions, as tous les jours ‘every day’ in Il fait une promenade tous les jours ‘He takes a walk every day’, or à travers les frontières ‘across borders’ in Le public manifeste sa solidarité à travers les frontières ‘The public shows solidarity across borders’.

Complements of adverbs should also be analysed, as de plus en plus ‘more and more’ in L'eau coule de plus en plus vite ‘The water is flowing faster and faster’.

However, complements of nouns should not be tagged, as de tous les jours ‘everyday’ in Il fait sa promenade de tous les jours ‘He takes his everyday walk’, or à travers les frontières ‘across borders’ in Les victimes s'en remettent à la solidarité à travers les frontières ‘The victims hope for solidarity across borders’.

3. Embedded free parts

When a modifier with an embedded free phrase is embedded in a multiword expression with adverbial function, the free phrase should be annotated according to its syntactic category. For example, du fait de cette décision ‘as a consequence of this decision’ should be tagged as
<ADV fs='PCDN'>du fait de <NP>cette décision</NP></ADV>
because the noun phrase cette décision ‘this decision’ is embedded in the complement with adverbial function.

If the preposition is contracted with the determiner, the tagging should leave the contraction out of the noun phrase. For example, du fait du temps ‘as a consequence of the weather’ should be tagged as
<ADV fs='PCDN'>du fait du <NP>temps</NP></ADV>

When the embedded phrase is a clause, it should be tagged as a sentence, i.e. with the S element: du fait qu'il a plu ‘since it has rained’ should be tagged as
<ADV fs='PCDN'>du fait qu'<S>il a plu</S></ADV>

A complementizer introducing a sentential complement should be left out of the embedded sentence, as in the preceding example. A relative pronoun introducing a relative clause should be included in the embedded sentence: au moment où il a plu ‘at the moment when it rained’ should be tagged as
<ADV fs='PCDN'>au moment <S>où il a plu</S></ADV>

4. Named entities

Named entities should be tagged only when they are multiword and have an adverbial function. For example, le soir ‘the evening’ should be tagged in Le soir, le vent tomba ‘In the evening, the wind fell’, but not in Le soir arriva ‘The evening came’.

For named entities, the criterion of frozenness mentioned in section 1 is less relevant. (Named entities which are not clearly frozen obey a specific syntax, but this syntax is usually largely independent of the rest of the syntax of the language.)

5. Inclusion in larger multiword units

When an adverbial multiword expression is embedded in another, the inner one should be tagged only if it has an adverbial function with respect to the embedding expression:
<ADV fs='PCDN'>du fait, <ADV fs='PC Conj'>en somme</ADV>, de <NP>cette décision</NP></ADV> ‘as a consequence, in short, of this decision’
or if it is embedded in a free phrase:
<ADV fs='PCDN'>du fait qu'<S><ADV fs='PCA'>à coup sûr</ADV>, il a plu</S></ADV> ‘since it has certainly rained’

In other cases, the inner expression should not be tagged, for example when a named entity denoting a date or a time is a part of another named entity:
<ADV fs='DATE Conj'>Le lendemain 2 mai à midi</ADV> ‘On the day after, May 2nd, at noon’

In particular, annotating multiword expressions with adverbial function involves analysing sentences and detecting whether sequences are included in larger frozen units. Such larger frozen units may be verbal idioms, e.g. s'attendre au pire ‘expect the worst’, or frozen prepositional phrases used with être ‘be’, e.g. au mieux ‘at one's best’. In these phrases, au pire and au mieux should not be tagged, even though they can have an adverbial function in other contexts: au pire ‘at worst’, au mieux ‘at best’.

6. Coordination

When a multiword unit is coordinated with another one and appears as reduced because a common part is pronominalized, it should be tagged as if it were not reduced. For example, in dans les rangs de la fonction publique et dans ceux du privé ‘in the ranks of civil servants and in those of the private sector’, the noun rangs ‘ranks’ is pronominalized into ceux ‘those’; therefore, dans les rangs de la fonction publique ‘in the ranks of civil servants’ should be tagged on its own, and dans ceux du privé should be tagged with the same tags as if it had the form dans les rangs du privé ‘in the ranks of the private sector’.

The rules above do not apply when a modifier position in the expression is occupied by a coordination of embedded free phrases, as in dans les rangs de la fonction publique et du privé ‘in the ranks of civil servants and of the private sector’, which should be tagged as a single occurrence of an expression, with two embedded free noun phrases.

The rules above do not apply either when the whole coordination is frozen, as in en tout et pour tout ‘altogether, only’, which is recognizable by the impossibility of permuting the coordinated parts (pour tout et en tout ‘for everything and in everything’ is interpretable only compositionally).

7. Subcategories

Multiword expressions annotated in the corpus should be assigned the name of the subcategory to which they belong. These subcategories are based upon the surface constituency of the internal structure of multiword expressions, except for named entities, for which the subcategory depends on the semantic content. A closed list of subcategories should be used (category name: description of morphosyntactic structure; example):
- PC: Preposition and noun; par exemple
- PDETC: Preposition, determiner and noun; de nos jours
- PAC: Preposition, determiner, preposed adjective and noun; à la dernière minute
- PCA: Preposition, determiner, noun and postposed adjective; à la nuit tombante
- PCDC: Prepositional phrase containing a prepositional phrase with preposition de and frozen noun phrase; dans la limite du possible
- PCPC: Prepositional phrase containing a prepositional phrase with preposition other than de and frozen noun phrase; à cent pour cent
- PCONJ: Coordination; tôt ou tard
- PCDN: Prepositional phrase containing a prepositional phrase with preposition de and free noun phrase; à l’insu de NP
- PCPN: Prepositional phrase containing a prepositional phrase with preposition other than de and free noun phrase; en comparaison avec NP
- PV: Expression with a subjectless verb; à dire vrai
- PF: Expression with an embedded sentence; jusqu'à ce que mort s'ensuive
- PECO: Comparative phrase with comme and a noun phrase, compatible with an adjective; <fidèle> comme un chien
- PVCO: Comparative phrase with comme and a noun phrase, compatible with a verb; <travailler> comme un chien
- PPCO: Comparative phrase with comme and a prepositional phrase, compatible with a verb; <disparaître> comme par enchantement
- PJC: Expression beginning with a coordinating conjunction; mais aussi et surtout
- DATE: Named entity denoting a date; le 22 mai 2008
- DURATION: Named entity denoting a duration; pendant vingt-quatre heures
- TIME: Named entity denoting a time; à huit heures du soir
- FREQUENCE: Named entity denoting a frequency; deux fois par jour

Not all multiword expressions in French strictly match one of these descriptions. Some of them match variants of them: for instance, à nouveau matches the PC structure, except for the part of speech of nouveau, which is an adjective rather than a noun. In that case, annotators are requested to select the closest structure, here PC, so that the closed list above is respected.
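
Conformance to the closed list can be checked mechanically. The following is a minimal sketch in Python, not part of the guidelines: it assumes that each ADV element carries exactly one subcategory name in its fs attribute, that any other identifier (such as Conj in the examples above) is a feature to be ignored, and that the annotated text is well-formed XML once wrapped in a root element. The function name check_subcategories and the wrapper element doc are hypothetical.

```python
import xml.etree.ElementTree as ET

# Closed list of subcategory names, copied from the list above.
SUBCATEGORIES = {
    "PC", "PDETC", "PAC", "PCA", "PCDC", "PCPC", "PCONJ", "PCDN", "PCPN",
    "PV", "PF", "PECO", "PVCO", "PPCO", "PJC",
    "DATE", "DURATION", "TIME", "FREQUENCE",
}


def check_subcategories(fragment):
    """Return one message per ADV element whose fs attribute does not
    contain exactly one name from the closed list (an assumption of
    this sketch, see the lead-in)."""
    # Wrap the fragment in a dummy root (hypothetical name "doc") so that
    # running text around the tags still parses as a single XML document.
    root = ET.fromstring("<doc>" + fragment + "</doc>")
    problems = []
    # iter() visits nested ADV elements too, e.g. those of section 5.
    for adv in root.iter("ADV"):
        features = adv.get("fs", "").split()
        names = [f for f in features if f in SUBCATEGORIES]
        if len(names) != 1:
            text = "".join(adv.itertext())
            problems.append("%r: fs=%r" % (text, " ".join(features)))
    return problems


if __name__ == "__main__":
    sample = ("Il est facile <ADV fs='PDETC'>de nos jours</ADV> de s'informer, "
              "<ADV fs='Conj'>en somme</ADV>.")
    for message in check_subcategories(sample):
        print(message)
```

In the sample, the first ADV element passes and the second is reported, because its fs attribute contains the Conj feature but no subcategory name.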
If the expression to be annotated is a variant of an expression with an embedded free phrase, the morphosyntactic structure is assigned according to the form with the embedded free phrase. For example, à nos yeux ‘in our opinion’ is a variant of aux yeux de NP ‘in the opinion of NP’ in which NP is expressed as a possessive; therefore, it should be assigned the PCDN structure. Similarly, à ce sujet ‘about this’ is a variant of au sujet de NP ‘about NP’, and should be assigned the PCDN structure.

8. Conjunctive function

An expression with adverbial function assumes a conjunctive function in discourse if it connects the clause in which it occurs with the previous clause, as en somme ‘in short’. The positive value of this feature is indicated by the identifier ‘Conj’ in the attribute ‘fs’. Example: <ADV fs='PC Conj'>en somme</ADV>.

9. XML Syntax

The XML syntax for tagging multiword expressions with adverbial function involves
- the ADV element
- the fs attribute in the ADV element.

The value of the fs attribute is a list of feature identifiers separated by spaces. Example: <ADV fs='PCDN Conj'>En conséquence</ADV>. Feature identifiers may be subcategory names such as PCDN and binary feature names such as Conj.

The syntax for tagging embedded free parts in multiword expressions involves
- the NP element for embedded noun phrases
- the S element for embedded clauses.

Bibliography

Gross, Maurice. 1986. Lexicon-Grammar. The representation of compound words. In Proceedings of the Eleventh International Conference on Computational Linguistics, Bonn, West Germany, pp. 1-6.
Gross, Maurice. 1990a. Grammaire transformationnelle du français: 3. Syntaxe de l’adverbe. Paris, ASSTRIL.
Gross, Maurice. 1990b. La caractérisation des adverbes dans un lexique-grammaire. Langue Française, 86, pp. 90-102.
Guillet, Alain; Christian Leclère. 1992. La structure des phrases simples en français. Les constructions transitives locatives. Genève, Droz, 446 p.