Thesis

Dynamics, causation, duration in the predicate-argument structure of verbs: a computational approach based on parallel corpora

SAMARDZIC, Tanja

Abstract

This dissertation addresses systematic variation in the use of verbs where two syntactically different sentences are used to express the same event. We show that the frequency distribution of the syntactic alternants depends on the meaning of the verbs: force dynamics in light verb constructions, external causation in lexical causatives, and duration in verb aspect classes. Both intra-linguistic and cross-linguistic variation in morphological and syntactic realisations of semantically equivalent items are taken into account by analysing data extracted from parallel corpora. The three semantic properties are empirically induced from observations automatically extracted from large parallel corpora, which are automatically parsed and word-aligned. The generalisations are learned from the extracted data automatically, using statistical inference and machine learning techniques. The accuracy of the predictions made on the basis of the generalisations is assessed experimentally on an independent set of test instances.

Reference

SAMARDZIC, Tanja. Dynamics, causation, duration in the predicate-argument structure of verbs: a computational approach based on parallel corpora. Thèse de doctorat: Univ. Genève, 2013, no. L. 796

URN: urn:nbn:ch:unige-384224
DOI: 10.13097/archive-ouverte/unige:38422
Available at: http://archive-ouverte.unige.ch/unige:38422

Disclaimer: the layout of this document may differ from the published version.

Doctoral Dissertation

Dynamics, causation, duration in the predicate-argument structure of verbs: A computational approach based on parallel corpora

Tanja Samardžić

June 30, 2014

Supervisor: Prof. Paola Merlo

IMPRIMATUR

La Faculté des lettres, sur le préavis d'une commission composée de Madame et Messieurs les professeurs Jacques MOESCHLER, président du jury; Paola MERLO, directrice de thèse; Balthasar BICKEL (Université de Zurich); Jonas KUHN (Université de Stuttgart); Martha PALMER (University of Colorado, Boulder), autorise l'impression de la présente thèse, sans exprimer d'opinion sur les propositions qui y sont énoncées.

Genève, le 23 décembre 2013
Le Doyen: Nicolas ZUFFEREY

Thèse No 796

Abstract

This dissertation addresses systematic variation in the use of verbs where two syntactically different sentences are used to express the same event, such as the alternations in the use of decide, break, and push shown in (0.1-0.3). We study the frequency distribution of the syntactic alternants, showing that the distributional patterns originate in the meaning of the verbs.

(0.1) a. Mary took/made a decision.
      b. Mary decided (something).
(0.2) a. Adam broke the laptop.
      b. The laptop broke.
(0.3) a. John pushed the cart.
      b. John pushed the cart for some time.

Both intra-linguistic and cross-linguistic variation in morphological and syntactic realisations of semantically equivalent items are taken into account by analysing data extracted from parallel corpora. The dissertation includes three case studies: light verb constructions (0.1) in English and German, lexical causatives (0.2), also in English and German, and verb aspect classes (0.3) in English and Serbian.

The core question regarding light verb constructions is whether verbs such as take and make, when used in expressions such as (0.1a), turn into functional words, losing their lexical meaning. Arguments for both a positive and a negative answer have been put forward in the literature. The results of our study suggest that light verbs keep at least the force-dynamic semantics of their lexical counterparts: the inward dynamics in verbs such as take and the outward dynamics in verbs such as make.
The inward dynamics results in a cross-linguistic preference for compact grammatical forms (single verbs), and the outward dynamics results in a preference for analytical forms (constructions).

The study on lexical causatives (0.2) addresses the question of why some verbs in some languages do not alternate while their counterparts in other languages do. The results of the study suggest that the property which underlies the variation is the likelihood of external causation. Events described by the alternating verbs are distributed on a scale of increasing likelihood for an external causer to occur. The verbs which alternate in some but not in other languages are those which describe events at the two extremes of the scale. In these verbs, the preference for one alternant is so strong that the other alternant rarely occurs, which is why it is not attested in some languages. There are two ways in which the likelihood of external causation can be empirically assessed: a) by observing the typological distribution of causative vs. anticausative morphological marking across a wide range of languages, and b) by observing the frequency distribution of transitive vs. intransitive uses of the alternating verbs in a corpus of a single language. Our study shows that these two measures are correlated. By applying the corpus-based measure, the position on the scale of likelihood of external causation can be determined automatically for a wide range of verbs.

The subject of the third case study is the relationship between two temporal properties encoded by the grammatical category of verb aspect: event duration and temporal boundedness. The study shows that these two properties interact in a complex but predictable way, giving rise to the observed variation in morphosyntactic realisations of verbs. English native speakers' intuitions about the possible duration of events described by verbs (short vs. long) are predicted from the patterns of formal aspect marking in the equivalent Serbian verbs. The accuracy of the prediction based on the bilingual model is superior to that of the best-performing monolingual model.

One of the main contributions of the dissertation is a novel experimental methodology, which relies on automatic processing of parallel corpora and statistical inference. The three properties of the events described by verbs (dynamics orientation, the likelihood of external causation, duration) are empirically induced on the basis of observations automatically extracted from large parallel corpora (containing up to over a million sentences per language), which are automatically parsed and word-aligned. The generalisations are learned from the extracted data automatically, using statistical inference and machine learning techniques. The accuracy of the predictions made on the basis of the generalisations is assessed experimentally on an independent set of test instances.

Résumé

Cette thèse porte sur la variation systématique dans l'usage des verbes où deux phrases, différentes par rapport à leurs structures syntaxiques, peuvent être utilisées pour exprimer le même événement. La variation concernée est montrée dans les exemples (0.4-0.6). Nous étudions la distribution des fréquences des alternants syntaxiques en montrant que la source des patterns distributionnels est dans le contenu sémantique des verbes.

(0.4) a. Mary took/made a decision.
         Marie pris/fait une décision
         Marie a pris une décision.
      b. Mary decided (something).
         Marie décidé (quelque chose)
         Marie a décidé (quelque chose).
(0.5) a. Adam broke the laptop.
         Adam cassé le ordinateur
         Adam a cassé l'ordinateur.
      b. The laptop broke.
         le ordinateur cassé
         L'ordinateur s'est cassé.
(0.6) a. John pushed the cart.
         Jean poussé le chariot
         Jean a poussé le chariot.
      b. John pushed the cart for some time.
         Jean poussé le chariot pour quelque temps
         Jean poussait le chariot pendant quelque temps.
La variation intra-linguistique ainsi que la variation à travers les langues concernant les réalisations morphologiques et syntaxiques des items sémantiquement équivalents sont prises en compte. Ceci est effectué par une analyse des données extraites de corpus parallèles. La thèse contient trois études de cas: les constructions à verbes légers (0.4) en anglais et allemand, les verbes causatifs lexicaux (0.5), également en anglais et allemand, et les classes d'aspect verbal (0.6) en anglais et serbe.

La question centrale par rapport aux constructions à verbes légers est de savoir si les verbes comme take et make, utilisés dans des expressions comme (0.4a), deviennent des mots fonctionnels, perdant donc entièrement leur contenu lexical. Des arguments en faveur des deux réponses, positive et négative, ont été cités dans la littérature. Les résultats de notre étude suggèrent que les verbes légers maintiennent au moins la sémantique de dynamique de force appartenant au contenu des verbes lexicaux équivalents: la dynamique orientée vers l'agent de l'événement (vers l'intérieur) des verbes comme take et la dynamique orientée vers d'autres participants dans l'événement (vers l'extérieur) des verbes comme make. La dynamique orientée vers l'intérieur a pour conséquence une préférence pour des réalisations compactes (des verbes individuels) à travers les langues, tandis que la dynamique orientée vers l'extérieur a pour conséquence une préférence pour des formes analytiques (des constructions).

L'étude des verbes causatifs lexicaux (0.5) porte sur la variation à travers les langues concernant la participation de ces verbes dans l'alternance causative: pourquoi certains verbes dans certaines langues n'entrent-ils pas dans l'alternance causative tandis que leurs verbes correspondants dans d'autres langues le font? Les résultats de l'étude suggèrent que la caractéristique sémantique qui est à la source de la variation est la probabilité de la causalité externe de l'événement décrit par un verbe. Les événements décrits par les verbes causatifs lexicaux sont placés le long d'une échelle de probabilité croissante de la causalité externe. Les verbes qui entrent dans l'alternance dans une langue, mais ne le font pas dans d'autres langues, sont les verbes décrivant des événements qui se trouvent aux deux extrémités de l'échelle. Ces verbes ont une préférence pour l'un des deux alternants si forte que l'autre alternant n'apparaît que rarement. Ceci est la raison pour laquelle un des deux alternants n'est pas observé dans certaines langues. Il y a deux moyens empiriques pour estimer la probabilité de la causalité externe: a) en observant la distribution typologique des morphèmes causatifs vs. anticausatifs dans la structure des verbes causatifs lexicaux à travers un grand nombre de langues et b) en observant la distribution de fréquences des réalisations transitives vs. intransitives des verbes dans un corpus d'une langue individuelle. Notre étude montre que ces deux mesures sont corrélées. En appliquant la mesure basée sur le corpus, la position sur l'échelle de la causalité externe peut être déterminée automatiquement pour un grand nombre de verbes.

Le sujet de la troisième étude de cas est la relation entre les deux caractéristiques temporelles des événements encodées par la catégorie grammaticale d'aspect verbal: la durée et la délimitation temporelle. L'étude montre que ces deux caractéristiques interagissent d'une manière complexe mais prévisible, ce qui est à l'origine de la variation observée dans les réalisations morphosyntaxiques des verbes. Les intuitions des locuteurs natifs anglais sur la durée possible d'un événement décrit par un verbe (court vs. long) peuvent être prédites sur la base du marquage formel d'aspect verbal dans les verbes correspondants serbes. L'exactitude des prédictions basées sur le modèle bilingue est supérieure à la performance du meilleur modèle monolingue.

Une des contributions principales de cette thèse est la nouvelle méthodologie expérimentale qui se base sur le traitement automatique des corpus parallèles et sur l'inférence statistique. Les trois caractéristiques sémantiques des événements décrits par des verbes (la dynamique, la probabilité de la causalité externe, la durée) sont inférées empiriquement à partir d'observations extraites automatiquement de grands corpus parallèles (contenant jusqu'à plus d'un million de phrases pour chaque langue), automatiquement analysés et alignés. Les généralisations sont acquises des données de corpus de manière automatique en utilisant l'inférence statistique et les techniques d'apprentissage automatique. L'exactitude des prédictions effectuées sur la base des généralisations est estimée de manière expérimentale en utilisant un échantillon séparé de données de test.

Acknowledgements

This dissertation has greatly benefited from the help and support of numerous friends and colleagues, and I wish to express my gratitude to all of them here. First and foremost, I would like to thank my supervisor, Paola Merlo, for the commitment with which she has supervised this dissertation, for sharing generously her knowledge and experience in countless hours spent discussing my work and reading my pages, for treating my ideas with care and attention, and for showing me that I can do better than I thought I could. I am most thankful to Vesna Polovina and Jacques Mœschler, who made it possible for me to move from Belgrade to Geneva and who have discreetly looked after me throughout my studies.
I thank Balthasar Bickel, Jonas Kuhn, and Martha Palmer, who kindly agreed to be members of the defence committee, and to Jacques Mœschler, who agreed to be the president of the jury. I have gathered much of the knowledge and skills necessary for carrying out this research in the discussions and joint work with Boban Arsenijević, Effi Georgala, Andrea Gesmundo, Kristina Gulordava, Maja Miličević, Lonneke van der Plas, Marko Simonović, and Balša Stipčević. I am thankful for the time they spent working and thinking with me. I appreciate very much the assistance of James Henderson, Jonas Kuhn, and Gerlof Bouma, who shared their data with me, allowing me to spend less time processing corpora, so I could spend more time thinking about the experiments. 11 I am thankful to my colleagues in the Department of General Linguistics in Belgrade, in the Linguistics Department in Geneva, and in the CLCL research group for their kindness and support. On various occasions, I felt lucky to be able to talk to Tijana Ašić, Lena Baunaz, Anamaria Bentea, Frédérique Berthelot, Giuliano Bocci, Eva Capitao, Maja Djukanović, Nikhil Garg, Jean-Philippe Goldman, Asheesh Gulati, Tabea Ihsane, Borko Kovačević, Joel Lang, Antonio Leoni de León, Gabriele Musillo, Goljihan Kashaeva, Alexis Kauffmann, Christopher Laenzlinger, Jasmina Moskovljević Popović, Luka Nerima, Natalija Panić Cerovski, Genoveva Puskas, Lorenza Russo, Yves Scherrer, Violeta Seretan, Gabi Soare, Živka Stojiljković, Eric Wehrli, and Richard Zimmermann. I would also like to thank Pernilla Danielsson, who helped me start doing computational linguistics while I was a visiting student at the Centre for Corpus Research at the University of Birmingham. In the end, I would like to express my gratitude to Fabio, who has stayed by my side despite all the evenings, weekends, and holidays dedicated to this dissertation. 12 Contents 1. Introduction 1.1. Grammatically relevant components of the meaning of verbs 1.2. 
Natural language processing in linguistic research . . . . . . 1.3. Using parallel corpora to study language variation . . . . . . 1.4. The overview of the dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Overview of the literature 2.1. Theoretical approaches to the argument structure . . . . . . . . . 2.1.1. The relational meaning of verbs . . . . . . . . . . . . . . . 2.1.2. Atomic approach to the predicate-argument structure . . . 2.1.3. Decomposing semantic roles into clusters of features . . . . Proto-roles . . . . . . . . . . . . . . . . . . . . . . . . . . . The Theta System . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.4. Decomposing the meaning of verbs into multiple predicates Aspectual event analysis . . . . . . . . . . . . . . . . . . . Causal event analysis . . . . . . . . . . . . . . . . . . . . . 2.1.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2. Verb classes and specialised lexicons . . . . . . . . . . . . . . . . . 2.2.1. Syntactic approach to verb classification . . . . . . . . . . 2.2.2. Manually annotated lexical resources . . . . . . . . . . . . FrameNet . . . . . . . . . . . . . . . . . . . . . . . . . . . The Proposition Bank (PropBank) . . . . . . . . . . . . . VerbNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparing the resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 24 26 27 31 . . . . . . . . . . . . . . . . . . 35 36 38 40 44 44 47 51 51 52 54 55 56 56 59 59 64 66 69 13 Contents 2.3. Automatic approaches to the predicate-argument structure 2.3.1. Early analyses . . . . . . . . . . . . . . . . . . . . . 2.3.2. Semantic role labelling . . . . . . . . . . . . . . . . Standard semantic role labelling . . . . . . . . . . . Joint and unsupervised learning . . . . . . . . . . . 
2.3.3. Automatic verb classification . . . . . . . . . . . . . 2.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 74 75 75 82 83 87 3. Using parallel corpora for linguistic research — rationale and methodology 3.1. Cross-linguistic variation and parallel corpora . . . . . . . . . . . . . . . 3.1.1. Instance-level microvariation . . . . . . . . . . . . . . . . . . . . . 3.1.2. Translators’ choice vs. structural variation . . . . . . . . . . . . . 3.2. Parallel corpora in natural language processing . . . . . . . . . . . . . . . 3.2.1. Automatic word alignment . . . . . . . . . . . . . . . . . . . . . . 3.2.2. Using automatic word alignment in natural language processing . 3.3. Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1. Summary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.2. Statistical inference and modelling . . . . . . . . . . . . . . . . . 3.3.3. Bayesian modelling . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4. Machine learning techniques . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1. Supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2. Unsupervised learning . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3. Learning with Bayesian Networks . . . . . . . . . . . . . . . . . . 3.4.4. Evaluation of predictions . . . . . . . . . . . . . . . . . . . . . . . 3.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 90 91 94 96 96 100 103 103 105 110 114 115 122 127 129 130 4. Force dynamics schemata and cross-linguistic alignment structions 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4.2. Theoretical background . . . . . . . . . . . . . . . . . . 4.2.1. Light verb constructions as complex predicates . 4.2.2. The diversity of light verb constructions . . . . 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
of light verb con133 . . . . . . . . . . 133 . . . . . . . . . . 136 . . . . . . . . . . 136 . . . . . . . . . . 140 Contents 4.3. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1. Experiment 1: manual alignment of light verb constructions in a parallel corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2. Experiment 2: Automatic alignment of light verb constructions in a parallel corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . 4.4. General discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1. Two force dynamics schemata in light verbs . . . . . . . . . . . . 4.4.2. Relevance of the findings to natural language processing . . . . . 4.5. Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6. Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 143 146 147 150 152 152 154 161 161 163 164 167 5. Likelihood of external causation and the cross-linguistic variation in lexical causatives 169 5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 5.2. Theoretical accounts of lexical causatives . . . . . . . . . . . . . . . . . . 173 5.2.1. Externally and internally caused events . . . . . . . . . . . . . . . 174 5.2.2. Two or three classes of verb roots? . . . . . . . . . . . . . . . . . 176 5.2.3. The scale of spontaneous occurrence . . . . . . . . . . . . . . . . 178 5.3. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 5.3.1. Experiment 1: Corpus-based validation of the scale of spontaneous occurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
183 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . 184 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . 187 5.3.2. Experiment 2: Scaling up . . . . . . . . . . . . . . . . . . . . . . 188 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . 189 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . 190 5.3.3. Experiment 3: Spontaneity and cross-linguistic variation . . . . . 192 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . 193 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . 201 15 Contents 5.3.4. Experiment 4: Learning spontaneity with a probabilistic model The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Experimental evaluation . . . . . . . . . . . . . . . . . . . . . . 5.4. General discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1. The scale of external causation and the classes of verbs . . . . . 5.4.2. Cross-linguistic variation in English and German . . . . . . . . 5.4.3. Relevance of the findings to natural language processing . . . . 5.5. Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6. Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . 6. Unlexicalised learning of event duration using parallel corpora 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2. Theoretical background . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1. Aspectual classes of verbs . . . . . . . . . . . . . . . . . . . . 6.2.2. Observable traits of verb aspect . . . . . . . . . . . . . . . . . 6.2.3. Aspect encoding in the morphology of Serbian verbs . . . . . . 6.3. A quantitative representation of aspect based on cross-linguistic data 6.3.1. Corpus and processing . . . . . . . . . . . . . . . . . . . . . . 6.3.2. Manual aspect classification in Serbian . . . . . . . . . . . . . 6.3.3. Morphological attributes . . . . 
. . . . . . . . . . . . . . . . . 6.3.4. Numerical values of aspect attributes . . . . . . . . . . . . . . 6.4. Experiment: Learning event duration with a statistical model . . . . 6.4.1. The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Bayesian net classifier . . . . . . . . . . . . . . . . . . . . 6.4.2. Experimental evaluation . . . . . . . . . . . . . . . . . . . . . Materials and methods . . . . . . . . . . . . . . . . . . . . . . Results and discussion . . . . . . . . . . . . . . . . . . . . . . 6.5. General discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1. Aspectual classes . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.2. Relevance of the findings to natural language processing . . . 6.6. Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7. Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 206 208 213 214 215 216 217 220 . . . . . . . . . . . . . . . . . . . . . 223 223 228 228 233 235 240 242 245 246 247 250 251 253 255 256 261 262 262 263 264 266 Contents 7. Conclusion 267 7.1. Theoretical contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 7.2. Methodological contribution . . . . . . . . . . . . . . . . . . . . . . . . . 271 7.3. Directions for future work . . . . . . . . . . . . . . . . . . . . . . . . . . 272 Bibliography 275 A. Light verb constructions data 295 A.1. Word alignment of the constructions with ’take’ . . . . . . . . . . . . . . 295 A.2. Word alignment of the constructions with ’make’ . . . . . . . . . . . . . 299 A.3. Word alignments of regular constructions . . . . . . . . . . . . . . . . . . 303 B. Corpus counts and measures for lexical causatives 307 C. Verb aspect and event duration data 319 17 List of Figures 1.1. Cross-linguistic mapping between morphosyntactic categories. . . . . . . 1.2. 
Cross-linguistic mapping between morphosyntactic categories. . . . . . . 28 30 3.1. Word alignment in a parallel corpus . . . . . . . . . . . . . . . . . 3.2. Probability distributions of the morphological forms and syntactic sations of the example instances. . . . . . . . . . . . . . . . . . . 3.3. Probability distributions of the example verbs and their frequency. 3.4. A general graphical representation of the normal distribution. . . 3.5. An example of a decision tree . . . . . . . . . . . . . . . . . . . . 3.6. An example of a Bayesian network . . . . . . . . . . . . . . . . . 97 . . . . reali. . . . . . . . . . . . . . . . . . . . 4.1. A schematic representation of the structure of a light verb construction compared with a typical verb phrase . . . . . . . . . . . . . . . . . . . . 4.2. Constructions with vague action verbs . . . . . . . . . . . . . . . . . . . 4.3. True light verb constructions . . . . . . . . . . . . . . . . . . . . . . . . . 4.4. Extracting verb-noun combinations . . . . . . . . . . . . . . . . . . . . . 4.5. The difference in automatic alignment depending on the direction. . . . . 4.6. The distribution of nominal complements in constructions with take . . . 4.7. The distribution of nominal complements in constructions with make . . 4.8. The distribution of nominal complements in regular constructions . . . . 4.9. The difference in automatic alignment depending on the complement frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 108 109 118 128 134 145 146 148 154 157 157 158 159 5.1. The correlation between the rankings of verbs on the scale of spontaneous occurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 5.2. Density distribution of the Sp value in the two samples of verbs . . . . . 190 19 List of Figures 5.3. 5.4. 5.5. 5.6. 5.7. Collecting data on lexical causatives . . . . . . . . . . . . . . . . . Density distribution of the Sp value over instances of 354 verbs . . 
Joint distribution of verb instances in the parallel corpus . . . . . Bayesian net model for learning spontaneity. . . . . . . . . . . . . The Interaction of the factors involved in the causative alternation . . . . . . . . . . . . . . . . . . . . 195 200 203 206 213 6.1. Traditional lexical verb aspect classes, known as Vendler’s classes . . . . 229 6.2. Serbian verb structure summary . . . . . . . . . . . . . . . . . . . . . . . 239 6.3. Bayesian net model for learning event duration . . . . . . . . . . . . . . . 253 20 List of Tables 2.1. 2.2. 2.3. 2.4. Frame elements for the verb achieve . . . . . . . . . . . . . Some combinations of frame elements for the verb achieve. The PropBank lexicon entry for the verb pay. . . . . . . . The VerbNet entry for the class Approve-77. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 64 67 68 Examples of instance variables . . . . . . . . . . . . . . . . . . . . . . . . Examples of type variables . . . . . . . . . . . . . . . . . . . . . . . . . . A simple contingency table summarising the instance variables . . . . . . An example of data summary in Bayesian modelling . . . . . . . . . . . . An example of a data record suitable for supervised machine learning . . Grouping values for training a decision tree . . . . . . . . . . . . . . . . . An example of a data record suitable for supervised machine learning . . An example of probability estimation using the expectation-maximisation algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9. Precision and recall matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 104 104 104 111 116 120 122 3.1. 3.2. 3.3. 3.4. 3.5. 3.6. 3.7. 3.8. 4.1. Types of mapping between English constructions and their translation equivalents in German. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2. Well-aligned instances of light verb constructions . . . . . . . . . . . . . 4.3. 
The three types of constructions partitioned by the frequency of the complements in the sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4. Counts and percentages of well-aligned instances in relation with the frequency of the complements in the sample . . . . . . . . . . . . . . . . . . 124 129 150 155 156 160 5.1. Cross-linguistic variation in lexical causatives . . . . . . . . . . . . . . . 171 5.2. Morphological marking of cause-unspecified verbs . . . . . . . . . . . . . 177 21 List of Tables 5.3. Morphological marking across languages . . . . . . . . . . . . . . . . . . 5.4. An example of an extracted instance of an English alternating verb and its translation to German . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5. Examples of parallel instances of lexical causatives. . . . . . . . . . . . . 5.6. Contingency tables for the English and German forms in different samples of parallel instances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.7. Examples of the cross-linguistic input data . . . . . . . . . . . . . . . . . 5.8. Agreement between corpus-based and typology-based classification of verbs. The classes are denoted in the following way: a=anticausative (interanally caused), c=causative (externally caused) , m=cause-unspecified. . . . . . 5.9. Confusion matrix for monolingual and cross-linguistic classification on 2 classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10. Confusion matrix for monolingual and cross-linguistic classification on 3 classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1. 6.2. 6.3. 6.4. 6.5. 6.6. 6.7. 22 A relationship between English verb tenses and aspectual Serbian lexical derivations . . . . . . . . . . . . . . . . . Serbian lexical derivations with a bare perfective . . . . . An illustration of the MULTEX-East corpus . . . . . . . A sample of the verb aspect data set. . . . . . . . . . . . 
6.5. A sample of the two versions of data
6.6. Results of machine learning experiments

1. Introduction

Languages use different means to express the same content. Variation in the choice of lexical items or syntactic constructions is possible without changing the meaning of a sentence. For example, any of the sentences in (1.1a-c) can be used to express the same event. Similarly, the meaning of the sentences in (1.2a-b), (1.3a-b), and (1.4a-b) can be considered as equivalent. The sentences in (1.1) illustrate the variation in the choice of lexical items, while the sentences in (1.2-1.4) show that the syntactic structure of a sentence can be changed without changing the meaning. In both cases, the variation is limited to the options which are provided by the rules of grammar. In order to be exchangeable, linguistic units have to share certain properties. Identifying the properties shared by different formal expressions of semantically equivalent units is, thus, a way of identifying abstract elements of the structure of language.

(1.1) a. Mary drank a cup of tea.
      b. Mary took a cup of tea.
      c. Mary had a cup of tea.
      d. Mary had a cup of coffee.

As illustrated in (1.1d), verbs allow alternative expressions more easily than other categories. Replacing the noun tea, for example, by coffee changes the meaning of the sentence so that (1.1d) can no longer be considered as equivalent with (1.1a-c). The property which allows verbs to alternate more easily than other categories is their relational meaning. In the given examples, the verbs drink, take, and have relate the nouns Mary and tea. The relational meaning of a verb is commonly represented as the predicate-argument structure, where a verb is considered as a predicate which takes other constituents of a sentence as its arguments. The number and the type of the arguments that a verb takes in a particular instance are partially determined by the verb's meaning and partially by the contextual and pragmatic factors involved in the instance.

(1.2) a. Mary laughed.
      b. Mary had a laugh.

(1.3) a. Adam broke the laptop.
      b. The laptop broke.

(1.4) a. John pushed the cart.
      b. John pushed the cart for some time.

In this dissertation, we study systematic variation in the use of verbs involving alternation in the syntactic structure, as in (1.2-1.4). We study frequency distributions of the syntactic alternants as an observable indicator of the underlying meaning of verbs, with the aim of discovering the components of verbs' meaning which are relevant for their predicate-argument structure and for the grammar of language.

1.1. Grammatically relevant components of the meaning of verbs

As argued by Pesetsky (1995) and later by Levin and Rappaport Hovav (2005), only some of the potential components of the meaning of verbs are grammatically relevant. For example, the distinction between verbs describing loud speaking (e.g. shout) and verbs describing quiet speaking (e.g. whisper) is grammatically irrelevant in the sense that it does not influence any particular syntactic behaviour of these verbs (Pesetsky 1995). Contrary to this, the distinction between verbs which describe primarily the manner of speaking (whisper) and verbs which describe primarily the content of speaking (e.g. say) is grammatically relevant in the sense that the latter group of verbs can be used without the complementizer that, while the former cannot. Along the same lines, Levin and Rappaport Hovav (2005) argue that the quality of sound described by verbs of sound emission (volume, pitch, resonance, duration) does not influence their syntactic behaviour.
The syntactic behaviour of these verbs is, in fact, influenced by the source of the sound: verbs which describe sound emission with the source of the sound external to the emitting object (e.g. rattle) can alternate between transitive and intransitive uses (in a similar fashion as break in (1.3)), while verbs which describe sound emission with the source of the sound internal to the emitting object (e.g. rumble) do not alternate.

Our research continues in the same direction, investigating other semantic properties of verbs which are potentially relevant for the grammar. We take into consideration a wide range of verbs and their syntactic realisations. If a particular observed distribution of syntactic alternants can be predicted from a semantic property of a verb, then we can say that this property underlies the distribution. If a semantic property underlies a frequency distribution of syntactic alternants, then this property can be considered as grammatically relevant.

We focus on three kinds of alternations in the realisation of verbs' arguments. First, by studying the alternation between light verb constructions (1.2b) and the corresponding single verbs (1.2a), we address the issue of whether certain lexical content, in the form of the predicate-argument structure, is present in the verbs which are used as light verbs, such as have in (1.2b). Determining whether some components of meaning are present in light verbs is important for understanding whether the choice of the light verb in a construction is arbitrary or constrained by the meaning of light verbs. Second, we study the alternation in the use of lexical causatives such as break in (1.3). Lexical causatives are the verbs which can be used in two ways: as causative (1.3a), where the agent or the causer of the event described by the verb is realised as a constituent of a sentence, and as anticausative (1.3b), where the agent or the causer is not syntactically realised.
Many verbs across many different languages can alternate in this way. However, the fact that some verbs in some languages do not alternate raises the question which is addressed in this dissertation: What property of verbs is responsible for allowing or blocking the alternation? Finally, we study the factors involved in the interpretation of temporal properties of events described by verbs. As illustrated in (1.4), the temporal properties of events described by verbs play a role in the syntactic structuring of a sentence. For example, the event of pushing is interpreted as short by default (1.4a). With the appropriate temporal modifier, as in (1.4b), it can also be interpreted as lasting for a longer time. In contrast to this, other verbs, such as tick, stay, walk, describe events which are understood as lasting for some time by default. We look for observable indicators in the use of a wide range of verbs pointing to the event duration which is implicit in their meaning.

1.2. Natural language processing in linguistic research

The approach that we take in addressing the defined questions is empirical and computational. We take advantage of automatic language processing to collect and analyse large data sets, applying established statistical approaches to infer elements of linguistic structure from the patterns in the observed variation. The tools, methods, and resources which we use were originally developed for practical natural language processing tasks which fall within the domain of computational linguistics. The developments in automatic language processing are directly related to the increasing demand for automatic analysis of large amounts of linguistic content which is now freely available (mostly through the Internet). Natural language processing tasks include automatic information extraction, question answering, translation, etc.
Despite the fact that it provides extremely rich resources for empirical linguistic investigations, natural language processing technology has rarely been used for theoretical linguistic research. On the other hand, linguistic representations that are used in developing language technology rarely reflect the current state of the art in linguistic theory. Our research should contribute to bridging the gap between theoretical and computational linguistics by addressing current theoretical discussion with a computational methodology.

The work in this dissertation draws on the work in natural language processing in two ways. First, we use automatic processing tools to extract the information from large language corpora. For example, to identify syntactic forms of the realisations of verbs, we use automatically parsed corpora. The information provided by the parses is then used to extract automatically the instances which are relevant for a particular question. Second, we use natural language processing methodology to analyse the extracted instances. This methodology involves three main components: a) the generalisations in the observations are captured by designing statistical models; b) the parameters of the models are learnt automatically from the extracted data by applying machine learning techniques; c) the predictions of the models are tested on an independent set of data, quantifying and measuring the performance. Adopting this methodology for our research allows us not only to study language use in a valid experimental framework, but also to discover generalisations which can be integrated into further development of natural language processing more easily than the generalisations based on linguistic introspection.

1.3. Using parallel corpora to study language variation

Our approach to the relationship between the variation in language use and the structure of language takes into account both language-internal and cross-linguistic variation. This is achieved by extracting verb instances from parallel corpora. By studying the variation in the use of verbs in parallel corpora, we combine and extend two main approaches to language variation: the corpus-based approach to language-internal variation and the theoretical approach to cross-linguistic variation.

Corpus-based studies of linguistic variation have been mostly monolingual, following the use of linguistic units either over a period of time or across different language registers. Extending the corpus-based approach to parallel corpora allows a better insight into structural linguistic elements, setting them apart from other potential factors of variation. Consider, for example, the alternations in (1.2-1.4). An occurrence of one or the other syntactic alternant in a monolingual corpus depends partially on the predicate-argument structure of the verbs and partially on the contextual and pragmatic factors. However, if we can observe actual translations of the sentences, then we can observe at least two uses of semantically equivalent units in the same contextual and pragmatic conditions, since these conditions are constant in translation. In this way, we control for contextual and pragmatic factors while potentially observing the variation due to structural factors.

Unlike language-internal variation, which has become the subject of research relatively recently, with the development of corpus-based approaches, cross-linguistic variation is traditionally one of the core issues in theoretical linguistics.
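The kind of contextual control just described can be illustrated with a minimal sketch. The aligned instances and category labels below are invented for illustration (they are not drawn from the corpora used in this work); the sketch simply tabulates how often each English alternant co-occurs with each alternant in the translation, a contingency table of the kind used later in the dissertation.

```python
from collections import Counter

# Hypothetical aligned verb instances: (English form, German form) per
# occurrence. Both labels and counts are invented for illustration.
aligned_instances = [
    ("light-verb-construction", "single-verb"),
    ("light-verb-construction", "light-verb-construction"),
    ("single-verb", "single-verb"),
    ("light-verb-construction", "single-verb"),
    ("single-verb", "single-verb"),
]

# Joint counts of the two alternants across languages: because each pair
# comes from one translated context, contextual factors are held constant.
table = Counter(aligned_instances)
for (en_form, de_form), count in sorted(table.items()):
    print(f"{en_form:26} -> {de_form:26} {count}")
```

Each cell of the resulting table counts semantically equivalent uses observed under identical contextual conditions, which is what makes the frequencies interpretable as reflexes of structural rather than pragmatic variation.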
Differences in the expressions of the same contents across languages have always been analysed with the aim of discovering universally invariable elements of the structure of language which constrain the variation. Consider, for example, the English sentence in (1.5a) and its corresponding German, Serbian, and French sentences in (1.5b-d).

(1.5) a. Mary has just sent the letter. (English)
      b. Maria hat gerade eben den Brief geschickt. (German)
      c. Marija je upravo poslala pismo. (Serbian)
      d. Marie vient d'envoyer la lettre. (French)

Figure 1.1.: Cross-linguistic mapping between morphosyntactic categories.
      English: present perfect
      German:  adverb + perfect
      Serbian: prefix
      French:  venir + infinitive

All four sentences describe a short completed action that happened immediately before the time in which the sentence is uttered, but the meaning of shortness, completeness, and time (immediate precedence) is expressed in different ways in the four languages. In English, this meaning is encoded with a verb tense, the present perfect. German uses the more general perfect tense, and the immediate precedence component is encoded in the adverbs (gerade eben). French, on the other hand, does not use any particular verb conjugation to express this meaning, but rather a construction which consists of a semantically impoverished verb (venir 'come') and the main verb (envoyer 'send') in the neutral, infinitive form. The corresponding Serbian expression is formed in yet another way: through lexical derivation. The verb poslati used in (1.5c) is derived from the verb slati, which does not encode any specific temporal properties, by adding the prefix po-. Figure 1.1 summarises the identified grammatical mappings across languages. Note also that, unlike the sentences in the other languages, the French sentence does not contain a temporal adverb. The meaning of immediate precedence is already encoded as part of the meaning of the constructions formed with the verb venir.

These examples illustrate systematic variation across languages, and not just incidental differences between these particular sentences. If we replace the constituents of the sentences with some other members of their paradigms, we will observe the same patterns of variation. For instance, we can replace the phrase send the letter in the English sentence and its lexical counterparts in German, Serbian, and French by some other phrases, such as open the window, read the message, arrive at the meeting, and so on. The choice of the corresponding morphosyntactic categories can be expected to stay the same.

The regular patterns in cross-linguistic variation are due to the fact that sentences are composed of the same abstract units. As mentioned before, all four sentences in (1.5) express the same event, with the same temporal properties (shortness, completeness, immediate precedence). The fact that they influence (morpho)syntactic realisations of verbs makes these properties grammatically relevant. The fact that they are equally interpreted across languages, despite the differences in the morphosyntactic realisations, makes them candidates for universal elements of the structure of language.

Theoretical approaches to cross-linguistic variation are concerned with identifying not only the elements of linguistic structure which are invariable across languages, but also the parameters of variation and their possible settings. With these two elements one could then construct a general representation of language capacity shared by all speakers of all languages. In this system, the grammar of any particular language instantiates the general grammar by setting the parameters to a certain value.
For example, temporal properties of events in our example, which are invariable across languages, can be encoded in a syntactic construction (French), in the morphology (English), or in the lexical derivation (Serbian). Ideally, the number of possible values for a parameter should be small. However, identifying the parameters of cross-linguistic variation and their possible settings is far from being a trivial task. Even though there are some regular patterns of cross-linguistic mapping, as we saw earlier, it is hard to define general rules which apply to all instances of a given category, independently of a given context. In fact, when we take a closer look, finding regularities in cross-linguistic variation turns out to be a very difficult task for which no common methodology has been proposed.

To illustrate the difficulties, we will look again at the example of the English present perfect tense, for which we have defined the cross-linguistic mappings shown in Figure 1.1. As we can see in (1.6), the mappings in Figure 1.1 do not hold for all the instances of the English present perfect tense. A different use of this tense in English leads to rather different mappings.

(1.6) a. Mary still has not seen the film. (English)
      b. Maria hat noch immer nicht den Film gesehen. (German)
      c. Marija još nije gledala film. (Serbian)
      d. Marie n'a pas encore vu le film. (French)

Figure 1.2.: Cross-linguistic mapping between morphosyntactic categories.
      English: present perfect
      German:  perfect
      Serbian: bare form
      French:  passé composé

Figure 1.2 summarises the mappings between the sentences in (1.6). We can see that, instead of the construction with the verb venir, the corresponding French form in this case is a verb tense (passé composé). The corresponding Serbian verb in this context is neither prefixed nor perfective.
This means that the English present perfect tense has multiple cross-linguistic mappings even in this small sample of only two other languages (the German form can be considered invariable in this case). Other uses might be mapped in yet different ways. For instance, there can be a use which maps to French as in Figure 1.2, and to Serbian as in Figure 1.1. If we take into account all the other languages and all possible uses of the present perfect tense in English, the number of possible cross-linguistic mappings of this single morphological category is likely to become very large. We can expect to encounter the same situation with all the other categories and their combinations. This creates a very large space of possible cross-linguistic mappings, which is hard to explore and to account for in an exhaustive fashion.

Extracting verb instances from parallel corpora allows us to observe directly a wide range of cross-linguistic mappings of the target morphosyntactic categories at the instance level, taking into account contextual factors. With a large number of instances analysed using computational and statistical methods, we can take a new perspective on cross-linguistic variation. Zooming out to analyse general tendencies in the data, rather than individual cases, we can identify patterns signalling potential constraints on the variation. Even though this approach is not exhaustive, it is systematic in the sense that it allows us to observe patterns in cross-linguistic variation in large samples and to use statistical inference to formulate generalisations which hold beyond the observed samples.

1.4. The overview of the dissertation

The dissertation consists of seven chapters. In addition to the Introduction and Conclusion, there are five central chapters which are divided between two main parts.
The first part (Chapters 2 and 3) presents the conceptual and technical background of our work, the rationale for our methodological choices, as well as a detailed description of the general methods used in our experiments. The second part (Chapters 4, 5, and 6) contains three case studies in which our experimental methodology is used to address three specific theoretical questions.

In Chapter 2, we discuss the issues in the predicate-argument structure of verbs from two points of view: theoretical and computational. The theoretical track follows the development in the view of the predicate-argument structure from the first proposals, which divide the grammatical and the idiosyncratic components of the lexical structure of verbs, to the current view of verbs as composed of multiple predicates, which is adopted in our research. We review theoretical arguments for abandoning the initial "atomic" view of the predicate-argument structure, as well as some proposals for its systematic decomposition into smaller components. We then proceed by reviewing the work on extensive verb classification, which relates the grammatical and the idiosyncratic layer of the lexical structure of verbs. We discuss the principles of semantic classification of verbs on the basis of their syntactic behaviour, as well as practical implementations of verb classification principles in developing extensive language resources. Finally, we review approaches to automatic acquisition of verb classification and the predicate-argument structure, discussing the representations and methods used for these tasks.

Chapter 3 deals with the methodology of using parallel corpora for linguistic research. Since parallel corpora are not commonly used as a source of data for linguistic research, we first present our rationale for this choice, discussing its advantages, but also its limitations.
We then give an overview of natural language processing approaches based on parallel corpora and the contributions of this line of research. The second part of the chapter deals with the technical and practical issues in using natural language processing methodology for linguistic research. We first describe the steps in processing parallel corpora for extracting linguistic data, in particular automatic word alignment, which is crucial for our approach. We then turn to the methods used for analysing the extracted data, providing the technical background necessary to follow the discussion in the three case studies. The background includes an introduction to statistical inference and modelling in general, as well as to Bayesian modelling in particular, which is followed by an overview of four standard machine learning classification techniques which are used or referred to in our case studies: naïve Bayes, decision tree, Bayesian net, and the expectation-maximisation algorithm.

The first case study, on light verb constructions, is presented in Chapter 4. We first give an overview of the theoretical background and the questions raised by light verb constructions. We introduce two classes of light verb constructions discussed in the literature: true light verb constructions and constructions with vague action verbs. We then introduce our proposed classification, which is based on verb types. We argue that light verb constructions headed by light take behave like true light verb constructions, while the constructions headed by light make behave like the constructions with vague action verbs. We relate this behaviour to the force dynamics representation of the predicate-argument structure of these verbs. We then present two experiments in which we test two hypotheses about the relationship between the force dynamics in the meaning of the verbs and the cross-linguistic frequency distribution of the alternating morphosyntactic forms.
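The experimental logic shared by the case studies (design a model, learn its parameters from extracted data, test its predictions on independent instances) can be sketched with the simplest of the classification techniques mentioned above, naïve Bayes. The features, labels, and data below are invented toy examples, not the actual data sets or feature schemes of the case studies.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(data):
    """Step b: estimate P(class) and P(feature | class) from labelled data."""
    class_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)
    for feats, label in data:
        feat_counts[label].update(feats)
    return class_counts, feat_counts

def predict(class_counts, feat_counts, feats):
    """Return the most probable class, with add-one smoothing."""
    total = sum(class_counts.values())
    vocab = {f for c in feat_counts.values() for f in c}
    best, best_lp = None, -math.inf
    for label, cc in class_counts.items():
        lp = math.log(cc / total)  # log prior
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in feats:
            lp += math.log((feat_counts[label][f] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Step a (toy model design): morphosyntactic "features" of verb instances
# paired with invented semantic class labels.
train = [
    (["transitive", "animate-subject"], "externally-caused"),
    (["transitive", "causer-subject"], "externally-caused"),
    (["intransitive", "theme-subject"], "internally-caused"),
    (["intransitive", "animate-subject"], "internally-caused"),
]
cc, fc = train_naive_bayes(train)

# Step c: evaluate on an instance held out from training.
print(predict(cc, fc, ["transitive", "causer-subject"]))  # externally-caused
```

The same train-then-test discipline, on real extracted instances and with richer models, is what allows the accuracy of the generalisations to be quantified rather than asserted.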
The case study on the causative alternation is presented in Chapter 5. We start by reviewing the proposed generalisations addressing the meaning of the verbs which participate in the causative alternation. In particular, we address the notions of change of state, external vs. internal causation, and cross-linguistic variation in the availability of the alternation. We then introduce the discussion on the number of classes into which verbs should be classified with respect to these notions. Two proposals have been put forward in the literature: a) a two-way distinction between alternating and non-alternating verbs, where alternating verbs are characterised as describing externally caused events, while the verbs which do not alternate describe internally caused events; b) a three-way classification involving a third class of verbs situated between the two previously proposed classes. We then discuss the distribution of the morphological marking on alternating verbs across languages as a potential indicator of the grammatically relevant meaning of the alternating verbs. This leads us to introducing the notion of the likelihood of external causation.

The experimental part of this study consists of four steps. In the first step, we validate a corpus-based measure of the likelihood of external causation, showing that it correlates with the typological distribution of the morphological marking. In the second step, we show that the corpus-based measure can be extended to a large sample of verbs. In the third step, we extract the instances of the large sample of verbs from a parallel corpus and test the influence of the likelihood of external causation on the cross-linguistic distribution of their morphosyntactic realisations.
In the fourth step, we address the issue of classifying the alternating verbs by designing a statistical model which takes as input the cross-linguistic realisations of verbs and outputs their semantic classification. We test the model in two modes, on the two-way and on the three-way classification.

The last case study, presented in Chapter 6, deals with the representation of grammatically relevant temporal properties of events described by a wide range of verbs. We start by introducing verb aspect as a grammatical category usually thought to encode temporal meaning. More specifically, we discuss two notions related to verb aspect: temporal boundedness and event duration. We then discuss Serbian verb derivations associated with verb aspect as a potential observable indicator of these two temporal properties of events described by verbs. We proceed by proposing a quantitative representation of Serbian verb aspect based on cross-linguistic realisations of verbs extracted from parallel corpora. We then design a Bayesian model which predicts the duration (short vs. long) of events taking this representation as input. We test the performance of the model against English native speakers' judgments of the duration of events described by English verbs. We compare our results to the results of models based on monolingual English input.

In Chapter 7, we draw some general conclusions, pointing to the limitations of the current approach as well as to some directions for future research.

2. Overview of the literature

The conceptual and methodological framework of the experiments presented in this dissertation encompasses three partially interrelated lines of research: theoretical accounts of the grammatically relevant meaning of verbs, its extensive descriptions in specialised lexicons, and its automatic acquisition from language corpora. Theoretical accounts of the meaning of verbs are crucial for defining the hypotheses which are tested in our experiments.
Our hypotheses are formulated in the context and framework of recent developments in theoretical accounts of the lexical representation of verbs. While using the tools and the methodology developed in computational linguistics, our main goal is not to develop a new tool or resource, but to extend the general knowledge about what kinds of meaning are actually part of the lexical representation of verbs and how they are related to the grammar. Our work is related to the work on constructing comprehensive specialised lexicons of verbs because we work with large sets of verbs, assigning specific lexical and grammatical properties to each verb in each sample. Finally, we follow the work on automatic acquisition of the meaning of verbs in that we learn the elements of their lexical representation automatically from the observed distributions of their realisations in a corpus. This aspect distinguishes our work from theoretical approaches, as well as from the work on developing specialised lexicons, which are based on linguistic introspection rather than on empirical observations.

This chapter contains an overview of the existing research in all three domains. In Section 2.1, we follow the developments in theoretical approaches to the meaning of verbs. We start by introducing the notion of the predicate-argument structure of verbs, discussing its role in the grammar of language, as well as in linguistic theory (2.1.1). We proceed by reviewing proposed theoretical accounts which represent general views of the predicate-argument structure in the literature, discussing at length crucial turning points in the theoretical development leading to the temporal and causal decomposition of the meaning of verbs which is adopted in our experiments. In Section 2.2, we discuss the principles of large-scale implementations of some views of the predicate-argument structure.
We summarise the main ideas behind the syntactic behavioural approach to the meaning of verbs (2.2.1), which is followed by descriptions of three lexical resources which contain thousands of verbs with explicit analyses of their predicate-argument structure. In Section 2.3, we discuss approaches to automatic acquisition of the predicate-argument structure from language corpora which rely on the described lexical resources conceptually (they adopt the principles of the syntactic approach to verb meaning) and practically (they use the resources for training and testing systems for automatic acquisition).

2.1. Theoretical approaches to the argument structure

It is generally assumed in linguistic theory that the structure of a sentence depends, to a certain degree, on the meaning of its main verb. Some verbs, such as see in (2.1), require a subject and an object; others, such as laugh in (2.2), form grammatical sentences expressing only the subject; others, such as tell in (2.3), require expressing three constituents. (Clauses with more than three principal constituents are rare.) The assumption concerning these observations is that the association of certain verbs with a certain number and kind of constituents is not due to chance, but that it is part of the grammar of language.

(2.1) [Mary](subject) saw [a friend](object).
(2.2) [Mary](subject) laughed.
(2.3) [Mary](subject) told [her friend](indirect object) [a story](object).

Although the relation between the meaning of verbs and the available syntactic patterns seems obvious, defining precise rules to derive a phrase structure from the lexical structure of a verb proves to be a difficult task. The task, known in the linguistic literature as the linking problem, is one of the central concerns of the theory of language (Baker 1997).
The main difficulty in linking the meaning of verbs and the form of the phrases that they head is in analysing verbs' meaning so that the components responsible for the syntactic forms of the phrases are identified. There are many different ways in which the meaning of verbs can be analysed, and it is hard to see what kind of analysis is relevant for the grammar. Consider, for example, basic dictionary definitions of the verbs used in (2.1-2.3) given in (2.4).

(2.4) see: to notice people and things with your eyes
      laugh: to smile while making sounds with your voice that show you are happy or think something is funny
      tell: to say something to someone, usually giving them information
      (Cambridge Dictionaries Online, http://dictionary.cambridge.org/)

In the definitions above, the meaning of the verbs is analysed into smaller components. They state, for example, that seeing involves eyes, things, and people, that laughing involves sounds, showing that you are happy, and something funny, and that telling involves something, someone, and giving information. The units which are identified as components of the verbs' meaning are very different in nature: some are nouns with specific meaning, some are pronouns with very general meaning, some are complex phrases. In theoretical approaches to the meaning of verbs, as in lexicography, the analysis results in identifying smaller, more primitive notions of which the meaning is composed. Unlike lexicographic analysis, however, theoretical analysis aims at defining and organising these notions having in mind the language system as a whole, and not only the meaning of each verb separately. This implies establishing general components which apply across lexical items and which play a role in the rules of grammar.

2.1.1. The relational meaning of verbs

The most important general distinction made in the theory of the lexical structure of verbs is the one between the relational meaning and the idiosyncratic lexical content. In the definition of the verb see given in (2.4), for example, things and people belong to the relational structure, while eyes belongs to the idiosyncratic content. In the case of the verb laugh, all the components listed in the definition are idiosyncratic. The verb tell has two relational components (something, someone). The relational meaning expresses the fact that the verb relates its subject with another entity or with a property. In this sense, verbs are analysed as logical predicates which can take one, two, three, or more arguments. This part of their lexical structure is usually called the predicate-argument structure. It is seen as an abstract component of meaning present in all verbs. There are only a few possible predicate-argument structures, so that they are typically shared by many verbs, while the idiosyncratic content characterises each individual verb.

The predicate-argument structure is the part of the lexical representation of verbs which determines the basic shape of clauses. In a simplified scenario, a verbal predicate which takes two arguments forms a clause with two principal constituents, as in (2.1) and in (2.5a). One argument in the lexical structure of a verb results in intransitive clauses (2.2 and 2.5b), and so on. Formally, the transfer of the information from the lexicon to syntax is handled by more general mechanisms: by projection in earlier accounts (Chomsky 1970; Jackendoff 1977; Chomsky 1986) and feature checking in newer proposals (Chomsky 1995; Radford 2004). In the accounts that are based on the notion of projection, lexical items project their relational properties into syntax by forming a specific formal structure which can then be combined only with the structures with which it is compatible.
So, for instance, a two-argument verb will form a structure with empty positions intended for its subject and object. In principle, these positions can only be filled by nominal structures, while verbal, adjectival, or adverbial structures will not be compatible with them. In the feature checking account, lexical items do not form any specific structures, but carry their properties as features, which, by a general rule, need to match between the items which are to be combined in a phrase structure. For instance, the list of features of a two-argument verb will contain one feature requiring a subject and one requiring an object. A verb with these features can be combined only with items which have the matching features, that is, with nominal items which bear the same features.

The characterisation of possible semantic arguments of verbs depends on the theoretical framework adopted for an analysis, but all approaches distinguish between at least several kinds of arguments. The kind of meaning expressed by a verb's argument is usually called a semantic role. Two traditional semantic roles, agent and theme, are illustrated in (2.5).

(2.5) a. [Mary] stopped [the car].   (subject/agent, object/theme)
      b. [The car] stopped.          (subject/theme)

There is a certain alignment between semantic roles and syntactic functions. Agents, for instance, tend to be realised as subjects across languages, while themes are usually objects, as in (2.5a). However, the same semantic role can be realised with different syntactic functions, as is the case with the theme role assigned to the car in (2.5a-b). The phenomenon of multiple syntactic realisations of the same predicate-argument structure is known as argument alternation. The alternation illustrated in (2.5) is called the causative alternation, because the argument which causes the car to stop (Mary) is present in one expression (2.5a) but not in the other (2.5b).
Other well-known examples of argument alternations include the dative alternation (2.6) and the locative alternation (2.7).

(2.6) a. [Mary] told [her friend] [a story].    (subject/agent, indirect object/recipient, object/theme)
      b. [Mary] told [a story] [to her friend]. (subject/agent, object/theme, prep-complement/recipient)

(2.7) a. [People] were swarming [in the exhibition hall].  (subject/agent, prep-complement/location)
      b. [The exhibition hall] was swarming [with people]. (subject/location, prep-complement/agent)

In the dative alternation, the recipient role (her friend in (2.6)) can be expressed as the indirect object, which usually takes dative case (2.6a),[1] or as a prepositional complement (2.6b). In the locative alternation, the arguments which express the location and the agent of the situation described by the verb swap syntactic functions: the location (the exhibition hall) is the prepositional complement in (2.7a) and the subject in (2.7b); the agent (people) is in the subject position in (2.7a) and is the prepositional complement in (2.7b).

The view of the predicate-argument structure has evolved with developments in linguistic theory, from the quite intuitive notions illustrated in the examples so far to more formal and general analyses. The main changes in the theory are reviewed in the following sections.

2.1.2. Atomic approach to the predicate-argument structure

In the earliest approaches, the roles of the semantic arguments of verbs are regarded as simple, atomic labels. Apart from the roles illustrated in (2.5-2.7), the set of labels commonly includes experiencer, instrument, source, and goal, illustrated in (2.8-2.11).[2] The atomic semantic labels of the constituents originate in the notion of "deep cases" in Case grammar (Fillmore 1968). These labels capture common intuitions about the relational meaning of verbs which cannot be addressed using only the notions of syntactic functions.
For example, the meanings of the subjects in (2.5a-b), as well as the roles that they play in the event described by the verb stop, are rather different. Mary refers to a human being who is actively (and possibly intentionally) taking part in the event, while the car refers to an object which cannot have any control over what is happening. This difference cannot be formulated without referring to the semantic argument labels of the constituents. A similar distinction is made between people and the exhibition hall in (2.7a-b).

Another important intuition which is made evident by the predicate-argument representation is that sentences such as (2.5a) and (2.5b) are related in the sense that they are paraphrases of each other. The same applies to (2.6a) and (2.6b), and to (2.7a) and (2.7b). The fact that the predicate-argument structure is shared by the two paraphrases, while their syntactic structure is different, represents the intuition that the two sentences have approximately the same meaning, despite the different arrangements of the constituents.

(2.8)  [Mary] enjoyed the film.                 (experiencer)
(2.9)  Mary opened the door [with a card].      (instrument)
(2.10) Mary borrowed a DVD [from the library].  (source)
(2.11) Mary arrived [at the party].             (goal)

Finally, the predicate-argument representation is useful in establishing the relationship between sentences which express the same content across languages.

[1] Although the dative case is not visible in most English phrases, including (2.6a), it can be shown that it exists in the syntactic representation of the phrases.
[2] The labels patient and theme are often used as synonyms (as, for example, in Levin and Rappaport Hovav 2005). If a difference is made, the patient is the participant undergoing a change of state, and the theme is the one that undergoes a change of location.
As the examples in (2.12) show, the relational structure of the English verb like and the French verb plaire is the same, despite the fact that their semantic arguments have inverse syntactic functions.

(2.12) a. [Mary] liked [the idea].   (subject/experiencer, object/theme)              (English)
       b. [L'idée] a plu [à Marie].  (subject/theme, prep-complement/experiencer)     (French)

Although the predicate-argument structure proves to be a theoretically necessary level of representation of the phrase structure, it was soon shown that the concept of semantic roles as atomic labels for the verbs' arguments is too naïve with respect to the reality of the observations that it is intended to capture.

First of all, the set of roles is not definitive. There are no common criteria which define all possible members of the set, and new roles often need to be added to account for different language facts. For example, the sentence in (2.9) can be transformed so that the instrument is the subject, as in (2.13), but if we replace the card with the wind, as in (2.14), the meaning of this subject cannot be described with any of the labels listed so far. It calls for a new role: cause or immediate cause (Levin and Rappaport Hovav 2005). Similarly, many other sentences cannot be described with the given set of roles. This is why different analyses keep adding new roles (such as beneficiary, destination, path, time, measure, extent, etc.) to the set.

(2.13) [The card] opened the door.   (instrument)
(2.14) [The wind] opened the door.   (cause)

Another problem posed by the atomic view of semantic roles is that there are no transparent criteria or tests for identifying a particular role. Definitions of semantic roles do not provide sets of necessary and sufficient conditions that can be used in identifying the semantic role of a particular argument of a verb.
For example, the agent is usually defined as the participant in an activity that deliberately performs the action, the goal is the participant toward which an action is directed,[3] and the source is the participant denoting the origin of an action. These definitions, however, do not apply in many cases, as noted by Dowty (1991). For example, both Mary and John in (2.15) seem to act voluntarily in both sentences, which means that they both bear the role of agent. Furthermore, John is not just an agent, but also a source, while Mary is both an agent and a goal.

(2.15) a. [John] sold the piano [to Mary] for $1000.      (?, ?)
       b. [Mary] bought the piano [from John] for $1000.  (?, ?)
       (Dowty 1991: 556)

The example in (2.15) shows that the relational structure of such sentences cannot be described by assigning a single and distinct semantic label to each principal constituent of the clause. The meaning of the verbs' arguments seems to express multiple relations with the verbal predicate.

There is one more observation which cannot be addressed with the simple view of semantic labels: the meaning of the roles is not equally distinct in all cases. Some roles obviously express similar meanings, while others are very different. Furthermore, the semantic clustering of the roles seems to be related to the kinds of syntactic functions that the arguments have in a phrase. For example, the arguments which are realised as subjects in (2.9), (2.13), and (2.14) (agent, instrument, and cause, respectively) constitute a paradigm: they can replace each other in the same context. It has been noticed that two of these roles, agent and cause, can never occur together in the same phrase.

[3] Dowty analyses to Mary in (2.15a) as a goal, while the role of this constituent would be analysed as recipient by other authors, which further illustrates the problem.
On the other hand, roles such as source and goal are in a syntagmatic relation: they tend to occur together in the same phrase. The traditional view of semantic roles as a set of atomic notions does not provide a means to account for these facts.

Different theoretical frameworks have been developed in the linguistic literature to deal with these problems and to provide more adequate definitions of the predicate-argument relations. Studying in more detail how semantic arguments of verbs are realised in the phrase structure, some authors (Larson 1988; Grimshaw 1990) propose a universal hierarchy of the arguments. The order in the hierarchy is imposed by the syntactic prominence of the arguments. For example, agents are at the top of the hierarchy, which means that they take the most prominent position in the sentence, the subject position. Next in the hierarchy are themes. They are typically realised as direct objects, but they can also be realised as subjects if agents are not present in the representation. Lower arguments are realised as indirect objects and prepositional complements. We do not discuss these proposals further, as their view of the arguments does not significantly depart from the atomic notions.

In the following sections, we take a closer look at the analyses which propose decomposing the predicate-argument structure into a set of more primitive notions. We start with the approaches based on a decomposition of semantic roles into features or properties. Then we move to the approaches based on a decomposition of verbal meaning into multiple predicates.

2.1.3. Decomposing semantic roles into clusters of features

An obvious direction for overcoming the problems posed by the atomic view of the predicate-argument relationship is to decompose the notions of individual roles into features or properties.
Using a limited set of features for defining all the roles should provide more systematic and more precise definitions of roles. It should also enable defining a role hierarchy that can group the roles according to the properties that they share. Two feature-based approaches to semantic roles are described in this section.

Proto-roles

Dowty (1991) concentrates on argument selection: the principles that languages use to determine which argument of a predicate can be expressed with which grammatical function. Dowty (1991) argues that discrete semantic types of arguments do not exist at all, but that the arguments are rather divided into only two conceptual clusters, proto-agent and proto-patient. These clusters are understood as categories in the sense of the theory of prototypes (Rosch 1973), which means that they have no clear boundaries and that they are not defined by sets of necessary and sufficient conditions. These categories are represented by their prototypical members, with other members belonging to the categories to different degrees: the more similar the members are to the prototypes, the more they belong to the category.

Looking into different realisations of subjects and objects and the semantic distinctions that they express in different languages, Dowty proposes lists of features that define the agent and the patient prototype. Each feature is illustrated by the sentence whose number is indicated.

agent:
a. volitional involvement in the event or state (2.16)
b. sentience (and/or perception) (2.17)
c. causing an event or change of state in another participant (2.18)
d. movement (relative to the position of another participant) (2.19)
(e. exists independently of the event named by the verb) (2.20)[4]

patient:
a. undergoes change of state (2.21)
b. incremental theme (2.22)
c. causally affected by another participant (2.23)
d. stationary relative to movement of another participant (2.24)
(e. does not exist independently of the event, or not at all) (2.25)

(2.16) [Bill] is ignoring Mary.
(2.17) [John] sees Mary.
(2.18) [Teenage unemployment] causes delinquency.
(2.19) [Water] filled the boat.
(2.20) [John] needs a new car.
(2.21) John made [a mistake].
(2.22) John filled [the glass] with water.
(2.23) Smoking causes [cancer].
(2.24) The bullet overtook [the arrow].
(2.25) John built [the house].

These examples illustrate the properties in isolation: the phrases are used in contexts where the syntactic constituents are characterised by only one of the properties. Prototypical realisations would include all agent properties for subjects and all patient properties for objects. These properties are conceived as entailments that are contained in verbs' meanings,

specifying the value for the cognitive categories that people are actually concerned with: whether an act was volitional, whether it was caused by something, whether there were emotional reactions to it, and so on. (Dowty 1991: 575)

The relation between a verb's meaning and its syntactic form can then be formulated in the following way: if a verb has two arguments, the one that is closer to the agent prototype is realised as the subject, and the one that is closer to the patient prototype is realised as the object. If a verb has three arguments, the one that lies between these two ends is realised as a prepositional object.

This theory can be applied to explain certain phenomena concerning the interface between semantics and syntax. For example, the existence of "double lexicalizations" such as those in (2.15), which are attested in many different languages with the same types of verbs, can be explained by the properties of their arguments.

[4] Dowty uses the parentheses to express his own doubts about the relevance of the last feature in both groups.
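Dowty's Argument Selection Principle can be read as a simple procedure: count how many proto-agent and proto-patient entailments each argument carries, and realise the argument with the most proto-agent entailments as the subject. The following is a minimal, hedged sketch of that idea; the entailment labels and function names are our own illustrative choices, not an established formalisation.

```python
# Illustrative sketch of Dowty's (1991) Argument Selection Principle.
# The string labels paraphrase Dowty's proto-role entailment lists;
# they are hypothetical names chosen for this example.

PROTO_AGENT = {"volition", "sentience", "causes_change", "movement",
               "independent_existence"}
PROTO_PATIENT = {"change_of_state", "incremental_theme",
                 "causally_affected", "stationary",
                 "dependent_existence"}

def select_functions(arguments):
    """Given {argument: set of entailments}, realise the argument with
    the most proto-agent entailments as subject, and, of the remaining
    arguments, the one with the most proto-patient entailments as object."""
    subject = max(arguments, key=lambda a: len(arguments[a] & PROTO_AGENT))
    remaining = [a for a in arguments if a != subject]
    obj = max(remaining, key=lambda a: len(arguments[a] & PROTO_PATIENT))
    return {"subject": subject, "object": obj}

# (2.5a) "Mary stopped the car": Mary acts volitionally, is sentient,
# and causes a change; the car undergoes the change and is affected.
roles = select_functions({
    "Mary": {"volition", "sentience", "causes_change"},
    "the car": {"change_of_state", "causally_affected"},
})
print(roles)  # {'subject': 'Mary', 'object': 'the car'}
```

Note that for the arguments in (2.15), where both participants score high on the proto-agent entailments, such a procedure gives no clear winner, which is exactly the situation the proto-role theory uses to explain double lexicalizations like buy/sell.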
Both arguments that are realised in (2.15) are agent-like arguments (neither of them being a prototypical agent), so languages tend to provide lexical elements (verbs) that allow either of them to be realised as the subject.

Dowty's theory provides an elaborate framework for distinguishing between verbs' arguments, accounting for numerous different instances of arguments. These characteristics make it a suitable conceptual framework for large-scale data analysis. Recently, Dowty's notions have been used as argument descriptors in a large-scale empirical study of morphosyntactic marking of argument relations across a wide range of languages (Bickel et al. To appear), as well as in a large-scale annotation project (Palmer et al. 2005a). Dowty's approach, however, does not address issues related to syntax, such as different syntactic realisations of the same arguments. The approach reviewed in the following subsection concentrates more on these issues.

The Theta System

Unlike Dowty, who assumes a monostratal syntactic structure of phrases (Levin and Rappaport Hovav 2005), Reinhart (2002) sets the discussion of semantic roles in the context of derivational syntax. The account proposed by Reinhart (2002) offers an elaborate view of the interface between the lexical representation of verbs and syntactic derivations. It assumes three independent cognitive modules: the systems of concepts, the computational system (syntax), and the semantic inference systems. Linguistic information is first processed in the systems of concepts, then passed on to the computational system, and then to the semantic inference systems. The theta system[5] belongs to the systems of concepts. It enables the interface between all three modules. It consists of three parts:

a. Lexical entries, where theta-relations of verbs are defined.
b. A set of arity operations on lexical entries, where argument alternations are produced.
c. Marking procedures, which finally shape a verb entry for syntactic derivations.

There are eight possible theta relations that can be defined for a verb and that can be encoded in its lexical entry. They represent different combinations of values for two binary features: cause change (feature [c]) and mental state (feature [m]). They can be related to the traditional semantic role labels in the following way:

a) [+c+m] — agent
b) [+c−m] — instrument (...)
c) [−c+m] — experiencer
d) [−c−m] — theme / patient
e) [+c] — cause (unspecified for /m); consistent with either (a) or (b)
f) [+m] — ? (candidates for this feature-cluster are the subjects of verbs like love, know, believe)
g) [−m] — (unspecified for /c): subject matter / locative source
h) [−c] — (unspecified for /m): roles like goal, benefactor; typically dative (or PP)

The verb entries in the lexicon can be basic or derived. There are three operations that can be applied to basic entries, resulting in derived entries: saturation, reduction, and expansion. Saturation is applied to the entries that are intended for deriving passive constructions. It specifies that one of the arguments is only existentially quantified and is not realised in syntax. It is formalised as in the following example of the verb wash:

a) Basic entry: wash(θ1, θ2)
b) Saturation: ∃x(wash(x, θ2))
c) Max was washed: ∃x(x washed Max)

Reduction can apply in two ways.

[5] In the generative grammar theoretical framework, semantic roles are often referred to as thematic roles or, sometimes, as Θ-roles, indicating that the meaning expressed by the semantic arguments of verbs is not as specific as the traditional labels would suggest.
If it applies to the argument that is realised within the verb phrase in syntax (typically the direct object), it reduces the verb's argument array to only one argument, so that the meaning of the verb is still interpreted as a two-place relation, but as its reflexive instance ((2.26b) vs. (2.26a)). If it applies to the argument that is realised outside of the verb phrase, that is, as the subject of a sentence, it eliminates this argument from the array of the verb's arguments completely, so that the verb is interpreted as a one-place relation ((2.26c) vs. (2.26a)).

(2.26) a. Mary stopped the car.
       b. Mary stopped.
       c. The car stopped.

Expansion is an operation usually known as causativization. It adds one argument, an agent, to the array of the verb (2.27b vs. 2.27a).

(2.27) a. The dog walked slowly.
       b. Mary walked the dog slowly.

All these operations take place in the lexicon, producing different outputs. While the operations of saturation and reduction produce new variations of the same lexical concept, expansion creates a whole new concept.

Before entering syntactic derivations, the concepts undergo one more procedure, the marking procedure, which assigns indices to the arguments of verbs. These indices serve as a message to the computational system as to where to insert each argument in the phrase structure. Taking into consideration the number and the type of the feature clusters that are found in a verb entry, indices are assigned according to the following rules:

Given an n-place verb entry, n > 1:[6]
[6] Insertion of a single argument as subject follows from a more general syntactic rule, namely the Extended Projection Principle, which states that each clause must have a subject.

a) Mark a [−] cluster with index 2.
b) Mark a [+] cluster with index 1.
c) If the entry includes both a [+] cluster and a fully specified cluster [/α, /−c],[7] mark the verb with the ACC feature.[8]

This marking is associated with the following instructions for the computational system:

a) When nothing rules this out, merge externally.
b) An argument realising a cluster marked 2 merges internally.
c) An argument realising a cluster marked 1 merges externally.

The operation of internal merging joins a new constituent to an existing structure within a verb phrase, while the operation of external merging inserts a new constituent into the existing structure of a sentence outside of the phrase headed by the verb in question. The result of an internal merge is usually a syntactic relation between a verb and its object, while an external merge forms a relation between a verb and its subject.

With this system, some generalizations concerning the relation between theta roles and syntactic functions can be stated. Arguments that realise [+] clusters ([+c+m] agent, [+c] cause, [+m] ?) are subjects. Since there can be only one subject in a sentence, they exclude each other. Arguments that realise [−] clusters ([−c−m] patient, [−m] subject matter, [−c] goal) are objects. Only the fully specified one can be the direct object (introducing the ACC feature to the verb). The others (the underspecified ones) have to be marked with a preposition or an inherent case (e.g. dative), and are thus realised as indirect objects. Arguments that are specified for both features, but with opposite values ([−c+m] experiencer and [+c−m] instrument), are neutral.

[7] A cluster that is specified for both features, where one of them has to be [−c] and the other can be any of the following: [+m], [−m], [+c], [−c].
[8] ACC stands for accusative. This feature determines whether a verb assigns accusative case to its complement.
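The marking procedure is essentially a small deterministic algorithm over feature clusters, and it can be made concrete with a short sketch. The encoding below (a dict mapping the features 'c' and 'm' to +1 or −1, omitting underspecified features) is our own illustrative choice, not Reinhart's notation.

```python
# Hedged sketch of the Theta System marking procedure (Reinhart 2002).
# A cluster is a dict over the features 'c' (cause change) and 'm'
# (mental state) with values +1 or -1; underspecified features are
# simply absent, e.g. the cause cluster [+c] is {"c": 1}.

def mark(cluster):
    """Assign a merging index: an all-[+] cluster gets index 1 (merges
    externally, i.e. as subject); an all-[-] cluster gets index 2
    (merges internally, i.e. as object); a mixed cluster gets no index
    and may be inserted in whichever position is available."""
    values = set(cluster.values())
    if values == {1}:
        return 1      # external merge: subject
    if values == {-1}:
        return 2      # internal merge: object
    return None       # mixed cluster: unmarked

assert mark({"c": 1, "m": 1}) == 1      # [+c+m] agent -> subject
assert mark({"c": 1}) == 1              # [+c] cause -> subject
assert mark({"c": -1, "m": -1}) == 2    # [-c-m] theme/patient -> object
assert mark({"c": -1}) == 2             # [-c] goal -> indirect object
assert mark({"c": -1, "m": 1}) is None  # [-c+m] experiencer: neutral
assert mark({"c": 1, "m": -1}) is None  # [+c-m] instrument: neutral
```

The sketch makes the generalisation at the end of this subsection visible: all [+] clusters compete for the single subject position, all [−] clusters end up verb-internal, and the two mixed clusters (experiencer, instrument) remain unmarked.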
They have no indices, so they can be inserted into any position in the phrase structure that is available at the moment of their insertion. The same applies to arguments that are encoded as the only argument of a verb.

Summary

The approaches of Dowty (1991) and Reinhart (2002) reviewed in this section deal with issues in the traditional predicate-argument analysis by proposing small sets of primitive notions which can be seen as semantic components of the traditional argument labels. With such decomposition, the similarities between some arguments (e.g. agent, cause, and instrument), as well as the constraints on their realisations in phrases (e.g. if a clause expresses an agent of an event, it cannot express another distinct cause of the same event), follow from the features that characterise them.

The generalisations proposed as part of these approaches contribute to a better understanding of how the interface between lexicon and syntax operates, capturing a wide range of observations. However, the sets of features used in these accounts are not motivated by other, more general principles. In other words, these accounts do not address the question of why we have exactly the sets of features proposed in the two theories and not some others. The approaches to the predicate-argument structure reviewed in the following section propose a deeper semantic analysis, exploring the origins of the argument types.

2.1.4. Decomposing the meaning of verbs into multiple predicates

In theories of predicate decomposition, it is assumed that verbs do not describe one single relation, but more than one. These relations are regarded as different components of an event described by the meaning of a verb, often referred to as subevents. Some of these components can be rather general and shared by many verbs, while others are idiosyncratic and characteristic of a particular verb entry. In this framework, each predicate included in the lexical representation of a verb assigns semantic roles to its
In this framework, each predicate included in the lexical representation of a verb assigns semantic roles to its 51 2. Overview of the literature arguments. The syntactic layout of a clause depends, thus, on the number and the nature of the predicates of which the meaning of its heading verb is composed. Many approaches to predicate decomposition are influenced by the work of Hale and Keyser (1993) who are the first to prpose a formal syntactic account the relational meaning of verbs. Hale and Keyser (1993) propose a separate level of lexical representation of verbs — lexical relational structure (LRS). The components of conceptual, idiosyncratic meaning of a verb are the arguments of the relations grouped in the LRS. The relational part of lexical representation, for example, is the same for the verbs get in (2.28) and bottle in (2.29) indicating that machine did something to the wine. The difference in meaning between these two verbs is explained by two different incorporations of idiosyncratic components in the relational structure. In the first case, the relational structure incorporates the verb get with its own complex structure, while in the second case, it incorporates the noun bottle. (2.28) A machine got the wine into bottles. (2.29) A machine bottled the wine. Different approaches which follow this kind of analysis offer different representations of the relational structure, depending on what organizational principle is taken as a basis for event decomposition. Aspectual event analysis Aspectual analysis of events takes into consideration temporal properties of verbs’ meaning. More precisely, it decomposes the relational part of lexical representation of verbs into a number of predicates which correspond to the stages in the temporal development of the event. These predicates take arguments which are then realised as the principal constituents of clauses. 
As an illustration of the phenomena that are of interest for the aspectual decomposition of verbs' meaning, consider the sentences in (2.30-2.31).

(2.30) a. Mary drank [a bottle of wine] in two hours / ? for two hours.
       b. Mary drank [wine] for two hours / * in two hours.

(2.31) a. Mary crammed [the pencils] into the jar.
       b. Mary crammed [the jar] with the pencils.

The examples in (2.30) show that the choice of the adverbial with which the verb drink can be combined (in two hours vs. for two hours) depends on the presence or absence of the noun bottle in the object of the verb. Intuitively, we know that adverbials such as in two hours are compatible only with events which are understood as completed. The event expressed in the sentence in (2.30a), for example, is completed because the sentence implies that there is no more wine in the bottle. What makes this event completed and, thus, compatible with the adverbial in two hours is precisely the presence of the noun bottle in the object. This noun quantifies the substance which is the object (wine) and, at the same time, it quantifies the whole event which includes it.

This fact points to the presence of a predicate in the relational structure of the verb which takes the noun bottle as its argument. This predicate relates the other parts of the lexical structure of the verb (or of the event described by the verb) to an end point. The nature of the end point is specified by the argument of this predicate, that is, by the argument which is realised as the direct object in the phrase. If the direct object does not provide the quantification, the whole event is interpreted as not quantified, as in (2.30b). The verb in (2.30b) is not compatible with the adverbial in two hours because its object is not quantified.

The examples in (2.31) show how an alternation of the arguments of a verb can change the temporal interpretation of the event described by the verb.
The event in (2.31a) lasts until all the pencils are in the jar, while the event in (2.31b) lasts until the jar is full. The argument of the temporal delimitation predicate, which is usually realised as the direct object in a phrase, is known as the incremental theme (Krifka 1998). The adjective incremental refers to the fact that the theme argument of the verb changes incrementally in the course of the event described by the verb. The degree of change in the theme "measures out" the development of the event.

An influential general approach to aspectual decomposition of verbal predicates is proposed by Ramchand (2008). Looking at the event as a whole, Ramchand (2008) proposes several predicates which take arguments such as initiator, undergoer, path, and resultee. The predicates represent the subevents of the event described by a verb: the predicate whose argument is the initiator represents the beginning of the event, those that take the undergoer and path arguments are in the middle, and the one whose argument is the resultee is at the end. These predicates are added to the representation in the course of syntactic derivations, but only if this is allowed by the lexical specifications of verbs.

In an analysis of example (2.31a) in this framework, Mary is the initiator, the pencils are both the undergoer and the resultee, and the jar is another argument of the last (resulting) predicate. In (2.31b), on the other hand, the argument of both the middle and the end predicate is the jar.

This view of the semantic structure relates the complexity of the event described by the verb to the complexity of its argument structure and, by this, to the complexity of the structure of the clause that is formed with the verb. It should be noted, however, that this decomposition does not address all the temporal properties of phrases, but only those which are inherent in the meaning of the verbs.
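The pattern in (2.30), where a quantized direct object makes the event compatible with in two hours and a bare mass noun does not, can be stated as a tiny decision rule. The sketch below is a toy illustration only, assuming a crude surface test for whether the object is quantized; it is in no way a real aspectual classifier.

```python
# Toy illustration of the telicity contrast in (2.30): an event with a
# quantized incremental theme accepts "in two hours", a bare mass/plural
# object accepts "for two hours". The determiner list and the surface
# string test are simplifying assumptions made for this example.

QUANTIZING_EXPRESSIONS = {"a", "an", "the", "two", "a bottle of"}

def is_telic(object_np):
    """Treat the event as temporally delimited (telic) if the direct
    object starts with a quantizing expression, e.g. 'a bottle of wine'
    but not bare 'wine'."""
    return any(object_np.startswith(q + " ") for q in QUANTIZING_EXPRESSIONS)

def compatible_adverbial(object_np):
    return "in two hours" if is_telic(object_np) else "for two hours"

print(compatible_adverbial("a bottle of wine"))  # in two hours   (2.30a)
print(compatible_adverbial("wine"))              # for two hours  (2.30b)
```

The point of the sketch is only that the adverbial test depends on a property of the object, not of the verb alone, which is what motivates positing a delimitation predicate whose argument is the incremental theme.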
Causal event analysis

In a causal analysis, events are analysed into entities (participants) that interact according to a particular energy flow or force-dynamic schema (Talmy 2000). The main concept in this framework is the direction that the force can take. Semantic properties of verbs such as aspect and argument structure are seen as a consequence of the particular energy flow involved in the event described by the verb. If a verb describes an event where some energy is applied, it will be an action verb; otherwise it will describe a state. In the domain of argument structure, the participant that is the source of the energy will have the role of agent and the one that is the sink of the energy will have the role of patient. This approach has been applied to account for different senses of the same verb, as well as for its different syntactic realisations. Both different verb senses and their argument alternations are explained by shifts in the energy flow, while the idiosyncratic meaning stays unchanged. The following examples illustrate different force-dynamic patterns for the verb take.

(2.32) Sandy took the book { from Ashley / off the table }.

(2.33) a. Sandy took the book to Ashley.
       b. Sandy took Ashley (to the movies).

The action in (2.32) is self-oriented, with Sandy being the energy source and sink at the same time. In (2.33), another participant (Ashley / to the movies) is more important, since it indicates the direction of the energy. These are two different senses of the verb take. The difference is reflected in the fact that this argument can be omitted in sentences like (2.32) whenever it is indefinite or indeterminate, while in sentences like (2.33) it can only be omitted if it can be interpreted from the context.

2.1.5.
Summary

In the approaches to the analysis of the predicate-argument structure outlined so far, the meaning of verbs and their arguments is described with relatively small inventories of descriptive notions. The proposed accounts decompose the predicate-argument structure into more primitive notions in an attempt to reduce the number of theoretical notions to a minimum. The Theta System of Reinhart (2002), for example, accounts for a wide range of linguistic facts using only three notions: mental state, cause change, and presence/absence. Similarly, the aspectual decomposition proposed by Ramchand (2008) results in only four principal components which play a role in accounting for many different argument realisations and interpretations. The theoretical accounts reviewed in this section identify important generalisations about the lexical representation of verbs. The generalisations are, however, not tested on a wide range of verbs, but mostly on a small set of examples either provided by the authors of the proposals themselves or taken from a common set of examples frequently cited in the literature. Applying theoretical generalisations to a larger set of verb instances in a more practical analysis is not straightforward. Some approaches to large-scale analysis of the predicate-argument structure are discussed in the following section.

2.2. Verb classes and specialised lexicons

We have assumed so far that the predicate-argument relational structure is the part of lexical representation which is shared by different verbs, while the idiosyncratic lexical content is specific to each individual verb. However, if we take a closer look at the inventory of verbs in a language, this distinction turns out to be a simplified view of the organisation of the inventory. We intuitively group together not only the verbs with the same predicate-argument structure, but also the verbs with similar lexical content.
Such groups are, for example, verbs of motion (e.g. come, go, fall, rise, enter, exit, pass), state verbs (e.g. want, need, belong, know), verbs of perception (e.g. see, watch, notice, hear, listen, feel), etc.

2.2.1. Syntactic approach to verb classification

The members of semantic classes tend to be associated with the same types of syntactic structures. For example, verbs of motion are usually intransitive, verbs of perception are usually transitive, while the verbs that describe states can be associated with a variety of different structures. However, it has been noticed that verbs which belong to the same semantic class do not always participate in the same argument alternations. For example, the verbs bake in (2.34) and make in (2.35) have similar meanings in that they are both verbs of creation and they can take the same kind of objects.

(2.34) a. Mary baked a cake.
       b. The cake baked for 40 minutes.

(2.35) a. Mary made a cake.
       b. *The cake made for 40 minutes.

Despite the obvious parallelism, the verb bake participates in the causative alternation ((2.34b) is grammatical), while the verb make does not ((2.35b) is not grammatical). This contrast suggests that the two verbs have different lexical representations. There should be an element which is present in the structure of one verb and missing in the structure of the other, causing the difference in the syntactic patterns. On the basis of this assumption, Levin (1993) studies possible argument realisations of a large number of verbs and proposes a comprehensive classification which combines semantic and syntactic groupings. The aim of Levin’s analysis is to use the information about argument alternations as behavioural, observable indicators of the components of verbs’ meaning which are grammatically relevant.
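Levin-style classification treats participation in alternations as an observable feature of a verb. A minimal sketch, using only the bake/make contrast from (2.34-2.35) (the encoding is our own, not Levin's):

```python
# Which alternations a verb participates in, recorded as behavioural
# features (from examples 2.34-2.35 only).
alternations = {
    "bake": {"causative": True},
    "make": {"causative": False},
}

def same_class_candidates(verb1, verb2):
    """Verbs sharing the same alternation behaviour are candidates
    for membership in the same Levin class."""
    return alternations[verb1] == alternations[verb2]

# bake and make are both verbs of creation, yet they fall into
# different classes because only bake allows the causative alternation.
assert not same_class_candidates("bake", "make")
```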
Because it points out many constraints and distinctions which call for theoretical accounts, Levin’s classification has often been referred to in the subsequent work on the predicate-argument structure, including Levin’s own work (Levin and Rappaport Hovav 1994; 1995). An example of a phenomenon identified in Levin’s classification which has received a proper theoretical account is the distinction between verbs such as freeze, melt, grow, which are known as unaccusatives, and emission verbs such as glow, shine, beam, sparkle. The two groups are similar in that they consist of intransitive verbs which take the same kind of arguments (non-agentive, non-volitional) as subjects. The syntactic properties of the two groups, however, are different. While unaccusatives participate in the causative alternation, the emission verbs do not, which groups them with semantically very different agentive intransitive verbs such as walk, run, march, gallop, hurry. Reinhart (2002) employs the notions developed in the framework of the Theta System (see Section 2.1.3) to explain this fact by the different derivations of unaccusatives and emission verbs. Unaccusatives are derived lexical entries (derived from transitive verbs). Their argument is marked with the index 2, as the internal argument of the transitive verb. By the operation of reduction, the other argument is removed. The remaining argument is merged internally, even though it is the only argument of the verb, because it is marked with the index 2. It then moves to the position of the subject to satisfy general syntactic conditions. As for emission verbs, their subject is originally their only argument. This is why it cannot be marked. And since it is not marked, it is merged to the first available position, which is the external position of the subject.
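The two derivations can be rendered as a toy procedure (a schematic sketch of the Theta System account in our own terms, not Reinhart's formal notation):

```python
def realise_single_argument(entry):
    """Toy rendering of the account sketched above: an argument marked
    with index 2 (derived unaccusative) is merged internally and then
    moved to subject position; an unmarked argument (emission verb) is
    merged directly in the external, subject position."""
    if entry.get("index") == 2:
        return ["merge internally", "move to subject"]
    return ["merge externally as subject"]

# 'freeze' (unaccusative, derived from transitive 'freeze'):
assert realise_single_argument({"argument": "theme", "index": 2}) == \
    ["merge internally", "move to subject"]
# 'glow' (emission verb, underived single argument):
assert realise_single_argument({"argument": "emitter"}) == \
    ["merge externally as subject"]
```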
The systematic analysis of a large set of verbs proposed by Levin (1993) proved to be especially important for the subsequent empirical approaches to the meaning of verbs. The classification has often been cited as the reference resource for selecting specific groups of verbs for various purposes, including the experiments presented in this dissertation. More recent work on argument alternations concentrates on the conditions which determine different syntactic realisations of verbs’ arguments in alternations. Beavers (2006) revisits a range of alternations, especially those which involve arguments switching between the direct object and a prepositional complement, arguing that general semantic relationships between syntactic constituents directly influence their position in a clause, and not only the relationship of the arguments with verbal predicates. Beavers (2006) proposes a set of semantic hierarchies along different dimensions, such as the one illustrated in (2.36): the higher the interpretation of an argument in the hierarchy, the more distant its syntactic realisation from the direct object.

(2.36) Affectedness scale: PARTICIPANT ⊂ IMPINGED ⊂ AFFECTED ⊂ TOTALLY AFFECTED

Bresnan (2007) takes an empirical approach, proposing a statistical model of speakers’ choice between the two options provided by the dative alternation. The study first shows that human judgements of the acceptability of syntactic constructions are influenced by the frequency of the constructions. It then shows that several factors are good predictors of human judgements. If the recipient role is characterised as nominal, non-given, indefinite, inanimate, and not local in the given spatial context, it is likely to be realised as a prepositional complement, while if it is characterised by the opposite features (as pronominal, given, definite, animate, and local), it is likely to be realised as the indirect object.
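Bresnan's predictors amount to scoring the features of the recipient. The sketch below uses invented, equal weights purely for illustration, whereas the actual model is a fitted logistic regression:

```python
# Features that favour the prepositional realisation of the recipient,
# from the description above. The uniform weights are invented for
# illustration; a real model would estimate them from corpus data.
PP_FAVOURING = {"nominal": 1.0, "non_given": 1.0, "indefinite": 1.0,
                "inanimate": 1.0, "non_local": 1.0}

def predicted_realisation(recipient_features):
    """Predict the realisation of the recipient in the dative
    alternation from a set of its features."""
    score = sum(w for f, w in PP_FAVOURING.items() if f in recipient_features)
    return "PP complement" if score > len(PP_FAVOURING) / 2 else "indirect object"

# A nominal, non-given, indefinite, inanimate, non-local recipient:
assert predicted_realisation({"nominal", "non_given", "indefinite",
                              "inanimate", "non_local"}) == "PP complement"
# A pronominal, given, definite, animate, local recipient:
assert predicted_realisation(set()) == "indirect object"
```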
Bresnan and Nikitina (2009) offer an explanation of the speaker’s choice based on the interaction of two opposed tendencies. On the one hand, there is the tendency of semantic arguments to be aligned with the syntactic functions: more prominent arguments are realised as more prominent syntactic functions (like the direct and the indirect object), while less prominent arguments are realised as prepositional complements. On the other hand, the form of the prepositional phrase expresses the relationship between the verb and its complement in a more transparent way. Hence, if an argument is semantically prominent enough, it will be assigned a less transparent but syntactically more prominent function. Otherwise, it will be realised as a prepositional complement.

The more recent developments in the approach to argument alternations, however, have not been followed by a large-scale implementation. The comprehensive resources which have been developed up to the present day do not make reference to these generalisations.

2.2.2. Manually annotated lexical resources

Three big projects are concerned with providing extensive descriptions of the predicate-argument relations for English words. They are described in the following subsections. We start by describing FrameNet. As our detailed review shows, this resource implements the least theoretical view of the predicate-argument structure, based on the atomic semantic role analysis. Nevertheless, this resource is the one that is most frequently used as a reference for developing similar resources for other languages. The second resource, PropBank, implements Dowty (1991)’s view of the predicate-argument structure, but with significant simplifications which bring the implementation closer to the atomic view.
This resource has been frequently used for machine learning experiments due to the fact that, in addition to the lexicon of verbs, it provides a large corpus of texts manually annotated with the proposed predicate-argument analysis. The last resource that we discuss is VerbNet. Although it implements Levin (1993)’s classification, this resource too relies on the atomic view of semantic roles, assigning traditional semantic role labels to the arguments of verbs.

FrameNet

FrameNet is an electronic resource and a framework for explicit description of the lexical semantics of words. It is intended to be used by lexicographers, but also by systems for natural language processing (Baker et al. 1998). It consists of three interrelated databases:

a. Frame database, the core component, describing semantic frames that can be expressed by lexical units.
b. Annotated example sentences, extracted from the British National Corpus (Burnard 2007), with manually annotated frames and frame elements that are described in the Frame database.
c. Lexicon, a list of lexical units described in terms of short dictionary definitions and detailed morpho-syntactic specifications of the units that can realise their arguments in a sentence.

The frame database (1 179 frames)9 contains descriptions of frames, or scenes, that can be described by predicating lexical units (Fillmore 1982), such as verbs, adjectives, prepositions, and nouns. Each scene involves one or more participants. The predicating units are referred to as “targets”, and the participants that are combined with predicating units as “frame elements”. One frame can be realised in its “core” version, including “core” frame elements, or it can be realised as a particular variation, including additional frame elements that are specific to the variation.
For example, the target unit for the frame Accomplishment can be one of the verbs accomplish, achieve, bring about, or one of the nouns accomplishment, achievement. The core frame elements for this frame are:

a. agent: The conscious entity, generally a person, that performs the intentional act that fulfills the goal.
b. goal: The state or action that the agent has wished to participate in.

The definition of the frame itself specifies the interaction between the core frame elements: after a period in which the agent has been working on a goal, the agent manages to attain it. The goal may be a desired state, or be conceptualised as an event. For the non-core realisations, only the additional frame elements are defined.

9 Online documentation at https://framenet.icsi.berkeley.edu/fndrupal/current_status, accessed on 5 June 2014.

The frames are organised into a network by means of one or more frame-to-frame relations that are defined as attributes of frames. (Note, though, that not all frames can be related to other frames.) Defining the relations enables grouping related frames according to different criteria, so that the annotation can be used with different levels of granularity. There are six types of relations that can be defined:

• Inherits From / Is Inherited By: relates an abstract frame to a more specific frame with the same meaning, e.g. Activity (no lexical units) inherits from Process (lexical unit process.n) and it is inherited by Apply heat (lexical units bake.v, barbecue.v, boil.v, cook.v, fry.v, grill.v, roast.v, toast.v, ...).

• Subframe of / Has Subframes: if an event described by a frame can be divided into smaller parts, this relation holds between the frames that describe the parts and the one that describes the whole event, e.g.
Activity has subframes: Activity abandoned state, Activity done state (lexical units done.a, finished.a, through.a), Activity finish (lexical units complete.v, completion.n, ...), Activity ongoing (lexical units carry on.v, continue.v, keep on.v, ...), Activity pause (lexical units freeze.n, freeze.v, pause.n, take break.v, ...), Activity paused state, Activity prepare, Activity ready state (lexical units prepared.a, ready.a, set.a), Activity resume (lexical units renew.v, restart.v, resume.v), Activity start (lexical units begin.v, beginner.n, commence.v, enter.v, initiate.v, ...), Activity stop (lexical units quit.v, stop.v, terminate.v, ...).

• Precedes / Is Preceded by: holds between the frames that describe different parts of the same event, e.g. Activity pause precedes Activity paused state and is preceded by Activity ongoing.

• Uses / Is Used By: connects the frames that share some elements, e.g. Accomplishment (lexical units accomplish.v, accomplishment.n, achieve.v, achievement.n, bring about.v) uses Intentionally act (lexical units act.n, act.v, action.n, activity.n, carry out.v, ...).

• Perspective on / Is Perspectivised in: holds between the frames that express different perspectives on the same event, e.g. Giving (lexical units gift.n, gift.v, give.v, give out.v, hand in.v, hand.v, ...) is a perspective on Transfer (lexical units transfer.n, transfer.v); Transfer can also be perspectivised in Receiving (lexical units accept.v, receipt.n, receive.v).

• Is Causative of: e.g. Apply heat is causative of Absorb heat (lexical units bake.v, boil.v, cook.v, fry.v, ...).

The relations of inheritance, using, subframe, and perspective connect specific frames to the corresponding more general frames, but in different ways. In inheritance, the specific frame is a kind of the general frame. In using, only a part of the specific frame is a kind of the general frame. A subframe is a part of another frame.
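A fragment of this network of frame relations can be represented as a small graph, using only the frames and links quoted above (a sketch; FrameNet itself distributes the frame database in its own format):

```python
# Inherits From links and one Is Causative of link, as quoted above.
inherits_from = {
    "Activity": "Process",
    "Apply_heat": "Activity",
}
causative_of = {"Apply_heat": "Absorb_heat"}

def ancestors(frame):
    """Follow Inherits From links up to the most abstract frame,
    which allows annotation to be read at coarser granularity."""
    chain = []
    while frame in inherits_from:
        frame = inherits_from[frame]
        chain.append(frame)
    return chain

assert ancestors("Apply_heat") == ["Activity", "Process"]
assert causative_of["Apply_heat"] == "Absorb_heat"
```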
The other two relations do not involve abstraction; they hold between frames of the same level of specificity. Frames and frame elements can also be classified into semantic types that are not based on the hierarchies described above, but that correspond to ontologies that are commonly referred to (such as WordNet (Fellbaum 1998)). For example, frames are divided into non-lexical (e.g. Activity) and lexical (e.g. Accomplishment). Similarly, the frame element agent belongs to the type “sentient”, and theme belongs to the type “physical object”. Annotated examples such as (2.37-2.41) are provided for most of the frame versions.

(2.37) [Iraq]_agent had [achieved]_target [its programme objective of producing nuclear weapons]_goal.

(2.38) Perhaps [you]_agent [achieved]_target [perfection]_goal [too quickly]_manner.

(2.39) [He]_agent has [only partially]_degree [achieved]_target [his objective]_goal.

(2.40) [These positive aspects of the Michigan law]_goal may, however, have been [achieved]_target at the expense of simplicity. (agent: CNI)

(2.41) A programme of national assessment began in May 1978 and concerned itself with [the standard]_goal [achieved]_target [by 11 year olds]_agent.

Table 2.1.: Frame elements for the verb achieve (agent, circumstances, degree, explanation, goal, instrument, manner, means, outcome, place, time) and their syntactic realisations (phrase types and grammatical functions, e.g. CNI.–, NP.Ext, PP[by].Dep, PP[for].Dep for agent).

The units of the lexicon are word senses (12 754 units).
The entries contain a short lexical definition of the sense of the word, the frame that the unit realises, as well as the list of the frame elements that can occur with it. They also contain two more pieces of information on frame elements: the specification of the syntactic form that each frame element can take and a list of possible combinations of the frame elements in a sentence. For example, the verb achieve realises the frame Accomplishment. The frame elements that can occur with it are listed in Table 2.1. The first row in Table 2.1 states that the frame element agent occurs with the verb achieve and that it can be realised as Constructional null instantiation (CNI), which is most often the case in passive sentences (2.40); as a noun phrase external to the verb phrase headed by the target verb, which is most often the subject of a sentence (2.37-2.39); as a prepositional phrase headed by the preposition by and realising the grammatical function of dependent;10 or as a prepositional phrase headed by the preposition for with the same grammatical function. Possible syntactic realisations of the other frame elements are described in the same way. Since not all frame elements can be combined with all the others in a sentence, the possible combinations of the frame elements are also listed. Some of the possible combinations for the verb achieve, those that correspond to the examples (2.37-2.39), are listed in Table 2.2. The original entry for this verb contains 19 combinations in total. Each of the combinations can have several versions depending on the type and grammatical function of the constituents that realise the frame elements.

(2.37): Agent.NP.Ext + Goal.NP.Obj
(2.38): Agent.NP.Ext + Goal.NP.Obj + Manner.AVP.Dep
(2.39): Agent.NP.Ext + Degree.AVP.Dep + Goal.NP.Obj

Table 2.2.: Some combinations of frame elements for the verb achieve.
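The valence information for achieve described above can be pictured as two structures, per-element realisations and attested combinations; only the agent realisations and the combination corresponding to (2.37) are reproduced here, and the checking function is our own sketch, not part of FrameNet:

```python
# Per-element syntactic realisations: only the agent row is known
# with certainty from the prose above.
realisations = {
    "agent": {"CNI.-", "NP.Ext", "PP[by].Dep", "PP[for].Dep"},
}

# Attested combinations of frame elements (only the one for 2.37).
combinations = [
    {("agent", "NP.Ext"), ("goal", "NP.Obj")},
]

def licensed(pattern):
    """A pattern is licensed if every element uses a listed realisation
    (elements with no listed row are accepted as-is) and the pattern as
    a whole matches a listed combination."""
    ok_elems = all(real in realisations.get(elem, {real})
                   for elem, real in pattern)
    return ok_elems and set(pattern) in combinations

# (2.37) 'Iraq had achieved its programme objective ...':
assert licensed([("agent", "NP.Ext"), ("goal", "NP.Obj")])
# An agent realised as a direct object is not licensed:
assert not licensed([("agent", "NP.Obj"), ("goal", "NP.Obj")])
```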
The Proposition Bank (PropBank)

PropBank is a resource which consists of an annotated corpus of naturally occurring sentences and a lexicon of verbal predicates with explicitly listed possible arguments. It is intended to be used for developing systems for natural language understanding that depend on semantic parsing, but also for quantitative analysis of syntactic alternations and transformations.

10 In the system of grammatical functions used in FrameNet, the standard distinction between a complement and a modifier is not made. They are both considered dependent constituents, or dependents (Ruppenhofer et al. 2005).

The corpus contains 2 499 articles (1 million words) published in the Wall Street Journal, for which the syntactic structure was annotated in the Penn Treebank project (Marcus et al. 1994). The semantic roles were annotated in the PropBank project (Palmer et al. 2005a). The labels for the semantic roles were attached to the corresponding nodes in the syntactic trees. A simplified example of an annotated sentence is given in (2.42). The added semantic annotation is placed between the “@” characters.

(2.42) (S (NP-SBJ@ARG0 pay.01@ (DT The) (NN nation))
          (VP (VBZ has)
              (VP (VBN@rel pay.01@ paid)
                  (NP@ARG1 pay.01@ (RB very) (RB little))
                  (PP-TMP@ARGM-TMP pay.01@ (JJ last) (NN year)))))

Only a limited set of labels was used for annotation. Verbs are marked with the label rel, for relation, and the participants in the situation described by the verb are marked with the labels arg0 to arg5 for the verb’s arguments and with arg-m for adjuncts. The numbered labels represent semantic roles of a very general kind. The labels arg0 and arg1 have approximately the same value with all verbs. They are used to mark instances of the proto-agent (arg0) and proto-patient (arg1) roles (see 2.1.3). The value of the other indices varies across verbs.
It depends on the meaning of the verb, on the type of the constituent that they are attached to, and on the number of roles present in a particular sentence. arg3, for example, can mark the purpose with some verbs, or a direction or some other role with other verbs. The indices are assigned according to the roles’ prominence in the sentence. More prominent are the roles that are more closely related to the verb.

The arg-m label can have different versions depending on the semantic type of the constituent: loc denoting location, cau for cause, ext for extent, tmp for time, dis for discourse connectives, pnc for purpose, adv for general purpose, mnr for manner, dir for direction, neg for negation marker, and mod for modal verb. The last three labels do not correspond to adjuncts, but they are added to the set of labels for semantic annotation nevertheless, so that all the constituents that surround the verb can have a semantic label (Palmer et al. 2005a). The labels for adjuncts are more specific than the labels for arguments. They do not depend on the presence of other roles in the sentence. They are mapped directly from the syntactic annotation. For example, the verb pay in (2.42) assigns two semantic roles to its arguments and one to an adjunct. arg0 is attached to the noun phrase that is the subject of the sentence (NP-SBJ: The nation) and it represents the (proto-)agent. arg1 is attached to the direct object (NP: very little). The label for the adjunct (PP-TMP: last year), arg-m-tmp, is mapped from the syntactic label of the corresponding phrase.

The annotated corpus is accompanied by a lexicon that specifies the interpretation of the roles for each verb in its different senses. The unit of the lexicon is a lemma (3 300 verbs) containing one or more lexemes (4 500 verb senses). The interpretations of the numbered roles are given for each lexeme separately. Table 2.3 illustrates the lexical entry for the verb pay.
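The annotation of example (2.42) can be represented as a flat record, which is essentially the form in which role-labelling systems consume PropBank (the dictionary encoding is our own sketch):

```python
# The PropBank annotation of (2.42): the relation, its numbered
# arguments, and the ARGM adjunct label mapped from the PP-TMP tag.
annotation = {
    "rel": ("pay.01", "paid"),
    "ARG0": "The nation",     # proto-agent, NP-SBJ
    "ARG1": "very little",    # proto-patient, direct object
    "ARGM-TMP": "last year",  # adjunct, from the syntactic PP-TMP node
}

# Numbered roles are interpreted per verb sense; ARGM labels are not.
numbered = [k for k in annotation
            if k.startswith("ARG") and k[3:].isdigit()]
assert numbered == ["ARG0", "ARG1"]
```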
Possible syntactic realisations of the roles are not explicitly described as in FrameNet, but they are illustrated with a number of annotated sentences, each representing a different syntactic realisation of the role. These sentences are mostly drawn from the corpus. For some syntactic realisations that are not attested in the corpus, example sentences are constructed.

pay.01: Arg0: payer or buyer; Arg1: money or attention; Arg2: person being paid, destination of attention; Arg3: commodity, paid for what
pay.02 (pay off): Arg0: payer; Arg1: debt; Arg2: owed to whom, person paid
pay.03 (pay out): Arg0: payer or buyer; Arg1: money or attention; Arg2: person being paid, destination of attention; Arg3: commodity, paid for what
pay.04: Arg1: thing succeeding or working out
pay.05 (pay off): Arg1: thing succeeding or working out
pay.06 (pay down): Arg0: payer; Arg1: debt

Table 2.3.: The PropBank lexicon entry for the verb pay.

VerbNet

VerbNet is a database which is primarily concerned with the classification of English verbs. The approach to classification is based on the framework proposed by Levin (1993). It takes into account two properties: a) the lexical meaning of a verb and b) the kind of argument alternations that can be observed in the sentences formed with a particular verb (see Section 2.2.1 for more details). The unit of classification in VerbNet is a verb sense. It currently covers 6 340 verb senses. The classification is partially hierarchical, including 237 top-level classes with only three more levels of subdivision (Kipper Schuler 2005). Each class entry includes:

• Member verbs.

• Semantic roles: all the verbs in the class assign the same roles. These roles are more general than frame elements in FrameNet, but more specific than the numbered roles in PropBank. The label for a role in VerbNet does not depend on context, unlike in FrameNet and PropBank.
There is a fixed set of roles that have the same interpretation with all verbs. Although the set of roles is fixed in principle, its members have been revised in the course of the resource development (Bonial et al. 2011). Initially the set included the following roles:

Original role set in VerbNet (23): actor, agent, asset, attribute, beneficiary, cause, location, destination, source, experiencer, extent, instrument, material, product, patient, predicate, recipient, stimulus, theme, time, topic.

The set was later revised to include the following roles:

Updated role set in VerbNet (33): actor, agent, asset, attribute, beneficiary, cause, co-agent, co-patient, co-theme, destination, duration, experiencer, extent, final time, frequency, goal, initial location, initial time, instrument, location, material, participant, patient, pivot, place, product, recipient, result, source, stimulus, theme, trajectory, topic.

The revised set is accompanied by a hierarchy, where all the roles are classified into four categories: actor, undergoer, time, place.

• Selectional restrictions: defining characteristics of possible arguments of verbs, such as [+animate | +organization] for the role agent in Table 2.4. They can be compared with the semantic types in FrameNet.

• Frames: containing a description of syntactic realisations of the arguments and some additional semantic features of verbs. (Note that the VerbNet frames are different from the FrameNet frames.)

Members: accept, discourage, encourage, understand
Roles: agent [+animate | +organization], proposition
Frames: HOW-S
    Example: “I accept how you do it.”
    Syntax: Agent V Proposition <+how-extract>
    Semantics: approve(during(E), Agent, Proposition)

Table 2.4.: The VerbNet entry for the class Approve-77.

In the example class entry given in Table 2.4, only one (HOW-S) of the 5 frames that are defined in the original entry is
included, since the other frames are defined in the same way. The semantics of the verb describes the temporal analysis of the verbal predicates (see Section 2.1.4). The VerbNet database also contains information about the correspondence between the classes of verbs and lexical entries in other resources. 5 649 links with PropBank lexicon entries have been specified, as well as 4 186 with FrameNet entries. No annotated example sentences are provided directly by the resource. However, naturally occurring sentences with annotated VerbNet semantic roles can be found in another resource, SemLink (Loper et al. 2007), which maps the PropBank annotation to the VerbNet descriptions. Each numbered semantic role annotated in the PropBank corpus is also annotated with the corresponding mnemonic role from the set of roles used in VerbNet. This resource enables comparison between the two annotations and exploration of their usefulness for systems for automatic semantic role labelling.

Comparing the resources

The three resources described in the previous subsections all provide information on how predicating words combine with other constituents in a sentence: what kind of constituents they combine with and what interpretation they impose on these constituents. They are all intended to be used for training systems for automatic semantic parsing. However, there are considerable differences in the data provided by these resources. The overlap between the sets of lexical items covered by the three resources is rather limited. For example, 25.5% of the word instances in PropBank are not covered by VerbNet (Loper et al. 2007), despite the fact that VerbNet contains more entries than PropBank (3 300 verbs in PropBank vs. 3 600 verbs in VerbNet). The coverage issue has been pointed out in the case of FrameNet too. Burchardt et al. (2009) use the English FrameNet to annotate a corpus of German sentences manually.
They find that the vast majority of frames can be applied to German directly. However, around one third of the verb senses identified in the German corpus were not covered by FrameNet. Also, a number of German verbs were found to be underspecified. Contrary to this, Monachesi et al. (2007) use PropBank labels for semi-automatic annotation of a corpus of Dutch sentences. Although not all Dutch verbs could be translated to an equivalent verb sense in English, these cases were assessed as relatively rare. Samardžić et al. (2010) and van der Plas et al. (2010) use PropBank labels to annotate manually a corpus of French sentences. The coverage reported in these studies is around 95%. A potential explanation of the coverage differences lies in the fact that PropBank is the only one of the three resources which is based on text samples. While the criteria for including lexical items in the lexicons are not clear in the other two resources, the verbs and verb senses included in PropBank are those that are found in the corpus which is taken as a starting point. The criteria for distinguishing verb senses are also defined differently, which means that different senses are described even for the same words. It can be noted, for example, in Table 2.3 that PropBank introduces a new verb sense for a phrasal verb even if its other properties are identical to those of the corresponding simplex verb, which is not the case in the other two databases. The information that is included in the lexicon entries is also different. We can see, for example, that the morpho-syntactic properties of the constituents that combine with the predicating words are described in different ways. While FrameNet provides detailed specifications (Table 2.1), VerbNet defines these properties only for some argument realisations (Table 2.4).
PropBank does not contain this information in the lexicon at all; instead, all instances of the roles are attached to nodes in syntactic trees in the annotated corpus.

Finally, different sets of roles are used in the descriptions. FrameNet uses many different role labels that depend on the frame in which they occur. These roles can have a more specific meaning, such as buyer in the frame Commerce-buy, but they can also refer to more general notions, such as agent in the frame Accomplishment (see Section 2.2.2). VerbNet uses a set of 23 roles with general meaning that are interpreted in the same way with all verbs. PropBank uses only 6 role labels, but their interpretation varies depending on the context (see Section 2.2.2). Interestingly, all three resources adopt atomic notions and relatively arbitrary role sets, despite the arguments for decomposing the predicate-argument structure put forward in the linguistic literature (see Section 2.1.2). PropBank labels are based on Dowty's (1991) notion of proto-roles, but the properties which define to what degree a role belongs to one of the two types (see Section 2.1.3) are not annotated separately.

A number of experiments have been conducted to investigate how the differences between the PropBank and the VerbNet annotation schemes influence systems for automatic role labelling. The task of learning the VerbNet labels can be expected to be more difficult, since there are more different items to learn. On the other hand, the fact that the labels are used in a consistent way across different verbs could make it easier, because the labels should be better associated with the other features used by the systems. Loper et al. (2007) show that a system trained on the VerbNet labels predicts the labels of new instances better than a system trained on the PropBank labels, especially if the new instances occur in texts of a different genre.
However, this finding only holds if the performance is compared for the arg1 and arg2 labels in PropBank vs. the sets of VerbNet labels that correspond to them respectively. The VerbNet labels were grouped into more general labels for this experiment: 6 labels corresponding to arg1 and 5 corresponding to arg2. If the overall performance is compared, the PropBank labels are better predicted, which is also confirmed by the findings of Zapirain et al. (2008).

Merlo and van der Plas (2009) compare different quantitative aspects of the two annotation schemes and propose ways in which the resources can be combined. They first reconsider the evaluation of the performance of systems for automatic semantic role labelling. They point out that an uninformed system that predicts only one role, the most frequent one, for every case would be correct in 51% of cases if it learned the PropBank roles, and in only 33% of cases if it learned the VerbNet roles, due to the different distributions of the instances of the roles in the corpus. They neutralise this bias by calculating and comparing the reduction in error rate for the two annotations. According to this measure, the overall performance is better for the VerbNet labels, but it is more degraded in the cases where the verb is not known (not observed in the training data) compared to the cases where it is known, due to the stronger correlation between the verb and its role set. Thus, they argue that the VerbNet labels should be used if the verb is known, and the PropBank labels if it is new.

Looking at the joint distribution of the labels in the corpus, Merlo and van der Plas (2009) note different relations for roles with different frequencies. The frequent labels in PropBank can be seen as generalisations of the frequent labels in VerbNet.
For example, agent and experiencer are most frequently labelled as arg0, while theme, topic, and patient are most frequently labelled as arg1, which means that the PropBank labels group together similar VerbNet labels. The PropBank labels of low frequency are more specific and more variable, due to the fact that they depend on the context, while the corresponding VerbNet labels are more stable. Thus, a VerbNet label could be useful for interpreting a particular instance of a PropBank label.

The comparisons of the sets of labels used in the PropBank and VerbNet annotation schemes indicate that they can be seen as complementary sources of information about semantic roles. However, other aspects of combining the resources are still to be explored. The described comparisons are performed only for the lexical units included in both PropBank and VerbNet. Since these two resources contain different lexical items, their combination might be used to increase the coverage. Also, the potential advantages of using the other data provided by the resources (e.g. the hierarchies defined in FrameNet and VerbNet) are still to be examined.

2.3. Automatic approaches to the predicate-argument structure

The relational meaning of verbs, represented as the predicate-argument structure, is not only interesting from a theoretical point of view. As a relatively simple representation of the meaning of a clause, which can also be related to some observable indicators (see Section 2.2.1), this representation has attracted considerable attention in the domain of automatic analysis of the structure of natural language. An automatic analysis of the predicate-argument structure can be useful for improving human-machine interfaces so that computers can be exploited in searching for information in texts and databases, automatic translation, automatic booking, and other applications.
For example, an automatic railway information system could use such an analysis to “understand” that from Geneva denotes the starting point and to Montreux the destination of the request in (2.43).

(2.43) What is the shortest connection from Geneva to Montreux?

Automatic analysis of the predicate-argument structure relies on observations of instances of verbs in large samples of text (electronic corpora). The observable characteristics of the instances of verbs are formulated as features, which are then used to train machine learning algorithms. Most of the algorithms used in computational approaches to the predicate-argument structure are not tailored specifically for natural language processing; they are general algorithms which can be applied to a wider range of machine learning tasks. Nevertheless, with a well-chosen feature representation, statistical modelling of data, and an appropriate architecture, systems generally manage to capture semantically relevant aspects of the uses of verbs, showing high agreement with human judgments.

In this dissertation, we regard computational approaches to the predicate-argument structure of verbs as a suitable experimental framework for testing our theoretical hypotheses. To achieve a good performance in automatic analysis of the predicate-argument structure, it is necessary to capture generalisations about the relationship between the meaning of verbs, which people interpret intuitively, and the distribution of different observable characteristics of verb uses in language corpora. Since the same relationship is modelled in our work, we study and apply methods used in computational approaches.

As opposed to theoretical approaches, computational approaches emphasise predictions rather than the generalisations themselves. In theoretical accounts of linguistic phenomena, generalisations are usually stated explicitly in the form of grammar rules.
In computational approaches, generalisations are often formulated in terms of statistical models which explicitly express relationships between structural elements, but not necessarily in the form of grammar rules. Predictions which follow from the generalisations can be explicitly formulated in theoretical accounts, but this is not necessary. In contrast, predictions are precisely formulated in computational approaches and tested experimentally.

Another important difference between theoretical and computational approaches is in the theoretical context which is assumed for each particular problem. While theoretical accounts treat particular problems in relation to a more general theory of linguistic structure, computational approaches are focused on specific tasks, treating them as independent of other tasks. The task orientation in computational approaches allows specific definitions of predictions and measurement of the performance, but the theoretical relevance of discovered generalisations is often not straightforward.

In this section, we discuss the work in the natural language processing framework which involves analysing the predicate-argument structure of verbs. We concentrate especially on two tasks which deal with the relationship between the meaning of verbs and the structure of clauses: semantic role labelling and verb classification. We briefly describe other related tasks which are less directly concerned with the relationship studied in our own experiments.

2.3.1. Early analyses

Early work on automatic analysis of the syntactic realisations of verbs’ semantic arguments centred around the automatic development of lexical resources to be used for syntactic parsing and text generation. The work was based on the assumption that the number of potential syntactic analyses can be significantly reduced if information about the verb’s subcategorisation frame is known (Manning 1993; Brent 1993; Briscoe and Carroll 1997).
It was soon understood that the notion of verb subcategorisation alone does not capture the relevant lexical information. Due to the alternations of arguments, many verbs are systematically used with multiple subcategorisation frames. The subsequent work brought some proposals for automatic identification of argument alternations of verbs (McCarthy and Korhonen 1998; Lapata 1999). These proposals still concerned mostly syntactic issues.

Early approaches, as well as the majority of the subsequent work on lexical and syntactic properties of verbs, do not target the nature of the relationship between a verbal predicate and its semantic arguments. The tasks are defined in terms of syntactic subcategorisation and selectional preferences. The aim of this research is to improve the performance of automatic parsers by limiting the range of possible syntactic constituents with which a verb can be combined (the task of identifying the subcategorisation frames) and the range of possible lexical items which can head these constituents (the task of identifying selectional preferences).

Identifying the nature of the semantic relationships between verbs and their arguments was established as a separate task, called semantic role labelling. In the following section, we discuss in detail the methods used in this task.

2.3.2. Semantic role labelling

The work on automatic semantic role labelling was enabled by the creation of the resources described in Section 2.2.2, which provided enough annotated examples to be used for training and testing the systems. Since the first experiments (Gildea and Jurafsky 2002), semantic role labelling has received considerable attention, which has resulted in a variety of proposed approaches and systems.
Many of the systems have been developed and directly compared within shared tasks such as the CoNLL-2005 shared task on semantic role labelling (Carreras and Màrquez 2005) and the CoNLL-2009 shared task on syntactic and semantic dependencies in multiple languages (Hajič et al. 2009). Most of the numerous proposed solutions follow what can be considered the standard approach.

Standard semantic role labelling

The most widely adopted view of the task of automatic semantic role labelling is the supervised machine learning approach defined by Gildea and Jurafsky (2002). The term supervised refers to the fact that a machine learning system is trained to recognise the predicate-argument structure of a clause by first observing a range of examples where the correct structure is explicitly annotated.

(2.44) a. [S [NP/agent Mary] [VP [V made] [NP/theme a cake]]]
       b. [S [NP Mary] [VP [V made] [NP a cake]]]

The annotation guides the system in selecting the appropriate indicators of the structure. The program reads the training input (a simplified example of a training sentence is shown in (2.44a)) and collects the information about the co-occurrence of the annotated structure and other observable properties of the phrase (lexical, morphological, and syntactic). The collected observations are transformed into structured knowledge and generalisations are made by means of a statistical model. Once the model is built using the training data, it is asked to predict the predicate-argument structure of new (test) phrases (illustrated in (2.44b)) by observing their lexical, morphological and syntactic properties.

In the standard approach, the task of predicting the predicate-argument structure of a sentence is divided into two sub-tasks: identifying the constituents that bear a semantic role (distinguishing them from the constituents that do not) and identifying the semantic role label for all the constituents that bear one.
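These two sub-tasks can be sketched as a minimal pipeline. The sketch below is illustrative only: the identify and classify functions stand in for trained classifiers, and the constituent representation and decision rules are invented for the toy example in (2.44).

```python
def label_constituents(constituents, identify, classify):
    """Two-stage labelling: (1) decide whether each constituent bears a
    semantic role at all; (2) assign a role label to those that do.
    `identify` and `classify` stand in for trained classifiers."""
    labelled = []
    for c in constituents:
        role = classify(c) if identify(c) else None
        labelled.append((c["head"], role))
    return labelled

# Toy stand-ins for trained models, keyed to the example in (2.44):
identify = lambda c: c["phrase"] == "NP"
classify = lambda c: "agent" if c["position"] == "before" else "theme"

sentence = [
    {"head": "Mary", "phrase": "NP", "position": "before"},
    {"head": "cake", "phrase": "NP", "position": "after"},
    {"head": "quickly", "phrase": "ADVP", "position": "after"},
]
print(label_constituents(sentence, identify, classify))
# [('Mary', 'agent'), ('cake', 'theme'), ('quickly', None)]
```

In real systems both stages are statistical classifiers trained on annotated corpora; the point of the sketch is only the division of labour between identification and classification.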
Both sub-tasks are defined as a classification problem: each constituent is first classified as either bearing or not bearing a semantic role. Each constituent bearing a role is then classified into one of the predefined semantic role classes. All the constituents belonging to the same class bear the same semantic role.

The two classification steps constitute the core of the semantic role labelling task, which is usually performed in a pipeline including some pre- and post-processing. The pre-processing part provides the information which is considered given. First, the predicates which assign semantic roles to the constituents are identified prior to semantic role labelling proper. They are usually identified as the main verbs which head clauses. Second, the syntactic analysis of the sentence that is being analysed is considered given. Both pieces of information are obtained by morphological and syntactic processing of the sentence. The relatively good performance of current morphological and syntactic analysers allows these analyses to be performed automatically. In practice, most of the systems use resources in which predicates are manually annotated (see Section 2.2.2). However, Merlo and Musillo (2008) show that this information can also be obtained automatically with comparable results, exploiting the relevant syntactic phenomena already encoded in the syntactic annotation. In deciding which arguments belong to which predicate in a sentence, two sets of conditions are informative. First, the minimality conditions determine whether another verb intervenes between a constituent and its potential verb predicate. Second, the locality constraints determine whether the constituent is realised outside of a verb phrase, either as the subject of a sentence or as an extracted constituent.

The post-processing part can include various operations depending on the particular system.
In most cases, this part includes optimising at the sentence level. This step is needed to account for the fact that the semantic roles which occur in one sentence are not mutually independent. For example, if a role is assigned to one syntactic constituent, it is unlikely that the same role is assigned to another constituent in the same sentence. In the standard approach, all the constituents are first assigned a role independently; the assignments are then reconsidered in the post-processing phase, taking into account the information about the other roles in the sentence.

We do not discuss in detail all the aspects of automatic semantic role labelling, but focus on the two aspects which are most relevant to our own experiments: knowledge representation using features and statistical modelling of the collected data.

Features. The grammatical properties of phrases which are relevant for semantic role labelling are described in terms of features.11 Different systems may use different features, depending on the approach, but, as noted by Carreras and Màrquez (2005) and Palmer et al. (2010), a core set of features is used in almost all approaches. Those are mainly the features already defined by Gildea and Jurafsky (2002):

11 Note that the term feature is used in different ways in computational and in theoretical linguistics. Features in theoretical linguistics are more or less formal properties of lexical units which indicate with what other lexical units they can be combined in syntactic derivations. In computational linguistics, a feature can be any fact about a particular use of some word or phrase.

• Phrase type — reflects the fact that some semantic roles tend to be realised by one and others by another type of phrase. For example, the role goal tends to be realised by noun phrases, while the role place is realised by prepositional phrases.
In training, the phrase type of each constituent annotated as a realisation of a semantic argument of a verb is recorded. In the toy example given in (2.44), both roles would be assigned the same value for this feature: NP.

• Governing category — defines the grammatical function of the constituent that realises a particular semantic role. This feature captures the fact that some semantic roles are realised as the subject of a sentence, while others are realised as the direct object. The feature is defined so that it can only have two possible values: S and VP. If a constituent bearing a semantic role is governed by the node S in a syntactic tree, it is the subject of the sentence (Mary in (2.44)); if it is governed by the node VP, it belongs to the verb phrase, which is the position of the object (a cake in (2.44)). No distinction is made between the direct and the indirect object.

• Parse tree path — defines the path in the syntactic tree which connects a given semantic role to its corresponding predicate. The value of this feature is the sequence of nodes that form the path, starting with the verb node and ending with the phrase that realises the role. The direction of moving from one node to another is marked with arrows. For example, the value of the feature for the agent role in the example (2.44), relating it to the verb made, would be V↑VP↑S↓NP; the value of this feature for the theme role would be V↑VP↓NP. The whole string is regarded as an atomic value. Possible values for this feature are numerous: Gildea and Jurafsky (2002) count 2 978 different values in their training data.

• Position — defines the position of the constituent bearing a semantic role relative to its corresponding predicate, that is, whether the constituent occurs before or after the predicate. This is another way to describe the grammatical function of the constituent, since subjects tend to occur before and objects after the verb.
• Voice — marks whether the verb is used in the passive or the active voice. This feature is needed to capture the systematic alternation of the relation between the grammatical function and the semantic role of a constituent. While agent is the subject and theme is the object in typical realisations, the reverse is true if the passive transformation takes place.

• Head word — describes the relation between the lexical content of a constituent and the semantic role that it bears. The value of this feature is the lexical item that heads the constituent. For example, a constituent headed by Mary is more likely to be an agent than a theme, while a constituent headed by cake is more likely to be a theme, as is the case in (2.44).

This overview of the features shows that the systems for automatic identification of the roles of verbs’ arguments largely rely on the syntactic analysis and on the relation between the type of a semantic role and its syntactic form. Three of the listed features, path, government, and position, are different indicators of the grammatical function of the constituents. Gildea and Jurafsky (2002) compare the performance of the system using only one or two features at a time to the performance using the whole set. They find that using both the position and either of the other two features is redundant. On the other hand, including at least one of these features is necessary.

More recent systems use more features. In addition to the government feature, for instance, the information about the siblings of the constituent in the tree is collected. Also, information about the subcategorisation frame or syntactic pattern of the verb is often used (Carreras and Màrquez 2005). Selecting a particular set of features to represent the relevant knowledge about the predicate-argument structure does not rely on any particular theoretical framework or study.
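To make the path feature concrete, here is a small sketch computing it over the toy tree of (2.44). The node ids and the parent-map representation of the tree are invented for illustration; real systems extract the path from full parser output.

```python
def tree_path(pred, const, parent, label):
    """Parse tree path feature: node labels from the predicate up to the
    lowest common ancestor (marked with ↑), then down to the constituent
    (marked with ↓). `parent` maps a node id to its parent node id;
    `label` maps node ids to category labels."""
    def chain(n):
        c = [n]
        while c[-1] in parent:
            c.append(parent[c[-1]])
        return c
    up, down = chain(pred), chain(const)
    lca = next(n for n in up if n in down)      # lowest common ancestor
    nodes = up[:up.index(lca) + 1] + down[:down.index(lca)][::-1]
    arrows = ["↑"] * up.index(lca) + ["↓"] * down.index(lca)
    return "".join(label[n] + a for n, a in zip(nodes, arrows)) + label[nodes[-1]]

# The toy tree of (2.44): S -> NP VP; VP -> V NP (node ids are invented).
parent = {"np1": "s", "vp": "s", "v": "vp", "np2": "vp"}
label = {"s": "S", "np1": "NP", "vp": "VP", "v": "V", "np2": "NP"}
print(tree_path("v", "np1", parent, label))  # V↑VP↑S↓NP (the agent role)
print(tree_path("v", "np2", parent, label))  # V↑VP↓NP (the theme role)
```

The returned string is treated as a single atomic feature value, which is why the feature has so many distinct values in practice.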
The choice of features tends to be arbitrary, with little research on its linguistic background. An exception to this is the study of Xue and Palmer (2004), which shows that the feature set which should be used for argument identification is not the same as the set which should be used for assigning the labels.

Modelling. When predicting the correct semantic role for a string of words (usually representing a constituent of a sentence), the system observes the values of the defined features in the test data and calculates the probability that each of the possible roles occurs in the given conditions. The role that is most likely to occur in the given conditions is assigned to the constituent. The probability that is calculated for each possible role is formulated in the following way (Gildea and Jurafsky 2002):

P(r | h, pt, gov, position, voice, t)    (2.45)

The knowledge about the current instance being classified consists of the values of the features listed on the right-hand side of the bar. The formula in (2.45) reads in the following way: what is the probability that a particular constituent bears a particular semantic role r, knowing that the head of the constituent is h, the path between the constituent and the predicate is pt, the category governing the constituent is gov, the position of the constituent relative to the predicate is position, the voice of the verb predicate is voice, and the verb predicate is t?

To choose the best role for a particular set of feature values, the probability of each role in the given context needs to be assessed. One could assume that the role which occurs most frequently with a given combination of feature values in the training data is the role that is most likely to occur with the same features in the test data too.
In this case, the probability could be calculated as the relative frequency of the observations: the number of times the role occurs with the combination of features out of all the occurrences of the combination of features in question:

P(r | h, pt, gov, position, voice, t) = #(r, h, pt, gov, position, voice, t) / #(h, pt, gov, position, voice, t)    (2.46)

The problem with this approach is that some features can have many different values (e.g. the value of the feature head word can be any word in the language), which results in a large number of possible combinations of values. Many of the combinations will not occur in the training data, even if large-scale resources are available. Thus, the set of features has to be divided into subsets that occur enough times in the training data. The values for each subset are then considered for each possible semantic role, and the decision on the most probable role is made by combining the information. Gildea and Jurafsky (2002) divide the set of features into 8 subsets:

P(r|t), P(r|pt, t), P(r|pt, gov, t), P(r|pt, position, voice), P(r|pt, position, voice, t), P(r|h), P(r|h, t), P(r|h, pt, t).

They explore several methods of combining the information based on the subsets, achieving the best results by combining linear interpolation with the back-off method. Linear interpolation provides a weighted average of the probabilities based on the subsets of features. It is calculated in the following way:

P(r|constituent) = λ1 P(r|t) + λ2 P(r|pt, t) + λ3 P(r|pt, gov, t) + λ4 P(r|pt, position, voice) + λ5 P(r|pt, position, voice, t) + λ6 P(r|h) + λ7 P(r|h, t) + λ8 P(r|h, pt, t)    (2.47)

where λi represents the interpolation weight of each of the probabilities and Σi λi = 1. It can be noted that not all subsets include the same number of features.
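A toy sketch of the relative-frequency estimate in (2.46) and its linear interpolation in (2.47), using only two of the eight subsets and invented counts, might look as follows:

```python
def relfreq(counts, role, context):
    """Relative-frequency estimate of P(r | context), as in (2.46):
    the count of (role, context) over all counts with that context."""
    total = sum(c for (r, ctx), c in counts.items() if ctx == context)
    return counts.get((role, context), 0) / total if total else 0.0

def interpolate(role, estimates, weights):
    """Linear interpolation of the subset estimates, as in (2.47);
    the weights are the λi and must sum to 1."""
    return sum(w * relfreq(c, role, ctx)
               for w, (c, ctx) in zip(weights, estimates))

# Invented counts over two feature subsets: (t,) and (pt, t).
counts_t = {("agent", ("make",)): 6, ("theme", ("make",)): 4}
counts_pt_t = {("agent", ("V↑VP↑S↓NP", "make")): 3,
               ("theme", ("V↑VP↓NP", "make")): 4}

p = interpolate("agent",
                [(counts_t, ("make",)),
                 (counts_pt_t, ("V↑VP↑S↓NP", "make"))],
                [0.5, 0.5])
print(round(p, 2))  # 0.8
```

Here the verb-only estimate gives P(agent | make) = 0.6 and the path-plus-verb estimate gives 1.0, so the interpolated value with equal weights is 0.8. The back-off method discussed next decides which subsets enter this combination when counts are sparse.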
By including more features, the subset (pt, position, voice, t), for instance, defines more specific conditions than the subset (t). The back-off method makes it possible to combine the more specific features, which provide more information, when they are available, and to turn to the more general features only if the specific features are not available. The values for the most specific subsets ((pt, position, voice, t), (pt, gov, t), (h, pt, t)) are considered first. If the probability cannot be estimated for any of them, it is replaced by its corresponding less specific subset. For example, (pt, position, voice, t) is replaced by (pt, position, t), (pt, gov, t) by (pt, t), (h, pt, t) by (h, t), and so on.

Different systems apply different machine learning methods to estimate the probabilities. A range of different methods, including those based on maximum entropy, support vector machines, decision tree learning and others, have been applied in more recent systems (Carreras and Màrquez 2005).

The described classification applies to the task of assigning a semantic role to a constituent which is known to bear one. The same methods can be used for the first step in semantic role labelling, that is, identifying the constituents which bear semantic roles. The estimated probability in this case is the probability that a constituent bears any semantic role in given conditions, described by a reduced set of features. Gildea and Jurafsky (2002) use the information on the head word (feature h), the target word t, and the path between them.

Joint and unsupervised learning

There are two kinds of approaches which can be seen as not following the standard pipeline framework. One line of development explores the potential of joint modelling. These statistical models and computational methods exploit the relationship between the syntactic and the predicate-argument structure in a more systematic way. Toutanova et al.
(2005) show that moving the account of the global outline of a sentence from the post-processing phase to the core statistical model improves the classification results. Henderson et al. (2008) propose a model for joint learning of both syntactic and semantic labelling, moving the syntactic information from the pre-processing phase to the core statistical model. The advantage of such approaches compared to the standard approach is that the syntactic structure of a phrase is not definitively assigned before the semantic structure, so that the semantic information can be used for constructing a better syntactic representation, reducing error propagation between the levels of analysis.

Recently, attention has focused on unsupervised learning, where the information about correct semantic role labels (assigned by human annotators) is not available for training. The advantage of unsupervised approaches (Lang and Lapata 2011; Titov and Klementiev 2012; Garg and Henderson 2012) compared to the standard approach is that they do not require manually annotated training data, which are costly and hard to develop (see Section 2.2.2 for more detail). Unsupervised learning exploits the overlap between the syntactic representation and the predicate-argument structure. The models cluster the instances of syntactic constituents described in terms of features (similar to the features used in the standard approaches). The constituents which are similar in terms of their feature representations are grouped together. The models include a hidden layer representing semantic roles which potentially underlie the observed distribution of the constituents.

2.3.3. Automatic verb classification

The task of automatic verb classification addresses not only the predicate-argument structure of verbs, but also the semantic classification of verbs.
An in-depth analysis of the relationship between the lexical semantics of verbs and the distribution of their uses in a corpus is performed by Merlo and Stevenson (2001). The study addresses the fine distinctions between three classes of verbs, all of which include verbs that alternate between transitive and intransitive use. The classes in question are manner of motion verbs (2.48), which alternate only in a limited number of languages, change of state verbs (2.49), which alternate across languages, and performance/creation verbs (2.50).

(2.48) a. The horse raced past the barn.
       b. The jockey raced the horse past the barn.

(2.49) a. The butter melted in the pan.
       b. The cook melted the butter in the pan.

(2.50) a. The boy played.
       b. The boy played soccer.

Although the surface realisations of phrases formed with these verbs are the same (they all appear both in transitive and intransitive uses), the underlying semantic analysis of the predicate-argument structure is different in each class. The subject of the intransitive realisation is agentive (animate, volitional) in (2.48a) and (2.50a), while it is not in (2.49a). On the other hand, the transitive realisation contains one agentive and one non-agentive role in (2.49b) and (2.50b), while the realisation in (2.48b) contains two agentive arguments. Correct classification of verbs into one of the three classes thus defines the correct analysis of the semantic relations that they express.

Merlo and Stevenson (2001) base their approach to automatic classification of verbs on the theoretical notion of linguistic markedness. The main idea of the theory of markedness is that linguistic marking occurs in elements which are unusual or unexpected, while the common, expected elements are unmarked. To use a common simple example, the plural of nouns in English is marked with an ending (‘-s’) because it is less common than the singular, which is unmarked.
Linguistic markedness has direct consequences for the frequency of use: it has been shown that marked units are rarer than unmarked units. Applied to the choice between the intransitive and transitive use of the verbs addressed by Merlo and Stevenson (2001), the theory of linguistic markedness results in certain expectations about the distribution of the uses. It can be expected, for example, that uses such as (2.48a) are unmarked, which means more frequent, while uses such as (2.48b) are marked, which means less frequent. In the case of the verbs represented in (2.50), the expected pattern is reversed: the intransitive use (2.50a) is marked here, while the transitive use (2.50b) is unmarked. For the verbs illustrated in (2.49), neither of the uses is marked, which means that roughly equal numbers of transitive and intransitive realisations are expected.

Features. In the classification task, the uses of verbs are described in terms of features which are based on a combination of the markedness analysis with an analysis of the semantic properties of the arguments of the verbs. Three main features are defined:

• Transitivity — captures the fact that transitive use is not equally common for all the verbs. It is very uncommon for manner of motion verbs (2.48b), much more common for change of state verbs (2.49b), and, finally, very common for performance/creation verbs (2.50b). This means that manner of motion verbs are expected to have a consistently low value for this feature, change of state verbs a middle value, and performance/creation verbs a high value.

• Causativity — represents the fact that, in the causative alternation, the same lexical items can occur both as subjects and as objects of the same verb. This can be expected for arguments such as butter in (2.49) and horse in (2.48). This feature is operationally defined as the rate of overlap between the lexical units found as the head of the subject of the intransitive uses and those found as the head
of the object in the transitive uses of the same verb. The quantity is expected to distinguish between the two classes illustrated in (2.48) and (2.49) on one side and the class illustrated in (2.50) on the other side, because the alternation in the latter class is not causative (the object of the transitive use does not appear as the subject of the intransitive use; it is simply left out).

• Animacy — is used to distinguish between the verbs that tend to have animate subjects (manner of motion verbs (2.48) and performance verbs (2.50)) and those that do not (change of state verbs (2.49)). It is operationally defined as the rate of personal pronouns that appear as the subjects of verbs.

Additional features are also used (information about the morphological form of the verb), but they are not as theoretically prominent as the main three features.

Classification. The experiments in classification are performed on 60 verbs (20 per class) listed as belonging to the relevant verb classes by Levin (1993). Each verb is described as a vector of feature values, where the values are calculated automatically from corpus data, as shown for the verb form opened in (2.51).

(2.51) a. verb    trans  pass  vbn  caus  anim  class-code
          opened  .69    .09   .21  .16   .36   unacc

       b. verb    trans  pass  vbn  caus  anim  class-code
          opened  .69    .09   .21  .16   .36   ?

The co-occurrence of the feature values with a particular class is observed in the training data and registered. The training input is illustrated in (2.51a). The first six positions in the vector represent the values of the defined features extracted from instances of the verb in a corpus. The last position in the vector is the class that should be assigned to the verb. The code unacc refers to the term unaccusative verbs, which is often used to refer to the change-of-state verbs.
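The operational definitions of the features and the subsequent classification can be illustrated with a short sketch. The instance format and the class thresholds below are invented for illustration: the actual study uses richer counts (including the passive and past-participle features shown in (2.51)) and learns the decision tree from training vectors rather than stipulating it.

```python
def verb_features(instances):
    """Compute simplified versions of the three main features for one verb.
    Each instance is a dict with keys 'use' ('trans' or 'intrans'),
    'subj_head', 'obj_head' (None if absent) and 'subj_is_pronoun'."""
    n = len(instances)
    trans = [i for i in instances if i['use'] == 'trans']
    intrans = [i for i in instances if i['use'] == 'intrans']
    transitivity = len(trans) / n                  # rate of transitive use
    # Causativity: overlap between intransitive-subject heads and
    # transitive-object heads (the butter melted / melted the butter).
    subj_heads = {i['subj_head'] for i in intrans}
    obj_heads = {i['obj_head'] for i in trans if i['obj_head']}
    union = subj_heads | obj_heads
    causativity = len(subj_heads & obj_heads) / len(union) if union else 0.0
    # Animacy: rate of pronominal subjects, a proxy for animate subjects.
    animacy = sum(1 for i in instances if i['subj_is_pronoun']) / n
    return {'trans': transitivity, 'caus': causativity, 'anim': animacy}

def classify(v):
    """A hand-built decision tree over the feature vector, mirroring the
    qualitative expectations above; thresholds are illustrative only."""
    if v['trans'] < 0.3:
        return 'manner-of-motion'      # rarely transitive, cf. (2.48)
    if v['caus'] > 0.3:
        return 'change-of-state'       # causative overlap, cf. (2.49)
    return 'performance/creation'      # transitive, non-causative, cf. (2.50)

# Two toy instances of a change-of-state verb such as melt (2.49):
melt = [
    {'use': 'intrans', 'subj_head': 'butter', 'obj_head': None,
     'subj_is_pronoun': False},
    {'use': 'trans', 'subj_head': 'cook', 'obj_head': 'butter',
     'subj_is_pronoun': False},
]
feats = verb_features(melt)   # butter is both subject and object: caus = 1.0
label = classify(feats)
```

The point of the sketch is that all three features reduce to counts over parsed corpus instances, so they can be computed for any verb with no manual annotation beyond the class labels of the training verbs.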
In predicting the class that is assigned to a verb in the test input (illustrated in (2.51b)), the probability of each class being associated with the observed vector of feature values is assessed. The algorithm used for calculating the most probable class is a supervised learning algorithm, the decision tree, which is described in more detail in Chapter 3.

The results of the study show that the classifier performs best if all the features are used. They also show that the discriminative value of the features differs when they are used separately and when they are used together, which means that the information about the use of verbs that they encode is partially overlapping.

Subsequent studies develop in different directions. While Merlo et al. (2002) explore using cross-linguistic information as a kind of additional general supervision in the classification task, most of the remaining work concerns two interrelated lines of research: unsupervised classification and generalisation. Lapata and Brew (2004) propose a statistical model of verb class ambiguity for unsupervised learning of the classification preferences of verbs which can be assigned to multiple classes. The model does not use a predefined set of linguistically motivated features as in the approach of Merlo and Stevenson (2001), but it takes into account the distribution of a wide range of verbs (those listed by Levin (1993)) and their syntactic realisations in combination with the distribution of classes. The resulting preferences are then used to improve verb sense disambiguation. Several studies deal with the required feature set (Stevenson and Joanis 2003; Joanis and Stevenson 2003; Joanis et al. 2008; Schulte im Walde 2003; Schulte im Walde 2006; Li and Brew 2008), especially in the unsupervised and partially supervised settings. This work suggests that the set of features which is useful for verb classification is not specific to this task.
Schulte im Walde (2003) argues that no generally useful features can be identified, but that the usefulness of a feature depends on the idiosyncratic properties of verb classes. Baroni and Lenci (2010) explore further potential generalisations in lexical acquisition from corpora, proposing a framework for constructing a general memory of automatically acquired lexical knowledge about verbs. This knowledge can be used directly for the different classifications required by different applications. Schulte im Walde et al. (2008) and Sun and Korhonen (2009) further explore the effects of incorporating information about the lexical preferences of verbs into verb classification, which had proved less helpful than expected in earlier experiments (Schulte im Walde 2003).

2.4. Summary

We have shown in this chapter how the view of the lexical representation of verbs has evolved in linguistic theory and how it has been taken up in computational linguistics. Three turning points in theoretical approaches to the meaning of verbs can be identified. First, the relational meaning of verbs is separated from the other, idiosyncratic semantic components. The relational meaning, called the predicate-argument structure, is then further analysed. There are two main approaches to the decomposition of the predicate-argument structure: decomposition of the arguments into sets of features and decomposition of the verbal predicates into sets of predicates. In the latter approach, an attempt is made to derive the decomposition from more general semantic templates (such as causal or temporal ones).

The predicate-argument structure has recently been recognised in computational linguistics as a level of linguistic representation that is suitable and useful for automatic analysis. The view of the predicate-argument structure underlying the computational approaches, however, does not follow the developments in linguistic theory.
The overview of the knowledge representation in the resources used for training automatic systems shows that the predicate-argument structure which is annotated and automatically learnt is still based on the atomic view of predicates and arguments, despite the fact that this view has been shown to be theoretically inadequate in the linguistic literature. The feature-based knowledge representation used in the statistical models is also not closely related to the notions discussed in the linguistic literature. However, the work on automatic semantic role labelling and verb classification shows the potential of using information about verb instances in corpora for recognising fine components of verbal meaning.

In this dissertation, we use the methods developed in the approaches to automatic acquisition of the meaning of verbs to learn automatically the components of the lexical representation which are relevant to the current discussion in linguistic theory. The components of verbs' meaning which we identify on the basis of the distributions of their syntactic realisations observed in a corpus are defined in terms of the causal and temporal decomposition of the events described by the verbs.

Studying the uses of verbs in parallel corpora is the main novelty of this work. By taking this approach, we go a step further with respect to both existing computational approaches and the standard linguistic methodology. Previous corpus-based explorations of the meaning of verbs are generally monolingual and do not address patterns in cross-linguistic variation. On the other hand, cross-linguistic data are crucial for the standard methodology of linguistic research. However, the standard approaches usually involve just a few instances of the studied phenomena, which are discussed in depth.
In contrast, the approach which we propose allows studying cross-linguistic variation in a more systematic way, taking into consideration large data sets. The details of our approach based on parallel corpora are discussed in the following chapter.

3. Using parallel corpora for linguistic research — rationale and methodology

A parallel corpus is a collection of translations between two (or more) languages, where each sentence in one language is aligned with the corresponding sentence in the other language. The work on constructing numerous parallel corpora of different languages was primarily motivated by the developments in statistical machine translation in the early nineties. With the emergence of systems able to learn to translate from one language to another by observing a set of examples of translated sentences, resources for training and evaluating such systems started growing rapidly. Current versions of some popular sentence-aligned parallel corpora, such as Europarl (Koehn 2005) or OPUS (Tiedemann 2009), contain tens of languages, with some languages represented by millions of sentences. These resources are still used mostly for experiments in machine translation, but other potential uses are increasingly proposed and explored.

In this dissertation, parallel corpora are used for investigating the lexical representation of verbs. To address theoretical questions concerning the meaning of verbs, we design a novel methodology which combines methods originating in several disciplines. We formulate our research hypotheses on the basis of theoretical discussions and arguments put forward in the linguistic literature. We then collect a large number of empirical observations relevant to the hypotheses from parallel corpora using state-of-the-art natural language processing. We perform different statistical analyses of the collected data.
Some interesting insights are obtained by a simple descriptive analysis, where a summary of a large number of observations reveals significant patterns in the use of verbs. In some cases, we use standard statistical tests to determine whether identified tendencies are statistically and scientifically significant. When a more complex analysis is required, we design statistical models, which are intended to explain the observations with a set of generalisations. To test the predictive performance of the models, we employ standard machine learning methods which are commonly used in natural language processing, but not in theoretical linguistics. We train the models on a large set of examples of verbs' use extracted from parallel corpora using machine learning algorithms. We then test and evaluate the predictions made by the models on an independent set of test examples.

The methods are described in more detail in the remainder of this chapter. The chapter consists of two major parts. In the first part (Section 3.1 and Section 3.2), we discuss several methodological issues related to parallel corpora. We start by presenting our arguments for using parallel corpora for linguistic research, showing that our method based on parallel corpora can be regarded as an extension of the standard theoretical approach to cross-linguistic variation (3.1.1). We then discuss potential methodological problems posed by translation effects, which can influence the representativeness of parallel corpora (3.1.2). In Section 3.2, we first discuss in detail automatic word alignment, which is crucial for automatic extraction of data from parallel corpora (3.2.1). We then give a brief overview of how parallel corpora have been used for research in natural language processing, as an illustration of the potential of parallel corpora as sources of linguistic data (3.2.2).
In the second part (Section 3.3 and Section 3.4), we present the technical details of the methodology which we apply to analyse the data extracted from parallel corpora. Section 3.3 contains an introduction to statistical inference and modelling. In Section 3.4, we lay out machine learning approaches to training statistical models, providing more details about the learning algorithms which are used in the experiments in this dissertation.

3.1. Cross-linguistic variation and parallel corpora

In theoretical linguistics, cross-linguistic variation has always been studied as a means of discovering elements of linguistic structure which are invariably present in all languages, the atoms of language, as metaphorically put by Baker (2001). Linguistic analysis almost inevitably involves parallel sentences such as the Gungbe-English pair in (3.1), taken from Aboh (2009), or the Ewe-English pair in (3.2) by Collins (1997).

(3.1) Àsíbá ɖà lɛ́sì ɖù.
      Asiba cook/prepare/make rice eat
      'Asiba cooked the rice eat (i.e. she ate the rice).'

(3.2) Kofi tso ati- fo Yao (yi).
      Kofi take stick-def hit Yao P
      'Kofi took the stick and hit Yao with it.'

Such parallel sentences are usually constructed on the basis of native-speaker competence to illustrate apparently different realisations of a particular construction in different languages ((3.1) and (3.2) are examples of complex predicates) and to identify the level of representation at which the languages do not differ. For simplicity, we do not show the full analysis of examples (3.1) and (3.2), but they illustrate a situation where the same kind of complex predicate-argument structures are realised with two separate clauses in English, and with a single clause in Gungbe and Ewe. In this case, the predications expressed in the sentences are invariable across languages, while the structural level at which they are realised varies.
Parallel sentences cited and analysed in the linguistic literature usually represent the most typical realisations, abstracting away from potential variation in realisations in both languages. The corpus of analysed cases rarely exceeds several examples for each construction studied.

3.1.1. Instance-level microvariation

In this dissertation, parallel realisations of particular constructions are studied on a much larger scale, taking into consideration the potential variation in realisations. We see parallel corpora as samples of sentences naturally produced in two (or more) languages, from which we extract all instances of the studied constructions, and not just typical uses, relying on statistical methods to address the variation. This approach allows us to observe many different realisations of constructions that actually occur in texts and to address non-canonical as well as canonical realisations. Since we work with actual translations, the cross-linguistically equivalent expressions are directly observed. We do not have to rely on our intuition about which construction in one language corresponds to which construction in the other language. We simply observe the realisations in the aligned sentences and then summarise (or classify) the observations. In this way, we can identify grammatically relevant tendencies which cannot be observed using standard approaches. For example, passive constructions are available both in English and German and can be seen as equivalent forms. However, verbs in one of the two languages may show a tendency to be realised in passive forms in the same contexts where intransitive realisations are preferred in the other language. Such an asymmetry might prove to be grammatically relevant.
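The summarising step just described amounts to cross-tabulating how a verb is realised on the two sides of the corpus at once. A minimal sketch, with invented realisation labels for aligned English-German clause pairs:

```python
from collections import Counter

# Invented realisation labels for instances of one verb in aligned
# English-German clause pairs: (English realisation, German realisation).
pairs = [
    ('passive', 'intransitive'),
    ('passive', 'intransitive'),
    ('passive', 'passive'),
    ('active-transitive', 'active-transitive'),
]

# Cross-tabulate the joint realisations observed in the aligned clauses.
table = Counter(pairs)

# Share of instances showing the English-passive/German-intransitive
# asymmetry discussed in the text.
asymmetry = table[('passive', 'intransitive')] / sum(table.values())
```

A table of this kind makes the hypothesised asymmetry directly measurable for each verb, so that it can then be submitted to a significance test or used as a feature in a model.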
Studying instances of verbs in a parallel corpus makes it possible to control for the pragmatic and contextual factors that may be involved in a particular realisation of a verb, allowing us to isolate the structural factors which underlie the variation in the realisations. Since a translation is supposed to express the same meaning in the same context, we can assume that the same factors that influence a particular realisation of a verb in a clause in one language influence the realisation of its translation in the corresponding clause in another language. Any potential differences in the form of the two parallel clauses should be explained by the lexical properties of the verbs or by structural differences between the languages.

Studying many different instances of verbs in parallel corpora fits well with some recent general trends in theoretical linguistics. In the current methodology of linguistic research, small lexical variation between similar languages has been given an important place. As discussed in several places in a collection of articles devoted to the theoretical aspects of cross-linguistic variation (Biberauer 2008), a distinction is made between macro-parameters and micro-parameters. In making this distinction, the term macro-parameter is used for those parameters of variation which are traditionally studied, mostly in the framework of the theory of principles and parameters (Chomsky 1995). Such a parameter is, for example, the famous pro-drop parameter, which divides languages into two major groups: those where expressing the subject of a sentence is obligatory, such as English and French, and those where the subject can be omitted when it is expressed by a pronoun (hence the term pro-drop), such as Italian. The term macro-parameter does not only refer to the fact that these parameters concern all (or almost all) languages, but also to the fact that they concern large structural chunks.
The presence vs. absence of the subject is the kind of variation that affects the basic layout of sentences, causing substantial differences in the structure of sentences across languages. As opposed to macro-parameters, micro-parameters concern variation which is limited to smaller portions of sentences. They affect the structure of small phrases and, especially, the choice of lexical items. They also apply to a smaller number of languages. Micro-parameters are typically studied when structures are compared between closely related languages, which have the same settings of macro-parameters. An example of a micro-parametric category is the difference between the French quantifier beaucoup and its apparently corresponding English quantifier many. In an influential study, Kayne (2005) shows that the two lexical items have different representations, although they are considered equivalent. Kayne's study is set within the programme of isolating minimal abstract units of language structure by identifying minimal structural divergence between two similar languages or even dialects of the same language.

We see parallel corpora as a well-suited resource for studying micro-variation. Numerous examples of uses of lexical items can be extracted from parallel corpora and studied in a systematic way. Applying automatic methods for extraction allows us to analyse not only many instances of lexical items, but also many items. While theoretical investigations are usually limited to just a few items which are analysed at the type level, our studies include thousands of instances of hundreds of verbs, which provides a strong empirical basis for new theoretical insights. We underline that this advantage applies only to those phenomena which are frequent enough for a sample of instances to be found in corpora.
Lexical items with grammatically relevant properties, like the quantifiers studied by Kayne (2005) and the verbs studied in this dissertation, represent exactly that kind of linguistic phenomenon. Although cross-linguistic variation is one of the crucial issues in linguistic theory, parallel corpora are rarely used in linguistic research outside natural language processing. The importance of parallel corpora for linguistic research has been recognised mostly by researchers in the domain of language typology. A collection of papers edited by Cysouw and Wälchli (2007) presents several case studies demonstrating the kind of language facts that can be extracted from parallel corpora. A broader study is performed by von Waldenfels (2012), who uses a parallel corpus of eleven Slavic languages to study the variation in the use of imperative forms. The patterns found in the corpus data are shown to correspond to the traditional genetic and areal classification of Slavic languages. Linguistic investigations of parallel corpora are not only rare, but also hardly automated. Data collection and, especially, analyses are performed almost entirely manually, which means that the number of observations which can be analysed is rather small compared to the information available in the resources. In contrast, the methodology proposed in this dissertation is entirely automatic, drawing heavily on approaches in natural language processing.

3.1.2. Translators' choice vs. structural variation

One important limitation of using parallel corpora for linguistic research is the fact that, despite controlling for context and discourse factors, translations might still include variation which is not necessarily caused by linguistic divergences. Consider, for example, the English sentence in (3.3a) and its French translation in (3.3b).

(3.3) a.
I hope that the President of the Commission [...] tells us what he intends to do.

      b. J'espère que le président de la Commission [...] nous fera part de ses intentions.
         I hope that the president of the commission us will.make part of his intentions

Even though the English sentence in (3.3a) could have been translated into French keeping the parallel structure, it is not. As a result, the phrases tells us what he intends to do and nous fera part de ses intentions ('will inform us of his intentions') cannot be seen as structural counterparts, although the two languages can express the content in question in a structurally parallel way. There is a verb in French (communiquer) that corresponds to the English verb tell, taking the same types of complements as the English verb. However, at the instance level, these two sentences are not parallel. The factors that influence translations at the instance level are numerous, including discourse factors, the broader cultural context, and translators' attitudes, among others. An interesting question to ask, then, is to what degree existing translations actually reflect the structural divergence between languages. In an experimental study on a sample of 1 000 sentences containing potentially parallel frames in the sense of FrameNet (see Section 2.2.2 in Chapter 2), extracted from the Europarl corpus and manually annotated, Padó (2007) finds that 72% of English frames that could have a parallel frame in German were realised as parallel instances. The ratio is 65% for the pair English-French. However, once the frames are parallel, the parallelism between the roles (frame elements in FrameNet) within the frames is assessed as "almost perfect". We address this limitation by extracting only the most parallel sentences.
We use the information obtained by automatic alignment of words in parallel sentences and automatic linguistic analysis of the sentences on both sides of a parallel corpus (described in more detail in Section 3.2.1) to control the kind of constructions which are extracted. We extract only the realisations which show certain levels of parallelism, minimising the variation which is potentially irrelevant for linguistic studies.

Another potential problem for using parallel corpora for linguistic research is the known set of translation effects. It has been argued that the language of translated texts differs in several respects from the language of texts originally produced in a language. Baroni and Bernardini (2006) have shown, for example, that, given a choice between an expression which is similar to the one in the source language and an expression which is different, translators tend to choose the different expression. The result of this tendency is more divergence in the translations than is imposed by the structural differences between the languages. Also, different translators might have different strategies in choosing expressions.

This limitation is partially addressed by the strategy of maximising parallelism in extracting the instances of verbs. Another strategy that we apply to address this issue is using large-scale data. It can be expected that the variation which represents noise for a linguistic analysis is marginalised in a big sample of instances which includes translations produced by many different translators. Patterns observed in a big sample can then be assigned to linguistic factors. The reasoning behind this expectation is that translators' choice of expression is still limited by linguistic factors: they can only choose between options provided by the structural elements available in a language.
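The parallelism filter mentioned above can be sketched as a simple check on the word alignment of the two verbs: an instance is kept only if the two verb tokens are aligned with each other and with nothing else. The link representation ((target position, source position) pairs) and the positions below are invented for illustration; the extraction criteria actually used in this dissertation are richer than this.

```python
def verbs_one_to_one(links, verb_j, verb_i):
    """Return True if the target verb at position verb_j and the source verb
    at position verb_i are aligned one-to-one: linked to each other and to
    no other word. links is a set of (target_pos, source_pos) pairs."""
    sources_of_j = {i for j, i in links if j == verb_j}
    targets_of_i = {j for j, i in links if i == verb_i}
    return sources_of_j == {verb_i} and targets_of_i == {verb_j}

# Invented alignment links for one sentence pair:
links = {(3, 4), (5, 6), (5, 7)}

keep = verbs_one_to_one(links, 3, 4)   # verbs cleanly aligned: keep
drop = verbs_one_to_one(links, 5, 6)   # target verb also linked to 7: drop
```

Filters of this kind trade recall for precision: they discard free translations such as (3.3), at the cost of also discarding some structurally interesting divergences.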
3.2. Parallel corpora in natural language processing

Collecting large data samples, which is crucial for studying parallel corpora, necessarily involves automatic processing of texts. The information which is crucial for collecting the data on cross-linguistic realisations of verbs is word alignment. If we want to extract verbs that are translations of each other in parallel sentences, the sentences need to be word-aligned, so that, for each word in the sentence in one language, we know its corresponding word in the sentence in the other language. Given that collecting large samples implies extracting verb instances from hundreds of thousands of parallel sentences, the required information can only be obtained automatically. In this section, we discuss methods for automatic alignment of words in parallel corpora which have been developed in the context of statistical machine translation.

3.2.1. Automatic word alignment

Word alignment establishes links between individual words in each sentence and their actual translations in the parallel sentence. Figure 3.1 illustrates such an alignment, where the German pronoun ich is aligned with the English pronoun I, the German verb form möchte with the English forms would like, and so on. As the example in Figure 3.1 shows, correspondences between the words in sentences are often rather complex. The range of existing alignment possibilities can be described with the following taxonomy:

Figure 3.1.: Word alignment in a parallel corpus

• One-to-one alignment is the simplest and prototypical case, where corresponding single words are identified, such as I ↔ ich or lesson ↔ Lehre in Figure 3.1.

• One-to-null alignment describes words which occur in one language but for which no corresponding words can be identified in the other language. In the example in Figure 3.1, such words are English There, is, to and German daß.
• One-to-many alignment holds between a single word in one language and multiple words in the other language. Examples of this relationship in Figure 3.1 are möchte ↔ would like and daraus ↔ from this.

• Many-to-many alignment is necessary when no single word in either of the aligned sentences can be identified as an alignment unit. This is usually the case when aligning idioms. The sentences in Figure 3.1 do not contain such an example. To illustrate this case, we use the example in (3.4), taken from Burchardt et al. (2009). The phrase nehmen in Kauf aligns with English put up with, but they can only be aligned in the many-to-many fashion because no subpart of either expression can be identified as an alignment unit.

(3.4) a. Die Gläubiger nehmen Nachteile in Kauf. (German)
      b. The creditors put up with disadvantages. (English)

Note that identifying alignments between words and phrases is not always straightforward. Although it is clear that units smaller than a sentence can be aligned, it is not always clear what kind of alignment holds and between which words exactly. As an illustration, consider the word to in the English sentence in Figure 3.1. Its alignment is subject to interpretation. It can be seen as not corresponding to any word in the parallel German sentence (one-to-null alignment), which is how it is aligned in our example. However, since to marks the infinitive form in English and the corresponding German verb is in the infinitive form, the one-to-many alignment to learn ↔ ziehen is also correct. The alignment between English learn and German ziehen illustrates an important difference between word alignments and lexical translations. The two verbs are clearly aligned in the example in Figure 3.1, but they are not lexical translations of each other. Outside the given context, German ziehen would translate to English draw or pull, while English learn would translate to German lernen.
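Given a set of alignment links, the taxonomy above can be read off the fan-out of each word, that is, the number of words on the other side it is linked to. A small sketch, assuming links are represented as (target position, source position) pairs with 1-based positions taken from the running example:

```python
from collections import Counter

def source_fanout(links):
    """For each source position, count how many target words it is linked
    to: 0 means unaligned (one-to-null), 1 is the prototypical one-to-one
    case (provided the target word has no other link), and more than 1 is
    one-to-many. links is a set of (target_pos, source_pos) pairs."""
    return Counter(i for _, i in links)

# Links following the running example: ich <-> I at (6, 1),
# moechte <-> would like at (7, 2) and (8, 2).
links = {(6, 1), (7, 2), (8, 2)}
fanout = source_fanout(links)
```

The same computation run in the other direction (counting source links per target word) distinguishes the remaining cases; a word with fan-out above 1 in both directions would signal a many-to-many configuration such as (3.4).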
For the purpose of automatic extraction from parallel corpora, word alignment is usually represented as a set of ordered pairs, which is a subset of the Cartesian product of the set of words of the sentence in one language and the set of words of the aligned sentence in the other language (Brown et al. 1993). Technically, one language is considered the source and the other the target language, although this distinction does not depend on the true direction of translation in parallel corpora. With words represented by their positions in the sentence, the first member of each ordered pair is the position of the target word (j in (3.5)) and the second member is the position of the source word that the target word is aligned with (i in (3.5)).

A ⊆ {(j, i) : j = 1, ..., J; i = 1, ..., I}    (3.5)

The set A is generated by a single-valued function which maps each word in the target sentence to exactly one word in the source sentence. For example, taking English as the target and German as the source language in Figure 3.1, the alignment between I and ich can be represented as the ordered pair (6, 1). The alignment of would like with möchte is represented with two ordered pairs, (7, 2) and (8, 2). To account for the fact that some target-language words cannot be aligned with any source-language word, a special empty word ("NULL") is introduced into the source sentence. In this way, all the words that have no translation (such as English There, is, to in Figure 3.1) can be aligned with this word, satisfying the general condition that each target word be aligned with exactly one word in the source sentence.

Note that the given formal definition only approximates the intuitive notion of word alignment described above. The definition simplifies the listed alignment relations in two ways.
First, one-to-many alignments are possible only in one direction; one source word can be aligned with multiple target words, but not the other way around. As a consequence, switching the target-source assignment of a pair of sentences changes the alignment. Second, the single-valued function definition excludes many-to-many relations entirely. Despite these limitations, the described formalisation is widely used because it expresses the main properties of word alignment in a way that is suitable for implementing algorithms for its automatic extraction from parallel corpora.

Word alignment is usually computed from sentence alignment by means of the expectation-maximisation algorithm. The algorithm considers all possible alignments of the words in a pair of sentences (there are (length(source) + 1)^length(target) of them, since each target word can be aligned with any source word or with the empty word) and outputs the one which is most probable. The probability of alignments is assessed at the level of a sentence. Individual words are aligned so that the alignment score of the whole sentence is maximised. The algorithm starts by assigning a certain initial probability to all possible alignments. The probabilities are then iteratively updated on the basis of observations in a parallel corpus. If a pair of words is observed together in other pairs of sentences, the probability of aligning the two words increases. The algorithm is described in more detail in Section 3.4.2.

A commonly used program that provides automatic word alignment of parallel corpora, GIZA++ (Och and Ney 2003), which is also used in our experiments, assumes the alignment definition described above. In addition to the described basic elements (individual word alignment and global sentence alignment), the system implements some refinements, which improve its actual performance. We do not discuss these refinements since they do not introduce major conceptual changes.
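The iterative re-estimation just described can be illustrated with a minimal version of the lexical translation model at the core of such aligners (IBM Model 1), here without the empty word and without the refinements mentioned above; the sentence pairs are toy data:

```python
from collections import defaultdict
from itertools import product

def ibm1_em(sentence_pairs, iterations=10):
    """Estimate lexical translation probabilities t(target_word | source_word)
    by expectation-maximisation over sentence-aligned data: a minimal sketch
    of IBM Model 1 (no NULL word, no distortion or fertility)."""
    src_vocab = {w for src, _ in sentence_pairs for w in src}
    tgt_vocab = {w for _, tgt in sentence_pairs for w in tgt}
    # Start from a uniform initial estimate.
    t = {(e, f): 1.0 / len(tgt_vocab) for e, f in product(tgt_vocab, src_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)
        for src, tgt in sentence_pairs:          # E-step
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        for e, f in t:                           # M-step
            t[(e, f)] = count[(e, f)] / total[f] if total[f] else 0.0
    return t

# Toy corpus: (German source, English target) sentence pairs.
corpus = [(['das', 'haus'], ['the', 'house']),
          (['das', 'buch'], ['the', 'book'])]
t = ibm1_em(corpus)
```

Because das co-occurs with the in both sentence pairs while haus and buch each co-occur with it only once, the expected counts increasingly credit the das-the link, which is exactly the intuition stated above: pairs of words observed together across many sentences become more probable alignments.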
Using parallel corpora for linguistic research — rationale and methodology

The experiments performed to evaluate this alignment method (Och and Ney 2003) showed that, apart from setting the required parameters, the quality of alignment depends on the language pair, as well as on the direction of alignment (e.g. the performance is better for the direction English → German than the other way around). They also showed that combining the alignments made in both directions has a very good effect on the overall success rate.

3.2.2. Using automatic word alignment in natural language processing

Since parallel corpora became available to the research community, they have inspired research in natural language processing beyond machine translation. A number of proposals have been put forward to exploit translations of words automatically extracted from parallel corpora for improving performance on different natural language processing tasks. The work on part-of-speech tagging (Snyder et al. 2008) shows that data from another language can help in disambiguating word categories. For example, the English word can is ambiguous between three readings: it can be a modal verb, a noun, or a lexical verb. Each of the three categories is translated with a different word in Serbian: the corresponding modal is moći, the noun is konzerva, and the lexical verb is konzervirati. Knowing the Serbian translation of the English word in a given sentence can thus help decide which category to assign to it. The work of van der Plas and Tiedemann (2006) shows that data from parallel corpora can improve automatic detection of synonyms. The main difficulty for monolingual approaches is distinguishing synonyms from other lexical relations such as antonyms, hyponyms, and hypernyms, which all occur in similar contexts. For example, a monolingual system would propose as synonyms the words apple, fruit, and pear because they all occur in similar contexts.
However, the fact that the three words are consistently translated with different words into another language indicates that they are not synonyms.

The potential of data from parallel corpora for reducing ambiguity at different levels of natural language representation has been used to improve syntactic analysis (Kuhn 2004; Snyder et al. 2009; Zarrieß et al. 2010), the analysis of the predicate-argument structure (Fung et al. 2007; Wu and Palmer 2011), as well as machine translation (Collins et al. 2005; Cohn and Lapata 2007). An interesting application of parallel corpora is transferring structural annotation (morphological, syntactic, semantic) from one language to another. Developing resources such as FrameNet or PropBank (see Chapter 2, Section 2.2.2), which have enabled progress in automatic predicate-argument analysis, requires substantial investment involving linguistic expertise, financial support, and technical infrastructure. This is why such resources are only available for a small number of languages. Parallel corpora have been seen as a means of automatically developing such resources for multiple languages. The assumption behind the work on transferring annotation is that languages share abstract structural representations and that whatever analysis applies to a sentence in one language should apply to its translation in another language. This assumption is generally shared by theoretical linguists, as discussed in more detail in Section 3.1. However, when tested on large corpora, the portability of structural annotation proves not to be straightforward (Yarowsky et al. 2001; Hwa et al. 2002; Padó 2007; Burchardt et al. 2009; van der Plas et al. 2011).
The work on automatic annotation transfer, although primarily motivated by more practical goals, has provided some general insights concerning the difference between the elements of the structure which are universal and those which are language-specific. The issue of parallelism vs. variation in the predicate-argument structure between English and Chinese is addressed by Fung et al. (2007), who study a sample of the Parallel English-Chinese PropBank corpus containing over 1 000 manually annotated and manually aligned semantic arguments of verbs (Palmer et al. 2005b). They find that the roles do not match in 17.24% of cases. The English arg0 role (see Section 2.2.2 in Chapter 2 for more details), for instance, is mapped to the Chinese arg1 77 times. Although the sources of the mismatches are not discussed, the findings are interpreted as evidence against the assumption that this level of linguistic representation is shared in the case of English and Chinese.

The plausibility of a strong version of the assumption of structural parallelism is explored by Hwa et al. (2002). It is formulated as the Direct Correspondence Assumption:

Given a pair of sentences E and F that are (literal) translations of each other with syntactic structures TreeE and TreeF, if nodes xE and yE of TreeE are aligned with nodes xF and yF of TreeF, respectively, and if syntactic relationship R(xE, yE) holds in TreeE, then R(xF, yF) holds in TreeF.

The evaluation of the annotation transferred from English to Chinese against a manually annotated Chinese gold standard shows that syntactic relations are not directly transferable in many cases. However, a limited set of regular transformations can be applied to the result of direct projection to improve the overall results significantly.
For example, while English verb tense forms simultaneously express verbal aspect (whether the activity denoted by the verb is completed or not), the corresponding Chinese forms are composed of two words, one expressing the tense and the other the aspect. When projecting the annotation from English, the relation between the aspect marker and the verb in Chinese cannot be determined, since the aspect marker is either aligned with the same word as the verb (the English verb form) or not aligned at all. In this case, a rule can be stated that adds the relation between the aspect marker and the verb to the Chinese annotation in a regular way.

The work reviewed in this section illustrates the variety of cross-linguistic issues which can be addressed on the basis of data automatically extracted from parallel corpora. Despite the limitations discussed in Section 3.1.2, parallel corpora, in combination with automatic word alignment, provide a rich new resource for studying various aspects of cross-linguistic variation. Note that, in addition to word alignment, which is common to all studies, extracting linguistic information from parallel corpora requires other kinds of automatic processing. If we want to extract all the instances of a certain verb in a corpus, we need to make sure that, when we look for the verb go, for example, we obtain the instances of goes, went, gone, and going as well. This means that the corpus needs to be lemmatised. The corpus also needs to be morphologically tagged, so that we know that our extracted instances are all verbs, and not some other categories. For example, we need to make sure that the extracted instances do not include cases such as have a go, where go is a noun. If we want to count how many times a verb is used as transitive and how many times as intransitive, the corpus needs to be syntactically parsed. The details of the linguistic
processing used in our experiments are explained in the methodological sections of each case study separately, because the approaches to the studied constructions required different linguistic analyses.

3.3. Statistical analysis

Once the aligned instances of the verbs that interest us are extracted from parallel corpora, we analyse them using various statistical methods. Statistical analysis allows us to identify tendencies in the use of verbs which are relevant for studying their lexical representation. In this section, we lay out the methods used in our studies, together with the technical background necessary for following the discussion in the dissertation. The survey of the notions in the technical background relies mostly on two sources, Baayen (2008) and Upton and Cook (1996).

3.3.1. Summary tables

In all three case studies in this dissertation, observations are stored as two kinds of variables. We distinguish between instance observations, which refer to the characteristics of the use of verbs at the token level, and type observations, which refer to the properties of verbs as separate entries in the lexicon. As an illustration of the two kinds of data extracted from corpora, simple artificial examples are shown in Tables 3.1 and 3.2. Instance variables contain information about each occurrence of a verb in the corpus. Table 3.1, for example, contains two variables: the morphological form of the verb in the given instance and its syntactic realisation (whether it is used as transitive or not). Type variables contain information that is relevant for lexical items at the type level. Frequency in the corpus, shown in Table 3.2, is typically the kind of information that applies to types. Simple tables that list the values of the variables usually do not help much in spotting interesting patterns; individual observations are of little interest for a statistical analysis.
What is more interesting is the relationship between the values in two or more variables.

    Instance ID   Morph     Transitive
    1             past      no
    2             present   yes
    3             present   yes
    4             past      no

Table 3.1.: Examples of instance variables

    Verb    Frequency
    stop    236
    drive   75
    hide    13
    sleep   9

Table 3.2.: Examples of type variables

For instance, a question that immediately comes to mind looking at Table 3.1 is whether the verb tense somehow influences the transitivity of a verb use or the other way around. The observations listed in the table suggest that there is a pattern: the instances that are in the present tense are also transitive, and those that are in the past simple tense are intransitive. A simple way to look up the relations between the values of two or more variables is to construct a contingency table, which shows the number of joint occurrences of all the pairs of values. Table 3.3 is a contingency table which summarises the observations listed in Table 3.1. The benefits of contingency tables might not look that obvious on such a small data set, but as soon as the number of observations becomes greater than ten, such summaries are necessary. The more variables and possible values there are, the harder it is to see the relationships in simple tables.

                  transitive   not transitive
    simple past   0            2
    present       2            0

Table 3.3.: A simple contingency table summarising the instance variables

Of course, the pattern that seems to be present in Table 3.1 might be due to chance and not to a true relationship between the two variables. This is a possibility that can never be completely excluded. Assessing the probability that patterns in observations are due to chance is thus one of the core issues in statistics. If the probability is very low (usually the threshold is set to p < 0.05), the pattern is considered significant. What makes assessing this probability easier, in a general sense, is a greater number of observations.
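Constructing a contingency table such as Table 3.3 amounts to counting joint occurrences of value pairs. As a minimal sketch in Python, using the observations of Table 3.1:

```python
from collections import Counter

# The four instance observations of Table 3.1 as (morph, transitive) pairs.
observations = [
    ("past", "no"),
    ("present", "yes"),
    ("present", "yes"),
    ("past", "no"),
]

# A contingency table is simply a count of joint occurrences; Counter
# returns 0 for value combinations that never occur.
contingency = Counter(observations)

assert contingency[("present", "yes")] == 2   # cell: present / transitive
assert contingency[("past", "no")] == 2       # cell: past / not transitive
assert contingency[("past", "yes")] == 0      # empty cell of Table 3.3
```

The same one-liner scales to the corpus-sized samples used in the case studies, where such summaries become indispensable.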
Misleading patterns occur much more easily in small samples than in big ones. On the other hand, true relationships can also go unnoticed in small samples. This is why we insist on collecting and analysing large data sets. Patterns that are obvious in large samples are very likely to be statistically significant. But one should bear in mind that, no matter how large our collections of observations are, they still represent just small samples of the phenomena that are generally possible in language. Their analysis makes sense only in the context of statistical inference.

3.3.2. Statistical inference and modelling

The main purpose of a statistical analysis, as underlined by Upton and Cook (1996), is not describing observed phenomena, but making predictions about unobserved phenomena on the basis of a set of observations. The pattern that we observe in our toy example in Table 3.1 is not very interesting by itself. It would become much more interesting if we could conclude on its basis what the morphological form and the syntactic realisation of every new instance of the verb will be. Good predictions rely on a good understanding of the relationships between the values of variables. If the relationships are understood well enough, we can identify a general rule that generates and at the same time explains the observations in the sample. As an illustration of these notions, we adapt a simple example composed by Abney (2011). Consider the variables recorded in (3.6).

    t     d
    1     0.5
    1     1
    2     2
    3     ?
    4     7                                                    (3.6)

The column t specifies the time at which an observation is made. The column d specifies the values recorded: the distance travelled by a ball rolling down an inclined plane. There are two measurements for the time t = 1 (0.5 and 1). There is no observation at t = 3. In this case, the rules that generate the observed sequences can be stated as formulas.

    d = 2^(t−1)        d = (1/2)t²                             (3.7)
Two possible generalisations are given in (3.7). They both capture the sequence of observations only partially. Even if we were allowed to choose the values which are easier to explain (which we are not), and to ignore the value 0.5 at t = 1, the formula on the left-hand side does not predict the value 7 at t = 4, but 8. The value 0.5 at t = 1 would suit the formula on the right-hand side better, but this formula does not explain the value at t = 4 either. If we knew all the distances at all time points with certainty, and if these values followed a perfectly regular pattern, this pattern could be described in terms of a single generalisation which would have no exceptions and on the basis of which any distance at any time could be predicted, including the missing value at t = 3. Such reasoning is common to all inductive scientific methods. Certain facts are, however, rare in science, and observations are hardly ever explainable with a single powerful generalisation. The situation usually resembles the example in (3.6) much more closely: we do not know the facts for sure and we cannot explain them entirely. This applies perhaps especially to linguistic phenomena, which are essentially subject to interpretation. Statistical inference is a way to make predictions while taking into consideration this uncertainty and the limits of explanation. Statistical predictions are formulated as the probability that a certain variable will take a certain value (or that it will be situated within a certain range of values) under certain conditions. The probability is usually assessed as the relative frequency of the variable values in a sample of the studied phenomena. For example, the sample of observations in Table 3.1 contains four observations for the morphological form variable and four for the syntactic realisation variable. Out of the four morphological forms, two are the simple present tense and two are the past tense.
The probability that the next verb is in the simple present tense is thus equal to the probability that it is in the past tense, p = 2/4 = 0.5. The same calculation can be done for the other variable, resulting in the same probabilities. Assigning a probability to all possible values of a variable results in a probability distribution, which can be graphically represented with a histogram. Histograms representing the probability distributions of the variables in Table 3.1 are given in Figure 3.2.

Figure 3.2.: Probability distributions of the morphological forms and syntactic realisations of the example instances.

Figure 3.3 shows the probability distribution of the data in Table 3.2 in two cases. The histogram on the left-hand side shows the probability distribution over verbs (how likely an occurrence of each verb is), and the one on the right-hand side shows the probability distribution over the frequency values (how likely each frequency value is). For simplicity, we assume in both cases that the lexical inventory consists of only these four verbs.

Figure 3.3.: Probability distributions of the example verbs and their frequency.

As we can see in Figure 3.3, the shapes of the distributions can be very different. The notion of the shape of a distribution does not concern only the visual representation of data; it is very important for inference. The patterns which are observed in the sample can be generalised to a bigger population only if we can assume that the shape of the probability distribution over the unobserved values is the same as over the values observed in the sample. Moreover, generalisations are often only possible if we can assume a specific shape of the probability distribution. The shape of a distribution is determined by the values of a certain number of parameters.
The most typical examples of such parameters are the mean value and the standard deviation (showing how much the values deviate from the mean). There can be other parameters, depending on what kind of variation in the values of variables needs to be captured. The normal distribution, illustrated in Figure 3.4, is frequently referred to in science, as many statistical tests require this particular distribution. It is characterised as symmetric because the values around the mean are at the same time the most probable values. Values which are lower and higher than the mean are equally probable, with the probability decreasing the further away from the mean they are. Many quantitative variables follow this pattern. A typical example is people’s height: most people are of medium height, while extremely tall and extremely short people are very rare.

Figure 3.4.: A general graphical representation of the normal distribution.

The frequency of words in texts, however, does not follow this pattern. There are usually only a few words that are extremely frequent, but there are many words with extremely low frequency, many more than those with medium frequency. (Our artificial example in Table 3.2 and on the left-hand side of Figure 3.3 illustrates this tendency as much as is possible with only four examples.) Since standard formulas for statistical tests usually assume the normal distribution, one has to be careful when applying them to linguistic data. An example of a standard test which is very frequently used and which requires that the probability distribution over values be normal is the t-test. This test is a formula that uses the parameters of the probability distributions in two samples to calculate the probability that the two samples belong to the same larger population.
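As a rough sketch of what such a test computes, the two-sample t statistic (here in Welch's variant, which does not assume equal variances) can be obtained directly from the sample means and variances. Turning the statistic into the probability itself additionally requires the cumulative t distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic (Welch's variant, unequal variances).

    A sketch of the core computation only; a full t-test would convert
    this statistic into a p-value via the t distribution.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) +
                                      variance(b) / len(b))

# Identical samples give t = 0: no evidence that the samples come from
# different populations.
assert welch_t([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
```

The larger the absolute value of the statistic, the lower the probability that the two samples belong to the same larger population.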
It is very frequently used because it is often important to show that two samples do not belong to the same larger population, that is, that they are significantly different. In one of our case studies, the t-test is used to show that two samples belong, in fact, to the same population.

As already mentioned above, real predictions based on statistical inference rarely concern one single variable. What is usually studied in statistical approaches are the relationships between the values of two or more variables. By observing the values in the sample, we try to determine whether the values of one variable (called the dependent variable) depend on other, independent, variables. If the values of the dependent variable systematically increase as the values of an independent variable increase, we say that there is a positive correlation between the variables. If the changes in the values of the dependent and independent variables are consistent but in opposite directions (increasing in one and decreasing in the other), we say that there is a negative correlation. For example, people’s height and weight are positively correlated: taller people generally weigh more, although this is not always the case. There are a number of statistical tests which measure the strength and the significance of the correlation between two variables. The notion of correlation is fundamental to constructing statistical models. If there is a correlation between an independent and a dependent variable, and if the values of both variables are normally distributed, then the values of the dependent variable can be predicted from the values of the independent variable. In this case, we say that the variation in the dependent variable is explained by the variation in the independent variable.
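The strength of a linear correlation is most commonly measured by Pearson's coefficient, which ranges from −1 (perfect negative correlation) through 0 (no linear relationship) to +1 (perfect positive correlation). A small sketch with invented height/weight figures illustrates the height-weight example:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented height (cm) / weight (kg) pairs: taller people generally
# weigh more, but not without exception, so r is high but below 1.
heights = [160, 165, 170, 175, 180, 185]
weights = [55, 62, 60, 74, 78, 85]
assert 0.9 < pearson_r(heights, weights) < 1.0
```

A separate significance test is still needed to decide whether such a coefficient could have arisen by chance in a sample of the given size.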
The purpose of statistical models is to predict the values of one variable on the basis of information contained in other variables. They model a piece of reality in terms of a set of independent variables (potential predictors), one dependent variable, and precisely described relationships between them. The prediction is usually based on a regression analysis, which shows to what degree the variation in the dependent variable is explained by each factor represented by an independent variable.

3.3.3. Bayesian modelling

An alternative approach to predicting values of one variable on the basis of values of other variables is Bayesian modelling. In this framework, the probability of some variable taking a certain value is assessed in terms of a prior and a posterior probability. The prior probability represents our general knowledge about some domain before learning a new piece of information about it. The posterior probability is the result of combining the prior probability with some newly acquired knowledge. Probability updating is formulated as conditional probability, which can be calculated from joint probability (the probability that the variable A takes the value a and that the variable B at the same time takes the value b) using the general conditional probability rule given in (3.8).

    P(A|B) = P(A, B) / P(B)                                    (3.8)

Bayesian modelling is based on the assumption that our knowledge about the world is formed in a sequence of updating steps and that it can be expressed in terms of conditional probabilities, as illustrated in Table 3.4.

    Variable                                  Value       Notation   Probability
    a burglary in the given neighbourhood     happens     p(b)       0.014
    the alarm if there is a burglary          activated   p(a|b)     0.75
    the alarm if there is no burglary         activated   p(a|¬b)    0.1
    a burglary in the neighbourhood
      if the alarm is activated               happens     p(b|a)     ?

Table 3.4.: An example of a data summary in Bayesian modelling
The example, based on Silver (2012), concerns assessing the probability that a burglary actually took place if an alarm is activated. In assessing this probability, we rely on several facts (listed in Table 3.4). From previous experience, we know that the probability of a burglary in the given neighbourhood is 0.014. This is the prior probability of a burglary in our example. We also have an assessment of how reliably the alarm detects a burglary: it gives a positive signal in 75% of the cases of an actual burglary, and in 10% of the cases where there is no burglary. We combine this knowledge by applying the equation in (3.9), known as Bayes’ rule, which is derived from the conditional probability rule (3.8) by applying the commutative law.

    P(A|B) = P(B|A) · P(A) / P(B)                              (3.9)

When we replace the general symbols in (3.9) with the notation from our data summary, we obtain the equation in (3.10). Replacing the terms with the actual probabilities given in Table 3.4, as in (3.11), we obtain the answer to the initial question: the probability that a burglary took place when the alarm is activated is around 0.1, which is still low considering that the signal from the alarm is positive.

    p(b|a) = p(a|b) · p(b) / p(a)                              (3.10)

    p(b|a) = (0.75 · 0.014) / 0.1091 = 0.096                   (3.11)

Note that the term p(a) was not listed in the table. It is calculated from the conditional probabilities which are available. As shown in (3.12), the probability that the alarm is activated is first expressed as the sum of two joint probabilities: the probability that the alarm is activated and there is a burglary, and the probability that the alarm is activated and there is no burglary (the probability of the complementary set of values). Since the two joint probabilities are not listed in our data, we calculate them from the conditional probabilities which are known, applying the rule in (3.8).
The term p(¬b), which is required for this calculation, follows from p(b): since the two cases are complementary, their probabilities sum to 1, which yields p(¬b) = 1 − p(b) = 0.986.

    p(a) = p(a, b) + p(a, ¬b)
         = p(a|b) · p(b) + p(a|¬b) · p(¬b)
         = 0.75 · 0.014 + 0.1 · 0.986
         = 0.0105 + 0.0986 = 0.1091                            (3.12)

These relatively simple calculations provide a formal framework for updating the prior probability after having encountered new evidence related to the question that is investigated. In our example, the prior probability of a burglary in the given neighbourhood is updated having learnt that the alarm had been activated. This updating is performed taking into consideration the uncertainty that is inherent to our knowledge about the phenomenon at each step.

An advantage of Bayesian modelling compared to the “standard” statistical inference laid out in Section 3.3.2 is that it offers a more straightforward mechanism for combining evidence. In the standard approach, the influence of all potential predictors on the predicted variable is assessed directly. The explanations from predictors can be combined in a linear or weighted fashion, but not hierarchically. Contrary to this, Bayesian calculations can be applied recursively: once a posterior probability is calculated, it can be used as a prior for some other posterior probability. For example, the prior probability of a burglary in a particular neighbourhood, which is used in the calculations above, could itself have been calculated as a posterior probability relating the chance of a burglary in general to the relationship between some characteristics of a particular neighbourhood and its proneness to burglaries. Another advantage of Bayesian modelling is that it assumes no particular parameters of the probability distributions over the values of variables. The accent in Bayesian modelling is on combining the probabilities, while their origin is less important.
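The burglary calculation in (3.10)-(3.12) can be reproduced in a few lines of Python; the probabilities are those of Table 3.4:

```python
# Known quantities from Table 3.4.
p_b = 0.014            # prior probability of a burglary, p(b)
p_a_given_b = 0.75     # alarm activated given a burglary, p(a|b)
p_a_given_not_b = 0.1  # alarm activated given no burglary, p(a|not b)

# (3.12): marginal probability that the alarm is activated,
# summed over the two complementary cases.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# (3.10)-(3.11): Bayes' rule gives the posterior.
p_b_given_a = p_a_given_b * p_b / p_a

assert round(p_a, 4) == 0.1091
assert abs(p_b_given_a - 0.096) < 0.001
```

The posterior could now serve as the prior for a further updating step, which is exactly the recursive use of Bayes' rule described above.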
The probability assessments can be expressions of intuitive (expert) knowledge, of previous experience, or of relative frequency in a sample. The calculations yielding new assessments apply to any kind of probability distribution over the values, as long as the probabilities of all the values sum to 1 (like the probability that a burglary happens and the probability that it does not happen in our example). Both of the advantages underlined above are especially important in the context of modelling linguistic phenomena. The recursive nature of Bayesian models makes them a well-adapted framework for a statistical approach to linguistic structures, which are, according to the majority of theoretical accounts, recursive. The fact that inference in this approach does not depend on a particular probability distribution (notably, on the normal distribution) is important because linguistic data are often associated with unusual distributions for which it is hard to define a small set of appropriate parameters. These advantages, however, come at a cost. Stepping out of the standard statistical inference framework makes evaluating the predictions of Bayesian modelling harder. Good predictions in traditional statistical modelling are guaranteed by the notion of statistical significance. If a statistically significant effect of a predictor on a predicted variable is identified, the predictions based on this relationship can be expected to be correct in the majority of cases. The notion of statistical significance is not incorporated into the predictions of Bayesian modelling. The quality of the predictions has to be evaluated in another way, usually by measuring the rate of successful predictions. In this dissertation, both approaches are used.
We apply standard tests in situations where we can assume a normal probability distribution over the values of a variable and where the hierarchical relationships between the components of a model are not complex. Otherwise, we formalise our generalisations in terms of Bayesian models and test the predictions by comparing the predicted and the actual values on a sample of test examples. The generalisations which are addressed by the models concern the relationship between the semantic properties of verbs and the observable formal properties of their realisations in texts. We explain the variation in the verb instances by the variation in their semantic properties.

3.4. Machine learning techniques

The statistical models proposed in this dissertation are developed by combining theoretical insight with some standard machine learning techniques. Theoretical analysis results in a small number of variables which define the studied domain. It also provides the hypotheses about the dependency relationships between the variables, but the exact numerical values of the relationships between the values of the variables are acquired automatically from the data set. Automatic acquisition of generalisations from data is studied in the domain of machine learning. In the general terminology of learning, the data which machine learning algorithms take as input are regarded as experience. A computer program is said to learn from experience if its performance at some task improves with the experience, that is, by observing the data. In our experiments, we assume that the machine learning task is defined as classification. The notions used in this section are mostly based on three sources: Mitchell (1997), Russell and Norvig (2010), and Witten and Frank (2005). There are two main approaches that can be taken in inferring the relationships between the values of variables: supervised and unsupervised learning.
In this section, we first illustrate the two approaches by describing standard algorithms which are among the most widely used. We then show how the two approaches are implemented with Bayesian models in this dissertation.

3.4.1. Supervised learning

In the supervised learning setting, the training data include information about the values of the predicted variable, which we call the target variable. To illustrate these notions, we adapt an example constructed by Russell and Norvig (2010) (see the data summary in Table 3.5). Suppose we are at work and we receive a message that our neighbour Mary called some time ago. We want to assess the probability that this call means that there was a burglary at our place. We have an old-fashioned alarm that rings when some shock is detected, but we cannot hear the ringing when we are away from home. So we ask our neighbours Mary and John to call us if they hear the alarm. On the day when we receive the call from Mary, we do not hear from John. We should also bear in mind that Mary could have called for some other reason. Also, the alarm could have been activated by some other shock, not a burglary (an earthquake, for example). The question that we ask is: what is the probability that there was a burglary, given that Mary called, John did not call, and there was no earthquake, p(b = yes|m = yes, j = no, e = no) (the bottom row in Table 3.5)? In assessing this probability, we look up the records of the last ten cases when one of our neighbours called us at work (the other rows in Table 3.5). What we want to find are past situations that were the same, so that we can see whether a burglary actually happened in those cases. We find that only one of the previous situations (row seven) was the same, and that a burglary actually happened then. However, we are still not convinced, because in the majority of our records there was no burglary.
To use all the available data, we look at how the burglary was related to each of the values separately and then recompose the probability for the case in question. To illustrate how this can be done, we describe two algorithms which are usually regarded as rather simple but often perform well.

        Mary calls   John calls   Earthquake   Burglary
  1        yes          yes          no           yes
  2        yes          yes          no           yes
  3        yes          no           yes          no
  4        no           yes          no           no
  5        no           yes          no           no
  6        no           yes          no           no
  7        yes          no           no           yes
  8        no           yes          no           no
  9        no           yes          no           no
 10        yes          yes          yes          no
  Q        yes          no           no           ?

Table 3.5.: An example of a data record suitable for supervised machine learning

The naïve Bayes algorithm decomposes the records assuming that the values of the three predictors are mutually independent. The term naïve in the name of the algorithm refers to the fact that the variables are usually not independent in reality, but the potential dependencies are ignored. With the assumed independence, the probability which we look for can be expressed as the product of individual conditional probabilities, as shown in (3.13). Since our task is classification, we look for the probability of a particular class c_j based on the values of the predictor variables, which are the attributes a_1, ..., a_n of each instance of a class.

P(a_1, a_2, ..., a_n | c_j) ≈ ∏_i P(a_i | c_j)    (3.13)

We calculate the most probable class for a given set of attribute values by applying Bayes' rule in (3.9), repeated here as (3.14), which gives the general classification formula in (3.15), where 1/z is a constant factor, since the values of the attributes are known, as in our example. When we apply the general classification formula to our data, we obtain (3.16).
P(A|B) = P(B|A) · P(A) / P(B)    (3.14)

P(c_j | a_1, ..., a_n) ≈ (1/z) · P(c_j) · ∏_{i=1}^{n} P(a_i | c_j)    (3.15)

p(b=yes | m=yes, j=no, e=no) ≈ (1/z) · p(b=yes) · p(m=yes | b=yes) · p(j=no | b=yes) · p(e=no | b=yes)    (3.16)

With the separated conditional probabilities, we can use more records to estimate each factor of the product. For example, applying the conditional probability rule in (3.8), we can calculate:

p(m=yes | b=yes) = p(m=yes, b=yes) / p(b=yes) = 3/3 = 1

Burglary actually happened in three of our ten records, and Mary called in all three of them. The score would be different for John, since he failed to call in one of the three cases. In deciding whether to classify the current situation as a burglary or not, we calculate the product of all the conditional probabilities for both potential values of the target variable, multiply this product by the probability of the value of the target variable, and then select the higher probability. The constant can be omitted because it is the same for both classes. In this particular example, the calculation gives:

p(b=yes | m=yes, j=no, e=no) ≈ 3/10 · 3/3 · 1/3 · 3/3 = 1/10    (3.17)

p(b=no | m=yes, j=no, e=no) ≈ 7/10 · 2/7 · 1/7 · 5/7 = 1/49    (3.18)

Figure 3.5.: An example of a decision tree. The root asks "Mary calls?" ("no" leads to "no burglary"); if "yes", the next node asks "Earthquake?" ("yes" leads to "no burglary", "no" to "burglary!").

Since 1/10 > 1/49, the final decision should be to classify Mary's call as signalling a burglary.

Applying the decision tree algorithm to the same data set, we proceed by querying each variable in the order of informativeness, as shown in Figure 3.5. We first determine that Mary called. If she had not called, there would have been no reason to worry. But Mary did call in this case, so we then check whether there was an earthquake immediately preceding Mary's call.
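(As a brief aside from the decision-tree walk-through: the naïve Bayes scores in (3.17) and (3.18) are mechanical enough to check with a short script. The sketch below, in Python, hard-codes the ten records of Table 3.5; the function and variable names are our own, not the dissertation's. It reproduces the two unnormalised scores, 1/10 and 1/49, and the resulting decision.)

```python
# The ten records of Table 3.5, hard-coded as (mary, john, earthquake, burglary).
RECORDS = [
    ("y", "y", "n", "y"), ("y", "y", "n", "y"), ("y", "n", "y", "n"),
    ("n", "y", "n", "n"), ("n", "y", "n", "n"), ("n", "y", "n", "n"),
    ("y", "n", "n", "y"), ("n", "y", "n", "n"), ("n", "y", "n", "n"),
    ("y", "y", "y", "n"),
]

def nb_score(m, j, e, b):
    """Unnormalised naive Bayes score p(b) * prod_i p(a_i | b), as in (3.15)."""
    in_class = [r for r in RECORDS if r[3] == b]
    score = len(in_class) / len(RECORDS)              # the class prior p(b)
    for i, v in enumerate((m, j, e)):                 # one factor per attribute
        score *= sum(1 for r in in_class if r[i] == v) / len(in_class)
    return score

# The query of the running example: Mary called, John did not, no earthquake.
scores = {b: nb_score("y", "n", "n", b) for b in ("y", "n")}
best = max(scores, key=scores.get)                    # "y": classify as burglary
```

Since the normalising constant 1/z is the same for both classes, comparing the two unnormalised scores is enough for the classification decision.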
If there was an earthquake, there is no need to worry: it was probably the earthquake that activated the alarm, which made Mary call. But if there was no earthquake, which is the case in the current situation, then we had better hurry home, because there was a burglary. The decision tree which brought us to this conclusion is constructed on the basis of the same records which were used for assessing the probabilities for the naïve Bayes algorithm (Table 3.5). In deciding which variable should be at the root of the tree, we look up the joint distributions of values combining each predictor with the target variable separately. This procedure results in the groupings shown in the upper part of Table 3.6 (Step 1). We compare the resulting divisions to identify the most discriminative variable. The variable which gives the "purest" groups is the most discriminative. In our example, two variables give entirely pure groupings. We can see in Table 3.6 that every time the value of Mary calls is "no", the value of the target variable is also "no". But also, every time the value of Earthquake is "yes", the value of the target variable is "no". Since they give pure classes, these two variables are candidates for the most discriminative variable. Mary calls wins because it gives the bigger pure group and also because the sizes of the two resulting groups, which depend on the distribution of the values in the variable, are more balanced (there are five occurrences of each value). If all the values of the target variable were the same when the value of Mary calls is "yes", we could stop at this point, ignore the other two variables and predict the target variable only from the values of Mary calls. This is, however, not the case, so we need to continue constructing the tree by looking up the combinations of values of Mary calls with the other two variables.
The aim here is to see whether some of these combinations result in pure groupings of the values in the target variable. The resulting groupings of the second step are shown in the bottom part of Table 3.6 (Step 2). We can see that the combination of values of Mary calls and Earthquake divides the set of values of the target variable into entirely pure classes (for both values of Mary calls). Since all the groupings of the values in the target variable are pure at this point, the tree is completed. We can ignore the variable John calls because it provides no information about whether there was a burglary or not.

We have used the notion of the purity of a class in an intuitive way so far. We have considered classes purer if they contain more items of the same kind. For example, the group of values of the target variable associated with the value "yes" of the variable John calls in the upper part of Table 3.6 is "purer" than the group of values associated with the value "no" of the same variable, because the proportion of identical items is bigger in the first group (five out of seven) than in the second (two out of three). The same principle applies when working with large data sets, but the purity of classes then has to be measured at each step, as it is hard to assess larger classes intuitively.

Table 3.6.: Grouping values for training a decision tree. The upper part (Step 1) groups the Burglary values by each predictor separately; for example, Mary calls = "yes" gives {yes, yes, no, yes, no} and Mary calls = "no" gives {no, no, no, no, no}. The lower part (Step 2) groups them by the value of Mary calls combined with each of the two remaining predictors; for Mary calls = "yes", Earthquake = "yes" gives {no, no} and Earthquake = "no" gives {yes, yes, yes}.

The measure that is most commonly used to assess the purity of classes is entropy, which is calculated from the probability distribution of a variable using the formula in (3.19), where S denotes the variable for which we measure the entropy and c denotes the number of possible values of the variable. It shows the degree to which the probabilities of the individual values vary. The more similar the probabilities of the values, the higher the entropy. If one value is much more likely than the other, the entropy is going to be low.

Entropy(S) = ∑_{i=1}^{c} − p_i log2 p_i    (3.19)

As an illustration, we calculate the entropy of the set of Burglary values observed in the training data. There are two possible values: "yes" occurs three times, and "no" occurs seven times.

Entropy(B) = −(3/10) log2 (3/10) − (7/10) log2 (7/10) = −(0.3 · −1.74) − (0.7 · −0.51) = 0.52 + 0.36 = 0.88    (3.20)

To choose the attribute which should be put at the root of the decision tree, we compare the entropy of the starting set of values with the entropy of the subsets of values of the target variable which are associated with each value of each attribute (the columns in the upper part of Table 3.6). The variable considered the most discriminative at each node in constructing a decision tree is the one which most reduces the entropy of the target variable. The measure most commonly used for this comparison is called information gain. It is calculated using the formula in (3.21), where A is the attribute under consideration and S_v is the subset of S for which the value of A is v.
Gain(S, A) = Entropy(S) − ∑_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)    (3.21)

As an illustration, we calculate the information gain of the attribute Mary calls in our example:

Gain(B, M) = Entropy(B) − ∑_{v ∈ Values(M)} (|B_v| / |B|) · Entropy(B_v)
           = Entropy(B) − (|B_{M=yes}| / |B|) · Entropy(B_{M=yes}) − (|B_{M=no}| / |B|) · Entropy(B_{M=no})
           = 0.88 − (5/10) · 0.97 − (5/10) · 0
           = 0.88 − 0.48 − 0 = 0.40    (3.22)

The same calculations are performed for the other two attributes, and the one which provides the highest information gain is taken as the first split attribute. The calculations are performed recursively until the entropy of all resulting subsets is 0. Note that the entropy of B_{M=no} in our example is 0 because all the values in this subset are the same. In this case, the recursive calculations are performed only for the subset B_{M=yes}. In practice, the programs that implement the decision tree algorithm work with some additional constraints, but we do not discuss these issues further because such a discussion would exceed the scope of this survey.

3.4.2. Unsupervised learning

In the unsupervised learning setting, the values of the target variable are not known in the training data. The task of deciding what class to assign to a particular case resembles the data summary in Table 3.7.

        Mary calls   Earthquake   Burglary
  1        yes          no           ?
  2        yes          no           ?
  3        yes          yes          ?
  4        no           no           ?
  5        no           no           ?
  6        no           no           ?
  7        yes          no           ?
  8        no           no           ?
  9        no           no           ?
 10        yes          yes          ?
  Q        yes          no           ?

Table 3.7.: An example of a data record suitable for unsupervised machine learning

The data in Table 3.7 represent essentially the same records as in Table 3.5, with the variable John calls omitted for simplicity. The question marks in the last column represent the fact that the values of the target variable are not recorded. However, we can assume that such a variable exists and that its values can be explained by the values of the other, known variables.
Such variables are called hidden variables. In principle, hidden variables are not necessarily target variables. Any variable in a model can be regarded as hidden. In some models, the values of the target variable itself are known in training, but they are assumed to be influenced by some other variable with an unknown probability distribution. In that case, the learning setting is supervised, but estimating the probability distribution of the hidden variable requires a special approach. In this subsection, we describe the expectation-maximisation algorithm, which is often used to estimate the probability distribution of a hidden variable, regardless of whether it is the target variable. It is a general algorithm which has been applied to many different learning tasks. The algorithm is applied to an independently constructed model and to a set of data in an iterative fashion. As its name suggests, it consists of two main parts. In the expectation part, expected values in the data set are generated based on hypothesised parameters of distributions. In the maximisation part, the hypothesised parameters of distributions are combined with the observations in the data set and updated so that they are more consistent with the observed data. As a result, the parameters of the distributions of both observed and unobserved variables become consistent with the observed data. The algorithm starts with arbitrary hypothesised parameters, which are combined with the observations and updated over a number of iterations. It ends when the parameters reach values which are consistent with the data and are no longer updated. The mathematical background of the algorithm is much more complex than in the case of the two supervised algorithms which we have introduced so far. Its precise general mathematical formulation would exceed the scope of this survey.
We thus limit the discussion in this subsection to the particular application of the algorithm which is used in this dissertation. To illustrate the functioning of the algorithm, we use the same model and the same data set which were used for the naïve Bayes algorithm in Section 3.4.1, modified as shown in Table 3.7. With the variable John calls omitted (for simplicity), the model is formulated as in (3.23).

p(m, e, b) = p(b) · p(m|b) · p(e|b)    (3.23)

The model formulation which we use in this example is more general than in the previous calculations. Instead of specifying concrete values, we refer to any value that a variable can take. Thus the lower-case letter m stands for both Mary calls = "yes" and Mary calls = "no", e stands for both values of Earthquake, and b for both values of Burglary. Note also that the value on the left side of the equation is not a conditional but a joint probability, which is also a more general case (knowing the joint probability of a set of values allows one to calculate several related conditional probabilities).

 m   e   b | Iteration 1: p(b)  p(m|b)  p(e|b)  p(M)    Cc | Iteration 2: p(b)  p(m|b)  p(e|b)  p(M)   Cc
 y   y   y |              0.4    0.4     0.4    0.064  0.8 |              0.4    0.5     0.2    0.04   0.8
 y   y   n |              0.6    0.4     0.4    0.096  1.2 |              0.6    0.5     0.2    0.06   1.2
 y   n   y |              0.4    0.4     0.6    0.096  1.2 |              0.4    0.5     0.8    0.16   1.2
 y   n   n |              0.6    0.4     0.6    0.144  1.8 |              0.6    0.5     0.8    0.24   1.8
 n   y   y |              0.4    0.6     0.4    0.096  0   |              0.4    0.5     0.2    0.04   0
 n   y   n |              0.6    0.6     0.4    0.144  0   |              0.6    0.5     0.2    0.06   0
 n   n   y |              0.4    0.6     0.6    0.144  2   |              0.4    0.5     0.8    0.16   2
 n   n   n |              0.6    0.6     0.6    0.216  3   |              0.6    0.5     0.8    0.24   3

Table 3.8.: An example of probability estimation using the expectation-maximisation algorithm

The more general formulation is needed for this example because the expectation-maximisation algorithm explores all possible combinations of values. In our example, there are three variables, each with two possible values. The number of possible combinations of values is 2³ = 8.
They are all listed in the first three columns of Table 3.8, which shows two iterations of the algorithm, assuming the model in (3.23) and the data set in Table 3.7. The first step of the algorithm is the initialisation of the model. In this step, the probability distribution of every variable is determined in an arbitrary way, regardless of the frequency of particular values in the training sample. For example, we assign the probability 0.4 to all values "yes" of all variables (regardless of whether the probability is conditional or prior) and 0.6 to all values "no" of all variables. This initialisation reflects our general belief about which events are more likely and which ones are less likely. Note, however, that the initialisation step is arbitrary, and relating it to some existing belief does not guarantee a better result. The initial (arbitrary) probability of each factor of the model is shown under Iteration 1 in Table 3.8. The probability of the whole model (column p(M) in the table) is calculated by multiplying the factors, as shown for the first two cases in (3.24) and (3.25) respectively. The probability of the model in each case is then combined with the counts observed in Table 3.7 to distribute the counts over the different cases. The distributed counts are called complete counts (C_counts in the formulas, the Cc columns in Table 3.8), as opposed to the incomplete counts which are available in the data set. For example, what we can see in Table 3.7 is that there were two cases where both Mary called and there was an earthquake (F(y, y, ∗) in the formula, where the asterisk stands for any value of the third variable), but we do not know whether there was a burglary in these cases. We apply the formula shown in (3.24) and (3.25) to assign the count of 0.8 to the first case (where there is a burglary) and the count 1.2 to the second case (where there was no burglary).
Note that the counts are fractional, which would not be possible in reality, but this is acceptable because they are only an intermediate step in calculating the probability of each factor of the model. Applying the formula gives complete counts for all the cases, as shown in the column Cc under Iteration 1 in Table 3.8.

For b=yes, m=yes, e=yes:

p(M) = p(b=y) · p(m=y | b=y) · p(e=y | b=y) = 0.4 · 0.4 · 0.4 = 0.064
C_counts = F(y, y, ∗) · p(M_(y,y,y)) / (p(M_(y,y,y)) + p(M_(y,y,n))) = 2 · 0.064 / (0.064 + 0.096) = 0.128 / 0.16 = 0.8
p(b=y) = F(b=y) / Total = (0.8 + 1.2 + 0 + 2) / 10 = 4/10 = 0.4
p(m=y | b=y) = F(m=y, b=y) / F(b=y) = (0.8 + 1.2) / 4 = 0.5
p(e=y | b=y) = F(e=y, b=y) / F(b=y) = (0.8 + 0) / 4 = 0.2
New p(M) = 0.4 · 0.5 · 0.2 = 0.04    (3.24)

For b=no, m=yes, e=yes:

p(M) = p(b=n) · p(m=y | b=n) · p(e=y | b=n) = 0.6 · 0.4 · 0.4 = 0.096
C_counts = F(y, y, ∗) · p(M_(y,y,n)) / (p(M_(y,y,y)) + p(M_(y,y,n))) = 2 · 0.096 / (0.064 + 0.096) = 0.192 / 0.16 = 1.2
p(b=n) = F(b=n) / Total = (1.2 + 1.8 + 0 + 3) / 10 = 6/10 = 0.6
p(m=y | b=n) = F(m=y, b=n) / F(b=n) = (1.2 + 1.8) / 6 = 0.5
p(e=y | b=n) = F(e=y, b=n) / F(b=n) = (1.2 + 0) / 6 = 0.2
New p(M) = 0.6 · 0.5 · 0.2 = 0.06    (3.25)

To update the probability of each factor of the model, we sum up the counts for each relevant case and calculate the conditional probability by applying the conditional probability rule (3.8), as shown for the first two cases in (3.24) and (3.25). All the counts which are added up can be looked up in the corresponding cells under Iteration 1 in Table 3.8, and all the resulting updated probabilities of each factor of the model are listed under Iteration 2. In the next step, we calculate the probability of the model by multiplying the updated probabilities of the factors. We then calculate new complete counts (the Cc column under Iteration 2 in Table 3.8) using the updated model probability, and then use the new counts to update the probability of the factors of the model again.
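This E-step/M-step cycle can be sketched in a few lines of Python (a minimal sketch: the data structure and all names are our own, and the initialisation mirrors the arbitrary 0.4/0.6 choice above):

```python
# Observed incomplete counts from Table 3.7: (mary, earthquake) -> frequency.
DATA = {("y", "y"): 2, ("y", "n"): 3, ("n", "y"): 0, ("n", "n"): 5}

# Arbitrary initialisation: probability 0.4 for every "yes" value.
params = {"b": {"y": 0.4, "n": 0.6},
          "m|b=y": {"y": 0.4, "n": 0.6}, "m|b=n": {"y": 0.4, "n": 0.6},
          "e|b=y": {"y": 0.4, "n": 0.6}, "e|b=n": {"y": 0.4, "n": 0.6}}

def joint(m, e, b, p):
    """p(m, e, b) = p(b) * p(m|b) * p(e|b), as in (3.23)."""
    return p["b"][b] * p["m|b=" + b][m] * p["e|b=" + b][e]

def em_step(p):
    # E-step: distribute each observed count over the two hidden values of b,
    # proportionally to the current model probability ("complete counts").
    cc = {}
    for (m, e), n in DATA.items():
        z = joint(m, e, "y", p) + joint(m, e, "n", p)
        for b in ("y", "n"):
            cc[(m, e, b)] = n * joint(m, e, b, p) / z
    # M-step: re-estimate every factor of the model from the complete counts.
    fb = {b: sum(cc[(m, e, b)] for (m, e) in DATA) for b in ("y", "n")}
    new = {"b": {b: fb[b] / sum(fb.values()) for b in ("y", "n")}}
    for b in ("y", "n"):
        new["m|b=" + b] = {v: sum(cc[(v, e, b)] for e in ("y", "n")) / fb[b]
                           for v in ("y", "n")}
        new["e|b=" + b] = {v: sum(cc[(m, v, b)] for m in ("y", "n")) / fb[b]
                           for v in ("y", "n")}
    return new

for _ in range(10):   # iterate; on this tiny data set the updates settle quickly
    params = em_step(params)
```

From this initialisation, a single step already yields the values listed under Iteration 2 in Table 3.8 (for example p(m=yes|b) = 0.5 and p(e=yes|b) = 0.2 for both values of b), and further iterations leave them unchanged.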
We then repeat applying and updating the model until the probabilities of the factors of the model converge. Convergence is not guaranteed in all cases, but if the patterns in the data are clear enough, it is very likely. Looking at the values in Table 3.8, we can see that the initial arbitrary probabilities of the models have changed when combined with the information about the incomplete counts. In both cases which are of interest in our example (the cases (y, n, y) and (y, n, n)), the probability of the model has increased. The probability of no burglary is still higher than the probability of burglary, which does not correspond to the results of supervised learning in Section 3.4.1. However, the ranking of the two models would change if there were more instances to learn from.

Unsupervised learning is harder than supervised learning because crucial information about the values of the target variable is not available for training. However, it is increasingly used in natural language processing because linguistic data sets with known target variables, such as the manually annotated corpora presented in Chapter 2, Section 2.2.2, are hard to construct. Another reason why unsupervised learning is seen as an attractive framework for approaching linguistic phenomena is that it allows using corpus data for discovering new structures not pre-defined in linguistic annotation, as in the experiments on grammar induction by Klein (2005).

3.4.3. Learning with Bayesian Networks

Since the main purpose of the models proposed in our case studies is to represent generalisations about the structure of language, the accent is not so much on assessing the probabilities as on the structure of the relationships between the variables. We use rather basic learning methods for training the models on data extracted from corpora, putting more of the complexity into the structure of the models.
To represent the hierarchical relationships between the variables, we formulate our models in terms of Bayesian networks. A Bayesian network is a directed acyclic graph in which the nodes represent the variables of a model and the edges represent the dependency relationships between the variables. A Bayesian network would be very useful, for example, if we wanted to add the variable Alarm to the model discussed in Section 3.4.1. Although we know that Mary's and John's calls depend, to a certain degree, on whether they have heard the alarm, this dependence is only implicitly present in the data set. A graph such as the one in Figure 3.6 can be used to represent the role of the alarm explicitly.

Figure 3.6.: An example of a Bayesian network (edges: Earthquake → Alarm, Burglary → Alarm, Alarm → Mary calls, Alarm → John calls)

The edges in the graph show that the alarm can be caused by an earthquake or by a burglary, and also that it causes John and Mary to call. Each edge is associated with a conditional probability distribution showing how the two variables it connects are related. For example, we can specify the probability of the alarm being activated by an earthquake as p(a|e) = 0.7, which also specifies p(¬a|e) = 0.3. We can specify the probability of a burglary activating the alarm as higher, for example p(a|b) = 0.9 and p(¬a|b) = 0.1. Such conditional probabilities are specified for each node and each edge. They can be estimated on the basis of intuition, or by training on a set of examples using machine learning algorithms. Some of the variables can be regarded as hidden and their distributions estimated using approaches such as the one described in Section 3.4.2.
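To make the inference step in such a network concrete, the sketch below computes the posterior probability of a burglary by summing the joint distribution over the unobserved variables. Only the two values mentioned above (0.7 for the alarm given an earthquake alone, 0.9 given a burglary alone) come from the text; every other number, and all the names, are illustrative assumptions of ours, and the alarm is conditioned jointly on both of its parents.

```python
from itertools import product

# Illustrative CPTs (assumed numbers, except the 0.7 and 0.9 discussed above).
P_B = {1: 0.01, 0: 0.99}                  # prior p(burglary)
P_E = {1: 0.02, 0: 0.98}                  # prior p(earthquake)
P_A = {(1, 1): 0.95, (1, 0): 0.9,         # p(alarm=1 | burglary, earthquake)
       (0, 1): 0.7, (0, 0): 0.01}
P_M = {1: 0.9, 0: 0.05}                   # p(mary_calls=1 | alarm)
P_J = {1: 0.7, 0: 0.02}                   # p(john_calls=1 | alarm)

def joint(e, b, a, m, j):
    """p(e, b, a, m, j) = p(e) p(b) p(a|e,b) p(m|a) p(j|a)."""
    pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    pm = P_M[a] if m else 1.0 - P_M[a]
    pj = P_J[a] if j else 1.0 - P_J[a]
    return P_E[e] * P_B[b] * pa * pm * pj

def posterior_burglary(m, j):
    """p(b=1 | m, j), obtained by summing the joint over e and a."""
    num = sum(joint(e, 1, a, m, j) for e, a in product((0, 1), repeat=2))
    den = sum(joint(e, b, a, m, j) for e, b, a in product((0, 1), repeat=3))
    return num / den
```

With these (assumed) numbers, a call from Mary alone raises the burglary posterior well above the prior, and a call from both neighbours raises it further.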
The probability of the whole model represented in Figure 3.6 is given in (3.26):

p(e, b, a, m, j) = p(e) · p(b) · p(a|e, b) · p(m|a) · p(j|a)    (3.26)

The decomposition of the model into these factors is based on the notion of conditional independence, which allows us to reduce the complexity of the potential dependencies between the variables while avoiding oversimplifications such as the independence assumption used in the naïve Bayes algorithm (see Section 3.4.1). Note, for example, that the variables Mary calls and John calls are not directly connected in the graph. This represents the fact that these two variables are conditionally independent given Alarm. If we know whether the alarm rang or not, then Mary's and John's calls do not depend on each other, but they both depend on the state of the alarm. Also, note that each node in the graph depends only on its parent node (or nodes). This means that, if we know the value of Alarm, then the calls from Mary and John are not relevant for assessing the probability that there was a burglary. The probability of any particular value of any variable in the network can be inferred by applying Bayes' rule (3.9).

3.4.4. Evaluation of predictions

The success of a model in making predictions is evaluated on a test data set which contains new instances. The value of the target variable in each instance is predicted based on the values of the predictor variables (as in the bottom rows of Tables 3.5 and 3.7). The predictions are then compared with the correct answers, usually called the gold standard, and a measure of success is calculated. The predictions of the model are counted as correct if the predicted values are identical to the values in the gold standard.

            Predicted 1   Predicted 0
 True 1         A             B
 True 0         C             D

Table 3.9.: Precision and recall matrix
Since a number of values can be identical to the gold standard by chance, the success of a model is usually defined as an improvement relative to a baseline — the result that would be achieved by chance, or by a very simple technique. The most commonly used measure is the F1 measure. It is the harmonic mean of two measures, precision (p) and recall (r):

F1 = 2 · (p · r) / (p + r)    (3.27)

Precision shows how many of the predictions made are correct (p = A / (A + C) in the matrix in Table 3.9). Recall shows how many of the values that exist in the gold standard are also predicted by the model (r = A / (A + B)). The difference between recall and precision is important for tasks where some instances can be left without a response by the model (these measures are typically used, for example, in information retrieval tasks). Since in our experiments every instance is given a prediction, the appropriate measure is accuracy. It is calculated using the formula in (3.28).

Accuracy = Correct / All    (3.28)

The correct predictions include true positives and true negatives, while the difference between the correct predictions and the total number of predictions includes false positives and false negatives.

3.5. Summary

In this chapter, we have discussed two methodological issues concerning the use of parallel corpora for linguistic research. We have first addressed the question of why to use parallel corpora. We propose this approach as an extension of the standard analysis of cross-linguistic variation in the context of studying microparametric variation. To deal with linguistically irrelevant variation, which is seen as one of the main obstacles to using parallel corpora for linguistic research, we propose collecting large data sets containing maximally parallel verb instances.
In addition to the methodological discussion, we have presented additional arguments in favour of this approach coming from experiments in natural language processing, which demonstrate that automatically word-aligned corpora provide a rich new resource for studying various questions related to cross-linguistic variation. Having argued in favour of using parallel corpora, we have then discussed applying the methodology of natural language processing to address theoretical linguistic issues by processing large data sets. As this methodology has not been commonly used in linguistic research so far, we have provided the technical background necessary for following the presentation of our experiments in the three case studies. The introduction to the notions of statistical inference and modelling in combination with machine learning is carefully adapted specifically for the purpose of this dissertation, providing all the necessary technical details, but in a way suited to an audience with little experience in these disciplines. The general methodology outlined in this chapter is applied in the three case studies presented in the following chapters.

4. Force dynamics schemata and cross-linguistic alignment of light verb constructions

4.1. Introduction

Light verb constructions are special verb phrases which are identified as periphrastic paraphrases of verbs. The English expressions put the blame on, give someone a kick, and take a walk are instances of such paraphrases for the verbs blame, kick, and walk. These constructions are attested in many different languages, representing a widespread linguistic phenomenon that is interesting for both theoretical and computational linguistics. They are characterised by a special relation between the syntax and the semantics of their constituents.
The overall meaning of the phrase matches the meaning of the complement, instead of matching the meaning of the head word (the verb), which is what happens in typical verb phrases. Figure 4.1 illustrates the difference between regular verb phrases and phrases headed by a light verb. Despite their identical syntactic structures, the two phrases are interpreted differently: have a yacht is about having, while have a laugh is about laughing. The special relation between the meaning and the structure makes light verb constructions semantically non-compositional, or opaque to a certain degree. The meaning of the phrase cannot be calculated from the meaning of its constituents using general rules of grammar. Moreover, the use of these phrases is partially conventionalised. They show some properties of idiomatic expressions, but, unlike collocations and idioms, they are formed according to the same "semi-productive" pattern in different languages.

Figure 4.1.: A schematic representation of the structure of a light verb construction (light verb + complement, e.g. have a laugh) compared with a typical verb phrase (verb + complement, e.g. have a yacht). The dashed arrows show the direction of projection of the VP's syntactic and semantic features: from the verb in a regular verb phrase, but from the complement in a light verb construction.

The semi-productive and semi-compositional nature of light verb constructions has important consequences for their cross-linguistic mappings. Consider the following examples of English constructions and their translations into German and Serbian.

(4.1) a. Mary [had a laugh]. (English)
      b. Maria [lachte]. (German)
      c. Marija se [na-smejala]. (Serbian)

(4.2) a. Mary [gave a talk]. (English)
      b. Maria [hielt einen Vortrag]. (German)
      c. Marija [je o-držala predavanje]. (Serbian)

The English expression had a laugh in (4.1a) is translated into German with a single verb (lachte in (4.1b)).
The Serbian counterpart of the English expression (nasmejala in (4.1c)) is also a single verb, but with a prefix attached to it. By contrast, the English expression in (4.2a) is translated with phrases in both German and Serbian, but the heading verbs are not lexical counterparts. Unlike English gave, German hielt means 'held', and Serbian održala means approximately 'held for a moment'. Distinguishing between regular verb phrases and light verb constructions is crucial both for constructing correct representations of sentences and for establishing cross-linguistic mappings (Hwang et al. 2010). Moreover, one needs to distinguish between different kinds of light verb constructions to account for the fact that they are not distributed across languages in the same way. In some cases, cross-linguistically equivalent expressions are both constructions, as in (4.2), while in other cases, cross-linguistic equivalence holds between a construction and an individual lexical item, as in (4.1). However, these distinctions are hard to make because there are no formal indicators which mark the differences either morphologically or syntactically. The issue of distinguishing between different types of light verb constructions has been addressed in both theoretical and computational linguistics. It has been argued that these constructions should be seen as a continuum of verb usages with different degrees of the verb's lightness and different degrees of compositionality of the meaning of the construction. There have been a number of proposals as to how to distinguish between different kinds of constructions. Despite the fact that light verb constructions are headed by several different verbs in all the studied languages (for example, take, make, have, give, pay in English), the proposed accounts do not address the potential influence of the lexical properties of the heading verb on the overall interpretation of the construction.
Regarding light verbs as semantically empty or impoverished, the proposed accounts rely on the characteristics which are common to all of them. Contrary to this, our study addresses potential lexical differences between light verbs. We perform two experiments showing that the cross-linguistic mappings of English light verb constructions depend on the kind of meaning of the heading light verbs. We describe the meaning in terms of force dynamics schemata (see Chapter 2, Section 2.1.4).

The chapter consists of four main parts. In the first part, we present the questions raised by light verb constructions and the proposed accounts which constitute the theoretical background of our study. We start by introducing the problem of semantic role assignment in light verb constructions (4.2.1), which is followed by a discussion of the proposed distinctions between different constructions (4.2.2). In the second part, we present two experiments. In the first experiment (4.3.1), we examine the differences in cross-linguistic alignments between two kinds of light verb constructions in a sample of instances extracted from a parallel corpus on the basis of manual word alignment. In the second experiment (4.3.2), we evaluate automatic word alignment of the same sample of instances which is manually analysed in the first experiment. The aim of this analysis is to determine whether the quality of automatic alignment of light verb constructions depends on the semantic properties of the heading light verbs. In the third part (Section 4.4), we interpret the results of our experiments in light of the theoretical discussion presented in the first part. We compare the findings of our study with the related work in Section 4.5.

4.2. Theoretical background

Theoretical accounts of light verb constructions are mostly concerned with the question of whether light verbs assign semantic roles to some constituents in a sentence or not. While some authors argue that light verbs are functional words with no lexical content and no predicate-argument structure, others argue that some semantic roles are assigned by light verbs. In the following subsection, we discuss the theoretical challenges posed by light verb constructions and the proposed accounts. We then turn to the issue of semi-compositionality and semi-productivity of the constructions.

4.2.1. Light verb constructions as complex predicates

The question of whether light verbs assign semantic roles or not is theoretically interesting because it relates directly to the general theory of the relationship between the lexical properties of verbs and the rules of phrase structure (see Chapter 2, Section 2.1.1). Note that the nouns which head the complements of light verbs, for example look in (4.3a), are derived from verbs. Contrary to other, regular nouns, these nouns retain the relational meaning of the original verb. For example, the noun look in (4.3a) relates the nouns daughter and Mary in a similar way as the verb look in (4.3b). If a light verb which heads a light verb construction (for example, took in (4.3a)) assigns some semantic roles too, then there are more arguments of verbs with semantic roles than there are constituents in the clause that can realise them syntactically.[1] This problem is characteristic of a range of phenomena usually called complex predicates.

(4.3) a. Mary took a look [at her daughter].
      b. Mary looked [at her daughter].

In some languages, such as Urdu (Butt and Geuder 2001), light verbs can take both verbs and deverbal nouns as complements. In others, such as English, they only take deverbal nouns, but these nouns can be more or less similar to the corresponding verbs.
Their form can be identical to the verb form, as is the case with look in (4.3), or it can be derived from a verb with a suffix (e.g. inspectV vs. inspectionN). In some cases, the same semantic arguments of deverbal nouns and their corresponding verbs are realised as the same syntactic complement. For example, the same prepositional phrase at her daughter in (4.3) occurs as a complement of both the noun and the verb look. In other cases, the same semantic argument can be realised differently in syntax (her brother vs. to her brother in (4.4)), or it can be left unspecified (the project site vs. no complement in (4.5)).

(4.4) a. Mary visited [her brother].
      b. Mary paid a visit [to her brother].

(4.5) a. They inspected [the project site] last week.
      b. They made an inspection last week.

The meaning of a deverbal noun can be more or less similar to the meaning of the corresponding verb. Grimshaw (1990) distinguishes between two kinds of deverbal nominal structures, called event nominals and result nominals, arguing that only event nominals actually denote an action and can take arguments. For example, the expression in (4.6a) is grammatical, while the expression in (4.6b) is not. According to this test, the deverbal noun examination refers to an activity, while exam refers to a result of an activity.

[1] Note that auxiliary and modal verbs constitute a single lexical unit with a main verb. The problem of syntactic realisation of verbal arguments does not arise with these items because they are purely functional words with no idiosyncratic lexical content; they do not assign to their arguments any semantic roles that need to be interpreted.
In addition to this test, Grimshaw (1990) proposes several syntactic indicators to distinguish between deverbal nouns which refer to an activity, and which are thus more similar to the corresponding verbs, and nouns which refer to a result state, which is closer to the typical nominal meaning. One of the tests is the indefinite article. As illustrated in (4.7), result nominals can occur in an indefinite context (4.7b), while event nominals cannot ((4.7a) is not acceptable).

(4.6) a. the examination of the papers
      b. * the exam of the papers

(4.7) a. * take an examination
      b. take an exam

According to this analysis, most light verb complements would be classified as result nominals, since the indefinite article seems to be one of the characteristic determiners in light verb constructions (see also the examples below). This characteristic, however, does not necessarily hold in all languages. Based on an analysis of Japanese light verb constructions, Grimshaw and Mester (1988) provide evidence for a distinction between transparent noun phrases, which are complements of the verb suru, and opaque noun phrases, which are complements of the verb soseru. The former are special noun phrases which occur only as complements of light verbs. They are described as transparent because the predicate-argument relations are syntactically marked (by cases). The latter are more typical noun phrases which occur in other contexts as well. They are described as opaque because the semantic relationships in these phrases are interpreted implicitly. According to Grimshaw and Mester (1988), English light verb constructions would all be formed with the opaque nominals. For example, the relationship between the predicate visit and its argument her brother is transparent in (4.4a), where visit is a verb: her brother is the theme, and this relationship is syntactically expressed as the direct object. Contrary to this, the same semantic relationship is not transparent in (4.4b), where visit is a noun.
The attachment of the prepositional phrase to her brother is ambiguous (it can be attached to the light verb paid or to the noun visit), and its semantic role is interpreted implicitly (the preposition to does not encode the role theme).

Wierzbicka (1982), on the other hand, underlines the differences in meaning between the complements of light verbs in English. For example, the meaning of the verb have in (4.8) is contrasted with the one in (4.9-4.11). Nouns like swim in (4.8) are claimed to be verbs "despite the fact that they combine with an indefinite article" and should be distinguished from deverbal nouns. All the derived forms are considered to be nouns, together with some nouns that have the same form as verbs but whose meaning is clearly that of a noun, such as smile in (4.9), cough in (4.10), or quarrel in (4.11). Wierzbicka (1982), however, does not use any observable criterion or test to distinguish between nouns such as swim in (4.8) and nouns such as smile, cough, and quarrel in (4.9-4.11), relying only on individual judgements.

(4.8) He had a swim.

(4.9) She has a nice smile.

(4.10) He has a nasty cough.

(4.11) They had a quarrel.

Kearns (2002) notes that the complements of light verbs in English are not "real nouns" in some constructions, but that they are coined for light verb constructions and do not occur freely in other nominal environments. This characteristic makes some light verb constructions in English similar to the suru-constructions in Japanese. The degree to which the complement of a light verb is similar to its corresponding verb influences the overall representation of the light verb construction. The more verbal the complement, the less straightforward the assignment of semantic roles in the construction. The more typical the noun which heads the complement, the more compositional and regular the construction.
Light verb constructions are thus distributed on a scale ranging from complex predicates to near-regular constructions. The variety of constructions is discussed in the following subsection.

4.2.2. The diversity of light verb constructions

Several degrees of "lightness" of light verbs are illustrated by the expressions in (4.12-4.16), taken from Butt and Geuder (2001). The sequence of expressions shows the gradual extension of the prototypical meaning of give (4.12) to its lightest use (4.16).

(4.12) a. give him the ball
       b. give the dog a bone
       c. give the customer a receipt

(4.13) a. Tom gave the children their inheritance money before he died.
       b. The king gave the settlers land.

(4.14) a. give advice
       b. give someone the right to do something
       c. give someone information

(4.15) a. give someone emotional support
       b. give someone one's regards

(4.16) a. give someone a kiss / a push / a punch / a nudge / a hug
       b. give the car a wash, give the soup a stir

The change in the meaning of give depends on the sort of complement. The most prototypical variant in (4.12) involves a change in possession of the object together with a change of its location. Having a more abstract object, or an object that does not move, excludes the component of moving from give in (4.13). Possession is excluded with objects that are not actually possessed, such as advice or right in (4.14), and replaced with a more abstract component of a result state. The action of "giving" in (4.15) is realised without the "giver's" control over the recipient's state. Finally, the light give in (4.16) does not describe a transfer at all, but just "the exertion of some effect on the recipient". The difference between the two groups of expressions is made by the presence of the component of moving in (4.16a), while in (4.16b), even this is gone.
The presence of an agent (the participant that performs or causes the action described by the verb), the completion of the action, and its "directedness" are the components of meaning present in all the realisations. By comparing the range of uses of give in English and its corresponding verb de in Urdu, Butt and Geuder (2001) argue that the same components of meaning which are shared by all the illustrated uses of English give are also the components that the English give and the Urdu de have in common.

Brugman (2001) takes a more formal approach to identifying the relevant components of meaning on the basis of which light verb constructions can be differentiated. Instead of analysing the properties of the nominal complements, Brugman (2001) turns to the light verbs themselves, focusing on the English verbs give, take, and have. In an analysis that assumes force-dynamic schemata (Talmy 2000) (see Section 2.1.4 in Chapter 2 for more details), Brugman (2001) argues that light verbs retain the pattern of force dynamics (or a part of it) of their prototypical (semantically specified) counterparts. The differences in meaning between light verbs such as take in (4.17) and give in (4.18) are explained in terms of different force-dynamic patterns. The overall flow in the events described by the verbs is differently oriented in the two examples.

(4.17) Take a { sniff / taste } of the sauce, will you?

(4.18) Give the sauce a { sniff / taste }, will you?

In (4.17), it is the opinion of the addressee that is asked for, so the energy is directed towards the agent. This orientation corresponds to the force-dynamic pattern of the verb take, which describes a self-oriented activity. The question in (4.18) is about the sauce: one wants to know whether it has spoiled. This direction corresponds to the pattern of the verb give, which describes a directed activity, oriented outwards with respect to the agent of the event.
The account of Brugman (2001) provides a general framework for discussing the meaning of light verbs. However, it does not relate the identified components of meaning to the discussion concerning the degree of lightness of the verbs and the variety of light verb constructions.

Kearns (2002) proposes a set of formal syntactic tests to distinguish between the constructions with "lighter" verbs and the constructions with "heavier" verbs. The former group is called true light verb constructions and the latter group is called constructions with vague action verbs. True light verb constructions are identified as special syntactic forms, while the constructions with vague action verbs are regarded as regular phrases.

(4.19) a. The inspection was made by the man on the right.
       b. * A groan was given by the man on the right.

(4.20) a. Which inspection did John make?
       b. * Which groan did John give?

(4.21) a. I made an inspection and then Bill made one too.
       b. * I gave the soup a heat and then Bill gave it one too.

The formal distinction between true light verbs and vague action verbs is illustrated in (4.19-4.21), where the expression make an inspection represents constructions with vague action verbs, and the expression give a groan represents true light verb constructions. The examples show that the complement of a true light verb cannot be moved or omitted in regular syntactic transformations. While the passive form of the expression make an inspection (4.19a) is grammatical, the passive form of the expression give a groan (4.19b) is not. The same asymmetry holds for the WH-question transformation in (4.20) and for the coordination transformation in (4.21). Kearns's (2002) analysis points to some observable indicators on the basis of which various light verb constructions can be differentiated and classified.
However, it does not relate the observed behaviour to the meaning of the verbs, regarding true light verbs as semantically empty.

The empirical case study presented in this chapter addresses both issues discussed in the literature: the components of meaning of light verbs discussed by Brugman (2001), and the degree of compositionality of light verb constructions discussed in the other presented accounts. Following Grimshaw and Mester (1988) and Kearns (2002), we distinguish between two kinds of constructions. We use Kearns's (2002) terminology, referring to the more idiomatic constructions as true light verb constructions and to the less idiomatic constructions as constructions with vague action verbs. We follow Brugman (2001) in using force-dynamic schemata for describing the meaning of light verbs. In the experiments presented in the following section, we examine the relationship between the meaning of light verbs and their cross-linguistic syntactic behaviour.

An in-depth empirical study of light verb constructions in the specific context of parallel corpora and alignment can lead to new generalisations concerning the correlation of their linguistic and statistical properties. On the one hand, a statistical large-scale analysis of the behaviour of these constructions in a general cross-linguistic word alignment process provides novel linguistic information, which enlarges the empirical basis for the analysis of these constructions and complements traditional grammaticality judgments. On the other hand, a linguistically fine-grained analysis of the statistical behaviour of these constructions provides linguistically informed performance and error analyses that can be used to improve systems for automatic word alignment.

4.3. Experiments

The purpose of our study is to examine the translation equivalents of a range of English light verb constructions and the effect that the lexical properties of light verbs have on the cross-linguistic variation. We take as a starting point the observation that the cross-linguistic distribution of light verb constructions depends on their structural properties, as shown in (4.1-4.2), repeated here as (4.22-4.23).

(4.22) a. Mary [had a laugh]. (English)
       b. Maria [lachte]. (German)
       c. Marija se [na-smejala]. (Serbian)

(4.23) a. Mary [gave a talk]. (English)
       b. Maria [hielt einen Vortrag]. (German)
       c. Marija [je o-držala predavanje]. (Serbian)

Recall that English light verb constructions are paraphrases of verbs. The expressions had a laugh in (4.22a) and gave a talk in (4.23a) can be replaced by the corresponding verbs laughed and talked, respectively, without changing the meaning of the sentences. (Obtaining natural sentences with the verbs instead of the constructions would require adding some modifiers, but this does not influence their semantic equivalence.) The cross-linguistic realisations of the constructions illustrated in (4.22-4.23) can be either single verbs or constructions. The cross-linguistic variation can therefore be seen as an extension of the within-language variation. We analyse the cross-linguistic frequency distribution of the two alternants as an observable indicator of the lexical properties of the constructions which spread across languages.

We explore the potential relationship between the meaning of light verbs and the cross-linguistic realisations of light verb constructions by examining a sample of constructions formed with two light verbs widely discussed in the literature. We select the verb take as a representative of self-oriented force dynamic schemata (following Brugman (2001), as discussed in Section 4.2.2).
We select the verb make as a representative of directed force dynamic schemata, similar in this respect to give, as analysed by Brugman (2001). The reason for studying the verb make instead of give is to keep the number of arguments constant across the verbs (give takes three arguments, while take takes two), excluding this factor as a possible source of variation.

To compare the realisations of light verbs with the realisations of regular lexical verbs, we compose a set of verbs which are "heavy" lexical entries comparable in meaning with the verb make. The set consists of the following verbs: create, produce, draw, fix, (re)construct, (re)build, establish. It is obtained from WordNet (Fellbaum 1998), a widely used lexical resource specifying lexical relationships between words. Including several representatives of regular verbs is necessary to deal with the differences in frequency. Since the two light verbs are much more frequent than any of the regular verbs, comparable samples cannot be drawn from corpora of the same size. For example, in the same portion of a corpus which contains fifty occurrences of the verb make, one can expect fewer than ten occurrences of the verb create. To obtain comparable samples, we sum up the numbers of occurrences of all the regular verbs, regarding them as a single regular verb during the analysis.

[Figure 4.2: Constructions with vague action verbs]

Our samples consist of instances of English light verbs and their German equivalents automatically extracted from a word-aligned parallel corpus. We use this language pair as a sample of many possible language pairs; in principle, the same analysis can be performed for any given language pair. We identify two aspects of the alignment of these constructions as the relevant objects of study. First, we quantify the amount and nature of correct word alignments for light verb constructions compared to regular verbs, as determined by human inspection.
Given the cross-linguistic variation between English, German, and Serbian described in (4.1-4.2), it can be expected that English light verb constructions will be aligned with a single word more often than constructions headed by a regular verb. Assuming that the properties of the heading light verbs do influence the semantic compositionality of the constructions, it can also be expected that light verb constructions headed by different verbs will be differently aligned with their translations in other languages. Different patterns of alignment would thus indicate different types of constructions.

Second, we evaluate the quality of automatic word alignments of light verb constructions. Translations that deviate from one-to-one word alignments, as is the case with light verb constructions, are hard to handle in the current approaches to automatic word alignment (see Section 3.2.1 in Chapter 3). Because of the cross-linguistic variation illustrated in (4.1-4.2), light verb constructions can be expected to pose a problem for automatic word alignment. Specifically, we expect lower overall quality of word alignment in the sentences containing light verb constructions than in the sentences that contain corresponding regular constructions.

[Figure 4.3: True light verb constructions]

4.3.1. Experiment 1: Manual alignment of light verb constructions in a parallel corpus

In the first experiment, we address the relationship between two distinctions pointed out in the theoretical accounts of light verb constructions: a) the distinction between self-oriented vs. directed dynamics in the meaning of light verbs, and b) the distinction between idiomatic true light verb constructions vs. regular-like constructions with vague action verbs.
The experiment consists of manual word alignment and a statistical analysis of a random sample of three kinds of constructions: constructions with the verb take, constructions with the verb make, and regular constructions. We test the following hypotheses:

1. Light verb constructions in English are aligned with a single word in German more often than constructions headed by a regular verb.

2. True light verb constructions in English are aligned with a single word in German more often than constructions with vague action verbs.

3. The degree of compositionality of light verb constructions depends on the force dynamic schemata represented in the meaning of light verbs.

We assume that the lack of cross-linguistic parallelism indicates idiosyncratic structures. In the case of light verb constructions, we assume that the one-to-two word alignment illustrated in Figure 4.3 indicates idiomatic true light verb constructions, while the one-to-one word alignment illustrated in Figure 4.2 indicates more regular constructions with vague action verbs. We assume that both types have some semantic content, but that this content is richer in the latter group than in the former.

Materials and methods

We analyse three samples of the constructions, one for each of the types defined by the heading verb. Each sample contains 100 instances randomly selected from a parallel corpus. Only the constructions where the complement is the direct object were included in the analysis. This means that constructions such as take something into consideration are not included. The only exception were the instances of the construction take something into account. This construction was included because it is used as a variant of take account of something, with the same translations into German. All the extracted instances are listed in Appendix A.

Corpus. The instances of the phrases were taken from the English-German portion of the Europarl corpus (Koehn 2005).
The texts in Europarl are collected from the website of the European Parliament. They are automatically segmented into sentences and aligned at the sentence level. The version of the corpus which we use contains about 30 million words (1 million sentences) for each of the 11 languages that were official languages of the European Union at the time: Danish (da), German (de), Greek (el), English (en), Spanish (es), Finnish (fi), French (fr), Italian (it), Dutch (nl), Portuguese (pt), and Swedish (sv). Most of the possible language pairs are not direct translations of each other, since for each text there is one source language and the others are translations. Some translations are also mediated by a third language. All the instances analysed in this study are extracted from the portion of the corpus which contains the proceedings of the sessions held in 1999. The selected portion of the corpus is parsed using a constituent parser (Titov and Henderson 2007).

Sampling. Instances of light verb constructions are sampled in two steps. First, a random sample of 1000 bi-sentences is extracted using a sampler based on computer-generated random numbers (Algorithm 1). Each sentence is selected only once (sampling without replacement). All verb phrases headed by the verbs take and make, as well as by the six regular verbs, in the randomly selected 1000 bi-sentences are extracted automatically.

1. Extract verb-noun pairs from the randomly selected, automatically parsed sentences;
   Tgrep2 query: 'VP < `/^VB/ & <-1 (/^NP/ < (`/^NN/ !. /^NN/))'
3. Select light verb construction candidates:
   a. the pairs which contain the verb take and a deverbal nominal complement listed in NOMLEX;
   b. the pairs which contain the verb make and a deverbal nominal complement listed in NOMLEX.
4. Select the pairs which contain one of the regular verbs.
Figure 4.4.: Extracting verb-noun combinations

The extraction is performed in several steps, as summarised in Figure 4.4. We first extract all the verb-noun pairs using Tgrep2, a specialised search engine for parsed corpora (Rohde 2004). We formulate the query shown in Figure 4.4 to extract all the verbs which head a verb phrase containing a nominal complement, together with the head of the complement. The noun which is immediately dominated by the noun phrase and which is not followed by another noun is considered the head of the noun phrase. The extracted verb-noun pairs are then compared with the list of deverbal nominals in the NOMLEX database (Macleod et al. 1998) to select the pairs which consist of one of the light verbs and a nominalisation. The selected pairs are then manually examined, and uses which are not light are removed from the list. Regular constructions are extracted from the verb-noun pairs by comparing the heading verb with our predefined sample of regular verbs.

After assessing the frequency of the selected constructions in the initial random sample of 1000 sentences, we estimate that the number of sentences needed to extract 100 instances of each of the three types of constructions is 6000. In the second step, we therefore add 5000 randomly selected bi-sentences to the initial sample using the same sampler and repeat the extraction procedure. The final sample analysed in the experiment consists of the first 100 occurrences of each construction type in the sample of 6000 randomly selected bi-sentences.

Algorithm 1: Selecting a random sample of bi-sentences
    Input:  an aligned corpus of bi-sentences S; each sentence s ∈ S is assigned a unique number n(s)
    Output: a random sample of K bi-sentences
    for i = 1 to K do
        generate a random number r in the range from 1 to |S|
        for each s ∈ S do
            if r == n(s) then
                select s
                remove s from S
                break
            end
        end
    end

Feature representation.
The constructions are represented as ordered pairs of words V + N, where the first word is the verb that heads the construction and the second is the noun that heads the verb's complement. For each word pair in English, we identify the corresponding word or word pair in German which is its actual translation in the parallel corpus. If either the English or the German verb form included auxiliary verbs or modals, these were not considered; only the lexical part of the form was regarded as the word translation.

(4.24) Er  hat  einen  Vorschlag  gemacht.
       he  AUX  a      proposal   made
       'He made a proposal.'

(4.25) English instance: made + proposal
       German alignment: Vorschlag + gemacht (note that hat is left out)
       Type of mapping: 2-2

We then determine the type of mapping between the translations. If the German translation of an English word pair includes two words too (e.g. take+decision ↔ Beschluss+fassen), this is marked as the "2-2" type. If the German translation is a single word, the mapping is marked as "2-1". This type of alignment is further divided into "2-1N" and "2-1V". In the first subtype, the English construction corresponds to a German noun (e.g. initiative+taken ↔ Initiative). In the second subtype, the English construction corresponds to a German verb (e.g. take+look ↔ anschauen). In the cases where a translation shift occurs, so that no translation can be found, the mapping is marked as "2-0". For example, the record of an occurrence of the English construction make + proposal extracted from the bi-sentence in (4.24) would contain the information given in (4.25). For more examples, see Appendix A.

                      English
    Mapping    LVC take    LVC make    Regular
    2 → 2         57          50         94
    2 → 1N         8          18          2
    2 → 1V        30          28          2
    2 → 0          5           4          2
    Total        100         100        100

Table 4.1.: Types of mapping between English constructions and their translation equivalents in German.
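The mapping types collected per construction type form a contingency table, and the analysis reduces to a standard Pearson χ²-test over such a table. The following is a minimal sketch of that computation; the counts and the function name are our own illustration, not the thesis code:

```python
# Minimal sketch (not the thesis code): Pearson's chi-square statistic,
# chi2 = sum over cells of (O - E)^2 / E, for a contingency table of
# mapping-type counts per construction type. Counts are illustrative.

def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Rows: construction types (e.g. LVC vs. regular);
# columns: mapping types (e.g. 2-1 vs. 2-2).
table = [[10, 20],
         [20, 10]]
print(round(chi_square(table), 4))  # -> 6.6667
```

The significance of the statistic is then read off the χ² distribution with (rows − 1) × (columns − 1) degrees of freedom; in practice a library routine such as scipy.stats.chi2_contingency performs both steps.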
Results and discussion

We summarise the collected counts in a contingency table and compare the observed distributions with the distributions which are expected under the hypothesis that the type of the construction does not influence the variation. To assess whether the difference between the observed and the expected distributions is statistically significant, we use the χ²-test, calculated with the equation in (4.26), where O stands for the observed counts and E for the expected counts.

    χ² = Σ (O − E)² / E    (4.26)

Table 4.1 shows how many times each of the four types of mapping (2-2, 2-1N, 2-1V, 2-0) between English constructions and their German translation equivalents occurs in the sample. We can see that the three types of constructions tend to be mapped to their German equivalents in different ways. First, both types of light verb constructions are mapped to a single German word much more often than the regular constructions (38 instances of light verb constructions with take and 46 instances of light verb constructions with make vs. only 4 instances of regular constructions). This difference is statistically significant (χ² = 56.89, p < 0.01). Confirming our initial hypothesis No. 1, this result suggests that the difference between fully compositional phrases and light verb constructions in English can be described in terms of the amount of the "2-1" mapping to German translation equivalents.

The number of "2-1" mappings is not significantly different between light verb constructions headed by take and those headed by make (χ² = 4.54, p < 0.90). However, an asymmetry can be observed concerning the two subtypes of the "2-1" mapping. The German equivalent of an English construction is more often a verb if the construction is headed by the verb take (30 occurrences, that is, 79% of the 2-1 cases) than if the construction is headed by the verb make (28 occurrences, 61% of the 2-1 cases).
This difference is statistically significant (χ² = 3.90, p < 0.05). When the German translation equivalent for an English construction is a verb, the meanings of both components of the English construction are encoded in the corresponding German verb: the verbal category of the light verb and the lexical content of the nominal complement. These instances are less compositional, more specific and idiomatic (e.g. take+care ↔ kümmern, take+notice ↔ berücksichtigen). On the other hand, English constructions that correspond to a German noun are more compositional, less idiomatic and closer to the regular verb usages (e.g. make+proposal ↔ Vorschlag, make+changes ↔ Korrekturen). The noun that is regarded as their German translation equivalent is, in fact, the equivalent of the nominal part of the construction, while the verbal part is simply omitted. This result suggests that English light verb constructions with take are less compositional than the light verb constructions with make.

This result does not confirm hypothesis No. 2, but it does confirm hypothesis No. 3. Although the number of "2-1" mappings does not differ between the two light verbs, two kinds of these mappings can be distinguished. The statistically significant difference in the mappings suggests that the degree of compositionality of light verb constructions depends on the force dynamics schemata represented in the meaning of light verbs. The agent-oriented dynamics of the verb take gives rise to more divergent cross-linguistic mappings than the directed dynamics of the verb make.

4.3.2. Experiment 2: Automatic alignment of light verb constructions in a parallel corpus

In the second experiment, we address the relationship between the degree of compositionality of light verb constructions and the quality of automatic word alignment.
On the basis of the results of the first experiment, and of the assumption that divergent alignments are generally more difficult for an automatic aligner than one-to-one alignments, we expect the quality of automatic alignment to depend on the heading verb. In particular, we test the following hypotheses:

1. The quality of word alignment in the sentences containing light verb constructions is lower than in the sentences that contain corresponding regular constructions.

2. The quality of word alignment in the sentences containing light verb constructions headed by take is lower than in the sentences that contain light verb constructions headed by make.

Materials and methods

Corpus and sampling. The same sample of sentences as in the first experiment is analysed. Before sampling, the corpus was word-aligned in both directions using GIZA++ (Och and Ney 2003). As discussed in Section 3.2.1 in Chapter 3, the formal definition of alignment used by this system excludes the possibility of aligning multiple words in one language to multiple words in the other language, which is an option needed for representing the alignment of non-compositional constructions. However, it does provide the possibility of aligning multiple words in one language to a single word in the other language, which is the option needed to account for some of the described divergences between English and German, such as the mappings shown in Figure 4.3. Such alignment is possible in the setting where English is the target and German is the source language, since in this case both English words, the light verb and its complement, can be aligned with one German word. By contrast, if German is the target language, the single German verb that can be the translation of the English construction cannot be aligned with both English words, but only with one of them.
The direction of alignment can influence the quality of automatic alignment, since the probability of alignment can only be calculated for the cases that can be represented by the formal definition of alignment. The definition of alignment implies that all the words in the target language sentence are necessarily aligned, while some of the source sentence words can be left unaligned. This is another reason why the quality of alignment can depend both on the type of the constructions and on the direction of alignment.

Taking only the intersection of the alignments in both directions as the final automatic alignment is a common practice. Its advantage is that it provides almost only good alignments (precision 98.6%, as evaluated by Padó (2007) and Och and Ney (2003)), which can be very useful for some tasks. However, it has two disadvantages. First, many words are left unaligned (recall only 52.9%). Second, it excludes the possibility of many-to-one word alignment, which is allowed by the alignment model itself and which could be useful in aligning segments such as constructions with true light verbs. We therefore do not use the intersection alignment, but rather analyse both directions.

(4.27) Target language German
       EN: He made a proposal.
       DE: Er(1) hat(1) einen(3) Vorschlag(4) gemacht(3).
       Target language English
       DE: Er hat einen Vorschlag gemacht.
       EN: He(1) made(5) a(3) proposal(4).

(4.28) Automatic alignment, target German, noun: good, verb: no align
       Automatic alignment, target English, noun: good, verb: good

Figure 4.5.: The difference in automatic alignment depending on the direction.

Alignment categories. We examine the output of the automatic aligner for the sample of 300 instances described in Section 4.3.1, comparing it with the manual alignment obtained in the first experiment.
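The intersection heuristic and its recall problem can be illustrated with example (4.27). Representing each direction as a set of (English position, German position) links (the link sets below are transcribed from (4.27); they are illustrative, not real GIZA++ output), the intersection drops exactly the many-to-one material:

```python
# Both alignment directions expressed as (en_index, de_index) links, 1-based.
# EN: He(1) made(2) a(3) proposal(4)
# DE: Er(1) hat(2) einen(3) Vorschlag(4) gemacht(5)
target_en = {(1, 1), (2, 5), (3, 3), (4, 4)}           # English is the target
target_de = {(1, 1), (1, 2), (3, 3), (4, 4), (3, 5)}   # German is the target

intersection = target_en & target_de
print(sorted(intersection))
# The made-gemacht link (2, 5) exists only in the English-target direction,
# so the intersection loses it: high precision, low recall.
```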
We collect the information on automatic alignment for each element of the English word pair, for both alignment directions. The alignment was assessed as "good" if the construction or the individual word is aligned with its actual translation, as "bad" if the construction or the word is aligned with some other word, and as "no align" if no alignment is found. For example, the automatically aligned sentences in (4.27) would be recorded as in (4.28) (the numbers in brackets represent the positions of the aligned words). More examples can be found in Appendix A. Note that the "no align" label can only occur in the setting where English is the source language, since all the words in the sentence have to be aligned when English is the target language.

Results and discussion

We evaluate the quality of automatic alignment by comparing the alignment of the three types of constructions and taking into account the effects of the direction of alignment.

                                   Target DE   Target EN
LVCs with take    Both EN words        5          57
                  EN noun             63          79
                  EN verb              6          57
LVCs with make    Both EN words        5          40
                  EN noun             58          58
                  EN verb              6          52
Regular           Both EN words       26          42
constructions     EN noun             68          81
                  EN verb             32          47

Table 4.2.: Well-aligned instances of light verb constructions with take, with make, and of regular constructions (out of 100), produced by an automatic aligner, in both alignment directions (target is indicated).

As in the first experiment, the statistical significance of the observed differences in frequency distributions is assessed using the χ²-test. Table 4.2 shows how the quality of automatic alignment varies depending on the type of construction, but also on the direction of alignment (see also Figure 4.5). Both words are well aligned in light verb constructions with take in 57 cases and with make in 40 cases if the target language is English, which is comparable with regular constructions (42 cases).
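The three-way assessment can be sketched as follows. The function and argument names are hypothetical, and "good" is simplified here to mean any overlap between the automatic links and the gold (manually identified) translation positions:

```python
def assess(word_en, gold_de, auto_links):
    """Label the automatic alignment of one English word (names hypothetical).

    word_en    -- position of the English word in its sentence
    gold_de    -- set of German positions of its actual translation
    auto_links -- {en_position: set of de_positions} from the aligner
    """
    aligned = auto_links.get(word_en, set())
    if not aligned:
        return "no align"
    return "good" if aligned & gold_de else "bad"

# Example (4.27)/(4.28), German as target: viewed from the English side,
# 'made' (position 2) receives no link, while 'proposal' (position 4) is
# linked to Vorschlag (DE position 4). Toy links, illustrative only.
auto = {1: {1, 2}, 3: {3, 5}, 4: {4}}
print(assess(2, {5}, auto))   # verb: no align
print(assess(4, {4}, auto))   # noun: good
```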
However, if the target language is German, both types of light verb constructions are aligned well (both words) in only 5 cases, while regular constructions are well aligned in 26 cases. The effect of the direction of alignment is expected in light verb constructions, given the underlying formal definition of alignment, which does not allow multiple English words to be aligned with a single German word when German is the target language. However, the fact that the alignment of regular phrases is degraded in this direction too shows that the alignment of light verb constructions influences other alignments. The difference in the amount of correct alignments in the two directions also shows the amount of correct alignments which remain outside the intersection alignment.

Looking into the alignment of the elements of the constructions (verbs and nouns) separately, we can notice that nouns are generally better aligned than verbs for all three types of constructions, and in both directions. However, this difference is not the same in all the cases. The difference in the quality of alignment of nouns and verbs is the same in both alignment directions for regular constructions, but it is more pronounced in light verb constructions if German is the target. On the other hand, if English is the target, the difference is smaller in light verb constructions than in regular phrases. These findings suggest that the direction of alignment influences the alignment of verbs more than the alignment of nouns in general. This influence is much stronger in light verb constructions than in regular constructions.

Frequency   take LVC   make LVC   Regular
Low            12         25        62
High           76         40         8

Table 4.3.: The three types of constructions partitioned by the frequency of the complements in the sample.
Given the shown effects of the direction of alignment, we focus only on the direction which allows for better alignments in all three groups (with English as the target language) and perform statistical tests only for this direction. The difference between the alignments of both members of the three types of constructions (both EN words in Table 4.2) is statistically significant (χ² = 6.97, p < 0.05). However, this does not confirm the initial hypothesis No. 1 that the quality of alignment of light verb constructions is lower than the quality of alignment of regular constructions. The quality of alignment in light verb constructions is, in fact, better than in regular constructions. The difference in the quality of automatic alignment between the two kinds of light verb constructions is also statistically significant (χ² = 5.74, p < 0.05), but the difference is again opposite to hypothesis No. 2: constructions with take are better aligned than constructions with make. On the other hand, there is no significant difference between constructions with make and regular constructions. These results suggest that the type of construction which is the least compositional and the most idiomatic of the three is best aligned if the direction of alignment suits its properties.

Figure 4.6.: The distribution of nominal complements in constructions with take. In 12 out of 100 instances the complement is headed by a low-frequency noun (low frequency = 1 occurrence in the sample). There are 76 instances where the complement is headed by a high-frequency noun: 5 (one noun with frequency 5) + 7 (one noun with frequency 7) + 27 (three nouns with frequency 9) + 17 (one noun with frequency 17) + 20 (one noun with frequency 20).

Figure 4.7.: The distribution of nominal complements in constructions with make. In 25 out of 100 instances the complement is headed by a low-frequency noun (low frequency = 1 occurrence in the sample).
There are 40 instances where the complement is headed by a high-frequency noun: 15 (three nouns with frequency 5) + 7 (one noun with frequency 7) + 8 (one noun with frequency 8) + 10 (one noun with frequency 10).

Figure 4.8.: The distribution of nominal complements in regular constructions. In 62 out of 100 instances the complement is headed by a low-frequency noun (low frequency = 1 occurrence in the sample). There are 8 instances where the complement is headed by a high-frequency noun: one noun with frequency 8.

Since the quality of alignment of the three types of constructions proved different from what was expected in the case where English was the target language, we examine further the automatic alignment in this direction. In particular, we investigate the influence of the frequency distribution of the elements of light verb constructions on the quality of alignment. This approach is based on the fact that the elements of idiomatic expressions tend to occur more jointly than separately (Church and Hanks 1990). As discussed in Section 3.2.1 in Chapter 3, the co-occurrence frequency is important for calculating word alignment, which is a factor that could have influenced the results. Since the verb is a constant element within the three studied groups, we analyse the distribution of the nominal complements. The frequency of a noun is defined as its number of occurrences in the sample. It ranges from 1 to 20 occurrences in the sample of 100 instances.

Figure 4.9.: The difference in automatic alignment depending on the complement frequency. English is the target language.

The instances of the constructions are divided into three frequency ranges: instances containing nouns with 1 occurrence are regarded as low frequency items; those containing nouns that occurred
5 and more times in the sample are regarded as high frequency items; nouns occurring 2, 3, and 4 times are regarded as medium frequency items. Only low and high frequency items were considered in this analysis. Table 4.3 shows the number of instances belonging to the different frequency ranges. It can be noted that light verb constructions with take exhibit a small number of low frequency nouns (see also Figure 4.6). The number of low frequency nouns increases in the constructions with make (25/100, see also Figure 4.7), and it is much bigger in regular constructions (62/100, see also Figure 4.8). The opposite is true for high frequency nouns (LVCs with take: 76/100, with make: 40/100, regular: 8/100). This distribution of low/high frequency items reflects the different collocational properties of the constructions. In the most idiomatic constructions (with take), lexical selection is rather limited, which results in little variation. Verbs in regular constructions select for a wide range of different complements with little reoccurrence. Constructions with make can be placed between these two types.

                      take LVC      make LVC      Regular
Well aligned          C     %       C     %       C     %
Low Freq    Both      4    33       8    32      21    34
            N         8    66       8    32      47    75
            V         4    33      12    48      53    85
High Freq   Both     47    62      18    51       4    50
            N        64    84      27    77       8   100
            V        58    76      18    51       4    50

Table 4.4.: Counts and percentages of well-aligned instances of the three types of constructions in relation to the frequency of the complements in the sample. The percentages represent the number of well-aligned instances out of the overall number of instances within one frequency range. English is the target language.

Different trends in the quality of automatic alignment can be identified for the three types of constructions depending on the frequency range of the complement in the constructions, as shown in Table 4.4 and Figure 4.9.
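The partition into frequency ranges described above can be sketched as follows. The sample nouns are invented for illustration; the thresholds (1 occurrence for low, 5 or more for high) are the ones used in the text:

```python
from collections import Counter

def frequency_range(noun, counts):
    """Bin a complement noun by its frequency in the sample."""
    f = counts[noun]
    if f == 1:
        return "low"
    if f >= 5:
        return "high"
    return "medium"  # 2-4 occurrences; excluded from the analysis

# Toy sample of complement nouns (illustrative, not the thesis data):
nouns = ["decision"] * 9 + ["look"] * 5 + ["toll", "breath"] + ["view"] * 3
counts = Counter(nouns)
print(frequency_range("decision", counts))  # high (9 occurrences)
print(frequency_range("toll", counts))      # low  (1 occurrence)
print(frequency_range("view", counts))      # medium (3 occurrences)
```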
First, the quality of alignment of both components of the constructions jointly is the same for all three types of constructions in low frequency items: there is no statistically significant difference between 33% well-aligned instances of light verb constructions with take, 32% of light verb constructions with make, and 34% of regular constructions. The alignment in this category improves in high frequency items in all three types, compared to low frequency items. The improvement is statistically significant (χ² = 16.24, p < 0.01). Note that the high frequency regular items are represented with only 8 instances, which is why the trends might not be clear enough for this subtype.

The analysis of the influence of the frequency of verbs' complements on the quality of automatic alignment shows that the frequency of words is more important for automatic alignment than the structural parallelism between languages. The alignment is significantly better for high frequency combinations in all three types. Contrary to our hypothesis, the idiomatic nature of light verb constructions with take does not pose a problem for an automatic aligner, due to the fact that a big proportion of instances of these constructions belongs to the high frequency category. The quality of alignment in constructions with take is better than the quality in the other two types due to the difference in the distribution of high frequency items. As can be seen in Figures 4.6, 4.7, and 4.8, the sample of constructions with take consists mostly of high frequency items. Low frequency items, on the other hand, prevail in regular constructions, while constructions with make are in between the two.

4.4. General discussion

The results of our study confirm the hypotheses about the relationship between cross-linguistic alignment of light verb constructions and the meaning of the heading light verb, tested in Experiment 1.
On the other hand, the hypotheses about the relationship between the type of light verb constructions and automatic word alignment in a parallel corpus, tested in Experiment 2, are not confirmed. However, the identified behaviour of the automatic aligner with respect to light verb constructions provides additional evidence for the distinctions confirmed in Experiment 1. In this section, we interpret the results in light of the theoretical discussion concerning light verb constructions.

4.4.1. Two force dynamics schemata in light verbs

The main finding of the study is the fact that the constructions headed by light take behave as idiomatic phrases more than the constructions headed by make. The difference between more idiomatic and less idiomatic light verb constructions has been widely discussed in the literature, especially from the point of view of analysing the predicate-argument structure of the constructions. The structure of true light verb constructions, which are idiomatic and non-compositional, is argued to be similar to complex predicates, while the structure of constructions with vague action verbs, which are less idiomatic and compositional, is argued to be similar to regular phrases. Our study shows that the idiomatic properties of light verb constructions can be related to the meaning of the heading verbs. The self-oriented force dynamics in the meaning of light take results in more compact cross-linguistic morphosyntactic realisations than the directed dynamics of light make. Cross-linguistic equivalents of English light verb constructions with take tend to be single verbs (which is a compact representation), while cross-linguistic equivalents of English light verb constructions with make tend to remain constructions with two main elements.
This holds not only for the language pair English-German, but also for the pair English-Serbian, as discussed by Samardžić (2008). The idiomatic nature of true light verb constructions (represented by the constructions with take in our study) is additionally confirmed by the finding that these constructions are better aligned automatically than regular constructions. This finding, which runs contrary to our hypotheses, is due to the same interaction between frequency and irregularity that has been established in relation to various language processing and acquisition phenomena. Idiosyncratic (irregular) elements of language are known to be more frequent than regular units. This is the case, for example, with English irregular verbs, which are, on average, more frequent than regular verbs. In the case of light verb constructions in our study, the idiosyncratic units are the constructions with take, which are idiomatic, with high co-occurrence of the two elements (the heading verb and the nominal complement). The constructions with make, which represent constructions with vague action verbs in our study, can be positioned somewhere between irregular and regular items. This additionally confirms the claim that these two types of constructions differ in the level of semantic compositionality.

Our analysis of corpus data has shown that there is a clear difference between regular phrases and light verb constructions (including the constructions with make) in the way they are cross-linguistically mapped in a parallel corpus. Regular constructions are mapped word-by-word, with the English verb being mapped to the German verb, and the English noun to the German noun. A closer look into the only 4 examples where regular constructions were mapped as "2-1" shows that this mapping is not due to the "lightness" of the verb. In two of these cases, it is the content of the verb that is translated, not that of the noun (produce+goods ↔ Produktion; establishes+rights ↔ legt).
This never happens in light verb constructions. On the other hand, light verb constructions are much more often translated with a single German word. In both subtypes of the "2-1" mapping of light verb constructions, it is the content of the nominal complement that is translated, not that of the verb. The noun is either transformed into a verb (take+look ↔ anschauen) or it is translated directly, with the verb being omitted (take+initiative ↔ Initiative).

The frequency distribution observed in our data represents a new piece of empirical evidence for the distinctions made. The observable differences in cross-linguistic alignment are especially useful for distinguishing between regular constructions and constructions with vague action verbs (represented in our sample by the constructions with make). It has been shown in other studies that true light verb constructions have characteristic syntactic behaviour. Constructions with vague action verbs, however, cannot be distinguished using the same tests, while they are clearly distinguished on the basis of their cross-linguistic mappings.

4.4.2. Relevance of the findings to natural language processing

The findings of our study show that the interaction between automatic alignment and the types of constructions is actually more complicated than the simple hypotheses which we initially formulated. To summarise: first, we find better alignment of regular constructions compared to light verb constructions only if the target language is German; second, alignment is better overall when English is the target than when German is the target; and third, we found a clear frequency-by-construction interaction in the quality of alignment. The quality of automatic alignment of both regular constructions and light verb constructions interacts with the direction of alignment.
First, the alignment is considerably better if the target language is English than if it is German, which confirms the findings of Och and Ney (2003). Second, the expected difference in the quality of alignment between regular constructions and light verb constructions has only been found in the direction of alignment with German as the target language, that is, where the "2-1" mapping is excluded. However, the overall quality of alignment in this direction is lower than in the other. This result could be expected, given the general morphological properties of the two languages, as well as the formalisation of the notion of word alignment used in the system for automatic alignment. According to this definition, multiple words in the target language sentence can be aligned with a single word in the source language sentence, but not the other way around. Since English is a morphologically more analytical language than German, multiple English words often need to be aligned with a single German word (a situation allowed if English is the target but not if German is the target). The phrases in (4.29) illustrate the two most common cases of such alignments. First, English tends to use functional words (the preposition of in (4.29a)) where German applies inflection (the genitive suffixes on the article des and on the noun Bananensektors in (4.29b)). Second, compounds are regarded as multiple words in English (banana sector), while they are single words in German (Bananensektors). This asymmetry explains both the fact that automatic alignment of all three types of constructions is better when the target language is English and the fact that the alignment of light verb constructions is worse than the alignment of regular phrases when it is forced to be expressed as a one-to-one mapping, which occurs when German is the alignment target.

(4.29) a. the infrastructure of the banana sector
       b.
die Infrastruktur des Bananensektors

Practically, all these factors need to be taken into consideration in deciding which version of alignment should be used, be it for evaluation or for application in other tasks such as automatic translation or annotation projection. The intersection of the two directions has been shown to provide the most reliable automatic alignment (Padó 2007; Och and Ney 2003). However, it excludes, by definition, all the cases of potentially useful good alignments that are only possible in one direction of alignment.

4.5. Related work

Corpus-based approaches to light verb constructions belong to the well-developed domain of collocation extraction. General methods developed for automatic identification of collocations in texts, based on various measures of association between words, can also be applied to light verb constructions. However, light verb constructions differ from other types of collocations in that they are partially compositional and relatively productive, which calls for special treatment.

The methods which combine syntactic parsing with standard measures of association between words prove to be especially well adapted for automatic identification of light verb constructions (Seretan 2011). Identifying the association between syntactic constituents rather than between the words in a context window allows identifying light verb constructions as collocations despite the variation in their realisations due to their partially compositional meaning.

Grefenstette and Teufel (1995) present a method for automatic identification of an appropriate light verb for a derived nominal on the basis of corpus data. Distinguishing the cases where the derived nominals can be ambiguous between more verb-like (e.g. make a proposal) and more noun-like (e.g.
put the proposal in the drawer) uses, Grefenstette and Teufel (1995) extract only those usages where the noun occurs in a context similar to a typical context of the corresponding verb. The most frequent governing verbs for these noun occurrences are their light verbs. As noted by the authors, this technique proves to be insufficient on its own for identifying light verbs. It does not differentiate between the light verb and other frequent verbal collocates for a given nominalisation (e.g. reject a proposal vs. make a proposal). But it can be used as a step in automatic processing of corpora, since light verbs do occur in the lists of the most frequent collocates.

The method for extracting verb-noun collocations proposed by Tapanainen et al. (1998) is based on the assumption that collocations of the verb-noun type are asymmetric in such a way that it is the object (i.e. the noun) that is more indicative of the construction being a collocation. If a noun occurs as the object of only a few verbs in a big corpus, its usage is idiomatic. For example, the noun toll occurs mainly with the verb take. It can be used with other verbs too (e.g. charge, collect), but not with many. The measure proposed in the study, the distributed frequency of the object, is better suited for extracting light verb constructions than some symmetric measures of association. However, this approach does not provide a means to distinguish light verb constructions from the other collocations of the same type.

Using the information from cross-linguistic word alignment for identifying collocations is explored by Zarrieß and Kuhn (2009). The study shows that many-to-one automatic word alignment in parallel corpora is a good indicator of reduced compositionality of expressions.
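The intuition behind Tapanainen et al.'s measure can be sketched with a simplified proxy (this is not the exact published formula): an object noun whose occurrences concentrate on few distinct verbs scores high, flagging pairs like take+toll. The verb-noun pairs below are invented for illustration:

```python
from collections import Counter, defaultdict

# (verb, object) pairs; toy data mimicking the toll example above.
pairs = [("take", "toll")] * 3 + [("charge", "toll")] + \
        [("see", "house"), ("buy", "house"), ("sell", "house"),
         ("paint", "house"), ("build", "house")]

verbs_per_object = defaultdict(Counter)
for verb, noun in pairs:
    verbs_per_object[noun][verb] += 1

# Simplified score: object frequency divided by the number of distinct
# governing verbs; high values indicate idiomatic (collocational) objects.
scores = {noun: sum(c.values()) / len(c) for noun, c in verbs_per_object.items()}
print(scores)  # 'toll' outscores 'house'
```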
Combined with syntactic parsing, this information can be used for automatic identification of a range of collocation types, including light verb constructions.

Semantic characteristics of light verb constructions are studied in more detail by Fazly (2007), who proposes a statistical measure that quantifies the degree of figurativeness of the light verb in conjunction with a predicating noun. The degree of figurativeness of a verb is regarded as the degree to which its meaning in a certain realisation differs from its literal meaning. It is assumed that constructions of the verb-noun type can be placed on a continuum of figurativeness of meaning, including literal combinations (e.g. give a present), abstract combinations (e.g. give confidence), light verb constructions (e.g. give a groan), and idiomatic expressions (e.g. give a whirl). More figurative meanings of the verb are considered closer to true light verbs, while more literal meanings are closer to vague action verbs (i.e. to the abstract combinations on the presented continuum). The measure of figurativeness is based on indicators of conventionalised use of the constructions: the more the two words occur together, and the more they occur within a particular syntactic pattern, the more figurative the meaning of the verb. Thus, the figurativeness score is composed of two measures of association: the association of the two words and the association of the verb with a particular syntactic pattern. The syntactic pattern that is expected for figurative combinations is defined in terms of three formal properties associated with typical light verb constructions (see the examples in (4.19-4.21) in Section 4.2.2): active voice, indefinite (or no) article, and singular form of the noun.
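The word-association half of such a figurativeness score can be sketched with pointwise mutual information. The counts below are invented for illustration, and the pattern-association component (active voice, no/indefinite article, singular noun) is omitted:

```python
import math

def pmi(f_vn, f_v, f_n, n_pairs):
    """PMI of a verb-noun pair given pair and marginal counts."""
    p_vn = f_vn / n_pairs
    p_v = f_v / n_pairs
    p_n = f_n / n_pairs
    return math.log2(p_vn / (p_v * p_n))

# A light-verb-like pair (give+groan: the noun occurs almost only with
# this verb) versus a literal pair (give+present). Counts are invented.
print(pmi(f_vn=40, f_v=1000, f_n=50, n_pairs=100000))  # strongly associated
print(pmi(f_vn=5, f_v=1000, f_n=500, n_pairs=100000))  # near chance level
```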
The association of a verb-noun combination with the expected syntactic pattern is expressed as the difference between the association of the combination with this pattern (positive association) and the association of the combination with any of the patterns where any of the features has the opposite value (passive voice, definite article, plural noun). For a sample of expressions, the scores assigned by the measure of figurativeness are compared with the ratings assigned by human judges. The results show that a measure which includes linguistic information about expressions performs better in measuring the degree of their figurativeness than a simple baseline measure of association between the words in the expressions.

The work of Stevenson et al. (2004) deals with semantic constraints on light verb complements. They focus on true light verb constructions, trying to identify the classes of complements that would be preferred by a given light verb. Light verb constructions are first identified automatically, and then the relations between light verbs and some classes of complements are examined. Following the analysis of Wierzbicka (1982) (see Section 4.2.1), the nominal complements of light verbs are identified with their corresponding verbs. With this, it was possible to use Levin's lexical semantic classification of verbs (see Section 2.2.1 in Chapter 2 for more details) to divide the complements into semantic classes and to examine whether certain light verbs prefer certain classes of complements. The study shows that light verbs have some degree of systematic and predictable behaviour with respect to the class of their complement. For example, light give tends to combine with deverbal nouns derived from the Sound Emission verbs, while light take combines better with the nouns derived from the Motion (non-vehicle) verbs. As the light verb construction score gets higher, the pattern gets clearer.
The study also shows that some of the verbs (e.g. give and take) behave in a more consistent way than others (e.g. make). The computational approaches presented in this section show that the compositionality of the meaning of light verb constructions does not correspond directly to the strength of the association of their components. Adding specific linguistic information improves the correlation between human judgements and automatic rankings. The studies, however, do not address the lexical properties of light verbs as one of the potential causes of the observed variation. Nor do they address the patterns in cross-linguistic variation which are potentially caused by different degrees of compositionality of light verb constructions. Our study focuses on these two issues.

4.6. Summary of contributions

In the study of light verb constructions, we have proposed using data automatically extracted from parallel corpora to identify two kinds of meaning of light verbs. We have shown that English light verb constructions headed by the verb take tend to be aligned with a German single verb more often than the constructions headed by the verb make. The difference in the cross-linguistic mapping is predicted from the meaning of the verbs described in terms of force dynamics: the self-oriented schemata of light take give rise to more compact cross-linguistic realisations than the directed schemata of the verb make. The difference in the force dynamics of the two verbs is related to the level of compositionality of their corresponding light verb constructions. The constructions with take are less compositional and more irregular than the constructions with make.
The idiomatic nature of light verb constructions, represented in our study by the constructions with take, is additionally confirmed by the finding that these constructions are better automatically aligned than the constructions represented by the verb make, as well as the comparable regular constructions. Although this finding sounds surprising, it actually follows from the interaction of frequency and regularity, which plays an important role in automatic word alignment.

5. Likelihood of external causation and the cross-linguistic variation in lexical causatives

5.1. Introduction

The causative(/inchoative) alternation has been recognised in the linguistic literature as a widespread linguistic phenomenon, attested in almost all languages (Schafer 2009). This alternation involves verbs such as break in (5.1), which can be realised in a sentence both as transitive (5.1a) and as intransitive (5.1b). Both realisations express the same event, with the only difference being that the transitive version specifies the causer of the event (Adam in (5.1a)), while the intransitive version does not. The transitive version is thus termed causative and the intransitive anticausative. The verbs that participate in this alternation are commonly referred to as lexical causatives.1

(5.1) a. Causative: Adam broke the laptop.
      b. Anticausative: The laptop broke.

1 The lexical causative alternation, which we address in this study, is to be distinguished from the syntactic causative alternation illustrated in (i), which has been studied more extensively in the linguistic literature as a case of verb serialisation (Baker 1988; Williams 1997; Alsina 1997; Collins 1997; Aboh 2009).

(i.) a. Lexical causative: Adam broke the laptop.
     b. Syntactic causative: Adam made the laptop break.
What makes this alternation an especially attractive topic for our research is the wide range of cross-linguistic variation in the surface forms of the clauses formed with the alternating verbs. The causative alternation appears in different languages with a diversity of lexical, morphological, and syntactic realisations which defies linguists' attempts at generalisation. First of all, the variation is observed in the sets of alternating verbs. Most of the alternating verbs have lexical counterparts that alternate across many languages. However, there are still many verbs which alternate in some languages, while their lexical counterparts in other languages do not. The verbs that do not alternate in some languages can be divided into two groups: only intransitive and only transitive. Examples of only intransitive and only transitive verbs in English are given in Table 5.1. As the examples, taken from Alexiadou (2010), show, the English verbs arrive and appear do not alternate: their transitive realisation (causative in Table 5.1) is not available in English. However, their counterparts in Japanese or Salish languages, for example, are found both as transitive and intransitive, that is, as alternating. Similarly, verbs such as cut and kill are only found as transitive in English, while their counterparts in Greek or Hindi, for example, can alternate between intransitive and transitive use. Languages also differ in the morphological realisation of the alternation. Some examples of morphological variation, taken from Haspelmath (1993), are given in Table 5.1. In some languages, such as Russian, Mongolian, and Japanese, the alternation is morphologically marked. The morpheme that marks the alternation can be found on the intransitive form, while the corresponding transitive form is not marked (the case of Russian in Table 5.1).
In other languages, such as Mongolian, the morpheme that marks the alternation is found on the transitive version, while the intransitive version is unmarked. There are also languages where both forms bear a marker, one marking the transitive and the other the intransitive version, as in the Japanese example in Table 5.1. English, on the other hand, is an example of a language where the alternation is not marked at all. (Note that both forms of the verbs melt and gather in Table 5.1 are the same.) The different marking strategies illustrated in Table 5.1 represent only the most common markings. Languages can use different options for different verbs. For example, anticausative versions of some verbs are not marked in Russian. In principle, any option can be found in any language, but with different probability.

Availability:             Causative                      Anticausative
arrive, appear            +Japanese, +Salish, -English   +all languages
kill, cut                 +all languages                 +Greek, +Hindi, -English

Morphological marking:    Causative                      Anticausative
Mongolian                 xajl-uul-ax 'melt'             xajl-ax 'melt'
Russian                   rasplavit' 'melt'              rasplavit'-sja 'melt'
Japanese                  atum-eru 'gather'              atum-aru 'gather'

Table 5.1.: Availability of the alternation (Alexiadou 2010) and morphological marking (Haspelmath 1993) in some examples of verbs and languages.

The variation in the availability of the alternation illustrated in Table 5.1 raises the question of why some verbs do not alternate in some languages. Answering this question can help us understand why alternating verbs do alternate. The variation in morphological marking is even more puzzling: Why is it that languages do not agree on which version of the alternating verbs to mark? Also, what needs to be addressed is the interaction between the categories of variation: Is there a connection between the alternation availability and morphological marking?
In this study, we address the issues raised by the causative alternation in a novel approach which combines knowledge about the use of verbs in a corpus with knowledge about the typological variation.2 We analyse within-language variation in realisations of lexical causatives, as well as cross-linguistic variation in a parallel corpus, with the aim of identifying common properties of lexical causatives underlying the variation. Our analysis relates the variation observed in language corpora with the observed cross-linguistic variation in a unified account of the lexical representation of lexical causatives. The findings of our study are expected to extend the knowledge about the nature of the causative alternation by taking into consideration much more data than the previous accounts. On the other hand, they are also expected to be applicable in natural language processing. Being able to predict the cross-linguistic transformations of phrases involving lexical causatives based on their common lexical representation can be useful for improving automatic alignment of phrase constituents, which is a necessary step in machine translation and other tasks in cross-linguistic language processing. Having both purposes in mind, we propose an account of lexical causatives suitable for machine learning. We model all the studied factors so that the values of the variables can be learned automatically by observing the instances of the alternating verbs in a corpus. The chapter is organised in the following way. We start by discussing the questions raised by lexical causatives. In Section 5.2.1, we introduce the distinction between internally and externally caused events in relation to the argument structure of lexical causatives and to the causative alternation.
In Section 5.2.2, we discuss cross-linguistic variation in the causative alternation and the challenges that it poses for the account based on the two-way distinction between internally and externally caused events. In Section 5.2.3, we discuss a more elaborate typological approach to cross-linguistic variation in lexical causatives which proposes an account of their meaning in terms of a scale, rather than two or three classes.

2 Some pieces of the work presented in this chapter are published as Samardžić and Merlo (2012).

After defining the theoretical context of our study, we present our experimental approach to the questions discussed in the literature. The study consists of four experiments. The first two experiments (Sections 5.3.1 and 5.3.2) establish a corpus-based measure which can be used to distinguish between lexical causatives with different lexical representations. In the third experiment (Section 5.3.3), we examine the influence of the meaning of lexical causatives on their cross-linguistic realisations. In the fourth experiment (Section 5.3.4), we test a statistical model which classifies lexical causatives based on their cross-linguistic realisations. In Section 5.4, we interpret the results of our experiments in light of the theoretical discussion and also in relation to more practical issues concerning natural language processing. We compare our study with related work in Section 5.5.

5.2. Theoretical accounts of lexical causatives

Among other issues raised by the causative alternation which have been discussed in the linguistic literature, theoretical accounts have been proposed for the interrelated questions which we address in our study: 1. What are the properties of alternating verbs that distinguish them from the verbs that do not alternate? 2. Which one of the two realisations is the basic form and which one is the derivation? 3.
What is the source of cross-linguistic variation in the sets of alternating verbs? Most of the proposed accounts are focused on the specific properties of alternating verbs (Question No. 1) and on the structural relationship between the two alternants (Question No. 2). The issues in cross-linguistic variation are usually not directly addressed except in typological studies. Our study addresses the issue of cross-linguistic variation directly, but the findings are relevant for the other two questions too. In this section, we present theoretical accounts of lexical causatives, introducing the notions and distinctions which are addressed in our study. We focus on the proposals and ideas concerning the relationship between the meaning of lexical causatives and the variation in their morphosyntactic realisations, especially cross-linguistic variation, leaving aside the accounts of the structure of clauses formed with these verbs. The most apparent common property of the alternating verbs in different languages is their meaning. Most of these verbs describe an event in which the state of one of the participants (patient or theme) changes (Levin and Rappaport Hovav 1994). If a verb describes some kind of change of state, it can be used both as causative and as anticausative. This is the case illustrated in (5.1), repeated here as (5.2). In the causative use (5.2a), the verb is transitive, the changing participant is its object, and the agent is expressed as its subject. In the anticausative use, the verb is intransitive, with the changing participant being expressed as its subject (5.2b).

(5.2) a. Adam broke the laptop.
      b. The laptop broke.

If the change-of-state condition is not satisfied, the alternation does not take place.
The example in (5.3a) illustrates a case of an intransitive verb whose subject does not undergo a change, which is why it cannot be used transitively, as shown in (5.3b). Similarly, the object of the verb bought in (5.4a) is not interpreted as changing, so the verb cannot be used intransitively (5.4b).

(5.3) a. The children played.
      b. * The parents played the children.

(5.4) a. The parents bought the toys.
      b. * The toys bought.

5.2.1. Externally and internally caused events

Taking other verbs into consideration, however, it becomes evident that the meaning of change of state is neither a necessary nor a sufficient condition for the alternation to take place. Verbs can alternate even if they do not describe a change-of-state event. On the other hand, some verbs do describe a change-of-state event but still do not alternate. For example, verbs with the meaning of positioning, such as hanging in (5.5), do alternate although their meaning, at least in the anticausative version, does not involve a change of state. On the other hand, verbs such as transitive cut in (5.6) and intransitive bloomed in (5.7) do not alternate although their meaning involves a change of state, of the bread in (5.6) and of the flowers in (5.7).

(5.5) a. Their photo was hanging on the wall.
      b. They were hanging their photo on the wall.

(5.6) a. The baker cut the bread.
      b. * The bread cut.

(5.7) a. The flowers suddenly bloomed.
      b. * The summer bloomed the flowers.

To deal with the issue of non-change-of-state verbs entering the alternation, Levin and Rappaport Hovav (1994) introduce the notions of "externally" and "internally" caused events. Externally caused events can be expressed as transitive forms, while internally caused events cannot. On this account, verbs such as hanging in (5.5) can alternate even though they do not describe a change-of-state event, because they describe something that is externally caused.
The hanging of the photo in (5.5a) is not initiated by the photo itself, but by some other, external cause, which can then be expressed as the agent in a transitive construction. The same distinction explains the ungrammaticality of (5.7b). Since blooming is something that flowers do on their own, the verb bloom does not specify an external causer which would be realised as its subject, which is why this verb cannot occur in a transitive construction. This distinction still does not account for all the problematic cases. It leaves without an explanation the case of transitive verbs which describe a change of state, and which are clearly externally caused, but which do not alternate, such as cut in (5.6). To deal with these cases, Levin and Rappaport Hovav (1994) introduce the notion of agentivity. According to this explanation, the meaning of some verbs is such that specifying the agent in the event described by the verb is obligatory, which is why they cannot occur as intransitive. Levin and Rappaport Hovav (1994) argue that this happens with verbs whose subject can only be the agent (and not the instrument, for example). Schafer (2009) challenges this view, showing that the alternation can be blocked in verbs with different subjects. Such a verb is English destroy, whose subject can be some natural force, an abstract entity, or even an instrument, but which still does not alternate. Haspelmath (1993) argues that it is the level of specificity of the verb that plays a role in blocking the alternation. If a verb describes an event that is highly specified, such as English decapitate, the presence of specific details in the interpretation of the meaning of the verb can block the alternation.

5.2.2. Two or three classes of verb roots?
Since the discussed properties concern the meaning of verbs, one could expect that verbs which are translations of each other alternate in all languages. This is, however, not always true. There are many verbs that do alternate in some languages, while their counterparts in other languages do not. For example, in Greek and Hindi, the counterparts of kill and destroy have intransitive versions (Alexiadou et al. 2006). On the other hand, typically intransitive verbs of moving (run, swim, walk, fly) can have transitive versions in English, which is not possible in French or German (Schafer 2009). An explanation for these cases is proposed by Haspelmath (1993), who argues that a possible cause of these differences is a slightly different meaning of the lexical counterparts across languages. Russian myt', for example, which does alternate, does not mean exactly the same as English wash, which does not alternate. Haspelmath (1993), however, does not propose a particular property which differs in the two verbs. The question of cross-linguistic variation has received more attention in the work of Alexiadou (2010), who examines a wide range of linguistic facts including the variation in the availability of the alternation and in morphological marking. Alexiadou (2010) argues that the account of the examined facts requires introducing one more class of verbs.3 In addition to the classes of externally caused and internally caused verbs, proposed by Levin and Rappaport Hovav (1994), Alexiadou (2010) proposes a third

3 More precisely, Alexiadou (2010) refers to verb roots rather than to verbs to emphasise that the discussion concerns this particular level of the lexical representation.
group of "unspecified roots". The generalisations based on the proposed framework are summarised in (5.8).

(5.8) a. Anticausative verbs that are characterised as internally caused and/or cause unspecified are not morphologically marked, while those that are characterised as externally caused are marked.
      b. Cause unspecified verbs alternate in all languages, while internally caused and externally caused verbs alternate only in languages that allow anticausative morphological marking.

               Causative          Anticausative
Greek:         spao 'break'       spao 'break'
               klino 'close'      klino 'close'
               aniyo 'open'       aniyo 'open'
Japanese:      war-u 'break'      war-er-u 'break'
Turkish:       kapa 'close'       kapa-n 'close'

Table 5.2.: The examples of morphological marking of cause unspecified verbs discussed by Alexiadou (2010).

Although Alexiadou's (2010) analysis relates the two important aspects of the cross-linguistic variation in an innovative way, it fails to explain the tendency observed in many languages regarding the morphological marking of the anticausative variant. In particular, of all the examples mentioned in Alexiadou (2010) (see Table 5.2) in support of (5.8a), only the Greek examples can be clearly classified. The other examples in Table 5.2 illustrate that verbs classified as prototypical cause-unspecified (Class I) (e.g. break, open, close (Alexiadou et al. 2006; Alexiadou 2010)) tend to allow rather than disallow morphological marking of their anticausative variant (compare the examples in Table 5.2).4 Looking up the verbs in other languages shows some limitations for the generalisation in (5.8b) too. For example, the Serbian verb rasti 'grow' would be classified as cause

4 These verbs are mostly classified as Class II (externally caused) in the data overview, but they are classified as Class I in the summary of the data.
unspecified according to Alexiadou's (2010) criteria, implying that it is expected to alternate in all languages. However, this verb does not alternate in Serbian; it exists only as intransitive.

5.2.3. The scale of spontaneous occurrence

The approach proposed by Haspelmath (1993) does not address the syntactic aspects of the variation, but it provides a better account of the data which pose a problem for the generalisations proposed by Alexiadou (2010). Haspelmath (1993) analyses the typology of morphological marking of the two realisations of alternating verbs across a wide range of languages. Alternating verbs can be divided into several types according to the morphological differences between the causative and the anticausative version of the verb. The alternation can be considered morphologically directional if one form is derived from the other.5 There are two directional types: causative, if the causative member of the pair is marked, and anticausative, with morphological marking on the anticausative. Morphologically non-directed alternations are equipollent, if both members of the pair bear a morphological marker, suppletive, if two different verbs are used as alternants, and labile, if there is no difference in form between the two verbs. One language typically allows several types of marking, but it prefers one or two types. For example, both English and German allow anticausative, equipollent, labile, and suppletive alternations. There is a strong preference for labile pairs in English, while German prefers anticausative and labile pairs (Haspelmath 1993).6 Despite the different morphological marking types of languages, a study of thirty-one pairs of alternating verbs in twenty-one languages showed that certain alternating verbs tend to bear the same

5 Certain authors (Alexiadou 2006a) argue against the direct derivation.
Since a precise account of the derivation of the structures is not relevant for our work, we maintain the morphological distinctions described in Haspelmath (1993).

6 The issue of whether these preferences can be related to some other properties of the languages is still unresolved. The only correlation that could be observed is the fact that anticausative morphology is found mostly in European languages, even if they are not closely genetically related. For example, Greek is as close to the other European languages as Hindi-Urdu. While Greek shows a preference for anticausative morphology, Hindi-Urdu prefers causative morphology. Languages that prefer causative morphology are more widespread, being located on almost all continents, while the preference for anticausative morphology is restricted mainly to Europe.

Verb               Languages (N)   A      C      E     L     S     A/C
boil               21              0.5    11.5   3     6     0     0.04
freeze             21              2      12     3     4     0     0.17
dry                20              3      10     4     3     0     0.30
wake up            21              3      9      6     2     1     0.33
go out / put out   21              3      7.5    5.5   3     2     0.41
sink               21              4      9.5    5.5   1.5   0.5   0.42
learn / teach      21              3.5    7.5    6     2     3     0.47
melt               21              5      10.5   3     2.5   0     0.48
stop               21              5.5    9      3.5   3     0     0.61
turn               21              8      7.5    4     1.5   0     1.07
dissolve           21              10.5   7.5    2     1     0     1.40
burn               21              7      5      2     5     2     1.40
destroy            20              8.5    5.5    5     1     0     1.50
fill               21              8      5      5     3     0     1.60
finish             21              7.5    4.5    5     4     0     1.67
begin              19              5      3      3     8     0     1.67
spread             21              11     6      3     1     0     1.83
roll               21              8.5    4.5    5     3     0     1.89
develop            21              10     5      5     1     0     2.00
get lost / lose    21              11.5   4.5    4.5   0     0.5   2.56
rise-raise         21              12     4.5    3.5   0     1     2.67
improve            21              8.5    3      8     1.5   0     2.67
rock               21              12     4      3.5   1.5   0     3.00
connect            21              15     2.5    1.5   1     1     6.00
change             21              11     1.5    4.5   4     0     7.33
gather             21              15     2      3     1     0     7.50
open               21              13     1.5    4     2.5   0     8.67
break              21              12.5   1      2.5   2     0     12.50
close              21              15.5   1      2.5   2     0     15.50
split              20              11.5   0.5    5     3     0     23.00
die / kill         21              0      3      1     1     16    —

Table 5.3.: Morphological marking across languages: A=anticausative, C=causative, E=equipollent, L=labile, S=suppletive
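The ordering in Table 5.3 can be reproduced mechanically from its A and C columns: ranking verbs by their A/C ratio yields the ordering discussed below. A minimal sketch over a handful of rows, with the counts copied from the table:

```python
# (verb, A, C) marking counts for a few verbs from Table 5.3 (Haspelmath 1993).
rows = [
    ("gather", 15.0, 2.0),
    ("boil", 0.5, 11.5),
    ("close", 15.5, 1.0),
    ("melt", 5.0, 10.5),
    ("burn", 7.0, 5.0),
]

# Low A/C = spontaneous end of the scale; high A/C = non-spontaneous end.
ranked = sorted(rows, key=lambda r: r[1] / r[2])
scale = [verb for verb, a, c in ranked]
print(" > ".join(scale))  # boil > melt > burn > gather > close
```

The resulting order matches the corresponding rows of the table, read top to bottom.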
kind of marking across languages. Verbs such as the lexical equivalents of English freeze, dry, melt tend to be marked when used causatively in many different languages, while the equivalents of English gather, open, break, close tend to be marked in their anticausative uses. Table 5.3 shows the distribution of morphological marking for all the verbs included in Haspelmath's (1993) study. Note that the verbs are ranked according to the ratio between anticausative and causative marking. The verbs with a low ratio are found at the top of the table and those with a high ratio at the bottom. Assuming that the cross-linguistic distribution of the kinds of morphological marking is a consequence of the way lexical items are used in language in general, Haspelmath (1993) interprets these findings as pointing to a universal scale of increasing likelihood of spontaneous occurrence. The verbs with a low A/C ratio describe events that are likely to happen with no agent or external force involved. If the verb is used with an expressed agent, the form of the verb contains a morphological marker in the majority of languages. The verbs with a high A/C ratio typically specify an agent, and if the agent is not specified, the verb tends to get some kind of morphological marking across languages. In this interpretation, the cross-linguistic A/C ratio is an observable and measurable indicator of a lexical property of verbs. It expresses the degree to which an agent or an external cause is involved in the event described by the verb. A summary of the notion of the scale of spontaneous occurrence is given in (5.9).

(5.9) The scale of spontaneous occurrence:
      freeze > dry > melt >          (low A/C: spontaneous)
      .....
      > gather > open > break > close          (high A/C: non-spontaneous)

The notion of spontaneous occurrence can be related to the distinction between internally and externally caused events argued for in the other analyses. Both notions concern the same lexical property of verbs — the involvement of an agent in the event described by a verb. The events that are placed on the spontaneous extreme of the scale would be those that can be perceived as internally caused. The occurrence of an agent or an external cause in these events is very unlikely. Since the externally caused events are considered to give rise to the causative alternation, they would correspond to a wider portion of the scale of spontaneous occurrence, including not just the events on the non-spontaneous extreme of the scale, but also those in the middle of the scale. However, there are important theoretical and methodological differences between the two conceptions. The qualitative notion of internal vs. external causation implies that there are two kinds of events: those where the agent is present in the event and those with no agent involved. Verbs describing internally caused events can only be used as anticausative. A causative use of a verb describing an internally caused event is expected to be ungrammatical (as in (5.7b)). The notion of the scale of spontaneous occurrence does not imply complete absence of the agent in any event. It does not predict the ungrammaticality of uses such as (5.7b). What follows from this notion is that such uses are possible, but very unlikely. The difference between the two conceptions is even more important with respect to the events perceived as externally caused. The qualitative analyses imply that all externally caused events have the same status with respect to the causative alternation — the verbs describing these events alternate.
The attested cases of verbs that describe externally caused events but do not alternate, such as (5.6b), are considered exceptions due to some idiosyncratic semantic properties of the events described by the verbs (Section 5.2). The quantitative notion of the scale of spontaneous occurrence allows expressing the differences between the verbs that describe externally caused events. Each point on the scale represents a different probability for an agent to occur in an event described by a verb. Opposite the spontaneous extreme of the scale is the non-spontaneous extreme. It predicts cases of verbs describing events that are very unlikely to occur spontaneously. An intransitive use of these verbs would be unlikely, although possible. The case in (5.6b) could be explained in these terms with no need to treat it as an exception.

5.3. Experiments

Our approach to lexical causatives is based on statistical analysis and modelling of large data sets. Assuming that the use of verbs is related to their semantic and grammatical properties, we observe the distribution of the causative and anticausative realisations of a large number of verbs extracted from a corpus of around 1'500'000 syntactically analysed sentences, identifying the properties of verbs which generated this distribution.
Following Haspelmath (1993), we call this property the degree of spontaneity of an event. The more spontaneous an event, the less is an external causer involved in the event. We see the degree of spontaneity as a general scalar component of the meaning of lexical causatives whose value has an impact on the observable behaviour of all the verbs which participate in the alternation in any language. Showing that the corpus-based measure of spontaneity is correlated with the typological measure allows us to extend the account to a larger sample of verbs. Since the corpusbased value is assigned to the verbs entirely automatically, it can be quickly calculated for practically any given set of verbs, replacing the typology-based value for which the data are harder to collect. We calculate the corpus-based value of the spontaneity of events described by 354 verbs cited as participating in the causative alternation in English (Levin 1993). We show, by means of a statistical test, that the smaller set of verbs (the 29 verbs for which we measured the correlation) is a proper sample of the bigger set (the 354 verbs from Levin (1993)). This implies that the correlation established for the smaller set applies to the bigger set as well. To study how exactly the spontaneity value influences the cross-linguistic variation, we analyse the distribution of causative and anticausative realisations in German translations of English lexical causatives. We extract the data from the corpus of German translations of the 1’500’000 English sentences which were used in the monolingual part of the study. The sentences on the German side are, like English sentences, syntactically analysed. All the sentences are word-aligned so that German translations of individual English words are known. By a statistical analysis of parallel instances of verbs, we 182 5.3. Experiments identify certain trends in the cross-linguistic variation which are due to the spontaneity value. 
Based on these findings, we design a probabilistic model which exploits the information about the cross-linguistic variation at the level of token to assess the spontaneity value of lexical causatives abstracting away from the potential language-specific biases. 5.3.1. Experiment 1: Corpus-based validation of the scale of spontaneous occurrence Haspelmath (1993) does not discuss a potential relation between the likelihood of spontaneous occurrence of an event and the frequency of different uses of the verb which describes it in a single language. Nevertheless, it is logical to suppose that such a relation should exist, since the indicator of the likelihood, the morphological marking on the verbs, is considered to be a consequence of the way the verbs are used in general. The placement of an event described by a verb on the scale can be expected to correspond to the probability for the verb to be used transitively or intransitively in any single language. On the other hand, the ratio of the frequencies of intransitive to transitive uses of verbs in a single language can be influenced by other factors as well, which can result in cross-linguistic variation. The relation between the scale of spontaneous occurrence and the patterns of use of verbs in different languages thus needs to be examined. Note that the causative alternation is realised in different ways across languages: some languages mark the causative use of a verb, some mark the anticausative use, some mark both and some none of them (see Tables 5.1 and 5.3). Morphological markers themselves can be special causative morphemes, but they can often be morphemes that have other functions as well, such as the reflexive anticausative marker in most of European languages. These factors might have influence on the ratio of intransitive to transitive uses in a given language. 
To validate empirically the hypothesis that the alternating verbs can be ordered on the scale of spontaneous occurrence of the events that they describe, we test it on corpus data. More precisely, we test the hypothesis that the distribution of morphological marking on the verbs across languages and the distribution of their transitive and intransitive uses in a corpus are correlated. We can expect this correlation on the basis of the well-established correspondence between markedness and frequency. In general, marked forms are expected to be less frequent than unmarked forms. Therefore, we expect the verbs that tend to have anticausative marking across languages to be used more often as causative (transitive), and verbs that tend to have causative marking to be used more often as anticausative (intransitive). To make the discussion easier to follow, we opt for a positive, and not a negative, correlation. We thus calculate the C/A ratio rather than the A/C ratio used by Haspelmath (1993). We calculate the ratio between the frequencies of causative (active transitive) and anticausative (intransitive) uses of verbs in a corpus of English for the verbs for which Haspelmath’s study provides the typological A/C ratio, as shown in (5.10).

C/A(verb) = frequency of causative uses / frequency of anticausative uses    (5.10)

We then measure the strength of the correlation between the ranks obtained by the two measures.

Materials and methods

Several explanations are needed regarding the matching between the verbs in Haspelmath’s study and ours, and the criteria that were used to exclude some verbs. Most of the verbs analysed by Haspelmath are also listed as participating in the causative alternation by Levin (1993) (e.g. freeze, dry, melt, open, break, close). Some verbs are not listed by Levin (e.g. boil, gather). We include them in the calculation nevertheless because they clearly alternate.
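The corpus-based ranking by the C/A ratio in (5.10) can be sketched in a few lines of Python. The verb counts below are hypothetical placeholders for illustration, not the Europarl figures used in the study.

```python
def ca_ratio(freq_caus, freq_anticaus):
    """Corpus-based C/A ratio of (5.10): causative (active transitive)
    frequency over anticausative (intransitive) frequency."""
    return freq_caus / freq_anticaus

def rank_by_ratio(counts):
    """Order verbs from least to most spontaneous: a high C/A ratio means
    the event is rarely conceived as occurring without an external causer."""
    return sorted(counts, key=lambda v: ca_ratio(*counts[v]), reverse=True)

# Hypothetical (caus, anticaus) counts -- not the figures from the thesis.
counts = {"break": (900, 300), "melt": (200, 400), "open": (500, 250)}
print(rank_by_ratio(counts))
```

The resulting ranking is what is later compared against the typology-based ranking.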
Four entries in Haspelmath’s list are not English alternating verbs, but complement pairs: learn/teach, rise/raise, go out/put out, and get lost/lose. We treat the former two pairs as single verb entries, adding up the counts of occurrences of both members of the pair. We do not calculate the ratio for the latter two because automatic extraction of their instances from the corpus could not be done using the methods already developed to extract the other verb instances. We exclude the verb destroy because it does not alternate in English and no complement verb is proposed by Haspelmath. Finally, the pair kill/die is excluded because its typology-based ranking is not available. This leaves us with 27 verbs for which we calculate the corpus-based C/A ratio. Transitive, intransitive, and passive instances of the verbs were extracted from the English side of the parallel corpus Europarl (Koehn 2005), version 3, which contains around 1’500’000 sentences for each language (the same corpus which was used for the study on light verb constructions presented in Chapter 4). The syntactic relations needed for determining whether a verb is realised as transitive (with a direct object) or as intransitive (without an object) are identified on the basis of automatic parsing with the MaltParser, a data-driven system for building parsers for different languages (Nivre et al. 2007).

Instance representation. Each instance is represented with the following elements: the verb, the head of its subject, and the head of its object (if there is one). An English causative use of a verb is identified as an alternating verb realised in an active transitive clause. The anticausative use is identified as an intransitive use of an alternating verb. A passive is identified as a verb used in the form of a passive participle and headed by the corresponding passive auxiliary verb.
Identification of the form of the clause which contains a lexical causative is performed automatically, using Algorithm 2. Regarding all the transitive uses of the alternating verbs as causatives, and all the intransitive uses as anticausatives, is a simplification, because this is not always true. It can happen that a verb alternates in one sense, but not in another. For instance, the sentence in (5.12) is not the causative alternant of (5.11), but only of (5.13).

(5.11) Mary was running in the park this morning.
(5.12) Mary was running the program again.
(5.13) The program was running again.

By a brief manual inspection of the lexical entries of the verbs in the Proposition Bank (Palmer et al. 2005a), we assessed that this phenomenon is not very frequent and that it should not have an important influence on the results. In our sample, only the verb freeze proved to be affected by this phenomenon. This verb was discarded as an outlier while calculating the correlation between the corpus-based and typology-based rankings of the verbs, but this was the only such example in the sample of verbs.

Algorithm 2: Identifying transitive, intransitive, and passive uses of lexical causatives.
Input:  1. A corpus S consisting of sentences s parsed with a dependency parser
        2. A list of lexical causatives V
Output: The number of transitive, intransitive, and passive instances of each verb v ∈ V in the corpus S

for each sentence s_i in S do
    for each verb v_j in V do
        if v_j occurs in s_i then
            if there is a SUBJ which depends on v_j then
                if there is an OBJ which depends on v_j then
                    return transitive
                else if v_j is passive then
                    return passive
                else
                    return intransitive

As can be seen in Algorithm 2, only the instances with all the arguments realised in the same clause were taken into account. This is obtained by the constraint that the extracted verb has to have a subject.
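Algorithm 2 amounts to a small decision procedure over the dependency labels attached to a verb instance. A minimal sketch, assuming the parse exposes SUBJ/OBJ labels and a passive-participle flag as described in the text:

```python
def classify_use(deps, is_passive):
    """Classify one verb instance following Algorithm 2.
    `deps` is the set of dependency labels attached to the verb;
    instances without an explicit subject are discarded (None)."""
    if "SUBJ" not in deps:
        return None              # arguments not realised in the same clause
    if "OBJ" in deps:
        return "transitive"      # causative use
    if is_passive:
        return "passive"
    return "intransitive"        # anticausative use
```

Counting these labels per verb over the corpus yields the frequencies used in the C/A ratio.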
We exclude the realisations of verbs where either the subject or the object is moved or elided, in order to control for the potential influence that these specific syntactic structures can have on the interpretation of the meaning of verbs. Single-clause realisations can be considered the typical and simplest case.

Figure 5.1.: The correlation between the rankings of verbs on the scale of spontaneous occurrence

Although they are basically transitive realisations, the passive instances are extracted separately because the difference between active and passive transitive uses is crucial with respect to the causative alternation, as discussed in detail by Alexiadou et al. (2006). Expressing the external causer (by means of a prepositional complement) is optional in passive constructions, while in active transitive instances the external causer is obligatorily expressed as the subject.

Results and discussion

To assess the strength of the correlation between the corpus-based C/A ratio and the A/C ratio based on the typology of morphological marking on the verbs, we rank the verbs according to the corpus use ratio and then perform a correlation test between the rankings of the same verbs based on the two measures. We obtain the Spearman rank correlation score rs = 0.67, p < 0.01, with one outlier7 removed. The score suggests good correlation between the two sources of data. Figure 5.1 shows the scattergram representing the correlation. The coefficient of the correlation is strong enough to be taken as an empirical confirmation of Haspelmath’s hypothesis. Given that the two distributions are significantly correlated, it is reasonable to assume that the same factor which underlies the typological distribution of morphological marking on verbs also underlies the distribution of their transitive and intransitive realisations in a monolingual corpus.
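The Spearman score is simply the Pearson correlation computed on ranks. In practice one would call scipy.stats.spearmanr; the self-contained sketch below shows what the statistic computes, including the average-rank treatment of ties:

```python
def _ranks(xs):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

Applied to the corpus-based and typology-based rankings of the 27 verbs (minus the outlier), this is the computation behind rs = 0.67.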
Since the correlation is established based on the intuition that the underlying cause of the observed distributions is the meaning of verbs, we can conclude that the lexical property on which the two distributions depend is the probability of occurrence of an external causer in an event described by a verb.

5.3.2. Experiment 2: Scaling up

The fact that the automatically obtained corpus-based ranking of verbs corresponds to the scale of spontaneous occurrence is a useful finding, not only because it confirms Haspelmath’s theoretical hypothesis, but also because it means that the spontaneity feature can be calculated automatically from corpus data. In this way, it is possible to extend the account beyond the small group of example verbs that are discussed in the literature and cover many more cases. To test whether the correlation that we find for the small sample of verbs discussed in Section 5.3.1 applies to a larger set, we compare the distribution of the corpus-based measure of spontaneity over this sample with the distribution of the same value for the 354 verbs listed by Levin (1993) (see Section 2.2.1 in Chapter 2 for more details).

7 The verb freeze is frequently used in our corpus in its non-literal sense (e.g. freeze pensions, freeze assets), while the sense that was taken into account by Haspelmath (1993) is most likely the literal meaning of the verb (as in The lake froze.). This is why the verb’s corpus-based ranking was very different from its typology-based ranking.

Materials and methods

The list of English lexical causatives is extracted from the Levin (1993) verb index. Since the index refers with the same number to verbs that do not enter the alternation (the book sections 1.1.2.1, 1.1.2.2, and 1.1.2.3), the verbs that do not alternate were removed from the list manually.
All the instances where these verbs occur as transitive, intransitive, or passive were extracted from the automatically parsed English side of the Europarl corpus. We extract the same counts which were extracted for the small sample discussed in Section 5.3.1. We reduce the variance in the corpus-based measure, while preserving the information about the ordering of verbs, by transforming the frequency ratio into a more limited measure of spontaneity. We calculate the value of spontaneity (Sp in (5.14)) for each verb v included in the study as the logarithm of the ratio between the rates of causative and anticausative uses of the verb in the corpus, as shown in (5.14).

Sp = ln( rate(v, caus) / rate(v, acaus) )    (5.14)

The rates of uses of the three extracted constructions form ∈ {anticaus, caus, pass} for each verb are calculated as in (5.15).

rate(form, v) = F(form, v) / Σ_form′ F(form′, v)    (5.15)

The verbs that tend to be used as anticausative will have negative values for the variable Sp, the verbs that tend to be used as causative will be represented with positive values, and those that are used equally as anticausative and causative will have values close to zero. The distribution of the Sp-value over the 354 verbs is shown in Figure 5.4.

Figure 5.2.: Density distribution of the Sp value in the two samples of verbs

In the cases of verbs that were not observed in one of the three forms, we calculated the rate values as the rate of uses of the form in the instances of all verbs with frequency one, divided by the total frequency of the verb in question. For example, the verb attenuate occurred three times in the corpus, once as causative and twice as passive. The rate of anticausative uses for this verb is 0.31/3 = 0.10. The number 0.31 that is used instead of the observed count 0 represents the rate of all verbs with frequency one that occurred as intransitive.
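The Sp-value of (5.14), with the hapax-based smoothing of unobserved forms described above, can be sketched as follows. The value 0.31 for intransitives is the one reported in the text; the hapax rates for the other two forms below are made-up placeholders.

```python
from math import log

def sp_value(counts, hapax_rates):
    """Spontaneity value (5.14): ln of the causative-to-anticausative rate
    ratio. Unobserved forms are smoothed by the rate at which verbs of
    frequency one occur in that form, then all rates are normalised (5.15).
    `counts` maps {'caus', 'anticaus', 'pass'} to corpus frequencies."""
    smoothed = {f: (c if c > 0 else hapax_rates[f]) for f, c in counts.items()}
    total = sum(smoothed.values())
    rate = {f: c / total for f, c in smoothed.items()}
    return log(rate["caus"] / rate["anticaus"]), rate

# The `attenuate` example from the text: 1 causative, 2 passive,
# no anticausative occurrence; 0.31 is the intransitive hapax rate.
sp, rate = sp_value({"caus": 1, "pass": 2, "anticaus": 0},
                    {"caus": 0.4, "pass": 0.3, "anticaus": 0.31})
print(round(rate["caus"], 2), round(rate["pass"], 2), round(rate["anticaus"], 2))
```

The printed rates can be compared with the normalised rates reported for attenuate in the text.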
After normalising, the rate of causative uses of this verb is 0.30, the rate of passive uses is 0.61, and the rate of anticausative uses is 0.09. In this way we obtain small non-zero values proportional to the two observed frequencies.

Results and discussion

We compare the distribution of the Sp-values over the small and the large set of verbs in several ways. Figure 5.2 shows the density distribution of the spontaneity value over the two samples of verbs. First, a visual assessment of the shapes of the two distributions suggests that they are very similar. They both have a single mode (a single most probable value), and both modes are situated in the same region (around the value 0). The difference in the probability of the most probable values which can be observed in the figure (0.6 for the large sample as opposed to 0.3 for the small sample) does not necessarily reflect a real difference between the two distributions. It can be explained by the fact that the large sample contains a number of unobserved verbs to which the same Sp-value is assigned, estimated on the basis of the values of low-frequency verbs, as discussed earlier. In reality, these verbs would not have exactly the same value, so the density would be more equally distributed around zero, which is exactly the case in the small sample. Another indication that the two samples come from the same population is the result of a two-sample t-test, t = -0.0907, p = 0.9283, which indicates a very small difference between the means of the two distributions and a high probability that it is due to chance. The t-test relies on the assumption that the distributions which are compared belong to the family of normal distributions; it does not apply to other kinds of distributions. To make sure that the distributions of our data can be compared with the t-test, we perform the Shapiro-Wilk test, which shows how much a distribution deviates from a normal distribution.
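The two-sample comparison reduces to Student's t statistic with pooled variance. In practice one would call scipy.stats.ttest_ind (and scipy.stats.shapiro for the normality check); the sketch below only shows the statistic behind the reported t = -0.0907:

```python
def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5
```

A t value near zero, as found here, means the means of the two Sp-value samples are nearly identical.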
This test was not significant (W = 0.9355, p = 0.07), which means that the distribution of our data can be considered normal. We conclude that the verbs for which the corpus-based ranking is shown to correspond to the typology-based ranking represent an unbiased sample of the larger population of verbs that participate in the causative alternation. This implies that the corpus-based method for calculating the spontaneity value presented in this section can be applied to all the verbs that participate in the alternation. The limitation of this method, however, is that it is based on observations in a monolingual corpus. Given the well-documented cross-linguistic variation in the behaviour of the alternating verbs, discussed in Sections 5.2.2 and 5.2.3 and summarised in Table 5.1, a monolingual measure is likely to be influenced by language-specific biases in the data. In the following section, we take a closer look at the relationship between the patterns of cross-linguistic variation in the instances of lexical causatives and their spontaneity value.

5.3.3. Experiment 3: Spontaneity and cross-linguistic variation

In analysing cross-linguistic variation in the realisations of lexical causatives, we try to determine whether a verb can be expected to have consistent or inconsistent realisations across languages depending on the degree to which an external causer is involved in the event described by the verb. We approach this task by analysing German translations of English lexical causatives as they are found in a parallel corpus. Studying instances of translations of lexical causatives in a parallel corpus allows us to control for any pragmatic and contextual factors that may be involved in a particular realisation of a lexical causative.
Since a translation is supposed to express the same meaning in the same context, we can assume that the same factors that influence a particular realisation of a verb in a clause in one language influence the realisation of its translation in the corresponding clause in another language. Any potential differences in the form of the two parallel clauses should be explained by the lexical properties of the verbs or by structural differences between the languages. We perform a statistical analysis of a sample of parallel instances of lexical causatives in English and German, which we divide into three sub-samples: expressions of spontaneous events, expressions of non-spontaneous events, and expressions of events that are neutral with respect to spontaneity. Given that the spontaneity of an event, as a universal property, correlates with causative and anticausative use monolingually, and given that translations are meaning-preserving, we expect to find an interaction between the level of spontaneity of the event described by the verb and its cross-linguistic syntactic realisation. Assuming that cross-linguistic variation is an extension of within-language variation, as discussed at the end of Section 4.3 in Chapter 4, we expect syntactic realisations consistent with the lexical semantics of the verb to be carried across languages in a parallel fashion, while those that are inconsistent are expected to show a tendency towards the consistent realisation. For example, when verbs describe spontaneous events, we expect intransitive realisations to stay intransitive, and transitives to often be transformed into intransitives. Since the probability of both realisations is similar in neutral instances, we expect to find fewer transformations there than in the other two groups.

Materials and methods

The data collected for this analysis come from large and complex resources.
To know which English form is aligned with which German form, we first need to extract the English lexical causative from the English side of the parallel corpus. We then determine its form based on the automatic syntactic analysis of the sentence. Once we know which sentence on the English side of the parallel corpus contains a lexical causative, we find the German sentence which is aligned with it, based on the automatic sentence alignment in the parallel corpus. Once the aligned German sentence is identified, we look for the German verb which is aligned with the English verb. To do this, we first search the automatic word alignments to find the German word which is aligned with the English verb. If we find such a word, we then look into the syntactic analysis of the German sentence to determine whether this word is a verb. If it is a verb, we search the German syntactic parse to find the constituents of the clause where the verb is found. Once we know the constituents, we can determine whether the German translation of the English lexical causative is transitive, intransitive, or passive (using the same criteria as for extracting English instances in Section 5.3.2). The methods used to collect the data are described in more detail in the following subsection. The verbs included in this study are the 354 English verbs listed as alternating by Levin (1993), for which we have calculated the Sp-value applying the procedure described in Section 5.3.2. We extract the parallel instances of these verbs from the parallel English-German Europarl corpus (Koehn 2005), version 3 (the same corpus which is used in the study in Chapter 4). The corpus consists of German translations of the around 1’500’000 English sentences which were used in the previous two experiments (see Section 5.3.1). Note that by German translations we mean German translation equivalents, since the direction of translation is not known for most of the corpus.
To extract the information about the syntactic form of the instances, needed for our research, the German side of the corpus is syntactically parsed using the same parser as for the English side, the MaltParser (Nivre et al. 2007). The corpus was word-aligned using the system GIZA++ (Och and Ney 2003) (the same tool which is used in the study in Chapter 4). Both the syntactic parses and the word alignments are provided by Bouma et al. (2010), who used these tools to process the corpus for their own research. We extract the data for our research by reusing the processed corpus (with some adaptation and conversion of the format of the data). In extracting the data for our analysis, we search the processed parallel corpus looking for four pieces of information: the English syntactic parse, the alignment between English and German sentences, the alignment between English and German words, and the German syntactic parse. Not all four pieces of information are available for every sentence in the processed resource. Some English sentences are syntactically analysed, but the corpus does not contain their translations into German. Likewise, there are German sentences for which English translations are not available. Finally, syntactic parses are not available for all the sentences which are aligned. Having established these mismatches, we first search the whole resource to find the items which contain all the required information. Once we have found the intersection between the English parses, the German parses, and the sentence alignment, we search the English side of these sentences to identify English lexical causatives in the same way as in Section 5.3.2. The German translation of each instance of an English lexical causative is extracted on the basis of word alignments.
Instances where at least one German element was word-aligned with at least one element in the English instance were considered aligned. The extraction procedure is shown in more detail in Algorithm 3.

Figure 5.3.: Data collecting workflow. The shaded boxes represent external resources. The dashed boxes represent the scripts which are written for specific tasks in extracting data and performing the calculations. The other boxes represent the input and the output data at each stage of data processing. (The workflow leads from the Levin (1993) verb index and the parsed English and German Europarl corpora, through the extraction of English instances and their word alignments, to the corpus-based spontaneity measure and the experimental data set.)

Algorithm 3: Extracting parallel cross-linguistic realisations of lexical causatives.
Input:  1. A corpus E consisting of English sentences e which
           a. contain realisations of lexical causatives
           b. are parsed with a dependency parser
           c. are annotated with the form of the realisation (transitive, intransitive, or passive)
        2. A corpus G consisting of German sentences g which are
           a. sentence- and word-aligned with the English sentences in E
           b. parsed with a dependency parser
Output: Parallel instances consisting of:
        a. the form of the English realisation (transitive, intransitive, or passive)
        b.
the form of the aligned German realisation (transitive, intransitive, or passive)

for each English instance e_i in E do
    if align(verb_j) is a German verb then
        g-verb = align(verb_j); run Algorithm 2 on g-verb
    else if there is align(OBJ_j) then
        g-verb = the verb on which align(OBJ_j) depends; run Algorithm 2 on g-verb
    else if there is align(SUBJ_j) then
        g-verb = the verb on which align(SUBJ_j) depends; run Algorithm 2 on g-verb
    else
        return no alignment

We define the alignment in this way to address the issue of missing alignments. As discussed in more detail in Section 4.4 in Chapter 4, the evaluation of the performance of the word-alignment system GIZA++ on the Europarl data for English and German (Padó 2007) showed a recall rate of only 52.9%, while the precision is very high (98.6%). This evaluation applies to the intersection of both directions of word alignment, which is the setting used in the processing of our data. The low recall rate means that around half of the word alignments are not identified by the system. Extracting only the instances where there is a word alignment between an English and a German verb would hence considerably reduce our data set. Instead, we rely on the extracted syntactic relations and on the intuition that the verbs are aligned if any of the constituents which depend on them are aligned. With this definition of instance alignment, we also take advantage of our own finding that nouns (the heads of the depending constituents are nouns) are generally better aligned than verbs, as discussed in Chapter 4.

Sentence ID    Verb           Form   Verb instance  Subject  Object
96-11-14.867   intensify      CAUS   7              5        8
96-11-14.859   beschleunigen  CAUS   6              2        5

Table 5.4.: An example of an extracted instance of an English alternating verb and its translation to German. The numbers under the elements of the realisations of the verbs indicate their position in the sentence.
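The fallback in Algorithm 3 — align the verb directly, otherwise reach the German verb through its aligned object, then its aligned subject — can be sketched as follows. The dictionary-based representation of alignments and parses is an illustrative assumption, not the format of the actual resource.

```python
def find_german_verb(align, en, de_parse):
    """Fallback alignment in the spirit of Algorithm 3.
    `align` maps English word positions to German word positions;
    `en` gives the positions of the English verb and its arguments;
    `de_parse` maps German positions to their POS and governing verb."""
    pos = align.get(en["verb"])
    tok = de_parse.get(pos)
    if tok is not None and tok["pos"] == "VERB":
        return pos                        # the verb itself is aligned
    for arg in ("obj", "subj"):           # OBJ is tried before SUBJ
        dep = de_parse.get(align.get(en.get(arg)))
        if dep is not None:
            return dep["head_verb"]       # verb governing the aligned argument
    return None                           # no alignment found
```

The returned German verb would then be classified with the same transitive/intransitive/passive procedure as the English instances.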
For example, the object of the English verb is the eighth word in sentence 96-11-14.867, and the object of the German verb is the fifth word in sentence 96-11-14.859. A pair of extracted aligned instances is illustrated in Table 5.4. The first column is the sentence identifier, the second column is the verb found in the instance, the third column is the form of the verb in the instance, and the following three columns are the positions of the verb, the head of its subject, and the head of its object in the sentence. One more processing step was needed to identify sentence constituents which are word-aligned, because the word alignments and the syntactic analysis did not refer to the same positions. This is caused by the fact that the sentence alignment was often not one-to-one. In the cases where more than one English sentence was aligned with a single German sentence, or the other way around, the positions of words were determined with respect to the alignment chunk, and not with respect to the individual sentences. For example, if two English sentences are aligned with one German sentence, with eight words in the first sentence and seven in the second, the position of the first word in the second sentence is indicated as 9. In the syntactic parse, on the other hand, these two sentences are not grouped together, so the position of the same word is indicated as 1. We restored the original sentence-based word enumeration in the word alignments before extracting the alignment of the constituents.

Sp       En        De
1.20     pass      intrans
1.97     trans     trans
0.71     trans     trans
-0.05    pass      pass
0.71     trans     trans
-0.09    trans     pass
-0.14    trans     intrans
-3.91    intrans   intrans
0.39     pass      intrans
-1.76    intrans   trans

Table 5.5.: Examples of parallel instances of lexical causatives.

Applying the described methods allows us to extract only translations with limited cross-linguistic variation.
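The chunk-to-sentence renumbering described above is simple arithmetic over sentence lengths; a minimal sketch:

```python
def restore_position(chunk_pos, sentence_lengths):
    """Map a 1-based word position within an alignment chunk of several
    sentences back to (sentence index, position within that sentence)."""
    for i, n in enumerate(sentence_lengths):
        if chunk_pos <= n:
            return i, chunk_pos
        chunk_pos -= n          # skip past this sentence
    raise ValueError("position beyond the chunk")

# The example from the text: two English sentences of 8 and 7 words aligned
# with one German sentence; chunk position 9 is word 1 of the second sentence.
print(restore_position(9, [8, 7]))
```

This restores the sentence-based enumeration that the syntactic parses use.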
Only the instances of English verbs that are translated with a corresponding finite verb form in German are extracted, excluding the cases where English verbs are translated into German with a corresponding non-finite form such as an infinitive, a nominalisation, or a participle. The extracted parallel instances were then combined with the information about the Sp-value of each verb to form the final data source for our study, as illustrated in Table 5.5. Each line represents one instance of an alternating English verb found in the corpus and its translation to German. The full data set contains 13033 such items. The first column contains the spontaneity value of the verb found in the instance. The second column represents the form in which the English verb is realised in the instance. The third column represents the form of the German translation of the English verb. Figure 5.3 shows the main steps in the data collecting workflow.

Validation of the collected data. Since all the data used in our study are collected automatically from an automatically parsed and word-aligned corpus, they necessarily include processing-related errors. The best reported labelled attachment score of the MaltParser system is 88.11 for English (CoNLL Shared Task 2007) and 85.82 for German (CoNLL-X Shared Task 2006). We perform a manual evaluation of a sample of randomly selected instances to assess to what degree they correspond to the actual analyses. One hundred parallel instances were randomly selected from the total of 13033 extracted instances. The following categories were evaluated:

• The form of the clause in the English instance
• The form of the clause of the German translation

The extraction script assigned a wrong form to 8/100 English instances (error rate 8%). In 7 out of the 8 errors, the wrong form was assigned due to parsing errors. One error was due to the fact that the parser’s output does not include information about traces.
For example, in a sentence such as That is something we must change, the anticausative form is assigned to the instance of the verb change instead of the causative form. In four of the seven parsing errors, the actual forms found in the instances were not verbs but adjectives (open, close, clear, worry). The evaluation of the translation extraction was performed only for the cases where the English instance actually contained a verbal form (96 instances). A wrong form was assigned to the German translation in 13/96 cases (error rate 13.5%). In 7 of the 13 wrong assignments, a wrong form was assigned to the translation due to parsing errors in German. The errors in 3 cases were due to the fact that German passive forms headed by the verb sein, as in Das Team war gespalten for English The team was split, were not recognised as passive, but were identified as anticausative instead. The ambiguity between such forms and the anticausative past tense formed with the auxiliary sein cannot be resolved in our current extraction method. In the last 3 cases, the error was due to the fact that the corresponding German form was not a clause. In these cases, the English verb is aligned to a word with a different category (an adverb and a nominalisation) or is entirely left out in the German sentence (a verb such as sit in We sit here and produce...). The form that is assigned to the translation in these cases is the form of the verb on which the aligned words depend. Our extraction method cannot deal with these cases at the moment, although such transformations would be interesting to capture.

Figure 5.4.: Density distribution of the Sp value over instances of 354 verbs

Sampling of instances. Three groups of instances are defined according to the density distribution of the Sp value.
As can be seen in Figure 5.4, roughly symmetric points of low density are found around the values -1 and 1. We regard the instances containing verbs with an Sp value below -1 as the low-value group. These are expressions of spontaneous events in the terms of the scale of spontaneous occurrence (see Section 5.2.3), or of internally caused events in the sense of the theories presented in Sections 5.2.1 and 5.2.2. Instances containing a verb with an Sp value above 1 are considered to belong to the high-value group, representing expressions of non-spontaneous, or externally caused, events. The instances between the two values are considered medium-value instances, representing what Alexiadou (2010) refers to as cause-unspecified events (see Section 5.2.2). This division gives symmetric sub-samples of comparable size: a similar number of examples for the two extreme values (3’107 instances with high Sp values, 2’822 instances with low Sp values) and roughly double this number for the non-extreme values (7’104 instances with medium Sp values).

Results and discussion

Table 5.6 shows the frequencies of the realisations of lexical causatives in parallel English and German instances for the whole sample of instances, as well as for the three sub-samples. The three most frequent combinations of forms in each group of parallel instances are highlighted to show the changes in the distribution of combinations of forms in the two languages across groups. The overview of the frequencies suggests that the lexical properties of verbs influence their cross-linguistic realisations. The table that shows occurrences over the whole sample indicates that, both in English and in German, intransitives are more frequent than transitives, which are, in turn, more frequent than passives (marginal distributions). The non-parallel translations cover 32% of the cases.
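The three-way partition of instances by the Sp value of their verb, with cut-offs at -1 and 1, can be sketched as:

```python
def sp_group(sp, low=-1.0, high=1.0):
    """Assign an instance to one of the three sub-samples by the Sp value
    of its verb; -1 and 1 are the low-density points in Figure 5.4."""
    if sp < low:
        return "spontaneous"        # internally caused events
    if sp > high:
        return "non-spontaneous"    # externally caused events
    return "neutral"                # cause-unspecified events
```

Applied to the Sp values in Table 5.5, for example, -3.91 falls in the spontaneous group and 1.97 in the non-spontaneous group.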
When we partition the occurrences by the spontaneity of the event, the distribution changes, despite the fact that these are distributions in translations, and therefore subject to very strong pressure in favour of parallel constructions. In the group of instances containing verbs that describe events around the middle of the scale of spontaneous occurrence, the parallel combinations are the most frequent, as in the distribution of the whole set, with an even more markedly uniform distribution (29% of non-parallel translations). This means that the verbs which describe events that are neither spontaneous nor non-spontaneous tend to be used in the same form across languages. The probabilities of the two realisations are similar in these verbs, which means that they can be expected to occur with similar frequency across languages.

Whole sample                                German
English                Intransitive     Transitive       Passive          Total
                       N      %         N      %         N      %         N      %
Intransitive           3504   27        1001   8         314    2         4819   37
Transitive             1186   9         2792   21        369    3         4347   33
Passive                781    6         517    4         2569   20        3867   30
Total                  5471   42        4310   33        3252   25        13033  100

Spontaneous events                          German
English                Intransitive     Transitive       Passive          Total
                       N      %         N      %         N      %         N      %
Intransitive           1733   61        495    17        102    4         2330   82
Transitive             182    6         132    5         18     1         332    11
Passive                35     1         23     1         102    4         160    6
Total                  1950   68        650    23        222    9         2822   100

Non-spontaneous events                      German
English                Intransitive     Transitive       Passive          Total
                       N      %         N      %         N      %         N      %
Intransitive           74     2         72     2         29     1         175    5
Transitive             288    9         948    31        125    4         1361   44
Passive                448    14        289    10        834    27        1571   51
Total                  810    25        1309   43        988    32        3107   100

Neutral events                              German
English                Intransitive     Transitive       Passive          Total
                       N      %         N      %         N      %         N      %
Intransitive           1697   24        434    6         183    3         2314   33
Transitive             716    10        1712   24        226    3         2654   37
Passive                298    4         205    3         1633   23        2136   30
Total                  2711   38        2351   33        2042   29        7104   100

Table 5.6.: Contingency tables for the English and German forms in different samples of parallel instances.
Since both realisations are frequent in these verbs, they can be expected to alternate in the majority of languages. The distribution of the forms is different in the groups of instances containing verbs that describe events on the extremes of the scale of spontaneous occurrence.

Figure 5.5.: Joint distribution of verb instances in the parallel corpus. The size of the boxes in the table represents the proportion of parallel instances in each sub-sample.

The parallel realisations are frequent only for the forms that are consistent with the lexical properties (intransitive for spontaneous events and transitive for non-spontaneous events). An atypical instance of a verb in one language (e.g. a transitive instance of a verb that describes a spontaneous event) is not preserved across languages. These realisations tend to be transformed into the typical form in another language. For example, German transitives are much less frequent in the spontaneous events group than in the non-spontaneous events group, while English intransitives make up only 5% of the non-spontaneous events group, compared to 82% of the spontaneous group. The atypical realisations of these verbs are thus rare across languages, which means that they might be entirely absent in some languages. In the languages in which these realisations are found, the verbs alternate, while in the languages where these realisations are not found the verbs do not alternate. This means that the verbs describing events on the extremes of the scale of spontaneous occurrence can be expected to alternate in a smaller range of languages. We conclude that the analysis of the realisations of lexical causatives in a parallel corpus provides evidence that the probability of occurrence of an external cause in the event described by a verb (the spontaneity of the event) is a grammatically relevant lexical property. The cross-linguistic variation in the availability of the alternation is influenced by this property.
Verbs that describe events on the extremes of the scale of spontaneous occurrence are more likely to have different realisations across languages than those that describe events in the middle of the scale.

5.3.4. Experiment 4: Learning spontaneity with a probabilistic model

In Section 5.3.1 and Section 5.3.2, we have shown that the spontaneity value of verbs can be estimated from information about the distribution of the causative and anticausative instances of verbs in a corpus. These estimations, however, are based on the data from only one language. Given that realisations of the alternation in different languages are influenced by unknown factors, resulting in the observed cross-linguistic variation, an estimation based on the data from a single language can be expected to be influenced by language-specific factors. As we saw in Section 5.3.1, the estimation of the spontaneity value based on English corpus data is correlated with an estimation based on the data from many different languages. The correlation, however, is not perfect, and the reason for this could be the deviation of the realisations in English from the general tendencies. For example, it has been established that English prefers causative realisations of some verbs compared to other languages (Bowerman and Croft 2008; Wolff et al. 2009a; Wolff and Ventura 2009b). As a result, estimations based on English data can give lower spontaneity values compared to the universal value. Another indication of potential language-specific factors which influence the way lexical causatives are realised in a language can be found in the results of the experiment presented in Section 5.3.3. While this experiment shows that the spontaneity value influences the cross-linguistic realisations, we can also see that there are a number of realisations which diverge across languages.
As discussed in Section 5.3.3, the realisations which are not consistent with the spontaneity value in one language tend to be transformed into realisations consistent with the spontaneity value in the other language. However, the factors which give rise to the realisations inconsistent with the spontaneity are not known. To address the issue of the potential influence of language-specific factors on corpus-based estimation of the spontaneity value of events described by alternating verbs, we extend the corpus-based approach to the cross-linguistic domain. We collect the information about the realisations of the alternating verbs in a parallel corpus, as described in Section 5.3.3. The extended data set is expected to provide a better estimation of the universal spontaneity value than the monolingual set, neutralising the language-specific influences. Naturally, including more languages would be expected to give even better estimates. In this study, however, we consider only two languages as a first step towards a richer cross-linguistic setting. A simple approach to integrating cross-linguistic corpus data would be to calculate the ratio of causative to anticausative uses by adding up the counts from different languages. For instance, if a verb is found as transitive four times in one language and the translations of these four instances were two times transitive and two times intransitive, the count of transitive instances of this verb would be six. The two intransitive translations would be added to the counts of intransitive instances. In this approach, however, the information about which instances were observed in which language is lost. This knowledge can be very important for isolating language-specific factors and using this information to predict cross-linguistic transformations of phrases containing verbs with causative meaning.
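The simple pooling approach just described can be sketched as follows. The toy data mirror the example in the text (four English transitives, translated twice as transitive and twice as intransitive); everything else is invented for illustration.

```python
# Sketch of the naive pooling approach: counts from the two languages are
# added together, losing the per-language origin of each instance.

from collections import Counter

def pooled_counts(parallel_instances):
    """parallel_instances: list of (english_form, german_form) pairs."""
    pooled = Counter()
    for en_form, ge_form in parallel_instances:
        pooled[en_form] += 1   # the English occurrence
        pooled[ge_form] += 1   # its translation, added to the same bins
    return pooled

obs = [("trans", "trans"), ("trans", "trans"),
       ("trans", "intrans"), ("trans", "intrans")]
counts = pooled_counts(obs)
print(counts["trans"], counts["intrans"])  # 6 2
```

Nothing in `counts` records that both intransitive uses came from German, which is exactly the information the probabilistic model below is designed to keep.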
Another disadvantage of such an approach, which applies to all the estimations performed in this study so far, is that it does not provide a straightforward way of grouping the verbs into classes, which is one of the major concerns in the representation of lexical knowledge (see Section 5.2 in this chapter and also Section 2.2.1 in Chapter 2). To take into account both potential language-specific factors and potential grouping of verbs, we design a probabilistic model which estimates the spontaneity value on the basis of cross-linguistic data and generates a probability distribution over a given number of spontaneity classes for each verb in a given set of verbs.

The number of classes. Two main proposals concerning the classification of alternating verbs have been put forward in the linguistic literature. As discussed in Section 5.2, Levin and Rappaport Hovav (1994) use the distinction between externally and internally caused events to explain a set of observations concerning the alternating verbs. Alexiadou (2010), however, points out that a range of cross-linguistic phenomena are better explained by introducing a third semantic class, the cause-unspecified verbs. The distinctions argued for in the linguistic literature can be roughly related to the spontaneity feature in our account, so that externally caused events correspond to non-spontaneous, internally caused to spontaneous, and cause-unspecified to medium-spontaneity events.

The model

As can be seen in its graphical representation in Figure 5.6, the model consists of four variables.

Figure 5.6.: Bayesian net model for learning spontaneity (nodes V, Sp, En, Ge).

The first variable is the set of considered verbs V. This can be any given set of verbs. The second variable is the spontaneity class of the verb, for which we use the symbol Sp. The values of this variable depend on the assumed classification.
The third (En) and the fourth (Ge) variables are the surface realisations of the verbs in parallel instances. These variables take three values: causative for active transitive use, anticausative for intransitive use, and passive for passive use. We represent the relations between the variables by constructing a Bayesian network (for more details on Bayesian networks, see Section 3.4.3 in Chapter 3), shown in Figure 5.6. The variable that represents the spontaneity class of verbs (Sp) is treated as an unobserved variable. The values of the other three variables are observed in the data source. Note that the input to the model, unlike the information extracted for the analysis in Section 5.3.3, does not contain information about spontaneity (compare Table 5.7 with Table 5.5). The dependence between En and Ge represents the fact that the two instances of a verb are translations of each other, but does not represent the direction of translation. The form of the instance in one language depends on the form of the parallel instance because they express the same meaning in the same context, regardless of the direction of translation. Assuming that the variables are related as in Figure 5.6, En and Ge are conditionally independent of V given Sp, so we can calculate the probability of the model as in (5.16).

P(v, sp, en, ge) = P(v) · P(sp|v) · P(en|sp) · P(ge|sp, en)   (5.16)

Since the value of spontaneity is not observed, the parameters of the model that involve this value need to be estimated so that the probability of the whole model is maximised. We estimate the Sp-value for each instance of a verb by querying the model, as shown in (5.17).

P(sp|v, en, ge) = P(v, sp, en, ge) / Σ_sp' P(v, sp', en, ge)   (5.17)

Applying the independence relations defined in the Bayesian net (Figure 5.6), the most probable spontaneity class in each instance is calculated as shown in (5.18).
sp̄ = arg max_sp [P(v) · P(sp|v) · P(en|sp) · P(ge|sp, en)] / [Σ_sp' P(v) · P(sp'|v) · P(en|sp') · P(ge|sp', en)]   (5.18)

Having estimated the Sp-value for each verb instance, we assign to each verb the average spontaneity value across instances, as shown in (5.19).

class_sp(verb) = (Σ_en Σ_ge p(sp|v, en, ge)) / F(v)   (5.19)

where F(v) is the number of occurrences of the verb in the training data. All the variables in the model are defined so that the parameters can be estimated on the basis of frequencies of instances of verbs automatically extracted from parsed corpora. The corpus used as input does not need to be annotated with classes, since the parameters are estimated treating the class variable as unobserved. The model described in this section includes only two languages because we apply it to the two languages that we chose as a minimal pair (English and German), but it can easily be extended to include any number of languages.

Experimental evaluation

The accuracy of the predictions of the model is evaluated in an experiment. We implement a classifier based on the model, which we train and test using the data extracted from the syntactically parsed and word-aligned parallel corpus of English and German, as described in Section 5.3.3. To address the discussion on the number of classes of alternating verbs (see Section 5.2.2), we test two versions of the model. In one version, the model performs a two-way classification which corresponds to the binary distinction between externally and internally caused events. In the other version, the model performs a three-way classification which corresponds to the distinction between internally caused, externally caused, and cause-unspecified events. The verbs for which we calculate the spontaneity class in the experimental evaluation of the model are the 354 verbs that participate in the causative alternation in English, as listed in Levin (1993).
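The posterior query in (5.17) can be sketched as follows. Note that P(v) cancels in the ratio, so only the Sp-dependent factors matter; the two-class parameters below are invented toy values, not estimates from the corpus.

```python
# Sketch of the posterior query (5.17) for the Bayesian net of Figure 5.6.
# All parameter values are invented for illustration.

def posterior_sp(v, en, ge, p_sp_v, p_en_sp, p_ge_sp_en):
    """Return P(sp | v, en, ge) for every spontaneity class sp."""
    joint = {sp: p_sp_v[v][sp] * p_en_sp[sp][en] * p_ge_sp_en[(sp, en)][ge]
             for sp in p_en_sp}
    z = sum(joint.values())
    return {sp: p / z for sp, p in joint.items()}

p_sp_v = {"break": {"caus": 0.6, "anticaus": 0.4}}
p_en_sp = {"caus":     {"trans": 0.7, "intrans": 0.2, "pass": 0.1},
           "anticaus": {"trans": 0.1, "intrans": 0.8, "pass": 0.1}}
p_ge_sp_en = {("caus", "trans"):       {"trans": 0.8, "intrans": 0.1, "pass": 0.1},
              ("anticaus", "trans"):   {"trans": 0.3, "intrans": 0.6, "pass": 0.1},
              ("caus", "intrans"):     {"trans": 0.3, "intrans": 0.6, "pass": 0.1},
              ("anticaus", "intrans"): {"trans": 0.05, "intrans": 0.9, "pass": 0.05},
              ("caus", "pass"):        {"trans": 0.2, "intrans": 0.2, "pass": 0.6},
              ("anticaus", "pass"):    {"trans": 0.1, "intrans": 0.4, "pass": 0.5}}

# A parallel transitive instance pulls the posterior towards the causative class:
post = posterior_sp("break", "trans", "trans", p_sp_v, p_en_sp, p_ge_sp_en)
print(post["caus"] > post["anticaus"])  # True
```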
We estimate the parameters of the model by implementing an expectation-maximisation algorithm, which we run for 100 iterations. (The use of this algorithm to estimate the probabilities in a Bayesian model is explained in Section 3.4.2 in Chapter 3.) We initialise the algorithm according to the available knowledge about the parameters. The probability P(v) is set to the prior probability of each verb, estimated as the relative frequency of the verb in the corpus. The probability P(sp|v) is set so that causative events are slightly more probable than anticausative events in the two-way classification, and so that cause-unspecified events are slightly more probable than the other two kinds of events in the three-way classification. The values of P(en|sp) and P(ge|sp, en) are initialised randomly.

V           En        De
move        pass      intrans
alter       trans     trans
improve     trans     trans
increase    pass      pass
improve     trans     trans
break       trans     pass
change      trans     intrans
grow        intrans   intrans
close       pass      intrans
split       intrans   trans

Table 5.7.: Examples of the cross-linguistic input data

For the set of verbs for which the typological information is available, we compare the classification of verbs learned by the model both with the typology-based ranking and with the rankings based on the monolingual corpus-based Sp-value, automatically calculated in the second experiment (see Section 5.3.2 for more details). Since the set of verbs for which it is possible to perform a direct evaluation against typological data is relatively small (the data for 26 verbs are available), we measure the agreement between the classification learned by the model and the rankings based on the monolingual corpus-based Sp-value for the set of verbs which are found in the parallel corpus (203 verbs). This measure is expected to provide an indirect assessment of how well distinguished the supposed classes of verbs are.
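An expectation-maximisation loop of the kind described above can be sketched as follows. This is an illustrative reimplementation, not the thesis code: it initialises all Sp-dependent parameters randomly (rather than with the informed priors described above), adds light smoothing to avoid degenerate zero counts, and runs on invented data.

```python
# Minimal EM sketch for the model of Figure 5.6 with Sp unobserved.
# Illustrative only: random initialisation, invented toy data.

import random
from collections import defaultdict

FORMS = ["trans", "intrans", "pass"]

def run_em(data, classes, iters=100, seed=0):
    """data: list of (verb, en_form, ge_form) triples. Returns P(sp|v)."""
    rng = random.Random(seed)

    def random_dist(keys):
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    verbs = sorted({v for v, _, _ in data})
    p_sp_v = {v: random_dist(classes) for v in verbs}
    p_en_sp = {sp: random_dist(FORMS) for sp in classes}
    p_ge_sp_en = {(sp, en): random_dist(FORMS) for sp in classes for en in FORMS}

    for _ in range(iters):
        # E-step: expected class counts from the posterior P(sp|v,en,ge).
        c_v = defaultdict(float); c_en = defaultdict(float); c_ge = defaultdict(float)
        for v, en, ge in data:
            joint = {sp: p_sp_v[v][sp] * p_en_sp[sp][en] * p_ge_sp_en[(sp, en)][ge]
                     for sp in classes}
            z = sum(joint.values())
            for sp in classes:
                r = joint[sp] / z
                c_v[(v, sp)] += r
                c_en[(sp, en)] += r
                c_ge[(sp, en, ge)] += r
        # M-step: re-normalise the expected counts (with light smoothing).
        for v in verbs:
            z = sum(c_v[(v, sp)] + 1e-6 for sp in classes)
            p_sp_v[v] = {sp: (c_v[(v, sp)] + 1e-6) / z for sp in classes}
        for sp in classes:
            z = sum(c_en[(sp, en)] + 1e-6 for en in FORMS)
            p_en_sp[sp] = {en: (c_en[(sp, en)] + 1e-6) / z for en in FORMS}
            for en in FORMS:
                z2 = sum(c_ge[(sp, en, ge)] + 1e-6 for ge in FORMS)
                p_ge_sp_en[(sp, en)] = {ge: (c_ge[(sp, en, ge)] + 1e-6) / z2
                                        for ge in FORMS}
    return p_sp_v

data = [("melt", "intrans", "intrans")] * 8 + [("split", "trans", "trans")] * 8
result = run_em(data, ["c0", "c1"])
print(result)
```

On such cleanly separable data the two verbs end up with their probability mass concentrated on different classes; with real corpus data the distributions stay softer, which is what (5.19) averages over.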
Table 5.8 shows all the classifications performed automatically in comparison with the classifications based on the typology rankings. Since the typology-based and the monolingual corpus-based measures do not classify, but only rank the verbs, the classes based on these two measures are obtained by dividing the ranked verbs according to arbitrary thresholds. The thresholds for classifying the verbs according to the monolingual corpus-based Sp-value are determined in the same way as in the third experiment (see Section 5.3.3). In the two-way classification, the threshold is Sp = -1. The verbs with an Sp-value below -1 are considered anticausative; the other verbs are causative. In the three-way classification, the causative class is split into two classes using the threshold Sp = 1. The verbs with an Sp-value between -1 and 1 are considered cause-unspecified, while the verbs with an Sp-value above 1 are causative. The thresholds for classifying the verbs according to the typology-based ranking are determined for each evaluation separately so that the agreement is maximised.

Verb          Two-way classification      Three-way classification
              Monolingual   Bilingual     Monolingual   Bilingual
boil          a             a             a             a
dry           a             a             a             a
wake up       a             a             a             a
sink          a             a             a             a
learn-teach   a             a             m             m
melt          a             a             a             a
stop          a             a             a             a
turn          a             a             m             m
dissolve      c             c             c             c
burn          c             c             c             m
fill          c             c             c             m
finish        a             a             a             a
begin         a             a             a             a
spread        a             a             a             a
roll          c             c             m             m
develop       c             c             m             m
rise-raise    c             c             c             c
improve       c             c             m             m
rock          c             c             m             m
connect       c             c             c             c
change        a             a             a             a
gather        c             c             m             m
open          c             c             m             m
break         c             c             c             c
close         c             c             c             c
split         c             c             c             c
Agreement     85%           85%           61%           69%

Table 5.8.: Agreement between corpus-based and typology-based classification of verbs. The classes are denoted in the following way: a = anticausative (internally caused), c = causative (externally caused), m = cause-unspecified.
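The threshold scheme just described can be sketched directly; the Sp values assigned to the example verbs below are invented for illustration.

```python
# Sketch of the threshold-based classification of the monolingual Sp-value:
# a single cut at -1 for the two-way scheme, an additional cut at 1 for the
# three-way scheme. The Sp values in the example are invented.

def two_way(sp):
    return "a" if sp < -1 else "c"

def three_way(sp):
    if sp < -1:
        return "a"   # anticausative (internally caused)
    if sp > 1:
        return "c"   # causative (externally caused)
    return "m"       # cause-unspecified

sp_values = {"boil": -2.4, "improve": 0.4, "split": 1.9}
labels = {v: (two_way(sp), three_way(sp)) for v, sp in sp_values.items()}
print(labels)  # {'boil': ('a', 'a'), 'improve': ('c', 'm'), 'split': ('c', 'c')}
```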
For example, the threshold is set after the verb turn in the first two columns of Table 5.8. All the verbs ranked higher than turn are considered anticausative; the others are causative. In the two-way classification, the two versions of the model, with monolingual and with bilingual input, result in identical classifications. The agreement of the models with the typological ranking can be considered very good (85%). The optimal threshold divides the verbs into two asymmetric classes: eight verbs in the internally caused class and eighteen in the externally caused class. The agreement is better for the internally caused class. In the three-way classification, the performance of both versions of the model drops. In this setting, the output of the two versions differs: there are two verbs which are classified as externally caused by the monolingual version and as cause-unspecified by the bilingual version, which results in a slightly better performance of the bilingual version. Given the small number of evaluated verbs, however, this tendency cannot be considered significant. The three-way classification seems more difficult for both methods. The difficulty is not only due to the number of classes, but also to the fact that two classes are not well distinguished in the data. While the class of anticausative verbs is relatively easily distinguished (small number of errors in all classifications), the classes of causative and cause-unspecified verbs are hard to distinguish. This finding supports the two-way classification argued for in the literature. However, the classification performed by the model indicates that the distinction between causative and cause-unspecified verbs might still exist. Compared to the classification based on the monolingual Sp-value, more verbs are classified as cause-unspecified, and they are more densely distributed on the typological scale. Since the model takes into account cross-linguistic variation in the realisations of
lexical causatives, the observed difference in performance could be interpreted as a sign that the distinction between cause-unspecified and causative verbs does emerge in the cross-linguistic context. The performance of the two automatic classifiers on all the alternating verbs found in the parallel corpus is compared in Tables 5.9 and 5.10.

                      Monolingual Sp class
Parallel Sp class     Anticausative    Causative
Anticausative         64               13
Causative             14               112

Table 5.9.: Confusion matrix for the monolingual corpus-based measure of spontaneity and the 2-class bilingual model classification for 203 verbs found in the parallel corpus.

                      Monolingual Sp class
Parallel Sp class     Anticausative    Unspecified    Causative
Anticausative         52               19             0
Unspecified           1                32             35
Causative             6                24             34

Table 5.10.: Confusion matrix for the monolingual corpus-based measure of spontaneity and the 3-class bilingual model classification for 203 verbs found in the parallel corpus.

The agreement between the two automatic methods is 87% in the two-class setting and 58% in the three-class setting. Again, the class of anticausative verbs is rarely confused with the other two classes, while causative and cause-unspecified verbs are frequently confused (the agreement between the two classifications is at the chance level). The lack of agreement between the two methods, however, does not necessarily mean that the two classes are not distinguishable. It can also mean that the bilingual probabilistic model distinguishes between the two classes better than the monolingual ratio-based measure. The direct comparison of the two methods with the typology scale points in the same direction.

Figure 5.7.: The relationship between the syntactic realisations, the morphological form and the meaning of lexical causatives.
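The agreement figures reported for Tables 5.9 and 5.10 can be recomputed directly from the confusion matrices as the diagonal proportion; the counts below are those from the tables.

```python
# Recomputing the agreement from the confusion matrices in Tables 5.9 and
# 5.10: the sum of the diagonal cells over the total number of verbs (203).

def agreement(matrix):
    """matrix: square list of count rows; returns the fraction of agreement."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total

two_class = [[64, 13],
             [14, 112]]
three_class = [[52, 19, 0],
               [1, 32, 35],
               [6, 24, 34]]

print(round(agreement(two_class) * 100))    # 87
print(round(agreement(three_class) * 100))  # 58
```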
5.4. General discussion

The experiments performed in our study of morpho-syntactic realisations of lexical causatives relate various factors involved in the causative alternation. We have established a statistically significant correlation between the frequency distribution of syntactic alternants in a monolingual corpus and the frequency distribution of morphological marking on the verbs which participate in the causative alternation across languages. The verbs which tend to be used in intransitive clauses tend to bear causative marking across languages. The verbs which tend to be used in transitive clauses tend to bear anticausative marking across languages. This finding suggests that the underlying cause of both distributions is the meaning of verbs, as illustrated in Figure 5.7. The fact that a verb describes an event in which an external causer is very unlikely could be the reason why the verb occurs in intransitive clauses and why it bears causative marking across languages. The intransitive realisations could be due to the fact that the meaning of the verb is such that an external causer does not need to be expressed in most of its uses. This further implies that only one argument of the verb is expressed and that the opposition between the subject and the object is not needed, which gives rise to an intransitive structure. However, there is still a possibility for such a verb to be realised so that the external causer of the event is expressed (typically in a transitive clause). The realisations which are default for these verbs (intransitive) are not morphologically marked because they correspond to the general expectation. The realisations with an explicit external causer are morphologically marked because they are unexpected.
For the same reasons, the verbs which describe events in which an external causer is very likely tend to occur in transitive clauses, but when they are used intransitively, they tend to be morphologically marked. Although we have not excluded other possible explanations, the likelihood of an external causer in an event described by a verb seems to be a plausible underlying cause of the observed correlation.

5.4.1. The scale of external causation and the classes of verbs

The results of our experiments suggest that the alternating verbs are spread along a scale of the likelihood of external causation. The distribution of the corpus-based measure of the likelihood of external causation (the term Sp-value is used in our experiments) over a large number of verb instances in a corpus is normal, which implies that the most likely value is the mean value, and that extremely low and extremely high values are equally rare. This finding suggests that most of the alternating verbs can be expected to describe events in which there is around a 50% chance for an external causer to occur. These are the verbs which alternate in the majority of languages. However, the probability of an external causer can be very small for some verbs. These are the verbs which alternate only in some languages, while they do not alternate in the majority of languages. The same can be claimed for the verbs describing events with extremely likely external causers. Thus, the likelihood of external causation in the meaning of verbs explains the observed cross-linguistic variation in the availability of the causative alternation. The verbs which do not alternate in English can be expected not to alternate in a number of other languages too. The number of languages in which a verb can be expected not to alternate can be predicted from the likelihood of external causation in the event which it describes.
Although the scale of likelihood of external causation seems to be continuous, given that many different verbs are assigned distinct values, the results of our classification experiments suggest that some values can be grouped together. The anticausative part of the scale seems to be distinguishable from the rest of the scale. The verbs classified as anticausative in our experiments can be related to the verbs describing internally caused events discussed in the literature. Relating these two categories, however, requires redefining the role of internal causation in the account of the causative alternation. The verbs which are classified as anticausative in our experiments do alternate in English, while internal causation has been used to explain why some verbs do not alternate in English (see Section 5.2.1). Anticausative verbs include both the verbs which do and which do not alternate in English, but all of these verbs can be expected to alternate in fewer languages than the verbs in the middle of the scale. The question of whether there is a difference between the classes of cause-unspecified and causative verbs remains open, leading to a further question: if it turns out that these two classes cannot be distinguished, why do the two extremes of the scale behave differently with respect to classification? The data collected in our experiments do not seem to provide enough empirical evidence to address these issues. Although some tendencies seem to emerge in our classification experiments, more data from more languages would need to be analysed before answers to these questions can be offered.

5.4.2. Cross-linguistic variation in English and German

Unlike the previous research, which is either monolingual or typological, we choose to take a micro-comparative approach and to study the use of lexical causatives in English and German at the level of the token.
We consider these two languages, which are genetically and geographically very close, a minimal pair. We can expect fewer lexical types to be differently realised in English and German than would be the case in two distant languages, with fewer potential sources of variation. On the other hand, if a lexical type is inconsistently used in English and German, inconsistent realisations of the type can be expected in any two languages. This approach is in line with some recent trends in theoretical linguistics (discussed in Section 3.1.1 in Chapter 3). Despite the fact that English and German are closely related languages, systematically different realisations of lexical causatives could be expected on the basis of the grammatical and lexical differences that have already been identified in the literature. It has been noticed that the sets of alternating verbs in these languages are not the same (Schafer 2009). English verbs of moving, such as run, swim, walk, fly, alternate, having both an anticausative and a causative version. Their lexical counterparts in German can only be found as intransitive. The causative use of these verbs in English necessarily requires a transformation in the German translation. On the other hand, some verbs can alternate in German, but not in English. For example, the verb verstärken ('reinforce') in German has an anticausative version (sich verstärken), while in English, only the causative version is possible. The equivalent of the German anticausative verb is the expression become strong. At a more general level, it has been claimed that the relations between the elements of the argument structure of German verbs are more specified than those of English verbs, especially in prefixed verbs (Hawkins 1986).
Given the view, put forward in qualitative analyses of lexical causatives (see Section 5.2 for more details), that the degree of specificity of a verb's meaning can influence the alternation, this difference might have an influence on the way the verbs are realised in the two languages. From the morphological point of view, the alternation is marked differently in the two languages. While English shows a preference for labile verb pairs (see Section 5.5), German uses both anticausative marking and labile pairs. This difference is another factor that could potentially influence the realisations of the verbs. Although we have not examined the influence of these factors directly, the results of our experiments suggest that including the data from German in the automatic estimation of the likelihood of external causation changes the estimation so that it corresponds better to the typological ranking of verbs in the three-way classification setting. This can be interpreted as an indication that lexical causatives are realised differently in English and German, despite the fact that the two languages are similar in many respects. The difference is big enough to neutralise some language-specific trends in the realisations of lexical causatives, such as, for example, the preference for transitive clauses in English, discussed at the beginning of Section 5.3.4.

5.4.3. Relevance of the findings to natural language processing

Studying the alternation in lexical causatives is not only interesting for theoretical linguistics. As discussed in Chapter 2, formal representations of the meaning of verbs are extensively used in natural language processing too. Analysing the predicate-argument structure of verbs proves important for tasks such as word sense disambiguation (Lapata and Brew 2004), semantic role labelling (Màrquez et al. 2008), and cross-linguistic transfer of semantic annotation (Padó and Lapata 2009; Fung et al. 2007; van der Plas et al.
2011). Several large-scale projects have been undertaken to represent semantic properties of verbs explicitly in lexicons such as WordNet (Fellbaum 1998), VerbNet (Kipper Schuler 2005), and PropBank (Palmer et al. 2005a). Since the causative alternation involves most verbs, identifying the properties of verbs which allow them to alternate is important for developing representations of the meaning of verbs in general. The findings of our experiments provide new facts which could be useful in two natural language processing domains. First, the position of a verb on the scale of the likelihood of external causation can be used to predict the likelihood for a clause to be transformed across languages. Generally, the verbs in the middle of the scale can be expected to be used in a parallel fashion across languages, while placement of a verb on the extremes of the scale gives rise to divergent realisations. However, studying the other factors involved in the realisations in each particular language would be required to predict the exact transformation. Second, the knowledge about the likelihood of external causation might be helpful in the task of detecting implicit arguments of verbs and, especially, of deverbal nouns (Gerber and Chai 2012; Roth and Frank 2012). Knowing, for example, that a verb is on the causative side of the scale increases the probability of an implicit causer if an explicit causer is not detected in a particular instance of the verb.

5.5. Related work

Frequency distributions of transitive and intransitive realisations of lexical causatives in language corpora have been extensively studied in natural language processing, from the work on verb subcategorisation (Briscoe and Carroll 1997) and argument alternations (McCarthy and Korhonen 1998; Lapata 1999) to current general distributional approaches to the meaning of words (Baroni and Lenci 2010) (see Section 2.3 in Chapter 2 for more details).
Here we focus on the work which addresses the notion of external causation itself in a theoretical framework. McKoon and Macfarland (2000) address the distinction between verbs denoting internally caused and externally caused events. Their corpus study of twenty-one verbs defined in the linguistic literature as internally caused change-of-state verbs and fourteen verbs defined as externally caused change-of-state verbs shows that the appearance of these verbs as causative (transitive) and anticausative (intransitive) cannot be used as a diagnostic for the kind of meaning attributed to them. Since internally caused change-of-state verbs do not enter the alternation, they were expected to be found in intransitive clauses only. This, however, was not the case. The probability for some of these verbs to occur in a transitive clause is actually quite high (0.63 for the verb corrode, for example). More importantly, no difference was found in the probability of the verbs denoting internally caused and externally caused events to occur as transitive or as intransitive. This means that the acceptability judgements used in the qualitative analysis do not apply to all the verbs in question, and, also, not to all the instances of these verbs. Even though the most obvious prediction concerning the corpus instances of the two groups of verbs was not confirmed, the corpus data were still found to support the distinction between the two groups. Examining 50 randomly selected instances of transitive uses of each of the studied verbs, McKoon and Macfarland (2000) find that, when used in a transitive clause, internally caused change-of-state verbs tend to occur with a limited set of subjects, while externally caused verbs can occur with a wider range of subjects. This difference is statistically significant.
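Probabilities such as the 0.63 reported for corrode are relative frequencies over corpus counts. A minimal sketch of such an estimate is given below; the add-one smoothing, which keeps rarely attested verbs away from the extremes of the scale, is our illustrative choice and not necessarily what the cited study or the present thesis uses.

```python
# Hedged sketch: probability of transitive use of a verb, estimated from
# corpus counts with optional add-one (Laplace) smoothing. The smoothing
# constant is an illustrative assumption.
def p_transitive(n_transitive, n_intransitive, alpha=1.0):
    return (n_transitive + alpha) / (n_transitive + n_intransitive + 2 * alpha)

# With alpha=0 this reduces to the raw relative frequency, e.g. a verb
# observed 63 times in transitive and 37 times in intransitive clauses:
raw = p_transitive(63, 37, alpha=0.0)  # 0.63
```

With smoothing, a verb seen only once transitively is not assigned probability 1.0, which matters when ranking a large sample of verbs on such a scale.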
The relation between frequencies of certain uses and the lexical semantics of English verbs is explored by Merlo and Stevenson (2001) in the context of automatic verb classification. Merlo and Stevenson (2001) show that information collected from instances of verbs in a corpus can be used to distinguish between three different classes which all include verbs that alternate between transitive and intransitive use. The classes in question are manner of motion verbs (5.20), which alternate only in a limited number of languages, externally caused change of state verbs (5.21), alternating across languages, and performance/creation verbs, which are not lexical causatives (5.22).

(5.20) a. The horse raced past the barn.
b. The jockey raced the horse past the barn.

(5.21) a. The butter melted in the pan.
b. The cook melted the butter in the pan.

(5.22) a. The boy played.
b. The boy played soccer.

In the classification task, the verbs are described in terms of features that quantify the relevant aspects of verbs’ use on the basis of corpus data. The three main features are derived from the linguistic analysis of the verbs’ argument structure. The feature transitivity is used to capture the fact that transitive use is not equally common for all the verbs. It is very uncommon for manner of motion verbs (5.20b), much more common for change of state verbs (5.21b), and, finally, very common for performance/creation verbs (5.22b). This means that manner of motion verbs are expected to consistently have a low value for this feature. Change of state verbs are expected to have a middle value for this feature, while a high value of this feature is expected for performance/creation verbs. The feature causativity represents the fact that, in the causative alternation, the same lexical items can occur both as subjects and as objects of the same verb. This feature is expected to distinguish between the two causative classes and the performance class.
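As a rough illustration, the transitivity and causativity features could be computed from parsed corpus instances along the following lines. The instance encoding and the exact feature definitions are our simplifying assumptions, not Merlo and Stevenson’s implementation.

```python
# Illustrative sketch of corpus-based features in the spirit of Merlo and
# Stevenson (2001). The instance format is a hypothetical simplification
# of what a parser would deliver for one verb.
from collections import Counter

def verb_features(instances):
    """instances: dicts like {"transitive": True, "subject": "cook",
    "object": "butter"}, one per corpus occurrence of the verb."""
    n = len(instances)
    # Transitivity: share of transitive uses of the verb.
    transitivity = sum(i["transitive"] for i in instances) / n
    # Causativity: how often the same lemmas occur both as subject and
    # as object of the verb (non-zero for alternating causatives).
    subjects = Counter(i["subject"] for i in instances)
    objects = Counter(i["object"] for i in instances if i["object"])
    overlap = set(subjects) & set(objects)
    causativity = sum(subjects[w] + objects[w] for w in overlap) / (2 * n)
    return {"transitivity": transitivity, "causativity": causativity}
```

For an alternating verb such as melt, lemmas like butter appear on both sides of the verb, pushing causativity above zero, whereas for a performance verb such as play the subject and object sets barely overlap and the feature stays near zero.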
The feature animacy is used to distinguish between the verbs that tend to have animate subjects (manner of motion and performance verbs) and those that do not (change of state verbs). The results of the study show that the classifier performs best if all the features are used. They also show that the discriminative value of the features differs when they are used separately and when they are used together, which means that the information about the use of verbs that they encode is partially overlapping. In our study, we draw on the fact, established by the empirical approaches to the causative alternation presented here, that the lexical properties of verbs are reflected in the way they are used in a corpus. As in these studies, we consider frequencies of certain uses of verbs an observable and measurable property which serves as empirical evidence of the lexical properties. We explore this relationship further, relating it to a deeper level of theoretical semantic analysis of verbs and to the typological distribution of grammatical features.

5.6. Summary of contributions

The experiments presented in this chapter provide empirical evidence that contributes to a better understanding of the relationship between the semantics of lexical causatives, their formal morphological and syntactic properties, and the variation in their use. First, we have shown that the distribution of morphological marking on lexical causatives across a wide range of languages correlates with the distribution of their two syntactic realisations (transitive and intransitive) in a corpus of a single language. We have argued that the underlying cause of this correlation can be the meaning of lexical causatives, more precisely, the likelihood of an external causer in an event described by a verb.
We have then proposed a monolingual corpus-based measure of the likelihood of an external causer which is automatically calculated for a wide range of alternating verbs. Having assigned the likelihood values to a large sample of verbs, we have analysed the distribution of the syntactic alternants of the verbs in the cross-linguistic English and German realisations. The analysis was performed on data automatically extracted from parallel corpora. It showed that the likelihood of external causation in the meaning of verbs influences the cross-linguistic variation. The realisations which are consistent with this value tend to be parallel across languages. Contrary to this, the realisations in one language which are not consistent with the meaning of the verb tend to be transformed into consistent realisations in the other language. We have shown that the automatic assessment of the likelihood of external causation in the meaning of verbs is more congruent with the typological data if it is based on the information on the realisations of lexical causatives in two languages than if it is based on the monolingual information, despite the fact that the two studied languages, English and German, are typologically and geographically very close. To demonstrate this, we have designed a probabilistic model which classifies verbs into external causation classes taking into account cross-linguistic data. We have then evaluated the classification against verb ordering based on the typological distribution of causative and anticausative morphological marking of lexical causatives, as well as against the ordering based on the distribution of the syntactic alternants of lexical causatives in a monolingual corpus. To address the ongoing theoretical discussion on the number of semantic classes of lexical causatives, we have tested two versions of the model: a two-class and a three-class version.
These tests have not provided conclusive results, but they have pointed out some potential tendencies which can be further investigated.

6. Unlexicalised learning of event duration using parallel corpora

6.1. Introduction

Sentences denote events and states that can last from less than a second to an unlimited time span. The time span in which an event or a state takes place is understood mostly implicitly. The interpretation of the time in which the event takes place is sometimes influenced by the adverbials found in the sentence or by other structural elements, but the time span itself is hardly ever explicitly formulated. Consider the following example:

(6.1) Winston stayed in the shop for two hours.

The time span for which the relation expressed in (6.1) holds is specified with the adverbial two hours. However, even in this sentence, where the time adverbial is as explicit as it can be in natural language, we understand that Winston stayed in the shop some time “around two hours and not more” due to the automatic inference mechanisms called conversational implicatures (Grice 1975; Levinson 1983; Moeschler and Reboul 1994). The time span in this sentence is clearly not meant to be interpreted as “at least two hours”, which is, in fact, the truth-conditional meaning of the adverbial. Eliminating this second interpretation is based on implicit understanding. Sentences of natural language provide various clues to infer the implied time interpretation. We illustrate how the clues guide our interpretation with the examples in (6.2).

(6.2) a. Winston looked around himself for a while before he quickly put the book in the bag.
b. Winston looked around himself quickly before he started putting the book in the bag.
c. Winston looked around himself quickly before he started putting the books in the bag.
The adverbials for a while and quickly in (6.2a) do not directly quantify the duration of the events of looking around and putting the book in the bag respectively, but rather suggest appropriate readings. For instance, it is appropriate to understand that the former event lasted longer than the latter and that neither of them lasted more than a minute. If we rearrange the constituents as in (6.2b), then the appropriate understanding is that the former event was shorter than the latter. The event of putting the book in the bag is obviously longer in (6.2b) than in (6.2a), but its duration is also less specified. The event of putting the books in the bag in (6.2c) is interpreted as the longest of the three, which is due to the plural form of the head of the direct object (books) (see Krifka (1998) for more details on the relationship between the meaning of the direct object of a verb and the temporal properties of the event described by the verb). The examples in (6.2) show that the interpretation of the duration of an event depends not just on the time adverbials, but also on the semantics of the verbs, and even on the semantics of their complements. All the events in (6.2) are interpreted as lasting up to several minutes, with the described variation. The common time range can be related to the meaning of the verbs which are used to describe the events. The time span is further specified by the time adverbials that are used in the sentences, with each adverbial setting resulting in a slightly different interpretation, as in (6.2a-b). Finally, as we see in (6.2c), a particular interpretation can be the result of a verb-object combination. While time adverbials can guide our intuition in selecting the most appropriate time interpretation, they cannot be taken as reliable quantifiers of events. The time span over which an event holds can be underspecified, despite a very precise time expressed by the time adverbial. This is the case in (6.3a).

(6.3) a.
On 29th November 1984 at 7:55, Winston turned the switch.
b. On 29th November 1984 at 7:55, Winston turned the switch and the whole building disappeared.

The default interpretation of (6.3a) is that the event is very short, shorter than a minute, and that it takes place at some time around 7:55. This interpretation is suggested in the context given in (6.3b). Note that the causal relation between Winston turning the switch and the building disappearing is inferred rather than encoded (Lascarides and Asher 1993; Wilson and Sperber 1998). The same sentence can denote a situation where the two events are unrelated. Nevertheless, the causal relation is automatically inferred as a conversational implicature, which then suggests that everything, including the turning of the switch, happened in a very short time span. Without additional contextual elements, the sentence in (6.3a) can be assigned another interpretation. It can denote a situation where Winston turned the switch over an unlimited time span which includes the point described with the time adverbial. We do not know when the event started, we know that it is true on 29th November 1984 at 7:55, and we do not know whether (and when) it ended. The essentially implicit nature of understanding how long an event lasted is what makes the task of automatic identification of event duration especially difficult. In automatic processing of natural language, the intuition used by a human interpreter to detect the intended implicit meaning has to be replaced by more explicit and rational reasoning in which the relationship between the linguistic clues and the time interpretation has to be fully specified. Yet the information about the duration of an event can rarely be read directly from the lexical representation of the time adverbials in the sentences, as the examples listed above show.
Some reasoning is required to assign the correct intended interpretation even to time adverbials such as two hours in (6.1), which appear explicit and straightforward. The relationship between the linguistic clues and the resulting time interpretation is even less specified in (6.2), and it is unspecified in (6.3a), where it cannot be used for narrowing down the possible interpretations. Apart from the fact that one of the two interpretations is generally more likely, the sentence (6.3a) does not contain any elements which can point to the interpretation which is appropriate in the given context. Even though there is a time adverbial in the sentence, it is not useful for disambiguating between the two interpretations. The clues for the disambiguation would have to be found elsewhere. The work on automatic identification of event duration, despite the difficulty, is motivated by the importance that it has in natural language understanding. Using natural language sentences as statements on the basis of which we want to infer some actual state of affairs often requires knowing the time span over which a certain situation or event is true. For instance, the sentence in (6.3a) can be taken as a basis for inferring the state of the switch at 8:00, or whether Winston was still turning the switch at 8:30. The interpretation suggested in (6.3b) makes it almost certain that the switch is in a different position at 8:00 than it was before 7:55, and also that the event of Winston turning the switch is not true at 8:30. These inferences cannot be made starting with the temporally unbounded interpretation of the same sentence. One approach to dealing with the incomplete information about the time value of a natural language sentence is to rely on the semantic representation of the whole discourse to which the sentence in question belongs.
The interpretation of the time expressed in the sentence is deduced from the representation of the time adverbials (or, possibly, other linguistic units which can point to the relevant time) found in other sentences, and from certain knowledge about the structure of the discourse. A narrative discourse would impose chronological sequencing of sentences, while the sentence sequencing in an argumentative discourse would be only partly chronological, with many inversions (stepping back in time). Knowing how the events are sequenced is helpful in determining which one is true at which point in time. There are, however, two problems with this approach. First, it is computationally complex, because it requires working with complex representations of large chunks of discourse at the same time. Second, pieces of discourse usually do not belong to a single type, but rather include the characteristics of several types at the same time (Biber 1991). With multiple types present in the same sequence of sentences, it is hard to see which type should be used for which sentence. Another approach, the one which we pursue in this study, is to rely on an elaborate semantic representation of verbal predicates, which are at the core of event denotation. In this study, we search for the elements of the lexical representation of verbs which can provide useful information for determining the duration of the events that they describe. We make hypotheses about the nature of these elements on the basis of theoretical arguments put forward in the linguistic literature. More specifically, we study verb aspect as a lexical and grammatical category that has been widely discussed in relation to the temporal properties of the meaning of verbs.
We consider a range of theoretical insights about what semantic traits are encoded by verb aspect and how these traits can be related to event duration, which is a category of interest for automatic construction of real-world representations in natural language understanding. In the experiment presented in this study, we explore linguistic encoding and the possibility of cross-linguistic transfer of aspect information. It is a well-known fact that Slavic languages encode verb aspect in a much more consistent way than most of the other European languages. However, the mechanism which is used to encode aspect in Slavic languages is lexical derivation of verbs, not syntactic or morphological rules. The consequence of the lexical nature of aspect encoding is that the derivational patterns are not regular, but rather idiosyncratic and unpredictable, presenting numerous challenges for generalisations. In our study, we take a data-driven probabilistic approach to aspect encoding in Serbian as a Slavic language. We develop a cross-linguistic aspectual representation of verbs using automatic token-level alignment of verbs in English and Serbian. We then apply this representation to model and predict event duration classes. This approach is based on the assumption that aspectual meaning is preserved across languages at the level of lexical items (see Section 3.1 in Chapter 3 for a more elaborate discussion). In the remainder of this chapter, we first explain the theoretical notions on which our experiments are based. In Section 6.2.1, we present the aspectual classes of verbs most commonly referred to in the literature and describe their potential relationship with event duration. The observable properties of semantic aspectual classification of verbs in general are discussed in Section 6.2.2.
Since, unlike English, Serbian encodes temporal properties of verb meaning in a relatively consistent way, we then show how aspectual information is encoded in the morphology of Serbian verbs in Section 6.2.3. After laying down the theoretical foundations, we describe our cross-linguistic data-driven approach to representing verb aspect (Section 6.3). We proceed by describing the experiment performed to test whether event duration is predictable from the representation of aspect in Section 6.4. Our statistical model which is designed to learn event duration from the representation of aspect is described in Section 6.4.1. In Section 6.4.2, we describe the data used for the experiment, as well as various experimental settings. The results of the evaluation are discussed in Section 6.4.2. After a general discussion (Section 6.5), our approach is compared to related work in Section 6.6.

6.2. Theoretical background

Events described by verbs have different temporal properties: they can start and end at some point in time, they can be long, short, instantaneous, or somewhere in between these categories, they can overlap with other events or a particular point in time, the overlap can be partial or full, and so on. However, not all of these properties are equally relevant for the grammatical meaning of verbs. Some temporal properties give rise to certain grammatical structures, while others are grammatically irrelevant. The main issue in the linguistic theory of verb aspect is the kind of temporal meaning which it encodes. In this section, we give an overview of the main accounts of the relationship between the temporal meaning and the form of verbs.

6.2.1. Aspectual classes of verbs

It has been argued in the linguistic literature that verbs can be divided into a (small) number of aspectual classes.
A verb’s membership in an aspectual class is argued to play an important role in interpreting time relations in discourse. Dowty (1986) discusses contrasts such as the one shown in the sentences in (6.4a-b).

(6.4) a. John entered the president’s office. The president woke up.
b. John entered the president’s office. The clock on the wall ticked loudly.

The interpretation of (6.4a) is that the president woke up after John entered his office, while the interpretation of (6.4b) is that the clock ticked loudly before and at the same time as John’s entering the president’s office. Dowty (1986) argues that the contrast is due to the fact that the verbs wake up and tick belong to two different aspectual classes.

Eventuality described by a verb:
Unbounded — State (have, know, believe); Activity (swim, walk, talk)
Bounded — Achievement (arrive, realise, learn); Accomplishment (give, teach, paint)
Figure 6.1.: Traditional lexical verb aspect classes, known as Vendler’s classes

The aspectual classification of verbs depends on a set of lexical properties which describe the dynamics and the duration of the event described by the verb. A verb can describe a stative relation (as in (6.1)). We say that these verbs describe states. Otherwise, a verb can describe a dynamic action (as in (6.2-6.4)). States are usually considered temporally unbounded, while actions can be unbounded and bounded.[1] The temporal boundaries to which we refer here are those that are implicit to the meaning of the verb. Although the state in (6.1), for example, is temporally bounded to two hours, this is imposed by the time adverbial. The meaning of the verb stay itself does not imply that there is a start or an end point of the state described by it. In contrast to this, the meaning of a verb such as wake up in (6.4a) does imply that there is a point in time where the action described by the verb (waking up in this case) is completed.
Such verbs are said to describe temporally bounded events, usually termed telic actions in the literature. (In this sense, states are always atelic.) Actions can also be temporally unbounded (atelic). This is the case with the clock ticking in (6.4b). Even though this action consists of repeated, temporally bounded actions of ticking, the verb is understood as describing a single temporally unbounded action, usually termed an activity. The difference in the existence of an implicit time boundary in the interpretation of the verbs wake up and tick is precisely what creates the difference in the interpretation of the event ordering in (6.4a) and (6.4b). Temporally bounded actions can be bounded in different ways. Most commonly, a distinction is made between the actions that last for some time before they finish, known as accomplishments, and the actions which both begin and end in a single point of time, known as achievements. Typical examples of verbs that describe accomplishments are build, give, teach, paint, and those that describe achievements are arrive, realise, learn. Accomplishments are usually thought of as telic actions, as they point to the end of an action. Achievements, on the other hand, are frequently described as inchoative, which means that they point to the beginning of an action. This taxonomy of four aspectual types, summarised in Fig. 6.1, is often referred to as Vendler’s aspectual classes (Vendler 1967).

[1] The term event, as it is used in the linguistic literature, sometimes does not include states, but only dynamic aspectual types. The general term which includes all aspectual types is eventuality. However, this distinction is not always made, especially not in computational approaches, and the term event is used in the more general sense, the one covering all aspectual types. In our study, we use the term event in the general sense.
It has a long tradition in linguistic theory, but it cannot be taken as a reference classification, as more recent work on verb aspect shows. The distinction between the entities which are temporally unbounded and those which are bounded seems much easier to make than the distinctions referred to at the second level of the classification. Dowty (1986) proposes a precise semantic criterion for distinguishing between the two:

(a) A sentence[2] ϕ is stative iff it follows from the truth of ϕ at an interval I that ϕ is true at all subintervals of I. (e.g. if John was asleep from 1:00 until 2:00 PM, then he was asleep at all subintervals of this interval: be asleep is a stative).

(b) A sentence ϕ is an activity (or energeia) iff it follows from the truth of ϕ at an interval I that ϕ is true of all subintervals of I down to a certain limit in size (e.g. if John walked from 1:00 until 2:00 PM, then most subintervals of this time are times at which John walked; walk is an activity.)

(c) A sentence ϕ is an accomplishment/achievement (or kinesis) iff it follows from the truth of ϕ at an interval I that ϕ is false at all subintervals of I. (E.g. if John built a house in exactly the interval from September 1 until June 1, then it is false that he built a house in any subinterval of this interval: build a house is an accomplishment/achievement.) (Dowty 1986: p. 42)

[2] Although the units discussed here are sentences, Dowty (1986) explicitly applies the same criteria to lexical items and functional categories.

There are two points to note about this criterion. First, the main difference is made between (a) and (b) on one side and (c) on the other. The difference between (a) and (b) is only in the degree to which the implication is true: in (a) it is true for all subintervals, while in (b), it is true for most subintervals. This difference does not matter in the contrasts illustrated in (6.4).
With respect to the interpretation of time ordering, items defined in (a) and in (b) behave in the same way. Second, the distinction between accomplishments and achievements is not made at all. The reason for this is not just the fact that the distinction, like the distinction between (a) and (b), does not play a role in the contrasts addressed in Dowty’s study, but also the fact that there are no clear criteria for distinguishing between the two. Dowty (1986) argues that the duration criterion evoked in the literature does not apply. Relating the distinction between bounded and unbounded entities to only dynamic types, as we have been doing so far for clarity of presentation of the traditional taxonomy, does not entirely correspond to the real semantics of verbal predicates. Marín and McNally (2011) show that some verbs which would traditionally be classified as achievements (Spanish aburrirse ‘to be/become bored’ and enfadarse ‘to become angry’) are in fact states, even though they are temporally bounded (inchoative). Other authors have proposed other criteria for defining aspectual classes. Most approaches analyse events in an attempt to define their components, such as start or result. The classes are then derived as different combinations of the components (Pustejovsky 1995; Rothstein 2008; Ramchand 2008). In our study, no particular classification or event structure is adopted. We use the notions of temporal boundedness and duration, which are well defined and endorsed by most of the studies discussed above, but we do not adopt any of the traditional classes which are defined by particular combinations of the values of these categories. Since traditional aspectual classes are questionable, as shown in the discussion above, we propose our own approach to combining the values of boundedness and duration in forming aspectual classes.
Our representation of aspect is based on the traditionally discussed notions, but the resulting categories do not correspond to any of the traditionally defined classes. As opposed to Dowty (1986), our study does not address orderings of events. We are interested in the duration of each event separately. Regarding the examples in (6.4), for instance, we are not interested in knowing whether the president woke up before or after John entered his office. Our questions are: How long does John’s entering the president’s office last? How long is the president’s waking up and the clock’s ticking? They are related to the sequencing of the parts of discourse, but they can be treated separately. We try to answer these questions using the knowledge about verbs’ aspectual classes, which are defined based on the notion of temporal boundedness. The intuition behind this goal is that temporal boundedness and duration are related. It is reasonable to expect that short events are temporally bounded. It is easier to imagine a time boundary in something that lasts several seconds than in something that lasts a hundred years. Long events can be expected to be less temporally bounded. Note that our expectations are probabilistic. We do not exclude the possibility for a short event to be temporally unbounded and for a long event to be bounded. However, we expect short temporally unbounded events to be less likely than short temporally bounded events, and long temporally bounded events to be less likely than long temporally unbounded events. We expect these dependencies to be strong enough so that the duration of an event can be predicted from the aspectual properties of the verb that expresses it.

Present/Past Simple → unspecified
Present/Past Continuous → activities
Present/Past Perfect → bounded (achievements/accomplishments)
Table 6.1.: A relationship between English verb tenses and aspectual classes.

6.2.2.
Observable traits of verb aspect

The criterion for distinguishing temporally bounded and unbounded verbs defined by Dowty (1986) is a truth-conditional test which is well suited for querying human logical intuition. To perform such a test automatically, a system would need a comprehensive knowledge database, with all truth-conditions and inference rules explicitly encoded for each expression. The size of such a database and the resources needed to create it, as well as to search it, are hard to assess, but such a project would certainly be a challenging one. What would be more suitable for automatic identification of temporal boundedness is to be able to observe formal differences between unbounded and bounded events. This brings up the question of how aspectual classes can be observed in verbs. The form of the verbs listed in Fig. 6.1, for example, clearly does not vary depending on the class membership: verbs belonging to the same class have nothing in common. By considering only the form of a verb, we cannot determine its aspectual class. When verbs are used in a sentence, however, they receive a tense form, and some of the verb tenses in English do encode certain aspectual properties. As illustrated in Table 6.1, continuous tenses tend to refer to activities, while perfect tenses indicate that the event is temporally bounded. The tense with which a verb is used can override the inherent aspectual class of the verb lexical entry. For example, the verb realise is usually classified as an achievement, but the Present Continuous Tense form, such as the one in (6.5), cannot be assigned this class.

(6.5) People are slowly realising what is going on.

The tense form of English verbs is a useful marker for identifying their aspect, but the range of the examples in which this relation can be used for automatic identification is limited to the sentences which contain either continuous or perfect tense.
However, sentences with marked continuous or perfect tense are far less frequent than sentences with simple or bare forms, which do not point to any particular aspectual class. Another potential source of observations are the distributional properties of verbs. A number of distributional tests to identify aspectual classes have been proposed in the literature, starting with Dowty (1979). Interestingly, the proposed tests do not make reference to the distinction between bounded and unbounded events, but to the second level of the traditional taxonomy (Fig. 6.1). For example, the most famous test, compatibility with in-adverbials vs. for-adverbials, shown in (6.6), differentiates between states and activities on one side and accomplishments on the other. States and activities are compatible with for-adverbials, and accomplishments are compatible with in-adverbials. This test does not apply to achievements. Other tests distinguish between, for example, states and the other classes, and so on.

(6.6) a. State: Winston stayed in the shop for two hours / ??in two hours.
      b. Activity: The clock in the president's office ticked for two hours / ??in two hours.
      c. Accomplishment: Winston put the books in the bag ??for two seconds / in two seconds.
      d. Achievement: The president woke up ??for two seconds / ??in two seconds.

Apart from the fact that the categories which are identified with the distributional tests are not clearly defined, the problem with such tests is that English verbs are highly ambiguous between different aspectual readings, so that adverbials can impose almost any given reading. The use of the verbs in (6.6) with the compatible adverbials is preferred, but the use with the incompatible adverbials is not ungrammatical. The incompatible adverbial imposes a less expected, potentially marginal reading, but, with this reading, the sentence remains grammatical.
Moreover, many verbs are perfectly compatible with different contexts, such as write in (6.7).

(6.7) a. Activity: Winston wrote the message for two hours.
      b. Accomplishment: Winston wrote the message in two hours.

Unlike English, other languages make formal differences between aspectual classes, and these differences can be observed in the structure of verbal lexical entries. The form of verbs in Slavic languages, for example, famously depends on some aspectual properties of the events that they describe. In the remainder of this section, we show how this marking is realised in Serbian, as one of the Slavic languages.

6.2.3. Aspect encoding in the morphology of Serbian verbs

The inventory of Serbian verbs contains different entries for describing temporally unbounded and temporally bounded events. Consider, for example, the Serbian equivalents of the sentence in (6.7), given in (6.8-6.9). The verbs pisao in (6.8) and napisao in (6.9) constitute a pair of lexical entries in fully complementary distribution: pisati (the infinitive form of pisao) is used for temporally unbounded events, and napisati is used for temporally bounded events. As we can see in (6.8-6.9), exchanging the forms between the unbounded and bounded contexts makes the sentences ungrammatical. The forms that are used in the temporally unbounded context are called imperfective and those that are used in the temporally bounded context are called perfective.

(6.8) Vinston je pisao/*napisao poruku dva sata.
      Winston-nom aux wrote message-acc two hours.
      'Winston wrote a message for two hours.'

(6.9) Vinston je napisao/*pisao poruku za dva sata.
      Winston-nom aux wrote message-acc for two hours.
      'Winston wrote a message in two hours.'
pref(x) = P 'complete a specified x'        | suff(pref(x)) = I 'do pref(x) continuously or repeatedly' | pref(suff(pref(x))) = P 'complete multiple pref(x)'
skuvati 'cookP'                             | —            | —
prokuvati 'boilP briefly'                   | prokuvavati  | isprokuvavati
iskuvati 'cookP well'                       | iskuvavati   | iziskuvavati
otkuvati 'cleanP by boiling'                | otkuvavati   | izotkuvavati
zakuvati 'addP something into boiling liquid' | zakuvavati | izakuvavati

Table 6.2.: Serbian lexical derivations, e.g. x = kuvati (I) 'cookI', basic form; I stands for imperfective, P for perfective.

The two verbs in (6.8-6.9) are obviously morphologically related. The perfective verb is derived from the imperfective by adding the prefix na-. This case represents the simplest and most straightforward relation between an imperfective and a perfective verb, usually considered prototypical. In reality, the derivations are far more complex, involving both lexical and aspectual modifications of the basic entry. The category of temporal boundedness underlies the two major aspectual classes in Serbian (perfective and imperfective), but it also interacts with some other factors, resulting in a more fine-grained classification, which does not necessarily match the classifications mentioned in Section 6.2.1. An illustration of derivations involving the verb kuvati (cook) is given in Table 6.2. We can see in Table 6.2 that the verbs are organised into aspectual sequences rather than pairs. Multiple affixes can be added to the same basic verb, modifying its meaning and its aspect at different levels. Each column in the table represents a step in the derivation. Each step can be seen as a function that applies to the result of the previous step. The forms in the first column are the result of adding a prefix to the basic form. The basic form is imperfective and adding the prefix turns it into a perfective.
This derivation is in many ways similar to the attachment of particles to English verbs, as the translations of the prefixed forms suggest (Milićević 2004). Adding a prefix results in a more specified meaning of the basic verb by introducing an additional resultative predication into the verb's lexical representation (Arsenijević 2007). We say that this derivation indirectly encodes verb aspect because prefixes are not aspectual morphemes. The change of aspect is a consequence of the fact that the result state introduced by the prefix makes the event temporally bounded. In some cases, this derivation can be further modified, as shown in the second column. By attaching a suffix, the verb receives a new imperfective interpretation which is ambiguous between progressive and iterative meaning, similar to the interpretation of tick in (6.4b). (Historically, the suffix is iterative.) This new imperfective, sometimes referred to as the secondary imperfective, is necessarily different from the starting imperfective of the basic form. The forms in the second column express events containing the resultative predicate introduced by the prefix, but with the time boundary suppressed by the suffix. Finally, the forms in the third column are again perfective, but this perfective is different from the one in the first column. These forms can be regarded as describing plural bounded events. In actual language use, these forms are much less frequent than the others. They can be found only in large samples of text, which is why we do not consider them in our experiments. Examples in (6.10-6.13) illustrate typical uses of the described forms of Serbian verbs.

(6.10) Basic imperfective:
       Vinston je često kuvao.
       Winston-nom aux often cooked.
       'Winston often cooked.'

(6.11) Prefixed perfective:
       Vinston je prokuvao čašu vode.
       Winston-nom aux boiled glass-acc water-gen.
       'Winston boiled a glass of water.'

(6.12) Secondary imperfective:
       Vinston je prokuvavao čašu vode (kada je čuo glas).
       Winston-nom aux boiled glass-acc water-gen (when aux heard sound-acc).
       'Winston was boiling a glass of water when he heard the sound.'

(6.13) Double-prefix plural perfective:
       Vinston je isprokuvavao sve čaše vode.
       Winston-nom aux boiled all-acc glasses-acc water-gen.
       'Winston boiled all the glasses of water.'

Not all prefixed verbs can be further modified. The verb skuvati in Table 6.2, for example, does not have the forms which would belong to the second and the third column. This phenomenon has been widely discussed in the literature, with many authors trying to determine what exactly blocks further derivations. It has been argued that prefixes can be divided into lexical (or inner) and superlexical (or outer), and that further derivations are not possible if the prefix is superlexical. This account, however, remains a subject of debate (Svenonius 2004b; Arsenijević 2006; Žaucer 2010). We ignore this difference, considering that there are no structural differences between the prefixes. Lexical and aspectual derivations are also possible with verbs whose basic form has perfective meaning. An illustration of this paradigm is given in Table 6.3. The aspect does not change for these verbs when they are prefixed. The verbs in the first column are perfective just like the basic form. The rest of the derivations proceed in the same way as for the imperfective basic forms. There are other patterns of lexical expression of aspectual classes in Serbian. For instance, some verbs take a perfective suffix (rather than a prefix) attached directly to the basic form (usually used to express very short, semelfactive events); some verbs do not have a basic form and are found only with prefixes; some perfective verbs have no imperfective counterparts and vice versa; etc.
However, the examples listed in this section form a general picture of how aspectual classes are morphologically marked in Serbian. A summary of possible verb forms is given in Fig. 6.2. What is important for our study is the fact that aspectual classes are observable in the verb forms, although the relationship between the form and the meaning is not simple.

suff(x) = I 'do x continuously or repeatedly' → bacati

pref(x) = P 'complete a specified x' | suff(pref(x)) = I 'do pref(x) continuously or repeatedly' | pref(suff(pref(x))) = P 'complete multiple pref(x)'
prebaciti 'transferP'                | prebacivati  | isprebacivati
izbaciti 'throwP out'                | izbacivati   | izizbacivati
ubaciti 'throwP in'                  | ubacivati    | izubacivati
odbaciti 'rejectP'                   | odbacivati   | izodbacivati

Table 6.3.: Serbian lexical derivations, e.g. x = baciti (P) 'throwP', basic form; I stands for imperfective, P for perfective.

verb → <outer prefix: iz-> <inner prefix: iz-, na-, u-, od-, ...> <stem> <suffix: imperfective -va-, -ja-; perfective -nu-; other> <inflection: tense, mood>

Figure 6.2.: Serbian verb structure summary

Serbian verb morphology encodes aspect only indirectly, but, unlike with English verb tenses, some kind of aspect information is present in almost all verb uses. Morphological expression of aspect in Serbian is also potentially less ambiguous, hence more helpful in determining verb aspect, than time adverbials and other elements which can be found in the context in which the verb is used. The described derivations can potentially encode numerous aspectual classes. Minimally, the verbs are divided into temporally bounded (perfective) and temporally unbounded (imperfective). However, combinations of different stems, prefixes and suffixes can result in more fine-grained classes.
For example, the secondary imperfectives (the second column in Table 6.2) do not have the same temporal properties as the starting, basic imperfective. As discussed above, the secondary imperfective contains the resultative meaning introduced by the prefix, while the bare form does not. As a consequence, the meaning of the secondary imperfective is more specified and more anchored in the context. This distinction might prove relevant for event duration: we can expect prefixed imperfective verbs to describe shorter events than basic imperfective verbs. Further distinctions can be encoded by potential dependencies between the prefixes and suffixes, or between the stems and the structural elements. In our study, we do not explore all the possible encoded meanings, but we do explore distinctions which are more complex than the simplest distinction between imperfective and perfective meaning. We represent aspectual classes as combinations of three attribute-value pairs. The attributes are defined on the basis of the analysis presented in this section.

6.3. A quantitative representation of aspect based on cross-linguistic data

The experimental approach to temporal meaning which we adopt in our study requires a relatively large set of examples of linguistically expressed events for which both event duration and verb aspect are known and explicitly encoded. Compiling such a data set is already a challenging task because we are dealing with intuitive notions referring to phenomena which are hard to observe and measure. Although we all might agree that some events last longer than others, assessing the exact duration of any particular event is something that we do not normally do. Collecting such assessments for a large number of cases is a task which requires significant effort.
Collecting verb aspect assessments is even more complicated because this is a theoretical notion which cannot be understood without linguistic training. Moreover, assessing an exact verb aspect value for a particular verb use proves difficult even for trained linguists, because there is no general consensus in the theory on what values this category includes and how they should be defined. One of the objectives of our work is, thus, compiling a set of data for our experimental work. The existing resources provide one part of the information that we need: human judgments of event duration have been annotated for a set of event instances (Pan et al. 2011). We collect the information about verb aspect for the same set of instances. We decide to collect our own verb aspect information because no theoretical account of this category can be taken as standard: using any existing annotation necessarily means adopting the view of verb aspect inherent to that annotation. Instead, we propose our own approach, which is data-driven and based on observable structural elements of verb forms. Since English verb morphology typically does not mark aspect, we gather the information from the translations of English verbs into Serbian, where, as in all other Slavic languages, verb aspect is indirectly encoded in the verbs' morphology. Instead of collecting human judgements, we observe natural language encoding of verb aspect in Serbian and use these observations to assign verb aspect properties to the verbs both in Serbian and in English. The information used for the assignment is automatically extracted from language corpora, which makes this approach especially suitable for annotating a large number of examples. We represent aspectual classes of verbs as combinations of the values of three attributes. The first attribute is the binary classification into the grammatical perfective and imperfective classes.
The second and the third attribute (the morphological attributes) specify whether the verb in question is used with a prefix and/or with a suffix, respectively. The combinations of the values of the three attributes represent aspectual classes. We use these classes to predict event duration, but we do not identify them with any aspectual classes already proposed in the literature. The values of the three attributes are assessed on the basis of the instances of verbs in a parallel corpus, which is described in the following section. The values of the aspectual attributes of verbs are determined automatically on the basis of cross-linguistic realisations of verbs in English and Serbian. Cross-linguistic assignment of aspectual classes is possible due to the fact that aspect is a property of the meaning of verbs, and the meaning is what is preserved in translating from one language to another, while the forms of its expression change. In our case, verb aspect is morphologically encoded in Serbian verbs, but the same aspectual class can be assigned to the corresponding English verbs. In this section, we describe our approach to cross-linguistic aspectual classification of verbs based on the described representation of aspect decomposed into attributes.

6.3.1. Corpus and processing

For transferring the verb aspect classification from Serbian to English, we need to know which Serbian verb is the translation of an English verb in a given context. This kind of information can be obtained from a corpus of translated sentences — a parallel corpus — which is aligned at the word level, so that the translation is known for each word of each sentence. For the purpose of predicting event duration on the basis of aspectual properties of verbs, which is the goal of our experiments, we need a word-aligned parallel corpus with annotated event duration on one side of the corpus. Such a corpus, however, does not exist.
Examples of annotated event duration are available in English, and we do not have Serbian translations of these sentences. We have to use other resources. There are several parallel English-Serbian corpora which are currently available (Tiedemann 2009; Erjavec 2010). In our current study we use only the Serbian translation of the novel "1984" by George Orwell, which was created in the MULTEXT-East project (Krstev et al. 2004; Erjavec 2010). We use this corpus because of the convenience of its manual annotation and its literary text genre, which is known to be rich in verbs (Biber 1991). In principle, our methods are applicable to all available parallel corpora.

Serbian
Word:  propagirao | je | svoju | jeres | , | zanosio | se | njome
Lemma: propagirati | jesam | svoj | jeres | # | zanositi | se | ona
MSD:   Vmps-sman-n—p | Va-p3s-an-y—p | Ps-fsa | Ncfsa–n | # | Vmps-sman-n—p | Q | Pp3fsi

English
Word:  proclaiming | his | heresy | , | exulting | in | it
Lemma: proclaim | his | heresy | # | exult | in | it
MSD:   Vmpp | Ds3—sm | Ncns | # | Vmpp | Sp | Pp3ns

Table 6.4.: An illustration of the MULTEXT-East corpus: manually encoded morphological analysis of a chunk of text in Serbian and its corresponding chunk in English.

The MULTEXT-East parallel corpus is available as an XML database containing two kinds of manual annotation:

• Morphological annotation — Each word in the text is assigned a lemma and a code called a morphosyntactic definition (MSD), which is a compact representation of a number of lexical, morphological, and syntactic categories realised in each word form. Each category is encoded by a single character in the label. The first character encodes the part of speech (verb, noun, adjective, etc.). The second character encodes a subclassification for each main category (e.g. main, auxiliary, modal, copula for verbs; common, proper for nouns, etc.). The remaining characters specify morphological features that are marked in the word form, such as case, number, tense, voice, and mood.
For example, the MSD label "Vmps-sman-n—p" denotes that the word propagirao is a main verb, in the past participle singular masculine active positive form. The last letter indicates that its aspect is imperfective. The letter "p" in the MSD code stands for "progressive", but in fact it encodes "imperfective", as described in Section 6.2.3. An illustration of the morphological annotation is shown in Table 6.4. Detailed specifications of the labels are provided in the MULTEXT-East project documentation.

• Sentence alignment — The information about which sentence in English corresponds to which sentence in Serbian is provided as an additional layer of annotation. The corpus is not aligned at the word level.

We obtain word alignments, which is the last piece of information needed for our study, automatically, using the methods described in the following subsection.

Automatic alignment of English and Serbian verbs in a parallel corpus. We extract the information about word alignments in the manually aligned Serbian and English sentences using the system GIZA++ (Och and Ney 2003), the same tool which is used in the previous two studies. The input to the system is tokenised sentence-aligned corpora in plain text format, with one sentence alignment chunk per line. We use the XML pointers in the manual alignment file of the MULTEXT-East corpus to convert the Serbian and English text to the format required by GIZA++. The conversion also includes temporarily removing the morphological annotation. We then perform word alignment in both directions: with English as the target language and with Serbian as the target language. Given the formal definition of word alignment which is used by the system (see Section 3.2 in Chapter 3), the amount and the correctness of word alignment depend very much on the given direction.
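The conversion step just described can be sketched as follows. This is an illustration under our own assumptions, not the original conversion script: the token structure (word, lemma, MSD triples) and file names are ours, and the actual extraction from the MULTEXT-East XML via its alignment pointers is not shown.

```python
# Illustrative sketch (not the thesis pipeline): write GIZA++-style input,
# one tokenised sentence-alignment chunk per line, annotation stripped.
# Each token is assumed to be a (word, lemma, msd) tuple.

def write_bitext(aligned_chunks, en_path, sr_path):
    """aligned_chunks: list of (en_tokens, sr_tokens) pairs; only the
    surface word forms are written out, separated by spaces."""
    with open(en_path, "w", encoding="utf-8") as en_f, \
         open(sr_path, "w", encoding="utf-8") as sr_f:
        for en_tokens, sr_tokens in aligned_chunks:
            en_f.write(" ".join(w for w, _, _ in en_tokens) + "\n")
            sr_f.write(" ".join(w for w, _, _ in sr_tokens) + "\n")

chunks = [
    ([("proclaiming", "proclaim", "Vmpp"), ("his", "his", "Ds3")],
     [("propagirao", "propagirati", "Vmps"), ("je", "jesam", "Va")]),
]
write_bitext(chunks, "corpus.en", "corpus.sr")
```

GIZA++ is then run twice over such a file pair, once in each translation direction, as described above.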
It is common practice in machine translation and in other tasks involving automatic word alignments to use the intersection of both directions of alignment, that is, only the alignments between words which are aligned in both directions (Och and Ney 2003; Padó 2007; Bouma et al. 2010; van der Plas et al. 2011). This approach gives very precise alignments, but only for a relatively small number of words. We do not follow this approach, since it leaves out many correct alignments which are potentially useful for our study, as shown in Chapter 4. Instead, we convert the alignment output into a format suitable for manual inspection. We then manually compare a sample of the alignments in the two directions and choose the one which gives more correct alignments. This, in our case, was the alignment with English as the target language. Once we have obtained word alignments, we combine them with the morphological annotation from the original corpus to extract only those alignments in which English verbs are aligned with Serbian verbs. In other words, we only keep alignments between words which both contain the "Vm" code in their respective morphosyntactic definitions (see Table 6.4). This simple method not only selects verbs, which we are interested in in our study, but also eliminates potentially wrong alignments. If a system aligns a verb with a noun, or an adjective, or any other category, chances are that this is a wrong alignment. On the other hand, even though it is possible for a verb in one language to be aligned with a wrong verb in the other language in a sentence which contains more than one verb, this happens relatively rarely in practice.
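The "Vm" filter described above can be sketched as follows. The token structure and the alignment-link format are simplifying assumptions (GIZA++ output would first be post-processed into such index pairs), and the MSD strings are shortened for readability.

```python
# Sketch of the verb-verb filter: keep an alignment link only if both the
# English and the Serbian token are main verbs (MSD starts with "Vm").
# Tokens are (word, lemma, msd) tuples; links are (en_index, sr_index).

def verb_alignments(en_tokens, sr_tokens, links):
    kept = []
    for i, j in links:
        en_word, _, en_msd = en_tokens[i]
        sr_word, _, sr_msd = sr_tokens[j]
        if en_msd.startswith("Vm") and sr_msd.startswith("Vm"):
            kept.append((en_word, sr_word))
    return kept

en = [("proclaiming", "proclaim", "Vmpp"), ("his", "his", "Ds3")]
sr = [("propagirao", "propagirati", "Vmps"), ("je", "jesam", "Va")]
links = [(0, 0), (0, 1), (1, 1)]  # (0, 1) pairs a verb with an auxiliary
print(verb_alignments(en, sr, links))  # [('proclaiming', 'propagirao')]
```

The auxiliary link and the determiner link are discarded, illustrating how the filter also removes likely alignment errors.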
6.3.2. Manual aspect classification in Serbian

With word-to-word alignment between English and Serbian verbs and with the manually annotated aspect code, which is contained in the morphological description of the words on the Serbian side, we can now see whether English verbs are aligned with perfective or imperfective Serbian verbs. This will determine the value of the first aspectual attribute (the simple binary aspect classification). We collect the following counts:

• For each verb form on the English side of the corpus:
  – the number of times it is aligned with a perfective Serbian verb
  – the number of times it is aligned with an imperfective Serbian verb

• For each verb lemma on the English side of the corpus:
  – the sum of the alignments of all the forms with a perfective Serbian verb
  – the sum of the alignments of all the forms with an imperfective Serbian verb

We collect the counts at the level of verb type because some of the verb tenses in English can indicate a particular aspectual context, as shown in Section 6.2.2. This implies that aspectual classes assigned to verb forms separately are expected to be more precise than assigning the same class to all the forms of one lemma. Summing up the counts for each lemma, on the other hand, is useful as a kind of back-off for classifying verb forms which are not observed in the parallel corpus. If any other form of the same lemma is observed, then the count of alignments for the lemma is not zero, and the unobserved form can be assigned the value which is assigned to the lemma.

6.3.3. Morphological attributes

The binary grammatical category of perfective and imperfective aspect does not represent all the aspectual properties of Serbian verbs which are encoded in the morphology. As shown in Section 6.2.3, these categories interact with other factors and the resulting morphology encodes a more fine-grained classification.
Perfective verbs, for instance, can be divided into those that are perfective in their basic form (such as baciti in Table 6.3), those that have become perfective by attaching a prefix (the first column in Tables 6.2 and 6.3), and those that take a perfective suffix (see Fig. 6.2; these verbs usually do not have bare forms). Similarly, verbs can be imperfective in their basic form, such as kuvati in Table 6.2, or after they have been attached both a prefix and an imperfective suffix. We take the presence or absence of the relevant morphological units in the structure of Serbian verbs as indicators of different aspectual properties. For these reasons, in addition to simple perfectivity, we define two more attributes for encoding more fine-grained aspectual distinctions of verbs. To collect the counts needed for this description, we analyse all the Serbian verbs which are identified as aligned with English verbs in the parallel corpus. We perform the analysis automatically using our own analyser, which implements the rules described in Section 6.2.3. The structure obtained with our analyser cannot be considered the true structure, but rather an approximation of it. Due to historical changes, some morphemes that are known to have existed in the structure are not easily recognisable in the present-day forms. We ignore these elements and treat such verbs as non-compositional. By identifying only the visible morphological segments, we can still analyse most Serbian verbs. With the identified segments of the morphological structure of Serbian verbs, we can now collect the following counts:

• For each verb form on the English side of the corpus:
  – the number of times it is aligned with a prefixed Serbian verb
  – the number of times it is aligned with a suffixed Serbian verb

• For each verb lemma on the English side of the corpus:
  – the sum of the alignments of all the forms with a prefixed Serbian verb
  – the sum of the alignments of all the forms with a suffixed Serbian verb

Knowing the value of each of the three aspectual attributes of the Serbian alignment of an English verb form, that is, knowing whether the Serbian translation of an English verb has a prefix or not, whether it has a suffix or not, and whether it is perfective or imperfective, we can now describe English forms in terms of these attributes, and then use them to predict the duration of the events expressed by the verbs. We assign to each English verb form and to each lemma a single value for each of the three aspectual attributes. The values represent the total of the corpus counts for each type. We explain in the following subsection how the values are calculated.

6.3.4. Numerical values of aspect attributes

In our cross-linguistic representation, the aspect of each English verb form is defined by a vector of three numbers between 0 and 1. Each number expresses the value of one attribute. The values are determined based on the observations made in the parallel corpus. We quantify the three aspect attributes in the following way:

• Prefix: This attribute encodes the tendency of English verbs to be word-aligned with prefixed Serbian verbs. Given the role of prefixes in the derivation of Serbian verbs, described in Section 6.2.3, such a tendency provides two pieces of information about the event which is described by the English verb:
  a) The event is more specified than those that are aligned with Serbian bare verbs.
  b) The event is temporally bounded, unless the verb also tends to be associated with an imperfective suffix, which can remove the temporal boundary imposed by the prefix.
The value of this attribute is calculated as the proportion of prefixed verbs in the set of verb alignments for each verb form in English, as shown in (6.14).

Pf(e) = F(sr_pref(e)) / F(sr(e))   (6.14)

where F stands for frequency (= total count in the corpus), e is an English verb, sr_pref(e) is a prefixed Serbian verb aligned with the English verb e, and sr(e) is any Serbian verb aligned with the English verb e. For example, if an English verb form is aligned with a Serbian verb in a parallel corpus 9 times, and 7 of these alignments are Serbian prefixed verbs, the value of Pf = 7/9 ≈ 0.8.

• Suffix: This attribute encodes the tendency of English verbs to be word-aligned with Serbian verbs which carry a suffix. The presence of suffixes in the Serbian translations of English verbs can mean two opposite things:
  a) The event described by the English verb is temporally unbounded, but specified. This is the case when the suffix is added to derive the secondary imperfective in Serbian (the second column in Tables 6.2 and 6.3).
  b) The event is temporally bounded and very short, which is the case when the suffix is added to the bare form directly.

The value of this attribute is calculated, similarly to the prefix alignments, as the proportion of verbs containing a suffix in the set of verbs with which an English verb is aligned, as shown in (6.15).

Sf(e) = F(sr_suff(e)) / F(sr(e))   (6.15)

where e is an English verb, sr_suff(e) is a Serbian verb which contains a suffix and which is aligned with the English verb e, and sr(e) is any Serbian verb aligned with the English verb e. For example, if the same English verb for which we calculated the prefix value is aligned with a suffixed Serbian verb 4 times, the value of the suffix attribute is Sf = 4/9 ≈ 0.4.
• Aspect: This attribute encodes the information extracted from the manually annotated morphological description of Serbian verbs. It represents the tendency of an English verb form to be aligned with Serbian verbs tagged as perfective. This information is especially useful in the case of bare verbs in Serbian, where the structural information is missing. It is calculated in a similar way to the previous two values, as shown in (6.16).

Asp(e) = F(sr_asp(e)) / F(sr(e))   (6.16)

where e is an English verb, sr_asp(e) is a Serbian verb annotated as perfective and aligned with the English verb e, and sr(e) is any Serbian verb aligned with the English verb e. If the same verb is seen 5 times aligned with a Serbian perfective verb, the value is calculated as Asp = 5/9 ≈ 0.56. Note that, since all the verbs in the corpus are tagged either as perfective or as imperfective, this value at the same time determines the tendency of a verb to be aligned with imperfective forms in Serbian.

We do not set any threshold on the number of observations that are included in the measures. We calculate all three values for all English forms (or lemmas) which are observed at least once in the parallel corpus. To deal with low-frequency items and zero counts, we apply additive smoothing, calculated as in (6.17):

Θ_i = (x_i + 1) / (n + 2)   (6.17)

where i ∈ {Pf, Sf, Asp} is one of the aspect attributes, x_i is the number of observed alignments of an English verb with a specific value of the attribute, and n is the number of times the English verb is seen in the parallel corpus. The smoothed values for the examples used above are Θ_Pf = (7+1)/(9+2) ≈ 0.73, Θ_Sf = (4+1)/(9+2) ≈ 0.45, and Θ_Asp = (5+1)/(9+2) ≈ 0.55. We illustrate the resulting aspectual definitions of English verbs with a sample of data in Table 6.5. The zero values that can be seen in the table are the result of rounding.
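The computations in (6.14)-(6.17) can be sketched directly. The counts below reproduce the worked example from the text: an English form aligned with Serbian verbs 9 times, of which 7 prefixed, 4 suffixed, and 5 perfective.

```python
# Sketch of equations (6.14)-(6.17): attribute values as alignment
# proportions, with additive (add-one) smoothing.

def attribute_value(x, n):
    """Raw proportion, e.g. Pf(e) = F(sr_pref(e)) / F(sr(e))."""
    return x / n

def smoothed(x, n):
    """Additive smoothing: theta_i = (x_i + 1) / (n + 2)."""
    return (x + 1) / (n + 2)

n = 9  # total verb-verb alignments of the English form
for name, x in (("Prefix", 7), ("Suffix", 4), ("Aspect", 5)):
    print(name, round(attribute_value(x, n), 2), round(smoothed(x, n), 2))
# Prefix 0.78 0.73 / Suffix 0.44 0.45 / Aspect 0.56 0.55
```

The smoothed values are the ones entered into the data set, so no attribute is ever exactly 0 or 1 for an observed form.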
Verb      Prefix   Suffix   Aspect
deal      0.8      0.5      0.8
find      0.9      0.5      0.9
owns      0.8      0.8      0.2
crashed   0.2      0.6      0.6
thought   0.6      0.0      0.6
hit       0.7      0.1      0.7
spent     0.8      0.2      0.3
think     0.4      0.1      0.3
going     0.4      0.4      0.4

Table 6.5.: A sample of the verb aspect data set.

With the aspectual attributes of English verbs defined using the morphological information of Serbian verbs, as described in this section, we can now perform machine learning experiments to test whether event duration can be predicted from these descriptions.

6.4. Experiment: Learning event duration with a statistical model

The main goal of our experiment is to determine whether the grammatical notion of verb aspect encodes the real-world temporal properties of events. The general intuition behind our approach, as already discussed in Section 6.2.1, is that the implicit time boundary in the meaning of a verb and the duration of the event described by it are related. If there is a time boundary in the lexical representation of a verb, the event described by it is more likely to be short than if there is no time boundary. Even though time boundaries can be defined for any event, even for those that last for years, we can expect the boundary to be implicit in the meaning of only those verbs which describe short events. The time limit is more prominent in events whose duration is perceived as limited to a short time span. This general relationship is then modified in particular cases, such as English verb tenses (see Section 6.2.2) or secondary imperfectives in Serbian (see Section 6.2.3). We formalise our hypotheses about the relation between verb aspect and event duration by means of a statistical model. Representing the aspect attributes with quantities based on corpus counts, as described in Section 6.3, is already one part of the model.
The attributes Prefix, Suffix, and Aspect, which we propose, are, in fact, a model of grammatical aspect. What remains to be specified, in order to construct a full model of all the notions examined in our research, is how the aspect attributes are related to event duration, and how they are related to each other. The interest in developing such a model is not only practical, but also theoretical. A model with a sound theoretical background, if successful, is not only expected to make good predictions, improving performance in tasks related to natural language understanding. Such a model, making reference to specific theoretical notions, is also a means of testing whether these notions actually play a role in empirical reality. Being a model of the relationship between categories in the domain of the grammar of language and those that belong to world knowledge, it can provide new insights into the functioning of the interface between these two domains. In the remainder of this section, we first describe the full model which is tested in the experiment. We then describe the algorithms and the data sets used in the experimental evaluation and, in the last subsection, the results of the evaluation.

6.4.1. The model

The full model of the relationship between verb aspect properties and event duration consists of four variables. The three aspect properties described in Section 6.3 are included in the model as separate variables. The fourth variable represents event duration. In the following list we introduce the notation that we use, summarising the variables, their values, and the expected relationships between them.

T: for Time. This variable represents the information about event duration as assessed by human annotators. It can take the values “short” and “long”.

Pf: for Prefix. This variable encodes the tendency of English verbs to be word-aligned with prefixed Serbian verbs.
It can take values between 0 and 1, as described in Section 6.3. Since the presence of a prefix in Serbian verbs indirectly indicates perfective aspect, as we show in Section 6.2.3, we expect higher values of this variable to signal short event duration, and lower values long duration.

Sf: for Suffix. This variable encodes the proportion of suffixed verbs in the set of Serbian verbs with which an English verb form is aligned (Section 6.3). The expected contribution of this variable to the model is based on the interaction between verb prefixes and suffixes observed in the grammar of Serbian aspectual derivations. As described in Section 6.2.3, suffixes can be attached to verbs to which a prefix is already attached, resulting in what is usually called the secondary imperfective. Otherwise, suffixes can be attached to bare forms, resulting in a perfective interpretation (see Fig. 6.2). Crossing the values of Pf and Sf is thus expected to yield a more accurate representation of the temporal properties of verbs than either of the two variables alone.

Asp: for Aspect. In addition to the formal grammatical elements that indicate the aspectual class of the Serbian alignments of English verb forms, the model contains a variable which encodes directly whether the alignments tend to be perfective or imperfective. Higher values of Asp indicate that the English form tends to be aligned with perfective Serbian verbs, and lower values indicate imperfective alignments. The information from this variable is expected to be useful in cases where English verb forms are aligned with Serbian verbs which do not bear any formal marking but which are still specified for aspect, such as the basic forms in Tables 6.2 and 6.3.

Note that the model does not include any lexical information. We do not use the information about lexical entries of either English or Serbian verbs.
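The behaviour expected of these variables, formalised below in (6.18) as the factorisation P(T) · P(Pf|T) · P(Sf|Pf, T) · P(Asp|Pf, Sf), can be illustrated with a toy classifier. All probability tables in this sketch are invented for illustration (they are not estimated from our corpus), and the attribute values are binarised to "hi"/"lo" for compactness.

```python
# Hypothetical conditional probability tables for a toy version of the
# model: Pf depends on T; Sf depends on (Pf, T); Asp depends on (Pf, Sf).
P_T = {"short": 0.35, "long": 0.65}
P_Pf = {  # P(pf | t)
    ("hi", "short"): 0.8, ("lo", "short"): 0.2,
    ("hi", "long"): 0.3, ("lo", "long"): 0.7,
}
P_Sf = {  # P(sf | pf, t)
    ("hi", "hi", "short"): 0.3, ("lo", "hi", "short"): 0.7,
    ("hi", "hi", "long"): 0.7, ("lo", "hi", "long"): 0.3,
    ("hi", "lo", "short"): 0.6, ("lo", "lo", "short"): 0.4,
    ("hi", "lo", "long"): 0.2, ("lo", "lo", "long"): 0.8,
}
P_Asp = {  # P(asp | pf, sf)
    ("hi", "hi", "hi"): 0.4, ("lo", "hi", "hi"): 0.6,
    ("hi", "hi", "lo"): 0.9, ("lo", "hi", "lo"): 0.1,
    ("hi", "lo", "hi"): 0.6, ("lo", "lo", "hi"): 0.4,
    ("hi", "lo", "lo"): 0.2, ("lo", "lo", "lo"): 0.8,
}

def duration_class(pf, sf, asp):
    """Return argmax_t of P(t) * P(pf|t) * P(sf|pf,t) * P(asp|pf,sf)."""
    scores = {
        t: P_T[t] * P_Pf[(pf, t)] * P_Sf[(sf, pf, t)] * P_Asp[(asp, pf, sf)]
        for t in P_T
    }
    return max(scores, key=scores.get)

# A prefixed, unsuffixed form with perfective alignments:
print(duration_class("hi", "lo", "hi"))
```

Note that, because P(asp|pf, sf) does not depend on t, this factor cancels in the argmax when all three attribute values are observed; in the full model the Asp variable becomes informative in the variable-elimination cases described later in this section.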
We also do not use the form of Serbian prefixes and suffixes; we only observe whether any affix appears in a verb or not.

We formalise the described relationships between the variables in the model by means of a Bayesian net, shown in Figure 6.3.

Figure 6.3.: Bayesian net model for learning event duration (nodes: T, Pf, Sf, Asp)

The general principles of constructing the Bayesian net model representation are discussed in more detail in Section 3.4.3 in Chapter 3. As represented by the arrows in Figure 6.3, we assume that Asp and T are conditionally independent given Pf and Sf. This relationship captures the fact that the information about verb aspect is important only when the information about verb affixes is not available. We assume that Sf depends on both T and Pf, which represents the fact that a suffix can be added for two reasons. First, it can be added as a means of deriving secondary imperfectives, and this is the case where a prefix is already attached to the verb. Second, a suffix can be added to a bare form, and, in these cases, it can result in a perfective. Pf depends only on T, meaning that a prefix is attached only to verbs which express events with particular durations (short events). The variable whose values we predict in the machine learning experiments is T, and the predictors are the other three variables.

The Bayesian net classifier

We build a supervised classifier which is an implementation of our Bayesian net model described in Section 6.4.1. Assuming the independence relationships expressed in the Bayesian net (Figure 6.3), we can decompose the model into smaller factors and calculate its probability as the product of the probabilities of the factors, as shown in (6.18).
P(T, Pf, Sf, Asp) = P(T) · P(Pf|T) · P(Sf|Pf, T) · P(Asp|Pf, Sf)   (6.18)

The probability of each factor of the product is assessed on the basis of the relative frequency of the values of the variables in the training set. The prior probability of event duration T is calculated as the relative frequency of the duration in the training sample (Total), as shown in (6.19).

P(T) = F(T) / Total   (6.19)

The conditional probabilities of the other factors are calculated from the joint probability, which is estimated for each value of each variable as the joint relative frequency of the values in the sample, as shown in (6.20)-(6.22).

P(Pf|T) = F(Pf, T) / F(T)   (6.20)

P(Sf|Pf, T) = F(Sf, Pf, T) / F(Pf, T)   (6.21)

P(Asp|Pf, Sf) = F(Asp, Pf, Sf) / F(Pf, Sf)   (6.22)

In testing, the predicted time value is the one which is most likely given the values of the three aspect attributes observed in the test data:

duration_class(instance) = argmax_t P(t|pf, sf, asp)   (6.23)

The conditional probability of each value t ∈ T is calculated by applying the general conditional probability rule, factorised according to the independence assumptions in the Bayesian net (see Figure 6.3), as shown in (6.24).
P(t|pf, sf, asp) = [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)] / Σ_t [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)]   (6.24)

If one of the values of the three predictor variables is unseen in the training data, we eliminate this variable from the evidence and calculate the conditional probability of t given the remaining two variables, as shown in (6.25)-(6.27):

P(t|pf, sf) = Σ_asp [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)] / Σ_asp Σ_t [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)]   (6.25)

P(t|pf, asp) = Σ_sf [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)] / Σ_sf Σ_t [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)]   (6.26)

P(t|sf, asp) = Σ_pf [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)] / Σ_pf Σ_t [P(t) · P(pf|t) · P(sf|pf, t) · P(asp|pf, sf)]   (6.27)

In principle, the same variable elimination procedure can also be applied when two values are unseen, but this was not necessary in our experiments.

6.4.2. Experimental evaluation

In the experimental evaluation, we train the model on a set of examples and then test its predictions on an independent test set. In all the settings of the experiment, the learning task is defined as supervised classification. The learning systems are trained on a set of examples where the values of the target variable are observed together with the values of the predictor variables. The predictions are then performed on a new set of examples for which only the values of the predictor variables are observed. The way the information from the predictors is used to predict values of the target variable depends on the classification approach and on the machine learning algorithm which is used. Some algorithms can be better suited to certain kinds of data than others (Hall et al. 2009). To determine which classification approach is best for our predictions, we perform several experiments. In addition to our Bayesian net
classifier, we test three more methods on two versions of the data. Our experimental set-up is described in more detail in the following subsections.

Materials and methods

We test our model on a set of examples with manually annotated event duration provided by Pan et al. (2011), to which we assign verb aspect values acquired from Serbian verb morphology through a parallel corpus, as described in Section 6.3. The full set of data used in the experiments is given in Appendix C.

Corpus and processing. The examples annotated with event duration are part of the TimeBank corpus (Pustejovsky et al. 2003). The annotation of duration is, in fact, added to the already existing TimeBank annotation. An example of an annotated event is given in (6.28) (the mark-up language is XML). The duration attributes (lowerBoundDuration and upperBoundDuration) are the added duration information; the rest belongs to the original TimeBank annotation.

(6.28) There's nothing new on why the plane <EVENT eid="e3" class="OCCURRENCE" lowerBoundDuration="PT1S" upperBoundDuration="PT10S">exploded</EVENT>.

In annotating event duration, the annotators were asked to assess a possible time span over which the event expressed in a particular sentence can last. They were asked to determine the lower and the upper bound of the span. We can see in the annotation of the event in (6.28), for example, that the event of exploding is assessed to last between one and ten seconds. Such annotations are provided for 2’132 event instances. The agreement between three annotators is measured on a sample of around 10% of the instances. To measure the agreement, the seven time units (second, minute, hour, day, week, month, year) were first converted into seconds. Then the mean value (between the lower and upper bound) was taken as a single duration value.
To account for the different perception of time variation in short and long time spans, that is, for the fact that the difference between 3 and 5 seconds is perceived as more important than the difference between 533 and 535 seconds, the values in seconds were converted into their natural logarithms. The values on the logarithmic scale were then used to define a threshold and divide all events into two classes: those that were assigned a value less than 11.4 (which roughly corresponds to a day) were classified as short events and the others were classified as long events. Pan et al. (2011) report a proportion of agreement between the annotators of 0.877 on the two classes, which corresponds to a κ-score of 0.75 when taking into account the expected agreement. This agreement can be considered strong. It confirms that people do have common intuitions on how long events generally last. The events annotated by Pan et al. (2011) are expressed with different grammatical categories, including nouns (such as explosion), adjectives (such as worth), and others. For testing our model, we select only those events which are expressed with verbs, that is, those instances which are assigned a verb tense value in an additional layer of annotation, not shown in (6.28). We limit our data set to verbs because the theoretical notion of aspect, which we examine in our study, is essentially verbal. Category changing is known to have considerable influence on certain elements of the lexical structure of words (Grimshaw 1990). Given the place it takes in the lexical representation of verbs (Ramchand 2008), aspect can be expected to be one of the elements affected by category changing. We thus use only the instances of events which are expressed by verbs to avoid unnecessary noise in the data. After eliminating non-verb events from the original Pan et al.
(2011)’s corpus, the number of instances with annotated event duration was reduced to 1’121. We had to further eliminate a number of these instances because we could only test our model on those verbs for which we had acquired aspect information from the parallel corpus. Those are the verbs which occur both in the TimeBank and in the MULTEXT-East corpus and which are word-aligned with Serbian verbs (see Section 6.3). After eliminating the instances for which we did not have Serbian alignments, we obtained the definitive set of data used in the experiments, a total of 918 instances. We follow Pan et al. (2011) in dividing all the events into two classes: short and long. This decision is based on the fact that the inter-annotator agreement on more fine-grained distinctions is much weaker than in the case of the binary classification. Pan et al. (2011) also report agreement based on the overlap between the distributions of duration values. This agreement score depends on the threshold defined for the overlap, but reaches a κ-score of 0.69 only when as little as 10% of the overlap is considered agreement. To transform the existing annotations into the two classes, we apply the procedure described above: we convert all the time units into seconds, transform these values into natural logarithms, and then set the threshold for dividing the events into short and long to 11.4, which roughly corresponds to the length of one day.

Two versions of verb aspect annotation. Since human judgements on event duration agree much better for the binary classes than for more fine-grained distinctions, it can be the case that verb aspect properties too are better represented as binary variables, instead of the ten-value representation shown in Section 6.3 (see Table 6.5).
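The class-conversion procedure described above (convert units to seconds, take the natural logarithm, threshold at 11.4, roughly one day) can be sketched as follows. The conversion factors for month and year are our assumption (average lengths), since the exact figures are not fixed in the text, and we take the arithmetic mean of the two annotated bounds.

```python
import math

# Time units in seconds; month and year use average lengths (assumption).
UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400,
         "week": 604800, "month": 2629746, "year": 31556952}

def duration_class(lower, upper, unit):
    """Binarise an annotated duration span: take the mean of the two
    bounds, convert to seconds, and compare the natural logarithm with
    the threshold 11.4 (roughly the length of one day)."""
    mean_seconds = (lower + upper) / 2 * UNITS[unit]
    return "short" if math.log(mean_seconds) < 11.4 else "long"

# The "exploded" event of (6.28): between 1 and 10 seconds.
print(duration_class(1, 10, "second"))
print(duration_class(1, 3, "week"))
```

Note that ln(86400) ≈ 11.37, so an event of exactly one day falls just below the threshold and is classified as short.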
To check whether a coarser representation of verb aspect is more useful for predicting the two event duration classes, we perform experiments with two versions of the data. In the first setting, we use the ten-value representation of verb aspect, as described in Section 6.3. In the second setting, we use only two values: high and low. We define the threshold for dividing the values into these two classes as the median value of each variable observed in the training data: the values are considered high if they are greater than 0.5 for Prefix, 0.3 for Suffix, and 0.7 for Aspect. Otherwise, the values are considered low.

As an illustration of the data used in the machine learning experiments, a sample of instances is shown in Table 6.6. The first three columns of the two panels contain the two versions of the representation of the aspectual properties of English verbs, acquired from the morphological structure of their Serbian counterparts. The fourth column contains human judgements of whether the events described by the verbs last less or more than a day.

Ten-value setting                    Two-value setting
Prefix  Suffix  Aspect  Time         Prefix  Suffix  Aspect  Time
0.8     0.5     0.8     short        high    high    high    short
0.9     0.5     0.9     short        high    high    high    short
0.8     0.8     0.2     short        high    high    low     short
0.2     0.6     0.6     long         low     high    low     long
0.6     0.0     0.6     long         high    low     low     long
0.7     0.1     0.7     short        high    low     high    short
0.8     0.2     0.3     short        high    low     low     short
0.4     0.1     0.3     long         low     low     low     long
0.4     0.4     0.4     long         low     high    low     long

Table 6.6.: A sample of the two versions of the data set with combined verb aspect and event duration information: the version with ten-value predictor variables on the left side, the version with two-value predictor variables on the right side.

Comparison with other classification methods.
To assess whether our learning approach is well adapted to the kind of predictions that we make in our tests, we perform the same classification of verbs into “short” and “long” using different methods. In all the methods with which we compare our Bayesian net classifier, the representation of aspect is the one described in Section 6.3. The first classification method which we test is a simple rule-based classifier which does not use the information about the morphological structure of Serbian verbs, but only the binary classification into imperfective and perfective verbs. We classify as “short” all English verbs which tend to be aligned with Serbian perfective verbs (these are the verbs which are assigned 0.7 or more for the attribute Aspect, as described in Section 6.3). The other verbs are classified as “long”. In addition to the Bayesian net classifier and the simple rule-based classifier just described, we train two machine learning algorithms: Decision Tree and Naïve Bayes classifiers. Both algorithms are described in more detail in Section 3.4.1 in Chapter 3. We choose these two algorithms because they are known to perform very well on a wide range of learning tasks, despite their relative simplicity. It is important to note that these two algorithms use our representation of aspect in different ways. As discussed in Section 3.4.1, the Naïve Bayes approach is based on the assumption that all the predictor variables are independent. Contrary to this, the Bayesian net classifier includes the specific dependencies which we believe to exist in reality. This, if the dependencies are correctly identified, can be an advantage for the Bayesian net classifier. Another important difference is between the Decision Tree on one side and both Bayesian algorithms on the other. In the setting with the ten-value aspect attributes, values of the variables are treated as real numbers in the Decision Tree experiments, while they
are treated as nominal values in both Bayesian experiments. To give an example, the Decision Tree classifier “knows” that the value 0.8 is greater than 0.4, while the Bayesian classifiers treat these two numbers just as strings representing two different classes. This difference can turn into an advantage for the Decision Tree classifier, because the true nature of these values is ordinal. We run the two classifiers using the packages available in the system Weka (Hall et al. 2009). The performance of all the classifiers is reported in Table 6.7.

                   Mean accuracy score (%)
                Ten-value setting   Two-value setting
Bayesian Net           83                 79
Decision Tree          83                 79
Naïve Bayes            79                 76
Binary aspect          70                 70
Baseline               65                 65

Table 6.7.: Results of machine learning experiments with 5-fold cross-validation (roughly 80% of data for training and 20% for testing).

Training and test set, baseline. We evaluate the performance of the classifier on the data set which consists of 918 instances for which both time annotation and cross-linguistic data are available (see Section 6.4.2). The data set is split into the training set (around 80%) and the test set (around 20%) using random choice of instances. To account for the potential influence of the data split on the observed results, we perform a five-fold cross-validation.3 In Table 6.7, we report the mean accuracy scores. The baseline is defined as the proportion of the more common class in the data set. Since 65% of instances belong to long events, if we assigned this label to all instances, the accuracy of the classification would be 65%.

Results and discussion

We can see in Table 6.7 that all four classifiers outperform the baseline in both settings. The Bayesian Net and Decision Tree classifiers perform better than the Naïve Bayes classifier (83% vs. 79% mean accuracy).
This difference is statistically significant, as determined by a t-test (t = 2.79, p < 0.01). The performance of all three classifiers is significantly better in the ten-value setting than in the two-value setting. The differences in the performance of the four classifiers indicate that all the information included in the model is useful. First, the simple rule-based classification gives the lowest results, though still above the baseline. This indicates that there is a certain relationship between the duration of an event and the perfective vs. imperfective interpretation of the Serbian verbs which express it. However, the other methods, which combine the binary division into perfective and imperfective with the morphological attributes, perform much better. The worst performing of the three machine-learning classifiers is Naïve Bayes, which simplifies the relationships between the properties the most, treating them as independent. The fact that the Bayesian net classifier makes more correct predictions than Naïve Bayes is likely due to the fact that the dependencies specified in the model express true relationships between the structural elements. The fact that the Decision Tree reaches the same performance as our Bayesian net can be explained by the hierarchical nature of the decisions taken by this classifier. Finally, the consistent difference in the performance of all three classifiers on the two versions of the data indicates that representing aspectual properties with only two values is an oversimplification which results in more prediction errors. Verb aspect clearly cannot be described in terms of a single binary attribute, such as, for example, the temporal boundedness used in our study. Finer distinctions contain more information, which proves useful for predicting event duration.

3 In five-fold cross-validation, the data set is split into five portions.
Each time the classifier is run, one portion is taken as the test set and all the other portions serve as the training set. The classifier is run five times, once for each test set.

6.5. General discussion

The results of the experiments can be interpreted in the light of several questions which have been examined in our study. First of all, they clearly show that natural language does encode those temporal properties of events about which people have common intuitions and which are relevant for the representation of the real world. The 83% accuracy in predictions realised by the two best classifiers is much closer to the upper bound of performance on this task (the proportion of inter-annotator agreement of 0.877 measured by Pan et al. (2011) would correspond to an accuracy score of 87.7%) than to the baseline (65%). This shows that the relevance of the linguistic encoding can be learned and exploited by systems for the automatic identification of event duration. Using Serbian language data to classify events expressed in English is based on the assumption that verb aspect is an element of the general structure of language and that it has the same role in languages where it is morphologically encoded as in languages which do not exhibit it in their observable morphological structure. This assumption underlies all the general linguistic accounts of verb aspect which we have taken into consideration in our study. The successful transfer of the representation of aspect from Serbian to English can be interpreted as a piece of empirical evidence in favour of this assumption.

6.5.1. Aspectual classes

A careful analysis of linguistic structure, guided by established theoretical notions, proves useful in identifying the elements of the linguistic structure which are relevant for temporal encoding.
The temporal meaning of Serbian verb morphemes, which is exploited by the systems in our experiments, becomes clear only in the context of the general theory of verb aspect. The morphemes themselves are ambiguous and do not constitute a temporal paradigm. However, the frequency distribution of the morphemes in the cross-linguistic realisations of verbs clearly depends on the temporal properties of the events described by the verbs, and it does provide an observable indicator of the temporal meaning of verbs. The results of the experiments suggest that the three-attribute representation of aspect which we propose captures the relationships between the structural elements in Serbian verbs which are relevant for time encoding. The simple binary classification is clearly not an adequate level of aspectual classification. However, the set of ten values which we have used is not necessarily the best representation either. Our decision to group the values into ten classes is arbitrary. It is possible that the classifiers do not use all ten values and that some of them are more informative than others. A systematic approach to identifying the best representation of the aspectual properties of verbs could help identify the elements of the lexical representation which interact with temporal boundedness in a systematic way to form fully specified aspectual classes. This would improve our understanding of what kinds of aspectual classes exist and what kinds of meaning they express.

6.5.2. Relevance of the findings to natural language processing

Our results indicate that the kind of information elicited from human judges in annotating event duration is represented mostly at the word level in the linguistic structure. The event duration annotation used in our experiments is instance-based.
The annotators could assign different classes to different instances of the same verb form, depending on the context of the form in a particular instance. This was not possible in our approach to verb aspect. In order to transfer the acquired information from the parallel corpus to the corpus which contains event annotation, we separated the aspect representation from the context and tied it to verb types, assigning the same values to all instances of the same type. The automatic classifiers were able to learn the relationship between the two annotations despite this simplification, which indicates that the human annotations were influenced more by the lexical properties of verbs than by the context. This information could be useful in designing future approaches to the identification of temporal properties in natural language. It should be noted that, even though we work with representations at the word level, our model is unlexicalised. We do not use the information contained in the idiosyncratic lexical meaning of verbs, but more general elements of the representation shared by different lexical items. As a consequence, a relatively small training corpus is needed for learning both the empirical representation of verb aspect and its relation to event duration. However, the word-level representation is, at the same time, a limitation of our approach. Not all the information about event duration can be found at the word level. A full model of the linguistic encoding of time will have to take into account observations at the higher levels of the structure of language too.

6.6. Related work

Pan et al. (2011) use the corpus described in Section 6.4.2 to perform machine learning experiments. They define a set of features representing event instances which are used to learn a classification of the events into short and long (see Section 6.4.2).
The features used in the event instance representation are: the lexical item which expresses the event, its part-of-speech tag and lemma, the same information for the words in the immediate context of the event item (in a window of size four), some syntactic dependencies of the event item (such as its direct object, for example), and the WordNet (Fellbaum 1998) hypernyms of the event item. The classification is learned using three different supervised algorithms: Decision Tree, Naïve Bayes, and Support Vector Machines. The best performance is obtained by Support Vector Machines on the class of long events, with an F-score of 0.82. Although the overall performance is not reported, it can be expected to be lower than this score, given that the performance on short events is measured as an F-score of 0.65 (the weighted average of the two scores is 0.75). Our results are not directly comparable with the results of Pan et al. (2011). Our best accuracy score corresponds to an overall F-score of 0.83 on both kinds of events, but we do not use exactly the same training and test set. Since we have selected only the instances of events expressed by verbs, we use only a portion of Pan et al. (2011)’s data both for training and testing. Although we obtain a better score with a smaller data set, we do not know what exactly causes the difference. A more thorough comparison would be necessary to determine whether the task is easier on the instances which we selected. This would justify our decision and underline the need for a different approach to categories other than verbs. Otherwise, our approach could be judged as better, but it would need to be extended to other categories. Gusev et al. (2011) use predefined word patterns as indicators of event duration. One of the patterns used, for example, is Past Tense + yesterday.
If an event-expressing item shows a tendency to occur with this pattern, it can be taken to express a short event in the sense of Pan et al. (2011). The occurrence data are extracted from the web using the Yahoo! search engine. Gusev et al. (2011) train learning algorithms on the instances where the event duration annotation is replaced by the pattern definitions. A maximum entropy algorithm performs better than Support Vector Machines, reaching the best performance of 74.8%, which is not significantly different from the performance on the hand-annotated data set. Gusev et al. (2011) also try learning finer-grained classes, but the accuracy scores are much lower (below 70%). Feature analysis by both Pan et al. (2011) and Gusev et al. (2011) indicates that enriching the models with context information brings little or no improvement in the results, which is in agreement with our own findings. Williams and Katz (2012) explore other word patterns which indicate event duration for classifying events into habitual and episodic. The data are extracted from a corpus of Twitter messages and classified using a semi-supervised approach. The study suggests that most verbs are used in both senses and proposes a lexicon of mean durations of episodes and habits expressed by a set of verbs. These temporal quantifications, however, are not directly evaluated against human judgments. The work on verb aspect is mostly concerned with using elements of the context to detect certain aspectual classes. The work of Siegel and McKeown (2000), for example, addresses the aspectual classes proposed by Moens and Steedman (1988), showing, by means of a regression analysis, that the context indicators which distinguish between dynamic and stative events are different from the indicators which distinguish between culminated and non-culminated events (the notion of a culminated event roughly corresponds to the notion of a temporally bounded event discussed in our study).
Siegel and McKeown (2000) also show that it is harder to distinguish between culminated and non-culminated events than between stative and dynamic ones. Kozareva and Hovy (2011) propose a semi-supervised method for extracting word patterns correlated with a set of aspectual classes, but their classes make reference neither to the classes discussed by Siegel and McKeown (2000) nor to other classes argued for in linguistic theory. A possibility of cross-linguistic transfer of verb aspect through parallel corpora is explored by Stambolieva (2011), but the study is not conducted in an experimental framework and does not report on automatic data processing.

6.7. Summary of contributions

In this study, we have explored the relationship between verb aspect, as an element of the grammar of natural language, and the more general cognitive notion of event duration. We have shown that this relationship can be explicitly formulated in terms of a probabilistic model which predicts the duration of an event on the basis of the aspectual representation of the verb which is used to express it. With an accuracy of 83%, the model's predictions can be considered successful. The model's accuracy score is much closer to the upper bound, defined as the agreement between human classifications (88%), than to the baseline, defined as the proportion of the most frequent class (65%). For the purpose of our experimental study, we have developed a quantitative representation of verb aspect which is based on the distribution of morphosyntactic realisations of the Serbian verbs aligned with English verbs in parallel text. Unlike other approaches to automatic identification of event duration, which have explored observable indicators at the syntactic and at the discourse level of linguistic structure, we have identified observable indicators of event duration at the word level.
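In schematic form, this word-level prediction runs from Serbian aspectual marking to English event duration. The counts and the 0.5 threshold below are invented for illustration, and the actual model is a trained classifier rather than a fixed cut-off; the sketch shows only the direction of the inference:

```python
# Toy counts of aspectual marking on the Serbian translation
# equivalents of two English verbs: (perfective, imperfective).
ASPECT_COUNTS = {
    "explode": (47, 3),
    "sleep": (2, 38),
}

def predict_duration(verb, counts):
    """Guess the duration class from the share of perfective alignments."""
    perfective, imperfective = counts[verb]
    p_perf = perfective / (perfective + imperfective)
    # Perfective marking signals temporal boundedness, hence a short event.
    return "short" if p_perf > 0.5 else "long"

print(predict_duration("explode", ASPECT_COUNTS))  # short
print(predict_duration("sleep", ASPECT_COUNTS))    # long
```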
We have shown that a good proportion of the temporal information which is implicitly understood in language use is, in fact, contained in the grammar of lexical derivation of verbs in Serbian. This information can be automatically acquired and ported across languages using parallel corpora. The accuracy of the prediction based on our bilingual model is superior to that of the best-performing monolingual model.

7. Conclusion

In this dissertation, we have proposed a computational method for identifying grammatically relevant components of the meaning of verbs by observing the variation in samples of verbs' instances in parallel corpora. The core of our proposal is a formalisation of the relationship between the meaning of verbs and the variation, cross-linguistic as well as language-internal, in their morphosyntactic realisations. We have used standard and Bayesian inferential statistics to provide empirical evidence of a number of semantic components of the lexical representation of verbs which are grammatically relevant because they play a role in the verbs' predicate-argument structure. In particular, we have shown that frequency distributions of morphosyntactic alternants in argument alternations depend on the properties of the events described by the alternating verbs. Identifying grammatically relevant components of the meaning of verbs is one of the core issues in linguistic theory due to the evident relationship between the meaning of verbs and the structure of the clauses that they form. In order to understand how basic clauses are structured, one needs to account for the differences in the number of constituents which are realised. Such an account involves explaining why some clauses are intransitive, some are transitive, and some are ditransitive. The explanation points to the lexical properties of the main verb which heads a clause. Intransitive clauses are typically formed with verbs such as go, swim, cough, laugh.
Transitive clauses are formed with verbs such as make, break, see. Ditransitive clauses are formed with verbs such as give, tell. However, one also needs to take into account the fact that a verb is rarely associated with only one type of clause. It is much more often the case that the same verb is associated with alternating clausal structures. For example, the verb break can be realised both in a transitive clause (e.g. Adam broke the laptop) and in a semantically related intransitive clause (e.g. The laptop broke). Alternative morphosyntactic realisations of semantically equivalent units are found not only within a single language, but also across languages. Although associating the meaning of verbs with the types of clauses which they form is necessary for formulating the rules of clausal structure, defining precise rules to link the elements of a phrase structure to the elements of the lexical representation of verbs proves to be a challenging task. The work on the interface between the lexicon and the rules of grammar has resulted in numerous proposals regarding the grammatically relevant elements of the lexical representation of verbs. It is widely accepted that the meaning of a verb is related to the grammar of a clause through a layer in the lexical representation of verbs which is usually called the predicate-argument structure. However, views differ considerably on what exactly the elements of the predicate-argument structure are. The nature of the predicate-argument relations in the representation of the meaning of verbs has been described in various frameworks, ranging from the naïve analyses of semantic roles of verbs' arguments (Fillmore 1968; Baker et al. 1998) to more general approaches based on semantic decomposition of the predicate-argument relations. Several attributes of the meaning of verbs have been proposed as relevant for the predicate-argument structure.
It has been argued, for example, that these attributes include volition (Dowty 1991) or, more generally, mental state (Reinhart 2002), change (Dowty 1991; Reinhart 2002; Levin and Rappaport Hovav 1994), and causation (Talmy 2000; Levin and Rappaport Hovav 1994). The fact that the morphosyntactic realisations of verbs are influenced by the values of these attributes makes these semantic components grammatically relevant (Pesetsky 1995; Levin and Rappaport Hovav 2005). For example, a verb can be expected to form intransitive clauses if it describes an event which does not involve somebody's volition. The attributes can interact with one another and also with other factors, which results in a complex relationship between the lexical representation of verbs and the structure of a clause. In more recent accounts, the components of the predicate-argument structure have been reinterpreted in terms of the temporal decomposition of events described by verbs (Krifka 1998; Ramchand 2008; Arsenijević 2006). The notion of causation, for example, is identified with the notion of temporal precedence, while the notion of change is reanalysed as a result. The structural elements proposed in the temporal account of the predicate-argument relations are usually called sub-events, and the defined relations hold between these sub-events. We have proposed an empirical statistical method to test, on a large scale, theoretical proposals regarding the relationship between the lexical structure of verbs and the structure of a clause. Following the influential study on large-scale semantic classification of verbs (Levin 1993), we have based our approach on the assumption that the meaning of a verb determines the syntactic variation in the structure of the clauses that it forms and that, therefore, the grammatically relevant components of the meaning of a verb can be identified by observing the variation in its syntactic behaviour.
The validity of this general assumption has already been tested in the context of automatic verb classification (Merlo and Stevenson 2001). In this dissertation, we have formulated and tested experimentally a number of specific hypotheses showing that the frequency distribution of syntactic alternants in the morphosyntactic realisations of verbs can be predicted from particular properties of the meaning of verbs. We have applied our approach to two general properties of events which have been widely discussed in the recent literature: causation and temporal structure. The contribution of our work with respect to the existing work is both theoretical and methodological.

7.1. Theoretical contribution

With respect to previous theoretical approaches to the predicate-argument structure of verbs, the main novelty of our work is the demonstrated grammatical relevance of certain attributes of the meaning of verbs. Using statistical inference, we have formalised the relationship between the meaning of verbs, their use, represented by the frequency distribution of their instances in a corpus, and their formal properties (such as causative or aspectual marking), showing how the three sources of data can be combined in a unified account of the interface between the lexicon and the grammar. In an analysis of the relationship between the kind of causation and the variation in morphosyntactic realisations of light verb constructions, we have found empirical evidence of the presence of two force-dynamics schemata in light verbs. The meaning of light verbs such as take can be described as self-oriented (Talmy 2000; Brugman 2001) because the dynamics of the event is oriented towards its causer (or agent). As opposed to this, the meaning of light verbs such as make can be described as directed because the dynamics of the event is not oriented towards the causer, but towards another participant in the event.
Our experiments have shown that the frequency distribution of cross-linguistic morphosyntactic alternants of light verbs depends on their force-dynamics schemata. In an analysis of cross-linguistic morphosyntactic realisations of lexical causatives, we have taken a closer look at the notion of external causation (Haspelmath 1993; Levin and Rappaport Hovav 1994; Alexiadou 2010). We have argued, based on the results of a series of experiments, that the likelihood of an external causer in an event described by a verb is a semantic property which underlies two correlated frequency distributions: the distribution of morphological marking on lexical causatives across a wide range of languages and the distribution of clause types in a sample of verb instances in any single language. The contribution of this piece of work is twofold. First, we have shown that there is a relationship between a semantic attribute of lexical causatives and their morphosyntactic form. Specifically, the observed variation in the cross-linguistic realisations of a verb depends on the likelihood of external causation of the event described by the verb. Second, we have shown that the likelihood of external causation can be estimated for a wide range of verbs by means of a statistical model. The temporal structure of events is analysed in the third case study. The main contribution of this study is the established relationship between formal aspectual marking on a verb and the duration of the event described by the verb. More specifically, we have designed a statistical model which predicts the duration of an event described by an English verb on the basis of the observed frequency distribution of formal morphosyntactic aspectual markers on the aligned Serbian verbs. In an experimental evaluation, the model is shown to make better predictions than the best-performing monolingual model.
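The shared logic of these case studies, that a frequency distribution of alternants differs across semantically defined verb classes, can be illustrated with a toy test of independence. The 2x2 counts below are invented; the dissertation itself uses a range of standard and Bayesian models, but a Pearson chi-square computation shows the basic form of the inference:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts: verb class 1 realises alternant A 70 times and
# alternant B 30 times; verb class 2 realises them 40 and 60 times.
stat = chi_square_2x2(70, 30, 40, 60)
print(stat > 3.841)  # exceeds the 5% critical value for 1 degree of freedom
```

A statistic above the critical value is evidence that the choice of alternant is not independent of the verb class.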
We have developed corpus-based measures of the values of the three semantic attributes of verbs which we have studied. These values are calculated automatically and can be assigned to a large number of verbs.

7.2. Methodological contribution

The main methodological contribution of this dissertation is that it combines theoretical linguistic goals with the sound modelling and experimental methodology developed in computational linguistics. The methodology which we have used in this dissertation is not new in itself, but its application to testing theoretical hypotheses is novel in three ways. First, while frequency distributions of syntactic realisations of verbs have been extensively studied and used in developing practical applications in the domain of automatic natural language processing, this kind of evidence is not commonly used in theoretical linguistics. In our experiments, we have demonstrated that a statistical analysis of a large number of verb instances can be used to study structural components in the lexical representation of verbs. We have quantified and measured the semantic phenomena which we have studied using the methods and techniques developed in natural language processing. We have used statistical models and tests to capture generalisations in large data sets. We have estimated the parameters of the models by applying machine learning techniques. We have tested and explicitly evaluated the predictions of the models. These methods constitute the standard experimental paradigm in computational linguistics. In this dissertation, we have shown that their application to theoretical questions can lead to extending our knowledge about language. Combining various sources of data in a large-scale analysis can shed new light on the nature of the interface between the lexicon and the grammar, which involves complex interactions of multiple factors.
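One concrete instance of combining such data sources is pairing each verb occurrence with its translation in word-aligned parallel text. The sketch below assumes Moses-style "i-j" alignment links (source index, target index) and Penn-style part-of-speech tags; the sentence pair and the alignment are invented for illustration:

```python
def aligned_verb_pairs(src_tokens, src_tags, tgt_tokens, alignment):
    """Yield (source verb, aligned target word) pairs from one sentence pair."""
    pairs = []
    for link in alignment.split():
        i, j = map(int, link.split("-"))
        if src_tags[i].startswith("V"):  # Penn-style verb tags: VB, VBD, ...
            pairs.append((src_tokens[i], tgt_tokens[j]))
    return pairs

src = ["The", "window", "broke"]
tags = ["DT", "NN", "VBD"]
tgt = ["Prozor", "se", "slomio"]  # Serbian: the reflexive 'se' marks the anticausative
print(aligned_verb_pairs(src, tags, tgt, "0-0 1-0 2-2"))  # [('broke', 'slomio')]
```

Aggregating such pairs over a corpus yields the per-verb distributions of target-language marking that the analyses above rely on.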
Second, the data sets which are used in standard linguistic approaches are usually much smaller than those which are used in our experiments. The methodological advantage of large data sets is that they are more likely to be representative of linguistic phenomena than small, manually analysed samples. By using computer-based language technology, we can now observe the variation in the use of linguistic units on a large scale, applying inductive reasoning in defining generalisations. In this dissertation, we have shown how the tools and resources developed in natural language processing can be used to compose large experimental data sets for theoretical linguistic research. We have used existing parallel corpora, an automatic alignment tool, syntactic parsers, morphological analysers, as well as our own scripts for automatic extraction of the experimental data from parallel corpora. With the rapidly developing language technology, such resources can be expected to grow and to be increasingly available in the future. The data and the tools accumulated in developing language technology represent an extremely rich new resource for future linguistic research. Third, we have extended the corpus-based quantitative approach to linguistic analysis to the cross-linguistic domain. This is a necessary step for formulating generalisations which hold across languages. We have achieved this by collecting data from parallel corpora. We have shown that parallel corpora represent a valuable source of information for a systematic study of the structural sources of cross-linguistic microvariation, despite the fact that the observed variation can also be influenced by some non-linguistic factors (such as translators' choices).

7.3. Directions for future work

We define the directions for continuing the work presented in this dissertation in two ways. On the one hand, our approach can be extended to include more languages and more complex modelling.
On the other hand, our findings have opened new questions which could be pursued in future work. Although our approach is cross-linguistic in the sense that we analyse data from at least two languages in all our experiments, our data come from only a few languages: English, German, and Serbian. We have used only a small sample of languages because the focus of our work has been on developing and testing the methodology of cross-linguistic corpus-based linguistic research. Applying the methods proposed in this dissertation to a larger sample of languages is a natural next step in future research. Increasing the number of languages included in an analysis would enrich the data sets not only because more instances of linguistic phenomena would be analysed, but also because more linguistic information could be automatically extracted. For example, morphological marking, which is often not available in English, can be extracted from other languages. Although we have not used morphological marking in a systematic way, the results of our experiments suggest that it can be a valuable source of information for studying various elements of the grammar of language, which is in accordance with some recent broad typological studies (Bickel et al. 2013; to appear). Parallel corpora of numerous languages already exist (for example, the current version of the Europarl corpus contains 21 languages) and they are constantly growing. Since statistical modelling has not been widely used in theoretical linguistic research so far, the focus of this dissertation has been on demonstrating how statistical inference can be used to address theoretical issues. To this end, we have formulated relatively narrow theoretical questions which could be addressed using simple statistical and computational approaches. This allowed us to establish a straightforward relationship between the theoretical notions which we studied and the components of the models.
However, our approach can be extended to more general questions involving more factors. This can be done by applying more advanced modelling approaches such as those which are currently being proposed in computational linguistics and in other disciplines dealing with large-scale data analysis. In analysing the data in our experiments, we have noticed several phenomena which call for further investigation, but which we could not address directly because this would fall outside the scope of the dissertation. One such phenomenon is the fact that, in automatic word alignment, nouns are generally aligned better than verbs. It would be worth exploring in future work whether this fact can be related to known distributional differences between these two classes. It might also mean that nominal lexical items are more stable across languages than verbal ones. Another phenomenon which would be worth exploring in future research became evident while we were studying the data on lexical causatives. We have noticed that the quantity of anticausative morphological marking varies across European languages. The number of lexical causatives which take a reflexive particle in their citation form, such as sich öffnen 'open' in German, varies across European languages. There are, for example, many more such verbs in Serbian than in German, while there are almost none in English. A possible explanation for this variation is the difference in morphological richness among the three languages, given that Serbian is usually considered morphologically richer than German, and German richer than English. Based on our results, this marking could be expected to be related to the likelihood of external causation too. The verbs which describe an event with a low likelihood of an external causer are expected to occur without a marker more often than the verbs describing an event with a high likelihood of an external causer.
The morphological markings should thus be distributed in a continuous fashion over the scale of likelihood of external causation, covering different portions of the scale in different languages. Addressing these relations directly in an experiment might result in new findings pointing to some structural constraints on cross-linguistic variation. Finally, in the study of temporal properties of the meaning of verbs, we have proposed a quantitative representation of verb aspect classes based on the frequency distribution of morphological marking on Serbian verbs. This representation proved useful for the goals of our experiments. However, we have not fully examined the theoretical aspects of our proposal. What remains an open question for future research is the source of the quantities which have been observed in our experiments as the values of the aspectual attributes. Exploring this question further could point to new findings on how morphological marking patterns can be used for determining which aspectual classes exist in language and what their meaning is. This should provide a clearer picture of the semantic representation of time in language in general.

Bibliography

Steven Abney. Data-intensive experimental linguistics. Linguistic Issues in Language Technology — LiLT, 6(2):1–30, 2011.

Enoch Oladé Aboh. Clause structure and verb series. Linguistic Inquiry, 40(1):1–33, 2009.

Artemis Alexiadou. On (anti-)causative alternations. Presentation, École d'automne de linguistique, Paris, 2006a.

Artemis Alexiadou. On the morpho-syntax of (anti-)causative verbs. In Malka Rappaport Hovav, Edit Doron, and Ivy Sichel, editors, Syntax, Lexical Semantics and Event Structure, pages 177–203. Oxford University Press, Oxford, 2010.

Artemis Alexiadou, Elena Anagnostopoulou, and Florian Schäfer. The properties of anticausatives crosslinguistically. In Mara Frascarelli, editor, Phases of Interpretation, pages 187–212. Mouton de Gruyter, Berlin, New York, 2006.

Alex Alsina. A theory of complex predicates: evidence from causatives in Bantu and Romance. In Alex Alsina, Joan Bresnan, and Peter Sells, editors, Complex Predicates, pages 203–247. CSLI Publications, Stanford, California, 1997.

Boban Arsenijević. Inner aspect and telicity: The decompositional and the quantificational nature of eventualities at the syntax-semantics interface. LOT, Utrecht, 2006.

Boban Arsenijević. Slavic verb prefixes are resultative. Cahiers Chronos, 17:197–213, 2007.

Harald Baayen. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press, Cambridge, 2008.

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 86–90, Montreal, Canada, 1998. ACL / Morgan Kaufmann Publishers.

Mark Baker. Thematic roles and syntactic structure. In Liliane Haegeman, editor, Elements of Grammar, pages 73–137. Kluwer, Dordrecht, 1997.

Mark C. Baker. Incorporation: A Theory of Grammatical Function Changing. The University of Chicago Press, Chicago, London, 1988.

Mark C. Baker. The Atoms of Language. Basic Books, New York, 2001.

Marco Baroni and Silvia Bernardini. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing, 21(3):259–274, 2006.

Marco Baroni and Alessandro Lenci. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–722, 2010.

John Beavers. Argument/Oblique Alternations and the Structure of Lexical Meaning. PhD thesis, Stanford University, 2006.

Douglas Biber. Variation across Speech and Writing. Cambridge University Press, Cambridge, 1991.

Theresa Biberauer, editor. The Limits of Syntactic Variation. John Benjamins, Amsterdam, 2008.
Balthasar Bickel, Giorgio Iemmolo, Taras Zakharko, and Alena Witzlack-Makarevich. Patterns of alignment in verb agreement. In Dik Bakker and Martin Haspelmath, editors, Languages across Boundaries: Studies in the Memory of Anna Siewierska, pages 15–36. De Gruyter Mouton, Berlin, 2013.

Balthasar Bickel, Taras Zakharko, Lennart Bierkandt, and Alena Witzlack-Makarevich. Semantic role clustering: an empirical assessment of semantic role types in non-default case assignment. To appear.

Claire Bonial, William Corvey, Martha Palmer, Volha V. Petukhova, and Harry Bunt. A hierarchical unification of LIRICS and VerbNet semantic roles. In Proceedings of the Fifth IEEE International Conference on Semantic Computing (ICSC 2011), pages 483–489, September 2011. doi: 10.1109/ICSC.2011.57.

Gerlof Bouma, Lilja Øvrelid, and Jonas Kuhn. Towards a large parallel corpus of cleft constructions. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010. European Language Resources Association.

Melissa Bowerman and William Croft. The acquisition of the English causative alternation. In Melissa Bowerman and Penelope Brown, editors, Crosslinguistic Perspectives on Argument Structure: Implications for Learnability, pages 279–306. Lawrence Erlbaum Associates, New York, NY, 2008.

Michael R. Brent. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19(3):243–262, 1993.

Joan Bresnan. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Sam Featherston and Wolfgang Sternefeld, editors, Roots: Linguistics in Search of Its Evidential Base, Studies in Generative Grammar, pages 77–96. Mouton de Gruyter, Berlin, 2007.

Joan Bresnan and Tatiana Nikitina. The gradience of the dative alternation. In Linda Uyechi and Lian Hee Wee, editors, Reality Exploration and Discovery: Pattern Interaction in Language and Life, pages 161–184. CSLI Publications, Stanford, 2009.
Ted Briscoe and John Carroll. Automatic extraction of subcategorization from corpora. In Proceedings of the 5th ACL Conference on Applied Natural Language Processing, pages 356–363, 1997.

Peter F. Brown, Stephen A. Della-Pietra, Vincent J. Della-Pietra, and Robert L. Mercer. The mathematics of statistical machine translation. Computational Linguistics, 19(2):263–313, 1993.

Claudia Brugman. Light verbs and polysemy. Language Science, 23:551–578, 2001.

Aljoscha Burchardt, Katrin Erk, Anette Frank, Andrea Kowalski, Sebastian Padó, and Manfred Pinkal. Using FrameNet for the semantic analysis of German: Annotation, representation and automation. In Hans Boas, editor, Multilingual FrameNets in Computational Lexicography, pages 209–244. Mouton de Gruyter, 2009.

Lou Burnard. Reference guide for the British National Corpus (XML edition), 2007. URL http://www.natcorp.ox.ac.uk/XMLedition/URG/.

Miriam Butt and Wilhelm Geuder. On the (semi)lexical status of light verbs. In Norbert Corver and Henk van Riemsdijk, editors, Semilexical Categories: On the Content of Function Words and the Function of Content Words, pages 323–370. Mouton de Gruyter, Berlin, 2001.

Xavier Carreras and Lluís Màrquez. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL), pages 152–164, Ann Arbor, 2005. Association for Computational Linguistics.

Noam Chomsky. Remarks on nominalization. In Roderick Jacobs and Peter Rosenbaum, editors, Readings in English Transformational Grammar. Blaisdell, Waltham, MA, 1970.

Noam Chomsky. Knowledge of Language: Its Nature, Origin and Use. Praeger, New York, 1986.

Noam Chomsky. The Minimalist Program. MIT Press, Cambridge, Massachusetts, 1995.

Kenneth W. Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29, 1990.

Trevor Cohn and Mirella Lapata. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 728–735, Prague, Czech Republic, June 2007. Association for Computational Linguistics.

Chris Collins. Argument sharing in serial verb constructions. Linguistic Inquiry, 28:461–497, 1997.

Michael Collins, Philipp Koehn, and Ivona Kučerová. Clause restructuring for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 531–540, Ann Arbor, 2005. Association for Computational Linguistics.

Michael Cysouw and Bernhard Wälchli, editors. Parallel Texts: Using Translational Equivalents in Linguistic Typology. Theme issue of Sprachtypologie und Universalienforschung (STUF) 60(2). Akademie Verlag, 2007.

David Dowty. Thematic proto-roles and argument selection. Language, 67(3):547–619, 1991.

David R. Dowty. Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and in Montague's PTQ. D. Reidel, Dordrecht, Boston, 1979.

David R. Dowty. The effects of aspectual class on the temporal structure of discourse: semantics or pragmatics? Linguistics and Philosophy, 9:37–61, 1986.

Tomaž Erjavec. MULTEXT-East version 4: Multilingual morphosyntactic specifications, lexicons and corpora. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), pages 2544–2547, Valletta, Malta, 2010. European Language Resources Association (ELRA).

Afsaneh Fazly. Automatic acquisition of lexical knowledge about multiword predicates. PhD thesis, University of Toronto, 2007.

Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Mass., 1998.
Charles Fillmore. The case for case. In Emmon Bach and Robert T. Harms, editors, Universals in Linguistic Theory, pages 1–88. Holt, Rinehart and Winston, New York, 1968.

Charles J. Fillmore. Frame semantics. In Linguistics in the Morning Calm, pages 111–137. Hanshin Publishing Co., Seoul, 1982.

Pascale Fung, Zhaojun Wu, Yongsheng Yang, and Dekai Wu. Learning bilingual semantic frames: Shallow semantic parsing vs. semantic role projection. In 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007), pages 75–84, Skövde, Sweden, 2007.

Nikhil Garg and James Henderson. Unsupervised semantic role induction with global role ordering. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 145–149, Jeju Island, Korea, July 2012. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P12-2029.

Matthew Gerber and Joyce Y. Chai. Semantic role labeling of implicit arguments for nominal predicates. Computational Linguistics, 38(4):755–798, 2012.

Daniel Gildea and Daniel Jurafsky. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288, 2002.

Gregory Grefenstette and Simone Teufel. Corpus-based method for automatic identification of support verbs for nominalization. In Proceedings of the 7th Meeting of the European Chapter of the Association for Computational Linguistics, pages 98–103, Dublin, Ireland, 1995. Association for Computational Linguistics.

H. Paul Grice. Logic and conversation. In Peter Cole and Jerry L. Morgan, editors, Syntax and Semantics 3: Speech Acts, pages 41–58. Academic Press, New York, 1975.

Jane Grimshaw. Argument Structure. MIT Press, Cambridge, Mass., 1990.

Jane Grimshaw and Armin Mester. Light verbs and theta-marking. Linguistic Inquiry, 19:205–232, 1988.

Andrey Gusev, Nathaniel Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, and Dan Jurafsky. Using query patterns to learn the durations of events. In Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011), pages 145–155, Oxford, UK, 2011.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado, June 2009. Association for Computational Linguistics.

Kenneth Hale and Samuel Jay Keyser. On argument structure and the lexical representation of syntactic relations. In Kenneth Hale and Samuel Jay Keyser, editors, The View from Building 20, pages 53–110. MIT Press, 1993.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11(1), 2009.

Martin Haspelmath. More on the typology of inchoative/causative verb alternations. In Bernard Comrie and Maria Polinsky, editors, Causatives and Transitivity, volume 23, pages 87–121. John Benjamins, Amsterdam/Philadelphia, 1993.

John A. Hawkins. A Comparative Typology of English and German: Unifying the Contrasts. Croom Helm, London, Sydney, 1986.

James Henderson, Paola Merlo, Gabriele Musillo, and Ivan Titov. A latent variable model of synchronous parsing for syntactic and semantic dependencies. In Alex Clark and Kristina Toutanova, editors, Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL 2008), pages 178–182, Manchester, UK, 2008.

Rebecca Hwa, Philip Resnik, Amy Weinberg, and Okan Kolak. Evaluating translational correspondence using annotation projection. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 392–399, Philadelphia, PA, 2002. Association for Computational Linguistics.

Jena D. Hwang, Archna Bhatia, Claire Bonial, Aous Mansouri, Ashwini Vaidya, Nianwen Xue, and Martha Palmer. PropBank annotation of multilingual light verb constructions. In Proceedings of the Fourth Linguistic Annotation Workshop, pages 82–90, Uppsala, Sweden, July 2010. Association for Computational Linguistics.

Ray Jackendoff. X-bar Syntax: A Study of Phrase Structure. MIT Press, Cambridge, Mass., 1977.

Eric Joanis and Suzanne Stevenson. A general feature space for automatic verb classification. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), pages 163–170, Budapest, Hungary, 2003. Association for Computational Linguistics.

Eric Joanis, Suzanne Stevenson, and David James. A general feature space for automatic verb classification. Natural Language Engineering, 14(3):337–367, 2008.

Richard Kayne. Some notes on comparative syntax, with special reference to English and French. In The Oxford Handbook of Comparative Syntax. Oxford University Press, 2005.

Kate Kearns. Light verbs in English. Manuscript, 2002.

Karin Kipper Schuler. VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, University of Pennsylvania, 2005.

Dan Klein. The unsupervised learning of natural language structure. PhD thesis, Stanford University, 2005.

Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In Proceedings of MT Summit 2005, Phuket, Thailand, 2005.

Zornitsa Kozareva and Eduard Hovy. Learning temporal information for states and events. In Proceedings of the Workshop on Semantic Annotation for Computational Linguistic Resources (ICSC 2011), Stanford, 2011.

Manfred Krifka. The origins of telicity. In Susan Rothstein, editor, Events and Grammar, pages 197–235. Kluwer, Dordrecht, 1998.
282 Bibliography Cvetana Krstev, Duško Vitas, and Tomaž Erjavec. MULTEXT-East resources for Serbian. In Proceedings of 8th Informational Society - Language Technologies Conference, IS-LTC, pages 108–114, Ljubljana, Slovenia, 2004. Jonas Kuhn. Experiments in parallel-text based grammar induction. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), Main Volume, pages 470–477, Barcelona, Spain, July 2004. Joel Lang and Mirella Lapata. Unsupervised semantic role induction via split-merge clustering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1117–1126, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. URL http: //www.aclweb.org/anthology/P11-1112. Maria Lapata. Acquiring lexical generalizations from corpora: A case study for diathesis alternations. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 397–404, College Park, Maryland, USA, June 1999. Association for Computational Linguistics. Mirella Lapata and Chris Brew. Verb class disambiguation using informative priors. Computational Linguistics, 30(1):45–73, 2004. Richard K. Larson. On the double object construction. Linguistic Inquiry, 19:335–391, 1988. Alex Lascarides and Nicholas Asher. Temporal interpretation, discourse relations and commonsense entailment. Linguistics and Philosophy, 16(5):437–493, 1993. Beth Levin. English verb classes and alternations : a preliminary investigation. The University of Chicago Press, Chicago, 1993. Beth Levin and Malka Rappaport Hovav. A preliminary analysis of causative verbs in English. Lingua, 92:35–77, 1994. Beth Levin and Malka Rappaport Hovav. Unaccusativity : at the syntax-lexical semantics interface. MIT Press, Cambridge, Mass., 1995. 283 Bibliography Beth Levin and Malka Rappaport Hovav. Argument realization. Cambridge University Press, Cambridge, 2005. Stephen C. Levinson. 
Pragmatics. Cambridge Textbooks in Linguistics. Cambridge University Press, Cambridge, 1983. Jianguo Li and Chris Brew. Which are the best features for automatic verb classification. In Proceedings of ACL-08: HLT, pages 434–442, Columbus, Ohio, June 2008. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/ P08/P08-1050. Edward Loper, Szu-Ting Yi, and Martha Palmer. Combining lexical resources: Mapping between propbank and verbnet. In Proceedings of the 7th International Workshop on Computational Linguistics, Tilburg, the Netherlands, 2007. Catherine Macleod, Ralph Grishman, Adam Meyers, Leslie Barrett, and Ruth Reeves. NOMLEX: A lexicon of nominalizations. In In Proceedings of Euralex98, pages 187– 193, 1998. Christopher D. Manning. Automatic acquisition of a large subcategorization dictionary from corpora. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 235–242, Columbus, Ohio, USA, June 1993. Association for Computational Linguistics. Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of english: the penn treebank. Computational Linguistics, 19(2): 313–330, 1994. Rafael Marı́n and Louise McNally. Inchoativity, change of state, and telicity: Evidence from Spanish reflexive psychological verbs. Natural Language and Linguistic Theory, 29:467–502, 2011. Lluı́s Màrquez, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34 (2):145–159, 2008. 284 Bibliography Diana McCarthy and Anna Korhonen. Detecting verbal participation in diathesis alternations. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2, pages 1493–1495, Montreal, Quebec, Canada, August 1998. Association for Computational Linguistics. Gail McKoon and Talke Macfarland. 
Externally and internally caused change of state verbs. Language, 76(4):833–858, 2000. Paola Merlo and Gabriele Musillo. Semantic parsing for high-precision semantic role labelling. In Proceedings of the 12th conference on computational natural language learning (CONLL), pages 1–8, Manchester, 2008. Association for Computational Linguistics. Paola Merlo and Susanne Stevenson. Automatic verb classification based on statistical distribution of argument structure. Computational Linguistics, 27(3):373–408, 2001. Paola Merlo and Lonneke van der Plas. Abstraction and generalization in semantic role labels: PropBank, VerbNet or both? In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 288–296, Singapore, 2009. Association for Computational Linguistics. Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. A multilingual paradigm for automatic verb classification. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 207–214, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073119. URL http://www.aclweb.org/anthology/P02-1027. Nataša Milićević. The lexical and superlexical verbal prefix iz- and its role in the stacking of prefixes. Nordlyd, 32(2):279–300, 2004. Tom T. Mitchell. Machine Learning. McGraw-Hill, Boston, Mass., 1997. Marc Moens and Mark Steedman. Temporal ontology and temporal reference. Computational Linguistics, 14(2):15–28, June 1988. 285 Bibliography Jacques Moeschler and Anne Reboul. Dictionnaire encyclopédique de pragmatique. Ed. du Seuil, Paris, 1994. Paola Monachesi, Gerwert Stevens, and Jantine Trapman. Adding semantic role annotation to a corpus of written Dutch. In Proceedings of the Linguistic Annotation Workshop (LAW), pages 77–84, Prague, Czech Republic, 2007. Association for Computational Linguistic. 
Joakim Nivre, Johan Hall, Jens Nilsson, Chanev Atanas, Güleşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2): 95–135, 2007. Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–52, 2003. Sebastian Padó. Cross-Lingual Annotation Projection Models for Role-Semantic Information. PhD thesis, Saarland University, 2007. Sebastian Padó and Mirella Lapata. Cross-lingual annotation projection of semantic roles. Journal of Artificial Intelligence Research, 36:307–340, 2009. Martha Palmer, Daniel Gildea, and Paul Kingsbury. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–105, 2005a. Martha Palmer, Nianwen Xue, Olga Babko-Malaya, Jinying Chen, and Benjamin Snyder. A parallel Proposition Bank II for Chinese and English. In Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, pages 61–67, Ann Arbor, Michigan, June 2005b. Association for Computational Linguistics. Martha Palmer, Dan Gildea, and Nianwen Xue. Semantic role labeling. Morgan & Claypool Publishers, 2010. Feng Pan, Rutu Mulkar-Mehta, and Jerry R. Hobbs. Annotating and learning event durations in text. Computational Linguistics, 37(4):727–753, 2011. David Pesetsky. Zero syntax: Experiencers and cascades. MIT Press, Cambridge Mass., 1995. 286 Bibliography James Pustejovsky. The generative lexicon. MIT Press, Cambridge, MA, 1995. James Pustejovsky, Patrik Hanks, Roser Saurı́, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir R. Radev, Beth Sundheim, David Day, Lisa Ferro, and Marzia Lazo. The TIMEBANK corpus. In Corpus Linguistics, page 647–656, 2003. Andrew Radford. Minimalist Syntax. Cambridge University Press, Cambridge, 2004. Gillian Ramchand. Verb Meaning and the Lexicon: A First Phase Syntax. Cambridge Studies in Linguistics. 
Cambridge University Press, Cambridge, 2008. Tanja Reinhart. The theta system — An overview. Theoretical linguistics, 28:229–290, 2002. Douglas LT Rohde. Tgrep2 user manual, 2004. URL http://tedlab.mit.edu/~dr/ Tgrep2/tgrep2.pdf. Eleanor Rosch. Natural categories. Cognitive Psychology, 4(3):328–350, 1973. Michael Roth and Anette Frank. Aligning predicate argument structures in monolingual comparable texts: A new corpus for a new task. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 218–227, Montréal, Canada, 7-8 June 2012. Association for Computational Linguistics. Susan Rothstein. Telicity and atomicity. In Susan Rothstein, editor, Theoretical and crosslinguistic approaches to the semantics of aspect, pages 43–78, Amsterdam, 2008. John Benjamins. Josef Ruppenhofer, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. FrameNet II: Extended theory and practice, 2005. URL http: //framenet.icsi.berkeley.edu/book/book.pdf. Stuart J. Russell and Peter Norvig. Artificial intelligence : a modern approach. Prentice Hall Pearson, Upper Saddle River, N.J., 2010. 287 Bibliography Tanja Samardžić. Light verb constructions in English and Serbian. In English Language and Literature Studies – Structures across Cultures, pages 59–73, Belgrade, 2008. Faculty of Philology. Tanja Samardžić, Lonneke van der Plas, Goldjihan Kashaeva, and Paola Merlo. The scope and the sources of variation in verbal predicates in English and French. In Markus Dickinson, Kaili Müürisep, and Marco Passarotti, editors, Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories, volume 9, pages 199–211, Tartu, Estonia, 2010. Northern European Association for Language Technology (NEALT). Tanja Samardžić and Paola Merlo. 
The meaning of lexical causatives in cross-linguistic variation. Linguistic Issues in Language Technology, 7(12):1–14, 2012. Florian Schafer. The causative alternation. In Language and Linguistics Compass, volume 3, pages 641–681. Blackwell Publishing, 2009. Sabine Schulte im Walde. Experiments on the choice of features for learning verb classes. In Proceedings of The 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), pages 315–322, Budapest, Hungary, 2003. Association for Computational Linguistics. Sabine Schulte im Walde. Experiments on the automatic induction of German semantic verb classes. Computational Linguistics, 32(2):159–194, 2006. Sabine Schulte im Walde, Christian Hying, Christian Scheible, and Helmut Schmid. Combining EM training and the MDL principle for an automatic verb classification incorporating selectional preferences. In Proceedings of ACL-08: HLT, pages 496– 504, Columbus, Ohio, June 2008. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/P08/P08-1057. Violeta Seretan. Syntax-Based Collocation Extraction. Text, Speech and Language Technology. Springer, Dordrecht, 2011. Eric V. Siegel and Kathleen R. McKeown. Learning methods to combine linguistic indicators: improving aspectual classification and revealing linguistic insights. Computational Linguistics, 26(4):595–628, 2000. 288 Bibliography Nate Silver. The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t. The Penguin Press, New York, 2012. Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. Unsupervised multilingual learning for pos tagging. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1041–1050, Honolulu, 2008. Association for Computational Linguistics. Benjamin Snyder, Tahira Naseem, and Regina Barzilay. Unsupervised multilingual grammar induction. 
In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 73–81, Suntec, Singapore, August 2009. Association for Computational Linguistics. Maria Stambolieva. Parallel corpora in aspectual studies of non-aspect languages. In Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora, pages 39–42, Hissar, Bulgaria, September 2011. Suzanne Stevenson and Eric Joanis. Semi-supervised verb class discovery using noisy features. In Walter Daelemans and Miles Osborne, editors, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 71–78, 2003. URL http://www.aclweb.org/anthology/W03-0410.pdf. Suzanne Stevenson, Afsaneh Fazly, and Ryan North. Statistical measures of the semiproductivity of light verb constructions. In Proceedings of the ACL’04 Workshop on Multiword Expressions: Integrating Processing, pages 1–8. Association for Computational Linguistics, 2004. Lin Sun and Anna Korhonen. Improving verb clustering with automatically acquired selectional preferences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 638–647, Singapore, August 2009. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/D/D09/ D09-1067. Peter Svenonius. Slavic prefixes inside and outside VP. Nordlyd, 32(2):205–253, 2004b. Leonard Talmy. Towards a cognitive semantics. The MIT Press, Cambridge Mass., 2000. 289 Bibliography Pasi Tapanainen, Jussi Piitulainen, and Timo Jarvinen. Idiomatic object usage and support verbs. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2, pages 289–1293, Montreal, Quebec, Canada, August 1998. Association for Computational Linguistics. Jörg Tiedemann. 
News from OPUS - A collection of multilingual parallel corpora with tools and interfaces. In Nicolas Nicolov, Kalina Bontcheva, Galia Angelova, and Ruslan Mitkov, editors, Recent Advances in Natural Language Processing, volume V, pages 237–248, Borovets, Bulgaria, 2009. John Benjamins, Amsterdam/Philadelphia. Ivan Titov and James Henderson. Constituent parsing with incremental sigmoid belief networks. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 632–639, Prague, Czech Republic, June 2007. Association for Computational Linguistics. Ivan Titov and Alexandre Klementiev. A Bayesian approach to unsupervised semantic role induction. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 12–22, Avignon, France, April 2012. Association for Computational Linguistics. URL http://www.aclweb.org/ anthology/E12-1003. Kristina Toutanova, Aria Haghighi, and Christopher Manning. Joint learning improves semantic role labeling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 589–596, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. URL http://www.aclweb. org/anthology/P/P05/P05-1073. Graham Upton and Ian Cook. Understanding statistics. Oxford University Press, Oxford, 1996. Lonneke van der Plas and Jörg Tiedemann. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 866–873, Sydney, Australia, July 2006. Association for Computational Linguistics. 290 Bibliography Lonneke van der Plas, Tanja Samardžić, and Paola Merlo. Cross-lingual validity of PropBank in the manual annotation of French. In Proceedings of the Fourth Linguistic Annotation Workshop, pages 113–117, Uppsala, Sweden, 2010. Association for Computational Linguistics. 
Lonneke van der Plas, Paola Merlo, and James Henderson. Scaling up automatic crosslingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 299– 304, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. Zeno Vendler. Linguistics in Philosophy. Cornell University Press, Ithaca, 1967. Ruprecht von Waldenfels. Aspect in the imperative across Slavic - a corpus driven pilot study. Oslo Studies in Language, 4(1):141–155, 2012. Rok Žaucer. The reflexive-introducing na- and the distinction between internal and external Slavic prefixes. In Anastasia Smirnova, Vedrana Mihaliček, and Lauren Ressue, editors, Formal Studies in Slavic Linguistics, pages 54–102, Newcastle upon Tyne, 2010. Cambridge Scholars Publishing. Anna Wierzbicka. Why can you have a drink when you can’t *have an eat? Language, 58(4):753–799, 1982. Edwin Williams. Lexical and synatctic complex predicates. In Alex Alsina, Joan Bresnan, and Peter Sells, editors, Complex predicates, pages 13–29, Stanford, California, 1997. CSLI Publications. Jennifer Williams and Graham Katz. Extracting and modeling durations for habits and events from twitter. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 223–227, Jeju Island, Korea, July 2012. Association for Computational Linguistics. Deirdre Wilson and Dan Sperber. Pragmatics and time. In Robyn Carston and Seiji Uchida, editors, Relevance Theory: Applications and Implications, Amsterdam, 1998. John Benjamins. 291 Bibliography Ian H. Witten and Eibe Frank. Data mining : practical machine learning tools and techniques. Morgan Kaufmann Publishers, San Francisco, 2005. Phillip Wolff and Tatyana Ventura. When Russians learn English: How the semantics of causation may change. Bilingualism: Language and Cognition, 12(2):153–176, 2009b. Phillip Wolff, Ga-Hyun Jeon, and Yu Li. 
Causal agents in English, Korean and Chinese: The role of internal and external causation. Language and Cognition, 1(2):165–194, 2009a. Shumin Wu and Martha Palmer. Semantic mapping using automatic word alignment and semantic role labeling. In Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 21–30, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. Nianwen Xue and Martha Palmer. Calibrating features for semantic role labeling. In Dekang Lin and Dekai Wu, editors, Proceedings of Empirical Methods in Natural Language Processing (EMNLP) 2004, pages 88–94, Barcelona, Spain, July 2004. Association for Computational Linguistics. David Yarowsky, Grace Ngai, and Richard Wicentowski. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the 1st international conference Human Language Technology, pages 161–168, San Diego, CA, 2001. Association for Computational Linguistics. Beñat Zapirain, Eneko Agirre, and Lluı́s Màrquez. Robustness and generalization of role sets: PropBank vs. VerbNet. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) and the Human Language Technology Conference, pages 550–558, Columbus, Ohio, 2008. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/P08/P08-1063. Sina Zarrieß and Jonas Kuhn. Exploiting translational correspondences for patternindependent mwe identification. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, pages 23–30, Singapore, August 2009. Association for Computational Linguistics. 292 Bibliography Sina Zarrieß, Aoife Cahill, Jonas Kuhn, and Christian Rohrer. A cross-lingual induction technique for german adverbial participles. In Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pages 34–42, Uppsala, Sweden, July 2010. 
A. Light verb constructions data

A.1. Word alignment of the constructions with ’take’

Mapping | EN | DE | Target DE (v, n) | Target EN (v, n)
2–2 | taken account | berücksichtigung finden | no no | bad good
2–1 | take account | berücksichtigen | no good | good good
2–1 | take account | berücksichtigen | no no | good good
2–1 | taken action | aktion | no good | bad good
2–2 | take check | kontrolle durchzuführen | no no | bad bad
2–2 | taken initiative | initiative ergriffen | no good | good good
2–1 | take precedence | dominieren | no no | bad good
2–2 | take decision | beschluss fassen | no good | bad good
2–2 | take account | rechnung tragen | no good | bad good
2–2 | take initiative | initiative ergriffen | no good | good good
2–1 | take seat | sitzen | no bad | bad bad
2–2 | approach taken | kurs verfolgt | no bad | bad bad
2–1 | decision taken | beschlossen | no bad | good bad
2–2 | take account | rechnung tragen | no good | bad good
2–2 | take note | nehmen (zur) kenntnis | no good | good good
2–1 | initiative taken | initiative | no good | bad good
2–2 | take view | sind (der) ansicht | no good | bad good
2–1 | take account | berücksichtigen | no no | good good
2–1 | take account | berücksichtigt | no no | good good
2–2 | take action | maßnahmen ergreifen | good good | good good
2–2 | taken view | meinung vertreten | no good | good good
2–1 | taken initiative | vorantreiben | no no | bad bad
2–2 | take view | vertreten meinung | no good | good good
2–1 | take action | aufarbeitet | no no | good good
2–2 | took account | rechnung tragen | no no | good good
2–2 | took steps | schritte unternähme | no good | good good
2–2 | taken note | (zur) kenntnis genommen | no no | good good
2–1 | taken initiative | initiative | no good | bad bad
2–2 | take view | bin (der) ansicht | no good | bad good
2–2 | take steps | schritte unternehmen | no good | bad good
2–1 | take decision | entscheiden | no no | good good
2–1 | take cognisance | berücksichtigen | no bad | bad bad
2–1 | take account | berücksichtigen | no good | good good
2–2 | decision taken | entscheidung getroffen | good good | good good
2–1 | decision taken | entscheiden | no no | good good
2–0 | actions take | no translation | no no | bad bad
2–2 | taken step | schritt kommen | no good | bad good
2–1 | steps taken | schritte | no good | bad good
2–1 | take decisions | beschließen | no no | good good
2–1 | take care | kümmert | no no | good good
2–2 | decision taken | beschlüsse fassen | no good | bad bad
2–2 | taken note | (zur) kenntnis genommen | no good | good good
2–0 | taken decisions | no translation | no no | bad bad
2–2 | take action | maßnahmen ergriffen | no good | good good
2–2 | take steps | schritte unternehmen | no good | bad good
2–2 | decision taken | gefaßten beschlüssen | no good | good good
2–2 | decision taken | entscheidung getroffen | good good | good good
2–2 | decision taken | getroffenen entscheidung | good good | good good
2–2 | steps taken | schritte vollziehen | no good | bad good
2–1 | take decision | entscheidung | no good | bad good
2–1 | take approach | herantreten | no bad | bad bad
2–2 | take break | pause einlegen | no good | good good
2–1 | take (into) account | berücksichtigt | no good | good good
2–2 | action taken | sanktionen verhängt | good bad | good good
2–2 | steps taken | maßnahmen ergriffen | no good | good good
2–2 | take decision | entscheidung treffen | no good | good good
2–1 | take action | einschreiten | no no | bad bad
2–1 | take notice | berücksichtigen | no no | good good
2–2 | take view | sind (der) ansicht | no good | bad good
2–2 | decisions taken | beschlüsse gefaßt | no good | good good
2–1 | take (into) account | übernehmen | no no | good bad
2–2 | decision taken | entscheidung getroffen | good good | good good
2–2 | take view | (um) standpunkt vertreten | no no | bad good
2–1 | taken (into) account | berücksichtigt | no no | good good
2–2 | decision taken | beschluß angenommen | no good | good good
2–1 | taken account | berücksichtigt | no good | good good
2–2 | decision taken | beschlüsse getroffen | no good | good good
2–2 | steps taken | hürden nehmen | no no | bad bad
2–2 | steps take | schritte unternimmt | bad good | good good
2–2 | took view | ansicht vertreten | bad good | good good
2–2 | took decision | beschluß gefaßt | bad good | good good
2–1 | approach took | vorgehen | no no | bad bad
2–1 | took account | erörtert | bad no | good good
2–2 | take steps | maßnahmen ergriffen | no good | good good
2–2 | decisions taken | entscheidungen getroffen | no good | good good
2–2 | vote taken | abstimmung findet-statt | no good | good good
2–1 | take (into) account | berücksichtigen | no good | good good
2–2 | samples taken | proben gezogen | no good | bad good
2–1 | take look | anschaut | no good | bad good
2–2 | obligations take | verpflichtungen wahrnehmen | no good | good good
2–1 | takes account | berücksichtigt | no good | good good
2–2 | decision taken | getroffene entscheidung | no good | good good
2–1 | action taken | maßnahmen | no good | bad bad
2–0 | action taken | no translation | no no | bad bad
2–2 | take decisions | entscheidungen treffen | no good | good good
2–1 | take (into) account | einbezogen | no no | bad bad
2–1 | taken note | notiert | no good | good good
2–2 | decisions taken | getroffene beschlüsse | no no | good good
2–2 | take notice | berücksichtigen | no no | bad bad
2–2 | took step | schritt vollzogen | no good | bad good
2–2 | take control | kontrolle bringen | no good | bad good
2–0 | step take | no translation | bad bad | bad bad
2–1 | taken (into) account | berücksichtigt | no good | good good
2–0 | action taken | no translation | no no | bad bad
2–2 | take account | rechnung tragen | no good | bad good
2–2 | decisions taken | getroffene entscheidungen | no good | bad good
2–2 | decisions taken | gefällten entscheidungen | no good | bad good
2–1 | take account | berücksichtigen | no no | good good
2–2 | took decision | beschluß gefaßt | bad good | good good
2–1 | decisions taken | beschlüsse | bad good | good good

A.2. Word alignment of the constructions with ’make’

Mapping | EN | DE | Target DE (v, n) | Target EN (v, n)
2–1 | make choices | auszuwählen | no no | good good
2–0 | make use | no translation | no no | bad bad
2–1 | make progress | vorankommen | no no | good good
2–2 | makes cuts | kürzungen vornimmt | no no | bad bad
2–2 | make decisions | entscheiden betreffen | no no | bad bad
2–1 | make contribution | beitragen | no bad | bad bad
2–2 | make decisions | entscheidungen treffen | no good | good good
2–1 | make reduction | reduziert | no no | bad bad
2–1 | make start | anfangen | no no | bad bad
2–2 | make points | punkte ansprechen | no good | good good
2–1 | make use | einsetzen | no no | bad bad
2–2 | make point | verfahrensfrage anzusprechen | no good | good good
2–2 | make contribution | beitrag geleistet | no good | bad good
2–1 | speech made | rede | no good | bad bad
2–2 | comparison made | vergleich anstellen | no good | good good
2–1 | investments made | investiert | no bad | bad bad
2–2 | progress made | fortschritte erzielt | no good | good good
2–2 | make comments | bemerkungen machen | no good | good good
2–1 | make decisions | entscheiden | no no | good good
2–1 | make comment | sagen | no good | good good
2–1 | make provision | einrichten | no no | bad bad
2–1 | comments make | anmerkungen | no good | good good
2–1 | proposal makes | vorschlag | no good | bad good
2–2 | make suggestion | anmerkungen machen | no no | good bad
2–2 | make checks | prüfungen vornehmen | no no | good bad
2–2 | make progress | schritt getan | no no | bad bad
2–1 | make changes | korrekturen | no no | bad good
2–1 | comments made | aussagen | no good | good good
2–1 | choice made | entscheidung | bad good | bad bad
2–1 | changes made | geändert | no no | good good
2–1 | attempts made | versucht | no good | bad bad
2–2 | made request | forderung erhoben | no bad | good bad
2–1 | attempts made | versucht | no good | good bad
2–1 | investments made | investitionen | no good | bad good
2–1 | points made | gesagt | no no | bad bad
2–2 | made contribution | beitrag geleistet | no good | good good
2–1 | investments made | investiert | no no | bad bad
2–2 | gains made | erzielten erfolge | no no | good bad
2–2 | made progress | kommt voran | no no | bad good
2–0 | makes fuss | no translation | no no | bad bad
2–1 | makes reference | befaßt | no no | bad bad
2–2 | make statement | erklärung abgegeben | bad good | bad bad
2–1 | make inspections | kontrollen | bad good | bad bad
2–2 | make assessment | bilanz ziehen | no bad | bad bad
2–1 | make statement | (um) wort | no no | bad bad
2–1 | make remarks | eingehen | no no | bad bad
2–2 | make observation | bemerkung machen | no good | good bad
2–2 | made contribution | beitrag leisten | no good | good good
2–2 | made start | hat start | no no | bad good
2–1 | suggestion made | anregung | no good | good good
2–1 | mistakes made | fehler | no good | good good
2–1 | comment made | anmerkung | bad good | good good
2–2 | progress made | fortschritte erzielt | bad good | good good
2–1 | reference made | erwähnt | bad no | bad bad
2–1 | appeal made | aufruf | no good | good good
2–1 | references made | verweis | no good | good good
2–2 | appointments made | einstellungen vorgenommen | no good | good good
2–2 | progress made | fortschritte erzielt | no good | bad good
2–2 | decisions made | entscheidungen berücksichtigt | no good | bad good
2–0 | reference made | no translation | no no | bad bad
2–2 | decisions made | entscheidung fallen | good good | good good
2–2 | statement made | abgegebenen erklärung | good good | good good
2–1 | comments made | bemerkungen | no good | good good
2–1 | made statement | ausgesagt | no no | good good
2–2 | make contribution | beitrag liefern | no good | good good
2–2 | make proposal | vorschlag machen | no good | good good
2–2 | make reference | bezug nehmen | no no | bad bad
2–2 | make contribution | beitrag leisten | no good | good good
2–2 | make progress | fortschritte erzielen | no good | bad good
2–2 | make contribution | beitrag leisten | good good | bad good
2–1 | make assessment | einschätzungsvermögen | no bad | good bad
2–0 | make point | no translation | no no | bad bad
2–1 | make demands | überfordern | no no | good good
2–2 | make statement | erklärung abgeben | no good | bad good
2–2 | contribution make | beitrag leisten | no good | good good
2–2 | make use | gebrauch machen | good no | good good
2–2 | make contribution | beitrag aufgaben | no good | bad good
2–2 | make changes | sehen veränderungen | no no | bad good
2–2 | make contribution | beitrag leisten | no good | good good
2–2 | make decisions | macht haben | no no | bad bad
2–1 | make points | bemerkungen | no good | bad good
2–1 | make profits | verdienstmöglichkeiten | no bad | good good
2–2 | achievements made | erreichten erfolge | no good | bad bad
2–2 | made proposal | vorschlag gelesen | no good | bad good
2–2 | points made | punkte angesprochen | no good | good good
2–2 | made attempts | versuch unternommen | no no | good bad
2–1 | points made | punkte | no good | bad bad
2–2 | demands made | forderungen gestellt | good good | bad good
2–1 | calls made | gefordert | no good | bad bad
2–2 | made proposal | vorschlag gemacht | no good | good good
2–1 | made decision | entschieden | no bad | bad bad
2–2 | decisions made | entscheidungen | no good | good good
2–1 | made pronouncements | gesagt | no good | bad bad
2–2 | made comment | bemerkung gemacht | good good | good good
2–1 | progress made | fortschritte | no good | good good
2–2 | proposal made | vorschlag machen | no good | bad good
2–1 | promises made | versprechen | no good | good good
2–1 | attempt made | versucht | no good | bad bad
2–1 | use made | förderung | no no | bad bad
2–2 | makes changes | änderungen vorgeschlagen | no good | bad good

A.3. Word alignments of regular constructions

Mapping | EN | DE | Target DE (v, n) | Target EN (v, n)
2–2 | create basis | grundlage schaffen | good good | good good
2–1 | jobs created | arbeitsplätze | bad good | bad good
2–2 | created climate | klima schaffen | no good | bad good
2–2 | create framework | entsteht rahmen | no good | good good
2–2 | jobs created | arbeitsplätze geschaffen | good good | good good
2–2 | create regime | regelung schaffen | no good | good good
2–2 | create inequality | wäre ungleichheit | no good | bad good
2–0 | create tape | no translation | no bad | bad bad
2–2 | creates networks | verwirklichung verkehrsnetze | no good | bad good
2–2 | they created | sie dimensioniert | good good | good good
2–2 | create area | finanzraum schaffen | good no | good good
2–2 | create jobs | schafft arbeitsplätze | good good | bad good
2–2 | jobs created | schaffung arbeitsplätzen | good good | bad good
2–2 | create problems | probleme heraufbeschwören | no good | good good
2–2 | create inequalities | ungleichheit schafft | good good | bad good
2–2 | consensus created | war einig | no no | bad good
2–2 | create jobs | arbeitsplätze schaffen | good good | good good
2–2 | create incentives | anreize schaffen | good good | good bad
2–2 | create institution | institutionen schaffen | no good | good bad
2–2 | peace created | frieden schaffen | good good | good good
2–2 | create charter | titel einfügen | no good | good bad
2–2 | create conditions | beitrittsfähigkeit herzustellen | no no | good good
2–2 | create council | sicherheitsrat schaffen | good bad | good good
2–2 | jobs created | arbeitsplätze entstehen | no good | bad good
2–2 | create societies | formierung gesellschaft | bad no | good bad
2–2 | jobs created | arbeitsplätze geschaffen | good good | good good
2–2 | create problem | schafft problem | no no | good good
2–2 | create code | verhaltenskodex schaffen | good good | good good
2–2 | create union | union vereinbaren | no no | good good
2–2 | create source | aufbau informationssystems | no no | good good
2–2 | literature produced | produzierten literatur | no good | bad good
2–2 | food produced | nahrungsmittel produziert | good good | good good
2–2 | produce cereals | anbau qualitätsgetreide | no no | bad good
2–2 | produce paper | grünbuch vorgelegt | good no | bad good
2–2 | produce industry | entwicklung industriezweigs | no no | bad good
2–2 | produce programme | vorbereitung programms | no good | bad good
2–2 | food produce | produzierten nahrungsmittel | good good | bad good
2–2 | produce wine | wein erzeugen | no good | good good
2–2 | sherry produced | hergestellten sherry | no good | good good
2–2 | draw outlines | rahmen vorgegeben | no good | bad bad
2–2 | parities fixed | paritäten festgelegt | no good | bad good
2–2 | constructed europe | europa aufgebaut | no no | bad good
2–2 | reconstruct kosovo | kosovo wiederaufbauen | no good | good good
2–2 | reconstruct balkans | wiederaufbau balkan | good good | bad good
2–2 | rebuild confidence | vertrauen aufbauen | good good | good good
2–2 | build bureaucracy | verwaltungsapparat aufbauen | no good | good good
2–2 | build democracy | aufbau demokratie | no no | bad bad
2–2 | establish shelter | bereitstellung unterkünften | no no | bad good
2–2 | commission established | kommission eingesetzt | good good | bad good
2–2 | establish framework | rahmen setzen | no good | bad good
2–2 | priorities established | festgeschriebenen prioritäten | no good | good good
2–2 | policy established | preispolitik stabilisiert | bad good | good good
2–2 | establish norms | arbeitsnormen einsetzen | no bad | bad good
2–0 | establish principle | no translation | no no | bad bad
2–2 | establish consistency | konsequenz verstärkt | no good | bad good
2–2 | establish systems | identifizierungssysteme festgelegt | bad no | bad good
2–2 | establish vanguard | bildung vorhut | no good | good good
2–2 | primacy established | primat herausgestellt | no no | good good
2–2 | multinationals established | konzerne niedergelassen | no no | bad bad
2–2 | establish foundations | grundlagen schaffen | good good | good good
2–2 | establish policy | umweltpolitik machen | no bad | bad good
2–2 | establish clarity | klarheit schaffen | no good | bad good
2–2 | they established | sie aufgebaut | good good | good good
2–2 | establish procedures | verfahren hervorgebracht | no good | bad good
2–2 | established stages | phasen festgelegt | good good | bad good
2–2 | criteria established | aufgestellten kriterien | no good | good good
2–2 | established perspectives | vorausschau aufgestellt | bad good | bad good
2–2 | procedures established | vereinbarenden verfahrensweise | good no | good good
2–2 | establish conditions | bedingungen schafft | no good | bad good
2–2 | partnerships established | beitrittspartnerschaft besteht | no bad | bad good
2–2 | establish itself | sich festigen | no no | bad bad
2–2 | create situation | situation heraufbeschwören | bad good | good good
2–2 | create peace | schaffung friedens | good good | good good
2–2 | created alternatives | alternativen geschaffen | good good | good good
2–2 | sherry produced | hergestellten sherry | no good | good good
2–2 | establish system | system schaffen | good good | bad good
2–2 | establish court | satzung strafgerichtshofs | no good | good good
2–2 | create fund | schaffung fonds | good good | good good
Light verb constructions data 2–2 opportunities cre- möglichkeiten bietet no good bad good ated 2–2 created instruments gibt instrument no good bad good 2–2 create opportunity möglichkeit finden no good bad good 2–2 opportunities arbeitsmöglichkeiten good good good good cre- ated geschaffen 2–2 create sources bauen spannungsfaktoren no no bad good 2–2 create problems probleme verursachen no good good good 2–2 jobs create arbeitsplätze geschaffen good good good good 2–2 create opportunities schaffung arbeitsplätze good good bad bad 2–2 create conditions bedingungen schaffen no good good good 2–2 created policy wirtschaftspolitik verwirk- no no good good licht 2–2 produce products produktion liefern bad no bad bad 2–1 produce goods produktion good no bad bad 2–2 produce them sie herstellen no good bad bad 2–2 produce obstacles handelshemmnisse erzeu- no good bad good festlegung einstellungsbe- no no bad bad gen 2–2 establish conditions dingungen 2–2 establish priorities sind prioritäten no no bad bad 2–1 establishes right legt good no good bad 2–2 establish regionalpartnerschaften no good bad good no good bad bad no no bad good partner- ships einzugehen 2–1 establish democracy demokratischen 2–2 distance established mindestentfernung einge- halten 2–1 establish chapter charta no good bad good 2–2 this established das festlegen no good good good 306 B. Corpus counts and measures for lexical causatives Verb abate Counts Caus. Anticaus. 
Verb  Counts  Caus. rate  Anticaus. rate  Passive rate  C/A ratio  Sp-value
abate  11  0.18  0.82  0  0.22  -1.5
accelerate  128  0.52  0.22  0.27  2.36  0.86
acetify  0  0.19  0.31  0.5  0.63  -0.47
acidify  0  0.19  0.31  0.5  0.63  -0.47
age  63  0.13  0.57  0.3  0.22  -1.5
agglomerate  0  0.19  0.31  0.5  0.63  -0.47
air  40  0.18  0.05  0.78  3.5  1.25
alkalify  0  0.19  0.31  0.5  0.63  -0.47
alter  436  0.58  0.08  0.34  7.17  1.97
ameliorate  4  0.05  0.5  0.5  0.1  -2.34
americanize  0  0.19  0.31  0.5  0.63  -0.47
asphyxiate  0  0.19  0.31  0.5  0.63  -0.47
atrophy  2  0.5  0.5  0  1  0
attenuate  3  0.33  0.1  0.67  3.25  1.18
awaken  29  0.55  0.21  0.24  2.67  0.98
balance  238  0.18  0.05  0.78  3.82  1.34
beam  0  0.19  0.31  0.5  0.63  -0.47
beep  0  0.19  0.31  0.5  0.63  -0.47
bend  17  0.47  0.53  0  0.89  -0.12
bivouac  0  0.19  0.31  0.5  0.63  -0.47
blacken  4  1  0.08  0  13  2.56
blare  0  0.19  0.31  0.5  0.63  -0.47
blast  4  0.05  1  0  0.05  -3.03
bleed  20  0.25  0.65  0.1  0.38  -0.96
blink  0  0.19  0.31  0.5  0.63  -0.47
blunt  1  1  0.31  0  3.25  1.18
blur  23  0.43  0.43  0.13  1  0
board  9  0.56  0.11  0.33  5  1.61
bounce  12  0.17  0.75  0.08  0.22  -1.5
break  701  0.33  0.3  0.38  1.1  0.09
brighten  2  0.1  0.5  0.5  0.19  -1.65
broaden  110  0.36  0.14  0.5  2.67  0.98
brown  0  0.19  0.31  0.5  0.63  -0.47
burn  137  0.23  0.18  0.59  1.24  0.22
burp  0  0.19  0.31  0.5  0.63  -0.47
burst  14  0.29  0.71  0  0.4  -0.92
buzz  1  0.19  1  0  0.19  -1.65
calcify  0  0.19  0.31  0.5  0.63  -0.47
canter  0  0.19  0.31  0.5  0.63  -0.47
capsize  4  0.05  1  0  0.05  -3.03
caramelize  0  0.19  0.31  0.5  0.63  -0.47
carbonify  0  0.19  0.31  0.5  0.63  -0.47
carbonize  0  0.19  0.31  0.5  0.63  -0.47
change  3457  0.37  0.42  0.2  0.87  -0.14
char  0  0.19  0.31  0.5  0.63  -0.47
cheapen  1  1  0.31  0  3.25  1.18
cheer  10  0.2  0.4  0.4  0.5  -0.69
chill  1  0.19  1  0  0.19  -1.65
choke  9  0.22  0.44  0.33  0.5  -0.69
clack  0  0.19  0.31  0.5  0.63  -0.47
clang  0  0.19  0.31  0.5  0.63  -0.47
clash  21  0.1  0.9  0  0.11  -2.25
clatter  0  0.19  0.31  0.5  0.63  -0.47
clean  58  0.53  0.12  0.34  4.43  1.49
clear  296  0.5  0.07  0.43  7.1  1.96
click  0  0.19  0.31  0.5  0.63  -0.47
clog  6  0.17  0.5  0.33  0.33  -1.1
close  1604  0.2  0.14  0.66  1.47  0.39
coagulate  0  0.19  0.31  0.5  0.63  -0.47
coarsen  0  0.19  0.31  0.5  0.63  -0.47
coil  0  0.19  0.31  0.5  0.63  -0.47
collapse  151  0.04  0.95  0.01  0.04  -3.18
collect  249  0.27  0.06  0.68  4.71  1.55
compress  3  0.33  0.1  0.67  3.25  1.18
condense  2  1  0.15  0  6.5  1.87
contract  0  0.19  0.31  0.5  0.63  -0.47
cool  13  0.23  0.46  0.31  0.5  -0.69
corrode  2  0.5  0.5  0  1  0
crack  22  0.27  0.5  0.23  0.55  -0.61
crash  15  0.01  1  0  0.01  -4.36
crease  0  0.19  0.31  0.5  0.63  -0.47
crimson  0  0.19  0.31  0.5  0.63  -0.47
crinkle  0  0.19  0.31  0.5  0.63  -0.47
crisp  0  0.19  0.31  0.5  0.63  -0.47
crumble  35  0.03  0.94  0.03  0.03  -3.5
crumple  0  0.19  0.31  0.5  0.63  -0.47
crystallize  2  1  0.15  0  6.5  1.87
dampen  9  1  0.03  0  29.25  3.38
dangle  5  0.6  0.2  0.2  3  1.1
darken  4  0.5  0.08  0.5  6.5  1.87
decelerate  1  0.19  0.31  1  0.63  -0.47
decentralize  9  0.22  0.03  0.78  6.5  1.87
decompose  0  0.19  0.31  0.5  0.63  -0.47
decrease  253  0.11  0.82  0.07  0.14  -1.97
deepen  84  0.42  0.39  0.19  1.06  0.06
deflate  3  0.06  0.67  0.33  0.1  -2.34
defrost  1  0.19  0.31  1  0.63  -0.47
degenerate  59  0  0.98  0.02  0  -5.71
degrade  20  0.15  0.05  0.8  3  1.1
dehumidify  0  0.19  0.31  0.5  0.63  -0.47
delight  834  0.09  0  0.91  35.5  3.57
demagnetize  0  0.19  0.31  0.5  0.63  -0.47
democratize  4  0.25  0.08  0.75  3.25  1.18
depressurize  0  0.19  0.31  0.5  0.63  -0.47
desiccate  0  0.19  0.31  0.5  0.63  -0.47
destabilize  19  0.68  0.02  0.32  42.25  3.74
deteriorate  291  0.03  0.96  0.01  0.03  -3.43
detonate  5  0.2  0.06  0.8  3.25  1.18
dim  4  0.75  0.25  0  3  1.1
diminish  189  0.31  0.4  0.29  0.76  -0.27
dirty  0  0.19  0.31  0.5  0.63  -0.47
disintegrate  14  0.14  0.86  0  0.17  -1.79
dissipate  16  0.44  0.25  0.31  1.75  0.56
dissolve  63  0.17  0.11  0.71  1.57  0.45
distend  0  0.19  0.31  0.5  0.63  -0.47
divide  686  0.16  0.04  0.8  4.19  1.43
double  220  0.21  0.63  0.15  0.34  -1.08
drain  26  0.54  0.12  0.35  4.67  1.54
drift  46  0.13  0.87  0  0.15  -1.9
drive  586  0.36  0.11  0.53  3.33  1.2
drop  303  0.22  0.42  0.36  0.54  -0.62
drown  51  0.2  0.49  0.31  0.4  -0.92
dry  18  0.06  0.94  0  0.06  -2.83
dull  2  1  0.15  0  6.5  1.87
ease  98  0.67  0.14  0.18  4.71  1.55
empty  27  0.22  0.22  0.56  1  0
emulsify  0  0.19  0.31  0.5  0.63  -0.47
energize  0  0.19  0.31  0.5  0.63  -0.47
enlarge  183  0.16  0.33  0.51  0.5  -0.69
enthuse  3  0.33  0.33  0.33  1  0
equalize  0  0.19  0.31  0.5  0.63  -0.47
evaporate  20  0.01  1  0  0.01  -4.64
even  9  0.44  0.22  0.33  2  0.69
expand  343  0.2  0.52  0.28  0.39  -0.94
explode  50  0.04  0.92  0.04  0.04  -3.14
fade  40  0.05  0.95  0  0.05  -2.94
fatten  4  0.05  0.08  1  0.63  -0.47
federate  1  1  0.31  0  3.25  1.18
fill  265  0.37  0.08  0.55  4.71  1.55
firm  3  0.33  0.1  0.67  3.25  1.18
flash  3  0.06  1  0  0.06  -2.75
flatten  7  0.29  0.14  0.57  2  0.69
float  34  0.38  0.18  0.44  2.17  0.77
flood  88  0.26  0.33  0.41  0.79  -0.23
fly  114  0.22  0.78  0  0.28  -1.27
fold  7  0.03  1  0  0.03  -3.59
fossilize  0  0.19  0.31  0.5  0.63  -0.47
fracture  4  0.25  0.08  0.75  3.25  1.18
fray  1  0.19  1  0  0.19  -1.65
freeze  115  0.18  0.04  0.77  4.2  1.44
freshen  2  1  0.15  0  6.5  1.87
frost  0  0.19  0.31  0.5  0.63  -0.47
fructify  0  0.19  0.31  0.5  0.63  -0.47
fuse  3  0.33  0.33  0.33  1  0
gallop  5  0.2  0.8  0  0.25  -1.39
gasify  0  0.19  0.31  0.5  0.63  -0.47
gelatinize  0  0.19  0.31  0.5  0.63  -0.47
gladden  1  0.19  0.31  1  0.63  -0.47
glide  0  0.19  0.31  0.5  0.63  -0.47
glutenize  0  0.19  0.31  0.5  0.63  -0.47
granulate  0  0.19  0.31  0.5  0.63  -0.47
gray  0  0.19  0.31  0.5  0.63  -0.47
green  0  0.19  0.31  0.5  0.63  -0.47
grieve  6  0.17  0.67  0.17  0.25  -1.39
grow  1379  0.14  0.78  0.08  0.19  -1.68
halt  130  0.35  0.05  0.61  7.5  2.01
hang  106  0.25  0.62  0.12  0.41  -0.89
harden  14  0.21  0.64  0.14  0.33  -1.1
harmonize  89  0.28  0.07  0.65  4.17  1.43
hasten  23  0.3  0.65  0.04  0.47  -0.76
heal  31  0.29  0.39  0.32  0.75  -0.29
heat  26  0.27  0.27  0.46  1  0
heighten  59  0.53  0.07  0.41  7.75  2.05
hoot  0  0.19  0.31  0.5  0.63  -0.47
humidify  0  0.19  0.31  0.5  0.63  -0.47
hush  18  0.01  0.11  0.89  0.1  -2.34
hybridize  0  0.19  0.31  0.5  0.63  -0.47
ignite  6  0.83  0.17  0  5  1.61
improve  3021  0.45  0.22  0.33  2.03  0.71
increase  4292  0.39  0.41  0.2  0.95  -0.05
incubate  4  0.5  0.5  0  1  0
inflate  12  0.42  0.08  0.5  5  1.61
intensify  242  0.46  0.24  0.31  1.95  0.67
iodize  0  0.19  0.31  0.5  0.63  -0.47
ionize  2  0.1  0.15  1  0.63  -0.47
jangle  0  0.19  0.31  0.5  0.63  -0.47
jingle  0  0.19  0.31  0.5  0.63  -0.47
jump  61  0.39  0.61  0  0.65  -0.43
kindle  9  0.33  0.03  0.67  9.75  2.28
lean  24  0.04  0.92  0.04  0.05  -3.09
leap  11  0.02  1  0  0.02  -4.05
lengthen  19  0.42  0.32  0.26  1.33  0.29
lessen  51  0.61  0.2  0.2  3.1  1.13
level  145  0.18  0.16  0.66  1.13  0.12
levitate  0  0.19  0.31  0.5  0.63  -0.47
light  28  0.36  0.18  0.46  2  0.69
lighten  17  0.71  0.12  0.18  6  1.79
lignify  0  0.19  0.31  0.5  0.63  -0.47
liquefy  0  0.19  0.31  0.5  0.63  -0.47
lodge  115  0.39  0.02  0.59  22.5  3.11
loop  0  0.19  0.31  0.5  0.63  -0.47
loose  2  0.5  0.15  0.5  3.25  1.18
loosen  8  0.5  0.04  0.5  13  2.56
macerate  0  0.19  0.31  0.5  0.63  -0.47
madden  0  0.19  0.31  0.5  0.63  -0.47
magnetize  0  0.19  0.31  0.5  0.63  -0.47
magnify  7  0.14  0.04  0.86  3.25  1.18
march  25  0.08  0.88  0.04  0.09  -2.4
mature  30  0.01  0.9  0.1  0.01  -4.94
mellow  0  0.19  0.31  0.5  0.63  -0.47
melt  24  0.01  0.92  0.08  0.01  -4.74
moisten  0  0.19  0.31  0.5  0.63  -0.47
move  2910  0.11  0.8  0.09  0.14  -1.97
muddy  2  0.5  0.15  0.5  3.25  1.18
multiply  102  0.27  0.49  0.24  0.56  -0.58
narrow  56  0.36  0.34  0.3  1.05  0.05
neaten  0  0.19  0.31  0.5  0.63  -0.47
neutralize  4  0.25  0.08  0.75  3.25  1.18
nitrify  0  0.19  0.31  0.5  0.63  -0.47
obsess  26  0.04  0.01  0.96  3.25  1.18
open  1627  0.54  0.14  0.32  3.79  1.33
operate  994  0.1  0.84  0.06  0.12  -2.16
ossify  1  0.19  0.31  1  0.63  -0.47
overturn  78  0.29  0.01  0.69  23  3.14
oxidize  0  0.19  0.31  0.5  0.63  -0.47
pale  1  0.19  1  0  0.19  -1.65
perch  1  0.19  0.31  1  0.63  -0.47
petrify  1  0.19  0.31  1  0.63  -0.47
polarize  0  0.19  0.31  0.5  0.63  -0.47
pop  4  0.25  0.75  0  0.33  -1.1
proliferate  20  0.15  0.85  0  0.18  -1.73
propagate  13  0.38  0.08  0.54  5  1.61
purify  1  0.19  0.31  1  0.63  -0.47
purple  0  0.19  0.31  0.5  0.63  -0.47
putrefy  1  0.19  1  0  0.19  -1.65
puzzle  30  0.17  0.07  0.77  2.5  0.92
quadruple  10  0.4  0.6  0  0.67  -0.41
quicken  3  0.67  0.33  0  2  0.69
quiet  1  0.19  1  0  0.19  -1.65
quieten  2  0.1  1  0  0.1  -2.34
race  6  0.17  0.5  0.33  0.33  -1.1
redden  0  0.19  0.31  0.5  0.63  -0.47
regularize  3  0.06  0.33  0.67  0.19  -1.65
rekindle  13  0.54  0.08  0.38  7  1.95
reopen  95  0.45  0.08  0.46  5.38  1.68
reproduce  47  0.3  0.3  0.4  1  0
rest  258  0.05  0.94  0  0.06  -2.85
revolve  22  0.32  0.68  0  0.47  -0.76
ring  50  0.28  0.54  0.18  0.52  -0.66
rip  12  0.33  0.03  0.67  13  2.56
ripen  1  0.19  0.31  1  0.63  -0.47
roll  38  0.37  0.16  0.47  2.33  0.85
rotate  5  0.04  0.6  0.4  0.06  -2.75
roughen  0  0.19  0.31  0.5  0.63  -0.47
round  67  0.66  0.07  0.27  8.8  2.17
rumple  0  0.19  0.31  0.5  0.63  -0.47
run  1293  0.3  0.56  0.14  0.53  -0.64
rupture  1  0.19  0.31  1  0.63  -0.47
rustle  0  0.19  0.31  0.5  0.63  -0.47
sadden  59  0.17  0.05  0.78  3.33  1.2
scorch  2  0.1  0.15  1  0.63  -0.47
sear  0  0.19  0.31  0.5  0.63  -0.47
settle  455  0.17  0.21  0.62  0.83  -0.18
sharpen  13  0.69  0.15  0.15  4.5  1.5
shatter  52  0.29  0.06  0.65  5  1.61
shelter  17  0.47  0.24  0.29  2  0.69
shine  18  0.17  0.83  0  0.2  -1.61
short  0  0.19  0.31  0.5  0.63  -0.47
short-  0  0.19  0.31  0.5  0.63  -0.47
shorten  55  0.25  0.07  0.67  3.5  1.25
shrink  80  0.08  0.93  0  0.08  -2.51
shrivel  0  0.19  0.31  0.5  0.63  -0.47
shut  155  0.3  0.1  0.61  3.07  1.12
sicken  3  0.33  0.1  0.67  3.25  1.18
silicify  0  0.19  0.31  0.5  0.63  -0.47
silver  0  0.19  0.31  0.5  0.63  -0.47
singe  0  0.19  0.31  0.5  0.63  -0.47
sink  120  0.1  0.82  0.08  0.12  -2.1
sit  723  0.06  0.93  0.01  0.07  -2.71
slack  0  0.19  0.31  0.5  0.63  -0.47
slacken  12  0.42  0.58  0  0.71  -0.34
slide  27  0.11  0.89  0  0.13  -2.08
slim  4  0.05  0.25  0.75  0.19  -1.65
slow  156  0.48  0.38  0.13  1.25  0.22
smarten  0  0.19  0.31  0.5  0.63  -0.47
smooth  14  0.64  0.02  0.36  29.25  3.38
snap  4  0.25  0.5  0.25  0.5  -0.69
soak  4  0.05  0.25  0.75  0.19  -1.65
sober  1  0.19  1  0  0.19  -1.65
soften  16  0.56  0.13  0.31  4.5  1.5
solidify  2  0.1  0.5  0.5  0.19  -1.65
sour  2  0.5  0.15  0.5  3.25  1.18
spin  7  0.29  0.43  0.29  0.67  -0.41
splay  0  0.19  0.31  0.5  0.63  -0.47
splinter  1  0.19  0.31  1  0.63  -0.47
split  117  0.24  0.05  0.71  4.67  1.54
sprout  5  0.2  0.8  0  0.25  -1.39
squeak  0  0.19  0.31  0.5  0.63  -0.47
squeal  0  0.19  0.31  0.5  0.63  -0.47
squirt  0  0.19  0.31  0.5  0.63  -0.47
stabilize  15  0.27  0.47  0.27  0.57  -0.56
stand  1349  0.15  0.85  0  0.17  -1.76
steady  2  0.1  0.5  0.5  0.19  -1.65
steep  15  0.07  0.02  0.93  3.25  1.18
steepen  0  0.19  0.31  0.5  0.63  -0.47
stiffen  1  1  0.31  0  3.25  1.18
stifle  69  0.57  0.04  0.39  13  2.56
straighten  9  0.33  0.22  0.44  1.5  0.41
stratify  0  0.19  0.31  0.5  0.63  -0.47
strengthen  1670  0.52  0.05  0.43  10.47  2.35
stretch  90  0.41  0.24  0.34  1.68  0.52
submerge  12  0.02  0.08  0.92  0.19  -1.65
subside  10  0.02  1  0  0.02  -3.95
suffocate  25  0.32  0.28  0.4  1.14  0.13
sweeten  2  0.5  0.15  0.5  3.25  1.18
swim  20  0.01  1  0  0.01  -4.64
swing  15  0.13  0.87  0  0.15  -1.87
tame  4  0.75  0.08  0.25  9.75  2.28
tan  0  0.19  0.31  0.5  0.63  -0.47
taper  0  0.19  0.31  0.5  0.63  -0.47
tauten  0  0.19  0.31  0.5  0.63  -0.47
tear  95  0.31  0.03  0.66  9.67  2.27
tense  0  0.19  0.31  0.5  0.63  -0.47
thaw  0  0.19  0.31  0.5  0.63  -0.47
thicken  1  0.19  0.31  1  0.63  -0.47
thin  1  0.19  0.31  1  0.63  -0.47
thrill  11  0.18  0.03  0.82  6.5  1.87
tighten  175  0.38  0.07  0.55  5.58  1.72
tilt  7  0.43  0.29  0.29  1.5  0.41
tinkle  0  0.19  0.31  0.5  0.63  -0.47
tire  36  0.06  0.47  0.47  0.12  -2.14
topple  19  0.16  0.11  0.74  1.5  0.41
toughen  7  0.57  0.14  0.29  4  1.39
triple  19  0.21  0.74  0.05  0.29  -1.25
trot  15  0.2  0.13  0.67  1.5  0.41
turn  2003  0.37  0.48  0.15  0.77  -0.26
twang  0  0.19  0.31  0.5  0.63  -0.47
twirl  0  0.19  0.31  0.5  0.63  -0.47
twist  8  0.63  0.04  0.38  16.25  2.79
ulcerate  0  0.19  0.31  0.5  0.63  -0.47
unfold  63  0.05  0.95  0  0.05  -3
unionize  0  0.19  0.31  0.5  0.63  -0.47
vaporize  0  0.19  0.31  0.5  0.63  -0.47
vary  159  0.11  0.84  0.05  0.14  -2
vibrate  0  0.19  0.31  0.5  0.63  -0.47
vitrify  2  0.1  0.5  0.5  0.19  -1.65
volatilize  0  0.19  0.31  0.5  0.63  -0.47
waken  1  0.19  1  0  0.19  -1.65
walk  76  0.16  0.84  0  0.19  -1.67
warm  10  0.1  0.9  0  0.11  -2.2
warp  5  0.6  0.06  0.4  9.75  2.28
weaken  435  0.53  0.09  0.38  5.75  1.75
weary  4  0.75  0.08  0.25  9.75  2.28
westernize  0  0.19  0.31  0.5  0.63  -0.47
whirl  1  0.19  0.31  1  0.63  -0.47
whiten  0  0.19  0.31  0.5  0.63  -0.47
widen  164  0.38  0.4  0.23  0.95  -0.05
wind  43  0.26  0.21  0.53  1.22  0.2
worry  597  0.29  0.46  0.26  0.63  -0.47
worsen  172  0.31  0.65  0.05  0.48  -0.74
wrinkle  0  0.19  0.31  0.5  0.63  -0.47
yellow  0  0.19  0.31  0.5  0.63  -0.47
ONE  26  0.19  0.31  0.5  0.63  -0.47
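The derived columns of the lexical-causative table above can be checked against each other: up to rounding of the displayed values, the Sp-value column appears to be the natural logarithm of the C/A ratio (for instance, ln(7.17) ≈ 1.97 for 'alter'). A minimal sketch of that check, using a few rows copied from the table (the function and variable names are ours, not the thesis's):

```python
import math

# (C/A ratio, Sp-value) pairs copied from the table above.
rows = {
    "alter": (7.17, 1.97),
    "accelerate": (2.36, 0.86),
    "abate": (0.22, -1.5),
}

for verb, (ratio, sp) in rows.items():
    # Sp-value matches ln(C/A ratio) up to rounding of the printed figures.
    assert abs(math.log(ratio) - sp) < 0.05, verb

print("Sp-value consistent with ln(C/A ratio) for all sampled verbs")
```

Note also that verbs with zero corpus counts all carry the same default rates (0.19, 0.31, 0.5) and ratio 0.63, which is consistent with a smoothed estimate rather than raw relative frequencies.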
C. Verb aspect and event duration data

Verb  Pref.  Suff.  Asp.  Dur.
believe  0.3  0.9  0.1  LONG
is  0.6  0.1  0.3  LONG
sold  0.9  0.3  0.7  LONG
deal  0.8  0.5  0.8  LONG
find  0.9  0.5  0.9  LONG
get  0.7  0.4  0.8  LONG
goes  0.5  0.3  0.2  LONG
saw  0.2  0.1  0.8  SHORT
calls  0.4  0.2  0.2  SHORT
saw  0.2  0.1  0.8  SHORT
owns  0.8  0.8  0.2  LONG
blew  0.8  0.2  0.8  SHORT
crashed  0.2  0.6  0.6  LONG
became  0.9  0.1  0.8  SHORT
thought  0.6  0.0  0.6  LONG
released  0.2  0.2  0.6  SHORT
hit  0.7  0.1  0.7  LONG
exploded  0.3  0.7  0.7  SHORT
thought  0.6  0.0  0.6  LONG
went  0.7  0.4  0.8  SHORT
spent  0.8  0.2  0.3  LONG
said  0.2  0.8  0.9  SHORT
think  0.4  0.1  0.3  LONG
hear  0.1  0.9  0.9  SHORT
going  0.4  0.4  0.4  LONG
went  0.7  0.4  0.8  SHORT
think  0.4  0.1  0.3  LONG
tries  0.8  0.3  0.4  SHORT
talking  0.9  0.1  0.3  LONG
asks  0.4  0.1  0.3  SHORT
estimates  0.7  0.3  0.7  LONG
believe  0.3  0.9  0.1  LONG
is  0.6  0.1  0.3  LONG
make  0.6  0.1  0.7  LONG
going  0.4  0.4  0.4  LONG
give  0.6  0.3  0.6  LONG
believe  0.3  0.9  0.1  LONG
kept  0.6  0.2  0.4  LONG
lost  0.8  0.1  0.8  LONG
see  0.1  0.0  0.9  LONG
are  0.6  0.2  0.3  LONG
see  0.1  0.0  0.9  LONG
helping  0.7  0.3  0.3  LONG
want  0.4  0.1  0.2  LONG
fallen  0.4  0.1  0.9  LONG
says  0.1  0.3  0.8  SHORT
are  0.6  0.2  0.3  LONG
invited  0.7  0.7  0.3  LONG
turning  0.4  0.2  0.2  LONG
predicted  0.7  0.7  0.3  LONG
plunged  0.3  0.3  0.3  LONG
tried  0.9  0.3  0.5  LONG
soared  0.7  0.3  0.3  LONG
said  0.2  0.8  0.9  LONG
said  0.2  0.8  0.9  SHORT
tried  0.9  0.3  0.5  LONG
double  0.7  0.3  0.7  LONG
endures  0.9  0.1  0.7  LONG
means  0.4  0.4  0.1  LONG
persuade  0.7  0.5  0.5  LONG
spending  0.9  0.1  0.4  LONG
flew  0.7  0.3  0.7  LONG
say  0.2  0.3  0.9  SHORT
become  1.0  0.1  0.9  LONG
held  0.4  0.3  0.3  LONG
included  0.8  0.2  0.5  LONG
chosen  0.7  0.7  0.3  LONG
shipping  0.4  0.2  0.4  LONG
said  0.2  0.8  0.9  SHORT
stopped  0.8  0.1  0.8  LONG
said  0.2  0.8  0.9  SHORT
number  0.6  0.3  0.3  LONG
learned  0.9  0.1  0.8  SHORT
told  0.5  0.5  0.7  SHORT
named  0.5  0.5  0.5  SHORT
met  0.6  0.2  0.9  LONG
taken  0.7  0.2  0.8  SHORT
bring  0.9  0.3  0.8  LONG
hurried  0.6  0.2  0.6  LONG
led  0.4  0.1  0.4  LONG
makes  0.7  0.3  0.2  LONG
believe  0.3  0.9  0.1  LONG
picked  0.8  0.6  0.9  LONG
called  0.3  0.1  0.2  SHORT
followed  0.6  0.4  0.4  LONG
appears  0.8  0.2  0.7  SHORT
doing  0.3  0.2  0.4  LONG
say  0.2  0.3  0.9  SHORT
was  0.1  0.0  0.1  LONG
engaged  0.4  0.2  0.2  LONG
has  0.6  0.2  0.7  LONG
said  0.2  0.8  0.9  SHORT
want  0.4  0.1  0.2  LONG
retreated  0.7  0.7  0.7  LONG
told  0.5  0.5  0.7  SHORT
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
saw  0.2  0.1  0.8  LONG
said  0.2  0.8  0.9  SHORT
come  0.8  0.5  0.8  LONG
fear  0.4  0.2  0.4  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
committed  0.9  0.1  0.9  LONG
denied  0.8  0.6  0.6  LONG
expected  0.9  0.9  0.1  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
continue  0.8  0.2  0.6  LONG
indicated  0.6  0.2  0.6  SHORT
receive  0.8  0.2  0.2  LONG
told  0.5  0.5  0.7  SHORT
pushed  0.5  0.7  0.7  LONG
hopes  0.3  0.7  0.7  LONG
plans  0.5  0.8  0.5  LONG
comes  0.3  0.7  0.2  LONG
occurred  0.5  0.3  0.9  LONG
arrive  0.6  0.6  0.8  LONG
say  0.2  0.3  0.9  SHORT
said  0.2  0.8  0.9  SHORT
streamed  0.2  0.2  0.2  LONG
said  0.2  0.8  0.9  SHORT
quoted  0.3  0.7  0.3  LONG
took  0.7  0.3  0.9  LONG
produce  0.7  0.2  0.5  LONG
said  0.2  0.8  0.9  SHORT
continued  0.7  0.2  0.6  LONG
called  0.3  0.1  0.2  SHORT
fall  0.6  0.2  0.6  SHORT
has  0.6  0.2  0.7  LONG
hope  0.2  0.2  0.4  LONG
said  0.2  0.8  0.9  SHORT
seen  0.1  0.1  0.9  LONG
calls  0.4  0.2  0.2  SHORT
want  0.4  0.1  0.2  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
called  0.3  0.1  0.2  SHORT
need  0.6  0.3  0.4  LONG
said  0.2  0.8  0.9  SHORT
understands  0.4  0.1  0.7  LONG
said  0.2  0.8  0.9  SHORT
indicated  0.6  0.2  0.6  LONG
said  0.2  0.8  0.9  SHORT
called  0.3  0.1  0.2  SHORT
put  0.5  0.2  0.8  SHORT
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
say  0.2  0.3  0.9  SHORT
following  0.5  0.3  0.5  LONG
declined  0.3  0.3  0.3  SHORT
told  0.5  0.5  0.7  SHORT
included  0.8  0.2  0.5  LONG
save  0.2  0.2  0.8  LONG
said  0.2  0.8  0.9  SHORT
save  0.2  0.2  0.8  LONG
grew  0.6  0.4  0.4  LONG
begun  0.5  0.2  0.5  LONG
allows  0.3  0.3  0.7  LONG
say  0.2  0.3  0.9  SHORT
seeking  0.2  0.2  0.2  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
hope  0.2  0.2  0.4  LONG
ordered  0.7  0.3  0.7  LONG
welcomed  0.7  0.3  0.7  LONG
talking  0.9  0.1  0.3  SHORT
is  0.6  0.1  0.3  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
called  0.3  0.1  0.2  SHORT
becoming  0.8  0.2  0.5  LONG
has  0.6  0.2  0.7  LONG
said  0.2  0.8  0.9  SHORT
continued  0.7  0.2  0.6  LONG
said  0.2  0.8  0.9  SHORT
close  0.8  0.1  0.4  LONG
taking  0.2  0.4  0.2  LONG
said  0.2  0.8  0.9  SHORT
rests  0.8  0.2  0.6  LONG
called  0.3  0.1  0.2  SHORT
shown  0.8  0.5  0.5  SHORT
finished  0.7  0.4  0.7  SHORT
diminishes  0.7  0.3  0.7  LONG
is  0.6  0.1  0.3  LONG
recognizes  0.8  0.1  0.9  LONG
said  0.2  0.8  0.9  SHORT
announced  0.8  0.2  0.8  SHORT
leave  0.8  0.4  0.8  SHORT
add  0.8  0.2  0.5  LONG
said  0.2  0.8  0.9  SHORT
arrested  0.9  0.1  0.9  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
sent  0.5  0.2  0.7  LONG
said  0.2  0.8  0.9  SHORT
provide  0.5  0.2  0.8  LONG
received  0.7  0.1  0.3  LONG
plans  0.5  0.8  0.5  LONG
appeared  0.7  0.2  0.7  LONG
cut  0.8  0.6  0.8  LONG
find  0.9  0.5  0.9  LONG
double  0.7  0.3  0.7  LONG
involved  0.8  0.8  0.2  LONG
appoint  0.7  0.3  0.7  LONG
pushing  0.5  0.5  0.2  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
working  0.3  0.3  0.2  LONG
continued  0.7  0.2  0.6  LONG
hopes  0.3  0.7  0.7  LONG
attacked  0.3  0.3  0.3  LONG
working  0.3  0.3  0.2  LONG
continue  0.8  0.2  0.6  LONG
led  0.4  0.1  0.4  LONG
said  0.2  0.8  0.9  SHORT
began  0.2  0.0  0.8  LONG
establish  0.7  0.3  0.7  LONG
captured  0.8  0.2  0.8  LONG
happened  0.4  0.3  0.8  LONG
followed  0.6  0.4  0.4  LONG
formed  0.7  0.7  0.4  LONG
killed  0.8  0.1  0.6  SHORT
died  0.9  0.1  0.7  LONG
said  0.2  0.8  0.9  SHORT
killed  0.8  0.1  0.6  SHORT
sent  0.5  0.2  0.7  LONG
said  0.2  0.8  0.9  SHORT
turned  0.9  0.7  0.9  LONG
believed  0.1  0.9  0.1  LONG
clearing  0.8  0.5  0.8  LONG
searching  0.7  0.3  0.7  LONG
considered  0.4  0.2  0.2  LONG
entered  0.8  0.4  0.8  LONG
said  0.2  0.8  0.9  SHORT
built  0.6  0.4  0.6  LONG
found  0.9  0.6  0.9  LONG
used  0.7  0.5  0.4  LONG
ordered  0.7  0.3  0.7  LONG
designed  0.8  0.2  0.8  LONG
buried  0.7  0.3  0.7  LONG
gave  0.4  0.1  0.5  LONG
fled  0.8  0.5  0.8  LONG
press  0.7  0.7  0.7  LONG
falling  0.5  0.1  0.3  LONG
said  0.2  0.8  0.9  SHORT
went  0.7  0.4  0.8  LONG
brought  0.8  0.1  0.9  LONG
rejected  0.7  0.3  0.7  SHORT
blocked  0.8  0.4  0.6  LONG
look  0.6  0.0  0.5  LONG
said  0.2  0.8  0.9  SHORT
finish  0.8  0.2  0.8  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
assailed  0.7  0.3  0.7  LONG
said  0.2  0.8  0.9  SHORT
calls  0.4  0.2  0.2  SHORT
create  0.2  0.2  0.5  LONG
said  0.2  0.8  0.9  SHORT
called  0.3  0.1  0.2  SHORT
touched  0.8  0.6  0.6  SHORT
join  0.8  0.2  0.8  LONG
shot  0.5  0.3  0.6  SHORT
marked  0.4  0.2  0.2  SHORT
agreed  0.4  0.2  0.9  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
had  0.5  0.2  0.6  LONG
consider  0.1  0.1  0.3  LONG
had  0.5  0.2  0.6  LONG
lived  0.1  0.2  0.1  LONG
secured  0.7  0.7  0.7  LONG
require  0.2  0.2  0.2  LONG
defend  0.3  0.3  0.3  LONG
remain  0.9  0.1  0.8  LONG
said  0.2  0.8  0.9  SHORT
covered  0.9  0.4  0.4  LONG
gotten  0.7  0.3  0.7  LONG
discussed  0.3  0.3  0.7  LONG
told  0.5  0.5  0.7  SHORT
quoted  0.3  0.7  0.3  SHORT
crashed  0.2  0.6  0.6  SHORT
undermining  0.5  0.5  0.2  SHORT
killed  0.8  0.1  0.6  LONG
claimed  0.1  0.1  0.1  LONG
made  0.8  0.2  0.7  SHORT
called  0.3  0.1  0.2  SHORT
attacked  0.3  0.3  0.3  LONG
permit  0.8  0.2  0.5  LONG
think  0.4  0.1  0.3  LONG
said  0.2  0.8  0.9  SHORT
indicated  0.6  0.2  0.6  SHORT
fell  0.5  0.1  0.7  LONG
cut  0.8  0.6  0.8  LONG
calls  0.4  0.2  0.2  SHORT
refer  0.8  0.5  0.5  SHORT
signed  0.5  0.8  0.5  SHORT
said  0.2  0.8  0.9  SHORT
ruled  0.5  0.2  0.2  LONG
said  0.2  0.8  0.9  SHORT
signed  0.5  0.8  0.5  SHORT
wants  0.3  0.2  0.3  LONG
claimed  0.1  0.1  0.1  LONG
told  0.5  0.5  0.7  SHORT
signed  0.5  0.8  0.5  SHORT
claiming  0.1  0.1  0.1  LONG
say  0.2  0.3  0.9  SHORT
sent  0.5  0.2  0.7  LONG
solved  0.2  0.5  0.5  LONG
suffered  0.2  0.1  0.2  LONG
wants  0.3  0.2  0.3  LONG
seen  0.1  0.1  0.9  LONG
suggested  0.9  0.5  0.6  SHORT
said  0.2  0.8  0.9  SHORT
occupies  0.8  0.2  0.2  LONG
worked  0.3  0.1  0.2  LONG
said  0.2  0.8  0.9  SHORT
expected  0.9  0.9  0.1  LONG
told  0.5  0.5  0.7  SHORT
led  0.4  0.1  0.4  LONG
calls  0.4  0.2  0.2  LONG
blocked  0.8  0.4  0.6  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
killed  0.8  0.1  0.6  LONG
arrived  0.7  0.5  0.8  LONG
wants  0.3  0.2  0.3  LONG
destroyed  0.9  0.2  0.8  LONG
faced  0.7  0.1  0.6  LONG
said  0.2  0.8  0.9  SHORT
created  0.5  0.2  0.8  LONG
made  0.8  0.2  0.7  LONG
killed  0.8  0.1  0.6  LONG
allowed  0.7  0.1  0.5  LONG
said  0.2  0.8  0.9  SHORT
moved  0.9  0.6  0.6  LONG
lift  0.8  0.4  0.8  LONG
signed  0.5  0.8  0.5  SHORT
charged  0.2  0.2  0.5  LONG
remain  0.9  0.1  0.8  LONG
stopped  0.8  0.1  0.8  LONG
argued  0.7  0.7  0.3  LONG
inspected  0.8  0.2  0.2  LONG
placed  0.6  0.2  0.4  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
resisted  0.9  0.3  0.4  LONG
visited  0.3  0.3  0.3  LONG
appointed  0.7  0.3  0.7  LONG
laid  0.8  0.1  0.8  SHORT
mention  0.8  0.7  0.8  SHORT
allowed  0.7  0.1  0.5  LONG
agreed  0.4  0.2  0.9  SHORT
marched  0.6  0.4  0.4  LONG
try  0.4  0.1  0.4  LONG
say  0.2  0.3  0.9  SHORT
said  0.2  0.8  0.9  SHORT
trying  0.8  0.5  0.4  LONG
said  0.2  0.8  0.9  SHORT
reported  0.5  0.8  0.5  SHORT
assisted  0.3  0.3  0.3  LONG
chanted  0.5  0.5  0.5  SHORT
reported  0.5  0.8  0.5  SHORT
said  0.2  0.8  0.9  SHORT
visited  0.3  0.3  0.3  LONG
carried  0.5  0.1  0.5  LONG
said  0.2  0.8  0.9  SHORT
found  0.9  0.6  0.9  LONG
finished  0.7  0.4  0.7  LONG
appeared  0.7  0.2  0.7  LONG
eliminated  0.2  0.8  0.8  LONG
prevented  0.8  0.2  0.8  LONG
ruled  0.5  0.2  0.2  LONG
pushed  0.5  0.7  0.7  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
marched  0.6  0.4  0.4  LONG
go  0.6  0.5  0.5  LONG
made  0.8  0.2  0.7  LONG
appointed  0.7  0.3  0.7  LONG
become  1.0  0.1  0.9  LONG
add  0.8  0.2  0.5  LONG
supposed  0.2  0.1  0.3  LONG
use  0.5  0.4  0.2  LONG
said  0.2  0.8  0.9  SHORT
set  0.6  0.2  0.7  LONG
hit  0.7  0.1  0.7  SHORT
place  0.5  0.2  0.4  LONG
caused  0.9  0.6  0.9  SHORT
ensure  0.3  0.7  0.7  LONG
demanded  0.5  0.5  0.2  LONG
provide  0.5  0.2  0.8  LONG
said  0.2  0.8  0.9  SHORT
emerge  0.5  0.2  0.8  LONG
seized  0.9  0.1  0.9  SHORT
emerged  0.8  0.2  0.8  LONG
bombed  0.3  0.7  0.7  SHORT
help  0.8  0.4  0.8  LONG
invited  0.7  0.7  0.3  LONG
has  0.6  0.2  0.7  LONG
kept  0.6  0.2  0.4  LONG
beginning  0.7  0.3  0.3  LONG
threatening  0.8  0.2  0.5  LONG
come  0.8  0.5  0.8  LONG
say  0.2  0.3  0.9  SHORT
reaching  0.5  0.5  0.8  LONG
wants  0.3  0.2  0.3  LONG
create  0.2  0.2  0.5  LONG
said  0.2  0.8  0.9  SHORT
prove  0.8  0.2  0.8  LONG
want  0.4  0.1  0.2  LONG
need  0.6  0.3  0.4  LONG
move  0.5  0.5  0.5  LONG
wrote  0.8  0.8  0.8  LONG
sent  0.5  0.2  0.7  LONG
reported  0.5  0.8  0.5  LONG
served  0.7  0.3  0.7  LONG
organising  0.2  0.8  0.8  LONG
spent  0.8  0.2  0.3  LONG
have  0.6  0.2  0.6  LONG
expressed  0.7  0.7  0.3  SHORT
invited  0.7  0.7  0.3  LONG
part  0.8  0.4  0.6  LONG
held  0.4  0.3  0.3  LONG
set  0.6  0.2  0.7  LONG
interpret  0.3  0.3  0.3  LONG
said  0.2  0.8  0.9  SHORT
do  0.3  0.2  0.4  LONG
said  0.2  0.8  0.9  SHORT
reproduced  0.3  0.7  0.7  SHORT
leaving  0.7  0.4  0.7  LONG
said  0.2  0.8  0.9  LONG
ordered  0.7  0.3  0.7  LONG
said  0.2  0.8  0.9  SHORT
hopes  0.3  0.7  0.7  LONG
say  0.2  0.3  0.9  SHORT
requires  0.3  0.3  0.3  LONG
continues  0.7  0.1  0.5  LONG
wanted  0.3  0.2  0.3  LONG
says  0.1  0.3  0.8  SHORT
said  0.2  0.8  0.9  SHORT
committed  0.9  0.1  0.9  LONG
gave  0.4  0.1  0.5  SHORT
prepared  0.2  0.5  0.5  LONG
said  0.2  0.8  0.9  SHORT
know  0.1  0.0  0.0  LONG
said  0.2  0.8  0.9  SHORT
trying  0.8  0.5  0.4  LONG
cost  0.3  0.3  0.3  LONG
sought  0.3  0.3  0.3  SHORT
said  0.2  0.8  0.9  SHORT
fails  0.6  0.1  0.9  LONG
allow  0.8  0.2  0.6  LONG
presented  0.8  0.2  0.5  SHORT
took  0.7  0.3  0.9  LONG
arrived  0.7  0.5  0.8  LONG
approaching  0.8  0.6  0.4  LONG
said  0.2  0.8  0.9  SHORT
think  0.4  0.1  0.3  LONG
fails  0.6  0.1  0.9  LONG
waste  0.7  0.3  0.7  LONG
allow  0.8  0.2  0.6  LONG
doing  0.3  0.2  0.4  LONG
wrapped  0.8  0.2  0.8  SHORT
think  0.4  0.1  0.3  LONG
says  0.1  0.3  0.8  SHORT
announced  0.8  0.2  0.8  LONG
added  0.9  0.1  0.9  SHORT
been  0.6  0.2  0.6  LONG
appears  0.8  0.2  0.7  LONG
think  0.4  0.1  0.3  LONG
come  0.8  0.5  0.8  LONG
think  0.4  0.1  0.3  LONG
fell  0.5  0.1  0.7  LONG
announced  0.8  0.2  0.8  SHORT
built  0.6  0.4  0.6  LONG
hold  0.9  0.1  0.8  LONG
seeing  0.5  0.2  0.5  LONG
expect  0.6  0.4  0.6  LONG
look  0.6  0.0  0.5  LONG
authorized  0.7  0.3  0.7  LONG
saw  0.2  0.1  0.8  LONG
stop  0.7  0.3  0.9  LONG
made  0.8  0.2  0.7  LONG
continue  0.8  0.2  0.6  LONG
falling  0.5  0.1  0.3  LONG
feels  0.8  0.1  0.3  LONG
tells  0.3  0.5  0.7  LONG
happens  0.1  0.6  0.4  LONG
discovered  0.7  0.1  0.9  LONG
hope  0.2  0.2  0.4  LONG
killed  0.8  0.1  0.6  LONG
trying  0.8  0.5  0.4  LONG
killed  0.8  0.1  0.6  LONG
have  0.6  0.2  0.6  LONG
hurt  0.5  0.2  0.5  SHORT
ignore  0.5  0.5  0.8  LONG
explodes  0.3  0.7  0.7  SHORT
say  0.2  0.3  0.9  SHORT
remember  0.2  0.0  0.3  LONG
claim  0.1  0.1  0.1  LONG
are  0.6  0.2  0.3  LONG
wants  0.3  0.2  0.3  LONG
lost  0.8  0.1  0.8  SHORT
found  0.9  0.6  0.9  SHORT
contain  0.8  0.2  0.5  LONG
say  0.2  0.3  0.9  SHORT
reported  0.5  0.8  0.5  SHORT
suggesting  0.9  0.4  0.7  LONG
said  0.2  0.8  0.9  SHORT
worked  0.3  0.1  0.2  LONG
have  0.6  0.2  0.6  LONG
charged  0.2  0.2  0.5  LONG
live  0.2  0.1  0.3  LONG
suspect  0.8  0.2  0.8  LONG
declared  0.7  0.3  0.7  SHORT
become  1.0  0.1  0.9  LONG
flying  0.5  0.2  0.5  LONG
ordered  0.7  0.3  0.7  SHORT
added  0.9  0.1  0.9  SHORT
continues  0.7  0.1  0.5  LONG
continues  0.7  0.1  0.5  LONG
came  0.8  0.4  0.7  LONG
reflected  0.9  0.2  0.3  LONG
left  0.9  0.1  0.6  LONG
helped  0.8  0.4  0.4  LONG
began  0.2  0.0  0.8  LONG
showed  0.7  0.5  0.3  LONG
began  0.2  0.0  0.8  LONG
added  0.9  0.1  0.9  LONG
hear  0.1  0.9  0.9  LONG
remained  0.9  0.1  0.8  LONG
put  0.5  0.2  0.8  LONG
became  0.9  0.1  0.8  LONG
causing  0.7  0.7  0.7  LONG
remained  0.9  0.1  0.8  LONG
returning  0.2  0.2  0.8  LONG
fallen  0.4  0.1  0.9  LONG
want  0.4  0.1  0.2  LONG
showed  0.7  0.5  0.3  LONG
happens  0.1  0.6  0.4  LONG
reflected  0.9  0.2  0.3  LONG
appears  0.8  0.2  0.7  LONG
creating  0.3  0.3  0.3  LONG
coming  0.9  0.6  0.4  LONG
hit  0.7  0.1  0.7  LONG
come  0.8  0.5  0.8  SHORT
reflects  0.9  0.2  0.4  LONG
believe  0.3  0.9  0.1  LONG
added  0.9  0.1  0.9  LONG
invited  0.7  0.7  0.3  LONG
came  0.8  0.4  0.7  LONG
go  0.6  0.5  0.5  SHORT
led  0.4  0.1  0.4  LONG
said  0.2  0.8  0.9  SHORT
began  0.2  0.0  0.8  LONG
are  0.6  0.2  0.3  LONG
showed  0.7  0.5  0.3  LONG
allowed  0.7  0.1  0.5  LONG
rose  0.4  0.2  0.5  LONG
move  0.5  0.5  0.5  LONG
lost  0.8  0.1  0.8  LONG
broke  0.8  0.2  0.8  LONG
exhaust  0.7  0.7  0.3  LONG
said  0.2  0.8  0.9  SHORT
start  0.5  0.4  0.6  LONG
dropped  0.8  0.3  0.8  LONG
declined  0.3  0.3  0.3  LONG
reported  0.5  0.8  0.5  SHORT
showed  0.7  0.5  0.3  LONG
withstood  0.7  0.3  0.7  LONG
tried  0.9  0.3  0.5  LONG
created  0.5  0.2  0.8  LONG
caused  0.9  0.6  0.9  LONG
believes  0.2  0.9  0.1  LONG
extended  0.7  0.3  0.3  LONG
asked  0.6  0.1  0.3  SHORT
rose  0.4  0.2  0.5  LONG
want  0.4  0.1  0.2  SHORT
suggested  0.9  0.5  0.6  LONG
made  0.8  0.2  0.7  SHORT
intend  0.7  0.7  0.3  LONG
know  0.1  0.0  0.0  LONG
set  0.6  0.2  0.7  LONG
stated  0.4  0.8  0.8  SHORT
said  0.2  0.8  0.9  SHORT
pleased  0.3  0.1  0.1  SHORT
edged  0.7  0.3  0.3  LONG
removed  0.6  0.2  0.6  LONG
helped  0.8  0.4  0.4  LONG
intended  0.7  0.2  0.3  LONG
rose  0.4  0.2  0.5  LONG
lives  0.8  0.3  0.8  LONG
welcomed  0.7  0.3  0.7  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
took  0.7  0.3  0.9  SHORT
expects  0.9  0.8  0.2  LONG
waiting  0.1  0.1  0.1  SHORT
made  0.8  0.2  0.7  LONG
looks  0.7  0.3  0.3  LONG
raise  0.8  0.7  0.8  LONG
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
said  0.2  0.8  0.9  SHORT
disregarded  0.7  0.3  0.7  SHORT
said  0.2  0.8  0.9  SHORT
expected  0.9  0.9  0.1  LONG
finished  0.7  0.4  0.7  SHORT
running  0.5  0.2  0.3  LONG
emptied  0.8  0.2  0.8  SHORT
said  0.2  0.8  0.9  SHORT
appeared  0.7  0.2  0.7  SHORT
left  0.9  0.1  0.6  LONG
said  0.2  0.8  0.9  SHORT
extending  0.6  0.2  0.4  LONG
watching  0.9  0.1  0.1  SHORT
given  0.6  0.5  0.9  LONG
killed  0.8  0.1  0.6  SHORT
fled  0.8  0.5  0.8  SHORT
said  0.2  0.8  0.9  SHORT
appear  0.9  0.3  0.7  LONG
agree  0.8  0.2  0.3  LONG
took  0.7  0.3  0.9  LONG
said  0.2  0.8  0.9  SHORT
started  0.2  0.4  0.8  SHORT
expressed  0.7  0.7  0.3  SHORT
hit  0.7  0.1  0.7  SHORT
written  0.4  0.4  0.6  LONG
pronounced  0.7  0.3  0.3  SHORT
found  0.9  0.6  0.9  SHORT
seen  0.1  0.1  0.9  LONG
chose  0.8  0.2  0.8  LONG
related  0.7  0.3  0.3  LONG
make  0.6  0.1  0.7  LONG
expressed  0.7  0.7  0.3  LONG
became  0.9  0.1  0.8  LONG
believed  0.1  0.9  0.1  LONG
seized  0.9  0.1  0.9  LONG
killed  0.8  0.1  0.6  SHORT
released  0.2  0.2  0.6  LONG
said  0.2  0.8  0.9  SHORT
thrown  0.8  0.2  0.8  LONG
arrested  0.9  0.1  0.9  LONG
ruled  0.5  0.2  0.2  LONG
said  0.2  0.8  0.9  SHORT
bought  0.1  0.1  0.9  LONG
created  0.5  0.2  0.8  LONG
convicted  0.7  0.7  0.3  LONG
identified  0.7  0.3  0.7  LONG
has  0.6  0.2  0.7  SHORT
said  0.2  0.8  0.9  SHORT
facing  0.5  0.5  0.5  LONG
made  0.8  0.2  0.7  LONG
reinstated  0.3  0.3  0.7  LONG
needed  0.2  0.2  0.1  LONG
raised  0.8  0.7  0.7  LONG
began  0.2  0.0  0.8  LONG
means  0.4  0.4  0.1  LONG
beaten  0.6  0.6  0.3  SHORT
stand  0.4  0.2  0.5  LONG
said  0.2  0.8  0.9  SHORT
speak  0.5  0.0  0.6  LONG
said  0.2  0.8  0.9  SHORT
coming  0.9  0.6  0.4  SHORT
said  0.2  0.8  0.9  SHORT
destroyed  0.9  0.2  0.8  LONG
seem  0.4  0.4  0.4  LONG
kept  0.6  0.2  0.4  LONG
have  0.6  0.2  0.6  LONG
declared  0.7  0.3  0.7  SHORT
asked  0.6  0.1  0.3  SHORT
used  0.7  0.5  0.4  LONG
argued  0.7  0.7  0.3  SHORT
think  0.4  0.1  0.3  SHORT
cover  0.7  0.3  0.7  LONG
got  0.6  0.2  0.7  SHORT
retreated  0.7  0.7  0.7  LONG
have  0.6  0.2  0.6  LONG
supported  0.7  0.7  0.3  LONG
stand  0.4  0.2  0.5  LONG
rolled  0.6  0.2
discussed  0.3
says
0.4 SHORT fight 0.1 0.1 0.1 LONG 0.3 0.7 LONG changing 0.8 0.2 0.5 LONG 0.1 0.3 0.8 SHORT asked 0.6 0.1 0.3 LONG abandoned 0.8 0.2 0.8 LONG bought 0.1 0.1 0.9 LONG took 0.7 0.3 0.9 SHORT had 0.5 0.2 0.6 LONG go 0.6 0.5 0.5 LONG decide 0.8 0.2 0.8 LONG demonstrating 0.7 0.3 0.7 LONG move 0.5 0.5 0.5 LONG ordered 0.7 0.3 0.7 SHORT laid 0.8 0.1 0.8 LONG hurt 0.5 0.2 0.5 SHORT dropped 0.8 0.3 0.8 SHORT get 0.7 0.4 0.8 LONG delivered 0.3 0.3 0.7 SHORT say 0.2 0.3 0.9 SHORT has 0.6 0.2 0.7 LONG slipping 0.4 0.6 0.8 LONG think 0.4 0.1 0.3 LONG say 0.2 0.3 0.9 SHORT says 0.1 0.3 0.8 SHORT destroy 0.9 0.5 0.5 SHORT has 0.6 0.2 0.7 LONG cascaded 0.7 0.3 0.7 SHORT says 0.1 0.3 0.8 SHORT break 0.4 0.1 0.6 SHORT became 0.9 0.1 0.8 LONG means 0.4 0.4 0.1 LONG ignore 0.5 0.5 0.8 LONG say 0.2 0.3 0.9 SHORT say 0.2 0.3 0.9 SHORT warning 0.7 0.7 0.3 SHORT rule 0.5 0.5 0.2 LONG cause 0.8 0.4 0.8 LONG embrace 0.7 0.3 0.7 LONG reports 0.3 0.7 0.7 SHORT providing 0.4 0.2 0.6 LONG fall 0.6 0.2 0.6 SHORT bring 0.9 0.3 0.8 LONG said 0.2 0.8 0.9 SHORT say 0.2 0.3 0.9 SHORT destroy 0.9 0.5 0.5 SHORT led 0.4 0.1 0.4 LONG used 0.7 0.5 0.4 LONG teaches 0.8 0.2 0.4 LONG presented 0.8 0.2 0.5 LONG learning 0.7 0.3 0.7 LONG expected 0.9 0.9 0.1 LONG says 0.1 0.3 0.8 SHORT says 0.1 0.3 0.8 SHORT said 0.2 0.8 0.9 SHORT believe 0.3 0.9 0.1 LONG has 0.6 0.2 0.7 LONG said 0.2 0.8 0.9 SHORT take 0.6 0.2 0.7 LONG called 0.3 0.1 0.2 LONG denounced 0.9 0.3 0.7 LONG 331 C. Verb aspect and event duration data Verb Pref. Suff. Asp. Dur. Verb Pref. Suff. Asp. 
complained 0.7 0.7 0.7 SHORT wondered 0.4 0.1 0.3 said 0.2 0.8 shifted 0.7 fear beginning 0.7 0.3 0.3 LONG SHORT used 0.7 0.5 0.4 LONG 0.9 SHORT rise 0.5 0.7 0.7 LONG 0.7 0.7 LONG says 0.1 0.3 0.8 SHORT 0.4 0.2 0.4 LONG identified 0.7 0.3 0.7 LONG worried 0.3 0.7 0.3 LONG warns 0.7 0.7 0.3 SHORT saying 0.5 0.3 0.5 SHORT grew 0.6 0.4 0.4 LONG thought 0.6 0.0 0.6 LONG helped 0.8 0.4 0.4 LONG taking 0.2 0.4 0.2 LONG says 0.1 0.3 0.8 SHORT reports 0.3 0.7 0.7 SHORT says 0.1 0.3 0.8 SHORT announced 0.8 0.2 0.8 SHORT attracted 0.7 0.7 0.7 LONG believe 0.3 0.9 0.1 LONG opening 0.6 0.2 0.2 LONG say 0.2 0.3 0.9 LONG surprising 0.7 0.3 0.7 LONG state 0.4 0.8 0.8 SHORT become 1.0 0.1 0.9 LONG expected 0.9 0.9 0.1 LONG joining 0.8 0.5 0.8 LONG reports 0.3 0.7 0.7 LONG says 0.1 0.3 0.8 SHORT began 0.2 0.0 0.8 LONG says 0.1 0.3 0.8 SHORT says 0.1 0.3 0.8 LONG become 1.0 0.1 0.9 LONG began 0.2 0.0 0.8 LONG puts 0.6 0.2 0.7 LONG says 0.1 0.3 0.8 SHORT says 0.1 0.3 0.8 SHORT believe 0.3 0.9 0.1 LONG says 0.1 0.3 0.8 SHORT followed 0.6 0.4 0.4 LONG says 0.1 0.3 0.8 SHORT seen 0.1 0.1 0.9 LONG says 0.1 0.3 0.8 SHORT received 0.7 0.1 0.3 SHORT says 0.1 0.3 0.8 SHORT using 0.4 0.4 0.5 LONG appealing 0.7 0.3 0.3 LONG jumped 0.7 0.3 0.7 LONG says 0.1 0.3 0.8 SHORT say 0.2 0.3 0.9 SHORT says 0.1 0.3 0.8 SHORT approaching 0.8 0.6 0.4 LONG make 0.6 0.1 0.7 LONG believe 0.3 0.9 0.1 LONG says 0.1 0.3 0.8 SHORT have 0.6 0.2 0.6 LONG organized 0.2 0.8 0.8 LONG becoming 0.8 0.2 0.5 LONG created 0.5 0.2 0.8 LONG 332 Dur. Verb Pref. Suff. Asp. Dur. Verb Pref. Suff. Asp. become 1.0 0.1 0.9 LONG says 0.1 0.3 0.8 facing 0.5 0.5 forbidden 0.8 have Dur. 
is 0.6 0.1 0.3 LONG SHORT accumulate 0.7 0.3 0.7 LONG 0.5 LONG see 0.1 0.0 0.9 LONG 0.2 0.8 LONG called 0.3 0.1 0.2 SHORT 0.6 0.2 0.6 LONG asked 0.6 0.1 0.3 LONG torn 0.4 0.4 0.4 LONG expects 0.9 0.8 0.2 LONG gets 0.3 0.7 0.3 LONG suggested 0.9 0.5 0.6 SHORT forced 0.8 0.5 0.3 LONG said 0.2 0.8 0.9 SHORT become 1.0 0.1 0.9 LONG says 0.1 0.3 0.8 SHORT runs 0.5 0.2 0.6 LONG meet 0.5 0.2 0.7 LONG says 0.1 0.3 0.8 SHORT remain 0.9 0.1 0.8 LONG put 0.5 0.2 0.8 LONG paid 0.9 0.4 0.2 LONG do 0.3 0.2 0.4 LONG had 0.5 0.2 0.6 LONG works 0.3 0.7 0.3 LONG purchased 0.3 0.3 0.7 LONG gave 0.4 0.1 0.5 LONG said 0.2 0.8 0.9 SHORT says 0.1 0.3 0.8 SHORT denied 0.8 0.6 0.6 LONG said 0.2 0.8 0.9 SHORT said 0.2 0.8 0.9 SHORT served 0.7 0.3 0.7 LONG decided 0.5 0.2 0.8 SHORT made 0.8 0.2 0.7 LONG turned 0.9 0.7 0.9 SHORT works 0.3 0.7 0.3 LONG came 0.8 0.4 0.7 SHORT says 0.1 0.3 0.8 SHORT said 0.2 0.8 0.9 SHORT announced 0.8 0.2 0.8 SHORT produced 0.9 0.1 0.5 LONG need 0.6 0.3 0.4 LONG approved 0.7 0.3 0.7 SHORT have 0.6 0.2 0.6 LONG said 0.2 0.8 0.9 SHORT treated 0.8 0.2 0.5 LONG begun 0.5 0.2 0.5 LONG says 0.1 0.3 0.8 SHORT said 0.2 0.8 0.9 SHORT doing 0.3 0.2 0.4 LONG narrowed 0.7 0.7 0.3 LONG waiting 0.1 0.1 0.1 LONG reported 0.5 0.8 0.5 SHORT do 0.3 0.2 0.4 LONG said 0.2 0.8 0.9 SHORT save 0.2 0.2 0.8 LONG said 0.2 0.8 0.9 SHORT begins 0.8 0.2 0.2 LONG closed 0.7 0.2 0.5 SHORT 333 C. Verb aspect and event duration data Verb Pref. Suff. Asp. Dur. Verb Pref. Suff. Asp. 
issue 0.8 0.2 0.8 LONG said 0.2 0.8 0.9 adopted 0.8 0.2 had 0.5 have has 0.6 0.2 0.7 LONG SHORT said 0.2 0.8 0.9 SHORT 0.8 LONG closed 0.7 0.2 0.5 SHORT 0.2 0.6 LONG improve 0.7 0.3 0.3 LONG 0.6 0.2 0.6 LONG said 0.2 0.8 0.9 SHORT was 0.1 0.0 0.1 LONG apply 0.7 0.3 0.2 LONG suffered 0.2 0.1 0.2 LONG declared 0.7 0.3 0.7 SHORT expected 0.9 0.9 0.1 LONG declared 0.7 0.3 0.7 SHORT said 0.2 0.8 0.9 SHORT issue 0.8 0.2 0.8 LONG said 0.2 0.8 0.9 SHORT exercised 0.8 0.2 0.5 SHORT reported 0.5 0.8 0.5 SHORT said 0.2 0.8 0.9 SHORT said 0.2 0.8 0.9 SHORT said 0.2 0.8 0.9 SHORT said 0.2 0.8 0.9 SHORT has 0.6 0.2 0.7 LONG made 0.8 0.2 0.7 LONG issued 0.8 0.3 0.7 LONG help 0.8 0.4 0.8 LONG said 0.2 0.8 0.9 SHORT profit 0.7 0.3 0.3 LONG declared 0.7 0.3 0.7 SHORT said 0.2 0.8 0.9 SHORT paid 0.9 0.4 0.2 LONG rose 0.4 0.2 0.5 LONG had 0.5 0.2 0.6 LONG said 0.2 0.8 0.9 SHORT purchased 0.3 0.3 0.7 LONG said 0.2 0.8 0.9 SHORT said 0.2 0.8 0.9 SHORT plans 0.5 0.8 0.5 LONG said 0.2 0.8 0.9 SHORT include 0.8 0.2 0.4 LONG close 0.8 0.1 0.4 LONG said 0.2 0.8 0.9 SHORT result 0.8 0.5 0.2 LONG expected 0.9 0.9 0.1 LONG rose 0.4 0.2 0.5 LONG reported 0.5 0.8 0.5 SHORT said 0.2 0.8 0.9 SHORT fell 0.5 0.1 0.7 LONG 334 Dur.
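Each row of this appendix pairs a verb occurrence with three numeric feature values (Pref., Suff., Asp.) and a SHORT/LONG duration label. As a minimal illustration of the row format (not part of the thesis pipeline), the sketch below parses a few rows taken verbatim from the table and computes a majority-class baseline over the duration labels; the whitespace-separated layout and the `parse` helper are assumptions made for this example.

```python
# Sketch: parse Appendix C-style rows and compute a majority-class
# baseline for the SHORT/LONG duration labels.
from collections import Counter

# Five rows copied from the table above (verb, Pref., Suff., Asp., Dur.).
ROWS = """\
say 0.2 0.3 0.9 SHORT
continues 0.7 0.1 0.5 LONG
know 0.1 0.0 0.0 LONG
said 0.2 0.8 0.9 SHORT
trying 0.8 0.5 0.4 LONG
"""

def parse(rows: str):
    """Turn whitespace-separated rows into (verb, features, label) triples."""
    data = []
    for line in rows.strip().splitlines():
        verb, pref, suff, asp, dur = line.split()
        data.append((verb, (float(pref), float(suff), float(asp)), dur))
    return data

data = parse(ROWS)
# Majority class over the gold duration labels in this small sample.
baseline = Counter(label for _, _, label in data).most_common(1)[0][0]
print(baseline)  # -> LONG (3 of the 5 sample rows are LONG)
```

A baseline of this kind is the natural point of comparison for any classifier trained on the Pref./Suff./Asp. features: a learned model is only informative if it beats the most frequent duration class.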