Abstracts - Colloque international de Linguistique de Corpus

Abstracts

KEYNOTE SPEAKERS

Gloria Corpas Pastor
Universidad de Málaga, Spain
"Through the Corpus Glass: diatopy and idiomaticity in translated Spanish"
Gloria Corpas Pastor holds a PhD in English Philology from the Universidad Complutense de Madrid (1994). She has been Visiting Professor in Translation Technology at the Research Institute in Information and Language Processing of the University of Wolverhampton since 2007 and Professor of Translation and Interpreting at the Universidad de Málaga since 2008. She is the Spanish expert on the ISO TC37/SC2-WG6 "Translation and Interpreting" committee. She has an extensive publication record and serves on numerous national and international scientific committees and editorial boards. She is currently President of AIETI (Asociación Ibérica de Traducción e Interpretación), a member of the Advisory Council of EUROPHRAS ("European Society of Phraseology") and Vice-President of AMIT-A (Asociación de Mujeres Investigadoras y Tecnólogas de Andalucía).

Susan Hunston
University of Birmingham, United Kingdom
"Words and Phrases: re-thinking corpus-based approaches to lexis and grammar"
Susan Hunston is Professor of English Language at the University of Birmingham (UK). She specialises in corpus linguistics and discourse analysis. She is the author of several monographs, Corpora in Applied Linguistics (2002/CUP) and Corpus Approaches to Evaluation: Phraseology and evaluative language (2011/Routledge), and co-author of Pattern Grammar: a corpus-driven approach to the lexical grammar of English (1999/Benjamins). She is co-editor of Evaluation in Text: authorial stance and the construction of discourse (2000/OUP) and of System and Corpus: exploring the connections (2005/Equinox). She has published numerous articles on the use of corpora to describe English grammar and lexis, and on corpora and discourse analysis.
Aquilino Sánchez Pérez
Universidad de Murcia, Spain
"The Cognitive Foundations of Corpus Linguistics"
Aquilino Sánchez Pérez was Director of the Escuela Oficial de Idiomas de Barcelona and taught at the Universidad de Barcelona and the Universidad Autónoma de Barcelona. He later obtained a Chair in the Department of English Philology at the Universidad de Murcia, where he continues to teach. His teaching and research have focused on foreign language teaching and learning, lexicology, monolingual and bilingual (English-Spanish) lexicography, and corpus linguistics (corpus design and compilation, and automatic word sense disambiguation). He was co-founder and Secretary of the Asociación Española de Lingüística Aplicada (AESLA), a founding member of the Asociación de Estudios Ingleses en España and of the European Association for Lexicography, and President of AELINCO (Asociación Española de Lingüística del Corpus).

Contents
A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English, Patricia Gonzalez Darriba
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts, Sun-Hee Lee [et al.]
A Diachronic Study of the Conative Alternation Construction in English, Laura Esteban-Segura [et al.]
A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective, Sergio Maruenda-Bataller [et al.]
A corpus-based analysis of syntactic linking between antecedents and ellipsis sites in Post-Auxiliary Ellipsis in Modern English, Evelyn Gandón-Chapela
A corpus-based analysis of the collocational patterning of adjectives with abstract nouns in medical English, Natalia Judith Laso [et al.]
A corpus-stylistic analysis of direct thought presentation in Charles Dickens’s fifteen novels, Pablo Ruano
A data-driven analysis of linguistic complexity and proficiency in learner and native English, Javier Perez-Guerra [et al.]
Affix rivalry in English derivation: An onomasiological approach, Cristina Fernández-Alcaina [et al.]
Anaphora Resolution on the Fly – Pronouns in a Psycholinguistically Motivated Parsing System, Noemi Vadasz
Anaphora resolution in the interlanguage of English and Greek learners of Spanish: a corpus-based study, Athanasios Georgopoulos
Análisis de los aspectos pragmáticos en los discursos especializados de economía y finanzas: un trabajo basado en un corpus oral como apoyo a la interpretación, Sonia Paola Martínez Zavala
Aplicaciones del corpus CORPEN a la enseñanza y la evaluación de las unidades fraseológicas del español usado en contextos específicos, Inmaculada Martínez [et al.]
Applying Textometric Analysis to a Description of Cochrane Medical Abstracts and their Plain Language versions: Quantitative Characterisation of Plain Language in Medical Discourse, Christopher Gledhill [et al.]
Aproximación a la fraseología contrastiva en las sentencias del TJUE, Andrades Arsenio
Calcul de la saillance pour annoter un corpus anaphorique (RESUMAN), Afef Selmi [et al.]
Constitution d’un corpus juridique pour l’extraction des collocations, Joaquín Giraldez Ceballos-Escalera
Construction de corpus en vue d’une étude contrastive des structures résultatives en anglais et de leur traduction en français, Dijana Bojovic
Corpus en classe de langue. Exemple avec les marqueurs d’exemplification et de reformulation, Cristelle Cavalla [et al.]
Development of Tatar-Russian Socio-Political Dictionary of Collocations on Corpus Data, Olga Nevzorova
Development of annotation system for multiword constructions for Tatar National Corpus, Dzhavdet Suleymanov
Diccionario de terminología médica español - chino basado en corpus, Antonio Moreno-Sandoval [et al.]
Dire la nouveauté par les mots : les néologismes révélant les nouvelles tendances sociétales en France, Najet Boutmgharine Idyassner
Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?, Jesús Romero-Barranco
El corpus de fuentes digitales como herramienta para la gramática del discurso, Víctor Pérez Béjar [et al.]
El desacuerdo a través de la interrogación ecoica, María Valentina Barrio [et al.]
El lenguaje jurídico y el lenguaje de la ingeniería biomédica vistos desde la metodología de corpus, Eleonora Lozano Bachioqui [et al.]
Estudio comparativo de la traducción en inglés, francés y español de los aspectos lingüísticos y paralingüísticos de los cómics a partir de un corpus multimodal de género de terror, María Del Carmen Baena Lupiáñez
Estudio comparativo de las marcas de uso en los repertorios lexicográficos actuales, Estrella Calvo-Rubio Jiménez
Estudio contrastivo de corpus para identificar los rasgos diacrónicos del discurso normativo catalán : estudio de los Estatutos de autonomía de 1932, 1979 y 2006, Albert Morales Moreno
Estudio de la aplicabilidad de la ley de Zipf y de la ley de Heaps en los corpus de aprendientes de inglés, Nicolas Ballier [et al.]
Extracción de fraseología contable con Sketch Engine. Propuesta de flujo de trabajo, Daniel Gallego
Extracting semantic frame structures from Environmental Sciences corpora, Beatriz Sánchez-Cárdenas [et al.]
Facework in a telecollaboration student corpus, Pennock-Speck Barry [et al.]
From text to word and from word to morpheme: Exploring the interface of corpus linguistics and word formation study with evidence from Modern Greek, Paraskevi Savvidou
Functional and thematic ngrams in specialized corpora: the case of academic English, French and Spanish, Clive Hamilton
Gender-based differences in the use of epistemic modals in late Modern English scientific register, Francisco Alonso-Almeida [et al.]
Gobernabilidad y democracia en México. Unidades fraseológicas del Ejecutivo Federal 2012-2016 desde el Análisis Crítico del Discurso, Carlos Enrique Ahuactzin Martínez
Gramática española para hablantes de francés: el uso de la preposición "de" después de matrices del tipo es posible, María Adelaida Gil Martínez
Hedging in tourism discourse: the variable genre in academic vs professional texts, Francisca Suau-Jiménez [et al.]
Identificación de fórmulas recurrentes en español académico, Marcos García Salido [et al.]
Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts, Heidi Verplaetse [et al.]
Investigating style and conventionality in literary translation: a corpus-based approach, Carolina Barcellos
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study, Joaquín Gris Roca [et al.]
Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective, Hakan Cangir
Knowledge extraction for TKB phraseology module design, Pilar León-Araúz [et al.]
L’analyse contrastive des références au passé en français et en chinois -Sur le corpus des récits, Xingzi Zhang
La adquisición de los verbos de cambio: Un análisis de la interlengua de aprendices de español (L1 sueco), Ester Fernández
La detección y etiquetado de las estrategias metadiscursivas en artículos académicos: METOOL, María Luisa Carrió-Pastor
La economía al borde de un ataque de nervios: metáforas médicas en el discurso periodístico económico, Ismael Ramos Ruiz
La mise en discours des données chiffrées dans les textes de vulgarisation scientifique, Riham El Khamissy
La modalité dans les discours politiques : segments phraséologiques en langue et en discours. Exploration textométrique d’un corpus de débats présidentiels états-uniens (1960-2016), Marion Bendinelli
La traduction des "megatermes" anglais de type erythrocyte invasion-inhibitory response : une approche fondée sur corpus et analyse du discours, Mojca Pecman [et al.]
La traduction publicitaire : approche par corpus, Isabel Comitre Narvaez
Le continuum lexique-grammaire en genre spécialisé à partir de corpus maison, Laurent Gautier [et al.]
Le marqueur discursif "donc" dans deux corpus dialogaux de différente nature, Gemma Delgar Farrés
Learner vs. professional translational behavior: The case of discourse markers, Maria Kunilovskaya [et al.]
Les appositions nominales en français et en slovène : étude contrastive sur le corpus FraSloK, Adriana Mezeg
Les constructions verbales en comme : de l’écrit scientifique à l’écrit académique des étudiants natifs/non-natifs, Marie-Paule Jacques [et al.]
Meeting the reader in academic writing: reader pronouns in English and French, Curry Niall
Multi-word terms: disclosing the semantic relations in noun compounds, Melania Cabezas-García [et al.]
Multilingual extraction of terminology from specialised corpora, Eva M. Mestre-Mestre
Naming practices and media constructions of reality in Spanish: A corpus-based perspective on violence against women news (2005-2015), José Santaemilia
On the Endophoric, Abstract and Narrative Nature of Idiomatic ’Do So’ in Legal texts, Journalistic Texts and Written Correspondence, Carlos Prado-Alonso
On the Grammaticalization Path of the Quasi-coordinator as well as, Miriam Criado Peña
Onomasiología del sentimiento: los corpus lingüísticos como fuente de datos para la semántica y la combinatoria sintagmática de los nombres de emoción en español, Inmaculada Mas
Phraseological routines in scientific writing: the example of metatextual routines in French, Agnès Tutin
Phraseology and discourse grammar in English as a lingua franca: ’on the contrary’ and ’on the other hand’ in unedited research papers, Silvia Murillo
ROUND TABLE: Corpus-based analysis of interpersonal metadiscourse in specialized domains: academic vs professional and social genres. Theoretical and methodological challenges, Francisca Suau-Jiménez [et al.]
Rocking the corpus. A discourse analysis of pop rock lyrics, María Martínez Casas
SUNCODAC: A Spanish-English corpus of computer-mediated student discussions, Mario Cal Varela [et al.]
Secuencia gramatical para la enseñanza del español como lengua extranjera, Yun Sil Jeon [et al.]
Semantic constraints on MWU formation: Evidence from clinical records, Leonie Grön [et al.]
Sobre la cuasi-sinonimia de poner y meter en español: un análisis de regresión logística de dos verbos locativos, Marie Comer
Spanish Fragments and Polar Verbless Clauses. Typology and Corpus Distribution, Oscar Garcia-Marchena
Spoken Language Corpora under Examination, Hanna Hedeland [et al.]
Strategies for Processing Large Corpora for Linguistic Inquiry and Natural Language Processing Tasks, Antonio Moreno-Ortiz
Students’ use of the n-grams tool to learn about phraseology in academic writing, Maggie Charles
Teachers’ Dispositions Towards the Use of Corpus-Based Approaches in Teaching English as a Foreign Language in Higher Education, Awatif Alruwaili
The Developmental Relationship between Spoken and Written Clause Packaging in an English Secondary School, Mark Brenchley
The Psycholinguistic Profile of Domestic Abusers: A Corpus-Based Approach, Ángela Almela [et al.]
The XML Annotation of A Corpus of Historical English Law Reports 1535-1999: A Progress Report, Paula Rodríguez-Puente
The construction of shared feelings: analysis of affect in a corpus of obituary comments in online newspapers, Isabel Corona
The implied consumer in British hotel websites, Carmen Gregori-Signes
The power of English: I and we in ELF and in ENL academic discourse, Jolanta Sinkuniene
The textual colligation of stance phraseology in cross-disciplinary academic discourses: the timing of authors’ self-projection, Louisa Buckingham [et al.]
Towards an extended lexical grammar: Complex colligational patterns of the noun cause, Moisés Almela Sánchez [et al.]
Técnicas de caracterización de los personajes femeninos en Galdós: una aproximación desde los estudios de corpus, Guadalupe Nieto
Unidades fraseológicas en la subtitulación de una serie del género de drama, Dalila Itzel Nieto Mercado [et al.]
Verbal agreement with NCOLL-of-NPL subjects in the inner varieties of English in GloWbE, Yolanda Fernández-Pena
Évaluer le seuil de fréquence pour la sélection des paquets lexicaux: de bonnes nouvelles avec quelques réserves, Yves Bestgen
Índice de creatividad metafórica y universales de traducción: propuesta metodológica a partir de un corpus de informes de responsabilidad social empresarial, Sara Piccioni
‘His maiestie chargeth, that no person shall engrose any maner of corne’. The Standardization of Punctuation in Early Modern English Legal Proclamations, Javier Calle-Martín
‘Making it clear’: A contrastive study of evidentials and boosters in contemporary political discourse, Ana Albalat-Mascarell
List of authors

A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English
Patricia Gonzalez Darriba 1 *
1 Rutgers, The State University of New Jersey [New Brunswick] (RUTGERS) – 100 George Street, New Brunswick, NJ 08901, United States
This paper employs a corpus-based approach to test two sets of hypotheses that predict opposite outcomes regarding the Unique Item T-Universal (Chesterman, 2004, 2010): on the one hand, Tirkkonen-Condit’s (2004) Unique Item Hypothesis, which claims that Unique Items are under-represented in translated texts, and on the other hand, Baker’s (1993) Simplification Hypothesis and Halverson’s (2003) Gravitational Pull Hypothesis, which predict over-representation of Unique Items in translated texts. To test these hypotheses, two comparable corpora were selected and analyzed, the Translational English Corpus (TEC; Baker 2003) and the Corpus of Contemporary American English (COCA; Davies 2008), with specific regard to the relative frequency of English self-directed motion expressions such as float into, fly out, etc. Because the translated English texts used from the TEC have Spanish source texts, we can compare the prevalence of two widely accepted motion lexicalization patterns that correspond to the two languages in question: satellite-framed constructions in English and verb-framed constructions in Spanish (Talmy (1985), Slobin (1996), Levin and Rappaport (2016)). A total of 28 English manner-of-motion verbs in combination with 8 English path-denoting satellites were selected to search for, count, and compare self-directed motion expressions in the TEC and the COCA. This comparable corpora study yielded a total of 41,852 tokens from both corpora.
This corresponds to normalized frequencies of 209.2 self-directed motion expressions per million words in the TEC and 395.5 per million words in the COCA. Data from the 28 verbs in both corpora were analyzed using an independent-samples t-test, which revealed that the number of self-directed motion expressions is significantly higher in the COCA (M = 3.32) than in the TEC (M = 1.76; t(219.267) = -2.274, p = .012; Levene’s test: p = .029). Moreover, a two-way ANOVA was conducted to compare the main effects of Corpus and Lexical Frequency, and the interaction effect between Corpus and Lexical Frequency, on the number of self-directed motion occurrences by verb form per million words. Main effects were significant for both Corpus and Lexical Frequency, but no Corpus*Lexical Frequency interaction effect was found. These results confirm Tirkkonen-Condit’s Unique Item Hypothesis by showing that spontaneous, non-translated English is significantly richer in self-directed motion expressions than translated English, regardless of the frequency of the verb in the expression, and disprove the Simplification Hypothesis (Baker, 1993) and the Gravitational Pull Hypothesis (Halverson, 2003). Additionally, the results provide a baseline for future research aimed at a better understanding of the cognitive processes involved in translating self-directed motion expressions.
* Speaker
Keywords: comparable corpora, self-directed motion, translation universals, under-representation of unique items
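The two computations behind the figures above, normalizing raw counts to a per-million-word rate and comparing two groups of per-verb values with a t-test, can be sketched with Python's standard library. This is a minimal illustration, not the author's analysis script: the function names `per_million` and `welch_t` and the toy data are hypothetical. It computes Welch's t with Welch-Satterthwaite degrees of freedom on the assumption that equal variances were not assumed, which the fractional degrees of freedom (219.267) and the significant Levene test suggest; it does not compute p-values, which would require a t-distribution CDF (for example from SciPy).

```python
import math
from statistics import mean, variance


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Equal variances are not assumed; `variance` is the sample (n - 1) variance.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy per-verb values for illustration only (not the study's data).
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(f"t = {t:.3f}, df = {df:.3f}")  # t = -1.732, df = 4.412
```

In a study like this one, each list would hold one normalized value per verb and corpus; a negative t simply means the first group has the lower mean.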
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts
Sun-Hee Lee *† 1, Beomil Kang ‡ 2, Hye Ryeong Yoo § 3
1 Department of East Asian Languages and Cultures, Wellesley College (EALC) – Green Hall 236B, 106 Central Street, Wellesley, MA 02481, United States
2 Department of Korean Language and Literature, Yonsei University (Korean Yonsei) – Oesolgwan 214, Yonsei University, Yonsei-ro 50, Seodaemun-Gu, Seoul, South Korea
3 Department of Korean Language and Literature, Yonsei Graduate School (Yonsei) – Oesolgwan 214, Yonsei-ro 50, Seodaemun-Gu, Seoul, South Korea
This study provides a corpus-based genre analysis of phraseological expressions in Korean academic prose, including collocations, colligations, and prefabricated lexical bundles (or formulaic expressions). Because Korean is an agglutinative language, its phrasal structures incorporate particles and verbal endings into word units and are more complex than the corresponding English structures. While exploring the relevant challenges and new methodological tools for capturing typologically distinct properties of Korean, we identify unique genre-specific properties of L1 academic texts using prefabricated phraseological units. We compiled a corpus of 10.9 million ecel (space-based units) composed of 2,171 academic theses in the humanities and social sciences with the highest ranks in the Korea Citation Index. From the corpus, we extracted phraseological units on the basis of N-gram language models and processed them with statistical tools. While addressing related challenges in language-specific data processing and analysis, we present the distinct linguistic functions of the phraseological units in Korean academic prose in comparison with other registers.
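N-gram extraction over morpheme-segmented Korean, combined with the two refinements the authors outline (merging a verbal lexeme with the dependent morphemes that follow it, and recording the slots before and after each N-gram), can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline. It assumes input already segmented into (morpheme, tag) pairs using an invented tag set (V for verb stems, E for dependent endings) and merges every ending into its stem, whereas the authors merge only morphemes that make no meaningful contribution. It also uses a simple frequency cut-off. The sentences are toy Yale-romanized examples.

```python
from collections import Counter, defaultdict


def merge_verbal(morphs: list[tuple[str, str]]) -> list[str]:
    """Attach dependent endings (tag "E") to the preceding verb stem (tag "V")."""
    units: list[tuple[str, str]] = []
    for form, tag in morphs:
        if tag == "E" and units and units[-1][1] == "V":
            units[-1] = (units[-1][0] + "-" + form, "V")  # extend the verbal unit
        else:
            units.append((form, tag))
    return [form for form, _ in units]


def extract_ngrams(sentences, n=3, min_freq=2):
    """Count unit N-grams and record their left/right context slots.

    Returns {ngram: (frequency, (left_slot_counts, right_slot_counts))}
    for N-grams that reach the frequency cut-off.
    """
    counts: Counter = Counter()
    contexts = defaultdict(lambda: (Counter(), Counter()))
    for morphs in sentences:
        units = merge_verbal(morphs)
        for i in range(len(units) - n + 1):
            gram = tuple(units[i:i + n])
            counts[gram] += 1
            left = units[i - 1] if i > 0 else "<s>"        # preceding slot
            right = units[i + n] if i + n < len(units) else "</s>"  # following slot
            contexts[gram][0][left] += 1
            contexts[gram][1][right] += 1
    return {g: (c, contexts[g]) for g, c in counts.items() if c >= min_freq}


# Toy romanized sentences ("this study-TOP result-ACC show-PRES-DECL", etc.).
s1 = [("i", "DET"), ("yenkwu", "N"), ("nun", "J"), ("kyelkwa", "N"),
      ("lul", "J"), ("poi", "V"), ("n", "E"), ("ta", "E")]
s2 = [("i", "DET"), ("yenkwu", "N"), ("nun", "J"), ("mwuncey", "N"),
      ("lul", "J"), ("palkhi", "V"), ("n", "E"), ("ta", "E")]
for gram, (freq, (left, right)) in extract_ngrams([s1, s2]).items():
    print(gram, freq, dict(left), dict(right))
```

The right-slot counts show how the slots can help decide a final pattern. Here the recurrent trigram is followed by different nouns, which suggests an open noun slot after it.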
Our study demonstrates the need to integrate both corpus-driven and corpus-based methodologies in order to process meaningful lexico-grammatical combinations in Korean, where strong morphosyntactic relations hold across distinct phrasal boundaries via a diverse set of particles and endings. Our study also shows that combining N-gram-based extraction with morpheme-based cut-offs is more useful for identifying meaningful combinations. In line with Jang (2015), we argue for adding context sensitivity to N-grams to identify more useful patterns, especially when processing agglutinative languages like Korean. For example, collecting the slots preceding and following an extracted N-gram and using them to decide the final pattern increases the usability of the output. When counting the frequency of an extracted N-gram in post-processing, we merge a verbal lexeme with any following dependent morpheme(s) that make no meaningful linguistic contribution to the given phraseological unit; this significantly reduces the number of patterns produced by morpheme-based processing of N-grams in Korean. Based on the extracted phraseological expressions, we provide a genre-focused linguistic analysis of the Korean academic register. Although we are still extracting meaningful phraseological patterns, our pilot study suggests that referential expressions, stance expressions, hedges, etc. serve dynamic functions in Korean academic texts. Despite the lack of referential expressions in Korean, phraseological units with the demonstrative pronouns i ‘this’ and ku ‘that’ are highly frequent in academic contexts. Expressions of epistemic and attitudinal/modality stance are used more extensively in the Korean academic register, in contrast with Biber’s (2004) analysis of academic prose in English. Expressions of indirect quotation and hedges are prominent in the extracted output. These findings suggest that the sociocultural value of indirectness is widely reflected in Korean academic writing. The outcome of our study will provide a platform for further applied and pedagogical research, with a larger corpus of more than 100 million ecel, on language acquisition and Korean for academic purposes (KEP). The long-term goal of our research is to develop a full-fledged genre analysis of L1 academic texts as well as L2 acquisition data. The study also explores dynamic interactions between grammar and lexicon in agglutinative languages like Korean, identifying language-specific features in the processing of phraseological units and in the genre analysis of academic texts.
* Speaker
†, ‡, § Corresponding author
Keywords: phraseological expressions, formulaic expressions, collocation, genre analysis, academic register

A Diachronic Study of the Conative Alternation Construction in English
Laura Esteban-Segura *† 1, Soluna Salles-Bernal * 1
1 Universidad de Málaga (UMA) – Spain
The conative alternation is a subtype of transitivity alternation with a transitive variant and an intransitive variant expressed by an at-construction. Syntactically, it occurs with transitive verbs and is therefore described as a case of preposition insertion (the preposition at is inserted before the direct object). Semantically, it can be described as a "detransitivizing" construction, since conative uses of transitive verbs contrast with their transitive counterparts (Perek 2015: 90). Accordingly, the argument can be direct (subject, direct object or indirect object) or oblique.
(1) a. Kim cut the pie.
b. Kim cut at the pie (drunkenly) (Beavers 2006: 6).
The patient ("the pie") can have two realizations: as the direct object (1a) or as an oblique marked by the preposition at (1b). This reflects a semantic contrast: in the transitive variant the patient is known to have been affected in some way, whereas with the at-construction this is not necessarily the case. Thus, the action denoted by the verb may or may not have been completed, and the alternation may convey "a reduced degree of effectiveness" (Riemer 2010: 354), as in example (2b) below, which implies that the action was not completely successful:
(2) a. The zombies slashed my face.
b. The zombies slashed at my face.
Although the construction has been studied before (van der Leek [1996], Broccias [2001, 2003], Beavers [2010], Perek and Lemmens [2010], Guerrero-Medina [2011], Perek [2015]), it remains scarcely investigated from a diachronic point of view. Our main objective is therefore to investigate the origin and development of the conative construction in English by examining its occurrence in several historical corpora. To this end, we first compiled a comprehensive list of verbs that allow the construction and then selected the verbs under study. We carried out a collostructional analysis, which "investigates which lexemes are strongly attracted or repelled by a particular slot in the construction (i.e. occur more frequently or less frequently than expected)" (Stefanowitsch and Gries 2003: 214), because it can help establish which verbs favour the construction over others in the different corpora. Some of our preliminary results show that the construction was already present in Old English and that in most instances the subject is agentive or animate.
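Collexeme analysis, the variant of collostructional analysis that measures how strongly individual verbs are attracted to or repelled by a construction, is commonly computed from a 2x2 frequency table per verb with the Fisher-Yates exact test. The following is a minimal sketch under that assumption, not the authors' implementation. The function names, the one-tailed test direction chosen by comparison with the expected frequency, and the -log10(p) strength score are illustrative choices, because the abstract does not state the authors' settings.

```python
import math


def hypergeom_pmf(k: int, row1: int, col1: int, n: int) -> float:
    """P(X = k) for the top-left cell of a 2x2 table with fixed margins."""
    return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)


def collexeme_strength(a: int, b: int, c: int, d: int) -> tuple[str, float]:
    """One-tailed Fisher-Yates exact test for a single verb.

    a: verb inside the construction    b: other verbs inside the construction
    c: verb outside the construction   d: other verbs outside the construction
    Returns ("attracted" or "repelled", -log10(p)).
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c          # construction total, verb total
    expected = row1 * col1 / n         # frequency of a expected under independence
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    if a >= expected:                  # attraction: P(X >= a)
        ks, label = range(a, hi + 1), "attracted"
    else:                              # repulsion: P(X <= a)
        ks, label = range(lo, a + 1), "repelled"
    p = sum(hypergeom_pmf(k, row1, col1, n) for k in ks)
    return label, -math.log10(max(p, 1e-300))  # guard against float underflow


# Toy counts (not from the study): verb occurs 3 times in the at-construction.
print(collexeme_strength(3, 1, 1, 3))
```

Ranking the verbs by this score within each historical corpus would show which verbs favour the construction in each period.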
* Speaker
† Corresponding author
Keywords: conative alternation, verb alternation, history of English, collostructional analysis

A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective
Sergio Maruenda-Bataller * 1, Paula Rodríguez-Abruñeiras * 1
1 IULMA/Universitat de València – Spain
In the last thirty years, there have been important advances in the media coverage and discussion of violence against women (VAW) (Aran Ramspott & Medina Bravo 2006; Vallejo-Rubinstein 2005). It is now indisputable that intimate partner violence (IPV) is one of the key issues not only in political, social and institutional discourses but also in the agenda of news producers. The recognition of this phenomenon has been largely due to the media, which have played a decisive role in moving the issue from the private and personal sphere to the public sphere, thus ensuring visibility and helping to raise public awareness (Berganza Conde 2003). However, some authors (e.g. Altés 1998; Alberdi & Matas 2002) have argued that this is not without a cost. The media are torn between two conflicting interests: on the one hand, to treat these grievous cases with due ethical care and, on the other, to attract the largest possible audience, which is almost ‘naturally’ achieved through sensationalism. Journalists can create different pictures of domestic violence and "confirm and debunk the myths surrounding it by choosing certain topics, sources, facts, and words over others" (Bullock & Cubert 2002: 479). Against this backdrop, the present study offers a corpus-based approach to the discursive devices used to construct newsworthiness in IPV news in Spanish and UK dailies, using an ad-hoc corpus of gender violence news reports from 2005 to 2015. Specifically, we explore how media outlets have discursively represented women victims of IPV by means of news values over the last decade.
In addition, we explore how news values are exploited ideologically to construct discourse prosodies around women victims of IPV, violent episodes and perpetrators. The results offer insights into how women and their identities have been socially configured and defined over time in contemporary written media on IPV. For our purposes, we apply Bednarek & Caple's (2012; 2014) linguistic approach to news values, which treats them as discursive realisations of newsworthiness that "exist in and are constructed through discourse" (Bednarek & Caple 2014: 136). Our analysis combines a quantitative approach with close qualitative readings of concordance lines. This allows us to identify frequent linguistic patterns in the corpus that may give rise to discourse prosodies (Bednarek 2006; Baker et al. 2008; Baker & Levon 2015). We pay attention to the values that are shared and the values that differ across cultures, together with the most relevant discourse prosodies and their ideological implications. Our results show that there are two polarised yet inextricably linked discourses. The first is a discourse of death, violence and terrible suffering, and the second is a discourse of institutional and social support. The former is mainly conveyed through Negativity and Impact, and the latter through Eliteness and Positivity. On the whole, these discourses are constructed similarly in the four data sets. However, the concordance analysis points to remarkable differences. Negativity has more critical overtones in the Spanish newspapers, and reports on abusers are often constructed as more impersonal in the UK dailies. As for the depiction of extreme negative emotions, the Spanish reports show more occurrences and a wider range of word combinations. This makes them more ideological, if not sensationalist, and exploits readers' interest in crime and violence.

Keywords: intimate partner violence, news values, newsworthiness, CADS, women.
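The close reading of concordance lines described above starts from key-word-in-context (KWIC) output. The minimal Python sketch below assumes plain-text news reports and a single-word node such as "victim". The function name is hypothetical, and dedicated concordancers offer far more, including sorting and lemma search.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list:
    """Key-word-in-context lines for each whole-word, case-insensitive match of `node`."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]   # left co-text
        right = text[m.end():m.end() + width]              # right co-text
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines
```

Aligning the node word in a fixed column makes recurring left and right collocates easy to spot. Recurring collocates are where candidate discourse prosodies first show up.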
A corpus-based analysis of syntactic linking between antecedents and ellipsis sites in Post-Auxiliary Ellipsis in Modern English
Evelyn Gandón-Chapela (1)*
(1) University of Cantabria and University of Vigo, Spain

This study analyses the type of syntactic linking established between the antecedent clause(s) and the ellipsis site(s) in cases of Post-Auxiliary Ellipsis (PAE) in Modern English. It uses the Penn Parsed Corpus of Modern British English (1700-1914; one million words; eighteen different genres). The term 'PAE' (Sag 1976; Warner 1993; Miller 2011; Miller & Pullum 2014) covers cases in which a Verb Phrase, Prepositional Phrase, Noun Phrase, Adjective Phrase or Adverbial Phrase is omitted after modal auxiliaries, after the auxiliaries be, have and do, or after the infinitival marker to. VP ellipsis (VPE) and Pseudogapping (PG) are the two subtypes of PAE under investigation:
(1) That I had received such from Edward also I need not mention; but I do, you see, because it is a pleasure. [VPE: coordination]
(2) They can by no means, therefore, be members of happiness; for if they were, happiness might be said to be made up of one member. [VPE: adverbial subordination]
(3) I can recollect nothing more to say. When my letter is gone, I suppose I shall. [VPE: none]
(4) A skilled florist will produce a finer effect with a few inexpensive blossoms than an unskilled one will with a cartload of choice material. [PG: comparative subordination]
(5) but did not admire the strain of its poetry in general, though I did its morality. [PG: adverbial subordination]
Only a few corpus-based works have studied this aspect in Present-Day English (Hardt & Rambow 2001; Nielsen 2005; Hoeksema 2006; Bos & Spenader 2011; Sharifzadeh 2012; Miller 2014). Here I extend these studies in two ways: by analysing the type of syntactic linking in PAE constructions in Modern English, and by presenting an algorithm for retrieving instances of PAE with CorpusSearch 2.
The algorithm achieves a recall of 0.97 and can be applied to any parsed corpus that follows the conventions of the Penn Parsed Corpus of Modern British English. For PG, the results show that the vast majority of cases are comparative constructions (74.42%), followed by cases with no syntactic linking (15.12%), coordination (4.65%), adverbial subordination (4.65%) and relative subordination (1.16%). Other studies of PG in Present-Day English (Hoeksema 2006; Sharifzadeh 2012; Miller 2014) provide a point of comparison. In Present-Day English, instances of PG with NP remnants have a stronger preference for comparative constructions (around 90%) than in Modern English (70%). For VPE, over 50% of the examples show no syntactic linking between the source and the target of ellipsis, whereas only 15.12% of PG examples do. The second most frequent type of linking in VPE is comparative subordination (31.51%). The share of comparative constructions is therefore high in VPE, but it is roughly 2.4 times higher in PG (74.42%). Far less common are relative subordination (7.22%), coordination (5.56%) and adverbial subordination (5.37%). Compared with Bos & Spenader (2011), the three most frequent types of linking are the same in both studies: as-appositives, comparatives and lack of syntactic linking. Hardt & Rambow (2001), for their part, found that the different forms of subordination favour VPE, whereas the absence of a direct relation disfavours it. However, lack of syntactic linking is among the three most frequent types both in Bos & Spenader's (2011) work, where it ranks third, and in this paper, where it accounts for over 50% of VPE cases.
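The recall figure and the percentage breakdowns above are simple ratios. The Python sketch below shows how they are computed. It assumes the retrieved hits and a manually checked gold standard are available as sentence identifiers, and all names and counts in it are hypothetical. It does not reproduce the CorpusSearch 2 queries themselves.

```python
def recall(retrieved, gold):
    """Proportion of gold-standard instances that the query retrieved."""
    gold_set = set(gold)
    return len(gold_set & set(retrieved)) / len(gold_set)


def precision(retrieved, gold):
    """Proportion of retrieved hits that are genuine instances."""
    retrieved_set = set(retrieved)
    return len(retrieved_set & set(gold)) / len(retrieved_set)


def percentage_distribution(counts):
    """Convert raw counts per linking type into percentages (two decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}
```

Reporting precision next to recall shows how much manual filtering the retrieved set still needs.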
Keywords: ellipsis, syntactic linking, Modern English

A corpus-based analysis of the collocational patterning of adjectives with abstract nouns in medical English
Natalia Judith Laso (1)*†, Suganthi John (2)*
(1) University of Barcelona (UB), Spain; (2) University of Birmingham, United Kingdom

Research on domain-specific phraseology has shown that EAL (English as an additional language) writers find it challenging to acquire phraseological competence in academic English and to develop a good working knowledge of domain-specific collocational patterns (Carter 1998; Williams 1998; Wray 1999; Gledhill 2000; Flowerdew 2003; Biber 2006; Hyland 2008 & 2016; Granger & Meunier 2008; Author 1 & Author 2 2013; Pérez-Llantada 2014). This is especially apparent in scientific discourse, where research grows at a rapid pace and researchers are often required to disseminate their results equally rapidly to an international audience. The challenge for EAL speakers is to learn the discourse conventions of the scientific genre, so that their results receive the attention they want from other members of the scientific community. Corpus-based analyses have been especially relevant to genre analysis. A genre is a specific language practice characterised by a number of linguistic features and phraseological conventions, so it can be claimed that genres use different ways of expressing meaning (Swales 1990; Hunston 2002). This assumption is closely linked to the concept of local grammar (Gross 1993; Barnbrook & Sinclair 1995; Hunston & Sinclair 2000). A local grammar describes particular areas of language, such as the collocational and phraseological conventions characteristic of scientific discourse, rather than the language as a whole (Bednarek 2007). The aim of this paper is to describe one pattern commonly found in scientific discourse, namely
abstract nouns in combination with adjectives, in order to contribute to the characterisation of this combinatorial pattern in medical science writing. The corpus analysed is the Health Science Corpus (HSC). It is a representative sample of health science research articles, compiled specifically to investigate two things: the lexico-grammatical patterns surrounding non-technical terms in scientific English, and the conventionalised phraseological characteristics of this genre. The observations have contributed to our understanding of the positions and typology of adjectives that combine with abstract nouns in medical English. The study has also highlighted the usefulness of collocational evidence from textual corpora in EFL and ESP settings, where it can help EAL writers focus on slices of real language and on high-frequency word combinations. Accordingly, the findings have informed the development of SciE-Lex. SciE-Lex is a reference tool that provides information about the meanings and the grammatical and collocational patterns of general terms frequently used in medical English. Its aim is to help the Spanish professional medical community use appropriate collocational patterns in their research articles. Other publicly available resources have limitations in this respect. Existing technical and scientific monolingual dictionaries focus mainly on terminological and encyclopaedic information. Bilingual and multilingual dictionaries provide translation equivalents without further information about the context on which the meaning of a given lexical entry depends.
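The adjective + abstract noun pattern can be extracted from a tagged corpus and ranked by an association measure. The minimal Python sketch below assumes POS-tagged input with Universal-style `ADJ`/`NOUN` tags, since the HSC's own tagset is not described here. It uses pointwise mutual information, which is one common choice of measure and not necessarily the one the authors used. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def adj_noun_mi(tagged, min_freq: int = 2) -> dict:
    """Pointwise mutual information for adjacent ADJ + NOUN pairs.

    tagged: list of (word, tag) tuples; 'ADJ'/'NOUN' tags are an assumption.
    Uses the common approximation MI = log2(f(a,n) * N / (f(a) * f(n))).
    """
    word_freq = Counter(w.lower() for w, _ in tagged)
    n_tokens = len(tagged)
    pair_freq = Counter(
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and t2 == "NOUN"
    )
    return {
        pair: log2(f * n_tokens / (word_freq[pair[0]] * word_freq[pair[1]]))
        for pair, f in pair_freq.items()
        if f >= min_freq
    }
```

MI tends to favour rare pairs, so the `min_freq` threshold helps keep the list pedagogically useful for EAL writers.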
Consequently, the EAL scientific community is considered to benefit greatly from lexical databases such as SciE-Lex and from specialised dictionaries that take into account the lexico-grammatical patterning of lexical units and acknowledge that meaning depends heavily on the context in which a word co-occurs (Barnbrook 2007: 191).

Keywords: phraseological units, abstract nouns, EAL writers, medical community, ESP corpus investigation

A corpus-stylistic analysis of direct thought presentation in Charles Dickens's fifteen novels
Pablo Ruano (1)*
(1) Universidad de Extremadura (UEx), Spain

This presentation offers a corpus-stylistic analysis of direct thought presentation in a corpus of Charles Dickens's fifteen novels (c. 3.8 million words). The aim is to examine more closely how Dickens presents his characters' thoughts. This aspect has so far been underexplored, perhaps because of the 'lack of psychological inwardness and depth in his characters' (McParland, 2011: 209). Despite this lack of psychological depth, Dickens consistently reported his characters' thoughts throughout his fifteen novels. A systematic analysis of how he did so is therefore in order, if only because no comprehensive account of it has yet been attempted. As will be shown, a corpus methodology can retrieve occurrences of direct thought (henceforth, DT) effectively, which makes it possible to analyse Dickens's use of this mode of thought presentation systematically.
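Retrieving DT through the verb think can be approximated with a regular expression that finds a quotation followed by a short reporting clause containing a form of think. The Python sketch below is a rough illustration under stated assumptions: the text uses straight double quotes, and the reporting clause follows the quotation. It misses preposed reporting clauses and can mis-pair quotation marks, so every hit would still need manual checking. The pattern and function names are hypothetical.

```python
import re

THINK = r"(?:think|thinks|thought|thinking)"
# A quotation followed (after an optional comma) by up to two words and then a
# form of THINK, e.g. '"John" thought madame' or '"Yes," he thought'.
DT_PATTERN = re.compile(
    r'"[^"]+"\s*,?\s*(?:\w+\s+){0,2}?\b' + THINK + r"\b",
    re.IGNORECASE,
)


def find_direct_thought(text: str) -> list:
    """Return candidate DT spans: a quotation plus the start of its reporting clause."""
    return [m.group(0) for m in DT_PATTERN.finditer(text)]
```

On the A Tale of Two Cities passage below, this pattern would flag the opening quotation and "thought madame". The suspended reporting clause that follows still has to be analysed by hand.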
Specifically, 244 occurrences of DT have been retrieved. This is a much larger set of examples than the twenty-one examined by Busse (2010), which is the most comprehensive analysis of discourse presentation strategies in nineteenth-century fiction to date.[1] The analysis of these 244 occurrences confirms some of Busse's findings on DT in nineteenth-century narrative fiction. It also reveals previously unremarked patterns of form and function in Dickens's presentation of his characters' thoughts. The analysis focuses on examples containing the verb think, the reporting verb for thought presentation par excellence. For example:
"John" thought madame, checking off her work as her fingers knitted, and her eyes looked at the stranger. "Stay long enough, and I shall knit 'BARSAD' before you go." (A Tale of Two Cities, book 2, chapter 16)
This example contains several characteristic features of Dickens's use of DT, such as a vocative in the reported clause, a suspended reporting clause and a reference to the character's eyes. This presentation investigates these and other traits. As will be shown, they fulfil meaningful functions related to significant aspects of Dickens's style, as discussed by other critics. The analysis is intended to contribute to a better understanding of Dickens's craftsmanship from a stylistic point of view.
[1] It is only fair to note that Busse's corpus consists of excerpts of fewer than 3,500 words from twenty-two nineteenth-century novels (Busse, 2010: 64). It is therefore much smaller than the corpus of Dickens's novels analysed here.

Keywords: Dickens, corpus stylistics, direct thought presentation

A data-driven analysis of linguistic complexity and proficiency in learner and native English
Javier Perez-Guerra (1)*, Ana Elina Martinez-Insua (1)*
(1) University of Vigo (UVigo), FFT, Campus Universitario,
36310 Vigo, Spain

This paper investigates issues covered by the umbrella concept of 'linguistic complexity' in learner language. In this study, complexity has three dimensions: lexical, syntactic and semantic-discursive. The null hypothesis is that 'learner language does not deviate from native language as regards linguistic complexity'. This hypothesis is rejected on the basis of data-driven standard metrics of linguistic density and inter-/intra-textual diversity. The learner-language data come from the Early-Access Subset of the Trinity Lancaster Corpus, compiled at the ESRC Centre for Corpus Approaches to Social Science, Lancaster University. This subset comprises approximately two million words. It consists of transcribed interactions between candidates and examiners at levels B1 to C2 of the Common European Framework of Reference (Council of Europe 2001). Each candidate took part in a number of speaking tasks, depending on his or her proficiency level. The learner data are compared with results from the native-speaker corpus LOCNEC (Centre for English Corpus Linguistics, Université catholique de Louvain), which serves as the English native control corpus. They are also compared with other L2 corpora, such as the Louvain International Database of Spoken English Interlanguage (LINDSEI). The software tools used in this research are Coh-Metrix (McNamara et al. 2014) and Synlex (Lu 2012, 2014). Coh-Metrix provides basic lexical and semantic-discursive features such as type-token ratio and average word and sentence length. It also provides other metrics of textual lexical diversity (mainly vocd-D) and readability indices (Flesch Reading Ease, Flesch-Kincaid Grade Level).
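Several of the basic Coh-Metrix indices named above have simple definitions. The Python sketch below illustrates them under explicit assumptions. Tokenisation and sentence splitting are deliberately naive, and the Flesch formulas take word, sentence and syllable counts as inputs because syllable counting is left out. Coh-Metrix's own implementation will give somewhat different values, and vocd-D is not reproduced. The function names are hypothetical.

```python
import re


def basic_metrics(text: str) -> dict:
    """Type-token ratio and mean word/sentence length, with naive tokenisation."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "ttr": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher scores indicate easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate US school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Raw TTR falls as texts get longer, which is why length-robust measures such as vocd-D are preferred when comparing transcripts of different sizes.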
In addition, Coh-Metrix draws on Latent Semantic Analysis (LSA) spaces, which can be used to characterise the degree of conceptual similarity within a group of texts. Synlex (Lu's Lexical Complexity Analyzer and L2 Syntactic Complexity Analyzer) automates the analysis of complexity with 25 different measures of lexical density taken from the first- and second-language development literature. The input texts from the Early-Access Subset of the Trinity Lancaster Corpus are POS-tagged and lemmatised with TreeTagger so that Synlex can compute these measures. The statistical analysis of the metrics that Coh-Metrix and Synlex supply for the native and learner corpora addresses two research questions: (i) Does learner language differ from native language as regards linguistic complexity? (ii) Do the CEFR levels imply differences as regards linguistic complexity? The results show that the answer to both questions is positive. They also show that the cline of complexity degrees matches the CEFR levels to a highly significant extent.

References
Council of Europe. 2001. Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Landauer, Thomas K. 2007. LSA as a theory of meaning. In Thomas K. Landauer, Danielle S. McNamara, Simon Dennis and Walter Kintsch (eds.), The Handbook of Latent Semantic Analysis, 3–34. Mahwah, NJ: Lawrence Erlbaum.
Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners' oral narratives. The Modern Language Journal 96(2): 190–208.
Lu, Xiaofei. 2014. Computational Methods for Corpus Annotation and Analysis. Dordrecht: Springer.
McNamara, Danielle S., Arthur C. Graesser, Philip M. McCarthy and Zhiqiang Cai. 2014. Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Keywords: complexity, CEFR, metrics, learner language, readability

Affix rivalry in English derivation: An onomasiological approach
Cristina Fernández-Alcaina (1), Cristina Lara-Clares (1), Jesús Fernández-Domínguez (1)*
(1) University of Granada, Avda. del Hospicio, s/n, C.P. 18071 Granada, Spain

The notion that morphological processes compete with each other for concept naming is well known and well substantiated, and it underlies some of the most prominent word-formation theories. Morphological competition is, however, a slippery notion that numerous scholars have dealt with only in passing, and existing approaches to it are more theoretically than empirically oriented. In principle, competition is a theory-neutral notion and "[...] happens when two or more morphological processes can express the same syntactic-semantic function" (Kastovsky 1986: 597). The competitive behaviour of word-formation has been the focus of recent investigations. Most of them take a primarily formal perspective and compare pairs or small groups of competing rules (Bauer 2006, Bauer et al. 2010, Aronoff & Lindsay 2014). The scope of such works includes the semantics of derivation, but their driving force is formal performance. Typically, these approaches conclude that affix X wins out over affix Y, or that affix Z dominates in a given morphosemantic context. An alternative is the onomasiological model of word-formation, which follows the tradition of the Prague School of Linguistics and whose main exponent is Štekauer (2005). This approach shifts the focus of word-formation analysis away from formal aspects and onto the naming needs of language users, so that the semantics of lexemes takes precedence over their form.
In this view, the base-derivative relationship is examined mainly through meaning categories such as Causative, Locative, Agent or Instrument. Each of these categories may be expressed by various word-formation processes (e.g. -er, -ian and -ist all express agency). With this in mind, this paper considers the role of cognitive-semantic categories from two angles: (i) How do cognitive-semantic categories behave with regard to morphological competition? (ii) How can existing formulas for measuring productivity be used in the onomasiological evaluation of competition? For both questions we draw on the British National Corpus (BNC), the Corpus of Contemporary American English (COCA) and the Oxford English Dictionary (OED). For question (i), the derivatives are classified into competing clusters. The classification uses a template that considers a series of factors that facilitate or constrain the appearance and profitability of word-formation processes. Because it is based on the semantic classification in Bagasheva (to appear), this template makes it possible to interpret which readings of a lexeme prevail and which become obsolete during competition. Question (ii) is addressed by applying the productivity formulas of Baayen (2009) and Gaeta & Ricca (2015) to the study sample, for which corpus-derived frequencies prove essential. The results are then placed within a semantic view of competition, understood as a complement to more formal views. The preliminary conclusions point to two correlations. The first is between the number of instances of a process in present competition and its profitability. The second is between the number of instances of prevalence and the degree of profitability of that process.
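Baayen-style productivity measures can be computed directly from corpus frequencies. The Python sketch below shows three of them for a single affix: realised productivity (number of types), potential productivity (the affix's hapaxes divided by its tokens) and expanding productivity (the affix's hapaxes divided by all hapaxes in the corpus). It assumes a frequency list of derivatives for the affix and the corpus-wide hapax count. It does not implement the Gaeta & Ricca (2015) procedure, and the function name is hypothetical.

```python
def productivity(affix_freqs: dict, corpus_hapaxes: int) -> dict:
    """Baayen-style productivity measures for one affix.

    affix_freqs:    token frequency of each attested derivative with the affix
    corpus_hapaxes: number of hapax legomena in the whole corpus (all words)
    """
    tokens = sum(affix_freqs.values())                          # N(C)
    hapaxes = sum(1 for f in affix_freqs.values() if f == 1)    # n1(C)
    return {
        "V": len(affix_freqs),              # realised productivity: attested types
        "P": hapaxes / tokens,              # potential productivity: n1(C) / N(C)
        "P*": hapaxes / corpus_hapaxes,     # expanding productivity: n1(C) / n1(corpus)
    }
```

Computing the three measures for each member of a competing cluster (e.g. -er, -ian, -ist for Agent) places their productivity on a comparable footing.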
Keywords: affixation, English, morphological competition, onomasiology, productivity, word formation

Anaphora Resolution on the Fly: Pronouns in a Psycholinguistically Motivated Parsing System
Noemi Vadasz (1,2)*†
(1) Pázmány Péter Catholic University, Faculty of Humanities and Social Sciences, Budapest, Hungary; (2) MTA-PPKE Hungarian Language Technology Research Group, Budapest, Hungary

A psycholinguistically motivated parsing model such as AnaGramma (Prószéky and Indig, 2015) throws new light on the broadly interpreted problem of anaphora resolution. This paper concentrates on the narrower problem of pronouns,[1] namely personal, reflexive and reciprocal pronouns, within the framework of the AnaGramma parsing model. AnaGramma works strictly left to right and word by word, and it tries to process utterances by following the patterns of human language processing as closely as possible. It must therefore handle coreference 'on the fly' while parsing the utterance. It works within a supply-and-demand framework: each word supplies its lexical representation and morpho-syntactic information, and demands are issued (e.g. verbs have an obligatory need for their arguments). By the end of the utterance, all demands should be fulfilled, either from within the sentence or by default mechanisms. The output of the parser is a dependency graph with different types of edges, including coreference edges. When the parser encounters a verb (or any element with an argument frame), it calls up the current argument frame, and searchers for the arguments can then start off. If an argument precedes the verb in the linear order of the sentence, it is already available to the searchers in a short-term memory called the pool. Otherwise, the searcher can wait until a potential supply arrives. The searchers have different settings according to the demands of the verb. In Hungarian, some features of certain arguments can be inferred from the inflection of the verb.
The searchers look for an agreeing subject or object. A default zero node with the appropriate case marker and agreement features is proffered as well, so zero pronouns are integrated into the parsing process. Reflexives and reciprocals, with their actual case markers, behave like other arguments: they are supplies, ready for the verb's demands. Homonymy is a special problem during parsing. In Hungarian the pronoun maga has two meanings: (1) a third-person singular reflexive pronoun in the nominative case ('himself/herself') and (2) a polite or formal second-person singular personal pronoun in the nominative case ('you'). In addition, maga has another use in constructions such as maga az ördög ('the devil itself'). Pronominalisation and the use of zero pronouns are governed by an underlying rule system, which makes it possible to reveal anaphoric dependencies and referential identities. These long-range relations extend beyond the boundaries of the clause, and even of the sentence, in which they occur. The algorithm of Pléh and Radics (1976) makes it possible to build these underlying rules into the AnaGramma parsing system, bringing its handling of pronouns closer to human sentence processing. In this paper I present a solution for handling Hungarian personal, reflexive and reciprocal pronouns in the framework of AnaGramma, based on the anaphora resolution algorithm of Pléh and Radics (1976). My observations are based on data from the Pázmány Corpus (Endrédy, 2016). Other types of coreference, such as repetition, proper-name variants, synonyms, hypernyms and hyponyms, also need to be taken into account and are the subject of future research.
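The supply-and-demand mechanism described above can be illustrated with a toy left-to-right pass. The Python sketch below has a pool of earlier supplies, searchers that wait for later supplies, and a default zero-pronoun node for demands still open at the end of the utterance. It is a hypothetical simplification. All class and function names are invented for illustration, agreement is reduced to subject person and number, and it reproduces neither AnaGramma's implementation nor the Pléh and Radics (1976) algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Word:
    form: str
    case: Optional[str] = None      # case of a nominal supply, e.g. "nom", "acc"
    person: Optional[int] = None
    number: Optional[str] = None    # "sg" or "pl"
    demands: List[str] = field(default_factory=list)  # cases a verb requires


def agrees(verb: Word, supply: Word, case: str) -> bool:
    # Only the subject is checked for person/number agreement in this toy.
    if case != "nom":
        return True
    return (supply.person, supply.number) == (verb.person, verb.number)


def parse(words: List[Word]) -> List[Tuple[str, str, str]]:
    """Return (verb, case, argument) edges built left to right."""
    pool: List[Word] = []                    # supplies seen so far, not yet used
    waiting: List[Tuple[Word, str]] = []     # open (verb, case) demands
    edges: List[Tuple[str, str, str]] = []
    for w in words:
        if w.demands:  # a verb: start one searcher per required argument
            for case in w.demands:
                found = next((s for s in pool
                              if s.case == case and agrees(w, s, case)), None)
                if found is not None:
                    pool.remove(found)
                    edges.append((w.form, case, found.form))
                else:
                    waiting.append((w, case))
        else:  # a supply: serve the first waiting searcher it fits, else pool it
            hit = next(((v, c) for v, c in waiting
                        if c == w.case and agrees(v, w, c)), None)
            if hit is not None:
                waiting.remove(hit)
                edges.append((hit[0].form, hit[1], w.form))
            else:
                pool.append(w)
    # End of utterance: unmet demands are filled by a default zero pronoun.
    for verb, case in waiting:
        label = f"pro_{case}"
        if case == "nom":  # subject features are recoverable from the verb
            label += f"_{verb.person}{verb.number}"
        edges.append((verb.form, case, label))
    return edges
```

For Látom a házat ('I see the house'), the toy links házat as the object and fills the unexpressed subject with a first-person singular zero pronoun inferred from the verb's inflection.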
Keywords: computational linguistics, parser, psycholinguistics, performance, corpus

Anaphora resolution in the interlanguage of English and Greek learners of Spanish: a corpus-based study
Athanasios Georgopoulos (1)*
(1) Universidad de Granada (UGR), Spain

Overt pronominal subjects are not syntactically obligatory in pro-drop languages such as Spanish (Fernández Soriano 1999, Luján 1999). Previous research has shown that their use, and their alternation with null subjects, is constrained both syntactically and contextually (Alonso-Ovalle et al. 2002, Pérez-Leroux & Glass 1999). It has also been shown that learners of Spanish have persistent deficits in their distribution (Lozano 2009, 2016). These deficits have been attributed to the interface between syntax and discourse (Sorace 2004). Research in this field has traditionally relied on experimental data (for an overview, see Quesada 2015). However, a growing number of researchers point out the need to use corpora to test existing hypotheses (Díaz & Thompson 2013, Lozano & Mendikoetxea 2013, Mendikoetxea 2014, Tono 2003). In addition, most studies of subject pronouns in L2 Spanish (Al Kasey & Pérez-Leroux 1998, Almoguera & Lagunas 1993, Liceras 1996, Liceras & Díaz 1999) have examined the interlanguage of English-speaking learners, whose L1 is non-pro-drop. Very few corpus-based studies have examined the L2 Spanish of speakers of pro-drop languages such as Greek (Margaza & Bel 2006). This paper presents the preliminary results of a research project on the use of anaphoric third-person subjects in the interlanguage of Greek and English learners of Spanish. The main empirical basis of the investigation is a recently compiled L1 Greek-L2 Spanish learner corpus. It was designed as a component of the L1 English-L2 Spanish CEDEL2 corpus (Lozano 2009, Lozano & Mendikoetxea 2013).
Both corpora follow the same design principles. Hence, this is the first corpus-based study that allows a comparison of two groups of learners (Greek-speaking versus English-speaking) whose L1s differ with respect to anaphoric subjects. The corpus data were analysed with the XML annotator UAM CorpusTool (O'Donnell 2009). A purpose-built tagset was designed on the basis of previous learner corpus studies (Blackwell & Quesada 2012, Gudmestad & Geeslin 2013, Lozano 2016). Learners at two proficiency levels (elementary and upper-advanced) in each group (English and Greek) were examined and compared with a native Spanish control group. Preliminary results indicate that elementary Greek-speaking learners of Spanish show some tendency to overuse overt subjects, but to a significantly lesser extent than their English-speaking counterparts. Moreover, at the upper-advanced level the Greek-speaking learners show native-like preferences. The English-speaking learners, by contrast, show deficits even at the highest proficiency levels. Cross-linguistic influence can account for these differences between the two learner groups. Greek-speaking learners seem to benefit from the similarity between their L1 and Spanish with respect to anaphora resolution (AR) patterns, whereas English-speaking learners seem to transfer their L1 properties. From a developmental point of view, the results suggest that cross-linguistic influence is a crucial factor and that certain AR categories at the syntax-discourse interface can be fully acquired. The results partly run counter to the Interface Hypothesis and are in line with other recent SLA studies (Judy 2015, Kras 2008, Prentza 2014, Zhao 2014).
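The abstract does not say which statistical test supports the claim that one group overuses overt subjects "to a significantly lesser extent" than another. One common option for comparing two such rates is a two-proportion z-test on counts of overt subjects out of all anaphoric third-person subjects. The Python sketch below implements this test. The function name is hypothetical, any counts passed to it would be illustrative rather than the study's data, and the normal approximation assumes reasonably large counts in both groups.

```python
from math import erf, sqrt


def two_proportion_z(overt_a: int, total_a: int, overt_b: int, total_b: int):
    """Two-sided z-test comparing the proportion of overt subjects in two groups."""
    p_a, p_b = overt_a / total_a, overt_b / total_b
    pooled = (overt_a + overt_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:  # both groups all-overt or all-null: no evidence of a difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))
    return z, p_value
```

With several groups and proficiency levels, a chi-square test or a regression model would usually be preferred. The pairwise test is shown only because it makes the logic of a "significantly lower percentage" explicit.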
Keywords: anaphora resolution, SLA, Spanish L2, contrastive interlanguage analysis, learner corpora, Interface Hypothesis

Analysis of pragmatic aspects in specialised discourse on economics and finance: a study based on a spoken corpus in support of interpreting
Sonia Paola Martínez Zavala (1)*
(1) Universidad Autónoma de Baja California (UABC), Av. Monclova 678, Ex-Ejido Coahuila, 21360 Mexicali, Baja California, Mexico

Main argument
In practice, interpreters face problems such as false sense, nonsense and countersense. These can arise when the pragmatic aspects of the discourse are not taken into account. A pragmatic failure occurs when the interpretation is grammatically correct but meaning is nevertheless lost.
Objectives
The general objective is to identify pragmatic aspects of economic and financial discourse by means of a monolingual English corpus. The corpus and a report of findings are then intended to serve as documentation tools that facilitate the interpreter's task with this type of discourse. To this end, a sample corpus of English texts on economics and finance is compiled, consisting of 27 interview transcripts obtained from The World Bank Group (2016). The corpus is processed with AntConc 3.4.4w and analysed to identify aspects such as emotions, intellectual inferences, hypotheses, reformulations, evaluations, metaphorical expressions, discourse modalisation, requests and orders. A report summarising the findings is then produced.
Theoretical framework
García Yebra (1981) points out that "translation differs from interpreting in that its starting point is a written text and its result is another written text" (p. 9).
Escobar (1996) notes that interpreting is a mode of translation, and that pressures such as deadlines turn translation into a process almost as fast as interpreting. Faber (2009) states that pragmatics focuses on the effect of context on communicative behaviour, and on how the receiver makes inferences to arrive at the final interpretation of an utterance. Faber (2009) also notes that the pragmatics of specialised discourse is directly related to the situations in which this type of communication occurs, and to the ways in which sender and receiver potentially or actually deal with them. On pragmatic command, Bertone (1989) argues that the interpreter's competence lies in distinguishing between the types of implicit and contextual information in order to interpret appropriately, respecting each of them. McEnery and Hardie (2012) define corpus linguistics as an area focused on a set of procedures for the study of language that can be applied to various areas of linguistics. Lüdeling and Kytö (2009) indicate that spoken corpora can be compilations of recordings or of their transcriptions, and that transcriptions can be analysed like a written corpus.
Results
An English spoken corpus of 86,883 words, retrieved from the World Bank, was built and analysed with corpus-processing tools to determine pragmatic aspects and their context. The results help interpreters learn about pragmatic features and develop pragmatic command of economic and financial discourse. The corpus contains examples such as the following: the English adverb absolutely, which expresses evaluation in economic and financial discourse; the conjunction if, which denotes a hypothesis; and the phrase I mean, which signals a reformulation.
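Recurrent sequences such as I mean or across the board are the kind of items AntConc's Clusters/N-Grams tool retrieves. The Python sketch below gives a rough equivalent. It assumes plain-text transcripts and naive lower-cased tokenisation, ignores sentence boundaries, and uses a hypothetical function name. Deciding whether each hit of I mean actually functions as a reformulation remains a manual step.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams (lower-cased; sentence boundaries are ignored)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

For instance, `ngram_counts(corpus_text, 2)[("i", "mean")]` would give the raw hit count, assuming `corpus_text` holds the transcripts. Each hit is then checked in context before being classified.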
For reformulations, the phrase I mean was used as such in 21 of its 22 hits. The proposed renderings are Quiero decir, Digo or Me refiero a. The idiomatic expression across the board appeared once, and the proposed renderings are A todos en general or Incluyendo a todos.

Keywords: pragmatics, specialised discourse, corpus linguistics and interpreting.

Applications of the CORPEN corpus to the teaching and assessment of phraseological units of Spanish used in specific contexts
Inmaculada Martínez (1)*, Susana Llorián (2)*†
(1) Centro Internacional de Estudios Superiores del Español (CIESE-Comillas), Avda. de la Universidad Pontificia s/n, 39520 Comillas, Cantabria, Spain; (2) Universidad Complutense de Madrid (UCM), Spain

Following the impact of the Instituto Cervantes Curriculum Plan (Plan Curricular del Instituto Cervantes, 2007), the Fundación Comillas published, some years later, the Curriculum Plan for Business Spanish (Plan Curricular del Español de los negocios; Martín Peris and Sabater, 2012). The aim was for this document to become the main reference for the design of courses, teaching materials and certification exams in Business Spanish (ENE). The development of the curriculum documentation confirmed the need for a research project, grounded in a specialised corpus, to guide this process. This project took shape as the CORPEN corpus (Corpus Comillas del Español de los Negocios). The lexical component is one of the areas most affected by the application of CORPEN to this process. The main aim of this paper is to show what the support of this corpus implies for three tasks: specifying the lexical content of the ENE curriculum, formulating methodological guidelines, and validating certification tests of vocabulary.
The corpus is decisive for selecting the lexical units, both single-word and multiword (that is, simple or compound words, collocations, idioms and social interaction formulas, following the classification of Gómez Molina, 2004), that are included in the inventories on which course syllabi, textbooks and exam specifications will be based. This ensures that the language of these materials is authentic, reflecting the language used in real communicative contexts in ENE, unlike the artificial or invented language found in materials that are not based on corpora (O'Keeffe and McCarthy, 2010: 374). Furthermore, the corpus is the ideal tool for presenting the lexical units of the curriculum in the arrangement required for teaching them, building on proposals such as the "lexical approach" (Lewis, 1993, 1997, 2000) and those of some of its followers, such as Timmis (2015), who apply the approach using corpus methodology. Along these lines, O'Keeffe et al. (2007) describe how to draw up lexico-grammatical profiles of the lexical units in the curriculum, a practice whose pedagogical value is especially high when applied to teaching ENE. As these authors note (O'Keeffe et al., 2007: 198), specialised and professional genres are likely to show more regular patterns and distributions than general language. Secondly, the relationships between lexis and grammar established from this perspective make it possible to implement data-driven learning, which essentially consists of using the tools that corpora provide for learning lexical units.
This could mitigate many of the criticisms levelled at proposals such as Lewis's concerning problems of practical application. Finally, a corpus such as CORPEN will also contribute decisively to validating the vocabulary tests in certification exams: it makes it possible to check how the language of discrete-point items relates to actual usage in real ENE contexts.
Keywords: Business Spanish, specialised corpora, Business Spanish curriculum, collocations, idioms, institutionalised expressions

Applying Textometric Analysis to a Description of Cochrane Medical Abstracts and their Plain Language Versions: Quantitative Characterisation of Plain Language in Medical Discourse
Christopher Gledhill (1), Hanna Martikainen (1), Alexandra Mestivier (Volanschi) (1), Maria Zimina (1)
1 CLILLAC-ARP, EA3697 – Université Paris Diderot - Paris 7 – France
The Cochrane organisation publishes meta-analyses of large-scale medical studies ('Systematic Reviews', SRs). This information is summarised in 1) a Scientific Abstract (ABS), targeting members of the scientific community, and 2) a simplified summary for the general public, which Cochrane calls a 'Plain Language Summary' (PLS). Although there is now an extensive literature on controlled languages (Stewart 1998, O'Brien 2003), there has been less work on the linguistic description of 'plain language'. The Cochrane guidelines state that SRs should be written in "clear, simple English" (Cochrane Style Manual), while the language of PLS is defined as "plain English which can be understood by most readers without a university education" (Cochrane PLEACS standards). However, the guidelines give no specific linguistic definition of 'plain English'.
In this paper, we set out to identify the main lexico-grammatical differences between ABS and PLS texts. Our hypothesis is that PLS authors adapt their usage, consciously or unconsciously, to the perceived norms of what they take plain writing to be. This process appears to be very regular, and it shows up in reformulation techniques and other revisions that stand out as salient features of PLS as opposed to ABS. We extracted two sub-corpora from the literature produced by the Cochrane organisation: a corpus of 4,540 ABS (2.1 million words) and a corpus of the 4,540 corresponding PLS (1.1 million words). The ABS texts are systematically divided into sub-sections: Background, Objectives, Search strategy, Selection criteria, Data collection and analysis, Main results, Authors' conclusions. A minority of PLS (370) are also divided into sub-sections: Review question, Background, Study characteristics, Quality of the evidence and Key results. This segmentation allows us to pinpoint specific phraseological strategies, for instance the simplification of information from the Authors' conclusions (in ABS) in the Key results sub-sections of PLS. We use textometric methods to compare the quantitative characteristics of the ABS and PLS sub-corpora. First, we POS-tagged both sub-corpora (Schmid 1994). Then we computed characteristic elements and carried out factorial analysis to compare different parts (text sections) of these POS-tagged corpora (Lebart et al. 1998). These metrics reveal important similarities between the Background and Conclusions sections of ABS and PLS.
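In textometrics, characteristic elements are commonly computed with a hypergeometric model: a form or tag is flagged as over-represented in a part of the corpus when its observed frequency there is improbably high given its overall frequency. The sketch below computes that upper-tail probability exactly; the function name and the toy counts are hypothetical, not figures from the Cochrane sub-corpora, and the abstract does not state which tool or variant the authors used.

```python
from math import comb

def specificity_pvalue(T: int, t: int, F: int, f: int) -> float:
    """Upper-tail hypergeometric probability P(X >= f).

    T: total number of tokens (or tags) in the whole corpus
    t: number of tokens in the part being characterised (e.g. one text section)
    F: frequency of the form or tag in the whole corpus
    f: observed frequency of the form or tag in that part
    A very small value flags the form as over-represented in the part.
    """
    denominator = comb(T, t)
    upper = min(F, t)
    # math.comb returns 0 when k > n, so impossible configurations add nothing.
    numerator = sum(comb(F, k) * comb(T - F, t - k) for k in range(f, upper + 1))
    return numerator / denominator

# Toy counts: a tag occurring 4 times in a 20-token corpus,
# with all 4 occurrences falling in one 10-token section.
print(specificity_pvalue(T=20, t=10, F=4, f=4))
```

Exact integer arithmetic becomes slow at the scale of millions of tokens; in practice a library routine such as `scipy.stats.hypergeom.sf(f - 1, T, F, t)` computes the same tail probability.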
For example, singular/mass nouns (NN), prepositions (IN), adjectives (JJ) and determiners (DT) turn out to be salient ('over-represented') in PLS as well as in the Background and Conclusions sections of ABS. The over-representation of prepositions can be partly explained by complex pre-modified nominal groups in ABS that are 'unpacked' in PLS into longer nominals with multiple embedded post-modifying prepositional phrases:
ABS: "Non-penetrating filtration surgery versus trabeculectomy for open-angle glaucoma"
PLS: "Two surgical techniques for the control of eye pressure in people with glaucoma"
Such 'unpacking' corresponds to the advice given by controlled languages such as Simplified Technical English: break down pre-modified nominals into several post-modifying groups. In this paper, we also report on other PLS patterns (reformulation of research processes and empirical findings into more disease-oriented or user-oriented terms, and topicalisation of human participants). All of these point to regular underlying tendencies towards simplification in PLS. The next stage of our project will devise a way of turning the findings of textometric analysis into appropriate editorial guidelines for authors of Cochrane PLS.
Keywords: corpus linguistics, language for special purposes, medical discourse, plain language summaries, textometric analysis

An approach to contrastive phraseology in CJEU judgments
Andrades Arsenio (1)
1 Universidad Complutense de Madrid (UCM) – Spain
The European Union publishes all its legislation in the 24 official languages of the 28 Member States that make up this supranational organisation. The EU portal offers a range of resources and web pages that give the public easy access to an enormous corpus of legislative, judicial and other texts in each of the official languages.
This multilingual corpus of parallel texts supports linguistic searches and is a very useful tool for consulting and comparing all kinds of terminological, phraseological and stylistic data. Corpus linguistics facilitates the analysis of linguistic elements in their real context of production through the compilation of digital documents. Studying texts of EU law will allow us to identify their specific phraseological features and to propose a classification of the most frequently used types of phraseological structures (collocations, idioms, formulaic expressions, etc.), based on the main phraseological taxonomies of general language (Corpas, 1997; Ruiz Gurillo, 1998; García-Page, 2008). To narrow the scope of this work, we focus on one EU institution, the Court of Justice of the European Union (CJEU), and on one of the main types of document it produces: judgments. This paper therefore aims to compile a corpus of judgments in three languages (English, French and Spanish) in order to identify and extract their main phraseological elements. The methodology consists essentially of building a representative ad hoc corpus of EU judgments (Seghiri, 2014) and exploring it with the WordSmith 5.0 concordancer to obtain information on the phraseological structures most frequently used in the three languages compared. The resulting data may serve as a basis for developing strategies for translating phraseological structures in judicial texts.
Work of this kind shows that compiling a corpus can contribute significantly to knowledge of phraseology in a specialised field, and it stresses the importance of legal translators being familiar with the phraseology of their field (Monzó and Hoyo, 1998; Lorente, 2002; Aguado de Cea, 2007; Pontrandolfo, 2013; Andrades, 2013). The results are a first approach to the legal phraseology of international organisations; they can be extended by broader studies and, if the data confirm them, extrapolated to legal texts in general. The study will also reveal the phraseological differences and similarities between general legal discourse and the language of CJEU judgments.
Keywords: Corpus Linguistics, specialised phraseology, legal translation

Computing salience to annotate an anaphoric corpus (RESUMAN)
Afef Selmi (1, 2), Laurent Gautier (3)
1 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
2 Aix-Marseille Université - UFR Arts, Lettres, Langues et Sciences Humaines (AMU UFR ALLSH) – Aix Marseille Université – 29, avenue Robert Schuman - 13621 Aix-en-Provence cedex 1, France
3 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
[Context] The development of electronic communication systems has been accompanied by a constant increase in the number of electronic text documents available, such as the summaries in our RESUMAN corpus.
This development calls for efficient computational tools capable of selecting, structuring and extracting the relevant information contained in these documents.
Research question
This abstract falls primarily under conference theme 7, "Corpus-based computational linguistics". Accordingly, and since "language consists largely of prefabricated units, which can be analysed by querying corpora using statistical methods", we created an algorithm that relies on salience computation (Landragin, 2011) as the main factor in resolving pronominal anaphora in our corpus. Taking into account various syntactic and cognitive factors, the algorithm uses a model that efficiently evaluates the salience of a potential antecedent. Each factor carries a different weight according to its usefulness for resolution. Our question is: does our statistical, corpus-based method perform well?
Corpus
The RESUMAN corpus consists of summaries of works of French literature. It contains 120 summaries, published online at www.alalettre.com, totalling just under 20,000 words. It contains about 12,000 pronominal anaphors, of which 3,000 are ambiguous. These texts are characterised by their brevity and referential density. The corpus is used to investigate automatically how ambiguous pronominal anaphora works in these texts, in order to highlight the syntactic and cognitive characteristics specific to anaphoric chains.
Methodological framework
After semi-automatic morphosyntactic annotation of RESUMAN (semi-automatic because we intervened manually to complete the morphological annotation of named entities), we present an algorithm inspired by that of Lappin and Leass (1994), but with a different strategy for computing salience. To narrow down the potential candidates, the algorithm passes the texts of our corpus through two filters: first, a filter on morphological agreement between the anaphor and the candidate; second, a filter on the syntactic structure of the pronoun's sentence. The remaining candidates are scored with a salience weight computed from two criteria: the candidate's distance and its grammatical weight. For the latter, we assigned values ranging from 100 to 10 to the following syntactic functions: subject, direct object, indirect object, subject complement (attribut) and relative. The algorithm first exploits syntactic and morphological information. After excluding non-anaphoric pronouns, it applies a salience measure to rank the potential candidates and then keeps only the appropriate ones. Through automatic resolution of pronominal anaphora, we then focus on the interactions between discourse, natural language processing and corpus analysis.
Results
80% of the pronominal anaphors in the corpus are resolved, including 25% of the ambiguous cases. The remaining 20% are unresolved, which leads us to re-examine the corpus to find out which mechanisms prevented resolution. Are the grammatical weights we added the cause? Or, on the contrary, are they what makes this level of performance possible? An evaluation corpus is needed to answer these questions.
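The filter-then-rank procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the intermediate weights are hypothetical (the abstract gives only the 100-to-10 range), distance is handled by halving the weight for each sentence back (a decay borrowed from Lappin and Leass), and the syntactic filter is reduced to a placeholder that only discards candidates from later sentences.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights: the abstract only says the values range from 100 to 10.
FUNCTION_WEIGHTS = {
    "subject": 100,
    "direct_object": 80,
    "indirect_object": 60,
    "attribute": 40,
    "relative": 10,
}

@dataclass
class Mention:
    text: str
    gender: str    # "m" or "f"
    number: str    # "sg" or "pl"
    function: str  # a key of FUNCTION_WEIGHTS
    sentence: int  # index of the sentence containing the mention

def agrees(pronoun: Mention, candidate: Mention) -> bool:
    """Filter 1: morphological agreement in gender and number."""
    return pronoun.gender == candidate.gender and pronoun.number == candidate.number

def syntactically_possible(pronoun: Mention, candidate: Mention) -> bool:
    """Placeholder for filter 2: only rejects candidates from later sentences."""
    return candidate.sentence <= pronoun.sentence

def salience(pronoun: Mention, candidate: Mention) -> float:
    """Grammatical weight, halved for every sentence between candidate and pronoun."""
    distance = pronoun.sentence - candidate.sentence
    return FUNCTION_WEIGHTS[candidate.function] / (2 ** distance)

def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Return the most salient candidate that passes both filters, or None."""
    pool = [c for c in candidates
            if agrees(pronoun, c) and syntactically_possible(pronoun, c)]
    if not pool:
        return None
    return max(pool, key=lambda c: salience(pronoun, c))

# "Marie a appelé Julie. Elle ..." : Marie scores 100/2 = 50, Julie 80/2 = 40.
marie = Mention("Marie", "f", "sg", "subject", 0)
julie = Mention("Julie", "f", "sg", "direct_object", 0)
elle = Mention("elle", "f", "sg", "subject", 1)
print(resolve(elle, [marie, julie]).text)
```

Unresolved cases (a `None` result, or a wrong antecedent) are exactly the residue that an evaluation corpus would help diagnose.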
Keywords: computational linguistics, corpus, pronominal anaphora, statistics, salience, grammatical weight, automatic resolution.

Building a legal corpus for collocation extraction
Joaquín Giraldez Ceballos-Escalera (1)
1 UNIVERSIDAD NACIONAL DE EDUCACIÓN A DISTANCIA (UNED) – Senda del Rey, 7 - 28040 MADRID, Spain
Methodologically, this contribution falls within corpus linguistics and presents a study of collocation extraction in legal language. The study has a twofold aim: to set out the methodological bases for building a corpus of legal texts, and to present the steps followed to extract collocations. Corpus linguistics is a linguistic discipline which, combined with computational linguistics, studies language through a wide variety of texts. In lexicography, the corpus is the basic material for linguistic analysis and, thanks to computing technology, a considerable mass of linguistic data is now available in electronic form. Such collections make it possible to observe large and varied amounts of real data. These resources open new perspectives for linguistic description, insofar as analysis tools make it possible to explore the texts and extract linguistic data from them efficiently. We present the corpus of legal French "FRJUR" that we have built, together with the analysis tools and the methodology used. The FRJUR corpus is the result of collecting texts from the field of French civil law. It comprises 3,200,086 words distributed across different sections: codes, rulings, specialised publications, etc.
The texts were selected and organised systematically according to criteria of balanced distribution, so that the corpus forms a structured whole rather than a mere collection of texts. The corpus, in digital form, was designed according to the criteria established by Sinclair (1991): "a corpus is a collection of naturally-occurring language text, chosen to characterize a state or variety of a language" (Sinclair, 1991: 171). In designing the corpus, the representativeness of the texts and their intended readers were taken into account. The FRJUR corpus allowed us to study lexical relations between two words by comparing the probability of observing them together (probability of dependence) with the probability of observing them separately (probability of independence). According to Church and Hanks (1989), whose approach is based on the information-theoretic notion of mutual information, if a genuine lexical relation exists between two words, the probability of dependence will be much higher than the probability of independence, and the mutual information of the pair (the logarithm of the ratio of the two probabilities) will be well above zero. The pair is then retained as significant. Frequency, transparency, arbitrariness and directionality are the criteria used by most authors to identify collocations (Firth, 1957; Cruse, 1986; Hausmann, 1989; Mel'čuk, 1998). To establish a typology of "collocations" in legal language, we propose to start from the "associations" established by Hausmann (1989) and to divide them into five groups: noun-adjective, verb-noun, verb-adverb, adverb-adjective and noun-(preposition)-noun.
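The Church and Hanks measure described above is usually written I(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities estimated from corpus frequencies. Because it is the logarithm of the ratio, independence corresponds to a value of 0. Below is a minimal sketch; the function name and the toy counts are hypothetical, and how co-occurrences are counted (window size, direction) is left open, as the abstract does not specify it.

```python
from math import log2

def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Mutual information of a word pair in the sense of Church and Hanks (1989).

    f_xy: co-occurrence frequency of x and y
    f_x, f_y: individual frequencies of x and y
    n: corpus size used to turn frequencies into probabilities
    """
    p_xy = f_xy / n          # probability of dependence (joint occurrence)
    p_x_p_y = (f_x / n) * (f_y / n)  # probability expected under independence
    return log2(p_xy / p_x_p_y)

# Toy counts: the pair co-occurs 16 times more often than independence predicts.
print(pmi(f_xy=8, f_x=16, f_y=32, n=1024))  # 4.0
```

A pair would then be retained when its score exceeds a chosen threshold; the threshold (and any minimum frequency, since the measure is unstable for rare pairs) is a design choice not stated in the abstract.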
Using a computerised corpus, the study of collocations in legal language will help enrich terminology databases for use by translators, specialist researchers (jurilinguists) and learners of French for specific purposes (FOS).
Keywords: corpus, collocations, co-occurrence, law, extraction

Building corpora for a contrastive study of resultative structures in English and their translation into French
Dijana Bojovic (1)
1 Bases, Corpus, Langage (BCL) – CNRS : UMR7320, Université Nice Sophia Antipolis (UNS) – Laboratoire BCL - UMR 6039 Université de Nice - Campus Saint-Jean d'Angely 3 24, avenue des Diables bleus 06357 Nice Cedex 4, France
The main purpose of this paper is to explain the procedures followed and the problems encountered in building corpora for our contrastive study of English resultative structures and their translation into French. Drawing on several corpora (British National Corpus, Corpus of Contemporary American English, Gutenberg, Gallica and FRANTEXT), the study relies on specific procedures, based on the known characteristics of the phenomenon, designed to extract data from general corpora. Semantically, resultative structures express both a dynamic process and the outcome of that process. A dynamic process lies at the heart of a first predicative relation, and the resulting state of affairs constitutes a second predicative relation. We are thus dealing with the fusion of two predicative relations, that is, a predicative relation and a co-predicative relation, and hence with a syntax different from that of embedding.
Since resultative structures (RS) are highly productive in English, our first goal was to draw up a typology of them while taking their limits into account, that is, stative verbs at one end of the spectrum and prototypical transitives at the other. The interaction between syntax and semantics is necessarily at stake, so we analyse the properties of transitive structures (He ate the plate clean), unergative intransitives (The child screamed itself hoarse) and unaccusative intransitives (The lake froze solid). The second classification is by type of resultative phrase: adjectival phrase (He hammered the metal flat), noun phrase (She dyed her pants a bright red), prepositional phrase (She smashed the vase to pieces), adverbial phrase (We decided to creep upstairs and see what happened). We are developing query protocols for existing English and French corpora in order to build a corpus of RS in English and one in French, and to study the problems posed by their translation from English into French. We are thus building a multi-part corpus: the first part contains English examples collected systematically from the BNC and COCA by building collocations and running searches with variations; the second is reserved for French translations of the structures found in the first part (Gallica, FRANTEXT, Gutenberg) and for observations of their characteristics; and the third contains existing RS in French. The aim of this contrastive research is to carry out two linguistic studies of RS, one on English and one on French, to find out where the divergences begin and why.
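As a rough illustration of the kind of query protocol described above, the sketch below scans a POS-tagged sentence for a verb followed by an object (optional determiner plus noun, or a pronoun) and an adjective, which covers the transitive and "fake reflexive" adjectival types. The coarse tag names and the pattern are assumptions, not the author's actual queries. Such a pattern overgenerates (depictives such as "ate the meat raw" also match), so hits still need manual checking, and it does not capture unaccusative cases such as The lake froze solid.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); the tag names are assumptions

# Hypothetical query: verb + object (optional determiner + noun, or pronoun) + adjective.
RESULTATIVE_AP = re.compile(r"\bVERB (?:DET )?(?:NOUN|PRON) ADJ\b")

def find_candidates(tokens: List[Token]) -> List[str]:
    """Return the word spans whose tag sequence matches RESULTATIVE_AP."""
    tags = " ".join(tag for _, tag in tokens)
    hits = []
    for m in RESULTATIVE_AP.finditer(tags):
        start = tags[:m.start()].count(" ")   # index of the first matched token
        length = m.group(0).count(" ") + 1    # number of matched tokens
        hits.append(" ".join(word for word, _ in tokens[start:start + length]))
    return hits

sentence = [("He", "PRON"), ("ate", "VERB"), ("the", "DET"),
            ("plate", "NOUN"), ("clean", "ADJ")]
print(find_candidates(sentence))  # ['ate the plate clean']
```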
The analysis of the translations, for its part, aims to systematise the solutions found, to seek their justification and to identify regularities that can support translators' reflection and autonomy, shed further light on these structures, which remain partly opaque and resist analysis, and, if possible, provide additional tools for computer-assisted translation. The conclusions of our research are therefore grounded in data attested in corpora, and testing our working hypotheses against the corpus serves a heuristic purpose.
Keywords: corpus, contrastive linguistics, resultative structures, syntax, translation, corpus linguistics

Corpora in the language classroom: the example of exemplification and reformulation markers
Cristelle Cavalla (1), Thi Thu Hoai Tran (2)
1 Didactique des langues, des textes et des cultures (DILTEC) – Université Paris III - Sorbonne nouvelle : EA2288, Université Sorbonne Paris Cité (USPC) – Maison de la Recherche, 4 rue des Irlandais, 75005 Paris, France
2 Grammatica – Université d'Artois : EA4521 – Université d'Artois Maison de la Recherche 9, rue du Temple - BP 10665 62030 ARRAS CEDEX, France
In this paper we describe an ongoing experiment with non-native students at level A2-B1 in an academic French course, based on a lexicon specific to scientific writing and a digital corpus. Methodologically, the aim is also to help them become familiar with the norms of this genre of academic writing, which sometimes differ from those of their home education systems.
In this work we focus particularly on academic discourse drawn from a 5-million-word corpus of scientific articles in the humanities and social sciences, accessible online through the ScienQuest interface[1]. The corpus is morphosyntactically tagged and semi-automatically annotated (Tran, 2014). Our main interest is cross-disciplinary scientific phraseology, or the cross-disciplinary scientific lexicon (Tutin, 2007), which is regarded as a "genre lexicon" that cuts across all disciplines, e.g. contredire une théorie ('contradict a theory'), objectif principal ('main objective'). We adopt a broad conception of phraseology (Legallois and Tutin, 2014) that includes discourse markers (DMs) such as à savoir, en résumé and dans le cadre de, which serve to structure discourse. We established a typology of 171 DMs divided into nine subgroups (Tran, 2014). To analyse these elements, we adopted the linguistic model of Paillard and Vu (2014), which focuses on the syntactic relation between the left and right contexts of an adverb or adverbial in order to identify its semantic values. The experiment targets exemplification and reformulation markers, since we had observed that they are over-represented in scientific writing (Tran et al., 2016). Pedagogically, the students work with short paragraphs extracted from the digital corpus. This experiment is seen as a first step in making non-native students aware of the role these phraseological elements play in structuring such texts. Our hypothesis is that this linguistic entry point will lead them to discover the norms of the academic writing genre.
References
Adam, J.-M. (1989).
"Aspects de la structuration du texte descriptif : les marqueurs d'énumération et de reformulation". Langue française, (81), 59-98.
Cavalla, C. & Loiseau, M. (2013). "Scientext comme corpus pour l'enseignement". In L'écrit scientifique : du lexique au discours. Autour de Scientext, Tutin, A. & Grossmann, F. (eds). Rennes: PUG, 163-180.
Legallois, D. & Tutin, A. (2013). "Présentation : Vers une extension du domaine de la phraséologie". In "Vers une extension du domaine de la phraséologie", Legallois, D. & Tutin, A. (eds), 1(189), 3-25.
Mangiante, J.-M. & Parpette, C. (2011). Le français sur objectif universitaire. Grenoble: Presses universitaires de Grenoble.
Paillard, D. & Vu, T.-N. (2012). Inventaire raisonné des marqueurs discursifs du français. Description. Comparaison. Didactique. Paris: AUF.
Tran, T.-T.-H., Tutin, A. & Cavalla, C. (2016). "Typologie des séquences lexicalisées à fonction discursive et aide à la rédaction scientifique". Cahiers de lexicologie, 108(1), 161-180.
Tran, T.-T.-H. (2014). "Développement d'une aide à l'écrit scientifique. Description de la phraséologie scientifique et réflexion didactique pour l'enseignement à des étudiants non natifs". Doctoral thesis in Language Sciences, specialisation French as a Foreign Language, Université Grenoble Alpes.
Tutin, A. (2007). Lexique et écrits scientifiques. Revue Française de Linguistique Appliquée, Vol. XII-2.
[1] URL: http://corpora.aiakide.net/scientext18/
Keywords: phraseology, French as a foreign language (FLE)

Development of a Tatar-Russian Socio-Political Dictionary of Collocations Based on Corpus Data
Olga Nevzorova (1)
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia
The Tatar-Russian Socio-Political Dictionary of Collocations is based on data from the Corpus of Written Tatar (http://corpus.tatar/en), the Tatar National Corpus (http://corpus.antat.ru) and comparable socio-political corpora. It is a collocation dictionary containing more than 3,000 collocations. Compiling the Dictionary involved the following stages. First, we developed comparable thematic socio-political corpora of Tatar and Russian. Next, we automatically generated from these comparable corpora a frequency list of relevant terms (single-word terms as potential headwords). Then, using the software of the Corpus of Written Tatar, we obtained a frequency list of collocations for each frequent term. The cut-off thresholds for these collocation lists were based on the frequency of the linguistic items in the Corpus and were determined empirically. When selecting collocations, we considered the syntactic structure of each collocation and the morphological parameters of its constituents. We also took into account regular grammatical (non-inflectional) variants of word combinations. For example, Turkic languages have the regular synonymous models ADJ + N and N + N (POSS 3): iqtisadi cinayät (ADJ + N) and iqtisad cinayäte (N + N, POSS 3), both 'economic crime'. Such regular grammatical variants of a collocation are treated as the same nominative item. The main unit in the Dictionary is a noun phrase formed by filling one of the possible semantic-syntactic positions of the word and meeting the criterion of semantic completeness. Quantitatively, such an item may consist of two or more notional words.
In the current version of the Dictionary, most collocations consist of two notional components. The compiled Dictionary makes it possible 1) to represent the actual use and collocability of words of the socio-political domain in Tatar; 2) to build typical grammatical models of collocations of these items; 3) to trace new items (words and collocations) in modern Tatar. The reported study was funded by the Russian Science Foundation under research project 16-18-02074.
Keywords: the Tatar language, collocations, Dictionary of collocations, socio-political terminology, corpora.
References
1. Bahns, J. (1993). Lexical collocations: a contrastive view. ELT Journal, 47(1), 56-63.
2. Benson, M. (1990). Collocations and general-purpose dictionaries. International Journal of Lexicography, 3(1), 23-34.
3. Benson, M. (1989). The structure of the collocational dictionary. International Journal of Lexicography, 2(1), 1-14.
4. Carter, R. (2012). Vocabulary: Applied linguistic perspectives. Routledge.
5. Conrad, S. (2002). Corpus linguistic approaches for discourse analysis. Annual Review of Applied Linguistics, 22, 75-95.
6. Corpus of Written Tatar. URL: http://corpus.tatfolk.ru/index en.php.
7. Kübler, N., & Pecman, M. (2012). The ARTE bilingual LSP dictionary: From collocation to higher order phraseology.
8. Kennedy, G. (2014). An introduction to corpus linguistics. Routledge.
9. Ramos, M. A., Nishikawa, A., & Vincze, O. (2010, June). DiCE in the web: An online Spanish collocation dictionary. In E-lexicography in the 21st century: New challenges, new applications: proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October 2009 (pp. 369-374).
10. Reppen, R., & Biber, D. (Eds.). (2012). Corpus linguistics (pp. 1988-1988). SAGE.
11. Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell Publishers.
12. Suleymanov D., Nevzorova O., Gatiatullin A., Gilmullin R., Khakimov B. (2013).
National corpus of the Tatar language "Tugan Tel": Grammatical Annotation and Implementation. In Procedia - Social and Behavioral Sciences 2013, pp. 68-74.
13. Tatar National Corpus. URL: http://corpus.antat.
Keywords: socio-political dictionary, Tatar language, collocation

Development of an annotation system for multiword constructions for the Tatar National Corpus
Dzhavdet Suleymanov (1)
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia
The Tatar National Corpus (TNC, http://corpus.antat.ru) is a linguistic resource of modern Tatar containing 100,000,000 tokens. The texts in the Corpus carry grammatical mark-up, so its search system supports searching for lexemes, word forms and individual grammatical parameters, as well as for stop words and parts of words, and searching with logical formulae. The TNC morphological analyser currently uses a tagset for morphological categories within a word form. Since Tatar has a complex agglutinative morphology, the analysis isolates the word stem, determines its part of speech and describes the chain of inflectional affixes of the word form. This system of grammatical annotation is now being supplemented with tags for compound constructions. In Turkic languages, many lexical items and grammatical categories are expressed by multiword units (for example, modality is as a rule conveyed not lexically but by special constructions expressing obligation, possibility or desire). In the current version of the grammatical mark-up, compound word forms and multiword constructions can be retrieved only through sophisticated queries, because extracting a multiword construction requires describing the parameters of two or more linguistic units at a predetermined distance from each other.
Such queries are therefore cumbersome and time-consuming, and the user must be experienced in composing complex queries. The grammatical annotation system is now being enriched with new tags for compound (analytical) forms and constructions, which makes it possible to distinguish between multiword lexical items, forms and constructions. Special rules for retrieving such units have been developed based on their structure, the order of their components, and whether other elements can be inserted between them. In particular, two-component verbal constructions have the following standard structure: the first component has a required form (a given affix or set of affixes) and is grammatically invariable, while the second can take all the inflectional and derivational affixes admissible for verbs. Compound verbs that are semantically equivalent to a single lexeme consist of an invariant first component (the stem) and an inflected second (auxiliary) component. For example, the verb yärdäm itü 'to help' has various realisations in actual use: yärdäm ittelär 'they helped', yärdäm itmäsme 'will he help?', yärdäm itüce 'that he helps', etc. In actual use such verbs can combine into compound multiword constructions, and postpositional particles may be inserted between the components.

Existing Tatar grammars describe the structure of multiword constructions only superficially and cover just a small number of them, whereas corpus technology offers an exhaustive list of such units. We have so far drawn up sets of rules for retrieving compound verbs semantically equivalent to a lexeme, rules for retrieving their tenses, and rules for constructions composed of phase and modal verbs. We have also developed query formats for retrieving the corresponding data and devised special tags to mark up the various types of multiword constructions.
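The retrieval rule for two-component compound verbs (invariant first component, inflected auxiliary, optional particles in between) can be sketched as a pattern over tagged tokens. This is a minimal sketch under stated assumptions: the `Token` structure, the tag names `V` and `PCL` (particle), the lemma `itü` for the auxiliary and the `max_gap` limit are all hypothetical and do not reproduce the TNC tagset or its query language.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str   # surface form as found in the text
    lemma: str  # citation form
    pos: str    # part-of-speech tag (hypothetical tag names)


def find_compound_verbs(tokens: List[Token], stem: str, aux_lemma: str,
                        max_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (stem_index, aux_index) pairs where the invariant first component
    `stem` is followed, with at most `max_gap` inserted particles, by any
    inflected form of the auxiliary verb `aux_lemma`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.form != stem:  # first component must keep its fixed form
            continue
        j, gap = i + 1, 0
        # skip inserted particles, up to the permitted distance
        while j < len(tokens) and tokens[j].pos == "PCL" and gap < max_gap:
            j += 1
            gap += 1
        # second component: any inflected form of the auxiliary (lemma match)
        if j < len(tokens) and tokens[j].pos == "V" and tokens[j].lemma == aux_lemma:
            hits.append((i, j))
    return hits


sentence = [Token("yärdäm", "yärdäm", "N"), Token("ittelär", "itü", "V")]
print(find_compound_verbs(sentence, "yärdäm", "itü"))  # [(0, 1)]
```

Matching the auxiliary by lemma rather than by surface form is what lets one rule cover ittelär, itmäsme, itüce and any other inflected realisation.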
The annotation system is mainly built on the tags of the Leipzig Glossing Rules and on those of the verb database developed by V. Plungian (http://www.mccme.ru/ling/verbum.htm). The reported study was funded by RFBR under research project 15-07-09214.

Keywords: the Tatar language, corpus, multiword construction, corpus annotation

A corpus-based Spanish-Chinese dictionary of medical terminology
Antonio Moreno-Sandoval 1, Yuanyi Liu 2
1 Universidad Autónoma de Madrid (UAM), Departamento de Lingüística y Lenguas Modernas, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain
2 Universidad Autónoma de Madrid (UAM), Laboratorio de Lingüística Informática, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain

Spanish-Chinese and Chinese-Spanish specialised dictionaries are still scarce and lack variety. In medical terminology specifically, there is only one bilingual dictionary, the Diccionario de medicina chino-español published by the Foreign Languages Press in Beijing. It appeared in 2005, so it does not include the terms introduced over the last ten years and needs updating. Moreover, it is not corpus-based and provides no examples illustrating meaning in actual use. This is therefore a field in which research can clearly be expanded. Our project is compiling a corpus-based Spanish-Chinese bilingual dictionary specialised in medicine. Specifically, we use MultiMedica (Moreno and Campillos 2013), a corpus compiled and developed by the Computational Linguistics Laboratory of the Universidad Autónoma de Madrid (LLI-UAM), and Sketch Engine, one of the most advanced search systems for helping lexicographers find good usage examples for their dictionaries (Kilgarriff et al. 2008).
The aim of the project is, first, to produce a specialised bilingual dictionary in electronic format, and then to describe both the translation and the technical problems involved in compiling it and to carry out a comparative study of medical terminology in both languages. The ultimate aim of this line of research is to explore, by applying corpus technology to lexicography, a scientific methodology for compiling Spanish-Chinese specialised dictionaries that can be reproduced in other specific fields, such as economic, commercial or legal terminology, and that at the same time contributes to the development of specialised translation and to the training of high-level translators and interpreters. This paper focuses on the methodology employed:
1. Defining the macrostructure and microstructure of the Spanish-Chinese Dictionary of Medical Terminology: we selected the 5,000 most frequent terms extracted from the LLI's MultiMedica corpus as the main entries of the dictionary and decided to record for each of them the normalised frequency, international medical codes (CUI, MeSH), the English equivalent, the Mandarin Chinese equivalent, the equivalent term in traditional Chinese medicine, the romanised Chinese form (to help Spanish speakers), synonyms, abbreviations and notes.
2. Compiling the Dictionary, which mainly consists of translating the 5,000 Spanish terms into Chinese. To obtain the most suitable and precise equivalents, we used the DTM, the monolingual dictionary of the Real Academia Española de Medicina, the Diccionario Médico Bilingüe Inglés-Chino (English-Chinese bilingual medical dictionary), several corpora of parallel texts, and bilingual encyclopaedias produced by official health institutions.
3. Incorporating collocations: we added the collocations of the 5,000 terms in the MultiMedica corpus (the 10 most frequent multiword units for each) as new entries, with their respective English and Chinese equivalents.
4. Selecting examples: ours is a usage dictionary rather than a glossary. In cases of ambiguity, we give real examples from the MultiMedica corpus for each equivalent, together with their Chinese translation, so that users can better distinguish between the different equivalents of the same term.
5. Building the electronic dictionary with the TshwaneLex software.
Two entries from the dictionary (a simple one and a compound one) are attached in the file.

Keywords: medical terminology, Spanish, Chinese, corpus-based lexicography, MultiMedica corpus

Expressing novelty through words: neologisms revealing new societal trends in France
Najet Boutmgharine Idyassner
Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot: EA3967, Université Paris Diderot, Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France

Every language is able to take in new words, but lexical creativity is above all a process that reflects societal change. In French, the processes of lexical creation are varied (Sablayrolles 2006) and favour the emergence of new names, commonly called neologisms. It was noticed very early that lexical creation is the clearest trace of social transformation, Du Bellay summing up this relationship in the formula "aux nouvelles choses être nécessaire imposer nouveaux mots" (new things make it necessary to impose new words). The cause-and-effect relationship has two stages: if a new thing is created, a name must follow.
When this twofold operation is multiplied, it shapes the evolution of a language's lexicon: there is then more scope for naming new realities. By following the changes a given language undergoes, one can therefore trace the evolution of the society in which it is used. The interest of neology lies largely in this principle, as neologists generally agree: "Neology reflects the progress of a language just as much as the evolution of a society. [...] Language is dated, and it is neologisms that are its most striking measurable elements." (Pruvost and Sablayrolles, 2016: 10). Advances in natural language processing now make it possible to follow these changes. We present research on neologisms that reflect the changes currently taking place in French society. This work is part of the project "Neoveille, repérage, analyse et suivi des néologismes en sept langues" (Neoveille: detection, analysis and tracking of neologisms in seven languages) (Cartier, 2016). The Neoveille platform is the result of a scientific project funded by COMUE Sorbonne Paris Cité, with international partners. It consists of a set of modules for detecting, analysing and tracking neologisms in a journalistic corpus that is updated daily. The list of neologisms retained by the platform's detection system immediately shows that neologisms reflect the arrival of new social practices. Borrowings from English in particular play this role: the workplace (co-working, workventurer), leisure (mermaiding, binge-viewing) and many other social spheres are being shaken up by numerous trends, often imported from the English-speaking world.
Some of these societal borrowings designate practices promoted by social networks (mannequin challenge), revealing new forms of criminal behaviour (trainsurfing) or reprehensible behaviour (bodyshaming), but sometimes also signalling new, commendable forms of social action (book crossing, clickfunding). Likewise, tracking neologisms in corpora whose dia-parameters vary (diatopic, diastratic and diaphasic variation; cf. Coseriu, 1988) shows, in particular, the most influential sociolects in the French sphere and the diatopic variation at work today.

Keywords: neologisms, lexical creation, borrowing, anglicism, societal neologisms

Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?
Jesús Romero-Barranco
Universidad de Málaga (UMA), Campus de Teatinos, 29071 Málaga, Spain

Complexity was first defined by Simon as hierarchies of different elements originating from simplicity (1962: 468). In linguistics, Givón (2009) has analysed syntactic complexity from the point of view of language typology; Dahl (2004) and Nichols (2009) have assessed grammatical complexity cross-linguistically; and Blankenship (1974), Chafe (1982) and Maas (2009) have studied the different levels of complexity in spoken and written registers. Furthermore, Lehto (2015) carried out a diachronic analysis of the levels of complexity across different text types in early Modern English legal material, based on Biber's work on linguistic complexity. Biber (1992) identified key linguistic features associated with reduced complexity (e.g. that-deletion, contractions or clause coordination, among others) and with increased complexity (e.g. nominalizations, phrasal coordination or passive constructions, among others).
These features occur in different patterns across registers, and calculating their frequencies makes it possible to assess the level of complexity of different kinds of text. The concept of complexity has not so far been evaluated in early English medical writing, especially with regard to its different text types. In light of this, the present paper analyses the levels of linguistic complexity in two early Modern English medical treatises held in Glasgow, Glasgow University Library, MS Hunter 135: a surgical treatise (ff. 34r-73v) and a recipe collection (ff. 74r-121v). These two treatises are ideal input for this study because they represent two text types of medical writing and therefore allow a comparison in terms of linguistic complexity. According to Pahta and Taavitsainen (2004), theoretical treatises were the most formal text type while remedybooks represented popular medical knowledge, with surgical treatises falling in between. The analysis therefore sheds light on the differences between two branches of medical writing in early Modern English. The study has the following objectives: a) to identify the complexity features present in these two witnesses; and b) to analyse the different levels of complexity in both text types. To this end, the linguistic features identified by Biber (1992) will be retrieved and their frequencies calculated. Textual organisation will also be analysed, since it contributes to the level of complexity of a text. Methodologically, the texts have been transcribed following semi-diplomatic conventions so that editorial intervention is kept to a minimum. After transcription, the texts were POS-tagged so that automatic searches could be carried out with a conventional concordancer.
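Because the two treatises differ in length, feature counts are only comparable once normalised, conventionally per 1,000 words. The following is a minimal sketch assuming `word_TAG` input with Penn-Treebank-style tags (`NN*`, `VBN`); the two feature definitions are simplified, hypothetical proxies rather than Biber's (1992) own operationalisations, and early Modern English spellings such as "-cion" would escape the suffix test as written.

```python
from typing import Dict, List, Tuple

# Simplified proxies (assumptions) for two of the features Biber (1992) counts;
# his own operationalisations are more detailed.
NOMINAL_SUFFIXES = ("tion", "ment", "ness", "ity")
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def parse(tagged: str) -> List[Tuple[str, str]]:
    """Split 'word_TAG word_TAG ...' into lower-cased (word, tag) pairs."""
    pairs = []
    for item in tagged.split():
        word, _, tag = item.rpartition("_")
        pairs.append((word.lower(), tag))
    return pairs


def feature_rates(tagged: str, per: int = 1000) -> Dict[str, float]:
    """Occurrences of each feature per `per` tokens."""
    tokens = parse(tagged)
    counts = {"nominalization": 0, "passive": 0}
    for i, (word, tag) in enumerate(tokens):
        # nominalization proxy: a noun ending in a derivational suffix
        if tag.startswith("NN") and word.endswith(NOMINAL_SUFFIXES):
            counts["nominalization"] += 1
        # passive proxy: a form of BE immediately followed by a past participle
        if word in BE_FORMS and i + 1 < len(tokens) and tokens[i + 1][1] == "VBN":
            counts["passive"] += 1
    if not tokens:
        return {k: 0.0 for k in counts}
    # normalising makes texts of different lengths comparable
    return {k: v * per / len(tokens) for k, v in counts.items()}
```

For example, `feature_rates("the_DT operation_NN was_VBD performed_VBN with_IN care_NN")` finds one nominalization and one passive in six tokens, i.e. about 166.7 of each per 1,000 tokens. Running the same function over the surgical treatise and the recipe collection yields directly comparable rates.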
These texts are part of The Málaga Corpus of Early Modern English Scientific Prose (available at http://modernmss.uma.es), a corpus that aims to provide a sample of ca. 1,000,000 POS-tagged words of early Modern English scientific prose.

Keywords: linguistic complexity, early English medical writing, surgical treatises, medical remedybooks

The corpus of digital sources as a tool for discourse grammar
Víctor Pérez Béjar, María Soledad Padilla Herrada
Universidad de Sevilla (US), Spain

Our starting point is the value of digital sources for linguistic research. There is general agreement on the need to work with corpora, which entails an empirical study of real data and thus legitimises the conclusions drawn. Although this kind of work is common in lexical studies, it is advisable and, in our view, essential in syntax. For this reason, within the MEsA project (Macrosintaxis del Español Actual; reference: FFI2013-43205P) we are compiling a corpus of texts from digital sources. It consists of discourse samples taken from blogs and forums on various topics, posts and comments from public Facebook pages, tweets, transcriptions of YouTube videos and collections of their comments, as well as private conversations from the WhatsApp application. It is still under construction. Our goal is to obtain linguistic material from one of the most widespread means of communication today: social networks and the environments integrated into Web 2.0. This is a hybrid communicative environment on the oral-written and colloquial-formal continuum.
Its advantages include the large number of text samples available, examples that are easy to interpret without the difficulties of reading an oral transcription, and the possibility of retrieving the full context of each sample. Among its problems, intonational elements (often essential for interpreting utterances) cannot always be recovered, since spelling does not faithfully reflect prosody. Within the project, this corpus will help us detect syntactic patterns that are spreading in colloquial discourse and for which we rarely have data. This interests us because in some cases it can lead to the fixation of discourse operators or markers. All of these are dominated by intersubjectivity (Company 2004; Traugott 2004), one of the driving forces in the evolution of these linguistic elements. In this presentation we focus on expressions that break out of traditional syntactic moulds and do not fit the sentence pattern. This group includes phraseological units in a broad sense: structures with total lexical fixation (proverbs, set phrases...), constructions whose fixation lies in the combination of their elements (such as insubordinate constructions), and other linguistic expressions that are not yet fully fossilised. These units are approached from a pragmagrammatical perspective (Fuentes 2015), which describes syntactic units beyond the sentence according to their actual use and their function in discourse.
This perspective is developed through a multidimensional analysis that takes into account macrostructure, microstructure and text type, and that includes the different fields in which the speaker's position is inserted: enunciative, modal, informational and argumentative structure. We therefore argue that this corpus is valuable for studying colloquial phenomena. By presenting samples of units with greater or lesser degrees of fixation extracted from this corpus, we aim to show that they are reliable data for this field and that the corpus is an effective tool for research in discourse grammar.

Keywords: pragmagrammar, discourse syntax, digital discourse

Disagreement through echoic interrogation
María Valentina Barrio, Milka Villayandre
Universidad de León, Spain

Spanish has a set of pragmatic phraseological syntactic schemas (Zamora Muñoz, 2003) of interrogative form that repeat, wholly or partly, a previous utterance by another speaker and whose discourse function is to express disagreement through that repetition. Some examples:
(1) A: - A ti, Ana, te toca fregar los platos. B: - ¿A mí, fregar, de qué? No pienso hacerlo. ('A: It's your turn to do the dishes, Ana. B: Me, do the dishes, how come? I'm not doing it.')
(2) A: - ¿Sabes cuándo vuelve Pili de las vacaciones? B: - ¿Yo qué voy a saber? ('A: Do you know when Pili is back from holiday? B: How would I know?')
(3) A: - ¿No tomas el desayuno con nosotros? B: - ¿Qué desayuno ni qué leches? Sigo sin olvidar lo que me habéis hecho. ('A: Aren't you having breakfast with us? B: Breakfast my foot! I still haven't forgotten what you did to me.')
(4) A: - Si madrugaras más, tendrías más tiempo para organizarte. B: - ¿Yo, madrugar? Lo siento, me lo prohíbe mi religión. ('A: If you got up earlier, you'd have more time to get organised. B: Me, get up early? Sorry, my religion forbids it.')
(5) A: - A ver, que el español no necesita promoción. B: - ¿Cómo que el español no necesita promoción? ('A: Look, Spanish doesn't need promoting. B: What do you mean, Spanish doesn't need promoting?')
This study has two main objectives.
First, we systematise the interrogative phraseological schemas in Spanish that express disagreement and have the characteristics described above, in order to define the elements that make up their fixed part and those that can fill their free slots. Second, we analyse the microdiscourse formed by the interrogative schema together with its stimulus (the utterance it repeats) in order to describe the pragmatic functions of these units and the relations they enter into within the conversation. Two issues receive particular emphasis. On the one hand, the study of repetition and the components it affects, that is, the content of the utterance, the act of enunciation, the interlocutors... On the other, the units on which the disagreement falls and the pragmatic assumptions on which it rests, whether explicit or requiring an inferential interpretive process.

As for the framework, we follow functionalist macrosyntax (Gutiérrez Ordóñez, 2016), which goes beyond the limits of the utterance into microdiscourse, that is, the combination of utterances in discourse, in order to observe their constituents and the network of relations and functions among them. Methodologically, we start from a qualitative analysis of these phraseological schemas in several oral corpora of conversational Spanish, where they occur naturally owing to their echoic nature. Their incidence is then compared against more general corpora of Spanish.
These corpora are: the Corpus del Español del Siglo XXI (CORPES XXI), the Corpus del español web/dialectos, Sketch Engine, the Corpus Oral Didáctico Anotado Lingüísticamente (CORDIAL), the colloquial conversation corpus of the Val.Es.Co. group, the Corpus Oral Juvenil del Español de Mallorca (COJEM) and the corpus of the applied linguistics research group (COGILA). The results are expected to capture some of the main characteristics of these schemas. In conversation, they always act as dispreferred responses, since they can never be first turns. Expressing disagreement in this way simultaneously breaks the expected continuation of the discourse and signals the presence of several enunciators within the same intervention.

Keywords: disagreement, interrogative structures, repetition, conversation analysis, corpus linguistics, macrosyntax

Legal language and the language of biomedical engineering seen through corpus methodology
Eleonora Lozano Bachioqui 1, Allen Andrade Navarro 2
1 Universidad Autónoma de Baja California, Facultad de Idiomas (UABC), Av. Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico
2 Universidad Autónoma de Baja California, Facultad de Idiomas (UABC), Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico

This paper focuses on two specialised languages: legal language and the language of biomedical engineering. It examines legal language from a phraseological perspective and the language of biomedical engineering from a terminological one. To this end, two specialised monolingual Spanish corpora were built following corpus methodology (McEnery and Hardie, 2012) and analysed with corpus management tools.
Foundational work in corpus linguistics, such as Sinclair (1970) and Stubbs (2001), was taken into account. The first corpus, a corpus for specific purposes (Maia, 2002), contains 73,214 words and 5,751 types from documents of Mexican civil law, such as birth and marriage certificates, judgments, wills and contracts, among others. It was analysed with lexical processing software, WordSmith Tools (Scott, 2014), which generated a list of 558 keywords. From these, 60 key verbs with a frequency of 10 were selected, and their collocations and formulaic sequences were studied using the mutual information (MI) index, drawing on foundational work such as Corpas Pastor (2003) and Koike (2001). For example, the verb celebrar ('to enter into', as in a contract) forms simple lexical collocations such as celebrar + contrato and celebrar + convenio (verb + object noun), as well as celebrar + a + (el) tenor (verb + preposition + noun). It also occurs in formulaic sequences such as es su libre voluntad celebrar y obligarse ('it is their free will to enter into and be bound by'). The second corpus contains 394,351 words and 23,965 types from scientific texts in biomedical engineering, taken from prestigious Latin American electronic journals. Like the first, it was analysed with lexical processing software, AntConc (Anthony, 2014), which generated a keyword list; the keywords with a frequency of 45 and a keyness index of 107 were retained, and their collocations were identified using the log-likelihood index. This part of the work draws on authors such as Cabré (2007) and Faber (2010).
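Both association measures mentioned above can be computed from four counts: the frequency of the word pair, the frequencies of the two words, and the corpus size. The sketch below gives the common textbook formulations (pointwise mutual information in bits, and Dunning's log-likelihood G² over a 2×2 contingency table). WordSmith Tools and AntConc may differ in details such as collocation windows or logarithm base, so this is not a reproduction of either program, and the function names are illustrative.

```python
import math


def mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pointwise MI in bits: log2(observed / expected pair frequency)."""
    expected = f_x * f_y / n
    return math.log2(f_xy / expected)


def log_likelihood(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Dunning's G2 for the 2x2 table (x present/absent by y present/absent)."""
    observed = [
        [f_xy, f_x - f_xy],
        [f_y - f_xy, n - f_x - f_y + f_xy],
    ]
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing (0 * log 0 taken as 0)
                e = rows[i] * cols[j] / n
                g2 += o * math.log(o / e)
    return 2 * g2
```

With invented counts, a pair seen 10 times where 1 co-occurrence is expected by chance (`mutual_information(10, 100, 100, 10_000)`) scores log2(10), about 3.32 bits. MI rewards rare but exclusive pairs, whereas G² grows with the amount of evidence, which is one reason studies often report one or the other depending on corpus size.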
Examples of the collocations found in this second corpus are tejido + óseo ('bone tissue'), matriz + extracelular ('extracellular matrix'), presión + arterial ('blood pressure'), alto + riesgo ('high risk') and baja + densidad ('low density') (noun + adjective), as well as reacción + difusión ('reaction-diffusion') (noun + noun). The results of this study offer a corpus-linguistic approach to these two specialised languages and help translators, as well as teachers of languages for specific purposes, to solve linguistic problems related to the lexical, terminological and even phraseological structure of specialised languages, in this case legal and technical language.

Keywords: specialised languages, corpus linguistics, collocations, translation, language teaching

A comparative study of the English, French and Spanish translation of the linguistic and paralinguistic aspects of comics, based on a multimodal corpus of horror comics
María del Carmen Baena Lupiáñez
Universidad de Málaga, Spain

Drawing on the stereotypes that society attaches to certain gestures and expressions, literary works have used them to make their characters more expressive. Today there are comics with veritable philosophical essays in their speech balloons, and comics consisting only of images with no text at all. Sometimes the text merely complements what the reader is about to see in the panels. Text and image cannot do without each other, since they complement one another. In translating comics, the translator must take this complementarity into account so that the target text is coherent and cohesive.
In comics translation, therefore, both textual and paratextual elements must be considered, since they are inseparable. The translator must not only read the text but also interpret the accompanying image and apply the appropriate techniques, adapting text and image to the target culture where necessary. This suggests that comics translation is a type of specialised translation, with its own codes and translation strategies. However, despite the importance of interpreting paralinguistic aspects correctly, Translation Studies have rarely addressed this topic directly. The comics genre and comics translation used to be studied separately: there are studies on the analysis of comics (T. Groensteen, 2009, 2013), on the specific characteristics of the genre (Gubern and Gasca, 1988) and on its semiotic aspects (N. Celotti, 2008), and, on the other hand, studies on the importance of the image in comics translation (Kaindl, 2004; Zanettin, 2008). Today the concept of "paratranslation" is the one best suited to comics translation (José Yuste Frías, 2015). Authors such as Zanettin have studied both corpus linguistics and comics, and have pointed out that the two can be related, since translators can build text corpora to translate comics into the target language and culture more effectively and efficiently (2002). In view of the above, the main aim of this study is to establish classifications of the paralinguistic elements (gestures, facial expressions and symbolic language) that appear in comics, taking English, French and Spanish culture into account.
To this end, six horror comics have been selected. Paralinguistic elements are very prominent in this kind of work, since these comics contain a multitude of symbolic elements and their characters are especially expressive, which will make it possible to build a large multimodal corpus.

Keywords: corpus linguistics, multimodal corpus, comics, horror comics, paralinguistic elements

A comparative study of usage labels in current lexicographic works
Estrella Calvo-Rubio Jiménez
Universidad de Sevilla, C/ S. Fernando, 4, 41004 Sevilla, Spain

Lexicographic works have always recorded usage labels to a greater or lesser extent. Over the history of lexicography, however, this labelling has changed. Indeed, in recent years the Real Academia Española has introduced new usage labels and has occasionally replaced one label with another. The Academy's Diccionario de la Lengua Española has always been a reference point in the Hispanic lexicographic world, and studies on it are, naturally, abundant and varied. Nevertheless, the variability of usage labels in its latest editions is striking. This variability, or lack of precision in assigning usage labels, is not exclusive to the Academy's dictionary, however. Lexicographers agree that it is clearly difficult to establish a criterion for deciding when a word or sense belongs to a particular register or style. Hence lexicographic works differ from one another in their usage labels.
This research compares the usage labels in several current lexicographic works, namely the Real Academia Española's Diccionario de la Lengua Española (2014), the CLAVE dictionary (2012), the Diccionario del español actual (2011), María Moliner's usage dictionary (2008) and the LEMA dictionary of the Spanish language (2001), in order to show how labelling differs from one work to another. To this end, I compiled a corpus of the words and senses marked as diaphasic or diastratic in these five works. Through the observation and study of this corpus, I focus on the differences between the works in their usage labels, paying particular attention to words and senses labelled vulgar, malsonante (offensive) and coloquial (colloquial). The five works turn out to diverge considerably in this labelling. For example, LEMA clearly departs from the others in not labelling any word or sense as vulgar, and María Moliner does not use the label malsonante, which was introduced in the Academy's dictionary in 2001 and appears in the other works. This raises the question of which criteria lexicographers follow when assigning usage labels, and which are most appropriate in each case.
Keywords: lexicography, usage labels, dictionaries

A contrastive corpus study to identify the diachronic features of Catalan normative discourse: a study of the Statutes of Autonomy of 1932, 1979 and 2006
Albert Morales Moreno
Universitat Pompeu Fabra / Università Ca' Foscari Venezia (UPF / UCFV), Spain

The legislative process of drafting and approving the 2006 Statute of Autonomy of Catalonia (EAC 2006), and its exhaustive study in Morales (2015), pointed to the need for a comparative diachronic[1] study of the different Statutes of Autonomy that Catalonia has had throughout its history: those of 1932, 1979 and 2006. As in other traditions and countries, negotiating each of these normative projects was a considerable challenge in its historical moment, both legally and politically, as shown by Balcells (2010) for the 1919 autonomy project, Aymamí (1932) and Abelló (2007) for that of 1932, and Sobrequés (2010) for the EAC of 1979. Each of these Statutes should be read as a repeated claim to self-government, both within the current constitutional framework and within earlier frameworks of coexistence. This set of documents constitutes what André Salem calls a "chronological textual series" (Salem 1994: 313). These texts, situated halfway between specialised legislative discourse and political discourse (Thornton 1987; Chilton 2004), belong to a text genre, normative discourse, that has been little studied from the perspective of discourse analysis (DA) (Fernández Lagunilla 1999a, 1999b; Bassols 2007), since research has mostly characterised other genres related to political activity, especially parliamentary debate (Ribas Bisbal 2000; Cuenca 2014).
We will carry out a diachronic study that takes as reference three bodies of work. The first is the literature on legislative drafting in Catalan (e.g. GRETEL 1986, 1995; Duarte 1993; SAL 2014). The second is the methodology of other lexicometric studies of Catalan normative discourse (Morales 2010, 2015). The third is the contrastive study of the Spanish constitutions of 1812, 1931 and 1978 (Démol 2013). Our methodology is lexicometric: the units of analysis are selected on statistical criteria. To process the corpus we will use one of the most widely used lexicometric tools (Lexico3, Iramuteq, TXM or Hyperbase). We will first describe the main characteristics of the corpus (vocabulary growth, correspondence analysis, repeated segments, etc.) and will then focus on two analyses. (1) Specificity analysis, an index widely used in the lexicometric tradition, will identify the lexical units that change over the period studied (1932-2006). This index identifies the forms that are statistically under-used or over-used given the size of each subcorpus (each EAC) and of the corpus as a whole. (2) Repeated-segment analysis will identify the phraseological units that characterize Catalan normative discourse and how they evolve over time.
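In the French lexicometric tradition, following Lafon, the specificity index mentioned above is usually described as a hypergeometric model. Take a corpus of T tokens, a subcorpus (here, one EAC) of t tokens, and a form with total frequency f. The model asks how improbable the observed subcorpus frequency k is. The Python sketch below illustrates this idea under that assumption. It is not the implementation used by Lexico3, TXM or the other tools named. The function names are hypothetical, and the score convention (a signed -log10 of the tail probability) is one common choice among several.

```python
from math import exp, lgamma, log

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_pmf(k, T, t, f):
    """Log of the hypergeometric probability P(X = k): k occurrences of a form
    with total frequency f in a subcorpus of t tokens out of T tokens."""
    return log_comb(f, k) + log_comb(T - f, t - k) - log_comb(T, t)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def specificity(k, T, t, f):
    """Signed specificity score (hypothetical convention).

    Positive = over-use, -log10 P(X >= k); negative = under-use, log10 P(X <= k).
    Assumes k is a possible observation, i.e. max(0, t - (T - f)) <= k <= min(f, t).
    """
    lo, hi = max(0, t - (T - f)), min(f, t)
    if k >= f * t / T:  # at or above the expected frequency: upper tail
        tail = [log_pmf(i, T, t, f) for i in range(k, hi + 1)]
        return -log_sum_exp(tail) / log(10)
    tail = [log_pmf(i, T, t, f) for i in range(lo, k + 1)]  # lower tail
    return log_sum_exp(tail) / log(10)
```

For example, `specificity(k=40, T=60_000, t=15_000, f=90)` scores a form observed 40 times in one statute where about 22.5 occurrences would be expected, so the result is positive (over-use). The figures in this call are invented. Working in log space avoids the floating-point underflow that exact probabilities hit on corpus-sized counts.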
Our research therefore analyses the corpus lexicometrically. It identifies the forms that characterize each version of the EAC, positively and negatively, together with the most recurrent phraseological units. The goal is to lay the first foundations for a diachronic description of how the vocabulary and phraseology of Catalan normative discourse have evolved. This research is part of a project funded by the Instituto de Estudios del Autogobierno for the first half of 2017.
Keywords: normative discourse, lexicometry, corpus linguistics, diachronic study

A study of the applicability of Zipf's law and Heaps' law to corpora of learners of English
Nicolas Ballier, Paula Lissón
Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot: EA3967, Université Paris Diderot, Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France

This paper examines whether the Zipf-Mandelbrot law (Zipf, 1949; Mandelbrot, 1953) and Heaps' law (1978) apply to learner corpora. To this end, we compare vocabulary growth curves in texts written by native speakers of English with those in texts written by learners of English. The Zipf-Mandelbrot law states that, in a given text, a word's frequency is approximately inversely proportional to a power of its (shifted) frequency rank. As a result, a text consists of a few very frequent words and many infrequent ones.
In a recent study, Bentz and Buttery (2014) show two things: (a) the Zipf-Mandelbrot law can be used to measure lexical diversity, and (b) not all languages follow the Zipf-Mandelbrot law in the same way. We hypothesize that learners of English do not follow the Zipf-Mandelbrot law exactly, and that their vocabulary growth curve differs from that of native speakers. This could help classify learners by level. Heaps' law (1978), which complements Zipf's law, states that the vocabulary size of a text is a function of the text's length. As the text grows, the vocabulary keeps growing, but no longer linearly: the more words have already been used, the lower the probability that a new word will appear. We hypothesize that learners show more limited vocabulary growth, so that they produce fewer hapax legomena than Heaps' law predicts (approximately the square root of the total number of tokens). To test these hypotheses, we will examine how well the Zipf-Mandelbrot law and Heaps' law fit a written corpus of Spanish-speaking learners of English, NOCE (Díaz-Negrillo, 2007). We will compare the results with those for a corpus of writing by native speakers of English, LOCNESS (Paquot, 2015). In this way we will assess the validity of these laws and show how native and non-native speakers differ. From the number of tokens and hapax legomena in our learner corpus, we will generate the frequency spectra from which the vocabulary growth curves are built.
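The Python sketch below illustrates the frequency spectrum and vocabulary growth curve described above. It is not the {zipfR} workflow the authors plan to use. It only reproduces the underlying counts, roughly what zipfR represents as spectrum and growth-curve objects. It also adds a simple log-log least-squares fit of a Heaps-type power law, V(N) ≈ K·N^β. The function names are hypothetical, and the fit is a descriptive approximation, not an LNRE model.

```python
from collections import Counter
from math import exp, log

def frequency_spectrum(tokens):
    """Return {m: V_m}, the number of types occurring exactly m times.

    V_1 is the number of hapax legomena."""
    type_freqs = Counter(tokens)
    return dict(sorted(Counter(type_freqs.values()).items()))

def growth_curve(tokens, step):
    """Return (N, V, V1) after every `step` tokens: tokens read, types, hapaxes."""
    freqs, hapaxes, curve = Counter(), 0, []
    for n, token in enumerate(tokens, start=1):
        freqs[token] += 1
        if freqs[token] == 1:
            hapaxes += 1   # a new type enters as a hapax
        elif freqs[token] == 2:
            hapaxes -= 1   # a former hapax occurs a second time
        if n % step == 0:
            curve.append((n, len(freqs), hapaxes))
    return curve

def fit_power_law(points):
    """Least-squares fit of y = K * N**beta on a log-log scale.

    `points` are (N, y) pairs with N > 0 and y > 0 (e.g. y = V or y = V1)."""
    xs = [log(n) for n, _ in points]
    ys = [log(y) for _, y in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return exp(mean_y - beta * mean_x), beta
```

For example, `fit_power_law([(n, v) for n, v, _ in growth_curve(tokens, 1000)])` estimates K and β for one subcorpus. Comparing β between the learner and native subcorpora is one informal way to look at the difference in vocabulary growth the abstract hypothesizes.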
For this we will use the {zipfR} package (Evert & Baroni, 2006) in R (R Core Team, 2016). Following Ballier and Gaillat (2016), we will use the compare.richness.fnc function from {languageR} (Baayen, 2007) to compare vocabulary growth in native and non-native writing. We will then extrapolate the vocabulary growth curves (see Figure 2) with the three Large Number of Rare Events (LNRE) models included in {zipfR}. These are the Generalized Inverse Gauss-Poisson model (Baayen, 2001, 2008), the Zipf-Mandelbrot model and the finite Zipf-Mandelbrot model (Evert, 2004). Finally, we will compare the results of the three models to determine which is best suited to analysing learner corpora.
Keywords: learner corpora, lexical complexity, Zipf, Mandelbrot, vocabulary growth, hapax legomena

Extracting accounting phraseology with Sketch Engine: a proposed workflow
Daniel Gallego
Universidad de Alicante (UA), Carretera San Vicente del Raspeig s/n, 03690 San Vicente del Raspeig, Alicante, Spain

This paper reports a methodological experiment in extracting specialized phraseology from a genre-based corpus of accounting texts. The working hypothesis is that Sketch Engine (Kilgarriff et al., 2004) can be useful for phraseological extraction, even though it was not designed specifically for extracting specialized phraseology. The starting point is a closed list of single-word terms, together with verbs that can potentially combine with those terms to form specialized phraseological units. The theoretical framework centres on the concept of specialized phraseology, reviewed on the basis of work such as Gouadec (1994), L'Homme (1997), Bevilacqua (2004) and Aguado (2007).
Some studies on evaluating phraseology extraction are also considered (Claveau & L'Homme 2004; Wanner et al. 2005, among others). To address the working hypothesis, the object of study is first delimited on the basis of this previous work: essentially, specialized phraseology of the verb + term type. Next, a workflow is proposed for extracting a list of candidate specialized phraseological units with the corpus query system Sketch Engine. The workflow has several steps. The first is to generate two whitelists, one of terms and one of verbs, both extracted from the corpus itself, and to validate them manually. The second is to extract the concordances containing the identified verbs and terms, which requires advanced use of Sketch Engine's CQL (Corpus Query Language). In the third step, a frequency list of the extracted units is generated from these concordances; this list can be considered a list of candidate specialized phraseological units. Finally, the extracted units are examined one by one to determine whether they are phraseological. Analysis of the first fifty extracted units shows a precision of around 40%, a fairly high figure that merits further investigation. Validating more units will show how this percentage fluctuates and whether it is higher or lower than in other studies. In any case, the results can be used not only in compiling phraseological repertoires but also in corpus indexing.
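The Python sketch below covers the second and third steps of this workflow: querying with CQL, then turning the concordance into a frequency list of verb + term candidates. The CQL string uses standard Sketch Engine query syntax: attribute-value tokens such as [lemma="..."] with regex alternation, and []{0,3} for an optional gap. The window size, the example lemmas and the function names are hypothetical. The real concordance comes from Sketch Engine, so `extract_candidates` is only a local stand-in that works on an already lemmatized token list. `precision` reproduces the manual-validation ratio: about 40% over 50 candidates corresponds to about 20 validated units.

```python
from collections import Counter

def build_cql(verbs, terms, max_gap=3):
    """CQL: a verb lemma, then 0..max_gap arbitrary tokens, then a term lemma.

    Assumes plain lemmas without regex metacharacters (no escaping is done)."""
    verb_alt = "|".join(sorted(verbs))
    term_alt = "|".join(sorted(terms))
    return f'[lemma="{verb_alt}"] []{{0,{max_gap}}} [lemma="{term_alt}"]'

def extract_candidates(lemmas, verbs, terms, max_gap=3):
    """Local stand-in for the concordance and frequency-list steps:
    count (verb, term) pairs with at most max_gap tokens in between."""
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma not in verbs:
            continue
        # Look at positions i+1 .. i+1+max_gap, clipped to the end of the text.
        for j in range(i + 1, min(i + 2 + max_gap, len(lemmas))):
            if lemmas[j] in terms:
                counts[(lemma, lemmas[j])] += 1
                break  # keep only the nearest term: one match per verb occurrence
    return counts

def precision(judgements):
    """Share of manually validated candidates judged phraseological."""
    return sum(judgements) / len(judgements)
```

For example, `build_cql({"registrar", "contabilizar"}, {"activo", "pasivo"})` returns `'[lemma="contabilizar|registrar"] []{0,3} [lemma="activo|pasivo"]'`, and `extract_candidates(...).most_common()` yields the ranked candidate list for manual validation.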
The experiment also yields some suggestions for improving how corpus query systems support the extraction of specialized phraseology.
Keywords: specialized phraseology, Sketch Engine, extraction, genre-based corpora

Extracting semantic frame structures from Environmental Sciences corpora
Beatriz Sánchez-Cárdenas (1), Carlos Ramisch (2)
(1) Lexicon research group, Universidad de Granada, Spain; (2) Université de Marseille, LIF (Laboratoire d'Informatique Fondamentale), France

Some authors argue that language is much less compositional than one might initially assume (Tutin & Falaise 2013; Kübler & Volanschi 2012; Gledhill 2000; Pecman et al. 2010; L'Homme 1998). In addition to multiword expressions such as idioms and compounds, speakers often use prefabricated templates and collocational patterns. Such patterns are omnipresent in specialized language, where using them correctly is crucial to fully convey and understand domain concepts and their relations. In this research, we propose and evaluate a new way of automatically identifying specialized noun-verb combinations in scientific discourse that are both recurrent and cognitively meaningful (Claveau & L'Homme 2006). The long-term goal of this work is to automatically extract argument structures from corpora to help build the semantic frames activated in specialized domains. Theoretically, our work derives from Frame-Based Terminology (FBT; Faber 2012, 2015), which applies the premises of frame semantics (Fillmore 2006) to the conceptual organization underlying specialized domains. Our description of thematic roles and argument structure is based on Role and Reference Grammar (Van Valin 2006). Finally, we classify the nouns in the arguments into semantic categories (Flaux and Van Velde, 2000).
With this perspective in mind, we developed a corpus-based methodology to acquire lexical patterns that reveal the structure of different frames. Our starting point is a set of corpus queries and association measures implemented in the MWEtoolkit, a software package for automatic MWE discovery in corpora (Ramisch 2014). After morphosyntactic analysis and lemmatization of the corpus, we searched for conceptually meaningful specialized noun-verb and verb-noun combinations. These searches were based on the semantic relations between nouns described in the environmental database EcoLexicon. For instance, the term volcano is connected to the noun eruption through the conceptual relation [cause of]. Extracting relevant noun-verb combinations from corpora is crucial for identifying the argument structures that underlie semantic frames (Fillmore et al. 2003). We therefore searched the corpora for verbs that lexicalize the relation between these two nouns, retrieving verbs such as cause and produce. Using a bootstrapping methodology, these verbs were then reused in a further query that retrieves all causal relations involving volcanoes from the corpora. The results were sorted in descending order of an association measure (pointwise mutual information), so that the most relevant lexical items for the frame under study appear at the top of the list. These pattern lists then revealed the different conceptual frames associated with the concepts analysed, which are filled in manually by an expert lexicographer. In this paper, we present an example extracted from a 1-million-token corpus of volcanology. So far, we have extracted the verbs associated with the term volcano. Our next step is to analyse the arguments of these verbs, their thematic roles (Van Valin 2006) and their semantic categories (Flaux and Van Velde, 2000). This analysis will illustrate the differences between the three frames.
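The ranking step can be illustrated with pointwise mutual information computed from raw co-occurrence counts, PMI(v, n) = log2(c(v, n) · N / (c(v) · c(n))). This is a minimal Python sketch, not the MWEtoolkit's implementation. The counts in the example are invented toy values. What N stands for (number of tokens, or number of candidate pairs) is a choice left to the user.

```python
from math import log2

def pmi_ranking(pair_counts, verb_counts, noun_counts, n_total):
    """Rank (verb, noun) pairs by pointwise mutual information, highest first.

    PMI = log2(c(v, n) * N / (c(v) * c(n))), with N = n_total."""
    scores = {
        (verb, noun): log2(c * n_total / (verb_counts[verb] * noun_counts[noun]))
        for (verb, noun), c in pair_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy counts, invented for illustration only.
pairs = {("cause", "eruption"): 8, ("have", "eruption"): 10}
verbs = {"cause": 10, "have": 500}
nouns = {"eruption": 40}
ranked = pmi_ranking(pairs, verbs, nouns, n_total=10_000)
```

PMI tends to favour rare pairs, so a minimum frequency threshold on c(v, n) is commonly applied before ranking.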
Since frames reflect cognitive patterns, they are language-independent. As our presentation will show, this conceptual description can be enriched with linguistic information in any language. As a consequence, translation studies can greatly benefit from it.
Keywords: frame-based terminology, multiword expressions, argument structure, corpus analysis strategies

Facework in a telecollaboration student corpus
Barry Pennock-Speck, Begoña Clavel Arroitia
Universitat de València (UVEG), Avda. Blasco Ibáñez, 32, Spain

Undoubtedly, bigger is better in corpus linguistics: the more data you have, the better the results. However, some corpora are necessarily small. Take our corpus of twelve audio-visual recordings of synchronous peer interaction (telecollaboration[1]) in English and Spanish between secondary school pupils who are native speakers. Anyone who has researched the discourse of minors knows how difficult it is to obtain parents' permission to record pupils for research purposes. What may be less evident to those who have never been involved in telecollaboration is the difficulty of finding schools in at least two countries that are willing to take part, and time slots that suit geographically distant peers. These problems are compounded by the often less than perfect technical resources of secondary schools. All this leads to small, finite corpora that are difficult to replicate. But does this mean they are of no use? We would argue that this is far from the truth. In this talk, we aim to show that detailed qualitative analysis of synchronous multimodal interaction between secondary school pupils yields valuable insights into the language pupils use and into their intercultural and interpersonal negotiations.
During telecollaboration, students face interpersonal, intercultural and transactional challenges while trying to complete the tasks they are given, such as organising a party or a trip abroad on a tight budget. Such challenges require facework. Following Goffman (1956, 1967), we define facework as the actions individuals take to mitigate face threats and to protect or enhance their own face and that of others. Our findings show that face-threat mitigation occurs in our corpus when requests for clarification arise. This happens either because a peer lacks linguistic prowess in the foreign language at a particular moment in the exchange, or simply because he or she cannot hear a word owing to technical problems. In most cases, if comprehension was not compromised, linguistic errors were ignored, which may reflect a common facework strategy: avoiding conflictive issues. We also found that facework addressed to positive face was very common and generally consisted in the search for common ground. Apart from linguistically coded communication, we also detected many cases of non-linguistic communication through gestures, smiles, laughter and the showing of personal photographs. These often reinforced verbal facework strategies. To sum up, our findings indicate that the "ceremonial activity" (Goffman 1967: 477) performed through facework is an important, though often neglected, facet of linguistic and psychological studies of student interaction.
Goffman, Erving. 1956. "The nature of deference and demeanor." American Anthropologist 58: 473-502.
Goffman, Erving. 1967. Interaction Ritual: Essays on Face-to-Face Behavior. Garden City, New York.
[1] Telecollaboration for Intercultural Language Acquisition project (TILA).
Keywords: telecollaboration, facework, pragmatics, acquisition

From text to word and from word to morpheme: Exploring the interface of corpus linguistics and word formation study with evidence from Modern Greek
Paraskevi Savvidou
National and Kapodistrian University of Athens (UoA), Greece

The present paper explores the contribution of corpus linguistics to the study of word formation. It does so by reviewing previous research and by discussing the findings of an ongoing study of Modern Greek word formation processes, with emphasis on evaluative morphology. The orientation of the study is both theoretical and methodological. It aims to show that further investigation of the interface between corpus linguistics and word formation morphology could provide significant insights into the character and nature of corpus linguistics as a linguistic (sub)field or methodology (see among others Stubbs 2009). Such investigation reveals the ties of corpus linguistics with what Sinclair (2004) called the restrictions of the pre-computer age, and it can also help to overcome these limitations. In other words, the interface of corpus linguistics and word formation study is presented as crucial for understanding and extending the theory and methodology of corpus linguistics. In the first part of the paper, a historical overview of the use of corpus linguistics shows that morphology is a rather neglected area of corpus research compared with other linguistic fields. Corpora were applied to morphology later and less systematically, concentrating on specific aspects of morphemes' behavior, such as productivity, while excluding or underestimating others.
The critical overview of previous research shows that the use of corpora in individual linguistic fields seems to be driven by a latent distinction between the level of formation and the level of use. This distinction is associated with the related dichotomies between grammar and lexis and between form/structure and semantics. The extent to which, and the way in which, corpora are applied in morphology can be seen as a consequence of this distinction. Corpus linguistics is a perspective on language study that goes beyond theoretical assumptions and dichotomies not derived from data analysis. Given this, the above observation could contribute to a more thorough understanding of such limitations, which is essential in order to overcome them. In the second part of the paper, we introduce a set of theoretical and methodological principles that could extend the application of corpus linguistics in word formation study. We give evidence in their favor by presenting the results of an ongoing study of Modern Greek evaluative morphology. The proposed methodology rests on two main principles. (a) The notion of co-occurrence is extended to two levels: the word formation level (namely, the various characteristics of the bases or compounding components with which the elements under examination tend to combine) and the (con)text level. (b) Qualitative and quantitative analysis are combined in the study of every aspect of the behavior of the sublexical units under examination, including function identification, combinatoriality, productivity, etc. These principles aim to transfer the benefits of the 'phraseological approach' of corpus linguistics to the field of morphology.
The results of the analysis of a representative number of Modern Greek sub-lexical units show that these general principles make it possible to examine the dynamic relation between the formation level and the use level of the elements under study. This offers a perspective that only comes into view if the analysis is careful not to exclude or underestimate specific aspects of morphemes' behavior.
Keywords: word formation, derivation, compounding, context, word level, text level, lexis, grammar, evaluative morphology, phraseological approach

Functional and thematic ngrams in specialized corpora: the case of academic English, French and Spanish
Clive Hamilton
Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot: EA3967, Université Paris Diderot, Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France

Previous studies have established that the ratio of functional to content single-word units differs between oral and written modes of communication (cf. Halliday, 1994; Rowley-Jolivet 1998; Biber et al., 1999). Others have suggested that this difference between modes is equally attested across languages (cf. Samaniego, forthcoming, for Spanish; Hamilton & Carter-Thomas, forthcoming, for English and French). However, despite the many advances in corpus studies, this observation has not yet been extended to clusters or recurrent word combinations. In addition, the study of phraseological units has become a burgeoning area of linguistic inquiry in recent years, in both theoretical and applied frameworks (cf. Cowie, 1998; Meunier & Granger, 2008). According to Stubbs & Barth (2003: 61), the pervasiveness of these units, irrespective of the type of data used for research, has also been documented in "key publications".
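Recurrent word combinations (ngrams) and the functional/content ratio behind lexical density are both central to this abstract, and both can be computed in a few lines of Python. In this sketch, tokenization, the function-word list and the treatment of numbers as content words are simplifying assumptions. The study's own distinction between functional and thematic ngrams is not reproduced here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams (as tuples) of a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))

def top_ngrams(tokens, n, k):
    """The k most frequent n-grams with their counts (ties: first seen first)."""
    return Counter(ngrams(tokens, n)).most_common(k)

def lexical_density(tokens, function_words):
    """Share of tokens that are not function words (a simple proxy)."""
    content = sum(1 for token in tokens if token.lower() not in function_words)
    return content / len(tokens)
```

In practice, the function-word list for each language would come from a stop-word list or a part-of-speech tagger rather than a hand-written set.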
The pervasive nature of these recurrent combinations can therefore be considered an irrefutable characteristic of natural language production. This presentation adds a doubly contrastive perspective to the general debate. It examines (i) recurrent word combinations (or ngrams, which can be subdivided into bigrams, trigrams, and so forth) (ii) in a specialized trilingual corpus of academic discourse in the natural sciences (restricted to chemistry, geochemistry, and marine and water sciences) in English, French and Spanish. I will present the corpus compilation process and briefly outline the distinction made between functional and thematic ngrams. The main part of the presentation will focus on two issues. The first is the pervasiveness of the two types of recurrent word combinations in the three subcorpora. The second is the parallels that can be drawn between thematic and functional ngrams and the lexical density of each language subcorpus, especially when a specific ngram overlaps between languages. Preliminary results indicate some overlap. For example, the trigram 'a partir de' has a similarly high frequency in both the Spanish and the French subcorpora, while the Spanish 'en la figura' and its English equivalent 'shown in figure' are used in a comparable manner and have similar frequencies. Substantial differences, however, have been observed in lexical density, which is higher in English than in French and Spanish. This implies that composition strategies may vary significantly in terms of information packaging. English also shows a marked preference for functional rather than thematic ngrams. For instance, the top three trigrams in the English subcorpus are all functional, whereas those in the other two languages are thematic or topic-specific (namely
'the use of', 'shown in figure', 'as well as'; 'après J.-C.', 'avant J.-C.', 'de l'holocène'; 'almacenamiento de CO2', 'de CO2 en', 'de la formación', respectively). The implications of our results for language teaching, particularly language for specific purposes, will be discussed.
Keywords: ngrams, phraseology, academic discourse, specialized corpora, contrastive studies

Gender-based differences in the use of epistemic modals in the late Modern English scientific register
Francisco Alonso-Almeida, Francisco J. Álvarez-Gil
Universidad de Las Palmas de Gran Canaria (ULPGC), Spain

This research focused on samples of English scientific texts from 1700 to 1900, in order to evaluate epistemic modality as realised by modal verbs. Epistemic modality seems to be strongly connected to the idea of truth and to authors' responsibility for, and commitment to, their statements (Traugott 1989; Sweetser 1990; Stukker, Sanders and Verhagen 2009). We also discuss some related features, such as evidentiality. Some scholars regard evidentiality as a subdomain of epistemic modality; others consider it an independent category. In this context, Dendale and Tasmowski (2001) argue that the relation between the two concepts can be one of disjunction, inclusion or intersection. In this paper we follow the disjunctive approach, in line with Cornillie (2009), who argues that the mode of knowing should not be associated with the degree of authors' commitment to their texts. Our aim was to see whether differences in the use of these modals could be detected from a gender perspective. For this, we queried the History subcorpus of the Coruña Corpus of English Scientific Writing, which contains extracts from historical texts written between 1700 and 1900. We used the corpus's own retrieval tool, the Coruña Corpus Tool.
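The first, quantitative step (the frequency of modal verbs by author gender) can be sketched as normalized frequencies per 10,000 words. This Python sketch is hypothetical in several respects. It counts every occurrence of a set of core modals, whereas the study classifies each occurrence manually by its contextual (epistemic or other) meaning. It does not handle historical spellings. The input structure, a list of texts per gender, is also an assumption.

```python
import re
from collections import Counter

# Core modal forms only; the study's exact inventory is not given in the abstract.
MODALS = ("can", "could", "may", "might", "must", "shall", "should", "will", "would")
MODAL_RE = re.compile(r"\b(" + "|".join(MODALS) + r")\b", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def modal_rates(texts_by_gender, per=10_000):
    """Return {gender: {modal: occurrences per `per` words}}.

    Counts every occurrence; classifying epistemic uses is left to manual analysis."""
    rates = {}
    for gender, texts in texts_by_gender.items():
        counts, words = Counter(), 0
        for text in texts:
            words += len(WORD_RE.findall(text))
            counts.update(m.lower() for m in MODAL_RE.findall(text))
        rates[gender] = {m: counts[m] * per / words for m in MODALS}
    return rates
```

Normalizing by subcorpus size matters here because the texts written by women and by men in a historical corpus are unlikely to be of equal length.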
Each occurrence was categorised according to its contextual meaning, following Dixon's (2009: 172) description of modal verbs. Dixon distinguishes modals from what can be called semi-modals, both of which express modalities. Other valuable studies of modals, such as Coates (1983), Leech (1971) and Palmer (1979), have also served as references for the present study. The procedure was as follows. First, we produced a list of occurrences to check the presence of modal verbs in the available history texts. Second, we queried and analysed the corpus to identify the pragmatic functions these modals play in the different texts. Finally, we checked the results to find out whether there is any gender-related difference in the use of epistemic modals in the late Modern English scientific register. The results report the frequency of these modal verbs according to gender but, most importantly, the different pragmatic functions they fulfil in the communicative process. One such function is the mitigation of claims (Alonso Almeida 2015). Here modals serve as a negative politeness strategy (Brown and Levinson 1987): to avoid or minimize imposition, to hedge the illocutionary force of a specific statement, or to create social distance in order to save the author's face. In this sense, modals are particularly useful, as they enable an interactive construction of scientific knowledge in which writer and readers can negotiate meaning.
Keywords: modals, corpus, gender, modality, evidentiality

Governability and democracy in Mexico.
Phraseological units of the Federal Executive, 2012-2016, from the perspective of Critical Discourse Analysis
Carlos Enrique Ahuactzin Martínez
Benemérita Universidad Autónoma de Puebla, Instituto de Ciencias de Gobierno y Desarrollo Estratégico (BUAP-ICGDE), Av. Cúmulo de Virgo s/n, Acceso 4, CCU, Puebla, Puebla, C.P. 72810, Mexico

The conception of the State as the regulator of public life in Latin American countries has faced its most rigorous test in recent years. In the case of Mexico, we propose to document how the discourse of "democratic governability" has been built around the figure of the president. We treat it as a strategy of the Federal Executive to confront the existence of a "failed State", in light of the social and political events that have called into question the Mexican State's capacity to uphold and guarantee human rights. We draw on the theoretical and methodological perspectives of Critical Discourse Analysis and Corpus Linguistics, used in a complementary way. On this basis, we analyse presidential speeches from 2012-2016, a period in which the State's new security policy took shape and in which the violence that has characterized the current federal administration unfolded. Across the corpus, presidential discourse reveals the discursive resources through which Mexico's structural reforms were communicated. These resources rest on achieving "democratic governability", conceived as a normative framework for the development of the State and the strengthening of citizenship. The corpus was organized by semantic concordances, using the Corpus Management System (Sistema de Gestión de Corpus) of the Language Engineering Group (Grupo de Ingeniería Lingüística) of the Universidad Nacional Autónoma de México.
The classification and treatment of phraseological units started from two single-word units, "gobernabilidad" ('governability') and "democracia" ('democracy'). When the corpus was processed, these turned out to be incorporated into multi-word units, depending on the communicative situation of the Federal Executive. On this basis, three groups were established according to their frequency in the corpus: 1) nominal locutions, 2) adjectival locutions, and 3) adverbial locutions. To determine how the locutions are used, the corpus tagging took into account the functional character of the linguistic expressions in the context of government communication. The discursive resources used by the Executive establish a lexico-semantic frame of reference in which "democracy" occupies a prominent place in the exercise of public power and the legitimation of political decisions. Likewise, analysing derivations of "governability" as phraseological units makes it possible to establish a field of associations among the nominal, adjectival and adverbial locutions. Tagging the units of analysis over the period studied also makes it possible to relate the modalities of presidential discourse to the political processes that shaped how institutional messages were produced and issued. The study therefore proposes an interdisciplinary approach to presidential discourse, considering the discursive, linguistic and political variables involved in shaping the messages of Mexico's Federal Executive. Finally, based on the empirical findings, it proposes a typology of the communicative and discursive strategies that articulate the conception of "democratic governability". These strategies operate in a normative context that exposes the regulatory limitations of the Mexican State.
Keywords: discourse, phraseological units, locutions, governability and democracy

Spanish grammar for French speakers: the use of the preposition de after matrices of the type es posible
María Adelaida Gil Martínez
Instituto Cervantes de Burdeos (IC Burdeos), Instituto Cervantes, 57 Cours de l'Intendance, 33000 Bordeaux, France

One of the most common difficulties French speakers face in learning Spanish is the use of prepositions. A particular problem is the overuse of the preposition de after matrices of the type es posible, one of the most characteristic fossilizations in these speakers' interlanguage up to level B1. At beginner levels this could be seen as transfer from French, e.g. *es posible de dejar de fumar, from French c'est possible d'arrêter de fumer. At level B1, however, it could be regarded as a strategy to avoid the subjunctive, since learners have not mastered the alternation between the two moods in Spanish. Moreover, syntactic transfer from the L1 can be observed up to very advanced levels, so it is not surprising to find this type of error at these stages of the teaching-learning process. To test this hypothesis, we used the Corpus de aprendices de español como lengua extranjera (CAES), designed by a team at the University of Santiago and funded by the Instituto Cervantes. CAES consists of texts written by learners of Spanish at different proficiency levels, from A1 to C1 of the Common European Framework of Reference as applied to Spanish in the Plan curricular del Instituto Cervantes. Niveles de referencia para el español. The learners come from six L1 backgrounds: Arabic, Mandarin Chinese, French, English, Portuguese and Russian.
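The first objective, measuring how often a matrix of this type is followed by de, reduces to a ratio over matching contexts. The following Python sketch uses a regular expression over plain learner texts. The list of matrix adjectives and the function name are hypothetical. The sketch covers only the present-tense form es + adjective and does not reproduce the CAES query interface.

```python
import re

# Hypothetical set of matrix adjectives; only "es" + adjective is covered.
MATRIX_RE = re.compile(
    r"\bes\s+(?:posible|necesario|importante|difícil|fácil)\s+(\w+)",
    re.IGNORECASE,
)

def de_rate(texts):
    """Share of matrix matches whose next word is the preposition 'de'."""
    total = with_de = 0
    for text in texts:
        for match in MATRIX_RE.finditer(text):
            total += 1
            if match.group(1).lower() == "de":
                with_de += 1
    return with_de / total if total else 0.0
```

Breaking the same count down by proficiency level (A1 to C1) and by L1 would support the comparison between French speakers and the other learner groups.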
The objectives of this proposal are as follows:
• To determine to what extent CAES supports this hypothesis by analysing and assessing, through statistical techniques, the presence of the matrix (*es posible de) in the Spanish of French-speaking learners.
• To explore the contexts in which this structure appears and what information can be obtained from the CAES samples. A contrastive analysis of two or more languages through linguistic corpora will allow us to assess how this structure functions in discourse and to determine to what extent its appearance is due to L1 transfer or to other learning strategies used by French speakers.
• To build a bank of examples that can later be used to design classroom activities and tasks that act as mediating, catalysing material to improve the teaching-learning process.
The first results from the CAES samples analysed point to the following contexts:
• Matrices of the type es posible are followed by the preposition de in 38% of cases.
Keywords: Learner corpus, French speakers, subjunctive matrices, interlanguage, Spanish as a foreign language (ELE), L1 transfer

Hedging in tourism discourse: the variable genre in academic vs professional texts
Francisca Suau-Jiménez 1, Carmen Piqué-Noguera 2
1 Facultat de Filologia, Traducció i Comunicació, Universitat de València (IULMA-UV) – 32, Av. Blasco Ibáñez, 46010 Valencia, Spain
2 Facultat de Filologia, Traducció i Comunicació (IULMA-UV) – 32, Av. Blasco Ibáñez, 46010 Valencia, Spain

In recent decades, e-genres have been at the forefront of academic, professional and social studies aimed at enhancing writing in these areas (Thaine 2015), and tourism has often been targeted as one of them.
Recent studies (Suau-Jiménez 2016; Mapelli 2016) have shown that tourism e-genres strongly challenge the interpersonal model of metadiscourse developed for academic genres (Hyland 2005). We therefore hypothesize that hedges, one of the most representative interpersonal markers, should also show important differences in function, frequency and grammatical realization between Research Articles and Hotel Websites. Genre and discipline are two variables that have been claimed to challenge the original interpersonal metadiscourse model, which took English and academic discourse as its main referents. Hedges are prototypical markers in academic writing and are also central in promotional tourism genres in English (Suau-Jiménez 2012), since they reveal the author's different functional attitudes and degrees of commitment to the content and to reader involvement. Hedges, however, are not always easy to identify, as Nash (1992) points out when he stresses their fuzziness within interpersonal metadiscourse. This research analyzes a 100,000-word corpus composed of two sub-corpora, Hotel Websites and Research Articles in tourism. We aim to uncover which generic functions hedges perform (or do not perform) in each genre, their frequency, and the nature of their grammatical realization in both genres. Methodologically, we took Hyland's (2005) taxonomy as a starting point and adapted it to the corpus at hand. We excluded so-called research verbs, such as 'argue' or 'indicate', since they only appear in Research Articles. Then, since the modals 'should', 'could' and 'may' are shared by both genres, we took them as our specific object of analysis from an interpersonal discourse approach (Hyland 2005). Preliminary results from a 34,000-word pilot corpus covering both genres already showed a quantitative difference: 8.03 hedges in Research Articles versus 4.07 in Hotel Websites.
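Frequencies from two sub-corpora are only comparable once raw counts are normalized by sub-corpus size. The abstract does not state the base behind the 8.03 and 4.07 figures, so the sketch below simply assumes a rate per 1,000 words; the two text snippets are invented stand-ins, not the study's data, and the function name `modal_profile` is hypothetical.

```python
import re
from collections import Counter

MODALS = ("may", "should", "could")  # the modals analysed in the abstract

def modal_profile(text, base=1000):
    """Return ({modal: (raw_count, rate_per_base_words)}, token_count) for `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())   # crude word tokenization
    counts = Counter(t for t in tokens if t in MODALS)
    n = len(tokens)
    return {m: (counts[m], base * counts[m] / n) for m in MODALS}, n

# Invented snippets standing in for the two sub-corpora.
article = "These results may indicate that the model could be relatively robust."
hotel = "Guests should visit the old town; you may also enjoy the beach."

for name, text in (("Research Article", article), ("Hotel Website", hotel)):
    profile, n = modal_profile(text)
    print(name, n, {m: f"{c} ({rate:.1f}/1,000)" for m, (c, rate) in profile.items()})
```

Counting forms is only the first step: whether a given 'may' or 'should' actually works as a hedge (rather than, say, as advice to the reader) still has to be decided by manual functional analysis, as the study does.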
In addition, the modals 'may' and 'should' show distinctive distributions: we counted 21 occurrences in Research Articles versus 54 in Hotel Websites. 'Should' appears 13 times in Research Articles and 15 times in Hotel Websites, whereas 'could' occurs 12 times in Research Articles versus only twice in Hotel Websites. These frequency differences may reflect different ways of approaching and persuading each readership, related to the specific functional needs involved in achieving each genre's communicative aim. Tourism marketing texts use these modal verbs to give advice to prospective clients or to describe what they will find around the hotel premises, whereas Research Article writers use them especially in argumentative and speculative sections. Their use is typical when making claims that are more or less tentative, or when a possible outcome is more or less probable, and they are often accompanied by qualifying adverbs such as 'relatively', 'generally' or 'largely'. The conclusions suggest that interpersonal metadiscourse, as a research framework, must take the variables genre and discipline into account in order to produce ad hoc analyses that can explain marker frequencies and lexico-grammatical realizations in context, so that appropriate discursive and sociolinguistic implications can be drawn.
Keywords: hedges / interpersonal metadiscourse / interpersonality / professional and academic genres

Identifying recurrent formulas in academic Spanish
Marcos García Salido, Marcos Garcia González, Margarita Alonso Ramos
Departamento de Galego-Portugués, Francés e Lingüística, Universidade da Coruña (UDC) – Spain

Every discourse genre contains recurrent or routine combinations of lexical units. Such combinations are often semantically compositional, but their lexical realization is conditioned by the conceptual representation the speaker wishes to express.
Thus, for example, to introduce the conclusions of a text, en conclusión is more idiomatic than ?a manera de conclusión. From a phraseological perspective, sequences of this kind have been called clichés (Mel'čuk, 2015), and they overlap to some extent with the concept of lexical bundle (Biber et al., 1999). Given their characteristics, understanding such sequences is unproblematic, but producing them can be difficult; hence the value of a dictionary that records them, especially for non-native speakers and novice writers. The aim of this work is to evaluate the effectiveness of several methods for the automatic identification of multiword sequences, with a view to compiling a dictionary of academic Spanish. We considered as recurrent formulas sequences of two, three and four words with a frequency of at least ten occurrences per million words (Biber et al., 1999). This yielded formulas such as cabe destacar que or en el presente trabajo, alongside others of more doubtful interest such as et al. 2002 or no se han. To identify those that are characteristic of academic discourse, we compared two main strategies: (i) combining a dispersion index (DP, Gries, 2008) with a log-likelihood value indicating significant differences in the distribution of the formulas relative to non-academic texts, and (ii) using only a test that simultaneously measures distributional differences and takes into account the dispersion of the forms tested (Wilcoxon-Mann-Whitney [WMW]; cf. Kilgarriff, 2001; Paquot and Bestgen, 2009; Lijffijt et al. 2015).
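Strategy (i) can be sketched in a few functions: extract bigrams, keep those above a per-million frequency threshold, and score each with Gries's DP (half the sum of absolute differences between each corpus part's share of the item's occurrences and that part's share of the corpus) and a log-likelihood (G2) comparison against a reference corpus. This is a minimal illustration, not the authors' pipeline: corpus sizes are measured in tokens, only items overrepresented in the academic texts are kept (G2 itself is unsigned), the toy "corpora" are invented, and the WMW alternative, strategy (ii), is not shown.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def gries_dp(part_freqs, part_sizes):
    """Gries's (2008) DP: 0 = perfectly even dispersion, values near 1 = very uneven."""
    total_f, total_s = sum(part_freqs), sum(part_sizes)
    return 0.5 * sum(abs(f / total_f - s / total_s)
                     for f, s in zip(part_freqs, part_sizes))

def log_likelihood(a, b, c, d):
    """G2 for frequency a in a corpus of c tokens vs frequency b in a corpus of d tokens."""
    e1 = c * (a + b) / (c + d)   # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)   # expected frequency in corpus 2
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)

def score_bigrams(parts, reference, min_per_million=10):
    """Return (bigram, freq, DP, G2) rows for frequent, overrepresented bigrams."""
    sizes = [len(p) for p in parts]
    total = sum(sizes)
    part_counts = [Counter(ngrams(p, 2)) for p in parts]
    overall = sum(part_counts, Counter())
    ref_counts = Counter(ngrams(reference, 2))
    rows = []
    for bigram, f in overall.items():
        if f * 1_000_000 / total < min_per_million:
            continue  # below the frequency threshold
        if f / total <= ref_counts[bigram] / len(reference):
            continue  # not overrepresented in the academic texts
        dp = gries_dp([pc[bigram] for pc in part_counts], sizes)
        ll = log_likelihood(f, ref_counts[bigram], total, len(reference))
        rows.append((bigram, f, round(dp, 3), round(ll, 2)))
    # Highest G2 first; among ties, the more evenly dispersed (lower DP) first.
    return sorted(rows, key=lambda r: (-r[3], r[2]))

# Invented toy data: three "academic" texts and one "narrative" reference text.
academic_parts = [
    "cabe destacar que el presente trabajo analiza datos".split(),
    "en el presente trabajo cabe destacar que los datos".split(),
    "cabe destacar que la muestra es pequeña".split(),
]
reference = "el perro corre y cabe en la casa que es pequeña".split()

for row in score_bigrams(academic_parts, reference):
    print(row)
```

With real corpora, the frequency threshold does most of the filtering and the DP cut-off is then tuned, which is the trade-off between precision and exhaustiveness discussed in the abstract.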
The reference corpus used (the Spanish part of SERAC, InterLAE, 2008) consists of research articles from four different areas (Humanities, Social Sciences, Physics and Engineering, and Health Sciences) and is contrasted with narrative texts from the LEXESP corpus (Sebastián et al., 2000). As in other studies (Paquot and Bestgen, 2009), the WMW test proves, in principle, more conservative than log-likelihood. For example, if we keep only bigrams with p ≤ 0.0001 in the WMW test, just 25% of these sequences remain. With the same p value, the log-likelihood test would produce a list containing 73% of the original number of bigrams. However, this latter list can be reduced to the most widely dispersed bigrams only, which considerably narrows the gap between the results of the two methods. Analysing both the resulting lists and the items excluded under the different significance and dispersion thresholds will provide information on the precision and exhaustiveness (recall) of the filters used.
Keywords: formulas, academic discourse, dictionary, automatic keyword extraction

Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts
Heidi Verplaetse, An Lambrechts, Kris Heylen
KU Leuven, RU Quantitative Lexicology and Variational Linguistics – Belgium

ABSTRACT
Theoretical background and main arguments
Kübler et al. (2016) recently conducted an experiment with students using comparable corpora, which indicated that these corpora help solve translation difficulties such as those relating to tense and aspect, prepositions and collocations.
However, certain error types occur more often when a corpus is used, possibly because of overconfidence in the corpus or lack of time when making extensive use of it. Besides comparable corpora, parallel corpora used as translation memories (TMs) integrated in a CAT tool provide another excellent means of preparing students for their future professional environment, reflecting the needs of professional translators. Corpora improve student translations because they contain information that is not included in dictionaries, particularly with regard to terminology and idiomatic expressions (cf. phraseology) (Frérot, 2009). This is confirmed by Kübler (2011), who states that parallel corpora seem to be the perfect tool for a translator: besides the terminology needed for the translation task, they also provide the necessary phraseology. Using parallel corpora integrated in a CAT tool makes it possible to exploit not only the above-mentioned benefits of corpora but also those of TMs: TMs not only speed up translation, increasing translators' productivity and earnings, but also have a positive influence on overall translation quality. By recognizing previously translated segments, TMs increase consistency at the stylistic, phraseological and terminological levels (Austermühl, 2006). However, when trying to increase their output, translators with a TM may work too fast, which lowers translation quality because they reuse translations from the TM without verifying them first (Bowker, 2005).
Aims and method
To assess the influence of CAT tools and preset corpus-based TMs on translation quality at the phraseological level, we examine translations of specialized medical texts produced by MA students of Translation. In our experiment the source texts contain predefined translation difficulties.
The students perform the translations under three different conditions, viz. (i) without CAT tools, TMs or external resources, (ii) with a CAT tool and a TM, and (iii) with external resources only. For the medical translations in our current tests, the students use the parallel corpus from the European Medicines Agency (EMA) compiled by Tiedemann (2009) as a TM. Once the translations are completed, the predefined translation difficulties are analysed on the basis of an error classification (cf. MeLLANGe error typology, Kübler et al., 2016). We use an error typology because errors can be defined more easily and precisely than translation quality, and translation quality depends to a large extent on the absence of errors. Moreover, as stated by Schiaffino and Zearo (2005), among others, translation quality should be assessed as objectively as possible.
Pilot test results
Our pilot test with student translations showed that concordance searches in TMs of parallel corpora are beneficial for looking up specialized medical terminology (-, 2015), whereas mere TM support without concordance searches provided little added value. Terminology look-up through concordance searches proved especially beneficial for more difficult items. In these experiments, however, the exclusive use of external resources (excluding CAT tools and TMs) also had a considerable positive influence on the translation of specialized terminology (-, 2015).
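The two kinds of TM support contrasted in the pilot, automatic retrieval of similar previously translated segments versus concordance searches for a term, can be sketched with Python's standard `difflib`. This is only an illustration of the idea: commercial CAT tools use their own fuzzy-match metrics, the 0.7 threshold is an arbitrary assumption, and the English-Spanish segment pairs are invented rather than taken from the EMA corpus.

```python
from difflib import SequenceMatcher

# Hypothetical TM: (source, target) segment pairs, invented for illustration.
TM = [
    ("Do not take this medicine if you are allergic to paracetamol.",
     "No tome este medicamento si es alérgico al paracetamol."),
    ("Tell your doctor if you have liver problems.",
     "Informe a su médico si tiene problemas de hígado."),
    ("Hepatic impairment may increase exposure to the active substance.",
     "La insuficiencia hepática puede aumentar la exposición al principio activo."),
]

def fuzzy_matches(segment, tm, threshold=0.7):
    """Return (score, source, target) for TM entries similar to `segment`, best first."""
    scored = []
    for src, tgt in tm:
        ratio = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if ratio >= threshold:
            scored.append((round(ratio, 2), src, tgt))
    return sorted(scored, reverse=True)

def concordance(term, tm):
    """Return TM segment pairs whose source side contains `term` (case-insensitive)."""
    return [(src, tgt) for src, tgt in tm if term.lower() in src.lower()]

# Segment-level match: proposes a whole previous translation to reuse.
for score, src, tgt in fuzzy_matches(
        "Do not take this medicine if you are allergic to ibuprofen.", TM):
    print(score, "|", tgt)

# Term-level concordance: shows how a term was rendered in context.
for src, tgt in concordance("hepatic impairment", TM):
    print(src, "->", tgt)
```

The contrast mirrors the pilot finding: a fuzzy match hands the student a whole segment that may be reused unverified, whereas a concordance search surfaces a term's established rendering in context and leaves the decision to the translator.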
Keywords: Parallel corpora, Comparable corpora, Terminology, Phraseology, CAT tools, Translation Memories (TMs), Translation quality, Translation for Specific Purposes, Medical translation

Investigating style and conventionality in literary translation: a corpus-based approach
Carolina Barcellos
University of Brasília (UnB) – Campus Universitário Darcy Ribeiro – Asa Norte – ICC Sul B1167/63, CEP: 70910-900 – Brasília/DF, Brazil

Corpus-based Translation Studies (BAKER, 1999, 2000; SALDANHA, 2011) have focused on translators' style and, as a result, have addressed the translator's discursive presence in the translated text. This research specifically investigates the stylistic traits of a literary translator from the perspective of conventionality and translation shifts. It examines patterns in the translator's linguistic choices regarding conventionality (BAKER, 2007) in Brazilian Portuguese that can be found both in his work as a translator and in his work as an author, and the consequences of these choices for the recreation of meaning in the translated texts. Three corpora were compiled: 1) a corpus of texts translated into Brazilian Portuguese by Paulo Henriques Britto, currently one of the most prominent Brazilian literary translators; 2) a corpus of non-translated texts written in Brazilian Portuguese by Britto; and 3) a corpus of short stories written in American English by Philip Roth, John Updike and Jhumpa Lahiri, which, together with the first corpus (Britto's translations), formed a parallel corpus. Two other corpora (COMPARA and ESTRA) were used as control corpora providing frequency references for conventionality in Brazilian Portuguese. Statistical data were obtained with WordSmith Tools 6.0 (SCOTT, 2012), and elements related to conventionality in Brazilian Portuguese were analyzed at the different ranks (morpheme, word, group and clause).
The research methodology included compiling, preparing, aligning and tagging the texts for later analysis with WordSmith Tools 6.0. The identification of patterns in the translated texts attributable to the translator's style, rather than to the linguistic constraints of the American English/Brazilian Portuguese pair, draws mainly on Munday (2008), Saldanha (2011) and Baker (1999, 2000, 2007). The results indicated that Britto made a set of choices that was to some extent distinct for each translated text, under the influence of the style of the source texts. In general, Britto's choices regarding conventional expressions increased the degree of colloquialism in the translated texts compared with their respective source texts. In addition, the set of choices identified in Britto's non-translated texts showed similarities with the set identified in his translated texts, particularly his translations of Philip Roth's work. The most frequent translation shift was addition (a subcategory of amplification). These additions were not directly related to explicitation; instead, they reflected the translator's preference for conventional expressions in the translated texts, even when the source texts offered no clear motivation for them. Britto also made use of sanitization, erasing some cultural references from the source texts. Nevertheless, the translator's creativity consistently outweighed his use of sanitization, corroborating the results obtained by Munday (2008) and, to some extent, refuting those obtained by Baker (1999, 2000).
Keywords: Conventionality, Style of Translation, Literary Translation, Corpus-based Translation Studies.
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study
Joaquín Gris Roca 1,2, Raquel Criado Sánchez 3,4, Agustín Romero Medina 2,5, Isabel Alonso Belonte 6
1 University of Murcia (UMU) – Facultad de Ciencias Sociosanitarias, Campus de Lorca, Antiguo Cuartel Sancho Dávila, Avda. de las Fuerzas Armadas, s/n, 30800 Lorca, Murcia, Spain
2 University of Murcia – Spain
3 University of Murcia (UMU) – Facultad de Letras, Campus de la Merced, C/ Santo Cristo, 1, 30071 Murcia, Spain
4 University of Murcia – Spain
5 University of Murcia (UMU) – Facultad de Psicología, Campus de Espinardo, 30100 Murcia, Spain
6 Autonomous University of Madrid – Spain

Textbooks and activities are fundamental tools in the EFL classroom (e.g. Littlejohn, 2011; Montijano-Cabrera, 2014; Sánchez, 2004; Tomlinson, 2003, 2011), as they are often the only means of giving students opportunities to practise the L2 in environments where the input is (very often) poor, as is the case in EFL contexts. Teachers can use them in a variety of ways, mainly to convey L2 knowledge to students through practice or to support the explanations they give in class. Basically, there are three types of activities according to the type of knowledge they foster (Gris, 2015): i) activities whose teaching nature is mostly or fully explicit, which primarily foster explicit linguistic knowledge (e.g. knowledge of forms); ii) activities with a high or full implicit teaching load, aimed at developing implicit knowledge (which underlies oral and written fluency); and iii) activities with a mixed teaching load, that is, partially explicit and partially implicit.
Selecting and implementing activities with their explicit and implicit teaching nature in mind is crucial for a balanced development of both explicit and implicit knowledge, given that the ultimate goal of Foreign Language Teaching should be the attainment of the latter (e.g. DeKeyser, 2015). This issue becomes particularly sensitive in child L2 acquisition (Abello-Contesse et al., 2006), since earlier stages of acquisition are believed to be decisive for aspects such as pronunciation, intonation and fluency (Agustín-Llach, 2016; Alizadeh, 2011; Paradis, 2007). The objective of this preliminary study is therefore twofold: firstly, to analyze the explicit and implicit teaching load of activities in EFL textbooks from different, representative publishers used in Spanish primary schools; secondly, to discern their cognitive potential. The method used to analyze and categorize the activities involved two basic steps. First, a corpus was created by compiling 100 activities from 10 authentic EFL textbooks used in the first year of primary school in Spain. The activities were randomly selected from two textbooks from each of the major EFL textbook publishers in Spain (Oxford University Press, Macmillan, Cambridge University Press, Santillana/Richmond, Pearson, Burlington Books, Anaya). Units and activities within each textbook were also selected at random. Second, each activity in the corpus was tagged with its explicit and implicit teaching load. Data analysis is ongoing, and the study is expected to shed light on the patterns of activity typology in EFL primary-school textbooks.
This will reveal the cognitive potential underlying the textbook activities used in child EFL teaching. The pedagogical implications derived from the study will be outlined.
Keywords: Primary school, EFL teaching, textbooks, activities, corpus

Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective
Hakan Cangir 1,2
1 Ankara University, School of Foreign Languages (AU YDYO) – Ankara Üniversitesi Gölbaşı 50. yıl yerleşkesi, Bahçelievler Mahallesi, Kaymakamlık arkası, 06830 Gölbaşı/ANKARA, Turkey
2 University of Exeter – Graduate School of Education, St Luke's Campus, Heavitree Road, Exeter, Devon EX1 2LU, United Kingdom

Many studies have investigated how the bilingual mental lexicon is structured, and various researchers have suggested that the two lexicons seem to interact in some way during language production. However, there is disagreement about how the two mental dictionaries interact during lexical activation, in particular about the phase of the activation process in which interaction can be observed. Another related topic scrutinized by many applied linguists is whether lexical activation is language-specific or language-non-specific. The current study assumes the process to be language-non-specific and tries to shed light on the cross-linguistic nature of the bilingual mental lexicon, with particular emphasis on collocations, which appear to be an understudied topic. In addition, it approaches cross-linguistic lexical priming from a syntagmatic perspective using a typologically different language, Turkish, which previous research appears to lack. Frequency, congruence and typological variety are assumed to be likely to affect lexical processing, and collocational processing in particular.
With this in mind, the researcher uses two representative, balanced corpora, the Corpus of Contemporary American English (COCA) and the Turkish National Corpus (TNC), to develop reliable items for a cross-linguistic collocational priming experiment, measures the response times of English-Turkish bilinguals, and investigates the influence of frequency, congruence and typology on collocational processing. Building on lexical priming theory, which holds that every word is primed to occur with the particular words it collocates with, the study adopts the Spreading Activation Model as the underlying theory of lexical activation and examines the cross-linguistic aspect of collocational priming in bilinguals. As the core framework for cross-linguistic collocational priming, the Dual Activation of Collocational Connections Model and the Psycholinguistic Model of Vocabulary Acquisition in L2 are employed, because the study reflects two different language acquisition settings, i.e. English as a Second Language (ESL) and English as a Foreign Language (EFL). The initial results of a monolingual priming experiment, designed to set the baseline for the main experiment, indicated a strong priming effect in Turkish. The findings of the cross-linguistic priming experiment suggested that a priming effect is present for ADJECTIVE+NOUN collocations but not for VERB+NOUN combinations, which can be regarded as a typological effect on cross-linguistic collocation processing. More strikingly, the direction of presentation in the priming experiment appears to have the strongest impact on response times: when the prime word was in the L1 and the target word in the L2, processing seemed to be facilitated and a statistically more significant priming effect was detected.
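A priming effect of this kind is commonly quantified as the difference in mean response time between targets preceded by an unrelated prime and targets preceded by a related (collocate) prime, computed separately for each condition of interest. The sketch assumes that operationalization; the abstract does not spell out its exact design, the trial data are invented, and real analyses would trim outliers and fit regression or mixed-effects models rather than compare raw means.

```python
from statistics import mean

def priming_effect(trials):
    """Mean RT (ms) of unrelated-prime trials minus mean RT of related-prime trials.

    A positive value means that related (collocate) primes speeded up responses.
    """
    related = [t["rt"] for t in trials if t["related"]]
    unrelated = [t["rt"] for t in trials if not t["related"]]
    return mean(unrelated) - mean(related)

def effect_by(trials, factor):
    """Priming effect computed separately for each level of `factor`."""
    levels = sorted({t[factor] for t in trials})
    return {lvl: priming_effect([t for t in trials if t[factor] == lvl])
            for lvl in levels}

# Invented trials: presentation direction, collocation type, prime relatedness, RT.
trials = [
    {"direction": "L1->L2", "type": "ADJ+N", "related": True, "rt": 590},
    {"direction": "L1->L2", "type": "ADJ+N", "related": False, "rt": 670},
    {"direction": "L1->L2", "type": "V+N", "related": True, "rt": 640},
    {"direction": "L1->L2", "type": "V+N", "related": False, "rt": 660},
    {"direction": "L2->L1", "type": "ADJ+N", "related": True, "rt": 630},
    {"direction": "L2->L1", "type": "ADJ+N", "related": False, "rt": 650},
    {"direction": "L2->L1", "type": "V+N", "related": True, "rt": 655},
    {"direction": "L2->L1", "type": "V+N", "related": False, "rt": 660},
]

print(effect_by(trials, "direction"))
print(effect_by(trials, "type"))
```

Each level of a factor must contain both related and unrelated trials; otherwise `statistics.mean` raises an error on the empty list, which is a useful guard against unbalanced designs.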
Last but not least, congruent and more frequent collocations (those with a higher P1|2) yielded a more significant cross-linguistic priming effect. The regression analysis revealed that the direction of presentation and P1|2 are strong predictors of the subjects' mean response times in the cross-linguistic collocational priming experiment. The results were discussed in the light of the lexical processing models mentioned above.
Keywords: Collocational Priming, Mental Lexicon, Bilingual, Corpora and Cross-linguistic

Knowledge extraction for TKB phraseology module design
Pilar León-Araúz, Arianne Reimerink
University of Granada (UGR) – Buensuceso, 11, 18001, Spain

Certain authors define phraseological units as all word combinations with a certain degree of stability (Hausmann 1984, 1985, 1989; Gläser 1994/95), even in specialized discourse (Roberts 1994/95; Heid 1994, 2001; Montero 2003, 2008). According to Rundell (2010: vii), collocations are as important as grammar because they make speakers and writers sound fluent. In specialized domains, language users perceive them as contributing to the domain-specific flavour of special languages (Bartsch 2004). Along these lines, recent studies have highlighted the importance of verbs, their collocations and their argument structure in specialized terminology (Lorente 2007; Buendía 2012, 2013; Buendía, Montero and Faber 2014), but few terminographic resources currently incorporate them (L'Homme 1998; Buendía 2012). If terminological knowledge bases (TKBs) are to be truly helpful for specialized writing, phraseological information should be added in a consistent and user-friendly way. In EcoLexicon, a TKB on the environment (ecolexicon.ugr.es), phraseology was first included at the term level, linking verbs with arguments already contained in EcoLexicon (Buendía 2013). However, certain verbs, or at least some of the paradigms in which they can be framed, can also be regarded as semantic relations.
In EcoLexicon, knowledge extraction and representation are based on triplets or conceptual propositions (concept-relation-concept combinations; Faber, León and Reimerink 2014). Nevertheless, the expressivity of some of the relations should be improved. For instance, the relations affects, has function or cause could be divided into more specific relations. Conceptual propositions such as erosion affects landform would be more meaningful if the relation were reduces instead of affects. However, the TKB should also contain other verbs that lexicalize and specify the core meaning of reduction (e.g. carve, degrade, erode) as well as other terms that can fill the argument slots (e.g. weathering, cliff). For a phraseological module to be consistent with the conceptual module in EcoLexicon, it should be based on the same principles. The design of our module is thus developed from the categorization of term-verb-term collocates reflecting the different lexicalizations of conceptual propositions. In this way, semantic relations can be further specified according to specialized predicates, and, in turn, phraseological templates can be generalized on the basis of the semantic types related in conceptual networks. However, these semantic types need to be extracted in a consistent way. Top-down and bottom-up methods are applied to extract the information needed to build the module. The top-down method consists of establishing basic semantic categories in the environmental domain (e.g. landform, structure, instrument), based on the definitions and conceptual networks in EcoLexicon. This will result in a domain-specific ontology similar to the CPA semantic types used in the Pattern Dictionary of English Verbs (PDEV; Hanks 2008). The validity of this categorization is tested by comparing it with the results of automatic clustering (Brown et al. 1992) of a 50-million-word corpus on the environment.
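The two-way move described above, specializing a generic relation from the verb that lexicalizes it and generalizing term-verb-term collocates into templates over semantic types, can be sketched as a small data structure. The semantic types and the verb-to-relation mapping below are hypothetical stand-ins built from the examples in the text, not EcoLexicon's actual data model, and falling back to the generic `affects` relation for unmapped verbs is an assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    """A concept-relation-concept triplet."""
    source: str
    relation: str
    target: str

# Hypothetical semantic types and verb-to-relation mapping, for illustration only.
SEMANTIC_TYPE = {"erosion": "PROCESS", "weathering": "PROCESS",
                 "landform": "LANDFORM", "cliff": "LANDFORM"}
VERB_RELATION = {"erode": "reduces", "carve": "reduces", "degrade": "reduces"}
GENERIC_RELATION = "affects"  # used when a verb has no more specific relation

def specialize(collocations):
    """Turn (term, verb, term) collocates into propositions with a specific relation."""
    return [Proposition(s, VERB_RELATION.get(v, GENERIC_RELATION), t)
            for s, v, t in collocations]

def templates(collocations):
    """Group collocates into (type, relation, type) templates with their verbs."""
    out = defaultdict(set)
    for s, v, t in collocations:
        key = (SEMANTIC_TYPE.get(s, "?"),
               VERB_RELATION.get(v, GENERIC_RELATION),
               SEMANTIC_TYPE.get(t, "?"))
        out[key].add(v)  # the verbs that lexicalize this template
    return dict(out)

collocations = [("erosion", "erode", "cliff"),
                ("weathering", "degrade", "cliff"),
                ("erosion", "carve", "landform")]

for p in specialize(collocations):
    print(p)
print(templates(collocations))
```

Keying the templates on semantic types rather than on individual terms is what lets a single entry such as PROCESS reduces LANDFORM collect all the verbs and argument fillers that lexicalize it.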
The bottom-up method consists of extracting all verbs from the corpus with TermoStat (Drouin 2003) and classifying them into different paradigms based on the concepts they relate and the basic conceptual relations they express. These paradigms will be inspired by the patterns and implicatures of the PDEV and by the lexical domains described in Faber and Mairal (1999). The analysis of verbs and arguments will contribute to refining our semantic relations and categories as well as to populating the phraseological module.
Keywords: phraseology, specialized discourse, TKB, categorization

Contrastive analysis of past-time reference in French and Chinese: a study of a narrative corpus
Xingzi Zhang
Laboratoire – Université Paris III - Sorbonne Nouvelle – France

Contrastive linguistics is considered a branch of applied linguistics that compares the micro-systems of two (or possibly more) languages in order to facilitate their teaching and learning. It is a classic branch of linguistics whose origins go back to the United States in the 1950s. Two works deserve mention: Uriel Weinreich's (1953) study of language contact and Robert Lado's (1957) book, which is considered the founding work of the discipline. We adopt this method, drawing on our corpora, to compare how past time and aspect are referred to and to study the temporal organization of narratives. In French, verbal morphology expresses both tense and aspect. Among the past tenses, the narrative present, the passé composé, the imparfait, the plus-que-parfait and the passé simple are commonly used. Chinese is a Sino-Tibetan language that is very distant from French.
It lacks the verbal morphology of Indo-European languages and is considered an aspectual language, which uses aspectual particles (-le, -zhe, etc.) or constructions (resultative verb compounds (RVCs), verb reduplication, etc.) to express temporality.
Corpus: We compare written narratives based on a silent film produced by two groups: native French speakers (GF, n=8) and native Chinese speakers (GC, n=8). To make them tell the story in the past, we told them that the situation shown in the excerpt had taken place a week earlier, and they had to describe in detail what they had seen.
Results: Comparing the narratives written by the Chinese and French participants, we observe several differences in how the past is marked in the two languages:
- In French, native speakers systematically use verbal morphology to mark tense in a narrative, whereas in Chinese past time is indicated explicitly by temporal adverbs. For aspect, native Chinese speakers use aspect morphemes such as -le, -zhe and zai-. Moreover, these morphemes are optional and many clauses have none; in particular, most clauses expressing imperfective aspect carry no explicit marking. We also note that in Chinese the type of process (lexical aspect) is less flexible than in French and can itself indicate aspect.
- To mark anteriority in the narrative, native French speakers use the plus-que-parfait, the semi-auxiliary venir de, the past participle, or the passé composé, which is in fact an erroneous substitute for the plus-que-parfait.
Native Chinese speakers, in the absence of verbal morphology, mark anteriority with lexical means such as ganggang / gangcai ("a moment ago"), or with the morpheme -le or the shi...de construction ("it is ... that"), which mark perfective aspect in indirect style to refer to a situation that took place earlier. There are also unmarked clauses, in which case contextual information makes it possible to identify anteriority.
- Native French speakers tend to narrate the story sequentially, whereas native Chinese speakers narrate it in detail: actions, character descriptions and explanations of situations are interwoven.
Keywords: contrastive analysis, verbal morphology, the past, perfective aspect, imperfective aspect, anteriority

The acquisition of verbs of change: an analysis of the interlanguage of learners of Spanish (L1 Swedish)
Ester Fernández
University of Gothenburg (GU) – Sweden

This paper addresses the acquisition of verbs of change by Swedish-speaking learners of Spanish as a foreign language (ELE). Spanish has a considerable number of verbs that express the notion of change (ponerse, volverse, hacerse, convertirse en, etc.). They differ semantically (Morimoto and Pavón Lucero, 2007), since each of them, together with its complement, expresses a different way in which the change takes place (change of entity, processual change and resultative processual change). Swedish has the verb bli, a general verb that can express virtually any type of change. How are these verbs, which do not exist or have no exact equivalent in the learners' L1, acquired? What linguistic forms do Swedish-speaking learners use to describe events of change in Spanish?
The aim of this paper is to present the results of a pilot study carried out over one academic semester with a group of Swedish-speaking learners (N=20) at different proficiency levels (between A2 and B2). The participants were taking the first-year Spanish course (Grundkurs) at two Swedish universities. We used a written task (narrating a story from a series of pictures) to elicit language samples referring to change. The task was carried out twice, at the beginning and at the end of the course. It was also completed once by a group of native Spanish speakers (N=24). We found it difficult to identify obligatory contexts, since the native speakers tended to vary their choice of verbs when describing events of change. This led us to study the learners' choice of forms from a variationist perspective. This approach comes from sociolinguistics (Labov 1972) but has proven useful in the study of second language acquisition (Tarone 1979, 1983, 2007; Ellis 1985, 1999; Gesslin 2010; Gudmestad 2006; 2012). The same factors (linguistic and extralinguistic) that determine variation in native speech are responsible for the variation found in learner production. We applied a meaning-to-form analysis (Bardovi-Harlig 2007; 2014). First, we identified the contexts in which the learners had expressed change and extracted all the verbal and lexical forms used, coding them for a set of linguistic variables (type of change described, type of complement the form combines with, etc.). We then compared their use across the learners' proficiency levels, across the two administrations of the task, and against the native speakers.
The results show that the learners use a variety of forms to express particular types of change (change of entity, processual change and resultative processual change). The pseudo-longitudinal design of the study reveals trends in the development of the grammatical subsystem of verbs of change in the learners' interlanguage. At the beginning of the semester, for example, there is an overuse of verbs such as ser and estar, which lack the dynamic aspect characteristic of verbs of change. By the end of the semester, these verbs are being replaced, to a greater or lesser degree, by verbs of change more typical of the target language.
Keywords: verbs of change, notion of change, Spanish as a Foreign Language, interlanguage, variation

Detecting and tagging metadiscourse strategies in research articles: METOOL
María Luisa Carrió-Pastor, Universitat Politècnica de Valencia (UPV), Spain
This presentation deals with the identification, tagging and comparison of the metadiscourse strategies used in Spanish and English in the register of scientific texts, and with the analysis of how these strategies vary across the two languages. The research is part of the project "Identificación y análisis de las estrategias metadiscursivas en artículos científicos en español e inglés (IAMET)" (Identification and analysis of metadiscourse strategies in scientific articles in Spanish and English). Within the scientific register, we selected three clearly distinct disciplines, engineering, medicine and linguistics, to determine variation in the use of metadiscourse strategies. To do so, we draw on the metadiscourse categories identified by Hyland (1998, 2005), Mur Dueñas (2011) and Briz (2001, 2007) to identify the elements that make up each category and establish their frequencies, in order to carry out contrastive studies across disciplines and between Spanish and English.
Our starting hypothesis is that metadiscourse strategies are used differently in English and Spanish, which may affect the effectiveness of communication when these languages are used as foreign languages. The objectives are, on the one hand, to analyse metadiscourse strategies in English and Spanish in several disciplines of the scientific register and, on the other, to detect the variation that appears across these languages and disciplines. The aim is therefore twofold: first, to characterise scientific discourse and the rhetorical strategies it uses to convince the reader; and second, to identify patterns of variation in the strategies analysed so that they can be used in teaching Spanish and English. This is done with the METOOL tool, designed at the Research Institute for Information and Language Processing (University of Wolverhampton) to tag and identify the rhetorical elements of discourse. The nuances writers give to a language in order to persuade the reader are of interest both to academic writers and to teachers of academic language. Achieving our objectives, namely identifying and analysing variation in the use of rhetorical strategies in research articles, therefore benefits both researchers and writers in this genre, who will learn whether they use rhetorical elements appropriately and whether they achieve their goal of convincing the reader of the importance of their research. Through corpus analysis and statistical measurement of how well a text engages the reader and persuades them of its arguments, the use of persuasion strategies can be measured and alternatives proposed.
To carry out this project, we will first compile the English and Spanish corpora in the three disciplines; second, identify and tag the metadiscourse categories; and third, classify and analyse the metadiscourse strategies in both languages and all three disciplines to determine variation, giving examples of each case to establish its nature. Although metadiscourse strategies have been studied from various angles, no existing work both addresses variation in the use of these strategies and classifies and contextualises the elements to be included in each category.
Keywords: metadiscourse, comparative analysis, analyser, research articles

The economy on the verge of a nervous breakdown: medical metaphors in economic news discourse
Ismael Ramos Ruiz, Universidad de Granada, Spain
Metaphor was studied as a literary device until the advent of cognitive linguistics, when it also came to be regarded as a cognitive device that forms part of our conceptual system. Metaphor is therefore present both in general language and in specialised language, as in the case of economics (Resche and Colin, 2016; Wang, Runtsova and Chen, 2013). Accordingly, the use of metaphor in economic news discourse has been documented (e.g., Nerghes et al., 2015), including medical metaphor specifically (e.g., Arrese, 2015). Our hypothesis is that if the economy is conceived of as a living organism, many of the illnesses that affect human beings will be exploited in metaphorical mappings, as is the case with mental and behavioural disorders.
Our objectives are therefore:
• to identify and analyse which medical terms related to mental and behavioural disorders appear in this discourse, and what relations hold between these terms and other terms in the text;
• to establish syntactic and semantic classification criteria for categorising these metaphorical lexical combinations.
First, we set up a theoretical framework based on Conceptual Metaphor Theory (Lakoff and Johnson, 1980, 1999), which helped us understand the structure of the metaphors and analyse them, and on Frame-Based Terminology (Faber et al., 2012), which we used to define the syntactic and semantic criteria for categorising the metaphors. Second, we built a specialised corpus of economic news texts from the Spanish press, drawn both from newspapers devoted to economics (e.g., El Economista) and from the business sections of the national dailies El País and El Mundo. To select texts containing metaphors, we used an adaptation of the Metaphor Identification Procedure proposed by the Pragglejaz Group (2007). Third, after analysing the corpus and extracting the concordance lines containing metaphors, we established syntactic criteria (adapting the proposal of Corpas Pastor, 1996) and semantic criteria (based on a prototypical conceptual event in which semantic categories are defined) to classify these metaphorical lexical combinations.
Besides defining semantic categories, the conceptual events show the semantic relations between the categories, such as "causes" or "affects", and, applying Conceptual Metaphor Theory, the mapping of the medical domain onto the economic domain. Below are some examples of metaphors taken from the press, with their syntactic and semantic categorisation:
• Estamos ante un nuevo brote psicótico de los mercados [We are facing a new psychotic outbreak in the markets] (El Mundo 2012): Noun + Adjective + Preposition + Noun (SAPS). PROCESS
• El problema radica en la incapacidad y pánico de nuestra economía [The problem lies in the incapacity and panic of our economy] (Cinco Días 2009): Noun + Preposition + Noun (SPS). SIGNS AND SYMPTOMS
• El estrés de los bancos griegos e italianos [The stress of Greek and Italian banks] (Expansión 2014): Noun + Adjective (SA). PATIENT
Keywords: conceptual metaphor, corpus linguistics, phraseology, economic journalism, conceptual events

The discursive framing of numerical data in popular science texts
Riham El Khamissy, Département de français, Faculté des langues (AL ALSUN), Université Ain Chams, Rue Khalifa Maamoun, Abbaseya, Cairo, Egypt
Numerical data have the advantage of producing in the recipient an effect of incontestability and irrefutability. In the media, journalists may report a statistic in such a way that it becomes the central element of the article (quantification of the news). In that case, the explanation of the figures is secondary information. More often, however, statistics and percentages serve to support the text itself, to argue for statements and to lend legitimacy to information and ideas.
Our work aims to understand how journalists handle numerical information in popular science articles (in popular science media and in the general press), particularly those dealing with the Zika virus, which has been widely debated over the past two years. We chose popular science texts rather than scientific texts because one of the main aims of our work is to highlight the intention to steer the recipient towards a given attitude, or at times even to manipulate them, a phenomenon that in our view is more apparent in popular science texts aimed at a general, largely non-specialist public. We started from a corpus of 13,090 French-language documents indexed in the Europresse.com database between 1 January 2015 and 31 December 2016, a period during which the virus spread remarkably worldwide. We will first explore the formal data, examining the choice between the typical, conventional form of a number (in digits) and its spelled-out form (in words). Next, we will analyse figures in their immediate linguistic environment (the co-text), which can modify the information a figure conveys in terms of exactness, precision and/or argumentative orientation, depending on the journalist's communicative motivation. We will then analyse quantifiers (jusqu'à, près de, aux environs de, autour de, aux alentours de, etc.). Our contribution follows the work of Adler and Asnès (2004, 2007, 2013) and of Ducrot (1983, 1995, 2002), developed further by Doury and Moirand (2004). The question we address here is not the use of figures as such, but rather how they are put into discourse and made to serve journalists' aims of influencing public opinion.
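The co-text analysis described above, recording whether a figure is written in digits or in words and which approximator immediately precedes it, can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure: the list of spelled-out numerals is an illustrative, non-exhaustive assumption, the regular expressions are hypothetical, and the function name `classify_numbers` is invented for this example.

```python
import re

# Approximators named in the abstract (object language: French).
QUANTIFIERS = ["jusqu'à", "près de", "aux environs de", "autour de", "aux alentours de"]

# Illustrative, non-exhaustive list of spelled-out numerals (an assumption).
# "un"/"une" are left out because they are usually articles.
NUMBER_WORDS = ["deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf",
                "dix", "cent", "cents", "mille", "million", "millions",
                "milliard", "milliards"]


def _alternation(items):
    # Longest alternatives first, so "millions" is tried before "million".
    return "|".join(re.escape(s) for s in sorted(items, key=len, reverse=True))


QUANT = _alternation(QUANTIFIERS)
WORD = rf"(?:{_alternation(NUMBER_WORDS)})"
DIGITS = r"\d+(?:[ .,]\d+)*"                # e.g. "13090", "1 500", "4,5"
LETTERS = rf"\b{WORD}(?:[\s-]+{WORD})*\b"   # e.g. "trois millions"

# Optional approximator, then a number in digits or in letters.
PATTERN = re.compile(
    rf"(?:(?P<quant>{QUANT})\s+)?(?:(?P<digits>{DIGITS})|(?P<letters>{LETTERS}))",
    re.IGNORECASE,
)


def classify_numbers(text):
    """Return one (form, quantifier) pair per number found in `text`.

    form is "digits" or "letters"; quantifier is the approximator that
    immediately precedes the number (lowercased), or None.
    """
    text = text.replace("\u2019", "'")  # normalise typographic apostrophes
    results = []
    for m in PATTERN.finditer(text):
        form = "digits" if m.group("digits") else "letters"
        quant = m.group("quant").lower() if m.group("quant") else None
        results.append((form, quant))
    return results


sample = ("Près de 1 500 cas ont été signalés, et jusqu'à trois millions de "
          "personnes pourraient être infectées. En 2016, 40 pays étaient touchés.")
print(classify_numbers(sample))
```

Tallying the returned pairs over a whole corpus would give the digits-versus-letters ratio and the distribution of approximators; a real study would also need to set dates, ordinals and percentages apart, which this sketch does not attempt.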
Result: according to our analyses, the gap between the factual or informative level on the one hand and the argumentative level on the other is often merely the reflection of a shift from official numerical results, as witnesses to the truth, to subjective substitutes for reality.
Keywords: figures, quantifiers, argumentative operators, popular science texts, press

Modality in political discourse: phraseological segments in language and in discourse. A textometric exploration of a corpus of US presidential debates (1960-2016)
Marion Bendinelli, Edition, Littératures, Langages, Informatique, Arts, Didactique, Discours (ELLIADD), Université de Franche-Comté, 30 rue Mégevand, 25030 Besançon cedex, France
Our paper focuses on identifying phraseological segments that include one or more verbal markers of modality (notably can, must, will, need to, have to), and then on their enunciative and discursive analysis. The work is based on the tool-assisted exploration of a corpus of English-language political discourse, encoded in XML-TEI, comprising all the presidential debates held in the United States since 1960. The exploration is carried out with the textual data analysis software TXM (Heiden, Magué, Pincemin 2010) and Hyperbase (Brunet 2010), making particular use of the modules for displaying and/or computing concordances, repeated segments and co-occurrents. This exploration will bring out preferred associations (i) among various modality markers, or (ii) between modality markers, subject noun phrases (noun groups or pronouns) and verbs or, more broadly, verb semantic classes (communication, existence, activity verbs, etc., following the classification of Biber, Johansson, Leech, Conrad and Finegan 1999).
Some of these associations have been noted in work describing discourse genres (Dedaić 2004; Née, Sitri, Veniard 2014), specialised texts (Gotti and Dossena 2001; Labbé and Labbé 2013) or English grammar (Biber et al. 1999). Here, established on the basis of statistically significant co-frequency in the corpus, they will be analysed as phraseological segments, namely collocations (Firth 1957) and colligations (Hoey 2005), both of English in its US variety and of political discourse. In the first part of the study, we will show, through various manipulations in TXM and Hyperbase, how the textometric approach brings to light phraseological segments such as we must + non-aspectual action verb ("we must act") or mental verb + NP + can ("I believe that we can work together") for the modals must and can. Computing co-occurrents will reveal phraseological segments that are discontinuous (the items are not necessarily adjacent) and ordered (the order in which the items appear is constrained), of the type can + must and/or have to + will ("we can fight terrorism [...], it has to be [...], therefore we must fight terrorism and we will"). By comparing these segments with data from the reference corpus COCA (Corpus of Contemporary American English), built by Mark Davies and freely searchable online, we will show that some are specific to political discourse, while others, more cross-cutting because they occur in various discourse genres, seem to belong more to the language system itself.
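The discontinuous, ordered segments mentioned above (can + must, have to + will) can be approximated by counting ordered pairs of modal markers within a token window. The sketch below is hypothetical and deliberately simpler than TXM's co-occurrence computation: it returns raw counts with no statistical test, uses a naive tokenizer, merges "have to"/"need to" into single tokens by assumption, and does not separate modal will from other uses. The names `ordered_pairs`, `MARKERS` and `MULTIWORD` are illustrative.

```python
import re
from collections import Counter

# Multi-word modal markers are merged into single tokens (a simplifying assumption).
MULTIWORD = {("have", "to"): "have_to", ("has", "to"): "have_to",
             ("had", "to"): "have_to", ("need", "to"): "need_to",
             ("needs", "to"): "need_to"}
MARKERS = {"can", "must", "will", "have_to", "need_to"}


def tokenize(text):
    """Naive lowercase tokenizer: runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def merge_multiword(tokens):
    """Replace two-token markers such as 'has to' with one token."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD:
            merged.append(MULTIWORD[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def ordered_pairs(text, window=20):
    """Count ordered (first, second) marker pairs where `second` occurs
    after `first`, at most `window` tokens later (raw counts only)."""
    tokens = merge_multiword(tokenize(text))
    hits = [(i, t) for i, t in enumerate(tokens) if t in MARKERS]
    counts = Counter()
    for a, (i, first) in enumerate(hits):
        for j, second in hits[a + 1:]:
            if j - i > window:
                break  # hits are in text order, so later ones are further away
            counts[(first, second)] += 1
    return counts


debate = ("we can fight terrorism [...], it has to be [...], "
          "therefore we must fight terrorism and we will")
print(ordered_pairs(debate, window=5))
```

Keeping the pair ordered, rather than counting unordered co-occurrence, is what captures the constraint on item order described in the abstract; a significance measure would have to be added on top of these raw counts.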
Theoretical elements from the enunciative analysis developed by Antoine Culioli and taken up by Gilbert (2001) and Deschamps (2001), namely the notions of construction and traversal of notional otherness, will also shed light on the enunciative functioning of the modal sequences and on their rhetorical function. Along the way, this paper will combine a computer-assisted approach to a corpus, statistical analysis of textual data, and enunciative and discursive analyses; it thus aims to contribute to a better understanding of the linguistic and discursive characteristics of political discourse.
Keywords: phraseological segments, collocation, colligation, political discourse, presidential debates, United States

Translating English "megaterms" such as erythrocyte invasion-inhibitory response: an approach based on corpora and discourse analysis
Mojca Pecman, Natalie Kübler, Alexandra Mestivier, Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot (EA3967), Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France
Corpus linguistics has allowed linguists not only to base their observations on authentic data but also to study how language evolves and what its current trends are. In specialised translation, both in professional settings and in training that prepares future translators for those settings, the ability to grasp the current dynamics of specialised languages has become a major factor in translation quality. Combined with the wide dissemination of specialised information and the rapid evolution of knowledge, this dynamic, which is plainly visible in linguists' corpora, appears to be intensifying.
This study aims to show how combining corpus analysis with discourse analysis makes it possible to capture the dynamics of specialised discourse and to find translation solutions. We illustrate this with the translation problems posed by complex noun phrases in specialised English, such as erythrocyte invasion-inhibitory response. Complex noun phrases make it possible to compact or condense information, a salient characteristic of English specialised discourse. Mestivier-Volanschi's (2015) diachronic study of the evolution of English compound adjectives provides evidence of the rising frequency of these structures. Gledhill (1999) and Jaime-Sisó (1993) study the shift in specialised text titles from a nominal format to a sentence-like format in which complex compounds express an argument structure economically, through a mechanism they call "miniaturisation". Maniez's work (2007, 2008) on medical English and complex noun phrases also discusses the propensity of English for nominalisation and how a database of equivalents for complex NPs would help translators. Indeed, the great flexibility of English in forming noun groups and noun phrases contrasts with French, which tends to keep argumentation in clausal form. We will first present the general framework of this research, which is part of the methodology used at Université Paris Diderot to teach specialised translation to Master's students.
This methodology is based on terminological analysis (Pecman and Kübler 2011) and is evaluated through quantitative and qualitative analyses of corpora of annotated translations (Kübler et al. 2016). These analyses allow the teaching methodology to be improved incrementally from year to year. We will show how this methodology combines teaching practice with research in specialised translation, placing our study in the line of work on corpus-based translation teaching (Aston 1999, Zanettin et al. 2004, Beeby et al. 2009, Castagnoli et al. 2011) and on evaluating the contribution of corpora in the classroom (Bowker & Bennison 2003, Frankenberg-Garcia 2009, Loock et al. 2013, Loock 2016). We will also illustrate the diachronic evolution of adjectival compounds revealed by Mestivier-Volanschi (2015) to show why the tendency of specialised English to use complex NPs must be taken into account. Second, we will present an analysis of the English noun phrase erythrocyte invasion-inhibitory response and try to show the strategies used to convey this type of information in French (cf. les réponses immunes protectrices... médiées par des anticorps... inhibent l'invasion des érythrocytes).
Keywords: specialised translation, translation teaching, corpus-based approach, discourse analysis, complex nominal groups

Advertising translation: a corpus-based approach
Isabel Comitre Narvaez, Université de Málaga (UMA), Campus de Teatinos s/n, 29071 Málaga, Spain
If we look closely at advertising messages for certain products, we notice the massive presence of technical, even pseudo-scientific, vocabulary (Remaury, 2000).
Major brands use this pseudo-scientific vocabulary as a key persuasive argument to gain credibility. Indeed, medical rigour and scientific authority act as a guarantee for the prospective consumer (Valdés Rodriguez, 2004). This is the case with the lexicon of so-called cosmeceutical products (cosmetic + pharmaceutical), which reflects at once the evolution of a medico-aesthetic society, technological progress in the field and scientific innovation in this sector. Within the European Union, translation raises issues beyond simple lexical equivalence, since it also involves each country's legislation. Translation is nonetheless at the heart of our study, whose main objective is to identify the principal translation strategies used by translators in advertising. To this end, we analysed a corpus of bilingual advertisements compiled according to the criteria proposed by Guidère (2009, 2011). Our bilingual comparable corpus contains about 750 terms in French and their Spanish equivalents. This ad hoc corpus was drawn from the official websites of major cosmeceutical brands. We identified this lexicon by recording, on those websites, the various processes that give the products their pseudo-scientific air (prefixal and suffixal derivation, borrowing, compounding, abbreviation, acronymy, alphabetic or numeric initialisms, confixation, blends, use of capital letters, etc.). After this first step, we compared the vocabulary found on the same websites in Spanish to bring out the translation strategies used.
In a form of communication such as advertising, where the visual and the verbal coexist, we naturally took the images in the advertisements into account, since they contribute to the overall meaning of the advertisement and may even carry that meaning on their own. This is why we chose semiotranslatology (Guidère, 2000, 2009, 2011; Guillaume, 2016) as our theoretical and methodological framework: this translatological paradigm takes into account the importance of non-verbal signs (images, characters, setting, emotions, sensations) in the transfer of meaning during translation. In particular, we adapted the concept of the "translatological cube" (Guidère, 2011, p. 112) to our object of study. This analytical model served as our starting point and allowed us to define three levels of analysis specific to advertising: conceptions (the general ideas of the advertisement conveyed by the linguistic message); perceptions (the sensory information conveyed by the iconic and sound messages); and intentions (implicit cultural and ideological discursive content). The resulting model allows us, on the one hand, to identify and classify the pseudo-scientific lexicon characteristic of cosmeceuticals carried by the verbal message and, on the other, to grasp the meaning conveyed by the image and all the sensory information carried by the non-verbal message in the advertisements in our corpus, in order to uncover the translation strategies underlying the choices made by translators of advertising campaigns.
Keywords: advertising translation, bilingual comparable corpus

The lexis-grammar continuum in a specialised genre, based on in-house corpora
Laurent Gautier (1, 2), Cyril Nguyen Van (2)
1 Maison des Sciences de l'Homme de Dijon USR3516 (MSH Dijon), Université Bourgogne Franche Comté, Esplanade Erasme, 21000 Dijon, France
2 Centre Interlangues Texte Image Langage (TIL), Université Bourgogne Franche Comté, Université de Bourgogne, Faculté de Langues et Communication, 2 Bd Gabriel, 21000 Dijon, France
[Problem and objectives] This proposal, which falls under strand 5 of the call, "Corpus, contrastive studies and translation", examines what specialised in-house corpora (Loock 2016a, b) contribute, for professional translation and translator training, to uncovering the lexico-grammatical patterns inherent in highly constrained textual moulds (Gautier 2009) in translated language(s). Following Kübler/Gledhill (2016: 75), we discuss in particular the idea that systematically querying homogeneous corpora yields a verified, holistic representation of the interactions between lexis and grammar, especially when each of the two components draws on (very) small repertoires compared with the possibilities offered by the language system in question. For the translator, these patterns can act as a "subtext" from which translation choices are made "naturally", at the interface between the conceptual content of the source text and its wording and textualisation.
Data: The problem will be instantiated with a closed, manually compiled corpus consisting of the European Central Bank's 2015 and 2016 press conferences in their original English version (19,883 words) and their French (23,931 words), German (19,810 words) and Dutch (21,324 words) translations.
Although the corpus is at first sight parallel (Teubert 1996), each subcorpus will be considered in its own right, as a corpus of translated language, with comparison to the original playing only a peripheral role.
Methodology: We will start from the frequency of noun (N) terms and systematically examine their combinations, particularly with verbs, in order to draw up, for each language, a systematic inventory of the argument structures in which they occur. In doing so, the formulaic dimension, essential to the fluency of the translator's text, will be foregrounded, especially for languages, German and Dutch first among them, that rely on predicative nouns combined with preferred, unpredictable support verbs:
(01) Insbesondere müssen die entschlossene Umsetzung_NPRED von [Güter- und Arbeitsmarktreformen]_ARG sowie die Bemühungen_NPRED [zur Verbesserung des Geschäftsumfelds für Unternehmen]_ARG in einigen Ländern intensiviert_VSUP werden. ("In particular, the resolute implementation of product and labour market reforms and the efforts to improve the business environment for firms must be stepped up in some countries.")
(02) Ten tweede was, hoewel de tussen juni en september vorig jaar genomen monetairbeleidsmaatregelen tot een aanzienlijke verbetering_NPRED [in termen van de koersen op de financiële markten]_ARG hebben geleid_VSUP, dit niet het geval voor de kwantitatieve uitkomsten. ("Second, although the monetary policy measures taken between June and September last year led to a considerable improvement in terms of financial market prices, this was not the case for the quantitative outcomes.")
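The first methodological step, going from a frequent predicative noun to the verbs it combines with, can be sketched as a window-based collocate count over a lemmatised, POS-tagged corpus. Everything here is an assumption for illustration: the simplified tag scheme ("N", "V"), the window size, the function name `verb_collocates`, and the hand-tagged, shortened version of example (01). A real extraction of support verbs would more likely rely on a dependency parse than on a token window.

```python
from collections import Counter


def verb_collocates(sentences, node, window=10):
    """Count verb lemmas within `window` tokens of each occurrence of the
    noun lemma `node`, without crossing sentence boundaries.

    `sentences` is a list of sentences, each a list of (lemma, pos) pairs;
    "N" (noun) and "V" (verb) belong to a hypothetical simplified tagset.
    """
    counts = Counter()
    for sent in sentences:
        for i, (lemma, pos) in enumerate(sent):
            if pos != "N" or lemma != node:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j][1] == "V":
                    counts[sent[j][0]] += 1
    return counts


# Hand-lemmatised, shortened rendering of example (01) (illustrative tags).
ex01 = [("insbesondere", "ADV"), ("müssen", "V"), ("die", "DET"),
        ("entschlossen", "ADJ"), ("Umsetzung", "N"), ("von", "P"),
        ("Reform", "N"), ("sowie", "CONJ"), ("die", "DET"),
        ("Bemühung", "N"), ("in", "P"), ("einige", "DET"),
        ("Land", "N"), ("intensivieren", "V"), ("werden", "V")]

print(verb_collocates([ex01], "Umsetzung"))
```

Summed over all sentences and ranked by frequency, such counts give a first inventory of candidate support verbs per noun and per language, which the manual analysis described in the abstract would then have to validate.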
Next, using n-gram analysis, we will examine recurrent structures, analysed here as discourse routines whose use, beyond terminology and conceptual collocations, ensures that the text belongs to the genre, as in (03):
(03) D: nach wie vor, mit Blick auf; F: au cours des prochains mois, (x) des prix à moyen terme; NL: (van) de additionele aankopen van, op de middellange termijn
Discussion: The results will be discussed, on the one hand, in relation to the use of corpora, in-house corpora in particular, in translator training, beyond their "hidden" presence in many CAT tools, starting with translation memories; and, on the other, in relation to the often rigid separation between a grammar module, a terminology module and a "stylistics" module, a separation which, for (very) constrained specialised text types, falls apart as soon as one starts from language in use as attested in corpora.
Keywords: in-house corpus, genre, lexis, grammar, discourse routine, terminology, LSP

The discourse marker "donc" in two dialogue corpora of different kinds
Gemma Delgar Farrés, Université de Vic-Université Centrale de Catalogne (UVic-UCC), C. de la Laura, 13, 08500 Vic (Barcelona), Spain
Our study analyses the discourse marker donc in a corpus of real conversation, the Minnesota Corpus (Kerr, 1983), and in a corpus of theatre dialogue, Beaumarchais's play Le Mariage de Figaro. As a starting point, we formulate the following research questions: Are the uses of donc found in the two corpora the same? How are these uses distributed in the corpus of natural conversation and in the corpus of theatre dialogue?
Previous linguistic studies of donc indicate that this discourse marker has three main uses: argumentative or logical marker, resumption marker and interactive marker (Trésor de la langue française, 1971-1994; Zenone, 1981; Hybertie, 1996; Hansen, 1997; Pellet, 2005; Bolly and Degand, 2009; Delgar, 2010, 2013). A review of these approaches leads us naturally to Pellet's description of donc: "In other words, the inferential aspect of donc may be viewed as a characteristic which is present to varying degrees depending on the function that the discourse marker fulfills in a particular context. The highest degree of 'inferentiality' is of course associated with the use of donc to mark results and conclusions (argumentative). It is also high with donc to mark recapitulations, confirmation requests, and resumptions. It seems 'less high' with the frameshift function (foregrounding) and with the discursive (emphasis) function." (2005: 103). First, we studied the occurrences of donc in the first two sections of the Minnesota Corpus; second, we compared the results with those we had previously obtained for Le Mariage de Figaro. These data show that the uses and semantic-pragmatic values of donc are almost the same in the two corpora, although some values are absent from one of them, either because they are more restricted uses of the marker in dialogue or because they are more characteristic of either authentic conversation or theatre dialogue.
By contrast, the distribution of these uses within the corpora differs: in the authentic-conversation corpus it reflects the workings of real communication, whereas in the theatre corpus it depends on the workings of dialogue as a writing project predetermined by the author.

Keywords: semantic and pragmatic values, discourse marker, conversation, theatre, corpus

Learner vs. professional translational behavior: The case of discourse markers
Maria Kunilovskaya (1; presenter, corresponding author), Natalia Morgoun (2)
(1) Tyumen State University (UTMN), 625003, Volodarskogo 6, Tyumen, Russia
(2) Lomonosov Moscow State University (MSU), 119991, Moscow, GSP-1, 1 Leninskiye Gory, Russia

The main motivation behind this research is to understand the linguistic behavior of translation students in their mother tongue when translating. Which linguistic features, if any, distinguish their work from professional translations, and can these features be measured and targeted in educational programmes? A further aim is to describe the current professional norm, measured against non-translated reference texts, for a given translation direction in a given language pair. The investigation is limited to mass-media texts and explores the frequencies of connectives in English originals and in Russian translations and non-translations as one possible indicator of these differences. The degree of explicit text connectedness has been on the research agenda of computational and corpus linguistics for many years. It is an important textual feature that reflects peculiarities of text production under different socio-pragmatic conditions.
It has been found that genres and entire languages vary not only in the inventory of means used to signal relations between parts of a text but also in how intensively these means are used (Liu, 2008; Fabricius-Hansen, 2005). Cross-linguistic differences in textual strategies affect translations and contribute to the source-language-independent translationese hypothesized by Baker (1993). This has been used to detect differences between parallel corpora that general similarity measures miss (Cartoni, 2011). Discourse marker frequencies are used to establish differences between translations and non-translations and are interpreted as a linguistic indicator of several tendencies in translation, such as explicitation, simplification and convergence (Olohan, 2001; Chen, 2006; Denturk 2012). It is important for this research that the intensity of 'being a translation' can be related to translation quality (Scarpa, 2006) and to translational norms operating within a particular translation direction and language pair (Mauranen, 2004). We set out to reveal tendencies in translational behaviour at different competence levels by describing the frequency distributions of two functional types of discourse markers (connectives and epistemic commentary markers) in learner and professional translations, compared with the source texts and with non-translations. We compare data from a parallel translational learner corpus and a corpus of professional translations with customized selections from the English and Russian national corpora. The research corpus totals 10 million tokens. Using independent predefined lists of target items for each language, we explore cross-linguistic differences and their influence on the two types of translation.
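The abstract does not say which significance test is used to compare the frequency of a list item across two corpora. As a hedged illustration, the sketch below assumes the log-likelihood (G2) statistic commonly used for keyness, computed from an item's raw frequency in two corpora of different sizes (for example, learner translations vs. Russian non-translations). The function names, the 3.84 cut-off (p < 0.05 with one degree of freedom) and the toy counts are assumptions, not the authors' procedure.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 statistic for one item seen freq_a times in corpus A (size_a tokens)
    and freq_b times in corpus B (size_b tokens)."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def distinctive(freqs_a, size_a, freqs_b, size_b, critical=3.84):
    """Items whose G2 exceeds `critical` (3.84 ~ p < 0.05 with 1 d.f.),
    labelled 'over' or 'under' by their relative frequency in A vs. B."""
    result = {}
    for item in set(freqs_a) | set(freqs_b):
        fa, fb = freqs_a.get(item, 0), freqs_b.get(item, 0)
        g2 = log_likelihood(fa, size_a, fb, size_b)
        if g2 > critical:
            direction = "over" if fa / size_a > fb / size_b else "under"
            result[item] = (round(g2, 2), direction)
    return result
```

With toy counts, an item seen 20 times in 1,000 tokens of translations and never in 1,000 tokens of non-translations, `distinctive({"однако": 20}, 1000, {"однако": 0}, 1000)` returns `{"однако": (27.73, "over")}`, i.e. a candidate "translationally distinctive" connective that would still need the manual check on aligned data.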
We test three possible tendencies: translations follow the source-language pattern (interference); translations follow the target-language pattern (normalization); or translations show an independent, idiosyncratic (over)use of connectives (explicitation). Observations are made for the overall frequencies of the list items, for their semantic groups and for individual items. The last approach reveals translationally distinctive connectives (Chen, 2006), that is, items whose frequencies in translations differ significantly from those in originals. Manual analysis of aligned parallel data is used to verify the inferences drawn from the statistical analysis and gives insight into typical errors that significantly lower the textual quality of learner translations.

Keywords: translational learner corpora, discourse markers, interference, frequency distribution, text-level linguistics, cohesion, translation studies, TQA

Nominal appositions in French and Slovene: a contrastive study on the FraSloK corpus
Adriana Mezeg (presenter, corresponding author: adriana.mezeg@ff.uni-lj.si)
Faculty of Arts, Department of Translation, Askerceva 2, 1000 Ljubljana, Slovenia

This paper addresses a grammatical phenomenon that, following Combettes (1998), we call nominal appositions, one of the types of detached constructions, whose main properties are freedom of position in the sentence, separation from the rest of the sentence by a comma, secondary predication, and coreference with the subject of the sentence (Combettes 1998). It is a noun phrase that is never preceded by a determiner and that stands in a relation with the main subject expressible by the verb être ('to be'), for example:

Chef du gouvernement provisoire de la République française, il a signé à Moscou, le 10 décembre 1944, un "traité d'alliance et d'assistance mutuelle", qu'il qualifie de "belle et bonne alliance". ['Head of the provisional government of the French Republic, he signed in Moscow, on 10 December 1944, a "treaty of alliance and mutual assistance", which he describes as a "fine and proper alliance".']
(Le Monde diplomatique, April 2008)

This paper analyses only the Slovene translations of French nominal appositions placed at the beginning of the sentence, these being the most interesting from a contrastive point of view. Nominal apposition is problematic from a French-Slovene contrastive perspective and cannot be rendered in Slovene by the same structure, i.e. a detached construction, because in Slovene it does not satisfy the criterion of sentential mobility: it cannot, for example, occupy the initial position. We therefore hypothesise that grammatical explicitation is the rule when these structures are translated into Slovene, translators having to replace them with other structures. The contrastive analysis will be based on examples extracted semi-automatically from the French-Slovene parallel corpus FraSloK, which contains press articles (Le Monde diplomatique, journalistic subcorpus) and literary works (literary subcorpus) published between 1995 and 2008. Both subcorpora are morphosyntactically annotated and balanced in size, together containing just under 2.5 million words. Examples of sentence-initial detached nominal constructions will be extracted from the French-Slovene corpus with ParaConc (Barlow 1995), using syntactic patterns composed of morphosyntactic tags and regular expressions. According to the results of automatic retrieval and manual sorting, nominal appositions are slightly more frequent in the journalistic corpus (178 occurrences vs. 122 in the literary corpus). Often longer than the main clause, they convey, especially in journalistic discourse, information about the position and social status of the referent of the main clause.
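ParaConc is an interactive concordancer, and the abstract does not give the patterns actually used. As a rough illustration of combining morphosyntactic tags with regular expressions, the sketch below finds sentence-initial, determiner-less noun phrases detached by a comma in text tagged as `word/TAG`. The tag names (NOM, ADJ, NAM, PRP, DET, PUN) are an assumption, loosely modelled on a TreeTagger-style French tagset, and the pattern is a hypothetical approximation whose hits would still need the manual sorting described above.

```python
import re

# Hypothetical tagged format: tokens "word/TAG" separated by single spaces.
APPOSITION = re.compile(
    r"^[^\s/]+/NOM"                                  # sentence-initial bare noun (no determiner)
    r"(?: [^\s/]+/(?:NOM|ADJ|NAM|PRP\S*|DET\S*))*"   # rest of the noun phrase
    r" ,/PUN"                                        # detached from the clause by a comma
)

def initial_appositions(tagged_sentences):
    """Return the untagged text of each sentence-initial nominal apposition found."""
    hits = []
    for sentence in tagged_sentences:
        match = APPOSITION.match(sentence)
        if match:
            tokens = match.group(0).split()[:-1]  # drop the final ",/PUN"
            hits.append(" ".join(tok.rsplit("/", 1)[0] for tok in tokens))
    return hits
```

For the Le Monde diplomatique example, a tagged version starting `Chef/NOM du/PRP:det gouvernement/NOM ...` yields "Chef du gouvernement provisoire de la République française", while a sentence starting with a determiner (`Le/DET:ART chef/NOM ...`) is not retrieved.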
This study examines how Slovene translators deal with these problematic structures and aims to draw practical conclusions useful in teaching and in French-Slovene interlingual mediation. Initial results show that the content of French nominal appositions is often expressed in Slovene as the subject of the sentence, a subject complement, an object complement, or a bound construction (Combettes 1998), which, incidentally, is common in Slovene. Translating French nominal appositions into Slovene raises further problems that we observe in translation classes, notably word order, repositioning within the sentence and the use of the comma; we will try to clarify these questions in the proposed paper.

Keywords: nominal apposition, detached construction, FraSloK parallel corpus, contrastive analysis, translation

Verbal constructions with comme: from scientific writing to the academic writing of native and non-native students
Marie-Paule Jacques (1, 2; presenter), Rui Yan (1; presenter, corresponding author)
(1) LInguistique et DIdactique des Langues Étrangères et Maternelles (LIDILEM), Université Grenoble Alpes, UFR des Sciences du Langage, BP 25, 38040 Grenoble cedex 9, France
(2) École supérieure du professorat et de l'éducation de Grenoble (ESPE Grenoble), ESPE Académie de Grenoble, Université Grenoble Alpes, 30 avenue Marcelin Berthelot, 38100 Grenoble, France

Scientific writing makes abundant use of a specialised phraseology (Tutin, 2014), which takes various forms: collocations (Grossmann & Tutin, 2003), recurrent sequences (Tran, 2014), routines (Tutin & Kraif, 2016), etc.
This phraseology fulfils various rhetorical and discursive functions, for example expressing a point of view, establishing cause and effect, signalling scientific affiliation, defining terms and concepts, or providing evidence. Mastering it is therefore as important as mastering the terminology and conceptual apparatus of the discipline. We focus on the verbal construction associated with comme, which, according to a study of a corpus of research articles in the humanities and social sciences (SHS), often introduces "metaenunciative comparatives" (Debaisieux & Martin, 2010, p. 321, cited by Grossmann, 2014, p. 764): comme nous l'avons montré/vu/souligné/dit, comme nous le verrons, comme nous l'expliquons, comme illustré/indiqué dans la figure, etc. ['as we have shown/seen/stressed/said', 'as we will see', 'as we explain', 'as illustrated/indicated in the figure']. These examples highlight the construction's contribution to scientific argumentation: it fulfils "a metatextual and/or evidential function" (Grossmann, 2014), owing to the massive presence after comme of verbs of observation (constater, voir) or of communication (dire, expliquer, souligner, montrer, indiquer). The construction thus serves to refer to a textual element or a (fragment of) discourse that acts as evidence or as a reminder. We take the perspective of how novice writers learn this construction and plan to study its use by comparing texts written by native and non-native students with texts by researchers, regarded here as experts in scientific writing.
In line with work on phraseological phenomena in the writing of native and non-native speakers (Hyland & Milton, 1997; Neff, Ballesteros, Dafouz, Martínez, & Rica, 2004; Granger & Paquot, 2009), we consider that being a novice in scientific writing confronts native and non-native students alike with the difficulties of using scientific phraseology. However, as Granger and Paquot (2009) stress, the difficulties of non-native students deserve to be taken into account and addressed specifically, since these students also have problems with their command of the language. We will therefore examine the use of verbal constructions with comme by native and non-native students on the basis of two corpora of master's theses, contrasting them with a corpus of research articles in the humanities and social sciences. Initial observations reveal both quantitative and qualitative differences: 1) compared with the experts, both groups of students underuse these constructions; 2) the students' uses differ from those of the experts, notably in the verbs associated with comme constructions; 3) non-native students make lexical errors in these constructions.

Keywords: verbal construction, scientific writing, native/non-native students, corpus linguistics

Meeting the reader in academic writing: reader pronouns in English and French
Niall Curry (presenter)
University of Limerick (UL), Limerick, Ireland

Corpus-based contrastive analysis is experiencing a notable revival of interest owing to its role in a world of increasing 'interlingual and intercultural communication' (Granger 2003, p. 18).
This revival is largely driven by advances in corpus linguistics over the last 30 years, during which corpus-based contrastive analyses of academic writing have come to occupy a small but growing space in the literature. Much of this growth is probably due to the fact that non-native academic writers need to be informed of the writing conventions of the academic discourse communities to which they aspire (Pérez-Llantada 2010, p. 45). Research on academic writing has accordingly followed three streams (Biber 2006, p. 6) that can better inform language teaching: the study of context and text, the study of interpersonal communication, and the study of lexico-grammatical items. Although these streams are arguably interconnected, there is a surprising lack of research on interpersonal communication in academic writing that compares evaluative markers across languages. In other words, research is needed on the rhetorical devices that authors use to engage readers in academic writing, such as directives, personal asides, appeals to shared knowledge, questions and reader pronouns (Hyland 2005); this study aims to address this gap for reader pronouns in English and French academic writing. In this paper, we consider reader pronouns in English and French economics research articles and analyse their varying role as engagement markers. We treat the functions of these pronouns as a comparable common ground, or tertium comparationis, in English and French, and test their equivalence, following Krzeszowski (1990), in terms of form, location and word class. To do so, we present a corpus-based contrastive analysis of economics research articles in English and French taken from the KIAP corpus (Fløttum et al. 2006), a comparable corpus of 450 research articles: 150 each in English, French and Norwegian, with 50 per language in each of economics, linguistics and medicine. This study focuses on the English and French economics subcorpora, totalling 100 research articles. Reader pronouns are identified in each subcorpus, and their functions are categorised on the basis of a synthesis of the work of Hyland (2001; 2005) and Fløttum et al. (2006) on addressee features and reader pronouns. These reader pronouns are then analysed in terms of their formal typology, their location within the text and their morpho-syntactic properties in order to measure equivalence. The results reveal important similarities and differences at the levels of function, form, location and morpho-syntax, which are investigated both quantitatively and qualitatively. These findings contribute to the debate on the nature of English and French academic writing as writer- and reader-responsible languages, respectively, and have useful implications for teaching academic writing in both English for academic purposes and français langue académique.

Keywords: corpus-based contrastive analysis, English for academic purposes, français langue académique, academic writing

Multi-word terms: disclosing the semantic relations in noun compounds
Melania Cabezas-García (presenter, corresponding author), Pilar León-Araúz (presenter)
University of Granada (UGR), Buensuceso, 11, 18001, Spain

Noun compounds (e.g. wind power) are the units mainly used to designate specialized concepts (Nakov, 2013). These multi-word terms (MWTs) can be defined as sequences of nouns that function as a single noun (Downing, 1977), and they are distinguished by their syntactic-semantic complexity, since two concepts are juxtaposed without any clear indication of the link between them (Rosario et al., 2002).
As a result, compound terms with the same external form, such as air pollution and oil pollution (the head pollution combined with a noun modifier), can establish different semantic relations between their constituents (Location vs. Cause) (Maguire et al., 2010). Therefore, contrary to what is often assumed, the semantics of terminological noun compounds is not fully compositional, i.e. it cannot be fully construed from the meaning of their constituents. Although the ambiguity of semantic relations in noun compounds has long been studied, it remains problematic, because different interpretations can lead to different inferences, query expansions, paraphrases, translations, etc. (Hendrickx et al., 2013). The root of this issue is noun packing, which can be addressed by analysing the formation processes of noun compounds, namely predicate deletion (e.g. power system, instead of a system produces power) and predicate nominalization (e.g. energy transfer, instead of energy is transferred) (Levi 1978). The propositions underlying noun compounds make the semantic relation explicit and take the form of a predicate, its arguments (which are mandatory and make up the meaning of the verb) and adjuncts (optional complements) (Tesnière, 1976). The relation between a predicate and its complement structure is referred to as 'micro-context', a key factor in accessing the semantics of terms. This paper describes the use of paraphrases conveying the conceptual content of English two-term noun compounds (Nakov and Hearst, 2006; Butnariu and Veale, 2008; Cabezas-García and Faber, in press) in the specialized domain of environmental science. Verb paraphrases were used to access micro-contexts, which represent the syntax-semantics interface, in two-term noun compounds formed by predicate deletion. Some of these paraphrases were based on the lexico-syntactic patterns that usually convey semantic relations in real texts (Meyer, 2001; Marshman, 2006).
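As a hedged illustration of how lexico-syntactic paraphrase patterns can expose the relation hidden in a compound, the sketch below counts corpus matches of a few paraphrases of "modifier + head" and maps each pattern to a relation. The pattern inventory and the pattern-to-relation mapping are hypothetical examples built around the air pollution / oil pollution contrast mentioned above; they are not the authors' pattern set, and they are deliberately narrow (e.g. "pollution was caused by oil" is not matched).

```python
import re

# Hypothetical inventory: each paraphrase template is assumed to make one relation
# explicit. {head} and {modifier} are filled in with the compound's constituents.
PARAPHRASE_PATTERNS = {
    "Cause": r"\b{head} (?:caused|produced|generated) by (?:the )?{modifier}\b",
    "Location": r"\b{head} (?:in|of) the {modifier}\b",
}

def relations_for(head, modifier, corpus_text):
    """Count, per relation, corpus paraphrases of the compound '<modifier> <head>'."""
    counts = {}
    for relation, template in PARAPHRASE_PATTERNS.items():
        pattern = template.format(head=re.escape(head), modifier=re.escape(modifier))
        hits = re.findall(pattern, corpus_text, flags=re.IGNORECASE)
        if hits:
            counts[relation] = len(hits)
    return counts
```

On a toy text such as "Pollution in the air rose sharply. Pollution caused by the oil industry persists.", `relations_for("pollution", "air", text)` points to Location and `relations_for("pollution", "oil", text)` to Cause.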
Our goal was to access the semantics of these MWTs in order to (i) disambiguate the semantic relation between the constituents of the compound, and (ii) develop a procedure for inferring the semantic relations in these MWTs. To this end, English two-term noun compounds were extracted from an environmental science corpus. The selected MWTs designated entities and all shared the same head (e.g. air pollution, wastewater pollution, oil pollution). We then organized the MWTs according to the semantic category of their modifiers; that is, the qualitative valence of the concealed predicates was taken into account to disambiguate the semantic relations in the noun compounds. The next step was the extraction of paraphrases from the corpus. Finally, the different groups of MWTs, previously organized by the semantic category of their modifier, were compared. Our results showed that specifying the semantic category of the modifiers and using paraphrases gave access to the conceptual load of the noun compounds, namely to the semantic relation between their constituents. Recurrent patterns in the formation of these compounds were thus observed, which proved a valuable starting point for developing translation rules for these units.

Keywords: noun compound, semantic relation, paraphrase, micro-context, terminology

Multilingual extraction of terminology from specialised corpora
Eva M. Mestre-Mestre (presenter)
Universitat Politècnica de València (UPV), Camino de Vera, s/n, 46022 Valencia, Spain

There is a considerable literature on the use of text-based corpora for various purposes: scientific research, preparation of teaching materials, compilation of glossaries and vocabularies, etc. In many cases, software is used (and sometimes written) to help with these tasks.
Most analysis software lets users check word frequencies, concordances and collocations. However, few tools allow true specialised lexical units to be extracted from specialised-domain corpora, and fewer still can work with languages other than English. This paper presents the main characteristics of DEXTER (Discovering and EXtracting TERminology), an online workbench for terminology management and data mining of corpora of unstructured texts. The current version of DEXTER processes small and medium-sized corpora by first automatically extracting the terms in a given corpus, contrasting the target corpus with the European Union's IATE terminology database. The candidate terms must then be validated manually to obtain the final results. A distinctive characteristic of DEXTER is that it can work with several languages: at present it can analyse corpora in English, French, Italian and Spanish. A second particularity is its hybrid approach, which takes into account both the linguistic and the statistical properties of lexical units and additionally applies lexical filters, without grammatical tagging, to restrict the results before they are weighted; this simplifies the validation work needed to complete the terminology extraction task. It also allows the identification of terms that include different grammatical categories (nouns, verbs, adjectives or adverbs). DEXTER uses the SCR metric (Periñán-Pascual, 2015), which combines the termhood and unithood of the n-grams extracted by the software (Salton, Wong, and Yang, 1975; Salton and Buckley, 1988; Ahmad, Gillam and Tostevin, 2000; Park, Byrd and Boguraev, 2002).
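Because DEXTER's candidate terms are validated manually, its output can be evaluated by precision, the share of proposed candidates that the validator accepts; the rejected candidates are the false positives. The sketch below computes both, optionally over only the top k ranked candidates. The function name and the toy term lists are hypothetical; this is not part of DEXTER and says nothing about how the SCR metric itself is computed.

```python
def validation_summary(candidates, validated, k=None):
    """Precision and false positives for ranked candidate terms after manual validation.

    candidates: terms proposed by the extractor, best-ranked first.
    validated:  terms a human validator accepted as genuine terms.
    k:          if given, evaluate only the top-k candidates (precision at k).
    """
    ranked = candidates[:k]  # candidates[:None] keeps the whole list
    if not ranked:
        return 0.0, []
    accepted = set(validated)
    false_positives = [term for term in ranked if term not in accepted]
    precision = (len(ranked) - len(false_positives)) / len(ranked)
    return precision, false_positives
```

For instance, if five candidates are proposed and a validator accepts three of them, precision is 0.6 and the two rejected strings are returned as false positives, which can then be inspected language by language.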
The research presented here compares the results obtained from the analysis of three corpora, each composed of 50 neurology articles (written in French, English and Spanish respectively) published over the last five years in prestigious research journals. We studied the precision of the terms proposed by the software after manual validation, and also examined the cases with the highest proportion of false positives (candidates proposed as terms by the software but rejected in the validation phase). The study concludes that the results obtained with DEXTER are similar for the three languages and consistent with previous studies carried out on monolingual corpora (Periñán-Pascual and Mestre-Mestre, 2015, 2016). DEXTER was developed in C# with ASP.NET 4.0 by Prof. Carlos Periñán-Pascual and is freely accessible at www.fungramkb.com/nlp.aspx.

Keywords: ATE, multilingual, specialised corpora, terminology

Naming practices and media constructions of reality in Spanish: A corpus-based perspective on violence against women news (2005-2015)
José Santaemilia (presenter)
Universitat de València (UV), Avda. Blasco Ibáñez, 32-6, Valencia 46010, Spain

Without a doubt, violence against women (VAW) is a serious issue in Spanish society, which is characterized, among other things, by a growing awareness of gender and sexual issues, including the perception of VAW as a serious social malady as well as a crime. Multiple representations of, and debates on, the topic are found in literature (Báez Ramos 2002), cinema (Sánchez Noriega 2002, Wheeler 2012) and TV and radio programmes (Gómez Nicolau 2012). The mass media have been instrumental in this heightened awareness of VAW. In Spain, media accounts of VAW are closely associated with two quality newspapers, El País and El Mundo. Since the mid-1970s, quality papers have featured growing numbers of articles on the topic.
Since the murder of Ana Orantes in December 1997, a new discourse on VAW has been identified in the Spanish media (Bengoechea 2000, Carballido 2007), though scholarly research since the turn of the century (Bengoechea 2000, Lledó 2002, Fernández Díaz 2003, Jorge 2004, Vives-Cases et al 2005, Carballido 2007, Zurbano 2012, Menéndez 2014, Carratalá 2016) still shows that Spanish media discourses tend to naturalize and condone male responsibility, thus reproducing the existing asymmetrical relations between the sexes. Although a vast number of denominations for VAW are present in Spanish media discourse, three naming practices stand out as the most common: violencia de género [Eng. 'gender-based violence'], violencia doméstica [Eng. 'domestic violence'] and violencia machista [Eng. 'male violence']. Choosing one term over another is especially relevant, as it is likely to impose a category of thought, convey negative or positive values, attribute blame or praise, or shape a certain evaluative stance. This presentation therefore compares and contrasts the two Spanish quality dailies (El País and El Mundo) in their use of the three main naming practices in contemporary VAW news. To do so, I draw on an ad hoc corpus of ca. 10 million words of gender-based news covering the period 2005-2015. This is part of a larger, comparable (Spanish-English), highly specialized corpus (GENTEXT-N) containing all the news articles dealing with gender-related topics such as VAW, homosexuality or abortion. Methodologically, I adopt a CADS (Computer-Assisted Discourse Studies) approach (Partington 2004, Baker & Levon 2015), i.e. the combined, dialogical insights of corpus linguistics and Critical Discourse Analysis, "moving back and forth recursively between qualitative and quantitative forms of analysis in order to generate new hypotheses as well as to test existing ones" (Baker & Levon 2015: 223).
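Since the two newspapers' subcorpora are unlikely to be the same size, comparing how often each naming term occurs calls for normalised frequencies. A minimal sketch, assuming one plain-text subcorpus per newspaper, lower-casing, and a crude `\w+` word count; the function name and the toy inputs are hypothetical, and real figures would come from the GENTEXT-N data.

```python
import re

TERMS = ("violencia de género", "violencia doméstica", "violencia machista")

def per_million(texts_by_paper, terms=TERMS):
    """Occurrences of each naming term per million words, for each newspaper."""
    table = {}
    for paper, text in texts_by_paper.items():
        lowered = text.lower()
        n_words = len(re.findall(r"\w+", lowered))  # crude token count
        table[paper] = {}
        for term in terms:
            # \b stops a term from matching inside a longer word
            hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            table[paper][term] = round(1_000_000 * hits / n_words, 1) if n_words else 0.0
    return table
```

Normalised counts of this kind are what allow a trend such as the spread of violencia machista in one paper and violencia de género in the other to be compared across subcorpora of unequal size; concordance reading then supplies the qualitative side of the CADS cycle.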
Differences and similarities in frequencies and concordance lines are therefore explored in order to assess the most important ideological values present in VAW news stories. Attention has been paid to the news values (Bednarek & Caple 2012, 2014) construed by each newspaper, together with the relevant associations and ideological implications. Among the trends that seem to be confirmed is a general move towards a more widespread use of two terms, violencia machista (El País) and violencia de género (El Mundo), with the increasing exclusion of violencia doméstica. Newsworthy naming practices, and their evolution in media discourse, are powerful indicators both of social positioning on sensitive social issues and of public evaluations of those issues.

Keywords: violence against women (VAW), Spanish press, El País, El Mundo, media discourse, VAW naming practices, news values

On the Endophoric, Abstract and Narrative Nature of Idiomatic 'Do So' in Legal Texts, Journalistic Texts and Written Correspondence
Carlos Prado-Alonso (presenter)
University of Oviedo (Uniovi), Spain

Idiomatic do so constructions, as in 'I ate an apple yesterday in the park, and Peter did so last week', are verbal anaphors that have been extensively studied from a theoretical perspective. Research on do so has mainly focused on the categorical factors, i.e. semantic and syntactic, that determine the use of the construction. It has been argued, for instance, that the applicability of do so anaphora depends principally on factors such as: (a) non-stativity of the antecedent (Guimier 1981); (b) an antecedent not headed by be (Levin 1986); (c) coreferentiality of the subjects of the antecedent and do so clauses (Souesme 1987); (d) adjunct status of any "orphan" in the do so clause (Culicover & Jackendoff 2005); and/or (e) non-contrastive status of any adjunct in the do so clause (Huddleston and Pullum 2002), among others.
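Retrieving candidate do so instances from corpora typically starts with a broad query followed by manual filtering. A hedged sketch of such a first pass over plain text: it matches any form of DO followed by so. The pattern and the function name are assumptions, and hits such as "did so that ..." or "do so much" would have to be weeded out by hand.

```python
import re

# Any form of DO followed by "so". This over-generates on purpose: strings such as
# "did so that ..." or "do so much" also match and must be discarded manually.
DO_SO = re.compile(r"\b(?:do|does|did|done|doing)\s+so\b", re.IGNORECASE)

def do_so_candidates(text):
    """Return (character offset, matched string) for every candidate instance."""
    return [(m.start(), m.group(0)) for m in DO_SO.finditer(text)]
```

Run on the example sentence above plus "They may do so again. It is so good.", it returns the did so and do so hits and ignores the intensifier in "is so good".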
Overall, however, scholars have paid little attention to the textual factors affecting the distribution and use of do so anaphora in naturally occurring Present-day English, apart from a few isolated hints (cf. Houser 2010). To bridge this gap, this paper presents an in-depth corpus-based analysis of the factors that determine the pragmatic use and distribution of do so constructions in contemporary legal, journalistic and written-correspondence texts. The data for the study are taken from the ICAME family of corpora, namely the LOB, FLOB, Brown, Frown, BE06 and AmE06 corpora. Do so has generally been regarded as typical of formal registers, with the elliptical alternative without so preferred in informal contexts (cf. Stirling and Huddleston 2002: 1531). Beyond that, however, the analysis of the 687 instances retrieved from the corpora will show that the frequency and distribution of do so constructions in legal, journalistic and written-correspondence texts depend not only on the degree of formality but also on the narrative, endophoric and abstract nature of the texts in which they occur. The data will also show that this narrative, endophoric and abstract nature is not only a property of the texts in which do so anaphora occurs but also a feature of the construction itself. In sum, the analysis sheds light on the linguistic and textual factors that drive the pragmatic use and distribution of do so verbal anaphora, and shows that, in addition to syntactic and semantic factors, the linguistic features of the texts in which these formulaic expressions occur also play an important role in their use.

References
Culicover, P. W. & R. Jackendoff. 2005. Simpler Syntax. Oxford: OUP.
Guimier, C. 1981. La Substitution Verbale en Anglais. Modèles Linguistiques 3.1: 135-161.
Houser, M. J. 2010. The Syntax and Semantics of Do So. University of California.
Huddleston, R. & G. K. Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1449-1564.
Levin, L. 1986. Operations on Lexical Forms: Unaccusative Rules in Germanic Languages. Cambridge: MIT.
Souesme, J. 1987. Valeurs et Emplois Respectifs de DO et DO SO. Modèles Linguistiques 9: 65-92.

Keywords: Idiomatic Do So, Textual Variation, Legal Texts, Editorials, Written Correspondence

On the Grammaticalization Path of the Quasi-coordinator as well as
Miriam Criado Peña (presenter)
Universidad de Málaga, Spain

The English language as known today has undergone a number of developments over time. Among these changes, grammaticalization stands out because of its relevance to the development of the language; it is the process by which a lexical word with full meaning of its own becomes a grammatical item. The present study analyses the developmental path of the construction as well as, taking the Old English adverb well as its origin. In Middle English, as well and as well as (swa well swa) emerged from the original adverb, behaving as single units, and finally became the coordinator as well as in Early Modern English. These multiple layers still coexist in Present-Day English, which, together with the versatility of the construction, allows me to classify it into four groups according to meaning and function: a) as an adverb of manner; b) as a comparative of two elements; c) as a conjunctive coordinator; and d) as a coordinator introducing one person or thing. However, coordinators such as as well as sometimes perform different syntactic roles in a sentence; these are called quasi-coordinators, that is, linkers that can behave like coordinators or subordinators depending on the context. When they behave as subordinators, they introduce prepositional phrases and can be placed in front position, but they do not lose their coordinating function.
In addition, some of the mechanisms of language change, such as syntactic reanalysis and semantic bleaching, are considered in order to explain the changes and to provide a dual view of them encompassing syntax and semantics. The grammaticalization of quasi-coordinators has been largely neglected in the literature, and the diachronic development of as well as therefore remains largely unexplored. In the light of this, the present paper studies this process by examining the syntactic and semantic changes undergone by the construction and by exploring its coordinating function in the different layers across time. The following objectives are pursued: a) a historical analysis to ascertain the origin of this quasi-coordinator, examining the linguistic causes, both syntactic and semantic, that motivated the change; b) an identification of the multiple mechanisms and processes at work along the grammaticalization path; c) a classification of the construction into four groups according to function, in order to trace its development; and d) a sociolinguistic study to assess the role played by social factors in the process. For this purpose, the Parsed Corpus of Early English Correspondence (PCEEC) and the Helsinki Corpus of English Texts have been used as sources, covering almost seven hundred years from the late Old English period to the Early Modern period.

Keywords: grammaticalization, as well as, quasi-coordinator, diachronic development, semantic bleaching, reanalysis, sociolinguistic factors

The onomasiology of feeling: linguistic corpora as a data source for the semantics and syntagmatic combinatorics of emotion nouns in Spanish
Inmaculada Mas, Universidad de Santiago de Compostela (USC), Spain

The expression of emotions is in fashion.
Emotion dictionaries (emocionarios), the display of feelings on social networks and the emoticons that have become indispensable in mobile chat conversations are just a few manifestations of the current relevance of subjective sensibility. Beyond the monolithic me gusta ('like'), naming emotions is an essential element of public and private communication, and no less sophisticated for being primitive. In this paper we approach the semantics and lexico-syntactic combinatorics of Spanish emotion nouns using data obtained from linguistic corpora. The proposal has three aims: first, to carry out an onomasiological approach to the domain of feelings, focusing on Spanish emotion nouns and their lexico-syntactic combinatorics; second, to test the usefulness of corpora as a data source since, in addition to context and domain, they provide information on frequency (reference corpora), multilingual correspondences (parallel corpora) and interlanguage (learner corpora); and third, to consider how all of this could be applied to a lexicographic product intended for contrastive studies and for meeting production and translation needs. The onomasiological approach takes as its starting point Casares's Diccionario ideológico, Moliner's Diccionario de uso del español and López García's Diccionario de sinónimos y antónimos de la lengua española. In the general plan of Casares's (1942) ideological classification, emotion nouns fall under the domain of Sensibility (Sensibilidad), which is subdivided into Sensibility/Senses (Sensibilidad/Sentidos), in Synoptic Table 13 (p. L), and Feelings (Sentimientos), in Synoptic Table 14 (p. LI).
As is well known, the onomasiological perspective is most useful for linguistic production and translation tasks, two activities for which the catalogues of the Diccionario de uso del español and dictionaries of synonyms and antonyms have proved enormously useful. The precise word is generally located by starting from the most neutral, most general or most frequent term. One of Moliner's aims in including the catalogues was "to lead the reader from the word he knows to the way of saying what he does not know or what does not come to mind at the right moment" (p. IX). She designed the dictionary with two routes of consultation: onomasiological and semasiological. The frequency data and, above all, the wealth of examples of the nouns in context provided by the corpora consulted (CORPES XXI, Reverso Context and CAES) make it possible to outline the semantic scheme and complement it with the nouns' combinatorial potential; in the case at hand, their support verbs and adnominal complements. Some results concerning the two poles of the domain of Feelings (pleasure/displeasure, love/hate, worry/unconcern) reveal the particular combinatorial behaviour of these nouns.

Keywords: emotion nouns, onomasiological lexicography, Spanish corpora, syntactic combinatorics

Phraseological routines in scientific writing: the example of metatextual routines in French
Agnès Tutin, Laboratoire de Linguistique et de Didactique des Langues Maternelles et Etrangères (LIDILEM), Université Paris VIII Vincennes-Saint Denis, Université de Grenoble, Université Grenoble Alpes, Bâtiment Stendhal, CS40700, 38058 Grenoble cedex 9, France

Phraseology is prevalent in scientific writing (e.g. Gledhill, 2000; Pecman & Kübler 2011) and has many faces in this genre (Tutin, 2013).
Cross-disciplinary scientific phraseology includes collocations such as pay attention or encouraging results, discourse markers such as as long as or as a first step, and also large phraseological chunks that we call semantico-rhetorical routines (Tutin & Kraif, 2016). These routines, which belong to the extended phraseological field (see also Teufel 1998; Pecman 2004; Sándor 2007), have specific properties:
• At the syntactic level, they are generally complete sentences including a tensed verb. They thus differ from standard collocations, which prototypically involve two lexical elements.
• At the rhetorical level, they have a specific rhetorical function, such as highlighting textual coherence, e.g. comme on/nous l’avons mentionné/précisé [as one/we mentioned/made clear ...].
• At the enunciative level, they involve specific referents in the discourse situation (e.g. the author of the scientific text, the scientific article, the audience of the scientific text ...).
• At the semantic and lexical level, they involve specific concepts lexicalized with various elements; e.g. in the example above, the author of the scientific text is referred to with on or nous, while mentionné alternates with précisé.
These semantico-rhetorical routines are thus far from being frozen expressions, but we consider that they fully belong to the field of phraseology, since these patterns are dedicated to specific functions in the genre of scientific writing and are realized through limited lexical paradigms. After a theoretical presentation of routines, we will show how these phraseological patterns can be automatically extracted from treebanks of scientific articles in a corpus-driven approach. This technique uses statistical association measures and dispersion measures (Kraif 2016; Tutin & Kraif 2016), combined with semantic lexicons and syntactic relations (Hatier et al. 2016).
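The extraction step combines an association measure with a dispersion measure. The sketch below illustrates that idea under stated assumptions and is not the Lexicoscope's actual implementation: it scores (head, dependent) pairs read from dependency relations with pointwise mutual information (PMI), and uses range, the number of disciplines in which a pair occurs, as the dispersion measure. The input format, the thresholds and the function name `candidate_routines` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pmi(pair_count: int, head_count: int, dep_count: int, total: int) -> float:
    """Pointwise mutual information: log2(P(head, dep) / (P(head) * P(dep)))."""
    return math.log2((pair_count * total) / (head_count * dep_count))


def candidate_routines(relations, min_pmi=3.0, min_range=3):
    """Keep (head, dependent) pairs that are both associated and dispersed.

    relations: list of (discipline, (head_lemma, dependent_lemma)) tuples,
    e.g. dependency relations read from a treebank (hypothetical format).
    Returns (pair, pmi, range) tuples, most widely dispersed first.
    """
    total = len(relations)
    pair_counts = Counter(pair for _, pair in relations)
    head_counts = Counter(head for _, (head, _) in relations)
    dep_counts = Counter(dep for _, (_, dep) in relations)

    # Dispersion as "range": the set of disciplines each pair occurs in.
    disciplines = defaultdict(set)
    for discipline, pair in relations:
        disciplines[pair].add(discipline)

    results = []
    for pair, n in pair_counts.items():
        score = pmi(n, head_counts[pair[0]], dep_counts[pair[1]], total)
        spread = len(disciplines[pair])
        if score >= min_pmi and spread >= min_range:
            results.append((pair, round(score, 2), spread))
    return sorted(results, key=lambda r: (-r[2], -r[1]))


# Invented toy relations, for illustration only.
relations = [
    ("linguistics", ("mentionner", "nous")),
    ("history", ("mentionner", "nous")),
    ("psychology", ("mentionner", "nous")),
    ("linguistics", ("avoir", "résultat")),
    ("linguistics", ("être", "il")),
    ("history", ("être", "il")),
    ("psychology", ("avoir", "il")),
    ("linguistics", ("être", "résultat")),
]
print(candidate_routines(relations, min_pmi=1.0, min_range=3))
# → [(('mentionner', 'nous'), 1.42, 3)]
```

Requiring both scores keeps pairs that are strongly associated and cross-disciplinary. PMI is known to overrate rare pairs, so in practice a frequency floor or another association measure (e.g. log-likelihood) might be preferred.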
We will then illustrate this notion in the field of metatextual functions, especially text navigation functions, which are often associated with speech verbs.

References
Gledhill, Ch. (2000). Collocations in Science Writing. Language in Performance, 22. Tübingen: Gunter Narr Verlag.
Hatier, S., Augustyn, M., Yan, R., Tran, T. T. H., Tutin, A., & Jacques, M.-P. (2016). French cross-disciplinary scientific lexicon: extraction and linguistic analysis. In T. Margalitadze & G. Meladze (Eds.), Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity (pp. 355–365).
Kraif, O. (2016). Le Lexicoscope : un outil d’extraction des séquences phraséologiques basé sur des corpus arborés. In O. Kraif & A. Tutin (Eds.), Cahiers de lexicologie, 1(108), 91–106.
Pecman, M. (2004). Phraséologie contrastive anglais-français : analyse et traitement en vue de l’aide à la rédaction scientifique. PhD thesis, Université de Nice Sophia Antipolis, December 2004.
Pecman, M., & Kübler, N. (2011). ARTES: an online lexical database for research and teaching in specialized translation and communication. In Proceedings of the First International Workshop on Lexical Resources.
Sándor, A. (2007). Modeling metadiscourse conveying the author’s rhetorical strategy in biomedical research abstracts. Revue Française de Linguistique Appliquée, XII(2), 97–108.
Tutin, A. (2016). La phraséologie transdisciplinaire des écrits scientifiques : des collocations aux routines sémantico-rhétoriques. In A. Tutin & F. Grossmann (Eds.), L’écrit scientifique : du lexique au discours. Autour de Scientext (pp. 27–44). Rennes: Presses Universitaires de Rennes.
Tutin, A., & Kraif, O. (2016). Routines sémantico-rhétoriques dans l’écrit scientifique de sciences humaines : l’apport des arbres lexico-syntaxiques récurrents. Lidil. Revue de linguistique et de didactique des langues, (53), 119–141.
Keywords: phraseology, scientific writing, routines

Phraseology and discourse grammar in English as a lingua franca: ‘on the contrary’ and ‘on the other hand’ in unedited research papers
Silvia Murillo, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain

Due to linguistic interference, some ‘deviant’ uses of the contrastive discourse markers on the contrary and on the other hand have been pointed out in essays written by learners of English (Lake 2004, Gilquin et al. 2007), as well as by users of English as a lingua franca (Prodromou 2008). For instance, these markers, which are grammatically prepositional phrases, resemble in form the Spanish discourse markers por el contrario and por otra parte, but their use (i.e. their instructional or procedural meaning) is different. Por el contrario can either contrast two topics or oppose/refute a single topic, whereas on the contrary only encodes the latter use. Por otra parte encodes discourse-organizing rather than counter-argumentative instructions. The same applies to other language pairs, for example English-French on the contrary/au contraire (Portolés 2002). The purpose of this paper is to present a qualitative-quantitative analysis of the form and use of these two markers in the SciELF corpus, a subset of the WrELFA corpus (Written Corpus of English as a Lingua Franca in Academic Settings) compiled at the University of Helsinki. The SciELF corpus consists of 150 unedited research papers (759,300 words) from the Sciences and the Social Sciences and Humanities, written by academics from ten different L1 backgrounds. The analysis of the corpus revealed nonstandard phraseological variants of the two markers. For on the other hand, markers such as on the other side, in the other side, in the other hand and for the other hand were found. The phraseological range for on the contrary included at the contrary, by contrary, in contrary, on contrary and contrary.
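A first pass at retrieving the formal variants listed above could use regular expressions over the raw texts. The patterns below are a hypothetical sketch built only from the variants reported in this abstract, not the retrieval procedure used in the study; they deliberately over-generate, so every hit would still need to be read in context and classified by function.

```python
import re
from collections import Counter

# Hypothetical patterns covering the reported variants. They over-generate
# (e.g. bare "contrary" also matches "contrary to"), so hits need manual review.
OTHER_HAND = re.compile(r"\b(?:on|in|for)\s+the\s+other\s+(?:hand|side)\b", re.IGNORECASE)
CONTRARY = re.compile(r"\b(?:(?:on|at|in)\s+the\s+|(?:by|in|on)\s+)?contrary\b", re.IGNORECASE)


def variant_counts(text: str, pattern: re.Pattern) -> Counter:
    """Count each surface variant, normalised to lower case and single spaces."""
    return Counter(" ".join(m.group(0).lower().split()) for m in pattern.finditer(text))


# Invented sample sentences, for illustration only.
sample = ("On the contrary, the results differ. In the other hand, the data "
          "are scarce. By contrary, the second group improved. "
          "On the other hand, costs rose.")
print(variant_counts(sample, OTHER_HAND))
print(variant_counts(sample, CONTRARY))
```

Normalising case and whitespace lets the counts be compared directly with the variant inventory (in the other hand, by contrary, ...) before the functional analysis.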
As regards their functions, on the contrary is used in a deviant way, contrasting two topics rather than opposing/refuting a single topic, in over two thirds of the cases found in the SciELF corpus. On the other hand in some cases plays a more discourse-organizing role, and thus a less argumentative one. These processes may be described as semantically driven developments, as the residual conceptual meaning of the L1 markers (cf. Murillo 2010) seems to become central to the form and use of these discourse markers in written academic ELF. Regarding form variants, in most cases the core conceptual element of the markers has been kept (as a cognate) or translated, and prepositions and articles are used approximately (cf. Sinclair 2004, Vetchinnikova 2015). Further, the procedural meaning of these markers seems to have been amplified under the influence of the L1. Thus, hybridity is the most remarkable process affecting these markers, and it is perceived both at the formal level and at the pragmatic-semantic level. Formal variations are masked at a later stage by editors, who tend to correct the use of prepositions and articles in papers to be published (Mur 2013). However, many deviant uses of on the contrary go unnoticed in published papers (Murillo 2012). Considering this trend and the frequency of such cases in the SciELF corpus, it is argued that this discourse marker is undergoing a grammaticalization process in ELF, that is, its procedural meaning is changing.

Keywords: English as a lingua franca (ELF), contrastive discourse markers, formal variants, procedural meaning, conceptual meaning, grammaticalization

ROUND TABLE: Corpus-based analysis of interpersonal metadiscourse in specialized domains: academic vs professional and social genres. Theoretical and methodological challenges
Francisca Suau-Jiménez (Facultat de Filologia, Traducció i Comunicació, Universitat de València, IULMA-UV, Av. Blasco Ibáñez 32, 46010 Valencia, Spain); Rosa Lorés Sanz and Isabel Herrando Rodrigo (Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain); Giovanna Mapelli (Dipartimento di Scienze della Mediazione Linguistica e di Studi Interculturali, Piazza Indro Montanelli 1, 20099 Sesto San Giovanni (MI), Italy)

The main subject of this round table is the perceived need to refine interpersonal metadiscourse (IM) as a theoretical and methodological tool for describing genres in specialized domains and languages through their corresponding corpora. The debate will be grounded in our own corpus-based research on different academic and professional genres (Herrando-Rodrigo 2010, 2012, 2014; Lorés-Sanz 2009, 2011a, 2011b; Mapelli 2008, 2016; Suau 2012a, 2012b, 2014), with a focus on interpersonality and its limitations and challenges as an analytical perspective. The conclusions aim to offer insights into the applicability of the descriptive framework of interpersonal metadiscourse and thus facilitate further research in the field. The hypothesis is that, if interpersonal metadiscourse (IM) as a framework for the analysis of interpersonal features in professional, social and academic genres is conditioned by contextual variables, it will need to be constantly refined and readapted to the specific corpus it is applied to, accepting new markers and/or new lexico-grammatical realizations.
If this hypothesis is confirmed by the debate and the conclusions that emerge from the round table, the way will be open for further refinement of the model, allowing us to describe a wider range of genres, disciplines, languages and corpora, with discursive and sociolinguistic implications. To sum up, we will draw on several of our own studies of specialized corpora carried out within the IM framework, discussing their main achievements but also their limitations, which stem from the strict, pre-existing pattern on which the model was designed. The following four questions, related to the four analyses, will then be put to the presenters and the audience for debate:
Q.1. Have any weaknesses been identified in the framework of interpersonal metadiscourse, especially in relation to markers and their lexico-grammatical realizations? If so, which ones?
Q.2. Does each corpus determine the way in which the framework is applied or, on the contrary, has the research objective determined which corpus to collect?
Q.3. What differences can be observed in the interpersonal metadiscourse framework according to genre, discipline and language?
Q.4. What conclusions can be drawn and what suggestions can be made for methodological improvements that would facilitate further research in IM? What would the theoretical implications be?
Based on our contributions and their implications, differences will be identified in the degree of applicability of the model with respect to the domain of specialization (professional and social vs academic), the language used (English vs Spanish) and lexico-grammatical and phraseological indicators, among other aspects.
Keywords: interpersonality, interpersonal metadiscourse, specialized domain corpora, academic genres, professional and social genres, theoretical and methodological challenges

Rocking the corpus. A discourse analysis of pop rock lyrics
María Martínez Casas, Katholische Universität Eichstätt-Ingolstadt (KU), Germany

Pop rock songs are everywhere, except in corpora. As Kreyer and Mukherjee (2007: 31) point out, "pop song lyrics have not been included in any of the standard reference corpora of present-day British and American English [...]; pop songs are virtually absent from corpus-linguistic research". Research on pop rock songs in Spanish is no exception. The aim of this paper is therefore to present the main language use patterns (Bubenhofer 2009) regarding enunciation (Laferl 2005, Calsamiglia and Tusón 2015) and semantic processes (Halliday 1978, Ghio and Fernández 2008) in a corpus of 1,000 pop rock lyrics in Spanish (169,500 tokens). The corpus was compiled following the sociological criteria of consecration and canonization as well as central aesthetic values such as authenticity and hybridization (cf. Val, Noya and Pérez-Colman 2014). It comprises 85 albums released between 1968 and 2015 by artists from over 12 countries. 819 texts were taken from CD booklets or artists' homepages and 181 lyrics were transcribed from recordings. The texts were analyzed with both AntConc 3.4.4W and WordSmith Tools 6.0 and finally POS-tagged with TreeTagger.
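Once the lyrics are POS-tagged, co-occurrences between the "I"/"you" deictics and verbs can be counted within a short window. The sketch below is a minimal illustration, not the author's procedure: it assumes TreeTagger-style output with one `token<TAB>tag<TAB>lemma` line per token and that verb tags begin with "V" (as in tags such as VLfin or VMfin); both assumptions should be checked against the parameter file actually used. The pronoun sets, window size and function names are hypothetical, and the non-verb tags in the sample are invented.

```python
from collections import Counter

# Pronoun forms treated as deictic anchors (subject, clitic and possessive forms).
FIRST_PERSON = {"yo", "me", "mi", "mí"}
SECOND_PERSON = {"tú", "te", "tu", "ti"}


def parse_treetagger(lines):
    """Parse TreeTagger-style output: one 'token<TAB>tag<TAB>lemma' line per token."""
    return [tuple(line.rstrip("\n").split("\t")) for line in lines if line.strip()]


def verb_collocates(tagged, pronouns, window=3):
    """Count verb lemmas within +/- `window` tokens of any form in `pronouns`.

    Assumes verb tags start with "V"; a verb near two anchor pronouns is
    counted once per pronoun.
    """
    counts = Counter()
    for i, (word, _tag, _lemma) in enumerate(tagged):
        if word.lower() not in pronouns:
            continue
        for j in range(max(0, i - window), min(len(tagged), i + window + 1)):
            _, tag, lemma = tagged[j]
            if j != i and tag.startswith("V"):
                counts[lemma] += 1
    return counts


# Invented tagged fragment ("yo te quiero y tú te vas"), for illustration only.
sample = ["yo\tPP\tyo", "te\tPPC\ttú", "quiero\tVLfin\tquerer", "y\tCC\ty",
          "tú\tPP\ttú", "te\tPPC\ttú", "vas\tVLfin\tir"]
tagged = parse_treetagger(sample)
print(verb_collocates(tagged, FIRST_PERSON))
print(verb_collocates(tagged, SECOND_PERSON))
```

Comparing the two counters (mental verbs such as querer near the "I", material verbs such as ir near the "you") is the kind of contrast the abstract reports; a real study would also separate subject from object pronouns.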
In line with the results of earlier corpus-linguistic research on pop rock lyrics in English (Murphey 1990, Kreyer and Mukherjee 2007, Werner 2012, Bértoli-Dutra 2014), pop rock discourse in Spanish builds upon first-person singular (yo, me, mi) and second-person singular (tú, te, tu) personal pronouns and possessive determiners. The most frequent enunciative structure, as proposed by Laferl (2005: 68), is: "The I addresses itself to a you and talks about their relationship". However, the two main participants in the lyrics show different semantic preferences for process types: whereas the articulate "I" tends to be involved in mental processes (querer, sentir), the "you" carries out material (irse, dar, dejar, llevar) or verbal (decir, pedir) processes. The semantic categories that Bértoli-Dutra (2014: 162) grouped for the factor extraction in her multi-dimensional analysis of pop songs in English therefore show the following distribution in the Spanish lyrics: "movement" and "speech" apply rather to the "you", whereas "emotion" appears mainly close to the "I". This paper presents the linguistic representation of the main participants in pop rock lyrics through a discourse analysis of clusters containing deictic expressions that refer to the "I" and the "you" in the corpus. Special attention will be paid to lexical co-occurrences with tags corresponding to clitic and personal pronouns, possessive determiners and verb forms (i.e. lexical and modal verbs as well as ser, estar, haber).

Keywords: song lyrics, discourse analysis, language use patterns, enunciation, semantic processes

SUNCODAC: A Spanish-English corpus of computer-mediated student discussions
Mario Cal Varela, Francisco Javier Fernandez Polo, Universidad de Santiago de Compostela (USC), Spain

In this paper, we present the SUNCODAC corpus of student discussion forums.
Our aims are to justify the corpus's rationale, to describe its compilation process, holdings, design and query tools, and to highlight its potential as a research tool. Despite the momentum of Computer-Mediated Communication research (Herring et al. 2013), CMC corpora (Beißwenger & Storrer 2008) remain relatively few and scarcely representative of the wide variety of CMC settings, notably educational contexts. Existing research on CMC in education is generally based on relatively small corpora, compiled for the specific needs and research questions of individual projects. SUNCODAC is a comparatively large corpus of student forum discussions, a key genre in present-day higher education (Rourke et al. 1999, Loncar et al. 2014). The data consist of Moodle-based discussions in an English-Spanish and Spanish-English translation course over four consecutive years. The corpus contains a balanced representation of English and Spanish used as native and non-native languages by multinational students. In the presentation, we will briefly describe the context of the discussions and the corpus compilation process. SUNCODAC's current holdings amount to approximately 450,000 words, and the completed corpus is expected to exceed 600,000 words. The data were anonymized and stored in XML format with metadata on a number of user and contextual variables, including participants' first language, gender, main language of the post, date, time, topic and thread. Apart from the replacement of participants' names by codes, the texts were left unedited with respect to grammar, spelling and other errors. A dedicated tool was developed for retrieving the data over the Internet. It can be used to search for specific language features, and to browse and retrieve whole texts or text collections using one or a combination of the coded variables as filters.
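Combining metadata filters with a feature search, as the retrieval tool is described as doing, can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`post`, `l1`, `lang`, `gender`, `thread`) are a hypothetical layout, since the actual SUNCODAC schema is not given here, and the real tool is web-based rather than a script.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML layout: one <post> per forum message, with the metadata
# mentioned in the abstract stored as attributes. Sample posts are invented.
SAMPLE = """<corpus>
  <post id="p1" l1="es" gender="f" lang="en" thread="t1">I think the source text is ambiguous.</post>
  <post id="p2" l1="en" gender="m" lang="en" thread="t1">I agree, I think so too.</post>
  <post id="p3" l1="es" gender="m" lang="es" thread="t2">Creo que la traducción es correcta.</post>
</corpus>"""


def filter_posts(root, **criteria):
    """Return <post> elements whose attributes match every key=value filter."""
    return [post for post in root.iter("post")
            if all(post.get(key) == value for key, value in criteria.items())]


def search(posts, pattern):
    """Return (post id, matched string) pairs for a case-insensitive regex search."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(post.get("id"), m.group(0))
            for post in posts for m in rx.finditer(post.text or "")]


root = ET.fromstring(SAMPLE)
# English posts written by L1 Spanish participants.
print([p.get("id") for p in filter_posts(root, l1="es", lang="en")])  # ['p1']
print(search(filter_posts(root, lang="en"), r"\bI think\b"))
```

Keeping the metadata as attributes makes any combination of variables usable as a filter, which mirrors the kind of native vs non-native comparison the corpus is designed for.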
In the presentation, we will demonstrate some of these functions. The corpus holds considerable potential as a research tool, for instance a) to further our knowledge of "netspeak" and, more specifically, b) to complement existing research on the discussion forum genre (Biber & Conrad 2009) and its characteristic language. Furthermore, given its longitudinal nature, c) it should provide insights into individual and collective processes of genre development in CMC and, in view of its multilingual and multicultural nature, d) it should prove particularly useful for language contrasts as well as e) for cross-cultural studies of culture-specific communicative practices. Finally, f) it should also prove valuable for studying learner language and second-language acquisition processes in real-life environments, and for pedagogically oriented studies that seek to identify successful forum participation resulting in more effective learning practices, eventually leading to the design of improved training materials.

Keywords: corpus, CMC, forum, Spanish, English, academic discourse, SUNCODAC

A grammatical sequence for teaching Spanish as a foreign language
Yun Sil Jeon, Alejandro Muñoz-Garcés, Coastal Carolina University (CCU), Associate Professor, Spanish, United States

The research we are conducting jointly at the Università di Firenze and Coastal Carolina University began with the aim of finding an automatic way of extracting from spoken-language corpora the simplest constructions produced in speech, and then progressively moving on to more complex constructions.
For this research we drew on several corpora of spoken Spanish: C-Or-DiAL (Corpus Oral Didáctico Anotado Lingüísticamente; 120,000 transcribed and tagged words), C-ORAL-ROM (tagged and aligned) and the Minicorpus del Español (30,000 words, tagged, aligned and annotated for information structure). Our initial programming work aimed to find a way to automatically extract the simplest utterances from the whole corpus and then progressively extract increasingly complex ones. We started from the assumption that a spoken utterance is less complex the fewer tonal units it contains. We therefore took the minimal unit of communication to be an utterance consisting of a single tonal unit, with complexity increasing as the utterance's information is articulated over two or more tonal units. The analysis began with the tags that delimit these tonal units in C-Or-DiAL: they mark where the boundaries of an utterance's intermediate tonal units are perceived (intermediate prosodic breaks) as well as the end of the utterance (terminal prosodic breaks). This tagging made it possible to generate lists of all utterances consisting of one tonal unit, two, three, and four or more. The next step is to analyse these lists of utterance types with some of the morphosyntactic analysers available online (GRAMPAL and FREELING, among others) in order to decide which one to use. The same extraction and analysis procedure will also be applied to C-ORAL-ROM and the Minicorpus del Español so that the results can be compared and the differences assessed.
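The grouping step, counting the tonal units of each utterance from its prosodic break tags, can be sketched as follows. As a simplification, the sketch assumes that terminal breaks are written `//` and intermediate breaks `/`, in the style of C-ORAL-ROM transcriptions; the actual C-Or-DiAL tag format may differ, and other terminal symbols (e.g. `?` or `...`) are ignored here. The function names and the sample transcript are hypothetical.

```python
from collections import defaultdict

# Assumed break symbols: "//" ends an utterance (terminal break),
# "/" separates tonal units inside it (intermediate break).
TERMINAL = "//"
NON_TERMINAL = "/"


def utterances(transcript: str):
    """Split a transcript into utterances at terminal breaks."""
    return [u.strip() for u in transcript.split(TERMINAL) if u.strip()]


def tonal_units(utterance: str):
    """Split one utterance into tonal units at intermediate breaks."""
    return [t.strip() for t in utterance.split(NON_TERMINAL) if t.strip()]


def group_by_tonal_units(transcript: str, max_units: int = 4):
    """Map 1, 2, 3 and max_units ("four or more") to lists of utterances."""
    groups = defaultdict(list)
    for u in utterances(transcript):
        n = min(len(tonal_units(u)), max_units)
        groups[n].append(u)
    return dict(groups)


# Invented transcript fragment, for illustration only.
sample = "sí // pues / yo creo que / no // vale / vale // mira / te digo / que no / lo sé / eh //"
for n, utts in sorted(group_by_tonal_units(sample).items()):
    print(n, utts)
# 1 ['sí']
# 2 ['vale / vale']
# 3 ['pues / yo creo que / no']
# 4 ['mira / te digo / que no / lo sé / eh']
```

Each resulting list could then be passed to a morphosyntactic analyser, so that the constructions found in one-unit utterances can be compared with those in longer ones.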
From these analyses we expect to obtain data that are significant, or at least indicative, of what is typically used in the simplest utterances and what appears in more complex ones. This will allow us to reflect on the word classes and constructions that occupy particular positions in the articulation of the utterance. Finally, on the basis of these data, teachers of Spanish as a foreign language could be offered a sequence for selecting classroom material, since our research aims to obtain some indication of what is used in colloquial Spanish, where and how often.

Keywords: grammatical sequence, teaching of Spanish, native-speaker corpora, morphological and syntactic analysis

Semantic constraints on MWU formation: Evidence from clinical records
Leonie Grön, Ann Bertels, Katholieke Universiteit Leuven (KUL), Belgium

Since Sinclair's (1991) formulation of the idiom principle, the scope of research on multi-word units (MWUs) has widened considerably. While earlier work focused on fixed word sequences, recent research locates MWUs on a continuum ranging from frozen expressions to patterns that allow for paradigmatic choices (Dobrovol'skij 2015; Steyer 2015). In studies on language for special purposes (LSP), the defining criteria centre on the functional value of the unit, whose surface forms may show both lexical and morpho-syntactic variation. Such variation may be attributed to the research area as well as to properties of the textual genre (Hyland 2008, Laso & Salazar 2013). In the medical domain, most related research has focused on scholarly articles (León & Divasson 2006; Laso & John 2013). By contrast, our study investigates the structure of MWUs in clinical records, which lie on the border between oral and written communication.
By analyzing a corpus of Dutch patient records, we aim to reveal patterns in the formation of complex noun phrases (NPs). Our prediction is that structural preferences will pattern with semantic features of the constituents. Our study focused on MWUs relating to the semantic classes Diagnosis (e.g. ‘lipodystrofie’ lipodystrophy) and Examination (e.g. ‘schildklierfunctie’ thyroid function). Based on precompiled term lists, we extracted all instances of these classes that were either localized on the human body (Anatomical, e.g. ‘onderbeen’ lower leg) or specified with regard to severity, etiology or quality (Qualitative, e.g. ‘drug-geïnduceerd’ drug-induced). We identified roughly four times as many MWUs for Qualitative as for Anatomical, both in raw counts (137,646 vs. 36,862) and in the number of patterns (~472 vs. 112). Especially for Qualitative, a small number of conventionalized phrases (e.g. ‘gunstig lipidenprofiel’ favourable lipid profile) accounts for a large share of occurrences. Irrespective of the headword class, Qualitative modifiers occur primarily in the left context. By contrast, Anatomical MWUs show more structural variation: general types of Examination are premodified (e.g. ‘pulmonaal onderzoek’ pulmonary examination), whereas technical procedures are localized by nouns in the right context (e.g. ‘echo nier’ echography kidney). MWUs based on the Diagnosis class involve more detailed localizations, which increases their average length (~2.7 vs. 2.3 tokens for MWUs based on Diagnosis vs. Examination). In MWUs with multiple modifiers, the internal order of the constituents is determined by their semantic class as well as their level of generality: adjectives designating a particular body part (e.g. ‘abdominaal’ abdominal) are strongly tied to the headword, whereas relative spatial modifiers and Qualitative specifications are found in the periphery (e.g. ‘stenose thv de arteria carotis links’ stenosis in the arteria carotis left).

We conclude that the formation of MWUs in clinical writing is guided by domain-specific constraints. In NPs relating to clinical findings and procedures, the type and relative position of modifiers vary systematically with the semantic properties of the constituents. These findings confirm that the study of MWUs in LSP benefits from a delexicalized approach, in which patterns of conceptual types form the basis of investigation.

Keywords: clinical language, Dutch, MWUs, concordance analysis

On the near-synonymy of poner and meter in Spanish: a logistic regression analysis of two locative verbs
Marie Comer, Ghent University, Belgium

In this paper we compare the syntax and semantics of the two main locative verbs of contemporary Peninsular Spanish, poner ('put') and meter ('put in'), using an extensively annotated corpus. In their basic meaning, these near-synonyms express the displacement of an entity (the 'Figure') from one place to another (the 'Ground') (Cifuentes 2000) (1). However, the use of poner and meter goes beyond the basic locative meaning (Authors 2015): both verbs are used as transfer verbs (2), in pseudo-copular uses (3), and in causative and inchoative periphrases (4).
(1) poner el mantel en la mesa ('put the tablecloth on the table') / meterse un chupete en la boca ('put a dummy in one's mouth')
(2) poner una multa a alguien ('fine someone') / meter muchos deberes a alguien ('give someone a lot of homework')
(3) ponerse nervioso ('get nervous') / meterse monja ('become a nun')
(4) ponerse a reír ('start laughing') / meterse a trabajar ('start working')
In each of these domains, poner and meter behave as near-synonyms: they are interchangeable in some contexts (ponerse/meterse a estudiar, 'start studying') but not in others (El río se mete/*se pone en el mar, 'The river flows into the sea'; *ponerse monja). The aim of this presentation is twofold.
First, on the basis of an arbitrarily sampled, manually compiled corpus of 2,000 occurrences (1,000 per verb, extracted from the CORPES XXI, CORLEC and C-ORAL-ROM databases and extensively tagged for syntax and semantics), we will examine to what extent the semantic domains mentioned above are actually attested with these verbs. Second, we will carry out a more detailed study of the locative use (1) in order to detect parallels and differences between poner and meter. The analysis draws on a large number of variables that potentially influence the choice between the two verbs in their locative use. For this use, the parameters studied include: (a) the direction of the Figure’s displacement with respect to the Ground; (b) the dimension of the Ground; (c) the presence or absence of a contact zone between Figure and Ground; (d) the physical shape of the Figure; (e) whether or not a container reading is possible, and the degree of containment (partial/complete); (f) the animacy and the concrete/abstract character of the participants; and (g) the literal or metaphorical interpretation of the placement event. Using logistic regression analysis, we study the potential impact of the full set of variables on the preference for one of the two verbs. Our pilot study revealed that meter specializes in events in which the Ground receives a container reading, as in meter el pañuelo dentro del bolsillo ‘put the handkerchief inside the pocket’ (Authors 2016; Cifuentes 2004; Cifuentes & Jesús Llopis 1996), since it prefers an internal location, whereas poner is used in a wide variety of locative events. Other distinguishing factors are the syntactic reflexivity of the locative event and the semantics of the participants. This research illustrates how an advanced multivariate statistical method can be applied to determine the difference between two lexical near-synonyms.
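The logistic regression step can be illustrated with a minimal sketch. It assumes three hypothetical binary predictors loosely modelled on the parameters listed above (container reading, internal location, reflexivity), toy labels, and plain batch gradient descent; none of this is the author's data, variable coding or software, and an actual analysis would fit all predictors (a) to (g) with a statistics package.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit P(meter | x) by batch gradient descent on the mean log-loss.

    X: list of binary feature vectors; y: 1 = meter, 0 = poner.
    Returns (weights, intercept).
    """
    n_obs, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one observation.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * g / n_obs for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_obs
    return w, b

def predict(w, b, x):
    """Probability that the verb is meter for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features: [container reading, internal location, reflexive]
X = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0],
     [0, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 1, 1, 0]  # toy labels: 1 = meter, 0 = poner

weights, intercept = fit_logistic(X, y)
for name, wj in zip(["container", "internal", "reflexive"], weights):
    print(f"{name}: {wj:+.2f}")  # positive weights favour meter
```

In this toy setting the container and internal-location weights come out positive, which mirrors only the direction of the reported pilot finding; real coefficients and their significance would come from the annotated sample.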
Keywords: near-synonymy, locative verbs, logistic regression, multifactorial analysis

Spanish Fragments and Polar Verbless Clauses: Typology and Corpus Distribution
Oscar Garcia-Marchena 1 (speaker)
1 Laboratoire de Linguistique Formelle (LLF) – Université Paris VII - Paris Diderot, CNRS : UMR7110 – Case Postale 7031 5, rue Thomas Mann 75205 Paris cedex 13, France

The properties and use of fragments (or elliptical clauses) have recently received attention in several works (Fernandez 2002, Merchant 2004, Schlangen 2003). There is no agreement, however, concerning their nature and classification. Some authors treat them as purely syntactic units: the verbless remnants of clauses that have undergone ellipsis (Merchant 2004). Others classify them as pragmatic objects, distinguished from non-elliptical clauses by their function in discourse (Schlangen 2003). Still others stress their independence from non-elliptical clauses and classify them using a combination of syntactic and pragmatic criteria (Fernandez 2002). The aim of this paper is to show the extent to which Spanish fragments and polar verbless clauses (“yes”, “no”) can be analysed as syntactic or discourse units, to establish a typology based on their syntactic and pragmatic properties, and to present their distribution across the genres of a corpus. To this end, we retrieved all the fragments in the corpus of contemporary oral Spanish (CORLEC) (Marcos Marín 1992), which comprises more than 63,000 utterances, and classified them according to their syntactic and pragmatic properties. Finally, we counted the frequency of each type in the different genres. The results of this analysis indicate that fragments containing a segment with a counterpart in their source have a predictable discursive relationship with it: they perform a particular speech act (answer, agreement, correction, check question, etc.)
that is determined by the syntactic and semantic properties of the source and target clauses. These combinations of properties are listed below, with reference to constructed examples of the various speech acts:
• Interrogative source & asserting target: answer (1)
• Interrogative source & questioning target: answer + check question (2)
• Questioning declarative source & asserting target & same referent: agreement (3)
• Questioning declarative source & asserting target & different referent: correction (4)
• Questioning declarative source & questioning target & same referent: check question (5)
• Questioning declarative source & questioning target & different referent: correction (6)
• Asserting declarative source & asserting target & same referent: acknowledgement (7)
• Asserting declarative source & asserting target & different referent: correction
• Asserting declarative source & questioning target & same referent: check question
• Asserting declarative source & questioning target & different referent: repair
(1) A: ¿Cuándo vino? B: Hoy. (A: ‘When did he come?’ B: ‘Today.’)
(2) A: ¿Cuándo vino, hoy? (A: ‘When did he come, today?’)
(3) A: ¿Se fue con Mar? B: Con Mar. (A: ‘Did he go with Mar?’ B: ‘With Mar.’)
(4) A: ¿Se fue con Pedro? B: Con Mar. (A: ‘Did he go with Pedro?’ B: ‘With Mar.’)
(5) A: ¿Se fue con Pedro? B: ¿Con Pedro? (A: ‘Did he go with Pedro?’ B: ‘With Pedro?’)
(6) A: ¿Se fue con Pedro? B: ¿Con Mar? (A: ‘Did he go with Pedro?’ B: ‘With Mar?’)
(7) A: Se fue con Pedro. B: ¡Con Pedro! (A: ‘He went with Pedro.’ B: ‘With Pedro!’)
In this way, this paper will show the illocutionary effects of combining syntactic and semantic properties in the source and target clauses of Spanish fragments and polar verbless clauses, as well as the distribution of the resulting speech acts across the genres of the CORLEC corpus.
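The pairing of source and target properties with speech acts listed above amounts to a lookup table. A minimal sketch, assuming the property labels used in the list; the function name, the key layout and the ‘unclassified’ fallback are hypothetical, and the row for an asserting declarative source with a questioning target and the same referent is encoded as a check question, an assumption inferred from the parallel questioning-declarative rows.

```python
from typing import Optional

# Hypothetical encoding of the combinations listed above.
# Key: (source clause, target clause, same referent?); None = not conditioned.
SPEECH_ACTS = {
    ("interrogative", "asserting", None): "answer",
    ("interrogative", "questioning", None): "answer + check question",
    ("questioning declarative", "asserting", True): "agreement",
    ("questioning declarative", "asserting", False): "correction",
    ("questioning declarative", "questioning", True): "check question",
    ("questioning declarative", "questioning", False): "correction",
    ("asserting declarative", "asserting", True): "acknowledgement",
    ("asserting declarative", "asserting", False): "correction",
    ("asserting declarative", "questioning", True): "check question",  # assumed
    ("asserting declarative", "questioning", False): "repair",
}

def classify_fragment(source: str, target: str,
                      same_referent: Optional[bool] = None) -> str:
    """Return the speech act predicted for a fragment, or 'unclassified'."""
    # Interrogative sources are listed without a referent condition.
    key_ref = None if source == "interrogative" else same_referent
    return SPEECH_ACTS.get((source, target, key_ref), "unclassified")

# Example (4): A: ¿Se fue con Pedro?  B: Con Mar.
print(classify_fragment("questioning declarative", "asserting", False))  # correction
```

A table keeps the mapping declarative, so each combination can be checked against the corpus counts row by row.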
Keywords: fragments, non-sentential utterances, polar verbless clauses, Spanish, corpus, speech acts

Spoken Language Corpora under Examination
Hanna Hedeland 1 (speaker), Daniel Jettka 1
1 Hamburg Centre for Language Corpora, University of Hamburg – Germany

Contributing to the current discussion on the reuse and citation of corpora and the replicability of corpus-based research, this contribution describes evolving methods for corpus publication and dissemination at a research data centre and outlines a revised model of spoken language corpora as complex, dynamic linguistic resources. Within emerging digital research infrastructures (e.g. CLARIN), digital repositories have been set up for the dissemination of resources, including spoken language corpora. While this current best-practice approach has obvious benefits, its implementation raises several questions regarding resource-type-specific aspects of data modelling and versioning. By comparison with the previous web-based solution, this contribution discusses these questions and their implications, which are highly relevant to research based on spoken language corpora.

Website resources
The vast majority of the resources hosted by the centre (cf. [1]) are XML data sets created from heterogeneous legacy data using the EXMARaLDA system [2]. The EXMARaLDA Corpus Manager (Coma) provides a basic data model for corpora comprising communications, speakers, transcriptions, recordings and additional files, as well as the Coma metadata file itself. For publication and dissemination, however, corpus-specific methods based on the EXMARaLDA system were used to create a number of export and (mainly HTML-based) presentation formats (i.e. visualizations) and statistics from the source files, resulting in a much more comprehensive and complex resource.
The protected resources were accessed via a public web page containing further background information and documentation.

Repository resources
Since a digital repository enforces concepts such as persistent identifiers, versioning of digital objects and ingest/dissemination services, the initial corpus data model for the repository comprised only the original source files (cf. [3]), whereas basic visualization and export functionality was implemented by generic web services provided via the repository. This solution brought about two important differences. First, the resource is no longer a collection of static web pages and files; the user interacts with web services that change as target formats or the services themselves are further developed. Second, to allow generic web services to present specific corpora appropriately (e.g. for research on (child) language acquisition or regional varieties), the corpus-type-specific characteristics related to corpus design, annotation layers and transcription conventions need to be made explicit and applied as configuration parameters for resource dissemination.

Discussion
Most importantly, the requirement of citable corpus versions makes it necessary to also track explicitly the versions of the web services and other components used for dissemination. As a recent study [4] confirms, users of this type of corpus often mainly analyze visualized transcripts, whose characteristics are known to influence analysis (cf. [5]). Furthermore, while merely applying corpus-specific parameters in web services is straightforward, defining such parameters and classifying spoken language corpus types requires thorough investigation and interpretation.
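The requirement that a citable corpus version also pins the versions of the dissemination services can be sketched as a release record whose identifier is derived from all of its components. This is a minimal illustration under stated assumptions: the class, its fields, the service names and the hash-based fingerprint are hypothetical and do not describe the centre's repository or its persistent-identifier scheme.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CorpusRelease:
    """Hypothetical record of everything a corpus citation must pin down."""
    corpus: str
    corpus_version: str
    # Versions of the web services used to visualize or export the data.
    services: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Short, order-independent hash over corpus and service versions."""
        payload = json.dumps(
            {"corpus": self.corpus,
             "corpus_version": self.corpus_version,
             "services": self.services},
            sort_keys=True,  # key order must not change the fingerprint
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

r1 = CorpusRelease("example-corpus", "1.0", {"visualizer": "2.3", "export": "1.1"})
r2 = CorpusRelease("example-corpus", "1.0", {"visualizer": "2.4", "export": "1.1"})
print(r1.fingerprint() != r2.fingerprint())  # True
```

The design choice illustrated is that updating only a visualization service, with the source files untouched, still yields a new citable state.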
Such a typology can be used either to ensure that various resources are presented consistently with their original research questions and frameworks or, conversely, to provide a more consistent user experience by applying certain settings to various corpus types in the repository. In our contribution we will discuss this revised and extended model of spoken language corpora in more detail.

Keywords: spoken language corpora, replicability, research infrastructures

Strategies for Processing Large Corpora for Linguistic Inquiry and Natural Language Processing Tasks
Antonio Moreno-Ortiz 1 (speaker)
1 Universidad de Málaga (UMA) – Spain

Very large corpora (over a billion words) have become increasingly available to Corpus Linguistics (CL) and Natural Language Processing (NLP) researchers. However, such text collections are offered with little or no filtering and processing of their content. This is a non-issue for some tasks, such as KWIC concordancing or collocate extraction, owing to the sheer volume of data available and, in some cases, the availability of web-based query environments. However, dealing with the raw text to obtain accurate, linguistically driven statistical information from such corpora, with a view to using it for more advanced tasks, calls for sophisticated pre-processing in terms of filtering and word tokenization. This basic step is critical to all others, since it involves making such fundamental decisions as what counts as a word. This is even more relevant when a corpus is compiled from online resources, which commonly include a fair amount of non-lexical and pseudo-lexical items, such as common computer-mediated communication items (URLs, handles, hashtags), as well as numbers, measures, formulas, etc.
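The pre-processing issue described above can be sketched with a rule-based tokenizer that sets non-lexical items apart before frequencies are computed. The regular expressions, category names and type/token helper are simplified assumptions for illustration only; they are not the author's pipeline or the GloWbE tokenization.

```python
import re

# Simplified, hypothetical patterns; order matters (URLs and e-mails first).
TOKEN_RE = re.compile(r"""
    (?P<url>https?://\S+|www\.\S+)
  | (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)
  | (?P<handle>@\w+)
  | (?P<hashtag>\#\w+)
  | (?P<number>\d+(?:[.,]\d+)*%?)
  | (?P<word>[^\W\d_]+(?:[-'][^\W\d_]+)*)
  | (?P<punct>[^\w\s])
""", re.VERBOSE)

NON_LEXICAL = {"url", "email", "handle", "hashtag", "number"}

def tokenize(text):
    """Return (category, token) pairs in text order."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def type_token_ratio(tokens, lexical_only=True):
    """TTR over case-folded tokens; punctuation is always excluded."""
    forms = [tok.lower() for kind, tok in tokens
             if kind != "punct" and not (lexical_only and kind in NON_LEXICAL)]
    return len(set(forms)) / len(forms) if forms else 0.0

print(tokenize("Visit www.example.org or mail info@example.com #corpus @user 1.9 billion"))
toks = tokenize("the cat saw 1 2 3 the cat")
print(type_token_ratio(toks), type_token_ratio(toks, lexical_only=False))
```

In the second example the three distinct numbers raise the type/token ratio from 0.6 to 0.75, a small-scale version of the distortion the abstract attributes to untreated non-lexical items.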
If no special treatment is given to such elements, they will certainly affect word frequency counts at all levels, including part-of-speech frequencies, n-gram extraction, statistical language modeling and, in general, any task that builds on these. Determining the frequency of word classes accurately, as established by part-of-speech assignment, is critical to a number of common corpus linguistics metrics, such as lexical density. In this work we examine the role that certain non-lexical and pseudo-lexical items (e.g. cardinal numbers, hashtags, URLs, e-mail addresses) play in currently available corpora obtained from the Web. Specifically, we focus on GloWbE (Davies, 2013), a large corpus (1.9 billion words) available both for online queries and as a full-text download in different formats, including a tokenized, part-of-speech tagged, lemmatized version. We show that in such web-based corpora non-lexical items exhibit high frequency and should therefore be given special treatment in order to obtain adequate values for common corpus linguistic metrics, such as type/token ratio, word class frequency and the metrics derived from these. We then propose certain cues for the proper treatment of such corpora in terms of pre-processing, tokenization and part-of-speech tagging. During this process, we identified certain pre-processing flaws in the original corpus that led to inaccurate results, and we propose ways to overcome them. Finally, we describe the results of our segmentation and part-of-speech tagging, compare them with those of the original tagged version of the GloWbE corpus, and show the impact that different pre-processing approaches have on certain types of corpus queries, as well as on n-gram extraction.

References
Davies, Mark. (2013) Corpus of Global Web-Based English: 1.9 billion words from speakers in 20 countries.
Available online at http://corpus.byu.edu/glowbe/

Keywords: large corpora, corpus processing, tokenization, part-of-speech tagging

Students’ use of the n-grams tool to learn about phraseology in academic writing
Maggie Charles 1 (speaker, corresponding author)
1 Oxford University – United Kingdom

This paper focuses on the use of recurring multi-word units (MWUs) that are fixed or semi-fixed in form. In academic writing, MWUs have been investigated under various terms, including ‘lexical bundles’ (Biber et al., 1999; Cortes, 2004) and ‘clusters’ (Hyland, 2008a), and research has shown that their occurrence differs according to discipline (Hyland 2008b). Moreover, there are considerable discrepancies in MWU use between learner and expert academic writing (Cortes, 2004; Hyland, 2008a), with learners typically employing different MWUs from expert writers and/or using them for different purposes. The use of MWUs thus presents challenges to learners of English for Academic Purposes, and even advanced-level students consequently need to develop proficiency in academic phraseology (Gilquin et al., 2007). The present paper addresses this issue by investigating students’ use of the n-grams tool in the AntConc software (Anthony 2014). The n-grams tool lists all sequences of words that occur in a corpus, with the number of words in the sequence determined by the user. This study draws on students’ work during a 6-week, 12-hour course on ‘Editing your Thesis with Corpora’. For this course, doctoral students built two do-it-yourself corpora: 1) a corpus of expert writing constructed from research articles (RAs) in their own field; 2) a corpus of learner writing consisting of draft chapters of their own doctoral thesis. Each student thus worked with two corpora tailored to their own specific needs.
In the session on n-grams, students were shown the AntConc n-grams tool, and each learner made an individual list of three-word sequences (tri-grams) from their corpus of expert RA writing. As the retrieval of n-grams is automatic, it was hypothesised that the tool would help students identify the tri-grams used in their own field and thus provide a means of highlighting appropriate academic phraseology. Students were then asked to study the most frequent tri-grams on the list and to perform further corpus searches to understand and explain what they noticed, comparing, where necessary, the findings from the expert corpus with those from their own writing. The data used in this paper currently consist of the corpora constructed by 15 students and the worksheets they completed in class, which give details of the most frequent tri-grams they found and comment on what they learnt from their findings. The most frequently mentioned tri-grams were ‘as well as’ (found by 11 learners), ‘in terms of’ (6 learners), and ‘the fact that’, ‘the effect(s) of’ and ‘in order to’ (4 learners each). Following the categorisations of the ‘Academic Formulas List’ (Simpson-Vlach & Ellis, 2010), ‘as well as’, ‘the effect(s) of’ and ‘in order to’ have discourse-organizing functions, while ‘the fact that’ and ‘in terms of’ are referential expressions. After further investigation, students often commented on differences they found between their own writing and that of the experts. For example, after researching ‘the fact that’, one student noted that she used ‘due to the fact that’, which did not appear in the RA corpus, where ‘despite the fact that’ was prevalent. This paper reports on the student data in more detail and argues that the n-grams tool provides a useful way of promoting the noticing and understanding of academic phraseology at an advanced level.
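The tri-gram lists the students produced can be approximated in a few lines. This is a minimal sketch, not AntConc's implementation: tokenization is a simple lower-case letter pattern (an assumption), the two sample sentences are invented, and n-grams are counted within each text so that no sequence spans a document boundary.

```python
import re
from collections import Counter

def ngrams(tokens, n=3):
    """All contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(texts, n=3, k=10):
    """Most frequent n-grams across texts, counted within each text."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Hypothetical mini-corpus standing in for a student's expert RA corpus.
texts = [
    "We discuss this in terms of cost as well as time.",
    "In terms of scope, the effect of size matters as well as the fact that it varies.",
]
for gram, freq in top_ngrams(texts, k=2):
    print(" ".join(gram), freq)
```

Here ‘in terms of’ and ‘as well as’ each occur twice and head the list; the remaining tri-grams occur once.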
Keywords: academic writing, n-grams, academic phraseology, corpus tools, EAP learners

Teachers’ Dispositions Towards the Use of Corpus-Based Approaches in Teaching English as a Foreign Language in Higher Education
Awatif Alruwaili 1 (speaker, corresponding author)
1 University of Nottingham – United Kingdom

Despite the development and increased use of corpora as a resource in language learning, there is little evidence that corpora are used as alternatives to textbooks and traditional resources such as dictionaries (Chambers, 2005). Corpus use has not changed significantly since Chambers’s (2005) article, as revealed by later studies such as Boulton (2010) and Römer (2009). Published research has shown improvements in learner performance and positive attitudes in higher education, providing wide support for the use of a corpus approach in English as a foreign language (EFL) contexts. Nonetheless, implementing this approach in daily teaching is still a distant goal. Many researchers (e.g. Boulton, 2009, 2010; Hughes, 2012; Römer, 2009) have expressed concern about the infrequent use of corpora in everyday classrooms. Several authors have also confirmed the key role that teachers play in applying the corpus approach in language teaching (Frankenberg-Garcia, 2012). Given previous research highlighting the importance of investigating teachers’ attitudes towards the corpus approach, the present study sought to widen the existing perspective on using corpora in language classrooms. Teachers’ willingness to apply the approach is clearly a necessary step in popularising it. This study was particularly interested in ways of transforming classrooms into learning environments that truly facilitate the use of corpus-based approaches for learning English in an EFL context. This transformation can be facilitated by introducing teachers to corpus-based approaches and their applications in teaching English, which could help to inform language instructors and shape their attitudes.
The aim of this study, therefore, was to explore teachers’ dispositions towards the use of corpora in language classrooms; only two previous studies have examined in-service instructors’ attitudes towards corpus-based approaches to teaching (Mukherjee, 2004; Tribble, 2015). To this end, the first phase of the present study involved designing an introductory course to show language instructors possible ways of using corpora in the classroom. Next, I evaluated in-service teachers’ attitudes towards classroom uses of the corpus approach by developing and administering a questionnaire. Finally, I identified possible factors that can affect instructors’ opinions of using corpora in the EFL classroom. The introductory course consisted of two sessions, each lasting one hour and 30 minutes with a 15-minute break. The sessions were offered multiple times to accommodate teachers’ availability. The course content consisted of three units: teaching about corpora, exploiting corpora to teach language, and teaching to exploit corpora. The participants were 57 in-service teachers working in higher education programmes. An exploratory design was selected for developing the questionnaire, in which a semi-structured interview was used to generate material on themes and to list possible variables in addition to those found in the related literature. The questionnaire covered five themes related to corpus use in the classroom: usefulness, difficulty, practicality, confidence and anxiety, and implementation. The tri-component model of attitude was used as the theoretical framework for constructing the questionnaire because this model is widely known and accepted by many researchers (Vandewaetere & Desmet, 2009).
This framework provides a comprehensive view of attitudes (in this case, towards corpus use in the language classroom) by capturing their three components: cognitive, affective and behavioural. Overall, teachers had moderate to positive attitudes.

Keywords: corpus-based approaches, in-service teachers, classroom

The Developmental Relationship between Spoken and Written Clause Packaging in an English Secondary School
Mark Brenchley 1 (speaker)
1 Graduate School of Education, University of Exeter – 216 Baring Court, University of Exeter, St Luke’s Campus, Heavitree Road, Exeter EX1 2LU, United Kingdom

This paper will detail the findings of a new study into the relationship between L1 spoken and written syntax during the secondary phase of the English education system, situating them within the context of other recent studies into L1 development during the school years and discussing their implications for L1 English curricula. Working within a framework of “linguistic literacy” and a wider model of “rhetorical” competence, according to which L1 speakers and writers must not only learn the core forms of a language but also develop the capacity to put these forms to work effectively across a range of literate contexts (Berman & Ravid, 2009; Ravid & Tolchinsky, 2002; cf. Biber, 1988, 1992; Hymes, 1976), the present study had two aims. First, to provide a better understanding of the relationship between spoken and written syntax during an apparently critical period in the development of L1 English (Berman & Ravid, 2009; Myhill, 2009). Second, to provide evidence that can better inform and support contemporary L1 English curricula, which increasingly emphasise the explicit teaching of grammar (ACARA, 2016; DfE, 2014). To this end, a bespoke corpus of 180 pairs of spoken and written L1 expository texts was directly elicited from students attending a mainstream secondary school in Southern England.
The corpus was further designed to be balanced across two developmental axes: (a) the year group of the student, and (b) their National Curriculum attainment level. The corpus was then analysed in terms of the students’ modality-related use of clause packaging, construed here as the various means by which clauses are combined via coordination and subordination (cf. Berman & Slobin, 1994). The analysis indicates that adolescent students at these ages and attainment levels are at a stage where they can and do differentiate their syntax by modality, at least for these texts and measures. It also showed that this differentiation varied according to the particular kind of packaging measured. Thus, the spoken texts exhibited more t-units per t-unit complex and more clauses per t-unit, together with a greater prominence of finite adverbial and post-verbal complement clauses. Conversely, the written texts exhibited a greater overall prominence of non-finite clauses, whilst the two modalities could not be distinguished in terms of either clause length or their respective proportions of relative clauses and phrasal clauses. Finally, this differentiation was found to be developmentally static, with participants handling their modality-related syntax in much the same way regardless of their age or attainment level. Overall, these findings can be interpreted in terms of the participants tapping into the differential production conditions of speech and writing, but without necessarily fully exploiting these conditions (Biber, 1988, 1992). Moreover, when placed in the context of the wider evidence base (Berman, 2008; Myhill, 2008; Nippold, 2007; Nippold & Scott, 2010; Ravid & Tolchinsky, 2002), the findings suggest two additional conclusions.
First, they indicate that students at these ages and attainment levels are at a stage where their syntactic output is more in line with the discourse of mature speakers and writers. Second, they indicate that modality is an aspect of student syntax characterised by a potentially high degree of sensitivity to the various communicative features of the wider discourse context.

Keywords: Education, English, L1, Later Language Development, Modality, Register Variation, Syntax

The Psycholinguistic Profile of Domestic Abusers: A Corpus-Based Approach
Ángela Almela 1 (speaker), Gema Alcaraz-Mármol 2, Pascual Cantos 3 (corresponding author), Carole Chaski 4, Clara Pallejá 5
1 Centro Universitario de la Defensa - UPCT (CUD) – Centro Universitario de la Defensa. Base Aérea de San Javier C/ Coronel López Peña s/n, 30720, Santiago de la Ribera, Murcia, Spain
2 Universidad de Castilla la Mancha (UCLM) – Spain
3 Universidad de Murcia (UM) – Spain
4 Institute for Linguistic Evidence (ILE) – United States
5 Centro Universitario de la Defensa - UPCT (CUD) – Spain

Gender-based violence is receiving close attention from professionals and researchers in the legal, criminal and psychological fields, who explore several aspects related to both the victim and the abuser. In some cases, the phenomenon of gender-based violence shows the direct relationship between language and society. In fact, some stylistic methods show how social structures and language are interwoven in the abuser’s discourse. However, the language produced by those involved in gender-based violent acts has hardly been explored from a computational-linguistic perspective (Almela, Alcaraz-Mármol & Cantos, 2015; Hancock et al., 2011). This paper presents a pilot study on differentiating the language of domestic abusers from that of a control group. The domestic abusers have been convicted of a violent crime in the domestic context, while the control group members have not.
The main aim is to shed light on the psycholinguistic profile of gender-based abusers in Spanish from an empirical viewpoint, in line with the scientific practices promoted by Chaski (2013). This profile is meant to lay the groundwork for a database that will be compared with other criminals’ speech. Our research is still at an initial stage, but we have already designed the methodology for analysing the morphological characteristics of the gender-based abuser’s discourse, as opposed to the speech of those convicted of other sorts of crimes and of a control group. Specifically, the linguistic sample for our analysis consists of written interviews completed by subjects who have been accused and/or convicted of gender-based abuse. The computational analysis involves several stages, such as POS tagging, punctuation tagging and the evaluation of markedness, as well as the assessment of lexical choice and the identification of morphosyntactic patterns, which will allow us to distinguish the abuser’s sublanguage from that of the control group. We present the results of analysing the two groups’ linguistic behaviour in writings responding to the same stimuli, together with the results of clustering and classification used to determine the statistical reliability of differentiating the language of domestic abusers. The authors will also comment on some of the obstacles encountered in collecting the data, which have delayed the work schedule initially planned, and will show that the use of language as evidence in forensic linguistics in Spain is still in its infancy.

REFERENCES
Almela, A., Alcaraz-Mármol, G. and Cantos, P. (2015). Analysing deception in a psychopath’s speech: a quantitative approach. DELTA 31 (2): 559-572.
Chaski, C.E. (2013). Best practices and admissibility of forensic author identification.
Journal of Law and Policy 21 (2): 333–372.
Hancock, J. T., Woodworth, M. T. and Porter, S. (2011). Hungry like the wolf: A word-pattern analysis of the language of psychopaths. Legal and Criminological Psychology 2011, 1–13.

Keywords: domestic abusers, forensic linguistics, psycholinguistic profile, clustering, classification

The XML Annotation of A Corpus of Historical English Law Reports 1535-1999: A Progress Report
Paula Rodríguez-Puente 1 (speaker)
1 University of Oviedo – Spain

A Corpus of Historical English Law Reports (CHELAR; Rodríguez-Puente et al. 2016) is a specialised corpus of law reports dating from the period 1535-1999. Law reports are records of judicial decisions which are “cited by lawyers and judges for their use as precedent in subsequent cases” (Encyclopædia Britannica Online s.v. law report); they typically contain an account of all the facts of the case, the arguments of the judge, his reasoning, the judgment he arrives at and the kind of authority and evidence he uses. The corpus contains approximately half a million words. It is structured into nine periods of 50 years each, except for the first subperiod, which covers 1535 to 1599. It is already available as plain text and with POS annotation (CLAWS C7; see Garside 1987). In previous work we described the first difficulties encountered in creating the corpus texts, as well as the editorial decisions initially taken (Rodríguez-Puente 2011); Fanego et al. 2017 provide an account of the final structure of the corpus and the types of documents it contains, together with a description of how the raw and POS-annotated texts were compiled. In this presentation we report on the process of XML annotation of the corpus. CHELAR is currently being annotated following the Text Encoding Initiative P5 Guidelines for Electronic Text Encoding and Interchange developed by the Text Encoding Initiative Consortium (Bray et al. 2008).
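CHELAR's TEI P5 annotation, which the authors describe as a modest tagging of renditional, structural and conceptual features, can be illustrated with a TEI-namespaced fragment built in Python. This is a sketch under assumptions: the elements used (p, pb, lb, hi with rend, name, foreign with xml:lang) are standard TEI P5 elements, but the choice of name type="case" for case names, the page number and the sample sentence are hypothetical and are not taken from CHELAR.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei(tag: str) -> str:
    """Qualify a TEI element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"

def sample_paragraph() -> ET.Element:
    """Hypothetical law-report paragraph with modest TEI tagging."""
    p = ET.Element(tei("p"))                            # structural: paragraph
    pb = ET.SubElement(p, tei("pb"), n="12")            # structural: page break
    pb.tail = "In "
    case = ET.SubElement(p, tei("name"), type="case")   # conceptual: case name
    hi = ET.SubElement(case, tei("hi"), rend="italic")  # renditional: italics
    hi.text = "Smith v. Jones"
    case.tail = " the court held that the deed"
    lb = ET.SubElement(p, tei("lb"))                    # structural: line break
    lb.tail = " was void "
    foreign = ET.SubElement(p, tei("foreign"), {XML_LANG: "la"})  # conceptual: foreign word
    foreign.text = "ab initio"
    foreign.tail = "."
    return p

print(ET.tostring(sample_paragraph(), encoding="unicode"))
```

Keeping each feature type to one or two elements mirrors the stated preference for a small tag set that still supports varied corpus queries.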
TEI XML encoding has become standard practice in digitally based humanities research, for both present-day English and diachronic corpora. More precisely, we focus on the particular structure and contents of law reports and the specific XML tags used for our purposes. We advocate a modest XML tagging that covers some renditional (e.g. italics), structural (paragraphs, line breaks, page breaks, etc.) and conceptual (foreign words, proper names, names of cases, etc.) features of the texts. In sum, although the annotation possibilities of the TEI XML schema are virtually unlimited, we selected only those tags that satisfy the needs of our texts while also facilitating a varied range of corpus analyses. This paper will provide an account of the decisions made, together with a progress report on the annotation process itself. At present we have completed the annotation of the two most recent subperiods (1950-99 and 1900-1949), and we hope to complete the annotation of the whole corpus by the end of 2017.

Keywords: corpus annotation, XML, law reports

The construction of shared feelings: analysis of affect in a corpus of obituary comments in online newspapers
Isabel Corona 1 (speaker)
1 Universidad de Zaragoza (UNIZAR) – Facultad de Filosofía y Letras, Pedro Cerbuna 12, 50009 ZARAGOZA, Spain

The comments section in online newspapers is a slot below an article’s body text where readers may post their opinion on that piece of news. Comment boards were introduced by online newspapers a decade ago to engage readers in the news process, thus creating a new context for expression and engagement (Yzer and Southwell 2008) within the general ‘connecting’ mantra. Journalistic obituaries, which have a long-standing tradition in all sorts of newspapers, are life stories seen in retrospect.
They are narratives of lives with a purpose established by the newspaper, either to praise or to condemn, becoming a life lesson that guides or reinforces the values of a community of readers who are supposed to share the same socio-cultural or political principles. Evaluation of the subject has thus been an intrinsic feature of obituaries. The subjects' lives are sanctioned as complying with or deviating from role-specific parameters in such a way that they construe a particular version of collective memory, reflecting the values of the media institution. This collective memory can now be challenged by new media affordances that open up a space for individual reactions to it. Through the comments section, which could be viewed as a new 'social tool', "prior readers become co-participants in the coproduction of the text's meanings" (Page and Thomas 2011: 10): they may express emotional reactions to the subject's behaviour and to his or her public legacy as a role model, and get an immediate response from other participants. The users' discursive acts, although separated from the main text, construe another discursive context that may or may not agree with the newspaper's assessment of the subject. The main aim of this study is to explore commenters' use of evaluative expressions to construct affect towards the life story of a public figure in the digital media, in order to assess how media users establish a new space for shared feelings. To this end, the corpus comprises 840 comments posted on the obituaries published by five online newspapers (Daily Mail (UK), The Daily Telegraph (UK), The Guardian (UK), the Huffington Post (USA edition) and the Washington Post (USA)) after the death of the Spanish Duchess of Alba.
The study is grounded in Collective Memory as an umbrella concept that "defines relations between the individual and the community to which she belongs and enables the community to bestow meaning upon its existence" (Neiger et al. 2011: 4). The analysis applies the framework of Appraisal Theory (Martin 2004; Martin and White 2005; White 2001) to explore the attitudinal values used to construe a community of shared values. The present analysis focuses on the attitudinal domain of "Affect", mapping the commenters' reactions in terms of happiness, admiration, satisfaction, desire and solidarity towards the obituarised subject. The analysis of explicit attitudinal instantiations of Affect reveals a clearly positive emotional response from readers turned users, with prototypical expressions of sorrow (so productive in the construction of community identity) and a high frequency of desiderative expressions operating as ritual formulae. All of these features, which obituarists call "dread clichés" (Massingberd 1995: viii) and which are banned in all quality newspapers, challenge tacitly accepted norms of what is considered good obituary writing.
Keywords: Collective Memory, obituaries, online comments, Appraisal, Computer Mediated Communication (CMC)

The implied consumer in British hotel websites
Carmen Gregori-Signes
IULMA, Universitat de València (IULMA-UV) – Facultat de Filologia, Blasco Ibáñez 32, 46010 Valencia, Spain

Hotel websites are a discourse type within e-tourism that intertwines textual and visual strategies (cf. Cheng 2016) with the primary purpose of persuading website visitors to become customers. This paper focuses on the interpersonal rhetorical functions of engagement, i.e. the lexicogrammatical choices (cf. Hyland 2005) that hotel website designers use as a strategy to create a bond between the addresser (i.e. the hotelier) and the addressees (i.e.
the potential clients), in the framework of a business-to-consumer (B2C) marketing practice in e-commerce. As a framework for the analysis, the paper adopts Stern's (1994) interactive communication model and focuses on the implied consumer, i.e. the construct of the imagined consumer within the message, and on how the relationship between the two is discursively established. This involves looking at metadiscourse, which Hyland and Jiang (2016: 3) describe as "(the) linguistic material referring to the evolving texts and to the writer and imagined reader of that text." As Hunston (2011: 24) puts it, "metadiscourse is subsumed entirely under the concept of interaction or engagement between writer and reader." The corpus analysed comprises 114 British hotel websites and amounts to half a million words. It is part of COMETVAL, a database of over 7 million words compiled by researchers at the University of València, which contains samples of tourism websites in three languages: French, Spanish and English. The results point to patterns whose relevance is already apparent in an initial keyword analysis of the corpus: among the top keywords are the personal pronoun you (subject and object) and its possessive your, as explicit references to the implied consumer. Further observation by means of concordancing and manual scrutiny also pointed to the need to include directives as a relevant feature of engagement (Hyland 2005). Directives are often conveyed by imperatives and cannot be detected through keyword analysis and ordinary morpho-syntactic tagging. The results of the quantitative and qualitative analysis suggest that copywriters rely on a set of specific conditional constructions built around the subject personal pronoun you and, in some cases, on directives.
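A keyword analysis of the kind mentioned here compares each word's frequency in the study corpus with its frequency in a reference corpus. The abstract does not say which keyness statistic or reference corpus was used; the sketch below assumes the log-likelihood (G2) statistic, a common choice in corpus tools, and uses toy token lists.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word seen a times in a corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study_tokens, ref_tokens, top=10):
    """Rank words over-represented in the study corpus by G2."""
    s, r = Counter(study_tokens), Counter(ref_tokens)
    c, d = sum(s.values()), sum(r.values())
    scored = []
    for w, a in s.items():
        b = r.get(w, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((log_likelihood(a, b, c, d), w))
    scored.sort(reverse=True)
    return [w for _, w in scored[:top]]


if __name__ == "__main__":
    study = "if you need a taxi your host will help you".split()
    ref = "the report said the hotel was near the station".split()
    print(keywords(study, ref, top=2))  # ['you', 'your']
```

As the abstract notes, a list like this surfaces pronouns such as you and your but not imperatives, which need concordancing or manual inspection.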
These conditional structures were further explored and classified into different subsets, which brought out a set of lexico-grammatical patterns reflecting the textual choices hoteliers make in their attempt to anticipate the needs and wishes of potential customers. These needs, they claim, can be satisfied by the products and/or services that hoteliers offer. In our view, such rhetorical features of engagement distinguish the discourse of hotel websites from other kinds of promotional discourse. These patterns are genuine cases of engagement, a key rhetorical feature of hotel-owned websites (AUTHOR 2, 2014).
Keywords: discourse, engagement, corpus linguistics, conditionals, advertising, e-tourism, hotel websites

The power of English: I and we in ELF and in ENL academic discourse
Jolanta Sinkuniene
Vilnius University (VU) – Lithuania

Within the last several decades, numerous cross-disciplinary and cross-linguistic studies of research writing have confirmed interesting trends in the ways knowledge is reported in different fields of science and different cultures (Berkenkotter & Huckin 1995; Fløttum et al. 2006; Hyland 2008; Lorés-Sanz et al. 2010, inter alia). In these studies, author stance or author voice (Hyland & Sancho Guinda 2012) is the key object of investigation, as it has proved to play a very important role in creating persuasive discourse that shapes disciplinary and cultural identities. In cross-linguistic studies of research writing, the comparative axis is frequently drawn between English and other academic cultures, in an attempt to establish the degree of similarity or divergence in the expression of author stance.
At the same time, the question of the influence of English on other academic cultures has become crucially important, leading to a debate about the role of English in the global research arena: a common, unifying language of science, or the Tyrannosaurus rex (Swales 1997) responsible for the "epistemicide" (Bennett 2007) of smaller cultures. Personal pronouns are one of the most obvious means of manifesting author stance. The use of I and we in academic discourse has been acknowledged as one of the most powerful ways to mark author stance (Harwood 2005; Hyland 2001, inter alia). Numerous empirical studies confirm substantial differences in personal pronoun use depending on the cultural background of the writer (for an overview see Mur-Dueñas & Šinkūnienė 2016). Less research has investigated how non-native English speakers use personal pronouns in English as a Lingua Franca compared with their writing in their native languages. The aim of the present study, therefore, is to analyse the use of personal pronouns in linguistics research articles written by Lithuanian scholars in Lithuanian and by the same scholars in English, and to compare the patterns of use with those of native English speakers. The study employs a corpus-based contrastive methodology combining quantitative and qualitative analysis. The data come from a self-compiled corpus of 36 single-authored research articles. For the Lithuanian data, 12 pairs of research articles written by the same scholar in English and in Lithuanian were selected. For the English sub-corpus, 12 articles written by British linguists were chosen. The quantitative analysis looks at the frequency distribution of I and we and their morphological forms in the three sub-corpora. The qualitative analysis investigates the range of functions that personal pronouns perform in Lithuanian, Lithuanian English and British English texts.
For this purpose, all combinations of a personal pronoun with a verb have been analysed in context to determine the function they perform. The results suggest that most Lithuanian scholars express author stance more explicitly when they write in English than when they write in Lithuanian, although the frequency and functions of I and we in native English speakers' texts are different. Native English speakers choose more argumentative verbs to express author stance with personal pronouns; they also frequently shift from I to we, and in this way create more persuasive discourse and closer links with the audience than Lithuanian scholars do.
Keywords: academic discourse, personal pronouns, cross-linguistic, quantitative analysis, qualitative analysis

The textual colligation of stance phraseology in cross-disciplinary academic discourses: the timing of authors' self-projection
Louisa Buckingham, Jihua Dong
University of Auckland – New Zealand

Lexical items, according to Hoey (2005, p. 13), "are primed to occur in or avoid, certain positions within the discourse". An analysis of textual colligation, the term Hoey (2005) uses to denote such priming, explores the textual position of linguistic markers in relation to textual structures and the interaction between textual position and discourse function (Hoey, 2005). Recent studies have explored the textual colligation of particular words or phrases (e.g., Hoey & O'Donnell, 2008; Mahlberg, 2009; O'Donnell et al., 2012), enriching our understanding of the textual colligation of linguistic features such as keywords or key phrases in a text. This study investigates the textual colligation of a type of linguistic marker typical of one particular semantic group, namely stance.
This quantitative study investigates the textual colligation of stance phrases in academic discourse in the disciplines of agriculture and economics. The study employs a purpose-built corpus of 655 published research articles totalling around 3 million tokens. We use the WordSkew software (Barlow, 2016) to investigate the position (or colligation) of stance phrases at the level of sentence, paragraph and text, and examine whether there is disciplinary variation in the textual colligation of these phrases. The results show that, in both disciplines, stance phrases are distributed significantly differently across textual positions (sentence, paragraph and text). Nevertheless, the proportion of stance phrases in each of the three textual positions is notably similar in the two disciplines. It may be inferred that the textual position of particular stance phrases results from the type of routinised discourse or communicative function they serve (Hoey, 2005). The findings regarding the textual position of stance phrases support Hoey's premise that certain expressions are primed to occur in, or avoid, particular textual positions. In addition, the study revealed that phrases with a particular function tend to share positional similarities in their distribution across the sentence, the paragraph and the whole text. From a communicative viewpoint, the appropriate positioning of stance phrases in a text helps authors construct a discourse-appropriate persona, interact with envisaged readers and achieve their communicative objectives. WordSkew made it possible to reveal text positions at the sentence, paragraph and text levels. It provides an efficient way to quantify the textual position of particular linguistic features and helps visualise their distribution within the organisation of a text.
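WordSkew is a dedicated tool whose internals are not described here, so the following is only a rough approximation of the underlying idea: for each occurrence of a phrase, compute its relative position within a unit (a sentence, paragraph or whole text, each given as a token list) and count occurrences per equal-width segment. The five-segment split and the pre-tokenised input are assumptions.

```python
from collections import Counter


def relative_positions(units, phrase):
    """Yield the relative start position (0.0 = unit start, 1.0 = last
    possible start) of each occurrence of `phrase` in each unit.
    `units` is a list of token lists; `phrase` is a tuple of tokens."""
    n = len(phrase)
    for tokens in units:
        last_start = len(tokens) - n
        for i in range(last_start + 1):
            if tuple(tokens[i:i + n]) == phrase:
                yield i / last_start if last_start > 0 else 0.0


def position_profile(units, phrase, bins=5):
    """Count occurrences falling into each of `bins` equal segments."""
    counts = Counter()
    for pos in relative_positions(units, phrase):
        counts[min(int(pos * bins), bins - 1)] += 1
    return [counts[b] for b in range(bins)]


if __name__ == "__main__":
    sentences = [
        "we argue that the effect is small".split(),
        "the effect is small as we argue".split(),
    ]
    print(position_profile(sentences, ("we", "argue")))  # [1, 0, 0, 0, 1]
```

Passing paragraphs or whole texts as the units gives the paragraph-level and text-level profiles with the same function.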
Barlow, M. (2016). WordSkew: Linking corpus data and discourse structure. International Journal of Corpus Linguistics, 21 (1), 105–115.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Hoey, M., & O'Donnell, M. B. (2008). Lexicography, grammar, and text position. International Journal of Lexicography, 21 (3), 293–309.
Mahlberg, M. (2009). Local text functions of move in newspaper story patterns. In U. Römer & R. Schulze (Eds.), Exploring the lexis-grammar interface (pp. 265–287). John Benjamins.
O'Donnell, M. B., Scott, M., Mahlberg, M., & Hoey, M. (2012). Exploring text-initial words, clusters and concgrams in a newspaper corpus. Corpus Linguistics and Linguistic Theory, 8 (1), 73–101.
Keywords: textual colligation, stance phrases, academic disciplinary variation, academic writing

Towards an extended lexical grammar: Complex colligational patterns of the noun cause
Moisés Almela Sánchez, Pascual Cantos Gómez
University of Murcia – Spain

It has become a truism that lexis and grammar are intertwined and that grammatical choices are bound to lexical items. The notion of lexical grammar is well established in several frameworks of modern linguistic research, and corpus-driven linguistics is no exception in this respect; see, for instance, Francis (1993) and Hunston and Francis (2000). This research aims to extend the scope of description of lexico-grammatical co-selections, more specifically to identify certain forms of coordination of lexical and grammatical features that are more complex, and subtler, than the cases of lexico-grammatical co-selection usually described in the literature.
Theoretically and methodologically, the study builds on research into lexical constellations (Cantos & Sánchez, 2001; Almela, 2011; Almela et al., 2013), which has provided evidence that the strength of association between a node and a collocate can be influenced by elements outside the pair, particularly by dependencies among different collocates of a node. For instance, the association between the verb face and the noun decision is strengthened by the presence of modifiers from a specific semantic set (e.g., hard, difficult, tough). Previous studies have focused on the implications of this phenomenon for the analysis of word meaning; their methodology was based on comparing conditional probabilities between bigrams and trigrams formed by previously extracted significant collocates of a node. The present study adapts the methodology of lexical constellation analysis to describe dependencies between different colligational patterns (i.e. preferred grammatical contexts) of a word. The node under investigation is the noun cause, and the corpus used is enTenTen2013, a large-scale web corpus of English. This corpus contains 19,717,205,676 tokens and is accessible through Sketch Engine. The methodology is organised in two main steps. In the first, we compare the conditional probabilities of different grammatical contexts of the node. The goal of this step is to determine whether the presence of a particular grammatical category in the context of the node increases or decreases the probability of another grammatical category in a different position; more specifically, we look for dependencies between the 'premodifier' and 'of-postmodifier' slots. In the second step, we compare the behaviour of these two slots across different collocations of the node; in particular, we analyse their distribution in collocations of cause with its top logDice collocates. Two main conclusions are drawn from the results.
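The first analytical step, comparing conditional probabilities of grammatical slots around cause, can be illustrated with a toy computation: given hits annotated for whether the premodifier and of-postmodifier slots are filled, compare P(of-postmodifier | premodifier) with P(of-postmodifier | no premodifier). The boolean annotation format and all counts below are invented for illustration; the study itself works with enTenTen2013 data in Sketch Engine.

```python
def conditional_prob(hits, target, given, given_value=True):
    """P(`target` slot filled | `given` slot == given_value) over hits.
    Each hit is a dict mapping slot names to booleans."""
    subset = [h for h in hits if h[given] == given_value]
    if not subset:
        return float("nan")
    return sum(h[target] for h in subset) / len(subset)


# Invented concordance hits for the node "cause" (illustration only).
hits = (
    [{"premod": True, "of_post": True}] * 30     # e.g. "the main cause of the fire"
    + [{"premod": True, "of_post": False}] * 20  # e.g. "a good cause"
    + [{"premod": False, "of_post": True}] * 10  # e.g. "the cause of death"
    + [{"premod": False, "of_post": False}] * 40
)

if __name__ == "__main__":
    p_with = conditional_prob(hits, "of_post", "premod", True)      # 30/50
    p_without = conditional_prob(hits, "of_post", "premod", False)  # 10/50
    # A ratio above 1 would suggest that filling the premodifier slot
    # raises the probability of an of-postmodifier in these data.
    print(p_with, p_without, p_with / p_without)
```

Repeating the same comparison separately for hits of each collocation (e.g. per verbal collocate) corresponds to the second step.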
The first conclusion is that there are dependency relations between the two grammatical slots investigated in the environment of cause ('premodifier' and 'of-postmodifier'). The second is that the dependency relations observed between grammatical slots are contingent on specific collocations of cause: the dependencies do not behave in the same way with all the verbal collocates of the node. In general, these results point to an influence of collocation on the co-occurrence probabilities of different colligations of cause.
Keywords: collocation, colligation, lexical priming, semantic preference

Techniques for characterising female characters in Galdós: a corpus-based approach
Guadalupe Nieto
Universidad de Extremadura (UEx) – Spain

Drawing on a corpus study, this paper explores body language in the novels of Benito Pérez Galdós and, more precisely, the patterns the novelist uses to sketch the personality of his female characters. The study covers the writer's complete prose works, which amount to some 6.2 million words. Special attention is paid to systematically used constructions of at least five words (clusters) that contain one of the following body parts: head (cabeza), back (espalda), shoulders (hombros), hands (manos) or eyes (ojos). This characterisation device has been analysed in English-language writers such as Dickens (Mahlberg, 2013; Ruano San Segundo, 2015) and Jane Austen (Fischer-Starcke, 2010). As Korte (1997: 4) points out, body language becomes, as will be shown, an autonomous system in the construction of the fictional universe of the novel.
The proposed corpus study makes it possible to examine an aspect of Galdós's style that has, by and large, gone unnoticed so far, owing to how complex its analysis can be without quantitative tools. The characterisation of his female characters is therefore investigated through recurrent patterns containing the body parts listed above across his literary output, compared in some cases with the characterisation of male characters. As will be shown, Galdós's work is populated by patterns that act as textual building blocks contributing to the construction of the fictional universe the author presents. The texts were downloaded from the Cervantes Virtual digital repository and subsequently processed with the concordancing software WordSmith Tools 6 (Scott, 2013), which supports word and concordance searches whose results can be analysed in the context of the novel in which they occur. Among the examples in our analysis is the expression "el pañuelo a los ojos" ('the handkerchief to the eyes'), associated almost exclusively with the characterisation of female characters and normally used in dialogue scenes to emphasise their sadness: "Irene se llevó el pañuelo a los ojos, y con voz de ahogo me dijo: 'Sabe usted... más que Dios...'" ('Irene raised her handkerchief to her eyes and said to me in a choking voice: "You know... more than God..."'; El amigo Manso, chapter 41). In short, the characterisation of female characters in Galdós's novelistic universe is thoroughly accomplished. Indeed, as this paper aims to show, analysing body language from a corpus-stylistics perspective also makes it possible to mark differences between men and women, or between bourgeois and working-class women. The Canarian author is, in the words of María Zambrano (1994: 130), "the first Spanish writer to boldly introduce women into his world" ("el primer escritor español que introduce valientemente a las mujeres en su mundo").
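The cluster extraction in this study was done with WordSmith Tools. As a rough, tool-independent sketch of the same idea, the Python below counts 5-word clusters containing one of the body-part nouns and keeps those that recur. The tokenisation, the minimum frequency of 2, the use of plain word forms without lemmatisation, and the second sentence of the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

BODY_PARTS = {"cabeza", "espalda", "hombros", "manos", "ojos"}


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w matches accented letters too.
    return re.findall(r"\w+", text.lower())


def body_part_clusters(text, n=5, min_freq=2):
    """Return recurring n-word clusters that contain a body-part word.
    Simplification: clusters may span sentence boundaries."""
    tokens = tokenize(text)
    counts = Counter(
        tuple(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if BODY_PARTS.intersection(tokens[i:i + n])
    )
    return {" ".join(c): f for c, f in counts.items() if f >= min_freq}


if __name__ == "__main__":
    sample = ("Irene se llevó el pañuelo a los ojos. "
              "Luego Rosa se llevó el pañuelo a los ojos.")
    print(body_part_clusters(sample))  # {'el pañuelo a los ojos': 2}
```

Comparing such counts for passages about female and male characters would then require attributing each hit to a character, which this sketch does not attempt.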
Bibliography:
Biblioteca Virtual Miguel de Cervantes (2016): http://www.cervantesvirtual.com/ (accessed 2 April 2016).
Fischer-Starcke, B. (2010): Corpus Linguistics in Literary Analysis: Jane Austen and her Contemporaries. London: Continuum.
Korte, B. (1997): Body Language in Literature. Toronto: University of Toronto Press.
Mahlberg, M. (2013): Corpus Stylistics and Dickens's Fiction. New York/London: Routledge.
Ruano San Segundo, P. (2016): "A corpus-stylistic approach to Dickens' use of speech verbs: Beyond mere reporting". Language and Literature, 25 (2), 1-15.
Scott, M. (2013): WordSmith Tools. Version 6. Oxford: Oxford University Press.
Zambrano, M. (1994): "Mujeres de Galdós". Asparkía, 3, 129-135.
Keywords: Galdós, women, body language, corpus stylistics

Phraseological units in the subtitling of a drama series
Dalila Itzel Nieto Mercado, Eleonora Lozano Bachioqui
Universidad Autónoma de Baja California, Facultad de Idiomas (UABC) – Av. Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico

Abstract
This work stems from the need to learn more about the translation of phraseological units in English-to-Spanish subtitling, given the growing number of viewers of audiovisual content from the Internet. In this context, it should be borne in mind that the translator's task is to make culture accessible to anyone interested in it, since it is not only a matter of converting messages from one language into another but also of disseminating culture. The aim of this work is to create a glossary of English phraseological units with their Spanish equivalents, based on a corpus drawn from the dialogues of a television series.
The results will benefit anyone interested in translation, and the glossary may also serve as a tool for teaching English phraseological units and their Spanish equivalents. To this end, scripts from the American series Mad Men (Weiner, 2007) were compiled in order to analyse their phraseological units with AntPConc, a program created by Laurence Anthony for the analysis of parallel texts.
Keywords: translation, subtitling, phraseological units, corpus linguistics

Verbal agreement with NCOLL-of-NPL subjects in the inner varieties of English in GloWbE
Yolanda Fernández-Pena
University of Vigo – Spain

Subjects headed by collective nouns may take singular or plural verbs according to whether the speaker focuses on the collectivity or on its individual members (Dekeyser 1975), the latter being preferred in British English (Bauer 2002). The picture is further complicated when collective subjects take plural of-dependents (i.e. NCOLL-of-NPL subjects), which may interfere in the subject-verb agreement relation, as in (1):
(1) A [crowd]SG of [waiters]PL [were]PL gathering.
In previous research with data from the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA), I showed that NCOLL-of-NPL subjects take a significant rate of plural verb agreement (68.05%) in local syntactic domains in both British and American English and that, with increasing syntactic distance and complexity, the influence of plural of-PPs on verb number diminishes and the rate of plural agreement therefore drops considerably (58.47%). This study extends that investigation by exploring verbal agreement with NCOLL-of-NPL subjects in the corpus of Global Web-based English (GloWbE), with a two-fold purpose. First, I have examined British and American English in GloWbE to find out whether, and to what extent, my earlier observations are corroborated in the more informal web-based register. Second, I have scrutinised the data for the other four inner varieties of English in GloWbE (Ireland, Canada, Australia and New Zealand) to detect significant regional tendencies and similarities/differences with respect to British and American English.
To this end, I have replicated my previous investigation and examined verbal agreement with twenty-three singular collective nouns taking of-dependents (lists retrieved from Biber et al. 1999: 249; Huddleston and Pullum et al. 2002: 503) in the six inner varieties of English in GloWbE. The syntactic variables considered pertain to (i) the constituent structure of the of-PP, (ii) the typology of the modifiers of the NPL, and (iii) the morphology of the NPL (i.e. regular vs. irregular vs. non-overt plurality, as in boys vs. men vs. people). The results largely confirm my earlier observations in the BNC and COCA and also reveal significant regional trends. Overall, NCOLL-of-NPL subjects show a preference for plural verbal agreement only in the British and Irish components (57.26% and 63.97%, respectively); American English slightly favours singular agreement (52.18%), whereas the Canadian, Australian and New Zealand components display no significant preference. In line with the BNC and COCA, the GloWbE data show that the morphology of the NPL conditions verbal agreement: morphologically unmarked plural nouns such as people exert a stronger influence on verb number (70.18%) than irregular (e.g. men, 60.89%) and regular (e.g. boys, 51.63%) plural nouns, a tendency attested in all the varieties surveyed. As regards syntactic complexity, while the Canadian, Australian and New Zealand data yield no significant results, the results for the British, Irish and American varieties confirm that the most complex syntactic configurations of NCOLL-of-NPL subjects (i.e. those with pre- and postmodification) select a lower rate of plural agreement. Similarly, plural verb agreement is considerably less frequent when the NPL is postmodified by clausal, and thus expectedly more complex, constituents (40.94% vs. 55.50% with non-clausal postmodifiers).
This finding runs counter to prior literature (Corbett 1979) but supports the tendencies I had previously observed, and hence confirms the significant impact of morphology and syntactic complexity on the verbal patterns of NCOLL-of-NPL subjects.
Keywords: verbal agreement, collective nouns, regional varieties, corpus

Evaluating the frequency threshold for selecting lexical bundles: good news with some reservations
Yves Bestgen
Centre for English Corpus Linguistics (CECL) – Place du Cardinal Mercier 10, B-1348 Louvain-la-Neuve, Belgium

One of the most frequently used approaches to studying prefabricated units in corpora relies on the automatic identification of lexical bundles, i.e. the most recurrent word sequences in a corpus (Biber et al., 1999). Their study has revealed phraseological differences between registers, genres and periods. While most research has dealt with 4-word sequences, shorter sequences have also been analysed. Two criteria are used to select bundles from among all the word n-grams in a corpus: a minimum frequency threshold, meant to ensure that lexical bundles "show a statistical tendency to co-occur" (Biber et al., 1999: 989), and a minimum number of documents in which a sequence must occur, in order to eliminate idiosyncratic sequences. Although there is broad consensus on a threshold of 3 to 5 texts for the second criterion, the first varies widely: it usually falls between 10 and 40 occurrences per million words, but values ranging from 4 (O'Keeffe et al., 2007) to 88 (Decock, 1998) have also been used.
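The two selection criteria just described, a normalised frequency threshold and a minimum number of texts, can be sketched as follows. The defaults (20 occurrences per million words and 3 texts) are simply values within the ranges reported above, not recommendations, and the input is assumed to be already tokenised.

```python
from collections import Counter


def lexical_bundles(docs, n=4, per_million=20.0, min_texts=3):
    """Select n-grams meeting a normalised frequency threshold and a
    dispersion (range) threshold. `docs` is a list of token lists."""
    freq, rng = Counter(), Counter()
    total = sum(len(d) for d in docs)
    for tokens in docs:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        rng.update(set(grams))  # count each n-gram once per document
    min_count = per_million * total / 1_000_000  # raw-count equivalent
    return {g: f for g, f in freq.items()
            if f >= min_count and rng[g] >= min_texts}


if __name__ == "__main__":
    docs = [
        "on the other hand we disagree".split(),
        "on the other hand they agree".split(),
        "on the other hand it varies".split(),
    ]
    print(lexical_bundles(docs))  # {('on', 'the', 'other', 'hand'): 3}
```

Note how the raw-count equivalent shrinks with corpus size: in a 50,000-word corpus, 20 occurrences per million is a single occurrence (20 × 50,000 / 1,000,000 = 1), which shows in miniature why normalised thresholds can behave very differently across corpus sizes.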
Since frequency is the main selection criterion (Cortes, 2015: 204), meant to ensure that lexical bundles consist of "words which follow each other more frequently than expected by chance" (Hyland, 2008: 5), such a wide range of variation raises the question of whether the thresholds used are high enough to avoid selecting n-grams that chance could easily have produced just as often. Many researchers have indeed pointed out that a sequence may be very frequent simply because its component words are frequent (e.g. Evert, 2005; Gries, 2010). To address this question, the study uses an extension of Fisher's exact test, which is recommended for bigrams (Jones and Sinclair, 1974; Pedersen et al., 1996; Stefanowitsch and Gries, 2003), to sequences of more than two words. It is important to note that the aim is not to question the definition of lexical bundles as the most recurrent sequences: it is obviously more useful to distinguish registers by means of very frequent sequences than by means of rare ones. The analyses were carried out on a corpus of 3,200,000 words extracted from the "academic" section of the BNC. Three sub-corpora were also extracted from this initial corpus to vary corpus size, containing 800,000, 200,000 and 50,000 words respectively. Probabilities were estimated by permuting the words in the corpus, with 10 million permutations performed on each corpus. The results indicate that the classic thresholds are high enough to select only four-word sequences that chance would be very unlikely to produce as frequently.
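The permutation procedure can be illustrated in miniature: shuffle the corpus tokens many times, recount the n-gram each time, and estimate how often chance alone produces it at least as often as observed. This is a simplified Monte Carlo sketch with a few hundred shuffles of a toy token list, not the study's extension of Fisher's exact test or its 10 million permutations; the add-one correction in the p-value is a common convention assumed here.

```python
import random


def count_ngram(tokens, gram):
    """Number of (possibly overlapping) occurrences of `gram` in `tokens`."""
    n = len(gram)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tuple(tokens[i:i + n]) == gram)


def permutation_p_value(tokens, gram, n_perm=1000, seed=0):
    """Monte Carlo estimate of P(count >= observed) under random word order.
    Shuffling keeps every word's frequency fixed, so only the sequencing
    is randomised."""
    rng = random.Random(seed)
    observed = count_ngram(tokens, gram)
    shuffled = list(tokens)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_ngram(shuffled, gram) >= observed:
            at_least += 1
    # Add-one correction: the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy = ["in", "the", "case", "of"] * 20 + ["a", "b", "c", "d", "e"] * 20
    print(permutation_p_value(toy, ("in", "the", "case", "of"), n_perm=200))
```

Because shuffling preserves word frequencies, a sequence made of very frequent words can reach a high count by chance, which is exactly the concern the test addresses.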
By contrast with four-word sequences, a substantial number of three-word sequences selected on the basis of these thresholds fail the inferential test. The study also reveals a very marked effect of corpus size on the effectiveness of frequency thresholds when these are expressed as normalised frequencies, confirming the concerns of Cortes (2015) and Hyland (2012).
Keywords: phraseological expressions, lexical bundles, Fisher's exact test, corpus-driven approach, frequency threshold, corpus size

Metaphorical creativity index and translation universals: a methodological proposal based on a corpus of corporate social responsibility reports
Sara Piccioni
Università "G. D'Annunzio" di Chieti-Pescara – Italy

The aim of this work is to investigate the translation universals hypotheses (Baker 1996) by comparing a metaphorical creativity index in a corpus of original and translated texts in Spanish. The analysis rests on a two-fold methodological proposal. First, embracing the idea that translated texts differ from original texts in linguistic features of their own, it proposes including among these features the degree of metaphorical lexicalisation/creativity, suggesting that metaphor use in originals and translations differs in the type of metaphorical repertoire employed. Second, it proposes a metaphorical creativity index that measures the degree of metaphorical creativity on the basis of observations in a general reference corpus of Spanish. The study corpus is a monolingual comparable corpus of corporate social responsibility reports consisting of Spanish originals (OR-ES) and Spanish texts translated from English (TR-ES).
Regarding the first methodological proposal, the hypothesis put forward is that metaphor, with its wide range of variation between fully lexicalized forms (e.g. cuello de botella, 'bottleneck') and creative metaphors (e.g. drenar el dolor, 'drain the pain'), offers an ideal vantage point for observing how translators' language use differs from that observed in original texts. More specifically, conventional metaphors in translations are taken to reflect the normalization processes characteristic of translated texts ("tendency to exaggerate features of the target language and to conform to its typical patterns", Baker 1996), whereas creative metaphors may result from a process of reverberation of the source language in the target language (shining through, Teich 2003). The second methodological proposal serves to compare the degree of creativity of metaphors in translated and original texts, and builds on the criterion proposed by Deignan (2005) for distinguishing innovative from historical metaphors: a low frequency of metaphorical uses of a given word is taken to indicate metaphorical innovation, whereas words used almost exclusively metaphorically are considered conventional uses. To calculate the metaphorical creativity index, the 200 most frequent VERB–NOUN pairs are extracted from the two corpora (OR-ES and TR-ES), and the metaphorical pairs among them are identified using the procedure proposed by the Pragglejaz Group (Pragglejaz Group, 2007). Next, the creativity index of the metaphorical verbs and nouns is calculated by counting the metaphorical uses of each in a random sample of 100 concordance lines extracted from the Spanish corpus of the Leeds Collection of Internet Corpora (Sharoff 2006; hereafter REF).

* Speaker
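The index computation can be sketched roughly as below, following the abstract's description: for each metaphorical VERB–NOUN pair, the number of metaphorical uses among 100 REF concordance lines is multiplied by the pair's frequency in OR-ES or TR-ES. How these per-pair products are aggregated into one figure per corpus is not stated, so the frequency-weighted mean in `corpus_index` is an assumption, and all names (`MetaphoricPair`, `pair_scores`, `corpus_index`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetaphoricPair:
    pair: str               # VERB-NOUN pair judged metaphorical (Pragglejaz procedure)
    corpus_freq: int        # frequency of the pair in OR-ES or TR-ES
    ref_metaphor_hits: int  # metaphorical uses among 100 random REF concordance lines


def pair_scores(pairs):
    """REF metaphor count multiplied by corpus frequency, pair by pair."""
    return {p.pair: p.ref_metaphor_hits * p.corpus_freq for p in pairs}


def corpus_index(pairs):
    """Hypothetical aggregate: frequency-weighted mean of REF metaphor counts.

    Values range from 0 to 100; following Deignan's criterion, higher values
    point to conventional metaphor and lower values to creative metaphor.
    """
    total_freq = sum(p.corpus_freq for p in pairs)
    if total_freq == 0:
        return 0.0
    return sum(pair_scores(pairs).values()) / total_freq


# Invented figures, for illustration only.
or_es = [MetaphoricPair("pair_1", 10, 90), MetaphoricPair("pair_2", 30, 10)]
print(corpus_index(or_es))  # 30.0
```

Weighting by frequency keeps a rare but highly conventional pair from dominating the corpus-level figure; a plain sum of the products would instead grow with corpus size, which would complicate comparing OR-ES and TR-ES directly.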
The number of metaphorical cases in REF, multiplied by the frequency of a given VERB–NOUN pair in OR-ES and TR-ES, is taken as indicative of the degree of metaphorical creativity/conventionality of each corpus. The presentation will focus on the methodological implications of these proposals and will relate the metaphorical creativity index to normalization and reverberation phenomena in translations.

Keywords: corporate social responsibility reports, metaphor translation, translation universals, corpus-based metaphor analysis

'His maiestie chargeth, that no person shall engrose any maner of corne'. The Standardization of Punctuation in Early Modern English Legal Proclamations
Javier Calle-Martín¹ *
¹ University of Málaga (UMA) – Facultad de Filosofía y Letras, Departamento de Filología Inglesa, Campus de Teatinos s/n, Málaga 29071, Spain

Punctuation is historically noted to have developed from the rhetorical to the grammatical, from the speaker to the reader, with the Renaissance standing out as the transitional period in which syntactic and pragmatic functions were adopted to organize written information. This standardization is elsewhere regarded as a consequence of the introduction of Caxton's printing press in England, the increasing activity of Westminster's Royal Chancery, and a growing number of professional scriveners engaged in writing all sorts of documents, from guild records to private letters. The study of historical punctuation, however, has mostly been based on Old and Middle English handwritten material, in particular literary and scientific texts.
Unfortunately, the Early Modern English period has been the exception, with only a limited number of studies investigating scribal attitudes in different text types, including scientific, legal and literary texts, drama in particular (Calle-Martín and Miranda-García 2008: 356–360). The unexplored state of Early Modern English punctuation is even more striking in the case of printed texts, despite their active role in the process of standardization. Legal material is no exception, proclamations being "one of the most overlooked categories of printed material in the field of early modern history" (Kyle 2015: 771). In light of this, the present study analyses the punctuation system of Early Modern English printed legal material with the following objectives: a) to provide an inventory of the punctuation marks in Early Modern English printed texts; b) to offer a detailed account of the use and pragmatic functions of these symbols; and c) to assess the level of standardization of punctuation in these sources. The study relies on The Corpus of Early Modern English Statutes (compiled by Anu Lehto at the University of Helsinki), which contains approximately 214,000 words for the period 1491–1707 (Lehto 2013: 239). The corpus is divided into 25-year subperiods for diachronic comparison, each compiled to include two proclamations, with samples printed during the reign of each sovereign. Legal material has been chosen in view of a) its orality, since it was written to be read aloud; b) its conservativeness, which disfavours individual creativity in favour of standard practice; and c) its complex syntax, which requires an elaborate set of marks for all kinds of syntactic relationships.
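A diachronic comparison over 25-year subperiods might be organized along the lines of the sketch below, which bins texts by year from 1491 and reports each punctuation mark per 1,000 words. This is an illustrative assumption rather than the study's procedure: the mark inventory in `MARKS` is hypothetical (establishing the real inventory is one of the study's objectives), words are approximated as whitespace tokens that are not pure punctuation, and the function names are invented.

```python
from collections import Counter, defaultdict

START_YEAR = 1491     # first year covered by the corpus
PERIOD_LENGTH = 25    # width of each diachronic subperiod
# Hypothetical inventory for illustration; the study establishes its own.
MARKS = "/.,:;()?"


def period_label(year):
    """Map a year to its 25-year subperiod, counted from 1491."""
    first = START_YEAR + PERIOD_LENGTH * ((year - START_YEAR) // PERIOD_LENGTH)
    return f"{first}-{first + PERIOD_LENGTH - 1}"


def punctuation_profile(texts):
    """texts: iterable of (year, text) pairs.

    Returns {subperiod: {mark: occurrences per 1,000 words}}, counting as words
    only the whitespace-separated tokens that are not pure punctuation.
    """
    marks = defaultdict(Counter)
    words = Counter()
    for year, text in texts:
        period = period_label(year)
        marks[period].update(ch for ch in text if ch in MARKS)
        words[period] += sum(1 for tok in text.split() if tok.strip(MARKS))
    return {
        period: {m: 1000 * marks[period][m] / n for m in MARKS if marks[period][m]}
        for period, n in words.items()
        if n
    }


print(punctuation_profile([(1500, "no person shall engrose / any maner of corne .")]))
```

Normalizing per 1,000 words lets subperiods of unequal size be compared. Note that under this binning the last subperiod (1691–1715) extends beyond the corpus end date of 1707.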
This material has allowed us to gather conclusive data to ascertain a) the existence of an inventory of punctuation marks with a preconceived set of rules, corroborating an ongoing process of specialization at that time; and b), more importantly, the historical development of particular punctuation symbols, offering grounds for tracing the actual rise and fall of particular symbols and their functions in the history of English.

* Speaker

Calle-Martín, Javier and Antonio Miranda-García. 2008. "The Punctuation System of Elizabethan Legal Documents: The Case of G.U.L. MS Hunter 3 (S.1.3)". The Review of English Studies 59: 356–378.
Kyle, Chris R. 2015. "Monarch and Marketplace: Proclamations as Use in Early Modern England". Huntington Library Quarterly 78.4: 771–787.
Lehto, Anu. 2013. "Complexity and Genre Conventions: Text Structure and Coordination in Early Modern English Proclamations". In Andreas H. Jucker, Daniela Landert, Annina Seiler and Nicole Studer-Joho (eds.), Meaning in the History of English: Words and Texts in Context. Amsterdam/Philadelphia: John Benjamins. 233–257.

Keywords: Early Modern English, proclamations, punctuation, standardization

'Making it clear': A contrastive study of evidentials and boosters in contemporary political discourse
Ana Albalat-Mascarell¹ *†
¹ Universitat Politècnica de València (UPV) – Spain

Within Hyland's (2005) metadiscourse framework, evidentials and boosters are common rhetorical strategies that lend credibility to arguments either by drawing on external sources of information or by emphasising the speaker's own certainty about a proposition. Both strategies belong to a strongly interpersonal view of metadiscourse, which covers the ways speakers organize a discourse and adopt a stance towards both what is being discussed and their audience (Hyland, 2004, 2005, 2010; Hyland and Tse, 2004; Dafouz-Milne, 2008; Mur-Dueñas, 2011).
But although metadiscourse is a useful tool for explaining the interactional features of language across domains and genres, it has mostly been examined in relation to academic writing (Hyland, 2015). Little attention has been paid to the role of metadiscourse markers in non-academic discourses with an overtly persuasive component, such as political discourse, and even less from a comparative perspective exploring rhetorical and discursive cross-cultural differences (Mur-Dueñas, 2011) between English and other languages. I address this gap by focusing on the presence and function of evidentials and boosters in televised debates between political candidates held for the 2015 and 2016 general elections in Spain and for the 2016 presidential election in the United States of America. My objectives are, first, to extract the frequencies of the words and phrases performing these metadiscourse functions in televised debates aimed at very large audiences; second, to compare the rhetorical and discursive roles of the most frequently used expressions across speakers and relate them to the candidates' persuasive aims; and third, to explore linguistic and intercultural differences in the use of these strategies and contrast them with the outcome of each election. The analysis was based on a corpus of authentic data consisting of the transcripts of the debates involving the leaders of at least the two parties topping the opinion polls in each country and election (i.e. the PP and the PSOE, plus Podemos in the 2016 election, in Spain, and the Democratic and Republican parties in the United States). The quantitative use of evidentials and boosters was analyzed with the tool 'Metool', developed specifically to detect metadiscourse strategies.
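The frequency-extraction step can be illustrated with a minimal dictionary-based sketch that counts marker phrases per speaker and normalizes them per 10,000 words. It does not reproduce Metool, whose lexicons and interface are not described here; the mini-lexicons below are hypothetical English examples (the Spanish debates would need Spanish equivalents), and the function names are invented. Simple string matching also cannot tell a booster from a non-metadiscursive use of the same words, which is why the study pairs counts with a qualitative analysis.

```python
import re
from collections import Counter

# Hypothetical English mini-lexicons for illustration only; these are not
# Metool's lists, and the Spanish debates would need Spanish equivalents.
LEXICONS = {
    "evidentials": ("according to", "studies show", "as the data show"),
    "boosters": ("clearly", "of course", "certainly", "the fact is"),
}
WORD = re.compile(r"\w+")


def count_markers(text, phrases):
    """Count whole-word, case-insensitive matches of each marker phrase."""
    lowered = text.lower()
    return Counter({
        phrase: len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    })


def metadiscourse_rates(turns, lexicons=LEXICONS):
    """turns: {speaker: transcript}. Returns rates per 10,000 words by category."""
    rates = {}
    for speaker, text in turns.items():
        n_words = len(WORD.findall(text))
        rates[speaker] = {
            category: (10_000 * sum(count_markers(text, phrases).values()) / n_words
                       if n_words else 0.0)
            for category, phrases in lexicons.items()
        }
    return rates


print(metadiscourse_rates({"Candidate A": "Clearly, according to the data, we will win. Of course."}))
```

Normalized rates make speakers with very different amounts of talk time comparable within and across debates.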
The results show how the strategies identified tend to work in combination to represent a credible self with something plausible to say that challenges opposing views on the same issue. The main differences in the qualitative use of these metadiscourse devices among the political actors involved, and the positions they publicly adopt, also reveal a striking correlation between each speaker's communicative characteristics and the projection of personal authority and trustworthiness in their discourse. Last but not least, the cross-cultural analysis of evidentials and boosters in televised debates within the framework of interpersonal metadiscourse shows that the speaker's ability to construct an effective 'Ethos' varies with language and culture, but, quite surprisingly, a better performance in the debates does not necessarily translate into an election victory, either in the Spanish national context or in the Anglo-Saxon tradition of the United States.

* Speaker
† Corresponding author

Keywords: intercultural rhetoric, corpus-based analysis, metadiscourse, evidentials, boosters, political discourse

Author index
Álvarez-Gil, Francisco J., 73
Ahuactzin Martínez, Carlos Enrique, 75
Albalat-Mascarell, Ana, 183
Alcaraz-Mármol, Gema, 159
Almela Sánchez, Moisés, 170
Almela, Ángela, 159
Alonso Belonte, Isabel, 87
Alonso Ramos, Margarita, 81
Alonso-Almeida, Francisco, 73
Alruwaili, Awatif, 155
Andrade Navarro, Allen, 54
Andrades, Arsenio, 30
Baena Lupiáñez, María del Carmen, 56
Ballier, Nicolas, 61
Barcellos, Carolina, 85
Barrio, María Valentina, 52
Bendinelli, Marion, 103
Bertels, Ann, 143
Bestgen, Yves, 177
Bojovic, Dijana, 36
Boutmgharine Idyassner, Najet, 46
Brenchley, Mark, 157
Buckingham, Louisa, 168
Cabezas-García, Melania, 120
Cal Varela, Mario, 139
Calle-Martín, Javier, 181
Calvo-Rubio Jiménez, Estrella, 58
Cangir, Hakan, 89
Cantos Gómez, Pascual, 170
Cantos, Pascual, 159
Carrió-Pastor, María Luisa, 97
Cavalla, Cristelle, 38
Charles, Maggie, 153
Chaski, Carole, 159
Clavel Arroitia, Begoña, 67
Comer, Marie, 145
Comitre Narvaez, Isabel, 107
Corona, Isabel, 162
Criado Peña, Miriam, 128
Criado Sánchez, Raquel, 87
Curry, Niall, 118
Delgar Farrés, Gemma, 111
Dong, Jihua, 168
El Khamissy, Riham, 101
Esteban-Segura, Laura, 6
Fernández, Ester, 95
Fernández-Alcaina, Cristina, 18
Fernández-Domínguez, Jesús, 18
Fernández-Pena, Yolanda, 175
Fernandez Polo, Francisco Javier, 139
Gallego, Daniel, 63
Gandón-Chapela, Evelyn, 10
García Salido, Marcos, 81
Garcia González, Marcos, 81
Garcia-Marchena, Oscar, 147
Gautier, Laurent, 32, 109
Georgopoulos, Athanasios, 22
Gil Martínez, María Adelaida, 77
Giraldez Ceballos-Escalera, Joaquín, 34
Gledhill, Christopher, 28
Gonzalez Darriba, Patricia, 2
Grön, Leonie, 143
Gregori-Signes, Carmen, 164
Gris Roca, Joaquín, 87
Hamilton, Clive, 71
Hedeland, Hanna, 149
Herrando Rodrigo, Isabel, 135
Heylen, Kris, 83
Jacques, Marie-Paule, 116
Jeon, Yun Sil, 141
Jettka, Daniel, 149
John, Suganthi, 12
Kang, Beomil, 4
Kubler, Natalie, 105
Kunilovskaya, Maria, 112
Lambrechts, An, 83
Lara-Clares, Cristina, 18
Laso, Natalia Judith, 12
Lee, Sun-Hee, 4
León-Araúz, Pilar, 91, 120
Lissón, Paula, 61
Liu, Yuanyi, 44
Llorián, Susana, 26
Lorés Sanz, Rosa, 135
Lozano Bachioqui, Eleonora, 54, 174
Mapelli, Giovanna, 135
Martínez Casas, María, 137
Martínez Zavala, Sonia Paola, 24
Martínez, Inmaculada, 26
Martikainen, Hanna, 28
Martinez-Insua, Ana Elina, 16
Maruenda-Bataller, Sergio, 8
Mas, Inmaculada, 129
Mestivier (Volanschi), Alexandra, 28
Mestivier, Alexandra, 105
Mestre-Mestre, Eva M., 122
Mezeg, Adriana, 114
Morales Moreno, Albert, 59
Moreno-Ortiz, Antonio, 151
Moreno-Sandoval, Antonio, 44
Morgoun, Natalia, 112
Muñoz-Garcés, Alejandro, 141
Murillo, Silvia, 133
Nevzorova, Olga, 40
Nguyen Van, Cyril, 109
Nieto Mercado, Dalila Itzel, 174
Nieto, Guadalupe, 172
Pérez Béjar, Víctor, 50
Padilla Herrada, María Soledad, 50
Pallejá, Clara, 159
Pecman, Mojca, 105
Pennock-Speck, Barry, 67
Perez-Guerra, Javier, 16
Piccioni, Sara, 179
Piqué-Noguera, Carmen, 79
Prado-Alonso, Carlos, 126
Ramisch, Carlos, 65
Ramos Ruiz, Ismael, 99
Reimerink, Arianne, 91
Rodríguez-Abruñeiras, Paula, 8
Rodríguez-Puente, Paula, 161
Romero Medina, Agustín, 87
Romero-Barranco, Jesús, 48
Ruano, Pablo, 14
Sánchez-Cárdenas, Beatriz, 65
Salles-Bernal, Soluna, 6
Santaemilia, José, 124
Savvidou, Paraskevi, 69
Selmi, Afef, 32
Sinkuniene, Jolanta, 166
Suau-Jiménez, Francisca, 79, 135
Suleymanov, Dzhavdet, 42
Tran, Thi Thu Hoai, 38
Tutin, Agnès, 131
Vadasz, Noemi, 20
Verplaetse, Heidi, 83
Villayandre, Milka, 52
Yan, Rui, 116
Yoo, Hye Ryeong, 4
Zhang, Xingzi, 93
Zimina, Maria, 28
