Abstracts: Colloque international de Linguistique de Corpus
KEYNOTE SPEAKERS

Gloria Corpas Pastor
Universidad de Málaga, Spain
"Through the Corpus Glass: diatopy and idiomaticity in translated Spanish"
Gloria Corpas Pastor holds a PhD in English Philology from the Universidad Complutense de Madrid (1994). She has been Visiting Professor of Translation Technology at the Research Institute in Information and Language Processing, University of Wolverhampton, since 2007. She has also been Professor of Translation and Interpreting at the Universidad de Málaga since 2008. She is the Spanish expert on ISO committee TC37/SC2-WG6 "Translation and Interpreting". She has an extensive record of publications and serves on numerous national and international scientific committees and editorial boards. She is currently President of AIETI (Asociación Ibérica de Traducción e Interpretación), a member of the Advisory Board of EUROPHRAS ("European Society of Phraseology") and Vice-President of AMIT-A (Asociación de Mujeres Investigadoras y Tecnólogas de Andalucía).

Susan Hunston
University of Birmingham, United Kingdom
"Words and Phrases: re-thinking corpus-based approaches to lexis and grammar"
Susan Hunston is Professor of English Language at the University of Birmingham (UK). She specialises in corpus linguistics and discourse analysis. She is the author of several monographs, including Corpora in Applied Linguistics (2002, CUP) and Corpus Approaches to Evaluation: Phraseology and evaluative language (2011, Routledge). She is co-author of Pattern Grammar: a corpus-driven approach to the lexical grammar of English (1999, Benjamins). She co-edited Evaluation in Text: authorial stance and the construction of discourse (2000, OUP) and System and Corpus: exploring the connections (2005, Equinox). She has published numerous articles on using corpora to describe English grammar and lexis, and on corpora and discourse analysis.
Aquilino Sánchez Pérez
Universidad de Murcia, Spain
"The Cognitive Foundations of Corpus Linguistics"
Aquilino Sánchez Pérez was Director of the Escuela Oficial de Idiomas de Barcelona and taught at the Universidad de Barcelona and the Universidad Autónoma de Barcelona. He later obtained a Chair in the Department of English Philology at the Universidad de Murcia, where he continues to teach. His teaching and research have focused on the following areas:
- foreign language teaching and learning
- lexicology
- monolingual and bilingual (English-Spanish) lexicography
- corpus linguistics (corpus design and compilation, and automatic word sense disambiguation)

He was co-founder and Secretary of the Asociación Española de Lingüística Aplicada (AESLA). He was a founding member of the Asociación de Estudios Ingleses en España and of the European Association for Lexicography, and was President of AELINCO (Asociación Española de Lingüística del Corpus).

Contents

A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English, Patricia Gonzalez Darriba
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts, Sun-Hee Lee [et al.]
A Diachronic Study of the Conative Alternation Construction in English, Laura Esteban-Segura [et al.]
A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective, Sergio Maruenda-Bataller [et al.]
A corpus-based analysis of syntactic linking between antecedents and ellipsis sites in Post-Auxiliary Ellipsis in Modern English, Evelyn Gandón-Chapela
A corpus-based analysis of the collocational patterning of adjectives with abstract nouns in medical English, Natalia Judith Laso [et al.]
A corpus-stylistic analysis of direct thought presentation in Charles Dickens’s fifteen novels, Pablo Ruano
A data-driven analysis of linguistic complexity and proficiency in learner and native English, Javier Perez-Guerra [et al.]
Affix rivalry in English derivation: An onomasiological approach, Cristina Fernández-Alcaina [et al.]
Anaphora Resolution on the Fly – Pronouns in a Psycholinguistically Motivated Parsing System, Noemi Vadasz
Anaphora resolution in the interlanguage of English and Greek learners of Spanish: a corpus-based study, Athanasios Georgopoulos
Análisis de los aspectos pragmáticos en los discursos especializados de economía y finanzas: un trabajo basado en un corpus oral como apoyo a la interpretación, Sonia Paola Martínez Zavala
Aplicaciones del corpus CORPEN a la enseñanza y la evaluación de las unidades fraseológicas del español usado en contextos específicos, Inmaculada Martínez [et al.]
Applying Textometric Analysis to a Description of Cochrane Medical Abstracts and their Plain Language versions: Quantitative Characterisation of Plain Language in Medical Discourse, Christopher Gledhill [et al.]
Aproximación a la fraseología contrastiva en las sentencias del TJUE, Andrades Arsenio
Calcul de la saillance pour annoter un corpus anaphorique (RESUMAN), Afef Selmi [et al.]
Constitution d’un corpus juridique pour l’extraction des collocations, Joaquín Giraldez Ceballos-Escalera
Construction de corpus en vue d’une étude contrastive des structures résultatives en anglais et de leur traduction en français, Dijana Bojovic
Corpus en classe de langue. Exemple avec les marqueurs d’exemplification et de reformulation, Cristelle Cavalla [et al.]
Development of Tatar-Russian Socio-Political Dictionary of Collocations on Corpus Data, Olga Nevzorova
Development of annotation system for multiword constructions for Tatar National Corpus, Dzhavdet Suleymanov
Diccionario de terminología médica español - chino basado en corpus, Antonio Moreno-Sandoval [et al.]
Dire la nouveauté par les mots : les néologismes révélant les nouvelles tendances sociétales en France, Najet Boutmgharine Idyassner
Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?, Jesús Romero-Barranco
El corpus de fuentes digitales como herramienta para la gramática del discurso, Víctor Pérez Béjar [et al.]
El desacuerdo a través de la interrogación ecoica, María Valentina Barrio [et al.]
El lenguaje jurídico y el lenguaje de la ingeniería biomédica vistos desde la metodología de corpus, Eleonora Lozano Bachioqui [et al.]
Estudio comparativo de la traducción en inglés, francés y español de los aspectos lingüísticos y paralingüísticos de los cómics a partir de un corpus multimodal de género de terror, María Del Carmen Baena Lupiáñez
Estudio comparativo de las marcas de uso en los repertorios lexicográficos actuales, Estrella Calvo-Rubio Jiménez
Estudio contrastivo de corpus para identificar los rasgos diacrónicos del discurso normativo catalán: estudio de los Estatutos de autonomía de 1932, 1979 y 2006, Albert Morales Moreno
Estudio de la aplicabilidad de la ley de Zipf y de la ley de Heaps en los corpus de aprendientes de inglés, Nicolas Ballier [et al.]
Extracción de fraseología contable con Sketch Engine. Propuesta de flujo de trabajo, Daniel Gallego
Extracting semantic frame structures from Environmental Sciences corpora, Beatriz Sánchez-Cárdenas [et al.]
Facework in a telecollaboration student corpus, Pennock-Speck Barry [et al.]
From text to word and from word to morpheme: Exploring the interface of corpus linguistics and word formation study with evidence from Modern Greek, Paraskevi Savvidou
Functional and thematic n-grams in specialized corpora: the case of academic English, French and Spanish, Clive Hamilton
Gender-based differences in the use of epistemic modals in late Modern English scientific register, Francisco Alonso-Almeida [et al.]
Gobernabilidad y democracia en México. Unidades fraseológicas del Ejecutivo Federal 2012-2016 desde el Análisis Crítico del Discurso, Carlos Enrique Ahuactzin Martínez
Gramática española para hablantes de francés: el uso de la preposición "de" después de matrices del tipo es posible, María Adelaida Gil Martínez
Hedging in tourism discourse: the variable genre in academic vs professional texts, Francisca Suau-Jiménez [et al.]
Identificación de fórmulas recurrentes en español académico, Marcos García Salido [et al.]
Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts, Heidi Verplaetse [et al.]
Investigating style and conventionality in literary translation: a corpus-based approach, Carolina Barcellos
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study, Joaquín Gris Roca [et al.]
Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective, Hakan Cangir
Knowledge extraction for TKB phraseology module design, Pilar León-Araúz [et al.]
L’analyse contrastive des références au passé en français et en chinois - Sur le corpus des récits, Xingzi Zhang
La adquisición de los verbos de cambio: Un análisis de la interlengua de aprendices de español (L1 sueco), Ester Fernández
La detección y etiquetado de las estrategias metadiscursivas en artículos académicos: METOOL, María Luisa Carrió-Pastor
La economía al borde de un ataque de nervios: metáforas médicas en el discurso periodístico económico, Ismael Ramos Ruiz
La mise en discours des données chiffrées dans les textes de vulgarisation scientifique, Riham El Khamissy
La modalité dans les discours politiques : segments phraséologiques en langue et en discours. Exploration textométrique d’un corpus de débats présidentiels états-uniens (1960-2016), Marion Bendinelli
La traduction des "megatermes" anglais de type erythrocyte invasion-inhibitory response : une approche fondée sur corpus et analyse du discours, Mojca Pecman [et al.]
La traduction publicitaire : approche par corpus, Isabel Comitre Narvaez
Le continuum lexique-grammaire en genre spécialisé à partir de corpus maison, Laurent Gautier [et al.]
Le marqueur discursif "donc" dans deux corpus dialogaux de différente nature, Gemma Delgar Farrés
Learner vs. professional translational behavior: The case of discourse markers, Maria Kunilovskaya [et al.]
Les appositions nominales en français et en slovène : étude contrastive sur le corpus FraSloK, Adriana Mezeg
Les constructions verbales en comme : de l’écrit scientifique à l’écrit académique des étudiants natifs/non-natifs, Marie-Paule Jacques [et al.]
Meeting the reader in academic writing: reader pronouns in English and French, Curry Niall
Multi-word terms: disclosing the semantic relations in noun compounds, Melania Cabezas-García [et al.]
Multilingual extraction of terminology from specialised corpora, Eva M. Mestre-Mestre
Naming practices and media constructions of reality in Spanish: A corpus-based perspective on violence against women news (2005-2015), José Santaemilia
On the Endophoric, Abstract and Narrative Nature of Idiomatic ‘Do So’ in Legal texts, Journalistic Texts and Written Correspondence, Carlos Prado-Alonso
On the Grammaticalization Path of the Quasi-coordinator as well as, Miriam Criado Peña
Onomasiología del sentimiento: los corpus lingüísticos como fuente de datos para la semántica y la combinatoria sintagmática de los nombres de emoción en español, Inmaculada Mas
Phraseological routines in scientific writing: the example of metatextual routines in French, Agnès Tutin
Phraseology and discourse grammar in English as a lingua franca: ‘on the contrary’ and ‘on the other hand’ in unedited research papers, Silvia Murillo
ROUND TABLE: Corpus-based analysis of interpersonal metadiscourse in specialized domains: academic vs professional and social genres. Theoretical and methodological challenges, Francisca Suau-Jiménez [et al.]
Rocking the corpus. A discourse analysis of pop rock lyrics, María Martínez Casas
SUNCODAC: A Spanish-English corpus of computer-mediated student discussions, Mario Cal Varela [et al.]
Secuencia gramatical para la enseñanza del español como lengua extranjera, Yun Sil Jeon [et al.]
Semantic constraints on MWU formation: Evidence from clinical records, Leonie Grön [et al.]
Sobre la cuasi-sinonimia de poner y meter en español: un análisis de regresión logística de dos verbos locativos, Marie Comer
Spanish Fragments and Polar Verbless Clauses. Typology and Corpus Distribution, Oscar Garcia-Marchena
Spoken Language Corpora under Examination, Hanna Hedeland [et al.]
Strategies for Processing Large Corpora for Linguistic Inquiry and Natural Language Processing Tasks, Antonio Moreno-Ortiz
Students’ use of the n-grams tool to learn about phraseology in academic writing, Maggie Charles
Teachers’ Dispositions Towards the Use of Corpus-Based Approaches in Teaching English as a Foreign Language in Higher Education, Awatif Alruwaili
The Developmental Relationship between Spoken and Written Clause Packaging in an English Secondary School, Mark Brenchley
The Psycholinguistic Profile of Domestic Abusers: A Corpus-Based Approach, Ángela Almela [et al.]
The XML Annotation of A Corpus of Historical English Law Reports 1535-1999: A Progress Report, Paula Rodríguez-Puente
The construction of shared feelings: analysis of affect in a corpus of obituary comments in online newspapers, Isabel Corona
The implied consumer in British hotel websites, Carmen Gregori-Signes
The power of English: I and we in ELF and in ENL academic discourse, Jolanta Sinkuniene
The textual colligation of stance phraseology in cross-disciplinary academic discourses: the timing of authors’ self-projection, Louisa Buckingham [et al.]
Towards an extended lexical grammar: Complex colligational patterns of the noun cause, Moisés Almela Sánchez [et al.]
Técnicas de caracterización de los personajes femeninos en Galdós: una aproximación desde los estudios de corpus, Guadalupe Nieto
Unidades fraseológicas en la subtitulación de una serie del género de drama, Dalila Itzel Nieto Mercado [et al.]
Verbal agreement with NCOLL-of-NPL subjects in the inner varieties of English in GloWbE, Yolanda Fernández-Pena
Évaluer le seuil de fréquence pour la sélection des paquets lexicaux: de bonnes nouvelles avec quelques réserves, Yves Bestgen
Índice de creatividad metafórica y universales de traducción: propuesta metodológica a partir de un corpus de informes de responsabilidad social empresarial, Sara Piccioni
‘His maiestie chargeth, that no person shall engrose any maner of corne’. The Standardization of Punctuation in Early Modern English Legal Proclamations, Javier Calle-Martín
‘Making it clear’: A contrastive study of evidentials and boosters in contemporary political discourse, Ana Albalat-Mascarell
List of authors

A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English
Patricia Gonzalez Darriba 1,*
1 Rutgers, The State University of New Jersey [New Brunswick] (RUTGERS), 100 George Street, New Brunswick, NJ 08901, United States

This paper uses a corpus-based approach to test two sets of hypotheses that make opposite predictions about the Unique Item T-Universal (Chesterman, 2004, 2010). The first is Tirkkonen-Condit's (2004) Unique Item Hypothesis, which claims that unique items are under-represented in translated texts. The second comprises Baker's (1993) Simplification Hypothesis and Halverson's (2003) Gravitational Pull Hypothesis, which predict that unique items are over-represented in translated texts.

To test these hypotheses, two comparable corpora were selected: the Translational English Corpus (TEC; Baker, 2003) and the Corpus of Contemporary American English (COCA; Davies, 2008). They were analyzed for the relative frequency of English self-directed motion expressions such as float into and fly out. Because the translated English texts drawn from the TEC have Spanish source texts, the study can compare the two widely accepted motion lexicalization patterns of the languages involved: satellite-framed constructions in English and verb-framed constructions in Spanish (Talmy, 1985; Slobin, 1996; Levin and Rappaport, 2016). A total of 28 English manner-of-motion verbs, combined with 8 English path-denoting satellites, were used to search for, count and compare self-directed motion expressions in the TEC and the COCA. The search yielded a total of 41,852 tokens across both corpora.
This total breaks down into 209.2 self-directed motion expressions per million words in the TEC and 395.5 per million words in the COCA. Data from the 28 verbs in both corpora were analyzed with an independent-samples t-test. The test showed that self-directed motion expressions are significantly more frequent in the COCA (M = 3.32) than in the TEC (M = 1.76; t(219.267) = -2.274, p = .012; Levene's test: p = .029).

A two-way ANOVA was also conducted to compare the main effects of Corpus and Lexical Frequency, and the Corpus × Lexical Frequency interaction, on the number of self-directed motion occurrences by verb form per million words. Both main effects were significant, but no Corpus × Lexical Frequency interaction was found.

These results confirm Tirkkonen-Condit's Unique Item Hypothesis. They show that spontaneous, non-translated English is significantly richer in self-directed motion expressions than translated English, regardless of the frequency of the verb in the expression. They also disprove the Simplification Hypothesis (Baker, 1993) and the Gravitational Pull Hypothesis (Halverson, 2003). In addition, the results provide a baseline for future research into the cognitive processes involved in translating self-directed motion expressions.

* Speaker

Keywords: comparable corpora, self-directed motion, translation universals, under-representation of unique items.
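The statistics above can be made concrete with a short sketch. Raw hit counts from corpora of different sizes are only comparable once they are normalised to a common base, here per million words. The fractional degrees of freedom (219.267), together with the significant Levene test, suggest that an unequal-variance (Welch) t-test was reported. The Python below is a minimal sketch under those assumptions. The function names are hypothetical and the sample values are toy numbers, not the study's data. A p-value would still have to be read from a t distribution, for example with SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def per_million(count: int, corpus_tokens: int) -> float:
    """Normalise a raw hit count to a rate per million words."""
    return count / corpus_tokens * 1_000_000


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, this does not assume equal variances,
    which is the usual choice when Levene's test is significant.
    """
    na, nb = len(a), len(b)
    v_a = variance(a) / na  # squared standard error of mean(a)
    v_b = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (na - 1) + v_b ** 2 / (nb - 1))
    return t, df


# Toy per-verb rates (per million words) for two corpora; not the study's data.
translated = [1.0, 2.0, 3.0, 4.0]
spontaneous = [2.0, 4.0, 6.0, 8.0]
t, df = welch_t(translated, spontaneous)
print(f"t({df:.3f}) = {t:.3f}")  # prints t(4.412) = -1.732
```

Normalising before testing means that the t-test compares rates rather than raw counts, so the much larger COCA does not dominate simply by size.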
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts
Sun-Hee Lee 1,*,†, Beomil Kang 2,‡, Hye Ryeong Yoo 3,§
1 Department of East Asian Languages and Cultures, Wellesley College (EALC), Green Hall 236B, 106 Central Street, Wellesley, MA 02481, United States
2 Department of Korean Language and Literature, Yonsei University (Korean Yonsei), Oesolgwan 214, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, South Korea
3 Department of Korean Language and Literature, Yonsei Graduate School (Yonsei), Oaesolgwan 214, Yonsei-ro 50, Seodaemun-gu, Seoul, South Korea

This study offers a corpus-based genre analysis of phraseological expressions in Korean academic prose, including collocations, colligations and prefabricated lexical bundles (or formulaic expressions). Korean is an agglutinative language, so its phrasal structures incorporate particles and verbal endings into word units. As a result, they are more complex than the corresponding English structures. We explore the relevant challenges and new methodological tools for capturing the typologically distinct properties of Korean. In doing so, we identify unique genre-specific properties of L1 academic texts through their prefabricated phraseological units.

We collected a corpus of 10.9 million ecel (space-delimited units) comprising 2,171 academic theses in the humanities and social sciences with the highest rankings in the Korea Citation Index. From this corpus we extracted phraseological units based on language-model N-grams and processed them with statistical tools. We address related challenges in language-specific data processing and analysis, and we present the distinct linguistic functions of phraseological units in Korean academic prose in comparison with other registers.
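This abstract describes three refinements to the extraction procedure: morpheme-based N-grams, verbal lexemes fused with the dependent endings that follow them, and left and right context slots recorded for each N-gram. The Python below is a rough, minimal illustration of these steps, not the authors' pipeline. The following are all assumptions:
- the input format
- the Sejong-style tags (`XSV`, `VV`, and tags beginning with `E` for endings)
- the Yale-romanised morphemes
- the `min_freq` cut-off

```python
from collections import Counter, defaultdict

# Illustrative tags, loosely Sejong-style (an assumption): verbal lexemes and
# verb-deriving suffixes below; endings are any tag starting with "E" (EP, EC, EF...).
VERB_TAGS = {"VV", "VA", "VX", "XSV"}


def merge_verbal_endings(sent: list[tuple[str, str]]) -> list[str]:
    """Fuse each verbal lexeme with the dependent endings that follow it."""
    units: list[str] = []
    in_verb = False
    for morph, tag in sent:
        if in_verb and tag.startswith("E"):
            units[-1] += "-" + morph  # attach the ending to the verbal unit
        else:
            units.append(morph)
            in_verb = tag in VERB_TAGS
    return units


def ngrams_with_context(sentences, n=3, min_freq=2):
    """Count n-grams of merged units and record their left/right neighbours."""
    counts: Counter = Counter()
    left: defaultdict = defaultdict(Counter)
    right: defaultdict = defaultdict(Counter)
    for sent in sentences:
        units = merge_verbal_endings(sent)
        for i in range(len(units) - n + 1):
            gram = tuple(units[i:i + n])
            counts[gram] += 1
            left[gram][units[i - 1] if i > 0 else "<s>"] += 1
            right[gram][units[i + n] if i + n < len(units) else "</s>"] += 1
    # Keep only n-grams above the frequency cut-off, together with their context slots.
    return {g: (c, left[g], right[g]) for g, c in counts.items() if c >= min_freq}


# Hypothetical tagged sentence (romanised), for illustration only.
sent = [("i", "MM"), ("yenkwu", "NNG"), ("nun", "JX"),
        ("pwunsek", "NNG"), ("ha", "XSV"), ("nta", "EF")]
print(merge_verbal_endings(sent))  # ['i', 'yenkwu', 'nun', 'pwunsek', 'ha-nta']
```

Merging endings before counting collapses inflectional variants of the same verbal unit. This is one way to reduce the proliferation of patterns that purely morpheme-based N-grams produce.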
Our study demonstrates the need to integrate corpus-driven and corpus-based methodologies to process meaningful lexico-grammatical combinations in Korean. In Korean, strong morphosyntactic relations hold across phrasal boundaries via a diverse set of particles and endings. Our study also shows that combining N-gram-based extraction with morpheme-based cut-offs is more useful for identifying meaningful combinations. In line with Jang (2015), we argue for adding context sensitivity to N-grams to identify more useful patterns, especially when processing agglutinative languages like Korean. For example, collecting the preceding and following slots of an extracted N-gram and using them to decide the final pattern makes the output more usable.

When post-processing the frequency counts of extracted N-grams, we merge a verbal lexeme with any following dependent morphemes that make no meaningful linguistic contribution to the phraseological unit. This step significantly reduces the number of patterns that morpheme-based N-gram processing otherwise produces in Korean. Based on the extracted phraseological expressions, we provide a genre-focused linguistic analysis of the Korean academic register.

We are still extracting meaningful phraseological patterns, but our pilot study suggests that referential expressions, stance expressions, hedges and similar devices perform dynamic functions in Korean academic texts. Despite the lack of referential expressions in Korean, phraseological units with the demonstrative pronouns i 'this' and ku 'that' are highly frequent in academic contexts. Expressions of epistemic and attitudinal/modality stance are used more extensively in the Korean academic register, which contrasts with Biber's (2004) analysis of English academic prose. Expressions of indirect quotation and hedges are prominent in the extracted output. These findings suggest that the sociocultural value of indirectness is pervasively reflected in Korean academic writing.

Our study will provide a platform for further applied and pedagogical research on language acquisition and Korean for academic purposes (KEP), using a larger corpus of more than 100 million ecel. The long-term goal of our research is a full-fledged genre analysis of L1 academic texts as well as L2 acquisition data. The study also explores dynamic interactions between grammar and lexicon in agglutinative languages like Korean. At the same time, it identifies language-specific features in the processing of phraseological units and in the genre analysis of academic texts.

* Speaker. †, ‡, § Corresponding authors.

Keywords: phraseological expressions, formulaic expressions, collocation, genre analysis, academic register

A Diachronic Study of the Conative Alternation Construction in English
Laura Esteban-Segura 1,*,†, Soluna Salles-Bernal 1,*
1 Universidad de Málaga (UMA), Spain

The conative alternation is a subtype of transitivity alternation with a transitive variant and an intransitive variant realized as an at-construction. Syntactically, it occurs with transitive verbs and is therefore described as a case of preposition insertion (the preposition at is inserted before the direct object). Semantically, it can be described as a "detransitivizing" construction, since conative uses of transitive verbs contrast with their transitive counterparts (Perek 2015: 90). Accordingly, the argument can be direct (subject, direct object or indirect object) or oblique.

(1) a. Kim cut the pie.
    b. Kim cut at the pie (drunkenly). (Beavers 2006: 6)
The patient ("the pie") can be realized either as the direct object (1a) or as an oblique marked by the preposition at (1b). This yields a semantic contrast. In the transitive variant the patient is known to have been affected in some way, whereas in the at-construction this is not necessarily the case. The action denoted by the verb may or may not have been completed, and the alternation may convey "a reduced degree of effectiveness" (Riemer 2010: 354). Example (2b) below illustrates this, since it implies that the action was not completely successful:

(2) a. The zombies slashed my face.
    b. The zombies slashed at my face.

The construction has been studied before (van der Leek 1996; Broccias 2001, 2003; Beavers 2010; Perek and Lemmens 2010; Guerrero-Medina 2011; Perek 2015), but it remains scarcely investigated from a diachronic point of view. Our main objective is therefore to investigate the origin and development of the conative construction in English by examining its occurrence in several historical corpora. To this end, we first compiled a comprehensive list of verbs that allow the construction and then selected the verbs under study. We then carried out a collostructional analysis, which "investigates which lexemes are strongly attracted or repelled by a particular slot in the construction (i.e. occur more frequently or less frequently than expected)" (Stefanowitsch and Gries 2003: 214). This analysis can help establish which verbs favour the construction over others in the different corpora. Some of our preliminary results show that the construction was already present in Old English and that in most instances the subject is agentive or animate.
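Collostructional analysis in the sense of Stefanowitsch and Gries (2003) is commonly computed per verb from a 2×2 table. The table records the verb in and outside the construction, and all other verbs in and outside it. A Fisher exact test is then run on the table, and its p-value is log-transformed into a collostruction strength. The Python below is a minimal sketch under that assumption. The function names and the toy counts are hypothetical, not frequencies from the historical corpora used in the study.

```python
import math


def collexeme_pvalue(a: int, b: int, c: int, d: int) -> tuple[float, bool]:
    """One-tailed Fisher exact p-value for a 2x2 collexeme table.

    a: the verb in the construction       b: the verb elsewhere
    c: other verbs in the construction    d: other verbs elsewhere
    Returns (p, attracted); attracted means a exceeds its expected frequency.
    """
    n = a + b + c + d
    verb_total = a + b
    cx_total = a + c
    denom = math.comb(n, verb_total)

    def prob(x: int) -> float:
        # Hypergeometric probability of x verb tokens falling in the construction.
        return math.comb(cx_total, x) * math.comb(n - cx_total, verb_total - x) / denom

    expected = verb_total * cx_total / n
    attracted = a > expected
    lo = max(0, verb_total - (n - cx_total))
    hi = min(verb_total, cx_total)
    # Sum the tail in the direction of the observed deviation.
    xs = range(a, hi + 1) if attracted else range(lo, a + 1)
    return sum(prob(x) for x in xs), attracted


def collostruction_strength(a: int, b: int, c: int, d: int) -> float:
    """Signed -log10(p): positive for attraction, negative for repulsion."""
    p, attracted = collexeme_pvalue(a, b, c, d)
    strength = -math.log10(max(p, 1e-300))  # guard against log10(0)
    return strength if attracted else -strength


# Toy counts: a verb seen 12 times in the at-slot and 30 times elsewhere.
print(round(collostruction_strength(12, 30, 40, 5000), 2))
```

A positive value means that the verb is attracted to the at-slot, and a negative value means that it is repelled. Ranking verbs by this score per corpus is one way to see which verbs favour the conative construction in each period.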
* Speaker. † Corresponding author.

Keywords: conative alternation, verb alternation, history of English, collostructional analysis

A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective
Sergio Maruenda-Bataller 1,*, Paula Rodríguez-Abruñeiras 1,*
1 IULMA/Universitat de València, Spain

In the last thirty years, there have been important advances in the media coverage and discussion of violence against women (VAW) (Aran Ramspott & Medina Bravo 2006; Vallejo-Rubinstein 2005). It is now indisputable that intimate partner violence (IPV) is a key issue in political, social and institutional discourses and also on the news selection agenda of news producers. The recognition of this phenomenon is largely due to the media. They have played a decisive role in moving the issue from the private, personal sphere to the public sphere, thus ensuring its visibility and raising public awareness (Berganza Conde 2003).

However, some authors (e.g. Altés 1998; Alberdi & Matas 2002) have argued that this has come at a cost. The media are torn between two conflicting interests: treating these grievous cases with the required ethics, and attracting the largest possible audience, which is almost 'naturally' achieved through sensationalism. Journalists can create different pictures of domestic violence and "confirm and debunk the myths surrounding it by choosing certain topics, sources, facts, and words over others" (Bullock & Cubert 2002: 479).

Against this backdrop, the present study takes a corpus-based approach to the discursive devices used to construct newsworthiness in IPV news in Spanish and UK dailies. It draws on an ad hoc corpus of gender violence news reports published between 2005 and 2015. Specifically, we explore how media outlets have discursively represented women victims of IPV by means of news values over the last decade.
Subsidiary to this, we explore the way news values are exploited ideologically to construct discourse prosodies around women victims of IPV, violent episodes and perpetrators. The results provide insights into the social configuration and definition of women and their identities in contemporary written media on IPV over time. For our purposes, we apply Bednarek & Caple’s (2012; 2014) linguistic approach to news values as discursive realisations of newsworthiness that "exist in and are constructed through discourse" (Bednarek & Caple 2014: 136). Our analysis combines a quantitative approach with close qualitative readings of concordance lines to identify frequent linguistic occurrences in the corpus that may give rise to discourse prosodies (Bednarek 2006; Baker et al. 2008; Baker & Levon 2015). We pay attention to shared and differing values across cultures, together with the most relevant discourse prosodies and their ideological implications. Our results substantiate the existence of two polarised discourses which are nevertheless inextricably linked: a discourse of death, violence and terrible suffering, and another of institutional and social support. The former is mainly conveyed through Negativity and Impact, while the latter is conveyed through Eliteness and Positivity. On the whole, these discourses are similarly constructed in the four data sets. However, the concordance analysis points to remarkable differences. It shows that Negativity has more critical overtones in the Spanish newspapers, and that reports on abusers are often constructed as more impersonal in the UK dailies. As for the depiction of extreme negative emotions, the higher number of occurrences, together with a wider range of word combinations, constructs the Spanish reports as more ideological, if not sensationalist, thus exploiting readers’ interest in crime and violence.
Keywords: intimate partner violence, news values, newsworthiness, CADS, women
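The close reading of concordance lines described above starts from keyword-in-context (KWIC) output. The sketch below assumes plain-text news reports and a crude word-character tokenizer; the function `kwic`, the window size and the sample sentence are hypothetical and merely stand in for whatever concordancer the authors actually used.

```python
import re


def kwic(text: str, node: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, node, right context) for every hit of `node`."""
    tokens = re.findall(r"\w+", text)          # crude tokenizer: runs of word characters
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():      # case-insensitive match on the word form
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


sample = "The victim was killed. Police said the victim had reported abuse."
for left, hit, right in kwic(sample, "victim", width=2):
    print(f"{left:>12} [{hit}] {right}")
```

Right-aligning the left context, as the print call does, is the conventional KWIC layout: it lines up the node word so that recurring collocates, the raw material for discourse prosodies, can be scanned vertically.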
A corpus-based analysis of syntactic linking between antecedents and ellipsis sites in Post-Auxiliary Ellipsis in Modern English
Evelyn Gandón-Chapela (1)
(1) University of Cantabria and University of Vigo – Spain
This study analyses the type of syntactic linking established between the antecedent clause(s) and the ellipsis site(s) in cases of Post-Auxiliary Ellipsis (PAE) in Modern English, using the Penn Parsed Corpus of Modern British English (1700–1914; one million words; eighteen different genres). The term ‘PAE’ (Sag 1976; Warner 1993; Miller 2011; Miller & Pullum 2014) covers those cases in which a Verb Phrase, Prepositional Phrase, Noun Phrase, Adjective Phrase or Adverbial Phrase is omitted after modal auxiliaries, the auxiliaries be, have and do, or the infinitival marker to. VP ellipsis (VPE) and Pseudogapping (PG) are the two subtypes of PAE under investigation:
(1) That I had received such from Edward also I need not mention; but I do, you see, because it is a pleasure. [VPE: coordination]
(2) They can by no means, therefore, be members of happiness; for if they were, happiness might be said to be made up of one member. [VPE: adverbial subordination]
(3) I can recollect nothing more to say. When my letter is gone, I suppose I shall. [VPE: none]
(4) A skilled florist will produce a finer effect with a few inexpensive blossoms than an unskilled one will with a cartload of choice material. [PG: comparative subordination]
(5) but did not admire the strain of its poetry in general, though I did its morality. [PG: adverbial subordination]
Very few corpus-based works have examined this aspect for Present-Day English (Hardt & Rambow 2001; Nielsen 2005; Hoeksema 2006; Bos & Spenader 2011; Sharifzadeh 2012; Miller 2014). Here I extend these studies by analysing the type of syntactic linking in PAE constructions in Modern English and by presenting an algorithm for retrieving instances of PAE with CorpusSearch 2.
This algorithm achieves a recall of 0.97 and is applicable to any parsed corpus that follows the conventions of the Penn Parsed Corpus of Modern British English. The results show that, regarding PG, the vast majority of cases are comparative constructions (74.42%), followed by cases with no syntactic linking (15.12%), coordination (4.65%), adverbial subordination (4.65%) and relative subordination (1.16%). Comparison with other studies on PG in Present-Day English (Hoeksema 2006; Sharifzadeh 2012; Miller 2014) reveals that instances of PG with NP remnants show a stronger preference for comparative constructions in Present-Day English (around 90%) than in Modern English (70%). Regarding VPE, in over 50% of the examples there is no syntactic linking between the source and the target of ellipsis, in contrast with PG (15.12%). The second most frequent type of syntactic linking is comparative subordination (31.51%). However, although the percentage of comparative constructions is high in VPE, it is about 2.4 times as high in PG (74.42%). Far less common are relative subordination (7.22%), coordination (5.56%) and adverbial subordination (5.37%). Compared with Bos & Spenader’s (2011) findings, the three most frequent types of linking are the same in both studies: as-appositives, comparatives and lack of syntactic linking. Hardt & Rambow (2001), for their part, found that the different forms of subordination favour VPE, while the absence of a direct relation disfavours it. However, absence of syntactic linking is among the three most frequent types both in Bos & Spenader’s (2011) work and in this paper.
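Recall here is the share of genuine PAE instances that the query retrieves. The abstract does not say how the 0.97 figure was measured; the sketch below assumes a manually verified gold set of sentence IDs to compare against, and the IDs are invented. The function name `recall_precision` is hypothetical, and precision is added only because a recall figure is usually read alongside it.

```python
def recall_precision(retrieved: set, gold: set) -> tuple[float, float]:
    """Recall = true hits / all gold instances; precision = true hits / all retrieved."""
    true_hits = len(retrieved & gold)
    recall = true_hits / len(gold) if gold else 0.0
    precision = true_hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical sentence IDs: a hand-checked gold set vs. the output of a corpus query
gold = set(range(1, 101))                     # 100 PAE instances identified by hand
retrieved = set(range(1, 98)) | {150, 151}    # query finds 97 of them plus 2 false hits
r, p = recall_precision(retrieved, gold)
print(f"recall={r:.2f} precision={p:.2f}")
```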
Keywords: ellipsis, syntactic linking, Modern English

A corpus-based analysis of the collocational patterning of adjectives with abstract nouns in medical English
Natalia Judith Laso (1), Suganthi John (2)
(1) University of Barcelona (UB) – Spain; (2) University of Birmingham – United Kingdom
Research on domain-specific phraseology has shown that it is challenging for EAL writers to acquire phraseological competence in academic English and to develop a good working knowledge of domain-specific collocational patterns (Carter 1998; Williams 1998; Wray 1999; Gledhill 2000; Flowerdew 2003; Biber 2006; Hyland 2008, 2016; Granger & Meunier 2008; Author 1 & Author 2 2013; Pérez-Llantada 2014). This is especially apparent in scientific discourse, where research grows at a rapid pace and researchers are often required to disseminate their results equally rapidly to an international audience. The struggle for EAL writers lies in learning the discourse conventions of the scientific genre so that their results receive the attention they seek from other members of the scientific community. Corpus-based analyses have been especially relevant to genre analysis, since a genre is a specific language practice characterised by a number of linguistic features and phraseological conventions. It can therefore be claimed that genres make use of different ways of expressing meaning (Swales 1990; Hunston 2002). This assumption is intimately linked with the concept of local grammar (Gross 1993; Barnbrook & Sinclair 1995; Hunston & Sinclair 2000), which consists of a description of particular areas of language (e.g. the collocational and phraseological conventions characteristic of scientific discourse) rather than of the language as a whole (Bednarek 2007). The aim of this paper is to describe one pattern commonly found in scientific discourse, i.e. abstract nouns in combination with adjectives, so as to contribute to the characterisation of this combinatorial pattern in medical science writing. The corpus analysed in this study is the Health Science Corpus (HSC), a representative sample of health science research articles specifically compiled to investigate the lexico-grammatical patterns surrounding non-technical terms in scientific English and the conventionalised phraseological characteristics of this genre. The observations drawn have contributed to our understanding of the positions and typology of adjectives in combination with abstract nouns in medical English. Furthermore, this study has highlighted the usefulness of collocational evidence obtained from textual corpora in EFL and ESP settings, as it helps EAL writers focus on slices of real language and on high-frequency word combinations. To this end, the findings of this study have informed the development of SciE-Lex, a reference tool that provides information about the meanings and the grammatical and collocational patterns of general terms frequently used in medical English. The aim of SciE-Lex is to help the Spanish professional medical community use appropriate collocational patterns in their research articles. Other publicly available resources, such as existing technical and scientific monolingual dictionaries, focus mainly on terminological and encyclopaedic information or (as in the case of bilingual and multilingual dictionaries) provide translation equivalents without further information about the context on which the meaning of a given lexical entry depends.
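One common way to rank adjective + abstract noun combinations is pointwise mutual information (MI). The abstract does not name the association measure used, so MI, the Universal-Dependencies-style tags `ADJ`/`NOUN`, the function name `adj_noun_mi` and the toy sentence are all assumptions. The sketch only captures attributive adjectives immediately preceding the noun; predicative uses (e.g. "the difference was significant") would need a wider window or a parser.

```python
from collections import Counter
from math import log2


def adj_noun_mi(tagged: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Pointwise MI for adjacent ADJ + NOUN pairs in a POS-tagged token list."""
    words = [w.lower() for w, _ in tagged]
    n = len(words)                 # corpus size in tokens
    freq = Counter(words)          # single-word frequencies
    pairs = Counter(
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and t2 == "NOUN"
    )
    # MI = log2( f(adj, noun) * N / (f(adj) * f(noun)) )
    return {pair: log2(f * n / (freq[pair[0]] * freq[pair[1]])) for pair, f in pairs.items()}


tagged = [("significant", "ADJ"), ("difference", "NOUN"), ("was", "VERB"),
          ("significant", "ADJ"), ("difference", "NOUN"), ("in", "ADP"),
          ("risk", "NOUN"), ("high", "ADJ")]
print(adj_noun_mi(tagged))
```

MI is known to over-reward rare pairs, so in practice a minimum co-occurrence frequency is usually imposed before ranking.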
Consequently, lexical databases like SciE-Lex, as well as specialised dictionaries that take into account the lexico-grammatical patterning of lexical units and acknowledge that meaning is highly dependent on the context of co-occurrence of the word (Barnbrook 2007: 191), are considered extremely valuable to the EAL scientific community.
Keywords: phraseological units, abstract nouns, EAL writers, medical community, ESP corpus investigation

A corpus-stylistic analysis of direct thought presentation in Charles Dickens’s fifteen novels
Pablo Ruano (1)
(1) Universidad de Extremadura (UEx) – Spain
In this presentation, a corpus-stylistic analysis of direct thought presentation will be carried out on a corpus of Charles Dickens’s fifteen novels (c. 3.8 million words). The aim of the analysis is to delve deeper into Dickens’s presentation of his characters’ thoughts, an aspect so far underexplored, perhaps owing to the ‘lack of psychological inwardness and depth in his characters’ (McParland, 2011: 209). Despite this lack of psychological depth, though, Dickens consistently reported his characters’ thoughts throughout his fifteen novels. A systematic analysis of how he did so is therefore in order, if only because no comprehensive account of it has yet been attempted. As will be shown, occurrences of direct thought (henceforth, DT) can be retrieved effectively with a corpus methodology, which makes it possible to analyse Dickens’s use of this mode of thought presentation systematically.
Specifically, 244 occurrences of DT have been retrieved here, a much larger set of examples than the twenty-one examined by Busse (2010) in the most comprehensive analysis of discourse presentation strategies in nineteenth-century fiction to date.[1] The analysis of these 244 occurrences will not only confirm some of Busse’s findings regarding DT in nineteenth-century narrative fiction but also unveil hitherto unremarked patterns of form and function in Dickens’s presentation of his characters’ thoughts. The analysis has focused on examples that contain the verb think, the reporting verb for thought presentation par excellence. For example:
"John" thought madame, checking off her work as her fingers knitted, and her eyes looked at the stranger. "Stay long enough, and I shall knit ‘BARSAD’ before you go." (A Tale of Two Cities, book 2, chapter 16)
This example contains several characteristic features of Dickens’s use of DT, such as a vocative in the reported clause, a suspended reporting clause and a reference to the character’s eyes. These and other traits are investigated in this presentation. As will be shown, they fulfil meaningful functions related to significant aspects of Dickens’s style, as discussed by other critics. The analysis is intended to contribute to a better understanding of Dickens’s craftsmanship from a stylistic point of view.
[1] It is only fair to note that Busse’s corpus is composed of excerpts of less than 3,500 words from twenty-two nineteenth-century novels (Busse, 2010: 64) and is therefore much smaller than the corpus of Dickens’s novels analysed here.
Keywords: Dickens, corpus stylistics, direct thought presentation

A data-driven analysis of linguistic complexity and proficiency in learner and native English
Javier Perez-Guerra (1), Ana Elina Martinez-Insua (1)
(1) University of Vigo (UVigo) – FFT, Campus Universitario, 36310 Vigo, Spain
This paper investigates issues covered by the umbrella concept of ‘linguistic complexity’ in learner language. The notion of complexity, as understood in this study, covers a number of dimensions: lexical, syntactic and semantic-discursive. The null hypothesis ‘learner language does not deviate from native language as regards linguistic complexity’ is rejected in light of standard data-driven metrics of linguistic density and inter-/intra-textual diversity. The learner data are retrieved from the Early-Access Subset of the Trinity Lancaster Corpus, compiled at the ESRC Centre for Corpus Approaches to Social Science, Lancaster University. This subset comprises approximately two million words and includes transcribed interactions between candidates and examiners at levels B1 to C2 of the Common European Framework of Reference (Council of Europe 2001). Each candidate took part in a number of speaking tasks, depending on his/her proficiency level. The results from the learner dataset will be compared with those from LOCNEC (Louvain Corpus of Native English Conversation; Centre for English Corpus Linguistics, Université catholique de Louvain), which will serve as the native English control corpus, as well as with other L2 corpora such as the Louvain International Database of Spoken English Interlanguage (LINDSEI). Two software tools will be used in this research: Coh-Metrix (McNamara et al. 2014) and Synlex (Lu 2012, 2014). Coh-Metrix provides basic lexical and semantic-discursive features such as type-token ratio and average word and sentence length, as well as further metrics of textual lexical diversity (mainly vocd-D) and readability indices (Flesch Reading Ease, Flesch-Kincaid Grade Level).
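To make the simplest of these measures concrete, the sketch below computes type-token ratio and the two standard Flesch formulas (Reading Ease = 206.835 − 1.015 × words per sentence − 84.6 × syllables per word; Grade Level = 0.39 × words per sentence + 11.8 × syllables per word − 15.59). It is not Coh-Metrix's implementation: the function names are hypothetical, the vowel-group syllable counter is a crude heuristic, punctuation-based sentence splitting fits transcribed speech such as the Trinity Lancaster data poorly, and vocd-D is not reproduced.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups and drop a final silent 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(1, groups)


def basic_metrics(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)      # ASCII letters and apostrophes only
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)            # mean sentence length in words
    spw = syllables / len(words)                 # mean syllables per word
    return {
        "TTR": len({w.lower() for w in words}) / len(words),
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid Grade": 0.39 * wps + 11.8 * spw - 15.59,
    }


print(basic_metrics("The cat sat. The dog ran."))
```

Raw TTR falls as texts get longer, which is why length-corrected measures such as vocd-D are preferred when comparing texts of different lengths.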
Coh-Metrix also draws on Latent Semantic Analysis (LSA) spaces, which can be used to characterise the degree of conceptual similarity within a group of texts. Synlex (Lu’s Lexical Complexity Analyzer and L2 Syntactic Complexity Analyzer), in turn, automates the analysis of complexity using 25 different measures of lexical density, variation and sophistication, taken from the first- and second-language development literature. The input texts from the Early-Access Subset of the Trinity Lancaster Corpus will be POS-tagged and lemmatised with TreeTagger so that Synlex can compute these measures. The statistical analysis and discussion of the metrics supplied by Coh-Metrix and Synlex for the native and learner corpora will be decisive in addressing the following research questions: Does learner language differ from native language as regards linguistic complexity? Do the CEFR levels imply differences as regards linguistic complexity? The results show, first, that both questions are answered in the affirmative and, second, that the cline of complexity degrees matches the CEFR levels in a highly significant way.
References
Council of Europe. 2001. Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Landauer, Thomas K. 2007. LSA as a theory of meaning. In Thomas K. Landauer, Danielle S. McNamara, Simon Dennis and Walter Kintsch (eds.), The handbook of Latent Semantic Analysis, 3–34. Mahwah, NJ: Lawrence Erlbaum.
Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal 96(2): 190–208.
Lu, Xiaofei. 2014. Computational methods for corpus annotation and analysis. Dordrecht: Springer.
McNamara, Danielle S., Arthur C. Graesser, Philip M. McCarthy and Zhiqiang Cai. 2014. Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Keywords: complexity, CEFR, metrics, learner language, readability

Affix rivalry in English derivation: An onomasiological approach
Cristina Fernández-Alcaina (1), Cristina Lara-Clares (1), Jesús Fernández-Domínguez (1)
(1) University of Granada – Avda. del Hospicio, s/n, C.P. 18071 Granada, Spain
The notion that morphological processes compete with each other for concept naming is well known and well substantiated, and it underlies some of the most prominent word-formation theories. Morphological competition, however, is a slippery notion that numerous scholars have dealt with only in passing, and existing approaches to it are more theoretically than empirically oriented. In principle, competition is a theory-neutral notion and "[...] happens when two or more morphological processes can express the same syntactic-semantic function" (Kastovsky 1986: 597). The competitive behaviour of word-formation has been the focus of recent investigations, most of which have adopted a primarily formal perspective by comparing pairs or small groups of competing rules (Bauer 2006, Bauer et al. 2010, Aronoff & Lindsay 2014). The scope of such works includes the semantics of derivation, but their driving force is formal performance. Ultimately, customary approaches to competition conclude that affix X succeeds in competition with affix Y, or that affix Z dominates in a given morphosemantic context. An alternative is found in the onomasiological model of word-formation, which follows in the tradition of the Prague School of Linguistics and whose main exponent is Štekauer (2005). This approach shifts the focus of word-formation analysis away from formal aspects onto the naming needs of language users, so that the semantics of lexemes prevails over their form.
In this view, the base-derivative relationship is examined mainly through meaning categories like Causative, Locative, Agent or Instrument, each of which may be conveyed by various word-formation processes (e.g. -er, -ian and -ist all express agency). With this in mind, this paper considers the role of cognitive-semantic categories from two angles: i) How do cognitive-semantic categories behave with regard to morphological competition? ii) How can existing formulas of productivity measurement be employed in the onomasiological evaluation of competition? For both issues we draw on the British National Corpus (BNC), the Corpus of Contemporary American English (COCA) and the Oxford English Dictionary (OED). For question i), the derivatives are classified into competing clusters using a template that considers a series of factors that facilitate or constrain the appearance and profitability of word-formation processes. Based on the semantic classification in Bagasheva (to appear), this makes it possible to determine which readings of a lexeme prevail and which become obsolete during competition. Question ii) is addressed by applying the productivity formulas in Baayen (2009) and Gaeta & Ricca (2015) to the study sample, for which corpus-derived frequencies prove essential. The results are then set within a semantic view of competition, understood as a complement to more formal views on the matter. The preliminary conclusions point to a correlation between the number of instances of a process in present competition and its profitability, and between the number of instances of prevalence and the degree of profitability of that process.
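For question ii), a minimal sketch of three of Baayen's corpus-based productivity measures, assuming the affix's derivatives have already been extracted from the corpus: realized productivity V (number of types), potential productivity P (affix hapaxes divided by affix tokens) and expanding productivity P* (affix hapaxes divided by all hapaxes in the corpus). The function name, the -er word list and the corpus hapax total are hypothetical, and the approach associated with Gaeta & Ricca is not reproduced.

```python
from collections import Counter


def baayen_productivity(affix_tokens: list[str], corpus_hapaxes: int) -> dict[str, float]:
    """Realized (V), potential (P) and expanding (P*) productivity of one affix.

    affix_tokens: every corpus token of a derivative formed with the affix
    corpus_hapaxes: number of hapax legomena in the whole corpus
    """
    freqs = Counter(affix_tokens)
    n_tokens = sum(freqs.values())                       # token frequency of the affix
    hapaxes = sum(1 for f in freqs.values() if f == 1)   # derivatives attested only once
    return {
        "V": len(freqs),                   # realized productivity: number of types
        "P": hapaxes / n_tokens,           # potential productivity
        "P*": hapaxes / corpus_hapaxes,    # expanding productivity
    }


# Hypothetical -er derivatives and corpus hapax count, for illustration only
er_tokens = ["teacher", "teacher", "baker", "singer", "writer", "writer", "writer"]
print(baayen_productivity(er_tokens, corpus_hapaxes=100))
```

Because P depends on the affix's token count, comparisons between rival affixes are usually made at comparable sample sizes rather than on raw values.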
Keywords: affixation, English, morphological competition, onomasiology, productivity, word formation

Anaphora Resolution on the Fly – Pronouns in a Psycholinguistically Motivated Parsing System
Noemi Vadasz (1,2)
(1) Pázmány Péter Catholic University, Faculty of Humanities and Social Sciences – Budapest, Hungary; (2) MTA-PPKE Hungarian Language Technology Research Group – Budapest, Hungary
A psycholinguistically motivated parsing model like AnaGramma (Prószéky and Indig, 2015) throws new light on the broadly interpreted problem of anaphora resolution. This paper concentrates on the narrower problem of pronouns,[1] namely personal, reflexive and reciprocal pronouns, in the framework of the AnaGramma parsing model. Since AnaGramma, with its strictly left-to-right, word-by-word approach, tries to handle utterances by following the patterns of human language processing as closely as possible, it must handle coreference ‘on the fly’ while parsing the utterance. It works within a supply-and-demand framework: each word supplies its lexical representation and morpho-syntactic information, and demands are issued (e.g. verbs have an obligatory need for their arguments). By the end of the utterance, all demands should be fulfilled, either from the sentence or by default mechanisms. The output of the parser is a dependency graph with different types of edges, including coreference edges. When the parser encounters a verb (or any element with an argument frame), it calls up the corresponding argument frame, and searchers for the arguments can then start off. If an argument precedes the verb in the linear order of the sentence, it is already available to the searchers (in a short-term memory called the pool). Otherwise, the searcher can wait until a potential supply arrives. The searchers have different settings according to the demands of the verb. In Hungarian, some features of certain arguments can be computed from the inflection of the verb.
The searchers look for an agreeing subject or object. A default zero node with the appropriate case marker and agreement features is proffered as well; in this way, zero pronouns are incorporated into the parsing process. Reflexives and reciprocals, with their actual case markers, behave like other arguments: as supplies, ready for the verb’s demands. Homonymy poses a special problem during parsing. In Hungarian the pronoun maga has two meanings: (1) a third person singular reflexive pronoun in the nominative case (‘himself/herself’) and (2) a polite or formal second person singular personal pronoun in the nominative case (‘you’). In addition, maga has a further use in constructions such as maga a(z) ördög (‘the devil itself’). Pronominalization and the use of zero pronouns are governed by an underlying rule system that enables us to reveal anaphoric dependencies and referential identities. These long-term relations extend beyond the boundaries of the clause, and even of the sentence, in which they occur. Using the algorithm of Pléh and Radics (1976), these underlying rules can be built into the AnaGramma parsing system to bring its handling of pronouns closer to human sentence processing as well. In this paper I present a solution for handling Hungarian personal, reflexive and reciprocal pronouns in the framework of AnaGramma, based on the anaphora resolution algorithm of Pléh and Radics (1976). My observations are based on corpus data from the Pázmány Corpus (Endrédy, 2016). Other types of coreference, such as repetition, proper name variants, synonyms, hypernyms and hyponyms, also need to be taken into account; they are the subject of future research.
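The supply-and-demand mechanism can be illustrated with a toy left-to-right parser. Everything in it is a hypothetical simplification, not AnaGramma's code: `Word`, `satisfies` and `parse` are invented names, only case and subject agreement are modelled, and coreference edges and the Pléh and Radics (1976) rules are left out. Earlier supplies wait in a pool, unmet demands wait for later words, and any demand still open at the end of the utterance receives a default zero-pronoun node (`pro`). The Hungarian example Péter látja a kutyát ('Péter sees the dog') uses the 3sg definite form látja.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Word:
    form: str
    case: Optional[str] = None      # supplied case marker, e.g. "NOM", "ACC"
    person: Optional[int] = None
    number: Optional[str] = None
    demands: list = field(default_factory=list)  # argument slots a verb requests


def satisfies(word: Word, demand: dict) -> bool:
    if word.case != demand["case"]:
        return False
    return all(getattr(word, feat) == val for feat, val in demand.get("agree", {}).items())


def parse(words: list) -> list:
    pool, waiting, edges = [], [], []     # pool = short-term memory of unused supplies
    for w in words:                       # strictly left to right, word by word
        for d in waiting:                 # can this new supply meet an earlier demand?
            if satisfies(w, d):
                edges.append((d["head"], d["role"], w.form))
                waiting.remove(d)
                break
        else:
            pool.append(w)                # not consumed: keep it available
        for d in w.demands:               # a verb issues its demands on arrival
            d = dict(d, head=w.form)
            match = next((s for s in reversed(pool) if satisfies(s, d)), None)
            if match is not None:
                pool.remove(match)
                edges.append((w.form, d["role"], match.form))
            else:
                waiting.append(d)         # wait for a later supply
    # end of utterance: unmet demands get a default zero-pronoun node
    edges.extend((d["head"], d["role"], "pro") for d in waiting)
    return edges


LATJA = [{"role": "subj", "case": "NOM", "agree": {"person": 3, "number": "sg"}},
         {"role": "obj", "case": "ACC"}]
overt = [Word("Péter", "NOM", 3, "sg"), Word("látja", demands=LATJA),
         Word("a"), Word("kutyát", "ACC", 3, "sg")]
dropped = [Word("Látja", demands=LATJA), Word("a"), Word("kutyát", "ACC", 3, "sg")]
print(parse(overt))
print(parse(dropped))
```

In the second sentence, Látja a kutyát ('(He/she) sees the dog'), no nominative supply ever arrives, so the subject demand is closed with the zero node, mirroring the default mechanism described above.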
Keywords: computational linguistics, parser, psycholinguistics, performance, corpus

Anaphora resolution in the interlanguage of English and Greek learners of Spanish: a corpus-based study
Athanasios Georgopoulos (1)
(1) Universidad de Granada (UGR) – Spain
Overt pronominal subjects are not syntactically obligatory in pro-drop languages like Spanish (Fernández Soriano 1999, Luján 1999). Previous research has shown that their use and alternation with null subjects is both syntactically and contextually constrained (Alonso-Ovalle et al. 2002, Perez Leroux & Glass 1999). It has also been demonstrated that learners of Spanish show persistent deficits concerning their distribution (Lozano 2009, 2016). The interface between syntax and discourse has been claimed to account for these deficits (Sorace 2004). While research in this field has traditionally relied on experimental data (for an overview, see Quesada 2015), an increasing number of researchers point out the need to use corpora to test existing hypotheses (Díaz & Thompson 2013, Lozano & Mendikoetxea 2013, Mendikoetxea 2014, Tono 2003). Additionally, most studies on subject pronouns in L2 Spanish (Al Kasey & Pérez-Leroux 1998, Almoguera & Lagunas 1993, Liceras 1996, Liceras & Díaz 1999) have examined the interlanguage of English-speaking learners, whose L1 is non-pro-drop. Overall, very few corpus-based studies of L2 Spanish have examined the interlanguage of speakers of pro-drop languages such as Greek (Margaza & Bel 2006). This paper presents the preliminary results of research that explores anaphoric third-person subject usage in the interlanguage of Greek and English learners of Spanish. The main empirical basis of the investigation is a recently compiled L1 Greek-L2 Spanish learner corpus, conceived as a component of the L1 English-L2 Spanish CEDEL2 corpus (Lozano 2009, Lozano & Mendikoetxea 2013). Both corpora follow the same design principles.
Hence, this is the first corpus-based study that allows two groups of learners (Greek-speaking versus English-speaking) whose L1s differ with respect to anaphoric subjects to be compared. The corpus data were analysed with the XML annotation tool "UAM CorpusTool" (O’Donnell 2009). A purpose-built tagset was designed on the basis of previous learner corpus studies (Blackwell & Quesada 2012, Gudmestad & Geeslin 2013, Lozano 2016). Learners at two proficiency levels (elementary and upper-advanced) in each group (English and Greek) were examined and compared with a native Spanish control group. Preliminary results indicate that although elementary Greek-speaking learners of Spanish show some tendency to overuse overt subjects, they do so at a significantly lower rate than their English-speaking counterparts. Moreover, at the upper-advanced level, they exhibit native-like preferences, in contrast to the English-speaking learners, who show deficits even at the highest levels of proficiency. Cross-linguistic influence can account for these differences between the two learner groups: Greek-speaking learners seem to take advantage of the similarity between their L1 and Spanish with respect to anaphora resolution (AR) patterns, whereas English-speaking learners seem to transfer their L1 properties. From a developmental point of view, the results suggest that cross-linguistic influence is a crucial factor and that certain AR categories at the syntax-discourse interface can be fully acquired. The results run partly counter to the Interface Hypothesis and are in line with other recent SLA studies (Judy 2015, Kras 2008, Prentza 2014, Zhao 2014).
Keywords: anaphora resolution, SLA, Spanish L2, contrastive interlanguage analysis, learner corpora, Interface Hypothesis

Analysis of pragmatic aspects in specialised economics and finance discourse: a spoken-corpus-based study in support of interpreting
Sonia Paola Martínez Zavala (1)
(1) Universidad Autónoma de Baja California (UABC) – Av. Monclova 678, Ex-Ejido Coahuila, 21360 Mexicali, Baja California, Mexico
Main argument: Interpreters face problems such as false sense, nonsense and contresens (reversal of meaning), which arise in practice. These can occur when the pragmatic aspects of discourse are not taken into account. Pragmatic failures occur when an interpretation is grammatically correct but meaning is nevertheless lost.
Objectives: The general objective is to identify pragmatic aspects of economics and finance discourse by means of a monolingual English corpus, in order to facilitate interpreting in this type of discourse through the corpus itself and a report of findings, both of which serve as documentation tools for the interpreter. To this end, a sample corpus of English texts on economics and finance, consisting of 27 interview transcripts obtained from The World Bank Group (2016), is compiled and processed with AntConc 3.4.4w. The corpus is then analysed to identify aspects such as emotions, intellectual inferences, hypotheses, reformulations, evaluations, metaphorical expressions, discursive modalisation, requests and orders, among others, and a report summarising the findings is produced.
Theoretical framework: García Yebra (1981) notes that "translation differs from interpreting in that its starting point is a written text and its result is another written text" (p. 9).
Escobar (1996) states that interpreting is a mode of translation and that pressures such as deadlines turn translation into a process almost as fast as interpreting. Faber (2009) indicates that pragmatics focuses both on the effect of context on communicative behaviour and on how the receiver draws inferences to arrive at the final interpretation of a sentence. Faber (2009) also points out that the pragmatics of specialised discourse is directly related to the situations in which this type of communication occurs and to the ways in which sender and receiver deal with them, potentially or effectively. On pragmatic command, Bertone (1989) states that the interpreter's competence lies in distinguishing between types of implicit and contextual information in order to interpret appropriately, respecting each aspect. McEnery and Hardie (2012) define corpus linguistics as an area that focuses on a set of procedures for the study of language that can be applied to various areas of linguistics. Lüdeling and Kytö (2009) indicate that spoken corpora may consist of recordings or of their transcriptions, and that the latter can be analysed as a written corpus.
Results: An English spoken corpus of 86,883 words retrieved from the World Bank was built and analysed with corpus processing tools to identify pragmatic aspects and their contexts. The results allow interpreters to learn about pragmatic features and to develop pragmatic command of economic and financial discourse. Examples found in the corpus include the adverb absolutely, which expresses evaluation in economic and financial discourse; the conjunction if, which denotes a hypothesis; and the phrase I mean, which signals a reformulation.
As a reformulation marker, I mean was used as such in 21 of its 22 hits; the proposed interpretations are Quiero decir, Digo or Me refiero a. The idiomatic expression across the board appeared once, and the proposed interpretations are A todos en general or Incluyendo a todos.
Keywords: pragmatics, specialised discourse, corpus linguistics, interpreting

Applications of the CORPEN corpus to the teaching and assessment of phraseological units of Spanish used in specific contexts
Inmaculada Martínez (1), Susana Llorián (2)
(1) Centro Internacional de Estudios Superiores del Español (CIESE-Comillas) – Avda. de la Universidad Pontificia s/n, 39520 Comillas, Cantabria, Spain; (2) Universidad Complutense de Madrid (UCM) – Spain
The impact of the Plan Curricular del Instituto Cervantes (2007) led the Fundación Comillas, some years later, to publish the Plan Curricular del Español de los Negocios (Martín Peris and Sabater, 2012), with the aim of making this document the main reference for the design of courses, teaching materials and certification exams in Business Spanish (Español de los Negocios, ENE). During the development of the curricular documentation, the need was confirmed for a research project, grounded in a specialised corpus, to guide this process; it materialised as the CORPEN corpus (Corpus Comillas del Español de los Negocios). One of the areas most affected by the application of CORPEN to this process is the lexical component. The main aim of this paper is to show the implications of this corpus for the specification of the lexical content of the ENE curriculum, for methodological guidelines and for the validation of certification tests of vocabulary.
The corpus is decisive for selecting the lexical units, both single-word and multiword (simple or compound words, collocations, idioms and social interaction formulae, following the classification of Gómez Molina (2004)), that go into the inventories on which course syllabuses, textbooks and exam specifications will be based. This ensures that the language of these materials is authentic, reflecting the language used in real communicative contexts of ENE, rather than the artificial or invented language found in materials that do not take corpora as their starting point (O'Keeffe and McCarthy, 2010: 374). Furthermore, the corpus is the ideal tool for presenting the lexical units of the curriculum in the arrangement required for teaching them, drawing on proposals such as the "lexical approach" (Lewis, 1993, 1997, 2000) and those of some of its followers, such as Timmis (2015), who apply the approach using corpus methodology. Along these lines, O'Keeffe et al. (2007) describe how to draw up lexico-grammatical profiles of the lexical units in the curriculum; this is particularly fruitful pedagogically when applied to teaching ENE. As these authors note (O'Keeffe et al., 2007: 198), specialised and professional genres are likely to show more regular patterns and distributions than the general language. Secondly, the relations between lexis and grammar established from this perspective make it possible to implement data-driven learning (DDL), which essentially consists of using the tools that corpora provide to learn lexical units.
In this way, many of the criticisms levelled at proposals such as Lewis's, which concern problems of practical application, could be addressed. Finally, a corpus such as CORPEN will also make a decisive contribution to validating the vocabulary tests in certification exams. Here, the corpus makes it possible to check how the language of discrete-point items relates to actual usage in real ENE contexts.

Keywords: "business Spanish", "specialised corpora", "business Spanish curriculum", "collocations", "idioms", "institutionalised expressions"

Applying Textometric Analysis to a Description of Cochrane Medical Abstracts and their Plain Language versions: Quantitative Characterisation of Plain Language in Medical Discourse
Christopher Gledhill 1, Hanna Martikainen 1, Alexandra Mestivier (Volanschi) 1, Maria Zimina 1
1 CLILLAC-ARP, EA3697 – Université Paris Diderot - Paris 7 – France

The Cochrane organisation publishes meta-analyses of large-scale medical studies ("Systematic Reviews", SRs). This information is summarised in 1) a Scientific Abstract (ABS), targeting members of the scientific community, and 2) a simplified summary for the general public, which Cochrane calls a "Plain Language Summary" (PLS). Although there is now an extensive literature on controlled languages (Stewart 1998, O'Brien 2003), there has been less work on the linguistic description of "plain language". The Cochrane guidelines state that SRs should be written in "clear, simple English" (Cochrane Style Manual), while the language of PLS is defined as "plain English which can be understood by most readers without a university education" (Cochrane PLEACS standards). However, the guidelines do not give any specific linguistic definition of what is meant by "plain English".
In this paper, we set out to identify the main lexico-grammatical differences between ABS and PLS texts. Our hypothesis is that PLS authors consciously or unconsciously adapt their usage to their perceived norms of plain writing. This process appears to be very regular, and it shows in the reformulation techniques and other revisions that stand out as the salient features of PLS as opposed to ABS. We extracted two sub-corpora from the literature produced by the Cochrane organisation: a corpus of 4,540 ABS (2.1 million words) and a corpus of the 4,540 corresponding PLS (1.1 million words). The ABS texts are systematically divided into sub-sections: Background, Objectives, Search strategy, Selection criteria, Data collection and analysis, Main results, Authors' conclusions. A minority of PLS (370) are also divided into sub-sections: Review question, Background, Study characteristics, Quality of the evidence and Key results. This segmentation allows us to pinpoint specific phraseological strategies, for instance the simplification of information from the Authors' conclusions (in ABS) in the Key results sub-sections of PLS. We use the methods of textometrics to compare the quantitative characteristics of the ABS and PLS sub-corpora. First, we POS-tagged both sub-corpora (Schmid 1994). Then, we applied characteristic-elements computation and factorial analysis to compare different parts (text sections) of these POS-tagged corpora (Lebart et al. 1998). These metrics reveal important similarities between the Background and Conclusions sections of ABS and PLS.
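Characteristic-elements computation in the Lebart et al. tradition is commonly based on a hypergeometric model ("specificities"). The Python sketch below illustrates that model and assumes it is the one meant here; it is not the authors' implementation. Here `f` is the frequency of an item (for example the POS tag IN) in one text section of `t` tokens, and `F` and `T` are the corresponding totals for the whole corpus. A positive score indicates over-representation and a negative one under-representation. The sum is computed in log space so that corpus-sized counts do not overflow.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed with lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_log_pmf(k: int, t: int, F: int, T: int) -> float:
    """log P(X = k) for an item with F occurrences in a corpus of T tokens,
    when a section of t tokens is drawn at random."""
    return log_comb(F, k) + log_comb(T - F, t - k) - log_comb(T, t)


def specificity(f: int, t: int, F: int, T: int) -> float:
    """Signed specificity score in base-10 log units.

    Positive: -log10 P(X >= f), the item is over-represented in the section.
    Negative: +log10 P(X <= f), the item is under-represented.
    """
    lowest = max(0, t - (T - F))  # smallest count the section can contain
    highest = min(F, t)           # largest count the section can contain
    if f >= F * t / T:            # at or above the expected count
        ks, sign = range(f, highest + 1), 1.0
    else:
        ks, sign = range(lowest, f + 1), -1.0
    logs = [hypergeom_log_pmf(k, t, F, T) for k in ks]
    peak = max(logs)
    # log-sum-exp: add up the tail probabilities without underflow
    log_tail = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return sign * (-log_tail / math.log(10))


# Toy example: 20 tokens in all, 10 tagged IN; a 10-token section holds 9 of them.
print(round(specificity(9, 10, 10, 20), 2))  # prints 3.26
```

As a cross-check, the upper tail P(X >= f) equals `scipy.stats.hypergeom.sf(f - 1, T, F, t)` in SciPy's parametrisation (total, successes, draws).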
For example, singular/mass nouns (NN), prepositions (IN), adjectives (JJ) and determiners (DT) turn out to be salient ("over-represented") in the Background and Conclusions sections of both PLS and ABS. The over-representation of prepositions can be partly explained by complex pre-modified nominal groups in ABS, which are "unpacked" in PLS into longer nominals with multiple embedded post-modifying prepositional phrases:

ABS: "Non-penetrating filtration surgery versus trabeculectomy for open-angle glaucoma"
PLS: "Two surgical techniques for the control of eye pressure in people with glaucoma"

Such "unpacking" matches the advice given by controlled languages such as Simplified Technical English: break down pre-modified nominals into several post-modifying groups. In this paper, we also report on other PLS patterns: the reformulation of research processes and empirical findings in more disease-oriented or user-oriented terms, and the topicalisation of human participants. All of these point to underlying regular tendencies towards simplification in PLS. The next stage of our project will devise a way of turning the findings of the textometric analysis into appropriate editorial guidelines for the authors of Cochrane PLS.

Keywords: corpus linguistics, language for special purposes, medical discourse, plain language summaries, textometric analysis

A contrastive approach to phraseology in CJEU judgments
Arsenio Andrades 1
1 Universidad Complutense de Madrid (UCM) – Spain

The European Union publishes all its legislation in the 24 official languages of the 28 Member States that make up this supranational organisation. The EU portal therefore contains a range of resources and web pages that give the public easy access to an enormous corpus of legislative, judicial and other texts in each of the official languages.
This multilingual corpus of parallel texts supports linguistic searches and is a very useful tool for consulting and comparing terminological, phraseological, stylistic and other data. Corpus linguistics makes it possible to analyse linguistic elements in their real context of production, using compiled digital documents. Studying texts of European Union law will allow us to identify the specific phraseological features of these texts and to propose a classification of the most frequent types of phraseological structures (collocations, idioms, formulaic expressions, etc.), based on the main phraseological taxonomies of general language (Corpas, 1997; Ruiz Gurillo, 1998; García-Page, 2008). To narrow the scope of this work, we focus on one EU institution, the Court of Justice of the European Union, and on one of the main types of documents it produces: judgments. The aim of this paper is thus to compile a corpus of judgments in three languages (English, French and Spanish) in order to identify and extract their main phraseological elements. The methodology consists essentially in building a representative ad hoc corpus of EU judgments (Seghiri, 2014) and exploring it with the WordSmith 5.0 concordancer to obtain information on the phraseological structures most used in the three languages compared. The resulting data may serve as a basis for developing strategies for translating phraseological structures in judicial texts.
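A concordancer such as WordSmith shows each hit of a search term as a keyword-in-context (KWIC) line. The Python sketch below illustrates the idea only; it does not reproduce WordSmith. It finds a multiword node in a tokenised text and returns the left context, the match and the right context. The function names and the English example sentence are invented for illustration.

```python
from __future__ import annotations


def kwic(tokens: list[str], node: list[str], width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, match, right context) for every occurrence of the
    multiword `node` in `tokens`, matching case-insensitively."""
    size = len(node)
    target = [w.lower() for w in node]
    hits = []
    for i in range(len(tokens) - size + 1):
        if [t.lower() for t in tokens[i:i + size]] == target:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + size])
            right = " ".join(tokens[i + size:i + size + width])
            hits.append((left, match, right))
    return hits


def format_kwic(hits: list[tuple[str, str, str]], width: int = 30) -> list[str]:
    """Align hits on the node, concordancer-style, with the left context right-justified."""
    return [f"{left[-width:]:>{width}}  [{match}]  {right[:width]}"
            for left, match, right in hits]


sentence = "The Court hereby dismisses the action and orders the applicant to pay the costs".split()
for line in format_kwic(kwic(sentence, ["pay", "the", "costs"], width=3)):
    print(line)
```

In a study like this one, the same search would be run over the tokenised judgments in each language. Sorting the lines by left or right context then brings out recurring phraseological patterns.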
Work of this kind shows that compiling a corpus can contribute significantly to knowledge of phraseology in a specialised field. It also underlines how important it is for legal translators to be familiar with the phraseology of their field (Monzó and Hoyo, 1998; Lorente, 2002; Aguado de Cea, 2007; Pontrandolfo, 2013; Andrades, 2013). The results are a first approach to the legal phraseology specific to international organisations. They could be extended through broader studies and, if the data bear them out, generalised to legal texts at large. This study will also reveal phraseological differences and similarities between general legal discourse and the language of CJEU judgments.

Keywords: Corpus Linguistics, specialised phraseology, legal translation

Computing salience to annotate an anaphoric corpus (RESUMAN)
Afef Selmi 1,2, Laurent Gautier 3
1 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
2 Aix-Marseille Université - UFR Arts, Lettres, Langues et Sciences Humaines (AMU UFR ALLSH) – Aix Marseille Université – 29, avenue Robert Schuman - 13621 Aix-en-Provence cedex 1, France
3 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France

[Context] The development of electronic communication systems has been accompanied by a steady increase in the number of available electronic text documents, such as the summaries in our RESUMAN corpus.
This development calls for efficient computational tools that can select, structure and extract the relevant information in these documents.

Research question
This abstract belongs primarily to theme 7 of the conference, "Corpus-based computational linguistics". Since "language consists largely of prefabricated units that can be analysed by querying corpora with statistical methods", we have created an algorithm that uses salience computation (Landragin, 2011) as the main factor in resolving pronominal anaphora in our corpus. The algorithm takes various syntactic and cognitive factors into account and uses a model that efficiently evaluates the salience of each potential antecedent. Each factor carries a different weight according to its usefulness for resolution. Our question is: does our statistical, corpus-based method perform well?

Corpus
The RESUMAN corpus consists of summaries of works of French literature. It comprises 120 summaries, published online at www.alalettre.com, totalling just under 20,000 words. The corpus contains about 12,000 pronominal anaphors, 3,000 of which are ambiguous. These texts are characterised by their brevity and referential density. The corpus is meant for investigating automatically how ambiguous pronominal anaphora works in these texts, in order to bring out syntactic and cognitive characteristics specific to anaphoric chains.
Methodological framework
After the semi-automatic morphosyntactic annotation of RESUMAN (we intervened manually to complete the morphological annotation of named entities), we developed an algorithm inspired by that of Lappin and Leass (1994), but with a different strategy for computing salience. To restrict the set of potential candidates, the algorithm passes the texts of our corpus through two filters: first, a filter on morphological agreement between the anaphor and the candidate; second, a filter on the syntactic structure of the pronoun's sentence. The remaining candidates are then scored with a salience weight computed from two criteria: the candidate's distance and its grammatical weight. For the latter, we assigned values ranging from 100 to 10 to the following syntactic functions: subject, direct object, indirect object, subject complement (attribut) and relative. The algorithm first exploits syntactic and morphological information. After excluding non-anaphoric pronouns, it applies a salience measure that ranks the potential candidates and keeps only the appropriate antecedents. Through the automatic resolution of pronominal anaphora, we then focus on the interactions between discourse, natural language processing and corpus analysis.

Results
80% of the pronominal anaphors in the corpus are resolved, including 25% of the ambiguous cases. The remaining 20% are unresolved, which leads us to re-examine the corpus to find out which mechanisms prevented resolution. Are the grammatical weights we added the cause? Or, on the contrary, are they what gives us this level of performance? An evaluation corpus is needed to answer these questions.
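The two filters and the salience ranking described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm. Several details are assumptions made for the example: the individual weights within the 100 to 10 range, the halving of salience for each sentence of distance (borrowed from Lappin and Leass's decay), and the "same clause" syntactic filter. The sketch also assumes that non-anaphoric pronouns have already been excluded.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical grammatical weights: the abstract only says they range from 100 to 10.
GRAMMATICAL_WEIGHT = {
    "subject": 100,
    "direct_object": 80,
    "indirect_object": 50,
    "attribute": 30,
    "relative": 10,
}


@dataclass
class Mention:
    text: str
    gender: str      # "m" or "f"
    number: str      # "sg" or "pl"
    function: str    # key of GRAMMATICAL_WEIGHT (not used when scoring the pronoun itself)
    sentence: int    # index of the sentence containing the mention
    clause: int      # index of the clause within that sentence


def agrees(pronoun: Mention, cand: Mention) -> bool:
    """Filter 1: morphological agreement in gender and number."""
    return pronoun.gender == cand.gender and pronoun.number == cand.number


def syntactically_allowed(pronoun: Mention, cand: Mention) -> bool:
    """Filter 2 (assumed simplification): a non-reflexive pronoun cannot
    corefer with a candidate in its own clause."""
    return not (pronoun.sentence == cand.sentence and pronoun.clause == cand.clause)


def salience(pronoun: Mention, cand: Mention) -> float:
    """Grammatical weight, halved for each sentence of distance (assumed decay)."""
    distance = pronoun.sentence - cand.sentence
    return GRAMMATICAL_WEIGHT[cand.function] / (2 ** distance)


def resolve(pronoun: Mention, candidates: list[Mention]) -> Mention | None:
    """Return the most salient candidate from the same or an earlier sentence
    that passes both filters, or None if no candidate survives."""
    kept = [c for c in candidates
            if c.sentence <= pronoun.sentence
            and agrees(pronoun, c)
            and syntactically_allowed(pronoun, c)]
    if not kept:
        return None
    return max(kept, key=lambda c: salience(pronoun, c))


# "Marie a vu Julie. Sophie la regarde." Which referent does "la" take?
marie = Mention("Marie", "f", "sg", "subject", sentence=0, clause=0)
julie = Mention("Julie", "f", "sg", "direct_object", sentence=0, clause=0)
sophie = Mention("Sophie", "f", "sg", "subject", sentence=1, clause=0)
la = Mention("la", "f", "sg", "direct_object", sentence=1, clause=0)
print(resolve(la, [marie, julie, sophie]).text)  # prints Marie
```

Sophie is removed by the clause filter. Marie then beats Julie because, under these assumed weights, a subject outranks a direct object at the same distance (50 vs 40). Changing the weights is exactly the lever the authors question in their results.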
Keywords: computational linguistics, corpus, pronominal anaphora, statistics, salience, grammatical weight, automatic resolution.

Building a legal corpus for collocation extraction
Joaquín Giraldez Ceballos-Escalera 1
1 Universidad Nacional de Educación a Distancia (UNED) – Senda del Rey, 7 - 28040 Madrid, Spain

Methodologically, this contribution falls within corpus linguistics and presents a study of collocation extraction in legal language. The study has a twofold aim: to set out the methodological foundations for building a corpus of legal texts, and to present the successive steps followed to extract collocations. Corpus linguistics is a linguistic discipline which, together with computational linguistics, studies language through a wide variety of texts. In lexicography, the corpus is the basic material for linguistic analysis, and thanks to computing technology a considerable mass of linguistic data is now available in electronic form. These text collections make it possible to observe large amounts of varied, real data. They open new perspectives for linguistic description, since analysis tools make it possible to explore the texts and extract linguistic data from them efficiently. We present the FRJUR corpus of legal French that we have built, together with the analysis tools and the methodology used. The FRJUR corpus results from collecting texts on French civil law. It contains 3,200,086 words distributed across different sections: codes, court decisions, specialised publications, etc.
The texts were selected and organised systematically according to criteria of balanced distribution, so that they form a structured whole rather than a mere collection of texts. The digital corpus was designed according to the criteria set out by Sinclair (1991: 171): "a corpus is a collection of naturally-occurring language text, chosen to characterize a state or variety of a language". In designing the corpus, we took into account the representativeness of the texts and their intended readers. The FRJUR corpus allowed us to study lexical relations between two words by comparing the probability of observing them together (probability of dependence) with the probability of observing them separately (probability of independence). According to Church and Hanks (1989), whose approach is based on the notion of mutual information from information theory, if a genuine lexical relation exists between two words, the probability of dependence will be much higher than the probability of independence. The mutual information of the pair (the logarithm of the ratio of the two probabilities) will then be well above zero, and the pair is retained as significant. Frequency, transparency, arbitrariness and directionality are the criteria that most authors use to identify collocations (Firth, 1957; Cruse, 1986; Hausmann, 1989; Mel'čuk, 1998). To establish a typology of collocations in legal language, we propose to start from the "associations" defined by Hausmann (1989) and to divide them into five groups: noun-adjective, verb-noun, verb-adverb, adverb-adjective and noun-(preposition)-noun. A study of collocations in legal language based on a computerised corpus will help enrich terminology databases for translators, specialist researchers (jurilinguists) and learners of French for specific purposes (FOS).
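Church and Hanks score a word pair (x, y) as I(x, y) = log2(P(x, y) / (P(x) P(y))), estimating the probabilities from corpus counts. The Python sketch below computes this score for ordered word pairs that co-occur within a short window. The window size, the minimum pair frequency and the toy French sentence are illustrative assumptions. As a simplification, pair counts are normalised by corpus size in the same way as single-word counts.

```python
from __future__ import annotations

import math
from collections import Counter


def pmi_scores(tokens: list[str], window: int = 5,
               min_pair_freq: int = 2) -> dict[tuple[str, str], float]:
    """Pointwise mutual information, log2(P(x, y) / (P(x) P(y))), for ordered
    pairs (x, y) where y follows x within `window` tokens."""
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, first in enumerate(tokens):
        for second in tokens[i + 1:i + 1 + window]:
            pair_freq[(first, second)] += 1
    scores = {}
    for (first, second), f_xy in pair_freq.items():
        if f_xy < min_pair_freq:
            continue  # too rare for a reliable estimate
        # n * f(x, y) / (f(x) * f(y)) equals P(x, y) / (P(x) P(y)) with P = count / n
        scores[(first, second)] = math.log2(n * f_xy / (word_freq[first] * word_freq[second]))
    return scores


tokens = ("en cas de force majeure le contrat est suspendu ; "
          "la force majeure exonère le débiteur").split()
print(pmi_scores(tokens, window=1))  # prints {('force', 'majeure'): 3.0}
```

A score well above zero marks a pair that occurs together more often than chance predicts. Because PMI favours rare pairs, the frequency threshold matters. A study like this one could then sort the retained pairs into the five structural groups listed above.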
Keywords: corpus, collocations, co-occurrence, law, extraction

Building corpora for a contrastive study of resultative structures in English and their translation into French
Dijana Bojovic 1
1 Bases, Corpus, Langage (BCL) – CNRS : UMR7320, Université Nice Sophia Antipolis (UNS) – Laboratoire BCL - UMR 6039 Université de Nice - Campus Saint-Jean d'Angely 3 24, avenue des Diables bleus 06357 Nice Cedex 4, France

The main aim of this paper is to explain the procedures followed and the problems encountered in building corpora for our contrastive study of resultative structures in English and their translation into French. The study draws on several corpora (British National Corpus, Corpus of Contemporary American English, Gutenberg, Gallica and FRANTEXT). It relies on specific procedures, derived from the known characteristics of the phenomenon, for extracting data from general corpora. Semantically, resultative structures (RS) represent both a dynamic process and the outcome of that process. A dynamic process lies at the heart of a first predicative relation, and the resulting state of affairs constitutes a second predicative relation. RS thus fuse two predicative relations, a predicative relation and a co-predicative relation, and therefore have a syntax different from that of embedding. Since RS are a highly productive phenomenon in English, our first aim was to draw up a typology of them while taking their limits into account: stative verbs at one end of the spectrum and prototypical transitives at the other.
The interaction between syntax and semantics is necessarily at play, so in this research we analyse the properties of transitive structures (He ate the plate clean), unergative intransitives (The child screamed itself hoarse) and unaccusative intransitives (The lake froze solid). A second classification is by type of resultative predicate: adjective phrase (He hammered the metal flat), noun phrase (She dyed her pants a bright red), prepositional phrase (She smashed the vase to pieces) and adverb phrase (We decided to creep upstairs and see what happened). We are developing protocols for querying existing English and French corpora in order to build a corpus of RS in English and a corpus in French, with a view to studying the problems posed by translating them from English into French. We are thus building a corpus in three parts. The first part contains English examples collected systematically from the BNC and COCA, by constructing collocations and running searches with variations. The second is devoted to the French translations of the structures identified in the first part (Gallica, FRANTEXT, Gutenberg) and to observations of their characteristics. The third contains existing RS in French. The aim of this contrastive research is to carry out two linguistic studies of RS, one on English and one on French, in order to find out where the differences begin and why.
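One query pattern implied by the adjective-phrase type (verb + noun phrase + adjective, as in He hammered the metal flat) can be sketched over POS-tagged text. The sketch below assumes Penn Treebank tags and illustrates the idea only; it is not the author's query protocol. It covers only structures with an object, so unaccusatives such as The lake froze solid are out of scope. It will also return non-resultative complex-transitive sequences (for example found the room empty), so its output needs manual filtering.

```python
from __future__ import annotations

NP_DETERMINERS = {"DT", "PRP$"}  # optional determiner or possessive before the noun


def find_vnp_adj(tagged: list[tuple[str, str]]) -> list[str]:
    """Return verb + (determiner) + noun/pronoun + adjective sequences,
    a candidate pattern for adjectival resultatives (Penn Treebank tags)."""
    hits = []
    n = len(tagged)
    for i, (_, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        j = i + 1
        if j < n and tagged[j][1] in NP_DETERMINERS:
            j += 1
        # the object: a noun (NN, NNS, NNP...) or a pronoun such as "itself" (PRP)
        if j < n and (tagged[j][1].startswith("NN") or tagged[j][1] == "PRP"):
            j += 1
        else:
            continue
        if j < n and tagged[j][1] == "JJ":
            hits.append(" ".join(word for word, _ in tagged[i:j + 1]))
    return hits


examples = [
    [("He", "PRP"), ("hammered", "VBD"), ("the", "DT"), ("metal", "NN"), ("flat", "JJ")],
    [("The", "DT"), ("child", "NN"), ("screamed", "VBD"), ("itself", "PRP"), ("hoarse", "JJ")],
    [("He", "PRP"), ("painted", "VBD"), ("the", "DT"), ("red", "JJ"), ("door", "NN")],
]
for sentence in examples:
    print(find_vnp_adj(sentence))
```

The third sentence is rejected because the adjective precedes the noun, so "red" is an attributive modifier, not a resultative predicate.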
The analysis of the translations aims to systematise the solutions found, to look for their justification, and to identify regularities that can support translators' reflection and autonomy. It also aims to shed further light on these structures, which remain partly opaque and resist analysis, and, if possible, to provide additional tools for computer-assisted translation. The conclusions of our research are therefore grounded in attested corpus data, and confronting our working hypotheses with the corpus is a heuristic process.

Keywords: corpus, contrastive linguistics, resultative structures, syntax, translation, corpus linguistics

Corpora in the language classroom: the example of exemplification and reformulation markers
Cristelle Cavalla 1, Thi Thu Hoai Tran 2
1 Didactique des langues, des textes et des cultures (DILTEC) – Université Paris III - Sorbonne nouvelle : EA2288, Université Sorbonne Paris Cité (USPC) – Maison de la Recherche, 4 rue des Irlandais, 75005 Paris, France
2 Grammatica – Université d'Artois : EA4521 – Université d'Artois Maison de la Recherche 9, rue du Temple - BP 10665 62030 ARRAS CEDEX, France

In this paper we describe an ongoing experiment with non-native students at A2-B1 level in an academic French course. The experiment uses a lexicon specific to scientific writing and a digital corpus. Methodologically, the aim is also to help the students become familiar with the norms of this academic genre, which sometimes differ from those of their home education systems. We focus in particular on academic discourse drawn from a 5-million-word corpus of scientific articles in the humanities and social sciences (SHS), accessible online through the ScienQuest interface[1].
The corpus is part-of-speech tagged and semi-automatically annotated (Tran, 2014). Our main interest is cross-disciplinary scientific phraseology, or the cross-disciplinary scientific lexicon (Tutin, 2007). This lexicon is regarded as a "genre lexicon" that runs through all disciplines, for example contredire une théorie ("contradict a theory") or objectif principal ("main objective"). We adopt a broad conception of phraseology (Legallois and Tutin, 2014) that includes discourse markers (DMs) such as à savoir, en résumé and dans le cadre de, which serve to structure discourse. We have established a typology of 171 DMs divided into nine subgroups (Tran, 2014). To analyse these elements, we adopted the linguistic model of Paillard and Vu (2014), which focuses on the syntactic relation between the left and right contexts of an adverb or adverbial in order to identify its semantic values. The experiment deals with exemplification and reformulation markers, since we had observed that they are over-represented in scientific writing (Tran et al., 2016). Pedagogically, the students work with short paragraphs extracted from the digital corpus. This experiment is the first step in raising non-native students' awareness of the role these phraseological elements play in structuring scientific writing. We hypothesise that such a linguistic entry point will lead them to discover the norms of the academic writing genre.

References
Adam, J.-M. (1989). "Aspects de la structuration du texte descriptif: les marqueurs d'énumération et de reformulation". Langue française, (81), 59-98.
Cavalla, C. & Loiseau, M. (2013). "Scientext comme corpus pour l'enseignement". In A. Tutin & F. Grossmann (eds.), L'écrit scientifique: du lexique au discours. Autour de Scientext. Rennes: PUG, 163-180.
Legallois, D. & Tutin, A. (2013). "Présentation: Vers une extension du domaine de la phraséologie". In D. Legallois & A. Tutin (eds.), "Vers une extension du domaine de la phraséologie", 1(189), 3-25.
Mangiante, J.-M. & Parpette, C. (2011). Le français sur objectif universitaire. Grenoble: Presses universitaires de Grenoble.
Paillard, D. & Vu, T.-N. (2012). Inventaire raisonné des marqueurs discursifs du français. Description. Comparaison. Didactique. Paris: AUF.
Tran, T.-T.-H., Tutin, A. & Cavalla, C. (2016). "Typologie des séquences lexicalisées à fonction discursive et aide à la rédaction scientifique". Cahiers de lexicologie, 108(1), 161-180.
Tran, T.-T.-H. (2014). "Développement d'une aide à l'écrit scientifique. Description de la phraséologie scientifique et réflexion didactique pour l'enseignement à des étudiants non natifs". PhD thesis in Language Sciences (French as a Foreign Language), Université Grenoble Alpes.
Tutin, A. (2007). Lexique et écrits scientifiques. Revue Française de Linguistique Appliquée, XII(2).

[1] http://corpora.aiakide.net/scientext18/

Keywords: phraseology, French as a foreign language (FLE)

Development of a Tatar-Russian Socio-Political Dictionary of Collocations Based on Corpus Data
Olga Nevzorova 1
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia

The Tatar-Russian Socio-Political Dictionary of Collocations is based on data from the Corpus of Written Tatar (http://corpus.tatar/en), the Tatar National Corpus (http://corpus.antat.ru) and comparable socio-political corpora. It is a collocation dictionary containing more than 3,000 collocations. The Dictionary was compiled in the following stages. First, we developed comparable thematic socio-political corpora of Tatar and Russian.
Next, we automatically generated a frequency list of relevant terms (one-word terms as potential headwords) from the comparable corpora. Then, using the software of the Corpus of Written Tatar, we obtained a frequency list of collocations for each frequent term. The cut-off thresholds for the collocation lists were based on the frequency of linguistic items in the Corpus and were determined empirically. When selecting collocations, we considered the syntactic structure of each collocation and the morphological parameters of its constituents. We also took into account regular grammatical (non-inflectional) variants of word combinations. For example, Turkic languages have the regular synonymous models ADJ + N and N + N with a third-person possessive (POSS.3): iqtisadi cinayät (ADJ + N) and iqtisad cinayäte (N + N-POSS.3), both meaning "economic crime". Such regular grammatical variants of a collocation are treated as the same nominative item. The main unit of the Dictionary is a noun phrase formed by filling one of the possible semantic-syntactic positions of the word and meeting the criterion of semantic completeness. Such an item may consist of two or more notional words; in the current version of the Dictionary, most collocations have two notional components. The compiled Dictionary makes it possible 1) to represent the real use and collocability of socio-political vocabulary in Tatar; 2) to build typical grammatical models of collocations of these items; 3) to trace new items (words and collocations) in modern Tatar. The reported study was funded by the Russian Science Foundation, research project 16-18-02074.

Keywords: the Tatar language, collocations, Dictionary of collocations, socio-political terminology, corpora.

Development of an annotation system for multiword constructions for the Tatar National Corpus
Dzhavdet Suleymanov 1
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia

The Tatar National Corpus (TNC, http://corpus.antat.ru) is a linguistic resource of the modern Tatar language. It contains 100,000,000 tokens.
The texts in the Corpus have grammatical mark-up, so its search system supports searches for lexemes, word forms and individual grammatical parameters, for stop words and for parts of words, as well as searches with logical formulae. The TNC morphological analyser currently uses a tagset for the morphological categories within a word form. Since Tatar has a complex agglutinative morphology, the analysis isolates the word stem, determines its part of speech and describes the chain of inflectional affixes of the word form. The grammatical annotation system is now being supplemented with tags for compound constructions. In Turkic languages, many lexical items and grammatical categories are expressed by multiword units. For example, modality is as a rule conveyed not lexically but by special constructions expressing obligation, possibility or desire. In the current version of the grammatical mark-up, compound word forms and multiword constructions can only be retrieved with sophisticated queries. Extracting a multiword construction requires describing the parameters of two or more linguistic units with a predetermined distance between them. Such queries are therefore cumbersome and time-consuming, and the user has to be experienced in writing complex queries. The annotation system is now being enriched with new tags for compound (analytical) forms and constructions, which make it possible to distinguish between multiword lexical items, forms and constructions. Special rules for retrieving such units have been developed, based on their structure, the order of their components and the possibility of inserting other elements between them.
In particular, two-component verbal constructions have the following standard structure: the first component has a required form (a given affix or set of affixes) and is grammatically invariable, while the second may take all the inflectional and derivational affixes admissible for verbs. Compound verbs that are semantically equivalent to a single lexeme consist of an invariant first component (the stem) and an inflected second (auxiliary) component. For example, the verb yärdäm itü 'to help' has different realisations in actual use: yärdäm ittelär 'they helped', yärdäm itmäsme 'will he help?', yärdäm itüce 'that he helps', etc. In actual use such verbs may combine into larger multiword constructions by attaching further components, and postpositional particles may be inserted between the components.

Existing Tatar grammars give only a superficial description of the structure of multiword constructions and cover only a small number of them, whereas corpus technology offers an exhaustive list of such units. So far we have drawn up sets of rules for retrieving compound verbs that are semantically equivalent to a lexeme, rules for retrieving their tenses, and rules for constructions composed of phase and modal verbs. We have also developed query formats for retrieving the corresponding data and devised special tags to mark up diverse types of multiword constructions. The annotation system is mainly built on the tags of the Leipzig Glossing Rules and those of the database of verbs developed by V. Plungian (http://www.mccme.ru/ling/verbum.htm). The reported study was funded by RFBR under research project 15-07-09214.
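The two-component retrieval rule described above (a fixed first component, a freely inflected auxiliary, optional inserted particles) can be illustrated with a minimal Python sketch. It is not the TNC's implementation: it assumes each token already carries a lemma from a morphological analyser, and the auxiliary lemma `it`, the particle set `PARTICLES` and the gap limit `MAX_GAP` are hypothetical values chosen only for the yärdäm itü example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # surface form as it appears in the text
    lemma: str  # lemma from a morphological analyser (assumed available)


# Hypothetical parameters, for illustration only: the auxiliary lemma "it"
# and a tiny set of particles allowed between the two components.
AUX_LEMMA = "it"
PARTICLES = {"dä", "da", "tä", "ta"}
MAX_GAP = 1  # maximum number of inserted particles


def find_compound_verbs(tokens, stem):
    """Return (start, end) index pairs where the invariant first component
    `stem` is followed, possibly after up to MAX_GAP particles, by any
    inflected form of the auxiliary verb."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.form != stem:  # first component must appear in its fixed form
            continue
        j, gap = i + 1, 0
        while j < len(tokens) and tokens[j].form in PARTICLES and gap < MAX_GAP:
            j, gap = j + 1, gap + 1
        # second component is matched by lemma, so any inflection is accepted
        if j < len(tokens) and tokens[j].lemma == AUX_LEMMA:
            matches.append((i, j))
    return matches


sentence = [Token("yärdäm", "yärdäm"), Token("ittelär", "it")]
print(find_compound_verbs(sentence, "yärdäm"))  # [(0, 1)]
```

Matching the second component by lemma rather than by surface form is what lets one rule cover ittelär, itmäsme and itüce; a real rule set would also constrain the affixes of the first component through the corpus tags.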
Keywords: the Tatar language, corpus, multiword construction, corpus annotation

Corpus-based Spanish-Chinese dictionary of medical terminology
Antonio Moreno-Sandoval (1), Yuanyi Liu (2)
1 Universidad Autónoma de Madrid (UAM), Departamento de Lingüística y Lenguas Modernas, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain
2 Universidad Autónoma de Madrid (UAM), Laboratorio de Lingüística Informática, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain

Work on specialised Spanish-Chinese and Chinese-Spanish dictionaries is still scarce and lacks variety. In medical terminology specifically, there is only one bilingual dictionary, the Diccionario de medicina chino-español, published by Editorial de Lenguas Extranjeras, Beijing. Published in 2005, it does not include the terms that have appeared over the last ten years and needs updating. Moreover, it is not corpus-based and provides no examples illustrating meaning in real use. This is therefore a field in which research can clearly be extended. Our project is compiling a corpus-based bilingual Spanish-Chinese dictionary specialised in medicine. Specifically, it uses MultiMedica (Moreno and Campillos 2013), a corpus compiled and developed by the Laboratorio de Lingüística Informática of the Universidad Autónoma de Madrid (LLI-UAM), and Sketch Engine, one of the most advanced search systems for helping lexicographers find good examples of use for their dictionaries (Kilgarriff et al. 2008). The first aim of the project is to produce a specialised bilingual dictionary in electronic format; we will then describe both the translation problems and the technical problems encountered in compiling it, and carry out a comparative study of medical terminology in the two languages.
The ultimate aim of this line of research is to apply corpus technology to lexicography in order to develop a scientific methodology for compiling specialised Spanish-Chinese dictionaries that can be replicated in other specific fields, such as economic and commercial or legal terminology, and that at the same time contributes to the development of specialised translation and to the training of highly qualified translators and interpreters. This paper focuses on the methodology used:
1. Defining the macrostructure and microstructure of the Diccionario de la Terminología Médica Español-Chino: we chose the 5,000 most frequent terms extracted from the LLI's MultiMedica corpus as the main entries of the dictionary and decided to record for each of them the normalised frequency, international medical codes (CUI, MeSH), the English equivalent, the Mandarin Chinese equivalent, the equivalent term in traditional Chinese medicine, and the romanised Chinese form to help Spanish-speaking users, as well as synonyms, abbreviations and remarks.
2. Compiling the dictionary, which mainly consists of translating the 5,000 Spanish terms into Chinese. To obtain more appropriate and accurate equivalents, we used the DTM, the monolingual dictionary of the Real Academia Española de Medicina, the Diccionario Médico Bilingüe Inglés-Chino, several corpora of parallel texts, and bilingual encyclopaedias produced by official health institutions.
3. Adding collocations: we included the collocations of the 5,000 terms (the 10 most frequent multiword units) according to the MultiMedica corpus as new entries, together with their English and Chinese equivalents.
4. Selecting examples: ours is a dictionary of usage rather than a glossary. Where there is ambiguity, we give real examples from the MultiMedica corpus for each equivalent, together with their Chinese translation, so that users can better distinguish between the different equivalents of the same term.
5. Producing the electronic dictionary with the TshwaneLex software.
Two entries from the dictionary (a simple one and a compound one) are attached in the file.

Keywords: medical terminology, Spanish, Chinese, corpus-based lexicography, MultiMedica corpus

Conveying novelty through words: neologisms revealing new societal trends in France
Najet Boutmgharine Idyassner, Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot (EA3967), Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France

Every language has the capacity to take in new words, but lexical creativity is above all a process that reflects social change. In French, the processes of lexical creation are varied (Sablayrolles 2006) and favour the emergence of new names, commonly called neologisms. It was noticed very early that lexical creation is the best trace of the transformations of society; Du Bellay summed up this relationship in the formula "aux nouvelles choses être nécessaire imposer nouveaux mots" ("new things necessarily call for new words"). The cause-and-effect relationship has two stages: if a new thing is created, a name must follow. When this twofold operation is multiplied, it shapes the evolution of a language's lexicon, since the possibilities of naming grow to cover new realities. By following the changes a given language undergoes, one can therefore trace the evolution of the society in which it is used.
Neologists generally agree that much of the interest of neology lies in this principle: "Neology reflects the progress of a language just as much as the evolution of a society. [...] Language is dated, and neologisms are its most conspicuous countable elements." (Pruvost and Sablayrolles, 2016: 10). Advances in natural language processing now make it possible to track these developments. We present research on neologisms that reflect the changes taking place in French society today. The work is part of the project "Neoveille, repérage, analyse et suivi des néologismes en sept langues" (Neoveille: detection, analysis and tracking of neologisms in seven languages) (Cartier, 2016). The Neoveille platform is the outcome of a scientific project funded by COMUE Sorbonne Paris Cité with international participants. It consists of a set of modules for detecting, analysing and tracking neologisms in a journalistic corpus that is updated daily. Looking at the list of neologisms retained by the platform's detection system, one notices at once that they reflect the arrival of new social practices. Borrowings from English in particular play this role: the workplace (co-working, workventurer), leisure (mermaiding, binge-viewing) and many other social spheres are being shaken up by numerous trends imported, often, from the English-speaking world. Some of these societal borrowings name practices promoted by social networks (mannequin challenge), reveal new forms of criminal behaviour (trainsurfing) or reprehensible behaviour (bodyshaming), but sometimes also point to new, commendable forms of social action (book crossing, clickfunding). Likewise, tracking neologisms in corpora whose parameters vary along the diasystematic dimensions (diatopic, diastratic and diaphasic variation; cf. Coseriu, 1988) shows in particular which sociolects are most influential in the French sphere and which diatopic variations are at work today.

Keywords: neologisms, lexical creation, borrowing, anglicism, societal neologisms

Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?
Jesús Romero-Barranco, Universidad de Málaga (UMA), Campus de Teatinos, 29071 Málaga, Spain

Complexity was first defined by Simon as hierarchies of different elements originating from simplicity (1962: 468). In linguistics, Givón (2009) has analysed syntactic complexity from the point of view of language typology; Dahl (2004) and Nichols (2009) have assessed grammatical complexity cross-linguistically; and Blankenship (1974), Chafe (1982) and Maas (2009) have studied the different levels of complexity in spoken and written registers. Furthermore, Lehto (2015) carried out a diachronic analysis of the levels of complexity across text types in early Modern English legal material, based on Biber's work on linguistic complexity. Biber (1992) identified key linguistic features associated with reduced complexity (e.g. that-deletion, contractions or clause coordination, among others) and with increased complexity (e.g. nominalisations, phrasal coordination or passive constructions, among others). These features occur in different patterns across registers, and calculating their frequency makes it possible to assess the level of complexity of different kinds of texts. The concept of complexity has not so far been evaluated in early English medical writing, particularly with regard to its different text types. In the light of this, the present paper analyses the levels of linguistic complexity in two early Modern English medical treatises housed in Glasgow, Glasgow University Library, MS Hunter 135: a surgical treatise (ff. 34r-73v) and a recipe collection (ff. 74r-121v).
These two treatises are an ideal input for this study because they represent two text types of medical writing and can therefore be compared in terms of linguistic complexity. According to Pahta and Taavitsainen (2004), theoretical treatises were the most formal text type, while remedybooks represented popular medical knowledge, with surgical treatises falling in between. The analysis therefore sheds light on the differences between two branches of medical writing in early Modern English. The study has two objectives: a) to identify the complexity features present in these two witnesses; and b) to analyse the different levels of complexity in both text types. To carry out this analysis, the linguistic features identified by Biber (1992) will be retrieved and their frequencies calculated. Textual organisation will also be analysed, as it contributes to the level of complexity of a particular text. Methodologically, the texts have been transcribed following semi-diplomatic conventions, so that editorial intervention is kept to a minimum. After transcription, the texts were POS-tagged so that automatic searches could be carried out with a conventional concordancer. These texts are part of The Málaga Corpus of Early Modern English Scientific Prose (available at http://modernmss.uma.es), a corpus that aims to provide a sample of ca. 1,000,000 POS-tagged words of early Modern English scientific prose.
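The frequency step can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes Penn-style tags (`NN`, `NNS`, `VBN`) and modern spellings of *be*, detects only two of the complexity-raising features mentioned above (nominalisations via a short, hypothetical suffix list, and *be* + past-participle passives), and normalises counts per 1,000 words so that texts of different length, such as the surgical treatise and the recipe collection, can be compared.

```python
from collections import Counter

# Assumptions for illustration: Penn-style tags and modern spellings of "be";
# the Málaga corpus tagset and early Modern spellings would need their own lists.
BE_FORMS = {"be", "is", "are", "was", "were", "been", "being"}
NOMINAL_SUFFIXES = ("tion", "ment", "ness", "ity")


def complexity_features(tagged):
    """Count nominalisations and 'be' + past-participle passives
    in a list of (word, tag) pairs."""
    counts = Counter()
    for k, (word, tag) in enumerate(tagged):
        w = word.lower()
        if tag == "NNS" and w.endswith("s"):
            w = w[:-1]  # crude plural stripping: "operations" -> "operation"
        if tag in ("NN", "NNS") and w.endswith(NOMINAL_SUFFIXES):
            counts["nominalisation"] += 1
        # simplification: only an immediately preceding form of "be" is checked
        if tag == "VBN" and k > 0 and tagged[k - 1][0].lower() in BE_FORMS:
            counts["passive"] += 1
    return counts


def per_thousand(counts, n_words):
    """Normalise raw counts to occurrences per 1,000 words."""
    return {feature: round(c * 1000 / n_words, 2) for feature, c in counts.items()}


sample = [("the", "DT"), ("operation", "NN"), ("was", "VBD"),
          ("done", "VBN"), ("with", "IN"), ("diligence", "NN")]
print(per_thousand(complexity_features(sample), len(sample)))
# {'nominalisation': 166.67, 'passive': 166.67}
```

Comparing the per-1,000-word rates of each feature across the two treatises, rather than raw counts, is what allows a shorter text to be set against a longer one.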
Keywords: linguistic complexity, early English medical writing, surgical treatises, medical remedybooks

The corpus of digital sources as a tool for discourse grammar
Víctor Pérez Béjar, María Soledad Padilla Herrada, Universidad de Sevilla (US), Spain

Our starting point is the value of using digital sources in linguistic research. There is broad agreement on the need to work with corpora, which implies an empirical study of real data and thus legitimises the conclusions drawn. Although this kind of work is common in lexical studies, it is advisable and, in our view, indispensable in syntax. For this reason, within the MEsA project (Macrosintaxis del Español Actual; reference FFI2013-43205P), we are compiling a corpus of texts from digital sources. It consists of discourse samples taken from blogs and forums on various topics, posts and comments on public Facebook pages, tweets, transcriptions of YouTube videos and collections of their comments, as well as private WhatsApp conversations. The corpus is still under construction. Our goal is to obtain linguistic material from one of the most frequent means of communication today: social networks and Web 2.0 environments. This is a hybrid communicative setting on the oral-written and colloquial-formal continua. Its advantages include access to a large number of text samples, examples that are easy to interpret without the difficulties of reading an oral transcription, and the possibility of recovering the full context of the samples.
Among the problems, intonational elements (often essential for interpreting utterances) cannot always be recovered, since spelling does not reflect prosody rigorously. Within the project, this corpus will help us detect syntactic patterns that are spreading in colloquial discourse and for which we rarely have data. They interest us because in some cases they may lead to the fixation of discourse operators or markers. Intersubjectivity (Company 2004; Traugott 2004), one of the drivers of the evolution of these linguistic elements, predominates in all of them. In this presentation we focus on expressions that break with traditional syntactic moulds and do not fit the sentence pattern. This group includes phraseological units, understood in a broad sense: structures with total lexical fixation (proverbs, set phrases...), constructions whose fixation lies in the combination of their elements (such as insubordinate constructions), and other linguistic expressions that are not yet fully fossilised. These units are approached from a pragma-grammatical perspective (Fuentes 2015), which describes syntactic units beyond the sentence according to their actual use and their function in discourse. This perspective involves a multidimensional analysis that takes into account macrostructure, microstructure and text type, and covers the different fields in which the speaker's position is inserted: the enunciative, modal, informational and argumentative structure. We therefore argue that this corpus is a valuable resource for studying colloquial phenomena in the language.
By presenting samples of units with greater or lesser degrees of fixation extracted from this corpus, we aim to show that they are reliable samples for research in this field and that their use is an effective tool for research in discourse grammar.

Keywords: pragma-grammar, discourse syntax, digital discourse

Disagreement through echo questions
María Valentina Barrio, Milka Villayandre, Universidad de León, Spain

Spanish has a set of pragmatic phraseological syntactic schemas (Zamora Muñoz, 2003) of interrogative form that repeat, in full or in part, a previous utterance by another speaker, and whose discourse function is to express disagreement through that repetition. Some examples:
(1) A: - A ti, Ana, te toca fregar los platos. ('It's your turn to do the dishes, Ana.') B: - ¿A mí, fregar, de qué? No pienso hacerlo. ('Me, do the dishes? No way. I'm not doing it.')
(2) A: - ¿Sabes cuándo vuelve Pili de las vacaciones? ('Do you know when Pili is back from holiday?') B: - ¿Yo qué voy a saber? ('How should I know?')
(3) A: - ¿No tomas el desayuno con nosotros? ('Aren't you having breakfast with us?') B: - ¿Qué desayuno ni qué leches? Sigo sin olvidar lo que me habéis hecho. ('Breakfast, my foot! I still haven't forgotten what you did to me.')
(4) A: - Si madrugaras más, tendrías más tiempo para organizarte. ('If you got up earlier, you'd have more time to get organised.') B: - ¿Yo, madrugar? Lo siento, me lo prohíbe mi religión. ('Me, get up early? Sorry, my religion forbids it.')
(5) A: - A ver, que el español no necesita promoción. ('Look, Spanish doesn't need promoting.') B: - ¿Cómo que el español no necesita promoción? ('What do you mean, Spanish doesn't need promoting?')
This study has two main aims. First, to systematise the interrogative phraseological schemas in Spanish that express disagreement and meet the characteristics described above, in order to define the elements that make up their fixed schema and those that can fill their free slots. Second, to analyse the micro-discourse formed by the interrogative schema together with its stimulus (the utterance it repeats), in order to describe the pragmatic functions these units perform and the relations they enter into within the conversation.
Two issues will receive particular attention. On the one hand, repetition and the components it affects: the content of the utterance, the act of enunciation, the interlocutors, and so on. On the other, the units on which the disagreement falls and the pragmatic assumptions on which it rests, whether these are explicit or require an inferential process of interpretation.
As regards the theoretical framework, we follow functionalist macrosyntax (Gutiérrez Ordóñez, 2016), which goes beyond the limits of the utterance into micro-discourse, that is, the combination of utterances in discourse, in order to observe their constituents and the network of relations and functions among them. Methodologically, we start from a qualitative analysis of these phraseological schemas in several oral corpora of Spanish from the conversational domain, where they occur naturally because of their echoic nature. Their incidence will also be compared with that in more general corpora of Spanish. These corpora are: the Corpus del Español del Siglo XXI (CORPES XXI), the Corpus del español web/dialectos, Sketch Engine, the Corpus Oral Didáctico Anotado Lingüísticamente (CORDIAL), the Corpus de conversación coloquial of the Val.Es.Co. group, the Corpus Oral Juvenil del Español de Mallorca (COJEM) and the Corpus del grupo de investigación lingüística aplicada (COGILA). We expect the results to capture some of the main features of these schemas. Within conversation, they always act as dispreferred responding moves, since they can never be first turns. The expression of disagreement marks both a break with the expected continuation of the discourse and the presence of several enunciators within the same turn.
Keywords: disagreement, interrogative structures, repetition, conversation analysis, corpus linguistics, macrosyntax

Legal language and the language of biomedical engineering seen through corpus methodology
Eleonora Lozano Bachioqui, Allen Andrade Navarro, Universidad Autónoma de Baja California, Facultad de Idiomas (UABC), Av. Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico

This paper focuses on two specialised languages: legal language and the language of biomedical engineering. It examines legal language from a phraseological perspective and the language of biomedical engineering from a terminological one. To this end, two specialised monolingual Spanish corpora were built following corpus methodology (McEnery and Hardie, 2012) and analysed with corpus management tools, drawing on foundational work in corpus linguistics such as Sinclair (1970) and Stubbs (2001). The first corpus, a special-purpose corpus (Maia, 2002), contains 73,214 words and 5,751 types from legal documents belonging to Mexican civil law, such as birth and marriage certificates, judgments, wills and contracts, among others. It was analysed with the lexical analysis software WordSmith Tools (Scott, 2014), which generated a list of 558 keywords. From these, 60 key verbs were selected at a frequency threshold of 10, and their collocations and formulaic sequences were studied using MI (the mutual information score), drawing on foundational work such as Corpas Pastor (2003) and Koike (2001).
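The MI score used for the collocations of the key verbs compares how often a node and a collocate actually co-occur with how often they would co-occur by chance, MI = log2(f(x,y)·N / (f(x)·f(y))). The sketch below implements only this basic textbook formula; WordSmith Tools counts co-occurrence within a span and may apply its own adjustments, so its figures need not match. The counts in the example are hypothetical, not taken from the legal corpus.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """Pointwise mutual information in bits: log2(observed / expected),
    where expected = f_x * f_y / n is the co-occurrence count predicted
    if node and collocate were independent."""
    expected = f_x * f_y / n
    return math.log2(f_xy / expected)


# Hypothetical counts in a 73,214-word corpus (illustration only):
# "celebrar" 40 times, "contrato" 120 times, "celebrar" + "contrato" 25 times.
print(round(mi_score(25, 40, 120, 73_214), 2))
```

A score of 0 means the pair co-occurs exactly as often as chance predicts; in the corpus literature, scores of about 3 or more are often taken as indicating a meaningful collocation. Because MI rewards rare pairs, a minimum frequency threshold like the one applied to the key verbs is usually combined with it.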
One example of the analysis is the verb celebrar ('to enter into', as in a contract), which has simple lexical collocations such as celebrar + contrato and celebrar + convenio (verb + object noun), as well as celebrar + a + (el) tenor (verb + preposition + noun). It also occurs in formulaic sequences such as es su libre voluntad celebrar y obligarse. The second corpus contains 394,351 words and 23,965 types from scientific texts in biomedical engineering, taken from prestigious Latin American electronic journals. Like the first, it was analysed with lexical analysis software, in this case AntConc (Anthony, 2014), which generated a keyword list. Keywords at a frequency threshold of 45 and a keyness threshold of 107 were retained, and their collocations were then identified using the log-likelihood statistic. This part of the study draws on authors such as Cabré (2007) and Faber (2010). Examples of collocations found in this corpus are tejido + óseo, matriz + extracelular and presión + arterial (noun + adjective), alto + riesgo and baja + densidad (adjective + noun), and reacción + difusión (noun + noun). The results offer a corpus-linguistic approach to these two specialised languages and help translators, as well as teachers of languages for specific purposes, solve linguistic problems related to the lexical, terminological and even phraseological structure of specialised languages, in this case legal and technical language.
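The log-likelihood statistic used for the biomedical collocations (Dunning's G²) can be computed from a 2×2 contingency table of co-occurrence counts. This is a minimal sketch of the standard formula, G² = 2 Σ O·ln(O/E); AntConc's own implementation, and the way it counts co-occurrence within a window, may differ in detail. The same statistic, applied to a word's frequency in a study corpus versus a reference corpus, is also one of the keyness measures AntConc offers.

```python
import math


def log_likelihood(f_xy, f_x, f_y, n):
    """Dunning's G2 for a word pair from a 2x2 contingency table.
    Rows: first word present / absent; columns: second word present / absent."""
    observed = [
        [f_xy, f_x - f_xy],
        [f_y - f_xy, n - f_x - f_y + f_xy],
    ]
    row = [sum(r) for r in observed]                       # f_x, n - f_x
    col = [observed[0][j] + observed[1][j] for j in range(2)]  # f_y, n - f_y
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with O = 0 contributes 0 (limit of O*ln O)
                e = row[i] * col[j] / n  # expected count under independence
                g2 += o * math.log(o / e)
    return 2 * g2


# Hypothetical counts: a pair seen 50 times where independence predicts 10.
print(round(log_likelihood(50, 100, 100, 1000), 1))
```

With one degree of freedom, G² values above 3.84, 6.63 and 10.83 correspond to p < 0.05, 0.01 and 0.001 respectively; unlike MI, G² grows with the amount of evidence, so it favours frequent, well-attested pairs.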
Keywords: specialised languages, corpus linguistics, collocations, translation, language teaching

A comparative study of the English, French and Spanish translation of the linguistic and paralinguistic features of comics, based on a multimodal corpus of horror comics
María del Carmen Baena Lupiáñez, Universidad de Málaga, Spain

Drawing on the stereotypes that society attaches to certain gestures and expressions, literary works have used them to make their characters more expressive. Today there are comics with genuine philosophical essays in their speech balloons, and comics that consist only of images without any text. The text sometimes merely complements what the reader is about to see in the panels. Text and image cannot do without each other, since they complement each other. In translating comics, the translator must take this complementarity into account so that the target text is coherent and cohesive. Translating comics therefore requires attention to both textual and paratextual elements, since the two cannot be separated. The translator must not only read the text but also interpret the accompanying image, apply the appropriate techniques, and adapt text and image to the target culture where necessary. This suggests that comics translation is a type of specialised translation, with its own codes and translation strategies. However, despite the importance of interpreting paralinguistic features correctly, Translation Studies have seldom addressed this topic directly. Previously, the comic genre and comics translation were studied separately. There are studies focusing on the analysis of comics (T. Groensteen, 2009, 2013), on the specific characteristics of the genre (Gubern and Gasca, 1988) and on its semiotic aspect (N. Celotti, 2008), and, on the other hand, studies focusing on the importance of the image for comics translation (Kaindl, 2004; Zanettin, 2008). Today the concept of "paratranslation" is the one best suited to comics translation (José Yuste Frías, 2015). Authors such as Zanettin have studied both corpus linguistics and comics and have pointed out that a relationship can be established between corpora and comics, since translators can build text corpora to translate comics into the target language and culture more effectively and efficiently (2002). In light of the above, the main aim of this study is to establish classifications that integrate the paralinguistic elements (gestures, facial expressions and symbolic language) found in comics, taking into account English, French and Spanish culture. To this end, six horror comics have been selected. Paralinguistic elements are very prominent in this kind of work, since these comics contain a multitude of symbolic elements and their characters are especially expressive, which will make it possible to build a broad multimodal corpus.

Keywords: corpus linguistics, multimodal corpus, comics, horror comics, paralinguistic elements

A comparative study of usage labels in current lexicographic works
Estrella Calvo-Rubio Jiménez, Universidad de Sevilla, C/ S. Fernando, 4, 41004 Sevilla, Spain

Lexicographic works have always recorded usage labels to a greater or lesser extent. Over the history of lexicography, however, this labelling has changed.
Indeed, in recent years the Real Academia Española has introduced new usage labels and has sometimes replaced one label with another. The Academy's Diccionario de la Lengua Española has always been a reference point in Hispanic lexicography, and studies of it are naturally abundant and varied. Yet the variability of usage labels in its latest editions is striking. This variability, or lack of precision in assigning usage labels, is not exclusive to the Academy's dictionary: lexicographers agree that it is clearly difficult to establish a criterion for deciding when a word or sense belongs to a particular register or style. Hence usage labels differ from one lexicographic work to another. This research compares usage labels in several current lexicographic works, namely the Real Academia's Diccionario de la Lengua Española (2014), the CLAVE dictionary (2012), the Diccionario del español actual (2011), the María Moliner dictionary of usage (2008) and the LEMA dictionary of the Spanish language (2001), in order to show how this labelling differs from one work to another. To do so, I compiled a corpus containing the words or senses labelled diaphasically or diastratically in these five works. Through observation and study of this corpus, I examine the differences between the works in their usage labels, paying particular attention to words and senses labelled vulgar ('vulgar'), malsonante ('offensive') and coloquial ('colloquial').
These five works turn out to diverge considerably in how they assign these labels. For example, the LEMA dictionary clearly departs from the others in not classifying any word or sense under the label vulgar, and María Moliner does not use the label malsonante, which was introduced in the Academy's dictionary in 2001 and appears in the other works. This raises the question of what criteria lexicographers follow in assigning usage labels and which criteria are most appropriate in each case.

Keywords: lexicography, usage labels, dictionaries

A contrastive corpus study to identify diachronic features of Catalan normative discourse: the Statutes of Autonomy of 1932, 1979 and 2006
Albert Morales Moreno, Universitat Pompeu Fabra / Università Ca' Foscari Venezia (UPF / UCFV), Spain

The legislative procedure for drafting and approving the 2006 Statute of Autonomy of Catalonia (EAC 2006), and its exhaustive study in Morales (2015), made it necessary to carry out a comparative diachronic study[1] of the different Statutes of Autonomy that Catalonia has had over its history: those of 1932, 1979 and 2006. As in other traditions and countries, negotiating each of these normative projects was a major legal and political challenge at its historical moment, as shown by Balcells (2010) for the 1919 autonomy project, Aymamí (1932) and Abelló (2007) for that of 1932, and Sobrequés (2010) for the 1979 EAC. Each of these Statutes should be read as a renewed claim to self-government, both within the current constitutional framework and within earlier frameworks of coexistence.
Together these documents constitute what André Salem calls a "chronological textual series" (Salem 1994:313). Situated halfway between specialized legislative discourse and political discourse (Thornton 1987; Chilton 2004), these texts belong to a genre, normative discourse, that has received little attention from the perspective of discourse analysis (DA) (Fernández Lagunilla 1999a, 1999b; Bassols 2007), since research has mainly characterized other genres related to political activity, especially parliamentary debate (Ribas Bisbal 2000; Cuenca 2014). Drawing on publications on legislative drafting in Catalan (for example, GRETEL 1986, 1995; Duarte 1993; SAL 2014), on the methodology of other lexicometric studies of normative discourse in Catalan (Morales 2010, 2015) and on the contrastive study of the Spanish constitutions of 1812, 1931 and 1978 (Démol 2013), we will carry out a diachronic study. Our method is based on lexicometry: units of analysis are selected on statistical criteria. To process the corpus, we will use one of the most widely used lexicometric analysis tools, namely Lexico3, Iramuteq, TXM or Hyperbase. We will study the main lexicometric characteristics of the corpus (vocabulary growth, correspondence factor analysis, repeated segments, etc.) and are particularly interested in two analyses: 1) specificity analysis, which uses an index widely applied in the lexicometric tradition to identify the lexical units that change over the period studied (1932-2006).
This index will allow us to identify the forms that are statistically under- or over-used relative to the size of each subcorpus (each version of the EAC) and of the corpus as a whole; and 2) repeated-segment analysis, to identify the phraseological units that characterize Catalan normative discourse and how they evolve over time. Our research thus sets out to analyse the corpus lexicometrically in order to identify the forms that characterize each version of the EAC, positively and negatively, together with the most recurrent phraseological units, and in this way to lay the first foundations for a diachronic description of the evolution of Catalan normative discourse with regard to vocabulary and phraseology. This research is part of a project funded by the Instituto de Estudios del Autogobierno for the first half of 2017.
Keywords: normative discourse, lexicometry, corpus linguistics, diachronic study

A study of the applicability of Zipf's law and Heaps' law to corpora of learners of English
Nicolas Ballier (corresponding author) and Paula Lissón (corresponding author), Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot: EA3967, Université Paris Diderot, Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France
This work focuses on the applicability of the Zipf-Mandelbrot law (Zipf, 1949; Mandelbrot, 1953) and of Heaps' law (1978) to learner corpora.
To this end, we will compare vocabulary growth curves in texts written by native speakers of English and in texts written by learners of English. The Zipf-Mandelbrot law states that, in a given text, a word's frequency is inversely related to its rank in the frequency list. In practice, a text consists of a few very frequent words and many infrequent ones. In a recent study, Bentz and Buttery (2014) show that a) the Zipf-Mandelbrot law can be used as a measure of lexical diversity, and b) not all languages follow the Zipf-Mandelbrot law in the same way. Our hypothesis is that learners of English do not follow the Zipf-Mandelbrot law exactly and that their vocabulary growth curve differs from that of native speakers, which could help us classify learners into proficiency levels. Heaps' law (1978), which complements Zipf's law, states that the number of distinct words (types) in a text is a function of the text's length in tokens. As the text grows, its vocabulary keeps increasing, but no longer linearly, because the more words there are, the less likely new words become. Our hypothesis is that learners show more limited vocabulary growth, so that their production of hapax legomena would fall below the prediction of Heaps' law (approximately the square root of the total number of tokens). To test this hypothesis, we will study the applicability of the Zipf-Mandelbrot law and Heaps' law to a written corpus of Spanish-speaking learners of English, NOCE (Díaz-Negrillo, 2007), and compare the results with those from a corpus of writing by native speakers of English, LOCNESS (Paquot, 2015).
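The quantities these hypotheses rely on (tokens N, types V, hapax legomena V1, the frequency spectrum and the Heaps' law exponent) can all be computed from a tokenized text. The following is a minimal Python sketch, not the authors' {zipfR}/{languageR} pipeline: the function names are hypothetical, and estimating V = K·N^β by least squares in log-log space is simply one common way of fitting Heaps' law, assumed here for illustration.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """Return {m: V_m}, the number of types that occur exactly m times."""
    type_freqs = Counter(tokens)
    return dict(sorted(Counter(type_freqs.values()).items()))


def vocabulary_growth(tokens, step):
    """Return (N, V, V1) every `step` tokens and at the end of the text:
    N = tokens read so far, V = distinct types, V1 = hapax legomena."""
    counts = Counter()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        counts[tok] += 1
        if n % step == 0 or n == len(tokens):
            hapaxes = sum(1 for c in counts.values() if c == 1)
            curve.append((n, len(counts), hapaxes))
    return curve


def fit_heaps(curve):
    """Least-squares fit of log V = log K + beta * log N (Heaps: V = K * N**beta).
    Needs at least two points with different N."""
    xs = [math.log(n) for n, _, _ in curve]
    ys = [math.log(v) for _, v, _ in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


if __name__ == "__main__":
    tokens = "the cat sat on the mat the dog sat on the log".split()
    curve = vocabulary_growth(tokens, step=4)
    for n, v, v1 in curve:
        # sqrt(N) is the rough hapax benchmark mentioned in the abstract.
        print(f"N={n:3d}  V={v:3d}  V1={v1:3d}  sqrt(N)={math.sqrt(n):.2f}")
    print("spectrum:", frequency_spectrum(tokens))
    k, beta = fit_heaps(curve)
    print(f"Heaps fit: K={k:.3f}, beta={beta:.3f}")
```

Run on a learner subcorpus and a native subcorpus tokenized the same way, the fitted β values, or each V1 compared against √N, would give a first look at the hypothesized difference; extrapolating with LNRE models still requires dedicated tools such as {zipfR}.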
In this way, we will assess the validity of these laws and show how native and non-native writers differ. From the number of tokens and hapax legomena in our learner corpus, we will generate the frequency spectra from which the vocabulary growth curves are built. For this we will use the {zipfR} package (Evert & Baroni, 2006) for R (R Core Team, 2016). Following Ballier and Gaillat (2016), we will use the function "compare.richness.fnc" in {languageR} (Baayen, 2007) to compare vocabulary growth in native and non-native writing. We will then extrapolate the vocabulary growth curves (see figure 2) using the three Large Number of Rare Events (LNRE) models included in {zipfR}: "Generalized Inverse Gauss-Poisson" (R. Harald Baayen, 2001, 2008), "Zipf-Mandelbrot" and "Finite Zipf-Mandelbrot" (Evert, 2004). Finally, we will compare the results of the three models to identify which is most suitable for analysing learner corpora.
Keywords: learner corpora, lexical complexity, Zipf, Mandelbrot, vocabulary growth, hapax legomena

Extracting accounting phraseology with Sketch Engine: a proposed workflow
Daniel Gallego, Universidad de Alicante (UA), Carretera San Vicente del Raspeig s/n, 03690 San Vicente del Raspeig - Alicante, Spain
This paper presents a methodological experiment in extracting specialized phraseology from a generic corpus specialized in accounting.
We hypothesize that, starting from a closed list of single-word terms and of verbs that can potentially combine with those terms to form specialized phraseological units, Sketch Engine (Kilgarriff et al., 2004) can be useful for phraseological extraction, even though it was not designed specifically for extracting specialized phraseology. The theoretical framework revolves around the concept of specialized phraseology, reviewed on the basis of work such as Gouadec (1994), L'Homme (1997), Bevilacqua (2004) and Aguado (2007). Some studies on the evaluation of phraseology extraction are also taken into account (Claveau & L'Homme 2004; Wanner et al. 2005, among others). To address the working hypothesis, the object of study is first delimited on the basis of this previous work (essentially, specialized phraseology of the verb + term type is analysed). Next, a workflow is proposed for extracting a list of candidate specialized phraseological units with the Sketch Engine corpus query system. The workflow is divided into several steps. The first consists of generating two whitelists, one of terms and one of verbs, extracted from the corpus itself, and validating them manually. The second involves extracting concordances that contain the identified verbs and terms, which requires advanced use of Sketch Engine's CQL (corpus query language). In the third step, a new frequency list of the extracted units is generated from the concordance list; it can be considered a list of candidate specialized phraseological units. Finally, the extracted units are examined individually to determine whether they are phraseological.
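Steps two and three of this workflow (matching whitelisted verbs and terms, then turning the matches into a frequency list of candidates) can be approximated outside Sketch Engine. The following minimal Python sketch assumes a lemmatized corpus already loaded as a list of lemmas and uses hypothetical whitelists; the CQL builder only illustrates the general shape of such a query (verb lemma, optional gap, term lemma) and is not the author's actual query.

```python
from collections import Counter


def build_cql(verbs, terms, max_gap=3):
    """Build a CQL query of the shape: verb lemma, 0..max_gap tokens, term lemma."""
    verb_alt = "|".join(sorted(verbs))
    term_alt = "|".join(sorted(terms))
    return f'[lemma="{verb_alt}"] []{{0,{max_gap}}} [lemma="{term_alt}"]'


def candidate_frequencies(lemmas, verbs, terms, max_gap=3):
    """Count (verb, term) pairs where a whitelisted term follows a whitelisted
    verb with at most `max_gap` intervening tokens (nearest term only)."""
    freq = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma not in verbs:
            continue
        for other in lemmas[i + 1 : i + 2 + max_gap]:
            if other in terms:
                freq[(lemma, other)] += 1
                break  # simplifying choice: keep only the nearest term
    return freq


if __name__ == "__main__":
    # Hypothetical whitelists; step one would build and validate these by hand.
    verbs = {"amortizar", "contabilizar"}
    terms = {"activo", "gasto"}
    lemmas = "la empresa amortizar el activo y contabilizar el gasto".split()
    print(build_cql(verbs, terms))
    for (verb, term), f in candidate_frequencies(lemmas, verbs, terms).most_common():
        print(f"{f}\t{verb} + {term}")
```

Keeping only the nearest term after each verb is a simplifying choice made here; the resulting list still needs the manual, unit-by-unit validation of the final step.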
Analysis of the first fifty extracted units shows a precision of around 40%, a fairly high figure that merits further investigation. Validating more units will reveal how this percentage fluctuates and to what extent it is higher or lower than in other studies. In any case, the results can be taken into account not only when compiling phraseological repertoires but also for corpus indexing. The experiment also suggests some ways of optimizing corpus query systems for the extraction of specialized phraseology.
Keywords: specialized phraseology, Sketch Engine, extraction, generic corpora

Extracting semantic frame structures from Environmental Sciences corpora
Beatriz Sánchez-Cárdenas (corresponding author), Lexicon research group, Universidad de Granada, Spain; Carlos Ramisch, Université de Marseille, LIF (Laboratoire d'Informatique Fondamentale), France
Some authors argue that language is much less compositional than one might initially assume (Tutin & Falaise 2013, Kübler & Volanschi 2012, Gledhill 2000, Pecman et al. 2010, L'Homme 1998). In addition to multiword expressions, such as idioms and compounds, speakers often employ prefabricated templates and collocational patterns. Such patterns are omnipresent in specialized language, where their correct use is crucial to fully convey and understand domain concepts and their relations. In this research, we propose and evaluate a new way to automatically identify specialized noun-verb combinations in scientific discourse that are both recurrent and meaningful from a cognitive point of view (Claveau & L'Homme 2006). The long-term goal of this work is to automatically extract argument structures from corpora to help build the semantic frames that are activated in specialized domains.
From a theoretical point of view, our work derives from frame-based terminology (FBT; Faber 2012, 2015). FBT applies the premises of frame semantics (Fillmore 2006) to the study of the conceptual organization underlying specialized domains. Our description of thematic roles and argument structure is based on Role and Reference Grammar (Van Valin 2006). Finally, we classify the nouns in the arguments into semantic categories (Flaux and Van de Velde, 2000). With this perspective in mind, we developed a corpus-based methodology to acquire lexical patterns that reveal the structure of different frames. Our starting points are the corpus queries and association measures implemented in the MWEtoolkit, a software package for automatic MWE discovery in corpora (Ramisch 2014). After morphosyntactic analysis and lemmatization of the corpus, we search for specialized noun-verb and verb-noun combinations that are conceptually meaningful. These searches were based on semantic relations between nouns described in the environmental database EcoLexicon. For instance, the term volcano is connected to the noun eruption through the conceptual relation [cause of]. Since extracting relevant noun-verb combinations from corpora is crucial to identifying the argument structures that underlie semantic frames (Fillmore et al. 2003), we searched the corpora for verbs that lexicalize the relation between these two nouns and retrieved verbs such as cause and produce. Using a bootstrapping methodology, these verbs were reused to formulate another query, which retrieves from the corpora all causal relations involving volcanoes. The results were then sorted in descending order of an association measure (pointwise mutual information). The lexical items most relevant to the frame under study are those at the top of the list. Finally, these lists of patterns led to the emergence of the different conceptual frames associated with the concepts analysed.
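Ranking candidate noun-verb pairs by pointwise mutual information, PMI(n, v) = log2(f(n, v) · N / (f(n) · f(v))), can be sketched in a few lines of Python. This is not the MWEtoolkit implementation: as a simplifying assumption, the marginal frequencies f(n) and f(v) and the total N are taken from the list of extracted pairs rather than from the full corpus, and the example pairs are invented.

```python
import math
from collections import Counter


def rank_by_pmi(pairs, min_freq=1):
    """Rank (noun, verb) pairs by PMI, highest first.
    Marginals and N come from `pairs` itself (a simplifying assumption)."""
    pair_freq = Counter(pairs)
    noun_freq = Counter(noun for noun, _ in pairs)
    verb_freq = Counter(verb for _, verb in pairs)
    total = len(pairs)
    ranked = []
    for (noun, verb), f in pair_freq.items():
        if f < min_freq:
            continue  # PMI overrates rare pairs, so low counts can be filtered out
        pmi = math.log2(f * total / (noun_freq[noun] * verb_freq[verb]))
        ranked.append((noun, verb, f, pmi))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


if __name__ == "__main__":
    # Invented toy pairs, e.g. (subject noun, verb) extracted from a parsed corpus.
    pairs = [("volcano", "cause"), ("volcano", "cause"), ("eruption", "produce"),
             ("volcano", "produce"), ("lava", "produce")]
    for noun, verb, f, pmi in rank_by_pmi(pairs):
        print(f"{pmi:6.3f}  f={f}  {noun} + {verb}")
```

Because PMI favours rare combinations, a minimum frequency threshold such as `min_freq` is commonly applied before reading the top of the ranked list.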
These frames are then filled in manually by an expert lexicographer. In this paper, we present an example extracted from a 1-million-token corpus on volcanology. So far, we have extracted the verbs associated with the term volcano. When we analyse the arguments of these verbs and their associated thematic roles (Van Valin 2006) and semantic categories (Flaux and Van de Velde, 2000), we will illustrate the differences between the three frames. Since frames reflect cognitive patterns, they are language-independent. As our presentation will show, this conceptual description can be enriched with linguistic information in any language, so translation studies can benefit greatly from it.
Keywords: frame-based terminology, multiword expressions, argument structure, corpus analysis strategies

Facework in a telecollaboration student corpus
Barry Pennock-Speck (corresponding author) and Begoña Clavel Arroitia, Universitat de València (UVEG), Avda. Blasco Ibáñez, 32, Spain
Undoubtedly, bigger is better in the world of corpus linguistics: the more data you have, the better the results. However, some corpora are necessarily small. Take our corpus of twelve audio-visual recordings of synchronous peer interaction (telecollaboration[1]) in English and Spanish between secondary school pupils who are native speakers. Anyone who has done research on the discourse of minors knows how difficult it is to obtain parental permission to record pupils for research purposes. What may be less evident to those who have never been involved in telecollaboration is the difficulty of finding schools in at least two countries that are willing to participate, and time slots that suit geographically distant peers. These problems are compounded by the often less than perfect technical resources of secondary schools. All this leads to small, finite corpora that are difficult to replicate. But does this mean that they are of no use?
We would argue that this is far from the truth. In this talk, we aim to show that detailed qualitative analysis of synchronous multimodal interaction between secondary school pupils yields valuable insights into the language pupils use, as well as into intercultural and interpersonal negotiation. During telecollaboration, students face interpersonal, intercultural and transactional challenges while trying to complete the tasks they are given, such as organising a party or a trip abroad on a tight budget. Such challenges require facework, which we define, following Goffman (1956, 1967), as the actions individuals take to mitigate face threats and to protect or enhance their own face and that of others. Our findings show that face-threat mitigation occurs in our corpus when requests for clarification arise, either because a peer momentarily lacks linguistic prowess in the foreign language or simply because he/she cannot hear a word due to technical problems. In most cases we found that, if comprehension was not compromised, linguistic errors were ignored, which may reflect a common facework strategy, namely the avoidance of conflictive issues. We also found that facework addressed to positive face was very common and generally consisted of a search for common ground. Apart from linguistically coded communication, we also detected many cases of non-linguistic communication through gestures, smiles, laughter and the showing of personal photographs. These often reinforced verbal facework strategies. To sum up, our findings suggest that the "ceremonial activity" (Goffman 1967:477) performed through facework is an important, though often neglected, facet of linguistic and psychological studies of student interaction.
Goffman, Ervin. 1956. "The nature of deference and demeanor." American Anthropologist 58: 473-502.
Goffman, Ervin. 1967. Interaction Ritual: Essays on Face-to-Face Behavior. Garden City, New York.
[1] Telecollaboration for Intercultural Language Acquisition project (TILA)
Keywords: telecollaboration, facework, pragmatics, acquisition

From text to word and from word to morpheme: Exploring the interface of corpus linguistics and word formation study with evidence from Modern Greek
Paraskevi Savvidou, National and Kapodistrian University of Athens (UoA), Greece
The present paper explores the contribution of corpus linguistics to the study of word formation by reviewing previous research and by discussing the findings of an ongoing study of Modern Greek word formation processes, with emphasis on evaluative morphology. The orientation of the study is both theoretical and methodological. It aims to demonstrate that further investigation of the interface between corpus linguistics and word formation could provide significant insights into the character and nature of corpus linguistics as a linguistic (un)field or methodology (see among others Stubbs 2009), by showing its ties with what Sinclair (2004) called the restrictions of the pre-computer age; it can also contribute towards overcoming these limitations. In other words, the interface of corpus linguistics and word formation study is presented as crucial for understanding and extending the theory and methodology of corpus linguistics. In the first part of the paper, a historical overview of the use of corpus linguistics demonstrates that morphology is a rather neglected area of corpus research compared to other linguistic fields: corpora were applied to morphology later, less systematically and with a focus on only specific aspects of morpheme behavior, such as productivity, while other aspects were excluded or underestimated.
The critical overview of previous research shows that the use of corpora in individual linguistic fields seems to be driven by a latent distinction between the formation level and the use level, which is associated with the related dichotomies between grammar and lexis and between form/structure and semantics. The extent to which, and the way in which, corpora are applied in morphology can be seen as a consequence of this distinction. Given that corpus linguistics is a perspective on language that goes beyond theoretical assumptions and dichotomies not derived from data analysis, this observation could contribute to a more thorough understanding of such limitations, which is essential in order to overcome them. In the second part of the paper, we introduce a set of theoretical and methodological principles that could extend the application of corpus linguistics in word formation study, and we give evidence in their favor by presenting the results of an ongoing study of Modern Greek evaluative morphology. The proposed methodology rests on two main principles: (a) extending the notion of co-occurrence to two levels, the word formation level (namely, the various characteristics of the bases or compounding components with which the elements under examination tend to combine) and the (con)text level; and (b) combining qualitative and quantitative analysis in the study of every aspect of the behavior of the sublexical units under examination, including function identification, combinatoriality, productivity, etc. These principles aim to transfer all the benefits of the "phraseological approach" of corpus linguistics to the field of morphology.
The results of the analysis of a representative number of Modern Greek sublexical units show that these general principles allow the examination of the dynamic relation between the formation and the use level of the elements under examination, offering a perspective that can only come into view if the analysis takes care not to exclude or underestimate specific aspects of morpheme behavior.
Keywords: word formation, derivation, compounding, context, word level, text level, lexis, grammar, evaluative morphology, phraseological approach

Functional and thematic ngrams in specialized corpora: the case of academic English, French and Spanish
Clive Hamilton, Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot: EA3967, Université Paris Diderot, Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France
Previous studies have established that the ratio of functional to content single-word units differs between oral and written modes of communication (cf. Halliday, 1994; Rowley-Jolivet 1998; Biber et al., 1999). Others have suggested that this mode difference is equally attested in different languages (cf. Samaniego, forthcoming, for Spanish; Hamilton & Carter-Thomas, forthcoming, for English and French). However, despite the many advances in corpus studies, this observation has not yet been extended to clusters or recurrent word combinations. In addition, the study of phraseological units has become a burgeoning area of linguistic inquiry in recent years, in both theoretical and applied frameworks (cf. Cowie, 1998; Meunier & Granger, 2008). Research on the pervasiveness of these units, irrespective of the type of data used, has also benefited from "key publications", according to Stubbs & Barth (2003:61).
The pervasive nature of these recurrent combinations can therefore be considered an irrefutable characteristic of natural language production. In this presentation, the aim is to add a doubly contrastive perspective to the general debate by examining (i) recurrent word combinations (or ngrams, which can be subdivided into bigrams, trigrams and so forth) (ii) in a specialized trilingual corpus of academic discourse in the natural sciences (restricted to chemistry, geochemistry, marine and water sciences) in English, French and Spanish. I will present the corpus compilation process and briefly outline the distinction made between functional and thematic ngrams. The main part of my presentation will focus on two issues: the pervasiveness of the two types of recurrent word combinations in the three subcorpora, and the parallels that can be drawn (especially when a specific ngram overlaps between languages) between thematic and functional ngrams and the lexical density of each language subcorpus. Preliminary results indicate some overlap: for example, the trigram 'a partir de' has a similarly high frequency in both the Spanish and French subcorpora, while the Spanish 'en la figura' and its English equivalent 'shown in figure' are used in a comparable manner and with similar frequencies. However, substantial differences in lexical density have been observed between languages, with a higher ratio in English than in French and Spanish, implying that composition strategies may vary significantly in terms of information packaging. English also shows a marked preference for functional rather than thematic ngrams. For instance, the top three trigrams in the English subcorpus are all functional, whereas those in the other two languages are considered thematic or topic-specific (i.e.
'the use of', 'shown in figure', 'as well as'; 'après J.-C', 'avant J.-C', 'de l'holocène'; 'almacenamiento de CO2', 'de CO2 en', 'de la formación', respectively). The implications of our results will be discussed with respect to language teaching, particularly language for specific purposes.
Keywords: ngrams, phraseology, academic discourse, specialized corpora, contrastive studies

Gender-based differences in the use of epistemic modals in late Modern English scientific register
Francisco Alonso-Almeida (corresponding author) and Francisco J. Álvarez-Gil (corresponding author), Universidad de Las Palmas de Gran Canaria (ULPGC), Spain
This research focuses on samples of English scientific texts from 1700 to 1900 in order to evaluate epistemic modality as realised by modal verbs. Epistemic modality seems to be strongly connected to the idea of truth and to authors' responsibility for, and commitment to, their statements (Traugott 1989; Sweetser 1990; Stukker, Sanders and Verhagen 2009). We will also discuss some related features, such as evidentiality. While some scholars regard evidentiality as a subdomain of epistemic modality, others consider it an independent category. In this context, Dendale and Tasmowski (2001) argue that the relation between these two concepts can be one of disjunction, inclusion or intersection. In this paper we follow the disjunctive approach, in line with Cornillie (2009), who argues that the mode of knowing should not be associated with the degree of authors' commitment to their texts. Our interest was to see whether differences in the use of these modals could be detected from a gender perspective. To this end, we queried the History subcorpus of the Coruña Corpus of English Scientific Writing, which contains extracts from historical texts written between 1700 and 1900, using the corpus's own retrieval tool, the Coruña Corpus Tool.
Each occurrence has been categorised according to its contextual meaning, following Dixon's (2009: 172) description of modal verbs, which distinguishes modals from what we may call semi-modals, both of which express modality. Other valuable studies on modals, such as Coates (1983), Leech (1971) and Palmer (1979), among others, have also served as references for the present study. The procedure was as follows. First, we produced a list of occurrences to check the presence of modal verbs in the available history texts. Second, we queried and analysed the corpus to find the pragmatic functions these modals play in the different texts. Finally, we checked the results to find out whether there is any difference in the use of epistemic modals in late Modern English scientific register according to the gender of the writers. The results report the frequency of these modal verbs by gender but, most importantly, the different pragmatic functions they fulfil in the communicative process. One such function is the mitigation of claims (Alonso Almeida 2015): modals are used as a negative politeness strategy (Brown and Levinson 1987) to avoid or minimize imposition, to hedge the illocutionary force of a specific statement, or to create social distance in order to save the author's face. In this sense, modals are quite useful, as they enable an interactive construction of scientific knowledge, giving writer and readers the chance to negotiate meaning.
Keywords: modals, corpus, gender, modality, evidentiality

Governability and democracy in Mexico. Phraseological units of the Federal Executive 2012-2016 from the perspective of Critical Discourse Analysis
Carlos Enrique Ahuactzin Martínez, Benemérita Universidad Autónoma de Puebla, Instituto de Ciencias de Gobierno y Desarrollo Estratégico (BUAP-ICGDE), Av. Cúmulo de Virgo s/n, Acceso 4, CCU, Puebla, Puebla, C.P. 72810, Mexico
In recent years, the conception of the State as regulator of public life in Latin American countries has faced its most rigorous test. In the case of Mexico, we set out to document how the discourse of "democratic governability" was constructed around the figure of the president, as a strategy of the Federal Executive to confront the existence of a "failed State", in light of the social and political events that have called into question the Mexican State's capacity to uphold and guarantee human rights. Drawing on the theoretical and methodological perspectives of Critical Discourse Analysis and Corpus Linguistics, used in a complementary way, we analyse the presidential speeches of the period 2012-2016, during which the new State security policy took shape and the processes of violence that have characterized the current federal administration unfolded. Throughout the corpus, presidential discourse reveals the discursive resources that made it possible to communicate the structural reforms in Mexico, based on the fulfilment of "democratic governability", conceived as a normative framework for the development of the State and the strengthening of citizenship. The corpus has been organized on the basis of semantic concordances, using the Sistema de Gestión de Corpus (corpus management system) of the Grupo de Ingeniería Lingüística at the Universidad Nacional Autónoma de México.
The classification and treatment of the phraseological units were based on identifying two single-word lexemes, "gobernabilidad" (governability) and "democracia" (democracy), which, once the corpus was processed, turned out to be incorporated into multi-word lexemes depending on the communicative situation of the Federal Executive. Three groups were thus established according to their frequency in the corpus: 1) nominal locutions, 2) adjectival locutions and 3) adverbial locutions. To determine how the locutions were used, the corpus annotation took into account the functional character of the linguistic expressions in the context of government communication. The discursive resources used by the Executive establish a lexico-semantic frame of reference in which "democracy" occupies a prominent place in the exercise of public power and the legitimation of political decisions. Likewise, the use of derivations of "governability", seen through the analysis of phraseological units, makes it possible to establish a field of associations among the nominal, adjectival and adverbial locutions. Annotating the units of analysis over the period studied makes it possible to relate the modalities of presidential discourse to the political processes that shaped the context in which institutional messages were produced and issued. The study therefore proposes an interdisciplinary approach to presidential discourse that considers the discursive, linguistic and political variables involved in shaping the messages of Mexico's Federal Executive. Finally, as a result of the empirical findings, it proposes a typology of the communicative and discursive strategies that articulate the notion of "democratic governability" within a normative context that exposes the regulatory limitations of the Mexican State.
Keywords: discourse, phraseological units, locutions, governability and democracy

Spanish grammar for French speakers: the use of the preposition "de" after matrices of the type es posible
María Adelaida Gil Martínez (corresponding author), Instituto Cervantes de Burdeos (IC Burdeos), Instituto Cervantes, 57, Cours de l'Intendance, 33000 Bordeaux, France
One of the most common difficulties French speakers face when learning Spanish is the use of prepositions, in particular the overuse of the preposition de in matrices of the type es posible, which is one of the most characteristic fossilizations in these speakers' interlanguage up to level B1. At the initial levels this might be attributed to transfer from French into Spanish, for example *es posible de dejar de fumar, from French c'est possible d'arrêter de fumer; at level B1, however, it might be regarded as a strategy for avoiding the subjunctive, since learners have not yet mastered the alternation between the two moods in Spanish. Moreover, given that syntactic transfer from the L1 is observable up to very advanced levels, it is not unusual for this kind of error to appear at these stages of the teaching-learning process. To test this hypothesis, we used the Corpus de aprendices de español como lengua extranjera (CAES), a corpus designed by a team at the University of Santiago and funded by the Instituto Cervantes. It consists of written texts produced by learners of Spanish at different proficiency levels (levels A1 to C1 of the Common European Framework of Reference, as applied to Spanish in the Plan curricular del Instituto Cervantes. Niveles de referencia para el español) and with six L1s: Arabic, Mandarin Chinese, French, English, Portuguese and Russian.
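A first count of the kind this study relies on (how often an es posible-type matrix is followed by de) can be approximated with a regular expression over learner texts. The minimal Python sketch below is not tied to the CAES search interface; the adjective list and sample sentences are hypothetical, and every match would still need manual checking, since a following de is not always an error (compare es difícil de creer).

```python
import re


def matrix_pattern(adjectives):
    """Compile a pattern for 'es + ADJ', capturing an immediately following 'de'."""
    alternatives = "|".join(re.escape(adj) for adj in adjectives)
    return re.compile(rf"\bes\s+(?:{alternatives})\b(\s+de\b)?", re.IGNORECASE)


def de_rate(texts, pattern):
    """Return (matrices followed by 'de', all matrices found)."""
    with_de = total = 0
    for text in texts:
        for match in pattern.finditer(text):
            total += 1
            if match.group(1):  # the optional ' de' group matched
                with_de += 1
    return with_de, total


if __name__ == "__main__":
    # Invented learner-style sentences; real input would be CAES texts.
    texts = ["Es posible de dejar de fumar.",
             "Es posible que llueva mañana.",
             "No es necesario de venir."]
    with_de, total = de_rate(texts, matrix_pattern(["posible", "necesario"]))
    print(f"{with_de}/{total} matrices followed by 'de' ({with_de / total:.0%})")
```

The word boundary after de keeps forms such as es posible dejar from being counted as errors.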
The aims of this proposal are the following:
• To determine the extent to which CAES supports this hypothesis by using statistical techniques to analyse and assess the presence of the matrix *es posible de in the Spanish of French-speaking learners.
• To explore the contexts in which this structure appears and what information can be obtained from the CAES samples. A contrastive analysis of two or more languages through linguistic corpora will allow us to assess how this structure works in discourse and to determine to what extent its appearance is due to L1 transfer or to other learning strategies used by French speakers.
• To build a bank of examples that can later be used to design classroom activities and tasks that act as mediating, stimulating material to improve the teaching-learning process.

The first results from the CAES samples analysed point to the following contexts:
• Matrices of the type es posible are followed by the preposition de in 38% of cases.

Keywords: learner corpus, French speakers, subjunctive matrices, interlanguage, ELE (Spanish as a foreign language), L1 transfer

Hedging in tourism discourse: the variable genre in academic vs professional texts
Francisca Suau-Jiménez, Carmen Piqué-Noguera (Facultat de Filologia, Traducció i Comunicació, Universitat de València (IULMA-UV), 32 Av. Blasco Ibáñez, 46010 Valencia, Spain)

In recent decades, e-genres have been at the forefront of academic, professional and social studies aimed at improving writing in these areas (Thaine 2015), and tourism has often been targeted as one of them.
Recent studies (Suau-Jiménez 2016; Mapelli 2016) have shown that tourism e-genres strongly challenge the interpersonal model of metadiscourse developed for academic genres (Hyland 2005). We therefore hypothesize that hedges, one of the most representative interpersonal markers, should also show important differences in function, frequency and grammatical realization between Research Articles and Hotel Websites. Genre and discipline are two variables that have been claimed to challenge the original interpersonal metadiscourse model, which took English and academic discourse as its main referents. Hedges are prototypical markers in academic writing and are also central in English-language promotional tourism genres (Suau-Jiménez 2012), since they reveal authors' different functional attitudes and their commitment both to content and to reader involvement. Hedges, however, are not always easy to identify: Nash (1992) points to their fuzziness in interpersonal metadiscourse. This research analyzes a 100,000-word corpus composed of two sub-corpora of tourism texts, Hotel Websites and Research Articles. We aim to uncover which generic functions hedges do or do not perform in each case, their frequency, and the nature of their grammatical realization in both genres. Methodologically, we took Hyland's (2005) taxonomy as a starting point and adapted it to the corpus at hand. We discarded so-called research verbs, such as 'argue' or 'indicate', since they only appear in Research Articles. Then, since the modals 'should', 'could' and 'may' are shared by both genres, we took them as our specific object of analysis from an interpersonal discourse approach (Hyland 2005). Preliminary results from a 34,000-word pilot corpus covering both genres already showed a quantitative difference: 8.03 hedges in Research Articles versus 4.07 in Hotel Websites.
In addition, the modals 'may' and 'should' show specific distributions: we counted 21 occurrences in Articles versus 54 in Hotel Websites. 'Should' appears 13 times in Articles and 15 times in Hotel Websites, whereas 'could' occurs 12 times in Articles versus 2 times in Hotel Websites. These frequencies may reflect a different way of approaching and persuading each readership, related to the specific functional needs involved in achieving each genre's communicative aim. Tourism marketing uses these modal verbs to give advice to prospective clients or to describe what they will find around the hotel premises, whereas Research Article writers use them mainly in argumentative and speculative sections. They are typical when claims are more or less tentative or when an outcome is more or less probable, and they are often accompanied by qualifying adverbs such as 'relatively', 'generally' or 'largely'. The conclusions point to interpersonal metadiscourse as a research framework that must take the variables genre and discipline into account in order to produce ad hoc analyses that explain marker frequencies and lexico-grammatical realizations in context, so that adequate discursive and sociolinguistic implications can be drawn.

Keywords: hedges / interpersonal metadiscourse / interpersonality / professional and academic genres

Identification of recurrent formulas in academic Spanish
Marcos García Salido, Marcos Garcia González, Margarita Alonso Ramos (Departamento de Galego-Portugués, Francés e Lingüística, Universidade da Coruña (UDC), Spain)

Every discourse genre contains recurrent or routine combinations of lexical units. These combinations are often semantically compositional, but their lexical realization is conditioned by the conceptual representation the speaker wishes to express.
For example, to introduce the conclusions of a text, en conclusión ("in conclusion") is more idiomatic than ?a manera de conclusión ("by way of conclusion"). From a phraseological perspective, sequences of this kind have been called clichés (Mel'čuk, 2015), and they overlap to some extent with the concept of lexical bundle (Biber et al., 1999). Such sequences pose no problems for comprehension, but they can be difficult to produce; hence the value of a dictionary that records them, especially for non-native speakers and novice writers. The aim of this study is to evaluate the effectiveness of several methods for the automatic identification of multi-word sequences, with a view to compiling a dictionary of academic Spanish. We treated as recurrent formulas any sequences of two, three or four words with a frequency of at least ten occurrences per million words (Biber et al., 1999). This yielded formulas such as cabe destacar que ("it should be noted that") or en el presente trabajo ("in the present study"), alongside others of more doubtful interest such as et al. 2002 or no se han ("have not been"). To identify those that are characteristic of academic discourse, we compared two main strategies: (i) combining a dispersion index (DP, Gries, 2008) with a log-likelihood value indicating significant differences in the distribution of the formulas relative to non-academic texts, and (ii) using only a test that both measures distributional differences and takes into account the dispersion of the tested forms (Wilcoxon-Mann-Whitney [WMW]; cf. Kilgarriff, 2001; Paquot and Bestgen, 2009; Lijffijt et al. 2015).
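The two filtering strategies can be sketched in a few functions. This is a minimal illustration under our own assumptions, not the authors' implementation: texts are assumed to be pre-tokenized lists of words, DP follows Gries's (2008) definition over corpus parts, log-likelihood is the usual two-corpus G² statistic, and the WMW test is computed with a tie-corrected normal approximation over per-text relative frequencies. All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def recurrent_ngrams(texts, n, min_per_million=10.0):
    """n-grams occurring at least `min_per_million` times per million words."""
    counts = Counter(g for tokens in texts for g in ngrams(tokens, n))
    total = sum(len(tokens) for tokens in texts)
    return {g: c for g, c in counts.items() if c * 1_000_000 / total >= min_per_million}

def dp(part_freqs, part_sizes):
    """Gries's DP: 0 = spread evenly across corpus parts, close to 1 = concentrated."""
    f, size = sum(part_freqs), sum(part_sizes)
    if f == 0:
        raise ValueError("item does not occur in the corpus")
    return 0.5 * sum(abs(v / f - s / size) for v, s in zip(part_freqs, part_sizes))

def log_likelihood(a, b, size_a, size_b):
    """G2 for frequency a in a corpus of size_a words vs frequency b in size_b words."""
    expected_a = size_a * (a + b) / (size_a + size_b)
    expected_b = size_b * (a + b) / (size_a + size_b)
    g2 = sum(o * math.log(o / e) for o, e in ((a, expected_a), (b, expected_b)) if o > 0)
    return 2 * g2

def wmw_z(x, y):
    """Tie-corrected normal-approximation z for the Mann-Whitney U of samples x and y,
    e.g. per-text relative frequencies of one formula in academic vs narrative texts."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values share the mean of 1-based ranks i+1..j+1
            ranks[order[k]] = (i + j) / 2 + 1
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for the first sample
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return 0.0 if var == 0 else (u1 - n1 * n2 / 2) / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Under strategy (i), a candidate would be kept if its G² exceeds a significance threshold and its DP is low; under strategy (ii), `wmw_z` would be applied to each n-gram's per-text relative frequencies in the two corpora and `two_sided_p` compared with the chosen p threshold. Because many texts contain zero occurrences of a given n-gram, the tie correction matters in practice.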
The reference corpus (the Spanish part of SERAC, InterLAE, 2008) consists of research articles from four areas (Humanities, Social Sciences, Physics and Engineering, and Health Sciences) and is contrasted with narrative texts from the LEXESP corpus (Sebastián et al., 2000). As in other studies (Paquot and Bestgen, 2009), the WMW test proves, in principle, more conservative than log-likelihood. For example, if we keep only the bigrams with p < 0.0001 in the WMW test, we retain just 25% of these sequences. With the same p-value, the log-likelihood test produces a list containing 73% of the original number of bigrams. This latter list can, however, be reduced to the bigrams with the highest dispersion, which considerably narrows the gap between the results of the two methods. Analysing both the resulting lists and the items excluded under the different significance and dispersion thresholds will provide information on the precision and the recall of the filters used.

Keywords: formulas, academic discourse, dictionary, automatic keyword extraction

Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts
Heidi Verplaetse, An Lambrechts, Kris Heylen (KU Leuven, RU Quantitative Lexicology and Variational Linguistics, Belgium)

Theoretical background and main arguments
Kübler et al. (2016) recently conducted a student experiment with comparable corpora, showing that these corpora help to solve translation difficulties such as those relating to tense and aspect, the use of prepositions, and collocations. However, certain error types occurred more often when a corpus was used, possibly because of overconfidence in the corpus or a lack of time when making extensive use of it.
Aside from comparable corpora, parallel corpora used as translation memories (TMs) integrated in a CAT tool provide another excellent means of preparing students for their future professional environment, since they reflect the needs of professional translators. Corpora improve student translations because they contain information that dictionaries lack, particularly with regard to terminology and idiomatic expressions (cf. phraseology) (Frérot, 2009). This is confirmed by Kübler (2011), who states that parallel corpora seem to be the perfect tool for a translator: besides the terminology needed for the translation task, they also provide the necessary phraseology. Using parallel corpora integrated in a CAT tool makes it possible to exploit not only the above-mentioned benefits of corpora but also those of TMs. TMs not only speed up translation, increasing translators' productivity and earnings, but also have a positive influence on overall translation quality: by recognizing previously translated segments, they increase consistency at the stylistic, phraseological and terminological levels (Austermühl, 2006). However, when trying to increase their output, translators who have a TM may work too fast and lower translation quality by reusing translations from the TM without verifying them first (Bowker, 2005).

Aims and method
To assess the influence of CAT tools and preset corpus-based TMs on translation quality at the phraseological level, we examine translations of specialized medical texts produced by MA students of Translation. In our experiment the source texts contain predefined translation difficulties. The students translate under three different conditions, viz.
(i) without CAT tools, TMs or external resources, (ii) with a CAT tool and a TM, and (iii) with external resources only. For the medical translations in our current tests, the students use the parallel corpus of the European Medicines Agency (EMA) compiled by Tiedemann (2009) as a TM. Once the translations are completed, the predefined translation difficulties are analysed on the basis of an error classification (cf. the MeLLANGe error typology, Kübler et al., 2016). We use an error typology because errors can be defined more easily and precisely than translation quality, which depends to a large extent on the absence of errors. Moreover, as Schiaffino and Zearo (2005), among others, point out, translation quality should be assessed as objectively as possible.

Pilot test results
Our pilot test with student translations showed that concordance searches in TMs built from parallel corpora are beneficial for looking up specialized medical terminology (-, 2015), whereas mere TM support without concordance searches added little value. Terminology look-up through concordance searches proved especially beneficial for more difficult items. In these experiments, however, the exclusive use of external resources (without CAT tools or TMs) also had a considerable positive influence on the translation of specialized terminology (-, 2015).
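A concordance search over a TM essentially shows every source segment containing a search term in keyword-in-context (KWIC) form, together with its aligned translation. The sketch below illustrates that idea in plain Python. It is not the search function of any particular CAT tool; the `Hit` type, the function name and the English-French segments (written for this example, not taken from the EMA corpus) are all hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

class Hit(NamedTuple):
    left: str     # source-side context before the match
    keyword: str  # matched term as it appears in the segment
    right: str    # source-side context after the match
    target: str   # aligned target-language segment from the TM

def concordance(tm: List[Tuple[str, str]], query: str, width: int = 25) -> List[Hit]:
    """Case-insensitive whole-word KWIC search in the source side of aligned segments,
    returning each hit together with its aligned translation."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in tm:
        for m in pattern.finditer(source):
            hits.append(Hit(source[max(0, m.start() - width):m.start()],
                            m.group(0),
                            source[m.end():m.end() + width],
                            target))
    return hits

# Illustrative segments written for this sketch, not quoted from the EMA corpus.
tm = [
    ("Do not store above 25 °C.", "Ne pas conserver au-dessus de 25 °C."),
    ("Store in the original package in order to protect from light.",
     "À conserver dans l'emballage d'origine à l'abri de la lumière."),
    ("Keep out of the sight and reach of children.",
     "Tenir hors de la vue et de la portée des enfants."),
]

for hit in concordance(tm, "store"):
    print(f"{hit.left:>25}[{hit.keyword}]{hit.right:<25} | {hit.target}")
```

Seeing several aligned contexts at once is what lets a student check how a term or collocation is usually rendered, rather than accepting a single TM match unverified.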
Keywords: Parallel corpora, Comparable corpora, Terminology, Phraseology, CAT tools, Translation Memories (TMs), Translation quality, Translation for Specific Purposes, Medical translation

Investigating style and conventionality in literary translation: a corpus-based approach
Carolina Barcellos (University of Brasília (UnB), Campus Universitário Darcy Ribeiro, Asa Norte, ICC Sul B1167/63, CEP 70910-900, Brasília/DF, Brazil)

Corpus-based Translation Studies (BAKER, 1999, 2000; SALDANHA, 2011) have focused on the style of translators and, as a result, have addressed the translator's discursive presence in the translated text. This research investigates the stylistic traits of a literary translator from the perspective of conventionality and translation shifts. It examines patterns in a translator's linguistic choices regarding conventionality (BAKER, 2007) in Brazilian Portuguese that can be found both in his work as a translator and in his work as an author, and the consequences of these choices for the recreation of meaning in the translated texts. Three corpora were compiled: 1) a corpus of texts translated into Brazilian Portuguese by Paulo Henriques Britto, currently one of the most prominent Brazilian literary translators; 2) a corpus of non-translated texts written in Brazilian Portuguese by Britto; and 3) a corpus of short stories written in American English by Philip Roth, John Updike and Jhumpa Lahiri, which, together with the first corpus (Britto's translations), formed a parallel corpus. Two other corpora (COMPARA and ESTRA) were used as control corpora providing frequency references for conventionality in Brazilian Portuguese. Statistical data were obtained with WordSmith Tools 6.0 (SCOTT, 2012), and elements related to conventionality in Brazilian Portuguese were analysed at the various levels (morpheme, word, group and clause).
The research methodology included compiling, preparing, aligning and tagging the texts for later analysis with WordSmith Tools 6.0. The identification of patterns in the translated texts attributable to the translator's style, rather than to the linguistic constraints of the American English/Brazilian Portuguese pair, draws mainly on Munday (2008), Saldanha (2011) and Baker (1999, 2000, 2007). The results indicated that Britto made a set of choices that was to some extent distinct for each translated text, under the influence of the style of the source texts. In general, Britto's choices regarding conventional expressions increased the degree of colloquialism in the translated texts compared with their source texts. In addition, the set of choices identified in Britto's non-translated texts was similar to that identified in his translated texts, particularly in his translations of Philip Roth's work. The most frequent translation shift was addition (a subcategory of amplification). These additions were not directly related to explicitation; instead, they reflected the translator's preference for conventional expressions in the translated texts, even when the source texts gave no clear motivation for them. Britto also made use of sanitization, erasing some cultural references from the source texts. Nevertheless, the translator's creativity consistently outweighed his use of sanitization, corroborating the results obtained by Munday (2008) and, to some extent, refuting those obtained by Baker (1999, 2000).

Keywords: Conventionality, Style of Translation, Literary Translation, Corpus-based Translation Studies.
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study
Joaquín Gris Roca (1), Raquel Criado Sánchez (2), Agustín Romero Medina (3), Isabel Alonso Belonte (4)
(1) University of Murcia (UMU), Facultad de Ciencias Sociosanitarias, Campus de Lorca, Antiguo Cuartel Sancho Dávila, Avda. de las Fuerzas Armadas s/n, 30800 Lorca, Murcia, Spain
(2) University of Murcia (UMU), Facultad de Letras, Campus de la Merced, C/ Santo Cristo 1, 30071 Murcia, Spain
(3) University of Murcia (UMU), Facultad de Psicología, Campus de Espinardo, 30100 Murcia, Spain
(4) Universidad Autónoma de Madrid, Spain

Textbooks and their activities are fundamental tools in the EFL classroom (e.g. Littlejohn, 2011; Montijano-Cabrera, 2014; Sánchez, 2004; Tomlinson, 2003, 2011), as they are often the only means of giving students opportunities to practise the L2 in environments where input is very often of poor quality, as is the case in EFL contexts. Teachers can use them in a variety of ways, mainly to convey L2 knowledge to students through practice or to support the explanations they give in class. There are basically three types of activities according to the type of knowledge they foster (Gris, 2015): i) activities whose teaching nature is mostly or fully explicit, which primarily foster explicit linguistic knowledge (e.g. knowledge of forms); ii) activities with a high or full implicit teaching load, aimed at developing implicit knowledge (which underlies oral and written fluency); and iii) activities with a mixed teaching load, that is, partially explicit and partially implicit.
Selecting and implementing activities with their explicit and implicit teaching nature in mind is crucial for a balanced development of both explicit and implicit knowledge, given that the ultimate goal of foreign language teaching should be the attainment of the latter (e.g. DeKeyser, 2015). This issue becomes particularly sensitive in child L2 acquisition (Abello-Contesse et al., 2006), since earlier stages of acquisition are believed to be decisive for aspects such as pronunciation, intonation and fluency (Agustín-Llach, 2016; Alizadeh, 2011; Paradis, 2007). The objective of this preliminary study is therefore twofold: first, to analyse the explicit and implicit teaching load of activities in EFL textbooks from different, representative publishers used in Spanish primary schools; second, to determine their cognitive potential.

The method for analysing and categorizing the activities involved two basic steps. The first was to create a corpus by compiling 100 activities from 10 EFL textbooks actually used in the first year of primary school in Spain. The activities were randomly selected from two textbooks from each of the major EFL textbook publishers in Spain (Oxford University Press, Macmillan, Cambridge University Press, Santillana/Richmond, Pearson, Burlington Books, Anaya). Units and activities within each textbook were also selected at random. Second, each activity in the corpus was tagged with its explicit and implicit teaching load. Data analysis is ongoing, and the study is expected to shed light on the patterns of activity typology in EFL primary-school textbooks.
This will reveal the cognitive potential underlying the textbook activities used in teaching EFL to children. The resulting pedagogical implications will be discussed.

Keywords: Primary school, EFL teaching, textbooks, activities, corpus

Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective
Hakan Cangir (University of Exeter, Graduate School of Education, St Luke's Campus, Heavitree Road, Exeter, Devon EX1 2LU, United Kingdom; Ankara University, School of Foreign Languages (AU YDYO), Ankara Üniversitesi Gölbaşı 50. Yıl Yerleşkesi, Bahçelievler Mahallesi, Kaymakamlık arkası, 06830 Gölbaşı/Ankara, Turkey)

Many studies have investigated how the bilingual mental lexicon is structured, and various researchers have suggested that the two lexicons seem to interact in some way during language production. However, there is disagreement about how the two mental dictionaries interact during lexical activation, in particular about the phase of the activation process in which an interaction can be observed. Another related question scrutinized by many applied linguists is whether lexical activation is language-specific or language non-specific. The current study assumes the process to be language non-specific and tries to shed light on the cross-linguistic nature of the bilingual mental lexicon, with a specific emphasis on collocations, which appear to be an understudied topic. In addition, the research approaches cross-linguistic lexical priming from a syntagmatic perspective with the help of a typologically different language, Turkish, which previous research appears to lack. Frequency, congruence and typological variety are assumed to be likely to affect lexical processing, and the processing of collocations in particular.
With this in mind, the researcher uses two representative and balanced corpora, the Corpus of Contemporary American English (COCA) and the Turkish National Corpus (TNC), to develop reliable items for a cross-linguistic collocational priming experiment, observes the response times of English-Turkish bilinguals, and investigates the influence of frequency, congruence and typology on collocational processing. Building on lexical priming theory, which holds that every word is primed to occur with the particular words it collocates with, the study adopts the Spreading Activation Model as the underlying theory of lexical activation and examines the cross-linguistic aspect of collocational priming in bilinguals. The Dual Activation of Collocational Connections Model and the Psycholinguistic Model of Vocabulary Acquisition in L2 serve as the core framework for cross-linguistic collocational priming, because the study covers two different language acquisition settings, i.e. English as a Second Language (ESL) and English as a Foreign Language (EFL). Initial results from a monolingual priming experiment, designed to set the baseline for the main experiment, indicated that a strong priming effect seems to exist in Turkish. Furthermore, the findings of the cross-linguistic priming experiment suggested that a priming effect is present for ADJECTIVE+NOUN collocations but not for VERB+NOUN combinations, which can be regarded as an effect of typology on cross-linguistic collocational processing. More strikingly, the direction of presentation in the priming experiment appears to have the strongest impact on response times: when the prime word was in the L1 and the target word in the L2, processing seemed to be facilitated and a statistically more significant priming effect could be detected.
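In priming studies the size of a priming effect is commonly operationalized as the difference between mean response times after unrelated and after related primes; the abstract does not state the exact measure used, so the sketch below is an assumption. It computes that difference separately for each presentation direction. The function name, the direction labels and the response times are hypothetical and invented for illustration, not data from the study.

```python
from collections import defaultdict
from statistics import mean

def priming_effects(trials):
    """Mean RT after unrelated primes minus mean RT after related primes, per
    presentation direction. Positive values mean related primes sped up responses.
    Each trial is (direction, prime_type, rt_ms), prime_type 'related' or 'unrelated'.
    Every direction needs at least one trial of each prime type."""
    rts = defaultdict(list)
    for direction, prime_type, rt in trials:
        rts[(direction, prime_type)].append(rt)
    directions = sorted({direction for direction, _ in rts})
    return {d: mean(rts[(d, "unrelated")]) - mean(rts[(d, "related")]) for d in directions}

# Invented response times in milliseconds, for illustration only.
trials = [
    ("L1->L2", "related", 610), ("L1->L2", "related", 630),
    ("L1->L2", "unrelated", 680), ("L1->L2", "unrelated", 700),
    ("L2->L1", "related", 655), ("L2->L1", "unrelated", 665),
]
print(priming_effects(trials))
```

With these invented numbers the effect is 70 ms for L1->L2 and 10 ms for L2->L1. A real analysis would test such differences statistically, for example with the regression model the abstract mentions, rather than comparing raw means.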
Last but not least, congruent and more frequent collocations (those with a higher P1-2) yielded more significant cross-linguistic priming effects. The regression analysis revealed that the direction of presentation and P1-2 are strong predictors of the participants' mean response times in the cross-linguistic collocational priming experiment. The results were discussed in the light of the lexical processing models mentioned above.

Keywords: Collocational Priming, Mental Lexicon, Bilingual, Corpora and Cross-linguistic

Knowledge extraction for TKB phraseology module design
Pilar León-Araúz, Arianne Reimerink (University of Granada (UGR), Buensuceso 11, 18001, Spain)

Some authors define phraseological units as all word combinations with a certain stability (Hausmann 1984, 1985, 1989; Gläser 1994/95), even in specialized discourse (Roberts 1994/95; Heid 1994, 2001; Montero 2003, 2008). According to Rundell (2010: vii), collocations are as important as grammar because they make speakers and writers sound fluent. In specialized domains, language users perceive them as contributing to the domain-specific flavor of special languages (Bartsch 2004). Along these lines, recent studies have highlighted the importance of verbs, their collocations and their argument structure in specialized terminology (Lorente 2007; Buendía 2012, 2013; Buendía, Montero and Faber 2014), but few terminographic resources currently incorporate them (L'Homme 1998; Buendía 2012). If terminological knowledge bases (TKBs) are to be truly helpful for specialized writing, phraseological information should be added in a consistent and user-friendly way. In EcoLexicon, a TKB on the environment (ecolexicon.ugr.es), phraseology was first included at the term level, linking verbs with arguments already contained in EcoLexicon (Buendía 2013). However, certain verbs, or at least some of the paradigms in which they can be framed, can also be regarded as semantic relations.
In EcoLexicon, knowledge extraction and representation are based on triplets, or conceptual propositions (concept-relation-concept combinations; Faber, León and Reimerink 2014). Nevertheless, the expressiveness of some relations should be improved. For instance, the relations affects, has function and cause could be divided into more specific relations. A conceptual proposition such as erosion affects landform would be more meaningful if the relation were reduces instead of affects. The TKB should, however, also contain other verbs that lexicalize and specify the core meaning of reduction (e.g. carve, degrade, erode), as well as other terms that can fill the argument slots (e.g. weathering, cliff). For a phraseological module to be consistent with the conceptual module in EcoLexicon, it should be based on the same principles. Our module is therefore designed from a categorization of term-verb-term collocates that reflect the different lexicalizations of conceptual propositions. In this way, semantic relations can be further specified according to specialized predicates. In turn, phraseological templates can be generalized on the basis of the semantic types related in conceptual networks. These semantic types, however, need to be extracted consistently. Both top-down and bottom-up methods are applied to extract the information needed to build the module. The top-down method consists of establishing basic semantic categories in the environmental domain (e.g. landform, structure, instrument) on the basis of the definitions and conceptual networks in EcoLexicon. This will result in a domain-specific ontology similar to the CPA semantic types used in the Pattern Dictionary of English Verbs (PDEV; Hanks 2008). The validity of this categorization is tested by comparing it with the results of automatic clustering (Brown et al. 1992) of a 50-million-word corpus on the environment.
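The idea of turning verbs into more specific relations can be illustrated with a small data-structure sketch: term-verb-term collocates are mapped to concept-relation-concept triplets, and verbs are grouped into paradigms by the relation they lexicalize, together with the terms seen in each argument slot. This is our own minimal illustration, not EcoLexicon's data model or API. The verb-to-relation table, the generic fallback relation and the collocates are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Proposition(NamedTuple):
    subject: str
    relation: str
    obj: str

# Hypothetical verb-to-relation table; EcoLexicon's real relation inventory may differ.
VERB_TO_RELATION = {"erode": "reduces", "carve": "reduces", "degrade": "reduces"}
GENERIC_RELATION = "affects"  # fallback when no more specific relation is known

def to_propositions(collocates):
    """Map (term, verb, term) collocates to concept-relation-concept triplets,
    using the verb's more specific relation when the table contains it."""
    return [Proposition(s, VERB_TO_RELATION.get(v, GENERIC_RELATION), o)
            for s, v, o in collocates]

def verb_paradigms(collocates):
    """Group verbs by the relation they lexicalize, with the terms seen in each slot."""
    paradigms = defaultdict(lambda: {"verbs": set(), "subjects": set(), "objects": set()})
    for s, v, o in collocates:
        slot = paradigms[VERB_TO_RELATION.get(v, GENERIC_RELATION)]
        slot["verbs"].add(v)
        slot["subjects"].add(s)
        slot["objects"].add(o)
    return dict(paradigms)

# Invented, lemmatized term-verb-term collocates for illustration.
collocates = [
    ("weathering", "erode", "cliff"),
    ("erosion", "degrade", "landform"),
    ("river", "carve", "valley"),
    ("storm", "damage", "structure"),
]
print(to_propositions(collocates)[1])
print(sorted(verb_paradigms(collocates)["reduces"]["verbs"]))
```

In the approach described, the table would not be hand-written like this but would come from classifying the extracted verbs; the subject and object sets are where the semantic types of the arguments (e.g. landform) could then be generalized.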
The bottom-up method consists of extracting all verbs from the corpus with TermoStat (Drouin 2003) and classifying them into paradigms according to the concepts they relate and the basic conceptual relations they express. These paradigms will draw on the patterns and implicatures of the PDEV and on the lexical domains described in Faber and Mairal (1999). The analysis of verbs and their arguments will help to refine our semantic relations and categories and to populate the phraseological module.

Keywords: phraseology, specialized discourse, TKB, categorization

Contrastive analysis of past-time reference in French and Chinese: a study based on a corpus of narratives
Xingzi Zhang (Laboratoire, Université Paris III - Sorbonne Nouvelle, France)

Contrastive linguistics is regarded as a branch of applied linguistics that compares micro-systems of two (or possibly more) languages in order to facilitate their teaching and learning. It is a classic branch of linguistics whose origins go back to the 1950s in the United States. Two works deserve mention: Uriel Weinreich's (1953) study of languages in contact and Robert Lado's (1957), which is considered the founding work of the discipline. We adopt this method, drawing on our corpora, to compare how past time and aspect are referred to in the two languages and to study the temporal organization of narratives. In French, verbal morphology expresses both tense and aspect. Among the past tenses, the narrative present, passé composé, imparfait, plus-que-parfait and passé simple are frequently used. Chinese is a Sino-Tibetan language very distant from French.
It lacks the verbal morphology of the Indo-European languages and is considered an aspectual language, which uses aspectual particles (-le, -zhe, etc.) or structures (RVCs, verb reduplication, etc.) to express temporality.

Corpus: We compare written narratives based on a silent film produced by two groups: native speakers of French (GF, n=8) and native speakers of Chinese (GC, n=8). To make sure they narrated in the past, we told them that the situation shown in the clip had taken place a week earlier, and they had to describe in detail what they had seen.

Results: Comparing the narratives written by the Chinese and the French participants, we observe several differences in how the past is marked in the two languages:
- In French, native speakers systematically use verbal morphology to mark tense in a narrative, whereas in Chinese past time is indicated explicitly by temporal adverbs. For aspect, native Chinese speakers use aspect morphemes such as -le, -zhe and zai-. Moreover, these morphemes are optional and many clauses have none; in particular, most clauses expressing imperfective aspect carry no explicit marking. We also note that in Chinese the process type is less flexible than in French and can itself indicate aspect.
- To mark anteriority in the narrative, native French speakers use the plus-que-parfait, the semi-auxiliary venir de, the past participle, or the passé composé, which here is in fact an erroneous substitute for the plus-que-parfait.
Native Chinese speakers, whose language lacks verbal morphology, mark anteriority with lexical means such as ganggang / gangcai ("just now"), or with the morpheme -le or the shi...de structure ("it is ... that"), which mark perfective aspect in indirect style to refer to a situation that took place earlier. Some clauses are also unmarked; in that case, contextual information makes it possible to identify anteriority.
- Native French speakers tend to narrate sequentially, whereas native Chinese speakers narrate in a more detailed way, interweaving actions, descriptions of characters and explanations of situations.

Keywords: contrastive analysis, verbal morphology, the past, perfective aspect, imperfective aspect, anteriority

The acquisition of verbs of change: an analysis of the interlanguage of learners of Spanish (L1 Swedish)
Ester Fernández (University of Gothenburg (GU), Sweden)

This study addresses the acquisition of verbs of change by Swedish-speaking learners of Spanish as a foreign language (ELE). Spanish has a considerable number of verbs that express the notion of change (ponerse, volverse, hacerse, convertirse en, etc.). They differ semantically (Morimoto and Pavón Lucero, 2007), since each of them, together with its complement, expresses a different way in which the change takes place (change of entity, processual change and resultative processual change). Swedish has the verb bli, a general verb that can express practically any type of change. How are these verbs acquired when they do not exist, or have no exact equivalent, in the learners' L1? Which linguistic forms do Swedish-speaking learners use to describe change events in Spanish?
The aim of this paper is to present the results of a pilot study carried out over one academic semester with a group of Swedish-speaking learners (N=20) at different proficiency levels (from A2 to B2). The participants were taking the first-year Spanish course (Grundkurs) at two Swedish universities. We used a written task (narrating a story from a series of pictures) to elicit language samples referring to change. The task was administered twice, at the beginning and at the end of the course, and once to a group of native Spanish speakers (N=24). Obligatory contexts proved difficult to identify, because the native speakers tended to vary their choice of verbs when describing change events. This led us to study the learners' choice of forms from a variationist perspective. This approach originates in sociolinguistics (Labov 1972) but has proved useful in the study of second language acquisition (Tarone 1979, 1983, 2007; Ellis 1985, 1999; Gesslin 2010; Gudmestad 2006, 2012). The same factors (linguistic and extralinguistic) that determine variation in native speech are responsible for the variation found in learner production. We applied a meaning-to-form analysis (Bardovi-Harlig 2007, 2014). First, we identified the contexts in which learners had expressed change and selected all the verbal and lexical forms used, coding them for a set of linguistic variables (type of change described, type of complement the form combines with, etc.). We then compared their use across the learners' proficiency levels, across the two task administrations, and against the native speakers.
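The abstract does not say how the coded contexts were tallied. As a rough sketch only, assuming each coded change context is stored as a (group, time, change type, form) tuple (a hypothetical record layout, not the study's own), the share of each form in every cell can be computed with the Python standard library:

```python
from collections import Counter, defaultdict


def form_proportions(coded_contexts):
    """Proportion of each verb form within every (group, time, change_type) cell.

    coded_contexts: iterable of (group, time, change_type, form) tuples.
    This layout is a hypothetical coding scheme used for illustration.
    """
    cells = defaultdict(Counter)
    for group, time, change_type, form in coded_contexts:
        cells[(group, time, change_type)][form] += 1
    return {
        cell: {form: n / sum(counts.values()) for form, n in counts.items()}
        for cell, counts in cells.items()
    }


# Invented toy data (not taken from the study).
data = [
    ("learners", "T1", "processual", "ser"),
    ("learners", "T1", "processual", "estar"),
    ("learners", "T1", "processual", "ser"),
    ("learners", "T1", "processual", "ponerse"),
    ("learners", "T2", "processual", "ponerse"),
    ("learners", "T2", "processual", "ponerse"),
    ("natives", "-", "entity", "convertirse en"),
]

for cell, shares in form_proportions(data).items():
    print(cell, shares)
```

Comparing the T1 and T2 cells of such a table is one simple way to see the kind of shift from ser/estar towards verbs of change that the results describe.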
The results show that learners use a variety of forms to express particular types of change (change of entity, processual change and resultative processual change). The pseudo-longitudinal design of the study reveals trends in the development of the grammatical subsystem of verbs of change in the learners' interlanguage. At the beginning of the semester, for example, we observe overuse of verbs such as ser and estar, which lack the dynamic aspect characteristic of verbs of change. By the end of the semester, these verbs are being replaced, to a greater or lesser degree, by verbs of change more typical of the target language.
Keywords: verbs of change, notion of change, Spanish as a Foreign Language, interlanguage, variation

Detecting and tagging metadiscourse strategies in academic articles: METOOL
María Luisa Carrió-Pastor (speaker), Universitat Politècnica de València (UPV), Spain
This presentation deals with the identification, tagging and comparison of the metadiscourse strategies used in Spanish and English in the register of scientific texts, and with the analysis of how these strategies vary between the two languages. The research is part of the project "Identificación y análisis de las estrategias metadiscursivas en artículos científicos en español e inglés" (IAMET; Identification and analysis of metadiscourse strategies in scientific articles in Spanish and English). Within the scientific register, we selected three quite different disciplines, engineering, medicine and linguistics, to determine how the use of metadiscourse strategies varies. To do so, we draw on the metadiscourse categories identified by Hyland (1998, 2005), Mur Dueñas (2011) and Briz (2001, 2007) to identify their component elements and establish their frequencies, with the aim of carrying out contrastive studies across disciplines and between Spanish and English.
Our starting hypothesis is that metadiscourse strategies are used differently in English and Spanish, which may affect the effectiveness of communication when these are used as foreign languages. The objectives are, on the one hand, to analyse metadiscourse strategies in English and Spanish across several disciplines of the scientific register and, on the other, to detect the variation that appears across these languages and disciplines. The purpose is therefore twofold: first, to characterise scientific discourse and the rhetorical strategies it uses to convince the reader; second, to identify patterns of variation in the strategies analysed so that they can be used in teaching Spanish and English. This is done with METOOL, a tool designed at the Research Institute for Information and Language Processing (University of Wolverhampton) to tag and identify the rhetorical elements of discourse. The nuances that writers give to a language in order to persuade the reader are of interest both to academic writers and to teachers of academic language. Achieving our objectives, that is, identifying and analysing variation in the use of rhetorical strategies in scientific articles, therefore benefits both researchers and writers in this genre, who will learn whether they use rhetorical elements appropriately and whether they achieve their goal of convincing the reader of the importance of their research. Corpus analysis combined with statistical measurement of how well a text engages the reader and persuades them of its arguments makes it possible to measure the use of persuasion strategies and to propose alternatives.
To carry out this project, we will first compile the English and Spanish corpora for the three disciplines; second, identify and tag the metadiscourse categories; and third, classify and analyse the metadiscourse strategies in both languages and the three disciplines to determine variation, giving examples of each case to show its nature. Although metadiscourse strategies have been studied from various angles, there is currently no work that addresses variation in their use and that classifies and contextualises the elements to be included in each category.
Keywords: metadiscourse, comparative analysis, analyser, academic articles

The economy on the verge of a nervous breakdown: medical metaphors in economic news discourse
Ismael Ramos Ruiz (speaker), University of Granada, Spain
Metaphor was studied as a literary device until the advent of cognitive linguistics, when it also came to be regarded as a cognitive device that forms part of our conceptual system. Metaphor is therefore present both in general language and in specialised language, as in economics (Resche and Colin, 2016; Wang, Runtsova and Chen, 2013). Accordingly, the use of metaphor in economic news discourse has been studied (e.g. Nerghes et al., 2015), including medical metaphor specifically (e.g. Arrese, 2015). Our hypothesis is that if the economy is understood as a living organism, many of the illnesses that humans suffer from will be used in metaphorical mappings, including mental and behavioural disorders.
Our objectives are therefore:
• to identify and analyse which medical terms related to mental and behavioural disorders appear in this discourse, and which relations hold between these terms and other terms in the text;
• to establish syntactic and semantic classification criteria for categorising these metaphorical lexical combinations.
First, we set up a theoretical framework based on Conceptual Metaphor Theory (Lakoff and Johnson, 1980, 1999), which helped us understand the structure of the metaphors and analyse them, and on Frame-based Terminology (Faber et al., 2012), which we used to establish the syntactic and semantic criteria for categorising them. Second, we built a specialised corpus of economic news texts from the Spanish press, drawn both from dedicated business newspapers (e.g. El Economista) and from the business sections of the national dailies El País and El Mundo. To select texts containing metaphors, we used an adaptation of the Metaphor Identification Procedure proposed by the Pragglejaz Group (2007). Third, after analysing the corpus and extracting the concordance lines containing metaphors, we established syntactic criteria (adapting the proposal of Corpas Pastor, 1996) and semantic criteria (based on a prototypical conceptual event in which semantic categories are defined) for classifying these metaphorical lexical combinations. Besides defining semantic categories, the conceptual events show the semantic relations between them, such as "causes" or "affects", and the mapping of the medical domain onto the economic domain in terms of Conceptual Metaphor Theory.
Below are some examples of metaphors taken from the press, with their syntactic and semantic categorisation:
• Estamos ante un nuevo brote psicótico de los mercados ("We are facing a new psychotic episode in the markets"; El Mundo 2012): Noun + Adjective + Preposition + Noun (SAPS). PROCESS
• El problema radica en la incapacidad y pánico de nuestra economía ("The problem lies in the incapacity and panic of our economy"; Cinco Días 2009): Noun + Preposition + Noun (SPS). SIGNS AND SYMPTOMS
• El estrés de los bancos griegos e italianos ("The stress of Greek and Italian banks"; Expansión 2014): Noun + Adjective (SA). PATIENT
Keywords: conceptual metaphor, corpus linguistics, phraseology, economic journalism, conceptual events

The discursive framing of numerical data in popular science texts
Riham El Khamissy (speaker), Department of French, Faculty of Languages (Al-Alsun), Ain Shams University, Khalifa Maamoun St., Abbaseya, Cairo, Egypt
Numerical data have the advantage of producing in the recipient an effect of incontestability and irrefutability. In the media, journalists may report a statistic in such a way that it becomes the central element of the article (news framed around a figure). In that case, the explanation of the figures is secondary information. More often, however, statistics and percentages serve to support the text itself, to back up statements and to lend legitimacy to information and ideas. Our aim is to understand how journalists handle numerical information in popular science articles (in popular science media and in the general press), particularly those dealing with the Zika virus, which was widely debated over the past two years.
We chose popular science texts rather than scientific texts because one of the main aims of our work is to highlight the intention to steer the recipient towards a given attitude, or sometimes even to manipulate them, which in our view is more visible in popular science texts aimed at a general, usually non-specialist public. We started from a corpus of 13,090 French-language documents indexed in the Europresse.com database between 1 January 2015 and 31 December 2016, a period during which the virus spread remarkably across the globe. We will first explore the formal data, examining the choice between the typical, standard form of a number (in digits) and its written-out form (in words). Next, we will analyse figures in their immediate linguistic environment (the co-text), which can alter the information a figure conveys in terms of exactness, precision and/or argumentative orientation, depending on the journalist's communicative motivation. We will then analyse quantifiers such as jusqu'à ("up to"), près de ("nearly"), aux environs de, autour de and aux alentours de ("around"). Our contribution follows the work of Adler and Asnès (2004, 2007, 2013) and of Ducrot (1983, 1995, 2002), further developed by Doury and Moirand (2004). The question we address here is not the use of figures as such but their discursive framing and their subordination to journalists' aims of influencing public opinion. Result: according to our analyses, the gap between the factual or informative level on the one hand and the argumentative level on the other is often merely the reflection of a shift from official numerical results, which bear witness to the truth, to subjective ersatz versions of reality.
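The first two steps above (digits vs. words, and quantifiers in the immediate co-text) can be approximated with regular expressions. The following Python sketch is only an illustration under stated assumptions: the number-word list is a small hypothetical sample, the approximator list is taken from the abstract, and nothing here reproduces the author's actual procedure or the Europresse data.

```python
import re
from collections import Counter

# Approximators named in the abstract; longer forms first so alternation prefers them.
APPROXIMATORS = ("aux alentours de", "aux environs de", "autour de", "jusqu'à", "près de")
# Hypothetical, deliberately small list of spelled-out French numbers.
NUMBER_WORDS = ("deux", "trois", "quatre", "cinq", "dix", "cent", "mille", "million", "millions")

# Digits with optional thousands groups (space, point or comma) and an optional
# decimal part, e.g. "2015", "1 500", "12,5".
DIGITS = r"\d+(?:[ ,.]\d{3}(?!\d))*(?:,\d+)?"
WORDS = r"(?:" + "|".join(NUMBER_WORDS) + r")\b"

NUMBER_RE = re.compile(rf"(?<!\w)(?:(?P<digits>{DIGITS})|(?P<words>{WORDS}))", re.IGNORECASE)
# An approximator counts only when a number follows it directly (its co-text).
APPROX_RE = re.compile(
    r"(?<!\w)(?P<approx>" + "|".join(re.escape(a) for a in APPROXIMATORS) + r")\s+"
    + rf"(?={DIGITS}|{WORDS})",
    re.IGNORECASE,
)


def profile(text):
    """Count number forms (digits vs. words) and approximators placed before a number."""
    text = text.replace("\u2019", "'")  # normalise typographic apostrophes (jusqu’à)
    forms = Counter("digits" if m.group("digits") else "words" for m in NUMBER_RE.finditer(text))
    approximators = Counter(m.group("approx").lower() for m in APPROX_RE.finditer(text))
    return forms, approximators


forms, approx = profile(
    "Près de 1 500 cas ont été signalés, jusqu’à deux fois plus qu'en 2015, autour de 12,5 %."
)
print(forms)  # Counter({'digits': 3, 'words': 1})
print(approx)
```

A pattern-based pass like this only finds candidates; deciding whether a quantifier shifts the argumentative orientation of a figure remains a qualitative step.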
Keywords: figures, quantifiers, argumentative operators, popular science texts, press

Modality in political discourse: phraseological segments in language and in discourse. A textometric exploration of a corpus of US presidential debates (1960-2016)
Marion Bendinelli (speaker), Edition, Littératures, Langages, Informatique, Arts, Didactique, Discours (ELLIADD), Université de Franche-Comté, 30 rue Mégevand, 25030 Besançon cedex, France
Our paper deals with the identification, and then the enunciative and discursive analysis, of phraseological segments containing one or more verbal modality markers (notably can, must, will, need to, have to). The work is based on a tool-assisted exploration of a corpus of English-language political discourse, encoded in XML-TEI, comprising all the presidential debates held in the United States since 1960. The exploration is carried out with the textual data analysis software TXM (Heiden, Magué, Pincemin 2010) and Hyperbase (Brunet 2010), making particular use of the modules for consulting and/or computing concordances, repeated segments and co-occurrents. It will bring out preferred associations (i) among different modality markers, or (ii) between modality markers, subject noun phrases (nominal groups or pronouns) and verbs or, more broadly, semantic verb classes (verbs of communication, existence, activity, etc., following the classification of Biber, Johansson, Leech, Conrad and Finegan 1999). Some of these associations have been noted in work describing discourse genres (Dedaić 2004; Née, Sitri, Veniard 2014), specialised texts (Gotti and Dossena 2001; Labbé and Labbé 2013) or English grammar (Biber et al.
1999); here, established on the basis of statistically significant co-frequency within the corpus, they will be analysed as phraseological segments, namely collocations (Firth 1957) and colligations (Hoey 2005), both of English as spoken in the United States and of political discourse. In the first part of the study, we will show, through various manipulations in TXM and Hyperbase, how the textometric approach brings to light phraseological segments of the type we must + non-aspectual action verb ("we must act") or mental verb + NP + can ("I believe that we can work together") for the modals must and can. Computing co-occurrents will reveal phraseological segments that are discontinuous (the items need not be adjacent) and ordered (the order in which the items appear is constrained), of the type can + must and/or have to + will ("we can fight terrorism [...], it has to be [...], therefore we must fight terrorism and we will"). By comparing these segments with data from the reference corpus COCA (Corpus of Contemporary American English), compiled by Mark Davies and freely searchable online, we will show that some are specific to political discourse, while others, more transversal because they occur across different discourse genres, appear to be more firmly established in the language system. In addition, some theoretical notions from the enunciative analysis developed by Antoine Culioli and taken up by Gilbert (2001) and Deschamps (2001), namely the construction and scanning (parcours) of notional alterity, will shed light on the enunciative functioning of modal sequences and on their rhetorical function.
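To make "discontinuous and ordered" concrete, the Python sketch below counts ordered pairs of modals that occur within a fixed token window. It is a plain windowed count under stated assumptions (the window size, the regex tokenizer and the merging of "has/have to" and "need to" into single tokens are all hypothetical choices), not the statistical co-occurrence score that TXM or Hyperbase compute, and it does not filter noun uses such as "the will of the people".

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")
# Hypothetical merging of two-word modals into single tokens.
MULTIWORD = {("have", "to"): "have_to", ("has", "to"): "have_to", ("need", "to"): "need_to"}
MODALS = {"can", "must", "will", "have_to", "need_to"}


def tokenize(text):
    """Lowercase word tokens, with two-word modals merged into one token."""
    words = TOKEN_RE.findall(text.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(MULTIWORD[pair])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def ordered_modal_pairs(text, window=20):
    """Count (first, second) modal pairs where `second` follows `first`
    within `window` tokens; items need not be adjacent, but order matters."""
    tokens = tokenize(text)
    positions = [(i, t) for i, t in enumerate(tokens) if t in MODALS]
    pairs = Counter()
    for a, (i, first) in enumerate(positions):
        for j, second in positions[a + 1:]:
            if j - i > window:
                break  # positions are sorted, so later modals are even farther away
            pairs[(first, second)] += 1
    return pairs


example = ("we can fight terrorism [...], it has to be [...], "
           "therefore we must fight terrorism and we will")
print(ordered_modal_pairs(example, window=5))
```

Running the same count over a reference corpus such as COCA and comparing rates would be one rough way to separate genre-specific sequences from more general ones.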
Along the way, this paper will combine computer-assisted corpus exploration, statistical analysis of textual data, and enunciative and discursive analysis; it thus aims to contribute to a better understanding of the linguistic and discursive features of political discourse.
Keywords: phraseological segments, collocation, colligation, political discourse, presidential debates, United States

Translating English "megaterms" such as erythrocyte invasion-inhibitory response: an approach based on corpora and discourse analysis
Mojca Pecman (speaker; corresponding author), Natalie Kübler, Alexandra Mestivier (speaker); Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP), Université Paris VII - Paris Diderot (EA3967), Bât. Olympe de Gouges, case postale 7046, 75205 Paris cedex 13, France
Corpus linguistics has enabled linguists not only to base their observations on authentic data but also to study how language evolves and what its current trends are. In specialised translation, both in professional settings and in training that prepares future translators for those settings, the ability to grasp the current dynamics of specialised languages is becoming a key issue for translation quality. Combined with the wide dissemination of specialised information and the speed at which knowledge evolves, this dynamic, which is plainly visible in linguists' corpora, appears to be growing. This study aims to show how combining corpus analysis with discourse analysis makes it possible to capture the dynamics of specialised discourse and to find translation solutions. We illustrate our argument with the translation problems posed by complex noun phrases in specialised English, such as erythrocyte invasion-inhibitory response.
Complex noun phrases make it possible to compact or condense information, a salient feature of specialised English discourse. Mestivier-Volanschi's (2015) diachronic study of the evolution of English compound adjectives provides evidence that these structures are increasing in frequency. Gledhill (1999) and Jaime-Sisó (1993) study how the titles of specialised texts have shifted from a nominal format to a sentence-like format in which complex compounds express an argument structure economically, through a mechanism they call "miniaturisation". Maniez's work (2007, 2008) on medical English and complex noun phrases also discusses the propensity of English for nominalisation and the help that a database of equivalents for complex NPs would offer translators. Indeed, the great flexibility of English in forming noun groups and noun phrases contrasts with French, which tends to preserve argumentation in clausal form. We will first present the general framework of this research, which is part of the specialised translation teaching methodology used with Master's students at Université Paris Diderot. This methodology is based on terminological analysis (Pecman and Kübler 2011) and is evaluated through quantitative and qualitative analyses of corpora of annotated translations (Kübler et al. 2016). These analyses allow the teaching methodology to be improved incrementally from year to year. We will show how this methodology combines teaching practice with research in specialised translation, placing our study in the line of work on corpus-based translation teaching (Aston 1999, Zanettin et al. 2004, Beeby et al.
2009, Castagnoli et al. 2011) and on evaluating the contribution of corpora in the classroom (Bowker & Bennison 2003, Frankenberg-Garcia 2009, Loock et al. 2013, Loock 2016). We will also illustrate the diachronic evolution of adjectival compounds uncovered by Mestivier-Volanschi (2015) to show why the tendency of specialised English to use complex NPs must be taken into account. Second, we will present our analysis of the English noun phrase erythrocyte invasion-inhibitory response and try to show the procedures used to convey this type of information in French (cf. les réponses immunes protectrices... médiées par des anticorps... inhibent l'invasion des érythrocytes, "protective immune responses... mediated by antibodies... inhibit erythrocyte invasion").
Keywords: specialised translation, translation teaching, corpus-based approach, discourse analysis, complex nominal groups

Advertising translation: a corpus-based approach
Isabel Comitre Narvaez (speaker), University of Málaga (UMA), Campus de Teatinos s/n, 29071 Málaga, Spain
If we look closely at advertisements for certain products, we notice the massive presence of technical, even pseudo-scientific, vocabulary (Remaury, 2000). Major brands use this pseudo-scientific vocabulary as a key persuasive argument to gain credibility. Indeed, medical rigour and scientific authority act as a guarantee for the prospective buyer (Valdés Rodriguez, 2004). This is the case with the lexicon of so-called cosmeceutical products (cosmetic + pharmaceutical), which reflects at once the evolution of medico-aesthetic society, technological progress in the field and scientific innovation in this sector. Within the European Union, translation raises issues that go beyond mere lexical equivalence, since it also involves each country's legislation.
Translation is nevertheless at the heart of our study, whose main objective is to identify the principal translation strategies that translators use in advertising. To this end, we analysed a corpus of bilingual advertisements compiled according to the criteria proposed by Guidère (2009, 2011). Our bilingual comparable corpus contains about 750 terms in French and their Spanish equivalents. This ad hoc corpus was drawn from the official websites of major cosmeceutical brands. We identified this lexicon by recording, on the official websites, the various processes that give products their pseudo-scientific air (prefixal and suffixal derivation, borrowing, compounding, abbreviation, acronyms, alphabetic or numeric initialisms, confixation, blends, use of capitals, etc.). After this first pass, we compared the vocabulary identified with that of the same websites in Spanish in order to bring out the translation strategies used. However, in a form of communication such as advertising, where the visual coexists with the verbal, we naturally took into account the images in the advertisements, since they contribute to the overall meaning of the ad and may even carry that meaning on their own. This is why we chose semio-translatology (Guidère, 2000, 2009, 2011; Guillaume, 2016) as our theoretical and methodological framework: this translatological paradigm recognises the importance of non-verbal signs (images, characters, setting, emotions, sensations) in the transfer of meaning in translation.
In particular, we adapted the concept of the "translatological cube" (Guidère, 2011, p. 112) to our object of study. This analytical model served as our starting point and allowed us to define three levels of analysis specific to advertising: conceptions (the general ideas of the advertisement conveyed by the linguistic message), perceptions (the sensory information conveyed by the iconic and sound messages) and, finally, intentions (implicit cultural and ideological discourse content). The resulting model allows us, on the one hand, to identify and classify the pseudo-scientific lexicon characteristic of cosmeceuticals and carried by the verbal message and, on the other, to grasp the meaning conveyed by the image and all the sensory information carried by the non-verbal messages in the advertisements in our corpus, in order to uncover the translation strategies underlying the choices made by translators of advertising campaigns.
Keywords: advertising translation, bilingual comparable corpus

The lexis-grammar continuum in a specialised genre, based on in-house corpora
Laurent Gautier (speaker; 1, 2), Cyril Nguyen Van (speaker; 2). (1) Maison des Sciences de l'Homme de Dijon USR3516 (MSH Dijon), Université Bourgogne Franche Comté, Esplanade Erasme, 21000 Dijon, France; (2) Centre Interlangues Texte Image Langage (TIL), Université Bourgogne Franche Comté, Université de Bourgogne, Faculté de Langues et Communication, 2 Bd Gabriel, 21000 Dijon, France
[Issue and objectives] This proposal, which falls under strand 5 of the call ("Corpus, contrastive studies and translation"), examines the contribution of in-house specialised corpora (Loock 2016a, b) to uncovering, for professional translation and translator training, the lexico-grammatical patterns inherent in highly constrained textual moulds (Gautier 2009) in translated language(s).
Following Kübler/Gledhill (2016: 75), we will discuss in particular the idea that systematically querying homogeneous corpora yields a verified, holistic representation of the interactions between lexis and grammar, especially when each of the two components operates through repertoires that are (very) restricted compared with the possibilities offered by the linguistic system in question. For translators, these patterns may indeed constitute a "subtext" from which translation choices are made "naturally", at the interface between the conceptual content of the source text and its wording and textualisation.
Data
This issue will be instantiated by a closed, manually compiled corpus made up of the European Central Bank's 2015 and 2016 press conferences in the original English (19,883 words) and in their French (23,931 words), German (19,810 words) and Dutch (21,324 words) translations. Although it is at first sight a parallel corpus (Teubert 1996), each subcorpus will be considered in its own right as a corpus of translated language, with comparison to the original playing only a peripheral role.
Methodology
We will start from the frequency of noun (N) terms and systematically query their combinatorics, particularly with verbs, in order to draw up a systematic inventory, for each language, of the argument structures in which they occur. In doing so, we will highlight the formulaic dimension, which is essential for the fluency of the translator's text, particularly for languages, above all German and Dutch, that make use of predicative nouns combined with preferred, unpredictable support verbs:
(01) Insbesondere müssen die entschlossene Umsetzung[NPRED] von [Güter- und Arbeitsmarktreformen][ARG] sowie die Bemühungen[NPRED] [zur Verbesserung des Geschäftsumfelds für Unternehmen][ARG] in einigen Ländern intensiviert[VSUP] werden. ("In particular, the resolute implementation of product and labour market reforms and the efforts to improve the business environment for firms must be intensified in some countries.")
(02) Ten tweede was, hoewel de tussen juni en september vorig jaar genomen monetairbeleidsmaatregelen tot een aanzienlijke verbetering[NPRED] [in termen van de koersen op de financiële markten][ARG] hebben geleid[VSUP], dit niet het geval voor de kwantitatieve uitkomsten. ("Second, although the monetary policy measures taken between June and September last year led to a considerable improvement in terms of financial market prices, this was not the case for the quantitative outcomes.")
We will then turn, on the basis of an n-gram analysis, to recurrent structures, analysed here as discourse routines whose use, beyond terminology and conceptual collocations, marks the text as belonging to the genre, as in (03):
(03) D: nach wie vor, mit Blick auf; F: au cours des prochains mois, (x) des prix à moyen terme; NL: (van) de additionele aankopen van, op de middellange termijn
Discussion
The results will be discussed, on the one hand, in relation to the use of corpora, and in-house corpora in particular, in translator training, beyond their "hidden" presence in many CAT tools, starting with translation memories; and, on the other, in relation to the often strict separation between a grammar module, a terminology module and a "stylistics" module, a separation that breaks down for (very) constrained types of specialised text as soon as one starts from attested language use in corpora.
Keywords: in-house corpus, genre, lexis, grammar, discourse routine, terminology, LSP

The discourse marker "donc" in two dialogue corpora of different kinds
Gemma Delgar Farrés (speaker), University of Vic - Central University of Catalonia (UVic-UCC), C. de la Laura, 13, 08500 Vic (Barcelona), Spain
Our study analyses the discourse marker donc in a corpus of real conversation, the Minnesota Corpus (Kerr, 1983), and in a corpus of theatrical dialogue, Beaumarchais's play Le Mariage de Figaro. As a starting point, we formulate the following research questions: Are the uses of donc found in the two corpora the same?
How are these uses distributed in the natural conversation corpus and in the theatrical dialogue corpus? Previous linguistic studies of donc indicate that this discourse marker has three main uses: argumentative or logical marker, resumption marker and interactive marker (Trésor de la langue française, 1971-1994; Zenone, 1981; Hybertie, 1996; Hansen, 1997; Pellet, 2005; Bolly and Degand, 2009; Delgar, 2010, 2013). Reviewing these approaches leads us naturally to Pellet's description of donc: "In other words, the inferential aspect of donc may be viewed as a characteristic which is present to varying degrees depending on the function that the discourse marker fulfills in a particular context. The highest degree of "inferentiality" is of course associated with the use of donc to mark results and conclusions (argumentative). It is also high with donc to mark recapitulations, confirmation requests, and resumptions. It seems "less high" with the frameshift function (foregrounding) and with the discursive (emphasis) function." (2005: 103). First, we studied the occurrences of donc in the first two sections of the Minnesota Corpus; second, we compared the results with those we had previously obtained for Le Mariage de Figaro. These data show that the uses and the semantic-pragmatic values of donc are almost the same in both corpora, although some values are absent from one of them, either because they are more restricted uses of the marker in dialogue or because they are more characteristic of either authentic conversation or theatrical dialogue.
By contrast, the distribution of these uses differs between the two corpora: in the authentic-conversation corpus it reflects the workings of real communication, whereas in the theatre corpus it depends on the dialogue functioning as a writing project predetermined by the author.

Keywords: semantic and pragmatic values, discourse marker, conversation, theatre, corpus

Learner vs. professional translational behavior: The case of discourse markers

Maria Kunilovskaya (1), Natalia Morgoun (2)
(1) Tyumen State University (UTMN), 625003, Volodarskogo 6, Tyumen, Russia
(2) Lomonosov Moscow State University (MSU), 119991, Moscow, GSP-1, 1 Leninskiye Gory, Russia

The main motivation behind this research is to understand the linguistic behavior of translation students translating into their mother tongue. Which linguistic features, if any, distinguish their translations from professional ones, and can these features be measured and targeted in training programmes? A further concern is to describe the current professional norm against a non-translated reference for a given translation direction and language pair. The investigation is limited to mass-media texts and explores the frequencies of connectives in English originals and in Russian translations and non-translations as one possible indicator of these differences. The degree of explicit textual connectedness has been on the research agenda of computational and corpus linguistics for many years. It is an important textual feature that reflects the particularities of text production under different socio-pragmatic conditions.
It has been found that genres and entire languages vary not only in the inventory of means used to signal relations between parts of a text, but also in the intensity of their use (Liu, 2008; Fabricius-Hansen, 2005). Cross-linguistic differences in textual strategies affect translations and contribute to the source-language-independent translationese hypothesized by Baker (1993). This feature has been used effectively to detect differences between parallel corpora that general similarity measures do not reveal (Cartoni, 2011). Discourse-marker frequencies are used to establish differences between translations and non-translations and are interpreted as linguistic indicators of several translation tendencies, such as explicitation, simplification and convergence (Olohan, 2001; Chen, 2006; Denturk, 2012). Crucially for this research, the intensity of 'being a translation' can be related to translation quality (Scarpa, 2006) and to translational norms operating within a particular translation direction and language pair (Mauranen, 2004). We set out to reveal tendencies in translational behaviour at different competence levels by describing the frequency distributions of two functional types of discourse markers (connectives and epistemic commentary markers) in learner and professional translations against sources and non-translations. We compare data from a parallel translational learner corpus and a corpus of professional translations with customized selections from the English and Russian national corpora. The research corpus totals 10 million tokens. Using independent predefined lists of target items for each language, we explore cross-linguistic differences and their influence on the two types of translation.
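The item-level comparison described here can be illustrated with a keyness test. The sketch below assumes that connective counts have already been extracted from a translated and a non-translated subcorpus, and uses Dunning's log-likelihood (G2) with the 3.84 cut-off (p < 0.05, one degree of freedom) as one common way of flagging over- and underused items. The abstract does not say which statistic the authors use, and the function names and Russian counts below are hypothetical.

```python
import math

def per_million(count: int, corpus_size: int) -> float:
    """Normalised frequency per million tokens."""
    return count / corpus_size * 1_000_000

def log_likelihood(a: int, b: int, size_a: int, size_b: int) -> float:
    """Dunning's G2 for an item seen a times in corpus A and b times in corpus B."""
    total = size_a + size_b
    expected_a = size_a * (a + b) / total
    expected_b = size_b * (a + b) / total
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # the 0 * log(0) term is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def distinctive_items(trans_counts, ref_counts, trans_size, ref_size, critical=3.84):
    """Return (item, G2, 'over' | 'under') for items whose G2 reaches the critical value."""
    results = []
    for item in sorted(set(trans_counts) | set(ref_counts)):
        a, b = trans_counts.get(item, 0), ref_counts.get(item, 0)
        g2 = log_likelihood(a, b, trans_size, ref_size)
        if g2 >= critical:
            over = per_million(a, trans_size) > per_million(b, ref_size)
            results.append((item, round(g2, 2), "over" if over else "under"))
    return results

if __name__ == "__main__":
    # Invented counts for three Russian connectives in two 1-million-token subcorpora.
    translations = {"однако": 900, "таким образом": 300, "кроме того": 200}
    non_translations = {"однако": 600, "таким образом": 310, "кроме того": 195}
    for row in distinctive_items(translations, non_translations, 1_000_000, 1_000_000):
        print(row)
```

With these invented counts only однако is flagged, as overused in translations; the other two differences are too small to pass the threshold. In a real study the flagged items would still need the manual check against aligned data that the abstract describes.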
We test three possible tendencies: translations follow source-language patterns (interference); translations follow target-language patterns (normalization); or translations show an independent, idiosyncratic (over)use of connectives (explicitation). Observations are made at three levels: the overall frequencies of the list items, the frequencies of their semantic groups, and the frequencies of individual items. The last of these reveals translationally distinctive connectives (Chen, 2006), that is, items whose frequencies differ significantly between translations and originals. Manual analysis of aligned parallel data is used to verify the inferences drawn from the statistical analysis and provides insight into typical errors that significantly lower the textual quality of learner translations.

Keywords: translational learner corpora, discourse markers, interference, frequency distribution, text-level linguistics, cohesion, translation studies, TQA

Nominal appositions in French and Slovene: a contrastive study based on the FraSloK corpus

Adriana Mezeg
Faculty of Arts, Department of Translation, Aškerčeva 2, 1000 Ljubljana, Slovenia

This paper deals with a grammatical phenomenon that, following Combettes (1998), we call nominal appositions. They are one type of detached construction, whose main properties are freedom of position within the sentence, separation from the rest of the sentence by a comma, secondary predication, and coreference with the subject of the sentence (Combettes 1998). A nominal apposition is a noun phrase that is never preceded by a determiner and that stands in a relation to the main subject expressible with the verb être ('to be'), for example:

Chef du gouvernement provisoire de la République française, il a signé à Moscou, le 10 décembre 1944, un "traité d'alliance et d'assistance mutuelle", qu'il qualifie de "belle et bonne alliance".
(Le Monde diplomatique, April 2008)
['Head of the Provisional Government of the French Republic, he signed in Moscow, on 10 December 1944, a "treaty of alliance and mutual assistance", which he describes as a "fine and good alliance".']

This paper analyses only the Slovene translations of French nominal appositions in sentence-initial position, as these are the most interesting from a contrastive point of view. Nominal apposition proves problematic from a French-Slovene contrastive perspective: it cannot be rendered in Slovene by the same structure, i.e. a detached construction, because in Slovene such a construction does not satisfy the criterion of sentence mobility; for example, it cannot occupy initial position. We therefore hypothesise that grammatical explicitation is the rule when these structures are translated into Slovene, since translators must replace them with other structures. The contrastive analysis is based on examples extracted semi-automatically from the French-Slovene parallel corpus FraSloK, which contains press articles (Le Monde diplomatique, journalistic subcorpus) and literary works (literary subcorpus) published between 1995 and 2008. Both subcorpora are morphosyntactically annotated and balanced in size, together containing just under 2.5 million words. Examples of sentence-initial detached nominal constructions are extracted from the corpus with the ParaConc software (Barlow 1995) using syntactic patterns made up of morphosyntactic tags and regular expressions. According to the results of automatic retrieval and manual sorting, nominal appositions are somewhat more frequent in the journalistic corpus (178 occurrences versus 122 in the literary corpus). Often longer than the main clause, they provide, especially in journalistic discourse, information on the position and social status of the referent of the main clause.
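The pattern-based retrieval step can be sketched with Python's re module. ParaConc uses its own search syntax, so this is only an analogue: it assumes sentences stored as space-separated word/TAG tokens and a simplified, hypothetical tagset (NOUN, ADJ, PREP, DET, PRON, PUNCT, VERB), not the actual FraSloK annotation. The pattern finds a determiner-less noun phrase at the start of a sentence, detached by a comma and followed by the main-clause subject.

```python
import re

# Sentences are strings of space-separated "word/TAG" tokens.
# The tags are a simplified, hypothetical tagset used only for illustration.
INITIAL_APPOSITION = re.compile(
    r"^(?P<np>\S+/NOUN"                      # starts with a bare noun (no determiner)
    r"(?:\s+\S+/(?:NOUN|ADJ|PREP|DET))*)"    # optional nominal dependents
    r"\s+,/PUNCT"                            # detached from the clause by a comma
    r"\s+\S+/(?:PRON|DET|NOUN)(?=\s|$)"      # followed by the main-clause subject
)

def find_initial_appositions(tagged_sentences):
    """Return the word forms of sentence-initial detached NPs (candidates for manual sorting)."""
    hits = []
    for sentence in tagged_sentences:
        match = INITIAL_APPOSITION.match(sentence)
        if match:
            words = [token.rsplit("/", 1)[0] for token in match.group("np").split()]
            hits.append(" ".join(words))
    return hits

if __name__ == "__main__":
    sentences = [
        "Chef/NOUN du/PREP gouvernement/NOUN provisoire/ADJ ,/PUNCT il/PRON a/VERB signé/VERB",
        "Le/DET chef/NOUN du/PREP gouvernement/NOUN ,/PUNCT il/PRON a/VERB signé/VERB",
    ]
    print(find_initial_appositions(sentences))
```

Only the first sentence matches, because the second noun phrase opens with a determiner. As the abstract notes for the real extraction, candidates found this way still need manual sorting.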
The study examines how Slovene translators deal with these problematic structures and aims to draw practical conclusions that are useful for teaching and for French-Slovene interlinguistic mediation. Initial results show that the content of French nominal appositions is often expressed in Slovene as the subject of the sentence, a subject complement, an object complement, or a bound construction (Combettes 1998), which is in fact common in Slovene. Translating French nominal appositions into Slovene raises further problems that we observe in translation classes, notably word order, changes of position within the sentence and comma use. We will attempt to clarify these questions in the paper.

Corresponding author: [email protected]

Keywords: nominal apposition, detached construction, FraSloK parallel corpus, contrastive analysis, translation

Verbal constructions with comme: from scientific writing to the academic writing of native and non-native students

Marie-Paule Jacques (1, 2), Rui Yan (1)
(1) LInguistique et DIdactique des Langues Étrangères et Maternelles (LIDILEM), Université Grenoble Alpes, UFR des Sciences du Langage, BP 25, 38040 Grenoble cedex 9, France
(2) École supérieure du professorat et de l'éducation - Grenoble (ESPE Grenoble), ESPE Académie de Grenoble, Université Grenoble Alpes, 30 avenue Marcelin Berthelot, 38100 Grenoble, France

Scientific writing makes abundant use of specialised phraseology (Tutin, 2014), which takes various forms: collocations (Grossmann & Tutin, 2003), recurrent sequences (Tran, 2014), routines (Tutin & Kraif, 2016), and so on.
This phraseology fulfils various rhetorical and discursive functions, for example expressing a point of view, establishing cause and effect, signalling scientific lineage, defining terms and concepts, and providing evidence. Mastering it is therefore as important as mastering the terminology and conceptual apparatus of the discipline. We focus on the verbal construction with comme ('as'), which, according to a study of a corpus of humanities and social sciences (SHS) research articles, often introduces "meta-enunciative comparatives" (Debaisieux & Martin, 2010, p. 321, cited by Grossmann, 2014, p. 764): comme nous l'avons montré/vu/souligné/dit ('as we have shown/seen/stressed/said'), comme nous le verrons ('as we will see'), comme nous l'expliquons ('as we explain'), comme illustré/indiqué dans la figure ('as illustrated/indicated in the figure'), etc. These examples highlight the contribution of this construction to scientific argumentation: it fulfils "a metatextual and/or evidential function" (Grossmann, 2014), owing to the massive presence after comme of verbs of observation (constater, voir) or of communication (dire, expliquer, souligner, montrer, indiquer). The construction thus refers to a textual element or a (fragment of) discourse that serves as evidence or as a reminder. We approach the construction from the perspective of its acquisition by novice writers and study its use by comparing texts by native and non-native students with texts by researchers, who are regarded here as experts in scientific writing. In line with work on phraseological phenomena in native and non-native writing (Hyland & Milton, 1997; Neff, Ballesteros, Dafouz, Martínez, & Rica, 2004; Granger & Paquot, 2009), we consider that, as novices in scientific writing, native and non-native students face the same difficulties in using scientific phraseology.
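A simple way to retrieve such comme constructions from raw French text is a regular expression that allows a few intervening words (nous, l', avons, le) before a verb from a fixed list. The sketch below is a hypothetical illustration, not the authors' extraction procedure. The verb list uses the forms quoted above plus constaté, a participle of constater; longer insertions such as déjà fall outside the three-word window and would be missed.

```python
import re
from collections import Counter

# Verb forms quoted in the abstract, plus "constaté" (from constater).
VERBS = ("montré", "vu", "souligné", "dit", "verrons", "expliquons",
         "illustré", "indiqué", "constaté")

COMME_VERB = re.compile(
    r"\bcomme\s+"
    r"(?:\w+(?:['’]|\s+)){0,3}?"           # up to three whole intervening words (nous, l', avons, le)
    r"(" + "|".join(VERBS) + r")\b",        # the verb of observation or communication
    re.IGNORECASE,
)

def comme_verbs(text: str) -> Counter:
    """Count the verbs that follow comme in the matched constructions."""
    return Counter(match.group(1).lower() for match in COMME_VERB.finditer(text))

if __name__ == "__main__":
    sample = ("Comme nous l'avons montré plus haut, le corpus est varié. "
              "Comme nous le verrons, ce n'est pas tout. "
              "Les résultats, comme illustré dans la figure 2, convergent. "
              "Comme démontré ailleurs, rien.")
    print(comme_verbs(sample))
```

Each intervening word must end in an apostrophe or whitespace, so that comme démontré is not miscounted as montré.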
However, as Granger and Paquot (2009) point out, the difficulties of non-native students deserve specific attention and treatment, since these students also have problems with language proficiency. We therefore examine how native and non-native students use verbal constructions with comme in two corpora of master's theses, contrasting them with a corpus of SHS research articles. Initial observations reveal both quantitative and qualitative differences: 1) compared with the experts, both student groups underuse these constructions; 2) the students' uses differ from those of the experts, notably in the verbs associated with comme constructions; 3) non-native students make lexical errors in these constructions.

Keywords: verbal construction, scientific writing, native/non-native students, corpus linguistics

Meeting the reader in academic writing: reader pronouns in English and French

Niall Curry
University of Limerick (UL), Limerick, Ireland

Corpus-based contrastive analysis is experiencing a notable revival of interest owing to its role in a world of increasing 'interlingual and intercultural communication' (Granger 2003, p. 18). This revival owes much to advances in corpus linguistics over the last 30 years, during which corpus-based contrastive analyses of academic writing have come to occupy a small but growing space in the literature. Much of this growth is likely due to the fact that academic writers who are non-native speakers need to learn the writing conventions of the academic discourse communities they aspire to join (Pérez-Llantada 2010, p. 45).
This has led researchers of academic writing to pursue three streams of research that can better inform language teaching (Biber 2006, p. 6): the study of context and text, the study of interpersonal communication, and the study of lexico-grammatical items. Although these streams are arguably interconnected, there is a surprising lack of research on interpersonal communication in academic writing that compares evaluative markers across languages. In other words, research is needed on the rhetorical devices that authors use to engage readers in academic writing, such as directives, personal asides, appeals to shared knowledge, questions and reader pronouns (Hyland 2005). This study addresses that gap for reader pronouns in English and French academic writing. In this paper, we examine reader pronouns in English and French economics research articles and analyse their varying roles as engagement markers. We focus on the functions of these pronouns as a comparable common ground, or tertium comparationis, in English and French, and test their equivalence, following Krzeszowski (1990), in terms of form, location and word class. To do this, we present a corpus-based contrastive analysis of English and French economics research articles taken from the KIAP corpus (Fløttum et al. 2006), a comparable corpus of 450 research articles: 150 each in English, French and Norwegian, with 50 per language in each of the disciplines of economics, linguistics and medicine. This study focuses on the English and French economics subcorpora, totalling 100 research articles. Reader pronouns are identified in each subcorpus, and their functions are categorised using a synthesis of the work of Hyland (2001; 2005) and Fløttum et al. (2006) on addressee features and reader pronouns.
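Before any functional categorisation, candidate reader pronouns have to be located and counted. The sketch below is a minimal illustration of that first step, assuming a hypothetical list of English and French forms and normalising counts per 10,000 tokens. Raw counts like these still include exclusive we and nous and impersonal on, which the manual functional analysis described above would have to sort out.

```python
import re
from collections import Counter

# Candidate reader-pronoun forms (an illustrative list, not the study's inventory).
READER_PRONOUNS = {
    "en": {"we", "us", "our", "ours", "ourselves", "you", "your", "yours", "yourself"},
    "fr": {"nous", "notre", "nos", "vous", "votre", "vos", "on"},
}

def reader_pronoun_rates(text: str, lang: str, per: int = 10_000):
    """Return (raw counts, rates per `per` tokens, token total) for candidate reader pronouns."""
    tokens = re.findall(r"\w+", text.lower())   # l'on -> "l", "on"
    counts = Counter(tok for tok in tokens if tok in READER_PRONOUNS[lang])
    total = len(tokens)
    rates = {form: n / total * per for form, n in counts.items()} if total else {}
    return counts, rates, total

if __name__ == "__main__":
    counts, rates, total = reader_pronoun_rates(
        "We show that our model works. You may note that we assume this.", "en")
    print(total, counts, rates)
```

Normalising per 10,000 tokens makes the English and French subcorpora comparable even if their article lengths differ.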
These reader pronouns are then analysed in terms of their formal typology, their location within the text and their morpho-syntactic properties, with a view to measuring equivalence. The results reveal important similarities and differences in function, form, location and morpho-syntax, which are investigated both quantitatively and qualitatively. These findings contribute to the debate on English and French academic writing as writer-responsible and reader-responsible languages, respectively, and have useful implications for the teaching of academic writing in both English for academic purposes and français langue académique.

Keywords: corpus-based contrastive analysis, English for academic purposes, français langue académique, academic writing

Multi-word terms: disclosing the semantic relations in noun compounds

Melania Cabezas-García, Pilar León-Araúz
University of Granada (UGR), Buensuceso, 11, 18001, Spain

Noun compounds (e.g. wind power) are the main units used to designate specialized concepts (Nakov, 2013). These multi-word terms (MWTs) can be defined as sequences of nouns that function as a single noun (Downing, 1977), and they are distinguished by their syntactic-semantic complexity, since two concepts are juxtaposed without any clear indication of the link between them (Rosario et al., 2002). Consequently, in compound terms with the same external form, such as air pollution and oil pollution (the head pollution combined with a noun modifier), different semantic relations can hold between the constituents (Location vs. Cause) (Maguire et al., 2010). The semantics of terminological noun compounds is therefore not fully compositional, that is, it cannot simply be construed from the meanings of their constituents, as is often assumed.
Although the ambiguity of the semantic relations in noun compounds has long been studied, it remains problematic, because different interpretations can lead to different inferences, query expansions, paraphrases, translations, etc. (Hendrickx et al., 2013). The root of this issue is noun packing, which can be addressed by analyzing the formation processes of noun compounds: predicate deletion (e.g. power system instead of a system produces power) and predicate nominalization (e.g. energy transfer instead of energy is transferred) (Levi 1978). The propositions underlying noun compounds make the semantic relation explicit and consist of a predicate, its arguments (which are mandatory and make up the meaning of the verb) and adjuncts (optional complements) (Tesnière, 1976). The relation between a predicate and its complement structure is referred to as the 'micro-context', which is key to accessing the semantics of terms. This paper describes the use of paraphrases conveying the conceptual content of English two-term noun compounds (Nakov and Hearst, 2006; Butnariu and Veale, 2008; Cabezas-García and Faber, in press) in the specialized domain of environmental science. Verb paraphrases were used to access micro-contexts, which represent the syntax-semantics interface, in two-term noun compounds formed by predicate deletion. Some of these paraphrases were based on the lexico-syntactic patterns that typically convey semantic relations in real texts (Meyer, 2001; Marshman, 2006). Our goal was to access the semantics of these MWTs in order to (i) disambiguate the semantic relation between the constituents of the compound, and (ii) develop a procedure for inferring the semantic relations in these MWTs. To this end, English two-term noun compounds were extracted from an environmental science corpus. The selected MWTs designated entities and all shared the same head (e.g. air pollution, wastewater pollution, oil pollution).
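The paraphrase step can be illustrated by searching a corpus for lexico-syntactic patterns that link the head back to its modifier (pollution in the air, pollution caused by oil) and letting the most frequent pattern suggest a relation. The connecting phrases, relation labels and function names below are hypothetical examples, not the authors' pattern inventory.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mapping from connecting phrases to relation labels (illustration only).
PATTERNS = {
    "caused by": "Cause",
    "from": "Cause",
    "in": "Location",
}

def phrase_regex(head: str, phrase: str, modifier: str) -> str:
    """Regex for '<head> <phrase> (the) <modifier>', tolerating any whitespace."""
    link = r"\s+".join(re.escape(word) for word in phrase.split())
    return rf"\b{re.escape(head)}\s+{link}\s+(?:the\s+)?{re.escape(modifier)}\b"

def infer_relations(head: str, modifiers, corpus_text: str) -> dict:
    """Label each modifier with the relation whose paraphrases occur most often."""
    evidence = defaultdict(Counter)
    for modifier in modifiers:
        for phrase, relation in PATTERNS.items():
            hits = re.findall(phrase_regex(head, phrase, modifier), corpus_text, flags=re.IGNORECASE)
            if hits:
                evidence[modifier][relation] += len(hits)
    # Modifiers with no paraphrase evidence are left out rather than guessed.
    return {modifier: counts.most_common(1)[0][0] for modifier, counts in evidence.items()}

if __name__ == "__main__":
    corpus = ("Air pollution is a concern. Pollution in the air rose sharply. "
              "Pollution caused by oil spills damages coasts. Coastal pollution from oil is persistent.")
    print(infer_relations("pollution", ["air", "oil", "noise"], corpus))
```

Here air comes out as Location and oil as Cause, mirroring the contrast described in the abstract, while noise gets no label because no paraphrase was found.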
We then organized the MWTs according to the semantic category of their modifiers; that is, the qualitative valence of the concealed predicates was used to disambiguate the semantic relations in the noun compounds. The next step was to extract paraphrases from the corpus. Finally, the groups of MWTs, previously organized by the semantic category of their modifier, were compared. Our results showed that specifying the semantic category of the modifiers and using paraphrases gave access to the conceptual load of the noun compounds, namely to the semantic relation between their constituents. Recurrent patterns in the formation of these compounds were also observed, which provide a valuable starting point for developing translation rules for these units.

Keywords: noun compound, semantic relation, paraphrase, micro-context, terminology

Multilingual extraction of terminology from specialised corpora

Eva M. Mestre-Mestre
Universitat Politècnica de València (UPV), Camino de Vera, s/n, 46022 Valencia, Spain

There is a considerable body of literature on the use of text-based corpora for various purposes: scientific research, development of teaching materials, compilation of glossaries and vocabularies, etc. In many cases, computer software is used (and sometimes programmed) to assist with these tasks. Most analysis software lets users check word frequencies, concordances and collocations. However, few tools allow the extraction of genuine specialised lexical units from specialised-domain corpora, and fewer still can work with languages other than English.
This work presents the main characteristics of DEXTER (Discovering and EXtracting TERminology), an online workbench for terminology management and data mining of corpora of unstructured texts. The current version of DEXTER processes small- and medium-sized corpora by first extracting terms automatically from a given corpus, contrasting the target corpus with the European Union's IATE term base. The candidate terms must then be validated manually to obtain the final results. A distinctive characteristic of DEXTER is that it can work with different languages; at present, it can analyse corpora in English, French, Italian and Spanish. A second particularity is its hybrid approach, which takes into account both the linguistic and the statistical properties of lexical units and also applies lexical filters, without grammatical tagging, to restrict the candidates before they are weighted; this simplifies the validation work needed to complete the terminology extraction task. It also allows the identification of terms that include different grammatical categories (nouns, verbs, adjectives or adverbs). DEXTER uses the SCR metric (Periñán-Pascual, 2015), which combines the termhood and unithood of the n-grams extracted by the software (Salton, Wong, and Yang, 1975; Salton and Buckley, 1988; Ahmad, Gillam and Tostevin, 2000; Park, Byrd and Boguraev, 2002). The research presented here compares the results obtained for three corpora of neurology articles published in prestigious research journals in the last five years: 50 articles in French, 50 in English and 50 in Spanish. The precision of the terms proposed by the software, as established by manual validation, was studied.
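Precision after manual validation is the share of proposed candidates that the validator keeps; the rejected candidates are the false positives. The sketch below assumes the proposed and accepted terms are available as plain lists per language; the function name and the Spanish candidate terms are invented for illustration and are not DEXTER output.

```python
def precision(proposed, accepted):
    """Return (precision, sorted false positives) for one corpus after manual validation."""
    proposed = set(proposed)
    true_positives = proposed & set(accepted)
    false_positives = sorted(proposed - true_positives)
    score = len(true_positives) / len(proposed) if proposed else 0.0
    return score, false_positives

if __name__ == "__main__":
    # Invented candidate list for a Spanish neurology corpus.
    candidates = ["neurona", "sinapsis", "estudio", "corteza prefrontal"]
    validated = ["neurona", "sinapsis", "corteza prefrontal"]
    p, fps = precision(candidates, validated)
    print(f"precision = {p:.2f}; false positives = {fps}")
```

Running the same function on each language's candidate list gives directly comparable precision figures, and the returned false positives are the cases a closer qualitative analysis would examine.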
Cases with a higher rate of false positives (candidates proposed as terms by the software but rejected in the validation phase) were also examined. The study concludes that the results obtained with DEXTER are similar for the three languages and consistent with previous studies of monolingual corpora (Periñán-Pascual and Mestre-Mestre, 2015, 2016). DEXTER was developed in C# with ASP.NET 4.0 by Prof. Carlos Periñán-Pascual and is freely available at www.fungramkb.com/nlp.aspx.

Keywords: ATE, multilingual, specialised corpora, terminology

Naming practices and media constructions of reality in Spanish: A corpus-based perspective on violence against women news (2005-2015)

José Santaemilia
Universitat de València (UV), Avda. Blasco Ibáñez, 32-6, 46010 Valencia, Spain

Violence against women (VAW) is undoubtedly a serious issue in Spanish society, which is characterized, among other things, by a growing awareness of gender and sexual issues, including the perception of VAW as a serious social ill as well as a crime. Multiple representations of, and debates on, the topic are found in literature (Báez Ramos 2002), cinema (Sánchez Noriega 2002, Wheeler 2012) and TV and radio programmes (Gómez Nicolau 2012). The mass media have been instrumental in this heightened awareness of VAW. In Spain, media coverage of VAW is closely associated with two quality newspapers, El País and El Mundo. Since the mid-1970s, quality papers have featured growing numbers of articles on the topic.
Since the murder of Ana Orantes in December 1997, a new discourse on VAW has been identified in the Spanish media (Bengoechea 2000, Carballido 2007), although scholarly research since the turn of the century (Bengoechea 2000, Lledó 2002, Fernández Díaz 2003, Jorge 2004, Vives-Cases et al. 2005, Carballido 2007, Zurbano 2012, Menéndez 2014, Carratalá 2016) still shows that Spanish media discourses tend to naturalize and condone male responsibility, thus reproducing the existing asymmetrical relations between the sexes. Although many different names for VAW occur in Spanish media discourse, three naming practices stand out as the most common: violencia de género ('gender-based violence'), violencia doméstica ('domestic violence') and violencia machista ('male violence'). Choosing one term over another matters because it is likely to impose a category of thought, convey negative or positive values, attribute blame or praise, or shape a particular evaluative stance. This presentation therefore compares the two Spanish quality dailies (El País and El Mundo) in their use of the three main naming practices in contemporary VAW news. To do so, I draw on an ad hoc corpus of ca. 10 million words of gender-related news covering the period 2005-2015. It is part of a larger, comparable (Spanish-English), highly specialized corpus (GENTEXT-N) containing all the news articles dealing with gender-related topics such as VAW, homosexuality or abortion. Methodologically, I adopt a CADS (Corpus-Assisted Discourse Studies) approach (Partington 2004, Baker & Levon 2015), that is, the combined, dialogical insights of corpus linguistics and Critical Discourse Analysis, "moving back and forth recursively between qualitative and quantitative forms of analysis in order to generate new hypotheses as well as to test existing ones" (Baker & Levon 2015: 223).
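The frequency side of such a comparison can be sketched by counting each naming practice per newspaper and year and normalising per million tokens, so that subcorpora of different sizes are comparable. The sketch assumes articles are available as (newspaper, year, text) records, matches the three terms as lowercase substrings and counts tokens by whitespace; these are simplifying assumptions, and the example records are invented.

```python
from collections import defaultdict

TERMS = ("violencia de género", "violencia doméstica", "violencia machista")

def count_terms(text: str, terms=TERMS) -> dict:
    """Non-overlapping, case-insensitive substring counts for each multiword term."""
    lowered = text.lower()
    return {term: lowered.count(term) for term in terms}

def rates_by_paper_year(articles, terms=TERMS) -> dict:
    """Map (newspaper, year) -> {term: occurrences per million whitespace tokens}."""
    counts = defaultdict(lambda: dict.fromkeys(terms, 0))
    tokens = defaultdict(int)
    for paper, year, text in articles:
        key = (paper, year)
        tokens[key] += len(text.split())
        for term, n in count_terms(text, terms).items():
            counts[key][term] += n
    return {key: {term: n / tokens[key] * 1_000_000 for term, n in per_term.items()}
            for key, per_term in counts.items() if tokens[key]}

if __name__ == "__main__":
    # Invented example records.
    articles = [
        ("El País", 2005, "La violencia machista y la violencia doméstica"),
        ("El País", 2005, "Otra noticia sobre violencia machista"),
        ("El Mundo", 2015, "Violencia de género: nuevo caso"),
    ]
    for key, rates in rates_by_paper_year(articles).items():
        print(key, rates)
```

Tracking these normalised rates year by year is one way to make visible a trend such as the decline of one term relative to the others; the concordance-based qualitative reading would then explain it.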
Differences and similarities in frequencies and concordance lines are therefore explored in order to assess the most important ideological values present in VAW news stories. Attention is also paid to the news values (Bednarek & Caple 2012, 2014) construed by each newspaper, together with the relevant associations and ideological implications. Among the trends that seem to be confirmed is a general move towards more widespread use of two terms, violencia machista (in El País) and violencia de género (in El Mundo), with violencia doméstica increasingly excluded. Newsworthy naming practices, and their evolution in media discourses, are powerful indicators both of social positioning on sensitive social issues and of public evaluations of those issues.

Keywords: violence against women (VAW), Spanish press, El País, El Mundo, media discourse, VAW naming practices, news values

On the Endophoric, Abstract and Narrative Nature of Idiomatic 'Do So' in Legal Texts, Journalistic Texts and Written Correspondence

Carlos Prado-Alonso
University of Oviedo (Uniovi), Spain

Idiomatic do so constructions, as in 'I ate an apple yesterday in the park, and Peter did so last week', are verbal anaphors that have been studied extensively from a theoretical perspective. Research on do so has mainly focused on the categorical (i.e. semantic and syntactic) factors that determine the use of the construction. It has been argued, for instance, that the applicability of do so anaphora depends principally on factors such as (a) non-stativity of the antecedent (Guimier 1981); (b) an antecedent not headed by be (Levin 1986); (c) coreferentiality of the subjects of the antecedent and do so clauses (Souesme 1987); (d) adjunct status of any "orphan" in the do so clause (Culicover & Jackendoff 2005); and/or (e) non-contrastive status of any adjunct in the do so clause (Huddleston and Pullum 2002), among others.
Overall, however, scholars have paid little attention to the textual factors affecting the distribution and use of do so anaphora in naturally occurring Present-day English, apart from a few isolated hints (cf. Houser 2010). To bridge this gap, this paper presents an in-depth corpus-based analysis of the factors that determine the pragmatic use and distribution of do so constructions in contemporary legal, journalistic and written-correspondence texts. The data are taken from the ICAME family of corpora, namely the LOB, FLOB, Brown, Frown, BE06 and AmE06 corpora. As a rule, do so has been regarded as typical of formal registers, with the elliptical alternative without so preferred in informal contexts (cf. Stirling and Huddleston 2002: 1531). Beyond that, however, the analysis of the 687 instances retrieved from the corpora will show that the frequency and distribution of do so constructions in legal, journalistic and written-correspondence texts depend not only on the degree of formality but also on the narrative, endophoric and abstract nature of the texts in which they occur. The data will also show that this narrative, endophoric and abstract nature is not only a property of the texts in which do so anaphora occurs, but also a feature of the construction itself. In sum, the analysis sheds light on the linguistic and textual factors that drive the pragmatic use and distribution of do so verbal anaphora and shows that, in addition to syntactic and semantic factors, the linguistic features of the texts in which these formulaic expressions occur also play an important role in their use.

References
Culicover, P. W. & R. Jackendoff. 2005. Simpler Syntax. Oxford: OUP.
Houser, M. J. 2010. The Syntax and Semantics of Do So. University of California.
Guimier, C. 1981. La Substitution Verbale en Anglais. Modèles Linguistiques 3.1: 135-161.
Levin, L. 1986. Operations on Lexical Forms: Unaccusative Rules in Germanic Languages. Cambridge, MA: MIT.
Huddleston, R. & G. K. Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1449-1564.
Souesme, J. 1987. Valeurs et Emplois Respectifs de DO et DO SO. Modèles Linguistiques 9: 65-92.

Keywords: Idiomatic Do So, Textual Variation, Legal Texts, Editorials, Written Correspondence

On the Grammaticalization Path of the Quasi-coordinator as well as

Miriam Criado Peña
Universidad de Málaga, Spain

The English language as we know it today is the result of numerous developments over time. Among these changes, grammaticalization, the process by which a lexical word with full meaning of its own becomes a grammatical item, stands out because of its importance in the development of the language. The present study traces the developmental path of the construction as well as, taking the Old English adverb well as its origin. In Middle English, as well and as well as (swa well swa) emerged from the original adverb and behaved as single units, finally developing into the coordinator as well as in Early Modern English. These multiple layers survive in Present-Day English; together with the versatility of the construction, this allows me to classify its uses into four groups according to meaning and function: a) adverb of manner; b) comparative of two elements; c) conjunctive coordinator; and d) coordinator introducing one person or thing. However, coordinators such as as well as sometimes perform different syntactic roles in a sentence; these are called quasi-coordinators, that is, linkers that can behave as coordinators or subordinators depending on the context. When they behave as subordinators, they introduce prepositional phrases and can occur in initial position, but they do not lose their coordinating function.
In addition, some of the mechanisms involved in language change, such as syntactic reanalysis and semantic bleaching, are considered in this paper. They help explain the changes and provide a dual view of them that encompasses both syntax and semantics. The grammaticalization of quasi-coordinators has been practically neglected in the literature, and the diachronic development of as well as therefore remains unknown. In light of this, the present paper studies this process by examining the syntactic and semantic changes of the construction and exploring its coordinating function in the different layers across time. The following objectives are pursued:
a) a historical analysis to ascertain the origin of this quasi-coordinator, examining the linguistic causes, both syntactic and semantic, that motivated the change;
b) an identification of the multiple mechanisms and processes taking place along the grammaticalization path;
c) a classification of the construction into four groups according to function, in order to trace its progress; and
d) a sociolinguistic study to assess the role played by social factors in the linguistic process.
To this end, the Parsed Corpus of Early English Correspondence (PCEEC) and the Helsinki Corpus of English Texts have been used as sources of analysis. Together they cover almost seven hundred years, from the late Old English period to the Early Modern period.
Keywords: grammaticalization, as well as, quasi-coordinator, diachronic development, semantic bleaching, reanalysis, sociolinguistic factors

The onomasiology of feeling: linguistic corpora as a data source for the semantics and syntagmatic combinatorics of emotion nouns in Spanish
Inmaculada Mas
Universidad de Santiago de Compostela - USC – Spain
The expression of emotions is in fashion.
Emotion dictionaries, the display of feelings on social networks, and emoticons, which are indispensable in mobile chat conversations, are just some of the manifestations of the current relevance of subjective sensibility. Beyond the monolithic "like" (me gusta), naming emotions is an essential element of public and private communication, and it is no less sophisticated for being primitive. In this paper we propose an approach to the semantics and lexico-syntactic combinatorics of emotion nouns in Spanish, using data obtained from linguistic corpora. The proposal has three aims. First, it attempts an onomasiological approach to the domain of feelings, focusing on emotion nouns in Spanish and their lexico-syntactic combinatorics. Second, it seeks to test the usefulness of corpora as a data source. Besides context and domain, corpora provide information on frequency (reference corpora), multilingual correspondences (parallel corpora) and incidence in interlanguage (learner corpora). Third, it considers how all of this could be applied to the development of a lexicographic product aimed at contrastive studies and at meeting production and translation needs. The onomasiological approach starts from Casares's Diccionario ideológico, Moliner's Diccionario de uso del español, and López García's Diccionario de sinónimos y antónimos de la lengua española. In the general plan of Casares's (1942) ideological classification, emotion nouns fall under the domain of Sensibilidad [Sensibility]. They are subdivided into Sensibilidad/Sentidos [Sensibility/Senses], in Synoptic Table 13 (p. L), and Sentimientos [Feelings], in Synoptic Table 14 (p. LI).
As is well known, the onomasiological perspective is more useful for linguistic production and translation tasks. For both activities, the catalogues of the Diccionario de uso del español and dictionaries of synonyms and antonyms have proved to be enormously useful sources. The precise lexical item is generally located by starting from the most neutral, most general or most frequent word. One of Moliner's purposes in including the catalogues was to "lead the reader from the word they know to the way of saying what they do not know or what does not come to mind at the right moment" (p. IX). In her design, she sought to give the dictionary two routes of consultation: onomasiological and semasiological. The corpora consulted (CORPES XXI, Reverso Context and CAES) provide frequency data and, above all, a wealth of examples of the nouns in context. These make it possible to outline the semantic scheme and complete it with the combinatorial potential, which in the case at hand means support verbs and adnominal complements. Some results concerning the two poles of the domain Sentimientos (gusto/disgusto [pleasure/displeasure], amor/odio [love/hate], preocupación/despreocupación [worry/unconcern]) show the particular combinatorial behaviour of these nouns.
Keywords: emotion nouns, onomasiological lexicography, Spanish corpora, syntactic combinatorics

Phraseological routines in scientific writing: the example of metatextual routines in French
Agnès Tutin
Laboratoire de Linguistique et de Didactique des Langues Maternelles et Etrangères (LIDILEM) – Université Paris VIII Vincennes-Saint Denis, Université de Grenoble – Université Grenoble Alpes, Bâtiment Stendhal CS40700 38058 Grenoble cedex 9, France
Phraseology is prevalent in scientific writing (e.g. Gledhill, 2000; Pecman & Kübler 2011) and has many faces in this genre (Tutin, 2013).
Cross-disciplinary scientific phraseology includes collocations such as pay attention or encouraging results and discourse markers such as as long as or as a first step. It also includes large phraseological chunks that we call semantico-rhetorical routines (Tutin & Kraif, 2016). These routines belong to the extended phraseological field (see also Teufel 1998; Pecman 2004; Sándor 2007) and present specific properties:
• At the syntactic level, they are generally complete sentences including a tensed verb. They thus differ from standard collocations, which prototypically involve two lexical elements.
• At the rhetorical level, they have a specific rhetorical function, such as highlighting textual coherence, e.g. comme on/nous l'avons mentionné/précisé [as one/we mentioned/made clear ...].
• At the enunciative level, they involve specific referents in the discourse situation (e.g. the author of the scientific text, the scientific article, the audience of the scientific text ...).
• At the semantic and lexical level, they involve specific concepts lexicalized with various elements. In the example above, the author of the scientific text is referred to with on or nous, while mentionné alternates with précisé.
These semantico-rhetorical routines are thus far from being frozen expressions. We nevertheless consider that they fully belong to the field of phraseology, since these patterns are dedicated to specific functions in the genre of scientific writing and are realized through limited lexical paradigms. After a theoretical presentation of routines, we will show how these phraseological patterns can be automatically extracted from treebanks of scientific articles in a corpus-driven approach. The technique uses statistical association measures and dispersion measures (Kraif 2016; Tutin & Kraif 2016), combined with semantic lexicons and syntactic relations (Hatier et al. 2016).
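The abstract names association and dispersion measures without specifying them. The sketch below is minimal and makes two assumptions: Dunning's log-likelihood ratio (G2) serves as the association measure, and a simple range measure (the share of sub-corpora in which a pair occurs) serves as the dispersion measure. Under those assumptions, it shows how candidate routine cores could be filtered from (head, dependent) pairs taken from a treebank. It is not the Lexicoscope's actual implementation. The function names, thresholds and toy French pairs are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(o11, r1, c1, n):
    """Dunning's G2 for a 2x2 table: o11 = pair frequency, r1 = head frequency,
    c1 = dependent frequency, n = total number of pairs."""
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    # Cells with zero observations contribute nothing to G2
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def extract_candidates(pairs_by_part, min_g2=10.83, min_range=0.5):
    """Keep (head, dependent) pairs that are positively associated (G2 >= min_g2)
    and attested in at least `min_range` of the sub-corpora (range dispersion)."""
    pair_freq, head_freq, dep_freq = Counter(), Counter(), Counter()
    parts_seen = {}
    for part, pairs in pairs_by_part.items():
        for head, dep in pairs:
            pair_freq[(head, dep)] += 1
            head_freq[head] += 1
            dep_freq[dep] += 1
            parts_seen.setdefault((head, dep), set()).add(part)
    n = sum(pair_freq.values())
    results = []
    for (head, dep), o11 in pair_freq.items():
        r1, c1 = head_freq[head], dep_freq[dep]
        if o11 <= r1 * c1 / n:  # skip pairs that are not attracted to each other
            continue
        g2 = log_likelihood(o11, r1, c1, n)
        dispersion = len(parts_seen[(head, dep)]) / len(pairs_by_part)
        if g2 >= min_g2 and dispersion >= min_range:
            results.append(((head, dep), o11, round(g2, 2), dispersion))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Invented toy (verb, dependent) pairs grouped by discipline
TOY_CORPUS = {
    "linguistics": [("mentionner", "nous")] * 3 + [("analyser", "corpus")] * 3 + [("montrer", "résultat")] * 2,
    "psychology": [("mentionner", "nous")] * 3 + [("mesurer", "effet")] * 3 + [("montrer", "résultat")] * 2,
    "history": [("mentionner", "nous")] * 2 + [("consulter", "archive")] * 4 + [("montrer", "résultat")] * 2,
}

for row in extract_candidates(TOY_CORPUS):
    print(row)
```

In the toy data, analyser + corpus is strongly associated but confined to one discipline, so the range filter discards it. This illustrates why association is combined with dispersion when looking for cross-disciplinary phraseology.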
We will then illustrate this notion in the field of metatextual functions, especially text navigation functions, which are often associated with speech verbs.
References
Gledhill, Ch. (2000). Collocations in Science Writing. Language in Performance 22. Tübingen: Gunter Narr Verlag.
Hatier, S., Augustyn, M., Yan, R., Tran, T. T. H., Tutin, A., & Jacques, M.-P. (2016). French cross-disciplinary scientific lexicon: extraction and linguistic analysis. In T. Margalitadze & G. Meladze (Eds.), Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity (pp. 355-365).
Kraif, O. (2016). Le Lexicoscope : un outil d'extraction des séquences phraséologiques basé sur des corpus arborés. In O. Kraif & A. Tutin (Eds.), Cahiers de lexicologie, 1(108), 91-106.
Pecman, M. (2004). Phraséologie contrastive anglais-français : analyse et traitement en vue de l'aide à la rédaction scientifique. Doctoral thesis, Université de Nice Sophia Antipolis, December 2004.
Pecman, M., & Kübler, N. (2011). ARTES: an online lexical database for research and teaching in specialized translation and communication. In Proceedings of the First International Workshop on Lexical Resources.
Sándor, A. (2007). Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts. Revue Française de Linguistique Appliquée, XII(2), 97-108.
Tutin, A. (2016). La phraséologie transdisciplinaire des écrits scientifiques : des collocations aux routines sémantico-rhétoriques. In A. Tutin & F. Grossmann (Eds.), L'écrit scientifique : du lexique au discours. Autour de Scientext (pp. 27-44). Rennes: Presses Universitaires de Rennes.
Tutin, A., & Kraif, O. (2016). Routines sémantico-rhétoriques dans l'écrit scientifique de sciences humaines : l'apport des arbres lexico-syntaxiques récurrents. Lidil. Revue de linguistique et de didactique des langues, (53), 119-141.
Keywords: phraseology, scientific writing, routines

Phraseology and discourse grammar in English as a lingua franca: 'on the contrary' and 'on the other hand' in unedited research papers
Silvia Murillo
Universidad de Zaragoza – Pedro Cerbuna 12, 50009 Zaragoza, Spain
Due to linguistic interference, some 'deviant' uses of the contrastive discourse markers on the contrary and on the other hand have been pointed out in essays written by learners of English (Lake 2004; Gilquin et al. 2007) and by users of English as a lingua franca (Prodromou 2008). Grammatically, these markers are prepositional phrases. They are similar in form to the Spanish discourse markers por el contrario and por otra parte, but their use (i.e. their instructional or procedural meaning) is different. Por el contrario can either contrast two topics or oppose/refute one single topic, whereas on the contrary only encodes the latter use. Por otra parte encodes discourse-organizing instructions rather than counterargumentative ones. The same applies to other language pairs, for example English-French on the contrary/au contraire (Portolés 2002). The purpose of this paper is to present a qualitative-quantitative analysis of the form and use of these two markers in the SciELF corpus. SciELF is a subset of the WrELFA corpus (Written Corpus of English as a Lingua Franca in Academic Settings), compiled at the University of Helsinki. It consists of 150 unedited research papers (759,300 words) from the Sciences and from the Social Sciences and Humanities, written by academics from ten different L1 backgrounds. The analysis of the corpus revealed non-standard phraseological variants of the two markers. For on the other hand, variants such as on the other side, in the other side, in the other hand and for the other hand were found. The phraseological range for on the contrary included at the contrary, by contrary, in contrary, on contrary and contrary.
As regards their functions, on the contrary shows deviant uses in over two thirds of the cases found in the SciELF corpus: it contrasts two topics rather than opposing/refuting one single topic. On the other hand in some cases plays a more discourse-organizing role, and thus has a less argumentative function. These processes may be described as semantically driven developments, as the residual conceptual meaning of the L1 markers (cf. Murillo 2010) seems to become central to the form and use of these discourse markers in written academic ELF. Regarding form variants, in most cases the core conceptual element of the markers has been kept (as a cognate) or translated, while prepositions and articles are used approximately (cf. Sinclair 2004; Vetchinnikova 2015). Further, the procedural meaning of these markers seems to have been amplified under the influence of the L1. Hybridity is thus the most remarkable process affecting these markers, and it can be perceived both at a formal level and at a pragmatic-semantic level. At a later stage, variation in form is masked by editors, who tend to correct the use of prepositions and articles in papers to be published (Mur 2013). However, many deviant uses of on the contrary are overlooked in published papers (Murillo 2012). Given this trend and the frequency of such cases in the SciELF corpus, it is argued that this discourse marker is undergoing a grammaticalization process in ELF, that is, its procedural meaning is changing.
Keywords: English as a lingua franca (ELF), contrastive discourse markers, formal variants, procedural meaning, conceptual meaning, grammaticalization

ROUND TABLE: Corpus-based analysis of interpersonal metadiscourse in specialized domains: academic vs professional and social genres.
Theoretical and methodological challenges
Francisca Suau-Jiménez (1, 2), Rosa Lorés Sanz (3), Giovanna Mapelli (4), Isabel Herrando Rodrigo (3)
1 Facultat de Filologia, Traducció i Comunicació, Universitat de Valencia (IULMA-UV) – Av. Blasco Ibáñez 32, 46010 Valencia, Spain
2 Facultat de Filologia, Traducció i Comunicació (IULMA-UV) – Av. Blasco Ibáñez 32, 46010 Valencia, Spain
3 Universidad de Zaragoza – Pedro Cerbuna 12, 50009 Zaragoza, Spain
4 Dipartimento di Scienze della Mediazione Linguistica e di Studi Interculturali – Piazza Indro Montanelli 1, 20099 Sesto San Giovanni (MI), Italy
The main subject of this round table is the need to refine interpersonal metadiscourse (IM) as a theoretical and methodological tool for describing genres in specialized domains and languages through their corresponding corpora. The debate will be grounded in our own corpus-based research results from the study of different academic and professional genres (Herrando-Rodrigo 2010, 2012, 2014; Lorés-Sanz 2009, 2011a, 2011b; Mapelli 2008, 2016; Suau 2012a, 2012b, 2014). It will focus on interpersonality and on its limitations and challenges as an analytical perspective. The conclusions are intended to offer insights into the applicability of the descriptive framework of interpersonal metadiscourse and thus to facilitate further research in the field. Our hypothesis concerns IM as a framework for analysing interpersonal features in professional, social and academic genres. If this framework is conditioned by contextual variables, it will need to be constantly refined and readapted to the specific corpus it is applied to, accepting new markers and/or new lexico-grammatical realizations.
If the debate and its conclusions confirm this hypothesis to some extent, the way will be open for further refinement of the model. A refined model would allow us to describe a wider range of genres, disciplines, languages and corpora, with discursive and sociolinguistic implications. To sum up, we will draw on several of our own studies of specialized corpora carried out within the IM framework. We will discuss their main achievements and also their limitations, which stem from the strict, pre-established pattern with which the model was designed. The following four questions will then be posed to open a debate among the presenters and the audience.
Questions for discussion, related to the four analyses:
Q.1. Have any weaknesses been identified in the framework of interpersonal metadiscourse, especially related to markers and their lexico-grammatical realizations? If so, which ones?
Q.2. Has each corpus determined the way in which the framework has been applied or, on the contrary, has the research objective determined which corpus to collect?
Q.3. What differences can be observed in the interpersonal metadiscourse framework according to genre, discipline and language?
Q.4. What conclusions can be drawn, and what suggestions can be made for methodological improvements that would facilitate further research in IM? What would the theoretical implications be?
Based on our contributions and their implications, we will identify differences in how far the model can be applied. These differences concern the domain of specialization (professional and social vs academic), the language used (English vs Spanish) and lexico-grammatical and phraseological indicators, among other aspects.
Keywords: interpersonality, interpersonal metadiscourse, specialized domain corpora, academic genres, professional and social genres, theoretical and methodological challenges

Rocking the corpus. A discourse analysis of pop rock lyrics.
María Martínez Casas
Katholische Universität Eichstätt-Ingolstadt (KU) – Germany
Pop rock songs are everywhere, except in corpora. As Kreyer and Mukherjee (2007: 31) point out: "pop song lyrics have not been included in any of the standard reference corpora of present-day British and American English [...]; pop songs are virtually absent from corpus-linguistic research". Research on pop rock songs in Spanish is no exception. The aim of this paper is therefore to present the main language use patterns (Bubenhofer 2009) regarding enunciation (Laferl 2005; Calsamiglia and Tusón 2015) and semantic processes (Halliday 1978; Ghio and Fernández 2008). The patterns are drawn from a corpus of 1,000 pop rock lyrics in Spanish (169,500 tokens). The corpus was compiled following the sociological criteria of consecration and canonization as well as central aesthetic values such as authenticity and hybridization (cf. Val, Noya and Pérez-Colman 2014). It comprises 85 albums released between 1968 and 2015 by artists from over 12 countries. 819 texts were taken from CD booklets or artists' homepages, and 181 lyrics were transcribed from recordings. The texts were then analyzed with both AntConc 3.4.4W and WordSmith Tools 6.0 and finally POS-tagged using TreeTagger.
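AntConc and WordSmith Tools both provide cluster (n-gram) listings around a search term. The following minimal sketch approximates that step for the first- and second-person singular forms discussed in this abstract. It assumes plain, untagged lyrics and ignores the TreeTagger POS tags. The target sets, function names and example line are hypothetical.

```python
import re
from collections import Counter

# Hypothetical target sets built from the pronoun and determiner forms cited in the abstract
FIRST_PERSON = {"yo", "me", "mi"}
SECOND_PERSON = {"tú", "te", "tu"}

def tokenize(text):
    """Lowercase and keep runs of word characters (accented letters included)."""
    return re.findall(r"\w+", text.lower())

def clusters(tokens, targets, size=3):
    """Count contiguous n-grams of length `size` that contain at least one target form."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tuple(tokens[i:i + size])
        if any(tok in targets for tok in gram):
            counts[gram] += 1
    return counts

lyric = "Yo te quiero, tú te vas. Yo te quiero todavía."  # invented example line
print(clusters(tokenize(lyric), FIRST_PERSON).most_common(1))
# → [(('yo', 'te', 'quiero'), 2)]
```

A real analysis would also disambiguate forms such as te or tu with the POS tags rather than relying on surface strings alone.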
In line with prior corpus-linguistic research on pop rock lyrics in English (Murphey 1990; Kreyer and Mukherjee 2007; Werner 2012; Bértoli-Dutra 2014), pop rock discourse in Spanish builds on first-person singular (yo, me, mi) and second-person singular (tú, te, tu) personal pronouns and possessive determiners. The most frequent enunciative structure, as proposed by Laferl (2005: 68), is: "The I addresses itself to a you and talks about their relationship". However, the two main participants in the lyrics show different semantic preferences for process types. The articulate "I" tends to be involved in mental processes (querer, sentir), whereas the "you" carries out material (irse, dar, dejar, llevar) or verbal (decir, pedir) processes. Bértoli-Dutra (2014: 162) grouped semantic categories for factor extraction in her multi-dimensional analysis of pop songs in English. In the Spanish lyrics these categories are distributed as follows: "movement" and "speech" apply rather to the "you", whereas "emotion" appears mainly close to the "I". This paper will present the linguistic representation of the main participants in pop rock lyrics through a discourse analysis of clusters with deictic expressions referring to the "I" and the "you" in the corpus. Special attention will be paid to lexical co-occurrences with tags for clitic and personal pronouns, possessive determiners and verb forms (i.e. lexical and modal verbs as well as ser, estar, haber).
Keywords: song lyrics, discourse analysis, language use patterns, enunciation, semantic processes

SUNCODAC: A Spanish-English corpus of computer-mediated student discussions
Mario Cal Varela, Francisco Javier Fernandez Polo
Universidad de Santiago de Compostela - USC – Spain
In this paper, we present the SUNCODAC corpus of student discussion forums.
Our aims are to explain the corpus's rationale, to describe its compilation process, holdings, design and query tools, and to highlight its potential as a research tool. Computer-Mediated Communication research has gained momentum (Herring et al. 2013). Despite this, CMC corpora (Beißwenger & Storrer 2008) are relatively scarce and poorly representative of the wide variety of CMC settings, notably educational contexts. Existing research on CMC in education is generally based on relatively small corpora, compiled for the special needs and research questions of individual projects. SUNCODAC is a comparatively large corpus of student forum discussions, a key genre in present-day higher education (Rourke et al. 1999; Loncar et al. 2014). The data consist of Moodle-based discussions in an English-Spanish-English translation course over four consecutive years. The corpus contains a balanced representation of English and Spanish used as native and non-native languages by multinational students. In the presentation, we will briefly describe the context of the discussions and the corpus compilation process. SUNCODAC currently holds approximately 450,000 words and, when completed, is expected to total over 600,000 words. The data were anonymized and stored in XML format with metadata on a number of user and contextual variables. These include participants' first language, gender, main language of the post, date, time, topic and thread. Apart from the replacement of participants' names by codes, the texts were left unedited as far as grammar, spelling and other errors are concerned. A dedicated tool was developed for computerized retrieval of the data via the Internet. The tool can be used to search for specific language features. It can also be used to browse and retrieve whole texts or text collections, using one or a combination of the coded variables as filters.
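The abstract does not describe SUNCODAC's XML schema. Purely for illustration, assume that each forum post is a post element whose attributes hold the coded variables (first language, gender, language of the post, thread). Under that assumption, metadata filtering like that offered by the query tool could be sketched with Python's standard xml.etree.ElementTree module. The element names, attribute names and sample posts below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-corpus: one <post> per forum message, metadata as attributes
SAMPLE = """<corpus>
  <post id="p1" l1="es" gender="f" lang="en" thread="th1">I think the translation is too literal.</post>
  <post id="p2" l1="en" gender="m" lang="en" thread="th1">I agree, it sounds unnatural.</post>
  <post id="p3" l1="en" gender="f" lang="es" thread="th2">Creo que la traducción es correcta.</post>
</corpus>"""

def filter_posts(xml_text, **criteria):
    """Return the <post> elements whose attributes match every criterion given."""
    root = ET.fromstring(xml_text)
    return [post for post in root.iter("post")
            if all(post.get(key) == value for key, value in criteria.items())]

def word_count(posts):
    """Whitespace-based word count over the text of the selected posts."""
    return sum(len((post.text or "").split()) for post in posts)

# Combine two variables as filters: English posts written by L1 Spanish users
for post in filter_posts(SAMPLE, lang="en", l1="es"):
    print(post.get("id"), post.text)
```

Storing the variables as attributes keeps each filter a simple equality test. A real schema might instead use TEI-style headers, which would change the lookup but not the idea.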
In the presentation, we will demonstrate some of these functions. The corpus holds considerable potential as a research tool, for instance:
a) to further knowledge of "netspeak"; and, more specifically,
b) to complement existing research on the discussion forum genre (Biber & Conrad 2009) and its characteristic language.
Given its longitudinal nature,
c) it should provide insights into individual and collective processes of genre development in CMC.
In view of its multilingual and multicultural nature, it should also prove particularly useful
d) for language contrasts and
e) for cross-cultural studies of culture-specific communicative practices.
Finally,
f) it should prove valuable for studying learner language and second-language acquisition processes in real-life environments. It should also support pedagogically oriented studies that seek to identify successful forum participation resulting in more effective learning practices, eventually leading to the design of improved training materials.
Keywords: corpus, CMC, forum, Spanish, English, academic discourse, SUNCODAC

Grammatical sequencing for teaching Spanish as a foreign language
Yun Sil Jeon, Alejandro Muñoz-Garcés
Coastal Carolina University (CCU) – Associate Professor, Spanish, United States
The research being carried out jointly by the Università di Firenze and Coastal Carolina University began with a specific aim: to find an automatic way of extracting from spoken-language corpora the simplest constructions produced in speech, and then to progressively examine constructions of greater complexity.
For this research we had several corpora of spoken Spanish:
• C-Or-DiAL (Corpus Oral Didáctico Anotado Lingüísticamente): 120,000 transcribed and tagged words;
• C-ORAL-ROM: tagged and aligned;
• the Minicorpus del Español: 30,000 words, tagged, aligned and marked for information articulation.
Our initial programming work set out to find a way of automatically extracting the simplest utterances from the whole corpus and then progressively extracting the more complex ones. We started from the assumption that a spoken utterance is less complex the fewer tonal units it contains. We therefore took the minimal unit of communication to be an utterance consisting of a single tonal unit. An utterance becomes more complex as its information articulation grows to two or more tonal units. The analysis began with the tags that delimit these tonal units in the C-Or-DiAL corpus. These tags mark where the boundaries of an utterance's intermediate tonal units are perceived (intermediate prosodic breaks) as well as utterance-final boundaries (final prosodic breaks). This tagging made it possible to generate lists of all utterances consisting of one, two, three, and four or more tonal units. The next step of the research is to analyse these lists of utterance types with some of the morphosyntactic analysers available online (GRAMPAL and FreeLing, among others) and decide which one to use. The same extraction and analysis procedure will then be applied to C-ORAL-ROM and the Minicorpus del Español, so that the results can be compared and the differences evaluated.
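The extraction step just described can be sketched roughly as follows. The sketch assumes the C-ORAL-ROM-style convention in which "//" marks a final (terminal) prosodic break and "/" an intermediate (non-terminal) one. The exact C-Or-DiAL tag format, the function names and the sample transcript are assumptions for illustration only. Utterances are split at final breaks, their tonal units are counted, and they are grouped into the 1, 2, 3 and 4+ lists.

```python
from collections import defaultdict

TERMINAL = "//"     # assumed tag for a final (terminal) prosodic break
NON_TERMINAL = "/"  # assumed tag for an intermediate (non-terminal) prosodic break

def split_utterances(transcript):
    """Split a transcript into utterances at final prosodic breaks."""
    return [u.strip() for u in transcript.split(TERMINAL) if u.strip()]

def count_tonal_units(utterance):
    """Count tonal units: non-empty segments separated by intermediate breaks."""
    return len([seg for seg in utterance.split(NON_TERMINAL) if seg.strip()])

def group_by_tonal_units(transcript):
    """Group utterances into lists keyed 1, 2, 3 or '4+' by number of tonal units."""
    groups = defaultdict(list)
    for utt in split_utterances(transcript):
        n = count_tonal_units(utt)
        groups[n if n < 4 else "4+"].append(utt)
    return dict(groups)

# Invented sample transcript
sample = "vale // pues mañana / vamos al cine // si quieres / luego / cenamos / en casa // claro //"
for key, utts in group_by_tonal_units(sample).items():
    print(key, utts)
```

Splitting at "//" before splitting at "/" matters. If the order were reversed, each final break would be read as two intermediate breaks around an empty segment.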
From these analyses we hope to obtain data that are significant, or at least indicative, of what tends to be used in the simplest utterances and what appears in more complex ones. The analysis will then make it possible to reflect on the types of words and constructions that occupy particular positions in the articulation of the utterance. Finally, these data will allow us to propose to teachers of Spanish as a foreign language a sequence for choosing the material to teach in class. Our research expects to yield some indications of what is used in colloquial Spanish, where it is used and how often.
Keywords: grammatical sequence, teaching Spanish, native-speaker corpora, morphological and syntactic analysis

Semantic constraints on MWU formation: Evidence from clinical records.
Leonie Grön, Ann Bertels
Katholieke Universiteit Leuven (KUL) – Belgium
Since Sinclair's (1991) formulation of the idiom principle, the scope of research on multi-word units (MWUs) has widened considerably. While earlier work focused on fixed word sequences, recent research locates MWUs on a continuum ranging from frozen expressions to patterns that allow for paradigmatic choices (Dobrovol'skij 2015; Steyer 2015). In studies on language for special purposes (LSP), the defining criteria centre on the functional value of the unit, and the surface forms may show both lexical and morpho-syntactic variation. Such variation patterns may be attributed to the research area as well as to properties of the textual genre (Hyland 2008; Laso & Salazar 2013). In the medical domain, most related research has focused on scholarly articles (León & Divasson 2006; Laso & John 2013). By contrast, our study investigates the structure of MWUs in clinical records, which lie on the border between oral and written communication.
By analyzing a corpus of Dutch patient records, we aim to reveal patterns in the formation of complex noun phrases (NPs). We predict that structural preferences will pattern with semantic features of the constituents. Our study focused on MWUs relating to the semantic classes Diagnosis (e.g. 'lipodystrofie' lipodystrophy) and Examination (e.g. 'schildklierfunctie' thyroid function). Based on precompiled term lists, we extracted all instances of these classes that were either localized on the human body (Anatomical, e.g. 'onderbeen' lower leg) or specified with regard to severity, etiology or quality (Qualitative, e.g. 'drug-geïnduceerd' drug-induced). We identified roughly four times as many MWUs for Qualitative as for Anatomical, both in raw counts (137,646 vs. 36,862) and in number of patterns (~472 vs. 112). Especially for Qualitative, a small number of conventionalized phrases (e.g. 'gunstig lipidenprofiel' favourable lipid profile) account for a large share of occurrences. Irrespective of the headword class, Qualitative modifiers occur primarily in the left context. By contrast, the formation of Anatomical MWUs shows more structural variation. General types of Examination are premodified (e.g. 'pulmonaal onderzoek' pulmonary examination), whereas technical procedures are localized by nouns in the right context (e.g. 'echo nier' echography kidney). MWUs based on the Diagnosis class involve more detailed localizations, which increases their average length (~2.7 vs. 2.3 tokens for MWUs based on Diagnosis vs. Examination). In MWUs with multiple modifiers, the internal order of the constituents is determined by their semantic class and their level of generality. Adjectives designating a particular body part (e.g. 'abdominaal' abdominal) are strongly tied to the headword, whereas relative spatial modifiers and Qualitative specifications are found in the periphery (e.g.
'stenose thv de arteria carotis links' stenosis in the arteria carotis left).
We conclude that the formation of MWUs in clinical writing is guided by domain-specific constraints. In NPs relating to clinical findings and procedures, the type and relative position of modifiers vary systematically with the semantic properties of the constituents. These findings confirm that the study of MWUs in LSP benefits from a delexicalized approach, in which patterns of conceptual types form the basis of investigation.
Keywords: clinical language, Dutch, MWUs, concordance analysis

On the quasi-synonymy of poner and meter in Spanish: a logistic regression analysis of two locative verbs.
Marie Comer
Ghent University – Belgium
In this paper we compare the syntax and semantics of the two main locative verbs of contemporary Peninsular Spanish, poner and meter, using an extensively annotated corpus. In their basic meaning, these quasi-synonymous verbs express the displacement of an entity (the 'Figure') from one place to another (the 'Ground') (Cifuentes 2000) (1). However, the use of poner and meter goes beyond the basic locative meaning (Authors 2015). Both verbs are used as transfer verbs (2), in pseudo-copulative uses (3), and in causative and inchoative periphrases (4):
(1) poner el mantel en la mesa 'put the tablecloth on the table' - meterse un chupete en la boca 'put a pacifier in one's mouth'
(2) poner una multa a alguien 'give someone a fine' - meter muchos deberes a alguien 'give someone a lot of homework'
(3) ponerse nervioso 'get nervous' - meterse monja 'become a nun'
(4) ponerse a reír 'start laughing' - meterse a trabajar 'start working'
In each of these fields, poner and meter behave as quasi-synonyms. This means that they are interchangeable in certain contexts (ponerse/meterse a estudiar 'start studying'), but not in others (El río se mete/*se pone en el mar 'The river flows into the sea'; *ponerse monja). The aim of this presentation is twofold.
First, on the basis of an arbitrarily sampled, manually compiled corpus of 2,000 occurrences (1,000 of each verb, extracted from the CORPES XXI, CORLEC and C-ORAL-ROM databases and extensively tagged for syntax and semantics), we examine to what extent the semantic cores mentioned above actually stand out with these verbs. Second, we carry out a more detailed study of the locative use (1) in order to detect parallels and differences between poner and meter. The analysis relies on an extensive set of variables that potentially influence the choice between the two verbs in their locative use. For this use, the parameters studied include: (a) the direction of the Figure’s displacement with respect to the Ground; (b) the dimension of the Ground; (c) the presence or absence of a contact zone between Figure and Ground; (d) the physical shape of the Figure; (e) whether a container reading is possible and, if so, the degree of containment (partial/complete); (f) the animacy and the concrete/abstract character of the participants; and (g) the literal or metaphorical interpretation of the placement event. Using logistic regression analysis, we study the potential impact of the whole set of variables on the preference for one of the two verbs. Our pilot study revealed that meter specializes in events where the Ground acquires a container reading, as in meter el pañuelo dentro del bolsillo ‘put the handkerchief inside the pocket’ (Authors 2016; Cifuentes 2004; Cifuentes & Jesús Llopis 1996), since it prefers an internal location, whereas poner is used across a variety of locative events. Other differentiating factors are the syntactic reflexivity of the locative event and the semantics of the participants. The present study illustrates how an advanced multivariate statistical method can be applied to determine the difference between two lexical quasi-synonyms.
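Logistic regression, as named above, models the probability of one outcome (here, choosing meter rather than poner) as a logistic function of a weighted sum of predictors. The sketch below only illustrates the technique: the two binary predictors (`container_reading`, `reflexive`) loosely echo the container and reflexivity variables mentioned in the abstract, but the toy data, the plain gradient-descent fitting and all names are our own assumptions, not the author's model, data or results. A real study would use a statistics package that also reports standard errors and significance.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: list[float], x: list[float]) -> float:
    """P(meter | x) under weights = [intercept, coef_1, ..., coef_k]."""
    score = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(score)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.5, epochs: int = 5000) -> list[float]:
    """Fit [intercept, coefficients] by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, target in zip(X, y):
            # For the log-loss, d(loss)/d(score) is simply (predicted - observed).
            error = predict_proba(weights, x) - target
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: features = [container_reading, reflexive]; 1 = meter, 0 = poner.
X = [[1, 0], [1, 0], [1, 1], [1, 0], [0, 0], [0, 1],
     [0, 0], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

weights = fit_logistic(X, y)
p_container = predict_proba(weights, [1, 0])     # container reading, not reflexive
p_no_container = predict_proba(weights, [0, 0])  # no container reading, not reflexive
print(f"coefficients: {weights}")
print(f"P(meter | container) = {p_container:.2f}, P(meter | no container) = {p_no_container:.2f}")
```

On this invented sample the container coefficient converges to about ln 4 ≈ 1.39 (an odds ratio of about 4) and the reflexivity coefficient to about 0, so a container reading raises the predicted probability of meter from about 1/3 to about 2/3.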
Keywords: quasi-synonymy, locative verbs, logistic regression, multifactorial analysis

Spanish Fragments and Polar Verbless Clauses. Typology and Corpus Distribution
Oscar Garcia-Marchena
Laboratoire de Linguistique Formelle (LLF) – Université Paris VII - Paris Diderot, CNRS : UMR7110 – Case Postale 7031, 5 rue Thomas Mann, 75205 Paris cedex 13, France
The properties and use of fragments (or elliptical clauses) have recently received attention in several works (Fernandez 2002, Merchant 2004, Schlangen 2003). There is no agreement, however, concerning their nature and classification. Firstly, some authors treat them as purely syntactic units: the remnants of verbless clauses which have undergone ellipsis (Merchant 2004). Secondly, others classify them as pragmatic objects, distinguished from non-elliptical clauses by their function in discourse (Schlangen 2003). Thirdly, other works stress their independence from non-elliptical clauses and classify them using a combination of syntactic and pragmatic criteria (Fernandez 2002). The aim of this paper is to show the extent to which Spanish fragments and polar verbless clauses (“yes”, “no”) can be analysed as syntactic or discourse units, to establish a typology based on their syntactic and pragmatic properties, and to present their distribution across the different genres of a corpus. To this end, we retrieved all the fragments in the corpus of contemporary oral Spanish (CORLEC) (Marcos Marín 1992), which comprises more than 63,000 utterances, and classified them according to their syntactic and pragmatic properties. Finally, we counted the frequency of each type in the different genres. The results of this analysis indicate that fragments containing a segment with a counterpart in their source have a predictable discursive relationship with it: they perform a particular speech act (answer, agreement, correction, check question, etc.)
that is determined by the syntactic and semantic properties of the source and target clauses. This combination of properties is detailed in the following list, with reference to constructed examples of the various speech acts:
• Interrogative source & asserting target: answer (1)
• Interrogative source & questioning target: answer + check question (2)
• Questioning declarative source & asserting target & same referent: agreement (3)
• Questioning declarative source & asserting target & different referent: correction (4)
• Questioning declarative source & questioning target & same referent: check question (5)
• Questioning declarative source & questioning target & different referent: correction (6)
• Asserting declarative source & asserting target & same referent: acknowledgement (7)
• Asserting declarative source & asserting target & different referent: correction
• Asserting declarative source & questioning target & same referent: check question
• Asserting declarative source & questioning target & different referent: repair
(1) A: ¿Cuándo vino? B: Hoy. (A: ‘When did he come?’ B: ‘Today.’)
(2) A: ¿Cuándo vino, hoy? (A: ‘When did he come, today?’)
(3) A: ¿Se fue con Mar? B: Con Mar. (A: ‘Did he go with Mar?’ B: ‘With Mar.’)
(4) A: ¿Se fue con Pedro? B: Con Mar. (A: ‘Did he go with Pedro?’ B: ‘With Mar.’)
(5) A: ¿Se fue con Pedro? B: ¿Con Pedro? (A: ‘Did he go with Pedro?’ B: ‘With Pedro?’)
(6) A: ¿Se fue con Pedro? B: ¿Con María? (A: ‘Did he go with Pedro?’ B: ‘With María?’)
(7) A: Se fue con Pedro. B: ¡Con Pedro! (A: ‘He went with Pedro.’ B: ‘With Pedro!’)
In this way, this article will show the illocutionary effects of combining syntactic and semantic properties in the source and target clauses of Spanish fragments and polar verbless clauses, as well as the distribution of the resulting speech acts across the genres of the CORLEC corpus.
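The list above is in effect a decision table, so it can be encoded directly as a lookup. The sketch below does so; the string labels and the function name `speech_act` are illustrative assumptions, and the row pairing an asserting declarative source with a questioning target is split by referent, as ‘check question’ for the same referent and ‘repair’ for a different one.

```python
from typing import Optional

# (source clause type, target clause type, same referent?) -> speech act.
# None in the third slot means the rule does not depend on the referent.
RULES: dict[tuple[str, str, Optional[bool]], str] = {
    ("interrogative", "asserting", None): "answer",
    ("interrogative", "questioning", None): "answer + check question",
    ("questioning_declarative", "asserting", True): "agreement",
    ("questioning_declarative", "asserting", False): "correction",
    ("questioning_declarative", "questioning", True): "check question",
    ("questioning_declarative", "questioning", False): "correction",
    ("asserting_declarative", "asserting", True): "acknowledgement",
    ("asserting_declarative", "asserting", False): "correction",
    ("asserting_declarative", "questioning", True): "check question",
    ("asserting_declarative", "questioning", False): "repair",
}


def speech_act(source: str, target: str, same_referent: bool) -> str:
    """Return the speech act predicted for a fragment given its source and target clause."""
    # Try the referent-independent rule first, then the referent-specific one.
    for key in ((source, target, None), (source, target, same_referent)):
        if key in RULES:
            return RULES[key]
    raise ValueError(f"no rule for source={source!r}, target={target!r}")


# Example (4): ¿Se fue con Pedro? / Con Mar.
print(speech_act("questioning_declarative", "asserting", same_referent=False))
```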
Keywords: Fragments, non-sentential utterances, polar verbless clauses, Spanish, corpus, speech acts

Spoken Language Corpora under Examination
Hanna Hedeland, Daniel Jettka
Hamburg Centre for Language Corpora, University of Hamburg – Germany
Contributing to the current discussion on the reuse and citation of corpora and the replicability of corpus-based research, this contribution describes evolving methods for corpus publication and dissemination at a research data centre and outlines a revised model of spoken language corpora as complex, dynamic linguistic resources. Within emerging digital research infrastructures (e.g. CLARIN), digital repositories have been set up for the dissemination of resources, including spoken language corpora. While this current best-practice approach obviously has many benefits, its implementation raises several questions regarding resource-type-specific aspects of data modelling and versioning. By comparison with the previous web-based solution, this contribution discusses these questions and their implications, which are highly relevant to research based on spoken language corpora.
Website resources
The vast majority of the resources hosted by the centre (cf. [1]) are XML data sets created from heterogeneous legacy data using the EXMARaLDA system [2]. The EXMARaLDA Corpus Manager (Coma) provides a basic data model for corpora comprising communications, speakers, transcriptions, recordings and additional files, as well as the Coma metadata file itself. For publication and dissemination, however, corpus-specific methods based on the EXMARaLDA system were used to create a number of export and presentation formats (i.e. visualizations, mainly HTML-based) and statistics from the source files, resulting in a much more comprehensive and complex resource.
The protected resources were accessed via a public web page containing further background information and documentation.
Repository resources
Since a digital repository enforces concepts such as persistent identifiers, versioning of digital objects and ingest/dissemination services, the initial corpus data model for the repository comprised only the original source files (cf. [3]), whereas basic visualization and export functionality was implemented by generic web services provided via the repository. This solution brought about two important differences. First, the resource is no longer a collection of static web pages and files; the user interacts with web services that change as the target formats or the services themselves are further developed. Second, to allow generic web services to present specific corpora appropriately (e.g. corpora for research on (child) language acquisition or regional varieties), the type-specific characteristics of each corpus related to its design, annotation layers and transcription conventions need to be made explicit and applied as configuration parameters for resource dissemination.
Discussion
Most importantly, the requirement of citable corpus versions makes it necessary to explicitly track the versions of the web services and other components used for dissemination as well. As a recent study [4] confirms, users of this type of corpus often mainly analyze visualized transcripts, whose characteristics are known to influence analysis (cf. [5]). Furthermore, while merely applying corpus-specific parameters in web services is straightforward, defining such parameters and classifying spoken language corpus types requires thorough investigation and interpretation.
Such a typology can be used either to ensure a presentation consistent with the original research questions and frameworks of the various resources or, conversely, to allow for a more consistent user experience by applying certain settings across corpus types in the repository. In our contribution we will discuss this revised and extended model of spoken language corpora in more detail.
Keywords: spoken language corpora, replicability, research infrastructures

Strategies for Processing Large Corpora for Linguistic Inquiry and Natural Language Processing Tasks
Antonio Moreno-Ortiz
Universidad de Málaga (UMA) – Spain
Very large corpora (over a billion words) have become increasingly available to Corpus Linguistics (CL) and Natural Language Processing (NLP) researchers. However, such text collections are offered with little or no filtering and processing of their content. This is a non-issue for some tasks, such as KWIC concordancing or collocate extraction, owing to the sheer volume of data available and, in some cases, the availability of web-based query environments. However, processing the raw text to obtain accurate, linguistically driven statistical information from such corpora, with a view to using it for more advanced tasks, calls for sophisticated pre-processing in terms of filtering and word tokenization. This basic step is critical to all the others, since it involves such fundamental decisions as what counts as a word. This is even more relevant when a corpus is compiled from online resources, which commonly include a fair number of non-lexical and pseudo-lexical items, such as typical computer-mediated communication items (URLs, handles, hashtags), as well as numbers, measures, formulas, etc.
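To illustrate the kind of pre-tokenization step discussed here, the sketch below uses regular expressions to separate common non-lexical and pseudo-lexical items (URLs, e-mail addresses, hashtags, handles, numbers) from ordinary word tokens, and then computes a type/token ratio over word tokens only. The abstract does not give the author's rules; the patterns, class labels and function names are our own simplified assumptions (for instance, a URL followed directly by punctuation will absorb that punctuation).

```python
import re
from collections import Counter

# Illustrative patterns (assumptions, not the author's rules). At each position
# the alternatives are tried in this order, so more specific classes come first.
TOKEN_PATTERNS = [
    ("URL", r"https?://\S+|www\.\S+"),
    ("EMAIL", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    ("HASHTAG", r"#\w+"),
    ("HANDLE", r"@\w+"),
    ("NUMBER", r"\d+(?:[.,]\d+)*%?"),
    ("WORD", r"[^\W\d_]+(?:['’-][^\W\d_]+)*"),
    ("PUNCT", r"[^\w\s]"),
]
TOKENIZER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (token, class) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKENIZER.finditer(text)]


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct types divided by number of tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = ("Mail info@example.org or follow @corpusling and #corpus "
          "at https://example.org for 20 new texts")
pairs = tokenize(sample)
class_counts = Counter(cls for _, cls in pairs)
words = [tok.lower() for tok, cls in pairs if cls == "WORD"]
print(class_counts)
print(type_token_ratio(words))
```

Keeping the class label on every token means frequency lists, part-of-speech statistics and ratios such as TTR can be computed either with or without the non-lexical items, which makes the effect of that decision visible.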
If no special treatment is given to non-lexical and pseudo-lexical items, they will inevitably affect word frequency counts at all levels, including part-of-speech frequencies, n-gram extraction, statistical language modeling and, in general, any task that builds on these. Accurately determining the frequency of word classes, as established by part-of-speech assignment, is critical to a number of common corpus linguistics metrics, such as lexical density. In this work we examine the role that certain non-lexical and pseudo-lexical items (e.g. cardinal numbers, hashtags, URLs, e-mail addresses) play in currently available corpora obtained from the Web. Specifically, we focus on GloWbE (Davies, 2013), a large corpus (1.9 billion words) available both for online queries and as a full-text download in different formats, including a tokenized, part-of-speech-tagged and lemmatized version. We show that in such web-based corpora non-lexical items exhibit high frequency and should therefore be given special treatment in order to obtain adequate statistics for common corpus linguistic metrics, such as type/token ratio, word class frequency and the metrics derived from them. We then propose certain cues for the proper treatment of such corpora in terms of pre-processing, tokenization and part-of-speech tagging. During this process, we identified certain pre-processing flaws in the original corpus that led to inaccurate results, and we propose ways to overcome them. Finally, we describe the results of our segmentation and part-of-speech tagging, compare them with those of the original tagged version of the GloWbE corpus, and go on to show the impact that different pre-processing approaches have on certain types of corpus queries, as well as on n-gram extraction.
References
Davies, Mark. (2013). Corpus of Global Web-Based English: 1.9 billion words from speakers in 20 countries.
Available online at http://corpus.byu.edu/glowbe/
Keywords: large corpora, corpus processing, tokenization, part-of-speech tagging

Students’ use of the n-grams tool to learn about phraseology in academic writing
Maggie Charles
Oxford University – United Kingdom
This paper focuses on the use of recurring multi-word units (MWUs) that are fixed or semi-fixed in form. In academic writing, MWUs have been investigated under various terms, including ‘lexical bundles’ (Biber et al., 1999; Cortes, 2004) and ‘clusters’ (Hyland, 2008a), and research has shown that their occurrence differs according to discipline (Hyland, 2008b). Moreover, there are considerable discrepancies in MWU use between learner and expert academic writing (Cortes, 2004; Hyland, 2008a), with learners typically employing different MWUs from expert writers and/or using them for different purposes. The use of MWUs thus presents challenges to learners of English for Academic Purposes, and even advanced-level students consequently need to develop proficiency in academic phraseology (Gilquin et al., 2007). The present paper addresses this issue by investigating students’ use of the n-grams tool in the AntConc software (Anthony, 2014). The n-grams tool lists all sequences of words that occur in a corpus, with the number of words in each sequence determined by the user. This study draws on students’ work during a 6-week, 12-hour course on ‘Editing your Thesis with Corpora’. For this course, doctoral students built two do-it-yourself corpora: 1) a corpus of expert writing constructed from research articles (RAs) in their own field; and 2) a corpus of learner writing consisting of draft chapters of their own doctoral thesis. Each student thus worked with two corpora tailored to their own specific needs.
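The behaviour of an n-grams tool of the kind described above can be approximated in a few lines. The sketch below lists word tri-grams with their frequencies and compares an ‘expert’ text with a ‘learner’ text, mirroring the comparison students made between their two corpora. The two sentences are invented for illustration, and the simple tokenizer (lower-cased letters only, with n-grams allowed to cross punctuation) is an assumption rather than AntConc’s actual behaviour.

```python
import re
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(text: str, n: int = 3) -> Counter:
    """Lower-cased word n-grams with their frequencies (punctuation is ignored)."""
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(" ".join(gram) for gram in ngrams(tokens, n))


# Invented examples standing in for an expert RA corpus and a learner draft.
expert = "Despite the fact that the effect of temperature was small, the effect of pressure was large."
learner = "Due to the fact that the sample was small, we repeated the test."

expert_trigrams = ngram_frequencies(expert)
learner_trigrams = ngram_frequencies(learner)
print(expert_trigrams.most_common(1))  # [('the effect of', 2)]

# Tri-grams the learner uses that never occur in the expert text.
only_in_learner = sorted(set(learner_trigrams) - set(expert_trigrams))
print(only_in_learner)
```

Here ‘due to the’ appears only in the learner text, while ‘the fact that’ is shared, which is the kind of contrast the students went on to investigate with further searches.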
In the session on n-grams, students were shown the AntConc n-grams tool, and each learner made an individual list of three-word sequences (tri-grams) from their corpus of expert RA writing. As n-gram retrieval is automatic, it was hypothesised that the tool would help students to identify the tri-grams used in their own field and thus provide a means of highlighting appropriate academic phraseology. Students were then asked to study the most frequent tri-grams on the list and to perform further corpus searches to understand and explain what they noticed, comparing, where necessary, the findings from the expert corpus with those from their own writing. The data used in this paper currently consist of the corpora constructed by 15 students and the worksheets they completed in class, which give details of the most frequent tri-grams they found and comment on what they learnt from their findings. The most frequently mentioned tri-grams were ‘as well as’ (found by 11 learners), ‘in terms of’ (6 learners), and ‘the fact that’, ‘the effect(s) of’ and ‘in order to’ (4 learners each). Following the categorisations of the ‘Academic Formulas List’ (Simpson-Vlach & Ellis, 2010), ‘as well as’, ‘the effect(s) of’ and ‘in order to’ have discourse-organizing functions, while ‘the fact that’ and ‘in terms of’ are referential expressions. After further investigation, students often commented on differences they found between their writing and that of the experts. For example, after researching ‘the fact that’, one student noted that she used ‘due to the fact that’, which did not appear in the RA corpus, where ‘despite the fact that’ was prevalent. This paper reports in more detail on the student data and argues that the n-grams tool provides a useful way of promoting the noticing and understanding of academic phraseology at an advanced level.
Keywords: academic writing, n-grams, academic phraseology, corpus tools, EAP learners

Teachers’ Dispositions Towards the Use of Corpus-Based Approaches in Teaching English as a Foreign Language in Higher Education
Awatif Alruwaili
University of Nottingham – United Kingdom
Despite the development and increased use of corpora as a resource in language learning, there is little evidence that corpora are used as alternatives to textbooks and traditional resources such as dictionaries (Chambers, 2005). Corpus use has not changed significantly since Chambers’s (2005) article, as later studies such as Boulton (2010) and Römer (2009) have revealed. Published research has shown improvements in learner performance and positive attitudes in higher education, providing broad support for the use of a corpus approach in an English as a foreign language (EFL) context. Nonetheless, implementing this approach in daily teaching is still a distant goal. Many researchers (e.g. Boulton, 2009, 2010; Hughes, 2012; Römer, 2009) have expressed concern about the infrequent use of corpora in everyday classrooms. Several authors have also confirmed the key role that teachers play in applying the corpus approach in language teaching (Frankenberg-Garcia, 2012). Given the promising results of previous research on the importance of investigating teachers’ attitudes towards the corpus approach, the present study sought to widen the existing perspective on using corpora in language classrooms. Teachers’ willingness to apply the approach is clearly a necessary step in popularising it. This study was particularly interested in ways to transform classrooms into learning environments that truly facilitate the use of corpus-based approaches for learning English in an EFL context. This transformation can be facilitated by introducing teachers to corpus-based approaches and their applications in teaching English, which could help to inform language instructors and shape their attitudes.
This study’s aim, therefore, was to explore teachers’ dispositions towards the use of corpora in language classrooms, a question that only two previous studies have addressed for in-service instructors (Mukherjee, 2004; Tribble, 2015). To this end, the first phase of the present study involved designing an introductory course to show language instructors possible ways of using corpora in the classroom. Next, I evaluated in-service teachers’ attitudes towards classroom uses of the corpus approach by developing and administering a questionnaire. Finally, I identified possible factors that can affect instructors’ opinions of using corpora in the EFL classroom. The introductory course consisted of two sessions, each lasting one hour and 30 minutes with a 15-minute break. The sessions were offered multiple times to accommodate teachers’ availability. The course content consisted of three units: teaching about corpora, exploiting corpora to teach language, and teaching to exploit corpora. The participants were 57 in-service teachers working in higher education programmes. An exploratory design was selected for developing the questionnaire, in which a semi-structured interview was used to generate material on themes and to list possible variables in addition to those found in the related literature. The questionnaire covered five themes related to corpus use in the classroom: usefulness, difficulty, practicality, confidence and anxiety, and implementation. The tri-component model of attitude was used as the theoretical framework for constructing the questionnaire because this model is widely known and accepted (Vandewaetere & Desmet, 2009).
This framework provides a comprehensive view of attitudes (in this case, towards corpus use in the language classroom) by capturing their three components: cognitive, affective and behavioural. Overall, teachers had moderate to positive attitudes.
Keywords: Corpus-based approaches, in-service teachers, classroom

The Developmental Relationship between Spoken and Written Clause Packaging in an English Secondary School
Mark Brenchley
Graduate School of Education, University of Exeter – 216 Baring Court, St Luke’s Campus, Heavitree Road, Exeter EX1 2LU, United Kingdom
This paper will detail the findings of a new study into the relationship between L1 spoken and written syntax during the secondary phase of the English education system, situating them within the context of other recent studies of L1 development during the school years and discussing their implications for L1 English curricula. Working within a framework of “linguistic literacy” and a wider model of “rhetorical” competence, according to which L1 speakers and writers must not only learn the core forms of a language but also develop the capacity to put these forms to work effectively across a range of literate contexts (Berman & Ravid, 2009; Ravid & Tolchinsky, 2002; cf. Biber, 1988, 1992; Hymes, 1976), the present study had two aims. Firstly, to provide a better understanding of the relationship between spoken and written syntax during an apparently critical period in the development of L1 English (Berman & Ravid, 2009; Myhill, 2009). Secondly, to provide evidence that can better inform and support contemporary L1 English curricula, which increasingly emphasise the explicit teaching of grammar (ACARA, 2016; DfE, 2014). To this end, a bespoke corpus of 180 pairs of spoken and written L1 expository texts was directly elicited from students attending a mainstream secondary school in Southern England.
The corpus was further designed to be balanced across two developmental axes: (a) the student’s year group, and (b) their National Curriculum attainment level. The corpus was then analysed in terms of the students’ modality-related use of clause packaging, construed here as the various means by which clauses are combined via coordination and subordination (cf. Berman & Slobin, 1994). The analysis indicates that adolescent students at these ages and attainment levels are at a stage where they can and do differentiate their syntax by modality, at least for these texts and measures. It also found that this differentiation varied according to the particular kind of packaging measured. Thus, the spoken texts exhibited more t-units per t-unit complex and more clauses per t-unit, together with a greater prominence of finite adverbial and post-verbal complement clauses. Conversely, the written texts exhibited a greater overall prominence of non-finite clauses, whilst the two modalities were indistinguishable in terms of clause length and in their respective proportions of relative clauses and phrasal clauses. Finally, this differentiation was found to be developmentally static, with participants handling their modality-related syntax in much the same way regardless of their age or attainment level. Overall, these findings are interpretable in terms of the participants tapping into the differential production conditions of speech and writing, but without necessarily fully exploiting these conditions (Biber, 1988, 1992). Moreover, when placed in the context of the wider evidence base (Berman, 2008; Myhill, 2008; Nippold, 2007; Nippold & Scott, 2010; Ravid & Tolchinsky, 2002), the findings suggest two additional conclusions.
Firstly, they indicate that students at these ages and attainment levels are at a stage where their syntactic output is more in line with the discourse of mature speakers and writers. Secondly, they indicate that modality is an aspect of student syntax characterised by a potentially high degree of sensitivity to the various communicative features of the wider discourse context.
Keywords: Education, English, L1, Later Language Development, Modality, Register Variation, Syntax

The Psycholinguistic Profile of Domestic Abusers: A Corpus-Based Approach
Ángela Almela 1, Gema Alcaraz-Mármol 2, Pascual Cantos 3, Carole Chaski 4, Clara Pallejá 5
1 Centro Universitario de la Defensa - UPCT (CUD) – Centro Universitario de la Defensa. Base Aérea de San Javier, C/ Coronel López Peña s/n, 30720 Santiago de la Ribera, Murcia, Spain
2 Universidad de Castilla la Mancha (UCLM) – Spain
3 Universidad de Murcia (UM) – Spain
4 Institute for Linguistic Evidence (ILE) – United States
5 Centro Universitario de la Defensa - UPCT (CUD) – Spain
Gender-based violence is receiving close attention from professionals and researchers in the legal, criminal and psychological fields, who explore several aspects related to both the victim and the abuser. In some cases, the phenomenon of gender-based violence shows the direct relationship between language and society. In fact, some stylistic methods show how social structures and language are interwoven in the abuser’s discourse. However, the language produced by those involved in gender-based violent acts has hardly been explored from a computational-linguistic perspective (Almela, Alcaraz-Mármol & Cantos, 2015; Hancock et al., 2011). This paper presents a pilot study on differentiating the language of domestic abusers from that of a control group. The domestic abusers have been convicted of a violent crime in the domestic context, while the control group members have not.
The main aim is to shed some light on the psycholinguistic profile of gender-based abusers in Spanish from an empirical viewpoint, following the scientific practices promoted by Chaski (2013). This profile is meant to lay the groundwork for a database that will be compared with other criminals’ speech. Our research is still at an initial stage, but we have already designed the methodology for analysing the morphological characteristics of the gender-based abuser’s discourse, as opposed to the speech of those convicted of other sorts of crimes and of a control group. Specifically, the linguistic sample for our analysis consists of written interviews completed by subjects who have been accused and/or convicted of gender-based abuse. The computational analysis involves several stages, such as POS tagging, punctuation tagging and the evaluation of markedness, as well as the assessment of lexical choice and the identification of morphosyntactic patterns, which will allow us to distinguish the abuser’s sublanguage from that of the control group. We present the results of analysing the two groups’ linguistic behaviour in writings responding to the same stimuli, together with the results of clustering and classification aimed at determining how reliably the language of domestic abusers can be differentiated statistically. The authors will also comment on some of the hindrances encountered in data collection, which have complicated completion of the work schedule as initially planned, and will show that the use of language as evidence in the framework of forensic linguistics in Spain is still in its infancy.
REFERENCES
Almela, A., Alcaraz-Mármol, G. and Cantos, P. (2015). Analysing deception in a psychopath’s speech: a quantitative approach. DELTA 31 (2): 559–572.
Chaski, C.E. (2013). Best practices and admissibility of forensic author identification.
Journal of Law and Policy 21 (2): 333–372.
Hancock, J. T., Woodworth, M. T. and Porter, S. (2011). Hungry like the wolf: A word-pattern analysis of the language of psychopaths. Legal and Criminological Psychology 2011, 1–13.
Keywords: domestic abusers, forensic linguistics, psycholinguistic profile, clustering, classification

The XML Annotation of A Corpus of Historical English Law Reports 1535-1999: A Progress Report
Paula Rodríguez-Puente
University of Oviedo – Spain
A Corpus of Historical English Law Reports (CHELAR; Rodríguez-Puente et al. 2016) is a specialised corpus consisting of law reports dating from the period 1535–1999. Law reports are records of judicial decisions which are “cited by lawyers and judges for their use as precedent in subsequent cases” (Encyclopædia Britannica Online s.v. law report); they typically contain an account of all the facts of the case, the arguments of the judge, his reasoning, the judgment he arrives at and the kind of authority and evidence he uses. The corpus contains approximately half a million words. It is structured into nine periods of 50 years each, except for the first subperiod, which covers 1535 to 1599. It is already available as plain text and with POS annotation (CLAWS C7; see Garside 1987). In previous work we described the first difficulties encountered in creating the corpus texts, as well as the editorial decisions initially taken (Rodríguez-Puente 2011); Fanego et al. (2017) provide an account of the final structure of the corpus and the types of documents it contains, together with a description of how the raw and POS-annotated texts were compiled. In this presentation we report on the process of XML annotation of the corpus. CHELAR is currently being annotated following the TEI P5 Guidelines for Electronic Text Encoding and Interchange developed by the Text Encoding Initiative Consortium (Bray et al. 2008).
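As an illustration of TEI P5 encoding applied to the kinds of renditional, structural and conceptual features a law report contains, the sketch below builds a small fragment with Python’s standard `xml.etree.ElementTree`: `<hi rend="italic">` for a renditional feature, `<p>`, `<lb/>` and `<pb/>` for structural ones, and `<name type="case">` and `<foreign xml:lang="la">` for conceptual ones. These are standard TEI elements, but the abstract does not list CHELAR’s actual tag set, so the element choices, attribute values and the sample sentence (including the case name) are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang
ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_sample_body() -> str:
    """Build a hypothetical TEI <body> fragment and return it as a string."""
    body = ET.Element(tei("body"))
    ET.SubElement(body, tei("pb"), n="12")               # structural: page break
    p = ET.SubElement(body, tei("p"))                    # structural: paragraph
    p.text = "As held in "
    case = ET.SubElement(p, tei("name"), type="case")    # conceptual: name of a case
    title = ET.SubElement(case, tei("hi"), rend="italic")  # renditional: italics
    title.text = "Smith v. Jones"                        # hypothetical case name
    case.tail = ", the remark was"
    lb = ET.SubElement(p, tei("lb"))                     # structural: line break
    lb.tail = " made "
    foreign = ET.SubElement(p, tei("foreign"), {XML_LANG: "la"})  # conceptual: foreign words
    foreign.text = "obiter dictum"
    foreign.tail = "."
    return ET.tostring(body, encoding="unicode")


print(build_sample_body())
```

Building the markup programmatically, rather than by hand, is one way to keep a deliberately small tag inventory consistent across a large corpus.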
TEI XML encoding has become standard practice in digitally based humanities research on present-day English and in diachronic corpora. More specifically, we focus on the particular structure and contents of law reports and on the specific XML tags used for our purposes. We advocate a modest XML tagging that covers some renditional (e.g. italics), structural (paragraphs, line breaks, page breaks, etc.) and conceptual (foreign words, proper names, names of cases, etc.) features of the texts. In sum, although the annotation possibilities of the TEI-XML schema are virtually unlimited, we selected only those tags that satisfy the needs of our texts while still facilitating a varied range of corpus analyses. This paper provides an account of the decisions made, together with a progress report on the annotation process itself. At present we have completed the annotation of the first two subperiods to be annotated (1950–99 and 1900–49), and we hope to complete the annotation of the whole corpus by the end of 2017.
Keywords: corpus annotation, XML, law reports

The construction of shared feelings: analysis of affect in a corpus of obituary comments in online newspapers
Isabel Corona
Universidad de Zaragoza (UNIZAR) – Facultad de Filosofía y Letras, Pedro Cerbuna 12, 50009 Zaragoza, Spain
The comments section in online newspapers is a slot below an article’s body text where readers may post their opinion on that piece of news. Online newspapers began offering comment boards a decade ago to engage readers in the news process, thus creating a new context for expression and engagement (Yzer and Southwell 2008) within the general ‘connecting’ mantra. Journalistic obituaries, which have a long-standing tradition in all sorts of newspapers, are life stories seen in retrospect.
They are narratives of lives with a purpose set by the newspaper, either to praise or to condemn, and they become life lessons that guide or reinforce the values of a community of readers presumed to share the same socio-cultural or political principles. Evaluation of the subject has thus been an intrinsic feature of obituaries. The subjects' lives are sanctioned as complying with or deviating from role-specific parameters, in ways that construe a particular version of collective memory reflecting the values of the media institution. This collective memory can now be challenged by new media affordances that open up a space for individual reactions to it. Through the comments section, which could be viewed as a new 'social tool', "prior readers become co-participants in the coproduction of the text's meanings" (Page and Thomas 2011: 10): they may express emotional reactions to the subject's behaviour and to his or her public legacy as a role model, and get an immediate response from other participants. The users' discursive acts, although separate from the main text, construe another discursive context that may or may not agree with the newspaper's assessment of the subject. The main aim of this study is to explore commentators' use of evaluative expressions to construe affect towards the life story of a public persona in digital media, in order to assess how media users establish a new space for shared feelings. For this purpose, the corpus comprises 840 comments posted on the obituaries published by five online newspapers (Daily Mail (UK), The Daily Telegraph (UK), The Guardian (UK), The Huffington Post (US edition) and The Washington Post (US)) after the death of the Spanish Duchess of Alba.
The study is grounded in Collective Memory as an umbrella concept that "defines relations between the individual and the community to which she belongs and enables the community to bestow meaning upon its existence" (Neiger et al. 2011: 4). The analysis applies the framework of Appraisal Theory (Martin 2004; Martin and White 2005; White 2001) to explore the attitudinal values used to construe a community of shared values. It focuses on the attitudinal realm of "Affect", mapping the commentators' reactions in terms of happiness, admiration, satisfaction, desire and solidarity towards the obituarised subject. The analysis of explicit attitudinal instantiations of Affect reveals a clearly positive emotional response from readers turned users, with prototypical expressions of sorrow (so productive in the construction of community identity) and a high frequency of desiderative expressions operating as ritual formulae. All of these features, referred to by obituarists as "dread clichés" (Massingberd 1995: viii) and banned in all quality newspapers, challenge tacitly accepted norms of what is considered good obituary writing.
Keywords: Collective Memory, obituaries, online comments, Appraisal, Computer Mediated Communication (CMC)

The implied consumer in British hotel websites
Carmen Gregori-Signes, IULMA, Universitat de València (IULMA-UV), Facultat de Filologia, Blasco Ibáñez 32, 46010 Valencia, Spain

Hotel websites are a discourse type within e-tourism that intertwines textual and visual strategies (cf. Cheng 2016), with the primary purpose of persuading website visitors to become customers. This paper focuses on the interpersonal rhetorical functions of engagement, i.e. the lexicogrammatical choices (cf. Hyland 2005) that hotel website designers use as a strategy to create a bond between the addresser (i.e. the hotelier) and the addressees (i.e.
the potential clients), within the framework of 'business-to-consumer' (B2C) marketing practice in e-commerce. As a framework for the analysis, the paper adopts Stern's (1994) interactive communication model and focuses on the implied consumer, i.e. the construct of the imagined consumer within the message, and on how the relationship between the two is discursively established. This involves looking at metadiscourse, which Hyland and Jiang (2016: 3) describe as "(the) linguistic material referring to the evolving texts and to the writer and imagined reader of that text." As Hunston (2011: 24) puts it, "metadiscourse is subsumed entirely under the concept of interaction or engagement between writer and reader." The corpus analysed comprises 114 British hotel websites and amounts to half a million words. It is part of COMETVAL, a database of over 7 million words compiled by researchers at the University of València, which contains samples of tourism websites in three languages: French, Spanish and English. The results point to patterns whose relevance is already apparent in an initial keyword analysis of the corpus: the top keywords include the personal pronoun you (subject and object) and the corresponding possessive your, as explicit references to the implied consumer. Further observation through concordancing and manual scrutiny also pointed to the need to include directives as a relevant feature of engagement (Hyland 2005). Directives are often conveyed by imperatives, which cannot be detected through keyword analysis and ordinary morpho-syntactic tagging. The results of the quantitative and qualitative analysis suggest that copywriters rely on a set of specific conditional constructions built around the subject personal pronoun you and, in some cases, on directives.
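The abstract does not state which keyness statistic or reference corpus was used. As a minimal sketch, assuming Dunning's log-likelihood (G2), a common choice in keyword analysis, the following Python code ranks words that are relatively more frequent in a study corpus than in a reference corpus. The two toy "corpora" and the function names are invented for illustration. As noted above, a frequency-based ranking like this cannot pick out directives realised as imperatives.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (G2) for one word.

    a, b: frequency of the word in the study and reference corpus;
    c, d: total tokens in the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        g2 += a * math.log(a / expected_study)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2 * g2


def keywords(study_tokens, ref_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus by G2."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scores = {}
    for word, a in study.items():
        b = ref[word]  # a Counter returns 0 for unseen words
        if a / c > b / d:  # keep positive keywords only
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


hotel = "if you need anything your room is ready you can relax".split()
reference = "the report shows that the room was small and the staff were busy".split()
print(keywords(hotel, reference, top_n=3))
```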
The conditional constructions and directives were further explored and classified into different subsets, which brought out a set of lexico-grammatical patterns reflecting the textual choices hoteliers make in their attempt to anticipate the needs and wishes of potential customers, needs which, they claim, can be satisfied by the products and/or services they offer. In our view, such rhetorical features of engagement distinguish the discourse of hotel websites from other kinds of promotional discourse. These patterns are genuine cases of engagement, a key rhetorical feature of hotel-owned websites (AUTHOR 2, 2014).
Keywords: discourse, engagement, corpus linguistics, conditionals, advertising, e-tourism, hotel websites

The power of English: I and we in ELF and in ENL academic discourse
Jolanta Sinkuniene, Vilnius University (VU), Lithuania

Over the last several decades, numerous cross-disciplinary and cross-linguistic studies of research writing have confirmed interesting trends in the ways knowledge is reported in different fields and cultures (Berkenkotter & Huckin 1995; Fløttum et al. 2006; Hyland 2008; Lorés-Sanz et al. 2010, inter alia). In those studies, author stance or author voice (Hyland & Sancho Guinda 2012) is the key object of investigation, as it has proved to play a very important role in creating persuasive discourse that shapes disciplinary and cultural identities. In cross-linguistic studies of research writing, the comparison is frequently drawn between English and other academic cultures in order to establish the degree of similarity or divergence in the expression of author stance.
At the same time, the influence of English on other academic cultures has become a crucial question, leading to a debate about the role of English in the global research arena: a common, unifying language of science, or the Tyrannosaurus rex (Swales 1997) responsible for the "epistemicide" (Bennett 2007) of smaller cultures. Personal pronouns are one of the most obvious manifestations of author stance. The use of I and we in academic discourse has been acknowledged as one of the most powerful means of marking author stance (Harwood 2005; Hyland 2001, inter alia). Numerous empirical studies confirm substantial differences in personal pronoun use depending on the writer's cultural background (for an overview see Mur-Dueñas & Šinkūnienė 2016). Less research has investigated how non-native speakers use personal pronouns in English as a Lingua Franca compared with their writing in their native languages. The aim of the present study, therefore, is to analyse the use of personal pronouns in linguistics research articles written by Lithuanian scholars in Lithuanian and by the same scholars in English, and to compare their patterns of use with those of native English speakers. The study employs a corpus-based contrastive methodology combining quantitative and qualitative analysis. The data come from a self-compiled corpus of 36 single-authored research articles. For the Lithuanian data, 12 pairs of research articles written by the same scholar in English and in Lithuanian were selected. For the English sub-corpus, 12 articles written by British linguists were chosen. The quantitative analysis looks at the frequency distribution of I and we and their morphological forms in the three sub-corpora. The qualitative analysis investigates the range of functions that personal pronouns perform in Lithuanian, Lithuanian English and British English texts.
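Because the three sub-corpora differ in size, raw counts of I and we are not directly comparable. A minimal Python sketch of the quantitative step might look as follows, assuming frequencies are normalised per 10,000 tokens and that only English forms are counted. The form lists, the normalisation base, the function names and the example sentences are assumptions for illustration; Lithuanian texts would need their own list of forms.

```python
import re
from collections import Counter

# English first-person forms grouped under I and we (an assumed grouping);
# a Lithuanian sub-corpus would need its own list of forms.
PRONOUN_FORMS = {
    "I": {"i", "me", "my", "mine", "myself"},
    "we": {"we", "us", "our", "ours", "ourselves"},
}


def tokenize(text):
    # Lower-case alphabetic tokens; "I'm" yields "i" and "m".
    return re.findall(r"[a-z]+", text.lower())


def per_10k(texts):
    """Frequency of each pronoun group per 10,000 tokens of a sub-corpus."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {group: 0.0 for group in PRONOUN_FORMS}
    return {group: 10_000 * sum(counts[form] for form in forms) / total
            for group, forms in PRONOUN_FORMS.items()}


# Invented example sentences, one per sub-corpus.
subcorpora = {
    "British English": ["In this paper I argue that our data support the claim."],
    "Lithuanian English": ["The data show that the claim is supported."],
}
for name, texts in subcorpora.items():
    print(name, per_10k(texts))
```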
For the qualitative analysis, all combinations of a personal pronoun with a verb were analysed in context to determine the function they perform. The results suggest that most Lithuanian scholars choose a more explicit expression of author stance when they write in English than when they write in Lithuanian, although the frequency and functions of I and we in native English speakers' texts are different. Native English speakers choose more argumentative verbs to express author stance with personal pronouns; they also frequently shift from I to we, thereby creating more persuasive discourse and closer links with the audience than Lithuanian scholars do.
Keywords: academic discourse, personal pronouns, cross-linguistic, quantitative analysis, qualitative analysis

The textual colligation of stance phraseology in cross-disciplinary academic discourses: the timing of authors' self-projection
Louisa Buckingham and Jihua Dong, University of Auckland, New Zealand

Lexical items, according to Hoey (2005, p. 13), "are primed to occur in or avoid, certain positions within the discourse". An analysis of textual colligation, Hoey's (2005) term for such priming, explores the textual position of linguistic markers in relation to textual structures and the interaction between textual position and discourse function. Recent studies have explored the textual colligation of particular words or phrases (e.g., Hoey & O'Donnell, 2008; Mahlberg, 2009; O'Donnell et al., 2012), enriching our understanding of the textual colligation of features such as keywords or key phrases in a text. This study investigates the textual colligation of markers belonging to one particular semantic group, namely stance.
This quantitative study investigates the textual colligation of stance phrases in academic discourse in agriculture and economics. It uses a purpose-built corpus of 655 published research articles totalling around 3 million tokens. We use the WordSkew software (Barlow, 2016) to investigate the position (or colligation) of stance phrases at sentence, paragraph and text level, and examine whether there is disciplinary variation in the textual colligation of these phrases. The results show significant differences in how stance phrases are distributed across textual positions (sentence, paragraph and text) within each discipline. Nevertheless, the proportion of stance phrases in each of the three textual positions is notably similar across the two disciplines. It may be inferred that the textual position of particular stance phrases results from the type of routinized discourse or communicative function they serve (Hoey, 2005). The findings on the textual position of stance phrases support Hoey's premise that certain expressions are primed to occur in, or avoid, particular textual positions. In addition, the study revealed that phrases with a particular function tend to share positional similarities in their distribution across the sentence, the paragraph and the whole text. From a communicative viewpoint, appropriate positioning of stance phrases helps authors construct a discourse-appropriate persona, interact with envisaged readers and achieve their communicative objectives. WordSkew helped reveal text positions at sentence, paragraph and text level: it provides an efficient way to quantify the textual position of particular linguistic features and helps visualize their distribution across the organization of a text.
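WordSkew's own algorithm is not described in the abstract. The Python sketch below, written in its spirit but not reproducing it, divides each unit (sentence, paragraph or text, given as a token list) into equal segments and counts the segment in which a stance phrase begins. The function name, the five-segment default and the example sentences are assumptions for illustration.

```python
from collections import Counter


def positional_profile(units, phrase, n_bins=5):
    """Count occurrences of `phrase` in each of `n_bins` equal segments of its unit.

    units: list of token lists (sentences, paragraphs or whole texts);
    phrase: tuple of tokens. A match is assigned to the segment in which
    its first token falls.
    """
    k = len(phrase)
    bins = Counter()
    for tokens in units:
        n = len(tokens)
        for i in range(n - k + 1):
            if tuple(tokens[i:i + k]) == phrase:
                bins[i * n_bins // n] += 1  # integer arithmetic avoids rounding issues
    return [bins[b] for b in range(n_bins)]


sentences = [s.lower().split() for s in [
    "It is clear that prices rose",
    "Prices rose and it is clear that demand fell",
]]
print(positional_profile(sentences, ("it", "is", "clear")))  # [1, 1, 0, 0, 0]
```

Passing paragraphs or whole texts as the units instead of sentences gives the other two levels of analysis with the same function.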
Barlow, M. (2016). WordSkew: Linking corpus data and discourse structure. International Journal of Corpus Linguistics, 21 (1), 105–115.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Hoey, M., & O'Donnell, M. B. (2008). Lexicography, grammar, and text position. International Journal of Lexicography, 21 (3), 293–309.
Mahlberg, M. (2009). Local text functions of move in newspaper story patterns. In U. Römer & R. Schulze (Eds.), Exploring the lexis-grammar interface (pp. 265–287). John Benjamins.
O'Donnell, M. B., Scott, M., Mahlberg, M., & Hoey, M. (2012). Exploring text-initial words, clusters and concgrams in a newspaper corpus. Corpus Linguistics and Linguistic Theory, 8 (1), 73–101.
Keywords: textual colligation, stance phrases, academic disciplinary variation, academic writing

Towards an extended lexical grammar: Complex colligational patterns of the noun cause
Moisés Almela Sánchez and Pascual Cantos Gómez, University of Murcia, Spain

It has become a truism that lexis and grammar are intertwined and that grammatical choices are bound to lexical items. The notion of lexical grammar is well established in several frameworks of modern linguistic research, and corpus-driven linguistics is no exception (see, for instance, Francis 1993 and Hunston and Francis 2000). This research aims to extend the scope of description of lexico-grammatical co-selection, more specifically to identify forms of coordination between lexical and grammatical features that are more complex, and subtler, than the cases of lexico-grammatical co-selection usually described in the literature.
Theoretically and methodologically, the study builds on research into lexical constellations (Cantos & Sánchez, 2001; Almela, 2011; Almela et al., 2013), which has shown that the strength of association between a node and a collocate can be influenced by elements outside the pair, particularly by dependencies among different collocates of the node. For instance, the association between the verb face and the noun decision is strengthened by the presence of modifiers from a specific semantic set (e.g., hard, difficult, tough). Previous studies focused on the implications of this phenomenon for the analysis of word meaning; their methodology was based on comparing conditional probabilities between bigrams and trigrams formed by previously extracted significant collocates of a node. The present study adapts lexical constellation analysis to the description of dependencies between different colligational patterns (i.e. preferred grammatical contexts) of a word. The node under investigation is the noun cause, and the corpus is enTenTen2013, a large web corpus of English that contains 19,717,205,676 tokens and is accessible through Sketch Engine. The methodology has two main steps. First, we compare the conditional probabilities of different grammatical contexts of the node, to determine whether the presence of a particular grammatical category in the node's context increases or decreases the probability of another grammatical category in a different position; more specifically, we look for dependencies between the 'premodifier' and 'of-postmodifier' slots. Second, we compare the behaviour of these two slots across different collocations of the node, analysing their distribution in collocations of cause with a list of top logDice collocates. Two main conclusions are drawn from the results.
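Before turning to these conclusions, the first methodological step can be sketched in Python. Assuming each occurrence of cause has already been annotated for whether it has a premodifier, whether it has an of-postmodifier and which verb it collocates with, the code compares P(of-postmodifier | premodifier) with P(of-postmodifier | no premodifier), overall and per verbal collocate. It also includes the standard logDice formula, 14 + log2(2·f_xy / (f_x + f_y)), used to rank collocates. The annotated occurrences and the function names are invented; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def of_given_premod(occurrences):
    """P(of-postmodifier | premodifier present) and P(of-postmodifier | absent).

    Each occurrence is a dict with boolean keys "premod" and "of_post".
    Returns None for a condition with no occurrences.
    """
    tallies = {True: [0, 0], False: [0, 0]}  # premod -> [occurrences, with of-postmodifier]
    for occ in occurrences:
        tally = tallies[occ["premod"]]
        tally[0] += 1
        tally[1] += int(occ["of_post"])
    return {premod: (with_of / n if n else None) for premod, (n, with_of) in tallies.items()}


def by_collocate(occurrences):
    """Repeat the comparison separately for each verbal collocate of the node."""
    groups = defaultdict(list)
    for occ in occurrences:
        groups[occ["verb"]].append(occ)
    return {verb: of_given_premod(group) for verb, group in groups.items()}


# Invented, hand-annotated occurrences of the node "cause".
occurrences = [
    {"verb": "determine", "premod": False, "of_post": True},   # determine the cause of death
    {"verb": "determine", "premod": False, "of_post": True},
    {"verb": "determine", "premod": True, "of_post": False},   # determine the root cause
    {"verb": "support", "premod": True, "of_post": False},     # support a good cause
    {"verb": "support", "premod": False, "of_post": False},    # support the cause
]
print(of_given_premod(occurrences))
print(by_collocate(occurrences))
```

A dependency between slots shows up as a gap between the two conditional probabilities; computing them per collocate shows whether that gap holds for every verb or only for some.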
The first conclusion is that there are dependency relations between the two grammatical slots investigated in the environment of cause ('premodifier' and 'of-postmodifier'). The second is that these dependency relations are contingent on specific collocations of cause: the dependencies do not behave in the same way with all the verbal collocates of the node. In general, these results point to an influence of collocation on the co-occurrence probabilities of different colligations of cause.
Keywords: collocation, colligation, lexical priming, semantic preference.

Techniques for characterising female characters in Galdós: a corpus-based approach
Guadalupe Nieto, Universidad de Extremadura (UEx), Spain

This paper draws on a corpus study to explore body language in the novels of Benito Pérez Galdós and, more precisely, the patterns the novelist uses to sketch the personality of his female characters. The study covers the writer's complete prose works, which total around 6.2 million words. Special attention is paid to constructions of at least five words (clusters) that are used systematically and contain one of the following body parts: cabeza (head), espalda (back), hombros (shoulders), manos (hands) or ojos (eyes). This characterisation device has been analysed in English-language writers such as Dickens (Mahlberg, 2013; Ruano San Segundo, 2015) and Jane Austen (Fischer-Starcke, 2010). As Korte (1997: 4) points out, and as will be shown, body language becomes an autonomous system in the construction of the fictional universe of the novel.
The proposed corpus study makes it possible to examine an aspect of Galdós's style that has generally gone unnoticed until now, because it is so difficult to analyse without quantitative tools. The characterisation of his female characters is therefore investigated through recurrent patterns containing the body parts mentioned above across his literary output, compared in some cases with the characterisation of male characters. As will be shown, Galdós's work is full of patterns that act as textual building blocks contributing to the construction of the fictional universe the author presents. The texts were downloaded from the Cervantes Virtual digital repository and then processed with the concordancer WordSmith Tools 6 (Scott, 2013), which supports word and concordance searches whose results can be analysed in the context of the novel in which they occur. Among the examples in our analysis is the expression "el pañuelo a los ojos" ('the handkerchief to the eyes'), associated almost exclusively with the characterisation of female characters and usually employed in dialogic moments to emphasise their sadness: "Irene se llevó el pañuelo a los ojos, y con voz de ahogo me dijo: 'Sabe usted... más que Dios...'" ('Irene raised her handkerchief to her eyes and, in a choked voice, said to me: "You know... more than God..."') (El amigo Manso, chapter 41). In short, the characterisation of female characters in Galdós's fictional universe is masterfully achieved. Indeed, as this paper aims to show, analysing body language from a corpus-stylistics perspective also makes it possible to mark differences between men and women, or between bourgeois and working-class women. In the words of María Zambrano (1994: 130), the Canarian author is "el primer escritor español que introduce valientemente a las mujeres en su mundo" ('the first Spanish writer to bravely introduce women into his world').
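The cluster search can be approximated in a few lines of Python. This sketch counts five-word clusters containing one of the five body-part words and keeps those above a minimum frequency; unlike a concordancer such as WordSmith Tools, it uses a crude punctuation-based split to stop clusters from crossing sentence boundaries. The second sentence of the sample text, the frequency threshold and the function name are assumptions for illustration.

```python
import re
from collections import Counter

BODY_PARTS = {"cabeza", "espalda", "hombros", "manos", "ojos"}


def body_part_clusters(text, n=5, min_freq=2):
    """Frequency of n-word clusters that contain at least one body-part word.

    Clusters are counted within sentence-like spans, so they do not cross
    . ! ? ; : boundaries.
    """
    counts = Counter()
    for span in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"\w+", span)  # \w matches accented letters such as ó and ñ
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {" ".join(gram): freq for gram, freq in counts.items()
            if freq >= min_freq and BODY_PARTS.intersection(gram)}


sample = ("Irene se llevó el pañuelo a los ojos. "
          "Luego Rosa se llevó el pañuelo a los ojos.")
print(body_part_clusters(sample))  # {'el pañuelo a los ojos': 2}
```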
Bibliography:
Biblioteca Virtual Miguel de Cervantes (2016): http://www.cervantesvirtual.com/ (accessed 2 April 2016).
Fischer-Starcke, B. (2010): Corpus Linguistics in Literary Analysis: Jane Austen and her Contemporaries. London: Continuum.
Korte, B. (1997): Body Language in Literature. Toronto: University of Toronto Press.
Mahlberg, M. (2013): Corpus Stylistics and Dickens's Fiction. New York/London: Routledge.
Ruano San Segundo, P. (2016): "A corpus-stylistic approach to Dickens' use of speech verbs: Beyond mere reporting". Language and Literature, 25 (2), 1-15.
Scott, M. (2013): WordSmith Tools. Version 6. Oxford: Oxford University Press.
Zambrano, M. (1994): "Mujeres de Galdós". Asparkía, 3, 129-135.
Keywords: Galdós, women, body language, corpus stylistics

Phraseological units in the subtitling of a drama series
Dalila Itzel Nieto Mercado and Eleonora Lozano Bachioqui, Universidad Autónoma de Baja California, Facultad de Idiomas (UABC), Av. Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico

Abstract
This paper arises from the need to learn more about the translation of phraseological units in English-to-Spanish subtitling, given the growing audience for audiovisual content distributed over the Internet. In this regard, it must be borne in mind that a translator's task is to make culture accessible to anyone interested in it: translation is not only a matter of converting messages from one language into another but also of spreading culture. The objective of this paper is to create a glossary of English phraseological units, along with their Spanish equivalents, based on a corpus compiled from the scripts and subtitles of a television series. The results will benefit anyone interested in translation and may also serve as a tool for teaching English phraseological units and their Spanish equivalents. To build the glossary, scripts from the American series Mad Men (Weiner, 2007) were compiled and their phraseological units analysed with AntPConc, a parallel-text concordancer created by Laurence Anthony.
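A parallel concordancer such as AntPConc retrieves aligned source and target segments for a search term. The minimal Python sketch below mimics that idea on pre-aligned subtitle pairs and collects the Spanish lines aligned with each English phraseological unit as raw material for a glossary. It is not AntPConc's implementation; the function names and the subtitle lines are invented for illustration.

```python
import re


def parallel_concordance(pairs, query):
    """Return aligned (English, Spanish) segment pairs whose English side contains `query`."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(source, target) for source, target in pairs if pattern.search(source)]


def build_glossary(pairs, units):
    """Map each phraseological unit to the Spanish subtitle lines aligned with it."""
    return {unit: [target for _, target in parallel_concordance(pairs, unit)]
            for unit in units}


# Invented aligned subtitle lines (not taken from the series).
subtitles = [
    ("Let's call it a day.", "Dejémoslo por hoy."),
    ("This account is a piece of cake.", "Esta cuenta es pan comido."),
    ("Call me tomorrow.", "Llámame mañana."),
]
print(build_glossary(subtitles, ["call it a day", "piece of cake"]))
```

The translator still has to judge each retrieved pair; the code only gathers candidate equivalents in context.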
Keywords: translation, subtitling, phraseological units, corpus linguistics

Verbal agreement with NCOLL-of-NPL subjects in the inner varieties of English in GloWbE
Yolanda Fernández-Pena, University of Vigo, Spain

Subjects headed by collective nouns may take singular or plural verbs depending on whether the speaker focuses on the collectivity or on its individual members (Dekeyser 1975), the latter being preferred in British English (Bauer 2002). This conundrum is further complicated when collective subjects take plural of-dependents (i.e. NCOLL-of-NPL subjects), which may interfere in the subject-verb agreement relation, as in (1):

(1) A [crowd]SG of [waiters]PL [were]PL gathering.

In previous research with data from the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA), I showed that NCOLL-of-NPL subjects take a significant rate of plural verb agreement (68.05%) in local syntactic domains in both British and American English, and that as syntactic distance and complexity increase, the influence of plural of-PPs on verb number diminishes, so that the rate of plural agreement drops considerably (58.47%). This study extends that investigation by exploring verbal agreement with NCOLL-of-NPL subjects in the corpus of Global Web-based English (GloWbE), with a twofold purpose. First, I examined British and American English in GloWbE to establish whether, and to what extent, my prior observations hold in the more informal web-based register. Second, I examined the data for the other four inner varieties of English in GloWbE (Ireland, Canada, Australia and New Zealand) to detect significant regional tendencies and similarities or differences with respect to British and American English.
To this end, I replicated my previous investigation and examined verbal agreement with twenty-three singular collective nouns taking of-dependents (lists retrieved from Biber et al. 1999: 249; Huddleston and Pullum et al. 2002: 503) in the six inner varieties of English in GloWbE. The syntactic variables considered are (i) the constituent structure of the of-PP, (ii) the type of modifiers of the NPL, and (iii) the morphology of the NPL (i.e. regular vs. irregular vs. non-overt plurality, as in boys vs. men vs. people). The results largely confirm my prior observations in the BNC and COCA and also reveal significant regional trends. Overall, NCOLL-of-NPL subjects show a preference for plural verbal agreement only in the British and Irish components (57.26% and 63.97%); American English slightly favours singular agreement (52.18%), whereas Canada, Australia and New Zealand show no significant preference. In line with the BNC and COCA, the GloWbE data show that the morphology of the NPL conditions verbal agreement: morphologically unmarked plural nouns such as people have a stronger influence on verb number (70.18%) than irregular (e.g. men, 60.89%) and regular (e.g. boys, 51.63%) plural nouns, a tendency attested in all the varieties surveyed. Regarding syntactic complexity, while Canada, Australia and New Zealand do not yield significant results, the results for the British, Irish and American varieties confirm that the most complex syntactic configurations of NCOLL-of-NPL subjects (i.e. those with both pre- and postmodification) select a lower rate of plural agreement. Similarly, plural verb agreement is considerably less frequent when the NPL is postmodified by clausal, and thus presumably more complex, constituents (40.94% vs. 55.50% with non-clausal postmodifiers).
This finding runs counter to prior literature (Corbett 1979) but supports the tendencies I had previously observed and thus confirms the significant impact of morphology and syntactic complexity on the verbal agreement patterns of NCOLL-of-NPL subjects.
Keywords: verbal agreement, collective nouns, regional varieties, corpus

Assessing the frequency threshold for selecting lexical bundles: good news with some caveats
Yves Bestgen, Centre for English Corpus Linguistics (CECL), Place du Cardinal Mercier 10, B-1348 Louvain-la-Neuve, Belgium

One of the most frequently used approaches to studying prefabricated units in corpora relies on the automatic identification of lexical bundles, the most recurrent word sequences in a corpus (Biber et al., 1999). Their study has revealed phraseological differences between registers, genres and periods. While most research has dealt with 4-word sequences, shorter sequences have also been analysed. Two criteria are used to select bundles from all the word n-grams in a corpus: a minimum frequency threshold, meant to ensure that lexical bundles "show a statistical tendency to co-occur" (Biber et al., 1999: 989), and a minimum number of documents in which a sequence must occur, in order to eliminate idiosyncratic sequences. While there is broad consensus on a threshold of 3 to 5 texts for the second criterion, the first varies widely: it usually lies between 10 and 40 occurrences per million words, but values ranging from 4 (O'Keeffe et al., 2007) to 88 (Decock, 1998) have also been used.
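The two selection criteria can be sketched in Python as follows. The default thresholds (20 occurrences per million words and at least 3 texts) are assumptions chosen from within the ranges reported above; the function name and the toy documents are invented.

```python
from collections import Counter, defaultdict


def lexical_bundles(docs, n=4, min_per_million=20.0, min_docs=3):
    """Select n-grams meeting a normalised-frequency threshold and a range threshold.

    docs: list of token lists, one per text.
    Returns {bundle: frequency per million words}.
    """
    freq = Counter()
    doc_range = defaultdict(set)  # n-gram -> ids of the texts it occurs in
    total_tokens = 0
    for doc_id, tokens in enumerate(docs):
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            doc_range[gram].add(doc_id)
    bundles = {}
    for gram, f in freq.items():
        per_million = f * 1_000_000 / total_tokens
        if per_million >= min_per_million and len(doc_range[gram]) >= min_docs:
            bundles[" ".join(gram)] = per_million
    return bundles


docs = [text.split() for text in [
    "on the other hand we agree",
    "on the other hand they left",
    "on the other hand it rained",
    "the cat sat on the mat",
]]
print(lexical_bundles(docs))  # {'on the other hand': 125000.0}
```

In a 50,000-word corpus a single occurrence already corresponds to 20 occurrences per million words, which illustrates why a normalised threshold behaves very differently across corpus sizes.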
Since frequency is the main selection criterion (Cortes, 2015: 204), meant to ensure that lexical bundles consist of "words which follow each other more frequently than expected by chance" (Hyland, 2008: 5), such a wide range of variation raises the question of whether the thresholds used are high enough to avoid selecting n-grams that chance could easily have produced just as often. Many researchers have indeed pointed out that a sequence may be very frequent simply because its component words are frequent (e.g. Evert, 2005; Gries, 2010). To address this question, the study uses an extension to sequences of more than two words of Fisher's exact test, which is recommended for bigrams (Jones and Sinclair, 1974; Pedersen et al., 1996; Stefanowitsch and Gries, 2003). It should be stressed that the aim is not to challenge the definition of lexical bundles as the most recurrent sequences: it is obviously more useful to distinguish registers by means of very frequent sequences than by means of rare ones. The analyses were carried out on a corpus of 3,200,000 words taken from the "academic" section of the BNC. Three subcorpora were also extracted from this initial corpus to vary corpus size, containing 800,000, 200,000 and 50,000 words respectively. Probabilities were estimated by permuting the words in the corpus, with 10 million permutations performed for each corpus. The results indicate that the classic thresholds are high enough to select only four-word sequences that chance would be very unlikely to produce as often. By contrast, a substantial number of three-word sequences selected with these thresholds fail the inferential test.
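The logic of the permutation procedure can be illustrated with a simple Monte Carlo test in Python: shuffling the corpus keeps every word's frequency but destroys word order, so the proportion of shuffles in which an n-gram is at least as frequent as observed estimates how easily chance could produce it. This is a generic sketch, not the author's extension of Fisher's exact test. The study used 10 million permutations per corpus, whereas the example uses a few hundred on an invented toy corpus; in practice all candidate n-grams would be scored on the same permutations. Function names are hypothetical.

```python
import random


def ngram_count(tokens, gram):
    """Number of positions at which `gram` (a tuple of tokens) occurs in `tokens`."""
    n = len(gram)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == gram)


def permutation_p_value(tokens, gram, n_perm=1000, seed=0):
    """Monte Carlo estimate of P(count >= observed) when word order is random.

    Shuffling keeps every word's frequency but destroys sequential association.
    The +1 correction keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = ngram_count(tokens, gram)
    shuffled = list(tokens)
    at_least_as_frequent = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ngram_count(shuffled, gram) >= observed:
            at_least_as_frequent += 1
    return (at_least_as_frequent + 1) / (n_perm + 1)


corpus = "in the end of the day".split() * 20
print(permutation_p_value(corpus, ("end", "of", "the"), n_perm=200))
```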
The study also shows a very marked effect of corpus size on the effectiveness of frequency thresholds when these are expressed as normalised frequencies, confirming the concerns of Cortes (2015) and Hyland (2012).
Keywords: phraseological expressions, lexical bundles, Fisher's exact test, corpus-driven approach, frequency threshold, corpus size

Metaphorical creativity index and translation universals: a methodological proposal based on a corpus of corporate social responsibility reports
Sara Piccioni, Università "G. D'Annunzio" di Chieti-Pescara, Italy

The aim of this paper is to investigate the translation universals hypotheses (Baker 1996) by comparing a metaphorical creativity index across a corpus of original and translated texts in Spanish. The analysis rests on a twofold methodological proposal. First, adopting the idea that translated texts differ from original texts in distinctive linguistic features, we propose including the level of metaphorical lexicalisation/creativity among these features, suggesting that originals and translations differ in the type of metaphorical repertoire they use. Second, we propose a metaphorical creativity index that measures the level of metaphorical creativity on the basis of observations in a general reference corpus of Spanish. The study corpus is a monolingual comparable corpus of corporate social responsibility reports consisting of Spanish originals (OR-ES) and Spanish texts translated from English (TR-ES).
As regards the first methodological proposal, the hypothesis is put forward that metaphor, with its wide range of variation between fully lexicalized forms (e.g. cuello de botella, 'bottleneck') and creative metaphors (e.g. drenar el dolor, 'to drain the pain'), offers an ideal vantage point for observing how the language of translators differs from that of original texts. More specifically, conventional metaphors in translations are taken to reflect the normalization processes typical of translated texts ("tendency to exaggerate features of the target language and to conform to its typical patterns", Baker 1996), whereas creative metaphors may result from the source language reverberating in the target language (shining through, Teich 2003). The second methodological proposal serves to compare the degree of metaphorical creativity in translated and original texts and builds on the criterion proposed by Deignan (2005) for distinguishing innovative from historical metaphors: a low frequency of metaphorical uses of a given word is taken as a sign of metaphorical innovation, whereas words used almost exclusively metaphorically are considered conventional uses. To calculate the metaphorical creativity index, the 200 most frequent VERB-NOUN pairs are extracted from the two corpora (OR-ES and TR-ES), and the metaphorical pairs among them are identified using the procedure proposed by the Pragglejaz Group (Pragglejaz Group, 2007). Next, the creativity index of the metaphorical verbs and nouns is calculated by counting the number of metaphorical uses of each in a random sample of 100 concordances extracted from the Spanish corpus of the Leeds Collection of Internet Corpora (Sharoff 2006, REF).
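The scoring step can be sketched as follows. This is a hypothetical illustration, not the author's code: the counts are invented, each metaphorical pair is assumed to have a single word used metaphorically, and combining per-pair scores (REF metaphorical count multiplied by pair frequency) into one corpus-level figure via a frequency-weighted mean is our own assumption about aggregation. Following Deignan's criterion, a higher value signals more conventional metaphor use and a lower one more creative use.

```python
def pair_scores(pairs, ref_hits):
    """Per-pair score: metaphorical uses in the REF sample x pair frequency.

    pairs:    {(verb, noun): (frequency_in_corpus, word_used_metaphorically)}
    ref_hits: {word: metaphorical uses among 100 random REF concordances}
    """
    return {pair: ref_hits[word] * freq for pair, (freq, word) in pairs.items()}

def corpus_index(pairs, ref_hits):
    """Frequency-weighted mean REF metaphoricity (0-100) over all pairs.

    High values: the corpus favours words that are nearly always metaphorical
    in REF (conventional metaphors); low values: rarer, more creative uses.
    """
    total_freq = sum(freq for freq, _ in pairs.values())
    return sum(pair_scores(pairs, ref_hits).values()) / total_freq

# Invented numbers, for illustration only (not results from the study).
ref_hits = {"abordar": 90, "drenar": 5}
or_es = {("abordar", "reto"): (40, "abordar"), ("drenar", "dolor"): (2, "drenar")}
tr_es = {("abordar", "reto"): (10, "abordar"), ("drenar", "dolor"): (8, "drenar")}

print(round(corpus_index(or_es, ref_hits), 2))  # 85.95
print(round(corpus_index(tr_es, ref_hits), 2))  # 52.22
```

Dividing by total pair frequency keeps corpora of different sizes comparable; summing the raw products instead would let the larger corpus dominate.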
The number of metaphorical cases in REF multiplied by the frequency of a given VERB-NOUN pair in OR-ES and TR-ES is taken as indicative of the degree of metaphorical creativity/conventionality of each corpus. The presentation will focus on the methodological implications of these proposals and will also relate the metaphorical creativity index to normalization and reverberation phenomena in translations.
Keywords: corporate social responsibility reports, metaphor translation, translation universals, corpus-based metaphor analysis
'His maiestie chargeth, that no person shall engrose any maner of corne'. The Standardization of Punctuation in Early Modern English Legal Proclamations
Javier Calle-Martín, University of Málaga (UMA), Facultad de Filosofía y Letras, Departamento de Filología Inglesa, Campus de Teatinos s/n, 29071 Málaga, Spain
Punctuation has historically been noted to develop from the rhetorical to the grammatical, from the speaker to the reader, with the Renaissance standing out as the transitional period in which syntactic and pragmatic functions were adopted to organize written information. This standardization has elsewhere been regarded as a consequence of the introduction of Caxton's printing press in England, the increasing activity of Westminster's Royal Chancery, and a growing number of professional scriveners engaged in writing all sorts of documents, from guild records to private letters. The study of historical punctuation, however, has mostly been based on Old and Middle English handwritten material, literary and scientific texts in particular. The Early Modern English period, unfortunately, has received far less attention, with only a limited number of studies investigating scribal attitudes in different text types, including scientific, legal and literary texts, drama in particular (Calle-Martín and Miranda-García 2008: 356–360).
This neglect of Early Modern English punctuation is even more striking in the case of printed texts, despite their active role in the process of standardization. Legal material is no exception, proclamations being "one of the most overlooked categories of printed material in the field of early modern history" (Kyle 2015: 771). In light of this, the present study analyses the punctuation system of Early Modern English printed legal material with the following objectives: a) to provide an inventory of the punctuation marks used in Early Modern English printed texts; b) to offer a detailed account of the use and pragmatic functions of these symbols; and c) to assess the degree of standardization of punctuation in these sources. The study draws on The Corpus of Early Modern English Statutes (compiled by Anu Lehto at the University of Helsinki), which contains approximately 214,000 words for the period 1491–1707 (Lehto 2013: 239). The corpus is divided into 25-year subperiods for diachronic comparison, each compiled to include two proclamations, with samples printed during the reign of each sovereign. Legal material was chosen in view of a) its orality, since it was written to be read aloud; b) its conservativeness, hostile to individual creativity in favour of standard practice; and c) its complex syntax, which requires an elaborate set of marks for all kinds of syntactic relationships. This material has allowed us to gather conclusive data to ascertain a) the existence of an inventory of punctuation marks with a preconceived set of rules, corroborating an ongoing process of specialization at that time; and b) more importantly, the historical development of particular punctuation symbols, offering grounds as to the actual rise and fall of particular symbols and their functions in the history of English.
References
Calle-Martín, Javier and Antonio Miranda-García. 2008. "The Punctuation System of Elizabethan Legal Documents: The Case of G.U.L. MS Hunter 3 (S.1.3)". The Review of English Studies 59: 356–378.
Kyle, Chris R. 2015. "Monarch and Marketplace: Proclamations as Use in Early Modern England". Huntington Library Quarterly 78.4: 771–787.
Lehto, Anu. 2013. "Complexity and Genre Conventions: Text Structure and Coordination in Early Modern English Proclamations". In Andreas H. Jucker, Daniela Landert, Annina Seiler and Nicole Studer-Joho (eds.), Meaning in the History of English: Words and Texts in Context. Amsterdam/Philadelphia: John Benjamins. 233–257.
Keywords: Early Modern English, proclamations, punctuation, standardization
'Making it clear': A contrastive study of evidentials and boosters in contemporary political discourse
Ana Albalat-Mascarell, Universitat Politècnica de València (UPV), Spain
Within Hyland's (2005) metadiscourse framework, evidentials and boosters are common rhetorical strategies that lend credibility to arguments, either by drawing on external sources of information or by emphasising the speaker's own certainty about a proposition. Both strategies belong to a strongly interpersonal view of metadiscourse, which covers the ways speakers organize a discourse and adopt a stance towards both the topic under discussion and their audience (Hyland, 2004, 2005, 2010; Hyland and Tse, 2004; Dafouz-Milne, 2008; Mur-Dueñas, 2011). But although metadiscourse is a useful tool for explaining the interactional features of language across domains and genres, it has mostly been examined in relation to academic writing (Hyland, 2015). Little attention has been paid to the role of metadiscourse markers in non-academic discourses with an overtly persuasive component, such as political discourse, and least of all from a comparative perspective exploring cross-cultural rhetorical and discursive differences (Mur-Dueñas, 2011) between English and other languages.
I address this gap by focusing on the presence and function of evidentials and boosters in televised debates between political candidates held for the 2015 and 2016 general elections in Spain and for the 2016 presidential election in the United States. My objectives are, first, to extract the frequencies of the words and phrases performing these metadiscourse functions in televised debates aimed at a very large audience; second, to compare the rhetorical and discursive roles of the expressions most frequently used by different speakers and relate them to the candidates' persuasive aims; and third, to explore linguistic and intercultural differences in the use of these strategies and contrast them with the outcome of each election. The analysis is based on a corpus of authentic data consisting of the transcripts of the debates involving the leaders of at least the two parties leading the opinion polls in each country and election (i.e. the PP and the PSOE, plus Podemos in the 2016 election, in Spain, and the Democratic and Republican parties in the United States). The use of evidentials and boosters was quantified with the tool 'Metool', developed specifically to detect metadiscourse strategies. The results show how the strategies identified tend to work in combination to represent a credible self with something plausible to say that challenges opposing views on the same issue. In addition, the main differences in the qualitative use of these metadiscourse devices across the political actors involved and the positions they publicly adopt reveal a striking correlation between the speaker's communicative characteristics and the projection of personal authority and trustworthiness in their discourse.
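Frequency comparisons across debates of different lengths are usually made on normalized rates. The sketch below counts a handful of markers per category and reports occurrences per 1,000 words. The marker lists, the tokenizer and the function names are hypothetical: they do not reproduce Metool's inventory or interface, and a real analysis would also check each hit in context, since a word like "clearly" does not always act as a booster.

```python
import re

# Hypothetical, deliberately tiny marker lists (not Metool's inventory).
MARKERS = {
    "evidential": ["according to", "as reported by", "studies show"],
    "booster": ["clearly", "of course", "in fact", "definitely"],
}

def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def count_marker(tokens, marker):
    """Count contiguous occurrences of a (possibly multi-word) marker."""
    parts = marker.split()
    n = len(parts)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts)

def rates_per_thousand(text, markers=MARKERS):
    """Occurrences of each marker category per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in markers}
    return {
        category: sum(count_marker(tokens, m) for m in items) * 1000 / len(tokens)
        for category, items in markers.items()
    }

turn = ("Clearly we will win. According to the polls, of course, we are ahead. "
        "In fact the data clearly show it.")
print(rates_per_thousand(turn))  # {'evidential': 50.0, 'booster': 200.0}
```

Computing rates per speaker rather than per debate would allow the speaker-level comparisons described in the abstract.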
Last but not least, the cross-cultural analysis of evidentials and boosters in televised debates within the framework of interpersonal metadiscourse shows that the speaker's ability to construct an effective 'Ethos' varies according to language and culture, but, quite surprisingly, a better performance in debates does not necessarily lead to an election victory, either in the Spanish national context or in the Anglo-Saxon tradition of the United States.
Keywords: intercultural rhetoric, corpus-based analysis, metadiscourse, evidentials, boosters, political discourse
Author index
Álvarez-Gil, Francisco J., 73
Ahuactzin Martínez, Carlos Enrique, 75
Albalat-Mascarell, Ana, 183
Alcaraz-Mármol, Gema, 159
Almela Sánchez, Moisés, 170
Almela, Ángela, 159
Alonso Belonte, Isabel, 87
Alonso Ramos, Margarita, 81
Alonso-Almeida, Francisco, 73
Alruwaili, Awatif, 155
Andrade Navarro, Allen, 54
Andrades, Arsenio, 30
Baena Lupiáñez, María del Carmen, 56
Ballier, Nicolas, 61
Barcellos, Carolina, 85
Barrio, María Valentina, 52
Bendinelli, Marion, 103
Bertels, Ann, 143
Bestgen, Yves, 177
Bojovic, Dijana, 36
Boutmgharine Idyassner, Najet, 46
Brenchley, Mark, 157
Buckingham, Louisa, 168
Cabezas-García, Melania, 120
Cal Varela, Mario, 139
Calle-Martín, Javier, 181
Calvo-Rubio Jiménez, Estrella, 58
Cangir, Hakan, 89
Cantos Gómez, Pascual, 170
Cantos, Pascual, 159
Carrió-Pastor, María Luisa, 97
Cavalla, Cristelle, 38
Charles, Maggie, 153
Chaski, Carole, 159
Clavel Arroitia, Begoña, 67
Comer, Marie, 145
Comitre Narvaez, Isabel, 107
Corona, Isabel, 162
Criado Peña, Miriam, 128
Criado Sánchez, Raquel, 87
Curry, Niall, 118
Delgar Farrés, Gemma, 111
Dong, Jihua, 168
El Khamissy, Riham, 101
Esteban-Segura, Laura, 6
Fernández, Ester, 95
Fernández-Alcaina, Cristina, 18
Fernández-Domínguez, Jesús, 18
Fernández-Pena, Yolanda, 175
Fernandez Polo, Francisco Javier, 139
Gallego, Daniel, 63
Gandón-Chapela, Evelyn, 10
García Salido, Marcos, 81
Garcia González, Marcos, 81
Garcia-Marchena, Oscar, 147
Gautier, Laurent, 32, 109
Georgopoulos, Athanasios, 22
Gil Martínez, María Adelaida, 77
Giraldez Ceballos-Escalera, Joaquín, 34
Gledhill, Christopher, 28
Gonzalez Darriba, Patricia, 2
Grön, Leonie, 143
Gregori-Signes, Carmen, 164
Gris Roca, Joaquín, 87
Hamilton, Clive, 71
Hedeland, Hanna, 149
Herrando Rodrigo, Isabel, 135
Heylen, Kris, 83
Jacques, Marie-Paule, 116
Jeon, Yun Sil, 141
Jettka, Daniel, 149
John, Suganthi, 12
Kang, Beomil, 4
Kubler, Natalie, 105
Kunilovskaya, Maria, 112
Lambrechts, An, 83
Lara-Clares, Cristina, 18
Laso, Natalia Judith, 12
Lee, Sun-Hee, 4
León-Araúz, Pilar, 91, 120
Lissón, Paula, 61
Liu, Yuanyi, 44
Llorián, Susana, 26
Lorés Sanz, Rosa, 135
Lozano Bachioqui, Eleonora, 54, 174
Mapelli, Giovanna, 135
Martínez Casas, María, 137
Martínez Zavala, Sonia Paola, 24
Martínez, Inmaculada, 26
Martikainen, Hanna, 28
Martinez-Insua, Ana Elina, 16
Maruenda-Bataller, Sergio, 8
Mas, Inmaculada, 129
Mestivier (Volanschi), Alexandra, 28
Mestivier, Alexandra, 105
Mestre-Mestre, Eva M., 122
Mezeg, Adriana, 114
Morales Moreno, Albert, 59
Moreno-Ortiz, Antonio, 151
Moreno-Sandoval, Antonio, 44
Morgoun, Natalia, 112
Muñoz-Garcés, Alejandro, 141
Murillo, Silvia, 133
Nevzorova, Olga, 40
Nguyen Van, Cyril, 109
Nieto Mercado, Dalila Itzel, 174
Nieto, Guadalupe, 172
Pérez Béjar, Víctor, 50
Padilla Herrada, María Soledad, 50
Pallejá, Clara, 159
Pecman, Mojca, 105
Pennock-Speck, Barry, 67
Perez-Guerra, Javier, 16
Piccioni, Sara, 179
Piqué-Noguera, Carmen, 79
Prado-Alonso, Carlos, 126
Ramisch, Carlos, 65
Ramos Ruiz, Ismael, 99
Reimerink, Arianne, 91
Rodríguez-Abruñeiras, Paula, 8
Rodríguez-Puente, Paula, 161
Romero Medina, Agustín, 87
Romero-Barranco, Jesús, 48
Ruano, Pablo, 14
Sánchez-Cárdenas, Beatriz, 65
Salles-Bernal, Soluna, 6
Santaemilia, José, 124
Savvidou, Paraskevi, 69
Selmi, Afef, 32
Sinkuniene, Jolanta, 166
Suau-Jiménez, Francisca, 79, 135
Suleymanov, Dzhavdet, 42
Tran, Thi Thu Hoai, 38
Tutin, Agnès, 131
Vadasz, Noemi, 20
Verplaetse, Heidi, 83
Villayandre, Milka, 52
Yan, Rui, 116
Yoo, Hye Ryeong, 4
Zhang, Xingzi, 93
Zimina, Maria, 28