Abstracts - Colloque international de Linguistique de Corpus

Abstracts
KEYNOTE SPEAKERS
Gloria Corpas Pastor
Universidad de Málaga, Spain
"Through the Corpus Glass: diatopy and idiomaticity in translated Spanish"
Gloria Corpas Pastor holds a PhD in English Philology from the Universidad Complutense de Madrid (1994).
She is Visiting Professor in Translation Technology at the Research Institute in Information and
Language Processing of the University of Wolverhampton (since 2007) and Professor of Translation and
Interpreting at the Universidad de Málaga (since 2008). She is the Spanish expert on the ISO
TC37/SC2-WG6 committee "Translation and Interpreting". She has published extensively and sits on
numerous national and international scientific committees and editorial boards. She is currently
President of AIETI (Asociación Ibérica de Traducción e Interpretación), a member of the Advisory
Council of EUROPHRAS ("European Society of Phraseology") and Vice-President of AMIT-A (Asociación
de Mujeres Investigadoras y Tecnólogas de Andalucía).
Susan Hunston
University of Birmingham, United Kingdom
"Words and Phrases: re-thinking corpus-based approaches to lexis and grammar"
Susan Hunston is Professor of English Language at the University of Birmingham (UK). She
specialises in corpus linguistics and discourse analysis. She is the author of several monographs,
including Corpora in Applied Linguistics (2002/CUP) and Corpus Approaches to Evaluation: Phraseology
and evaluative language (2011/Routledge), and co-author of Pattern Grammar: a corpus-driven approach
to the lexical grammar of English (1999/Benjamins). She is co-editor of Evaluation in Text: authorial
stance and the construction of discourse (2000/OUP) and of System and Corpus: exploring the
connections (2005/Equinox). She has published numerous articles on the use of corpora to describe the
grammar and lexis of English, and on corpora and discourse analysis.
Aquilino Sánchez Pérez
Universidad de Murcia, Spain
"The Cognitive Foundations of Corpus Linguistics"
Aquilino Sánchez Pérez was Director of the Escuela Oficial de Idiomas de Barcelona and taught at the
Universidad de Barcelona and the Universidad Autónoma de Barcelona. He was later appointed to a chair
in the Department of English Philology at the Universidad de Murcia, where he continues to teach. His
teaching and research have focused on foreign language teaching and learning, lexicology, monolingual
and bilingual (English-Spanish) lexicography, and corpus linguistics (corpus design and compilation,
and automatic word-sense disambiguation). He was co-founder and Secretary of the Asociación Española
de Lingüística Aplicada (AESLA), a founding member of the Asociación de Estudios Ingleses en España
and of the Asociación Europea de Lexicografía, and President of AELINCO (Asociación Española de
Lingüística del Corpus).
Contents
A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English, Patricia Gonzalez Darriba
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts, Sun-Hee Lee [et al.]
A Diachronic Study of the Conative Alternation Construction in English, Laura Esteban-Segura [et al.]
A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective, Sergio Maruenda-Bataller [et al.]
A corpus-based analysis of syntactic linking between antecedents and ellipsis sites in Post-Auxiliary Ellipsis in Modern English, Evelyn Gandón-Chapela
A corpus-based analysis of the collocational patterning of adjectives with abstract nouns in medical English, Natalia Judith Laso [et al.]
A corpus-stylistic analysis of direct thought presentation in Charles Dickens’s fifteen novels, Pablo Ruano
A data-driven analysis of linguistic complexity and proficiency in learner and native English, Javier Perez-Guerra [et al.]
Affix rivalry in English derivation: An onomasiological approach, Cristina Fernández-Alcaina [et al.]
Anaphora Resolution on the Fly – Pronouns in a Psycholinguistically Motivated Parsing System, Noemi Vadasz
Anaphora resolution in the interlanguage of English and Greek learners of Spanish: a corpus-based study, Athanasios Georgopoulos
Análisis de los aspectos pragmáticos en los discursos especializados de economía y finanzas: un trabajo basado en un corpus oral como apoyo a la interpretación, Sonia Paola Martínez Zavala
Aplicaciones del corpus CORPEN a la enseñanza y la evaluación de las unidades fraseológicas del español usado en contextos específicos, Inmaculada Martínez [et al.]
Applying Textometric Analysis to a Description of Cochrane Medical Abstracts and their Plain Language versions: Quantitative Characterisation of Plain Language in Medical Discourse, Christopher Gledhill [et al.]
Aproximación a la fraseología contrastiva en las sentencias del TJUE, Andrades Arsenio
Calcul de la saillance pour annoter un corpus anaphorique (RESUMAN), Afef Selmi [et al.]
Constitution d’un corpus juridique pour l’extraction des collocations, Joaquín Giraldez Ceballos-Escalera
Construction de corpus en vue d’une étude contrastive des structures résultatives en anglais et de leur traduction en français, Dijana Bojovic
Corpus en classe de langue. Exemple avec les marqueurs d’exemplification et de reformulation, Cristelle Cavalla [et al.]
Development of Tatar-Russian Socio-Political Dictionary of Collocations on Corpus Data, Olga Nevzorova
Development of annotation system for multiword constructions for Tatar National Corpus, Dzhavdet Suleymanov
Diccionario de terminología médica español - chino basado en corpus, Antonio Moreno-Sandoval [et al.]
Dire la nouveauté par les mots : les néologismes révélant les nouvelles tendances sociétales en France, Najet Boutmgharine Idyassner
Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?, Jesús Romero-Barranco
El corpus de fuentes digitales como herramienta para la gramática del discurso, Víctor Pérez Béjar [et al.]
El desacuerdo a través de la interrogación ecoica, María Valentina Barrio [et al.]
El lenguaje jurídico y el lenguaje de la ingeniería biomédica vistos desde la metodología de corpus, Eleonora Lozano Bachioqui [et al.]
Estudio comparativo de la traducción en inglés, francés y español de los aspectos lingüísticos y paralingüísticos de los cómics a partir de un corpus multimodal de género de terror, María Del Carmen Baena Lupiáñez
Estudio comparativo de las marcas de uso en los repertorios lexicográficos actuales, Estrella Calvo-Rubio Jiménez
Estudio contrastivo de corpus para identificar los rasgos diacrónicos del discurso normativo catalán: estudio de los Estatutos de autonomía de 1932, 1979 y 2006, Albert Morales Moreno
Estudio de la aplicabilidad de la ley de Zipf y de la ley de Heaps en los corpus de aprendientes de inglés, Nicolas Ballier [et al.]
Extracción de fraseología contable con Sketch Engine. Propuesta de flujo de trabajo, Daniel Gallego
Extracting semantic frame structures from Environmental Sciences corpora, Beatriz Sánchez-Cárdenas [et al.]
Facework in a telecollaboration student corpus, Pennock-Speck Barry [et al.]
From text to word and from word to morpheme: Exploring the interface of corpus linguistics and word formation study with evidence from Modern Greek, Paraskevi Savvidou
Functional and thematic ngrams in specialized corpora: the case of academic English, French and Spanish, Clive Hamilton
Gender-based differences in the use of epistemic modals in late Modern English scientific register, Francisco Alonso-Almeida [et al.]
Gobernabilidad y democracia en México. Unidades fraseológicas del Ejecutivo Federal 2012-2016 desde el Análisis Crítico del Discurso, Carlos Enrique Ahuactzin Martínez
Gramática española para hablantes de francés: el uso de la preposición "de" después de matrices del tipo es posible, María Adelaida Gil Martínez
Hedging in tourism discourse: the variable genre in academic vs professional texts, Francisca Suau-Jiménez [et al.]
Identificación de fórmulas recurrentes en español académico, Marcos García Salido [et al.]
Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts, Heidi Verplaetse [et al.]
Investigating style and conventionality in literary translation: a corpus-based approach, Carolina Barcellos
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study, Joaquín Gris Roca [et al.]
Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective, Hakan Cangir
Knowledge extraction for TKB phraseology module design, Pilar León-Araúz [et al.]
L’analyse contrastive des références au passé en français et en chinois - Sur le corpus des récits, Xingzi Zhang
La adquisición de los verbos de cambio: Un análisis de la interlengua de aprendices de español (L1 sueco), Ester Fernández
La detección y etiquetado de las estrategias metadiscursivas en artículos académicos: METOOL, María Luisa Carrió-Pastor
La economía al borde de un ataque de nervios: metáforas médicas en el discurso periodístico económico, Ismael Ramos Ruiz
La mise en discours des données chiffrées dans les textes de vulgarisation scientifique, Riham El Khamissy
La modalité dans les discours politiques : segments phraséologiques en langue et en discours. Exploration textométrique d’un corpus de débats présidentiels états-uniens (1960-2016), Marion Bendinelli
La traduction des "megatermes" anglais de type erythrocyte invasion-inhibitory response : une approche fondée sur corpus et analyse du discours, Mojca Pecman [et al.]
La traduction publicitaire : approche par corpus, Isabel Comitre Narvaez
Le continuum lexique-grammaire en genre spécialisé à partir de corpus maison, Laurent Gautier [et al.]
Le marqueur discursif "donc" dans deux corpus dialogaux de différente nature, Gemma Delgar Farrés
Learner vs. professional translational behavior: The case of discourse markers, Maria Kunilovskaya [et al.]
Les appositions nominales en français et en slovène : étude contrastive sur le corpus FraSloK, Adriana Mezeg
Les constructions verbales en comme : de l’écrit scientifique à l’écrit académique des étudiants natifs/non-natifs, Marie-Paule Jacques [et al.]
Meeting the reader in academic writing: reader pronouns in English and French, Curry Niall
Multi-word terms: disclosing the semantic relations in noun compounds, Melania Cabezas-García [et al.]
Multilingual extraction of terminology from specialised corpora, Eva M. Mestre-Mestre
Naming practices and media constructions of reality in Spanish: A corpus-based perspective on violence against women news (2005-2015), José Santaemilia
On the Endophoric, Abstract and Narrative Nature of Idiomatic ’Do So’ in Legal texts, Journalistic Texts and Written Correspondence, Carlos Prado-Alonso
On the Grammaticalization Path of the Quasi-coordinator as well as, Miriam Criado Peña
Onomasiología del sentimiento: los corpus lingüísticos como fuente de datos para la semántica y la combinatoria sintagmática de los nombres de emoción en español, Inmaculada Mas
Phraseological routines in scientific writing: the example of metatextual routines in French, Agnès Tutin
Phraseology and discourse grammar in English as a lingua franca: ’on the contrary’ and ’on the other hand’ in unedited research papers, Silvia Murillo
ROUND TABLE: Corpus-based analysis of interpersonal metadiscourse in specialized domains: academic vs professional and social genres. Theoretical and methodological challenges, Francisca Suau-Jiménez [et al.]
Rocking the corpus. A discourse analysis of pop rock lyrics, María Martínez Casas
SUNCODAC: A Spanish-English corpus of computer-mediated student discussions, Mario Cal Varela [et al.]
Secuencia gramatical para la enseñanza del español como lengua extranjera, Yun Sil Jeon [et al.]
Semantic constraints on MWU formation: Evidence from clinical records, Leonie Grön [et al.]
Sobre la cuasi-sinonimia de poner y meter en español: un análisis de regresión logística de dos verbos locativos, Marie Comer
Spanish Fragments and Polar Verbless Clauses. Typology and Corpus Distribution, Oscar Garcia-Marchena
Spoken Language Corpora under Examination, Hanna Hedeland [et al.]
Strategies for Processing Large Corpora for Linguistic Inquiry and Natural Language Processing Tasks, Antonio Moreno-Ortiz
Students’ use of the n-grams tool to learn about phraseology in academic writing, Maggie Charles
Teachers’ Dispositions Towards the Use of Corpus-Based Approaches in Teaching English as a Foreign Language in Higher Education, Awatif Alruwaili
The Developmental Relationship between Spoken and Written Clause Packaging in an English Secondary School, Mark Brenchley
The Psycholinguistic Profile of Domestic Abusers: A Corpus-Based Approach, Ángela Almela [et al.]
The XML Annotation of A Corpus of Historical English Law Reports 1535-1999: A Progress Report, Paula Rodríguez-Puente
The construction of shared feelings: analysis of affect in a corpus of obituary comments in online newspapers, Isabel Corona
The implied consumer in British hotel websites, Carmen Gregori-Signes
The power of English: I and we in ELF and in ENL academic discourse, Jolanta Sinkuniene
The textual colligation of stance phraseology in cross-disciplinary academic discourses: the timing of authors’ self-projection, Louisa Buckingham [et al.]
Towards an extended lexical grammar: Complex colligational patterns of the noun cause, Moisés Almela Sánchez [et al.]
Técnicas de caracterización de los personajes femeninos en Galdós: una aproximación desde los estudios de corpus, Guadalupe Nieto
Unidades fraseológicas en la subtitulación de una serie del género de drama, Dalila Itzel Nieto Mercado [et al.]
Verbal agreement with NCOLL-of-NPL subjects in the inner varieties of English in GloWbE, Yolanda Fernández-Pena
Évaluer le seuil de fréquence pour la sélection des paquets lexicaux: de bonnes nouvelles avec quelques réserves, Yves Bestgen
Índice de creatividad metafórica y universales de traducción: propuesta metodológica a partir de un corpus de informes de responsabilidad social empresarial, Sara Piccioni
‘His maiestie chargeth, that no person shall engrose any maner of corne’. The Standardization of Punctuation in Early Modern English Legal Proclamations, Javier Calle-Martín
‘Making it clear’: A contrastive study of evidentials and boosters in contemporary political discourse, Ana Albalat-Mascarell
List of authors
A Comparable Corpora Study on Self-Directed Motion in Spontaneous and Translated English
Patricia Gonzalez Darriba* 1
1 Rutgers, The State University of New Jersey [New Brunswick] (RUTGERS), 100 George Street, New Brunswick, NJ 08901, United States
This paper employs a corpus-based approach to test two sets of hypotheses that predict opposite outcomes regarding the Unique Item T-Universal (Chesterman, 2004, 2010): on the one hand, Tirkkonen-Condit’s (2004) Unique Item Hypothesis, which claims that Unique Items are under-represented in translated texts, and on the other hand, Baker’s (1993) Simplification Hypothesis and Halverson’s (2003) Gravitational Pull Hypothesis, which predict over-representation of Unique Items in translated texts. To test these hypotheses, two comparable corpora were selected and analyzed: the Translational English Corpus (TEC; Baker, 2003) and the Corpus of Contemporary American English (COCA; Davies, 2008), specifically with regard to the relative presence of English self-directed motion expressions such as float into and fly out. The use of Spanish source texts for the translated English texts in the TEC makes it possible to compare the prevalence of two widely accepted motion lexicalization patterns that correspond to the two languages in question: satellite-framed constructions in English and verb-framed constructions in Spanish (Talmy, 1985; Slobin, 1996; Levin and Rappaport, 2016).
A total of 28 English manner-of-motion verbs in combination with 8 English path-denoting satellites were selected to search for, count, and compare self-directed motion expressions in the TEC and the COCA. The study yielded a total of 41,852 tokens from both corpora, which corresponds to 209.2 self-directed motion expressions per million words in the TEC and 395.5 per million words in the COCA. Data from the 28 verbs in both corpora were analyzed using an independent-samples t-test, which revealed that the number of self-directed motion expressions is significantly higher in the COCA (M = 3.32) than in the TEC (M = 1.76; t(219.267) = -2.274, p = .012; Levene: p = .029). Moreover, a two-way ANOVA was conducted to compare the main effects of Corpus and Lexical Frequency, and the interaction effect between Corpus and Lexical Frequency, on the number of self-directed motion occurrences by verb form per million words. Main effects were significant for both Corpus and Lexical Frequency, but no Corpus*Lexical Frequency interaction effect was found.
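The normalisation and the group comparison described in this paragraph can be illustrated with a minimal sketch. It is not the author's script: the function names and the toy figures are hypothetical, and Welch's unequal-variance t-test is assumed only because the reported degrees of freedom are fractional.

```python
from math import sqrt
from statistics import mean, variance


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-million rates for a handful of verb forms in each corpus.
translated = [1.2, 0.8, 2.5, 1.9, 0.4]
spontaneous = [3.1, 2.7, 4.4, 3.9, 1.6]
t, df = welch_t(translated, spontaneous)
print(f"t = {t:.3f}, df = {df:.3f}")
```

A negative t here simply means that the first sample (translated) has the lower mean; a p-value would additionally require the t distribution, which is left out to keep the sketch dependency-free.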
These results confirm Tirkkonen-Condit’s Unique Item Hypothesis by showing that spontaneous, non-translated English is significantly richer in self-directed motion expressions than translated English, regardless of the frequency of the verb taking part in the expression, and disprove the Simplification Hypothesis (Baker, 1993) and the Gravitational Pull Hypothesis (Halverson, 2003). Additionally, the results provide a baseline for future research aimed at gaining a better understanding of the cognitive processes involved in the translation of self-directed motion expressions.
Keywords: comparable corpora, self-directed motion, translation universals, under-representation of unique items
* Speaker
A Corpus-Based Analysis of Phraseological Units in Korean Academic Texts
Sun-Hee Lee*† 1, Beomil Kang‡ 2, Hye Ryeong Yoo§ 3
1 Department of East Asian Languages and Cultures, Wellesley College (EALC), Green Hall 236B, 106 Central Street, Wellesley, MA 02481, United States
2 Department of Korean Language and Literature, Yonsei University (Korean Yonsei), Oesolgwan 214, Yonsei University, Yonsei-ro 50, Seodaemun-Gu, Seoul, South Korea
3 Department of Korean Language and Literature, Yonsei Graduate School (Yonsei), Oesolgwan 214, Yonsei-ro 50, Seodaemun-Gu, Seoul, South Korea
This study provides a corpus-based genre analysis of phraseological expressions in Korean academic prose, including collocations, colligations, and prefabricated lexical bundles (or formulaic expressions). Because Korean is an agglutinative language, its phrasal structures incorporate particles and verbal endings in word-units and are more complex than the corresponding English structures. While exploring relevant challenges and new methodological tools to capture typologically distinct properties of Korean, we identify unique genre-specific properties of L1 academic texts using prefabricated phraseological units.
We have collected a 10.9 million ecel (space-based unit) corpus composed of 2171 academic theses in the humanities and social sciences with the highest ranks within the Korea Citation Index. From the corpus we extracted phraseological units based on language-model N-grams and processed them with statistical tools. While addressing related challenges in language-specific data processing and analysis, we present the distinct linguistic functions of the phraseological units in Korean academic prose in comparison with other registers. Our study demonstrates the need to integrate both corpus-driven and corpus-based methodologies in order to process meaningful lexico-grammatical combinations in Korean, where strong morphosyntactic relations hold across distinct phrasal boundaries via a diverse collection of particles and endings. Our study also shows that combining N-gram-based extraction and morpheme-based cut-offs is more useful for identifying meaningful combinations. In line with Jang (2015), we argue for incorporating context sensitivity into N-grams to determine more useful patterns, especially for processing agglutinative languages like Korean. For example, collecting the slots preceding and following an extracted N-gram and using them to decide the final pattern increases the usability of the outcome. In the post-processing step of counting the frequency of an extracted N-gram, we merge a verbal lexeme with any following dependent morphemes that do not make a meaningful linguistic contribution to the given phraseological unit; this process significantly decreases the number of patterns produced by morpheme-based processing of N-grams in Korean. Based upon the extracted phraseological expressions, we provide a genre-focused linguistic analysis of the Korean academic register.
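The idea of extracting N-grams with a frequency cut-off while recording the slots before and after each one can be sketched as follows. This is only an illustration under stated assumptions: the function name, the boundary markers and the English placeholder tokens are hypothetical, and the sketch does not reproduce the authors' morpheme-based pipeline.

```python
from collections import Counter, defaultdict


def extract_ngrams(sentences, n, min_freq):
    """Count n-grams over tokenised sentences and keep those at or above min_freq.

    Also records which tokens fill the slot immediately before and after
    each n-gram ("<s>" and "</s>" mark sentence boundaries).
    """
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            left[gram][tokens[i - 1] if i > 0 else "<s>"] += 1
            right[gram][tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return {
        gram: {"freq": freq, "left": dict(left[gram]), "right": dict(right[gram])}
        for gram, freq in counts.items()
        if freq >= min_freq
    }


# Placeholder tokens; a real run would use morpheme-analysed Korean text.
toy_corpus = [
    ["in", "this", "study", "we", "argue"],
    ["in", "this", "study", "we", "show"],
    ["this", "study", "we", "argue"],
]
for gram, info in extract_ngrams(toy_corpus, n=3, min_freq=2).items():
    print(" ".join(gram), info)
```

Inspecting the `left` and `right` slot counts is one way to decide whether a frequent N-gram should be extended or trimmed before it is treated as a pattern.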
While we are still in the process of extracting meaningful phraseological patterns, our pilot study suggests that referential expressions, stance expressions, hedges, etc. serve dynamic functions in Korean academic texts. Despite the lack of referential expressions in Korean, phraseological units with the demonstrative pronouns i ‘this’ and ku ‘that’ are highly frequent in academic contexts. Expressions of epistemic and attitudinal/modality stance are more rigorously used in the Korean academic register, which contrasts with Biber’s (2004) analysis of academic prose in English. Expressions of indirect quotation and hedges are noticeable in the extracted outcome. These findings suggest that the sociocultural property of indirectness is prevalently reflected in Korean academic writing.
The outcome of our study will provide a platform for further research with a large corpus of more than 100 million ecel for applied/pedagogical research on language acquisition and Korean for academic purposes (KEP). The long-term goal of our research is to develop a full-fledged genre analysis of L1 academic texts as well as L2 acquisition data. The study also explores dynamic interactions between grammar and lexicon in agglutinative languages like Korean while identifying language-specific features in processing phraseological units and in the genre analysis of academic texts.
Keywords: phraseological expressions, formulaic expressions, collocation, genre analysis, academic register
* Speaker
†, ‡, § Corresponding author
A Diachronic Study of the Conative Alternation Construction in English
Laura Esteban-Segura*† 1, Soluna Salles-Bernal* 1
1 Universidad de Málaga (UMA), Spain
The conative alternation is a subtype of transitivity alternation in which there is a transitive
variant and an intransitive one represented with an at-construction. From a syntactic point of
view, it occurs with transitive verbs and is therefore referred to as a case of preposition insertion
(the preposition at is inserted before the direct object). From a semantic perspective, it can be
described as a "detransitivizing" construction, since there is a contrast between conative uses of
transitive verbs and their transitive counterparts (Perek 2015: 90). Accordingly, the argument
can be direct (subject, direct object or indirect object) or oblique.
(1) a. Kim cut the pie.
b. Kim cut at the pie (drunkenly) (Beavers 2006: 6).
The patient ("the pie") can have two realizations: as the direct object (1a) or as an oblique signalled by the preposition at (1b). Here we find a semantic contrast: in the transitive variant the patient is known to have been affected in some way, whereas in the variant with the at-construction this is not necessarily the case. Thus, the action denoted by the verb may or may not have been completed, and the alternation may convey "a reduced degree of effectiveness" (Riemer 2010: 354), as seen in example (2b) below, which implies that the action was not completely successful:
(2) a. The zombies slashed my face.
b. The zombies slashed at my face.
Although the construction has been studied before (van der Leek [1996], Broccias [2001, 2003], Beavers [2010], Perek and Lemmens [2010], Guerrero-Medina [2011], Perek [2015]), it has scarcely been investigated from a diachronic point of view. Our main objective is therefore to investigate the origin and development of the conative construction in English by looking at its occurrence in several historical corpora. For this purpose, we first compiled a comprehensive list of verbs that allow the construction and then selected the verbs under study. A collostructional analysis, which "investigates which lexemes are strongly attracted or repelled by a particular slot in the construction (i.e. occur more frequently or less frequently than expected)" (Stefanowitsch and Gries 2003: 214), has been carried out, as it can help to establish which verbs favour the construction over others in the different corpora. Our preliminary results show that the construction was already present in Old English and that in most instances the subject is agentive or animate.
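As a rough illustration of the collostructional logic quoted above, the sketch below compares a verb's observed frequency in a construction with the frequency expected under independence and computes a one-sided Fisher exact p-value for attraction. The counts and function names are invented for illustration; this is not the authors' procedure or data.

```python
from math import comb


def expected_frequency(verb_in_cx, verb_elsewhere, others_in_cx, others_elsewhere):
    """Expected count of the verb in the construction if there were no association."""
    total = verb_in_cx + verb_elsewhere + others_in_cx + others_elsewhere
    return (verb_in_cx + verb_elsewhere) * (verb_in_cx + others_in_cx) / total


def attraction_p_value(verb_in_cx, verb_elsewhere, others_in_cx, others_elsewhere):
    """One-sided Fisher exact p-value: P(count >= observed) under independence."""
    total = verb_in_cx + verb_elsewhere + others_in_cx + others_elsewhere
    verb_total = verb_in_cx + verb_elsewhere
    cx_total = verb_in_cx + others_in_cx
    upper = min(verb_total, cx_total)
    # Sum the hypergeometric probabilities of the observed table and of
    # every table with an even higher count of the verb in the construction.
    tail = sum(
        comb(verb_total, k) * comb(total - verb_total, cx_total - k)
        for k in range(verb_in_cx, upper + 1)
    )
    return tail / comb(total, cx_total)


# Hypothetical counts: a verb occurs 30 times in the at-construction and
# 170 times elsewhere; all other verbs occur 70 and 9,730 times respectively.
table = (30, 170, 70, 9730)
print("expected:", expected_frequency(*table))
print("p (attraction):", attraction_p_value(*table))
```

A verb whose observed count is well above the expected count and whose p-value is very small would count as attracted to the construction; repulsion can be tested with the opposite tail.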
Keywords: conative alternation, verb alternation, history of English, collostructional analysis
* Speaker
† Corresponding author
A corpus-based analysis of news values in construing intimate partner violence discourses in digital written media: A historical perspective
Sergio Maruenda-Bataller* 1, Paula Rodríguez-Abruñeiras* 1
1 IULMA/Universitat de València, Spain
* Speaker
In the last thirty years, there have been important advances in the media coverage or discussion of violence against women (VAW) (Aran Ramspott & Medina Bravo 2006; Vallejo-Rubinstein 2005). Intimate partner violence (IPV) is now indisputably one of the key issues not only in political, social and institutional discourses but also on the selection agenda of news producers. The recognition of this phenomenon has been largely due to the media, which have played a decisive role in transferring the issue from the private and personal to the public sphere, thus ensuring visibility and helping to raise public awareness (Berganza Conde 2003). However, some authors (e.g. Altés 1998; Alberdi & Matas 2002) have argued that this is not without a cost. Media are torn between two conflicting interests: on the one hand, to treat these grievous cases with the required ethics and, on the other, to attract a maximum audience, which is almost ‘naturally’ done through sensationalism. Journalists can create different pictures of domestic violence and "confirm and debunk the myths surrounding it by choosing certain topics, sources, facts, and words over others" (Bullock & Cubert 2002: 479).
Against this backdrop, the present study aims to contribute a corpus-based approach to the
discursive devices used to construct newsworthiness in IPV news in Spanish and UK dailies in
an ad-hoc corpus of gender violence news reports from 2005 to 2015. Specifically, we explore the
way media outlets have discursively represented women victims of IPV by means of news values
over the last decade. In addition, we explore the way news values are exploited ideologically to construct discourse prosodies around women victims of IPV, violent episodes and
perpetrators. The results offer insights into the social configuration and definition of women and
their identities in contemporary written media on IPV through time.
For our purposes, we apply Bednarek & Caple’s (2012; 2014) linguistic approach to news values
as discursive realisations of newsworthiness that “exist in and are constructed through discourse”
(Bednarek & Caple 2014: 136). Our analysis combines a quantitative approach with close qualitative readings of concordance lines to identify frequent linguistic occurrences in the corpus that
may give rise to discourse prosodies (Bednarek 2006; Baker et al. 2008; Baker & Levon 2015).
We pay attention to shared and different values cross-culturally, together with the most relevant discourse prosodies and ideological implications. Our results substantiate the existence of
two polarised discourses which are nevertheless inextricably and ineluctably linked: a discourse
of death, violence and terrible suffering and another of institutional and social support. The
former is mainly conveyed through Negativity and Impact, while the latter is conveyed through
Eliteness and Positivity. On the whole, these discourses are similarly constructed in the four
data sets. However, the concordance analysis points to remarkable differences. It shows that
Negativity has more critical overtones in the Spanish newspapers, and reports on abusers are
often constructed as more impersonal in the case of UK dailies. As for the depiction of extreme
negative emotions, the higher number of occurrences, together with a wider range of word
combinations, constructs Spanish reports as more ideological, if not sensationalist, thus exploiting
readers’ interest in crime and violence.
Keywords: intimate partner violence, news values, newsworthiness, CADS, women.
A corpus-based analysis of syntactic linking
between antecedents and ellipsis sites in
Post-Auxiliary Ellipsis in Modern English
Evelyn Gandón-Chapela* 1
1 University of Cantabria and University of Vigo – Spain
This study analyses the type of syntactic linking established between the antecedent clause(s)
and the ellipsis site(s) in cases of Post-Auxiliary Ellipsis (PAE) in Modern English, using the
Penn Parsed Corpus of Modern British English (1700-1914, one million words and eighteen
different genres). The term ‘PAE’ (Sag 1976; Warner 1993; Miller 2011; Miller & Pullum 2014)
covers those cases in which a Verb Phrase, Prepositional Phrase, Noun Phrase, Adjective Phrase
or Adverbial Phrase is omitted after modal auxiliaries, auxiliaries be, have and do, and infinitival marker to. VP ellipsis (VPE) and Pseudogapping (PG) are the two subtypes of PAE under
investigation:
(1) That I had received such from Edward also I need not mention; but I do, you see, because
it is a pleasure. [VPE: coordination]
(2) They can by no means, therefore, be members of happiness; for if they were, happiness
might be said to be made up of one member. [VPE: adverbial subordination]
(3) I can recollect nothing more to say. When my letter is gone, I suppose I shall. [VPE:
none]
(4) A skilled florist will produce a finer effect with a few inexpensive blossoms than an unskilled one will with a cartload of choice material. [PG: comparative subordination]
(5) but did not admire the strain of its poetry in general, though I did its morality. [PG:
adverbial subordination]
This aspect has been studied in only a few corpus-based works on Present-Day English (Hardt & Rambow 2001; Nielsen 2005; Hoeksema 2006; Bos & Spenader 2011; Sharifzadeh
2012; Miller 2014). Here I extend these studies by analysing the type of syntactic linking in
PAE constructions in Modern English and by presenting an algorithm for retrieving instances of
PAE via CorpusSearch 2. This complex algorithm achieves a high recall ratio (0.97)
and is applicable to parsed corpora which follow the conventions of the Penn Parsed Corpus of
Modern British English. The results show that, regarding PG, the vast majority of cases are
comparative constructions (74.42%), followed by those cases with lack of syntactic linking (15.12%),
coordination (4.65%), adverbial subordination (4.65%) and relative subordination (1.16%). The
comparison with other studies on PG in Present-Day English (Hoeksema 2006; Sharifzadeh 2012;
Miller 2014) has revealed that instances of PG with NP remnants have a stronger preference
for comparative constructions in Present-Day English (around 90%) than in Modern English
(70%). Regarding VPE, in over 50% of the examples there is no syntactic linking between the
source and the target of ellipsis, which contrasts with the percentage found in PG (15.12%).
The second most important type of syntactic linking is comparative subordination (31.51%).
However, although the percentage of comparative constructions is high in VPE, it is almost
2.5 times higher in PG (74.42%). Far less common are cases of relative subordination (7.22%),
coordination (5.56%) and adverbial subordination (5.37%). If these findings are compared with
Bos & Spenader’s (2011), it is observed that the first three types of linking are the same in both
studies: as-appositives, comparatives and lack of syntactic linking. Hardt & Rambow (2001),
for their part, found that the different forms of subordination favour VPE, while the absence of
a direct relation disfavours its presence. However, this type of linking is the third most frequent
one in Bos & Spenader’s (2011) work and in this paper.
Keywords: ellipsis, syntactic linking, Modern English
A corpus-based analysis of the collocational
patterning of adjectives with abstract nouns
in medical English
Natalia Judith Laso*† 1, Suganthi John* 2
1 University of Barcelona (UB) – Spain
2 University of Birmingham – United Kingdom
Research on domain-specific phraseology has demonstrated that it is challenging for writers of
English as an additional language (EAL) to acquire phraseological competence in academic English and develop a good working
knowledge of domain-specific collocational patterns (Carter 1998; Williams 1998; Wray 1999;
Gledhill 2000; Flowerdew 2003; Biber 2006; Hyland 2008 & 2016; Granger & Meunier 2008;
Author 1 & Author 2 2013; Pérez-Llantada 2014). This is especially apparent in
scientific discourse, where research grows at a rapid pace and researchers are often required to
disseminate their results equally rapidly to an international audience. The struggle for the EAL
speaker is learning the discourse conventions of the scientific genre to ensure that their results
receive the attention they would like from other members of the scientific community.
Corpus-based analyses have been of special relevance in the field of genre analysis, since a genre is a
specific language practice, characterised by a number of linguistic features and phraseological
conventions. It can therefore be claimed that genres make use of different ways of expressing
meaning (Swales 1990; Hunston 2002). This assumption is intimately linked with the concept
of local grammar (Gross 1993; Barnbrook & Sinclair 1995; Hunston & Sinclair 2000), which
consists of a description of particular areas of language (e.g. the analysis of the collocational
and phraseological conventions characteristic of scientific discourse), rather than the language
as a whole (Bednarek 2007).
The aim of this paper is to describe one pattern commonly found in scientific discourse, i.e.
abstract nouns in combination with adjectives, so as to contribute to the characterisation of
this combinatorial pattern in medical science writing. The corpus analysed in this study is
the Health Science Corpus (HSC), which is a representative sample of health science research
articles specifically compiled for investigating the lexico-grammatical patterns surrounding non-technical terms in scientific English and the conventionalised phraseological characteristics of
this genre. The observations drawn have contributed to our understanding of the positions and
typology of adjectives in combination with abstract noun patterns in medical English.
Furthermore, this study has also highlighted the usefulness of collocational evidence obtained from textual corpora in EFL and ESP settings so as to help EAL writers
focus on slices of real language as well as high-frequency combinations of words. To this end,
the findings in this study have informed the development of SciE-Lex, a reference tool which
provides information about the meanings and the grammatical and collocational patterns of
general terms frequently produced in medical English. The aim of SciE-Lex is to help the Spanish professional medical community use the appropriate collocational patterns in their medical
research articles.
Some other publicly available resources, such as existing technical and scientific monolingual
dictionaries, focus mainly on terminological and encyclopaedic information or, as in the case
of bilingual and multilingual dictionaries, they provide translation equivalents without further
information about the context on which the meaning of a given lexical entry depends. Consequently, the development of lexical databases like SciE-Lex as well as specialised dictionaries
that take into account the lexico-grammatical patterning of lexical units and acknowledge that
meaning is highly dependent on the context of co-occurrence of the word (Barnbrook 2007:191)
is considered to be extremely valuable to the EAL scientific community.
Keywords: phraseological units, abstract nouns, EAL writers, medical community, ESP corpus
investigation
A corpus-stylistic analysis of direct thought
presentation in Charles Dickens’s fifteen
novels
Pablo Ruano* 1
1 Universidad de Extremadura (UEx) – Spain
In this presentation, a corpus-stylistic analysis of direct thought presentation will be carried
out in a corpus of Charles Dickens’s fifteen novels (c. 3.8 million words). The aim of the analysis is to delve deeper into Dickens’s presentation of his characters’ thoughts, an aspect so far
underexplored, perhaps owing to the ‘lack of psychological inwardness and depth in his characters’
(McParland, 2011: 209). Despite this dearth of psychological depth, though, Dickens consistently reported his characters’ thoughts throughout his fifteen novels. Therefore, a systematic
analysis of how he did so is in order, if only because no comprehensive account of it has
yet been attempted. As will be shown, occurrences of direct thought (henceforth, DT) can be effectively retrieved thanks to a corpus methodology, which makes it possible to systematically
analyse Dickens’s use of this mode of thought presentation. Specifically, 244 occurrences of DT
have been retrieved here, constituting a much wider set of examples than the twenty-one examined by Busse (2010) in the most comprehensive analysis of discourse presentation strategies in
nineteenth-century fiction to date.[1] The analysis of these 244 occurrences will not only further
confirm some of Busse’s findings regarding DT in nineteenth-century narrative fiction, but will
also unveil hitherto unremarked patterns in form and function as far as Dickens’s presentation of
his characters’ thoughts is concerned. The analysis has focused on those examples that contain
the verb think, the reporting verb for thought presentation par excellence. For example:
“John” thought madame, checking off her work as her fingers knitted, and her eyes looked at
the stranger. “Stay long enough, and I shall knit ‘BARSAD’ before you go.” (A Tale of Two
Cities, book 2, chapter 16)
This example contains several characteristic features of Dickens’s use of DT, such as the use
of a vocative in the reported clause, a suspended reporting clause and the reference to the character’s eyes. These and other traits are investigated in this presentation. As will be shown, they
fulfil meaningful functions which relate to significant aspects of Dickens’s style, as discussed
by other critics. The analysis is intended to contribute to a better understanding of Dickens’s
craftsmanship from a stylistic point of view.
[1] It is only fair to note that Busse’s corpus is composed of excerpts of fewer than 3,500 words from
twenty-two nineteenth-century novels (Busse, 2010: 64), being therefore much smaller than the
corpus of Dickens’s novels analysed here.
Keywords: Dickens, corpus stylistics, direct thought presentation
A data-driven analysis of linguistic
complexity and proficiency in learner and
native English
Javier Perez-Guerra* 1, Ana Elina Martinez-Insua* 1
1 University of Vigo (UVigo) – FFT. Campus Universitario. 36310 Vigo, Spain
This paper investigates issues covered by the umbrella concept of ‘linguistic complexity’ in
learner language. The notion of complexity, as understood in this study, focuses on a number of
dimensions: lexical, syntactic and semantic-discursive. The null hypothesis ‘learner language
does not deviate from native language as regards linguistic complexity’ is rejected in light of
data-driven standard metrics of linguistic density and inter-/intra-textual diversity.
On the one hand, the data sampling learner language are retrieved from the Early-Access Subset
of the Trinity Lancaster Corpus, compiled at the ESRC Centre for Corpus Approaches to Social
Science, Lancaster University. This subset of the Trinity Lancaster Corpus comprises approximately two million words and includes transcribed interactions between candidates
and examiners from B1 to C2 level of the Common European Framework of Reference (Council
of Europe 2001). Each candidate participated in a number of speaking tasks (depending on
his/her proficiency level). On the other hand, the data retrieved from the learner dataset will
be compared with results deriving from the native-speaker corpus LOCNEC (Centre for English
Corpus Linguistics, Université catholique de Louvain), which will constitute the English native
control corpus, as well as with other non-native L2 corpora, such as the Louvain International
Database of Spoken English Interlanguage (LINDSEI).
The software tools which will be used in this research are Coh-Metrix (McNamara et
al. 2014) and Synlex (Lu 2012, 2014). First, Coh-Metrix provides basic lexical and semantic-discursive features such as type-token ratio and average word and sentence length, as well as
other metrics of textual lexical diversity (mainly vocd-D) and readability indexes (Flesch Reading
Ease, Flesch-Kincaid Grade Level). In addition, it determines spaces in Latent Semantic Analysis
which can be used to characterise the degree of conceptual similarity within a group of texts.
Second, Synlex (Lu’s Lexical Complexity Analyzer and L2 Syntactic Complexity Analyzer) automates the analysis of complexity by using 25 different measures of lexical density, taken from
the first- and second-language development literature. The input texts from the Early-Access
Subset of the Trinity Lancaster Corpus will be POS-tagged and lemmatised by means of TreeTagger so that Synlex can provide the different measures.
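To make two of the basic metrics named above concrete, here is a minimal Python sketch of the type-token ratio and the Flesch Reading Ease score. It is not the Coh-Metrix or Synlex implementation: the tokeniser is a deliberately naive regular expression, the function names are illustrative, and the syllable count is passed in as an assumed input because syllable counting requires its own heuristic.

```python
import re


def tokenize(text: str) -> list[str]:
    """Naive tokeniser (an assumption of this sketch): lowercase alphabetic strings."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word forms divided by the number of running words."""
    return len(set(tokens)) / len(tokens)


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch Reading Ease formula computed from raw counts."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    tokens = tokenize("The cat saw the other cat.")
    print(type_token_ratio(tokens))
    # Hypothetical counts for a 100-word, 5-sentence transcript with 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
```

Because the raw type-token ratio falls as texts get longer, length-corrected measures such as vocd-D are generally preferred when transcripts of different sizes are compared.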
The statistical analysis and discussion of the metrics for the native and the learner corpora,
as supplied by Coh-Metrix and Synlex, will be decisive in addressing the following research
questions: does learner language differ from native language as regards linguistic complexity?
Do the CEFR levels imply differences as regards linguistic complexity? The results show, first,
that the answers to the previous research questions are positive and, second, that the cline as
regards complexity degrees complies with the CEFR levels in a very significant way.
References
Council of Europe. 2001. Common European Framework of Reference for Languages: learning,
teaching, assessment. Cambridge: Cambridge UP.
Landauer, Thomas K. 2007. LSA as a theory of meaning. In Thomas K. Landauer, Danielle
S. McNamara, Simon Dennis and Walter Kintsch eds. The handbook of Latent Semantic Analysis. Mahwah, NJ: Lawrence Erlbaum, 3–34.
Lu, Xiaofei. 2012. The relationship of lexical richness to the quality of ESL learners’ oral
narratives. The Modern Language Journal 96/2: 190–208.
Lu, Xiaofei. 2014. Computational methods for corpus annotation and analysis. Dordrecht:
Springer.
McNamara, Danielle S., Arthur C. Graesser, Philip M. McCarthy and Zhiqiang Cai. 2014.
Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge UP.
Keywords: complexity, CEFR, metrics, learner language, readability
Affix rivalry in English derivation: An
onomasiological approach
Cristina Fernández-Alcaina 1, Cristina Lara-Clares 1, Jesús Fernández-Domínguez* 1
1 University of Granada [Granada] – Avda. del Hospicio, s/n C.P. 18071 Granada, Spain
The notion that morphological processes contend with each other for concept naming is a
well-known and substantiated one, and underlies some of the most prominent word-formation
theories. Morphological competition is however a slippery notion which numerous scholars have
dealt with in passing, and where the existing approaches are more theoretically than empirically
oriented. In principle, competition is a theory-neutral notion and “[...] happens when two or
more morphological processes can express the same syntactic-semantic function” (Kastovsky
1986: 597). The competitive behaviour of word-formation has been the focus of recent investigations, most of which have adopted a primarily formal perspective by comparing pairs or small
groups of competing rules (Bauer 2006, Bauer et al. 2010, Aronoff & Lindsay 2014). The scope
of such works includes the semantics of derivation, but their driving force is formal performance.
Ultimately, the conclusions of customary approaches to competition are that affix X succeeds in
the competition with affix Y, or that affix Z dominates in a given morphosemantic context.
One alternative to the above is found in the onomasiological model of word-formation, which
follows in the tradition of the Prague School of Linguistics and whose main exponent is Štekauer
(2005). This approach has shifted the focus of word-formation analysis away from its formal
aspects onto the naming needs of language users, such that the semantics of lexemes prevail over
their form. In this view, the base-derivative relationship is inspected mainly through meaning
categories like Causative, Locative, Agent or Instrument, each of which may be conveyed via
various word-formation processes (e.g. -er, -ian or -ist all express agency). With this in mind,
this paper considers the role of cognitive-semantic categories from two angles:
i) How do cognitive-semantic categories behave with regard to morphological competition?
ii) How can the existing formulas of productivity measurement be employed in the onomasiological evaluation of competition?
For both issues we resort to the British National Corpus (BNC), the Corpus of Contemporary
American English (COCA) and the Oxford English Dictionary (OED). In the case of question i),
the derivatives are classified into competing clusters by using a template that considers a series
of factors that facilitate or constrain the appearance and the profitability of word-formation
processes. Based on the semantic classification in Bagasheva (to appear), this makes it possible
to interpret which readings of a lexeme prevail and which become obsolete during competition.
Question ii) is addressed by applying the productivity formulas in Baayen (2009) and Gaeta &
Ricca (2015) on the study sample, for which corpus-derived frequencies prove essential.
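As a hedged illustration of how corpus-derived frequencies feed such formulas, the sketch below computes three productivity measures usually associated with Baayen's work: realized productivity (the number of types), potential productivity (hapax legomena of the process divided by its tokens) and expanding productivity (hapax legomena of the process divided by all hapax legomena in the corpus). The token list, the corpus-wide hapax total and the function name are assumptions made for illustration; the abstract does not specify which formulas were applied or how.

```python
from collections import Counter


def productivity_measures(derivative_tokens: list[str], corpus_hapax_total: int) -> dict[str, float]:
    """Compute three frequency-based productivity measures for one process.

    derivative_tokens  -- every corpus token formed with the process (hypothetical data)
    corpus_hapax_total -- number of hapax legomena in the whole corpus (assumed known)
    """
    frequencies = Counter(derivative_tokens)
    n_tokens = sum(frequencies.values())
    n_hapaxes = sum(1 for count in frequencies.values() if count == 1)
    return {
        "realized": len(frequencies),                 # type count V
        "potential": n_hapaxes / n_tokens,            # hapaxes / tokens of the process
        "expanding": n_hapaxes / corpus_hapax_total,  # hapaxes / all corpus hapaxes
    }


if __name__ == "__main__":
    # Hypothetical sample of -er agent nouns.
    sample = ["teacher"] * 3 + ["baker"] * 2 + ["blogger", "vlogger"]
    print(productivity_measures(sample, corpus_hapax_total=100))
```

Comparing these values across rival affixes within one semantic category (for example -er, -ian and -ist for Agent) would be one way to read the measures onomasiologically.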
The results obtained from the above are then set in the framework of a semantic view of competition, understood as a complement to more formal views on the matter. The preliminary
conclusions point to a correlation between the number of instances of a process in present competition and its profitability, and between the number of instances of prevalence and the degree
of profitability of that process.
Keywords: affixation, English, morphological competition, onomasiology, productivity, word formation
Anaphora Resolution on the Fly – Pronouns
in a Psycholinguistically Motivated Parsing
System
Noemi Vadasz*† 1,2
1 Pázmány Péter Catholic University, Faculty of Humanities and Social Sciences – Budapest, Hungary
2 MTA-PPKE Hungarian Language Technology Research Group – Budapest, Hungary
A psycholinguistically motivated parsing model like AnaGramma (Prószéky and Indig, 2015)
sheds new light on the broadly interpreted problem of anaphora resolution. This paper concentrates on the narrower problem of pronouns[1], namely the personal, reflexive and reciprocal
pronouns in the framework of the AnaGramma parsing model.
As AnaGramma, with its strictly left-to-right, word-by-word approach, tries to handle utterances by following the patterns of human language processing as much as possible, it needs
to handle coreference ‘on the fly’ during the parsing of the utterance. It works with a supply-and-demand framework, which means that each word supplies its lexical representation and
morpho-syntactic information, and demands are issued (e.g. verbs have an obligatory need for
their arguments). At the end of the utterance all demands should be fulfilled either from the
sentence or with default mechanisms. The output of the parser is a dependency graph with
different types of edges, including coreference edges.
When the parser encounters a verb (or any element with an argument frame), the actual argument frame is called, after which the searchers for the arguments can start. If an argument preceded the verb
in the linearization of the sentence, it is available to the searchers (in a short-term memory
called the pool). In other cases the searcher can wait until a potential supply arrives. The searchers
have different settings according to the demands of the verb.
In Hungarian, some features of some arguments can be calculated from the inflection of the verb.
The searchers look for an agreeing subject or object. A default zero node with the appropriate
case marker and agreeing features is proffered as well. In this way, zero pronouns are
incorporated into the parsing process.
Reflexives and reciprocals with their actual case marker behave like other arguments – as
supplies, ready for the verb’s demands. A special problem during the parsing is the case of
homonymy. In Hungarian the pronoun maga has two meanings: (1) a third person singular
reflexive pronoun in nominative case (‘himself/herself’) and (2) a polite or formal second person
singular personal pronoun in nominative case (‘you’). In addition, there is another use of maga
in constructions such as maga a(z) ördög (‘the devil itself’).
Pronominalization and the use of zero pronouns are run by an underlying rule-system which
enables us to reveal the anaphora dependencies and referential identities. These long-term relations extend beyond the boundaries of the clause, and even of the sentence, in which they occur. Using the
algorithm of Pléh and Radics (1976), these underlying rules can be built into the AnaGramma
parsing system in order to bring its operation closer to human sentence processing with regard to
pronouns as well.
In this paper I present a solution for handling Hungarian personal, reflexive and reciprocal
pronouns in the framework of AnaGramma, based on the anaphora resolution algorithm by
Pléh and Radics (1976). My observations are based on corpus data for which I have used the
Pázmány Corpus (Endrédy, 2016).
Some types of coreference, such as repetition, proper name variants, synonyms, hypernyms and hyponyms,
also need to be taken into account; they are the subject of future research.
Keywords: computational linguistics, parser, psycholinguistics, performance, corpus
Anaphora resolution in the interlanguage of
English and Greek learners of Spanish: a
corpus-based study
Athanasios Georgopoulos* 1
1 Universidad de Granada (UGR) – Spain
Overt pronominal subjects are not syntactically obligatory in pro-drop languages like Spanish (Fernández Soriano 1999, Luján 1999). Previous research has shown that their use and
alternation with null subjects is both syntactically and contextually constrained (Alonso-Ovalle
et al. 2002, Pérez-Leroux & Glass 1999). It has also been demonstrated that learners of Spanish
show persistent deficits concerning their distribution (Lozano 2009, 2016). The interface between syntax and discourse has been claimed to account for these deficits (Sorace 2004). While
research in this field has traditionally relied on experimental data (for an overview, see Quesada 2015),
there is an increasing number of researchers who point out the need to use corpora to test existing hypotheses (Díaz & Thompson 2013, Lozano & Mendikoetxea 2013, Mendikoetxea 2014,
Tono 2003). Additionally, most of the studies on subject pronouns in Spanish L2 (Al-Kasey
& Pérez-Leroux 1998, Almoguera & Lagunas 1993, Liceras 1996, Liceras & Díaz 1999) have
examined the interlanguage of English-speaking learners, whose L1 is non-pro-drop. Overall,
in Spanish L2, there is a very limited number of corpus-based studies on the interlanguage of
speakers of pro-drop languages such as Greek (Margaza & Bel 2006).
This paper presents the preliminary results of research that aims to explore the anaphoric 3rd
person subject usage in the interlanguage of Greek and English learners of Spanish. The major
empirical basis of the investigation is a recently compiled L1 Greek-L2 Spanish learner corpus.
The corpus is conceived as a component of the L1 English-L2 Spanish CEDEL2 corpus (Lozano
2009, Lozano & Mendikoetxea 2013). Both corpora exhibit the same design principles. Hence,
this is the first corpus-based study that allows comparison of two groups of learners (Greek-speaking versus English-speaking) whose L1 differs with respect to anaphoric subjects.
For the analysis of the corpus data, the XML annotator “UAM CorpusTool” (O’Donnell 2009)
was used. A purpose-oriented tagset was designed, on the basis of previous learner corpus
studies (Blackwell & Quesada 2012, Gudmestad & Geeslin 2013, Lozano 2016). Learners of two
different proficiency levels (elementary and upper-advanced) for each group (English and Greek)
were examined and compared to a native Spanish control group. Preliminary results indicate
that although elementary Greek-speaking learners of Spanish show some tendency to overuse
overt subjects, they do so in a significantly lower percentage than their English counterparts.
Moreover, at the upper-advanced level, they exhibit native-like preferences, in contrast to the
English-speaking learners, who show deficits even at the highest levels of proficiency. Cross-linguistic influence can account for these differences between the two learner groups. Greek-speaking learners seem to take advantage of the similarity between their L1 and Spanish with
respect to anaphora resolution (AR) patterns, whereas English-speaking learners seem to transfer their L1 properties. From a developmental point of view, results suggest that cross-linguistic
influence is a crucial factor and that certain AR categories at the syntax-discourse interface can
be fully acquired. Results run partially against the Interface Hypothesis and are in line with
other recent SLA studies (Judy 2015, Kras 2008, Prentza 2014, Zhao 2014).
Keywords: anaphora resolution, SLA, Spanish L2, contrastive interlanguage analysis, learner
corpora, Interface Hypothesis
Analysis of pragmatic aspects in
specialised discourse on economics and
finance: a study based on an oral corpus
as support for interpreting
Sonia Paola Martínez Zavala* 1
1 Universidad Autónoma de Baja California (UABC) – Av. Monclova 678, Ex-Ejido Coahuila, 21360
Mexicali, Baja California, Mexico
Main argument
Interpreters face problems such as false senses, nonsense and opposite senses that
arise in practice. These can occur when the pragmatic aspects of the discourse are
not taken into account. Pragmatic failures occur when the interpretation is grammatically correct
but there is nevertheless a loss of meaning.
Objectives
The general objective is to identify pragmatic aspects in the discourse of economics and finance
by means of a monolingual corpus in English, so as to facilitate the interpreting task in this type of
discourse through a corpus and a report of findings that serve as documentation tools
for the interpreter.
To this end, a sample corpus of texts on economics and finance in English is compiled,
consisting of 27 interview transcripts obtained from The World Bank Group
(2016). The corpus is processed with the tool AntConc 3.4.4w and analysed to identify
aspects such as emotions, intellectual inferences, hypotheses, reformulations, evaluations,
metaphorical expressions, discourse modalisations, requests and orders, among others, and
a report compiling the findings is produced.
Theoretical framework
García Yebra (1981) notes that “translation differs from interpreting in that its
starting point is a written text and its result is another written text” (p. 9).
Escobar (1996) states that interpreting is a mode of translation and that pressures
such as deadlines turn translation into a process almost as fast as interpreting.
Faber (2009) indicates that pragmatics focuses both on the effect of context on communicative behaviour and on how the receiver draws inferences to arrive at the final interpretation
of a sentence.
Likewise, Faber (2009) points out that the pragmatics of specialised discourse relates directly to the situations in which this type of communication occurs, and to the ways
in which sender and receiver deal with them, potentially or actually.
On pragmatic competence, Bertone (1989) states that the interpreter's competence consists
in distinguishing between types of implicit content and contextual information in order to interpret adequately, respecting each aspect.
McEnery and Hardie (2012) define corpus linguistics as an area that focuses on a
set of procedures for studying a language that can be applied to several areas
of linguistics.
Lüdeling and Kytö (2009) indicate that oral corpora may be compilations of recordings or of their transcriptions, and that the latter can be analysed as a written corpus.
Resultados
Se construyó un corpus oral en inglés que consta de 86,883 palabras recuperadas del Banco
Mundial y que se analizó con herramientas de procesamiento de corpus para determinar los
aspectos pragmáticos y su contexto. Los resultados permiten a los intérpretes conocer sobre
caracterı́sticas pragmáticas y desarrollar el dominio pragmático en el discurso económico financiero. Algunos ejemplos encontrados en el corpus son: el adverbio en inglés absolutely que
expresa evaluación en el discurso económico-financiero, la conjunción if que denota una hipótesis
y la frase I mean que pone de manifiesto una reformulación. En reformulaciones, la frase I mean,
se utilizó como tal en 21 casos de 22 hits. La interpretación propuesta es Quiero decir, Digo o
Me refiero a.
Como expresión idiomática apareció en una ocasión Across the board, y la interpretación propuesta es A todos en general o Incluyendo a todos.
Keywords: pragmatics, specialised discourse, corpus linguistics, interpreting.
Applications of the CORPEN corpus to the
teaching and assessment of phraseological
units of Spanish used in specific contexts
Inmaculada Martínez * 1, Susana Llorián * † 2
1 Centro Internacional de Estudios Superiores del Español (CIESE-Comillas) – Avda. de la Universidad Pontificia s/n. 39520 Comillas. Cantabria, Spain
2 Universidad Complutense de Madrid (UCM) – Spain
The impact of the Plan Curricular del Instituto Cervantes (2007) led the Fundación Comillas to publish, some years later, the Plan Curricular del Español de los negocios (Martín Peris and Sabater, 2012), with the aim of making this document the main reference for the design of courses, teaching materials and certification examinations in Business Spanish (Español de los Negocios, ENE). During the development of the curricular documentation, it became clear that the process needed to be guided by a research project grounded in a specialised corpus. That project materialised as the CORPEN corpus (Corpus Comillas del Español de los Negocios).
One of the areas most affected by the application of CORPEN to this process is the lexical component. The main aim of this paper is to show the implications of corpus assistance for the specification of the lexical contents of the ENE curriculum, for methodological guidance and for the validation of certification tests of vocabulary.
The use of the corpus is decisive for selecting the lexical units, both single-word and multi-word (that is, simple or compound words, collocations, idioms and social interaction formulae, following the classification of Gómez Molina (2004)), to be included in the inventories that will serve as the basis for course syllabi, textbooks and examination specifications. This guarantees that the language of these materials is authentic, reflecting the language used in real communicative contexts in the ENE domain, and not artificial or invented like the language shown in materials that do not take corpora as their starting point (O'Keeffe and McCarthy, 2010: 374). In addition, the corpus is the ideal tool for presenting the lexical units of the curriculum in the arrangement required for teaching them, drawing on proposals such as the "lexical approach" (Lewis, 1993, 1997, 2000) and those of followers such as Timmis (2015), who propose applications of the approach using corpus methodology. Along these lines, O'Keeffe et al. (2007) describe how to draw up lexico-grammatical profiles of the lexical units in the curriculum, an exercise that is especially productive when applied to the teaching of ENE. As these authors note (O'Keeffe et al., 2007: 198), patterns and distributions in specialised and professional genres are likely to be more regular than those found in general language.
The relations between lexis and grammar established from the perspective of this approach make it possible, secondly, to implement the methodology of "data-driven learning", which basically consists in using the tools that corpora provide to learn lexical units. In this way, many of the criticisms levelled at proposals such as Lewis's, which concern problems of practical application, could be mitigated.
Finally, a corpus such as CORPEN will also contribute decisively to validating the vocabulary tests in certification examinations. Here, the corpus makes it possible to check how the language of discrete-point items relates to the usage found in real ENE contexts.
Keywords: "Business Spanish", "specialised corpora", "Business Spanish curriculum", "collocations", "idioms", "institutionalised expressions"
Applying Textometric Analysis to a
Description of Cochrane Medical Abstracts
and their Plain Language versions:
Quantitative Characterisation of Plain
Language in Medical Discourse
Christopher Gledhill *† 1, Hanna Martikainen * ‡ 1, Alexandra Mestivier (volanschi) * § 1, Maria Zimina * ¶ 1
1 CLILLAC-ARP, EA3697 – Université Paris Diderot - Paris 7 – France
The Cochrane organisation publishes meta-analyses of large-scale medical studies (‘Systematic Reviews’ – SRs). This information is summarised in 1) a Scientific Abstract (ABS), targeting
members of the scientific community, and 2) a simplified summary for the general public which
Cochrane calls ‘Plain Language Summaries’ (PLS). Although there now exists extensive literature on controlled languages (Stewart 1998, O’Brien 2003), there has been less work on the
linguistic description of ‘plain language’. The Cochrane guidelines state that SRs should be
written in ”clear, simple English” (Cochrane Style Manual), while the language that should be
used in PLS is defined as ”plain English which can be understood by most readers without a
university education” (Cochrane PLEACS standards). But the guidelines do not provide any
specific linguistic definition of what is meant by ‘plain English’. In this paper, we set out to
identify the main lexico-grammatical differences between ABS and PLS texts. Our hypothesis
is that PLS authors adapt their usage, consciously or unconsciously, to the perceived norms of
what they think may be plain writing. This process appears to be very regular, and is visible
in the techniques of reformulation and other revisions that emerge as the salient features of
PLS as opposed to ABS.
We extracted two sub-corpora from the literature produced by the Cochrane organisation: a corpus of 4540 ABS (2.1 million words) and a corpus of their corresponding 4540 PLS (1.1 million
words). The ABS texts are systematically divided into sub-sections: Background, Objectives,
Search strategy, Selection criteria, Data collection and analysis, Main results, Author’s conclusion. A minority of PLS (370) are also divided into sub-sections: Review question, Background,
Study characteristics, Quality of the evidence and Key results. This segmentation allows us to
pinpoint some specific phraseological strategies, for instance, the simplification of information
from Author’s Conclusions (in ABS) in the Key Results subsections of PLS.
We propose to use the methods of textometrics to compare the quantitative characteristics of the ABS sub-corpus and the PLS sub-corpus. First, we applied POS-tagging to both (Schmid 1994). Then, we applied characteristic elements computation and factorial analysis to compare different parts (text sections) of these POS-tagged corpora (Lebart et al. 1998). These metrics reveal important similarities between the Background and Conclusions sections of ABS and PLS. For example, Singular/Mass Nouns (NN), Prepositions (IN), Adjectives (JJ) and Determiners (DT) turn out to be salient (‘over-represented’) in PLS as well as ABS Background and Conclusions sections. The over-representation of prepositions can be partially explained by complex pre-modified nominal groups in the ABS which are ‘un-packed’ in the PLS into longer nominals involving multiple embedding of post-modifying prepositional phrases:
ABS: ”Non-penetrating filtration surgery versus trabeculectomy for open-angle glaucoma”
PLS: ”Two surgical techniques for the control of eye pressure in people with glaucoma”
Such ‘unpacking’ corresponds to the advice adopted by controlled languages such as Simplified
Technical English: break down pre-modified nominals into several post-modifying groups. In
this paper, we also report on other PLS patterns (reformulation of research processes and empirical findings towards more disease-oriented or user-oriented terms and topicalisation of human
participants). All of these point to underlying regular tendencies of simplification in PLS. The
next stage of our project will devise a way of adapting the findings of textometric analysis into
the appropriate editorial guidelines for the authors of Cochrane PLS.
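Characteristic-elements computation in the tradition of Lebart et al. (1998) is usually presented in terms of a hypergeometric model: an item counts as over-represented in a part of the corpus when its observed frequency there is improbably high given its overall frequency. The Python sketch below assumes that model; the function name, the parameter names and the toy counts are hypothetical and are not taken from the Cochrane data or from the software used in the study.

```python
from math import comb

def over_representation_p(k, f, t, T):
    """P(X >= k) for X ~ Hypergeometric(population T, successes f, draws t).

    T: tokens in the whole corpus    t: tokens in the part under study
    f: occurrences of the item in the whole corpus (e.g. the tag IN)
    k: occurrences of the item in the part
    A small value suggests the item is over-represented in the part.
    """
    upper = min(f, t)
    tail = sum(comb(f, i) * comb(T - f, t - i) for i in range(k, upper + 1))
    return tail / comb(T, t)

# Toy counts only: an item seen 6 times in 20 tokens, 5 of them in a 10-token part.
print(over_representation_p(k=5, f=6, t=10, T=20))
```

Exact binomial coefficients are fine for toy counts; for corpora of millions of words a log-space computation would be the more practical choice.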
Keywords: corpus linguistics, language for special purposes, medical discourse, plain language
summaries, textometric analysis
An approach to contrastive phraseology in
CJEU judgments
Andrades Arsenio * 1
1 Universidad Complutense de Madrid (UCM) – Spain
The European Union publishes all its legislation in the 24 official languages of the 28 Member States that make up this supranational organisation. The European Union portal accordingly hosts a range of resources and web pages that give the public easy access to an enormous corpus of legislative, judicial and other texts in each of the official languages. This multilingual corpus of parallel texts supports linguistic searches and is a very useful instrument for consulting and comparing all kinds of terminological, phraseological and stylistic data.
Corpus linguistics facilitates the analysis of linguistic elements in their real context of production, on the basis of compiled digital documents. Studying texts of European Union law will allow us to identify their specific phraseological features and to propose a classification of the most frequently used types of phraseological structure (collocations, idioms, formulaic expressions, etc.), based on the main phraseological taxonomies of general language (Corpas, 1997; Ruiz Gurillo, 1998; García-Page, 2008).
To delimit the scope of this work, we focus on one EU institution, the Court of Justice of the European Union (CJEU; TJUE in Spanish), and on one of the main types of document it produces: judgments. The aim of this paper is therefore to compile a corpus of judgments in three languages (English, French and Spanish) in order to identify and extract their main phraseological elements.
The methodology consists essentially in building an ad hoc corpus of EU judgments that is representative (Seghiri, 2014) and exploring it with the concordancer WordSmith 5.0 in order to obtain information on the phraseological structures most used in the three languages compared. The data obtained can serve as a basis for establishing strategies for translating phraseological structures in judicial texts.
Work of this kind shows that compiling a corpus can contribute significantly to knowledge of phraseology in a specialised field, and stresses how important it is for legal translators to be familiar with the phraseology of their specialism (Monzó and Hoyo, 1998; Lorente, 2002; Aguado de Cea, 2007; Pontrandolfo, 2013; Andrades, 2013). The results are a first approach to the legal phraseology of international organisations; they can be extended by larger studies and, if the data bear this out, extrapolated to legal texts in general. The study will also make it possible to appreciate the phraseological differences and similarities between general legal discourse and the language used in CJEU judgments.
Keywords: corpus linguistics, specialised phraseology, legal translation
Computing salience to annotate an
anaphoric corpus (RESUMAN)
Afef Selmi *† 1,2, Laurent Gautier * ‡ 3
1 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
2 Aix-Marseille Université - UFR Arts, Lettres, Langues et Sciences Humaines (AMU UFR ALLSH) – Aix Marseille Université – 29, avenue Robert Schuman - 13621 Aix-en-Provence cedex 1, France
3 Centre Interlangues Texte Image Langage (TIL) – Université de Bourgogne : EA4182 – Université de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
[Context] The development of electronic communication systems has been accompanied by a constant increase in the number of electronic text documents available, such as the summaries in our RESUMAN corpus. This development calls for efficient software tools able to select, structure and extract the relevant information contained in these documents.
Research question
This abstract falls primarily under conference strand 7, "Corpus-based computational linguistics". Accordingly, and since "language is largely made up of prefabricated units that can be analysed by querying corpora with statistical methods", we have created an algorithm that relies on salience computation (Landragin, 2011) as the main factor in resolving pronominal anaphora in our corpus. Taking into account various syntactic and cognitive factors, the algorithm uses a model that efficiently evaluates the salience of a potential antecedent. Each factor carries a different weight according to its usefulness for resolution. Our question is the following: does our statistical, corpus-based method perform well?
Corpus
The RESUMAN corpus consists of summaries of works of French literature. It comprises 120 summaries, published online at www.alalettre.com, totalling just under 20,000 words. The corpus contains about 12,000 pronominal anaphors, 3,000 of which are ambiguous. The texts are characterised by their brevity and referential density. The corpus is intended for automatically investigating how ambiguous pronominal anaphora works in these texts, in order to bring out syntactic and cognitive features specific to anaphoric chains.
Methodological framework
After semi-automatic morphosyntactic annotation of RESUMAN (we intervened manually to complete the morphological annotation of named entities), we present an algorithm inspired by that of Lappin and Leass (1994) but with a different strategy for computing salience. To narrow down the potential candidates, the algorithm passes the texts of our corpus through two filters: first, a filter on morphological agreement between the anaphor and the candidate; second, a filter on the syntactic structure of the sentence containing the pronoun. The remaining candidates are then evaluated by a salience weight computed from two criteria: the distance of the candidate and its grammatical weight.
To this end, we assigned values ranging from 100 down to 10 to the following syntactic functions: subject, direct object (COD), indirect object (COI), predicative complement (attribut) and relative. The algorithm first exploits syntactic and morphological information. After excluding non-anaphoric pronouns, it applies a salience measure that ranks the potential candidates so that only the appropriate ones are retained. Through the automatic resolution of pronominal anaphora, we then highlight the interactions between discourse, natural language processing and corpus analysis.
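The filter-then-rank procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the intermediate weights (80, 50, 30), the linear distance penalty, the `Mention` fields and the sample mentions are all hypothetical, and the syntactic filter is reduced to requiring that the candidate does not follow the pronoun's sentence.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract gives only the range (100 down to 10)
# and the list of functions, not the intermediate values.
GRAMMATICAL_WEIGHT = {
    "subject": 100,
    "direct_object": 80,
    "indirect_object": 50,
    "attribute": 30,
    "relative": 10,
}

@dataclass(frozen=True)
class Mention:
    text: str
    gender: str    # "m" or "f"
    number: str    # "sg" or "pl"
    function: str  # a key of GRAMMATICAL_WEIGHT
    sentence: int  # index of the sentence containing the mention

def agrees(pronoun, candidate):
    """Morphological filter: gender and number must match."""
    return (pronoun.gender, pronoun.number) == (candidate.gender, candidate.number)

def salience(pronoun, candidate, distance_penalty=20):
    """Grammatical weight minus an assumed penalty per sentence of distance."""
    distance = pronoun.sentence - candidate.sentence
    return GRAMMATICAL_WEIGHT[candidate.function] - distance_penalty * distance

def resolve(pronoun, candidates):
    """Return the most salient candidate that passes the filters, or None."""
    viable = [c for c in candidates
              if c.sentence <= pronoun.sentence and agrees(pronoun, c)]
    if not viable:
        return None
    return max(viable, key=lambda c: salience(pronoun, c))

pronoun = Mention("elle", "f", "sg", "subject", 2)
candidates = [
    Mention("Emma", "f", "sg", "subject", 1),
    Mention("la lettre", "f", "sg", "direct_object", 1),
    Mention("Charles", "m", "sg", "subject", 2),
]
print(resolve(pronoun, candidates).text)
```

With these assumed values, Charles is removed by the agreement filter and the subject Emma outranks the direct object la lettre at equal distance, which is the kind of ordering the grammatical weights are meant to produce.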
Results
80% of the pronominal anaphors in the corpus are resolved, including 25% of the ambiguous cases. The remaining 20% are unresolved, which leads us to re-examine the corpus to identify the mechanisms that prevented resolution. Are the grammatical weights we added the cause? Or is it, on the contrary, thanks to them that we reach this level of performance? An evaluation corpus is needed to answer these questions.
Keywords: computational linguistics, corpus, pronominal anaphora, statistics, salience, grammatical weight, automatic resolution.
Building a legal corpus for
collocation extraction
Joaquín Giraldez Ceballos-Escalera * 1
1 UNIVERSIDAD NACIONAL DE EDUCACIÓN A DISTANCIA (UNED) – Senda del Rey, 7 - 28040 MADRID, Spain
Methodologically, this contribution belongs to corpus linguistics and presents a study on the extraction of collocations in legal language. The study has two aims: to set out the methodological bases for building a corpus of legal texts, and to present the successive steps followed to extract collocations.
Corpus linguistics is a linguistic discipline which, in association with computational linguistics, studies language through a wide variety of texts.
In lexicography, the corpus is the basic material for linguistic analysis and, thanks to computing technology, a considerable mass of linguistic data is now available in electronic form. These collections of texts make it possible to observe large quantities of varied real data. Such resources open up new perspectives for linguistic description, insofar as analysis tools can explore the texts and extract linguistic data from them efficiently.
We will present the corpus of legal French, "FRJUR", which we compiled, together with the analysis tools and the methodology used.
The FRJUR corpus of legal French is the result of collecting texts on French civil law. It comprises 3,200,086 words distributed across different sections: codes, judgments, specialist publications, etc.
The texts were selected and organised systematically according to criteria of balanced distribution, so as to form a structured whole rather than mere collections of texts.
The corpus, held in digital form, was designed according to the criteria set out by Sinclair (1991):
"a corpus is a collection of naturally-occurring language text, chosen to characterize a state
or variety of a language" (Sinclair, 1991: 171).
The design of the corpus took into account the representativeness of the texts and the intended users.
The FRJUR corpus allowed us to study the lexical relations between two words by comparing the probability of observing them together (probability of dependence) with the probability of observing the same words separately (probability of independence). According to Church and Hanks (1989), whose measure builds on the notion of mutual information from information theory, if a genuine lexical relation exists between two words, the probability of dependence will be much higher than the probability of independence, and the mutual information of the pair (the logarithm of the ratio of the two probabilities) will be well above zero. The pair is then retained as significant.
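As a worked illustration of the measure just described, the mutual information of a pair can be written I(x, y) = log2(P(x, y) / (P(x) P(y))), where P(x, y) is estimated from the frequency of the pair and P(x), P(y) from the frequencies of the individual words. The Python sketch below computes it for adjacent word pairs; the function name, the `min_count` threshold and the toy sentence are assumptions made for the example, not the extraction tool used for FRJUR.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Pointwise mutual information (base 2) for adjacent word pairs.

    Pairs seen fewer than `min_count` times are skipped, because the
    measure is unreliable for very rare pairs.
    """
    n_tokens = len(tokens)
    n_bigrams = n_tokens - 1
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams                # probability of dependence
        p_indep = (unigrams[w1] / n_tokens) * (unigrams[w2] / n_tokens)
        scores[(w1, w2)] = math.log2(p_pair / p_indep)
    return scores

# Toy token list, invented for the example.
tokens = ("le juge rend un jugement le juge annule un jugement "
          "la cour rend un avis").split()
for pair, score in sorted(pmi_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair occurs together more often than independence would predict; in practice a threshold well above zero is chosen before a pair is kept as a collocation candidate.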
Frequency, transparency, arbitrariness and directionality are the criteria established by most authors for identifying collocations (Firth, 1957; Cruse, 1986; Hausmann, 1989; Mel'čuk, 1998).
To establish the typology of "collocations" in legal language, we propose to start from the "associations" established by Hausmann (1989) and to divide them into five groups: noun + adjective, verb + noun, verb + adverb, adverb + adjective and noun + (preposition) + noun.
With the help of a computerised corpus, the study of collocations in legal language will make it possible to enrich terminological databases for use by translators, specialist researchers (jurilinguists) and learners of French for specific purposes (FOS).
Keywords: corpus, collocations, co-occurrence, law, extraction
Building corpora for a contrastive study of
resultative structures in English and
their translation into French
Dijana Bojovic * 1
1 Bases, Corpus, Langage (BCL) – CNRS : UMR7320, Université Nice Sophia Antipolis (UNS) – Laboratoire BCL - UMR 6039 Université de Nice - Campus Saint-Jean d'Angely 3 24, avenue des Diables bleus 06357 Nice Cedex 4, France
The main aim of this paper is to explain the procedures followed and the problems encountered in building corpora for our contrastive study of resultative structures in English and their translation into French. Based on several corpora (British National Corpus, Corpus of Contemporary American English, Gutenberg, Gallica and FRANTEXT), the study relies on specific procedures, developed from the known characteristics of the phenomenon, for extracting data from general corpora.
Semantically, resultative structures express both a dynamic process and its outcome. A dynamic process lies at the heart of a first predicative relation, and the resulting state of affairs constitutes a second predicative relation. We are dealing with the fusion of two predicative relations, namely a predicative relation and a co-predicative relation, and hence with a syntax different from that of embedding.
Since resultative structures (hereafter RS) are highly productive in English, we first set out to draw up a typology of them, taking account of their limits, that is, stative verbs at one end of the spectrum and prototypical transitives at the other. The interaction between syntax and semantics is necessarily at play, so in this research we analyse the properties of transitive structures (He ate the plate clean), unergative intransitives (The child screamed itself hoarse) and unaccusative intransitives (The lake froze solid). A second classification is by type of resultative predicate: adjective phrase (He hammered the metal flat), noun phrase (She dyed her pants a bright red), prepositional phrase (She smashed the vase to pieces) and adverb phrase (We decided to creep upstairs and see what happened).
We are developing protocols for querying existing English and French corpora in order to build a corpus of RS in English and a corpus in French, with a view to studying the problems raised by their translation from English into French. We are thus building a corpus in several parts. The first contains the English examples collected systematically from the BNC and COCA by building collocation queries and running searches with variations. The second is devoted to the French translations of the structures found in English in the first part (Gallica, FRANTEXT, Gutenberg) and to observations on their characteristics. The third contains the RS that exist in French.
The aim of this contrastive research is to carry out two linguistic studies of RS, one on English and one on French, to find out where the divergences begin and why. The analysis of the translations, for its part, aims to systematise the solutions encountered, to seek their justification, and to identify constants that can support the translator's reflection and autonomy, shed further light on structures that currently remain partly opaque and resist analysis, and if possible provide additional tools for computer-assisted translation.
The conclusions of our research are thus the product of data attested in corpora, and confronting our working hypotheses with our corpus is heuristic.
Keywords: corpus, contrastive linguistics, resultative structures, syntax, translation, corpus linguistics
Corpora in the language classroom: an example
with exemplification and reformulation
markers
Cristelle Cavalla *† 1, Thi Thu Hoai Tran * ‡ 2
1 Didactique des langues, des textes et des cultures (DILTEC) – Université Paris III - Sorbonne nouvelle : EA2288, Université Sorbonne Paris Cité (USPC) – Maison de la Recherche, 4 rue des Irlandais, 75005 Paris, France
2 Grammatica – Université d'Artois : EA4521 – Université d'Artois Maison de la Recherche 9, rue du Temple - BP 10665 62030 ARRAS CEDEX, France
In this paper we describe an ongoing experiment with allophone students at A2-B1 level in an academic French course, centred on the use of a lexicon specific to scientific writing and of a digital corpus. Methodologically, the aim is also to help them become familiar with the norms of this genre of university writing, which are sometimes far removed from the norms encountered in their home education system.
We are particularly interested in academic discourse drawn from a 5-million-word corpus of research articles from disciplines in the humanities and social sciences (SHS), accessible online through the ScienQuest interface[1]. The corpus is morphosyntactically tagged and semi-automatically annotated (Tran, 2014). Our interest lies mainly in transdisciplinary scientific phraseology, or the transdisciplinary scientific lexicon (Tutin, 2007), which is regarded as a "genre lexicon" and cuts across all disciplines, for example contredire une théorie, objectif principal, etc. We adopt a broad conception of phraseology (Legallois and Tutin, 2014) that includes discourse markers (hereafter DMs) (à savoir, en résumé, dans le cadre de, etc.), which serve to structure discourse.
We established a typology of 171 DMs divided into nine subgroups (Tran, 2014). To analyse them, we opted for the linguistic model of Paillard and Vu (2014), which allows us to focus on the syntactic relation between the left and right contexts of an adverb or adverbial and then to identify its semantic values.
The experiment concerns exemplification and reformulation markers, because we had observed that they are over-represented in scientific writing (Tran et al., 2016). Pedagogically, students work with short paragraphs extracted from the digital corpus. The experiment is regarded as a first step in raising allophone students' awareness of the role these phraseological elements play in structuring such texts. Our hypothesis is that this linguistic entry point will lead them to discover the norms of the genre of university writing.
References
Adam, J.-M. (1989). "Aspects de la structuration du texte descriptif: les marqueurs d'énumération et de reformulation". Langue française, (81), 59-98.
Cavalla, C. & Loiseau, M. (2013). "Scientext comme corpus pour l'enseignement". In Tutin, A. & Grossmann, F. (eds), L'écrit scientifique: du lexique au discours. Autour de Scientext. Rennes: PUG, 163-80.
Legallois, D. & Tutin, A. (2013). "Présentation: Vers une extension du domaine de la phraséologie". In Legallois, D. & Tutin, A. (eds), "Vers une extension du domaine de la phraséologie", 1(189), 3-25.
Mangiante, J.-M. & Parpette, C. (2011). Le français sur objectif universitaire. Grenoble: Presses universitaires de Grenoble.
Paillard, D. & Vu, T.-N. (2012). Inventaire raisonné des marqueurs discursifs du français. Description. Comparaison. Didactique. Paris: AUF.
Tran, T.-T.-H., Tutin, A. & Cavalla, C. (2016). "Typologie des séquences lexicalisées à fonction discursive et aide à la rédaction scientifique". Cahiers de lexicologie, 108(1), 161-180.
Tran, T.-T.-H. (2014). "Développement d'une aide à l'écrit scientifique. Description de la phraséologie scientifique et réflexion didactique pour l'enseignement à des étudiants non natifs". Doctoral thesis in Language Sciences, specialism French as a Foreign Language, Université Grenoble Alpes.
Tutin, A. (2007). Lexique et écrits scientifiques. Revue Française de Linguistique Appliquée, XII-2.
[1] URL: http://corpora.aiakide.net/scientext18/
Keywords: phraseology, French as a foreign language (FLE)
Development of Tatar-Russian
Socio-Political Dictionary of Collocations on
Corpus Data
Olga Nevzorova * 1
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia
The Tatar-Russian Socio-Political Dictionary of collocations draws on data from the Corpus
of Written Tatar (http://corpus.tatar/en), the Tatar National Corpus (http://corpus.antat.ru),
and comparable socio-political corpora. It is built as a collocation dictionary and
contains more than 3000 collocations.
The methodology of compiling the Dictionary included the following stages. First, we developed comparable thematic socio-political corpora of Tatar and Russian. Next, we automatically generated a frequency list of current terms (one-word terms serving as potential headwords) from the comparable corpora. Then, using the software of the Corpus of Written Tatar, we obtained a frequency list of collocations for each frequent term. The cut-off thresholds for the collocation list were based on the frequency of the linguistic items in the Corpus and were determined empirically. When selecting collocations, we considered the syntactic structure of a collocation and the morphological parameters of its constituents. We also took into account regular grammatical (non-inflectional) variants of word combinations. For example, the following regular synonymous models occur in Turkic languages: ADJ + N and N + N, POSS 3:
iqtisadi cinayät (ADJ + N) - iqtisad cinayäte (N + N, POSS 3) 'economic crime'.
Such regular grammatical variants of a collocation are treated as the same nominative item.
The main unit in the Dictionary is a noun phrase formed by filling one of the possible semantic-syntactic positions of the word and meeting the criteria of semantic completeness. Quantitatively, such an item may consist of two or more notional words. In the current version of the Dictionary, most collocations are composed of two notional components.
The compiled Dictionary makes it possible 1) to represent the real use and collocability of words
of the socio-political domain in Tatar; 2) to build typical grammatical models of collocations of
these items; 3) to trace new items (words and collocations) in modern Tatar.
The reported study was funded by the Russian Science Foundation under research project
16-18-02074.
Keywords: the Tatar language, collocations, Dictionary of collocations, socio-political terminology, corpora.
References
1. Bahns, J. (1993). Lexical collocations: a contrastive view. ELT Journal, 47(1), 56-63.
2. Benson, M. (1990). Collocations and general-purpose dictionaries. International Journal of Lexicography, 3(1), 23-34.
3. Benson, M. (1989). The structure of the collocational dictionary. International Journal of Lexicography, 2(1), 1-14.
4. Carter, R. (2012). Vocabulary: Applied linguistic perspectives. Routledge.
5. Conrad, S. (2002). Corpus linguistic approaches for discourse analysis. Annual Review of Applied Linguistics, 22, 75-95.
6. Corpus of Written Tatar. URL: http://corpus.tatfolk.ru/index en.php.
7. Kübler, N., & Pecman, M. (2012). The ARTE bilingual LSP dictionary: From collocation to higher order phraseology.
8. Kennedy, G. (2014). An introduction to corpus linguistics. Routledge.
9. Ramos, M. A., Nishikawa, A., & Vincze, O. (2010, June). DiCE in the web: An online Spanish collocation dictionary. In E-lexicography in the 21st century: New challenges, new applications: Proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October 2009 (pp. 369-374).
10. Reppen, R., & Biber, D. (Eds.). (2012). Corpus linguistics (pp. 1988-1988). SAGE.
11. Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell Publishers.
12. Suleymanov, D., Nevzorova, O., Gatiatullin, A., Gilmullin, R., & Khakimov, B. (2013). National corpus of the Tatar language "Tugan Tel": Grammatical Annotation and Implementation. In Procedia - Social and Behavioral Sciences 2013 (pp. 68-74).
13. Tatar National Corpus. URL: http://corpus.antat.
* Speaker
Keywords: Socio-Political Dictionary, Tatar language, collocation
Development of an annotation system for multiword constructions for the Tatar National Corpus
Dzhavdet Suleymanov* 1
1 Tatarstan Academy of Sciences (TAS) – 20 Bauman str., Kazan, Russia
Tatar National Corpus (TNC, http://corpus.antat.ru) is a linguistic resource for the modern Tatar language. It contains 100,000,000 tokens. The texts in the Corpus carry grammatical mark-up, so its search system supports searches for lexemes, word forms and individual grammatical parameters, as well as searches for stop-words and for parts of words, and searches using logical formulae.
The TNC morphological analyser currently uses a tagset for the morphological categories within a word form. Because Tatar has a complex agglutinative morphology, the analysis isolates the word stem, determines its part of speech, and describes the chain of inflectional affixes of the word form.
The present grammatical annotation system is being supplemented with tags for compound constructions. In Turkic languages, many lexical items and grammatical categories are expressed by multiword units (for example, modality is, as a rule, conveyed not lexically but through special constructions expressing obligation, possibility, or desire). In the current version of the grammatical mark-up, compound word forms and multiword constructions can be retrieved only by means of sophisticated queries: extracting a multiword construction requires describing the parameters of two or more linguistic units with a predetermined distance between them. Such queries are cumbersome and time-consuming, and the user has to be experienced in writing complex queries.
The annotation system is therefore being enriched with new tags for compound (analytical) forms and constructions, which makes it possible to distinguish between multiword lexical items, forms and constructions. Special rules for retrieving such units have been developed, based on their structure, the order of their components, and the possibility of inserting external elements. In particular, two-component verbal constructions have the following standard structure: the first component has a required form (a given affix or set of affixes) and is grammatically invariable, while the second may take all the inflectional and derivational affixes admissible for verbs.
Compound verbs semantically equivalent to a lexeme consist of an invariant first component (stem) and an inflected second (auxiliary) component. For example, the verb yärdäm itü 'to help' may have different realisations in real use: yärdäm ittelär 'they helped', yärdäm itmäsme 'will he help?', yärdäm itüce 'that he helps', etc. In actual use such verbs may form compound multiword constructions by attaching further components, and postpositional particles may be inserted between them.
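The retrieval rule for two-component verbs described above can be sketched in Python. This is only an illustration under stated assumptions, not the TNC implementation: the `Token` fields, the tag names "V" and "PART", the `max_gap` parameter and the `<particle>` placeholder token are all hypothetical, and only the forms yärdäm, ittelär and itmäsme come from the text.

```python
from typing import NamedTuple

class Token(NamedTuple):
    form: str   # surface word form
    lemma: str  # lemma from a (hypothetical) morphological annotation
    pos: str    # coarse part-of-speech label (hypothetical tagset)

def find_compound_verbs(tokens, first_form, aux_lemma, max_gap=1):
    """Return (start, end) index pairs where the invariable `first_form`
    is followed, after at most `max_gap` particles, by a form of `aux_lemma`."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.form != first_form:
            continue  # the first component must appear in its fixed form
        j = i + 1
        # Skip inserted particles, up to the allowed distance.
        while j < len(tokens) and j - i - 1 < max_gap and tokens[j].pos == "PART":
            j += 1
        # The second component may carry any verbal inflection.
        if j < len(tokens) and tokens[j].pos == "V" and tokens[j].lemma == aux_lemma:
            matches.append((i, j))
    return matches

# Toy token sequence; "<particle>" is a placeholder, not a Tatar word.
tokens = [
    Token("yärdäm", "yärdäm", "N"),
    Token("ittelär", "itü", "V"),
    Token("yärdäm", "yärdäm", "N"),
    Token("<particle>", "<particle>", "PART"),
    Token("itmäsme", "itü", "V"),
]
print(find_compound_verbs(tokens, "yärdäm", "itü"))
```

A dedicated tag for the whole construction, as the abstract proposes, would let a user retrieve such pairs directly without writing a distance-based rule of this kind.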
* Speaker
Existing Tatar grammars give only a superficial description of the structure of multiword constructions and cover only a small number of them, whereas corpus technology offers an exhaustive list of such units. So far we have drawn up sets of rules for retrieving compound verbs semantically equivalent to a lexeme, rules for retrieving their tense forms, and rules for constructions composed of phase and modal verbs. We have also developed query formats for retrieving the corresponding data and devised special tags to mark up the various types of multiword constructions. The annotation system is mainly built on the tags of the Leipzig Glossing Rules and those of the verb database developed by V. Plungian (http://www.mccme.ru/ling/verbum.htm).
The reported study was funded by RFBR under research project 15-07-09214.
Keywords: the Tatar language, corpus, multiword construction, corpus annotation
Corpus-based Spanish-Chinese dictionary of medical terminology
Antonio Moreno-Sandoval* 1, Yuanyi Liu*† 2
1 Universidad Autónoma Madrid (UAM) – Departamento de Lingüística y Lenguas Modernas, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain
2 Universidad Autónoma Madrid (UAM) – Laboratorio de Lingüística Informática, Facultad de Filosofía y Letras, Cantoblanco, 28049 Madrid, Spain
Specialised Spanish-Chinese and Chinese-Spanish dictionaries are still scarce and lack variety. For medical terminology in particular, there is only one bilingual dictionary, the Diccionario de medicina chino-español published by the Editorial de Lenguas Extranjeras de Beijing. It was published in 2005, so it does not include the terms that have emerged over the last ten years and is in need of updating. Moreover, it is not corpus-based and provides no examples illustrating meaning in real use. In short, this is a field with clear room for further research.
Our project is compiling a corpus-based Spanish-Chinese bilingual dictionary specialised in medicine. Specifically, it draws on MultiMedica (Moreno and Campillos 2013), a corpus compiled and developed by the Laboratorio de Lingüística Informática of the Universidad Autónoma de Madrid (LLI-UAM), and on Sketch Engine, one of the most advanced query systems for helping lexicographers find good usage examples for their dictionaries (Kilgarriff et al. 2008). The aim of the project is, first, to produce a specialised bilingual dictionary in electronic format and, subsequently, to describe the translation and technical problems encountered in compiling it and to carry out a comparative study of medical terminology in the two languages. The ultimate goal of this line of research is to explore, by applying corpus technology to lexicography, a scientific methodology for compiling specialised Spanish-Chinese dictionaries that can be reproduced in other specific fields, such as economic and commercial or legal terminology, and that also contributes to the development of specialised translation and to the training of high-level translators and interpreters.
This paper focuses on the methodology employed:
1. Establishing the macrostructure and microstructure of the Diccionario de la Terminología Médica Español-Chino: we selected the 5,000 most frequent terms extracted from the LLI's MultiMedica corpus as the main entries of the dictionary. For each of them we decided to include the normalised frequency, international medical codes (CUI, MeSH), the English equivalent, the Mandarin Chinese equivalent, the equivalent term in traditional Chinese medicine, the romanised Chinese variant (to make the dictionary easier for Spanish speakers to use), synonyms, abbreviations and notes.
2. Compiling the Dictionary, which consists mainly in translating the 5,000 Spanish terms into Chinese. To obtain more appropriate and precise equivalents, we used the DTM, the monolingual dictionary of the Real Academia Española de Medicina, the Diccionario Médico Bilingüe Inglés-Chino, several corpora of parallel texts, and bilingual encyclopaedias produced by official health institutions.
3. Adding collocations: for each of the 5,000 terms we included its collocations (the 10 most frequent multiword units according to the MultiMedica corpus) as new entries, together with their English and Chinese equivalents.
4. Selecting examples: ours is a usage dictionary rather than a glossary. In cases of ambiguity, we provide real examples from the MultiMedica corpus for each equivalent, along with their Chinese translation. In this way the user can better distinguish between the different equivalents of the same term.
5. Building the electronic dictionary with the TshwaneLex program.
We attach two entries (one simple and one compound) from the dictionary in the file.
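The microstructure listed in step 1 can be pictured as a record type. The sketch below is a hypothetical illustration, not the project's TshwaneLex schema: the field names, the per-million normalisation base and the helper `build_entries` are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Entry:
    term_es: str                 # Spanish headword
    norm_freq: float             # normalised frequency (here: per million tokens)
    cui: str = ""                # CUI code
    mesh: str = ""               # MeSH code
    english: str = ""            # English equivalent
    chinese: str = ""            # Mandarin Chinese equivalent
    tcm_equivalent: str = ""     # equivalent in traditional Chinese medicine
    romanised: str = ""          # romanised Chinese variant
    synonyms: list = field(default_factory=list)
    abbreviations: list = field(default_factory=list)
    notes: str = ""
    collocations: list = field(default_factory=list)  # up to 10 multiword units

def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million tokens (assumed base)."""
    return count * 1_000_000 / corpus_size

def build_entries(term_counts, corpus_size, n_terms=5000):
    """Create skeleton entries for the `n_terms` most frequent terms."""
    ranked = Counter(term_counts).most_common(n_terms)
    return [Entry(term, per_million(count, corpus_size)) for term, count in ranked]

# Toy counts, invented purely to show the call.
entries = build_entries({"term_a": 3, "term_b": 1}, corpus_size=2_000_000, n_terms=1)
print(entries[0].term_es, entries[0].norm_freq)
```

The remaining fields (equivalents, codes, examples) would be filled in during steps 2 to 4, which are manual lexicographic work rather than something this sketch automates.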
* Speaker
† Corresponding author: [email protected]
Keywords: Medical terminology, Spanish, Chinese, corpus-based lexicography, MultiMedica corpus
Expressing novelty through words: neologisms revealing new societal trends in France
Najet Boutmgharine Idyassner* 1
1 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP) – Université Paris VII - Paris Diderot : EA3967 – Université Paris Diderot Bât. Olympe de Gouges case postale 7046 75205 Paris cedex 13, France
Every language has the capacity to take in new words, but lexical creativity is above all a process that reflects societal change. In French, the processes of lexical creation are varied (Sablayrolles 2006) and foster the emergence of new designations, commonly called neologisms. It was noted very early on that lexical creation is the best trace of society's transformations; Du Bellay summed up this relation in the formula "aux nouvelles choses être nécessaire imposer nouveaux mots" ("for new things it is necessary to impose new words"). The cause-and-effect relation unfolds in two stages: if a new thing is created, then a name must follow. When this twofold operation is multiplied many times over, it shapes the evolution of a language's lexicon: the possibilities for naming new realities become greater. By following the changes a given language undergoes, one can therefore trace the changes in the society in which it evolves. The interest of neology lies largely in this principle, as neologists generally agree: "Neology reflects the progress of a language just as much as the evolution of a society. [...] Language is dated, and neologisms are its most striking countable elements." (Pruvost and Sablayrolles, 2016: 10). Advances in natural language processing now make it possible to track these developments. We propose to present research on neologisms that reflect the changes under way in present-day French society. This work is part of the project "Neoveille, repérage, analyse et suivi des néologismes en sept langues" (Neoveille: detection, analysis and monitoring of neologisms in seven languages; Cartier, 2016). The Neoveille platform is the outcome of a scientific project funded by the COMUE Sorbonne Paris Cité and involving participants at an international level. It consists of a set of modules for detecting, analysing and monitoring neologisms in a journalistic corpus that is updated daily. Looking at the list of neologisms retained by the platform's detection system, one immediately notices that neologisms reflect the arrival of new social practices. Borrowings from English in particular take on this function: the workplace (co-working, workventurer), leisure (mermaiding, binge-viewing) and many other social spheres are being shaken up by the arrival of numerous trends, often imported from the English-speaking world. Some of these societal borrowings designate practices promoted by social networks (mannequin challenge), reveal new forms of criminal (trainsurfing) or reprehensible (bodyshaming) behaviour, but sometimes also signal new forms of commendable social action (book crossing, clickfunding). Likewise, monitoring neologisms in corpora whose parameters vary along the "dia-" dimensions (diatopic, diastratic, diaphasic; cf. Coseriu, 1988) shows, in particular, which sociolects are most influential in the French sphere and which diatopic variations are at work today.
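One simple way to picture the detection and monitoring steps is an exclusion-list approach: flag word forms that are absent from a reference lexicon and count them per publication date. The sketch below assumes that approach for illustration only; it is not a description of how the Neoveille modules work, and the function name, the toy lexicon, the dates and the sample sentences are invented.

```python
import re
from collections import defaultdict

# Word forms made of letters, optionally joined by hyphens (e.g. "co-working").
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def candidate_neologisms(articles, reference_lexicon):
    """Map each unknown word form to its count per publication date.

    `articles` is an iterable of (date, text) pairs; `reference_lexicon`
    is any collection of known word forms.
    """
    known = {w.lower() for w in reference_lexicon}
    timeline = defaultdict(lambda: defaultdict(int))
    for date, text in articles:
        for word in WORD.findall(text.lower()):
            if word not in known:
                timeline[word][date] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(counts) for word, counts in timeline.items()}

# Invented two-article feed, only to show the call.
articles = [
    ("day-1", "Le co-working séduit"),
    ("day-2", "le co-working encore"),
]
lexicon = {"le", "séduit", "encore"}
print(candidate_neologisms(articles, lexicon))
```

Candidates produced this way would still need manual validation, since typos, proper nouns and foreign quotations are also missing from any reference lexicon.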
* Speaker
Keywords: neologisms, lexical creation, borrowing, anglicism, societal neologisms
Early Modern English Scientific Text Types: Different Levels of Linguistic Complexity?
Jesús Romero-Barranco* 1
1 Universidad de Málaga (UMA) – Universidad de Malaga Campus de Teatinos 29071 Málaga, Spain
Complexity was first defined by Simon as hierarchies of different elements originating from simplicity (1962: 468). In linguistics, Givón (2009) has analysed syntactic complexity from the point of view of language typology; Dahl (2004) and Nichols (2009) have assessed grammatical complexity cross-linguistically; and Blankenship (1974), Chafe (1982) and Maas (2009) have studied the different levels of complexity in spoken and written registers. Furthermore, Lehto (2015) carried out a diachronic analysis of the levels of complexity across different text types in early Modern English legal material, based on Biber's work on linguistic complexity. Biber (1992) identified some key linguistic features associated with reduced complexity (e.g. that-deletions, contractions or clause coordination, among others) and with increased complexity (e.g. nominalizations, phrasal coordination or passive constructions, among others). These features occur in different patterns across registers, and calculating their frequency makes it possible to assess the level of complexity in different kinds of texts.
The concept of complexity has not so far been evaluated in early English medical writing, especially with regard to its different text types. In light of this, the present paper analyses the levels of linguistic complexity in two early Modern English medical treatises housed in Glasgow, Glasgow University Library, MS Hunter 135: a surgical treatise (ff. 34r-73v) and a recipe collection (ff. 74r-121v). These two treatises are an ideal input for this study inasmuch as they represent two text types of medical writing and therefore allow for comparison in terms of linguistic complexity. According to Pahta and Taavitsainen (2004), theoretical treatises were the most formal text type while remedybooks represented popular medical knowledge, with surgical treatises falling in between. The analysis thus sheds light on the differences between two of the branches of medical writing in early Modern English.
The present study has been conceived with the following objectives: a) to identify the complexity features present in these two witnesses; and b) to analyse the different levels of complexity in both text types.
To carry out this analysis, the linguistic features identified by Biber (1992) will be retrieved and their frequency calculated. Textual organisation will also be analysed, as it contributes to the level of complexity of a text. As for methodology, the texts have been transcribed following semi-diplomatic conventions so that editorial intervention is kept to a minimum. After transcription, the texts were POS-tagged so that automatic searches could be carried out with a conventional concordancer. These texts are part of The Málaga Corpus of Early Modern English Scientific Prose (available at http://modernmss.uma.es), a corpus that aims to provide a sample of ca. 1,000,000 POS-tagged words of early Modern English scientific prose.
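Because the two treatises differ in length, feature counts need to be normalised before they can be compared. The following minimal sketch shows one way to do this for a POS-tagged text; the "NOUN" tag, the suffix list used to spot nominalizations, the apostrophe test for contractions and the per-1,000-words base are all assumptions chosen for the example, not the study's actual retrieval criteria.

```python
# Suffixes used here as a rough cue for nominalizations (an assumption).
NOMINAL_SUFFIXES = ("tion", "ment", "ness", "ity")

def per_thousand(count, total_words):
    """Frequency normalised to occurrences per 1,000 words."""
    return 1000 * count / total_words

def complexity_rates(tagged_tokens):
    """Rates per 1,000 words for two features of a POS-tagged text.

    `tagged_tokens` is a list of (word, tag) pairs; "NOUN" is a placeholder tag.
    """
    total = len(tagged_tokens)
    if total == 0:
        raise ValueError("cannot normalise an empty text")
    nominalizations = sum(
        1 for word, tag in tagged_tokens
        if tag == "NOUN" and word.lower().endswith(NOMINAL_SUFFIXES)
    )
    # Crude cue: any token containing an apostrophe counts as a contraction.
    contractions = sum(1 for word, _ in tagged_tokens if "'" in word)
    return {
        "nominalizations": per_thousand(nominalizations, total),
        "contractions": per_thousand(contractions, total),
    }

# Four invented tokens, only to show the call.
sample = [("the", "DET"), ("operation", "NOUN"), ("is", "VERB"), ("'tis", "VERB")]
print(complexity_rates(sample))
```

Early Modern English spelling is highly variable, so string-based cues like these would miss many forms; in practice each retrieved hit would need to be checked in a concordancer.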
* Speaker
Keywords: linguistic complexity, early English medical writing, surgical treatises, medical remedybooks
The corpus of digital sources as a tool for discourse grammar
Víctor Pérez Béjar*† 1, María Soledad Padilla Herrada*‡ 1
1 Universidad de Sevilla (US) – Spain
Our starting point is the value of using digital sources in linguistic research. We all agree on the need to work with corpora, which entails an empirical study of real data and thereby legitimises the conclusions reached. Although this kind of work is common in the study of the lexicon, it is advisable and, in our view, essential in the field of syntax.
For this reason, within the MEsA project (Macrosintaxis del Español Actual; reference: FFI2013-43205P) we are building a corpus of texts taken from digital sources. It comprises discourse samples from blogs and forums on a variety of topics, posts and comments from public Facebook pages, tweets, transcriptions of YouTube videos and collections of their comments, as well as private conversations from the WhatsApp application. It is currently under construction.
Our aim is to obtain linguistic material from one of today's most widely used means of communication: social networks and the environments that make up Web 2.0. This is a hybrid communicative environment on the oral-written and colloquial-formal continua. Among its advantages are the large number of text samples available, the fact that examples are easy to interpret without the difficulties involved in reading an oral transcription, and the possibility of recovering the full context of the samples. Among the problems, intonational elements (often essential for interpreting utterances) cannot always be recovered, since spelling does not render prosody rigorously.
Within the framework of the project, this corpus will help us detect syntactic patterns that are spreading in colloquial discourse, for which data are seldom available. These patterns interest us because in some cases they may lead to the fixation of discourse operators or markers. All of them are dominated by intersubjectivity (Company 2004; Traugott 2004), one of the driving forces behind the evolution of these linguistic elements.
In this presentation we focus on expressions that depart from traditional syntactic moulds and do not fit the sentence schema. This group includes phraseological units, understood in a broad sense: structures with total lexical fixation (proverbs, set phrases...), constructions whose fixation lies in the combinatorics of their elements (such as insubordinate constructions), and other linguistic expressions that are not yet fully fossilised. These units will be approached from a pragmagrammatical perspective (Fuentes 2015), which describes syntactic units beyond the sentence according to their real use and their function in discourse. This perspective is developed through a multidimensional analysis that takes into account macrostructure, microstructure and text type, and that includes the different levels at which the speaker's stance is inscribed: enunciative, modal, informational and argumentative structure.
We therefore argue that this corpus is a valuable resource for studying colloquial phenomena. By presenting samples of units with greater or lesser degrees of fixation drawn from the corpus, we aim to show that they are reliable data for research in this field and that their use is an effective tool for research in discourse grammar.
* Speaker
† Corresponding author: [email protected]
‡ Corresponding author: [email protected]
Keywords: Pragmagrammar, discourse syntax, digital discourse
Disagreement through echo questions
María Valentina Barrio* 1, Milka Villayandre* 1
1 Universidad de León – Spain
Spanish has a set of pragmatic phraseological syntactic schemas (Zamora Muñoz, 2003), interrogative in nature, that repeat all or part of a previous utterance produced by another speaker and whose discourse function is to express disagreement by means of that repetition. Some examples:
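(1) A: - A ti, Ana, te toca fregar los platos. ['It's your turn to wash the dishes, Ana.']
B: - ¿A mí, fregar, de qué? No pienso hacerlo. ['Me, wash up? No way! I'm not doing it.']
(2) A: - ¿Sabes cuándo vuelve Pili de las vacaciones? ['Do you know when Pili gets back from holiday?']
B: - ¿Yo qué voy a saber? ['How should I know?']
(3) A: - ¿No tomas el desayuno con nosotros? ['Aren't you having breakfast with us?']
B: - ¿Qué desayuno ni qué leches? Sigo sin olvidar lo que me habéis hecho. ['Breakfast, my foot! I still haven't forgotten what you did to me.']
(4) A: - Si madrugaras más, tendrías más tiempo para organizarte. ['If you got up earlier, you'd have more time to get organised.']
B: - ¿Yo, madrugar? Lo siento, me lo prohíbe mi religión. ['Me, get up early? Sorry, it's against my religion.']
(5) A: - A ver, que el español no necesita promoción. ['Look, Spanish doesn't need promoting.']
B: - ¿Cómo que el español no necesita promoción? ['What do you mean, Spanish doesn't need promoting?']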
This study has two main objectives. First, it will systematise the interrogative phraseological schemas of Spanish that express disagreement and have the characteristics described above, in order to define the elements that make up their fixed frame and those that can fill their free slots. Second, it will analyse the microdiscourse formed by the interrogative schema together with its stimulus (the utterance it repeats) in order to describe the pragmatic functions of these units and the relations they enter into within the conversation. Two issues will receive particular attention. The first is repetition and the components it affects, that is, the content of the utterance, the act of enunciation, the interlocutors... The second is the units targeted by the disagreement and the pragmatic assumptions on which the disagreement rests, whether these are explicit or require an inferential process of interpretation.
* Speaker
As for the theoretical framework, the study follows the tenets of functionalist macrosyntax (Gutiérrez Ordóñez, 2016), which goes beyond the limits of the utterance and moves into microdiscourse, that is, the combination of utterances in discourse, in order to observe its constituents and the network of relations and functions that hold among them.
Methodologically, the starting point is a qualitative analysis of these phraseological schemas in several oral corpora of conversational Spanish, where they occur naturally because of their echoic nature. Their incidence will also be checked against more general corpora of Spanish. The corpora are: the Corpus del Español del Siglo XXI (CORPES XXI), the Corpus del español web/dialectos, Sketch Engine, the Corpus Oral Didáctico Anotado Lingüísticamente (CORDIAL), the Corpus de conversación coloquial of the Val.Es.Co. group, the Corpus Oral Juvenil del Español de Mallorca (COJEM) and the Corpus del grupo de investigación lingüística aplicada (COGILA).
The results are expected to cover some of the main characteristics of these schemas. Within conversation they always act as dispreferred reply moves, since they can never be first turns. The expression of disagreement marks both a break with the expected continuation of the discourse and the presence of several enunciators within the same intervention.
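Before the qualitative analysis, candidate echo questions have to be located in the conversational corpora. One possible pre-filter, sketched below, keeps a reply when it contains a question that repeats content words from the previous speaker's turn. This is an assumed heuristic for illustration, not the authors' procedure: the function names, the tiny stop list and the `min_shared` threshold are hypothetical, and the sample turns are example (5) from the text.

```python
import re

# Tiny illustrative stop list (an assumption; a real study would use a fuller one).
STOPWORDS = {"a", "de", "el", "la", "las", "los", "no", "que", "y"}

def content_words(text):
    """Lower-cased word forms of `text`, minus the stop list."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS

def echo_candidates(turns, min_shared=1):
    """Find replies whose question repeats words from the previous turn.

    `turns` is a list of (speaker, utterance) pairs in conversational order.
    """
    hits = []
    for (spk_a, stimulus), (spk_b, reply) in zip(turns, turns[1:]):
        if spk_a == spk_b:
            continue  # an echo question answers another speaker
        for question in re.findall(r"¿([^?]*)\?", reply):
            shared = content_words(question) & content_words(stimulus)
            if len(shared) >= min_shared:
                hits.append((stimulus, reply, sorted(shared)))
    return hits

turns = [
    ("A", "A ver, que el español no necesita promoción."),
    ("B", "¿Cómo que el español no necesita promoción?"),
]
for stimulus, reply, shared in echo_candidates(turns):
    print(reply, shared)
```

Matching on surface forms misses partial echoes such as example (2), where saber in the reply picks up Sabes in the stimulus; lemmatised input would be needed to catch those, and schemas like (1) or (3) would still require pattern-specific rules.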
Keywords: Disagreement, interrogative structures, repetition, conversation analysis, corpus linguistics, macrosyntax
Legal language and the language of biomedical engineering seen through corpus methodology
Eleonora Lozano Bachioqui*† 1, Allen Andrade Navarro*‡ 2
1 Universidad Autónoma de Baja California, Facultad de Idiomas (UABC) – Av. Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico
2 Universidad Autónoma de Baja California, Facultad de Idiomas (UABC) – Álvaro Obregón y Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico
This paper focuses on two specialised languages: legal language and the language of biomedical engineering. It examines legal language from a phraseological perspective and the language of biomedical engineering from a terminological one. To this end, two specialised monolingual Spanish corpora were built following corpus methodology (McEnery and Hardie, 2012) and analysed with corpus management tools. Foundational works in corpus linguistics such as Sinclair (1970) and Stubbs (2001) were taken into account.
The first corpus, a corpus for specific purposes (Maia, 2002), consists of 73,214 words and 5,751 types from legal documents belonging to Mexican civil law, such as birth and marriage certificates, judgments, wills and contracts, among others. It was analysed with the lexical processing software WordSmith Tools (Scott, 2014), which generated a list of 558 keywords. From these, 60 key verbs with a frequency of 10 were obtained, and their collocations and formulaic sequences were studied using the MI (Mutual Information) score. Foundational works such as Corpas Pastor (2003) and Koike (2001) were taken into account. One example of the analysis is the verb celebrar, which has simple lexical collocations such as celebrar + contrato and celebrar + convenio (verb + object noun), as well as celebrar + a + (el) tenor (verb + preposition + noun). It also occurs in formulaic sequences such as es su libre voluntad celebrar y obligarse.
The second corpus consists of 394,351 words and 23,965 types from scientific texts in the field of biomedical engineering, obtained from electronic journals of recognised standing in Latin America. Like the first, it was analysed with lexical processing software, in this case AntConc (Lawrence, 2014), which generated a keyword list. From this list we considered the keywords with a frequency of 45 and a keyness value of 107, and identified their collocations using the log-likelihood score. Authors such as Cabré (2007) and Faber (2010) were taken into account for this work. Examples of the collocations found in this corpus are tejido + óseo, matriz + extracelular, presión + arterial, alto + riesgo and baja + densidad (noun + adjective), as well as reacción + difusión (noun + noun).
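The two association measures named above can be written out in a few lines. The sketch below uses the standard pointwise mutual information formula and the two-term log-likelihood commonly used for comparing a word's frequency across two corpora; whether WordSmith Tools and AntConc compute exactly these variants (span, smoothing, number of terms) is an assumption, and all frequencies in the example call other than the corpus size of 73,214 words are invented.

```python
import math

def mutual_information(pair_freq, freq_x, freq_y, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency."""
    expected = freq_x * freq_y / corpus_size
    return math.log2(pair_freq / expected)

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one word in a study vs. reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts for a verb + noun pair in a 73,214-word corpus.
print(round(mutual_information(pair_freq=12, freq_x=40, freq_y=30, corpus_size=73_214), 2))
# Hypothetical counts for a word in a study corpus vs. a larger reference corpus.
print(round(log_likelihood(freq_study=45, size_study=394_351, freq_ref=20, size_ref=1_000_000), 2))
```

MI tends to favour rare, exclusive pairings, while log-likelihood is more robust for frequent items, which may be one reason different measures suit a small legal corpus and a larger scientific one.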
The results of this study offer an approach, from the perspective of corpus linguistics, to these two specialised languages and enable translators and teachers of languages for specific purposes to solve linguistic problems related to the lexical, terminological and even phraseological structure of specialised languages, in this case the legal and the technical.
* Speaker
† Corresponding author: [email protected]
‡ Corresponding author: [email protected]
Keywords: specialised languages, corpus linguistics, collocations, translation, language teaching
Comparative study of the English, French and Spanish translation of the linguistic and paralinguistic aspects of comics, based on a multimodal corpus of the horror genre
María Del Carmen Baena Lupiáñez* 1
1 Universidad de Málaga – Spain
Drawing on the stereotypes that society attaches to certain gestures and expressions, literary works have used them to make their characters more expressive. Today there are comics with veritable philosophical essays in their speech balloons, and comics that contain only images and no text at all. Sometimes the text merely complements what the reader is about to see in the panels. Text and image are two elements that cannot do without each other, since they complement one another. When translating comics, the translator must bear this complementarity in mind so that the target text is coherent and cohesive.
In comics translation, therefore, both textual and paratextual elements must be considered, since they are inseparable. The translator must not only read the text but also interpret the accompanying image and apply the appropriate techniques, adapting text and image to the target culture where necessary.
This would allow us to claim that comics translation is a type of specialised translation, since it has its own codes and its own translation strategies.
However, despite the importance of interpreting paralinguistic aspects well, Translation Studies has not often addressed this topic directly.
Previously, the comics genre and the translation of comics were studied separately. Thus there are studies that focus on the analysis of comics (T. Groensteen, 2009, 2013), on the specific characteristics of the genre (Gubern and Gasca, 1988) and on its semiotic aspect (N. Celotti, 2008), and, on the other hand, studies that focus on the importance of the image for comics translation (Kaindl, 2004; Zanettin, 2008). Today the concept of "paratranslation" is the one that best fits the translation of comics (José Yuste Frías, 2015). Authors such as Zanettin have studied both corpus linguistics and comics and have pointed out that a link can be established between the two, since the translator can build text corpora in order to translate a comic into the target language and culture more effectively and efficiently (2002).
In light of the above, the main aim of this study is to establish classifications that integrate the paralinguistic elements (gestures, facial expressions and symbolic language) that appear in comics, taking into account English, French and Spanish culture.
To this end, six horror comics have been selected. Paralinguistic elements are particularly prominent in works of this kind, since they contain a multitude of symbolic elements and their characters are especially expressive, which will make it possible to build a large multimodal corpus.
* Speaker
Keywords: Corpus linguistics, multimodal corpus, comics, horror comics, paralinguistic elements.
Comparative study of usage labels in current lexicographic works
Estrella Calvo-Rubio Jiménez* 1
1 Universidad de Sevilla [Seville] – C/ S. Fernando, 4, C.P. 41004-Sevilla, Spain
Lexicographic works have always recorded usage labels to a greater or lesser extent. Over the history of lexicography, however, this labelling has changed. Indeed, in recent years the Real Academia de la Lengua Española has chosen to introduce new usage labels and, on occasion, has replaced one label with another. The Academy's Diccionario de la Lengua Española has always been a benchmark in Hispanic lexicography, and studies of it are numerous and varied. Yet an examination of its latest editions reveals striking variability in the usage labels. This variability, or lack of precision in assigning usage labels, is not exclusive to the Academy's dictionary. In fact, lexicographers agree that it is clearly difficult to establish a criterion for deciding when a word or sense belongs to a particular level of language or style, which is why lexicographic works differ from one another in their usage labels. This research is a comparative study of usage labels in several current lexicographic works, namely the Real Academia's Diccionario de la Lengua Española (2014), the CLAVE dictionary (2012), the Diccionario del español actual (2011), María Moliner's dictionary of usage (2008) and the LEMA dictionary of the Spanish language (2001), with the aim of showing how this labelling differs from one work to another. To that end, a corpus was compiled of the words and senses that carry diaphasic or diastratic labels in these five works. Through the observation and study of the corpus, I examine the differences between the works in their usage labels, paying particular attention to words and senses labelled vulgar, malsonante and coloquial. The five works turn out to diverge considerably in this labelling. For example, the LEMA dictionary clearly departs from the others in not labelling any word or sense as vulgar, and the María Moliner does not use the label malsonante, which was introduced in the Academy's dictionary in 2001 and is present in the other works. This raises the question of what criteria the different lexicographers follow when assigning usage labels and which criteria are most appropriate in each case.
Contraseña: lexicografı́a, marcas de uso, diccionarios
⇤
Ponente
64
A contrastive corpus study to
identify diachronic features of Catalan
normative discourse: a study of the Statutes
of Autonomy of 1932, 1979 and 2006
Albert Morales Moreno * 1
1 Universitat Pompeu Fabra / Università Ca’ Foscari Venezia (UPF / UCFV) – Spain
The legislative procedure by which the 2006 Statute of Autonomy of Catalonia (EAC 2006)
was drafted and passed, and the exhaustive study of it presented in Morales (2015), raised the
need for a comparative diachronic[1] study of the different Statutes of Autonomy that
Catalonia has had throughout its history: the EAC of 1932, that of 1979 and the
aforementioned one of 2006.
As in other traditions and countries, the negotiation of each of these normative projects was
a considerable challenge in its historical moment, both legally and politically, as can be
seen in Balcells (2010) for the 1919 autonomy project, Aymamí (1932) or Abelló (2007) on
that of 1932, and Sobrequés (2010) with regard to the EAC of 1979. Each of these Statutes
has to be read as a claim to self-government, reiterated both within the current constitutional
legal framework and within earlier frameworks of coexistence. This set of documents
constitutes what André Salem calls a "chronological textual series" (Salem 1994:313).
These texts, which lie midway between specialized legislative discourse and political
discourse (Thornton 1987; Chilton 2004), belong to a textual genre, normative discourse,
that has been little studied from the perspective of discourse analysis (DA) (Fernández Lagunilla 1999a, 1999b; Bassols 2007), since research has mainly characterized other genres related to political activity, especially parliamentary debate (Ribas Bisbal 2000;
Cuenca 2014).
Taking as a reference the publications on legislative drafting in Catalan (for example,
GRETEL 1986, 1995; Duarte 1993; SAL 2014) and the methodology of other lexicometric
studies of normative discourse in Catalan (Morales 2010, 2015), as well as the contrastive
study of the Spanish constitutions of 1812, 1931 and 1978 (Démol 2013), a diachronic study
will be carried out.
Our methodology is based on lexicometry: the units of analysis are selected according to
statistical criteria. To process our corpus, we will use one of the most widely used
lexicometric analysis tools, namely Lexico3, Iramuteq, TXM or Hyperbase.
We will carry out a lexicometric study of the main characteristics of the corpus
(vocabulary growth, correspondence factor analysis, repeated segments...) and are
interested above all in two analyses: 1) specificity analysis, an index widely used in the
lexicometric tradition, in order to identify the lexical units that change over the selected
period (1932-2006). This index will serve to identify the forms that are statistically
underused and overused, given the size of each subcorpus (each different EAC) and of the
corpus as a whole; 2) repeated-segment analysis, to identify the phraseological units
that characterize normative discourse in Catalan and its evolution over time.
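As a rough illustration of what a specificity score measures, the following Python sketch tests whether a form is over- or underused in one subcorpus under a hypergeometric model, which is the model usually associated with lexicometric specificities. The abstract does not say which model or tool settings will be used, so the model choice, the function names and the toy counts below are all assumptions.

```python
from math import comb


def hypergeom_pmf(k, T, t, F):
    """P(X = k): k occurrences of a form in a subcorpus of t tokens,
    given F occurrences in a whole corpus of T tokens."""
    return comb(F, k) * comb(T - F, t - k) / comb(T, t)


def specificity(k, T, t, F):
    """Return ('over' | 'under', probability) for an observed count k.

    'over'  -> probability of k or more occurrences arising by chance
    'under' -> probability of k or fewer occurrences arising by chance
    A small probability marks the form as statistically specific.
    """
    lo = max(0, t - (T - F))   # smallest count that is possible
    hi = min(F, t)             # largest count that is possible
    expected = F * t / T
    if k >= expected:
        return "over", sum(hypergeom_pmf(i, T, t, F) for i in range(k, hi + 1))
    return "under", sum(hypergeom_pmf(i, T, t, F) for i in range(lo, k + 1))


# Toy counts (invented): a form with 20 occurrences in a 1,000-token corpus,
# 15 of which fall in a 300-token subcorpus.
direction, p = specificity(k=15, T=1000, t=300, F=20)
print(direction, p)
```

Because the score depends on both the subcorpus size and the corpus size, the same raw count can be unremarkable in a long Statute and highly specific in a short one.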
Our research thus sets out to analyse the corpus lexicometrically in order to identify the forms that positively and negatively characterize each version of the EAC studied, together with
the most recurrent phraseological units, so as to lay the first foundations for
describing, from a diachronic point of view, the evolution of normative discourse in
Catalan with regard to vocabulary and phraseology.
This research is part of the research project funded by the Instituto de Estudios del Autogobierno for the first semester of 2017.
Keywords: normative discourse, lexicometry, corpus linguistics, diachronic study
A study of the applicability of Zipf’s law
and Heaps’ law to corpora of
learners of English
Nicolas Ballier *† 1, Paula Lissón *‡ 2
1 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus
(CLILLAC-ARP) – Université Paris VII - Paris Diderot : EA3967 – Université Paris Diderot Bât.
Olympe de Gouges case postale 7046 75205 Paris cedex 13, France
2 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus
(CLILLAC-ARP) – Université Paris VII - Paris Diderot : EA3967 – Université Paris Diderot Bât.
Olympe de Gouges case postale 7046 75205 Paris cedex 13, France
This paper examines the applicability of the Zipf-Mandelbrot law (Zipf, 1949; Mandelbrot, 1953) and Heaps’ law (1978) to learner corpora. To do so, we will compare
the vocabulary growth curves of texts written by native speakers of English with those of
texts written by learners of English.
The Zipf-Mandelbrot law states that, in a given text, the frequency of a word is inversely
related to its frequency rank. In practice, a text consists of a few words with very high frequency and many words with low frequency. In a recent study,
Bentz and Buttery (2014) show that a) the Zipf-Mandelbrot law can be used as a
measure of lexical diversity and b) not all languages follow the Zipf-Mandelbrot law in
the same way. Our hypothesis is that learners of English do not follow the Zipf-Mandelbrot law exactly, and that their vocabulary growth curve differs
from that of native speakers, which could help us classify learners into
different proficiency levels.
Heaps’ law (1978), which complements Zipf’s law, states that the vocabulary growth
of a given text is a function of the size of that text. If the text were made longer,
the vocabulary would keep growing, but no longer
linearly, since the likelihood of new words appearing decreases
as the number of words increases. Our hypothesis is that learners show
more limited vocabulary growth, so that their production of hapax legomena would
fall below the prediction of Heaps’ law (approximately the square root of the
total number of tokens).
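The quantities discussed above can be made concrete with a minimal Python sketch that builds a vocabulary growth curve, counts hapax legomena and computes the square-root figure mentioned in the abstract. The whitespace tokenisation, the function names and the toy sentence are assumptions made for illustration; they are not the authors' pipeline.

```python
from collections import Counter
from math import sqrt


def vocabulary_growth(tokens, step=1):
    """Return [(N, V(N))]: number of distinct types seen after N tokens."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0 or n == len(tokens):
            curve.append((n, len(seen)))
    return curve


def hapax_count(tokens):
    """Number of types that occur exactly once (hapax legomena)."""
    return sum(1 for c in Counter(tokens).values() if c == 1)


# Toy text; a real study would read tokenised corpus files instead.
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

curve = vocabulary_growth(tokens)
hapaxes = hapax_count(tokens)
baseline = sqrt(len(tokens))  # square-root figure quoted in the abstract
print(curve[-1], hapaxes, round(baseline, 2))
```

Plotting the curve for a learner text and a native text of the same length would show whether the learner curve flattens earlier, which is the pattern the hypothesis predicts.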
To test our hypothesis, we will study the applicability of the Zipf-Mandelbrot law and
Heaps’ law in a written corpus of Spanish-speaking learners of English, NOCE (Díaz-Negrillo, 2007), and compare the results with those from a corpus of written productions
by native English speakers, LOCNESS (Paquot, 2015). In this way, we will assess the validity of
the laws considered here and show the variation between native and non-native speakers.
From the number of tokens and hapax legomena in our learner corpus, we will generate
the frequency spectra that will allow us to build the vocabulary growth curves. To do so, we will use the {zipfR} package (Evert & Baroni, 2006), implemented in
R (R Core Team, 2016). Following Ballier and Gaillat (2016), we will use
the "compare.richness.fnc" function implemented in {languageR} (Baayen, 2007) to compare
vocabulary growth between native and non-native productions.
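A frequency spectrum simply records, for each frequency m, how many types occur exactly m times. The study builds its spectra with {zipfR} in R; the Python sketch below only illustrates the concept and does not reproduce the zipfR interface. The function name and the toy sentence are assumptions.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map m -> V_m, the number of types occurring exactly m times."""
    type_freqs = Counter(tokens)                  # type -> frequency
    spectrum = Counter(type_freqs.values())       # frequency -> number of types
    return dict(sorted(spectrum.items()))


tokens = "the cat sat on the mat and the dog sat on the log".split()
spectrum = frequency_spectrum(tokens)
print(spectrum)

# Sanity checks that hold for any spectrum:
n_tokens = sum(m * v_m for m, v_m in spectrum.items())   # total tokens N
n_types = sum(spectrum.values())                          # total types V
print(n_tokens, n_types)
```

The entry for m = 1 is the number of hapax legomena, which is why the spectrum is the natural starting point for the growth curves described above.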
Next, we will extrapolate the vocabulary growth curves (see
figure 2) using the three Large Number of Rare Events (LNRE) models included in {zipfR}:
"Generalized Inverse Gauss-Poisson" (Baayen, 2001, 2008), "Zipf-Mandelbrot" and "Finite Zipf-Mandelbrot" (Evert, 2004). Finally, we will compare the results of the three models
to identify which of them is best suited to the analysis of learner corpora.
Keywords: learner corpora, lexical complexity, Zipf, Mandelbrot, vocabulary growth, hapax legomena
Extracting accounting phraseology with
Sketch Engine: a proposed
workflow
Daniel Gallego * 1
1 Universidad de Alicante (UA) – Carretera San Vicente del Raspeig s/n 03690 San Vicente del Raspeig
- Alicante, Spain
This paper presents a methodological experiment in extracting specialized phraseology from a generic corpus specialized in accounting. The working hypothesis is that, given
a closed list of single-word terms and of verbs that can potentially combine with
those terms to form specialized phraseological units, Sketch Engine (Kilgarriff
et al., 2004), despite not being designed specifically for extracting specialized
phraseology, can be useful for phraseological harvesting.
The theoretical framework revolves around the concept of specialized phraseology, reviewed
on the basis of work such as Gouadec (1994), L’Homme (1997), Bevilacqua (2004) and Aguado (2007).
Some studies on the evaluation of phraseology extraction are also taken into account
(Claveau & L’Homme 2004; Wanner et al. 2005, among others).
To address the working hypothesis, the object of study is first delimited on the
basis of the above work (in essence, specialized phraseology of the verb + term type is analysed). A workflow is then proposed for
extracting a list of candidate specialized phraseological units with the
Sketch Engine corpus query system. The workflow is divided into several steps. The
first consists of generating two whitelists, one of terms and one of verbs extracted from the
corpus itself, and validating them manually. The second involves extracting concordances that contain the identified verbs and terms, which requires advanced use of
Sketch Engine’s CQL (corpus query language). In the third step, a new frequency list
of the extracted units is generated from that concordance list; this can be
regarded as a list of candidate specialized phraseological units. Finally, the extracted units are examined individually to determine their phraseological status.
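The logic of the first three steps can be sketched outside Sketch Engine in a few lines of Python. This is only an illustration of the idea (whitelists, co-occurrence within a window, frequency list); it does not use Sketch Engine or CQL, and the whitelists, the window size, the function name and the lemmatised lines are all invented for the example.

```python
from collections import Counter

# Hypothetical whitelists; in the workflow these are extracted from the
# corpus and validated manually.
TERMS = {"activo", "pasivo", "provisión"}
VERBS = {"amortizar", "dotar", "valorar"}


def candidate_units(sentences, verbs, terms, window=3):
    """Count verb + term pairs in which the term follows the verb
    within `window` tokens. Each sentence is a list of lemmas."""
    counts = Counter()
    for lemmas in sentences:
        for i, lemma in enumerate(lemmas):
            if lemma not in verbs:
                continue
            for other in lemmas[i + 1 : i + 1 + window]:
                if other in terms:
                    counts[(lemma, other)] += 1
    return counts


# Toy lemmatised concordance lines (invented for illustration).
sentences = [
    ["la", "empresa", "dotar", "uno", "provisión"],
    ["se", "deber", "amortizar", "el", "activo"],
    ["dotar", "el", "provisión", "necesario"],
]

candidates = candidate_units(sentences, VERBS, TERMS)
for (verb, term), freq in candidates.most_common():
    print(verb, term, freq)
```

The resulting frequency list is only a list of candidates: as in the final step of the workflow, each pair still has to be checked by hand before it counts as a phraseological unit.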
Analysis of the first fifty extracted units shows a precision
of around 40%, a fairly high figure that merits further investigation. Validating more units will show how this percentage fluctuates and to what
extent it is higher or lower than in other studies.
In any case, the results can be taken into account not only in compiling
phraseological resources but also in corpus indexing. The experiment also
allows some suggestions to be made for optimizing corpus query systems
with regard to the extraction of specialized phraseology.
Keywords: Specialized phraseology, Sketch Engine, extraction, generic corpora
Extracting semantic frame structures from
Environmental Sciences corpora
Beatriz Sánchez-Cárdenas *† 1, Carlos Ramisch * 2
1 Lexicon research group, Universidad de Granada – Spain
2 Université de Marseille – LIF (Laboratoire d’Informatique Fondamentale) – France
Some authors argue that language is much less compositional than one might initially assume
(Tutin & Falaise 2013, Kübler & Volanschi 2012, Gledhill 2000, Pecman et al. 2010, L’Homme
1998). In addition to multiword expressions, such as idioms and compounds, speakers often
employ prefabricated templates and collocational patterns. Such patterns are omnipresent in
specialized language, where their correct use is crucial to fully convey and understand domain
concepts and their relations.
In this research, we propose and evaluate a new way to automatically identify specialized noun-verb combinations that are both recurrent and meaningful from a cognitive point of view in
scientific discourse (Claveau & L’Homme 2006). The long-term goal of this work is to automatically extract argument structures from corpora to help build semantic frames that are
activated in specialized domains.
From a theoretical point of view, our work derives from frame-based terminology (FBT, Faber
2012, 2015). FBT applies the premises of frame semantics (Fillmore 2006) to the study of the
conceptual organization that underlies specialized domains. Then, our description of thematic
roles and argument structure is based on role and reference grammar (Van Valin 2006). Finally,
we classify the nouns of the arguments in semantic categories (Flaux and Van Velde, 2000).
With this perspective in mind, we developed a corpus-based methodology to acquire lexical
patterns that reveal the structure of different frames. Our starting point is the set of corpus queries and
association measures implemented in the MWEtoolkit, a software tool for automatic MWE discovery
in corpora (Ramisch 2014).
After morphosyntactic analysis and lemmatization of the corpus, we search for specialized noun-verb and verb-noun combinations that are conceptually meaningful. These searches were based
on semantic relations between nouns described in the Environmental database EcoLexiCon. For
instance, the term volcano is connected to the noun eruption through the conceptual relation
[cause of]. Since the extraction from corpora of relevant noun-verb combinations is crucial to
identify the argument structures that underlie semantic frames (Fillmore et al 2003), we searched
in the corpora for verbs that lexicalize the relation between these two nouns and retrieved verbs
such as cause and produce. Using a bootstrap methodology, these verbs were reused to formulate another query, which retrieves from the corpora all causal relations related to volcanoes.
The results were then sorted in descending order of association measure (pointwise mutual
information). The most relevant lexical items for the frame under study are those at the top of
the list. Finally, these lists of patterns led to the emergence of the different conceptual frames
associated with the concepts analyzed. These are then filled in manually by an expert lexicographer.
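To show what sorting by pointwise mutual information involves, here is a minimal Python sketch that scores noun-verb pairs and ranks them in descending order. In the study the scores come from the MWEtoolkit; this sketch does not use or imitate that tool, and the function name and the pair counts are invented for illustration.

```python
from collections import Counter
from math import log2


def pmi_ranking(pairs):
    """Rank (noun, verb) pairs by pointwise mutual information.

    PMI(n, v) = log2( p(n, v) / (p(n) * p(v)) ), with probabilities
    estimated from relative frequencies in `pairs`.
    """
    pair_counts = Counter(pairs)
    noun_counts = Counter(n for n, _ in pairs)
    verb_counts = Counter(v for _, v in pairs)
    total = len(pairs)
    scores = {
        (n, v): log2(c * total / (noun_counts[n] * verb_counts[v]))
        for (n, v), c in pair_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy co-occurrence data (invented): one tuple per observed noun-verb pair.
pairs = (
    [("volcano", "erupt")] * 4
    + [("volcano", "cause")] * 2
    + [("earthquake", "cause")] * 3
    + [("earthquake", "erupt")] * 1
)

ranked = pmi_ranking(pairs)
for (noun, verb), score in ranked:
    print(noun, verb, round(score, 3))
```

A positive score means the pair co-occurs more often than the individual frequencies of the noun and the verb would predict, which is why the top of the list is the most informative part.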
In this article, we chose to present an example extracted from a 1-million-token corpus of Volcanology. For the moment, we have extracted the verbs associated with the term volcano. When
we analyze the arguments of these verbs and their associated thematic roles (Van Valin 2006)
and semantic categories (Flaux and Van Velde, 2000), we will illustrate the differences between
the three different frames.
Since frames reflect cognitive patterns, they are language independent. As shall be seen in
our presentation, this conceptual description can be enriched with linguistic information in any
language. As a consequence, translation studies can greatly benefit from it.
Keywords: frame-based terminology, multiword expressions, argument structure, corpus analysis
strategies
Facework in a telecollaboration student
corpus
Pennock-Speck Barry *† 1, Begoña Clavel Arroitia * 1
1 Universitat de València (UVEG) – Universitat de València, Avda. Blasco Ibáñez, 32, Spain
Undoubtedly, bigger is better in the world of corpus linguistics: the more data you have, the
better the results. However, there are corpora that are necessarily small. Let’s take our corpus of
twelve audio-visual recordings of synchronous peer interaction (Telecollaboration[1]) in English
and Spanish between native secondary school speakers. Anyone who has done research on the
discourse of minors knows how difficult it is to get permission from parents to record pupils
for research purposes. What may not be so evident to those who have never been involved in
telecollaboration is the difficulty of finding schools in at least two countries that are willing to participate and time slots that suit geographically distant peers. These problems are compounded
by often less than perfect technical resources in secondary schools. All this leads to small, finite
corpora which are difficult to replicate. But does this mean that they are of no use? We would
argue that this is far from the truth. In this talk, we aim to prove that detailed qualitative
analysis of synchronous multimodal interaction between secondary school pupils yields valuable
insights into the language pupils use and also into intercultural and interpersonal negotiations.
During telecollaboration students are faced with challenges of an interpersonal, intercultural and
a transactional nature while trying to complete the tasks they are given such as organising a
party or a trip abroad on a tight budget. Such challenges require the use of facework, which we
define, following Goffman (1956, 1967), as the actions individuals take to mitigate face threats
and to protect or enhance their own face and that of others. Our findings show that mitigating
face threats is found in our corpus when requests for clarification arise due to a peer’s lack of
linguistic prowess in the foreign language at a particular moment in the exchange or simply
because he/she is not able to hear a word due to technical problems. In most cases we found
that, if comprehension was not compromised, linguistic errors were overlooked –which may be
due to a common facework strategy, that is, avoidance of conflictive issues. We also discovered
that facework addressed to positive face was very common and generally consisted of the search
for common ground. Apart from linguistically-coded communication, we also detected many
cases of non-linguistic communication through gestures, smiles, laughter and the showing of
photographs of a personal nature. These often reinforced verbal facework strategies.
To sum up, our findings point to the fact that the "ceremonial activity" (Goffman 1967:477)
done through facework is an important, though oft-neglected, facet of linguistic or psychological
studies of student interaction.
Goffman, Erving. 1956. "The nature of deference and demeanor." American Anthropologist
58: 473-502.
Goffman, Erving. 1967. Interaction ritual: Essays on Face to Face Behavior. Garden City,
New York.
[1] Telecollaboration for Intercultural Language Acquisition project (TILA)
Keywords: telecollaboration, facework, pragmatics, acquisition
From text to word and from word to
morpheme: Exploring the interface of
corpus linguistics and word formation study
with evidence from Modern Greek
Paraskevi Savvidou * 1
1 National and Kapodistrian University of Athens (UoA) – Greece
The present paper aims to explore the contribution of corpus linguistics in word formation
study, by reviewing previous research, as well as by discussing the findings of an ongoing study
in Modern Greek word formation processes with emphasis on evaluative morphology. The orientation of the study is both theoretical and methodological. It aims to demonstrate that the
further investigation of the interface of corpus linguistics and word formation morphology could
provide significant insights into the understanding of the character and nature of corpus linguistics as a linguistic (un)field or methodology (see among others Stubbs 2009), by demonstrating
its ties with what Sinclair (2004) used to call restrictions of the pre-computer age; also it can
contribute towards the overcoming of these limitations. In other words, the interface of corpus
linguistics and word formation study is presented as crucial for understanding and extending
the theory and methodology of corpus linguistics.
In the first part of the paper, a historical overview of the use of corpus linguistics demonstrates
that morphology is a rather neglected area of corpus research, compared to other linguistic fields;
corpora were applied in morphology later, less systematically and by concentrating only on specific aspects of morphemes’ behavior, like productivity, excluding or underestimating others.
The critical overview of previous research shows that the use of corpora in individual linguistic fields seems to be driven by a latent distinction between the formation and the use level,
which is associated with the relevant dichotomies between grammar and lexis, as well as between
form/structure and semantics. The extent and the way of applying corpora in morphology can
be seen as a consequence of this distinction. Given the fact that corpus linguistics is a perspective in language study which goes beyond theoretical assumptions and dichotomies which
do not come from data analysis, the above observation could contribute to a more thorough
understanding of such limitations, which is essential in order to overcome them.
In the second part of the paper, we introduce a set of theoretical and methodological principles
which could extend the implementation of corpus linguistics in word formation study and we give
evidence in their favor by presenting the results of an ongoing study of Modern Greek evaluative
morphology. The proposed methodology is designed on the basis of two main methodological
principles: (a) the extension of the notion of co-occurrence in two levels: the word formation
level (namely, various characteristics of the bases or compounding components which the elements under examination tend to combine with) and the (con)text level and (b) the combination
of qualitative and quantitative analysis on the study of every aspect of the behavior of the sublexical units under examination, including function identification, combinatoriality, productivity
etc. These principles aim to transfer all the benefits of the ‘phraseological approach’ of corpus
linguistics to the field of morphology. The results of the analysis of a representative number of
Modern Greek sub-lexical units show that these general principles allow the examination of the
dynamic relation between the formation and the use level of the elements under examination,
offering a perspective which can only be in view if the analysis is careful not to exclude or
underestimate specific aspects of morphemes’ behavior.
Keywords: Word formation, derivation, compounding, context, word level, text level, lexis, grammar, evaluative morphology, phraseological approach
Functional and thematic ngrams in
specialized corpora: the case of academic
English, French and Spanish
Clive Hamilton * 1
1 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus
(CLILLAC-ARP) – Université Paris VII - Paris Diderot : EA3967 – Université Paris Diderot Bât.
Olympe de Gouges case postale 7046 75205 Paris cedex 13, France
Previous studies have established that functional and content single-word-units differ in
ratio between oral and written modes of communication (cf. Halliday, 1994; Rowley-Jolivet
1998; Biber et al., 1999). Others have suggested that this mode difference is equally attested
in different languages (cf. Samaniego, forthcoming, for Spanish; Hamilton & Carter-Thomas,
forthcoming, for English and French). However, in spite of the many advances in corpus studies,
this observation has not yet been adapted or extended to clusters or recurrent word combinations.
In addition, the study of phraseological units has become a burgeoning area of linguistic inquiry
over the last years, both in theoretical and applied frameworks (cf. Cowie, 1998; Meunier &
Granger, 2008). The pervasiveness of these units, irrespective of the type of data used for research, has also benefited from ”key publications”, according to Stubbs & Barth (2003:61). As
a result, the pervasive nature of these recurrent combinations can be considered an
irrefutable characteristic of natural language production.
In this presentation, the aim is to add a doubly contrastive perspective to the general debate, by
examining (i) recurrent word combinations (or ngrams, which can be subdivided into bigrams,
trigrams, and so forth) (ii) in a specialized trilingual corpus of academic discourse in natural
sciences (restricted to chemistry, geochemistry, marine and water sciences) in English, French
and Spanish. The corpus compilation process will be presented and I will briefly outline the
distinction made between functional and thematic ngrams. The main part of my presentation
will focus on two issues: i.e. the pervasiveness of the two types of recurrent word combinations
in the three subcorpora and the parallels that can be drawn (especially when there is overlap
between languages with a specific ngram) between thematic and functional ngrams and the lexical density of each language subcorpus.
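The two measurements at the centre of this presentation, ngram frequency and lexical density, can be illustrated with a short Python sketch. The abstract does not specify how ngrams are classified or how lexical density is computed, so the function-word list, the definition of density (share of tokens that are not function words), the function names and the toy sentence are all assumptions.

```python
from collections import Counter

# Hypothetical function-word list; a real study would use a fuller,
# language-specific list or part-of-speech tags.
FUNCTION_WORDS = {"the", "of", "in", "as", "well", "a", "is", "and", "to"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def lexical_density(tokens, function_words):
    """Share of tokens that are not function words."""
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


tokens = (
    "the use of the method is shown in figure 2 "
    "as well as the use of water"
).split()

trigram_counts = Counter(ngrams(tokens, 3))
density = lexical_density(tokens, FUNCTION_WORDS)
print(trigram_counts.most_common(2))
print(round(density, 2))
```

Running the same two functions over each language subcorpus would give comparable trigram lists and density figures, which is the kind of parallel the presentation sets out to draw.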
Preliminary results indicate overlapping: viz. the trigram ‘a partir de’ exhibits a similarly high
frequency both in the Spanish and French subcorpora, whereas the Spanish ‘en la figura’ and
the English equivalent ‘shown in figure’ are used in a comparable manner and both share similar
frequency. Substantial differences, however, have been observed in lexical density between languages with a greater ratio in English than in French and Spanish, implying that composition
strategies may vary significantly in terms of information packaging. There is also a marked
preference in English for functional ngrams rather than thematic ngrams. For instance, the top
three trigrams in the English subcorpus are all functional, whereas those in the two remaining
languages are considered thematic or topic-specific (i.e. ‘the use of’, ‘shown in figure’, ‘as well
as’; ‘après J.-C’, ‘avant J.-C’, ‘de l’holocène’; ‘almacenamiento de CO2’, ‘de CO2 en’, ‘de la formación’, respectively). The implications of our results will be discussed with respect to language
teaching and particularly that of language for specific purposes.
Keywords: ngrams, phraseology, academic discourse, specialized corpora, contrastive studies
Gender-based differences in the use of
epistemic modals in late Modern English
scientific register
Francisco Alonso-Almeida *† 1, Francisco J. Álvarez-Gil *‡ 1
1 Universidad de Las Palmas de Gran Canaria (ULPGC) – Spain
The research conducted has focused on samples from English scientific texts from 1700 to
1900 in order to evaluate epistemic modality as realised by modal verbs. Epistemic modality
seems to be strongly connected to the idea of truth and the authors’ responsibility and commitment regarding their statements (Traugott 1989; Sweetser 1990; Stukker, Sanders and Verhagen
2009). We will also discuss some related features, such as evidentiality. Whereas for some scholars evidentiality represents a subdomain of epistemic modality, there are others who consider
evidentiality as an independent category. In this context, Dendale and Tasmowski (2001) argue
that the relation between these two concepts is divided into disjunction, inclusion, and intersection. We follow the disjunctive approach in this paper in line with Cornillie (2009) who argues
that the mode of knowing should not be associated with the degree of authors’ commitment
towards their texts.
Our interest was to see whether differences in the use of these modals could be detected from
a gender perspective. To this end, we have interrogated the History subcorpus of the Coruña
Corpus of English Scientific Writing, which contains extracts of several historical texts written
between 1700 and 1900, using its own retrieval tool, the Coruña Corpus Tool. Each occurrence has been categorised according to its contextual meaning, following Dixon’s description of
modal verbs, according to which there are modals and what we can call semi-modals, which express the
modalities (2009: 172). Other valuable and insightful studies on modals, such as
Coates (1983), Leech (1971) and Palmer (1979), among others, have also served as references
for the present study.
The process followed consists of the following steps: firstly, we have produced a list of
occurrences in the corpus to check the presence of modal verbs in the history texts available.
Secondly, we have interrogated and analysed the corpus to find the pragmatic functions those
modals play in the different texts. Finally, we have checked the results to find out whether there is
any difference in the use of epistemic modals in the late Modern English scientific register regarding
the gender of the writers.
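The first and last steps amount to counting modal verbs per author group and normalising by text size so that groups of different sizes can be compared. The Python sketch below illustrates that calculation only. The list of modals, the per-10,000-token normalisation, the function name and the sample sentences are assumptions, and the sketch does not use the Coruña Corpus Tool.

```python
from collections import Counter

# Assumed list of central modal verbs; the abstract does not give its list.
MODALS = {"may", "might", "must", "can", "could", "will", "would", "shall", "should"}


def modal_rates(samples, per=10_000):
    """Return modal frequency per `per` tokens for each author group.

    `samples` is an iterable of (group, tokens) pairs.
    """
    modal_counts = Counter()
    token_counts = Counter()
    for group, tokens in samples:
        token_counts[group] += len(tokens)
        modal_counts[group] += sum(1 for t in tokens if t.lower() in MODALS)
    return {g: modal_counts[g] * per / token_counts[g] for g in token_counts}


# Toy samples (invented sentences, not corpus data).
samples = [
    ("female", "it may be that the account might seem true".split()),
    ("male", "the king must have known this".split()),
]

rates = modal_rates(samples)
for group, rate in rates.items():
    print(group, round(rate, 1))
```

A count of this kind cannot tell epistemic from non-epistemic uses, so the manual categorisation of each occurrence by contextual meaning described in the abstract remains necessary.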
Results report on the frequency of usage of these modal verbs according to gender and, most importantly, on the different pragmatic functions these modal verbs fulfil in the communicative process.
One such pragmatic function is mitigation of claims (Alonso Almeida 2015), and so the modals
are used as a negative politeness strategy (Brown and Levinson 1987), to avoid or minimize
imposition, to hedge the illocutionary force of a specific statement, or to create social distance
in order to save the author’s face. In this sense, modals are quite useful as they enable an
interactive construction of scientific knowledge, giving the writer and the readers the chance
to negotiate meaning.
Keywords: modals, corpus, gender, modality, evidentiality
Governability and democracy in Mexico.
Phraseological units of the Federal Executive
2012-2016 from the perspective of Critical
Discourse Analysis
Carlos Enrique Ahuactzin Martínez * 1
1 Benemérita Universidad Autónoma de Puebla-Instituto de Ciencias de Gobierno y Desarrollo
Estratégico (BUAP-ICGDE) – Av. Cúmulo de Virgo s/n. Acceso 4, CCU. Puebla, Puebla, México C.P.
72810., Mexico
The conception of the State as regulator of public life in Latin American countries has faced its most rigorous test in recent years. In the case of Mexico, this study sets out to document how the discourse of "democratic governability" has been constructed around the figure of the president, as a strategy of the Federal Executive to confront the existence of a "failed State", in light of the social and political events that have called into question the capacity of the Mexican State to uphold and guarantee human rights. Drawing on the complementary theoretical and methodological perspectives of Critical Discourse Analysis and Corpus Linguistics, we analyse the presidential speeches of the period 2012-2016, which saw the configuration of the new State policy on security and the development of the processes of violence that have characterized the current federal administration. Throughout the corpus, presidential discourse reveals the discursive resources that made it possible to communicate the structural reforms in Mexico, based on the fulfilment of "democratic governability", conceived as a normative framework for the development of the State and the strengthening of citizenship. The corpus was organized on the basis of semantic concordances, using the Corpus Management System (Sistema de Gestión de Corpus) of the Grupo de Ingeniería Lingüística at the Universidad Nacional Autónoma de México. The classification and treatment of the phraseological units were based on the identification of two single-word lexemes, "gobernabilidad" and "democracia", which the processing of the corpus showed to be incorporated into multi-word lexemes, depending on the communicative situation of the Federal Executive. Three groups were thus established, given their frequency in the corpus: 1) nominal locutions, 2) adjectival locutions, and 3) adverbial locutions. To determine the uses of the locutions, the tagging of the corpus took into account the functional character of the linguistic expressions in the context of government communication. The discursive resources used by the Executive establish a frame of reference at the lexical-semantic level, in which "democracia" occupies a prominent place in the exercise of public power and the legitimation of political decisions. Likewise, the use of derivations of "gobernabilidad", in light of the analysis of the phraseological units, makes it possible to establish a field of associations among the nominal, adjectival and adverbial locutions. Tagging the units of analysis for the period studied makes it possible to relate the modalities of presidential discourse to the political processes that shaped the context in which the institutional messages were produced and delivered. The study therefore proposes an interdisciplinary approach to presidential discourse, considering the discursive, linguistic and political variables involved in shaping the messages of the Federal Executive in Mexico. Finally, as a result of the empirical findings, it proposes a typology of the communicative and discursive strategies that articulate the conception of "democratic governability" in a normative context that exposes the regulatory limitations of the Mexican State.
* Speaker
Keywords: discourse, phraseological units, locutions, governability and democracy.
Spanish grammar for French speakers: the use of the preposition "de" after matrices of the type es posible.
María Adelaida Gil Martínez *† 1
1 Instituto Cervantes de Burdeos (IC Burdeos) – Instituto Cervantes – 57, Crs de l'Intendance 33000 Bordeaux, France
One of the most common difficulties for French speakers learning Spanish is the use of prepositions, above all the overuse of the preposition de after matrices of the type es posible, which constitutes one of the most characteristic fossilizations in the interlanguage of these speakers up to level B1. While at the initial levels this could be attributed to transfer from French to Spanish, for example (*es posible de dejar de fumar) from French (c'est possible d'arrêter de fumer), at level B1 it could be regarded as a strategy for avoiding the subjunctive, since learners have not yet mastered the alternation between the two moods in Spanish. Considering, moreover, that syntactic transfer from the L1 is observable up to very advanced levels, it is not surprising that this type of error is found at these stages of the teaching-learning process.
To test this hypothesis, we turned to the Corpus de aprendices de español como lengua extranjera (CAES), a corpus designed by a team at the University of Santiago and funded by the Instituto Cervantes. It consists of written texts produced by learners of Spanish at different levels of proficiency (levels A1 to C1 of the Common European Framework of Reference, as applied to Spanish in the Plan curricular del Instituto Cervantes. Niveles de referencia para el español) and from six L1s: Arabic, Mandarin Chinese, French, English, Portuguese and Russian. The objectives of this proposal are as follows:
• To see to what extent CAES corroborates this hypothesis by analysing and assessing, through statistical techniques, the presence of the matrix (*es posible de) in the Spanish of French-speaking learners.
• To explore the contexts in which this structure appears and what information can be obtained from the CAES samples. Corpus-based contrastive analysis of two or more languages will allow us to assess how this structure functions in discourse and to determine to what extent its appearance is due to L1 transfer or to other learning strategies used by French speakers.
• To build a bank of examples that can later be used to design classroom activities and tasks that act as mediating, catalysing material to improve the teaching-learning process.
* Speaker
† Corresponding author: [email protected]
The first results from the CAES samples analysed point to the following contexts:
• Matrices of the type *es posible are followed by the preposition de in 38% of cases.
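A proportion such as the 38% reported above can be obtained with a simple pattern count. The following Python sketch is only an illustration under stated assumptions: the four sentences are invented (they are not CAES data), the function name `share_with_de` is hypothetical, and a real study would query the corpus interface and handle other tenses and inflected forms of the matrix.

```python
import re

# Invented learner sentences for illustration only; NOT taken from CAES.
samples = [
    "Es posible de dejar de fumar.",
    "Es posible que venga mañana.",
    "No es posible de hacer eso.",
    "Es posible viajar en tren.",
]

# All occurrences of the matrix, and the subset followed by "de".
MATRIX = re.compile(r"\bes posible\b", re.IGNORECASE)
MATRIX_DE = re.compile(r"\bes posible de\b", re.IGNORECASE)


def share_with_de(sentences):
    """Return (hits with 'de', all hits, proportion) for the matrix 'es posible'."""
    total = sum(len(MATRIX.findall(s)) for s in sentences)
    with_de = sum(len(MATRIX_DE.findall(s)) for s in sentences)
    proportion = with_de / total if total else 0.0
    return with_de, total, proportion


with_de, total, proportion = share_with_de(samples)
print(f"{with_de}/{total} = {proportion:.0%}")
```

Grouping the same counts by proficiency level and by L1 would allow the comparison between French speakers and the other learner groups that the objectives describe.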
Keywords: learner corpus, French speakers, subjunctive matrices, interlanguage, ELE, L1 transfer
Hedging in tourism discourse: the variable genre in academic vs professional texts
Francisca Suau-Jiménez * 1, Carmen Piqué-Noguera *† 2
1 FACULTAT DE FILOLOGIA, TRADUCCIÓ I COMUNICACIÓ. UNIVERSITAT DE VALENCIA (IULMA-UV) – 32, AV BLASCO IBÁÑEZ 46010 VALENCIA, Spain
2 FACULTAT DE FILOLOGIA, TRADUCCIÓ I COMUNICACIÓ (IULMA - UV) – 32, AV BLASCO IBÁÑEZ 46010 VALENCIA, Spain
In the last decades, e-genres have been at the forefront of academic, professional and social studies aimed at enhancing writing in these areas (Thaine 2015), and tourism has often been targeted as one of them. Recent studies (Suau-Jiménez 2016; Mapelli 2016) have shown that tourism e-genres strongly challenge the interpersonal model of metadiscourse for academic genres (Hyland 2005). Therefore, we hypothesize that one of the most representative interpersonal markers, hedges, should also show important differences in function, frequency and grammatical realization across Research Articles vs Hotel Websites. Genre and discipline are two variables that have been claimed to challenge the original interpersonal metadiscourse model, which took English and academic discourse as its main referents. Hedges are prototypical markers in academic writing, and also central in promotional tourism genres in English (Suau-Jiménez 2012), since they reveal the author's different functional attitudes and commitments towards content and towards readers' involvement. Hedges, however, are not always easy to identify, as Nash (1992) points out when noting their fuzziness in interpersonal metadiscourse.
This research analyzes a 100,000-word corpus composed of two sub-corpora, Hotel Websites and Research Articles in tourism. We aim to uncover which generic functions hedges fulfil in each case, their frequency, and the nature of their grammatical realization in both genres.
Methodologically, we have taken Hyland's (2005) taxonomy as a starting point and adapted it to the corpus at hand. We have discarded so-called research verbs, such as 'argue' or 'indicate', since they only appear in Research Articles. Then, since the modals 'should', 'could' and 'may' are shared by both genres, we have taken them as our specific object of analysis from an interpersonal discourse approach (Hyland 2005).
Preliminary results from a pilot corpus of 34,000 words for both genres already showed a quantitative difference: 8.03 hedges in Research Articles versus 4.07 in Hotel Websites. Besides, the modals 'may' and 'should' present specific occurrences: whereas we have counted 21 in Articles, there are 54 in Hotel Websites. Also, 'should' appears 13 times in Articles and 15 in Hotel Websites, whereas 'could' occurs 12 times in Articles versus 2 times in Hotel Websites.
Frequency may imply a different way of approaching and persuading each readership, which is related to the specific functional needs of each genre's communicative aim. Tourism marketers use these modal verbs to give advice to prospective clients or to describe what they would find around the hotel premises, whereas Research Article writers use these modals especially in their argumentative and speculative sections. Their use is typical when making claims which are more or less tentative, or when a possible outcome is more or less probable, and they are often accompanied by qualifying adverbs like 'relatively', 'generally' or 'largely'.
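Raw counts like those above are easier to compare across genres once they are normalized by sub-corpus size. The sketch below shows one common way to do this (occurrences per 1,000 words). The raw counts for 'should' and 'could' come from the abstract, but the sub-corpus sizes are an assumption made purely for illustration (an even 17,000/17,000 split of the 34,000-word pilot corpus); the abstract does not report them, so the resulting rates are not findings of the study.

```python
def per_thousand(count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per 1,000 words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1000 / corpus_size


# ASSUMPTION: hypothetical sub-corpus sizes (not stated in the abstract).
SUBCORPUS_SIZE = {"articles": 17_000, "websites": 17_000}

# Raw counts for 'should' and 'could' as reported in the abstract.
RAW_COUNTS = {
    "should": {"articles": 13, "websites": 15},
    "could": {"articles": 12, "websites": 2},
}

normalized = {
    modal: {
        genre: round(per_thousand(n, SUBCORPUS_SIZE[genre]), 2)
        for genre, n in counts.items()
    }
    for modal, counts in RAW_COUNTS.items()
}
print(normalized)
```

With sub-corpora of unequal size, the normalized rates rather than the raw counts would be the figures to compare.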
* Speaker
† Corresponding author: [email protected]
Conclusions point towards interpersonal metadiscourse as a research framework that must consider the variables of genre and discipline in order to yield ad hoc analyses that can explain marker frequencies and lexico-grammatical realizations in context, so that adequate discursive and socio-linguistic implications can be drawn.
Keywords: hedges / interpersonal metadiscourse / interpersonality / professional and academic genres
Identification of recurrent formulas in academic Spanish
Marcos García Salido * 1, Marcos Garcia González 1, Margarita Alonso Ramos 1
1 Departamento de Galego-Portugués, Francés e Lingüística, Universidade da Coruña (UDC) – Spain
In any discourse genre there are recurrent or routine combinations of lexical units. These combinations are often semantically compositional, but their lexical realization is constrained by the conceptual representation that the speaker wishes to express. Thus, for example, to introduce the conclusions of a text, en conclusión is more idiomatic than ?a manera de conclusión. From a phraseological perspective, sequences of this type have been called clichés (Mel'čuk, 2015), and they overlap to some extent with the concept of lexical bundle (Biber et al., 1999). Because of their characteristics, understanding such sequences is not problematic, but producing them can be; hence the value of a dictionary that records them, especially for non-native speakers or novice writers. The aim of this paper is to evaluate the effectiveness of several methods used for the automatic identification of multi-word sequences, with a view to compiling a dictionary of academic Spanish.
We have considered as recurrent formulas sequences of two, three and four words with a frequency of at least ten occurrences per million words (Biber et al., 1999). This yields formulas such as cabe destacar que or en el presente trabajo, alongside others of more doubtful interest such as et al. 2002, no se han, etc. To identify those that are characteristic of academic discourse, we have compared two main strategies: (i) combining a dispersion index (DP, Gries, 2008) with a log-likelihood value indicating significant differences in the distribution of the formulas studied with respect to non-academic texts, and (ii) using exclusively a test that both measures differences in distribution and takes into account the dispersion of the forms tested (Wilcoxon-Mann-Whitney [WMW]; cf. Kilgarriff, 2001; Paquot and Bestgen, 2009; Lijffijt et al. 2015). The reference corpus used (the Spanish part of SERAC, InterLAE, 2008) is composed of scientific articles from four different areas (Humanities, Social Sciences, Physics and Engineering, and Health Sciences) and is contrasted with narrative texts from the LEXESP corpus (Sebastián et al., 2000).
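The frequency criterion described above (sequences of two to four words occurring at least ten times per million words) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: whitespace tokenization, the function names and the toy sentence are assumptions, and the demo uses an artificially high threshold because the toy text has only twelve tokens.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def recurrent_formulas(tokens, sizes=(2, 3, 4), min_per_million=10.0):
    """Map each n-gram (as a string) to its frequency per million words,
    keeping only those that reach the threshold."""
    total = len(tokens)
    if total == 0:
        return {}
    kept = {}
    for n in sizes:
        for gram, freq in Counter(ngrams(tokens, n)).items():
            per_million = freq * 1_000_000 / total
            if per_million >= min_per_million:
                kept[" ".join(gram)] = per_million
    return kept


# Toy text (hypothetical); with 12 tokens, only n-grams seen twice
# exceed the demo threshold of 150,000 per million.
toy = "en el presente trabajo se analiza el corpus en el presente trabajo".split()
for formula in sorted(recurrent_formulas(toy, min_per_million=150_000)):
    print(formula)
```

On a real corpus the default threshold of ten per million would apply, and the resulting list would still contain items such as et al. 2002, which is why the abstract adds the distributional filters discussed next.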
As in other studies (Paquot and Bestgen, 2009), the WMW test proves, in principle, more conservative than log-likelihood. For example, if we consider only the bigrams that meet a p-value threshold of 0.0001 in the WMW test, we would be left with only 25% of these sequences. With the same p-value, the log-likelihood test would produce a list containing 73% of the original number of bigrams. However, this latter list can be reduced to only the most widely dispersed bigrams, which appreciably narrows the gap between the results of the two methods. Analysing both the lists obtained and the items left out under the different significance and dispersion thresholds will provide information about the precision of the filters used and about their recall.
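Strategy (i), a log-likelihood keyness value combined with a dispersion filter, can be sketched as below. The DP formula follows the usual definition attributed to Gries (2008): half the sum of absolute differences between the observed share of an item in each corpus part and that part's share of the corpus. The log-likelihood is the two-corpus formulation widely used in corpus comparison. The function names, the DP cut-off of 0.5 and the toy figures are assumptions for illustration only; the abstract does not state the authors' thresholds or implementation.

```python
import math

# Chi-square critical value for 1 degree of freedom at p < 0.0001.
LL_CRITICAL = 15.13


def dp(freqs, sizes):
    """Deviation of proportions: 0 = perfectly even, values near 1 = clumped."""
    total_freq = sum(freqs)
    total_size = sum(sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("item frequency and corpus size must be non-zero")
    return 0.5 * sum(
        abs(v / total_freq - s / total_size) for v, s in zip(freqs, sizes)
    )


def log_likelihood(a, b, n1, n2):
    """Log-likelihood (G2) for frequency a in a corpus of n1 words
    versus frequency b in a corpus of n2 words."""
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_academic_formula(freqs, sizes, ref_freq, ref_size, max_dp=0.5):
    """Keep a formula if it is significantly over-represented in the study
    corpus AND evenly dispersed across its parts (max_dp is hypothetical)."""
    study_freq, study_size = sum(freqs), sum(sizes)
    overused = study_freq / study_size > ref_freq / ref_size
    significant = log_likelihood(study_freq, ref_freq, study_size, ref_size) >= LL_CRITICAL
    return overused and significant and dp(freqs, sizes) <= max_dp


# Toy figures: four equally sized parts (e.g. four disciplines).
parts = [250_000] * 4
print(is_academic_formula([30, 28, 32, 30], parts, ref_freq=5, ref_size=1_000_000))
print(is_academic_formula([120, 0, 0, 0], parts, ref_freq=5, ref_size=1_000_000))
```

The second call illustrates the case the dispersion filter is meant to catch: a sequence that is frequent overall but concentrated in a single part of the corpus.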
* Speaker
Keywords: formulas, academic discourse, dictionary, automatic keyword extraction
Impact of Parallel Corpora as Translation Memories on Phraseological Translation Quality in Student Translations of Specialized Medical Texts
Heidi Verplaetse *† 1, An Lambrechts 1, Kris Heylen * 1
1 KU Leuven, RU Quantitative Lexicology and Variational Linguistics – Belgium
ABSTRACT
Theoretical background and main arguments
Recently Kübler et al. (2016) conducted a student experiment using comparable corpora,
indicating that these corpora help to solve translation difficulties, such as those relating to tense
and aspect, the use of prepositions and collocations, etc. However, certain error types occur
more often with the use of a corpus, possibly because of overconfidence in the corpus or a lack
of time when making extensive use of it. Aside from comparable corpora, the use of parallel
corpora as translation memories (TMs) integrated in a CAT tool, provides another excellent
means to prepare students for their future professional environment, reflecting the needs of professional translators. Corpora improve student translations because they contain information
which is not included in dictionaries, particularly with regard to terminology and idiomatic expressions (cf. phraseology) (Frérot, 2009). This is confirmed by Kübler (2011), who states that
parallel corpora seem to be the perfect tool for a translator: next to the terminology needed for
the translation task, they also provide the translator with the necessary phraseology. By using
parallel corpora and integrating these in a CAT tool, it is not only possible to exploit the above-mentioned benefits of corpora, but also those of TMs: not only do TMs speed up translation,
leading to an increase in the translator’s productivity and gains, but they also have a positive
influence on the overall translation quality. By recognizing previously translated segments, TMs
increase the consistency at the stylistic, phraseological and terminological levels (Austermühl,
2006). However, when trying to increase their translation output, translators may work too fast
if they have a TM, negatively influencing translation quality as they use translations from the
TM without verifying them first (Bowker, 2005).
Aims and method
In order to assess the influence of CAT tools and preset corpus-based TMs on translation quality
on a phraseological level, we examine translations of specialized medical texts executed by MA
students of Translation. In our experiment the source texts contain predefined translation difficulties. The students perform the translations under three different conditions, viz. (i) without CAT tools, TMs or external resources, (ii) with a CAT tool and a TM and (iii) with external resources only. For the medical translations in our current tests the students use the parallel corpus from the European Medicines Agency (EMA) compiled by Tiedemann (2009) as a TM. Upon completion of the translation, an analysis of the predefined translation difficulties is carried out based on an error classification (cf. MeLLANGe error typology, Kübler et al., 2016). We use an error typology because errors can be defined more easily and precisely than translation quality: translation quality depends to a large extent on the absence of errors. And as stated by Schiaffino and Zearo (2005), among others, translation quality should be assessed as objectively as possible.
* Speaker
† Corresponding author: [email protected]
Pilot test results
Our pilot test with student translations led to the insight that concordance searches in TMs
of parallel corpora prove beneficial for looking up specialized medical terminology (-, 2015),
whereas mere TM support without concordance searches provided little added value. Terminology look-up through concordance searches proved especially beneficial for more difficult items.
In these experiments, however, the exclusive use of external resources (excluding CAT tools and TMs) also showed a considerable positive influence on the translation of specialized terminology (-, 2015).
Keywords: Parallel corpora, Comparable corpora, Terminology, Phraseology, CAT tools, Translation Memories (TMs), Translation quality, Translation for Specific Purposes, Medical translation
Investigating style and conventionality in literary translation: a corpus-based approach
Carolina Barcellos * 1
1 University of Brasília (UnB) – Campus Universitário Darcy Ribeiro – Asa Norte – ICC Sul B1167/63 CEP: 70910-900 – Brasília /DF, Brazil
Corpus-based Translation Studies (BAKER, 1999, 2000; SALDANHA, 2011) have focused
on the style of translators, and addressed the translator’s discursive presence in the translated
text as a result. This research specifically investigates stylistic traits of a literary translator from
the perspective of conventionality and shifts in translation. It examines patterns of linguistic
choices made by a translator regarding conventionality (BAKER, 2007) in Brazilian Portuguese
that could be found both in his work as a translator and as an author, and the consequences
of these choices for the recreation of meaning in the translated texts. Three corpora were compiled: 1) a corpus of translated texts written in Brazilian Portuguese by one of the current most
prominent Brazilian literary translators, Paulo Henriques Britto, 2) a corpus of non-translated
texts written in Brazilian Portuguese by Britto, and 3) a corpus of short stories written in American English by the authors Philip Roth, John Updike, and Jhumpa Lahiri that, with the first
corpus, translated texts by Britto, composed a parallel corpus. Two other corpora (COMPARA
and ESTRA) were used as control corpora for frequency reference regarding conventionality in
Brazilian Portuguese. Statistical data were obtained using the software WordSmith Tools© 6.0
(SCOTT, 2012), and elements related to conventionality in Brazilian Portuguese were analyzed
at the various orders (morpheme, word, group, and clause). The research methodology included
compilation, preparation, alignment and tagging of the texts for later analysis with WordSmith
Tools© 6.0. The identification of patterns in the translated texts, attributed to the translator's
style and not to the linguistic constraints of the American English/Brazilian Portuguese pair,
draws mainly on what was postulated by Munday (2008), Saldanha (2011) and Baker (1999,
2000, 2007). The results indicated that Britto made a set of choices to some extent distinct for
each translated text, under the influence of the style of source texts. In general, the linguistic
choices made by Britto regarding the use of conventional expressions increased the degree of colloquialism in the translated texts when compared to their respective source texts. In addition,
the set of choices identified in Britto’s non-translated texts presented similarities with the set of
choices identified in his translated texts, in particular with the ones in Philip Roth’s work. The
most frequent shift in translation was addition (an amplification subcategory). These instances
of addition were not directly related to explicitation. They were, on the other hand, related to
the translator's preference for conventional expressions in translated texts, even when
there was no clear motivation for this in the source texts. Britto also made use of sanitization,
erasing some cultural references from the source texts. Nevertheless, the translator’s creativity
consistently outweighed the use of sanitization, corroborating the results obtained by Munday
(2008) and refuting, to some extent, the ones obtained by Baker (1999, 2000).
* Speaker
Keywords: Conventionality, Style of Translation, Literary Translation, Corpus-based Translation Studies.
Investigating the cognitive potential of primary EFL textbook activities: a corpus-based study
Joaquín Gris Roca *† 1,2, Raquel Criado Sánchez *‡ 3,4, Agustín Romero Medina § 2,5, Isabel Alonso Belonte *¶ 6
1 University of Murcia (UMU) – Universidad de Murcia, Facultad de Ciencias Sociosanitarias, Campus de Lorca, Antiguo Cuartel Sancho Dávila, Avda. de las Fuerzas Armadas, s/n, Lorca 30800 Murcia, Spain
2 Université de Murcie – Spain
3 University of Murcia (UMU) – Universidad de Murcia, Facultad de Letras, Campus de la Merced, C/ Santo Cristo, 1, 30071 Murcia, Spain
4 Université de Murcie – Spain
5 University of Murcia (UMU) – Facultad de Psicología University of Murcia Campus de Espinardo 30100 Murcia, Spain
6 Université autonome de Madrid – Spain
Textbooks and activities are fundamental tools in the EFL classroom (e.g. Littlejohn, 2011;
Montijano-Cabrera, 2014; Sánchez, 2004; Tomlinson, 2003, 2011) as they are often the only
means to afford students opportunities to practise the L2 in (very often) poor-quality-input environments, as is the case of EFL contexts. Teachers can use them in a variety of ways, mainly
to convey the L2 knowledge to students through practice or to support the explanations they
present in class.
Basically, there are three types of activities according to the type of knowledge they foster (Gris,
2015): i) activities whose teaching nature is mostly or fully explicit, which primarily foster explicit linguistic knowledge (e.g. knowledge of the forms); ii) activities with a high or full implicit
teaching load, aimed at developing implicit knowledge (which underlies oral and written fluency); and iii) activities that have a mixed teaching load, that is, partially explicit and implicit.
The selection and implementation of activities taking into account their explicit and implicit teaching nature is crucial for a balanced development of both explicit and implicit knowledge, given
that the ultimate goal of Foreign Language Teaching should be the attainment of the latter
(e.g. DeKeyser, 2015, etc.). This issue becomes particularly sensitive when it comes to child
L2 acquisition (Abello-Contesse et al., 2006), since earlier stages of acquisition are believed to
be decisive for aspects such as pronunciation, intonation and fluency (Agustı́n-Llach, 2016; Alizadeh, 2011; Paradis, 2007).
Therefore, the objective of this preliminary study is twofold: firstly, to analyze the load of
explicit and implicit teaching nature of activities in EFL textbooks from different
and representative publishers, used in primary school in Spain; secondly, to discern their
cognitive potential.
* Speaker
† Corresponding author: [email protected]
‡ Corresponding author: [email protected]
§ Corresponding author: [email protected]
¶ Corresponding author: [email protected]
The method to analyze and categorize activities involved two basic steps. The first one entailed the creation of a corpus by compiling 100 activities from 10 real EFL textbooks used in the first year of primary school in Spain. The activities were randomly selected from two textbooks from each of the major EFL textbook publishers in Spain (Oxford University Press, Macmillan, Cambridge University Press, Santillana/Richmond, Pearson, Burlington Books, Anaya). Unit and activity selection within each textbook was also random.
Secondly, each individual activity in the corpus was tagged with its explicit and implicit teaching
load.
Data analysis is ongoing, and this study is expected to help shed light on the patterns of activity typology in EFL primary-school textbooks. This will unveil the cognitive potential underlying textbook activities used for child EFL teaching. Derived pedagogical implications will be indicated.
Keywords: Primary school, EFL teaching, textbooks, activities, corpus
Investigating the relationship between L1 and L2 collocation processing in the bilingual mental lexicon from a cross-linguistic perspective
Hakan Cangir * 2,1
2 University of Exeter – Graduate School of Education St Luke's Campus Heavitree Road Exeter Devon EX1 2LU, United Kingdom
1 Ankara University, School of Foreign Languages (AU YDYO) – Ankara Üniversitesi Gölbaşı 50.yıl yerleşkesi Bahçelievler Mahallesi Kaymakamlık arkası 06830 Gölbaşı/ANKARA, Turkey
Many studies have investigated how the bilingual mental lexicon is structured, and various researchers have suggested that both lexicons seem to interact in some way during language production. However, there is disagreement about the interaction between the two mental dictionaries during the lexical activation process; in particular, about the phase of the activation process in which an interaction can be observed. Another related topic scrutinized by many applied linguists is whether the activation of lexis is language specific or language non-specific. The current study assumes the process to be language non-specific and tries to shed light on the cross-linguistic nature of the bilingual mental lexicon with a specific emphasis on collocations, which seem to be an understudied topic. In addition, the research approaches the issue of cross-linguistic lexical priming from a syntagmatic perspective with the help of a typologically different language, Turkish, which previous research appears to lack. It is assumed that frequency, congruence, and typological variety are likely to have an impact on lexical processing, and on collocations in particular.
With this notion in mind, the researcher exploits two representative and balanced corpora,
Corpus of Contemporary American English (COCA) and Turkish National Corpus (TNC) to
develop reliable items to be employed in a cross-linguistic collocational priming experiment and
attempts to observe the response times of English-Turkish bilinguals and investigate the influence of frequency, congruence and typology on collocational processing.
Building on lexical priming theory, which suggests that every word is primed to occur with the particular other words it collocates with, the study refers to the Spreading Activation Model as the underlying theory of lexical activation and examines the cross-linguistic aspect of collocational priming in bilinguals. Furthermore, as the core framework for cross-linguistic collocational priming, the Dual Activation of Collocational Connections Model and the Psycholinguistic Model of Vocabulary Acquisition in L2 are employed due to the two different language acquisition settings reflected in the study, i.e. English as a Second Language (ESL) and English as a Foreign Language (EFL).
The initial results indicated that a strong priming effect seems to exist in Turkish, based on the results of a monolingual priming experiment designed to set the baseline for the main experiment. Furthermore, the findings of the cross-linguistic priming experiment suggested that a priming effect appears to be present for ADJECTIVE+NOUN collocations, but not for VERB+NOUN combinations, which can be regarded as a typology effect on the processing of collocations cross-linguistically. What is more striking is that the direction of the presentation in the priming experiment appears to have the strongest impact on response times. That is, when the prime word was in L1 and the target word was in L2, processing seems to be facilitated and a statistically more significant priming effect can be detected. Last but not least, congruent and more frequent (having a higher P1—2) collocations yielded a more significant cross-linguistic priming effect. The regression analysis revealed that the direction of the presentation and P1—2 are strong predictors of the mean response times of the subjects in the cross-linguistic collocational priming experiment. The results were discussed in the light of the lexical processing models stated above.
* Speaker
Keywords: Collocational Priming, Mental Lexicon, Bilingual, Corpora and Crosslinguistic
Knowledge extraction for TKB phraseology module design
Pilar León-Araúz * 1, Arianne Reimerink *† 1
1 University of Granada (UGR) – Buensuceso, 11 18001, Spain
Certain authors define phraseological units as all word combinations with certain stability
(Hausmann 1984, 1985, 1989; Gläser 1994/95), even in specialized discourse (Roberts 1994/95,
Heid 1994, 2001; Montero 2003, 2008). According to Rundell (2010: vii), collocations are as
important as grammar since they make speakers/writers sound fluent. In specialized domains,
they are perceived by language users to contribute to the domain-specific flavor of special languages (Bartsch 2004). In this line, recent studies have highlighted the importance of verbs,
their collocations and argument structure in specialized terminology (Lorente 2007; Buendı́a
2012, 2013; Buendı́a, Montero and Faber 2014), but there are currently few terminographic
resources that incorporate them (L’Homme 1998; Buendı́a 2012). If terminological knowledge
bases (TKBs) want to be truly helpful for specialized writing, phraseological information should
be added in a consistent and user-friendly way. In EcoLexicon, a TKB on the Environment
(ecolexicon.ugr.es), phraseology was first included at the term level, linking verbs with arguments previously contained in EcoLexicon (Buendı́a 2013). However, certain verbs, or at least
some of the paradigms in which they can be framed, can also be regarded as semantic relations.
In EcoLexicon, knowledge extraction and representation is based on triplets or conceptual propositions (concept-relation-concept combinations; Faber, León and Reimerink 2014). Nevertheless,
the expressivity of some of the relations should be improved. For instance, the relations affects,
has function, or cause could be divided into more specific relations. Conceptual propositions
such as erosion affects landform would be more meaningful if the relation were reduces instead
of affects. However, the TKB should also contain other verbs lexicalizing and specifying the
nuclear meaning of reduction (e.g. carve, degrade, erode, etc.) as well as other terms that can
also fill the slots of these arguments (e.g. weathering, cliff, etc.).
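The idea of refining a generic relation in a conceptual proposition can be illustrated with a small data-structure sketch. Everything here is hypothetical: the `Proposition` class, the `SPECIFIC_RELATIONS` table and the `refine` function are invented for illustration and do not describe how EcoLexicon is implemented. Only the example triple (erosion, affects, landform), the more specific relation reduces and the verbs carve, degrade and erode come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A concept-relation-concept triple."""
    source: str
    relation: str
    target: str


# Hypothetical lookup: generic relation -> verb -> more specific relation.
SPECIFIC_RELATIONS = {
    "affects": {"carve": "reduces", "degrade": "reduces", "erode": "reduces"},
}


def refine(prop: Proposition, verb: str) -> Proposition:
    """Replace a generic relation with a more specific one when the verb
    observed in the corpus licenses it; otherwise return the triple unchanged."""
    specific = SPECIFIC_RELATIONS.get(prop.relation, {}).get(verb)
    if specific is None:
        return prop
    return Proposition(prop.source, specific, prop.target)


generic = Proposition("erosion", "affects", "landform")
print(refine(generic, "erode"))
print(refine(generic, "influence"))
```

Keeping the verb as the key is one way to let the same table serve both purposes mentioned in the abstract: specifying the relation and recording which verbs lexicalize it.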
For a phraseological module to be consistent with the conceptual module in EcoLexicon, it
should be based on the same principles. The design of our module is thus developed from the
categorization of term-verb-term collocates reflecting the di↵erent lexicalizations of conceptual
propositions. Thus, semantic relations can be further specified according to specialized predicates. In turn, phraseological templates can be generalized based on the semantic types related
in conceptual networks. However, these semantic types need to be extracted in a consistent way.
Top-down and bottom-up methods are applied to extract the information needed to build the
module. The first consists of establishing basic semantic categories in the environmental domain
(e.g. landform, structure, instrument, etc.), based on the definitions and conceptual networks
in EcoLexicon. This will result in a domain-specific ontology similar to that of CPA semantic
types, which is used in the Pattern Dictionary of English Verbs (PDEV; Hanks 2008). The
validity of this categorization is tested by comparing it to the results of the automatic clustering
(Brown et al. 1992) of a 50 million word corpus on the Environment. The latter consists of
extracting all verbs from the corpus with TermoStat (Drouin 2003) and classifying them into
different paradigms based on the concepts they relate and the basic conceptual relations they
express. These paradigms will be inspired by the patterns and implicatures of the PDEV and the
lexical domains described in Faber and Mairal (1999). The analysis of verbs and arguments will
contribute to the refinement of our semantic relations and categories as well as to the population
of the phraseological module.
Keywords: phraseology, specialized discourse, TKB, categorization
A contrastive analysis of past-time reference
in French and Chinese: a study based on a
corpus of narratives
Xingzi Zhang (speaker) 1
1 Laboratoire – Université Paris III - Sorbonne nouvelle – France
Contrastive linguistics is regarded as a branch of applied linguistics that compares the
micro-systems of two (or possibly more) languages in order to facilitate their teaching and
learning. It is a classic branch of linguistics.
The origins of contrastive linguistics date back to the 1950s in the United States. Two
works deserve mention: Uriel Weinreich (1953) on languages in contact, and Robert Lado (1957),
which is regarded as the founding work of the discipline.
We adopt this method, drawing on our corpora, to compare how the two languages refer to
past time and to aspect, and to study the temporal organization of narrative.
French uses verbal morphology to express both tense and aspect. Among the past tenses, the
narrative present, the passé composé, the imparfait, the plus-que-parfait and the passé simple
are frequently used.
Chinese is a Sino-Tibetan language that is very distant from French. It lacks verbal morphology
of the kind found in Indo-European languages and is regarded as an aspect language, which uses
aspect particles ("-le", "-zhe", etc.) or structures (RVCs, verb reduplication, etc.) to express
temporality.
Corpus:
We compare written narratives based on a silent film, produced by two groups: native French
speakers (GF, n=8) and native Chinese speakers (GC, n=8). So that they would tell the story in
the past, we told them that the situation shown in the excerpt had taken place a week earlier
and that they were to describe in detail what they had seen.
Results:
Comparing the narratives written by the Chinese and French participants, we observe several
differences in how the past is marked in the two languages:
- In French, native speakers systematically use verbal morphology to mark tense in narration,
whereas in Chinese explicit past-time reference is expressed by temporal adverbs. For aspect,
native Chinese speakers use aspect morphemes such as "-le", "-zhe" and "zai-". Moreover, these
morphemes are optional: many clauses carry none, and most clauses expressing imperfective
aspect in particular have no explicit marking. We also note that in Chinese the situation type
is less flexible than in French and can itself indicate aspect.
- To mark anteriority in the narrative, native French speakers use the plus-que-parfait, the
semi-auxiliary "venir de", the past participle, or the passé composé, which is in fact an
erroneous substitute for the plus-que-parfait. Native Chinese speakers, lacking verbal
morphology, mark anteriority by lexical means such as "ganggang"/"gangcai" (a moment ago), or
use the morpheme "-le" or the structure "shi...de" (it is...who/that), which mark perfective
aspect in indirect style, to refer to a situation that took place earlier. There are also
unmarked clauses, in which case contextual information makes it possible to identify anteriority.
- Native French speakers tend to tell the story sequentially, whereas native Chinese speakers
tell it in a detailed way in which actions, descriptions of characters and explanations of
situations are interwoven.
Keywords: contrastive analysis, verbal morphology, past, perfective aspect, imperfective aspect, anteriority
The acquisition of verbs of change: an
analysis of the interlanguage of learners of
Spanish (L1 Swedish)
Ester Fernández (speaker) 1
1 University of Gothenburg (GU) – Sweden
This paper addresses the acquisition of verbs of change by Swedish-speaking learners of
Spanish as a foreign language (ELE). Spanish has a considerable number of verbs that express
the notion of change (ponerse, volverse, hacerse, convertirse en, etc.). These differ from one
another semantically (Morimoto and Pavón Lucero, 2007), since each one, together with its
complement, expresses a different way in which the change comes about (change of entity,
processual change and resultative processual change).
Swedish has the verb bli, a general verb that can express practically any type of change. How
does acquisition proceed for verbs that do not exist or have no exact equivalent in the
learners' L1? Which linguistic forms do Swedish-speaking learners use to describe change
events in Spanish?
The aim of this paper is to present the results of a pilot study carried out over one
academic semester with a group of Swedish-speaking learners (N=20) at different proficiency
levels (between A2 and B2). The participants were taking the first-year Spanish course
(Grundkurs) at two Swedish universities. We used a written task (narrating a story from a
series of pictures) to elicit language samples referring to change. The task was administered
twice, at the beginning and at the end of the course. It was also completed once by a group of
native Spanish speakers (N=24). We found it difficult to identify obligatory contexts, since
the native speakers tended to vary their choice of verbs when describing change events.
This led us to study the learners' choice of forms from a variationist perspective. This
approach comes from sociolinguistics (Labov 1972) but has proved useful in the study of second
language acquisition (Tarone 1979, 1983, 2007; Ellis 1985, 1999; Gesslin 2010; Gudmestad 2006;
2012). The same factors (linguistic and extralinguistic) that determine variation in native
speech are responsible for the variation found in learners' production.
We apply a meaning-to-form analysis (Bardovi-Harlig 2007; 2014). First we identified the
contexts in which the learners had expressed change and selected all the verbal and lexical
forms used, coding them according to a set of linguistic variables (type of change described,
type of complement with which the form combines, etc.). We then compared their use across the
learners' proficiency levels, the two administrations of the task, and the native speakers.
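The coding and comparison steps could be sketched as follows. This is a minimal illustration, assuming each coded occurrence is stored as a (group, change type, form) tuple; the group labels, the function name `form_distribution` and the sample rows are invented for the example and are not data from the study.

```python
from collections import Counter, defaultdict


def form_distribution(observations):
    """Relative frequency of each form per (group, change type) cell."""
    counts = defaultdict(Counter)
    for group, change_type, form in observations:
        counts[(group, change_type)][form] += 1
    return {
        cell: {form: n / sum(forms.values()) for form, n in forms.items()}
        for cell, forms in counts.items()
    }


# Invented toy rows, for illustration only (not results from the study).
toy_observations = [
    ("learners_start", "processual", "estar"),
    ("learners_start", "processual", "estar"),
    ("learners_start", "processual", "ponerse"),
    ("learners_end", "processual", "ponerse"),
    ("natives", "processual", "ponerse"),
    ("natives", "processual", "volverse"),
]

distribution = form_distribution(toy_observations)
for cell, shares in sorted(distribution.items()):
    print(cell, shares)
```

Comparing shares rather than raw counts keeps groups of different sizes (here 20 learners and 24 native speakers) on the same scale.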
The results show that the learners use a variety of forms to express particular types of
change (change of entity, processual change and resultative processual change). The
pseudo-longitudinal design of the study reveals tendencies in the development of the
grammatical sub-system of verbs of change in the learners' interlanguage. At the beginning of
the semester there is, for example, an overuse of verbs such as ser and estar, which lack the
dynamic aspect characteristic of verbs of change. By the end of the semester these verbs are
being replaced, to a greater or lesser degree, by verbs of change more typical of the target
language.
Keywords: verbs of change, notion of change, Spanish as a foreign language, interlanguage,
variation
Detecting and tagging metadiscourse
strategies in academic articles:
METOOL
María Luisa Carrió-Pastor (speaker) 1
1 Universitat Politècnica de Valencia (UPV) – Spain
This presentation deals with the identification, tagging and comparison of the metadiscourse
strategies used in Spanish and English scientific texts, and with the analysis of how these
strategies vary across the two languages. The research is part of the project "Identificación
y análisis de las estrategias metadiscursivas en artículos científicos en español e inglés
(IAMET)". Within the scientific register we selected three clearly distinct disciplines
(engineering, medicine and linguistics) to determine variation in the use of metadiscourse
strategies. To do so, we draw on the metadiscourse categories identified by Hyland (1998,
2005), Mur Dueñas (2011) and Briz (2001, 2007) to identify the elements that make them up and
establish their frequencies, with a view to contrastive studies across disciplines and between
Spanish and English. Our starting hypothesis is that metadiscourse strategies are used
differently in English and Spanish, which may affect the effectiveness of communication when
these are used as foreign languages. The objectives are, on the one hand, to analyse
metadiscourse strategies in English and Spanish in several disciplines of the scientific
register and, on the other, to detect the variation that appears across these languages and
disciplines. The purpose is therefore twofold: first, to characterize scientific discourse and
the rhetorical strategies it uses to convince the reader and, second, to identify patterns of
variation in the strategies analysed so that they can be used in the teaching of Spanish and
English. This is done with the tool 'METOOL', designed at the Research Institute for
Information and Language Processing (University of Wolverhampton) for tagging and identifying
the rhetorical elements of discourse. The nuances that writers give to a language in order to
persuade the reader are of interest both to academic writers and to teachers of academic
language. Achieving our objectives, that is, identifying and analysing variation in the use of
rhetorical strategies in scientific articles, therefore benefits both researchers and writers
of this genre, who will learn whether they use rhetorical elements appropriately and whether
they achieve their aim of convincing their audience of the importance of their research. By
analysing the corpora and statistically measuring the capacity to engage the reader and
convince them of the arguments put forward, the use of persuasion strategies can be measured
and alternatives proposed. To carry out this project, first, the English and Spanish corpora
will be compiled for the three disciplines; second, the metadiscourse categories will be
identified and tagged; and third, the metadiscourse strategies will be classified and analysed
in both languages and in the three disciplines to determine variation, with examples of each
case to show its nature. Although metadiscourse strategies have been studied from various
angles, there is currently no study that addresses variation in the use of these strategies
and that classifies and contextualizes the elements to be included in the categories.
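The tagging and frequency steps can be pictured with a small dictionary-based sketch. It is not METOOL, whose implementation the abstract does not describe. The category names `hedge` and `booster` follow Hyland's terminology, while the word lists, the function names and the sample sentence are assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; a real inventory would be far larger and would
# need manual checking of each hit in context.
MARKERS = {
    "hedge": ["may", "might", "perhaps", "possibly"],
    "booster": ["clearly", "certainly", "undoubtedly"],
}
LOOKUP = {word: category for category, words in MARKERS.items() for word in words}


def tag_markers(text):
    """Count marker categories and return (counts, number of tokens)."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    counts = Counter(LOOKUP[token] for token in tokens if token in LOOKUP)
    return counts, len(tokens)


def per_thousand(counts, n_tokens):
    """Normalize raw counts to occurrences per 1,000 tokens."""
    return {category: 1000 * n / n_tokens for category, n in counts.items()}


sample = (
    "These results may suggest a trend, and the effect is clearly robust; "
    "it might perhaps generalize."
)
counts, n_tokens = tag_markers(sample)
print(counts, n_tokens)
print(per_thousand(counts, n_tokens))
```

Normalizing to a rate per 1,000 tokens is what makes sub-corpora of different sizes comparable across disciplines and languages. A purely lexical lookup of this kind will also over-count ambiguous forms (for instance "may" as a month), which is one reason contextual validation matters.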
Keywords: metadiscourse, comparative analysis, analyser, academic articles
The economy on the verge of a nervous
breakdown: medical metaphors in
economic press discourse
Ismael Ramos Ruiz (speaker) 1
1 Universidad de Granada – Spain
Metaphor was studied as a literary device until the emergence of Cognitive Linguistics,
when it began to be regarded also as a cognitive device that forms part of our conceptual
system. Metaphor is therefore present both in general language and in specialized language, as
in the case of Economics (Resche and Colin, 2016; Wang, Runtsova, and Chen, 2013). As a
result, the use of metaphor in economic press discourse is documented (e.g. Nerghes et al.,
2015), as is that of medical metaphor in particular (e.g. Arrese, 2015).
Our hypothesis is that if the economy is understood as a living organism, many of the
illnesses that affect human beings will be used in metaphorical mappings, as is the case with
mental and behavioural disorders. Our objectives are therefore to:
• identify and analyse which medical terms related to mental and behavioural disorders
appear in this discourse and what relations hold between these terms and other terms
in the text;
• establish syntactic and semantic classification criteria for categorizing these
metaphorical lexical combinations.
First, we established a theoretical framework based on Conceptual Metaphor Theory (Lakoff and
Johnson, 1980, 1999), which helped us to understand the structure of the metaphors and to
analyse them, and on Frame-Based Terminology (Faber et al., 2012), which we used to establish
the syntactic and semantic criteria for categorizing the metaphors.
Second, we built a special-purpose corpus of economic news texts from the Spanish press, drawn
both from specialist economic newspapers (e.g. El Economista) and from the economics sections
of the national dailies El País and El Mundo. To select the texts containing metaphors, we
used an adaptation of the Metaphor Identification Procedure proposed by the Pragglejaz Group
(2007).
Third, after analysing the corpus and obtaining the concordance lines containing metaphors, we
established both syntactic criteria (adapting the proposal of Corpas Pastor, 1996) and
semantic criteria (based on a prototypical conceptual event in which semantic categories are
defined) to classify these metaphorical lexical combinations. Besides establishing semantic
categories, the conceptual events show the semantic relations between the categories, such as
"causes" or "affects", and the mapping of the medical domain onto the economic domain,
applying Conceptual Metaphor Theory.
Below are some examples from the press that contain metaphors, together with their syntactic
and semantic categorization (the pattern labels keep the Spanish initials: S = sustantivo,
noun; A = adjetivo, adjective; P = preposición, preposition):
• Estamos ante un nuevo brote psicótico de los mercados (El Mundo 2012) [We are facing a new psychotic episode in the markets]
Noun + Adjective + Preposition + Noun (SAPS).
PROCESS
• El problema radica en la incapacidad y pánico de nuestra economía (Cinco Días 2009) [The problem lies in the inability and panic of our economy]
Noun + Preposition + Noun (SPS).
SIGNS AND SYMPTOMS
• El estrés de los bancos griegos e italianos (Expansión 2014) [The stress of Greek and Italian banks]
Noun + Adjective (SA).
PATIENT
Keywords: conceptual metaphor, corpus linguistics, phraseology, economic journalism, conceptual
events
Putting numerical data into discourse
in popular science texts
Riham El Khamissy (speaker) 1
1 Département de français, Faculté des langues (AL ALSUN), Université Ain Chams, Le Caire –
Département de français, Faculté des Langues (AL ALSUN), Université Ain Chams, Rue Khalifa
Maamoun, Abbaseya, Le Caire, Egypt
Numerical data have the advantage of producing in the addressee an impression of the
incontestable, the irrefutable. In the media, journalists may report a statistic in such a way
that it becomes the central element of the article (information framed through figures). In
that case, the explanation of the figures is secondary information. Most often, however,
statistics and percentages serve to support the text itself, to back up statements and to lend
legitimacy to information and ideas.
Our study aims to understand how journalists handle numerical information in popular
science articles (in popular science media and the general press), in particular those dealing
with the Zika virus, which has been the subject of much debate over the past two years. We
chose popular science texts rather than scientific texts because one of the main aims of our
work is to bring out the intention to steer the addressee towards a given attitude, and
sometimes even to manipulate them, a phenomenon which, in our view, is more apparent in
popular science texts aimed at a general public that is usually not well informed. We started
from a corpus of 13,090 French-language documents indexed by the Europresse.com database
between 1 January 2015 and 31 December 2016, a period in which the virus spread remarkably
worldwide.
We first explore formal features, examining the choice between the typical, classic form
of the number (in digits) and its transcription (in words). We then analyse the figures in
their immediate linguistic environment (the co-text), which can modify the information
conveyed by the figure in terms of exactness, precision and/or argumentative orientation,
depending on the journalist's communicative motivation. We then turn to the analysis of
quantifiers (jusqu’à, près de, aux environs de, autour de, aux alentours de, etc.). Our
contribution follows in the line of the work of Adler and Asnès (2004, 2007, 2013) and that of
Ducrot (1983, 1995, 2002), further developed by Doury and Moirand (2004). The question we
address in this contribution is not the use of figures as such but how they are put into
discourse and made to serve journalists' aims of influencing public opinion.
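A first pass over the co-text of figures could be automated along the following lines. This is only a sketch: the quantifier list repeats the expressions named in the abstract, whereas the regular expression, the function name `find_figures` and the sample sentence are assumptions made for the example and do not describe the author's procedure.

```python
import re

# Approximators named in the abstract (ASCII apostrophe used internally).
QUANTIFIERS = ["jusqu'à", "près de", "aux environs de", "autour de", "aux alentours de"]

_ALTERNATIVES = "|".join(
    re.escape(q) for q in sorted(QUANTIFIERS, key=len, reverse=True)
)
# A number is a run of digits, optionally grouped with spaces, dots or commas.
FIGURE = re.compile(
    rf"(?:(?P<quantifier>{_ALTERNATIVES})\s+)?(?P<number>\d+(?:[ .,]\d+)*)",
    re.IGNORECASE,
)


def find_figures(text):
    """Return (quantifier or None, number) pairs in order of appearance."""
    text = text.replace("\u2019", "'")  # normalize typographic apostrophes
    pairs = []
    for match in FIGURE.finditer(text):
        quantifier = match.group("quantifier")
        pairs.append(
            (quantifier.lower() if quantifier else None, match.group("number"))
        )
    return pairs


# Invented sentence, for illustration only.
sample = "Près de 4 000 cas ont été recensés, jusqu’à 20 % des patients, contre 150 en 2015."
figures = find_figures(sample)
print(figures)
```

The sketch only captures figures written in digits and a quantifier placed immediately before them; numbers written out in words, and years that are not quantities (such as 2015 here), would need separate handling.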
Result: according to our analyses, the gap between the factual or informative level on the one
hand and the argumentative level on the other is often merely a reflection of the shift from
official numerical results, witnesses to the truth, to subjective ersatz versions of reality.
Keywords: figures, quantifiers, argumentative operators, popular science texts, press
Modality in political discourse:
phraseological segments in language and in
discourse. A textometric exploration of a
corpus of US presidential debates
(1960-2016)
Marion Bendinelli (speaker) 1
1 Edition, Littératures, Langages, Informatique, Arts, Didactique, Discours (ELLIADD) – Université de
Franche-Comté – 30 rue Mégevand, 25030 Besançon cedex, France
Our paper deals with the identification, followed by the enunciative and discursive
analysis, of phraseological segments that include one or more verbal markers of modality
(notably can, must, will, need to, have to). The work is based on the tool-assisted exploration
of a corpus, encoded in XML-TEI, of English-language political discourse comprising all the
presidential debates held in the United States since 1960. The exploration is carried out with
the textual data analysis software TXM (Heiden, Magué, Pincemin 2010) and Hyperbase (Brunet
2010), using in particular the modules for consulting and/or computing concordances, repeated
segments and co-occurrents. This exploration will bring out the preferred associations (i)
between different modality markers or (ii) between modality markers, subject noun phrases
(nominal groups or pronouns) and verbs or, more broadly, semantic verb classes (verbs of
communication, existence, activity, etc., following the classification of Biber, Johansson,
Leech, Conrad and Finegan 1999). Some of these associations have been noted in studies
describing discourse genres (Dedaić 2004; Née, Sitri, Veniard 2014), specialized texts (Gotti
and Dossena 2001; Labbé and Labbé 2013) or English grammar (Biber et al. 1999); here,
established on the basis of statistically significant co-frequency within the corpus, they will
be analysed as phraseological segments, namely collocations (Firth 1957) and colligations (Hoey
2005), of English as spoken in the United States and of political discourse.
In the first part of the study, we will show, through various operations in TXM and
Hyperbase, how the textometric approach brings to light phraseological segments such as we
must + non-aspectual action verb ("we must act") or mental verb + NP + can ("I believe that we
can work together") in the case of the modals must and can. Computing co-occurrents will reveal
phraseological segments that are discontinuous (the items are not necessarily adjacent) and
ordered (the order in which the items appear is constrained), such as can + must and/or have
to + will ("we can fight terrorism [...], it has to be [...], therefore we must fight terrorism
and we will"). Comparing these segments with data from the reference corpus COCA (Corpus of
Contemporary American English), compiled by Mark Davies and freely searchable online, we will
show that some are specific to political discourse, while others, more transversal because
they are used in different discourse genres, appear to be more firmly rooted in the language
system. Some theoretical elements from the enunciative analysis developed by Antoine Culioli
and taken up by Gilbert (2001) or Deschamps (2001), namely the notions of construction and of
scanning (parcours) of notional alterity, will also shed light on the enunciative functioning
of the modal sequences and their rhetorical function.
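The idea of a discontinuous but ordered segment can be made concrete with a short sketch. It does not reproduce the co-occurrence calculations of TXM or Hyperbase; it only checks whether modal markers appear in a given order within a stretch of text. The names `MODAL_FORMS`, `modal_sequence` and `contains_ordered` are hypothetical, and the sample is the example quoted above with its elisions removed.

```python
import re

# Surface forms mapped to the marker they realize (assumed, minimal list).
MODAL_FORMS = {
    "can": "can",
    "must": "must",
    "will": "will",
    "have to": "have to",
    "has to": "have to",
    "need to": "need to",
    "needs to": "need to",
}
_MODAL = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, MODAL_FORMS), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def modal_sequence(text):
    """List the modal markers of a text in their order of appearance."""
    return [MODAL_FORMS[m.group(1).lower()] for m in _MODAL.finditer(text)]


def contains_ordered(sequence, steps):
    """True if the sequence contains one item from each step, in order.

    Items need not be adjacent, which models a discontinuous segment.
    """
    remaining = iter(sequence)
    return all(any(item in step for item in remaining) for step in steps)


sample = (
    "we can fight terrorism, it has to be, "
    "therefore we must fight terrorism and we will"
)
sequence = modal_sequence(sample)
print(sequence)
print(contains_ordered(sequence, [{"can"}, {"must", "have to"}, {"will"}]))
```

Matching on surface forms will also pick up non-modal uses (for example "will" as a noun), so hits would still need to be checked against part-of-speech annotation or read in context.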
Along the way, this paper will combine a computer-assisted approach to a corpus,
statistical analysis of textual data, and enunciative and discursive analyses; it thus aims to
contribute to a better understanding of the linguistic and discursive characteristics of
political discourse.
Keywords: phraseological segments, collocation, colligation, political discourse, presidential
debates, United States
Translating English "megaterms" such as
erythrocyte invasion-inhibitory
response: a corpus-based and
discourse-analysis approach
Mojca Pecman (speaker, corresponding author) 1, Natalie Kubler (speaker) 1, Alexandra Mestivier (speaker) 1
1 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus
(CLILLAC-ARP) – Université Paris VII - Paris Diderot : EA3967 – Université Paris Diderot Bât.
Olympe de Gouges case postale 7046 75205 Paris cedex 13, France
Corpus linguistics has enabled linguists not only to base their observations on authentic
data but also to study how language evolves and its current trends. In specialized
translation, both in professional practice and in training that prepares future translators
for that environment, the ability to grasp the current dynamics of languages for specific
purposes is becoming a major factor in translation quality. Combined with the scale on which
specialized information is disseminated and the speed at which knowledge evolves, this
dynamic, which is plainly visible in linguists' corpora, appears to be growing. This study
aims to show how combining corpus analysis with discourse analysis makes it possible to
capture the dynamics of specialized discourse and to find translation solutions. We illustrate
our point with the translation problems posed by complex noun phrases in specialized English
such as erythrocyte invasion-inhibitory response. Complex noun phrases make it possible to
compact or condense information, a salient feature of English specialized discourse. The
diachronic study of English compound adjectives by Mestivier-Volanschi (2015) provides
evidence that these structures are becoming more frequent. Gledhill (1999) and Jaime-Sisó
(1993) study the shift in the titles of specialized texts from a nominal format to a
sentence-like format in which complex compounds express an argument structure economically,
through a mechanism they call "miniaturisation". The work of Maniez (2007, 2008) on English
medical language and complex noun phrases also discusses the propensity of English for
nominalization and how helpful a database of equivalents for complex NPs would be for
translators. Indeed, the great flexibility of English in forming noun groups and noun phrases
contrasts with French, which is more inclined to preserve argumentation in sentence form. We
first present the general framework of this research, which is part of the methodology for
teaching specialized translation to Master's students at Université Paris Diderot. This
methodology is based on terminological analysis (Pecman and Kübler 2011) and gives rise to
evaluations using quantitative and qualitative analyses of corpora of annotated translations
(Kübler et al. 2016). These analyses make it possible to improve the teaching methodology
incrementally from year to year. We show how this methodology combines teaching practice with
research in specialized translation, placing our study in the line of work on corpus-based
translation teaching (Aston 1999, Zanettin et al. 2004, Beeby et al. 2009, Castagnoli et al.
2011) and on evaluating the contribution of corpora in the classroom (Bowker & Bennison 2003,
Frankenberg-Garcia 2009, Loock et al. 2013, Loock 2016). We also illustrate the diachronic
evolution of adjectival compounds revealed by Mestivier-Volanschi (2015) to show the need to
take into account the tendency of specialized English to resort to complex NPs. Second, we
present the analysis of the English noun phrase erythrocyte invasion-inhibitory response and
attempt to show the devices used to convey this type of information in French (cf. les
réponses immunes protectrices... médiées par des anticorps... inhibent l’invasion des
érythrocytes).
Keywords: specialised translation, translation teaching, corpus-based approach, discourse analysis,
complex nominal groups
Advertising translation: a corpus-based
approach
Isabel Comitre Narvaez (speaker) 1
1 Université de Málaga (UMA) – Université de Málaga – Campus de Teatinos s/n - 29071 Málaga,
Spain
If we look closely at advertisements for certain products, we notice the massive presence
of a technical, even pseudo-scientific, vocabulary (Remaury, 2000). Major brands use this
pseudo-scientific vocabulary as a key persuasive argument to gain credibility. Indeed, medical
rigour and scientific authority act as a guarantee of purchase for the prospective consumer
(Valdés Rodriguez, 2004). This is the case of the lexicon of so-called cosmeceutical products
(cosmetic + pharmaceutical), which reflects at once the evolution of a medico-aesthetic
society, technological progress in the field and scientific innovation in this sector. Within
the European Union, the question of translation goes beyond simple lexical equivalence, since
it also involves the legislation of each country. Translation nevertheless lies at the heart
of our study, whose main aim is to identify the principal translation strategies used by
translators in advertising. To this end, we analysed a corpus of bilingual advertisements
compiled according to the criteria proposed by Guidère (2009, 2011). Our bilingual comparable
corpus contains about 750 terms in French and their Spanish equivalents. This ad hoc corpus
was drawn from the official websites of major cosmeceutical brands. We identified this lexicon
by noting on the official websites the various processes that give the products this
pseudo-scientific air (prefixal and suffixal derivation, borrowing, compounding, abbreviation,
acronymy, alphabetic or numeric initialisms, confixation, blends, use of capital letters,
etc.). After this first step, we compared the vocabulary identified with that of the same
websites in Spanish in order to bring to light the translation strategies used. In a form of
communication such as advertising, where the visual coexists with the verbal, we naturally
took into account the images in the advertisements, since they contribute to creating the
overall meaning of the advertisement and may even carry that meaning on their own. This is why
we chose semiotranslation studies (sémiotraductologie; Guidère, 2000, 2009, 2011; Guillaume,
2016) as our theoretical and methodological framework, since this translation-studies paradigm
takes account of the importance of non-verbal signs (images, characters, setting, emotions,
sensations) in the transfer of meaning in translation. In particular, we adapted the concept
of the "translatological cube" (cube traductologique; Guidère, 2011, p. 112) to our object of
study. This model of analysis served as our starting point and allowed us to define three
levels of analysis specific to advertising: conceptions (the general ideas of the
advertisement conveyed by the linguistic message); perceptions (sensory information conveyed
by the iconic and sound messages); and, finally, intentions (cultural and ideological
discursive implicatures). The resulting model allows us, on the one hand, to identify and
classify the specific pseudo-scientific lexicon characteristic of cosmeceuticals and carried
by the verbal message and, on the other, to grasp the meaning conveyed by the image and all
the sensory information carried by the non-verbal message in the advertisements of our corpus,
with the aim of uncovering the translation strategies underlying the choices made by
translators of advertising campaigns.
Keywords: advertising translation, bilingual comparable corpus
The lexis-grammar continuum in a specialized
genre, based on home-made corpora
Laurent Gautier (speaker) 2,1, Cyril Nguyen Van (speaker) 2
1 Maison des Sciences de l’Homme de Dijon USR3516 (MSH Dijon) – Université Bourgogne Franche
Comté – Esplanade Erasme, 21000 Dijon, France
2 Centre Interlangues Texte Image Langage (TIL) – Université Bourgogne Franche Comté – Université
de Bourgogne-Faculté de Langues et Communication 2 Bd Gabriel 21000 Dijon, France
[Research question and objectives] This proposal, which falls under strand 5 of the call,
"Corpora, contrastive studies and translation", examines what home-made specialized corpora
(Loock 2016a, b) can contribute, for professional translation and translator training, to
uncovering the lexico-grammatical patterns inherent in highly constrained textual moulds
(Gautier 2009) in translated language(s). In particular, following Kübler/Gledhill (2016: 75),
we discuss the idea that the systematic querying of homogeneous corpora leads to a verified
holistic representation of the interactions between lexis and grammar, especially when each of
the two components is realized through (very) small repertoires compared with the
possibilities offered by the linguistic system in question. These patterns can indeed
constitute for the translator a "subtext" from which translation choices will be made
"naturally", at the interface between the conceptual content of the text to be translated and
its wording and textualization.
Data
This question will be explored through a closed, manually compiled corpus made up of the
European Central Bank press conferences of 2015 and 2016, in their original English version
(19,883 words) and in their French (23,931 words), German (19,810 words) and Dutch
(21,324 words) translations. Beyond its at first sight parallel nature (Teubert
1996), each subcorpus will be considered in its own right, as a corpus of translated language,
with comparison to the original playing only a peripheral role.
Methodology
We will first start from the frequency of the noun terms (N) and systematically examine their
combinatorics, verbal ones in particular, in order to draw up a systematic inventory, for each
language, of the argument structures in which they occur. In doing so, the formulaic dimension,
indispensable to the translator for the fluency of the text, will be foregrounded, particularly for
those languages, German and Dutch first among them, that rely on predicative nouns (N)
combined with preferred, unpredictable support verbs (V):
(01) Insbesondere müssen die entschlossene UmsetzungNPRED von [Güter- und Arbeitsmarktreformen]ARG sowie die BemühungenNPRED [zur Verbesserung des Geschäftsumfelds
für Unternehmen]ARG in einigen Ländern intensiviertVSUP werden.
(02) Ten tweede was, hoewel de tussen juni en september vorig jaar genomen monetairbeleidsmaatregelen tot een aanzienlijke verbeteringNPRED [in termen van de koersen op de financiële
markten]ARG hebben geleidVSUP, dit niet het geval voor de kwantitatieve uitkomsten.
We will then turn, on the basis of an n-gram analysis, to recurrent structures, analysed
here as discursive routines, whose use, beyond terminology and conceptual collocations,
guarantees that the text belongs to the genre, as in (03):
(03) D: nach wie vor, mit Blick auf; F: au cours des prochains mois, (x) des prix à moyen
terme; NL: (van) de additionele aankopen van, op de middellange termijn
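The n-gram step described above can be illustrated with a minimal Python sketch. It is not the authors' tooling, which the abstract does not specify: the function name `frequent_ngrams`, the naive whitespace tokenisation and the two toy German sentences are all assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(sentences, n, min_count=2):
    """Count word n-grams within each sentence and keep the recurrent ones.

    Hypothetical helper: tokenisation is naive lower-cased whitespace
    splitting, which a real study would replace with a proper tokenizer.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window of length n over the tokens of one sentence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy input, invented for this sketch (not taken from the ECB corpus).
toy_sentences = [
    "die Risiken bleiben nach wie vor abwärtsgerichtet",
    "nach wie vor ist die Kreditdynamik verhalten",
]
routines = frequent_ngrams(toy_sentences, n=3)
```

On this toy input the only trigram that recurs is `nach wie vor`, one of the German routines cited in (03). Counting within sentence boundaries is a design choice of the sketch, so that n-grams never straddle two sentences.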
Discussion
The results will be discussed, first, with regard to the integration of corpora, home-made ones
in particular, into translator training, beyond their "hidden" presence in many CAT tools,
starting with translation memories; and second, with regard to the often systematic
compartmentalisation into a grammar module, a terminology module and a "stylistic" module,
which, for (highly) constrained specialised text types, falls apart
as soon as one starts from language in use as attested in corpora.
Keywords: home-made corpus, genre, lexis, grammar, discursive routine, terminology, LSP
Le marqueur discursif "donc" dans deux
corpus dialogaux de différente nature
Gemma Delgar Farrés *1
1 Université de Vic-Université Centrale de Catalogne (UVic-UCC) – C. de la Laura, 13 08500-VIC (Barcelone), Spain
Our study analyses the discourse marker donc in a corpus of real conversation,
the Minnesota Corpus (Kerr, 1983), and in a corpus of theatrical dialogue, Beaumarchais's play Le
Mariage de Figaro. As a starting point, we ask the following research questions:
Are the uses of donc that appear in the two corpora the same?
How are these uses distributed in the corpus of natural conversation and in that
of theatrical dialogue? Previous linguistic studies of donc indicate that this discourse
marker can have three main uses: argumentative or logical marker, resumptive marker
and interactive marker (Trésor de la langue française, 1971-1994; Zenone, 1981; Hybertie, 1996;
Hansen, 1997; Pellet, 2005; Bolly and Degand, 2009; Delgar, 2010, 2013). Reviewing these
approaches leads us naturally to the description of donc given by Pellet:
In other words, the inferential aspect of donc may be viewed as a characteristic which is present
to varying degrees depending on the function that the discourse marker fulfills in a particular
context. The highest degree of ”inferentiality” is of course associated with the use of donc to
mark results and conclusions (argumentative). It is also high with donc to mark recapitulations,
confirmation requests, and resumptions. It seems ”less high” with the frameshift function (foregrounding) and with the discursive (emphasis) function. (2005 : 103)
First, we studied the occurrences of donc in the first two sections of the
Minnesota Corpus; second, we compared the results obtained
with those we had established for Le Mariage de Figaro. These data show
that the uses and the semantic-pragmatic values of donc are almost the same in
the two corpora, although some values do not appear in one of them, either because
they are more restricted uses of the marker in dialogue situations, or because they are
more characteristic either of authentic conversation or of theatrical dialogue. By
contrast, the distribution of these uses within the corpora differs: in the corpus
of authentic conversation it stems from the workings of real communication, whereas
in the theatrical corpus it depends on the workings of dialogue as a writing project
predetermined by the author.
Keywords: semantic and pragmatic values, discourse marker, conversation, theatre, corpus
Learner vs. professional translational
behavior: The case of discourse markers
Maria Kunilovskaya *†1, Natalia Morgoun 2
1 Tyumen State University (Utmn) – 625003, Volodarskogo 6, Tyumen, Russia
2 Lomonosov Moscow State University - MSU (RUSSIA) – 119991, Moscow, GSP-1, 1 Leninskiye Gory, Russia
The major motivation behind this research is to understand the linguistic behavior of translation
students in their mother tongue during translation. Which linguistic features (if any) make
their translations distinct from professional ones, and can these features be measured and targeted in educational programmes? Another concern is describing the existing professional norm against
a non-translated reference for a given direction of translation in a given language pair today. This
investigation is limited to mass-media texts and explores connective frequencies in English originals and in Russian translations and non-translations as one possible indicator of these differences.
Levels of explicit text connectedness have been on the research agenda in computational and corpus linguistics for many years. Connectedness is an important textual feature that reflects
peculiarities of text production under different socio-pragmatic conditions. It has been found
that genres and entire languages vary not only in the inventory of means used to signal relations between parts of a text, but also in the intensity of their use (Liu, 2008; Fabricius-Hansen,
2005). Cross-linguistic differences in textual strategies affect translations and contribute to the
source-language-independent translationese hypothesized by Baker (1993). This has been
used to effectively detect differences between parallel corpora that general similarity measures miss (Cartoni, 2011).
Discourse marker frequencies are used to establish differences between translations and non-translations and are interpreted as a linguistic indicator of several tendencies in translation, such
as explicitation, simplification and convergence (Olohan, 2001; Chen, 2006; Denturk, 2012). It
is important for this research that the intensity of ‘being a translation’ can be related to translation quality (Scarpa, 2006) and to translational norms operating within a particular direction of
translation and a particular language pair (Mauranen, 2004).
We set out to reveal tendencies in translational behaviour at different competence levels by
describing the frequency distributions of two functional types of discourse markers (connectives
and epistemic commentary markers) in learner and professional translations against sources and
non-translations. We compare data from a parallel translational learner corpus and a corpus of
professional translations to customized selections from English and Russian national corpora.
The total size of the research corpus amounts to 10 million tokens. Using independent predefined
lists of targeted items for each language, we explore cross-linguistic differences and their influence on the two types of translation. We test three possible tendencies: translations follow the
source-language pattern (interference); translations follow the target-language pattern (normalization); or translations demonstrate independent, idiosyncratic (over)use of connectives (explicitation). The observations are made with regard to the overall frequencies of the list items, their
semantic groups and individual frequencies. The latter approach reveals translationally distinctive connectives (Chen, 2006), i.e. items whose frequencies in translations differ statistically from those in originals. Manual analysis of parallel aligned data is used to verify the inferences from
statistical analysis and provides insights into typical errors which lead to a significant decrease
in the textual quality of learner translations.
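As a rough illustration of how the frequency of a single connective might be compared across two subcorpora, the sketch below normalises counts per million tokens and computes a two-corpus log-likelihood score (G2). The abstract does not say which statistical test was used, so the choice of G2, the function names and all counts are assumptions made for illustration only.

```python
import math


def per_million(count, corpus_size):
    """Relative frequency of an item, normalised per million tokens."""
    return count / corpus_size * 1_000_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    Assumption: the abstract does not name its test; G2 is shown
    only as one common choice for corpus frequency comparison.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # 0 * ln(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Invented counts for one connective in translations vs. non-translations.
g2 = log_likelihood(count_a=150, size_a=1_000_000, count_b=100, size_b=1_000_000)
is_distinctive = g2 > 3.84  # chi-square critical value, 1 d.f., p < 0.05
```

With these invented counts the connective would be flagged as distinctive at the 0.05 level. A real study would also need to correct for testing many items at once.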
Keywords: translational learner corpora, discourse markers, interference, frequency distribution,
text-level linguistics, cohesion, translation studies, TQA
Les appositions nominales en français et en
slovène : étude contrastive sur le corpus
FraSloK
Adriana Mezeg *†1
1 Faculté des Lettres, Département de traduction – Askerceva 2, 1000 Ljubljana, Slovenia
This paper addresses a grammatical phenomenon that, following
Combettes (1998), we call nominal appositions, one type of detached construction whose
main properties are: freedom of position in the sentence, separation from the rest
of the sentence by a comma, secondary predication, and a relation of coreference with the subject
of the sentence (Combettes 1998). It is a noun phrase that is never preceded by a
determiner and that establishes with the main subject a relation involving the verb être ('to be'), for example:
Chef du gouvernement provisoire de la République française, il a signé à Moscou, le 10 décembre
1944, un "traité d’alliance et d’assistance mutuelle", qu’il qualifie de "belle et bonne alliance".
(Le Monde diplomatique, avril 2008) This paper sets out to analyse only
the Slovene translations of French nominal appositions placed at the head of the sentence, these being the most interesting from a contrastive point of view. The nominal apposition proves problematic
from a French-Slovene contrastive perspective and cannot be rendered in Slovene by the same
structure, that is, a detached construction, because it does not satisfy the criterion of mobility
within the sentence: it cannot, for example, occupy the initial position. We therefore hypothesise
that grammatical explicitation is the rule when these sentence forms are translated into
Slovene, translators having to replace them with other structures. The contrastive analysis will be
based on examples extracted semi-automatically from the French-Slovene parallel corpus FraSloK,
which contains press articles (Le Monde diplomatique, journalistic subcorpus) and
literary works (literary subcorpus) published between 1995 and 2008. The two subcorpora
are morphosyntactically annotated and balanced in size, together containing
just under 2.5 million words. The examples of initial nominal detached constructions
will be extracted from the French-Slovene corpus with the ParaConc software (Barlow 1995) using
syntactic patterns composed of morphosyntactic tags and regular expressions.
According to the results of automatic retrieval and manual sorting, nominal appositions are
somewhat more frequent in the journalistic corpus (178 occurrences against 122 in the literary
corpus). Often longer than the main clause, they provide, especially in
journalistic discourse, information on the position and social status of the referent of the
main clause. This study aims to examine how Slovene translators deal with
these problematic structures and to draw practical conclusions that are useful in
teaching and in French-Slovene interlingual mediation. The first results
show that the content of French nominal appositions is often expressed in Slovene as
the subject of the sentence, a subject complement, an object complement or a linked construction
(construction liée; Combettes 1998), which is, moreover, frequent in Slovene. Translating French
nominal appositions into Slovene raises further problems that we observe in
teaching, during translation classes, notably questions of word order, of
changing position within a sentence and of comma use, which we
will try to clarify in the proposed paper.
† Corresponding author: adriana.mezeg@ff.uni-lj.si
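The idea of a syntactic pattern built from morphosyntactic tags and regular expressions can be sketched in Python as follows. This is not ParaConc query syntax and does not use the FraSloK tagset: the `word/TAG` format, the tag names and the helper `find_initial_apposition` are hypothetical, chosen only to show the principle of matching a sentence-initial noun group without a determiner, followed by a comma.

```python
import re

# Assumed format: one sentence per string, tokens written as word/TAG.
# The tag names (DET, NOUN, ADP, ADJ, PUNCT, PRON) are illustrative only.
INITIAL_APPOSITION = re.compile(
    r"^(?P<apposition>\S+/NOUN(?: \S+/(?:ADP|DET|NOUN|ADJ))*)"  # bare noun first
    r" ,/PUNCT"                                                   # detaching comma
    r" \S+/(?:PRON|DET|NOUN)"                                     # start of main clause
)


def find_initial_apposition(tagged_sentence):
    """Return the words of a sentence-initial determinerless noun group, or None."""
    match = INITIAL_APPOSITION.match(tagged_sentence)
    if match is None:
        return None
    tokens = match.group("apposition").split(" ")
    # Strip the tag from each token, keeping only the word form.
    return " ".join(tok.rsplit("/", 1)[0] for tok in tokens)


tagged = ("Chef/NOUN du/ADP gouvernement/NOUN provisoire/ADJ ,/PUNCT "
          "il/PRON a/AUX signé/VERB un/DET traité/NOUN ./PUNCT")
apposition = find_initial_apposition(tagged)
```

A pattern of this kind over-generates (for instance on vocatives or enumerations), which is consistent with the manual sorting step that the abstract mentions after automatic retrieval.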
Keywords: nominal apposition, detached construction, FraSloK parallel corpus, contrastive analysis, translation
Les constructions verbales en comme : de
l’écrit scientifique à l’écrit académique des
étudiants natifs/non-natifs
Marie-Paule Jacques *1,2, Rui Yan *†1
1 LInguistique et DIdactique des Langues Étrangères et Maternelles (LIDILEM) – Université Grenoble Alpes – UFR des Sciences du Langage - BP 25 - 38040 Grenoble cedex 9, France
2 École supérieure du professorat et de l’éducation - Grenoble (ESPE Grenoble) – ESPE Académie de Grenoble, Université Grenoble Alpes – 30, avenue Marcelin Berthelot - 38100 Grenoble, France
Scientific writing makes abundant use of a specialised phraseology (Tutin, 2014),
which takes different forms: collocations (Grossmann & Tutin, 2003), recurrent
sequences (Tran, 2014), routines (Tutin & Kraif, 2016), and so on. This phraseology fulfils varied rhetorical and discursive functions, for example expressing a point of view, establishing cause
and effect, signalling a scientific lineage, defining terms and concepts, providing
evidence, etc. Mastering it is therefore as important as mastering the terminology and
conceptual apparatus of the discipline.
We will focus on the verbal construction associated with comme, which a study of a
corpus of research articles in the humanities and social sciences (SHS) shows often introduces "meta-enunciative comparatives" (Debaisieux & Martin, 2010, p. 321, cited by Grossmann, 2014, p. 764): comme
nous l’avons montré/vu/souligné/dit, comme nous le verrons, comme nous l’expliquons, comme
illustré/indiqué dans la figure, etc. These few examples highlight the contribution
of this construction to scientific argumentation: it fulfils "a metatextual
and/or evidential function" (Grossmann, 2014), through the massive presence, after comme, of verbs
of observation (constater, voir) or of communication (dire, expliquer, souligner, montrer, indiquer).
The construction then serves to point to a textual element or a (fragment of)
discourse that acts as evidence or as a reminder.
We take the perspective of its acquisition by novice writers and plan to study the use of this construction by comparing the writing of
native and non-native students with texts by researchers, regarded here as experts in scientific writing. In line with work on phraseological phenomena in native and
non-native writing (Hyland & Milton, 1997; Neff, Ballesteros, Dafouz, Martínez, & Rica, 2004;
Granger & Paquot, 2009), we consider that being a novice in scientific
writing confronts native and non-native students alike with the difficulties of using
scientific phraseology. However, as Granger and Paquot (2009) point out, the
difficulties of non-native students deserve to be taken into account and addressed specifically,
since these students also have problems related to their command of the language.
We will therefore examine the use of verbal constructions associated with comme by
native and non-native students, drawing on two corpora of master's theses and
contrasting them with a corpus of research articles in SHS. The first observations reveal both quantitative and qualitative differences: 1) Compared with experts, both groups
underuse these constructions. 2) The students' uses
differ from those of the experts, notably in the verbs associated with
comme constructions. 3) Non-native students make lexical errors in these constructions.
Keywords: verbal construction, scientific writing, native/non-native students, corpus
linguistics
Meeting the reader in academic writing:
reader pronouns in English and French.
Curry Niall *1
1 University of Limerick [IRLANDE] (UL) – University of Limerick, Limerick, Ireland
Research on corpus-based contrastive analysis is notably experiencing a rebirth in interest
due to its role in a world of increasing ‘interlingual and intercultural communication’ (Granger
2003, p.18). This rebirth is largely influenced by advances in corpus linguistics over the last
30 years, in which corpus-based contrastive analyses of academic writing have come to occupy a
small but growing space in the literature. Much of this growth is likely due to the fact that
non-native speaking academic writers need to be informed of the writing conventions of the
academic discourse communities to which they aspire (Pérez-Llantada 2010, p.45). This has led
researchers on academic writing to pursue three streams of research (Biber 2006, p.6) that can
better inform language teaching, i.e. the study of context and text, the study of interpersonal
communication and the study of lexico-grammatical items. Although these streams are arguably
interconnected, there is a surprising lack of research on interpersonal communication in academic
writing that compares evaluative markers across languages. In other words, there is a need for
research on rhetorical devices, such as directives, personal asides, shared knowledge, questions
and reader pronouns (Hyland 2005), that authors use to engage readers in academic writing. This
research aims to address this gap in the context of reader pronouns in English and French
academic writing. In this paper, we consider reader pronouns in the economics research article
in English and French and in so doing, aim to analyse their varying role in the research article
as engagement markers. We focus on the functions of these pronouns as a comparable common
ground or tertium comparationis in English and French, and test their equivalence, following
Krzeszowski (1990), in terms of form, location and word class. To do this, we present a corpus-based contrastive analysis of economics research articles in English and French, taken from the
KIAP corpus (Fløttum et al. 2006), a comparable corpus that contains 450 research
articles: 150 each in English, French and Norwegian, with 50 per language in each of the economics,
linguistics and medicine disciplines. This research centres on the English and French economics
subcorpora totalling 100 research articles. Reader pronouns are identified in each sub-corpus
and their functions are categorised based on a synthesis of research by Hyland (2001; 2005)
and Fløttum et al. (2006) in terms of their work on addressee features and reader pronouns.
These reader pronouns are then analysed in terms of their formal typology, their location within
the text, and their morpho-syntactic properties with a view to measuring equivalence. The results
of this study reveal some important similarities and differences at the level of function, form,
location and morpho-syntax which are investigated both quantitatively and qualitatively. Such
findings allow us to add to the debate on the nature of English and French academic writing
as writer- and reader-responsible languages, respectively, and can have useful implications in
informing the teaching of academic writing in both English for academic purposes and français
langue académique.
Keywords: corpus-based contrastive analysis, English for academic purposes, français langue
académique, academic writing
Multi-word terms: disclosing the semantic
relations in noun compounds
Melania Cabezas-García *†1, Pilar León-Araúz *1
1 University of Granada (UGR) – Buensuceso, 11 18001, Spain
Noun compounds (e.g. wind power) are the units mainly used to designate specialized concepts (Nakov, 2013). These multi-word terms (MWTs) can be defined as sequences of nouns
that function as a single noun (Downing, 1977), and they are distinguished by their syntactic-semantic complexity, since two concepts are juxtaposed without any clear indication of the link
between them (Rosario et al., 2002). This means that compound terms such as air pollution
and oil pollution, which have the same external form (the head pollution combines with a noun
modifier), can encode different semantic relations between their constituents (Location
vs. Cause) (Maguire et al., 2010). Therefore, the semantics of terminological noun compounds
is not fully compositional or construed from the meaning of their constituents, as is often
assumed. Although the ambiguity of the semantic relations in noun compounds has long been
studied, it remains problematic, because different interpretations can lead to different inferences,
query expansion, paraphrases, translations, etc. (Hendrickx et al., 2013).
The root of this issue is noun packing, which can be addressed by analyzing the formation
processes of noun compounds, involving predicate deletion (e.g. power system, instead of a
system produces power) and predicate nominalization (e.g. energy transfer, instead of energy
is transferred) (Levi 1978). These propositions underlying the noun compounds make the semantic relation explicit and take the form of a predicate, its arguments, which are mandatory
and make up the meaning of the verb, and adjuncts (optional complements) (Tesnière, 1976).
The relation between a predicate and its complement structure is referred to as ‘micro-context’,
which represents a key factor in accessing the semantics of terms.
This paper describes the use of paraphrases conveying the conceptual content of English two-term noun compounds (Nakov and Hearst, 2006; Butnariu and Veale, 2008; Cabezas-García
and Faber, in press) in the specialized domain of environmental science. Verb paraphrases were
used to access micro-contexts, which represent the syntax-semantics interface, in two-term noun
compounds formed by predicate deletion. Some of these paraphrases were based on the lexico-syntactic patterns that usually convey semantic relations in real texts (Meyer, 2001; Marshman,
2006). Our goal was to access the semantics of these MWTs in order to (i) disambiguate the
semantic relation between the constituents of the compound; and (ii) develop a procedure of
inference of the semantic relations in these MWTs.
To this end, English two-term noun compounds were extracted from an environmental science
corpus. The MWTs selected designated entities and all of them shared the same head (e.g. air
pollution, wastewater pollution, oil pollution, etc.). We then organized the MWTs according to
the semantic category of their modifiers, i.e. the qualitative valence of the concealed predicates
was considered to disambiguate the semantic relations in the noun compounds. The following
step was the extraction of paraphrases from the corpus. Finally, the different groups of MWTs,
which had been previously organized depending on the semantic category of their modifier, were
compared.
Our results showed that the specification of the semantic category of the modifiers and the
use of paraphrases allowed access to the conceptual load of the noun compounds, namely to
the semantic relation between their constituents. Thus, recurrent patterns in the formation of
these compounds were observed, which was found to be a valuable starting point toward the
development of translation rules of these units.
Keywords: noun compound, semantic relation, paraphrase, micro-context, terminology
Multilingual extraction of terminology from
specialised corpora.
Eva M. Mestre-Mestre *1
1 Universitat Politecnica de Valencia [Espagne] (UPV) – Camino de Vera, s/n 46022 Valencia, Spain
There is a considerable amount of literature on the use of text-based corpora for
various purposes: scientific research, elaboration of teaching materials, compilation of glossaries
and vocabularies, etc. In many cases, computer software is used (and sometimes programmed)
to help in these tasks. Most of the analysis software used lets users check word frequencies, concordances and collocations. However, few tools permit the
extraction of true specialised lexical units from specialised-domain corpora, and even fewer
can work with languages other than English. This work presents the main
characteristics of DEXTER (Discovering and EXtracting TERminology)[1], an online workbench
for terminology management and data mining of corpora based on unstructured texts.
The current version of DEXTER supports the processing of small- and medium-sized corpora
carrying out first an automatic extraction of the terms in a given corpus, by contrasting the
target corpus with the IATE thesaurus of the European Union. Then, a manual validation of
the candidate terms is necessary to obtain final valid results. A distinctive
characteristic of DEXTER is the possibility of working with different languages; at the moment,
it is able to analyse corpora in English, French, Italian and Spanish. A second particularity of
this software tool is that it uses a hybrid approach which takes into account the linguistic and
statistical properties of the lexical units, and additionally uses lexical filters without grammatical
tagging to restrict the results obtained before their weighting, which simplifies the validation work
needed to complete the terminology extraction task. This also permits the identification of terms that include different grammatical categories (nouns, verbs, adjectives or adverbs).
DEXTER uses the SCR metric (Periñán-Pascual, 2015), resulting from the combination of
termhood and unithood of the n-grams extracted by the software (Salton, Wong, and Yang,
1975; Salton and Buckley, 1988; Ahmad, Gillam and Tostevin, 2000; Park, Byrd and Boguraev,
2002).
The research presented here compares the results obtained in the analysis of three corpora composed of 50 articles written in French, 50 written in English and 50 written in Spanish on the
subject of neurology, published in the last five years in prestigious research journals. The degree
of precision of the terms proposed by the software after manual validation has been studied. The
cases with a higher rate of false positives (candidates proposed as terms by the software but
discarded in the validation phase) have also been considered. The study concludes that the
results obtained with DEXTER are similar for the three languages and consistent with previous
studies carried out with monolingual corpora (Periñán-Pascual and Mestre-Mestre, 2015, 2016).
DEXTER has been developed in C# with ASP.NET 4.0 by Prof. Carlos Periñán-Pascual, and
is freely accessible at www.fungramkb.com/nlp.aspx.
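The evaluation described above, in which the precision of candidate terms is measured after manual validation, can be sketched in a few lines of Python. The sketch does not reproduce DEXTER or its SCR metric; the function names, the candidate list and the validation decisions are invented for illustration.

```python
def precision(candidates, validated):
    """Share of automatically proposed term candidates kept after validation."""
    if not candidates:
        return 0.0
    kept = [term for term in candidates if term in validated]
    return len(kept) / len(candidates)


def false_positives(candidates, validated):
    """Candidates proposed by the extractor but rejected by the validator."""
    return [term for term in candidates if term not in validated]


# Invented candidate list and validation decisions, for illustration only.
candidates_en = ["motor neuron", "spinal cord", "last five years", "white matter"]
validated_en = {"motor neuron", "spinal cord", "white matter"}

score = precision(candidates_en, validated_en)
rejected = false_positives(candidates_en, validated_en)
```

Running the same two functions on the French, English and Spanish candidate lists would give per-language precision figures that can be compared directly, which is the kind of cross-language comparison the abstract reports.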
Keywords: ATE, multilingual, specialised corpora, terminology
Naming practices and media constructions
of reality in Spanish: A corpus-based
perspective on violence against women news
(2005-2015)
José Santaemilia *1
1 Universitat de València (UV) – Avda. Blasco Ibáñez, 32-6 Valencia 46010, Spain
Without a doubt, violence against women (VAW) is a serious issue within Spanish society,
which is characterized, among other things, by a growing awareness of gender and sexual issues,
and this includes a perception of VAW as a serious social malady, as well as a crime. Multiple
representations of, and debates on, the topic are to be found in literature (Báez Ramos 2002),
cinema (Sánchez Noriega 2002, Wheeler 2012) or TV and radio programmes (Gómez Nicolau
2012).
In this heightened awareness of VAW, mass media have been instrumental. In Spain, media
accounts of VAW are very closely related to two quality newspapers, El País and El Mundo.
Since the mid-1970s quality papers have featured growing numbers of articles on the topic. With
the murder of Ana Orantes in December 1997, a new discourse on VAW has been identified in
the Spanish media (Bengoechea 2000, Carballido 2007), though scholarly research at the turn
of this century (Bengoechea 2000, Lledó 2002, Fernández Díaz 2003, Jorge 2004, Vives-Cases
et al 2005, Carballido 2007, Zurbano 2012, Menéndez 2014, Carratalá 2016) still shows that
Spanish media discourses have a tendency to naturalize and condone male responsibility, thus
reproducing the existing asymmetrical relations between the two sexes.
Although a vast number of denominations for VAW are present in Spanish media discourse, three naming practices seem to stand out as the most common: violencia de género
[Eng. ‘gender-based violence’], violencia doméstica [Eng. ‘domestic violence’] and violencia
machista [Eng. ‘male violence’]. Choosing one term over another is especially relevant, as it is
likely to impose a category of thought, convey negative or positive values, attribute blame or
praise, or shape a certain evaluative stance.
This presentation, therefore, compares and contrasts the two Spanish quality dailies (El País
and El Mundo) in their use of the three main naming practices used in contemporary VAW
news. To do so I draw on an ad-hoc corpus made up of ca. 10 million words of gender-based
news, covering the period 2005-2015. This is part of a larger, comparable (Spanish-English),
highly specialized corpus (GENTEXT-N), containing all the news articles dealing with gender-related topics such as VAW, homosexuality or abortion. In terms of methodological approach,
I resort to a CADS (Corpus-Assisted Discourse Studies) approach (Partington 2004, Baker
& Levon 2015), i.e. the combined, dialogical insights from both corpus linguistics and Critical
Discourse Analysis, ”moving back and forth recursively between qualitative and quantitative
forms of analysis in order to generate new hypotheses as well as to test existing ones” (Baker &
Levon 2015: 223). Therefore, di↵erences and similarities in frequencies and concordance lines
are explored, in order to assess the most important ideological values present in VAW news
stories. Attention has been paid to the news values (Bednarek & Caple 2012, 2014) construed
by each newspaper, together with the relevant associations and ideological implications. Among
the traits that seem to be confirmed we identify a general trend towards a more widespread
use of two terms –violencia machista (El País) and violencia de género (El Mundo)– with the
increasing exclusion of violencia doméstica.
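The frequency comparison described above can be illustrated with a minimal sketch. It assumes each newspaper's articles are available as plain-text strings; the function name `per_million` and the toy sentences are hypothetical and are not taken from GENTEXT-N. Raw counts are normalised per million words so that sub-corpora of different sizes can be compared.

```python
from collections import Counter

# The three naming practices discussed in the abstract.
TERMS = ("violencia de género", "violencia doméstica", "violencia machista")

def per_million(texts, terms=TERMS):
    """Return occurrences of each term per million words across `texts`."""
    counts = Counter()
    total_words = 0
    for text in texts:
        lowered = text.lower()
        total_words += len(lowered.split())  # whitespace tokenisation (a simplification)
        for term in terms:
            counts[term] += lowered.count(term)
    if total_words == 0:
        return {term: 0.0 for term in terms}
    return {term: counts[term] * 1_000_000 / total_words for term in terms}

# Toy sub-corpora (invented sentences, for illustration only).
toy_corpus = {
    "El País": ["La violencia machista crece", "Nuevo caso de violencia machista"],
    "El Mundo": ["Una ley contra la violencia de género"],
}
for paper, articles in toy_corpus.items():
    print(paper, per_million(articles))
```

Normalised frequencies of this kind are only a starting point; in a CADS design they would be read alongside concordance lines rather than on their own.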
Newsworthy naming practices, and their evolution in media discourses, are powerful indicators
of both social positionings on sensitive social issues and of public evaluations of the same issues.
Keywords: violence against women (VAW), Spanish press, El País, El Mundo, media discourse,
VAW naming practices, news values.
On the Endophoric, Abstract and Narrative
Nature of Idiomatic 'Do So' in Legal Texts,
Journalistic Texts and Written
Correspondence
Carlos Prado-Alonso* (1)
(1) University of Oviedo (Uniovi), Spain
* Speaker
Idiomatic do so constructions, as in ‘I ate an apple yesterday in the park, and Peter did so
last week’, are verbal anaphors that have been extensively studied from a theoretical perspective.
Research on do so has mainly focused on the categorical factors (i.e. semantic and syntactic) that determine the use of the construction. It has been argued, for instance, that the extent of
application of do so anaphora depends principally on factors such as: (a) non-stativity of the
antecedent (Guimier 1981); (b) antecedent not headed by be (Levin 1986); (c) coreferentiality
of subjects in the antecedent and do so clauses (Souesme 1987); (d) adjunct status of any "orphan" in the do so clause (Culicover & Jackendoff 2005); and/or (e) non-contrastive status of
any adjunct in the do so clause (Huddleston and Pullum 2002), among others.
Overall, however, scholars have devoted little attention to the examination of the textual factors
affecting the distribution and use of do so anaphora in naturally occurring Present-day English,
apart from a few isolated hints here and there (cf. Houser 2010).
In order to bridge this gap, this paper presents an in-depth corpus-based analysis of the factors
that determine the pragmatic use and distribution of do so constructions in different contemporary legal, journalistic and written correspondence texts. The data for the study are taken from
the ICAME family of corpora, namely the LOB, FLOB, FROWN, BE06, and AmE06
corpora.
As a rule, do so has been regarded as typical of formal registers, with the elliptical alternative
omitting so being preferred in informal contexts (cf. Stirling and Huddleston 2002: 1531). Beyond that, however, the analysis of the 687 instances retrieved from the corpora will show that
the frequency and distribution of do so constructions in legal, journalistic and written correspondence texts depend not only on the degree of formality but also on the narrative,
endophoric and abstract nature of the texts in which they occur. The data will also show that
such a narrative, endophoric and abstract nature is not only a property of the texts in which do
so anaphora occurs, but also a feature of the construction itself.
In sum, the analysis sheds light on the linguistic and textual factors that drive the pragmatic
use and the distribution of do so verbal anaphora and shows that, in addition to syntactic and
semantic factors, the linguistic features of the texts in which they occur also play an important
role in the use of these types of formulaic expressions.
References
Culicover, P.W., Jackendoff, R. 2005. Simpler Syntax. Oxford: OUP.
Houser, M. J. 2010. The Syntax and Semantics of Do So. University of California.
Guimier, C. 1981. La Substitution Verbale en Anglais. Modèles Linguistiques 3.1: 135-161.
Levin, L. 1986. Operations on Lexical Forms: Unaccusative Rules in Germanic Languages.
Cambridge, MIT.
Pullum, G. K. & R. Huddleston. 2002. The Cambridge Grammar of the English Language.
Cambridge University, 1449-1564.
Souesme, J. 1987. Valeurs et Emplois Respectifs de DO et DO SO. Modèles Linguistiques
9: 65-92.
Keywords: Idiomatic Do So, Textual Variation, Legal Texts, Editorials, Written Correspondence
On the Grammaticalization Path of the
Quasi-coordinator as well as
Miriam Criado Peña* (1)
(1) Universidad de Málaga, Spain
* Speaker
The English language as it is known today has undergone a number of developments that
have changed it throughout time. Among those changes, grammaticalization stands out because
of its relevance in the progress of the language, consisting in the process by which a lexical
word having full meaning on its own becomes a grammatical item. The present study analyses
the developmental path of the construction as well as, taking the Old English adverb well as
its origin. In Middle English, as well and as well as (swa well swa) emerged from the
original adverb, behaving as single units, and finally turned into the coordinator as well as in Early
Modern English. These manifold layers still remain in Present-Day English, which, together with
the versatility of the construction, allows me to classify its uses into four groups according to
meaning and function: a) as an adverb of manner; b) as a comparative of two elements; c) as a
conjunctive coordinator; and d) as a coordinator introducing one person or thing. Nevertheless,
coordinators such as as well as sometimes perform different syntactic roles in a sentence. These
are called quasi-coordinators, that is, linkers that can behave like coordinators or subordinators
depending on the context. When they behave as subordinators, they introduce prepositional
phrases and can be placed in front position but do not lose their coordinating function. In addition, some of the mechanisms involved in language change, such as syntactic
reanalysis or semantic bleaching, among others, are also considered in this paper to explain the
changes and provide a dual view of them encompassing syntax and semantics. The process of
grammaticalization of quasi-coordinators has been practically neglected in the literature, and
therefore, the diachronic development of as well as still remains unknown. In the light of this,
the present paper studies this process examining the syntactic and semantic changes of this
construction as well as exploring its coordinating function in the different layers across time. In
this fashion, the following objectives are pursued: a) a historical analysis to ascertain the origin
of this quasi-coordinator, examining the linguistic causes that motivated the change, both syntactically and semantically; b) an identification of the multiple mechanisms and processes taking
place along the grammaticalization path; c) a classification of the construction into four groups
according to function, in order to trace its progress; and d) a sociolinguistic study to
assess the role played by social factors in this process. For this purpose, the
Parsed Corpus of Early English Correspondence (PCEEC) and the Helsinki Corpus of English
Texts have been used as sources of analysis, covering almost seven hundred years from the late
Old English period to the Early Modern period.
Keywords: Grammaticalization, as well as, quasi-coordinator, diachronic development, semantic
bleaching, reanalysis, sociolinguistic factors
Onomasiology of feeling: linguistic corpora
as a data source for the
semantics and syntagmatic combinatorics of
emotion nouns in Spanish
Inmaculada Mas* (1)
(1) Universidad de Santiago de Compostela (USC), Spain
* Speaker
The expression of emotions is in fashion. Emotionaries, the display of feelings on social networks, and emoticons, indispensable in mobile chat
conversations: these are just some of the manifestations of the current relevance of subjective
sensibility. Beyond the monolithic me gusta [Eng. ‘like’], naming emotions is an essential element of public and private communication, no less sophisticated for being primitive. In this
paper we propose an approach to the semantics and the lexico-syntactic combinatorics
of emotion nouns in Spanish, using data obtained from linguistic
corpora.
The proposal has three objectives. First, it attempts an onomasiological approach to the subject of feelings, centred on emotion nouns in
Spanish and their lexico-syntactic combinatorics. Second, it seeks to test the usefulness
of corpora as a data source, since, in addition to context and domain, they provide information on frequency (reference corpora), multilingual correspondences (parallel corpora)
and incidence in interlanguage (learner corpora). Third, it considers
the applicability of all this to the development of a lexicographic product intended for
contrastive studies and for meeting production and translation needs.
The onomasiological approach starts from Casares's Diccionario ideológico, Moliner's Diccionario de
uso del español, and López García's Diccionario de sinónimos y antónimos de la lengua española.
According to the general plan of Casares's ideological classification (1942),
emotion nouns fall under the subject Sensibilidad [Eng. ‘sensibility’] and are broken down into Sensibilidad/Sentidos [Eng. ‘sensibility/senses’], in Synoptic Table 13 (p. L), and Sentimientos [Eng. ‘feelings’], in Synoptic Table 14
(p. LI).
As is well known, the onomasiological perspective is the more useful one in the linguistic tasks
of production and translation, two activities for which the catalogues of the Diccionario de uso
del español and dictionaries of synonyms and antonyms have proved to be extremely useful
sources. The precise word is generally located by starting from the most neutral,
most general or most frequent term. One of Moliner's aims in including the catalogues
was to "lead the reader from the word they know to the way of saying what they do not know or what does not
come to mind at the precise moment" (p. IX). In her design she sought to give the dictionary
a dual route of consultation: the onomasiological and the semasiological.
The frequency data and, above all, the wealth of examples of nouns in context offered by
the corpora consulted (CORPES XXI, Reverso Context and CAES) make it possible
to outline the semantic schema and to complete it with the combinatorial potential; in the case at
hand, with support verbs and adnominal complements. Some results concerning
the two poles between which the subject Sentimientos lies (gusto/disgusto [Eng. ‘pleasure/displeasure’], amor/odio [Eng. ‘love/hate’], preocupación/despreocupación [Eng. ‘worry/unconcern’]) show the particularities of the combinatorics of these nouns.
Keywords: emotion nouns, onomasiological lexicography, Spanish corpora, syntactic
combinatorics
Phraseological routines in scientific writing:
the example of metatextual routines in
French
Agnès Tutin* (1)
(1) Laboratoire de Linguistique et de Didactique des Langues Maternelles et Etrangères (LIDILEM) –
Université Paris VIII Vincennes-Saint Denis, Université de Grenoble – Université Grenoble Alpes,
Bâtiment Stendhal CS40700 38058 Grenoble cedex 9, France
* Speaker
Phraseology is prevalent in scientific writing (e.g. Gledhill, 2000; Pecman & Kübler 2011)
and has many faces in this genre (Tutin, 2013). Cross-disciplinary scientific phraseology includes
collocations such as pay attention or encouraging results, discursive markers such as as long as
or as a first step but also large phraseological chunks that we call semantico-rhetorical routines
(Tutin & Kraif, 2016). These routines, which belong to the extended phraseological field (see
also Teufel 1998, Pecman 2004; Sándor 2007), present specific properties:
• At the syntactic level, they are generally complete sentences including a tensed verb.
They are thus different from standard collocations which prototypically involve two lexical
elements.
• At the rhetorical level, they have a specific rhetorical function, such as highlighting textual
coherence, e.g. comme on/nous l’avons mentionné/précisé [as one/we mentioned/made
clear ...].
• At the enunciative level, they involve specific referents in the discourse situation (e.g. the
author of the scientific writing, the scientific article, the audience of the scientific writing
...).
• At the semantic and lexical level, they involve specific concepts, lexicalized with various
elements, e.g. in the above example, the author of the scientific writing is referred to with on or
nous, while mentionné alternates with précisé.
These semantico-rhetorical routines are thus far from being frozen expressions, but we think they
fully belong to the field of phraseology since these patterns are dedicated to specific functions
in the genre of scientific writing and are realized through limited lexical paradigms.
After a theoretical presentation of routines, our presentation will show how these phraseological
patterns can be automatically extracted from treebanks of scientific articles in a corpus-driven
approach. This technique uses statistical association measures and dispersion measures (Kraif
2016; Tutin & Kraif 2016), associated with semantic lexicons and syntactic relations (Hatier et
al. 2016). We will then illustrate this notion in the field of metatextual functions, especially
text navigation functions, often associated with speech verbs.
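As a rough illustration of how an association measure and a dispersion threshold can be combined, the sketch below scores adjacent word pairs by pointwise mutual information (PMI) and keeps only pairs attested in a minimum number of documents. This is a deliberately simplified stand-in: it works on surface bigrams rather than on the syntactic relations of a treebank, and the function name `bigram_scores` and the `min_docs` parameter are hypothetical, not part of the tools cited above.

```python
import math
from collections import Counter

def bigram_scores(documents, min_docs=2):
    """Score adjacent word pairs by PMI, keeping pairs found in >= min_docs documents."""
    unigrams, bigrams, doc_freq = Counter(), Counter(), Counter()
    for doc in documents:
        tokens = doc.lower().split()          # whitespace tokenisation (a simplification)
        pairs = list(zip(tokens, tokens[1:]))
        unigrams.update(tokens)
        bigrams.update(pairs)
        doc_freq.update(set(pairs))           # dispersion: number of documents containing the pair
    n_tokens = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    scores = {}
    for pair, freq in bigrams.items():
        if doc_freq[pair] < min_docs:
            continue                          # too poorly dispersed to count as a routine
        p_pair = freq / n_pairs
        p_w1 = unigrams[pair[0]] / n_tokens
        p_w2 = unigrams[pair[1]] / n_tokens
        scores[pair] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy documents built around the routine quoted in the abstract.
docs = [
    "comme nous l'avons mentionné plus haut",
    "comme nous l'avons précisé ici",
]
for pair, score in sorted(bigram_scores(docs).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy run only the pairs shared by both documents survive the dispersion filter, which mirrors the idea that a routine keeps a stable frame while some slots (here mentionné/précisé) vary.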
References
Gledhill, Ch. (2000). Collocations in Science Writing. Language in performance, 22. Tuebingen: Gunter Narr Verlag.
Hatier, S., Augustyn, M., Yan, R., Tran, T. T. H., Tutin, A., & Jacques, M.-P. (2016). French cross-disciplinary scientific lexicon: extraction and linguistic analysis. In T. Margalitadze &
G. Meladze (eds.), Proceedings of the XVII EURALEX International Congress: Lexicography &
Linguistic Diversity (p. 355–365).
Kraif, O. (2016). Le Lexicoscope : Un outil d’extraction des séquences phraséologiques basé
sur des corpus arboré. (O. Kraif & A. Tutin, eds.) Cahiers de lexicologie, 1 (108), 91-106.
Pecman, M. (2004). Phraséologie contrastive anglais-français : analyse et traitement en vue
de l’aide à la rédaction scientifique. Doctoral thesis, Université de Nice Sophia Antipolis,
December 2004.
Pecman, M., & Kübler, N. (2011). ARTES: an online lexical database for research and teaching
in specialized translation and communication. In Proceedings of the First International Workshop on Lexical Resources.
Sándor, A. (2007). Modeling metadiscourse conveying the author’s rhetorical strategy in biomedical research abstracts. Revue Française de Linguistique Appliquée, XII: 2007-2: 97-108.
Tutin, A. (2016). La phraséologie transdisciplinaire des écrits scientifiques : des collocations
aux routines sémantico-rhétoriques. In A. Tutin & F. Grossmann (eds.), L’écrit scientifique
: du lexique au discours. Autour de Scientext (p. 27-44). Rennes: Presses Universitaires de
Rennes.
Tutin, A., & Kraif, O. (2016). Routines sémantico-rhétoriques dans l’écrit scientifique de sciences
humaines : l’apport des arbres lexico-syntaxiques récurrents. Lidil. Revue de linguistique et de
didactique des langues, (53), 119-141.
Keywords: phraseology, scientific writing, routines
Phraseology and discourse grammar in
English as a lingua franca: ’on the contrary’
and ’on the other hand’ in unedited research
papers
Silvia Murillo* (1)
(1) Universidad de Zaragoza – Pedro Cerbuna 12, 50009 Zaragoza, Spain
* Speaker
Due to linguistic interference, some ‘deviant’ uses of the contrastive discourse markers on the
contrary and on the other hand have been pointed out in essays written by learners of English
(Lake 2004, Gilquin et al. 2007), as well as by users of English as a lingua franca (Prodromou
2008). For instance, these markers, which grammatically are prepositional phrases, are similar
in form to the Spanish discourse markers por el contrario and por otra parte, but their use
(i.e. their instructional or procedural meaning) is different. Por el contrario can either contrast
two topics or oppose/refute one single topic, whereas on the contrary only encodes the latter
use. Por otra parte encodes discourse organizing instructions rather than counterargumentative
ones. The same applies to other language pairs, for example English-French on the contrary /
au contraire (Portolés 2002).
The purpose of this paper is to present a qualitative-quantitative analysis of the form and use
of these two markers in the SciELF corpus, a subset of the WrELFA corpus (Written Corpus of
English as a Lingua Franca in Academic Settings), compiled at the University of Helsinki. The
SciELF corpus consists of 150 unedited research papers (759,300 words) from Sciences and Social Sciences and Humanities disciplines, written by academics from a range of ten L1 backgrounds.
The analysis of the corpus revealed nonstandard phraseological variants of the two markers.
For on the other hand, variants such as on the other side, in the other side, in the other
hand, and for the other hand were found. The phraseological range for on the contrary included
at the contrary, by contrary, in contrary, on contrary, and contrary. As regards their functions,
on the contrary presents deviant uses, contrasting two topics rather than opposing/refuting
one single topic, in over two thirds of the cases found in the SciELF corpus. On the other hand
takes on a more discourse-organizing role in some cases, and thus a less argumentative function.
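Form variants of this kind can be retrieved with a simple pattern search. The sketch below is only an illustration under stated assumptions: the two regular expressions are hypothetical patterns built from the variants listed above, they require a preposition (so the bare variant contrary is not captured), and they say nothing about how the SciELF data were actually queried.

```python
import re
from collections import Counter

# Hypothetical patterns covering the preposition/article variants reported above.
OTHER_HAND = re.compile(r"\b(?:on|in|for) the other (?:hand|side)\b", re.IGNORECASE)
CONTRARY = re.compile(r"\b(?:on|at|by|in)(?: the)? contrary\b", re.IGNORECASE)

def count_variants(text):
    """Count each surface variant of the two markers found in `text`."""
    counts = Counter()
    for pattern in (OTHER_HAND, CONTRARY):
        for match in pattern.finditer(text):
            counts[match.group(0).lower()] += 1
    return counts

# Invented example sentence, for illustration only.
sample = "On the other hand, it works. In the other side, it fails. At the contrary, by contrary."
print(count_variants(sample))
```

A search of this kind only finds forms; deciding whether a given hit contrasts two topics or refutes one still requires reading the concordance lines.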
These processes may be described as semantically-driven developments, as the role of residual conceptual meaning in the L1 markers (cf. Murillo 2010) seems to become central for the
form and use of these discourse markers in written academic ELF. Regarding form variants, in
most cases the core conceptual element of the markers has been kept (as a cognate) or translated,
and there is an approximate use of the prepositions and articles (cf. Sinclair 2004, Vetchinnikova
2015). Further, the procedural meaning of these markers seems to have been amplified due to
the influence of the L1. Thus, hybridity is the most remarkable process with regard to these
markers, and it is perceived at a formal level and at a pragmatic-semantic level.
Variations in form are masked by the role played by editors at a later stage, who tend to
correct the use of prepositions and articles in papers to be published (Mur, 2013). However,
many deviant uses of on the contrary are overlooked in published papers (Murillo 2012). Considering this trend and the frequency of such cases revealed in the SciELF corpus, it is argued that
this discourse marker is undergoing a grammaticalization process in ELF, that is, its procedural
meaning is changing.
Keywords: English as a lingua franca (ELF), contrastive discourse markers, formal variants,
procedural meaning, conceptual meaning, grammaticalization
ROUND TABLE: Corpus-based analysis of
interpersonal metadiscourse in specialized
domains: academic vs professional and
social genres. Theoretical and
methodological challenges
Francisca Suau-Jiménez* (1,2), Rosa Lorés Sanz*† (3), Giovanna Mapelli*† (4), Isabel Herrando Rodrigo*† (3)
(1) Facultat de Filologia, Traducció i Comunicació, Universitat de València (IULMA-UV) – Av. Blasco Ibáñez, 32, 46010 Valencia, Spain
(2) Facultat de Filologia, Traducció i Comunicació (IULMA-UV) – Av. Blasco Ibáñez, 32, 46010 Valencia, Spain
(3) Universidad de Zaragoza – Pedro Cerbuna 12, 50009 Zaragoza, Spain
(4) Dipartimento di Scienze della Mediazione Linguistica e di Studi Interculturali – Piazza Indro Montanelli, 1, 20099 Sesto San Giovanni (MI), Italy
* Speaker
† Corresponding author
The main subject of this round table is an identified need to refine interpersonal metadiscourse (IM) as a theoretical and methodological tool of analysis to describe genres in specialized
domains and languages through their corresponding corpora. The debate will be grounded on
our own research results, based on corpora, stemming from the study of different academic and
professional genres (Herrando-Rodrigo 2010, 2012, 2014; Lorés-Sanz 2009, 2011a, 2011b; Mapelli
2008, 2016; Suau 2012a, 2012b, 2014), with a focus on interpersonality and its limitations and
challenges as an analytical perspective. The conclusions are intended to offer insights into the applicability of the descriptive framework of interpersonal metadiscourse and thus facilitate further
research in the field.
The hypothesis is that, if interpersonal metadiscourse (IM) as a framework for the analysis of
interpersonal features in professional, social and academic genres is conditioned by contextual
variables, it would therefore need to be constantly refined and readapted to the specific corpus
it is applied to, thus accepting new markers and/or new lexico-grammatical realizations. If this
hypothesis is somehow confirmed by means of the debate and the conclusions that will emerge
from the proposed round table, the scope will be opened for further refinement of the model
which will allow us to cater for the description of a wider range of genres, disciplines, languages
and corpora, with discursive and socio-linguistic implications.
To sum up, we will draw on several of our own studies carried out in specialized corpora from the
standpoint of the IM framework, discussing their main achievements but also their limitations,
due to the strict and extant pattern the model was designed with. Then, these four questions
will be posed in order to hold a debate among the presenters and the audience:
Questions for discussion, related to the four analyses:
Q.1. Have any weaknesses been identified in the framework of interpersonal metadiscourse,
especially in relation to markers and their lexico-grammatical realizations? If so, which ones?
Q.2. Does each corpus determine the way in which the framework has been applied, or,
conversely, has the research objective determined what corpus to collect?
Q.3. What differences can be observed in the interpersonal metadiscourse framework according
to genre, discipline and language?
Q.4. What conclusions can be drawn and what suggestions can be made to support methodological improvements and facilitate further research in IM? What would the theoretical
implications be?
Based on our contributions and on the implications emerging from them, differences will be
identified in terms of variations in the degree of applicability of the model as regards the domain
of specialization (professional and social vs academic), language used (English vs Spanish) and
lexicogrammatical and phraseological indicators, among other aspects.
Keywords: interpersonality, interpersonal metadiscourse, specialized-domain corpora, academic
genres, professional and social genres, theoretical and methodological challenges
Rocking the corpus. A discourse analysis of
pop rock lyrics.
María Martínez Casas* (1)
(1) Katholische Universität Eichstätt-Ingolstadt (KU), Germany
* Speaker
Pop rock songs are everywhere – except in corpora. As Kreyer and Mukherjee (2007: 31)
point out: "pop song lyrics have not been included in any of the standard reference corpora
of present-day British and American English [...]; pop songs are virtually absent from corpus-linguistic research". The current state of research on pop rock songs in Spanish is no exception to this statement.
Thus, the aim of this paper is to present the main language use patterns (Bubenhofer 2009)
regarding enunciation (Laferl 2005, Calsamiglia and Tusón 2015) and semantic processes (Halliday 1978, Ghio and Fernández 2008) in a corpus consisting of 1,000 pop rock lyrics in Spanish
(169,500 tokens).
The present corpus was compiled following the sociological criteria of consecration and canonization as well as central aesthetic values such as authenticity and hybridization (cf. Val,
Noya and Pérez-Colman 2014). It comprises 85 albums released between 1968 and 2015 by
artists coming from over 12 countries. 819 texts were taken from CD booklets or artists' homepages and 181 lyrics were transcribed from recordings. They were then analyzed with both
AntConc 3.4.4W and WordSmith Tools 6.0 and finally POS-tagged using TreeTagger.
In accordance with the results of prior corpus-linguistic research on pop rock lyrics in English
(Murphey 1990, Kreyer and Mukherjee 2007, Werner 2012, Bértoli-Dutra 2014), pop rock discourse in Spanish builds upon the personal pronouns and possessive determiners of the first (yo, me,
mi) and second person singular (tú, te, tu). The most frequent enunciative structure, as proposed
by Laferl (2005: 68), is: "The I addresses itself to a you and talks about their relationship".
However, the two main participants in lyrics show different semantic preferences when it comes to
types of processes: whereas the articulate "I" tends to be involved in mental processes (querer, sentir), the "you" carries out material (irse, dar, dejar, llevar) or verbal (decir, pedir) processes.
The semantic categories which Bértoli-Dutra (2014: 162) grouped for the factor extraction in
her multi-dimensional analysis of pop songs in English therefore show the following distribution in
the lyrics in Spanish: "movement" and "speech" apply rather to the "you"; "emotion", by
contrast, appears mainly close to the "I".
The linguistic representation of the main participants in pop rock lyrics will be presented in
this paper through the discourse analysis of clusters with deictic expressions referring to the
"I" and the "you" in the corpus. Special attention will be paid to lexical co-occurrences with
tags corresponding to clitic and personal pronouns, possessive determiners and verbal forms (i.e.
lexical and modal verbs as well as ser, estar, haber).
Keywords: song lyrics, discourse analysis, language use patterns, enunciation, semantic processes
SUNCODAC: A Spanish-English corpus of
computer-mediated student discussions
Mario Cal Varela* (1), Francisco Javier Fernandez Polo*† (1)
(1) Universidad de Santiago de Compostela (USC), Spain
* Speaker
† Corresponding author
In this paper, we present the SUNCODAC corpus of student discussion forums. Our aims
will be to justify the corpus’ rationale, describe its compilation process, holdings, design and
query tools, and to highlight its potential as a research tool.
Despite the momentum of Computer-Mediated Communication research (Herring et al. 2013),
CMC corpora (Breissberger & Storrer 2008) are relatively meager and scarcely representative
of the wide variety of CMC settings, notably educational contexts.
Existing research in CMC in education is generally based on relatively small corpora, compiled for the special needs and research questions of individual research projects. SUNCODAC
is a comparatively large corpus of student forum discussions, a key genre in present-day higher
education (Rourke et al. 1999, Loncar et al. 2014). Data consist of Moodle-based discussions
in an English-Spanish-English translation course over four consecutive years. The corpus contains a balanced representation of English and Spanish used as native and non-native languages
by multinational students. In the course of the presentation, we will provide a short description of the context of the discussions, as well as a brief account of the corpus compilation process.
SUNCODAC's current holdings consist of approximately 450,000 words and, when completed,
it is expected to total over 600,000 words. Data were anonymized and stored in XML format
with metadata on a number of user and other contextual variables, including participants’ first
language, gender, main language of post, date, time, topic and thread. Except for the replacement of participants’ names by codes, the texts were left unedited as far as grammar, spelling
and other errors are concerned. A specific tool was developed to allow for the computerized
retrieval of data via the Internet. The tool can be used to search for specific language features,
as well as for browsing and retrieval of whole texts or text collections using one or a combination
of the coded variables as filters. In the course of the presentation, we will demonstrate some of
these functions.
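To make the idea of metadata-based filtering concrete, the following sketch parses a small XML fragment and selects posts by a combination of coded variables. The element and attribute names (`post`, `l1`, `gender`, `lang`, `thread`) and the sample posts are hypothetical: the actual SUNCODAC schema and query tool are not described in enough detail here to reproduce them.

```python
import xml.etree.ElementTree as ET

# Invented sample in an assumed schema; not actual SUNCODAC data.
SAMPLE = """<corpus>
  <post id="p1" l1="es" gender="f" lang="en" date="2014-03-02" thread="t1">I think the second option is better.</post>
  <post id="p2" l1="en" gender="m" lang="es" date="2014-03-03" thread="t1">Creo que la primera es mejor.</post>
  <post id="p3" l1="es" gender="m" lang="es" date="2015-01-10" thread="t2">Estoy de acuerdo.</post>
</corpus>"""

def filter_posts(xml_text, **criteria):
    """Return the <post> elements whose attributes match every given criterion."""
    root = ET.fromstring(xml_text)
    return [
        post for post in root.iter("post")
        if all(post.get(key) == value for key, value in criteria.items())
    ]

# Posts written in English by participants whose first language is Spanish.
for post in filter_posts(SAMPLE, l1="es", lang="en"):
    print(post.get("id"), post.text)
```

Storing the contextual variables as attributes keeps the unedited post text separate from its metadata, so any combination of variables can serve as a filter without touching the text itself.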
The corpus holds considerable potential as a research tool, for instance, a) to further knowledge of "netspeak" and, more specifically, b) to complement existing research on the discussion
forum genre (Biber & Conrad 2009) and its characteristic language. Furthermore, given its
longitudinal nature, c) it should provide insights into processes (individual and collective) of
genre development in CMC and, in view of its multilingual and multicultural nature, d) should
also prove particularly useful for language contrasts as well as e) for cross-cultural studies into
culture-specific communicative practices. Finally, f) it should also prove valuable as a tool to
study learner-language and second-language acquisition processes in real-life environments, as
well as to undertake pedagogically-oriented studies seeking to identify successful forum participations which result in more effective learning practices, eventually leading to the design of
improved training materials.
Keywords: corpus, CMC, forum, Spanish, English, academic discourse, SUNCODAC
Grammatical sequencing for teaching
Spanish as a foreign language
Yun Sil Jeon* (1), Alejandro Muñoz-Garcés*† (1)
(1) Coastal Carolina University (CCU) – Associate Professor, Spanish, United States
* Speaker
† Corresponding author
The research that the Università di Firenze and Coastal Carolina University are carrying out jointly
began with the aim of finding an automatic way to extract from spoken-language corpora the simplest constructions produced in speech, and then
progressively to examine constructions of greater complexity.
For this research we had several corpora of spoken Spanish available: C-Or-DiAL
(Corpus Oral Didáctico Anotado Lingüísticamente; 120,000 words, transcribed and tagged), C-ORAL-ROM (tagged and aligned) and the Minicorpus del Español (30,000 words,
tagged and aligned, with information-articulation markup).
Our initial programming work has set out to find a way to
extract automatically the simplest utterances in the whole corpus and then to continue
progressively with those of greater complexity.
We start from the assumption that a spoken utterance is less complex the fewer
tone units it comprises. The minimal unit of communication is therefore taken to be an utterance composed of a single tone unit, and the complexity
of the utterance increases as its information articulation becomes more complex, with two or more
tone units.
The analysis began with the tags that delimit these tonal units in the C-Or-DiAL corpus;
these tags mark the points at which a boundary is perceived between the intermediate tonal
units of an utterance (intermediate prosodic breaks) and at the end of an utterance (final
prosodic breaks). Thanks to this tagging it has been possible to generate a list of all the
utterances consisting of one tonal unit, of two, of three, and of four or more.
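The extraction step described above can be sketched roughly as follows. This is only an illustration, not the authors' program: it assumes a hypothetical plain-text transcription in which `/` marks an intermediate prosodic break and `//` a final one (the actual C-Or-DiAL tag set may differ), and the function name `group_by_tonal_units` and the sample string are invented for the example.

```python
from collections import defaultdict

# Assumed markers (hypothetical): "//" closes an utterance,
# "/" separates tonal units inside an utterance.
FINAL_BREAK = "//"
INTERMEDIATE_BREAK = "/"


def group_by_tonal_units(transcript, max_bucket=4):
    """Group utterances by the number of tonal units they contain.

    Utterances with `max_bucket` or more tonal units share one bucket,
    mirroring the "four or more" list mentioned in the text.
    """
    groups = defaultdict(list)
    for raw in transcript.split(FINAL_BREAK):
        units = [u.strip() for u in raw.split(INTERMEDIATE_BREAK) if u.strip()]
        if not units:
            continue  # nothing left after the last final break
        bucket = min(len(units), max_bucket)
        groups[bucket].append(units)
    return dict(groups)


# Invented sample, not taken from any of the corpora.
sample = "sí // bueno / no sé // mira / yo creo / que mañana // a / b / c / d / e //"
lists = group_by_tonal_units(sample)
for n in sorted(lists):
    print(n, lists[n])
```

Splitting on the final break first and on the intermediate break second keeps the two levels of segmentation apart, so the lists of one-unit, two-unit and longer utterances fall out of a single pass.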
The next step of the research consists in analysing these lists of the different utterance
types with the help of several morphosyntactic analysers available online (GRAMPAL and FREELING,
among others), in order to decide which one to use. The same process of extracting tonal units
and analysing them will also be carried out on C-ORAL-ROM and the Minicorpus del Español, so
that the results can be compared and the differences evaluated.
As a result of these analyses we expect to find data that are significant, or at least
indicative, of what tends to be used in the simplest utterances and what appears in the more
complex ones. This analysis will then make it possible to reflect on the types of words and
constructions that occupy particular positions in the articulation of the utterance.
Finally, on the basis of these data, it will be possible to propose to teachers of Spanish as a
foreign language a sequence for choosing the material to teach in class, since our research
expects to obtain some indications of what is used in colloquial Spanish, where, and how often.
⇤ Speaker
† Corresponding author: [email protected]
Keywords: grammatical sequencing, teaching of Spanish, native-speaker corpus, morphological
and syntactic analysis
Semantic constraints on MWU formation:
Evidence from clinical records.
Leonie Grön ⇤ 1, Ann Bertels 1
1 Katholieke Universiteit Leuven (KUL) – Belgium
Since Sinclair’s (1991) formulation of the idiom principle, the scope of research related to
multi-word units (MWUs) has widened considerably. While earlier work focused on fixed word
sequences, recent research locates MWUs on a continuum, ranging from frozen expressions to
patterns which allow for paradigmatic choices (Dobrovol’skij 2015; Steyer 2015). In studies on
language for special purposes (LSP), the defining criteria centre around the functional value
of the unit, whereby the surface forms may show both lexical and morpho-syntactic variation.
Such variation patterns may be attributed to the area of research, as well as properties of the
textual genre (Hyland 2008, Laso & Salazar 2013).
In the medical domain, most related research has focused on scholarly articles (León & Divasson
2006; Laso & John 2013). By contrast, our study investigates the structure of MWUs in clinical
records, which lie on the boundary between oral and written communication. By analyzing a corpus of
Dutch patient records, we aim to reveal patterns in the formation of complex noun phrases
(NPs). Our prediction is that structural preferences will pattern with semantic features of the
constituents.
Our study focused on MWUs relating to the semantic classes Diagnosis (e.g. ‘lipodystrofie’
lipodystrophy) and Examination (e.g. ‘schildklierfunctie’ thyroid function). Based on precompiled term lists, we extracted all instances of these classes that were either localized on
the human body (Anatomical, e.g. ‘onderbeen’ lower leg), or specified with regard to severity,
etiology or quality (Qualitative, e.g. ‘drug-geïnduceerd’ drug-induced).
We identified about 3 times as many MWUs for Qualitative as for Anatomical, both in
terms of raw counts (137,646 vs. 36,862) and the number of patterns (~472 vs. 112). Especially for Qualitative, a small number of conventionalized phrases (e.g. ‘gunstig lipidenprofiel’
favourable lipid profile) accounts for a large share of occurrences. Irrespective of the headword
class, Qualitative modifiers primarily occur in the left context.
By contrast, the formation of Anatomical MWUs shows more structural variation: General types
of Examination are premodified (e.g. ‘pulmonaal onderzoek’ pulmonary examination), whereas
technical procedures are localized by nouns in the right context (e.g. ‘echo nier’ echography
kidney). MWUs based on the Diagnosis class entail more detailed localizations, leading to an
increase in average length (~2.7 vs. 2.3 tokens for MWUs based on Diagnosis vs. Examination).
In MWUs involving multiple modifiers, the internal order of the constituents is determined
by their semantic class as well as the level of generality: Adjectives designating a particular
body part (e.g. ‘abdominaal’ abdominal) are strongly tied to the headword, whereas relative
spatial modifiers and Qualitative specifications are found in the periphery (e.g. ‘stenose thv de
arteria carotis links’ stenosis in the arteria carotis left).
⇤ Speaker
We conclude that the formation of MWUs in clinical writing is guided by domain-specific constraints. In NPs relating to clinical findings and procedures, the type and relative position of
modifiers varies systematically depending on semantic properties of the constituents. These findings confirm that the study of MWUs in LSP benefits from a delexicalized approach, whereby
patterns of conceptual types form the basis of investigation.
Keywords: Clinical language, Dutch, MWUs, concordance analysis
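The delexicalized approach named in the abstract above, where patterns of conceptual types rather than word forms are counted, could be sketched as follows. This is a toy illustration under stated assumptions: the mini term list is hypothetical, the semantic class given to each head noun is a guess made for the example, and one phrase is invented; only the other word forms are quoted from the abstract.

```python
from collections import Counter

# Hypothetical mini term list; the class assigned to each word is an
# assumption made for this sketch, not the authors' annotation.
SEMANTIC_CLASS = {
    "onderzoek": "EXAMINATION",
    "echo": "EXAMINATION",
    "lipidenprofiel": "EXAMINATION",
    "pulmonaal": "ANATOMICAL",
    "abdominaal": "ANATOMICAL",
    "nier": "ANATOMICAL",
    "gunstig": "QUALITATIVE",
}


def delexicalize(phrase):
    """Replace every token by its semantic class ("OTHER" if unlisted)."""
    return " ".join(SEMANTIC_CLASS.get(tok.lower(), "OTHER") for tok in phrase.split())


# "abdominaal onderzoek" is an invented phrase, added to get a repeated pattern.
phrases = [
    "pulmonaal onderzoek",
    "abdominaal onderzoek",
    "echo nier",
    "gunstig lipidenprofiel",
]
pattern_counts = Counter(delexicalize(p) for p in phrases)
for pattern, freq in pattern_counts.most_common():
    print(pattern, freq)
```

Counting class sequences instead of surface strings is what lets premodified and postmodified Anatomical MWUs show up as two distinct patterns.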
On the near-synonymy of poner and meter
in Spanish: a logistic regression analysis
of two locative verbs.
Marie Comer ⇤ 1
1 Ghent University – Belgium
In this paper we set out to compare the syntax and semantics of the two main locative
verbs of contemporary Peninsular Spanish, poner and meter, using an extensively annotated
corpus. In their basic meaning, these near-synonymous verbs express the displacement of an
entity (the ‘Figure’) from one place to another (the ‘Ground’) (Cifuentes 2000) (1).
However, the use of poner and meter goes beyond the basic locative meaning (Authors 2015):
both verbs are used as verbs of transfer (2), in pseudo-copular uses (3), and in
causative and inchoative periphrases (4).
(1) poner el mantel en la mesa - meterse un chupete en la boca
(2) poner una multa a alguien - meter muchos deberes a alguien
(3) ponerse nervioso - meterse monja
(4) ponerse a reír - meterse a trabajar
In each of these domains, poner and meter behave as near-synonyms. That is, they are
interchangeable in certain contexts (ponerse/meterse a estudiar) but not in
others (El río se mete/*se pone en el mar; *ponerse monja). The aim of this presentation is twofold. First, on the basis of an arbitrarily sampled, manually compiled corpus of 2000
occurrences (1000 of each verb, extracted from the CORPES XXI, CORLEC and
C-ORAL-ROM databases and extensively tagged syntactically and semantically), we will examine
to what extent the semantic cores mentioned above actually stand out with these verbs.
Second, we will carry out a more detailed study of the locative use (1), in order to detect
parallels and differences between poner and meter. The analysis rests on an extensive set of
variables that potentially influence the choice between the verbs in their locative use. For
this use, the parameters studied include: (a) the direction of the Figure's displacement with
respect to the Ground; (b) the dimension of the Ground; (c) the presence or absence of a
contact zone between Figure and Ground; (d) the physical shape of the Figure; (e) whether or
not a container reading is possible, and the degree of containment (partial/complete); (f) the
animacy and the concrete/abstract character of the participants; and (g) the literal or
metaphorical interpretation of the placement event. By means of a logistic regression analysis, we
study the potential impact of the set of variables on the preference for one of
the two verbs. Our pilot study revealed that meter specializes in events where the Ground
acquires a container reading, of the type meter el pañuelo dentro del bolsillo (Authors 2016;
Cifuentes 2004; Cifuentes & Jesús Llopis 1996), preferring an internal location, whereas
poner is used in a variety of locative events. Other differentiating factors are
the syntactic reflexivity of the locative event and the semantics of the participants. The present
research illustrates how a multivariate, statistically advanced method can be applied
to determine the difference between two lexical near-synonyms.
⇤ Speaker
Keywords: near-synonymy, locative verbs, logistic regression, multifactorial analysis
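To make the logistic regression step in the abstract above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two binary predictors (`container_reading`, `metaphorical`) stand in for the much larger set of annotated variables, the ten data rows are invented rather than drawn from the corpus, and the model is fitted by simple gradient descent rather than by whatever statistical package the author used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a binary logistic regression by batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_meter(features, weights, bias):
    """Estimated probability that the verb is meter (1) rather than poner (0)."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))


# Invented toy data: [container_reading, metaphorical]; label 1 = meter, 0 = poner.
X = [[1, 0], [1, 0], [1, 1], [1, 0], [0, 0],
     [0, 0], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights, bias = fit_logistic(X, y)
print("coefficients:", [round(w, 2) for w in weights], "intercept:", round(bias, 2))
print("P(meter | container reading):", round(predict_meter([1, 0], weights, bias), 2))
```

In a model of this kind, a positive coefficient on a predictor means that the predictor raises the odds of meter over poner, which is how a finding such as the container-reading preference would surface in the output.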
Spanish Fragments and Polar Verbless
Clauses. Typology and Corpus Distribution
Oscar Garcia-Marchena ⇤ 1
1 Laboratoire de Linguistique Formelle (LLF) – Université Paris VII - Paris Diderot, CNRS : UMR7110 – Case Postale 7031 5, rue Thomas Mann 75205 Paris cedex 13, France
The properties and use of fragments (or elliptical clauses) have received recent attention in
different works (Fernandez 2002, Merchant 2004, Schlangen 2003). There is no agreement, however, concerning their nature and classification. Firstly, some authors treat them as pure syntactic units: the remnants of verbless clauses which have undergone ellipsis (Merchant 2004). Secondly, others classify them as pragmatic objects, different from non-elliptical clauses (Schlangen
2003), by their function in discourse. Thirdly, other works stress their independence from
non-elliptical clauses and classify them with a combination of syntactic and pragmatic criteria
(Fernandez 2002).
The aim of this paper is to show the extent to which Spanish fragments and polar verbless
clauses ("yes", "no") can be analysed as syntactic or discourse units, as well as to determine a
typology based on their syntactic and pragmatic properties and to present their distribution in
the different genres of a corpus. In order to achieve this goal, we have retrieved all the
fragments in the corpus of contemporary oral Spanish (CORLEC) (Marcos Marín 1992), comprising more than 63,000 utterances, and we have classified them according to their syntactic
and pragmatic properties. Finally, we have counted the frequencies of each type in the different
genres.
The results of this analysis indicate that fragments containing a segment with a counterpart in
their source have a predictable discursive relationship with it: they perform a particular speech
act (answer, agreement, correction, check question, etc.) that is determined by the syntactic
and semantic properties of the source and the target clauses. This combination of properties
is detailed in the following list, with reference to constructed examples of the various speech acts:
• Interrogative source & asserting target: answer (1)
• Interrogative source & questioning target: answer + check question (2)
• Questioning declarative source & asserting target & same referent: agreement (3)
• Questioning declarative source & asserting target & different referent: correction (4)
• Questioning declarative source & questioning target & same referent: check question (5)
• Questioning declarative source & questioning target & different referent: correction (6)
• Asserting declarative source & asserting target & same referent: acknowledgement (7)
• Asserting declarative source & asserting target & different referent: correction
• Asserting declarative source & questioning target & same referent: check question
• Asserting declarative source & questioning target & different referent: repair
(1) A: - ¿Cuándo vino? B: -Hoy. A: -‘When did he come?’ B: -‘Today.’
(2) A: - ¿Cuándo vino, hoy? A: -‘When did he come, today?’
(3) A: - ¿Se fue con Mar? B: -Con Mar. A: -‘Did he go with Mar?’ B: -‘With Mar.’
(4) A: - ¿Se fue con Pedro? B: -Con Mar. A: -‘Did he go with Pedro?’ B: -‘With Mar.’
(5) A: - ¿Se fue con Pedro? B: - ¿Con Pedro? A: -‘Did he go with Pedro?’ B: -‘With Pedro?’
(6) A: - ¿Se fue con Pedro? B: - ¿Con María? A: -‘Did he go with Pedro?’ B: -‘With María?’
(7) A: -Se fue con Pedro. B: - ¡Con Pedro! A: -‘He went with Pedro.’ B: -‘With Pedro!’
⇤ Speaker
In this way, this article will show the illocutionary effects of the combination of syntactic and
semantic properties in the source and target clauses for Spanish fragments and polar verbless
clauses, as well as the distribution of the resulting speech acts in the various genres of the CORLEC corpus.
Keywords: Fragments, non-sentential utterances, polar verbless clauses, Spanish, corpus, speech
acts
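The mapping from source and target properties to speech acts listed in the abstract above can be read as a lookup table. The sketch below encodes it that way; the label strings, the name `classify_fragment` and the "unclassified" fallback are invented for illustration, and the row for an asserting declarative source with a questioning target and the same referent is an assumption (the list is read as contrasting same and different referents in each pair).

```python
# Keys: (source type, target force, referent relation).
# The referent relation is None where the list does not mention it.
SPEECH_ACTS = {
    ("interrogative", "asserting", None): "answer",
    ("interrogative", "questioning", None): "answer + check question",
    ("questioning declarative", "asserting", "same"): "agreement",
    ("questioning declarative", "asserting", "different"): "correction",
    ("questioning declarative", "questioning", "same"): "check question",
    ("questioning declarative", "questioning", "different"): "correction",
    ("asserting declarative", "asserting", "same"): "acknowledgement",
    ("asserting declarative", "asserting", "different"): "correction",
    # Assumed reading of the list: same referent gives a check question.
    ("asserting declarative", "questioning", "same"): "check question",
    ("asserting declarative", "questioning", "different"): "repair",
}


def classify_fragment(source, target, referent=None):
    """Return the speech act predicted for a fragment, or "unclassified"."""
    if source == "interrogative":
        referent = None  # the referent relation plays no role for these rows
    return SPEECH_ACTS.get((source, target, referent), "unclassified")


# Example (4): "¿Se fue con Pedro?" answered by "Con Mar."
print(classify_fragment("questioning declarative", "asserting", "different"))
```

A flat table like this makes it easy to tally speech acts per genre once each fragment has been annotated with the three properties.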
Spoken Language Corpora under
Examination
Hanna Hedeland ⇤ 1, Daniel Jettka 1
1 Hamburg Centre for Language Corpora, University of Hamburg – Germany
Contributing to the current discussion on reuse and citation of corpora and the replicability of
corpus-based research, this contribution describes evolving methods for corpus publication and
dissemination at a research data centre and presents an outline for a revised model of spoken
language corpora as complex dynamic linguistic resources.
Within emerging digital research infrastructures (e.g. CLARIN), digital repositories have been
set up for the dissemination of resources including spoken language corpora. While there are obviously many benefits to this current best practice approach, several questions regarding resource
type specific aspects of data modelling and versioning require an answer for its implementation.
By comparison to the previous web-based solution, this contribution discusses these questions
and their implications, which are highly relevant to research based on spoken language corpora.
Website resources
The vast majority of the resources hosted by the centre (cf. [1]) are XML data sets created
from heterogeneous legacy data using the EXMARaLDA system [2]. The EXMARaLDA Corpus
Manager (Coma) provides a basic data model for corpora comprising communications, speakers,
transcriptions, recordings and additional files, and the Coma metadata file itself.
For publication and dissemination, however, corpus-specific methods based on the EXMARaLDA system were used to create a number of export and (mainly HTML-based) presentation
formats (i.e. visualizations) and statistics from the source files, resulting in a much more comprehensive and complex resource. The protected resources were accessed via a public web
page containing further background information and documentation.
Repository resources
Since a digital repository enforces concepts such as persistent identifiers, versioning of digital objects and ingest/dissemination services, the initial corpus data model for the repository
comprised only the original source files (cf. [3]), whereas basic visualization and export functionality was implemented by generic web services provided via the repository.
This solution brought about two important differences: First, the resource is no longer a collection of static web pages and files; the user interacts with web services that change as target
formats or the services themselves are further developed.
Secondly, to allow for appropriate presentation of specific corpora (e.g. for research on (child)
language acquisition or regional varieties) by generic web services, the corpus type specific characteristics related to corpus design, annotation layers and transcription conventions need to be
explicated and applied as configuration parameters for resource dissemination.
⇤ Speaker
Discussion
Most importantly, the requirement of citable corpus versions makes it necessary to track explicitly
the versions of the web services and further components used for dissemination as well. As a
recent study [4] confirms, users of this type of corpora often mainly analyze visualized transcripts, whose characteristics are known to influence analysis (cf. [5]). Furthermore, while
merely applying corpus-specific parameters in web services is straightforward, the definition of
such parameters and the classification of spoken language corpus types require thorough investigation and interpretation. Such a typology can be used either to ensure a presentation consistent
with the original research questions and frameworks of various resources or, conversely, to allow
for a more consistent user experience by applying certain settings to various corpus types in
the repository. In our contribution we will discuss this revised and extended model of spoken
language corpora in more detail.
Keywords: spoken language corpora, replicability, research infrastructures
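The point made in the abstract above, that a citable corpus version also has to pin down the web services and parameters used to present it, might be modelled along these lines. This is a hypothetical sketch: the class `DisseminationSnapshot`, its fields and all example values are invented here and do not describe the EXMARaLDA data model or any repository API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DisseminationSnapshot:
    """What a user actually saw: corpus data plus the services that rendered it."""
    corpus_id: str
    corpus_version: str
    service_versions: tuple  # pairs such as ("visualizer", "1.2.0")
    parameters: tuple        # pairs such as ("transcription_convention", "A")

    def fingerprint(self):
        """Short, order-independent identifier that could accompany a citation."""
        payload = json.dumps([
            self.corpus_id,
            self.corpus_version,
            sorted(self.service_versions),
            sorted(self.parameters),
        ])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Invented example values.
snap_a = DisseminationSnapshot(
    "demo-corpus", "1.0",
    (("visualizer", "1.2.0"),),
    (("transcription_convention", "A"),),
)
snap_b = DisseminationSnapshot(
    "demo-corpus", "1.0",
    (("visualizer", "1.3.0"),),
    (("transcription_convention", "A"),),
)
print(snap_a.fingerprint(), snap_b.fingerprint())
```

The two snapshots share the same source files but differ in the visualizer version, so they receive different fingerprints, which is the distinction a citation of visualized transcripts would need to preserve.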
Strategies for Processing Large Corpora for
Linguistic Inquiry and Natural Language
Processing Tasks.
Antonio Moreno-Ortiz ⇤ 1
1 Universidad de Málaga (UMA) – Spain
Very large corpora (over a billion words) have become increasingly available to Corpus
Linguistics (CL) and Natural Language Processing (NLP) researchers. However, such text collections are offered with little or no filtering and processing of their content. This is a non-issue
for some tasks, such as KWIC concordancing or collocate extraction, due to the sheer volume of data available and, in some cases, the availability of web-based query environments. However, dealing with
the raw text to obtain accurate, linguistically-driven statistical information from such corpora,
with a view to using it for more advanced tasks, calls for some sophisticated pre-processing, in
terms of filtering and word tokenization. This basic step is critical to all others, since it involves
making such fundamental decisions as what a word is. This is even more relevant when a corpus
is compiled from on-line resources, which commonly include a fair amount of non-lexical
and pseudo-lexical items, such as common computer-mediated communication items (URLs,
handles, hashtags) as well as numbers, measures, formulas, etc. If no special treatment is given to such elements, they will certainly impact word frequency counts at all levels,
including part-of-speech frequencies, n-gram extraction, statistical language modeling, and, in
general, any task that builds on these.
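As a rough illustration of why such items matter, the sketch below labels non-lexical tokens with a few regular expressions and compares the type/token ratio before and after filtering them out. The patterns are deliberately simple assumptions, not the pre-processing proposed in the paper, and the sample sentence is invented.

```python
import re

# Illustrative patterns only; a real pipeline would need far broader coverage.
PATTERNS = [
    ("URL", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)),
    ("EMAIL", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("HASHTAG", re.compile(r"^#\w+$")),
    ("HANDLE", re.compile(r"^@\w+$")),
    ("NUMBER", re.compile(r"^\d+([.,]\d+)*$")),
]


def classify(token):
    """Return the first matching non-lexical label, or "WORD"."""
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "WORD"


def type_token_ratio(tokens):
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented sample text with whitespace tokenization.
text = ("Check http://example.com and http://example.org "
        "#corpus #nlp @anna 42 1,500 words words words")
tokens = text.lower().split()
lexical = [t for t in tokens if classify(t) == "WORD"]

print("all tokens:  ", round(type_token_ratio(tokens), 2))
print("lexical only:", round(type_token_ratio(lexical), 2))
```

Because URLs, hashtags and numbers tend to be unique strings, leaving them in inflates the number of types, so the ratio computed over all tokens is higher than the one computed over lexical words alone.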
Determining the frequency of word classes accurately, as determined by part-of-speech assignment, is critical to a number of common corpus linguistics metrics, such as lexical density. In
this work we examine the role that certain non-lexical and pseudo-lexical items (e.g. cardinal
numbers, hashtags, URLs, e-mail addresses) play in currently available corpora obtained from
the Web. Specifically, we will focus on GloWbE (Davies, 2013), a large corpus (1.9 billion words),
available both for online queries and as a full-text download in different formats, including a
tokenized, part-of-speech tagged, lemmatized version. We show that in such web-based corpora,
non-lexical items exhibit high frequency, and therefore should be given special treatment in
order to obtain adequate statistics for common corpus linguistic metrics, such as type/token ratio, word class frequency, and those that are derived from these. We then propose certain cues
for the proper treatment of such corpora, in terms of pre-processing, tokenization and part-of-speech
tagging. During this process, we identified certain pre-processing flaws in the original
corpus that led to inaccurate results, and propose ways to overcome them.
Finally, we describe the results of our segmentation and part-of-speech tagging processing, and
compare them with those given by the original tagged version of the GloWbE corpus, and go
on to show the impact that different preprocessing approaches have on certain types of corpus
queries, as well as n-gram extraction.
References
Davies, Mark. (2013) Corpus of Global Web-Based English: 1.9 billion words from speakers in
20 countries. Available online at http://corpus.byu.edu/glowbe/
⇤ Speaker
Keywords: large corpora, corpus processing, tokenization, part-of-speech tagging
Students’ use of the n-grams tool to learn
about phraseology in academic writing
Maggie Charles ⇤ † 1
1 Oxford University – United Kingdom
This paper focuses on the use of recurring multi-word units (MWUs) that are fixed or semifixed in form. In academic writing, MWUs have been investigated using various terms, including
‘lexical bundles’ (Biber et al., 1999; Cortes, 2004) or ‘clusters’ (Hyland, 2008a) and research has
shown that their occurrence differs according to discipline (Hyland 2008b). Moreover, there are
considerable discrepancies in MWU use between learner and expert academic writing (Cortes,
2004; Hyland, 2008a), with learners typically employing different MWUs from expert writers
and/or using them for different purposes. Thus the use of MWUs presents challenges to learners of English for Academic Purposes and there is a consequent need for even advanced-level
students to develop proficiency in academic phraseology (Gilquin et al., 2007).
The present paper aims to address this issue by investigating students’ use of the n-grams tool in
the AntConc software (Anthony 2014). The n-grams tool makes a list of all sequences of words
that occur in a corpus, with the number of words in the sequence being determined by the user.
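What an n-grams list of this kind contains can be shown with a few lines of Python. This is a generic sketch of n-gram counting, not AntConc's implementation; the function name `ngram_counts` and the sample sentence are invented for the example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample text; n is chosen by the user, here n = 3 (tri-grams).
text = "in terms of the fact that the results as well as the fact that"
tokens = text.lower().split()
trigrams = ngram_counts(tokens, 3)

for gram, freq in trigrams.most_common(3):
    print(" ".join(gram), freq)
```

Sorting the counts by frequency is what brings recurrent sequences to the top of the list, where a learner can notice them.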
This study draws on students’ work during a 6-week, 12-hour course on ‘Editing your Thesis
with Corpora’. For this course, doctoral students built two do-it-yourself corpora: 1) a corpus of
expert writing constructed from research articles (RAs) in their own field; 2) a corpus of learner
writing consisting of draft chapters of their own doctoral thesis. Thus each student worked with
two corpora tailored to their own specific needs. In the session on n-grams, students were shown
the AntConc n-grams tool and each learner made an individual list of three-word sequences
(tri-grams) from their corpus of expert RA writing. As the retrieval process of n-grams is automatic, it was hypothesised that the tool would help students to identify the tri-grams used
in their own field and thus provide a means of highlighting appropriate academic phraseology.
Students were then asked to study the most frequent tri-grams on the list and to perform further
corpus searches to understand and explain what they noticed, comparing where necessary the
findings from the expert corpus with those from their own writing.
The data used in this paper currently consist of the corpora constructed by 15 students and
the worksheets completed by them in class, giving details of the most frequent tri-grams they
found and commenting on what they learnt from their findings. The most frequently mentioned
tri-grams were ‘as well as’ (found by 11 learners), ‘in terms of’ (6 learners), and ‘the fact that’, ‘the
effect(s) of’ and ‘in order to’ (4 learners each). Following the categorisations of the ‘Academic
Formulas List’ (Simpson-Vlach & Ellis, 2010), ‘as well as’, ‘the effect(s) of’ and ‘in order to’ have
discourse organizing functions, while ‘the fact that’ and ‘in terms of’ are referential expressions.
After further investigations, students often commented on differences they found between their
writing and that of the experts. For example, after researching ‘the fact that’, one student noted
that she used ‘due to the fact that’, which did not appear in the RA corpus, where ‘despite the fact
that’ was prevalent. This paper reports in more detail on the student data and argues that the
n-grams tool provides a useful way of promoting the noticing and understanding of academic
phraseology at an advanced level.
⇤ Speaker
† Corresponding author: [email protected]
Keywords: academic writing, n-grams, academic phraseology, corpus tools, EAP learners
Teachers’ Dispositions Towards the Use of
Corpus-Based Approaches in Teaching
English as a Foreign Language in Higher
Education
Awatif Alruwaili ⇤ † 1
1 University of Nottingham – United Kingdom
Despite the development and increased use of corpora as a resource in language learning,
little evidence exists that corpora are used as alternatives to textbooks and traditional resources
such as dictionaries (Chambers, 2005). Corpora use has not changed significantly since Chambers’s (2005) article, as revealed by later studies such as Boulton (2010) and Römer (2009).
Published research has shown improvements in learner performance and positive attitudes in
higher education, providing wide support for the use of a corpus approach in an English as a
foreign language (EFL) context. Nonetheless, implementing this approach in daily teaching is
still a distant goal. Many researchers (e.g. Boulton, 2009, 2010; Hughes, 2012; Römer, 2009)
have shown concern regarding the infrequent use of corpora in everyday classrooms. Several
authors have also confirmed the key role that teachers play in applying the corpus approach in
language teaching (Frankenberg-Garcia, 2012).
The present study sought to widen the existing perspective on using corpora in language classrooms given previous research’s promising results on the importance of investigating teachers’
attitudes towards the corpus approach. Their willingness to apply it is clearly a necessary step
in popularising this approach. This study was particularly interested in ways to transform
classrooms into learning environments that truly facilitate the use of corpus-based approaches
for learning English in an EFL context. This transformation can be facilitated by introducing
teachers to corpus-based approaches and their applications in teaching English, which could help
to inform language instructors and shape their attitudes.
This study’s aim, therefore, was to explore teachers’ dispositions towards the use of corpora
in language classrooms. Only two previous studies have examined in-service instructors’ attitudes towards corpus-based approaches to teaching (Mukherjee, 2004; Tribble, 2015). To this
end, the present study’s first phase involved designing an introductory course to show language
instructors possible ways of using corpora in the classroom. Next, I evaluated in-service teachers’ attitudes towards classroom uses of the corpus approach by developing and administering a
questionnaire. Finally, I identified possible factors that can affect instructors’ opinions of using
corpora in the EFL classroom.
The introductory course consisted of two sessions, each of which ran for one hour and 30 minutes with a 15-minute break. The sessions were offered multiple times to accommodate teachers’
availability. The course content consisted of three units: teaching about corpora, exploiting corpora to teach language and teaching to exploit corpora. The participants were 57 in-service
teachers who worked in higher education programmes.
⇤ Speaker
† Corresponding author: [email protected]
An exploratory design was selected for developing the questionnaire, in which a semi-structured
interview was used to generate material on themes and list possible variables in addition to
those found in the related literature. The questionnaire covered five themes related to corpora
uses in the classroom, including usefulness, difficulty, practicality, confidence and anxiety, and
implementation.
The tri-component model of attitude was used as the theoretical framework for constructing the
questionnaire because this model is widely known and accepted by many researchers (Vandewaetere & Desmet, 2009). This framework consists of three elements that provide a comprehensive view of attitudes – in this case, towards corpus use in the language classroom – by capturing
the three components of attitudes: cognitive, affective and behavioural. Overall, teachers had
moderate to positive attitudes.
Keywords: Corpus-based approaches, in-service teachers, classroom
The Developmental Relationship between
Spoken and Written Clause Packaging in an
English Secondary School
Mark Brenchley ⇤ 1
1 Graduate School of Education, University of Exeter – 216 Baring Court University of Exeter St Luke’s Campus Heavitree Road Exeter EX1 2LU UK, United Kingdom
This paper will detail the findings of a fresh study into the relationship between L1 spoken
and written syntax during the secondary phase of the English education system, situating them
within the context of other recent studies into L1 development during the school years and discussing their implications for L1 English curricula.
Working within a framework of "linguistic literacy" and a wider model of "rhetorical" competence, according to which L1 speakers and writers must not only learn the core forms of a
language but also develop the capacity to effectively put these forms to work across a range
of literate contexts (Berman & Ravid, 2009; Ravid & Tolchinsky, 2002; cf. Biber, 1988, 1992;
Hymes, 1976), the aims of the present study were twofold. Firstly, to provide a better understanding of the relationship between spoken and written syntax during an apparently critical
period in the development of L1 English (Berman & Ravid, 2009; Myhill, 2009). Secondly, to
provide evidence that can better inform and support contemporary L1 English curricula, which
are increasingly emphasising the explicit teaching of grammar (ACARA, 2016; DfE, 2014).
To this end, a bespoke corpus of 180 pairs of spoken and written L1 expository discourse was
directly elicited from students attending a mainstream secondary school in Southern England.
The corpus was further designed so as to be balanced across two developmental axes: (a) the
year group of the student, and (b) their National Curriculum attainment level. This corpus was
then analysed in terms of the students’ modality-related use of clause packaging, construed here
as comprising the various means by which clauses are combined via coordination and subordination (cf. Berman & Slobin, 1994).
So analysed, the study indicates adolescent students at the present age and attainment levels to be at a stage where they can and do differentiate their modality-related syntax, at least
for these texts and measures. It also found this differentiation to be something that varied according to the particular kind of packaging measured. Thus, the spoken texts exhibited a greater
number of t-units per t-unit complex and clauses per t-unit, together with a greater prominence
of finite adverbial and post-verbal complement clauses. Conversely, the written texts exhibited
a greater overall prominence of non-finite clauses, whilst neither modality was distinguishable
in terms of either clause length or their respective proportions of relative clauses and phrasal
clauses. Finally, this differentiation was found to be developmentally static, with participants
handling their modality-related syntax in much the same way regardless of their age or attainment level.
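Two of the packaging measures named above, t-units per t-unit complex and clauses per t-unit, are simple ratios over an annotated text. The sketch below shows one way to compute them; the nested-list representation, the function name `packaging_measures` and all the numbers are invented for illustration and are not the study's data or coding scheme.

```python
def packaging_measures(text):
    """Compute two clause-packaging ratios for one annotated text.

    `text` is a list of t-unit complexes; each complex is a list of
    t-units; each t-unit is represented by its number of clauses.
    """
    t_units = [clauses for complex_ in text for clauses in complex_]
    return {
        "t_units_per_complex": len(t_units) / len(text),
        "clauses_per_t_unit": sum(t_units) / len(t_units),
    }


# Invented annotations for one spoken and one written text.
spoken = [[2, 3, 1], [2, 2]]
written = [[1, 2], [2]]

print("spoken: ", packaging_measures(spoken))
print("written:", packaging_measures(written))
```

With per-text values like these for every spoken and written pair, the modality comparison reduces to comparing two distributions of ratios across year groups and attainment levels.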
Overall, these findings are interpretable in terms of the participants tapping into the differential
production conditions of speech and writing, but without necessarily fully exploiting these conditions (Biber, 1988, 1992). Moreover, when placed in the context of the wider evidence base
(Berman, 2008; Myhill, 2008; Nippold, 2007; Nippold & Scott, 2010; Ravid & Tolchinsky, 2002),
the findings suggest two additional conclusions. Firstly, they indicate students at the present
age and attainment levels to be at a stage where their syntactic output is more in line with the
discourse of mature speakers and writers. Secondly, they indicate modality to be an aspect of
student syntax that is characterised by a potentially high degree of sensitivity to the various
communicative features of the wider discourse context.
⇤ Speaker
Keywords: Education, English, L1, Later Language Development, Modality, Register Variation,
Syntax
The Psycholinguistic Profile of Domestic
Abusers: A Corpus-Based Approach
Ángela Almela ⇤ 1, Gema Alcaraz-Mármol 2, Pascual Cantos † 3, Carole Chaski 4, Clara Pallejá 5
1 Centro Universitario de la Defensa - UPCT (CUD) – Centro Universitario de la Defensa. Base Aérea de San Javier C/ Coronel López Peña s/n, 30720, Santiago de la Ribera, Murcia, Spain
2 Universidad de Castilla la Mancha (UCLM) – Spain
3 Universidad de Murcia (UM) – Spain
4 Institute for Linguistic Evidence (ILE) – United States
5 Centro Universitario de la Defensa - UPCT (CUD) – Spain
Gender-based violence is receiving close attention from professionals and researchers in
the legal, criminal and psychological fields, who explore several aspects related to both the victim
and the abuser. In some cases, the phenomenon of gender-based violence shows the direct relationship between language and society. In fact, some stylistic methods show how social structures
and language are interwoven through the abuser’s discourse. However, the language produced
by those involved in gender-based violent acts has hardly been explored from a computational-linguistic perspective (Almela, Alcaraz-Mármol & Cantos, 2015; Hancock et al., 2011).
This paper presents a pilot study of di↵erentiating the language of domestic abusers from a control group. The domestic abusers have been convicted of a violent crime in the domestic context,
while control group members have not. The main aim is to shed some light on the gender-based
abuser’s psycholinguistic profile in the Spanish language from an empirical viewpoint, in the
light of the scientific practices promoted by Chaski (2013). This profile is meant to establish the
underpinnings for a database which will be compared to other criminals’ speech. Our research
is still at the initial stage, but we have already designed the methodology for the analysis of the
morphological characteristics in the gender-based abuser’s discourse, as opposed to the speech
of those convicted of other sorts of crimes and of a control group. Specifically, the linguistic
sample for our analysis corresponds to written interviews completed by subjects who have
been accused and/or convicted of gender-based abuse. The computational analysis
involves several stages, such as POS-tagging, punctuation tagging and the evaluation of
markedness, as well as the assessment of lexical choice and the identification of morphosyntactic patterns, which will allow us to distinguish the abuser’s sublanguage
from that of the control group. Thus, the results of analyzing the two groups’ linguistic
behavior in writings responding to the same stimuli are presented. Further, results of clustering
and classification to determine the statistical reliability of differentiating the language of domestic abusers are presented.
The authors will also comment on some of the obstacles encountered in data collection,
which have made it difficult to keep to the work schedule initially planned,
and will show how the use of language as evidence in the framework of forensic linguistics in
Spain is still in its infancy.
* Corresponding author
† Speaker
REFERENCES
Almela, A., Alcaraz-Mármol, G. and Cantos, P. (2015). Analysing deception in a psychopath’s
speech: a quantitative approach. DELTA 31 (2): 559-572.
Chaski, C.E. (2013). Best practices and admissibility of forensic author identification. Journal
of Law and Policy 21 (2): 333–372.
Hancock, J. T., Woodworth, M. T. and Porter, S. (2011). Hungry like the wolf: A word-pattern
analysis of the language of psychopaths. Legal and Criminological Psychology 2011, 1–13.
Keywords: domestic abusers, forensic linguistics, psycholinguistic profile, clustering, classification
The XML Annotation of A Corpus of
Historical English Law Reports 1535-1999:
A Progress Report
Paula Rodríguez-Puente* 1
1 University of Oviedo – Spain
A Corpus of Historical English Law Reports (CHELAR; Rodríguez-Puente et al. 2016) is a
specialised corpus consisting of law reports dating from the period 1535-1999. Law reports are
records of judicial decisions which are "cited by lawyers and judges for their use as precedent in
subsequent cases" (Encyclopædia Britannica Online s.v. law report); they typically contain an
account of all the facts of the case, the arguments of the judge, his reasoning, the judgment he
arrives at and the kind of authority and evidence he uses. The corpus contains approximately
half a million words. It is structured into nine periods of 50 years each, except for the first
subperiod, which covers from 1535 to 1599. It is already available as plain text and with POS
annotation (CLAWS C7; see Garside 1987). In previous work we described the first difficulties
encountered during the process of creating the corpus texts as well as the editorial decisions that
were initially taken (Rodríguez-Puente 2011); Fanego et al. 2017 provide an account of the final
structure of the corpus and the type of documents it contains together with a description of the
process of compilation of the raw and POS-annotated texts. In this presentation we report on the
process of XML annotation of the corpus. CHELAR is currently being annotated following the
Text Encoding Initiative P5 Guidelines for Electronic Text Encoding and Interchange developed
by the Text Encoding Initiative Consortium (Bray et al. 2008). TEI XML encoding has become
the standard practice adopted in digitally based humanities research for present-day English
and diachronic corpora. More precisely, we focus on the particular structure and contents of
law reports and the specific XML tags used for our purposes. We advocate a modest XML
tagging which includes some renditional (e.g. italics), structural (paragraphs, line breaks, page
breaks, etc.) and conceptual (foreign words, proper names, names of cases, etc.) features of
the texts. In sum, although the annotation possibilities of the TEI-XML schema are infinite,
we selected only those tags that satisfy the needs of our texts, yet at the same time facilitate a
varied range of corpus analyses. An account of the decisions made will be provided in this paper,
together with a progress report on the annotation process itself. At present we have completed
the annotation of the two most recent subperiods (1900-1949 and 1950-1999) and we hope to conclude the
annotation of the whole corpus by the end of 2017.
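As a rough illustration of the "modest" tagging described above, the following Python sketch builds one TEI-style paragraph carrying a renditional tag, a structural tag and two conceptual tags. The element names (p, hi, foreign, name, pb) are standard TEI P5 elements, but the sentence, the case name and the attribute value type="case" are invented for illustration and are not taken from CHELAR or its actual tag set.

```python
import xml.etree.ElementTree as ET


def build_sample_paragraph() -> ET.Element:
    """Build one TEI-style paragraph with a few illustrative tags."""
    p = ET.Element("p")
    p.text = "The court held, in "

    # Conceptual feature: a case name (type="case" is an assumed value).
    case = ET.SubElement(p, "name", {"type": "case"})
    case.text = "Smith v. Jones"  # invented case name
    case.tail = ", that the plea was "

    # Conceptual feature: a foreign (Latin) phrase.
    foreign = ET.SubElement(p, "foreign", {"xml:lang": "la"})
    foreign.text = "res judicata"
    foreign.tail = " and "

    # Renditional feature: italics in the printed source.
    hi = ET.SubElement(p, "hi", {"rend": "italic"})
    hi.text = "therefore"
    hi.tail = " barred."

    # Structural feature: a page break at the end of the paragraph.
    ET.SubElement(p, "pb", {"n": "2"})
    return p


xml_string = ET.tostring(build_sample_paragraph(), encoding="unicode")
print(xml_string)
```

Keeping the tag inventory this small means that queries such as "all foreign phrases in a given subperiod" reduce to a single element lookup.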
Keywords: corpus annotation, XML, law reports
* Speaker
The construction of shared feelings: analysis
of affect in a corpus of obituary comments
in online newspapers
Isabel Corona* 1
1 Universidad de Zaragoza (UNIZAR) – Facultad de Filosofía y Letras, Pedro Cerbuna 12, 50009 Zaragoza, Spain
The comments section in online newspapers consists of a slot found below an article’s body
text where readers may post their opinion following that piece of news. Comment boards were
offered by online newspapers a decade ago to engage readers in the news process, thus creating
a new context for expression and engagement (Yzer and Southwell 2008) within the general
‘connecting’ mantra.
Journalistic obituaries, with a long-standing tradition in all sorts of newspapers, are life stories
seen in retrospect. They are narratives of lives with a purpose established by the newspaper,
either to praise or condemn, becoming a lesson of life that guides or reinforces the values of a
community of readers who are supposed to share the same socio-cultural or political principles.
Thus, evaluation of the subject has been an intrinsic feature of obituaries. The subjects’ lives are
sanctioned as complying with or deviating from role-specific parameters, in such ways that they
construe a particular version of collective memory, reflecting the values of the media institution.
This collective memory can now be challenged by the new media affordances that open up the
space for individual reactions to that memory. By using the comments section, which could be
viewed as a new ‘social tool’, prior readers become co-participants in the coproduction of the
text’s meanings” (Page and Thomas 2011: 10): they may react emotionally to the subject’s
behaviour and public legacy as a role model, and get an immediate response from
other participants. The users’ discursive acts, although separated from the main text, construe
another discursive context that may or may not agree with the newspaper’s assessment of the
subject.
The main aim of this study is to explore the commentators’ use of evaluative expressions for
the construction of affect towards a life story of a public persona in the digital media, in order
to assess the way media users establish a new space for shared feelings. For this purpose, the
corpus comprises 840 comments which appeared in the obituaries published by five online newspapers (Daily Mail (UK), The Daily Telegraph (UK), The Guardian (UK), the Huffington Post
(USA edition), and the Washington Post (USA)) after the death of the Spanish Duchess of Alba.
The study is grounded in Collective Memory as an umbrella concept that "defines relations
between the individual and the community to which she belongs and enables the community to
bestow meaning upon its existence" (Neiger et al. 2011: 4). The analysis applies the framework
proposed by Appraisal Theory (Martin 2004; Martin and White 2005; White 2001), to explore
the attitudinal values used to construe a community of shared values. The present analysis
focuses on the attitudinal realm of "Affect", mapping the commentators’ reactions in terms
of happiness, admiration, satisfaction, desire and solidarity towards the obituarised subject.
The analysis of explicit attitudinal instantiations of Affect reveals a clearly positive emotional
response of readers turned into users, with prototypical expressions of sorrow (so productive
in the construction of community identity) and a high frequency of desiderative expressions
operating as ritual formulae. All of these are features that obituarists refer to as "dread clichés"
(Massingberd 1995: viii) and that are banned in all quality newspapers; they challenge tacitly accepted
norms with respect to what is considered good obituary writing.
* Speaker
Keywords: Collective Memory, obituaries, online comments, Appraisal, Computer Mediated Communication (CMC)
The implied consumer in British hotel
websites
Carmen Gregori-Signes* 1
1 IULMA. UNIVERSITAT DE VALENCIA (IULMA. UV) – Facultat de Filologia, Blasco Ibañez 32, 46010 Valencia, Spain
Hotel websites are a discourse type within etourism that intertwines textual and visual strategies
(cf. Cheng 2016) with the primary purpose of persuading website visitors to become customers.
This paper focuses on the interpersonal rhetorical functions of engagement, i.e. the lexicogrammatical choices (cf. Hyland 2005) that hotel website designers use as a strategy to create
a bond between the addresser (i.e. the hotelier) and the addressees (i.e. the potential clients),
in the framework of a ‘business to consumer’ (B2C) marketing practice in ecommerce. As a
framework for the analysis, the paper adopts Stern’s (1994) interactive communication model
and focuses on the implied consumer, i.e. the construct of the imagined consumer within the
message, and how the relationship between both is discursively established. This involves looking at metadiscourse, which Hyland and Jiang (2016: 3) described as "(the) linguistic material
referring to the evolving texts and to the writer and imagined reader of that text." As Hunston (2011: 24) puts it, "metadiscourse is subsumed entirely under the concept of interaction
or engagement between writer and reader." The corpus analysed comprises 114 British hotel
websites, and amounts to half a million words. This is part of COMETVAL, a large database of
over 7 million words, compiled by researchers at the University of València, and contains samples
of tourism websites in three languages: French, Spanish and English. The results obtained in
the analysis indicate the existence of patterns whose relevance already becomes apparent in
an initial keyword analysis of the corpus: among the top keywords one can find the personal
pronoun you (subject and object) and its corresponding possessive your as explicit reference
to the implied consumer. Further observation by means of concordancing and manual scrutiny
also pointed towards the need to include directives as a relevant feature of engagement (Hyland
2005). Directives are often conveyed by means of imperatives and cannot be detected through
keyword analysis and ordinary morpho-syntactic tagging. The results of the quantitative and
qualitative analysis seem to indicate that copywriters rely on a set of specific conditional
constructions built around the subject personal pronoun you, and, in some cases, directives.
These structures were further explored and classified into different subsets, which brought out
a set of lexico-grammatical patterns that reflect the textual choices that hoteliers use in their
attempt to anticipate the needs and wishes that potential customers may have. These needs,
they claim, can be satisfied by the products and/or services that hoteliers offer. It is our view
that such rhetorical features of engagement distinguish the discourse of hotel websites from other
kinds of promotional discourse. These patterns are genuine cases of engagement, key
rhetorical features of hotel-owned websites (AUTHOR 2, 2014).
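The initial keyword analysis mentioned above can be sketched as follows. The abstract does not say which keyness statistic or reference corpus was used, so the log-likelihood measure, the function names and the two toy texts below are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness for one word across two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study_tokens, ref_tokens, top_n=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Keep positive keywords only (over-represented in the study corpus).
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, ref[word], n_study, n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented toy texts standing in for a hotel-website corpus and a reference corpus.
study_text = "you will love your room if you book your stay with us".split()
ref_text = "the committee will review the report if the board agrees with the plan".split()
print(keywords(study_text, ref_text))
```

Even in this toy setting, you and your rise to the top of the list, which mirrors the kind of pattern the abstract reports for the real corpus.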
* Speaker
Keywords: discourse, engagement, corpus linguistics, conditionals, advertising, etourism,
hotel websites
The power of English: I and we in ELF and
in ENL academic discourse
Jolanta Sinkuniene* 1
1 Vilnius University (VU) – Lithuania
Within the last several decades, numerous cross-disciplinary and cross-linguistic studies of
research writing have confirmed interesting trends in the ways knowledge is reported in different science fields and different cultures (Berkenkotter & Huckin 1995; Fløttum et al. 2006; Hyland
2008; Lorés-Sanz et al. 2010, inter alia). In those studies, author stance or author voice (Hyland & Sancho Guinda 2012) is the key element of investigation as it proved to play a very
important role in creating persuasive discourse which shapes disciplinary and cultural identities.
In cross-linguistic studies of research writing, the comparative axis is frequently drawn between
English vs other academic cultures trying to establish the level of similarity or divergence in
the expression of author stance. At the same time the question of the influence of English on
other academic cultures has become of crucial importance leading to the debate about the role
of English in the global research arena: the role of a common, unifying language of science or the
Tyrannosaurus rex (Swales 1997) responsible for the "epistemicide" (Bennett 2007) of smaller
cultures.
One of the most obvious elements of author stance manifestation is personal pronouns. The use
of I and we in academic discourse has been acknowledged as one of the most powerful means
to mark author stance (Harwood 2005; Hyland 2001, inter alia). Numerous empirical studies
confirm substantial differences in personal pronoun use depending on the cultural background
of the writer (for an overview see Mur-Dueñas & Šinkūnienė 2016). There is less research which
attempts to investigate the ways personal pronouns are used in English as a Lingua Franca by
non-native English speakers in comparison to their writing in native languages. The aim of the
present study therefore is to analyse the use of personal pronouns in linguistic research articles
written by Lithuanian scholars in Lithuanian and by the same scholars in English, and to compare patterns of use with those of native English speakers. The study employs corpus-based
contrastive methodology as well as quantitative and qualitative analysis. The data comes from
a self-compiled corpus of 36 single-authored research articles. For the Lithuanian data 12 pairs
of research articles written by the same scholar in English and in Lithuanian were selected. For
the English sub-corpus, 12 articles written by British linguists were chosen. The quantitative
analysis looks at the frequency distribution of I and we and their morphological forms in those
three sub-corpora. The qualitative analysis investigates the range of functions that personal
pronouns perform in Lithuanian, Lithuanian English and British English texts. For this purpose, all combinations of a personal pronoun with the verb have been analysed in context to
determine the function they perform.
The results suggest that most Lithuanian scholars choose a more explicit author stance expression when they write in English rather than in Lithuanian, though the frequency and functions
of I and we in English native speakers’ texts are different. English native speakers choose more
argumentative verbs to express author stance with personal pronouns; they also frequently shift
from I to we and in this way create more persuasive discourse and closer links with the audience
than Lithuanian scholars.
* Speaker
Keywords: academic discourse, personal pronouns, cross linguistic, quantitative analysis, qualitative analysis
The textual colligation of stance phraseology
in cross-disciplinary academic discourses:
the timing of authors’ self-projection
Louisa Buckingham*† 1, Jihua Dong*‡ 1
1 University of Auckland – New Zealand
Lexical items, according to Hoey (2005, p.13), "are primed to occur in, or avoid, certain positions within the discourse". An analysis of textual colligation, the term Hoey (2005) uses to
denote such priming, explores the textual position of linguistic markers in relation to textual
structures, and the interaction between textual position and discourse function. Recent studies
have enriched our understanding of the textual colligation of particular words or phrases, such as
keywords or key phrases in a text (e.g., Hoey & O’Donnell, 2008; Mahlberg, 2009; O’Donnell et al., 2012).
This study investigates the textual colligation of a type of
linguistic marker typical of one particular semantic group, namely, stance.
This quantitative study investigates the textual colligation of the stance phrases in academic
discourse in the disciplines of agriculture and economics. The study employs a purpose-built
corpus of 655 published research articles totalling around 3 million tokens. We use WordSkew
software (Barlow, 2016) to investigate the position (or colligation) of stance phrases at the level
of sentence, paragraph and text, and examine the existence of disciplinary variation with respect
to the textual colligation of these phrases.
The results show that significant differences exist in the distribution of stance phrases in different textual positions (sentence, paragraph and text) in the two disciplines. Nevertheless, the
proportion of stance phrases in each of the three textual positions is notably similar in the two
disciplines. It may be inferred that the textual position of particular stance phrases may be a result of the type of routinized discourse or communicative function these serve (Hoey, 2005). The
findings regarding the textual position of the stance phrases consolidate Hoey’s premise that
certain expressions are primed to occur in or avoid particular textual positions. In addition, the
study revealed that the phrases of a particular function tend to share some positional similarities
with regard to their distribution in sentence, paragraph and the whole text. From a communicative viewpoint, the appropriate positioning of stance phrases in a text supports authors in
constructing a discourse-appropriate persona, interacting with envisaged readers, and achieving their
communicative objectives.
The use of WordSkew has contributed to revealing the text positions at the sentence, paragraph, and text level. It provides an efficient way to quantify the textual position of particular
linguistic features, and contributes to visualising the distribution of particular linguistic features
in the organization of a text.
* Speaker
† Corresponding author
‡ Corresponding author
Barlow, M. (2016). WordSkew: Linking corpus data and discourse structure. International
Journal of Corpus Linguistics, 21 (1), 105–115.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Hoey, M., & O’Donnell, M. B. (2008). Lexicography, grammar, and text position. International Journal of Lexicography, 21 (3), 293–309.
Mahlberg, M. (2009). Local text functions of move in newspaper story patterns. In U. Römer
& R. Schulze (Eds.), Exploring the lexis-grammar interface (pp. 265–287). John Benjamins.
O’Donnell, M. B., Scott, M., Mahlberg, M., & Hoey, M. (2012). Exploring text-initial words,
clusters and concgrams in a newspaper corpus. Corpus Linguistics and Linguistic Theory, 8 (1),
73–101.
Keywords: textual colligation, stance phrases, academic disciplinary variation, academic writing
Towards an extended lexical grammar:
Complex colligational patterns of the noun
cause
Moisés Almela Sánchez* 1, Pascual Cantos Gómez*† 1
1 University of Murcia – Spain
It has become a truism that lexis and grammar are intertwined and that grammatical choices
are bound to lexical items. The notion of lexical grammar is well established in several frameworks of modern linguistic research, and corpus-driven linguistics is no exception in this
respect (see, for instance, Francis (1993) and Hunston and Francis (2000)). This research is aimed
at extending the scope of description of lexico-grammatical co-selections, more specifically at
identifying certain forms of coordination of lexical and grammatical features that are more complex, and also subtler, than the cases of lexico-grammatical co-selection usually described in the
literature.
Theoretically and methodologically, the study builds on research into lexical constellations (Cantos & Sánchez, 2001; Almela, 2011; Almela et al., 2013), which has provided evidence that the
strength of association between a node and a collocate can be influenced by elements outside
the pair, particularly by dependencies among different collocates of a node. For instance, the
association of the verb face and the noun decision is strengthened by the presence of modifiers
of a specific semantic set (e.g., hard, difficult, tough). Previous studies have focused on the
implications of this phenomenon for the analysis of word meaning. The methodology was based
on comparisons of conditional probabilities between bigrams and trigrams formed by previously
extracted significant collocates of a node. The present study adapts the methodology of lexical
constellation analysis to the description of dependencies between different colligational patterns
(i.e. preferred grammatical contexts) of a word. The node under investigation is the noun cause,
and the corpus used is enTenTen2013, a large-scale web corpus of English. This corpus contains
19,717,205,676 tokens and is accessible at Sketch Engine.
The methodology will be organized in two main steps. In the first one we will compare the
conditional probabilities of different grammatical contexts of the node. The goal of this first
step is to determine whether the presence of a particular grammatical category in the context
of the node increases or decreases the probability of another grammatical category in a different
position. More specifically, we will observe possible dependencies between the slots ‘premodifier’
and ‘of-postmodifier’. In a second step, we will compare the behaviour of these two slots across
different collocations of the node. In particular, we will analyse their distribution in collocations
of cause with a list of top logDice collocates.
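The first step can be sketched in Python as a comparison between the overall probability of an of-postmodifier and its probability conditioned on the presence or absence of a premodifier. The function names and the twenty toy observations are invented for illustration and do not come from enTenTen2013; the log_dice helper follows the commonly published definition of the logDice score, 14 + log2(2·f_xy / (f_x + f_y)).

```python
import math
from collections import Counter


def slot_dependency(observations):
    """Compare P(post) with P(post | pre) and P(post | no pre).

    observations: one (has_premodifier, has_of_postmodifier) pair of
    booleans per occurrence of the node.
    """
    counts = Counter(observations)
    n = len(observations)
    with_pre = counts[(True, True)] + counts[(True, False)]
    without_pre = n - with_pre
    return {
        "p_post": (counts[(True, True)] + counts[(False, True)]) / n,
        "p_post_given_pre": counts[(True, True)] / with_pre,
        "p_post_given_no_pre": counts[(False, True)] / without_pre,
    }


def log_dice(f_xy, f_x, f_y):
    """logDice association score from co-occurrence and marginal frequencies."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented toy sample: 20 occurrences of the node with their slot fillers.
sample = (
    [(True, False)] * 6      # premodifier only
    + [(True, True)] * 2     # both slots filled
    + [(False, True)] * 8    # of-postmodifier only
    + [(False, False)] * 4   # neither slot filled
)
print(slot_dependency(sample))
```

In this toy sample the of-postmodifier is less likely when a premodifier is present (0.25) than overall (0.5), which is the kind of dependency between slots that the first step is designed to detect. The sketch assumes that both conditions occur at least once in the data.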
Two main conclusions are drawn from the results. The first one is that there are dependency
relations between the two grammatical slots investigated in the environment of cause (‘premodifier’ and ‘of-postmodifier’). The second one is that the dependency relations observed between
grammatical slots are contingent on specific collocations of cause. The dependencies observed
do not exhibit the same behaviour with all the verbal collocates of the node. In general, these
results point towards an influence of collocation on the co-occurrence probabilities of different
colligations of cause.
* Speaker
† Corresponding author
Keywords: collocation, colligation, lexical priming, semantic preference.
Characterization techniques for
female characters in Galdós: a
corpus-based approach
Guadalupe Nieto* 1
1 Universidad de Extremadura - Uex (SPAIN) – Spain
This paper uses a corpus study to explore body language in
the novels of Benito Pérez Galdós and, more precisely, the patterns the
novelist employs to outline the personality of his female characters. The study covers the writer's
complete prose works, which total around 6.2 million words. Special attention
is paid to systematically used constructions of at least five words (clusters)
that contain one of the following body parts: cabeza (head),
espalda (back), hombros (shoulders), manos (hands) or ojos (eyes). This characterization device has been analysed in
English-language writers such as Dickens (Mahlberg, 2013; Ruano San Segundo, 2016) and Jane
Austen (Fischer-Starcke, 2010). Body language, as Korte (1997: 4) notes and as will be shown,
stands as an autonomous system in the construction of the fictional universe in the novel.
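The cluster search described above can be approximated with a few lines of Python. This is only a sketch under stated assumptions: the study itself used WordSmith Tools, the function name and frequency threshold are hypothetical, clusters are fixed at exactly five words for simplicity, and the two-sentence sample text is invented around the expression "el pañuelo a los ojos" cited in the abstract.

```python
import re
from collections import Counter

# The five body-part words named in the abstract.
BODY_PARTS = {"cabeza", "espalda", "hombros", "manos", "ojos"}


def body_part_clusters(text, n=5, min_freq=2):
    """Return recurrent n-word clusters that contain a body-part word.

    Sentence boundaries are ignored in this sketch, so a cluster may
    span two sentences.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if BODY_PARTS.intersection(gram):
            counts[" ".join(gram)] += 1
    return [(gram, freq) for gram, freq in counts.most_common() if freq >= min_freq]


# Invented two-sentence sample, not a quotation from the corpus.
sample_text = (
    "Irene se llevó el pañuelo a los ojos. "
    "Luego ella se llevó el pañuelo a los ojos."
)
print(body_part_clusters(sample_text))
```

Raising min_freq is a simple way to restrict the output to clusters used systematically, in the spirit of the criterion stated in the abstract.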
The proposed corpus study makes it possible to examine in depth an aspect of Galdós's style
that has so far gone largely unnoticed, owing to how complex its analysis can become without
quantitative tools. The study therefore investigates how
his female characters are characterized through recurrent patterns containing
the body parts listed above across his literary output and, in some
cases, compares them with the characterization of male characters. As will be seen, Galdós's
work is populated by patterns that act as textual building blocks contributing to the
construction of the fictional universe the author presents. The texts were downloaded from the
digital repository Cervantes Virtual and then processed with the concordancing software
WordSmith Tools 6 (Scott, 2013), which supports word and concordance searches
whose results can be analysed in the context of the novel in
which they appear.
The examples in our analysis include the expression "el pañuelo a los ojos" (the handkerchief to the eyes), associated
almost exclusively with the characterization of female characters and normally used in
dialogue to stress their sadness: "Irene se llevó el pañuelo a los ojos, y
con voz de ahogo me dijo: ‘Sabe usted... más que Dios...’" (El amigo Manso, chapter 41).
In short, the characterization of female characters in Galdós's novelistic universe
is fully accomplished. Indeed, as this paper aims to show, analysing
body language from a corpus-stylistic perspective also makes it possible to mark
differences between men and women, or between bourgeois and working-class women. The Canarian author
is, in the words of María Zambrano (1994: 130), "the first Spanish writer to boldly bring women into his world".
* Speaker
Bibliography:
Biblioteca Virtual Miguel de Cervantes (2016): http://www.cervantesvirtual.com/ (accessed 2
April 2016).
Fischer-Starcke, B. (2010): Corpus Linguistics in Literary Analysis: Jane Austen and her Contemporaries. London: Continuum.
Korte, B. (1997): Body Language in Literature. Toronto: University of Toronto Press.
Mahlberg, M. (2013): Corpus Stylistics and Dickens’s Fiction. New York/London: Routledge.
Ruano San Segundo, P. (2016): "A corpus-stylistic approach to Dickens’ use of speech verbs:
Beyond mere reporting". Language and Literature, 25 (2), 1-15.
Scott, M. (2013): WordSmith Tools. Version 6. Oxford: Oxford University Press.
Zambrano, M. (1994): "Mujeres de Galdós". Asparkía, 3, 129-135.
Keywords: Galdós, women, body language, corpus stylistics
Phraseological units in the subtitling of
a drama series
Dalila Itzel Nieto Mercado* 1, Eleonora Lozano Bachioqui† 1
1 Universidad Autónoma de Baja California, Facultad de Idiomas (UABC) – Av. Álvaro Obregón y
Julián Carrillo S/N, Edificio de Rectoría, Col. Nueva, C.P. 021100, Mexico
Abstract
This paper arises from the need to learn more about the translation of phraseological
units in English-to-Spanish subtitling, given the growing audience for audiovisual content
from the Internet. In this context, it must be borne in mind that the translator’s task is to make
culture accessible to anyone interested in it, since it is not only a matter of converting messages
from one language into another but also of spreading culture. The aim of this paper is to create a glossary of English phraseological units together with their Spanish equivalents, based on a corpus drawn from the scripts and subtitles of a television series. The results
will benefit anyone interested in translation, and the glossary may also serve as a tool for teaching English
phraseological units and their Spanish equivalents. To this end, a selection of scripts
from the American series Mad Men (Weiner, 2007) was compiled in order to analyse the phraseological units using AntPConc, a tool created by Laurence
Anthony for the analysis of parallel texts.
Keywords: translation, subtitling, phraseological units, corpus linguistics
* Speaker
† Corresponding author
Verbal agreement with NCOLL-of-NPL
subjects in the inner varieties of English in
GloWbE
Yolanda Fernández-Pena* 1
1 University of Vigo – Spain
Collective noun-based subjects may take singular or plural verbs according to whether the
speaker focuses on the collectivity or on its individuals (Dekeyser 1975), the latter being preferred
in British English (Bauer 2002). This conundrum is further complicated when collective subjects
take plural of-dependents (i.e. NCOLL-of-NPL subjects) which may interfere in the subject-verb
agreement relation, as in (1):
(1) A [crowd]SG of [waiters]PL [were]PL gathering.
In previous research, with data from the British National Corpus (BNC) and the Corpus of
Contemporary American English (COCA), I showed that NCOLL-of-NPL subjects take a significant rate of plural verb agreement (68.05%) in local syntactic domains in both British and
American English and that, with increasing syntactic distance and complexity, the influence of
plural of-PPs on verb number diminishes and, therefore, the rate of plural agreement drops considerably (58.47%).
This study extends the scope of such investigation by exploring verbal agreement with NCOLL-of-NPL subjects in the corpus of Global Web-based English (GloWbE) with a two-fold purpose.
Firstly, I have inspected British and American English in GloWbE to find whether my prior
observations were corroborated (and to what extent) in the more informal web-based register.
Secondly, I have scrutinised the data for the other four inner varieties of English in GloWbE –
Ireland, Canada, Australia and New Zealand – to detect significant regional tendencies and similarities/differences with respect to British and American English. To this end, I have replicated
my previous investigation and, thus, examined verbal agreement with twenty-three singular collective nouns taking of-dependents (lists retrieved from Biber et al. 1999: 249; Huddleston and
Pullum et al. 2002: 503) in the six inner varieties of English in GloWbE. The syntactic variables
considered in the study pertain to (i) the constituent structure of the of -PP, (ii) the typology
of the modifiers of the NPL, and (iii) the morphology of the NPL (i.e. regular vs. irregular vs.
non-overt plurality as in boys vs. men vs. people).
The results confirm to a large extent my prior observations in the BNC and COCA and also
evince significant regional trends. In general, NCOLL-of-NPL subjects show an overall preference for plural verbal agreement only in the British and the Irish components (57.26% and
63.97%); American English slightly favours singular agreement (52.18%), whereas Canada, Australia and New Zealand do not display significant preferences. In line with the BNC and COCA,
the data from GloWbE demonstrate how the morphology of NPL conditions verbal agreement
because morphologically-unmarked plural nouns such as people show a more remarkable influence on verb number (70.18%) than irregular (e.g. men, 60.89%) and regular (e.g. boys, 51.63%)
plural nouns, a tendency which is attested in all the varieties surveyed. Concerning syntactic
complexity, while Canada, Australia and New Zealand do not provide significant results, the
results for the British, Irish and American varieties confirm that the most complex syntactic
configurations of NCOLL-of-NPL subjects (i.e. those with pre- and postmodification) select a
lower rate of plural agreement. Similarly, plural verb agreement is considerably less salient when
the NPL is postmodified by clausal and, thus, expectedly more complex constituents (40.94% vs.
non-clausal: 55.50%). This finding runs counter to prior literature (Corbett 1979) but lends support
to the tendencies that I had previously observed and, hence, confirms the significant impact of
morphology and syntactic complexity on the verbal patterns of NCOLL-of-NPL subjects.
Keywords: verbal agreement, collective nouns, regional varieties, corpus
Evaluating the frequency threshold for selecting lexical bundles: good news with some reservations
Yves Bestgen
Centre for English Corpus Linguistics (CECL), Place du Cardinal Mercier, 10, B-1348
Louvain-la-Neuve, Belgium
One of the most frequently used approaches to studying prefabricated units in corpora relies
on the automatic identification of lexical bundles, the most recurrent word sequences in a
corpus (Biber et al., 1999). Their study has brought to light phraseological differences between
registers, genres and periods. While most research has dealt with four-word sequences, shorter
sequences have also been analysed. To select them from among all the word n-grams present in a
corpus, two criteria are used: a minimum frequency threshold, meant to guarantee that lexical
bundles "show a statistical tendency to co-occur" (Biber et al., 1999: 989), and the minimum
number of documents in which a sequence must occur, in order to eliminate idiosyncratic
sequences. While there is broad consensus on setting a threshold of 3 to 5 texts for the second
criterion, very wide variation is observed for the first: it usually lies between 10 and 40
occurrences per million words, but values ranging from 4 (O’Keeffe et al., 2007) to 88 (Decock,
1998) have also been used. Since this is the main selection criterion (Cortes, 2015: 204), meant
to guarantee that lexical bundles are made up of "words which follow each other more frequently
than expected by chance" (Hyland, 2008: 5), such a range of variation raises the question of
whether the frequency thresholds in use are high enough to avoid selecting n-grams that chance
could easily have produced just as frequently. Many researchers have indeed pointed out that a
sequence may be very frequent simply because of the frequency of the words it contains (e.g.
Evert, 2005; Gries, 2010). To try to answer this question, the study uses an extension to
sequences of more than two words of Fisher's exact test, which is recommended for bigrams (Jones
and Sinclair, 1974; Pedersen et al., 1996; Stefanowitsch and Gries, 2003). It is important to note
that the aim is not to call into question the definition of lexical bundles as the most recurrent
sequences. It is obviously more useful to distinguish registers by means of very frequent
sequences than by means of rare ones.
The analyses were carried out on a corpus of 3,200,000 words extracted from the "academic" section
of the BNC. Three subcorpora were also extracted from this initial corpus so as to vary the
size, the first containing 800,000 words, the second 200,000 and the last 50,000 words. A
procedure for estimating probabilities by permuting the words in the corpus was used,
and 10 million permutations were carried out in each corpus.
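The permutation idea in this paragraph can be illustrated with a small Monte Carlo sketch in Python. It is not the extension of Fisher's exact test used in the study: it simply shuffles the tokens of a toy corpus and estimates how often chance alone yields an n-gram at least as frequently as observed. The function names, the default number of permutations and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple


def count_ngram(tokens: Sequence[str], ngram: Tuple[str, ...]) -> int:
    """Count (possibly overlapping) occurrences of ngram in tokens."""
    n = len(ngram)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == ngram
    )


def permutation_p_value(
    tokens: Sequence[str],
    ngram: Tuple[str, ...],
    n_permutations: int = 1000,  # assumed default; the study reports 10 million
    seed: int = 0,               # fixed seed so that the sketch is reproducible
) -> float:
    """Estimate the probability that shuffling the corpus produces the n-gram
    at least as often as it occurs in the original word order."""
    rng = random.Random(seed)
    observed = count_ngram(tokens, ngram)
    shuffled = list(tokens)
    at_least_as_frequent = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_ngram(shuffled, ngram) >= observed:
            at_least_as_frequent += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_frequent + 1) / (n_permutations + 1)
```

With a strictly alternating toy corpus such as `["a", "b"] * 50`, shuffling essentially never reproduces the observed frequency of the bigram `("a", "b")`, so the estimate stays close to its minimum of 1 / (n_permutations + 1). A corpus made of a single repeated word gives an estimate of 1 for that word's bigram, since every shuffle is identical.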
The results indicate that the conventional thresholds are high enough to select only four-word
sequences that chance would be very unlikely to produce as frequently. By contrast, a large
number of the three-word sequences selected on the basis of these thresholds do not pass the
inferential test. The study also reveals a very marked effect of corpus size on the
effectiveness of frequency thresholds when these are expressed as normalized frequencies,
confirming the concerns of Cortes (2015) and Hyland (2012).
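The effect of corpus size on normalized thresholds can be made concrete with a little arithmetic. The Python sketch below converts the per-million thresholds mentioned in the abstract into raw counts for the four corpus sizes used in the study. The function and constant names are illustrative assumptions, and the calculation is not part of the study's own procedure.

```python
# Corpus sizes (in words) reported in the abstract.
CORPUS_SIZES = [3_200_000, 800_000, 200_000, 50_000]

# Normalized thresholds (occurrences per million words) cited in the abstract.
THRESHOLDS_PER_MILLION = [4, 10, 40, 88]


def raw_threshold(per_million: float, corpus_size: int) -> float:
    """Convert a per-million threshold into a raw count for a given corpus size."""
    return per_million * corpus_size / 1_000_000


def threshold_table(sizes=CORPUS_SIZES, thresholds=THRESHOLDS_PER_MILLION):
    """Map each corpus size to the raw count implied by each normalized threshold."""
    return {
        size: {t: raw_threshold(t, size) for t in thresholds} for size in sizes
    }


if __name__ == "__main__":
    for size, row in threshold_table().items():
        cells = ", ".join(f"{t}/M -> {raw:g}" for t, raw in row.items())
        print(f"{size:>9,} words: {cells}")
```

Under this conversion, a threshold of 10 occurrences per million words corresponds to 32 occurrences in the 3,200,000-word corpus but to only 0.5 in the 50,000-word subcorpus, so a single occurrence already meets it.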
Keywords: phraseological expressions, lexical bundles, Fisher's exact test,
corpus-driven approach, frequency threshold, corpus size
Metaphorical creativity index and translation universals: a methodological proposal based on a corpus of corporate social responsibility reports
Sara Piccioni
Università “G. D’Annunzio” di Chieti-Pescara, Italy
The aim of this paper is to investigate the translation universals hypotheses (Baker 1996) by
comparing a metaphorical creativity index in a corpus of original and translated texts in
Spanish. The analysis rests on a twofold methodological proposal. First, embracing the idea that
translated texts differ from original texts in linguistic features of their own, it proposes
including among these features the level of metaphorical lexicalization/creativity, suggesting
that metaphor use in originals and translations differs in the type of metaphorical repertoire
employed. Second, it proposes a metaphorical creativity index able to measure the level of
metaphorical creativity on the basis of observations in a general reference corpus of Spanish.
The study corpus is a comparable monolingual corpus of corporate social responsibility reports
made up of Spanish originals (OR-ES) and Spanish texts translated from English (TR-ES).
As regards the first methodological proposal, the hypothesis put forward is that metaphor, with
its wide range of variation between fully lexicalized forms (e.g. cuello de botella,
'bottleneck') and creative metaphors (e.g. drenar el dolor, 'to drain the pain'), offers an ideal
vantage point for observing how translators' language use differs from that found in original
texts. More specifically, conventional metaphors in translations are taken to reflect the
normalization processes typical of translated texts ("tendency to exaggerate features of the
target language and to conform to its typical patterns", Baker 1996), whereas creative metaphors
may result from the source language showing through in the target language (shining through,
Teich 2003).
The second methodological proposal serves the comparison of the level of creativity of metaphors
in translated and original texts. It starts from the criterion proposed by Deignan (2005) for
distinguishing innovative metaphors from historical metaphors: a low frequency of metaphorical
uses of a given word is considered indicative of metaphorical innovation, whereas words that are
used almost exclusively metaphorically are considered conventional uses. To calculate the
metaphorical creativity index, the 200 most frequent VERB-NOUN pairs in the two corpora (OR-ES
and TR-ES) are extracted, and the metaphorical pairs among them are identified using the
procedure proposed by the Pragglejaz Group (2007). Next, the creativity index of metaphorical
verbs and nouns is calculated by counting the number of metaphorical uses of each in a random
selection of 100 concordances extracted from the Spanish corpus of the Leeds Collection of
Internet Corpora (Sharoff 2006; henceforth REF). The number of metaphorical cases in REF
multiplied by the frequency of a given VERB-NOUN pair in OR-ES and TR-ES is considered
indicative of the level of metaphorical creativity/conventionality of each corpus.
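The calculation described in this paragraph can be sketched in Python as follows. The abstract states only that the number of metaphorical uses in REF is multiplied by the frequency of the pair in each corpus. The frequency-weighted mean used here to summarise a corpus, the class and function names, and the toy figures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MetaphoricalPair:
    """A metaphorical VERB-NOUN pair (hypothetical record layout)."""
    verb: str
    noun: str
    ref_metaphorical_hits: int  # metaphorical uses among the 100 REF concordances
    corpus_frequency: int       # frequency of the pair in OR-ES or TR-ES


def pair_score(pair: MetaphoricalPair) -> int:
    """Metaphorical hits in REF multiplied by the pair's corpus frequency."""
    return pair.ref_metaphorical_hits * pair.corpus_frequency


def corpus_score(pairs: Iterable[MetaphoricalPair]) -> float:
    """Frequency-weighted mean of REF hits (an assumed way to aggregate)."""
    pairs = list(pairs)
    total_frequency = sum(p.corpus_frequency for p in pairs)
    if total_frequency == 0:
        raise ValueError("no pair occurrences to average over")
    return sum(pair_score(p) for p in pairs) / total_frequency


# Invented toy figures with placeholder words, not data from the study.
toy_pairs = [
    MetaphoricalPair("verb_a", "noun_a", ref_metaphorical_hits=10, corpus_frequency=3),
    MetaphoricalPair("verb_b", "noun_b", ref_metaphorical_hits=90, corpus_frequency=1),
]
print(corpus_score(toy_pairs))  # 30.0
```

Read against the Deignan-based criterion above, a lower value would point to more innovative metaphor use in a corpus and a higher value to more conventional use. How the two corpora are actually compared is not specified in the abstract.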
The paper will focus on a discussion of the methodological implications of the proposals put
forward, and will relate the metaphorical creativity index to phenomena of normalization and
shining through in translations.
Keywords: corporate social responsibility reports, metaphor translation, translation universals,
corpus-based metaphor analysis
‘His maiestie chargeth, that no person shall engrose any maner of corne’. The Standardization of Punctuation in Early Modern English Legal Proclamations
Javier Calle-Martín
University of Málaga (UMA), Facultad de Filosofía y Letras, Departamento de Filología Inglesa,
Campus de Teatinos s/n, 29071 Málaga, Spain
Punctuation is historically noted to develop from the rhetorical to the grammatical, from the
speaker to the reader, the Renaissance standing out as the transitional period with the adoption
of syntactic and pragmatic functions to organize the written information. This standardization
is elsewhere regarded as a consequence of the introduction of Caxton’s printing press in England,
the increasing activity of Westminster’s Royal Chancery, and a growing number of professional
scriveners engaged in the writing of all sorts of documents, from guild records to private letters.
The study of historical punctuation, however, has been mostly based on Old and Middle English handwritten material, literary and scientific texts in particular. Unfortunately, the Early
Modern English period has been an exception, with only a limited number of published studies investigating scribal attitudes in different text-types, the list including scientific, legal
and literary texts, drama in particular (Calle-Martín and Miranda-García 2008: 356–360). The
unexplored condition of Early Modern English punctuation is even more significant in the particular case of printed texts, despite their active participation in the process of standardization.
Legal material is not an exception, proclamations being "one of the most overlooked categories
of printed material in the field of early modern history" (Kyle 2015: 771). In the light of this,
the present study analyses the punctuation system in Early Modern English printed
legal material with the following objectives: a) to provide the inventory of marks of punctuation
in Early Modern English printed texts; b) to offer a detailed account of the use and pragmatic
functions of these symbols; and c) to assess the level of standardization of punctuation in these
sources.
The present study relies on The Corpus of Early Modern English Statutes (compiled by Anu
Lehto at the University of Helsinki), containing approximately 214,000 words for the historical
period 1491-1707 (Lehto 2013: 239). The corpus is divided into 25-year sub-periods for diachronic comparison and has been compiled to include two proclamations for each time
period, with samples printed during the reign of each sovereign. Legal material has been chosen
in view of a) its orality, written to be read aloud; b) its conservativeness, hostile to individual
creativity in favour of the standard practice; and c) its complex syntax, requiring a complex set
of marks for all kinds of syntactic relationships.
This material has allowed us to gather conclusive data to ascertain a) the existence of an inventory of punctuation marks with a preconceived set of rules, corroborating an ongoing process of
specialization at that time; and b) more importantly, the historical development of particular
punctuation symbols, offering grounds as to the actual rise and fall of particular symbols and
their functions in the history of English.
Calle-Martín, Javier and Antonio Miranda-García. 2008. "The Punctuation System of Elizabethan Legal Documents: The Case of G.U.L. MS Hunter 3 (S.1.3)". The Review of English
Studies 59: 356–378.
Kyle, Chris R. 2015. "Monarch and Marketplace: Proclamations as News in Early Modern
England". Huntington Library Quarterly 78.4: 771–787.
Lehto, Anu. 2013. "Complexity and Genre Conventions: Text Structure and Coordination in
Early Modern English Proclamations". In Andreas H. Jucker, Daniela Landert, Annina Seiler
and Nicole Studer-Joho (eds.). Meaning in the History of English: Words and Texts in Context.
Amsterdam/Philadelphia: John Benjamins. 233–257.
Keywords: Early Modern English, proclamations, punctuation, standardization
‘Making it clear’: A contrastive study of evidentials and boosters in contemporary political discourse
Ana Albalat-Mascarell
Universitat Politècnica de València (UPV), Spain
Within Hyland’s (2005) metadiscoursal framework, evidentials and boosters are common
rhetorical strategies that lend credibility to arguments either by drawing on external sources
of information or by emphasising one’s own certainty about a proposition. Both strategies are
part of a strong interpersonal view of metadiscourse comprising the ways speakers can organize
a discourse and adopt a stance towards what is being discussed and their audience (Hyland,
2004, 2005, 2010; Hyland and Tse, 2004; Dafouz-Milne, 2008; Mur-Dueñas, 2011). But while a
useful tool in explaining the interactional features of language in different domains and genres,
metadiscourse has mostly been examined in relation to academic writing (Hyland, 2015). Little
attention has been given to the role of metadiscourse markers in non-academic discourses with
an overtly persuasive component such as political discourse, least of all from a comparative
perspective exploring rhetorical and discursive cross-cultural differences (Mur-Dueñas, 2011)
between English and other languages. I address this gap by focusing on the presence and function
of evidentials and boosters in broadcast debates between political candidates held for the 2015
and 2016 general elections in Spain and for the 2016 presidential election in the United States of
America. In this vein, my objectives are, first, to extract the frequencies of the words and phrases
performing these particular metadiscourse functions in such televised debates aimed at a very
large audience; second, to compare the rhetorical and discursive roles of the most frequently used
expressions by different speakers and relate them to the candidates’ persuasive aims; third, to
explore linguistic and intercultural differences regarding the use of these strategies and contrast
them with the particular outcome of each election. In the methodology set for this study, the
analysis was based on a corpus of authentic data consisting of the transcripts of those debates
involving the leaders of at least the two parties topping opinion polls in each country and election
(i.e. the PP and the PSOE (also Podemos in the 2016 election) in Spain and the Democratic
and Republican political parties in the United States). Furthermore, the quantitative use of
evidentials and boosters was analyzed with the tool ‘Metool’ developed specifically to detect
metadiscourse strategies. The results demonstrate how the strategies identified tend to work in
combination towards the representation of a credible self with something plausible to say that
challenges opposing views on the same issue. Also, the main differences in the qualitative use of
these metadiscourse devices between the political actors involved and the positions they publicly
adopt reveal a striking correlation between the speaker’s communicative characteristics and the
projection of personal authority and trustworthiness into their discourse. Last but not least, the
cross-cultural analysis of evidentials and boosters in broadcast debates taking the framework
of interpersonal metadiscourse shows that the speaker’s ability to construct an effective ‘Ethos’
varies according to language and culture but, quite surprisingly, a better performance at debates
does not necessarily imply an election victory, either in the Spanish national context or in the
Anglo-Saxon tradition in the United States.
Keywords: Intercultural rhetoric, Corpus-based analysis, Metadiscourse, Evidentials, Boosters,
Political discourse
Author index
Álvarez-Gil, Francisco J., 73
Ahuactzin Martínez, Carlos Enrique, 75
Albalat-Mascarell, Ana, 183
Alcaraz-Mármol, Gema, 159
Almela Sánchez, Moisés, 170
Almela, Ángela, 159
Alonso Belonte, Isabel, 87
Alonso Ramos, Margarita, 81
Alonso-Almeida, Francisco, 73
Alruwaili, Awatif, 155
Andrade Navarro, Allen, 54
Arsenio, Andrades, 30
Baena Lupiáñez, María del Carmen, 56
BALLIER, Nicolas, 61
Barcellos, Carolina, 85
Barrio, María Valentina, 52
Barry, Pennock-Speck, 67
Bendinelli, Marion, 103
Bertels, Ann, 143
Bestgen, Yves, 177
BOJOVIC, Dijana, 36
Boutmgharine Idyassner, Najet, 46
Brenchley, Mark, 157
Buckingham, Louisa, 168
Cabezas-García, Melania, 120
Cal Varela, Mario, 139
Calle-Martín, Javier, 181
Calvo-Rubio Jiménez, Estrella, 58
CANGIR, Hakan, 89
Cantos Gómez, Pascual, 170
Cantos, Pascual, 159
Carrió-Pastor, María Luisa, 97
CAVALLA, Cristelle, 38
Charles, Maggie, 153
Chaski, Carole, 159
Clavel Arroitia, Begoña, 67
Comer, Marie, 145
Comitre Narvaez, Isabel, 107
CORONA, ISABEL, 162
Criado Peña, Miriam, 128
Criado Sánchez, Raquel, 87
Delgar Farrés, Gemma, 111
Dong, Jihua, 168
EL KHAMISSY, Riham, 101
Esteban-Segura, Laura, 6
Fernández, Ester, 95
Fernández-Alcaina, Cristina, 18
Fernández-Domínguez, Jesús, 18
Fernández-Pena, Yolanda, 175
Fernandez Polo, Francisco Javier, 139
Gallego, Daniel, 63
Gandón-Chapela, Evelyn, 10
García Salido, Marcos, 81
Garcia González, Marcos, 81
Garcia-Marchena, Oscar, 147
Gautier, Laurent, 32, 109
Georgopoulos, Athanasios, 22
Gil Martínez, María Adelaida, 77
GIRALDEZ CEBALLOS-ESCALERA, JOAQUÍN, 34
Gledhill, Christopher, 28
Gonzalez Darriba, Patricia, 2
Grön, Leonie, 143
Gregori-Signes, Carmen, 164
Gris Roca, Joaquín, 87
Hamilton, Clive, 71
Hedeland, Hanna, 149
Herrando Rodrigo, Isabel, 135
Heylen, Kris, 83
Jacques, Marie-Paule, 116
Jeon, Yun Sil, 141
Jettka, Daniel, 149
John, Suganthi, 12
Kang, Beomil, 4
Kubler, Natalie, 105
Kunilovskaya, Maria, 112
Lambrechts, An, 83
Lara-Clares, Cristina, 18
Laso, Natalia Judith, 12
León-Araúz, Pilar, 91, 120
Lee, Sun-Hee, 4
Lissón, Paula, 61
Liu, Yuanyi, 44
Llorián, Susana, 26
Lorés Sanz, Rosa, 135
Lozano Bachioqui, Eleonora, 54, 174
MAPELLI, GIOVANNA, 135
Martínez Casas, María, 137
Martínez Zavala, Sonia Paola, 24
Martínez, Inmaculada, 26
Martikainen, Hanna, 28
Martinez-Insua, Ana Elina, 16
Maruenda-Bataller, Sergio, 8
Mas, Inmaculada, 129
Mestivier (Volanschi), Alexandra, 28
Mestivier, Alexandra, 105
Mestre-Mestre, Eva M., 122
Mezeg, Adriana, 114
Morales Moreno, Albert, 59
Moreno-Ortiz, Antonio, 151
Moreno-Sandoval, Antonio, 44
Morgoun, Natalia, 112
Muñoz-Garcés, Alejandro, 141
Murillo, Silvia, 133
Nevzorova, Olga, 40
Nguyen Van, Cyril, 109
Niall, Curry, 118
Nieto Mercado, Dalila Itzel, 174
Nieto, Guadalupe, 172
Pérez Béjar, Víctor, 50
Padilla Herrada, María Soledad, 50
Pallejá, Clara, 159
Pecman, Mojca, 105
Perez-Guerra, Javier, 16
Piccioni, Sara, 179
PIQUÉ-NOGUERA, CARMEN, 79
Prado-Alonso, Carlos, 126
Ramisch, Carlos, 65
Ramos Ruiz, Ismael, 99
Reimerink, Arianne, 91
Rodríguez-Abruñeiras, Paula, 8
Rodríguez-Puente, Paula, 161
Romero Medina, Agustín, 87
Romero-Barranco, Jesús, 48
Ruano, Pablo, 14
Sánchez-Cárdenas, Beatriz, 65
Salles-Bernal, Soluna, 6
Santaemilia, José, 124
Savvidou, Paraskevi, 69
Selmi, Afef, 32
Sinkuniene, Jolanta, 166
SUAU-JIMÉNEZ, FRANCISCA, 79, 135
Suleymanov, Dzhavdet, 42
TRAN, Thi Thu Hoai, 38
Tutin, Agnès, 131
Vadasz, Noemi, 20
Verplaetse, Heidi, 83
Villayandre, Milka, 52
Yan, Rui, 116
Yoo, Hye Ryeong, 4
ZHANG, Xingzi, 93
Zimina, Maria, 28