Lost in Translation? The Effect of Language on Response Distributions in Likert Data
Bert Weijters, Maggie Geuens, Hans Baumgartner

Effect of language on response distributions in Likert data

The non-equivalence problem in cross-national research
Surveys are popular in cross-national marketing and consumer research.
However, one common concern is that survey responses may not be equivalent across countries: the same response (e.g., ‘4’ on a five-point agree/disagree scale) may have a different meaning for different respondents (e.g., in different countries).
Sources of non-equivalence:
□ Item-specific (different meanings attached to a particular item)
□ General (i.e., across multiple items)

Research objective
General non-equivalence (i.e., bias not specific to a particular item) may be due to
□ Nationality or national culture
□ Language
Our focus is on language, and we will show that
□ language differences can be a more important contributor to scale usage differences than differences in nationality;
□ at least for bilingual respondents, differences in mother tongue do not matter;
□ the scale labels used affect response behavior;
□ the fluency (rather than the intensity) of the scale labels seems to be the driver of differences in response behavior.

Method: Measuring response distributions
A major challenge is to measure bias in response distributions that is not item-specific and is independent of substantive content.
To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format):
□ Deliberately designed scales consisting of heterogeneous items (Greenleaf 1992)
□ Random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010)

Study 1
Does nationality or language lead to greater
similarity in responses to heterogeneous Likert items?
“Natural” experiment using native speakers of different languages in Europe who share or do not share the same nationality.

Method: Design and sample (sample size by language and country)
Language   Netherlands  Belgium  France  Germany  Italy  Switzerland  Total
Dutch             1046      644       -        -      -            -   1690
French               -      371    1000        -      -          303   1674
German               -        -       -      993      -          606   1599
Italian              -        -       -        -    939           50    989
Total             1046     1015    1000      993    939          959   5952

Hierarchical clustering of regions by response category proportions (Ward’s method)

Study 2
Are differences in response distributions due to language mainly related to respondents’ mother tongue (i.e., an individual characteristic) or the language of the questionnaire (i.e., a stimulus characteristic)?
In particular, does the use of different category labels within each language affect the response distributions?
□ Response category labels are a potential systematic source of differences in response distributions since they are constant across items but variable across languages;
□ Even within the same language, response distributions may differ if different response category labels are used.

Study 2: Design (sample size by mother tongue and questionnaire version)
Mother tongue  NL_a  NL_b  FR_a  FR_b  Total
Dutch           115    61    62   128    366
French          109    57    51   112    329
Total           224   118   113   240    695

Response category labels by questionnaire version
Category  NL_a (A)                NL_b (B)       FR_a (C)                      FR_b (D)
5         Volledig eens           Sterk eens     Tout à fait d’accord          Fortement d’accord
4         Enigszins eens          Eerder eens    Un peu d’accord               Plutôt d’accord
3         Noch eens, noch oneens  Neutraal       Ni d’accord, ni pas d’accord  Neutre
2         Enigszins oneens        Eerder oneens  Un peu en désaccord           Plutôt pas d’accord
1         Volledig oneens         Sterk oneens   Tout à fait en désaccord      Fortement pas d’accord

Study 2: Design (cont’d)
Dependent variable:
□ 16-item Greenleaf
(1992) scale;
□ 16 heterogeneous Likert items sampled from as many unrelated marketing scales;
□ the two sets of measures achieved convergent validity and were combined.
Language profile (language proficiency and use of Dutch/French).

Statistical analysis
Score Statistics For Type 3 GEE Analysis
Source                         DF  ChiSquare  Pr > ChiSq
Questionnaire                   3      24.11      <.0001
Mother_tongue                   1       0.12      0.7297
Scale_Category                  3     422.09      <.0001
Questionnaire*Mother_tongue     3       3.35      0.3402
Questionnaire*Scale_Category    9      73.32      <.0001
Scale_Category*Mother_tongue    3       4.92      0.1777
3-way interaction               9      10.70      0.2969

Study 2: Results

More specific tests
The interaction of questionnaire version and scale category shows that the response patterns differ by language and/or label.
In both Dutch and French, using different label sets changed the response distributions.
Depending on which labels are used in Dutch and French, response distributions may or may not vary across languages.

Discussion Study 2
Response distributions do not seem to differ as a function of a respondent’s mother tongue.
The language of the questionnaire and the labels used for the scale categories can have a substantial influence on how readily certain positions on the rating scale are endorsed:
□ even within the same language, supposedly similar labels strongly affected responses to items that were presumably free of common content;
□ in a multi-language context, where category labels do differ across languages but are common across items within the same language, the labels attached to different scale positions can be a potent source of response bias.

Two alternative hypotheses to explain the effect of response category labels
Intensity hypothesis:
H1: Endpoint labels with higher intensity are less frequently endorsed.
Fluency hypothesis:
H2: Endpoint labels with higher fluency are more frequently endorsed.

H1: Intensity hypothesis
Item Response Theory:
□ respondents map their standing on the latent variable onto the response category that covers their position on the latent variable (Samejima 1969; Maydeu-Olivares 2005);
□ the wider the response category, the more likely respondents are to endorse it.
More intense endpoint labels move the category’s lower or upper boundary away from the midpoint, resulting in lower response frequencies.
[Figure: extreme endpoint label; shifting boundary; narrow category; low frequency. Overt Likert responses (1 to 7) are mapped onto the latent construct.]

H2: Fluency hypothesis
Research on processing fluency shows that the meta-cognitive experience of ease of processing affects judgment and decision making:
□ perceptions of the truth value of statements (e.g., Unkelbach 2007)
□ liking for objects and events (e.g., Reber, Schwarz, and Winkielman 2004)
□ choice deferral or choices of compromise options (e.g., Novemsky et al.
2007).
Repeated statements are more likely to be rated as true (Unkelbach 2007), and repetition increases liking, as suggested by the mere exposure effect (e.g., Bornstein 1989), in part because repetition makes stimuli more familiar and contributes to greater processing fluency.
Therefore, if scale labels are more commonly used in everyday language and are thus easier to process, this should increase the likelihood that the corresponding response option on the rating scale is selected.

Main experiment: Method
□ We randomly assigned Dutch-speaking students (N = 100) to two alternative versions of a brief online questionnaire (10 heterogeneous Likert items and pairwise comparisons);
□ Two endpoint versions:
   ‘sterk (on)eens’ (‘strongly (dis)agree’): low intensity, low fluency
   ‘volledig (on)eens’ (‘fully (dis)agree’): high intensity, high fluency

Main experiment: Findings
A generalized linear model analysis showed that the number of extreme positive responses was significantly lower in the ‘sterk eens’ (low intensity and fluency) condition than in the ‘volledig eens’ (high intensity and fluency) condition: means of 3.63 vs. 4.44 (χ²(1) = 3.998, p = .046).
This result is consistent with H2: labels that are more fluent lead to higher response category frequencies (in this case despite their higher intensity).

Study 4: Method (sample size by language and country)
Language  France  USA  Canada   UK  Total
French       227    0     203    0    430
English        0  185     196  187    568
Total        227  382     399  187    998

Endpoint labels by questionnaire version
Version  English           French
1        Strongly agree    Fortement d’accord
2        Completely agree  Complètement d’accord
3        Extremely agree   Extrêmement d’accord
4        Definitely agree  Définitivement d’accord
5        Fully agree       Entièrement d’accord
6        Very much agree   Tout à fait d’accord

Multilevel results
                   Estimate   S.E.  Est./S.E.  P-Value
Within Level
ERS ON
  FEMALE              0.057  0.047      1.196    0.232
  AGE                -0.001  0.003     -0.279    0.781
  EDU_HI             -0.048  0.085     -0.560    0.575
Between Level
ERS ON
  FLUENCY             0.165  0.064      2.594    0.009
  INTENSITY          -0.133  0.131     -1.014    0.311
  LANG_FR             0.061  0.087      0.703    0.482
  C_US                0.119  0.102      1.166    0.244
  C_FR                0.007  0.076      0.091    0.927
  C_UK                0.025  0.120      0.212    0.832
Intercept
  ERS                 1.002  0.184      5.444    0.000

Discussion: summary of findings
[Diagram elements: Nationality; Language; Cross-regional nonequivalence; Questionnaire response category labels; Other language aspects; Label intensity; Label fluency]
Study 1: Cross-regional European survey
Response distributions are more homogeneous for regions sharing the same language than for regions sharing the same nationality.
Study 2: Experiment with bilinguals
Response distributions vary as a function of category labels, even within the same language and regardless of respondents’ mother tongue.
Study 3: Label experiment (one sample)
Highly fluent labels lead to higher endorsement rates of response categories, irrespective of label intensity (and keeping language constant).
Study 4: Cross-continental label experiment
This finding holds in a multilingual cross-continental setting, irrespective of language and nationality.

Implications
Response style research
□ Need to extend the scope to questionnaire characteristics
□ Need to cross-validate/replicate earlier cross-national comparisons
Cross-cultural survey research
□ Reconsider regional segmentations
□ Validate measures cross-linguistically and cross-nationally

Implications for multilingual survey research
□ Translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic;
□ Optimize equivalence: use response category labels that are equally fluent in different languages (rather than literal translations or words with equal intensity).
E.g., ‘Strongly agree’ is most commonly used in scales, but may not have valid equivalents in some other languages. ‘Completely agree’ seems to be a viable alternative.

          Completely agree  Tout à fait d’accord
Fluency               1.24                  1.22
ERS%                 18.8%                 19.2%
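
To make the ERS% comparison above concrete, here is a minimal sketch of how such figures could be computed from raw answers. It assumes ERS is measured as the share of responses falling in either endpoint category of a five-point scale; the slides do not spell out the exact operationalization, so treat this definition as an assumption. The function names and the two response vectors are hypothetical illustrations, not data from the studies.

```python
from collections import Counter


def category_proportions(responses, n_categories=5):
    """Share of responses in each category 1..n_categories."""
    counts = Counter(responses)
    total = len(responses)
    return [counts.get(k, 0) / total for k in range(1, n_categories + 1)]


def ers(responses, n_categories=5):
    """Assumed ERS definition: share of answers in either endpoint category."""
    extremes = sum(1 for r in responses if r in (1, n_categories))
    return extremes / len(responses)


# Hypothetical answers to ten heterogeneous items under two label versions.
version_a = [5, 4, 3, 5, 1, 2, 4, 5, 3, 4]
version_b = [4, 4, 3, 5, 2, 2, 4, 4, 3, 4]

for name, answers in (("version_a", version_a), ("version_b", version_b)):
    print(name, category_proportions(answers), f"ERS = {ers(answers):.1%}")
```

Because the items are heterogeneous, a difference in these shares between label versions would reflect how the labels are used rather than item content, which is the logic behind comparing response distributions across versions.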