Developing inventories for satisfaction and Likert scales in a service environment

by

Contact Author:
Karin Braunsberger, Ph.D.
Associate Professor of Marketing
University of South Florida St. Petersburg
College of Business
140 Seventh Avenue South
Bayboro Station 306
St. Petersburg, FL 33701-5016
Telephone: (727) 873-4082
Fax: (727) 873-4192
E-mail: [email protected]

Roger Gates
DSS Research
6750 Locke Avenue
Ft. Worth, TX 76116
Telephone: (817) 665-7000
Fax: (817) 665-7001
E-mail: [email protected]

Abstract

Purpose: To produce up-to-date inventories for satisfaction and Likert scales that contain commonly used scale point descriptors and their respective mean scale values and standard deviations.

Methodology/Approach: All data were collected online using the SSI Survey Spot Panel. The panel is national (U.S.) in scope and was screened to include individuals 21-65 years of age. A random sample was drawn. Thirty-nine satisfaction items and 19 agreement items were tested, and the mean value and the standard deviation were calculated for each of these descriptors.

Findings: Even though only six of the items that had been tested by Jones and Thurstone (1955) were included in the list of satisfaction scale descriptors, the semantic meanings of those six have changed very little over the years.

Research limitations/implications: One limitation of the current study might be the chosen service context, since scale point descriptor inventories developed within the context of health insurance might not be valid in other service contexts.
Practical Implications: The present study focuses on two types of scales that are frequently used in service environments, namely Likert and satisfaction scales. Its major contribution is to provide researchers and managers in services marketing with quantitative measurement of the meanings of commonly used scale point descriptors, which, as pointed out by Myers and Warner (1968), will make possible the development of equal interval scales and thus aid analyses of data sets. It will thus help service marketers to develop questionnaires that more accurately reflect actual consumer satisfaction and opinions.

Keywords: Satisfaction scale inventory, Likert scale inventory.

Paper Type: Research paper.

1. Introduction

Researchers and managers in services marketing are often concerned with assessing customer satisfaction and opinions (Bearden, Malhotra and Uscátequi, 1998). When developing questions to assess satisfaction, it has been strongly suggested that the end points of preference response scales should be words or phrases that denote bi-polar extremes, and that all anchoring points should be suitably spaced along the semantic continuum connecting the end points (Jones and Thurstone, 1955). Jones and Thurstone (1955) further express the need to investigate the semantic properties of commonly used scale point descriptors, both to make sure that they possess the above properties and to make sure that their meaning is as clear as possible to subjects who represent the researcher’s population of interest. Further, knowing the exact scale value of each scale point descriptor is important when constructing successive-interval scales.
Consequently, Jones and Thurstone (1955) examine the semantic meanings, to respondents, of 51 scale point descriptors using 9-point scales and subsequently present the research community with a listing of words and phrases that range from those expressing “greatest like” to those conveying the “greatest dislike.” That is, the authors succeed in constructing a “continuum of meaning” that ranges from the end points “best of all” to its bi-polar extreme “despise” (p.33), and further provide future researchers with both the scale value and standard deviation of each of the tested words and phrases.

Similarly, Myers and Warner (1968) argue that the construction of accurate and meaningful scales requires that researchers comprehend the psychological meaning, to the respondent, of scale point descriptors. These authors further assert that quantitative measurement of the meanings of commonly used scale point descriptors would allow researchers to develop equal interval scales that are desirable for subsequent statistical analyses of data sets. Accordingly, Myers and Warner (1968) modify the technique introduced by Jones and Thurstone (1955), investigate the psychological meaning of 50 commonly used scale point descriptors to four different groups of respondents, and present the respective mean scale values and standard deviations for all four groups of respondents. Even though the four subject groups are very different from each other (i.e., housewives, business executives, undergraduate and graduate business students), their mean scale values and standard deviations are very similar.
Similar studies have been conducted by Bartram and Yelding (1973), Vidali (1975), and Wildt and Mazis (1978), and the findings indicate that inventory scale values such as those provided by Jones and Thurstone (1955) and Myers and Warner (1968) “are surprisingly consistent among very diverse groups of people,” “can be used with a high degree of confidence,” and are “likely to provide psychological scales that are virtually equi-distant” (Vidali, 1975, p.25).

Considering, however, that languages change over time (Graddol, 2004; Yang, 2000), and that no recent inventories are available, the purpose of the present study is to produce a current inventory containing commonly used scale point descriptors and their respective mean scale values and standard deviations. Since the present study focuses on two types of scales that are frequently used in service environments, namely Likert and satisfaction scales, the major contribution of this study is to provide researchers with quantitative measurement of the meanings of commonly used scale point descriptors, which, as pointed out by Myers and Warner (1968), will make possible the development of equal interval scales and thus aid statistical analyses of data sets.

2. Methods

The goal of the present research was to develop inventories for two types of frequently used response scales, namely satisfaction and Likert scales. A review of the literature focused on locating commonly used scale point descriptors for both types of scales (see Tables 1 and 2). Given that there is considerable overlap of scale point descriptors, a final number of 39 satisfaction items and 19 agreement items was chosen and tested.

[INSERT TABLES 1 AND 2 HERE]

The data collection followed the method first outlined by Jones and Thurstone (1955). Accordingly, all satisfaction scale point descriptors were treated as items on nine-point scales (from -4 to +4).
Each scale was anchored to the left by “greatest dislike,” at its midpoint by “neither like nor dislike,” and to the right by “greatest like” (see Table 3 for the instructions given to respondents). The procedure for the Likert scale point descriptors was similar, except that the left-hand anchor read “greatest disagreement,” the scale midpoint “neither agree nor disagree,” and the right-hand anchor “greatest agreement.” For each of the scale point descriptors, respondents were asked to place a check mark in the space on the nine-point scale that best described the meaning of the respective scale point descriptor.

[INSERT TABLE 3 HERE]

All data were collected online, in the United States. For that purpose, the SSI Survey Spot Panel was used. The panel is national in scope and was screened to include individuals 21-65 years of age. A random sample was drawn, and of those invited to participate by the panel, 65% qualified to participate in the survey. That is, because the present study focuses on creating an inventory of satisfaction and agreement measures in the health insurance industry, we recruited only subjects who actually had experience with such insurance, i.e., had group health insurance through an employer [self or spouse]. Considering that 65% of the U.S. population has health insurance, our samples are therefore representative of the population of interest. Further, only the household decision-maker or co-decision maker was qualified to participate. The response rate of those who qualified was 62%.

All subjects were asked to rate each of the 39 satisfaction and 19 agreement items. The satisfaction scale point descriptors were rated first, followed by the agreement scale point descriptors. The order of the items within each of the categories (i.e., satisfaction and agreement descriptors) was random.
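The rating task described above yields, for every scale point descriptor, one integer rating per respondent on the -4 to +4 scale. The following is a minimal sketch of how such ratings could be summarized into a valid N, a mean scale value, and a standard deviation per descriptor. The function name, the data layout, and the example ratings are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which formula was applied.

```python
from statistics import mean, stdev

# Nine-point scale used in the rating task (from -4 to +4).
SCALE_MIN, SCALE_MAX = -4, 4


def summarize_ratings(ratings_by_descriptor):
    """Return {descriptor: (valid_n, mean, std_dev)}.

    `ratings_by_descriptor` maps each descriptor to a list holding one
    rating per respondent; None marks a missing response.
    Hypothetical helper, not the authors' actual procedure.
    """
    summary = {}
    for descriptor, ratings in ratings_by_descriptor.items():
        # Drop missing responses so that only valid ratings are counted.
        valid = [r for r in ratings if r is not None]
        for r in valid:
            if not isinstance(r, int) or not SCALE_MIN <= r <= SCALE_MAX:
                raise ValueError(f"rating {r!r} is off the -4..+4 scale")
        if len(valid) < 2:
            raise ValueError(f"need at least two ratings for {descriptor!r}")
        summary[descriptor] = (
            len(valid),
            round(float(mean(valid)), 2),
            # Assumption: sample standard deviation (n - 1 denominator).
            round(stdev(valid), 2),
        )
    return summary


# Made-up ratings for illustration only; these are not the study's data.
example = {
    "Satisfied": [2, 2, 1, 3, 2],
    "OK": [1, 0, 1, 1, 0, None],
}
for name, (n, m, sd) in summarize_ratings(example).items():
    print(f"{name}: N={n}, mean={m:.2f}, std dev={sd:.2f}")
```

Rejecting off-scale values instead of silently clipping them keeps data-entry errors visible before they distort a descriptor's mean scale value.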
Following the procedure outlined by Jones and Thurstone (1955) and defended by Myers and Warner (1968), all subjects (N = 272) were shown all scale-point descriptors within each category at once.

3. Data analysis and results

The mean value and the standard deviation were calculated for each of the scale point descriptors (Tables 4 and 5). Interestingly, even though only six of the items that had been tested by Jones and Thurstone (1955) were included in the list of satisfaction scale descriptors, the semantic meanings of those six have changed very little over the years (see Table 4).

[INSERT TABLES 4 AND 5 HERE]

4. Discussion and conclusion

The current study examines the semantic properties of commonly used scale point descriptors for both satisfaction and agreement scales, and subsequently provides inventories of mean values and standard deviations for these scale point descriptors to be used by researchers. Knowing a scale point descriptor’s mean value makes it possible to construct successive interval and/or equal interval scales that support meaningful statistical analyses and interpretation.

The current study manages to overcome some of the limitations pointed out by Myers and Warner (1968), namely the use of relatively small samples that are neither national in scope nor random. One limitation that future research should investigate, however, arises from the chosen product context. It is conceivable that scale point descriptor inventories developed within the context of health insurance might not be valid in other product contexts. However, even as we point to this limitation, Mittelstaedt (1971, p.
236), who compares three different studies that focused on building scale point descriptor inventories, helps us argue that the product context used to develop an inventory is not very likely to impact the usefulness of that inventory in other product contexts: “In spite of differences in time, place, subjects, instruments, instructions, referents and the contextual differences which may arise from using widely different arrays of stimuli, the correspondence among the scale values of the three studies seems remarkable.”

TABLE 1
Satisfaction Scales

Crosby and Stephens, 1987, Journal of Marketing Research (cited by Wirtz and Lee, 2003, Journal of Service Research)
    Displeased; Pleased

Kolodinsky, 1999, Journal of Consumer Affairs
    Very dissatisfied; Dissatisfied; Neutral; Satisfied; Very satisfied

Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Very satisfied; Somewhat satisfied; Somewhat dissatisfied; Very dissatisfied; Uncertain

Peterson and Wilson, 1992, Journal of Marketing Research
    Very satisfied; Somewhat satisfied; Unsatisfied; Very unsatisfied

Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Completely satisfied; Very satisfied; Fairly satisfied; Somewhat dissatisfied; Very dissatisfied

Preisser, 2002, Health Services and Outcomes Research Methodology
    Excellent; Very good; Good; Fair; Poor

SIP Servizio Opinioni, 1989, as cited in Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Very satisfied; Quite satisfied; Not very satisfied; Not at all satisfied

Weinstein, 1989, American Banker Consumer Survey
    Very satisfied; Somewhat satisfied; Completely unsatisfied

Westbrook, 1980, Journal of Marketing (T-D Scale)
    Delighted; Pleased; Mostly satisfied; Mixed (about equally satisfied and dissatisfied); Mostly dissatisfied; Unhappy; Terrible

For reasons of completion and exploratory purposes, the following scale point descriptors were added
    Extremely satisfied; acceptable; slightly satisfied; OK; neither satisfied nor dissatisfied; slightly dissatisfied; fairly dissatisfied; completely dissatisfied; extremely dissatisfied

TABLE 2
Likert Scales

Albaum, 1997, Market Research Society
    Strongly agree; Agree; Neither agree nor disagree; Disagree; Strongly disagree

Hair, Bush and Ortinau, 2003, Marketing Research
    Definitely agree; Generally agree; Slightly agree; Slightly disagree; Generally disagree; Definitely disagree

Jacoby and Matell, 1971, Journal of Marketing Research
    Agree; Uncertain; Disagree

McDaniel and Gates, 2002, Marketing Research
    Strongly agree; Somewhat agree; Neutral; Somewhat disagree; Strongly disagree

Menezes and Elbert, 1979, Journal of Marketing Research
    Strongly agree; Generally agree; Moderately agree; Moderately disagree; Generally disagree; Strongly disagree

For reasons of completion and exploratory purposes, the following two scale point descriptors were added
    Completely agree; Completely disagree

TABLE 3
Instructions to Respondents

WORD MEANING TEST

In this test are words and phrases that people might use to show like or dislike for health insurance plans. For each word or phrase make a check mark to show what the word or phrase means to you. Look at the examples.

Example I

Suppose you heard a person say that he/she “barely liked” his/her health insurance plan. You would probably decide that he/she likes it only a little. To show the meaning of the phrase “barely like,” you would probably check under +1 on the scale below.

                             Greatest          Neither Like         Greatest
                             Dislike           Nor Dislike          Like
                               -4   -3   -2   -1    0   +1   +2   +3   +4
Barely like                                              √

Example II

If you heard someone say he had the “greatest possible dislike” for a certain health insurance plan, you would probably check under -4, as shown on the scale below.

                             Greatest          Neither Like         Greatest
                             Dislike           Nor Dislike          Like
                               -4   -3   -2   -1    0   +1   +2   +3   +4
Greatest possible dislike       √

For each phrase on the following pages, check along the scale to show how much like or dislike the phrase means.
TABLE 4
Satisfaction Items

Item                                                Valid N    Mean    Std Dev
Excellent                                             272      3.74     0.94
Completely satisfied                                  272      3.58     1.24
Extremely satisfied                                   272      3.33     1.91
Very satisfied                                        272      3.29     1.21
Delighted                                             272      3.11     1.22
Very good                                             272      2.61     0.91
Quite satisfied                                       272      2.67     1.17
Mostly satisfied                                      272      2.39     1.14
Pleased                                               272      2.04     1.17
Satisfied                                             272      1.88     1.14
Good                                                  272      1.81     0.96
Fairly satisfied                                      272      1.45     1.01
Somewhat satisfied                                    272      1.32     0.86
Acceptable                                            272      1.22     0.82
Slightly satisfied                                    272      0.94     0.91
OK                                                    272      0.69     0.85
Fair                                                  272      0.47     0.92
Neutral                                               272      0.03     0.36
Mixed (about equally satisfied and dissatisfied)      272      0.00     0.43
Neither satisfied nor dissatisfied                    272     -0.02     0.36
Uncertain                                             272     -0.07     0.50
Slightly dissatisfied                                 272     -1.13     0.76
Somewhat dissatisfied                                 272     -1.42     0.81
Fairly dissatisfied                                   272     -1.66     0.98
Not very satisfied                                    272     -1.51     1.32
Displeased                                            272     -1.85     1.10
Unhappy                                               272     -1.87     1.10
Dissatisfied                                          272     -1.85     1.25
Poor                                                  272     -1.92     1.29
Unsatisfied                                           272     -2.14     1.21
Mostly dissatisfied                                   272     -2.78     0.92
Quite dissatisfied                                    272     -2.65     1.73
Very unsatisfied                                      272     -3.15     1.45
Not at all satisfied                                  272     -3.25     1.43
Very dissatisfied                                     272     -3.08     1.57
Terrible                                              272     -3.36     1.14
Completely dissatisfied                               272     -3.22     2.14
Completely unsatisfied                                272     -3.60     1.45
Extremely dissatisfied                                272     -3.71     1.02

* Statistically significant at the .05 level

Jones and Thurstone (1955) inventoried the scale point descriptors excellent (mean = 3.71, std dev = 1.01); very good (mean = 2.56, std dev = .87); good (mean = 1.91, std dev = .76); fair (mean = .78, std dev = .47); neutral (mean = .02, std dev = .18); poor (mean = -1.55, std dev = .87)

TABLE 5
Likert Items

Item                                                Valid N    Mean    Std Dev
Completely agree                                      272      3.63     0.91
Definitely agree                                      272      3.32     1.13
Strongly agree                                        272      3.05     1.60
Agree                                                 272      1.92     0.88
Generally agree                                       272      1.67     0.91
Moderately agree                                      272      1.62     1.03
Somewhat agree                                        272      1.23     0.65
Slightly agree                                        272      0.96     0.56
Neutral                                               272      0.01     0.25
Neither agree nor disagree                            272      0.00     0.32
Uncertain                                             272     -0.08     0.44
Slightly disagree                                     272     -0.95     0.84
Somewhat disagree                                     272     -1.37     0.80
Moderately disagree                                   272     -1.65     1.12
Disagree                                              272     -1.82     1.01
Generally disagree                                    272     -1.74     1.09
Strongly disagree                                     272     -3.21     1.43
Definitely disagree                                   272     -3.45     1.18
Completely disagree                                   272     -3.69     1.18

* Statistically significant at the .05 level

REFERENCES

Albaum, G. (1997) “The Likert scale revisited: an alternate version,” Market Research Society, Vol 39 No 2, pp. 331-48.

Bartram, P., Yelding, D. (1973) “The development of an empirical method of selecting phrases used in verbal rating scales,” Journal of the Market Research Society, Vol 15, pp. 151-56.

Bearden, W. O., Malhotra, M. K., Uscátequi, K. H. (1998) “Customer contact and the evaluation of service experiences: propositions and implications for the design of services,” Psychology and Marketing, Vol 15 No 8, pp. 793-809.

Crosby, L. A., Stephens, N. (1987) “Effects of relationship marketing on satisfaction, retention, and prices in the life insurance industry,” Journal of Marketing Research, Vol 24 (November), pp. 404-11.

Graddol, D. (2004) “The future of language,” Science, Vol 303 (February 27), pp. 1329-31.

Hair, J. F., Bush, R. P., Ortinau, D. J. (2003) Marketing Research: Within a Changing Information Environment, 2nd edition. McGraw-Hill, New York.

Jacoby, J., Matell, M. S. (1971) “Three-point Likert scales are good enough,” Journal of Marketing Research, Vol 8 (November), pp. 495-500.

Jones, L. V., Thurstone, L. L. (1955) “The psychophysics of semantics: an experimental investigation,” The Journal of Applied Psychology, Vol 39 No 1, pp. 31-6.

Kolodinsky, J. (1999) “Consumer satisfaction with a managed health care plan,” The Journal of Consumer Affairs, Vol 33 No 2, pp. 223-36.

McDaniel, C., Gates, R. (2005) Marketing Research, 6th edition. Wiley, Hoboken, NJ.

Menezes, D., Elbert, N. F. (1979) “Alternative semantic scaling formats for measuring store image: an evaluation,” Journal of Marketing Research, Vol 16 (February), pp. 80-7.

Mittelstaedt, R. A. (1971) “Semantic properties of selected evaluative adjectives: other evidence,” Journal of Marketing Research, Vol 8 (May), pp. 236-37.

Myers, J. H., Warner, W. G. (1968) “Semantic properties of selected evaluation adjectives,” Journal of Marketing Research, Vol 5 (November), pp. 409-12.

Peterson, R. A., Wilson, W. R. (1992) “Measuring customer satisfaction: fact and artifact,” Journal of the Academy of Marketing Science, Vol 20 No 1, pp. 61-71.

Preisser, J. S. (2002) “Quasi-likelihood analysis of patient satisfaction with medical care,” Health Services & Outcomes Research Methodology, Vol 3 No 4, pp. 233-45.

Vidali, J. J. (1975) “Context effects on scaled evaluatory adjective meaning,” Journal of the Market Research Society, Vol 17 No 1, pp. 21-5.

Weinstein, M. (1989) “Consumers still like service, but their enthusiasm erodes,” American Banker Consumer Survey, Vol 6, pp. 18, 20.

Westbrook, R. A. (1980) “A rating scale for measuring product/service satisfaction,” Journal of Marketing, Vol 44 (Fall), pp. 68-72.

Wildt, A. R., Mazis, M. B. (1978) “Determinants of scale response: label versus position,” Journal of Marketing Research, Vol 15 (May), pp. 261-7.

Wirtz, J., Lee, M. C. (2003) “An examination of the quality and context-specific applicability of commonly used customer satisfaction measures,” Journal of Service Research, Vol 5 No 4, pp. 345-55.

Yang, C. D. (2000) “Internal and external forces in language change,” Language Variation and Change, Vol 12 No 3, pp. 231-50.