Developing inventories for satisfaction and Likert scales in a service
environment
by
Contact Author: Karin Braunsberger, Ph.D.
Associate Professor of Marketing
University of South Florida St. Petersburg
College of Business
140 Seventh Avenue South
Bayboro Station 306
St. Petersburg, FL 33701-5016
Telephone: (727) 873-4082
Fax: (727) 873-4192
E-mail: [email protected]
Roger Gates
DSS Research
6750 Locke Avenue
Ft. Worth, TX 76116
Telephone: (817) 665-7000
Fax: (817) 665-7001
E-mail: [email protected]
Abstract
Purpose: To produce up-to-date inventories for satisfaction and Likert scales that contain
commonly used scale point descriptors and their respective mean scale values and standard
deviations.
Methodology/Approach: All data were collected online using the SSI Survey Spot Panel. The
panel is national (U.S.) in scope and was screened to include individuals 21-65 years of age. A
random sample was drawn. Thirty-nine satisfaction items and 19 agreement items were tested,
and the mean value and the standard deviation were calculated for each of these descriptors.
Findings: Even though only six of the items that had been tested by Jones and Thurstone (1955)
were included in the list of satisfaction scale descriptors, the semantic meanings of those six have
changed very little over the years.
Research limitations/implications: One limitation of the current study might be the chosen
service context, since scale point descriptor inventories developed within the context of health
insurance might not be valid in other service contexts.
Practical Implications: Since the present study focuses on two types of scales that are frequently
used in service environments, namely Likert and satisfaction scales, the major contribution of
this study is to provide researchers and managers in services marketing with quantitative
measurement of the meanings of commonly used scale point descriptors, which, as pointed out by
Myers and Warner (1968), will make possible the development of equal interval scales and thus
aid analyses of data sets. It will thus help service marketers to develop questionnaires that more
accurately reflect actual consumer satisfaction and opinions.
Keywords: Satisfaction scale inventory, Likert scale inventory.
Paper Type: Research paper.
1. Introduction
Researchers and managers in services marketing are often concerned with assessing
customer satisfaction and opinions (Bearden, Malhotra and Uscátegui, 1998). When developing
questions to assess satisfaction it has been strongly suggested that the end points of preference
response scales should be words or phrases that denote bi-polar extremes, and that all anchoring
points should be suitably spaced along the semantic continuum connecting the end points (Jones
and Thurstone, 1955). Jones and Thurstone (1955) further express the need to investigate the
semantic properties of commonly used scale point descriptors to make sure that they possess the
above properties and also carry meaning that is as clear as possible to subjects that represent the
researcher’s population of interest. Further, knowing the exact scale value of each scale point
descriptor is important when constructing successive-interval scales. Consequently,
Jones and Thurstone (1955) examine the semantic meanings, to respondents, of 51 scale point
descriptors using 9-point scales and subsequently present the research community with a listing
of words and phrases that range from those expressing “greatest like” to those conveying the
“greatest dislike.” That is, the authors succeed in constructing a “continuum of meaning” that
ranges from the end point “best of all” to its bi-polar extreme “despise” (p. 33), and further
provide future researchers with both the scale value and standard deviation of each of the tested
words and phrases.
Similarly, Myers and Warner (1968) argue that the construction of accurate and
meaningful scales requires that researchers comprehend the psychological meaning, to the
respondent, of scale point descriptors. These authors further assert that quantitative measurement
of the meanings of commonly used scale point descriptors would allow researchers to develop
equal interval scales that are desirable for subsequent statistical analyses of data sets.
Accordingly, Myers and Warner (1968) modify the technique introduced by Jones and Thurstone
(1955), investigate the psychological meaning of 50 commonly used scale point descriptors to
four different groups of respondents, and present the respective mean scale values and standard
deviations for all four groups of respondents. Even though the four subject groups are very
different from each other (i.e., housewives, business executives, undergraduate and graduate
business students), their mean scale values and standard deviations are very similar.
Similar studies have been conducted by Bartram and Yelding (1973), Vidali (1975), and
Wildt and Mazis (1978). Their findings indicate that inventory scale values such as those provided
by Jones and Thurstone (1955) and Myers and Warner (1968) “are surprisingly consistent among
very diverse groups of people,” “can be used with a high degree of confidence,” and are “likely
to provide psychological scales that are virtually equi-distant” (Vidali, 1975, p. 25).
Because languages change over time (Graddol, 2004; Yang, 2000) and no recent inventories
are available, however, the present study aims to produce a current
inventory containing commonly used scale point descriptors and their respective mean scale
values and standard deviations. Because the study focuses on two types of scales that are
frequently used in service environments, namely Likert and satisfaction scales, its major
contribution is to provide researchers with quantitative measurement of the
meanings of commonly used scale point descriptors, which, as pointed out by Myers and Warner
(1968), will make possible the development of equal interval scales and thus aid statistical
analyses of data sets.
2. Methods
The goal of the present research was to develop inventories for two types of frequently
used response scales, namely satisfaction and Likert scales. A review of the literature focused on
locating commonly used scale point descriptors for both types of scales (see Tables 1 and 2).
Given that there is considerable overlap among the scale point descriptors, a final set of 39
satisfaction items and 19 agreement items was chosen and tested.
[INSERT TABLES 1 AND 2 HERE]
The data collection followed the method first outlined by Jones and Thurstone (1955).
Accordingly, all satisfaction scale point descriptors were treated as items on nine-point scales
(from -4 to +4). Each scale was anchored on the left by “greatest dislike,” at its midpoint by
“neither like nor dislike,” and on the right by “greatest like” (see Table 3 for the instructions
given to respondents). The procedure for the Likert scale point descriptors was similar, except
that the left-hand anchor read “greatest disagreement,” the scale midpoint “neither agree nor
disagree,” and the right-hand anchor “greatest agreement.” For each of the scale point
descriptors, respondents were asked to place a check mark in the space on the nine-point scale
that best described the meaning of the respective scale point descriptor.
[INSERT TABLE 3 HERE]
All data were collected online, in the United States. For that purpose, the SSI Survey Spot
Panel was used. The panel is national in scope and was screened to include individuals 21-65
years of age. A random sample was drawn, and of those invited by the panel to participate, 65%
qualified for the survey. Because the present study focuses on creating an
inventory of satisfaction and agreement measures in the health insurance industry, we recruited
only subjects who actually had experience with such insurance, i.e., who had group health insurance
through an employer (their own or their spouse’s). Given that 65% of the U.S. population has health
insurance, a figure that matches the qualification rate, we consider our sample
representative of the population of interest. Further, only the
household decision-maker or co-decision-maker qualified to participate. The response rate of
those who qualified was 62%. All subjects were asked to rate each of the 39 satisfaction and 19
agreement items. The satisfaction scale point descriptors were rated first, followed by the
agreement scale point descriptors. The order of the items within each of the categories (i.e.,
satisfaction and agreement descriptors) was random. Following the procedure outlined by Jones
and Thurstone (1955) and defended by Myers and Warner (1968), all subjects (N = 272) were
shown all scale-point descriptors within each category at once.
3. Data analysis and results
The mean value and the standard deviation were calculated for each of the scale point
descriptors (Tables 4 and 5). Interestingly, even though only six of the items that had been tested
by Jones and Thurstone (1955) were included in the list of satisfaction scale descriptors, the
semantic meanings of those six have changed very little over the years (see Table 4).
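As an illustration of this calculation, the following minimal Python sketch computes the mean and standard deviation for a few descriptors rated on the nine-point scale. The ratings shown are hypothetical and are not data from this study. The use of the sample standard deviation (n - 1 denominator) is likewise an assumption, since the formula used is not specified here.

```python
from statistics import mean, stdev

# Hypothetical ratings on the nine-point scale (-4 to +4).
# These values are invented for illustration; they are NOT study data.
ratings = {
    "Excellent": [4, 4, 3, 4, 4],
    "Fair": [1, 0, 1, 0, 1],
    "Poor": [-2, -1, -3, -2, -2],
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


# Build a small inventory: descriptor -> (mean scale value, std dev).
inventory = {descriptor: summarize(v) for descriptor, v in ratings.items()}

# List descriptors from most to least favorable, as in an inventory table.
for descriptor, (m, sd) in sorted(inventory.items(), key=lambda kv: -kv[1][0]):
    print(f"{descriptor:<10} mean={m:+.2f}  sd={sd:.2f}")
```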
[INSERT TABLES 4 AND 5 HERE]
4. Discussion and conclusion
The current study examines the semantic properties of commonly used scale point
descriptors for both satisfaction and agreement scales, and subsequently provides inventories of
mean values and standard deviations for these scale point descriptors to be used by researchers.
Knowing a scale point descriptor’s mean value makes it possible to construct successive interval
and/or equal interval scales that support meaningful statistical analyses and interpretation.
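One way such a construction might proceed is sketched below in Python. The sketch takes the mean values reported in Table 5, places evenly spaced targets between the lowest and the highest mean, and picks the descriptor whose mean lies nearest each target. The function name pick_equal_interval and the nearest-target rule are illustrative assumptions, not a procedure prescribed by this study.

```python
# Mean scale values of the Likert items, as reported in Table 5.
LIKERT_MEANS = {
    "Completely agree": 3.63, "Definitely agree": 3.32, "Strongly agree": 3.05,
    "Agree": 1.92, "Generally agree": 1.67, "Moderately agree": 1.62,
    "Somewhat agree": 1.23, "Slightly agree": 0.96, "Neutral": 0.01,
    "Neither agree nor disagree": 0.00, "Uncertain": -0.08,
    "Slightly disagree": -0.95, "Somewhat disagree": -1.37,
    "Moderately disagree": -1.65, "Disagree": -1.82,
    "Generally disagree": -1.74, "Strongly disagree": -3.21,
    "Definitely disagree": -3.45, "Completely disagree": -3.69,
}


def pick_equal_interval(means, n_points):
    """Pick n_points descriptors whose means lie closest to evenly spaced targets.

    Targets run from the lowest to the highest mean in the inventory
    (an assumption of this sketch, not a rule stated in the paper).
    Each descriptor is used at most once.
    """
    lo, hi = min(means.values()), max(means.values())
    step = (hi - lo) / (n_points - 1)
    remaining = dict(means)
    chosen = []
    for i in range(n_points):
        target = lo + i * step
        best = min(remaining, key=lambda d: abs(remaining[d] - target))
        chosen.append((best, remaining.pop(best)))
    return chosen


scale = pick_equal_interval(LIKERT_MEANS, 5)
for descriptor, value in scale:
    print(f"{value:+.2f}  {descriptor}")
```

For a five-point scale, the sketch selects completely disagree, disagree, neither agree nor disagree, agree, and completely agree, whose means (-3.69, -1.82, 0.00, 1.92, 3.63) are spaced roughly 1.7 to 1.9 scale units apart.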
Although the current study overcomes some of the limitations pointed out by
Myers and Warner (1968), namely the use of relatively small samples that are neither national in
scope nor random, one limitation that future research should
investigate is the chosen product context. It is conceivable
that scale point descriptor inventories developed within the context of health insurance might not
be valid in other product contexts. However, even as we point to this limitation, Mittelstaedt
(1971, p. 236), who compares three different studies that focused on building scale point
descriptor inventories, helps us argue that the product context used to develop an inventory is not
very likely to impact the usefulness of that inventory in other product contexts: “In spite of
differences in time, place, subjects, instruments, instructions, referents and the contextual
differences which may arise from using widely different arrays of stimuli, the correspondence
among the scale values of the three studies seems remarkable.”
TABLE 1
Satisfaction Scales
Source: Scale point descriptors
Crosby and Stephens, 1987, Journal of Marketing Research (cited by Wirtz and Lee, 2003, Journal of Service Research): Displeased; Pleased
Kolodinsky, 1999, Journal of Consumer Affairs: Very dissatisfied; Dissatisfied; Neutral; Satisfied; Very satisfied
Peterson and Wilson, 1992, Journal of the Academy of Marketing Science: Very satisfied; Somewhat satisfied; Somewhat dissatisfied; Very dissatisfied; Uncertain
Peterson and Wilson, 1992, Journal of Marketing Research: Very satisfied; Somewhat satisfied; Unsatisfied; Very unsatisfied
Peterson and Wilson, 1992, Journal of the Academy of Marketing Science: Completely satisfied; Very satisfied; Fairly satisfied; Somewhat dissatisfied; Very dissatisfied
Preisser, 2002, Health Services and Outcomes Research Methodology: Excellent; Very good; Good; Fair; Poor
SIP Servizio Opinioni, 1989, as cited in Peterson and Wilson, 1992, Journal of the Academy of Marketing Science: Very satisfied; Quite satisfied; Not very satisfied; Not at all satisfied
Weinstein, 1989, American Banker Consumer Survey: Very satisfied; Somewhat satisfied; Completely unsatisfied
Westbrook, 1980, Journal of Marketing (T-D Scale): Delighted; Pleased; Mostly satisfied; Mixed (about equally satisfied and dissatisfied); Mostly dissatisfied; Unhappy; Terrible
For reasons of completion and exploratory purposes, the following scale point descriptors were added: Extremely satisfied; Acceptable; Slightly satisfied; OK; Neither satisfied nor dissatisfied; Slightly dissatisfied; Fairly dissatisfied; Completely dissatisfied; Extremely dissatisfied
TABLE 2
Likert Scales
Source: Scale point descriptors
Albaum, 1997, Market Research Society: Strongly agree; Agree; Neither agree nor disagree; Disagree; Strongly disagree
Hair, Bush and Ortinau, 2003, Marketing Research: Definitely agree; Generally agree; Slightly agree; Slightly disagree; Generally disagree; Definitely disagree
Jacoby and Matell, 1971, Journal of Marketing Research: Agree; Uncertain; Disagree
McDaniel and Gates, 2002, Marketing Research: Strongly agree; Somewhat agree; Neutral; Somewhat disagree; Strongly disagree
Menezes and Elbert, 1979, Journal of Marketing Research: Strongly agree; Generally agree; Moderately agree; Moderately disagree; Generally disagree; Strongly disagree
For reasons of completion and exploratory purposes, the following two scale point descriptors were added: Completely agree; Completely disagree
TABLE 3
Instructions to Respondents
WORD MEANING TEST
In this test are words and phrases that people might use to show like or dislike for
health insurance plans. For each word or phrase make a check mark to show what the word or
phrase means to you. Look at the examples.
Example I
Suppose you heard a person say that he/she “barely liked” his/her health insurance
plan. You would probably decide that he/she likes it only a little. To show the meaning of the
phrase “barely like,” you would probably check under +1 on the scale below.
                          Greatest       Neither Like         Greatest
                          Dislike        Nor Dislike          Like
                          -4   -3   -2   -1    0   +1   +2   +3   +4
Barely like                                         √
Example II
If you heard someone say he had the “greatest possible dislike” for a certain health
insurance plan, you would probably check under -4, as shown on the scale below.
                          Greatest       Neither Like         Greatest
                          Dislike        Nor Dislike          Like
                          -4   -3   -2   -1    0   +1   +2   +3   +4
Greatest possible dislike  √
For each phrase on the following pages, check along the scale to show how much like
or dislike the phrase means.
TABLE 4
Satisfaction Items
Item                                              Valid N    Mean  Std Dev
Excellent                                             272    3.74     0.94
Completely satisfied                                  272    3.58     1.24
Extremely satisfied                                   272    3.33     1.91
Very satisfied                                        272    3.29     1.21
Delighted                                             272    3.11     1.22
Very good                                             272    2.61     0.91
Quite satisfied                                       272    2.67     1.17
Mostly satisfied                                      272    2.39     1.14
Pleased                                               272    2.04     1.17
Satisfied                                             272    1.88     1.14
Good                                                  272    1.81     0.96
Fairly satisfied                                      272    1.45     1.01
Somewhat satisfied                                    272    1.32     0.86
Acceptable                                            272    1.22     0.82
Slightly satisfied                                    272    0.94     0.91
OK                                                    272    0.69     0.85
Fair                                                  272    0.47     0.92
Neutral                                               272    0.03     0.36
Mixed (about equally satisfied and dissatisfied)      272    0.00     0.43
Neither satisfied nor dissatisfied                    272   -0.02     0.36
Uncertain                                             272   -0.07     0.50
Slightly dissatisfied                                 272   -1.13     0.76
Somewhat dissatisfied                                 272   -1.42     0.81
Fairly dissatisfied                                   272   -1.66     0.98
Not very satisfied                                    272   -1.51     1.32
Displeased                                            272   -1.85     1.10
Unhappy                                               272   -1.87     1.10
Dissatisfied                                          272   -1.85     1.25
Poor                                                  272   -1.92     1.29
Unsatisfied                                           272   -2.14     1.21
Mostly dissatisfied                                   272   -2.78     0.92
Quite dissatisfied                                    272   -2.65     1.73
Very unsatisfied                                      272   -3.15     1.45
Not at all satisfied                                  272   -3.25     1.43
Very dissatisfied                                     272   -3.08     1.57
Terrible                                              272   -3.36     1.14
Completely dissatisfied                               272   -3.22     2.14
Completely unsatisfied                                272   -3.60     1.45
Extremely dissatisfied                                272   -3.71     1.02
* Statistically significant at the .05 level
Jones and Thurstone (1955) inventoried the scale point descriptors excellent (mean = 3.71, std
dev = 1.01); very good (mean = 2.56, std dev = .87); good (mean = 1.91, std dev = .76); fair
(mean = .78, std dev = .47); neutral (mean = .02, std dev = .18); poor (mean = -1.55, std dev =
.87)
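The comparison in the note above can be made explicit with a short Python sketch. It uses only the means reported in Table 4 and in the note; the variable names are illustrative.

```python
# Means from Table 4 (current study) and from the note on
# Jones and Thurstone (1955) for the six shared descriptors.
current = {
    "excellent": 3.74, "very good": 2.61, "good": 1.81,
    "fair": 0.47, "neutral": 0.03, "poor": -1.92,
}
jones_thurstone = {
    "excellent": 3.71, "very good": 2.56, "good": 1.91,
    "fair": 0.78, "neutral": 0.02, "poor": -1.55,
}

# Shift in mean scale value for each descriptor (current minus 1955).
shifts = {d: round(current[d] - jones_thurstone[d], 2) for d in current}

# Descriptor whose mean moved the most, and the average absolute movement.
largest = max(shifts, key=lambda d: abs(shifts[d]))
mean_abs_shift = sum(abs(s) for s in shifts.values()) / len(shifts)

# Check whether both inventories rank the six descriptors identically.
same_order = sorted(current, key=current.get) == sorted(
    jones_thurstone, key=jones_thurstone.get
)

for descriptor, shift in shifts.items():
    print(f"{descriptor:<10} {shift:+.2f}")
print(f"largest shift: {largest}")
print(f"mean absolute shift: {mean_abs_shift:.3f}")
print(f"same rank order: {same_order}")
```

The largest shift is for poor (-0.37), followed by fair (-0.31). The remaining four descriptors moved by 0.10 scale points or less, and the rank order of the six descriptors is unchanged.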
TABLE 5
Likert Items
Item                          Valid N    Mean  Std Dev
Completely agree                  272    3.63     0.91
Definitely agree                  272    3.32     1.13
Strongly agree                    272    3.05     1.60
Agree                             272    1.92     0.88
Generally agree                   272    1.67     0.91
Moderately agree                  272    1.62     1.03
Somewhat agree                    272    1.23     0.65
Slightly agree                    272    0.96     0.56
Neutral                           272    0.01     0.25
Neither agree nor disagree        272    0.00     0.32
Uncertain                         272   -0.08     0.44
Slightly disagree                 272   -0.95     0.84
Somewhat disagree                 272   -1.37     0.80
Moderately disagree               272   -1.65     1.12
Disagree                          272   -1.82     1.01
Generally disagree                272   -1.74     1.09
Strongly disagree                 272   -3.21     1.43
Definitely disagree               272   -3.45     1.18
Completely disagree               272   -3.69     1.18
* Statistically significant at the .05 level
REFERENCES
Albaum, G. (1997) “The Likert scale revisited: an alternate version,” Market Research Society,
Vol 39 No 2, pp. 331-48.
Bearden, W. O., Malhotra, M. K., Uscátegui, K. H. (1998) “Customer contact and the evaluation
of service experiences: propositions and implications for the design of services,”
Psychology and Marketing, Vol 15 No 8, pp. 793-809.
Bartram, P., Yelding, D. (1973) “The development of an empirical method of selecting phrases
used in verbal rating scales,” Journal of the Market Research Society, Vol 15, pp. 151-56.
Crosby, L. A., Stephens, N. (1987) “Effects of relationship marketing on satisfaction, retention,
and prices in the life insurance industry,” Journal of Marketing Research, Vol 24
(November), pp. 404-11.
Graddol, D. (2004) “The future of language,” Science, Vol 303 (February 27), pp. 1329-31.
Hair, J. F., Bush, R. P., Ortinau, D. J. (2003) Marketing Research: Within a Changing
Information Environment, 2nd edition. McGraw-Hill, New York.
Jacoby, J., Matell, M. S. (1971) “Three-point Likert scales are good enough,” Journal of
Marketing Research, Vol 8 (November), pp. 495-500.
Jones, L. V., Thurstone, L. L. (1955) “The psychophysics of semantics: an experimental
investigation,” The Journal of Applied Psychology, Vol 39 No 1, pp. 31-6.
Kolodinsky, J. (1999) “Consumer satisfaction with a managed health care plan,” The Journal of
Consumer Affairs, Vol 33 No 2, pp. 223-36.
McDaniel, C., Gates, R. (2005) Marketing Research, 6th edition. Wiley, Hoboken, NJ.
Menezes, D., Elbert, N. F. (1979) “Alternative semantic scaling formats for measuring store
image: an evaluation,” Journal of Marketing Research, Vol 16 (February), pp. 80-7.
Mittelstaedt, R. A. (1971) “Semantic properties of selected evaluative adjectives: other
evidence,” Journal of Marketing Research, Vol 8 (May), pp. 236-37.
Myers, J. H., Warner, W. G. (1968) “Semantic properties of selected evaluation
adjectives,” Journal of Marketing Research, Vol 5 (November), pp. 409-12.
Peterson, R. A., Wilson, W. R. (1992) “Measuring customer satisfaction: fact and artifact,”
Journal of the Academy of Marketing Science, Vol 20 No 1, pp. 61-71.
Preisser, J. S. (2002) “Quasi-likelihood analysis of patient satisfaction with medical care,” Health
Services & Outcomes Research Methodology, Vol 3 No 4, pp. 233-45.
Vidali, J. J. (1975) “Context effects on scaled evaluatory adjective meaning,” Journal of the
Market Research Society, Vol 17 No 1, pp. 21-5.
Weinstein, M. (1989) “Consumers still like service, but their enthusiasm erodes,” American
Banker Consumer Survey, Vol 6, pp. 18, 20.
Westbrook, R. A. (1980) “A rating scale for measuring product/service satisfaction,” Journal of
Marketing, Vol 44 (Fall), pp. 68-72.
Wildt, A. R., Mazis, M. B. (1978) “Determinants of scale response: label versus position,”
Journal of Marketing Research, Vol 15 (May), pp. 261-7.
Wirtz, J., Lee, M. C. (2003) “An examination of the quality and context-specific applicability of
commonly used customer satisfaction measures,” Journal of Service Research, Vol 5 No
4, pp. 345-55.
Yang, C. D. (2000) “Internal and external forces in language change,” Language Variation and
Change, Vol 12 No 3, pp. 231-50.