Journal of Classification 1:125-131 (1984)
A Note on Cohen's Profile Similarity Coefficient rc
Sampo V. Paunonen
University of Toronto
Abstract: Analytic procedures for classifying objects are commonly based on
the product-moment correlation as a measure of object similarity. This statistic, however, generally does not represent an invariant index of similarity
between two objects if they are measured along different bipolar variables
where the direction of measurement for each variable is arbitrary. A computer simulation study compared Cohen's (1969) proposed solution to the
problem, the invariant similarity coefficient rc, with the mean product-moment correlation based on all possible changes in the measurement direction of individual variables within a profile of scores. The empirical observation that rc approaches the mean product-moment correlation with increases
in the number of scores in the profiles was interpreted as encouragement for
the use of rc in classification research. Some cautions regarding its application were noted.
Keywords: Transpose factor analysis; Q-technique; Profile analysis; Bipolar
variables; Direction of measurement
1. Introduction
The classification of objects into typologies through transpose factor
analysis (e.g., Q-method factoring) and its variants such as Modal Profile
Analysis (Skinner 1977) is frequently based on the product-moment correlation coefficient as an index of the similarities among profiles of object
scores. It has long been known, however, that the correlation coefficient
between profiles of bipolar variables is not invariant over variable
reflections, or changes in the direction of measurement (Tellegen 1965).
This research was supported by the Social Sciences and Humanities Research Council of
Canada, Grant no. 410-83-0633, and by the University of Toronto.
Author's address: Sam Paunonen, Department of Psychology, The University of Toronto, Erindale Campus, Mississauga, Ontario L5L 1C6.
Consider a 10-item Introversion-Extraversion measure that has been
developed so that the frequency of "true" responses to the items yields a
score for Extraversion. The maximum score on the Extraversion scale
would be 10, which suggests a minimum level of respondent Introversion.
On the other hand, if each item receives a "false" endorsement, a
respondent's score would be zero for Extraversion, implying a relatively
high level of Introversion. But the direction of measurement, or item keying, in this example is arbitrary. It is obvious that one could, instead,
justifiably count the number of "false" responses and compute a total Introversion score; larger numbers would represent more Introversion and,
hence, less Extraversion. This transformation simply involves "reflecting"
respondent Extraversion scores around the scale's midpoint of 5. That is,
an Extraversion score of 0 becomes an Introversion score of 10, 1 becomes
9, and so on. In general, the reflected value of a score X about a neutral
point m is 2m - X (Cohen 1969, p. 281).
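The reflection rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `reflect` is hypothetical, and the example assumes the 10-item scale described above with neutral point m = 5.

```python
def reflect(score, neutral):
    """Reflect a score about a neutral point m, returning 2m - X."""
    return 2 * neutral - score

# Extraversion scores on the hypothetical 10-item scale, reflected about m = 5
extraversion = list(range(11))
introversion = [reflect(x, 5) for x in extraversion]
print(list(zip(extraversion, introversion)))  # (0, 10), (1, 9), ..., (10, 0)
```

Reflecting a score twice returns the original score, which is why the choice of keying direction carries no information about the respondent.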
A change in item keying as illustrated in the example above does not
affect the measurement properties of the scale. Such changes to selected
scales within a profile of scales, however, will generally modify the
configuration or shape of each profile. This can result in increases or
decreases in the correlation between two profiles to such an extent as to
alter conclusions concerning object similarity. A problem of interpretation thus
arises: as an index of profile shape similarity, the product-moment correlation (rpm) can be based on arbitrary
profile configurations, in contrast to Euclidean distance measures of similarity (cf. Budescu 1980; Cronbach & Gleser 1953).
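To make the interpretive problem concrete, the sketch below reflects one variable in two hypothetical four-score profiles (scores on a 0 to 10 scale with neutral point m = 5; the data and helper names are illustrative assumptions, not from the paper). The product-moment correlation changes sign, while the Euclidean distance is unchanged, because reflecting variable j in both profiles only flips the sign of their difference on that variable.

```python
import math

def pearson(x, y):
    """Product-moment correlation (rpm) between two equal-length profiles."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def reflect_variable(profile, j, m):
    """Copy of the profile with variable j reflected about neutral point m."""
    out = list(profile)
    out[j] = 2 * m - out[j]
    return out

# Hypothetical profiles of two objects on four bipolar scales (neutral point 5)
X = [8, 3, 6, 2]
Y = [7, 4, 2, 3]
X2, Y2 = reflect_variable(X, 0, 5), reflect_variable(Y, 0, 5)

print(round(pearson(X, Y), 2), round(pearson(X2, Y2), 2))  # 0.56 -0.65
print(euclidean(X, Y) == euclidean(X2, Y2))                 # True
```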
Cohen (1969) has developed a correlational index of profile similarity
(rc) which is invariant over variable reflections or changes in the direction
of measurement. Coefficient rc is equivalent to adding to the end of each
object profile its corresponding "mirror image" profile, derived from the
reflections of all variable scores about their common neutral point, and then
computing the standard product-moment correlation between the extended
(doubled) X and Y vectors. "The rationale is simply that since the original
[scoring] directions are arbitrary, each element is represented both in its original direction and its opposite" (Cohen 1969, p. 282). Cohen (p. 282) has
presented a computational formula for the correlation between such
extended vectors:
$$
r_c = \frac{\sum XY + km^2 - m\left(\sum X + \sum Y\right)}{\left[\left(\sum X^2 + km^2 - 2m\sum X\right)\left(\sum Y^2 + km^2 - 2m\sum Y\right)\right]^{1/2}}
$$
where X and Y are the original profiles of k scores each, m is the common
neutral point of all profile scales (about which any particular scale score
would be reflected when changing that scale's direction of measurement),
and summation is across all k measures.¹
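Cohen's computational formula translates directly into code. The minimal Python sketch below (helper names and the two 0 to 10 profiles with m = 5 are hypothetical) computes rc from the formula and, as a cross-check, as the ordinary product-moment correlation of the profiles extended by their mirror images. It also shows that reflecting a variable in both profiles leaves rc unchanged.

```python
import math

def cohen_rc(x, y, m):
    """Cohen's rc from the computational formula, for common neutral point m."""
    k = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy + k * m**2 - m * (sx + sy)
    den = math.sqrt((sxx + k * m**2 - 2 * m * sx) * (syy + k * m**2 - 2 * m * sy))
    return num / den

def pearson(x, y):
    """Ordinary product-moment correlation."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rc_by_extension(x, y, m):
    """rc as rpm between profiles doubled by their mirror-image reflections."""
    x_ext = list(x) + [2 * m - a for a in x]
    y_ext = list(y) + [2 * m - b for b in y]
    return pearson(x_ext, y_ext)

X, Y, m = [8, 3, 6, 2], [7, 4, 2, 3], 5
X2 = [2 * m - X[0]] + X[1:]   # reflect the first variable in both profiles
Y2 = [2 * m - Y[0]] + Y[1:]
print(round(cohen_rc(X, Y, m), 4), round(rc_by_extension(X, Y, m), 4))  # 0.5406 0.5406
print(round(cohen_rc(X2, Y2, m), 4))  # unchanged by the reflection
```

Computing rc through the extended vectors doubles the work, but it makes Cohen's rationale explicit: every score appears once in each direction.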
An alternative to computing rc is to calculate the mean of all possible
correlation coefficients that can be derived by reflecting one or more of the
measures in the profiles (r̄pm).² If the mean value is high and the standard
deviation of the distribution of values is small, one can place more
confidence in identifying the profiles as being similar in shape than if the
mean correlation is near zero with some values being high and others low.
Moreover, in the former case, inferences about object similarity based on
any single product-moment correlation are more likely to represent those
based on the mean correlation.
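The mean correlation over all reflections can be computed by brute force. In the sketch below (helper names and data are hypothetical), the first variable is held in its original direction, which yields the 2^(k-1) unique configurations described in footnote 2; reflecting every variable at once only produces a mirror image with the same correlation. The population standard deviation is used on the assumption that the configurations form the whole finite population.

```python
import itertools
import math
import statistics

def pearson(x, y):
    """Ordinary product-moment correlation."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def reflection_correlations(x, y, m):
    """rpm for every unique configuration of variable reflections about m."""
    k = len(x)
    rs = []
    # Hold variable 0 fixed; each remaining variable is either kept or reflected.
    for flips in itertools.product((False, True), repeat=k - 1):
        mask = (False,) + flips
        xr = [2 * m - a if f else a for a, f in zip(x, mask)]
        yr = [2 * m - b if f else b for b, f in zip(y, mask)]
        rs.append(pearson(xr, yr))
    return rs

X, Y, m = [8, 3, 6, 2], [7, 4, 2, 3], 5
rs = reflection_correlations(X, Y, m)
mean_r, sd_r = statistics.mean(rs), statistics.pstdev(rs)
print(len(rs), round(mean_r, 3), round(sd_r, 3))
```

The loop grows as 2^(k-1), which is the computational burden the next paragraph refers to.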
Computing r̄pm is hardly a practical undertaking because of the computational labor involved, so a readily computed approximation would be useful.
Because Cohen's coefficient is the product-moment correlation
between the original profiles extended by their mirror image reflections, rc
might serve in this capacity.
The following describes a computer simulation study designed to evaluate the relationship between rc and the mean correlation based on all possible variable reflections (r̄pm) as a function of (a) the number of variables in a
profile, and (b) the distance of the profile means from the reflection point.
The study used a set of computer-generated random vectors of varying lengths.
2. Computer Simulation Study
A random number generating algorithm was used to construct 20 vectors simulating profiles of bipolar variables. The random number generator
returns observations from a rectangular distribution of values ranging from 0
to 1, exclusive, with an expected mean of .5. A computer algorithm then
iterated through all possible combinations of variable reflections, intercorrelating the 20 "profiles" with each change in profile configuration. Separate
analyses were made with vectors of 4, 6, 8, 10, 12, and 14 elements, and
reflection (neutral) points of .10, .50, and .90. The mean and the standard
deviation of the product-moment correlations and rc were recorded for each
of the 190 unique profile comparisons in each analysis.
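A rough modern re-creation of this design might look like the sketch below. It assumes NumPy's default random generator in place of the original 1984 algorithm, so its numbers will not reproduce the figures exactly, and all function names are hypothetical. It relies on two facts that follow from the definitions above: expanding Cohen's formula gives rc = Σ(X - m)(Y - m) / [Σ(X - m)² Σ(Y - m)²]^(1/2), and reflecting a variable about m flips the sign of its deviation from m.

```python
import itertools
import numpy as np

def reflection_signs(k):
    """All 2**(k-1) unique sign patterns for k variables (-1 = reflected).

    Variable 0 is never reflected; the omitted patterns are mirror images
    with identical correlations.
    """
    rest = np.array(list(itertools.product((1.0, -1.0), repeat=k - 1)))
    return np.hstack([np.ones((rest.shape[0], 1)), rest])

def row_corr(xs, ys):
    """Product-moment correlation computed row by row for two (n, k) arrays."""
    xd = xs - xs.mean(axis=1, keepdims=True)
    yd = ys - ys.mean(axis=1, keepdims=True)
    return (xd * yd).sum(axis=1) / np.sqrt((xd**2).sum(axis=1) * (yd**2).sum(axis=1))

def simulate(k, m, n_profiles=20, seed=0):
    """Mean |rc - mean rpm| and mean SD of rpm over all pairs of random profiles."""
    rng = np.random.default_rng(seed)
    profiles = rng.uniform(0.0, 1.0, size=(n_profiles, k))  # rectangular on [0, 1)
    signs = reflection_signs(k)
    diffs, sds = [], []
    for i, j in itertools.combinations(range(n_profiles), 2):  # 190 pairs for 20
        dx, dy = profiles[i] - m, profiles[j] - m
        # Reflecting variable v about m flips the sign of its deviation from m.
        rs = row_corr(m + signs * dx, m + signs * dy)
        # Cohen's rc in deviation form.
        rc = (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))
        diffs.append(abs(rc - rs.mean()))
        sds.append(rs.std())  # population SD over the configurations
    return float(np.mean(diffs)), float(np.mean(sds))

for k in (4, 6, 8, 10, 12, 14):
    for m in (0.10, 0.50, 0.90):
        diff, sd = simulate(k, m)
        print(f"k={k:2d}  m={m:.2f}  mean |rc - mean rpm| = {diff:.4f}  mean SD = {sd:.4f}")
```

Vectorizing over the sign patterns keeps the k = 14 case (8,192 configurations per pair) fast; looping over pairs keeps the code close to the description in the text.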
¹ Note that rc is limited to applications where all scales of both profiles have the same
neutral point. Cohen (1969, p. 282), however, has developed a somewhat more general form
of rc (rg), appropriate for situations where all reflection points for scales within profiles are the
same but are different between profiles.
² For k variables there is a finite population of 2^(k-1) unique configurations of the profiles
derivable from changing the scoring direction for some or all of the variables. This number
does not include each configuration's redundant mirror image.
Figure 1 illustrates the absolute differences between rc and r̄pm for
different profile lengths and variable neutral points. As the number of
variables in the simulated profiles increases, rc appears to approach r̄pm
asymptotically. When the neutral point is .50, and thus near the expected profile
means of .50, the differences are smaller at every profile length than the corresponding
differences for reflection points of .10 or .90.
[Figure 1: plot of |rc - r̄pm| (vertical axis) against number of scales (horizontal axis: 4, 6, 8, 10, 12, 14), with one curve for each reflection point (.10, .50, .90).]
Figure 1. The absolute difference between Cohen's coefficient of profile similarity (rc) and the
mean profile correlation based on all possible profile configurations (r̄pm), as a function of the
number of scales in the profiles and the choice of reflection point (.10, .50, .90). Each point
represents the mean of 190 profile comparisons.
Figure 2 shows the effect of profile length and reflection point on
the standard deviation of the empirical distribution of product-moment
correlation coefficients. These dispersions become progressively smaller as
the number of variables in the profiles is increased: with long vectors, any one set of
reflections will not, in general, "move" the coefficient as far from the mean
correlation as it will with shorter vectors. When the point
about which the scores are reflected is either .10 or .90, the standard deviations of the correlations are smaller than the corresponding values at a reflection
point of .50.³
An important observation not apparent in these data is that the magnitude of the mean (and standard deviation) of each distribution of product-moment correlations varies as a function of the choice of neutral point, as
does the magnitude of each rc value. To illustrate the effect of profile
elevation on r̄pm and rc, consider the scores of objects A and B on four measures: 75, 60, 70, 80 and 70, 70, 60, 80, respectively. The correlation between these
two profiles is .48. If the first variable is reflected around a neutral point of
50, the corresponding scores of A and B change to 25 and 30, respectively,
and the profile correlation rises to .93. Note the increase in the range of the
profile scores. Reflecting the other measures will result in similarly large
correlational increments, ultimately producing a large mean value (i.e., a large
r̄pm). Cohen's coefficient yields a value of .94 for these data. One can
visualize, however, both profiles being moved closer to the neutral point of
50; reflections will then generally have less incremental effect on profile dispersions, and thus r̄pm will decrease, as will rc. Lowering the original scores by
20 points, for example, reduces the rc estimate of profile similarity from .94
to .47.⁴
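The worked example can be checked numerically. The sketch below (helper names are hypothetical) recomputes the .48, .93, .94, and .47 values quoted in the text, using the deviation form of Cohen's coefficient, Σ(X - m)(Y - m) / [Σ(X - m)² Σ(Y - m)²]^(1/2), which is algebraically identical to the computational formula.

```python
import math

def pearson(x, y):
    """Ordinary product-moment correlation."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cohen_rc(x, y, m):
    """Cohen's rc in deviation form: cosine of the profiles' deviations from m."""
    dx = [a - m for a in x]
    dy = [b - m for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

A = [75, 60, 70, 80]
B = [70, 70, 60, 80]
m = 50

A_ref = [2 * m - A[0]] + A[1:]   # first variable reflected: 25, 60, 70, 80
B_ref = [2 * m - B[0]] + B[1:]   # first variable reflected: 30, 70, 60, 80

print(round(pearson(A, B), 2))          # 0.48
print(round(pearson(A_ref, B_ref), 2))  # 0.93
print(round(cohen_rc(A, B, m), 2))      # 0.94

# Move both profiles 20 points closer to the neutral point
A_low = [a - 20 for a in A]
B_low = [b - 20 for b in B]
print(round(cohen_rc(A_low, B_low, m), 2))  # 0.47
```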
These data demonstrate the contribution of profile elevations to the size
of both rc and the mean correlation. This observation is contrary to
Cohen's (1969, p. 283) statement that rc is not "responsive" to such
differences in profile level. Because rc is a product-moment correlation
coefficient computed across two extended profiles, its value is not affected by
mean or dispersion differences between these transformed vectors, and it simply represents an index of their shape similarity. But the shape parameter is
itself affected by the means (and standard deviations) of the original profiles
before they are extended. It is partly these initial profile elevations and their
relation to the reflection point of the vector elements that determine the
appearance (shape) of the new profile configurations and, thus, determine
rc.

³ The small discrepancies observed for reflection points of .10 and .90 in the figures are
due to a slight bias in the random number algorithm for values of less than .50 in the samples
generated.
⁴ Similar effects occur if one profile mean is above the reflection point and the other is
below. In this case, however, the profile similarity measures rc and r̄pm yield increasingly large
negative coefficients as the profile means move away from each other and the reflection point.
[Figure 2: plot of the standard deviation of profile correlations (vertical axis) against number of scales (horizontal axis: 4, 6, 8, 10, 12, 14), with one curve for each reflection point (.10, .50, .90).]
Figure 2. The standard deviation of profile correlations based on all possible profile
configurations, as a function of the number of scales in the profiles and choice of reflection
point (.10, .50, .90). Each point represents the mean of 190 profile comparisons.

3. Conclusions
Cohen's coefficient of profile similarity, rc, was observed empirically to
approximate the mean product-moment correlation averaged across a population of arbitrary profile configurations. This property of rc broadens its
definition (cf. Cohen 1969) and provides a sound rationale for its use in
classification research where a similarity index is required between profiles
of bipolar measurements. To the extent that the profile means (for any
arbitrary configuration) are far from the reflection point, however, both rc and the
mean correlation approach ±1.00, suggesting ever-increasing degrees of
profile similarity or dissimilarity. Prudence, therefore, is advised in the interpretation
of rc.
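The drift toward ±1.00 can be demonstrated by shifting two fixed profiles away from the reflection point. In the sketch below, the profile shapes, offsets, and helper name are hypothetical, and m = .50 as in the simulation. Shifting both profiles in the same direction pushes rc toward +1.00, and shifting them in opposite directions pushes it toward -1.00 (compare footnote 4), even though the within-profile shapes never change.

```python
import math

def cohen_rc(x, y, m):
    """Cohen's rc in deviation form (algebraically equal to the computational formula)."""
    dx = [a - m for a in x]
    dy = [b - m for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

X = [0.2, 0.6, 0.4, 0.9]   # hypothetical profile shapes on a 0-1 scale
Y = [0.5, 0.3, 0.8, 0.1]
m = 0.5

for d in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    same = cohen_rc([a + d for a in X], [b + d for b in Y], m)
    opposite = cohen_rc([a + d for a in X], [b - d for b in Y], m)
    print(f"shift {d:5.1f}: same direction {same:+.3f}, opposite {opposite:+.3f}")
```

At zero shift these two profiles have a negative rc; as the common offset grows, the elevation term dominates every product (X - m)(Y - m) and rc approaches +1.00 regardless of profile shape.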
References
BUDESCU, D.V. (1980), "Some New Measures of Profile Dissimilarity," Applied Psychological
Measurement, 4, 261-272.
COHEN, J. (1969), "rc: A Profile Similarity Coefficient Invariant over Variable Reflection,"
Psychological Bulletin, 71, 281-284.
CRONBACH, L.J., and Gleser, G.C. (1953), "Assessing Similarities Between Profiles,"
Psychological Bulletin, 50, 456-473.
SKINNER, H.A. (1977), "The Eyes That Fix You: A Model for Classification Research,"
Canadian Psychological Review, 18, 142-151.
TELLEGEN, A. (1965), "Direction of Measurement: A Source of Misinterpretation," Psychological Bulletin, 63, 233-243.