Journal of Classification 1:125-131 (1984)

A Note on Cohen's Profile Similarity Coefficient rc

Sampo V. Paunonen
University of Toronto

Abstract: Analytic procedures for classifying objects are commonly based on the product-moment correlation as a measure of object similarity. This statistic, however, generally does not represent an invariant index of similarity between two objects if they are measured along different bipolar variables where the direction of measurement for each variable is arbitrary. A computer simulation study compared Cohen's (1969) proposed solution to the problem, the invariant similarity coefficient rc, with the mean product-moment correlation based on all possible changes in the measurement direction of individual variables within a profile of scores. The empirical observation that rc approaches the mean product-moment correlation with increases in the number of scores in the profiles was interpreted as encouragement for the use of rc in classification research. Some cautions regarding its application were noted.

Keywords: Transpose factor analysis; Q-technique; Profile analysis; Bipolar variables; Direction of measurement

1. Introduction

The classification of objects into typologies through transpose factor analysis (e.g., Q-method factoring) and its variants such as Modal Profile Analysis (Skinner 1977) is frequently based on the product-moment correlation coefficient as an index of the similarities among profiles of object scores. It has long been known, however, that the correlation coefficient between profiles of bipolar variables is not invariant over variable reflections, or changes in the direction of measurement (Tellegen 1965).

Author note: This research was supported by the Social Sciences and Humanities Research Council of Canada, Grant no. 410-83-0633, and by the University of Toronto. Author's address: Sam Paunonen, Department of Psychology, The University of Toronto, Erindale Campus, Mississauga, Ontario L5L 1C6.

Consider a 10-item Introversion-Extraversion measure that has been developed so that the frequency of "true" responses to the items yields a score for Extraversion. The maximum score on the Extraversion scale would be 10, which suggests a minimum level of respondent Introversion. On the other hand, if each item receives a "false" endorsement, a respondent's score would be zero for Extraversion, implying a relatively high level of Introversion. But the direction of measurement, or item keying, in this example is arbitrary. One could, instead, justifiably count the number of "false" responses and compute a total Introversion score; larger numbers would represent more Introversion and, hence, less Extraversion. This transformation simply involves "reflecting" respondent Extraversion scores around the scale's midpoint of 5. That is, an Extraversion score of 0 becomes an Introversion score of 10, 1 becomes 9, and so on. In general, the reflected value of a score $X_i$ can be obtained for a neutral point of m as $2m - X_i$ (Cohen 1969, p. 281).

A change in item keying as illustrated in the example above does not affect the measurement properties of the scale. Such changes to selected scales within a profile of scales, however, will generally modify the configuration or shape of each profile. This can result in increases or decreases in the correlation between two profiles to such an extent as to alter conclusions concerning object similarity. A problem of interpretation is apparent because the product-moment correlation (rpm) as an index of profile shape similarity, in contrast to Euclidean distance measures of similarity (cf. Budescu 1980; Cronbach & Gleser 1953), can be based on arbitrary profile configurations. Cohen (1969) has developed a correlational index of profile similarity (rc) which is invariant over variable reflections or changes in the direction of measurement.
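
The reflection rule and its effect on the product-moment correlation can be illustrated with a minimal sketch. The helper names (`reflect`, `pearson`) and the two four-scale profiles are hypothetical, chosen only for illustration; they do not come from the paper.

```python
from math import sqrt


def reflect(score, m):
    """Reflect a score about the neutral point m, i.e. 2m - X."""
    return 2 * m - score


def pearson(x, y):
    """Product-moment correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Extraversion scores 0..10 reflected about the midpoint 5 give Introversion scores.
extraversion = list(range(11))
introversion = [reflect(s, 5) for s in extraversion]
print(introversion)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# Hypothetical profiles on four bipolar scales, each with neutral point 5.
profile_a = [7, 3, 8, 2]
profile_b = [6, 4, 3, 9]
r_original = pearson(profile_a, profile_b)

# Re-key (reflect) the first scale only, in both profiles.
a_rekeyed = [reflect(profile_a[0], 5)] + profile_a[1:]
b_rekeyed = [reflect(profile_b[0], 5)] + profile_b[1:]
r_rekeyed = pearson(a_rekeyed, b_rekeyed)

# The two correlations differ although only the keying of one scale changed.
print(r_original, r_rekeyed)
```

The point of the sketch is only that rpm depends on an arbitrary keying decision; the size and direction of the change depend on the particular scores.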
Coefficient rc is equivalent to adding to the end of each object profile its corresponding "mirror image" profile, derived from the reflections of all variable scores about their common neutral point, and then computing the standard product-moment correlation between the extended (doubled) X and Y vectors. "The rationale is simply that since the original [scoring] directions are arbitrary, each element is represented both in its original direction and its opposite" (Cohen 1969, p. 282). Cohen (p. 282) has presented a computational formula for the correlation between such extended vectors:

$$ r_c = \frac{\sum XY + km^2 - m\left(\sum X + \sum Y\right)}{\left[\left(\sum X^2 + km^2 - 2m\sum X\right)\left(\sum Y^2 + km^2 - 2m\sum Y\right)\right]^{1/2}} $$

where X and Y are the original profiles of k scores each, m is the common neutral point of all profile scales (about which any particular scale score would be reflected when changing that scale's direction of measurement), and summation is across all k measures. [1]

An alternative to computing rc is to calculate the mean of all possible correlation coefficients that can be derived by reflecting one or more of the measures in the profiles (r̄pm). [2] If the mean value is high and the standard deviation of the distribution of values is small, one can place more confidence in identifying the profiles as being similar in shape than if the mean correlation is near zero with some values being high and others low. Moreover, in the former case, inferences about object similarity based on any single product-moment correlation are more likely to represent those based on the mean correlation.

Computing r̄pm is hardly a practical undertaking because of the computational labor involved. An approximation would be useful if it were readily estimated. Because Cohen's coefficient is the product-moment correlation between the original profiles extended by their mirror image reflections, rc might serve in this capacity.
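
The two routes to rc described above (the computational formula and the correlation of the extended vectors), together with the set of correlations over all 2^(k-1) unique reflection patterns, might be sketched as follows. The function names and the six-scale profiles are hypothetical illustrations, and the enumeration is one straightforward way of visiting each unique configuration, not necessarily the procedure used in the study.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Product-moment correlation (assumes neither profile is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_rc(x, y, m):
    """rc from the computational formula, with common neutral point m."""
    k = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sx, sy = sum(x), sum(y)
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    num = sxy + k * m * m - m * (sx + sy)
    den = sqrt((sxx + k * m * m - 2 * m * sx) * (syy + k * m * m - 2 * m * sy))
    return num / den


def rc_extended(x, y, m):
    """rc as the correlation between profiles extended by their mirror images."""
    xe = list(x) + [2 * m - v for v in x]
    ye = list(y) + [2 * m - v for v in y]
    return pearson(xe, ye)


def reflected_correlations(x, y, m):
    """rpm for each of the 2**(k-1) unique reflection patterns.

    The first scale is never reflected: reflecting the complementary set of
    scales yields the mirror-image configuration, which has the same rpm.
    """
    rs = []
    for flips in product((False, True), repeat=len(x) - 1):
        pattern = (False,) + flips
        xr = [2 * m - v if f else v for v, f in zip(x, pattern)]
        yr = [2 * m - v if f else v for v, f in zip(y, pattern)]
        rs.append(pearson(xr, yr))
    return rs


# Hypothetical six-scale profiles with neutral point .50.
x = [0.62, 0.15, 0.80, 0.33, 0.47, 0.91]
y = [0.55, 0.30, 0.72, 0.10, 0.66, 0.84]
m = 0.5

rs = reflected_correlations(x, y, m)
print("rc (formula): ", cohen_rc(x, y, m))
print("rc (extended):", rc_extended(x, y, m))
print("mean rpm:", mean(rs), "SD:", pstdev(rs), "configurations:", len(rs))
```

The number of configurations doubles with each added scale, which is why the exhaustive mean quickly becomes laborious for long profiles.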
The following describes a computer simulation study designed to evaluate the relationship between rc and the mean correlation based on all possible variable reflections, r̄pm, as a function of (a) the number of variables in a profile, and (b) the distance of the profile means from the reflection point. A set of computer-generated random vectors of varying lengths was used.

2. Computer Simulation Study

A random number generating algorithm was used to construct 20 vectors simulating profiles of bipolar variables. The random number generator returns observations from a rectangular distribution of values ranging from 0 to 1, exclusive, with an expected mean of .5. A computer algorithm then iterated through all possible combinations of variable reflections, intercorrelating the 20 "profiles" with each change in profile configuration. Separate analyses were made with vectors of 4, 6, 8, 10, 12, and 14 elements, and reflection (neutral) points of .10, .50, and .90. The mean and the standard deviation of the product-moment correlations and rc were recorded for each of the 190 unique profile comparisons in each analysis.

[1] Note that rc is limited to applications where all scales of both profiles have the same neutral point. Cohen (1969, p. 282), however, has developed a somewhat more general form of rc (rg), appropriate for situations where all reflection points for scales within profiles are the same but are different between profiles.

[2] For k variables there is a finite population of $2^{k-1}$ unique configurations of the profiles derivable from changing the scoring direction for some or all of the variables. This number does not include each configuration's redundant mirror image.

Figure 1 illustrates the absolute differences between rc and r̄pm for different profile lengths and variable neutral points.
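
The simulation procedure described in this section might be sketched along the following lines. This is an illustrative reconstruction under stated assumptions, not the original program: the function names, the `seed` parameter, and the use of Python's `random.random()` (uniform on [0, 1)) are all choices made here, and only the shorter profile lengths are run so that the example finishes quickly.

```python
import random
from itertools import combinations, product
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Product-moment correlation (assumes neither profile is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_rc(x, y, m):
    """rc written in deviations from m; algebraically equal to Cohen's formula."""
    num = sum((a - m) * (b - m) for a, b in zip(x, y))
    den = sqrt(sum((a - m) ** 2 for a in x) * sum((b - m) ** 2 for b in y))
    return num / den


def mean_reflected_r(x, y, m):
    """Mean rpm over the 2**(k-1) unique reflection patterns."""
    rs = []
    for flips in product((False, True), repeat=len(x) - 1):
        pattern = (False,) + flips  # first scale fixed to skip mirror images
        xr = [2 * m - v if f else v for v, f in zip(x, pattern)]
        yr = [2 * m - v if f else v for v, f in zip(y, pattern)]
        rs.append(pearson(xr, yr))
    return mean(rs)


def simulate(k, m, n_profiles=20, seed=0):
    """Mean |rc - mean rpm| over all pairs of n_profiles random k-scale profiles."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    profiles = [[rng.random() for _ in range(k)] for _ in range(n_profiles)]
    diffs = [
        abs(cohen_rc(x, y, m) - mean_reflected_r(x, y, m))
        for x, y in combinations(profiles, 2)  # 190 pairs when n_profiles = 20
    ]
    return mean(diffs)


# Longer profiles (10, 12, 14 scales) follow the same pattern but take longer.
for k in (4, 6, 8):
    for m in (0.10, 0.50, 0.90):
        print(f"k={k:2d}  m={m:.2f}  mean |rc - mean rpm| = {simulate(k, m):.4f}")
```

Results from such a sketch will differ in detail from the published figures, since they depend on the particular random vectors drawn.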
As the number of variables in the simulated profiles increases, rc appears to approach r̄pm asymptotically. When the neutral point is .50, near the expected profile means of .50, the differences are smaller at all points than the corresponding differences for reflection points of .10 or .90.

Figure 1. The absolute difference between Cohen's coefficient of profile similarity (rc) and the mean profile correlation based on all possible profile configurations (r̄pm), as a function of the number of scales in the profiles and the choice of reflection point (.10, .50, .90). Each point represents the mean of 190 profile comparisons.

Figure 2 represents the effect of profile length and reflection point on the standard deviation of the empirical distribution of product-moment correlation coefficients. These dispersions become progressively smaller as the number of variables in the profiles is increased. Any one set of reflections will not, in general, "move" the coefficient as far from the mean correlation with long vectors as it will with shorter vectors. When the point about which the scores are reflected is either .10 or .90, the standard deviations of the correlations are less than corresponding values at a reflection point of .50. [3]

An important observation not apparent in these data is that the magnitude of the mean (and standard deviation) of each distribution of product-moment correlations varies as a function of the choice of neutral point, as does the magnitude of each rc value. To illustrate the effect of profile elevation on r̄pm and rc, consider the scores of objects A and B on four measures as 75, 60, 70, 80, and 70, 70, 60, 80. The correlation between these two profiles is .48. If the first variable is reflected around a neutral point of 50, the corresponding scores of A and B change to 25 and 30, respectively, and the profile correlation rises to .93.
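
The arithmetic of this example can be checked with a short sketch. The scores and the neutral point are those given in the text; the helper name `pearson` and the variable names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


a = [75, 60, 70, 80]
b = [70, 70, 60, 80]
m = 50  # neutral point

r_before = pearson(a, b)

# Reflect the first measure about m in both profiles: 75 -> 25 and 70 -> 30.
a_ref = [2 * m - a[0]] + a[1:]
b_ref = [2 * m - b[0]] + b[1:]
r_after = pearson(a_ref, b_ref)

print(round(r_before, 2), round(r_after, 2))  # 0.48 0.93

# Range of profile A before and after the reflection.
print(max(a) - min(a), max(a_ref) - min(a_ref))  # 20 55
```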
Note the increase in the range of the profile scores. Reflecting the other measures will result in similarly large correlational increments, ultimately effecting a large mean value (i.e., a large r̄pm). Cohen's coefficient yields a value of .94 for these data. One can visualize, however, both profiles being moved closer to the neutral point of 50; reflections will then generally have less incremental effect on profile dispersions, and thus r̄pm will decrease, as will rc. Lowering the original scores by 20 points, for example, reduces the rc estimate of profile similarity from .94 to .47. [4]

These data demonstrate the contribution of profile elevations to the size of both rc and the mean correlation. This observation is contrary to Cohen's (1969, p. 283) statement that rc is not "responsive" to such differences in profile level. Because rc is a product-moment correlation coefficient computed across two extended profiles, its value is not affected by mean or dispersion differences between these transformed vectors and simply represents an index of their shape similarity. But the shape parameter is itself affected by the means (and standard deviations) of the original profiles before being extended. It is partly these initial profile elevations and their relation to the reflection point of the vector elements that determine the appearance (shape) of the new profile configurations and, thus, determine rc.

[3] The small discrepancies observed for reflection points of .10 and .90 in the figures are due to a slight bias in the random number algorithm for values of less than .50 in the samples generated.

[4] Similar effects occur if one profile mean is above the reflection point and the other is below. In this case, however, the profile similarity measures rc and r̄pm yield increasingly large negative coefficients as the profile means move away from each other and the reflection point.

3. Conclusions

Cohen's coefficient of profile similarity, rc, was observed empirically to approximate the mean product-moment correlation averaged across a population of arbitrary profile configurations. This property of rc broadens its definition (cf. Cohen 1969) and provides a sound rationale for its use in classification research where a similarity index is required between profiles of bipolar measurements. To the extent that the profile means (for any arbitrary configuration) are far from the reflection point, both rc and the mean correlation approach ±1.00, suggesting ever increasing degrees of profile (dis)similarity. Prudence, therefore, is advised in the interpretation of rc.

Figure 2. The standard deviation of profile correlations based on all possible profile configurations, as a function of the number of scales in the profiles and choice of reflection point (.10, .50, .90). Each point represents the mean of 190 profile comparisons.

References

BUDESCU, D.V. (1980), "Some New Measures of Profile Dissimilarity," Applied Psychological Measurement, 4, 261-272.
COHEN, J. (1969), "rc: A Profile Similarity Coefficient Invariant over Variable Reflection," Psychological Bulletin, 71, 281-284.
CRONBACH, L.J., and GLESER, G.C. (1953), "Assessing Similarities Between Profiles," Psychological Bulletin, 50, 456-473.
SKINNER, H.A. (1977), "The Eyes That Fix You: A Model for Classification Research," Canadian Psychological Review, 18, 142-151.
TELLEGEN, A. (1965), "Direction of Measurement: A Source of Misinterpretation," Psychological Bulletin, 63, 233-243.