Journal of Mathematical Psychology, 40, 342-347 (1996), Article No. 0033

THEORETICAL NOTE

Maximum Entropy Inference and Stimulus Generalization

In Jae Myung (Ohio State University) and Roger N. Shepard (Stanford University)

Send all correspondence and reprint requests to: Dr. In Jae Myung, Department of Psychology, 142 Townshend Hall, 1885 Neil Avenue Mall, Columbus, OH 43210-1222. E-mail: imyung@magnus.acs.ohio-state.edu.

Maximum entropy inference is a method for estimating a probability distribution based on limited information expressed in terms of the moments of that distribution. This paper presents such a maximum entropy characterization of Shepard's theory of generalization. Shepard's theory assumes that an object has an important consequence for an individual only if it falls in a connected set, called the consequential region, in the individual's representational space. The assumption yields a generalization probability that decays exponentially with an appropriate psychological distance metric, either the city-block or the Euclidean, depending on the correlational structure between extensions of the consequential region along the dimensions. In this note we show that a generalization function similar to that derived by Shepard (1987) can be obtained by applying maximum entropy inference to limited information about interstimulus distances between two objects having a common consequence. In particular, we show that different shapes of equal-generalization contours may be interpreted as optimal utilization, in the maximum entropy sense, of the correlation structure of stimulus dimensions, similar to the explanation given by Shepard's theory. © 1996 Academic Press

1. INTRODUCTION

Results from multidimensional scaling analysis of generalization data (e.g., Shepard, 1958) indicate that the likelihood that an organism will generalize from one stimulus to another decreases in close approximation to an exponential decay function of the psychological distance between the two stimuli. In formal terms, the generalization function g_y(x) is defined as the probability that the second stimulus, denoted by a vector x in an organism's representation space, will have a significant consequence given that the first stimulus, denoted by another vector y, was found to be consequential. Let us assume that the organism estimates this probability, from which it makes decisions probabilistically regarding the generalizability of the first stimulus to the second. Then the generalization function can be approximated as

    g_y(x) = \exp(-\eta \cdot d(x, y)),    (1.1)

where d(x, y) is the distance between x and y measured by an appropriate metric and η is a positive scaling parameter. The Minkowski r-metric is often found to fit generalization data closely and is defined as

    d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r},  r > 0.    (1.2)

In particular, the city-block metric (r = 1) better describes data obtained for stimuli differing along separable dimensions, such as size and orientation of shapes, whereas the Euclidean metric (r = 2) better describes data obtained for stimuli differing along integral dimensions, such as saturation and lightness of colors (for reviews see Garner, 1974, and Shepard, 1991; for a critical review of related issues, see Townsend & Thomas, 1993).
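[Editorial illustration.] To make the two equations concrete, the following minimal sketch (in Python; the stimulus coordinates and the values of η and r are illustrative, not taken from the text) computes the Minkowski r-metric of Eq. (1.2) and the resulting generalization probability of Eq. (1.1) for the city-block and Euclidean cases:

```python
import numpy as np

def minkowski_distance(x, y, r):
    """Minkowski r-metric of Eq. (1.2): city-block for r = 1, Euclidean for r = 2."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** r) ** (1.0 / r)

def generalization(x, y, r=1.0, eta=1.0):
    """Exponential-decay generalization function of Eq. (1.1)."""
    return np.exp(-eta * minkowski_distance(x, y, r))

x, y = (1.0, 2.0), (0.0, 0.0)   # illustrative two-dimensional stimuli
for r in (1.0, 2.0):
    print(f"r = {r}: d = {minkowski_distance(x, y, r):.3f}, "
          f"g = {generalization(x, y, r):.3f}")
```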
There is some evidence, however, that generalization data may be better described by a Minkowski r-metric with a non-integer r-value between 1 and 2, or greater than 2, or even less than 1 (see, e.g., Kruskal, 1964; Shepard, 1964; Tversky & Gati, 1982; and Shepard, 1991).

Shepard (1987) proposed a theory of generalization that provides non-arbitrary explanations for these empirical regularities, based on the idea that they reflect a statistical inference from limited information. The theory assumes that a basic kind or category is represented in an organism's representation space as a connected set, called a consequential region. Two objects that fall in a particular consequential region belong to the same category and thus have the same consequence. To generalize, one must then estimate the probability that the second stimulus x falls in the same consequential region that already contains the first stimulus y. The estimation of this probability must be carried out with little or no knowledge of the location, size, and shape of the consequential region in representational space. According to the theory, the generalization function is estimated as an expectation of that probability by integrating over consequential regions of all possible locations and sizes, weighted by their prior probabilities. For simplicity, a two-dimensional case (m = 2) is considered in this paper. Assuming that nature selects the location of the consequential region at random, the theory prescribes the generalization function g_y(x) as

    g_y(x) = \int_0^\infty \int_0^\infty P_y(x \mid s_1, s_2) \, p(s_1, s_2) \, ds_1 \, ds_2.    (1.3)

In this equation, P_y(x | s_1, s_2) is the probability that stimulus x belongs to the consequential region of particular sizes, or extensions, s_1 and s_2 that already contains stimulus y, and p(s_1, s_2) is the size density. For a broad range of choices of p(s_1, s_2), Shepard (1987) found the integral in Eq. (1.3) to yield an approximately exponential form for g_y(x). In addition, depending upon the particular assumption an organism makes about the correlational structure between extensions of consequential regions, different approximations to Minkowski distance metrics are obtained. Specifically, an assumption that the two extensions, (s_1, s_2), of the consequential region are uncorrelated leads to equal-generalization contours of a diamond shape, which correspond to g_y(x) with the city-block metric of r = 1 in Eqs. (1.1), (1.2). On the other hand, an assumption of a perfect positive correlation leads to equal-generalization contours of a rounded shape, similar to the Euclidean metric of r = 2 in Eq. (1.2). In particular, if either (a) the consequential region is assumed to be circular or (b) the consequential region, whatever its shape, is assumed to have any orientation with equal probability, then exactly the Euclidean metric is obtained. The theory also allows a graded series of cases between perfect integrality and perfect separability. That is, for an intermediate positive correlation, an intermediate metric between the city-block and the Euclidean metrics (1 < r < 2) is obtained; furthermore, for a negative correlation, concave equal-generalization contours (r < 1) result. In short, the theory points to the correlational structure between extensions of consequential regions as the source of the observed empirical regularities in generalization performance.
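[Editorial illustration.] Eq. (1.3) is easy to probe numerically. The sketch below is a Monte Carlo rendering of the separable (uncorrelated) case under the rectangular-region assumptions spelled out in Appendix 1: for a region of extension s_i containing y, with location uniform among admissible positions, the probability that a stimulus at distance d_i = |x_i − y_i| shares the region is max(0, 1 − d_i/s_i) per dimension. The exponential size density and its means are illustrative choices, not prescribed by the text; the point is that the averaged probability decays approximately exponentially with distance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2 = 1.0, 1.0                         # mean extensions; illustrative
s1 = rng.exponential(mu1, 200_000)          # sizes drawn from p(s1, s2);
s2 = rng.exponential(mu2, 200_000)          # independent draws = separable case

def g(d1, d2):
    """Monte Carlo estimate of Eq. (1.3): average, over regions, of the
    probability that x (at distances d1, d2 from y) shares y's region."""
    p1 = np.clip(1.0 - d1 / s1, 0.0, None)  # P_y(x | s1) along dimension 1
    p2 = np.clip(1.0 - d2 / s2, 0.0, None)  # P_y(x | s2) along dimension 2
    return (p1 * p2).mean()

for d in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"d = ({d}, 0): g = {g(d, 0.0):.4f}")  # near-exponential decay in d
```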
The purpose of this theoretical note is to provide another statistical approach to the generalization problem. Specifically, in this paper (a) we investigate the possibility of basing the type of generalization theory advanced by Shepard (1987) on a more thorough-going information-theoretic foundation, and (b) we examine structural relations between the inferential processes adopted in Shepard's original derivation and the different one explored here.

2. GENERALIZATION AS MAXIMUM ENTROPY INFERENCE

In this section we first bring out a close connection between the correlational structure of the extensions of consequential regions and that of interstimulus distances between two objects having the same consequence. We then show that the generalization function can be expressed in terms of two probability densities that are estimated by maximum entropy inference based on moments of stimulus dimensions.

First, following Shepard (1987), we assume that an object is consequential only if it falls in a consequential region. Further, assuming for the present that the consequential region is rectangular, we derive the following equalities, which hold for any size density p(s_1, s_2) and any n (> 0) (see Appendix 1):

    (a)  E[|x_i - y_i|^n] = \frac{E[s_i^n]}{(n+1)(n+2)/2},  i = 1, 2
                                                                          (2.1)
    (b)  E[(|x_1 - y_1| \, |x_2 - y_2|)^n] = \frac{E[(s_1 s_2)^n]}{[(n+1)(n+2)/2]^2}.

Here, |x_i − y_i| (i = 1, 2) denotes the absolute distance between the two consequential stimuli along the ith coordinate axis. The above equalities are obtained by integrating over all possible consequential regions of different locations and sizes for fixed y = (y_1, y_2), weighted by their prior probabilities. The results in Eq. (2.1) imply that statistical information about moments of the (unobservable) extensions of the consequential region translates into the corresponding information about moments of the (observable) distances between two consequential stimuli, and vice versa. For example, from the above equalities we obtain Cov(|x_1 − y_1|, |x_2 − y_2|) = Cov(s_1, s_2)/9.

Second, from the definition of the generalization function, and by applying Bayes' rule, we obtain (see Appendix 2)

    g_y(x) = \frac{P_C(x \mid y) \, P(y)}{P_C(y \mid y) \, P(x)},    (2.2)

where the subscript C stands for "consequential." To generalize, then, an organism must estimate two probability densities: (a) P(x), the probability of observing a stimulus at x in the organism's representational space, whether it is consequential or not; and (b) P_C(x | y), the conditional probability of observing a consequential stimulus at x given another consequential stimulus observed at y. P_C(y | y) and P(y) are obtained as special cases by setting x = y.

For the marginal probability P(x), we note that P(x) = P(x | C) P(C) + P(x | C̄) P(C̄), where C̄ stands for "inconsequential." Given that encounters with consequential stimuli are rather rare events (e.g., only about 100 of the 5,000 types of mushrooms found in the United States are poisonous^1), so that P(C) ≈ 0, the marginal probability may be approximated as P(x) ≈ P(x | C̄). Insofar as Shepard's (1987) theory is concerned, an organism generalizes using information about consequential stimuli only.^2 Under this circumstance of absence of any information about non-consequential stimuli, the organism should assume a uniform density for P(x | C̄), and hence for P(x), thus yielding P(y)/P(x) = 1.^3 An implication of this simplifying assumption is that the organism estimates the generalization function solely from the conditional probability of consequential stimuli, as g_y(x) = P_C(x | y)/P_C(y | y).
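[Editorial illustration.] The moment equalities in Eq. (2.1) can be checked by simulation. The sketch below implements the rectangular-region model of Appendix 1 (upper-right corner uniform over positions containing y; x uniform within the region) with an arbitrary, illustrative size density, and compares both sides of Eq. (2.1) for n = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
y = np.array([0.3, -0.7])                  # arbitrary fixed first stimulus

# Sizes from an arbitrary density (independent exponentials; illustrative):
# Eq. (2.1) is claimed to hold for any size density p(s1, s2).
s = rng.exponential([1.0, 2.0], size=(N, 2))
v = y + s * rng.uniform(size=(N, 2))       # corner v_i uniform on [y_i, y_i + s_i]
x = v - s * rng.uniform(size=(N, 2))       # x_i uniform on [v_i - s_i, v_i]

n = 2
lhs = np.abs(x - y) ** n                   # |x_i - y_i|^n
rhs = s ** n / ((n + 1) * (n + 2) / 2)     # s_i^n / [(n+1)(n+2)/2]
print("Eq. (2.1a):", lhs.mean(axis=0), "vs", rhs.mean(axis=0))

lhs_b = (lhs[:, 0] * lhs[:, 1]).mean()
rhs_b = ((s[:, 0] * s[:, 1]) ** n).mean() / ((n + 1) * (n + 2) / 2) ** 2
print("Eq. (2.1b):", lhs_b, "vs", rhs_b)
```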
According to Shepard's original theory, an organism estimates the size density p(s_1, s_2) in Eq. (1.3) by maximum entropy inference from minimum knowledge about the size of the consequential region, such as E(s_i), i = 1, 2, and correlational information such as Cov(s_1, s_2).^4 The equalities in Eq. (2.1) imply that this limited information translates into the following set of moments of interstimulus distances between consequential stimuli:

    \{ E[|x_1 - y_1|], \; E[|x_2 - y_2|], \; E[|x_1 - y_1| \, |x_2 - y_2|] \}.    (2.3)

^1 From the U.C. Davis Poison Center Answer Book on the World Wide Web site: http://www.ucdmc.ucdavis.edu/poisoncontrol/mushroom.html.

^2 Although the theory of generalization, as advanced in Shepard's (1987) paper, was developed using information about consequential stimuli only, the paper nevertheless hinted at the possibility of extending the theory to situations involving information about non-consequential stimuli as well (e.g., generalization over a series of learning trials with differential reinforcement feedback).

^3 The result in Eq. (2.2), along with the assumption P(x) ≈ P(x | C̄), shows that a reliable estimation of the generalization function requires two independent sources of information, one about consequential stimuli and the other about non-consequential stimuli. Thus the present assumption of a uniform density for non-consequential stimuli represents a suboptimal strategy in the sense that it does not fully utilize potentially useful information. However, it may represent a heuristic that an organism adopts given limited storage and computational resources. In other words, the organism might focus primarily on stimuli having significant consequences, which may be important for its survival, but pay little or no attention to stimuli that are not consequential. Besides, the organism might assume that consequential stimuli are concentrated in a relatively small region of its representation space whereas non-consequential stimuli are scattered throughout the space, so that P(x | C̄) is essentially uniform in that small region.

^4 Although the maximum entropy inferred size density yielded exactly the exponential decay form of the generalization function, many non-maximum-entropy densities also produced "approximately" exponential forms, as demonstrated in Shepard (1987).

In this paper we assume that the organism estimates P_C(x | y) by maximum entropy inference from these moments. Maximum entropy inference (Jaynes, 1957) is a method for estimating a probability density based on the moments of that distribution (see, e.g., Kapur & Kesavan, 1992; also see Shore & Johnson, 1980, for theoretical justifications of maximum entropy inference based on an axiomatic proof). Examples of recent applications of the inference method include categorization (Myung, 1994) and opinion aggregation (Levy & Delic, 1994; Myung, Ramamoorti, & Bailey, 1996). Maximum entropy inference prescribes that, among all possible probability densities that satisfy a given set of moments, one should choose the one that maximizes the Shannon (1948) information measure, defined as

    E(p) = -\int p(z) \ln[p(z)] \, dz

for a continuous random variable z and its probability density p(z).
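[Editorial illustration.] The maximizing property is easy to see numerically. Among densities on [0, ∞) sharing the same first moment, the exponential density attains the largest value of E(p), consistent with the solution form in Eq. (2.4) below for a single mean constraint. The comparison densities (a gamma and a half-normal, both with mean 1) are illustrative choices:

```python
import numpy as np
from scipy.integrate import quad

def entropy(pdf):
    """Shannon differential entropy E(p) = -int p(z) ln p(z) dz on [0, inf)."""
    f = lambda z: 0.0 if pdf(z) == 0.0 else -pdf(z) * np.log(pdf(z))
    return quad(f, 0.0, np.inf)[0]

# Three densities on [0, inf) with the same first moment E[z] = 1.
expo   = lambda z: np.exp(-z)                         # maxent candidate
gamma2 = lambda z: 4.0 * z * np.exp(-2.0 * z)         # Gamma(shape 2, scale 1/2)
sigma  = np.sqrt(np.pi / 2.0)                         # half-normal with mean 1
halfn  = lambda z: np.sqrt(2.0 / np.pi) / sigma * np.exp(-z**2 / (2.0 * sigma**2))

for name, pdf in [("exponential", expo), ("gamma(2)", gamma2),
                  ("half-normal", halfn)]:
    print(f"{name:12s} entropy = {entropy(pdf):.4f}")
# exponential ~ 1.000 exceeds gamma(2) ~ 0.884 and half-normal ~ 0.952
```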
Suppose that some information about the random event is available in terms of n moments E[h_k(z)] = β_k (k = 1, ..., n). Then the maximum entropy solution that maximizes E(p) subject to the moment constraints has the form

    p(z) = \exp\left( -\lambda_0 - \sum_{k=1}^{n} \lambda_k h_k(z) \right),    (2.4)

where λ_0 and the λ_k's are constants determined by ∫ p(z) dz = 1 and the moment constraints. The maximization of the information measure implies that the maximum entropy inferred probability is the one that utilizes all information contained in the set of moments but is maximally non-committal with respect to information not available (Jaynes, 1957). It is in this sense that the inferred probability is an optimal solution to the probability estimation problem; any other form would assume either more or less information than is actually available.

Now, the maximum entropy inferred probability P_C(x | y) based on the moment constraints in Eq. (2.3) is given in the form of Eq. (2.4). The generalization function obtained is then

    g_y(x) = \exp[ -\lambda_1 |x_1 - y_1| - \lambda_2 |x_2 - y_2| + \xi \, |x_1 - y_1| \, |x_2 - y_2| ],    (2.5)

where the λ_i's and ξ are constants determined from the constraints. To examine properties of the above maximum entropy inferred generalization function, we assume that stimuli, and hence interstimulus distances, are defined on finite intervals, 0 ≤ |x_i − y_i| ≤ R_i with 0 < R_i < ∞ (i = 1, 2). Let us also require that the generalization function be monotonically decreasing with respect to the absolute interstimulus distance |x_i − y_i|, so that ∂g_y(x)/∂|x_i − y_i| < 0 for all x_i ≠ y_i (i = 1, 2).^5 This monotonicity condition implies the following conditions on the parameters: λ_i > 0 (i = 1, 2) and ξ < min[λ_1/R_2, λ_2/R_1]. Under these conditions, it can be shown that (a) a negative ξ produces a concave shape of equal-generalization contours, which corresponds to the Minkowski "metric" of r < 1 in Eq. (1.2); (b) ξ = 0 leads to rhombic-shaped equal-generalization contours, that is, the city-block metric of r = 1; and (c) a positive ξ produces a convex shape of equal-generalization contours that approximates the Minkowski metric of r > 1, including the Euclidean metric of r = 2. In particular, the larger the positive (negative) ξ value, meaning a higher positive (negative) correlation between interstimulus distances, the more square-shaped (concave) the equal-generalization contours become.

^5 For the particular set of moments in Eq. (2.3), the corresponding maximum entropy solution in Eq. (2.5) is not always monotonically decreasing, especially for large positive ξ, so this extra condition was required to obtain a sensible solution. This observation may suggest that some crucial piece of information is missing from the assumed moment set. In fact, the non-monotonic property of the maximum entropy solution disappears (i.e., no additional condition is needed) when certain additional moments such as {E[|x_i − y_i|^2], i = 1, 2} are introduced. However, a problem here is that the resulting solution then becomes a Gaussian decay generalization function, instead of the desired exponential decay generalization function, and, further, that only the Euclidean metric is obtained, rather than the more general Minkowski metric of r > 1.

The maximum entropy inferred generalization function in Eq. (2.5) was obtained from a relatively minimal set of moments, Eq. (2.3). If more information is assumed than this set, such as higher moments (e.g., E[|x_i − y_i|^2]) or higher-order correlations (e.g., E[|x_1 − y_1|^2 |x_2 − y_2|^2]), then the resulting maximum entropy solution will yield somewhat different shapes for the equal-generalization contours. For instance, suppose that in addition to the set in Eq. (2.3), the second moments {E[|x_i − y_i|^2], i = 1, 2} are also available. Then it can be shown that the maximum entropy inferred generalization function is a Gaussian decay function with the oblique Euclidean metric (see, e.g., Ashby & Maddox, 1993), corresponding to tilted elliptical equal-generalization contours.
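[Editorial illustration.] The contour claims (a)-(c) can be read directly off Eq. (2.5): on a contour g_y(x) = c, solving for d_2 = |x_2 − y_2| as a function of d_1 = |x_1 − y_1| gives d_2 = (−ln c − λ_1 d_1)/(λ_2 − ξ d_1). A minimal sketch (the λ_i, the contour level, and the ξ values are illustrative and respect the monotonicity bound above):

```python
import numpy as np

lam1 = lam2 = 1.0
level = np.exp(-1.0)          # contour g = e^{-1}; illustrative

def contour_d2(d1, xi):
    """Solve -lam1*d1 - lam2*d2 + xi*d1*d2 = ln(level) for d2 (Eq. (2.5))."""
    return (-np.log(level) - lam1 * d1) / (lam2 - xi * d1)

d1 = np.linspace(0.0, 1.0, 5)
for xi, label in ((-0.5, "xi = -0.5 (concave, r < 1)"),
                  ( 0.0, "xi =  0.0 (rhombic, r = 1)"),
                  ( 0.5, "xi = +0.5 (convex,  r > 1)")):
    print(label, np.round(contour_d2(d1, xi), 3))
# xi = 0 gives the straight segment d2 = 1 - d1 (a rhombus edge);
# positive xi bows it outward, negative xi bows it inward.
```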
3. DISCUSSION

The purpose of this theoretical note was to characterize Shepard's (1987) theory of generalization from an information-theoretic point of view. We have shown that a generalization function similar to that derived by Shepard (1987) can be obtained by applying maximum entropy inference to information about interstimulus distances between two objects having a common consequence. Perhaps the most important result of the present investigation is the interpretation of different shapes of equal-generalization contours as optimal utilization, in the maximum entropy sense, of the correlation structure of stimulus dimensions. Specifically, the convex, rhombic, or concave shape of the contours is derived as the maximum entropy inference solution when a positive, zero, or negative correlation, respectively, is assumed between the interstimulus distances of two objects having a common consequence. Shepard's (1987) approach to generalization provides essentially the same interpretation of the distance metrics in terms of the correlational structure between extensions of consequential regions. This result is not surprising given the relations in Eq. (2.1) between interstimulus distances and extensions of consequential regions. Yet the information that an organism gains through experience provides more direct evidence about distances between stimuli that it finds to be consequential than about the extensions of an underlying consequential region. The present maximum entropy approach was motivated by the possibility of thereby providing a more explicit account of how an organism might gain knowledge about consequential regions, together with the possibility of making direct use of the moments of probability distributions of stimulus dimensions.

There are in fact close connections between Shepard's approach and the present maximum entropy approach. In the former approach, maximum entropy inference is first applied to obtain the prior size density p(s_1, s_2) from assumed moments of the size dimensions of consequential regions, {E(s_1), E(s_2), E(s_1 s_2)}; the generalization function is then obtained by integrating over consequential regions of all possible sizes and locations. In the maximum entropy approach explored here, this inference process is reversed: the moments of region sizes are first translated, via Eq. (2.1), into moments of interstimulus distances, and maximum entropy inference is then applied directly to those moments. Therefore, the principal difference between the two approaches is the order of application of the two inferential operations: maximum entropy inference and marginalization over hyperparameters. Although both approaches produce strikingly similar forms of the generalization function, at least qualitatively, the two inferential operations are not commutative.

To show this, suppose that the information available is in the form of two moments only, {E(s_1), E(s_2)}. For this particular set of moments, Shepard's (1987) theory yields an approximately exponential form of the generalization function with a near-rhombic shape of contours. The results using the maximum entropy approach are somewhat sharper: not only is an exactly exponential (relatively steeper) form of the generalization function produced, but it also has an exactly rhombic shape of contours. On the other hand, by assuming a different set of expectations, for example {E(s_1), E(ln s_1), E(s_2), E(ln s_2)}, one can obtain the opposite pattern of results; that is, it is Shepard's theory that now produces the relatively sharper results.
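[Editorial illustration.] A one-dimensional sketch of this comparison, under stated assumptions: given only E(s) = μ, Shepard's route first infers the maximum entropy size density p(s) = exp(−s/μ)/μ and then marginalizes over regions, g(d) = E_s[max(0, 1 − d/s)]; the route explored here first translates the size moment into a distance moment (Eq. (2.1) with n = 1 gives E|x − y| = μ/3) and then applies maximum entropy to the distance itself, yielding exactly g(d) = exp(−3d/μ). An exactly exponential function has constant ratios across equal distance steps; the Shepard-route function only approximately does. (μ and the probe distances are illustrative.)

```python
import numpy as np
from scipy.integrate import quad

mu = 1.0                                     # assumed E[s]; illustrative
d_vals = np.array([0.5, 1.0, 1.5, 2.0])

# Route 1 (Shepard): maxent size density first, then marginalize over regions.
g_shepard = np.array([
    quad(lambda s: (1.0 - d / s) * np.exp(-s / mu) / mu, d, np.inf)[0]
    for d in d_vals
])

# Route 2 (this note): translate the moment first (E|x - y| = mu / 3),
# then apply maxent to the distance itself: exactly exponential.
g_maxent = np.exp(-3.0 * d_vals / mu)

print("Shepard-route ratios:", np.round(g_shepard[1:] / g_shepard[:-1], 4))
print("maxent-route ratios: ", np.round(g_maxent[1:] / g_maxent[:-1], 4))
# Constant ratios = exactly exponential; varying ratios = only approximately so.
```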
APPENDIX 1

This appendix derives the equalities in Eq. (2.1). Let a fixed vector y = (y_1, y_2) and a random vector x = (x_1, x_2) denote the locations of the first and second consequential stimuli in an individual's psychological space. Both stimuli belong to a rectangular consequential region of sizes s_1 (length) and s_2 (height) and location (v_1, v_2), where v_1 and v_2 denote the horizontal and vertical coordinates of the upper-right corner of the region. By assuming that all possible locations of the consequential region are equally likely, and further that v_1 and v_2 are independently selected, we have

    p(v_1, v_2) = p_1(v_1) \, p_2(v_2) = \frac{1}{s_1 s_2}  (y_i \le v_i \le s_i + y_i, \; i = 1, 2).    (A1.1)

Also, the assumption that all locations within the rectangular consequential region are equally likely for x leads to

    p(x)_{v_1, v_2} = p_1(x_1) \, p_2(x_2) = \frac{1}{s_1 s_2}  (v_i - s_i \le x_i \le v_i, \; i = 1, 2).    (A1.2)

Then, given a particular consequential region of sizes s_1 and s_2, the nth-order moment of the distance between the first and second consequential stimuli is obtained as

    E[|x_i - y_i|^n] = \int_{y_2}^{y_2+s_2} \int_{y_1}^{y_1+s_1} \int_{v_2-s_2}^{v_2} \int_{v_1-s_1}^{v_1} |x_i - y_i|^n \, p(x_1, x_2) \, p(v_1, v_2) \, dx_1 \, dx_2 \, dv_1 \, dv_2

                     = \frac{1}{(s_1 s_2)^2} \int_{y_2}^{y_2+s_2} \int_{y_1}^{y_1+s_1} \frac{s_j}{n+1} \left[ (s_i - v_i + y_i)^{n+1} + (v_i - y_i)^{n+1} \right] dv_1 \, dv_2  (j \ne i)

                     = \frac{s_i^n}{(n+1)(n+2)/2}  (i = 1, 2).    (A1.3)

The desired equality, Eq. (2.1a), is then obtained by carrying out another integration with respect to the size density p(s_1, s_2). Following similar steps, the equality in Eq. (2.1b) can readily be obtained.

APPENDIX 2

This appendix shows that the generalization function can be expressed in terms of two probability densities, of consequential and non-consequential stimuli. We first note that, according to Bayes' rule, the following equality holds for any random events A, B, and C:

    p(A | B & C) = \frac{p(A & B & C)}{p(B & C)} = \frac{p(A & B | C) \, p(C)}{p(B | C) \, p(C)} = \frac{p(A & B | C)}{p(B | C)}.    (A2.1)

Now, to express stimulus generalization in formal terms, let continuous random variables Y_i (i = 1, 2) denote the locations in an organism's representation space of the first and second stimuli that are observed. Also let binary random variables Z_i (i = 1, 2) denote whether the corresponding stimulus is consequential (Z_i = 1) or not (Z_i = 0). From the definition of the generalization function g_y(x) as the probability that the second stimulus, observed at x, is consequential given that the first stimulus, observed at y, is consequential, and also from Eq. (A2.1), we obtain the following result:

    g_y(x) ≡ p(Z_2 = 1 | Y_2 = x, Y_1 = y, Z_1 = 1)

           = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1)}{p(Y_2 = x | Y_1 = y, Z_1 = 1)}    (from Eq. (A2.1))

           = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1)}{p(Y_2 = y, Z_2 = 1 | Y_1 = y, Z_1 = 1)} \cdot \frac{p(Y_2 = y | Y_1 = y, Z_1 = 1)}{p(Y_2 = x | Y_1 = y, Z_1 = 1)}    (from g_y(y) = 1)

           = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1)}{p(Y_2 = y, Z_2 = 1 | Y_1 = y, Z_1 = 1)} \cdot \frac{p(Y_2 = y)}{p(Y_2 = x)}.    (A2.2)

Here, the last equality is obtained by assuming that nature selects the second stimulus Y_2 independently of the first stimulus Y_1 and of whether that stimulus is consequential. Finally, by defining P_C(x | y) ≡ p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1) and P(y) ≡ p(Y_2 = y), the desired result is obtained as

    g_y(x) = \frac{P_C(x \mid y) \, P(y)}{P_C(y \mid y) \, P(x)}.    (A2.3)
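[Editorial illustration.] The identity in Eq. (A2.3) can be checked by brute force in a toy discretized world; everything below (world size, region model, probe locations) is an illustrative construction, not part of the paper's model. Nature draws a region and places two stimuli independently and uniformly; g_y(x) estimated directly as a conditional relative frequency agrees, up to sampling noise, with the right-hand side of Eq. (2.2):

```python
import numpy as np

rng = np.random.default_rng(2)
W, N = 20, 1_000_000                 # cells in a 1-D world; trial count
y_loc, x_loc = 8, 11                 # probe locations for y and x

start = rng.integers(0, W, N)        # region location (left edge)
size = rng.integers(1, 8, N)         # region extension
y1 = rng.integers(0, W, N)           # nature places both stimuli
y2 = rng.integers(0, W, N)           # independently and uniformly

def in_region(p):
    return (p >= start) & (p < start + size)

z1, z2 = in_region(y1), in_region(y2)
cond = (y1 == y_loc) & z1            # condition: Y1 = y and Z1 = 1

# Direct estimate of g_y(x) = p(Z2 = 1 | Y2 = x, Y1 = y, Z1 = 1).
g_direct = z2[cond & (y2 == x_loc)].mean()

# Estimate via Eq. (2.2): g = P_C(x|y) P(y) / [P_C(y|y) P(x)].
P_C = lambda loc: ((y2 == loc) & z2 & cond).sum() / cond.sum()
P = lambda loc: (y2 == loc).mean()
g_eq22 = P_C(x_loc) * P(y_loc) / (P_C(y_loc) * P(x_loc))

print(f"direct: {g_direct:.3f}   via Eq. (2.2): {g_eq22:.3f}")
```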
ACKNOWLEDGMENTS

The authors appreciate many helpful comments by the Action Editor, Greg Ashby, Robin Thomas, and an anonymous reviewer. They also thank Xiangen Hu, Caroline Palmer, Sridhar Ramamoorti, and Joshua Tenenbaum for their comments on earlier drafts. A portion of this work was presented at the Twenty-Seventh Annual Meeting of the Society for Mathematical Psychology, held in Seattle, WA, August 11-14, 1994.

REFERENCES

Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37(3), 372-400.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106, 620-630; 108, 171-190.
Kapur, J. N., & Kesavan, H. K. (1992). Entropy optimization principles with applications. Boston, MA: Academic Press.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Levy, W. B., & Delic, H. (1994). Maximum entropy aggregation of individual opinions. IEEE Transactions on Systems, Man, and Cybernetics, 24(4), 606-613.
Myung, I. J. (1994). Maximum entropy interpretation of decision bound and context models of categorization. Journal of Mathematical Psychology, 38, 335-365.
Myung, I. J., Ramamoorti, S., & Bailey, A. D., Jr. (1996). Maximum entropy aggregation of expert predictions. Management Science, 42(10).
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423.
Shepard, R. N. (1958). Stimulus and response generalization: Tests of a model relating generalization to distance in psychological space. Journal of Experimental Psychology, 55, 509-523.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
Shepard, R. N. (1991). Integrality versus separability of stimulus dimensions: From an early convergence of evidence to a proposed theoretical basis. In G. R. Lockhead & J. R. Pomerantz (Eds.), Perception of structure: Essays in honor of Wendell R. Garner, pp. 53-71. Washington, DC: American Psychological Association.
Shore, J. E., & Johnson, R. W. (1980). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26, 26-37.
Townsend, J. T., & Thomas, R. D. (1993). On the need for a general quantitative theory of pattern similarity. In S. C.
Masin (Ed.), Foundations of perceptual theory, pp. 297-368. Amsterdam: North-Holland.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123-154.

Received March 8, 1995