Journal of Mathematical Psychology 40, 342-347 (1996)
Article No. 0033
THEORETICAL NOTE
Maximum Entropy Inference and Stimulus Generalization
In Jae Myung
Ohio State University
and
Roger N. Shepard
Stanford University
Maximum entropy inference is a method for estimating a probability
distribution based on limited information expressed in terms of the
moments of that distribution. This paper presents such a maximum
entropy characterization of Shepard's theory of generalization.
Shepard's theory assumes that an object has an important consequence
for an individual only if it falls in a connected set, called the consequential region, in the individual's representational space. The assumption
yields a generalization probability that decays exponentially with an
appropriate psychological distance metric, either the city-block or the
Euclidean, depending on the correlational structure between extensions of the consequential region along the dimensions. In this note we
show that a generalization function similar to that derived by Shepard
(1987) can be obtained by applying maximum entropy inference on
limited information about interstimulus distances between two objects
having a common consequence. In particular, we show that different
shapes of equal generalization contours may be interpreted as optimal
utilization, in the maximum entropy sense, of the correlation structure
of stimulus dimensions, similar to the explanation by Shepard's
theory. © 1996 Academic Press
1. INTRODUCTION
Results from multidimensional scaling analysis of generalization data (e.g., Shepard, 1958) indicate that the likelihood that an organism will generalize from one stimulus to
another decreases in close approximation to an exponential
decay function of psychological distance between the two
stimuli. In formal terms, the generalization function g_y(x) is
defined as the probability that the second stimulus, denoted
by a vector x, in an organism's representation space will
have a significant consequence given that the first stimulus,
denoted by another vector y, was found to be consequential.
Send all correspondence and reprint requests to: Dr. In Jae Myung, Department of Psychology, 142 Townshend Hall, 1885 Neil Avenue Mall, Columbus, OH 43210-1222. E-mail: imyung@magnus.acs.ohio-state.edu.

Let us assume that the organism estimates this probability
from which it makes decisions probabilistically regarding
generalizability of the first stimulus to the second. Then the
generalization function can be approximated as
g_y(x) = \exp(-\eta \, d(x, y)),    (1.1)

where d(x, y) is the distance between x and y measured by
an appropriate metric and η is a positive scaling parameter.
The Minkowski r-metric is often found to fit generalization
data closely and is defined as
d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}, \qquad r \ge 0.    (1.2)
In particular, the city-block metric (r=1) better describes
data obtained for stimuli differing along separable dimensions, such as size and orientation of shapes, whereas the
Euclidean metric (r=2) better describes data obtained for
stimuli differing along integral dimensions, such as saturation and lightness of colors (for reviews see Garner, 1974,
and Shepard, 1991; for a critical review of related issues, see
Townsend & Thomas, 1993). There is some evidence,
however, that generalization data may be better described
by a Minkowski r-metric with a non-integer r-value between
1 and 2, or greater than 2, or even less than 1 (see, e.g.,
Kruskal, 1964; Shepard, 1964; Tversky & Gati, 1982; and
Shepard, 1991).
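For concreteness, the short sketch below (in Python; the stimulus coordinates and the value of the scaling parameter are arbitrary choices of ours, not values from the paper) evaluates the Minkowski r-metric of Eq. (1.2) and the resulting exponential generalization value of Eq. (1.1) for the city-block, an intermediate, and the Euclidean case.

```python
import numpy as np

def minkowski_distance(x, y, r):
    """Minkowski r-metric of Eq. (1.2): r = 1 is city-block, r = 2 is Euclidean."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** r) ** (1.0 / r))

def generalization(x, y, r, eta=1.0):
    """Exponential-decay generalization function of Eq. (1.1)."""
    return float(np.exp(-eta * minkowski_distance(x, y, r)))

x, y = [1.0, 2.0], [2.5, 0.5]            # hypothetical two-dimensional stimuli
for r in (1.0, 1.5, 2.0):
    print(f"r = {r}: d = {minkowski_distance(x, y, r):.3f}, "
          f"g_y(x) = {generalization(x, y, r):.3f}")
```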
Shepard (1987) proposed a theory of generalization that
provides non-arbitrary explanations for the empirical
regularities based on the idea that they represent a statistical
inference given limited information. The theory assumes
that a basic kind or category is represented in an organism's
representation space as a connected set, called a consequential region. Two objects that fall in a particular consequential region belong to the same category and thus have the
same consequence. To generalize, one must then estimate
the probability that the second stimulus x falls in the
same consequential region that already contains the first
stimulus y. The estimation of this probability must be
carried out with little or no knowledge of the location, size,
and shape of the consequential region in representational
space. According to the theory, the generalization function is
estimated as an expectation of that probability by integrating over consequential regions of all possible locations and
sizes, weighted by their prior probabilities. For simplicity,
a two-dimensional case (m=2) is considered in this
paper. Assuming that nature selects the location of the
consequential region at random, the theory prescribes the
generalization function g_y(x) as obtained by

g_y(x) = \int_0^{\infty} \int_0^{\infty} P_y(x | s_1, s_2) \, p(s_1, s_2) \, ds_1 \, ds_2.    (1.3)
In this equation, P_y(x | s_1, s_2) is the probability that
stimulus x belongs to the consequential region of particular
sizes or extensions, s_1 and s_2, that already contains stimulus
y, and p(s_1, s_2) is the size density.
For a broad range of choices of p(s_1, s_2), Shepard (1987)
found the integral in Eq. (1.3) to yield an approximately
exponential form for g y (x). In addition, depending upon
the particular assumption an organism makes about the
correlational structure between extensions of consequential
regions, different approximations to Minkowski distance
metrics are obtained. Specifically, an assumption that the
two extensions, (s_1, s_2), of the consequential region are
uncorrelated leads to equal-generalization contours of a
diamond shape, which correspond to g_y(x) with the city-block metric of r = 1 in Eqs. (1.1) and (1.2). On the other hand,
an assumption of a perfect positive correlation leads to
equal-generalization contours of a rounded shape, similar
to the Euclidean metric of r=2 in Eq. (1.2). In particular, if
either (a) the consequential region is assumed to be circular
or (b) the consequential region, whatever its shape, is
assumed to have any orientation with equal probability,
then exactly the Euclidean metric is obtained. The theory
also allows a graded series of cases between perfect
integrality and perfect separability. That is, for an intermediate positive correlation, an intermediate metric
between the city-block and the Euclidean metrics (1<r<2)
is obtained, and furthermore, for a negative correlation,
concave equal-generalization contours (r<1) result. In
short, the theory points to the correlational structure
between extensions of consequential regions as the source
for the observed empirical regularities in generalization
performance.
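A rough numerical sketch of Eq. (1.3) (our own illustration, not the computation reported by Shepard, 1987): for rectangular consequential regions with uniformly random location, P_y(x | s_1, s_2) reduces to the product of the overlap probabilities max(0, 1 - |x_i - y_i|/s_i), and the integral over the size density can be approximated by Monte Carlo. Comparing independent exponential extensions (the uncorrelated case) with a single shared extension (the perfectly correlated case) shows how the assumed correlation changes the value of g_y(x), and hence the shape of the equal-generalization contours.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_estimate(d1, d2, s1, s2):
    """Monte Carlo estimate of Eq. (1.3): average of P_y(x | s1, s2), which for a
    uniformly located rectangular region is max(0, 1 - d/s) along each dimension."""
    return np.mean(np.clip(1 - d1 / s1, 0, None) * np.clip(1 - d2 / s2, 0, None))

n, mean_size = 200_000, 1.0
d1, d2 = 0.6, 0.3                         # hypothetical interstimulus distances

# Uncorrelated extensions: independent exponential sizes.
s1, s2 = rng.exponential(mean_size, n), rng.exponential(mean_size, n)
print("uncorrelated extensions:", g_estimate(d1, d2, s1, s2))

# Perfectly correlated extensions: one shared size for both dimensions.
s = rng.exponential(mean_size, n)
print("correlated extensions:  ", g_estimate(d1, d2, s, s))
```

Tracing the set of (d_1, d_2) pairs that yield the same estimated value in the two cases reproduces, approximately, the diamond-shaped versus rounded contours described above.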
The purpose of this theoretical note is to provide another
statistical approach to the generalization problem. Specifically, in this paper (a) we investigate the possibility of
basing the type of generalization theory advanced by
Shepard (1987) on a more thoroughgoing information-theoretic foundation, and (b) we examine structural relations between the inferential processes adopted in Shepard's
original derivation and the different one explored here.
2. GENERALIZATION AS MAXIMUM ENTROPY INFERENCE
In this section we first bring out a close connection
between the correlational structure of the extensions of consequential regions and that of interstimulus distances
between two objects having the same consequence. We then
show that the generalization function can be expressed in
terms of two probability densities that are estimated by
maximum entropy inference based on moments of stimulus
dimensions.
First, following Shepard (1987), we assume that an object
is consequential only if it falls in a consequential region.
Further, assuming for the present that the consequential
region is rectangular, we derive the following equalities that
hold for any size density p(s_1, s_2) and any n (> 0) (see
Appendix 1):

(a)   E[\,|x_i - y_i|^n\,] = \frac{E[s_i^n]}{(n+1)(n+2)/2}, \qquad i = 1, 2;

(b)   E[\,(|x_1 - y_1|\,|x_2 - y_2|)^n\,] = \frac{E[(s_1 s_2)^n]}{\big((n+1)(n+2)/2\big)^2}.    (2.1)
Here, |x_i - y_i| (i = 1, 2) denotes the absolute distance
between two consequential stimuli along the ith coordinate
axis. The above equalities are obtained by integrating over
all possible consequential regions of different locations
and sizes for fixed y = (y_1, y_2), weighted by their prior
probabilities. The results in Eq. (2.1) imply that statistical
information about moments of (unobservable) extensions
of the consequential region translates into the corresponding information in terms of moments of (observable)
distances between two consequential stimuli, and vice versa.
For example, from the above equalities, we obtain
Cov(|x_1 - y_1|, |x_2 - y_2|) = Cov(s_1, s_2)/9.
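The relations in Eq. (2.1), and the covariance identity just stated, are easy to check by simulation. The sketch below (our own check; the particular correlated size density is an arbitrary choice for illustration) draws extensions (s_1, s_2), places the two consequential stimuli uniformly and independently within the region along each dimension (which is equivalent to the uniform-location assumption of Appendix 1), and compares sample moments of the interstimulus distances with the right-hand sides of Eq. (2.1).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# A correlated size density, chosen only for illustration: a shared component a.
a = rng.exponential(1.0, n)
s1 = a + rng.exponential(1.0, n)
s2 = a + rng.exponential(1.0, n)

# Along each dimension the two consequential stimuli are independent and
# uniform within the region, so d_i = |x_i - y_i| is the distance between
# two independent uniforms on [0, s_i].
d1 = np.abs(rng.uniform(0, s1) - rng.uniform(0, s1))
d2 = np.abs(rng.uniform(0, s2) - rng.uniform(0, s2))

print("E|x1-y1|        :", d1.mean(),        " vs E[s1]/3     :", s1.mean() / 3)
print("E|x1-y1||x2-y2| :", (d1 * d2).mean(), " vs E[s1 s2]/9  :", (s1 * s2).mean() / 9)
print("Cov(d1, d2)     :", np.cov(d1, d2)[0, 1],
      " vs Cov(s1,s2)/9:", np.cov(s1, s2)[0, 1] / 9)
```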
Second, from the definition of the generalization function
and also by applying Bayes rule, we obtain (see Appendix 2)
g_y(x) = \frac{P_C(x | y) \, P(y)}{P_C(y | y) \, P(x)},    (2.2)
where the subscript C stands for "consequential." Then to
generalize, an organism must estimate two probability densities: (a) P(x), the probability of observing a stimulus at x
in the organism's representational space, whether it is consequential or not; and (b) P_C(x | y), the conditional probability of observing a consequential stimulus at x given
another consequential stimulus observed at y. P_C(y | y) and
P(y) are obtained as special cases by setting x=y.
For the marginal probability P(x), we note that
P(x) = P(x | C) P(C) + P(x | C̄) P(C̄), where C̄ stands for
"inconsequential." Given that encounters with consequential
stimuli are rather rare events (e.g., only about 100 of 5000
types of mushrooms found in the United States are
poisonous[1]), that is, P(C) ≈ 0, the marginal probability may be
approximated as P(x) ≈ P(x | C̄). Insofar as Shepard's
(1987) theory is concerned, an organism generalizes using
information about consequential stimuli only.[2] Under this
circumstance of absence of any information about non-consequential stimuli, the organism should assume a uniform
density for P(x | C̄), or equivalently for P(x), thus yielding
P(y)/P(x) = 1.[3] An implication of this simplifying assumption
is that the organism estimates the generalization function
solely from the conditional probability of consequential
stimuli as g_y(x) = P_C(x | y)/P_C(y | y).
According to Shepard's original theory, an organism
estimates the size density p(s_1, s_2) in Eq. (1.3) by maximum
entropy inference from minimum knowledge about the size
of the consequential region, such as E(s_i), i = 1, 2, and the
correlational information, such as Cov(s_1, s_2).[4] The
equalities in Eq. (2.1) imply that this limited information
translates into the following set of moments of interstimulus
distances between consequential stimuli:
{E[|x_1 - y_1|], E[|x_2 - y_2|], E[|x_1 - y_1| |x_2 - y_2|]}.    (2.3)
[1] From the U.C. Davis Poison Center Answer Book on the World Wide Web site: http://www.ucdmc.ucdavis.edu/poisoncontrol/mushroom.html.

[2] Although the theory of generalization, as advanced in Shepard's (1987) paper, was developed using information about consequential stimuli only, the paper nevertheless hinted at the possibility of extending the theory to situations involving information about non-consequential stimuli as well (e.g., generalization over a series of learning trials with differential reinforcement feedback).

[3] The result in Eq. (2.2), along with the assumption P(x) ≈ P(x | C̄), shows that a reliable estimation of the generalization function requires two independent sources of information, one about consequential stimuli and the other about non-consequential stimuli. Thus the present assumption of a uniform density for non-consequential stimuli represents a suboptimal strategy in the sense that it does not fully utilize potentially useful information. However, it may represent a heuristic an organism adopts given limited storage and computational resources. In other words, the organism might primarily focus on stimuli having significant consequences that might be important for its survival but pay little or no attention to stimuli that are not consequential. Besides, the organism might assume that consequential stimuli are concentrated in a relatively small region of its representation space whereas non-consequential stimuli are scattered around in the space, so that P(x | C̄) is essentially uniform in that small region.

[4] Although the maximum entropy inferred size density yielded exactly the exponential decay form of the generalization function, many non-maximum-entropy densities also produced "approximately" exponential forms, as demonstrated in Shepard (1987).
In this paper we assume that the organism estimates
P_C(x | y) by maximum entropy inference from these
moments.
Maximum entropy inference (Jaynes, 1957) is a method
for estimating a probability density based on the moments
of that distribution (see, e.g., Kapur & Kesavan, 1992; also
see Shore & Johnson, 1980, for theoretical justifications of
maximum entropy inference based on an axiomatic proof).
Examples of recent applications of the inference method
include categorization (Myung, 1994) and opinion aggregation (Levy & Delic, 1994; Myung, Ramamoorti, & Bailey,
1996). Maximum entropy inference prescribes that among
all possible probability densities that satisfy a given set of
moments, one should choose the one that maximizes the
Shannon (1948) information measure defined as E(p) = -∫ p(z) ln[p(z)] dz for a continuous random variable z and
its probability density p(z). Suppose that some information
about the random event is available in terms of n moments
E[h_k(z)] = β_k (k = 1, ..., n). Then the maximum entropy
solution that maximizes E(p) subject to the moment
constraints has the form

p(z) = \exp\left( -\lambda_0 - \sum_{k=1}^{n} \lambda_k h_k(z) \right),    (2.4)

where λ_0 and the λ_k's are constants determined by
∫ p(z) dz = 1 and the moment constraints. The maximization of the information measure implies that the maximum
entropy inferred probability is the one that utilizes all information contained in the set of moments but is maximally
non-committal with respect to information not available
(Jaynes, 1957). It is in this sense that the inferred probability
is an optimal solution to the probability estimation
problem; any other form would assume either more or less
information than is actually available.
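The form of Eq. (2.4) can be recovered numerically. The sketch below (our own illustration on a discretized, truncated support; the constraint value is arbitrary) finds the multipliers by minimizing the convex dual of the constrained entropy problem, log ∫ exp(-Σ_k λ_k h_k(z)) dz + Σ_k λ_k β_k, whose gradient vanishes exactly when the moment constraints are satisfied. With a single mean constraint on [0, ∞), the solution should come out close to an exponential density, the case that underlies the exponential generalization functions discussed here.

```python
import numpy as np
from scipy.optimize import minimize

# Discretized, truncated support for z (stands in for [0, infinity)).
z = np.linspace(0.0, 20.0, 2001)
dz = z[1] - z[0]

# Moment constraints E[h_k(z)] = beta_k; here a single mean constraint.
h = [lambda t: t]
beta = np.array([2.0])

def dual(lam):
    """Convex dual of the maximum entropy problem for p(z) ~ exp(-sum_k lam_k h_k(z))."""
    expo = -sum(l * hk(z) for l, hk in zip(lam, h))
    return np.log(np.sum(np.exp(expo)) * dz) + lam @ beta

lam = minimize(dual, x0=np.zeros(len(h)), method="BFGS").x
p = np.exp(-sum(l * hk(z) for l, hk in zip(lam, h)))
p /= np.sum(p) * dz                       # normalization fixes lambda_0

print("lambda_1      :", lam[0])          # close to 1/beta = 0.5 (exponential density)
print("achieved mean :", np.sum(z * p) * dz)
```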
Now, the maximum entropy inferred probability P_C(x | y)
based on the moment constraints in Eq. (2.3) is given in the
form of Eq. (2.4). Then the generalization function obtained
is

g_y(x) = \exp[\, -\lambda_1 |x_1 - y_1| - \lambda_2 |x_2 - y_2| + \xi \, |x_1 - y_1| \, |x_2 - y_2| \,],    (2.5)
where the λ_i's and ξ are constants determined from the constraints. To examine properties of the above maximum
entropy inferred generalization function, we assume that
stimuli, and hence interstimulus distances, are defined on
some finite intervals, 0 ≤ |x_i - y_i| ≤ R_i for 0 < R_i < ∞
(i = 1, 2). Let us also require that the generalization function
be monotonically decreasing with respect to the absolute
interstimulus distance |x_i - y_i|, so that ∂g_y(x)/∂|x_i - y_i| < 0 for
all x_i ≠ y_i (i = 1, 2).[5] This monotonicity condition implies
the following conditions on the parameters: λ_i > 0 (i = 1, 2)
and ξ < min[λ_1/R_2, λ_2/R_1]. Under these conditions, it can
be shown that (a) a negative ξ produces a concave shape of
equal-generalization contours, which corresponds to the
Minkowski "metric" of r < 1 in Eq. (1.2); (b) ξ = 0 leads to
rhombic-shaped equal-generalization contours, that is,
the city-block metric of r = 1; and (c) a positive ξ produces
a convex shape of equal-generalization contours that
approximates the Minkowski metric of r > 1, including the
Euclidean metric of r = 2. In particular, the larger the
positive (negative) ξ value, meaning a higher positive
(negative) correlation between interstimulus distances, the
more square-shaped (concave) the equal-generalization contours become.
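The dependence of contour shape on the sign of ξ can be checked directly. In the sketch below (parameter values are arbitrary but satisfy the monotonicity conditions above), we take the equal-generalization contour through the axis intercepts d_1 = c/λ_1 and d_2 = c/λ_2 of -log g_y(x) = c and evaluate the exponent of Eq. (2.5) at the midpoint of the straight line joining them: for ξ = 0 the midpoint lies exactly on the contour (the rhombus edge), for ξ > 0 the contour passes outside the midpoint (a convex shape), and for ξ < 0 it passes inside (a concave shape).

```python
import numpy as np

def neg_log_g(d1, d2, lam1, lam2, xi):
    """-log g_y(x) from Eq. (2.5), written in the interstimulus distances d_i = |x_i - y_i|."""
    return lam1 * d1 + lam2 * d2 - xi * d1 * d2

lam1 = lam2 = 1.0
c = 1.0                                    # contour level of -log g_y(x)
mid = (c / (2 * lam1), c / (2 * lam2))     # midpoint of the two axis intercepts

for xi in (-0.5, 0.0, 0.5):                # negative, zero, positive correlation
    val = neg_log_g(*mid, lam1, lam2, xi)
    shape = ("rhombic" if np.isclose(val, c)
             else "convex" if val < c else "concave")
    print(f"xi = {xi:+.1f}: exponent at midpoint = {val:.3f} vs level {c:.1f} -> {shape}")
```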
The maximum entropy inferred generalization function in
Eq. (2.5) was obtained from a relatively minimal set of
moments in Eq. (2.3). If more information is assumed than
this set, such as higher moments (e.g., E[|x_i - y_i|^2]) or
higher-order correlations (e.g., E[|x_1 - y_1|^2 |x_2 - y_2|^2]),
then the resulting maximum entropy solution will yield
somewhat different shapes for the equal-generalization contours. For instance, suppose that in addition to the set in
Eq. (2.3), the second moments {E[|x_i - y_i|^2], i = 1, 2} are
also available. Then, it can be shown that the maximum
entropy inferred generalization function is a Gaussian decay
function with the oblique Euclidean metric (see, e.g., Ashby
& Maddox, 1993), corresponding to tilted elliptical equal-generalization contours.
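A sketch of this last point (our own illustration; the coefficient values are arbitrary and merely stand in for whatever the constraints would determine): with the second moments added, the maximum entropy exponent becomes a quadratic form in the distances (d_1, d_2), and whenever the cross-term coefficient is nonzero the principal axes of its elliptical level curves are rotated away from the coordinate axes, which is the tilted (oblique) pattern referred to above.

```python
import numpy as np

# Quadratic part of the exponent -lam1*d1 - lam2*d2 - b1*d1**2 - b2*d2**2 + c*d1*d2,
# written as -(d^T A d) - lam^T d with d = (d1, d2).
b1, b2, c = 1.0, 0.5, 0.6                 # hypothetical constants from the constraints
A = np.array([[b1, -c / 2],
              [-c / 2, b2]])

eigvals, eigvecs = np.linalg.eigh(A)      # A is positive definite for these values
tilt = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
print("axis lengths ~ 1/sqrt(eigenvalues):", 1 / np.sqrt(eigvals))
print("tilt of principal axes (degrees):  ", tilt)   # nonzero because c != 0
```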
3. DISCUSSION
The purpose of this theoretical note was to characterize
Shepard's (1987) theory of generalization from an information-theoretic point of view. We have shown that a
generalization function similar to that derived by Shepard
(1987) can be obtained by assuming maximum entropy
inference based on information about interstimulus distances between two objects having a common consequence.
Perhaps the most important result of the present investigation
is that different shapes of equal-generalization contours may be interpreted as optimal utilization, in the maximum entropy sense, of the correlation structure of stimulus dimensions. Specifically, the convex, rhombic, or concave shape of the contours is derived as the maximum entropy inference solution when a positive, zero, or negative correlation, respectively, is assumed between the interstimulus distances of two objects having a common consequence.

[5] For the particular set of moments in Eq. (2.3), the corresponding maximum entropy solution in Eq. (2.5) is not always monotonically decreasing, especially for large positive ξ, so this extra condition was required to obtain a sensible solution. This observation may suggest that some crucial piece of information is missing from the assumed moment set. In fact, the non-monotonic property of the maximum entropy solution disappears (i.e., no additional condition is needed) when certain additional moments, such as {E[|x_i - y_i|^2], i = 1, 2}, are introduced. However, a problem here is that the resulting solution now becomes a Gaussian decay generalization function, instead of the desirable exponential decay generalization function, and further, that only the Euclidean metric is obtained, rather than the more general Minkowski metric of r > 1.
Shepard's (1987) approach to generalization provides
essentially the same interpretation of the distance metrics
in terms of the correlational structure between extensions
of consequential regions. This result is not surprising
given the relations in Eq. (2.1) between interstimulus distances and extensions of consequential regions. Yet, the information that an organism gains through experience provides
more direct evidence about distances between stimuli that it
finds to be consequential than about the extensions of an
underlying consequential region. The present maximum
entropy approach has been motivated by the possibility of
thus providing a more explicit way in which an organism
might gain knowledge about consequential regions,
together with the possibility of making direct use of the
moments of probability distributions of stimulus dimensions.
There are in fact close connections between Shepard's
approach and the present maximum entropy approach. In
the former approach, maximum entropy inference is first
applied to obtain the prior size density p(s_1, s_2) from an
assumed set of moments of the size dimensions of consequential
regions, {E(s_1), E(s_2), E(s_1 s_2)}. Then the generalization
function is obtained by integrating over consequential
regions of all possible sizes and locations. In the maximum
entropy approach explored here, the above inference process is reversed. Therefore, the principal difference between
the two approaches is the order of application of the two
inferential operations, maximum entropy inference, and
marginalization over hyperparameters. Although both
approaches produce strikingly similar forms of the generalization function, at least qualitatively, the two inferential
operations are not commutative. To show this, suppose that
the information available is in the form of the two moments
only, {E(s_1), E(s_2)}. The result indicates that for this
particular set of moments, Shepard's (1987) theory yields
an approximately exponential form of the generalization
function with a near-rhombic shape of contours. The results
using the maximum entropy approach are somewhat
sharper: not only is an exactly exponential form (relatively
steeper) of the generalization function produced, but it also
has an exactly rhombic shape of contours. On the other
hand, by assuming a different set of expectations, for
example, {E(s_1), E(ln s_1), E(s_2), E(ln s_2)}, one can obtain
the opposite pattern of results, that is, it is Shepard's theory
that now produces the relatively sharper results.
APPENDIX 1

This appendix derives the equalities in Eq. (2.1). Let a fixed vector y = (y_1, y_2) and a random vector x = (x_1, x_2) denote the locations of the first and the second consequential stimuli in an individual's psychological space. Both stimuli belong to a rectangular consequential region of sizes s_1 (length) and s_2 (height) and location (v_1, v_2), with v_1 and v_2 denoting the horizontal and vertical coordinates of the upper-right corner of the consequential region, respectively. By assuming that all possible locations of the consequential region are equally likely, and further, that v_1 and v_2 are independently selected, we have

p(v_1, v_2) = p_1(v_1) \, p_2(v_2) = \frac{1}{s_1 s_2}, \qquad (y_i \le v_i \le s_i + y_i, \; i = 1, 2).    (A1.1)

Also, the assumption that all locations within the rectangular consequential region are equally likely for x leads to

p(x)_{v_1, v_2} = p_1(x_1) \, p_2(x_2) = \frac{1}{s_1 s_2}, \qquad (v_i - s_i \le x_i \le v_i, \; i = 1, 2).    (A1.2)

Then, given a particular consequential region of sizes s_1 and s_2, the n-th order moment of the distance between the first and the second consequential stimuli is obtained as

E[\,|x_i - y_i|^n\,]
  = \int_{y_2}^{y_2+s_2} \int_{y_1}^{y_1+s_1} \int_{v_2-s_2}^{v_2} \int_{v_1-s_1}^{v_1} |x_i - y_i|^n \, p(x_1, x_2) \, p(v_1, v_2) \, dx_1 \, dx_2 \, dv_1 \, dv_2
  = \frac{1}{s_i^2} \int_{y_i}^{y_i+s_i} \frac{1}{n+1} \Big[ (s_i - v_i + y_i)^{n+1} + (v_i - y_i)^{n+1} \Big] \, dv_i
  = \frac{s_i^n}{(n+1)(n+2)/2}, \qquad i = 1, 2,    (A1.3)

where the integrations over the coordinate other than i simply cancel the corresponding 1/s factors in p(x_1, x_2) and p(v_1, v_2).

The desired equality of Eq. (2.1a) is then obtained by carrying out another integration with respect to the size density p(s_1, s_2). Following similar steps, the equality in Eq. (2.1b) can readily be obtained.
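The single-dimension integrations behind Eq. (A1.3) can be verified symbolically for specific values of n. The sketch below (our own check, not part of the original derivation) places y_i at the origin for convenience, splits the inner integral at zero so that |x_i - y_i|^n integrates cleanly, averages over the uniform densities of Eqs. (A1.1) and (A1.2), and confirms the value s_i^n / ((n+1)(n+2)/2).

```python
import sympy as sp

s, v, x = sp.symbols('s v x', positive=True)

for n in (1, 2, 3):
    # Inner integral over x, split at x = 0 (y_i sits at the origin; the region
    # along this dimension is [v - s, v] with v in [0, s]).
    inner = (sp.integrate((-x) ** n, (x, v - s, 0)) +
             sp.integrate(x ** n, (x, 0, v)))
    # Average over the uniform densities p(v) = 1/s and p(x | v) = 1/s.
    moment = sp.simplify(sp.integrate(inner / s ** 2, (v, 0, s)))
    target = s ** n / sp.Rational((n + 1) * (n + 2), 2)
    print(n, moment, sp.simplify(moment - target) == 0)
```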
APPENDIX 2

This appendix shows that the generalization function can be expressed in terms of two probability densities of consequential and non-consequential stimuli.

We first note that, according to the Bayes rule, the following equality holds for any random events A, B, and C:

p(A | B \cap C) = \frac{p(A \cap B \cap C)}{p(B \cap C)} = \frac{p(A \cap B | C) \, p(C)}{p(B | C) \, p(C)} = \frac{p(A \cap B | C)}{p(B | C)}.    (A2.1)

Now, to express stimulus generalization in formal terms, let continuous random variables Y_i (i = 1, 2) denote the locations in an organism's representation space of the first and second stimuli that are observed. Also let binary random variables Z_i (i = 1, 2) denote whether the corresponding stimulus is consequential (Z_i = 1) or not (Z_i = 0). From the definition of the generalization function g_y(x) as the probability of the second stimulus observed at x being consequential given that the first stimulus observed at y is consequential, and also from Eq. (A2.1), we obtain the following result:

g_y(x) \equiv p(Z_2 = 1 | Y_2 = x, Y_1 = y, Z_1 = 1)
  = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1)}{p(Y_2 = x | Y_1 = y, Z_1 = 1)} \qquad \text{(from Eq. (A2.1))}
  = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1)}{p(Y_2 = y, Z_2 = 1 | Y_1 = y, Z_1 = 1)} \times \frac{p(Y_2 = y | Y_1 = y, Z_1 = 1)}{p(Y_2 = x | Y_1 = y, Z_1 = 1)} \qquad \text{(from } g_y(y) = 1\text{)}
  = \frac{p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1) \, p(Y_2 = y)}{p(Y_2 = y, Z_2 = 1 | Y_1 = y, Z_1 = 1) \, p(Y_2 = x)}.    (A2.2)

Here, the last equality is obtained by assuming that nature selects the second stimulus Y_2 independently of the first stimulus Y_1 and of whether it is consequential.

Finally, by defining P_C(x | y) \equiv p(Y_2 = x, Z_2 = 1 | Y_1 = y, Z_1 = 1) and P(y) \equiv p(Y_2 = y), the desired result is obtained as

g_y(x) = \frac{P_C(x | y) \, P(y)}{P_C(y | y) \, P(x)}.    (A2.3)
ACKNOWLEDGMENTS
The authors appreciate many helpful comments by the Action Editor, Greg
Ashby, Robin Thomas, and an anonymous reviewer. They also thank
Xiangen Hu, Caroline Palmer, Sridhar Ramamoorti, and Joshua Tenenbaum
for their comments on earlier drafts. A portion of this work has been
presented at the Twenty-Seventh Annual Meeting of the Society for Mathematical Psychology held in Seattle, WA, on August 11-14, 1994.
REFERENCES
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37(3), 372-400.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106, 620-630; 108, 171-190.
Kapur, J. N., & Kesavan, H. K. (1992). Entropy optimization principles with applications. Boston, MA: Academic Press.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Levy, W. B., & Delic, H. (1994). Maximum entropy aggregation of individual opinions. IEEE Transactions on Systems, Man, and Cybernetics, 24(4), 606-613.
Myung, I. J. (1994). Maximum entropy interpretation of decision bound and context models of categorization. Journal of Mathematical Psychology, 38, 335-365.
Myung, I. J., Ramamoorti, S., & Bailey, A. D., Jr. (1996). Maximum entropy aggregation of expert predictions. Management Science, 42(10).
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423.
Shepard, R. N. (1958). Stimulus and response generalization: Tests of a model relating generalization to distance in psychological space. Journal of Experimental Psychology, 55, 509-523.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
Shepard, R. N. (1991). Integrality versus separability of stimulus dimensions: From an early convergence of evidence to a proposed theoretical basis. In G. R. Lockhead & J. R. Pomerantz (Eds.), Perception of structure: Essays in honor of Wendell R. Garner (pp. 53-71). Washington, DC: American Psychological Association.
Shore, J. E., & Johnson, R. W. (1980). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26, 26-37.
Townsend, J. T., & Thomas, R. D. (1993). On the need for a general quantitative theory of pattern similarity. In S. C. Masin (Ed.), Foundations of perceptual theory (pp. 297-368). Amsterdam: North-Holland.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123-154.
Received March 8, 1995