LETTER
Communicated by Peter Latham
Nonlinear Population Codes
Maoz Shamir
maoz@fiz.huji.ac.il
Haim Sompolinsky
haim@fiz.huji.ac.il
Racah Institute of Physics and Center for Neural Computation, Hebrew University of Jerusalem, Jerusalem 91904, Israel

Neural Computation 16, 1105-1136 (2004). © 2004 Massachusetts Institute of Technology
Theoretical and experimental studies of distributed neuronal representations of sensory and behavioral variables usually assume that the tuning of the mean firing rates is the main source of information. However, recent theoretical studies have investigated the effect of cross-correlations in the trial-to-trial fluctuations of the neuronal responses on the accuracy of the representation. Assuming that only the first-order statistics of the neuronal responses are tuned to the stimulus, these studies have shown that in the presence of correlations, similar to those observed experimentally in cortical ensembles of neurons, the amount of information in the population is limited, yielding nonzero error levels even in the limit of infinitely large populations of neurons.

In this letter, we study correlated neuronal populations whose higher-order statistics, and in particular response variances, are also modulated by the stimulus. We ask two questions: Does the correlated noise limit the accuracy of the neuronal representation of the stimulus? And how can a biological mechanism extract most of the information embedded in the higher-order statistics of the neuronal responses? Specifically, we address these questions in the context of a population of neurons coding an angular variable. We show that the information embedded in the variances grows linearly with the population size despite the presence of strong correlated noise. This information cannot be extracted by linear readout schemes, including the linear population vector. Instead, we propose a bilinear readout scheme that involves spatial decorrelation, quadratic nonlinearity, and population vector summation. We show that this nonlinear population vector scheme yields accurate estimates of stimulus parameters, with an efficiency that grows linearly with the population size. This code can be implemented using biologically plausible neurons.
1 Introduction
In many areas of the brain, information on certain stimulus features is coded in a distributed manner, by a population of neurons (e.g., Georgopoulos, Schwartz, & Kettner, 1986; Lee, Rohrer, & Sparks, 1988; Wilson & McNaughton, 1993; Fitzpatrick, Batra, Stanford, & Kuwada, 1997; Young & Yamane, 1992). It is generally accepted that the average response of each neuron is tuned to the stimulus; hence, the average firing rate codes for the stimulus. The firing rates of each neuron fluctuate from trial to trial about their average value. If the trial-to-trial fluctuations of different neurons are uncorrelated, then by pooling information from large populations of neurons, the brain can overcome the noise inherent in the single-neuron responses. However, experimental findings suggest that correlations between the fluctuations of different neurons are considerable (Zohary, Shadlen, & Newsome, 1994; Lee, Port, Kruse, & Georgopoulos, 1998). Theoretical studies that investigated the effect of correlations on population coding yielded conflicting results (e.g., see Zohary et al., 1994; Abbott & Dayan, 1999). Recently, Sompolinsky, Yoon, Kang, and Shamir (2001) studied the effect of correlations on the accuracy of population coding. They showed that long-range positive correlations that vary smoothly with the functional distance between the neurons lead to saturation of the accuracy by which the stimulus parameters can be extracted to a finite value, even in the limit of infinitely large ensembles. This work, however, was limited to stimulus-independent covariance matrices.
Experimental findings show that information exists not only in the mean firing rates but also in higher-order statistics of the neuronal responses. The approximate linear relationship between the average firing rate and its variance reported in many brain areas (e.g., see Vogels, Spileers, & Orban, 1989; Britten, Shadlen, Newsome, & Movshon, 1992) implies that the variance is also tuned to the stimulus. Maynard et al. (1999) showed a (weak) dependence of the cross-correlations of different neurons on the stimulus. Here we address two questions. Is information coded in the higher-order statistics of the neuronal responses bounded by the correlated noise in the system? And how can such a code be read by a biologically plausible mechanism? For concreteness, we discuss the coding for an angle, such as the direction of arm movement in M1 during a simple unimanual movement task or the coding for the direction of movement of visual stimuli in area V5.
The outline of this letter is as follows. In section 1, the problem of coding information in a correlated population of neurons is introduced. First, the stochastic model of the neuronal population that codes for the angle is defined. Then the Fisher information of the system is calculated, and the effect of the correlated noise on linear readout mechanisms is discussed. The saturation of information coded in the first-order statistics of the neuronal behavior, on the one hand, and the inability of linear readout schemes to extract information coded in the higher-order statistics, on the other hand, encourage the study of nonlinear readout schemes in section 2. Motivated by the results, we suggest a nonlinear readout mechanism and study its efficiency. We then discuss the statistical properties of the neurons in the readout layer. A combination of linear and nonlinear readout schemes is presented and is shown to have superior performance. In section 3, we summarize the results of our analysis, discuss the effects of different variations of this model, and compare our model to other existing models of nonlinear population readout. Preliminary results have been published in Shamir and Sompolinsky (2001a, 2001b).
1.1 The Statistical Model of the Input Layer. We consider a population of N neurons that code for an angle θ, and denote by r_i the response of the ith neuron (i ∈ {1, 2, ..., N}) to the stimulus θ. We shall refer to this population of neurons hereafter as the input layer.

Assuming the firing rates of the neurons are sufficiently high, we model the probability distribution of the neuronal activities according to a multivariate gaussian distribution,

$$ P(\{r_i\} \mid \theta) = \frac{1}{\sqrt{(2\pi)^N \det(C)}} \exp\left( -\frac{1}{2} \sum_{ij} C^{-1}_{ij} \, (r_i - f_i(\theta))(r_j - f_j(\theta)) \right), \qquad (1.1) $$
where f_i(θ) is the average firing rate of neuron i, and C_ij(θ) is the covariance of the responses of neurons i and j, for a given stimulus θ. The average firing rate of each neuron is modeled by a smooth stereotypical tuning curve with a single peak at the neuron's preferred direction (PD), denoted here by φ_i, as shown in Figure 1a for a neuron with a PD of 0 deg. The tuning curve of a neuron with a PD of φ is obtained by a horizontal translation of this tuning curve by φ. In our numerical simulations, we use the following tuning curve,

$$ f_i(\theta) = f(\theta - \phi_i), \qquad (1.2) $$

$$ f(\theta) = (f_{max} - f_{ref}) \exp\left( \frac{\cos(\theta) - 1}{\sigma_f^2} \right) + f_{ref}, \qquad (1.3) $$
where σ_f is the tuning width, f_max is the peak response at the neuron's PD, and f_ref determines the modulation amplitude of the tuning curve. We assume that the PDs of the neurons are evenly spaced on the ring, φ_k = −π(N−1)/N + (2π/N)k.
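As a concrete illustration, the following minimal Python sketch (our own, not code from the letter) evaluates this tuning curve with the parameters used later in Figure 1 (f_max = 80 sec⁻¹, f_ref = 40 sec⁻¹, σ_f² = 2); the function names are our choice.

    import numpy as np

    def tuning_curve(theta, f_max=80.0, f_ref=40.0, sigma_f2=2.0):
        """Mean firing rate (sec^-1) of a neuron with PD 0, equation 1.3."""
        return (f_max - f_ref) * np.exp((np.cos(theta) - 1.0) / sigma_f2) + f_ref

    def preferred_directions(N):
        """Evenly spaced PDs on the ring: phi_k = -pi(N-1)/N + (2 pi/N) k."""
        return -np.pi * (N - 1) / N + 2.0 * np.pi * np.arange(N) / N

    phi = preferred_directions(500)
    rates = tuning_curve(0.0 - phi)   # population profile for stimulus theta = 0, eq. 1.2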
The covariance matrix, C, on the ring is, in general, a function of three angles, C_ij(θ) = C(φ_i, φ_j, θ). Assuming isotropy of the system implies that there is no preferred spatial direction. Hence, C should obey C(φ_i, φ_j, θ) = C(φ_i − ψ, φ_j − ψ, θ − ψ) for arbitrary ψ. Thus, C is a function of only two angles:

$$ C_{ij}(\theta) = C(\phi_i - \theta, \phi_j - \theta). \qquad (1.4) $$
Figure 1: The statistics of neurons in the input layer in model 1. (a) The tuning curve of a neuron i with PD of φ_i = 0 degree. The average firing rate of the neuron, f_i(θ), is plotted as a function of the stimulus, θ. The tuning curve of a neuron with a PD of φ can be obtained by a horizontal shift of the above curve by φ. (b) The tuning of the variance. The variance, b_i(θ), of neuron i with PD of φ_i = 0 degree is plotted as a function of the stimulus θ. (c) The decay of the covariance, C_ij, between neuron i with PD of 0 degree and neuron j with PD of φ_j is plotted as a function of φ_j. Note that in model 1, C_ij, for i ≠ j, is independent of θ. The above figures present the rate statistics, which is equivalent to the spike count statistics in a 1 sec time interval. In order to obtain the spike count statistics in different time intervals, these tuning curves should be scaled by the relevant time window. Unless stated otherwise, the parameters that were used for the numerical simulations of the input layer are: for the tuning curve f_max = 80 sec⁻¹, f_ref = 40 sec⁻¹, σ_f = √2; for the variance b_max = 80 sec⁻¹, b_ref = 40 sec⁻¹, σ_b = √(1/2); for the cross-correlations c = 0.3, ρ = 1. Note that these parameters are given in rates, that is, in numbers per 1 sec. In all the simulations, we used a time window of T = 0.25 sec by which the averages and the correlations were scaled linearly.
A particularly simple example is a system in which only the variances of the neurons depend on the external stimulus, while the cross-correlations do not depend on the stimulus but depend on the angular coordinates of the neurons. We shall refer to this model hereafter as model 1. We assume that correlations are stronger for pairs of neurons with closer PDs, a property that has been observed experimentally in some systems (Zohary et al., 1994; Lee et al., 1998; van Kan, Scobey, & Gabor, 1985; Mastronarde, 1983). For concreteness, we have used the following form of C for model 1,

$$ C_{ij} = \delta_{ij} b_i(\theta) + c\, b_{ref} (1 - \delta_{ij}) \exp\left( -\frac{|\phi_i - \phi_j|}{\rho} \right), \qquad \text{model 1,} \qquad (1.5) $$

$$ b_i(\theta) = b(\phi_i - \theta) = (b_{max} - b_{ref}) \exp\left( \frac{\cos(\theta - \phi_i) - 1}{\sigma_b^2} \right) + b_{ref}, \qquad (1.6) $$
where δ_ij is a Kronecker delta. Here the variance, b_i(θ), has a unimodal tuning with regard to θ with a peak at the neuron's PD, as shown in Figure 1b. The tuning of the variance is determined by three parameters: σ_b is the tuning width, b_max is the peak variance, and b_ref determines the modulation of the variance tuning curve. This stimulus dependence of the neuronal variances is qualitatively similar to the monotonic relationship between variance and mean observed in many neurons. The dependence of the cross-correlations on the distance between neurons (i.e., the difference in their PDs) is modeled as an exponential decay with correlation length ρ and correlation strength c. Here |x| denotes angular distance and ranges from 0 to π. Figure 1c shows the decay of the correlation between the neurons as a function of the distance between them.

We assume that the correlations are long distance, that is, ρ = O(1) (note the maximum distance is π), and ρ remains fixed when we change the size of the population. For example, for model 1 with the parameters c = 0.3, ρ = 1, b_max = 80, b_ref = 40, and σ_b = √(1/2) for a time interval of T = 1 sec (see Figure 1), the average correlation coefficient (CC) across pairs of neurons with PD differences of less than 90 degrees is 0.12, while for pairs of neurons with PD differences of more than 90 degrees, the average CC is 0.025. For comparison, Zohary et al. (1994) report an average CC of 0.18 for neurons with close PDs and 0.04 for neurons with PD differences of more than 90 degrees. Although these correlation values may seem weak, their effect on the population code is dramatic, as shown below.
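The following sketch (our own illustration) builds the model 1 mean vector and covariance matrix and checks the average correlation coefficients quoted above for T = 1 sec; the printed values should come out near 0.12 and 0.025.

    import numpy as np

    def pop_stats_model1(theta, N=500, f_max=80.0, f_ref=40.0, sigma_f2=2.0,
                         b_max=80.0, b_ref=40.0, sigma_b2=0.5, c=0.3, rho=1.0):
        """Mean rates f(theta) and covariance C(theta) of model 1 (eqs. 1.2-1.6)."""
        phi = -np.pi * (N - 1) / N + 2.0 * np.pi * np.arange(N) / N
        f = (f_max - f_ref) * np.exp((np.cos(theta - phi) - 1) / sigma_f2) + f_ref
        b = (b_max - b_ref) * np.exp((np.cos(theta - phi) - 1) / sigma_b2) + b_ref
        d = np.abs(phi[:, None] - phi[None, :])
        d = np.minimum(d, 2.0 * np.pi - d)          # angular distance, in [0, pi]
        C = c * b_ref * np.exp(-d / rho)            # smooth cross-covariances, eq. 1.5
        np.fill_diagonal(C, b)                      # stimulus-tuned variances
        return phi, f, C

    phi, f, C = pop_stats_model1(theta=0.0)
    cc = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))       # correlation coefficients
    d = np.abs(phi[:, None] - phi[None, :]); d = np.minimum(d, 2 * np.pi - d)
    off = ~np.eye(len(phi), dtype=bool)
    print(cc[off & (d < np.pi / 2)].mean())   # close PDs: approximately 0.12
    print(cc[off & (d > np.pi / 2)].mean())   # far PDs: approximately 0.025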
Another simple example is the multiplicative model (Abbott & Dayan, 1999), which we shall refer to as model 2. Here, the covariance matrix has the form

$$ C_{ij}(\theta) = \sqrt{b_i(\theta) b_j(\theta)} \left( \delta_{ij} + c (1 - \delta_{ij}) \exp\left( -\frac{|\phi_i - \phi_j|}{\rho} \right) \right), \qquad \text{model 2.} \qquad (1.7) $$
The neuronal variances are given by b_i(θ), as before. The cross-correlations consist of a stimulus-dependent multiplicative component, √(b_i(θ) b_j(θ)), multiplied by the distance-dependent function. The two models for the correlation matrix are introduced as two examples that yield qualitatively similar results (see below). As will be clear from the analysis below, the essential features of the correlation matrices are the long range of the cross-correlations and the discontinuity of the correlation matrix on the diagonal resulting from the variance term.
1.2 Estimation Errors and Fisher Information. Throughout this letter, we will evaluate the efficiency of an estimator θ̂ of the stimulus angle θ by the inverse of the mean square of its estimation error, ⟨(δθ̂)²⟩⁻¹, where δθ̂ = θ̂ − ⟨θ̂⟩ and ⟨...⟩ denotes averaging with regard to the distribution of the neuronal responses for a given stimulus θ. The Fisher information (FI) (Thomas & Cover, 1991; Kay, 1993),

$$ J = \left\langle \left( \frac{\partial \log P(\mathbf{r} \mid \theta)}{\partial \theta} \right)^2 \right\rangle, \qquad (1.8) $$

provides an upper bound on the efficiency of any unbiased estimator of θ. The FI of the gaussian ensemble, equation 1.1, can be written as a sum of the FI of the mean responses, denoted as J_mean, and that of the tuning of the covariance matrix, J_cov (see, e.g., Kay, 1993),

$$ J = J_{mean} + J_{cov}, \qquad (1.9) $$

$$ J_{mean} = \mathbf{f}'^{\,t} C^{-1} \mathbf{f}', \qquad (1.10) $$

$$ J_{cov} = \frac{1}{2} \mathrm{Trace}\{ C' C^{-1} C' C^{-1} \}. \qquad (1.11) $$
Here f′ and C′ are derivatives with regard to θ of the mean response vector and the covariance matrix, respectively. In the case of an uncorrelated population, the FI is given by J = J_mean + J_var, where

$$ J_{mean} = N \int_{-\pi}^{\pi} \frac{d\phi}{2\pi} \, \frac{f'^{\,2}(\phi)}{b(\phi)} \qquad (1.12) $$

and

$$ J_{var} = N \int_{-\pi}^{\pi} \frac{d\phi}{2\pi} \, \frac{b'^{\,2}(\phi)}{2 b^2(\phi)}. \qquad (1.13) $$
Here and throughout the letter, we assume that the system size, N, is large; hence, spatial summations can be converted to integrals over angles.
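As a numerical illustration of equations 1.9 through 1.11, the sketch below (our own, assuming the pop_stats_model1 helper defined earlier) evaluates J_mean and J_cov for model 1 by finite differences; means and covariances are scaled by the T = 0.25 sec window, and the result is in rad⁻² (the figures use deg⁻²).

    import numpy as np
    # Assumes pop_stats_model1 from the earlier sketch.
    def fisher_info_model1(theta=0.0, N=500, T=0.25, h=1e-5):
        _, f0, C0 = pop_stats_model1(theta, N)
        _, fp, Cp = pop_stats_model1(theta + h, N)
        _, fm, Cm = pop_stats_model1(theta - h, N)
        f_prime = T * (fp - fm) / (2 * h)        # f'(theta), spike-count units
        C_prime = T * (Cp - Cm) / (2 * h)        # C'(theta)
        C_inv = np.linalg.inv(T * C0)
        J_mean = f_prime @ C_inv @ f_prime                            # eq. 1.10
        J_cov = 0.5 * np.trace(C_prime @ C_inv @ C_prime @ C_inv)     # eq. 1.11
        return J_mean, J_cov

    for N in (250, 500, 1000, 2000):
        print(N, fisher_info_model1(N=N))   # J_mean saturates; J_cov grows with N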
Figure 2: FI of a correlated population of neurons. The FI, J, of the input layer (model 1) is plotted as a function of the number of cells, N, in the pool by the solid line. The contributions of the different terms, J_mean and J_cov, are plotted by the dash-dot and the dotted lines, respectively. Notice that J_mean saturates for large pools, while J_cov grows linearly with the size of the pool, N. The Fisher information, J, J_mean, and J_cov, was calculated using equations 1.9 through 1.11 and the parameters appearing in Figure 1 for model 1. Qualitatively similar results are obtained in model 2 as well.
Figure 2 shows the values of J, J_mean, and J_cov for model 1 as a function of N. Note that while the FI of the covariance matrix, J_cov, grows linearly with the population size, N, the FI of the average firing rates, J_mean, saturates to a finite value, in contrast to the FI of an uncorrelated population, equation 1.12. The saturation of J_mean in the large N limit is not specific to this model but is characteristic of systems with long-range positive correlations, as previously shown (Sompolinsky et al., 2001). Thus, in the present system, most of the information in the neuronal responses comes from the stimulus dependence of the covariance matrix.
In this letter, we study readout models for correlated populations. As we will show below, a linear readout is incapable of extracting the information embedded in the second-order statistics; its performance is bounded by J_mean. This raises the question of whether there is a nonlinear readout scheme whose performance is close to the total J, and if so, how complex such a readout needs to be. Here we present a relatively simple bilinear readout that extracts a large fraction of the information embedded in the covariance matrix, providing a plausible model for readout of population codes with stimulus-dependent correlations.
1.3 Linear and Recurrent Network Readouts. A linear readout is an estimator of the form Σ_i r_i w_i, where {w_i} is a set of readout weights. Here we shall consider the linear readout in the context of two tasks: estimation and discrimination of an angular variable.
1.3.1 Estimation. A well-known example of a linear estimator of θ is the linear population vector (LPV), first proposed by Georgopoulos et al. (1986) for the readout of arm-reaching-movement directions in two dimensions from the responses of motor cortical cells. The LPV can be written as

$$ \hat{z} = \sum_{i=1}^{N} r_i \, e^{i\phi_i}, \qquad (1.14) $$
where we have used a complex representation of two-dimensional unit vectors. In this notation, ẑ is an estimator of the unit vector z = e^{iθ}, which represents a movement in the direction of the angle θ. Using the rotational symmetry of the system, one can show that the estimator ẑ/|⟨ẑ⟩| is an unbiased estimator of z = e^{iθ} and that it is the optimal linear readout in terms of squared error averaged over all angles θ, with a fixed set of weights, w (see appendix A.1). The efficiency of any estimator that respects the symmetry of the system (i.e., its parameters are periodic functions of φ_i) is independent of θ. Assuming that the fluctuations in the estimation of θ are small, one can write the average estimation error of such an estimator as a signal-to-noise relation for θ = 0 degree (see appendix A.1),

$$ \langle (\delta\hat\theta)^2 \rangle^{-1} = \left( \frac{signal}{noise} \right)^2, \qquad (1.15) $$

$$ signal = \langle \hat{x} \rangle, \qquad (1.16) $$

$$ noise^2 = \langle (\delta\hat{y})^2 \rangle, \qquad (1.17) $$
where we have used the notation ẑ = x̂ + iŷ. For the LPV, these signal and noise terms are

$$ \langle \hat{x} \rangle = \sum_{i=1}^{N} f(\phi_i) \cos(\phi_i), \qquad (1.18) $$

$$ \langle (\delta\hat{y})^2 \rangle = \sum_{i,j=1}^{N} C_{ij}(\theta) \sin(\phi_i) \sin(\phi_j), \qquad (1.19) $$

where C_ij(θ) is the covariance matrix given by equations 1.5 and 1.6 for model 1. In the case of an uncorrelated population of neurons, C(φ_i, φ_j) = b(φ_i) δ_ij; hence, both equations 1.16 and 1.17 grow linearly with N, yielding
(see also Seung & Sompolinsky, 1993)

$$ \langle (\delta\hat\theta)^2 \rangle^{-1} = N \, \frac{2 |f_1|^2}{b_0 - b_2}, \qquad (1.20) $$

where f_n = ∫ (dφ/2π) f(φ) e^{inφ} and similarly b_n. This result should be compared to the upper bound provided by J = J_mean + J_var of equations 1.12 and 1.13. Note that the performance of the LPV is bounded by J_mean (see section 1.3.2), reflecting the fact that the linear readout extracts information only from the mean responses. In fact, the slope of the LPV performance versus N, equation 1.20, is smaller than that of J_mean, equation 1.12, since, due to the smoothness of the LPV weights, it is sensitive only to the low Fourier components of f and b: the first Fourier component of f, and the zero and second Fourier components of b, as can be seen from equation 1.20.
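A minimal simulation of the LPV, assuming the pop_stats_model1 helper above (the sampling scheme and the random seed are our own), illustrates the saturation directly:

    import numpy as np
    rng = np.random.default_rng(0)

    def lpv_efficiency(theta=0.0, N=500, T=0.25, trials=4000):
        """Reciprocal mean squared error of the LPV estimator, eq. 1.14 (rad^-2)."""
        phi, f, C = pop_stats_model1(theta, N)
        r = rng.multivariate_normal(T * f, T * C, size=trials)   # gaussian responses, eq. 1.1
        z = r @ np.exp(1j * phi)                                 # linear population vector
        err = np.angle(z * np.exp(-1j * theta))                  # estimation error of theta
        return 1.0 / np.mean(err ** 2)

    for N in (250, 500, 1000):
        print(N, lpv_efficiency(N=N))   # with c > 0 this saturates as N grows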
In the uncorrelated case, the efficiency of the LPV grows linearly with N, although with a suboptimal slope. This is not the case in the presence of the correlations. Figure 3 shows the effect of the correlations on the accuracy of the LPV readout (open circles) for model 1 for several values of the parameter c, equation 1.5. As can be seen from the figure for c = 0, the efficiency of the LPV grows linearly with N. However, in the case of c > 0, there exists some N_sat such that for N ≳ N_sat the efficiency saturates to a size-independent limit. Note that because the tuning curves of the mean rates for the chosen parameters are broad, they contain little power in high Fourier modes. Consequently, the LPV performs close to the bound of J_mean.
1.3.2 Discrimination. Another interesting readout problem is that of discriminating between two close angles. We model a two-interval discrimination task by a system that is given two sets of neuronal activities r^(1), r^(2), generated by two proximal stimuli θ and θ + δθ. From these responses, the system must infer which stimulus generated which activity. For cases where the FI is large, the maximum likelihood (ML) discrimination for two close angles yields a probability of error given by H(d′/√2), where H(x) = (2π)^{−1/2} ∫_x^∞ e^{−t²/2} dt, and the discriminability d′ equals

$$ d' = |\delta\theta| \sqrt{J(\theta)}, \qquad (1.21) $$

where J(θ) is the FI of the system (Seung & Sompolinsky, 1993). A linear readout can be used to perform the discrimination task. We define the discrimination rule to be according to the sign of q = w(r^(1) − r^(2)), where a plus sign is interpreted as θ + δθ having been presented first and a minus sign as θ having been presented first. The set of weights w that optimizes the discrimination for close angles, δθ ≪ 1, is given by w = C^{−1} f′. Note that these weights depend on the angle θ.
Figure 3: The effect of correlations on the efficiency of the LPV and RN estimators. The efficiency of the readouts is shown in terms of the reciprocal squared estimation error. The efficiency of the LPV (circles) and the RN (boxes) is plotted as a function of the number of cells in the pool, for different values of correlation strength, from the bottom c = 0.3, c = 0.03, and c = 0; all other parameters that were used are as defined in Figure 1. The efficiency was calculated numerically by simulating the neuronal responses and averaging the LPV/RN squared estimation errors over 4000 trials, for model 1. The contribution of the average firing rates to the FI of the system, J_mean, is given in the solid lines, from bottom to top for c = 0.3, c = 0.03, and c = 0. J_mean was calculated using equation 1.10.

The probability of error of the optimal linear discriminator is given by H(d′_L/√2), where (see section A.2)

$$ d'_L = |\delta\theta| \sqrt{J_{mean}(\theta)}, \qquad (1.22) $$
which is inferior to the ML performance, equation 1.21. Note that the linear discriminator is the optimal local linear estimator (defined by a linear readout with weights that depend on the neighborhood of angles to be estimated). Thus, J_mean is an upper bound on the performance of any linear estimator of θ.
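For illustration, a short sketch of this optimal linear discriminator (again assuming pop_stats_model1, with a finite difference standing in for f′; the function name is ours):

    import numpy as np
    # Assumes pop_stats_model1 from the earlier sketch.
    def linear_dprime(theta=0.0, dtheta=np.deg2rad(1.0), N=500, T=0.25, h=1e-5):
        """Discriminability d'_L of the optimal linear discriminator, eq. 1.22."""
        _, fp, _ = pop_stats_model1(theta + h, N)
        _, fm, _ = pop_stats_model1(theta - h, N)
        _, _, C = pop_stats_model1(theta, N)
        f_prime = T * (fp - fm) / (2 * h)
        w = np.linalg.solve(T * C, f_prime)   # optimal weights w = C^-1 f'
        J_mean = f_prime @ w
        return abs(dtheta) * np.sqrt(J_mean)

    print(linear_dprime())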
1.3.3 Recurrent Networks. Recently a recurrent network (RN) model has been proposed (Deneve, Latham, & Pouget, 1999) as an improvement on the performance of the LPV in estimation tasks. Essentially, the RN readout is superior to the LPV in that it is sensitive to the finer structure of the neuronal tuning curves. The efficiency of the RN readout is presented in
Figure 3 (open boxes) for different values of c, as a function of N. In this example, the RN performance is only slightly better than that of the LPV (circles). This is due to the fact that for smooth tuning curves, the LPV already captures almost all the information in the mean responses. The numerical results are in agreement with the claim (Deneve et al., 1999) that the efficiency of the optimal RN readout in the limit of small estimation errors approaches 1/⟨(δθ̂_RN)²⟩ = J_mean. Thus, although the performance of the RN readout is in general superior to that of the LPV, it too saturates to a size-independent limit in the presence of long-range positive correlations. In summary, both the LPV and the RN schemes read out only the information that is embedded in the first-order statistics of the data, that is, the tuning curves of the neurons, and hence both are bounded by J_mean.
2 The Bilinear Readout
2.1 Covariance Fisher Information. In order to understand how the information in the second-order statistics can be extracted, it is important to gain insight into the properties of the FI, equations 1.9 through 1.11. As observed above, the first-order statistics contain only a finite amount of information. The reason for the saturation of J_mean is that the tuning curves for the average firing rates of the neurons are relatively smooth functions of the stimulus angle; hence, most of the embedded information resides in the low Fourier modes of the network responses. On the other hand, since the cross-correlations vary smoothly with angular distance, these slowly varying modes contain noise whose standard deviation grows linearly with N, yielding a signal-to-noise ratio of order 1. This raises the question of why the second-order statistics of the same smooth system yield information that grows linearly with N. Indeed, our study shows that most of the information in J_cov resides in the stimulus dependence of the variances, not the cross-correlations. More precisely, the covariance matrix of a system with smoothly varying cross-correlations (such as equations 1.5 and 1.7) has a discontinuity on the diagonal resulting from the fact that the variance, b_i(θ), is larger than the zero-distance limit of the cross-correlations, which is c b_ref in model 1 (see Figure 1) and c b_i(θ) in model 2. It is useful to write the matrix C as

$$ C_{ij} = \delta_{ij} D_i(\theta) + S_{ij}, \qquad (2.1) $$

with S the continuous (smooth) part of C and D the discontinuous part of C on the diagonal. In the models described above, we obtain

$$ D_i(\theta) = b(\phi_i - \theta) - c\, b_{ref}, \qquad \text{model 1,} \qquad (2.2) $$

$$ S_{ij} = c\, b_{ref} \exp\left( -\frac{|\phi_i - \phi_j|}{\rho} \right), \qquad \text{model 1,} \qquad (2.3) $$
Figure 4: The difference between the contribution of the correlations to the total FI of the system, J_cov, and the FI of an uncorrelated gaussian population of neurons with zero means and variances D, J_D, is shown as a function of the size of the system, N, for model 1 (left) and for model 2 (right). Note that the difference |J_cov − J_D| is a sublinear function of N. For model 2, the tuning curves of the mean and variance are as defined in Figure 1, with ρ = 1 and c = 0.3 (see equation 1.7).
for model 1, and

$$ D_i(\theta) = b(\phi_i - \theta)(1 - c), \qquad \text{model 2,} \qquad (2.4) $$

$$ S_{ij} = c \sqrt{b_i(\theta) b_j(\theta)} \exp\left( -\frac{|\phi_i - \phi_j|}{\rho} \right), \qquad \text{model 2,} \qquad (2.5) $$

for model 2.
As will be explained below, the discontinuity in the covariance matrix along the diagonal plays an important role in the system's behavior. In fact, for both models, we find numerically that for large N, J_cov can be approximated by

$$ J_{cov} \approx J_D \equiv \frac{1}{2} \sum_{i=1}^{N} \left( \frac{D_i'(\theta)}{D_i(\theta)} \right)^2 \approx N \int \frac{d\phi}{2\pi} \, \frac{D'^{\,2}(\phi)}{2 D^2(\phi)}. \qquad (2.6) $$
Equation 2.6 is a central result of this article. Figure 4 shows the difference between J_cov and J_D as a function of N for models 1 (left) and 2 (right). In both cases, as can be seen from the figure, the difference |J_cov − J_D| is a sublinear function of N for large N. However, from equation 2.6, we find that J_D scales linearly with N; hence, to leading order in N, one obtains the result of equation 2.6. In section 2.4, we present an analytical calculation that bounds J_cov from below by J_D to leading order in N. This result, equation 2.6, suggests that the reason the second-order statistics of the correlated system contain an extensive amount of information is the discontinuous nature of the variance. Because of this discontinuity, the information in the variance resides in all modes, including the modes with high Fourier numbers, and is therefore immune to the correlated noise, which resides mostly in the slowly varying modes.

Comparing equation 2.6 with equation 1.13 indicates that the information in the second-order statistics of the correlated system is equivalent to that of an uncorrelated population of neurons with zero mean tuning but with stimulus-dependent variances equal to D_i(θ). This important insight motivates our proposal for a nonlinear readout in the next section.
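A short numerical check of equation 2.6 for model 1 (our sketch, assuming pop_stats_model1 from above); note that D′/D, and hence J_D, is invariant to the overall time window:

    import numpy as np
    # Assumes pop_stats_model1 from the earlier sketch; model 1 parameters.
    def J_D_model1(theta=0.0, N=500, h=1e-5, c=0.3, b_ref=40.0):
        """J_D of equation 2.6, with D_i(theta) = b_i(theta) - c*b_ref (eq. 2.2)."""
        _, _, C0 = pop_stats_model1(theta, N)
        _, _, Cp = pop_stats_model1(theta + h, N)
        _, _, Cm = pop_stats_model1(theta - h, N)
        D = np.diag(C0) - c * b_ref
        D_prime = (np.diag(Cp) - np.diag(Cm)) / (2 * h)
        return 0.5 * np.sum((D_prime / D) ** 2)

    print(J_D_model1(N=500), J_D_model1(N=1000))   # grows linearly with N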
2.2 The Bilinear Readout Mechanism. Based on the above results, we propose a two-layer readout network (see Figure 5) for extracting information out of the second-order statistics. The first layer of the readout system receives as input the activities of neurons in the input layer, {r_i}, and performs a linear filtering of the inputs followed by a quadratic nonlinearity. Similar to the input-layer neurons, the output response of a neuron in the first readout layer is a real scalar value. Denoting the output of the ith neuron in this layer by R_i, we define R_i according to the deterministic mapping,

$$ R_i = \left( \sum_{j=1}^{N} W_{ij} r_j \right)^2, \qquad (2.7) $$

where {W_ij} is a set of linear filtering weights between the presynaptic neuron j in the input layer and the postsynaptic neuron i in the first processing layer. The form of the N × N filter W will be discussed below. The output layer of the readout mechanism receives, via feedforward connections, the responses of neurons in the first layer (see Figure 5) and calculates their population vector, termed hereafter the nonlinear population vector (NLPV), Ẑ = X̂ + iŶ:

$$ \hat{Z} = \sum_{i=1}^{N} R_i \, e^{i\phi_i}. \qquad (2.8) $$

Note that we associate with the ith neuron of the first layer an angle, φ_i, which is the same as the PD of the ith input neuron. The rationale for this will be evident when we specify W below. Using the symmetry of the correlation matrix, C_ij(θ), equation 1.4, and constraining the linear filter, W, to be of the form W_ij = W(φ_i − φ_j), one obtains

$$ \langle \hat{Z} \rangle = e^{i\theta} \left( \sum_{ijk} \left( C(\phi_j, \phi_k) + f(\phi_j) f(\phi_k) \right) W(\phi_j - \phi_i) W(\phi_k - \phi_i) \, e^{i\phi_i} \right). \qquad (2.9) $$
Figure 5: The architecture of the NLPV readout model. The input to the first layer of the readout mechanism is a linear combination of the activities of neurons in the input layer. The first layer responds via a nonlinear transfer function, and its output is transmitted in a feedforward manner to the second layer, which calculates its population vector. At the bottom left appears a plot of the NLPV weight matrix, W_ij, as a function of (φ_i − φ_j), for N = 300 and p = 6.
Hence, the estimator Ẑ/|⟨Ẑ⟩| is an unbiased estimator of θ. As before, we calculate its variance for θ = 0 degree by ⟨(δθ̂)²⟩ = ⟨(δŶ)²⟩/⟨X̂⟩² (see appendix A.1), the results of which are shown below.
2.3 The Efficiency of the Bilinear Readout. In order to complete the definition of the readout, one has to choose the form of the linear filter W. Motivated by the observation that most of the signal lies in the discontinuous part of the correlation matrix on the diagonal and that the noise mainly resides in the slowly varying modes of the network, that is, in the low-order Fourier modes, we choose

$$ W_{kj} = \delta_{kj} - \frac{1}{N} \sum_{n=-p}^{p} e^{in(\phi_k - \phi_j)} = \frac{1}{N} \sum_{|n| > p} e^{in(\phi_k - \phi_j)}. \qquad (2.10) $$
With this choice, the matrix W simply filters out the first (2p + 1) low-order Fourier modes of the inputs, {r_i}. The parameter p determines how many low-order Fourier modes are filtered out by W. Substituting equations 2.8 and 2.10 into equation 1.16, the signal term reduces to

$$ \langle \hat{X} \rangle = 2N \, \mathrm{Real}\left\{ \sum_{n > p}^{N} \left( \tilde{C}_{n+1,n} + f_{n+1} f_n^{*} \right) \right\}, \qquad (2.11) $$

where C̃_mn is the double Fourier transform of the correlation matrix,

$$ \tilde{C}_{m,n} = \frac{1}{N^2} \sum_{kj} e^{i m \phi_k - i n \phi_j} \, C_{kj}. \qquad (2.12) $$
In the large N limit, the Fourier transform is given by (see equation 2.1)

$$ \tilde{C}_{m,n} = \frac{1}{N} D_{m-n} + S_{m,n}, \qquad (2.13) $$

$$ D_m = \int \frac{d\phi}{2\pi} \, D(\phi) \, e^{i m \phi}, \qquad (2.14) $$

$$ S_{m,n} = \int \frac{d\phi \, d\psi}{(2\pi)^2} \, S(\phi, \psi) \, e^{i m \phi - i n \psi}. \qquad (2.15) $$
Both S and f are relatively smooth functions of the angles; hence, S_{m,n} and f_n decay fast with increasing m and n. Therefore, for sufficiently large p, one can neglect the contribution of S and f to the signal, equation 2.11. Finally, noting that C̃_{n+1,n} = (1/N) D_1 + S_{n+1,n}, we obtain

$$ \langle \hat{X} \rangle = \sum_{|n| > p}^{N} D_1 = [N - (2p + 1)] D_1 \approx N \int \frac{d\phi}{2\pi} \, D(\phi) \, e^{i\phi}, \qquad (2.16) $$

where D is the discontinuous piece of the covariance matrix, equations 2.2 and 2.4. It is important to note the limit in which equation 2.16 is valid. First, we have assumed (in the last step) that p ≪ N, such that terms of order p are negligible compared to terms of order N. On the other hand, we have assumed that p is sufficiently large that the Fourier transforms of S and f can be neglected relative to the contribution of the terms containing D. As will be shown below, this latter condition is equivalent to the requirement
Figure 6: Nonlinear PV efficiency as a function of the size of the pool for different values of p. Efficiency of the NLPV readout in terms of the reciprocal average squared estimation error, shown in solid lines for p = 1, 3, 6 from bottom to top, for model 1 (left) and model 2 (right). The dashed line is the asymptotic value of the NLPV efficiency, equation 2.19.
N ≪ N_sat(p), where N_sat(p) is a strongly growing function of p. The form of this function depends on the asymptotic decay of the Fourier transform of S. Thus, the range of validity of equation 2.16 is p ≪ N ≪ N_sat(p). A similar but more tedious analysis of the noise term yields (see appendix A.3), for p ≪ N ≪ N_sat(p),
$$ \langle (\delta Y)^2 \rangle \approx N \sum_m \left( D_m D_{-m} - D_m D_{2-m} \right) = \frac{N}{2} \int \frac{d\phi}{2\pi} \, B(\phi) \left( 1 - \cos 2\phi \right), \qquad (2.17) $$

where

$$ B(\phi) = 2 D^2(\phi). \qquad (2.18) $$
Combining equations 2.16 and 2.17, we obtain

$$ \langle (\delta\hat\theta)^2 \rangle^{-1} = N \, \frac{2 |D_1|^2}{B_0 - B_2} \equiv N J_0, \qquad p \ll N \ll N_{sat}(p). \qquad (2.19) $$
This result shows that for sufficiently large p such that N ≪ N_sat(p), the squared estimation error will scale like 1/N. Figure 6 shows the efficiency of the NLPV readout as a function of the pooling size N, for models 1 (left) and 2 (right) and several values of p. The dashed line is the asymptotic limit of the performance of the NLPV, given by equation 2.19. As indicated by the results, initially the estimation efficiency of the NLPV grows linearly with N. However, for any given p, there exists some scale of N, N_sat(p), above which the efficiency of the NLPV begins to saturate. Indeed, we show in the appendix that for any fixed p, if N is made large enough, the standard deviation of the noise, √⟨(δŶ)²⟩, eventually grows like N, saturating the signal-to-noise ratio. Analyzing this saturation in detail for model 1, we derive the following expression for the saturation size,
$$ N_{sat}(p) = \frac{B_0 - B_2}{\frac{4}{3} \left( c\, b_{ref} \, \frac{\pi}{\rho} \left( 1 - e^{-2\pi/\rho} \right) \right)^2} \, p^3, \qquad \text{model 1,} \qquad (2.20) $$
where B(φ) is given in equation 2.18. The fact that the saturation value grows fast with p implies that we can use the bilinear readout to obtain an accurate estimate of θ even with moderate values of p, as shown in Figure 6. Figure 7 shows N_sat as a function of p in model 1. The exact expression for N_sat (equation A.23 in the appendix) is given by the solid line. The dashed line shows the approximate analytical calculation, equation 2.20.

Figure 7: Typical network size for saturation of the NLPV efficiency, N_sat, as a function of p, for model 1. The exact expression, N_sat = N I_D/I_S (equation A.22, calculated for N = 2000), is presented by the solid line as a function of p. The approximated result, equation A.19, is shown by the dashed line.

Finally, when N is large compared with N_sat, the saturated value of the efficiency of the NLPV is given by

$$ \lim_{N \to \infty} \langle (\delta\hat\theta)^2 \rangle^{-1} = J_0 \, N_{sat}, \qquad (2.21) $$
where J_0 is defined in equation 2.19. Thus, N_sat determines both the linear regime of the efficiency of the NLPV readout in N and the asymptotic efficiency of this readout. Qualitatively similar results are obtained for model 2.

Figure 8: Typical result of simulating the activities of neurons in the model during a single trial of a presentation of θ = 0 degree. (a) Activities of a population of N = 500 neurons in the input layer during a single trial of presentation of θ = 0 degree, model 1. The neuronal activities are plotted as a function of their PDs by the thin line. The thick smooth line is the average across-trial activity of the different neurons. (b) The activities of neurons in layer 1 of the readout are shown as a function of their PDs by the thin line. The input to the system in this single trial is given in a. The average activity of the neurons is shown by the thick line. For this graph, p = 10 was used.
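The complete NLPV pipeline of equations 2.7, 2.8, and 2.10 can be sketched compactly; because the PDs are evenly spaced, applying W amounts to zeroing the 2p + 1 lowest spatial Fourier modes of the population profile. The code below is our illustration (helper names and the random seed are ours), again assuming pop_stats_model1 from above.

    import numpy as np
    rng = np.random.default_rng(1)

    def highpass(r, p):
        """Apply W of eq. 2.10: zero the 2p+1 lowest Fourier modes along the ring."""
        rhat = np.fft.fft(r, axis=-1)
        N = r.shape[-1]
        rhat[..., :p + 1] = 0.0        # modes n = 0, 1, ..., p
        rhat[..., N - p:] = 0.0        # modes n = -1, ..., -p
        return np.fft.ifft(rhat, axis=-1).real

    def nlpv_efficiency(theta=0.0, N=500, p=10, T=0.25, trials=4000):
        phi, f, C = pop_stats_model1(theta, N)
        r = rng.multivariate_normal(T * f, T * C, size=trials)
        R = highpass(r, p) ** 2                  # first readout layer, eq. 2.7
        Z = R @ np.exp(1j * phi)                 # nonlinear population vector, eq. 2.8
        err = np.angle(Z * np.exp(-1j * theta))
        return 1.0 / np.mean(err ** 2)           # efficiency, rad^-2

    print(nlpv_efficiency())   # grows with N for p << N << N_sat(p)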
2.4 Statistics of the Neuronal Responses in Layer 1 of the Readout. To obtain more insight into the performance of the NLPV, we compared the statistics of the responses of neurons in the input layer, {r_i}, and those in the first readout layer, {R_i}. An example of the two responses is displayed in Figure 8 for a network of 500 neurons during a presentation of stimulus θ = 0 degree, as a function of their PDs. Figure 8a shows the activities in the input layer (thin line) together with their trial-averaged activities (thick line). The rapid jittering of the neuronal responses is caused by fluctuations in the high-order Fourier modes of the system. However, it is the fluctuations in the low-order Fourier modes of the system that cause the population activity profile to shift left (as in the figure) or right from the average profile (i.e., the population tuning curve), and thus cause substantial errors in the estimation of θ. In the specific example shown here, the input layer activities (thin line) look as though they were shifted from the average profile of activities (thick line) by 30 degrees, causing an error of about 30 degrees in the LPV estimation of θ.
Figure 9: (a) Covariance matrix of the stochastic network. The covariance between the activities of neurons i and j is plotted on the vertical axis as a function of φ_i and φ_j. The covariance is calculated with respect to the distribution of the r_i, given θ = 0 degree, in model 1. Note that the covariance in the activities of different neurons (off-diagonal elements) is independent of θ and decays slowly with the functional distance between the neurons. The variance (diagonal elements) depends on θ and peaks at φ_i = θ. (b) Covariance matrix of neurons in layer 1 of the readout. The covariance is calculated with respect to the distribution of their inputs, {r_i}, given θ = 0 degree, for p = 15. Note that the covariance between different neurons is negligible. The variance (diagonal elements of the matrix) is stimulus dependent and peaks at φ_i = θ.
Figure 8b shows the activities of neurons in the first readout layer. The rapid jittering in the activities is obvious, but the long-range fluctuations observed in the input layer are absent, suggesting that the fluctuations in R_i are largely independent. Indeed, using calculations similar to those mentioned in the previous section, we show (see appendix A.3) that in the limit of p ≪ N ≪ N_sat(p), the statistics of R_i obey

$$ \langle R_i \rangle = D_i, \qquad (2.22) $$

$$ \langle \delta R_i \, \delta R_j \rangle = 2 \delta_{ij} (D_i)^2. \qquad (2.23) $$
These analytical results are supported by the numerical results of Figure 9b, which shows the covariance matrix of neurons in layer 1, C^R_ij(θ) = ⟨δR_i δR_j⟩, for θ = 0 degree. As can be seen, the matrix C^R_ij(θ) is almost a diagonal matrix; the covariance between different neurons is negligible. In contrast, the covariance matrix of the input neurons, Figure 9a, contains a substantial smooth off-diagonal part that gives rise to the collective fluctuations that limit the accuracy of the LPV (see also Figure 1c). Figures 10a and 10b show the average and standard deviation of the R_i, respectively, as a function of their PDs, for stimulus θ = 0 degree. They show good fit with the analytical predictions of equations 2.22 and 2.23.
Figure 10: Statistics of neurons in the first layer of the readout. (a) The population tuning profile. The average activity of neurons in the first layer is plotted as a function of their PDs (solid line). The population profile peaks at φ_i = θ. For comparison, D(φ_i) is also plotted (dashed line). (b) The standard deviation of neurons in the first layer is plotted as a function of their PDs (solid line). For comparison, √(2 D(φ_i)²) is also plotted (dashed line). The statistics were calculated for a system in model 1 with N = 300 and for p = 4.
We conclude that for sufficiently large p, the linear filtering of the low-order Fourier modes effectively decorrelates the activities of different neurons. The subsequent squaring of the filtered activity is required in order to extract the information embedded in the second-order statistics of the R_i by a linear readout downstream (the second layer of weights in our readout). The results of equations 2.22 and 2.23 also shed light on our expression for the efficiency of the bilinear readout, equation 2.19. Comparing equation 2.19 and equation 1.20, one observes that the efficiency of the bilinear readout is equivalent to the efficiency of a linear population vector of independent neurons with mean responses, D(φ_i − θ), and standard deviations of √2 D(φ_i − θ), in line with equations 2.22 and 2.23.
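A quick empirical check of equations 2.22 and 2.23, assuming the pop_stats_model1 and highpass sketches above (our illustration; the deviations shrink in the limit p ≪ N ≪ N_sat):

    import numpy as np
    rng = np.random.default_rng(2)
    T, p, c, b_ref = 0.25, 10, 0.3, 40.0
    phi, f, C = pop_stats_model1(0.0, N=300)
    r = rng.multivariate_normal(T * f, T * C, size=20000)
    R = highpass(r, p) ** 2
    D = T * (np.diag(C) - c * b_ref)              # D_i of eq. 2.2, spike-count units
    # <R_i> is close to D_i, up to corrections of order (2p+1)/N:
    print(np.max(np.abs(R.mean(axis=0) - D) / D))
    # The covariance of R is nearly diagonal, close to 2 delta_ij D_i^2:
    CR = np.cov(R.T)
    print(np.max(np.abs(CR - np.diag(2 * D ** 2))) / np.max(2 * D ** 2))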
2.4.1 The FI of the First Readout Layer Population. Our bilinear readout contains first a decorrelation step, then a nonlinear operation, and finally a population vector summation. We may inquire how much better we could do if we replaced the last layer by a more complicated operation (e.g., an RN). This question can be addressed by calculating the FI, J_R, of the R layer. To do this, we use the fact that the R_i are squares of gaussian variables: R_i = t_i², with t_i = Σ_j W_ij r_j. Hence, the FI of the R_i is equal to the FI of the t_i. In the limit of p ≪ N ≪ N_sat(p), the t_i become independent random gaussian variables with zero mean and variance D_i. Applying equation 1.13, we obtain in this limit J_R = J_D, where J_D is given by equation 2.6. A corollary of this result is that J_cov ≥ J_D + o(N). In particular, one obtains that J_cov scales (at least) linearly with N. In fact, as indicated above (see equation 2.6), our numerical results indicate that J_cov ≈ J_D, implying that the filtering out of the low-lying modes of the inputs by R_i leaves most of the information intact. Nevertheless, the NLPV estimator does not saturate the FI of the system (or the R_i), as its variance, equation 2.19, is larger than 1/J_cov ≈ 1/J_D. This is due to the choice of a population vector for the second-layer weights, which extracts only the low-order components in the tuning of the R_i.
2.4.2 Optimal Bilinear Readout and Discrimination. For a stimulus-independent set of weights of the second layer, the NLPV weights are optimal in the sense of squared error averaged over all angles, similar to the case with a linear PV. However, one can obtain optimal results with a bilinear readout appropriate for local tasks. Considering a discrimination task, as described in section 1.3, we define a bilinear discriminator that is based on the sign of q = W(R^(1) − R^(2)). Minimizing the noise, ⟨(δq)²⟩ ≈ 2 W C^R Wᵗ, while constraining the signal, ⟨q⟩ ≈ δθ W⟨R⟩′, for small angular deviations, one obtains the optimal weights W_i = D_i′/(2D_i²), for p ≪ N ≪ N_sat. Calculating the discriminability in this limit (similar to the calculation in section A.2) yields

$$ d'_R = |\delta\theta| \sqrt{J_D}. \qquad (2.24) $$

This result shows once again that the R layer retains most of the information coded in the correlated neurons of the input layer and, furthermore, that locally this information can be extracted by a further linear pooling.
2.5 Combined Population Vector. So far we have focused on extracting the information from the covariance matrix. In fact, linear and nonlinear readout schemes can be combined to implement a readout mechanism that is sensitive to the different aspects of the statistics of the neural responses. We define a combined readout as a linear combination of the LPV estimator, ẑ_L, equation 1.14, and the NLPV estimator, ẑ_B, equation 2.8,

$$ \hat{z} = a_L \hat{z}_L + a_B \hat{z}_B. \qquad (2.25) $$

Note that this estimator is unbiased, and its accuracy is independent of θ. Minimizing the quadratic error with regard to the coefficients a_L and a_B, we obtain the optimal set of coefficients,

$$ \begin{pmatrix} a_L \\ a_B \end{pmatrix} = \begin{pmatrix} \langle (\delta\hat{y}_L)^2 \rangle & \langle \delta\hat{y}_L \, \delta\hat{y}_B \rangle \\ \langle \delta\hat{y}_L \, \delta\hat{y}_B \rangle & \langle (\delta\hat{y}_B)^2 \rangle \end{pmatrix}^{-1} \begin{pmatrix} \langle \hat{x}_L \rangle \\ \langle \hat{x}_B \rangle \end{pmatrix}, \qquad (2.26) $$
Figure 11: Comparison of the efficiency of different readout schemes. Efficiency of the LPV (boxes), NLPV (circles), and the combined readout (asterisks), as defined in section 2.5, was calculated by averaging the estimation errors of these readout schemes over 4000 trials. For comparison, we show the Fisher information, J (solid line), and the asymptotic limit of the NLPV, equation 2.19 (dashed line). For the NLPV, p = 12 was used. In this figure, model 1 was used for the input-layer neurons.
where averages are taken with regard to the probability distribution of the neural responses, {r_i}, for θ = 0 degree. The efficiency of the combined readout is shown in Figure 11 (asterisks). As can be seen, the performance of the combined readout is superior to that of the NLPV due to the addition of the information embedded in the first-order statistics.
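A compact sketch of the combined readout (our illustration, building on the helpers defined above); the coefficients are obtained from the 2 × 2 system of equation 2.26, with the fluctuation statistics estimated from the same simulated trials:

    import numpy as np
    rng = np.random.default_rng(3)
    # Assumes pop_stats_model1 and highpass from the earlier sketches.
    def combined_efficiency(N=500, p=12, T=0.25, trials=4000):
        phi, f, C = pop_stats_model1(0.0, N)     # calibrate at theta = 0
        r = rng.multivariate_normal(T * f, T * C, size=trials)
        e = np.exp(1j * phi)
        zL = r @ e                               # LPV, eq. 1.14
        zB = (highpass(r, p) ** 2) @ e           # NLPV, eq. 2.8
        y = np.stack([zL.imag, zB.imag])         # noise components at theta = 0
        x = np.array([zL.real.mean(), zB.real.mean()])
        a = np.linalg.solve(np.cov(y), x)        # optimal coefficients, eq. 2.26
        z = a[0] * zL + a[1] * zB
        return 1.0 / np.mean(np.angle(z) ** 2)   # efficiency at theta = 0, rad^-2

    print(combined_efficiency())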
3 Summary and Discussion
We have investigated readout mechanisms for correlated populations of neurons that code for an angle, whose second-order statistics vary with the stimulus. Using the Fisher information, we show that the total amount of information in the system about the stimulus is extensive: it grows linearly with population size (see equation 2.6). This is in contrast to the case of stimulus-independent correlations, where the FI saturates to a finite value due to the long-range correlations (e.g., compare J and J_mean in Figure 2), as has been shown previously (Sompolinsky et al., 2001). This information is embedded in the stimulus-dependent covariance matrix, since the information in the tuning of the mean rates saturates to a finite value (see Figure 2).
We find that the main source of information in the second-order statistics of the correlated populations is the stimulus dependence of the variances and not the cross-correlations (see Figure 4). This is because the variances, being larger than the cross-correlations between nearby neurons, induce a stimulus-dependent discontinuity in the spatial structure of the covariance matrix (see Figure 9a). This stimulus sensitivity is spread across all spatial modes and hence is not significantly suppressed by the slowly varying modes of correlated noise.

The above analysis has a direct bearing on the assessment of distributed coding in cortical neuronal ensembles. Although pairs of cortical neurons often exhibit significant cross-correlations, the degree and ubiquity of the modulation of the noise cross-correlograms by changing the stimulus are unclear (Aertsen, Gerstein, Habib, & Palm, 1989; Ahissar, Ahissar, Bergman, & Vaadia, 1992; Maynard et al., 1999). On the other hand, the variance of the spike rates of cortical cells typically changes monotonically with the mean rates; hence, they are strongly modulated by the stimulus. In addition, the variance in spike counts is in general substantially larger than the mean cross-correlations of nearby neurons (see, e.g., Lee et al., 1998; note that the condition for the variance, b(φ_i), to be larger than the correlation with nearby neurons, c b_ref, in model 1 is equivalent to the demand that the correlation coefficient of nearby neurons, c b_ref/b(φ_i), be less than one in absolute value), validating our assumption of discontinuity in the spatial structure of the neuronal covariance matrices. If this analysis is correct, then despite the presence of significant noise cross-correlations in cortical ensembles, the response of large cortical pools contains accurate information about the coded stimuli, mainly due to the tuning of the variance of the firing rates. This suggests that experimental characterization of the tuning properties of the response variances may be as important as the traditional characterization of the tuning properties of the trial-averaged responses.
Extracting the information embedded in the response variances requires more complex readouts than the ones usually assumed in modeling of neuronal population coding. We show that linear schemes (e.g., the population vector; see Figure 3) are incapable of extracting information from second-order statistics. Their efficiency is bounded by the relatively small Fisher information of the mean rates. Here we propose a bilinear readout model that we call a nonlinear population vector. The NLPV consists of two stages of processing (see Figure 5). In the first stage, a linear filtering of the input neurons is performed, such that the slowly varying Fourier modes of the inputs are subtracted out. This high-pass filtering is followed by a quadratic nonlinearity. We show that this stage effectively decorrelates the neuronal responses and generates a population of uncorrelated responses with stimulus-dependent means and standard deviations (see Figures 9 and 10), both of which are proportional to the variances of the input neurons (see equations 2.22 and 2.23). In the second stage, the outputs of the first stage are linearly summed, similar to a linear population vector. The resultant estimate has an efficiency that is extensive in the pool size, although it is below the Fisher information of the system (see Figure 11).
Our readout model suggests that the degree of noise correlations in cortical ensembles will exhibit substantial heterogeneity, where ensembles higher in the sensory processing pathways are decorrelated by high-pass filtering mediated by the internal (or interlaminar) synaptic patterns. In addition, the decorrelated neurons should exhibit more significant input-output nonlinearity than correlated lower-level neurons.
Our readout model employs a quadratic nonlinearity. The realization of such a computation by neurons has been discussed in several previous studies (Poirazi & Mel, 2000; Schwartz & Simoncelli, 2001; Deneve et al., 1999). We have also investigated the robustness of our readout to deviations from the precise form of the quadratic nonlinearity. We have studied numerically the efficiency of a more general nonlinearity by taking

$$ R_i = \left| \sum_{j=1}^{N} W_{ij} r_j \right|^{\alpha}, \qquad (3.1) $$
where α is a parameter of the neural responses that characterizes the neuron's input-output nonlinearity. The simulations have shown that moderate deviations of α from 2 yield NLPV performances that are qualitatively the same as the quadratic one, as shown in Figure 12. Further reasonable altering of other parameters of the model will not change the results presented here qualitatively. For example, changing the DC component of the mean response tuning curves (i.e., changing f_ref while keeping f_max − f_ref constant) will not affect any result. Increasing the modulation of the tuning curve, f_max − f_ref, by a factor β will increase J_mean and the efficiency of the LPV by a factor β²; however, they will still saturate to a finite limit at the same system size as before scaling.
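The generalized readout of equation 3.1 is a one-line change to the sketch given earlier (assuming the highpass helper from above; this is our illustration):

    import numpy as np
    # First readout layer with the generalized nonlinearity of eq. 3.1.
    def readout_layer(r, p, alpha=2.0):
        return np.abs(highpass(r, p)) ** alpha   # alpha = 2 recovers eq. 2.7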
An interesting feature of our readout scheme is (approximate) scale invariance. As can be seen from equation 2.19, the performance of the readout depends on the tuning of the variances but is independent of their absolute scale. Such a change in the scale of the variances may be implemented by a change in the overall time window of the observed responses. In contrast, a linear readout will be sensitive to a change in the overall time window, since it will affect its signal-to-noise ratio. This scale invariance of the bilinear readout can be traced to the form of the Fisher information, equations 1.10 and 1.11. Changing the time window by a factor of, say, γ is expected to change both the mean spike counts and the covariance matrix by the same factor. Hence, whereas J_mean will change by γ, J_cov will not. This suggests that for short time intervals, the NLPV is superior to the LPV even for uncorrelated populations. It should be noted, however, that this scale invariance is a property of the gaussian statistics assumed here, which is no longer a valid model for the neuronal responses at sufficiently low spike counts (i.e., at very short observation times).
Figure 12: Efficiency of the NLPV readout for small deviations from the quadratic nonlinearity (see equation 3.1), as a function of the number of neurons in the pool. From the bottom, α = 1.6 (diamonds), α = 3.2 (squares), α = 2 (circles). The asymptotic limit of the NLPV, equation 2.19, is shown for comparison (dashed line). For all plots, p = 10 was used. In this figure, model 1 was used for the input-layer neurons.
Finally, we have shown that linear and nonlinear readout schemes can be combined to implement a readout mechanism that is sensitive to several aspects of the statistics of the neural responses (see equation 2.25). Even in this simple example, a wisely chosen weighted sum of the linear and nonlinear PVs yields a readout that is more accurate than both the LPV and the NLPV (see Figure 11).
In this article, we have chosen a high-pass filter form of the linear weights matrix, W, based on heuristic arguments. Alternatively, one may find the optimal weights by minimizing the squared error of the readout (averaged over all angles). However, finding the optimal weights is complicated. In a previous study, we used this approach for model 2 (Shamir & Sompolinsky, 2001a, 2001b). We found that the efficiency of the optimal readout is similar to that of the high-pass NLPV with an appropriate value of p (for details, see Shamir & Sompolinsky, 2001a).
Recent studies have used model neurons with nonlinear input-output functions in order to obtain an improved population readout. Deneve et al. (1999) used a recurrent network (RN) of nonlinear neurons, followed by linear population vector pooling. They showed that their network yields superior performance compared to the LPV. In fact, they showed that for low noise levels, their network extracts all the information that is coded in the average firing rates of the neurons, $J_{\mathrm{mean}}$. Interestingly, almost all of the information that can be read by their mechanism is obtained in the first iteration of the dynamics. The first iteration step of their dynamics can be modeled according to equation 2.7, with a feedforward linear filter W identical to their recurrent weight matrix. Because the information coded in the average firing rates of the neurons resides mainly in the low-order Fourier modes of the network, their choice of filter, W, leaves only these few slow modes of the system and filters out all of the higher Fourier modes. Thus, their readout suffers from two major drawbacks. First, it is unable to read information coded in the variance of the responses. Second, it does not overcome the problem of the strong correlated noise in the slow Fourier modes of the system (see Figure 3). Note that their study shows how bilinear readouts can be used in order to read information efficiently from the first-order statistics of the neuronal responses. In the context of this work, we have studied the effect of replacing the feedforward layer $R_i$ by an RN with recurrent weights equal to our high-pass filter W. We have found (results not shown) that iterating the dynamics of the network beyond the first step causes a deterioration of the readout accuracy of the system rather than an improvement. On the other hand, it is expected that feeding the output of our $R_i$ into an RN with weights similar to those of Deneve et al. (1999) might yield slightly improved performance, closer to the bound given by $J_{\mathrm{cov}}$ (see equation 2.6).
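To make the mode-filtering contrast concrete, the following sketch (our own illustrative construction on a uniform ring; only the cutoff p and the ring geometry follow the text) builds the two kinds of circulant filters: a low-pass projector keeping Fourier modes |n| <= p, as a Deneve-style smoothing stage implicitly does, and the complementary high-pass filter keeping |n| > p:

    import numpy as np

    def ring_filters(N, p):
        """Return (low-pass, high-pass) projectors on a ring of N neurons."""
        phi = 2 * np.pi * np.arange(N) / N
        dphi = phi[:, None] - phi[None, :]
        # Projector onto Fourier modes |n| <= p (a Dirichlet kernel):
        P_low = np.ones((N, N)) / N
        for n in range(1, p + 1):
            P_low += 2 * np.cos(n * dphi) / N
        P_high = np.eye(N) - P_low      # keeps only modes |n| > p
        return P_low, P_high

Applied to the responses, P_low retains the slow modes, where both the mean tuning and the strong correlated noise reside, whereas P_high discards them; this is why the two readouts access different parts of the response statistics.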
Appendix
A.1 Derivation of Angular Estimation Error. Let $\hat\theta = \arg(\hat z)$ be an unbiased estimator of $\theta$. Define $\hat z = \hat x + i\hat y$. For small fluctuations in $\hat x$ and $\hat y$, one can expand $\hat\theta$ in $\hat x$ and $\hat y$ around their mean, obtaining

$$\langle(\delta\hat\theta)^2\rangle = \frac{1}{|\langle\hat z\rangle|^2}\,(-\sin\theta,\ \cos\theta)\begin{bmatrix}\langle(\delta\hat x)^2\rangle & \langle\delta\hat x\,\delta\hat y\rangle\\ \langle\delta\hat x\,\delta\hat y\rangle & \langle(\delta\hat y)^2\rangle\end{bmatrix}\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}, \tag{A.1}$$

where averages are taken with respect to the distribution of the neuronal responses for a given $\theta$. If the estimation error, $\langle(\delta\hat\theta)^2\rangle$, is independent of $\theta$, one can evaluate the estimation error for $\theta = 0$ degrees. In this case, equation A.1 reduces to

$$\langle(\delta\hat\theta)^2\rangle = \frac{\langle(\delta\hat y)^2\rangle}{\langle\hat x\rangle^2}, \tag{A.2}$$

which is in the form of a signal, $\langle\hat x\rangle$, to noise, $\sqrt{\langle(\delta\hat y)^2\rangle}$, relation.
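Equation A.2 admits a quick numerical sanity check; in this sketch the gaussian assumptions above are kept, and the mean and standard deviations are arbitrary illustrative values:

    import numpy as np

    rng = np.random.default_rng(0)
    mx, sx, sy = 5.0, 0.3, 0.4                     # <x>, std(x), std(y) at theta = 0 (illustrative)
    x = mx + sx * rng.standard_normal(100_000)
    y = sy * rng.standard_normal(100_000)
    theta_hat = np.arctan2(y, x)                   # arg(z) for z = x + i y; true theta = 0
    print(np.var(theta_hat), (sy / mx)**2)         # both approx <(dy)^2> / <x>^2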
A.1.1 Optimality of the LPV. We will now show the optimality of the LPV, assuming isotropy of the neural network. Let $\hat z = \sum_i w_i r_i$ be an estimator of $e^{i\theta}$. Define a cost function, $E$, as the squared estimation error of $\hat z$ averaged over all angles:

$$E = \frac{1}{2}\int\frac{d\theta}{2\pi}\,\big\langle|e^{i\theta} - \mathbf{w}\mathbf{r}|^2\big\rangle. \tag{A.3}$$

Minimizing $E$ with respect to $\mathbf{w}$, one obtains that the optimal set of linear weights, $\mathbf{w}^{\mathrm{opt}}$, is given by

$$\mathbf{w}^{\mathrm{opt}} = Q^{-1}\mathbf{u} \tag{A.4}$$

$$u_k = \int\frac{d\theta}{2\pi}\,e^{i\theta} f_k(\theta) \tag{A.5}$$

$$Q_{ij} = \int\frac{d\theta}{2\pi}\,\big(C_{ij}(\theta) + f_i(\theta)f_j(\theta)\big). \tag{A.6}$$

Now, using the symmetry of the tuning curves, $f_i(\theta) = f(\phi_i - \theta)$, one obtains from equation A.5 that $u_i = f_1 e^{i\phi_i}$.

Both the correlation matrix $C_{ij}$ and the outer product $f_i f_j$ are functions of two angles, $(\phi_i - \theta)$ and $(\phi_j - \theta)$. After integration over all values of $\theta$ (e.g., equation A.6), one obtains by a change of variables $\theta' = \theta - \phi_j$ that $Q$ is a function of only one angle, $Q_{ij} = Q(\phi_i - \phi_j)$. Hence, $\mathbf{u}$ is an eigenvector of $Q$. Thus, the optimal linear weights for discrimination are $w_i \propto e^{i\phi_i}$; that is, $\hat z$ is the LPV.
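The eigenvector argument is easy to verify numerically. In the sketch below, the cosine-bell tuning curve and the correlation kernel are illustrative choices of our own; $u$ and $Q$ are built on a discretized ring and the solution of $Qw = u$ is checked to be proportional to $e^{i\phi_i}$:

    import numpy as np

    N = 64
    phi = 2 * np.pi * np.arange(N) / N
    thetas = 2 * np.pi * np.arange(640) / 640                      # angle-average grid
    f = lambda th: np.exp(np.cos(th) - 1.0)                        # illustrative tuning curve
    F = f(phi[:, None] - thetas[None, :])                          # f_i(theta) on the grid
    C = 0.1 * np.exp(np.cos(phi[:, None] - phi[None, :]) - 1.0)    # illustrative C(phi_i - phi_j)

    u = (np.exp(1j * thetas) * F).mean(axis=1)                     # equation A.5
    Q = C + (F @ F.T) / len(thetas)                                # equation A.6 (C theta-independent here)
    w = np.linalg.solve(Q, u)                                      # w_opt = Q^{-1} u
    print(np.allclose(w / w[0], np.exp(1j * phi)))                 # True: w_i proportional to e^{i phi_i}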
A.2 Derivation of the Discrimination Error. One can apply the linear readout to the task of discrimination. We define the readout according to the sign of $q = \mathbf{w}(\mathbf{r}^{(1)} - \mathbf{r}^{(2)})$. Minimizing the noise, $\langle(\delta q)^2\rangle = 2\mathbf{w}C\mathbf{w}^t$, while constraining the signal, $\langle q\rangle \approx \mathbf{w}\mathbf{f}'\,\delta\theta$, one obtains that the optimal linear readout weights are given by $\mathbf{w} = C^{-1}\mathbf{f}'$. Thus, for the optimal discriminator, $q$ is a gaussian random variable with mean $\delta\theta\,\mathbf{f}'^{\,t}C^{-1}\mathbf{f}' = \delta\theta\,J_{\mathrm{mean}}$ and variance $2J_{\mathrm{mean}}$. The probability of discrimination error, $q\,\delta\theta < 0$, for the optimal linear discriminator is given by $H(d'_L/\sqrt{2})$ with a discriminability

$$d'_L = |\delta\theta|\sqrt{J_{\mathrm{mean}}(\theta)}, \tag{A.7}$$

thus yielding the result of equation 1.22.
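In code, the whole computation is a few lines; in the sketch below, df and C are placeholders for the tuning-curve derivative and the noise covariance, and the gaussian tail function is evaluated via $H(x) = \mathrm{erfc}(x/\sqrt{2})/2$:

    import numpy as np
    from math import erfc, sqrt

    def discrimination_error(df, C, dtheta):
        """Optimal linear discriminator, equations A.7 and 1.22.

        df     : (N,) tuning-curve derivatives f'(theta)
        C      : (N, N) noise covariance matrix
        dtheta : angular separation of the two stimuli
        """
        J_mean = df @ np.linalg.solve(C, df)        # f'^t C^{-1} f'
        d_L = abs(dtheta) * sqrt(J_mean)            # discriminability, equation A.7
        return 0.5 * erfc(d_L / 2.0)                # H(d_L / sqrt(2))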
A.3 Derivation of the NLPV Noise Term. For the calculation of the NLPV noise term, it is convenient to define

$$\hat Z = \mathrm{Tr}\{A\,\mathbf{r}\mathbf{r}^t\} \tag{A.8}$$

$$A_{ij} = \sum_{k=1}^{N} W_{ik}\,e^{i\phi_k}\,W_{kj}, \tag{A.9}$$
where $\mathbf{r}\mathbf{r}^t$ is an $(N \times N)$ matrix of rank 1. Using the above definition of $A$ with the notation $A = A^R + iA^I$, with $A^R$ and $A^I$ real matrices, the signal can be written as $\langle\hat X\rangle = \mathrm{Tr}(A^R C) + \mathrm{Tr}(A^R\,\mathbf{f}\mathbf{f}^t)$. The noise is given by

$$\langle(\delta\hat Y)^2\rangle = \sum_{ijkl} A^I_{ij} A^I_{kl} M_{ijkl} \tag{A.10}$$

$$M_{ijkl} = \langle\delta(r_i r_j)\,\delta(r_k r_l)\rangle, \tag{A.11}$$

where $\delta(r_i r_j) = r_i r_j - \langle r_i r_j\rangle$. Using the gaussianity of the $r_i$, we obtain

$$M_{ijkl} = C_{ik}C_{jl} + C_{il}C_{jk} + C_{ik}f_j f_l + C_{il}f_j f_k + C_{jk}f_i f_l + C_{jl}f_i f_k. \tag{A.12}$$

Substituting equation A.12 into equation A.10 yields

$$\langle(\delta\hat Y)^2\rangle = 2\,\mathrm{Tr}\{(A^I C)^2\} + 4\,\mathrm{Tr}\{(A^I C)A^I\,\mathbf{f}\mathbf{f}^t\}. \tag{A.13}$$
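Equation A.12 is the Wick (Isserlis) decomposition of gaussian fourth-order moments; a Monte Carlo sketch, with the mean and covariance chosen arbitrarily for illustration, that checks one entry of $M$:

    import numpy as np

    rng = np.random.default_rng(1)
    G = rng.standard_normal((4, 4))
    C = G @ G.T + 4 * np.eye(4)                    # illustrative covariance
    f = rng.standard_normal(4)                     # illustrative mean vector
    r = rng.multivariate_normal(f, C, size=1_000_000)

    i, j, k, l = 0, 1, 2, 3
    dij = r[:, i] * r[:, j] - (r[:, i] * r[:, j]).mean()
    dkl = r[:, k] * r[:, l] - (r[:, k] * r[:, l]).mean()
    M_mc = (dij * dkl).mean()                      # Monte Carlo estimate of M_ijkl
    M_th = (C[i, k] * C[j, l] + C[i, l] * C[j, k]
            + C[i, k] * f[j] * f[l] + C[i, l] * f[j] * f[k]
            + C[j, k] * f[i] * f[l] + C[j, l] * f[i] * f[k])
    print(M_mc, M_th)                              # agree up to sampling noise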
The last term on the right-hand side of equation A.13 contains only terms including Fourier transforms of the tuning curve, $f(\theta)$, of order $> p$. Due to the smoothness of $f(\theta)$, these terms will decay faster than any power of $p$ and hence will be neglected in our analysis. Using the form of $W$, equation 2.10, the expression for $A^I$, the imaginary part of equation A.8, reduces to

$$A^I_{ij} = \frac{1}{2iN}\sum_{\substack{n > p\\ n < -p-1}}\Big(e^{i(n+1)\phi_i - in\phi_j} - e^{in\phi_i - i(n+1)\phi_j}\Big). \tag{A.14}$$

Substituting equation A.14 into equation A.13, we obtain

$$\langle(\delta\hat Y)^2\rangle = N^2\sum_{\substack{m,n > p\\ m,n < -p-1}}\Big\{\tilde C_{m,n}\tilde C_{n+1,m+1} - \tilde C_{m+1,n}\tilde C_{n+1,m}\Big\}, \tag{A.15}$$
where $\tilde C_{m,n}$ is the double Fourier transform of $C$, as defined in equation 2.12. We now consider the contribution of the different parts of the correlation matrix, namely $S$ and $D$, to the noise term, equation A.15. As mentioned above, equation 2.13, the transform of $C$ is a sum of transforms of $S$ and $D$. Thus, the products $\tilde C_{m,n}\tilde C_{t,u}$ contain three different terms: products of transforms of $D$, products of transforms of $S$, and a mixture term containing products of transforms of $D$ and transforms of $S$.

Using the relation, equation 2.13, we obtain the contribution of terms containing only $D$, $I_D$, to the noise:

$$I_D = \sum_{\substack{m,n > p\\ m,n < -p-1}}\{D_{m-n}D_{n-m} - D_{m+1-n}D_{n+1-m}\} \approx N\sum_m\{D_m D_{-m} - D_m D_{2-m}\} = N\,\frac{B_0 - B_2}{2}, \tag{A.16}$$
where we have neglected terms of order $p$ relative to $N$. $B_n$ is the $n$th Fourier transform of $B(\phi) = 2D^2(\phi)$. Note that this term scales linearly with $N$ and does not decay fast as $p$ grows.

The contribution of terms containing only $S$, $I_S$, is given by

$$I_S = N^2\sum_{\substack{m,n > p\\ m,n < -p-1}}\big\{S_{m,n}S_{n+1,m+1} - S_{m+1,n}S_{n+1,m}\big\}. \tag{A.17}$$

This term, $I_S$, scales like $N^2$. However, since it contains only transforms of $S$ that are of order $> p$, it will be a strongly decaying function of $p$. Below, we focus on model 1. In model 1, the Fourier modes are eigenvectors of $S$: $S_{mn} = \delta_{mn}S(n)$, thus yielding

$$I_S = 2N^2\sum_{n>p} S(n)S(n+1), \quad \text{model 1.} \tag{A.18}$$

For large $n$, we can approximate $S(n) \approx \frac{c\,b_{\mathrm{ref}}}{\pi\rho}\big(1 - (-1)^n e^{-\pi/\rho}\big)n^{-2}$ in model 1. Replacing the summation in the last equation by an integral, we obtain

$$I_S = \frac{2N^2}{3}\left(\frac{c\,b_{\mathrm{ref}}}{\pi\rho}\right)^2\big(1 - e^{-2\pi/\rho}\big)\,p^{-3}, \quad \text{model 1.} \tag{A.19}$$
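The replacement of the sum by an integral can be checked directly; in this sketch (illustrative parameter values), the exact sum $2\sum_{n>p}S(n)S(n+1)$ is compared with the closed form of equation A.19, both stripped of the common $N^2$ factor:

    import numpy as np

    c, b_ref, rho, p = 0.2, 1.0, 1.0, 50           # illustrative values
    n = np.arange(p + 1, 200_000)
    S = c * b_ref / (np.pi * rho) * (1 - (-1.0)**n * np.exp(-np.pi / rho)) / n**2
    exact = 2 * np.sum(S[:-1] * S[1:])             # 2 sum_{n>p} S(n) S(n+1)
    approx = (2.0 / 3.0) * (c * b_ref / (np.pi * rho))**2 \
             * (1 - np.exp(-2 * np.pi / rho)) * p**-3.0
    print(exact, approx)                           # agree up to O(1/p) corrections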
The contribution of the mixed term, $I_{SD}$, is

$$I_{SD} = 2N(D_0 - D_2)\Bigg(2\sum_{n>p}S(n) - S(p)\Bigg) \propto N p^{-1}, \quad \text{model 1.} \tag{A.20}$$

Note that although this term yields a contribution that scales linearly with $N$, this contribution is suppressed due to the algebraic decay of $I_{SD}$ with the increase of $p$.
Neglecting the contribution of $I_{SD}$, we obtain that the efficiency of the NLPV is given by

$$\langle(\delta\hat\theta)^2\rangle^{-1} = N J_0\,\frac{1}{1 + N/N_{\mathrm{sat}}} \tag{A.21}$$

$$N_{\mathrm{sat}}(p) = \lim_{N\to\infty}\frac{N I_D}{I_S} \approx \frac{B_0 - B_2}{\frac{4}{3}\left(\frac{c\,b_{\mathrm{ref}}}{\pi\rho}\right)^2\big(1 - e^{-2\pi/\rho}\big)}\,p^3, \quad \text{model 1.} \tag{A.22}$$
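Equations A.21 and A.22 are straightforward to tabulate; a sketch (all parameter values illustrative) of the NLPV efficiency curve, showing how the $p^3$ growth of $N_{\mathrm{sat}}$ pushes the saturation out to larger pools:

    import numpy as np

    def nlpv_efficiency(N, J0, c, b_ref, rho, B0, B2, p):
        """Inverse squared error of the NLPV readout, equations A.21-A.22 (model 1)."""
        denom = (4.0 / 3.0) * (c * b_ref / (np.pi * rho))**2 * (1 - np.exp(-2 * np.pi / rho))
        N_sat = (B0 - B2) / denom * p**3           # equation A.22
        return N * J0 / (1 + N / N_sat)            # equation A.21

    N = np.arange(1, 1001)
    for p in (5, 10, 20):                          # illustrative filter cutoffs
        eff = nlpv_efficiency(N, J0=1e-3, c=0.2, b_ref=1.0, rho=1.0, B0=0.1, B2=0.05, p=p)
        print(p, eff[-1])                          # efficiency at N = 1000 grows with p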
The above results are qualitatively the same for model 2 as well. The algebraic decay of the transforms of $S$ results from the discontinuity of the derivatives of $S$ on the diagonal. This discontinuity causes the $n$th Fourier component to decay asymptotically like $n^{-2}$ in model 1. In model 2, the Fourier components are not its eigenvectors. However, the scaling of the different contributions to the noise term, namely $I_D$, $I_S$, and $I_{SD}$, is the same as in model 1. Thus, for model 2, the relation, equation A.21, still holds with the definition

$$N_{\mathrm{sat}}(p) = \lim_{N\to\infty}\frac{N I_D}{I_S}. \tag{A.23}$$
A.3.1 The Calculation of the Moments of $R_i$. The $R_i$ are squares of gaussian variables: $R_i = t_i^2$, $t_i = \sum_j W_{ij} r_j$, with average

$$\langle t_i\rangle = \sum_{|n|>p} e^{in\phi_i} f_n \tag{A.24}$$

and covariance

$$\langle\delta t_i\,\delta t_j\rangle = \sum_{|m|,|n|>p} e^{-in\phi_i + im\phi_j}\,\tilde C_{mn}. \tag{A.25}$$

In the limit of $p \ll N \ll N_{\mathrm{sat}}(p)$, the averages of $t$ are exponentially small in $p$ and hence will be neglected:

$$\langle t_i\rangle \approx 0, \quad p \ll N \ll N_{\mathrm{sat}}(p). \tag{A.26}$$

Using equation 2.13, we obtain from equation A.25 in the limit of $p \ll N \ll N_{\mathrm{sat}}(p)$:

$$\langle\delta t_i\,\delta t_j\rangle = D_i\,\delta_{ij}, \quad p \ll N \ll N_{\mathrm{sat}}(p). \tag{A.27}$$

From equations A.26 and A.27 and the gaussianity of the $t_i$, we obtain the results of equations 2.22 and 2.23.
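The near-diagonal covariance of equation A.27 can be probed numerically. In the sketch below (an illustrative smooth correlation kernel plus an independent-noise diagonal; the high-pass projector is built as in the earlier filter sketch), the covariance of $t = W\mathbf{r}$ is computed exactly as $WCW^t$, and the off-diagonal entries are seen to be $O(p/N)$ relative to the diagonal:

    import numpy as np

    N, p = 400, 5
    phi = 2 * np.pi * np.arange(N) / N
    dphi = phi[:, None] - phi[None, :]
    # Illustrative C = S + D: smooth correlations plus a diagonal part
    C = 0.3 * np.exp(np.cos(dphi) - 1.0) + 0.7 * np.eye(N)
    # High-pass projector keeping Fourier modes |n| > p (cf. equation 2.10)
    P_low = np.ones((N, N)) / N
    for n in range(1, p + 1):
        P_low += 2 * np.cos(n * dphi) / N
    W = np.eye(N) - P_low
    Ct = W @ C @ W.T                               # covariance of t = W r
    off = Ct - np.diag(np.diag(Ct))
    print(np.abs(off).max(), Ct.diagonal().mean()) # off-diagonal small relative to diagonal for p << N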
Acknowledgments
This research is partially supported by the Israel Science Foundation, Center of Excellence Grant No. 8006/00. M.S. is supported by a scholarship from the Clore Foundation.
References
Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Comput., 11(1), 91–101.
Aertsen, A. M., Gerstein, G. L., Habib, M. K., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.
Ahissar, M., Ahissar, E., Bergman, H., & Vaadia, E. (1992). Encoding of sound-source location and movement: Activity of single neurons and interactions between adjacent neurons in the monkey auditory cortex. J. Neurophysiol., 67(1), 203–215.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The
analysis of visual motion: A comparison of neuronal and psychophysical
performance. J. Neurosci., 12(12), 4745–4765.
Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: A
neural implementation of ideal observers. Nat. Neurosci., 2(8), 740–745.
Fitzpatrick, D. C., Batra, R., Stanford, T. R., & Kuwada, S. (1997). A neuronal
population code for sound localization. Nature, 388(6645), 871–874.
Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population coding of movement direction. Science, 233(4771), 1416–1419.
Kay, S. M. (1993). Fundamentals of statistical signal processing. London: Prentice
Hall International.
Lee, C., Rohrer, W. H., & Sparks, D. L. (1988). Population coding of saccadic eye
movements by neurons in the superior colliculus. Nature, 332(6162), 357–360.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998). Variability and
correlated noise in the discharge of neurons in motor and parietal areas of
the primate cortex. J. Neurosci., 18(3), 1161–1170.
Mastronarde, D. N. (1983). Correlated firing of cat retinal ganglion cells. II. Responses of X- and Y-cells to single quantal events. J. Neurophysiol., 49(2), 325–349.
Maynard, E. M., Hatsopoulos, N. G., Ojakangas, C. L., Acuna, B. D., Sanes, J. N., Normann, R. A., & Donoghue, J. P. (1999). Neuronal interactions improve cortical population coding of movement direction. J. Neurosci., 19(18), 8083–8093.
Poirazi, P., & Mel, B. W. (2000). Choice and value flexibility jointly contribute to the capacity of a subsampled quadratic classifier. Neural Comput., 12(5), 1189–1205.
Schwartz, O., & Simoncelli, E. P. (2001). Natural signal statistics and sensory gain control. Nat. Neurosci., 4(8), 819–825.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proc. Natl. Acad. Sci. USA, 90(22), 10749–10753.
Shamir, M., & Sompolinsky, H. (2001a). Correlation codes in neuronal networks. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, 14. Cambridge, MA: MIT Press.
Shamir, M., & Sompolinsky, H. (2001b). Nonlinear population vector for correlated neurons. Abstract presented at the Society for Neuroscience's 31st Annual Meeting.
Sompolinsky, H., Yoon, H., Kang, K., & Shamir, M. (2001). Population coding in
neuronal systems with correlated noise. Phys. Rev. E, 64(5 Pt 1), 051904.
Thomas, J. A., & Cover, T. M. (1991). Elements of information theory. New York:
Wiley.
van Kan, P. L., Scobey, R. P., & Gabor, A. J. (1985). Response covariance in cat visual cortex. Exp. Brain Res., 60(3), 559–563.
Vogels, R., Spileers, W., & Orban, G. A. (1989). The response variability of striate
cortical neurons in the behaving monkey. Exp. Brain Res., 77(2), 432–436.
Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code for space. Science, 261(5124), 1055–1058.
Young, M. P., & Yamane, S. (1992). Sparse population coding of faces in the
inferotemporal cortex. Science, 256(5061), 1327–1331.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature,
370(6485), 140–143.
Received August 6, 2003; accepted November 6, 2003.