Neural Networks 17 (2004) 205–214
www.elsevier.com/locate/neunet
Information processing in a neuron ensemble with
the multiplicative correlation structure
Si Wu^a,*, Shun-ichi Amari^b, Hiroyuki Nakahara^b,c
^a Department of Informatics, Sussex University, Brighton, BN1 9QH, UK
^b RIKEN Brain Science Institute, Wako-shi, Saitama, Japan
^c Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Received 9 October 2002; revised 6 October 2003
Abstract
The present study investigates the performance of population codes when the fluctuations in neural activity are mutually correlated, with a strength proportional to the neuronal firing rate (multiplicative noise). The neural field method is used to calculate the Fisher information, which is decomposed into two parts: one due to the tuning function and spatial correlation, and the other due to the multiplicative structure. Their different characteristics are studied. The paper also investigates three types of maximum likelihood method, namely decoding by the faithful and unfaithful models and by the Center of Mass strategy, and compares their performances in terms of decoding accuracy and computational complexity.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Information processing; Population code; Multiplicative correlation; Neural field; Fisher information; Maximum likelihood; Asymptotic efficiency;
Center of mass
1. Introduction
One of the key questions in neuroscience is how an
ensemble formed from a neural population encodes and
decodes the external world (deCharms & Zador, 2000;
Deadwyler & Hampson, 1997; Oram, Földiak, Perrett, &
Sengpiel, 1998). One approach to address this question is to
investigate the accuracy of neural population coding by
using the Fisher information (Abbott & Dayan, 1999;
Brunel & Nadal, 1998; Deneve, Latham, & Pouget, 1999;
Eurich & Wilke, 2000; Nakahara & Amari, 2002; Nakahara, Wu, & Amari, 2001; Paradiso, 1988; Pouget, Deneve, Ducom, & Latham, 1999; Pouget, Zhang, Deneve, & Latham, 1998; Salinas & Abbott, 1994; Sanger, 1998; Seung & Sompolinsky, 1993; Snippe, 1996; Wu, Nakahara, & Amari, 2001; Wu, Amari, & Nakahara, 2002b; Zemel, Dayan, & Pouget, 1998; Zhang & Sejnowski, 1999). This is
because the inverse of the Fisher information, called the
Cramér-Rao bound, gives the lower bound of decoding
errors for unbiased estimators. Thus, the Fisher information
of the neural ensemble is a useful indicator of how accurately the neural ensemble may encode the external world, and also of how closely the decoding of the neural ensemble in the brain can realize such an accuracy.

* Corresponding author. Tel.: +44-1273-678770; fax: +44-1273-671320.
E-mail address: [email protected] (S. Wu).
doi:10.1016/j.neunet.2003.10.003
The correlation structure of firing in the neural ensemble is important not only for investigating the function of the neural ensemble in general (Shadlen, Britten, Newsome, & Movshon, 1996; Zohary, Shadlen, & Newsome, 1994), but also in the approach using the Fisher information: the encoding accuracy of the neural ensemble depends drastically on the correlation structure (Abbott & Dayan, 1999; Wilke & Eurich, 2002; Wu, Nakahara, Murata, & Amari, 2000b; Wu et al., 2002b; Yoon & Sompolinsky, 1999). Thus, it is important for any theoretical approach, including the one using the Fisher information, to use an appropriate correlation structure, in other words, the structure found in experiments.
Experimental results usually indicate weak correlations in neural activities: the mean correlation coefficient (COR) is weak, at approximately 0.01 (ranging 0.01–0.20) (Gawne & Richmond, 1993; Lee, Port, Kruse, & Georgopoulos, 1998a,b; Maynard et al., 1999; McAdams & Maunsell, 1999; Zohary et al., 1994), while the COR of some neuron pairs can be up to 0.8 (Maynard et al., 1999). The exact correlation structure of the neural ensemble in general is still controversial. However, experimental data suggest a multiplicative form as the most promising 'first' approximation. A multiplicative form (Abbott & Dayan, 1999; Nakahara & Amari, 2002; Wilke & Eurich, 2002; Wu, Chen, & Amari, 2000a) is in general given by

$$A_{ij} = k_{ij}\, f_i^{\alpha} f_j^{\alpha}, \qquad (1)$$

where $A_{ij}$ represents the covariance of the firing activities of neurons $i$ and $j$ (the variance in the case $i = j$), $f_i$ and $f_j$ represent the mean firing rates of neurons $i$ and $j$, respectively, and $\alpha$ and $k_{ij}$ are parameters, both of which are usually fitted as constants for each pair of neurons in the experimental literature. Experimental data on several cerebral cortical areas indicate that the exponent $\alpha$ is distributed roughly around 0.55 and that the constant $k_{ii}$ ranges mainly over 0.8–3.0, with a tendency of $k_{ii} \ge |k_{ij}|$ $(i \ne j)$ (Britten, Shadlen, Newsome, & Movshon, 1992; Dean, 1981; Gershon, Wiener, Latham, & Richmond, 1998; Lee et al., 1998a,b; Tolhurst, Movshon, & Dean, 1983; Vogels, Spileers, & Orban, 1989).
Provided the above multiplicative form as the first approximation, reports in the literature indicate some further details, particularly in relation to the term $k_{ij}$. One study (Zohary et al., 1994) suggested that the correlation between neurons whose preferred stimuli are similar is significantly higher than that between dissimilar neurons, while another study (Lee et al., 1998a,b) observed this phenomenon only for neurons whose recording electrodes are close in the cortex. A third study (Maynard et al., 1999), however, suggests little dependency on recording sites.
These not necessarily consistent data urge us to investigate the encoding and decoding accuracy of a neural ensemble with different correlation structures under the same multiplicative form. There has been much theoretical research in this direction, and the Fisher information has been found to depend critically on the correlation structure. When neurons fire independently according to their tuning functions, the Fisher information increases in proportion to the number of neurons, or to the density of neurons in a continuous neural field, as is easily expected. However, some authors (Abbott & Dayan, 1999; Yoon & Sompolinsky, 1999) discovered that this is not true if there are correlations among neurons: the Fisher information may saturate. Wu et al. used the method of the neural field (Amari, 1977; Giese, 1999; Wu et al., 2002a) to calculate the Fisher information systematically, and found that the saturation occurs only in a middle range of correlations; the Fisher information increases without limit when the effective range of correlation is very short or very long.
These results are based on simple additive noise correlation. Eurich and Wilke (2000) studied the general multiplicative case and showed that the Fisher information increases in proportion to the number of neurons, so that the saturation does not take place. The present paper uses the neural field model (Wu et al., 2002a) to show the more detailed structure of the Fisher information in the multiplicative case studied by Eurich and Wilke (2000). The Fisher information is decomposed into two parts: one due to the form of the tuning function and the range of correlation, and the other due to the multiplicative effect (i.e. the term $f_i^{\alpha} f_j^{\alpha}$ in Eq. (1)), which vanishes in the additive case. Our findings are: (1) the first part behaves similarly to the additive case, and it saturates in some range of correlation; (2) the second part always increases in proportion to the number of neurons, as proved by Eurich and Wilke (2000); (3) the first part increases inversely proportionally to the noise level, but the second part does not depend on the noise level.
Furthermore, the present study also investigates another important issue: the difference in decoding accuracy between the faithful and unfaithful models (Nakahara & Amari, 2002; Wu et al., 2001, 2002a). The definition of the Fisher information implicitly assumes that decoding is carried out by using the true encoding probability distribution (Wu et al., 2001). This assumption, however, is rather strong and may not hold in general. For example, given the complexity and hierarchy of the brain's neural encoding/decoding structures, it is more plausible to expect the encoding and decoding distributions to differ. If the decoding distribution is different from the encoding one, the decoding model is called unfaithful (if they are the same, it is called faithful). In addition to the decoding accuracy of the faithful model, which equals the inverse of the Fisher information in the multiplicative correlation case, the present study also investigates the decoding accuracy of a specific unfaithful model that neglects the neuronal correlation. This unfaithful model is attractive partly because it can be a compromise between computational complexity and estimation error (Wu et al., 2001), and partly because a biologically plausible recurrent neural dynamics can lead to the estimate of this model (Pouget et al., 1998; Wu et al., 2001).
2. The encoding model
We begin with a discrete encoding model. Consider an ensemble of $N$ neurons coding a variable $x$, which represents the position of a stimulus. Let us denote by $c_i$ the preferred stimulus position of the $i$th neuron, and let $r_i$ denote the response of that neuron, so that $r = \{r_i\}$, for $i = 1, \ldots, N$, denotes the population activity.
The neural responses are correlated, and the $i$th neuron's activity is given by

$$r_i = f_i(x) + \sigma\varepsilon_i, \quad i = 1, \ldots, N, \qquad (2)$$

where $f_i(x)$ is the tuning function of the $i$th neuron, representing the mean value of the response when stimulus $x$ is applied, and $\sigma\varepsilon_i$ is noise whose probability may depend on $x$.
In the present study, we consider only the Gaussian tuning function, that is,

$$f_i(x) = \frac{1}{\sqrt{2\pi}\, a}\, e^{-(c_i - x)^2/2a^2} + d, \qquad (3)$$

where the parameter $a$ is the tuning width and $d$ is a small constant representing the level of spontaneous activity.
The parameter $\sigma$ represents the noise intensity, and $\varepsilon_i$ is the noise, which satisfies

$$\langle \varepsilon_i \rangle = 0, \qquad (4)$$
$$\langle \varepsilon_i \varepsilon_j \rangle = A_{ij}, \qquad (5)$$

where $\langle \cdot \rangle$ represents averaging over many trials.

The covariance matrix $A_{ij}(x)$ contains the structure of the noise, which we assume has the general form

$$A_{ij}(x) = f_i^{\alpha}(x)\left[(1-b)\delta_{ij} + b\, e^{-(c_i-c_j)^2/2\beta^2}\right] f_j^{\alpha}(x) = f_i^{\alpha}(x)\, B_{ij}\, f_j^{\alpha}(x), \qquad (6)$$

with

$$B_{ij} = (1-b)\delta_{ij} + b\, e^{-(c_i-c_j)^2/2\beta^2}. \qquad (7)$$

The parameters are $0 \le \alpha \le 1$ and $0 \le b \le 1$, and the width $\beta$ is the effective correlation length. The variable $\alpha$ reflects the degree of multiplicative correlation (Eurich & Wilke, 2000); when $\alpha = 0$, the noise is additive. The spatial correlation is controlled by $B_{ij}$, which we assume decays exponentially as $|c_i - c_j|$ becomes large. The noise can be seen as a sum of independent (white) Gaussian noise with intensity $1-b$ and correlated Gaussian noise with intensity $b$ (Wu et al., 2002a).

The encoding process of population coding is fully specified by the conditional probability density of $r$ when stimulus $x$ is given:

$$Q(r|x) = \frac{1}{\sqrt{(2\pi\sigma^2)^N \det(A)}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{ij} A^{-1}_{ij} (r_i - f_i(x))(r_j - f_j(x)) \right]. \qquad (8)$$

2.1. Generalization to the continuous case

Since we are only interested in the case when the number $N$ of neurons is large (more accurately, when the neuronal density is high), it is useful to extend the discrete encoding model to a continuous version. Mathematically, there is a large benefit in working with a continuous neural field model (Amari, 1977; Giese, 1999).

Let us consider a one-dimensional neural field in which neurons are located with uniform density $\rho$. The activity of the neuron at position $c$ is denoted by $r(c)$.

The neural response $r(c)$ is given by

$$r(c) = f(c - x) + \sigma\varepsilon(c), \qquad (9)$$

when stimulus $x$ is applied, where the quantities $r(c)$, $f(c-x)$ and $\varepsilon(c)$ are the counterparts of $r_i$, $f_i$ and $\varepsilon_i$ in the discrete version, respectively. The tuning function $f(c-x)$ has the same form as $f_i(x)$, except that $c_i$ is replaced by $c$.

The noise term $\varepsilon(c)$ satisfies

$$\langle \varepsilon(c) \rangle = 0, \qquad (10)$$
$$\langle \varepsilon(c)\varepsilon(c') \rangle = h(c, c'; x), \qquad (11)$$

where $h(c, c'; x)$ is the covariance function (see Appendix A),

$$h(c, c'; x) = f(c-x)^{\alpha}\left[ \frac{1-b}{\rho}\, \delta(c-c') + b\, e^{-(c-c')^2/2\beta^2} \right] f(c'-x)^{\alpha} = f(c-x)^{\alpha}\, B(c, c')\, f(c'-x)^{\alpha}, \qquad (12)$$

$\delta(c)$ is the delta function, and

$$B(c, c') = \frac{1-b}{\rho}\, \delta(c-c') + b\, e^{-(c-c')^2/2\beta^2}. \qquad (13)$$

The continuous form of the encoding process is

$$Q(r|x) = \frac{1}{Z} \exp\left\{ -\frac{\rho^2}{2\sigma^2} \iint [r(c) - f(c-x)]\, h^*(c, c'; x)\, [r(c') - f(c'-x)]\, dc\, dc' \right\}, \qquad (14)$$

where $r = \{r(c)\}$ and $Z$ is the normalization factor. The function $h^*(c, c'; x)$ is the inverse kernel of $h(c, c'; x)$, satisfying

$$\rho^2 \iint h^*(c, c'; x)\, h(c', c''; x)\, dc'\, dc'' = 1. \qquad (15)$$

It is easy to check that

$$h^*(c, c'; x) = f(c-x)^{-\alpha}\, B^*(c, c')\, f(c'-x)^{-\alpha}, \qquad (16)$$

where $B^*(c, c')$ is the inverse kernel of $B(c, c')$, that is,

$$\rho^2 \iint B^*(c, c')\, B(c', c''\,)\, dc'\, dc'' = 1. \qquad (17)$$
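To make the model concrete, here is a minimal numerical sketch of the discrete encoding process of Eqs. (2)–(8): Gaussian tuning curves on a uniform field, the multiplicative covariance of Eqs. (6)–(7), and the sampling of one correlated population response. The sketch is ours, not the authors' code, and all parameter values are merely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters; the symbols follow Eqs. (3), (6) and (7)
N, L = 100, 6.0          # neurons on a field of length L, density rho = N/L
a, d = 1.0, 0.05         # tuning width and spontaneous-activity level
alpha = 0.55             # degree of multiplicative correlation
b, beta = 0.5, 0.5       # correlation strength and correlation length
sigma = 0.05             # noise intensity

c = -L/2 + np.arange(1, N + 1) * L / N       # preferred stimuli c_i, uniform

def f(x):
    """Gaussian tuning function of Eq. (3), evaluated for all neurons."""
    return np.exp(-(c - x)**2 / (2 * a**2)) / (np.sqrt(2 * np.pi) * a) + d

def cov(x):
    """Multiplicative covariance A_ij(x) = f_i^alpha B_ij f_j^alpha, Eqs. (6)-(7)."""
    B = (1 - b) * np.eye(N) + b * np.exp(-(c[:, None] - c[None, :])**2 / (2 * beta**2))
    fa = f(x)**alpha
    return fa[:, None] * B * fa[None, :]

def sample_response(x):
    """One response r_i = f_i(x) + sigma * eps_i, Eq. (2), with <eps eps^T> = A(x)."""
    eps = rng.multivariate_normal(np.zeros(N), cov(x))
    return f(x) + sigma * eps

r = sample_response(0.0)
```

Taking $N$ large at fixed $L$ approximates the continuous field of Section 2.1.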
3. The Fisher information

The Fisher information for the encoding model $Q(r|x)$ is defined as

$$I_F(x) = -\int Q(r|x)\, \frac{d^2 \ln Q(r|x)}{dx^2}\, dr. \qquad (18)$$

From Eq. (14), we get

$$I_F(x) = \frac{\rho^2}{\sigma^2} \iint f'(c-x)\, h^*(c,c';x)\, f'(c'-x)\, dc\, dc' + \frac{\rho^2}{2} \iint h(c,c';x)\, h^{*\prime\prime}(c,c';x)\, dc\, dc' = I_T + I_C, \qquad (19)$$

where $f'(c-x) = df(c-x)/dx$ and $h^{*\prime\prime}(c,c';x) = d^2 h^*(c,c';x)/dx^2$. Note that $I_F(x)$ does not depend on $x$, because of the homogeneity of the neural field.

We shall demonstrate below that the first part of the Fisher information, $I_T$, is mostly due to the tuning function,
in the sense that its value depends on the derivative of the tuning function and on the spatial range of correlation, as shown later. The second term, $I_C$, is mostly due to the multiplicative nature of the noise, because it vanishes when $\alpha = 0$. The two terms are calculated separately.

3.1. Fisher information from the tuning function

The Fourier transform of a function $g(t)$ is defined as

$$\mathcal{F}[g(t)] = \frac{1}{\sqrt{2\pi}} \int e^{-i\omega t} g(t)\, dt. \qquad (20)$$

By using the Fourier transformation,

$$I_T = \frac{\sqrt{2\pi}\rho^2}{\sigma^2} \int \frac{F(\omega) F(-\omega)}{\rho^2 B(\omega)}\, d\omega, \qquad (21)$$

where $F(\omega) = \mathcal{F}[f'(c-x) f^{-\alpha}(c-x)]$ and $B(\omega) = \mathcal{F}[B(c-c')]$. Here, we utilize the relation $\mathcal{F}[B^*(c,c')] = 1/[\rho^2 B(\omega)]$. This is quite similar to the case with additive noise studied in Wu et al. (2002a).

In order to see the properties of $I_T$, we ignore the small constant $d$ in $f(c-x)$ corresponding to the spontaneous firing. Then, from Eqs. (3) and (13),

$$F(\omega) = iG\omega\, e^{-a^2\omega^2/2(1-\alpha)^2}, \qquad (22)$$

$$B(\omega) = \frac{1-b}{\sqrt{2\pi}\rho} + b\beta\, e^{-\beta^2\omega^2/2}, \qquad (23)$$

where $G = (2\pi)^{-0.5+\alpha}\, a^{\alpha} (1-\alpha)^{-1.5}$. In the above we put $x = 0$ (due to the homogeneity of the field) without loss of generality. Therefore,

$$I_T = \frac{2\pi G^2 \rho^2}{\sigma^2} \int_{-\infty}^{\infty} \frac{\omega^2 \exp\left(-a^2\omega^2/(1-\alpha)^2\right)}{\rho(1-b) + \sqrt{2\pi}\rho^2 b\beta\, e^{-\beta^2\omega^2/2}}\, d\omega. \qquad (24)$$

The asymptotic behavior of $I_T$ can be analyzed similarly to the previous work (Wu et al., 2002b).¹ The Fisher information $I_T$ does not necessarily increase with $\rho$, and its behavior strongly depends on the correlation length $\beta$. The results are summarized in five different cases (as illustrated in Fig. 1 of Wu et al. (2002b)):

• No correlation. When $b = 0$, $I_T$ is proportional to $\rho$.
• Local correlation. When $\beta$ is of order $1/\rho$, $I_T$ is proportional to $\rho$.
• Short-range correlation. When $1/\rho \ll \beta < \sqrt{2}a/(1-\alpha)$, $I_T$ saturates to a constant even when $\rho$ goes to infinity.
• Wide-range correlation. When $\beta \ge \sqrt{2}a/(1-\alpha)$, $I_T$ increases in proportion to $\rho$.
• Uniform multiplicative correlation. When $\beta \to \infty$, $A_{ij} = f_i(x)^{\alpha}[(1-b)\delta_{ij} + b] f_j(x)^{\alpha}$, or $h(c,c';x) = f(c-x)^{\alpha}[(1-b)/\rho\, \delta(c-c') + b] f(c'-x)^{\alpha}$, and $I_T$ is proportional to $\rho$.

¹ Note that Eq. (24) has a similar form to Eq. (5) of Wu et al. (2002b), and hence the same asymptotic behaviour in $\rho$, as illustrated in Fig. 1 of Wu et al. (2002b).

3.2. The Fisher information due to the multiplicative structure

Let us see the property of $I_C$, i.e. the contribution to the Fisher information from the multiplicative correlation structure, whose value is calculated to be (see Appendix B)

$$I_C = \alpha^2 \rho \int k(c)^2\, dc + 2\pi\alpha^2 \iint \frac{K(\omega) K(-\omega)\left(1 - b + \sqrt{2\pi}\rho\, b\beta\, e^{-\beta^2\omega'^2/2}\right)}{1 - b + \sqrt{2\pi}\rho\, b\beta\, e^{-\beta^2(\omega'-\omega)^2/2}}\, d\omega\, d\omega' = I_{C1} + I_{C2}, \qquad (25)$$

where $k(c-x) = f'(c-x)/f(c-x)$ and $K(\omega) = \mathcal{F}[f'(c-x)/f(c-x)]$.
From the above decomposition, we see that $I_{C1}$ is proportional to the neuronal density $\rho$. The second term, $I_{C2}$, saturates to a constant even when $\rho$ tends to infinity. Because of $I_{C1}$, the Fisher information always increases with the neuronal density $\rho$ in the case of multiplicative correlation, as was found by Eurich and Wilke (2000); the present calculation confirms their result.

Another interesting finding of this study is that, while $I_T$ is of order $1/\sigma^2$ and grows without bound as the noise intensity tends to 0, $I_C$ is independent of $\sigma$, being derived only from the multiplicative nature of the correlation; it vanishes as $\alpha$ tends to 0. This is understandable, since one cannot extract more from the knowledge of the multiplicative correlation by decreasing the noise intensity, as the two are coupled in a product. This is an important property to which we need to pay attention when practical data are analyzed.² It tells us that there is a trade-off between $I_T$ and $I_C$: which one dominates over the other depends not only on $\rho$ but also on $\sigma$.

² In Eurich and Wilke (2000), this property seems not to have received attention, as they did not define an explicit parameter for the noise intensity.
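The trade-off can be illustrated numerically. The sketch below is ours and rests on the equations as reconstructed above (Eq. (24) for $I_T$ and the first term of Eq. (25) for $I_{C1}$), with illustrative parameter values: for a short-range correlation length $\beta$, $I_T$ levels off as $\rho$ grows, while $I_{C1}$ keeps increasing linearly.

```python
import numpy as np

alpha, a, b, beta, sigma, d = 0.55, 1.0, 0.5, 0.5, 0.05, 0.05
G = (2 * np.pi)**(-0.5 + alpha) * a**alpha * (1 - alpha)**-1.5

def I_T(rho):
    """Eq. (24) by simple quadrature; saturates in rho for short-range beta."""
    w = np.linspace(-60, 60, 40001)
    num = w**2 * np.exp(-a**2 * w**2 / (1 - alpha)**2)
    den = rho * (1 - b) + np.sqrt(2 * np.pi) * rho**2 * b * beta * np.exp(-beta**2 * w**2 / 2)
    return 2 * np.pi * G**2 * rho**2 / sigma**2 * np.sum(num / den) * (w[1] - w[0])

def I_C1(rho):
    """First term of Eq. (25): alpha^2 * rho * int k(c)^2 dc, with k = f'/f.
    A small spontaneous rate d > 0 keeps the integral finite; grows like rho."""
    cg = np.linspace(-10, 10, 20001)
    fc = np.exp(-cg**2 / (2 * a**2)) / (np.sqrt(2 * np.pi) * a) + d
    fpc = -cg / a**2 * np.exp(-cg**2 / (2 * a**2)) / (np.sqrt(2 * np.pi) * a)
    return alpha**2 * rho * np.sum((fpc / fc)**2) * (cg[1] - cg[0])

for rho in (10, 100, 1000):
    print(rho, I_T(rho), I_C1(rho))   # I_T flattens; I_C1 scales linearly
```

Note that $\sigma$ appears only in `I_T`: reducing the noise helps the tuning part but not the multiplicative part, exactly the trade-off described above.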
4. Population decoding
The Fisher information provides only the optimal decoding accuracy that an unbiased estimator can achieve. When a practical decoding method is concerned, its performance needs to be evaluated individually, depending on the decoding model. We compare three decoding methods, all of which are formulated as the Maximum Likelihood Inference (MLI) type, that is, as the maximizer of a likelihood function, whereas they differ in the probability models used for decoding.
4.1. Three decoding methods
An MLI-type estimator $\hat{x}$ is obtained through maximization of the presumed log likelihood $\ln P(r|x)$, i.e. by solving

$$\nabla \ln P(r|\hat{x}) = 0, \qquad (26)$$

where $\nabla k(x)$ denotes $dk(x)/dx$. $P(r|x)$ is called the decoding model, which may be different from the real encoding model $Q(r|x)$. This is because the decoding system usually does not know the exact encoding system. Moreover, a simple and robust decoding model is desirable from the computational point of view (Wu et al., 2002a). We consider three decoding methods, defined as follows.
• The first method is the conventional MLI, referred to as FMLI, which utilizes all of the encoding information, i.e. the decoding model is the true encoding model,

$$P_F(r|x) = Q(r|x). \qquad (27)$$
• The second method, referred to as UMLI, utilizes the information on the shape of the tuning function and the magnitude of signal fluctuation, but neglects the detail of the neural correlation, so that the probability density

$$P_U(r|x) = \frac{1}{Z_U} \exp\left\{ -\frac{\rho}{2\sigma^2} \int \frac{[r(c) - f(c-x)]^2}{f^{2\alpha}(c-x)}\, dc \right\}, \qquad (28)$$

is used for decoding.
• The third method, referred to as COM, does not utilize any information about the encoding process; instead it assumes an incorrect but simple tuning function. It also disregards correlations, using

$$P_C(r|x) = \frac{1}{Z_C} \exp\left\{ -\frac{\rho}{2\sigma^2} \int [r(c) - \tilde{f}(c-x)]^2\, dc \right\}, \qquad (29)$$

where $\tilde{f}(c-x) = -(x-c)^2 + \text{const}$ is used as the presumed tuning function.

It is easy to check that the third method is equivalent to the conventional Center of Mass decoding strategy, with the solution given by

$$\hat{x} = \frac{\int c\, r(c)\, dc}{\int r(c)\, dc}. \qquad (30)$$
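For concreteness, the following sketch implements the two model-light decoders; it is ours (reusing the hypothetical `f`, `c`, `sigma` and `alpha` of the encoding sketch in Section 2), not the authors' code. COM is the one-shot formula of Eq. (30); UMLI maximizes the unfaithful log likelihood of Eq. (28), here by a crude grid search standing in for the gradient ascent discussed below.

```python
import numpy as np

def decode_com(r, c):
    """Center of Mass, Eq. (30): a closed-form weighted average, no model needed."""
    return np.sum(c * r) / np.sum(r)

def decode_umli(r, f, grid, sigma=0.05, alpha=0.55):
    """UMLI: maximize the log likelihood of Eq. (28), discretized over neurons.
    Each objective evaluation is a single O(N) sum; FMLI would additionally
    need the N x N inverse covariance of Eq. (8) at every candidate x."""
    def loglik(x):
        fx = f(x)                                  # mean rates under candidate x
        return -np.sum((r - fx)**2 / fx**(2 * alpha)) / (2 * sigma**2)
    return max(grid, key=loglik)

# Example, with r, f, c from the encoding sketch:
# x_com  = decode_com(r, c)
# x_umli = decode_umli(r, f, grid=np.linspace(-1.0, 1.0, 2001))
```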
4.2. The performance of UMLI
We first analyze the performance of UMLI. For convenience, two notations are introduced: $E_Q[k(r,x)]$ and $V_Q[k(r,x)]$ denote, respectively, the mean and variance of $k(r,x)$ with respect to the distribution $Q(r|x)$.
Suppose $\hat{x}$ is close enough to $x$. We expand $\nabla \ln P_U(r|\hat{x})$ at $x$:

$$\nabla \ln P_U(r|\hat{x}) \simeq \nabla \ln P_U(r|x) + \nabla\nabla \ln P_U(r|x)(\hat{x} - x). \qquad (31)$$

Since the estimator $\hat{x}$ satisfies $\nabla \ln P_U(r|\hat{x}) = 0$,

$$\nabla\nabla \ln P_U(r|x)(\hat{x} - x) \simeq -\nabla \ln P_U(r|x). \qquad (32)$$

Let us denote

$$R = \nabla \ln P_U(r|x) = \frac{\rho}{\sigma^2} \int \frac{[r(c) - f(c-x)]\, f'(c-x)}{f^{2\alpha}(c-x)}\, dc = \frac{\rho}{\sigma} \int \frac{\varepsilon(c)\, f'(c-x)}{f^{2\alpha}(c-x)}\, dc, \qquad (33)$$

$$S = \nabla\nabla \ln P_U(r|x) = \frac{\rho}{\sigma} \int \varepsilon(c) \left[ \frac{f''(c-x)}{f^{2\alpha}(c-x)} - 2\alpha \frac{(f'(c-x))^2}{f^{2\alpha+1}(c-x)} \right] dc - \frac{\rho}{\sigma^2} \int \frac{f'(c-x)^2}{f^{2\alpha}(c-x)}\, dc = \frac{\rho}{\sigma} \int \varepsilon(c) \left[ \frac{f''(c-x)}{f^{2\alpha}(c-x)} - 2\alpha \frac{(f'(c-x))^2}{f^{2\alpha+1}(c-x)} \right] dc + D, \qquad (34)$$
$$D = -\frac{\rho}{\sigma^2} \int \frac{f'(c-x)^2}{f^{2\alpha}(c-x)}\, dc = -\frac{(2\pi)^{\alpha-0.5}\, a^{2\alpha-3}\, \rho}{2^{3/2} (1-\alpha)^{3/2}\, \sigma^2}. \qquad (35)$$
Then, the estimating equation is

$$S(\hat{x} - x) = -R, \qquad (36)$$

or

$$\hat{x} - x = -\frac{R}{S}. \qquad (37)$$

Here, both $R$ and $S$ are random variables depending on $\varepsilon(c)$. It is easy to show that

$$E_Q[R] = 0, \qquad (38)$$
$$E_Q[S] = D. \qquad (39)$$

Their variances are given, by using the Fourier transforms, as

$$V_Q[R] = \frac{\sqrt{2\pi}\rho^2}{\sigma^2} \int F(\omega) F(-\omega) B(\omega)\, d\omega, \qquad (40)$$

$$V_Q[S] = \frac{\sqrt{2\pi}\rho^2}{\sigma^2} \int \tilde{F}(\omega) \tilde{F}(-\omega) B(\omega)\, d\omega, \qquad (41)$$

where $\tilde{F}(\omega) = \mathcal{F}[f''(c-x) f^{-\alpha}(c-x) - 2\alpha (f'(c-x))^2 f^{-\alpha-1}(c-x)]$.

These show that $R$ is a zero-mean Gaussian random variable. The random variable $S$ is composed of two terms (Eq. (34)), the first one being random (Eq. (41)) and the second term $D$ being a fixed constant (Eq. (35)).

The constant term $D$ in $S$ dominates over the random one asymptotically in two cases:

1. $B(\omega)$ is of order $1/\rho$. In this case the random term is of $O(\rho^{1/2})$ and $D$ is of $O(\rho)$.
2. $B(\omega)$ is of $O(1)$, but the noise intensity $\sigma^2$ is sufficiently small. In this case the random term is $O(1/\sigma)$ and the constant term $D$ is $O(1/\sigma^2)$.

Case (1) corresponds to the uncorrelated, local-range, and uniformly correlated cases. Note that the noise intensity in this case is not important, provided that $\rho$ is sufficiently large. Case (2) holds for the short- and wide-range correlations with sufficiently small noise. In these cases, we may neglect the random term in $S$, so that we have asymptotically

$$S \simeq D, \qquad \hat{x} - x \simeq -\frac{R}{D}, \qquad (42)$$

and the decoding error $(\hat{x} - x)$ is normally distributed with zero mean and variance

$$E_Q[(\hat{x} - x)^2]_{\mathrm{UMLI}} = \frac{\sqrt{2\pi}\rho^2}{D^2 \sigma^2} \int F(\omega) F(-\omega) B(\omega)\, d\omega. \qquad (43)$$

UMLI is called quasi-asymptotically efficient, or quasi-Fisher efficient, when the above equality holds (for more detail, please refer to Wu et al. (2002a)).

We now remark on the cases of short- and wide-range correlation with strong noise. In these cases, since $B(\omega)$ is $O(1)$, the random and constant terms in the variable $S$ are of the same order. The decoding error $(\hat{x} - x)$ then tends to have a Cauchy-type distribution, whose variance is undefined. How to choose a suitable performance measure in this case is an open question.

4.3. The performance of FMLI

Following the same line as for UMLI, we analyze the performance of FMLI. Suppose $\hat{x}$ is close enough to $x$. We expand $\nabla \ln Q(r|\hat{x})$, instead of $\nabla \ln P(r|\hat{x})$, at $x$:

$$\nabla \ln Q(r|\hat{x}) \simeq \nabla \ln Q(r|x) + \nabla\nabla \ln Q(r|x)(\hat{x} - x). \qquad (44)$$

Then, by using

$$R = \nabla \ln Q(r|x), \qquad (45)$$
$$S = \nabla\nabla \ln Q(r|x), \qquad (46)$$

we have

$$\hat{x} - x = -\frac{R}{S}. \qquad (47)$$

Let us analyze the property of the random variable $S$. Its mean value is obtained to be

$$E_Q[S] = -I_F, \qquad (48)$$

and the variance is

$$V_Q[S] = \frac{\sqrt{2\pi}\rho^2}{\sigma^2} \int \frac{\hat{F}(\omega)\hat{F}(-\omega)}{\rho^2 B(\omega)}\, d\omega, \qquad (49)$$

where $\hat{F}(\omega) = \mathcal{F}[f''(c-x) f^{-\alpha}(c-x)]$.

The magnitude of $V_Q[S]$ can be analyzed similarly to $I_T$ in Section 3 (note their similar forms). It can be shown that $V_Q[S]$ is $O(1)$ in the short-range correlation case and $O(\rho)$ in the other cases. The magnitude of $E_Q[S]$, however, is always $O(\rho)$, even for short-range correlation. This is due to the contribution from the multiplicative correlation, as analyzed before. Obviously, this property is not shared by UMLI and COM, as they both neglect the correlation.

Therefore, the fluctuations of the random variable $S$ with respect to its mean value are of order $1/\rho^{1/2}$. In the large-$\rho$ limit, the variable $1/S$ can be well approximated by $1/E_Q[S]$.

Finally, the variance of the decoding error of FMLI is calculated to be

$$V_Q[(\hat{x} - x)]_{\mathrm{FMLI}} \simeq \frac{V_Q[R]}{E_Q[S]^2} = \frac{1}{I_F}. \qquad (50)$$

This tells us that FMLI is always asymptotically efficient, due to the multiplicative correlation nature.
4.4. Summarization of performances
The performance of COM can be analyzed similarly. It is understandable that COM has the same asymptotic behavior as UMLI, since they both neglect the correlation. Its decoding error, when the condition of quasi-asymptotic efficiency is satisfied, can be calculated to be

$$V_Q[(\hat{x} - x)]_{\mathrm{COM}} \simeq \sigma^2 \iint c\, h(c, c')\, c'\, dc\, dc'. \qquad (51)$$

Table 1
The asymptotic behaviors of the Fisher information and of the three MLI-type decoding methods (excluding the special case of weak noise). FE and QFE denote Fisher efficiency and quasi-Fisher efficiency, respectively, and Non-F (non-Fisherian) the opposite cases.

Correlation scale   I_F    FMLI   UMLI    COM
Non                 ∝ ρ    FE     QFE     QFE
Local-range         ∝ ρ    FE     QFE     QFE
Short-range         ∝ ρ    FE     Non-F   Non-F
Wide-range          ∝ ρ    FE     Non-F   Non-F
Uniform             ∝ ρ    FE     QFE     QFE
Table 1 summarizes the asymptotic behaviors of the three decoding methods (excluding the special case of weak noise, in which the MLI-type methods are always approximately asymptotically or quasi-asymptotically efficient) and of the Fisher information at different correlation scales under the multiplicative form.
We compare the performances of the three methods. The true stimulus is assumed to be zero, and the neural field involved in coding is restricted to $[-3, 3]$. This is to avoid the divergence of the error of COM, and can be understood as only sufficiently active neurons contributing to the decoding (Wu et al., 2001). Also, weak noise is considered, to ensure that both UMLI and COM are quasi-asymptotically efficient, so that the variance of the decoding error can be used as the performance measure. The exponent is set to $\alpha = 0.55$, as suggested by experiments.

Fig. 1. Comparing the decoding errors of FMLI, UMLI and COM in the case of strong correlation ($b = 1$) and weak noise ($\sigma = 0.05$). The other parameters are $\alpha = 0.55$, $a = 1$, $d = 0$, and $\beta = 0.5$. Because of the big difference in the magnitudes of the results, they are shown in two sub-figures: (a) FMLI; (b) UMLI and COM.

Fig. 1 shows how the decoding errors of the three methods change with the neural density $\rho$ when the correlation covers a non-local range of the population ($b = 1$, i.e. corresponding to the wide-range correlation). We see that the error of FMLI decreases with $\rho$ when $\rho$ is large. This is understandable, since the Fisher information is proportional to $\rho$ in all cases. A comparison with the additive noise case is interesting: there, the error of FMLI saturates (see Fig. 3(a) in Wu et al. (2002a)). The errors of UMLI and COM, however, both saturate when $\rho$ is large.
Fig. 2 exhibits how the decoding errors of the three methods change with the correlation scale when the neural density is fixed. They all increase initially and then decrease when $\beta$ is sufficiently large. This is understandable, since the extremes at both ends correspond to the cases in which neurons are either uncorrelated or uniformly correlated.
In all illustrated examples, UMLI has a larger error than FMLI, but a lower error than COM. The computational complexities of the three methods, however, can be analyzed as follows. Consider maximization of the log likelihood of UMLI and FMLI by the standard gradient descent method. The amounts of computation for obtaining the derivative of the log likelihood of UMLI and FMLI are proportional to $N$ and $N^2$, respectively; UMLI is significantly simpler than FMLI when $N$ is large. For COM, due to the quadratic form of the presumed tuning function, the estimation can be done in one shot by Eq. (30). Therefore, UMLI is able to achieve a compromise between decoding accuracy and computational complexity.
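To see where the $N$ versus $N^2$ gap comes from, it helps to write the faithful objective down. The sketch below is ours (with the hypothetical `f` and `cov` helpers of the Section 2 sketch), not the authors' implementation: each evaluation of Eq. (8) requires an $N \times N$ linear solve and, because $A$ depends on $x$ through the firing rates, a log-determinant as well, whereas the UMLI objective above is a plain sum over neurons.

```python
import numpy as np

def fmli_loglik(x, r, f, cov, sigma=0.05):
    """Faithful log likelihood, Eq. (8), up to an x-independent constant.
    The solve is O(N^3) as written (O(N^2) per evaluation if A is factorized
    once and reused), versus the O(N) sum of the UMLI objective."""
    e = r - f(x)
    A = cov(x)                          # covariance varies with x via the rates
    _, logdet = np.linalg.slogdet(A)
    return -0.5 * (e @ np.linalg.solve(A, e)) / sigma**2 - 0.5 * logdet
```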
5. Conclusion and discussion
In summary, we have investigated the performance of population codes when the fluctuations in neural activity are multiplicatively correlated, i.e. when the correlation strength depends on the neurons' firing rates. This correlation structure is known to be the most likely one in the cortex. Hence, the present study is expected to give us insight into information processing in the brain.
Fig. 2. Comparing the decoding errors of FMLI, UMLI and COM for different correlation scales. The parameters are $\alpha = 0.55$, $a = 1$, $d = 0$, $b = 0.5$, $\rho = 50$ and $\sigma = 0.05$. (a) FMLI; (b) UMLI and COM.
The correlation model for the encoding process that we consider is quite general. Apart from variation in the degree of the multiplicative nature (tuned by $\alpha$), the spatial scale of correlation can also change from none, through local, short, and wide ranges, up to the whole range of the neural ensemble (tuned by $b$ and $\beta$). We expect that this model covers all the uncertainties in neural correlation, as far as the experimental data tell us.
Based on the proposed model, we first calculate the Fisher information, which is decomposed into two parts, $I_T$ and $I_C$: $I_T$ is due to the tuning function and spatial correlation, and $I_C$ is due to the multiplicative structure. $I_T$ is shown to have the same asymptotic behavior as the Fisher information in the additive noise case, and $I_C$ is proportional to the neural density $\rho$. Hence, the Fisher information in the multiplicative correlation case is always proportional to the neural density, as found by Wilke and Eurich (2002). Furthermore, we note that $I_T$ depends inversely on the noise intensity $\sigma$, whereas $I_C$ is independent of it. This property is also important: it tells us there is a trade-off between $I_T$ and $I_C$, and which one dominates over the other depends not only on $\rho$ but also on $\sigma$. The number of neurons involved in coding and the noise level can essentially be measured by experiment. With these data, we could judge which features of neuronal activity play the main role in information processing: if $I_T$ is larger, they are the tuning function and spatial correlation; otherwise, it is the multiplicative correlation.
Subsequently, we have investigated three decoding methods. All of them are formulated as the MLI type, including the conventional COM method, while they differ in the knowledge of the encoding process being utilized. It is proved that FMLI is always asymptotically efficient in the multiplicative correlation case, that is, it achieves the Cramér-Rao bound in the large-$\rho$ limit. UMLI and COM, however, are not quasi-asymptotically efficient when the correlation covers a non-local range of the population (excluding the cases of uniform correlation and weak noise). When UMLI and COM are not quasi-asymptotically efficient, their decoding errors follow a Cauchy-type distribution whose variance is undefined. Therefore, we should be careful in using Eq. (43) or (51) to calculate the variance. One may argue that, in practice, the value of the 'variance' is always obtainable provided that there is a sufficient number of data points; e.g., we may calculate the variance of $x$ by using the formula

$$\langle (x - \langle x \rangle)^2 \rangle = \frac{1}{N}\sum_{i}^{N} x_i^2 - \left( \frac{1}{N}\sum_{i}^{N} x_i \right)^2,$$

where $x_i$ denotes the $i$th measurement. However, this value is unreliable. The Cauchy-type distribution implies that $\langle (x - \langle x \rangle)^2 \rangle$ does not converge with the number of data points $N$: the value we obtain is sensitive to the particular data set being used, no matter how many data points there are (this is unlike the Gaussian distribution, whose variance is estimated more accurately as more data points are sampled).
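This instability is easy to reproduce (our illustration, not from the paper): the sample variance of Gaussian data settles as the sample size grows, while that of Cauchy-distributed data keeps jumping by orders of magnitude however many samples are drawn.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (10**3, 10**5, 10**7):
    x_c = rng.standard_cauchy(n)        # Cauchy-type errors: variance undefined
    x_g = rng.standard_normal(n)        # Gaussian errors: variance = 1
    var = lambda x: np.mean(x**2) - np.mean(x)**2   # the formula quoted above
    print(n, var(x_c), var(x_g))        # Gaussian column settles; Cauchy does not
```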
Finally, we compared the performances of the three methods in terms of their decoding accuracy and computational complexity. It turns out that UMLI, which utilizes the knowledge of the tuning function and the noise intensity, can achieve a compromise between computational complexity and decoding accuracy. This property may be inspirational for our understanding of the structures of neural encoding and decoding.
Appendix A. The covariance function of the continuous
encoding model
We assume that the covariance function has the same form as in the discrete case,

$$h(c, c') = f(c-x)^{\alpha} \left[ D_1 (1-b)\, \delta(c-c') + D_2\, b\, e^{-(c-c')^2/2\beta^2} \right] f(c'-x)^{\alpha}, \qquad (A1)$$

where $\delta(c-c')$ is the delta function. This assumption is confirmed by the consistency of the final result.

In order to determine the coefficients $D_1$ and $D_2$, we use the correspondence principle: the covariance matrix $A_{ij}$ and the correlation function $h(c, c')$ correspond to each other so as to give the same quadratic form for an arbitrary vector $k = (k_i)$ and its continuous version $k(c)$,

$$\rho^2 \iint k(c)\, h(c, c')\, k(c')\, dc\, dc' = \sum_{ij} k_i A_{ij} k_j, \qquad (A2)$$

where $\rho$ is the neuronal density.

Without loss of generality, we consider the preferred stimuli $c_i$ to be uniformly distributed in a range $[-L/2, L/2]$, that is,

$$c_i = -\frac{L}{2} + i\frac{L}{N}, \quad \text{for } i = 1, \ldots, N. \qquad (A3)$$

By choosing $k_i = f_i(x)^{-\alpha}$, for $i = 1, \ldots, N$, the above equation becomes

$$\rho^2 \int_{-L/2}^{L/2} \int_{-L/2}^{L/2} \left[ D_1 (1-b)\, \delta(c-c') + D_2\, b\, e^{-(c-c')^2/2\beta^2} \right] dc\, dc' = \sum_{ij} B_{ij}. \qquad (A4)$$

This has an intuitive meaning, i.e. the total correlation is preserved under the continuous extension.

The left-hand side of the above equation can be calculated to be

$$\rho^2 \int_{-L/2}^{L/2} \int_{-L/2}^{L/2} \left[ D_1 (1-b)\, \delta(c-c') + D_2\, b\, e^{-(c-c')^2/2\beta^2} \right] dc\, dc' = D_1 (1-b) N^2/L + D_2\, b \sqrt{2\pi}\, \beta N^2/L, \qquad (A5)$$

where $\rho = N/L$ is used, and the right-hand side is

$$\sum_{ij} B_{ij} = N(1-b) + N^2 b \sqrt{2\pi}\, \beta / L, \qquad (A6)$$

in the large-$N$ limit.

Comparing the left- and right-hand sides, we get $D_1 = 1/\rho$ and $D_2 = 1$, and finally

$$h(c, c') = f(c-x)^{\alpha} \left[ \frac{1-b}{\rho}\, \delta(c-c') + b\, e^{-(c-c')^2/2\beta^2} \right] f(c'-x)^{\alpha} = f(c-x)^{\alpha}\, B(c, c')\, f(c'-x)^{\alpha}. \qquad (A7)$$
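The correspondence (A4)–(A6) is straightforward to verify numerically. The check below is ours, with illustrative parameters: with $D_1 = 1/\rho$ and $D_2 = 1$, the continuous double integral reproduces $\sum_{ij} B_{ij}$ for large $N$, up to edge effects.

```python
import numpy as np

N, L, b, beta = 200, 10.0, 0.5, 0.5
rho = N / L
ci = -L/2 + np.arange(1, N + 1) * L / N            # preferred stimuli, Eq. (A3)

# Right-hand side, Eq. (A6): direct sum over the discrete B_ij of Eq. (7)
Bij = (1 - b) * np.eye(N) + b * np.exp(-(ci[:, None] - ci[None, :])**2 / (2 * beta**2))
rhs = Bij.sum()

# Left-hand side, Eq. (A4) with D1 = 1/rho, D2 = 1: the delta part integrates
# exactly to (1/rho)(1 - b) L; the Gaussian part is evaluated on a grid
c = np.linspace(-L/2, L/2, 2001)
dc = c[1] - c[0]
gauss = np.exp(-(c[:, None] - c[None, :])**2 / (2 * beta**2))
lhs = rho**2 * ((1 / rho) * (1 - b) * L + b * gauss.sum() * dc**2)

print(lhs, rhs)    # both close to N(1-b) + N^2 b sqrt(2 pi) beta / L
```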
Appendix B. The second part of the Fisher information, $I_C$

From Eq. (19),

$$I_C = \frac{\rho^2}{2} \iint h(c, c'; x)\, h^{*\prime\prime}(c, c'; x)\, dc\, dc'. \qquad (B1)$$

The second derivative of the inverse covariance function is calculated to be

$$h^{*\prime\prime}(c, c'; x) = \left[ \alpha^2 (k(c-x) + k(c'-x))^2 - \alpha (k'(c-x) + k'(c'-x)) \right] h^*(c, c'; x), \qquad (B2)$$

where $k(c-x) = f'(c-x)\, f(c-x)^{-1}$.

Thus

$$I_C = \alpha^2 \rho \int k(c-x)^2\, dc + \alpha^2 \rho^2 \iint k(c-x)\, k(c'-x)\, h(c, c'; x)\, h^*(c, c'; x)\, dc\, dc' = I_{C1} + I_{C2}. \qquad (B3)$$

Due to the homogeneity of the neural field, we get

$$I_{C1} = \alpha^2 \rho \int k(c)^2\, dc. \qquad (B4)$$

By using the Fourier transformation, the second term $I_{C2}$ can be written as

$$I_{C2} = \alpha^2 \iint \frac{K(\omega) K(-\omega) B(\omega')}{B(\omega - \omega')}\, d\omega\, d\omega' = 2\pi\alpha^2 \iint \frac{K(\omega) K(-\omega)\left(1 - b + \sqrt{2\pi}\rho\, b\beta\, e^{-\beta^2\omega'^2/2}\right)}{1 - b + \sqrt{2\pi}\rho\, b\beta\, e^{-\beta^2(\omega'-\omega)^2/2}}\, d\omega\, d\omega', \qquad (B5)$$

where $K(\omega) = \mathcal{F}[k(c-x)]$.
References
Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11, 91–101.
Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27, 77–87.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12, 4745–4765.
Brunel, N., & Nadal, J.-P. (1998). Mutual information, Fisher information, and population coding. Neural Computation, 10, 1731–1757.
deCharms, R. C., & Zador, A. (2000). Neural representation and the cortical code. Annual Review of Neuroscience, 23, 613–647.
Deadwyler, S. A., & Hampson, R. E. (1997). The significance of neural ensemble codes during behavior and cognition. Annual Review of Neuroscience, 20, 217–244.
Dean, A. F. (1981). The variability of discharge of simple cells in the cat striate cortex. Experimental Brain Research, 44, 437–440.
Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: a neural implementation of ideal observers. Nature Neuroscience, 2(8), 740–745.
Eurich, C. W., & Wilke, S. D. (2000). Multidimensional encoding strategy of spiking neurons. Neural Computation, 12(7), 1519–1529.
Gawne, T. J., & Richmond, B. J. (1993). How independent are the messages carried by adjacent inferior temporal cortical neurons? Journal of Neuroscience, 13(7), 2758–2771.
Gershon, E. D., Wiener, M. C., Latham, P. E., & Richmond, B. J. (1998). Coding strategies in monkey V1 and inferior temporal cortices. Journal of Neurophysiology, 79, 1135–1144.
Giese, M. A. (1999). Dynamic neural field theory for motion perception. Dordrecht: Kluwer.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998a). Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex. Journal of Neuroscience, 18(3), 1161–1170.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998b). Neural population coding: multielectrode recordings in primate cerebral cortex. In H. Eichenbaum, & J. Davis (Eds.), Neural ensembles: strategies for recording and decoding (pp. 117–136). New York: Wiley.
Maynard, E. M., Hatsopoulos, N. G., Ojakangas, C. L., Acuna, B. D., Sanes, J. N., Normann, R. A., & Donoghue, J. P. (1999). Neuronal interactions improve cortical population coding of movement direction. Journal of Neuroscience, 19(18), 8083–8093.
McAdams, G. J., & Maunsell, J. H. R. (1999). Effects of attention on the reliability of individual neurons in monkey visual cortex. Neuron, 23, 765–773.
Nakahara, H., & Amari, S. (2002). Attention modulation of neural tuning through peak and base rate in correlated firing. Neural Networks, 15, 41–55.
Nakahara, H., Wu, S., & Amari, S. (2001). Attention modulation of neural tuning through peak and base rate. Neural Computation, 13, 2031–2047.
Oram, M. W., Földiak, P., Perrett, D. I., & Sengpiel, F. (1998). The ideal Homunculus: decoding neural population signals. Trends in Neurosciences, 21(6), 259–265.
Paradiso, M. A. (1988). A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.
Pouget, A., Deneve, S., Ducom, J.-C., & Latham, P. E. (1999). Narrow versus wide tuning curves: what's best for a population code? Neural Computation, 11, 85–90.
Pouget, A., Zhang, K., Deneve, S., & Latham, P. E. (1998). Statistically efficient estimation using population coding. Neural Computation, 10, 373–401.
Salinas, E., & Abbott, L. F. (1994). Vector reconstruction from firing rates. Journal of Computational Neuroscience, 1, 89–107.
Sanger, T. D. (1998). Probability density methods for smooth function approximation and learning in populations of tuned spiking neurons. Neural Computation, 10(6), 1567–1586.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences USA, 90, 10749–10753.
Shadlen, M. N., Britten, K. H., Newsome, W. T., & Movshon, J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16, 1486–1510.
Snippe, H. P. (1996). Parameter extraction from population codes: a critical assessment. Neural Computation, 8, 511–529.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vogels, R., Spileers, W., & Orban, G. A. (1989). The response variability of striate cortical neurons in the behaving monkey. Experimental Brain Research, 77(2), 432–436.
Wilke, S. D., & Eurich, C. W. (2002). Representational accuracy of stochastic neural populations. Neural Computation, 14, 155–189.
Wu, S., Amari, S., & Nakahara, H. (2002a). Population coding and decoding in a neural field: a computational study. Neural Computation, 999–1026.
Wu, S., Amari, S., & Nakahara, H. (2002b). Asymptotical behaviours of population codes. Neurocomputing, 44, 697.
Wu, S., Chen, D., & Amari, S. (2000a). Unfaithful population coding. In S. Amari, C. L. Giles, M. Gori, & V. Piuri (Eds.), Proceedings of the International Joint Conference on Neural Networks (IJCNN 2000) (Vol. II, pp. 199–205). New York: IEEE.
Wu, S., Nakahara, H., Murata, N., & Amari, S. (2000b). Population decoding based on an unfaithful model. In Advances in Neural Information Processing Systems (Vol. 12, pp. 192–198). Cambridge, MA: MIT Press.
Wu, S., Nakahara, H., & Amari, S. (2001). Population coding with correlation and an unfaithful model. Neural Computation, 775–797.
Yoon, H., & Sompolinsky, H. (1999). The effect of correlations on the Fisher information of population codes. In M. S. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in Neural Information Processing Systems (Vol. 11). Cambridge, MA: MIT Press.
Zemel, R. S., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10, 403–430.
Zhang, K., & Sejnowski, T. J. (1999). Neural tuning: to sharpen or broaden. Neural Computation, 11, 75–84.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370(6485), 140–143.