Neural Networks 17 (2004) 205–214
www.elsevier.com/locate/neunet

Information processing in a neuron ensemble with the multiplicative correlation structure

Si Wuᵃ,*, Shun-ichi Amariᵇ, Hiroyuki Nakaharaᵇ,ᶜ

ᵃ Department of Informatics, Sussex University, Brighton, BN1 9QH, UK
ᵇ RIKEN Brain Science Institute, Wako-shi, Saitama, Japan
ᶜ Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

Received 9 October 2002; revised 6 October 2003

Abstract

The present study investigates the performance of population codes when the fluctuations in neural activity are mutually correlated, with a correlation strength proportional to the neuronal firing rate (multiplicative noise). A neural field model is used to calculate the Fisher information. This information is decomposed into two parts: one due to the tuning function and spatial correlation, and the other due to the multiplicative structure. The different characteristics of the two parts are studied. The paper also investigates three maximum likelihood type decoding methods: decoding with a faithful model, decoding with an unfaithful model, and the Center of Mass strategy. It compares their performance in terms of decoding accuracy and computational complexity.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Information processing; Population code; Multiplicative correlation; Neural field; Fisher information; Maximum likelihood; Asymptotic efficiency; Center of mass

1. Introduction

One of the key questions in neuroscience is how an ensemble formed from a neural population encodes and decodes the external world (deCharms & Zador, 2000; Deadwyler & Hampson, 1997; Oram, Földiak, Perrett, & Sengpiel, 1998).
One approach to this question is to investigate the accuracy of neural population coding by using the Fisher information (Abbott & Dayan, 1999; Brunel & Nadal, 1998; Deneve, Latham, & Pouget, 1999; Eurich & Wilke, 2000; Nakahara & Amari, 2002; Nakahara, Wu, & Amari, 2001; Paradiso, 1998; Pouget, Deneve, Ducom, & Latham, 1999; Pouget, Zhang, Deneve, & Latham, 1998; Salinas & Abbott, 1994; Sanger, 1998; Seung & Sompolinsky, 1993; Snippe, 1996; Wu, Nakahara, & Amari, 2001; Wu, Amari, & Nakahara, 2002b; Zemel, Dayan, & Pouget, 1998; Zhang & Sejnowski, 1999). The reason is that the inverse of the Fisher information, called the Cramér–Rao bound, gives the lower bound of the decoding error of any unbiased estimator. Thus, the Fisher information of a neural ensemble is a useful indicator for two questions. It shows how accurately the ensemble may encode the external world. It also shows how closely decoding of the ensemble in the brain can approach that accuracy.

The correlation structure of firing in the neural ensemble matters for investigating the function of neural ensembles in general (Shadlen, Britten, Newsome, & Movshon, 1996; Zohary, Shadlen, & Newsome, 1994). It matters equally for the Fisher information approach, because the encoding accuracy of the neural ensemble depends drastically on the correlation structure (Abbott & Dayan, 1999; Wilke & Eurich, 2002; Wu, Nakahara, Murata, & Amari, 2000b; Wu et al., 2002b; Yoon & Sompolinsky, 1999). Thus, any theoretical approach, including the one using the Fisher information, should use an appropriate correlation structure, in other words, the structure found in experiments.

* Corresponding author. Tel.: +44-1273-678770; fax: +44-1273671320. E-mail address: [email protected] (S. Wu).
0893-6080/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2003.10.003
Experimental results usually indicate weak correlations in neural activity. The mean correlation coefficient (COR) is weak, approximately 0.01 (ranging over 0.01–0.20) (Gawne & Richmond, 1993; Lee, Port, Kruse, & Georgopoulos, 1998a, b; Maynard et al., 1999; McAdams & Maunsell, 1999b; Zohary et al., 1994), while the COR of some neuron pairs can be as high as 0.8 (Maynard et al., 1999). The exact correlation structure of the neural ensemble in general is still controversial. However, experimental data suggest a multiplicative form as the most promising 'first' approximation. A multiplicative form (Abbott & Dayan, 1999; Nakahara & Amari, 2002; Wilke & Eurich, 2002; Wu, Chen, & Amari, 2000a) is in general given by

$$A_{ij} = k_{ij}\, f_i^{\alpha} f_j^{\alpha}, \tag{1}$$

The symbols in Eq. (1) are as follows:
- $A_{ij}$ is the covariance of the firing activities of neurons $i$ and $j$ (the variance when $i = j$).
- $f_i$ and $f_j$ are the mean firing rates of neurons $i$ and $j$, respectively.
- $\alpha$ and $k_{ij}$ are parameters. In the experimental literature, both are usually fitted as constants for each pair of neurons.

Experimental data on several cerebral cortical areas indicate that the exponent $\alpha$ is distributed roughly around 0.55. The same data indicate that the constant $k_{ii}$ ranges mainly over 0.8–3.0, with a tendency of $k_{ii} \ge |k_{ij}|$ ($i \ne j$) (Britten, Shadlen, Newsome, & Movshon, 1992; Dean, 1981; Gershon, Wiener, Latham, & Richmond, 1998; Lee et al., 1998a,b; Tolhurst, Movshon, & Dean, 1983; Vogels, Spileers, & Orban, 1989).
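As a minimal sketch of Eq. (1), the Python snippet below builds a multiplicative covariance matrix for a handful of neurons. The tuning curve follows Eq. (3) with $d = 0$. The preferred positions and the constants $k_{ii} = 1.5$, $k_{ij} = 0.3$ are hypothetical values chosen for illustration, not fitted data. The snippet also shows a direct consequence of the multiplicative form: the correlation coefficient $A_{ij}/\sqrt{A_{ii}A_{jj}} = k_{ij}/\sqrt{k_{ii}k_{jj}}$ does not depend on the firing rates.

```python
import math

def tuning(c, x, a=1.0, d=0.0):
    """Gaussian tuning curve of Eq. (3): mean rate of a neuron preferring c."""
    return math.exp(-(c - x) ** 2 / (2 * a * a)) / (math.sqrt(2 * math.pi) * a) + d

def multiplicative_cov(rates, k, alpha):
    """Eq. (1): A_ij = k_ij * f_i**alpha * f_j**alpha."""
    n = len(rates)
    return [[k[i][j] * rates[i] ** alpha * rates[j] ** alpha for j in range(n)]
            for i in range(n)]

# Hypothetical example: 5 neurons, stimulus at x = 0, alpha = 0.55
prefs = [-2.0, -1.0, 0.0, 1.0, 2.0]
rates = [tuning(c, 0.0) for c in prefs]
k = [[1.5 if i == j else 0.3 for j in range(5)] for i in range(5)]
A = multiplicative_cov(rates, k, alpha=0.55)

for i, f in enumerate(rates):
    # The variance grows like f**(2*alpha), i.e. roughly like f**1.1 here
    print(f"f = {f:.4f}   variance = {A[i][i]:.4f}")

# The correlation coefficient equals k_ij / sqrt(k_ii k_jj) = 0.3 / 1.5 for every pair
print(A[0][1] / math.sqrt(A[0][0] * A[1][1]))
```

Replacing `k` with the matrix $B_{ij}$ of Eq. (7) yields the specific covariance model (6) adopted in Section 2.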
Taking the above multiplicative form as a first approximation, reports in the literature add further details, particularly about the term $k_{ij}$. One study (Zohary et al., 1994) suggested that the correlation between neurons with similar preferred stimuli is significantly higher than that between dissimilar neurons. Another study (Lee et al., 1998a,b) found this only for neurons whose recording electrodes are close together in the cortex. A third study (Maynard et al., 1999), however, suggests little dependence on recording site. These not entirely consistent data urge us to investigate the encoding and decoding accuracy of a neural ensemble with different correlation structures under the same multiplicative form.

There has been much theoretical research in this direction, and the Fisher information has been found to depend critically on the correlation structure. When neurons fire independently according to their tuning functions, the Fisher information increases in proportion to the number of neurons, or to the neuronal density in a continuous neural field, as is easily expected. However, some authors (Abbott & Dayan, 1999; Yoon & Sompolinsky, 1999) found that this is not true when neurons are correlated: the Fisher information may saturate. Wu et al. used the neural field method (Amari, 1977; Giese, 1999; Wu et al., 2002a) to calculate the Fisher information systematically. They found that saturation occurs only in a middle range of correlation lengths, and that the Fisher information increases without limit when the effective range of correlation is very short or very long. These results are based on simple additive correlated noise. Eurich and Wilke (2000) studied the general multiplicative case. They showed that the Fisher information increases in proportion to the number of neurons and that saturation does not take place.
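The contrast between linear growth and saturation can be checked numerically. The sketch below integrates Eq. (24) of Section 3.1 with the trapezoidal rule. The integration window, grid size and parameter values ($a = 1$, $\alpha = 0.55$, $b = 0.5$, $\sigma = 0.05$) are assumptions made for illustration. With $b = 0$ the denominator reduces to $\rho$, so doubling the density exactly doubles $I_T$. With a short correlation length ($\beta = 0.5 < \sqrt{2}\,a/(1-\alpha) \approx 3.14$) the ratio stays close to 1, i.e. $I_T$ saturates.

```python
import math

def fisher_tuning(rho, sigma, b, beta, alpha=0.55, a=1.0, width=12.0, n=4001):
    """Trapezoidal-rule evaluation of I_T in Eq. (24) over omega in [-width, width]."""
    G = (2 * math.pi) ** (-0.5 + alpha) * a ** alpha * (1 - alpha) ** (-1.5)
    step = 2 * width / (n - 1)
    total = 0.0
    for i in range(n):
        w = -width + i * step
        num = w * w * math.exp(-(a * w) ** 2 / (1 - alpha) ** 2)
        den = (rho * (1 - b)
               + rho ** 2 * math.sqrt(2 * math.pi) * b * beta * math.exp(-(beta * w) ** 2 / 2))
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * num / den
    return 2 * math.pi * G ** 2 * rho ** 2 / sigma ** 2 * total * step

sigma = 0.05
ratios = {}
for label, b, beta in [("no correlation", 0.0, 0.5),
                       ("short-range", 0.5, 0.5),
                       ("wide-range", 0.5, 5.0)]:
    ratios[label] = fisher_tuning(2000, sigma, b, beta) / fisher_tuning(1000, sigma, b, beta)
    print(f"{label:15s} I_T(2000) / I_T(1000) = {ratios[label]:.4f}")
```

The wide-range ratio lands between the other two. This is consistent with the slower, asymptotic growth that the paper reports for wide-range correlation.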
The present paper uses the neural field model (Wu et al., 2002a) to reveal the detailed structure of the Fisher information in the multiplicative case studied by Eurich and Wilke (2002). The Fisher information is decomposed into two parts. The first is due to the form of the tuning function and the range of correlation. The second is due to the multiplicative effect (i.e. the term $f_i^{\alpha} f_j^{\alpha}$ in Eq. (1)) and vanishes in the additive case. Our findings are:
(1) the first part behaves similarly to the additive case and saturates in some range of correlation;
(2) the second part always increases in proportion to the number of neurons, as proved by Eurich and Wilke (2000);
(3) the first part is inversely proportional to the noise variance $\sigma^2$, whereas the second part does not depend on the noise level.

The present study also investigates an important further issue: the difference in decoding accuracy between faithful and unfaithful models (Nakahara & Amari, 2002; Wu et al., 2001, 2002a). The definition of the Fisher information implicitly assumes that decoding uses the true encoding probability distribution (Wu et al., 2001). This assumption, however, is rather strong and may not hold in general. For example, given the complexity and hierarchy of neural encoding/decoding structures in the brain, the encoding and decoding distributions are more plausibly different. If the decoding distribution differs from the encoding one, the decoding model is called unfaithful; if they are the same, it is called faithful. In the multiplicative correlation case, the decoding accuracy of the faithful model equals the inverse of the Fisher information. In addition to this faithful model, the present study investigates the decoding accuracy of a specific unfaithful model that neglects the neuronal correlation.
This unfaithful model is attractive for two reasons. It can be a compromise between computational complexity and estimation error (Wu et al., 2001). In addition, biologically plausible recurrent neural dynamics can lead to the estimate of this model (Pouget et al., 1998; Wu et al., 2001).

2. The encoding model

We begin with a discrete encoding model. Consider an ensemble of $N$ neurons coding a variable $x$, which represents the position of the stimulus. Denote by $c_i$ the preferred stimulus position of the $i$th neuron and by $r_i$ its response, so that $\mathbf{r} = \{r_i\}$, $i = 1, \ldots, N$, denotes the population activity. The neural responses are correlated, and the $i$th neuron's activity is given by

$$r_i = f_i(x) + \sigma\epsilon_i, \quad i = 1, \ldots, N, \tag{2}$$

Here $f_i(x)$ is the tuning function of the $i$th neuron, i.e. its mean response when stimulus $x$ is applied. The term $\sigma\epsilon_i$ is noise whose probability distribution may depend on $x$. In the present study, we consider only the Gaussian tuning function, that is,

$$f_i(x) = \frac{1}{\sqrt{2\pi}\,a}\, e^{-(c_i - x)^2/2a^2} + d, \tag{3}$$

where the parameter $a$ is the tuning width and $d$ is a small constant representing the level of spontaneous activity. The parameter $\sigma$ represents the noise intensity, and $\epsilon_i$ is the noise, which satisfies

$$\langle \epsilon_i \rangle = 0, \tag{4}$$

$$\langle \epsilon_i \epsilon_j \rangle = A_{ij}, \tag{5}$$

where $\langle\cdot\rangle$ represents averaging over many trials. The covariance matrix $A_{ij}(x)$ contains the structure of the noise, which we assume has the general form

$$A_{ij}(x) = f_i^{\alpha}(x)\left[(1-b)\,\delta_{ij} + b\, e^{-(c_i - c_j)^2/2\beta^2}\right] f_j^{\alpha}(x) = f_i^{\alpha}(x)\, B_{ij}\, f_j^{\alpha}(x), \tag{6}$$

with

$$B_{ij} = (1-b)\,\delta_{ij} + b\, e^{-(c_i - c_j)^2/2\beta^2}. \tag{7}$$

The parameters satisfy $0 \le \alpha \le 1$ and $0 \le b \le 1$, and the width $\beta$ is the effective correlation length. The exponent $\alpha$ reflects the degree of multiplicative correlation (Eurich & Wilke, 2000); when $\alpha = 0$, the noise is additive. The spatial correlation is controlled by $B_{ij}$, which we assume decays exponentially as $|c_i - c_j|$ becomes large. The noise can be seen as the sum of two components: independent (white) Gaussian noise with intensity $1 - b$, and correlated Gaussian noise with intensity $b$ (Wu et al., 2002a). The encoding process of population coding is fully specified by the conditional probability density of $\mathbf{r}$ given stimulus $x$,

$$Q(\mathbf{r}|x) = \frac{1}{\sqrt{(2\pi\sigma^2)^N \det(A)}} \exp\left[-\frac{1}{2\sigma^2}\sum_{ij} (A^{-1})_{ij}\,(r_i - f_i(x))(r_j - f_j(x))\right]. \tag{8}$$

2.1. Generalization to the continuous case

We are only interested in the case when the number $N$ of neurons is large (more precisely, when the neuronal density is high). It is therefore useful to extend the discrete encoding model to a continuous version. Mathematically, working with a continuous neural field model brings a large benefit (Amari, 1977; Giese, 1999). Consider a one-dimensional neural field in which neurons are located with uniform density $\rho$. The activity of the neuron at position $c$ is denoted by $r(c)$. When stimulus $x$ is applied, the neural response function $r(c)$ is given by

$$r(c) = f(c - x) + \sigma\epsilon(c), \tag{9}$$

where $r(c)$, $f(c - x)$ and $\epsilon(c)$ are the counterparts of $r_i$, $f_i$ and $\epsilon_i$ in the discrete version, respectively. The tuning function $f(c - x)$ has the same form as $f_i(x)$, except that $c_i$ is replaced by $c$. The noise term $\epsilon(c)$ satisfies

$$\langle \epsilon(c) \rangle = 0, \tag{10}$$

$$\langle \epsilon(c)\,\epsilon(c') \rangle = h(c, c'; x), \tag{11}$$

where $h(c, c'; x)$ is the covariance function (see Appendix A),

$$h(c, c'; x) = f(c - x)^{\alpha}\left[\frac{1-b}{\rho}\,\delta(c - c') + b\, e^{-(c - c')^2/2\beta^2}\right] f(c' - x)^{\alpha} = f(c - x)^{\alpha}\, B(c, c')\, f(c' - x)^{\alpha}, \tag{12}$$

$\delta(c)$ is the delta function, and

$$B(c, c') = \frac{1-b}{\rho}\,\delta(c - c') + b\, e^{-(c - c')^2/2\beta^2}. \tag{13}$$

The continuous form of the encoding process is

$$Q(\mathbf{r}|x) = \frac{1}{Z}\exp\left\{-\frac{\rho^2}{2\sigma^2}\iint [r(c) - f(c - x)]\, h^{*}(c, c'; x)\, [r(c') - f(c' - x)]\, dc\, dc'\right\}, \tag{14}$$

where $\mathbf{r} = \{r(c)\}$ and $Z$ is the normalization factor. The function $h^{*}(c, c'; x)$ is the inverse kernel of $h(c, c'; x)$, satisfying

$$\rho^2 \iint h^{*}(c, c'; x)\, h(c', c''; x)\, dc'\, dc'' = 1. \tag{15}$$

It is easy to check that

$$h^{*}(c, c'; x) = f(c - x)^{-\alpha}\, B^{*}(c, c')\, f(c' - x)^{-\alpha}, \tag{16}$$

where $B^{*}(c, c')$ is the inverse kernel function of $B(c, c')$, that is,

$$\rho^2 \iint B^{*}(c, c')\, B(c', c'')\, dc'\, dc'' = 1. \tag{17}$$

Fig. 1. Comparing the decoding errors of FMLI, UMLI and COM in the case of strong correlation ($\beta = 1$) and weak noise ($\sigma = 0.05$). The other parameters are $\alpha = 0.55$, $a = 1$, $d = 0$, and $b = 0.5$. Because of the large difference in the magnitudes of the results, they are shown in two sub-figures: (a) FMLI; (b) UMLI and COM.

3. The Fisher information

The Fisher information for the encoding model $Q(\mathbf{r}|x)$ is defined as

$$I_F(x) = -\int \frac{d^2 \ln Q(\mathbf{r}|x)}{dx^2}\, Q(\mathbf{r}|x)\, d\mathbf{r}. \tag{18}$$

From Eq. (14), we get

$$I_F(x) = \frac{\rho^2}{\sigma^2}\iint f'(c - x)\, h^{*}(c, c'; x)\, f'(c' - x)\, dc\, dc' + \frac{\rho^2}{2}\iint h(c, c'; x)\, h^{*\prime\prime}(c, c'; x)\, dc\, dc' = I_T + I_C, \tag{19}$$

where $f'(c - x) = df(c - x)/dx$ and $h^{*\prime\prime}(c, c'; x) = d^2 h^{*}(c, c'; x)/dx^2$. Note that $I_F(x)$ does not depend on $x$, because of the homogeneity of the neural field. We shall demonstrate below that the first part of the Fisher information, $I_T$, is mostly due to the tuning function. This holds in the sense that its value depends on the derivative of the tuning function and on the spatial range of correlation. The second term, $I_C$, is mostly due to the multiplicative nature of the noise, because it vanishes when $\alpha = 0$. The two terms are calculated separately.
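Eq. (19) can be illustrated with the exact Fisher information of the discrete Gaussian model (8):

$$I_F = \mathbf{f}'^{\top}(\sigma^2 A)^{-1}\mathbf{f}' + \tfrac{1}{2}\mathrm{tr}\big[(A^{-1}A')^2\big].$$

Its two terms play the roles of $I_T$ and $I_C$. Writing $A = DBD$ with $D = \mathrm{diag}(f_i^{\alpha})$, the second term simplifies to $\alpha^2\big[\sum_i k_i^2 + \sum_{ij}(B^{-1})_{ij}B_{ji}k_ik_j\big]$ with $k_i = f_i'/f_i$. This is a discrete analogue of Eq. (B3), and $\sigma$ cancels from it. The sketch below is our own discretization, not code from the paper. The grid of preferred stimuli and the parameter values in the usage lines are hypothetical.

```python
import math

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fisher_parts(x, prefs, sigma, alpha, b, beta, a=1.0, d=0.0):
    """Return (I_T, I_C) for the discrete model of Eqs. (2)-(8)."""
    n = len(prefs)
    g = [math.exp(-(c - x) ** 2 / (2 * a * a)) / (math.sqrt(2 * math.pi) * a) for c in prefs]
    f = [gi + d for gi in g]                                    # tuning, Eq. (3)
    fp = [gi * (c - x) / (a * a) for gi, c in zip(g, prefs)]    # df_i/dx
    k = [fpi / fi for fpi, fi in zip(fp, f)]                    # k_i = f_i' / f_i
    B = [[(1 - b if i == j else 0.0) + b * math.exp(-(prefs[i] - prefs[j]) ** 2 / (2 * beta ** 2))
          for j in range(n)] for i in range(n)]                 # Eq. (7)
    Binv = invert(B)
    u = [fpi / fi ** alpha for fpi, fi in zip(fp, f)]           # D^{-1} f'
    # Mean term: f'^T A^{-1} f' / sigma^2 = u^T B^{-1} u / sigma^2
    i_t = sum(u[i] * Binv[i][j] * u[j] for i in range(n) for j in range(n)) / sigma ** 2
    # Covariance term: (1/2) tr[(A^{-1} A')^2], which does not involve sigma
    i_c = alpha ** 2 * (sum(ki * ki for ki in k)
                        + sum(Binv[i][j] * B[j][i] * k[i] * k[j] for i in range(n) for j in range(n)))
    return i_t, i_c

prefs = [-5.0 + 0.25 * i for i in range(41)]    # hypothetical grid, density 4 on [-5, 5]
for sigma in (0.05, 0.1):
    i_t, i_c = fisher_parts(0.0, prefs, sigma, alpha=0.55, b=0.5, beta=0.5)
    print(f"sigma = {sigma}: I_T = {i_t:.1f}, I_C = {i_c:.3f}")
```

Halving $\sigma$ quadruples the first term but leaves the second unchanged, and setting `alpha=0` makes the second term vanish. This mirrors the behavior of $I_T$ and $I_C$ described above.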
3.1. Fisher information from the tuning function

The Fourier transform of a function $g(t)$ is defined as

$$\mathcal{F}[g(t)] = \frac{1}{\sqrt{2\pi}}\int e^{-i\omega t}\, g(t)\, dt. \tag{20}$$

By using the Fourier transformation,

$$I_T = \frac{\sqrt{2\pi}\,\rho^2}{\sigma^2}\int \frac{F(\omega)F(-\omega)}{\rho^2 B(\omega)}\, d\omega, \tag{21}$$

where $F(\omega) = \mathcal{F}[f'(c - x)\, f^{-\alpha}(c - x)]$ and $B(\omega) = \mathcal{F}[B(c - c')]$. Here, we utilize the relation $\mathcal{F}[B^{*}(c, c')] = 1/[\rho^2 B(\omega)]$. This is quite similar to the case with additive noise studied in Wu et al. (2002a). To see the properties of $I_T$, we ignore the small constant $d$ in $f(c - x)$, which corresponds to spontaneous firing. Then, from Eqs. (3) and (13),

$$F(\omega) = iG\omega\, e^{-a^2\omega^2/2(1-\alpha)^2}, \tag{22}$$

$$B(\omega) = \frac{1-b}{\sqrt{2\pi}\,\rho} + b\beta\, e^{-\beta^2\omega^2/2}, \tag{23}$$

where $G = (2\pi)^{-0.5+\alpha}\, a^{\alpha}\, (1-\alpha)^{-1.5}$. In the above, we put $x = 0$ without loss of generality (due to the homogeneity of the field). Therefore,

$$I_T = \frac{2\pi G^2\rho^2}{\sigma^2}\int_{-\infty}^{\infty} \frac{\omega^2 \exp\left(-\dfrac{a^2\omega^2}{(1-\alpha)^2}\right)}{\rho(1-b) + \rho^2\sqrt{2\pi}\, b\beta\, e^{-\beta^2\omega^2/2}}\, d\omega. \tag{24}$$

The asymptotic behavior of $I_T$ can be analyzed in the same way as in previous work (Wu et al., 2002b).¹ The Fisher information $I_T$ does not necessarily increase with $\rho$, and its behavior depends strongly on the correlation length $\beta$. The results are summarized in five cases (as illustrated in Fig. 1 of Wu et al. (2002b)):

• No correlation. When $b = 0$, $I_T$ is proportional to $\rho$.
• Local correlation. When $\beta$ is of order $1/\rho$, $I_T$ is proportional to $\rho$.
• Short-range correlation. When $1/\rho \ll \beta < \sqrt{2}\,a/(1-\alpha)$, $I_T$ saturates to a constant even when $\rho$ goes to infinity.
• Wide-range correlation. When $\beta \ge \sqrt{2}\,a/(1-\alpha)$, $I_T$ increases in proportion to $\rho$.
• Uniform multiplicative correlation. When $\beta \to \infty$, $A_{ij} = f_i(x)^{\alpha}\,[(1-b)\,\delta_{ij} + b]\, f_j(x)^{\alpha}$, or $h(c, c'; x) = f(c - x)^{\alpha}\,[(1-b)\,\delta(c - c')/\rho + b]\, f(c' - x)^{\alpha}$; $I_T$ is proportional to $\rho$.

¹ Note that Eq. (24) has a form similar to Eq. (5) in Wu et al. (2002b). Hence the two have the same asymptotic behavior in $\rho$, as illustrated in Fig. 1 of Wu et al. (2002b).

3.2. The Fisher information due to the multiplicative structure

Let us now examine $I_C$, i.e. the contribution to the Fisher information from the multiplicative correlation structure. Its value is calculated to be (see Appendix B)

$$I_C = \alpha^2\rho\int k(c)^2\, dc + 2\pi\alpha^2\iint \frac{K(\omega)K(-\omega)\left(1 - b + \rho\sqrt{2\pi}\, b\beta\, e^{-\beta^2\omega'^2/2}\right)}{1 - b + \rho\sqrt{2\pi}\, b\beta\, e^{-\beta^2(\omega' - \omega)^2/2}}\, d\omega\, d\omega' = I_{C1} + I_{C2}, \tag{25}$$

where $k(c - x) = f'(c - x)/f(c - x)$ and $K(\omega) = \mathcal{F}[f'(c - x)/f(c - x)]$.

From the above decomposition, we see that $I_{C1}$ is proportional to the neuronal density $\rho$. The second term, $I_{C2}$, saturates to a constant even when $\rho$ tends to infinity. Because of $I_{C1}$, the Fisher information always increases with the neuronal density $\rho$ in the case of multiplicative correlation, as was found by Eurich and Wilke (2000). The present calculation confirms their result.

Another interesting finding of this study concerns the dependence on noise. $I_T$ is of order $1/\sigma^2$ and grows without bound as the noise intensity tends to 0. In contrast, $I_C$ is independent of $\sigma$, being derived only from the multiplicative nature of the correlation, and it vanishes as $\alpha$ tends to 0. This is understandable. The noise intensity and the multiplicative structure are coupled in a product, so decreasing the noise intensity cannot extract more from knowledge of the multiplicative correlation. This property needs attention when practical data are analyzed.² It implies a trade-off between $I_T$ and $I_C$: which one dominates depends not only on $\rho$ but also on $\sigma$.

² In Eurich and Wilke (2000), this property seems not to have been noticed, as they did not define an explicit parameter for the noise intensity.

4. Population decoding

The Fisher information gives only the optimal decoding accuracy that an unbiased estimator can achieve. For a practical decoding method, performance must be evaluated individually, depending on the decoding model. We compare three decoding methods, all of the Maximum Likelihood Inference (MLI) type, that is, each estimate is the maximizer of a likelihood function. The methods differ in the probability models used for decoding.

4.1. Three decoding methods

An MLI-type estimator $\hat{x}$ is obtained by maximizing the presumed log likelihood $\ln P(\mathbf{r}|x)$, i.e. by solving

$$\nabla \ln P(\mathbf{r}|\hat{x}) = 0, \tag{26}$$

where $\nabla k(x)$ denotes $dk(x)/dx$. $P(\mathbf{r}|x)$ is called the decoding model. It may differ from the real encoding model $Q(\mathbf{r}|x)$, because the decoding system usually does not know the exact encoding system. Moreover, a simple and robust decoding model is desirable from the computational point of view (Wu et al., 2002a). We consider three decoding methods, defined as follows.

• The first method is the conventional MLI, referred to as FMLI. It utilizes all of the encoding information, i.e. the decoding model is the true encoding model,
$$P_F(\mathbf{r}|x) = Q(\mathbf{r}|x). \tag{27}$$
• The second method, referred to as UMLI, utilizes information on the shape of the tuning function and the magnitude of signal fluctuation, but neglects the details of the neural correlation. The probability density used for decoding is
$$P_U(\mathbf{r}|x) = \frac{1}{Z_U}\exp\left\{-\frac{\rho}{2\sigma^2}\int \frac{[r(c) - f(c - x)]^2}{f(c - x)^{2\alpha}}\, dc\right\}. \tag{28}$$
• The third method, referred to as COM, does not utilize any information about the encoding process. Instead, it assumes an incorrect but simple tuning function, and it also disregards correlations, by using
$$P_C(\mathbf{r}|x) = \frac{1}{Z_C}\exp\left\{-\frac{\rho}{2\sigma^2}\int [r(c) - \tilde{f}(c - x)]^2\, dc\right\}, \tag{29}$$
where $\tilde{f}(c - x) = -(x - c)^2 + \mathrm{const}$ is the presumed tuning function.
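The UMLI and COM rules can be made concrete with discrete analogues on a finite grid of neurons. In the sketch below, UMLI minimizes $\sum_i (r_i - f_i(x))^2/f_i(x)^{2\alpha}$, the sum version of the exponent in Eq. (28). It treats $Z_U$ as constant and uses brute-force grid search; both are simplifying assumptions, since the paper does not prescribe an optimizer. COM uses the Center of Mass average $\sum_i c_i r_i/\sum_i r_i$. The grid of preferred stimuli on $[-3, 3]$, the search range and the noise draw are hypothetical.

```python
import math
import random

def tuning(c, x, a=1.0, d=0.0):
    """Gaussian tuning curve of Eq. (3)."""
    return math.exp(-(c - x) ** 2 / (2 * a * a)) / (math.sqrt(2 * math.pi) * a) + d

def umli_estimate(r, prefs, alpha=0.55, a=1.0, d=0.0, grid=None):
    """Grid-search maximizer of a discrete UMLI log likelihood (cf. Eq. (28)).

    The factor rho / (2 sigma^2) is dropped because it does not move the maximum.
    """
    if grid is None:
        grid = [j / 1000 for j in range(-1000, 1001)]   # hypothetical search range [-1, 1]

    def weighted_sq_error(x):
        total = 0.0
        for ri, c in zip(r, prefs):
            fi = tuning(c, x, a, d)
            total += (ri - fi) ** 2 / fi ** (2 * alpha)
        return total

    return min(grid, key=weighted_sq_error)

def com_estimate(r, prefs):
    """Center of Mass: sum_i c_i r_i / sum_i r_i."""
    return sum(c * ri for c, ri in zip(prefs, r)) / sum(r)

prefs = [(i - 30) / 10 for i in range(61)]          # c_i on [-3, 3]
clean = [tuning(c, 0.0) for c in prefs]              # noiseless response to x = 0
rng = random.Random(0)
# Illustration only: multiplicative but spatially uncorrelated noise (b = 0), sigma = 0.05
noisy = [f + 0.05 * f ** 0.55 * rng.gauss(0.0, 1.0) for f in clean]
print("UMLI:", umli_estimate(noisy, prefs), "COM:", com_estimate(noisy, prefs))
```

On the noiseless response both rules recover the true stimulus. How their errors behave under correlated noise is what the analysis of UMLI, FMLI and COM addresses.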
It is easy to check that the third method is equivalent to the conventional Center of Mass decoding strategy, with the solution given by

$$\hat{x} = \frac{\int c\, r(c)\, dc}{\int r(c)\, dc}. \tag{30}$$

4.2. The performance of UMLI

We first analyze the performance of UMLI. For convenience, we introduce two notations: $E_Q[k(\mathbf{r}, x)]$ and $V_Q[k(\mathbf{r}, x)]$ denote, respectively, the mean and the variance of $k(\mathbf{r}, x)$ with respect to the distribution $Q(\mathbf{r}|x)$. Suppose $\hat{x}$ is close enough to $x$. We expand $\nabla \ln P_U(\mathbf{r}|\hat{x})$ at $x$,

$$\nabla \ln P_U(\mathbf{r}|\hat{x}) \simeq \nabla \ln P_U(\mathbf{r}|x) + \nabla\nabla \ln P_U(\mathbf{r}|x)\,(\hat{x} - x). \tag{31}$$

Since the estimator $\hat{x}$ satisfies $\nabla \ln P_U(\mathbf{r}|\hat{x}) = 0$,

$$\nabla\nabla \ln P_U(\mathbf{r}|x)\,(\hat{x} - x) \simeq -\nabla \ln P_U(\mathbf{r}|x). \tag{32}$$

Let us denote

$$R = \nabla \ln P_U(\mathbf{r}|x) = \frac{\rho}{\sigma^2}\int \frac{[r(c) - f(c - x)]\, f'(c - x)}{f^{2\alpha}(c - x)}\, dc = \frac{\rho}{\sigma}\int \frac{\epsilon(c)\, f'(c - x)}{f^{2\alpha}(c - x)}\, dc, \tag{33}$$

$$S = \nabla\nabla \ln P_U(\mathbf{r}|x) = \frac{\rho}{\sigma}\int \epsilon(c)\left(\frac{f''(c - x)}{f^{2\alpha}(c - x)} - 2\alpha\,\frac{(f'(c - x))^2}{f^{2\alpha+1}(c - x)}\right) dc - \frac{\rho}{\sigma^2}\int \frac{f'(c - x)^2}{f^{2\alpha}(c - x)}\, dc = \frac{\rho}{\sigma}\int \epsilon(c)\left(\frac{f''(c - x)}{f^{2\alpha}(c - x)} - 2\alpha\,\frac{(f'(c - x))^2}{f^{2\alpha+1}(c - x)}\right) dc + D, \tag{34}$$

$$D = -\frac{\rho}{\sigma^2}\int \frac{f'(c - x)^2}{f^{2\alpha}(c - x)}\, dc = -\frac{(2\pi)^{\alpha-0.5}\, a^{2\alpha-3}\,\rho}{2^{3/2}\,(1-\alpha)^{3/2}\,\sigma^2}. \tag{35}$$

Then, the estimating equation is

$$S\,(\hat{x} - x) = -R, \tag{36}$$

or

$$\hat{x} - x = -\frac{R}{S}. \tag{37}$$

Here, both $R$ and $S$ are random variables depending on $\epsilon(c)$. It is easy to show that

$$E_Q[R] = 0, \tag{38}$$

$$E_Q[S] = D. \tag{39}$$

Using the Fourier transforms, their variances are

$$V_Q[R] = \frac{\sqrt{2\pi}\,\rho^2}{\sigma^2}\int F(\omega)F(-\omega)B(\omega)\, d\omega, \tag{40}$$

$$V_Q[S] = \frac{\sqrt{2\pi}\,\rho^2}{\sigma^2}\int \tilde{F}(\omega)\tilde{F}(-\omega)B(\omega)\, d\omega, \tag{41}$$

where $\tilde{F}(\omega) = \mathcal{F}\big[f''(c - x)\, f^{-\alpha}(c - x) - 2\alpha\,(f'(c - x))^2 f^{-\alpha-1}(c - x)\big]$.

These results show that $R$ is a zero-mean Gaussian random variable. The random variable $S$ is composed of two terms (Eq. (34)). The first is random (Eq. (41)), and the second, $D$, is a fixed constant (Eq. (35)). The constant term $D$ in $S$ dominates the random one asymptotically in two cases:

1. $B(\omega)$ is of order $1/\rho$. In this case the random term is $O(\rho^{1/2})$ and $D$ is $O(\rho)$.
2. $B(\omega)$ is $O(1)$, but the noise intensity $\sigma^2$ is sufficiently small. In this case the random term is $O(1/\sigma)$ and the constant term $D$ is $O(1/\sigma^2)$.

Case (1) covers the uncorrelated, local-range and uniformly correlated cases. The noise intensity does not matter in this case, provided that $\rho$ is sufficiently large. Case (2) holds for short- and wide-range correlations with sufficiently small noise. In these cases, we may neglect the random term in $S$, so that asymptotically

$$S \approx D, \qquad \hat{x} - x \approx -\frac{R}{D}, \tag{42}$$

and the decoding error $(\hat{x} - x)$ is normally distributed with zero mean and variance

$$E_Q[(\hat{x} - x)^2]_{\mathrm{UMLI}} = \frac{\sqrt{2\pi}\,\rho^2}{D^2\sigma^2}\int F(\omega)F(-\omega)B(\omega)\, d\omega. \tag{43}$$

When the above equality holds, UMLI is called quasi-asymptotically efficient, or quasi-Fisher efficient (for more detail, see Wu et al. (2002a)).

Consider now short- and wide-range correlation with strong noise. In these cases, $B(\omega)$ is $O(1)$, so the random and constant terms in the variable $S$ are of the same order. The decoding error $(\hat{x} - x)$ then tends to have a Cauchy-type distribution, whose variance is undefined. How to choose a suitable performance measure in this case is an open question.

4.3. The performance of FMLI

We analyze FMLI along the same lines as UMLI. Suppose $\hat{x}$ is close enough to $x$. We expand $\nabla \ln Q(\mathbf{r}|\hat{x})$, instead of $\nabla \ln P(\mathbf{r}|\hat{x})$, at $x$,

$$\nabla \ln Q(\mathbf{r}|\hat{x}) \simeq \nabla \ln Q(\mathbf{r}|x) + \nabla\nabla \ln Q(\mathbf{r}|x)\,(\hat{x} - x). \tag{44}$$

Then, by using

$$R = \nabla \ln Q(\mathbf{r}|x), \tag{45}$$

$$S = \nabla\nabla \ln Q(\mathbf{r}|x), \tag{46}$$

we have

$$\hat{x} - x = -\frac{R}{S}. \tag{47}$$

Consider the random variable $S$. By the definition (18), its mean value is

$$E_Q[S] = -I_F, \tag{48}$$

and its variance is

$$V_Q[S] = \frac{\sqrt{2\pi}\,\rho^2}{\sigma^2}\int \frac{\hat{F}(\omega)\hat{F}(-\omega)}{\rho^2 B(\omega)}\, d\omega, \tag{49}$$

where $\hat{F}(\omega) = \mathcal{F}[f''(c - x)\, f^{-\alpha}(c - x)]$.

The magnitude of $V_Q[S]$ can be analyzed in the same way as $I_T$ in Section 3 (note their similar forms). It can be shown that $V_Q[S]$ is $O(1)$ for short-range correlation and $O(\rho)$ in the other cases. The magnitude of $E_Q[S]$, however, is always $O(\rho)$, even for short-range correlation. This comes from the contribution of the multiplicative correlation analyzed above. UMLI and COM do not share this property, as they both neglect the correlation. Therefore, the fluctuations of the random variable $S$ relative to its mean value are of order $1/\rho^{1/2}$. For large $\rho$, $1/S$ is well approximated by $1/E_Q[S]$. Finally, the variance of the decoding error of FMLI is

$$V_Q[(\hat{x} - x)^2]_{\mathrm{FMLI}} \approx \frac{V_Q[R]}{E_Q[S]^2} = \frac{1}{I_F}. \tag{50}$$

This shows that FMLI is always asymptotically efficient, owing to the multiplicative nature of the correlation.

4.4. Summary of performances

The performance of COM can be analyzed similarly. COM has the same asymptotic behavior as UMLI, since they both neglect the correlation. When the condition of quasi-asymptotic efficiency is satisfied, its decoding error is

$$V_Q[(\hat{x} - x)^2]_{\mathrm{COM}} \approx \sigma^2\iint c\, h(c, c')\, c'\, dc\, dc'. \tag{51}$$

Table 1 summarizes the asymptotic behaviors of the Fisher information and the three decoding methods at different correlation scales under the multiplicative form. It excludes the special case of weak noise, in which the MLI-type methods are always approximately asymptotically or quasi-asymptotically efficient.

Table 1
The asymptotic behaviors of the Fisher information and the three MLI-type decoding methods (excluding the special case of weak noise). FE and QFE denote Fisher efficiency and quasi-Fisher efficiency, respectively, and Non-F (non-Fisherian) the opposite cases.

| Correlation scale | $I_F$ | FMLI | UMLI | COM |
|---|---|---|---|---|
| None | $\propto \rho$ | FE | QFE | QFE |
| Local-range | $\propto \rho$ | FE | QFE | QFE |
| Short-range | $\propto \rho$ | FE | Non-F | Non-F |
| Wide-range | $\propto \rho$ | FE | Non-F | Non-F |
| Uniform | $\propto \rho$ | FE | QFE | QFE |

We compare the performances of the three methods.
The true stimulus is assumed to be zero, and the neural field involved in coding is restricted to $[-3, 3]$. This restriction avoids divergence of the COM error. It can be read as only sufficiently active neurons contributing to the decoding (Wu et al., 2001). Weak noise is also assumed, so that both UMLI and COM are quasi-asymptotically efficient and the variance of the decoding error can serve as the performance measure. The parameter $\alpha$ is set to $\alpha = 0.55$, as suggested by experiments.

Fig. 1 shows how the decoding errors of the three methods change with the neural density $\rho$ when the correlation covers a non-local range of the population ($\beta = 1$, i.e. wide-range correlation). The error of FMLI decreases with $\rho$ when $\rho$ is large. This is expected, since the Fisher information is proportional to $\rho$ in all cases. This contrasts with the additive noise case, where the error of FMLI saturates (see Fig. 3(a) in Wu et al. (2002a)). The errors of UMLI and COM, however, both saturate when $\rho$ is large.

Fig. 2 shows how the decoding errors of the three methods change with the correlation scale when the neural density is fixed. They all increase initially and decrease when $\beta$ is sufficiently large. This is expected, since the two extremes correspond to neurons that are either uncorrelated or uniformly correlated. In all the illustrated examples, UMLI has a larger error than FMLI but a lower error than COM.

The computational complexity of the three methods differs. Consider maximizing the log likelihood of UMLI and FMLI by the standard gradient method. The computation needed for the derivative of the log likelihood is proportional to $N$ for UMLI and to $N^2$ for FMLI. UMLI is therefore significantly simpler than FMLI when $N$ is large.
For COM, the presumed tuning function is quadratic, so the estimate can be obtained in one shot by Eq. (30). UMLI thus achieves a compromise between decoding accuracy and computational complexity.

5. Conclusion and discussion

In summary, we have investigated the performance of population codes when the fluctuations in neural activity are multiplicatively correlated, i.e. the correlation strength depends on the neurons' firing rates. This correlation structure is known to be the most likely one in the cortex. Hence, the present study is expected to give insight into information processing in the brain.

Fig. 2. Comparing the decoding errors of FMLI, UMLI and COM for different correlation scales. The parameters are $\alpha = 0.55$, $a = 1$, $d = 0$, $b = 0.5$, $\rho = 50$ and $\sigma = 0.05$. (a) FMLI; (b) UMLI and COM.

The correlation model for the encoding process is quite general. The degree of the multiplicative nature can vary (tuned by $\alpha$). The spatial scale of correlation can also vary (tuned by $\beta$): from none, through local, short and wide, up to the whole neural ensemble. We expect this model to cover all the uncertainties in neural correlation that the experimental data reveal so far. Based on this model, we first calculated the Fisher information and decomposed it into two parts, $I_T$ and $I_C$. $I_T$ is due to the tuning function and spatial correlation, and $I_C$ is due to the multiplicative structure. $I_T$ has the same asymptotic behavior as the Fisher information in the additive noise case, and $I_C$ is proportional to the neural density $\rho$. Hence, the Fisher information in the multiplicative correlation case is always proportional to the neural density, as found by Wilke and Eurich (2002). Furthermore, $I_T$ depends inversely on the noise intensity $\sigma$, whereas $I_C$ is independent of it. This property is also important.
It tells us that there is a trade-off between $I_T$ and $I_C$: which one dominates depends not only on $\rho$ but also on $\sigma$. The number of neurons involved in coding and the noise level can in principle be measured experimentally. With these data, we could judge which features of neuronal activity play the main role in information processing. If $I_T$ is larger, these are the tuning function and spatial correlation; otherwise, it is the multiplicative correlation.

We then investigated three decoding methods. All of them, including the conventional COM method, are formulated as MLI type, but they differ in how much knowledge of the encoding process they utilize. We proved that FMLI is always asymptotically efficient in the multiplicative correlation case, that is, it achieves the Cramér–Rao bound in the limit of large $\rho$. UMLI and COM, however, are not quasi-asymptotically efficient when the correlation covers a non-local range of the population, excluding the cases of uniform correlation and weak noise. In those cases, their decoding errors follow a Cauchy-type distribution whose variance is undefined. Care is therefore needed when using Eq. (43) or (51) to calculate the variance.

One may argue that, in practice, a value of the 'variance' is always obtainable given enough data points. For example, the variance of $x$ can be computed by the formula $\langle (x - \langle x\rangle)^2\rangle = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2$, where $x_i$ denotes the $i$th measurement. However, this value is unreliable. For a Cauchy-type distribution, $\langle (x - \langle x\rangle)^2\rangle$ does not converge as the number of data points $N$ grows. The value obtained is therefore sensitive to the data set used, no matter how many data points there are. This is unlike the Gaussian distribution, whose variance is estimated more accurately as more data points are sampled.
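The non-convergence described above is easy to see in simulation. The sketch below applies the plug-in formula $\frac{1}{N}\sum_i x_i^2 - \left(\frac{1}{N}\sum_i x_i\right)^2$ to samples from two distributions: a standard Gaussian, and a standard Cauchy drawn by the inverse-CDF transform $\tan(\pi(u - 1/2))$. The sample sizes and the seed are arbitrary choices. The Gaussian estimate settles near 1. The Cauchy estimate is dominated by a few extreme draws and does not settle as $N$ grows.

```python
import math
import random

def plug_in_variance(xs):
    """<x^2> - <x>^2 computed from a finite sample."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(v * v for v in xs) / n - mean * mean

def standard_cauchy(rng):
    """Inverse-CDF draw from the standard Cauchy distribution."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(1)
results = {}
for n in (1_000, 10_000, 100_000):
    gauss_var = plug_in_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
    cauchy_var = plug_in_variance([standard_cauchy(rng) for _ in range(n)])
    results[n] = (gauss_var, cauchy_var)
    print(f"N = {n:>7}: Gaussian {gauss_var:.3f}   Cauchy {cauchy_var:.1f}")
```

Rerunning with a different seed changes the Cauchy column by orders of magnitude while the Gaussian column barely moves.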
Finally, we compared the performances of the three methods in terms of decoding accuracy and computational complexity. The comparison shows that UMLI, which uses knowledge of the tuning function and the noise intensity, achieves a compromise between computational complexity and decoding accuracy. This property may inform our understanding of the structures of neural encoding and decoding.

Appendix A. The covariance function of the continuous encoding model

We assume that the covariance function has the same form as in the discrete case,

$$h(c, c') = f(c - x)^{\alpha}\left[D_1 (1 - \beta)\,\delta(c - c') + D_2\,\beta\, e^{-(c - c')^2 / 2b^2}\right] f(c' - x)^{\alpha}, \qquad \text{(A1)}$$

where $\delta(c - c')$ is the delta function. This assumption is confirmed by the consistency of the final result. To determine the coefficients $D_1$ and $D_2$, we use the correspondence principle: the covariance matrix $A_{ij}$ and the correlation function $h(c, c')$ correspond to each other in that they give the same quadratic form for an arbitrary vector $k = (k_i)$ and its continuous version $k(c)$,

$$\rho^2 \iint k(c)\, h(c, c')\, k(c')\, dc\, dc' = \sum_{ij} k_i A_{ij} k_j, \qquad \text{(A2)}$$

where $\rho$ is the neuronal density. Without loss of generality, we assume that the preferred stimuli $c_i$ are uniformly distributed in the range $[-L/2, L/2]$, that is,

$$c_i = -\frac{L}{2} + \frac{L}{N}\, i, \qquad i = 1, \ldots, N. \qquad \text{(A3)}$$

By choosing $k_i = f_i(x)^{-\alpha}$ for $i = 1, \ldots, N$, the above equation becomes

$$\rho^2 \int_{-L/2}^{L/2}\!\int_{-L/2}^{L/2} \left[D_1 (1 - \beta)\,\delta(c - c') + D_2\,\beta\, e^{-(c - c')^2 / 2b^2}\right] dc\, dc' = \sum_{ij} B_{ij}. \qquad \text{(A4)}$$

This has an intuitive meaning: the total correlation is preserved by the continuous extension. The left-hand side of the above equation can be calculated to be

$$\rho^2 \int_{-L/2}^{L/2}\!\int_{-L/2}^{L/2} \left[D_1 (1 - \beta)\,\delta(c - c') + D_2\,\beta\, e^{-(c - c')^2 / 2b^2}\right] dc\, dc' = D_1 (1 - \beta)\,\frac{N^2}{L} + D_2\,\beta\sqrt{2\pi}\, b\,\frac{N^2}{L}, \qquad \text{(A5)}$$

where $\rho = N/L$ is used, and the right-hand side is

$$\sum_{ij} B_{ij} = N(1 - \beta) + \frac{N^2 \beta \sqrt{2\pi}\, b}{L}, \qquad \text{(A6)}$$

in the large-$N$ limit.
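The large-$N$ form of Eq. (A6) can be checked numerically. This sketch is an illustration under stated assumptions, not part of the original derivation: it assumes the discrete correlation matrix is $B_{ij} = (1 - \beta)\delta_{ij} + \beta e^{-(c_i - c_j)^2 / 2b^2}$ (the discrete counterpart of Eq. (A1) that is consistent with Eq. (A6)), places the $c_i$ on the grid of Eq. (A3), and uses hypothetical values $N = 400$, $L = 20$, $\beta = 0.5$, $b = 0.5$. The function names are hypothetical.

```python
import math


def discrete_total_correlation(n, length, beta, b):
    """Exact sum over i, j of the assumed B_ij = (1 - beta) delta_ij + beta exp(-(c_i - c_j)^2 / 2b^2)."""
    # Uniform grid of preferred stimuli, Eq. (A3): c_i = -L/2 + (L/N) i, i = 1..N
    c = [-length / 2 + length * i / n for i in range(1, n + 1)]
    gauss_sum = sum(math.exp(-(ci - cj) ** 2 / (2 * b * b)) for ci in c for cj in c)
    return n * (1 - beta) + beta * gauss_sum


def continuum_rhs(n, length, beta, b):
    """Large-N right-hand side, Eq. (A6)."""
    return n * (1 - beta) + n * n * beta * math.sqrt(2 * math.pi) * b / length


def continuum_lhs(n, length, beta, b, d1, d2):
    """Left-hand side, Eq. (A5), for given coefficients D1 and D2."""
    return (d1 * (1 - beta) + d2 * beta * math.sqrt(2 * math.pi) * b) * n * n / length


if __name__ == "__main__":
    n, length, beta, b = 400, 20.0, 0.5, 0.5
    rho = n / length
    exact = discrete_total_correlation(n, length, beta, b)
    approx = continuum_rhs(n, length, beta, b)
    print(f"discrete sum = {exact:.1f}, Eq. (A6) = {approx:.1f}")
    # With D1 = 1/rho and D2 = 1, Eq. (A5) reproduces Eq. (A6).
    print(continuum_lhs(n, length, beta, b, 1 / rho, 1.0), approx)
```

The residual gap between the discrete sum and Eq. (A6), roughly 2% for these values, comes from the edges of the finite range where the Gaussian kernel is truncated; it shrinks as $b/L$ decreases, consistent with the large-$N$, large-$L$ limit in which (A5) and (A6) are matched.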
Comparing the left- and right-hand sides, we get $D_1 = 1/\rho$ and $D_2 = 1$, and finally

$$h(c, c') = f(c - x)^{\alpha}\left[\frac{1 - \beta}{\rho}\,\delta(c - c') + \beta\, e^{-(c - c')^2 / 2b^2}\right] f(c' - x)^{\alpha} = f(c - x)^{\alpha}\, B(c, c')\, f(c' - x)^{\alpha}. \qquad \text{(A7)}$$

Appendix B. The second part of the Fisher information, $I_C$

From Eq. (19),

$$I_C = \frac{\rho^2}{2} \iint h(c, c'; x)\, h^{*\prime\prime}(c, c'; x)\, dc\, dc'. \qquad \text{(B1)}$$

The second derivative with respect to $x$ is calculated to be

$$h^{*\prime\prime}(c, c'; x) = \left[\alpha^2\big(k(c - x) + k(c' - x)\big)^2 - \alpha\big(k'(c - x) + k'(c' - x)\big)\right] h^{*}(c, c'; x), \qquad \text{(B2)}$$

where $k(c - x) = f'(c - x)\, f(c - x)^{-1}$. Thus

$$I_C = \alpha^2 \rho \int k(c - x)^2\, dc + \alpha^2 \rho^2 \iint k(c - x)\, k(c' - x)\, h(c, c'; x)\, h^{*}(c, c'; x)\, dc\, dc' = I_{C1} + I_{C2}. \qquad \text{(B3)}$$

Due to the homogeneity of the neural field, we get

$$I_{C1} = \alpha^2 \rho \int k(c)^2\, dc. \qquad \text{(B4)}$$

Using the Fourier transform, the second term $I_{C2}$ can be written as

$$I_{C2} = \alpha^2 \iint \frac{K(\omega) K(-\omega) B(\omega')}{B(\omega - \omega')}\, d\omega\, d\omega' = 2\pi\alpha^2 \iint \frac{K(\omega) K(-\omega)\left(1 - \beta + \rho\sqrt{2\pi}\,\beta b\, e^{-b^2 \omega'^2 / 2}\right)}{1 - \beta + \rho\sqrt{2\pi}\,\beta b\, e^{-b^2 (\omega' - \omega)^2 / 2}}\, d\omega\, d\omega', \qquad \text{(B5)}$$

where $K(\omega) = \mathcal{F}[k(c - x)]$.

References

Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11, 91–101.
Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27, 77–87.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12, 4745–4765.
Brunel, N., & Nadal, J.-P. (1998). Mutual information, Fisher information, and population coding. Neural Computation, 10, 1731–1757.
deCharms, R. C., & Zador, A. (2000). Neural representation and the cortical code. Annual Review of Neuroscience, 23, 613–647.
Deadwyler, S. A., & Hampson, R. E. (1997). The significance of neural ensemble codes during behavior and cognition. Annual Review of Neuroscience, 20, 217–244.
Dean, A. F. (1981). The variability of discharge of simple cells in the cat striate cortex. Experimental Brain Research, 44, 437–440.
Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: a neural implementation of ideal observers. Nature Neuroscience, 2(8), 740–745.
Eurich, C. W., & Wilke, S. D. (2000). Multidimensional encoding strategy of spiking neurons. Neural Computation, 12(7), 1519–1529.
Gawne, T. J., & Richmond, B. J. (1993). How independent are the messages carried by adjacent inferior temporal cortical neurons? Journal of Neuroscience, 13(7), 2758–2771.
Gershon, E. D., Wiener, M. C., Latham, P. E., & Richmond, B. J. (1998). Coding strategies in monkey V1 and inferior temporal cortices. Journal of Neurophysiology, 79, 1135–1144.
Giese, M. A. (1999). Dynamic neural field theory for motion perception. Dordrecht: Kluwer.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998a). Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex. Journal of Neuroscience, 18(3), 1161–1170.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998b). Neural population coding: multielectrode recordings in primate cerebral cortex. In H. Eichenbaum, & J. Davis (Eds.), Neural ensembles: strategies for recording and decoding (pp. 117–136). New York: Wiley.
Maynard, E. M., Hatsopoulos, N. G., Ojakangas, C. L., Acuna, B. D., Sanes, J. N., Normann, R. A., & Donoghue, J. P. (1999). Neuronal interactions improve cortical population coding of movement direction. Journal of Neuroscience, 19(18), 8083–8093.
McAdams, C. J., & Maunsell, J. H. R. (1999). Effects of attention on the reliability of individual neurons in monkey visual cortex. Neuron, 23, 765–773.
Nakahara, H., & Amari, S. (2002). Attention modulation of neural tuning through peak and base rate in correlated firing. Neural Networks, 15, 41–55.
Nakahara, H., Wu, S., & Amari, S. (2001). Attention modulation of neural tuning through peak and base rate. Neural Computation, 13, 2031–2047.
Oram, M. W., Földiak, P., Perrett, D. I., & Sengpiel, F. (1998). The ideal homunculus: decoding neural population signals. Trends in Neurosciences, 21(6), 259–265.
Paradiso, M. A. (1988). A theory for use of visual orientation information which exploits the columnar structure of striate cortex. Biological Cybernetics, 58, 35–49.
Pouget, A., Deneve, S., Ducom, J.-C., & Latham, P. E. (1999). Narrow versus wide tuning curves: what's best for a population code. Neural Computation, 11, 85–90.
Pouget, A., Zhang, K., Deneve, S., & Latham, P. E. (1998). Statistically efficient estimation using population coding. Neural Computation, 10, 373–401.
Salinas, E., & Abbott, L. F. (1994). Vector reconstruction from firing rates. Journal of Computational Neuroscience, 1, 89–107.
Sanger, T. D. (1998). Probability density methods for smooth function approximation and learning in populations of tuned spiking neurons. Neural Computation, 10(6), 1567–1586.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences USA, 90, 10749–10753.
Shadlen, M. N., Britten, K. H., Newsome, W. T., & Movshon, J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16, 1486–1510.
Snippe, H. P. (1996). Parameter extraction from population codes: a critical assessment. Neural Computation, 8, 511–529.
Tolhurst, D. J., Movshon, J. A., & Dean, A. F. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23(8), 775–785.
Vogels, R., Spileers, W., & Orban, G. A. (1989). The response variability of striate cortical neurons in the behaving monkey. Experimental Brain Research, 77(2), 432–436.
Wilke, S. D., & Eurich, C. W. (2002). Representational accuracy of stochastic neural populations. Neural Computation, 14, 155–189.
Wu, S., Amari, S., & Nakahara, H. (2002a). Population coding and decoding in a neural field: a computational study. Neural Computation, 999–1026.
Wu, S., Amari, S., & Nakahara, H. (2002b). Asymptotical behaviours of population codes. Neurocomputing, 44, 697.
Wu, S., Chen, D., & Amari, S. (2000a). Unfaithful population coding. In S. Amari, C. L. Giles, M. Gori, & V. Piuri (Eds.), Proceedings of the International Joint Conference on Neural Networks (IJCNN 2000) (Vol. II, pp. 199–205). New York: IEEE.
Wu, S., Nakahara, H., Murata, N., & Amari, S. (2000b). Population decoding based on an unfaithful model. In Advances in Neural Information Processing Systems (Vol. 12, pp. 192–198). Cambridge, MA: MIT Press.
Wu, S., Nakahara, H., & Amari, S. (2001). Population coding with correlation and an unfaithful model. Neural Computation, 775–797.
Yoon, H., & Sompolinsky, H. (1999). The effect of correlations on the Fisher information of population codes. In M. S. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in Neural Information Processing Systems (Vol. 11). Cambridge, MA: MIT Press.
Zemel, R. S., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10, 403–430.
Zhang, K., & Sejnowski, T. J. (1999). Neural tuning: to sharpen or broaden. Neural Computation, 11, 75–84.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370(6485), 140–143.