Supporting Text for the manuscript
“A reservoir of time constants for memory traces in cortical neurons”
Alberto Bernacchia, Hyojung Seo, Daeyeol Lee, Xiao-Jing Wang
Contents
Figures
1) Distribution of timescales and amplitudes in the random network model
1.1) Description of the model
1.2) Choice of the network parameters
1.3) Theoretical Analysis
1.4) Interpretations and limits of modelling
2) Factorization of epoch code and readout of memory-epoch conjunctions
2.1) Integrating the activity of cortical neurons
2.2) Encoding of memory-epoch conjunctions
References
Figure S1: Panels (a) and (b) show the fraction of timescales for the single (a) and double exponential fits (b). The two histograms are virtually identical. Note that panel (b) includes both τ1 (red) and τ2 (green) for the double exponential, where τ1 tends to be concentrated at small values and τ2 at large values. This is expected since, for each double exponential, we define τ1 as the short timescale and τ2 as the long timescale; therefore τ1 < τ2 by definition for each neuron. Panels (c), (d) and (e) show the same histograms separately for the three areas (c – ACCd, d – DLPFC, e – LIP). Note that ACCd tends to have longer timescales, while LIP tends to have shorter timescales.
Figure S2: The distribution of timescales (panel (a)) and amplitudes (panel (b)) for the neural memory traces of choice (same format as Figs. 4 and 8 in the main text). Results are qualitatively similar to those for reward memory. However, the exponents fitted to the curves are slightly different, and results are also less consistent across different areas. The fraction of neurons displaying memory for choice differs across areas (40% in ACCd, 71% in DLPFC, 73% in LIP; χ²-test, p ≈ 10^-12).
Figure S3: Schematic illustration of the model. The model consists of a reservoir
network of recurrently connected neurons (green), receiving an input pulse when a
reward is delivered (red). The output (blue) illustrates a hypothetical system reading out
the timescales in the reservoir network (not explicitly modelled). The memory traces of
four example neurons are shown at the bottom, taken from actual simulations.
Figure S4: Histogram of the on-diagonal (a) and off-diagonal (b) entries of one
example matrix J, the former representing the self-interactions of nodes in the network,
the latter representing the strength of the interaction among different nodes.
1) Distribution of timescales and amplitudes in the random network
model
The purpose of this section is to build a mathematical model accounting for the
distribution of memory timescales and amplitudes observed in the exponential fits of the
experimental data. We will compare the exponential decay ex(t) observed in the data
with the variable v(t) of our model that corresponds to the temporally integrated
response of neurons to reward. Consistent with the data, the response v(t) shows an
exponential decay, and we will focus on studying its amplitude and timescale across
neurons. In this section we model only the exponential factor of the memory trace in the
experimental data, and we do not consider the multiplicative effect of memory on neural
activity. The label v(t) is reminiscent of the value v(t) (or reward expectation) in
reinforcement learning theories, defined as an exponential filter of past rewards [1-4].
1.1) Description of the model
Our model consists of a large number of neurons, all driven by the input signalling the
reward. The neurons and the recurrent synaptic interactions among them form a network
architecture. The equation of dynamics of the network activity is given by
dv/dt = Jv(t) + hRew(t)
(1.1)
where v(t) is a vector of M components, each component corresponding to the activity of one neuron in response to the reward sequence Rew(t) (we set M = 1000 in simulations). The matrix J controls the synaptic interactions among different neurons, including the self-interaction, and h is the vector of input weights to the different neurons. The equation of dynamics for each separate neuron, i.e. the scalar form of Eq.(1.1), is written as
dvi/dt = Σj Jij vj(t) + hi Rew(t)
(1.2)
where different neurons are labelled by the indices i (post-synaptic) and j (pre-synaptic).
We consider a single reward event, by modelling Rew(t) as a single pulse at time zero, i.e. Rew(t) = δ(t). The response to a sequence of rewards can be obtained by a straightforward summation of single reward responses and is not considered further. Eq.(1.1) can be readily solved, and the response of neuron i is given by
vi(t) = Σk Ai(k) e^(-t/τ(k))
(1.3)
Hence, the response to a single reward is a sum of exponential functions, labelled by the
index k, with each exponential characterized by a different amplitude Ai(k) and timescale τ(k). Eq.(1.3) is the focus of our model and has to be compared with the memory traces observed in the experimental data. The values of the amplitudes and timescales depend on the choice of the matrix J and the vector h, which are specified in section 1.2 “Choice of network parameters”. In simulations, amplitudes and timescales are
determined from the spectral decomposition of J (i.e. its eigenvalues and eigenvectors)
following Eqs.(1.5) and (1.6). The distribution of amplitudes and timescales across
neurons in the model is then plotted and compared with experimental data.
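To make this concrete, here is a minimal NumPy sketch (our illustration, not the authors' simulation code) that integrates Eq.(1.1) for a small network driven by a single reward pulse and checks that the numerical response coincides with the sum of exponentials of Eq.(1.3). The connectivity used here is only a placeholder (a symmetric, negative-definite matrix); the actual prescription for J is given in section 1.2 below.

```python
import numpy as np

M = 50                                   # number of neurons (toy size)
rng = np.random.default_rng(0)

# Placeholder connectivity: symmetric and negative definite, so that the
# response decays. Section 1.2 gives the prescription used in the model.
B = rng.standard_normal((M, M))
J = -(B @ B.T) / M - 0.1 * np.eye(M)
h = rng.standard_normal(M)               # input weights

# Delta-pulse reward at t = 0: v(0+) = h, then dv/dt = J v(t), cf. Eq.(1.1).
dt, T = 1e-3, 5.0
ts = np.arange(0.0, T, dt)
v, v_num = h.copy(), np.empty((len(ts), M))
for n in range(len(ts)):
    v_num[n] = v
    v = v + dt * (J @ v)                 # forward Euler step

# Spectral solution, Eq.(1.3): v_i(t) = sum_k A_i(k) exp(-t / tau(k)),
# with tau(k) = -1/lambda(k) and A_i(k) = R_i(k) sum_j L_j(k) h_j.
lam, R = np.linalg.eigh(J)               # J symmetric: real spectrum, L = R
A = R * (R.T @ h)                        # A[i, k] = R[i, k] * (R[:, k] . h)
v_spec = A @ np.exp(np.outer(lam, ts))   # shape (M, number of time points)

print(np.max(np.abs(v_num.T - v_spec)))  # small (Euler discretization error)
```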
The synaptic interactions Jij are described in detail in the next sections. We will show that three main features of the interactions allow the model to reproduce the observed data: 1) They are strong, implying that neurons are able to reverberate the reward input, memorizing it for a long time. This gives rise to a wide (power-law) distribution of timescales. 2) They are heterogeneous, implying that each neuron displays a quantitatively different decay of the memory. 3) They are symmetric, implying that the decay is exponential and that the amplitudes are distributed exponentially.
1.2) Choice of the network parameters
The free parameters of the network model are the values of the matrix J and the vector h
in Eq.(1.1). We pick each component of the vector of input weights hi independently
from a Gaussian distribution of zero mean and variance equal to M. The specific choice
of hi does not affect our results, and a different choice would lead to the same
distribution of timescales and amplitudes in the model, although the magnitude of |h|
controls the scale of amplitudes (see section 1.3 “Theoretical analysis”). Instead, the
choice of the synaptic matrix of interactions J strongly affects the distribution of
timescales and amplitudes, and we give the specific prescriptions for J in the following.
We construct the matrix J from its spectral decomposition. As is well known in linear algebra, every (non-defective) square matrix J can be decomposed in terms of its eigenvalues and eigenvectors following the expression J = V D V^(-1), where D is the diagonal matrix of eigenvalues λ(k) (i.e. Dkk = λ(k)), the columns of the matrix V are the right eigenvectors R(k) (i.e. Vik = Ri(k)) and the rows of its inverse V^(-1) are the left eigenvectors L(k) (i.e. (V^(-1))kj = Lj(k)). It is convenient to rewrite the spectral decomposition J = V D V^(-1) in terms of its eigenvalues and right and left eigenvectors as (k = 1,…,M)
Jij = Σk Ri(k) λ(k) Lj(k)
(1.4)
Instead of setting the values of Jij directly, we set the eigenvalues and eigenvectors and compute Jij according to Eq.(1.4). The eigenvalues λ(k) are taken independently from a uniform distribution in the interval [-2,0]. Instead of drawing M eigenvalues, we draw M/2 eigenvalues from [-2,0] and count each eigenvalue twice, i.e. each eigenvalue is two-fold degenerate. The choice of eigenvectors is specific: we assume the eigenvectors to form an orthogonal basis, i.e. each eigenvector is orthogonal to all other eigenvectors and is normalized to one. Formally, this corresponds to the expression V^T V = I or, equivalently, V^T = V^(-1) (where the superscript T indicates transpose, and I is the identity matrix), implying that the matrix V is orthogonal and the left and right eigenvectors are equal. Among the infinite number of possible orthogonal matrices of dimension M, we draw V randomly from the uniform distribution over orthogonal matrices (also known as the Haar measure).
In order to gain some intuition about the uniform distribution of orthogonal matrices V,
it is useful to recognize that each orthogonal matrix corresponds to the rotation of a
vector by a given angle. In a space of dimension M, the rotation is defined by specifying
the values of M-1 angles, where each angle has a finite interval of allowed values. For
example, in three dimensions (M=3) the rotation of a vector is defined by two angles,
the azimuth angle, varying in the interval [0,2π], and the zenith angle, varying in the interval [0,π]. The uniform distribution of V can be understood as the uniform distribution of all angles, each within its respective finite interval (see below for a recipe for generating V from this distribution).
Note that our model does not have any free parameter: the statistics of J is completely
determined by the uniform distributions of its eigenvalues and eigenvectors. We set the
left bound of the distribution of eigenvalues to -2 in order to have the mean eigenvalue
equal to -1, which sets the mean self-interaction in the matrix J, hence it sets the
characteristic integration timescale of single neurons. Fig.S4a shows the distribution of
on-diagonal terms (self-interaction) of one example matrix J obtained by the above
procedure: the self-interaction is approximately -1, corresponding to a single-neuron
integration timescale equal to 1 in Eq.(1.2). The choice of zero as the right bound of the distribution of eigenvalues has important consequences: it implies that the magnitude of the synaptic interactions is large and that the network is in a critical state (see section 1.3 “Theoretical analysis”). Fig.S4b shows the off-diagonal terms (cross-interactions) of the matrix J: cross-interactions include both positive (excitatory) and negative (inhibitory) terms of order of magnitude M^(-1/2). The choice of an orthogonal
basis of eigenvectors, together with the fact that eigenvalues are real, implies that the
interactions between each pair of connected neurons i and j are symmetric, i.e. Jij = Jji.
Weak asymmetries could be considered by allowing complex values for eigenvalues
and eigenvectors.
We conclude this section with a recipe for generating a specific instance of the
orthogonal matrix V. This consists of three simple steps: 1) Generate a square matrix
W, of dimension M, by drawing each element independently from an arbitrary
distribution (for instance, a Gaussian distribution with zero mean and unitary variance).
2) Perform the orthogonal-triangular decomposition of matrix W (also known as QR
decomposition), namely W=QR, where Q is an orthogonal matrix and R is an upper
triangular matrix. 3) The final result V is obtained by multiplying each column of Q by
the sign of the corresponding value in the diagonal of R, i.e. V=QS, where S is a
diagonal matrix where each element in the diagonal is equal to the sign of the
corresponding element in R. This procedure has been shown to be equivalent to drawing V from the uniform distribution over orthogonal matrices [5].
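The recipe above translates directly into a few lines of NumPy. The sketch below is our illustration (not the authors' code): it draws V from the Haar measure via QR decomposition with the sign correction of step 3, draws M/2 eigenvalues uniformly from [-2,0], duplicates each of them, and assembles J according to Eq.(1.4).

```python
import numpy as np

def haar_orthogonal(M, rng):
    """Draw an orthogonal matrix from the Haar measure (steps 1-3 above)."""
    W = rng.standard_normal((M, M))       # step 1: Gaussian matrix
    Q, R = np.linalg.qr(W)                # step 2: QR decomposition, W = QR
    S = np.sign(np.diag(R))               # step 3: signs of the diagonal of R
    return Q * S                          # multiply column k of Q by S[k]

def build_J(M, rng):
    """Synaptic matrix J = V D V^T following the prescriptions of section 1.2."""
    V = haar_orthogonal(M, rng)
    lam = np.repeat(rng.uniform(-2.0, 0.0, size=M // 2), 2)  # two-fold degenerate
    return (V * lam) @ V.T, lam, V        # J_ij = sum_k V_ik lam(k) V_jk

rng = np.random.default_rng(1)
J, lam, V = build_J(1000, rng)

# Sanity checks matching the text: J is symmetric, self-interactions are close
# to -1, and off-diagonal interactions are of order M^(-1/2) (cf. Fig. S4).
print(np.allclose(J, J.T), np.diag(J).mean(),
      np.std(J[~np.eye(1000, dtype=bool)]))
```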
1.3) Theoretical Analysis
We can predict the distribution of timescales and amplitudes in the model by a
theoretical analysis. The first step is to write the amplitudes and timescales of the
exponential functions in Eq.(1.3) as a function of the eigenvalues λ(k) and the left and right eigenvectors Li(k) and Ri(k) of the matrix J, defined by Eq.(1.4). By straightforward linear algebra, those expressions read
τ(k) = -1/λ(k)
(1.5)
Ai(k) = Ri(k) · Σj Lj(k) hj
(1.6)
Note that the amplitudes depend also on the vector of input weights hj.
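In code, Eqs.(1.5) and (1.6) amount to a single eigendecomposition. The sketch below is a minimal illustration that assumes the symmetric (normal) J of section 1.2, so that left and right eigenvectors coincide; build_J refers to the hypothetical helper from the previous sketch.

```python
import numpy as np

def timescales_and_amplitudes(J, h):
    """Eq.(1.5): tau(k) = -1/lambda(k); Eq.(1.6): A_i(k) = R_i(k) sum_j L_j(k) h_j.
    For the symmetric (normal) J of section 1.2 the left and right eigenvectors
    coincide, so a single call to eigh suffices."""
    lam, R = np.linalg.eigh(J)            # columns of R are the eigenvectors
    tau = -1.0 / lam                      # Eq.(1.5)
    A = R * (R.T @ h)                     # Eq.(1.6): A[i, k] = R_i(k) (R(k) . h)
    return tau, A

# Example usage (hypothetical build_J from the previous sketch):
rng = np.random.default_rng(1)
J, _, _ = build_J(1000, rng)
h = rng.standard_normal(1000) * np.sqrt(1000)   # zero mean, variance M
tau, A = timescales_and_amplitudes(J, h)
```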
We assume that the eigenvalues of the matrix J are random and follow a distribution G(λ). We can calculate the distribution of timescales P(τ) as a function of the distribution of eigenvalues G(λ), by using Eq.(1.5) and the identity P(τ)dτ = G(λ)dλ. The result is
P() =  -2 G(-1/)
(1.7)
Even before specifying the distribution of eigenvalues G, it is clear that the first factor
of Eq.(1.7) corresponds to a power law with an exponent of -2, consistent with the
experimental data. If we assume that the distribution of eigenvalues is uniform, i.e. G is
constant in a given interval, the distribution of timescales is
P() ≈  -2
(1.8)
in the corresponding interval of values of τ. The interval of τ is obtained by applying Eq.(1.5) to the interval for λ. Since the interval for λ is [-2,0], the interval for τ is [1/2,+∞). It is now clear that setting the right bound for λ to zero implies that the power law distribution of τ extends to large values (up to infinity). Such a wide distribution of timescales is the signature of a critical system [6]. In the machine learning literature, it has been proposed that critical systems may perform useful types of computation (the so-called “computation at the edge of chaos” [7-9]).
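A quick Monte Carlo check of this change of variables (our illustration, not part of the original analysis): drawing eigenvalues uniformly from [-2,0] and mapping them through Eq.(1.5) gives a timescale density whose log-log slope is close to -2, as in Eq.(1.8).

```python
import numpy as np

rng = np.random.default_rng(2)
lam = rng.uniform(-2.0, 0.0, size=1_000_000)   # eigenvalues, uniform in [-2,0)
tau = -1.0 / lam                               # Eq.(1.5): tau in [1/2, infinity)

# Empirical density on logarithmic bins, compared with the tau^(-2) prediction.
bins = np.logspace(np.log10(0.5), 3, 40)
hist, edges = np.histogram(tau, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])
mask = hist > 0
slope = np.polyfit(np.log(centers[mask]), np.log(hist[mask]), 1)[0]
print(slope)                                   # close to -2, as in Eq.(1.8)
```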
Now we investigate the distribution of amplitudes in the model. We assume that the
eigenvectors of the matrix J are orthogonal (i.e. the matrix J is normal [10]), implying that
the left and right eigenvectors are equal (or complex conjugate), and we can normalize
them to unitary norm. Without loss of generality, we focus on a single eigenvector R,
and we write the distribution of its components as
Q(R) ≈ δ(|R|^2 - 1)
(1.9)
This distribution corresponds to a uniform distribution in which the normalization property of the eigenvector R is enforced (the uniform distribution on the (M-1)-sphere), where |R|^2 is the squared Euclidean norm of the vector. From Eqs.(1.6, 1.9), the
distribution of the amplitudes can be computed, and the result depends only on whether
the eigenvector R(k) is complex or it is real, and on the degeneracy of the corresponding
eigenvalue (k). We found that in the case of a real eigenvector, and for a two-fold
degenerate eigenvalue, the distribution of amplitudes is exponential, i.e.
P(A) ≈ e^(-β|A|)
(1.10)
where β = M/|h|. This expression is consistent with the distribution of amplitudes observed in the data. Note that, if the components of h are independently distributed following a Gaussian with zero mean and variance equal to M, then |h| ≈ M and β ≈ 1. It
remains unknown whether other classes of matrices J (non-normal) could give rise to an
exponential distribution of amplitudes.
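This prediction can be probed numerically. The sketch below (our illustration, reusing the hypothetical haar_orthogonal helper from the sketch in section 1.2) samples, across many random networks, the amplitude of one neuron for a two-fold degenerate eigenvalue and checks a simple signature of the exponential (Laplace) distribution, namely Var(A)/E[|A|]^2 ≈ 2.

```python
import numpy as np

rng = np.random.default_rng(6)
M, n_nets = 200, 2000
amps = np.empty(n_nets)
for s in range(n_nets):
    V = haar_orthogonal(M, rng)          # hypothetical helper from section 1.2
    h = rng.standard_normal(M) * np.sqrt(M)
    # Two eigenvectors sharing a two-fold degenerate eigenvalue: their
    # contributions to one neuron add up at the common timescale, Eq.(1.6).
    amps[s] = V[0, 0] * (V[:, 0] @ h) + V[0, 1] * (V[:, 1] @ h)

# For an exponential (Laplace) distribution, Var(A) / E[|A|]^2 = 2.
print(np.var(amps) / np.mean(np.abs(amps))**2)
```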
In summary, the theoretical analysis of the model suggests that the power law distribution of
memory timescales and the exponential distribution of amplitudes are consistent with a
random network model in which the matrix of interactions J satisfies two prescriptions.
First, the power law distribution of timescales is observed if the eigenvalues of J are
allowed to approach zero, pushing the network to the critical state (edge of chaos).
Second, the exponential distribution of amplitudes is observed if the matrix J is normal
and its eigenvectors are distributed uniformly in the space of orthogonal matrices.
1.4) Interpretations and limits of modelling
The model we considered has several limitations. The first, obvious one is that the
neural network does not incorporate the biophysics of neuronal and synaptic dynamics. Although the model does not reflect the biology of the cortex, its results can be quantitatively compared with the experimental observations, offering a detailed account of the heterogeneous responses to reward across a large number of neurons. In this section, we discuss a few issues to be considered when
comparing the model with the experimental results.
Eq.(1.3) expresses the response as a sum of a large number of exponential functions. However, a single exponential, or a sum of two exponentials, is sufficient to fit the memory trace of each neuron observed in the experimental data. Furthermore, neurons in the model have different amplitudes but share the same set of timescales, whereas the experimental results show that different neurons feature different timescales. In the following we list a few possible scenarios, not mutually exclusive, that can reconcile the model with the experimental data.
1) Since the distribution of amplitudes within a single neuron is exponential,
characterized by a strong peak at zero, some timescales would have amplitudes
close to zero. Then, each neuron shows only the timescales that correspond to
a non-zero amplitude for that neuron, implying that different neurons might
have different timescales, consistent with the experimental data.
2) Because the distribution of timescales is power-law, with high probability the largest timescale in the network is much larger than the second largest, which in turn is much larger than the third, and so on, down to a set of timescales of similar and small magnitude. Since exponentials with short timescales decay away very fast compared with those with long timescales, these very fast responses are negligible. Then, only one or two
timescales contribute to the sum in Eq.(1.3), consistent with the single or
double exponential fits used in the experimental data.
3) Different neurons measured experimentally might be accounted for by
different instances of the network model, i.e. different realizations of the
synaptic interactions. This would explain why different neurons in the data are
characterized by different timescales, and is consistent with at least two
different interpretations. First, different cortical neurons might belong to
different and disconnected sub-networks. Second, Eq.(1.1) might not describe
a neural network, but an intracellular network, and each neuron is separately
characterized by a network of intracellular reactions that can be modelled by
Eq.(1.1) [11-14].
In particular, in simulations we have implemented scenarios 2) and 3) together. We
performed 1000 simulations, each characterized by different instances of the
interactions, i.e. the matrix J and the vector h. In each simulation we consider only the
longest timescale and a single neuron. Without loss of generality, we set k=1 for the
longest timescale, and i=1 for the single neuron, and we record in each simulation τ(1) and A1(1), discarding all the other timescales and amplitudes. During 1000 simulations, we collect 1000 timescales and 1000 amplitudes, and we compute histograms for A and τ across the different simulations. Hence, we interpret each simulation as a separate
measure of the activity from a single neuron, and the number of simulations, rather than
the number of neurons in a single simulation, determines the number of neurons
measured in the model. The results of implementing this scheme show no qualitative
difference with respect to considering a single network, namely the distribution of
timescales is still power law and the distribution of amplitudes is still exponential. This
result is obvious for the amplitudes, since we pick in each network an independent
instance from the same exponential distribution. For the timescales, since we pick the
longest timescale in each network, the distribution across networks will be equal to the
extreme value density of a power law distribution with exponent -2, which is equal to the Cauchy distribution [15]. As shown in Fig.7b in the main text, this distribution shows a cut between a plateau and the power law tail. Consistently, the cut is also present in the data (Fig.4), occurring at roughly τ = 0.5 trials. In the model, the characteristic integration timescale of a neuron is set to 10 ms, and timescales have been expressed in units of trials, with one trial corresponding to 3.4 s, the mean trial duration in the experiment. With this normalization, the cut occurs at τ = 0.5 trials, consistent with the data.
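A sketch of this simulation scheme (our illustration, assuming the hypothetical build_J and timescales_and_amplitudes helpers introduced in the earlier sketches): each iteration generates a fresh network, keeps only the longest timescale and a single neuron, and the histograms are then taken across iterations.

```python
import numpy as np

rng = np.random.default_rng(3)
M, n_sims = 200, 1000                    # network smaller than M = 1000, for speed

taus, amps = np.empty(n_sims), np.empty(n_sims)
for s in range(n_sims):
    J, lam, V = build_J(M, rng)          # fresh interactions for each simulation
    h = rng.standard_normal(M) * np.sqrt(M)
    tau, A = timescales_and_amplitudes(J, h)
    k = int(np.argmax(tau))              # keep only the longest timescale...
    taus[s], amps[s] = tau[k], A[0, k]   # ...and a single neuron (i = 1 in the text)

# Across simulations, taus should follow the extreme-value (Cauchy-like) density
# of a power law with exponent -2, while amps remains exponentially distributed.
print(np.median(taus), np.mean(np.abs(amps)))
```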
2) Factorization of epoch code and readout of memory-epoch
conjunctions
In this section we discuss possible computational advantages of the factorization of the
epoch code observed in the data. We show that the factorization allows the computation
of arbitrary memory-epoch conjunctions.
2.1) Integrating the activity of cortical neurons
Following Eq.(2) in the main text, the firing rate of a cortical neuron in trial n and epoch
k can be expressed as the epoch code g(k) times a factor that depends on the past
rewards, namely
FR(n,k) = g(k)[1+∑n’=0:5 ex(t)Rew(n-n’)]
(2.1)
where t denotes the time elapsed since the outcome of n’ trials in the past, and ex(t) is
an exponential function (see Methods). Our goal is to describe the general properties of
a readout neuron integrating signals of the type of Eq.(2.1).
As observed in the data, different cortical neurons correspond to a variety of different
epoch codes g(k) and exponential functions ex(t). We label cortical neurons by the
index i, we denote the firing rate of neuron i as FRi, and the corresponding epoch code
and exponential by gi(k) and exi(t). We restrict our analysis to linear integration, namely
the input Iout, from cortical neurons to the readout neuron, is a sum over the firing rates
of cortical neurons
Iout = ∑i wiFRi = ∑i wigi(k)[1+∑n’=0:5 exi(t)Rew(n-n’)]
(2.2)
where wi denotes the synaptic weight of the connection from neuron i to the readout
neuron. Because the contributions of different outcomes (Rew) to Iout add up linearly
(sum over n’), we can focus on the contribution of a single outcome, denoted by fout and
given by
fout(k,t) = ∑i wi[gi(k)exi(t)]
(2.3)
This corresponds to the response of the readout neuron to a single outcome (note that
we have dropped the 1 in Eq.(2.2), since it does not depend on the outcome). We aim at
describing the encoding of the two variables, epoch k and elapsed time t, by the readout
neuron. The former represents the present task context, while the latter represents the
memory of a specific past event that occurred t seconds ago, and we are interested in the
conjunction of information between that past event and the present task demands. Note
that the two variables (k,t) are not independent, but we show that, formally, they can be
treated as independent in most cases (see below), and for simplicity we assume that t
measures the number of elapsed trials since a given outcome.
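As an illustration of this setup (not the authors' analysis code), the sketch below builds a population of model cortical neurons with randomly chosen epoch codes gi(k) and exponential traces exi(t); the specific random choices of tuning are our assumptions. It then evaluates the firing rate of Eq.(2.1) and the readout input of Eq.(2.2).

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, T = 200, 8, 6                      # neurons, epochs per trial, trials back

# Illustrative tuning: random positive epoch codes, exponential memory traces
# with heterogeneous timescales and exponentially distributed amplitudes.
g = rng.uniform(0.5, 2.0, size=(N, K))             # g_i(k), epoch code
tau = rng.pareto(1.0, size=N) + 0.5                # heavy-tailed timescales (trials)
amp = rng.laplace(0.0, 1.0, size=N)                # amplitudes
t = np.arange(T)                                   # trials elapsed since outcome
ex = amp[:, None] * np.exp(-t[None, :] / tau[:, None])   # ex_i(t)

def firing_rate(i, k, rewards):
    """Eq.(2.1): FR_i(n,k) = g_i(k) [1 + sum_{n'} ex_i(n') Rew(n-n')]."""
    return g[i, k] * (1.0 + ex[i] @ rewards)       # rewards[n'] = Rew(n - n')

def readout_input(w, k, rewards):
    """Eq.(2.2): I_out = sum_i w_i FR_i(n,k)."""
    return sum(w[i] * firing_rate(i, k, rewards) for i in range(N))

rewards = rng.integers(0, 2, size=T)               # outcomes of the last T trials
w = rng.standard_normal(N) / N                     # synaptic weights to the readout
print(readout_input(w, k=3, rewards=rewards))
```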
2.2) Encoding of memory-epoch conjunctions
In this section, we show that the multiplicative form [gi(k)exi(t)] gives the readout
neuron a substantial power in computing arbitrary conjunctions of the two variables
(k,t). As a comparison, we show that the multiplicative form does better than the
additive form [gi(k)+exi(t)]. Following Eq.(2.3), the problem can be stated as finding the
weights wi resulting in the output fout(k,t) that best approximates a given, arbitrary target
function ftarget(k,t), i.e. the weights minimizing their average square difference. The
target function is chosen depending on the requirements of an arbitrary task. For
example, if the task requires learning the effect of the outcome t = 3 trials in the past on epoch k = 4 of the current trial, then we may require the readout neuron to be active only
during epoch k = 4 and to signal the outcome of only t = 3 trials in the past, by defining
the target function as ftarget(4,3) = 1 and ftarget(k,t) = 0 for all other values of (k,t). We
assume that k takes K values (k = 1,…,K), t takes T values (t = 1,…,T), such that the
function ftarget(k,t) is characterized by KT arbitrary values, and the number of inputs in
Eq.(2.3) (labelled by the index i) is of large magnitude, close to KT. In general, solving
this problem requires a standard application of linear algebra, namely the pseudo-inversion of the matrix in square brackets of Eq.(2.3), where each row of the matrix
corresponds to a combination of the two values (k,t) and each column corresponds to a
different input i. The quality of the reconstruction of ftarget from the available inputs
depends on the rank of the matrix, and better results are obtained when the rank
approaches the number KT of values of ftarget to be reconstructed. When summation
[gi(k)+exi(t)] is used instead of multiplication [gi(k)exi(t)], the matrix has only K+T
independent entries, and its rank is at most K+T, a very small number with respect to
KT. Instead, the multiplicative matrix always has full rank, and it guarantees the best
possible reconstruction given the available number of input neurons. Given that the
multiplicative form [gi(k)exi(t)] is full rank, optimal reconstruction of the target
function requires the inversion of the following matrix
Aij = [∑kgi(k)gj(k)][∑texi(t)exj(t)]
(2.4)
Note that this expression measures how similar the encodings of the two variables (k,t) are between the two neurons i and j. We may call gi(k) and exi(t) the “tuning curves” of
neuron i for the encoding of k and t, in analogy with other behavioural variables
explored in the neurophysiology literature. If the tuning curves of different neurons are
very different, or they have a small overlap (for example, different neurons are tuned to
different values of the variable of interest), then the on-diagonal elements of matrix A
are much larger than the corresponding off-diagonal terms (|Aii||Ajj| >> |Aij||Aji|),
implying that the matrix is well-conditioned and can be robustly inverted. In other
words, the heterogeneous encoding of k and t implies that the problem of finding the
optimal synaptic connections is computationally easy. If neurons display a wide variety
of tuning curves for epoch and for elapsed trials, then the reconstruction of the target
function is robust. In fact, consistent with previous work, we observe a wide
heterogeneity of tuning curves, and this provides experimental support for our arguments.
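The rank argument above can be checked directly. In the sketch below (our illustration, with arbitrary random tuning curves) the multiplicative design matrix is full rank and the least-squares (pseudo-inverse) solution reconstructs the example target ftarget(4,3) = 1 almost exactly, whereas the additive matrix has rank at most K+T and fails.

```python
import numpy as np

rng = np.random.default_rng(5)
K, T = 8, 6                              # epochs and elapsed trials
N = K * T                                # number of input neurons, close to KT

g = rng.uniform(0.5, 2.0, size=(N, K))   # illustrative epoch codes g_i(k)
amp = rng.uniform(0.5, 2.0, size=(N, 1))
tau = rng.uniform(0.5, 5.0, size=(N, 1))
ex = amp * np.exp(-np.arange(1, T + 1) / tau)   # illustrative traces ex_i(t)

# Design matrices: rows index the (k, t) pairs, columns index the inputs i.
mult = (g[:, :, None] * ex[:, None, :]).reshape(N, K * T).T   # g_i(k) * ex_i(t)
add = (g[:, :, None] + ex[:, None, :]).reshape(N, K * T).T    # g_i(k) + ex_i(t)
print(np.linalg.matrix_rank(mult), np.linalg.matrix_rank(add))  # ~KT vs <= K+T

# Target conjunction: respond only for epoch k = 4 and outcome t = 3 trials back.
target = np.zeros((K, T))
target[3, 2] = 1.0                       # zero-based indices for (k=4, t=3)
target = target.ravel()

for name, X in (("multiplicative", mult), ("additive", add)):
    w = np.linalg.lstsq(X, target, rcond=None)[0]   # pseudo-inverse solution
    print(name, np.linalg.norm(X @ w - target))     # ~0 for mult, much larger for add
```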
We conclude this section with a technical remark about the dependence of the two
variables k and t. The time t elapsed since the outcome depends on the epoch k, because
epochs in the current trial follow one after another in time. However, since the
dependence of time on the epoch is linear, the exponential function ex(t) gives one
additional factor that can be absorbed in the epoch code g(k). More concretely, as an
example, we assume that we can express time in terms of elapsed trials n’ and epoch k,
in the form t = n’+k, where  is the duration of one trial and  is the duration of one
epoch. Then, in the case of a single exponential, ex(t) = Aexp(-t/), we can rewrite the
term in square brackets of Eq.(3.3) as
g(k)ex(t) = g(k)Aexp(-t/) = [g(k)exp(-k/)][Aexp(-n’/)] = h(k)e(n’)
(2.5)
Now the two variables k and n’ are both factorized and independent, and the last
equality defines the functions h(k) and e(n’). The case in which ex(t) is the sum of two
exponentials can be treated similarly, where the two exponentials can be considered as
two separate inputs. In the light of Eq.(2.5), we rewrite Eq.(2.3) in terms of the
independent variables, epoch k and elapsed trials n’, as
fout(k,n’) = ∑i wi[hi(k)ei(n’)]
(2.6)
Hence, all the above considerations for the encoding of epoch and elapsed time hold
true for the case in which the two variables of interest are independent.
References
1. Sutton, R. S. & Barto, A. G., Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998).
2. Lee, D. & Wang, X.-J., Neural circuit mechanisms for stochastic decision making in the primate frontal cortex. In Neuroeconomics: Decision Making and the Brain, Academic Press, New York, NY, pp. 481-501 (2009).
3. Rushworth, M. F. S. & Behrens, T. E. J., Choice, uncertainty, and value in prefrontal and cingulate cortex. Nature Neuroscience 11, 389-397 (2008).
4. Kable, J. W. & Glimcher, P. W., The Neurobiology of Decision: Consensus and Controversy. Neuron 63, 733-745 (2009).
5. Mezzadri, F., How to Generate Random Matrices from the Classical Compact Groups. Not. Am. Math. Soc. 54, 592-604 (2007).
6. Sornette, D., Critical Phenomena in Natural Sciences. Springer-Verlag, Berlin (2004).
7. Langton, C. G., Computation at the edge of chaos: phase transitions and emergent computation. Physica D 42, 12-37 (1990).
8. Bertschinger, N. & Natschläger, T., Real-Time Computation at the Edge of Chaos in Recurrent Neural Networks. Neural Comput. 16, 1413-1436 (2004).
9. Sussillo, D. & Abbott, L. F., Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron 63, 544-557 (2009).
10. Trefethen, L. N. & Embree, M., Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton University Press, Princeton, NJ (2005).
11. Curtis, C. E. & Lee, D., Beyond working memory: the role of persistent activity in decision making. Trends Cogn. Sci., in press (2010).
12. Egorov, A. V., Hamam, B. N., Fransen, E., Hasselmo, M. E. & Alonso, A. A., Graded persistent activity in entorhinal cortex neurons. Nature 420, 173-178 (2002).
13. Wang, X.-J., Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci. 24, 455-463 (2001).
14. Goldman, M., Compte, A. & Wang, X.-J., Neural integrators: recurrent mechanisms and models. In New Encyclopedia of Neuroscience, edited by L. Squire, T. Albright, F. Bloom, F. Gage & N. Spitzer. MacMillan Reference Ltd, pp. 165-178 (2008).
15. Coles, S. G., An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Springer, New York (2001).