Diss. ETH No. 17821
Learning to classify complex patterns
using a VLSI network of spiking
neurons
A dissertation submitted to
ETH ZURICH
For the degree of
Doctor of Sciences
Presented by
Srinjoy Mitra
MTech Microelectronics
Indian Institute of Technology, Bombay
Thesis committee
Prof. Rodney Douglas
Prof. Stefano Fusi
Dr. Giacomo Indiveri
2008
Abstract
The learning and classification of natural stimuli are accomplished by biological organisms with remarkable ease, even when the input is noisy or incomplete. Such real-time classification of complex patterns of spike trains is a
difficult computational problem that artificial neural networks are confronted
with. The performance of classical neural network models depends critically
on an unrealistic feature: the fact that their synapses have unbounded weights.
In contrast, biological synapses face the hard limit of physical bounds as well
as problems of noisy, unmatched elementary devices. Although the reasons
for the superiority of the nervous system in the real world are not completely
understood, it is obvious that the main methods of neural computation in
biology are very different from those of modern digital computers.
In the brain, neuronal networks perform local analog computation and
transmit information using energy-efficient asynchronous events (spikes). Unlike the digital logic elements built in silicon, computational primitives in
biology (i.e. neurons and synapses) are imprecise, but exhibit highly fault-tolerant behavior. Nevertheless, the basic elements of the neural substrate
and that of silicon technology obey similar physical principles. The emerging
discipline of neuromorphic engineering recognizes and exploits such similarities, and maps the properties of neural computation onto silicon to implement new types of computing devices. Such a neuromorphic system can be
designed to emulate the spike-based synaptic plasticity of a biological neural
network, the root of learning and classification.
The goal of this project was to build a spike-based hardware device that
exhibits memory formation and classification, in real time and with minimal
power consumption. Understanding how to accomplish this in VLSI networks
of spiking neurons can not only contribute to an insight into the fundamental
mechanisms of computation used in the brain, but could also lead to efficient
hardware implementations for a wide range of applications, from autonomous
sensory-motor systems to brain-machine interfaces.
In this thesis, the silicon implementation of a novel spike-based supervised-learning mechanism that utilizes bounded synapses with limited analog resolution is presented. The learning mechanism modifies the synaptic weights
only as long as the current generated by all the stimulated plastic synapses
does not match the output desired by the supervisor, as in the perceptron
learning rule (Brader et al., 2007). This thesis also describes the development
and verification of the hardware system capable of performing the reliable event-based, asynchronous communication necessary for neuromorphic systems. It
shows how the modules involved in designing the communication channel can
be improved using asynchronous circuit design techniques.
Using the device developed, real-time classification of complex patterns
of mean firing rates is carried out. The circuits responsible for synaptic plasticity and their dependence on pre- and post-synaptic signals are extensively
characterized in this work. The results include experimental data describing the behavior of the device in classifying random uncorrelated binary
patterns and quantification of the memory capacity. The proposed system
demonstrates, for the first time, robust classification of highly correlated
spike patterns on a silicon device. It could successfully learn graded and
corrupted patterns that can lead to the classification of real-life spike trains from silicon sensors or from nerve signals. The thesis demonstrates how the scaling properties of the VLSI system match those of the theoretical learning rule.
The VLSI system developed exhibits superior performance when compared to state-of-the-art spike-based learning systems. This device is an
ideal candidate for low-power biomedical applications or for integration into
a multi-chip spike-based neuromorphic system. It is already under examination for the classification of spoken vowels, captured by a silicon cochlea. A
larger system has recently been built and is undergoing characterization to
demonstrate an even higher classification performance.
Résumé
The learning and classification of natural stimuli are accomplished with remarkable ease by biological organisms, even when the stimulus is noisy or incomplete. Such real-time classification of complex sets of spike trains is a difficult computational problem with which artificial neural networks are confronted. The performance of classical artificial neural networks depends critically on an unrealistic feature, namely that their synaptic weights have no bounds. In contrast, biological synapses face the strict limitations imposed by physical constraints, as well as problems of noise and mismatch between their elementary components. Although the reasons for the superiority of the nervous system in the real world are not completely understood, it is evident that the main methods of neural computation in biology are very different from those of modern computers.
In the brain, neuronal networks perform local analog computations and transmit information as energy-efficient asynchronous events (action potentials). Unlike electronic components fabricated in silicon, the biological computational units (i.e. neurons and synapses) are imprecise, but largely fault-tolerant. Nevertheless, the basic components of the neural substrate and those of silicon technologies obey similar physical principles. The emerging discipline of neuromorphic engineering recognizes and exploits these similarities, and transfers the properties of neural computation onto silicon to implement new kinds of computation in electronic circuits. Such neuromorphic systems can be designed to emulate the synaptic plasticity of biological neural networks, the principal actor in learning and classification.
The goal of this project was to build an electronic circuit exploiting the processing of digital action potentials for memory formation and classification. Understanding how to build such a circuit with VLSI networks of neurons can not only help provide insight into the fundamental computational mechanisms used by the brain, but could also lead to more efficient electronic implementations in many application domains, from autonomous sensorimotor systems to brain-machine interfaces.
In this thesis, the electronic implementation of a novel supervised learning mechanism is presented, which uses bounded synapses with limited analog resolution. The learning process modifies the synaptic weights only as long as the current generated by all the stimulated plastic synapses does not correspond to the output desired by the supervisor, as in the perceptron learning rule (Brader et al., 2007). This thesis also describes the development and verification of an electronic system implementing the asynchronous spike-based communication necessary for neuromorphic systems. It is shown how the modules used to build the communication channels can be improved by asynchronous circuit design techniques.
With the circuit developed, real-time classification of complex patterns of mean firing rates is carried out. The circuits responsible for synaptic plasticity, and their dependence on pre- and post-synaptic signals, are described in detail. The results include experimental data describing the behavior of the circuit in classifying random, uncorrelated binary patterns, and the quantification of the memory capacity. The proposed system demonstrates, for the first time, the ability of a silicon device to robustly classify highly correlated spike patterns. It could successfully learn graded and corrupted patterns, usable for the classification of real-world spike trains coming from electronic sensors or from nerve signals. This thesis demonstrates that the scaling of the VLSI system's properties with the number of synapses matches the predictions deduced from the theoretical learning rule.
The performance of the VLSI system we developed is superior to that of the best event- or spike-based learning systems. This circuit is an ideal candidate for low-power biomedical applications, or for integration into multi-chip neuromorphic systems. It is already being considered for the classification of spoken vowels captured by a silicon cochlea. A larger system has recently been built and is being tested, and should demonstrate an even higher level of classification performance.
Contents

1 Introduction
  1.1 Biologically inspired hardware
    1.1.1 Digital vs Analog
    1.1.2 Spiking vs rate models
  1.2 Why Learning?
  1.3 Thesis outline

2 Biophysical models of learning
  2.1 Introduction
  2.2 Spike-driven plasticity
  2.3 The palimpsest property
    2.3.1 Bounded and bistable synapses
  2.4 Stochastic update and stop-learning mechanisms
  2.5 The learning rule
  2.6 Network description

3 AER Communication circuits
  3.1 Introduction
  3.2 Basics of AER communication
  3.3 Single sender and single receiver
    3.3.1 Pipelining the data
  3.4 Multiple sender and multiple receiver
    3.4.1 Data path design
  3.5 Receiver handshake
  3.6 Arbitration basics
    3.6.1 Standard arbiter
    3.6.2 Fast, unfair arbiter
    3.6.3 Fair arbiter
  3.7 Decoder
    3.7.1 Delay
    3.7.2 Address Latching
    3.7.3 Receiver Synapse Select
  3.8 Conclusions

4 Circuits for synaptic plasticity
  4.1 Introduction
  4.2 The IFSL family of chips
  4.3 The pre-synaptic module
    4.3.1 The pulse extender block
    4.3.2 The weight update block
    4.3.3 The bistability block
    4.3.4 The EPSC block
    4.3.5 Point-neuron architecture
  4.4 The post-synaptic module
    4.4.1 The I&F soma block
    4.4.2 The pulse integrator block
    4.4.3 The dual threshold comparator block
  4.5 Configuration of synaptic density
  4.6 Conclusions

5 Characterization of the plasticity circuits
  5.1 Introduction
  5.2 The post-synaptic module
  5.3 The pre-synaptic module
  5.4 Transition probabilities
  5.5 STDP phase relation
  5.6 Multiplexer functionality
  5.7 Conclusions

6 Spike based learning and classification
  6.1 Introduction
  6.2 Network architecture
  6.3 Training methodology
  6.4 Evolution of synaptic weights
  6.5 Classifying multiple spatial patterns
    6.5.1 Uneven class distributions
  6.6 Quantitative analysis of classification performance
    6.6.1 Boosting the classifier performance
  6.7 Classification of correlated patterns
  6.8 Classification of graded patterns
  6.9 Conclusions

7 Discussion
  7.1 Relevance of the work described in this thesis
    7.1.1 A robust AER communication system
    7.1.2 Synaptic plasticity in silicon
    7.1.3 Learning and classification in VLSI
  7.2 Future Work
  7.3 Outlook

A C-element

B Hand Shaking Expansion

C Current-mode log domain Filter

Bibliography
List of Figures

1.1 Voltage clamp experiment in biology and silicon
1.2 Data from voltage clamp experiment
1.3 Generalized artificial behavioral system
1.4 Structure of the hippocampus and details of a synapse
2.1 Spike time dependent plasticity data and model
2.2 Memory retention experiment
2.3 Theoretical learning rule, simulation results
2.4 Architecture for a feedforward network
3.1 Comparison between real and virtual axon
3.2 Schematic of an AER system
3.3 Fundamentals of synchronous pipeline and 4-phase handshaking
3.4 Pipelining in AER communication cycle
3.5 Implementation of pipeline in AER communication channel
3.6 Data exchange between combinational blocks
3.7 Conventional and wired-OR gate
3.8 On-chip verification of AER event
3.9 AER communication failure
3.10 Mutual exclusion circuit
3.11 Output of mutual exclusion circuit
3.12 General arbitration scheme
3.13 Standard arbiter cell
3.14 Standard arbiter timing diagram
3.15 Fast, unfair arbiter cell
3.16 Data from fast, unfair arbiter
3.17 Fast, fair arbiter cell
3.18 Data from fast, fair arbiter
3.19 Traditional decoder scheme
3.20 RC delay in decoder
3.21 A pre-decoding system
3.22 Pre-decoder layout
3.23 Dual rail data communication
3.24 AER receiver chip implementation
3.25 Pixel select circuit in AER receiver
4.1 Layout of neurons and synapses on the silicon chip
4.2 Pulse Extender (PE) circuit
4.3 Simulation result of the PE circuit
4.4 Weight update block
4.5 Bistability block
4.6 Dependence of EPSC on bistability output
4.7 Alternative design of the bistability block
4.8 Simulation results of the bistability block
4.9 Model of a silicon synapse
4.10 Differential pair integrator (DPI) schematic
4.11 Point neuron architecture and its silicon equivalent
4.12 learn control signals generated from post-synaptic module
4.13 Integrate-and-fire neuron stimulated at the soma and synapse
4.15 Coupling between global shared signals
4.16 Dual threshold voltage comparator
4.17 Data from IFSL-v1 chip
4.18 WTA circuit used as current comparator
4.19 Complete post-synaptic module
4.20 Active current mirror
4.21 Use of a multiplexer to reconfigure synaptic density
5.1 Silicon I&F neuron
5.2 Verification of the learn control functionality
5.3 Stimulation protocols
5.4 Stochastic transition
5.5 Synaptic update depends on control voltages
5.6 Dependence of transition probability on νpost
5.7 Stochastic transition without initialization
5.8 Stochastic LTP transition with initialized synapses
5.9 Stochastic LTD transition with initialized synapses
5.10 STDP phase relation
5.11 Multiplexer verification data
6.1 Binary patterns for training
6.2 Evolution of νpost for a C+ training
6.3 Evolution of νpost for a C− training
6.4 Classification result for four spatial patterns
6.5 Classification result for six and eight spatial patterns
6.6 Pattern recognition of 2D binary images
6.7 Memory recall from corrupted data set
6.8 Classification of one pattern out of many
6.9 Basics of ROC analysis
6.10 Classification performance and memory capacity
6.11 Boosting the classification performance
6.12 Classification performance for correlated patterns
6.13 Classification performance with and without the stop-learning
6.14 Classification performance for graded patterns
6.15 Classification performance for Gaussian distributed input
A.1 C element truth table
A.2 C element implementation
C.1 Basic log-domain filter
C.2 Gain in log-domain filter
Chapter 1
Introduction
Understanding the functional and structural properties of the brain has been a baffling task, challenging scientists for a long time. While neuroscientists, coming from virtually all disciplines of science, are hard put to decipher this enigma, engineers are often reluctant to dive into the realm of biology. Although it is widely accepted that evolution has done an excellent job in engineering the organization of biological systems, traditionally engineered systems have failed to incorporate its virtues. This has been particularly true for the computing industry, which is solely driven by the kind of problem it wants to solve, i.e., high-speed data crunching. Conventional methods of computation and information processing have nevertheless increased computing power by many orders of magnitude since the early ENIAC 1, and must be acknowledged as the most successful technology of the past century. However, even the most advanced general-purpose computer falls far behind the capacity of a fly brain when simple behavioral tasks (navigation, pattern recognition, communication) are concerned. As Carver Mead, one of the forefathers of large-scale electronics on silicon, pointed out (Mead, 1990):
Biological information-processing systems operate on completely
different principles from those with which most engineers are familiar. For many problems, particularly those in which the input
data are ill-conditioned and the computation can be specified in
a relative manner, biological solutions are many orders of magnitude more effective than those we have been able to implement
using digital methods.
1
Unveiled in 1946 at the University of Pennsylvania, it was the first general-purpose electronic computer that was Turing-complete. It weighed 27 tonnes, occupied 63 m² and consumed 150 kW of power.
He envisioned that the technology of silicon chips could be appropriately utilized to morph the computational primitives of biological systems. This is in stark contrast to how digital computers function. Digital computers are good at precision-driven arithmetic operations, while animals have evolved brains and senses that let them interact efficiently with the imprecise inputs of the real world. A comparison between the efficiency of the human brain and that of standard digital computers, though qualitative, highlights some huge discrepancies. Roughly, the 10^12 neurons in the human brain, each with an average of 10^3 synaptic connections, spike at an average rate of 10 Hz. This adds up to nearly 10^16 synaptic events per second. From measurements of cerebral blood flow and oxygen consumption, it is estimated that the brain consumes around 10 W, burning a mere 10^-15 J per operation (Mead, 1990; Sarpeshkar, 1998). On the other hand, the bleeding-edge Intel 80-core teraflop (10^12 floating point operations per second) processor consumes nearly 100 W (Intel, 2007), burning 10^-10 J per operation (roughly an order of magnitude improvement compared to IBM Blue Gene, the most powerful supercomputer to date). Even these prohibitively expensive, gigantic prototype machines, the results of multi-million dollar research, are nearly a million times less efficient than the brain. However, some researchers intend to understand the behavior of large neural networks (comparable to the size of the brain) by simulating them on large clusters of these super-fast general-purpose digital computers. E. M. Izhikevich simulated 10^11 neurons for one second on a Beowulf cluster of twenty-seven 3 GHz machines, a task that took fifty days to complete (Izhikevich, 2005)! The impressive Blue Gene/L supercomputer at EPFL, Switzerland, with its 8000 processors, will be able to simulate detailed models of just 10,000 neurons constituting a single cortical column (Markram, 2006). Anders Lansner and his team are undertaking a similar effort at KTH (Sweden), simulating 100 cortical hypercolumns with 22 million neurons, but more than 5000 times slower than real time (Djurfeldt et al., 2008). These attempts show the difficulties faced by general-purpose digital computers in simulating even very basic functionalities of the brain in an efficient manner. Custom hardware that is not limited to simulating large neural networks might instead be necessary to emulate them. To build a brain-like intelligent system, one should design computational primitives on silicon using the same physical laws as in biology.
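As a sanity check on the figures quoted above, the short Python sketch below redoes the arithmetic. The inputs (neuron count, synapses per neuron, mean firing rate and power budgets) are the rough order-of-magnitude estimates cited in the text (Mead, 1990; Sarpeshkar, 1998; Intel, 2007), not measurements, so the outputs are only indicative.

# Back-of-the-envelope energy comparison using the estimates quoted above.
neurons = 1e12               # neurons in the human brain (order of magnitude)
synapses_per_neuron = 1e3    # average synaptic connections per neuron
mean_rate_hz = 10.0          # average firing rate

synaptic_events_per_s = neurons * synapses_per_neuron * mean_rate_hz
brain_j_per_op = 10.0 / synaptic_events_per_s    # ~10 W total power budget
chip_j_per_op = 100.0 / 1e12                     # ~100 W for 10^12 flop/s

print(f"synaptic events per second: {synaptic_events_per_s:.0e}")   # ~1e16
print(f"brain energy per operation: {brain_j_per_op:.0e} J")        # ~1e-15 J
print(f"chip energy per operation:  {chip_j_per_op:.0e} J")         # ~1e-10 J
print(f"ratio (chip / brain):       {chip_j_per_op / brain_j_per_op:.0e}")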
1.1 Biologically inspired hardware
As research in artificial neural networks gained momentum during the late 80s, a group of researchers started looking into its hardware implementation. Two distinct schools of thought soon emerged, based on digital and analog neuromorphic hardware. As most VLSI designers around the globe consider digital design their forte, their familiarity with the field and the availability of high-level languages for design synthesis often made it an obvious choice over analog design. However, a small but dedicated community of analog designers, led by Carver Mead, pioneered the field of neuromorphic VLSI, which took advantage of the inherent analog nature of silicon and biology (see Boahen, 2005; Sarpeshkar, 2006; Indiveri et al., 2008, for recent reviews). Neuromorphic hardware offers a medium in which neuronal networks can be emulated directly, in real time and with extremely low power consumption.
1.1.1 Digital vs Analog
There was substantial interest in developing dedicated digital hardware for neural networks in the early 90s. The fast design cycle, flexibility of design, and high precision of digital computation initiated a deluge of specialized chips both from industry and academia (Lindsey and Lindblad, 1995). SIMD (single instruction multiple data) parallel processors, notably CNAPS (Hammerstrom, 2002), were successful in implementing large networks at speeds much greater than those of conventional microprocessors. Perhaps the most promising technology for digitally emulating neural networks today is the Field Programmable Gate Array (FPGA). These are semiconductor devices containing programmable logic components, programmable interconnects and some memory elements. Their ease of reconfigurability and fast design cycle give them a leading edge over all other hardware implementations. Unlike a general-purpose processor, which divides computation over time, dedicated hardware like an FPGA divides it across space by using more physical resources. This inherently parallel computing architecture gives FPGA devices an additional advantage as candidates for neural hardware. FPGAs are now available with hardware-optimized multipliers and with larger on-chip memory to generate huge networks of spiking neurons (Ros et al., 2006). Due to limited silicon real estate, FPGA neural networks almost always use fixed-point arithmetic rather than standard floating-point arithmetic. This is generally not a big handicap, as precision can often be traded for redundancy in such networks. However, this design choice has potential pitfalls, since it is not clear whether a given fixed-point representation will allow a faithful implementation of the underlying model (Pearson et al., 2007). Apart from that, the high area and power budget required for digital computation becomes a limiting factor when larger networks are concerned.
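To make the fixed-point concern concrete, the following minimal Python sketch quantizes a synaptic weight to a hypothetical 8-bit signed format with 7 fractional bits; the word width is an illustrative choice, not one taken from any of the FPGA systems cited above.

def to_fixed(x, frac_bits=7, word_bits=8):
    """Quantize x to a signed fixed-point integer of the given width."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def to_float(q, frac_bits=7):
    """Convert the stored integer back to the real value it represents."""
    return q / (1 << frac_bits)

w = 0.3141592      # an arbitrary weight value
q = to_fixed(w)    # integer that would actually be stored on the device
print(q, to_float(q), abs(w - to_float(q)))   # 40  0.3125  ~1.7e-3 error

Whether an error of this size is tolerable depends on the model being emulated, which is exactly the faithfulness question raised by Pearson et al. (2007).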
Figure 1.1: Voltage clamp experiment. (a) Schematic of the arrangement used to measure current through the cell membrane of a squid axon, where the membrane potential is clamped to a fixed voltage. The Sense electrode measures the actual intracellular potential while the current through the Drive electrode pushes it towards the desired Control voltage. (Adapted from Hodgkin and Huxley (1952)). (b) A similar experimental setup duplicated for an nMOS transistor connected as a two-terminal device, to measure the current through it.
Fundamental considerations show that digital processing is more efficient
than analog with respect to power consumption (as well as chip area) when
the required signal-to-noise ratio (or more generally the required precision)
is large, typically larger than 60 dB (Sarpeshkar, 1998). Conversely, analog
processing is more efficient when low precision is acceptable. This is the case
for evaluative processing, in which the need for precision of individual cells is
replaced by that for collective computation in massively parallel structures
(Vittoz, 1998). In analog circuitry, complex nonlinear operations such as multiplication, division, and hyperbolic tangents can be performed with a handful of transistors. Analog computation in the subthreshold domain 2 comes with
the added benefit of extremely low power consumption and an exponential
transfer function.
Fig. 1.1(a) shows the method of measuring the current through a cell membrane while clamping the membrane potential to a desired value (Control voltage); Fig. 1.2(a) shows the measured current.
2
In MOS transistors the amount of current (i) flowing between two of its terminals (source and drain) is controlled by the voltage (v) applied at a third terminal (gate). Transistors operated at very low gate voltages show an exponential i−v relation (subthreshold), whereas at higher gate voltages the relation is quadratic. Most analog and digital circuits, however, use MOS transistors at gate voltages above the subthreshold regime.
Figure 1.2: (a) Exponential current-voltage characteristics of voltage-dependent membrane channels (Hodgkin and Huxley, 1952) measured from the voltage clamp experiment in Fig. 1.1(a). (b) Analogous current-voltage relation from an nMOS silicon transistor connected as in Fig. 1.1(b).
A similar method applied to MOS transistors operating in the subthreshold regime (Fig. 1.1(b)) produces the nearly identical i−v characteristics shown in Fig. 1.2(b). Carver Mead realized that this similarity in the nature of charge transfer between a cell membrane and a silicon transistor, both following the Boltzmann distribution law, can be exploited to build circuits that mimic biology. Douglas et al. (1995) showed that, in general, neural computational primitives such as conservation of charge, amplification, exponentiation, thresholding, compression and integration arise naturally out of the physical processes of analog circuits. These observations provided solid ground for neuromorphic engineers around the world to flourish with ideas for emulating the nervous system.
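The exponential i−v relation referred to above is captured by the standard weak-inversion (subthreshold) MOS model. The sketch below evaluates it for a few gate voltages; the parameter values are generic textbook numbers, not those of any circuit described in this thesis.

import math

I0 = 1e-15      # leakage pre-factor (A), strongly process dependent
kappa = 0.7     # subthreshold slope factor
UT = 0.0258     # thermal voltage kT/q at room temperature (V)

def subthreshold_current(vgs):
    """Saturation drain current of an nMOS in weak inversion (amperes)."""
    return I0 * math.exp(kappa * vgs / UT)

for vgs in (0.2, 0.3, 0.4):
    print(f"Vgs = {vgs:.1f} V  ->  Id = {subthreshold_current(vgs):.2e} A")
# With these values every additional ~85 mV of gate voltage gives roughly one
# more decade of current, the same kind of exponential law as the membrane
# channels of Fig. 1.2(a).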
1.1.2 Spiking vs rate models
The basis of communication between neurons in the brain, and also in artificial neural networks, is their output activity pattern. Traditionally, neural
networks used rate-based models, where the normalized average firing rate (a
quantity between 0 and 1) is the information to be transmitted to another
neuron. These models were successful in understanding various neuronal
mechanisms including learning (Bell and Sejnowski, 1997; Blais et al.).
Figure 1.3: A generalized artificial system for behavioral tasks. The system partitioning, though much simpler than that in biology, consists of relevant modules which can have neuromorphic analogues. The modules should be able to efficiently transfer information among themselves in real time.
Rate models could be efficiently implemented in dedicated hardware, both digital (Lindsey and Lindblad, 1995; Danese, 2002) and analog (Valle, 2002). However, they have limitations from both theoretical and practical perspectives. These models do not address phenomena such as temporal coding, spike-timing dependent synaptic plasticity, or any short-time behavior of neurons. Spike-based coding, in contrast, allows spatio-temporal information to be incorporated in communication and computation, as real neurons do (Gerstner, 2001). Where hardware implementation is concerned, communicating analog signals (neuronal firing rates) between chips seriously limits the system bandwidth and also the number of processing units that can be implemented per chip. Neuromorphic engineering, on the other hand, has mostly relied on spike-based coding, where fast inter-chip communication can be achieved by transmitting digital spikes multiplexed on a single bus (see chapter 3 for a detailed discussion).
A generalised multi-chip behavioral system would ideally have front-end signal processing circuits connected to the sensor, followed by higher-order processing and finally a motor control unit. In Fig. 1.3 we show such a system in a simplified form, where each subsystem can be considered a separate neuromorphic chip. Taking advantage of the physics of silicon, the sensors and front-end processing circuits (often integrated together) are designed to encode the stimulus as spikes. Given spike-based inter-chip communication, it is logical to also base the higher-order computational units on spike coding. Here we focus on one of the most essential features of an artificial behavioral system, implementing spike-based learning and classification on a VLSI device.
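As a concrete illustration of the spike-based coding assumed here, the sketch below turns a mean firing rate into a Poisson spike train, the kind of stimulus later assumed for the plastic synapses (see Chap. 2); the rate, duration and time step are arbitrary example values.

import random

def poisson_spike_train(rate_hz, duration_s, dt=1e-3, seed=None):
    """Spike times (in seconds) of a Poisson process with the given mean rate."""
    rng = random.Random(seed)
    return [i * dt for i in range(int(duration_s / dt))
            if rng.random() < rate_hz * dt]

train = poisson_spike_train(rate_hz=50.0, duration_s=1.0, seed=42)
print(len(train), "spikes in 1 s")   # close to 50 on average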
1.2 Why Learning?
Most of the effort in the early days of neuromorphic engineering was focused on designing efficient sensors and, subsequently, the front-end processing circuits necessary for them. Vision and audition are, by far, the two most researched topics in this field. A number of vision chips, varying in the degree to which they truly emulate the biology of the retina, have been built; the early work of Mahowald (1992) and the later chips of Culurciello et al. (2003), Zaghloul and Boahen (2006) and Lichtsteiner et al. (2006) are notable examples. Similarly, Lyon and Mead (1988) paved the path for auditory chips, later developed by Chan et al. (2006), Wen and Boahen (2006) and Sarpeshkar (2006), among others. Recently, various neuromorphic chemical sensors have been successfully demonstrated, e.g. Georgiou and Toumazou (2007), Koickal et al. (2007) and Shen et al. (2003), extending the scope of the field even further.
The front-end signal processing for most neuromorphic devices is performed right on the sensor chip, producing spike trains encoding features of the input stimulus. Chips for spatially filtering the data from vision sensors (Etienne-Cummings et al., 1999; Zaghloul and Boahen, 2005), for optimized signal extraction from noisy bioelectric interfaces using probabilistic models (Genov and Cauwenberghs, 2002; Chen et al., 2006), or for specialized functions like convolution (Serrano-Gotarredona et al., 2006) also fall into this category.
There are a number of examples of neuromorphic chips responsible for feature extraction and other higher-order neuronal processing, orientation selectivity (Choi et al., 2004; Chicca et al., 2007), feature extraction (Vogelstein et al., 2007) and saliency detection (Bartolozzi and Indiveri, 2007a) being a few among them.
Neuromorphic chips for motor control are mostly in their early stages of
development (Vogelstein et al., 2006; Still et al., 2006). This is partially
because of the lack of robust on-chip classification necessary to stimulate
motor behavior in a selective manner.
In building a real-time behavioral system, learning and classification undoubtedly form an integral part. Yet the volume of research in devising a neuromorphic learning chip is comparatively low. One obvious reason is the lack of established models pertaining to spike-based learning, essential for the multi-chip system shown in Fig. 1.3. The classification of patterns performed by neural networks is usually the result of a training procedure, during which the synaptic strengths between neurons are modified. It has been well established that plasticity in the hippocampal and neocortical synapses of the brain is the root of memory formation (Martin et al., 2000; Shouval et al., 2002). Excitatory synapses throughout the brain are bidirectionally modifiable, and this has been studied in great detail in layers 2/3 and 5 of the neocortex and in CA1 cells of the hippocampus. However, the actual methodology of synaptic modification is still a matter of debate. Given the complexity of learning and memory, it is possible to find many forms of synaptic plasticity with different mechanisms of induction and expression (Abbott and Nelson, 2000).
Figure 1.4: a) The highlighted hippocampus region is considered to be the center of learning and memory in the primate brain. Bidirectional synaptic plasticity, believed to be the cause of memory formation, has been observed in the hippocampus and in some other cortical synapses. b) A cartoon of two neurons forming a synapse, and details of the synapse indicating the complex biochemical mechanisms at work during a synaptic event.
Spike timing dependent plasticity (see Sec. 2.2) is one possible mechanism, motivated by experimental (Levy and Steward, 1983; Markram et al., 1997) and theoretical studies (Kempter et al., 1999; Abbott and Song, 1999). It has been taken up by some neuromorphic engineers to devise learning systems (Arthur and Boahen, 2006; Bofill-i Petit and Murray, 2004; Indiveri et al., 2006a) due to its success in solving various computational problems. In fact, a similar form of spike-timing based learning was proposed in a neuromorphic chip by Hafliger and Mahowald (1999) even before the formal STDP rule gained popularity.
In accordance with the theoretical necessity of STDP, Bofill-i Petit and Murray (2004) and Indiveri et al. (2006a) showed methods for the analog VLSI implementation of such a synaptic rule. Fusi et al. (2000) and Häfliger (2007) showed the feasibility of other forms of spike-based learning in silicon, including detailed characterisation of the synaptic dynamics. It was only in Arthur and Boahen (2006) and Häfliger (2007) that some basic classification experiments were performed; however, these works do not quantify the classification behavior. Apart from that, no one has tried to solve the difficult problem of classifying
correlated patterns in a silicon device. In this thesis we show an efficient VLSI implementation of a very robust spike-based learning rule (Brader et al., 2007). We designed a family of chips, code-named IFSL (the different versions are IFSL-v1, IFSL-v2 and IFSL-WTA), that map the theoretical learning rule onto silicon for real-time learning and classification. We present
experimental data, mostly from the IFSL-v2 chip, describing the detailed
behavior of the learning circuits, at the single neuron and synapse level,
and quantify classification results of complex spatial patterns of mean firing
rates, both uncorrelated and correlated. IFSL-WTA is still in the process of
characterisation. Giulioni et al. (2007) showed preliminary results using the
same learning rule implemented in a VLSI chip with much more area and
power overhead (but with added reconfigurability).
1.3 Thesis outline
This thesis is divided into seven chapters. In this chapter I presented some introductory remarks on the motivation for building a VLSI chip capable of real-time spike-based learning and classification. This device can be a part of a large multi-chip behavioral system for applications like robotics or man-machine interfaces. Chapter 2 deals with the theoretical basis of learning and classification. I discuss the various physical characteristics observed in a natural learning system and the models implementing such behavior. I describe in detail a bio-plausible learning rule with stochastic and bistable synapses (Brader et al., 2007) and justify the advantages of implementing it in VLSI. In Chap. 3, I introduce the Address-Event Representation (AER) for inter-chip communication in neuromorphic systems. I review the asynchronous pipelining methodology and suggest improvements to existing AER using the knowledge of asynchronous communication channel design. I also study the various combinational circuits required for robust data transfer between neuromorphic chips and show possible improvement schemes. The CMOS circuit blocks implementing the spike-based learning rule are described in Chap. 4. Various circuits designed in the neuromorphic community were reused and new circuits designed. I highlight the advantages and disadvantages of the different circuit elements used in the IFSL 3 family of chips developed in this project. In Chap. 5 I describe various methods to characterize the individual circuit blocks and the on-chip synaptic plasticity. The behavior of the silicon synapses is also compared with the theoretical requirements. Chapter 6 deals with spike-based learning and classification on the VLSI system. I first describe the training and testing methodology of the silicon neurons. Next, the classification performance for random binary patterns of mean frequencies is described in detail, along with rigorous quantification. I also demonstrate the difficult case of classifying correlated and graded patterns on neuromorphic VLSI, reported for the first time. I conclude the thesis by summarizing the results achieved and discussing ideas about further work and the outlook in Chap. 7.
3
IFSL stands for Integrate and Fire with Stop Learning.
Chapter 2
Biophysical models of learning
2.1 Introduction
The important role of activity-dependent modifications of synaptic strength
in learning and memory formation is well accepted in the neuroscience community (Abbott and Nelson, 2000). Synapses throughout the brain are considered to be bidirectionally modifiable. This property, postulated in almost
every theoretical description of synaptic plasticity, has been most clearly
demonstrated in the CA1 region of the hippocampus (Martin et al., 2000;
Shouval et al., 2002). Understanding the biophysical mechanisms underlying such functional plasticity and learning has been an important aspect in
neuroscience research. From the theoretical point of view, the methods of learning can be broadly grouped into two classes: Hebbian plasticity and classical conditioning. The generalised Hebb rule states that synapses change in proportion to the correlation or covariance of the activities of the pre- and post-synaptic neurons (Dayan and Abbott, 2001). Classical conditioning, on the other hand, uses the correlation between multiple input signals
(conditioned and unconditioned) to determine the synaptic weight.
Hebbian plasticity, in the form of long-term potentiation (LTP) and depression (LTD), provides the basis for most models of learning and memory,
as well as for the development of cortical maps (Bi and Poo, 2001). It can be subdivided into two categories, supervised and unsupervised. In supervised learning, for every pre-synaptic input the post-synaptic neuron is provided with a target; i.e., the environment tells it what its response should
be. The neuron then compares its actual response to the target and adjusts
the synapse in such a way that it is more likely to produce the appropriate
response the next time it receives the same input. In unsupervised learning, the neuron receives no external signal. Instead, the task is to re-represent the inputs in a more efficient way, as clusters or categories, or using a reduced
set of dimensions. Unsupervised learning is based on the similarities and differences among the input patterns. In classical conditioning, a process akin
to supervised learning, a reinforcer (reward or punishment) is delivered to
the neuron independently of the output of the neuron.
Here, we will focus only on the supervised learning methodology for complex real-world classification problems. In order to classify patterns, both
natural and artificial physical systems are required to create, modify and
preserve memories of the representations of the learned classes. In particular,
they should be capable of modifying the memory elements, the synapses, in
order to learn from experience and create new memories (memory encoding).
They should also be able to protect old memories from being overwritten
by new ones (memory preservation). We will discuss various physical and
theoretical constraints in designing an artificial device for such classification
tasks.
2.2 Spike-driven plasticity
Most learning rules for memory encoding are formulated in terms of mean firing rates, using a continuous variable representing the mean pre- and post-synaptic activity. They generally use a sigmoidal function between input and output. Such a rate-based model neglects the effects originating from the pulse structure of the input signal. However, recently there has been an increased interest in spike-based Hebbian learning compared to the pure rate-based models (Maass and Bishop, 1998; Kempter et al., 1999; Xie and Seung, 2000). This is influenced both by the biophysical mechanism of information transfer (an all-or-none event) and by the experimental evidence supporting synaptic modification affected by individual spikes. Spike time dependent plasticity (STDP), first observed by Markram et al. (1997) and Bi and Poo (1998), is the most popular spike-based synaptic modification mechanism formulated to date. This is a form of bidirectional synaptic modification dependent on the relative timing of the pre- and post-synaptic spikes. As shown in Fig. 2.1(a), the polarity and magnitude of the modification depend on the phase difference of the two spikes. Various possible STDP windows are shown in Fig. 2.1(b). The properties of STDP have
been extensively studied both in recurrent neural networks and at the level
of single synapses (e.g., Rubin et al., 2001; Kempter et al., 2001). These
mechanisms have important regulatory properties and have been shown to
create memories of temporal patterns of spikes (Legenstein et al., 2005; Gütig
and Sompolinsky, 2006).
Figure 2.1: a) Experimental data showing the synaptic modification in hippocampal neurons following an STDP rule (adapted from Bi and Poo (1998)). b) Possible STDP time windows; see Caporale and Dan (2008) for a detailed review. Time on the x-axis is in milliseconds.
However, STDP in its simplest form is not suitable for learning patterns of mean firing rates. It is usually too sensitive to the specific temporal pattern of spikes and it can hardly be generalized to trains of spikes sharing the same mean firing rate (Abbott and Nelson, 2000).
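For reference, the canonical exponential STDP window used in the models cited above can be written down in a few lines; the amplitudes and time constants below are illustrative and are not fitted to the data of Fig. 2.1.

import math

A_PLUS, A_MINUS = 0.01, 0.012   # relative weight-change amplitudes
TAU_PLUS = TAU_MINUS = 20.0     # time constants (ms)

def stdp_dw(delta_t_ms):
    """Weight change for delta_t = t_post - t_pre (milliseconds)."""
    if delta_t_ms > 0:   # pre before post: potentiation
        return A_PLUS * math.exp(-delta_t_ms / TAU_PLUS)
    return -A_MINUS * math.exp(delta_t_ms / TAU_MINUS)   # post before pre: depression

for dt in (-40, -10, 10, 40):
    print(f"dt = {dt:+d} ms  ->  dw = {stdp_dw(dt):+.4f}")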
In most studies, there is a specific range of frequencies that must be used for successful induction of STDP. This frequency dependence suggests that something more than simple pairing of pre- and post-synaptic spike times is necessary for this form of bidirectional plasticity (Lisman and Spruston, 2005). In this temporally asymmetric Hebbian learning, the biophysical mechanism of coincidence detection (of the two spikes) depends on the back-propagating action potential (bAP) 1 . The location dependence of the bAP in a dendritic tree imposes additional constraints on theoretical STDP models.
1
Action potentials, initiated in the axon, are found to propagate into the dendrites of hippocampal and cortical pyramidal cells. They form the primary feedback signal to the synapse and determine the shape and polarity of the STDP window, depending on the synaptic location (Letzkus et al., 2006).
Figure 2.2: Memory retention experiments for cockroaches (a) and college students (b) show a similar decay. Memory decays at a faster rate when new experiences occur than during sleep (adapted from Jenkins and Dallenbach, 1924).
It should also be noted that the standard form of STDP is not suitable for a VLSI implementation due to the difficulty of long-term storage of the analog
weights. Most attempts in this direction (Bofill-i Petit and Murray, 2004;
Arthur and Boahen, 2006; Indiveri et al., 2006a) have considered added assumptions to drive the final synaptic states to binary values.
2.3 The palimpsest property
The problem of memory preservation has often been neglected in various theoretical models by assuming simplified but unrealistic characteristics for a physical system, like unbounded synaptic weights. However, capacity-limited memory systems need to gradually forget old information in order to avoid catastrophic forgetting, where all information is lost at once (Sandberg et al., 2002). In their classical experiment, Jenkins and Dallenbach (1924) showed that memory is destroyed by new experiences and not only by time (see Fig. 2.2). Networks with this property are called palimpsests, by analogy
with the ancient practice of cleaning old texts from papyrus to make way for
new ones (Nadal et al., 1986). In order to prevent too-fast forgetting, one
can introduce a stochastic mechanism for selecting only a small fraction of
synapses to be changed upon the presentation of a stimulus. Such a mechanism can be easily implemented by exploiting the noisy fluctuations in the
pre- and post-synaptic activities to be encoded (Fusi et al., 2000). However,
this also slows down learning, and memories must be experienced several times to produce a detectable mnemonic trace.
2.3.1 Bounded and bistable synapses
The remarkable memory capacity of classic neural network models depends critically on the fact that their synapses are unbounded. However, physical implementations of long-lasting memories, either biological or electronic, are
confronted with two hard limits: the synaptic weights are bounded (i.e.
they cannot grow indefinitely or become negative), and the resolution of the
synapse is limited (i.e. the synaptic weight cannot have an infinite number
of stable states). These constraints, usually ignored by the vast majority of software models, have a strong impact on the classification performance of the network and on its memory storage capacity. In the case of bounded synapses,
if one assumes that the long-term changes cannot be arbitrarily small, the
memory trace decays exponentially with the number of stored patterns. The
neural network remembers only the most recent stimuli, and the memory
span cannot surpass a number of patterns that is proportional to the logarithm of the number of neurons (Senn and Fusi, 2005; Fusi and Senn, 2006).
p < -\frac{\log N}{\log(1 - Q)}  \qquad (2.1)
Here p denotes the number of stored patterns at one time, N is the number
of synapses, and Q the minimum probability of inducing a long-term change. Slowing down the learning process (decreasing the transition probability Q) allows an increase in storage capacity, as it also leads to slower forgetting.
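A quick numerical reading of Eq. (2.1), with arbitrary example values of N and Q, shows both the logarithmic dependence on the number of synapses and the gain obtained by slowing learning down:

import math

def memory_span(n_synapses, q):
    """Upper bound on the number of retrievable patterns, Eq. (2.1)."""
    return -math.log(n_synapses) / math.log(1.0 - q)

for n in (1_000, 10_000, 100_000):
    for q in (0.5, 0.1, 0.01):
        print(f"N = {n:>6}, Q = {q:<4}  ->  p < {memory_span(n, q):8.1f}")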
The resolution, or analog depth, of a synapse is the number of synaptic states that can be preserved over a long time scale. It has been shown that, if each synapse has n such stable states, the number of patterns p can grow quadratically with n. However, this can happen only in unrealistic scenarios where fine tuning of the network's parameters is allowed. In more realistic scenarios, where there are inhomogeneities and variability (as is the case for biology and silicon), p is largely independent of n (Fusi and Abbott, 2007). At the same time, there has been accumulating evidence that biological synaptic contacts undergo all-or-none modification (Petersen et al., 1998; O'Connor et al., 2005), with no intermediate stable states. Positive feedback loops in the biochemical process, involving protein signaling cascades at the post-synaptic density, have been hypothesized to durably maintain the evoked synaptic state in the form of a bistable switch (Graupner and Brunel, 2007). The lack of intermediate analog states also makes the synapses robust and noise-immune. Hence, in this work, we consider the extreme case of a bistable synapse, also because of the ease of implementation on a VLSI device suited to the long-term storage of digital bits.
2.4 Stochastic update and stop-learning mechanisms
Recently a new model of spike-driven synaptic plasticity has been proposed
(Brader et al., 2007) that can encode patterns of mean firing rates and is
very effective in protecting old learned memories. In doing so, it captures the
rich phenomenology observed in neurophysiological experiments on synaptic plasticity, including STDP protocols. This model uses Hebbian learning
with stochastic updates and an additional stop-learning condition to classify
broad classes of linearly separable patterns. As explained in previous sections, stochastic update is a requirement arising from the theoretical necessity of slowing down the learning process in an unbiased way. The dynamics
of the synapse makes use of the stochasticity of the spike emission process
of pre- and post-synaptic neurons. It is assumed that the pre-synaptic spike
train is Poissonian, while the afferent current to the post-synaptic neuron
is uncorrelated to the pre-synaptic process. To implement slow learning, only a random small fraction of the synapses, out of the large number of afferents (N), should be modified at a time. The storage capacity obtained from the stochastic update method is proportional to √N (Fusi and Senn, 2006), a significant improvement compared to log(N), which was shown in the previous
section. In this model, the memory lifetime is further extended by modifying
the synapses only when necessary, i.e., when the input pattern weighted by
the plastic synapses does not generate the output desired by the supervisor.
If it does, synaptic modifications are stopped (using a stop-learning mechanism). This results in a very efficient method of classifying wide classes
of highly correlated random patterns (Brader et al., 2007; Senn and Fusi,
2005). The mechanism of stop-learning resembles that of the perceptron
learning rule (Minsky and Papert, 1969): patterns already learned do not change the synapses any more. In the model, stop-learning is implemented
using a variable representing the average post-synaptic firing rate, without
the need of any additional external signal.
The learning rule in Brader et al. (2007) showed superior performance in
classifying complex patterns of spike trains ranging from stimuli generated by
auditory/vision sensors (Coath et al., 2005) to images of handwritten digits from the MNIST² database (Brader et al., 2007). Both the memory encoding (using mean firing rates and stochastic updates) and the memory preservation (using binary weights and low transition probabilities for the synapses) methods used in this model fit well with the physical limitations of a silicon device. Storage and recovery of binary values on silicon has been practiced by the VLSI industry for a long time and in a very successful manner. Using the very same strategies, we can build large arrays of synapses that are bistable in nature. The compact synaptic circuits do not require local Analog-to-Digital Converters or floating-gate memory cells for storing weight values.
In addition, the inherent inhomogeneities of the silicon fabrication process can be exploited to our advantage when stochastic updates of the synaptic weights are required. By construction, these types of devices operate in a massively
parallel fashion and are fault-tolerant: even if a considerable fraction of the
small synaptic circuits are faulty due to fabrication problems, the overall
functionality of the chip is not compromised. This can be a very favorable
property considering the potential problems of unreliability in future scaled
VLSI processes.
2.5 The learning rule
The main goal of the synaptic model is to encode patterns of mean firing
rates. The stochastic selection and stop-learning are given by two simple abstract learning rules. Consider a single neuron receiving a total current h, which is the weighted sum of the activities sj of its N inputs (Brader et al., 2007):
h = (1/N) Σ_{j=1}^{N} (Jj − gI) sj ,    (2.2)
where the Jj are the binary plastic excitatory synaptic weights (Jj ∈ {0, 1}), and gI is a constant representing the contribution of an inhibitory population. The synaptic learning rule can be summarized as:
Ji → 1 with probability q·si   if h < Ω and ξ = 1
Ji → 0 with probability q·si   if h > Ω and ξ = 0,    (2.3)
² A large database of handwritten characters that provides a good benchmark for learning-network performance and has been used to test numerous classification algorithms.
Figure 2.3: Characteristics of the synaptic modification. The top plots show the conceptual framework, while simulation results (adapted from Brader et al., 2007) are shown below. a) The probability of upward or downward synaptic jumps depends on the average firing frequency of the post-synaptic neuron (νpost). b) Within the right frequency range, the polarity of the jumps depends on the value of the post-synaptic depolarization (Vmem) compared to a threshold, Vmth.
where ξ is a binary variable indicating the desired output as specified by the teacher, Ω is the threshold on the input current h that determines whether the neuron is active or not, and q is the proportionality constant for the transition probability.
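To make the structure of Eqs. 2.2 and 2.3 concrete, the following Python sketch applies the rule to one binary input pattern. It is only an illustration of the abstract rule: the number of afferents, the values of q, gI and Ω, and the random pattern are arbitrary assumptions and do not correspond to the parameters used in Brader et al. (2007).

import numpy as np

rng = np.random.default_rng(0)

n_inputs = 100         # N afferents
J = rng.integers(0, 2, n_inputs).astype(float)   # binary plastic weights Jj
g_I = 0.5              # inhibitory contribution
q = 0.05               # transition-probability constant
Omega = 0.0            # threshold on the total current h

def update(J, s, xi):
    """One presentation of pattern s with teacher label xi (Eq. 2.3)."""
    h = np.mean((J - g_I) * s)                 # total current, Eq. 2.2
    flip = rng.random(n_inputs) < q * s        # stochastic selection of synapses
    if xi == 1 and h < Omega:
        J[flip] = 1.0                          # potentiate a random fraction
    elif xi == 0 and h > Omega:
        J[flip] = 0.0                          # depress a random fraction
    return J                                   # otherwise: stop-learning, no change

s = rng.integers(0, 2, n_inputs).astype(float)  # one binary input pattern
J = update(J, s, xi=1)

Note how the stop-learning condition appears simply as the absence of any update when the current h is already on the correct side of Ω.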
In order to make the model match neurophysiological observations, specific experimental results were considered. Though the presence of spike-timing-dependent plasticity (STDP) at low pre- and post-synaptic firing frequencies has been well established both in vitro and in vivo (see Caporale and Dan, 2008, for a review), LTP dominates when both pre- and post-synaptic neurons fire at higher frequencies, independent of the phase relation of the spikes (Sjostrom et al., 2001). Nelson et al. (2002) also demonstrated that the post-synaptic neuron has to be sufficiently depolarized for LTP to occur. The model in Brader et al. (2007) shows that with one variable for the average post-synaptic frequency and another for the post-synaptic depolarization,
the abstract learning rule can be implemented together with the neurophysiological constraints. While the post-synaptic calcium concentration [Ca], with its long time constant, is a good measure of the average firing frequency, the membrane voltage Vmem is a direct reading of the depolarization. All synapses are considered to be modified only upon the arrival of pre-synaptic spikes.
Considering x as a variable representing the synaptic weight of a single
synapse, the weight update during a pre-synaptic spike tpre is given by:
x → x + ∆x   if Vmem(tpre) > Vmth and k^L_UP < [Ca](tpre) < k^H_UP
x → x − ∆x   if Vmem(tpre) ≤ Vmth and k^L_DN < [Ca](tpre) < k^H_DN ,    (2.4)
where k^L_UP, k^H_UP, k^L_DN and k^H_DN are the thresholds on the calcium variable.
In the absence of a pre-synaptic spike, or if none of the conditions in Eq. 2.4 is satisfied, x drifts towards one of the two bistable states:
dx/dt = α    if x > θ
dx/dt = −β   if x ≤ θ,    (2.5)
where α and β are positive constants and θ is a constant threshold. In
the simplified model used for the silicon implementation, we have:
k^L_UP = k^L_DN → k1 ,   k^H_DN → k2   and   k^H_UP → k3
α = β
x = w,    (2.6)
where w is the actual synaptic weight used for the generation of the excitatory post-synaptic current (EPSC).
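A minimal sketch of the simplified spike-driven rule (Eqs. 2.4-2.6) is given below. The threshold values k1 < k2 < k3, Vmth, the jump size, the drift rate and the clamping of x to [0, 1] are placeholders chosen only to expose the structure of the rule; they are not the bias values used on the chip.

def on_presynaptic_spike(x, V_mem, Ca, V_mth=0.8, k1=0.1, k2=0.5, k3=0.9, dx=0.2):
    """Jump of the internal synaptic variable x at a pre-synaptic spike (Eq. 2.4)."""
    if V_mem > V_mth and k1 < Ca < k3:
        x += dx                       # candidate for potentiation
    elif V_mem <= V_mth and k1 < Ca < k2:
        x -= dx                       # candidate for depression
    return min(max(x, 0.0), 1.0)      # assumed hard bounds on x

def drift(x, dt, alpha=0.01, theta=0.5):
    """Bistable drift between spikes (Eq. 2.5, with alpha = beta as in Eq. 2.6)."""
    x += alpha * dt if x > theta else -alpha * dt
    return min(max(x, 0.0), 1.0)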
2.6 Network description
The learning rule is designed to modify the synaptic matrix in such a way
that each pattern seen during training can be retrieved later without mistakes. In the case of a feedforward network, this means that each input
pattern produces the correct response indicated by the teacher during the
training. For a recurrent network, each pattern imposed by the sensory
stimuli becomes a fixed point of the network dynamics. Under additional
stability conditions, these fixed points can also be attractors of the network
Figure 2.4: a) A schematic of the network architecture for a data set consisting
of two classes. The output units (top) are connected to the input layer
(bottom) by the plastic synapses. The output units receive additional inputs
from the teacher and inhibitory populations. b) Cartoon of a silicon output
neuron showing the soma, axon and the synapse array. Patterns are presented
to the plastic synapses (p1 to pN ) and the teacher signal is fed through a
non-plastic excitatory synapse (n2 ). Another neuron receives the same input
pattern and feeds it to the output neuron via an inhibitory synapse (n1 ).
dynamics (Senn and Fusi, 2005). Here we consider the output neurons to
be binary classifiers embedded in a feedforward network. The network architecture we mostly used consists of a single feedforward layer composed of
N input neurons fully connected by plastic synapses to one output neuron.
In addition to the inputs, the output neuron receives signals from a teacher and an inhibitory population. A binary teacher signal divides the input into two distinct classes and dictates which patterns the neuron should respond to. Figure 2.4(a) shows the input and output layers together with the inhibitory and teacher populations. Multiple output neurons can be used to respond to different classes of patterns. In the corresponding VLSI implementation, shown in Fig. 2.4(b), the output neuron receives the input spikes at its plastic synapses (p1 to pN). The input is also sent, in parallel, to an inhibitory neuron. The inhibitory neuron and a teacher signal stimulate the output neuron at its non-plastic synapses (n1, n2).
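The following sketch puts the abstract rule of Sec. 2.5 into the network context of Fig. 2.4: one output neuron, N plastic input synapses and a binary teacher signal that labels two classes of noisy binary patterns. The data set, the parameter values and the number of training presentations are all invented for illustration, not taken from the experiments described later.

import numpy as np

rng = np.random.default_rng(1)
N, q, g_I, Omega = 200, 0.05, 0.5, 0.0

# Two classes of correlated binary patterns (illustrative data).
protos = rng.integers(0, 2, (2, N)).astype(float)
def sample(c):                                  # noisy copy of a class prototype
    flip = rng.random(N) < 0.05
    return np.where(flip, 1 - protos[c], protos[c])

J = rng.integers(0, 2, N).astype(float)         # plastic weights of one output neuron
for _ in range(500):                            # training with a teacher signal
    c = rng.integers(0, 2)                      # class label; the teacher sets xi = c
    s = sample(c)
    h = np.mean((J - g_I) * s)
    flip = rng.random(N) < q * s
    if c == 1 and h < Omega:
        J[flip] = 1.0
    elif c == 0 and h > Omega:
        J[flip] = 0.0                           # otherwise: stop-learning

correct = sum((np.mean((J - g_I) * sample(c)) > Omega) == (c == 1)
              for c in rng.integers(0, 2, 100))
print("test accuracy:", correct / 100)

The printed accuracy is only meant to show that the stochastic, binary-weight rule can separate the two classes; it is not a result reported in this thesis.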
Chapter 3
AER Communication circuits
3.1 Introduction
Neuromorphic engineering, from its conception, promised a better utilization of silicon physics and low-power analog circuits for emulating the functional behavior of the nervous system. At the same time it had to address the daunting task of mimicking the complex wiring of a three-dimensional physiological structure on an essentially two-dimensional device. As pointed out in Mahowald (1994), it might seem impossible even in principle to build a structure in VLSI that mimics such wiring density. The degree of convergence and divergence of a single neuron is staggering in comparison to artificial devices such as a computer chip. This unusual need for a large fanout (and fanin) initiated thinking on a new strategy that exploits the speed of conventional VLSI systems (Sivilotti, 1991; Mahowald, 1994; Boahen, 1998). The Address-Event Representation (AER) protocol, as it came to be known, is one of the most important achievements of this process. Over the years, trade-offs on different aspects of the protocol, such as channel access and encoding schemes, have been analyzed and improved (Boahen, 2000; Culurciello and Andreou, 2003; Boahen, 2004a). AER is becoming increasingly popular as a means of data transfer in pulse-coded neural networks and has been successfully implemented for multi-chip neuromorphic systems (Serrano-Gotarredona et al., 2005; Chicca et al., 2007). Teixeira et al. (2006) even developed an AER emulator to speed up the post-processing of data from AER sensors. The usability of the protocol has been greatly improved by the simultaneous development of supporting hardware infrastructure (Deiss et al., 1998; Dante et al., 2005). The popularity and success of the method led to a few alternative AER schemes, mainly word-serial (Boahen, 2004c) and serial AER (Berge and Hafliger, 2007; Fasnacht et al., 2008) communication.
Figure 3.1: Comparison between the biological and the neuromorphic information transfer. a) In biology, neurons transmit spikes through dedicated axons. b) In neuromorphic systems, AER uses the bandwidth of a copper wire to emulate virtual axons for all the silicon neurons. An encoder and a decoder (gray boxes) are required to multiplex the spike addresses. c) A non-trivial connectivity can be set up by sending the multiplexed spikes through a look-up table.
In this chapter I describe the basics of AER communication and show a method to formalize the description of the protocol according to Boahen (2000). In contrast to many prior implementations of the AER circuits, the formal representation helps in identifying possible improvements in the design. I describe various improvements with relevant data from the silicon chips. The AER communication, being the backbone of data transfer in neuromorphic chips, has to be carefully optimized for both speed and robustness.
3.2 Basics of AER communication
In biology, dedicated axons carry information (spikes) from a neuron to all its synapses, far from the cell body (see Fig. 3.1(a)). The synapses create connections between the neurons, and the axons form an intricate 3D network of cables within the layers of the cortex. It is not feasible to build such an architecture for a large network of silicon neurons and synapses due to the inherently 2D structure of a VLSI system. The need for AER arose not only to circumvent the problem of connecting silicon neurons located on different chips but also within the same chip. VLSI chips, in general, have few external pins, far fewer than the number of neurons or synapses present on them. Hence, a metal wire connected to a pin cannot be used as a dedicated axon
Figure 3.2: Details of an address event (AE) bus. Two neuromorphic chips, each having a 1D array of neurons with synapses attached to them, are connected via a single AE bus. The bus consists of data and control path signals used to establish communication between two arbitrary neurons via silicon synapses. Multiple chips can be similarly connected, forming a large network of neurons and synapses.
serving a neuron. The strategy is to use the high bandwidth of a copper wire (compared to a physiological axon) and the speed of a VLSI system (compared to physiological time scales) to perform time-division multiple access. This allows the formation of virtual connections between two neurons on different chips, or even on the same chip. Figure 3.1(b) shows how the virtual connections carry spikes from the neurons to all their synapses, sharing the same physical connection. An intermediate look-up table, shown in Fig. 3.1(c), can direct spikes from their source to the desired destinations, forming an arbitrary neuronal connectivity map.
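The routing through the look-up table of Fig. 3.1(c) can be summarized in a few lines of Python. The addresses, time stamps and connectivity map below are arbitrary illustrative values; they only show how the temporal order of the multiplexed events, together with a mapper, recreates point-to-point virtual axons.

# Toy address-event stream routed through a look-up table (mapper).
events = [(0.0010, 3), (0.0012, 7), (0.0020, 3)]   # (time in s, source neuron address)

lut = {                                            # mapper: source -> target synapses
    3: [(1, 'exc'), (4, 'exc')],
    7: [(2, 'inh')],
}

for t, src in events:                              # the order of events on the bus
    for dst, syn_type in lut.get(src, []):         # fan-out to all virtual targets
        print(f"t={t:.4f}s  spike from neuron {src} -> {syn_type} synapse of neuron {dst}")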
Spikes generated by the neurons are stereotyped events of digital amplitude, whose timing and source of initiation (address) are the only information of importance for communication purposes. AER is an event-driven communication protocol in which the data represents the source address of a spike, and the timing of the event is self-encoded by the arrival of the data itself. The temporal order of events on the address-event bus is reminiscent of the firing pattern of the neurons. This is demonstrated in Fig. 3.2, which shows the connection between a source and a destination chip.
However, to implement this communication protocol one has to take care of a number of intermediate elements between the sending and receiving units, the neuron and its target synapse. The intermediate blocks perform combinational operations to correctly represent the data.
Arbiter: As neurons do not fire in a regular temporal sequence, many of them will try to access the address-event bus simultaneously. A method of arbitrary selection of one among them in an unbiased manner is essential.
Encoder: With a large number of neurons (say N)¹ on a chip, the number of wires required for transmission increases as √N. To restrict the wire count to a logarithmic growth, an encoding scheme is required.
Decoder: The receiver chip should decode the encoded address. The AE bus can carry a sporadic data stream, often with a high event rate but also with phases of 'garbage' data. Hence, the address bus should be decoded in the receiver chip quickly and reliably.
A robust asynchronous communication between the source and destination chips, along with the above-mentioned combinational circuit blocks, was designed. Here we first describe the method of implementing an efficient communication channel with a single sender and receiver. In such a condition, all combinational blocks can be approximated as delay elements. Later we introduce the more general case of multiple senders and receivers, with design considerations for the arbiter, encoder and decoder. As the neurons function in parallel and in real time, the physical connection is accessed by them only when necessary. This leads to an asynchronous communication that is not driven by a central clock.
3.3 Single sender and single receiver
Let us first consider the simplified assumption of a single sender and a single receiver, i.e., there is one source neuron sending its spikes to only one destination neuron. The source neuron can generate spikes at any time and with any frequency (limited only by its refractory period and the channel bandwidth). The destination neuron should receive a spike event when it is ready and also communicate its state back to the source. There is no central clock synchronizing the source and destination chips, resulting in an asynchronous data transfer. One principal assumption in this analysis is that every spike is important and we do not want any of them to get lost during communication. On the other hand, the data transfer should be fast enough to mimic the real-time communication of biological neural networks.
¹ Here we consider the generic case of a 2D square array of N neurons. To uniquely identify a neuron in this array, √N wires for each of the X and Y axes are necessary.
Figure 3.3: a) Generic example of an asynchronous pipelined channel showing data and control signals. The request (R) and acknowledge (A) signals establish a handshaking protocol to transmit the data correctly, without any central clock. b) The four-phase handshake is the most popular method of communication between asynchronous pipelined elements.
3.3.1 Pipelining the data
Pipelining is a technique for increasing the throughput of a communication channel by breaking up the communication cycle into subprocesses, at a moderate increase in area. These subprocesses are connected in series but execute their functions in parallel.
Asynchronous data transfer, without a global clock, relies heavily on the concept of handshaking to complete a communication cycle. In an asynchronous pipeline, the transfer of data between the source and the destination is regulated by local communication between stages. When one stage wants to send data to a neighboring stage, it sends out a request. If the neighbor can accept new data, it does so and returns an acknowledgment. Figure 3.3(a) shows separate control and data path lines consisting of request/acknowledge (R/A) signals and a data bus, respectively (Sutherland, 1989). The control path ports with black dots are active ones, while the others are passive ports. The control path can operate in two different modes, namely two-phase or four-phase handshaking. In two-phase handshaking, when passing control information, a request is sent by the sender and an acknowledge is sent back by the completion detection logic of the receiver. This results in single transitions on both the request and acknowledge lines. In contrast, four-phase handshaking has a second set of request and acknowledge transitions that is sent in order to return these signals to their original states (Yun et al., 1996). Even though it requires twice the number of transitions, four-phase handshaking is not necessarily slower than two-phase, as most combinational logic blocks consume much more time than the communication blocks. In a four-phase handshake, communication is initiated only on
Figure 3.4: Communication cycle in an AE bus connecting two neuromorphic chips. All combinational blocks are connected via the communication blocks, which execute the data transfer. White and black boxes indicate the duration of the set and reset halves of the control signals. (a) In a non-pipelined channel, an intermediate stage is acknowledged only when all its following stages are acknowledged. (b) In the pipelined channel, an intermediate stage does not wait for the following stage to acknowledge it before acknowledging its preceding stage. Similarly, it does not wait for the following stage to withdraw its acknowledge before withdrawing its own. (Adapted from Boahen (2004a).)
rising edges; hence, it is easier to implement and preferred by designers.
Consider the stage i in the micropipeline communication block shown in Fig. 3.3(a) and the corresponding timing diagram in Fig. 3.3(b). The active port Ri is taken high when the data is ready to be sent. The next stage (i+1), if ready, receives the signal on its passive port and sends out an acknowledge on Ai by taking it high (it also latches the data simultaneously). Stage i receives the acknowledge on its passive port and starts the resetting half-cycle by taking Ri low. After the completion of the full four-phase handshake, the data bus is released and can go into its default high-impedance state.
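The order of the four transitions described above can be made explicit with a small, purely sequential Python sketch. It abstracts away all circuit details (delays, latching hardware) and simply prints the set and reset half-cycles of the request/acknowledge pair of Fig. 3.3(b); the printed wording is only an informal paraphrase of the protocol.

def four_phase_cycle(data):
    """Walk through one four-phase cycle of the R/A pair between stage i and stage i+1."""
    trace = []
    trace.append("R+ : stage i raises its request; the data on the bus is valid")
    latched = data                    # stage i+1 latches the data ...
    trace.append("A+ : ... and acknowledges by raising A")
    trace.append("R- : stage i sees A high and starts the reset half-cycle")
    trace.append("A- : stage i+1 resets A; the bus may return to its idle state")
    return latched, trace

value, steps = four_phase_cycle(data=42)
for step in steps:
    print(step)
print("latched value:", value)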
Using these micropipeline communication blocks we can build an asynchronous pipelined channel. The combinational data path blocks mentioned before (arbiter, encoder, decoder) can be modeled as simple delay cells between the request (R) and acknowledge (A) control lines. Adding a source and a sink to the channel, the sender and receiver respectively, the entire communication path can be viewed as in Fig. 3.4 (Boahen, 2004a). The different combinational elements in the pipeline are: neuron (sender), arbiter, encoder, decoder, neuron (receiver). Figure 3.4 shows the same channel with
and without pipeline, demonstrating the advantage of higher throughput.
The implementation of the pipeline can be understood from Figure 3.5, which shows a single sender and a single receiver communicating through the asynchronous channel.
An entire system similar to this was initially built with heuristic methods that did not exploit the theoretical knowledge on asynchronous communication. However, it was later made compatible with the formal CHP² language, utilizing the know-how of asynchronous design methodology (Boahen, 2000). We will investigate the system with our knowledge of micro-pipelines and show that the heuristic method matches the formal description to a large extent. This also allows us to identify potential improvements over the previously proposed scheme. Comparing Fig. 3.5 to Fig. 3.3(a), we see how the basic implementation matches the ideal pipeline requirements. Distinct control and data paths carry signals from left to right, using the handshaking cycles in the communication blocks. The combinational blocks modify the data path behavior. Here we describe the different handshaking cycles, with the relevant signals, inside the communication blocks. Handshaking is performed by symmetric and asymmetric C-elements, marked by C and aC, respectively (see Appendix A for the logic and circuit descriptions of C-elements). The following sections describe the handshaking cycles using HSE primitives. The basic HSE primitives with examples are shown in Appendix B.
Neuron-Arbiter
In the beginning of the pipeline cycle, the event generator interfaces the sender neuron to the arbiter (see Fig. 3.5). The high-speed analog signal (Ss), representing a spike, is terminated to generate a pulsed event Nr for the arbiter. However, Nr should wait until the arbiter acknowledge (Na) goes low, completing the previous cycle. The neuron is reset by Sr when the arbiter acknowledges it. The resetting phase starts with Ss going low, and Sr is taken back after the entire cycle is complete. The HSE of the event generator should look like:
∗[[Ss & Ña]; Nr+, [Na]; Sr+; [S̃s]; Nr−; [Ña]; Sr−]    (3.1)
² Communicating Hardware Processes (CHP) is the first step in describing the behavior of an asynchronous circuit with formal notation. Next, handshaking expansion (HSE) encodes data with Boolean variables (nodes in the eventual circuit) and converts every communication action into a four-phase handshake (Martin et al., 2003).
Figure 3.5: Communication channel established between a single sender and a single receiver on different chips. The communication blocks are shown as solid rectangles and the combinational blocks as dashed ovals. All control path signals and their handshaking cycles are illustrated. The combinational blocks act as delays on the control path and alter the data path.
The circuit for the event generator can be synthesized using the above HSE code (see Appendix B). Every sender has its own event generator, the output of which connects to the arbiter. The arbiter collects all such events and selects one among them (see Sec. 3.6). For the single-sender case, the arbiter, logically being a combinational block, delays the Na signal during arbitration and passes it as Ao to the next communication block. The data path output of the arbiter is a one-hot³ code, carrying a high signal only on the selected line.
Arbiter-Encoder
The next micro-pipeline stage, the transmitter handshake (Tx handshake), connects the arbiter to the encoder. The signals participating in the four-phase handshaking during this stage of the micropipeline can be identified as Ao and Ea as inputs and Na and Er as outputs. Output Na also serves as the latching signal for the data path transmission. The HSE for the Tx handshake can be written as:
∗[[Ão & Ea]; Er+, Na+; [Ao & Ẽa]; Er−, Na−]    (3.2)
The control signal Cr is a delayed version of Er, communicating with the next micro-pipeline stage. The encoder data path output is a log N-bit data bus that carries the address of the selected neuron.
³ In a one-hot code, only one data line out of N is high, while all others remain low.
Encoder-Decoder
The receiver handshake (Rx handshake) block, between the encoder and the decoder, functions similarly to the Tx handshake. The four-phase handshaking produces Ca and Dr. The signal Dr can be used to latch the data before sending it to the decoder.
∗[[Cr & P̃a]; Dr+, Ca+; [C̃r & Pa]; Dr−, Ca−]    (3.3)
This block establishes the communication between the receiver chip and the external world, and it can talk either to a transmitter chip (as shown in Figure 3.5) or to a computer sending data using the AER protocol. The decoder converts the log N-bit data into a one-hot code (see Sec. 3.7) and communicates the signal to the relevant receiver unit (neuron). For a single receiver, the communication channel terminates at the only receiver neuron present.
Decoder-Neuron
The receiver node can be a synapse of a neuron or any other circuit that receives the spike event for further processing. Let us consider that the receiver node is always ready to accept the incoming spike. For a single receiver, the termination of the communication channel is trivial. On arrival of the pixel request signal (Pr), it receives the one-hot data Q and generates an acknowledge signal Pa.
A communication channel that complies with all the handshaking cycles should result in the optimal data speed and robustness available from micro-pipelining. However, in the IFSL family of chips, minor variations are used. One important omission is that of one of the upper asymmetric C-elements in the event generator block. In the IFSL chips, Ss and Nr are essentially the same signal, which assumes that the arbiter is faster than the delay between two consecutive spikes from the same neuron.
3.4 Multiple sender and multiple receiver
The description of a communication channel presented in the earlier sections assumed a single sender/receiver unit. But this is rather uncommon for a neuromorphic chip, which is typically supposed to establish communication between a pool of senders and receivers. When many neurons in the sender chip try to communicate (send spike events) with many other neurons in the receiver chip, the single physical bus allows this simultaneous many-to-many mapping by time-division multiple access of the AER channel (see Fig. 3.1(b)). Only one sender neuron gets access to the bus at a given time, chosen by an arbitration process between the competing neurons. Section 3.6 deals in detail with the arbitration strategies and their results. The selected neuron sends its request and its address (through the data path) to the receiver chip. The encoder, the mapper and the decoder are responsible for delivering the spike event to the correct target neuron. The input to the receiver neuron, being a passive port, reacts only when it is triggered with a request and then sends an acknowledge back. The virtual communication channel established in this process can be reused by another sender-receiver pair at a later stage.
3.4.1 Data path design
Figures 3.6(a) and 3.6(b) show the sender and receiver units communicating with the data bus. No timing protocol is considered here; only the combinational blocks previously treated as delay elements in Fig. 3.5 are shown. The relevant signal names are identical to those used in Fig. 3.5. Once the sender neuron is acknowledged by the arbiter (Ai), its address is encoded in log N bits and a request (Cr) is sent to the receiver chip. Due to the asynchronous, event-driven nature of the communication, the request is generated only during an event. As the request is generic, any acknowledge in the sender chip (A1 to AN) results in the same Cr signal. An OR gate (Fig. 3.7, left) connected to all acknowledge signals of the pixel column (Ai) can be used. This implementation would require N pMOS transistors stacked in series, where N can be in the order of hundreds. For such a gate, the time delay of the high-to-low transition at the output (Cr) would be extremely varied and data dependent (see Sec. 3.7 for a detailed explanation). On the other hand, a wired-OR implementation, shown in Fig. 3.7 (right), does not suffer from the problem of stacked transistors and also uses nearly half as many transistors. The wired-OR circuit is similar to the lower part of the conventional OR gate and is enough to generate a low-to-high transition in Cr. The high-to-low transition can be executed by a single pull-up transistor connected to an appropriate bias. A better approach is to replace the bias voltage source with an active pull-up circuit. The Ca signal serves as a good candidate for performing the pull-up function. The active pull-up complies with the idea of making an efficient AER transceiver that is robust and independent of external voltage biases.
Similar to the sender chip, the target neuron (Qi) in the receiver delivers the pixel acknowledge signal (Pa). An identical wired-OR can be used to combine all of Q1-QN to produce the generic Pa signal. The Pr signal can be used for the pull-up of this OR gate.
Figure 3.6: Data exchange between the combinational blocks of the 1D array of neurons (pixels). (a) The arbiter selects one of the active neurons, sending an N-bit one-hot code to the encoder. The encoder generates log N bits of data. (b) The decoder recreates the one-hot code and selects the required pixel. The control path signals play a necessary part in latching the data.
3.5 Receiver handshake
The pipelining scheme shown in Sec. 3.3 was not implemented in early generations of various neuromorphic chips. In order to speed up the entire communication cycle, certain liberties are often taken while implementing the receiver chip. One major assumption is that the on-chip communication is much faster than the off-chip one. This allows us to ignore the four-phase handshaking at the Rx handshake block in Fig. 3.5. The request, Cr, from the previous stage is acknowledged (Ca) as soon as it arrives, without waiting to check the acknowledge from the next stage (Pa). Usually this does not have any adverse effect on the behavior of the AE bus, as by the time a second Cr arrives, the on-chip communication has almost certainly restored Pa to its correct state. However, this can lead to the problem of missing spikes during a burst of events sent to the receiver chip. The implementation of a proper handshake at every pipeline stage is a matter of choice depending on the usage of the chip. Where data is incessant in nature, say from the output of a sensor chip (like a vision or audition sensor), each and every event (spike) may not be very important. On the other hand, in cases where data is sparse and comes in bursts, missing a single spike is unacceptable.
Receiver handshake experiment
In order to demonstrate the problem of missing spikes, we tested two chips, one implementing the receiver handshake and one not. In both chips, an
Figure 3.7: A conventional OR gate (left), using complementary CMOS logic, has stacked pMOS transistors for the pull-up, while the wired-OR (right) uses a single transistor for the pull-up (either active or passive). A staticizer holds the output state when it is not actively driven.
Figure 3.8: A probe point inside a single synapse is connected to an external
pad. This makes it possible to verify whether an AER event directed to the
particular synapse reached its destination.
internal node of a receiver synapse was connected to a probe point Vc, shown in Fig. 3.8. An input spike results in a dip in the voltage Vc before it recovers. This lets us check whether that particular receiver unit actually responded to an input spike targeted at it. The chips were stimulated with a burst of 5 spikes directed to different addresses, including that of the test unit. Let us first consider the chip without the four-phase handshaking.
Two different burst configurations were used: a) the address of the test unit as the first spike of the burst, and b) the address of the test unit as the last spike of the burst. For case a (not shown), the test synapse always responded to the input spike. Changes in the inter-spike interval (ISI) of the spike burst did not have any effect on this result, i.e., the first spike always reached the correct address. In contrast, for case b, the test-synapse response was highly dependent on the ISI of the burst. A moderate (∼10 ms) interval between spikes resulted in a correct test-synapse response, but the synapse often failed to respond for lower ISIs. This is shown in Fig. 3.9. Two separate bursts of spikes were sent to the chip with an equal ISI of 4 ms. The test-synapse response at this ISI is unreliable
Figure 3.9: Communication failure without four-phase handshaking. The top panels show the voltage Vc as in Fig. 3.8. The voltage should go down when the particular synapse receives an AER spike event. The bottom panels show a burst of AER events whose last spike is directed to the test synapse. The communication fails to be reliable, as the synapse responds randomly to only one of the two consecutive input bursts, the first one in (a) and the second one in (b). Implementing a four-phase handshaking scheme did not show a single failure in 200 trials using even smaller ISIs.
as it responds to only one of the two bursts, randomly. The failure rate increases as the ISI decreases. In the failed cases, the receiver chip kept working without any indication that it was acknowledging external signals faster than the core of the chip could handle.
The same experiments were carried out on the other chip, implementing the proper four-phase handshaking described in Eq. 3.3. Case a, as usual, did not show any problem of missing spikes. For case b, even the minimum possible ISI (set by the maximum AER bandwidth of 6k events/s) did not produce a single failure in 200 continuous stimulations.
3.6 Arbitration basics
In the early days of AER communication systems, various channel access
topologies were studied, e.g., sequential scanning, ALOHA-based, priority
encoding or arbitrated access (Culurciello and Andreou, 2003). However, an
arbitrated system was found to be best suited for channels with a wide range
of throughput. An arbiter chooses one of its many inputs when they attempt
Figure 3.10: Mutual exclusion (ME) is the core of the arbitration process. Two concurrent inputs (high signals) passing through the ME result in complementary output signals (i.e., mutually exclusive) in an arbitrary manner. The ME circuit consists of a pair of cross-coupled NAND gates and a glitch reduction circuit (within the dashed box). The truth table shows the expected behavior, within a bounded delay.
to access the same shared resource (i.e., the AE bus) simultaneously. In this section, we first discuss a two-input arbiter that takes only one of its outputs high when both inputs are activated.
The core of an arbiter is the mutual exclusion circuit, often implemented with cross-coupled NAND gates followed by a glitch reduction circuit (Mead and Conway, 1980). This circuit, shown in Fig. 3.10, behaves like a traditional SR flip-flop for the input pairs 10, 01 and 11. In conventional digital design, the input 00 is restricted from use, as it does not produce complementary output states. However, let us consider 00 as the default input condition, where o1 and o2 are both high. When one of the inputs goes high, the corresponding NAND output goes low (see the truth table). As the inputs return to 00, o1 and o2 both go back to the 11 state. In the alternative situation, if one input goes high while the other is already high, the NAND output behaves predictably, as shown in the truth table of Fig. 3.10, i.e., the previous output is maintained. These are the simple operating conditions, where one of the two inputs is selected without any ambiguity. The inverters, powered by the complementary inputs, simply switch the states of the o1 and o2 signals.
The critical condition arises when both inputs go high at nearly the same instant. The NAND outputs temporarily go into a metastable state (a local maximum in the energy level), from which they emerge to reach arbitrary but complementary stable states (o1 ≠ o2). This presumably happens because of the inherent asymmetry of the physical realization of the gates (impurities, fabrication flaws, thermal noise, etc.; Martin et al. (2003)). Theoretically, the outputs may linger around the metastable state for an unbounded
Figure 3.11: The output of the cross-coupled NAND gates depends on the present and the immediately preceding input, as shown in Fig. 3.10. (a) Trivial changes in the input (in1, in2) result in the expected output (o1, o2). However, concurrent high inputs result in a mutually exclusive output (gray region) depending on a minute initial time difference (∆t) or on system non-idealities. (b) The voltage difference at the output (∆V) grows exponentially over time to settle to complementary values.
period before the conflict is resolved. However, it has been shown that the circuit output stabilizes to 01 or 10, depending on the slightest difference in the initial condition, without much delay. Let ∆Vinit be the voltage difference between o1 and o2 due to the small time difference (∆tinit) between in1 and in2. A detailed analysis using inverter models shows that the voltage difference ∆V grows exponentially in time, depending on the infinitesimal difference in the input (Sakurai, 1988). This rapidly results in a complementary set of outputs.
∆Vinit = K ∆tinit ,    ∆V = ∆Vinit exp(αt)    (3.4)
The inverters become essential during the time interval in which o1 and o2 have not yet reached stable states. They act as a glitch removal circuit and guarantee that the ME output remains complementary for the subsequent logic stages.
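A small worked example of Eq. 3.4 illustrates why the metastability of the ME circuit is rarely a problem in practice: the resolution time grows only logarithmically as the initial imbalance shrinks. The values of K, α and the required logic swing used below are assumed, generic numbers, not measurements from the chips described here.

import math

K = 1e9          # V/s: assumed conversion from input time skew to initial voltage offset
alpha = 1e10     # 1/s: assumed regenerative gain of the cross-coupled pair
V_logic = 1.0    # V: swing needed before the glitch-removal inverters switch cleanly

for dt_init in (1e-12, 1e-15, 1e-18):             # input skews from 1 ps down to 1 as
    dV_init = K * dt_init                          # Delta V_init = K * Delta t_init
    t_res = math.log(V_logic / dV_init) / alpha    # solve Delta V_init * exp(alpha*t) = V_logic
    print(f"skew {dt_init:.0e} s -> resolved after about {t_res * 1e9:.2f} ns")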
We now look into the logical framework of arbitration. An N-input arbiter, required to select between N concurrent signals, can be constructed from 2-input arbiters arranged in a tree structure, shown in Fig. 3.12(a). In Fig. 3.12(b) a basic 2-input arbiter with its internal signal nomenclature is shown. To analyze the functionality of an arbiter tree, we start with a rudimentary 4-input arbiter (shown in Fig. 3.12(c)). In the first level, the input nodes are named d1 to d4 and the intermediate nodes are called m1 and m2. The four input nodes send requests to the arbiter by taking their respective li ports high, and get back an acknowledgment from the arbiter if
Figure 3.12: (a) The general arbitration scheme. An N-input arbiter is made of a tree of arbiters culminating in a 2-input one. Each N/2-input branch again has the same tree structure. (b) A single 2-input arbiter with all its input and output terminals. Separate signals for request (li1) and acknowledge (lo1) are used for each input. (c) A four-input arbiter with all internal nodes and signals. The arbiters Am1 and Am2 are called the daughters of the arbiter Am.
the corresponding lo goes high. In the next sections, we will analyze some individual arbiter circuits and their behavior in the 4-input arbiter tree, using the so-called ping-pong diagram as described in Boahen (2004a). The two important aspects of an arbiter to be considered are speed (the average delay for a request to get acknowledged) and fairness. An arbiter is called fair if it does not prefer servicing one request over another depending on its history of operation. As we will see in the various arbiter implementations, fairness of an arbiter with a large number of inputs is not easy to achieve.
3.6.1 Standard arbiter
Alain Martin proposed the standard version of the arbiter in the late 80s; it is fair but slow in its operation (see Martin and Nystrom, 2006, for a review). The arbiter, shown in Fig. 3.13(a), has its mutual exclusion circuit made of cross-coupled NAND gates and complementarily powered NOR gates (for glitch reduction). Let us consider the default condition when both li1 and li2 are low. Nodes ā1 and ā2 are low, and so are ro and ri. This results in the acknowledge nodes lo1 and lo2 being low as well. Now, if li1 goes high, a1 goes low and a2 remains high. The NOR gates drive ā1 high, sending a request to the parent level (i.e., ro goes high). When the acknowledge from the parent comes back, setting ri high, the C-element sets lo1 high while lo2 continues to be low. Even if li2 is taken high in between, trying to send a new request to the upper level, it does not change the states of a1 or a2, as the ME element holds their previous state. Hence, li2 has to wait before propagating its request. A complete timing diagram of all the internal signals, for the case where li1 made a request
Figure 3.13: (a) The standard arbiter cell designed by Martin (see Martin and Nystrom, 2006, for a review). (b) The ping-pong diagram shows the signal transmission through the different levels of the four-input arbiter. All request signals are directed upwards and all acknowledge signals downwards. The set and reset phases are differentiated by arrow heads and circular heads, respectively. The arbiters Am1 and Am2 (see Fig. 3.12(c)) receive requests from the input nodes d1-d3. Am forms the next level in the arbiter tree.
before li2, is shown in Fig. 3.14.
We will analyze the ping-pong diagram (Fig. 3.13(b)) to understand the signal transitions between the different levels of the 4-input arbiter in Fig. 3.12(c). Each node (d1-d4, m1 and m2) has a request and an acknowledge component to it. For input d1, we will call them d1r and d1a, respectively. In the figure, both the request and acknowledge signals of a node are marked with the node name itself, but with different arrow directions. All arrows directed upwards represent request signals and arrows going downwards represent acknowledge signals. The circular arrow heads follow the same rule but represent the reset phase of the signals. Signals corresponding to d1 are marked with filled arrow/circular heads, and the ones corresponding to d2 are marked with open arrow/circular heads.
Assuming the inputs d1 and d2 made simultaneous requests, taking d1r and d2r high, a request from the first level of arbiters (Am1, Am2) will be sent to the second level (Am), taking m1r high. When m1a is set high, arbiter Am1 is chosen by its parent, which, in turn, acknowledges one of the two daughter requests (say d1, setting d1a high). Next, when d1r is reset to low, this is conveyed to the next level, resetting m1r low. The acknowledge signal d1a goes low only after the acknowledge m1a is reset. This process illustrates that the request from one input sets all the intermediate request and acknowledge
signals, right up to the top of the arbiter tree. Similarly, all the request and acknowledge signals have to be reset to complete the cycle, just to service that one input.
After input d1 is serviced, the held request d2r will force another request (m1r) by Am1. However, if d3 is ready with a request d3r at t1 (< ta), then m1r and m2r will both be set, and one of them will be arbitrarily chosen without any bias. If arbiter Am1 is chosen (setting m1a high), then d2a will be taken high; otherwise, d3a goes high, acknowledging input d3. This random choice between the two (d2 and d3) makes the arbiter completely fair, without any bias from its history. However, setting and resetting all the corresponding signals in every level of the tree, for each input, makes the process very slow.
3.6.2 Fast, unfair arbiter
In Boahen (1998), a faster implementation of the arbiter was proposed. The analysis of the circuit, shown in Fig. 3.15(a), is provided in the paper itself. This is an improved version of the early arbiter proposed in Mahowald (1994), with added robustness obtained by replacing the non-standard digital elements. Here we discuss the ping-pong diagram in Fig. 3.15(b) to demonstrate its behavior. Let us start with the previous situation where inputs d1 and d2 made simultaneous requests, and d3 made a request at t1. Say input d1 is arbitrarily chosen to be acknowledged first. Next, input d2 will also be acknowledged, immediately after d1 resets its request (takes d1r low), without waiting for any change in m1r or m1a. It reuses the local acknowledge (m1a), without sending its request up the arbiter tree. This is in contrast to the analysis in Fig. 3.13(b), where d3 had a fair chance of getting acknowledged. Now, consider d1 making a second request at t2. Meanwhile, m2r goes high (due to d3's pending request), but arbiter Am2 has no chance of getting acknowledged. Signal m1a, which was already set high in the previous cycle, acknowledges Am1 once again. Hence, d1 gets acknowledged again, keeping d3 on hold, only because Am1 has an immediate history of getting acknowledged. If d2 makes another request at t3, it will get acknowledged too, while d3 is kept waiting. This goes on until both d1 and d2 stop requesting (say, at t4), finally letting d3 get acknowledged. The local request-acknowledge cycle speeds up the arbitration process for a large arbiter tree (often with hundreds of inputs and tens of levels). But once it has served a branch, it is biased to acknowledge the same one repeatedly, restricting the reported activity to a localized region.
Figure 3.14: Detailed timing diagram of a single arbiter (shown in Fig. 3.13(a)). The times taken to set or reset signals by its parent (tp) or its daughter (tc) level are marked below. The lower white and black boxes indicate the time taken for a full request/acknowledge cycle of the individual inputs.
Figure 3.15: (a) The fast, unfair arbiter designed by Boahen (Boahen, 2000). (b) Starting with the same conditions as in Fig. 3.13(b), the ping-pong diagram shows a faster request/acknowledge cycle for the inputs d1 and d2, but d3 is unfairly held back until none of the others are requesting any more.
Experimental verification of the unfair arbiter
To verify the problem of unfair arbitration, a chip containing a 32-by-32 array of I&F neurons was used. All the neurons could be forced to spike simultaneously, with a single external bias voltage controlling the current injection to the neurons. The AER bus of the chip transmits these spike events to a computer. Using the address of the sender neuron, the activity pattern on the chip can be monitored in the computer. Though all neurons integrate the same input current, only the ones acknowledged by the arbiter are allowed to access the AER bus. They communicate their spike events to the outside world while the others have to wait for their turn.
In Fig. 3.16, the 2D reconstruction of the chip activity is shown. In Fig. 3.16(a), for a moderate level of injected current, the firing frequencies of the neurons vary considerably due to the inherent mismatch of the silicon fabrication
Figure 3.16: Data obtained using the greedy arbiter in a 2D array of I&F neurons. In (a), a low constant current injection makes the neurons fire at moderate frequencies, with cross-chip variations due to mismatch. (b) The firing rate escalates with the increase in current injection, thus increasing the total spike count per second (shown on top). The maximum frequency of spikes that can be received from the chip has an upper limit due to the AER bus bandwidth. The problem arising from the unfair arbiter becomes evident in (c) and (d). With increasing current injection, the arbiter services one half (c) of the array much faster, but completely ignores the other. For even higher currents, just one fourth (d) of the array could report any activity. For all panels, the output frequencies were clipped at 1 kHz.
Figure 3.17: (a) The fair arbiter designed by Boahen (2004a). This is faster than Martin's scheme shown in Fig. 3.13(a). (b) The arbiter does not show any bias for the previously selected daughter.
process. This is a random mismatch problem without any particular directional bias. The average spike count per second is shown on top of the plot. When the injection current is increased (from Fig. 3.16(b) to Fig. 3.16(c)), the spike count reaches a maximum of ∼5.8k spikes/sec. As the total activity approaches this maximum value, the uniformly distributed firing pattern breaks up into local regions of much higher activity. The arbiter randomly chooses one half of the arbiter tree and completely ignores requests from the other (Fig. 3.16(c)). As the lower half is never acknowledged, no activity is ever reported from there. In a 2D array of neurons, arbitration has to be done along both the X- and Y-axes, one after the other. Here, the biased nature of the unfair arbiter is observed only along the Y-axis, the one arbitrated first in this chip. Hence, for a high data volume the arbiter breaks the array into two horizontal halves. For even higher current injection (Fig. 3.16(d)), the arbiter randomly chooses one half of the previously selected region. As we kept increasing the current (data not shown), the arbiter acknowledged ever smaller portions of the chip. After the activity was restricted to one horizontal row, the symmetry along the X-axis was broken as well. Finally, only one neuron, firing at ∼10 kHz, was observed. Even though such a huge firing frequency from a single neuron is not realistic for biological (and neuromorphic) systems, a frequency of 1 kHz is not uncommon for neurons communicating sensory signals from a silicon retina or a silicon cochlea. In such cases, a sudden burst of activity from a local pool of neurons would cause all information from other regions of the network to be completely ignored.
3.6.3 Fair arbiter
An improved version of the arbiter was again proposed by Boahen (2004a), this time by going through the rigorous formalism of CHP and HSE. The circuit shown in Fig. 3.17(a) is described in the paper itself. The philosophy behind fair arbitration is: a daughter node that is not requesting is not visited, and a daughter node that makes another request is not revisited until the entire tree has been serviced. However, comparing Fig. 3.17(a) to Fig. 3.15(a) shows that an ingenious alteration in the connectivity of the ri signal (the acknowledge from the parent) is responsible for the change.
Again, we analyze the ping-pong diagram, starting with two simultaneous requests from d1 and d2. After d1 is acknowledged, it resets its request (d1r), which results in an immediate reset of its acknowledge (d1a). This leads to d2 being acknowledged (d2a goes high). As usual, we consider d3 putting up a request in between (at t1), and d1 making a second request at t2. Though input d3 always forces m2r to go high, arbiter Am2 was never acknowledged in the unfair arbiter (see Fig. 3.15(b)). Here, when d2r is reset, m1r and m1a are also taken low. This allows arbiter Am2 to get acknowledged, setting m2a high and then d3a. The pending request d1r is acknowledged only after m2a is reset (t4). This forces the arbiter to be fair by checking all nodes before repeatedly servicing the same one.
If we compare the ping-pong diagrams, it is evident that both the greedy (Fig. 3.15(b)) and the fair (Fig. 3.17(b)) arbiters complete one full request-acknowledge cycle by t2, which is less than t′2 in Martin's design (Fig. 3.13(b)). This is possible because the request-acknowledge cycle is kept local.
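The qualitative difference between the greedy and the fair arbiter can be reproduced with a toy scheduling model of the 4-input tree of Fig. 3.12(c). The Python sketch below abstracts all circuit timing into a simple service policy; the group labels and the request pattern are illustrative assumptions, not a transcription of the actual circuits.

def serve(policy, requesting, n_services=8):
    """Return the order in which continuously requesting inputs get acknowledged."""
    groups = {1: 'A', 2: 'A', 3: 'B', 4: 'B'}     # daughters Am1 = {d1,d2}, Am2 = {d3,d4}
    last = {i: -1 for i in groups}                # time of last grant, for alternation
    served, current_group = [], None
    for t in range(n_services):
        pending = sorted(requesting)
        if not pending:
            break
        groups_pending = {groups[i] for i in pending}
        if policy == 'greedy' and current_group in groups_pending:
            choice_group = current_group          # sticks to the branch it just served
        elif policy == 'fair' and current_group in groups_pending and len(groups_pending) > 1:
            choice_group = (groups_pending - {current_group}).pop()   # re-arbitrate at the top
        else:
            choice_group = sorted(groups_pending)[0]
        candidates = [i for i in pending if groups[i] == choice_group]
        pick = min(candidates, key=lambda i: last[i])   # within a branch, alternate
        last[pick] = t
        served.append(pick)
        current_group = groups[pick]
    return served

busy = {1, 2, 3}                                  # d1, d2 and d3 request continuously
print("greedy:", serve('greedy', busy))           # [1, 2, 1, 2, 1, 2, 1, 2]: d3 starves
print("fair:  ", serve('fair', busy))             # [1, 3, 2, 3, 1, 3, 2, 3]: d3 is served

With d1, d2 and d3 requesting continuously, the greedy policy keeps bouncing between d1 and d2 and starves d3, whereas the fair policy interleaves the two branches; this is, qualitatively, the behavior observed experimentally in Figs. 3.16 and 3.18.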
Experimental verification of the fair arbiter
We verified the performance of the fair arbiter in a different chip, with a method similar to that described for Fig. 3.16. This chip also contains a 32-by-32 array of I&F neurons that can be made to fire by injecting current. As shown in Fig. 3.18, a moderate injection current resulted in low firing frequencies (Fig. 3.18(a)), and progressively higher injection increased the frequency of all neurons (Fig. 3.18(b)). The total spike count per second escalated until the bus saturated at around ∼5.8k events per second. Though the effect of random mismatch is visible in the firing pattern of all the plots, notice that the arbiter keeps performing in an unbiased manner even for higher injection currents (Fig. 3.18(c)): it lets all neurons fire at their preferred frequency. In Fig. 3.18(d), however, we see that even higher injection currents smooth out the mismatch effects. This is because the weaker neurons are firing at higher frequencies, while the stronger neurons (that already have a high firing rate) are no
Figure 3.18: The fair arbiter performs much better in a task similar to that of Fig. 3.16. Current injection increases the average firing rate and also the total number of output spikes from the chip, as shown in (a) and (b). In panels (c) and (d) the maximum AER bandwidth limit is reached. However, neurons from all parts of the 2D array are acknowledged by the arbiter, as demonstrated by the uniform activity. The neurons that showed a high firing rate in (c) are restricted by the latency of the arbiter in (d), which lets the weaker neurons get acknowledged.
longer allowed to fire faster. The unbiased selection of the weaker neurons is the most important advantage of the new arbiter.
3.7 Decoder
The log N-bit input data to a receiver chip is decoded to produce a one-hot code of N bits. The decoder design used in this project has changed considerably from that of many previous generations of neuromorphic chips. Two main issues are addressed here:
1. Nonuniform delay in decoding different address bits.
2. Latching of the address bits.
3.7.1 Delay
As shown in Fig. 3.19, a simple decoder for an N-bit address space is made of 2^N NAND gates, each having N inputs. A very efficient method of automatically laying out the decoder, developed in Mahowald (1994), was used in various neuromorphic chips (Boahen, 2000; Indiveri, 2002, etc.). The main problem with this type of decoder is the large and unmatched delay between decoding certain addresses in succession. Two different kinds of delay can be identified for this circuit: the wire delay and the gate delay. In a 2D address space, each output of the decoder (the NAND gates) has to drive 2N gates and a long metal wire. This adds considerable capacitance, and hence a delay, while changing states. Though of considerable magnitude, this delay does not vary for different input addresses.
The second source of delay arises from discharging the internal nodes of the NAND gates. As shown in detail in Fig. 3.20, each NAND gate is made of N parallel pMOS transistors as pull-up and N nMOS transistors in series as pull-down. Only one of these 2^N NAND gates receives high signals on all of its N inputs, driving its output to a low value. This is the gate corresponding to the correct address. As the output node changes state from high to low (H → L), all the stacked internal nodes (parasitic capacitances C1-Cn) have to be discharged as well. However, the status of the internal nodes before the correct address arrives depends purely on the previous address bits. Depending on the number of internal nodes already discharged, the total time taken by the output to change its state may vary widely. Let us analyze the delay in a 2-input NAND gate with inputs A0 and A1. Suppose A1 is at the high state (H) and A0 at low (L); hence the output is high and C1 is discharged to ground.
Figure 3.19: Traditional decoding scheme for an N-bit address space. 2^N NAND gates, each having N inputs, are connected to half of the 2N address lines.
When A0 changes from L → H, the delay of the output transition (from H → L) is controlled by the discharging time of C0 and CLoad, given by:
τHL = (R0 + R1) · (C0 + CLoad),    (3.5)
where Ri is the equivalent resistance of the i-th nMOS transistor. On the other hand, if the previous input had both A1 and A0 low, both C1 and C0 have to be discharged (along with CLoad) simultaneously:
τHL = (R0 + R1 )(C0 + CLoad ) + R1 .C1
(3.6)
Though this difference is minimal for a 2-input NAND gate, it becomes a limiting factor as the number of inputs (N) increases. Suppose the present input to a 5-bit decoder is 11110, which produces an output transition only for the NAND gate having A4 A3 A2 A1 Ā0 as its inputs. It should be noted that the NAND gate corresponding to A4 A3 A2 A1 A0 already has 4 of its 5 nodes (all the internal ones) discharged to ground for the present input. If the next input is 11111, this NAND gate only needs to discharge the one remaining node, producing a very fast output H → L transition. On the other hand, if the previous input was 01111 and the next is 11111, this particular NAND gate has to discharge all 5 of its nodes through their respective resistors. This creates a big difference in delay between a 11110→11111 and a 01111→11111 decoding.
Various other delay conditions can arise in this process, and they become more critical as the number of inputs increases. Hence, it is important to latch the decoded data before sending it to the sensitive analog core. These decoder delays only matter when the NAND gates make a H → L
Figure 3.20: Details of an N-input NAND gate showing the series of stacked nMOS transistors in the pull-down path. Each transistor, when turned on, can be considered as a resistive element (R1-RN). All the internal nodes have parasitic capacitors (C1-CN) that have to be discharged during a high-to-low transition at the output.
transition. The L → H delay is always uniform and fastest, carried out by
any of the parallel pMOS pull-up transistors.
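To make the size of this mismatch concrete, the sketch below applies a first-order Elmore-style estimate to the pull-down stack of Fig. 3.20. It is an illustration only: the on-resistance and capacitance values are assumptions, not values extracted from the chip.

```python
# Illustrative Elmore-style estimate of the H -> L delay of an N-input NAND
# pull-down stack; all device values are assumptions for illustration only.

def nand_hl_delay(n, charged_internal, r_on=10e3, c_int=5e-15, c_load=50e-15):
    """Estimate the output H -> L delay of an n-input NAND gate.

    charged_internal: indices k (1..n-1, counted from the ground side) of
    internal nodes left charged by the previous address; node k discharges
    through the k transistors between it and ground.
    """
    delay = n * r_on * c_load          # the output node discharges through the full stack
    for k in charged_internal:
        delay += k * r_on * c_int      # contribution of each charged internal node
    return delay

# 5-input gate, as in the 11110 -> 11111 example above:
best = nand_hl_delay(5, [])                 # previous input 11110: internal nodes already low
worst = nand_hl_delay(5, [1, 2, 3, 4])      # previous input 01111: every node still charged
print(f"best-case {best*1e9:.2f} ns, worst-case {worst*1e9:.2f} ns")
```

Even in this crude model the internal-node term grows roughly quadratically with N, which is why the predecoder described next keeps the individual gates small.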
Predecoder
Using a predecoder is a popular method to match the decoder delays in large digital memories. The basic idea behind a predecoder is to break the large NAND gates described before into a tree of smaller NAND gates that have less delay mismatch. A predecoder uses small NAND gates (usually with 2 to 3 inputs) to decode different parts of the address word in parallel. The word is generally broken into two (MSBs and LSBs) or three parts. The outputs of the predecoders then go through another NAND gate stage that combines the MSB and LSB results to form the final decoding stage (Keeth and Baker, 2000).
Fig. 3.21 (left) shows the scheme for a 5-bit address space (A0 to A4). Instead of the 5-input NAND gates, the LSBs are decoded by a 2-to-4 decoder that uses four 2-input NAND gates, and the MSBs are decoded using eight 3-input NAND gates. The final decoder stage consists of 2-input NAND gates combining the outputs of the LSB and MSB decoders. As the design does not use NAND gates bigger than the 3-input ones, the delay mismatch discussed before is largely reduced. More than one predecoder stage is used if the input address space is wider. Fig. 3.21 shows the decoding scheme
Figure 3.21: A pre-decoding scheme for 5- and 9-bit address spaces. The predecoder decodes different parts of the address word (say LSBs and MSBs) in parallel and combines them in a final decoding stage. The 5 bits (left) are broken into 2- and 3-bit decoders, whereas the 9 bits (right) are broken into three decoders of 3 bits each. In order to reduce the problem of unequal delay, NAND gates with more than 3 inputs are not used anywhere. The subsequent NAND gate stage generates the final result.
for the 5- and 9-bit input address spaces, both using a single predecoding level. Fig. 3.22 shows how the physical layout of the predecoder is arranged. Here, the data is first latched before being sent to the decoder; the latching scheme is described in the next section.
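As a behavioural illustration of this scheme (a sketch of the decomposition only, not the chip's gate-level logic), the fragment below decodes a 5-bit address the way Fig. 3.21 (left) does: the two LSBs and the three MSBs are predecoded in parallel, and the final one-hot line is formed by combining one line from each predecoder.

```python
# Behavioural sketch of the 5-bit predecoding scheme of Fig. 3.21 (left).

def predecode_5bit(address):
    """Return the index of the asserted one-hot output for a 5-bit address."""
    lsb = address & 0b11            # A1 A0    -> 2-to-4 predecoder
    msb = (address >> 2) & 0b111    # A4 A3 A2 -> 3-to-8 predecoder
    lsb_lines = [int(i == lsb) for i in range(4)]    # one-hot, 4 lines
    msb_lines = [int(i == msb) for i in range(8)]    # one-hot, 8 lines
    # Final stage: 32 small 2-input gates, one per (MSB line, LSB line) pair.
    onehot = [m & l for m in msb_lines for l in lsb_lines]
    return onehot.index(1)

assert all(predecode_5bit(a) == a for a in range(32))
```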
3.7.2 Address Latching
In a robust asynchronous pipeline, address latching should not require any timing assumptions about internal and external delays. However, the validity of the data to be latched is not obvious: it depends on the assumption of matched propagation delays for the data and control paths. In the AER system described here, the problem of latching valid data becomes prominent in the Receiver Handshake block (see Fig. 3.5), which receives data from off-chip metal wires. A common practice to avoid the requirement of matched delays is to latch the input data with a delayed version of Dr (or Le). However, this requires adding a delay only on the rising edge of Dr using external bias voltages, which we intend to avoid. Also, adding a worst-case delay to increase reliability would slow down the communication cycle unnecessarily. To tackle problems of this sort, delay-insensitive data encoding has been used extensively in the asynchronous design community (Martin and Nystrom, 2006).
Dual-rail data is one such method, in which 2 wires are used to carry a signal: bit-true (bt) and bit-false (bf). Figure 3.23 shows the scheme, where 2N wires are required to carry N-bit data. In contrast to bundled data, where
Figure 3.22: Pre-decoder cell placement in the chip layout. The pre-decoder and the decoder NAND gates are physically separated for ease of layout.
each wire carries a signal, this added overhead provides a big advantage where delay-insensitive design is necessary. In the dual-rail configuration, the data itself carries a data-valid signal. A low value on both wires (bt bf = 00) indicates invalid data, whereas 01 and 10 represent low and high data, respectively. This explains the lack of a separate Req signal in Fig. 3.23. The state 11 is not allowed to occur.
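A minimal behavioural sketch of this encoding is given below (it assumes nothing beyond the bt/bf convention just described); it makes explicit how the 00 state doubles as the "data not valid" condition, so that no separate request wire is needed.

```python
# Behavioural sketch of the dual-rail encoding of Fig. 3.23.

def encode_dual_rail(bits):
    """Encode a list of bits as (bt, bf) wire pairs."""
    return [(1, 0) if b else (0, 1) for b in bits]

def decode_dual_rail(pairs):
    """Return the decoded bits, or None while any pair is still invalid (0, 0)."""
    bits = []
    for bt, bf in pairs:
        assert (bt, bf) != (1, 1), "the 11 state is not allowed"
        if (bt, bf) == (0, 0):
            return None           # data (and hence the implicit request) not valid yet
        bits.append(1 if bt else 0)
    return bits

print(decode_dual_rail(encode_dual_rail([1, 0, 1])))   # [1, 0, 1]
print(decode_dual_rail([(0, 0), (1, 0)]))              # None: still waiting for bit 0
```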
In Fig. 3.24 the entire input stage of the receiver is shown. The C-element and the latch can be identified as the Tx handshake block shown in Fig. 3.5. The output of the C-element, Dr (same as Le), when high, latches the available data. The output of the latch, combined with Dr, forms the dual-rail data, each bit having a bt and a bf component. These lines can be directly used as inputs to the decoder (a 2-to-4 decoder, in this case). When Dr is low, any data passing through the transparent latch is transformed into the 'invalid' (00) dual-rail state. Even if only one of the dual-rail bits is in the 00 state, all the outputs of the decoder are turned to zero. The right-most structure, generating the ∼Pa signal, is part of the wired-OR gate described in Sec. 3.4.1. The Dr signal is delayed by an amount equal to that of the data lines to generate the Pr signal used for the active pull-up of the wired-OR gates.
3.7.3 Receiver Synapse Select
For a 2D array of neurons, shown in Fig. 3.25, the X- and Y-decoder data should be combined to generate the final signal R that selects the correct receiver unit. The safest way to generate R is to use a state-holding C-element,
Figure 3.23: The dual-rail communication channel with 4-phase handshaking. The data path consists of 2N lines for an N-bit address and implicitly carries the request signal. Every data bit is represented by a bt and a bf wire. The timing diagram is shown in the lower part of the figure.
with Qx and Qy as inputs. However, as the synapse circuit is repeated several times on chip, it is advisable to keep it as small as possible. Three other methods to generate R are shown in Fig. 3.25. Driving a NAND gate with QX and QY is the next obvious solution (a); however, the structure in (b) requires one transistor fewer than the NAND gate. It can be considered as a static NAND that uses a constant pull-up current supplied by a bias. However, the static pull-up introduces problems of mismatch among synapses and makes the width of R highly variable, which results in more variability in the following analog circuits. Instead, the pull-up can be actively driven by the Pr signal. The structure in (c) requires just two transistors and is commonly used in large memory arrays (DRAMs). The pull-down path can be driven either by a bias or by the active Pr signal, as before. All the individual acknowledge signals (R) are collected together using a wired-OR, similar to Fig. 3.7, to generate a global acknowledge (Pa) driving the communication channel. However, to reduce the capacitive load arising from thousands of pixels, we first do a row wired-OR and then a column wired-OR on the row outputs (see Fig. 3.24). A delayed version of the signal Pr is used as the active pull-up for the wired-OR.
3.8 Conclusions
The Address-Event Representation (AER) protocol uses time-division multiple access for communication between neuromorphic chips. It exploits the 5-decade difference in bandwidth between a neuron (hundreds of Hz) and a digital bus (tens of megahertz), which enables us to replace thousands of dedicated connections with a handful of high-speed metal wires. Following the
Figure 3.24: Part of the AER receiver implementation. All relevant control-path and data-path signals are shown (see Fig. 3.5 and Fig. 3.6). The dual-rail data representation, internal to the chip, is marked. The data is latched using the control signals and decoded using a pre-decoder scheme (see Fig. 3.21). The wired-OR gates (see Fig. 3.7) on the right of the 2D array are used to generate acknowledge signals for the control path.
pioneering work of Sivilotti (1991) and Mahowald (1994), AER has become the most prominent method of pulse-coded data transfer. Since then, there have been significant improvements in its design (Boahen, 2000, 2004b; Merolla et al., 2007) to accommodate large multi-chip neuromorphic systems.
Here I discussed the basic formulation of the AER scheme and the method to build an asynchronous communication channel using this protocol. As introduced in Boahen (2000), I showed how pipelining the communication channel increases its throughput and elaborated on the design aspects of the pipeline. The handshaking cycles in an asynchronous pipeline can be expanded using the HSE formalism developed by Martin et al. (2003). This formal approach to AER circuit design is in contrast to the heuristic method used in many previous generations of neuromorphic chips. I described how to build a robust AER communication system, independent of any external bias. Apart from the gain in usability, the reliability of the system also improved: the data loss observed by Chicca (2006) during high-frequency activity could be completely eliminated.
Figure 3.25: The X- and Y-decoders generate a one-hot code to select a particular synapse in the 2D array (left). This can be done using various Synapse Select circuits (right). From left to right the circuits use less area, which is a major deciding factor in the implementation of large arrays of synapses.
In this chapter, I also discussed the design of the individual combinational circuit blocks. In particular, I focused on the arbiter and the decoder design. The arbiter, being an integral part of an asynchronous transmitter, has to be optimized for both speed and fairness. I described the design basics and presented data from different chips to point out the improvements in arbiter design that were proposed in Boahen (2004a). This is an important enhancement over the problems observed in Lichtsteiner and Delbrück (2005), where increased activity in one region of the chip restricted all the AER traffic to that part. The dual-rail data representation used in the decoder design is another step toward adding robustness to the AER communication system.
Chapter 4
Circuits for synaptic plasticity
4.1 Introduction
Models of learning and classification using networks of spiking neurons have recently become very popular in the theoretical neuroscience community (Kempter et al., 1999; Natschläger and Maass, 2002; Fusi and Senn, 2006). However, there are very few VLSI implementations of spiking neurons that exhibit plasticity and learning. From the early efforts (Diorio et al., 1996) to the very recent ones (see Giulioni et al., 2007; Mitra et al., 2008), different learning rules, and a wide variety of silicon circuits implementing them, have been tested. Among the various technological constraints to be considered in designing such circuits, an efficient long-term storage of the synaptic weight is of foremost necessity. Approaches range from technology-specific floating gates (P. Hasler et al., 1999; Häfliger and Rasche, 1999; Shon et al., 2002), through digital methods like on-chip SRAMs (Arthur and Boahen, 2006; Schemmel et al., 2007) or external look-up tables (Vogelstein et al., 2003; Wang and Liu, 2006), to multi-level analog storage (Häfliger, 2007). Storage of binary synaptic weights on silicon is also reported in Bofill-i Petit and Murray (2004) and Indiveri et al. (2006a).
In this work, the circuits implementing the plastic synapses store binary weights as the stable states of an amplifier. Here I show how the design was carried out to optimize for power and area. The current-mode global feedback signal introduced here, from the neurons to the synapses, also makes the circuit robust against signal interference. Though motivated by the theoretical work proposed in Brader et al. (2007), the design of the synaptic circuits is not restricted to incorporating only that specific learning rule. Various bias parameters can be tuned to modify both the functionality and the network architecture to accommodate generic (like STDP) or even specific learning
rules (Gütig and Sompolinsky, 2006). In the next chapters I will show the performance of the silicon circuits mostly in the context of the particular learning rule of Brader et al. (2007). Another silicon implementation that meets the same theoretical requirements is shown in Badoni et al. (2006).
4.2 The IFSL family of chips
All the chips of the IFSL family designed during this project share a basic architecture. Apart from the AER interface circuitry, they consist of arrays of analog integrate-and-fire (I&F) neurons and dynamic synapses. Here we describe in detail the IFSL-v2 chip, which was used for most of the experimental results shown in the next chapters. The fundamental building block of the chip is its analog core, which consumes more than 80% of the silicon area. Along with the neurons, there are three kinds of silicon synapses: plastic excitatory (P), non-plastic excitatory (E) and non-plastic inhibitory (I). There are 16 neurons, 1920 plastic synapses and 128 non-plastic ones. The number of synapses connected to a neuron can be configured, depending on the requirements of the experiment. The chip was fabricated using a standard 0.35µm CMOS technology and occupies an area of 6.1mm². It can be used to accommodate generalized feedforward learning networks consisting of an input and an output layer. It also has nearest-neighbor connectivity among neurons and a global inhibition for implementing more complex networks. The architecture, as shown in Fig. 4.1(a), is used for the learning network described in Brader et al. (2007) and in Fig. 2.4(a). The post-synaptic neurons behave as the output units of the network, while the plastic synapses receive spikes from the input layer. The chip layout in Fig. 4.1(b) shows the physical dimensions and the placement of the circuit elements. In the following sections the circuits constituting the synapses and parts of the neuron will be described.
The dynamics of the synapses are controlled by both pre-synaptic and post-synaptic signals. Each synapse receives spikes at its pre-synaptic terminal (spike inputs to the chip) and control signals from the post-synaptic neuron (the silicon neuron it is connected to). We therefore divide the circuits associated with a synapse into pre-synaptic and post-synaptic modules. The pre-synaptic module consists of five different blocks: the AER interface, the Pulse Extender, the weight update, the bistability, and the EPSC (excitatory post-synaptic current) generating block. The post-synaptic module consists of the I&F soma, the pulse integrator and a few current comparators. We also implement a method to modify the number of synapses connected to a neuron, by placing a multiplexer in between the
Figure 4.1: a) Cartoon of a neuron with its synapses. In the circuit implementation, the synapse is broken into a pre-synaptic and a post-synaptic module. b) Layout of the chip along with the floor-plan for the soma and the synapse circuits. The bottom zoomed-out image shows one synapse with all the internal blocks in its pre-synaptic module (see Sec. 4.3).
pre-synaptic and post-synaptic modules.
The same learning rule was implemented in the LANN¹ family of chips designed by Giulioni et al. (2007), where they also used part of the circuits developed during this project. The basic difference between the LANN and the IFSL chips lies in their pre-synaptic circuits. In contrast to the IFSL chips, the LANN chips use a dedicated digital read-out to monitor the state of every silicon synapse. Though this is useful during the characterization of the synaptic array, the area overhead (3219µm² per synapse) is more than double that of the IFSL-v2 chip (1400µm² per synapse, see Fig. 4.1(b)). The LANN chips also use constant current pulses when generating the EPSC, without incorporating the temporal dynamics observed in biological synapses.
4.3 The pre-synaptic module
Instances of this module cover the largest section of the chip; one is necessary for each silicon synapse. Due to the large number of on-chip synapses required for good classification performance (see storage capacity in Sec. 2.4), the pre-synaptic module should be designed with the minimal silicon area possible.
¹ LANN stands for Learning Attractor Neural Network.
The primary inputs to a pre-synaptic module are the decoded data from the X and Y address decoders. They are passed through an AER interface circuit (see Sec. 3.7.3) to select a single synapse in the 2D address space and stimulate the selected synapse with a digital pulse, called a pre-synaptic event. This pulse, received by the synapse at the output of the AER interface, has a typical width of 800-1000ns. A Pulse Extender is used to extend this to a few milliseconds, which is necessary for the generation of the post-synaptic current. At every pre-synaptic event, the synaptic weight is updated in the weight update block and an excitatory post-synaptic current (EPSC) is produced, depending on the present synaptic weight. In between pre-synaptic events, the weight is restored toward one of its two stable states (Bistability block).
4.3.1 The pulse extender block
In order to mimic the slow dynamics and time constant of the post-synaptic current, a wide digital input pulse (∼1ms) is essential for the EPSC generating block. This pulse is used to produce a post-synaptic current (IEPSC) of comparable rise time. The pre-synaptic input (∼1µs), coming from the AER interface, is passed through a Pulse Extender (PE) that extends the width to a desired value of a few milliseconds. Moreover, the narrow output pulse coming from the AER interface can vary by nearly 40% due to delays in the communication cycle, decoder irregularities and unmatched NAND gate delays. The AER output pulse, if directly fed to the sensitive analog core, would create a high degree of mismatch between post-synaptic currents even if the synaptic weights were the same. The pulse extender also helps in minimizing this effect. The PE should be activated at the positive edge of the input pulse but keep the extended pulse width independent of the variations in the input. Considering the high synaptic density required, it also has to be small enough not to increase the area of the pre-synaptic module appreciably. This rules out popular digital circuits like the monostable multivibrator, which has the required functionality but occupies a large area consisting of comparators and flip-flops.
Figure 4.2(a) shows a simple method of extending a voltage pulse (Vin). The input activates transistor Mp1 and rapidly discharges the node Vc for the duration of the pulse. The node then slowly charges up through Mp2 with a small leak current. The inverter at the capacitive node changes state when the voltage Vc crosses its inverting threshold. Mp2 should be weakly biased to keep the time constant of Vc in the millisecond range. Even though this produces an output pulse (Vpulse) of larger width than the input (Vin), the inverter can draw a large amount of current due to its slowly varying input (Vc). If the input
Figure 4.2: Schematics of two different circuits for the PE (Pulse Extender) block. A part of the EPSC block is shown within the dotted line. (a) The narrow input pulse Vin generates a wide voltage pulse Vpulse. The output is used to produce the current pulse (Ipulse) within the EPSC block. (b) A wide current pulse is directly produced in the EPSC block, using fewer transistors.
pulse width is negligible compared to the intended output, the width of Vpulse depends only on the charging time and not on Vin. A starved inverter, which puts an upper bound on the current, is used to limit the inverter current (to within 100nA) during this slow change in Vc. Transistors Ms1,s2 are part of the EPSC block that uses Vpulse to produce the Ipulse necessary for the IEPSC generation.
It is important to note that the extended voltage pulse is required only to produce the Ipulse current pulse for EPSC generation. Instead, a current pulse of uniform width could be generated directly by simplifying the design. In the alternative PE circuit shown in Fig. 4.2(b), a transistor and a bias voltage are removed, saving a considerable amount of area. In this case, Vc goes high when Vin is high and then slowly discharges through Mp2. Ms1 in the EPSC block turns the current Ipulse on when Vc is high, but the magnitude of the current is limited by transistor Ms2. As long as the decaying voltage Vc is above the bias Vh, the gate-source voltage of Ms2 controls the magnitude of the current Ipulse, keeping it constant. When Vc goes below Vh, the current starts to decay exponentially, producing a sharp-edged current pulse.
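The behaviour of the circuit in Fig. 4.2(b) can be summarized with a simple behavioural model (a sketch with assumed parameter values, not a transistor-level simulation): Vc is set high by the input pulse and then decays slowly, and Ipulse stays at its bias-limited value for as long as Vc remains above Vh.

```python
# Behavioural sketch of the pulse extender of Fig. 4.2(b); values are illustrative.

def pulse_extender(t_spike, t, v_dd=3.3, v_h=1.0, tau_decay=2e-3, i_max=100e-9):
    """Return (Vc, Ipulse) at time t for a single input spike at t_spike."""
    if t < t_spike:
        return 0.0, 0.0
    v_c = max(v_dd * (1.0 - (t - t_spike) / tau_decay), 0.0)  # slow, roughly linear discharge
    i_pulse = i_max if v_c > v_h else 0.0                     # sharp-edged current pulse
    return v_c, i_pulse

# A ~1 us input pulse at t = 0 yields a current pulse of roughly 1.4 ms,
# independent of the exact input width.
for t in (0.5e-3, 1.0e-3, 1.5e-3):
    print(t, pulse_extender(0.0, t))
```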
The simulation results for the circuit in Fig. 4.2(b) are shown in Fig. 4.3. The top panel shows the input voltage pulses: three pulses of 500ns width are shown in gray and one of 900ns width in black. The input pulses of the two different widths are indistinguishable when plotted on a millisecond time scale. The bottom panel shows the current Ipulse, in black and gray, in response to the respective inputs. Even though the wider input (900ns) is nearly twice the narrow one (500ns), the current output shows no noticeable difference.
Figure 4.3: Simulation results of the circuit shown in Fig. 4.2(b). The top panel shows two different inputs having pulse widths of 500ns (gray) and 900ns (black), overlapped on each other. In the bottom panel, we plot the respective output current pulses in gray and black. Though the input width varies by nearly 100%, the widths of the current pulses are indistinguishable. The current pulses also show a linear summation behavior when two inputs come close to each other.
Voltage Vc, plotted as a dashed line, shows a sharp rise and a slow decay. The output Ipulse is high as long as Vc is greater than the constant voltage Vh (plotted as a dotted line) and comes down rapidly when Vc goes below Vh.² Two consecutive pulses in the later part of the simulation add up to form a current pulse wider than that for a single pulse, with a width that also depends on their inter-spike interval. This is a very desirable property of the circuit that demonstrates its linearity.
4.3.2 The weight update block
The synaptic weight can be modified at every pre-synaptic input spike, depending on the post-synaptic control signals. The updates are instantaneous jumps of the synaptic weight during the narrow input pulse (width ∆t). The jumps can be of positive or negative polarity, of the same magnitude, or null.
In Fig. 4.4 we show the weight update block that increases or decreases the synaptic weight (node voltage w) by sourcing charge to or sinking charge from the capacitor Cw. During a pre-synaptic input, MsU and MsD receive the digital
² As seen from the simulation, the decay in the current (Ipulse) starts a little before Vc crosses Vh. This is because the source voltage of Ms2 (Vs) has to rise to accommodate the current Ipulse through Ms1, reducing the Vgs of Ms2.
signals VP and V̄P respectively, with both transistors behaving as switches. This initiates a change in the synaptic weight. The polarity of the change depends on the values of IUPm and IDNm, mirrored from the post-synaptic module. Transistors MU and MmU, sharing the gate voltage VUP, mirror the current IUP produced in the post-synaptic module. Similarly, MD and MmD are used to mirror IDN. During pre-synaptic spikes VP goes high and V̄P goes low, turning on both switches simultaneously. In between spikes the switches are off, which forces the current-mirror circuits into an incomplete configuration; hence, no current flows into or out of the capacitor.
At every pre-synaptic spike:
$$\Delta w = (I_{UP} - I_{DN})\,\frac{\Delta t}{C_w} \qquad (4.1)$$
In Sec. 4.4 we will describe how the currents IUP and IDN, produced in the post-synaptic module, can take either the value Ijump or zero. The voltages VDN and VUP, corresponding to the gates of transistors MU and MD, are broadcast to all synapses connected to the same neuron. This lets the post-synaptic module mirror the control currents to all its synapses. It should be noted that only one of the two currents can take the value Ijump at a time (see Sec. 4.4), while both can be zero simultaneously. This restricts ∆w to be either +Ijump ∆t/Cw, −Ijump ∆t/Cw, or zero. The block performs a logical AND function between the pre- and post-synaptic signals. It lets a current IUP charge the synapse when:
$$V_P = 1 \ \text{and}\ I_{UP} = I_{jump}, \qquad (4.2)$$
it discharges the weight with IDN when:
$$V_P = 1 \ \text{and}\ I_{DN} = I_{jump}, \qquad (4.3)$$
and it makes no change when either VP is zero or IUP and IDN are both zero.
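As a concrete restatement of Eqs. 4.1-4.3, the sketch below applies the jump rule in software; the values used for Ijump, ∆t and Cw are illustrative assumptions, not the chip's bias settings.

```python
# Behavioural sketch of the weight jump of Eqs. 4.1-4.3 (illustrative values).

I_JUMP = 10e-9    # A, high value of the learn-control currents
DT     = 1e-6     # s, width of the pre-synaptic pulse
C_W    = 100e-15  # F, weight capacitor

def weight_jump(w, i_up, i_dn):
    """Return the weight after one pre-synaptic spike (Eq. 4.1)."""
    return w + (i_up - i_dn) * DT / C_W

w = 0.25                                     # V, weight voltage before the spike
w = weight_jump(w, i_up=I_JUMP, i_dn=0.0)    # UP condition met: the weight increases
w = weight_jump(w, i_up=0.0, i_dn=0.0)       # neither condition met: no change
print(f"{w:.2f} V")                          # -> 0.35 V
```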
4.3.3 The bistability block
One of the prescriptions of the learning rule is that the synaptic weight should reach either of its two stable states in the long term. During inter-spike intervals the bistability circuit should refresh the weight by pushing it toward one of these two states, depending on its previous value.
Figure 4.4: The weight update block in the pre-synaptic module modifies the node w at every pre-synaptic pulse (VP). The node charges, discharges or holds its state depending on the currents IUPm and IDNm mirrored from the post-synaptic module. The current mirrors (MD, MmD and MU, MmU) are inactive when the switching transistors MsD and MsU are off, i.e., during inter-spike intervals.
In the silicon implementation, the capacitor storing the synaptic weight (Cw in Fig. 4.4) is always actively charged or discharged with a current much smaller than Ijump. This is done by an amplifier in positive feedback configuration, biased with a small subthreshold current Ir, as shown in Fig. 4.5(a). The positive feedback amplifier has two stable outputs corresponding to its supply rails VH and VL. Depending on the value of w after the last jump in synaptic weight, the node is either charged or discharged toward one of the two rails. If w was left at a value below θ, the node is slowly pulled downward, and vice versa. The amplifier bias current Ir determines the rate at which w is driven to the supply rails (the slew rate). Unlike the theoretical model, the bistability circuit is active not only during the ISI but also during a pre-synaptic spike. However, as the magnitude of Ijump is much larger than that of Ir, the weight update overrides the weak bistable
drive. The slew rate can be expressed as:
$$\frac{dw}{dt} = \begin{cases} +\frac{I_r}{C_w} & \text{if } w > \theta \text{ and } w < V_H \\ -\frac{I_r}{C_w} & \text{if } w < \theta \text{ and } w > V_L \\ 0 & \text{otherwise} \end{cases} \qquad (4.4)$$
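The refresh behaviour of Eq. 4.4 can be sketched as a discrete-time drift toward the nearer rail; the parameter values below are assumptions chosen only to make the dynamics visible, not the chip's biases.

```python
# Behavioural sketch of the bistable refresh of Eq. 4.4 (illustrative values).

I_R, C_W = 1e-9, 100e-15           # A, F
V_H, V_L, THETA = 0.5, 0.1, 0.3    # V: supply rails and bistability threshold

def refresh(w, dt):
    """Drift w toward the nearer stable state for a time dt (Eq. 4.4)."""
    slew = I_R / C_W
    if w > THETA:
        return min(V_H, w + slew * dt)
    if w < THETA:
        return max(V_L, w - slew * dt)
    return w

w = 0.34                           # left slightly above theta by the last jump
for _ in range(100):               # 100 us of inter-spike interval, in 1 us steps
    w = refresh(w, 1e-6)
print(f"{w:.2f} V")                # -> 0.50 V, the high rail
```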
The choice of the supply rails and the amplifier type plays an important role in the design. As shown in Fig. 4.5(b), the output w is connected
Figure 4.5: a) A wide-range p-type input amplifier is used in positive feedback configuration for the bistability block. The output (w) is slowly driven by the bias current Ir to one of the two supply rails (VH or VL). b) The bistability block connected to the weight update and the EPSC blocks. The choice of the amplifier type and the magnitude of the supply rails in the design of the bistability block also depend on the functionality of the connected circuits.
to an nMOS transistor in the EPSC block (transistors in gray). A hand drawing portraying the jumps and the bistable drive of the voltage on node w is shown in Fig. 4.6(a). To simplify the circuit description, the lower transistor (MP in Fig. 4.5(b)) is replaced with a switch. The corresponding input currents to the EPSC block (Ipulse) are also shown in the lower panel. The threshold θ is set to divide the range of w into two equal halves, such that transitions to the higher or lower state are equally likely. In order to keep Ipulse within subthreshold values, VH cannot exceed 0.5V. If VL is set to 0.1V, the output of the amplifier (w) is always within a narrow range between 0.1 and 0.5V. A wide-range pMOS differential amplifier is best suited to drive the output to such low values. From Eq. 4.1 we know that the value of ∆w, determined by Ijump (∼10nA)³, cannot be made arbitrarily small. If W is the difference between the minimum and maximum voltage the node w can take, then W/∆w can be considered as the dynamic range of the synaptic weight. For a balanced transition probability, the theoretical requirement is a dynamic range of more than ten. However, the narrow voltage range limits the dynamic range to a smaller value. Though the dynamic range can be increased by
³ The current Ijump has to be more than one order of magnitude larger than Ir, for reasons described in the last paragraph. Also, Ir cannot be smaller than 1nA without causing serious disparity due to mismatch between different synapses. The current Ijump, which is also mirrored to hundreds of synapses simultaneously, cannot be too small without having mismatch issues of its own. Hence Ijump cannot be less than 10nA.
Figure 4.6: Schematic representation of the effect of the bistability block on the output of the EPSC block (not a simulation). The transistor MP in Fig. 4.5 is replaced with a switch for the sake of simplicity. A typical example of the dynamics of the synaptic weight w is shown in the top panels. a) The nMOS Mw converts the weight into the Ipulse current. The current changes in an exponential fashion depending on the instantaneous w. To keep Ipulse within the subthreshold limits, the range of w is restricted between 0V (VL) and 0.5V (VH). b) The dynamic range can be extended to 0-0.8V (VH = 0.8V) by using a diode-connected transistor (MD). This limits the coefficient of the exponent to a lower value, thus restricting the magnitude of Ipulse from overshooting the subthreshold limit.
raising the value of VH, that would also result in large Ipulse currents crossing the subthreshold limit. An additional diode-connected (gate-drain shorted) transistor MD, placed between Mw and the switch, can restrict Ipulse to subthreshold values for a larger VH, but not for VH above 0.8V.
Alternative design of the bistability block
An alternative approach to solve the problem is to utilise the entire supply rail (0-VDD) for the bistable amplifier and to decouple the node w from the actual weight used by the EPSC block. A full rail-to-rail dynamic range requires the gate voltage of Mw to take binary values without traversing the intermediate analog states. A possible implementation is shown in Fig. 4.7. Here, wD determines the state of the synapse (on or off) and Vlim is used to limit the current Ipulse to the subthreshold range. The node w is connected to the weight update block and the bistable amplifier as in Fig. 4.5(a), but is separated from the transistor Mw. An obvious method to convert w to wD is to use a high-speed open-loop amplifier behaving as a one-bit analog-to-digital converter. However, the power and area overhead for such an implementation
Figure 4.7: The positive feedback amplifier with supply rails connected to VDD and ground increases the dynamic range of w. The EPSC block does not directly take w as its input, but is connected to wD, a digital version of w. The circuit within the dashed line is an open-loop amplifier acting as a one-bit analog-to-digital converter that is active only during the pre-synaptic pulse VP. The bias Vlim limits the magnitude of Ipulse to within the subthreshold limit.
is not only unnecessary, but also impractical for the small synaptic circuit. The simple circuit shown within the dotted lines in Fig. 4.7 behaves like a two-stage non-inverting amplifier performing a similar function while consuming much less power. Ma1 behaves like an inverting amplifier with Ma2 as its load. This amplifier turns on only during pre-synaptic spike events (using switch Ma3), reducing the static power dissipation to zero. The dynamic power dissipation through the inverter and the amplifier is also minimal, due to the narrow width of the pulse. The gain of the amplifier has to be high enough for wD to settle to a digital state within the allowed time period.
In the simulation results in Fig. 4.8 we forced w to vary over a wide voltage range (top panel) and plotted the corresponding wD and Ipulse. As expected, wD settles to either zero or VDD very fast during a pre-synaptic event (except when w is around θ). The current Ipulse therefore shows a one-or-none nature, which is exactly what the EPSC block should receive.
4.3.4 The EPSC block
The EPSC block generates the final synaptic current, with biologically realistic temporal dynamics and an amplitude dependent on the synaptic weight. The EPSC current is produced at every pre-synaptic event and depolarizes the membrane capacitance, as shown in Fig. 4.9(a). This can be modeled as a controlled current source charging up a capacitance at every input spike (see Fig. 4.9(b)). In models of spiking neural networks, the temporal dynamics of the current source is often ignored. It is generally considered
Figure 4.8: Simulation results showing w, wD and Ipulse from Fig. 4.7. The synapse receives a regular spike train input that modifies w according to IUPm and IDNm (not shown). Node wD settles to one of the two bistable states during pre-synaptic spikes, depending on the magnitude of w. The current Ipulse has a high subthreshold value only when wD is high at the instant of a pre-synaptic spike.
to be a sharp current pulse of known amplitude. In many VLSI synapse implementations, constant current sources activated only for the duration of the pre-synaptic input pulse have been used (Mead, 1989; Fusi et al., 2000; Chicca et al., 2003). However, within the context of pulse-based neural networks, modeling the detailed dynamics of post-synaptic currents can be a crucial step for learning neural codes and encoding spatiotemporal patterns of spikes (Gütig and Sompolinsky, 2006). Various VLSI models of synaptic dynamics have been implemented, from the early works of Mead (1989) and Lazzaro (1994) to the more bio-plausible ones of Boahen (1996), Arthur and Boahen (2004) and Farquhar and Hasler (2005).
In theoretical models of synaptic transmission, a pre-synaptic spike is considered to release neurotransmitters that change the membrane conductance by activating the ion channels of the post-synaptic cell. This results in a post-synaptic current. The membrane conductance changes are generally modeled as an α-function (Rall, 1967), as an exponential rise and fall (Destexhe et al., 1998), or as a difference of exponentials (Dayan and Abbott, 2001). A first-order linear filter with equal exponential rise and fall times is a good approximation of the model in Destexhe et al. (1998). Linear filters can be implemented in VLSI with very few MOS transistors when their subthreshold exponential transfer function is exploited. Frey (2000) demonstrated this family of log-domain filters using the exponential behavior of bipolar transistors, and Arthur and Boahen (2004) showed how to use them in sili-
Figure 4.9: a) A biological synapse is activated by a pre-synaptic spike that initiates a complex biochemical process resulting in a post-synaptic current (EPSC). b) It is modeled in silicon as a controlled current source, triggered by a pre-synaptic pulse, whose temporal dynamics follow those of the real EPSC.
con synaptic circuits. The Differential Pair Integrator (DPI) circuit reported in Bartolozzi and Indiveri (2007b) is another suitable approximation of the linear integrator and has been used in a conductance-based silicon synapse. We used the DPI circuit in this work because of its compactness and our familiarity with the design (see Bartolozzi et al., 2006). Appendix C shows a detailed analysis and design of other compact log-domain filters with various tunable parameters.
In the DPI circuit shown in Fig. 4.10(a), the steady-state IEPSC is proportional to the current Ipulse. The layout of the DPI circuit covers nearly a third of the area occupied by the pre-synaptic module (see Fig. 4.1(b)). However, as it implements a linear filter (Bartolozzi et al., 2006), it would be possible to use one DPI circuit per neuron that integrates the effect of all pre-synaptic inputs. This is shown in Fig. 4.10(b), where w1 to wn represent the weights of different synapses and S1 to Sn are the switches receiving the corresponding pre-synaptic spikes. A common Ipulse current is generated and fed to the core of the DPI circuit, which is shown within the dotted line. The part of the circuit inside the dotted line (compare with Fig. 4.10(a)) uses this Ipulse to produce a synaptic current that is a linear summation of all inputs. Though the area of the EPSC block could be greatly reduced with this method, in this work we followed a conservative design approach by using one DPI per synapse, placing less emphasis on the synaptic density.
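The intended input-output behaviour of the EPSC block can be summarized with a discrete-time first-order filter. This is a behavioural stand-in for the DPI, not its transistor-level description, and all parameter values are assumptions.

```python
# Behavioural stand-in for the DPI-based EPSC block (illustrative values only).

def epsc_response(spike_times, weight, tau=10e-3, t_pulse=1e-3,
                  i_unit=50e-12, dt=0.1e-3, t_end=0.1):
    """Return a list of (t, IEPSC) samples for the given pre-synaptic spike times."""
    i_epsc, trace = 0.0, []
    for k in range(int(t_end / dt)):
        t = k * dt
        # Ipulse is on for t_pulse after each spike, gated by the binary weight.
        driving = any(0.0 <= t - ts < t_pulse for ts in spike_times)
        i_pulse = i_unit * weight if driving else 0.0
        i_epsc += dt / tau * (i_pulse - i_epsc)   # first-order low-pass update
        trace.append((t, i_epsc))
    return trace

trace = epsc_response([0.01, 0.015, 0.02], weight=1)
print(f"peak EPSC: {max(i for _, i in trace):.2e} A")
```

Because the filter is linear, the responses to the three closely spaced spikes simply add, which is the property exploited by the shared-DPI scheme of Fig. 4.10(b).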
4.3.5 Point-neuron architecture
A point-neuron, shown in Fig. 4.11(a), consists of a single node where currents from all synapses are summed and then fed into an integrate-and-fire
Figure 4.10: a) Schematic of the differential pair integrator (DPI) circuit used for IEPSC generation (Bartolozzi and Indiveri, 2007b). The current is produced only during the VeP pulse and has a magnitude that depends on the synaptic weight at node w. One EPSC block is used for each silicon synapse. b) A method of sharing the EPSC block by exploiting the linearity of the DPI circuit. Synapses with different weights (w1-wn) produce a common Ipulse depending on the pre-synaptic inputs (S1-Sn). All inputs have an additive effect on the IEPSC current charging the post-synaptic membrane.
soma. This is in contrast to the compartmental model, where the physical positions of the synapses are important for their contribution to the post-synaptic depolarization. Unlike the compartmental model, a point-neuron has a single lumped membrane capacitor connected to a common node, and the node voltage is the membrane voltage (Vmem). Similarly, in the silicon implementation, all synapses belonging to the same neuron have independent IEPSC currents that charge up the common Vmem in an identical way, without any spatial bias. Fig. 4.11(b) shows an array of pre-synaptic modules, each representing a plastic synapse, connected to one neuron (represented by the post-synaptic module). It also shows all the circuit blocks forming the pre-synaptic module with their internal connections. The Spre input on the left is where the pre-synaptic spike is received. Though the synapses are physically placed at different locations, away from the neuron, they show point-neuron behavior as far as the electrical connectivity is concerned. Apart from the common Vmem node, the pre-synaptic modules also share the common learn-control signals (VUP and VDN) produced in the post-synaptic module.
Figure 4.11: The point-neuron architecture, shown on top, accumulates pre-synaptic inputs from all its synapses on a single node (Vmem). The corresponding silicon implementation, with numerous pre-synaptic modules connected to a single post-synaptic module, is shown below. All internal blocks of a typical pre-synaptic module, along with the global learn-control signals (VUP and VDN), are also shown.
4.4 The post-synaptic module
Every silicon neuron is made up of an instance of the post-synaptic module. This module consists of an I&F soma block, a pulse integrator block and three current comparator blocks. The action potential is generated in the I&F soma, while the pulse integrator and the current comparators are responsible for producing the learn-control signals fed back to the synapses.
Though the changes in synaptic weight occur only during a pre-synaptic spike, the polarity and magnitude of the change depend on the state of the post-synaptic module. As described in the learning rule (Sec. 2.5), the average post-synaptic firing rate and the membrane depolarisation at the pre-synaptic spike instant determine the nature of the synaptic modification. Combining Eq. 2.4 and Eq. 2.6, the conditions for upward or downward jumps are given by the status of the binary signals UP and DN, respectively:
$$UP = \begin{cases} 1, & \text{if } k_1 < \nu_{post} < k_3 \text{ and } V_{mem} > V_{mth} \\ 0, & \text{otherwise} \end{cases}$$
$$DN = \begin{cases} 1, & \text{if } k_1 < \nu_{post} < k_2 \text{ and } V_{mem} < V_{mth} \\ 0, & \text{otherwise} \end{cases} \qquad (4.5)$$
Figure 4.12: Dependence of the binary learn-control signals, UP and DN, on the post-synaptic membrane potential Vmem. This assumes that the post-synaptic frequency is in the right operating range.
Here νpost is the average post-synaptic firing rate, k1-k3 and Vmth are constants, and Vmem is the post-synaptic membrane potential. Due to their dependence on Vmem, the binary control signals UP and DN are activated in a complementary fashion. They are also called eligibility traces, since they make a pre-synaptic spike eligible to modify the synapse. If the condition for an upward jump is met (i.e., UP = 1) at the instant of a pre-synaptic spike (Spre), the synaptic weight increases by a predefined amount, while it decreases by the same amount if the condition for a downward jump is met at that moment. Though the UP and DN conditions cannot be simultaneously true, they can both be false, resulting in no synaptic modification during a pre-synaptic spike. Figure 4.12 shows the status of the control signals when the restrictions on νpost in the above equations are met.
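For concreteness, Eq. 4.5 can be restated as a small function; the numerical thresholds used below are placeholders, since on the chip they are set by bias currents.

```python
# Behavioural restatement of Eq. 4.5 (threshold values are illustrative).

K1, K2, K3 = 20.0, 80.0, 120.0   # Hz
V_MTH = 0.8                      # V

def learn_controls(nu_post, v_mem):
    """Return the (UP, DN) learn-control flags for the current post-synaptic state."""
    up = int(K1 < nu_post < K3 and v_mem > V_MTH)
    dn = int(K1 < nu_post < K2 and v_mem < V_MTH)
    return up, dn

print(learn_controls(nu_post=60.0, v_mem=1.2))    # (1, 0): potentiation eligible
print(learn_controls(nu_post=60.0, v_mem=0.4))    # (0, 1): depression eligible
print(learn_controls(nu_post=150.0, v_mem=1.2))   # (0, 0): rate outside the allowed range
```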
4.4.1 The I&F soma block
There have been a number of implementations of integrate-and-fire circuits for silicon I&F somas (Mead, 1989; Culurciello et al., 2001; van Schaik, 2001). All the proposed circuits essentially follow a similar principle: a capacitor is charged by the synaptic currents, and a positive feedback circuit rapidly drives the integrated voltage to the positive supply once it crosses a threshold. Subsequently the capacitor is reset to ground. In this work we used the low-power I&F soma described in detail in Indiveri (2003). The spiking threshold, refractory period and spike-frequency adaptation can be
Figure 4.13: Expected behavior of the I&F block. a) Response to a step input current injected into the soma. b) Response to Poisson input spikes stimulating the excitatory synapses.
independently controlled in this circuit. It also consumes the least amount of power of all the previous implementations. For an average firing frequency of around 50Hz, the soma consumes a few hundred nanowatts of power, ideal for a large-scale system consisting of hundreds of such elements. The two different outputs of the soma are its analog membrane voltage Vmem and the digital pulse representing an action potential.
The cartoons in Fig. 4.13 show the methods of stimulating a silicon neuron. As in physiological experiments, the soma can be stimulated either directly with a step current or via the excitatory synapses. The figure shows the expected firing patterns for a step current and for Poisson-distributed input at random synaptic locations. A pulse integrator connected to the output of the neuron can provide a measure of the average post-synaptic firing frequency (νpost).
4.4.2 The pulse integrator block
In biology, the slowly varying calcium concentration ([Ca]) of the cell is a measure of the average firing frequency of the neuron. In the silicon implementation, however, we directly integrate the digital output pulses from the soma to generate an analog signal, a voltage or a current, for estimating the mean firing rate.
In a preliminary design of the integrator, a capacitor was charged by
a current pulse for every post-synaptic spike and then discharged linearly
during ISIs. The voltage on the capacitor (V[Ca] ) provides an approximate
measure of the mean spike frequency. Though easy to implement, this is not
a reliable method to obtain the average post-synaptic frequency, νpost . As
the charge and discharge phases are independent of each other, the voltage
on the capacitor hits one of the supply rails within a short time duration.
Figure 4.14: Simulation of the DPI circuit when used as a pulse integrator. a) The response to a single input pulse shows low-pass-filter-like behavior. b) Uniform pulse-train inputs of different frequencies (40, 50 and 70Hz) take the steady-state output to different asymptotic values.
To avoid this problem, we use a first-order low-pass filter that produces an output magnitude directly proportional to the input frequency.
A simple linear filter can be designed using a parallel resistor-capacitor combination and an amplifier for the voltage readout. However, implementing a linear resistor in VLSI is not an obvious task, and the overall circuit requires a fairly large area. In Sec. 4.4.3 I will discuss the additional problems encountered when using a voltage signal to represent νpost. Hence, a current-mode pulse integrator is best suited for the purpose. The integrator should generate a current (I[Ca]) proportional to the average output frequency of the post-synaptic neuron. Different low-pass filters (LPF) that can be designed to meet this requirement are shown in Appendix C. Here we used a differential pair integrator (DPI) as a low-pass filter, similar to the one shown in Fig. 4.10(a) (Bartolozzi et al., 2006). The simulation results in Fig. 4.14(a) show the response of the integrator to a single wide input pulse, and Fig. 4.14(b) shows the response to regular pulse trains of different frequencies. As desired, the steady-state LPF current is a function of the mean νpost and approaches its final value asymptotically.
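The role of the pulse integrator can be illustrated with a discrete-time first-order low-pass filter whose steady-state output is proportional to the mean input rate; the time constant and per-spike charge below are assumed values.

```python
# Behavioural sketch of the current-mode pulse integrator (illustrative values).
import math

def integrate_spikes(spike_times, tau=0.1, q=1.0, dt=1e-4, t_end=1.0):
    """Low-pass filter a spike train; the output settles around q * mean rate."""
    decay = math.exp(-dt / tau)
    spikes = {round(t / dt) for t in spike_times}
    i_ca = 0.0
    for k in range(int(t_end / dt)):
        i_ca *= decay
        if k in spikes:
            i_ca += q / tau        # each spike injects a fixed amount of charge
    return i_ca

# A regular 50 Hz train: the output fluctuates around q * 50 = 50 (arbitrary units).
train = [k / 50.0 for k in range(1, 50)]
print(round(integrate_spikes(train), 1))
```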
4.4.3 The dual threshold comparator block
Every neuron in the chip has to produce two binary learn-control signals (UP and DN) and broadcast them to all its synapses simultaneously. The average
Figure 4.15: The physical layout of the neuron and synapses requires three wires carrying the UP, DN and mem signals to run in proximity for a few millimeters on the chip. This can create enough coupling to destroy the integrity of the slowly varying mem signal if the binary signals (UP, DN) have large voltage swings.
post-synaptic firing frequency should be compared with two thresholds to generate the UP and DN signals, as described in Eq. 4.5. In the silicon implementation of a point-neuron, the synapses are physically placed at a large distance, often a few millimeters, from the post-synaptic module. When broadcasting the learn-control signals, care should be taken that the signal integrity is not lost in charging the large line capacitance (originating from long metal lines) and the numerous device capacitances (from the large fanout). For a point-neuron, the membrane potential (Vmem) seen by the synapses is the same as that generated in the soma. Hence, like the binary signals, Vmem has to be shared by all synapses as well (see Fig. 4.15). In such cases, shielding is necessary to remove interference between signals with sharp, high voltage swings and the slowly varying ones. In the following sections we will discuss the possible use of voltage- or current-mode comparators for generating and transmitting these global signals.
Voltage mode comparator
Let us first consider the mean post-synaptic frequency being represented by a voltage, V[Ca]. To determine whether this analog signal lies between two thresholds, two different circuits can compare the input with the two thresholds simultaneously, and the outputs of the comparators are combined for a final result. Standard comparators (high-gain amplifiers) and a digital AND gate are required to achieve this functionality. However, this would typically require 15 to 20 MOSFETs and a significant amount of DC power consumption. On the other hand, if the speed requirements are not very stringent, the same dual-threshold comparison can be achieved with more power-efficient and compact circuits. Here we show a circuit with only 8 MOSFETs that dissipates a very small amount of power, and only when the signal is within the two thresholds (Mitra and Indiveri, 2005).
In Fig. 4.16(a), we show a dual-threshold comparator whose output directly indicates the presence of the input V[Ca] within the two thresholds k1 and k3. This circuit is particularly useful if the
Figure 4.16: The dual-threshold voltage comparator (a) and its output (b). As the input V[Ca] sweeps from 0 to 1.6V, the output swings from its default high state to a low one only between the two thresholds k2 and k1 (0.5V and 1.4V, respectively). The inset shows the increase in the current in the left branch of (a) when V[Ca] is in the 0.5-1.4V range. The horizontal line indicates the constant current (Ip). The maximum current in the left branch is clipped using the Vlim bias, to keep the power consumption minimal.
lower threshold is near zero (we set k1 = 0.5V). The output swings high to Vdd when k1 > V[Ca] or V[Ca] > k3, and goes low to k1 when k1 < V[Ca] < k3. Transistors M1 and M2 form a basic inverter which conducts a large current (Ix) through the left branch only when V[Ca] is within the two thresholds k1 and k2. This current is mirrored (Ixm) by M3 and M5. On the right branch, Ixm is subtracted from a suitable constant current Ip to generate the voltage output (VUP). The voltage Vlim is used to limit Ix to a subthreshold value, thus restricting the power consumption. It should be noticed that Ix is almost zero (as is Ixm) when V[Ca] is outside the two thresholds, reducing the power consumption to a minimum. A digital switch, controlled by the binary signal C, can turn the comparator off, which forces VUP to go high irrespective of V[Ca]. A similar dual-threshold comparator compares V[Ca] between k1 and k3 to generate the VDN signal, which, unlike VUP, goes high when the input is in between the thresholds and low when the comparator is turned off.
In Fig. 4.17(a) we plot data from the IFSL-v1 chip showing the voltage
VDN and V[Ca] in solid and dashed lines, respectively. Here k2 is set at 1V and
k1 at 0.5V, as before (plotted in dotted lines). Voltage VDN goes high only
within the right range of V[Ca] , but also changes its polarity depending on
the status of C. The binary signal C turns the dual threshold comparator off
4.4. The post-synaptic module
73
Vmem
3
2
0
1.5
1
1
0.5
0
0
2
vpre(V)
VDN,V[Ca](V)
2.5
0.2
0.4
0.6
Time(s)
0.8
1
0
0.5
Time(ms)
Figure 4.17: Data from the IFSL-v1 chip. a) The voltage V[Ca] (dashed line)
represents the average post-synaptic firing frequency. VDN (in solid line) goes
high only when Eq. 4.5 is satisfied. The coupling between V[Ca] and high voltage swings on VDN is evident from the plot. b) Large voltage swings on VDN
and VU P severely corrupted the Vmem signal when the neuron is stimulated by
Poisson distributed spike trains (Vpre ).
when Vmem is less than Vmth, forcing VDN to zero. As expected, the behavior of the voltage VDN is identical to that of the variable DN in Eq. 4.5. However, it can be noticed that the high voltage swing of VDN couples into the slowly varying V[Ca]. The effect is even worse when VDN and VUP both couple into the slow analog Vmem signal. As shown in Fig. 4.15, these three wires run in parallel and share a large wire capacitance, enough to severely corrupt the behavior of Vmem. Figure 4.17(b) shows such a noisy post-synaptic membrane voltage (Vmem) when the neuron is stimulated with Poisson-distributed spike inputs (Vpre) on its non-plastic synapses.
Current mode comparator
From the analysis in the last two sections, it is evident that voltage-mode comparison is not suitable for implementing Eq. 4.5 in a silicon neuron. The digital voltage outputs are prone to couple with each other and with the slowly varying membrane voltage. On the other hand, there are efficient circuits for representing the magnitude of νpost as a current variable (see Sec. 4.4.2). Hence, the output current from the pulse integrator (I[Ca]) can be used for a suitable current-mode comparison. Such comparators are used to produce binary currents corresponding to the UP and DN learn-control signals. A simple current comparator can be derived from a two-input current-mode winner-take-all (WTA) circuit (Lazzaro et al., 1989). Figure 4.18(a) shows such a WTA circuit having two inputs, two outputs and a bias current IB. If one of the
Figure 4.18: A two-input current-mode winner-take-all (WTA) circuit can be used as a current comparator. When one of the two outputs (say Io1 in the left figure) is connected to VDD, the WTA circuit reduces to a current comparator with the specific connection, i.e. Ii1 as the negative input, Ii2 as the positive input, and Io2 as the output.
two output branches (say Io1) is connected to VDD and the other (say Io2) to an external load, the WTA circuit reduces to a current comparator (CC) with a specific connection (Fig. 4.18(b)). Here, Ii1 becomes the negative input, Ii2 the positive input, and Io2 behaves as the comparator output, renamed Iout. The output takes a high value if the positive input is larger than the negative one, and vice versa:
$$I_{out} = \begin{cases} I_B, & \text{if } I_{i2} > I_{i1} \\ 0, & \text{otherwise} \end{cases} \qquad (4.6)$$
It should be noticed that, if the bias current itself is brought down to zero, the comparator output is always zero, independent of the inputs.
Now, let us reduce the Eq. 4.5 to suitable current variables. We consider
IU P and IDN as binary output currents, and Ik1 ,Ik2 ,Ik3 as constant currents
representing the thresholds k1−3 . As mentioned before, the output current
of the DPI circuit (I[Ca] ) represents the average post-synaptic firing. If we
consider a current Ijump corresponding to the high state of IU P or IDN , Eq. 4.5
can be re-written as:
\[
I_{UP} =
\begin{cases}
I_{jump}, & \text{if } I_{k1} < I_{[Ca]} < I_{k3} \text{ and } V_{mem} > V_{mth} \\
0, & \text{otherwise}
\end{cases}
\tag{4.7}
\]
\[
I_{DN} =
\begin{cases}
I_{jump}, & \text{if } I_{k1} < I_{[Ca]} < I_{k2} \text{ and } V_{mem} < V_{mth} \\
0, & \text{otherwise}
\end{cases}
\tag{4.8}
\]
Figure 4.19: The complete post-synaptic module with the I&F soma, the DPI and the current comparators. Connections to the dendritic tree (as shown in Fig. 4.11(b)) are made via the VDN, VUP and Vmem signals. The output of the soma, VAP, is also connected to an AER interface circuit that generates a digital output pulse (Spost) to be transmitted via the AER bus.
The above equations can be implemented in silicon using three current comparators, each comparing an instance of I[Ca] with one of the constant currents Ik1-Ik3. The current comparators CC1, CC2 and CC3, when connected in the configuration shown in Fig. 4.19, produce the desired functionality. Here, Ijump is the constant bias current for the comparator CC1, but CC2 and CC3 receive the output of CC1 as their bias. Hence, CC2 and CC3 are active only if I[Ca] crosses the first threshold, Ik1; otherwise they receive zero bias current, forcing their respective outputs to zero. However, only one of CC2 or CC3 can be active at a given time, as the signal C diverts Ijump to one or the other, depending on the polarity of Vmem − Vmth.
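The combined behavior of the three cascaded comparators can be captured in a few lines of behavioral code. The Python sketch below assumes ideal binary currents, and the assignment of the upper-bound checks to the two downstream comparators is an illustrative simplification rather than a description of the transistor-level circuit:

```python
def learn_control(i_ca, v_mem, v_mth, i_k1, i_k2, i_k3, i_jump):
    """Behavioral sketch of the learn-control generation (Eqs. 4.7 and 4.8).

    The first comparator gates the other two: below Ik1 its output (and hence
    their bias) is zero, so both UP and DN stay inactive.
    """
    bias = i_jump if i_ca > i_k1 else 0.0   # CC1: lower calcium threshold
    if v_mem > v_mth:                       # C steers the bias to the UP branch
        i_up = bias if i_ca < i_k3 else 0.0
        i_dn = 0.0
    else:                                   # ...or to the DN branch
        i_up = 0.0
        i_dn = bias if i_ca < i_k2 else 0.0
    return i_up, i_dn

# Example: with I_Ca between Ik1 and Ik2 and Vmem below threshold, only DN fires.
print(learn_control(i_ca=30e-9, v_mem=0.3, v_mth=0.8,
                    i_k1=10e-9, i_k2=50e-9, i_k3=100e-9, i_jump=20e-9))
```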
These binary currents generated in the post-synaptic module should be broadcast to all synapses belonging to the same neuron. Unlike voltage signals, currents cannot be broadcast directly: a separate wire would have to carry each instance of the original current, resulting in hundreds of parallel wires running across the chip. Here, broadcasting currents essentially means mirroring them to the synapses. Figure 4.19 shows the part of the circuit (MU, MD) that mirrors IUP and IDN. The corresponding gate voltages produced (VUP and VDN) are sent over wires shared by all synapses (also shown in Fig. 4.4). Though the learn control signals are binary in nature, switching between zero and Ijump, the subthreshold current jump produces voltage swings of only a few hundred millivolts. These small voltage jumps do not produce any noticeable coupling between Vmem and VUP or VDN.
Figure 4.20: a) General configuration of an active current mirror. b) The circuit used to mirror IUP and IDN to all pre-synaptic modules in the synapse array.
However, normal current mirrors are not efficient enough to handle a fanout of a few hundred currents: the capacitive load on the wire carrying VDN (or VUP) prevents it from changing fast enough. Active current mirrors were therefore used to mirror the currents to the hundreds of synapses connected to the same node. Figure 4.20(a) shows a generalised configuration of an active mirror and Fig. 4.20(b) shows the specific circuit used here.
4.5 Configuration of synaptic density
The number of inputs (synapses) connected to a neuron has important consequences for the classification behavior. As described in Sec. 2.4, the number of patterns learned by a neuron increases as the square root of its synaptic density. In this model, a neuron, being a binary classifier, categorizes the learned patterns into two different classes. Though more synapses per neuron are better for individual classification performance, more neurons can be used to classify a larger number of patterns. On the other hand, it is often beneficial to configure each neuron as a weak classifier, with a low synaptic density, and use a pool of them to boost the overall performance (Polikar, 2006). Given a fixed area, it is a difficult decision to choose between the optimum number of neurons and the number of synapses connected to each. To meet these varied requirements, the hardware device provides the flexibility to reconfigure the synaptic connectivity and thereby modify the density of synapses per neuron. An on-chip multiplexer can connect arrays of synapses to one neuron or another, depending on the requirement.
In Fig. 4.21 four neurons, the corresponding synapse arrays and the multiplexer are shown. In the default condition, switches p1-p4 are on (1111) and c1-c4 are off (0000), so each neuron is connected to the synapse array placed right beside it. However, if the p switches take a 0101 configuration and the c switches the complementary 1010, the s1 and s2 arrays get connected to n2 and s3 and s4 to n4.
Figure 4.21: Neurons are connected to arrays of synapses via a multiplexer. The pass-gate switches can be set to give the network a flexible configuration that varies the synaptic density of a particular neuron.
Though the synaptic density of n2 and n4 is thereby doubled, neurons n1 and n3, without any synaptic input, become inactive. Similarly, a 0001 and 1110 configuration for the p and c switches, respectively, connects all the available synapses to n4, forcing all other neurons to be inactive. In the IFSL-v2 chip, a single neuron can be connected to 128, 256, 512 or 1024 synapses at one time, thus reducing the number of active neurons from 16 down to 2. For most experiments in the next chapters, we configured the chip with 16 neurons, each connected to 128 synapses.
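The effect of the p and c switch patterns on connectivity can be summarized with a small sketch. The Python function below only illustrates the routing logic described above; the switch-pattern encoding and the helper name are assumptions, not part of the chip's configuration interface:

```python
def map_arrays_to_neurons(p, c):
    """Map synapse arrays to neurons for a given pass-gate configuration.

    p[i] = 1 connects the chain of arrays collected so far to neuron i;
    c[i] = 1 chains array i onto the next one, passing its synapses downstream.
    Returns a dict mapping a neuron index to the list of array indices it gets.
    """
    mapping, pending = {}, []
    for i in range(len(p)):
        pending.append(i)
        if p[i] == 1 and c[i] == 0:
            mapping[i] = pending      # the collected arrays terminate on neuron i
            pending = []
    return mapping

print(map_arrays_to_neurons([1, 1, 1, 1], [0, 0, 0, 0]))  # default: one array per neuron
print(map_arrays_to_neurons([0, 1, 0, 1], [1, 0, 1, 0]))  # arrays pair up on n2 and n4
print(map_arrays_to_neurons([0, 0, 0, 1], [1, 1, 1, 0]))  # all four arrays feed n4
```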
4.6 Conclusions
In this chapter I described different variants of the circuits used in designing the silicon neurons and synapses. These are part of a wider collection of circuits that the neuromorphic community has developed over the last few years, and also during the course of this project. I showed how the circuits were improved while considering various design constraints in building multi-neuron VLSI devices.
I demonstrated the basic requirements for implementing the reduced learning rule described in Sec. 2.5, and showed a detailed analysis justifying the choice of individual circuit elements. The most abundant circuit on the IFSL family of chips is the VLSI plastic synapse, whose weight can be modified by its corresponding pre-synaptic and post-synaptic modules. Analogous circuits and networks in silicon, implementing variants of the learning rule proposed in Brader et al. (2007), are also reported by Fusi et al. (2000); Chicca et al. (2003); Giulioni et al. (2007). Here, I showed how the circuit blocks were designed to improve on other implementations in terms of power and area consumption. In a previously proposed solution, all synapses
receiving the same global learn control signals from the post-synaptic neurons are affected by electrical interference (Mitra and Indiveri, 2005). I demonstrated how to overcome this problem by using a current-mode approach and broadcasting subthreshold voltage signals to reduce the effect of coupling in the shared metal lines. This methodology can be effectively used for all global feedback signals in such silicon neural networks.
Current-mode log-domain filters have been shown to be useful for low-power neuromorphic applications (Arthur and Boahen, 2006; Bartolozzi and Indiveri, 2007b). In this chapter I described how similar circuits are utilized as first-order low-pass filters in silicon synapses and neurons. These circuits, with additional control parameters (see Appendix C), can also be used for various low-power bio-medical applications (Bartolozzi et al., 2006). Some circuit descriptions in this chapter are accompanied by relevant T-Spice simulation results to emphasize their functionality. The analysis of the data obtained from the various blocks is described in the next chapter, within the context of synaptic characterization.
Towards the end of the chapter I described a method for configuring the
number of synapses connected to a silicon neuron, using an on-chip multiplexer. This will enable future chips to have flexible synaptic density for
incorporating a varied range of networks.
Chapter 5
Characterization of the plasticity circuits
5.1 Introduction
Synaptic plasticity is considered to be the site of learning and the basis of memory formation in biological systems (Abbott and Nelson, 2000). In this work, the design of the silicon synapses showing functional plasticity is motivated by their biological counterparts and also by the theoretical learning rule described in Brader et al. (2007). Here, I describe how the synaptic circuits constituting the pre-synaptic and the post-synaptic modules (described in Chapter 4) are characterized. The circuits are stimulated with controlled inputs to gain detailed knowledge about the behavior of the synapses. It is necessary to optimize the functionality of the silicon synapses before proceeding to the experiments related to learning and memory (described in Chapter 6).
Plasticity in silicon synapses, driven by spike-based learning rules, has been reported in previous studies. Hafliger and Mahowald (1999) proposed a spike-based learning rule that modifies the weight according to the temporal correlation of the pre- and post-synaptic spikes, and showed a mechanism of weight normalization in a silicon synapse. Most of the other studies also emphasized characterizing the temporal learning window of the silicon synapses (Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006). Only in Fusi et al. (2000); Indiveri et al. (2006a) was the dependence of synaptic modification on pre- and post-synaptic frequency implemented. Here, I describe the short-term and the long-term modification of the synapses with a much more detailed characterization.
Due to transistor non-idealities, an exact match between the theoretical
specification and silicon behavior is not possible for the hundreds of elementary circuits on chip. Imperfections during the fabrication process can lead to an on-chip mismatch of up to 50% even between single-transistor I-V characteristics¹. Therefore, it is important to implement neural networks that exploit the collective behavior of numerous circuit elements, overriding their individual disparity. A large number of imprecise computational blocks working in a massively parallel fashion results in a fault-tolerant design, a hallmark of neuromorphic systems (Vittoz, 1998). Here I describe the similarity between the model of synaptic plasticity and the data from the circuits implementing it. Due to the limited number of pads available to read out internal signals from the chip, there is no way to directly access or individually characterize all the 1024 silicon synapses. We therefore chose to probe the behavior of only one synapse in detail. In this chapter I also show data demonstrating stochastic transition probabilities for all synapses connected to a single neuron. I describe how to modify the probability of synaptic transition for the entire array, by tuning the bias voltages, according to the theoretical prescription.
5.2 The post-synaptic module
The silicon synapses are functionally divided into a pre-synaptic and a post-synaptic module (see Fig. 4.1(a)). All synapses connected to a single neuron receive learn control signals from the post-synaptic module. The circuit details of the different blocks in this module are described in Sec. 4.4 (see Fig. 4.19). Here, we first characterized the post-synaptic module by constant current injection into the I&F soma block. As described in Sec. 4.4, the post-synaptic module consists of an I&F soma, a pulse integrator and the current comparator blocks. The soma integrates the input current on the membrane capacitance, fires an action potential when a threshold voltage is reached, and then resets the membrane potential to ground. This begins a new cycle of integrate-fire-reset, as long as the current injection is active. In Fig. 5.1(a) the membrane voltage of a silicon neuron during an action potential generation period is plotted (from Indiveri (2003)).
¹ Threshold voltage mismatch in transistors (ΔVt) is generally modelled as $\sigma_{\Delta V_t} = A_{Vt}/\sqrt{WL}$. The technology-dependent proportionality factor $A_{Vt}$ is measured to be around 9 mV·µm for the 0.35 µm process used here (Kinget, 2005). However, Grossberg et al. (2004) showed that Vgs mismatch in subthreshold operation has a more dominant role to play and has a proportionality factor twice that of $A_{Vt}$. A simple calculation shows, for a minimum-size device in this process, that $\sigma_{\Delta V_{gs}}$ can be as large as 60 mV. Compared to the subthreshold gate voltages of 200-300 mV necessary for nano-ampere operation, this can result in severe variations in individual circuit behavior.
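As a quick sanity check of the numbers quoted in this footnote, the mismatch formula can be evaluated directly. The Python sketch below assumes minimum device dimensions of roughly 0.4 µm × 0.35 µm, which are illustrative values rather than the actual layout dimensions:

```python
import math

def sigma_mismatch(a_v_mv_um, w_um, l_um):
    """Pelgrom-style mismatch: sigma = A_V / sqrt(W * L), with A_V in mV*um."""
    return a_v_mv_um / math.sqrt(w_um * l_um)

A_VT = 9.0                       # mV*um, threshold-voltage factor quoted above
A_VGS = 2.0 * A_VT               # subthreshold Vgs mismatch, roughly twice A_VT
w, l = 0.4, 0.35                 # assumed minimum device dimensions in um

print(sigma_mismatch(A_VT, w, l))    # ~24 mV for a single threshold voltage
print(sigma_mismatch(A_VGS, w, l))   # ~48 mV, the same order as the 60 mV above
```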
Figure 5.1: a) Membrane potential of an I&F silicon neuron, adapted from Indiveri (2003). The data points (circles) are fitted (solid line) with an analytical solution from the MOS transistor equations. b) The top panel shows the membrane voltage made to fire at an appropriate constant frequency by continuous current injection. The spike-frequency adaptation mechanisms are turned off. The states of VUP and VDN are shown in the lower panels.
The data points (circles) are fitted (solid line) with an analytical solution derived from the MOS transistor equations. The circuit uses bias parameters that determine the refractory period, the spiking threshold and the spike-frequency adaptation.
As shown in Fig. 4.12, the learn control signals (UP and DN) generated in the post-synaptic module depend on the membrane potential of the soma. In order to check the dependence of the currents IUP and IDN on the value of Vmem, we injected the soma with a constant current. The top panel in Fig. 5.1(b) shows the activity of the I&F soma without activating the spike-frequency adaptation mechanism. To verify the dependence of the control signals on Vmem alone, we also deactivated the current comparator blocks in the post-synaptic module. In order to do so, we had to set Ik1 to zero and Ik2, Ik3 to a large suprathreshold value (∼ 1µA). This ensures that I[Ca] is always within the right range given by Eqs. 4.7 and 4.8. We monitored the gate voltages VUP and VDN corresponding to the currents IUP and IDN, respectively (see Fig. 4.19). In Fig. 5.1(b) the gate voltages VUP and VDN are plotted. Depending on the magnitude of Vmem compared to Vth (shown as a dashed line), the currents switch from zero (inactive) to Ijump (active). The voltage VDN, taken from the gate of the nMOS transistor carrying IDN, goes to a higher value (dotted line in the middle panel) when the current is in its active state. On the contrary, VUP, controlled by a pMOS transistor, comes down from VDD (dotted line in the lower panel) when IUP is active.
The voltage signals change by approximately 300mV as the currents change their states. As expected, the currents reach their active states in a complementary fashion.
To emulate the calcium dynamics and verify the functionality of the other blocks, a non-plastic synapse of the neuron was stimulated with a brief spike train. This slowly increased the output firing frequency (νpost) of the neuron to an average steady-state value. The stimulation consisted of Poisson distributed spikes for three seconds. The temporal average of νpost is determined by the DPI circuit shown in Fig. 4.19. Its output reaches an asymptote proportional to the neuron's mean output firing frequency. In Fig. 5.2 we plot the gate voltage V[Ca] of the DPI output transistor that produces the current I[Ca]. Similar to the previous experiment, we first deactivated the current comparators and monitored the changes in VUP and VDN with respect to Vmem. We plot all the voltages in Fig. 5.2(a), where the states of VUP and VDN are determined by Vmem. Both control voltages are alternately active (similar to Fig. 5.1(b)) during the entire stimulation period, except when there was no post-synaptic firing. Next, the current comparators were activated by setting Ik1, Ik2 and Ik3 to approximately 10nA, 50nA and 100nA, respectively. In Fig. 5.2(b) the corresponding gate voltages, marked by horizontal lines, are shown along with the V[Ca] signal. In this case, following Eq. 4.8, IDN switches between active and inactive states only when I[Ca] is in the right frequency range; it is inactive in all other cases. In Fig. 5.2(b) we plot the corresponding VDN. Similarly, IUP (and VUP) follows the behavior in Eq. 4.7.
5.3 The pre-synaptic module
Every silicon synapse has a corresponding pre-synaptic module that is responsible for the short-term and the long-term modification of the synaptic weight. The node voltage w in Fig. 4.4 is a measure of the synaptic weight. The change in the weight results from two independent mechanisms: the weight update (see Sec. 4.3.2) and the bistability (see Sec. 4.3.3). As shown in Fig. 4.11(b), the weight update mechanism accumulates charge on, or drains charge from, the capacitor connected to node w, using the currents mirrored from the post-synaptic module. The current mirrors are functional only when the switches sU and sD are on, i.e., during a pre-synaptic spike. In the absence of pre-synaptic spikes, the current mirrors are blocked and the voltage is weakly driven by the amplifier in the bistability module. As expressed by Eq. 4.4, the weight drifts towards one of the amplifier's supply rails, depending on the initial value of w.
Figure 5.2: Verification of the post-synaptic module with spike train stimulation. The top panels show the gate voltage of the transistor producing I[Ca] (see Fig. 4.19), which represents the average post-synaptic frequency. a) The current comparators are turned off, so the changes in VUP and VDN depend solely on Vmem and are independent of V[Ca]. b) After turning the current comparators on, V[Ca] together with Vmem determines the states of VUP and VDN.
At the end of the last synaptic update, if w is greater than a threshold θ (see Fig. 4.5), the amplifier pushes it towards the positive rail VH; otherwise, the node w is pulled towards the lower supply VL.
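The interplay between the spike-triggered jumps and the slow bistability drift can be summarized by a simple discrete-time model. The Python sketch below uses arbitrary normalized values for the jump size, drift rate and threshold; these are illustrative assumptions, not the circuit's actual parameters:

```python
def update_weight(w, pre_spike, up_active, dn_active,
                  jump=0.12, drift=0.002, theta=0.35, w_low=0.0, w_high=1.0):
    """One time step of the weight dynamics at node w (illustrative sketch).

    During a pre-synaptic spike the weight jumps up or down, depending on
    which learn-control current (UP or DN) is active at that moment; between
    spikes a weak bistability drift pulls w towards w_high or w_low,
    depending on whether it sits above or below the threshold theta.
    """
    if pre_spike:
        if up_active:
            w += jump
        elif dn_active:
            w -= jump
    else:
        w += drift if w > theta else -drift
    return min(max(w, w_low), w_high)    # weight is bounded by the two rails
```

Iterating this update over a 400 ms trial with random pre-synaptic spike times gives the kind of trial-to-trial variability discussed next: in some trials the jumps carry w past θ and the drift consolidates the transition, in others they do not.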
In order to monitor the synaptic modification of one plastic synapse, we stimulated a non-plastic synapse of the same neuron with a Poisson distributed spike train. This forced the neuron to fire consistently at an average frequency of 30Hz. Next, we stimulated the pre-synaptic input of the particular plastic synapse with a Poisson distributed spike train lasting for a 400ms long trial session. This was repeated for several trials, always with a mean frequency of 60Hz. The stimulation protocol is shown as p1 in Fig. 5.3. The synaptic dynamics, represented by the node voltage w, are shown in Fig. 5.4. In the top and bottom panels, the membrane potential of the post-synaptic neuron (Vmem) and the digital pre-synaptic spike events (pre) are shown, respectively. In the middle panel we plot the weight modification, which depends on the pre- and post-synaptic spike statistics. Due to layout constraints in placing probe points, we had to monitor an internal voltage ~w that corresponds to the inverted value of w. As prescribed by the theory in Eq. 2.4, the jumps in the synaptic weight of the plastic synapse (w) occur only during a pre-synaptic spike. The polarity of the jump depends on Vmem compared to Vmth (dashed line in the top panel). Notice that not all pre-synaptic spikes result in jumps. This happens when neither of the post-synaptic currents, IUP and IDN, is active during a pre-synaptic spike.
Figure 5.3: Various stimulation protocols used to demonstrate the properties of the plastic synapse. The post-synaptic activity is primarily driven by the stimulus given to the non-plastic synapse. The pre-synaptic stimulus goes to the plastic synapse that is being tested. The lengths of the lines show the stimulus durations in the three different protocols, p1-p3.
It can be observed that the up and down jump heights are not equal in magnitude. This is because of differences between the p-type and n-type current mirrors feeding current to the node w. Voltages VH and VL are the two bistable limits of the synaptic weight (see Fig. 4.5). The bistability threshold θ is placed closer to VL to compensate for the unequal jump heights. This eliminates any unintended bias in the probability of synaptic transition.
Though the bistability mechanism is continuously active, its effect is observed only during the inter-spike intervals. This is because the bias current responsible for the drift is much smaller than the current responsible for the jumps in synaptic weight. The up and down slopes of the drift do not exactly match, as they are the result of either a pMOS or an nMOS transistor charging/discharging a capacitor with unequal subthreshold currents.
To verify the probabilistic nature of the synaptic transition, we monitored the dynamics of the node voltage w from one trial to another. Due to the stochastic nature of the spiking inputs, the jumps accumulate into a weight transition in some cases (see Fig. 5.4(b)), but not in others (see Fig. 5.4(a)). This randomness in transition behavior, even with the same mean firing rates for the pre- and post-synaptic neurons, is an essential requirement of the learning mechanism.
The stimulus to the non-plastic synapse can be considered as the teacher signal driving the post-synaptic firing frequency. In protocol p1 the teacher input is continuously available to the neuron. This is unrealistic for a learning mechanism, where the supervisor should be active only when necessary. In a new set of experiments, we therefore used a teacher signal only during the 400ms trial session. This protocol is shown as p2 in Fig. 5.3. In Fig. 5.5, the voltage traces VUP and VDN are shown during an experiment using p2. The average post-synaptic frequency was set to 80Hz by stimulating a non-plastic synapse.
Figure 5.4: Stochastic LTP transition in a silicon synapse. The stimulation protocol p1 is used to stimulate a plastic and a non-plastic synapse. The particular plastic synapse whose internal dynamics can be probed is the one stimulated. (a) The updates in the synaptic weight did not produce any LTP transition during the 400ms stimulus presentation. (b) The updates in the synaptic weight produced an LTP transition that remains consolidated.
Figure 5.5: Dependence of the synaptic updates on the learn control signals. The neuron is stimulated using protocol p2. The voltages VUP and VDN get activated when the average post-synaptic frequency reaches the right range. (a) The synaptic jumps do not accumulate into a transition. (b) A synaptic transition happens.
It can be noticed that even with a higher post-synaptic activity (compared to 30Hz in Fig. 5.4), the control voltages (VUP and VDN) get activated much later than the commencement of the stimulus. The delay is due to the time taken for the average post-synaptic frequency (represented by the current I[Ca]) to reach its steady state from zero (as in Fig. 5.2(b)). In this protocol, synaptic updates can take place only during the later part of the 400ms pre-synaptic stimulus. In a different protocol, we set the average νpost to a non-zero value before the arrival of the stimulus to the plastic synapse. In order to do so, the input to the non-plastic synapse is started 100ms before the 400ms stimulus duration. Shown as p3 in Fig. 5.3, this protocol is used for most of the remaining experiments.
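The three protocols differ only in when the teacher (non-plastic) stimulus is switched on relative to the 400 ms plastic-synapse stimulus. The Python sketch below generates the corresponding spike trains; the specific mean rates and the helper names are illustrative assumptions (the actual experiments used the SPIKE-TOOLBOX software mentioned later in this chapter):

```python
import numpy as np

def poisson_spike_train(rate_hz, t_start, t_stop, rng):
    """Poisson spike times (seconds) between t_start and t_stop."""
    n = rng.poisson(rate_hz * (t_stop - t_start))
    return np.sort(rng.uniform(t_start, t_stop, n))

def make_protocol(name, rng, teacher_hz=80.0, pre_hz=60.0):
    """Return (teacher_spikes, pre_spikes) for protocols p1, p2 or p3 (sketch).

    p1: teacher on continuously; p2: teacher only during the 400 ms trial;
    p3: teacher starts 100 ms before the 400 ms plastic-synapse stimulus.
    """
    pre = poisson_spike_train(pre_hz, 0.1, 0.5, rng)       # 400 ms plastic stimulus
    if name == "p1":
        teacher = poisson_spike_train(teacher_hz, 0.0, 1.0, rng)
    elif name == "p2":
        teacher = poisson_spike_train(teacher_hz, 0.1, 0.5, rng)
    else:  # "p3"
        teacher = poisson_spike_train(teacher_hz, 0.0, 0.5, rng)
    return teacher, pre

teacher, pre = make_protocol("p3", np.random.default_rng(0))
```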
5.4 Transition probabilities
As described in Sec. 2.4, stochastic transitions and stop-learning are two distinctive features of the learning rule. Stop-learning introduces a non-monotonic dependence of the synaptic transition probability on the mean νpost. According to the theory, both LTP and LTD should show a low probability for low νpost and peak at different intermediate values (see Fig. 2.3(a)). The probabilities should again go down to zero for higher post-synaptic frequencies. LTD reaches its peak before LTP because the learning mechanism is Hebbian.
The frequency dependence of the transition probability is a collective behavior controlled by all the pre- and post-synaptic blocks described before. Furthermore, all the synapses should behave similarly and have very small transition probabilities, less than 0.1 (Brader et al., 2007). In order to verify this behavior for any particular νpost, hundreds of trials are required to independently stimulate and test each synapse. The bias values should then be modified, and the experiment repeated. Due to the many iterations required to optimize the set of bias parameters, determining the correct transition probability for all 1920 plastic synapses can be an extremely time-consuming procedure. Alternatively, we optimize the behavior of just one synapse by stimulating it for a few hundred trials. The protocol p3 in Fig. 5.3 was used during the trials. We also chose bias settings that give a high transition probability. This let us check the shapes of the LTP/LTD curves in a reasonable amount of time. The peak probability can be controlled by changing the bias determining the jump heights (Ijump). In Fig. 5.6 we show the transition probability distribution of the single synapse as a function of the mean νpost. This matches well with the theoretical curves shown in Fig. 2.3(a), except for the high magnitudes of p(LTP) and p(LTD).
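The probabilities plotted in Fig. 5.6 are simply the fraction of trials, at a given νpost, in which the synapse ended up in the state opposite to the one it started from. A minimal Python sketch of that estimate, with hypothetical trial outcomes, is shown below:

```python
import numpy as np

def transition_probability(final_states, initial_state):
    """Estimate a transition probability from repeated trials (sketch).

    final_states: binary synaptic states (0 = depressed, 1 = potentiated)
    measured after each trial; initial_state: the state the synapse was
    reset to before every trial.
    """
    final_states = np.asarray(final_states)
    if initial_state == 0:
        return float(np.mean(final_states == 1))   # p(LTP): low -> high
    return float(np.mean(final_states == 0))       # p(LTD): high -> low

# Hypothetical outcomes of 20 trials starting from the depressed state:
outcomes = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(transition_probability(outcomes, initial_state=0))   # 0.3
```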
Until now, we have monitored the detailed behavior of just one synapse. Next, we test all the synapses belonging to a single neuron.
Figure 5.6: The transition probabilities measured for a single synapse as a function of the post-synaptic frequency. The peak probability can be reduced by decreasing the current Ijump, and the shapes of the curves can be modified by changing the biases that set Ik1-Ik3 (see Fig. 4.11(b)).
The important features to be verified are the stochastic nature of the transitions and the dependence of the transition probability on νpost. We confirmed this in a qualitative manner with a set of plots demonstrating the transitions of the synapses over many trials. A trial consisted of a pre- and post-synaptic plasticity phase, similar to the protocol p3 in Fig. 5.3, followed by a state-determination phase. We started by stimulating all 60 synapses connected to one neuron. During the plasticity phase, the synaptic efficacies of all synapses were turned down. This means that any particular synapse is free to make updates and transitions, as usual, but the post-synaptic current it produces is always zero. The states of the synapses (high or low) therefore have no effect on their respective IEPSC, and hence do not influence the post-synaptic frequency either. This is done to decouple the probabilistic transition of one synapse from another. Multiple plastic synapses can be stimulated simultaneously by interleaving their spikes using the spike generation software SPIKE-TOOLBOX (Muir, 2005). On the contrary, in the state-determination phase, the synapses are prevented from updating (by setting Ijump = 0) and their efficacies are turned to maximum. Each plastic synapse is then independently stimulated to evaluate its effect on the post-synaptic frequency. A high νpost guarantees that the corresponding synapse is in the high state, and vice versa².
² This method of determining the synaptic state is more time-consuming than the one implemented in Giulioni et al. (2007). They employ a RAM-style read-out that can access the state of each plastic synapse simultaneously. However, the area overhead for each synaptic element limits the on-chip synaptic density.
The entire protocol is shown as an inset in Fig. 5.7. The states of all synapses are determined after each trial and marked as black (depressed state) or white (potentiated state) dots before embarking on a new trial. We performed fifty trials with a low νpost (10Hz) and progressively increased νpost to higher values (up to 490Hz) by increasing the input frequency to the non-plastic synapse. In Fig. 5.7, each panel shows the result from a set of 50 trials.
The series of plots verifies the tendency of the synapses to remain depressed (LTD) at low post-synaptic frequencies. This is evident from the abundance of black dots in the first two panels. At mid-range frequencies, the densities of black and white dots show that the probabilities of LTD and LTP are comparable. At high frequencies, however, the synapses prefer to remain potentiated (more white dots). Apart from the change in transition probability as a function of νpost, the stochastic nature of the transitions is also evident from Fig. 5.7. Most synapses change their states in a random fashion from trial to trial, with probabilities that depend on νpost. As seen from the figure, some synapses do prefer one state over the other for all frequency values. These are instances of faulty circuit blocks (synapses in this case), which are common in large arrays on mismatch-prone silicon chips.
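The structure of one such trial, with a plasticity phase followed by a state-determination read-out, is summarized in the sketch below. The `chip` object and its method names stand for a hypothetical driver wrapping the AER stimulation setup; they are illustrative assumptions, not the actual experimental software:

```python
def run_trial(chip, synapses, nu_post_target, read_threshold_hz=20.0):
    """One plasticity trial followed by a state-determination phase (sketch)."""
    # Plasticity phase: efficacies off, so the transition of one synapse
    # cannot influence the others through the post-synaptic frequency.
    chip.set_efficacy(synapses, on=False)
    chip.stimulate_protocol("p3", nu_post=nu_post_target)

    # State-determination phase: freeze the weights (Ijump = 0), efficacies
    # at maximum, and probe each synapse on its own.
    chip.set_jump_current(0.0)
    chip.set_efficacy(synapses, on=True)
    states = []
    for syn in synapses:
        nu = chip.measure_output_rate(stimulate=syn)
        states.append(1 if nu > read_threshold_hz else 0)   # 1 = potentiated
    return states
```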
In order to independently verify the LTP and LTD transition probabilities, we modified the experimental protocol. Here we initialize the synapses to one particular state before every trial. We start by setting all plastic synapses to the depressed state. This is represented by the down arrows (↓) in the protocol shown on top of Fig. 5.8. The plots below show the transitions to the potentiated state as white dots, over sets of twenty trials. Similar to the previous experiment, νpost is increased after every set (it is shown above each panel) by changing the input frequency to the non-plastic synapse. The density of white dots increases and then decreases again with increasing post-synaptic frequency, meeting the theoretical requirement. As pointed out before, the transitions happen randomly even though the pre- and post-synaptic frequencies are the same for an entire set.
The same experiment was performed with the synapses initialized to the potentiated state. Results from twenty trials are plotted in Fig. 5.9, up to the dotted line. Here black dots represent a transition to the depressed state. The density of black dots increases and then decreases again as νpost is increased. This once again verifies the effect of the stop-learning phenomenon. Notice that the peak LTD transition occurs at a much lower post-synaptic frequency than that of the LTP transition (see Fig. 5.8). After twenty trials, the experimental setup was altered, as described in the next section.
Figure 5.7: Stochastic transitions of all 60 synapses in a synapse array. Each panel consists of data from fifty trials following the protocol shown in the inset. The post-synaptic frequency is increased from 10Hz to 490Hz in equal intervals. Each black dot represents a low synaptic state and each white dot a high one. The dominance of LTD at low νpost and that of LTP at higher values is evident from the series of graphs.
Figure 5.8: Stochastic LTP transitions in all 60 synapses of an array. Trials are performed according to the protocol shown on top. The synapses are reset to the low bistable state at the beginning of each trial. Each panel consists of data from 20 trials, where a white dot represents a transition from the low to the high state (LTP). The rise and fall of the LTP probability with the post-synaptic frequency (from 20Hz to 900Hz, shown on top of each sub-figure) is maintained for all synapses.
Figure 5.9: Stochastic LTD transitions in all 60 synapses of an array. A protocol similar to the one in Fig. 5.8 was used, but resetting the synapses to the high bistable state at the beginning of each trial. In each panel, data from the first 20 trials (before the dotted line) show the increase and decrease in LTD transition probability as νpost increases (from 5Hz to 320Hz, shown on top of each sub-figure). The last 10 trials were performed with an increased bias current Ik3, which delays the fall in LTD probability at high frequencies.
Delaying stop-learning
In Sec. 4.4.3 we described how the stop-learning frequencies are implemented by bias currents set in the post-synaptic module. Currents Ik1 and Ik2 set the lower and upper boundaries for LTP transitions, while Ik1 and Ik3 set the boundaries for LTD transitions. In order to monitor their effect on the transition probability, in the later part of the experiment described in the last section (Fig. 5.9), we set Ik3 to a higher value. The data obtained in the last ten trials are shown after the dotted line. A higher Ik3 delays the effect of stop-learning until higher νpost values are reached. Pushing the upper boundary to a higher frequency makes no difference in the transition probabilities at low νpost. This is evident from the density of black dots remaining the same on either side of the dotted line.
5.5 STDP phase relation
The spike-time dependent plasticity (STDP) observed in physiological experiments (Markram et al., 1997) has been widely implemented in neural network models (e.g., Kempter et al., 1999) and in neuromorphic systems (e.g., Arthur and Boahen, 2006). The silicon synapse in the IFSL chips can show similar spike-time dependent plasticity, but in a restricted form. In order to invoke such properties, we used a protocol common in physiological experiments, where the pre- and post-synaptic spikes are paired with a phase difference of the order of a few milliseconds. We stimulated a plastic and a non-plastic synapse of a neuron and measured the jump heights of w for different phase relations. The frequencies (νpre = νpost) were kept constant at 80Hz. As usual, post-synaptic firing was generated by stimulating the non-plastic excitatory synapse with a Poisson pulse train. The data are shown in Fig. 5.10. Consistent with the theory, the jumps had a positive value (UP jumps) if the pre-synaptic spike preceded the post-synaptic one (tpost − tpre positive), and vice-versa. Though the circuits responsible for the synaptic update are designed for equal upward and downward jumps (see Sec. 4.3.2), different jump heights were observed during the experiment. The large disparity between UP and DN jumps (also observed in Fig. 5.4) is due to the difference between the pMOS and nMOS current mirrors charging/discharging node w. The minor changes in jump heights between the various UP jumps (or DN jumps) are due to the frequency dependence of the IUP and IDN currents and to the bistability mechanism.
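A pairing protocol of this kind can be generated as follows. The sketch uses regularly spaced spike pairs with a fixed offset for clarity, whereas the measurements above drove the post-synaptic neuron with Poisson trains through the non-plastic synapse; the rates and function names are illustrative:

```python
import numpy as np

def pairing_protocol(delta_t_s, rate_hz=80.0, duration_s=1.0):
    """Paired pre/post spike times with a fixed phase offset (sketch).

    delta_t_s = t_post - t_pre: positive values place each post-synaptic
    spike after its pre-synaptic partner, negative values before it.
    """
    period = 1.0 / rate_hz
    pre = np.arange(0.0, duration_s, period)
    post = pre + delta_t_s
    return pre, post

pre_up, post_up = pairing_protocol(+0.002)    # pre leads post: expect UP jumps
pre_dn, post_dn = pairing_protocol(-0.002)    # post leads pre: expect DN jumps
```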
Figure 5.10: Pre- and post-synaptic neurons are made to fire in a manner similar to that of Fig. 5.4. The jump heights Δw are plotted against the phase relation between the pre- and post-synaptic spiking events. The polarity of the jumps shows typical STDP-like behavior.
5.6 Multiplexer functionality
In all experiments in this chapter, the number of synapses connected to a neuron was fixed. However, Sec. 4.5 describes the need for flexibility in synaptic density when using the chip for a wide variety of classification experiments. As shown in Fig. 4.21, a multiplexer can be used to reconfigure the synaptic density of the on-chip neurons. The multiplexer is designed to re-wire the synapse arrays (SA), each having 128 synapses, to the desired neurons. In order to verify the multiplexer functionality, we first keep it in the default state and stimulate one non-plastic synapse of each array. This results in post-synaptic firing from all corresponding neurons, i.e., stimulating a synapse in SA-3 makes neuron-3 fire. Next, we configure the SAs such that each neuron has 256 synapses, connecting two arrays to alternate neurons. In this case, stimulating either SA-3 or SA-4 makes neuron-4 fire. At the same time, the configuration makes neuron-3 unusable. This is shown in Fig. 5.11(a), where the input stimuli to the SAs are shown in gray and the outputs from the neurons in black. The heights of the black bars vary from one neuron to another due to the mismatch in the non-plastic synapses. In Fig. 5.11(b), we re-configure the multiplexer such that 512 synapses are connected to each neuron. Here, SA-1 to SA-4 stimulate neuron-4, while neurons 1, 2 and 3 remain unused. In this way the synaptic density of some neurons can be increased by up to four times, but with fewer usable neurons.
Figure 5.11: Data from eight synapse arrays (SA) connected to eight neurons via a multiplexer. The gray bars show the frequencies of the input stimuli to the non-plastic synapses of the SAs. The frequencies of the neurons in response to the stimuli are shown in black. In the default multiplexer configuration, each SA is connected to its adjacent neuron (see Fig. 4.21). (a) In a different multiplexer configuration, two adjacent SAs are connected to alternate neurons (neurons 2, 4, 6 and 8), increasing their synaptic density. The other neurons remain unused as they have no synapses connected. (b) Four SAs are connected to single neurons (neurons 4 and 8) for an even larger synaptic density.
5.7 Conclusions
This chapter shows different methods of characterizing the functionality of
the silicon synapses. I described various experiments that were carried out
to analyze individual functional blocks and also the overall performance of
the plasticity circuit.
Spike-based synaptic plasticity on silicon has a fairly recent history, and methods for long-term storage of the synaptic weights are far from standardized. Over the years, researchers have tackled this difficult technological challenge in different ways, for example by floating-gate storage (P. Hasler et al., 1999; Häfliger and Rasche, 1999), by implementing digital look-up tables (Vogelstein et al., 2003), or by storing only the binary state of the synapse (Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006). In this project we used a synapse with limited analog resolution, designed to take bistable values on long time scales (Brader et al., 2007). This chapter demonstrates the dynamics and the bistable nature of a single synapse. I first showed how the post-synaptic module complies with the theoretical requirement of generating the necessary learn control signals.
In order to do so, an appropriate protocol was designed to stimulate the non-plastic and the plastic synapses. Next, I demonstrated the stochastic transitions in the synaptic states when the plastic synapses are subjected to the right stimulation protocol. Due to the limited number of external probe points, the experiments highlighting the detailed behavior of the synaptic plasticity were performed on one single synapse. I also described the method of determining the LTP and LTD probability curves, which depend on the post-synaptic frequency. Similar transition probabilities of a single synapse have been extensively characterized in Fusi et al. (2000). In a later part of the chapter, I explained how the synaptic transitions of all synapses of a neuron can be monitored in a qualitative manner. It is essential for all the silicon synapses to show the stochastic nature of transition, similar to the one tested in detail. Though some preliminary data on stochastic transitions were reported before in Chicca et al. (2003), here they were demonstrated with much more rigor and control.
The transition probability is an outcome of the collective performance of all pre- and post-synaptic blocks. It requires optimized values for many different voltage biases. I showed how the bias parameters can be tuned to modify the LTP and LTD probabilities. I also showed that the weight update of a synapse depends on the phase difference between the pre- and post-synaptic spike times, similar to the STDP behavior. Finally, I described an experiment to verify the behavior of the multiplexer, which is necessary for reconfiguring the on-chip synaptic density.
Chapter 6
Spike based learning and classification
6.1 Introduction
Learning and classification on dedicated VLSI neural networks (NN) have been an active field of research for the last two decades. This essentially began with the resurgence of NN research after the rediscovery of the backpropagation learning rule in the late 80s (Rumelhart et al., 1986). Multilayer perceptrons, using the backpropagation learning rule, became very effective at solving a wide variety of learning tasks. Hardware NN systems were designed for various applications such as autonomous robotics, stand-alone sensors and speech processors, for their speed and power advantages compared to general-purpose computers (Lindsay, 2002). Analog VLSI implementations of such learning networks, which exploit the physical properties of silicon, have been developed to achieve high energy and integration efficiency (see Cauwenberghs and Bayoumi, 1999; Valle, 2002, for reviews). However, most of these systems had little or no connection to biological learning mechanisms, such as spike-based synaptic plasticity. More recently, inspired by the physiological phenomena, neuromorphic systems with bio-plausible learning rules have been proposed (Häfliger et al., 1997; Fusi et al., 2000; Vogelstein et al., 2003; Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006; Indiveri et al., 2006a; Häfliger, 2007). Here, I demonstrate learning and classification on a silicon system based on the biologically inspired spike-driven synaptic plasticity proposed in Brader et al. (2007). It shows superior performance compared to any other spike-based learning system reported in the literature.
In Chapter 5, a detailed characterization of the spike-driven synaptic plasticity circuits was presented. Here, I describe the experiments done to train the VLSI system and to test its memory capacity. I show results that demonstrate learning of binary spatial patterns of mean firing rates, along with a quantification of the classification behavior. The network was tested with a variety of inputs, from random uncorrelated patterns to highly correlated patterns, and also with patterns of graded input. This thesis describes the most extensive classification performed on a spike-driven VLSI neural network, and shows a very satisfactory performance for a prototype device.
6.2 Network architecture
The network architecture we consider consists of a single feedforward layer composed of N input neurons that are fully connected by plastic synapses to a single output. Many biologically plausible models implementing supervised learning mechanisms are tested on such simplified neural architectures, typically single-layer feedforward networks with binary outputs (see e.g., Brader et al., 2007; Gütig and Sompolinsky, 2006). The aim of learning is to modify the synaptic connections between the neurons so that the output responds as desired, both in the presence and in the absence of the instructor. As described in Sec. 2.6, we used a network consisting of input and output units connected by plastic synapses, where each integrate-and-fire neuron is considered to be an output unit (see Fig. 2.4). Multiple output neurons receiving the same input could also be grouped together to form a population, for better performance. Initially, each output neuron was configured to have sixty input units, corresponding to sixty plastic synapses. In the following experiments we trained the network to classify binary spatial patterns of sixty dimensions (i.e. sent to sixty plastic synapses), using an additional teacher signal sent to the neuron's non-plastic excitatory synapse.
6.3 Training methodology
In order to classify patterns, the network is first trained with various inputs and then tested for its memory capacity. The training patterns consist of binary vectors of Poisson distributed spike trains, with either a high mean firing rate (30Hz) or a low one (2Hz). In Fig. 6.1 we show two such input vectors, created in a random fashion with 50% high rates (white circles) and 50% low rates (black circles). These spatial patterns are randomly assigned to either a C+ or a C− class. During training, when a C+ pattern is presented to the plastic synapses, a T+ teacher signal is used.
Figure 6.1: The method of training with binary patterns of mean input frequency. Two examples of training patterns are shown on the left and right sides of a neuron symbol. Poisson spike trains of high (30Hz) or low (2Hz) frequency are randomly assigned to each synapse, and are represented as white or black circles respectively. These spatial patterns (black/white circles) are arbitrarily assigned to either a C+ or a C− class. During training, the C+ patterns are presented together with a T+ (teacher) signal, while C− patterns are presented in parallel with a T− spike train. Training of the C+ and C− class patterns is interleaved in a random order. The same spatial patterns are trained for multiple iterations, but with new realizations of the Poisson spike trains for each session.
The T+ teacher is a Poisson distributed spike train with a mean frequency of 250Hz, presented to one of the neuron's non-plastic excitatory synapses. Similarly, for C− patterns a T− signal, with a mean rate of 20Hz, is used. The training sessions follow the protocol p3 shown in Fig. 5.3, where the input to the plastic synapses lasts 400ms and that to the non-plastic one lasts 500ms. For each training session we generate new Poisson spike trains, for both input and teacher, keeping the same pre-defined spatial distribution of mean frequencies. The output neuron fires according to the total synaptic input current weighted by the plastic synapses, plus the contribution from the teacher input. The transition probability of the plastic synapses depends on the mean firing frequency of the post-synaptic neuron (νpost).
After several training sessions, the neurons are tested on the learned patterns. Before testing, we disable the synaptic update mechanism by setting the current Ijump, which determines the magnitude of the jumps (see Fig. 4.19), to zero. This allows us to perform the tests without any interference from the learning mechanism and to study the results of training with fixed synapses. During testing we present the input patterns, again for 400 ms but without the teacher signal, and evaluate the network's response. A high output frequency for a C+ pattern and a low one for a C− pattern indicate correct classification behavior.
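The overall train/test procedure can be summarized in a few lines of pseudo-driver code. As before, the `chip` object and its methods are hypothetical placeholders for the actual stimulation software, and the 20 Hz read-out threshold is an assumption for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_pattern(n_inputs=60, high_hz=30.0, low_hz=2.0):
    """Random binary spatial pattern: half of the inputs at the high rate."""
    rates = np.full(n_inputs, low_hz)
    rates[rng.choice(n_inputs, n_inputs // 2, replace=False)] = high_hz
    return rates

def train(chip, patterns, labels, sessions=20, t_plus_hz=250.0, t_minus_hz=20.0):
    """Training loop (sketch): interleave C+/C- patterns with their teachers."""
    for _ in range(sessions):
        for idx in rng.permutation(len(patterns)):
            teacher = t_plus_hz if labels[idx] == +1 else t_minus_hz
            chip.present(rates=patterns[idx], teacher_hz=teacher,
                         duration_s=0.4, protocol="p3")

def test(chip, pattern, threshold_hz=20.0):
    """Testing (sketch): weights frozen, no teacher; threshold the output rate."""
    chip.set_jump_current(0.0)              # disable synaptic updates
    nu = chip.present(rates=pattern, teacher_hz=None, duration_s=0.4)
    return +1 if nu > threshold_hz else -1
```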
6.4 Evolution of synaptic weights
According to the theoretical prescription of Brader et al. (2007), during the training sessions the synaptic weight changes depending on the pre- and post-synaptic frequencies. The synapse displays Hebbian behavior: when the pre-synaptic neuron is stimulated, LTP dominates at high νpost and LTD dominates at low νpost. We analyzed the evolution of the synaptic weights as training progresses. Although direct monitoring of the states of all plastic synapses is not possible, the post-synaptic firing frequency during the testing phase gives a measure of the synaptic states. Synapses receiving a high pre-synaptic input (30Hz), if potentiated, increase the output frequency. Conversely, synapses receiving a high input, if depressed, do not contribute to νpost.
Figure 6.2 shows the evolution of the synaptic weights when trained with a C+ pattern. Prior to the experiment, all synapses were initialized to random states¹. To evaluate the response of the neuron, we test it with a new binary pattern and measure the output frequency. We then train the neuron with the same pattern as a C+ class for ten sessions, interrupting the training sessions to test it (i.e. to measure its response to the pattern in the absence of the teacher signal). We repeated the cycle of ten training sessions followed by one testing session two more times. At the end of the entire C+ class training period, we assign the same pattern to the C− class and re-train it. We again interrupt the training sessions and measure the intermediate results three times. This experiment was repeated with 50 different spatial patterns, and the outcome is plotted in Fig. 6.2. The light gray bars in all panels represent the neuron's output frequency histogram in the initial condition, with random synaptic states. The panels, from left to right, show how the output frequencies gradually increase as the training potentiates more and more synapses. Similarly, the panels in Fig. 6.3 show how the weights tend to depress again as the neuron is trained with the same patterns, but this time assigned to the C− class. The fact that the output frequency distributions do not increase beyond approximately 200Hz, and do not decrease completely to zero, is a direct consequence of the circuit's stop-learning mechanism, described in Sec. 2.4.
Comparing the two rightmost panels of Fig. 6.2 and Fig. 6.3 shows that the output frequencies obtained from training the C+ and C− classes are well separated once the training is complete.
¹ Ten random binary patterns were trained as both class C+ and class C− on the same synapse array, alternately for 5 sessions each. A single synapse receiving the same νpre but an inconsistent νpost faces conflicting requirements between LTP and LTD transitions. This drove the synapses to random stable states.
Figure 6.2: Probability of output frequency as a neuron's training progresses with the C+ class patterns. The gray bars, in all panels, represent the neuron's response to the input patterns in its initial condition (before training). The black bars represent the neuron's output frequency as training of the C+ class progresses. Synaptic weights stop potentiating when νpost is too high, limiting any further increase in the firing frequency.
Figure 6.3: Probability of output frequency as a neuron's training progresses with the C− class patterns. Synaptic weights stop depressing when νpost is too low, preventing the post-synaptic frequency from going down to zero.
These results indicate that the device can be robustly used to classify spatial patterns of mean firing rates into two distinct classes.
6.5 Classifying multiple spatial patterns
To further verify the classification performance, we carried out the following experiment with four random spatial patterns: we trained one neuron (labeled neuron-A) with two of the patterns (1a and 2a) assigned to the C+ class, and with the other two spatial patterns (1b and 2b) assigned to the C− class. We interleaved the training of the four patterns in random order.
Figure 6.4: Classification of four spatial patterns. (a) Probability distribution of νpost after the neuron has been trained to classify four patterns belonging to the C+ (top) and the C− (bottom) class. (b) Average output frequency for the individual patterns during testing. In the left plot, patterns 1a and 2a are assigned to the C+ class and patterns 1b and 2b to the C− class when training neuron-A. In the right plot the class assignments are swapped, while training neuron-B.
In addition, we trained a different neuron (labeled neuron-B) with the same four patterns, but with the class assignments swapped. After training, both neurons were tested with all the patterns. To measure the mean performance and its variance, we repeated the entire procedure thirty times, creating new sets of random spatial patterns each time. In Fig. 6.4(a) we plot the probability distribution of the output frequencies of neuron-A, in response to the four input patterns, during the testing phase. The top panel shows the response to patterns 1a and 2a, and the bottom panel shows the response to 1b and 2b. As expected, the probability p(νpost) of neuron-A firing at higher frequencies for the patterns trained as the C+ class is much larger than for the C− patterns. In Fig. 6.4(b), we plot the average output frequency separately for all four patterns, obtained from neuron-A (left panel) and from neuron-B (right panel). Here, neuron-B responds with a low firing rate to the very same patterns that were assigned to the C+ class for neuron-A, and with a high rate to the others. From the neuronal responses it is evident that a single threshold frequency would be enough to categorize the patterns into two distinct classes.
We repeated similar experiments with six and eight random input patterns, always training half of them as C+ and the other half as C−. As before, the two neurons were trained with the same input patterns, but with opposite class assignments.
Figure 6.5: Average output frequency during classification of six (a) and eight (b) random patterns. In each case, half of the patterns (marked with 'a' at the end) are trained as the C+ class on neuron-A and the other half as the C− class. For neuron-B, the class assignments were swapped.
During testing, the mean output frequencies for all six patterns are plotted in Fig. 6.5(a), and those for the eight patterns in Fig. 6.5(b). From the frequency histograms we can see that the separation between the two classes becomes less obvious as the number of patterns increases. A more quantitative analysis of the classification performance is discussed in Sec. 6.6.
To provide the reader with more insight into the significance of these experiments, we used 2D binary patterns representing digits (see Fig. 6.6) as input spatial patterns. The 2D input space is translated to a 1D binary vector by a simple raster scan, and Poisson input spike trains are created with a high mean firing rate for white pixels and a low rate for black pixels, as in the previous experiments. All symbols in the leftmost panel of Fig. 6.6 represent the digit 1 in different languages, while the symbols in the middle panel represent the digit 2. A neuron is first trained with just two patterns (1a as class C+ and 2a as class C−), and its response during testing is shown in the top row of the right panel in Fig. 6.6. The neuron is then trained with four patterns (the patterns in the top two rows of the figure's left panel), and testing shows correct classification behavior (see the middle row of the figure's right panel). Finally, the neuron is trained with all six patterns. The test results in the bottom row of the figure's right panel show that the neuron can successfully learn to distinguish the character 1 from 2 in three different languages.
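The raster-scan conversion from a 2D binary image to a 1D vector of mean rates is straightforward; the Python sketch below uses a toy 3×4 image and the rate values from the text, with the function name being an illustrative choice:

```python
import numpy as np

def image_to_rates(image, high_hz=30.0, low_hz=2.0):
    """Raster-scan a 2D binary image into a 1D vector of mean firing rates.

    White pixels (1) map to the high rate, black pixels (0) to the low rate.
    """
    flat = np.asarray(image).ravel()            # row-by-row raster scan
    return np.where(flat == 1, high_hz, low_hz)

# Toy 3x4 'image'; a real pattern has as many pixels as plastic synapses.
digit = [[0, 1, 1, 0],
         [0, 0, 1, 0],
         [0, 1, 1, 1]]
print(image_to_rates(digit))    # 12 rates, one per plastic synapse
```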
We also checked the recall performance of the classifier by corrupting the input during testing.
Figure 6.6: Pattern recognition of 2D binary images. The data are converted into a 1D vector of Poisson mean firing rates (30Hz for white pixels, 2Hz for black pixels). The digits 1 and 2 in three different languages were assigned to the C+ and C− classes, respectively. The plots in the right panel show classification results when one, two, or three members of each class are trained together (top to bottom).
In order to do so, a random subset of the input vector was inverted, from high to low or vice versa, relative to the one used in training. The modified input was then used for testing the neurons. Fig. 6.7 shows the difference in average post-synaptic firing frequency (∆νpost ) between the C + class patterns and the C − class patterns, both corrupted during testing. In the left panel, two spatial patterns were used as input and in the right, four. The x-axis shows the fraction of the input vector that was corrupted. The middle panel shows an example of the original pattern (top) and its 5% (middle) and 10% (bottom) corrupted versions.
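A minimal sketch of this corruption procedure (the function and its names are illustrative assumptions, not the actual test code) is:

```python
import numpy as np

def corrupt_pattern(pattern, noise_level, rng=None):
    """Invert a random subset of a binary pattern; `noise_level` is the
    fraction of entries flipped (e.g. 0.05 for a 5% noise level)."""
    rng = rng or np.random.default_rng()
    corrupted = np.asarray(pattern).copy()
    n_flip = int(round(noise_level * corrupted.size))
    idx = rng.choice(corrupted.size, size=n_flip, replace=False)
    corrupted[idx] = 1 - corrupted[idx]          # high <-> low
    return corrupted
```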
6.5.1 Uneven class distributions
In all experiments described until now, the number of patterns belonging to the C + and C − classes was always kept equal. Hence, after training was complete, the neuron had to categorize the set of input patterns into two equal halves. To verify the capability of a neuron to recognize one pattern out of many, we created four random binary patterns (labeled 1 through 4) and trained four neurons (labeled A through D) with uneven class assignments. For neuron A, only pattern 1 was assigned to the C + class, and all other patterns were assigned to the C − class. Similarly, for neuron B only pattern 2 was assigned to the C + class, and for neurons C and D only patterns 3 and 4 were assigned to C + , respectively. After multiple iterations of the training session we tested all four neurons.
Figure 6.7: Memory recall from a corrupted data set. After training is completed, a percentage of the input vector (% noise level) is inverted before testing is done. The difference in post-synaptic frequency between the C + and C − classes while testing two and four spatial patterns is shown in the left and right panel, respectively. As expected, an increase in noise level decreases the frequency difference. In the middle panel, an example pattern and its 5% and 10% corrupted versions are shown from top to bottom.
This was repeated forty times with new sets of four randomly generated spatial patterns, with identical class assignments. We used a fixed threshold frequency (20Hz) as the decision boundary and counted the number of times each neuron's output crossed it while testing all four patterns. As expected, neuron-A crossed the threshold many more times in response to pattern 1 than in response to the other patterns, and all other neurons behaved correspondingly. The gray bars in Fig. 6.8 show the fraction of times (fT ) each neuron crossed the decision boundary. The height of the gray bars for the patterns corresponding to the C + class shows the fraction of correct classifications, while the gray bars for patterns of the respective C − classes show the fraction of misclassified results. We also counted the number of times νpost resided within the narrow band between 16Hz and 20Hz, and considered that as unclassified output. This means that the output is neither high enough to categorize the pattern as a C + class nor low enough for a C − class. The thin black bars show the fraction of times this happened.
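This decision rule can be summarized in a few lines; the 20Hz threshold and the 16-20Hz unclassified band follow the text, while the function itself is only an illustrative sketch.

```python
def classify_rate(nu_post, threshold=20.0, band_low=16.0):
    """Map a mean output frequency (Hz) to a class decision, with a narrow
    band below the threshold treated as an unclassified outcome."""
    if nu_post >= threshold:
        return "C+"
    if nu_post < band_low:
        return "C-"
    return "unclassified"
```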
6.6 Quantitative analysis of classification performance
In order to carry out a quantitative characterization of the network's classification performance, we used a discrimination analysis based on the Receiver Operating Characteristics (ROC) (Fawcett, 2006).
Figure 6.8: Four neurons (A-D) are tested with four different random patterns (1-4), each trained with one pattern as C + and the three others as C − . The gray bars show the fraction of times the output crossed a fixed frequency threshold during testing. Each neuron crossed the threshold many more times for its corresponding C + pattern (like pattern 3 for neuron-C) than for the others. Black bars represent the fraction of times the output frequency lies within a narrow band around the threshold, depicting unclassified results.
An ROC graph, plotted in a unit square, is a technique for measuring the sensitivity and specificity of a binary classifier. As shown in Fig. 6.9(a), the output of a binary classifier can have four possible outcomes: true positive, true negative, false positive, and false negative. In an ROC graph the classifier's true positive rate (also called hit rate) is plotted against its false positive rate (also called false alarm rate). For classifiers that are designed to produce a direct class decision for each input, the analysis produces a single point in the ROC graph (like C1 or C2). The closer the point is to the location (0,1), the better the classifier (e.g., C1 is a better classifier than C2). However, for classifiers with a continuous-valued output (e.g. a neuron's output frequency), a sliding threshold, ranging from 0 to infinity, is used as the classifier's decision boundary. Each threshold produces a different point in the ROC space, and the step function connecting them is called an ROC curve. The area under the ROC curve (AUC) is a measure of the classification performance. While unity area denotes perfect classification, an area of 0.5 indicates that the classifier is performing at chance.
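For reference, a minimal sketch of this sliding-threshold ROC/AUC computation is given below; it only assumes a list of mean output frequencies with their true class labels, not any chip-specific data format, and the function names are illustrative.

```python
import numpy as np

def roc_auc(frequencies, labels):
    """ROC curve and area from analog classifier outputs.
    frequencies: output rates; labels: 1 for C+ patterns, 0 for C- patterns."""
    freqs = np.asarray(frequencies, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.r_[np.inf, np.sort(freqs)[::-1]]      # sliding threshold
    tpr, fpr = [], []
    for th in thresholds:
        pred = freqs >= th
        tpr.append(np.sum(pred & (labels == 1)) / max(np.sum(labels == 1), 1))
        fpr.append(np.sum(pred & (labels == 0)) / max(np.sum(labels == 0), 1))
    tpr, fpr = np.array(tpr), np.array(fpr)
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoid rule
    return fpr, tpr, auc

# Example: C+ patterns tend to elicit higher output rates than C- patterns
fpr, tpr, auc = roc_auc([55, 48, 30, 22, 18, 9], [1, 1, 1, 0, 0, 0])
print(auc)   # 1.0 for this perfectly separable toy example
```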
We performed classification experiments analogous to the one done for the data in Fig. 6.4(a), with different sets of input patterns, ranging from two to twelve, but always dividing them equally between the C + and C − classes. An ROC analysis was done on the neuron's output frequency for each set of input patterns used.
Figure 6.9: (a) A confusion matrix shows the four possible outcomes of a classifier for a given input, from which the true positive rate tpr = tp/(tp+fn) and the false positive rate fpr = fp/(fp+tn) are computed. (b) A graph of the true positive rate (tpr) against the false positive rate (fpr) forms the ROC space, a unit square. Classifiers with a binary decision are represented as points (C1, C2) on the graph, while classifiers with an analog output (e.g., the post-synaptic frequency of a neuron) form an ROC curve.
The solid line in Fig. 6.10(a) shows the AUC values of the single-neuron classifier as a function of the number of patterns in a set. As the number of patterns to classify increases, the classification performance decreases, as expected from theory. This is due to the fact that the overlap among patterns belonging to the C + and C − classes increases for greater numbers of random spatial patterns. The number of patterns for which the classifier has an AUC value of 0.75 can be considered as the classifier's storage capacity: it indicates the maximum number of patterns for which the classifier has a 75% probability of producing the right result for a random input.
Next, we performed an ROC analysis similar to that of Fig. 6.10(a) and obtained the AUC using only 40 and 20 input synapses. In Fig. 6.10(b) we plot the storage capacity (solid line) versus the number of input synapses (N ) used by the classifier. The top and bottom traces show theoretical predictions, derived from Brader et al. (2007), with (top dashed line) and without (bottom dashed line) the stop-learning condition. The performance of the VLSI system is compatible with the theoretical predictions on the storage capacity of networks with stochastic bounded synapses, and shows the same scaling properties as predicted by the theory.
6.6.1 Boosting the classifier performance
To enhance the classification performance, instead of using a single neuron as a binary classifier, we used 20 independent classifiers trained with the same sets of patterns.
Figure 6.10: Classification performance and memory capacity. (a) The area under the ROC curve (AUC) measured from pattern classification experiments with a single output neuron is shown as the solid line. The result from a pool of 20 independent output neurons, using a majority decision rule, is shown as the dashed line (see Fig. 6.11(b) for explanation). (b) The storage capacity of a single neuron is measured by the number of patterns classified with an AUC value greater than 0.75. The solid curve shows the number of such patterns plotted against the number of synapses used for the experiment. The dashed lines show the theoretical predictions of the storage capacity, with and without the stop-learning condition.
Figure 6.11(a) shows such an arrangement, where different neurons receive the same input but different teacher signals (T1 -T20 ). Due to limitations in the number of available neurons, rather than using 20 different ones, we trained the same neuron multiple times, with different realizations of the Poisson trains for the teacher signal but the same input pattern spike trains. The binary decision mechanism that combines the results of the different output neurons was implemented using a majority rule decision process: each neuron in the pool individually classifies the learned pattern to be in C + or in C − and votes for the class chosen. The score is positive (+1) if the vote is correct, and negative (-1) otherwise. The total outcome, computed by summing all the scores, shows what the majority decides. Figure 6.11(b) shows the outcome of a pool of 20 neurons for 10 different input patterns. The dark and light gray bars represent the correct and incorrect votes during classification. The black bars represent the net sum, which is positive if the classification is correct and negative for a misclassification. Using this method we can also define an unclassified outcome. For example, a pattern can be defined to be unclassified (rather than misclassified) if the difference between the correct and incorrect votes does not exceed 10% of the total members in the pool. In that case the black bar resides within the two horizontal lines.
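A minimal sketch of this majority-rule combination (pool size of 20 and the 10% band as described above; the function names are illustrative) is:

```python
def majority_decision(votes, margin=0.1):
    """Combine individual classifier votes: +1 for a correct classification,
    -1 for an incorrect one. The pattern counts as unclassified when the net
    vote does not exceed `margin` of the pool size."""
    net = sum(votes)
    if abs(net) <= margin * len(votes):
        return "unclassified"
    return "correct" if net > 0 else "misclassified"

# Example: 13 correct and 7 incorrect votes in a pool of 20 classifiers
print(majority_decision([+1] * 13 + [-1] * 7))   # -> 'correct' (net = +6 > 2)
```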
Figure 6.11: Boosting the classification performance. (a) Twenty different neurons (N1 -N20 ) can be used as independent binary classifiers by stimulating them with the same input but with different teacher signals (T1 -T20 ). A majority vote over all these weak classifiers provides the final result. (b) Individual classification results and majority rule decision for the classification of 10 patterns. Dark and light gray bars represent the vote counts for correct (positive) and incorrect (negative) classifications, respectively. Black bars represent the sum of the vote counts, indicating the majority decision. Negative black bars (not present here) would represent misclassification errors, while bars within the dashed lines would represent unclassified decisions.
The dashed line in Fig. 6.10(a) shows the performance achieved using this method. This technique, known as boosting, provides clear advantages over the use of a single classifier. The improvement can be attributed to the synaptic updates being stochastic and independent on different neurons. As a consequence, every output neuron can be regarded as a weak classifier, and the errors made by each of them are independent.
6.7 Classification of correlated patterns
The important advantage of the stop-learning mechanism implemented in this device lies in its ability to classify correlated patterns. To test this claim, we created correlated patterns using a prototype with 60 random binary values (as explained in Fig. 6.1) as a starting point. We then generated new patterns by changing only a random subset, of a size that depends on the amount of correlation. In Fig. 6.12(a), four patterns (labeled 1-4) are generated starting from the prototype, labeled 0.
Figure 6.12: Classification performance with correlated patterns. (a) Four correlated patterns (labeled 1-4) are created from the same randomly generated prototype (labeled 0). The patterns with 30% correlation with the prototype are shown on top and the patterns with 90% correlation are shown below. (b) The AUC values computed for different sets of patterns (2 to 8) are plotted against the percentage of correlation among the patterns. In every experiment half of the patterns are randomly assigned to the C + class, and half to the C − class.
In the top panel, the four input vectors (length sixty) with 30% correlation show a small degree of similarity to the prototype. Patterns in the bottom panel, with 90% correlation, have most of their inputs in the same state as the prototype. These patterns were then randomly assigned to either the C + or the C − class. In the experiments that follow we systematically increased the percentage of correlation among the input patterns, and repeated the classification experiment with increasing numbers of input patterns per class, ranging from two to eight.
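The construction of correlated patterns can be sketched as follows; this is only an illustration under stated assumptions (60 binary inputs, a shared prototype, and the non-preserved entries flipped), and the actual generation code may differ.

```python
import numpy as np

def correlated_patterns(n_patterns, correlation, n_inputs=60, rng=None):
    """Generate binary patterns that keep a fraction `correlation` (0..1) of
    their entries identical to a common random prototype; the remaining
    entries are inverted."""
    rng = rng or np.random.default_rng(1)
    prototype = rng.integers(0, 2, n_inputs)
    patterns = []
    for _ in range(n_patterns):
        p = prototype.copy()
        n_change = int(round((1.0 - correlation) * n_inputs))
        idx = rng.choice(n_inputs, size=n_change, replace=False)
        p[idx] = 1 - p[idx]                      # flip the selected subset
        patterns.append(p)
    return prototype, patterns

prototype, patterns = correlated_patterns(4, correlation=0.9)   # cf. Fig. 6.12(a)
```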
Figure 6.12(b) shows the AUC obtained from the ROC analysis carried out on the outcome of these experiments. The curves show a substantially constant AUC value for patterns with low and intermediate correlation, with a rapid drop in AUC (indicating low classification performance) only when the correlation among patterns increases beyond 90%. This remarkable performance derives from both the bistable nature of the synaptic weights and the stochastic nature of the weight updates (Fusi, 2002).
To further evaluate the effect of the stop-learning mechanism, we compared the performance of the system with the corresponding circuits enabled and disabled.
Figure 6.13: Classification performance with and without the stop-learning mechanism enabled. (a) Examples of the C + and C − patterns with just 20% overlap (the overlapping region is within the dashed box). (b) Even without the stop-learning mechanism (dotted line), the classification performance remains high for such trivial patterns (with no or moderate overlap), but decreases for higher overlap. With the stop-learning mechanism enabled (solid line), increasing overlap has little effect on the result.
We carried out a classification experiment starting with two completely orthogonal sets of C + and C − patterns. The two patterns assigned to the C + class consisted of random binary vectors for synapses 1-30 and all zeros for synapses 31-60. The other two patterns, belonging to the C − class, were generated by assigning random binary vectors to synapses 31-60 and setting synapses 1-30 to zero. This corresponds to zero overlap between the two classes and a trivial set to classify. Additional patterns with increasing overlap were generated following an analogous procedure: the random binary vectors were assigned to overlapping subsets of synapses (e.g. 1-33 and 27-60 for 10% overlap). Figure 6.13(a) shows an example of four patterns with 20% overlap (see the grey dashed box). Due to the random nature of the binary vectors, the number of correlated synapses is usually less than the overlap percentage. When the overlap is set to 100%, this experiment is equivalent to that of the random uncorrelated patterns described in Sec. 6.5. Conditions with little or no overlap between patterns were classified properly (high AUC values) even with the stop-learning mechanism disabled (see the squares in Fig. 6.13(b)). However, the effect of the stop-learning mechanism becomes evident for high values of overlap (see the circles in Fig. 6.13(b)).
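A sketch of this overlap construction (assuming 60 synapses; the exact index arithmetic is an illustrative choice approximating the text's example ranges) could look as follows.

```python
import numpy as np

def overlapping_class_patterns(overlap, n_inputs=60, rng=None):
    """Return one C+ and one C- binary pattern whose non-zero synapse ranges
    overlap by roughly `overlap` (0..1) of the input vector."""
    rng = rng or np.random.default_rng(2)
    half = n_inputs // 2
    extra = int(round(overlap * n_inputs / 2))    # extension of each half-range
    c_plus = np.zeros(n_inputs, dtype=int)
    c_minus = np.zeros(n_inputs, dtype=int)
    c_plus[:half + extra] = rng.integers(0, 2, half + extra)    # lower synapse range
    c_minus[half - extra:] = rng.integers(0, 2, half + extra)   # upper synapse range
    return c_plus, c_minus

c_plus, c_minus = overlapping_class_patterns(overlap=0.2)       # cf. Fig. 6.13(a)
```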
Figure 6.14: Classification performance (AUC values) measured in response to graded input patterns. The patterns are formed by spike trains with four possible mean frequency values. (a) Two typical input vectors, with the four values shown in shades of gray. Example spike raster plots are shown beside them. (b) With graded patterns, as expected, the classification performance degrades much faster compared to that of the binary patterns in Fig. 6.10(a).
6.8 Classification of graded patterns
In addition to using binary spatial patterns, we performed experiments with
graded patterns: patterns in which the mean frequencies could be 2Hz, 10Hz,
20Hz, or 30Hz (as opposed to just 2Hz or 30Hz).
Two samples of random graded input patterns are shown in Fig. 6.14(a). Example spike raster plots corresponding to the mean frequencies used are shown next to the patterns. We performed experiments similar to those described in Section 6.5 for classifying random patterns assigned to two classes using a single neuron. We quantified the classifier's performance using the ROC analysis, as shown in Fig. 6.14(b). The AUC value decreases with the number of patterns presented to the classifier during the training phase. The overall trend is similar to the one shown in Fig. 6.10(a), but the rate at which the AUC value decreases (i.e. the classification performance degrades) is much higher here. This is expected behavior, as the similarity between input patterns is even higher for graded values, and the correlation between patterns increases as more and more are used in the training set.
To further analyze the classification performance of the device on graded input patterns, we created spatially distributed Gaussian profiles as input vectors. The profiles had a standard deviation of 6 synapses and a maximum mean frequency of 30Hz.
Figure 6.15: Classification performance for two graded and overlapping input stimuli with Gaussian profiles, a (gray) and b (black). The top row shows the two input stimuli with increasing areas of overlap (left to right). The x-axis represents the input synapse address and the y-axis its mean input frequency. The output neuron is trained to classify a as a C + class and b as a C − class pattern. The bottom row shows the neuron's mean output frequency in response to the a and b patterns during the testing phase.
In Fig. 6.15 (top row) we show two such patterns in four panels, with increasing amounts of overlap. The first pattern (labeled a) is centered around synapse #25, while the other pattern (labeled b) is gradually shifted from synapse #45 (leftmost plot) to synapse #30 (rightmost plot). Pattern a was trained as a class C + pattern, while pattern b was trained as a C − class pattern. The outcome of the classification experiment, during the test phase, is plotted in the bottom row of Fig. 6.15.
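A minimal sketch of such a Gaussian rate profile over the 60 input synapses (the centre positions, width, and peak rate follow the text; everything else is illustrative) is:

```python
import numpy as np

def gaussian_profile(center, n_inputs=60, sigma=6.0, peak_rate=30.0):
    """Mean input rate (Hz) per synapse for a Gaussian bump centred on `center`."""
    synapses = np.arange(1, n_inputs + 1)
    return peak_rate * np.exp(-((synapses - center) ** 2) / (2.0 * sigma ** 2))

pattern_a = gaussian_profile(center=25)   # trained as the C+ stimulus
pattern_b = gaussian_profile(center=45)   # trained as the C- stimulus (leftmost panel)
```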
As expected, the neuron responds with a high firing rate to pattern a
and with a low one to pattern b. But the difference between the high and
low mean rates decreases as the overlap between the patterns increases. The
classifier manages to maintain a difference in the output firing rates, even for
the case in which the patterns overlap almost completely.
6.9 Conclusions
In this chapter I showed classification results that verify the functionality of spike-based learning on the IFSL-v2 chip. Random binary patterns of mean firing rates were used as input. I showed how these results could be extended to the classification of binary 2D images and also to recognizing one pattern out of many. To characterize the classification performance of the system in a thorough and quantitative way, a discrimination analysis based on receiver operating characteristics (ROC) was shown.
Most of the early spike-based learning circuits reported in the literature focused on the detailed characterization of a single silicon synapse (Häfliger and Rasche, 1999; Fusi et al., 2000). In other studies, data from a very small number of synapses were shown to demonstrate the change in synaptic weight (Shon et al., 2002; Vogelstein et al., 2003; Bofill-i Petit and Murray, 2004, etc.), according to a learning rule. In recent studies, Schemmel et al. (2006) and Indiveri et al. (2006b) also reported spike-timing dependent plasticity in silicon and emphasized replicating the shape of the temporal window of plasticity. However, none of them report collective behavior generated from an array of plastic synapses, or show any statistical analysis of the entire system. In Yang et al. (2006) and Koickal et al. (2007), spike-based learning mechanisms were utilized for adaptation in silicon synapses, but pattern classification on a VLSI network of spiking neurons was explicitly demonstrated only in Arthur and Boahen (2006) and in Häfliger (2007). However, the patterns to be learned were rather trivial, with little or no overlap between them. Furthermore, neither a quantitative analysis of the classification performance nor a measure of the storage capacity of a spiking VLSI network has been reported in the literature. Giulioni et al. (2007) implemented the same plasticity rule as in this project, and reported very basic learning experiments concentrating mostly on the characterization of the synaptic changes.
In this chapter, in addition to the quantitative analysis of the network, I showed how the learning performance could be improved further by using boosting techniques (Polikar, 2006). Memory recall from a partially corrupted pattern, as in associative memory, was also shown. I further demonstrated how the scaling of the storage capacity in the VLSI system is in accordance with that predicted by the theory (Brader et al., 2007). The robust classification of highly correlated patterns shown here has not yet been reported in any other bio-plausible hardware system. Finally, I showed how the learning circuits can correctly classify more complex patterns that are not restricted to binary input firing rates.
Chapter 7
Discussion
7.1 Relevance of the work described in this thesis
Natural evolution has led to robust computational architectures based on principles conceptually different from those of classical digital computation. Biological systems are far more efficient at solving ill-posed problems and extracting reliable information from noisy and ambiguous data (Douglas et al., 1994). To bridge the gap between the computational ability of a biological and an engineered system, the concept of neuromorphic engineering took root at the California Institute of Technology, USA, during the mid-eighties, with the research of Carver Mead. He envisioned that the inherent similarity between the physics of silicon and that of biology can be exploited to build computational primitives in VLSI circuits (Mead, 1990). Neuromorphic engineers attempt to capture the computational power and efficiency of biological neural systems in hybrid analog-digital VLSI devices. These devices employ a design strategy similar to that of biology: local computations are performed in analog, and the results are communicated using all-or-none binary events (spikes). The significance of neuromorphic systems is that they offer a method of exploring neural computation in a medium whose physical behavior is analogous to that of biological nervous systems and that operates in real time, irrespective of size. The challenge for neuromorphic engineering is to explore the methods of biological information processing in a practical electrical engineering context.
Today this field has grown into larger bio-inspired hardware systems, with a variety of silicon devices being designed at research labs around the world. Neuromorphic engineering includes the design of computational primitives like silicon neurons and synapses, different kinds of intelligent sensors, analog signal processing systems, spike-based networks, and devices for low-power biomedical applications (see Boahen, 2005; Sarpeshkar, 2006, for reviews).
My specific contribution to the field focuses on the design and implementation of spike-based neural networks with learning capabilities. A growing interest in such systems has recently led to the design and fabrication of an increasing number of VLSI networks of integrate-and-fire (I&F ) neurons (Chicca et al., 2003; Liu and Douglas, 2004; Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006; Badoni et al., 2006; Indiveri et al., 2006a; Häfliger, 2007), including multi-chip networks (Choi et al., 2005; Serrano-Gotarredona et al., 2005; Vogelstein et al., 2007). Learning and classification undoubtedly form an important part of building a complete, multi-chip, real-time, behavioral system in hardware. Yet the volume of research devoted to devising a neuromorphic learning chip is comparatively low. One obvious reason is the lack of established models pertaining to spike-based learning, and also the difficulty of long-term storage of synaptic weights in a silicon device. In this context, the work presented here is extremely relevant, as it shows a flexible neural network that implements a very robust spike-based learning algorithm (Brader et al., 2007). It also solves the problem of weight storage by utilizing a bistable memory element, a natural choice in the entire information industry for the last half-century. The specific achievements and extensions to the state of the art made by this project, and their relevance to the ongoing research on building a large-scale artificial system capable of brain-like behavior, are highlighted below.
7.1.1 A robust AER communication system
Neuromorphic chips routinely require tens of thousands of axonal connections to propagate spikes from the silicon neurons, far too many to be implemented using dedicated wires. The address-event link, originally introduced by Sivilotti (1991) and Mahowald (1994), implements virtual connections using time-division multiple access of a single physical channel. Since its early days, the design of the communication channel has been significantly improved by the work of Boahen (2000, 2004a). The AER communication, being the backbone for data transfer in neuromorphic chips, has to be carefully optimized for both speed and robustness.
In this thesis, I showed how pipelining the communication channel increases its throughput and elaborated on the design aspects of the pipeline, originally proposed in Boahen (2000). This formal approach to AER circuit design is in contrast to the heuristic method used in many previous generations of neuromorphic chips. It also helped in designing an improved AER communication system, independent of any external bias. I showed a reliable asynchronous communication channel without the problems of missing spikes, as seen in Chicca (2006). I discussed the design of the individual combinational circuit blocks in the communication channel; in particular, I focused on the arbiter and the decoder design. The arbiter, being an integral part of an asynchronous transmitter, should be both fast and unbiased in its operation. The design basics and data from different chips are presented to point out the improvements in the arbiter design that was proposed in Boahen (2004a). This is an important enhancement over the problems observed in Lichtsteiner and Delbrück (2005), for example, where increased spiking activity in one region of the chip restricted all the AER traffic to that region. Another important contribution was the dual-rail data representation (Martin and Nystrom, 2006) used in the receiver chip, a step toward increasing the robustness of the AER communication system.
7.1.2 Synaptic plasticity in silicon
Implementation of spike-based synaptic plasticity in silicon has a fairly recent history, and long-term storage of the analog synaptic weights still poses an important technological challenge. Over the years, researchers accomplished this difficult task by using floating-gate storage (Diorio et al., 1997; Häfliger and Rasche, 1999; Shon et al., 2002), by implementing digital look-up tables (Vogelstein et al., 2003; Wang and Liu, 2006) or by storing only the binary state of the synapse (Arthur and Boahen, 2006). However, there has been accumulating evidence that biological synaptic contacts undergo all-or-none modification (Petersen et al., 1998; O'Connor et al., 2005), with no intermediate stable states. In this project we used a binary synapse with limited analog resolution that is designed to take only two values over long time scales (Brader et al., 2007). The plasticity model uses Hebbian learning with stochastic updates and an additional stop-learning condition to classify broad classes of linearly separable patterns. In this work, I demonstrated the bistable nature of the silicon synapse along with its stochastic transition from one state to another. Experiments demonstrating the detailed dynamics of synaptic plasticity were performed on one single synapse due to limitations in the number of external probe points. The weight update of this particular synapse was also analyzed for the phase dependency between the pre- and post-synaptic spike times. I described the method of determining the LTP and LTD probability curves, which depend on the pre- and post-synaptic frequencies. It is also essential for all the synapses to show a stochastic transition behavior similar to the one tested in detail. The transition probability is an outcome of the collective performance of the various circuit blocks associated with the synapse, and requires optimized values from many
different voltage biases. In previous works, Fusi et al. (2000) and Chicca et al. (2003) have shown frequency-dependent transition probabilities, but only in a single synaptic circuit. For the first time, I reported data showing controlled frequency dependence of the stochastic transition probabilities of an entire synaptic array in a full-custom VLSI chip.
Using the IFSL-v2 chip, I showed how current-mode signal processing helped in reducing unwanted coupling between the feedback signals from the neuron to the synapses. This method could be used extensively for similar learning rules in which a few global feedback signals are broadcast to all synapses to control their plasticity. The current-mode integrators, current comparators (derived from the current-mode WTA proposed by Lazzaro et al. (1989)) and active current mirrors used in this chip consume much less power than the corresponding voltage-mode circuits.
7.1.3 Learning and classification in VLSI
Memory is a fundamental component of all learning mechanisms which lead
to the classification of learned stimuli. In particular, the memory elements
(the synaptic weights) should be modified to learn from experience, creating new memories (memory encoding). At the same time the old memories
should be protected from being overwritten by new ones (memory preservation). In this work I described a spike based VLSI system which can learn
to classify complex patterns, efficiently solving both the memory encoding
and the memory preservation problems. Specifically, I demonstrated how the
VLSI system can robustly classify patterns of mean firing rates, also in the
case in which there are strong correlations between input patterns.
Almost none of the neuromorphic systems designed for spike-based synaptic plasticity report any classification behavior (see Fusi et al., 2000; Shon et al., 2002; Bofill-i Petit and Murray, 2004; Indiveri et al., 2006b), with notable exceptions in Arthur and Boahen (2006) and Häfliger (2007). However, in those studies, the patterns to be classified were very simple, with little or no overlap between them, and none showed results for the simultaneous classification of multiple patterns. None of them report any quantitative analysis of classification performance or any measure of the storage capacity of the VLSI network. The work of Giulioni et al. (2007) implements the same plasticity rule as in this project, and reports simple learning experiments concentrating mostly on the characterization of the synaptic weights. In this thesis, I showed results verifying the functionality of the VLSI system in the difficult condition of random binary patterns as input. The classification of multiple input patterns and a rigorous quantification of the network performance were shown using ROC analysis. In addition I showed how the learning performance
could be further improved by using boosting techniques (Polikar, 2006). Results from the chip demonstrated how the scaling of the storage capacity is in accordance with that predicted by theory (Brader et al., 2007). This is an important step towards justifying a scaled-up version of the network, with a few thousand synapses per neuron. The robust classification performance with highly correlated patterns shown in this thesis has not yet been demonstrated in any other bio-plausible hardware system. Finally, I showed how the learning circuits can correctly classify more complex patterns that are not restricted to binary input firing rates. These results set an important step towards the online classification of spike trains from artificial sensors or biological neural networks.
7.2 Future Work
Spike-based learning and classification has become an active field of research because of its importance in processing spatio-temporal information in the cortex (Maass and Bishop, 1998). Pulse-coded communication has also proved to be of great advantage in developing large multi-chip artificial behavioral systems, e.g., the EU FET CAVIAR (2002–2006) project. VLSI devices that implement spike-based learning and signal processing are essential for various applications, including autonomous robotics, stand-alone sensors, or computational modules in brain-machine interfaces. Considering the satisfactory classification performance of the IFSL family of chips presented in this thesis, it is important to continue research in the same direction. The obvious next step would be to extend the methods and devices developed here to classify spike trains obtained from real sensors and biological nervous systems. Ongoing work in Choi et al. (2008) shows promising results in using the IFSL-v2 chip to recognize spoken vowels captured by a silicon cochlea (Chan et al., 2006). Other preliminary tests show that these chips are a good candidate for the classification of spike rasters recorded from in-vivo experiments, e.g., from monkey pre-motor cortex while it plans a grasping task (Musallam et al., 2004), paving the way for real-time neuro-prosthetics.
To improve the VLSI system used in this work, various suggestions are presented in Chapter 4. The size of the silicon synapse can be greatly reduced by using a single EPSC block per neuron, similar to the approach proposed in Arthur and Boahen (2006), allowing for many more synapses per unit area. Enhancing the dynamic range of the silicon synapse can provide better control of the stochastic transition probabilities, essential for robust classification. Improved multiplexers for configuring the synaptic density will provide higher flexibility in the choice of the network architecture, for
classifying a wider variety of input stimuli.
7.3 Outlook
In the late 1980s and 1990s, various research efforts (both industrial and academic) were dedicated to the design and implementation of hardware neural networks (NN), both in the analog and the digital domain (see Lindsey and Lindblad, 1995; Ienne et al., 1996; Lindsay, 2002). However, very few of these efforts matured enough to become commercially successful (e.g., Synaptics; Adaptive Solutions; Ligature). The modest level of achievement can be largely attributed to the fact that those works were based on ASIC¹ technology that was not competitive enough to justify large multi-chip adoption for neural-network applications. Most of these silicon devices were built to behave as hardware accelerators, to execute, in real time, the successful classical NN models (Rumelhart et al., 1986; Hertz et al., 1991). These rate-coded models largely ignored the biological realism of spike-based computation and the physical limits of bounded synaptic weights. Neither did the hardware NNs gain any particular advantage from their ASIC implementation, compared to that of a general-purpose digital processor. On the other hand, the progressive scaling of CMOS transistors, along with the advent of fast automated design and testing tools, drove the phenomenal success of digital processors, which outperformed the dedicated hardware.
Today, it is widely recognized that handling intra-die variability in device characteristics represents one of the biggest challenges for present and next-generation nano-CMOS transistors and circuits (Declerck, 2005). This has prompted the ITRS (International Technology Roadmap for Semiconductors) to suggest the concept of More than Moore², to bring revolutionary changes in the way future integrated circuits and systems are conceived (ITRS 2007). According to a well-known technology leader, Jan Rabaey (GSRC Berkeley), the unpredictable component behavior will lead to design strategies dependent on self-adaptivity, error resiliency and device randomness (Rabaey, 2005). Designing systems with a high degree of redundancy, built from large arrays, possibly in the billions, of poorly matched devices is now seriously considered by both academia and industry (Martorell and Cotofana, 2008; Bahar et al., 2007; FACETS; NanoCMOSgriD).
¹ Application Specific Integrated Circuits
² The ITRS added a perpendicular axis to traditional transistor scaling following Moore's law, which improves both transistor density and processing speed. The other axis refers to functional diversification by incorporating design techniques that do not necessarily scale according to Moore's law.
Power management (both distribution and dissipation) will be an even bigger issue when such arrays are required to function in an uninterrupted mode, e.g., in mobile or prosthetic devices.
Not surprisingly, the architectural constraints in biological systems are very similar to those of advanced silicon technology (Mead, 1990; Vittoz, 1998). Inspired by biology, the neuromorphic systems described in this thesis use low-power hybrid analog/digital circuits to emulate nature in the interest of addressing these system-level issues. Together with advances in spike-based communication, they are a perfect fit for such massively parallel artificial systems, useful for advanced computing paradigms. From a physical hardware perspective, spikes allow for the reliable transmission and processing of information in large distributed systems, similar to the digital computing hardware of today. Recent theoretical work in computer science (Maass and Bishop, 1998) further suggests that spike-based computation offers a rich variety of principles that can be effectively used to synthesize large-scale novel computing structures. Spike-based real-time classification is a step towards the synthesis of complex structures that can have cognitive capabilities, taking us beyond the reactive properties of current systems. Such neuromorphic systems could potentially be used for the optimal exploitation of future emerging technologies that go well beyond the miniaturization and integration limits of CMOS in building adaptive, fault-tolerant systems.
In conclusion, the VLSI device described in this thesis is highly suitable for real-time spike-based classification and showed the most promising performance reported in the literature so far. The IFSL family of chips is an ideal example of hardware systems based on neuromorphic principles, right from the design of low-power subthreshold circuit blocks to the demonstration of collective computation based on noisy elementary blocks. During this project, the prototype device has reached an appropriate stage of development where it can be interfaced with other spike-based devices for real-world applications.
Appendix A
C-element
The Muller C-element is a commonly used asynchronous logic component originally designed by David E. Muller. It applies logical operations to its inputs and has hysteresis (Sutherland, 1989). The output of the C-element reflects the inputs when the states of all inputs match. The output then remains in this state until all the inputs make a transition to the other state (see the truth table in Fig. A.1(a)). This model can be extended to the asymmetric C-element, where some inputs only affect one of the two transitions (positive or negative) of the output. As in Fig. A.1(b), input B has no effect in performing or restricting a downward transition. It is conventional to show C-elements with two outputs, instead of one, in asynchronous communication channel design. In such cases, both outputs refer to the same internal signal, i.e., C.
Symmetric C-element (a):
A B | C
0 0 | 0
0 1 | prv
1 0 | prv
1 1 | 1

Asymmetric C-element (b):
A B | C
0 0 | 0
0 1 | 0
1 0 | prv
1 1 | 1
Figure A.1: The symmetric (a) and asymmetric (b) C-elements with their
truth tables. The output holding the previous state is denoted by prv.
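The truth tables translate directly into a next-state function; a minimal Python sketch (not part of the original appendix, with illustrative function names) is:

```python
def c_element(a, b, prev):
    """Symmetric Muller C-element: the output follows the inputs when they
    agree, otherwise it holds its previous state (Fig. A.1(a))."""
    return a if a == b else prev

def asymmetric_c_element(a, b, prev):
    """Asymmetric variant of Fig. A.1(b): input B only gates the rising
    transition; A alone controls the falling transition."""
    if a and b:        # both high -> output set high
        return 1
    if not a:          # A low -> output cleared, regardless of B
        return 0
    return prev        # A high, B low -> hold previous state

print(c_element(1, 0, prev=1))             # -> 1 (holds previous state)
print(asymmetric_c_element(0, 1, prev=1))  # -> 0 (B cannot block the fall)
```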
The four major implementations of C-elements reported in the literature are dynamic, weak-feedback, conventional, and symmetric (Shams et al., 1998). The C-element circuits can be broken up into two basic functional blocks: the switching and the keeping block.
Figure A.2: Various circuit implementations of the C-element: (a) dynamic, (b) weak feedback, (c) conventional, (d) symmetric.
In Fig. A.2(a), the four stacked transistors at the input and the inverter at the output together form the switching block. The parasitic capacitor connected to node C̄ acts as the keeper. The circuits from left to right in Fig. A.2 consist of an increasing number of transistors. In the next two circuits, the same switching block can be easily identified. In Fig. A.2(b) the feedback inverter plays the part of the keeper, and the six transistors in the middle of the circuit in Fig. A.2(c) form the keeper. In the symmetric implementation (Fig. A.2(d)), all the stacked transistors connected to the inputs (A and B) and the output inverter together form the switching block. The two transistors connected to C function as the keeper. Though the dynamic implementation is the fastest of all four, not having an explicit keeper makes it less robust. The conventional and the symmetric circuits are ratio-less implementations, while the weak-feedback one requires correct transistor ratios for proper functioning. The area overhead being nearly the same, the conventional implementation was chosen over the symmetric one, considering the ease of design.
Appendix B
Hand Shaking Expansion
HSE, or hand-shaking expansion, shows the sequence of actions to be performed when designing an asynchronous link starting from the CHP¹ formalism. The following table lists the HSE primitives.
Operation    Notation    Explanation
AND          u & v       High if both are high
OR           u | v       Low if both are low
Set          v+          Drive v high
Clear        v−          Drive v low
Wait         [u]         Wait till u is high
Sequential   [u]; v+     Wait till u, then v+
Concurrent   u+, v+      Drive u and v high
Repeat       *[S]        Repeat statement S infinitely
Example:

*[[Āo & Ea]; Er+, Na+; [Ao & Ēa]; Er−, Na−]        (B.1)

Read: Wait till Ao is low and Ea is high, then drive Er and Na high. Now wait for Ao to be high and Ea to be low, then drive Er and Na low. Keep repeating this process forever.
¹ Communicating hardware processes
Appendix C
Current-mode log domain filter
The current-mode log-domain filter was first introduced by Seevinck (1990) and later analysed in detail by Tsividis (1997), Mahattanakul and Toumazou (1999), Frey (2000) and others. This group of circuits is often referred to as ELIN (externally-linear, internally-nonlinear) filters, due to the nonlinear (logarithmic) transformation from current to voltage at their internal nodes. Log-domain filters were initially designed using bipolar junction transistors, which have the necessary exponential i–v relation. However, due to the recent interest in low-power, low-voltage analog design, CMOS circuits working in subthreshold are often used for similar log-domain functions (Serra-Graells and Luis Huertas, 2005). Arthur and Boahen (2006) showed an ingenious application of log-domain filtering in neuromorphic circuits, where extremely low power consumption is one of the prime requirements. Here I will describe various low-pass filters using the log-domain technique for neuromorphic applications, such as silicon neurons and synapses.
Let us first analyze the circuit in Fig. C.1. Consider the input Iin to be a subthreshold current and Vtau a subthreshold voltage producing Iτ. Transistor M2 mirrors the input as I2. The KCL at node VL can be written as:

I2 = Iτ + C d(0 − VL)/dt  ⇒  C dVL/dt = Iτ − I2        (C.1)
Considering subthreshold operation, the output current Iout from a pMOS
can be written as:
Figure C.1: The basic current-mode log-domain filter. The output current Iout is a low-pass filtered version of the input Iin. The gain and the time constant of the filter cannot be independently controlled.
Iout = I0 e^(κ(VDD − VL)/UT)  ⇒  dIout/dt = −Iout (κ/UT) dVL/dt        (C.2)
Combining the above two equations, we get:
dIout/dt = −Iout (κ/UT)(Iτ − I2)/CL  ⇒  (CL UT/κ) dIout/dt = −Iout Iτ + Iout I2

τ dIout/dt + Iout = Iout I2/Iτ        (C.3)
where τ = CL UT/(κ Iτ). Looking into the circuit, we find that the transistors M1, M2 and M4 form a translinear loop (Gilbert, 1990). We mark the corresponding gate-source voltages with arrows; hence:
Vgs,1 = Vgs,2 + Vgs,4        (C.4)
Without going into the details of its derivation, the translinear principle (for an unequal number of elements on the two sides of the equality) gives:

Iin I0 = I2 Iout        (C.5)

where Iin, I2 and Iout are the currents in M1, M2 and M4, respectively, and I0 is the leakage current through a transistor when Vgs is zero.
Using the above current relation, the differential equation Eq. C.3 can be further modified to:

τ dIout/dt + Iout = Iin I0/Iτ        (C.6)
This first-order ordinary differential equation can be solved by rewriting it and multiplying both sides by the integrating factor e^(∫(1/τ)dt) (= e^(t/τ)):
dIout/dt + Iout/τ = Iin I0/(τ Iτ)

e^(t/τ) dIout/dt + e^(t/τ) Iout/τ = e^(t/τ) Iin I0/(τ Iτ)  ⇒  d(e^(t/τ) Iout) = (Iin I0/(τ Iτ)) e^(t/τ) dt        (C.7)
Integrating both sides, we get:
e^(t/τ) Iout = (Iin I0/Iτ) e^(t/τ) + C        (C.8)
The initial conditions for a positive step input current can be given as Iout(0+) = 0 and Iin(0+) = Ip. The same equation can be solved for a negative step, after the steady state has been reached, with Iout(0+) = Ip I0/Iτ and Iin(0+) = 0.
Iout(t) = (Ip I0/Iτ)(1 − e^(−t/τ))    (positive step)        (C.9)
Iout(t) = (Ip I0/Iτ) e^(−t/τ)    (negative step)        (C.10)
This shows the behavior of a first-order low-pass filter, where the steady-state current is determined by the forcing function Iin I0/Iτ on the right-hand side of Eq. C.6. This forcing function, being a function of Iτ, will affect the gain when the time constant is varied. The factor I0 (∼ 10^−18 A) also limits the gain to a low value.
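A quick numerical check of this step response (forward-Euler integration of Eq. C.6 against Eq. C.9; the parameter values below are illustrative assumptions, not measured values) is sketched here.

```python
import numpy as np

tau, I0, I_tau, Ip = 1e-3, 1e-18, 1e-9, 1e-9    # assumed values: seconds, amperes
dt = tau / 200.0
t = np.arange(0.0, 5.0 * tau, dt)

forcing = Ip * I0 / I_tau                        # right-hand side of Eq. C.6 (step input)
I_out = np.zeros_like(t)
for k in range(1, t.size):
    I_out[k] = I_out[k - 1] + dt * (forcing - I_out[k - 1]) / tau

analytic = forcing * (1.0 - np.exp(-t / tau))    # Eq. C.9, positive step
print(f"max deviation: {np.max(np.abs(I_out - analytic)):.3e} A")
```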
To add an independent control on the gain term, we can connect the source of M1 to a voltage source Vgain instead of VDD. The translinear loop can then be written as:

Vb + Vgs,1 = Vgs,2 + Vgs,4  ⇒  Iin Igain = I2 Iout        (C.11)
Figure C.2: Modifications of the basic log-domain filter, with additional gain control (a) and with the time constant and gain decoupled (b).
where Vb = VDD − Vgain and Igain = I0 e^(κVb/UT). The circuit is shown in Fig. C.2(a). Substituting the translinear current into Eq. C.3, the forcing function becomes Iin Igain/Iτ. The voltage Vgain can be used to independently control the gain.
In order to decouple the gain from Iτ and also make it independent of I0, the circuit in Fig. C.2(b) can be used. The only difference is in the source of transistor M1, which is connected to Vtau instead of VDD. This includes transistor M3 in the translinear loop as well (red arrows). The translinear current equation (for the same number of elements on either side of the equality) is given by:

Iin Iτ = I2 Iout        (C.12)
The analysis of the remaining part of the circuit stays the same. Substituting the translinear currents into Eq. C.3, the new differential equation becomes:

τ dIout/dt + Iout = Iin        (C.13)
Hence the new forcing function is just Iin. Though the gain is now independent of the time constant and of I0, it has a constant magnitude of unity. Better control of the steady-state gain can be achieved by using an extra transistor, as shown in Fig. C.3(a). Here Igain is a subthreshold current used to bias the transistor M1. The translinear loop is given by:

Vgs,5 + Vgs,1 = Vgs,2 + Vgs,4  ⇒  Iin Igain = I2 Iout

Substituting into Eq. C.3:  τ dIout/dt + Iout = Iin Igain/Iτ        (C.14)
Figure C.3: Variations of the log-domain circuit to implement independent
gain and time constant control
Hence the forcing function is devoid of I0 but is again dependent on the time constant. Including transistor M3 in the translinear loop as well, as shown in Fig. C.3(b), makes the gain independent of Iτ. The translinear loop can be written as:

Vgs,3 + Vgs,5 + Vgs,1 = Vgs,2 + Vgs,4  ⇒  Iin Igain Iτ = I0 I2 Iout

Substituting into Eq. C.3:  τ dIout/dt + Iout = Iin Igain/I0        (C.15)
Though this brings the I0 term back into the forcing function, it now appears in the denominator: a small I0 will now increase the gain, in contrast to its effect in Eq. C.6. A complete removal of I0 from the forcing function can be achieved by using another translinear loop for Igain, instead of a simple current source. In Fig. C.4(a), such a current source is shown; its output Igain is proportional to I0 and to the ratio of the currents of the two voltage-biased MOS current sources I1 and I2. Using this current source in Fig. C.3(b) removes I0 from the forcing function, which reduces to Iin scaled by the ratio of I1 and I2.
A different log-domain filter is reported in Bartolozzi and Indiveri (2007b). Shown in Fig. C.4(b), this circuit does not utilize the translinear principle. In subthreshold operation, the current I2 can be written as:

I2 = Iin/(1 + Iout/Igain)        (C.16)
Figure C.4: (a) A translinear current source for use in log-domain filters. (b) Differential pair integrator circuit that approximates a log-domain filter with independent gain and time constant control (adapted from Bartolozzi et al. (2006)).
where Igain is a function of Vgain (Igain = I0 e^(−κ(Vgain − VDD)/UT)) and not of the current through transistor M1. If we consider that in the steady-state condition Iout >> Igain, then I2 reduces to Iin Igain/Iout. Hence the forcing function becomes Iin Igain/Iτ.
Bibliography
L. F. Abbott and S. Song. Asymmetric hebbian learning, spike timing and
neural response variability. In Advances in Neural Information Processing
Systems, volume 11, pages 69–75, 1999.
L.F. Abbott and S.B. Nelson. Synaptic plasticity: taming the beast. Nature
Neuroscience, 3:1178–1183, November 2000.
Adaptive Solutions. URL http://www.adaptivesolutions.com/.
J. Arthur and K. Boahen. Learning in silicon: Timing is everything. In
Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA, 2006.
J.V. Arthur and K. Boahen. Recurrently connected silicon neurons with
active dendrites for one-shot learning. In IEEE International Joint Conference on Neural Networks, volume 3, pages 1699–1704, July 2004.
D. Badoni, M. Giulioni, V. Dante, and P. Del Giudice. An aVLSI recurrent network of spiking neurons with reconfigurable and plastic synapses. In Proceedings of the IEEE International Symposium on Circuits and Systems, pages 1227–1230. IEEE, May 2006.
R.I. Bahar, C. Lau, D. Hammerstrom, D. Marculescu, J. Harlow, A. Orailoglu, W.H. Joyner, and M. Pedram. Architectures for silicon nanoelectronics and beyond. IEEE Computer, pages 25–33, 2007.
C. Bartolozzi and G. Indiveri. A spiking VLSI selective attention multi–
chip system with dynamic synapses and integrate and fire neurons. In
B. Schölkopf, J.C. Platt, and T. Hofmann, editors, Advances in Neural
Information Processing Systems 19, Cambridge, MA, Dec 2007a. Neural
Information Processing Systems Foundation, MIT Press. (In press).
C. Bartolozzi and G. Indiveri. Synaptic dynamics in analog VLSI. Neural
Computation, 19:2581–2603, Oct 2007b.
C. Bartolozzi, S. Mitra, and G. Indiveri. An ultra low power current–mode
filter for neuromorphic systems and biomedical signal processing. In IEEE
Proceedings on Biomedical Circuits and Systems (BioCAS06), pages 130–
133, 2006.
A.J. Bell and T.J. Sejnowski. The independent components of natural scenes
are edge filters. Vision Res., 37:3327–3338, Dec 1997.
H.K.O. Berge and P. Hafliger. High-speed serial AER on FPGA. In IEEE International Symposium on Circuits and Systems, pages 857–860, 2007.
G. Bi and M. Poo. Synaptic modification by correlated activity: Hebb’s
postulate revisited. Annu. Rev. Neurosci., 24:139–166, 2001.
G-Q. Bi and M-M. Poo. Synaptic modifications in cultured hippocampal
neurons: Dependence on spike timing, synaptic strength, and postsynaptic
cell type. Jour. of Neuroscience, 18(24):10464–10472, 1998.
B. Blais, L.N. Cooper, and H. Shouval. Formation of direction selectivity in
natural scene environments.
K. A. Boahen. Point-to-point connectivity between neuromorphic chips using
address-events. IEEE Transactions on Circuits and Systems II, 47(5):416–
34, 2000.
K. A. Boahen. A burst-mode word-serial address-event link – I: Transmitter
design. IEEE Circuits and Systems I, 51(7):1269–80, 2004a.
K. A. Boahen. A burst-mode word-serial address-event link – II: Receiver
design. IEEE Circuits and Systems I, 51(7):1281–91, 2004b.
K. A. Boahen. A burst-mode word-serial address-event link – III: Analysis
and test results. IEEE Circuits and Systems I, 51(7):1292–300, 2004c.
K.A. Boahen. Neuromorphic microchips. Scientific American, pages 56–63,
May 2005.
K.A. Boahen. A retinomorphic vision system. IEEE Micro, 16(5):30–39,
October 1996.
K.A. Boahen. Communicating neuronal ensembles between neuromorphic
chips. In T. S. Lande, editor, Neuromorphic Systems Engineering, pages
229–259. Kluwer Academic, Norwell, MA, 1998.
A. Bofill-i Petit and A. F. Murray. Synchrony detection and amplification
by silicon neurons with STDP synapses. IEEE Transactions on Neural
Networks, 15(5):1296–1304, September 2004.
J. Brader, W. Senn, and S. Fusi. Learning real world stimuli in a neural
network with spike-driven synaptic dynamics. Neural Computation, 19:
2881–2912, 2007.
N. Caporale and Y. Dan. Spike Timing-Dependent Plasticity: A Hebbian
Learning Rule. Annu. Rev. Neurosci., Feb 2008.
G. Cauwenberghs and M. A. Bayoumi, editors. Learning on Silicon: Adaptive
VLSI Neural Systems. Kluwer, Boston, MA, 1999.
CAVIAR. Convolution address-event-representation (AER) vision architecture for real-time. IST -2001- 34124 EU Grant, 2002–2006.
V. Chan, S-C. Liu, and A. van Schaik. AER EAR: A matched silicon cochlea
pair with address event representation interface. IEEE Transactions on
Circuits and Systems I, 54(1):48–59, Jan 2006. Special Issue on Sensors.
H. Chen, C.D. Fleury, and A.F. Murray. Continuous-valued probabilistic
behavior in a vlsi generative model. IEEE Trans. on Neural networks, 17
(3):755–770, 2006.
E. Chicca. A Neuromorphic VLSI System for Modeling Spike–Based Cooperative Competitive Neural Networks. PhD thesis, ETH Zürich, Zürich,
Switzerland, April 2006.
E. Chicca, D. Badoni, V. Dante, M. D’Andreagiovanni, G. Salina, S. Fusi, and
P. Del Giudice. A VLSI recurrent network of integrate–and–fire neurons
connected by plastic synapses with long term memory. IEEE Transactions
on Neural Networks, 14(5):1297–1307, September 2003.
E. Chicca, A. M. Whatley, V. Dante, P. Lichtsteiner, T. Delbrück, P. Del Giudice, R. J. Douglas, and G. Indiveri. A multi-chip pulse-based neuromorphic infrastructure and its application to a model of orientation selectivity.
IEEE Transactions on Circuits and Systems I, Regular Papers, 5(54):981–
993, 2007.
S. Choi, S. Mitra, G. Indiveri, S.-C. Liu, and S. Y. Lee. Real-time sound-recognition using neuromorphic hardware. 2008. In preparation.
T. Y. W. Choi, B. E. Shi, and K. Boahen. An on-off orientation selective
address event representation image transceiver chip. IEEE Transactions
on Circuits and Systems I, 51(2):342–353, 2004.
T. Y. W. Choi, P. A. Merolla, J. V. Arthur, K. A. Boahen, and B. E. Shi.
Neuromorphic implementation of orientation hypercolumns. IEEE Transactions on Circuits and Systems I, 52(6):1049–1060, 2005.
M. Coath, J. Brader, S. Fusi, and S.L. Denham. Multiple views of the
response of an ensemble of spectro-temporal features supports concurrent
classification of utterance, prosody, sex and speaker identity. Network,
(2-3):285–300, 2005.
E. Culurciello and A. G. Andreou. A comparative study of access topologies
for chip-level address-event communication channels. IEEE Transactions
on Neural Networks, 14(5):1266–77, September 2003.
E. Culurciello, R. Etienne-Cummings, and K. Boahen. Arbitrated address-event representation digital image sensor. Electronics Letters, 37(24):1443–1445, Nov 2001.
E. Culurciello, R. Etienne-Cummings, and K. Boahen. A biomorphic digital
image sensor. Solid-State Circuits, IEEE Journal of, 38(2):281–294, 2003.
G. Danese and F. Leporati. A parallel neural processor for real-time applications. IEEE Micro, 22(3):20–31, 2002.
V. Dante, P. Del Giudice, and A. M. Whatley. PCI-AER – hardware and software for interfacing to address-event based neuromorphic systems. The Neuromorphic Engineer, 2(1):5–6, 2005. http://ine-web.org/research/newsletters/index.html.
P. Dayan and L.F. Abbott. Theoretical Neuroscience: Computational and
Mathematical Modeling of Neural Systems. MIT Press, 2001.
G. Declerck. A look into the future of nanoelectronics. In IEEE Symposium
on VLSI Technology, 2005. Digest of Technical Papers., pages 6–10, 2005.
S.R. Deiss, R.J. Douglas, and A.M. Whatley. A pulse-coded communications
infrastructure for neuromorphic systems. In W. Maass and C. M. Bishop,
editors, Pulsed Neural Networks, chapter 6, pages 157–78. MIT Press, 1998.
A. Destexhe, Z. F. Mainen, and T. J. Sejnowski. Methods in Neuronal Modeling: From Ions to Networks, chapter Kinetic Models of Synaptic Transmission, pages 1–25. The MIT Press, Cambridge, Massachusetts, 1998.
C. Diorio, P. Hasler, B.A. Minch, and C. Mead. A single-transistor silicon
synapse. IEEE Transactions on Electron Devices, 43(11):1972–1980, 1996.
C. Diorio, P. Hasler, B. A. Minch, and C. A. Mead. A floating-gate MOS
learning array with locally computed weight updates. IEEE Transactions
on Electron Devices, 44(12):2281–2289, December 1997.
M. Djurfeldt, M. Lundqvist, C. Johansson, M. Rehn, Ö. Ekeberg, and A. Lansner. Brain-scale simulation of the neocortex on the Blue Gene/L supercomputer. IBM Journal of Research and Development, 52:31–41, 2008.
R.J. Douglas, M.A. Mahowald, and K.A.C. Martin. Hybrid analog-digital
architectures for neuromorphic systems. In Proc. IEEE World Congress
on Computational Intelligence, volume 3, pages 1848–1853. IEEE, 1994.
R.J. Douglas, M.A. Mahowald, and C. Mead. Neuromorphic analogue VLSI.
Annu. Rev. Neurosci., 18:255–281, 1995.
R. Etienne-Cummings, V. Van der Spiegel, and J. Muller. Hardware implementation of a visual-motion pixel using oriented spatiotemporal neural
filters. IEEE Trans. on Circuits and Systems II, 46:1121–1136, 1999.
FACETS. Fast analog computing with emergent transient states. URL http:
//facets.kip.uni-heidelberg.de/.
E. Farquhar and P. Hasler. A bio-physically inspired silicon neuron. IEEE Transactions on Circuits and Systems I, 52:477–488, 2005.
D. B. Fasnacht, A. M. Whatley, and G. Indiveri. A serial communication infrastructure for multi-chip address event systems. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), 2008. Accepted.
T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters,
(26):861–874, 2006.
D. Frey. Future implications of the log domain paradigm. In IEE Proceedings
Circuits Devices Systems, volume 147, pages 65–72, February 2000.
S. Fusi. Hebbian spike-driven synaptic plasticity for learning patterns of
mean firing rates. Biological Cybernetics, 87:459–470, 2002.
S. Fusi and L. F. Abbott. Limits on the memory storage capacity of bounded
synapses. Nature Neuroscience, 10:485–493, 2007.
S. Fusi and W. Senn. Eluding oblivion with smart stochastic selection of
synaptic updates. Chaos, 16, 2006.
S. Fusi, M. Annunziato, D. Badoni, A. Salamon, and D. J. Amit. Spike–
driven synaptic plasticity: theory, simulation, VLSI implementation. Neural Computation, 12:2227–58, 2000.
R. Genov and G. Cauwenberghs. Stochastic mixed-signal VLSI architecture for high-dimensional kernel machines. In J. E. Moody et al., editor, Advances in Neural Information Processing Systems. Morgan Kaufmann, 2002.
P. Georgiou and C. Toumazou. A silicon pancreatic beta cell for diabetes. IEEE Transactions on Biomedical Circuits and Systems, 1, 2007.
W. Gerstner. What is different with spiking neurons? In H. Mastebroek
and J. E. Vos, editors, Plausible Neural Networks for Biological Modelling.
Kluwer Academic, 2001.
B. Gilbert. Current-mode circuits from a translinear viewpoint: A tutorial.
In C. Tomazou, F. J. Lidgey, and D. G. Haigh, editors, Analogue IC design:
the current-mode approach, chapter 2, pages 11–91. Peregrinus, Stevenage,
Herts., UK, 1990.
M. Giulioni, M. Pannunzi, D. Badoni, V. Dante, and P. Del Giudice. A configurable analog VLSI neural network with spiking neurons and self-regulating plastic synapses. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems (NIPS), 2007.
M. Graupner and N. Brunel. STDP in a bistable synapse model based on
CaMKII and associated signaling pathways. PLoS Comput. Biol., 3:e221,
Nov 2007.
S. Grossberg, E. Mingolla, and D. Todovoric. Threshold voltage mismatch and intra-die leakage current in digital CMOS circuits. IEEE Journal of Solid-State Circuits, 39:157–168, 2004.
GSRC Berkeley. Gigascale Systems Research Lab. URL http://www.gigascale.org/.
R. Gütig and H. Sompolinsky. The tempotron: a neuron that learns spike
timing–based decisions. Nature Neuroscience, 9:420–428, 2006. doi: 10.
1038/nn1643.
P. Häfliger. Adaptive WTA with an analog VLSI neuromorphic learning chip. IEEE Transactions on Neural Networks, 18:551–572, Mar 2007.
P. Hafliger and M. Mahowald. Spike based normalizing Hebbian learning in an analog VLSI artificial neuron. Analog Integrated Circuits and Signal Processing, 18:133–139, 1999.
P. Häfliger and C. Rasche. Floating gate analog memory for parameter and
variable storage in a learning silicon neuron. In Proc. IEEE International
Symposium on Circuits and Systems, Orlando, 1999.
P. Häfliger, M. Mahowald, and L. Watts. A spike based learning neuron in analog VLSI. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 692–698. MIT Press, 1997.
D. Hammerstrom. The Handbook of Brain Theory and Neural Networks, pages 349–353. 2002.
J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural
Computation. Addison-Wesley, Reading, MA, 1991.
A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane
current and its application to conduction and excitation in nerve. Journal
of Physiology, 117:500–44, 1952.
P. Ienne, T. Cornu, and G. Kuhn. Special-purpose digital hardware for neural networks: An architectural survey. Journal of VLSI Signal Processing Systems, pages 5–25, 1996.
G. Indiveri. Neuromorphic bistable VLSI synapses with spike-timing-dependent plasticity. In Advances in Neural Information Processing Systems, volume 15, pages 1091–1098, Cambridge, MA, December 2002. MIT Press.
G. Indiveri. A low-power adaptive integrate-and-fire neuron circuit. In Proc.
IEEE International Symposium on Circuits and Systems, pages IV–820–
IV–823. IEEE, May 2003.
G. Indiveri, E. Chicca, and R. Douglas. A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Transactions on Neural Networks, 17(1):211–221, 2006a.
G. Indiveri, E. Chicca, and R. Douglas. A VLSI array of low-power spiking neurons and bistable synapses with spike–timing dependent plasticity.
IEEE Transactions on Neural Networks, 17(1):211–221, Jan 2006b.
G. Indiveri, S.-C. Liu, T. Delbruck, and R. Douglas. The New Encyclopedia of Neuroscience, chapter Neuromorphic systems. Elsevier, 2008. In press.
Intel. Teraflops research chip. http://techresearch.intel.com/articles/TeraScale/1449.htm, 2007.
ITRS 2007. URL http://www.itrs.net/Links/2007ITRS/ExecSum2007.
pdf.
E. M. Izhikevich. Simulation of large-scale brain models. http://vesicle.nsi.edu/users/izhikevich/human_brain_simulation/Blue_Brain.htm, 2005.
J. G. Jenkins and K. M. Dallenbach. Obliviscence during sleep and waking. American Journal of Psychology, 35:605–612, 1924.
B. Keeth and J. Baker. DRAM Circuit Design, A Tutorial. IEEE, New York,
7th edition, 2000.
R. Kempter, W. Gerstner, and J. L. van Hemmen. Hebbian learning and spiking neurons. Physical Review E, 59(4):4498–4514, 1999.
R. Kempter, W. Gerstner, and J. L. van Hemmen. Intrinsic stabilization of output firing rates by spike-based Hebbian learning. Neural Computation, 59(4):4498–4514, 2001.
P. Kinget. Device mismatch and tradeoffs in the design of analog circuits. IEEE Journal of Solid-State Circuits, 40(6), 2005.
T. J. Koickal, A. Hamilton, S. L. Tan, J. A. Covington, J. W. Gardner, and T. C. Pearce. Analog VLSI circuit implementation of an adaptive neuromorphic olfaction chip. IEEE Transactions on Circuits and Systems, 54(1):60–, 2007.
J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead. Winner-take-all
networks of O(n) complexity. In D.S. Touretzky, editor, Advances in neural
information processing systems, volume 2, pages 703–711, San Mateo - CA,
1989. Morgan Kaufmann.
J. P. Lazzaro. Silicon Implementation of Pulse Coded Neural Networks, chapter Low-power silicon axons, neurons, and synapses, pages 153–164. Kluwer
Academic Publishers, 1994.
R.A. Legenstein, C. Näger, and W. Maass. What can a neuron learn with
spike-timing-dependent plasticity?
Neural Computation, 17(11):2337–
2382, 2005.
J. J. Letzkus, B. Kampa, and G. J. Stuart. Learning rules for spike timing-dependent plasticity depend on dendritic synapse location. The Journal
W.B. Levy and O. Steward. Temporal contiguity requirements for long-term
associative potentiation/depression in the hippocampus. Neuroscience, 8:
791–797, Apr 1983.
P. Lichtsteiner and T. Delbrück. A 64×64 AER logarithmic temporal derivative silicon retina. In Research in Microelectronics and Electronics, 2005 PhD, volume 2, pages 202–205, July 2005.
P. Lichtsteiner, C. Posch, and T. Delbrück. A 128×128 120dB 30mW asynchronous vision sensor that responds to relative intensity change. In 2006
IEEE ISSCC Digest of Technical Papers, pages 508–509. IEEE, 2006.
Ligature. OCR that reads the way you do. URL http://www.ligatureltd.
com/.
C.S. Lindsay. Neural networks in hardware: Architectures, products
and applications, 2002. URL http://www.particle.kth.se/~lindsey/
HardwareNNWCourse/.
C. S. Lindsey and T. Lindblad. Survey of neural network hardware. Proc.
SPIE, 2492:1194–1205, 1995.
J. Lisman and N. Spruston. Postsynaptic depolarization requirements for
LTP and LTD: a critique of spike timing-dependent plasticity. Nat. Neurosci., 8:839–841, Jul 2005.
S.-C. Liu and R. Douglas. Temporal coding in a silicon network of integrate-and-fire neurons. IEEE Transactions on Neural Networks, 15(5):1305–1314, Sep 2004.
R.F. Lyon and C. Mead. An analog electronic cochlea. IEEE Transactions
on Acoustics, Speech, and Signal Processing, 36(7):1119–1134, 1988.
W. Maass and C. M. Bishop. Pulsed Neural Networks. MIT Press, 1998.
J. Mahattanakul and C. Toumazou. Modular log-domain filters based upon linear Gm-C filter synthesis. IEEE Transactions on Circuits and Systems I, pages 1421–1430, 1999.
M. Mahowald. An Analog VLSI System for Stereoscopic Vision. Kluwer,
Boston, MA, 1994.
M.A. Mahowald. VLSI analogs of neuronal visual processing: a synthesis of
form and function. PhD thesis, Department of Computation and Neural
Systems, California Institute of Technology, Pasadena, CA., 1992.
H. Markram. The blue brain project. Nat. Rev. Neurosci., 7:153–160, Feb
2006.
H. Markram, J. Lübke, M. Frotscher, and B. Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275:
213–215, 1997.
A. J. Martin and M. Nystrom. Asynchronous techniques for system-on-chip
design. Proceedings of the IEEE, 94:1089–1120, 2006.
A.J. Martin, M. Nystrom, and C.G. Wong. Three generations of asynchronous microprocessors. Design and Test of Computers, IEEE, 26:9–17,
2003.
S.J. Martin, P.D. Grimwood, and R.G. Morris. Synaptic plasticity and memory: an evaluation of the hypothesis. Annu. Rev. Neurosci., 23:649–711,
2000.
F. Martorell, S. D. Cotofana, and A. Rubio. An analysis of internal parameter variations effects on nanoscaled gates. IEEE Transactions on Nanotechnology, pages 24–33, 2008.
C. Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):
1629–36, October 1990.
C. Mead and L. Conway. Introduction to VLSI Systems. Addison-Wesley,
Reading, Massachusetts, 1980.
C.A. Mead. Analog VLSI and Neural Systems. Addison-Wesley, Reading,
MA, 1989.
P. A. Merolla, J. V. Arthur, B. E. Shi, and K. A. Boahen. Expandable networks for neuromorphic chips. IEEE Transactions on Circuits and Systems
I: Fundamental Theory and Applications, 54(2):301–311, Feb. 2007.
M. L. Minsky and S. A. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
S. Mitra, G. Indiveri, and S. Fusi. Learning to classify complex patterns
using a VLSI network of spiking neurons. In B. Schölkopf, J. Platt, and
T. Hoffman, editors, Advances in Neural Information Processing Systems,
Cambridge (MA), 2008. MIT Press. (In Press).
S. Mitra and G. Indiveri. A low-power dual-threshold comparator for neuromorphic systems. In 2005 PhD Research in Microelectronics and Electronics, volume 2, pages 402–405, Lausanne, Jul 2005. IEEE.
D. Muir. Spike toolbox. http://www.ini.uzh.ch/~dylan/spike_toolbox/, 2005.
S. Musallam, B.D. Corneil, B. Greger, H. Scherberger, and R.A. Andersen.
Cognitive control signals for neural prosthetics. Science, 305:258–262, Jul
2004.
J. P. Nadal, G. Toulouse, J. P. Changeux, and S. Dehaene. Networks of
formal neurons and memory palimpsests. Europhys. Lett., pages 535–542,
1986.
NanoCMOSgrid. Meeting the design challenges of nano-CMOS electronics. URL http://www.nanocmos.ac.uk/.
T. Natschläger and W. Maass. Spiking neurons and the induction of finite
state machines. Theoretical Computer Science: Special Issue on Natural
Computing, 287(251–265), 2002.
S.B. Nelson, P.J. Sjostrom, and G.G. Turrigiano. Rate and timing in cortical
synaptic plasticity. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 357:1851–
1857, Dec 2002.
D.H. O’Connor, G.M. Wittenberg, and S.S. Wang. Graded bidirectional
synaptic plasticity is composed of switch-like unitary events. Proc. Natl.
Acad. Sci. U.S.A., 102:9679–9684, Jul 2005.
M. J. Pearson, A. G. Pipe, K. Mitchinson, B. Gurney, C. Melhuish, I. Gilhespy, and M. Nibouche. Implementing spiking neural networks for real-time signal-processing and control applications: A model-validated FPGA approach. IEEE Transactions on Neural Networks, 18(5):1472–87, 2007.
C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield. All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad. Sci., 95:4732, 1998.
P. Hasler, B. A. Minch, J. Dugger, and C. Diorio. Adaptive circuits and synapses using pFET floating-gate devices. Kluwer Academic, 1999.
R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 3, 2006.
J.M. Rabaey. Design at the end of the silicon roadmap. In IEEE Design
Automation Conference, page Keynote Address III, 2005.
W. Rall. Distinguishing theoretical synaptic potentials computed for different
soma-dendritic distributions of synaptic input. Journal of neurophysiology,
30(5):1138–1168, September 1967.
E. Ros, E. M. Ortigosa, R. Agis, R. Carrillo, and M. Arnold. Real-time computing platform for spiking neurons (RT-spike). IEEE Transactions on Neural Networks, 17(4):1050–1062, 2006.
J. Rubin, D. Lee, and H. Sompolinsky. The equilibrium property of temporal asymmetric Hebbian plasticity. Physical Review Letters, pages 364–367, 2001.
D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, editors, Parallel Distributed Processing, pages 318–362. MIT Press, 1986.
T. Sakurai. Optimization of CMOS arbiter and synchronizer circuits with submicrometer MOSFETs. IEEE Journal of Solid-State Circuits, 23:901–906, 1988.
A. Sandberg, A. Lansner, K.M. Petersson, and O. Ekeberg. A Bayesian
attractor network with incremental learning. Network, 13:179–194, May
2002.
R. Sarpeshkar. Brain power – borrowing from biology makes for low power computing – bionic ear. IEEE Spectrum, 43(5):24–29, May 2006.
R. Sarpeshkar. Analog versus digital: Extrapolating from electronics to neurobiology. Neural Computation, 10(7):1601–1638, October 1998.
J. Schemmel, A. Grubl, K. Meier, and E. Mueller. Implementing synaptic plasticity in a VLSI spiking neural network model. In Neural Networks, 2006. IJCNN ’06. International Joint Conference on, pages 1–6, 2006.
J. Schemmel, D. Bruderle, K. Meier, and B. Ostendorf. Modeling synaptic plasticity within networks of highly accelerated I&F neurons. In Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on, pages 3367–3370, 2007.
E. Seevinck. Companding current-mode integrator: A new circuit principle for continuous-time monolithic filters. Electronics Letters, 26(24):2046–2047, November 1990.
W. Senn and S. Fusi. Learning Only When Necessary: Better Memories of
Correlated Patterns in Networks with Bounded Synapses. Neural Computation, 17(10):2106–2138, 2005.
F. Serra-Graells and J. L. Huertas. Low-voltage CMOS subthreshold log-domain filtering. IEEE Transactions on Circuits and Systems I, pages 2090–2100, 2005.
R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, H. Kolle Riis, T. Delbrück, S. C. Liu, S. Zahnd, A. M. Whatley, R. J. Douglas, P. Häfliger, G. Jimenez-Moreno, A. Civit, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco. AER building blocks for multi-layer multi-chip neuromorphic vision systems. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15. MIT Press, Dec 2005.
R. Serrano-Gotarredona, T. Serrano-Gotarredona, A. Acosta-Jimenez, and B. Linares-Barranco. A neuromorphic cortical-layer microchip for spike-based event processing vision systems. IEEE Transactions on Circuits and Systems I, 53(12):2548–2566, Dec. 2006.
M. Shams, J. C. Ebergen, and M. I. Elmasry. Modeling and comparing
CMOS implementations of the c–element. IEEE Transactions on VLSI
Systems, 6(4):563–7, 1998.
N. Y. Shen, Z. Liu, C. Lee, B. A. Minch, and E. C. Kan. Charge-based chemical sensors: A neuromorphic approach with chemoreceptive neuron MOS (CνMOS) transistors. IEEE Transactions on Electron Devices, 50(10):2171–2178, 2003.
A.P. Shon, D. Hsu, and C. Diorio. Learning spike-based correlations and
conditional probabilities in silicon. In Advances in Neural Information
Processing Systems, pages 1123–1130, 2002.
H.Z. Shouval, M.F. Bear, and L.N. Cooper. A unified model of NMDA
receptor-dependent bidirectional synaptic plasticity. Proc. Natl. Acad. Sci.
U.S.A., 99:10831–10836, Aug 2002.
M. Sivilotti. Wiring Considerations in Analog VLSI Systems with Applications to Field Programmable Networks. PhD thesis, California Inst. Technol., Pasadena, CA, 1991.
P.J. Sjostrom, G.G. Turrigiano, and S.B. Nelson. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron, 32:1149–1164,
Dec 2001.
S. Still, K. Hepp, and R. J. Douglas. Neuromorphic walking gait control. IEEE Transactions on Neural Networks, 17:496–508, 2006.
I. E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720–
738, 1989.
Synaptics. URL http://www.synaptics.com/.
T. Teixeira, E. Culurciello, J. Park, D. Lymberopoulos, A. Barton-Sweeney, and A. Savvides. Address-event imagers for sensor networks: Evaluation and modeling. In Information Processing in Sensor Networks, pages 19–21, 2006.
Y. Tsividis. Externally linear, time-invariant systems and their application
to companding signal processors. IEEE Trans. Circuits and Systems, pages
65–85, 1997.
M. Valle. Analog VLSI implementation of artificial neural networks with
supervised on-chip learning. Analog Integrated Circuits and Signal Processing, 33:263–287, 2002.
A. van Schaik. Building blocks for electronic spiking neural networks. Neural
Networks, 14(6–7):617–628, Jul–Sep 2001.
E.A. Vittoz. Analog VLSI for collective computation. In Proc. IEEE Int.
Conf. on Electronic Circuits and Systems, volume 2, pages 3–6, 1998.
R. J. Vogelstein, F. Tenore, R. Philipp, M. S. Adlerstein, D. H. Goldberg, and G. Cauwenberghs. Spike timing-dependent plasticity in the address domain. In Advances in Neural Information Processing Systems, Cambridge, MA, 2003. MIT Press.
R. J. Vogelstein, F. Tenore, R. Etienne-Cummings, M. A. Lewis, N. Thakor, and A. Cohen. Control of locomotion after injury or amputation. Biological Cybernetics, 95:555–566, 2006.
R. J. Vogelstein, U. Malik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings. A multichip neuromorphic system for spike-based visual information processing. Neural Computation, 19:2281–2300, 2007.
Y-X. Wang and S-C. Liu. Programmable synaptic weights for an aVLSI network of spiking neurons. In IEEE International Symposium on Circuits and Systems, pages 4531–4534, 2006.
B. Wen and K. Boahen. Active bidirectional coupling in a cochlear chip.
In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems, volume 18, pages 1497–1504. MIT Press,
Cambridge, MA, 2006.
X. Xie and H. S. Seung. Spike-based learning rules and stabilization of
persistent neural activity. In Advanced Research in Asynchronous Circuits
and Systems, 2000.
Z. Yang, A. Murray, F. Worgotter, K. Cameron, and V. Boonsobhak. A neuromorphic depth-from-motion vision model with STDP adaptation. IEEE Transactions on Neural Networks, 17:482–495, 2006.
K. Y. Yun, P.A. Beerel, and J. Areceo. High-performance asynchronous
pipeline circuits. In Advanced Research in Asynchronous Circuits and Systems, pages 17–28, 1996.
K. A. Zaghloul and K. Boahen. An ON-OFF log-domain circuit that recreates adaptive filtering in the retina. IEEE Trans. Circuits and Systems, 52(1):99–107, 2005.
K.A. Zaghloul and K. Boahen. A silicon retina that reproduces signals in the
optic nerve. Journal of Neural Engineering, 3:257–267, December 2006.
doi: 10.1088/1741-2560/3/4/002.