Diss. ETH No. 17821

Learning to classify complex patterns using a VLSI network of spiking neurons

A dissertation submitted to ETH ZURICH for the degree of Doctor of Sciences

Presented by Srinjoy Mitra, MTech Microelectronics, Indian Institute of Technology, Bombay

Thesis committee: Prof. Rodney Douglas, Prof. Stefano Fusi, Dr. Giacomo Indiveri

2008

Abstract

The learning and classification of natural stimuli are accomplished by biological organisms with remarkable ease, even when the input is noisy or incomplete. Such real-time classification of complex patterns of spike trains is a difficult computational problem that artificial neural networks are confronted with. The performance of classical neural network models depends critically on an unrealistic feature: the fact that their synapses have unbounded weights. In contrast, biological synapses face the hard limit of physical bounds as well as the problems of noisy, mismatched elementary devices. Although the reasons for the superiority of the nervous system in the real world are not completely understood, it is obvious that the main methods of neural computation in biology are very different from those of modern digital computers. In the brain, neuronal networks perform local analog computation and transmit information using energy-efficient asynchronous events (spikes). Unlike the digital logic elements built in silicon, the computational primitives in biology (i.e., neurons and synapses) are imprecise, but exhibit highly fault-tolerant behavior. Nevertheless, the basic elements of the neural substrate and those of silicon technology obey similar physical principles. The emerging discipline of neuromorphic engineering recognizes and exploits such similarities, and maps the properties of neural computation onto silicon to implement new types of computing devices. Such a neuromorphic system can be designed to emulate the spike-based synaptic plasticity of a biological neural network, the root of learning and classification.

The goal of this project was to build a spike-based hardware device that exhibits memory formation and classification, in real time and with minimal power consumption. Understanding how to accomplish this in VLSI networks of spiking neurons can not only contribute to an insight into the fundamental mechanisms of computation used in the brain, but could also lead to efficient hardware implementations for a wide range of applications, from autonomous sensory-motor systems to brain-machine interfaces.

In this thesis, a silicon implementation of a novel spike-based supervised learning mechanism that utilizes bounded synapses with limited analog resolution is presented. The learning mechanism modifies the synaptic weights only as long as the current generated by all the stimulated plastic synapses does not match the output desired by the supervisor, as in the perceptron learning rule (Brader et al., 2007). This thesis also describes the development and verification of the hardware system capable of performing the reliable event-based, asynchronous communication necessary for neuromorphic systems. It shows how the modules involved in designing the communication channel can be improved using asynchronous circuit design techniques. Using the device developed, real-time classification of complex patterns of mean firing rates is carried out. The circuits responsible for synaptic plasticity and their dependence on pre- and post-synaptic signals are extensively characterized in this work.
The results include experimental data describing the behavior of the device in classifying random uncorrelated binary patterns, together with a quantification of the memory capacity. The proposed system demonstrates, for the first time, robust classification of highly correlated spike patterns on a silicon device. It could successfully learn graded and corrupted patterns, a step toward the classification of real-life spike trains from silicon sensors or from nerve signals. The thesis demonstrates how the scaling properties of the VLSI system match those of the theoretical learning rule. The VLSI system developed exhibits superior performance when compared to state-of-the-art spike-based learning systems. This device is an ideal candidate for low-power biomedical applications or for integration into a multi-chip spike-based neuromorphic system. It is already under examination for the classification of spoken vowels, captured by a silicon cochlea. A larger system has recently been built and is undergoing characterization to demonstrate an even higher classification performance.
Contents

1 Introduction
1.1 Biologically inspired hardware
1.1.1 Digital vs Analog
1.1.2 Spiking vs rate models
1.2 Why Learning?
1.3 Thesis outline
2 Biophysical models of learning
2.1 Introduction
2.2 Spike-driven plasticity
2.3 The palimpsest property
2.3.1 Bounded and bistable synapses
2.4 Stochastic update and stop-learning mechanisms
2.5 The learning rule
2.6 Network description
3 AER Communication circuits
3.1 Introduction
3.2 Basics of AER communication
3.3 Single sender and single receiver
3.3.1 Pipelining the data
3.4 Multiple sender and multiple receiver
3.4.1 Data path design
3.5 Receiver handshake
3.6 Arbitration basics
3.6.1 Standard arbiter
3.6.2 Fast, unfair arbiter
3.6.3 Fair arbiter
3.7 Decoder
3.7.1 Delay
3.7.2 Address Latching
3.7.3 Receiver Synapse Select
3.8 Conclusions
4 Circuits for synaptic plasticity
4.1 Introduction
4.2 The IFSL family of chips
4.3 The pre-synaptic module
4.3.1 The pulse extender block
4.3.2 The weight update block
4.3.3 The bistability block
4.3.4 The EPSC block
4.3.5 Point-neuron architecture
4.4 The post-synaptic module
4.4.1 The I&F soma block
4.4.2 The pulse integrator block
4.4.3 The dual threshold comparator block
4.5 Configuration of synaptic density
4.6 Conclusions
5 Characterization of the plasticity circuits
5.1 Introduction
5.2 The post-synaptic module
5.3 The pre-synaptic module
5.4 Transition probabilities
5.5 STDP phase relation
5.6 Multiplexer functionality
5.7 Conclusions
6 Spike based learning and classification
6.1 Introduction
6.2 Network architecture
6.3 Training methodology
6.4 Evolution of synaptic weights
6.5 Classifying multiple spatial patterns
6.5.1 Uneven class distributions
6.6 Quantitative analysis of classification performance
6.6.1 Boosting the classifier performance
6.7 Classification of correlated patterns
6.8 Classification of graded patterns
6.9 Conclusions
7 Discussion
7.1 Relevance of the work described in this thesis
7.1.1 A robust AER communication system
7.1.2 Synaptic plasticity in silicon
7.1.3 Learning and classification in VLSI
7.2 Future Work
7.3 Outlook
A C-element
B Hand Shaking Expansion
C Current-mode log domain Filter
Bibliography

List of Figures

1.1 Voltage clamp experiment in biology and silicon
1.2 Data from voltage clamp experiment
1.3 Generalized artificial behavioral system
1.4 Structure of the hippocampus and details of a synapse
2.1 Spike time dependent plasticity data and model
2.2 Memory retention experiment
2.3 Theoretical learning rule, simulation results
2.4 Architecture for a feedforward network
3.1 Comparison between real and virtual axon
3.2 Schematic of an AER system
3.3 Fundamentals of synchronous pipeline and 4-phase handshaking
3.4 Pipelining in AER communication cycle
3.5 Implementation of pipeline in AER communication channel
3.6 Data exchange between combinational blocks
3.7 Conventional and wired-OR gate
3.8 On-chip verification of AER event
3.9 AER communication failure
3.10 Mutual exclusion circuit
3.11 Output of mutual exclusion circuit
3.12 General arbitration scheme
3.13 Standard arbiter cell
3.14 Standard arbiter timing diagram
3.15 Fast, unfair arbiter cell
3.16 Data from fast, unfair arbiter
3.17 Fast, fair arbiter cell
3.18 Data from fast, fair arbiter
3.19 Traditional decoder scheme
3.20 RC delay in decoder
3.21 A pre-decoding system
3.22 Pre-decoder layout
3.23 Dual rail data communication
3.24 AER receiver chip implementation
3.25 Pixel select circuit in AER receiver
4.1 Layout of neurons and synapses on the silicon chip
4.2 Pulse Extender (PE) circuit
4.3 Simulation result of the PE circuit
4.4 Weight update block
4.5 Bistability block
4.6 Dependence of EPSC on bistability output
4.7 Alternative design of the bistability block
4.8 Simulation results of the bistability block
4.9 Model of a silicon synapse
4.10 Differential pair integrator (DPI) schematic
4.11 Point-neuron architecture and its silicon equivalent
4.12 Learn control signals generated from the post-synaptic module
4.13 Integrate-and-fire neuron stimulated at the soma and synapse
4.15 Coupling between global shared signals
4.16 Dual threshold voltage comparator
4.17 Data from IFSL-v1 chip
4.18 WTA circuit used as current comparator
4.19 Complete post-synaptic module
4.20 Active current mirror
4.21 Use of a multiplexer to reconfigure synaptic density
5.1 Silicon I&F neuron
5.2 Verification of the learn control functionality
5.3 Stimulation protocols
5.4 Stochastic transition
5.5 Synaptic update depends on control voltages
5.6 Dependence of transition probability on νpost
5.7 Stochastic transition without initialization
5.8 Stochastic LTP transition with initialized synapses
5.9 Stochastic LTD transition with initialized synapses
5.10 STDP phase relation
5.11 Multiplexer verification data
6.1 Binary patterns for training
6.2 Evolution of νpost for a C+ training
6.3 Evolution of νpost for a C− training
6.4 Classification result for four spatial patterns
6.5 Classification result for six and eight spatial patterns
6.6 Pattern recognition of 2D binary images
6.7 Memory recall from corrupted data set
6.8 Classification of one pattern out of many
6.9 Basics of ROC analysis
6.10 Classification performance and memory capacity
6.11 Boosting the classification performance
6.12 Classification performance for correlated patterns
6.13 Classification performance with and without stop-learning
6.14 Classification performance for graded patterns
6.15 Classification performance for Gaussian distributed input
A.1 C-element truth table
A.2 C-element implementation
C.1 Basic log-domain filter
C.2 Gain in log-domain filter
Chapter 1

Introduction

Understanding the functional and structural properties of the brain has been a baffling task, challenging scientists for a long time. While neuroscientists, coming from virtually all disciplines of science, are hard put to decipher this enigma, engineers are often reluctant to dive into the realm of biology. Although it is widely accepted that evolution has done an excellent job in engineering the organization of biological systems, traditionally engineered systems have failed to incorporate its virtues. This has been particularly true for the computing industry, which is solely driven by the kind of problem it wants to solve, i.e., high-speed data crunching. Conventional methods of computation and information processing have nevertheless increased computing power by many orders of magnitude since the early ENIAC¹, and must be acknowledged as the most successful technology of the past century. However, even the most advanced general-purpose computer falls far behind the capacity of a fly brain where simple behavioral tasks (navigation, pattern recognition, communication) are concerned. As Carver Mead, one of the forefathers of large-scale electronics on silicon, pointed out (Mead, 1990):

Biological information-processing systems operate on completely different principles from those with which most engineers are familiar. For many problems, particularly those in which the input data are ill-conditioned and the computation can be specified in a relative manner, biological solutions are many orders of magnitude more effective than those we have been able to implement using digital methods.

¹ Unveiled in 1946 at the University of Pennsylvania, ENIAC was the first general-purpose electronic computer that was Turing complete. It weighed 27 tons, occupied 63 m² and consumed 150 kW of power.

He envisioned that the technology of silicon chips could be appropriately utilized to morph the computational primitives of biological systems. This is in stark contrast to how digital computers function. Digital computers are good at precision-driven arithmetic operations, while animals have evolved brains and senses that let them interact efficiently with the imprecise inputs of the real world.

A comparison between the efficiency of the human brain and standard digital computers, though qualitative, highlights some huge discrepancies. Roughly, the 10^12 neurons in the human brain, each with an average of 10^3 synaptic connections, spike at an average rate of 10 Hz. This adds up to nearly 10^16 synaptic events per second. From measurements of cerebral blood flow and oxygen consumption, it is estimated that the brain consumes around 10 W, burning a mere 10^-15 J per operation (Mead, 1990; Sarpeshkar, 1998). On the other hand, the bleeding-edge Intel 80-core teraflop (10^12 floating-point operations per second) processor consumes nearly 100 W (Intel, 2007), burning 10^-10 J per operation (roughly an order of magnitude improvement compared to IBM Blue Gene, the most powerful supercomputer to date). Even these prohibitively expensive, gigantic prototype machines, the results of multi-million dollar research, are nearly a million times less efficient than the brain.
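To make the arithmetic explicit, the comparison can be reproduced in a few lines; this is only a back-of-the-envelope check using the rounded figures quoted above, not a new measurement:

```python
# Rough energy-per-operation comparison between the brain and a digital
# processor, using the rounded estimates quoted in the text.
neurons = 1e12             # neurons in the human brain (order of magnitude)
synapses_per_neuron = 1e3  # average synaptic connections per neuron
mean_rate_hz = 10          # average firing rate

synaptic_events_per_s = neurons * synapses_per_neuron * mean_rate_hz
brain_power_w = 10
brain_j_per_op = brain_power_w / synaptic_events_per_s

teraflop_power_w = 100
teraflop_ops_per_s = 1e12
chip_j_per_op = teraflop_power_w / teraflop_ops_per_s

print(f"synaptic events/s: {synaptic_events_per_s:.0e}")  # ~1e16
print(f"brain            : {brain_j_per_op:.0e} J/op")    # ~1e-15 J
print(f"teraflop chip    : {chip_j_per_op:.0e} J/op")     # ~1e-10 J
print(f"efficiency gap   : {chip_j_per_op / brain_j_per_op:.0e}x")
```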
However, some researchers intend to understand the behavior of large neural networks (comparable to the size of the brain) by simulating them on large clusters of these super-fast, general-purpose digital computers. E.M. Izhikevich simulated 10^11 neurons for one second on a Beowulf cluster of twenty-seven 3 GHz machines; the simulation took fifty days to complete (Izhikevich, 2005)! The impressive Blue Gene/L supercomputer at EPFL, Switzerland, with its 8000 processors, will be able to simulate detailed models of just 10,000 neurons constituting a single cortical column (Markram, 2006). Anders Lansner and his team are undertaking a similar effort at KTH (Sweden), simulating 100 cortical hypercolumns with 22 million neurons, but more than 5000 times slower than real time (Djurfeldt et al., 2008). These attempts show the difficulties faced by general-purpose digital computers in simulating even very basic functionalities of the brain in an efficient manner. Custom hardware, in contrast, need not be limited to simulating large neural networks: it can emulate them. To build a brain-like intelligent system, one should design computational primitives on silicon using the same physical laws as in biology.

1.1 Biologically inspired hardware

As research in artificial neural networks gained momentum during the late 1980s, a group of researchers started looking into its hardware implementation. Two distinct schools of thought soon emerged, based on digital and analog neuromorphic hardware. As most VLSI designers around the globe consider digital design their forte, their familiarity with the field and the availability of high-level languages for design synthesis often made it an obvious choice over analog design. However, a small but dedicated community of analog designers, led by Carver Mead, pioneered the field of neuromorphic VLSI, which takes advantage of the inherent analog nature of silicon and biology (see Boahen, 2005; Sarpeshkar, 2006; Indiveri et al., 2008, for recent reviews). Neuromorphic hardware offers a medium in which neuronal networks can be emulated directly, in real time and with extremely low power consumption.

1.1.1 Digital vs Analog

There was substantial interest in developing dedicated digital hardware for neural networks in the early 1990s. The fast design cycle, flexibility of design and high precision of digital computation initiated a deluge of specialized chips from both industry and academia (Lindsey and Lindblad, 1995). SIMD (single instruction, multiple data) parallel processors, namely CNAPS (Hammerstrom, 2002), were successful in implementing large networks at speeds much greater than those of conventional microprocessors. Perhaps the most promising technology for digitally emulating neural networks today is the Field Programmable Gate Array (FPGA). These are semiconductor devices containing programmable logic components, programmable interconnects and some memory elements. Their ease of reconfiguration and very fast design cycle give them a leading edge over all other hardware implementations. Unlike a general-purpose processor, which divides computation over time, dedicated hardware like an FPGA divides it across space by using more physical resources. This inherently parallel computing architecture gives FPGA devices an additional advantage as candidates for neural hardware. FPGAs are now available with hardware-optimized multipliers and with larger on-chip memory to generate huge networks of spiking neurons (Ros et al., 2006). Due to limited silicon real estate, FPGA neural networks almost always use fixed-point arithmetic instead of standard floating-point arithmetic. This is generally not a big handicap, as precision can often be traded for redundancy in such networks. However, this design choice has potential pitfalls, since it is not clear whether a given fixed-point representation will allow a faithful implementation of the underlying model (Pearson et al., 2007).
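As an aside, the fixed-point constraint is easy to illustrate. The sketch below uses an arbitrary 16-bit signed format; the Q-format and parameter values are illustrative choices, not those of any cited FPGA design:

```python
# Illustration of fixed-point quantization as used on FPGAs: a value stored
# with n_frac fractional bits can only represent multiples of 2**-n_frac.

def to_fixed(x, n_frac=8, n_bits=16):
    """Quantize x to a signed fixed-point value with n_frac fractional bits."""
    scale = 1 << n_frac
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # saturate instead of wrapping
    return q / scale

w = 0.123456789                 # e.g. a synaptic weight
print(to_fixed(w, n_frac=8))    # 0.125        : coarse, cheap in area
print(to_fixed(w, n_frac=12))   # 0.1235351... : finer, but more bits per unit
```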
Apart from that, the high area and power budget required for digital computation becomes a limiting factor where larger networks are concerned.

Figure 1.1: Voltage clamp experiment. (a) Schematic of the arrangement used to measure current through the cell membrane of a squid axon, where the membrane potential is clamped to a fixed voltage. The Sense electrode measures the actual intracellular potential while the current through the Drive electrode pushes it towards the desired Control voltage. (Adapted from Hodgkin and Huxley (1952).) (b) A similar experimental setup duplicated for an nMOS transistor connected as a two-terminal device to measure the current through it.

Fundamental considerations show that digital processing is more efficient than analog with respect to power consumption (as well as chip area) when the required signal-to-noise ratio (or, more generally, the required precision) is large, typically larger than 60 dB (Sarpeshkar, 1998). Conversely, analog processing is more efficient when low precision is acceptable. This is the case for evaluative processing, in which the need for precision in individual cells is replaced by collective computation in massively parallel structures (Vittoz, 1998). In analog circuitry, complex nonlinear operations such as multiplication, division and hyperbolic tangents can be performed with a handful of transistors. Analog computation in the subthreshold domain² comes with the added benefits of extremely low power consumption and an exponential transfer function.

² In MOS transistors the amount of current (i) flowing between two terminals (source and drain) is controlled by the voltage (v) applied at a third terminal (gate). Transistors operated at very low gate voltages show an exponential i-v relation (subthreshold), whereas at higher gate voltages the relation is quadratic. Most analog and digital circuits, however, use MOS transistors at gate voltages above the subthreshold region.

Fig. 1.1(a) shows how the current through a cell membrane is measured while the membrane potential is clamped to a desired value (Control voltage), and Fig. 1.2(a) shows the measured current. A similar method applied to a MOS transistor operating in the subthreshold regime (Fig. 1.1(b)) produces the nearly identical i-v characteristics shown in Fig. 1.2(b).

Figure 1.2: (a) Exponential current-voltage characteristics of voltage-dependent membrane channels (Hodgkin and Huxley, 1952), measured with the voltage clamp experiment of Fig. 1.1(a). (b) Analogous current-voltage relation from an nMOS silicon transistor connected as in Fig. 1.1(b).

Carver Mead realized that this similarity in the nature of charge transport in a cell membrane and in a silicon transistor, both following the Boltzmann distribution law, can be exploited to build circuits that mimic biology.
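Footnote 2 can be illustrated numerically. A minimal sketch assuming the textbook subthreshold expression I = I0·exp(Vgs/(n·UT)) and a quadratic strong-inversion law; all parameter values are illustrative, not taken from a specific process:

```python
import math

# Subthreshold (exponential) vs above-threshold (quadratic) nMOS current.
I0 = 1e-15    # pre-exponential leakage-scale current [A] (illustrative)
n = 1.5       # subthreshold slope factor
UT = 0.025    # thermal voltage kT/q at room temperature [V]
Vth = 0.7     # threshold voltage [V]
k = 1e-4      # strong-inversion transconductance parameter [A/V^2]

def ids(vgs):
    if vgs < Vth:                        # weak inversion: exponential in Vgs
        return I0 * math.exp(vgs / (n * UT))
    return 0.5 * k * (vgs - Vth) ** 2    # strong inversion: quadratic in Vgs

for vgs in (0.2, 0.3, 0.4, 0.9):
    print(f"Vgs={vgs:.1f} V  Ids={ids(vgs):.3e} A")
# Below threshold, each 0.1 V step multiplies the current by
# exp(0.1 / (n * UT)) ~ 14x: the exponential regime exploited by
# subthreshold neuromorphic circuits.
```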
Douglas et al. (1995) showed that, in general, neural computational primitives such as conservation of charge, amplification, exponentiation, thresholding, compression and integration arise naturally out of the physical processes of analog circuits. These observations provided solid ground for neuromorphic engineers around the world to flourish with ideas for emulating the nervous system.

1.1.2 Spiking vs rate models

The basis of communication between neurons in the brain, and also in artificial neural networks, is their output activity pattern. Traditionally, neural networks used rate-based models, where the normalized average firing rate (a quantity between 0 and 1) is the information transmitted to another neuron. These models were successful in explaining various neuronal mechanisms, including learning (Bell and Sejnowski, 1997; Blais et al.). Rate models could be efficiently implemented in dedicated hardware, both digital (Lindsey and Lindblad, 1995; Danese, 2002) and analog (Valle, 2002). However, they have limitations from both theoretical and practical perspectives. These models do not address phenomena such as temporal coding, spike-timing-dependent synaptic plasticity, or any short-time behavior of neurons. Spike-based coding, in contrast, allows spatio-temporal information to be incorporated in communication and computation, as real neurons do (Gerstner, 2001). Where hardware implementation is concerned, communicating analog signals (neuron firing rates) between chips seriously limits the system bandwidth and also the number of processing units that can be implemented per chip. Neuromorphic engineering, on the other hand, has mostly relied on spike-based coding, where fast inter-chip communication can be achieved by transmitting digital spikes multiplexed on a single bus (see Chapter 3 for a detailed discussion).

Figure 1.3: A generalized artificial system for behavioral tasks (Stimulus → Sensor → Front-end signal processing → Feature extraction → Learning/Classification → Motor output → Behavior). The system partitions, though much simpler than those in biology, consist of relevant modules which can have neuromorphic analogues. The modules should be able to transfer information among themselves efficiently, in real time.

A generalised multi-chip behavioral system would ideally have front-end signal processing circuits connected to the sensor, followed by higher-order processing and finally a motor control unit. Fig. 1.3 shows such a system in simplified form, where each subsystem can be considered a separate neuromorphic chip. Taking advantage of the physics of silicon, the sensors and front-end processing circuits (often integrated together) are designed to encode the stimulus in a spike code. Given spike-based inter-chip communication, it is logical to design the higher-order computational units around spike-based coding as well. Here we focus on one of the most essential features of an artificial behavioral system: spike-based learning and classification implemented on a VLSI device.
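As a minimal illustration of the mean-rate spike coding assumed throughout this thesis, the following sketch generates a Poissonian spike train and recovers only its mean rate (illustrative code, not the encoding scheme of any particular chip):

```python
import random

def poisson_spike_train(rate_hz, duration_s, dt=1e-3):
    """Binary spike train: each time bin spikes with probability rate*dt."""
    return [1 if random.random() < rate_hz * dt else 0
            for _ in range(int(duration_s / dt))]

random.seed(0)
train = poisson_spike_train(rate_hz=50, duration_s=2.0)
estimated_rate = sum(train) / 2.0
print(f"estimated mean rate: {estimated_rate:.1f} Hz")  # close to 50 Hz
# Individual spike times vary from trial to trial; only the mean rate,
# the quantity used by the learning rule in later chapters, is stable.
```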
1.2 Why Learning?

Most of the effort in the early days of neuromorphic engineering was focused on designing efficient sensors and, subsequently, the front-end processing circuits necessary for them. Vision and audition are, by far, the two most researched topics in this field. A number of vision chips, varying in the degree to which they truly emulate the biology of the retina, have been built; from the early days of Mahowald (1992) to Culurciello et al. (2003), Zaghloul and Boahen (2006) and Lichtsteiner et al. (2006), notable examples abound. Similarly, Lyon and Mead (1988) paved the path for auditory chips, later developed by Chan et al. (2006), Wen and Boahen (2006) and Sarpeshkar (2006), among others. Recently, various neuromorphic chemical sensors have been successfully demonstrated (Georgiou and Toumazou, 2007; Koickal et al., 2007; Shen et al., 2003), extending the scope of the field even further.

The front-end signal processing for most neuromorphic devices is performed right on the sensor chip, producing spike trains that encode features of the input stimulus. Chips for spatially filtering the data from vision sensors (Etienne-Cummings et al., 1999; Zaghloul and Boahen, 2005), for optimized signal extraction from noisy bioelectric interfaces using probabilistic models (Genov and Cauwenberghs, 2002; Chen et al., 2006), or for specialized functions like convolution (Serrano-Gotarredona et al., 2006) also fall into this category. There are a number of examples of neuromorphic chips responsible for feature extraction and other higher-order neuronal processing: orientation selectivity (Choi et al., 2004; Chicca et al., 2007), feature extraction (Vogelstein et al., 2007) and saliency detection (Bartolozzi and Indiveri, 2007a) being a few among them. Neuromorphic chips for motor control are mostly in their early stage of development (Vogelstein et al., 2006; Still et al., 2006). This is partially because of the lack of the robust on-chip classification necessary to stimulate motor behavior in a selective manner.

In building a real-time behavioral system, learning and classification undoubtedly form an integral part. Yet the volume of research on devising a neuromorphic learning chip is comparatively low. One obvious reason is the lack of established models of spike-based learning, essential for the multi-chip system shown in Fig. 1.3. The classification of patterns performed by neural networks is usually the result of a training procedure, during which the synaptic strengths between neurons are modified. It is well established that plasticity of the hippocampal and neocortical synapses in the brain is the root of memory formation (Martin et al., 2000; Shouval et al., 2002). Excitatory synapses throughout the brain are bidirectionally modifiable, a property studied in great detail in layers 2/3 and 5 of the neocortex and in CA1 cells of the hippocampus. However, the actual methodology of synaptic modification is still a matter of debate. Given the complexity of learning and memory, it is possible to find many forms of synaptic plasticity with different mechanisms of induction and expression (Abbott and Nelson, 2000).

Figure 1.4: a) The highlighted hippocampus region is considered to be the center of learning and memory in the primate brain. Bidirectional synaptic plasticity, believed to be the cause of memory formation, has been observed in the hippocampus and in some other cortical synapses. b) A cartoon of two neurons forming a synapse, and the details of the synapse indicating the complex biochemical mechanism during a synaptic event.

Spike-timing-dependent plasticity (see Sec. 2.2) is one possible mechanism, motivated by experimental (Levy and Steward, 1983; Markram et al., 1997) and theoretical studies (Kempter et al., 1999; Abbott and Song, 1999).
It has been taken up by some neuromorphic engineers to devise learning systems (Arthur and Boahen, 2006; Bofill-i Petit and Murray, 2004; Indiveri et al., 2006a) due to its success in solving various computational problems. In fact, a similar form of spike-timing-based learning was proposed in a neuromorphic chip by Hafliger and Mahowald (1999) even before the formal STDP rule gained popularity. In accordance with the theoretical requirements of STDP, Bofill-i Petit and Murray (2004) and Indiveri et al. (2006a) showed methods for analog VLSI implementation of such a synaptic rule. Fusi et al. (2000) and Häfliger (2007) showed the feasibility of other forms of spike-based learning in silicon, including detailed characterisation of the synaptic dynamics. It was only in Arthur and Boahen (2006) and Häfliger (2007) that some basic classification experiments were performed; however, they do not quantify the classification behavior. Apart from that, no one had tried to solve the difficult problem of classifying correlated patterns in a silicon device.

In this thesis we show an efficient VLSI implementation of a very robust spike-based learning rule (Brader et al., 2007). We designed a family of chips, code-named IFSL (the different versions are IFSL-v1, IFSL-v2 and IFSL-WTA), that map the theoretical learning rule onto silicon for real-time learning and classification. We present experimental data, mostly from the IFSL-v2 chip, describing the detailed behavior of the learning circuits at the single neuron and synapse level, and quantify classification results for complex spatial patterns of mean firing rates, both uncorrelated and correlated. IFSL-WTA is still in the process of characterisation. Giulioni et al. (2007) showed preliminary results using the same learning rule implemented in a VLSI chip with much greater area and power overhead (but with added reconfigurability).

1.3 Thesis outline

This thesis is divided into seven chapters. In this chapter I presented some introductory remarks on the motivation for building a VLSI chip capable of real-time spike-based learning and classification. This device can be part of a large multi-chip behavioral system for applications like robotics or man-machine interfaces. Chapter 2 deals with the theoretical basis of learning and classification: I discuss the various physical characteristics observed in natural learning systems and the models implementing such behavior, describe in detail a bio-plausible learning rule with stochastic and bistable synapses (Brader et al., 2007), and justify the advantages of implementing it in VLSI. In Chapter 3, I introduce the Address-Event Representation (AER) for inter-chip communication in neuromorphic systems. I review the asynchronous pipelining methodology and suggest improvements to existing AER designs using knowledge of asynchronous communication channel design. I also study the various combinational circuits required for robust data transfer between neuromorphic chips and show possible improvement schemes. The CMOS circuit blocks implementing the spike-based learning rule are described in Chapter 4. Various circuits designed in the neuromorphic community were reused, and new circuits designed; I highlight the advantages and disadvantages of the different circuit elements used in the IFSL³ family of chips developed in this project. In Chapter 5, I describe various methods to characterize the individual circuit blocks and the on-chip synaptic plasticity.

³ IFSL stands for Integrate and Fire with Stop Learning.
The behavior of the silicon synapses is also compared with the theoretical requirements. Chapter 6 deals with spike-based learning and classification on the VLSI system. I first describe the training and testing methodology for the silicon neurons. Next, the classification performance for random binary patterns of mean firing rates is described in detail, along with rigorous quantification. I also demonstrate the difficult case of classifying correlated and graded patterns on neuromorphic VLSI, reported here for the first time. I conclude the thesis by summarizing the results achieved and discussing ideas for further work and the outlook in Chapter 7.

Chapter 2

Biophysical models of learning

2.1 Introduction

The important role of activity-dependent modification of synaptic strength in learning and memory formation is well accepted in the neuroscience community (Abbott and Nelson, 2000). Synapses throughout the brain are considered to be bidirectionally modifiable. This property, postulated in almost every theoretical description of synaptic plasticity, has been most clearly demonstrated in the CA1 region of the hippocampus (Martin et al., 2000; Shouval et al., 2002). Understanding the biophysical mechanisms underlying such functional plasticity and learning has been an important aspect of neuroscience research.

From the theoretical point of view, the methods of learning can be broadly grouped into two classes: Hebbian plasticity and classical conditioning. The generalised Hebb rule states that synapses change in proportion to the correlation, or covariance, of the activities of the pre- and post-synaptic neurons (Dayan and Abbott, 2001). Classical conditioning, on the other hand, uses the correlation between multiple input signals (conditioned and unconditioned) to determine the synaptic weight. Hebbian plasticity, in the form of long-term potentiation (LTP) and depression (LTD), provides the basis for most models of learning and memory, as well as for the development of cortical maps (Bi and Poo, 2001). It can be subdivided into two categories, supervised and unsupervised. In supervised learning, for every pre-synaptic input the post-synaptic neuron is provided with a target; i.e., the environment tells it what its response should be. The neuron then compares its actual response to the target and adjusts the synapses in such a way that it is more likely to produce the appropriate response the next time it receives the same input. In unsupervised learning, the neuron receives no external signal; instead, the task is to re-represent the inputs in a more efficient way, as clusters or categories, or using a reduced set of dimensions. Unsupervised learning is based on the similarities and differences among the input patterns. In classical conditioning, a process akin to supervised learning, a reinforcer (reward or punishment) is delivered to the neuron independently of the neuron's output. Here, we will focus only on the supervised learning methodology, for complex real-world classification problems.

In order to classify patterns, both natural and artificial physical systems are required to create, modify and preserve memories of the representations of the learned classes. In particular, they should be capable of modifying the memory elements, the synapses, in order to learn from experience and create new memories (memory encoding).
They should also be able to protect old memories from being overwritten by new ones (memory preservation). We will discuss various physical and theoretical constraints in designing an artificial device for such classification tasks.

2.2 Spike-driven plasticity

Most learning rules for memory encoding are formulated in terms of mean firing rates, using a continuous variable representing the mean pre- and post-synaptic activity. They generally use a sigmoidal function between input and output. Such rate-based models neglect the effects originating from the pulse structure of the input signal. Recently, however, there has been increased interest in spike-based Hebbian learning compared to the pure rate-based models (Maass and Bishop, 1998; Kempter et al., 1999; Xie and Seung, 2000). This is influenced both by the biophysical mechanism of information transfer (an all-or-none event) and by the experimental evidence supporting synaptic modification effected by individual spikes. Spike-timing-dependent plasticity (STDP), first observed by Markram et al. (1997) and Bi and Poo (1998), is the most popular spike-based synaptic modification mechanism formulated to date. This is a form of bidirectional synaptic modification dependent on the relative timing of the pre- and post-synaptic spikes. As shown in Fig. 2.1(a), the polarity and magnitude of the modification depend on the phase difference of the two spikes; various possible STDP windows are shown in Fig. 2.1(b).

Figure 2.1: a) Experimental data showing synaptic modification in hippocampal neurons following an STDP rule (adapted from Bi and Poo (1998)). b) Possible STDP time windows; see Caporale and Dan (2008) for a detailed review. Time on the x-axis is in milliseconds.

The properties of STDP have been extensively studied, both in recurrent neural networks and at the level of single synapses (e.g., Rubin et al., 2001; Kempter et al., 2001). These mechanisms have important regulatory properties and have been shown to create memories of temporal patterns of spikes (Legenstein et al., 2005; Gütig and Sompolinsky, 2006). However, STDP in its simplest form is not suitable for learning patterns of mean firing rates. It is usually too sensitive to the specific temporal pattern of spikes and can hardly be generalized to trains of spikes sharing the same mean firing rate (Abbott and Nelson, 2000). In most studies, a specific range of frequencies must be used for successful induction of STDP. This frequency dependence suggests that something more than the simple pairing of pre- and post-synaptic spike times is necessary for this form of bidirectional plasticity (Lisman and Spruston, 2005). In this temporally asymmetric Hebbian learning, the biophysical mechanism of coincidence detection (of the two spikes) depends on the back-propagating action potential (bAP)¹. The location dependence of the bAP in a dendritic tree imposes additional constraints on theoretical STDP models.

¹ Action potentials, initiated in the axon, are found to propagate into the dendrites of hippocampal and cortical pyramidal cells. They form the primary feedback signal to the synapse and determine the shape and polarity of the STDP window, depending on the synaptic location (Letzkus et al., 2006).
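The phase dependence of Fig. 2.1 is commonly modeled with exponential windows. The following is a sketch of the standard pair-based form; the amplitudes and time constants are illustrative, not fitted to the Bi and Poo data:

```python
import math

# Pair-based STDP: weight change as a function of the phase difference
# dt = t_post - t_pre. Amplitudes and time constants are illustrative.
A_plus, A_minus = 0.01, 0.012       # LTP / LTD amplitudes
tau_plus, tau_minus = 0.020, 0.020  # time constants [s]

def stdp_dw(dt):
    if dt > 0:   # pre before post: potentiation
        return A_plus * math.exp(-dt / tau_plus)
    else:        # post before pre: depression
        return -A_minus * math.exp(dt / tau_minus)

for dt_ms in (-40, -10, 10, 40):
    print(f"dt={dt_ms:+d} ms  dw={stdp_dw(dt_ms / 1000):+.4f}")
# The rule keys on exact spike timing: two trains with the same mean rate
# but different phases produce very different weight changes, which is why
# plain STDP struggles with patterns of mean firing rates.
```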
It should also be noted that the standard form of STDP is not suitable for a VLSI implementation, due to the difficulty of long-term storage of the analog weights. Most attempts in this direction (Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006; Indiveri et al., 2006a) have introduced additional assumptions to drive the final synaptic states to binary values.

2.3 The palimpsest property

The problem of memory preservation has often been neglected in theoretical models by assuming simplified but unrealistic characteristics for a physical system, such as unbounded synaptic weights. However, capacity-limited memory systems need to gradually forget old information in order to avoid catastrophic forgetting, where all information is lost at once (Sandberg et al., 2002). In their classical experiment, Jenkins and Dallenbach (1924) showed that memory is destroyed by new experiences, and not only by time (see Fig. 2.2). Networks with this property are called palimpsests, by analogy with the ancient practice of cleaning old texts from papyrus to make way for new ones (Nadal et al., 1986).

Figure 2.2: Memory retention experiments for cockroaches (a, retention score after forced activity vs after normal activity) and college students (b, mean percent recalled after sleep vs after normal waking activity) show a similar decay over the retention interval (hours). Memory decays at a faster rate when new experiences occur than during sleep (adapted from Jenkins and Dallenbach, 1924).

In order to prevent too-fast forgetting, one can introduce a stochastic mechanism that selects only a small fraction of synapses to be changed upon the presentation of a stimulus. Such a mechanism can be easily implemented by exploiting the noisy fluctuations in the pre- and post-synaptic activities to be encoded (Fusi et al., 2000). However, this also slows down learning, and memories must be experienced several times to produce a detectable mnemonic trace.
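The palimpsest behavior, and the way a low transition probability slows both learning and forgetting, can be reproduced with a toy simulation of binary synapses. This is a sketch in the spirit of the stochastic selection just described; the sizes and probabilities are arbitrary:

```python
import random

# Toy palimpsest with N binary synapses: a pattern is first "experienced"
# several times (slow stochastic learning, transition probability q), then a
# stream of new random patterns gradually overwrites it. The overlap with
# the stored pattern decays roughly as (1 - q)^t toward chance level (0.5).
random.seed(1)
N, q = 1000, 0.1

weights = [random.randint(0, 1) for _ in range(N)]
stored = [random.randint(0, 1) for _ in range(N)]

def present(pattern):
    for i in range(N):
        if random.random() < q:        # stochastic selection of synapses
            weights[i] = pattern[i]

def overlap():
    return sum(w == s for w, s in zip(weights, stored)) / N

for _ in range(30):                    # repeated experience builds the trace
    present(stored)
print(f"after training        : {overlap():.2f}")   # close to 1.0

for _ in range(30):                    # new experiences erode it
    present([random.randint(0, 1) for _ in range(N)])
print(f"after 30 new patterns : {overlap():.2f}")   # back toward ~0.5
```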
Slowing down the learning process (decreasing the probability of transition, Q) allows increase in storage capacity as it also leads to slow forgetting. The resolution, or analog depth, of a synapse is the number of synaptic states that can be preserved in long time scale. It has been shown, if each synapse has n such stable states, then the number of patterns p can grow quadratically with n. However, this can happen only in an unrealistic scenarios, where fine tuning of the network’s parameters are allowed. In more realistic scenarios where there are inhomogeneities and variability (as is the case for biology and silicon) p is largely independent of n (Fusi and Abbott, 2007). At the same time, there has been accumulating evidence that biological synaptic contacts undergo all-or-none modification (Petersen et al., 1998; O’Connor et al., 2005), with no intermediate stable states. Positive feedback loops in the biochemical process, involving protein signaling cascade at post-synaptic density, has been hypothesized to durably maintain the evoked 16 Chapter 2. Biophysical models of learning synaptic state in the form of a bistable switch (Graupner and Brunel, 2007). The lack of intermediate analog sates also makes the synapses robust and noise immune. Hence, in this work, we consider the extreme case of bistable synapse also because of the ease of implementation on a VLSI device suitable for long term storage of digital bits. 2.4 Stochastic update and stop-learning mechanisms Recently a new model of spike-driven synaptic plasticity has been proposed (Brader et al., 2007) that can encode patterns of mean firing rates and is very effective in protecting old learned memories. Doing so, it captures the rich phenomenology observed in neurophysiological experiments on synaptic plasticity, including STDP protocols. This model uses Hebbian learning with stochastic updates and an additional stop-learning condition to classify broad classes of linearly separable patterns. As explained in previous sections, stochastic update is a requirement arising from the theoretical necessity of slowing down the learning process in an unbiased way. The dynamics of the synapse makes use of the stochasticity of the spike emission process of pre- and post-synaptic neurons. It is assumed that the pre-synaptic spike train is Poissonian, while the afferent current to the post-synaptic neuron is uncorrelated to the pre-synaptic process. To implement slow learning, only a random small fraction of synapses, from the large number of afferents (N ), should be modified at a time. The storage √ capacity obtained from the stochastic update method is proportional to N (Fusi and Senn, 2006) a significant improvement compared to log(N ), which was shown in the previous section. In this model, the memory lifetime is further extended by modifying the synapses only when necessary, i.e., when the input pattern weighted by the plastic synapses does not generate the output desired by the supervisor. If it does, synaptic modifications are stopped (using a stop-learning mechanism). This results in a very efficient method of classifying wide classes of highly correlated random patterns (Brader et al., 2007; Senn and Fusi, 2005). The mechanism of stop-learning resembles that of the perceptron learning rule (Minsky and Papert, 1969): patterns already learned does not change the synapses any more. 
In the model, stop-learning is implemented using a variable representing the average post-synaptic firing rate, without the need of any additional external signal. The learning rule in Brader et al. (2007) showed superior performance in classifying complex patterns of spike trains ranging from stimuli generated by 2.5. The learning rule 17 auditory/vision sensors (Coath et al., 2005) to images of handwritten digits from the MNIST 2 database (Brader et al., 2007). Both memory encoding (using mean firing rates and stochastic updates) and memory preservation (using binary weights and low transition probabilities for synapse) methods used in this model fit well with the physical limitations of a silicon device. Storage and recovery of binary values on silicon has been practiced by the VLSI industry for a long period of time and in a very successful manner. Using the very same strategies, we can build large arrays of synapse that are bistable in nature. The compact synaptic circuits do not require local Analogto-Digital Converters or floating gate memory cells for storing weight values. In addition, the inherent inhomogeneities in silicon fabrication process can be exploited to our advantage, when stochastic update in synaptic weight is required. By construction, these types of devices operate in a massively parallel fashion and are fault-tolerant: even if a considerable fraction of the small synaptic circuits are faulty due to fabrication problems, the overall functionality of the chip is not compromised. This can be a very favorable property considering the potential problems of unreliability in future scaled VLSI processes. 2.5 The learning rule The main goal of the synaptic model is to encode patterns of mean firing rates. The stochastic selection and stop-learning is given by two simple abstract learning rules. Consider a single neuron receiving a total current h which is the weighted sum of the activities si of its N inputs (Brader et al., 2007): h= N 1 X (Jj − gI )sj , N j=1 (2.2) where the Jj are the binary plastic excitatory synaptic weights (where Jj ∈ 0,1), and gI is a constant representing the contribution of an inhibitory population. The synaptic learning rule can be summerised as: Ji → 1 with probability qsi if hi < Ω and ξ = 1 Ji → 0 with probability qsi if hi > Ω and ξ = 0, 2 (2.3) A large database of handwritten character set that provides a good benchmark for learning network performance and have been used to test numerous classification algorithms. 18 Chapter 2. Biophysical models of learning k3 k2 [Ca] k1 Vmth UP Vmem DN post Vmth (a) (b) Figure 2.3: Characteristics of the synaptic modification. Top plots show conceptual framework while simulation results (adapted from Brader et al., 2007) are shown below. a) The Probability of upward or downward synaptic jumps depend on the average firing frequency of the post-synaptic neuron (νpost ). b) Within the right frequency range, the polarity of jumps depend on the value of the post-synaptic depolarization (Vmem ) compared to a threshold, Vmth . where ξ is a binary variable indicating the desired output as specified by the teacher. Variable Ω is the threshold on input current h that determines whether the neuron is active or not and q is the proportionality constant for transition probability. In order to make the model match neurophysiological observation, specific experimental results were considered. 
Though the presence of spike-timing-dependent plasticity (STDP) at low pre- and post-synaptic firing frequencies has been well established both in vitro and in vivo (see Caporale and Dan, 2008, for a review), LTP dominates when both pre- and post-synaptic neurons fire at higher frequencies, independent of the phase relation of the spikes (Sjostrom et al., 2001). Nelson et al. (2002) also demonstrated that the post-synaptic neuron has to be sufficiently depolarized for LTP to occur. The model in Brader et al. (2007) shows that, with one variable for the average post-synaptic frequency and another for the post-synaptic depolarization, the abstract learning rule can be implemented together with these neurophysiological constraints. While the post-synaptic calcium concentration [Ca], with its long time constant, is a good measure of the average firing frequency, the membrane voltage V_mem is a direct reading of the depolarization. Synapses are modified only upon the arrival of pre-synaptic spikes. Considering x as a variable representing the synaptic weight of a single synapse, the weight update at a pre-synaptic spike time t_pre is given by:

x → x + ∆x   if V_mem(t_pre) > V_mth and k_UP^L < [Ca](t_pre) < k_UP^H
x → x − ∆x   if V_mem(t_pre) ≤ V_mth and k_DN^L < [Ca](t_pre) < k_DN^H,   (2.4)

where k_UP^L, k_UP^H, k_DN^L and k_DN^H are the thresholds on the calcium variable. In the absence of a pre-synaptic spike, or if none of the conditions in Eq. 2.4 is satisfied, x drifts toward one of the two bistable states:

dx/dt = α    if x > θ
dx/dt = −β   if x ≤ θ,   (2.5)

where α and β are positive constants and θ is a constant threshold. In the simplified model used for the silicon implementation, we have:

k_UP^L = k_DN^L → k1,   k_DN^H → k2,   k_UP^H → k3,   α = β,   x = w,   (2.6)

where w is the actual synaptic weight used for the generation of the excitatory post-synaptic current (EPSC).

2.6 Network description

The learning rule is designed to modify the synaptic matrix in such a way that each pattern seen during training can be retrieved later without mistakes. In the case of a feedforward network, this means that each input pattern produces the correct response indicated by the teacher during training. For a recurrent network, each pattern imposed by the sensory stimuli becomes a fixed point of the network dynamics. Under additional stability conditions, these fixed points can also be attractors of the network dynamics (Senn and Fusi, 2005). Here we consider the output neurons to be binary classifiers embedded in a feedforward network. The network architecture we mostly used consists of a single feedforward layer composed of N input neurons fully connected by plastic synapses to one output neuron. In addition to these inputs, the output neuron receives signals from a teacher and an inhibitory population.

Figure 2.4: a) A schematic of the network architecture for a data set consisting of two classes. The output units (top) are connected to the input layer (bottom) by the plastic synapses. The output units receive additional inputs from the teacher and inhibitory populations. b) Cartoon of a silicon output neuron showing the soma, axon and the synapse array. Patterns are presented to the plastic synapses (p1 to pN) and the teacher signal is fed through a non-plastic excitatory synapse (n2). Another neuron receives the same input pattern and feeds it to the output neuron via an inhibitory synapse (n1).
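For reference, the weight dynamics of Eqs. 2.4-2.6, on which the network described here relies, can be condensed into a short sketch. This is an algorithmic illustration only, with hypothetical argument names; the corresponding VLSI circuits are characterized in Chapter 4:

def on_pre_spike(w, V_mem, Ca, V_mth, k1, k2, k3, dw):
    """Weight jump evaluated on arrival of a pre-synaptic spike (Eq. 2.4),
    using the simplified calcium thresholds k1 < k2 < k3 of Eq. 2.6."""
    if V_mem > V_mth and k1 < Ca < k3:    # depolarized: potentiation window
        return w + dw
    if V_mem <= V_mth and k1 < Ca < k2:   # not depolarized: depression window
        return w - dw
    return w                              # outside the calcium windows: no jump

def drift(w, theta, alpha, dt):
    """Between spikes, w drifts toward one of its two stable states
    (Eq. 2.5, with alpha = beta); in the full model the drift saturates
    at the two bistable values."""
    return w + alpha * dt if w > theta else w - alpha * dt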
A binary teacher signal divides the input into two distinct classes and dictates which patterns the neuron should respond to. Fig. 2.4(a) shows the input and output layers with the inhibitory and teacher populations. Multiple output neurons can be used to respond to different classes of patterns. In the corresponding VLSI implementation, shown in Fig. 2.4(b), the output neuron receives the input spikes at its plastic synapses (p1 to pN). The input is also sent, in parallel, to an inhibitory neuron. The inhibitory neuron and a teacher signal stimulate the output neuron at its non-plastic synapses (n1, n2).

Chapter 3

AER Communication circuits

3.1 Introduction

Neuromorphic engineering, from its conception, promised better utilization of silicon physics and low-power analog circuits for emulating the functional behavior of the nervous system. At the same time it had to address the daunting task of mapping the complex wiring of a three-dimensional physiological structure onto an essentially two-dimensional device. As pointed out in Mahowald (1994), it might seem impossible even in principle to build a structure in VLSI that mimics such wiring density. The degree of convergence and divergence of a single neuron is staggering in comparison to artificial devices such as a computer chip. This unusual need for a large fanout (and fanin) initiated thinking on a new strategy that exploits the speed of conventional VLSI systems (Sivilotti, 1991; Mahowald, 1994; Boahen, 1998). The Address Event Representation (AER) protocol, as it came to be known, is one of the most important achievements of this process. Over the years, tradeoffs in different aspects of the protocol, such as channel access and encoding schemes, have been analyzed and improved (Boahen, 2000; Culurciello and Andreou, 2003; Boahen, 2004a). AER is becoming increasingly popular as a means of data transfer in pulse-coded neural networks and has been successfully implemented in multi-chip neuromorphic systems (Serrano-Gotarredona et al., 2005; Chicca et al., 2007). Teixeira et al. (2006) even developed an AER-emulator to speed up post-processing of data from AER sensors. The usability of the protocol has been greatly improved by the simultaneous development of supporting hardware infrastructure (Deiss et al., 1998; Dante et al., 2005). The popularity and success of the method led to a few alternative AER schemes, mainly word-serial (Boahen, 2004c) and serial AER (Berge and Hafliger, 2007; Fasnacht et al., 2008) communication.

Figure 3.1: Comparison between biological and neuromorphic information transfer. a) In biology, neurons transmit spikes through dedicated axons. b) In neuromorphic systems, AER uses the bandwidth of a copper wire to emulate virtual axons for all the silicon neurons. An encoder and a decoder (gray boxes) are required to multiplex the spike addresses. c) A nontrivial connectivity can be set up by sending the multiplexed spikes through a look-up table.

In this chapter I describe the basics of AER communication and show a method to formalize the description of the protocol according to Boahen (2000). In contrast to many prior implementations of the AER circuits, the formal representation helps in identifying possible improvements in the design. I describe these improvements with relevant data from the silicon chips.
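As a minimal software analogy of the routing idea of Fig. 3.1(c), an address-event stream can be expanded through a look-up table. The table entries below are hypothetical and serve only to illustrate the mechanism:

# Each source address on the AE bus maps to a list of destination synapse
# addresses, emulating arbitrary axonal branching (Fig. 3.1(c)).
lookup_table = {
    0: [34, 7],        # hypothetical: neuron 0 projects to synapses 34 and 7
    1: [22, 50, 16],   # hypothetical entries for neuron 1
}

def route(event_stream):
    """Expand each (timestamp, source_address) event into destination events;
    event timing is preserved, carried by the events themselves."""
    for t, src in event_stream:
        for dst in lookup_table.get(src, []):
            yield (t, dst)

print(list(route([(0.001, 0), (0.002, 1)])))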
The AER communication, being the backbone for data transfer in neuromorphic chips, has to be carefully optimized for both speed and robustness.

3.2 Basics of AER communication

In biology, dedicated axons carry information (spikes) from a neuron to all its synapses, far from the cell body (see Fig. 3.1(a)). The synapses create connections between the neurons, and the axons form an intricate 3D network of cables within the layers of the cortex. It is not feasible to build such an architecture for a large network of silicon neurons and synapses due to the inherent 2D structure of a VLSI system. The need for AER arose not only to circumvent the problem of connecting silicon neurons located on different chips, but also within the same chip. VLSI chips in general have few external pins, far fewer than the number of neurons or synapses present on them. Hence, a metal wire connected to a pin cannot be used as a dedicated axon serving a neuron.

Figure 3.2: Details of an address event (AE) bus. Two neuromorphic chips, having 1D arrays of neurons with synapses attached to them, are connected via a single AE bus. The bus consists of data and control path signals used to establish communication between two arbitrary neurons via silicon synapses. Multiple chips can be similarly connected, forming a large network of neurons and synapses.

The strategy is to use the high bandwidth of a copper wire (compared to a physiological axon) and the speed of a VLSI system (compared to physiological time scales) to perform time-division multiple access. This allows virtual connections to be formed between two neurons on different chips, or even on the same chip. Figure 3.1(b) shows how the virtual connection carries spikes from the neurons to all their synapses, sharing the same physical connection. An intermediate look-up table, shown in Fig. 3.1(c), can direct spikes from their sources to the desired destinations, forming an arbitrary neuronal connectivity map. Spikes generated by the neurons are stereotyped events of digital amplitude, whose timing and source of initiation (address) are the only information of importance for communication purposes. AER is an event-driven communication protocol in which the data represent the source address of the spike, while the timing of the event is self-encoded by the arrival of the data itself. The temporal order of events on the address event bus thus mirrors the firing pattern of the neurons. This is demonstrated in Fig. 3.2, which shows the connection between a source and a destination chip. However, to implement this communication protocol one has to take care of a number of intermediate elements between the sending and receiving units, the neuron and its target synapse. The intermediate blocks perform combinational operations to correctly represent the data.

Arbiter: As neurons do not fire in a regular temporal sequence, many of them will try to access the address event bus simultaneously. A method of arbitrary, unbiased selection of one among them is essential.

Encoder: With a large number of neurons (say N)¹ on a chip, the number of wires required for transmission increases as √N. To make the wire count grow only logarithmically, an encoding scheme is required.

Decoder: The receiver chip should decode the encoded address. The AE bus can carry a sporadic data stream, often with a high event rate but also with phases of 'garbage' data.
Hence, the address bus should be decoded in the receiver chip quickly and reliably.

A robust asynchronous communication scheme between the source and destination chips, together with the above-mentioned combinational circuit blocks, is designed here. We first describe the method of implementing an efficient communication channel with a single sender and receiver. Under this condition, all combinational blocks can be approximated as delay elements. Later we introduce the more general case of multiple senders and receivers, with design considerations for the arbiter, encoder and decoder. As the neurons function in parallel and in real time, the physical connection is accessed by them only when necessary. This leads to an asynchronous communication that is not driven by a central clock.

3.3 Single sender and single receiver

Let us first consider the simplified assumption of a single sender and a single receiver, i.e., there is one source neuron sending its spikes to only one destination neuron. The source neuron can generate spikes at any time and with any frequency (limited only by its refractory period and the channel bandwidth). The destination neuron should receive a spike event when it is ready, and also communicate its state back to the source. There is no central clock synchronizing the source and destination chips, resulting in an asynchronous data transfer. One principal assumption in this analysis is that every spike is important and we do not want any of them to get lost during communication. On the other hand, data transfer should be fast enough to mimic the real-time communication within biological neural networks.

¹ Here we consider the generic case of a 2D square array of N neurons. To uniquely identify a neuron in this array, √N wires for each of the X and Y axes are necessary.

Figure 3.3: a) Generic example of an asynchronous pipelined channel showing data and control signals. The request (R) and acknowledge (A) signals establish a handshaking protocol to transmit the data correctly, without any central clock. b) Four-phase handshake is the most popular method of communication between asynchronous pipelined elements.

3.3.1 Pipelining the data

Pipelining is a technique for increasing the throughput of a communication channel by breaking up the communication cycle into subprocesses, at a moderate increase in area. These subprocesses are connected in series but execute their functions in parallel. Asynchronous data transfer, without a global clock, is heavily dependent on the concept of handshaking to complete a communication cycle. In an asynchronous pipeline, the transfer of data between the source and the destination is regulated by local communication between stages. When one stage wants to send data to a neighboring stage, it sends out a request. If the neighbor can accept new data, it does so and returns an acknowledgment. Figure 3.3(a) shows separate control and data path lines consisting of request/acknowledge (R/A) signals and a data bus, respectively (Sutherland, 1989). The control path ports with black dots are active ones, while the others are passive ports. The control path can operate in two different modes, namely two-phase or four-phase handshaking. In two-phase handshaking, when passing control information, a request is sent from the sender and an acknowledge is sent back by the completion detection logic of the receiver.
This results in single transitions on both the request and acknowledge lines. In contrast, four-phase handshaking has a second set of request and acknowledge transitions, sent in order to return these signals to their original states (Yun et al., 1996). Even though it requires twice the number of transitions, four-phase handshaking is not necessarily slower than two-phase, as most combinational logic blocks consume much more time than the communication blocks. In a four-phase handshake, communication is initiated only on rising edges; hence, it is easier to implement and preferred by designers.

Figure 3.4: Communication cycle in an AE bus connecting two neuromorphic chips. All combinational blocks are connected via the communication blocks, which execute the data transfer. White and black boxes indicate the duration of the set and reset halves of the control signals. (a) In a non-pipelined channel an intermediate stage is acknowledged only when all its following stages are acknowledged. (b) In the pipelined channel, an intermediate stage does not wait for the following stage to acknowledge it before acknowledging its preceding stage. Similarly, it does not wait for the following stage to withdraw its acknowledge before withdrawing its own. (Adapted from Boahen (2004a).)

Consider the stage i in the micropipeline communication block shown in Fig. 3.3(a) and the corresponding timing diagram in Fig. 3.3(b). The active port Ri is taken high when the data is ready to be sent. The next stage (i+1), if ready, receives the signal on its passive port and sends out an acknowledge by taking Ai high (it also latches the data simultaneously). Stage i receives the acknowledge on its passive port and starts the resetting half-cycle by taking Ri low. After the completion of the full four-phase handshake the data bus is released and can go into its default high-impedance state. Using these micropipeline communication blocks we can build an asynchronous pipelined channel. The combinational data path blocks mentioned before (arbiter, encoder, decoder) can be modeled as simple delay cells between the request (R) and acknowledge (A) control lines. Adding a source and a sink to the channel, the sender and receiver respectively, the entire communication path can be viewed as in Fig. 3.4 (Boahen, 2004a). The different combinational elements in the pipeline are: neuron (sender), arbiter, encoder, decoder, neuron (receiver). Figure 3.4 shows the same channel with and without pipelining, demonstrating the advantage of higher throughput. The implementation of the pipeline can be understood from Fig. 3.5, which shows a single sender and a receiver communicating through the asynchronous channel. An entire system similar to this was initially built with heuristic methods that did not exploit the theoretical knowledge on asynchronous communication. However, it was later made compatible with the formal CHP² language, utilizing the know-how of asynchronous design methodology (Boahen, 2000). We will investigate the system with our knowledge of micropipelines and show that the heuristic method matches the formal description to a large extent. This also allows us to identify potential improvements over the previously proposed scheme. Comparing Fig. 3.5 to Fig. 3.3(a), we see how the basic implementation matches the ideal pipeline requirements.
Distinct control and data paths carry signals from left to right, using the handshaking cycles in the communication blocks; the combinational blocks modify the data path behavior. Here we describe the different handshaking cycles, with the relevant signals, inside the communication blocks. Handshaking is performed by symmetric and asymmetric C-elements, marked by C and aC, respectively (see Appendix A for the logic and circuit descriptions of C-elements). The following sections describe the handshaking cycles using HSE primitives. The basic HSE primitives, with examples, are shown in Appendix B.

Neuron-Arbiter

At the beginning of the pipeline cycle, the event generator interfaces the sender neuron to the arbiter (see Fig. 3.5). The high-speed analog signal (Ss), representing a spike, is terminated to generate a pulsed event Nr for the arbiter. However, Nr should wait till the arbiter acknowledge (Na) goes low, completing the previous cycle. The neuron is reset by Sr when the arbiter acknowledges it. The resetting phase starts with Ss going low, and Sr is taken back after the entire cycle is complete. The HSE of the event generator should look like:

*[[Ss & ~Na]; Nr+; [Na]; Sr+; [~Ss]; Nr−; [~Na]; Sr−]   (3.1)

² Communicating Hardware Processes (CHP) is the first step in describing the behavior of an asynchronous circuit with formal notation. Next, handshaking expansion (HSE) encodes data with Boolean variables (nodes in the eventual circuit) and converts every communication action into a four-phase handshake (Martin et al., 2003).

Figure 3.5: Communication channel established between a single sender and a single receiver on different chips. The communication blocks are shown as solid rectangles and the combinational blocks as dashed ovals. All control path signals and their handshaking cycles are illustrated. The combinational blocks act as delays on the control path and alter the data path.

The circuit for the event generator can be synthesized using the above HSE code (see Appendix B). Every sender has its own event generator, the output of which connects to the arbiter. The arbiter collects all such events to select one among them (see Sec. 3.6). For the single-sender case the arbiter is logically a combinational block: it delays the Na signal during arbitration and passes it on as Ao to the next communication block. The data path output of the arbiter is a one-hot³ code, carrying a high signal only on the selected line.

Arbiter-Encoder

The next micropipeline stage, the transmitter handshake (Tx handshake), connects the arbiter to the encoder. The signals participating in the four-phase handshake during this stage can be identified as Ao and Ea as inputs, and Na and Er as outputs. Output Na also serves as the latching signal for the data path transmission. The HSE for the Tx handshake can be written as:

*[[~Ao & Ea]; Er+, Na+; [Ao & ~Ea]; Er−, Na−]   (3.2)

The control signal Cr is a delayed version of Er, communicating with the next micropipeline stage. The encoder data path output is a log N bit data bus that carries the address of the selected neuron.

³ In a one-hot code, only one data line out of N is high, while all others remain low.
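To make the HSE notation concrete, the following Python sketch steps through one four-phase cycle of the event generator of Eq. 3.1, with the arbiter reduced to an ideal single-input responder. It is a behavioral illustration of the protocol, not a description of the circuit:

# Ss/Sr interface the neuron, Nr/Na the arbiter (Fig. 3.5). The '[...]'
# waits of the HSE are modeled as assertions, since the ideal arbiter
# responds instantaneously.
sig = {"Ss": False, "Na": False, "Nr": False, "Sr": False}

def arbiter():                           # ideal single-input arbiter: Na follows Nr
    sig["Na"] = sig["Nr"]

def event_generator_cycle():
    assert sig["Ss"] and not sig["Na"]   # [Ss & ~Na]
    sig["Nr"] = True                     # Nr+ : request sent to the arbiter
    arbiter()
    assert sig["Na"]                     # [Na]
    sig["Sr"] = True                     # Sr+ : the neuron is reset ...
    sig["Ss"] = False                    # ... so the spike signal is withdrawn
    assert not sig["Ss"]                 # [~Ss]
    sig["Nr"] = False                    # Nr-
    arbiter()
    assert not sig["Na"]                 # [~Na]
    sig["Sr"] = False                    # Sr- : cycle complete

sig["Ss"] = True                         # the neuron spikes
event_generator_cycle()

With a real arbiter, the two arbiter() calls would be replaced by waits of unbounded duration, which is exactly what the four-phase protocol tolerates.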
Encoder-Decoder

The receiver handshake (Rx handshake) block, between the encoder and the decoder, functions similarly to the Tx handshake. The four-phase handshake produces Ca and Dr. The signal Dr can be used to latch the data before sending it to the decoder.

*[[Cr & ~Pa]; Dr+, Ca+; [~Cr & Pa]; Dr−, Ca−]   (3.3)

This block establishes the communication of the receiver chip with the external world and can talk either to a transmitter chip (as shown in Fig. 3.5) or to a computer sending data using the AER protocol. The decoder decodes the log N bit data to a one-hot code (see Sec. 3.7) and communicates the signal to the relevant receiver unit (neuron). For a single receiver, the communication channel terminates at the only receiver neuron present.

Decoder-Neuron

The receiver node can be a synapse of a neuron or any other circuit that receives the spike event for further processing. Let us consider that the receiver node is always ready to accept the incoming spike. For a single receiver, the termination of the communication channel is trivial. On arrival of the pixel request signal (Pr), the receiver node receives the one-hot data Q and generates an acknowledge signal Pa.

A communication channel that complies with all the handshaking cycles should provide the optimal data speed and robustness available from micropipelining. However, in the IFSL family of chips, minor variations are used. One important omission is that of one of the upper asymmetric C-elements in the event generator block. In the IFSL chips, Ss and Nr are essentially the same signal, which assumes that the arbiter is fast compared to the delay between two consecutive spikes from the same neuron.

3.4 Multiple sender and multiple receiver

The description of a communication channel presented in the earlier sections assumed a single sender/receiver unit. This is rather uncommon for a neuromorphic chip, which is typically supposed to establish communication between a pool of senders and receivers. When many neurons in the sender chip try to communicate (send spike events) with many other neurons in the receiver chip, the single physical bus allows this simultaneous many-to-many mapping by time-division multiple access of the AER channel (see Fig. 3.1(b)). Only one sender neuron gets access to the bus at a given time, chosen by an arbitration process between the competing neurons. Section 3.6 deals in detail with the arbitration strategies and their results. The selected neuron sends its request and its address (through the data path) to the receiver chip. The encoder, the mapper and the decoder are responsible for delivering the spike event to the correct target neuron. The input to the receiver neuron, being a passive port, reacts only when it is triggered with a request, and then sends an acknowledge back. The virtual communication channel established in this process can be reused by another pair of sender and receiver at a later stage.

3.4.1 Data path design

Figures 3.6(a) and 3.6(b) show the sender and receiver units communicating with the data bus. No timing protocol is considered here; only the combinational blocks previously treated as delay elements in Fig. 3.5 are shown. The relevant signal names are identical to those used in Fig. 3.5. Once the sender neuron is acknowledged by the arbiter (Ai), its address is encoded in log N bits and a request (Cr) is sent to the receiver chip.
Due to the asynchronous, event-driven nature of the communication, the request is generated only during an event. As the request is generic, any acknowledge in the sender chip (A1 to AN) results in the same Cr signal. An OR gate (Fig. 3.7, left) connected to all the acknowledge signals of the pixel column (Ai) could be used. This implementation would, however, require N pMOS transistors stacked in series, where N can be in the order of hundreds. For such a gate, the time delay of the high-to-low transition at the output (Cr) would be extremely varied and data dependent (see Sec. 3.7 for a detailed explanation). On the other hand, a wired-OR implementation, shown in Fig. 3.7 (right), does not suffer from the problem of stacked transistors and also uses nearly half as many transistors. The wired-OR circuit is similar to the lower part of the conventional OR gate and is enough to generate a low-to-high transition in Cr. The high-to-low transition can be executed by a single pull-up transistor connected to an appropriate bias. A better approach is to replace the bias voltage source with an active pull-up circuit; the Ca signal serves as a good candidate to perform the pull-up function. The active pull-up complies with the idea of making an efficient AER transceiver that is robust and independent of external voltage biases. Similar to the sender chip, the target neuron (Qi) in the receiver delivers the pixel acknowledge signal (Pa). An identical wired-OR can be used to combine all of Q1-QN to produce the generic Pa signal, with the Pr signal used for the pull-up of the OR gate.

Figure 3.6: Data exchange between the combinational blocks of a 1D array of neurons (pixels). (a) The arbiter selects one of the active neurons, sending an N-bit one-hot code to the encoder. The encoder generates log N bit data. (b) The decoder recreates the one-hot code and selects the required pixel. The control path signals play a necessary part in latching the data.

3.5 Receiver handshake

The pipelining scheme described earlier was not implemented in early generations of various neuromorphic chips. In order to speed up the entire communication cycle, certain liberties are often taken while implementing the receiver chip. One major assumption is that the on-chip communication is much faster than the off-chip one. This allows the four-phase handshake at the Rx handshake block in Fig. 3.5 to be ignored. The request, Cr, from the previous stage is acknowledged (Ca) as soon as it arrives, without waiting to check the acknowledge from the next stage (Pa). Usually this does not have any adverse effect on the behavior of the AE bus because, by the time a second Cr arrives, the on-chip communication has almost certainly restored Pa to its correct state. However, it can lead to missing spikes during a burst of events sent to the receiver chip. The implementation of a proper handshake at every pipeline stage is a matter of choice depending on the usage of the chip. Where data is incessant in nature, say from the output of a sensor chip (like a vision or audition sensor), each and every event (spike) may not be very important. On the other hand, in cases where data is sparse and comes in bursts, missing a single spike is unacceptable.

Receiver handshake experiment

In order to demonstrate the problem of missing spikes, we tested two chips, one implementing the receiver handshake and one not.
In both chips, an internal node of a receiver synapse was connected to a probe point Vc, shown in Fig. 3.8. An input spike results in a dip in the voltage Vc before it recovers. This lets us check whether that particular receiver unit actually responded to an input spike targeted at it.

Figure 3.7: A conventional OR gate (left), using complementary CMOS logic, has stacked pMOS transistors for pull-up, while the wired-OR (right) uses a single transistor for pull-up (either active or passive). A staticizer holds the output state when it is not actively driven.

Figure 3.8: A probe point inside a single synapse is connected to an external pad. This makes it possible to verify whether an AER event directed to the particular synapse reached its destination.

The chips were stimulated with a burst of 5 spikes directed to different addresses, including that of the test unit. Let us first consider the chip without the four-phase handshake. Two different burst configurations were used: a) the address of the test unit as the first spike of the burst, and b) the address of the test unit as the last spike of the burst. For case a (not shown), the test synapse always responded to the input spike. Changes in the inter-spike interval (ISI) of the spike burst did not have any effect on this result, i.e., the first spike always reached the correct address. In contrast, for case b, the test-synapse response was highly dependent on the ISI of the burst. A moderate (∼10 ms) interval between spikes resulted in a correct test-synapse response, but the synapse often failed to respond for lower ISIs. This is shown in Fig. 3.9. Two separate bursts of spikes were sent to the chip with an equal ISI of 4 ms. The test-synapse response at this ISI is unreliable, as it responds to only one of the two bursts, at random. The failure rate increases as the ISI decreases. In the failed cases, the receiver chip kept working without any indication that it was acknowledging external signals faster than the core of the chip could handle. The same experiments were carried out on the other chip, implementing the proper four-phase handshake as described in Eq. 3.3. Case a, as before, did not show any problem of missing spikes. For case b, even using the minimum ISI possible (set by the maximum AER bandwidth of 6k events/s) did not produce a single failure in 200 continuous stimulations.

Figure 3.9: Communication failure without four-phase handshaking. Top panels show the voltage Vc as in Fig. 3.8. The voltage should go down when the particular synapse receives an AER spike event. Bottom panels show a burst of AER events whose last spike is directed to the test synapse. The communication fails to be reliable as the synapse responds randomly to one of the two consecutive input bursts, the first one in (a) and the second one in (b). Implementing a four-phase handshaking scheme did not show a single failure in 200 trials using even smaller ISIs.

3.6 Arbitration basics

In the early days of AER communication systems, various channel access topologies were studied, e.g., sequential scanning, ALOHA-based, priority encoding or arbitrated access (Culurciello and Andreou, 2003). However, an arbitrated system was found to be best suited for channels with a wide range of throughput. An arbiter chooses one of its many inputs when they attempt
to access the same shared resource (i.e., the AE bus) simultaneously. In this section we will first discuss a two-input arbiter, which drives only one of its outputs high when both inputs are activated. The core of an arbiter is the mutual exclusion circuit, often implemented with cross-coupled NAND gates followed by a glitch reduction circuit (Mead and Conway, 1980). The circuit, shown in Fig. 3.10, behaves like a traditional SR flip-flop for the input pairs 10, 01 and 11.

Figure 3.10: Mutual exclusion (ME) is the core of the arbitration process. Two concurrent inputs (high signals) passing through the ME result in complementary (i.e., mutually exclusive) output signals, chosen in an arbitrary manner. The ME circuit consists of a pair of cross-coupled NAND gates and a glitch reduction circuit (within the dashed box). The truth table shows the expected behavior, within a bounded delay.

In conventional digital design, the input 00 is excluded because it does not produce complementary output states. Here, however, let us consider 00 as the default input condition, where o1 and o2 are both high. When one of the inputs goes high, the corresponding NAND output goes low (see truth table). As the inputs return to 00, o1 and o2 both go back to the 11 state. Alternatively, if one input goes high while the other is already high, the NAND outputs behave predictably as shown in the truth table of Fig. 3.10, i.e., the previous output is maintained. These are the simple operating conditions, where one of the two inputs is selected without any ambiguity. The inverters, powered by complementary inputs, simply switch the states of the o1 and o2 signals. The critical condition arises when both inputs go high at nearly the same instant. The NAND outputs temporarily go into metastable states (local maxima in energy) from which they emerge to reach arbitrary but complementary stable states (o1 ≠ o2). This happens because of the inherent asymmetry of the physical realisation of the gates (impurities, fabrication flaws, thermal noise, etc.; Martin et al., 2003). Theoretically, the outputs can linger around the metastable state for an unbounded period before the conflict is resolved. However, it has been shown that the circuit output stabilizes to 01 or 10, depending on the slightest difference in initial conditions, without much delay.

Figure 3.11: The output of the cross-coupled NAND gates depends on the present and immediate past input, as shown in Fig. 3.10. (a) Trivial changes in the inputs (in1, in2) result in the expected outputs (o1, o2). However, concurrent high inputs result in a mutually exclusive output (gray region) depending on a minute initial time difference (∆t) or system nonideality. (b) The voltage difference at the output (∆V) grows exponentially over time to settle to complementary values.

Let ∆Vinit be the voltage difference between o1 and o2 due to the small time difference (∆t) between in1 and in2. Detailed analysis using inverter models shows that the voltage difference ∆V grows exponentially in time, seeded by the infinitesimal initial difference in the inputs (Sakurai, 1988). This rapidly results in a complementary set of outputs:
∆Vinit = K ∆t_init,   ∆V = ∆Vinit exp(αt)   (3.4)

The inverters play an essential part during the time interval in which o1 and o2 have not yet reached stable states. They act as a glitch removal circuit and guarantee that the ME output remains complementary for the subsequent logic stages.

We now look into the logical framework of arbitration. An N-input arbiter, required to select between N concurrent signals, can be constructed from 2-input arbiters arranged in a tree structure, shown in Fig. 3.12(a). In Fig. 3.12(b) a basic 2-input arbiter with its internal signal nomenclature is shown. To analyze the functionality of an arbiter tree, we start with a rudimentary 4-input arbiter (shown in Fig. 3.12(c)). In the first level, the input nodes are named d1 to d4 and the intermediate nodes are called m1 and m2. The four input nodes send requests to the arbiter by taking their respective li ports high, and get back an acknowledgment from the arbiter if the corresponding lo goes high.

Figure 3.12: (a) The general arbitration scheme. An N-input arbiter is made of a tree of arbiters culminating in a 2-input one. Each branch of the N/2-input arbiter again has the same tree structure. (b) A single 2-input arbiter with all its input and output terminals. Separate signals for request (li1) and acknowledge (lo1) are used for each input. (c) A four-input arbiter with all internal nodes and signals. The arbiters Am1 and Am2 are called daughters of the arbiter Am.

In the next sections, we will analyze some individual arbiter circuits and their behavior in the 4-input arbiter tree, using the so-called ping-pong diagram as described in Boahen (2004a). The two important aspects of an arbiter to be considered are speed (the average delay for a request to get acknowledged) and fairness. An arbiter is called fair if it does not prefer servicing one request over another depending on its history of operation. As we will see in the various arbiter implementations, fairness of an arbiter with a large number of inputs is not easy to achieve.

3.6.1 Standard arbiter

Alain Martin proposed the standard version of the arbiter in the late 80s, which is fair but slow in its operation (see Martin and Nystrom, 2006, for a review). The arbiter, shown in Fig. 3.13(a), has its mutual exclusion circuit made of coupled NAND gates and complementarily powered NOR gates (for glitch reduction). Let us consider the default condition when both li1 and li2 are low. Nodes ~a1 and ~a2 are low, and so are ro and ri. This results in the acknowledge nodes lo1 and lo2 being low as well. Now, if li1 goes high, a1 goes low and a2 remains high. The NOR gates drive ~a1 high, sending a request to the parent level (i.e., ro goes high). When the acknowledge from the parent comes back setting ri high, the C-element sets lo1 high while lo2 continues to be low. Even if li2 is taken high in between, trying to send a new request to the upper level, it does not change the states of a1 or a2, as the ME element holds their previous state. Hence, li2 has to wait before propagating its request. A complete timing diagram of all the internal signals, for the case where li1 made a request
before li2, is shown in Fig. 3.14.

Figure 3.13: (a) The standard arbiter cell designed by Martin (see Martin and Nystrom, 2006, for a review). (b) The ping-pong diagram showing the signal transmission through the different levels of the four-input arbiter. All request signals are directed upwards and acknowledges downwards. The set and reset phases are differentiated by arrow heads and circular heads, respectively. The arbiters Am1 and Am2 (see Fig. 3.12(c)) receive requests from the input nodes d1-d3; Am forms the next level in the arbiter tree.

We will analyze the ping-pong diagram (Fig. 3.13(b)) to understand the signal transitions between the different levels of the 4-input arbiter in Fig. 3.12(c). Each node (d1-d4, m1 and m2) has a request and an acknowledge component. For input d1, we will call them d1r and d1a, respectively. In the figure, both the request and acknowledge signals of a node are marked with the node name itself, but with different arrow directions: all arrows directed upward represent request signals and arrows going downward represent acknowledge signals. The circular arrow heads follow the same rule but represent the reset phase of the signals. Signals corresponding to d1 are marked with filled arrow/circular heads, and those corresponding to d2 with open arrow/circular heads. Assuming the inputs d1 and d2 make simultaneous requests, taking d1r and d2r high, a request from the first level of arbiters (Am1, Am2) will be sent to the second level (Am), taking m1r high. When m1a is set high, arbiter Am1 is chosen by its parent and, in turn, acknowledges one of the two daughter requests (say d1, setting d1a high). Next, when d1r is reset low, this is conveyed to the next level, resetting m1r. The acknowledge signal d1a goes low only after the acknowledge m1a is reset. This process illustrates that a request from one input sets all the intermediate request and acknowledge signals, right up to the top of the arbiter tree. Similarly, all the request and acknowledge signals have to be reset to complete the cycle, just to service that one input. After input d1 is serviced, the held request d2r will force another request (m1r) by Am1. However, if d3 is ready with a request d3r at t1 (< ta), then m1r and m2r will both be set and one of them will be chosen arbitrarily, without any bias. If arbiter Am1 is chosen (setting m1a high), then d2a will be taken high; otherwise, d3a goes high, acknowledging input d3. This random choice between the two (d2 and d3) makes the arbiter completely fair, without any bias from its history. However, setting and resetting all the corresponding signals in every level of the tree for each input makes the process very slow.

3.6.2 Fast, unfair arbiter

In Boahen (1998), a faster implementation of the arbiter was proposed. The analysis of the circuit, shown in Fig. 3.15(a), is provided in the paper itself. It is an improved version of the early arbiter proposed in Mahowald (1994), with added robustness obtained by replacing the non-standard digital elements. Here we discuss the ping-pong diagram in Fig. 3.15(b) to demonstrate its behavior. Let us start with the previous situation, where inputs d1 and d2 make simultaneous requests and d3 makes a request at t1. Say input d1 is arbitrarily chosen to be acknowledged first. Input d2 will then be acknowledged immediately after d1 resets its request (takes d1r low), without waiting for any change in m1r or m1a. It reuses the local acknowledge (m1a) without sending its request up the arbiter tree. This is in contrast to the analysis in Fig.
3.13(b), where d3 had a fair chance of getting acknowledged. Now consider d1 making a second request at t2. Next, m2r goes high, but arbiter Am2 has no chance of getting acknowledged: signal m1a, which was already set high in the previous cycle, acknowledges Am1 once again. Hence, d1 gets acknowledged again, keeping d3 on hold, only because Am1 has an immediate history of getting acknowledged. If d2 makes another request at t3, it will get acknowledged too, while d3 is kept waiting. This goes on until both d1 and d2 stop requesting (say, at t4), finally letting d3 get acknowledged. The local request-acknowledge cycle speeds up the arbitration process for a large arbiter tree (often with hundreds of inputs and tens of levels). But once the arbiter has served a branch, it is biased to acknowledge the same one repeatedly, restricting the reported activity to a localized region.

Figure 3.14: Detailed timing diagram of a single arbiter (shown in Fig. 3.13(a)). The time taken to set or reset signals by its parent (tp) or its daughter (tc) level is marked below. The lower white and black boxes indicate the time taken for a full request/acknowledge cycle of the individual inputs.

Figure 3.15: (a) The fast, unfair arbiter designed by Boahen (Boahen, 2000). (b) Starting with the same conditions as in Fig. 3.13(b), the ping-pong diagram shows a faster request/acknowledge cycle for the inputs d1 and d2, but d3 is unfairly held back until none of the others is requesting any more.

Experimental verification of the unfair arbiter

To verify the problem of unfair arbitration, a chip consisting of a 32-by-32 array of I&F neurons was used. All the neurons could be forced to spike simultaneously, with a single external bias voltage controlling the current injection to the neurons. The AER bus from the chip transmits these spike events to a computer. Using the address of the sender neuron, the activity pattern on the chip can be monitored on the computer. Though all neurons integrate the same input current, only those acknowledged by the arbiter are allowed access to the AER bus. They communicate their spike events to the outside, while the others have to wait for their turn. Fig. 3.16 shows the 2D reconstruction of the chip activity. In Fig. 3.16(a), for a moderate level of injected current, the firing frequencies of the neurons vary considerably due to the inherent mismatch of the silicon fabrication process.

Figure 3.16: Data obtained using the greedy arbiter in a 2D array of I&F neurons. In (a), a low constant current injection makes the neurons fire at moderate frequencies, with cross-chip variations due to mismatch. (b) The firing rate escalates with the increase in current injection, thus increasing the total spike count per second (shown on top). The maximum frequency of spikes that can be received from the chip has an upper limit due to the AER bus bandwidth. The problem arising due to the unfair arbiter becomes evident in (c) and (d). With increasing current injection, the arbiter services one half (c) of the array much faster, but completely ignores the other.
For even higher currents, just one fourth (d) of the array could report any activity. For all panels, the output frequencies were clipped at 1 kHz.

This is a random mismatch problem, without any particular directional bias. The average spike count per second is shown on top of each plot. When the injection current is increased (from Fig. 3.16(b) to Fig. 3.16(c)), the spike count reaches a maximum of ∼5.8k spikes/sec. As the total activity approaches this maximum value, the uniformly distributed firing pattern breaks up into local regions of much higher activity. The arbiter randomly chooses one half of the arbiter tree and completely ignores requests from the other (Fig. 3.16(c)). As the lower half is never acknowledged, no activity is ever reported from there. In a 2D array of neurons, arbitration has to be done along both the X- and Y-axes, one after the other. Here, the biased nature of the unfair arbiter is observed only along the Y-axis, the one arbitrated first in this chip. Hence, for high data volumes the arbiter breaks the array into two horizontal halves. For even higher current injection (Fig. 3.16(d)), the arbiter randomly chooses one half of the previously selected region. As we kept increasing the current (data not shown), the arbiter acknowledged ever smaller portions of the chip. After the activity was restricted to one horizontal row, the symmetry along the X-axis was broken as well; finally, only one neuron firing at ∼10 kHz was observed. Such a huge firing frequency from a single neuron is not feasible for biological (and neuromorphic) systems, but a frequency of 1 kHz is not uncommon for neurons communicating sensory signals from a silicon retina or a silicon cochlea. In such cases, a sudden burst of activity from a local pool of neurons would completely suppress all information from other regions of the network.

3.6.3 Fair arbiter

An improved version of the arbiter was again proposed by Boahen (2004a), derived through the rigorous formalism of CHP and HSE. The circuit, shown in Fig. 3.17(a), is described in the paper itself. The philosophy behind fair arbitration is: a daughter node that is not requesting is not visited, and a daughter node that makes another request is not revisited until the entire tree has been serviced. Comparing Fig. 3.17(a) to Fig. 3.15(a) shows that an ingenious alteration in the connectivity of the ri signal (the acknowledge from the parent) is responsible for the change.

Figure 3.17: (a) The fair arbiter designed by Boahen (2004a). This is faster than Martin's scheme shown in Fig. 3.13(a). (b) The arbiter does not show any bias toward the previously selected daughter.

Again, we analyze the ping-pong diagram, starting with two simultaneous requests from d1 and d2. After d1 is acknowledged, it resets its request (d1r), which results in the immediate reset of its acknowledge (d1a). This leads to d2 being acknowledged (d2a goes high). As before, we consider d3 putting up a request in between (at t1), and d1 making a second request at t2. Though input d3 always forces m2r high, arbiter Am2 was never acknowledged in the unfair arbiter (see Fig. 3.15(b)). Here, when d2r is reset, m1r and m1a are also taken low. This allows arbiter Am2 to get acknowledged, setting m2a high and subsequently d3a. The pending request d1r is acknowledged only after m2a is reset (t4).
This forces the arbiter to be fair, checking all nodes before servicing the same one again. Comparing the ping-pong diagrams, it is also evident that both the greedy (Fig. 3.15(b)) and the fair (Fig. 3.17(b)) arbiter complete one full request-acknowledge cycle by t2, which is less than t′2 in Martin's design (Fig. 3.13(b)). This is possible by keeping the request-acknowledge cycle local.

Experimental verification of the fair arbiter

We verified the performance of the fair arbiter in a different chip, using a method similar to that described for Fig. 3.16. This chip also contains a 32-by-32 array of I&F neurons that can be made to fire by injecting current. As shown in Fig. 3.18, a moderate injection current resulted in low firing frequencies (Fig. 3.18(a)) and progressively higher injection increased the frequencies of all neurons (Fig. 3.18(b)). The total spike count per second escalated until the bus saturated at around ∼5.8k events per second. Though the effect of random mismatch is visible in the firing patterns in all the plots, notice that the arbiter performs in an unbiased manner even for higher injection currents (Fig. 3.18(c)): it lets all neurons fire at their preferred frequencies. In Fig. 3.18(d), however, we see that even higher injection currents smooth out the mismatch effects. This is because the weaker neurons are firing at higher frequencies while the stronger neurons (that already had high firing rates) are no longer allowed to fire faster.

Figure 3.18: The fair arbiter performs much better in a task similar to that of Fig. 3.16. Current injection increases the average firing rate and also the total output spikes from the chip, as shown in (a) and (b). In panels (c) and (d) the maximum AER bandwidth limit is reached. However, neurons from all parts of the 2D array are acknowledged by the arbiter, which is demonstrated by the uniform activity. The neurons that showed high firing rates in (c) are restricted by the latency of the arbiter in (d), which lets the weaker neurons get acknowledged.

The unbiased selection of the weaker neurons is the most important advantage of the new arbiter.

3.7 Decoder

The log N bit input data to a receiver chip is decoded to produce a one-hot code of N bits. The decoder design used in this project has changed considerably from that of many previous generations of neuromorphic chips. Two main issues are addressed here:
1. Nonuniform delay in decoding different address bits.
2. Latching of the address bits.

3.7.1 Delay

As shown in Fig. 3.19, a simple decoder for an N-bit address space is made of 2^N NAND gates, each having N inputs. A very efficient method of automatically laying out the decoder, developed in Mahowald (1994), was used in various neuromorphic chips (Boahen, 2000; Indiveri, 2002, etc.). The main problem with this type of decoder is the large and unmatched delay between decoding certain addresses in succession. Two different kinds of delay can be identified for this circuit: the wire delay and the gate delay. In a 2D address space, each output of the decoder (the NAND gates) has to drive 2N gates and a long metal wire. This adds considerable capacitance, and a delay while changing states. Though of considerable magnitude, this delay does not vary for different input addresses.
The second source of delay arises from discharging the internal nodes of the NAND gates. As shown in detail in Fig. 3.20, each NAND gate is made of N parallel pMOS transistors as pull-up and N nMOS transistors in series as pull-down. Only one of the 2^N NAND gates receives high signals on all of its N inputs, driving its output to a low value; this gate corresponds to the correct address. As the output node changes state from high to low (H → L), all the stacked internal nodes (parasitic capacitances C1-Cn) have to be discharged as well. However, the status of the internal nodes before the correct address arrives depends purely on the previous address. Depending on the number of internal nodes already discharged, the total time taken by the output to change its state can vary widely.

Figure 3.19: Traditional decoding scheme for an N-bit address space. 2^N NAND gates, each having N inputs, are connected to half of the 2N address lines.

Let us analyze the delay in a 2-input NAND gate with inputs A0 and A1. Suppose A1 is at a high state (H) and A0 at a low one (L); hence the output is high and C1 is discharged to ground. When A0 changes from L → H, the delay of the output transition (from H → L) will be controlled by the discharging time of C0 and CLoad, given by:

τHL = (R0 + R1)(C0 + CLoad),   (3.5)

where Ri is the equivalent resistance of the i-th nMOS transistor. On the other hand, if the previous input had both A1 and A0 low, both C1 and C0 have to be discharged (along with CLoad) simultaneously:

τHL = (R0 + R1)(C0 + CLoad) + R1 C1   (3.6)

Though this difference is minimal for a 2-input NAND gate, it becomes a limiting factor as the number of inputs (N) increases. Suppose the present input to a 5-bit decoder is 11110, which produces an output transition only for the NAND gate having A4 A3 A2 A1 ~A0 as input. It should be noted that, for this input, the NAND gate corresponding to A4 A3 A2 A1 A0 already has 4 of its 5 internal nodes discharged to ground. If the next input is 11111, this NAND gate needs to discharge only the remaining node, producing a very fast H → L output transition. On the other hand, if the previous input was 01111 and the next is 11111, the same NAND gate has to discharge all 5 internal nodes through their respective resistors. This creates a big difference in delay between a 11110→11111 and a 01111→11111 decoding. Various other delay conditions can arise in this process, and they become more critical as the number of inputs increases. Hence, it is important to latch the decoded data before sending it to the sensitive analog core. These decoder delays are only important when the NAND gates make an H → L transition; the L → H transition is always uniform and fast, carried out by any of the parallel pMOS pull-up transistors.

Figure 3.20: Details of an N-input NAND gate showing the series of stacked nMOS transistors in the pull-down path. Each transistor, when turned on, can be considered a resistive element (R1-RN). All the internal nodes have parasitic capacitors (C1-CN) that have to be discharged during a high-to-low transition at the output.

Predecoder

Using a predecoder is a popular method of matching the decoder delays in large digital memories. The basic idea behind a predecoder is to break up the large NAND gates described before into a tree of smaller NAND gates that have less delay mismatch.
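Before turning to the predecoder in detail, the delay asymmetry of Eqs. 3.5 and 3.6 can be made tangible with representative numbers. The element values below are hypothetical, chosen only for illustration:

# Hypothetical element values (not measured on the chips described here):
R0 = R1 = 5e3          # equivalent on-resistances of the nMOS transistors (ohm)
C0 = C1 = 5e-15        # internal parasitic capacitances (F)
C_load = 50e-15        # output load capacitance (F)

# Eq. 3.5: C1 was already discharged by the previous input
tau_fast = (R0 + R1) * (C0 + C_load)

# Eq. 3.6: C1 has to be discharged as well
tau_slow = (R0 + R1) * (C0 + C_load) + R1 * C1

print(f"tau_HL: {tau_fast*1e12:.0f} ps (Eq. 3.5) vs {tau_slow*1e12:.0f} ps (Eq. 3.6)")
# the extra term (R1*C1) is small for a 2-input gate, but one such term
# accumulates for every additional stacked transistor as N grows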
A predecoder uses small NAND gates (usually with 2 to 3 inputs) to decode different parts of the address word in parallel. The word is generally broken into two (MSBs and LSBs) or three parts. The outputs of the predecoders then go through another NAND gate stage, combining the MSB and LSB results to form the final decoding stage (Keeth and Baker, 2000). Fig. 3.21 (left) shows the scheme for a 5-bit address space (A0 to A4). Instead of the 5-input NAND gates, the LSBs are decoded by a 2-to-4 bit decoder that uses four 2-input NAND gates, and the MSBs are decoded using eight 3-input NAND gates. The final decoder stage consists of 2-input NAND gates combining the outputs of the LSB and MSB decoders. As the design does not use NAND gates bigger than the 3-input ones, the delay mismatch discussed before is largely reduced. More than one predecoder stage is used if the input address space is wider. Fig. 3.21 shows the decoding schemes for 5- and 9-bit address spaces, both using a single predecoding level.

Figure 3.21: A pre-decoding scheme for 5- and 9-bit address spaces. The predecoder decodes different parts of the address space (say LSBs and MSBs) in parallel and combines them in a final decoding stage. The 5 bits (left) are broken into 2- and 3-bit decoders, whereas the 9 bits (right) are broken into three decoders of 3 bits each. In order to reduce the problem of unequal delay, no NAND gate bigger than 3 inputs is used anywhere. The subsequent NAND gate stage generates the final result.

Fig. 3.22 shows how the physical layout of the predecoder is done. Here, the data is first latched before being sent to the decoder. The latching scheme is described in the next section.

3.7.2 Address Latching

In a robust asynchronous pipeline, address latching should not require any timing assumptions about internal and external delays. However, the validity of the data to be latched is not obvious: it depends on the assumption of matched propagation delays for the data and control paths. In the AER system described here, the problem of latching valid data becomes prominent in the Receiver handshake block (see Fig. 3.5), which receives data from off-chip metal wires. A common practice to avoid the requirement of matched delays is to latch the input data with a delayed version of Dr (or Le). However, this requires adding a delay only on the rising edge of Dr using external bias voltages, which we intend to avoid. Also, adding a worst-case delay to increase reliability would slow down the communication cycle unnecessarily. To tackle problems of this sort, delay-insensitive data encoding has been used extensively in the asynchronous design community (Martin and Nystrom, 2006). Dual-rail data is one such method, in which 2 wires, bit-true (bt) and bit-false (bf), are used to carry a signal. Figure 3.23 shows the scheme, where 2N wires are required to carry N-bit data. In contrast to bundled data, where each wire carries a signal, this added overhead provides a big advantage where delay-insensitive design is necessary.

Figure 3.22: Pre-decoder cell placement in the chip layout. The pre-decoder and the decoder NAND gates are physically separated for ease of layout.

In the dual-rail configuration, the data itself carries a data-valid signal.
In Fig. 3.24 the entire input stage of the receiver is shown. The C-element and the latch can be identified as the Tx handshake block shown in Fig. 3.5. The output of the C-element, Dr (same as Le), when high, latches the available data. The output of the latch, combined with Dr, forms the dual-rail data, each bit having a bt and a bf component. These lines can be used directly as inputs to the decoder (a 2-to-4 decoder, in this case). When Dr is low, any data passing through the transparent latch is transformed into the 'invalid' (00) dual-rail state. If even one of the dual-rail bits is in the 00 state, it turns all the outputs of the decoder to zero. The rightmost structure, generating the ∼Pa signal, is a part of the wired-OR gate described in Sec. 3.4.1. The Dr signal is delayed by an amount equal to that of the data lines to generate the Pr signal used for the active pull-up of the wired-OR gates.

Figure 3.24: Part of the AER receiver implementation. All relevant control path and data path signals are shown (see Fig. 3.5 and Fig. 3.6). The dual-rail data representation, internal to the chip, is marked. The data is latched using the control signals and decoded using a pre-decoder scheme (see Fig. 3.21). The wired-OR gates (see Fig. 3.7) on the right of the 2D array are used to generate acknowledge signals for the control path.

3.7.3 Receiver Synapse Select

For a 2D array of neurons, shown in Fig. 3.25, the X- and Y-decoder data have to be combined to generate the final signal R that selects the correct receiver unit. The safest way to generate R is to use a state-holding C-element with Qx and Qy as inputs. However, as the synapse circuit is repeated several times on chip, it is advisable to keep it as small as possible. Three other methods to generate R are shown in Fig. 3.25. Driving a NAND gate with QX and QY is the next obvious solution (a); however, the structure in (b) requires one transistor less than the NAND gate. It can be considered as a static NAND that uses a constant pull-up current supplied by a bias. However, the static pull-up introduces problems of mismatch among synapses and makes the width of R highly variable. This results in more variability in the following analog circuits. The pull-up can instead be actively driven by the Pr signal. The structure in (c) requires just two transistors and is commonly used in large memory arrays (DRAMs). The pull-down path can be driven either by a bias or by the active Pr signal, as before. All the individual acknowledge signals (R) are collected together using a wired-OR, similar to Fig. 3.7, to generate a global acknowledge (Pa) driving the communication channel. However, to reduce the capacitive load arising from thousands of pixels, we first do a row wired-OR and then a column wired-OR on the row outputs (see Fig. 3.24). A delayed version of the signal Pr is used as the active pull-up for the wired-OR.

Figure 3.25: The x- and y-decoders generate a one-hot code to select a particular synapse in the 2D array (left). This can be done using various Synapse Select circuits (right). From left to right, the circuits use less area, which is a major deciding factor in the implementation of large synapse arrays.

3.8 Conclusions

The Address-Event-Representation (AER) protocol uses time-division multiple access for communication between neuromorphic chips. It exploits the five-decade difference in bandwidth between a neuron (hundreds of Hz) and a digital bus (tens of megahertz), which enables us to replace thousands of dedicated connections with a handful of high-speed metal wires.
Following the pioneering work done by Sivilotti (1991) and Mahowald (1994), AER has become the most prominent method of pulse-coded data transfer. Since then, there have been significant improvements in its design (Boahen, 2000, 2004b; Merolla et al., 2007) to accommodate large multi-chip neuromorphic systems.

Here I discussed the basic formulation of the AER scheme and the method to build an asynchronous communication channel using this protocol. As introduced in Boahen (2000), I showed how pipelining the communication channel increases its throughput, and elaborated on the design aspects of the pipeline. The handshaking cycles in an asynchronous pipeline can be expanded using the HSE formalism developed by Martin et al. (2003). This formal approach to AER circuit design is in contrast to the heuristic method used in many previous generations of neuromorphic chips. I described how to build a robust AER communication system, independent of any external bias. Apart from the ease of usability, the reliability of the system also showed improvement: the problem of data loss observed by Chicca (2006) during high-frequency activity could be completely eliminated.

In this chapter, I also discussed the design of the individual combinational circuit blocks. In particular, I focused on the arbiter and the decoder design. The arbiter, being an integral part of an asynchronous transmitter, has to be optimized for both speed and fairness. I described the design basics and presented data from different chips to point out the improvement in arbiter design that was proposed in Boahen (2004a). This is an important enhancement over the problems observed in Lichtsteiner and Delbrück (2005), where increased activity in a region of the chip restricted all the AER traffic to that part. The dual-rail data representation used in the decoder design is another step toward adding robustness to the AER communication system.

Chapter 4

Circuits for synaptic plasticity

4.1 Introduction

Models of learning and classification using networks of spiking neurons have recently become very popular in the theoretical neuroscience community (Kempter et al., 1999; Natschläger and Maass, 2002; Fusi and Senn, 2006). However, there are very few VLSI implementations of spiking neurons that exhibit plasticity and learning. From the early efforts (Diorio et al., 1996) to the very recent ones (see Giulioni et al., 2007; Mitra et al., 2008), different learning rules, and a wide variety of silicon circuits implementing them, have been tested.
Among the various technological constraints to be considered in designing such circuits, efficient long-term storage of the synaptic weight is of foremost necessity. Approaches range from technology-specific floating gates (P. Hasler et al., 1999; Häfliger and Rasche, 1999; Shon et al., 2002) and digital methods like on-chip SRAMs (Arthur and Boahen, 2006; Schemmel et al., 2007) or external look-up tables (Vogelstein et al., 2003; Wang and Liu, 2006) to large multi-level analog storage (Häfliger, 2007). Storage of binary synaptic weights on silicon is also reported in Bofill-i Petit and Murray (2004); Indiveri et al. (2006a). In this work, the circuits implementing the plastic synapses store binary weights as stable states of an amplifier. Here I show how the design was carried out to optimize for power and area. The improved current-mode global feedback signal, from the neurons to the synapses, also makes the circuit robust against signal interference. Though motivated by the theoretical work proposed in Brader et al. (2007), the design of the synaptic circuits is not restricted to incorporating only that specific learning rule. Various bias parameters can be tuned to modify both the functionality and the network architecture to accommodate generic (like STDP) or even specific learning rules (Gütig and Sompolinsky, 2006). In the next chapters I will show the performance of the silicon circuits mostly in the context of the particular learning rule of Brader et al. (2007). Another silicon implementation that meets the same theoretical requirements is shown in Badoni et al. (2006).

4.2 The IFSL family of chips

All the chips of the IFSL family designed during this project share a basic architecture. Apart from the AER interface circuitry, they consist of arrays of analog integrate-and-fire (I&F) neurons and dynamic synapses. Here we describe in detail the IFSL-v2 chip, which was used for most of the experimental results shown in the next chapters. The fundamental building block of the chip is its analog core, consuming more than 80% of the silicon area. Along with the neurons, there are three kinds of silicon synapses: plastic excitatory (P), non-plastic excitatory (E) and non-plastic inhibitory (I). There are 16 neurons, 1920 plastic synapses and 128 non-plastic ones. The number of synapses connected to a neuron can be configured, depending on the requirements of the experiment. The chip was fabricated using a standard 0.35µm CMOS technology, and occupies an area of 6.1mm2. It can be used to accommodate generalized feedforward learning networks consisting of an input and an output layer. It also has nearest-neighbor connectivity among neurons and a global inhibition for implementing more complex networks. The architecture, as shown in Fig. 4.1(a), is used for the learning network described in Brader et al. (2007) and in Fig. 2.4(a). The post-synaptic neurons behave as the output units of the network, while the plastic synapses receive spikes from the input layer. The chip layout in Fig. 4.1(b) shows the physical dimensions and the placement of the circuit elements. In the following sections the circuits constituting the synapses and parts of the neuron will be described. The dynamics of the synapses are controlled by both pre-synaptic and post-synaptic signals.
Each synapse receives spikes at its pre-synaptic terminal (spike inputs to the chip) and control signals from the post-synaptic neuron (the silicon neuron it is connected to). We will therefore divide the circuits associated with a synapse into pre-synaptic and post-synaptic modules. The pre-synaptic module consists of five different blocks: the AER interface, the Pulse Extender, the weight update, the bistability and the EPSC (excitatory post-synaptic current) generating block. The post-synaptic module consists of the I&F soma, the pulse integrator and a few current comparators. We also implement a method to modify the number of synapses connected to a neuron by placing a multiplexer between the pre-synaptic and post-synaptic modules.

Figure 4.1: a) Cartoon of a neuron with its synapses. In the circuit implementation, the synapse is broken into a pre-synaptic and a post-synaptic module. b) Layout of the chip along with the floor-plan for the soma and the synapse circuits. The bottom zoomed-out image shows one synapse with all the internal blocks of its pre-synaptic module (see Sec. 4.3).

The same learning rule was implemented in the LANN¹ family of chips designed by Giulioni et al. (2007), where they also used a part of the circuits developed during this project. The basic difference between the LANN and the IFSL chips is their pre-synaptic circuits. In contrast to the IFSL chips, the LANN chips use a dedicated digital read-out to monitor the state of every silicon synapse. Though this is useful during the characterization of the synaptic array, the area overhead (3219µm2 per synapse) is more than double that of the IFSL-v2 chip (1400µm2 per synapse, see Fig. 4.1(b)). The LANN chips also used constant current pulses when generating the EPSC, without incorporating the temporal dynamics observed in biological synapses.

¹LANN stands for Learning Attractor Neural Network.

4.3 The pre-synaptic module

Instances of this module cover the largest section of the chip, one being necessary for each silicon synapse. Due to the large number of on-chip synapses required for better classification performance (see storage capacity in Sec. 2.4), the pre-synaptic module should be designed with the minimum silicon area possible. The primary inputs to a pre-synaptic module are the decoded data from the X and Y address decoders. They are passed through an AER interface circuit (see Sec. 3.7.3) to select a single synapse in a 2D address space and stimulate the selected synapse with a digital pulse, called a pre-synaptic event. The pulse received by the synapse at the output of the AER interface has a typical width of 800-1000ns. A Pulse Extender is used to extend this to a few milliseconds, which is necessary for the generation of the post-synaptic current. At every pre-synaptic event, the synaptic weight is updated in the weight update block and an excitatory post-synaptic current (EPSC) is produced, depending on the present synaptic weight. In between pre-synaptic events, the weight is restored toward one of its two stable states (bistability block).

4.3.1 The pulse extender block

In order to mimic the slow dynamics and time constant of the post-synaptic current, a wide digital input pulse (∼1ms) is essential for the EPSC generating block. This pulse is used to produce a post-synaptic current (IEPSC) of comparable rise time.
The pre-synaptic input (∼1µs), coming from the AER interface, is passed through a Pulse Extender (PE) that extends its width to a desired value of a few milliseconds. Moreover, the narrow output pulse coming from the AER interface can vary by nearly 40% due to delays in the communication cycle, decoder irregularities and unmatched NAND gate delays. The AER output pulse, if fed directly to the sensitive analog core, would create a high degree of mismatch between post-synaptic currents even if the synaptic weights were the same. The pulse extender also helps in minimizing this effect. The PE should be activated at the positive edge of the input pulse but keep the extended pulse width independent of variations in the input. Considering the high synaptic density required, it also has to be small enough not to increase the area of the pre-synaptic module appreciably. This rules out popular digital circuits like the monostable multivibrator, which has the required functionality but occupies a large area, consisting of comparators and flip-flops.

Figure 4.2(a) shows a simple method of extending a voltage pulse (Vin). The input activates transistor Mp1 and rapidly discharges the node Vc for the duration of the pulse. The node then slowly charges up through Mp2 with a small leak current. The inverter at the capacitive node changes state when the voltage Vc crosses its inverting threshold. Mp2 should be weakly biased to keep the time constant of Vc in the millisecond range. Even though this produces an output pulse (Vpulse) of larger width than the input (Vin), the inverter can draw a large amount of current due to its slowly varying input (Vc). If the input pulse width is negligible compared to the intended output, the width of Vpulse depends only on the charging time and not on Vin. A starved inverter, which puts an upper bound on the current, is used to limit the inverter current (to within 100nA) during this slow change in Vc. Transistors Ms1,s2 are part of the EPSC block, which uses Vpulse to produce the Ipulse current necessary for IEPSC generation.

Figure 4.2: Schematic of two different circuits for the PE (Pulse Extender) block. A part of the EPSC block is shown within the dotted line. a) The narrow input pulse Vin generates a wide voltage pulse Vpulse. The output is used to produce the current pulse (Ipulse) within the EPSC block. b) A wide current pulse is produced directly in the EPSC block, using fewer transistors.

It is important to note that the extended voltage pulse is required only to produce the Ipulse current pulse for EPSC generation. Instead, a current pulse of uniform width could be generated directly, simplifying the design. In the different PE circuit shown in Fig. 4.2(b), a transistor and a bias voltage are removed, saving a considerable amount of area. In this case, Vc goes high when Vin is high and then slowly discharges through Mp2. Ms1 in the EPSC block turns the current Ipulse on when Vc is high, but the magnitude of the current is limited by transistor Ms2. As long as the decaying voltage Vc is above the bias Vh, the gate-source voltage of Ms2 controls the magnitude of the current Ipulse, keeping it constant. When Vc goes below Vh, the current starts to decay exponentially, producing a sharp-edged current pulse.

The simulation results for the circuit in Fig. 4.2(b) are shown in Fig. 4.3. The top panel shows the input voltage pulses. Three pulses of 500ns width are shown in gray and one of 900ns width in black. The input pulses of the two different widths are indistinguishable when plotted on a millisecond time scale. The bottom panel shows the current Ipulse in black and gray in response to the respective inputs. Even though the wider input (900ns) is nearly twice the narrower one (500ns), the current output shows no noticeable difference. Voltage Vc, plotted as a dashed line, shows a sharp rise and a slow decay. The output Ipulse is high as long as Vc is greater than the constant voltage Vh (plotted as a dotted line) and comes down rapidly when Vc goes below Vh². Two consecutive pulses in the later part of the simulation add up to form a current pulse wider than that for a single pulse, whose width also depends on their inter-spike interval. This is a very desirable property of the circuit that demonstrates its linearity.

Figure 4.3: Simulation results of the circuit shown in Fig. 4.2(b). The top panel shows two different inputs having pulse widths of 500ns (gray) and 900ns (black), overlapped on each other. In the bottom panel, we plot the respective output current pulses in gray and black. Though the input varies by nearly 100%, the widths of the current pulses are indistinguishable.

²As seen from the simulation, the decay in the current (Ipulse) starts a little before Vc crosses Vh. This is because the source voltage of Ms2 (Vs) has to rise to accommodate the current Ipulse through Ms1, reducing the Vgs of Ms2.
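The qualitative behavior of Fig. 4.2(b), a rail-to-rail slam of Vc followed by a slow decay with Ipulse gated by the comparison against Vh, can be captured in a few lines of Python. This is a behavioral sketch only: the decay is approximated as linear (a weakly biased Mp2 acts roughly as a constant-current leak) and all numerical values are assumptions, not chip parameters:

def pulse_extender(n_steps, dt, spike_steps, t_ext=1.5e-3,
                   vdd=3.3, vh=0.8, i_on=100e-9):
    # Behavioral model of the PE in Fig. 4.2(b): each narrow input pulse
    # slams Vc to the rail; Vc then ramps down (constant-current leak
    # through Mp2) and Ipulse stays at i_on while Vc > Vh.
    vc, i_pulse = 0.0, []
    spikes = set(spike_steps)
    for k in range(n_steps):
        vc = vdd if k in spikes else max(0.0, vc - vdd * dt / t_ext)
        i_pulse.append(i_on if vc > vh else 0.0)
    return i_pulse

# a 500ns and a 900ns input produce the same ~1ms current pulse, since
# the width is set by t_ext and Vh rather than by the input width
trace = pulse_extender(40000, 1e-7, spike_steps={100})

Two closely spaced spikes simply re-trigger the ramp, producing a wider pulse, which reproduces the summation behavior seen in the simulation.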
4.3.2 The weight update block

The synaptic weight can be modified at every pre-synaptic input spike, depending on the post-synaptic control signals. The updates are instantaneous jumps of the synaptic weight during the narrow input pulse (width ∆t). The jumps can be of positive or negative polarity with the same magnitude, or null. In Fig. 4.4 we show the weight update block, which increases or decreases the synaptic weight (node voltage w) by sourcing charge to or sinking charge from the capacitor Cw. During a pre-synaptic input, MsU and MsD receive the complementary digital signals ∼VP and VP respectively, both transistors behaving as switches. This initiates a change in the synaptic weight. The polarity of the change depends on the values of IUPm and IDNm, mirrored from the post-synaptic module. Transistors MU and MmU, sharing the gate voltage VUP, mirror the current IUP produced in the post-synaptic module. Similarly, MD and MmD are used to mirror IDN. During pre-synaptic spikes VP goes high and ∼VP goes low, turning on both switches simultaneously. In between spikes the switches are off, forcing the current mirrors into an incomplete configuration; hence no current flows in or out of the capacitor. At every pre-synaptic spike:

∆w = (IUP − IDN)·∆t/Cw    (4.1)

Figure 4.4: The weight update block in the pre-synaptic module modifies the node w at every pre-synaptic pulse (VP). The node charges, discharges or holds its state depending on the currents IUPm and IDNm mirrored from the post-synaptic module. The current mirrors (MD, MmD and MU, MmU) are inactive when the switching transistors MsD and MsU are off, i.e., during inter-spike intervals.

In Section 4.4 we will describe how the currents IUP and IDN, produced in the post-synaptic module, can take either the value Ijump or zero. Voltages VUP and VDN, corresponding to the gates of transistors MU and MD, are broadcast to all synapses connected to the same neuron. This lets the post-synaptic module mirror the control currents to all its synapses. It should be noted that only one of the two currents can take the value Ijump at a time (see Sec. 4.4), while both can be zero simultaneously. This restricts ∆w to +Ijump∆t/Cw, −Ijump∆t/Cw or zero. This block thus performs a logical AND function between the pre- and post-synaptic signals. It lets a current IUP charge the synapse when:

VP = 1 and IUP = Ijump    (4.2)

It discharges the weight with IDN when:

VP = 1 and IDN = Ijump,    (4.3)

and it does not make a change when either VP is zero or IUP and IDN are both zero.
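Eqs. 4.1-4.3 translate directly into software. The sketch below is illustrative only; the values of Cw, ∆t and Ijump are assumptions chosen to give round numbers, not measured chip parameters:

def weight_jump(w, i_up, i_dn, dt=1e-6, cw=100e-15):
    # Eq. 4.1: instantaneous jump of the weight voltage at a pre-synaptic
    # spike. At most one of i_up/i_dn equals Ijump; both may be zero.
    return w + (i_up - i_dn) * dt / cw

I_JUMP = 10e-9                       # ~10nA, see the bistability discussion
w = weight_jump(0.30, I_JUMP, 0.0)   # eligible upward jump: w -> 0.40 V
w = weight_jump(w, 0.0, 0.0)         # neither condition met: w unchanged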
4.3.3 The bistability block

One of the prescriptions of the learning rule is that the synaptic weight should reach either of two stable states in the long term. During inter-spike intervals the bistability circuit should refresh the weight by pushing it toward one of these two states, depending on its previous state. In the silicon implementation, the capacitor storing the synaptic weight (Cw in Fig. 4.4) is always actively charged or discharged with a current much smaller than Ijump. This is done by an amplifier in positive feedback configuration, biased with a small subthreshold current Ir, as shown in Fig. 4.5(a). The positive feedback amplifier has two stable outputs corresponding to its supply rails VH and VL. Depending on the value of w after the last jump in synaptic weight, the node is charged or discharged to one of the two rails. If w was left at a value below θ, the node is slowly pulled downward, and vice versa. The amplifier bias current Ir determines the rate at which w is driven to the supply rails (also known as the slew rate). Unlike in the theoretical model, the bistability circuit is active not only during inter-spike intervals but also during a pre-synaptic spike. However, as the magnitude of Ijump is much larger than that of Ir, the weight update overrides the weak bistable drive. The slew rate can be expressed as:

dw/dt = +Ir/Cw,  if w > θ and w < VH
dw/dt = −Ir/Cw,  if w < θ and w > VL    (4.4)
dw/dt = 0,       otherwise

Figure 4.5: a) A wide-range p-type-input amplifier is used in positive feedback configuration for the bistability block. The output (w) is slowly driven by the bias current Ir to one of the two supply rails (VH or VL). b) The bistability block is connected to the weight update and the EPSC blocks. The choice of the amplifier type and the magnitude of the supply rails in the design of the bistability block also depend on the functionality of the connected circuits.
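A discrete-time rendering of Eq. 4.4 helps in reasoning about the interplay between jumps and drift (an illustrative sketch; θ, the rails and Ir/Cw are assumed values consistent with the discussion that follows):

def bistable_drift(w, dt, theta=0.3, vh=0.5, vl=0.1, i_r=1e-9, cw=100e-15):
    # Eq. 4.4: between spikes the positive-feedback amplifier slews w
    # toward VH or VL with slope Ir/Cw, depending on its side of theta.
    step = (i_r / cw) * dt
    if theta < w < vh:
        return min(vh, w + step)
    if vl < w < theta:
        return max(vl, w - step)
    return w

# with Ir = 1nA and Cw = 100fF the slew rate is 1e4 V/s: an order of
# magnitude weaker than the jump during the ~1us spike, yet fast enough
# to restore w to a rail well within a typical inter-spike interval
w = bistable_drift(0.40, dt=1e-5)    # 0.40 V -> 0.50 V = VH after 10 us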
The choice of the supply rails and the amplifier type plays an important role in the design. As shown in Fig. 4.5(b), the output w is connected to an nMOS in the EPSC block (transistors in gray). A hand drawing portraying the jumps and the bistable drive of the voltage on node w is shown in Fig. 4.6(a). To simplify the circuit description, the lower transistor (MP in Fig. 4.5(b)) is replaced with a switch. The corresponding input currents to the EPSC block (Ipulse) are also shown in the lower panel. The threshold θ is set to divide the range of w into two equal halves, such that transitions to the higher or lower state are equally likely. In order to keep Ipulse within subthreshold values, VH cannot exceed 0.5V. If VL is set to 0.1V, the output of the amplifier (w) is always within a narrow range between 0.1-0.5V. A wide-range pMOS differential amplifier is best suited to drive the output to such low values.

From Eq. 4.1 we know that the value of ∆w, determined by Ijump (∼10nA)³, cannot be made arbitrarily small. If W is the difference between the minimum and maximum voltage the node w can take, then W/∆w can be considered as the dynamic range of the synaptic weight. For a balanced transition probability, the theoretical requirement is a dynamic range of more than ten. However, the narrow voltage range limits the dynamic range to a smaller value. The dynamic range can be increased by raising the value of VH, but that also results in large Ipulse currents crossing the subthreshold limit. An additional diode-connected (gate-drain shorted) transistor MD, placed between Mw and the switch, can restrict Ipulse to subthreshold values for a larger VH, but not for more than 0.8V.

³The current Ijump has to be more than one order of magnitude larger than Ir, for the reasons described in the last paragraph. Ir, in turn, cannot be smaller than 1nA without causing serious disparity due to mismatch between different synapses. The current Ijump, also mirrored to hundreds of synapses simultaneously, cannot be too small without having mismatch issues of its own. Hence Ijump cannot be less than 10nA.

Figure 4.6: Schematic representation of the effect of the bistability block on the output of the EPSC block (not a simulation). The transistor MP of Fig. 4.5 is replaced with a switch for the sake of simplicity. A typical example of the dynamics of the synaptic weight w is shown in the top panels. a) The nMOS Mw converts the weight into the Ipulse current. The current changes in an exponential fashion depending on the instantaneous w. To keep Ipulse within the subthreshold limits, the range of w is restricted between 0V (VL) and 0.5V (VH). b) The dynamic range can be extended to 0-0.8V (VH = 0.8V) by using a diode-connected transistor (MD). This limits the coefficient of the exponent to a lower value, thus restricting the magnitude of Ipulse from overshooting the subthreshold limit.
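A quick numerical check makes the limitation concrete. With the assumed values used in the earlier sketches (Ijump = 10nA, ∆t = 1µs, Cw = 100fF; placeholders, not measured parameters), the 0.1-0.5V window falls well short of the prescribed dynamic range:

# back-of-envelope check of the dynamic-range limitation
delta_w = 10e-9 * 1e-6 / 100e-15   # single jump (Eq. 4.1): 0.1 V
W = 0.5 - 0.1                      # usable weight range between VL and VH
print(W / delta_w)                 # -> 4.0, well below the required > 10

Under the same assumptions, raising VH to 0.8V with the diode-connected MD improves the figure to 8, which motivates the rail-to-rail alternative described next.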
Alternative design of the bistability block

An alternative approach to the problem is to utilize the entire supply rail (0-VDD) for the bistable amplifier and decouple the node w from the actual weight used by the EPSC block. A full rail-to-rail dynamic range requires the gate voltage of Mw to take binary values without traversing the intermediate analog states. A possible implementation is shown in Fig. 4.7. Here, wD determines the state of the synapse (on or off) and Vlim is used to limit the current Ipulse to within the subthreshold range. The node w is connected to the weight update block and the bistable amplifier as in Fig. 4.5(a), but is separated from the transistor Mw. An obvious method to convert w to wD is to use a high-speed open-loop amplifier behaving as a one-bit analog-to-digital converter. However, the power and area overhead of such an implementation is not only unnecessary but also impractical for the small synaptic circuit. The simple circuit shown within the dotted lines in Fig. 4.7 behaves like a two-stage non-inverting amplifier performing a similar function while consuming much less power. Ma1 behaves like an inverting amplifier with Ma2 as its load. This amplifier turns on only during pre-synaptic spike events (using switch Ma3), reducing the static power dissipation to zero. The dynamic power dissipation through the inverter and the amplifier is also minimal, due to the narrow width of VP. The gain of the amplifier has to be high enough for wD to settle to a digital state within the allowed time period.

Figure 4.7: The positive feedback amplifier with supply rails connected to VDD and ground increases the dynamic range of w. The EPSC block does not take w directly as its input, but is connected to wD, a digital version of w. The circuit within the dashed line is an open-loop amplifier performing as a one-bit analog-to-digital converter, active only during the pre-synaptic pulse VP. The bias Vlim limits the Ipulse magnitude to within the subthreshold limit.

In the simulation results in Fig. 4.8 we forced w to vary over a wide voltage range (top panel) and plotted the corresponding wD and Ipulse. As expected, wD settles to either zero or VDD very fast during a pre-synaptic event (except when w is around θ). The current Ipulse consequently shows a one-or-none nature, exactly what the EPSC block should receive.

Figure 4.8: Simulation result showing w, wD and Ipulse from Fig. 4.7. The synapse receives a regular spike train input that modifies w according to IUPm and IDNm (not shown). Node wD settles to one of the two bistable states during pre-synaptic spikes, depending on the magnitude of w. The current Ipulse has a high subthreshold value only when wD is high at the instant of a pre-synaptic spike.

4.3.4 The EPSC block

The EPSC block generates the final synaptic current, with biologically realistic temporal dynamics and an amplitude dependent on the synaptic weight. The EPSC current is produced at every pre-synaptic event and depolarizes the membrane capacitance, as shown in Fig. 4.9(a). It can be modeled as a controlled current source charging up a capacitance at every input spike (see Fig. 4.9(b)). In models of spiking neural networks, the temporal dynamics of the current source is often ignored; it is generally considered to be a sharp current pulse of known amplitude. In many VLSI synapse implementations, constant current sources activated only for the duration of the pre-synaptic input pulse have been used (Mead, 1989; Fusi et al., 2000; Chicca et al., 2003). However, within the context of pulse-based neural networks, modeling the detailed dynamics of post-synaptic currents can be a crucial step for learning neural codes and encoding spatiotemporal patterns of spikes (Gütig and Sompolinsky, 2006). Various VLSI models of synaptic dynamics have been implemented, from the early works of Mead (1989) and Lazzaro (1994) to more bio-plausible ones in Boahen (1996), Arthur and Boahen (2004) and Farquhar and Hasler (2005).

In theoretical models of synaptic transmission, a pre-synaptic spike is considered to release neurotransmitters that change the membrane conductance by activating the ion channels of the post-synaptic cell. This results in a post-synaptic current. The membrane conductance changes are, in general, modeled as an α-function (Rall, 1967), as an exponential rise and fall (Destexhe et al., 1998) or as a difference of exponentials (Dayan and Abbott, 2001).
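For reference, the difference-of-exponentials kernel can be written down in a couple of lines (an illustrative sketch; the two time constants are assumed values):

import math

def epsc_kernel(t, tau_rise=1e-3, tau_fall=5e-3):
    # Difference-of-exponentials model of the post-synaptic current
    # (Dayan and Abbott, 2001). For tau_rise -> tau_fall it tends to the
    # alpha-function t*exp(-t/tau) of Rall (1967), up to normalization.
    if t < 0:
        return 0.0
    return math.exp(-t / tau_fall) - math.exp(-t / tau_rise)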
A first-order linear filter with equal exponential rise and fall times is a good approximation of the model in Destexhe et al. (1998). Linear filters can be implemented in VLSI with very few MOS transistors when their subthreshold exponential transfer function is exploited. Frey (2000) demonstrated this family of log-domain filters using the exponential behavior of bipolar transistors, and Arthur and Boahen (2004) showed how to use them in silicon synaptic circuits.

Figure 4.9: a) A biological synapse is activated by a pre-synaptic spike that initiates a complex biochemical phenomenon resulting in a post-synaptic current (EPSC). b) It is modeled in silicon as a controlled current source, triggered by a pre-synaptic pulse, whose temporal dynamics follow those of the real EPSC.

The differential pair integrator (DPI) circuit reported in Bartolozzi and Indiveri (2007b) is another suitable approximation of the linear integrator and has been used in a conductance-based silicon synapse. We used the DPI circuit in this work because of its compactness and our familiarity with the design (see Bartolozzi et al., 2006). Appendix C shows a detailed analysis and design of other compact log-domain filters with various tunable parameters. In the DPI circuit shown in Fig. 4.10(a), the steady-state IEPSC is proportional to the current Ipulse. The layout of the DPI circuit covers nearly a third of the area occupied by the pre-synaptic module (see Fig. 4.1(b)). However, as it implements a linear filter (Bartolozzi et al., 2006), it would be possible to use one DPI circuit per neuron that integrates the effect of all pre-synaptic inputs. This is shown in Fig. 4.10(b), where w1 to wn represent the weights of different synapses and S1 to Sn are the switches receiving the corresponding pre-synaptic spikes. A common Ipulse current is generated and fed to the core of the DPI circuit, shown within the dotted line. The part of the circuit inside the dotted line (compare with Fig. 4.10(a)) uses this Ipulse to produce a synaptic current that is a linear summation of all inputs. Though the area of the EPSC block could be greatly reduced with this method, in this work we followed a conservative design approach by using one DPI per synapse, placing less emphasis on synaptic density.

Figure 4.10: a) Schematic of the differential pair integrator (DPI) circuit used for IEPSC generation (Bartolozzi and Indiveri, 2007b). The current is produced only during the VeP pulse and has a magnitude depending on the synaptic weight at node w. One EPSC block is used for each silicon synapse. b) A method of sharing the EPSC block by exploiting the linearity of the DPI circuit. Synapses with different weights (w1-wn) produce a common Ipulse depending on the pre-synaptic inputs (S1-Sn). All inputs have an additive effect on the IEPSC current charging the post-synaptic membrane.
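The linearity that justifies the shared-DPI arrangement of Fig. 4.10(b) is simply superposition: the total synaptic current is the sum of one kernel per weighted input event. A minimal sketch, reusing the epsc_kernel function defined above:

def shared_epsc(t, events):
    # Superposition of weighted kernels, as produced by the shared DPI of
    # Fig. 4.10(b). 'events' is a list of (spike_time, weight) pairs.
    return sum(w * epsc_kernel(t - ts) for ts, w in events)

# two inputs close in time simply add, regardless of which synapse fired
i_total = shared_epsc(6e-3, [(0.0, 1.0), (2e-3, 0.5)])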
4.3.5 Point-neuron architecture

A point-neuron, shown in Fig. 4.11(a), consists of a single node where the currents from all synapses are summed and then fed into an integrate-and-fire soma. This is in contrast to the compartmental model, where the physical position of the synapses is important for their contribution to the post-synaptic depolarization. Unlike the compartmental model, a point-neuron has a single lumped membrane capacitor connected to a common node, and the node voltage is the membrane voltage (Vmem). Similarly, in the silicon implementation, all synapses belonging to the same neuron have independent IEPSC currents that charge up the common Vmem in an identical way, without any spatial bias. Fig. 4.11(b) shows an array of pre-synaptic modules, each representing a plastic synapse, connected to one neuron (represented by the post-synaptic module). It also shows all the circuit blocks forming the pre-synaptic module, with their internal connections. The Spre input on the left is where the pre-synaptic spike is received. Though the synapses are physically placed at different locations, away from the neuron, they show point-neuron behavior as far as the electrical connectivity is concerned. Apart from the common Vmem node, the pre-synaptic modules also share the common learn control signals (VUP and VDN) produced in the post-synaptic module.

Figure 4.11: The point-neuron architecture, shown on top, accumulates pre-synaptic inputs from all its synapses on a single node (Vmem). The corresponding silicon implementation, with numerous pre-synaptic modules connected to a single post-synaptic module, is shown below. All internal blocks of a typical pre-synaptic module, along with the global learn control signals (VUP and VDN), are also shown.
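In behavioral terms the point-neuron reduces to a single capacitor equation. The following sketch (assumed capacitance and threshold; leak and refractory period omitted for brevity) makes the lumped summation explicit:

def membrane_step(v_mem, i_syn, dt=1e-4, c_mem=1e-12, v_thr=1.0):
    # Point-neuron behavior: all EPSC currents are summed onto one lumped
    # membrane capacitance with no spatial bias; the soma spikes and
    # resets to ground when the threshold is crossed.
    v_mem += sum(i_syn) * dt / c_mem
    if v_mem >= v_thr:
        return 0.0, True     # reset, spike emitted
    return v_mem, False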
Though the U P and DN conditions cannot me simultaneously true, they can both be false, resulting in no synaptic modification during a pre-synaptic spike. Figure. 4.12 shows the status of the control signals when the restrictions on νpost for both the above equations are met. 4.4.1 The I&F soma block There has been a number of implementation of integrate and fire circuit for silicon I&F somas (Mead, 1989; Culurciello et al., 2001; van Schaik, 2001). Primarily, all the circuits proposed follow a similar principle of charging a capacitor with the synaptic currents, followed by a positive feedback circuit that rapidly drives the integrated voltage to positive supply, once it crosses a threshold. Subsequently the capacitor is reset to ground. In this work we used the low-power I&F soma described in detail in Indiveri (2003). The spiking threshold, refractory period and spike frequency adaptation can be 4.4. The post-synaptic module (a) 69 (b) Figure 4.13: Expected behavior of I&F block. a) Response during a step input current to the axon. b) Response during Poisson input spikes stimulating the excitatory synapses. independently controlled in this circuit. It also consumes the least amount of power compared to all previous implementations. For an average firing frequency of around 50Hz, the soma consumes few hundred nano-watts of power, ideal for a large scale system consisting hundreds of such elements. The two different outputs of the soma are its analog membrane voltage Vmem and the digital pulse representing an action potential. The cartoons in Fig. 4.13(b) shows the methods of stimulating a silicon neuron. Similar to physiological experiments, the soma, can be either directly stimulated with a step current or via the excitatory synapses. The figure shows expected firing patterns for a step current and for Poisson distributed input at random synaptic location. A pulse integrator connected to the output of the neuron can provide a measure of the average post-synaptic firing frequency (νpost ). 4.4.2 The pulse integrator block In biology, the slowly varying calcium concentration ([Ca]) of the cell is a measure of the average firing frequency of the neuron. However, in silicon implementation, we directly integrate the digital output pulses from the soma to generate an analog signal, voltage or current, for estimating the mean firing rate. In a preliminary design of the integrator, a capacitor was charged by a current pulse for every post-synaptic spike and then discharged linearly during ISIs. The voltage on the capacitor (V[Ca] ) provides an approximate measure of the mean spike frequency. Though easy to implement, this is not a reliable method to obtain the average post-synaptic frequency, νpost . As the charge and discharge phases are independent of each other, the voltage on the capacitor hits one of the supply rails within a short time duration. 70 Chapter 4. Circuits for synaptic plasticity 200 Current(nA) current(nA) 12 8 70Hz 50Hz 40Hz 100 4 0 0 1 2 3 Time(ms) 4 5 0 0 0.2 0.4 Time(s) 0.6 (a) Figure 4.14: [ Simulation of DPI circuit used as pulse intgrator]Simulation of the DPI circuit when used as a pulse integrator. a) Response during a single input pulse shows low pass filter like behavior. b) Uniform pulse train input of different frequencies take the steady state output to different asymptotic values. To avoid this problem, we use a first order low pass filter that produces an output magnitude directly proportional to input frequency. 
A simple linear filter can be designed using a resistor, capacitor combination in parallel, and a an amplifier for the voltage readout. However, implementing a linear resistor in VLSI is not an obvious task and the overall circuit requires a fairly large area. In Sec. 4.4.3 I will discuss the additional problems encountered when using a voltage signal representing νpost . Hence, it is observed that a current-mode pulse integrator is best suited for the purpose. The integrator should generate a current (I[Ca] ) proportional to the average output frequency of the post-synaptic neuron. Different low pass filters (LPF) that can be designed to meet this requirement are shown in Appendix. C. Here we used a differential pair integrator (DPI) as a low pass filter, similar to the one shown in Fig. 4.10(a)) (Bartolozzi et al., 2006). Simulation results in Fig. 4.14(a) shows the response of the integrator to one wide input pulse, and in Fig. 4.14(b) shows the response to regular pulse trains of different frequencies. As desired, the steady state LPF current is a function of the mean νpost and reaches the value in an asymptotic manner. 4.4.3 The dual threshold comparator block Every neuron in the chip has to produce two binary learn control signals (U P and DN ) and broadcast them to all its synapses simultaneously. The average 4.4. The post-synaptic module 71 UP mem DN Figure 4.15: Physical layout of the neuron and synapses require three wires carrying U P , DN and mem signals to run in proximity for a few millimeters on the chip. This can create enough coupling to destroy the integrity of the slowly varying mem, if the binary signals (U P ,DN ) have large voltage swings. post-synaptic firing frequency should be compared with two thresholds to generate the U P and DN signals, as described in Eq. 4.5. In the silicon implementation of a point-neuron, the synapses are physically placed at a large distance, often few millimeters away, from the post-synaptic module. When broadcasting the the learn-control signals, it should be taken care that the signal integrity is not lost in charging the large line capacitance (originating from long metallic lines) and numerous device capacitances (from large fanout). For a point-neuron, the membrane potential (Vmem ) seen by the synapses is the same as that generated in the soma. Hence, similar to the binary signals, Vmem has to be shared by all synapses, as well (see Fig. 4.15). In such cases, shielding is necessary to remove interference between signals with sharp, high voltage swings with the slowly varying ones. In the following sections we will discuss possible usage of voltage or current mode comparators responsible for generating and transmitting these global signals. Voltage mode comparator Let us first consider the mean post-synaptic frequency being represented by a voltage, V[Ca] . To determine the presence of this analog signal between two different threshold, two different circuits can compare the input with two thresholds simultaneously. The outputs of the comparators are combined for a final result. Standard comparators (high gain amplifier) and a digital AN D gates are required to achieve the functionality. However, this would typically require 15 to 20 MOSFETs and significant amount of DC power consumption. On the other hand, if the speed requirements are not very stringent, the same dual-threshold comparison can be achieved with more power efficient and compact circuits. 
4.4.3 The dual threshold comparator block

Every neuron in the chip has to produce two binary learn control signals (UP and DN) and broadcast them to all its synapses simultaneously. The average post-synaptic firing frequency has to be compared with two thresholds to generate the UP and DN signals, as described in Eq. 4.5. In the silicon implementation of a point-neuron, the synapses are physically placed at a large distance, often a few millimeters, from the post-synaptic module. When broadcasting the learn control signals, care should be taken that the signal integrity is not lost in charging the large line capacitance (originating from the long metal lines) and the numerous device capacitances (from the large fanout). For a point-neuron, the membrane potential (Vmem) seen by the synapses is the same as that generated in the soma. Hence, like the binary signals, Vmem has to be shared by all synapses as well (see Fig. 4.15). In such cases, shielding is necessary to remove interference between signals with sharp, high voltage swings and the slowly varying ones. In the following sections we discuss the possible use of voltage- or current-mode comparators for generating and transmitting these global signals.

Figure 4.15: The physical layout of the neuron and synapses requires three wires carrying the UP, DN and mem signals to run in proximity for a few millimeters on the chip. This can create enough coupling to destroy the integrity of the slowly varying mem if the binary signals (UP, DN) have large voltage swings.

Voltage mode comparator

Let us first consider the mean post-synaptic frequency represented by a voltage, V[Ca]. To determine the presence of this analog signal between two thresholds, two different circuits can compare the input with the two thresholds simultaneously, and the outputs of the comparators can be combined for a final result. Standard comparators (high-gain amplifiers) and a digital AND gate are required to achieve this functionality. However, this would typically require 15 to 20 MOSFETs and a significant amount of DC power. On the other hand, if the speed requirements are not very stringent, the same dual-threshold comparison can be achieved with more power-efficient and compact circuits. Here we show a circuit with only 8 MOSFETs that dissipates a very small amount of power, and only when the signal is within the two thresholds (Mitra and Indiveri, 2005).

In Fig. 4.16(a) we show a dual-threshold comparator whose output directly indicates the presence of the input V[Ca] within the two thresholds k1 and k3. This circuit is particularly useful if the lower threshold is near zero (we set k1 = 0.5V). The output swings high to Vdd when k1 > V[Ca] or V[Ca] > k3, and goes low to k1 when k1 < V[Ca] < k3. Transistors M1 and M2 form a basic inverter which conducts a large current (Ix) through the left branch only when V[Ca] is within the two thresholds k1 and k3. This current is mirrored (Ixm) by M3 and M5. In the right branch, Ixm is subtracted from a suitable constant Ip to generate the voltage output (VUP). The voltage Vlim is used to limit Ix to a subthreshold value, thus restricting the power consumption. It should be noticed that Ix (and hence Ixm) is almost zero when V[Ca] is outside the two thresholds, reducing the power consumption to a minimum. A digital switch, controlled by the binary signal C, can turn the comparator off, which forces VUP to go high irrespective of V[Ca].

Figure 4.16: The dual-threshold voltage comparator (a) and its output (b). As the input V[Ca] sweeps from 0 to 1.6V, the output swings from its default high state to a low one only between the two thresholds (0.5V and 1.4V, respectively). The inset shows the increase in the current in the left branch of (a) when V[Ca] is in the 0.5-1.4V range. The horizontal line indicates the constant current (Ip). The maximum current in the left branch is clipped using the Vlim bias, to keep the power consumption minimal.
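Behaviorally the block is just a window comparator with an enable. A compact sketch (illustrative only, following the active-low convention of VUP described above):

def dual_threshold(v_in, k_low, k_high, enabled=True):
    # Window comparator of Fig. 4.16: the output rests at its default
    # high state and drops only for k_low < v_in < k_high; the C switch
    # disables the comparator and forces the default state.
    if not enabled:
        return 1
    return 0 if k_low < v_in < k_high else 1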
A similar dual-threshold comparator compares V[Ca] against k1 and k2 to generate the VDN signal, which, unlike VUP, goes high when in between the thresholds and low when turned off. In Fig. 4.17(a) we plot data from the IFSL-v1 chip showing the voltages VDN and V[Ca] as solid and dashed lines, respectively. Here k2 is set at 1V and k1 at 0.5V, as before (plotted as dotted lines). The voltage VDN goes high only within the right range of V[Ca], but also changes its polarity depending on the status of C. The binary signal C turns the dual-threshold comparator off when Vmem is above Vmth, forcing VDN to zero. As expected, the behavior of the voltage VDN is identical to that of the variable DN in Eq. 4.5. However, it can be noticed that the high voltage swing of VDN couples with the slowly varying V[Ca]. The effect is even worse when VDN and VUP both couple with the slow analog Vmem signal. As shown in Fig. 4.15, these three wires run in parallel and share a large wire capacitance, enough to severely corrupt the behavior of Vmem. Figure 4.17(b) shows such a noisy post-synaptic membrane voltage (Vmem) when the neuron is stimulated with Poisson-distributed spike inputs (Vpre) on its non-plastic synapses.

Figure 4.17: Data from the IFSL-v1 chip. a) The voltage V[Ca] (dashed line) represents the average post-synaptic firing frequency. VDN (solid line) goes high only when Eq. 4.5 is satisfied. The coupling between V[Ca] and the high voltage swings on VDN is evident from the plot. b) Large voltage swings on VDN and VUP severely corrupt the Vmem signal when the neuron is stimulated by Poisson-distributed spike trains (Vpre).

Current mode comparator

From the analysis in the last two sections, it is evident that voltage-mode comparison is not suitable for implementing Eq. 4.5 in a silicon neuron: the digital voltage outputs are prone to couple with each other and with the slowly varying membrane voltage. On the other hand, there are efficient circuits for representing the magnitude of νpost as a current variable (see Sec. 4.4.2). Hence, the output current from the pulse integrator (I[Ca]) can be used for a suitable current-mode comparison. Such comparators are utilized to produce binary currents corresponding to the UP and DN learn control signals. A simple current comparator can be derived from a two-input current-mode winner-take-all (WTA) circuit (Lazzaro et al., 1989). Figure 4.18(a) shows such a WTA circuit having two inputs, two outputs and a bias current IB. If one of the two output branches (say Io1) is connected to VDD and the other (say Io2) to an external load, the WTA circuit reduces to a current comparator (CC) with a specific connection (Fig. 4.18(b)). Here, Ii1 becomes the negative input, Ii2 the positive one, and Io2 behaves as the comparator output, renamed Iout. The output takes a high value if the positive input is larger than the negative one, and vice versa:

Iout = IB, if Ii2 > Ii1; 0, otherwise    (4.6)

Figure 4.18: A two-input current-mode winner-take-all (WTA) circuit can be used as a current comparator. One of the two outputs (say Io1 in the left figure), when connected to VDD, reduces the WTA circuit to a current comparator with the specific connection, i.e. Ii1 as the negative input, Ii2 as the positive one and Io2 as the output.

It should be noticed that if the bias current itself is brought down to zero, the comparator output is always zero, independent of the inputs. Now, let us reduce Eq. 4.5 to suitable current variables. We consider IUP and IDN as binary output currents, and Ik1, Ik2, Ik3 as constant currents representing the thresholds k1-3. As mentioned before, the output current of the DPI circuit (I[Ca]) represents the average post-synaptic firing. If we consider a current Ijump corresponding to the high state of IUP or IDN, Eq. 4.5 can be rewritten as:

IUP = Ijump, if Ik1 < I[Ca] < Ik3 and Vmem > Vmth; 0, otherwise    (4.7)

IDN = Ijump, if Ik1 < I[Ca] < Ik2 and Vmem < Vmth; 0, otherwise    (4.8)
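The comparator primitive of Eq. 4.6 is one line of code (an illustrative sketch):

def current_comparator(i_pos, i_neg, i_bias):
    # Eq. 4.6: a WTA reduced to a current comparator. With i_bias = 0
    # the output is forced to zero regardless of the inputs.
    return i_bias if i_pos > i_neg else 0.0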
Here, Ijump is the constant bias current for the comparator CC1, but CC2 and CC3 receives the output of CC1 as their bias. Hence, CC2,CC3 are active only if I[Ca] crosses the first threshold, Ik1 , otherwise they receive zero bias current, forcing their respective outputs to zero. However, only one of CC2 or CC3 can be active at a given time as the signal C diverts Ijump to either of them, depending on the polarity of Vmem -Vth . These binary currents generated in the post-synaptic module should be broadcast to all synapses belonging to the same neuron. Unlike voltage signals, currents cannot be directly broadcast. As separate wires has to carry each instances of the original current, this would result in hundreds of parallel wires running across the chip. Here, broadcasting currents essentially means mirroring them to the synapses. Figure. 4.19 shows a part of the circuit (MU , MD ) that mirrors IU P and IDN . The corresponding gate voltages produced (VU P and VDN ) are sent over wires shared by all synapses (also shown in Fig. 4.4). Though the learn control signals are binary in nature, switching between zero and Ijump , the subthreshold current jump produces voltage swings of only few hundred millivolts. The small voltage jumps do not produce any noticeable coupling between Vmem and VU P or VDN . However, a normal current mirrors are not efficient enough to handle a fanout of few 76 Chapter 4. Circuits for synaptic plasticity I DN I DN MD MD Figure 4.20: a) General configuration of an active current mirror. b) The circuit used to mirror IU P and IDN to all pre-synaptic modules in the synapse array. hundred currents. The capacitive load on the wire carrying VDN (or VU P ) restricts them from changing fast enough. Active current mirrors were used to mirror the currents to hundreds of synapses connected to the same node. Figure. 4.20(a) shows a generalised configuration of an active mirror and Fig. 4.20(b) shows the specific circuit used here. 4.5 Configuration of synaptic density The number of inputs (synapses) connected to a neuron has important consequences in the classification behavior. As described in Sec. 2.4, the number of patterns learned by a neuron increases as the square-root of its synaptic density. In this model, a neuron, being a binary classifier, categorizes the learned patterns into two different classes. Though, more synapses per neuron is better for individual classification performance, more neurons can be used to classify a large number of patterns. On the other hand, it is often beneficial to configure each neuron as a weak classifier, with a low synaptic density, and use a pool of them to boost the overall performance (Polikar, 2006). Given a fixed area, it is a difficult decision to choose between the optimum number of neurons and the number of synapses connected to each. To meet these varied requirements we provide a flexibility to the hardware device that can reconfigure the synaptic connectivity to modify the density of synapses per neuron. An on-chip multiplexer can connect arrays of synapses to one neuron or the other, depending on the requirement. In Fig. 4.21 four neurons, corresponding synapses arrays and the multiplexer are shown. In default condition, switches p1-p4 are on (1111) and c1-c4 off (0000), each neuron is connected to the synapse array placed right beside it. However, if p switches take a 0101 configuration and c switches the complementary 1010, the s1 ad s2 arrays get connected to n2 and s3,s4 to n4. 
4.5 Configuration of synaptic density

The number of inputs (synapses) connected to a neuron has important consequences for the classification behavior. As described in Sec. 2.4, the number of patterns learned by a neuron increases as the square root of its synaptic density. In this model a neuron, being a binary classifier, categorizes the learned patterns into two different classes. Though more synapses per neuron are better for the individual classification performance, more neurons can be used to classify a larger number of patterns. On the other hand, it is often beneficial to configure each neuron as a weak classifier, with a low synaptic density, and use a pool of them to boost the overall performance (Polikar, 2006). Given a fixed area, it is a difficult decision to choose between the optimum number of neurons and the number of synapses connected to each. To meet these varied requirements we provide the hardware device with the flexibility to reconfigure the synaptic connectivity and thereby modify the density of synapses per neuron. An on-chip multiplexer can connect arrays of synapses to one neuron or another, depending on the requirements.

In Fig. 4.21 four neurons, the corresponding synapse arrays and the multiplexer are shown. In the default condition, switches p1-p4 are on (1111) and c1-c4 are off (0000); each neuron is connected to the synapse array placed right beside it. However, if the p switches take a 0101 configuration and the c switches the complementary 1010, the s1 and s2 arrays get connected to n2, and s3, s4 to n4. Though the synaptic density of n2 and n4 is doubled, neurons n1 and n3, left without any synaptic input, become inactive. Similarly, a 0001 and 1110 configuration for the p and c switches, respectively, connects all the available synapses to n4, forcing all other neurons to be inactive. In the IFSL-v2 chip, a single neuron can be connected to 128, 256, 512 or 1024 synapses at one time, thus reducing the number of active neurons from 16 down to 2. For most experiments in the next chapters, we configured the chip with 16 neurons, each connected to 128 synapses.

Figure 4.21: Neurons are connected to arrays of synapses via a multiplexer. The pass-gate switches are set to give the network a flexible configuration that varies the synaptic density of a particular neuron.
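The routing rule can be expressed compactly in software. The sketch below is a behavioral model whose switch semantics are inferred from the three configurations quoted above; it is not derived from the schematic itself:

def synapse_routing(p, c):
    # p[i] = 1 connects synapse array i to neuron i; c[i] = 1 chains
    # array i onto array i+1, so it lands on whichever neuron that
    # array ultimately drives. Returns {neuron: [arrays]}.
    routing = {}
    for i in range(len(p)):
        j = i
        while j < len(p) - 1 and not p[j] and c[j]:
            j += 1
        assert p[j], "array %d left floating" % i
        routing.setdefault(j, []).append(i)
    return routing

print(synapse_routing([1, 1, 1, 1], [0, 0, 0, 0]))  # one array per neuron
print(synapse_routing([0, 1, 0, 1], [1, 0, 1, 0]))  # {1: [0, 1], 3: [2, 3]}
print(synapse_routing([0, 0, 0, 1], [1, 1, 1, 0]))  # {3: [0, 1, 2, 3]}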
Chapter 5

Characterization of the plasticity circuits

5.1 Introduction

Synaptic plasticity is considered to be the site of learning and the basis of memory formation in biological systems (Abbott and Nelson, 2000). In this work, the design of the silicon synapses showing functional plasticity is motivated by their biological counterparts and by the theoretical learning rule described in Brader et al. (2007). Here, I describe how the synaptic circuits constituting the pre-synaptic and the post-synaptic modules (described in Chapter 4) are characterized. The circuits are stimulated with controlled inputs to gain detailed knowledge about the behavior of the synapses. It is necessary to optimize the functionality of the silicon synapses before proceeding to the experiments related to learning and memory (described in Chapter 6).

Plasticity in silicon synapses, driven by spike-based learning rules, has been reported in some previous studies. Hafliger and Mahowald (1999) proposed a spike-based learning rule that modifies the weights according to the temporal correlation of the pre- and post-synaptic spikes, and showed a mechanism of weight normalization in a silicon synapse. Most of the other studies also emphasized characterizing the temporal learning window of the silicon synapses (Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006). Only in Fusi et al. (2000); Indiveri et al. (2006a) was the dependence of synaptic modification on pre- and post-synaptic frequency implemented. Here, I describe the short-term and the long-term modification of the synapses with much more detailed characterization.

Due to transistor non-idealities, an exact match between the theoretical specification and silicon behavior is not possible for the hundreds of elementary circuits on chip. Imperfections during the fabrication process can lead to an on-chip mismatch of up to 50% even between single-transistor I-V characteristics¹. Therefore, it is important to implement neural networks that exploit the collective behavior of numerous circuit elements, overriding their individual disparity. A large number of imprecise computational blocks working in a massively parallel fashion results in a fault-tolerant design, a hallmark of neuromorphic systems (Vittoz, 1998). Here I describe the similarity between the model of synaptic plasticity and the data from the circuits implementing it.

Due to the limited number of pads available to read out internal signals from the chip, there is no way to directly access or individually characterize all 1024 silicon synapses. We therefore chose to probe the behavior of only one synapse in detail. In this chapter I also show data demonstrating stochastic transition probabilities for all synapses connected to a single neuron. I describe how to modify the probability of synaptic transition of the entire array, by tuning the bias voltages, according to the theoretical prescription.

5.2 The post-synaptic module

The silicon synapses are functionally divided into a pre-synaptic and a post-synaptic module (see Fig. 4.1(a)). All synapses connected to a single neuron receive learn control signals from the post-synaptic module. The circuit details of the different blocks in this module are described in Sec. 4.4 (see Fig. 4.19). Here, we first characterized the post-synaptic module by constant current injection into the I&F soma block. As described in Sec. 4.4,
the post-synaptic module consists of an I&F soma, a pulse integrator and the current comparator blocks. The soma integrates the input current on the membrane capacitance, fires an action potential when a threshold voltage is reached, and then resets the membrane potential to ground. This begins a new cycle of integrate-fire-reset, as long as the current injection is active. In Fig. 5.1(a) the membrane voltage of a silicon neuron during an action potential generation period is plotted (from Indiveri (2003)). The data points (circles) are fitted (solid line) with an analytical solution derived from the MOS transistor equations. The circuit uses bias parameters determining the refractory period, the spiking threshold and the spike frequency adaptation.

Figure 5.1: a) Membrane potential of an I&F silicon neuron, adapted from Indiveri (2003). The data points (circles) are fitted (solid line) with an analytical solution from the MOS transistor equations. b) The top panel shows the membrane voltage made to fire at a constant frequency by continuous current injection; the spike frequency adaptation mechanisms are turned off. The states of VUP and VDN are shown in the lower panels.

As shown in Fig. 4.12, the learn control signals (UP and DN) generated in the post-synaptic module depend on the membrane potential of the soma. In order to check the dependence of the currents IUP and IDN on the value of Vmem, we injected the soma with a constant current. The top panel in Fig. 5.1(b) shows the activity of the I&F soma without activating the spike frequency adaptation mechanism. To verify that the control signals depend on Vmem alone, we also deactivated the current comparator blocks in the post-synaptic module. In order to do so, we set Ik1 to zero and Ik2, Ik3 to a large suprathreshold value (∼1µA). This ensures that I[Ca] is always within the right range given by eqs. 4.7 and 4.8. We monitored the gate voltages VUP and VDN corresponding to the currents IUP and IDN, respectively (see Fig. 4.19). In Fig. 5.1(b) the gate voltages VUP and VDN are plotted. Depending on the magnitude of Vmem compared to Vth (shown as a dashed line), the currents switch from zero (inactive) to Ijump (active). Voltage VDN, from the gate of the nMOS transistor carrying IDN, takes a higher value (dotted line in the middle panel) when the current is in the active state. On the contrary, VUP, controlled by a pMOS transistor, comes down (dotted line in the lower panel) from VDD when IUP is active. The voltage signals change by approximately 300mV as the currents change their states. As expected, the currents reach their active states in a complementary fashion.

¹ Threshold voltage mismatch in transistors (∆Vt) is generally modelled as σ∆Vt = AVt/√(WL). The technology-dependent proportionality factor AVt is measured to be around 9 mV·µm for the 0.35µm process used here (Kinget, 2005). However, Grossberg et al. (2004) showed that Vgs mismatch in subthreshold operation plays a more dominant role, with a proportionality factor twice that of AVt. A simple calculation shows, for a minimum-size device in this process, that σ∆Vgs can be as large as 60mV. Compared to the subthreshold gate voltages of 200-300mV necessary for nano-ampere operation, this can result in severe variations in individual circuit behavior.
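The mismatch figures quoted in the footnote can be reproduced with a one-line calculation. A sketch, using AVt ≈ 9 mV·µm and a minimum-size device (W = L = 0.35 µm) as stated there, and the factor of two reported for subthreshold Vgs mismatch:

```python
import math

A_vt = 9e-3 * 1e-6      # Pelgrom coefficient: 9 mV*um, expressed in V*m
W = L = 0.35e-6         # minimum-size device in the 0.35 um process

# sigma(dVt) = A_vt / sqrt(W * L)
sigma_vt = A_vt / math.sqrt(W * L)      # ~26 mV
# Subthreshold Vgs mismatch is roughly twice as large (Grossberg et al., 2004)
sigma_vgs = 2 * sigma_vt                # ~51 mV, i.e. "as large as 60 mV"

print(f"sigma_Vt  = {sigma_vt * 1e3:.1f} mV")
print(f"sigma_Vgs = {sigma_vgs * 1e3:.1f} mV")
```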
To emulate the calcium dynamics and verify the functionality of the other blocks, a non-plastic synapse of the neuron was stimulated with a brief spike train. This slowly increased the output firing frequency (νpost) of the neuron to an average steady-state value. The stimulation consisted of Poisson distributed spikes for three seconds. The temporal average of νpost is determined by the DPI circuit (shown in Fig. 4.19). Its output reaches an asymptote proportional to the neuron's mean output firing frequency. In Fig. 5.2 we plot the gate voltage V[Ca] of the DPI output transistor, which produces the current I[Ca]. Similar to the previous experiment, we first deactivated the current comparators and monitored the changes in VUP and VDN with respect to Vmem. All the voltages are plotted in Fig. 5.2(a), where the states of VUP and VDN are determined by Vmem. Both control voltages are alternately active (similar to Fig. 5.1(b)) during the entire stimulation period, except when there was no post-synaptic firing. Next, the current comparators were activated by setting Ik1, Ik2 and Ik3 to approximately 10nA, 50nA and 100nA, respectively. In Fig. 5.2(b) the corresponding gate voltages, marked by horizontal lines, are shown along with the V[Ca] signal. In this case, following eq. 4.7, IDN switches between active and inactive states only when I[Ca] is in the right frequency range; it is inactive in all other cases. In Fig. 5.2(b) we plot the corresponding VDN. Similarly, IUP (and VUP) follows the behavior in eq. 4.8.

Figure 5.2: Verification of the post-synaptic module with spike train stimulation. The top panels show the gate voltage of the transistor producing I[Ca] (see Fig. 4.19), which represents the average post-synaptic frequency. a) The current comparators are turned off, so that the changes in VUP and VDN depend solely on Vmem and are independent of V[Ca]. b) After turning the current comparators on, V[Ca] along with Vmem determines the states of VUP and VDN.

5.3 The pre-synaptic module

Every silicon synapse has a corresponding pre-synaptic module that is responsible for the short-term and the long-term modification of the synaptic weight. The node voltage w in Fig. 4.4 is a measure of the synaptic weight. Changes in the weight result from two independent mechanisms, the weight update (see Sec. 4.3.2) and the bistability (see Sec. 4.3.3). As shown in Fig. 4.11(b), the weight update mechanism accumulates or drains off charge from the capacitor connected to node w, using the currents mirrored from the post-synaptic module. The current mirrors are functional only when the switches sU and sD are on, i.e., during a pre-synaptic spike. In the absence of pre-synaptic spikes, the current mirrors are blocked and the voltage is weakly driven by the amplifier in the bistability module. As explained in Eq. 4.4, the weight drifts towards one of the amplifier's supply rails depending on the initial value of w. If, at the end of the last synaptic update, w is greater than a threshold θ (see Fig. 4.5), the amplifier pushes it towards the positive rail VH; otherwise, the node w is pulled towards the lower supply VL.
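The combined effect of the weight update and the bistability can be condensed into a discrete-time behavioral model. The sketch below is not the circuit: the step size, jump and drift magnitudes, and parameter values are illustrative assumptions; the gating conditions follow eqs. 4.7 and 4.8 as described in the text.

```python
import numpy as np

def update_weight(w, pre_spike, v_mem, i_ca, p):
    """One time step of the plastic synapse (behavioral model).

    On a pre-synaptic spike the weight jumps up or down, gated by the
    membrane potential and by the calcium current being in the right
    range; otherwise the bistability slowly drives w towards VH or VL
    depending on which side of theta it sits.
    """
    if pre_spike:
        if v_mem > p["v_th"] and p["ik1"] < i_ca < p["ik2"]:
            w += p["jump"]                      # UP jump
        elif v_mem <= p["v_th"] and p["ik1"] < i_ca < p["ik3"]:
            w -= p["jump"]                      # DOWN jump
    else:
        w += p["drift"] if w > p["theta"] else -p["drift"]
    return float(np.clip(w, p["VL"], p["VH"]))  # hard bounds of the weight

# Illustrative parameter set (not the chip biases).
params = dict(v_th=0.8, ik1=10e-9, ik2=50e-9, ik3=100e-9,
              jump=0.05, drift=0.002, theta=0.3, VL=0.0, VH=1.0)
```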
In order to monitor the synaptic modification of one plastic synapse, we stimulated a non-plastic synapse of the same neuron with a Poisson distributed spike train. This forced the neuron to fire consistently at an average frequency of 30Hz. Next, we stimulated the pre-synaptic input of the particular plastic synapse with a Poisson distributed spike train lasting for a 400ms trial session. This was repeated for several trials, always with a mean frequency of 60Hz. The stimulation protocol is shown as p1 in Fig. 5.3. The synaptic dynamics, represented by the node voltage w, are shown in Fig. 5.4. In the top and the bottom panels, the membrane potential of the post-synaptic neuron (Vmem) and the digital pre-synaptic spike events (pre) are shown, respectively. In the middle we plot the weight modification depending on the pre- and post-synaptic spike statistics. Due to layout constraints in placing probe points, we had to monitor an internal voltage ∼w, which corresponds to the inverted value of w.

Figure 5.3: Various stimulation protocols used to demonstrate the properties of the plastic synapse. The post-synaptic activity is primarily driven by the stimulus given to the non-plastic synapse. The pre-synaptic stimulus goes to the plastic synapse being tested. The lengths of the lines show the stimulus durations in the three different protocols, p1-p3.

As prescribed by the theory in Eq. 2.4, the jumps in the synaptic weight of the plastic synapse (w) occur only during a pre-synaptic spike. The polarity of the jump depends on Vmem compared to Vmth (dashed line in the top panel). Notice that not all pre-synaptic spikes result in jumps: this happens when neither of the post-synaptic currents, IUP and IDN, is active during a pre-synaptic spike. It can be observed that the up and down jump heights are not equal in magnitude. This is because of the difference between the p-type and n-type current mirrors feeding current to the node w. Voltages VH and VL are the two bistable limits of the synaptic weight (see Fig. 4.5). The bistability threshold θ is placed closer to VL to compensate for the unequal jump heights. This eliminates any unintended bias in the probability of synaptic transition. Though the bistability mechanism is continuously active, its effect is observed only during the inter-spike intervals. This is due to the fact that the bias current responsible for the drift is much smaller than the currents responsible for the jumps in the synaptic weight. The up and down slopes of the drift do not exactly match, as they are the result of either a p- or an nMOS transistor charging/discharging a capacitor with unequal subthreshold currents.

To verify the probabilistic nature of the synaptic transition, we monitored the dynamics of the node voltage w from one trial to another. Due to the stochastic nature of the spiking inputs, the jumps accumulate to a weight transition in some cases (see Fig. 5.4(b)), but not in others (see Fig. 5.4(a)). This randomness in the transition behavior, even with the same mean firing rates for the pre- and post-synaptic neurons, is an essential requirement of the learning mechanism.

Figure 5.4: Stochastic LTP transition in a silicon synapse. The stimulation protocol p1 is used to stimulate a plastic and a non-plastic synapse. The particular plastic synapse whose internal dynamics can be probed is stimulated. (a) The updates in the synaptic weight did not produce any LTP transition during the 400ms stimulus presentation. (b) The updates in the synaptic weight produced an LTP transition that remains consolidated.

The stimulus to the non-plastic synapse can be considered the teacher signal driving the post-synaptic firing frequency. In protocol p1 the teacher input is continuously available to the neuron. This is unrealistic for a learning mechanism, where the supervisor should be active only when necessary. In a new set of experiments, we therefore used a teacher signal only during the 400ms trial session. The protocol is shown as p2 in Fig. 5.3.
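The protocols p1-p3 differ only in the timing of the teacher (non-plastic) input relative to the 400 ms plastic-synapse stimulus. A minimal generator is sketched below; the durations are taken from the text, while the function names, the default rates and the finite window used to approximate the "continuous" teacher of p1 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_spikes(rate_hz, t_start, t_stop):
    """Poisson spike times (in s) at a given mean rate in [t_start, t_stop)."""
    n = rng.poisson(rate_hz * (t_stop - t_start))
    return np.sort(rng.uniform(t_start, t_stop, n))

# Teacher (non-plastic) window per protocol; the plastic stimulus is
# always the 400 ms interval [0, 0.4).  p1: teacher continuously on
# (approximated here by a long window); p2: teacher only during the
# trial; p3: teacher starts 100 ms before the plastic stimulus.
TEACHER_WINDOW = {"p1": (-1.0, 1.0), "p2": (0.0, 0.4), "p3": (-0.1, 0.4)}

def make_trial(protocol, plastic_rate=60.0, teacher_rate=100.0):
    t0, t1 = TEACHER_WINDOW[protocol]
    return {"plastic": poisson_spikes(plastic_rate, 0.0, 0.4),
            "teacher": poisson_spikes(teacher_rate, t0, t1)}

trial = make_trial("p3")   # the protocol used in most later experiments
```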
In Fig. 5.5, the voltage traces VUP and VDN are shown during an experiment using p2. The average post-synaptic frequency was set to 80Hz by stimulating a non-plastic synapse. Note that even with a higher post-synaptic activity (compared to 30Hz in Fig. 5.4), the control voltages (VUP, VDN) get activated much later than the commencement of the stimulus. The delay is due to the time taken by the average post-synaptic frequency (represented by the current I[Ca]) to reach its steady state from zero (as in Fig. 5.2(b)). In this protocol, synaptic updates can take place only during the later part of the 400ms pre-synaptic stimulus. In a different protocol, we set the average νpost to a non-zero value before the arrival of the stimulus to the plastic synapse. In order to do so, the input to the non-plastic synapse is started 100ms before the 400ms stimulus duration. Shown as p3 in Fig. 5.3, this protocol is used for most of the remaining experiments.

Figure 5.5: Dependence of synaptic updates on the learn control signals. The neuron is stimulated using protocol p2. The voltages VUP and VDN get activated when the average post-synaptic frequency reaches the right range. (a) Synaptic jumps do not accumulate into a transition. (b) A synaptic transition happens.

5.4 Transition probabilities

As described in Sec. 2.4, stochastic transition and stop-learning are two distinctive features of the learning rule. Stop-learning introduces a non-monotonic dependence of the synaptic transition probability on the mean νpost. According to the theory, both LTP and LTD should show a low probability for low νpost and peak at different intermediate values (see Fig. 2.3(a)). The probabilities should again go down to zero for higher post-synaptic frequencies. LTD reaches its peak before LTP because the learning mechanism is Hebbian. The frequency dependence of the transition probability is a collective behavior controlled by all the pre- and post-synaptic blocks described before. Furthermore, all the synapses should behave similarly and have transition probabilities of very small value, less than 0.1 (Brader et al., 2007). In order to verify this behavior for any particular νpost, hundreds of trials are required to independently stimulate and test each synapse. The bias values should then be modified, and the experiment repeated. Due to the many iterations required to optimize the set of bias parameters, determining the correct transition probability for all 1920 plastic synapses can be an extremely time-consuming procedure. Alternatively, we optimized the behavior of just one synapse by stimulating it for a few hundred trials; a sketch of the estimation procedure is given below. The protocol p3 in Fig. 5.3 was used during the trials. We also maintained the bias settings so as to obtain a high transition probability. This let us check the shapes of the LTP/LTD curves in a reasonable amount of time.
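The single-synapse optimization loop reduces to repeated trials and transition counting. A sketch, assuming a hypothetical `run_trial` routine that applies protocol p3 at a given mean post-synaptic rate, starting from a known binary state, and returns the state read out afterwards:

```python
import numpy as np

def estimate_transition_prob(run_trial, nu_post_values, n_trials=200):
    """Empirical p(LTP) and p(LTD) versus post-synaptic frequency.

    `run_trial(nu_post, init_state)` is assumed to stimulate the synapse
    with protocol p3 at the given mean post-synaptic rate, starting from
    `init_state` (0 = depressed, 1 = potentiated), and to return the
    binary state read out after the trial.
    """
    p_ltp, p_ltd = [], []
    for nu in nu_post_values:
        up = sum(run_trial(nu, init_state=0) == 1 for _ in range(n_trials))
        down = sum(run_trial(nu, init_state=1) == 0 for _ in range(n_trials))
        p_ltp.append(up / n_trials)
        p_ltd.append(down / n_trials)
    return np.array(p_ltp), np.array(p_ltd)
```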
The peak probability can be controlled by changing the bias determining the jump heights (Ijump). In Fig. 5.6 we show the transition probability distribution of the single synapse as a function of the mean νpost. This matches the theoretical curves shown in Fig. 2.3(a) well, except for the high magnitudes of p(LTP) and p(LTD).

Figure 5.6: The transition probabilities measured for a single synapse as a function of the post-synaptic frequency. The peak probability can be reduced by decreasing the current Ijump, and the shapes of the curves can be modified by changing the biases that set Ik1-k3 (see Fig. 4.11(b)).

Until now, we have monitored the detailed behavior of just one synapse. Next, we test all the synapses belonging to a single neuron. The important features to be verified are the stochastic nature of their transitions and the dependence of their transition probability on νpost. We confirmed this in a qualitative manner with a set of plots demonstrating the transitions of the synapses over many trials. A trial consisted of a pre- and post-synaptic plasticity phase, similar to protocol p3 in Fig. 5.3, followed by a state-determination phase. We started by stimulating all 60 synapses connected to one neuron. During the plasticity phase, the synaptic efficacies of all synapses were turned down. This means that any particular synapse is free to make updates and transitions, as usual, but the post-synaptic current it produces is always zero. The states of the synapses (high or low) thus have no effect on their respective IEPSC, and hence do not influence the post-synaptic frequency either. This is done to decouple the probabilistic transition of one synapse from another. Multiple plastic synapses can be stimulated simultaneously by interleaving their spikes using the spike generation software SPIKE-TOOLBOX (Muir, 2005). On the contrary, in the state-determination phase the synapses are restricted from updates (setting Ijump = 0) and their efficacies turned to maximum. Each plastic synapse is then independently stimulated to evaluate its effect on the post-synaptic frequency: a high νpost guarantees that the corresponding synapse is in the high state, and vice versa². The entire protocol is shown as an inset in Fig. 5.7. The states of all synapses are determined after each trial and marked as black (depressed state) and white (potentiated state) dots before embarking on a new trial. We performed fifty trials with a low νpost (10Hz) and progressively increased νpost to higher values (up to 490Hz) by increasing the input frequency to the non-plastic synapse. In Fig. 5.7, each panel shows the result from a set of 50 trials. The series of plots verifies the tendency of the synapses to remain depressed (LTD) at low post-synaptic frequencies; this is evident from the abundance of black dots in the first two panels. At mid-range frequencies, the density of black and white dots shows that the probabilities of LTD and LTP are comparable. However, at high frequencies, the synapses prefer to remain potentiated (more white dots). Apart from the change in transition probability as a function of νpost, the stochastic nature of the transitions is also evident from Fig. 5.7.

² This method of determining the synaptic state is more time consuming than the one implemented in Giulioni et al. (2007), who employ a RAM-style read-out that can access the state of each plastic synapse simultaneously. However, the area overhead of each synaptic element limits the on-chip synaptic density.
Most synapses change their states in a random fashion from trial to trial, with probabilities that depend on νpost. As seen from the figure, some synapses do prefer one state over the other for all frequency values. These are instances of faulty circuit blocks (synapses in this case), common in the large arrays of mismatch-prone silicon chips.

Figure 5.7: Stochastic transitions of all 60 synapses on a synapse array. Each panel consists of data from fifty trials following the protocol shown in the inset. The post-synaptic frequency is increased from 10Hz to 490Hz in equal intervals. Each black dot represents a low synaptic state and each white dot a high one. The dominance of LTD at low νpost and that of LTP at higher values is evident from the series of graphs.

In order to independently verify the LTP and LTD transition probabilities, we modified the experimental protocol. Here we initialize the synapses to one particular state before every trial. We start by setting all plastic synapses to the depressed state. This is represented by the down arrows (↓) in the protocol shown on top of Fig. 5.8. The plots below show the transitions to the potentiated state as white dots, over a set of twenty trials. Similar to the previous experiment, νpost is increased after every set (shown above each panel) by changing the input frequency to the non-plastic synapse. The density of white dots increases and then decreases again with increasing values of the post-synaptic frequency, meeting the theoretical requirement. As pointed out before, the transitions happen randomly even though the pre- and post-synaptic frequencies are the same for one entire set.

Figure 5.8: Stochastic LTP transitions in all 60 synapses in an array. Trials are performed according to the protocol shown on top. The synapses are reset to the low bistable state at the beginning of each trial. Each panel consists of data from 20 trials, where a white dot represents a transition from the low to the high state (LTP). The rise and fall in LTP probabilities with the post-synaptic frequency (shown on top of each sub-figure) is maintained for all synapses.

The same experiment was performed with the synapses initialized to the potentiated state. Results from twenty trials are plotted in Fig. 5.9, up to the dotted line. Here black dots represent a transition to the depressed state. The density of the black dots increases and decreases again as νpost is increased. This once again verifies the effect of the stop-learning phenomenon. Notice that the peak LTD transition occurs at a much lower post-synaptic frequency than that of the LTP transition (see Fig. 5.8). After twenty trials, the experimental setup was altered, as described in the next section.

Figure 5.9: Stochastic LTD transitions in all 60 synapses in an array. A protocol similar to the one in Fig. 5.8 was used, but resetting the synapses to the high bistable state at the beginning of each trial. In each panel, data from the first 20 trials (before the dotted line) show the increase and decrease in LTD transition probabilities as νpost increases.
The last 10 trials were performed by increasing the bias current Ik3, which delays the fall in LTD probabilities at high frequencies.

Delaying stop-learning

In Sec. 4.4.3 we described how the stop-learning frequencies are implemented by bias currents set in the post-synaptic module. Currents Ik1 and Ik2 set the lower and upper boundaries for LTP transitions, while Ik1 and Ik3 set the boundaries for LTD transitions. In order to monitor their effect on the transition probability, in the later part of the experiment described in the last section (Fig. 5.9) we set Ik3 to a higher value. The data obtained in the last ten trials are shown after the dotted line. A higher Ik3 delays the effect of stop-learning until higher νpost values are reached. Pushing the upper boundary to a higher frequency value makes no difference to the transition probabilities at low νpost; this is evident from the density of black dots remaining the same on either side of the dotted line.

5.5 STDP phase relation

The spike time dependent plasticity (STDP) observed in physiological experiments (Markram et al., 1997) has been widely implemented in neural network models (e.g., Kempter et al., 1999) and in neuromorphic systems (e.g., Arthur and Boahen, 2006). The silicon synapse in the IFSL chips can show similar spike time dependent plasticity, but in a restricted form. In order to invoke such properties, we used a protocol common to physiological experiments, where the pre- and post-synaptic spikes are paired with a phase difference of the order of a few milliseconds. We stimulated a plastic and a non-plastic synapse of a neuron and measured the jump heights of w for different phase relations. The frequencies (νpre = νpost) were kept constant at 80Hz. As usual, post-synaptic firing was generated by stimulating the non-plastic excitatory synapse with a Poisson pulse train. The data are shown in Fig. 5.10. Consistent with the theory, the jumps had a positive value (UP jumps) if the tpost − tpre delay was positive, and vice versa. Though the circuits responsible for the synaptic update are designed for equal upward and downward jumps (see Sec. 4.3.2), different jump heights were observed during the experiment. The large disparity between UP and DN jumps (also observed in Fig. 5.4) is due to the difference between the p- and nMOS current mirrors charging/discharging node w. The minor changes in jump heights between various UP jumps (or DN jumps) are due to the frequency dependence of the IUP and IDN currents and the bistability mechanism.

Figure 5.10: Pre- and post-synaptic neurons are made to fire in a manner similar to that of Fig. 5.4. The jump heights ∆w are plotted against the phase relation between pre- and post-synaptic spiking events. The polarity of the jumps shows typical STDP-like behavior.
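The pairing experiment amounts to sweeping the phase offset and recording the jump in w. A sketch, assuming a hypothetical `measure_jump` routine that wraps the chip stimulation; the delay range mirrors the axis of Fig. 5.10.

```python
import numpy as np

def stdp_curve(measure_jump, delays_ms=range(-5, 11), rate_hz=80.0):
    """Jump height of w versus the pre/post phase difference.

    `measure_jump(delay_s, rate_hz)` is assumed to pair pre- and
    post-synaptic spikes at the same mean rate with a fixed offset
    (delay = t_post - t_pre) and return the measured change in w.
    The rule predicts positive jumps for delay > 0 (pre before post)
    and negative jumps otherwise.
    """
    return np.array([(d, measure_jump(d * 1e-3, rate_hz)) for d in delays_ms])
```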
5.6 Multiplexer functionality

In all the experiments in this chapter, the number of synapses connected to a neuron was fixed. However, Sec. 4.5 describes the need for flexibility in the synaptic density when using the chip for a wide variety of classification experiments. As shown in Fig. 4.21, a multiplexer can be used to reconfigure the synaptic density of the on-chip neurons. The multiplexer is designed to re-wire the synapse arrays (SA), each having 128 synapses, to the desired neurons. In order to verify the multiplexer functionality, we first keep it in the default state and stimulate one non-plastic synapse of each array. This results in post-synaptic firing from all the corresponding neurons, i.e., stimulating a synapse in SA-3 makes neuron-3 fire. Next, we configure the SAs such that each neuron has 256 synapses, connecting two arrays to alternate neurons. In this case, stimulating either SA-3 or SA-4 makes neuron-4 fire. At the same time, the configuration makes neuron-3 unusable. This is shown in Fig. 5.11(a), where the input stimulus to the SAs is shown in gray and the output from the neurons in black. The heights of the black bars vary from one neuron to another due to the mismatch in the non-plastic synapses. In Fig. 5.11(b), we re-configure the multiplexer such that 512 synapses are connected to each neuron. Here, SA-1 to SA-4 stimulate neuron-4, while neurons 1, 2 and 3 remain unused. In this process the synaptic density of some neurons can be increased by up to four times, but with fewer usable neurons.

Figure 5.11: Data from eight synapse arrays (SA) connected to eight neurons via a multiplexer. The gray bars show the frequency of the input stimulus to the non-plastic synapses of the SAs. The frequencies of the neurons in response to the stimulus are shown in black. In the default multiplexer configuration, each SA is connected to its adjacent neuron (see Fig. 4.21). (a) In a different multiplexer configuration, two adjacent SAs are connected to alternate neurons (neurons 2, 4, 6 and 8), increasing their synaptic density. The other neurons remain unused as they have no synapses connected. (b) Four SAs are connected to single neurons (neurons 4 and 8) for an even larger synaptic density.

5.7 Conclusions

This chapter showed different methods of characterizing the functionality of the silicon synapses. I described various experiments that were carried out to analyze the individual functional blocks and also the overall performance of the plasticity circuit. Spike-based synaptic plasticity on silicon has a fairly recent history, and methods for the long-term storage of synaptic weights are far from standardized. Over the years, researchers have accomplished this difficult technological challenge in different ways, for example by floating-gate storage (P. Hasler et al., 1999; Häfliger and Rasche, 1999), by implementing digital look-up tables (Vogelstein et al., 2003) or by storing only the binary state of the synapse (Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006). In this project we used a synapse with limited analog resolution, designed to take bistable values on long time scales (Brader et al., 2007). This chapter demonstrated the dynamics and the bistable nature of a single synapse.

I first showed how the post-synaptic module complies with the theoretical requirement of generating the necessary learn control signals. In order to do so, an appropriate protocol was designed to stimulate the non-plastic and the plastic synapses. Next, I demonstrated the stochastic transitions in the synaptic states when the plastic synapses are subjected to the right stimulation protocol.
Due to the limited number of external probe points, the experiments highlighting the detailed behavior of the synaptic plasticity were performed on one single synapse. I also described the method of determining the LTP and LTD probability curves, which depend on the post-synaptic frequency. Similar transition probabilities of a single synapse have been extensively characterized in Fusi et al. (2000). In a later part of the chapter, I explained how the synaptic transitions of all synapses of a neuron can be monitored in a qualitative manner. It is essential that all the silicon synapses show the stochastic nature of transition, similar to the one tested in detail. Though some preliminary data on stochastic transitions were reported before in Chicca et al. (2003), here they were demonstrated with much more rigor and control. The transition probability is an outcome of the collective performance of all the pre- and post-synaptic blocks, and requires optimized values of many different voltage biases. I showed how the bias parameters can be tuned to modify the LTP and LTD probabilities. I also showed that the weight update of a synapse depends on the phase difference between the pre- and post-synaptic spike times, similar to the STDP behavior. Finally, I described an experiment to verify the behavior of the multiplexer, which is necessary for reconfiguring the on-chip synaptic density.

Chapter 6

Spike based learning and classification

6.1 Introduction

Learning and classification on dedicated VLSI neural networks (NN) has been an active field of research for the last two decades. It essentially began with the resurgence of NN research after the rediscovery of the backpropagation learning rule in the late 80s (Rumelhart et al., 1986). Multilayer perceptrons using the backpropagation learning rule became very effective in solving a wide variety of learning tasks. Hardware NN systems were designed for various applications like autonomous robotics, stand-alone sensors and speech processors, for their speed and power advantages compared to those of general-purpose computers (Lindsay, 2002). Analog VLSI implementations of such learning networks, which utilize the physical properties of silicon, have been developed to achieve high energy and integration efficiency (see Cauwenberghs and Bayoumi, 1999; Valle, 2002, for review). However, most of these systems had little or no connection to biological learning mechanisms, such as spike-based synaptic plasticity. More recently, inspired by the physiological phenomena, neuromorphic systems with bio-plausible learning rules have been proposed (Häfliger et al., 1997; Fusi et al., 2000; Vogelstein et al., 2003; Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006; Indiveri et al., 2006a; Häfliger, 2007). Here, I demonstrate learning and classification on a silicon system that is based on a biologically inspired spike-driven synaptic plasticity, as proposed in Brader et al. (2007). It shows superior classification performance compared to the spike-based learning systems previously reported in the literature.

In Chapter 5, a detailed characterization of the spike-driven synaptic plasticity circuits was presented. Here, I describe the experiments done to train the VLSI system and to test its memory capacity. I show results that demonstrate learning of binary spatial patterns of mean firing rates, along with a quantification of the classification behavior.
The network was tested with a variety of inputs, from random uncorrelated patterns to highly correlated patterns, and also patterns with graded input. This thesis describes the most extensive classification study performed on a spike-driven VLSI neural network, and shows a very satisfactory performance for a prototype device.

6.2 Network architecture

The network architecture we consider consists of a single feedforward layer composed of N input neurons that are fully connected by plastic synapses to a single output. Many biologically plausible models implementing supervised learning mechanisms have been tested on such simplified neural architectures, typically feedforward single-layer networks with binary outputs (see e.g., Brader et al., 2007; Gütig and Sompolinsky, 2006). The aim of learning is to modify the synaptic connections between the neurons so that the output responds as desired, both in the presence and in the absence of the instructor. As described in Sec. 2.6, we used a network consisting of input and output units connected by plastic synapses, where each integrate-and-fire neuron is considered to be an output unit (see Fig. 2.4). Multiple output neurons receiving the same input could also be grouped together to form a population, for better performance. Initially, each output neuron was configured to have sixty input units, corresponding to sixty plastic synapses. In the following experiments we trained the network to classify binary spatial patterns of sixty dimensions (i.e., sent to sixty plastic synapses), using an additional teacher signal sent to the neuron's non-plastic excitatory synapse.

6.3 Training methodology

In order to classify patterns, the network is first trained with various inputs and then tested for its memory capacity. The training patterns consist of binary vectors of Poisson distributed spike trains, with either a high mean firing rate (30Hz) or a low one (2Hz). In Fig. 6.1 we show two such input vectors, created in a random fashion with 50% high rates (white circles) and 50% low rates (black circles). These spatial patterns are randomly assigned to either a C+ or a C− class; a sketch of this pattern and teacher construction is given below. During training, when a C+ pattern is presented to the plastic synapses, a T+ teacher signal is used.

Figure 6.1: The method of training with binary patterns of mean input frequency. Two examples of training patterns are shown on the left and right sides of a neuron symbol. Poisson spike trains of high (30Hz) or low (2Hz) frequency are randomly assigned to each synapse, represented as white or black circles respectively. These spatial patterns (black/white circles) are arbitrarily assigned to either a C+ or a C− class. During training, the C+ patterns are presented together with a T+ (teacher) signal, while C− patterns are presented in parallel with a T− spike train. Training of the C+ and C− class patterns is interleaved in random order. The same spatial patterns are trained for multiple iterations, but with new realizations of the Poisson spike trains for each session.
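The construction of the training stimuli can be sketched as follows; the rates are the ones given in the text (30/2 Hz inputs, 250/20 Hz teachers, stated just below), while the function names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

HIGH, LOW = 30.0, 2.0          # mean input rates (Hz)
T_PLUS, T_MINUS = 250.0, 20.0  # teacher rates (Hz)

def make_pattern(n_synapses=60, frac_high=0.5):
    """Random binary spatial pattern of mean firing rates."""
    return np.where(rng.random(n_synapses) < frac_high, HIGH, LOW)

def training_set(n_patterns=4):
    """Patterns with random class labels and the matching teacher rate."""
    patterns = [make_pattern() for _ in range(n_patterns)]
    labels = rng.integers(0, 2, n_patterns)           # 1 -> C+, 0 -> C-
    teachers = np.where(labels == 1, T_PLUS, T_MINUS)
    return patterns, labels, teachers
```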
The T+ teacher is a Poisson distributed spike train with a mean frequency of 250Hz, presented to one of the neuron's non-plastic excitatory synapses. Similarly, for C− patterns a T− signal with a mean rate of 20Hz is used. The training sessions follow the protocol p3 shown in Fig. 5.3, where the input to the plastic synapses lasts for 400ms and that to the non-plastic one for 500ms. For each training session we generate new Poisson spike trains, for both input and teacher, keeping the same pre-defined spatial distribution of mean frequencies. The output neuron fires according to the total synaptic input current weighted by the plastic synapses, plus the teacher input. The transition probability of the plastic synapses depends on the mean firing frequency of the post-synaptic neuron (νpost). After several training sessions, the neurons are tested on the learned patterns. Before testing, we disable the synaptic update mechanism by setting the current Ijump, which determines the magnitude of the jumps (see Fig. 4.19), to zero. This allows us to run the tests without any interference from the learning mechanism and to study the results of training with fixed synapses. During testing we present the input patterns, again for 400ms but without the teacher signal, and evaluate the network's response. A high output frequency for a C+ class and a low one for a C− class indicate correct classification behavior.

6.4 Evolution of synaptic weights

According to the theoretical prescription of Brader et al. (2007), during training sessions the synaptic weight changes depending on the pre- and post-synaptic frequencies. The synapse displays Hebbian behavior: when the pre-synaptic neuron is stimulated, LTP dominates at high νpost and LTD dominates at low νpost. We analyzed the evolution of the synaptic weights as training progresses. Although direct monitoring of the states of all plastic synapses is not possible, the post-synaptic firing frequency during the testing phase gives a measure of the synaptic states. Synapses corresponding to a high pre-synaptic input (30Hz), if potentiated, increase the output frequency. Conversely, synapses corresponding to a high input, if depressed, do not contribute to νpost.

Figure 6.2 shows the evolution of the synaptic weights when trained with a C+ pattern. Prior to the experiment, all synapses were initialized to random states¹. To evaluate the response of the neuron, we test it with a new binary pattern and measure the output frequency. We then train the neuron with the same pattern as a C+ class for ten sessions, interrupting the training sessions to test it (i.e., to measure its response to the pattern in the absence of the teacher signal). We repeated the cycle of ten training sessions followed by one testing session two more times. At the end of the entire C+ class training period, we assign the same pattern to the C− class and re-train it, again interrupting the training sessions to measure the intermediate results three times. This experiment was repeated with 50 different spatial patterns, and the outcome is plotted in Fig. 6.2. The light gray bars in all panels represent the neuron's output frequency histogram in the initial condition, with random synaptic states. The panels, from left to right, show how the output frequencies gradually increase as the training potentiates more and more synapses. Similarly, the panels in Fig. 6.3 show how the weights tend to depress again as the neuron is trained with the same patterns, but this time assigned to the C− class.
The fact that the output frequency distributions do not increase beyond approximately 200Hz, and do not decrease completely to zero, is a direct consequence of the circuit's stop-learning mechanism, described in Sec. 2.4. Comparing the two rightmost panels in Fig. 6.2 and Fig. 6.3 shows that the output frequencies obtained from training the C+ and C− classes are well separated once the training is complete. These results indicate that the device can be robustly used to perform classification of spatial patterns of mean firing rates into two distinct classes.

¹ Ten random binary patterns were trained as both class C+ and class C− on the same synapse array, alternately for 5 sessions each. A single synapse receiving the same νpre but inconsistent νpost created a conflicting requirement between LTP and LTD transitions. This drove the synapses to random stable states.

Figure 6.2: Probability of output frequency as a neuron's training progresses with the C+ class patterns. The gray bars, in all panels, represent the neuron's response to the input patterns in its initial condition (before training). The black bars represent the neuron's output frequency as training of the C+ class progresses. Synaptic weights stop potentiating when νpost is too high, limiting any further increase in the firing frequency.

Figure 6.3: Probability of output frequency as a neuron's training progresses with the C− class patterns. Synaptic weights stop depressing when νpost is too low, restricting the post-synaptic frequency from going down to zero.

6.5 Classifying multiple spatial patterns

To further verify the classification performance, we carried out the following experiment with four random spatial patterns: we trained one neuron (labeled neuron-A) with two of the patterns (1a and 2a) assigned to the C+ class, and with the other two patterns (1b and 2b) assigned to the C− class. We interleaved the training of the four patterns in random order. In addition, we trained a different neuron (labeled neuron-B) with the same four patterns, but with the class assignments swapped. After training, both neurons were tested with all the patterns. To measure the mean performance and its variance, we repeated the entire procedure thirty times, creating new sets of random spatial patterns each time. In Fig. 6.4(a) we plot the probability distribution of the output frequencies of neuron-A in response to the four input patterns during the testing phase. The top panel shows the response to patterns 1a and 2a, and the bottom shows the response to 1b and 2b.

Figure 6.4: Classification of four spatial patterns. (a) Probability distribution of νpost after having been trained to classify four patterns belonging to the C+ (top) and the C− (bottom) class. (b) Average output frequency for the individual patterns during testing. In the left plot, patterns 1a and 2a are assigned to the C+ class and patterns 1b and 2b to the C− class, when training neuron-A. In the right plot the class assignments are swapped, while training neuron-B.
As expected, the probability p(νpost) of neuron-A firing at higher frequencies is much larger for the patterns trained as the C+ class than for the C− patterns. In Fig. 6.4(b), we plot the average output frequency separately for all four patterns, obtained from neuron-A (left panel) and from neuron-B (right panel). Here, neuron-B responds with a low firing rate to the very same patterns that were assigned to the C+ class for neuron-A, and with a high rate to the others. From the neuronal responses it is evident that a single threshold frequency would be enough to categorize the patterns into the two distinct classes.

We repeated similar experiments with six and eight random input patterns, always training half of them as C+ and the other half as C−. As before, the two neurons were trained with the same input patterns, but with opposite class assignments. During testing, the mean output frequencies for all six patterns are plotted in Fig. 6.5(a), and for the eight patterns in Fig. 6.5(b). From the frequency histograms we can see that the separation between the two classes becomes less obvious as the number of patterns increases. A more quantitative analysis of the classification performance is discussed in Sec. 6.6.

Figure 6.5: Average output frequency during classification of six (a) and eight (b) random patterns. In each case, half of the patterns (marked with 'a' at the end) are trained as C+ on neuron-A and the other half as C−. For neuron-B, the class assignment was swapped.

To provide the reader with more insight into the significance of these experiments, we used 2D binary patterns representing digits (see Fig. 6.6) as input spatial patterns. The 2D input space is translated into a 1D binary vector by a simple raster scan, and Poisson input spike trains are created with a high mean firing rate for white pixels and a low rate for black pixels, as in the previous experiments. All symbols in the leftmost panel of Fig. 6.6 represent the digit 1 in different languages, while the symbols in the middle panel represent the digit 2. A neuron is first trained with just two patterns (1a as class C+ and 2a as class C−), and its response during testing is shown in the top row of the right panel of Fig. 6.6. The neuron is then trained with four patterns (the patterns in the top two rows of the figure's left panel), and testing shows correct classification behavior (see the middle row of the figure's right panel). Finally, the neuron is trained with all six patterns. The test results in the bottom row of the figure's right panel show that the neuron can successfully learn to distinguish the character 1 from 2 in three different languages.

Figure 6.6: Pattern recognition of 2D binary images. The data is converted into a 1D vector of Poisson mean firing rates (30Hz for white pixels, 2Hz for black pixels). Numbers 1 and 2 in three different languages were assigned to the C+ and C− classes, respectively. The plots in the right panel show classification results when one, two, or three members of each class are trained together (top to bottom).
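The 2D-to-1D conversion is a plain raster scan followed by the usual rate assignment. A minimal sketch; the toy "digit" array below is illustrative, not one of the glyphs used in Fig. 6.6.

```python
import numpy as np

HIGH, LOW = 30.0, 2.0   # Hz, as in the binary-pattern experiments

def image_to_rates(img):
    """Raster-scan a 2D binary image into a 1D vector of mean rates:
    white pixels (1) -> 30 Hz, black pixels (0) -> 2 Hz."""
    flat = np.asarray(img, dtype=bool).ravel()   # row-by-row raster scan
    return np.where(flat, HIGH, LOW)

digit = np.array([[0, 1, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [1, 1, 1]])
print(image_to_rates(digit))
```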
We also checked the classifier's recall performance by corrupting the input during testing. In order to do so, a random subset of the input vector was inverted, from high to low or vice versa, with respect to the one used in training. The modified input was then used for testing the neurons. Figure 6.7 shows the difference in average post-synaptic firing frequency (∆νpost) between the C+ class patterns and the C− class patterns, both corrupted during testing. In the left panel, two spatial patterns were used as input, and in the right, four. The x-axis shows the fraction of the input vector that was corrupted. The middle panel shows an example of the original pattern (top) and its 5% (middle) and 10% (bottom) corrupted versions.

Figure 6.7: Memory recall from a corrupted data set. After training is completed, a percentage of the input vector (% noise level) is inverted before testing. The difference in post-synaptic frequency between the C+ and C− classes while testing two and four spatial patterns is shown in the left and right panels, respectively. As expected, an increase in the noise level decreases the frequency difference. In the middle panel, an example pattern and its 5% and 10% corrupted versions are shown from top to bottom.

6.5.1 Uneven class distributions

In all the experiments described until now, the number of patterns belonging to the C+ and C− classes was always kept equal. Hence, after training was complete, the neuron had to categorize the set of input patterns into two equal halves. To verify the capability of a neuron to recognize one pattern out of many, we created four random binary patterns (labeled 1 through 4) and trained four neurons (labeled A through D) with uneven class assignments. For neuron A, only pattern 1 was assigned to the C+ class, and all other patterns were assigned to the C− class. Similarly, for neuron B only pattern 2 was assigned to the C+ class, and for neurons C and D only patterns 3 and 4, respectively, were assigned to C+. After multiple iterations of the training session we tested all four neurons. This was repeated forty times with new sets of four randomly generated spatial patterns, with identical class assignments. We used a fixed threshold frequency (20Hz) as the decision boundary and counted the number of times each neuron's output crossed it while testing all four patterns. As expected, neuron-A crossed the threshold many more times in response to pattern 1 than in response to the other patterns, and all the other neurons behaved correspondingly. The gray bars in Fig. 6.8 show the fraction of times (fT) each neuron crossed the decision boundary. The heights of the gray bars for the patterns corresponding to the C+ class show the fraction of correct classifications, while the gray bars for the patterns of the respective C− classes show the fraction of misclassified results. We also counted the number of times νpost resided within the narrow band between 16Hz and 20Hz, and considered that an unclassified output: the output is neither high enough to categorize the pattern as a C+ class nor low enough for a C− class. The thin black bars show the fraction of times this happened.

Figure 6.8: Four neurons (A-D) are tested with four different random patterns (1-4), each trained with one pattern as C+ and the three others as C−. The gray bars show the fraction of times the output crossed a fixed frequency threshold during testing. Each neuron crossed the threshold many more times for its corresponding C+ pattern (like pattern 3 for neuron-C) than for the others. Black bars represent the fraction of times the output frequency lay within a narrow band around the threshold, depicting an unclassified result.
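The three-way decision rule used above is easy to state in code; the 20 Hz boundary and the 16-20 Hz unclassified band are the values given in the text, the function name is mine.

```python
def classify(nu_post, threshold=20.0, band=(16.0, 20.0)):
    """Decision rule of Fig. 6.8: C+ if the output frequency crosses the
    20 Hz boundary, 'unclassified' if it falls inside the 16-20 Hz band,
    C- otherwise."""
    if band[0] <= nu_post <= band[1]:
        return "unclassified"
    return "C+" if nu_post > threshold else "C-"

print(classify(35.0))   # 'C+'
print(classify(18.0))   # 'unclassified'
print(classify(5.0))    # 'C-'
```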
6.6 Quantitative analysis of classification performance

In order to quantitatively characterize the network's classification performance, we used a discrimination analysis based on the Receiver Operating Characteristic (ROC) (Fawcett, 2006). An ROC graph, plotted in a unit square, is a technique for measuring the sensitivity and specificity of a binary classifier. As shown in Fig. 6.9(a), the output of a binary classifier can have four possible outcomes: true positive, true negative, false positive and false negative. In an ROC graph the classifier's true positive rate, tpr = tp/(tp + fn) (also called the hit rate), is plotted against its false positive rate, fpr = fp/(fp + tn) (also called the false alarm rate). For classifiers that are designed to produce a direct class decision for each input, the analysis produces a single point in the ROC graph (like C1 or C2). The closer the point to the location (0,1), the better the classifier (e.g., C1 is a better classifier than C2). However, for classifiers with a continuous-valued output (e.g., the neuron's output frequency), a sliding threshold ranging from 0 to infinity is used as the classifier's decision boundary. Each threshold produces a different point in the ROC space, and the step function connecting them is called an ROC curve. The area under the ROC curve (AUC) is a measure of the classification performance: unity area denotes perfect classification, while an area of 0.5 indicates that the classifier is performing at chance.

Figure 6.9: (a) A confusion matrix shows the four possible outcomes of a classifier for a given input, together with the resulting rates tpr = tp/(tp + fn) and fpr = fp/(fp + tn). (b) A graph of the true positive rate (tpr) against the false positive rate (fpr) forms the ROC space of unit dimension. Classifiers with a binary decision are represented as points (C1, C2) on the graph, while classifiers with an analog output (e.g., the post-synaptic frequency of a neuron) form an ROC curve.

We performed classification experiments analogous to the one done for the data in Fig. 6.4(a), with different sets of input patterns ranging from two to twelve, but always dividing them equally between the C+ and C− classes. An ROC analysis was done on the neuron's output frequency for each set of input patterns used. The solid line in Fig. 6.10(a) shows the AUC values of the single-neuron classifier as a function of the number of patterns in a set. As the number of patterns to classify increases, the classification performance decreases, as expected from theory. This is due to the fact that the overlap among patterns belonging to the C+ and C− classes increases for a greater number of random spatial patterns.
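The sliding-threshold analysis can be reproduced directly from the measured output frequencies. A sketch, assuming arrays of test-phase frequencies for the C+ and C− presentations:

```python
import numpy as np

def roc_auc(freqs_pos, freqs_neg):
    """ROC curve and AUC from output frequencies measured during testing.

    Every observed frequency is used as a candidate decision threshold;
    tpr = tp / (tp + fn), fpr = fp / (fp + tn).
    """
    pos = np.asarray(freqs_pos)
    neg = np.asarray(freqs_neg)
    thresholds = np.r_[np.inf, np.sort(np.r_[pos, neg])[::-1], -np.inf]
    tpr = [(pos >= t).mean() for t in thresholds]
    fpr = [(neg >= t).mean() for t in thresholds]
    auc = np.trapz(tpr, fpr)   # area under the resulting curve
    return np.array(fpr), np.array(tpr), auc
```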
The number of patterns for which the classifier has an AUC value of 0.75 can be considered the classifier's storage capacity: the maximum number of patterns for which it has a 75% probability of producing the right result for a random input. Next, we performed ROC analyses similar to the one in Fig. 6.10(a) and obtained the AUC using only 40 and 20 input synapses. In Fig. 6.10(b) we plot the storage capacity (solid line) versus the number of input synapses (N) used by the classifier. The top and bottom traces show the theoretical predictions, derived from Brader et al. (2007), with (top dashed line) and without (bottom dashed line) the stop-learning condition. The performance of the VLSI system is compatible with the theoretical predictions of the storage capacity of networks with stochastic bounded synapses, and shows the same scaling properties as predicted by the theory.

Figure 6.10: Classification performance and memory capacity. (a) The area under the ROC curve (AUC), measured from pattern classification experiments with a single output neuron, is shown as the solid line. The result from a pool of 20 independent output neurons, using a majority decision rule, is shown as the dashed line (see Fig. 6.11(b) for an explanation). (b) The storage capacity of a single neuron is measured by the number of patterns classified with an AUC value greater than 0.75. The solid curve shows the number of such patterns plotted against the number of synapses used for the experiment. The dashed lines show the theoretical predictions of the storage capacity, with and without the stop-learning condition.

6.6.1 Boosting the classifier performance

To enhance the classification performance, instead of using a single neuron as a binary classifier, we used 20 independent classifiers trained with the same sets of patterns. Figure 6.11(a) shows such an arrangement, where the different neurons receive the same input but different teacher signals (T1-T20). Due to the limited number of active neurons, rather than using 20 different ones, we trained the same neuron multiple times, with different realizations of the Poisson trains for the teacher signal but the same input pattern spike trains. The binary decision mechanism that combines the results of the different output neurons was implemented using a majority-rule decision process: each neuron in the pool individually classifies the learned pattern as C+ or C− and votes for the class chosen. The score is positive (+1) if the vote is correct, and negative (-1) otherwise. The total outcome is computed by summing all the scores, which shows what the majority decides. Figure 6.11(b) shows the outcome of a pool of 20 neurons for 10 different input patterns. The dark and light gray bars represent the correct and incorrect votes during classification. The black bars represent the net sum, which is positive if the classification is correct and negative for a misclassification. Using this method we can also define an unclassified outcome: for example, a pattern can be defined as unclassified (rather than misclassified) if the difference between the correct and incorrect votes does not exceed 10% of the total members in the pool. In that case the black bar resides within the two horizontal lines.
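The majority-rule combination, including the unclassified band, can be sketched in a few lines; the vote encoding and the 10% margin follow the text, the function name is mine.

```python
import numpy as np

def majority_decision(predicted, true_label, margin=0.1):
    """Majority-rule decision of a pool of weak classifiers.

    `predicted` holds each neuron's class decision (+1 for C+, -1 for C-).
    Returns 'correct', 'incorrect', or 'unclassified' when the net vote
    count does not exceed `margin` (10%) of the pool size.
    """
    predicted = np.asarray(predicted)
    score = int(np.sum(predicted == true_label) - np.sum(predicted != true_label))
    if abs(score) <= margin * len(predicted):
        return "unclassified"
    return "correct" if score > 0 else "incorrect"

votes = [+1] * 14 + [-1] * 6                     # 14 of 20 neurons say C+
print(majority_decision(votes, true_label=+1))   # 'correct' (score = +8)
```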
Figure 6.11: Boosting the classification performance. (a) Twenty different neurons (N1-N20) can be used as independent binary classifiers by stimulating them with the same input but with different teacher signals (T1-T20). A majority vote over all these weak classifiers provides the final result. (b) Individual classification results and the majority-rule decision for the classification of 10 patterns. Dark and light gray bars represent the vote counts for correct (positive) and incorrect (negative) classifications, respectively. Black bars represent the sum of the vote counts, indicating the majority decision. Negative black bars (not present) would represent misclassification errors, while bars within the dashed lines would represent unclassified decisions.

The dashed line in Fig. 6.10(a) shows the performance achieved using this method. This technique, known as boosting, provides clear advantages over the use of a single classifier. The improvement can be attributed to the synaptic updates being stochastic and independent across different neurons. As a consequence, every output neuron can be regarded as a weak classifier, and the errors made by each of them are independent.

6.7 Classification of correlated patterns

An important advantage of the stop-learning mechanism implemented in this device lies in its ability to classify correlated patterns. To test this claim, we created correlated patterns using a prototype with 60 random binary values (as explained in Fig. 6.1) as a starting point. We then generated new patterns by changing only a random subset of the inputs, of a size that depends on the amount of correlation. In Fig. 6.12(a) four patterns (labeled 1-4) are generated starting from the prototype labeled 0. In the top panel the four input vectors (length sixty), with 30% correlation, show only a small degree of similarity to the prototype. The patterns in the bottom panel, with 90% correlation, have most of their inputs in the same state as the prototype. These patterns were then randomly assigned to either the C+ or the C- class.

Figure 6.12: Classification performance with correlated patterns. (a) Four correlated patterns (labeled 1-4) are created from the same randomly generated prototype (labeled 0). The patterns with 30% correlation with the prototype are shown on top and the patterns with 90% correlation are shown below. (b) The AUC values computed for different sets of patterns (2 to 8) are plotted against the percentage of correlation among the patterns. In every experiment half of the patterns are randomly assigned to the C+ class, and half to the C- class.

In the experiments that follow we systematically increased the percentage of correlation among the input patterns, and repeated the classification experiment with increasing numbers of input patterns per class, ranging from two to eight. Figure 6.12(b) shows the AUC obtained from the ROC analysis carried out on the outcome of these experiments.
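The pattern-generation procedure just described can be sketched in a few lines of Python (an illustrative reconstruction under the stated procedure, not the original stimulus code; whether the changed subset is redrawn or inverted is an implementation detail, and here it is redrawn at random):

    import numpy as np

    def correlated_patterns(n_synapses=60, n_patterns=4, correlation=0.9,
                            rng=np.random.default_rng(0)):
        # Create binary patterns correlated with a common random prototype:
        # each pattern keeps a fraction `correlation` of the prototype's
        # entries and redraws the remaining random subset.
        prototype = rng.integers(0, 2, n_synapses)
        n_change = int(round((1.0 - correlation) * n_synapses))
        patterns = []
        for _ in range(n_patterns):
            p = prototype.copy()
            idx = rng.choice(n_synapses, size=n_change, replace=False)
            p[idx] = rng.integers(0, 2, n_change)  # redraw a random subset
            patterns.append(p)
        return prototype, np.array(patterns)

    proto, pats = correlated_patterns(correlation=0.9)
    # At least ~90% of each pattern matches the prototype.
    print((pats == proto).mean(axis=1))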
The curves show a substantially constant AUC value for patterns with low and intermediate correlation, with a rapid drop in AUC (indicating low classification performance) only when the correlation among patterns increases beyond 90%. This remarkable performance derives from both the bistable nature of the synaptic weights and the stochastic nature of the weight updates (Fusi, 2002).

To further evaluate the effect of the stop-learning mechanism, we compared the performance of the system with the corresponding circuits enabled and disabled. We carried out a classification experiment starting with two completely orthogonal sets of C+ and C- patterns. The two patterns assigned to the C+ class consisted of random binary vectors for synapses 1-30 and all zeros for synapses 31-60. The other two patterns, belonging to the C- class, were generated by assigning random binary vectors to synapses 31-60 and setting synapses 1-30 to zero. This corresponds to zero overlap between the two classes and a trivial set to classify. Additional patterns with increasing overlap were generated following an analogous procedure: the random binary vectors were assigned to overlapping subsets of synapses (e.g., 1-33 and 27-60 for 10% overlap). Figure 6.13(a) shows an example of four patterns with 20% overlap (see the gray dashed box). Due to the random nature of the binary vectors, the number of correlated synapses is usually less than the overlap percentage. When the overlap is set to 100%, this experiment is equivalent to that of the random uncorrelated patterns described in Sec. 6.5.

Figure 6.13: Classification performance with and without the stop-learning mechanism enabled. (a) Examples of the C+ and C- patterns with just 20% overlap (the overlapping region is within the dashed box). (b) Even without the stop-learning mechanism (dotted line), the classification performance remains high for such trivial patterns (with no or moderate overlap), but it decreases for higher overlap. The result with stop-learning enabled (solid line) is hardly affected by increasing overlap.

Conditions with little or no overlap between patterns were classified properly (high AUC values) even with the stop-learning mechanism disabled (see the squares in Fig. 6.13(b)). However, the effect of the stop-learning mechanism becomes evident for high values of overlap (see the circles in Fig. 6.13(b)).

6.8 Classification of graded patterns

In addition to using binary spatial patterns, we performed experiments with graded patterns: patterns in which the mean frequencies could be 2 Hz, 10 Hz, 20 Hz, or 30 Hz (as opposed to just 2 Hz or 30 Hz). Two samples of random graded input patterns are shown in Fig. 6.14(a). Example spike raster plots corresponding to the mean frequencies used are shown next to the patterns.

Figure 6.14: Classification performance (AUC values) measured in response to graded input patterns. The patterns are formed by spike trains with four possible mean frequency values. (a) Two typical input vectors with the four values shown in shades of gray. Example spike raster plots are shown beside them. (b) With graded patterns, as expected, the classification performance degrades much faster than for the binary patterns in Fig. 6.10(a).
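To make the stimulus construction concrete, the sketch below draws a random graded pattern and generates one Poisson spike train per synapse at the corresponding mean rate (an illustrative Python reconstruction; in the experiments the stimuli were delivered to the chip as address-events):

    import numpy as np

    RATES = [2.0, 10.0, 20.0, 30.0]  # graded mean frequencies (Hz)

    def graded_pattern(n_synapses=60, rng=np.random.default_rng(1)):
        # Assign one of the four mean rates to each input synapse.
        return rng.choice(RATES, size=n_synapses)

    def poisson_train(rate_hz, duration_s, rng):
        # Spike times of a homogeneous Poisson process at a given rate.
        n = rng.poisson(rate_hz * duration_s)
        return np.sort(rng.uniform(0.0, duration_s, n))

    rng = np.random.default_rng(1)
    pattern = graded_pattern(rng=rng)
    trains = [poisson_train(r, duration_s=2.0, rng=rng) for r in pattern]
    print(pattern[:6], [len(t) for t in trains[:6]])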
We performed experiments similar to those described in Section 6.5 for classifying random patterns assigned to two classes using a single neuron, and quantified the classifier's performance using the ROC analysis, as shown in Fig. 6.14(b). The AUC value decreases with the number of patterns presented to the classifier during the training phase. The overall trend is similar to the one shown in Fig. 6.10(a), but the rate at which the AUC value decreases (i.e., at which the classification performance degrades) is much higher here. This is expected behavior, as the similarity between input patterns is even higher for graded values, and the correlation between patterns increases as more and more of them are used in the training set.

To further analyze the classification performance of the device on graded input patterns, we created spatially distributed Gaussian profiles as input vectors. The profiles had a standard deviation of 6 synapses and a maximum mean frequency of 30 Hz. In Fig. 6.15 (top row) we show two such patterns in four panels, with increasing amounts of overlap. The first pattern (labeled a) is centered around synapse #25, while the other pattern (labeled b) is gradually shifted from synapse #45 (leftmost plot) to synapse #30 (rightmost plot). Pattern a was trained as a C+ class pattern, while pattern b was trained as a C- class pattern.

Figure 6.15: Classification performance for two graded and overlapped input stimuli with Gaussian profiles, a (gray) and b (black). The top row shows the two input stimuli with increasing areas of overlap (left to right). The X-axis represents the input synapse address and the Y-axis its mean input frequency. The output neuron is trained to classify a as a C+ class pattern and b as a C- class pattern. The bottom row shows the neuron's mean output frequency in response to the a and b patterns during the testing phase.

The outcome of the classification experiment, during the test phase, is plotted in the bottom row of Fig. 6.15. As expected, the neuron responds with a high firing rate to pattern a and with a low one to pattern b. The difference between the high and low mean rates decreases as the overlap between the patterns increases, but the classifier manages to maintain a difference in the output firing rates even when the patterns overlap almost completely.
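For completeness, the Gaussian input profiles can be reproduced as follows (illustrative Python; the 6-synapse standard deviation, the 30 Hz peak and the centre positions follow the description above, while the function names are placeholders):

    import numpy as np

    def gaussian_profile(center, n_synapses=60, sigma=6.0, peak_hz=30.0):
        # Mean firing rate per synapse for a Gaussian-shaped input pattern.
        addr = np.arange(n_synapses)
        return peak_hz * np.exp(-0.5 * ((addr - center) / sigma) ** 2)

    pattern_a = gaussian_profile(center=25)   # trained as C+
    pattern_b = gaussian_profile(center=45)   # trained as C-
    overlap_b = gaussian_profile(center=30)   # strongly overlapping case
    print(pattern_a.max(), pattern_a.argmax(), overlap_b.argmax())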
6.9 Conclusions

In this chapter I showed classification results that verify the functionality of spike-based learning on the IFSL-v2 chip. Random binary patterns of mean firing rates were used as input. I showed how these results could be extended to the classification of binary 2D images and to the recognition of one pattern out of many. To characterize the classification performance of the system in a thorough and quantitative way, a discrimination analysis based on the receiver operating characteristic (ROC) was presented.

Most of the early spike-based learning circuits reported in the literature focused on the detailed characterization of a single silicon synapse (Häfliger and Rasche, 1999; Fusi et al., 2000). In other studies, data from a very small number of synapses were shown to demonstrate the change in synaptic weight according to a learning rule (Shon et al., 2002; Vogelstein et al., 2003; Bofill-i Petit and Murray, 2004, etc.). In recent studies, Schemmel et al. (2006) and Indiveri et al. (2006b) also reported spike-timing dependent plasticity on silicon and emphasized replicating the shape of the temporal window of plasticity. However, none of them report collective behavior generated from an array of plastic synapses, or show any statistical analysis of the entire system. In Yang et al. (2006) and Koickal et al. (2007), spike-based learning mechanisms were utilized for adaptation in silicon synapses, but pattern classification on a VLSI network of spiking neurons was explicitly demonstrated only in Arthur and Boahen (2006) and in Häfliger (2007). However, the patterns to be learned were rather trivial, with little or no overlap between them. Furthermore, neither a quantitative analysis of the classification performance nor a measure of the storage capacity of a spiking VLSI network had been reported in the literature. Giulioni et al. (2007) implemented the same plasticity rule as in this project, and reported very basic learning experiments concentrating mostly on the characterization of the synaptic changes. In this chapter, in addition to a quantitative analysis of the network, I showed how the learning performance could be improved further by using boosting techniques (Polikar, 2006). Memory recall from a partially corrupted pattern, as in associative memory, was also shown. I further demonstrated that the scaling of the storage capacity in the VLSI system is in accordance with that predicted by the theory (Brader et al., 2007). The robust classification of highly correlated patterns shown here had not previously been reported in any other bio-plausible hardware system. Finally, I showed how the learning circuits can correctly classify more complex patterns that are not restricted to binary input firing rates.

Chapter 7

Discussion

7.1 Relevance of the work described in this thesis

Natural evolution has led to robust computational architectures based on principles conceptually different from those of classical digital computation. Biological systems are far more efficient in solving ill-posed problems and extracting reliable information from noisy and ambiguous data (Douglas et al., 1994). To bridge the gap between the computational abilities of biological and engineered systems, the concept of neuromorphic engineering took root at the California Institute of Technology, USA, during the mid-eighties, with the research of Carver Mead. He envisioned that the inherent similarity between the physics of silicon and that of biology can be exploited to build computational primitives in VLSI circuits (Mead, 1990). Neuromorphic engineers attempt to capture the computational power and efficiency of biological neural systems in hybrid analog-digital VLSI devices. These devices employ a design strategy similar to that of biology: local computations are performed in analog, and the results are communicated using all-or-none binary events (spikes). The significance of neuromorphic systems is that they offer a method of exploring neural computation in a medium whose physical behavior is analogous to that of biological nervous systems and that operates in real time, irrespective of size. The challenge for neuromorphic engineering is to explore the methods of biological information processing in a practical electrical engineering context. Today this field has grown into larger bio-inspired hardware systems, with a variety of silicon devices being designed at various research labs around the world.
Neuromorphic engineering includes the design of computational primitives like silicon neurons and synapses, different kinds of intelligent sensors, analog signal processing systems, spike-based networks and devices for low-power biomedical applications (see Boahen, 2005; Sarpeshkar, 2006, for reviews). My specific contribution to the field focuses on the design and implementation of spike-based neural networks with learning capabilities. A growing interest in such systems has recently led to the design and fabrication of an increasing number of VLSI networks of integrate-and-fire (I&F) neurons (Chicca et al., 2003; Liu and Douglas, 2004; Bofill-i Petit and Murray, 2004; Arthur and Boahen, 2006; Badoni et al., 2006; Indiveri et al., 2006a; Häfliger, 2007), including multi-chip networks (Choi et al., 2005; Serrano-Gotarredona et al., 2005; Vogelstein et al., 2007).

Learning and classification undoubtedly form an important part of building a complete, multi-chip, real-time, behavioral system in hardware. Yet the volume of research devoted to neuromorphic learning chips is comparatively low. One obvious reason is the lack of established models of spike-based learning; another is the difficulty of long-term storage of synaptic weights in a silicon device. In this context, the work presented here is extremely relevant, as it shows a flexible neural network that implements a very robust spike-based learning algorithm (Brader et al., 2007). It also solves the problem of weight storage by utilizing a bistable memory element, a natural choice in the entire information industry for the last half-century. The specific achievements and extensions to the state of the art made by this project, and their relevance to the ongoing research on building large-scale artificial systems capable of brain-like behavior, are highlighted below.

7.1.1 A robust AER communication system

Neuromorphic chips routinely require tens of thousands of axonal connections to propagate spikes from the silicon neurons, far too many to be implemented using dedicated wires. The address-event link, originally introduced by Sivilotti (1991) and Mahowald (1994), implements virtual connections using time-division multiple access of a single physical channel. Since its early days, the design of the communication channel has been significantly improved by the work of Boahen (2000, 2004a). AER communication, being the backbone for data transfer in neuromorphic chips, has to be carefully optimized for both speed and robustness.

In this thesis, I showed how pipelining the communication channel increases its throughput, and elaborated on the design aspects of the pipeline originally proposed in Boahen (2000). This formal approach to AER circuit design is in contrast to the heuristic method used in many previous generations of neuromorphic chips. It also helped in designing an improved AER communication system, independent of any external bias. I showed a reliable asynchronous communication channel without the problems of missing spikes seen in Chicca (2006). I discussed the design of the individual combinational circuit blocks in the communication channel; in particular, I focused on the arbiter and the decoder designs. The arbiter, being an integral part of an asynchronous transmitter, should be both fast and unbiased in its operation.
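As background to this discussion, the core idea of the address-event link can be condensed into a few lines of Python (a conceptual sketch only: a real AER channel implements the merging with asynchronous arbiter and handshake circuits, not software, and all names below are illustrative):

    from typing import List, Tuple

    Event = Tuple[float, int]  # (spike time in seconds, neuron address)

    def encode(spike_trains: List[List[float]]) -> List[Event]:
        # Merge per-neuron spike trains into one time-ordered stream that
        # shares a single physical channel; simultaneous spikes would be
        # sequenced by the on-chip arbiter.
        events = [(t, addr) for addr, train in enumerate(spike_trains)
                  for t in train]
        return sorted(events)

    def decode(events: List[Event], n_neurons: int) -> List[List[float]]:
        # The receiver's decoder re-creates the virtual point-to-point wires.
        trains: List[List[float]] = [[] for _ in range(n_neurons)]
        for t, addr in events:
            trains[addr].append(t)
        return trains

    stream = encode([[0.01, 0.05], [0.02], [0.03, 0.04]])
    print(stream)             # [(0.01, 0), (0.02, 1), (0.03, 2), (0.04, 2), (0.05, 0)]
    print(decode(stream, 3))  # recovers the original three spike trains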
The design basics and data from different chips are presented to point out the improvements in the arbiter design proposed in Boahen (2004a). This is an important enhancement over the problems observed, for example, in Lichtsteiner and Delbrück (2005), where increased spiking activity in one region of the chip restricted all the AER traffic to that region. Another important contribution was the dual-rail data representation (Martin and Nystrom, 2006) used in the receiver chip, a step toward increasing the robustness of the AER communication system.

7.1.2 Synaptic plasticity in silicon

The implementation of spike-based synaptic plasticity on silicon has a fairly recent history, and long-term storage of the analog synaptic weights still poses an important technological challenge. Over the years, researchers have accomplished this difficult task by using floating-gate storage (Diorio et al., 1997; Häfliger and Rasche, 1999; Shon et al., 2002), by implementing digital look-up tables (Vogelstein et al., 2003; Wang and Liu, 2006), or by storing only the binary state of the synapse (Arthur and Boahen, 2006). Meanwhile, there has been accumulating evidence that biological synaptic contacts undergo all-or-none modification (Petersen et al., 1998; O'Connor et al., 2005), with no intermediate stable states. In this project we used a binary synapse with limited analog resolution that is designed to take only two values over long time scales (Brader et al., 2007). The plasticity model uses Hebbian learning with stochastic updates and an additional stop-learning condition to classify broad classes of linearly separable patterns.

In this work, I demonstrated the bistable nature of the silicon synapse along with its stochastic transition from one state to the other. Experiments demonstrating the detailed dynamics of synaptic plasticity were performed on one single synapse, due to the limited number of external probe points. The weight update of this synapse was also analyzed for its dependence on the phase between the pre- and post-synaptic spike times. I described the method of determining the LTP and LTD probability curves, which depend on the pre- and post-synaptic frequencies. It is also essential for all the synapses to show the same stochastic transition behavior as the one tested in detail. The transition probability is an outcome of the collective performance of the various circuit blocks associated with the synapse, and requires optimized values of many different voltage biases. In previous works, Fusi et al. (2000) and Chicca et al. (2003) have shown frequency-dependent transition probabilities, but only in a single synaptic circuit. For the first time, I reported data showing controlled frequency dependence of the stochastic transition probabilities of an entire synaptic array in a full-custom VLSI chip.

Using the IFSL-v2 chip, I showed how current-mode signal processing helped in reducing unwanted coupling between the feedback signals from the neuron to the synapses. This method could be used extensively for similar learning rules in which a few global feedback signals are broadcast to all synapses to control their plasticity. The current-mode integrators, current comparators (derived from the current-mode WTA proposed by Lazzaro et al. (1989)) and active current mirrors used in this chip consume much less power than the corresponding voltage-mode processing.
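The collective effect of such stochastic binary updates can be caricatured in software as follows (a deliberately simplified, rate-level sketch of the update scheme: the probabilities, the stop-learning band and all names are illustrative placeholders, not measured chip parameters):

    import numpy as np

    def update_weights(w, active, output_hz, target_high, rng,
                       p_ltp=0.1, p_ltd=0.1, band=(20.0, 60.0)):
        # w: array of binary weights (0 or 1); active: boolean mask of the
        # synapses stimulated by the current pattern.
        lo, hi = band
        # Stop-learning: no update once the output already matches the teacher.
        if target_high and output_hz > hi:
            return w
        if not target_high and output_hz < lo:
            return w
        r = rng.random(w.size)
        if target_high:   # potentiate stimulated synapses with probability p_ltp
            return np.where(active & (r < p_ltp), 1, w)
        else:             # depress stimulated synapses with probability p_ltd
            return np.where(active & (r < p_ltd), 0, w)

    rng = np.random.default_rng(2)
    w = np.zeros(60, dtype=int)
    active = np.arange(60) < 30      # pattern stimulates synapses 0-29
    w = update_weights(w, active, output_hz=35.0, target_high=True, rng=rng)
    print(w.sum())                   # a small random fraction of the 30 flips up

Because each update flips only a random fraction of the stimulated bistable synapses, repeated presentations converge slowly, which is precisely what protects old memories from being overwritten.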
7.1.3 Learning and classification in VLSI

Memory is a fundamental component of all learning mechanisms that lead to the classification of learned stimuli. In particular, the memory elements (the synaptic weights) should be modified to learn from experience, creating new memories (memory encoding). At the same time, the old memories should be protected from being overwritten by new ones (memory preservation). In this work I described a spike-based VLSI system which can learn to classify complex patterns, efficiently solving both the memory-encoding and the memory-preservation problems. Specifically, I demonstrated how the VLSI system can robustly classify patterns of mean firing rates, even in the case in which there are strong correlations between input patterns.

Almost none of the neuromorphic systems designed for spike-based synaptic plasticity report any classification behavior (see Fusi et al., 2000; Shon et al., 2002; Bofill-i Petit and Murray, 2004; Indiveri et al., 2006b), with notable exceptions in Arthur and Boahen (2006) and Häfliger (2007). However, in those studies the patterns to be classified were very simple, with little or no overlap between them, and none showed results for the simultaneous classification of multiple patterns. None of them report any quantitative analysis of classification performance or any measure of the storage capacity of the VLSI network. The work of Giulioni et al. (2007) implements the same plasticity rule as in this project, and reports simple learning experiments concentrating mostly on the characterization of the synaptic weights. In this thesis, I showed results verifying the functionality of the VLSI system in the difficult condition of random binary patterns as input. Classification of multiple input patterns and rigorous quantification of the network performance were shown using ROC analysis. In addition, I showed how the learning performance could be further improved by using boosting techniques (Polikar, 2006). Results from the chip demonstrated that the scaling of the storage capacity is in accordance with that predicted by theory (Brader et al., 2007). This is an important step towards justifying a scaled-up version of the network, with a few thousand synapses per neuron. The robust classification performance with highly correlated patterns shown in this thesis has not yet been demonstrated in any other bio-plausible hardware system. Finally, I showed how the learning circuits can correctly classify more complex patterns that are not restricted to binary input firing rates. These results set an important step towards the online classification of spike trains from artificial sensors or biological neural networks.

7.2 Future Work

Spike-based learning and classification has become an active field of research because of its importance in processing spatio-temporal information in the cortex (Maass and Bishop, 1998). Pulse-coded communication has also proved to be of great advantage in developing large multi-chip artificial behavioral systems, e.g., in the EU FET CAVIAR (2002-2006) project. VLSI devices that implement spike-based learning and signal processing are essential for various applications, including autonomous robotics, stand-alone sensors and computational modules in brain-machine interfaces. Considering the satisfactory classification performance of the IFSL family of chips presented in this thesis, it is important to continue research in the same direction.
The obvious next step would be to extend the methods and devices developed here to classify spike trains obtained from real sensors and biological nervous systems. Ongoing work in Choi et al. (2008) shows promising results in using the IFSL-v2 chip to recognize spoken vowels captured by a silicon cochlea (Chan et al., 2006). Other preliminary tests show that these chips are good candidates for the classification of spike rasters recorded in in-vivo experiments, e.g., from monkey pre-motor cortex while it plans a grasping task (Musallam et al., 2004), paving the way for real-time neuro-prosthetics.

To improve the VLSI system used in this work, various suggestions are presented in Chapter 4. The size of the silicon synapse can be greatly reduced by using a single EPSC block per neuron, similarly to the approach proposed in Arthur and Boahen (2006), allowing for many more synapses per unit area. Enhancing the dynamic range of the silicon synapse can provide better control of the stochastic transition probabilities, essential for robust classification. Improved multiplexers for configuring the synaptic density will provide higher flexibility in the choice of the network architecture, for classifying a wider variety of input stimuli.

7.3 Outlook

In the late 1980s and 1990s, various research efforts (both industrial and academic) were dedicated to the design and implementation of hardware neural networks (NN), both in the analog and the digital domain (see Lindsey and Lindblad, 1995; Ienne et al., 1996; Lindsay, 2002). However, very few of these efforts matured enough to become commercially successful (e.g., Synaptics; Adaptive Solutions; Ligature). The modest level of achievement can be largely attributed to the fact that those works were based on ASIC (Application-Specific Integrated Circuit) technology that was not competitive enough to justify large multi-chip adoption for neural-network applications. Most of these silicon devices were built to behave as hardware accelerators, to execute, in real time, the successful classical NN models (Rumelhart et al., 1986; Hertz et al., 1991). These rate-coded models largely ignored the biological realism of spike-based computation and the physical limits of bounded synaptic weights. Neither did the hardware NNs gain any particular advantage from their ASIC implementation compared to a general-purpose digital processor. On the other hand, the progressive scaling of CMOS transistors, along with the advent of fast automated design and testing tools, drove the phenomenal success of digital processors, which outperformed the dedicated hardware.

Today, it is widely recognized that handling intra-die variability in device characteristics represents one of the biggest challenges for present and next-generation nano-CMOS transistors and circuits (Declerck, 2005). This has prompted the ITRS (International Technology Roadmap for Semiconductors) to suggest the concept of "More than Moore", to bring revolutionary changes in the way future integrated circuits and systems are conceived (ITRS 2007). (With "More than Moore", the ITRS added an axis perpendicular to traditional transistor scaling following Moore's law, which improves both transistor density and processing speed; the perpendicular axis refers to functional diversification, incorporating design techniques that do not necessarily scale according to Moore's law.) According to a well-known technology leader, Jan Rabaey (GSRC Berkeley), the unpredictable component behavior will lead to design strategies dependent on self-adaptivity, error resiliency and device randomness (Rabaey, 2005). Designing systems with a high degree of redundancy, built from large arrays, possibly in billions, of poorly matched devices, is now seriously considered by both academia and industry (Martorell and Cotofana, 2008; Bahar et al., 2007; FACETS; NanoCMOSgriD).
Power management (both distribution and dissipation) will be an even bigger issue when such arrays are required to function in an uninterrupted mode, e.g., in mobile or prosthetic devices. Not surprisingly, the architectural constraints in biological systems are very similar to those of advanced silicon technology (Mead, 1990; Vittoz, 1998). Inspired by biology, the neuromorphic systems described in this thesis use low-power hybrid analog/digital circuits to emulate nature in the interest of addressing these system-level issues. Together with advances in spike-based communication, they are a perfect fit for such massively parallel artificial systems, useful for advanced computing paradigms. From a physical hardware perspective, spikes allow for the reliable transmission and processing of information in large distributed systems, similar to the digital computing hardware of today. Recent theoretical work in computer science (Maass and Bishop, 1998) further suggests that spike-based computation offers a rich variety of principles that can be effectively used to synthesize large-scale novel computing structures. Spike-based real-time classification is a step towards the synthesis of complex structures that can have cognitive capabilities, taking us beyond the reactive properties of current systems. Such neuromorphic systems could potentially be used for the optimal exploitation of future emerging technologies that go well beyond the miniaturization and integration limits of CMOS in building adaptive, fault-tolerant systems.

In conclusion, the VLSI device described in this thesis is highly suitable for real-time spike-based classification and showed the most promising performance reported in the literature so far. The IFSL family of chips is an ideal example of hardware systems based on neuromorphic principles, from the design of low-power subthreshold circuit blocks to the demonstration of collective computation based on noisy elementary blocks. During this project, the prototype device has reached a stage of development where it can be interfaced with other spike-based devices for real-world applications.

Appendix A

C-element

The Muller C-element is a commonly used asynchronous logic component originally designed by David E. Muller. It applies logical operations on its inputs and has hysteresis (Sutherland, 1989). The output of the C-element reflects the inputs when the states of all inputs match; the output then remains in this state until all the inputs make a transition to the other state (see the truth table in Fig. A.1(a)). This model can be extended to the asymmetric C-element, where some inputs only affect the operation in one of the two transitions (positive or negative) of the output. As in Fig. A.1(b), input B has no effect in causing or restricting a downward transition. It is conventional to show C-elements with two outputs, instead of one, in asynchronous communication channel design. In such cases, both outputs refer to the same internal signal, i.e., C.
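The behavior just described can be stated compactly in software (a behavioral sketch in Python only; the silicon implementations discussed below realize it with dedicated switching and keeper blocks):

    def c_element(a: int, b: int, prev: int) -> int:
        # Symmetric Muller C-element: the output follows the inputs only
        # when they agree, otherwise it holds its previous state.
        return a if a == b else prev

    def asymmetric_c_element(a: int, b: int, prev: int) -> int:
        # Asymmetric variant of Fig. A.1(b): B participates only in the
        # rising transition; a falling transition needs only A to be 0.
        if a == 1 and b == 1:
            return 1
        if a == 0:
            return 0
        return prev

    # The output toggles only after *both* inputs have switched.
    state = 0
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        state = c_element(a, b, state)
        print(a, b, state)  # outputs 0, 1, 1, 0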
Figure A.1: The symmetric (a) and asymmetric (b) C-elements with their truth tables. The output holding the previous state is denoted by prv.

    (a) symmetric        (b) asymmetric
    A  B |  C            A  B |  C
    0  0 |  0            0  0 |  0
    0  1 |  prv          0  1 |  0
    1  0 |  prv          1  0 |  prv
    1  1 |  1            1  1 |  1

The four major implementations of C-elements reported in the literature are dynamic, weak feedback, conventional, and symmetric (Shams et al., 1998). The C-element circuits can be broken up into two basic functional blocks: the switching block and the keeper block. In Fig. A.2(a), the four stacked transistors at the input and the inverter at the output together form the switching block, while the parasitic capacitor connected to node C̄ acts as the keeper. The circuits from left to right in Fig. A.2 consist of an increasing number of transistors. In the next two circuits, the same switching block can easily be identified: in Fig. A.2(b) the feedback inverter plays the part of the keeper, while in Fig. A.2(c) the six transistors in the middle of the circuit form the keeper. In the symmetric implementation (Fig. A.2(d)), all the stacked transistors connected to the inputs (A and B) and the output inverter together form the switching block, and the two transistors connected to C function as the keeper.

Figure A.2: Various circuit implementations of the C-element: (a) dynamic, (b) weak feedback, (c) conventional, (d) symmetric.

Though the dynamic implementation is the fastest of the four, not having an explicit keeper makes it less robust. The conventional and the symmetric circuits are ratio-less implementations, while the weak-feedback one requires correct transistor ratios for proper functioning. The area overhead being nearly the same, the conventional implementation was chosen over the symmetric one, considering the ease of design.

Appendix B

Hand-Shaking Expansion

HSE, or hand-shaking expansion, shows the sequence of actions to be performed when designing an asynchronous link starting from the CHP (communicating hardware processes) formalism. The following table lists the HSE primitives.

    Operation     Notation    Explanation
    AND           u & v       High if both are high
    OR            u | v       Low if both are low
    Set           v+          Drive v high
    Clear         v-          Drive v low
    Wait          [u]         Wait till u is high
    Sequential    [u]; v+     Wait till u, then v+
    Concurrent    u+, v+      Drive u and v high
    Repeat        *[S]        Repeat statement S infinitely

Example:

    *[[Āo & Ea]; Er+, Na+; [Ao & Ēa]; Er-, Na-]        (B.1)

Read: wait till Ao is low and Ea is high, then drive Er and Na high; now wait for Ao to be high and Ea to be low, then drive Er and Na low; keep repeating this process forever.

Appendix C

Current-mode log-domain filter

The current-mode log-domain filter was first introduced by Seevinck (1990) and later analyzed in detail by Tsividis (1997), Mahattanakul and Toumazou (1999), Frey (2000) and others. This group of circuits is often referred to as ELIN (externally-linear-internally-nonlinear) filters, due to the nonlinear (logarithmic) transformation from current to voltage at their internal nodes. Log-domain filters were initially designed using bipolar junction transistors, which have the necessary exponential i-v relation. However, owing to the recent interest in low-power, low-voltage analog design, CMOS circuits working in subthreshold are often used for similar log-domain functions (Serra-Graells and Luis Huertas, 2005). Arthur and Boahen (2006) showed an ingenious application of log-domain filtering in neuromorphic circuits, where extremely low power consumption is one of the prime requirements.
Here I will describe various low-pass filters using the log-domain technique for neuromorphic applications, such as silicon neurons and synapses. Let us first analyze the circuit in Fig. C.1. Consider the input $I_{in}$ to be a subthreshold current and $V_{tau}$ a subthreshold voltage producing $I_\tau$. Transistor M2 mirrors the input as $I_2$. The KCL at node $V_L$ can be written as:

$$I_2 = I_\tau + C\frac{d}{dt}(0 - V_L) \;\Rightarrow\; C\frac{dV_L}{dt} = I_\tau - I_2 \qquad \mathrm{(C.1)}$$

Figure C.1: The basic current-mode log-domain filter. The output current $I_{out}$ is a low-pass filtered version of the input $I_{in}$. The gain and time constant of the filter cannot be independently controlled.

Considering subthreshold operation, the output current $I_{out}$ through a pMOS can be written as:

$$I_{out} = I_0\, e^{\frac{\kappa}{U_T}(V_{DD} - V_L)} \;\Rightarrow\; \frac{dI_{out}}{dt} = -I_{out}\frac{\kappa}{U_T}\frac{dV_L}{dt} \qquad \mathrm{(C.2)}$$

Combining the above two equations, we get:

$$\frac{dI_{out}}{dt} = -I_{out}\frac{\kappa}{U_T}\,\frac{I_\tau - I_2}{C_L} \;\Rightarrow\; \frac{C_L U_T}{\kappa}\frac{dI_{out}}{dt} = -I_{out}I_\tau + I_{out}I_2 \;\Rightarrow\; \tau\dot{I}_{out} + I_{out} = \frac{I_{out}I_2}{I_\tau} \qquad \mathrm{(C.3)}$$

where $\tau = \frac{C_L U_T}{\kappa I_\tau}$. Looking into the circuit we find that the transistors M1, M2 and M4 form a translinear loop (Gilbert, 1990). Marking the corresponding gate-source voltages with arrows, we have:

$$V_{gs,1} = V_{gs,2} + V_{gs,4} \qquad \mathrm{(C.4)}$$

Without going into the details of its derivation, the translinear principle (for an unequal number of elements on the two sides of the equality) gives:

$$I_{in}I_0 = I_2 I_{out} \qquad \mathrm{(C.5)}$$

where $I_{in}$, $I_2$ and $I_{out}$ are the currents in M1, M2 and M4 respectively, and $I_0$ is the leakage current through a transistor when $V_{gs}$ is zero. Using this current relation, the differential equation Eq. C.3 can be further modified to:

$$\tau\dot{I}_{out} + I_{out} = \frac{I_{in}I_0}{I_\tau} \qquad \mathrm{(C.6)}$$

This first-order ordinary differential equation can be solved by rewriting it and multiplying both sides by the integrating factor $e^{\int \frac{1}{\tau}dt}$ ($= e^{t/\tau}$):

$$\dot{I}_{out}\,e^{t/\tau} + \frac{I_{out}}{\tau}\,e^{t/\tau} = \frac{I_{in}I_0}{\tau I_\tau}\,e^{t/\tau} \;\Rightarrow\; d\left(e^{t/\tau}I_{out}\right) = \frac{I_{in}I_0}{\tau I_\tau}\,e^{t/\tau}\,dt \qquad \mathrm{(C.7)}$$

Integrating both sides, we get:

$$e^{t/\tau}I_{out} = \frac{I_{in}I_0}{I_\tau}\,e^{t/\tau} + C \qquad \mathrm{(C.8)}$$

The initial conditions for a positive step input current are $I_{out}(0^+) = 0$ and $I_{in}(0^+) = I_p$. The same equation can be solved for a negative step, after the steady state has been reached, with $I_{out}(0^+) = \frac{I_p I_0}{I_\tau}$ and $I_{in}(0^+) = 0$:

$$I_{out}(t) = \frac{I_p I_0}{I_\tau}\left(1 - e^{-t/\tau}\right) \quad \text{(positive step)} \qquad \mathrm{(C.9)}$$

$$I_{out}(t) = \frac{I_p I_0}{I_\tau}\,e^{-t/\tau} \quad \text{(negative step)} \qquad \mathrm{(C.10)}$$

This shows the behavior of a first-order low-pass filter, where the steady-state current is determined by the forcing function $\frac{I_{in}I_0}{I_\tau}$ on the right-hand side of Eq. C.6. Being a function of $I_\tau$, this term affects the gain whenever the time constant is varied. The factor $I_0$ ($\sim 10^{-18}$ A) also limits the gain to a low value.
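As a sanity check on this derivation, Eq. C.6 can be integrated numerically and compared against the closed-form step response of Eq. C.9 (a Python simulation sketch; the parameter values are arbitrary illustrative numbers, not chip measurements):

    import numpy as np

    # Illustrative subthreshold parameters (placeholders, not chip data).
    I0, I_tau, I_p = 1e-18, 1e-9, 1e-9   # leakage, bias and step currents (A)
    C_L, U_T, kappa = 1e-12, 0.025, 0.7  # load cap (F), thermal voltage (V), slope
    tau = C_L * U_T / (kappa * I_tau)    # filter time constant (s)

    dt = tau / 1000.0
    t = np.arange(0.0, 5 * tau, dt)
    I_out = np.zeros_like(t)
    forcing = I_p * I0 / I_tau           # steady-state value of Eq. C.6
    for i in range(1, len(t)):
        # Forward-Euler integration of tau * dIout/dt + Iout = Iin*I0/Itau.
        I_out[i] = I_out[i - 1] + dt / tau * (forcing - I_out[i - 1])

    analytic = forcing * (1.0 - np.exp(-t / tau))  # Eq. C.9
    print(np.max(np.abs(I_out - analytic)) / forcing)  # small relative error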
To add an independent control of the gain, we can connect the source of M1 to a voltage source $V_{gain}$ instead of $V_{DD}$. The translinear loop can then be written as:

$$V_b + V_{gs,1} = V_{gs,2} + V_{gs,4} \;\Rightarrow\; I_{in}I_{gain} = I_2 I_{out} \qquad \mathrm{(C.11)}$$

where $V_b = V_{DD} - V_{gain}$ and $I_{gain} = I_0\, e^{\frac{\kappa}{U_T}V_b}$. The circuit is shown in Fig. C.2(a). Substituting the translinear current into Eq. C.3, the forcing function becomes $\frac{I_{in}I_{gain}}{I_\tau}$, so the voltage $V_{gain}$ can be used to control the gain independently. In order to decouple the gain from $I_{tau}$ and also make it independent of $I_0$, the circuit in Fig. C.2(b) can be used.

Figure C.2: Modifications of the basic log-domain filter: (a) with additional gain control; (b) with gain and time constant decoupled.

The only difference is in the source of transistor M1, which is connected to $V_{tau}$ instead of $V_{DD}$. This brings transistor M3 into the translinear loop as well. The translinear current equation (for the same number of elements on either side of the equality) is given by:

$$I_{in}I_\tau = I_2 I_{out} \qquad \mathrm{(C.12)}$$

The analysis of the remaining part of the circuit stays the same. Substituting the translinear currents into Eq. C.3, the new differential equation becomes:

$$\tau\dot{I}_{out} + I_{out} = I_{in} \qquad \mathrm{(C.13)}$$

Hence the new forcing function is just $I_{in}$. Though the gain is now independent of the time constant and of $I_0$, it has a constant magnitude of unity. A better control of the steady-state gain can be achieved by using an extra transistor, as shown in Fig. C.3(a). Here $I_{gain}$ is a subthreshold current used to bias the transistor M1. The translinear loop is given by:

$$V_{gs,5} + V_{gs,1} = V_{gs,2} + V_{gs,4} \;\Rightarrow\; I_{in}I_{gain} = I_2 I_{out}$$

and substituting in Eq. C.3:

$$\tau\dot{I}_{out} + I_{out} = \frac{I_{in}I_{gain}}{I_{tau}} \qquad \mathrm{(C.14)}$$

Figure C.3: Variations of the log-domain circuit to implement independent gain and time constant control.

Hence the forcing function is devoid of $I_0$ but is again dependent on the time constant. Including the transistor M3 in the translinear loop, as shown in Fig. C.3(b), can make the gain independent of $I_{tau}$. The translinear loop can be written as:

$$V_{gs,3} + V_{gs,5} + V_{gs,1} = V_{gs,2} + V_{gs,4} \;\Rightarrow\; I_{in}I_{gain}I_{tau} = I_0 I_2 I_{out}$$

and substituting in Eq. C.3:

$$\tau\dot{I}_{out} + I_{out} = \frac{I_{in}I_{gain}}{I_0} \qquad \mathrm{(C.15)}$$

Though this brings the $I_0$ term back into the forcing function, it now appears in the denominator: a small $I_0$ will now increase the gain, in contrast to its effect in Eq. C.6. A complete removal of $I_0$ from the forcing function can be achieved by using another translinear loop to generate $I_{gain}$, instead of a simple current source. Such a current source is shown in Fig. C.4(a); its output is proportional to $I_0$, of the form $I_{gain} = \frac{I_0 I_2}{I_1}$, where $I_1$ and $I_2$ are set by two voltage-biased MOS current sources. Using this current source in the circuit of Fig. C.3(b) reduces the forcing function to $\frac{I_{in}I_2}{I_1}$, which is free of $I_0$.

A different log-domain filter is reported in Bartolozzi and Indiveri (2007b). Shown in Fig. C.4(b), this circuit does not utilize the translinear principle. In subthreshold operation, the current $I_2$ can be written as:

$$I_2 = \frac{I_{in}}{1 + \frac{I_{out}}{I_{gain}}} \qquad \mathrm{(C.16)}$$

where $I_{gain}$ is a function of $V_{gain}$ ($I_{gain} = I_0\, e^{-\frac{\kappa}{U_T}(V_{gain}-V_{DD})}$) and not the current through transistor M1. If we consider that in the steady-state condition $I_{out} \gg I_{gain}$, then $I_2$ reduces to $\frac{I_{in}I_{gain}}{I_{out}}$. Hence the forcing function becomes $\frac{I_{in}I_{gain}}{I_\tau}$.

Figure C.4: (a) A translinear current source for use in log-domain filters. (b) The differential pair integrator circuit, which approximates a log-domain filter with independent gain and time constant control (adapted from Bartolozzi et al. (2006)).

Bibliography

L. F. Abbott and S. Song. Asymmetric Hebbian learning, spike timing and neural response variability. In Advances in Neural Information Processing Systems, volume 11, pages 69-75, 1999.
L. F. Abbott and S. B. Nelson. Synaptic plasticity: taming the beast. Nature Neuroscience, 3:1178-1183, November 2000.
Adaptive Solutions. URL http://www.adaptivesolutions.com/.
J. Arthur and K. Boahen. Learning in silicon: Timing is everything. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA, 2006.
J. V. Arthur and K. Boahen.
Recurrently connected silicon neurons with active dendrites for one-shot learning. In IEEE International Joint Conference on Neural Networks, volume 3, pages 1699-1704, July 2004.
D. Badoni, M. Giulioni, V. Dante, and P. Del Giudice. An aVLSI recurrent network of spiking neurons with reconfigurable and plastic synapses. In Proceedings of the IEEE International Symposium on Circuits and Systems, pages 1227-1230. IEEE, May 2006.
R. I. Bahar, C. Lau, D. Hammerstrom, D. Marculescu, J. Harlow, A. Orailoglu, W. H. Joyner, and M. Pedram. Architectures for silicon nanoelectronics and beyond. IEEE Computer, pages 25-33, 2007.
C. Bartolozzi and G. Indiveri. A spiking VLSI selective attention multi-chip system with dynamic synapses and integrate and fire neurons. In B. Schölkopf, J. C. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19, Cambridge, MA, Dec 2007a. Neural Information Processing Systems Foundation, MIT Press. (In press).
C. Bartolozzi and G. Indiveri. Synaptic dynamics in analog VLSI. Neural Computation, 19:2581-2603, Oct 2007b.
C. Bartolozzi, S. Mitra, and G. Indiveri. An ultra low power current-mode filter for neuromorphic systems and biomedical signal processing. In IEEE Proceedings on Biomedical Circuits and Systems (BioCAS06), pages 130-133, 2006.
A. J. Bell and T. J. Sejnowski. The independent components of natural scenes are edge filters. Vision Res., 37:3327-3338, Dec 1997.
H. K. O. Berge and P. Hafliger. High-speed serial AER on FPGA. In IEEE International Symposium on Circuits and Systems, pages 857-860, 2007.
G. Bi and M. Poo. Synaptic modification by correlated activity: Hebb's postulate revisited. Annu. Rev. Neurosci., 24:139-166, 2001.
G.-Q. Bi and M.-M. Poo. Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24):10464-10472, 1998.
B. Blais, L. N. Cooper, and H. Shouval. Formation of direction selectivity in natural scene environments.
K. A. Boahen. Point-to-point connectivity between neuromorphic chips using address-events. IEEE Transactions on Circuits and Systems II, 47(5):416-34, 2000.
K. A. Boahen. A burst-mode word-serial address-event link - I: Transmitter design. IEEE Circuits and Systems I, 51(7):1269-80, 2004a.
K. A. Boahen. A burst-mode word-serial address-event link - II: Receiver design. IEEE Circuits and Systems I, 51(7):1281-91, 2004b.
K. A. Boahen. A burst-mode word-serial address-event link - III: Analysis and test results. IEEE Circuits and Systems I, 51(7):1292-300, 2004c.
K. A. Boahen. Neuromorphic microchips. Scientific American, pages 56-63, May 2005.
K. A. Boahen. A retinomorphic vision system. IEEE Micro, 16(5):30-39, October 1996.
K. A. Boahen. Communicating neuronal ensembles between neuromorphic chips. In T. S. Lande, editor, Neuromorphic Systems Engineering, pages 229-259. Kluwer Academic, Norwell, MA, 1998.
A. Bofill-i Petit and A. F. Murray. Synchrony detection and amplification by silicon neurons with STDP synapses. IEEE Transactions on Neural Networks, 15(5):1296-1304, September 2004.
J. Brader, W. Senn, and S. Fusi. Learning real world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation, 19:2881-2912, 2007.
N. Caporale and Y. Dan. Spike timing-dependent plasticity: A Hebbian learning rule. Annu. Rev. Neurosci., Feb 2008.
G. Cauwenberghs and M. A. Bayoumi, editors. Learning on Silicon: Adaptive VLSI Neural Systems.
Kluwer, Boston, MA, 1999.
CAVIAR. Convolution address-event-representation (AER) vision architecture for real-time. IST-2001-34124 EU Grant, 2002-2006.
V. Chan, S.-C. Liu, and A. van Schaik. AER EAR: A matched silicon cochlea pair with address event representation interface. IEEE Transactions on Circuits and Systems I, 54(1):48-59, Jan 2006. Special Issue on Sensors.
H. Chen, C. D. Fleury, and A. F. Murray. Continuous-valued probabilistic behavior in a VLSI generative model. IEEE Trans. on Neural Networks, 17(3):755-770, 2006.
E. Chicca. A Neuromorphic VLSI System for Modeling Spike-Based Cooperative Competitive Neural Networks. PhD thesis, ETH Zürich, Zürich, Switzerland, April 2006.
E. Chicca, D. Badoni, V. Dante, M. D'Andreagiovanni, G. Salina, S. Fusi, and P. Del Giudice. A VLSI recurrent network of integrate-and-fire neurons connected by plastic synapses with long term memory. IEEE Transactions on Neural Networks, 14(5):1297-1307, September 2003.
E. Chicca, A. M. Whatley, V. Dante, P. Lichtsteiner, T. Delbrück, P. Del Giudice, R. J. Douglas, and G. Indiveri. A multi-chip pulse-based neuromorphic infrastructure and its application to a model of orientation selectivity. IEEE Transactions on Circuits and Systems I, Regular Papers, 54(5):981-993, 2007.
S. Choi, S. Mitra, G. Indiveri, S.-C. Liu, and S. Y. Lee. Real-time sound-recognition using neuromorphic hardware. 2008. In preparation.
T. Y. W. Choi, B. E. Shi, and K. Boahen. An on-off orientation selective address event representation image transceiver chip. IEEE Transactions on Circuits and Systems I, 51(2):342-353, 2004.
T. Y. W. Choi, P. A. Merolla, J. V. Arthur, K. A. Boahen, and B. E. Shi. Neuromorphic implementation of orientation hypercolumns. IEEE Transactions on Circuits and Systems I, 52(6):1049-1060, 2005.
M. Coath, J. Brader, S. Fusi, and S. L. Denham. Multiple views of the response of an ensemble of spectro-temporal features supports concurrent classification of utterance, prosody, sex and speaker identity. Network, (2-3):285-300, 2005.
E. Culurciello and A. G. Andreou. A comparative study of access topologies for chip-level address-event communication channels. IEEE Transactions on Neural Networks, 14(5):1266-77, September 2003.
E. Culurciello, R. Etienne-Cummings, and K. Boahen. Arbitrated address-event representation digital image sensor. Electronics Letters, 37(24):1443-1445, Nov 2001.
E. Culurciello, R. Etienne-Cummings, and K. Boahen. A biomorphic digital image sensor. IEEE Journal of Solid-State Circuits, 38(2):281-294, 2003.
G. Danese and F. Leporati. A parallel neural processor for real-time applications. IEEE Micro, 22(3):20-31, 2002.
V. Dante, P. Del Giudice, and A. M. Whatley. PCI-AER - hardware and software for interfacing to address-event based neuromorphic systems. The Neuromorphic Engineer, 2(1):5-6, 2005. http://ineweb.org/research/newsletters/index.html.
P. Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.
G. Declerck. A look into the future of nanoelectronics. In IEEE Symposium on VLSI Technology, Digest of Technical Papers, pages 6-10, 2005.
S. R. Deiss, R. J. Douglas, and A. M. Whatley. A pulse-coded communications infrastructure for neuromorphic systems. In W. Maass and C. M. Bishop, editors, Pulsed Neural Networks, chapter 6, pages 157-78. MIT Press, 1998.
A. Destexhe, Z. F. Mainen, and T. J. Sejnowski.
Methods in Neuronal Modelling, from Ions to Networks, chapter Kinetic Models of Synaptic Transmission, pages 1-25. The MIT Press, Cambridge, Massachusetts, 1998.
C. Diorio, P. Hasler, B. A. Minch, and C. Mead. A single-transistor silicon synapse. IEEE Transactions on Electron Devices, 43(11):1972-1980, 1996.
C. Diorio, P. Hasler, B. A. Minch, and C. A. Mead. A floating-gate MOS learning array with locally computed weight updates. IEEE Transactions on Electron Devices, 44(12):2281-2289, December 1997.
M. Djurfeldt, M. Lundqvist, C. Johansson, M. Rehn, O. Ekeberg, and A. Lansner. Brain-scale simulation of the neocortex on the Blue Gene/L supercomputer. IBM Journal of Research and Development, 52:31-41, 2008.
R. J. Douglas, M. A. Mahowald, and K. A. C. Martin. Hybrid analog-digital architectures for neuromorphic systems. In Proc. IEEE World Congress on Computational Intelligence, volume 3, pages 1848-1853. IEEE, 1994.
R. J. Douglas, M. A. Mahowald, and C. Mead. Neuromorphic analogue VLSI. Annu. Rev. Neurosci., 18:255-281, 1995.
R. Etienne-Cummings, V. Van der Spiegel, and J. Muller. Hardware implementation of a visual-motion pixel using oriented spatiotemporal neural filters. IEEE Trans. on Circuits and Systems II, 46:1121-1136, 1999.
FACETS. Fast analog computing with emergent transient states. URL http://facets.kip.uni-heidelberg.de/.
E. Farquhar and P. Hasler. A bio-physically inspired silicon neuron. IEEE Transactions on Circuits and Systems I, 52:477-488, 2005.
D. B. Fasnacht, A. M. Whatley, and G. Indiveri. A serial communication infrastructure for multi-chip address event systems. 2008. Accepted for ISCAS 2008.
T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, (26):861-874, 2006.
D. Frey. Future implications of the log domain paradigm. In IEE Proceedings Circuits Devices Systems, volume 147, pages 65-72, February 2000.
S. Fusi. Hebbian spike-driven synaptic plasticity for learning patterns of mean firing rates. Biological Cybernetics, 87:459-470, 2002.
S. Fusi and L. F. Abbott. Limits on the memory storage capacity of bounded synapses. Nature Neuroscience, 10:485-493, 2007.
S. Fusi and W. Senn. Eluding oblivion with smart stochastic selection of synaptic updates. Chaos, 16, 2006.
S. Fusi, M. Annunziato, D. Badoni, A. Salamon, and D. J. Amit. Spike-driven synaptic plasticity: theory, simulation, VLSI implementation. Neural Computation, 12:2227-58, 2000.
R. Genov and G. Cauwenberghs. Stochastic mixed-signal VLSI architecture for high-dimensional kernel machines. In J. E. Moody et al., editors, Advances in Neural Information Processing Systems. Morgan Kaufmann, 2002.
P. Georgiou and C. Toumazou. A silicon pancreatic beta cell for diabetes. IEEE Transactions on Biomedical Circuits and Systems, 1, 2007.
W. Gerstner. What is different with spiking neurons? In H. Mastebroek and J. E. Vos, editors, Plausible Neural Networks for Biological Modelling. Kluwer Academic, 2001.
B. Gilbert. Current-mode circuits from a translinear viewpoint: A tutorial. In C. Tomazou, F. J. Lidgey, and D. G. Haigh, editors, Analogue IC design: the current-mode approach, chapter 2, pages 11-91. Peregrinus, Stevenage, Herts., UK, 1990.
M. Giulioni, M. Pannunzi, D. Badoni, V. Dante, and P. Del Giudice. A configurable analog VLSI neural network with spiking neurons and self-regulating plastic synapses. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems (NIPS), 2007.
M. Graupner and N. Brunel.
STDP in a bistable synapse model based on CaMKII and associated signaling pathways. PLoS Comput. Biol., 3:e221, Nov 2007.
S. Grossberg, E. Mingolla, and D. Todovoric. Threshold voltage mismatch and intra-die leakage current in digital CMOS circuits. IEEE Journal of Solid-State Circuits, 39:157-168, 2004.
GSRC Berkeley. Gigascale Systems Research Center. URL http://www.gigascale.org/.
R. Gütig and H. Sompolinsky. The tempotron: a neuron that learns spike timing-based decisions. Nature Neuroscience, 9:420-428, 2006. doi: 10.1038/nn1643.
P. Häfliger. Adaptive WTA with an analog VLSI neuromorphic learning chip. IEEE Transactions on Neural Networks, 18:551-572, Mar 2007.
P. Hafliger and M. Mahowald. Spike based normalizing Hebbian learning in an analog VLSI artificial neuron. Analog Integrated Circuits and Signal Processing, 18:133-139, 1999.
P. Häfliger and C. Rasche. Floating gate analog memory for parameter and variable storage in a learning silicon neuron. In Proc. IEEE International Symposium on Circuits and Systems, Orlando, 1999.
P. Häfliger, M. Mahowald, and L. Watts. A spike based learning neuron in analog VLSI. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 692-698. MIT Press, 1997.
D. Hammerstrom. The Handbook of Brain Theory and Neural Networks, pages 349-353. 2002.
J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA, 1991.
A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, 117:500-44, 1952.
P. Ienne, T. Cornu, and G. Kuhn. Special-purpose digital hardware for neural networks: An architectural survey. Journal of VLSI Signal Processing Systems, pages 5-25, 1996.
G. Indiveri. Neuromorphic bistable VLSI synapses with spike-timing-dependent plasticity. In Advances in Neural Information Processing Systems, volume 15, pages 1091-1098, Cambridge, MA, December 2002. MIT Press.
G. Indiveri. A low-power adaptive integrate-and-fire neuron circuit. In Proc. IEEE International Symposium on Circuits and Systems, pages IV-820-IV-823. IEEE, May 2003.
G. Indiveri, E. Chicca, and R. Douglas. A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Transactions on Neural Networks, 17(1):211-221, 2006a.
G. Indiveri, E. Chicca, and R. Douglas. A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Transactions on Neural Networks, 17(1):211-221, Jan 2006b.
G. Indiveri, S.-C. Liu, T. Delbruck, and R. Douglas. The New Encyclopedia of Neuroscience, chapter Neuromorphic systems. Elsevier, 2008. In press.
Intel. Teraflops research chip. http://techresearch.intel.com/articles/TeraScale/1449.htm, 2007.
ITRS 2007. URL http://www.itrs.net/Links/2007ITRS/ExecSum2007.pdf.
E. M. Izhikevich. Simulation of large-scale brain models. http://vesicle.nsi.edu/users/izhikevich/human_brain_simulation/Blue_Brain.htm, 2005.
J. K. Jenkins and K. M. Dallenbach. Obliviscence during sleep and waking. American Journal of Psychology, 35:605-612, 1924.
B. Keeth and J. Baker. DRAM Circuit Design, A Tutorial. IEEE, New York, 7th edition, 2000.
R. Kempter, W. Gerstner, and J. L. van Hemmen. Hebbian learning and spiking neurons. Physical Review E, 59(4):4498-4514, 1999.
R. Kempter, W. Gerstner, and J. L. van Hemmen.
Intrinsic stabilization of output firing rates by spike-based Hebbian learning. Neural Computation, 59(4):4498-4514, 2001.
P. Kinget. Device mismatch and tradeoffs in the design of analog circuits. IEEE Journal of Solid-State Circuits, 40(6), 2005.
T. J. Koickal, A. Hamilton, S. L. Tan, J. A. Covington, J. W. Gardner, and T. C. Pearce. Analog VLSI circuit implementation of an adaptive neuromorphic olfaction chip. IEEE Trans. on Circuits and Systems, 54(1):60-, 2007.
J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead. Winner-take-all networks of O(n) complexity. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems, volume 2, pages 703-711, San Mateo, CA, 1989. Morgan Kaufmann.
J. P. Lazzaro. Silicon Implementation of Pulse Coded Neural Networks, chapter Low-power silicon axons, neurons, and synapses, pages 153-164. Kluwer Academic Publishers, 1994.
R. A. Legenstein, C. Näger, and W. Maass. What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17(11):2337-2382, 2005.
J. J. Letzkus, B. Kampa, and G. J. Stuart. Learning rules for spike timing-dependent plasticity depend on dendritic synapse location. The Journal of Neuroscience, 26:10420-10429, 2006.
W. B. Levy and O. Steward. Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus. Neuroscience, 8:791-797, Apr 1983.
P. Lichtsteiner and T. Delbrück. A 64x64 AER logarithmic temporal derivative silicon retina. In Research in Microelectronics and Electronics, 2005 PhD, volume 2, pages 202-205, July 2005.
P. Lichtsteiner, C. Posch, and T. Delbrück. A 128x128 120dB 30mW asynchronous vision sensor that responds to relative intensity change. In 2006 IEEE ISSCC Digest of Technical Papers, pages 508-509. IEEE, 2006.
Ligature. OCR that reads the way you do. URL http://www.ligatureltd.com/.
C. S. Lindsay. Neural networks in hardware: Architectures, products and applications, 2002. URL http://www.particle.kth.se/~lindsey/HardwareNNWCourse/.
C. S. Lindsey and T. Lindblad. Survey of neural network hardware. Proc. SPIE, 2492:1194-1205, 1995.
J. Lisman and N. Spruston. Postsynaptic depolarization requirements for LTP and LTD: a critique of spike timing-dependent plasticity. Nat. Neurosci., 8:839-841, Jul 2005.
S.-C. Liu and R. Douglas. Temporal coding in a silicon network of integrate-and-fire neurons. IEEE Transactions on Neural Networks, 15(5):1305-1314, Sep 2004.
R. F. Lyon and C. Mead. An analog electronic cochlea. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(7):1119-1134, 1988.
W. Maass and C. M. Bishop. Pulsed Neural Networks. MIT Press, 1998.
J. Mahattanakul and C. Toumazou. Modular log-domain filters based upon linear gm-C filter synthesis. IEEE Transactions on Circuits and Systems I, pages 1421-1430, 1999.
M. Mahowald. An Analog VLSI System for Stereoscopic Vision. Kluwer, Boston, MA, 1994.
M. A. Mahowald. VLSI analogs of neuronal visual processing: a synthesis of form and function. PhD thesis, Department of Computation and Neural Systems, California Institute of Technology, Pasadena, CA, 1992.
H. Markram. The blue brain project. Nat. Rev. Neurosci., 7:153-160, Feb 2006.
H. Markram, J. Lübke, M. Frotscher, and B. Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275:213-215, 1997.
A. J. Martin and M. Nystrom. Asynchronous techniques for system-on-chip design. Proceedings of the IEEE, 94:1089-1120, 2006.
A. J. Martin, M. Nystrom, and C. G. Wong.
A. J. Martin, M. Nystrom, and C. G. Wong. Three generations of asynchronous microprocessors. IEEE Design & Test of Computers, 20(6):9–17, 2003.
S. J. Martin, P. D. Grimwood, and R. G. Morris. Synaptic plasticity and memory: an evaluation of the hypothesis. Annu. Rev. Neurosci., 23:649–711, 2000.
F. Martorell, S. D. Cotofana, and A. Rubio. An analysis of internal parameter variations effects on nanoscaled gates. IEEE Transactions on Nanotechnology, pages 24–33, 2008.
C. Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):1629–1636, October 1990.
C. Mead and L. Conway. Introduction to VLSI Systems. Addison-Wesley, Reading, Massachusetts, 1980.
C. A. Mead. Analog VLSI and Neural Systems. Addison-Wesley, Reading, MA, 1989.
P. A. Merolla, J. V. Arthur, B. E. Shi, and K. A. Boahen. Expandable networks for neuromorphic chips. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 54(2):301–311, Feb. 2007.
M. L. Minsky and S. A. Papert. Perceptrons. MIT Press, Cambridge, 1969.
S. Mitra, G. Indiveri, and S. Fusi. Learning to classify complex patterns using a VLSI network of spiking neurons. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, Cambridge, MA, 2008. MIT Press. (In press).
S. Mitra and G. Indiveri. A low-power dual-threshold comparator for neuromorphic systems. In 2005 PhD Research in Microelectronics and Electronics, volume 2, pages 402–405, Lausanne, Jul 2005. IEEE.
D. Muir. Spike toolbox. http://www.ini.uzh.ch/~dylan/spike_toolbox/, 2005.
S. Musallam, B. D. Corneil, B. Greger, H. Scherberger, and R. A. Andersen. Cognitive control signals for neural prosthetics. Science, 305:258–262, Jul 2004.
J. P. Nadal, G. Toulouse, J. P. Changeux, and S. Dehaene. Networks of formal neurons and memory palimpsests. Europhys. Lett., pages 535–542, 1986.
NanoCMOSGrid. Meeting the design challenges of nano-CMOS electronics. URL http://www.nanocmos.ac.uk/.
T. Natschläger and W. Maass. Spiking neurons and the induction of finite state machines. Theoretical Computer Science: Special Issue on Natural Computing, 287:251–265, 2002.
S. B. Nelson, P. J. Sjostrom, and G. G. Turrigiano. Rate and timing in cortical synaptic plasticity. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 357:1851–1857, Dec 2002.
D. H. O'Connor, G. M. Wittenberg, and S. S. Wang. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc. Natl. Acad. Sci. U.S.A., 102:9679–9684, Jul 2005.
M. J. Pearson, A. G. Pipe, K. Mitchinson, B. Gurney, C. Melhuish, I. Gilhespy, and M. Nibouche. Implementing spiking neural networks for real-time signal-processing and control applications: A model-validated FPGA approach. IEEE Transactions on Neural Networks, 18(5):1472–1487, 2007.
C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield. All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad. Sci., 95:4732, 1998.
P. Hasler, B. A. Minch, J. Dugger, and C. Diorio. Adaptive circuits and synapses using pFET floating-gate devices. Kluwer Academic, 1999.
R. Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 2006.
J. M. Rabaey. Design at the end of the silicon roadmap. In IEEE Design Automation Conference, Keynote Address III, 2005.
W. Rall. Distinguishing theoretical synaptic potentials computed for different soma-dendritic distributions of synaptic input. Journal of Neurophysiology, 30(5):1138–1168, September 1967.
E. Ros, E. M. Ortigosa, R. Agis, R. Carrillo, and M. Arnold. Real-time computing platform for spiking neurons (RT-spike). IEEE Transactions on Neural Networks, 17(4):1050–1062, 2006.
J. Rubin, D. Lee, and H. Sompolinsky. Equilibrium properties of temporally asymmetric Hebbian plasticity. Physical Review Letters, pages 364–367, 2001.
D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, editors, Parallel Distributed Processing, pages 318–362. MIT Press, 1986.
T. Sakurai. Optimization of CMOS arbiter and synchronizer circuits with submicrometer MOSFETs. IEEE Journal of Solid-State Circuits, 23:901–906, 1988.
A. Sandberg, A. Lansner, K. M. Petersson, and O. Ekeberg. A Bayesian attractor network with incremental learning. Network, 13:179–194, May 2002.
R. Sarpeshkar. Brain power: borrowing from biology makes for low power computing (bionic ear). IEEE Spectrum, 43(5):24–29, May 2006.
R. Sarpeshkar. Analog versus digital: Extrapolating from electronics to neurobiology. Neural Computation, 10(7):1601–1638, October 1998.
J. Schemmel, A. Grubl, K. Meier, and E. Mueller. Implementing synaptic plasticity in a VLSI spiking neural network model. In Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2006), pages 1–6, 2006.
J. Schemmel, D. Bruderle, K. Meier, and B. Ostendorf. Modeling synaptic plasticity within networks of highly accelerated I&F neurons. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2007), pages 3367–3370, 2007.
E. Seevinck. Companding current-mode integrator: A new circuit principle for continuous-time monolithic filters. Electronics Letters, 26(24):2046–2047, November 1990.
W. Senn and S. Fusi. Learning only when necessary: Better memories of correlated patterns in networks with bounded synapses. Neural Computation, 17(10):2106–2138, 2005.
F. Serra-Graells and J. Luis Huertas. Low-voltage CMOS subthreshold log-domain filtering. IEEE Transactions on Circuits and Systems I, pages 2090–2100, 2005.
R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, H. Kolle Riis, T. Delbrück, S.-C. Liu, S. Zahnd, A. M. Whatley, R. J. Douglas, P. Häfliger, G. Jimenez-Moreno, A. Civit, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco. AER building blocks for multi-layer multi-chip neuromorphic vision systems. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems, volume 18. MIT Press, Dec 2005.
R. Serrano-Gotarredona, T. Serrano-Gotarredona, A. Acosta-Jimenez, and B. Linares-Barranco. A neuromorphic cortical-layer microchip for spike-based event processing vision systems. IEEE Transactions on Circuits and Systems I, 53(12):2548–2566, Dec. 2006.
M. Shams, J. C. Ebergen, and M. I. Elmasry. Modeling and comparing CMOS implementations of the C-element. IEEE Transactions on VLSI Systems, 6(4):563–567, 1998.
N. Y. Shen, Z. Liu, C. Lee, B. A. Minch, and E. C. Kan. Charge-based chemical sensors: A neuromorphic approach with chemoreceptive neuron MOS (CνMOS) transistors. IEEE Transactions on Electron Devices, 50(10):2171–2178, 2003.
A. P. Shon, D. Hsu, and C. Diorio. Learning spike-based correlations and conditional probabilities in silicon. In Advances in Neural Information Processing Systems, pages 1123–1130, 2002.
H. Z. Shouval, M. F. Bear, and L. N. Cooper. A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. Proc. Natl. Acad. Sci. U.S.A., 99:10831–10836, Aug 2002.
M. Sivilotti. Wiring Considerations in Analog VLSI Systems with Applications to Field Programmable Networks. PhD thesis, California Institute of Technology, Pasadena, CA, 1991.
P. J. Sjostrom, G. G. Turrigiano, and S. B. Nelson. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron, 32:1149–1164, Dec 2001.
S. Still, K. Hepp, and R. J. Douglas. Neuromorphic walking gait control. IEEE Transactions on Neural Networks, 17:496–508, 2006.
I. E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720–738, 1989.
Synaptics. URL http://www.synaptics.com/.
T. Teixeira, E. Culurciello, J. Park, D. Lymberopoulos, A. Barton-Sweeney, and A. Savvides. Address-event imagers for sensor networks: Evaluation and modeling. In Information Processing in Sensor Networks, pages 19–21, 2006.
Y. Tsividis. Externally linear, time-invariant systems and their application to companding signal processors. IEEE Trans. Circuits and Systems, pages 65–85, 1997.
M. Valle. Analog VLSI implementation of artificial neural networks with supervised on-chip learning. Analog Integrated Circuits and Signal Processing, 33:263–287, 2002.
A. van Schaik. Building blocks for electronic spiking neural networks. Neural Networks, 14(6–7):617–628, Jul–Sep 2001.
E. A. Vittoz. Analog VLSI for collective computation. In Proc. IEEE Int. Conf. on Electronic Circuits and Systems, volume 2, pages 3–6, 1998.
R. J. Vogelstein, F. Tenore, R. Philipp, M. S. Adlerstein, D. H. Goldberg, and G. Cauwenberghs. Spike timing-dependent plasticity in the address domain. In Advances in Neural Information Processing Systems, Cambridge, MA, 2003. MIT Press.
R. J. Vogelstein, F. Tenore, R. Etienne-Cummings, M. A. Lewis, N. Thakor, and A. Cohen. Control of locomotion after injury or amputation. Biological Cybernetics, 95:555–566, 2006.
R. J. Vogelstein, U. Malik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings. A multichip neuromorphic system for spike-based visual information processing. Neural Computation, 19:2281–2300, 2007.
Y.-X. Wang and S.-C. Liu. Programmable synaptic weights for an aVLSI network of spiking neurons. In Proc. IEEE International Symposium on Circuits and Systems, pages 4531–4534, 2006.
B. Wen and K. Boahen. Active bidirectional coupling in a cochlear chip. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems, volume 18, pages 1497–1504. MIT Press, Cambridge, MA, 2006.
X. Xie and H. S. Seung. Spike-based learning rules and stabilization of persistent neural activity. In Advances in Neural Information Processing Systems, volume 12. MIT Press, 2000.
Z. Yang, A. Murray, F. Worgotter, K. Cameron, and V. Boonsobhak. A neuromorphic depth-from-motion vision model with STDP adaptation. IEEE Transactions on Neural Networks, 17:482–495, 2006.
K. Y. Yun, P. A. Beerel, and J. Arceo. High-performance asynchronous pipeline circuits. In Advanced Research in Asynchronous Circuits and Systems, pages 17–28, 1996.
K. A. Zaghloul and K. Boahen. An ON-OFF log-domain circuit that recreates adaptive filtering in the retina. IEEE Transactions on Circuits and Systems I, 52(1):99–107, 2005.
K. A. Zaghloul and K. Boahen. A silicon retina that reproduces signals in the optic nerve. Journal of Neural Engineering, 3:257–267, December 2006. doi: 10.1088/1741-2560/3/4/002.