Distributed, Event Driven Simulation of Spiking Neural Networks

Dipl.-Ing. Cyprian Graßmann, Prof. Dr. Joachim K. Anlauf
University of Bonn, Department of Computer Science II
Römerstraße 164, D-53117 Bonn, Germany
Phone: +49-228-73-4279, Fax: +49-228-73-4212
[email protected], [email protected]

Abstract

We present the architecture of a simulator that is able to simulate large networks of spiking neurons using a distributed event driven simulation. Contrary to a time driven simulation, which is usually used to simulate spiking neural networks, our simulation needs fewer computational resources because of the low average activity of typical networks. The simulator is divided into a set of communicating sub-simulators running concurrently on several workstations, where each sub-simulator handles a part of the network. The paper addresses the problems of synchronisation between the sub-simulators and shows how information contained in the network topology and in the properties of the neuron models is used to solve them. Preliminary results are presented for two simple model networks, illustrating the speed up gained by a distribution of the simulation.

1. Introduction

Simulation is the connecting link between neurophysiological measurements and theoretical studies. It helps us to understand and to reproduce the behaviour of biological systems of neurons and to verify the functional behaviour of model networks. Recent theoretical studies about the computational complexity of spiking neural networks by Wolfgang Maass [Maass95] encourage the development of a dedicated simulation method for this kind of network. The properties of artificial spiking neural networks, like the low average activity [Rolls89], are inspired by biological neural networks. These networks consist of many different neuron types which are connected by a dense but inhomogeneous and not fully connected network. Each connection within this network has a specific delay.
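To see why the low average activity matters for the choice of simulation method, a back-of-envelope comparison helps. The numbers below are illustrative assumptions of our own, not measurements from this paper:

```python
# Back-of-envelope comparison (illustrative numbers only):
# updates needed to simulate T seconds of a network with N neurons.
N = 100_000        # neurons
T = 1.0            # simulated time in seconds
dt = 1e-4          # step size of a time driven simulation (0.1 ms)
rate = 5.0         # average firing rate in Hz (low average activity)
fanout = 100       # successors per neuron

# Time driven: every neuron is updated in every time step.
time_driven_updates = N * (T / dt)
# Event driven: work is done only when a spike is delivered.
event_driven_updates = N * rate * T * fanout

ratio = time_driven_updates / event_driven_updates  # about 20 here
```

With these assumed figures the time driven simulation performs roughly twenty times as many updates, and the gap widens as the activity drops.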
Despite the irregular structure of the network and the large number of different neuron types, information between neurons is exchanged using spikes. All spikes arriving at a given synapse look the same; hence the information carried by a spike is found in its time of occurrence only. Substantial evidence also indicates that the time structure of neural spike trains is relevant in neural processing [Aertsen93].

These networks are usually simulated using a time driven method. A well known simulator of this type is Genesis [BowBee94]. Simulators using a time driven simulation achieve their results by a computation of the whole network from one time step to the next. An advantage of this method is a very detailed analysis of the neuron behaviour. A drawback is the tremendous computational effort and therefore a limitation to the simulation of single neurons or small networks. Large networks can only be simulated by using a coarser time scale and reducing the complexity of the neuron model. The computational effort for a time driven simulation of spiking neural networks is proportional to the number of neurons in the network and independent of its activity.

Taking the network properties into account, spikes are sufficiently described using a time stamp and the address of the sender or the receiver. Together with the sparse coding this leads to an event driven simulation as an efficient framework for the simulation of spiking neural networks. The spikes are directly mapped to events, which are controlled and scheduled by a simulation engine. In an event driven simulation the evaluation of events is done by logical processes, which can implement any spike processing model of a neuron. Delays are usually modelled by the logical process itself. Because the delays can be used to enhance the simulation performance, we present a solution which provides an explicit handling of the delays by the simulation engine.
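A minimal sketch of such an engine (our own illustrative code, not the simulator described in this paper; the `Neuron` class is a toy integrate and fire kernel standing in for any logical process kernel) shows how spikes map to queued events and how the engine, rather than the kernel, applies the axonal delays:

```python
import heapq

class Neuron:
    """Toy integrate and fire kernel: accumulates input and fires when a
    threshold is reached, then resets."""
    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.potential = 0.0

    def receive(self, weight):
        """Process one incoming spike; return True if the neuron fires."""
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0.0
            return True
        return False

def simulate(neurons, connections, initial_spikes, t_end):
    """Event driven engine. Events are (time, target, weight) tuples in a
    priority queue; the engine adds each connection's axonal delay, so the
    neuron models never handle delays themselves.
    connections[i] is a list of (target, weight, delay) tuples."""
    events = list(initial_spikes)
    heapq.heapify(events)
    fired = []                       # log of (time, neuron index) firings
    while events:
        t, i, w = heapq.heappop(events)
        if t > t_end:
            break
        # Computation happens only for neurons that actually receive a
        # spike -- the cost scales with activity, not with network size.
        if neurons[i].receive(w):
            fired.append((t, i))
            for target, weight, delay in connections[i]:
                heapq.heappush(events, (t + delay, target, weight))
    return fired
```

For example, two neurons in a chain joined by a connection with delay 0.5 fire at times 0.0 and 0.5 respectively; no other work is done in between.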
Moreover, this solution relieves the logical process from the explicit handling of the delays. Contrary to the time driven simulation, the event driven simulation profits from the low activity of spiking neural networks. The logical processes are only called if there is an incoming event at one of the inputs of the process. Hence computation only takes place if one of the processes is active. This results in a much lower computational effort. Consequently, larger networks can be simulated with fewer resources using this method. Distributing an event driven simulation onto a cluster of workstations further reduces the time needed for the simulation of very large networks of spiking neurons.

The functionality of a spiking neural system can be simulated at different levels of abstraction and hence at different levels of detail. For the simulation of artificial and biological networks with a tremendous number of neurons and connections, a simulation method should not only support the detailed examination of single neurons, but also the efficient simulation of large neural systems at a functional level. Even if simulation takes place at a functional level, the behaviour of the single neuron should be as realistic as possible. Therefore a set of logical processes implementing known models of neurons and a well defined interface for user defined processes must be provided by the simulation engine. Reusability and extensibility of the code written for the simulator are provided by an object oriented design and implementation method.

2. Design considerations

In terms of a distributed event driven simulation, logical processes communicate with each other via messages. Hence they all have to use their own communication interface. With an object oriented design in mind, a decomposition at neuron level would be a generic way of distribution: each neuron is implemented as a separate logical process, which communicates with other logical processes.
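Such a neuron-level decomposition would give every single neuron its own communication interface and message queue, roughly along these lines (a hypothetical sketch with invented names, shown only to make the per-neuron bookkeeping visible):

```python
from collections import deque

class NeuronProcess:
    """Neuron-level decomposition: every neuron is a full logical process
    with its own communication interface and inbox. The per-neuron state
    kept here is exactly the overhead this decomposition pays."""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()          # one message queue per neuron
        self.peers = []               # outgoing communication links

    def connect(self, other):
        self.peers.append(other)

    def send_spike(self, t):
        # Each spike becomes one message per successor.
        for peer in self.peers:
            peer.inbox.append((t, self.name))
```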
This results in a tremendous overhead for each neuron and therefore in a very poor performance of the simulator. Using a centrally controlled simulation on a conventional computer instead, the overhead for each logical process can be minimised, but the simulation is limited by the resources of a single processor. Moreover, any parallelism of the network cannot be used to enhance simulation speed.

To avoid the overhead of a distribution at neuron level and the limitations of a centrally controlled simulation, we combine both methods by dividing the network into sub-networks. Each sub-network consists of a limited number of logical processes and is computed on a sub-simulator. The sub-simulators communicate via messages. The result is a hierarchical structure, where microscopic logical processes (neurons) are grouped into macroscopic logical processes (sub-simulators). Hence we avoid the overhead at the microscopic level by using a centrally controlled simulation on each sub-simulator, and we overcome the limitations of a purely centrally controlled simulation at the macroscopic level through the parallel execution of multiple sub-simulators. The potential advantages of this hierarchical approach with communicating sub-simulators are discussed in [Brissinck97].

In terms of a distributed simulation with communicating logical processes, the sub-simulators are the logical processes to be synchronised, because they hide the centrally controlled processing of the underlying logical processes which implement the behaviour of the neurons. There are two basic concepts for a distributed, event driven simulation of interacting logical processes: the optimistic and the conservative synchronisation method. Refer to [Ferscha95] for a more detailed overview of parallel discrete event simulation methods. An implementation of the optimistic method simulates events even if the temporal order is not guaranteed, e.g.
violated by undefined inputs on which no sufficient "look ahead"¹ is provided. Depending on the machine used for the calculation of the process and on the complexity of each process, the temporal order of the events could be corrupted. Even if the optimistic method avoids a deadlock of the simulation, because it runs as long as events are available, it is sometimes necessary to roll back the simulation to restore consistency violated by an "out of order" event. An implementation of the conservative method ensures the temporal order of the events in the system and provides methods to prevent deadlocks. Hence if a faster process depends on a slower one, it has to wait until the slower process completes execution or provides a look ahead to the faster process. With a look ahead the preceding process grants that it will be inactive up to a certain time. To avoid inactivity of processes in the conservative simulation, every possible look ahead of the system should be exploited. With the simulation of large networks in mind, we have chosen the conservative method, because the optimistic method needs more memory resources and controlling effort than the conservative one.

The way the network is divided into sub-networks (partitions) obviously has a tremendous impact on the whole simulation run. Hence we provide mechanisms to support an efficient partitioning of the network. These take into account the speed of the contributing workstations, the basic bandwidth of the network connecting the workstations, the computational effort of the different event processing kernels and the topology of the spiking neural network. Nevertheless there is not always an optimal solution to the balancing of partitions, and therefore stalling of sub-simulators can still occur. Special care has to be taken to avoid these situations. A sub-simulator stalls under the following conditions: it has processed all events up to the safe simulation window (Figure 1), which is defined by the minimal look ahead provided by the preceding sub-simulators (predecessors); additionally, the predecessor which has provided the smallest look ahead is either busy but inactive at its output to the succeeding sub-simulator (successor), or it stalls as well.

¹ A look ahead is provided by the preceding processes. It guarantees that the corresponding process will be inactive up to a certain time (e.g. because of a refractory period).

[Figure 1: Safe simulation window]

In case the predecessor is busy, it advances its own local virtual simulation time, but the successor cannot advance its local time as long as the output of the predecessor stays inactive. A common solution to this problem is to send null messages from the predecessor to the successor each time the predecessor advances its local virtual time. This results in a tremendous amount of messages passed between the two sub-simulators. Alternatively, the successor sends inquiry messages to the predecessor when needed. In most cases, sending inquiry messages on demand produces fewer messages than continuously sending null messages. In case the predecessor also stalls, there is no difference. Therefore we use the inquiry method, adding a special handling for the stalling situation.

If the sub-simulator stalls, we first send the actual local virtual time as a first look ahead to all succeeding sub-simulators, to avoid a deadlock situation that can be caused by recursive connections. Next we use a modification of an algorithm presented in [Brissinck97] to calculate the maximal look ahead the stalling sub-simulator can provide, i.e. the earliest time at which an event may occur at the different outputs of the sub-simulator. For this purpose it uses the internal state of the sub-simulator, the actual look ahead provided at the inputs and the delays in between the logical processes (axonal delays). Finally this look ahead is sent out to the succeeding sub-simulators. An additional message is sent by the predecessor when it is able to continue its simulation. In the time span starting with the first look ahead sent by the predecessor, indicating the beginning of the stalling condition, up to the third message, which indicates the ongoing simulation of the predecessor, none of the succeeding sub-simulators sends any inquiry messages to the stalling logical process. In summary, three different modes have to be implemented for each sub-simulator:

- sequential mode – sequential processing of events
- look ahead mode – calculation of the maximal look ahead
- inquiry mode – asking for look ahead messages

3. Architecture of the simulator

The distributed simulator consists of a set of communicating sub-simulators. Each sub-simulator executes a part of the simulated network (Figure 2), depending only upon the look ahead and the messages provided by sub-simulators calculating neurons in its neighbourhood.

[Figure 2: Decomposition into sub-networks]

A network is entered and decomposed into sub-networks with the help of the user interface. The user interface can be executed on one of the machines running a sub-simulator, or on a separate machine. It provides mechanisms to distribute the sub-networks to the sub-simulators and to start the simulation. After all sub-simulators have notified the completion of their simulation, any information logged during the simulation run can be gathered through the user interface. Online interaction with the sub-simulators is also possible by sending messages to them. Using PVM (Parallel Virtual Machine, a messaging system for heterogeneous networks of workstations [Geist94]) as the messaging system, the simulator can be built up with nearly any kind of computing machine.

[Figure 3: Structure of the sub-simulator (control, event list, transmitter and receiver, and the logical process with its control, kernel and fan out connection)]
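The safe simulation window and the stalling sub-simulator's look ahead calculation described in Section 2 can be condensed into two small functions. This is our own simplified rendering; the paper uses a modification of the algorithm in [Brissinck97], which accounts for the sub-simulator's internal state in more detail:

```python
def safe_window(pred_lookaheads):
    """Events with time stamps up to the minimal look ahead granted by the
    predecessors lie in the safe simulation window and may be processed."""
    return min(pred_lookaheads.values())

def maximal_lookahead(pending_event_times, pred_lookaheads, min_output_delay):
    """Look ahead a stalling sub-simulator can grant its successors: the
    earliest time at which any pending internal event, or any event still
    admissible at an input, could reach an output -- i.e. the earliest
    possible cause plus the smallest axonal delay towards the successors."""
    earliest_cause = min(list(pending_event_times) +
                         list(pred_lookaheads.values()))
    return earliest_cause + min_output_delay
```

For instance, with predecessors granting look aheads of 3.0 and 5.0, the safe window ends at 3.0; if the earliest pending internal event lies at 4.0 and the smallest output delay is 1.5, the earliest possible cause is the input look ahead 3.0, so the sub-simulator can promise silence at its outputs until 4.5.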
Currently, the simulation engine is implemented for PCs running Windows NT and workstations running Solaris 2.5. A graphical user interface will be provided on Windows NT machines. Figure 3 shows the structure of the sub-simulator. A central control handles the simulation process and ensures, together with the event list, the temporal order of the events in the system. It also handles the communication with the other sub-simulators through a transmitter object and a receiver object. The gray shaded area is the logical process which handles the processing of events. It contains the logical process kernel, which implements the model of the spiking neuron. The logical process control handles administrative tasks of the simulation process, and the fan out connection models the distribution of spikes by the axon.

Figure 4 shows the conceptual decomposition of a neuron. The dendritic trees together with the synapses and the soma form the logical process kernel. This kernel is an implementation of the behaviour of the model neuron. For each incoming spike its task is to calculate the effect on its internal state and to indicate at the output when the next spikes will occur. The outgoing spikes are distributed by the fan out connection to all succeeding neurons. The delays between the soma output and the synapses are explicitly handled by this object.

[Figure 4: Components of the neuron model (dendritic trees, synapses, soma and axon)]

This method is called sender oriented [Hartmann93], because the sender is the initiator of the transmission. In fact the events are not directly sent to the succeeding logical processes, but put into the event list, to be handled by the control, which ensures the temporal order of the events. Therefore the fan out connection inserts as many events into the event list as the current process has successors. The time stamps of these events are set to the time of the spike plus the delay to the corresponding successor. Alternatively one could use the receiver oriented method, where just one event with the sender address has to be created. Unfortunately it is then necessary to iterate over all processes to check whether a process is sensitive to a certain sender address or not, and each process has to handle its own event list to ensure temporal order. This would introduce additional overhead to each process and result in expensive controlling. Therefore the sender oriented method is much more efficient for the case of low average activity.

Temporal order of events is necessary to guarantee correct simulation results. If a process has already calculated the changes of its state upon an event, any earlier incoming event may lead to different results. It is also necessary to ensure that an event is guaranteed to happen at the given time before it is inserted into the event list. Therefore the logical process control implements a mechanism to hold back events up to the time where it is safe to send them. Given a model with a postsynaptic potential like the one shown in Figure 5, the first incoming event (solid) is succeeded by another event (dashed). The temporal order is provided for this case, but the second event changes the firing time of the neuron, which was calculated on the basis of the first event. Hence in this situation the logical process control keeps the outgoing event up to the time it is scheduled. If the second event arrives with a time stamp earlier than the scheduled time for the outgoing event, the outgoing event is cancelled by the logical process control, and the new results from the kernel are stored or sent for further processing.

[Figure 5: Temporal order of events (postsynaptic potentials, threshold, and resulting outgoing spikes)]
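The fan out and hold-back mechanism just described might be sketched as follows (our own simplified illustration with invented class names, not the simulator's implementation):

```python
import heapq

class FanOutConnection:
    """Turns one outgoing spike into one event per successor, with the
    axonal delay applied by the engine rather than by the neuron model."""
    def __init__(self, successors):
        self.successors = successors        # list of (target, delay)

    def distribute(self, event_list, t_spike):
        for target, delay in self.successors:
            heapq.heappush(event_list, (t_spike + delay, target))

class LogicalProcessControl:
    """Holds back a provisionally predicted output spike: an input with a
    time stamp earlier than the held spike may change the kernel's
    predicted firing time, in which case the provisional spike is
    cancelled and replaced by the new prediction."""
    def __init__(self, fan_out):
        self.fan_out = fan_out
        self.pending = None                 # predicted firing time

    def on_input(self, t_in, predicted_firing_time):
        # Cancel the held spike if an earlier input invalidates it.
        if self.pending is None or t_in < self.pending:
            self.pending = predicted_firing_time

    def release(self, event_list):
        """Called once every input's look ahead exceeds the pending time
        stamp, so the spike is guaranteed to happen."""
        if self.pending is not None:
            self.fan_out.distribute(event_list, self.pending)
            self.pending = None
```

In this sketch a first input predicts a firing time, a second, earlier input reschedules it, and only `release` finally commits one delayed event per successor to the event list.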
Only if each input of the process has a sufficient look ahead, exceeding the time stamp of the outgoing event, does the logical process control directly pass the event to the fan out connection. This mechanism ensures that the events entered into the event list are guaranteed to happen at their given time. This is the situation each implementation of a logical process kernel faces. Obeying these rules, it may implement any behaviour which guarantees the temporal order in the system. Hence the logical process kernel must produce outgoing events with a time stamp equal to or greater than the time stamp of the incoming event which causes the calculation of the outgoing event. There is no limit to the complexity of a logical process kernel. Currently two different models are implemented for use with the simulator:

1. A very simple integrate and fire model, which is also used by Lloyd Watts' event driven simulator "Spike" [Watts94]
2. A more complicated model, similar to the spike response model introduced by Wulfram Gerstner [Gerstner90], [GerstHemm92]

The first model uses a linear integration over input current pulses of fixed duration which are triggered by incoming spikes. The second model is implemented using look up tables to provide nearly arbitrary kernel functions for the shape of the postsynaptic potential, like the one in Figure 5. The computational effort for the simple integrate and fire model is negligible. Because look up tables are used for the kernel functions of the spike response model, the computational effort for this model is still considerably low.

4. Preliminary Results

The objective of our implementation is to keep all sub-simulators in the sequential mode, where the amount of spikes lying in the safe simulation window is big enough to keep the sub-simulator running. If this condition is fulfilled, the handling of messages reduces the performance of the simulation only by a constant value, because the sequential simulation engine and the message handling are separate threads. It is also obvious that the performance in this situation is independent of the bandwidth of the underlying network. In most real simulation runs with irregular structures, a certain amount of stalling time of sub-simulators will occur. Because the implementation of the simulator is in its early alpha version, we present preliminary results for two very simple network topologies which illustrate the expected speed up, depending on the degree to which the capacity of each contributing workstation is exploited.

The first network consists of six layers, fully connected from layer to layer, plus a feeding layer. Each layer has the same number of neurons, and the connections are equally weighted and delayed. The second network consists of one layer where each neuron is fully connected to all other neurons, as in a Hopfield network. The connections are also equally weighted and delayed. The networks were simulated with one, two and three equally equipped workstations, varying the number of neurons per layer between five and 50 for the first net. The size of the second net was varied between 30 and 120 neurons. For the sub-simulators we used SUN Ultra Sparc workstations (model 140 with 128MB RAM), and for the user interface of the distributed simulation a SUN Sparc 20 with 32MB. In case of a distributed simulation, the layers of network one were equally distributed over the contributing workstations, where one of the workstations additionally handles the feeding layer. The second net was divided into equally sized sets of neurons to be mapped onto the contributing workstations.

In Figure 6 the measured speed up for the layered networks reaches its maximum at 1.95 for two machines and at 2.7 for three machines, which is close to the theoretical maximum. The speed up of the Hopfield network is lower, because it is a kind of worst case scenario for our simulator: the fully recursively connected network causes periodic stalling of the sub-simulators. Nevertheless a speed up of 1.7 using two workstations and 2.3 using three workstations is achieved in this situation (Figure 7).

[Figure 6: Speed up, layered topology (speed up relative to a single processor machine vs. number of neurons per layer)]

[Figure 7: Speed up, Hopfield topology (speed up relative to a single processor machine vs. number of neurons in the Hopfield layer)]

5. Conclusion

Throughout the paper we have presented a concept for a distributed event driven simulator which is well suited for, but not limited to, the simulation of spiking neural networks. Contrary to time driven simulators for spiking neural networks, this simulator exploits the low average activity of these networks. It also provides a dedicated mechanism to handle the delays between logical processes and therefore between neurons. Moreover, advantage is taken of these delays to further enhance the performance of the simulator. Because of the sub-simulator concept, good scalability is provided by using already available computational resources. A simple interface between the simulation engine and the logical process kernels eases the exchange and addition of new neuron models. The measured performance for two simple networks confirms the theoretically estimated speed up one can achieve by a distribution of the simulation. There is ongoing research to further improve the overall simulation performance. Several algorithms for the automatic partitioning of the networks are currently under evaluation.

References

[Aertsen93] Aertsen A. (ed.), "Brain Theory: Spatio-Temporal Aspects of Brain Function", Elsevier, 1993.
[Brissinck97] Brissinck W., Clarysse S., Dirkx E., "A Hierarchical Approach to Distributed Discrete Event Simulation", Proc. of IASTED International Conference on Parallel and Distributed Computing and Networks, 1997.
[BowBee94] Bower J.M., Beeman D., "The Book of GENESIS", Springer Verlag, 1994.
[Ferscha95] Ferscha A., "Parallel and Distributed Simulation of Discrete Event Systems", in Parallel and Distributed Computing Handbook, McGraw-Hill, 1995.
[Geist94] Geist A., Beguelin A., Jiang W., Manchek R., Sunderam V., "PVM: Parallel Virtual Machine", MIT Press, 1994.
[Gerstner90] Gerstner W., "Associative memory in a network of 'biological' neurons", Advances in Neural Information Processing Systems 3: 84-90, 1990.
[GerstHemm92] Gerstner W., van Hemmen J.L., "Associative memory in a network of 'spiking' neurons", Network 3: 139-164, 1992.
[Hartmann93] Hartmann G., 1. Workshop zum Förderungsschwerpunkt "Elektronisches Auge", Summary: 10-19, 1993.
[Maass95] Maass W., "On the computational complexity of networks of spiking neurons", Advances in Neural Information Processing Systems, vol. 7, MIT Press: 183-190, 1995.
[Rolls89] Rolls E.T., "The representation and storage of information in neuronal networks in the primate cerebral cortex and hippocampus", in The Computing Neuron, Addison-Wesley: 125-159, 1989.
[Watts94] Watts L., "Event-Driven Simulation of Networks of Spiking Neurons", Advances in Neural Information Processing Systems, vol. 6: 927-934, 1994.