Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Steve Furber The University of Manchester [email protected] BCS 27 Nov 2013 1 • How can massively parallel compu;ng resources accelerate our understanding of brain func;on? • How can our growing understanding of brain func;on point the way to more efficient parallel, fault-‐tolerant computa;on? BCS 27 Nov 2013 2 • Brains demonstrate – massive parallelism (1011 neurons) – massive connec;vity (1015 synapses) – excellent power-‐efficiency • much beJer than today’s microchips – low-‐performance components (~ 100 Hz) – low-‐speed communica;on (~ metres/sec) – adap;vity – tolerant of component failure – autonomous learning BCS 27 Nov 2013 3 • Neurons • mul;ple inputs, single output (c.f. logic gate) • useful across mul;ple scales (102 to 1011) • Brain structure • regularity • e.g. 6-‐layer cor;cal ‘microarchitecture’ BCS 27 Nov 2013 4 • A million mobile phone processors in one computer • Able to model about 1% of the human brain… • …or 10 mice! BCS 27 Nov 2013 5 • Virtualised topology – physical and logical connec;vity are decoupled • Bounded asynchrony – ;me models itself • Energy frugality – processors are free – the real cost of computa;on is energy BCS 27 Nov 2013 6 BCS 27 Nov 2013 7 BCS 27 Nov 2013 8 Mobile DDR SDRAM interface Mul;-‐chip packaging by UNISEM Europe BCS 27 Nov 2013 9 BCS 27 Nov 2013 10 BCS 27 Nov 2013 11 104 machine:10,368 cores, 1 rack, 900W 103 machine: 864 cores, 1 PCB, 75W 105 machine: 103,680 cores, 1 cabinet, 9kW (NB 12 PCBs for opera;on without aircon) 106 machine: 1M cores, 10 cabinets, 90kW BCS 27 Nov 2013 12 • Emulate the very high connec;vity of real neurons • A spike generated by a neuron firing must be conveyed efficiently to >1,000 inputs • On-‐chip and inter-‐chip spike communica;on should use the same delivery mechanism BCS 27 Nov 2013 13 • Four packet types MC (mul;cast): source routed; carry events (spikes) P2P (point-‐to-‐point): used for bootstrap, debug, monitoring, etc NN (nearest neighbour): build address map, flood-‐fill code FR (fixed route): carry 64-‐bit debug data to host – – – – • Timestamp mechanism removes errant packets – which could otherwise circulate forever Event ID (32 bits) Header (8 bits) T ER TS 0 - P Header (8 bits) T SQ TS 1 - P BCS 27 Nov 2013 Payload (32 bits) Address (16+16 bits) Dest Srce 14 • All MC spike event packets are sent to a router • Ternary CAM keeps router size manageable at 1024 entries (but careful network mapping also essen;al) • CAM ‘hit’ yields a set of des;na;ons for this spike event – automa;c mul;cas;ng • CAM ‘miss’ routes event to a ‘default’ output link Event ID 0 0 1 0 X 1 0 1 BCS 27 Nov 2013 X 000000010000010000 001001 On-chip Inter-chip 15 Topology 14 01 72 1 07 2 5 3 09 2 03 12 Core 10 06 Synapse 10 11 15 9 02 7 Fragment of MC table 8 6 01 4 6 Node 94 Problem graph (circuit) BCS 27 Nov 2013 07 23 0 72 3 23 0 72 94 0 1 1 23 0 72 1 94 0 0 0 2 2 3 23 3 72 2 94 3 3 2 3 23 3 72 2 94 2 1 94 2 23 0 72 2 94 - 1 23 23 - 72 1 94 2 0 2 16 ...abstract problem topology... ...problem topology loaded into firmware rou<ng tables... Problem: represented as a network of nodes with a certain behaviour... ...problem is split into two parts... ...compile, link... ...behaviour of each node embodied as an interrupt handler in code... BCS 27 Nov 2013 ...binary files loaded into core instruc<on memory... Our job is to make the model behaviour reflect reality The code says "send message" but has no control where the output message goes 17 • 1,024 links – in each direc;on • ~10 billion packets/s • 10Hz mean firing rate • 250 Gbps bisec;on bandwidth BCS 27 Nov 2013 18 • A set of dynamical processes P = {Pi} – Si(t) is the state of Pi at ;me t • A set of event channels E = {Ej} – Ej carries a ;me series of asynchronous impulses – generated by a process Ej = ej(Pj) • Hybrid model (biology): processes evolve Si = si(t, E* ⊆ E) • Discrete model (SpiNNaker) – ;me can be abstracted into a series of (e.g.) 1ms events Et – We can model each event atomically: Ej => Si := pi(Si, j) • In prac;ce, on SpiNNaker event handling takes a finite ;me and may overlap subsequent events. BCS 27 Nov 2013 19 SpiNNaker chip Monitor Processor Application Processors Monitor Processor SC&MP SC&MP Application Processors Application Code SpiNNaker + Ethernet Application Code Spike Server Visualiser Program Loader Host workstation SDP−based API Spin1 API Linux OS SARK SARK Ethernet Ethernet Inter−chip Links BCS 27 Nov 2013 Spin1 API SARK SARK Inter−chip Links 20 off-‐die SDRAM system RAM ITCM DTCM applica;on dma ;mer vic core comms hardware BCS 27 Nov 2013 comms NoC 21 applica;on control core control applica;on sark memory management peripheral management event management hardware BCS 27 Nov 2013 SDP messaging 22 applica;on event-‐driven programming model API run-‐;me run-‐;me environment environment sark hardware BCS 27 Nov 2013 sark func;onality s;ll available 23 priority level = -1 only one callback cannot be pre-empted priority level = 0 can only be pre-empted by priority -1 callback priority level > 0 can be pre-empted by priority <= 0 callbacks scheduled in priority order BCS 27 Nov 2013 24 BCS 27 Nov 2013 25 BCS 27 Nov 2013 26 • LIF • Izhikevich BCS 27 Nov 2013 27 • Vogels-‐ AbboJ benchmark – 500 LIF neurons BCS 27 Nov 2013 28 BCS 27 Nov 2013 29 • SpiNNaker: • 5M conn/s/ARM • Spaun: • 2.5M neurons • ~100Hz firing rates • ~500 inputs/neuron • 125G conn/s • Real-‐;me Spaun: • 25,000 ARMs • 30x 48-‐node PCB • by end 2013? a) b) Working Memory Visual Input Informa!on Encoding Transform Calcula!on Reward Evalua!on Ac!on Selec!on Informa!on Decoding Motor Processing Motor Output = hierarchy Chris Eliasmith et al, Science vol. 338, 30 Nov 2012 BCS 27 Nov 2013 30 BCS 27 Nov 2013 31