Download document 8908518

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Steve Furber
The University of Manchester
[email protected]
BCS 27 Nov 2013 1 •  How can massively parallel compu;ng resources accelerate our understanding of brain func;on? •  How can our growing understanding of brain func;on point the way to more efficient parallel, fault-­‐tolerant computa;on? BCS 27 Nov 2013 2 •  Brains demonstrate –  massive parallelism (1011 neurons) –  massive connec;vity (1015 synapses) –  excellent power-­‐efficiency •  much beJer than today’s microchips –  low-­‐performance components (~ 100 Hz) –  low-­‐speed communica;on (~ metres/sec) –  adap;vity – tolerant of component failure –  autonomous learning BCS 27 Nov 2013 3 •  Neurons •  mul;ple inputs, single output (c.f. logic gate) •  useful across mul;ple scales (102 to 1011) •  Brain structure •  regularity •  e.g. 6-­‐layer cor;cal ‘microarchitecture’ BCS 27 Nov 2013 4 •  A million mobile phone processors in one computer •  Able to model about 1% of the human brain… •  …or 10 mice! BCS 27 Nov 2013 5 •  Virtualised topology –  physical and logical connec;vity are decoupled •  Bounded asynchrony –  ;me models itself •  Energy frugality –  processors are free –  the real cost of computa;on is energy BCS 27 Nov 2013 6 BCS 27 Nov 2013 7 BCS 27 Nov 2013 8 Mobile DDR SDRAM interface Mul;-­‐chip packaging by UNISEM Europe BCS 27 Nov 2013 9 BCS 27 Nov 2013 10 BCS 27 Nov 2013 11 104 machine:10,368 cores, 1 rack, 900W 103 machine: 864 cores, 1 PCB, 75W 105 machine: 103,680 cores, 1 cabinet, 9kW (NB 12 PCBs for opera;on without aircon) 106 machine: 1M cores, 10 cabinets, 90kW BCS 27 Nov 2013 12 •  Emulate the very high connec;vity of real neurons •  A spike generated by a neuron firing must be conveyed efficiently to >1,000 inputs •  On-­‐chip and inter-­‐chip spike communica;on should use the same delivery mechanism BCS 27 Nov 2013 13 •  Four packet types MC (mul;cast): source routed; carry events (spikes) P2P (point-­‐to-­‐point): used for bootstrap, debug, monitoring, etc NN (nearest neighbour): build address map, flood-­‐fill code FR (fixed route): carry 64-­‐bit debug data to host – 
– 
– 
– 
•  Timestamp mechanism removes errant packets –  which could otherwise circulate forever Event ID (32 bits)
Header (8 bits)
T ER
TS 0 - P
Header (8 bits)
T SQ
TS 1 - P
BCS 27 Nov 2013 Payload (32 bits)
Address (16+16 bits)
Dest
Srce
14 •  All MC spike event packets are sent to a router •  Ternary CAM keeps router size manageable at 1024 entries (but careful network mapping also essen;al) •  CAM ‘hit’ yields a set of des;na;ons for this spike event –  automa;c mul;cas;ng •  CAM ‘miss’ routes event to a ‘default’ output link Event ID
0 0 1 0 X 1 0 1
BCS 27 Nov 2013 X
000000010000010000
001001
On-chip
Inter-chip
15 Topology
14 01 72 1
07 2
5 3 09 2 03 12
Core 10 06 Synapse
10
11
15 9 02 7 Fragment of
MC table 8
6 01 4
6
Node 94 Problem graph (circuit)
BCS 27 Nov 2013 07 23 0 72 3 23 0
72 94 0
1
1 23 0 72 1 94 0 0
0 2 2 3 23 3
72 2
94 3
3 2 3 23 3
72 2
94 2
1
94 2 23 0 72 2 94 - 1
23 23 - 72 1 94 2 0
2
16 ...abstract problem topology...
...problem topology loaded into firmware rou<ng tables...
Problem: represented as a network of nodes with a certain behaviour...
...problem is split into two parts...
...compile, link...
...behaviour of each node embodied as an interrupt handler in code...
BCS 27 Nov 2013 ...binary files loaded into core instruc<on memory...
Our job is to make the model behaviour reflect reality
The code says "send message" but has no control where the output message goes 17 •  1,024 links –  in each direc;on •  ~10 billion packets/s •  10Hz mean firing rate •  250 Gbps bisec;on bandwidth BCS 27 Nov 2013 18 •  A set of dynamical processes P = {Pi} –  Si(t) is the state of Pi at ;me t •  A set of event channels E = {Ej} –  Ej carries a ;me series of asynchronous impulses –  generated by a process Ej = ej(Pj) •  Hybrid model (biology): processes evolve Si = si(t, E* ⊆ E) •  Discrete model (SpiNNaker) –  ;me can be abstracted into a series of (e.g.) 1ms events Et –  We can model each event atomically: Ej => Si := pi(Si, j) •  In prac;ce, on SpiNNaker event handling takes a finite ;me and may overlap subsequent events. BCS 27 Nov 2013 19 SpiNNaker chip
Monitor
Processor
Application
Processors
Monitor
Processor
SC&MP
SC&MP
Application
Processors
Application
Code
SpiNNaker + Ethernet
Application
Code
Spike Server
Visualiser
Program Loader
Host workstation
SDP−based API
Spin1
API
Linux OS
SARK
SARK
Ethernet
Ethernet
Inter−chip
Links
BCS 27 Nov 2013 Spin1
API
SARK
SARK
Inter−chip
Links
20 off-­‐die SDRAM system RAM ITCM DTCM applica;on dma ;mer vic core comms hardware BCS 27 Nov 2013 comms NoC 21 applica;on control core control applica;on sark memory management peripheral management event management hardware BCS 27 Nov 2013 SDP messaging 22 applica;on event-­‐driven programming model API run-­‐;me run-­‐;me environment environment sark hardware BCS 27 Nov 2013 sark func;onality s;ll available 23 priority level = -1
only one callback
cannot be pre-empted
priority level = 0
can only be pre-empted
by priority -1 callback
priority level > 0
can be pre-empted
by priority <= 0 callbacks
scheduled in priority order
BCS 27 Nov 2013 24 BCS 27 Nov 2013 25 BCS 27 Nov 2013 26 •  LIF •  Izhikevich BCS 27 Nov 2013 27 •  Vogels-­‐
AbboJ benchmark –  500 LIF neurons BCS 27 Nov 2013 28 BCS 27 Nov 2013 29 •  SpiNNaker: •  5M conn/s/ARM •  Spaun: •  2.5M neurons •  ~100Hz firing rates •  ~500 inputs/neuron •  125G conn/s •  Real-­‐;me Spaun: •  25,000 ARMs •  30x 48-­‐node PCB •  by end 2013? a)
b)
Working Memory
Visual
Input
Informa!on
Encoding
Transform
Calcula!on
Reward
Evalua!on
Ac!on Selec!on
Informa!on
Decoding
Motor
Processing
Motor
Output
= hierarchy
Chris Eliasmith et al, Science vol. 338, 30 Nov 2012 BCS 27 Nov 2013 30 BCS 27 Nov 2013 31