Speed-ups obtained by Reconfigurable Computing
Reiner Hartenstein
CAPES/DFG Cooperation on Reconfigurable Computing, invited talk, Sept 19, 2008, Dept of Mechanical Engineering, Universidade de Brasilia (slightly modified version)
outline
Introduction
Manycore Crisis & von Neumann Syndrome
The Impact of Reconfigurable Computing
Programmer education: new roadmap needed
Conclusions
© 2008, [email protected]
http://hartenstein.de
5 key issues
climate change faster than predicted: by carbon emission, primarily from power plants?
very high and growing computer energy cost, and a growing number of power plants needed
the manycore programming crisis stalls progress (end of the free ride on the Gordon Moore curve)
technologically stalled Moore's Law*
Reconfigurable Computing is a promising alternative
[Nick Tredennick (Gilder), 2003]
*) Tom Williams (keynote): the 20 nm wall; 2008: 65, 45, 32 nm
History of data processing
The first reconfigurable computer
• prototyped: 1884, Herman Hollerith
• datastream-based
DPU
• 1st Xilinx FPGA: 100 years later
Configware Programming
no instruction streams
manually (Configuration)
or, by swapping pre-wired board (Reconfiguration)
60 years later: RAM available (ferrite cores),
motivating the von Neumann paradigm (J. v. N., 1946)
Field-Programmable Gate Array (FPGA): fine-grained reconfigurable
[Figure: Xilinx old "island architecture": an array of CLBs (Configurable Logic Boxes) with connect boxes and switch boxes between them. A connect box connects a wire to a CLB; switch boxes form a wire, here from point A to point B.]
RAM-based
this switch box has 150 transistors & 150 flipflops (FF)
FF: part of the "hidden RAM"
configware code loaded before run time into switch box "hidden RAM"
patches even at the customer's desk
FPGAs mainstream since > a decade
[Figure: switch box with hidden RAM holding the configuration bits 0 0 0 0 0 1.]
Coarse-grained Reconfigurable Array
CLB CFB !
Conditional Swap Example (parallelization of the bubble sort algorithm)
if X > Y then swap;
swap turned into a wiring pattern
[Figure: "Swap" unit with inputs Xi, Yi and outputs Xo, Yo; a comparator (>) drives two multiplexers (select 0 / 1). Legend: rout thru only; rout thru and function (multiplexer).]
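A minimal Python sketch of the conditional swap unit, assuming the comparator output drives both multiplexers as the figure labels (Xi, Yi, Xo, Yo) suggest. The function name `conditional_swap` is illustrative and not from the talk.

```python
def conditional_swap(xi, yi):
    """One conditional swap unit: a comparator selecting via two multiplexers."""
    swap = xi > yi              # comparator: "if X > Y then swap"
    xo = yi if swap else xi     # multiplexer feeding output Xo
    yo = xi if swap else yi     # multiplexer feeding output Yo
    return xo, yo
```

There is no instruction sequence for the swap here: both outputs are pure functions of the inputs, which is what lets the unit become a wiring pattern.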
Another coarse-grained r-Array
SNN Filter on supersystolic Array: mainly a Pipe Network
CFB ! no CPU
rDPU: reconfigurable Data Path Unit, 32 bits wide
array size: 10 x 16
(99% placement efficiency)
by KressArray Xplorer [Ulrich Nageldinger]
CoDe-X inside [Jürgen Becker]
[Figure: placed and routed rDPU array. Legend: rDPU not used; rout thru only; used for routing only; operator and routing; backbus connect; backbus connect not used; port location marker.]
Platform FPGA
Configware code input
8 – 32 fast serial I/O channels
256 – 1704 BGA
DPUs
56 – 424 fast on-chip Block RAMs (BRAMs)
[courtesy Lattice Semiconductor]
Reconfigurable Supercomputing
Silicon Graphics: Reconfigurable Application-Specific Computing (RASC™)
Cray XD1
• Xilinx Virtex-II Pro
• Library by Cray
Supercomputing 2007, Reno, Nevada, USA: 9600 registered attendees, 440 exhibitors
Chuck Thacker … (even Microsoft working at it) (Lab in Cambridge, UK, etc.)
what Configware means
time domain: Software Source, Software Compiler, Software Code (instruction-procedural)
Software to Configware Migration
space domain: Configware Source, mapper (Placement & Routing), Configware Code (structural: space domain)
data scheduler: Flowware Code (data-procedural)
outline
Introduction
The Manycore Crisis & the
von Neumann Syndrome
The Impact of Reconfigurable Computing
Programmer education: new roadmap needed
Conclusions
Many-core: Break-through or Breakdown?
Industry is facing a disruptive turning point
"could reset µP HW & SW roadmaps for next 30 years" [David Patterson]
intel's vision: MultiCore
forcing a historic transition to a parallel programming model yet to be invented [David Callahan]
HPC users lack understanding in basic precepts*
it's an education, qualification, and R&D problem
The stakes are high ...
"I would be panicked if I were in industry" [John Hennessy]
*) PRACE consortium (Partnership foR Advanced Computing in Europe)
http://www.prace-project.eu/documents/D3.3.1_document_final.pdf
Declining Programmer Productivity
The Law of More: programmer productivity declines disproportionately with increasing parallelism
In particular HPC application domains, massive parallelism requires 10 – 30 professionals in multi-disciplinary, multi-institutional teams for 5 – 10 years
[Douglass Post, DoD HPCMP, panelist at SC07]
Software done: machine obsolete
The von Neumann
Syndrome
Massive Overhead Phenomena
overhead piling up to code sizes of astronomic dimensions
overhead on the von Neumann machine (single-core CPU):
instruction fetch: instruction stream
state address computation: instruction stream
data address computation: instruction stream
data meet PU + other overh.: instruction stream
i / o to / from off-chip RAM: instruction stream
2006, C.V. "RAM" Ramamoorthy: "von Neumann Syndrome"
1986, E.I.S. project: 94% for address computation; total speed-up: x 15000
2008, David Callahan: "a terrifying number of processes running in parallel, create sequential-processing bottlenecks and losses in data locality"
Dijkstra 1968: The Goto considered harmful
Koch et al. 1975: The universal Bus considered harmful
Backus, 1978: Can programming be liberated from the von Neumann style?
Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style
manycore von Neumann: arrays of massive overhead phenomena
fast on-chip memory cannot store such huge instruction code blocks
overhead on the von Neumann machine, proportionate to the number of processors:
instruction fetch: instruction stream
state address computation: instruction stream
data address computation: instruction stream
data meet PU + other overh.: instruction stream
i / o to / from off-chip RAM: instruction stream
overhead disproportionate to the number of processors:
Inter PU communication: instruction stream
message passing overhead: instruction stream
transactional memory overh.: instruction stream
multithreading overhead etc.: instruction stream
[Figure: array of CPUs; single core vs. many core.]
outline
Introduction
Manycore Crisis & von Neumann Syndrome
The Impact of Reconfigurable Computing
Programmer education: new roadmap needed
Conclusions
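Speed-up factors obtained by Software to Configware migration
[Chart: speed-up factor per application on a logarithmic axis (marks at 10^3 and 10^6), grouped by domain.
Domains: crypto; Image processing, Pattern matching, Multimedia; DSP and wireless; Bioinformatics; Astrophysics.
Applications: DES breaking, real-time face detection, Reed-Solomon Decoding, MAC, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, protein identification, FFT, BLAST, Viterbi Decoding, Smith-Waterman pattern matching, molecular dynamics simulation, GRAPE.
Values still attached to a label in this transcript: crypto 3000, pattern recognition 730, SPIHT wavelet-based image compression 457.
Other plotted values (pairing with applications lost in this transcript): 28500, 6000, 2400, 1000, 1000, 900, 400, 288, 100, 100, 88, 52, 40, 20.]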
Accelerator card from Bruchsal
16 FPGAs
• 1.5 TeraMAC/s
• I/O Bandwidth: 50 GByte/s
• Manufacturer: SIEMENS Bruchsal
MAC means Multiply and ACcumulate
Tera means 10^12 or 1 000 000 000 000 (1 trillion)
Energy saving factors obtained by software to configware migration
Energy saving: almost x10 less than speed-up … could be improved
[Chart: the speed-up factor chart (logarithmic axis, marks at 10^3 and 10^6) annotated with energy saving.
Domains: crypto; Image processing, Pattern matching, Multimedia; DSP and wireless; Bioinformatics; Astrophysics.
Applications: DES breaking, real-time face detection, Reed-Solomon Decoding, MAC, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, protein identification, FFT, BLAST, Viterbi Decoding, Smith-Waterman pattern matching, molecular dynamics simulation, GRAPE.
Values still attached to a label in this transcript: pattern recognition 730, SPIHT wavelet-based image compression 457.
Other plotted values (pairing with applications lost in this transcript): 28500, 6000, 3000, 2400, 1000, 1000, 900, 400, 288, 100, 100, 88, 52, 40, 20.]
von Neumann overhead vs. Reconfigurable Computing
overhead: von Neumann machine (array of CPUs) / anti machine (rDPA: reconfigurable datapath array, coarse-grained rec.)
instruction fetch: instruction stream / none*
state address computation: instruction stream / none*
data address computation: instruction stream / none*
data meet PU + other overh.: instruction stream / none*
i / o to / from off-chip RAM: instruction stream / none*
Inter PU communication: instruction stream / none*
message passing overhead: instruction stream / none*
transactional memory overh.: instruction stream / none*
multithreading overhead etc.: instruction stream / none*
Data meet the processor (CPU)
illustrating von Neumann syndrome
by Software: inefficient transport over off-chip memory by memory-cycle-hungry instruction streams
This is just one of many von Neumann overhead phenomena
Data meet the CPU
illustrating acceleration
by Flowware: placement of the execution locality (not moving data) within pipe network, generated by the Configware-Compiler*
*) before run time (at compile time)
What did we learn?
There are 2 kinds of datastreams:
1) indirectly moved by an instruction stream
machine (von Neumann): extremely inefficient
2) directly moved by a datastream machine
(from Reconfigurable Computing): very efficient
“Dataflow machine” would be a nice term,
but was introduced by a different scene*
*) meanwhile dead: not really a dataflow machine, but
had used compilers accepting a dataflow language
What else did we learn?
There are 2 kinds of parallelism:
1) Concurrent processes: instruction stream
parallelism (CPU manycores): inefficient
2) Data parallelism by parallel datastreams (in
Reconfigurable Computing Systems): efficient
Conclusion:
- Data parallelism brings the performance
(we do data processing !)
What Parallelism?
[Hartenstein's watering can model]
data parallelism (array of rDPUs): no von Neumann bottleneck
instruction parallelism (array of CPUs): many von Neumann bottlenecks
Put old ideas into practice
(POIIP)
“We need a complete re-definition of CS”
[Burton Smith and other celebrities]
Wrong! I do not agree, finding out, that ...
[Reiner Hartenstein]
... „The biggest payoff will come from
putting old ideas into practice and teaching
people how to apply them properly.“ [David Parnas]
“We need a complete re-definition of curriculum
recommendations - missing several key issues.”
[Reiner Hartenstein]
outline
Introduction
Manycore Crisis & von Neumann Syndrome
The Impact of Reconfigurable Computing
Programmer education:
new road map needed
Conclusions
Fighting against obsolete curricula?
The Embedded Systems Approach?
… support their own educational approach:
Graduate Curriculum on Embedded Software and Systems (EU)
Advanced Real Time Systems
Real-Time Systems (Sweden)
Recommendations for Designing new ICT Curricula
Chess – Center for Hybrid and Embedded Software Systems (courses in embedded systems)
WESE: Workshop on Embedded Systems Education
"You can always teach programming to a hardware guy ... but you can never teach hardware to a programmer"
it's not the programmer's fault: it's due to obsolete CS curricula
CS is a Monster
fully wrong educational mainstream approaches:
1) the basic mind set is exclusively instruction-stream-oriented; data streams are considered exotic
2) mapping parallelism into the time domain; abstracting away the space domain is fatal
We need a dual-rail education
We need to POIIP for:
Software to Hardware Migration
and
Software to Configware Migration
2 key rules of thumb, terrifically simple:
1) loop turns into pipeline [1979]
2) decision box turns into demultiplexer [1967]: PvOIIP
Two Dichotomies
Dichotomy = mutual allocation to two opposed
domains such that a third domain is excluded.
The dichotomy model as an educational orientation
guide for dual rail education to overcome the
software/configware chasm & the software/hardware chasm
1) Machine Paradigm Dichotomy (von Neumann
/Dataflow machine*): the „Twin Paradigm“ model
2) Relativity Dichotomy: time domain / space domain –
helps parallelization by time to space mapping
*) see definition
Def.: Dataflow Machine
The old „Dataflow Machine“ research scene is dead.
sequential execution: not really a dataflow machine.
indeterministic: unpredictable order of execution:
had used compilers accepting a dataflow language
we re-define this term: counterpart of von Neumann
deterministic, w. data counters (no program counter)
1) Paradigm Dichotomy (procedural dichotomy)
The Twin Paradigm Approach (TTPA)
instruction domain: CPU, program counter
datastream domain: (r)DPA, data counter(s)
we need data parallelism
[Figure: CPU and (r)DPA drawn as twins, marked with + and - signs.]
Data Machine: from old stuff [1979 - ...]
Data streams [Kung et al. 1979]
systolic array, super systolic, (r)DPA [1995]
New is only: its generalization [1989]
ASM: Auto-Sequencing Memory (RAM with data counter); GAG [1990]
[Figure: ASM blocks on all sides of an (r)DPA, sending and receiving skewed data streams (x x x).]
Procedural Languages Twins
imperative Software Languages (program counter):
read next instruction
goto (instruction address)
jump to (instruction address)
instruction loop
instruction loop nesting
instruction loop escape
instruction stream branching
no: no internally parallel loops
super systolic Flowware Languages (data counter(s)):
read next data item
goto (data address)
jump to (data address)
data loop
data loop nesting
data loop escape
data stream branching
yes: internally parallel loops
But there is the Asymmetry
for data parallelism
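A minimal Python sketch of what "read next data item" and "data loop nesting" could look like when a data counter, not a program counter, drives execution. The names `data_counter` and `asm_read`, the block-scan address pattern, and the sample memory are all illustrative assumptions; the talk does not specify a Flowware syntax.

```python
def data_counter(base, rows, cols, row_stride, col_stride=1):
    """Yield the addresses of a 2-D block scan: two nested data loops."""
    for r in range(rows):            # outer data loop
        for c in range(cols):        # inner data loop
            yield base + r * row_stride + c * col_stride

def asm_read(memory, addresses):
    """Stream data items out of memory in the order the data counter dictates."""
    for address in addresses:
        yield memory[address]        # "read next data item"

memory = list(range(100, 116))       # 16 words, viewed as a 4 x 4 array
stream = list(asm_read(memory, data_counter(base=5, rows=2, cols=2, row_stride=4)))
```

The consumer of `stream` never computes an address; the address sequence is fixed by the counter parameters before the run.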
Relativity Dichotomy (time / space)
time domain: procedure domain
space domain: structure domain
von Neumann Machine: 2 phases:
1) programming instruction streams
2) run time
Anti Machine: 3 phases:
1) reconfiguration of structures
2) programming data streams
3) run time
time-iterative to space-iterative
a time to space mapping: n time steps, 1 CPU becomes 1 time step, n DPUs (n = length of pipeline)
Often the space dimension is limited (e.g. because of the chip size)
a time to space/time mapping: k*n time steps, 1 CPU becomes k time steps, n DPUs
Strip mining [D. Loveman, J-ACM, 1977]
loop transformation methodology: 70ies and later
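A minimal Python sketch of the time to space/time mapping by strip mining, assuming a loop body with no dependences between iterations. The names `strip_mine` and `run_on_dpus` are illustrative; the parallel DPUs are only simulated by an ordinary loop over each strip.

```python
def strip_mine(data, n):
    """Cut a loop over len(data) items into strips of at most n items."""
    return [data[i:i + n] for i in range(0, len(data), n)]

def run_on_dpus(data, n, body):
    """Count one time step per strip; the items of a strip stand for n DPUs working side by side."""
    results, time_steps = [], 0
    for strip in strip_mine(data, n):
        results.extend(body(x) for x in strip)   # simulated: n DPUs at once
        time_steps += 1
    return results, time_steps
```

With 12 items and n = 4, a single CPU would need k*n = 12 time steps, while the strip-mined version counts k = 3.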
outline
Introduction
Manycore Crisis & von Neumann Syndrome
The Impact of Reconfigurable Computing
Conclusions
Conclusions (1)
We massively need programmable accelerator co-processors
Established technologies are available and
we can still use standard software and their tools
We need a massive Migration of Software to Configware.
To cope with the implementation wall: to cope with the
programmer population‘s unsustainable skills mismatches
Configware skills and basic hardware knowledge
are essential qualifications for programmers.
Conclusions (2)
CS education is a monster !
Fully wrong educational mainstream approaches
Jaw-dropping sclerosis of curriculum taskforces
We need a complete re-definition of CS education
We urgently need Dual-Rail Education
CS should learn a lot from Embedded
Systems, like in Mechanical Engineering
thank you for your patience
END
backup for discussion:
time to space mapping
time domain (procedure domain): time algorithm, program loop; n time steps, 1 CPU
space domain (structure domain): space algorithm, pipeline; 1 time step, n DPUs
Bubble Sort (time algorithm): n x k time steps, 1 "conditional swap" unit
Shuffle Sort (space/time algorithm): k time steps, n "conditional swap" units
[Figure: a single conditional swap unit with inputs x and y, next to a column of conditional swap units.]
Architecture instead of synchro
Example: "Shuffle Sort"
direct time to space mapping: accessing conflicts
modification: with shuffle function
Better Architecture instead of complex synchronisation: half the number of Blocks + up and down of data (shuffle function); no von Neumann syndrome !
[Figure: two columns of conditional swap units, before and after the modification.]
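A minimal Python sketch of sorting with rows of conditional swap units. It uses odd-even transposition sort, a textbook parallel form of bubble sort, as a stand-in: alternating the pairing between steps plays the role of moving data up and down past a fixed set of units. Whether this matches the wiring of the "Shuffle Sort" figure exactly is an assumption, and the function names are illustrative.

```python
def conditional_swap(x, y):
    """One conditional swap unit: the smaller value leaves first."""
    return (y, x) if x > y else (x, y)

def odd_even_transposition_sort(values):
    """Sort with n steps; within a step all pairs are independent of each other."""
    data = list(values)
    n = len(data)
    for step in range(n):                    # n time steps
        start = step % 2                     # alternate even and odd pairings
        for i in range(start, n - 1, 2):     # these units could run side by side
            data[i], data[i + 1] = conditional_swap(data[i], data[i + 1])
    return data
```

Each step touches every element at most once, so no two units ever access the same data item in the same step.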
Transformations since the 70ies
loop transformations: rich methodology published [survey: Diss. Karin Schmidt, 1994, Shaker Verlag]
time domain (procedure domain): program loop, time algorithm; n x k time steps, 1 CPU
Strip Mining Transformation
space domain (structure domain): pipeline, space/time algorithm; k time steps, n DPUs
Education Revolution: Microelectronics Design Revolution
Carver Mead, Lynn Conway [1980] (in Germany: the E.I.S. project)
traditional division of labor: application, RT level, logic level, switching level, circuit level, layout level, in-house technology; submission and rejection between adjacent levels; fragmentation; breadth of specialization
the new M-&-C division of labor: application, "tall thin man", Silicon Foundry (external technology); coherence; breadth of specialization strongly reduced
clearing out & intuitive models, to resolve the education dilemma
emphasis on "Systems"
Education Revolution: Reconfigurable Computing Revolution
The new Mead & Conway? Christophe Bobda
Application level
Program level: the tall thin man*
> Dichotomy <: Twin Paradigm
von Neumann Paradigm (instruction-stream-based): clearing out
Anti machine Paradigm (datastream-based): clearing out
*) or "tall thin woman"
Who generates the data streams?
Without a Sequencer it's not a Machine !
[Figure: skewed data streams (x x x) entering and leaving an array from all four sides.]
The Anti Machine
(r)DPA*: Supersystolic Array (Kress Array)
Data streams [Kung et al. 1979]
ASM: Auto-Sequencing Memory (RAM, GAG, data counter), programmed by Flowware
several data counters instead of a program counter
the data counter: placed in memory** (not with datapath***)
*) especially coarse-grained: for instance: platform FPGA
**) normally on-chip
***) not like with CPU
[Figure: ASM blocks on all sides of the (r)DPA, sending and receiving skewed data streams (x x x).]
Mission of this talk
software 2 hardware mapping (and software 2 configware mapping)
means time to space migration
(and von Neumann 2 anti machine migration)
We need time to space migration
since infinite space is not available,
we often need partial time 2 space migration
Morphware: old stuff
structural programming (non-von-Neumann)
1971 PROMs for small logic
1975 PLA
1978 PAL with PALASM tool
1984 first Xilinx FPGA
meanwhile mainstream …
POIIP: Loop turns into pipeline [1979]
loop: Memory, CPU, loop body
Pipeline: the loop body mapped onto (reconfigurable) DataPath Units: rDPU, rDPU, rDPU, rDPU
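A minimal Python sketch of the rule "loop turns into pipeline", assuming a loop body that splits into three independent arithmetic stages. The stage functions and all names are made up for illustration; Python generators only model the data stream passing from stage to stage, not real parallel hardware.

```python
def loop_version(items):
    """Time domain: one CPU executes the whole loop body per item."""
    out = []
    for x in items:
        t = x + 1        # statement 1 of the loop body
        t = t * 2        # statement 2
        t = t - 3        # statement 3
        out.append(t)
    return out

def stage(fn, stream):
    """One pipeline stage (stand-in for an rDPU): apply fn to each passing item."""
    for item in stream:
        yield fn(item)

def pipeline_version(items):
    """Space domain: each statement becomes its own stage, chained by data streams."""
    s1 = stage(lambda x: x + 1, items)
    s2 = stage(lambda x: x * 2, s1)
    s3 = stage(lambda x: x - 3, s2)
    return list(s3)
```

Both versions compute the same results; in the pipeline form each statement has a fixed place, and only data moves.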
super-systolic array
(recall this example !)
far beyond just uniform linear pipes: supporting any complex free form pipe networks
by KressArray Xplorer [Ulrich Nageldinger]
CoDe-X inside [Jürgen Becker]
[Figure: placed and routed rDPU array. Legend: rDPU not used; rout thru only; used for routing only; operator and routing; backbus connect; backbus connect not used; port location marker.]
decision box turns into demultiplexer
PvOIIP [1967]
decision box: ENABLE, CONDITION, branches B0 and B1 (0 / 1)
demultiplexer: ENABLE, CONDITION, outputs B0 and B1 (0 / 1)
W. A. Clark: 1967 SJCC, AFIPS Conf. Proc.
C. G. Bell et al: IEEE Trans-C21/5, May 1972
RTM as a DEC product available: 1973
[~1971] (introducing HDLs): "That's so simple! Why did it take 30 years to find out?"
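A minimal Python sketch of the rule "decision box turns into demultiplexer", using the signal names ENABLE, CONDITION, B0 and B1 from the slide. That CONDITION = 1 selects B1 and CONDITION = 0 selects B0 is an assumption read off the figure labels.

```python
def demultiplexer(enable, condition):
    """Route ENABLE to exactly one of the outputs B0 / B1, selected by CONDITION."""
    b0 = enable and not condition   # CONDITION = 0 activates branch B0
    b1 = enable and condition       # CONDITION = 1 activates branch B1
    return b0, b1
```

The decision is no longer a jump in an instruction stream: both branches exist side by side in space, and the condition only decides which one receives the enable signal.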
von Neumann overhead: an example
overhead on the von Neumann machine (single CPU):
instruction fetch: instruction stream
state address computation: instruction stream
data address computation: instruction stream
data meet PU + other overh.: instruction stream
i / o to / from off-chip RAM: instruction stream
PISA DRC accelerator [ICCAD 1984] (rDPUs): entire project: 15000x speed-up
reconfigurable address generator (GAG): ~20x speed-up
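A small worked check in Python of how the 94% share for address computation, quoted earlier in the talk for the E.I.S. project, relates to the ~20x figure for the address generator. It assumes the share is a fraction of run time and that the address generator removes that part completely; both are simplifying assumptions, not statements from the talk. The function name is illustrative.

```python
def speedup_from_removing(fraction):
    """Amdahl-style bound: speed-up when this fraction of the run time drops to zero."""
    return 1.0 / (1.0 - fraction)

address_share = 0.94                                  # share quoted for address computation
gag_speedup = speedup_from_removing(address_share)    # roughly 16.7
```

Under these assumptions the bound is about 17x, the same order of magnitude as the ~20x on the slide; the rest of the 15000x would then have to come from the datapath side.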
Data Machine: from old stuff [1979 - ...]
Data streams [Kung et al. 1979]
systolic array, super systolic, (r)DPA [1995]
New is only: its generalization [1989]
ASM: Auto-Sequencing Memory (RAM with data counter); GAG [1990]
(r)DPA with data counter(s)
[Figure: ASM blocks on all sides of an (r)DPA, sending and receiving skewed data streams (x x x).]