Signalling in the Heterogeneous
Architecture Multiprocessor Paradigm
Antonio Núñez, Victor Reyes, Tomás Bautista
Keynote
IUMA, Institute for Applied
Microelectronics, ULPGC
SPIE Gran Canaria 2003
A. Nunez
1
Index
MPSoC Architectures -> Hetero MPSoC
Communication Architectures -> Split
Transport and Signalling Networks
Previous and Related work
Our SystemC Based Modelling Approach
Experiments
Conclusions
Technological Forecasts
Moore's Law: the number of transistors per chip doubles every two years
ITRS roadmap (design paradigms marked along the timeline: SoC, MPSoC, GALS, NoC):

Year of 1st shipment    1997   1999   2002   2005   2008   2011   2014
Local Clock (GHz)       0.75   1.25   2.1    3.5    6      10     16.9
Across Chip (GHz)       0.75   1.2    1.6    2      2.5    3      3.674
Chip Size (mm²)         300    340    430    520    620    750    901
Dense Lines (nm)        250    180    130    100    70     50     35
Number of chip I/O      1515   1867   2553   3492   4776   6532   8935
Transistors per chip    11M    21M    76M    200M   520M   1.4B   3.62B
Processor to DRAM Performance Gap
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU performance (“Moore’s Law”) improves about 60% per year, DRAM performance about 7% per year. The processor-memory performance gap grows about 50% per year.]
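The three rates on this slide are consistent with each other: compounding 60% per year against 7% per year gives a ratio of 1.60 / 1.07, close to 1.5, which is the "grows 50% / year" figure. A minimal sketch of that arithmetic, assuming both curves start from the same point and keep fixed annual rates (the function name is hypothetical):

```cpp
#include <cmath>

// Ratio of processor performance to DRAM performance after `years`,
// assuming equal starting points and constant annual improvement rates
// (for example 0.60 for the processor and 0.07 for DRAM).
double performance_gap(double cpu_rate, double dram_rate, int years) {
    return std::pow((1.0 + cpu_rate) / (1.0 + dram_rate), years);
}
```

Under these assumptions, performance_gap(0.60, 0.07, 1) is about 1.495, and the ratio exceeds three orders of magnitude after twenty years.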
Logic to Memory Area Gap
Logic to Productivity Gap
-> Platform based design
-> Communication architectures
Processor Architecture Paradigms
Cf. Ungerer et al., Patterson et al., Tenhunen et al., Computer special issue
Processor/Memory/Switch
- Processor-, memory-, and communications-dominated systems
- Communications architecture
Processor-Mono: speed-up of a single-threaded application (Patt, Sohi…)
- Advanced superscalar
- Trace cache
- Superspeculative
- Multiscalar processors
Processor-Multi: speed-up of multi-threaded applications (homo, hetero, many…)
- Simultaneous multithreading (SMT)
- Chip multiprocessors (CMPs)
Memory: Processor-in-Memory, IRAM, others (Patterson)
Network on Chip (Mihal, Tenhunen, Goossens)
Monoprocessor: Superflow Processor
Fine granularity, data word
The Superflow processor speculates on:
- instruction flow: two-phase branch predictor combined with trace cache
- register data flow: dependence prediction, i.e. predict the register value dependence between instructions
  - source operand value prediction
  - constant value prediction
  - value stride prediction: speculate on constant, incremental increases in operand values
  - dependence prediction predicts inter-instruction dependences
- memory data flow: prediction of load values, of load addresses, and alias prediction
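To make the value stride prediction item concrete, here is a minimal sketch of a single-entry stride predictor. It is an illustration only, not the Superflow design: the struct name, the one-entry table, and the rule "predict once the same stride has been seen twice in a row" are all assumptions. A stride of zero covers the constant value case.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical single-entry stride value predictor (illustrative only).
struct StridePredictor {
    std::int64_t last = 0;    // most recently observed value
    std::int64_t stride = 0;  // most recently observed difference
    int seen = 0;             // number of values observed so far
    bool confident = false;   // same stride observed twice in a row

    // Predicted next value, or nothing while no stable stride is known.
    std::optional<std::int64_t> predict() const {
        if (!confident) return std::nullopt;
        return last + stride;
    }

    // Train the predictor with the value that was actually produced.
    void update(std::int64_t value) {
        if (seen >= 1) {
            const std::int64_t s = value - last;
            confident = (seen >= 2 && s == stride);
            stride = s;
        }
        last = value;
        ++seen;
    }
};
```

After observing 10, 14, 18 the sketch predicts 22; a value that breaks the stride makes it withhold predictions until a stride repeats again.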
Com-arch in Superflow Processor
Multiscalar Processors
A program is represented as a control flow graph (CFG), where basic blocks are
nodes, and arcs represent flow of control.
A multiscalar processor walks through the CFG speculatively, taking task-sized
steps, without pausing to inspect any of the instructions within a task.
The tasks are distributed to a number of parallel PEs within a processor.
Each PE fetches and executes instructions belonging to its assigned task.
The primary constraint: it must preserve the sequential program semantics.
Multiscalar mode of execution
[Figure: control flow graph with nodes A to E; Task A runs on PE 0, Task B on PE 1, Task D on PE 2 and Task E on PE 3, with data values passed between the PEs.]
Com-arch in Multiscalar processor
Multiscalar, Trace and Speculative
Multithreaded Processors
Multiscalar: A program is statically partitioned into tasks which are
marked by annotations of the CFG.
Trace Processor: Tasks are generated from traces of the trace
cache.
Speculative multithreading: Tasks are otherwise dynamically
constructed.
Common target: increase single-thread program performance by
dynamically exploiting thread-level speculation in addition to
instruction-level parallelism.
Here a “thread” means a hardware thread.
Multis: Additional utilization of more
coarse-grained parallelism
CMPs: chip multiprocessors or multiprocessor chips
- integrate two or more complete processors on a single chip,
- every functional unit of a processor is duplicated.
SMTs: simultaneous multithreaded processors
- store multiple contexts in different register sets on the chip,
- the functional units are multiplexed between the threads,
- instructions of different contexts are simultaneously executed.
CMPs-Homo: Com-arch by shared
global memory
[Figure: four processors; block labels: Primary Cache, Secondary Cache, Global Memory. Caption: Shared global memory, no caches.]
CMPs-Homo: Com-arch by shared
primary cache
[Figure: four processors over a primary cache, a secondary cache and global memory. Caption: Shared primary cache.]
CMPs-Homo: Com-arch by global
memory, caches
[Figure, two panels of four processors each, every processor with its own primary cache. One panel shows four secondary caches over global memory, the other a single secondary cache over global memory. Captions: Shared caches and memory; Shared secondary cache.]
Com-arch in Hydra: A Single-Chip
Multiprocessor
[Figure: a single chip with CPU 0 to CPU 3, each with a primary I-cache, a primary D-cache and a memory controller; centralized bus arbitration mechanisms; on-chip secondary cache; off-chip L3 interface to a cache SRAM array; Rambus memory interface to DRAM main memory; DMA and I/O bus interface to an I/O device.]
CMPs-Hetero:
Communications Architecture
Architectures found in today’s heterogeneous
processors for platform based design
E.g. CPU cores, AMBA buses, internal/external
shared memories
[Figure: RISC core, engines, internal/external memory and external I/O connected through an AMBA bus and a shared bus.]
CMPs-Hetero:
Communications Architecture, Arbiters
Multithreaded Processors
Aim: Latency tolerance
What is the problem? Load access latencies measured on an Alpha
Server 4100 SMP with four Alpha 21164 processors are:
- 7 cycles for a primary cache miss which hits in the on-chip L2 cache of the 21164 processor,
- 21 cycles for an L2 cache miss which hits in the L3 (board-level) cache,
- 80 cycles for a miss that is served by the memory, and
- 125 cycles for a dirty miss, i.e., a miss that has to be served from another processor's cache memory.
Multithreading
- The ability to pursue two or more threads of control in parallel within a processor pipeline.
- Advantage: the latencies that arise in the computation of a single instruction stream are filled by computations of another thread.
Multithreaded processors are able to bridge latencies by switching
to another thread of control, in contrast to chip multiprocessors.
Approaches of Multithreaded
Processors
Cycle-by-cycle interleaving
- An instruction of another thread is fetched and fed into the execution pipeline at each processor cycle.
Block interleaving
- The instructions of a thread are executed successively until an event occurs that may cause latency. This event induces a context switch.
Simultaneous multithreading (SMT)
- Instructions are simultaneously issued from multiple threads to the FUs of a superscalar processor.
- Combines wide-issue superscalar instruction issue with multithreading.
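The difference between the first two approaches can be seen in which thread owns each issue cycle. The sketch below is a simplified illustration for a scalar pipeline: the function names and the `stall` input are hypothetical, context-switch overhead is ignored, and SMT is not modelled.

```cpp
#include <vector>

// Cycle-by-cycle interleaving: a different thread issues in every cycle
// (round robin over `threads` hardware contexts).
std::vector<int> cycle_interleave(int threads, int cycles) {
    std::vector<int> issue;
    for (int c = 0; c < cycles; ++c) issue.push_back(c % threads);
    return issue;
}

// Block interleaving: the current thread keeps issuing until one of its
// instructions triggers a long-latency event; that event causes a context
// switch to the next thread. stall[c] is true when the instruction issued
// in cycle c triggers such an event (hypothetical input trace).
std::vector<int> block_interleave(int threads, const std::vector<bool>& stall) {
    std::vector<int> issue;
    int current = 0;
    for (bool s : stall) {
        issue.push_back(current);
        if (s) current = (current + 1) % threads;
    }
    return issue;
}
```

Each returned vector lists the thread id that issues in each cycle, which corresponds to the rows of the usual issue-slot diagrams for these schemes.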
Multithreading versus Non-Multithreading Approaches
[Figure: issue slots over time (processor cycles), with context switches marked, for (a) single-threaded scalar, (b) cycle-by-cycle interleaving multithreaded scalar, (c) block interleaving multithreaded scalar.]
Simultaneous Multithreading (SMT)
and Chip Multiprocessors (CMP)
[Figure: issue slots over time (processor cycles) for (a) SMT and (b) CMP.]
Combining SMT and Multimedia
Start with a wide-issue superscalar general-purpose
processor
Enhance by simultaneous multithreading
Enhance by multimedia unit(s)
Enhance by on-chip RAM memory for constants and local
variables
The SMT Multimedia Processor
[Figure: block diagram with memory interface (to memory), I/O, ICache, DCache, local memory, thread control, BTAC, two IF/ID front ends, Rename, RI, RT, WB, registers, and execution units: Branch, Simple Integer, Complex Integer, Global L/S, Local L/S.]
IPC of Maximum Processor Models
[Figure: bar chart of IPC versus number of threads and issue width; the plotted IPC values range from 0.96 to 6.33.]
Combining CMP-hetero and
Multimedia
Start with a general-purpose processor
Enhance by hierarchical-bus com-arch
Enhance by hardware accelerators and coprocessors
including multimedia unit(s)
Enhance by on-chip RAM memories for constants,
local variables, frames…
Real implementation example: Philips Eclipse
architecture instance for video coding
CMP or SMT?
The performance race between SMT and CMP is not yet decided.
CMP is easier to implement, but only SMT has the ability to hide
latencies.
A functional partitioning is not easily reached within an SMT
processor due to the centralized instruction issue.
- A separation of the thread queues is a possible solution, although it does not remove the central instruction issue.
- A combination of simultaneous multithreading with the CMP may be superior.
Research: combine SMT or CMP organization with the ability to
create threads with compiler support or fully dynamically out of a
single thread
- thread-level speculation
- close to multiscalar
Processor-in-Memory
Technological trends have produced a large and growing gap
between processor speed and DRAM access latency.
Today, it takes dozens of cycles for data to travel between the CPU
and main memory.
CPU-centric design philosophy has led to very complex superscalar
processors with deep pipelines.
Much of this complexity is devoted to hiding memory access
latency.
Memory wall: the phenomenon that access times are increasingly
limiting system performance.
Memory-centric design is envisioned for the future
PIM or Intelligent RAM (IRAM)
PIM (processor-in-memory) or IRAM (intelligent RAM) approaches
couple processor execution with large, high-bandwidth, on-chip
DRAM banks.
PIM or IRAM merge processor and memory into a single chip.
Advantages:
- The processor-DRAM gap in access speed is expected to keep growing. PIM provides higher bandwidth and lower latency for (on-chip) memory accesses.
- DRAM can accommodate 30 to 50 times more data than the same chip area devoted to caches.
- On-chip memory may be treated as main memory, in contrast to a cache, which is just a redundant memory copy.
- PIM decreases energy consumption in the memory system due to the reduction of off-chip accesses.
VIRAM, CODE
V-IRAM-2: 0.13 µm, Fast Logic, 1GHz
16 GFLOPS(64b)/64 GOPS(16b)/128MB
[Figure: 2-way superscalar processor with 8K I cache and 8K D cache; vector instruction queue; vector registers; vector units (+, x, ÷, load/store) configurable as 8 x 64, 16 x 32 or 32 x 16; I/O and serial I/O ports; memory crossbar switch with 8 x 64 paths to banks of memory modules (M).]
NoC Processor Architecture
Network-on-chip, specialized PEs, advanced
interconnect technologies
Will use packet network architectures in 2010
[Figure: PEs, a PE array, a DSP and an on-chip memory controller attached to switch nodes of a packet network, with connections to external memory and external I/O.]
NoC Mescal Communication
Architecture General Paradigm
Mescal Communication Architecture is a general, coarse-grained on-chip
interconnection scheme for various system components such as Processing
Elements, memory and other communicating elements.
[Figure: processing elements (PE) with caches ($), memories (MEM), switches and a bridge.]
NoC Mescal Abstract System Architecture
[Figure: two processing elements, each issuing communication instructions (send/recv) to a communication assist, which performs on-chip-network operations over the on-chip network. Alongside, the layer stack: application, presentation, session, transport, network, data link, physical.]
NoC Communication Architecture
Translation of network operations to
packet switch operations
[Figure: on-chip-network operations pass through a packet assembler and a packet deassembler and become packet switch network operations on a packet switching network of nodes N0 to N7. Corresponding protocol stack: network layer, data link layer, physical layer.]
NoC: Example for a bus
Translation of network operations
to bus operations
[Figure: on-chip-network operations pass through bus interface adapters and become bus operations on the on-chip bus. Corresponding protocol stack: data link layer, physical layer.]
Today's Communication Architecture
Paradigms: Topology
Single and Shared Transport and Signalling
Channel
- p2p
- Bus
- Hierarchical bus
- Switch
  - Crossbar
  - Multistage…
- Ring
- Trees
- Network
  - Circuit switched
  - Packet switched without connection
  - Packet switched with connection…
Today's Communication Architecture
Paradigms: Topology
Split Transport and Signalling
- Transport
  - Topology (bus, h-bus, switch, ring, network…)
- Signalling (addresses and routing, services, synchronisms)
  - Associated channel
    - Topology
  - Common channel
    - Topology…
Protocol layer stack: software and process view of the
generation of hardware signalling requires mapping onto
actual interfaces
Today's Communications
Architecture Paradigms: Bandwidth
Application Granularity
- Fine grain
- Medium grain
- Coarse grain
Transport Granularity
- Bus sizes, transfer sizes
Traffic Characterization
- E.g. streaming, burstiness, interval requests, space-time distribution
Today's Communications Architecture
Paradigms: Protocols
Protocols
- High level signalling primitives mapping
- Communications to architecture mapping
- Access policies mapping, priorities, static, dynamic
- Traffic and flow control
  - Burstiness
  - Request intervals
  - Concurrency
Today's Communications
Architecture Paradigms: Signalling
Addressing, routing info
Service info
Hand-shake and command sync strobes
- High level signalling primitives mapping
- Communications to architecture mapping
- Access policies mapping, priorities, static, dynamic
- Traffic and flow control
  - Burstiness
  - Request intervals
  - Concurrency
  - Streaming ...
Com-arch Modelling: Ptolemy-Mescal
UC Berkeley Ptolemy I & II, Mescal, UCSD-Dey, PR-Vissers, Goossens, Lippen.., TIMA-Jerraya..
Components for channels:
- Synchronous digital bus (shared or point-to-point)
- ARM AMBA bus
- IBM CoreConnect bus
- Analog channel
Actors encapsulate the physical layer
Each actor has a common interface to make
experimentation possible
The Ptolemy actor interface is at a higher level than the
channel’s actual electrical interface
Com-arch Modelling: Ptolemy-Mescal
Components for CommAssists
- Queues
- Arbitrators
- PE interfaces
- Bus interfaces
- External memory or I/O cycle generators
- Switches
- Small memories
Parameterizable components
Programmable components
Designing a CA is very similar to designing a PE
Com-arch Modelling: Ptolemy-Mescal
Encapsulate a PE model as a composite actor
Combine with CA components to make a
Communicator
Encapsulate Communicator model as a
composite actor
Combine multiple Communicators with
Channel components to make a complete
system
Case study: Communication architecture in
HA-MPSoC
Mapping communicating processes and threads on HA-MPSoC
requires efficient ways of implementing the on-chip
communication
Previous work: comparative performance of different classes
of data communication architectures (San Diego)
But: The communication architecture can be split in: the data
communication architecture, and the signalling and
synchronization architecture
The impact of different signalling and synchronization
architectural options on the overall performance has not been
sufficiently studied
Our focus: Signalling in the HA-MPSoC
paradigm, split sync, SystemC modelling
New solutions for signalling and synchronization
in the HA-MPSoC paradigm
Based on a technique for modelling the
communication and synchronization architectures
using SystemC
High abstraction modelling based on the Kahn
Process Network Model of Computation
Here: Variations on Dey’s simple communication
architecture (bus)
Previous related work: UCSD-Dey
Analysis of the performance of various SoC
communication architectures under different
classes of on-chip communication traffic
Identifying parts of the application’s
“communication traffic space” for which different
communication architectures are well-suited
Methodology based on POLIS/PTOLEMY
Previous related work: Dey’s
communication architectures
Static Priority Based Shared Bus
Architecture
Two-level TDMA Based Architecture
Hierarchical Bus Architecture
Ring Based Architecture
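As a rough illustration of how the first two arbitration schemes differ, the sketch below grants a bus among masters with pending requests. It reflects our reading of the schemes, not Dey's models: we assume index 0 is the highest static priority, and that two-level TDMA means a timing wheel of reserved slots (level 1) whose unused slots are handed out round robin to other pending masters (level 2). All names are hypothetical, and the wheel is assumed to be non-empty.

```cpp
#include <vector>

// Static priority: grant the pending request with the lowest index
// (index 0 = highest priority, an assumption). Returns -1 if none pending.
int static_priority_grant(const std::vector<bool>& request) {
    for (int i = 0; i < static_cast<int>(request.size()); ++i)
        if (request[i]) return i;
    return -1;
}

// Two-level TDMA sketch: level 1 is a timing wheel of reserved slots,
// level 2 reassigns an unused slot round robin to another pending master.
struct TwoLevelTdma {
    std::vector<int> wheel;  // wheel[slot] = master that owns the slot
    int slot = 0;            // current position on the timing wheel
    int rr = 0;              // round-robin pointer for the second level

    int grant(const std::vector<bool>& request) {
        const int owner = wheel[slot];
        slot = (slot + 1) % static_cast<int>(wheel.size());
        if (request[owner]) return owner;        // level 1: slot owner wins
        const int n = static_cast<int>(request.size());
        for (int k = 0; k < n; ++k) {            // level 2: round robin
            const int cand = (rr + k) % n;
            if (request[cand]) { rr = (cand + 1) % n; return cand; }
        }
        return -1;                               // bus idle in this slot
    }
};
```

With a wheel of {0, 1, 2, 3} and only masters 0 and 3 requesting, the static scheme would always grant master 0, while this TDMA sketch alternates ownership and fills the idle slots of masters 1 and 2.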
Abstracting high level communication
KPN: concurrent tasks interconnected by channels
(FIFOs)
Processes have to share service administrative
information related to the FIFOs
Administrative information divided in two parts:
static and dynamic information
The update of the dynamic information of the
FIFO is the synchronization aspect of the
complete signalling function
A simple KPN example
Producer
FIFO
Consumer
Administrative information
- Base address memory
- FIFO size
- Number of data in FIFO
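The administrative information listed above could be held in a record like the following. This is a sketch under assumptions: we take the base address and the size as the static part and the fill count as the dynamic part, and all type and member names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Static part: fixed when the channel is set up (assumed split).
struct FifoStaticInfo {
    std::uintptr_t base_address;  // base address of the buffer in memory
    std::size_t size;             // FIFO capacity
};

// Dynamic part: changes with every transfer. Updating it is the
// synchronization step between producer and consumer.
struct FifoDynamicInfo {
    std::size_t count = 0;        // number of data items currently stored
};

struct FifoAdmin {
    FifoStaticInfo fixed;
    FifoDynamicInfo state;

    bool can_write(std::size_t n) const { return state.count + n <= fixed.size; }
    bool can_read(std::size_t n) const { return n <= state.count; }
    void produced(std::size_t n) { state.count += n; }  // producer-side update
    void consumed(std::size_t n) { state.count -= n; }  // consumer-side update
};
```

Only `state` has to be kept coherent between the two processes at run time, which is why where it is stored and how it is updated matters for the signalling architecture.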
Signalling Primitives in MPSoC
For flexibility and scalability, a protocol for
communicating tasks is needed
Set of primitives for data communication and
synchronization.
The Eclipse (Philips Research) example:
- Primitives for data communication:
void Read(int port_id, int offset, int n_bytes, Bytes *bytevector)
void Write(int port_id, int offset, int n_bytes, Bytes *bytevector)
- Primitives for data synchronization:
bool GetSpace(int port_id, int n_bytes)
void PutSpace(int port_id, int n_bytes)
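A minimal sketch of how these four primitives could cooperate on one channel follows. It is not the Eclipse implementation. We assume that GetSpace on the producer side asks for free room and on the consumer side asks for available data, that PutSpace commits written bytes or releases read bytes, and that port 0 is the producer and port 1 the consumer. The `Channel` class, the `Byte` type and the use of `std::size_t` instead of `int` are choices made for this example.

```cpp
#include <cstddef>
#include <vector>

using Byte = unsigned char;

// Hypothetical single-producer, single-consumer channel.
// Port 0 is the producer (output) side, port 1 the consumer (input) side.
class Channel {
public:
    explicit Channel(std::size_t size) : buf_(size) {}

    // Producer: is there room for n bytes? Consumer: are n bytes available?
    bool GetSpace(int port_id, std::size_t n) const {
        return port_id == 0 ? filled_ + n <= buf_.size() : n <= filled_;
    }
    // Producer: commit n written bytes. Consumer: release n read bytes.
    void PutSpace(int port_id, std::size_t n) {
        if (port_id == 0) { wr_ = (wr_ + n) % buf_.size(); filled_ += n; }
        else              { rd_ = (rd_ + n) % buf_.size(); filled_ -= n; }
    }
    // Data transfer inside the window previously granted by GetSpace.
    void Write(int /*port_id*/, std::size_t offset, std::size_t n, const Byte* src) {
        for (std::size_t i = 0; i < n; ++i)
            buf_[(wr_ + offset + i) % buf_.size()] = src[i];
    }
    void Read(int /*port_id*/, std::size_t offset, std::size_t n, Byte* dst) const {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = buf_[(rd_ + offset + i) % buf_.size()];
    }

private:
    std::vector<Byte> buf_;
    std::size_t wr_ = 0, rd_ = 0, filled_ = 0;
};
```

A producer would call GetSpace, Write, then PutSpace; a consumer GetSpace, Read, then PutSpace. Data written but not yet committed with PutSpace stays invisible to the consumer, which keeps data transport separate from synchronization.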
Our SystemC-based Modelling
Executable specification of a system described in different
abstraction levels (functional untimed, timed, transaction
level and cycle-true)
TLM is a natural method to perform system level
performance simulation
SystemC Master/Slave library hides the more complex
details of C++ programming and fits well for TLM
development
The design time of complex MPSoC models can be greatly
shortened using the SystemC Master/Slave library
Application modelling
Chain of P processors interconnected through
FIFOs
Simulation parameters: number of processes (P),
token size (data-granularity), request intervals,
waiting cycles, transfer cycles, execution time,
total simulation time
[Figure: pipeline Pin -> FIFO_1 -> P_1 -> … -> P_(P-2) -> FIFO_(P-1) -> Pout]
Average Communication rate
Static Priority Based Shared Bus Architecture
[Figure: average communication rate (axis 0 to 250) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500, 1000.]
Average Communication rate
Two-level TDMA Based Architecture
[Figure: average communication rate (axis 0 to 250) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500, 1000.]
Average Communication rate
Hierarchical Bus Architecture
[Figure: average communication rate (axis 0 to 350) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500, 1000.]
Average Communication rate
Ring Based Architecture
[Figure: average communication rate (axis 0 to 350) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500, 1000.]
Reminder of Dey’s communication
architectures
Static Priority Based Shared Bus
Architecture
Two-level TDMA Based Architecture
Hierarchical Bus Architecture
Ring Based Architecture
Experiments: Additional models of
communication architectures
[Figure: overview panels of the additional models. Each panel shows processors P1 to P4 with Wd blocks, a memory (MEM) and an arbiter (ARB); the variants add a central SYNC module, combined Wd-Ws blocks, or separate Ws blocks with or without their own arbiter.]
Centralized architecture using shared
memory (Mem)
[Figure: P1 to P4, each with a Wd block, on a bus with an arbiter (ARB) and a shared memory (MEM) that also holds the sync information.]
Centralized architecture using a central
synchronization module (Central)
[Figure: P1 to P4, each with a Wd block, on a bus with an arbiter (ARB), a memory (MEM) and a central SYNC module.]
Distributed architecture, same bus for data
transport and synchronization (Single-Bus)
[Figure: P1 to P4, each with a combined Wd-Ws block, on a single bus with an arbiter (ARB) and a memory (MEM).]
Distributed architecture, splitting data
transport bus and sync bus (2-Busses)
[Figure: P1 to P4, each with a Wd block on the data bus (with ARB and MEM) and a Ws block on a separate sync bus with its own arbiter (ARB).]
Distributed architecture with ring
topology for synchronization (Ring)
[Figure: P1 to P4, each with a Wd block on the data bus (with ARB and MEM) and a Ws block; the Ws blocks are connected in a ring.]
Implementation example: Philips Eclipse
architecture instance for video coding
Additional measurements
Quantify which synchronization topology allows
the shortest execution time for an application, i.e.
which is the most efficient from the performance point of
view
The Coprocessor Usage percentage figure (Ucop):
%Ucop = (Texec/Tsim) · 100
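The usage figure is a plain ratio, sketched below for reference. The function name is hypothetical; we assume Texec is the time a coprocessor spends executing and Tsim is the total simulation time, both in the same unit.

```cpp
// Coprocessor usage in percent: the share of the total simulated time
// (t_sim) that the coprocessor spends executing (t_exec).
double coprocessor_usage_percent(double t_exec, double t_sim) {
    return (t_exec / t_sim) * 100.0;
}
```

For example, 250 cycles of execution within a 1000-cycle simulation give a usage of 25%; the remaining time is spent waiting on communication and synchronization.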
Coprocessor Usage, P = 4
[Figure: coprocessor usage in % (axis 0 to 10) versus token size (1, 4, 8, 16) for the 2-busses, Ring, Single-bus, Mem and Central architectures.]
Coprocessor Usage, P = 8
[Figure: coprocessor usage in % (axis 0 to 5) versus token size (1, 4, 8, 16) for the 2-busses, Ring, Single-bus, Mem and Central architectures.]
Conclusions
Increasing importance of communication
architecture, MPSoCs <-> NoCs
Design space exploration extended with
communication-architectures
SystemC master/slave library powerful modelling
tool
Large performance spread found due to
communication topologies, signalling protocols,
and traffic characteristics
Need for more qualitative and quantitative
modelling, analysis, studies, tools
Consider splitting transport and signalling
Hierarchical buses, rings, plus splitting ++