Design & Co-design of Embedded
Systems
The ODYSSEY Methodology:
ASIP-Based Design of Embedded Systems from
Object-Oriented System-Level Models
Maziar Goudarzi
1
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
2
Embedded Systems Market
• Rapidly growing market
– Compound Annual Growth Rate (CAGR) of 17.3%
The future of computing resides in embedded computing
3
Market Life Cycle
• A delay in reaching the market window causes a large
revenue loss
Source: Agilent Technologies
4
Motivation:
Design Automation
• Conclusion:
– Design Automation Tools & Methodologies are
needed for Embedded System Design
• Question:
– At what level of abstraction?
5
The Design Productivity Gap
6
Motivation:
Electronic System-Level (ESL) design
• Solution:
– Raise the level of abstraction
• Historical examples:
– Place & route tools
– Hardware description languages
– Hardware synthesis
• Latest suggestion:
– ESL
• Spans both SW and HW
Source: Monterey Design Systems
7
Motivation
• Conclusion:
– The embedded system industry is in need of
ESL Design Methodologies
and supporting Design Automation Tools
• Question:
– How to specify, implement, and validate the
embedded system?
8
The First Challenge in ESL:
Specification
• Alternatives:
– Extend HW modeling (e.g. VHDL) to SW
– Extend SW modeling (e.g. Java) to HW
– Use HW/SW-neutral or mathematical models (e.g.
Codesign FSM)
• Observations:
– Software accounts for 80% of embedded system
development cost [ITRS-2003]
– Technology trend toward software-based specification:
• Catapult (Mentor Co.)
• Agility Compiler & DK Design Suite (Celoxica Co.)
• Cascade (CriticalBlue Co.)
9
ESL Challenges (cont’d)
• Conclusion:
– Object-oriented design methodology is a
reasonable answer
• Questions:
– What about other ESL challenges?
• Implementation, verification, automation of the design
• To be discussed later in the talk…
10
Thesis of this work
There is scope to raise the abstraction-level of
processors when designing embedded systems,
and furthermore,
such a raise helps to address modelling, implementation,
and reuse challenges in the design and design automation of modern embedded systems.
11
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
12
Related Work
• OO used for hardware modeling
– Extensions of VHDL
• A myriad of proposals
– Objective-VHDL, several flavours of OO-VHDL, SUAVE
• Only a few consider synthesis
– Java
• HW components viewed as objects
• Signals travelling among components viewed as objects
– C++
• SystemC
• CynLib from CynApps
13
Related Work (cont’d)
• OO used for hardware modeling (cont’d)
– Modeling is good, but synthesis is the major concern
• Major approaches to OO synthesis
– ODETTE
– OASE
– Enodia® Architecture
– Not in our area of work:
• Wolf’s OO Co-synthesis
• Matisse
• jHISC
14
The ODETTE Approach
• ODETTE proposal:
– View objects as Finite-State Machines (FSM)
– Object attributes: FSM state variables
– Method calls: FSM state transitions
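To make the ODETTE mapping concrete, here is a minimal C++ sketch of a hypothetical counter object viewed as an FSM. It illustrates the idea only: the class, its methods, and the encoding are assumptions, not ODETTE's generated hardware.

```cpp
#include <cassert>

// Hypothetical class with two attributes; in the ODETTE view they form
// the state vector of the object's FSM.
struct CounterState {
    int value;     // attribute -> FSM state variable
    bool enabled;  // attribute -> FSM state variable
};

// Each method of the (hypothetical) class is an input symbol of the FSM.
enum class Method { Enable, Disable, Increment };

// Next-state function: calling method m on an object in state s moves it
// to the returned state, i.e., a method call is a state transition.
CounterState transition(CounterState s, Method m) {
    switch (m) {
        case Method::Enable:    s.enabled = true;  break;
        case Method::Disable:   s.enabled = false; break;
        case Method::Increment: if (s.enabled) ++s.value; break;
    }
    return s;
}
```

In hardware, every object would get its own copy of this state register and next-state logic, which is where ODETTE's per-object area overhead comes from.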
15
The ODETTE Object FSM
16
Polymorphism in ODETTE
17
Analysis of ODETTE
• Nice, but very high overhead
– One FSM per object => high area and power overhead: $O(n_o)$, where $n_o$ is the number of objects
– Polymorphism: replication => high area and power overhead
+ Maximum potential concurrency
– (Apparently) FSM => sequential method calls inside objects
• Q: What if a method calls another one?
• Q: How to extend to HW/SW systems?
18
The OASE Approach
• OASE Proposal:
– Reuse and customize behavioural synthesis
techniques
– Static analysis & transformation of the OO code
– Converts OO constructs to non-OO ones
• Access to object attributes
• Non-virtual method calls
• Virtual method calls (polymorphism)
19
OASE Transformation Process
[Figure: ‘e’ source → Scanner / Parser → Syntax Tree and Symbol Tables → Semantic Analysis → Control Flow Analysis (Control Flow Graph) → Data Flow Analysis → Concurrency Analysis → Output of Intermediate Format → Verilog]
The transformation process from ‘e’ to Verilog [Kuhn et al., DAC’01]
20
Polymorphism in OASE
Reference variable | Object set
x | S1, S2
y | S2
z | S1, S2, S3
Results of static analysis
switch (z) {
  case S1: S1_foo(); break;
  case S2: S2_foo(); break;
  case S3: S3_foo(); break;
}
An example in e language [Kuhn et al., DAC’01]
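The effect of this transformation can be sketched in C++. This illustrates the idea under stated assumptions and is not OASE's actual output: the explicit class tag, the `Object` struct, and the bodies of `S1_foo`/`S2_foo`/`S3_foo` are hypothetical.

```cpp
#include <cassert>

// Assumed type tag carried by every object so the dispatch can be static.
enum class ClassTag { S1, S2, S3 };

struct Object {
    ClassTag tag;  // dynamic class of the object
    int data;      // hypothetical attribute
};

// Hypothetical method bodies of the three candidate classes.
int S1_foo(const Object& o) { return o.data + 1; }
int S2_foo(const Object& o) { return o.data * 2; }
int S3_foo(const Object& o) { return -o.data; }

// Non-OO replacement for z->foo(): static analysis proved z can only
// point to S1, S2, or S3 objects, so one case per class suffices and
// each callee can be inlined.
int dispatch_foo(const Object& z) {
    switch (z.tag) {
        case ClassTag::S1: return S1_foo(z);
        case ClassTag::S2: return S2_foo(z);
        case ClassTag::S3: return S3_foo(z);
    }
    assert(false && "class outside the statically computed object set");
    return 0;
}
```

In hardware, each polymorphic call site needs its own copy of such a switch together with the inlined method bodies.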
21
Analysis of OASE
• Nice extension of behavioural synthesis to
OO, but still high overhead for
polymorphism
– Area/power overhead: O(nonmc)
22
The Enodia® Architecture
• Silicon Infusion Co. (UK startup)
• Enodia Proposal:
– Bottom-up composition of a variety of their IP
cores
– An Object-Orientated SoC architecture
• Patented in UK and US
23
Enodia® E9610 product
Internal architecture of Enodia E9610 chip [Silicon Infusion Co., 2004]
24
Analysis of Enodia®
• Patent on high-performance caching
• Chip architecture very similar to ours, but
– uses firmware for polymorphism =>
performance overhead
– Bottom-up approach => one manual chip
design per application domain
25
Summary & Comparison
Criterion | ODETTE | OASE | Enodia
Impl. style | ASIC | ASIC | Heterogeneous multiprocessor
Synthesis approach | Per-object method replication | Static analysis + inlining | Multiple objects per method impl.
Language | Objective-VHDL, SystemC-Plus | Java, SystemC, e | N.A.
Optimization | Dead-code removal | Object reachability | N.A.
Polymorphism | Method replication & multiplexing | Method inlining | Firmware
26
Summary & Comparison (cont’d)
Criterion | ODETTE | OASE | Enodia
HW-SW? | Not provided | Stub generation | SW on multiprocessor
Model of concurrency | Objects invoked from processes | Multiple processes in modules | N.A.
Dynamic (de)allocation | Not supported | Not supported | Supported
27
Summary & Comparison (cont’d)
• Major shortcomings
1. Viewing objects as structural components
2. Overly verbose languages
3. Unacceptable area/power overhead
4. No or unclear path toward HW-SW systems
5. HW designers’ reluctance to OO
• We propose ODYSSEY
– Object-oriented Design and sYntheSiS of Embedded sYstems
28
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
29
ASIP vs. ASIC
Source: K. Keutzer, S. Malik, R. Newton, “From
ASIC to ASIP: The Next Design Discontinuity”,
ICCD, 2002
Application-specific instruction-set processors (ASIPs)
are replacing ASICs
30
OO-ASIP:
Object-Oriented ASIP
• Our proposal:
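– Let the methods of a class library be the instruction set of a processor
[Figure: The class library defines classes A (attribute i: int; methods f(), g()), B (attribute c: char; methods f(), h()), and C (attribute f: float; methods g(), k()). In the OO-ASIP, objects a1, a2, and b1 reside in data memory, while instruction memory holds method calls such as b1.h(), a2.g(), and ap->f().]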
31
OO-ASIP vs. Traditional Processors
• An OO-ASIP whose class library offers only int/float classes = a traditional processor
• Differentiating features
– OO-ASIP instructions can call one another
– OO-ASIP instructions can be implemented in software as well as in hardware
– Big instructions:
• Independent execution units for each HW instruction
• Dynamic power management by deactivating instructions that are not running
• Dynamic area management by caching the most-recently-run instructions
– OO-ASIP implements polymorphism in hardware
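The dynamic area management idea can be pictured as a least-recently-used (LRU) cache of hardware instructions. This sketch is an assumption for illustration: the `InstructionCache` class, its capacity, and the string instruction IDs are hypothetical, and the slides do not specify the replacement policy beyond keeping the most-recently-run instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical model: only `capacity` hardware instructions (method
// implementations) fit on chip at once; running a non-resident one
// evicts the least-recently-run instruction.
class InstructionCache {
public:
    explicit InstructionCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Records a run of `instr`; returns true on a hit (already resident).
    bool run(const std::string& instr) {
        auto it = where_.find(instr);
        if (it != where_.end()) {
            // Hit: move the instruction to the most-recently-run position.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == capacity_) {
            // Miss with a full area budget: evict the least-recently-run one.
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(instr);
        where_[instr] = order_.begin();
        return false;
    }

    bool resident(const std::string& instr) const {
        return where_.count(instr) != 0;
    }

private:
    std::size_t capacity_;
    std::list<std::string> order_;  // front = most recently run
    std::unordered_map<std::string, std::list<std::string>::iterator> where_;
};
```

For example, with room for two instructions, running A::f, A::g, A::f, B::h leaves A::f and B::h resident and evicts A::g.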
32
OO-ASIP vs. Other ASIPs
• Typical ASIP-design flow
[Figure: Applications and Design Constraints → Application Analysis → Architectural Design-Space Exploration → Instruction-set Generation → Code Synthesis (→ Object code) and Hardware Synthesis]
Source: M. K. Jain, M. Balakrishnan, A. Kumar, “ASIP Design
Methodologies: Survey and Issues”, VLSI-Design Conf., 2001.
• Disadvantage
– No guarantee that the resulting ASIP suits future, different (but related) applications
• OO-ASIP: future related applications will use today’s class library
33
Design-Space Represented by
OO-ASIP
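[Figure: Given an OO application with N_o objects, the design space represented by OO-ASIPs has two axes: the number of objects per OO-ASIP (1, 2, …, N_o) and the style of methods (from all HW to all SW). Implementation by a traditional processor and the ODETTE implementation appear as points in this space.]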
34
Design Flow using OO-ASIPs
[Figure: Two flows. OO-ASIP design flow: a hardware class library goes through OO-ASIP synthesis to produce the OO-ASIP, and disciplined benchmarking of (OO-ASIP, HW class library) pairs fills a database. OO-ASIP reuse flow: choose a suitable class library from the database, model and verify the application, and compile it toward the OO-ASIP with its instruction and data memories.]
35
Design Flow using OO-ASIP:
Another View
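[Figure: The application is modeled in software (C++) on top of a system class library made of a software class library and a hardware class library. Along the ASIP programming path, the application becomes software for the OO-ASIP, whose ISA consists of hardware-class methods such as f, g, and k. Along the ASIP synthesis path, the hardware class library, written in SystemC (C++), is synthesized into the ASIP hardware.]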
36
Programming the OO-ASIP
• Requirements on the OO-ASIP compiler
– Retargetable to various OO-ASIPs
– Retargetable to various processor cores
– Capable of early hardware-software
co-validation
• Our solution:
– Source-to-source transformation
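One way to picture such a source-to-source transformation is the following C++ sketch. It is hypothetical: the packet layout, the method ID, and the names `B_h_stub` and `hw_execute` are assumptions, not output of the ODYSSEY compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// In the input model, a call such as
//     int r = b1.h(7);
// targets a method mapped to hardware. The pass rewrites the call site
// into a call to a generated stub that packs <method, object, params>.
constexpr std::uint16_t kMethod_B_h = 2;  // hypothetical method ID

struct Packet {
    std::uint16_t method_id;
    std::uint16_t object_id;
    std::vector<std::int32_t> params;
};

// Stand-in for the hardware side (e.g., a co-simulation model). Here it
// implements B::h as "object_id + first parameter" purely for testing.
std::int32_t hw_execute(const Packet& p) {
    assert(p.method_id == kMethod_B_h && !p.params.empty());
    return static_cast<std::int32_t>(p.object_id) + p.params[0];
}

// Generated stub that replaces the original b1.h(x) call site.
std::int32_t B_h_stub(std::uint16_t object_id, std::int32_t x) {
    Packet p{kMethod_B_h, object_id, {x}};
    return hw_execute(p);
}
```

Because only the stub body changes between a co-simulation model and the real chip, this style of rewriting is one plausible way to meet the retargetability and early co-validation requirements listed above.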
37
38
The ODYSSEY Ultimate Goal
• The ODYSSEY target chip:
FPGA-like array of
OO-ASIPs
• Interconnection:
– Packet-routing network
– Motivation:
• Network-on-Chip viewed as the future paradigm in deep-submicron (DSM) technologies
[Figure: the ODYSSEY System-Synthesizer generates an on-chip network of OO-ASIPs (OO-ASIP1 to OO-ASIP4) and processors, interconnected through routers]
39
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
40
A Simple
OO-ASIP Architecture
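[Figure: The OO-ASIP consists of a Method Invocation Unit (MIU), which receives instructions from instruction memory and uses the VMT and OTT tables; an Object Management Unit (OMU) connected to data memory; functional units (FUs) implementing A::f(), A::g(), and B::h() in hardware; and a traditional processor running the B::f() routine in software. Class A has methods f() and g(); class B has methods f() and h().]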
41
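The MIU's task of binding a call to an implementation can be sketched as two table lookups. This C++ sketch assumes that the OTT acts as an object-to-class table and that the VMT maps a (class, method) pair to either a hardware FU or a software routine; it also assumes B derives from A, so B::g() reuses A::g()'s unit. None of these encodings are taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Where { HardwareFU, SoftwareRoutine };

struct Target {
    Where where;       // executed by a functional unit or by the processor
    std::string impl;  // label of the implementation, e.g. "A::f"
};

// ASSUMED table shapes (illustrative, not the actual MIU encoding).
using Ott = std::vector<std::string>;  // object id -> class name
using Vmt = std::map<std::pair<std::string, std::string>, Target>;  // (class, method) -> impl

// Dynamic binding: the receiver's class comes from the OTT, then the
// (class, method) pair selects the implementation from the VMT.
// Throws std::out_of_range for an unknown object or method.
Target invoke(const Ott& ott, const Vmt& vmt, std::size_t object_id,
              const std::string& method) {
    const std::string& cls = ott.at(object_id);
    return vmt.at({cls, method});
}
```

With object 0 an A and object 1 a B, invoking f on object 1 selects the software routine B::f, while f on object 0 selects the A::f functional unit.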
Case Study 1:
Traffic-Light Controller
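[Class diagram: base class traffic_light (attributes status: int, elapsed_time: int; methods open(), close(), timekeeper()) with subclasses farmroad_light (fixed_green: int) and highway_light (min_green: int); the subclasses also declare open() and close().]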
All methods implemented in hardware
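The class diagram roughly corresponds to the following C++ hierarchy. Only the class, attribute, and method names come from the slide; the status encoding, method bodies, default values, and the reading of fixed_green/min_green as green-time thresholds are assumptions. The subclasses' open() overrides are omitted because the sketch gives them no distinct behavior.

```cpp
#include <cassert>

class traffic_light {
public:
    static constexpr int RED = 0, GREEN = 1;  // assumed status encoding

    virtual ~traffic_light() = default;
    virtual void open()  { status = GREEN; elapsed_time = 0; }
    virtual void close() { status = RED;   elapsed_time = 0; }
    void timekeeper()    { ++elapsed_time; }  // assumed: one call per tick

    int status = RED;
    int elapsed_time = 0;
};

class farmroad_light : public traffic_light {
public:
    int fixed_green = 2;  // assumed: green period in ticks
    // Assumed policy: refuse to close before the fixed green period ends.
    void close() override {
        if (elapsed_time >= fixed_green) traffic_light::close();
    }
};

class highway_light : public traffic_light {
public:
    int min_green = 3;  // assumed: minimum green time in ticks
    // Assumed policy: stay green for at least min_green ticks.
    void close() override {
        if (elapsed_time >= min_green) traffic_light::close();
    }
};
```

Calls through a traffic_light reference, such as t.close(), are virtual, which is exactly the polymorphism the OO-ASIP resolves in hardware.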
42
Case Study 1:
Traffic-Light Controller
[Chart: share in total area (%, axis 0–30) of each hardware module: the MIU, the OMU, and the method implementations of traffic_light, farmroad_light, and highway_light]
Values reported by the LeonardoSpectrum tool for a sample 0.5 µm process
43
Case Study 1:
Traffic-Light Controller
[Chart: power consumption (nW, axis 0–200) without and with powerdown for three scenarios: a highway crossing, a junction with 4 farmroad_light objects, and a junction with 4 highway_light objects; annotated reductions of 15% and 20%]
Values estimated by the Synopsys PowerCompiler tool for a 1 µm process with a 5 V operating voltage
44
Analysis of the Architecture
• Area/Power management
– Static (application-specific) policy
– Dynamic (application-independent) policy
• Polymorphism overhead
– Performance improved by HW MIU
– Area/power overhead still present
45
Our Solution:
Network-on-Chip Architecture
• Dispatch virtual methods while packets are routed on an on-chip network
[Figure: The OO-ASIP attaches a processor, the Object Management Unit (OMU, connected to data memory), and functional units (FUs) implementing A::f(), A::g(), and B::h() to an on-chip network; instructions come from instruction memory. Class A has methods f() and g(); class B has methods f() and h().]
46
NoC: Network-on-Chip
• NoC emergence:
– Fully synchronous designs are no longer feasible
– Unreliable communication in very deep submicron technologies (90 nm and beyond)
– Solution: leverage computer networks and protocols for communication inside chips
– NoC seems unavoidable
Reference: L. Benini, G. De Micheli, “Networks on Chips: A New SoC Paradigm,” IEEE Computer, 35(1):70-78, 2002.
47
Ordinary-Method Dispatch
by Network Routing
• FU-identifier:
FU=<method.class>
• Object-identifier:
object=<class.num>
• Method call = invoke a method on an object:
<method.object> = <method.<class.num>>
= <<method.class>.num> = <FU.num>
= a packet destined for the network node whose address is FU
48
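The identifier algebra above can be sketched with packed bit fields. The field widths (8 bits for the class and object-number fields) and the helper names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;

// Assumed layout: class and object number each occupy 8 bits.
constexpr u32 fu_id(u32 method, u32 cls) { return (method << 8) | cls; }  // <method.class>
constexpr u32 object_id(u32 cls, u32 num) { return (cls << 8) | num; }    // <class.num>

struct Packet {
    u32 dest_fu;  // network address of the destination node (the FU)
    u32 num;      // object number, carried as packet payload
};

// <method.object> = <method.<class.num>> = <<method.class>.num> = <FU.num>:
// the class bits of the object are regrouped with the method to form the
// FU address, and the remaining object number travels in the packet.
constexpr Packet method_call(u32 method, u32 object) {
    return Packet{fu_id(method, object >> 8), object & 0xFFu};
}
```

For instance, object 7 of class 3 receiving method 5 becomes a packet addressed to FU <5.3> carrying object number 7.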
Virtual-Method Dispatch
by Network Routing
• To dynamically bind a method call
(e.g. objp->method(params) in C++)
1. Assemble a packet as <method, objp, params>
2. Send it over the on-chip network
3. The return value (if any) is sent back as another packet
49
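A toy software model of dispatch by routing follows. It assumes objp is an object identifier <class.num> with the class in the upper bits, and that the router holds a table from <method, class> to FUs; the `Router` class, the packet structs, and the bit layout are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using u32 = std::uint32_t;

// objp is ASSUMED to be <class.num>: class in bits 8 and up, object
// number in the low 8 bits.
struct CallPacket  { u32 method; u32 objp; std::vector<int> params; };
struct ReplyPacket { u32 objp; std::optional<int> value; };  // empty if void

// A functional unit receives the object number and the parameters.
using FunctionalUnit =
    std::function<std::optional<int>(u32, const std::vector<int>&)>;

class Router {
public:
    // Registers the FU serving <method, cls>. An inherited method is
    // registered by pointing the derived class's route at the base FU.
    void add_route(u32 method, u32 cls, FunctionalUnit fu) {
        routes_[{method, cls}] = std::move(fu);
    }

    // Dynamic binding happens here: the destination depends on the class
    // carried inside objp, not on the static type at the call site.
    ReplyPacket deliver(const CallPacket& p) const {
        const u32 cls = p.objp >> 8;
        const u32 num = p.objp & 0xFFu;
        const FunctionalUnit& fu = routes_.at({p.method, cls});
        return ReplyPacket{p.objp, fu(num, p.params)};
    }

private:
    std::map<std::pair<u32, u32>, FunctionalUnit> routes_;
};
```

Sending the same method to a class-A object and a class-B object reaches different FUs without any dispatch code at the call site.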
Case Study 2:
A Codec Engine
[Class diagram: base class data_block (data[20]: byte; print(), encode(), decode()) with subclasses xor_encoded_data (cypher: byte; convert_char(byte), encode(), decode()) and swap_encoded_data (swap(byte, byte), encode(), decode()); the legend distinguishes hardware methods from software methods.]
50
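A C++ sketch of the codec hierarchy follows. The names mirror the class diagram, with `byte` represented as `std::uint8_t` (alias `u8`). The XOR-with-cypher and pairwise-swap bodies are guesses based on the class names, swap() takes references (a by-value swap(byte, byte) could not modify the data), print() is omitted, and the hardware/software split is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

using u8 = std::uint8_t;  // stands for the slide's `byte`

class data_block {
public:
    virtual ~data_block() = default;
    virtual void encode() = 0;
    virtual void decode() = 0;
    std::array<u8, 20> data{};  // data[20]: byte
};

class xor_encoded_data : public data_block {
public:
    u8 cypher = 0x5A;  // assumed default key
    u8 convert_char(u8 c) const { return static_cast<u8>(c ^ cypher); }
    void encode() override { for (u8& b : data) b = convert_char(b); }
    // XOR with the same key is its own inverse.
    void decode() override { for (u8& b : data) b = convert_char(b); }
};

class swap_encoded_data : public data_block {
public:
    void swap(u8& a, u8& b) { std::swap(a, b); }
    // Assumed scheme: exchange each adjacent pair of bytes.
    void encode() override {
        for (std::size_t i = 0; i + 1 < data.size(); i += 2) swap(data[i], data[i + 1]);
    }
    // Pairwise swapping is its own inverse.
    void decode() override { encode(); }
};
```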
Case Study 2:
Implementation in SystemC
51
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
52
Input-Output Correspondence
[Figure: mapping from the system model (C++) to the system implementation (SystemC). Class attributes map to the Object-Management Unit (OMU) of the OO-ASIP; HW-methods map to HW-method implementations; SW-methods and the main() function map to SW-method implementations and thread__main() in the processor module; all parts are connected by the on-chip network.]
53
Big Picture of
Tool Flow
[Figure: System Model (C++) → Parsing + Analysis → Partitioning. OO-ASIP synthesis: HW-method transformations → HW-structure generator → Hardware (SystemC) plus instruction-set extensions. OO-ASIP compilation: SW-method transformations → SW-structure generator → Software (C++). These stages form system-level synthesis. Downstream synthesis: SystemC synthesis yields gate-level HW, and the traditional processor’s C++ compiler yields binary SW; together they form the final system.]
54
HW-SW
Co-simulation
Model
[Figure: the same tool flow, in which the Hardware (SystemC) and Software (C++) outputs of system-level synthesis together form the HW-SW co-simulation model, available before downstream synthesis]
55
Experiments on
Co-simulation Performance*
[Chart: imposed co-simulation overhead (%, logarithmic axis 1–10000) plotted against method-call frequency (in units of 100 calls/s) and attribute-access frequency (in units of 10K accesses/s)**]
* All experiments were done on a Celeron 2.0 GHz processor with 256 MB of RAM
** Worst case assumed: all methods are implemented in hardware
56
Analysis of Experimental Results
• High method calls per second (MC/s) = high communication/computation ratio
= most of the time is spent in communication instead of computation
= potentially low performance in the final implementation
• Conclusion:
– Low co-simulation performance ~ potentially low final performance
=> Hint to the designer: decrease the communication/computation ratio (e.g., by
combining methods)
57
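The reasoning above can be made concrete with a simple first-order model. This is an illustrative assumption, not a model taken from the experiments: suppose the application makes N method calls, each costing t_c of communication (packet transfer), and performs C of useful computation. If k methods that are always invoked together are combined into one, the call count drops to N/k while C is assumed unchanged:

```latex
\rho = \frac{N\,t_c}{N\,t_c + C}, \qquad
\rho_k = \frac{(N/k)\,t_c}{(N/k)\,t_c + C} = \frac{N\,t_c}{N\,t_c + k\,C} < \rho
\quad (k > 1,\; C > 0)
```

For example, if N t_c = C, then ρ = 1/2, and merging pairs of methods (k = 2) gives ρ_2 = 1/3.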
Outline
• Motivation
• Related Work
• ODYSSEY: Theory
• ODYSSEY: Implementation
• ODYSSEY: Design Automation
• Summary and Conclusion
58
Summary
• An ESL design methodology for embedded systems
was
– developed
– implemented
– automated
• The main thrusts:
– The design methodology
– Raising the abstraction level of the processor ISA
– The OO-ASIP processor
59
Further Research
• Currently going on:
– Case studies on real-life industrial apps.
• JPEG codec (Morteza NajafVand)
• MPEG decoder (Naser MohammadZadeh)
– Object-aware cache
• Application-specific data prefetching in hardware (Mehdi Modarressi)
– Synthesis of a Multiprocessor OO-ASIP (Hani JavanHemmat)
– RT-Level co-simulation (Ms. Zeinolabedini)
– Using IP-Cores in OO-ASIPs (Ms. Hashemi)
– Fault-Tolerance by software standby sparing
– Assertion-based verification
• A few others
– Application-specific memory synthesis for OO-ASIP
– Fault-tolerance by dynamic reconfiguration using polymorphism
– Multithreaded OO-ASIP
60
Conclusion
There is scope to raise the abstraction-level of
processors when designing embedded systems,
and furthermore,
such a raise helps to address modelling, implementation,
and reuse challenges in the design and design automation of modern embedded systems.
61
Supplementary Material
62
Supplements
• FDL’03 Poster
• Presentation at Oldenburg
• Progress Report 1 at Department of High-
Tech. Industries, Ministry of Industries and
Mines
63