PROCEED: Pareto Optimization-based
Circuit-level Evaluation Methodology
for Emerging Devices
ASP-DAC 2014
Shaodi Wang, Andrew Pan, Chi-On Chui and Puneet Gupta
Department of Electrical Engineering, University of California,
Los Angeles
Need for Device Evaluation
• Traditional CMOS technologies are reaching physical limits
• Many alternative emerging devices are under investigation: TFET, CNT, heterogeneous CMOS, etc.
→ Need to be able to quickly compare them to guide technology development
• How should we compare emerging devices?
− Comprehensive, systematic, and automated comparison in the context of how they are going to be used
− Account for various design types and circuit-level optimizations
− Fast and flexible evaluation framework
− Cover a wide performance range (kHz to GHz)
NanoCAD Lab
Prior Work
• Three classes of work
– Device level [1, 6]: Ion/Ioff, subthreshold slope (SS), CV/I, CV²
– Canonical circuit level: simple circuits plus an analytical-model-based [4] power-delay tradeoff
– Full design flow: library generation, synthesis, placement and routing
• Existing evaluation benchmarks neglect how modern circuits really use devices
→ This can dramatically change the conclusions.
− Circuit topology dependence (e.g., logic depth)
− Design-time power optimization (multi-Vth and multiple gate sizes)
− Runtime adaptive power management (DVFS, gating)
[1] L. Wei et al., IEDM, 2010
[4] D. J. Frank et al., IBM J, 2006
[6] M. Luisier et al., IEDM, 2011
Outline
PROCEED Methodology
Example Experimental Results
Proposed Framework: PROCEED
[Figure: PROCEED flow diagram. Blocks: interconnect model, canonical circuit construction, Pareto optimizer, power management. Inputs shown: wire capacitance, resistance, and chip area; logic depth histogram and fan-out; device model and activity; variation information; constraints on Vdd, Vth, and gate sizes; ratio of throughput. Outputs: delay and power Pareto curves; throughput and power Pareto curves.]
Canonical Circuit Construction
• Utilize essential design information
– Logic depth histogram, average number of transistors per gate,
average fan-out, average interconnect load and chip area
– Ignore detailed circuit design
• Little impact on device performance evaluation
• Logic depth histogram
• Simulation blocks (Si)
– Construct logic paths in corresponding bins
• Single stage
– Gate (NAND, XOR, etc.)
– Buffer, interconnect, fan-out load
• Tuning parameters
– Gate sizes, Vdd, Vth
[Figure (a): logic depth histogram (logic depth versus frequency), divided into Bin 1 (short paths) through Bin 5 (long paths); each bin i is represented by a chain of simulation blocks Si.]
[Figure (b): a simulation block Si consists of i single stages between signal in and signal out; a single stage comprises a gate, an inverter, interconnect, and fan-out.]
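The construction of logic paths from a logic depth histogram can be sketched roughly as follows. This is only an illustration under assumptions: the histogram values, the `SimulationBlock` class, and the choice to represent each bin by one path whose stage count equals the bin's logic depth are all hypothetical and are not taken from the PROCEED source code.

```python
from dataclasses import dataclass


@dataclass
class SimulationBlock:
    """One representative logic path for a histogram bin (hypothetical type)."""
    stages: int    # number of identical single stages chained in series
    weight: float  # fraction of all paths that fall into this bin


def build_simulation_blocks(bin_depths, bin_frequencies):
    """Turn a logic depth histogram into weighted simulation blocks.

    Assumption: each bin is represented by a single path whose length is the
    bin's representative logic depth; frequencies are normalized to sum to 1.
    """
    if len(bin_depths) != len(bin_frequencies):
        raise ValueError("depths and frequencies must have the same length")
    total = sum(bin_frequencies)
    if total <= 0:
        raise ValueError("histogram must contain at least one path")
    return [
        SimulationBlock(stages=depth, weight=freq / total)
        for depth, freq in zip(bin_depths, bin_frequencies)
    ]


# Hypothetical five-bin histogram: short paths dominate, long paths are rare.
depths = [4, 8, 12, 16, 20]
frequencies = [0.40, 0.25, 0.20, 0.10, 0.05]
blocks = build_simulation_blocks(depths, frequencies)
for block in blocks:
    print(f"{block.stages:2d} stages, weight {block.weight:.2f}")
```

Only the aggregate statistics (depth and frequency per bin) enter the model, which matches the slide's point that detailed circuit design is ignored.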
Canonical Circuit Construction
• Logic depth histogram is estimated using the slack histogram
• Interconnect
– Interconnect length is proportional to the square root of chip area [1]
– Chip area is assumed to be linear in cell area
– Cell area is modeled as a function of transistor width from DRE [2]
[Figure: Model for interconnect length. Relative interconnect load (a.u.) versus transistor width (µm).]
[1] J. A. Davis et al., IEEE TED, 1998.
[2] R. S. Ghaida and P. Gupta, IEEE Trans. CAD, 2012.
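The chain of assumptions on this slide (cell area from transistor width, chip area linear in cell area, interconnect proportional to the square root of chip area) can be written as a small function. This is a sketch only: the linear `cell_area` model and its coefficients are hypothetical placeholders, whereas the slide takes the cell-area function from DRE.

```python
import math


def cell_area(width_um, a0=1.0, a1=0.5):
    """Hypothetical cell-area model: a fixed overhead plus a width-dependent term.

    The real dependence comes from DRE; a0 and a1 are illustrative only.
    """
    return a0 + a1 * width_um


def relative_interconnect_load(width_um, ref_width_um=1.0):
    """Interconnect load relative to a reference transistor width.

    Chip area is taken as linear in cell area, and interconnect length (hence
    load) as proportional to the square root of chip area, so the
    proportionality constants cancel in the ratio.
    """
    return math.sqrt(cell_area(width_um) / cell_area(ref_width_um))


for w in (0.5, 1.0, 2.0, 4.0):
    print(f"width {w:.1f} um -> relative load {relative_interconnect_load(w):.3f}")
```

Because only a ratio is returned, the unknown constants linking cell area to chip area and chip area to wire length never need to be specified.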
Pareto Optimization: Overview
• Objective functions are weighted sums of delay and power
– Non-convex problem
– Gradient descent used
• Delay and power are approximated with second-order functions within a trust region
• The trust region shrinks during optimization
• A logarithmic barrier is incorporated to confine the parameter range
Optimization flow:
1. Simulate the starting point.
2. (a) Build local delay and power models; (b) set the trust region.
$$P_i = P(V_{DD1},\dots,V_{DDk},\,V_{T1},\dots,V_{Tl},\,W_1,\dots,W_p)$$
$$D_i = D(V_{DD1},\dots,V_{DDk},\,V_{T1},\dots,V_{Tl},\,W_1,\dots,W_p)$$
3. (a) Form the weighted sum, with an adaptive weight determined by the slope of the existing Pareto front; (b) run gradient descent.
4. Simulate to evaluate new points and update the Pareto points.
[Figure: delay-power plane showing existing points, the local models, the trust region, the D-P slope, the gradient-descent step, and the new point.]
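Three pieces of this flow lend themselves to a short sketch: updating the set of Pareto points, deriving a weight from the local delay-power slope, and adding a logarithmic barrier to the weighted-sum objective. The code below is an illustration under assumptions, not the PROCEED implementation: the function names are hypothetical, delay and power are assumed to be normalized to comparable scales, and the mapping from slope to weight (making the weighted-sum level line parallel to the local front segment) is one plausible choice.

```python
import math


def dominates(a, b):
    """True if point a = (delay, power) is no worse than b in both and differs."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def update_pareto(front, candidate):
    """Insert candidate into the front and drop any points it dominates."""
    if any(p == candidate or dominates(p, candidate) for p in front):
        return front
    kept = [p for p in front if not dominates(candidate, p)]
    return sorted(kept + [candidate])


def adaptive_weight(front, i):
    """Delay weight from the slope between front[i] and front[i + 1].

    Assumption: with slope s = |dP/dD|, w_delay = s / (1 + s), so that
    w_delay * dD + (1 - w_delay) * dP = 0 along the segment.
    """
    (d0, p0), (d1, p1) = front[i], front[i + 1]
    slope = abs((p1 - p0) / (d1 - d0))
    return slope / (1.0 + slope)


def barrier_objective(delay, power, w_delay, params, bounds, mu=1e-3):
    """Weighted sum of delay and power plus a log barrier on each parameter."""
    penalty = 0.0
    for x, (lo, hi) in zip(params, bounds):
        if not lo < x < hi:
            return math.inf  # outside the allowed parameter range
        penalty -= mu * (math.log(x - lo) + math.log(hi - x))
    return w_delay * delay + (1.0 - w_delay) * power + penalty


front = []
for point in [(10.0, 5.0), (8.0, 7.0), (9.0, 9.0), (7.0, 6.0)]:
    front = update_pareto(front, point)
print(front)
print(adaptive_weight(front, 0))
```

The barrier term grows without bound as a parameter approaches either limit, which keeps gradient steps inside the allowed Vdd, Vth, and gate-size ranges without explicit clipping.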
Pareto Optimization: Modeling
• Build simulation-block-level delay and power models from circuit simulation results.
$$D_{S_i}(y_{i,0} + \Delta y_i) = D_{S_i,0} + G_{D_i}^{T}\,\Delta y_i + \tfrac{1}{2}\,\Delta y_i^{T} H_{D_i}\,\Delta y_i$$
$$P_{S_i}(y_{i,0} + \Delta y_i) = P_{S_i,0} + G_{P_i}^{T}\,\Delta y_i + \tfrac{1}{2}\,\Delta y_i^{T} H_{P_i}\,\Delta y_i$$
• Objective delay is the longest delay of all logic paths (constructed from simulation blocks)
– A high-order norm is used to estimate the max-delay function
→ This makes the objective function continuous for gradient calculation
$$D(X) = W_D \cdot \max\bigl(D_{S_1}(y_1), D_{S_2}(y_2), \dots, D_{S_n}(y_n)\bigr) \approx W_D \cdot \bigl\lVert \bigl(D_{S_1}(y_1), D_{S_2}(y_2), \dots, D_{S_n}(y_n)\bigr) \bigr\rVert_K$$
• Objective power is the weighted total power
$$P = \sum_{i=1}^{n} W_i \cdot P_{S_i}$$
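To make the modeling step concrete, the sketch below evaluates a second-order local model, approximates the maximum path delay with a K-norm, and sums weighted block powers. It is a plain-Python illustration under assumptions: the function names, the sample numbers, and the norm order K = 20 are hypothetical, and delays are assumed to be non-negative.

```python
def quadratic_model(f0, grad, hess, dy):
    """Second-order model: f0 + G^T dy + 0.5 * dy^T H dy."""
    n = len(dy)
    linear = sum(g * d for g, d in zip(grad, dy))
    quadratic = sum(dy[r] * hess[r][c] * dy[c] for r in range(n) for c in range(n))
    return f0 + linear + 0.5 * quadratic


def k_norm(values, k=20):
    """K-norm of non-negative values; approaches max(values) as k grows.

    Values are scaled by their maximum first to avoid overflow for large k.
    """
    largest = max(values)
    if largest == 0:
        return 0.0
    return largest * sum((v / largest) ** k for v in values) ** (1.0 / k)


def total_power(block_powers, weights):
    """Weighted total power over all simulation blocks."""
    return sum(w * p for w, p in zip(weights, block_powers))


# Hypothetical block delays (ps): the K-norm slightly overestimates the max.
delays = [100.0, 90.0, 50.0]
print(max(delays), k_norm(delays))

# Hypothetical local model around an operating point, perturbed by dy.
print(quadratic_model(1.0, [2.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.1, 0.2]))
```

Unlike `max`, the K-norm has a well-defined gradient everywhere for positive delays, which is why the slide uses it for gradient descent; it always lies between the true maximum and the maximum times n^(1/K).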
Power Management: Overview
• Modern circuits allow devices to operate in three modes: normal, power-saving, and sleep.
– A second, lower Vdd is applied to devices in power-saving mode
– The device is turned off in sleep mode
• Pick the best second Vdd and optimally divide the time spent in each mode
– Needs the ratio of average throughput to peak throughput as input
– Use polynomial models for the delay and power of each simulation block as a function of Vdd
• Minimize average power consumption: f1P1 + f2P2
[Figure: Model fit for delay and power. Delay (ps) and power versus VDD (V), 0.3 to 0.7; symbols: power and delay data from simulations; lines: fitted models.]
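The mode-allocation problem above can be illustrated with a brute-force search. Everything numeric here is an assumption made for the sketch: the polynomial coefficients are invented rather than fitted to simulation, sleep mode is taken to consume zero power, throughput is taken as the inverse of delay, and a grid search stands in for whatever optimizer PROCEED actually uses.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients in ascending order (Horner)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical fitted models as functions of Vdd (V); units are arbitrary.
DELAY_COEFFS = [200.0, -400.0, 250.0]   # delay falls as Vdd rises on [0.3, 0.7]
POWER_COEFFS = [0.0, 0.0, 0.0, 100.0]   # active power rises steeply with Vdd


def best_second_vdd(vdd_peak, ratio, vdd_candidates, steps=200):
    """Minimize f1*P1 + f2*P2 for a given average-to-peak throughput ratio.

    f1 and f2 are the time fractions in normal and power-saving mode; the
    remaining time 1 - f1 - f2 is spent asleep (assumed zero power).
    Returns (average power, second Vdd, f1, f2).
    """
    d1 = polyval(DELAY_COEFFS, vdd_peak)
    p1 = polyval(POWER_COEFFS, vdd_peak)
    best = (ratio * p1, vdd_peak, ratio, 0.0)  # baseline: normal + sleep only
    for v2 in vdd_candidates:
        d2 = polyval(DELAY_COEFFS, v2)
        p2 = polyval(POWER_COEFFS, v2)
        speed = d1 / d2  # throughput in power-saving mode relative to peak
        for s in range(steps + 1):
            f2 = s / steps
            f1 = ratio - f2 * speed  # keeps average throughput at the target
            if f1 < 0 or f1 + f2 > 1:
                continue
            average = f1 * p1 + f2 * p2
            if average < best[0]:
                best = (average, v2, f1, f2)
    return best


average, v2, f1, f2 = best_second_vdd(0.7, 0.2, [0.3, 0.4, 0.5, 0.6])
print(average, v2, f1, f2)
```

The throughput constraint f1 + f2 * speed = ratio fixes f1 once f2 is chosen, so the search is effectively two-dimensional over the second Vdd and f2.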
Validation of PROCEED
[Figure: 45nm SOI CortexM0. Power versus clock period (ps, 800 to 1100) at 20% and 100% activity for PROCEED, RTL Compiler, Model [18], and PROCEED w/o ldh.]
• Model [3] is frequently used for device evaluation
– Ignoring the logic depth histogram and using analytical delay and power models is inaccurate
• Proposed methodology is 21X (average) more accurate
– Efficient for evaluating devices with performance ranging from MHz to GHz
[3] D. J. Frank et al., IBM J, 2006
Outline
PROCEED Methodology
Example Experimental Results
Impact of Activity
[Figure: Impact of activity on 45nm ARM CortexM0. Power versus clock period (ns) for TFET and SOI at activity = 100% and activity = 1%; annotations on the plot: 97.5X, 9.4X, 96ns.]
• Low-activity circuits benefit low-leakage devices
– TFET is better than SOI MOSFET for low-activity, low-performance circuits
Impact of Circuit Topology
[Figure: Benchmark impact on 45nm. Power versus clock period (ns) for SOI and TFET on ARM CortexM0 and MIPS, at activity = 1% with 2 Vdd and 2 Vt; annotations: 106ns (CortexM0), 73ns (MIPS). Side panels: logic depth histograms (logic depth versus frequency) for MIPS and CortexM0.]
• CortexM0 is more evenly distributed in the logic depth histogram (LDH)
• Power consumption in MIPS is dominated by short logic paths
– More accommodating to low-power devices (TFETs)
Impact of Power Management
[Figure: 45nm ARM CortexM0 with power management, activity = 1%. Power versus peak throughput (M ops/s) for TFET and SOI at average-to-peak throughput ratios (Ave. TP/Peak TP) of 100%, 50%, 20%, and 10%; annotations: 10.9M, 12.6M, 16.1M, 21.5M.]
• Peak-throughput cross-points shift higher as the ratio of average to peak throughput decreases
– This indicates TFET may be a better device for applications with a wide dynamic range in performance needs
Impact of Variation
[Figure: Impact of variation on 45nm ARM CortexM0. Power versus clock period (ns) for TFET and SOI, with and without variation; annotations: 130ns, 96ns.]
• Variation evaluation indicates that TFET suffers more from effective voltage drop
– TFET and SOI are assumed to have peak 10% Vdd and 50mV Vth worst-case variation
– High subthreshold slope (SS) devices are more sensitive to voltage drop
Conclusion
• Proposed a new methodology for evaluating emerging devices that accounts for circuit topology, adaptivity, variability, and use context
• Proposed methodology is efficient and accurate for device evaluation over a broad operating range.
– Effective Pareto-based optimization heuristic
– Accurate circuit simulation and device compact model
• Example comparison of TFET and SOI devices
– TFET is better for low-activity, low-logic-depth, and high-dynamic-performance designs
– TFET is more sensitive to voltage drop and threshold voltage shifts
• Entire PROCEED source code (MATLAB, C++) will be made openly available.
Q&A
Thanks!