CSV881: Low-Power Design
Gate-Level Power Optimization
Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal
Copyright Agrawal, 2011
Lectures 10, 11, 12: Gate-level optimization
Components of Power
• Dynamic
  ◦ Signal transitions
    ▪ Logic activity
    ▪ Glitches
  ◦ Short-circuit (often neglected)
• Static
  ◦ Leakage
Power of a Transition
[Figure: CMOS inverter circuit with labels Vi, Vo, VDD, Ground, load capacitance CL, resistances R, and short-circuit current isc.]
Dynamic Power = $C_L V_{DD}^2 / 2 + P_{sc}$
Dynamic Power
• Each transition of a gate consumes $CV^2/2$.
• Methods of power saving:
  ◦ Minimize load capacitances
    ▪ Transistor sizing
    ▪ Library-based gate selection
  ◦ Reduce transitions
    ▪ Logic design
    ▪ Glitch reduction
Glitch Power Reduction

Design a digital circuit for minimum transient
energy consumption by eliminating hazards
Theorem 1
• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
  ◦ Output logic state changes: one transition is necessary.
  ◦ Output logic state unchanged: no transition is necessary.
Event Propagation
• Single lumped inertial delay modeled for each gate
• PI transitions assumed to occur without time skew
[Figure: example circuit showing event propagation along paths P1, P2 and P3.]
Inertial Delay of an Inverter
$d = \frac{d_{HL} + d_{LH}}{2}$
[Figure: Vin and Vout waveforms versus time, marking the delays dHL and dLH.]
Multi-Input Gate
• DPD: Differential path delay
• Gate delay d < DPD
[Figure: gate with inputs A and B and output C; input transitions are separated by DPD, and with d < DPD the output C shows a hazard or glitch.]
Balanced Path Delays
• Gate delay d < DPD
[Figure: the same gate with a delay buffer inserted on one input to balance the differential path delay DPD; output C shows no glitch.]
Glitch Filtering by Inertia
• Gate delay d > DPD
[Figure: gate with inputs A and B and output C; input transitions are separated by DPD, and with d > DPD the glitch at output C is filtered.]
Theorem
• Given that events occur at the input of a gate, whose inertial delay is d, at times t1 ≤ . . . ≤ tn, the number of events at the gate output cannot exceed
  $\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$
[Figure: time axis marking input events at t1, t2, t3, ..., tn and the interval tn – t1.]
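The bound in the theorem above can be evaluated numerically. The following is a minimal sketch, not part of the lecture: the function name `max_output_events` is hypothetical, and rounding (tn – t1)/d down to a whole number is an assumption made here because events are counted in integers.

```python
import math

def max_output_events(arrival_times, d):
    """Upper bound on output events for a gate with inertial delay d."""
    n = len(arrival_times)
    if n == 0:
        return 0
    span = max(arrival_times) - min(arrival_times)  # tn - t1
    # Assumption: the ratio is rounded down, since events are whole numbers.
    return min(n, 1 + math.floor(span / d))

# Four input events spread over 3 time units, for three gate delays.
bound_fast_gate = max_output_events([0, 1, 2, 3], d=0.5)  # limited by n
bound_mid_gate = max_output_events([0, 1, 2, 3], d=2)     # limited by the span
bound_slow_gate = max_output_events([0, 1, 2, 3], d=5)    # span shorter than d
```

When the span of input events is shorter than d, the bound drops to one event, which is the condition used on the next slide.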
Minimum Transient Design
• Minimum transient energy condition for a Boolean gate:
  $|t_i - t_j| < d$
  where ti and tj are arrival times of input events and d is the inertial delay of the gate.
Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays; no increase in critical path delay.]
Hazard Filter Method
• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delays.]
Designing a Glitch-Free Circuit
• Maintain specified critical path delay.
• Glitch suppressed at all gates by
  ◦ Path delay balancing
  ◦ Glitch filtering by increasing inertial delay of gates or by inserting delay buffers when necessary.
• A linear program optimally combines all objectives.
[Figure: gate with delay D whose inputs arrive over paths with delays d1 and d2; the condition is |d1 – d2| < D.]
Problem Complexity

Number of paths in a circuit can be
exponential in circuit size.

Considering all paths through enumeration
is infeasible for large circuits.

Example: c880 has 6.96M path constraints.
Define Arrival Time Variables
• di: Gate delay.
• Define two timing window variables per gate output:
  ◦ ti: Earliest time of signal transition at gate i.
  ◦ Ti: Latest time of signal transition at gate i.
• Glitch suppression constraint: Ti – ti < di
[Figure: gate i with delay di and output window (ti, Ti), fed by inputs with windows (t1, T1), ..., (tn, Tn).]
Reference: T. Raja, Master’s Thesis, Rutgers Univ., 2002.
Linear Program
• Variables: gate and buffer delays, arrival time variables.
• Objective: minimize number of buffers.
• Subject to: overall circuit delay constraint for all input-output paths.
• Subject to: minimum transient energy condition for all multi-input gates.
An Example: Full Adder add1b
[Figure: full adder circuit in which every gate has delay 1.]
Critical path delay = 6
Linear Program
• Gate variables: d4 . . . d12
• Buffer delay variables: d15 . . . d29
• Window variables: t4 . . . t29 and T4 . . . T29
Multiple-Input Gate Constraints
For Gate 7:
T7 ≥ T5 + d7
T7 ≥ T6 + d7
t7 ≤ t5 + d7
t7 ≤ t6 + d7
d7 > T7 – t7 (glitch suppression)
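The window constraints above can be exercised on a small case. The sketch below is only an illustration under stated assumptions: it takes the tightest values allowed by the inequalities (latest time = max over fanins plus gate delay, earliest time = min over fanins plus gate delay), it assumes all primary inputs switch at time 0, and the three-input circuit and the names `timing_windows` and `is_glitch_free` are hypothetical, not taken from the lecture.

```python
def timing_windows(primary_inputs, fanin, delay):
    """Return {signal: (t, T)}: earliest and latest transition times."""
    windows = {pi: (0, 0) for pi in primary_inputs}  # assumed: no input skew
    for gate, inputs in fanin.items():               # assumes topological order
        earliest = min(windows[s][0] for s in inputs) + delay[gate]
        latest = max(windows[s][1] for s in inputs) + delay[gate]
        windows[gate] = (earliest, latest)
    return windows

def is_glitch_free(primary_inputs, fanin, delay):
    """Check the glitch suppression condition Ti - ti < di at every gate."""
    windows = timing_windows(primary_inputs, fanin, delay)
    return all(windows[g][1] - windows[g][0] < delay[g] for g in fanin)

PIS = ["a", "b", "c"]
# Hypothetical circuit: g1 = f(a, b), g2 = f(g1, c)
FANIN = {"g1": ["a", "b"], "g2": ["g1", "c"]}

# Unit delays: g2 sees inputs arriving one time unit apart.
unit_ok = is_glitch_free(PIS, FANIN, {"g1": 1, "g2": 1})

# Hazard filtering: raise the delay of g2 above the arrival spread.
filtered_ok = is_glitch_free(PIS, FANIN, {"g1": 1, "g2": 2})

# Path balancing: insert a unit-delay buffer on input c.
BALANCED = {"g1": ["a", "b"], "buf": ["c"], "g2": ["g1", "buf"]}
balanced_ok = is_glitch_free(PIS, BALANCED, {"g1": 1, "buf": 1, "g2": 1})
```

Here the unit-delay version fails the check, while both the filtered and the balanced versions pass, mirroring the two methods described earlier.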
Single-Input Gate Constraints
Buffer 19:
T16 + d19 = T19
t16 + d19 = t19
Critical Path Delay Constraints
T11 ≤ maxdelay
T12 ≤ maxdelay
maxdelay is specified
Objective Function
• Need to minimize the number of buffers.
• Because that leads to a nonlinear objective function, we use an approximate criterion:
  minimize $\sum_{\text{all buffers}} (\text{buffer delay})$
  i.e., minimize d15 + d16 + ∙ ∙ ∙ + d29
• This gives a near optimum result.
AMPL Solution: maxdelay = 6
[Figure: full adder circuit annotated with the gate and buffer delays found by the linear program.]
Critical path delay = 6
AMPL Solution: maxdelay = 7
[Figure: full adder circuit annotated with the gate and buffer delays found by the linear program.]
Critical path delay = 7
AMPL Solution: maxdelay ≥ 11
[Figure: full adder circuit annotated with the gate delays found by the linear program.]
Critical path delay = 11
ALU4: Four-Bit ALU 74181
maxdelay    Buffers inserted
7           5
10          2
12          1
15          0
Maximum Power Savings (zero-buffer design): Peak = 33%, Average = 21%
ALU4: Original and Low-Power
Benchmark Circuits
Circuit   Max-delay (gates)   No. of Buffers   Normalized Power, Average   Normalized Power, Peak
ALU4      7                   5                0.80                        0.68
ALU4      15                  0                0.79                        0.67
C880      24                  62               0.68                        0.54
C880      48                  34               0.68                        0.52
C6288     47                  294              0.40                        0.36
C6288     94                  120              0.36                        0.34
c7552     43                  366              0.44                        0.34
c7552     86                  111              0.42                        0.32
C7552 Circuit: Spice Simulation
Power Saving: Average 58%, Peak 68%
References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th Int’l Conf. VLSI Design, Jan. 2003, pp. 527-532.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. Low Power Electron., vol. 2, no. 1, pp. 121-128, Apr. 2006.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Variable Input Delay CMOS Logic for Low Power Design,” IEEE Trans. VLSI Systems, vol. 17, no. 10, pp. 1534-1545, Oct. 2009.
Exercise: Dynamic Power
• An average gate
  ◦ VDD, V = 1 volt
  ◦ Output capacitance, C = 1pF
  ◦ Activity factor, α = 10%
  ◦ Clock frequency, f = 1GHz
• What is the dynamic power consumption of a 1 million gate VLSI chip?
Answer
• Dynamic energy per transition = $0.5\,CV^2$
• Dynamic power per gate
  = Energy per second
  = $0.5\,CV^2 \alpha f$
  = $0.5 \times 10^{-12} \times 1^2 \times 0.1 \times 10^{9}$
  = $0.5 \times 10^{-4}$ W = 50μW
• Power for 1 million gate chip = 50W
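The arithmetic in the answer can be reproduced with a few lines of code. This is a minimal sketch using the values given in the exercise; the function name `dynamic_power_per_gate` is hypothetical.

```python
def dynamic_power_per_gate(c_farads, v_volts, alpha, f_hertz):
    """Dynamic power of one gate: 0.5 * C * V^2 * alpha * f, in watts."""
    return 0.5 * c_farads * v_volts ** 2 * alpha * f_hertz

# Values from the exercise: C = 1 pF, V = 1 V, alpha = 10%, f = 1 GHz.
p_gate = dynamic_power_per_gate(1e-12, 1.0, 0.1, 1e9)

# One million such gates on the chip.
p_chip = 1_000_000 * p_gate

print(f"per gate: {p_gate * 1e6:.1f} uW, chip: {p_chip:.1f} W")
```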
Components of Power
• Dynamic
  ◦ Signal transitions
    ▪ Logic activity
    ▪ Glitches
  ◦ Short-circuit
• Static
  ◦ Leakage
Subthreshold Conduction
$I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right) \times \left(1 - \exp\frac{-V_{ds}}{V_T}\right)$
[Figure: Ids on a logarithmic scale (10pA to 1mA) versus Vgs (0 to 1.8 V) for a transistor with terminals g, d and s, marking Vth, the subthreshold region with its subthreshold slope, and the saturation region.]
Thermal Voltage, VT
• VT = kT/q = 26 mV, at room temperature.
• When Vds is several times greater than VT:
  $I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right)$
Leakage Current
• Leakage current equals Ids when Vgs = 0
• Leakage current, Ids = I0 exp(–Vth/nVT)
• At cutoff, Vgs = Vth, and Ids = I0
• Lowering leakage to $10^{-b} \times I_0$:
  Vth = b n VT ln 10 = 1.5b × 26 ln 10 = 90b mV
• Example: To lower leakage to I0/1,000, Vth = 270 mV
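The relation between threshold voltage and leakage on this slide can be checked numerically. The sketch below uses n = 1.5 and VT = 26 mV as on the slide; the function names are hypothetical.

```python
import math

def vth_for_leakage_decades(b, n=1.5, v_t=0.026):
    """Threshold voltage (volts) that lowers leakage to 10**(-b) * I0."""
    return b * n * v_t * math.log(10)

def leakage_ratio(v_th, n=1.5, v_t=0.026):
    """Ids / I0 at Vgs = 0, i.e. exp(-Vth / (n * VT))."""
    return math.exp(-v_th / (n * v_t))

vth_3_decades = vth_for_leakage_decades(3)   # close to the slide's 270 mV
ratio_check = leakage_ratio(vth_3_decades)   # recovers 1/1000

print(f"Vth = {vth_3_decades * 1e3:.0f} mV, Ids/I0 = {ratio_check:.1e}")
```

Each additional decade of leakage reduction costs about 90 mV of threshold voltage under these assumptions.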
Threshold Voltage
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$
• Vt0 is threshold voltage when source is at body potential (0.4 V for 180nm process)
• $\Phi_s = 2V_T \ln(N_A/n_i)$ is surface potential
• $\gamma = (2q\varepsilon_{si}N_A)^{1/2}\, t_{ox}/\varepsilon_{ox}$ is body effect coefficient (0.4 to 1.0)
• NA is doping level = $8 \times 10^{17}$ cm$^{-3}$
• ni = $1.45 \times 10^{10}$ cm$^{-3}$
Threshold Voltage, Vsb = 1.1V
• Thermal voltage, VT = kT/q = 26 mV
• Φs = 0.93 V
• εox = $3.9 \times 8.85 \times 10^{-14}$ F/cm
• εsi = $11.7 \times 8.85 \times 10^{-14}$ F/cm
• tox = 40 Å
• γ = 0.6 V$^{1/2}$
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$ = 0.68 V
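The numbers on this slide follow from the formulas on the previous one. The sketch below recomputes them from the listed parameters; the electron charge value (1.602e-19 C) and the conversion 40 Å = 4e-7 cm are standard constants supplied here, not values printed on the slides, and the variable names are illustrative.

```python
import math

Q = 1.602e-19            # electron charge in coulombs (standard constant)
V_T = 0.026              # thermal voltage, volts
N_A = 8e17               # doping level, cm^-3
N_I = 1.45e10            # intrinsic carrier concentration, cm^-3
EPS_OX = 3.9 * 8.85e-14  # oxide permittivity, F/cm
EPS_SI = 11.7 * 8.85e-14 # silicon permittivity, F/cm
T_OX = 40e-8             # oxide thickness: 40 angstrom expressed in cm
V_T0 = 0.4               # zero-bias threshold voltage, volts
V_SB = 1.1               # source-to-body voltage, volts

# Surface potential: 2 * VT * ln(NA / ni)
phi_s = 2 * V_T * math.log(N_A / N_I)

# Body effect coefficient: sqrt(2 q eps_si NA) * tox / eps_ox
gamma = math.sqrt(2 * Q * EPS_SI * N_A) * T_OX / EPS_OX

# Threshold voltage with body bias
v_th = V_T0 + gamma * (math.sqrt(phi_s + V_SB) - math.sqrt(phi_s))

print(f"phi_s = {phi_s:.2f} V, gamma = {gamma:.2f} V^0.5, Vth = {v_th:.2f} V")
```

The computed values agree with the slide's rounded figures of 0.93 V, 0.6 V^½ and 0.68 V.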
A Sample Calculation
• VDD = 1.2V, 100nm CMOS process
• Transistor width, W = 0.5μm
• OFF device (Vgs = Vth) leakage
  ◦ I0 = 20nA/μm, for low threshold transistor
  ◦ I0 = 3nA/μm, for high threshold transistor
• 100M transistor chip
  ◦ Power = $(100 \times 10^{6}/2)(0.5 \times 20 \times 10^{-9}\,\text{A})(1.2\,\text{V})$ = 600mW for all low-threshold transistors
  ◦ Power = $(100 \times 10^{6}/2)(0.5 \times 3 \times 10^{-9}\,\text{A})(1.2\,\text{V})$ = 90mW for all high-threshold transistors
Dual-Threshold Chip
• Low-threshold only for 20% transistors on critical path.
• Leakage power
  = 600×0.2 + 90×0.8
  = 120 + 72
  = 192 mW
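The sample calculation and the dual-threshold estimate can be reproduced as follows. This is a sketch using the slide values; the function name `chip_leakage_power` is hypothetical, and the factor of one half simply follows the slide's formula (the interpretation that about half of the transistors are OFF is an assumption).

```python
def chip_leakage_power(n_transistors, width_um, i0_amps_per_um, vdd):
    """Leakage power in watts, following the slide's formula."""
    # The slide divides the transistor count by 2 (assumed: half are OFF).
    return (n_transistors / 2) * width_um * i0_amps_per_um * vdd

p_all_low = chip_leakage_power(100e6, 0.5, 20e-9, 1.2)   # all low Vth
p_all_high = chip_leakage_power(100e6, 0.5, 3e-9, 1.2)   # all high Vth

# Dual-threshold chip: 20% low-threshold, 80% high-threshold transistors.
p_dual = 0.2 * p_all_low + 0.8 * p_all_high

print(f"{p_all_low * 1e3:.0f} mW, {p_all_high * 1e3:.0f} mW, {p_dual * 1e3:.0f} mW")
```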
Dual-Threshold CMOS Circuit
Dual-Threshold Design
• To maintain performance, all gates on critical paths are assigned low Vth.
• Most other gates are assigned high Vth.
• But, some gates on non-critical paths may also be assigned low Vth to prevent those paths from becoming critical.
Integer Linear Programming (ILP) to Minimize Leakage Power
• Use dual-threshold CMOS process
• First, assign all gates low Vth
• Use an ILP model to find the delay (Tc) of the critical path
• Use another ILP model to find the optimal Vth assignment as well as the reduced leakage power for all gates without increasing Tc
• Further reduction of leakage power possible by letting Tc increase
ILP – Variables
For each gate i define two variables.
• Ti: the longest time at which the output of gate i can produce an event after the occurrence of an input event at a primary input of the circuit.
• Xi: a variable specifying low or high Vth for gate i; Xi is an integer [0, 1],
  1 → gate i is assigned low Vth,
  0 → gate i is assigned high Vth.
ILP – Objective Function
• Leakage power: $P_{leak} = V_{dd} \sum_i I_{leak,i}$
• Minimize the sum of all gate leakage currents, given by
  $\min \sum_i \left[ X_i \cdot I_{Li} + (1 - X_i) \cdot I_{Hi} \right]$
• ILi is the leakage current of gate i with low Vth
• IHi is the leakage current of gate i with high Vth
• Using SPICE simulation results, construct a leakage current look-up table, which is indexed by the gate type and the input vector.
ILP – Constraints
• For each gate i:
  (1) $T_i \ge T_j + X_i \cdot D_{Li} + (1 - X_i) \cdot D_{Hi}$, where the output of gate j is a fanin of gate i
  (2) $0 \le X_i \le 1$
• Max delay constraints for primary outputs (PO):
  (3) $T_i \le T_{max}$
  Tmax is the maximum delay of the critical path
[Figure: gate j with output time Tj feeding gate i with output time Ti.]
ILP Constraint Example
[Figure: example circuit with gates numbered 0 to 3.]
$T_i \ge T_j + X_i \cdot D_{Li} + (1 - X_i) \cdot D_{Hi}$
• Assume all primary input (PI) signals on the left arrive at the same time.
• For gate 2, constraints are
  $T_2 \ge T_0 + X_2 \cdot D_{L2} + (1 - X_2) \cdot D_{H2}$
  $T_2 \ge 0 + X_2 \cdot D_{L2} + (1 - X_2) \cdot D_{H2}$
ILP – Constraints (cont.)
• DHi is the delay of gate i with high Vth
• DLi is the delay of gate i with low Vth
• A second look-up table is constructed and specifies the delay for given gate types and fanout numbers.
ILP – Finding Critical Delay
$T_i \le T_{max}$
• Tmax can be specified or be the delay of the longest path (Tc).
• To find Tc, we first delete the above constraint and assign all gates low Vth, i.e., replace $0 \le X_i \le 1$ with $X_i = 1$.
• Maximum Ti in the ILP solution is Tc.
• If we replace Tmax with Tc, the objective function then minimizes leakage power without sacrificing performance.
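The two-step procedure (find Tc with all gates at low Vth, then minimize leakage subject to Ti ≤ Tmax) can be illustrated on a tiny case. The sketch below is not the lecture's ILP: it swaps the ILP solver for brute-force enumeration of all Xi assignments, which is practical only for a handful of gates. The seven-gate circuit is hypothetical, and the delay and leakage numbers (5ps and 10nA for low Vth, 12ps and 1nA for high Vth) are borrowed from the problem at the end of these lectures.

```python
from itertools import product

D_LOW, D_HIGH = 5, 12   # gate delay in ps for low / high Vth
I_LOW, I_HIGH = 10, 1   # gate leakage in nA for low / high Vth

# Hypothetical circuit in topological order; [] means fed by primary inputs.
FANIN = {"g1": [], "g2": ["g1"], "g3": ["g2"], "g4": [],
         "g5": [], "g6": ["g5"], "g7": ["g3", "g4", "g6"]}
PRIMARY_OUTPUTS = ["g7"]

def critical_delay(x):
    """Largest Ti at a primary output; x[g] = 1 means low Vth."""
    T = {}
    for gate, inputs in FANIN.items():
        d = D_LOW if x[gate] else D_HIGH
        T[gate] = max((T[j] for j in inputs), default=0) + d
    return max(T[po] for po in PRIMARY_OUTPUTS)

def leakage(x):
    """Objective: sum of Xi*ILi + (1 - Xi)*IHi over all gates."""
    return sum(I_LOW if x[g] else I_HIGH for g in FANIN)

def best_assignment(t_max):
    """Enumerate every assignment and keep the lowest-leakage feasible one."""
    feasible = (dict(zip(FANIN, bits))
                for bits in product((0, 1), repeat=len(FANIN)))
    return min((x for x in feasible if critical_delay(x) <= t_max),
               key=leakage)

# Step 1: all gates low Vth gives the critical path delay Tc.
t_c = critical_delay({g: 1 for g in FANIN})

# Step 2: minimize leakage without increasing Tc, then with 30% slack.
x_same_delay = best_assignment(t_c)
x_relaxed = best_assignment(1.3 * t_c)

print(t_c, leakage(x_same_delay), leakage(x_relaxed))
```

In this toy circuit only the gate on the shortest path can take high Vth at Tmax = Tc; relaxing Tmax lets one more gate switch, which is the trend discussed on the following slides.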
Power-Delay Tradeoff
[Figure: normalized leakage power (0.1 to 1) versus normalized critical path delay (1 to 1.5) for benchmark circuits C432, C880 and C1908.]
Power-Delay Tradeoff
• If we gradually increase Tmax from Tc, leakage power is further reduced, because more gates can be assigned high Vth.
• But the reduction tends to become slower.
• When Tmax = (130%) Tc, the reduction roughly levels off because almost all gates are assigned high Vth.
• Maximum leakage reduction can be 98%.
Leakage & Dynamic Power Optimization: 70nm CMOS c7552 Benchmark Circuit @ 90°C
[Figure: bar chart of leakage power, dynamic power and total power in microwatts (0 to 900) for the original circuit and the optimized design.]
Y. Lu and V. D. Agrawal, “CMOS Leakage and Glitch Minimization for Power-Performance Tradeoff,” Journal of Low Power Electronics (JOLPE), vol. 2, no. 3, pp. 378-387, December 2006.
Summary
• Leakage power is a significant fraction of the total power in nanometer CMOS devices.
• Leakage power increases with temperature; it can be as much as dynamic power.
• Dual-threshold design can reduce leakage.
• Reference: Y. Lu and V. D. Agrawal, “CMOS Leakage and Glitch Minimization for Power-Performance Tradeoff,” J. Low Power Electronics, vol. 2, no. 3, pp. 378-387, December 2006.
• Access other papers at http://www.eng.auburn.edu/~vagrawal/TALKS/talks.html
Problem: Leakage Reduction
The following circuit is designed in 65nm CMOS technology using low threshold transistors. Each gate has a delay of 5ps and a leakage current of 10nA. Given that a gate with high threshold transistors has a delay of 12ps and a leakage of 1nA, optimally design the circuit with dual-threshold gates to minimize the leakage current without increasing the critical path delay. What is the percentage reduction in leakage power? What will the leakage power reduction be if a 30% increase in the critical path delay is allowed?
Solution 1: No Delay Increase
Three critical paths are from the first, second and third inputs to the last output, shown by a dashed line arrow. Each has five gates and a delay of 25ps. None of the five gates on the critical path (red arrow) can be assigned a high threshold. Also, the two inverters that are on four-gate long paths cannot be assigned high threshold because then the delay of those paths will become 27ps. The remaining three inverters and the NOR gate can be assigned high threshold. These gates are shaded blue in the circuit.
The reduction in leakage power = 1 – (4×1+7×10)/(11×10) = 32.73%
Critical path delay = 25ps
[Figure: the circuit annotated with gate delays; four gates at 12ps and seven gates at 5ps.]
Solution 2: 30% Delay Increase
Several solutions are possible. Notice that any 3-gate path can have 2 high threshold gates. Four and five gate paths can have only one high threshold gate. One solution is shown in the figure below where six high threshold gates are shown with shading and the critical path is shown by a dashed red line arrow.
The reduction in leakage power = 1 – (6×1+5×10)/(11×10) = 49.09%
Critical path delay = 29ps
[Figure: the circuit annotated with gate delays; six gates at 12ps and five gates at 5ps.]
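The percentage reductions in the two solutions can be verified with a short calculation. This is a minimal sketch using the leakage values from the problem statement (1nA for high threshold, 10nA for low threshold); the function name `leakage_reduction_percent` is hypothetical.

```python
def leakage_reduction_percent(n_high, n_low, i_high=1.0, i_low=10.0):
    """Reduction relative to a design with every gate at low threshold."""
    baseline = (n_high + n_low) * i_low
    optimized = n_high * i_high + n_low * i_low
    return 100.0 * (1.0 - optimized / baseline)

# Solution 1: 4 high-threshold gates and 7 low-threshold gates.
solution_1 = leakage_reduction_percent(n_high=4, n_low=7)

# Solution 2: 6 high-threshold gates and 5 low-threshold gates.
solution_2 = leakage_reduction_percent(n_high=6, n_low=5)

print(f"{solution_1:.2f}% and {solution_2:.2f}%")
```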