Low-Power Electronics and
Systems
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849, USA
http://www.eng.auburn.edu/~vagrawal
August 9, 2006
Agrawal: VDAT'06 Tutorial II
1
Contents
• Introduction
• Dynamic power
– Short circuit power
– Reduced supply voltage operation
– Glitch elimination
• Static (leakage) power reduction
• Low power systems
– State encoding
– Processor and multi-core design
• Books on low-power design
Introduction
Power Consumption of VLSI Chips
Why is it a concern?
ISSCC, Feb. 2001, Keynote
Patrick P. Gelsinger
Senior Vice President
General Manager
Digital Enterprise Group
INTEL CORP.
“Ten years from now,
microprocessors will run at
10GHz to 30GHz and be capable
of processing 1 trillion operations
per second -- about the same
number of calculations that the
world's fastest supercomputer
can perform now.
“Unfortunately, if nothing
changes these chips will produce
as much heat, for their
proportional size, as a nuclear
reactor. . . .”
VLSI Chip Power Density
[Figure (source: Intel): power density (W/cm2, log scale from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface.]
Meaning of Low-Power Design
• Design practices that reduce power
consumption at least by one order of magnitude;
in practice 50% reduction is often acceptable.
• General considerations in low-power design
– Algorithms and architectures
– High-level and software techniques
– Gate and circuit-level methods
– Power estimation techniques
– Test power
Topics in Low-Power
• Power dissipation in CMOS circuits
• Device technology
– Low-power CMOS technologies
– Energy recovery methods
• Circuit and gate level methods
– Logic synthesis
– Dynamic power reduction techniques
– Leakage power reduction
• System level methods
– Microprocessors
– Arithmetic circuits
– Low power memory technology
• Test power
• Power estimation methods and tools
Power in a CMOS Gate
[Figure: CMOS gate connected between VDD and ground, drawing supply current iDD(t).]
Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
[Figure: the three terms (switching, short-circuit and leakage) are labeled 75%, 20% and 5%, respectively.]
Power and Energy
• Instantaneous power (Watts)
$P(t) = i_{DD}(t)\, V_{DD}$
• Peak power (Watts)
$P_{peak} = \max\{P(t)\}$
• Average power (Watts)
$P_{av} = \frac{1}{T} \int_0^T P(t)\, dt$
• Energy (Joules)
$E = \int_0^T P(t)\, dt$
Low-Power Design Techniques
• Circuit and gate level methods
– Reduced supply voltage
– Adiabatic switching and charge recovery
– Logic design for reduced activity
– Reduced glitches
– Transistor sizing
– Pass-transistor logic
– Pseudo-nMOS logic
– Multi-threshold gates
Low-Power Design Techniques
• Functional and architectural methods
– Clock suppression
– Clock frequency reduction
– Supply voltage reduction
– Power down
– Algorithmic and Software methods
Test Power
• Power grid on a VLSI chip is designed for
certain current capacity during functional
operation:
– Average current → heat dissipation
– Peak current → noise, ground bounce
• Problem – Tests like scan or BIST are
nonfunctional and may cause higher than
the functional circuit activity; a functionally
good chip can fail the test.
Power Estimation Methods
• Spice: Accurate but expensive
• Logic-level
– Event-driven simulation
– Statistical
– Probabilistic
• High-level: Hierarchical
Components of Power
• Dynamic
– Signal transitions
• Logic activity
• Glitches
– Short-circuit
• Static
– Leakage
$P_{total} = P_{dyn} + P_{stat} = P_{tran} + P_{sc} + P_{stat}$
Power of a Transition: Ptran
[Figure: inverter model with input vi(t) and output vo(t); the pull-up resistance Ron carries current ic(t) from VDD into the load CL, while the pull-down path to ground has R = large.]
Charging of a Capacitor
[Figure: source V charges capacitor C through resistor R after the switch closes at t = 0; v(t) is the capacitor voltage and i(t) the current.]
Charge on capacitor, $q(t) = C\, v(t)$
Current, $i(t) = dq(t)/dt = C\, dv(t)/dt$
$i(t) = C\, \frac{dv(t)}{dt} = \frac{V - v(t)}{R}$
$\frac{dv(t)}{dt} = \frac{V - v(t)}{RC}$
$\int \frac{dv(t)}{V - v(t)} = \int \frac{dt}{RC}$
$\ln[V - v(t)] = \frac{-t}{RC} + A$
Initial condition, t = 0, v(t) = 0 → A = ln V
$v(t) = V \left[1 - \exp\left(\frac{-t}{RC}\right)\right]$
$v(t) = V \left[1 - \exp\left(\frac{-t}{RC}\right)\right]$
$i(t) = C\, \frac{dv(t)}{dt} = \frac{V}{R} \exp\left(\frac{-t}{RC}\right)$
Total Energy Per Charging Transition from Power Supply
$E_{trans} = \int_0^\infty V\, i(t)\, dt = \int_0^\infty \frac{V^2}{R} \exp\left(\frac{-t}{RC}\right) dt = C V^2$
Energy Dissipated per Transition in Resistance (R) of “On” Transistors
$R \int_0^\infty i^2(t)\, dt = R\, \frac{V^2}{R^2} \int_0^\infty \exp\left(\frac{-2t}{RC}\right) dt = \frac{1}{2} C V^2$
Energy Stored in Charged Capacitor
$\int_0^\infty v(t)\, i(t)\, dt = \int_0^\infty V \left[1 - \exp\left(\frac{-t}{RC}\right)\right] \frac{V}{R} \exp\left(\frac{-t}{RC}\right) dt = \frac{1}{2} C V^2$
Transition Power
• Gate output rising transition
– Energy dissipated in pMOS transistor = $CV^2/2$
– Energy stored in capacitor = $CV^2/2$
• Gate output falling transition
– Energy dissipated in nMOS transistor = $CV^2/2$
• Energy dissipated per transition = $CV^2/2$
• Power dissipation:
$P_{trans} = E_{trans}\, \alpha f_{ck} = \alpha f_{ck} C V^2 / 2$
α = activity factor
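The three energy integrals for the charging transition can be checked numerically. The sketch below is an illustration, not part of the tutorial: the function name `charging_energies`, the midpoint-rule integration and the component values (R = 10 kΩ, C = 15 fF, V = 2.5 V) are assumptions chosen for the example.

```python
import math


def charging_energies(R, C, V, steps=200_000, t_end_in_tau=40.0):
    """Integrate the RC charging waveforms with the midpoint rule.

    Returns (energy drawn from supply, energy dissipated in R,
    energy stored in C), all in joules.
    """
    tau = R * C
    dt = t_end_in_tau * tau / steps
    e_supply = e_resistor = e_capacitor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (V / R) * math.exp(-t / tau)      # i(t) = (V/R) exp(-t/RC)
        v = V * (1.0 - math.exp(-t / tau))    # v(t) = V [1 - exp(-t/RC)]
        e_supply += V * i * dt
        e_resistor += R * i * i * dt
        e_capacitor += v * i * dt
    return e_supply, e_resistor, e_capacitor


# Assumed example values, not taken from the slides.
R, C, V = 10e3, 15e-15, 2.5
e_s, e_r, e_c = charging_energies(R, C, V)
print(f"supply    : {e_s:.4e} J (CV^2   = {C * V * V:.4e} J)")
print(f"resistor  : {e_r:.4e} J (CV^2/2 = {0.5 * C * V * V:.4e} J)")
print(f"capacitor : {e_c:.4e} J (CV^2/2 = {0.5 * C * V * V:.4e} J)")
```

Changing R in this sketch changes how fast the capacitor charges but should leave all three energies unchanged, which is why R does not appear in the transition power formula.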
Short Circuit Current, isc(t)
[Figure: input Vi(t) and output Vo(t) waveforms (volts, GND to VDD) with the levels VTn and VDD - VTp marked, and the short-circuit current isc(t) (amperes), which flows between times tB and tE and peaks at Iscmaxf; the time axis is in ns.]
Short-Circuit Energy per Transition
• $E_{scf} = \int_{t_B}^{t_E} V_{DD}\, i_{sc}(t)\, dt = (t_E - t_B)\, I_{scmaxf}\, V_{DD} / 2$
• $E_{scf} = t_f\, (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxf} / 2$
• $E_{scr} = t_r\, (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxr} / 2$
• $E_{scf} = 0$, when $V_{DD} = |V_{Tp}| + V_{Tn}$
Short-Circuit Power and Voltage
Scaling
• Decreases and eventually becomes zero when
VDD is scaled down but the threshold voltages
are not scaled down.
• References:
– M. A. Ortega and J. Figueras, “Short Circuit Power
Modeling in Submicron CMOS,” PATMOS’96, Aug.
1996, pp. 147-166.
– T. Sakurai and A. Newton, “Alpha-power Law
MOSFET model and Its Application to a CMOS
Inverter,” IEEE J. Solid State Circuits, vol. 25, April
1990, pp. 584-594.
Psc and Output Capacitance
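[Figure: inverter model with input vi(t) (fall time tf, rise time tr) and output vo(t); the pull-up resistance Ron carries ic(t) + isc(t) from VDD, the load is CL, the pull-down path to ground has R = large, and the short-circuit current is vo(t)/R↑.]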
isc and Output Capacitance
$I_{sc}(t) = \frac{v_o(t)}{R_{\uparrow t_f}(t)} = \frac{V_{DD} \left[1 - \exp\left(\frac{-t}{R_{\downarrow t_f}(t)\, C}\right)\right]}{R_{\uparrow t_f}(t)}$
iscmax and Output Capacitance
[Figure: current i versus time t over the input fall time tf, showing vo(t) for small C and for large C, the factor 1/R↑tf(t), and the resulting peak iscmax.]
Psc, Output Rise Times,
Capacitance
• For given input rise and fall times short
circuit power decreases as output
capacitance increases.
• Short circuit power increases with increase
of input rise and fall times.
• Short circuit power is reduced if output rise
and fall times are smaller than the input
rise and fall times.
Effects of Scaling Down
• 1-16% short-circuit power at 0.7 micron
• 4-37% at 0.35 micron
• 12-60% at 0.17 micron
• Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.
Summary: Short-Circuit Power
• Short-circuit power is consumed by each
transition (increases with input transition time).
• Reduction requires that gate output transition
should not be faster than the input transition
(faster gates can consume more short-circuit
power).
• Increasing the output load capacitance reduces
short-circuit power.
• Scaling down of supply voltage with respect to
threshold voltages reduces short-circuit power.
Dynamic Power
[Figure: inverter with input Vi and output Vo, pull-up and pull-down resistances R between VDD and ground, load CL and short-circuit current isc.]
Dynamic Power = $C_L V_{DD}^2 / 2 + P_{sc}$
Dynamic Power Reduction
• Reduce power per transition
– Reduced voltage operation – voltage scaling
– Capacitance minimization – device sizing
• Reduce number of transitions
– Glitch elimination
CMOS Dynamic Power
$\text{Dynamic Power} = \sum_{\text{all gates } i} 0.5\, \alpha_i f_{clk} C_{Li} V_{DD}^2 \approx 0.5\, \alpha f_{clk} C_L V_{DD}^2 \approx \alpha_{01} f_{clk} C_L V_{DD}^2$
where
α = average gate activity factor
α01 = 0.5α, average 0→1 transitions
fclk = clock frequency
CL = total load capacitance
VDD = supply voltage
Example: 0.25μm CMOS Chip
• f = 500MHz
• Average capacitance = 15fF/gate
• VDD = 2.5V
• $10^6$ gates
• Power = $\alpha_{01} f C_L V_{DD}^2 = \alpha_{01} \times 500 \times 10^6 \times (15 \times 10^{-15} \times 10^6) \times 2.5^2$ = 46.9W, for α01 = 1.0
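The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch; the function name `dynamic_power` is hypothetical, and the second call with α01 = 0.1 is an assumed, more typical activity used only to show how the result scales.

```python
def dynamic_power(alpha01, f_clk, c_total, vdd):
    """Dynamic power in watts: alpha01 * f_clk * C_L * VDD^2."""
    return alpha01 * f_clk * c_total * vdd ** 2


gates = 10 ** 6
c_total = 15e-15 * gates          # 15 fF per gate, summed over all gates
p_full = dynamic_power(1.0, 500e6, c_total, 2.5)
p_tenth = dynamic_power(0.1, 500e6, c_total, 2.5)   # assumed activity
print(f"alpha01 = 1.0: {p_full:.1f} W")
print(f"alpha01 = 0.1: {p_tenth:.2f} W")
```

Because the expression is linear in α01 and in capacitance but quadratic in VDD, halving the supply voltage in this sketch cuts the result by a factor of four.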
Signal Activity, α
[Figure: waveforms over clock period T = 1/f; the clock is labeled α01 = 1.0 and the combinational signals are labeled α01 = 0.5.]
Reducing Dynamic Power
• Dynamic power reduction is
– Quadratic with reduction of supply voltage
– Linear with reduction of capacitance
0.25μm CMOS Inverter, VDD=2.5V
[Figure: two plots against Vin (V) from 0 to 2.5: Vout (V) from 0 to 2.5, and gain from 0 to -20.]
0.25μm CMOS Inverter, VDD< 2.5V
[Figure: two plots of Vout (V) against Vin (V), one over 0 to 2.5 V and one over 0 to 0.2 V, with the Gain = -1 points marked.]
Lower Bound on VDD
• For proper operation of a gate, the magnitude of the maximum gain (at Vin = VDD/2) should be greater than 1.
• $Gain_{max} = -(1/n)[\exp(V_{DD} / 2\Phi_T) - 1] = -1$
• n = 1.5
• ΦT = kT/q = 26mV
• VDD = 48mV
• VDDmin > 2 to 4 times kT/q or ~100mV at room temperature (27°C)
• Ref.: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
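Solving the gain condition for VDD gives $V_{DD} = 2\Phi_T \ln(1 + n)$, which is about 48 mV for the values on the slide. The sketch below evaluates that expression; the function name `vdd_at_unity_gain` is hypothetical.

```python
import math


def vdd_at_unity_gain(n=1.5, phi_t=0.026):
    """VDD (volts) at which (1/n)[exp(VDD / (2 phi_t)) - 1] equals 1."""
    return 2.0 * phi_t * math.log(1.0 + n)


vdd = vdd_at_unity_gain()
print(f"VDD at unity gain: {vdd * 1e3:.1f} mV")
```

The practical limit quoted on the slide (2 to 4 times kT/q) is higher than this value, which marks only the point where the gain magnitude drops to 1.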
Impact of VDD on Performance
$\text{Inverter delay} = K\, \frac{C_L V_{DD}}{(V_{DD} - V_t)^\alpha}$
[Figure: delay (ns, 0 to 40) and power (log scale) versus VDD, with the VDD axis marked at 0.6V, 1.8V and 3.0V and the point VDD = Vt indicated.]
Optimum Power × Delay
$\text{Power} \times \text{Delay},\ PD = \text{constant} \times \frac{V_{DD}^3}{(V_{DD} - V_t)^\alpha}$
For minimum power-delay product, d(PD)/dVDD = 0
$V_{DD} = \frac{3 V_t}{3 - \alpha}$
For long channel devices, α = 2, VDD = 3Vt
For very short channel devices, α = 1, VDD = 1.5Vt
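The closed-form optimum can be cross-checked by brute force. The sketch below searches a grid of supply voltages for the minimum of $V_{DD}^3 / (V_{DD} - V_t)^\alpha$; the function names, the grid range (Vt to 10 Vt) and the value Vt = 0.5 V are assumptions made for illustration.

```python
def power_delay(vdd, vt, alpha):
    """Power-delay product up to a constant: VDD^3 / (VDD - Vt)^alpha."""
    return vdd ** 3 / (vdd - vt) ** alpha


def best_vdd(vt, alpha, steps=100_000):
    """Grid search over (Vt, 10 Vt] for the VDD minimizing the product."""
    candidates = (vt * (1.0 + 9.0 * k / steps) for k in range(1, steps + 1))
    return min(candidates, key=lambda v: power_delay(v, vt, alpha))


vt = 0.5                                   # assumed threshold voltage, volts
long_channel = best_vdd(vt, alpha=2)       # formula predicts 3 Vt
short_channel = best_vdd(vt, alpha=1)      # formula predicts 1.5 Vt
print(f"alpha = 2: VDD = {long_channel:.3f} V (3 Vt   = {3 * vt:.3f} V)")
print(f"alpha = 1: VDD = {short_channel:.3f} V (1.5 Vt = {1.5 * vt:.3f} V)")
```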
Transistor Sizing for Performance
• Problem: If we increase W/L to speed up the charging or discharging of the load capacitance CL, then the increased W increases the load (input capacitance Cin) for the driving gate.
Fixed-Taper Buffer
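[Figure: chain of n inverters from Vin to Vout with relative sizes 1, α, α², ..., α^(i-1), ..., α^(n-1), driving load CL; the first stage has input capacitance Cin and the unit delay is t0.]
$C_i = \alpha^{i-1} C_{in}$
$C_L = \alpha^n C_{in}$
Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.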
Buffer (Cont.)
$\alpha^n = C_L / C_{in}$
$n = \frac{\ln(C_L / C_{in})}{\ln \alpha}$
ith stage delay, $t_i = \alpha t_0$, i = 1, . . . n, because each stage drives a stage α times bigger than itself.
Buffer (Cont.)
$\text{Total delay} = \sum_{i=1}^{n} t_i = n \alpha t_0 = \ln(C_L / C_{in})\, \alpha t_0 / \ln(\alpha)$
Buffer (Cont.)
Differentiating total delay with respect to
α and equating to 0, we get
αopt = e ≈ 2.7
The optimum number of stages is
nopt = ln(CL/Cin)
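The optimum taper can be illustrated by evaluating the total delay $\ln(C_L/C_{in})\, \alpha t_0 / \ln \alpha$ for a few taper factors. This is a sketch under assumptions: the function name `total_delay`, the ratio CL/Cin = 1000 and t0 = 1 are illustrative, and n is treated as a continuous quantity even though a real buffer needs a whole number of stages.

```python
import math


def total_delay(c_ratio, alpha, t0=1.0):
    """Delay of a fixed-taper buffer: n * alpha * t0 with n = ln(ratio)/ln(alpha)."""
    n = math.log(c_ratio) / math.log(alpha)
    return n * alpha * t0


c_ratio = 1000.0                      # assumed CL / Cin
tapers = (2.0, math.e, 4.0, 8.0)
delays = {alpha: total_delay(c_ratio, alpha) for alpha in tapers}
n_opt = math.log(c_ratio)             # number of stages at alpha = e

for alpha, delay in delays.items():
    print(f"alpha = {alpha:.3f}: delay = {delay:.2f} t0")
print(f"n_opt = {n_opt:.2f} stages")
```

The delay curve is shallow near α = e, so in this sketch tapers of 2 and 4 come out only a few percent slower than the optimum while 4 needs half as many stages as 2.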
Further Reading
B. S. Cherkauer and E. G. Friedman, “A Unified Design
Methodology for CMOS Tapered Buffers,” IEEE Trans.
VLSI Systems, vol. 3, no. 1, pp. 99-111, March 1995.
Logic Activity and Glitches
[Figure: example circuit with lines numbered 1 to 7 and gate delays d = 2, d = 1, d = 1 and d = 1.]
Glitch Power Reduction
• Design a digital circuit for minimum
transient energy consumption by
eliminating hazards
Theorem 1
• For correct operation with minimum
energy consumption, a Boolean gate
must produce no more than one event
per transition.
– Output logic state changes: one transition is necessary
– Output logic state unchanged: no transition is necessary
Inertial Delay of a Gate (Inverter)
$d = \frac{d_{HL} + d_{LH}}{2}$
[Figure: Vin and Vout waveforms versus time, with the delays dHL and dLH marked.]
Theorem 2
• Given that events occur at the input of a
gate with inertial delay d at times,
t1 ≤ . . . ≤ tn , the number of events at the
gate output cannot exceed
$\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$
[Figure: time axis with input events at t1, t2, t3, ..., tn spanning the interval tn - t1.]
Minimum Transient Design
• Minimum transient energy condition for a
Boolean gate:
$|t_i - t_j| < d$
where ti and tj are arrival times of input events and d is the inertial delay of the gate
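The bound of Theorem 2 and the minimum transient condition are easy to evaluate for a given set of input arrival times. The sketch below is illustrative: the function names are hypothetical, and the bound is rounded down to a whole number, which is an addition made here because events are counted in integers.

```python
def max_output_events(arrival_times, d):
    """Upper bound on output events: min(n, 1 + (tn - t1) / d), rounded down."""
    if not arrival_times:
        return 0
    span = max(arrival_times) - min(arrival_times)
    return min(len(arrival_times), 1 + int(span // d))


def meets_minimum_transient_condition(arrival_times, d):
    """True when every pair of input events satisfies |ti - tj| < d."""
    if not arrival_times:
        return True
    # The largest pairwise difference is between the earliest and latest event.
    return max(arrival_times) - min(arrival_times) < d


skewed = [0, 1, 2, 3]          # input events spread over 3 time units
balanced = [0.0, 0.5, 1.0]     # input events within 1 time unit
d = 2                          # inertial delay of the gate
print(max_output_events(skewed, d), meets_minimum_transient_condition(skewed, d))
print(max_output_events(balanced, d), meets_minimum_transient_condition(balanced, d))
```

When the condition holds, the bound collapses to a single output event, which is the situation Theorem 1 asks for.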
Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays.]
Hazard Filter Method
• Gate delay is made greater than maximum input path
delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delays.]
Glitch-Free Design by Linear
Programming
• Variables: gate and buffer delays
• Objective: minimize number of buffers
• Subject to: overall circuit delay
• Subject to: minimum transient condition for multi-input gate
Variables for Full-Adder
Delay variables
• Gate delay variables d4 . . . d12
• Buffer delay variables d15 . . . d29
Delay variables are located at the checkpoints of the circuit.
Objective Function
• Ideal: minimize the number of non-zero
delay buffers
• Actual: minimize sum of buffer delays
Specify Critical Path Delay
[Figure: original design annotated with delay values of 0 and 1.]
Sum of delays on critical path ≤ maxdel
Multi-Input Gate Condition
[Figure: multi-input gate with delay d whose input paths have delays d1 and d2.]
$|d_1 - d_2| \le d \ \equiv\ d_1 - d_2 \le d \ \text{and}\ d_2 - d_1 \le d$
Results: 1-Bit Adder
R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language
for Mathematical Programming, South San Francisco: The Scientific Press,
1993.
AMPL Solution: maxdel = 6
[Figure: 1-bit adder annotated with the delay values of the AMPL solution.]
AMPL Solution: maxdel = 7
[Figure: 1-bit adder annotated with the delay values of the AMPL solution.]
AMPL Solution: maxdel ≥ 11
[Figure: 1-bit adder annotated with the delay values of the AMPL solution.]
Removing a Limitation
• Constraints are written by path enumeration.
• Since number of paths in a circuit can be exponential
in circuit size, the formulation is infeasible for large
circuits.
• Example: c880 has 6.96M constraints.
• Solution: A linear complexity method. See,
– T. Raja, Master’s Thesis, Rutgers University, 2002.
– T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic
Power CMOS Circuit Design by a Reduced Constraint Set
Linear Program,” Proc. 16th International Conf. VLSI Design,
2003, pp. 527-532.
Comparison of Constraints
[Figure: number of constraints versus number of gates in circuit.]
Benchmark Circuits
Circuit   Maxdel. (gates)   No. of Buffers   Normalized Power
                                             Average   Peak
C432      17                95               0.72      0.67
          34                66               0.62      0.60
C880      24                62               0.68      0.54
          48                34               0.68      0.52
C6288     47                294              0.40      0.36
          94                120              0.36      0.34
C7552     43                366              0.38      0.34
          86                111              0.36      0.32
c7552: 3,500-gate CMOS Circuit
[Figure: instantaneous energy (×10^-10 Joules) versus clock cycles.]
References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997, pp. 45-51.
• T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey, 2002.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, pp. 121-128, 2006.
Static (Leakage) Power
• Dynamic
– Signal transitions
• Logic activity
• Glitches
– Short-circuit
• Static
– Leakage
Leakage Power
[Figure: cross-section of a MOS transistor with n+ source and drain regions, connected between ground and VDD through R, showing the leakage currents IG, Isub, IPT, IGIDL and ID.]
Leakage Current Components
• Subthreshold conduction, Isub
• Reverse bias pn junction conduction, ID
• Gate induced drain leakage, IGIDL due to
tunneling at the gate-drain overlap
• Drain source punchthrough, IPT due to
short channel and high drain-source
voltage
• Gate tunneling, IG through thin oxide
Subthreshold Current
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH}) / n V_t\}$
μ0: carrier surface mobility
Cox: gate oxide capacitance per unit area
L: channel length
W: gate width
Vt = kT/q: thermal voltage
n: a technology parameter
IDS for Short Channel Device
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH} + \eta V_{DS}) / n V_t\}$
VDS = drain to source voltage
η: a proportionality factor
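Because the threshold voltage sits inside the exponent, a modest reduction in VTH multiplies the subthreshold current by a large factor. The sketch below computes that factor, exp(ΔVTH / nVt), with everything else held fixed; the function name `leakage_increase`, the value n = 1.5, the temperature of 300 K and the 100 mV reduction are assumptions chosen for illustration.

```python
import math

BOLTZMANN_OVER_Q = 8.617333262e-5   # k/q in volts per kelvin


def leakage_increase(delta_vth, n=1.5, temp_k=300.0):
    """Factor by which Isub grows when VTH is lowered by delta_vth volts."""
    v_thermal = BOLTZMANN_OVER_Q * temp_k        # Vt = kT/q
    return math.exp(delta_vth / (n * v_thermal))


factor = leakage_increase(0.100)                  # assumed 100 mV reduction
print(f"Vt = {BOLTZMANN_OVER_Q * 300.0 * 1e3:.1f} mV")
print(f"Isub grows by about {factor:.1f}x for a 100 mV lower VTH")
```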
Increased Subthreshold Leakage
[Figure: log Isub versus gate voltage for the original device and the scaled device, with thresholds VTH and VTH' (VTH' below VTH) and the current level Ic marked.]
Reducing Leakage Power
• Leakage power as a fraction of the total power
increases as clock frequency drops. Turning
supply off in unused parts can save power.
• For a gate it is a small fraction of the total power;
it can be significant for very large circuits.
• Scaling down features requires lowering the threshold voltage, which increases leakage power; leakage roughly doubles with each technology shrink.
• Multiple-threshold devices are used to reduce leakage power.
Problem Statement
• Problem: To Design a CMOS Circuit,
– using dual-threshold devices to globally minimize
subthreshold leakage
– using delay elements to eliminate all glitches
– maintaining specified performance
– allowing performance-power tradeoff
• Reference: Y. Lu and V. D. Agrawal, “Leakage
and Dynamic Glitch Power Minimization Using
Integer Linear Programming for Vth Assignment
and Path Balancing,” Proc. PATMOS, 2005, pp.
217-226.
MILP: Mixed Integer Linear Program
$\text{Minimize}\ \left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\}$
where
Xi = 1: gate i has low Vth, and its leakage is ILi
Xi = 0: gate i has high Vth, and its leakage is IHi
Δdij = delay inserted between gates i and j for glitch suppression
Xi ∈ {0, 1} is an integer variable; Δdij is a real variable
ILi and IHi are constants for gate i obtained by SPICE simulation
MILP - Constraints
• Circuit delay constraint for each PO i:
$T_i \le T_{max}$
Tmax can be the delay of critical path or clock period specified by the circuit designer.
• Glitch suppression constraint for each gate i:
$T_i \ge T_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi}$   (1)
$t_i \le t_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi}$   (2)
$X_i D_{Li} + (1 - X_i) D_{Hi} \ge T_i - t_i$   (3)
Constraints (1), (2) and (3) make sure that Ti - ti does not exceed the gate delay di for each gate, so glitches are eliminated.
Ti is the latest signal arrival time at the output of gate i.
ti is the earliest signal arrival time at the output of gate i.
Power-Delay Tradeoff Example
14-Gate Full Adder (Unoptimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth gates, critical path.]
Ileak = 161 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth, high Vth, delay buffer (high Vth), critical path.]
Ileak = 73 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = 1.25Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth, high Vth, delay buffer (high Vth), critical path.]
Ileak = 16 pA
Leakage Reduction and Performance Tradeoff @ 27℃, 70nm
Tc = critical path delay; CPU secs. measured on Sun OS 5.7.
                                        Optimized, Tmax = Tc                  Optimized, Tmax = 1.25Tc
Circuit  # gates  Tc (ns)  Unoptimized  Ileak (μA)  Leakage    CPU secs.      Ileak (μA)  Leakage    CPU secs.
                           Ileak (μA)               Reduction                             Reduction
C432     160      0.751    2.620        1.022       61.0%      0.42           0.132       95.0%      0.3
C499     182      0.391    4.293        3.464       19.3%      0.08           0.225       94.8%      1.8
C880     328      0.672    4.406        0.524       88.1%      0.24           0.153       96.5%      0.3
C1355    214      0.403    4.388        3.290       25.0%      0.1            0.294       93.3%      2.1
C1908    319      0.573    6.023        2.023       66.4%      59             0.204       96.6%      1.3
C2670    362      1.263    5.925        0.659       90.4%      0.38           0.125       97.9%      0.16
C3540    1097     1.748    15.622       0.972       93.8%      3.9            0.319       98.0%      0.74
C5315    1165     1.589    19.332       2.505       87.1%      140            0.395       98.0%      0.71
C6288    1189     2.177    23.142       6.075       73.8%      277            0.678       97.1%      7.48
C7552    1046     1.915    22.043       0.872       96.0%      1.1            0.445       98.0%      0.58
Leakage, Dynamic and Total Power Comparison @ 90℃, 70nm
                   Leakage Power                    Dynamic Power                    Total Power
Circuit  # Gates   Pleak1*  Pleak2*  Leakage        Pdyn1*   Pdyn2*   Dynamic        Ptotal1*  Ptotal2*  Total
                   (uW)     (uW)     Reduction      (uW)     (uW)     Reduction      (uW)      (uW)      Reduction
C432     160       35.77    11.87    66.8%          101.0    73.3     27.4%          136.8     85.2      37.7%
C499     182       50.36    39.94    20.7%          225.7    160.3    29.0%          276.1     200.2     27.5%
C880     328       85.21    11.05    87.0%          177.3    128.0    27.8%          262.5     139.1     47.0%
C1355    214       54.12    39.96    26.3%          293.3    165.7    43.5%          347.4     205.7     40.8%
C1908    319       92.17    29.69    67.8%          254.9    197.7    22.4%          347.1     227.4     34.5%
C2670    362       115.4    11.32    90.2%          128.6    100.8    21.6%          244.0     112.1     54.1%
C3540    1097      302.8    17.98    94.1%          333.2    228.1    31.5%          636.0     246.1     61.3%
C5315    1165      421.1    49.79    88.2%          465.5    304.3    34.6%          886.6     354.1     60.1%
C6288    1189      388.5    97.17    75.0%          1691.2   405.6    76.0%          2079.7    502.8     75.8%
C7552    1046      444.4    18.75    95.8%          380.9    227.8    40.2%          825.3     246.6     70.1%
* 1: unoptimized circuits; 2: optimized circuits.
Low-Power System Design
• State encoding
– Bus encoding
– Finite state machine
• Clock gating
– Flip-flop
– Shift register
• Microprocessors
– Single processor
– Multi-core processor
Bus Encoding
• Example: Four bit bus
• 0000→1110 has three transitions.
• If bits of second pattern are inverted, then 0000→0001 will
have only one transition.
• Bit-inversion encoding for N-bit bus:
[Figure: number of bit transitions after inversion encoding versus number of bit transitions; both axes are marked 0, N/2 and N.]
Bus-Inversion Encoding Logic
[Figure: sent data passes through polarity decision logic and a bus register, together with a polarity bit, to the received data.]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
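The bus-inversion idea can be sketched in software. The version below is a simplified illustration rather than the circuit in the cited paper: the function names are hypothetical, the bus is assumed to start at all zeros, the word is inverted when more than half of the data lines would toggle, and transitions on the polarity line itself are not counted.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width, initial_bus=0):
    """Return a list of (bus_value, polarity_bit) pairs, one per data word."""
    mask = (1 << width) - 1
    bus = initial_bus
    encoded = []
    for word in words:
        if hamming_distance(bus, word) > width / 2:
            bus, polarity = ~word & mask, 1    # send the complement
        else:
            bus, polarity = word, 0            # send the word unchanged
        encoded.append((bus, polarity))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, polarity_bit) pairs."""
    mask = (1 << width) - 1
    return [~bus & mask if polarity else bus for bus, polarity in encoded]


def data_line_transitions(values, initial_bus=0):
    """Total number of data-line toggles for a sequence of bus values."""
    total, previous = 0, initial_bus
    for value in values:
        total += hamming_distance(previous, value)
        previous = value
    return total


words = [0b1110, 0b0001, 0b0110]
encoded = bus_invert_encode(words, width=4)
plain = data_line_transitions(words)
coded = data_line_transitions([bus for bus, _ in encoded])
print(encoded, plain, coded)
```

The first word reproduces the slide's example: 0000→1110 would toggle three lines, so 0001 is sent with the polarity bit set.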
FSM State Encoding
[Figure: two state transition graphs of the same three-state machine with different state encodings (11, 00, 01 and 01, 00, 11); edges are labeled with transition probabilities based on PI statistics.]
Expected number of state-bit transitions:
2(0.3+0.4) + 1(0.1+0.1) = 1.6
1(0.3+0.4+0.1) + 2(0.1) = 1.0
State encoding can be selected using a power-based cost function.
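The cost function for comparing state encodings is the probability-weighted Hamming distance summed over the state-changing edges. In the sketch below the state names X, Y, Z and the edge list are hypothetical: the slide's graph is not recoverable from the transcript, so the edges are one assignment of the probabilities 0.4, 0.3, 0.1 and 0.1 that reproduces the slide's two totals.

```python
def expected_bit_transitions(edges, encoding):
    """Sum of probability * Hamming distance over state-changing edges."""
    return sum(
        prob * bin(encoding[src] ^ encoding[dst]).count("1")
        for src, dst, prob in edges
    )


# Assumed edge structure; self-loops are omitted because they toggle no bits.
edges = [("X", "Y", 0.4), ("Y", "X", 0.3), ("Y", "Z", 0.1), ("Z", "X", 0.1)]

encoding_a = {"X": 0b11, "Y": 0b00, "Z": 0b01}
encoding_b = {"X": 0b01, "Y": 0b00, "Z": 0b11}

cost_a = expected_bit_transitions(edges, encoding_a)
cost_b = expected_bit_transitions(edges, encoding_b)
print(f"encoding A: {cost_a:.1f}, encoding B: {cost_b:.1f}")
```

The second encoding places the two most frequently exchanged states one bit apart, which is what lowers the cost.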
FSM: Clock-Gating
• Moore machine: Outputs depend only on
the state variables.
– If a state has a self-loop in the state transition
graph (STG), then clock can be stopped
whenever a self-loop is to be executed.
[Figure: state transition graph with states Si, Sj and Sk and edges labeled Xi/Zk, Xj/Zk and Xk/Zk, where Xk/Zk is a self-loop on Sk.]
Clock can be stopped when (Xk, Sk) combination occurs.
Clock-Gating in Moore FSM
[Figure: combinational logic with primary inputs (PI), primary outputs (PO) and flip-flops; clock activation logic and a latch gate the clock CK to the flip-flops.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop with data input D, output Q and clock CK.]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
Reduced-Power Shift Register
[Figure: two chains of D flip-flops clocked by CK(f/2), combined by a multiplexer to form the output.]
Flip-flops are operated at full voltage and half the clock frequency.
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
– Voltage reduction for dynamic power
– Dual-threshold devices for leakage reduction
– Clock gating, frequency reduction
– Sleep mode
• Architecture:
– Instruction set
– Hardware organization
• Software methods
SIA Roadmap for Processors (1999)
Year                     1999   2002   2005   2008   2011   2014
Feature size (nm)        180    130    100    70     50     35
Logic transistors/cm2    6.2M   18M    39M    84M    180M   390M
Clock (GHz)              1.25   2.1    3.5    6.0    10.0   16.9
Chip size (mm2)          340    430    520    620    750    900
Power supply (V)         1.8    1.5    1.2    0.9    0.6    0.5
High-perf. Power (W)     90     130    160    170    175    183
Source: http://www.semichips.org
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
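The chain of reductions can be replayed by dividing the starting power by each factor in turn. The sketch below uses the factors exactly as listed on the slide; the variable names are illustrative, and the voltage factor is also recomputed from the quadratic dependence on supply voltage.

```python
steps = [
    ("Reduce voltage to 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power = 26.0                       # Alpha 21064 starting point, watts
trace = []
for name, factor in steps:
    power /= factor
    trace.append((name, round(power, 1)))
    print(f"{name:34s} /{factor:<5} -> {power:.2f} W")

# Dynamic power scales with the square of the supply voltage.
voltage_factor = (3.45 / 1.5) ** 2
print(f"(3.45 / 1.5)^2 = {voltage_factor:.2f}")
```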
Low-Power Datapath Architecture
• Lower supply voltage
– This slows down circuit speed
– Use parallel computing to gain the speed back
• Works well when threshold voltage is also
lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W.
Brodersen, Low Power Digital CMOS Design,
Boston: Kluwer Academic Publishers (Now
Springer), 1995.
A Reference Datapath
[Figure: input register, combinational logic and output register clocked by CK, with total switched capacitance Cref.]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: the input feeds N copies of the combinational logic (Copy 1 to Copy N), each with a register clocked at f/N; an N to 1 multiplexer feeds the output register clocked at f; a multiphase clock generator driven by CK provides the clocks and mux control.]
N = degree of parallelism
A copy processes every Nth input and operates at reduced voltage.
Supply voltage: VN ≤ V1 = Vref
Control Signals, N = 4
[Figure: timing of clock CK and control signals Phase 1, Phase 2, Phase 3 and Phase 4.]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f / N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)]\, C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{V_N^2}{V_{ref}^2}$
Voltage vs. Speed
$\text{Delay of a gate},\ T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
Vt is threshold voltage
[Figure: normalized gate delay T (0 to 4.0) versus supply voltage for 1.2μ CMOS, with points N = 1 at Vref = 5V, N = 2 at V2 = 2.9V and N = 3 at V3; voltage reduction slows down as we get closer to Vt.]
Increasing Multiprocessing
[Figure: PN/P1 (0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0.8V, Vt = 0.4V and Vt = 0V (extreme case).]
Extreme Cases: Vt = 0
Delay, T ∝ 1/Vref
For N processing elements, delay = NT → VN = Vref/N
$\frac{P_N}{P_1} = [1 + \delta (N - 1)]\, \frac{1}{N^2} \to 1/N$
For negligible overhead, δ→0
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an optimum value of N.
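The Vt = 0 ratio is simple enough to tabulate directly. In the sketch below the function name `power_ratio_vt0` and the overhead values δ = 0, 0.1 and 1 are illustrative assumptions; δ = 1 is included only because it makes the expression reduce algebraically to 1/N.

```python
def power_ratio_vt0(n, delta):
    """P_N / P_1 for Vt = 0, where V_N = Vref / N."""
    return (1.0 + delta * (n - 1)) / n ** 2


for n in (1, 2, 4, 8):
    row = [power_ratio_vt0(n, delta) for delta in (0.0, 0.1, 1.0)]
    print(f"N = {n}: " + ", ".join(f"{value:.4f}" for value in row))
```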
Example: Multiplier Core
• Specification:
– 200MHz Clock
– 15W dissipation @ 5V
– Low voltage operation, VDD ≥ 1.5 volts
$\text{Relative clock rate} = \frac{(V_{DD} - 0.5)^2}{20.25}$
• Problem:
– Integrate multiplier core on a SOC
– Power budget for multiplier ~ 5W
A Multicore Design
[Figure: input register clocked at 200MHz feeds five multiplier cores (Core 1 to Core 5), each with a register clocked at 40MHz; a 5 to 1 mux feeds the output register clocked at 200MHz; a multiphase clock generator driven by CK provides the clocks and mux control.]
Core clock frequency = 200/N, N should divide 200.
How Many Cores?
• For N cores:
– Clock frequency = 200/N MHz
– Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ Volts
• Assuming 10% overhead per core,
$\text{Power dissipation} = 15\, [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts
Design Tradeoffs
Number of cores N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                   200           5.00                       15.0
2                   100           3.68                       8.94
4                   50            2.75                       5.90
5                   40            2.51                       5.29
8                   25            2.10                       4.50
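The table can be regenerated from the relative clock rate expression and the 10% per-core overhead assumption. The sketch below is illustrative and its function names are hypothetical; because it carries the supply voltage at full precision, its power values for N = 5 and N = 8 come out slightly different from the table, which appears to use the voltage rounded to two decimals.

```python
import math


def core_supply(n):
    """Supply voltage (V) at which a core runs at 1/n of the 200 MHz rate."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n, overhead=0.1, p_ref=15.0, v_ref=5.0):
    """Total power (W) of n cores with the given overhead per added core."""
    return p_ref * (1.0 + overhead * (n - 1)) * (core_supply(n) / v_ref) ** 2


print(" N  Clock (MHz)  VDDN (V)  Power (W)")
for n in (1, 2, 4, 5, 8):
    print(f"{n:2d}  {200 / n:11.0f}  {core_supply(n):8.2f}  {total_power(n):9.2f}")
```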
Pipeline Architecture
[Figure: a reference design (input register, processor, output register at frequency f) and a pipelined design in which the processor is split into two ½ Proc. stages separated by a register, also at frequency f.]
Reference design: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
Pipelined design: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\, CV^2 f$
Approximate Trend
               n-parallel proc.   n-stage pipeline proc.
Capacitance    nC                 C
Voltage        V/n                V/n
Frequency      f/n                f
Power          CV²f/n²            CV²f/n²
Chip area      n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Multicore Processors
[Figure (Computer, May 2005, p. 12): performance based on SPECint2000 and SPECfp2000 benchmarks for single core and multicore processors, 2000 to 2008.]
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore
Processors,” Computer, vol. 38, no. 5, pp. 11-13,
May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.
Cell - Cell Broadband Engine Architecture
[Photo © IEEE Spectrum, January 2006. L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony]
Nine-processor chip: 192 Gflops
Cell’s Nine-Processor Chip
[Figure © IEEE Spectrum, January 2006]
Eight Identical Processors
f = 5.6GHz (max)
44.8 Gflops
Books on Low-Power Design (1)
• L. Benini and G. De Micheli, Dynamic Power Management: Design Techniques and CAD Tools, Boston: Springer, 1998.
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002.
• A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995.
• A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998.
• J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999.
• M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
• S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998.
• J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience, 1999.
• J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997.
• S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005.
• W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.
Books on Low-Power Design (2)
• N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
• V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
• M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
• C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
• J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
• S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.
• K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
• E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
• W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
• S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
• G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
• G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
• K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.
Other Books Useful in Low-Power
Design
• A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press, 2001.
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
• S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996.
• E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005.
• J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003.
• J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.