Low Power Design - Universidade Federal de Minas Gerais

VLSI Design
Power
Frank Sill Torres
Department of Electronic Engineering, Federal University of Minas Gerais,
Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
http://www.cpdee.ufmg.br/~frank/
TRENDS
Copyright Sill Torres, 2012
Trend: Performance
[Figure: processor performance in MIPS (logarithmic scale from 0.01 to 1,000,000, with a 1 TIPS mark) versus year (1970 to 2020) for the 8080, 8086, 386, Pentium® and Pentium® 4 processors]
Source: Moore, ISSCC 2003
Trends – Power Dissipation
[Figure: SoC consumer portable power trend. Source: ITRS, 2010 Update]
Trends - Power Density
[Figure: power density, with reference marks for a hot plate and a nuclear reactor. Source: http://cpudb.stanford.edu/]
Problems of High Power Dissipation

Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a main problem
High power dissipation leads to:
- Reduced time of operation
- Greater cooling effort
- Higher weight (batteries)
- Increasing operational costs
- Reduced mobility
- Reduced reliability
Chip Power Density Distribution
[Figures: power map and on-die temperature map]
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot-spots
Impact on packaging, cooling
“The Internet is an Electricity Hog”
Badische Zeitung, 2003
Energy for the internet in 2001 in Germany:
6.8 bn kWh = 1.4 % of total energy consumption
- 2.35 bn kWh for 17.3 million internet PCs
- 1.91 bn kWh for servers
- 1.67 bn kWh for the network
- 0.87 bn kWh for UPS (uninterruptible power supplies)
Rate of growth (at the moment): 36 % per year
Prognosis for 2010: 33 bn kWh
- > 6 % of total energy consumption
- > 3 medium nuclear power plants
World: 400 million PCs → 0.16 PW (P = Peta = 10^15)
Dissipation in a Notebook
[Block diagram of notebook components: peripherals (disk, display), processing (ASICs, programmable µPs or DSPs, memory), communication (WLAN, Ethernet), power supply (battery, DC-DC converter)]
Examples for Energy Dissipation
[Figures: energy dissipation in a notebook; energy dissipation in a PDA]
Battery Capacity
Generalized Moore's Law
Intel beats Varta
Capacity of batteries: 2% - 6% increase per year (up to year 2000)
Source: Timmernann, 2007
Current Progress
[Figure: battery, 20 kg]
Factor 4 in the last 10 years → still far too little
POWER CONSUMPTION IN CMOS
Metrics: Energy and Power

Energy
- Measured in Joules or kWh
- “Measure of the ability of a system to do work or produce a change”
- “No activity is possible without energy.”
Power
- Measured in Watts or kW
- “Amount of energy required for a given unit of time.”
Average power
- Average amount of energy consumed per unit time
- Simplified to "power" in clear contexts
Instantaneous power
- Energy consumed per unit time as the time interval goes to zero
Metrics: Energy and Power cont’d

Instantaneous Electrical Power P(t)
- P(t) = v(t) * i(t)
- v(t): potential difference (or voltage drop) across component
- i(t): current through component
Electrical Energy
- E = P(t) * t = v(t) * i(t) * t
Electrical Energy in CMOS circuits
- Energy = Power * Delay
- Why?
Consumption in CMOS

Voltage (Volt, V) ↔ Water pressure (bar)
Current (Ampere, A) ↔ Water quantity per second (liter/s)
Energy ↔ Amount of water
[Figure: load capacitance CL charged from 0 to 1]
Energy consumption is proportional to capacitive load!
Consumption in CMOS cont’d

Energy for calculation is only consumed at a 0→1 transition at the output
Energy and Instantaneous Power
INV1: high instantaneous power (bigger width), load CL, delay td1
INV2: low instantaneous power, load CL, delay td2
- Same energy (Cin ignored)
- INV1 is faster
Metrics: Energy and Power cont’d
Power is height of curve
[Figure: power in Watts versus time for Approach 1 and Approach 2]
Energy is area under curve
[Figure: power in Watts versus time for Approach 1 and Approach 2]
Energy = Power * time for calculation = Power * Delay
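The relation between the two curves can be sketched numerically. The following minimal Python example assumes two hypothetical, piecewise-constant power traces (the values 4 W / 5 ns and 2 W / 10 ns are illustrative and not taken from the slides) and shows that a design with higher instantaneous power can consume the same energy if it finishes sooner.

```python
def energy(power_samples, dt):
    """Approximate the area under a sampled power curve (rectangle rule)."""
    return sum(p * dt for p in power_samples)

dt = 1e-9  # assumed sample spacing: 1 ns

# Hypothetical traces for the same computation (illustrative values only)
approach_1 = [4.0] * 5   # 4 W for 5 ns: high power, short delay
approach_2 = [2.0] * 10  # 2 W for 10 ns: low power, long delay

e1 = energy(approach_1, dt)
e2 = energy(approach_2, dt)

print("peak power:", max(approach_1), "W vs", max(approach_2), "W")
print("energy:    ", e1, "J vs", e2, "J")
```

Peak power differs by a factor of two, while the energy (area under each curve) is the same up to floating-point rounding.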
Metrics: Energy and Power cont’d

Energy dissipation
- Determines battery life in hours
- Sets packaging limits
Peak power
- Determines power ground wiring designs
- Impacts signal noise margin and reliability analysis
Metrics: PDP and EDP

Power-Delay Product
- Power P, delay tp
- Quality criterion PDP = P * tp [J]
- P and tp have the same weight
- Two designs can have the same PDP, even if tp = 1 year
Energy-Delay Product
- EDP = PDP * tp = P * tp^2
- Delay tp has higher weight
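As a small illustration of why EDP weights delay more strongly, the sketch below compares two hypothetical designs (the power and delay values are assumptions chosen for the example). Both have the same PDP, but the slower one has twice the EDP.

```python
def pdp(power_w, delay_s):
    """Power-delay product in joules."""
    return power_w * delay_s

def edp(power_w, delay_s):
    """Energy-delay product in joule-seconds: PDP * tp = P * tp^2."""
    return pdp(power_w, delay_s) * delay_s

# Hypothetical designs (illustrative values only)
design_a = (2e-3, 1e-9)  # 2 mW, 1 ns
design_b = (1e-3, 2e-9)  # 1 mW, 2 ns

pdp_a, pdp_b = pdp(*design_a), pdp(*design_b)
edp_a, edp_b = edp(*design_a), edp(*design_b)

print("PDP:", pdp_a, pdp_b)  # equal energy per operation
print("EDP:", edp_a, edp_b)  # the slower design is penalized
```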
Energy and Power

Average power is directly proportional to energy
- In the following: power means average power
Where Does Power Go in CMOS?

Dynamic Power Consumption
- Charging and discharging capacitors
Short Circuit Currents
- Short circuit path between supply rails during switching
Leakage
- Leaking diodes and transistors
Dynamic Power Consumption
[Figure: CMOS inverter with supply VDD, input Vin, output Vout and load capacitance CL]
f01 = α * f
Pdyn = CL * VDD^2 * P01 * f
P01: probability for 0-to-1 switch of output
f: clock frequency
α: activity
Data dependent - a function of switching activity!
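The formula Pdyn = CL * VDD^2 * P01 * f can be evaluated directly. The sketch below uses assumed example values (10 fF load, 1 GHz clock, P01 = 3/16); none of these numbers come from the slides. It mainly illustrates the quadratic dependence on VDD.

```python
def dynamic_power(c_load, vdd, p01, f_clk):
    """Pdyn = CL * VDD^2 * P01 * f, in watts."""
    return c_load * vdd ** 2 * p01 * f_clk

# Assumed example values (not from the slides)
C_LOAD = 10e-15  # 10 fF
P01 = 3 / 16     # 0-to-1 transition probability
F_CLK = 1e9      # 1 GHz

p_full = dynamic_power(C_LOAD, 1.2, P01, F_CLK)
p_half = dynamic_power(C_LOAD, 0.6, P01, F_CLK)

print("Pdyn at 1.2 V:", p_full, "W")
print("Pdyn at 0.6 V:", p_half, "W")
print("ratio:", p_half / p_full)  # halving VDD quarters the dynamic power
```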
Dynamic Power Consumption

\[
E_C = \int_0^{\infty} I(t)\, V_{DD}\, dt
    = \int_0^{\infty} C_L \frac{dV}{dt}\, V_{DD}\, dt
    = C_L V_{DD} \int_0^{V_{DD}} dV
    = C_L V_{DD}^2
\]
Transition Probabilities for CMOS Cells
Example: Static 2 Input NOR Cell
Truth table of NOR2 cell
A  B  Out
1  1  0
1  0  0
0  1  0
0  0  1
If A and B have the same input signal probability:
PA=1 = 1/2
PB=1 = 1/2
Then:
POut=0 = 3/4
POut=1 = 1/4
P0→1 = POut=0 * POut=1 = 3/4 * 1/4 = 3/16
Ceff = P0→1 * CL = 3/16 * CL
Transition Probabilities cont’d






A and B with different input signal probability:
- PA and PB: probability that input is 1
- P1: probability that output is 1
- Switching activity in CMOS circuits: P01 = P0 * P1
- For 2-input NOR: P1 = (1-PA)(1-PB)
- Thus: P01 = (1-P1) * P1 = [1-(1-PA)(1-PB)] * (1-PA)(1-PB)
P01 = Pout=0 * Pout=1 for two-input cells:
NOR:  (1 - (1 - PA)(1 - PB)) * (1 - PA)(1 - PB)
OR:   (1 - PA)(1 - PB) * (1 - (1 - PA)(1 - PB))
NAND: PAPB * (1 - PAPB)
AND:  (1 - PAPB) * PAPB
XOR:  (1 - (PA + PB - 2PAPB)) * (PA + PB - 2PAPB)
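The closed-form expressions can be cross-checked by enumerating each gate's truth table. The following Python sketch assumes statistically independent inputs (the same assumption behind P01 = P0 * P1); the function and dictionary names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product

# Two-input gate functions on 0/1 values
GATES = {
    "NOR": lambda a, b: int(not (a or b)),
    "OR": lambda a, b: int(a or b),
    "NAND": lambda a, b: int(not (a and b)),
    "AND": lambda a, b: int(a and b),
    "XOR": lambda a, b: a ^ b,
}

def p01(gate, pa, pb):
    """P0->1 = P(out=0) * P(out=1), assuming independent inputs."""
    func = GATES[gate]
    p1 = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        # Probability of this input combination
        weight = (pa if a else 1 - pa) * (pb if b else 1 - pb)
        if func(a, b):
            p1 += weight
    return (1 - p1) * p1

half = Fraction(1, 2)
for name in GATES:
    print(name, p01(name, half, half))
```

For PA = PB = 1/2 this gives 3/16 for NOR, OR, NAND and AND, and 1/4 for XOR.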
Transition Probabilities cont’d
Transition Probability of NOR2 Cell as a Function of Input Probabilities
Probability of input signals → high influence on P01
Source: Timmernann, 2007
Short Circuit Power Consumption
[Figure: CMOS inverter (VDD, Vin, Vout, CL, GND) with short-circuit current Isc flowing during tsc]
Finite slope of input signal
During switching: NMOS and PMOS transistors are conducting for a short period of time (tsc)
Direct current path between VDD and GND
Psc = VDD * Isc * (P01 + P10)
Leakage Power Consumption
[Figure: inverter with load CL and transistor cross-section (gate, source, drain, SiO2, channel length L) showing Isub and Igate]
Most important leakage currents:
- Subthreshold leakage Isub
- Gate oxide leakage Igate
Pleak = Ileak * VDD ≈ (Isub + Igate) * VDD
Power Equations in CMOS
P = α f CL VDD^2 + VDD Ipeak (P01 + P10) + VDD Ileak
- Dynamic power (≈ 40 - 70 % today and decreasing relatively)
- Short-circuit power (≈ 10 % today and decreasing absolutely)
- Leakage power (≈ 20 - 50 % today and increasing)
LEAKAGE
Trends
[Figure: transistor scaling roadmap with gate lengths from 50 nm down to 5 nm and technology options: SiGe S/D, strained silicon, high-k, metal gate, tri-gate, nanowire, III-V, carbon nanotube FET]
Trends cont‘d
[Figure: power dissipation in W (0 to 1400) of a 100 mm² chip for the 90 nm, 65 nm, 45 nm, 32 nm, 22 nm and 16 nm technologies, split into dynamic power dissipation and power dissipation by leakage currents]
Source: S. Borkar (Intel), ‘05
Recap: Transistor Geometrics
[Figure: MOS transistor with polysilicon gate, gate width W, gate length L, n+ source and drain regions, p-type body and SiO2 gate oxide (good insulator, εox = 3.9)]
tox: thickness of oxide layer
Source: Rabaey, “Digital Integrated Circuits”, 1995
Subthreshold Leakage


Threshold voltage Vth
- Transistor characteristic
- If gate-source voltage Vgs is higher than Vth: channel under gate, current between drain and source
- If Vgs is lower than Vth: (ideal) no current
Subthreshold leakage Isub
- Leakage between drain and source when Vgs < Vth
- Based on: short channels, diffusion, thermionic emission
[Figures: transistor cross-sections (gate, source, drain) for Vgs > Vth and for Vgs < Vth with Isub; diffusion from high concentration to low concentration]
Subthreshold Leakage cont’d
[Figure: log(drain current) versus gate voltage of an NMOS transistor, showing the subthreshold region (Isub), the region where the transistor is conducting, and the thresholds Vth' (short-channel device) and Vth]
Source: Agarwal, 2007
Drain Induced Barrier Lowering (DIBL)
[Figure: potential between source and drain for Vgs > Vth and Vgs < Vth; the height of the curve is the potential barrier, which is changed by the gate voltage]
Electrons have to overcome the potential barrier to enter the channel
Ideal: potential barrier is only controlled by gate voltage
Drain Induced Barrier Lowering cont’d
[Figure: potential barrier at Vds = Vth and Vds = VDD for a long-channel transistor (L > 2 µm) and a short-channel transistor (L < 180 nm); the barrier is lowered in the short-channel transistor]
In short-channel transistors the potential barrier is also affected by the drain voltage
→ If Vds = VDD, transistors can start to conduct even if Vgs < Vth
Temperature dependence
[Figure: normalized Isub/µm (0 to 20) versus temperature (0 to 120 °C) for a 130 nm technology; labels: Isub at 25 °C, IOFF at 110 °C, 6x. Source: Chatterjee, Intel-labs]
Based on thermionic emission: subthreshold leakage Isub increases with temperature
Gate Oxide Leakage
Tunneling effect
- When an electromagnetic wave strikes a barrier: reflection + intrusion into the barrier
- If the thickness is small enough: the wave penetrates the barrier partially (electrons tunnel through the barrier)
Gate oxide leakage Igate
- In nanometer transistors, where Tox < 2 nm: electrons tunnel through the gate oxide → leakage current
[Figures: potential energy versus position x across a barrier of thickness Tox; transistor cross-section (gate, gate oxide, source, drain) with Igate]
Gate Oxide Thickness at 45 nm
Gate Oxide Leakage cont’d

Components of Gate Oxide Leakage:
- Tunneling currents through overlap regions (gate-source Igso, gate-drain Igdo)
- Tunneling currents into channel (gate-source Igcs, gate-drain Igcd)
- Tunneling currents between gate and bulk (Igb)
[Figure: transistor cross-section (gate, source, drain, bulk) showing Igso, Igcs, Igcd, Igdo and Igb]
Further Leakage Components

- Reverse bias pn junction conduction Ipn
- Gate induced drain leakage IGIDL
- Drain source punchthrough IPT
- Hot carrier injection IHCI
[Figure: transistor cross-section (gate, source, drain) showing IHCI, IGIDL, IPT and Ipn]
Leakage Dependencies

Leakage depends on:
- Gate width (Isub, Igate)
- Gate length (Isub, Igate)
- Gate oxide thickness (Igate)
- Threshold voltage (Isub)
- Temperature (Isub)
- Input state (Igate)
LOW POWER TECHNIQUES
Lowering Dynamic Power

Reducing VDD has a quadratic effect!
- Has a negative effect on performance, especially as VDD approaches 2VT
Lowering CL
- Improves performance as well
- Keep transistors minimum size
Reducing the switching activity, f01 = P01 * f
- A function of signal statistics and clock rate
- Impacted by logic and architecture design decisions
Power & Delay Dependence of Vth
\[
P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot \frac{W_T}{W_0} \cdot 10^{-V_{TH}/S} \cdot V_{DD}
\]
\[
t_d = \frac{k \cdot Q}{I} = \frac{k' \cdot C_L \cdot V_{DD}}{(W/L) \cdot (V_{DD} - V_{TH})^{K}}
\]
(without gate leakage)
Source: Sakurai, ‘01
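The exponential term 10^(-VTH/S) in the leakage part of the power equation means that small threshold changes have a large effect on leakage. The sketch below assumes a subthreshold swing S of 100 mV per decade purely for illustration; the actual value depends on the technology and is not given in the slides.

```python
def relative_subthreshold_leakage(delta_vth, swing):
    """Leakage scaling factor 10^(-delta_vth / S) for a threshold shift delta_vth (in volts)."""
    return 10 ** (-delta_vth / swing)

S = 0.100  # assumed subthreshold swing: 100 mV per decade (illustrative)

raise_100mv = relative_subthreshold_leakage(0.100, S)
lower_100mv = relative_subthreshold_leakage(-0.100, S)

print("Vth + 100 mV -> leakage factor", raise_100mv)
print("Vth - 100 mV -> leakage factor", lower_100mv)
```

Under this assumption, raising Vth by 100 mV cuts subthreshold leakage by about one decade, while lowering it by the same amount increases leakage about tenfold. The delay equation shows the price: a higher VTH reduces (VDD - VTH) and therefore increases td.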
Transistor Sizing for Power Minimization
[Figure: trade-off to keep performance: small W's give lower capacitance but need higher voltage; large W's give higher capacitance but allow lower voltage]
Larger sized devices: only useful when interconnects dominate
Minimum sized devices: usually optimal for low-power
Source: Timmernann, 2007
Logic Style and Power Consumption

Voltage increases: power-delay product improves
Best logic style minimizes power-delay for a given delay constraint
A new logic style can reduce power dissipation (if possible / available!)
Source: Jan M. Rabaey
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P01 = P0 * P1 = (1 - PAPB) * PAPB
Chain implementation (inputs A, B, C, D each with probability 0.5):
- W = A AND B: P01 = (1-0.25)*0.25 = 3/16
- X = W AND C: P01 = 7/64 = 0.109
- F = X AND D: P01 = 15/256
Tree implementation (inputs A, B, C, D each with probability 0.5):
- Y = A AND B: P01 = 3/16
- Z = C AND D: P01 = 3/16 = 0.188
- F = Y AND Z: P01 = 15/256
Chain implementation has a lower overall switching activity than tree implementation for random inputs
- BUT: Ignores glitching effects
Source: Jan M. Rabaey
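The comparison of chain and tree can be reproduced with a few lines of Python. This is a minimal sketch that assumes independent inputs with probability 0.5 and, like the slide, ignores glitching; the helper name `and_gate` is chosen for this example.

```python
from fractions import Fraction

def and_gate(pa, pb):
    """Return (probability that the output is 1, P0->1) for an AND of independent inputs."""
    p1 = pa * pb
    return p1, (1 - p1) * p1

half = Fraction(1, 2)

# Chain: W = A AND B, X = W AND C, F = X AND D
p_w, a_w = and_gate(half, half)
p_x, a_x = and_gate(p_w, half)
p_f, a_f = and_gate(p_x, half)
chain_activity = a_w + a_x + a_f

# Tree: Y = A AND B, Z = C AND D, F = Y AND Z
p_y, a_y = and_gate(half, half)
p_z, a_z = and_gate(half, half)
p_ft, a_ft = and_gate(p_y, p_z)
tree_activity = a_y + a_z + a_ft

print("chain:", a_w, a_x, a_f, "sum =", chain_activity)
print("tree: ", a_y, a_z, a_ft, "sum =", tree_activity)
```

The per-node values match the slide (3/16, 7/64, 15/256 for the chain; 3/16, 3/16, 15/256 for the tree), and the sums are 91/256 for the chain versus 111/256 for the tree.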
Input Ordering
AND: P01 = (1 - PAPB) * PAPB
Input probabilities: PA = 0.5, PB = 0.2, PC = 0.1
- Ordering 1: X = A AND B, F = X AND C; P01 at X = (1-0.5x0.2)*(0.5x0.2) = 0.09
- Ordering 2: X = B AND C, F = X AND A; P01 at X = (1-0.2x0.1)*(0.2x0.1) = 0.0196
Beneficial: postponing introduction of signals with a high transition rate (signals with signal probability close to 0.5)
Source: Jan M. Rabaey
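A short calculation makes the effect of input ordering explicit. The sketch below assumes independent inputs and uses the probabilities from the slide (0.5, 0.2, 0.1); the function name is chosen for this example.

```python
def and_gate(pa, pb):
    """Return (probability that the output is 1, P0->1) for an AND of independent inputs."""
    p1 = pa * pb
    return p1, (1 - p1) * p1

PA, PB, PC = 0.5, 0.2, 0.1

# Ordering 1: X = A AND B, F = X AND C
p_x1, act_x1 = and_gate(PA, PB)
p_f1, act_f1 = and_gate(p_x1, PC)

# Ordering 2: X = B AND C, F = X AND A
p_x2, act_x2 = and_gate(PB, PC)
p_f2, act_f2 = and_gate(p_x2, PA)

print("internal node X:", act_x1, "vs", act_x2)
print("output F:       ", act_f1, "vs", act_f2)
```

The output F has the same activity in both orderings, because its probability is PA * PB * PC either way. The ordering only changes the activity of the internal node X, which is lower when the input with probability 0.5 is introduced last.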
Glitching
[Figure: circuit with inputs A, B, C, internal node X and output Z; waveforms of X and Z for the input change ABC = 101 → 000 under a unit-delay model]
Source: Jan M. Rabaey
Example 1: Chain of NAND Cells
[Figure: chain of NAND cells with outputs out1, out2, out3, ...; simulated voltages of out1 to out8 (0.0 to 6.0 V) versus time t (0 to 3 nsec), with the VDD / 2 level marked]
Source: Jan M. Rabaey
Example 2: Adder Circuit
[Figure: adder with carry input Cin and sum outputs S0 to S15; sum output voltages (0 to 3 V) versus time (0 to 12 ps), with the VDD / 2 level marked]
Source: Jan M. Rabaey
How to Cope with Glitching?
[Figure: two arrangements of the blocks F1, F2 and F3 with the arrival times of their input signals]
Equalize Lengths of Timing Paths Through Design
Source: Jan M. Rabaey
Clock Gating
Power is reduced by two mechanisms
- Clock net toggles less frequently, reducing feff
- Registers’ internal clock buffering switches less often
[Figures: local gating (register with din, en, clk, dout); global gating (FSM producing the enables enF, enE and enM for the clocks of the FSM, execution unit and memory control)]
Source: Jan M. Rabaey
Clock Gating Insertion

Local clock gating: 3 methods
- Logic synthesizer finds and implements local gating opportunities
- RTL code explicitly specifies clock gating
- Clock gating cell explicitly instantiated in RTL
Global clock gating: 2 methods
- RTL code explicitly specifies clock gating
- Clock gating cell explicitly instantiated in RTL
Source: Jan M. Rabaey
Clock Gating VHDL Code
Conventional RTL Code
-- always clock the register
process (clk)
begin
  if rising_edge(clk) then                  -- form the flip-flop
    if enable = '1' then q <= din; end if;
  end if;
end process;
Low Power Clock Gated RTL Code
-- only clock the register when enable is true
gclk <= enable and clk;                     -- gate the clock
process (gclk)
begin
  if rising_edge(gclk) then                 -- form the flip-flop
    q <= din;
  end if;
end process;
Instantiated Clock Gating Cell
-- instantiate a clock gating cell from the target library
I1: clkgx1 port map (en => enable, cp => clk, gclk_out => gclk);
process (gclk)
begin
  if rising_edge(gclk) then                 -- form the flip-flop
    q <= din;
  end if;
end process;
Source: Jan M. Rabaey
Clock Gating: Example
MPEG4 decoder
- Without clock gating: 30.6 mW
- With clock gating: 8.5 mW
- 90% of flip-flops clock-gated
- 70% power reduction by clock gating
[Figure: power in mW (0 to 25) of the blocks VDE, MIF, DSP/HIF, DEU and 896 Kb SRAM, without and with clock gating]
Source: M. Ohashi, Matsushita, 2002
Data Gating
Objective
- Reduce wasted operations => reduce feff
Example
- Multiplier whose inputs change every cycle, whose output conditionally feeds an ALU
Low Power Version
- Inputs are prevented from rippling through multiplier if multiplier output is not selected
Source: Jan M. Rabaey
Data Gating Insertion

Two insertion methods
- Logic synthesizer finds and implements data gating opportunities
- RTL code explicitly specifies data gating
- Some opportunities cannot be found by synthesizers
Issues
- Extra logic in data path slows timing
- Additional area due to gating cells
Source: Jan M. Rabaey
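Data Gating Verilog Code: Operand Isolation
Conventional Code
assign muxout = sel ? A : A*B;                 // build mux
[Figure: multiplier X with inputs A and B feeding a mux with select sel and output muxout]
Low Power Code
// sel = 1 selects A, so the multiplier operands are forced to 0 whenever sel = 1
// WIDTH: operand width (assumed parameter)
assign multinA = {WIDTH{~sel}} & A;            // build and cell
assign multinB = {WIDTH{~sel}} & B;            // build and cell
assign muxout = sel ? A : multinA*multinB;
[Figure: same circuit with AND cells in front of the multiplier inputs]
Source: Jan M. Rabaey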
Influence of Threshold Voltage Vth

Threshold voltage Vth:
- Influence on sub-threshold leakage Isub
- Influence on delay of logic cells
[Figure: inverter (BPTM 65 nm): leakage Isub (0 to 160 nA) and delay (30 to 55 ps) versus NMOS threshold voltage Vth (0.25 to 0.37 V)]
Influence of Gate Oxide Thickness Tox

Gate oxide thickness Tox:
- Influence on gate oxide leakage Igate
- Influence on delay
[Figure: inverter (BPTM 65 nm): leakage Igate (0 to 160 nA) and delay (25 to 50 ps) versus gate oxide thickness Tox (1.4 to 2.2 nm)]
Recap: Data Paths

Data propagate through different data paths between registers (flip-flops, FF)
Paths mostly differ in propagation delay times
Frequency of clock signal (CLK) depends on path with longest delay → critical path
[Figure: flip-flops (FF) clocked by CLK and connected by several paths]
Recap: Slack
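[Figure: gate G1 with inputs A and B drives gate G2, which also has input C and produces output Y. Timing diagram: all inputs of G1 arrived, delay of G1, G1 ready with evaluation, all inputs of G2 arrived; the slack for G1 is the time between G1 being ready and all inputs of G2 having arrived]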
Dual-Vth / Dual-Tox
Two different cell types:
“LVT / LTO” cells
- Cells consist of “low-Vth” or “low-Tox” transistors
- Low threshold voltage or thin gate oxide layer
- For critical paths
- High leakage / short delay
“HVT / HTO” cells
- Cells consist of “high-Vth” or “high-Tox” transistors
- High threshold voltage or thick gate oxide layer
- For uncritical paths
- Low leakage / long delay
→ Leakage reduction at constant performance (no level converter necessary)
Performance at different Dual-Vth
[Figure: normalized performance (0.0 to 1.0) of low-Vth and high-Vth cells versus supply voltage VDD (1.0 V to 0.6 V)]
Measured at NAND2 BPTM 65nm Technology
Leakage Isub at different Dual-Vth
[Figure: sub-threshold leakage (0 to 80 nA) of low-Vth and high-Vth cells versus supply voltage VDD (1.0 V to 0.6 V)]
Measured at NAND2 BPTM 65nm Technology
Dual-Vth / Dual-Tox Example
[Figure: example circuit with the critical path marked, built from LVT and/or LTO cells and HVT and/or HTO cells]
Stack Effect

Transistor stack: at least two transistors of the same type (NMOS or PMOS) in series
Based on behavior of internal nodes:
→ The more transistors are non-conducting (off), the lower the leakage
[Figure: leakage Isub (0 to 10 nA) versus number of transistors off in stack (1 to 4)]
Source: K. Roy
Sleep Transistors






Idea: insertion of additional transistors between logic block and supply lines
These transistors are connected to the SLEEP signal
If circuit has nothing to do: SLEEP signal is active → stack effect (additional off transistor in series with the others)
If sleep transistors are high-Vth: approach also called Multi-Threshold CMOS (MTCMOS)
Mostly insertion of only 1 transistor
[Figure: low-Vth logic cells between virtual Vdd and virtual Vss, connected to Vdd and Vss through sleep transistors]
Source: Kaijian Shi, Synopsys
Sleep Transistors: Realization
Ring style sleep transistor implementation
[Figure: global VDD around the VVDD1 and VVDD2 domains]
Sleep transistors are placed around each VVDD island
Source: Kaijian Shi, Synopsys
Sleep Transistors: Realization cont’d
Grid style sleep transistor implementation
[Figure: global VDD grid interleaved with the VVDD1 and VVDD2 networks]
VDD network across the chip; VVDD networks in each gating domain
Sleep transistors are placed in a grid connecting VDD and VVDDs
Source: Kaijian Shi, Synopsys
Sleep Transistors: Problems
[Figure: CMOS gate / block connected to VDD, with a high-Vth sleep transistor controlled by SLEEP; the sleep transistor is modeled as a resistor R carrying the current I]
Sleep transistor can be modeled as resistor R
In active mode (cell is working)
- Current I through sleep transistor
- Voltage drop Vx = R * I over resistor
- Output voltage reduced to VDD - Vx
Current I is not a leakage current! I is a discharging current of load capacitances
→ Increased delay (of following blocks)
Stackforcing

Simple method of using stack effect
- Increasing stack by splitting transistors
- Cin stays constant
- Only one technology is needed
- Area is (almost) the same
- Drive strength (drain-source current) is reduced → delay increases
[Figure: a transistor of width WP is replaced by a stack of two transistors of width WP/2; likewise the NMOS transistor by two transistors of width WN/2]
Stackforcing cont’d
[Figure: normalized delay versus normalized Isub, with the case without stackforcing marked]
Source: Narendra, et al., ISLPED01
Input Vector Control (IVC)

Leakage of cell depends on input vector
[Figure: cell with inputs In1, In2, In3 and NMOS stack TN3, TN2, TN1 below VDD]
Input vector    Leakage    Trans. off
In3 In2 In1     [nA]       in NMOS stack
0   0   0       0.1        TN3, TN2, TN1
0   0   1       0.2        TN3, TN2
0   1   0       0.2        TN3, TN1
0   1   1       1.9        TN3
1   0   0       0.2        TN2, TN1
1   0   1       1.3        TN2
1   1   0       1.2        TN1
1   1   1       9.4        -
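Choosing a sleep vector from such a table is a simple minimum search. The sketch below copies the leakage values from the table above into a dictionary (the dictionary and variable names are chosen for this example) and picks the input vector with the lowest leakage.

```python
# Leakage in nA per input vector (In3, In2, In1), values from the table above
LEAKAGE_NA = {
    (0, 0, 0): 0.1,
    (0, 0, 1): 0.2,
    (0, 1, 0): 0.2,
    (0, 1, 1): 1.9,
    (1, 0, 0): 0.2,
    (1, 0, 1): 1.3,
    (1, 1, 0): 1.2,
    (1, 1, 1): 9.4,
}

# The vector with minimum leakage is the candidate sleep vector
sleep_vector = min(LEAKAGE_NA, key=LEAKAGE_NA.get)
worst_vector = max(LEAKAGE_NA, key=LEAKAGE_NA.get)
ratio = LEAKAGE_NA[worst_vector] / LEAKAGE_NA[sleep_vector]

print("sleep vector:", sleep_vector, LEAKAGE_NA[sleep_vector], "nA")
print("worst vector:", worst_vector, LEAKAGE_NA[worst_vector], "nA")
print("worst / best:", round(ratio))
```

For a single cell the search is trivial. For a whole circuit the same idea applies to the primary inputs, but the search space grows exponentially with the number of inputs, so exhaustive enumeration as shown here only works for small examples.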
Input Vector Control cont’d

Every circuit has an input vector with minimum leakage
Idea: if design is in passive mode
- SLEEP signal gets active
- Sleep vector is applied
[Figure: MUX controlled by SLEEP selects between data and the sleep vector at the input of the logic circuit]
Pin Reordering
Example: BPTM, 65 nm technology
[Figure: NMOS stack T3, T2, T1 below VDD with inputs In3, In2, In1 and the gate leakage components Igdo, Igcd, Igso, Igcs]
Input vector     T3      T2                        T1                        |Igate,stack|
[In3,In2,In1]
001              Igdo    -                         Igcs, Igso, Igcd, Igdo    → 65.9 nA
010              Igdo    Igci, Igcs, Igdo, Igcd    -                         ↑ 42.8 nA
100              -       Igdo                      -                         ↓ 10.3 nA
101              -       Igdo                      Igcs, Igso, Igcd, Igdo    → 58.7 nA
110              -       -                         Igdo                      ↓ 7.6 nA
011              Igdo    Igci, Igso, Igdo, Igcd    Igcs, Igso, Igcd, Igdo    ↑ 116.0 nA
Gate leakage in stack depends on input vector
Same logic input vector (amounts of ‘0’ and ‘1’ are equal) → can result in different leakage
If input probability is known → reorder pins so that the most probable state has minimum gate leakage
Delay and Power versus VDD
[Figure: relative Pdyn and relative delay td versus supply voltage VDD (0.8 to 2.4 V)]
Dynamic power (and leakage) can be traded for delay
Adaptive Dynamic Voltage/Frequency Scaling (DVS/DFS)
Slow down processor to fill idle time
More delay → lower operational voltage
[Figure: alternating active and idle phases at 3.3 V compared with operation at 2.4 V]
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequencies < 150 µs
Potential to realize 10x energy savings
E.g.: Intel SpeedStep, AMD PowerNow, Transmeta LongRun
DVS/DFS with Transmeta LongRun
[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with the typical operating region and the peak performance region marked]
Operating points:
300 MHz    0.80 V
433 MHz    0.87 V
533 MHz    0.95 V
667 MHz    1.05 V
800 MHz    1.15 V
900 MHz    1.25 V
1000 MHz   1.30 V
Source: Transmeta
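A rough estimate of what these operating points mean for dynamic power follows from Pdyn ∝ f * VDD^2. The sketch below is an approximation under stated assumptions: constant switched capacitance and activity, and no leakage or short-circuit power. It is not a reconstruction of the Transmeta curve.

```python
# (frequency in MHz, supply voltage in V), taken from the LongRun operating points
OPERATING_POINTS = [
    (300, 0.80),
    (433, 0.87),
    (533, 0.95),
    (667, 1.05),
    (800, 1.15),
    (900, 1.25),
    (1000, 1.30),
]

f_max, vdd_max = OPERATING_POINTS[-1]

def relative_dynamic_power(f_mhz, vdd):
    """Dynamic power relative to the fastest point, using Pdyn ~ f * VDD^2."""
    return (f_mhz * vdd ** 2) / (f_max * vdd_max ** 2)

relative = [relative_dynamic_power(f, v) for f, v in OPERATING_POINTS]

for (f, v), r in zip(OPERATING_POINTS, relative):
    print(f"{f:5d} MHz  {v:.2f} V  {100 * r:5.1f} % of peak dynamic power")
```

Under these assumptions the 300 MHz point needs roughly 11 % of the peak dynamic power while still delivering 30 % of the peak clock rate, which is why scaling voltage together with frequency saves energy per operation.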
Multi-VDD

Objective
- Reduce dynamic power by reducing the VDD^2 term
- Higher supply voltage used for speed-critical logic
- Lower supply voltage used for non speed-critical logic
Example
- Memory VDD = 1.2 V
- Logic VDD = 1.0 V
- Logic dynamic power savings = 30%
Source: Jan M. Rabaey
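The 30% figure follows from the quadratic dependence of dynamic power on VDD. A minimal check, assuming that capacitance, activity and frequency are unchanged between the two supply voltages:

```python
def dynamic_power_saving(vdd_high, vdd_low):
    """Fractional dynamic power saving when VDD drops from vdd_high to vdd_low."""
    return 1 - (vdd_low / vdd_high) ** 2

saving = dynamic_power_saving(1.2, 1.0)
print(f"saving: {100 * saving:.1f} %")
```

The result is about 30.6 %, consistent with the rounded value on the slide.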
Multi-VDD Issues





Partitioning
- Which blocks and modules should use which voltages?
- Physical and logical hierarchies should match as much as possible
Voltages
- Voltages should be as low as possible to minimize C * VDD^2 * f
- Voltages must be high enough to meet timing specs
Level shifters
- Needed (generally) to buffer signals crossing islands
- Added delays must be considered
Physical design
- Multiple VDD rails must be considered during floorplanning
Timing verification
- Timing verification must be performed for all corner cases across voltage islands.
Source: Jan M. Rabaey
Multi-VDD Flow
1. Determine which blocks run at which Vdd
2. Multi-voltage synthesis
3. Determine floor plan
4. Multi-voltage placement
5. Clock tree synthesis
6. Route
7. Verify timing
Source: Jan M. Rabaey
Power-orientated Programming
[Figure: switched capacitance in nF (0 to 14000) for bubble.c, heap.c and quick.c, split into register file, pipeline registers, functional unit and others]
→ Algorithms can differ in power dissipation
Source: Irwin, 2000