Energy Recovery
from High-frequency Clocks
using DC-DC Converters
Mehdi Alimadadi, Samad Sheikhaei,
Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada
Patrick Palmer
University of Cambridge, UK
Problem
• Clock power in high-performance CPUs

CPU               Year (process)   Clock     Power     % Power for Clock   Clock Power
Intel McKinley    2002 (180nm)     1 GHz     130W      33%                 43W
Intel Montecito   2005 (90nm)      2.5 GHz   85W       30%                 25W
IBM Power6        2007 (65nm)      5 GHz     > 100W    22%                 > 22W

• Cause
– Charge big clock capacitor Cclk with energy
– Discharge Cclk energy to GND (WASTE IT!!)
– Repeat every clock cycle
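The cause listed on this slide can be put in numbers with a minimal sketch. It assumes the usual C·V² model: each cycle the supply delivers Cclk·Vdd² to charge the clock node, and all of that energy is lost once Cclk is discharged to GND. The function name and the example values (20 pF, 1 V, 3 GHz) are illustrative assumptions, not figures taken from the CPU table.

```python
def clock_power_watts(c_clk_farads, vdd_volts, f_clk_hertz):
    """Power burned by charging Cclk to Vdd and dumping it to GND every cycle."""
    # Energy drawn from the supply per cycle under the C*V^2 model (assumption).
    energy_per_cycle = c_clk_farads * vdd_volts ** 2
    return energy_per_cycle * f_clk_hertz


# Illustrative values only: 20 pF clock load, 1 V supply, 3 GHz clock.
p_clk = clock_power_watts(20e-12, 1.0, 3e9)
print(f"clock power: {p_clk * 1e3:.1f} mW")
```

With these assumed values the wasted clock power comes to about 60 mW, and it scales linearly with both Cclk and the clock frequency.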
Primary Contribution of This Work
• Primary contribution
– Discharge Cclk using DC-DC converter instead of GND
• Use converter to power useful load (Rload)
• Integrated clock drivers with DC-DC converters
• Net savings in power
[Figure: integrated clock driver / DC-DC converter powering a useful load, with voltage feedback (for regulation)]
Summary Results
• Explore 3 main DC-DC power converter topologies
– Buck converter: our previous work [ ISSCC 2007 ]
– Boost converter: this paper [ ISVLSI 2008 ]
– Buck-boost converter: this paper [ ISVLSI 2008 ]
• 90nm layouts, 3GHz operation, < 0.3mm2

Converter              Clock-only      Extra power to      Converter      % clock energy
                       power (input)   operate converter   output power   recovered
                                       (input)
Buck [ ISSCC 2007 ]    40mW            16mW                26mW           50%
Boost                  100mW           25mW                28mW           20%
Buck-boost             100mW           72mW                48mW           30%
Background
Background – Typical Clocking Architecture
[Figure: clock source feeding the Level 1 & Level 2 H-tree, then Level 3 gaters & final drivers, then the final H-tree and bottom mesh]
Background – Typical Clocking Architecture
• Clock distribution
– Majority of energy used by final drivers
– Levels 1, 2
• H-trees
• Tunable delays (CVDs) to eliminate skew
• Low-swing, differential → low power, noise immunity
• ~ 5W of power
– Level 3
• Gaters reduce clock activity 50-85% (Power6)
– Can’t eliminate all activity → still need a clock to compute
• Final clock drivers
– Full-rail swing → tapered inverters drive hundreds of latches, high power
• H-tree with ends shorted by mesh → low skew, high power
• ~15W to 40W of power
Background – Reducing Clock Power
• Clock distribution
– Low-swing (differential) signals
• Final drivers need full-rail
– Resonant clocking (saves 80%)
• Final drivers need square clock
• Final clock drivers
– Adiabatic switching
• Low-performance, < 100MHz
– Double-edge clocking
• Feasible, but complex flip-flops, larger loads
• Compatible with energy recovery in this paper
Background – Switch Mode Power Supplies
• Basic DC-DC converter topologies
– Buck
• Step down
• 0 ≤ Vout ≤ VDD
– Boost
• Step up
• VDD ≤ Vout
– Buck-boost
• Negative step up/down
• Vout ≤ 0
[Figure: buck, boost, and buck-boost circuits, each built from switch S, diode D, inductor LF, capacitor CF, and load RL]
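The output ranges listed for the three topologies follow from the textbook ideal conversion ratios. The sketch below is an illustration under stated assumptions (lossless components, continuous inductor current, duty ratio D of switch S); these standard formulas are not taken from the slides, and the integrated clock-driven designs later in the talk quote different 0th-order results.

```python
def ideal_vout(topology, vin, duty):
    """Textbook ideal steady-state output voltage for a duty ratio 0 <= duty < 1."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    if topology == "buck":
        return duty * vin                  # step down: 0 <= Vout <= Vin
    if topology == "boost":
        return vin / (1.0 - duty)          # step up: Vout >= Vin
    if topology == "buck-boost":
        return -duty * vin / (1.0 - duty)  # inverted output: Vout <= 0
    raise ValueError(f"unknown topology: {topology}")


for name in ("buck", "boost", "buck-boost"):
    print(name, ideal_vout(name, vin=1.0, duty=0.5))
```

At D = 50% and a 1 V input the three ideal outputs are 0.5 V, 2 V, and -1 V, matching the step-down, step-up, and negative ranges on the slide.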
Background – Switch Mode Power Supplies
• DC-DC buck converter
– CMOS inverter as power switches
[Figure: buck converter drawn with switch S and diode D, and the same converter with a CMOS inverter (gate Vgate, output Vinv) as the power switches; Vin, L, IL, C, R, Vout]
• Implementation of zero-voltage switching (ZVS)
– Turn on NMOS when Vinv = 0
– Turn on PMOS when Vinv = Vdd
Background
ISSCC 2007 Design
• ZVS → delay circuit
• Integrated clock driver / power converter
Integration of Clock and SMPS
• CPU clock: 3GHz clock and large Cclk
[Figure: CLK in drives a driver chain producing Vclk across Cclk]
• SMPS: large Mp, Mn drive chain
[Figure: CLK in drives a driver chain into Mp and Mn, followed by Lf, Cf, and Rload at Vout]
Integration of Clock and SMPS
• Combine the driver circuits
[Figure: a single CLK in driver chain drives Mp and Mn; their common node is Vclk with Cclk attached, followed by Lf, Cf, and Rload at Vout]
Key Concept: Energy Recycling
• Benefits
– Shared driver chain
– Cclk added to SMPS
• Red path
– NMOS drains Cclk → wastes charge!
• Blue path
– Delay NMOS turn-on → recovers clock charge!
– ZVS (zero voltage switching) in power electronics
[Figure: combined clock driver and converter (CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout) with the red and blue discharge paths marked]
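The difference between the red and blue paths can be sketched with the energy stored on the clock node. This is a simplified illustration, assuming that whatever charge remains on Cclk when the NMOS turns on is dissipated in the NMOS (½·C·V²); the function name and the 20 pF, 1 V, 3 GHz values are assumptions chosen for illustration.

```python
def turn_on_loss_joules(c_clk_farads, v_clk_volts):
    """Energy dissipated in the NMOS if it turns on while Cclk still holds v_clk."""
    return 0.5 * c_clk_farads * v_clk_volts ** 2


# Illustrative values only.
C_CLK = 20e-12  # clock load capacitance (F)
VDD = 1.0       # supply voltage (V)
F_CLK = 3e9     # clock frequency (Hz)

red_path = turn_on_loss_joules(C_CLK, VDD)   # NMOS turns on with Cclk at Vdd
blue_path = turn_on_loss_joules(C_CLK, 0.0)  # turn-on delayed until Vclk is 0 (ZVS)

print(f"red path:  {red_path * F_CLK * 1e3:.1f} mW dumped to GND")
print(f"blue path: {blue_path * F_CLK * 1e3:.1f} mW dumped to GND")
```

With these assumed values the red path throws away about 30 mW, while the blue path leaves that energy available to the inductor and the load.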
ZVS Detailed Operation
• ZVS delay circuit D
– Delay only rising edge of Vn
– Implemented inside the clock chain
[Figure: Mp (gate Vp) and Mn (gate Vn) between Vdd and GND drive Vclk across Cclk, followed by Lf, Cf, and Rload at Vout]
ZVS Detailed Operation (Mode 1)
• Mode 1 (0 < t < DTsw)
D = Duty cycle
Tsw = Switching period
– Mp is ON
– Current builds up in the inductor
– Cclk charges up
[Figure: converter circuit in Mode 1 (Vdd, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout, GND)]
ZVS Detailed Operation (Mode 2)
• Mode 2 (DTsw < t < DTsw+Tzvs)
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
– Both power transistors are OFF
– Inductor current discharges Cclk
– Cclk charge is recycled to output load
[Figure: converter circuit in Mode 2 (Vdd, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout, GND)]
ZVS Detailed Operation (Mode 3)
• Mode 3 (DTsw+Tzvs < t < Tsw)
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
– Mn turns ON when Vclk ≈ 0
• ZVS for Mn
– Inductor current decreases linearly
[Figure: converter circuit in Mode 3 (Vdd, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout, GND)]
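The three modes partition one switching period, and the ZVS delay has to be long enough for the inductor current to pull Vclk down to 0. The sketch below illustrates both ideas. It assumes a constant inductor current during Mode 2, so Tzvs ≈ Cclk·Vdd / IL; the function names and the 20 pF, 1 V, 0.2 A, 3 GHz values are hypothetical and not taken from the slides.

```python
def mode_intervals(duty, t_sw, t_zvs):
    """Return the (start, end) time of Modes 1 to 3 within one switching period."""
    t_mp_off = duty * t_sw       # end of Mode 1: Mp turns off
    t_mn_on = t_mp_off + t_zvs   # end of Mode 2: Mn turns on after the ZVS delay
    if not (0.0 < t_mp_off < t_mn_on < t_sw):
        raise ValueError("need 0 < D*Tsw < D*Tsw + Tzvs < Tsw")
    return {
        "mode1": (0.0, t_mp_off),     # Mp ON, Cclk charges
        "mode2": (t_mp_off, t_mn_on), # both OFF, inductor discharges Cclk
        "mode3": (t_mn_on, t_sw),     # Mn ON at Vclk near 0
    }


def zvs_delay_estimate(c_clk, vdd, i_inductor):
    """Time for a constant inductor current to discharge Cclk from Vdd to 0."""
    return c_clk * vdd / i_inductor


# Illustrative values only.
T_SW = 1.0 / 3e9
t_zvs = zvs_delay_estimate(20e-12, 1.0, 0.2)
for name, (start, end) in mode_intervals(0.5, T_SW, t_zvs).items():
    print(f"{name}: {start * 1e12:.0f} ps to {end * 1e12:.0f} ps")
```

A delay shorter than this estimate would turn Mn on with charge still on Cclk; a much longer one leaves less of the period for Mode 3.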
Detailed Operation
• ZVS delay circuit for Mn
– Delay rising edge of Vn
[Figure: ZVS delay circuit (M1 to M4, internal node Vm) generating Vn for Mn, with Vp driving Mp; Vclk, Cclk, Lf, Cf, Rload, Vout; switching sequence annotated 1 to 4]
Detailed Operation
• ZVS delay circuit for Mn
– Falling edges of Vp and Vn are synchronized
[Figure: ZVS delay circuit (M1 to M4, internal node Vm) generating Vn for Mn, with Vp driving Mp; Vclk, Cclk, Lf, Cf, Rload, Vout; switching sequence annotated]
Simulation Voltages
[Figure: simulated Vclk, Vclk-ref, and Vload; Voltage (V) vs. Time (nSec), 0 to 1 nSec; circuit schematic shown alongside]
Simulation Currents
[Figure, top: simulated Vclk, Vclk-ref, and Vload; Voltage (V) vs. Time (nSec)]
[Figure, bottom: simulated device currents; Current (mA) vs. Time (nSec), 0 to 1 nSec; circuit schematic shown alongside]
Effective Efficiency
• How to measure power efficiency after clock drivers are integrated with DC-DC converters?
– Converter gets “free energy” from clock
– Effective efficiency: how efficient a regular (standalone) power converter must be to equal the efficiency of integrated clock/power converter
• Raw efficiency: η_raw = Pout / Pin1 × 100
• Effective efficiency: η_effective = Pout / (Pin1 − Pin2) × 100
[Figure: Pin1 feeds the integrated clock driver and power converter (or a stand-alone power converter), which delivers Pout; Pin2 feeds the clock driver portion (dummy); the recycled energy is not counted as input power to the power converter portion]
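The two definitions can be tried on the numbers from the Summary Results table. The sketch below assumes that Pin2 corresponds to the table's "clock-only power" and that Pin1 − Pin2 corresponds to the "extra power to operate converter"; that mapping is a reading of the slides, not something they state explicitly, and the function names are illustrative.

```python
def raw_efficiency(p_out, p_in1):
    """Raw efficiency in percent: output power over total input power."""
    return 100.0 * p_out / p_in1


def effective_efficiency(p_out, p_in1, p_in2):
    """Effective efficiency in percent: clock power Pin2 is not counted as input."""
    return 100.0 * p_out / (p_in1 - p_in2)


# (clock-only power, extra converter power, output power) in mW, from the summary table.
summary_mw = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}

for name, (clock_only, extra, p_out) in summary_mw.items():
    p_in2 = clock_only          # assumed mapping: Pin2 = clock-only power
    p_in1 = clock_only + extra  # assumed mapping: Pin1 = total input power
    print(f"{name}: raw {raw_efficiency(p_out, p_in1):.1f}%, "
          f"effective {effective_efficiency(p_out, p_in1, p_in2):.1f}%")
```

An effective efficiency above 100% is possible because the recycled clock energy is treated as free: the converter delivers more power than the extra input it needs.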
Buck Converter – Simulation Results
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure, left: Vout (V) vs. Duty Ratio (%) for Iout = 30, 50, 70, 100]
[Figure, right: Effective Efficiency (%) vs. Iout (mA) for D = 30%, 40%, 50%, 60%, 70%]
ISSCC 2007
• 90nm test chip 1mm2, buck converter 0.27mm2
Buck Converter – Chip Measurement vs. Simulation Results
[Figure, left: chip measurement, Fsw sweep (D=50%): Effective Efficiency (%) vs. Iout (mA) at 2GHz, 2.5GHz, 3GHz, 3.5GHz]
[Figure, right: simulation (3GHz): Effective Efficiency (%) vs. Iout (mA) for D = 30%, 40%, 50%, 60%, 70%]
ISVLSI 2008
New Design 1
Boost Converter
Boost Converter
• Basic operation
– Vclk provides power & timing
• 0th order result… Vout = D/(1-D)*Vdd
[Figure: basic boost converter (Vin, LF, switch, diode, CF, Vout) next to the clock-driven version (Mp, Mn, Cclk, Cshift, Dshift, LF, CF); waveforms of Vclk, Vshift, Vclk_scaled, and ILf vs. t across Modes 1 and 2]
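The 0th-order result quoted on this slide is easy to tabulate. The sketch below simply evaluates Vout = D/(1-D)*Vdd for the duty ratios used in the simulations; the function name and the 1 V supply are assumptions, and a real converter with losses and a finite amount of clock energy will fall short of these values.

```python
def boost_vout_zeroth_order(duty, vdd):
    """Slide's 0th-order boost result: Vout = D / (1 - D) * Vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    return duty / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage (V)
for duty in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"D = {duty:.0%}: Vout = {boost_vout_zeroth_order(duty, VDD):.2f} V")
```

Note that the output only exceeds Vdd for D above 50% in this expression, unlike the textbook boost ratio 1/(1-D).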
Boost Converter
[Figure: transistor-level boost converter schematic with annotated device W/L sizes. Labeled values: Cclk = Cshift = 21pF (clock load capacitance), Cclk_scaled = 2.2pF, LF = 310pH, CF = 378pF, 1kΩ resistor]
Boost Converter – Simulation Results
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure, left: Vout (V) vs. Duty Ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 100mA]
[Figure, right: Effective Efficiency (%) vs. Iout (mA) for D = 40%, 50%, 60%, 70%, 80%]
ISVLSI 2008
New Design 2
Buck-boost Converter
Buck-boost Converter
• Basic operation
– Vclk provides power & timing
• 0th order result… Vout = -D²/(1-D)*Vdd
[Figure: basic buck-boost converter (Vin, switch, LF, diode, CF, Vout ≤ 0) next to the clock-driven version (Mp, Mn, Sclk, Cclk as clock load capacitance, Cshift, Dshift, LF, CF); waveforms of Vclk, Vinv, Vshift, and ILf vs. t across Modes 1 and 2]
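The 0th-order buck-boost result can be tabulated the same way. This sketch reads the slide's "D2" as D squared, which is an interpretation of the extracted text; the function name and the 1 V supply are likewise assumptions, and losses will reduce the magnitude in practice.

```python
def buck_boost_vout_zeroth_order(duty, vdd):
    """Slide's 0th-order buck-boost result, read as Vout = -D^2 / (1 - D) * Vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    return -(duty ** 2) / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage (V)
for duty in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"D = {duty:.0%}: Vout = {buck_boost_vout_zeroth_order(duty, VDD):.2f} V")
```

The output is negative for every nonzero duty ratio, consistent with the Vout ≤ 0 label on the circuit.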
Buck-boost Converter
[Figure: transistor-level buck-boost converter schematic with annotated device W/L sizes. Labeled values: Cclk = 21pF (clock load capacitance), Cshift = 21pF, Cbias = 21pF, LF = 310pH, CF = 356pF, 1kΩ and 10.4kΩ resistors; deep N-well structures; three diodes in series, each 128/0.1]
Buck-boost Converter
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure, left: Vout (V) vs. Duty Ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 90mA]
[Figure, right: Effective Efficiency (%) vs. Iout (mA) for D = 20%, 30%, 40%, 50%, 60%, 70%]
Results and Comparisons
Summary Results

Converter              Clock-only      Extra power to      Converter      % clock energy
                       power (input)   operate converter   output power   recovered
                                       (input)
Buck [ ISSCC 2007 ]    40mW            16mW                26mW           50%
Boost                  100mW           25mW                28mW           20%
Buck-boost             100mW           72mW                48mW           30%

• 90nm layouts, 3GHz operation, < 0.3mm2
Comparative Results
• IBM Power6 100W@1V, 341mm2 → Cclk = 13pF/mm2
• Other work: fully on-chip DC-DC buck converter
– S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357
– 27mm2, 45MHz
– 65% power efficiency
• This work
– 0.27, 0.26, 0.20 mm2, including 0.1mm2 inductor area, 3GHz
• Cclk 20pF, equiv to 1.6mm2 of Power6 area
• DC-DC converter adds 12.5% area overhead
– LC filter: 310pH inductor, 350pF capacitor
• L and C similar and dominate layout area → can stack to cut area in half
– Buck: 75 – 185% effective power efficiency (50% recovered)
– Boost: 25 – 110% effective power efficiency (20% recovered)
– Buck-boost: 20 – 66% effective power efficiency (30% recovered)
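The area figures on this slide can be reproduced with a short calculation. The sketch below assumes the clock power follows P = C·V²·f, takes the 22% clock share and 5 GHz clock for Power6 from the Problem slide, and assumes the 12.5% overhead refers to the 0.20mm2 converter; the function names and that pairing are a reading of the slides, not something they spell out.

```python
def clock_cap_density_pf_per_mm2(total_power_w, clock_fraction, vdd, f_clk_hz, area_mm2):
    """Clock capacitance per unit die area, from P_clk = C * Vdd^2 * f."""
    clock_power_w = total_power_w * clock_fraction
    c_clk_farads = clock_power_w / (vdd ** 2 * f_clk_hz)
    return c_clk_farads * 1e12 / area_mm2


def area_overhead(converter_area_mm2, clock_area_mm2):
    """Converter area as a fraction of the CPU area whose clock load it recycles."""
    return converter_area_mm2 / clock_area_mm2


# Power6 figures from the slides: 100 W at 1 V, 341 mm2, 22% clock power, 5 GHz.
density = clock_cap_density_pf_per_mm2(100.0, 0.22, 1.0, 5e9, 341.0)
equiv_area = 20.0 / density  # mm2 of Power6 area holding 20 pF of clock load
print(f"Cclk density: {density:.1f} pF/mm2")
print(f"20 pF is equivalent to {equiv_area:.2f} mm2")
print(f"overhead with the slide's 1.6 mm2: {area_overhead(0.20, 1.6):.1%}")
```

The density comes out close to the slide's 13pF/mm2, and dividing 0.20mm2 by the slide's rounded 1.6mm2 gives the quoted 12.5%.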
Conclusion
• Key concepts
– High switching frequency → saves area
– Combined drivers → saves area and switching loss
– Recycled charge → converter load discharges Cclk
– ZVS delay circuit → lower power loss
• Limitations
– Regulation needs variable duty cycle clock
• May introduce additional clock jitter
• Mostly suitable for edge-triggered blocks (no latches)
• Future work
– Lots of improvements to make!
Thank you!
Questions ?