Energy Recovery from High-frequency Clocks using DC-DC Converters
Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada
Patrick Palmer
University of Cambridge, UK

Problem
• Clock power in high-performance CPUs

  CPU              Year (process)   Clock     Power     % power for clock   Clock power
  Intel McKinley   2002 (180nm)     1 GHz     130W      33%                 43W
  Intel Montecito  2005 (90nm)      2.5 GHz   85W       30%                 25W
  IBM Power 6      2007 (65nm)      5 GHz     > 100W    22%                 > 22W

• Cause
  – Charge big clock capacitor Cclk with energy
  – Discharge Cclk energy to GND (WASTE IT!!)
  – Repeat every clock cycle

Primary Contribution of This Work
• Primary contribution
  – Discharge Cclk using DC-DC converter instead of GND
    • Use converter to power useful load (Rload)
    • Integrated clock drivers with DC-DC converters
    • Net savings in power
[Figure: integrated clock driver and converter with voltage feedback (for regulation) driving a useful load]

Summary Results
• Explore 3 main DC-DC power converter topologies
  – Buck converter: our previous work [ISSCC 2007]
  – Boost converter: this paper [ISVLSI 2008]
  – Buck-boost converter: this paper [ISVLSI 2008]
• 90nm layouts, 3GHz operation, < 0.3mm²

                               Clock-only      Extra power to operate   Converter       % clock energy
                               power (input)   converter (input)        output power    recovered
  Buck converter [ISSCC 2007]  40mW            16mW                     26mW            50%
  Boost converter              100mW           25mW                     28mW            20%
  Buck-boost converter         100mW           72mW                     48mW            30%

Background

Background – Typical Clocking Architecture
[Figure: clock source; Level 1 & Level 2 H-tree; Level 3 gaters & final drivers; final H-tree; bottom mesh]

Background – Typical Clocking Architecture
• Clock distribution
  – Majority of energy used by final drivers
  – Levels 1, 2
    • H-trees
    • Tunable delays (CVDs) to eliminate skew
    • Low-swing, differential: low power, noise immunity
    • ~ 5W of power
  – Level 3
    • Gaters reduce clock activity 50-85% (Power6)
      – Can't eliminate all activity: still need a clock to compute
• Final clock drivers
  – Full-rail swing
    • Tapered inverters drive hundreds of latches: high power
    • H-tree with ends shorted by
      mesh: low skew, high power
    • ~15W to 40W of power

Background – Reducing Clock Power
• Clock distribution
  – Low-swing (differential) signals
    • Final drivers need full-rail
  – Resonant clocking (saves 80%)
    • Final drivers need square clock
• Final clock drivers
  – Adiabatic switching
    • Low-performance, < 100MHz
  – Double-edge clocking
    • Feasible, but complex flip-flops, larger loads
    • Compatible with energy recovery in this paper

Background – Switch Mode Power Supplies
• Basic DC-DC converter topologies
  – Buck
    • Step down
    • 0 ≤ Vout ≤ VDD
  – Boost
    • Step up
    • VDD ≤ Vout
  – Buck-boost
    • Negative step up/down
    • Vout ≤ 0
[Figure: buck, boost and buck-boost schematics, each with switch S, diode D, inductor LF, capacitor CF and load RL]

Background – Switch Mode Power Supplies
• DC-DC buck converter
  – CMOS inverter as power switches
[Figure: buck converter drawn with switch S and diode D, and again with a CMOS inverter as the switches; labels Vdd, Vin, Vgate, Vinv, L, IL, C, R, Vout]
• Implementation of zero-voltage switching (ZVS)
  – Turn on NMOS when Vinv = 0
  – Turn on PMOS when Vinv = Vdd

Background: ISSCC 2007 Design
• ZVS delay circuit
• Integrated clock driver / power converter

Integration of Clock and SMPS
• CPU clock: 3GHz clock and large Cclk
• SMPS: large Mp, Mn drive chain
[Figure: clock driver chain (CLK in, Vclk, Cclk) beside SMPS driver chain (CLK in, Mp, Mn, Lf, Cf, Rload, Vout)]

Integration of Clock and SMPS
• Combine the driver circuits
[Figure: single driver chain (CLK in) driving Vclk, Cclk, Mp, Mn, Lf, Cf, Rload, Vout]

Key Concept: Energy Recycling
• Benefits
  – Shared driver chain
  – Cclk added to SMPS
• Red path
  – NMOS drains Cclk: wastes charge!
• Blue path
  – Delay NMOS turn-on: recovers clock charge!
  – ZVS (zero voltage switching) in power electronics
[Figure: combined circuit with CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout]

ZVS Detailed Operation
• ZVS delay circuit
  – Delay only rising edge of Vn
  – Implemented inside the clock chain
[Figure: converter schematic: Vdd, Vp, Mp, Lf, Vclk, Cclk, Vn, Vout, Cf, Rload, Mn, GND, with delay block D]

ZVS Detailed Operation (Mode 1)
• Mode 1 (0 < t < DTsw), with D = duty cycle, Tsw = switching period
  – Mp is ON
  – Current builds up in the inductor
  – Cclk charges up
[Figure: converter schematic]

ZVS Detailed Operation (Mode 2)
• Mode 2 (DTsw < t < DTsw+Tzvs), with D = duty cycle, Tsw = period, Tzvs = ZVS delay
  – Both power transistors are OFF
  – Inductor current discharges Cclk
  – Cclk charge is recycled to output load
[Figure: converter schematic]

ZVS Detailed Operation (Mode 3)
• Mode 3 (DTsw+Tzvs < t < Tsw), with D = duty cycle, Tsw = period, Tzvs = ZVS delay
  – Mn turns ON when Vclk ≈ 0
    • ZVS for Mn
  – Inductor current decreases linearly
[Figure: converter schematic]

Detailed Operation
• ZVS delay circuit for Mn
  – Delay rising edge of Vn
[Figure: converter schematic with ZVS delay circuit (M1 to M4, node Vm) driving Vn]

Detailed Operation
• ZVS delay circuit for Mn
  – Falling edges of Vp and Vn are synchronized
[Figure: converter schematic with ZVS delay circuit (M1 to M4, node Vm) driving Vn]

Simulation Voltages
[Figure: Vclk, Vclk-ref and Vload; Voltage (V) vs. Time (ns) over 0 to 1 ns]

Simulation Currents
[Figure: currents in Lf, Mn and Mp; Current (mA) vs. Time (ns) over 0 to 1 ns, shown with the voltage plot]

Effective Efficiency
• How to measure power efficiency after clock drivers are integrated with DC-DC converters?
  – Converter gets “free energy” from clock
  – Effective efficiency: how efficient a regular (standalone) power converter must be to equal the efficiency of integrated clock/power converter
[Figure: integrated clock driver and power converter (or stand-alone power converter) with input power Pin1 and output power Pout; clock driver portion (dummy) with input power Pin2; recycled energy is not counted as input power]
• Raw efficiency: $\eta_{raw} = \frac{P_{out}}{P_{in1}} \times 100$
• Effective efficiency: $\eta_{effective} = \frac{P_{out}}{P_{in1} - P_{in2}} \times 100$

Buck Converter – Simulation Results
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure: left, Vout (V) vs. Duty Ratio (%) for Iout = 30, 50, 70, 100; right, Effective Efficiency (%) vs. Iout (mA) for D = 30%, 40%, 50%, 60%, 70%]

ISSCC 2007
• 90nm test chip 1mm², buck converter 0.27mm²

Buck Converter – Chip Measurement vs.
Simulation Results
[Figure: left, chip measurement, Fsw sweep (D=50%) at 2GHz, 2.5GHz, 3GHz, 3.5GHz: Effective Efficiency (%) vs. Iout (mA); right, simulation (3GHz): Effective Efficiency (%) vs. Iout (mA) for D = 30%, 40%, 50%, 60%, 70%]

ISVLSI 2008 New Design 1: Boost Converter

Boost Converter
• Basic operation
  – Vclk provides power & timing
• 0th order result… Vout = D/(1-D)*Vdd
[Figure: boost converter (Vin, LF, switch, diode, CF, Vout) and its clock-driven form (Mp, Mn, Cshift, Dshift, Cclk, Vshift, Vclk_scaled, ILf), with waveforms of Vclk, Vshift, Vclk_scaled and ILf over modes 1 and 2]

Boost Converter
[Figure: boost converter schematic annotated with transistor W/L sizes; component values: CF = 378pF, Cshift = 21pF, Cclk = 21pF (Cclk = Cshift), LF = 310pH, Cclk_scaled 2.2pF, 1kΩ]

Boost Converter – Simulation Results
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure: left, Vout (V) vs. Duty Ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 100mA; right, Effective Efficiency (%) vs. Iout (mA) for D = 40%, 50%, 60%, 70%, 80%]

ISVLSI 2008 New Design 2: Buck-boost Converter

Buck-boost Converter
• Basic operation
  – Vclk provides power & timing
  – Vout ≤ 0
• 0th order result… Vout = -D^2/(1-D)*Vdd
[Figure: buck-boost converter (Vin, switch, diode, LF, CF, Vout) and its clock-driven form (Mp, Mn, Sclk, Cshift, Dshift, Cclk as clock load capacitance, Vshift, Vinv, ILf), with waveforms of Vclk, Vshift, Vinv and ILf over modes 1 and 2]

Buck-boost Converter
[Figure: buck-boost converter schematic annotated with driver-chain transistor W/L sizes]
[Schematic component values: Cclk = 21pF (clock load capacitance), LF = 310pH, Cshift = 21pF, Cbias 21pF, CF = 356pF, 1kΩ, 10.4kΩ; deep N-well structures; Dshift built from three diodes in series, each 128/0.1]

Buck-boost Converter
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure: left, Vout (V) vs. Duty Ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 90mA; right, Effective Efficiency (%) vs. Iout (mA) for D = 20%, 30%, 40%, 50%, 60%, 70%]

Results and Comparisons

Summary Results

                               Clock-only      Extra power to operate   Converter       % clock energy
                               power (input)   converter (input)        output power    recovered
  Buck converter [ISSCC 2007]  40mW            16mW                     26mW            50%
  Boost converter              100mW           25mW                     28mW            20%
  Buck-boost converter         100mW           72mW                     48mW            30%

• 90nm layouts, 3GHz operation, < 0.3mm²

Comparative Results
• IBM Power6
  – 100W@1V, 341mm²
  – Cclk = 13pF/mm²
• Other work: fully on-chip DC-DC buck converter
  – S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp.
    356–357
  – 27mm², 45MHz
  – 65% power efficiency
• This work
  – 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz
  – LC filter: 310pH inductor, 350pF capacitor
    • L and C are similar in size and dominate layout area: can stack to cut area in half
  – Cclk 20pF, equivalent to 1.6mm² of Power6 area
    • DC-DC converter adds 12.5% area overhead
  – Buck: 75 – 185% effective power efficiency (50% recovered)
  – Boost: 25 – 110% effective power efficiency (20% recovered)
  – Buck-boost: 20 – 66% effective power efficiency (30% recovered)

Conclusion
• Key concepts
  – High switching frequency saves area
  – Combined drivers save area and switching loss
  – Recycled charge: converter load discharges Cclk
  – ZVS delay circuit: lower power loss
• Limitations
  – Regulation needs variable duty cycle clock
    • May introduce additional clock jitter
    • Mostly suitable for edge-triggered blocks (no latches)
• Future work
  – Lots of improvements to make!

Thank you! Questions?
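
Worked check: effective efficiency
The effective-efficiency definition from the Effective Efficiency slide can be applied to the Summary Results table. The sketch below is only an illustration and rests on an assumption not stated in the slides: that Pin2 corresponds to the "clock-only power" column and that Pin1 is the sum of the "clock-only power" and "extra power to operate converter" columns. The function and variable names are made up for this example.

```python
# Raw vs. effective efficiency, using the Summary Results table.
# ASSUMPTION (not stated in the slides): Pin2 = clock-only power and
# Pin1 = clock-only power + extra power to operate the converter.

def raw_efficiency(p_out_mw, p_in1_mw):
    """Raw efficiency in percent: Pout / Pin1 * 100."""
    return 100.0 * p_out_mw / p_in1_mw


def effective_efficiency(p_out_mw, p_in1_mw, p_in2_mw):
    """Effective efficiency in percent: Pout / (Pin1 - Pin2) * 100."""
    return 100.0 * p_out_mw / (p_in1_mw - p_in2_mw)


# (clock-only power, extra converter power, converter output power), all in mW
SUMMARY = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}


def summarize(table=SUMMARY):
    """Return {name: (raw %, effective %)} under the assumption above."""
    results = {}
    for name, (clock_mw, extra_mw, out_mw) in table.items():
        p_in1 = clock_mw + extra_mw  # assumed total input power
        p_in2 = clock_mw             # assumed clock-driver-only power
        results[name] = (
            raw_efficiency(out_mw, p_in1),
            effective_efficiency(out_mw, p_in1, p_in2),
        )
    return results


if __name__ == "__main__":
    for name, (raw, eff) in summarize().items():
        print(f"{name:11s} raw = {raw:5.1f}%   effective = {eff:5.1f}%")
```

Under this reading the effective efficiencies come out at about 162.5% (buck), 112% (boost) and 66.7% (buck-boost), close to the upper ends of the ranges quoted on the Comparative Results slide. An effective efficiency above 100% is possible because the recycled clock energy is not counted as input power.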
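
Worked check: clock power and area overhead
The Problem slide describes charging Cclk and dumping its energy to GND every cycle. A first-order estimate of that power is C × V² × f. The sketch below applies it to the Power6 figures quoted on the Comparative Results slide, assuming a full-rail 1 V swing and that all of the clock capacitance toggles every cycle (no gating); both are simplifying assumptions, and the function names are illustrative only.

```python
# First-order clock power estimate, P = C * V^2 * f.
# ASSUMPTIONS: full-rail swing at Vdd, all clock capacitance switches every
# cycle (no clock gating). Input figures are the ones quoted in the slides.

POWER6_CCLK_F_PER_MM2 = 13e-12  # 13 pF/mm^2
POWER6_AREA_MM2 = 341.0         # die area
POWER6_VDD_V = 1.0              # from "100W@1V"
POWER6_FCLK_HZ = 5e9            # 5 GHz


def clock_power_w(c_farads, vdd_volts, f_hz):
    """Power drawn to charge c_farads to vdd_volts once per clock cycle."""
    return c_farads * vdd_volts ** 2 * f_hz


def power6_clock_power_w():
    """Clock power estimate for the whole Power6 die."""
    total_c = POWER6_CCLK_F_PER_MM2 * POWER6_AREA_MM2
    return clock_power_w(total_c, POWER6_VDD_V, POWER6_FCLK_HZ)


def equivalent_area_mm2(c_farads):
    """Power6 die area whose clock capacitance equals c_farads."""
    return c_farads / POWER6_CCLK_F_PER_MM2


def area_overhead_pct(converter_mm2, clock_area_mm2):
    """Converter area as a percentage of the clocked area it serves."""
    return 100.0 * converter_mm2 / clock_area_mm2


if __name__ == "__main__":
    print(f"Power6 clock power estimate: {power6_clock_power_w():.1f} W")
    print(f"Area equivalent to 20 pF:    {equivalent_area_mm2(20e-12):.2f} mm^2")
    print(f"Overhead of 0.20 mm^2 over 1.6 mm^2: {area_overhead_pct(0.20, 1.6):.1f}%")
```

The estimate of roughly 22 W agrees with the "> 22W" clock power listed for Power6, and 20 pF maps to about 1.5 mm², which the slides round to 1.6 mm². Using the 0.20 mm² converter area over 1.6 mm² reproduces the quoted 12.5% overhead; which of the three converter areas the slide used for that figure is not stated, so that pairing is a guess.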