June 2008 DRAFT - submitted to JSSC for review.
Energy Recycling from Multi-GHz Clocks using Fully Integrated Switching Converters
M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, P. Palmer
University of British Columbia, Vancouver, BC, Canada
Abstract
Large digital chips use a significant amount of energy to distribute a multi-GHz clock. By
discharging the clock network to ground every cycle, the energy stored in this large capacitor is
wasted. Instead, this energy can be recycled to another part of the chip using an on-chip DC-DC
converter. This paper investigates energy recycling by integrating three DC-DC converter
topologies, namely buck, boost and buck-boost, with a high-speed clock driver. The high
operating frequency significantly shrinks the required size of the L and C components so they can
be placed on-chip; typical converters place them off-chip. The clock driver and DC-DC converter
are able to share the entire tapered buffer chain, including the widest drive transistors in the final
stage. The clock duty cycle must be modulated to achieve voltage regulation, implying only
single-edge-triggered flops should be used. However, this minor drawback is eclipsed by the
benefits: by recovering energy from the clock, the output power can actually exceed the
incremental input power needed to operate the converter circuitry, resulting in an effective
efficiency greater than 100%. Circuit operation is validated using a 90nm prototype chip. Finally,
the converter output enables additional power-saving features such as low-voltage islands or
body bias voltages.
Keywords
Switching DC-DC converter, integrated DC-DC converter, integrated output filter, charge
recycling, energy recovery, energy recycling, integrated clock/converter, multi-GHz clock.
1. Introduction
The rapid increase in energy consumption of large digital circuits has been predominantly due to
an increase in total gate capacitance and an increase in operating frequency [1]. As a result, a
large fraction of the total energy budget is used by the high-frequency clock network. While the
clock energy is obviously proportional to frequency, additional energy is also expended in
modern designs to reduce clock skew and to drive increased capacitance caused by the more
closely spaced wires and thinner gate oxides. Most of this energy is consumed in the final drive
stage where a large load capacitance is charged and discharged every cycle [2]. For example, the
5+ GHz IBM POWER6 processor clock network consumes 22W at 1V in a 341mm² die [3],
suggesting it can be roughly modeled by an equivalent capacitance of 13pF/mm².
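The 13pF/mm² figure can be reproduced with a short calculation. The sketch below assumes the usual lumped dynamic-power model P = C × V² × f and takes "5+ GHz" as exactly 5 GHz; the function and variable names are illustrative and do not come from the paper.

```python
def equivalent_clock_capacitance(power_w, vdd_v, freq_hz):
    """Lumped capacitance that dissipates power_w when switched
    rail-to-rail once per cycle, using P = C * V^2 * f."""
    return power_w / (vdd_v ** 2 * freq_hz)


# Figures quoted in the text for the IBM POWER6 clock network [3].
POWER_W = 22.0
VDD_V = 1.0
FREQ_HZ = 5e9          # assumption: "5+ GHz" taken as exactly 5 GHz
DIE_AREA_MM2 = 341.0

c_total_f = equivalent_clock_capacitance(POWER_W, VDD_V, FREQ_HZ)
c_per_mm2_f = c_total_f / DIE_AREA_MM2

print(f"total equivalent capacitance: {c_total_f * 1e9:.2f} nF")
print(f"capacitance density: {c_per_mm2_f * 1e12:.1f} pF/mm^2")
```

Under these assumptions the density comes out just under 13pF/mm², which matches the rounded value used in the text.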
There are several methods used to reduce clock energy, such as gating the clock, low-swing signals, double-edge triggered flip-flops, adiabatic switching [4], and resonant clocking
[5][6]. All of these previous techniques have attempted to reduce the power consumption of
operating the clock, but have various issues. For example, resonant clocking shows promise for
up to 80% energy reduction in clock distribution, but it cannot be applied at the end loads where
most of the power is dissipated. The reason is that the sinusoidal waveform produced by
resonance does not have steep edges right at the critical switching time, which are needed for all
the flops to agree on the same, precise switching moment.
This paper investigates a new method for reducing clock energy, summarized in Figure 1,
where the clock energy is not reduced directly but is instead recovered using a DC-DC converter
and redeployed to another circuit in a regulated fashion. This redeployment reduces the total
current draw from the primary supply. We call this concept energy recycling [7].
One of the main advantages of energy recycling is the efficient generation of an on-chip
voltage supply which differs from the level offered by the primary supply. Since the on-chip DC-DC converter is small, many can be deployed across the chip to produce independent, regional
power supplies. This allows for several different regulated voltages to be on-chip at the same
time, all powered from a single primary supply. Various power-saving techniques such as mixed-voltage islands and adaptive body biasing (ABB) [8] can utilize these additional supply voltages.
An on-chip DC-DC converter can power these schemes without the need for external pins,
external components, or board design effort. Another advantage of on-chip converters is the
ability to respond quickly to dynamic load conditions in many-core processors, a key requirement
for achieving the savings promised by dynamic voltage and frequency scaling (DVFS) [9].
The main drawback of fully integrated switching converters is their low efficiency. High-quality discrete converters operate at above 80% efficiency. It can be difficult to reach this level
without off-chip components because a low switching frequency (hence, large LC components) is
often used to reduce switching loss. For example, the converter described in [10] uses 27mm² at
45MHz to reach an efficiency of 65%. The DC-DC converters described in this paper operate at
nearly 100× higher frequencies to shrink LC area by 99%, but switching losses do increase. To
mitigate the switching loss, energy stored in the clock capacitance is recycled. This extra energy
is available for free, since it would normally be wasted every cycle when the clock node is
discharged. As a result, the power needed to operate the DC-DC converter is greatly reduced.
To keep things simple, the new converters in this paper are presented in an ‘open loop’
mode with no voltage regulation capability. A more complete design, as presented in [10], would
include a feedback controller to regulate voltage by modulating the clock pulse width. However,
the operation of such a controller is not under investigation here. Also, in all designs, the power
converters target an output voltage ripple of less than 5% peak-to-peak.
The remainder of this paper is organized as follows. In the next section, we describe the
compatibility of merging a clock driver with a power converter and the resulting impact on
power conversion efficiency. Sections 3 to 5 present a clock driver integrated with buck, boost,
and buck-boost power converter topologies, respectively. Section 6 gives simulation results,
while Section 7 discusses layout implementation issues and Section 8 presents results from a
working proof-of-concept buck-converter prototype manufactured in 90nm CMOS. Section 9
concludes this paper.
2. Similarity of Clock Driver and Power Converter Circuits
Integrating a clock driver with a DC-DC converter merges several compatible concepts. First,
using the same high switching frequency (e.g., 3GHz) for the clock and converter reduces the
size of on-chip inductor and capacitor needed by the converter. Second, the final clock drivers
and the DC-DC power transistors are both very wide to improve switching time of the clock and
reduce output losses of the converter. These large, low-impedance transistors need to be driven
by a tapered inverter chain to keep up with the high frequency. Third, the power used by this
chain should be minimized in both cases. Fourth, high-efficiency DC-DC converters employ an
energy-saving concept known as zero-voltage switching (ZVS), where a power transistor is
turned on only after a “dead-time” delay when the source-drain voltage reaches 0V. During the
dead time, energy stored in the drain node capacitance is removed by the inductor and delivered
to the load instead of being wasted. Applying this ZVS technique to the large clock capacitance
is the key to energy recycling. Since the inductor is not used in a resonant fashion, clock edges
are kept fast. Lastly, many DC-DC converters use pulse width modulation (with fast edge rates)
for output regulation, a scheme compatible with single-edge triggered clocking.
Figure 2 shows how combining the clock driver circuit with the power converter circuit
helps to increase the overall efficiency. The integrated clock driver/power converter in Figure
2(a) receives Pin1 input power and provides Pout output power, resulting in a raw efficiency of
η = Pout /Pin1 < 1. However, part of Pin1 is required to operate the clock network. If a dedicated
clock driver was constructed, this power consumption would be Pin2. Hence, an integrated clock
driver/power converter circuit uses Pin1 − Pin2 additional power to operate the power converter
portion and recycle energy from the clock driver. As shown in Figure 2(b), if the clock driver
power (and circuitry) was removed from the integrated design and replaced with a standalone
converter, it would need to provide Pout using just the incremental power, Pin1 − Pin2. Thus,
recycling the clock power increases the overall efficiency.
To compare the dual-purpose circuit with a traditional stand-alone power converter, we
use a new metric called effective efficiency (ηeff), which is defined in Equation (1) as the output
power divided by the incremental input power needed to operate the converter. Effective
efficiency captures how efficient a traditional stand-alone converter needs to be to supply the
same output power using just the incremental input power used by the dual-purpose circuit.
\[
\eta_{\mathrm{eff}} = \frac{P_{out}}{P_{in1} - P_{in2}} \times 100 \tag{1}
\]
Due to the poor efficiency typically associated with very high switching rates, we
consider an effective efficiency greater than 50% to be good for an on-chip converter.¹ However,
in some cases, circuits in this paper provide effective efficiency greater than 100%. This is not
evidence of the perpetual motion fallacy; it is proof that otherwise wasted energy is being
recycled from the clock tree and delivered to the converter output.
¹ The threshold value of 50% effective efficiency was chosen to match the raw efficiency of the integrated clock driver/buck
converter chip presented in this paper. Achieving anything better than the raw efficiency should be considered good.
3. Integrated Clock Driver/Buck Converter
This section describes the operation of a buck converter and how it can be integrated with a
traditional tapered clock driver chain. This combined circuit is used to drive a large clock
capacitance and output a voltage lower than VDD which can be used as a supply for low-voltage
islands to further reduce energy consumption. This circuit was initially presented in [11], but is
described here in greater detail.
3.1. Simplified Circuit
A buck converter operates by averaging a square-wave signal through a low-pass filter as shown
in Figure 3(a). The output voltage of an ideal converter is the average or DC value of its input,
D×VDD, which implies that it is a function of the magnitude and duty cycle of the input, but not
the frequency.
A simple integrated clock driver/buck converter circuit is shown in Figure 3(b) where a
chain of cascaded inverters (not shown) drives node Vclk-in. Capacitance Cclk is the overall
capacitance at the clock node and includes all transistor and wiring capacitances at this node. The
operation of this circuit is summarized by the idealized timing diagram in Figure 3(c), where D,
Tsw, and Tdelay represent clock duty cycle, switching (clock) period, and ZVS dead-time,
respectively. There are three modes of operation:
• Mode 1 (time 0 to D×Tsw) is intended to drive the load and charge Cclk through Mp.
Inductor current increases linearly since the voltage across it is constant.
• Mode 2 (time D×Tsw to D×Tsw+Tdelay) is intended for energy recycling. During this time,
both Mn and Mp are off and the charge stored in Cclk is moved to the output circuit
through the inductor, as the inductor current cannot be disrupted abruptly. This results in
a rapid drop of Vclk, which is intended. In this short period of time, the inductor current
can be assumed approximately constant. It is worth mentioning that if no delay is present, Cclk
would be discharged to ground at time D×Tsw through Mn, wasting the energy.
• Mode 3 (time D×Tsw+Tdelay to Tsw) starts when the voltage across Mn is close to zero. At
this time Mn is turned on to provide a low-resistance path for the inductor current. At this
point, no further energy is supplied to the inductor and the voltage across it is constant, so
inductor current decreases linearly. ZVS operation occurs when Mn is turned on while its
source-drain voltage is close to zero, thereby reducing dynamic power loss.
Theoretically, in Mode 3, when the falling inductor current crosses zero, Mn could be turned off
to allow charging Cclk with the negative inductor current. Then, at the beginning of the next
switching cycle, Mp would be turned on with 0V across it (i.e., ZVS operation for Mp). In
practice, this increases the output voltage ripple, as CF must provide the required charge for the
large Cclk. In this design, no ZVS operation is implemented for Mp.
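To give a sense of scale for Mode 2, the sketch below estimates how much energy sits on the clock node each cycle and therefore how much power could be recycled at best. It assumes Cclk is charged fully to VDD and fully drained through the inductor every cycle, and it uses values quoted elsewhere in the paper (Cclk = 12pF, VDD = 1.0V, a 3GHz clock). The function names are illustrative, and the result is an idealized upper bound, not a measured quantity.

```python
def stored_energy_j(c_f, v):
    """Energy held on a capacitor charged to voltage v: E = 1/2 * C * V^2."""
    return 0.5 * c_f * v ** 2


def recyclable_power_w(c_f, v, f_hz):
    """Upper bound on recycled power if the stored energy is fully
    recovered once per clock cycle."""
    return stored_energy_j(c_f, v) * f_hz


C_CLK_F = 12e-12   # clock node capacitance quoted in Section 3.2
VDD_V = 1.0        # supply voltage used for the prototype measurements
F_SW_HZ = 3e9      # example clock frequency used in the paper

energy_per_cycle_j = stored_energy_j(C_CLK_F, VDD_V)
power_bound_w = recyclable_power_w(C_CLK_F, VDD_V, F_SW_HZ)

print(f"energy on Cclk per cycle: {energy_per_cycle_j * 1e12:.1f} pJ")
print(f"ideal recyclable power:   {power_bound_w * 1e3:.1f} mW")
```

Without the dead time, this energy would be dissipated in Mn at the start of every low phase.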
3.2. Complete Circuit
Measuring the performance of the new circuit requires constructing a reference clock driver
circuit as well as the integrated clock driver/buck converter itself. These two circuits are shown
in Figures 4 and 5, respectively. The reference clock driver has the same transistor sizes and load
as the integrated design to facilitate experimental measurements with a fabricated prototype.
In both circuits, PMOS transistors are three times wider than NMOS transistors, except
for the last inverter stage in which the PMOS is four times wider to reduce the voltage drop
across Mp while Vclk is high and the current is building up in the inductor LF (Mode 1). A tapering
factor equivalent to a fan-out of four is used for the inverter chain, which minimizes clock
latency from the source. To reduce front-end energy and improve overall conversion efficiency,
this tapering factor could be increased. NMOS transistor gate capacitance is used to implement
the converter filter capacitor, CF, while the gate capacitance of a large dummy inverter is used to
represent the parasitic and load capacitance at the clock node, Cclk. The value of Cclk is estimated
to be 12pF, roughly equivalent to a 1mm² region of the IBM POWER6 processor.
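The fan-out-of-four tapering can be illustrated with a small sizing sketch. The 12pF load comes from the text, but the 10fF input capacitance of the first inverter is a hypothetical value chosen only for illustration, and the helper `tapered_chain` is not part of the paper's design flow.

```python
import math


def tapered_chain(c_load_f, c_in_f, fanout=4.0):
    """Size a tapered inverter chain.

    Returns (stage_caps, ratio): the input capacitance of each stage and
    the actual per-stage tapering ratio after rounding the stage count
    to a whole number.
    """
    if c_load_f <= c_in_f:
        raise ValueError("load must be larger than the first-stage input")
    n_stages = max(1, round(math.log(c_load_f / c_in_f) / math.log(fanout)))
    ratio = (c_load_f / c_in_f) ** (1.0 / n_stages)
    stage_caps = [c_in_f * ratio ** i for i in range(n_stages)]
    return stage_caps, ratio


C_CLK_F = 12e-12    # clock load quoted in the text
C_FIRST_F = 10e-15  # hypothetical input capacitance of the first inverter

stages, ratio = tapered_chain(C_CLK_F, C_FIRST_F)
print(f"{len(stages)} stages, tapering ratio {ratio:.2f}")
for i, c in enumerate(stages, start=1):
    print(f"  stage {i}: input capacitance {c * 1e15:.0f} fF")
```

Passing a larger `fanout` gives fewer stages, which is the trade-off the text mentions: less front-end energy in exchange for more clock latency.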
To control the exact on/off timing of Mn and Mp shown in Figure 5, the inverter driving
those transistors is replaced with two separate inverters, with the same total transistor sizes as the
original single driver. To implement a delay for the ZVS dead-time, the gate of M1 is connected
to Vclk instead of being connected to the gate of M2. Therefore, compared to Vp, the rising edge of
Vn is delayed by Tdelay, a duration which depends on how quickly LF drains Cclk and how fast M1
turns on to raise Vn. A drop in Vclk causes M1 and then Mn to turn on, which in turn pulls Vclk
down faster. Since the gate of M2 is connected to Vm, no falling-edge delay is observed for
Vn. To prevent M1 and M2 from being on concurrently at the rising edge of Vm, the source of M1 is
connected to Vp instead of VDD and uses its charge. Therefore, Vn falls at the falling edge of Vp.
The duty cycle observed at Vclk, which determines the output voltage, is influenced by
how quickly Cclk is drained by the load. The output voltage is given by Vout = Deff × Vin, where
\[
D_{\mathrm{eff}} = D + \frac{1}{2} \cdot \frac{T_{delay} - T_{fall}}{T_{sw}} \tag{2}
\]
Here, Tfall is the fall time of Vclk if there is no ZVS delay (reference clock driver circuit), while
Tdelay is the fall time of Vclk in the presence of ZVS (integrated driver/converter circuit).
Equation (2) suggests that if Tdelay is equal to Tfall, duty cycle remains unchanged. Any Tdelay
larger than Tfall increases the effective duty cycle accordingly. Tdelay can be calculated using the
simplified circuit model given in Figure 6. At the time t = 0 when Mp turns off,
\[
V_{clk}(0) = V_{DD} - I_{Lmax} \cdot R_{on\text{-}PMOS} .
\]
During the clock fall, ILmax can be assumed to be constant; therefore,
\[
V_{clk}(t) = V_{clk}(0) - \frac{1}{C_{clk}} \cdot I_{Lmax} \cdot t .
\]
The time it takes for Vclk to reach zero is
\[
T_{delay} = C_{clk} \left( \frac{V_{DD}}{I_{Lmax}} - R_{on\text{-}PMOS} \right) . \tag{3}
\]
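Equations (2) and (3) can be evaluated numerically as a sanity check. In the sketch below, Cclk, VDD, the 3GHz clock, and the 190mA peak inductor current are values quoted in the paper. The PMOS on-resistance of 0.5Ω is an assumption inferred from the roughly 0.1V droop reported in the simulations, and the 30ps reference fall time is purely hypothetical. The outputs are illustrative only.

```python
def zvs_dead_time_s(c_clk_f, vdd_v, i_lmax_a, r_on_pmos_ohm):
    """Equation (3): time for a constant inductor current to drain Cclk
    from VDD - I_Lmax * R_on down to zero."""
    return c_clk_f * (vdd_v / i_lmax_a - r_on_pmos_ohm)


def effective_duty_cycle(d, t_delay_s, t_fall_s, t_sw_s):
    """Equation (2): duty cycle seen at Vclk once the slower ZVS fall
    replaces the reference fall time."""
    return d + 0.5 * (t_delay_s - t_fall_s) / t_sw_s


C_CLK_F = 12e-12      # from Section 3.2
VDD_V = 1.0           # supply voltage
I_LMAX_A = 0.19       # peak inductor current from the buck simulation
R_ON_PMOS_OHM = 0.5   # assumption: about 0.1 V droop at about 0.19 A
T_FALL_S = 30e-12     # hypothetical fall time of the reference clock
T_SW_S = 1.0 / 3e9    # 3 GHz clock period
D = 0.5

t_delay_s = zvs_dead_time_s(C_CLK_F, VDD_V, I_LMAX_A, R_ON_PMOS_OHM)
d_eff = effective_duty_cycle(D, t_delay_s, T_FALL_S, T_SW_S)

print(f"Tdelay = {t_delay_s * 1e12:.1f} ps")
print(f"Deff   = {d_eff:.3f}")
```

With these inputs Tdelay exceeds the assumed Tfall, so Deff ends up slightly above D, in line with the remark that any Tdelay larger than Tfall increases the effective duty cycle.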
Timing uncertainty is an important issue in a clock distribution network. In Figure 5, as
Cclk is charged and discharged through non-similar circuit routes, rising/falling edges of the Vclk
are not similar. The most significant concern is rise- and fall-time dependence on a dynamic load.
When the clock is rising, different load currents result in different voltage drops across the Mp
on-resistance, Ron-PMOS. As a result, the rising edge of the clock is slightly modified based on the
load current. For the falling edge, the problem is more severe, as the load current solely
determines the falling slope of the clock signal before the ZVS delay circuit is triggered.
It is also worth noting that no oscillation happens at the Vclk node: as Vclk approaches 0V,
Mn turns on and prevents the oscillation by keeping Vclk at 0V until the next Mp turn-on phase.
4. Integrated Clock Driver/Boost Converter
The buck converter just described cannot output a voltage close to VDD because the PWM clock
signal must remain low for a significant time period. Instead, a boost converter can be used to
provide increased output voltage range. The circuit was originally presented in [12]; this paper provides
additional circuit design and implementation details. This boost converter design also led to the
development of a new low-power clock driver [13].
4.1. Simplified Circuit
In the typical boost converter of Figure 7(a), when the switch is closed, voltage Vin is across the
inductor LF and it builds up current. In the next phase, when the switch is open, inductor current
finds its way through the diode and charges the output capacitor.
A simple integrated clock driver/boost converter circuit is shown in Figure 7(b). It uses a
switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the PMOS
transistor in place of the power diode. In addition to providing output voltage levels higher than
VDD, the circuit also produces a buffered version of the clock, Vclk_scaled, at the same magnitude as
Vout. This clock signal can be used in the circuitry powered by the converter, but allowances for
clock skew and level-conversion will need to be made in the data path logic. There are two
modes of operation: Mode 1, where Vclk goes high and Mn turns on to build up inductor current,
and Mode 2 where inductor current finds its way through Mp to charge the output capacitor CF.
The output voltage of this boost converter would be
\[
V_{out} = \frac{1}{1-D} \times V_{in} = \frac{D}{1-D} \times V_{DD} ,
\]
where Vin = mean(Vclk) = D × VDD. Ideally, when D > 50%, the output voltage will be higher than
VDD.
4.2. Complete Circuit
The complete integrated clock driver/boost converter circuit is shown in Figure 8. Mn2 and Mp2
introduce the turn-on delay to implement ZVS for Mn1 as in the buck circuit. The drain node of
Mn3 and Mp3, denoted as Vclk_scaled, swings from zero to Vout. The value of the scaled clock
capacitor is assumed to be 2.2pF. Some of the recovered energy is subsequently lost when this
capacitor is discharged, so it should be kept small.
In Figure 8, the gating signal for Mn3 changes from VDD to zero. However, the appropriate
gating signal for Mp3 should instead change from Vout to Vout – VDD. The combination of diodes
Dshift, capacitor Cshift, and transistors Mn1 and Mp1 acts as a switched-capacitor voltage shifter.
Except for Dshift and Cshift, all transistor body terminals are connected to their source pins.
The body terminals of Dshift and Cshift are connected to ground instead. This prevents forward
biasing of the body-drain intrinsic diode, in case drain voltage goes lower than the source
voltage. It also makes the layout implementation easier, since no deep n-well structure
is required. Finally, a 1kΩ resistor is added in parallel to Cshift to bias the Dshift diodes and provide
a DC current path to avoid floating nodes when Dshift is off.
5. Integrated Clock Driver/Buck-Boost Converter
This section integrates a buck-boost converter with a clock driver to produce a negative output
voltage. The circuit was originally presented in [12]; this paper provides circuit implementation details.
5.1. Simplified Circuit
In the typical buck-boost converter of Figure 7(c), when the switch is on, voltage Vin will be
across the inductor LF and current will build up in the inductor. In the next phase, when the
switch is off, inductor current goes through the diode and charges the output capacitor. The diode
prevents shorting Vout to VDD when the switch is on.
A simple integrated clock driver/buck-boost converter circuit is shown in Figure 7(d). It
uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the NMOS
used in place of the power diode. An extra switch Sclk is also added between nodes Vclk and Vinv.
This switch prevents Vclk from becoming negative as Vinv goes below zero when Mn is on.
Similar to the boost converter, the buck-boost converter also has two modes of operation.
In Mode 1, current builds up in the inductor, while in Mode 2 the inductor current absorbs the
energy in Cclk then delivers its stored energy to CF. The output voltage of this buck-boost
converter is calculated by
\[
V_{out} = \frac{-D}{1-D} \times V_{in} = \frac{-D^{2}}{1-D} \times V_{DD} ,
\]
which is negative.
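The ideal output-voltage relations for the three topologies can be compared side by side. This is a minimal sketch of the lossless formulas stated in Sections 3 to 5, with Vin = D × VDD for the boost and buck-boost cases as in the text. It ignores the effective duty cycle correction and all resistive drops, and the function names are illustrative.

```python
def _check_duty(d):
    if not 0.0 < d < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")


def buck_vout(d, vdd):
    """Ideal buck output: the average of the clock waveform, D * VDD."""
    _check_duty(d)
    return d * vdd


def boost_vout(d, vdd):
    """Ideal boost output: Vin / (1 - D), with Vin = D * VDD."""
    _check_duty(d)
    v_in = d * vdd
    return v_in / (1.0 - d)


def buck_boost_vout(d, vdd):
    """Ideal buck-boost output: -D * Vin / (1 - D), with Vin = D * VDD."""
    _check_duty(d)
    v_in = d * vdd
    return -d * v_in / (1.0 - d)


VDD_V = 1.0
for d in (0.3, 0.5, 0.7):
    print(f"D={d:.1f}  buck={buck_vout(d, VDD_V):+.3f} V  "
          f"boost={boost_vout(d, VDD_V):+.3f} V  "
          f"buck-boost={buck_boost_vout(d, VDD_V):+.3f} V")
```

The sweep shows the boost output crossing VDD at D = 50% and the buck-boost output staying negative over the whole range, as the text states.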
5.2. Complete Circuit
The complete integrated clock driver/buck-boost converter circuit is shown in Figure 9. Many of
the changes are similar in nature to those used to implement the boost circuit, i.e., the addition of
Mp3 and Mn3 to delay the energy-wasting discharge of Cclk. Also, in Figure 9, the gating signal for
Mp1 changes from zero to VDD but Mn1 needs a voltage-shifted value produced by the
combination of diodes Dshift, capacitor Cshift, and transistors Mn3 and Mp3.
There are three design decisions in Figure 9 that warrant further discussion. First,
transistors Mp2 and Mn2 are added to protect Mp1 and Mn1 from potentially large voltage drops
across them. Second, transistor Mp4 acts as the switch to prevent Vclk from going negative. The
gate of Mp4 is connected to Vbias, which is set at the threshold voltage of PMOS transistor Mp5.
Third, the body of all NMOS transistors needs to be connected to their sources or the most
negative voltage in the system to prevent forward biasing of body-source intrinsic diodes.
6. Simulation Results
All three integrated clock-driver/switching converters were designed and simulated in 90nm
CMOS technology. In the buck design, all transistors are standard-Vt (standard-threshold) type,
but the other two use only low-Vt type transistors to facilitate operation at lower VDD levels. To
provide greater opportunity for energy recycling, the boost and buck-boost designs use a larger
Cclk estimated to be 25pF.
6.1. Buck Simulation
Simulated waveforms for the integrated buck converter are shown in Figure 10. The circuit is
simulated with a 50% duty cycle and 70mA load current. The inductor current shown as Lf in
Figure 10(b) exhibits a triangular shape as expected, with minimum and maximum values of
around -50mA and 190mA, respectively. In the first half cycle of the clock, Mp source current
provides the energy to charge up Cclk as well as LF. Because of the high current, there is a voltage
drop of ~0.1V across Mp as suggested by the droop of Vclk to ~0.9V in Figure 10(a). In this figure,
the reference clock circuit output is shown as Vclk-ref. Both clocks have similar edge slopes. In the
second half cycle of the clock, inductor current discharges Cclk. As can be seen in Figure 10(b),
Mn source current is always positive, which means that all the charge in Cclk is delivered to the
load instead of the ground.
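The quoted current extremes can be cross-checked against the load current. In steady state the filter capacitor carries no net DC current, so the average inductor current of a buck converter equals the load current. The sketch below assumes an ideal triangular waveform, and the helper names are illustrative.

```python
def triangle_mean(i_min_a, i_max_a):
    """Mean of an ideal triangular waveform swinging between two extremes."""
    return 0.5 * (i_min_a + i_max_a)


I_MIN_A = -0.050   # simulated minimum inductor current
I_MAX_A = 0.190    # simulated maximum inductor current
I_LOAD_A = 0.070   # simulated load current

i_avg_a = triangle_mean(I_MIN_A, I_MAX_A)
ripple_pp_a = I_MAX_A - I_MIN_A

print(f"average inductor current: {i_avg_a * 1e3:.0f} mA")
print(f"peak-to-peak ripple:      {ripple_pp_a * 1e3:.0f} mA")
print(f"matches load current:     {abs(i_avg_a - I_LOAD_A) < 1e-3}")
```

The average works out to the 70mA load current, so the quoted extremes are consistent with the stated operating point.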
The simulated output voltage and effective efficiency of the buck converter at different
duty cycles and output currents are given in Figure 11. The output voltage increases as D is
increased and, at the same time, the effective efficiency decreases because more of the output
power comes from conduction via Mp1 rather than energy recycling of Cclk. For example, at
70mA output current, by varying the duty cycle from 30% to 70%, the effective efficiency ranges
from 286% down to 135%. For the reference circuit (the clock driver alone), simulations
determined its power consumption, Pin2, is 41mW.
6.2. Boost and Buck-boost Simulations
Figures 12 and 13 show the output voltage and the effective efficiency of the integrated boost
converter and buck-boost converter, respectively. The output voltage increases in magnitude as D
is increased and, at the same time, the effective efficiency decreases. For the boost converter, a
maximum effective efficiency of 111% is achieved at D = 40% with Iout = 30mA. As mentioned
earlier, achieving an effective efficiency above 100% is definitive proof that “free energy” is
being recovered from the clock tree. For the buck-boost converter, effective efficiency reaches a
maximum of 66% at D = 20% with Iout = 50mA. The lower efficiency is a result of more
transistors in the main current path.
For the reference circuit, simulations determined its power consumption, Pin2, was
100mW. This is higher than the buck circuit, primarily because of a larger Cclk.
7. Prototype Implementation
Layout design and fabrication for all three designs are performed in 90nm CMOS. This section
describes layout issues for the inductor, capacitor, and three converter circuits.
7.1. Implementation of the Inductor
The use of magnetic materials to increase inductance [14] is desirable, but this work assumes
only conventional CMOS processes and, consequently, coreless inductors. These integrated
inductors induce eddy currents in the substrate, which can be mitigated by placing a metal
patterned-ground shield (PGS) in between the inductor coil and the substrate [15] [16] [17] [18].
The inductor in the buck design uses a single-turn octagon, placing metal layers 6 and 7 in parallel to
reduce series resistance. To further reduce resistance, the inductor in the other two designs places
metal layers 4 through 7 and one extra aluminum (ALUCAP) layer in parallel resulting in a
slightly different estimated inductance. In all cases, the PGS is implemented in metal 1 to keep it
as far as possible from the inductor. ASITIC [19] was used to extract the inductor characteristics
shown in Figure 14(a) and the simplified π model of Figure 14(b) for the inductor used in the
buck converter chip. The inductor layout area is 0.1mm².
7.2. Implementation of the Bulk Capacitor
In CMOS technology, MOSFET gate capacitors have the highest capacitance density, but they are
nonlinear [20]. This nonlinear behavior of gate capacitance is not significant in power converter
applications such as this work because capacitance is used only for energy storage. An array of
hundreds of NMOS devices in parallel is used to provide the high capacitance needed.
The internal resistance of a MOS gate capacitor, known as equivalent series resistance (ESR), is
reduced by using a transistor W/L ratio of 10 [21]. A reduced ESR not only decreases power
dissipation in the capacitor, but also lowers the voltage ripple across it.
7.3. Implementation of Three Complete Designs
The block diagram and chip micrograph of the buck design are shown in Figure 15. The area of
the buck design is 0.27mm², and the total die area is 1mm². A second chip, with a total die area
of 2mm², contains the boost and buck-boost designs as well as the circuit described in [13]. The
boost design, including LF, is 0.26mm². Due to increased layout effort, the buck-boost design
requires only 0.2mm². In comparison, the area of the reference clock driver is 0.03mm².
The layout of all three circuits is organized for probe station testing. Paths that carry high
currents are wide, slotted, and use many parallel vias. We are unable to monitor the internal
waveforms of the chip without being invasive, so waveform measurements are not available.
8. Prototype Measurement Results
After verifying that the circuit simulated successfully, the buck converter was manufactured and
tested first. For precise power measurement, all the parasitic resistances in the test setup of the
buck circuit were accounted for through measurement and calibration. As a result, a supply
voltage of 1.0V was applied at the chip probe pads. An external signal generator provides a clock
signal to the chip under test with a 50% duty cycle.²
The converter output voltage vs. the output current is plotted in Figure 16(a). For each
curve, the input duty cycle is kept constant at 50% and the switching frequency is changed. As
expected, the output voltage does not vary much with frequency. The output voltage increases as
output current decreases because of Deff and less resistive voltage drop in the circuit.
Similarly, Figure 16(b) plots the input power vs. output current. The input power to the
integrated clock driver/buck converter is plotted as Pin1, and the input power to the reference
clock circuit as Pin2. The input power increases with frequency due to switching activity.
Raw and effective efficiency are plotted in Figure 17. Raw efficiency does not change
much at different frequencies. Effective efficiency is higher at lower output currents. Since the
available energy in Cclk is constant with respect to output current, at low output currents a greater
proportion of the output energy comes from recycling. However, higher Fsw results in more
energy being stored in the capacitor per second. Hence, ηeff benefits from increasing the
frequency and lowering the output current. Achieving an effective efficiency above 100% is
definitive proof that energy is being recovered from the clock. At 3GHz, Pin2 of the reference
clock is measured to be 43mW. At the same time Pin1 is measured to be 58mW for an output
voltage of 0.75V at 37mA load current (Pout1=27mW). Thus, η = 47% and ηeff = 180%.
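The two efficiency figures follow directly from the measured powers and Equation (1). The sketch below plugs in the values reported at 3GHz; the function names are illustrative, and the powers are kept in milliwatts because only their ratios matter.

```python
def raw_efficiency_pct(p_out, p_in1):
    """Raw efficiency: output power over total input power, in percent."""
    return 100.0 * p_out / p_in1


def effective_efficiency_pct(p_out, p_in1, p_in2):
    """Equation (1): output power over the incremental input power
    (integrated circuit minus reference clock driver), in percent."""
    incremental = p_in1 - p_in2
    if incremental <= 0:
        raise ValueError("integrated circuit must draw more than the reference")
    return 100.0 * p_out / incremental


# Measured values at 3 GHz, in milliwatts.
P_OUT_MW = 27.0   # converter output power
P_IN1_MW = 58.0   # integrated clock driver/buck converter
P_IN2_MW = 43.0   # reference clock driver alone

eta_raw = raw_efficiency_pct(P_OUT_MW, P_IN1_MW)
eta_eff = effective_efficiency_pct(P_OUT_MW, P_IN1_MW, P_IN2_MW)

print(f"raw efficiency:       {eta_raw:.1f} %")
print(f"effective efficiency: {eta_eff:.0f} %")
```

The incremental input power is only 15mW, which is why the effective efficiency reaches 180% while the raw efficiency stays below 50%.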
Table 1 provides a summary of performance comparison between this work and two other
previously published buck converters. The output voltage ripple is part of the design
specification. Among the published on-chip DC-DC converters, [22] has the highest reported
switching frequency at 480MHz, but it uses on-package inductors.³ In contrast, [10] implemented
a fully on-chip buck converter in 0.18µm SiGe RF BiCMOS technology that was 65% efficient.
It also used an area of 27mm² to fit the large passive components. The buck converter in this
work achieves a much higher effective efficiency using only 1/100th of the area.
After testing the buck converter successfully, the boost and buck-boost designs were
manufactured together on a shared die and tested. Unfortunately, the boost and buck-boost circuit
prototypes were not functional. For these circuits, measurements indicate an output voltage of a
few hundreds of millivolts, which is lower than expected. There are several suspected issues,
mostly arising from common elements of the layout or design, but we were unable to identify a
specific cause. The most likely cause is a faulty voltage shifter circuit, leading to the inability to
turn off the associated power transistor and unintentional draining of CF.
Footnote 2: The circuit also behaves as expected at a 66% duty cycle [21]. Generator limits prevented testing other duty cycles.
Footnote 3: The design later appeared in [23], switching at 233MHz to further boost converter efficiency.
9. Conclusions
This paper investigates energy recovery from a clock load in high-speed digital circuits by
exploring the integration of three switching DC-DC converter topologies with a clock driver.
These integrated driver/converter circuits recycle the energy (in the form of charge stored) from
the clock capacitance. Several characteristics make integration of these two different circuits very
promising: the multi-GHz system clock used for switching significantly reduces the size of the
output filter components, making their integration feasible; the clock driver and power converter
can share the tapered buffer chain, which can be optimized for low power; and energy wasted
during clock discharge can be recovered using a ZVS technique. Unlike resonant schemes, which
normally produce sinusoidal clock waveforms, simulations show this approach produces good-quality, quasi-square clocks. In addition, the power converter output enables further power-saving features to be employed, such as low-voltage islands or body bias voltages.
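To illustrate why a multi-GHz switching frequency shrinks the output filter, the sketch below uses the textbook ripple relation for an ideal buck converter in steady state, ΔVout/Vout = (1 - D) / (8·L·C·Fsw²), and solves it for the required L·C product. This is a simplified model, not the analysis used in this paper: it ignores parasitics and the interaction with the clock load, and the duty ratio of 0.75 is an assumed value chosen only for illustration. The frequencies and the L and C values are taken from Table 1; the function name is hypothetical.

```python
def required_lc_product(duty, f_sw_hz, ripple_fraction):
    """L*C product (in H*F) that keeps dVout/Vout at ripple_fraction.

    Ideal buck converter relation (an assumption of this sketch):
        dVout / Vout = (1 - D) / (8 * L * C * Fsw**2)
    """
    return (1.0 - duty) / (8.0 * ripple_fraction * f_sw_hz ** 2)


DUTY = 0.75      # assumed duty ratio, for illustration only
RIPPLE = 0.05    # 5% output ripple target, matching the "< 5%" entry in Table 1

# Switching frequencies of [10], [22] and this work, as listed in Table 1
for f_sw in (45e6, 480e6, 3e9):
    lc = required_lc_product(DUTY, f_sw, RIPPLE)
    print(f"Fsw = {f_sw / 1e6:6.0f} MHz -> L*C >= {lc:.2e} H*F")

# L*C actually used in this work (320pH and 350pF from Table 1)
lc_this_work = 320e-12 * 350e-12
lc_needed_3ghz = required_lc_product(DUTY, 3e9, RIPPLE)
print(f"this work: L*C = {lc_this_work:.2e} H*F, ideal-model minimum = {lc_needed_3ghz:.2e} H*F")
```

Because the required L·C product falls with the square of Fsw in this idealized model, moving from tens of MHz to 3GHz reduces it by more than three orders of magnitude, which is consistent with the claim that the filter components become small enough to integrate.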
In this work, the buck converter has the simplest implementation. A proof-of-concept
fully integrated clock driver/buck converter test chip was fabricated in a standard 90nm CMOS
technology. Measurements indicate the chip operates with an effective efficiency up to 180%.
Effective efficiency can exceed 100% because the circuit recycles the “free energy” that is available from the clock.
Going forward, the circuits described in this work leave significant room for further
optimization. For example, this paper focuses on using ZVS to recycle the back-end energy of the
final clock driver where most of the energy is lost. However, it is possible to lower switching
losses even further with front-end energy recycling [24] or gate charge recycling [25].
Alternatively, [26] suggests a way to shrink passive components by employing two inductors to
cancel output ripple. Future work can improve upon the converters in this paper by improving
their design and combining them with other recent advances.
In addition, there are a number of concerns that still need to be addressed before this work
can be made practical. First, clock jitter may be increased as a result of the pulse-width
modulation required for voltage regulation. As well, there may be issues with stopping the clock,
such as possible oscillations due to the LC elements. Integration with the layout of a real high-speed microprocessor, or a reasonable proxy, is needed to verify practical layout issues with the
power grid and low-skew clock distribution network. Also, the new clock waveform is only
suitable for edge-triggered digital flip-flops, whereas high-performance microprocessors
frequently employ transparent latches. Future work is needed to address these concerns.
Acknowledgements
The authors would like to thank several people who have provided feedback for this work,
including Daryl van Vorst and Professors David Pulfrey, Resve Saleh, and K.C. Smith. We
gratefully acknowledge the contributions of CMC Microsystems for providing CAD tools and
chip fabrication resources, as well as funding from the Natural Sciences and Engineering
Research Council of Canada (NSERC).
References
[1] E. G. Friedman, “Clock Distribution Networks in Synchronous Digital Integrated Circuits,”
Proc. of the IEEE, vol. 89, no. 5, pp. 665-689, May 2001.
[2] N. Ranganathan and N. Jouppi, “Evaluating the Potential of Future On-Chip Clock
Distribution using Optical Interconnects,” Hewlett-Packard Company Labs Technical Report
HPL-2007-163, 2007.
[3] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y.
Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti,
“Design of the Power6 Microprocessor,” IEEE Int. Solid-State Circuits Conf. Dig. Tech.
Papers, pp. 96-97, 2007.
[4] M. Stan and W. Burleson, “Low-power CMOS Clock Drivers,” Proc. ACM/IEEE Int.
Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pp. 149-156, 1995.
[5] S. C. Chan, K. L. Shepard, and P. J. Restle, “Uniform-Phase Uniform-Amplitude Resonant-Load Global Clock Distributions,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 102-109,
Jan. 2005.
[6] S. C. Chan, K. L. Shepard, and P. J. Restle, “Distributed Differential Oscillators for Global
Clock Networks,” IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2083-2094, Sept. 2006.
[7] G. Lemieux, M. Alimadadi, S. Sheikhaei, S. Mirabbasi, and P. Palmer, “SoC Energy Savings
= Reduce + Reuse + Recycle: A Case Study Using a 660MHz DC-DC Converter with
Integrated Output Filter,” Proc. IEEE Canadian Conf. on Electrical and Computer
Engineering, pp. 947-950, 2008.
[8] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and
V. De, “Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter
Variations on Microprocessor Frequency and Leakage,” IEEE J. Solid-State Circuits, vol. 37,
no. 11, pp. 1396-1402, Nov. 2002.
[9] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, “System Level Analysis of Fast, per-core
DVFS using On-chip Switching Regulators,” IEEE Int. Symposium on High Performance
Computer Architecture, pp. 123-134, 2008.
[10] S. Abedinpour, B. Bakkaloglu, and S. Kiaei, “A Multi-Stage Interleaved Synchronous Buck
Converter with Integrated Output Filter in a 0.18µm SiGe Process,” IEEE Int. Solid-State
Circuits Conf. Dig. Tech. Papers, pp. 356-357, 2006.
[11] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, and P. Palmer, “A 3GHz Switching
DC-DC Converter Using Clock-Tree Charge-Recycling in 90nm CMOS with Integrated
Output Filter,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 532-533, 2007.
[12] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford,
“Energy Recovery from High-frequency Clocks using DC-DC Converters,” Proc. IEEE Int.
Symposium on Very Large Scale Integration, pp. 162-167, 2008.
[13] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A
4GHz Non-resonant Clock Driver with Power-grid Energy Return,” submitted to IEEE
Trans. Circuits and Systems II, 2008.
[14] S. C. O. Mathuna, T. O'Donnell, W. Ningning, and K. Rinne, “Magnetics on Silicon: an
Enabling Technology for Power Supply on Chip,” IEEE Trans. Power Electronics, vol. 20,
no. 3, pp. 585-592, May 2005.
[15] C. Yue and S. Wong, “On-chip Spiral Inductors with Patterned Ground Shields for Si-based
RF ICs,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 743-752, 1998.
[16] J. N. Burghartz, “Progress in RF inductors on silicon-understanding substrate losses,” IEEE
Int. Electron Devices Meeting Dig. Tech. Papers, pp. 523-526, 1998.
[17] J. N. Burghartz, D. C. Edelstein, M. Soyuer, H. A. Ainspan, and K. A. Jenkins, “RF circuit
design aspects of spiral inductors on silicon,” IEEE J. Solid-State Circuits, vol. 33, no. 12,
pp. 2028-2034, Dec. 1998.
[18] J. Gil and S. Hyungcheol, “A Simple Wide-band On-chip Inductor Model for Silicon-based
RF ICs,” IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 9, pp. 2023-2028,
Sep. 2003.
[19] A. M. Niknejad and R. G. Meyer, “Analysis, Design, and Optimization of Spiral Inductors
and Transformers for Si RF IC’s,” IEEE J. Solid-State Circuits, vol. 33, no. 10, pp. 1470-1481, Oct. 1998.
[20] G. Villar, E. Alarcon, F. Guinjoan, and A. Poveda, “Optimized Design of MOS Capacitors
in Standard CMOS Technology and Evaluation of their Equivalent Series Resistance for
Power Applications,” Proc. IEEE Int. Symposium Circuits and Systems, pp. 451-454, 2003.
[21] M. Alimadadi, “Recycling Clock Network Energy in High-performance Digital Designs
using On-chip DC-DC Converters,” Ph.D. Thesis, Dept. of Electrical and Computer
Engineering, The University of British Columbia, 2008.
[22] G. Schrom, P. Hazucha, J. Hahn, D. S. Gardner, B. A. Bloechel, G. Dermer, S. G. Narendra,
T. Karnik, and V. De, “A 480-MHz, Multi-phase Interleaved Buck DC-DC Converter with
Hysteretic Control,” Proc. IEEE Power Electronics Specialists Conf., pp. 4702-4707, 2004.
[23] P. Hazucha, G. Schrom, J. Hahn, B. A. Bloechel, P. Hack, G. E. Dermer, S. Narendra, D.
Gardner, T. Karnik, V. De, and S. Borkar, “A 233MHz 80%-87% Efficient Four-phase DC-DC Converter Utilizing Air-core Inductors on Package,” IEEE J. Solid-State Circuits, vol.
40, no. 4, pp. 838-845, Apr. 2005.
[24] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A
660MHz ZVS DC-DC Converter Using Gate-driver Charge-recycling in 0.18µm CMOS with
an Integrated Output Filter,” Proc. IEEE Power Electronics Specialists Conf., 2008.
[25] M. D. Mulligan, B. Broach, and T. H. Lee, “A 3MHz Low-voltage Buck Converter with
Improved Light Load Efficiency,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp.
528-529, 2007.
[26] J. Wibben and R. Harjani, “A High-efficiency DC-DC Converter Using 2nH Integrated
Inductors,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 844-854, 2008.
Figure 1. Recycling clock energy with a DC-DC converter
(a) Raw efficiency
(b) Effective efficiency
Figure 2. Power dissipation block diagram
(a) A typical buck converter
(b) Simplified circuit diagram
(c) Idealized timing diagrams
Figure 3. Integrated clock driver/buck converter
Figure 4. Reference clock driver for the integrated clock driver/buck converter
Figure 5. Integrated clock driver/buck converter
Figure 6. Simplified circuit model for analyzing Vclk during Tfall
(a) A typical boost converter
(b) Simple clock driver/boost converter
(c) A typical buck-boost converter
(d) Simple clock driver/buck-boost converter
Figure 7. Integrated clock driver converter circuits
Figure 8. Circuit diagram of the integrated clock driver/boost converter
Figure 9. Circuit diagram of the integrated clock driver/buck-boost converter
[Plots omitted. (a) Voltage waveforms: Vclk, Vclk-ref and Vload, Voltage (V) vs. Time (nSec). (b) Current waveforms: Lf, Mn and Mp, Current (A) vs. Time (nSec).]
Figure 10. Simulated waveforms of points of interest for the integrated clock driver/buck converter
[Plots omitted. (a) Output voltage vs. duty cycle: Vout (V) vs. Duty Ratio (%), curves for Iout=30, 50, 70 and 100. (b) Effective efficiency vs. output current: Effective Efficiency (%) vs. Iout (mA), curves for D=30%, 40%, 50%, 60% and 70%.]
Figure 11. Simulation results of the integrated clock driver/buck converter
[Plots omitted. (a) Output voltage vs. duty cycle: Vout (V) vs. Duty Ratio (%), curves for Iout=10mA, 30mA, 50mA, 70mA and 100mA. (b) Effective efficiency vs. output current: Effective Efficiency (%) vs. Iout (mA), curves for D=40%, 50%, 60%, 70% and 80%.]
Figure 12. Simulation results of the integrated clock driver/boost converter
[Plots omitted. (a) Output voltage vs. duty cycle: Vout (V) vs. Duty Ratio (%), curves for Iout=10mA, 30mA, 50mA, 70mA and 90mA. (b) Effective efficiency vs. output current: Effective Efficiency (%) vs. Iout (mA), curves for D=20%, 30%, 40%, 50%, 60% and 70%.]
Figure 13. Simulation results of the integrated clock driver/buck-boost converter
[Plot and schematic omitted. (a) Lseries, Rseries and Q vs. Fsw: Ls (pH), Rs (mOhm) and Q x 10 vs. Fsw (GHz). (b) Simplified π model.]
Fsw = 3GHz, Lseries = 320pH, Rseries = 260mΩ, Cs1 ≅ Cs2 ≅ 140fF, Rs1 ≅ Rs2 ≅ 280Ω, Q = 20
Figure 14. Modeling of the on-chip inductor for the buck converter chip
(a) Chip block diagram
(b) Chip micrograph
Figure 15. Implementation of the integrated clock driver/buck converter
[Plots omitted. (a) Output voltage: Vout (V) vs. Iout (mA). (b) Input power: Pin1 and Pin2, Power (mW) vs. Iout (mA). Curves in both panels: 3GHz, 2.5GHz, Fsw Sweep (D=50%).]
Figure 16. Measured results indicating the effect of Fsw on (a) Vout and (b) Pin1 and Pin2
[Plots omitted. (a) Raw efficiency: Raw Efficiency (%) vs. Iout (mA). (b) Effective efficiency: Effective Efficiency (%) vs. Iout (mA). Curves in both panels: 3GHz, 2.5GHz, Fsw Sweep (D=50%).]
Figure 17. Measured results showing the effect of Fsw on η and ηeff
Table 1. Summary of performance comparison between integrated switching converters
                                  Previous Work          Previous Work            This Work
Converter type                    4-Phase Buck [22]      2-Phase Buck [10]        Buck
Technology                        90nm CMOS              0.18µm SiGe RF BiCMOS    90nm CMOS
Layout Area (mm²)                 0.14 * (excludes L)    27                       0.27
Switching frequency, Fsw (MHz)    480                    45                       3000
Inductor, LF (pH)                 3600 (per phase)       11000 (per phase)        320
Capacitor, CF (pF)                2500                   6000                     350
Supply Voltage, Vin (V)           1.8                    2.8                      1.0
Output Voltage, Vout (V)          0.9                    1.5 ~ 2                  0.53 ~ 0.75
Output Voltage Ripple             -                      -                        < 5%
Output Current, Iout (mA)         500                    200                      40 ~ 100
Effective Efficiency, ηeff (%)    72                     65                       184 (Vout=0.75V)
                                                                                  102 (Vout=0.63V)
                                                                                  74 (Vout=0.53V)
* Layout area was reported in [23]