* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download June 2008 DRAFT - submitted to JSSC for review.
Immunity-aware programming wikipedia , lookup
Flip-flop (electronics) wikipedia , lookup
Television standards conversion wikipedia , lookup
Operational amplifier wikipedia , lookup
Schmitt trigger wikipedia , lookup
Coupon-eligible converter box wikipedia , lookup
Analog-to-digital converter wikipedia , lookup
Resistive opto-isolator wikipedia , lookup
Transistor–transistor logic wikipedia , lookup
Radio transmitter design wikipedia , lookup
Power MOSFET wikipedia , lookup
Phase-locked loop wikipedia , lookup
Valve RF amplifier wikipedia , lookup
Surge protector wikipedia , lookup
Index of electronics articles wikipedia , lookup
Integrated circuit wikipedia , lookup
Integrating ADC wikipedia , lookup
Current mirror wikipedia , lookup
Opto-isolator wikipedia , lookup
Power electronics wikipedia , lookup
Time-to-digital converter wikipedia , lookup
June 2008 DRAFT - submitted to JSSC for review. Energy Recycling from Multi-GHz Clocks using Fully Integrated Switching Converters M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, P. Palmer University of British Columbia, Vancouver, BC, Canada Abstract Large digital chips use a significant amount of energy to distribute a multi-GHz clock. By discharging the clock network to ground every cycle, the energy stored in this large capacitor is wasted. Instead, this energy can be recycled to another part of the chip using an on-chip DC-DC converter. This paper investigates energy recycling by integrating three DC-DC converter topologies, namely buck, boost and buck-boost, with a high-speed clock driver. The high operating frequency significantly shrinks the required size of the L and C components so they can be placed on-chip; typical converters place them off-chip. The clock driver and DC-DC converter are able to share the entire tapered buffer chain, including the widest drive transistors in the final stage. The clock duty cycle must be modulated to achieve voltage regulation, implying only single-edge-triggered flops should be used. However, this minor drawback is eclipsed by the benefits: by recovering energy from the clock, the output power can actually exceed the incremental input power needed to operate the converter circuitry, resulting in an effective efficiency greater than 100%. Circuit operation is validated using a 90nm prototype chip. Finally, the converter output enables additional power-saving features such as low-voltage islands or body bias voltages. Keywords Switching DC-DC converter, integrated DC-DC converter, integrated output filter, charge recycling, energy recovery, energy recycling, integrated clock/converter, multi-GHz clock. 1. Introduction The rapid increase in energy consumption of large digital circuits has been predominantly due to an increase in total gate capacitance and an increase in operating frequency [1]. As a result, a large fraction of the total energy budget is used by the high-frequency clock network. While the clock energy is obviously proportional to frequency, additional energy is also expended in modern designs to reduce clock skew and to drive increased capacitance caused by the more closely spaced wires and thinner gate oxides. Most of this energy is consumed in the final drive stage where a large load capacitance is charged and discharged every cycle [2]. For example, the 5+ GHZ IBM POWER6 processor clock network consumes 22W at 1V in a 341mm2 die [3], suggesting it can be roughly modeled by an equivalent capacitance of 13pF/mm2. There are several methods used to reduce clock energy, such as gating the clock, lowswing signals, double-edge triggered flip-flops, adiabatic switching [4], and resonant clocking [5][6]. All of these previous techniques have attempted to reduce the power consumption of operating the clock, but have various issues. For example, resonant clocking shows promise for up to 80% energy reduction in clock distribution, but it cannot be applied at the end loads where most of the power is dissipated. The reason is that the sinusoidal waveform produced by resonance does not have steep edges right at the critical switching time, which is needed for all the flops to agree to the same, precise switching moment. This paper investigates a new method for reducing clock energy, summarized in Figure 1, where the clock energy is not reduced directly but is instead recovered using a DC-DC converter and redeployed to another circuit in a regulated fashion. This redeployment reduces the total current draw from the primary supply. We call this concept energy recycling [7]. 1 One of the main advantages of energy recycling is the efficient generation of an on-chip voltage supply which differs from the level offered by the primary supply. Since the on-chip DCDC converter is small, many can be deployed across the chip to produce independent, regional power supplies. This allows for several different regulated voltages to be on-chip at the same time, all powered from a single primary supply. Various power-saving techniques such as mixedvoltage islands and adaptive body biasing (ABB) [8] can utilize these additional supply voltages. An on-chip DC-DC converter can power these schemes without the need for external pins, external components, or board design effort. Another advantage of on-chip converters is the ability to respond quickly to dynamic load conditions in many-core processors, a key requirement for achieving the savings promised by dynamic voltage and frequency scaling (DVFS) [9]. The main drawback of fully integrated switching converters is their low efficiency. Highquality discrete converters operate at above 80% efficiency. It can be difficult to reach this level without off-chip components because a low switching frequency (hence, large LC components) is often used to reduce switching loss. For example, the converter described in [10] uses 27mm2 at 45MHz to reach an efficiency of 65%. The DC-DC converters described in this paper operate at nearly 100× higher frequencies to shrink LC area by 99%, but switching losses do increase. To mitigate the switching loss, energy stored in the clock capacitance is recycled. This extra energy is available for free, since it would normally be wasted every cycle when the clock node is discharged. As a result, the power needed to operate the DC-DC converter is greatly reduced. To keep things simple, the new converters in this paper are presented in an ‘open loop’ mode with no voltage regulation capability. A more complete design, as presented in [10], would include a feedback controller to regulate voltage by modulating the clock pulse width. However, 2 the operation of such a controller is not under investigation here. Also, in all designs, the power converters target an output voltage ripple of less than 5% peak-to-peak. The remainder of this paper is organized as follows. In the next section, we describe the compatibility of merging a clock driver with a power converter and the resulting impact on power conversion efficiency. Sections 3 to 5 present a clock driver integrated with buck, boost, and buck-boost power converter topologies, respectively. Section 6 gives simulation results, while Section 7 discusses layout implementation issues and Section 8 presents results from a working proof-of-concept buck-converter prototype manufactured in 90nm CMOS. Section 9 concludes this paper. 2. Similarity of Clock Driver and Power Converter Circuits Integrating a clock driver with a DC-DC converter merges several compatible concepts. First, using the same high switching frequency (e.g., 3GHz) for the clock and converter reduces the size of on-chip inductor and capacitor needed by the converter. Second, the final clock drivers and the DC-DC power transistors are both very wide to improve switching time of the clock and reduce output losses of the converter. These large, low-impedance transistors need to be driven by a tapered inverter chain to keep up with the high frequency. Third, the power used by this chain should be minimized in both cases. Fourth, high-efficiency DC-DC converters employ an energy-saving concept known as zero-voltage switching (ZVS), where a power transistor is turned on only after a “dead-time” delay when the source-drain voltage reaches 0V. During the dead time, energy stored in the drain node capacitance is removed by the inductor and delivered to the load instead of being wasted. Applying this ZVS technique to the large clock capacitance is the key to energy recycling. Since the inductor is not used in a resonant fashion, clock edges 3 are kept fast. Lastly, many DC-DC converters use pulse width modulation (with fast edge rates) for output regulation, a scheme compatible with single-edge triggered clocking. Figure 2 shows how combining the clock driver circuit with the power converter circuit helps to increase the overall efficiency. The integrated clock driver/power converter in Figure 2(a) receives Pin1 input power and provides Pout output power, resulting in a raw efficiency of η = Pout /Pin1 < 1. However, part of Pin1 is required to operate the clock network. If a dedicated clock driver was constructed, this power consumption would be Pin2. Hence, an integrated clock driver/power converter circuit uses Pin1 − Pin2 additional power to operate the power converter portion and recycle energy from the clock driver. As shown in Figure 2(b), if the clock driver power (and circuitry) was removed from the integrated design and replaced with a standalone converter, it would need to provide Pout using just the incremental power, Pin1 − Pin2. Thus, recycling the clock power increases the overall efficiency. To compare the dual-purpose circuit with a traditional stand-alone power converter, we use a new metric called effective efficiency (ηeff), which is defined in Equation (1) as the output power divided by the incremental input power needed to operate the converter. Effective efficiency captures how efficient a traditional stand-alone converter needs to be to supply the same output power using just the incremental input power used by the dual-purpose circuit. ηeff = Pout × 100 . Pin1 − Pin 2 (1) Due to the poor efficiency typically associated with very high switching rates, we consider an effective efficiency greater than 50% to be good for an on-chip converter.1 However, in some cases, circuits in this paper provide effective efficiency greater than 100%. This is not 1 The threshold value of 50% effective efficiency was chosen to match the raw efficiency of the integrated clock driver/buck converter chip presented in this paper. Achieving anything better than the raw efficiency should be considered good. 4 evidence of the perpetual motion fallacy; it is proof that otherwise wasted energy is being recycled from the clock tree and delivered to the converter output. 3. Integrated Clock Driver/Buck Converter This section describes the operation of a buck converter and how it can be integrated with a traditional tapered clock driver chain. This combined circuit is used to drive a large clock capacitance and output a voltage lower than VDD which can be used as a supply for low-voltage islands to further reduce energy consumption. This circuit was initially presented in [11], but is described here in greater detail. 3.1. Simplified Circuit A buck converter operates by averaging a square-wave signal through a low-pass filter as shown in Figure 3(a). The output voltage of an ideal converter is the average or DC value of its input, D×VDD, which implies that it is a function of the magnitude and duty cycle of the input, but not the frequency. A simple integrated clock driver/buck converter circuit is shown in Figure 3(b) where a chain of cascaded inverters (not shown) drives node Vclk-in. Capacitance Cclk is the overall capacitance at the clock node and includes all transistor and wiring capacitances at this node. The operation of this circuit is summarized by the idealized timing diagram in Figure 3(c), where D, Tsw, and Tdelay represent clock duty cycle, switching (clock) period, and ZVS dead-time, respectively. There are three modes of operation: • Mode 1 (time 0 to D×Tsw) is intended to drive the load and charge Cclk through Mp. Inductor current increases linearly since the voltage across it is constant. • Mode 2 (time D×Tsw to D×Tsw+Tdelay) is intended for energy recycling. During this time, both Mn and Mp are off and the charge stored in Cclk is moved to the output circuit 5 through the inductor, as the inductor current can not be disrupted abruptly. This results in a rapid drop of Vclk, which is intended. In this short period of time, the inductor current can be assumed somewhat constant. It is worth mentioning that if no delay is present, Cclk would be discharged to ground at time D×Tsw through Mn, wasting the energy. • Mode 3 (time D×Tsw+Tdelay to Tsw) starts when the voltage across Mn is close to zero. At this time Mn is turned on to provide a low-resistance path for the inductor current. At this point, no further energy is supplied to the inductor and the voltage across it is constant, so inductor current decreases linearly. ZVS operation occurs when Mn is turned on while its source-drain voltage is close to zero, thereby reducing dynamic power loss. Theoretically, in Mode 3, when the falling inductor current crosses zero, Mn could be turned off to allow charging Cclk with the negative inductor current. Then, at the beginning of the next switching cycle, Mp would be turned on with 0V across it (i.e., ZVS operation for Mp). In practice, this increases the output voltage ripple, as CF must provide the required charge for the large Cclk. In this design, no ZVS operation is implemented for Mp. 3.2. Complete Circuit Measuring the performance of the new circuit requires constructing a reference clock driver circuit as well as the integrated clock driver/buck converter itself. These two circuits are shown in Figures 4 and 5, respectively. The reference clock driver has the same transistor sizes and load as the integrated design to facilitate experimental measurements with a fabricated prototype. In both circuits, PMOS transistors are three times wider than NMOS transistors, except for the last inverter stage in which the PMOS is four times wider to reduce the voltage drop across Mp while Vclk is high and the current is building up in the inductor LF (Mode 1). A tapering factor equivalent to a fan-out of four is used for the inverter chain, which minimizes clock 6 latency from the source. To reduce front-end energy and improve overall conversion efficiency, this tapering factor could be increased. NMOS transistor gate capacitance is used to implement the converter filter capacitor, CF, while the gate capacitance of a large dummy inverter is used to represent the parasitic and load capacitance at the clock node, Cclk. The value of Cclk is estimated to be 12pF, roughly equivalent to a 1mm2 region of the IBM POWER6 processor. To control the exact on/off timing of Mn and Mp shown in Figure 5, the inverter driving those transistors is replaced with two separate inverters, with the same total transistor sizes as the original single driver. To implement a delay for the ZVS dead-time, the gate of M1 is connected to Vclk instead of being connected to the gate of M2. Therefore, compared to Vp, the rising edge of Vn is delayed by Tdelay, a duration which depends on how quickly LF drains Cclk and how fast M1 turns on to raise Vn. A drop in Vclk will result in M1 and then Mn to turn on and consequently Vclk is dropped faster. Since the gate of M2 is connected to Vm, no falling edge delay is observed for Vn. To prevent M1 and M2 from being on concurrently at the rising edge of Vm, the source of M1 is connected to Vp instead of VDD and uses its charge. Therefore, Vn falls at the falling edge of Vp. The duty cycle observed at Vclk, which determines the output voltage, is influenced by how quickly Cclk is drained by the load. The output voltage is given by Vout = Deff ×Vin where Here, Tfall is the fall time of Vclk 1 Tdelay − T fall (2) Deff = D + ⋅ . 2 Tsw if there is no ZVS delay (reference clock driver circuit), while Tdelay is the fall time of Vclk in the presence of ZVS (integrated driver/converter circuit). Equation (2) suggests that if Tdelay is equal to Tfall, duty cycle remains unchanged. Any Tdelay larger than Tfall increases the effective duty cycle accordingly. Tdelay can be calculated using the simplified circuit model given in Figure 6. At the time t = 0 when Mp turns off, 7 Vclk (0 ) = VDD − I Lmax ⋅ Ron− PMOS . During clock fall off, ILmax can be assumed to be constant, therefore, Vclk (t ) = Vclk (0 ) − 1 ⋅ I Lmax ⋅ t . The time that takes for Vclk to reach zero is Cclk V (3) Tdelay = Cclk DD − Ron−PMOS . I Lmax Timing uncertainty is an important issue in a clock distribution network. In Figure 5, as Cclk is charged and discharged through non-similar circuit routes, rising/falling edges of the Vclk are not similar. The most significant concern is rise- and fall-time dependence on a dynamic load. When the clock is rising, different load currents result in different voltage drops across the Mp on-resistance, Ron-pmos. As a result, the rising edge of the clock is slightly modified based on the load current. For the falling edge, the problem is more severe, as the load current solely determines the falling slope of the clock signal before the ZVS delay circuit is triggered. It is also worth noting that no oscillation happens at the Vclk node: as Vclk approaches 0V, Mn turns on and prevents the oscillation by keeping Vclk at 0V until the next Mp turn-on phase. 4. Integrated Clock Driver/Boost Converter The buck converter just described cannot output a voltage close to VDD because the PWM clock signal must remain low for a significant time period. Instead, a boost converter can be used to provide increased output voltage range. Originally presented in [12], this paper provides additional circuit design and implementation details. This boost converter design also led to the development of a new low-power clock driver [13]. 4.1. Simplified Circuit In the typical boost converter of Figure 7(a), when the switch is closed, voltage Vin is across the inductor LF and it builds up current. In the next phase, when the switch is open, inductor current finds its way through the diode and charges the output capacitor. 8 A simple integrated clock driver/boost converter circuit is shown in Figure 7(b). It uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the PMOS transistor in place of the power diode. In addition to providing output voltage levels higher than VDD, the circuit also produces a buffered version of the clock, Vclk_scaled, at the same magnitude as Vout. This clock signal can be used in the circuitry powered by the converter, but allowances for clock skew and level-conversion will need to be made in the data path logic. There are two modes of operation: Mode 1, where Vclk goes high and Mn turns on to build up inductor current, and Mode 2 where inductor current finds its way through Mp to charge the output capacitor CF. The output voltage of this boost converter would be Vout = 1 D × VDD , × Vin = 1− D 1− D where Vin = mean (Vclk ) = D × VDD . Ideally when D > 50%, the output voltage will be higher than VDD. 4.2. Complete Circuit The complete integrated clock driver/boost converter circuit is shown in Figure 8. Mn2 and Mp2 introduce the turn-on delay to implement ZVS for Mn1 as in the buck circuit. The drain node of Mn3 and Mp3, denoted as Vclk_scaled, swings from zero to Vout. The value of the scaled clock capacitor is assumed to be 2.2pF. Some of the recovered energy is subsequently lost when this capacitor is discharged, so it should be kept small. In Figure 8, the gating signal for Mn3 changes from VDD to zero. However, the appropriate gating signal for Mp3 should instead change from Vout to Vout – VDD. The combination of diodes Dshift, capacitor Cshift, and transistors Mn1 and Mp1 perform as a switched-capacitor voltage shifter. Except for Dshift and Cshift, all transistor body terminals are connected to their source pins. The body terminals of Dshift and Cshift are connected to ground instead. This prevents forward 9 biasing of the body-drain intrinsic diode, in case drain voltage goes lower than the source voltage. Also, this makes the layout implementation easier as well, since no deep n-well structure is required. Finally, a 1kΩ resistor is added in parallel to Cshift to bias the Dshift diodes and provide a DC current path to avoid floating nodes when the Dshift is off. 5. Integrated Clock Driver/Buck-Boost Converter This section integrates a buck-boost converter with a clock driver to produce a negative output voltage. Originally presented in [12], this paper provides circuit implementation details. 5.1. Simplified Circuit In the typical buck-boost converter of Figure 7(c), when the switch is on, voltage Vin will be across the inductor LF and current will build up in the inductor. In the next phase, when the switch is off, inductor current goes through the diode and charges the output capacitor. The diode prevents shorting Vout to VDD when the switch is on. A simple integrated clock driver/buck-boost converter circuit is shown in Figure 7(d). It uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the NMOS used in place of the power diode. An extra switch Sclk is also added between nodes Vclk and Vinv. This switch prevents Vclk from becoming negative as Vinv goes below zero when Mn is on. Similar to the boost converter, the buck-boost converter also has two modes of operation. In Mode 1, current builds up in the inductor, while in Mode 2 the inductor current absorbs the energy in Cclk then delivers its stored energy to CF. The output voltage of this buck-boost converter is calculated by Vout = −D − D2 × Vin = × VDD , which is negative. 1− D 1− D 5.2. Complete Circuit The complete integrated clock driver/buck-boost converter circuit is shown in Figure 9. Many of 10 the changes are similar in nature to those used to implement the boost circuit, i.e., the addition of Mp3 and Mn3 to delay the energy-wasting discharge of Cclk. Also, in Figure 9, the gating signal for Mp1 changes from zero to VDD but Mn1 needs a voltage-shifted value produced by the combination of diodes Dshift, capacitor Cshift, and transistors Mn3 and Mp3. There are three design decisions in Figure 9 that warrant further discussion. First, transistors Mp2 and Mn2 are added to protect Mp1 and Mn1 from potentially large voltage drops across them. Second, transistor Mp4 acts as the switch to prevent Vclk from going negative. The gate of Mp4 is connected to Vbias, which is set at the threshold voltage of PMOS transistor Mp5. Third, the body of all NMOS transistors needs to be connected to their sources or the most negative voltage in the system to prevent forward biasing of body-source intrinsic diodes. 6. Simulation Results All three integrated clock-driver/switching converters were designed and simulated in 90nm CMOS technology. In the buck design, all transistors are standard-Vt (standard-threshold) type, but the other two use only low-Vt type transistors to facilitate operation at lower VDD levels. To provide greater opportunity for energy recycling, the boost and buck-boost designs use a larger Cclk estimated to be 25pF. 6.1. Buck Simulation Simulated waveforms for the integrated buck converter are shown in Figure 10. The circuit is simulated with a 50% duty cycle and 70mA load current. The inductor current shown as Lf in Figure 10(b) exhibits a triangular shape as expected, with minimum and maximum values of around -50mA and 190mA, respectively. In the first half cycle of the clock, Mp source current provides the energy to charge up Cclk as well as LF. Because of the high current, there is a voltage drop of ~0.1V across Mp as suggested by the droop of Vclk to ~0.9V in Figure 10(a). In this figure, 11 the reference clock circuit output is shown as Vclk-ref. Both clocks have similar edge slopes. In the second half cycle of the clock, inductor current discharges Cclk. As can be seen in Figure 10(b), Mn source current is always positive, which means that all the charge in Cclk is delivered to the load instead of the ground. The simulated output voltage and effective efficiency of the buck converter at different duty cycles and output currents are given in Figure 11. The output voltage increases as D is increased and, at the same time, the effective efficiency decreases because more of the output power comes from conduction via Mp1 rather than energy recycling of Cclk. For example, at 70mA output current, by varying the duty cycle from 30% to 70%, the effective efficiency ranges from 286% down to 135%. For the reference circuit (the clock driver alone), simulations determined its power consumption, Pin2, is 41mW. 6.2. Boost and Buck-boost Simulations Figures 12 and 13 show the output voltage and the effective efficiency of the integrated boost converter and buck-boost converter, respectively The output voltage increases in magnitude as D is increased and, at the same time, the effective efficiency decreases. For the boost converter, a maximum effective efficiency of 111% is achieved at D = 40% with Iout = 30mA. As mentioned earlier, achieving an effective efficiency above 100% is definitive proof that “free energy” is being recovered from the clock tree. For the buck-boost converter, effective efficiency reaches a maximum of 66% at D = 20% with Iout = 50mA. The lower efficiency is a result of more transistors in the main current path. For the reference circuit, simulations determined its power consumption, Pin2, was 100mW. This is higher than the buck circuit, primarily because of a larger Cclk. 12 7. Prototype Implementation Layout design and fabrication for all three designs are performed in 90nm CMOS. This section describes layout issues for the inductor, capacitor, and three converter circuits. 7.1. Implementation of the Inductor The use of magnetic materials to increase inductance [14] is desirable, but this work assumes only conventional CMOS processes and, consequently, coreless inductors. These integrated inductors induce eddy currents in the substrate, which can be mitigated by placing a metal patterned-ground shield (PGS) in between the inductor coil and the substrate [15] [16] [17] [18]. The inductor in buck design uses a single-turn octagon, placing metal layers 6 and 7 in parallel to reduce series resistance. To further reduce resistance, the inductor in the other two designs places metal layers 4 through 7 and one extra aluminum (ALUCAP) layer in parallel resulting in a slightly different estimated inductance. In all cases, the PGS is implemented in metal 1 to keep it as far as possible from the inductor. ASITIC [19] was used to extract the inductor characteristics shown in Figure 14(a) and the simplified π model of Figure 14(b) for the inductor used in the buck converter chip. The inductor layout area is 0.1mm2. 7.2. Implementation of the Bulk Capacitor In CMOS technology MOSFET gate capacitors have the highest capacitance density, but they are nonlinear [20]. This nonlinear behavior of gate capacitance is not significant in power converter applications such as this work because capacitance is used only for energy storage. An array of hundreds of NMOS devices in parallel is used to accommodate for the high capacitance needed. The internal resistance of a MOS gate capacitor, known as equivalent series resistance (ESR), is reduced by using a transistor W/L ratio of 10 [21]. A reduced ESR not only decreases power dissipation in the capacitor, but also lowers the voltage ripple across it. 13 7.3. Implementation of Three Complete Designs The block diagram and chip micrograph of the buck design are shown in Figure 15. The area of the buck design is 0.27mm2, and the total die area is 1mm2. A second chip, with a total die area of 2mm2, contains the boost and buck-boost designs as well as the circuit described in [13]. The boost design, including LF, is 0.26mm2. Due to increased layout effort, the buck-boost design requires only 0.2mm2. In comparison, the area of the reference clock driver is 0.03mm2. The layout of all three circuits is organized for probe station testing. Paths that carry high currents are wide, slotted, and use many parallel vias. We are unable to monitor the internal waveforms of the chip without being invasive, so waveform measurements are not available. 8. Prototype Measurement Results After verifying that the circuit simulated successfully, the buck converter was manufactured and tested first. For precise power measurement, all the parasitic resistances in the test setup of the buck circuit were accounted for through measurement and calibration. As a result, a supply voltage of 1.0V was applied at the chip probe pads. An external signal generator provides a clock signal to the chip under test with a 50% duty cycle.2 The converter output voltage vs. the output current is plotted in Figure 16(a). For each curve, the input duty cycle is kept constant at 50% and the switching frequency is changed. As expected, the output voltage does not vary much with frequency. The output voltage increases as output current decreases because of Deff and less resistive voltage drop in the circuit. Similarly, Figure 16(b) plots the input power vs. output current. The input power to the integrated clock driver/buck converter is plotted as Pin1, and the input power to the reference clock circuit as Pin2. The input power increases with frequency due to switching activity. 14 Raw and effective efficiency is plotted in Figure 17. Raw efficiency does not change much at different frequencies. Effective efficiency is higher at lower output currents. Since the available energy in Cclk is constant with respect to output current, at low outputs a greater proportion of the output energy comes from recycling. However, higher Fsw results in more energy being stored in the capacitor per second. Hence, ηeff benefits from increasing the frequency and lowering the output current. Achieving an effective efficiency above 100% is definitive proof that energy is being recovered from the clock. At 3GHz, Pin2 of the reference clock is measured to be 43mW. At the same time Pin1 is measured to be 58mW for an output voltage of 0.75V at 37mA load current (Pout1=27mW). Thus, η = 47% and ηeff = 180%. Table 1 provides a summary of performance comparison between this work and two other previously published buck converters. The output voltage ripple is part of the design specification. Among the published on-chip DC-DC converters, [22] has the highest reported switching frequency at 480MHz, but it uses on-package inductors.3 In contrast, [10] implemented a fully on-chip buck converter in 0.18µm SiGe RF BiCMOS technology that was 65% efficient. It also used an area of 27mm2 to fit the large passive components. The buck converter in this work achieves a much higher effective efficiency using only 1/100th of the area. After testing the buck converter successfully, the boost and buck-boost designs were manufactured together on a shared die and tested. Unfortunately, the boost and buck-boost circuit prototypes were not functional. For these circuits, measurements indicate an output voltage of a few hundreds of millivolts, which is lower than expected. There are several suspected issues, mostly arising from common elements of the layout or design, but we were unable to identify a 2 The circuit also behaves as expected at a 66% duty cycle [21]. Generator limits prevented testing other duty cycles. 3 The design later appeared in [23], switching at 233MHz to further boost converter efficiency. 15 specific cause. The most likely cause is a faulty voltage shifter circuit, leading to the inability to turn off the associated power transistor and unintentional draining of CF. 9. Conclusions This paper investigates energy recovery from a clock load in high-speed digital circuits by exploring the integration of three switching DC-DC converter topologies with a clock driver. These integrated driver/converter circuits recycle the energy (in the form of charge stored) from the clock capacitance. Several characteristics make integration of these two different circuits very promising: the multi-GHz system clock used for switching significantly reduces the size of the output filter components, making their integration feasible; the clock driver and power converter can share the tapered buffer chain, which can be optimized for low power; and energy wasted during clock discharge can be recovered using ZVS technique. Unlike resonant schemes which normally produce sinusoidal clock waveforms, simulations show this approach produces goodquality, quasi-square clocks. In addition, the power converter output enables further powersaving features to be employed such as low-voltage islands or body bias voltages. In this work, the buck converter has the simplest implementation. A proof-of-concept fully integrated clock driver/buck converter test chip was fabricated in a standard 90nm CMOS technology. Measurements indicate the chip operates with an effective efficiency up to 180%. This is able to exceed 100% by recycling the “free energy” that is available from the clock. Going forward, the circuits described in this work leave significant room for further optimization. For example, this paper focuses on using ZVS to recycle the back-end energy of the final clock driver where most of the energy is lost. However, it is possible to lower switching losses even further with front-end energy recycling [24] or gate charge recycling [25]. Alternatively, [26] suggests a way to shrink passive components by employing two inductors to 16 cancel output ripple. Future work can improve upon the converters in this paper by improving their design and combining them with other recent advances. In addition, there are a number of concerns that still need to be addressed before this work can be made practical. First, clock jitter may be increased as a result of the pulse-width modulation required for voltage regulation. As well, there may be issues with stopping the clock, such as possible oscillations due to the LC elements. Integration with the layout of a real highspeed microprocessor, or a reasonable proxy, is needed to verify practical layout issues with the power grid and low-skew clock distribution network. Also, the new clock waveform is only suitable for edge-triggered digital flip-flops, whereas high-performance microprocessors frequently employ transparent latches. Future work is needed to address these concerns. Acknowledgements The authors would like to thank several people who have provided feedback for this work, including Daryl van Vorst and Professors David Pulfrey, Resve Saleh, and K.C. Smith. We gratefully acknowledge the contributions of CMC Microsystems for providing CAD tools and chip fabrication resources, as well as funding from the Natural Sciences and Engineering Research Council of Canada (NSERC). References [1] E. G. Friedman, “Clock Distribution Networks in Synchronous Digital Integrated Circuits,” Proc. of the IEEE, vol. 89, no. 5, pp. 665-689, May 2001. [2] N. Ranganathan and N. Jouppi, “Evaluating the Potential of Future On-Chip Clock Distribution using Optical Interconnects,” Hewlett-Packard Company Labs Technical Report HPL-2007-163, 2007. 17 [3] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y. Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti, “Design of the Power6 Microprocessor,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 96-97, 2007. [4] M. Stan and W. Burleson, “Low-power CMOS Clock Drivers,” Proc. ACM/IEEE Int. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pp. 149156, 1995. [5] S. C. Chan, K. L. Shepard, and P. J. Restle, “Uniform-Phase Uniform-Amplitude ResonantLoad Global Clock Distributions,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 102-109, Jan. 2005. [6] S. C. Chan, K. L. Shepard, and P. J. Restle, “Distributed Differential Oscillators for Global Clock Networks,” IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2083-2094, Sept. 2006. [7] G. Lemieux, M. Alimadadi, S. Sheikhaei, S. Mirabbasi, and P. Palmer “SoC Energy Savings = Reduce + Reuse + Recycle: A Case Study Using a 660MHz DC-DC Converter with Integrated Output Filter,” Proc. IEEE Canadian Conf. on Electrical and Computer Engineering, pp. 947-950, 2008. [8] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, “Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1396-1402, Nov. 2002. [9] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, “System Level Analysis of Fast, per-core DVFS using On-chip Switching Regulators,” IEEE Int. Symposium on High Performance Computer Architecture, pp. 123-134, 2008. [10] S. Abedinpour, B. Bakkaloglu, and S. Kiaei, “A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 356-357, 2006. [11] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, and P. Palmer, “A 3GHz Switching DC-DC Converter Using Clock-Tree Charge-Recycling in 90nm CMOS with Integrated Output Filter,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 532-533, 2007. [12] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “Energy Recovery from High-frequency Clocks using DC-DC Converters,” Proc. IEEE Int. Symposium on Very Large Scale Integration, pp. 162-167, 2008. [13] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A 4GHz Non-resonant Clock Driver with Power-grid Energy Return,” submitted to IEEE Trans. Circuits and Systems II, 2008. [14] S. C. O. Mathuna, T. O'Donnell, W. Ningning, and K. Rinne, “Magnetics on Silicon: an Enabling Technology for Power Supply on Chip,” IEEE Trans. Power Electronics, vol. 20, no. 3, pp. 585-592, May 2005. [15] C. Yue and S. Wong, “On-chip Spiral Inductors with Patterned Ground Shields for Si-based RF ICs,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 743-752, 1998. 18 [16] J. N. Burghartz, “Progress in RF inductors on silicon-understanding substrate losses,” IEEE Int. Electron Devices Meeting Dig. Tech. Papers, pp. 523-526, 1998. [17] J. N. Burghartz, D. C. Edelstein, M. Soyuer, H. A. Ainspan, and K. A. Jenkins, “RF circuit design aspects of spiral inductors on silicon,” IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 2028-2034, Dec. 1998. [18] J. Gil and S. Hyungcheol, “A Simple Wide-band On-chip Inductor Model for Silicon-based RF ICs,” IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 9, pp. 2023-2028, Sep. 2003. [19] A. M. Niknejad and R. G. Meyer, “Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC’s,” IEEE J. Solid-State Circuits, vol. 33, no. 10, pp. 14701481, Oct. 1998. [20] G. Villar, E. Alarcon, F. Guinjoan, and A. Poveda, “Optimized Design of MOS Capacitors in Standard CMOS Technology and Evaluation of their Equivalent Series Resistance for Power Applications,” Proc. IEEE Int. Symposium Circuits and Systems, pp. 451-454, 2003. [21] M. Alimadadi, “Recycling Clock Network Energy in High-performance Digital Designs using On-chip DC-DC Converters”, Ph.D. Thesis, Dept. of Electrical and Computer Engineering, The University of British Columbia, 2008. [22] G. Schrom, P. Hazucha, J. Hahn, D. S. Gardner, B. A. Bloechel, G. Dermer, S. G. Narendra, T. Karnik, and V. De, “ A 480-MHz, Multi-phase Interleaved Buck DC-DC Converter with Hysteretic Control,” Proc. IEEE Power Electronics Specialists Conf., pp. 4702-4707, 2004. [23] P. Hazucha, G. Schrom, J. Hahn, B. A. Bloechel, P. Hack, G. E. Dermer, S. Narendra, D. Gardner, T. Karnik, V. De, and S. Borkar, “A 233MHz 80%-87% Efficient Four-phase DCDC Converter Utilizing Air-core Inductors on Package,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 838-845, Apr. 2005. [24] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, “A 660MHz ZVS DC-DC Converter Using Gate-driver Charge-recycling in 0.18µm CMOS with an Integrated Output Filter,” Proc. IEEE Power Electronics Specialists Conf., 2008. [25] M. D. Mulligan, B. Broach, and T. H. Lee, “A 3MHz Low-voltage Buck Converter with Improved Light Load Efficiency,” IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 528-529, 2007. [26] J. Wibben and R. Harjani, “A High-efficiency DC-DC Converter Using 2nH Integrated Inductors,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 844-854, 2008. 19 Figure 1. Recycling clock energy with a DC-DC converter (a) Raw efficiency (b) Effective efficiency Figure 2. Power dissipation block diagram 20 (a) A typical buck converter (b) Simplified circuit diagram (c) Idealized timing diagrams Figure 3. Integrated clock driver/buck converter Figure 4. Reference clock driver for the integrated clock driver/buck converter 21 Figure 5. Integrated clock driver/buck converter Figure 6. Simplified circuit model for analyzing Vclk during Tfall 22 (a) A typical boost converter (c) A typical buck-boost converter (b) Simple clock driver/boost converter (d) Simple clock driver/buck-boost converter Figure 7. Integrated clock driver converter circuits 23 Figure 8. Circuit diagram of the integrated clock driver/boost converter Figure 9. Circuit diagram of the integrated clock driver/buck-boost converter 24 1.2 0.3 Vclk Vclk-ref Vload Lf Mn Mp 0.25 0.8 0.2 0.6 0.15 Current (A) Voltage (V) 1 0.4 0.2 0.1 0.05 0 0 0 0.2 0.4 0.6 0.8 1 -0.2 -0.05 0 0.2 0.4 -0.1 -0.4 0.6 0.8 1 Time (nSec) Time (nSec) (a) Voltage waveforms (b) Current waveforms Figure 10. Simulated waveforms of points of interest for the integrated clock driver/buck converter 300 1 Iout=30 Effective Efficiency (%) Iout=50 0.75 Iout=70 Vout (V) Iout=100 0.5 0.25 250 200 150 D=30% D=40% D=50% D=60% D=70% 100 50 0 0 10 20 30 40 50 60 70 80 40 50 Duty Ratio (%) 60 70 80 90 100 Iout (mA) (a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current Figure 11. Simulation results of the integrated clock driver/buck converter 2.5 1.5 Effective Efficiency (%) 2 Vout (V) 125 Iout=10mA Iout=30mA Iout=50mA Iout=70mA Iout=100mA 1 0.5 0 D=40% D=50% D=60% D=70% D=80% 100 75 50 25 0 30 40 50 60 Duty Ratio (%) 70 80 0 20 40 60 80 100 Iout (mA) (a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current Figure 12. Simulation results of the integrated clock driver/boost converter 25 100 0 Vout (V) -0.4 Effective Efficiency (%) Iout=10mA Iout=30mA Iout=50mA Iout=70mA Iout=90mA -0.8 -1.2 -1.6 D=20% D=30% D=40% D=50% D=60% D=70% 80 60 40 20 -2 0 10 20 30 40 50 60 70 0 Duty Ratio (%) 20 40 60 80 100 Iout (mA) (a) Output voltage vs. duty cycle (b) Effective efficiency vs. output current Figure 13. Simulation results of the integrated clock driver/buck-boost converter Ls (pH), Rs (mOhm), Q x 10 400 350 300 250 200 150 100 Ls (pH) Rs (mOhm) 50 Q x 10 0 0.1 1 10 Fsw (GHz) 100 Fsw = 3GHz, Lseries = 320pH, Rseries = 260mΩ, Cs1 ≅ Cs2 ≅ 140fF, Rs1 ≅ Rs2 ≅ 280Ω, Q = 20 (a) Lseries, Rseries and Q vs. Fsw (b) Simplified π model Figure 14. Modeling of the on-chip inductor for the buck converter chip (a) Chip block diagram (b) Chip micrograph Figure 15. Implementation of the integrated clock driver/buck converter 26 3GHz 2.5GHz Fsw Sweep (D=50%) 120 1 100 0.8 80 Power (mW) Vout (V) 1.2 3GHz 2.5GHz Fsw Sweep (D=50%) 0.6 0.4 0.2 Pin1 60 Pin2 40 20 0 0 30 40 50 60 70 80 90 100 110 30 40 50 60 Iout (mA) 70 80 90 100 110 Iout (mA) (a) Output voltage (b) Input power Figure 16. Measured results indicating the effect of Fsw on (a) Vout and (b) Pin1 and Pin2 3GHz 2.5GHz Fsw Sweep (D=50%) 240 Effective Efficiency (%) 120 100 Raw Efficiency (%) 3GHz 2.5GHz Fsw Sweep (D=50%) 80 60 40 20 200 160 120 80 40 0 0 30 40 50 60 70 80 90 100 110 30 40 50 60 70 80 90 100 110 Iout (mA) Iout (mA) (a) Raw efficiency (b) Effective efficiency Figure 17. Measured results showing the effect of Fsw on η and ηeff Table 1. Summary of performance comparison between integrated switching converters Converter type Technology 2 Layout Area (mm ) Switching frequency, Fsw (MHz) Inductor, LF (pH) Capacitor, CF (pF) Supply Voltage, Vin (V) Output Voltage, Vout (V) Output Voltage Ripple Output Current, Iout (mA) Effective Efficiency, ηeff (%) Previous Work 4-Phase Buck [22] 2-Phase Buck [10] 90nm 0.18µm SiGe CMOS RF BiCMOS 0.14 * 27 (excludes L) 480 45 3 600 11 000 (per phase) (per phase) 2 500 6 000 1.8 2.8 0.9 1.5 ~ 2 500 200 72 65 This Work Buck 90nm CMOS 0.27 3 000 320 350 1.0 0.53 ~ 0.75 < 5% 40 ~ 100 184 (Vout=0.75V) 102 (Vout=0.63V) 74 (Vout=0.53V) * Layout area was reported in [23] 27