Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 563 Design and Implementation of Differential Cascode Voltage Switch with Pass-Gate (DCVSPG) Logic for High-Performance Digital Systems Fang-shi Lai and Wei Hwang, Senior Member, IEEE Abstract— In this paper, a new high-speed circuit technique called differential cascode voltage switch with pass-gate (DCVSPG) logic tree is presented. The circuit technique is designed using a pass-gate logic tree in DCVSPG instead of the nMOS logic tree in the conventional DCVS circuit, which eliminates the floating node problem. By eliminating the floating node problem, the DCVSPG becomes a new type of ratioless circuit, and it also provides superior performance with less power dissipation and better silicon area tradeoff. The basic DCVSPG design technique, the methodology for optimization, and synthesis of the pass-gate logic tree are described. The standard cell library development by taking advantages of the dual-rail outputs of DCVSPG gates are also discussed. The performance comparisons with other existing pass-gate circuit techniques [complimentary pass-transistor logic (CPL), double pass-transistor logic (DPL), and swing restored pass-transistor logic (SRPL)] are presented. For more robust design, the DCVSPG with inverter buffers is also the best choice. A Viterbi macro design using the DCVSPG circuit technique is demonstrated. The process that the design is based upon is a 0.5-m CMOS technology with 0.25-m effective channel length and five layers of metal. This macro can run up to 500 MHz at the nominal process condition. In comparison with other existing dynamic circuit techniques, the results also clearly show that the dynamic DCVSPG has the superior power-delay performance. Index Terms—Complete logic family, CMOS digital integrated circuits, high-speed circuits, MOSFET logic devices. I. INTRODUCTION T HE dominant circuit design technique for current digital systems is static CMOS. This is mainly due to their robust design nature which can implement reliable circuits with excellent noise margin. However, the demand for high-performance digital systems requires continuously faster CMOS circuit speed. Dynamic circuits are proven to have better circuit performance [1]. Unfortunately, these dynamic design styles suffer from charge sharing, low noise margin, complexity of design, and difficulty in testing. Recently, several researchers have attempted to use pass-gate logic style to realizes static and high performance designs in different digital systems [2]–[4]. Pass-gate logics gain their speed over the traditional static CMOS design due to their high logic functionality and Manuscript received January 4, 1996; revised April 10, 1996. F. Lai is with the IBM Almaden Research Center, San Jose, CA 95120 USA. W. Hwang is with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA. Publisher Item Identifier S 0018-9200(97)02480-3. reduction in the number of pMOS transistors. However, the degradation of pull-up performance for the pass-gate design in the long circuit chain is the major obstacle for most designer to use. A proper termination of the long pass-gate chain in the gate or the insertion of a static inverter is a ultimate solution to realize the high-speed pass-gate design. Differential cascode voltage switch (DCVS) [5] is claimed to have advantages over the traditional static CMOS design in terms of circuit delay, layout area, logic flexibility, and power dissipation [6], [7]. DCVS also has an inherent selftesting property which can provide coverage for stuck-at and dynamic faults [8]. Actually, most of the published pass-gate high performance circuits [2]–[4], [9] are more or less derived from DCVS. The reduced size of the pMOS transistor network substantially saves the amount of silicon area. The inherent cross-coupled logic and complementary outputs also make DCVS a very attractive candidate for dynamic implementations. Standard domino logic [10] suffers from the fact that inverting logic gates cannot be implemented. DCVS provides complementary information and therefore overcomes the restriction. However, this DCVS is a ratio circuit and, as such, has some significant drawbacks and disadvantages in both static and dynamic mode of operations. This is mainly caused by the floating node which is generated by one of its logic tree legs. With this inherent nature, this DCVS tends to have relatively large current spikes and additional delay [11]. By replacing the logic evaluation tree with the pass-gate design, the floating node problem can be eliminated. This makes the differential cascode voltage switch with pass-gate tree (DCVSPG) becomes a ratioless circuit. It combines the cross-coupled DCVS nature which makes DCVSPG have full output swings signal and compact logic design style of passgate circuits. This leads DCVSPG to be suitable for high performance digital systems. In this paper, DCVSPG basic design technique will be presented. In Section II, the basic AND circuit will be compared with the standard DCVS circuit. The optimization procedure will be derived, and the synthesis of pass-gate logic by using the recursive Karnaugh map will be discussed. The performance comparisons with existing pass-gate circuits will be shown in Section III. A Viterbi decoder designed with DCVSPG and the performance comparisons with static and dynamic DCVS circuits are shown in Section IV. A conclusion will be presented in Section V. 0018–9200/97$10.00 1997 IEEE 564 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 (a) Fig. 2. ASTAP simulation results of DCVS with fixed nMOS device sizes. (b) Fig. 1. (a) Conventional DCVS AND circuit and (b) DCVSPG AND circuit. II. DIFFERENTIAL CASCODE VOLTAGE SWITCH WITH PASS-GATE LOGIC (DCVSPG) A. Basic Properties Fig. 1(a) shows the traditional DCVS AND circuit. In DCVS circuits, two cross-coupled pMOS transistors p1 and p2 form the circuit load. Below the pMOS load, there are four nMOS transistors n1, n2, n3, and n4 that form the n-channel logic evaluation tree. When the input signals and swing from low to high, transistors n1 and n2 turn ON. The node is then discharged to the ground. The node is floating at the transition period while the complementary input signals and swing from high to low. Both of the nMOS transistors turns n3 and n4 are OFF. The ground level on the node the cross-coupled pMOS transistor p2 ON. The output node will be charged high. This realizes the AND logic function. However, the floating phenomenon on the output node has an adverse effect on the DCVS circuit operation. If we assume the node is low and the node is high in the previous state, during the transition period, the pMOS transistor p1 is ON momentarily when both of the input signals and are swinging from low to high. This is due to the node being low at this moment. The transistors p1, n1, and n2 form a ratio circuit in which the node is going to be high or low solely depending on the ratio strength of pMOS transistor p1 and the series-connected nMOS n1 and n2 transistors. If p/n ratio sets at — or lower, the gate will switch—but will be very slow. The logic function of this circuit will fail. However, if we decrease the strength of pMOS transistor p1, the performance of the pull up will be substantially degraded, and power consumption will be increased. In Fig. 2, the Advanced Statistical Analysis Program (ASTAP) simulation results of this circuit are shown. In this simulation, we set the nMOS device widths constant. By varying the pMOS device width, we can see that the rise time is dramatically changed when we monitor the output node . In the small pMOS device width region, the slow pull-up performance is largely caused by the small pMOS strength. However, in the larger pMOS device width region, the degraded pull-up performance is mainly caused by the ratio circuit problem. The node discharges slowly with the high pMOS device strength. The DCVSPG AND circuit shown in Fig. 1(b) actually solves the floating node problem by replacing the n-channel evaluation tree with the pass-gate design. The cross-coupled pMOS device load is the same as in Fig. 1(a). With the same previous state, when both input signals and swing from low to high, the nMOS transistors n2 and n4 both turn is then discharged into ground when the ON. The node complementary signals and swing from high to low. However, the output node is charging up to the high state which prevents the ratio circuit problem as discussed before. This improves not only the increasing circuit performance, but also decreases the power consumption. This also makes DCVSPG a more robust design technique. In the DCVS circuit, a proper pMOS device width can be chosen to make the circuit function, but the technology process variations will make the pMOS device width adjustments very difficult. In the ASTAP simulation, results of DCVSPG with fixed nMOS device sizes are as shown in Fig. 3. It is confirmed that the DCVSPG design technique has the better performance. The rise times are constant throughout the whole pMOS device width variation period. The pull-up performances are almost one order of magnitude better than that of DCVS design. B. Optimization of DCVSPG Gates If we take the average value of the rise and fall times shown in Fig. 3 and replot them in Fig. 4, an optimum pMOS device width can be chosen in order to make better performance. This originates from the fact that the pull up performance is contributed both from the pMOS and nMOS devices in the DCVSPG circuit. So if we increase the pMOS device width, the pull-up performance will be improved. However, LAI AND HWANG: DESIGN AND IMPLEMENTATION OF DCVSPG LOGIC 565 where . However (4) where is a constant which depends on the design ground rule for minimum junction length. is the pMOS width. is the pMOS length, and is the lumped junction capacitance per unit channel width. Here, we assume pMOS and nMOS have the same junction capacitance. The total capacitance is then (5) Fig. 3. ASTAP simulation results of DCVSPG with fixed nMOS device sizes. , and where we assume . The fall time is then (6) The fall time is proportionally increased as increases. This is shown in Fig. 3. For the charging path shown in Fig. 5(c), the charged voltages by nMOS and pMOS are (7) (8) Fig. 4. Delay time with the functions of pMOS device size and capacitive load. the increased pMOS device junction capacitance will degrade the pull-down performance. As shown in Fig. 5(a), the discharging path can be summarized as shown in Fig. 5(b). The shown in Fig. 5(a) is the wiring capacitance and the shown in Fig. 5(b) and (c) is the total capacitance where is the junction capacitance of the pMOS and nMOS devices. By assuming the nMOS device is in the saturation mode while the discharge voltage varies from to 0.5 where we measured at the switching point, the discharge current [12] is By assuming , the total charged voltage is (9) where path. and are the same definition as shown in discharging is the rise time. So the rise time is (10) where and are the same definition as the discharging path. By rearranging the delay time (1) (11) where is the electron mobility and is the intrinsic gate capacitance. is the nMOS width. is the nMOS length. is the gate-to-source voltage and is the nMOS threshold voltage. The discharged voltage is then (2) where is the fall time. The fall time is then rewritten as (3) By taking the first derivative of (11) and let , we get (12) This is the optimum value for DCVSPG circuits. Equation (12) shows that when , the equals zero. That means the pMOS device width should be as small as possible to have the best performance as shown in Fig. 4. However, when , the value is around 1.16. The pMOS device width should be chosen to be the same as that of the nMOS devices. 566 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 (a) (a) (b) Fig. 6. (a) Karnaugh map for AND circuit and (b) synthesized DCVSPG AND circuit. (b) leg just take the respective complementary function variable connection to realize the complete circuit. The realized AND logic circuit is shown in Fig. 6(b). Fig. 7(a) shows the logic function of Karnaugh map. If we assume the input signals and are the control signals, the function variables are then and . By grouping the terms under each of the four control values and minimizing the functions using the conventional Karnaugh map procedure, the logic function result can be written as (13) (c) Fig. 5. (a) DCVSPG AND circuit with the capacitive load, (b) simplified pull-down path for DCVSPG circuit, and (c) simplified pull-up path for DCVSPG circuit. C. Recursive Synthesis of Pass-Gate The pass-gate logic tree can be synthesized in a very systematic way by recursively using the Karnaugh map. Although the complex DCVSPG logics can be realized by using the standard basic logic gates such as NOR, NAND, and XOR gates, the powerful logic implementation by using the passgate approach is another major advantage of DCVSPG circuits. Fig. 6(a) shows the Karnaugh map for the AND function. The or can be either the nMOS gate control input signal or the nMOS source connection. In this case, if we assume signal will be the the signal is the control variable, the function variable. The control variable is used to connect to the gate and the function variable is connected to the source and , of the nMOS device. Under the control signals we group the terms together as shown in Fig. 6(a). Under the , the grouped terms are all zero. It indicates control signal should that the function variable under the control signal connect to ground. However, under the control signal , the grouped terms are the same with the signal . This means the function variable under the control gate should connect to . The function variable connections under the other logic tree This indicates that under the series-connected control gates and , the function variable is 1010 from the Karnaugh map. In parallel with these series-connected control gates and , it should have one control gate and one control gate. Both of them are connected to the same function variable 0011. As shown in the AND gate realization method, the 1010 function variable can be realized in the same fashion by assuming the signal is the control variable. The 1010 function variable can be realized as (14) and the 0011 function variable as ground (15) The completely synthesized circuit is shown in Fig. 7(d). It is obvious that if the function variables are more than five, we should use the Quine–McCluskey tabular method [15]. However, the circuit with more than five variables increases stack height dramatically. This will degrade the circuit performance profoundly. Especially, when the technology is moving down to the smaller power supply with the threshold voltage not being scaled proportionally, the increased stack height will degrade the performance sharply. The optimum function variable number is then better kept lower than three or four stack height depending on the applications. LAI AND HWANG: DESIGN AND IMPLEMENTATION OF DCVSPG LOGIC 567 (a) (b) Fig. 8. Basic two-way DCVSPG logic gates. (c) (d) = AN BN CN DN + C (A + B + D) Fig. 7. (a) Karnaugh map for F logic function, (b) Karnaugh map for [1010] value from (a), (c) Karnaugh map for [0011] value from (a), and (d) synthesized DCVSPG circuit. D. DCVSPG Gates Figs. 8 and 9 show the basic logic gates with two or three input variables. These gates can be synthesized using the previously mentioned recursive Karnaugh map method. It is interesting to note that all of the logic functions are produced by only four nMOS transistors. The only differences among them are the function variable connections. Another attractive feature is that these circuits can generate the complementary outputs without any inverter circuits, such that the AND and NAND are actually the same circuit only with the output nodes exchanged. This leads the DCVSPG circuit to be most suitable for standard cell library development. For all the library cells, the circuit topologies are the same. They are only different in function variable connection. Due to the cross-coupled pMOS device load, these circuits have the latch function. By combining the clock function inside the circuit, these circuits can become the flip-flop combined with the logic function [16], [17]. As shown in Fig. 10, between the crosscoupled pMOS load and logic tree, two extra nMOS’s are inserted. The gates of these nMOS’s are then connected to the clock signal . When the clock signal is high, the output nodes and can store the current evaluated value. When the clock signal is low again, the input signals will be changed. However, the output nodes and are disconnected from the logic tree. It will temporarily store the previous data until the clock is high again. This is a typical cross-coupled latch function. It is very useful to apply this DCVSPG latch technique to the pipelined design. Usually, in the conventional pipelined design, we have to add extra latches in order to store the temporary data. Thus, the silicon area penalties of these extra latches are very large. By using the DCVSPG techniques, the requirement for these extra latches can be eliminated totally. It is because only at the last logic stage of pipelined design can we insert the clock-controlled NMOS between the cross-coupled pMOS load and DCVSPG logic tree to form the latch. The advantages of these DCVSPG latches will not only improve the performance dramatically (no extra latches to be driven), but also reduce the silicon area substantially. All DCVSPG circuits also allow dot-OR function due to these pass-gates being in the high impedance state. The dot-OR function, however, is not allowed in the conventional static CMOS design. III. PERFORMANCE COMPARISONS In order to compare the performance and power dissipation with the competing techniques such as CPL [2], DPL [3], and SRPL [4], we construct the SUM circuits as the test vehicle. For DCVSPG circuit, the SUM circuit is actually the threeway XOR circuit shown in Fig. 9. Due to all of the competing 568 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 Fig. 11. DCVSPG SUM circuit with two inverter buffers. Fig. 9. Basic three-way DCVSPG logic gates. Fig. 12. CPL circuit implementation for SUM circuit. Fig. 10. DCVSPG latch circuit. Fig. 13. DPL circuit implementation for SUM circuit. circuits having inverter buffers, we also add inverter buffers in the DCVSPG circuit shown in Fig. 11 as DCVSPGB. The CPL SUM circuit is shown in Fig. 12. It is interesting to note that DCVSPGB is actually CPL with the additional cross-coupled pMOS load. The DPL SUM is described in Fig. 13, and the SRPL SUM is shown in Fig. 14. By looking all the circuits, they are awfully similar. So we simulate these circuits with the same nMOS and pMOS device widths. The only difference for DPL is that we split the nMOS width used in other circuits into 40 to 60% ratio between the nMOS and pMOS passgates. The technology used for simulation is a 0.5- m CMOS technology with 2.5-V power supply. Figs. 15 and 16 show the simulated circuit performance and power dissipation at the variation of capacitive loads. Without considering the performance result and the waveform shape, DCVSPG actually has the best power-delay product. It is easy to understand that DCVSPG has the least number of transistors and the full-swing CMOS signals. For the capacitive load less than 0.3 pF, DCVSPG even has the best performance of all. The only serious drawback is that DCVSPG cannot drive a long chain of logic circuits due to the weak pMOS transistor pull up. This will degrade the performance at high load as shown in Fig. 15. However, as we discussed before, the input signals to pass-gate can be either a control variable or a function variable. The long chain of logic circuits can be terminated at gate if a proper control variable is chosen. The alternate approach is using an inverter buffer as shown LAI AND HWANG: DESIGN AND IMPLEMENTATION OF DCVSPG LOGIC Fig. 14. 569 SRPL circuit implementation for SUM circuit. Fig. 15. ASTAP simulation results for rise and fall time delay. in DCVSPGB. Adding the inverter buffer, however, the load is increased with two extra gate capacitances from the node shown in Fig. 11. The performance at the light capacitive load will be degraded. At the high capacitive load range, however, the delay of the inverter buffer will be dominated, such that DCVSPGB has the better performance in the highly loaded region. CPL, DPL, and DCVSPGB all have the very similar circuit performance at the low capacitive load. In the high load range, however, both DPL and DCVSPG have better performance than CPL. This is due to both of them having full swing signals. For CPL, the pull down performance is dramatically degraded as the singles only swing from to zero at nodes and shown in Fig. 12. Smaller overdrive voltage causes the inverter delay to increase. Although we can lower the inverter threshold voltage as suggested [2], this will lower the noise margin, and technology process variations will cause the design to be less robust. SRPL performance is not as good as the claim in the SRPL paper [4]. The reason is not quite known. But from the circuit point of view, the node in Fig. 14 will have not only the wiring capacitance and inverter gate load, but also the inverter junction loads. Before the inverter switches, at the same device width, SRPL actually has the largest loading among these competing circuits. From the overall results, DCVSPG actually is the best circuit for the lightly loaded ( 0.3 pF) condition if the long logic chain can be properly terminated at a gate. For a more robust design situation, DCVSPGB and DPL might be the best choices. However, from the circuit simplicity point of view, DPL is proven to be much more complicated. Although the total device widths are the same, the complex wiring might finally drive DPL into a larger silicon area. IV. APPLICATIONS A Viterbi decoder used in the high performance PRML read-write channel chip [18] has been designed. The logic diagram of the Viterbi decoder is shown in Fig. 17. From Fig. 16. ASTAP simulation results for power consumption. this diagram, the bottleneck of the critical path is the 6-b subtracter unit which has to meet the worst process condition at 3 ns. The silicon area is also a big concern for this costperformance chip. The 6-b subtracter is then constructed and implemented by using a DCVSPG circuit and the ripplethrough architecture. The full subtracter is shown in Fig. 18 and the overall Viterbi decoder computer plot is shown in Fig. 19. The simulation results of the whole Viterbi macro are shown in Fig. 20. It is shown that at the nominal process, the Viterbi macro can achieve very high performance up to 500 MHz by using the DCVSPG circuit. As we discussed before, the DCVSPG circuit cannot only be used in a static circuit configuration, but also can be extended into the dynamic circuit region [6], [7]. Again, in order to compare the circuit performances of various design techniques, 570 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 Fig. 17. Logic schematic for the DCVSPG realized Viterbi decoder. Fig. 18. DCVSPG full adder circuit for Viterbi decoder. the SUM circuit is used for ASTAP simulation. Figs. 21 and 22 show the static and dynamic circuits of the conventional DCVS design technique. The static and dynamic implementations of DCVSPG are shown in Figs. 9 and 23. The static CMOS SUM circuit is shown in Fig. 24. The device width is designed following a basic rule that the conductance of all the discharging paths are assumed to be the same as the conductance of a minimum size ( m in our case) nMOS transistor such that, for example, in Fig. 21 the Fig. 19. Computer plot of the whole Viterbi decoder. conventional DCVS static circuit, three transistors are seriesconnected along the discharge path. The transistor width is LAI AND HWANG: DESIGN AND IMPLEMENTATION OF DCVSPG LOGIC 571 TABLE I COMPARISON OF FULL ADDER Fig. 20. ASTAP simulation results for Viterbi decoder. then chosen as m m. For Fig. 22, however, the device size is then increased up to m m in the dynamic DCVS circuit. The pMOS device size is chosen as twice as large as that of the nMOS device. All the circuits are laid out by using the row-based standard cell library image with only one level metal allowed. The simulation results are shown in Table I. The load capacitance is assumed that the circuit drives a chain of the similar circuits. A fan out of one is used for the static circuit. The dynamic circuits, however, are buffered by the C MOS latch and expect to be able to drive larger loads. A fan out of two is used for the dynamic circuits. Considering the static design first, it appears that the static DCVSPG has the best power-delay product. DCVSPG has the lowest logic tree stack height such that its transistor size and input capacitance are the smallest. The best performance is Fig. 21. Static DCVS SUM circuit. solely due to the fact that the pull up and pull down are mostly done by the high performance nMOS transistor. By contrast, the pull up of DCVS and static CMOS are done by the inferior pMOS transistors. The least power consumption is also mostly caused by the shorter stack height and smaller transistor size. For the dynamic circuits, the power-delay product of the dynamic DCVSPG is the same with its static counterpart. However, its speed is almost two times faster than that of the static DCVSPG at the expense of the twice larger device count and silicon area. This leads to almost two times larger power consumption. The conventional dynamic DCVS also shows significant performance improvement. This might be due to the fact that the dynamic DCVS is a ratioless circuit using the precharge mechanism. But this might be offset by the larger silicon area, charge sharing problem, and complex clock scheme in the dynamic circuit. 572 Fig. 22. IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997 Dynamic DCVS SUM circuit. Fig. 24. Static CMOS SUM circuit. ACKNOWLEDGMENT The authors are indebted to Dr. D. Tang and Dr. M. Chen for their encouragement and support during the course of Viterbi decoder design. REFERENCES [1] M. Annaratone, Digital CMOS Circuit Design. New York: Kluwer, 1986. [2] K. Yano, T. Yamanaka, T. Nishida, M. Sato, K. Shimohigashi, and A. 16 multiplier using complementary Shimizu, “A 3.8 ns CMOS 16 pass-gate transistor logic,” IEEE J. Solid-Sate Circuits, vol. 25, pp. 388–395, 1990. [3] M. Suzuki, N. Ohkubo, T. Yamanaka, A. Shimizu, and K. Sasaki, “A 1.5 ns 32b CMOS ALU in double pass-transistor logic,” in Dig. Tech. Papers, ISSCC, 1993, pp. 90–91. [4] A. Parameswar, H. Hara, and T. Sakurai, “A high speed, low power, swing restored pass-transistor logic based multiply and accumulate circuit for multimedia application,” in Proc. IEEE CICC, 1994, pp. 278–281. [5] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, “Cascode voltage switch logic: A differential CMOS logic family,” in Dig. Tech. Papers, ISSCC, 1984, pp. 16–17. [6] K. M. Chu and D. I. Pulfrey, “A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 528–532, 1987. , “Design procedures for differential cascode voltage switch [7] circuits,” IEEE J. Solid-State Circuits, vol. SC-21, pp. 1082–1087, 1986. [8] R. K. Montoye, “Testing scheme for differential cascode voltage switch circuits,” IBM Tech. Disc. Bull., vol. 27, pp. 6148–6152, 1985. [9] F. S. Lai and W. Hwang, “Differential cascode voltage switch with passgate logic tree for high performance CMOS digital systems,” in 1993 Int. Symp. VLSI Technology Systems Applications, 1993, pp. 358–362. [10] R. H. Krambeck, C. M. Lee, and H. S. Law, “High-speed compact circuits with CMOS,” IEEE J. Solid-States Circuits, vol. SC-17, pp. 614–619, 1982. [11] L. C. Pfennings, W. J. Mol, J. J. Bastiaens, and J. M. VanDijk, “Differential split-level CMOS logic for subnanosecond speeds,” IEEE J. Solid-State Circuits, vol. SC-20, pp. 1050–1055, 1985. [12] S. M. Sze, Physics of Semiconductor Devices. New York: Wiley, 1981. [13] S. Whitaker, “Pass-transistor networks optimize n-MOS logic,” Electron., pp. 144–148, 1983. 2 Fig. 23. Dynamic DCVSPG SUM circuit. V. CONCLUSIONS This paper describes a new circuit design technique—DCVSPG. This proposed circuit eliminates the floating node problem which existed in the conventional DCVS design. This leads DCVSPG to have better performance and power consumption. We also demonstrate a simple synthesis way to construct the complex logic function into pass-gate design by using the recursive Karnaugh map. Compared with the existing design techniques, DCVSPG is actually the best of all at the light capacitive load range. By using DCVSPG, the properly terminated long logic chain at the gate is an essential issue. For the more robust design, however, DCVSPGB might also be the best choice. CPL is not a full swing signal circuit and DPL suffers from the complex wiring and design. A Viterbi macro shows the possibility of using DCVSPG to achieve a very high performance design. With the extension into the dynamic circuit design region, DCVSPG also shows the best power-delay product. LAI AND HWANG: DESIGN AND IMPLEMENTATION OF DCVSPG LOGIC [14] D. Radhakrishnan, S. Whitaker, and G. K. Maki, “Formal design procedures for pass-transistor switching circuits,” IEEE J. Solid-State Circuits, vol. SC-20, pp. 531–536, 1985. [15] D. Winkel and F. Prosser, The Art of Digital Design. Englewood Cliffs, NJ: Prentice-Hall, 1980. [16] S. Feng, U.S. Patent 4 620 117, 1986. [17] M. Afghahi, “A robust single phase clocking for low power, high-speed VLSI applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 247–254, 1996. [18] R. A. Richetta, C. J, Goestchel, R. A. Green, R. A. Kertis, R. A. Philpott, T. J. Schmerbeck, D. J. Schulte, and D. P. Swart, “A 16 MB/PRMLread/write data channel,” in Dig. Tech. Papers, ISSCC, 1995, pp. 78–89. Fang-shi Lai received the B.S. degree in 1971 from National Cheng Kung University, Tainan, Taiwan, the M.S. degree in 1977 from National Taiwan University, and the Ph.D. degree in 1980 from the University of Florida, Gainesville, all in electrical engineering. After receiving the B.S. degree, he served as a Technical Officer in the Chinese Army from 1971 to 1973. From 1973 to 1975, he was a Technical Staff Member in the Chinese Telecommunication Bureau in the field of communication switching. From 1980 to 1982, he joined the Harris Semiconductor Corporation, Melbourne, FL, as an Associate Principal Engineer, where he was active in the advanced CMOS technology development, device physics, process and device simulation. From 1982 to 1985, he served as a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY, where he was involved in the development of advanced CMOS technology, devices, and process modeling. He was an Advisory Engineer in the IBM General Product Division, San Jose, CA, where his responsibilities were the analog and digital circuit design for the advanced disk products from 1985 to 1987. In 1987, he returned to the T. J. Watson Research Center as a Research Staff Member and was actively involved in the VLSI circuit design, computer algorithms, computer architecture, and digital signal processing. From 1993 to 1996, he moved to the IBM Almaden Research Center, San Jose, CA, where he was actively involved in the mixed-signal circuit design, design methodology, and highspeed, low-power circuit design. He is now with the EPIC Design Technology, Inc., San Jose, CA, where he is involved in the circuit analysis and design methodology. 573 Wei Hwang (S’68–M’69–SM’90) received the B.Sc. degree from National Cheng-Kung University, the M.Sc. degree from National Chiao-Tung University, Taiwan, R.O.C., and the M.Sc. and Ph.D. degrees from the University of Manitoba, Canada. He is currently a Research Staff Member at IBM T. J. Watson Research Center in Yorktown Heights, NY, and an Adjunct Professor of Electrical Engineering at Columbia University, New York, NY. Prior to that he was as an Associate Professor of Electrical Engineering at Columbia University and an Assistant Professor of Electrical Engineering at Concordia University, Montreal. He has contributed to several areas of microelectronics, VLSI design, and submicron CMOS technology. His innovative circuits and architecture designs led to the development of the first high-speed CMOS 1-Mb DRAM. He has also been involved in the early technology development, memory cell, and sensing circuit designs for the IBM CMOS 4, 16, 64, and 256 Mb DRAM research and development programs. Later, he worked on high-performance self-resetting CMOS (SRCMOS) circuit designs for the IBM PowerPC 630 microprocessor development program. Currently, he is working on a merged logic/DRAM 0.25-m CMOS technology for integrated-systems-on-a-chip (ISOC) project. He holds 32 U.S. patents and has published more than 85 technical papers in the areas of semiconductor technologies, materials, devices, and VLSI logic and memory circuits. He has also co-authored a book entitled Electrical Transport in Solids—With Particular Reference to Organic Semiconductors (Pergamon Press, Oxford 1981). Dr. Hwang has been awarded 11 IBM Invention Achievement Awards and three IBM Research Division Technical Awards. He was recognized as one of IBM’s top inventors in 1991 and 1994. He was President of the Chinese American Academic and Professional Society (CAAPS) in 1986, Chairman of the Board of Directors of CAAPS from 1988 to 1990, and a member of the Board of Directors of CAAPS between 1986 to 1991 and 1994 to 1996. He now serves as a member of the Governing Board and as treasurer of the Chinese Language Computer Society, 1993 to 1998. He is a member of the New York Academy of Science, the American Physical Society, Sigma Xi, and Phi Tau Phi Society. He received the 1985 Outstanding CAAPS Service Award, the 1992 Courvoisier Leadership Award, and the 1995 CAAPS 20th Anniversary Special Service Award.