IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997
Design and Implementation of Differential Cascode
Voltage Switch with Pass-Gate (DCVSPG)
Logic for High-Performance Digital Systems
Fang-shi Lai and Wei Hwang, Senior Member, IEEE
Abstract— In this paper, a new high-speed circuit technique
called differential cascode voltage switch with pass-gate
(DCVSPG) logic tree is presented. The circuit technique is
designed using a pass-gate logic tree in DCVSPG instead of
the nMOS logic tree in the conventional DCVS circuit, which
eliminates the floating node problem. By eliminating the floating
node problem, the DCVSPG becomes a new type of ratioless
circuit, and it also provides superior performance with less
power dissipation and better silicon area tradeoff. The basic
DCVSPG design technique, the methodology for optimization,
and synthesis of the pass-gate logic tree are described. Standard
cell library development that takes advantage of the
dual-rail outputs of DCVSPG gates is also discussed.
Performance comparisons with other existing pass-gate circuit
techniques [complementary pass-transistor logic (CPL), double
pass-transistor logic (DPL), and swing restored pass-transistor
logic (SRPL)] are presented. For more robust design, the
DCVSPG with inverter buffers is also the best choice.
A Viterbi macro design using the DCVSPG circuit technique
is demonstrated. The design is based on a
0.5-µm CMOS technology with 0.25-µm effective channel length
and five layers of metal. This macro can run up to 500 MHz at
the nominal process condition. In comparison with other existing
dynamic circuit techniques, the results also clearly show that the
dynamic DCVSPG has the superior power-delay performance.
Index Terms—Complete logic family, CMOS digital integrated
circuits, high-speed circuits, MOSFET logic devices.
I. INTRODUCTION
THE dominant circuit design technique for current digital
systems is static CMOS. This is mainly due to its robust
design nature, which yields reliable circuits with excellent noise margin.
However, the demand for high-performance
digital systems requires continuously faster CMOS circuit
speed. Dynamic circuits are proven to have better circuit
performance [1]. Unfortunately, these dynamic design styles
suffer from charge sharing, low noise margin, complexity of
design, and difficulty in testing. Recently, several researchers
have attempted to use the pass-gate logic style to realize static,
high-performance designs in different digital systems
[2]–[4]. Pass-gate logic gains its speed over traditional
static CMOS design from its high logic functionality and the
reduction in the number of pMOS transistors. However, the
degradation of pull-up performance in a long pass-gate
chain is the major obstacle for most designers.
Properly terminating the long pass-gate chain at a
gate, or inserting a static inverter, is the ultimate solution
for realizing high-speed pass-gate design.
Manuscript received January 4, 1996; revised April 10, 1996.
F. Lai is with the IBM Almaden Research Center, San Jose, CA 95120
USA.
W. Hwang is with the IBM Thomas J. Watson Research Center, Yorktown
Heights, NY 10598 USA.
Publisher Item Identifier S 0018-9200(97)02480-3.
Differential cascode voltage switch (DCVS) [5] is claimed
to have advantages over the traditional static CMOS design
in terms of circuit delay, layout area, logic flexibility, and
power dissipation [6], [7]. DCVS also has an inherent self-testing property which can provide coverage for stuck-at
and dynamic faults [8]. Actually, most of the published
pass-gate high performance circuits [2]–[4], [9] are more or
less derived from DCVS. The reduced size of the pMOS
transistor network substantially saves the amount of silicon
area. The inherent cross-coupled logic and complementary
outputs also make DCVS a very attractive candidate for
dynamic implementations. Standard domino logic [10] suffers
from the fact that inverting logic gates cannot be implemented.
DCVS provides complementary information and therefore
overcomes the restriction.
However, DCVS is a ratio circuit and, as such, has
some significant drawbacks in both static
and dynamic modes of operation. These are mainly caused by
the floating node that is generated by one of its logic
tree legs. Because of this inherent nature, DCVS tends to
have relatively large current spikes and additional delay [11].
By replacing the logic evaluation tree with the pass-gate
design, the floating node problem can be eliminated. This
makes the differential cascode voltage switch with pass-gate
tree (DCVSPG) a ratioless circuit. It combines the
cross-coupled nature of DCVS, which gives DCVSPG full
output signal swings, with the compact logic design style of
pass-gate circuits. This makes DCVSPG suitable for
high-performance digital systems.
In this paper, DCVSPG basic design technique will be presented. In Section II, the basic AND circuit will be compared
with the standard DCVS circuit. The optimization procedure
will be derived, and the synthesis of pass-gate logic by
using the recursive Karnaugh map will be discussed. The
performance comparisons with existing pass-gate circuits will
be shown in Section III. A Viterbi decoder designed with
DCVSPG and the performance comparisons with static and
dynamic DCVS circuits are shown in Section IV. A conclusion
will be presented in Section V.
0018–9200/97$10.00 © 1997 IEEE
Fig. 1. (a) Conventional DCVS AND circuit and (b) DCVSPG AND circuit.
Fig. 2. ASTAP simulation results of DCVS with fixed nMOS device sizes.
II. DIFFERENTIAL CASCODE VOLTAGE
SWITCH WITH PASS-GATE LOGIC (DCVSPG)
A. Basic Properties
Fig. 1(a) shows the traditional DCVS AND circuit. In
DCVS circuits, two cross-coupled pMOS transistors p1 and
p2 form the circuit load. Below the pMOS load, four
nMOS transistors n1, n2, n3, and n4 form the n-channel
logic evaluation tree. When the input signals A and B swing
from low to high, transistors n1 and n2 turn ON, and the
complementary output node is discharged to ground. The true
output node is floating during the transition period while the
complementary input signals swing from high to low, because
both nMOS transistors n3 and n4 are OFF. The ground level on
the complementary output node then turns the cross-coupled
pMOS transistor p2 ON, and the true output node
is charged high. This realizes the AND logic function.
However, the floating phenomenon on the true output node
has an adverse effect on the DCVS circuit operation. If we
assume the true output node is low and the complementary
output node is high in the previous state, then during the
transition period the pMOS transistor p1 is ON momentarily
while both input signals A and B are swinging from low to high.
This is because the true output node is still low at this moment.
The transistors p1, n1, and n2 form a ratio circuit in which the
complementary output node goes high or low solely depending
on the relative strength of pMOS transistor p1 and the
series-connected nMOS transistors n1 and n2. If the p/n ratio
is kept sufficiently low, the gate will switch, but it will be
very slow; otherwise, the logic function of this circuit will
fail. However, if we decrease the
strength of pMOS transistor p1, the pull-up performance
will be substantially degraded, and power consumption
will be increased. In Fig. 2, the Advanced Statistical Analysis
Program (ASTAP) simulation results of this circuit are shown.
In this simulation, we hold the nMOS device widths constant.
By varying the pMOS device width, we can see that the rise
time changes dramatically when we monitor the output
node. In the small pMOS device width region, the slow
pull-up performance is largely caused by the small pMOS
strength. However, in the larger pMOS device width region,
the degraded pull-up performance is mainly caused by the
ratio circuit problem: the complementary output node
discharges slowly against the high pMOS device strength.
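The ratio fight described above can be illustrated with a deliberately crude model. The sketch below treats p1 and the series nMOS stack as linear conductances forming a divider; the function name, the mobility ratio of 2.5, and the linear-resistor simplification are all assumptions made for illustration and are not taken from the paper or its ASTAP simulations.

```python
def contested_node_voltage(w_p, w_n, vdd=2.5, mobility_ratio=2.5, n_series=2):
    """Level of a node pulled up by one pMOS and down by a series nMOS stack.

    Crude resistive-divider model (an assumption, not the paper's analysis):
    conductance is taken as proportional to width, the pMOS is weaker by
    `mobility_ratio`, and a stack of `n_series` devices divides the
    pull-down conductance by `n_series`.
    """
    g_p = w_p / mobility_ratio   # pull-up conductance, arbitrary units
    g_n = w_n / n_series         # pull-down conductance of the stack
    return vdd * g_p / (g_p + g_n)


# A wider pMOS holds the contested node higher, so the stack struggles
# to discharge it: this is the ratio problem in miniature.
for w_p in (0.5, 1.25, 5.0):
    print(w_p, round(contested_node_voltage(w_p, w_n=1.0), 3))
```

Under these assumptions the node level rises monotonically with pMOS width, which matches the qualitative trend the text attributes to the large-pMOS region of Fig. 2.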
The DCVSPG AND circuit shown in Fig. 1(b)
solves the floating node problem by replacing the n-channel
evaluation tree with the pass-gate design. The cross-coupled
pMOS device load is the same as in Fig. 1(a). With the
same previous state, when both input signals A and B swing
from low to high, the nMOS transistors n2 and n4 both turn
ON. The complementary output node is then discharged to
ground as the complementary input signals swing from high
to low. Meanwhile, the true output node is actively charged up
to the high state, which prevents the ratio circuit problem
discussed above. This not only improves circuit performance
but also decreases power consumption. It also makes
DCVSPG a more robust design technique.
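A switch-level sketch can make the difference between the two trees concrete. It assumes that B is the control variable and A the function variable in the pass-gate tree, and that n3 and n4 sit in parallel on the true output in the DCVS tree (as the AND function requires); the function names and the use of `None` for an undriven node are illustrative choices, not notation from the paper.

```python
from itertools import product


def dcvs_and_tree(a, b):
    """nMOS pull-down tree of the conventional DCVS AND gate.

    Returns (q, qn) as driven by the tree alone: 0 when the tree pulls
    the node to ground, None when the tree leaves the node undriven and
    only the cross-coupled pMOS load can set it.
    """
    qn = 0 if (a and b) else None              # n1, n2 in series
    q = 0 if ((not a) or (not b)) else None    # n3, n4 in parallel (assumed)
    return q, qn


def dcvspg_and_tree(a, b):
    """Pass-gate tree of the DCVSPG AND gate (B control, A function: assumed).

    Each output always has exactly one conducting pass transistor, so the
    tree itself drives both nodes for every input combination.
    """
    q = a if b else 0           # gate B passes A; gate B' passes ground
    qn = (1 - a) if b else 1    # gate B passes A'; gate B' passes the supply
    return q, qn


for a, b in product((0, 1), repeat=2):
    print(a, b, dcvs_and_tree(a, b), dcvspg_and_tree(a, b))
```

In this model the DCVS tree leaves one output undriven for every input combination, while the pass-gate tree never does, which is the property the text credits with removing the floating node.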
In the DCVS circuit, a proper pMOS device width can be
chosen to make the circuit function, but technology process
variations make the pMOS device width adjustment very
difficult. The ASTAP simulation results of DCVSPG with
fixed nMOS device sizes are shown in Fig. 3. They confirm
that the DCVSPG design technique has better performance.
The rise times are constant throughout the whole range of
pMOS device widths. The pull-up performance is almost
one order of magnitude better than that of the DCVS design.
B. Optimization of DCVSPG Gates
If we take the average value of the rise and fall times shown
in Fig. 3 and replot them in Fig. 4, an optimum pMOS device
width can be chosen for better performance.
This originates from the fact that both the pMOS and nMOS
devices contribute to the pull-up performance in
the DCVSPG circuit. So if we increase the pMOS device
width, the pull-up performance will be improved. However,
the increased pMOS device junction capacitance will degrade
the pull-down performance.
Fig. 3. ASTAP simulation results of DCVSPG with fixed nMOS device
sizes.
Fig. 4. Delay time as a function of pMOS device size and capacitive
load.
As shown in Fig. 5(a), the discharging path can be summarized as shown in Fig. 5(b). The
capacitance shown in Fig. 5(a) is the wiring capacitance, and the
capacitance shown in Fig. 5(b) and (c) is the total capacitance,
which is the sum of the wiring capacitance and the junction
capacitance of the pMOS and nMOS devices. By assuming
the nMOS device is in the saturation mode while the discharge
voltage varies from $V_{DD}$ to 0.5 $V_{DD}$, where we measure at the
switching point, the discharge current [12] is
(1)
where the parameters are the electron mobility, the intrinsic
gate capacitance, the nMOS width, the nMOS length, the
gate-to-source voltage, and the nMOS threshold voltage.
The discharged voltage is then
(2)
in terms of the fall time. The fall time is then rewritten as
(3)
However
(4)
where the parameters are a constant which depends on the design
ground rule for minimum junction length, the pMOS width, the
pMOS length, and the lumped junction capacitance per
unit channel width. Here, we assume pMOS and nMOS have
the same junction capacitance. The total capacitance is then
(5)
The fall time is then
(6)
The fall time increases proportionally as the pMOS device width
increases. This is shown in Fig. 3. For the charging path shown in
Fig. 5(c), the charged voltages by nMOS and pMOS are
(7)
(8)
The total charged voltage is
(9)
where the parameters have the same definitions as in the discharging
path, together with the rise time. So the rise time is
(10)
where the parameters have the same definitions as in the discharging path.
By rearranging the delay time
(11)
By taking the first derivative of (11) and setting it to zero, we get
(12)
This is the optimum value for DCVSPG circuits. Equation
(12) shows that, in one limiting case, the optimum equals zero. That means
the pMOS device width should be as small as possible to
have the best performance, as shown in Fig. 4. In the other case,
however, the value is around 1.16, so the pMOS device width
should be chosen to be about the same as that of the nMOS devices.
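As a reading aid for the discharge analysis above, the following is a minimal sketch of the standard long-channel square-law model that matches the quantities the text names (electron mobility, gate capacitance, nMOS width and length, gate-to-source and threshold voltages, total capacitance, and a discharge from $V_{DD}$ to 0.5 $V_{DD}$). The symbols and the exact form are assumptions; they are not a reproduction of the paper's equations (1) to (6).

```latex
% Assumed square-law saturation current of the discharging nMOS device
I_n = \frac{\mu_n C_{ox}}{2}\,\frac{W_n}{L_n}\,\left(V_{GS} - V_{Tn}\right)^2

% Assumed total capacitance: wiring plus junction capacitance
C_T = C_L + C_J

% Fall time for a constant-current discharge over \Delta V = 0.5\,V_{DD}
t_f \approx \frac{C_T\,\Delta V}{I_n}
    = \frac{\left(C_L + C_J\right)\,V_{DD}}
           {\mu_n C_{ox}\,\dfrac{W_n}{L_n}\,\left(V_{GS} - V_{Tn}\right)^2}
```

Under this model, any growth in junction capacitance with pMOS width lengthens the fall time, which is the trade-off against pull-up strength that the optimization in this subsection addresses.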
Fig. 5. (a) DCVSPG AND circuit with the capacitive load, (b) simplified
pull-down path for DCVSPG circuit, and (c) simplified pull-up path for
DCVSPG circuit.
C. Recursive Synthesis of Pass-Gate
The pass-gate logic tree can be synthesized in a very systematic way by recursively using the Karnaugh map. Although
complex DCVSPG logic can be realized by using
standard basic logic gates such as NOR, NAND, and XOR
gates, the powerful logic implementation offered by the pass-gate approach is another major advantage of DCVSPG circuits.
Fig. 6(a) shows the Karnaugh map for the AND function. Either
input signal can serve as the nMOS gate control
or as the nMOS source connection. In this case, if we assume
one signal is the control variable, the other signal will be the
function variable. The control variable is connected to
the gate, and the function variable is connected to the source
of the nMOS device. Under the control signal and its complement,
we group the terms together as shown in Fig. 6(a). Under the
complemented control signal, the grouped terms are all zero. This indicates
that the function variable under the complemented control signal should
connect to ground. However, under the true control signal, the
grouped terms are the same as the function signal. This means the
function variable under the true control gate should connect to
the function signal itself. The function variable connections under the other logic tree
leg simply take the respective complementary function variable
connections to realize the complete circuit. The realized AND
logic circuit is shown in Fig. 6(b).
Fig. 6. (a) Karnaugh map for AND circuit and (b) synthesized DCVSPG
AND circuit.
Fig. 7(a) shows the Karnaugh map of the logic function
$F = \bar{A}\bar{B}\bar{C}\bar{D} + C(A + B + D)$. If we assume
the input signals $A$ and $B$ are the control signals, the function
variables are then $C$ and $D$. By grouping the terms under each
of the four control values and minimizing the functions using
the conventional Karnaugh map procedure, the logic function
result can be written as
(13)
This indicates that under the series-connected control gates
$\bar{A}$ and $\bar{B}$, the function variable is 1010 from the Karnaugh
map. In parallel with these series-connected control gates,
there should be one $A$ control gate and one $B$ control
gate. Both of them are connected to the same function variable
0011. As shown in the AND gate realization method, the
1010 function variable can be realized in the same fashion
by choosing one of the remaining signals as the control variable. The 1010
function variable can be realized as
(14)
and the 0011 function variable, with one branch connected to ground, as
(15)
The completely synthesized circuit is shown in Fig. 7(d). It
is obvious that if the function variables are more than five,
we should use the Quine–McCluskey tabular method [15].
However, a circuit with more than five variables increases
stack height dramatically, which degrades circuit performance profoundly. Especially as technology moves
toward smaller power supplies without the threshold voltage
being scaled proportionally, the increased stack height
will degrade performance sharply. The number of function
variables is therefore best kept below a stack height of three or four,
depending on the application.
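The decomposition described for Fig. 7 can be checked mechanically. The sketch below evaluates the function from the Fig. 7 caption, reads off the Karnaugh-map row over (C, D) for each value of the control signals A and B, and then rebuilds the function as a pass-gate selection. Choosing C as the control variable for the two sub-trees is an assumption made here for illustration; the names `f`, `cofactor_row`, and `pass_gate_f` are likewise hypothetical.

```python
from itertools import product

GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]   # Karnaugh-map column order for (C, D)


def f(a, b, c, d):
    """F = A'B'C'D' + C(A + B + D), as given in the Fig. 7 caption."""
    term1 = (not a) and (not b) and (not c) and (not d)
    term2 = c and (a or b or d)
    return int(bool(term1 or term2))


def cofactor_row(a, b):
    """Values of F across the (C, D) columns for fixed control signals A, B."""
    return "".join(str(f(a, b, c, d)) for c, d in GRAY)


def pass_gate_f(a, b, c, d):
    """Pass-gate realization: A and B select which sub-function reaches the output."""
    sub_1010 = d if c else 1 - d   # control C (assumed): D under C, D' under C'
    sub_0011 = 1 if c else 0       # control C (assumed): supply under C, ground under C'
    if a or b:                     # parallel control gates A and B
        return sub_0011
    return sub_1010                # series control gates A' and B'


for a, b in product((0, 1), repeat=2):
    print(f"A={a} B={b}: {cofactor_row(a, b)}")
```

The printed rows are 1010 for A = B = 0 and 0011 for the other three control values, in agreement with the text, and the pass-gate form reproduces the function on all 16 input combinations.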
Fig. 7. (a) Karnaugh map for the $F = \bar{A}\bar{B}\bar{C}\bar{D} + C(A + B + D)$
logic function, (b) Karnaugh map for [1010] value from (a), (c) Karnaugh map
for [0011] value from (a), and (d) synthesized DCVSPG circuit.
Fig. 8. Basic two-way DCVSPG logic gates.
D. DCVSPG Gates
Figs. 8 and 9 show the basic logic gates with two or three
input variables. These gates can be synthesized using the
previously mentioned recursive Karnaugh map method. It is
interesting to note that all of the logic functions are produced
by only four nMOS transistors. The only differences among
them are the function variable connections. Another attractive
feature is that these circuits generate complementary
outputs without any inverter circuits, so that AND and
NAND are actually the same circuit with only the output
nodes exchanged. This makes the DCVSPG circuit most
suitable for standard cell library development. For all the
library cells, the circuit topologies are the same; they
differ only in the function variable connections. Owing to the
cross-coupled pMOS device load, these circuits have a latch
function. By combining the clock function inside the circuit,
these circuits can become flip-flops combined with the logic
function [16], [17]. As shown in Fig. 10, two extra nMOS
transistors are inserted between the cross-coupled pMOS load
and the logic tree. The gates of these nMOS transistors are
connected to the clock signal. When the clock signal is high, the
output nodes store the currently evaluated value.
When the clock signal goes low again, the input signals
can change; however, the output nodes are
disconnected from the logic tree and temporarily hold the
previous data until the clock is high again. This is a typical
cross-coupled latch function. It is very useful to apply this
DCVSPG latch technique to pipelined design. Usually, in
a conventional pipelined design, we have to add extra latches
in order to store the temporary data, and the silicon area
penalties of these extra latches are very large. By using the
DCVSPG technique, the requirement for these extra latches
can be eliminated entirely, because the clock-controlled
nMOS transistors need to be inserted between the cross-coupled
pMOS load and the DCVSPG logic tree only at the last logic
stage of the pipeline to form the latch. These DCVSPG
latches not only improve the performance dramatically
(no extra latches need to be driven), but also reduce the silicon
area substantially. All DCVSPG circuits also allow the dot-OR
function because these pass-gates can be in the high-impedance
state. The dot-OR function, however, is not allowed in
conventional static CMOS design.
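The claim that library cells share one topology and differ only in their function variable connections can be sketched as a small lookup table. The connection table below is derived from the logic functions themselves with B taken as the control variable; it is an assumption for illustration and is not read off Fig. 8, and the names `GATES`, `COMPLEMENT`, and `evaluate` are hypothetical.

```python
from itertools import product

# Source connections per gate: (passed when B = 1, passed when B = 0).
# "a" is the function variable, "a'" its complement,
# "0" is ground and "1" is the supply.
GATES = {
    "AND": ("a", "0"),
    "OR": ("1", "a"),
    "XOR": ("a'", "a"),
}

# The complementary leg uses the complemented connection in each position.
COMPLEMENT = {"a": "a'", "a'": "a", "0": "1", "1": "0"}


def source_level(name, a):
    """Logic level presented by a named source connection."""
    return {"a": a, "a'": 1 - a, "0": 0, "1": 1}[name]


def evaluate(gate, a, b):
    """Return (q, qn) for a two-input gate built from four pass transistors."""
    on_b, on_not_b = GATES[gate]
    selected = on_b if b else on_not_b
    q = source_level(selected, a)
    qn = source_level(COMPLEMENT[selected], a)
    return q, qn


for gate in GATES:
    print(gate, [evaluate(gate, a, b) for a, b in product((0, 1), repeat=2)])
```

Reading `qn` as the primary output turns the same cell into NAND, NOR, or XNOR, which mirrors the remark that AND and NAND are the same circuit with the output nodes exchanged.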
III. PERFORMANCE COMPARISONS
In order to compare the performance and power dissipation
with the competing techniques such as CPL [2], DPL [3], and
SRPL [4], we construct SUM circuits as the test vehicle.
For the DCVSPG circuit, the SUM circuit is actually the three-way XOR circuit shown in Fig. 9. Because all of the competing
circuits have inverter buffers, we also add inverter buffers to
the DCVSPG circuit, shown in Fig. 11 as DCVSPGB. The CPL
SUM circuit is shown in Fig. 12. It is interesting to note that
DCVSPGB is actually CPL with the additional cross-coupled
pMOS load. The DPL SUM is shown in Fig. 13, and the
SRPL SUM is shown in Fig. 14. Looking at all the circuits,
they are very similar, so we simulate them with the
same nMOS and pMOS device widths. The only difference for
DPL is that we split the nMOS width used in the other circuits
in a 40-to-60% ratio between the nMOS and pMOS pass-gates. The technology used for simulation is a 0.5-µm CMOS
technology with a 2.5-V power supply.
Fig. 9. Basic three-way DCVSPG logic gates.
Fig. 10. DCVSPG latch circuit.
Fig. 11. DCVSPG SUM circuit with two inverter buffers.
Fig. 12. CPL circuit implementation for SUM circuit.
Fig. 13. DPL circuit implementation for SUM circuit.
Figs. 15 and 16 show the simulated circuit performance and
power dissipation as the capacitive load varies. Without
considering the performance result and the waveform shape,
DCVSPG actually has the best power-delay product. This is easy
to understand, because DCVSPG has the fewest transistors
and full-swing CMOS signals. For capacitive loads of
less than 0.3 pF, DCVSPG even has the best performance
of all. The only serious drawback is that DCVSPG cannot
drive a long chain of logic circuits because of the weak pMOS
transistor pull-up. This degrades the performance at high
load, as shown in Fig. 15. However, as discussed before,
the input signal to a pass-gate can be either a control variable
or a function variable. The long chain of logic circuits can
be terminated at a gate if a proper control variable is chosen.
The alternative approach is to use an inverter buffer, as
in DCVSPGB. Adding the inverter buffer, however, increases
the load with two extra gate capacitances at the node
shown in Fig. 11, so the performance at light capacitive
load is degraded. In the high capacitive load range,
however, the delay is dominated by the inverter buffer,
so DCVSPGB has better performance in the highly
loaded region. CPL, DPL, and DCVSPGB all have very
similar circuit performance at low capacitive load. In
the high load range, however, both DPL and DCVSPG have
better performance than CPL, because both of them
have full-swing signals. For CPL, the pull-down performance
is dramatically degraded because the signals at the two nodes
shown in Fig. 12 only swing from a reduced high level to zero.
The smaller overdrive voltage causes the inverter delay to increase.
Although we can lower the inverter threshold voltage as
suggested in [2], this lowers the noise margin, and technology
process variations make the design less robust.
Fig. 14. SRPL circuit implementation for SUM circuit.
Fig. 15. ASTAP simulation results for rise and fall time delay.
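The overdrive argument for CPL can be put into numbers with a small sketch. It assumes that the reduced high level is one nMOS threshold below the supply (the usual behavior of an nMOS pass transistor, with body effect ignored) and uses a hypothetical threshold of 0.6 V; only the 2.5-V supply comes from the text.

```python
def inverter_overdrive(vdd, vt_n, full_swing):
    """Gate overdrive seen by the nMOS of the output inverter.

    With a full-swing input the high level is vdd; with an nMOS-only pass
    network it is assumed to be vdd - vt_n (body effect ignored).
    """
    v_high = vdd if full_swing else vdd - vt_n
    return v_high - vt_n


VDD = 2.5    # supply voltage quoted in the text
VT_N = 0.6   # hypothetical nMOS threshold, for illustration only

full = inverter_overdrive(VDD, VT_N, full_swing=True)
reduced = inverter_overdrive(VDD, VT_N, full_swing=False)
print(f"full swing: {full:.2f} V, reduced swing: {reduced:.2f} V")
```

Under these assumptions the reduced swing costs one full threshold voltage of overdrive, which is a sizeable fraction of the total at a 2.5-V supply.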
SRPL performance is not as good as claimed in the SRPL
paper [4]. The reason is not fully understood. From the circuit
point of view, however, the node in Fig. 14 carries not only the
wiring capacitance and the inverter gate load, but also the inverter
junction loads. Before the inverter switches, at the same device
width, SRPL actually has the largest loading among these
competing circuits.
From the overall results, DCVSPG is the best
circuit for the lightly loaded (< 0.3 pF) condition if the long
logic chain can be properly terminated at a gate. For a more
robust design situation, DCVSPGB and DPL might be the best
choices. However, from the circuit simplicity point of view,
DPL is proven to be much more complicated. Although the
total device widths are the same, the complex wiring might
ultimately drive DPL into a larger silicon area.
IV. APPLICATIONS
A Viterbi decoder used in the high-performance PRML
read-write channel chip [18] has been designed. The logic
diagram of the Viterbi decoder is shown in Fig. 17. From
this diagram, the bottleneck of the critical path is the 6-b
subtracter unit, which has to meet 3 ns at the worst process
condition. The silicon area is also a big concern for this cost-performance chip. The 6-b subtracter is therefore
implemented using a DCVSPG circuit and the ripple-through architecture. The full subtracter is shown in Fig. 18,
and the computer plot of the overall Viterbi decoder is shown in
Fig. 19. The simulation results of the whole Viterbi macro are
shown in Fig. 20. They show that, at the nominal process,
the Viterbi macro can achieve very high performance, up to
500 MHz, by using the DCVSPG circuit.
As discussed before, the DCVSPG circuit can not only be
used in a static circuit configuration but can also be extended
into the dynamic circuit region [6], [7]. Again, in order to
compare the circuit performance of various design techniques,
the SUM circuit is used for ASTAP simulation. Figs. 21 and 22
show the static and dynamic circuits of the conventional DCVS
design technique. The static and dynamic implementations of
DCVSPG are shown in Figs. 9 and 23. The static CMOS SUM
circuit is shown in Fig. 24.
Fig. 16. ASTAP simulation results for power consumption.
Fig. 17. Logic schematic for the DCVSPG realized Viterbi decoder.
Fig. 18. DCVSPG full adder circuit for Viterbi decoder.
The device width is designed following a basic rule that the
conductance of every discharging path is assumed to be the
same as the conductance of a minimum-size nMOS transistor.
For example, in the conventional DCVS static circuit of Fig. 21,
three transistors are series-connected along the discharge path,
and the transistor width is chosen accordingly. For Fig. 22,
however, the device size is increased further in the
dynamic DCVS circuit. The pMOS device size is chosen to be
twice as large as that of the nMOS device. All the circuits are
laid out using the row-based standard cell library image
with only one level of metal allowed.
Fig. 19. Computer plot of the whole Viterbi decoder.
Fig. 20. ASTAP simulation results for Viterbi decoder.
TABLE I
COMPARISON OF FULL ADDER
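The sizing rule above amounts to simple arithmetic. The sketch below assumes that conductance scales linearly with width and that series devices combine like series resistors; the function names and the 1.0 minimum width are hypothetical and do not reproduce the dimensions used in the paper.

```python
def stack_device_width(n_series, w_min):
    """Width each series device needs so the stack matches one minimum device.

    Assumes conductance proportional to width and series devices that
    combine like resistors, so n devices each need n times the width.
    """
    if n_series < 1:
        raise ValueError("a discharge path needs at least one transistor")
    return n_series * w_min


def equivalent_width(width, n_series):
    """Single-device width with the same conductance as the series stack."""
    return width / n_series


W_MIN = 1.0  # hypothetical minimum width in arbitrary units

w3 = stack_device_width(3, W_MIN)
print(w3, equivalent_width(w3, 3))
```

A taller stack therefore costs width on every device in the path, which is one reason a lower stack height translates into smaller transistors and lower input capacitance.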
The simulation results are shown in Table I. The load
capacitance is set by assuming that the circuit drives a chain of
similar circuits. A fan-out of one is used for the static circuits.
The dynamic circuits, however, are buffered by the C²MOS
latch and are expected to be able to drive larger loads, so a fan-out of
two is used for the dynamic circuits.
Considering the static design first, it appears that the static
DCVSPG has the best power-delay product. DCVSPG has the
lowest logic tree stack height, so its transistor size and
input capacitance are the smallest. The best performance is
solely due to the fact that the pull-up and pull-down are mostly
done by the high-performance nMOS transistors. By contrast,
the pull-up of DCVS and static CMOS is done by the inferior
pMOS transistors. The lowest power consumption is also mostly
due to the shorter stack height and smaller transistor size.
Fig. 21. Static DCVS SUM circuit.
For the dynamic circuits, the power-delay product of the
dynamic DCVSPG is the same as that of its static counterpart.
However, its speed is almost two times faster than that of
the static DCVSPG, at the expense of twice the device
count and silicon area. This leads to almost two times larger
power consumption. The conventional dynamic DCVS also
shows significant performance improvement. This might be
due to the fact that the dynamic DCVS is a ratioless circuit
using the precharge mechanism. But this might be offset by
the larger silicon area, the charge sharing problem, and the complex
clock scheme in the dynamic circuit.
Fig. 22. Dynamic DCVS SUM circuit.
Fig. 24. Static CMOS SUM circuit.
Fig. 23. Dynamic DCVSPG SUM circuit.
V. CONCLUSIONS
This paper describes a new circuit design technique, DCVSPG. The proposed circuit eliminates the floating
node problem which exists in the conventional DCVS design.
This gives DCVSPG better performance and lower power
consumption. We also demonstrate a simple way to
synthesize a complex logic function into a pass-gate design by
using the recursive Karnaugh map.
Compared with the existing design techniques, DCVSPG
is the best of all in the light capacitive load range.
When using DCVSPG, properly terminating a long logic chain
at a gate is essential. For a more robust design,
however, DCVSPGB might also be the best choice. CPL is not
a full-swing signal circuit, and DPL suffers from complex
wiring and design. A Viterbi macro shows the possibility of
using DCVSPG to achieve a very high performance design.
With the extension into the dynamic circuit design region,
DCVSPG also shows the best power-delay product.
ACKNOWLEDGMENT
The authors are indebted to Dr. D. Tang and Dr. M. Chen for
their encouragement and support during the course of the Viterbi
decoder design.
REFERENCES
[1] M. Annaratone, Digital CMOS Circuit Design. New York: Kluwer,
1986.
[2] K. Yano, T. Yamanaka, T. Nishida, M. Sato, K. Shimohigashi, and A.
Shimizu, “A 3.8 ns CMOS 16 × 16 multiplier using complementary
pass-gate transistor logic,” IEEE J. Solid-State Circuits, vol. 25, pp.
388–395, 1990.
[3] M. Suzuki, N. Ohkubo, T. Yamanaka, A. Shimizu, and K. Sasaki, “A
1.5 ns 32b CMOS ALU in double pass-transistor logic,” in Dig. Tech.
Papers, ISSCC, 1993, pp. 90–91.
[4] A. Parameswar, H. Hara, and T. Sakurai, “A high speed, low power,
swing restored pass-transistor logic based multiply and accumulate
circuit for multimedia application,” in Proc. IEEE CICC, 1994, pp.
278–281.
[5] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, “Cascode
voltage switch logic: A differential CMOS logic family,” in Dig. Tech.
Papers, ISSCC, 1984, pp. 16–17.
[6] K. M. Chu and D. I. Pulfrey, “A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional
logic,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 528–532, 1987.
[7] K. M. Chu and D. I. Pulfrey, “Design procedures for differential cascode voltage switch
circuits,” IEEE J. Solid-State Circuits, vol. SC-21, pp. 1082–1087, 1986.
[8] R. K. Montoye, “Testing scheme for differential cascode voltage switch
circuits,” IBM Tech. Disc. Bull., vol. 27, pp. 6148–6152, 1985.
[9] F. S. Lai and W. Hwang, “Differential cascode voltage switch with pass-gate logic tree for high performance CMOS digital systems,” in 1993
Int. Symp. VLSI Technology Systems Applications, 1993, pp. 358–362.
[10] R. H. Krambeck, C. M. Lee, and H. S. Law, “High-speed compact
circuits with CMOS,” IEEE J. Solid-State Circuits, vol. SC-17, pp.
614–619, 1982.
[11] L. C. Pfennings, W. J. Mol, J. J. Bastiaens, and J. M. VanDijk,
“Differential split-level CMOS logic for subnanosecond speeds,” IEEE
J. Solid-State Circuits, vol. SC-20, pp. 1050–1055, 1985.
[12] S. M. Sze, Physics of Semiconductor Devices. New York: Wiley, 1981.
[13] S. Whitaker, “Pass-transistor networks optimize n-MOS logic,” Electron., pp. 144–148, 1983.
[14] D. Radhakrishnan, S. Whitaker, and G. K. Maki, “Formal design
procedures for pass-transistor switching circuits,” IEEE J. Solid-State
Circuits, vol. SC-20, pp. 531–536, 1985.
[15] D. Winkel and F. Prosser, The Art of Digital Design. Englewood Cliffs,
NJ: Prentice-Hall, 1980.
[16] S. Feng, U.S. Patent 4 620 117, 1986.
[17] M. Afghahi, “A robust single phase clocking for low power, high-speed
VLSI applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 247–254,
1996.
[18] R. A. Richetta, C. J. Goestchel, R. A. Green, R. A. Kertis, R. A.
Philpott, T. J. Schmerbeck, D. J. Schulte, and D. P. Swart, “A 16
MB/s PRML read/write data channel,” in Dig. Tech. Papers, ISSCC, 1995,
pp. 78–89.
Fang-shi Lai received the B.S. degree in 1971 from
National Cheng Kung University, Tainan, Taiwan,
the M.S. degree in 1977 from National Taiwan
University, and the Ph.D. degree in 1980 from the
University of Florida, Gainesville, all in electrical
engineering.
After receiving the B.S. degree, he served as a
Technical Officer in the Chinese Army from 1971 to
1973. From 1973 to 1975, he was a Technical Staff
Member in the Chinese Telecommunication Bureau
in the field of communication switching. From 1980
to 1982, he joined the Harris Semiconductor Corporation, Melbourne, FL, as
an Associate Principal Engineer, where he was active in the advanced CMOS
technology development, device physics, process and device simulation. From
1982 to 1985, he served as a Research Staff Member at the IBM T. J.
Watson Research Center, Yorktown Heights, NY, where he was involved in the
development of advanced CMOS technology, devices, and process modeling.
He was an Advisory Engineer in the IBM General Product Division, San
Jose, CA, where his responsibilities were the analog and digital circuit design
for the advanced disk products from 1985 to 1987. In 1987, he returned
to the T. J. Watson Research Center as a Research Staff Member and was
actively involved in the VLSI circuit design, computer algorithms, computer
architecture, and digital signal processing. From 1993 to 1996, he moved to
the IBM Almaden Research Center, San Jose, CA, where he was actively
involved in the mixed-signal circuit design, design methodology, and high-speed, low-power circuit design. He is now with the EPIC Design Technology,
Inc., San Jose, CA, where he is involved in the circuit analysis and design
methodology.
Wei Hwang (S’68–M’69–SM’90) received the
B.Sc. degree from National Cheng-Kung University,
the M.Sc. degree from National Chiao-Tung
University, Taiwan, R.O.C., and the M.Sc. and
Ph.D. degrees from the University of Manitoba,
Canada.
He is currently a Research Staff Member at
IBM T. J. Watson Research Center in Yorktown
Heights, NY, and an Adjunct Professor of Electrical
Engineering at Columbia University, New York,
NY. Prior to that he was an Associate Professor
of Electrical Engineering at Columbia University and an Assistant Professor of
Electrical Engineering at Concordia University, Montreal. He has contributed
to several areas of microelectronics, VLSI design, and submicron CMOS
technology. His innovative circuits and architecture designs led to the
development of the first high-speed CMOS 1-Mb DRAM. He has also
been involved in the early technology development, memory cell, and sensing
circuit designs for the IBM CMOS 4, 16, 64, and 256 Mb DRAM research and
development programs. Later, he worked on high-performance self-resetting
CMOS (SRCMOS) circuit designs for the IBM PowerPC 630 microprocessor
development program. Currently, he is working on a merged logic/DRAM
0.25-µm CMOS technology for integrated-systems-on-a-chip (ISOC) project.
He holds 32 U.S. patents and has published more than 85 technical papers
in the areas of semiconductor technologies, materials, devices, and VLSI
logic and memory circuits. He has also co-authored a book entitled Electrical
Transport in Solids—With Particular Reference to Organic Semiconductors
(Pergamon Press, Oxford 1981).
Dr. Hwang has been awarded 11 IBM Invention Achievement Awards and
three IBM Research Division Technical Awards. He was recognized as one
of IBM’s top inventors in 1991 and 1994. He was President of the Chinese
American Academic and Professional Society (CAAPS) in 1986, Chairman
of the Board of Directors of CAAPS from 1988 to 1990, and a member of
the Board of Directors of CAAPS from 1986 to 1991 and 1994 to 1996.
He now serves as a member of the Governing Board and as treasurer of the
Chinese Language Computer Society, 1993 to 1998. He is a member of the
New York Academy of Science, the American Physical Society, Sigma Xi,
and Phi Tau Phi Society. He received the 1985 Outstanding CAAPS Service
Award, the 1992 Courvoisier Leadership Award, and the 1995 CAAPS 20th
Anniversary Special Service Award.