The Effectiveness of Stacked-VDD on Different Full Adder Topologies without Voltage Regulation
by Wenlu Sun & Jiyati Verma
ECE 6332 – Fall 2011
Typically, supply noise does not lessen with decreasing VDD,
contributing to the already growing problem of power
density. One way to scale VDD and suppress power
dissipation is to stack circuits that run at multiples of original
supply voltage. The limitation of this technique is the need
for voltage regulation of the middle node on the output side
of the circuit: the more this voltage point strays from VDD/2,
the harder it is to regulate the circuit. Because designing
robust voltage regulators is difficult, supply noise plays a
large role in how effective this technique is. For our project,
we experimented with full adder topology, implementing 6
single-bit blocks and cascading them into 16-bit adders. To
be able to make some sort of generalization as to how good
different families of adders are at keeping the middle node at
VDD/2, two types of static adders, two transmission logic
adders, and two dynamic adders were assessed. Simulation
results revealed that static adders stray the least from VDD/2.
Keywords: Stacked-VDD, VDD stacking, multi-story power
delivery, Cadence, Adder Topology, Static Adders,
Transmission Logic Adders, NP CMOS Dynamic Adders
Ideally, the voltage drop is always the same across both the
bottom half and top half of the circuit, and the currents passing
through the two halves are also equal at all times. In
reality, however, different input signals arriving at different
times can cause some circuits to draw more current than
others. This can cause each of the Vmiddle nodes shown in
Figure 1 to wobble off the ideal midpoint VDD/2. As we
will see later, the stability of this middle node intrinsically
determines whether the overall circuit balances current well.
Consequently, practical circuit-level definitions of VDD
stacking involve closely regulating the voltage of each middle
node. This important current balancing strategy pins each of
those nodes to the correct fractions of the supply voltage. But
designing robust voltage regulators is difficult, and if the
currents are not balanced, the difference flows through the
voltage regulators [6]. Not only does this result in more power
wastage, it also worsens power supply noise because it
increases the current through external power grids [6]. So any
way to alleviate that electrical strain would contribute to better
Reducing power supply voltage is necessary as industry
scales the width of transistors to smaller sizes, approaching
the fundamental limit of CMOS technology. However, with
increasing levels of operating current and higher switching
speeds, scaling down transistors worsens power supply noise.
This trend in power supply noise harms circuit reliability,
power density, and overall system functionality. In the event
that decoupling capacitance is not cushioning the effects of
power supply noise, such noise has the potential to contribute
to excess power dissipation. Exacerbating the power
dissipation crisis is definitely unacceptable, as circuits are
approaching the power density of a nuclear reactor [3].
The conventional power delivery method is not robust against
supply noise, whereas multi-story power delivery effectively
reduces it [3]. Multi-story power delivery, or VDD
stacking, is where n independent circuits, originally powered
by n different power supplies, are grouped into one circuit
loop, powered by a single power source of nVDD, and
assigned n different voltage domains [6]. Typically, on a
chip, “independent” circuits are hooked up to the same power
supply, so in reality this means all those different circuits are
in parallel with the same singular power supply. Figure 1
illustrates this transformation for possibly complex and different
circuits. In effect, this technique allows us to reuse the
current of one circuit in each of the others. The benefit of
doing so is that it reduces power supply current to 1/n of the
original value, lowering power supply noise [6].
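The current-reuse argument can be made concrete with a small numerical sketch. It assumes n identical, perfectly balanced blocks that each draw the same current; the block current and grid resistance used here are hypothetical values chosen only for illustration, not measurements from this project.

```python
import math

def supply_current(block_current_a, n_blocks, stacked):
    """Current drawn from the supply by n identical blocks (idealized).

    Conventional delivery puts the blocks in parallel, so their currents add.
    Stacking puts them in series, so the same current is reused by each block.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    return block_current_a if stacked else block_current_a * n_blocks

# Hypothetical illustration values (assumptions, not data from the paper).
VDD = 1.1          # volts per block
N = 2              # two-story stack, as in Figure 2
I_BLOCK = 0.010    # amperes drawn by each block
R_GRID = 0.5       # ohms of parasitic supply-grid resistance

i_conventional = supply_current(I_BLOCK, N, stacked=False)
i_stacked = supply_current(I_BLOCK, N, stacked=True)

# Power delivered to the blocks is unchanged: N*I at VDD versus I at N*VDD.
p_conventional = i_conventional * VDD
p_stacked = i_stacked * (N * VDD)

# The IR drop across the grid scales with the supply current.
ir_conventional = i_conventional * R_GRID
ir_stacked = i_stacked * R_GRID

print(f"supply current ratio: {i_stacked / i_conventional:.2f}")
print(f"IR drop: {ir_conventional * 1e3:.1f} mV -> {ir_stacked * 1e3:.1f} mV")
```

In this idealized model the block power is identical in both schemes, while the supply current (and with it the IR drop across the grid) falls to 1/n of its conventional value.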
Figure 1- VDD stacking applied to n blocks of logic. Multiple
Vmiddle nodes must each be carefully voltage regulated in
reality, making this a more complex solution.
power delivery. Overall, limiting how much noise the
voltage regulator experiences would increase the
effectiveness of the VDD stacking technique. Thus, our central
focus is to simulate and record what noise appears at that
middle node without voltage regulation, because less noise
means the voltage regulator needs to be less robust.
The general super-circuit outline we adhered to was the one
illustrated by Figure 2, with added functionality. We wanted
a circuit that could function as two separate 2-digit 16-bit
binary adders computing sums independently, and our
project aimed to determine which full adder topology would
produce the least noise at Vmiddle for a given set of input
vectors. In the methodology, we outline the circuit topologies
of those adders and, for the sake of simplicity, the single input
pattern change we applied. By doing so, we hope to gain some insight into
which circuit family is inherently better at balancing current.
1.1 Literature Review
VDD Stacking
Our work largely builds off the findings of Gu and Kim: their
simulation results showed a 66% reduction of IR noise and a
5% reduction of total power consumption when a 2-story
architecture’s metrics were compared to the respective values
for the conventional power delivery scheme [3]. Power
used by the circuit blocks stays the same, while the power
dissipated in the parasitic resistors is reduced [3]. These
seemingly small returns make a big difference in VLSI
systems [3]. However, this research group included some
voltage regulation in their simulation, so our
results may not be immediately applicable to their work.
The findings of other sources determined how our
stacked-VDD circuit should look. For example, one source,
which stacked multipliers, indicated that stacked circuits that
are balanced and independent can help achieve higher system
efficiency [4]. Based on this finding, we decided to
use the same adder for the top and bottom block in each
setup that computed “different” sums. References [7] and [8]
make our findings applicable to other uses of VDD stacking,
since module assignment and pin packing also influence the
effectiveness of VDD stacking.
In VLSI, a whole universe of adder topologies optimized for
various purposes already exists. Much work focuses on
speeding up adders: streamlining the data arrival, tweaking
the critical path, and opting for tree structures whenever possible.
Since there was no way to do all of them justice within the
course of a semester and within the span of such a project,
we took a simplified approach to this grand variety of
adder topologies. In particular, we focused our interest on 3
different families of adders: static CMOS adders,
transmission logic adders, and dynamic adders. We drew as
much as we could from the textbook on this matter [6].
Reference [1] focused on optimizing full adder block
circuitry to minimize the power-delay product (PDP) and on
gauging how sensitive these topologies are to scaling VDD
for static adders, transmission logic adders, and one dynamic
adder: their comprehensiveness is commendable. Reference
[5] seeks to minimize the PDP for dynamic and transmission
logic adders by making them faster. References [1] and [5]
served as sources for adder topologies for this project.
Reference [9] gave some insight into the energy efficiency of
different low-power logic styles of adders, drawing from
transmission logic and static CMOS logic.
Figure 2- The two-level stacked-VDD circuit applied to each
adder topology. The individual adders each work off of 1.1 V,
so in VDD stacking, 2.2 V is required. For each topology, we
are concerned with the amount of noise that naturally appears
at Vmiddle with respect to ground.
A study involving VDD stacking and adders proved hard to
find. We did find a paper that involved SRAM banks and VDD
stacking, which discussed implications for reducing Vmiddle’s
fluctuations around VDD/2 [2]. Other than that, none of the
papers cited were concerned with both topics: for the most
part, they belong to one or the other, only intersecting at the
concern of reducing power dissipation [9]. Herein lies the
novelty of our work.
1.2 Problem Statement
As a whole, this becomes a complex problem but it is highly
dependent on two things: (1) the topology of the inner circuits
and (2) the input patterns supplied to each of them. Our
approach deals with simplifying the possibly complicated
effect of these two factors.
The topology of the inner circuits determines how current is
balanced. In class, we have witnessed how different styles of
circuitry utilize and reroute current in fundamentally different
ways, to an extent. Static CMOS logic has “opposite” circuits
in the pull-up and pull-down networks, always clearly
connects the output to VDD or ground, and has good switching
power characteristics. Transmission logic’s transmission gates
act like switches and thus open up new possibilities, but they
have the worst subthreshold leakage characteristics. Dynamic
logic only uses a pull-down network, has “Evaluate” and
“Precharge” modes, is not always so clearly linked to either
VDD or ground, and often has the worst switching power.
Dynamic logic is attractive for high-speed applications. But
large clock loads and the “Precharge” mechanism result in
high power dissipation. Choosing speed as the main purpose of
these circuits inherently compromises the robustness of the
generated output signal. Using this piece of information, we
may hypothesize that either the transmission logic adders or
the static CMOS adders will result in less noise at Vmiddle. An
appropriate simplification here is that if the circuits had the
same topology, this may grant the overall circuit better
current-balancing properties [4]. We implemented and
expanded this circuit (see Figure 2 and Figure 3) in Cadence
using six different topologies: the CMOS static adder, the Mirror
Adder (static), the transmission function full adder, the
transmission gate adder, and two different varieties of NP CMOS
dynamic adder. For each full adder topology, we took a
single-bit full adder block and cascaded it to 16 bits. Using
Cadence, we simulated stacked-VDD with each of these
different topologies. In particular, we aimed to see which of
the adder topologies naturally balanced the current coming out
of the middle node. We measured their ability to
inherently balance current by measuring the deviation of
Vmiddle from VDD/2 = 1.1 V.
Figure 3- Cadence schematic of 16-bit adder implementation.
The input patterns supplied to the top and bottom adders had
to be independent to give the circuit better current balancing
properties [4]. For measuring the deviation of Vmiddle from
VDD/2 = 1.1 V, we want to analyze the worst-case noise, not
the worst-case delay. After some preliminary simulations
with the Static CMOS and Mirror adder circuits, we observed
that the greatest amount of noise resulted when all the inputs
changed at the same time, and applied the same reasoning to
the rest of the full adder implementations. The Methodology
discusses this experimental design choice in further detail.
In general, we claim that the static adders will outperform
the other topologies because they have the most transistors.
his intuition for how the adders worked. When implementing
the adders, Wenlu sized all the transistors appropriately, based
on the principles we learned in class. For the dynamic full
adders, the clock signal consisted of 10-nanosecond pulses in a
50/50 duty cycle.
Previous work revealed that if the two blocks of circuitry are
symmetric but independent, then the overall circuit is able to
balance current better [4]. Based on our preliminary
simulations, we can safely claim that when the inputs of FA0 and
FA1 swing in opposite directions, the greatest amount of
noise occurs at the output, causing the most imbalanced Vmiddle
conditions. So Wenlu chose to change all three inputs at once,
and did his best to apply different voltage transitions to
each. But because there are more than 2 inputs (A, B, C) and
only two changes possible (1 to 0 or 0 to 1), some sort of overlap
had to be made. Hence, as Figure 4 shows, Wenlu chose that
A0 would go from 1 to 0, B0 from 0 to 1, A1
from 0 to 1, and B1 from 1 to 0. All of the A’s and B’s
undergo this transition for all 16 blocks. Wenlu decided to
simplify the 3-input vector stimuli into 2-input vectors by
supplying the same signal as A to the Cin of the first block, to
ensure that the adders were computing completely different
sums for the two cycles. But all the subsequent Cin’s were
hooked up to the appropriate Cout’s. This setup was applied to
both the top and bottom block of logic. For dynamic logic, the
clock’s phase is offset from that of the input signals. Thus, the
transient output graphs generated and analyzed in the Results
portion of this paper are the effect of the aforementioned input patterns on
Using Cadence, we simulated the effect of VDD stacking on a
variety of 16-bit adder topologies, shown above. Figure 3
shows how the adders’ carry out signals rippled onto the
following adder block. A few notes must be made with
respect to the circuits we have used. Without stacking, we
used transistors that would operate at a VDD of 1.1 V. With VDD
stacking, the two tiers of adders need one 2.2 V
voltage source. Without deliberate voltage regulation, the
middle node of the circuit on the output side is free to float
with respect to ground. In fact, since there are 16 adders in
parallel, there are actually 16 independent floating Vmiddle
points. Because these adder blocks are in parallel and see the
same loads to the left and right, we assume that all 16 of
those nodes undergo the same supply noise fluctuations.
Wenlu made many independent decisions about the practical
circuitry concerns, as he was in charge of the Cadence
simulations. He chose the adder topologies he did based on
Figure 4-Input vectors applied to all 6 circuit topologies. In
Cadence, the effect of this input signal on 16 cascading full
adder blocks is simulated. The first block’s Cin = A, all the
following Cin = Cout from the previous block as in Figure 3.
(a) Circuit diagram for static CMOS full-adder topology
(CFA), with 28 transistors per block. As seen in [1], [6].
(b) Circuit diagram for mirror full-adder topology (MFA),
with 28 transistors. As seen in [1], [6].
(c) Circuit diagram for transmission function adder (TFA),
with 16 transistors per block. As seen in [5].
(d) Circuit diagram for transmission gate adder (TGA), with
14 transistors per block. As seen in [5].
(e) Circuit diagram for dynamic NP CMOS adder topology
(Dyn1), with 22 transistors per block (4 more than what is
shown here to account for the NOT’ed outputs). Each adder
block only implemented the highlighted block. As seen in the
1st ed. of our textbook.
(f) Circuit diagram for an adjusted dynamic NP CMOS adder
topology (Dyn2), to combat the charge sharing problem using
as few transistors as possible. This implementation actually
required 17 transistors per block, 2 more than what is shown
here to account for the NOT’ed Sum. As seen in [5].
Figure 5- The 6 Adder Topologies used. The static adders
are (a) the Static CMOS adder (CFA) and (b) the Mirror adder
(MFA). The transmission logic adders are (c) the Transmission
Function adder (TFA) and (d) the Transmission Gate adder
(TGA). The dynamic NP CMOS adders are (e) and (f).
(a) Cadence Schematic of the CFA block.
(b) Cadence Schematic of the MFA block.
(c) Cadence Schematic of the TFA block.
(d) Cadence Schematic of the TGA block.
(e) Cadence Schematic of the Dyn1 block.
(f) Cadence Schematic of the Dyn2 block.
Figure 6- Schematics of the Cadence implementation of the
6 Adder Topologies used. Again the static adders are (a) and
(b), the transmission logic adders are (c) and (d), and the
dynamic NP CMOS adders are (e) and (f).
Instead of trying to judge whether static, dynamic, or
transmission gate adders are naturally better at balancing
current using just one type of each, we did two of each type
to get a slightly better idea of how these three particular
families of adders balance current. The topologies
implemented were CMOS static full adder [1],[6], Mirror
Full Adder (MFA) [1],[6], transmission function full adder
(TFA) [5], and transmission gate adder (TGA) [5]. As for
dynamic adders, two different NP CMOS dynamic adders
were used: one from the book [6; 1st ed.], and the other
from a paper [5], rearranged to minimize charge sharing without
increasing the number of transistors. On the previous pages,
Figure 5 shows the more readable circuit schematics for all
of the 6 adders. Figure 6 contains the less readable
screenshots of the circuit schematic implementations for each
of those adders. Thus, in total, using the input vector
patterns listed above, we tested 6 different circuits.
After implementing each of these adders, Wenlu double-checked
to ensure that each of these adder blocks correctly
calculated the sum and carry out functions, and they all did
work correctly.
Transient output generated for ABC going from 101 to 010.
When this input pattern is applied, the sum output should
transition from 0 to 1, and the carry out output should
transition from 1 to 0: this should occur for all of these logic
blocks, regardless of logic style, because this is the definition
of a full adder. Figure 7 shows graphs of the input
and output signals; these graphs do not all look the
same. One thing to note in Figure 7 is that for the dynamic
adders, although the clock pulse is the same length as the
input pulse, they are out of phase by 4 nanoseconds.
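The expected logic values can be checked with a short behavioral sketch. It models only the Boolean function of a full adder and the ripple connection described in the methodology (first Cin tied to A, each later Cin taken from the previous Cout); it says nothing about the transistor-level circuits simulated in Cadence, and the function names are illustrative assumptions.

```python
def full_adder(a, b, cin):
    """Return (sum, carry_out) of a one-bit full adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_blocks(a_bit, b_bit, n_bits=16):
    """Apply the same A and B to every block, with the first Cin tied to A.

    Returns one (a, b, cin, sum, cout) tuple per block, in ripple order.
    """
    rows = []
    cin = a_bit  # first block: Cin = A
    for _ in range(n_bits):
        s, cout = full_adder(a_bit, b_bit, cin)
        rows.append((a_bit, b_bit, cin, s, cout))
        cin = cout  # following blocks: Cin = previous Cout
    return rows

before = ripple_blocks(a_bit=1, b_bit=0)  # first input vector
after = ripple_blocks(a_bit=0, b_bit=1)   # second input vector

for label, rows in (("before", before), ("after", after)):
    a, b, cin, s, cout = rows[0]
    same = all(row == rows[0] for row in rows)
    print(f"{label}: ABC={a}{b}{cin} sum={s} cout={cout} all blocks equal: {same}")
```

Under these assumptions every one of the 16 blocks sees ABC change from 101 to 010, so each sum goes from 0 to 1 and each carry out from 1 to 0.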
The following graphs, in Figure 8, show the voltage noise at
the central node, Vmiddle. Besides the smooth RC curves
showing the discharging and charging of internal nodes,
other noise exists in different places, and each of these
adders exhibited a unique signature. More activity at the
bottom level lowers Vmiddle, whereas increased activity in the top
level raises Vmiddle above VDD/2 [4]. So the location of
Vmiddle can prove very telling as to which circuit block is at
work, and where to look for the action. The shape of these
curves aside, the following analysis focuses on the very
maximum and minimum points of each adder’s Vmiddle
curves, given in Figure 8.
3.1 CMOS Static Adder (CFA)
In Figure 8(a), we observe how the transient response
ranges from 1.205 to 1.050 V. With respect to VDD/2, that
is 0.105 V above 1.1 V and 0.050 V below it. Percentage-wise,
that is 9.55% above and 4.55% below, on average 7.05%.
Some very robust voltage regulation could definitely control
this much deviation from VDD/2. It is safe to say that this adder did the best.
3.2 Mirror Adder (MFA)
Given that this adder is so symmetric that it can almost literally
be folded on itself, we chose this adder because we thought it
might actually outperform the CFA in current balancing.
But with a transient response that went from 1.148 to 0.985
V (0.048 V above and 0.115 V below VDD/2), it turns out that
was not so. Vmiddle ran 4.36% above and 10.5% below VDD/2,
but it still ranked among the best adders at an average of
about 7.4% deviation.
3.3 Transmission Function Adder (TFA)
The transient response for the TFA ranged from 1.295 to 1.049
V (0.195 V above and 0.051 V below VDD/2). This adder is the
best out of the transmission logic adders we tested, although
better transmission logic adders may exist. Unfortunately, as
we will see later, even the most robust voltage regulation
might have trouble reining in the runaway voltage peaks that
deviate 17.7% above and 4.64% below 1.1 V.
3.4 14-Transistor Transmission Gate Adder
Spanning 1.365 V to 0.790 V, the transient response of this
adder does poorly because it goes 0.265 V above and 0.310
V below VDD/2. Going 24.1% above and 28.2% below 1.1 V,
this adder is well beyond the hope of voltage regulation.
The low number of transistors may be to blame.
3.5 NP CMOS Dynamic Adder (Dyn1)
For the first dynamic adder, the transient response goes from
1.335 to 0.7 V. That is 0.235 V above and 0.4 V below VDD/2,
or 21.4% above and 36.4% below. On average, this
adder did the absolute worst at staying at VDD/2. Charge
sharing is definitely a problem.
3.6 Adjusted NP CMOS Dynamic Adder
The adjusted dynamic adder did better because the voltage
only swung from 1.406 to 0.861 V. Going 0.306 V (27.8%)
above 1.1 V and 0.239 V (21.7%) below it, this was pretty good
for a dynamic adder: on average, it outperforms the TGA, with
only 17 transistors. But making
an adder that combats charge sharing may not be worth it
because it still does not rein in Vmiddle to the point that it could
be voltage regulated.
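The percentages quoted in Sections 3.1 to 3.6 follow from the extreme values of each Vmiddle curve. As a sketch of that arithmetic, the snippet below takes the maximum and minimum voltages reported above and expresses them relative to VDD/2 = 1.1 V; the helper name and the use of a simple mean of the two percentages as a single figure of merit are assumptions made for illustration.

```python
V_HALF = 1.1  # VDD/2 in volts for the 2.2 V stacked supply

# (max, min) of Vmiddle in volts, as reported in Sections 3.1 to 3.6.
EXTREMA = {
    "CFA": (1.205, 1.050),
    "MFA": (1.148, 0.985),
    "TFA": (1.295, 1.049),
    "TGA": (1.365, 0.790),
    "Dyn1": (1.335, 0.700),
    "Dyn2": (1.406, 0.861),
}

def deviation_percent(v_max, v_min, v_ref=V_HALF):
    """Return (percent above, percent below, mean of the two) relative to v_ref."""
    above = (v_max - v_ref) / v_ref * 100.0
    below = (v_ref - v_min) / v_ref * 100.0
    return above, below, (above + below) / 2.0

# Rank the adders from smallest to largest mean deviation.
ranking = sorted(EXTREMA, key=lambda name: deviation_percent(*EXTREMA[name])[2])

for name in ranking:
    above, below, mean = deviation_percent(*EXTREMA[name])
    print(f"{name:5s} above {above:5.2f}%  below {below:5.2f}%  mean {mean:5.2f}%")
```

With this figure of merit the two static adders show the smallest mean deviation and Dyn1 the largest, which matches the ordering discussed in the text.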
In general, the static CMOS full adders did the best at
balancing current. Let’s see why. The following equation
quantifies how deviations from VDD/2 can depend on activity
factor, and capacitances [4].
The static adders did the best at current balancing because
either the pull-up or the pull-down network kept the output
point engaged. The static adders gain stability by utilizing their
delay in the right way. The transmission gate adders did not
have charge sharing issues. The dynamic adders did the worst
at current balancing. For them, by making the unit fast, we
compromise the stability of the middle voltage point by far.
Charge sharing played a significant role in Dyn 1.
In reality, even the most robust voltage regulators can handle
about +10% to -25% deviation off the point of regulation on
the input side.
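To relate that tolerance to the measurements, the sketch below checks each adder's Vmiddle extremes against a +10% / -25% window. Applying the window directly to the deviation of Vmiddle from VDD/2 is an assumption made here for illustration, as are the helper and constant names; the voltages are the ones reported in Sections 3.1 to 3.6.

```python
V_HALF = 1.1           # VDD/2 in volts
MAX_ABOVE_PCT = 10.0   # assumed upper tolerance, from the "+10%" figure
MAX_BELOW_PCT = 25.0   # assumed lower tolerance, from the "-25%" figure

# (max, min) of Vmiddle in volts, as reported in Sections 3.1 to 3.6.
EXTREMA = {
    "CFA": (1.205, 1.050),
    "MFA": (1.148, 0.985),
    "TFA": (1.295, 1.049),
    "TGA": (1.365, 0.790),
    "Dyn1": (1.335, 0.700),
    "Dyn2": (1.406, 0.861),
}

def within_window(v_max, v_min, v_ref=V_HALF):
    """True if both extremes stay inside the assumed tolerance window."""
    above = (v_max - v_ref) / v_ref * 100.0
    below = (v_ref - v_min) / v_ref * 100.0
    return above <= MAX_ABOVE_PCT and below <= MAX_BELOW_PCT

regulable = [name for name, (hi, lo) in EXTREMA.items() if within_window(hi, lo)]
print("inside the assumed window:", regulable)
```

Under this reading only the two static adders stay inside the window; every other topology overshoots the upper limit.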
(a) I/O graph for CFA. Notice the noise is exactly mirrored
for the top and bottom tiers.
(b) I/O graph for MFA. Notice the noise is exactly mirrored
for the top and bottom tiers.
(c) I/O graph for TFA. Notice the smooth RC curves that
occur during switching.
(d) I/O graph for TGA. Notice, this adder is very noisy while
switching, especially during 0 to 1 transitions.
(e) I/O graph for Dyn1, with the “Precharge” phase labeled.
Charge sharing problems visible in this graph.
(f) I/O graph for Dyn2, with the “Precharge” phase labeled.
Charge sharing has been prevented & transitions look fast and
nearly perfect.
Figure 7- Input and Output signals for all 6 adders. Red
dashed line is A0, orange dashed line is A1, pink dashed line is
Cout0, and green dashed line is Cout1, as per Figure 4.
(a) Vmiddle graph for CFA.
(b) Vmiddle graph for MFA.
(c) Vmiddle graph for TFA.
(d) Vmiddle graph for TGA.
(e) Vmiddle graph for Dyn1, with the “Precharge” phase labeled.
(f) Vmiddle graph for Dyn2, with the “Precharge” phase labeled.
Figure 8- Vmiddle signals for all 6 adders, illustrating the
deviations from VDD/2 = 1.1 V.
Reducing power supply noise is necessary to prevent
worsening the already acute problem of power dissipation in
circuitry today. To alleviate the pin limitation problem, a
stacked-VDD circuit paradigm has recently been proposed in
the literature. To use VDD stacking, usually a voltage
regulator needs to maintain the voltages at the appropriate
fractions of the supply voltage. But building an effective
voltage regulator poses an additional problem: it is difficult
to make a robust one so we should see what can be done to
limit how much this middle node deviates from VDD/2.
However, for a circuit designed using this paradigm, a
significant amount of power may be wasted if modules are
not strictly assigned to different VDD domains. Instead of
testing the limits of voltage regulation, for the purpose of this
project, by using VDD stacking without voltage regulation, we
kept the middle node floating, to quantify the deviations. In
particular, we aimed to see which of the adder topologies
balanced the current coming out of the middle node
naturally. And we found that static adders had the most hope
for use in stacked-VDD schemes.
A few concerns fell outside of the scope of our project. We
did not explore the worst-case scenario of vastly different top
and bottom blocks, to avoid the complication of top/bottom block
assignment. We
decided to look at neither energy nor performance
optimization. We did not scale VTh with VDD, considering that
the effective VDD per circuit is still roughly the same. We
also did not find enough time to simulate the effect of
random signal inputs on the middle node’s voltage stability.
We did not know how to verify whether connecting Cin to A was
justified: although it was a design choice made by Wenlu,
we do not believe it gives the worst-case noise for all 6
topologies.
Further work could seek to investigate each of the
aforementioned concerns not addressed by the work
presented in this paper. For example, discovering what input
vector pattern makes the worst-case noise would be an
important step in the right direction. We could cascade two
or more 16-bit adders for each topology. This would
probably give a more realistic idea of how these circuits
balance current overall. Or, we could make this into a 4-digit
16-bit adder if we coordinated the 16-bit adder above with
the one below. This would probably require up or down
converters for the pertinent inputs and outputs. Potentially,
particular assignment of different adders in the top and
bottom tiers of the circuit could help improve the energy-delay
location of the circuit with respect to the Pareto
optimal curve. Typical modern microprocessors and DSP
chips contain multiple processing units (ALU, multiplier,
memory blocks, etc.) for concurrent execution, so something
to that effect could also reflect the usage of this technique
more realistically.
Wenlu Sun did all the Cadence simulations, and made most of
the experimental design decisions. Jiyati Verma did the Wiki
page, most of the documentation, and the slides for the
presentation. It is fair to say that both did 50% of the project.
[1] Alioto, M., and Palumbo, G. Analysis and Comparison on
Full Adder Block in Submicron Technology. IEEE
Transactions on VLSI Systems, 10, 6 (Dec. 2002), 806-823.
[2] Cabe, A. C., Qi, Z., and Stan, M. R. Stacking SRAM
Banks for Ultra Low Power Standby Mode Operation. In
Design Automation Conference (DAC 2010) (Anaheim,
CA, June 13-18, 2010), ACM press, New York, NY,
2010, 699-704.
[3] Gu, J., and Kim, C. H. Multi-Story power delivery for
supply noise reduction and low voltage operation. In
Proceedings of the International Symposium on Low
Power Electronics and Design (ISLPED 2005) (San
Diego, CA, August 8-10, 2005), ACM press, New York,
NY, 2005, 192-197.
[4] Liu, C. Voltage Regulation of CMOS Stacked Digital
Circuits. Technical Report from Department of Electrical
Engineering, Eindhoven University of Technology, The
Netherlands, August 30, 2010.
[5] Mirzaee, R. F., Moaiyeri, M. H., and Navi, K. High Speed
NP-CMOS and Multi-Output Dynamic Full Adder Cells.
International Journal of Electrical and Electronics
Engineering, 4, 4 (Spring 2010), 304-310.
[6] Rabaey, J. M., Chandrakasan, A., and Nikolic, B. Digital
Integrated Circuits: A Design Perspective. 2nd ed.
Prentice Hall, Upper Saddle River, NJ, 2003. (pp. 565-567)
[7] Zhan, Y., and Sapatnekar, S. S. Automated Module
Assignment in Stacked-VDD designs for high-efficiency
power delivery. ACM Journal on Emerging Technologies
in Computing Systems, 4, 4, (October 2008), 18:1-18:20.
[8] Zhan, Y., Zhang, T., and Sapatnekar, S. S. Module
Assignment for Pin-Limited Designs under the Stacked-VDD
Paradigm. In Proceedings of the 2007 IEEE/ACM
International Conference on Computer-Aided Design
(ICCAD 2007), (San Jose, CA, November 4-8, 2007),
ACM Press, New York, NY, 2007, 656-659.
[9] Zimmermann, R., and Fichtner, W. Low-Power Logic
Styles: CMOS Versus Pass-Transistor Logic. IEEE
Journal of Solid-State Circuits, 32, 7, (July 1997), 1079-1090.