Download A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
LETTERS
Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009
A Novel Power Delay Optimized 32-bit Parallel
Prefix Adder For High Speed Computing
P.Ramanathan 1, P.T.Vanathi 2
1
PSG College of Technology / Department of ECE, Coimbatore, India.
Email: [email protected]
2
PSG College of Technology / Department of ECE, Coimbatore, India.
Email: [email protected]
represent generate and propagate signals used in addition.
The output of the operator is a new pair of bits which is
again combined using a dot operator ‘ • ’ or semi-dot
operator ‘ • ’ with another pairs of bits. This procedural use
of dot operator ‘ • ’ and semi-dot operator ‘ • ’ creates a
prefix tree network which ultimately ends in the generation
of all carry signals. In the final step, the sum bits of the
adder are generated with the propagate signals of the
operand bits and the preceding stage carry bit using a xor
gate. The semi-dot operator ‘ • ’ will be present as last
computation node in each column of the prefix graph
structures, where it is essential to compute only generate
term, whose value is the carry generated from that bit to the
succeeding bit.
The structure of the prefix network specifies the type of
the PPA. The Prefix network described by Haiku Zhu,
Chung-Kuan Cheng and Ronald Graham [1], has the
minimal depth for a given ‘n’ bit adder. Optimal
logarithmic adder structures with a fan-out of two for
minimizing the area-delay product is presented by Matthew
Ziegler and Mircea Stan [2]. The Sklansky adder [3]
presents a minimum depth prefix network at the cost of
increased fan-out for certain computation nodes. The
algorithm invented by Kogge-Stone [4] has both optimal
depth and low fan-out but produces massively complex
circuit realizations and also account for large number of
interconnects. Brent-Kung adder [5] has the merit of
minimal number of computation nodes, which yields in
reduced area but structure has maximum depth which
yields slight increase in latency when compared with other
structures. The Han-Carlson adder [6] combines BrentKung and Kogge-Stone structures to achieve a balance
between logic depth and interconnect count. Knowles [7]
presented a class of logarithmic adders with minimum
depth by allowing the fan-out to grow. Ladner and Fischer
[8] proposed a general method to construct a prefix
network with slightly higher depth when compared with
Sklansky topology but achieved some merit by reducing
the maximum fan-out for computation nodes in the critical
path. Related work on PPA literature such as Ling adder
[9], achieve improved performance gains by changing the
equation of the dot operator ‘ • ’. Taxonomy of classical
Prefix Parallel Adders based on fan-out, interconnect count
and depth characteristics has been presented by Harris [10].
Sparse tree binary adder proposed by Yan Sun and etal [11]
combines the benefits of prefix adder and carry save adder.
Integer Linear Programming method to build parallel prefix
Abstract—Parallel Prefix addition is a technique for
improving the speed of binary addition. Due to continuing
integrating intensity and the growing needs of portable
devices, low-power and high-performance designs are of
prime importance. The classical parallel prefix adder
structures presented in the literature over the years optimize
for logic depth, area, fan-out and interconnect count of logic
circuits. In this paper, a new architecture for performing 32bit Parallel Prefix addition is proposed. The proposed 32-bit
prefix adder is compared with several classical adders of
same bit width in terms of power, delay and number of
computational nodes. The results reveal that the proposed 32bit Parallel Prefix adder has the least power delay product
when compared with its peer existing Prefix adder structures.
Tanner EDA tool was used for simulating the adder designs in
the TSMC 180 nm and TSMC 130 nm technologies.
Index Terms— Parallel Prefix Adder, Dot operator, Semi-Dot
operator, CMOS, Odd-dot operator, Even-dot operator, Oddsemi-dot operator, Even-semi-dot operator
I. INTRODUCTION
VLSI Integer adders find applications in Arithmetic and
Logic units (ALU’s), microprocessors and memory
addressing units. Speed of the adder often decides the
minimum clock cycle time in a microprocessor. The need
for a Parallel Prefix adder is that it is primarily fast when
compared with ripple carry adders. Parallel Prefix adders
(PPA) are family of adders derived from the commonly
known carry look ahead adders. These adders are best
suited for adders with wider word lengths. PPA circuits use
a tree network to reduce the latency to O (log 2 n) where
‘n’ represents the number of bits. A three step process is
generally involved in the construction of a PPA. The first
step involves the creation of generate and propagate signals
for the input operand bits. The second step involves the
generation of carry signals. In PPA, the dot operator ‘ • ’
and the semi-dot operator ‘ • ’ are introduced. The dot
operator ‘ • ’ is defined by the equation (1) and the semidot operator ‘ • ’ is defined by the equation (2)
( pi , gi ) • ( pi −1 , gi −1 ) = ( pi ⋅ pi −1 , gi + pi ⋅ gi −1 )
( pi , gi ) • ( pi −1 , gi −1 ) = ( gi + pi ⋅ gi −1 )
(1)
(2)
In the above equation, ‘ • ’ operator is applied on two
pairs of bits ( pi , gi ) and ( pi −1 , gi −1 ) . These bits
58
© 2009 ACEEE
DOI: 01.IJRTET.02.06.208
LETTERS
Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009
adders is proposed by Jainhau Liu and etal [12]. A new
technique to achieve energy savings for Kogge-Stone adder
was proposed by Frustaci and etal [14]. A novel Parallel
Prefix adder for 32-bits has been proposed. The proposed
structure has the least power delay product amongst all its
peer one’s.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8
Inputs
7
6 5 4
3 2
1 0
Stage 1
Stage 2
Stage 3
Stage 4
II. EXISTING PARALLEL PREFIX ADDERS
Stage 5
Brent Kung adder is oriented towards simpler tree
structure with a fewer computation nodes. Kogge-Stone
adder possesses a regular layout and is preferred for high
performance applications. Han-Carlson adder the reduces
the hardware complexity when compared to that of a
Kogge-Stone adder but at the cost of introducing a
additional stage to its carry merge path. In Sklansky adder,
the fan-out from the inputs to outputs along the critical path
increases drastically which introduces latency in the circuit.
Ladner-Fischer adder is an improved version of Sklansky
adder, where the maximum fan-out is reduced. Table I
summarizes the data regarding the requirement of number
of computation nodes and logic depth for various existing
Parallel Prefix adders. Let ‘n’ be the word-length of the
adder in terms of bits.
Stage 6
Stage 7
Stage 8
Stage 9
Outputs
C31 C 30 C 29 C 28 C C 26C 25 C 24C 23C22 C 21 C20 C 19C 18 C C 16C15C14C 13 C 12 C C C C C C C 5 C C C C1 C
27
17
4 3 2
0
11 10 9 8 7 6
Figure 1. Proposed 32-bit Parallel Prefix Adder
Gi = (ai ⋅ bi )
Pi = ai ⊕ bi = ai bi
In the above equations,
Fig. 1 shows the architecture of the proposed 32-bit
parallel prefix adder. The objective is to eliminate the
massive overlap between the prefix sub-terms being
computed. Hence the associate property of the dot operator
is employed to keep the number of computation nodes at a
minimum. The first stage of the computation is called as
pre-processing. The first stage in the architectures of the
proposed 32-bit prefix adder involve the creation of
generate and propagate signals for individual operand bits
in active low format. The equations (3) and (4) represent
the functionality of the first stage.
TABLE I.
COMPARATIVE STUDY ON EXISTING PREFIX ADDERS
Number of Computation Nodes
Brent-Kung
[2* n − 2 − log 2 n]
[(2 *log 2 n) − 2]
Kogge-Stone
[(n *log 2 n) − n + 1]
log 2 n
n
[ *(log 2 n)]
2
n
[ *(log 2 n)]
2
n
[ *(log 2 n)]
2
[(log 2 n) + 1]
Han-Carlson
Ladner-Fischer
Sklansky
(4)
ai , bi represent input operand
bits for the adder, where ‘i’ varies from 0 to 31. The second
stage in the prefix addition is termed as prefix computation.
This stage is responsible for creation of group generate and
group propagate signals. For deriving the carry signals in
the second stage, this architecture introduces four different
computation nodes for achieving improved performance.
There are two cells designed for the dot operator. First cell
for the dot operator named odd-dot represented by a ‘ „ ’,
works with active low inputs and generates active high
outputs.
The second cell for the dot operator named even-dot
represented by a ‘ „ ’, works with active high inputs and
generates active low outputs. Similarly, there are two cells
designed for the semi-dot operator. First cell for the semidot operator named odd-semi-dot represented by a ‘ • ’,
works with active low inputs and generates active high
inputs. The second cell for the semi-dot operator named
even-semi-dot represented by a ‘ • ’, works with active
high inputs and generates active low outputs. The stages
with odd indexes use odd-dot and odd-semi-dot cells where
as the stages with even indexes use even-dot and evensemi-dot cells. The equations (5) and (6) represent the
functionality of odd-dot and even-dot cells respectively.
III. PROPOSED 32-BIT PARALLELREFIX ADDER
Adder Type
(3)
Logic Depth
(P,G) = ( pi , gi )„( pi−1 , gi−1)
[(log 2 n) + 1]
= ( pi + pi−1 , gi ⋅ ( pi + gi−1)
(5)
= ( pi ⋅ pi−1 , gi + pi ⋅ gi−1)
log 2 n
( P , G ) = ( pi , gi )„( pi −1 , gi −1 )
= ( pi ⋅ pi −1 , gi + pi ⋅ gi −1 )
(6)
The equations (7) and (8) represent the functionality of
odd-semi-dot and even-semi-dot cells respectively.
59
© 2009 ACEEE
DOI: 01.IJRTET.02.06.208
LETTERS
Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009
(G ) = ( pi , gi ) • ( gi −1 ) = ( gi ⋅ ( pi + gi −1 )
= ( gi + pi ⋅ gi −1 ) = ci
(G ) = ( pi , gi ) • ( gi −1 ) = ( gi + pi ⋅ gi −1 ) = ci
respectively and the supply voltage was kept at 1.8 V.. For
TSMC 130 nm technology, threshold voltages of NMOS
and PMOS transistors are around 0.332 V and -0.3499 V
respectively and the supply voltage was kept at 1.3 V. The
parameters considered for comparison are power
consumption, worst case delay and power-delay product.
The various PPA structures were then compared with the
number of transistors needed for each of the circuit
realizations.
(7)
(8)
CMOS logic family will implement only inverting
functions. Thus cascading odd cells and even cells
alternatively gives the benefit of elimination of two
inverters between them, if a dot or a semi-dot computation
node in an odd stage receives both of its input edges from
any of the even stages and vice-versa. But it is essential to
introduce two inverters in a path, if a dot or a semi-dot
computation node in an even stage receives any of its edges
from any of the even stages and vice-versa. From the
prefix graph of the proposed structure shown in Figure. 1,
we infer that there are only few edges with a pair of
inverters, to make (G , P ) as (G , P) or to make (G , P )
as (G , P ) respectively. The pair of inverters in a path is
represented by a ‘‹’ in the prefix graph. By introducing
two cells for dot operator and two cells for semi-dot
operator, we have eliminated a large number of inverters.
Due to inverter elimination in paths, the propagation delay
in those paths would have reduced. Further we achieve a
benefit in power reduction, since these inverters if not
eliminated, would have contributed to significant amount
of power dissipation due to switching. The output of the
odd-semi-dot cells gives the value of the carry signal in that
corresponding bit position. The output of the even-semi-dot
cell gives the complemented value of carry signal in that
corresponding bit position. The final stage in the prefix
addition is termed as post-processing. The final stage
involves generation of sum bits from the active low
Propagate signals of the individual operand bits and the
carry bits generated in true form or complement form. The
proposed 32-bit structure has a maximum fan-out of 6 and
a logic depth of 9. The first stage and last stage are
intrinsically fast because they involve only simple
operations on signals local to each bit position. The
intermediate stage embodies long distance propagation of
carries, so the performance of the adder depends on the
intermediate stage. The lateral fan-out slightly increases,
but we get an advantage of limited interconnect lengths
since the tree grows along the main diagonal.
IV. RESULTS AND DISCUSSION
Table II lists the structural characteristics for each of the
32-bit Parallel Prefix adders. From the table it is observed
that the proposed 32-bit Parallel Prefix adder has the least
number of computation nodes amongst all other peer
designs. Structure of the proposed 32-bit adder also reveals
that the prefix tree builds along the main diagonal after the
first two stages.
TABLE II.
CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS
Number of Computation Nodes
Adder Type
Semi-Dot
Brent-Kung
26
31
8
Kogge-Stone
98
31
5
Han-Carlson
33
31
6
Ladner-Fischer
33
31
6
Sklansky
33
31
5
Proposed
23
31
9
Table III and Table IV list out the performance comparison
of the various 32-bit Parallel Prefix adders in 180 µm
technology and 130 µm technology.
TABLE III.
PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX
ADDERS USING TSMC 180 NM TECHNOLOGY
Brent-Kung
228.5272
1.05
Power-Delay
Product
( X 10-15
Joules)
239.9536
Kogge-Stone
342.1976
0.79
270.3361
Adder Name
III. SIMULATION ENVIRONMENT
Simulation of various Parallel Prefix Adder designs were
carried out with Tanner EDA tool using TSMC 180 nm and
TSMC 130 nm technologies. All the Parallel Prefix Adder
structures were implemented using CMOS logic family.
The aspect ratio of the MOS transistor devices were chosen
⎛W ⎞
⎛W ⎞
⎟ = 3* ⎜ ⎟ . For TSMC 180 nm
⎝ L ⎠n
⎝ L ⎠p
such that ⎜
technology, threshold voltages of NMOS and PMOS
transistors are around 0.3694 V and
-0.3944 V
60
© 2009 ACEEE
DOI: 01.IJRTET.02.06.208
Logic Depth
Dot
Average
Power
(µW)
Delay
(ns)
Han-Carlson
253.1247
0.89
225.2810
Sklansky
261.1407
0.70
184.8985
Ladner-Fischer
232.7179
0.91
211.7733
Proposed
211.9441
0.85
180.1525
LETTERS
Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009
TABLE IV.
PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX
ADDERS USING TSMC 130 NM TECHNOLOGY
Brent-Kung
67.02993
0.88
Power-Delay
Product
( X 10-15
Joules)
58.9863
Kogge-Stone
101.4262
0.62
62.8842
Adder Name
Average
Power
(µW)
Delay
(ns)
Han-Carlson
76.92042
0.68
52.3059
Sklansky
78.23902
0.53
41.4667
Ladner-Fischer
69.72199
0.67
46.7137
Proposed
62.9453
0.65
40.9144
Fig. 2 and Fig. 3 show the Power and Power Delay Product
comparison of various PPA Designs simulated using
TSMC 180 nm Technology respectively.
Figure 4. Power Comparison of various PPA Designs in 130 nm
Figure 2. Power Comparison of various PPA Designs in 180 nm
Figure 5. Power Delay Product of various PPA Designs in 130nm
V. CONCLUSIONS
It is observed that the Proposed 32-bit Parallel Prefix
adder has the least power delay product when compared
with its peer existing one’s. The Proposed structure offers
7.2 % power savings is achieved when compared with
Brent Kung adder. The Proposed design offers 38 % power
Figure 3. Power Delay Product of various PPA Designs in 180 nm
61
© 2009 ACEEE
DOI: 01.IJRTET.02.06.208
LETTERS
Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009
[7] S.Knowles, “A family of adders”, Proceedings of the 15th
IEEE Symposium on Computer Arithmetic. Vail, Colarado,
pp.277-281, June 2001.
[8] R. Ladner and M. Fischer, “Parallel prefix computation,”
Journal of ACM. La Jolla, CA, vol.27,no.4, pp. 831-838,
October 1980.
[9] Giorgos Dimitrakopoulos and Dimitris Nikolos, “High speed
parallel prefix VLSI Ling adders,” IEEE Transactions on
Computers, vol.54, no.2, pp. 225-231, February 2005.
[10] David Harris, “ A Taxonomy of parallel prefix networks,”
Proceedings of the 37th Asilomar Conference on Signals,
Systems and Computers Pacific Grove, California, pp.22132217, November 2003.
[11] Yan Sun, Dongyu, Zheng, Minxuan Zhang and Shaoqing Li,
“High Performance-Low power sparse tree binary adders,”
Proceedings of the 8th International Conference on Solid
State and Integrated Circuit Technology ICSICT 2006.
Shanghai, pp. 1649-1651, October 2006.
[12] Jainhua Liu, Yi Zhu, Haikun Zhu and Chung-Kuan Cheng,
“Optimum prefix adders in a comprehensive area, timing and
power design Space,” Proceedings of the 2007 Asia and
South Pacific Design Automation Conference. Washington,
pp. 609-615, January 2007.
[13] Jun Chen and James E. Stine, “Enhancing parallel Prefix
Structures with Carry Save Notion,” Proceedings of the 51st
IEEE International Midwest Symposium on Circuits and
Systems. Knoxville, pp. 354-357, August 2008.
[14] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri and P.
Corsonello, “Designing high speed adders in power
constrained environments,” IEEE Transactions on Circuits
and Systems, Vol.56, pp. 172-176, February 2009.
saving when compared with Kogge-Stone structure, 18.8 %
power savings when compared with Sklansky structure and
16.26 % power savings when compared with Han-Carlson
structure. The delay of the Proposed structure is almost
comparable with that of the Kogge-Stone adder. The
significant improvement in power delay product of the
Proposed 32-bit prefix structure makes it more suited for
ALU’s and multiplier units.
REFERENCES
[1] Haikun Zhu, Chung-Kuan Cheng and Ronald Graham,
“Constructing zero deficiency parallel prefix adder of
minimum depth,” Proceedings of 2005 Conference on Asia
South Pacific Design Automation ASP-DAC 2005.
Shanghai, vol.2, pp. 883–888, January 2005.
[2] Matthew Ziegler and Mircea Stan, “Optimal logarithmic
adder structures with a fan-out of two for minimizing area
delay product,” IEEE International Symposium on Circuits
and Systems 2001. Sydney, vol.2, pp.657-660, May 2001.
[3] J. Sklansky, “Conditional sum addition logic,” IRE
Transactions on Electronic computers. New York, vol. EC-9,
pp. 226-231, June 1960.
[4] P.Kogge and H.Stone, “A parallel algorithm for the efficient
solution of a general class of recurrence relations,” IEEE
Transactions on Computers, vol. C-22, no.8, pp.786-793,
August 1973.
[5] R.Brent and H.Kung, “A regular layout for parallel adders,”
IEEE Transaction on Computers, vol. C-31, no.3, pp. 260264, March 1982.
[6] T. Han and D. Carlson, “Fast area efficient VLSI adders,”
Proceedings of the Eighth Symposium on Computer
Arithmetic. Como, Italy, pp.49-56, September 1987.
62
© 2009 ACEEE
DOI: 01.IJRTET.02.06.208