Download A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For

LETTERS Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009 A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For High Speed Computing P.Ramanathan 1, P.T.Vanathi 2 1 PSG College of Technology / Department of ECE, Coimbatore, India. Email: [email protected] 2 PSG College of Technology / Department of ECE, Coimbatore, India. Email: [email protected] represent generate and propagate signals used in addition. The output of the operator is a new pair of bits which is again combined using a dot operator ‘ • ’ or semi-dot operator ‘ • ’ with another pairs of bits. This procedural use of dot operator ‘ • ’ and semi-dot operator ‘ • ’ creates a prefix tree network which ultimately ends in the generation of all carry signals. In the final step, the sum bits of the adder are generated with the propagate signals of the operand bits and the preceding stage carry bit using a xor gate. The semi-dot operator ‘ • ’ will be present as last computation node in each column of the prefix graph structures, where it is essential to compute only generate term, whose value is the carry generated from that bit to the succeeding bit. The structure of the prefix network specifies the type of the PPA. The Prefix network described by Haiku Zhu, Chung-Kuan Cheng and Ronald Graham [1], has the minimal depth for a given ‘n’ bit adder. Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product is presented by Matthew Ziegler and Mircea Stan [2]. The Sklansky adder [3] presents a minimum depth prefix network at the cost of increased fan-out for certain computation nodes. The algorithm invented by Kogge-Stone [4] has both optimal depth and low fan-out but produces massively complex circuit realizations and also account for large number of interconnects. Brent-Kung adder [5] has the merit of minimal number of computation nodes, which yields in reduced area but structure has maximum depth which yields slight increase in latency when compared with other structures. The Han-Carlson adder [6] combines BrentKung and Kogge-Stone structures to achieve a balance between logic depth and interconnect count. Knowles [7] presented a class of logarithmic adders with minimum depth by allowing the fan-out to grow. Ladner and Fischer [8] proposed a general method to construct a prefix network with slightly higher depth when compared with Sklansky topology but achieved some merit by reducing the maximum fan-out for computation nodes in the critical path. Related work on PPA literature such as Ling adder [9], achieve improved performance gains by changing the equation of the dot operator ‘ • ’. Taxonomy of classical Prefix Parallel Adders based on fan-out, interconnect count and depth characteristics has been presented by Harris [10]. Sparse tree binary adder proposed by Yan Sun and etal [11] combines the benefits of prefix adder and carry save adder. Integer Linear Programming method to build parallel prefix Abstract—Parallel Prefix addition is a technique for improving the speed of binary addition. Due to continuing integrating intensity and the growing needs of portable devices, low-power and high-performance designs are of prime importance. The classical parallel prefix adder structures presented in the literature over the years optimize for logic depth, area, fan-out and interconnect count of logic circuits. In this paper, a new architecture for performing 32bit Parallel Prefix addition is proposed. The proposed 32-bit prefix adder is compared with several classical adders of same bit width in terms of power, delay and number of computational nodes. The results reveal that the proposed 32bit Parallel Prefix adder has the least power delay product when compared with its peer existing Prefix adder structures. Tanner EDA tool was used for simulating the adder designs in the TSMC 180 nm and TSMC 130 nm technologies. Index Terms— Parallel Prefix Adder, Dot operator, Semi-Dot operator, CMOS, Odd-dot operator, Even-dot operator, Oddsemi-dot operator, Even-semi-dot operator I. INTRODUCTION VLSI Integer adders find applications in Arithmetic and Logic units (ALU’s), microprocessors and memory addressing units. Speed of the adder often decides the minimum clock cycle time in a microprocessor. The need for a Parallel Prefix adder is that it is primarily fast when compared with ripple carry adders. Parallel Prefix adders (PPA) are family of adders derived from the commonly known carry look ahead adders. These adders are best suited for adders with wider word lengths. PPA circuits use a tree network to reduce the latency to O (log 2 n) where ‘n’ represents the number of bits. A three step process is generally involved in the construction of a PPA. The first step involves the creation of generate and propagate signals for the input operand bits. The second step involves the generation of carry signals. In PPA, the dot operator ‘ • ’ and the semi-dot operator ‘ • ’ are introduced. The dot operator ‘ • ’ is defined by the equation (1) and the semidot operator ‘ • ’ is defined by the equation (2) ( pi , gi ) • ( pi −1 , gi −1 ) = ( pi ⋅ pi −1 , gi + pi ⋅ gi −1 ) ( pi , gi ) • ( pi −1 , gi −1 ) = ( gi + pi ⋅ gi −1 ) (1) (2) In the above equation, ‘ • ’ operator is applied on two pairs of bits ( pi , gi ) and ( pi −1 , gi −1 ) . These bits 58 © 2009 ACEEE DOI: 01.IJRTET.02.06.208 LETTERS Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009 adders is proposed by Jainhau Liu and etal [12]. A new technique to achieve energy savings for Kogge-Stone adder was proposed by Frustaci and etal [14]. A novel Parallel Prefix adder for 32-bits has been proposed. The proposed structure has the least power delay product amongst all its peer one’s. 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 Inputs 7 6 5 4 3 2 1 0 Stage 1 Stage 2 Stage 3 Stage 4 II. EXISTING PARALLEL PREFIX ADDERS Stage 5 Brent Kung adder is oriented towards simpler tree structure with a fewer computation nodes. Kogge-Stone adder possesses a regular layout and is preferred for high performance applications. Han-Carlson adder the reduces the hardware complexity when compared to that of a Kogge-Stone adder but at the cost of introducing a additional stage to its carry merge path. In Sklansky adder, the fan-out from the inputs to outputs along the critical path increases drastically which introduces latency in the circuit. Ladner-Fischer adder is an improved version of Sklansky adder, where the maximum fan-out is reduced. Table I summarizes the data regarding the requirement of number of computation nodes and logic depth for various existing Parallel Prefix adders. Let ‘n’ be the word-length of the adder in terms of bits. Stage 6 Stage 7 Stage 8 Stage 9 Outputs C31 C 30 C 29 C 28 C C 26C 25 C 24C 23C22 C 21 C20 C 19C 18 C C 16C15C14C 13 C 12 C C C C C C C 5 C C C C1 C 27 17 4 3 2 0 11 10 9 8 7 6 Figure 1. Proposed 32-bit Parallel Prefix Adder Gi = (ai ⋅ bi ) Pi = ai ⊕ bi = ai bi In the above equations, Fig. 1 shows the architecture of the proposed 32-bit parallel prefix adder. The objective is to eliminate the massive overlap between the prefix sub-terms being computed. Hence the associate property of the dot operator is employed to keep the number of computation nodes at a minimum. The first stage of the computation is called as pre-processing. The first stage in the architectures of the proposed 32-bit prefix adder involve the creation of generate and propagate signals for individual operand bits in active low format. The equations (3) and (4) represent the functionality of the first stage. TABLE I. COMPARATIVE STUDY ON EXISTING PREFIX ADDERS Number of Computation Nodes Brent-Kung [2* n − 2 − log 2 n] [(2 *log 2 n) − 2] Kogge-Stone [(n *log 2 n) − n + 1] log 2 n n [ *(log 2 n)] 2 n [ *(log 2 n)] 2 n [ *(log 2 n)] 2 [(log 2 n) + 1] Han-Carlson Ladner-Fischer Sklansky (4) ai , bi represent input operand bits for the adder, where ‘i’ varies from 0 to 31. The second stage in the prefix addition is termed as prefix computation. This stage is responsible for creation of group generate and group propagate signals. For deriving the carry signals in the second stage, this architecture introduces four different computation nodes for achieving improved performance. There are two cells designed for the dot operator. First cell for the dot operator named odd-dot represented by a ‘ ’, works with active low inputs and generates active high outputs. The second cell for the dot operator named even-dot represented by a ‘ ’, works with active high inputs and generates active low outputs. Similarly, there are two cells designed for the semi-dot operator. First cell for the semidot operator named odd-semi-dot represented by a ‘ • ’, works with active low inputs and generates active high inputs. The second cell for the semi-dot operator named even-semi-dot represented by a ‘ • ’, works with active high inputs and generates active low outputs. The stages with odd indexes use odd-dot and odd-semi-dot cells where as the stages with even indexes use even-dot and evensemi-dot cells. The equations (5) and (6) represent the functionality of odd-dot and even-dot cells respectively. III. PROPOSED 32-BIT PARALLELREFIX ADDER Adder Type (3) Logic Depth (P,G) = ( pi , gi )( pi−1 , gi−1) [(log 2 n) + 1] = ( pi + pi−1 , gi ⋅ ( pi + gi−1) (5) = ( pi ⋅ pi−1 , gi + pi ⋅ gi−1) log 2 n ( P , G ) = ( pi , gi )( pi −1 , gi −1 ) = ( pi ⋅ pi −1 , gi + pi ⋅ gi −1 ) (6) The equations (7) and (8) represent the functionality of odd-semi-dot and even-semi-dot cells respectively. 59 © 2009 ACEEE DOI: 01.IJRTET.02.06.208 LETTERS Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009 (G ) = ( pi , gi ) • ( gi −1 ) = ( gi ⋅ ( pi + gi −1 ) = ( gi + pi ⋅ gi −1 ) = ci (G ) = ( pi , gi ) • ( gi −1 ) = ( gi + pi ⋅ gi −1 ) = ci respectively and the supply voltage was kept at 1.8 V.. For TSMC 130 nm technology, threshold voltages of NMOS and PMOS transistors are around 0.332 V and -0.3499 V respectively and the supply voltage was kept at 1.3 V. The parameters considered for comparison are power consumption, worst case delay and power-delay product. The various PPA structures were then compared with the number of transistors needed for each of the circuit realizations. (7) (8) CMOS logic family will implement only inverting functions. Thus cascading odd cells and even cells alternatively gives the benefit of elimination of two inverters between them, if a dot or a semi-dot computation node in an odd stage receives both of its input edges from any of the even stages and vice-versa. But it is essential to introduce two inverters in a path, if a dot or a semi-dot computation node in an even stage receives any of its edges from any of the even stages and vice-versa. From the prefix graph of the proposed structure shown in Figure. 1, we infer that there are only few edges with a pair of inverters, to make (G , P ) as (G , P) or to make (G , P ) as (G , P ) respectively. The pair of inverters in a path is represented by a ‘’ in the prefix graph. By introducing two cells for dot operator and two cells for semi-dot operator, we have eliminated a large number of inverters. Due to inverter elimination in paths, the propagation delay in those paths would have reduced. Further we achieve a benefit in power reduction, since these inverters if not eliminated, would have contributed to significant amount of power dissipation due to switching. The output of the odd-semi-dot cells gives the value of the carry signal in that corresponding bit position. The output of the even-semi-dot cell gives the complemented value of carry signal in that corresponding bit position. The final stage in the prefix addition is termed as post-processing. The final stage involves generation of sum bits from the active low Propagate signals of the individual operand bits and the carry bits generated in true form or complement form. The proposed 32-bit structure has a maximum fan-out of 6 and a logic depth of 9. The first stage and last stage are intrinsically fast because they involve only simple operations on signals local to each bit position. The intermediate stage embodies long distance propagation of carries, so the performance of the adder depends on the intermediate stage. The lateral fan-out slightly increases, but we get an advantage of limited interconnect lengths since the tree grows along the main diagonal. IV. RESULTS AND DISCUSSION Table II lists the structural characteristics for each of the 32-bit Parallel Prefix adders. From the table it is observed that the proposed 32-bit Parallel Prefix adder has the least number of computation nodes amongst all other peer designs. Structure of the proposed 32-bit adder also reveals that the prefix tree builds along the main diagonal after the first two stages. TABLE II. CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS Number of Computation Nodes Adder Type Semi-Dot Brent-Kung 26 31 8 Kogge-Stone 98 31 5 Han-Carlson 33 31 6 Ladner-Fischer 33 31 6 Sklansky 33 31 5 Proposed 23 31 9 Table III and Table IV list out the performance comparison of the various 32-bit Parallel Prefix adders in 180 µm technology and 130 µm technology. TABLE III. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 180 NM TECHNOLOGY Brent-Kung 228.5272 1.05 Power-Delay Product ( X 10-15 Joules) 239.9536 Kogge-Stone 342.1976 0.79 270.3361 Adder Name III. SIMULATION ENVIRONMENT Simulation of various Parallel Prefix Adder designs were carried out with Tanner EDA tool using TSMC 180 nm and TSMC 130 nm technologies. All the Parallel Prefix Adder structures were implemented using CMOS logic family. The aspect ratio of the MOS transistor devices were chosen ⎛W ⎞ ⎛W ⎞ ⎟ = 3* ⎜ ⎟ . For TSMC 180 nm ⎝ L ⎠n ⎝ L ⎠p such that ⎜ technology, threshold voltages of NMOS and PMOS transistors are around 0.3694 V and -0.3944 V 60 © 2009 ACEEE DOI: 01.IJRTET.02.06.208 Logic Depth Dot Average Power (µW) Delay (ns) Han-Carlson 253.1247 0.89 225.2810 Sklansky 261.1407 0.70 184.8985 Ladner-Fischer 232.7179 0.91 211.7733 Proposed 211.9441 0.85 180.1525 LETTERS Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009 TABLE IV. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 130 NM TECHNOLOGY Brent-Kung 67.02993 0.88 Power-Delay Product ( X 10-15 Joules) 58.9863 Kogge-Stone 101.4262 0.62 62.8842 Adder Name Average Power (µW) Delay (ns) Han-Carlson 76.92042 0.68 52.3059 Sklansky 78.23902 0.53 41.4667 Ladner-Fischer 69.72199 0.67 46.7137 Proposed 62.9453 0.65 40.9144 Fig. 2 and Fig. 3 show the Power and Power Delay Product comparison of various PPA Designs simulated using TSMC 180 nm Technology respectively. Figure 4. Power Comparison of various PPA Designs in 130 nm Figure 2. Power Comparison of various PPA Designs in 180 nm Figure 5. Power Delay Product of various PPA Designs in 130nm V. CONCLUSIONS It is observed that the Proposed 32-bit Parallel Prefix adder has the least power delay product when compared with its peer existing one’s. The Proposed structure offers 7.2 % power savings is achieved when compared with Brent Kung adder. The Proposed design offers 38 % power Figure 3. Power Delay Product of various PPA Designs in 180 nm 61 © 2009 ACEEE DOI: 01.IJRTET.02.06.208 LETTERS Int. J. of Recent Trends in Engineering and Technology, Vol. 2, No. 6, Nov 2009 [7] S.Knowles, “A family of adders”, Proceedings of the 15th IEEE Symposium on Computer Arithmetic. Vail, Colarado, pp.277-281, June 2001. [8] R. Ladner and M. Fischer, “Parallel prefix computation,” Journal of ACM. La Jolla, CA, vol.27,no.4, pp. 831-838, October 1980. [9] Giorgos Dimitrakopoulos and Dimitris Nikolos, “High speed parallel prefix VLSI Ling adders,” IEEE Transactions on Computers, vol.54, no.2, pp. 225-231, February 2005. [10] David Harris, “ A Taxonomy of parallel prefix networks,” Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers Pacific Grove, California, pp.22132217, November 2003. [11] Yan Sun, Dongyu, Zheng, Minxuan Zhang and Shaoqing Li, “High Performance-Low power sparse tree binary adders,” Proceedings of the 8th International Conference on Solid State and Integrated Circuit Technology ICSICT 2006. Shanghai, pp. 1649-1651, October 2006. [12] Jainhua Liu, Yi Zhu, Haikun Zhu and Chung-Kuan Cheng, “Optimum prefix adders in a comprehensive area, timing and power design Space,” Proceedings of the 2007 Asia and South Pacific Design Automation Conference. Washington, pp. 609-615, January 2007. [13] Jun Chen and James E. Stine, “Enhancing parallel Prefix Structures with Carry Save Notion,” Proceedings of the 51st IEEE International Midwest Symposium on Circuits and Systems. Knoxville, pp. 354-357, August 2008. [14] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri and P. Corsonello, “Designing high speed adders in power constrained environments,” IEEE Transactions on Circuits and Systems, Vol.56, pp. 172-176, February 2009. saving when compared with Kogge-Stone structure, 18.8 % power savings when compared with Sklansky structure and 16.26 % power savings when compared with Han-Carlson structure. The delay of the Proposed structure is almost comparable with that of the Kogge-Stone adder. The significant improvement in power delay product of the Proposed 32-bit prefix structure makes it more suited for ALU’s and multiplier units. REFERENCES [1] Haikun Zhu, Chung-Kuan Cheng and Ronald Graham, “Constructing zero deficiency parallel prefix adder of minimum depth,” Proceedings of 2005 Conference on Asia South Pacific Design Automation ASP-DAC 2005. Shanghai, vol.2, pp. 883–888, January 2005. [2] Matthew Ziegler and Mircea Stan, “Optimal logarithmic adder structures with a fan-out of two for minimizing area delay product,” IEEE International Symposium on Circuits and Systems 2001. Sydney, vol.2, pp.657-660, May 2001. [3] J. Sklansky, “Conditional sum addition logic,” IRE Transactions on Electronic computers. New York, vol. EC-9, pp. 226-231, June 1960. [4] P.Kogge and H.Stone, “A parallel algorithm for the efficient solution of a general class of recurrence relations,” IEEE Transactions on Computers, vol. C-22, no.8, pp.786-793, August 1973. [5] R.Brent and H.Kung, “A regular layout for parallel adders,” IEEE Transaction on Computers, vol. C-31, no.3, pp. 260264, March 1982. [6] T. Han and D. Carlson, “Fast area efficient VLSI adders,” Proceedings of the Eighth Symposium on Computer Arithmetic. Como, Italy, pp.49-56, September 1987. 62 © 2009 ACEEE DOI: 01.IJRTET.02.06.208

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For