Combined unsigned and two's complement saturating multipliers
Michael J. Schulte†, Mustafa Gok†, Pablo I. Balzola†, and Robert W. Brocato‡
† EECS Dept., Lehigh University, Bethlehem, PA 18015, USA.
‡ Digital Microelectronics, Sandia National Laboratories, Albuquerque, NM 87185
ABSTRACT
In many digital signal processing and multimedia applications, results that overflow are saturated to the most positive or most negative representable number. This paper presents efficient techniques for performing saturating n-bit integer multiplication on unsigned and two's complement numbers. Unlike conventional techniques for saturating multiplication, which compute a 2n-bit product and then examine the n most significant product bits to determine if overflow has occurred, the techniques presented in this paper compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if overflow has occurred and the product should be saturated. These techniques are applied to designs for saturating array multipliers that perform either unsigned or two's complement saturating integer multiplication, based on an input control signal. Compared to array multipliers that use conventional methods for saturation, these multipliers have about half as much area and delay.
Keywords: Saturation, overflow detection, unsigned, two's complement, multipliers, computer arithmetic.
1. INTRODUCTION
Saturating arithmetic is used in many digital signal processing and multimedia applications including digital filtering, speech encoding and decoding, red-green-blue color computations, and image shading [1], [2], [3]. Because of its usefulness in these types of applications, support for saturating arithmetic has been added to several instruction set architecture extensions [1], [4], [5], [6]. Saturating arithmetic operations are also supported on most commercial digital signal processors [7].
With saturating multiplication, products that overflow are saturated to the most positive or most negative representable number [7]. For unsigned saturating multiplication, if the product is too large to represent, it is saturated to the largest representable number. With two's complement saturating multiplication, if the magnitude of the product is too large to represent, it is saturated to either the most positive or most negative representable number, depending on its sign. Conventional n-bit saturating integer multipliers produce a 2n-bit product. The n most significant bits of the product are then examined to determine if the product should be saturated.
This paper presents efficient techniques for designing unsigned, two's complement, and combined unsigned and two's complement saturating integer multipliers. These multipliers compute only the (n + 1) least significant bits of the product. Specialized overflow detection units, which operate in parallel with the multiplier, determine if the product should be saturated. Since the overflow detection units presented in this paper have linear delay, they are particularly well suited for use with array multipliers [8], [9]. The fundamental equations for overflow detection, however, are independent of the multiplier structure, and can also be applied to tree multipliers [10], [11], [12], Booth-encoded multipliers [13], [14], and multipliers that use other architectures [15]. The overflow detection structures presented in this paper are similar to those presented in [16], except they do not use carry bits from the multiplier and are more easily adapted to designs for combined unsigned and two's complement saturating multiplication.
2. UNSIGNED SATURATING MULTIPLIERS
Figure 1 shows the multiplication matrix for an n-bit unsigned integer multiplication. In this figure, the n-bit multiplicand $A = a_{n-1} a_{n-2} \ldots a_1 a_0$ is multiplied by the n-bit multiplier $B = b_{n-1} b_{n-2} \ldots b_1 b_0$ to produce a 2n-bit product $P = p_{2n-1} p_{2n-2} \ldots p_1 p_0$. The values of A, B, and P are
$$A = \sum_{i=0}^{n-1} a_i 2^i \qquad B = \sum_{i=0}^{n-1} b_i 2^i \qquad P = \sum_{i=0}^{2n-1} p_i 2^i \tag{1}$$
Figure 1. Unsigned multiplication matrix for $P = A \times B$.
If the n least significant bits of the product are used as the result, overflow occurs and the product should be saturated if $P \ge 2^n$. The conventional method for detecting this is to compute the entire 2n-bit product and then detect overflow as
$$V_u = p_{2n-1} + p_{2n-2} + \ldots + p_{n+1} + p_n \tag{2}$$
where + denotes logical OR. If overflow occurs, $V_u = 1$ and the n least significant product bits are all set to ones, which corresponds to $2^n - 1$, the largest representable number. This is accomplished by ORing each of the n least significant product bits, $p_0$ to $p_{n-1}$, with $V_u$. Thus, the bits of the saturated product
$$\langle P \rangle = \langle p_{n-1} \rangle \langle p_{n-2} \rangle \ldots \langle p_1 \rangle \langle p_0 \rangle \tag{3}$$
are computed as
$$\langle p_i \rangle = p_i + V_u \tag{4}$$
for $0 \le i \le n - 1$.
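The conventional scheme of Equations (2) to (4) can be modeled at the bit level. The following is a minimal Python sketch, not the authors' hardware; the function name `sat_mul_unsigned_conventional` is a hypothetical name chosen for illustration, and the operands are assumed to be plain non-negative integers below $2^n$.

```python
def sat_mul_unsigned_conventional(a: int, b: int, n: int) -> int:
    """Saturating n-bit unsigned multiply using the full 2n-bit product.

    Hypothetical helper for illustration only.
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    p = a * b                                      # full 2n-bit product
    bits = [(p >> i) & 1 for i in range(2 * n)]    # p_0 .. p_{2n-1}

    v_u = 0
    for i in range(n, 2 * n):                      # Eq. (2): OR of p_n .. p_{2n-1}
        v_u |= bits[i]

    out = 0
    for i in range(n):                             # Eq. (4): <p_i> = p_i OR V_u
        out |= (bits[i] | v_u) << i
    return out
```

For n = 4 this returns the ordinary product when it fits in four bits and 15 otherwise, which is the behavior the paragraph above describes.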
A block diagram of an unsigned 8-bit array multiplier that performs conventional saturating multiplication is shown in Figure 2. The cells along each diagonal in the array multiplier correspond to a column in the multiplication matrix. In this diagram, a modified half adder (MHA) cell consists of an AND gate and a half adder (HA). The AND gate generates a partial product bit, and the HA adds the generated partial product bit and a partial product bit from the previous row to produce a sum bit and a carry bit. Similarly, a modified full adder (MFA) consists of an AND gate, which generates a partial product bit, and a full adder (FA) that adds the partial product bit and the sum and carry bits from the previous row. The bottom row of adders produces the n most significant product bits. At the bottom of the array, $(n - 1)$ OR gates combine the n most significant product bits to detect overflow. If any of these bits is one, then $V_u = 1$ and the n OR gates on the right side of the array set product bits $p_0$ to $p_{n-1}$ to one. An n-bit unsigned array multiplier that uses this technique has $n^2$ AND gates, $(2n - 1)$ OR gates, n HAs, and $(n^2 - 2n)$ FAs. The worst case delay is approximately equal to the delay through one AND gate, three OR gates, two HAs, and $(2n - 4)$ FAs.
With our proposed technique for unsigned multiplication, only $p_0$ to $p_n$ are computed. Overflow is detected by examining the value of $p_n$ and the total number of leading zeros in the two input operands, A and B. This technique works because the number of leading zeros in A and B indicates the magnitude of P. If A has $Z_A$ leading zeros and B has $Z_B$ leading zeros, then A, B, and P are bounded by
$$2^{n - Z_A - 1} \le A \le 2^{n - Z_A} - 1 \tag{5}$$
$$2^{n - Z_B - 1} \le B \le 2^{n - Z_B} - 1 \tag{6}$$
$$2^{2n - Z_A - Z_B - 2} \le P \le 2^{2n - Z_A - Z_B} - 2^{n - Z_A} - 2^{n - Z_B} + 1 \tag{7}$$
Since the left side of (7) provides a lower bound on the product, overflow for unsigned multiplication occurs if
$$2^n \le 2^{2n - Z_A - Z_B - 2} \le P \tag{8}$$
or equivalently if
$$Z_A + Z_B \le n - 2 \tag{9}$$
Figure 2. Conventional 8-bit unsigned saturating array multiplier.
Thus, overflow occurs if A and B together have fewer than $n - 1$ leading zeros. The right side of (7) gives the upper bound
$$P \le 2^{2n - Z_A - Z_B} - 2^{n - Z_A} - 2^{n - Z_B} + 1 < 2^{2n - Z_A - Z_B} \tag{10}$$
If A and B together have $n - 1$ or more leading zeros (i.e., $n - 1 \le Z_A + Z_B$), then (10) gives
$$P < 2^{2n - Z_A - Z_B} \le 2^{n+1} \tag{11}$$
In this case, P can always be represented as an unsigned integer with (n + 1) bits and overflow occurs if and only if $p_n = 1$. Thus, for unsigned multiplication, overflow occurs and the product saturates if and only if $Z_A + Z_B \le n - 2$ or $p_n = 1$. The condition given in (9) is detected as
$$V_u' = \sum_{i=1}^{n-1} \sum_{j=1}^{i} a_{n-j} b_i \tag{12}$$
where the bit summations and bit multiplications correspond to logical ORs and logical ANDs, respectively, and $V_u' = 1$ when (9) is true. Rather than computing $n(n - 1)/2$ bit products and then ORing these bit products, a significant hardware reduction is achieved by taking advantage of common terms in Equation (12). With this approach, overflow is detected using the iterative equations
$$o_{i+1} = o_i + a_{n-i} \tag{13}$$
$$v_{i+1} = v_i + o_{i+1} b_i \tag{14}$$
for $2 \le i \le n - 1$, where $o_i$ is a temporary OR bit and $v_i$ is a temporary overflow bit. Initially, $o_2 = a_{n-1}$ and $v_2 = a_{n-1} b_1$. After $(n - 2)$ iterations of Equations (13) and (14), the overflow flag is computed as $V_u = v_n + p_n$, where $v_n$ is equivalent to $V_u'$ from Equation (12). If $V_u = 1$, then $p_0$ to $p_{n-1}$ are set to one. This technique is similar to the technique for unsigned saturating multipliers presented in [16], except that it computes (n + 1) product bits and does not use carries into column n when detecting overflow.
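The iterative detection of Equations (13) and (14) can be checked in software against plain clamping. The sketch below is a behavioral model under stated assumptions: the function names are hypothetical, and Python's integer multiply stands in for the array that produces the (n + 1) least significant product bits.

```python
def overflow_flag_unsigned(a: int, b: int, n: int) -> int:
    """V_u = v_n OR p_n, using only the (n + 1) low product bits."""
    abit = lambda k: (a >> k) & 1
    bbit = lambda k: (b >> k) & 1

    o = abit(n - 1)                 # o_2 = a_{n-1}
    v = o & bbit(1)                 # v_2 = a_{n-1} b_1
    for i in range(2, n):           # i = 2 .. n-1, i.e. (n - 2) iterations
        o |= abit(n - i)            # Eq. (13)
        v |= o & bbit(i)            # Eq. (14)

    p_low = (a * b) & ((1 << (n + 1)) - 1)   # p_0 .. p_n only
    p_n = (p_low >> n) & 1
    return v | p_n


def sat_mul_unsigned_proposed(a: int, b: int, n: int) -> int:
    """Saturate to 2^n - 1 when the flag is set, else return the low n bits."""
    mask = (1 << n) - 1
    return mask if overflow_flag_unsigned(a, b, n) else (a * b) & mask
```

Comparing `sat_mul_unsigned_proposed(a, b, n)` with `min(a * b, 2**n - 1)` over all operand pairs for a small n is a quick way to exercise both halves of the condition, the leading-zero test and the $p_n$ test.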
An 8-bit unsigned array multiplier that uses our proposed technique is shown in Figure 3. Compared to the conventional saturating array multiplier, the hardware that computes $p_{n+1}$ to $p_{2n-1}$ and $V_u$ is replaced by an unsigned overflow detection unit, the MFAs along the diagonal that produce $p_n$ are replaced by A2X cells, and the HA in the bottom right corner of the array is replaced by an exclusive-OR (XOR) gate. The unsigned overflow detection unit consists of one OR gate and $(n - 2)$ overflow detection (OVD) cells. Each OVD cell computes $o_{i+1}$ and $v_{i+1}$, based on Equations (13) and (14), using one AND gate and two OR gates. Each A2X cell contains one AND gate, which generates a partial product bit, and two XOR gates, which compute the sum of the generated partial product bit and sum and carry bits from the previous row. A2X cells are used instead of MFAs, since the carry bits from this diagonal are no longer needed. The XOR gate uses sum and carry bits from the previous row to produce $p_n$.
Figure 3. Proposed 8-bit unsigned saturating array multiplier.
An n-bit unsigned saturating array multiplier that uses our proposed technique requires $(n^2 + 5n - 6)/2$ AND gates, $(3n - 3)$ OR gates, $(2n - 3)$ XOR gates, $(n - 1)$ HAs, and $(n^2 - 3n + 2)/2$ FAs. Since the OVDs have less delay than the MFAs, they are not on the critical path. The worst case delay of our proposed saturating unsigned array multiplier is approximately equal to the delay through one AND gate, two OR gates, one XOR gate, one HA, and $(n - 2)$ FAs. Although our proposed technique for unsigned saturating multipliers has slightly more area and delay than the technique presented in [16], it is more easily adapted to designs for two's complement saturating multipliers and combined unsigned and two's complement saturating multipliers, as discussed in the following sections.
3. TWO'S COMPLEMENT SATURATING MULTIPLIERS
With two's complement integer multiplication, the values of A, B, and P are
$$A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i \qquad B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \qquad P = -p_{2n-1} 2^{2n-1} + \sum_{i=0}^{2n-2} p_i 2^i \tag{15}$$
The most significant bit of a two's complement number, called the sign bit, has a negative weight [17]. Figure 4 shows the multiplication matrix for an n-bit two's complement integer multiplication that uses the Complemented Partial Product Word Correction Algorithm [18], [12]. This matrix is identical to the unsigned multiplication matrix, shown in Figure 1, except $(2n - 2)$ of the partial products are complemented and ones are added to columns n and $(2n - 1)$.
Figure 4. Two's complement multiplication matrix for $P = A \times B$.
If the n least significant bits of the product are used as the result, overflow occurs and the product should be saturated when
$$P < -2^{n-1} \quad \text{or} \quad P \ge 2^{n-1} \tag{16}$$
With the conventional method for two's complement multiplication, overflow is detected by testing if $p_{n-1}$ differs from any product bit to the left of it. Thus, overflow is computed as
$$V_t = \hat{p}_{2n-1} + \hat{p}_{2n-2} + \ldots + \hat{p}_{n+1} + \hat{p}_n \tag{17}$$
where $\hat{p}_i = p_i \oplus p_{n-1}$ and $\oplus$ denotes logical XOR. Detecting overflow with this method requires n XOR gates and $(n - 1)$ OR gates, after the 2n-bit product is computed.
For two's complement multiplication, the product saturates to $-2^{n-1} = 100 \ldots 00$ when $s = a_{n-1} \oplus b_{n-1} = 1$ and $V_t = 1$, and it saturates to $2^{n-1} - 1 = 011 \ldots 11$ when $s = a_{n-1} \oplus b_{n-1} = 0$ and $V_t = 1$. If $V_t = 0$, then the n least significant bits of the product are returned. Thus, the bits of the saturated two's complement product are computed as
$$\langle p_{n-1} \rangle = V_t\, s + \overline{V_t}\, p_{n-1} \tag{18}$$
$$\langle p_i \rangle = V_t\, \overline{s} + \overline{V_t}\, p_i \quad (0 \le i \le n - 2) \tag{19}$$
Equations (18) and (19) are implemented by an n-bit 2-to-1 mux that uses $V_t$ as the select signal.
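The conventional two's complement scheme of Equations (17) to (19) can be sketched in the same way. This is an illustrative model only: `to_signed` and `sat_mul_twos_conventional` are hypothetical names, operands are passed as n-bit patterns, and the saturation bit per position is taken to be s for the sign position and its complement elsewhere, which reproduces the 100…00 and 011…11 values stated above.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement integer."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def sat_mul_twos_conventional(a: int, b: int, n: int) -> int:
    """Return the saturated product as an n-bit pattern."""
    p = (to_signed(a, n) * to_signed(b, n)) & ((1 << (2 * n)) - 1)  # 2n-bit product
    bit = lambda i: (p >> i) & 1

    v_t = 0
    for i in range(n, 2 * n):               # Eq. (17): any p_i differing from p_{n-1}
        v_t |= bit(i) ^ bit(n - 1)

    s = ((a >> (n - 1)) ^ (b >> (n - 1))) & 1   # sign of the true product
    out = 0
    for i in range(n):                      # Eqs. (18)-(19): 2-to-1 mux on V_t
        sat_bit = s if i == n - 1 else 1 - s
        out |= (sat_bit if v_t else bit(i)) << i
    return out
```

Reading the result back with `to_signed` should agree with clamping the exact product to the range $[-2^{n-1}, 2^{n-1} - 1]$.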
Figure 5 shows an 8-bit two's complement array multiplier that uses the Complemented Partial Product Word Correction Algorithm to generate the partial products and the conventional method for saturation. This multiplier is similar to the unsigned multiplier shown in Figure 2, except that $(n - 1)$ of the AND gates on the left side of the array are replaced by NAND gates, $(n - 1)$ MFAs toward the bottom of the array are replaced by negating MFAs (NMFAs), one of the half adders is replaced by a specialized half adder (SHA), $p_{2n-1}$ is complemented, n XOR gates are used at the bottom of the array, and the n OR gates on the right side of the array are replaced by an n-bit 2-to-1 mux. The NMFAs and the NAND gates complement $(2n - 2)$ partial product bits, as shown in Figure 4. The SHA takes sum and carry bits from the previous row and adds them with one to produce new sum and carry bits. The SHA and the inverter that complements $p_{2n-1}$ add the ones shown in columns n and $(2n - 1)$ of Figure 4. The n XOR gates perform $\hat{p}_i = p_i \oplus p_{n-1}$, for $n \le i \le 2n - 1$. The n-bit 2-to-1 mux selects the appropriate bits for the saturated product, based on Equations (18) and (19). An n-bit saturating two's complement array multiplier that uses the conventional technique has 2n inverters, $n^2$ AND gates, $(n - 1)$ OR gates, $(n + 1)$ XOR gates, n HAs, $(n^2 - 2n)$ FAs, and one n-bit 2-to-1 mux. When implemented in CMOS technology, $(2n - 2)$ of the AND gates and inverters can be combined to form NAND gates. The worst case delay is approximately equal to the delay through one inverter, one AND gate, two OR gates, one XOR gate, one 2-to-1 mux, two HAs, and $(2n - 4)$ FAs.
For two's complement saturating multiplication, our proposed technique only computes $p_0$ to $p_n$. Overflow is detected by examining $p_n \oplus p_{n-1}$, along with the number of leading zeros for positive operands and the number of leading ones for negative operands. These leading zeros or leading ones, which have the same value as the sign bit, are referred to as leading bits. For example, 00001101 has four leading bits, and 11101010 has three leading bits. If A has $L_A$ leading bits and B has $L_B$ leading bits, then the magnitudes of A, B, and P are bounded by
$$2^{n - L_A - 1} \le |A| \le 2^{n - L_A} \tag{20}$$
$$2^{n - L_B - 1} \le |B| \le 2^{n - L_B} \tag{21}$$
$$2^{2n - L_A - L_B - 2} \le |P| \le 2^{2n - L_A - L_B} \tag{22}$$
Figure 5. Conventional 8-bit two's complement saturating array multiplier.
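Counting leading bits is easy to get off by one, so a small sketch may help. The function names below are hypothetical; the count includes the sign bit itself, which is what the two 8-bit examples above imply. The bound check skips zero, which has no magnitude bit and so falls outside the lower bound in (20).

```python
def leading_bits(x: int, n: int) -> int:
    """Number of leading bits of the n-bit pattern x: the sign bit plus
    every bit directly below it that has the same value."""
    sign = (x >> (n - 1)) & 1
    count = 0
    for k in range(n - 1, -1, -1):
        if ((x >> k) & 1) != sign:
            break
        count += 1
    return count


def magnitude_bounds_hold(x: int, n: int) -> bool:
    """Check Eq. (20) for a nonzero n-bit two's complement pattern x."""
    value = x - (1 << n) if (x >> (n - 1)) & 1 else x
    mag = abs(value)
    la = leading_bits(x, n)
    # 2^(n-L-1) <= |A| is tested as 2^(n-L) <= 2|A| to stay in integers.
    return (1 << (n - la)) <= 2 * mag and mag <= (1 << (n - la))
```

For instance, `leading_bits(0b00001101, 8)` and `leading_bits(0b11101010, 8)` correspond to the two examples in the text.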
As noted in (16), overflow occurs and the product should be saturated if $P < -2^{n-1}$ or $P \ge 2^{n-1}$. When P is the most negative representable number, $-2^{n-1}$, saturating the product to the most negative number produces the correct product, even though overflow has not occurred. Thus, the final product can be saturated whenever $|P| \ge 2^{n-1}$. This approach makes it easier to determine the maximum total number of leading bits for which the product can always be saturated. As noted in [19], our technique for two's complement saturation also works for two's complement overflow detection without saturation.
Since the left side of (22) provides a lower bound on the product, the product of the two's complement multiplication should be saturated if
$$2^{n-1} \le 2^{2n - L_A - L_B - 2} \le |P| \tag{23}$$
or equivalently if
$$L_A + L_B \le n - 1 \tag{24}$$
Thus, the product should be saturated if A and B together have fewer than n leading bits. The right side of (22) gives the upper bound
$$|P| \le 2^{2n - L_A - L_B} \tag{25}$$
If A and B together have n or more leading bits (i.e., $n \le L_A + L_B$), then (25) gives
$$|P| \le 2^{2n - L_A - L_B} \le 2^n \tag{26}$$
Figure 6. Proposed 8-bit two's complement saturating array multiplier.
In this case, P can always be represented as a two's complement integer with (n + 1) bits, and overflow occurs if and only if $p_n \oplus p_{n-1} = 1$. Thus, for two's complement multiplication, overflow occurs and the product saturates if and only if $L_A + L_B \le n - 1$ or $p_n \oplus p_{n-1} = 1$. A more formal proof of this statement is presented in [19]. Our proposed technique for two's complement overflow detection is similar to the technique presented in [20]; however, our technique uses fewer gates and can be implemented efficiently for any value of n.
A two's complement overflow detection unit is used to determine if the product should be saturated. The condition given in (24) is detected as
$$V_t' = \hat{a}_{n-2} \hat{b}_1 + \sum_{i=2}^{n-2} \sum_{j=2}^{i+1} \hat{a}_{n-j} \hat{b}_i \tag{27}$$
where $\hat{a}_i = a_i \oplus a_{n-1}$, $\hat{b}_i = b_i \oplus b_{n-1}$, and $V_t' = 1$ when (24) is true. As was done for unsigned saturating multiplication, the hardware needed to detect overflow is reduced by taking advantage of common terms in Equation (27). With this approach, overflow is detected using the iterative equations
$$o_{i+1} = o_i + \hat{a}_{n-i} \tag{28}$$
$$v_{i+1} = v_i + o_{i+1} \hat{b}_{i-1} \tag{29}$$
for $3 \le i \le n - 1$. Initially, $o_3 = \hat{a}_{n-2}$ and $v_3 = \hat{a}_{n-2} \hat{b}_1$. After $(n - 3)$ iterations of Equations (28) and (29), the overflow flag is computed as $V_t = v_n + p_n \oplus p_{n-1}$, where $v_n$ is equivalent to $V_t'$ from Equation (27).
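A behavioral sketch of this detection scheme follows, again in Python with hypothetical function names. One indexing choice is an assumption of this sketch: the loop pairs $o_{i+1}$ with $\hat{b}_{i-1}$, which is the pairing that accumulates exactly the terms of Equation (27) from the stated initial values $o_3$ and $v_3$. Integer multiplication stands in for the array that supplies $p_0$ to $p_n$.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement integer."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def overflow_flag_twos(a: int, b: int, n: int) -> int:
    """V_t = v_n OR (p_n XOR p_{n-1}) for n-bit patterns a and b."""
    sa = (a >> (n - 1)) & 1
    sb = (b >> (n - 1)) & 1
    ahat = lambda k: ((a >> k) & 1) ^ sa      # a^_k = a_k XOR a_{n-1}
    bhat = lambda k: ((b >> k) & 1) ^ sb      # b^_k = b_k XOR b_{n-1}

    o = ahat(n - 2)                 # o_3
    v = o & bhat(1)                 # v_3
    for i in range(3, n):           # i = 3 .. n-1, i.e. (n - 3) iterations
        o |= ahat(n - i)
        v |= o & bhat(i - 1)        # assumed pairing, see the note above

    p_low = (to_signed(a, n) * to_signed(b, n)) & ((1 << (n + 1)) - 1)
    v2 = ((p_low >> n) ^ (p_low >> (n - 1))) & 1   # p_n XOR p_{n-1}
    return v | v2


def sat_mul_twos_proposed(a: int, b: int, n: int) -> int:
    """Return the saturated product as an n-bit pattern."""
    s = ((a >> (n - 1)) ^ (b >> (n - 1))) & 1
    if overflow_flag_twos(a, b, n):
        return (1 << (n - 1)) if s else (1 << (n - 1)) - 1
    return (to_signed(a, n) * to_signed(b, n)) & ((1 << n) - 1)
```

Note that a product of exactly $-2^{n-1}$ may set the flag through the leading-bit test; the saturated value is then the same bit pattern as the exact product, as discussed for $|P| \ge 2^{n-1}$.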
An 8-bit saturating two's complement array multiplier that uses our proposed technique is shown in Figure 6. This multiplier is similar to the proposed saturating unsigned multiplier, shown in Figure 3, except the two AND gates in the upper left corner are replaced by NAND gates, the A2X and MFA in the bottom right corner are replaced by a negating A2X (NA2X) and a negating MFA (NMFA), the unsigned overflow detection unit is replaced by a two's complement overflow detection unit, and the n OR gates on the right side of the array are replaced by an n-bit 2-to-1 mux. The two's complement overflow detection unit uses $(2n - 2)$ XOR gates to compute $\hat{a}_1$ to $\hat{a}_{n-2}$, $\hat{b}_1$ to $\hat{b}_{n-2}$, $s = a_{n-1} \oplus b_{n-1}$, and $V_t'' = p_n \oplus p_{n-1}$, one AND gate to compute $v_3 = \hat{a}_{n-2} \hat{b}_1$, $(n - 3)$ OVD cells to compute $o_4$ to $o_{n-1}$ and $v_4$ to $v_n$, and one OR gate to compute $V_t = v_n + V_t''$.
Figure 7. Combined unsigned and two's complement multiplication matrix for $P = A \times B$.
An n-bit two's complement saturating array multiplier that uses our proposed technique requires five inverters, $(n^2 + 5n - 6)/2$ AND gates, $(2n - 5)$ OR gates, $(4n - 5)$ XOR gates, $(n - 1)$ HAs, $(n^2 - 3n + 2)/2$ FAs, and one n-bit 2-to-1 mux. The worst case delay of our proposed two's complement saturating array multiplier is approximately equal to the delay through one inverter, one AND gate, two OR gates, two XOR gates, one 2-to-1 mux, one HA, and $(n - 2)$ FAs.
4. COMBINED SATURATING MULTIPLIERS
Since the unsigned and two's complement saturating multipliers presented in the previous sections have very similar structures, a single multiplier can be designed that performs either unsigned or two's complement saturating integer multiplication, based on an input control signal. This control signal, t, is one when two's complement multiplication is performed and zero when unsigned multiplication is performed.
Figure 7 shows the combined unsigned and two's complement multiplication matrix for $P = A \times B$. In this figure, $\widetilde{a_i b_j} = (a_i b_j) \oplus t$. When t = 1, the $(2n - 2)$ partial product bits that use either $a_{n-1}$ or $b_{n-1}$, but not both, are complemented and ones are added to columns n and $(2n - 1)$, which corresponds to the two's complement multiplication matrix shown in Figure 4. When t = 0, the partial product bits are left unchanged and no ones are added, which corresponds to the unsigned multiplication matrix shown in Figure 1.
The conventional method for combined unsigned and two's complement saturating multiplication detects overflow by examining $p_n$ to $p_{2n-1}$ when unsigned multiplication is performed, and by examining $\hat{p}_n$ to $\hat{p}_{2n-1}$ when two's complement multiplication is performed. This is accomplished by computing
$$V_c = \tilde{p}_{2n-1} + \tilde{p}_{2n-2} + \ldots + \tilde{p}_{n+1} + \tilde{p}_n \tag{30}$$
where $\tilde{p}_i = p_i \oplus w$, $w = t\, p_{n-1}$, and $V_c$ is one when overflow occurs.
For combined multiplication, the product saturates to $111 \ldots 11$ when $V_c = 1$ and $t = 0$; to $100 \ldots 00$ when $V_c = 1$, $a_{n-1} \oplus b_{n-1} = 1$, and $t = 1$; and to $011 \ldots 11$ when $V_c = 1$, $a_{n-1} \oplus b_{n-1} = 0$, and $t = 1$. If $V_c = 0$, then the n least significant bits of the product are returned. Thus, the bits of the saturated product are computed as
$$\langle p_{n-1} \rangle = V_c\, q_1 + \overline{V_c}\, p_{n-1} \tag{31}$$
$$\langle p_i \rangle = V_c\, q_2 + \overline{V_c}\, p_i \quad (0 \le i \le n - 2) \tag{32}$$
where $q_1 = (a_{n-1} \oplus b_{n-1}) + \overline{t}$ and $q_2 = \overline{(a_{n-1} \oplus b_{n-1})} + \overline{t}$.
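The combined conventional scheme can be modeled with one function that takes the control signal t. This is a sketch under stated assumptions: the names are hypothetical, and $q_1$ and $q_2$ are written so that they reproduce the saturation values listed above (all ones for unsigned, 100…00 or 011…11 for two's complement).

```python
def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement integer."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def sat_mul_combined_conventional(a: int, b: int, n: int, t: int) -> int:
    """t = 0: unsigned saturating multiply; t = 1: two's complement."""
    if t:
        p = (to_signed(a, n) * to_signed(b, n)) & ((1 << (2 * n)) - 1)
    else:
        p = a * b
    bit = lambda i: (p >> i) & 1

    w = t & bit(n - 1)                      # w = t AND p_{n-1}
    v_c = 0
    for i in range(n, 2 * n):               # Eq. (30)
        v_c |= bit(i) ^ w

    s = ((a >> (n - 1)) ^ (b >> (n - 1))) & 1
    q1 = s | (1 - t)                        # saturation value for bit n-1
    q2 = (1 - s) | (1 - t)                  # saturation value for bits 0 .. n-2

    out = 0
    for i in range(n):                      # Eqs. (31)-(32): 2-to-1 mux on V_c
        q = q1 if i == n - 1 else q2
        out |= (q if v_c else bit(i)) << i
    return out
```

With t = 0 the XOR with w has no effect and the flag reduces to Equation (2); with t = 1 it reduces to Equation (17).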
Figure 8 shows the block diagram of a combined 8-bit array multiplier that performs conventional saturating multiplication. This figure is identical to the two's complement multiplier shown in Figure 5, except that $(n - 1)$ NAND gates are replaced by conditional AND (CAND) gates, $(n - 1)$ NMFAs are replaced by conditional MFAs (CMFAs), the SHA is replaced by an FA, the XAND cell is replaced by an AND-XOR-INV-OR (AXIO) cell, one inverter is changed to an XOR gate, and one AND gate is added to the second row from the bottom of the array. Each CAND gate contains one AND gate and one XOR gate that compute $\widetilde{a_{n-1} b_i} = (a_{n-1} b_i) \oplus t$, which complements $a_{n-1} b_i$ when t = 1. Similarly, each CMFA contains an XOR gate that complements $a_i b_{n-1}$ when t = 1. The AXIO cell computes $\overline{t}$, $q_1$, and $q_2$. The FA in the bottom right corner of the array adds the t in column n, and the XOR gate in the third row from the bottom adds the t in column $2n - 1$. The AND gate in the second row from the bottom computes $w = t\, p_{n-1}$. An n-bit combined saturating array multiplier that uses the conventional technique has 2 inverters, $(n^2 + 1)$ AND gates, n OR gates, 3n XOR gates, $(n - 1)$ HAs, $(n^2 - 2n + 1)$ FAs, and one n-bit 2-to-1 mux. The worst case delay is approximately equal to the delay through one AND gate, one OR gate, three XOR gates, one 2-to-1 mux, one HA, and $(2n - 3)$ FAs.
Figure 8. Conventional 8-bit combined unsigned and two's complement saturating array multiplier.
Our proposed technique for combined saturating multiplication takes advantage of common terms in the overflow detection equations for unsigned and two's complement multiplication. Since the unsigned overflow detection equations use $a_i$ and $b_i$ and the two's complement overflow detection equations use $\hat{a}_i$ and $\hat{b}_i$, these equations are unified by defining
$$\tilde{a}_i = a_i \oplus (t\, a_{n-1}) \quad \text{and} \quad \tilde{b}_i = b_i \oplus (t\, b_{n-1}) \tag{33}$$
for $1 \le i \le n - 2$. This gives $a_i$ and $b_i$ when t = 0, and $\hat{a}_i$ and $\hat{b}_i$ when t = 1. Extracting the terms common to Equations (12) and (27) gives
$$V_c' = \sum_{i=2}^{n-2} \sum_{j=2}^{i} \tilde{a}_{n-j} \tilde{b}_i \tag{34}$$
With our technique, a preliminary combined overflow flag is computed using the iterative equations
$$o_{i+1} = o_i + \tilde{a}_{n-i} \tag{35}$$
$$v_{i+1} = v_i + o_{i+1} \tilde{b}_i \tag{36}$$
for $3 \le i \le n - 2$. Initially, $o_3 = \tilde{a}_{n-2}$ and $v_3 = \tilde{a}_{n-2} \tilde{b}_2$. After $(n - 4)$ iterations of Equations (35) and (36), the preliminary overflow flag is $V_c' = v_{n-1}$.
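As a sanity check on the recurrence, the following sketch compares the double OR of Equation (34) with the iteration of Equations (35) and (36). The function names are hypothetical, and the conditionally complemented operand bits of Equation (33) are held in dictionaries keyed by bit index; n is assumed to be at least 4 so that the initial values exist.

```python
def unified_bits(x: int, n: int, t: int) -> dict:
    """Eq. (33): bit i of x XOR (t AND x_{n-1}), for 1 <= i <= n-2."""
    msb = (x >> (n - 1)) & 1
    return {i: ((x >> i) & 1) ^ (t & msb) for i in range(1, n - 1)}


def vc_prime_direct(a: int, b: int, n: int, t: int) -> int:
    """Double OR of Eq. (34)."""
    at, bt = unified_bits(a, n, t), unified_bits(b, n, t)
    v = 0
    for i in range(2, n - 1):           # i = 2 .. n-2
        for j in range(2, i + 1):       # j = 2 .. i
            v |= at[n - j] & bt[i]
    return v


def vc_prime_iterative(a: int, b: int, n: int, t: int) -> int:
    """Recurrence of Eqs. (35)-(36); returns v_{n-1}."""
    at, bt = unified_bits(a, n, t), unified_bits(b, n, t)
    o = at[n - 2]                       # o_3
    v = o & bt[2]                       # v_3
    for i in range(3, n - 1):           # i = 3 .. n-2, i.e. (n - 4) iterations
        o |= at[n - i]                  # Eq. (35)
        v |= o & bt[i]                  # Eq. (36)
    return v
```

The two functions should agree for every operand pair and both values of t, which is what sharing one chain of OVD cells between the two modes relies on.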
Figure 9. Proposed 8-bit combined unsigned and two's complement saturating array multiplier.
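The remaining terms for unsigned overflow detection not included in Equation (34) are
$$r_u = \left( b_{n-1} \sum_{i=1}^{n-2} \tilde{a}_i \right) + \left( a_{n-1} \sum_{i=1}^{n-2} \tilde{b}_i \right) = b_{n-1} (\tilde{a}_1 + o_{n-1}) + a_{n-1} r_u' \tag{37}$$
where
$$r_u' = \sum_{i=1}^{n-2} \tilde{b}_i \tag{38}$$
The remaining terms for two's complement overflow detection not included in Equation (34) are
$$r_t = \sum_{i=1}^{n-2} \tilde{a}_i \tilde{b}_{n-i-1} \tag{39}$$
The final overflow flag is computed as
$$V_c = V_c' + (r_t + p_n \oplus p_{n-1})\, t + (b_{n-1} (\tilde{a}_1 + o_{n-1}) + a_{n-1} r_u' + p_n)\, \overline{t} \tag{40}$$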
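The split into common and remaining terms can be examined by treating each product term $a_{n-j} b_i$ as an index pair (j, i). The sketch below uses hypothetical function names and only enumerates the index sets of Equations (12), (27), and (34); it does not model any gates.

```python
def unsigned_terms(n: int) -> set:
    """Index pairs (j, i) of the terms a_{n-j} b_i in Eq. (12)."""
    return {(j, i) for i in range(1, n) for j in range(1, i + 1)}


def twos_terms(n: int) -> set:
    """Index pairs (j, i) of the terms in Eq. (27)."""
    return {(2, 1)} | {(j, i) for i in range(2, n - 1) for j in range(2, i + 2)}


def common_terms(n: int) -> set:
    """Index pairs (j, i) of the terms in Eq. (34)."""
    return {(j, i) for i in range(2, n - 1) for j in range(2, i + 1)}


n = 8
shared = unsigned_terms(n) & twos_terms(n)
left_unsigned = unsigned_terms(n) - shared    # terms with j = 1 or i = n-1
left_twos = twos_terms(n) - shared            # terms with j = i + 1
```

For n = 8 the shared set matches Equation (34). The two's complement leftovers are the pairs with j = i + 1, which is Equation (39) after re-indexing, and every unsigned leftover involves $a_{n-1}$ or $b_{n-1}$.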
An 8-bit combined saturating array multiplier that uses our proposed technique is shown in Figure 9. This multiplier is similar to the proposed saturating two's complement multiplier, shown in Figure 6, except the two NAND gates in the upper left corner are replaced by CAND gates, the NA2X and NMFA in the bottom right corner are replaced by a complementing A2X (CA2X) and a complementing MFA (CMFA), and the two's complement overflow detection unit is replaced by a combined overflow detection unit. The combined overflow detection unit uses $(2n - 4)$ XOR gates and two AND gates to compute $\tilde{a}_1$ to $\tilde{a}_{n-2}$ and $\tilde{b}_1$ to $\tilde{b}_{n-2}$, an AND gate to compute $v_3$, $(n - 4)$ OVD cells to compute $o_4$ to $o_{n-1}$ and $v_4$ to $v_{n-1}$, $(n - 2)$ OR gates to compute $r_u'$, an AND gate and $(n - 3)$ OR-AND (OA) cells to compute $r_t$, and combined overflow detection logic to compute $V_c$, $q_1$, and $q_2$. The combined overflow detection logic requires two inverters, four AND gates, seven OR gates, and two XOR gates.
An n-bit combined saturating array multiplier that uses our proposed technique requires two inverters, $(n^2 + 7n)/2$ AND gates, $(3n - 2)$ OR gates, $(4n - 1)$ XOR gates, $(n - 1)$ HAs, $(n^2 - 3n + 2)/2$ FAs, and one n-bit 2-to-1 mux. The worst case delay of our proposed combined saturating array multiplier is approximately equal to the delay through one inverter, two AND gates, two OR gates, three XOR gates, one 2-to-1 mux, one HA, and $(n - 2)$ FAs.
Table 1. Component counts for n-bit saturating array multipliers.

                         Number of components
Multiplier type          INV   AND                 OR       XOR      HA      FA                  Mux
Conventional unsigned    0     n^2                 2n - 1   0        n       n^2 - 2n            0
Proposed unsigned        0     (n^2 + 5n - 6)/2    3n - 3   2n - 3   n - 1   (n^2 - 3n + 2)/2    0
Conventional signed      2n    n^2                 n - 1    n + 1    n       n^2 - 2n            n
Proposed signed          5     (n^2 + 5n - 6)/2    2n - 5   4n - 5   n - 1   (n^2 - 3n + 2)/2    n
Conventional combined    2     n^2 + 1             n        3n       n - 1   n^2 - 2n + 1        n
Proposed combined        2     (n^2 + 7n)/2        3n - 2   4n - 1   n - 1   (n^2 - 3n + 2)/2    n

Table 2. Components on the critical delay path for n-bit saturating array multipliers.

                         Number of components
Multiplier type          INV   AND   OR   XOR   HA   FA       Mux
Conventional unsigned    0     1     3    0     2    2n - 4   0
Proposed unsigned        0     1     2    1     1    n - 2    0
Conventional signed      1     1     2    1     2    2n - 4   1
Proposed signed          1     1     2    2     1    n - 2    1
Conventional combined    0     1     1    3     1    2n - 3   1
Proposed combined        1     2     2    3     1    n - 2    1

Table 3. Area and delay estimates for 32-bit saturating multipliers.

                   Area in equivalent gates              Delay in nanoseconds
Multiplier type    Conventional   Proposed   Savings     Conventional   Proposed   Savings
Unsigned           11934          6352       46.8%       51.34          26.46      48.5%
Signed             12129          6571       45.8%       52.01          27.41      47.2%
Combined           12265          6720       45.2%       52.93          28.20      46.7%
5. RESULTS AND CONCLUSIONS
Tables 1 and 2 summarize the total number of components and the number of each component on the critical delay path for n-bit saturating array multipliers. Table 1 shows that the proposed techniques reduce the number of AND gates and FAs required to implement saturating multipliers by nearly a factor of two for large values of n. Since these components dominate the area of array multipliers, this leads to a significant reduction in area. Table 2 shows that the proposed techniques reduce the number of FAs on the critical delay path by about a factor of two, with little or no increase in the number of the other components.
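To see how close to a factor of two the AND and FA counts come at a practical width, the Table 1 formulas can be evaluated directly. The sketch below uses a hypothetical function name and covers only the AND and FA columns.

```python
def and_fa_counts(n: int) -> dict:
    """AND-gate and full-adder counts from Table 1 for an n-bit multiplier."""
    proposed_fa = (n * n - 3 * n + 2) // 2
    return {
        "conventional unsigned": {"AND": n * n, "FA": n * n - 2 * n},
        "proposed unsigned": {"AND": (n * n + 5 * n - 6) // 2, "FA": proposed_fa},
        "conventional signed": {"AND": n * n, "FA": n * n - 2 * n},
        "proposed signed": {"AND": (n * n + 5 * n - 6) // 2, "FA": proposed_fa},
        "conventional combined": {"AND": n * n + 1, "FA": n * n - 2 * n + 1},
        "proposed combined": {"AND": (n * n + 7 * n) // 2, "FA": proposed_fa},
    }


counts = and_fa_counts(32)
fa_ratio = counts["conventional unsigned"]["FA"] / counts["proposed unsigned"]["FA"]
and_ratio = counts["conventional unsigned"]["AND"] / counts["proposed unsigned"]["AND"]
```

For n = 32 the unsigned designs need 960 versus 465 FAs and 1024 versus 589 AND gates, so the FA ratio is slightly above two while the AND ratio is somewhat below it.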
Table 3 gives area and delay estimates for 32-bit saturating multipliers. Area and delay estimates were based on data from LSI Logic's 0.6 micron LCA300K gate array library and the Leonardo synthesis tool from Exemplar Logic. The estimates given assume a nominal operating voltage and temperature of 5.0 Volts and 25 °C, respectively. From these estimates, 32-bit saturating multipliers implemented using our proposed techniques have between 45.2% and 46.8% less area and between 46.7% and 48.5% less delay than saturating multipliers that use conventional methods. Modifying our proposed 32-bit unsigned saturating array multiplier so that it performs both unsigned and two's complement saturating multiplication increases the area by only 5.8% and the delay by only 6.6%.
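The percentages quoted above follow from the raw Table 3 figures. The short sketch below recomputes them; the dictionary layout and function names are illustrative choices, and the numbers are copied from Table 3.

```python
# (conventional, proposed) pairs copied from Table 3.
TABLE3 = {
    "Unsigned": {"area": (11934, 6352), "delay": (51.34, 26.46)},
    "Signed": {"area": (12129, 6571), "delay": (52.01, 27.41)},
    "Combined": {"area": (12265, 6720), "delay": (52.93, 28.20)},
}


def savings_percent(conventional: float, proposed: float) -> float:
    """Relative reduction of the proposed design, in percent."""
    return 100.0 * (conventional - proposed) / conventional


def increase_percent(base: float, modified: float) -> float:
    """Relative growth from base to modified, in percent."""
    return 100.0 * (modified - base) / base


area_savings = {k: savings_percent(*v["area"]) for k, v in TABLE3.items()}
delay_savings = {k: savings_percent(*v["delay"]) for k, v in TABLE3.items()}
combined_area_cost = increase_percent(TABLE3["Unsigned"]["area"][1],
                                      TABLE3["Combined"]["area"][1])
combined_delay_cost = increase_percent(TABLE3["Unsigned"]["delay"][1],
                                       TABLE3["Combined"]["delay"][1])
```

The recomputed values agree with the tabulated percentages to within rounding of the last digit.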
The proposed unsigned, two's complement, and combined saturating array multipliers presented in this paper have about half as much area and delay as saturating array multipliers that use conventional methods. These decreases in area and delay occur because the hardware that computes the $(n - 1)$ most significant product bits is replaced by simple overflow detection units. These multipliers should also dissipate significantly less power due to their decreased hardware requirements. The techniques presented in this paper can easily be adapted to designs for multipliers that use other structures or to multipliers that support both saturating and wraparound arithmetic.
ACKNOWLEDGMENTS
This material is based upon work supported by Sandia National Laboratories under Grant Number BF-1029 and the National Science Foundation under Grant Number MIP-9703421. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Sandia National Laboratories or the National Science Foundation.
REFERENCES
1. A. Peleg and U. Weiser, "MMX Technology Extension to the Intel Architecture," IEEE Micro 6, pp. 42-50, August 1996.
2. F. Mintzer and A. Peled, "A Microprocessor for Signal Processing, the RSP," IBM Journal of Research & Development 26, pp. 413-423, July 1982.
3. N. Yadav, M. J. Schulte, and J. Glossner, "Parallel Saturating Fractional Arithmetic Units," in Proceedings of the Ninth Great Lakes Symposium on VLSI, pp. 214-217, March 1999.
4. K. Diefendorff, P. Dubey, R. Hochsprung, and H. Scales, "AltiVec Extension to PowerPC Accelerates Media Processing," IEEE Micro 20(2), pp. 85-95, 2000.
5. M. Tremblay, "VIS Speeds New Media Processing," IEEE Micro 16(4), pp. 10-20, 1996.
6. R. Lee, "Accelerating Multimedia with Enhanced Processors," IEEE Micro 15(2), pp. 22-32, 1995.
7. P. Lapsley, DSP Processor Fundamentals: Architectures and Features, IEEE Press, 1997.
8. C. R. Baugh and B. A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm," IEEE Transactions on Computers C-22, pp. 1045-1047, December 1973.
9. K. Z. Pekmestzi, "Multiplexer-Based Array Multipliers," IEEE Transactions on Computers C-48, pp. 15-23, January 1999.
10. L. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza 34, pp. 349-356, 1965.
11. C. S. Wallace, "Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers EC-13, pp. 14-17, 1964.
12. K. Bickerstaff, M. J. Schulte, and E. E. Swartzlander, Jr., "Parallel Reduced Area Multipliers," Journal of VLSI Signal Processing 9, pp. 181-192, 1995.
13. A. D. Booth, "A Signed Binary Multiplication Technique," Quarterly Journal of Mechanics and Applied Mathematics 4, pp. 236-240, 1951.
14. H. Sam and A. Gupta, "A Generalized Multibit Recoding of Two's Complement Binary Numbers and Its Proof with Application in Multiplier Implementations," IEEE Transactions on Computers 39(8), pp. 1006-1015, 1990.
15. P. J. Song and G. D. Micheli, "Circuit and Architecture Trade-offs for High-Speed Multiplication," IEEE Journal of Solid-State Circuits 26, pp. 1184-1198, September 1991.
16. M. J. Schulte, P. I. Balzola, A. Akkas, and R. W. Brocato, "Integer Multiplication with Overflow Detection or Saturation," IEEE Transactions on Computers 49(7), 2000.
17. I. Koren, Computer Arithmetic and Algorithms, Brookside Court Publishers, 1998.
18. J. A. Gibson and R. W. Gibbard, "Synthesis and Comparison of Two's Complement Parallel Multipliers," IEEE Transactions on Computers C-24, pp. 1020-1027, October 1975.
19. M. Gok, "Integer Multiplication with Overflow Detection or Saturation," Master's thesis, Lehigh University, 19 Memorial Dr. West, Bethlehem, PA, 18015, 2000.
20. Y. Baydatch and R. Hasharon, "Overflow Detection for Integer Multiply Instruction," Patent No. 5,801,978, September 1998.