EE 5324 – VLSI Design II
Part VII: Floating Point Arithmetic
Kia Bazargan
University of Minnesota
Spring 2006
Floating-Point vs. Fixed-Point Numbers
• Fixed point has limitations
x = 0000 0000.0000 1001₂
y = 1001 0000.0000 0000₂
Rounding?
Overflow? (x² and y² under/overflow)
• Floating point: represent numbers in two fixed-width fields: “magnitude” and “exponent”
Magnitude: more bits = more accuracy
Exponent: more bits = wider range of numbers
X = [ s | e | m ]   (sign bit s, ± exponent e, magnitude m)
Floating Point Number Representation
• Sign field:
When 0: positive number, when 1, negative
• Exponent:
Usually represented as unsigned by adding an offset
Example: 4 bits of exponent, offset=8
o Exp = 1001₂ → e = 1001₂ − 1000₂ = 0001₂ = 1
o Exp = 0010₂ → e = 0010₂ − 1000₂ = 1010₂ = −6
• Magnitude (also called significand, mantissa)
Shift the number to get: 1.xxxx
Magnitude is the fractional part (hidden ‘1’)
Example: 6 bits of mantissa
o Number = 110.0101 → shift: 1.100101 (× 2²) → mantissa = 100101
o Number = 0.0001011 → shift: 1.011 (× 2⁻⁴) → mantissa = 011000
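As a concrete illustration of this normalize-and-extract step, a minimal Python sketch for the 6-bit mantissa examples above (the function name, the use of Python floats, and truncation instead of rounding are illustrative assumptions):

```python
def encode_mantissa(x, m_bits=6):
    """Normalize x > 0 to 1.xxxx * 2^e; return (e, mantissa bits after the hidden '1')."""
    assert x > 0
    e = 0
    while x >= 2.0:          # shift right until the value drops below 2
        x /= 2.0
        e += 1
    while x < 1.0:           # shift left until the hidden '1' appears
        x *= 2.0
        e -= 1
    mantissa = int((x - 1.0) * (1 << m_bits))    # keep m_bits fractional bits (truncate)
    return e, format(mantissa, '0{}b'.format(m_bits))

# The two examples from this slide:
print(encode_mantissa(0b1100101 / 2**4))   # 110.0101   -> (2, '100101')
print(encode_mantissa(0b1011 / 2**7))      # 0.0001011  -> (-4, '011000')
```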
Floating Point Numbers: Example
Format: X = [ s | e (+bias) | m ],   value X = ±1.m × 2^e

X1 = 0 | 1010 | 0011101   →   X1 = +1.0011101 × 2²
X2 = 0 | 0010 | 1000000   →   X2 = +1.1000000 × 2⁻⁶
X3 = 1 | 1011 | 0000001   →   X3 = −1.0000001 × 2³
X4 = 0 | 0000 | 0000000   →   X4 = +1.0000000 × 2⁻⁸ = 0
X5 = 0 | 1111 | 0000000   →   X5 = +1.0000000 × 2⁷ = +∞
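A matching decode sketch for the format above (4-bit exponent with bias 8, 7-bit mantissa, hidden '1'); the special all-0s/all-1s exponent codes used for X4 and X5 are not handled, and the helper name is an illustrative assumption:

```python
def decode(sign, exp_bits, man_bits, man_width=7, bias=8):
    """Decode a (sign, biased exponent, mantissa) triple into a Python float."""
    e = int(exp_bits, 2) - bias               # remove the bias
    m = int(man_bits, 2) / (1 << man_width)   # fraction that follows the hidden '1'
    return (-1) ** sign * (1.0 + m) * 2.0 ** e

print(decode(0, '1010', '0011101'))   # X1 = +1.0011101 * 2^2  =  4.90625
print(decode(0, '0010', '1000000'))   # X2 = +1.1000000 * 2^-6 =  0.0234375
print(decode(1, '1011', '0000001'))   # X3 = -1.0000001 * 2^3  = -8.0625
```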
Floating Point Number Range
• Range: [−max, −min] ∪ [min, max]
Min = smallest magnitude × 2^(smallest exponent)
Max = largest magnitude × 2^(largest exponent)
• What happens if:
We increase # bits for exponent?
Increase # bits for magnitude?
• Ref:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
ftp://download.intel.com/technology/itj/q41999/pdf/ia64fpbf.pdf
[Figure: the real number line, with FLP⁻ = [−max, −min] and FLP⁺ = [min, max]; values beyond ±max fall in the negative/positive overflow regions, and nonzero values between −min and min fall in the underflow region. Representable numbers are denser near ±min and sparser near ±max. © Oxford U Press]
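A small sketch of how the range bounds follow from the field widths; the usable exponent range is an assumption (here −7 … 6, since the all-0s and all-1s exponent codes are reserved for 0 and +∞ as in the earlier examples):

```python
def flp_range(man_width, e_min, e_max):
    """Smallest and largest positive magnitudes of a hidden-'1' floating-point format."""
    smallest = 1.0                          # smallest significand: 1.000...0
    largest = 2.0 - 2.0 ** (-man_width)     # largest significand:  1.111...1
    return smallest * 2.0 ** e_min, largest * 2.0 ** e_max

# 7-bit mantissa, 4-bit exponent, bias 8, exponents -7..6 usable:
print(flp_range(7, -7, 6))   # (0.0078125, 127.5)
```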
Floating Point Operations
• Addition/subtraction, multiplication/division,
function evaluations, ...
• Basic operations
Adding exponents / magnitudes
Multiplying magnitudes
Aligning magnitudes (shifting, adjusting the
exponent)
Rounding
Checking for overflow/underflow
Normalization (shifting, adjusting the exponent)
Floating Point Addition
• More difficult than multiplication!
• Operations:
Align magnitudes (so that exponents are equal)
Add (and round)
Normalize (result in the form of 1.xxx)
x   = 0 | 1011 | 0011101      x = +1.0011101 × 2³
y   = 0 | 1000 | 1010011      y = +1.1010011 × 2⁰
y   = 0 | 1011 | 0011010      y = +0.0011010 × 2³   (aligned to x's exponent)
x+y = 0 | 1011 | 0110111      x+y = +1.0110111 × 2³
No need to normalize in this case
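A minimal sketch of the align/add/normalize flow, modeling the significands (hidden '1' plus 7 fractional bits) as integers; this is an illustrative software model of positive-operand addition that truncates the shifted-out bits instead of rounding:

```python
def fp_add(e1, sig1, e2, sig2, frac_bits=7):
    """Add two positive operands given as (exponent, significand * 2^frac_bits)."""
    if e1 < e2:                       # keep the larger-exponent operand first
        e1, sig1, e2, sig2 = e2, sig2, e1, sig1
    sig2 >>= (e1 - e2)                # align: shift the smaller operand right (truncate)
    s, e = sig1 + sig2, e1            # add magnitudes
    if s >= (2 << frac_bits):         # sum >= 2.0: normalize with a 1-bit right shift
        s >>= 1
        e += 1
    return e, s

# Slide example: x = +1.0011101 * 2^3,  y = +1.1010011 * 2^0
e, s = fp_add(3, 0b10011101, 0, 0b11010011)
print(e, format(s, '08b'))   # 3 10110111  ->  x + y = +1.0110111 * 2^3
```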
Floating Point Adder Architecture
[Block diagram: Unpack → subtract exponents / complement and swap operands → align magnitudes → add magnitudes (with sign logic driven by the adder's C_in/C_out) → normalize and adjust exponent → round/complement → adjust exponent and normalize → Pack. © Oxford U Press]
Floating Point Adder Components
• Unpacking
Inserting the “hidden 1”
Checking for special inputs (NaN, zero)
• Exponent difference
Used in aligning the magnitudes
A few bits are enough for the subtraction
o e.g., with a 32-bit magnitude adder and 8 exponent bits, only 5 bits take part in the
subtraction (any alignment shift larger than 31 positions flushes the smaller operand anyway)
If the difference is negative: swap the operands and use the positive difference
o How to compute the positive diff? (a small sketch follows this slide)
• Pre-shifting and swap
Shift/complement provided for one operand only
Swap if needed
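The sketch referred to above: one way to obtain the positive exponent difference and the swap decision (purely illustrative; real designs often use one subtractor plus a conditional complement instead):

```python
def exp_diff_and_swap(e1, m1, e2, m2):
    """Return the alignment shift amount plus the operands ordered larger-exponent first."""
    d = e1 - e2
    if d < 0:                               # negative difference: swap, use the positive diff
        return -d, (e2, m2), (e1, m1)
    return d, (e1, m1), (e2, m2)

print(exp_diff_and_swap(0, 0b11010011, 3, 0b10011101))   # (3, (3, 157), (0, 211))
```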
Floating Point Adder Components (cont.)
• Rounding
Three extra bits used for rounding
• Post-shifting
Result in the range (−4, 4):  z = C_out z₁ z₀ . z₋₁ z₋₂ …
Right shift: 1 bit max
o If C_out or z₁ → right shift
Left shift: up to # of bits in magnitude
o Determine # of consecutive 0's (1's) in z, beginning with z₁; adjust the exponent
accordingly (see the sketch after this slide)
• Packing
Check for special results (zero, under-/overflow)
Remove the hidden 1
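The sketch referred to above: determining the left-shift amount from the leading zeros of the (positive) sum and adjusting the exponent; the field width and function name are illustrative assumptions:

```python
def normalize_left(sig, e, width):
    """Left-shift a positive sum until its MSB is 1, decreasing the exponent accordingly."""
    if sig == 0:
        return sig, e                  # exact zero: nothing to normalize
    lz = width - sig.bit_length()      # number of leading zeros in a width-bit field
    return sig << lz, e - lz

# e.g. 0.0011010 * 2^3 (8-bit field 00011010) normalizes to 1.1010000 * 2^0
print(normalize_left(0b00011010, 3, 8))   # (208, 0), i.e. 11010000
```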
Counting vs. Predicting Leading Zeros/Ones
Two alternatives for determining the post-shift amount:
Counting: magnitude adder → count leading 0s/1s → shift amount → post-shifter (with the exponent adjusted by the count). Simpler, but the count sits on the critical path.
Predicting: a leading 0/1 predictor operates in parallel with the magnitude adder, so the shift amount and exponent adjustment are ready when the sum is. More complex architecture.
[© Oxford U Press]
Floating Point Multiplication
• Simpler than floating-point addition
• Operation:
Inputs: z1 = ±1.m1 × 2^e1,   z2 = ±1.m2 × 2^e2
Output = ±(1.m1 × 1.m2) × 2^(e1+e2)
Sign: XOR
Exponent:
o Tentatively computed as e1+e2
o Subtract the bias (=127) HOW?
o Adjusted after normalization
Magnitude
o Result in the range [1, 4)  (inputs are in the range [1, 2))
o Normalization: 1- or 2-bit right shift, depending on rounding
o Result is 2(1+m) bits; should be rounded to (1+m) bits
o Rounding can gradually discard bits, instead of in one last stage
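A minimal sketch of this flow (sign, exponent, magnitude), reusing the toy bias-8, 7-fractional-bit format from the earlier slides instead of the IEEE bias of 127, and truncating instead of rounding; the function name is an illustrative assumption:

```python
def fp_mul(s1, e1, sig1, s2, e2, sig2, frac_bits=7, bias=8):
    """Multiply two (sign, biased exponent, significand) operands; truncating model."""
    sign = s1 ^ s2                      # sign: XOR
    e = e1 + e2 - bias                  # tentative exponent: e1 + e2, bias subtracted once
    p = (sig1 * sig2) >> frac_bits      # product of significands, extra fraction bits dropped
    if p >= (2 << frac_bits):           # product >= 2.0: 1-bit right shift to normalize
        p >>= 1
        e += 1
    return sign, e, p

# 1.1000000 * 1.1000000 = 1.5 * 1.5 = 2.25:
print(fp_mul(0, 8, 0b11000000, 0, 8, 0b11000000))   # (0, 9, 144) = +1.0010000 * 2^1
```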
Floating Point Multiplier Architecture
Floating-point operands
[Block diagram: Unpack → XOR (sign), Add Exponents, and Multiply Magnitudes in parallel → Adjust Exponent / Normalize → Round → Adjust Exponent / Normalize → Pack → Product.
Note: pipelining is used inside the magnitude multiplier as well as at block boundaries.
© Oxford U Press]
Square-Rooting
• Most important elementary function
• In the IEEE standard, square root is specified as a basic operation (alongside +, −, ×, /)
• Very similar to division
• Pencil-and-paper method:
Radicand:            z = z₂ₖ₋₁ z₂ₖ₋₂ … z₁ z₀     (2k digits)
Square root:         q = qₖ₋₁ qₖ₋₂ … q₁ q₀       (k digits)
Remainder (z − q²):  s = sₖ sₖ₋₁ … s₁ s₀         (k+1 digits)
Square Rooting: Example
• Example: sqrt(9 52 41)    (radicand digits grouped in pairs; root q = q₂q₁q₀)

q(0) = 0
q₂ = 3:  largest digit with q₂ × q₂ ≤ 9;  9 − 9 = 0;                          q(1) = 3
q₁ = 0:  bring down 52; largest q₁ with 6q₁ × q₁ ≤ 52 (double 3, append q₁);  q(2) = 30
q₀ = 8:  bring down 41; largest q₀ with 60q₀ × q₀ ≤ 5241 is 8 (608 × 8 = 4864);
         5241 − 4864 = 377;                                                    q(3) = 308

q = 308,  remainder s = 377
(At each step the partial root is doubled and the next root digit is appended to it.)
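A Python sketch of this pencil-and-paper recurrence (two radicand digits consumed per root digit; the partial root is doubled and the trial digit appended before the comparison); the function name is an illustrative assumption:

```python
def decimal_sqrt(z):
    """Digit-by-digit (restoring) square root in base 10: returns (root, remainder)."""
    digits = str(z)
    if len(digits) % 2:                        # group the radicand into pairs of digits
        digits = '0' + digits
    q, rem = 0, 0
    for i in range(0, len(digits), 2):
        rem = rem * 100 + int(digits[i:i+2])   # bring down the next two digits
        d = 9
        while (20 * q + d) * d > rem:          # largest d with (2q appended with d) * d <= rem
            d -= 1
        rem -= (20 * q + d) * d
        q = 10 * q + d                         # append the new root digit
    return q, rem

print(decimal_sqrt(95241))   # (308, 377)
```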
Square Rooting: Example (cont.)
• Why double the partial root?
Partial root after step 2 is: q(2) = 30
Appending the next digit q₀ gives 10 × q(2) + q₀,
whose square is 100 × (q(2))² + 20 × q(2) × q₀ + q₀²
The term 100 × (q(2))² has already been subtracted
Find q₀ such that (10 × (2 × q(2)) + q₀) × q₀ is the largest such value ≤ the partial remainder
• The binary case:
The square of 2 × q(2) + q₀ is: 4 × (q(2))² + 4 × q(2) × q₀ + q₀²
Find q₀ such that (4 × q(2) + q₀) × q₀ ≤ the partial remainder
For q₀ = 1, the expression becomes 4 × q(2) + 1 (i.e., append “01” to the partial root); a
sketch of the resulting algorithm follows.
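The sketch referred to above: the same recurrence in base 2, where a trial digit of 1 means subtracting the partial root with “01” appended (4q + 1); the function name is an illustrative assumption:

```python
def binary_sqrt(z, k):
    """Binary digit-by-digit square root of a 2k-bit radicand: returns (root, remainder)."""
    q, rem = 0, 0
    for i in range(k - 1, -1, -1):
        rem = (rem << 2) | ((z >> (2 * i)) & 0b11)   # bring down the next two radicand bits
        trial = (q << 2) | 1                         # 4q + 1: partial root with '01' appended
        if trial <= rem:                             # can we afford a 1 digit?
            rem -= trial
            q = (q << 1) | 1
        else:
            q <<= 1                                  # 0 digit; remainder is kept (restoring)
    return q, rem

print(binary_sqrt(0b01110110, 4))   # sqrt(118) -> (10, 18): q = 1010, s = 10010
```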
Square Rooting: Example Base 2
• Example: sqrt(01110110₂) = sqrt(118)    (radicand bits grouped in pairs: 01 11 01 10)

q(0) = 0,   z = (118)₁₀
q₃ = 1:  01 − 01 = 0;                                                 q(1) = 1
q₂ = 0:  bring down 11;  101 ≤ 011?  No;                              q(2) = 10
q₁ = 1:  bring down 01;  1001 ≤ 01101?  Yes → 01101 − 1001 = 00100;   q(3) = 101
q₀ = 0:  bring down 10;  10101 ≤ 10010?  No;                          q(4) = 1010

q = 1010₂ = 10₁₀,   remainder s = 10010₂ = 18₁₀
Sequential Shift/Subtract Square Rooter Architecture
[Block diagram: a partial remainder register (loaded with z − 1 at the outset) and the developing square-root register feed an (l+2)-bit adder that forms the trial difference; the “select root digit” logic uses the MSB of 2s(j−1) and the adder's C_out to choose the root digit q₋ⱼ, which is shifted into the square-root register and drives the adder's subtract/complement control (C_in). © Oxford U Press]
Other Methods for Square Rooting
• Restoring vs. non-restoring
We looked at the restoring algorithm
(after subtraction, restore partial remainder if the
result is negative)
Non-restoring:
Use a different encoding (use digits {-1,1} instead of
{0,1}) to avoid restoring
• High-radix
Similar to modified Booth encoding in multiplication: handle more bits at a time
More complex circuit, but faster
Other Methods for Square Rooting (cont.)
• Convergence methods
Use the Newton method to find a root of the function
  f(x) = x² − z, whose root is x = √z
OR
  f(x) = 1/x² − z, whose root is x = 1/√z;
  multiply the result by z to get √z
Iteratively improve the accuracy
Can use lookup table for the first iteration
Square Rooting: Abstract Notation
        z
− q₃ × (q(0) 0q₃) × 2⁶
− q₂ × (q(1) 0q₂) × 2⁴
− q₁ × (q(2) 0q₁) × 2²
− q₀ × (q(3) 0q₀) × 2⁰
──────────────────────
        s        (the root digits q = q₃q₂q₁q₀ are written above z, as in long division)
Floating point format:
- Shift left (not right)
- Powers of 2 decreasing
Restoring Floating-Point Square Root Calc.
z = 01.110110  (118/64)

q(0) = 1.        s(0) = z − 1 = 00.110110
At each step, form 2s(j−1) and try subtracting [2 × q(j−1) + 2⁻ʲ]:
  if the trial difference is ≥ 0, the root digit q₋ⱼ is 1 and s(j) is that difference;
  if it is negative, q₋ⱼ is 0 and the partial remainder is restored: s(j) = 2s(j−1).

  q₋₁ = 0 (restore)    q(1) = 1.0
  q₋₂ = 1              q(2) = 1.01
  q₋₃ = 0 (restore)    q(3) = 1.010
  (continued on the next slide)
[© Oxford U Press]
Restoring Floating-Point Sq. Root Calc. (cont.)
(continuing)
  q₋₄ = 1              q(4) = 1.0101
  q₋₅ = 1              q(5) = 1.01011
  q₋₆ = 0 (restore)    q(6) = 1.010110   (86/64)

s(6) = 010.011100  (156/64)
True remainder:  z − q² = 2⁻⁶ × s(6) = 0.000010011100  (156/64²)
[© Oxford U Press]
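A sketch of the restoring recurrence above using exact rationals (the invariant is s(j) = 2^j × (z − q(j)²), so the true remainder is s(l)/2^l); the function name is an illustrative assumption:

```python
from fractions import Fraction

def restoring_sqrt(z, l):
    """Restoring square root of z in [1,2), developing l fractional root bits exactly."""
    q = Fraction(1)                    # q(0) = 1.
    s = z - 1                          # s(0) = z - 1
    for j in range(1, l + 1):
        w = Fraction(1, 2 ** j)
        trial = 2 * s - (2 * q + w)    # 2s(j-1) - [2 q(j-1) + 2^(-j)]
        if trial >= 0:
            s, q = trial, q + w        # root digit 1
        else:
            s = 2 * s                  # root digit 0: restore, keep 2s(j-1)
    return q, s / 2 ** l               # undo the 2^l scaling of the remainder

q, rem = restoring_sqrt(Fraction(118, 64), 6)
print(q, rem)   # 43/32 39/1024, i.e. q = 86/64 = 1.010110 and remainder 156/64^2
```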
Nonrestoring Floating-Point Square Root Calc.
z = 01.110110  (118/64)

q(0) = 1.        s(0) = z − 1
At each step the root digit is +1 or −1, chosen from the sign of the partial remainder:
  if s(j−1) ≥ 0:  q₋ⱼ = 1   and  s(j) = 2s(j−1) − [2 × q(j−1) + 2⁻ʲ]
  if s(j−1) < 0:  q₋ⱼ = −1  and  s(j) = 2s(j−1) + [2 × q(j−1) − 2⁻ʲ]

  q₋₁ = 1     q(1) = 1.1
  q₋₂ = −1    q(2) = 1.01
  q₋₃ = 1     q(3) = 1.011
  q₋₄ = −1    q(4) = 1.0101
  q₋₅ = 1     q(5) = 1.01011
  (continued on the next slide)
Nonrestoring FP Square Root Calc. (cont.)
(continuing)
  q₋₆ = 1     q(6) = 1.010111

s(6) is negative (−17/64), so the result must be corrected:
  q (signed-digit)     = 1 . 1 −1 1 −1 1 1
  q (corrected binary) = 1.010111   (87/64)
  drop the last ‘1’:   q = 1.010110   (86/64)
  s(6) (corrected) = 010.011100  (156/64),  true remainder = 156/64²

If the final s is negative, drop the last ‘1’ in q, and restore the remainder to the
last positive value.
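A matching sketch of the nonrestoring recurrence, with root digits in {−1, 1} and the final correction described above; the function name and the use of exact rationals are illustrative assumptions:

```python
from fractions import Fraction

def nonrestoring_sqrt(z, l):
    """Nonrestoring square root of z in [1,2): digits in {-1,1}, with final correction."""
    q = Fraction(1)                           # q(0) = 1.
    s = z - 1                                 # s(0) = z - 1
    for j in range(1, l + 1):
        w = Fraction(1, 2 ** j)
        if s >= 0:                            # digit +1: subtract [2 q(j-1) + 2^(-j)]
            s, q = 2 * s - (2 * q + w), q + w
        else:                                 # digit -1: add [2 q(j-1) - 2^(-j)]
            s, q = 2 * s + (2 * q - w), q - w
    if s < 0:                                 # final remainder negative: drop the last '1'
        q -= Fraction(1, 2 ** l)
        s += 2 * q + Fraction(1, 2 ** l)      # restore the remainder to the last positive value
    return q, s / 2 ** l                      # undo the 2^l scaling of the remainder

q, rem = nonrestoring_sqrt(Fraction(118, 64), 6)
print(q, rem)   # 43/32 39/1024 -- same root and remainder as the restoring version
```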
Square Root Through Convergence
• Newton-Raphson method:
Choose f(x) = x² − z
x(i+1) = x(i) − f(x(i)) / f′(x(i))
x(i+1) = 0.5 × (x(i) + z / x(i))
• Example: compute the square root of z = (2.4)₁₀
x(0) read out from a table = 1.5                      accurate to 10⁻¹
x(1) = 0.5 × (x(0) + 2.4/x(0)) = 1.550 000 000        accurate to 10⁻²
x(2) = 0.5 × (x(1) + 2.4/x(1)) = 1.549 193 548        accurate to 10⁻⁴
x(3) = 0.5 × (x(2) + 2.4/x(2)) = 1.549 193 338        accurate to 10⁻⁸
[Par00] p354
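A minimal Python sketch of this iteration (the table-lookup seed is passed in as a parameter; the names are illustrative assumptions):

```python
def newton_sqrt(z, x0, iters=3):
    """Newton-Raphson square root: x(i+1) = 0.5 * (x(i) + z / x(i))."""
    x = x0                        # seed, e.g. read out of a small lookup table
    for _ in range(iters):
        x = 0.5 * (x + z / x)     # each step roughly doubles the number of correct digits
    return x

print(newton_sqrt(2.4, 1.5, 1))   # ~1.55
print(newton_sqrt(2.4, 1.5, 3))   # ~1.549193338, sqrt(2.4) = 1.549193338...
```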
Non-Restoring Parallel Square Rooter
[Figure: an array square rooter built from identical cells, each containing a full adder and an XOR gate (controlled add/subtract); the radicand bits z₋₁ … z₋₈ enter the array, each row produces one root bit q₋₁ … q₋₄, and the bottom row leaves the remainder bits s₋₁ … s₋₈. © Oxford U Press]
Function Evaluation
• We looked at square root calculation
Direct hardware implementation (binary, BSD, high-radix)
o Serial
o Parallel
Approximation (Newton method)
• What about other functions?
Direct implementation
o Example: log2 x can be directly implemented in hardware
(using square root as a sub-component)
Polynomial approximation
Table look-up
o Either as part of the calculation or for the full calculation
Table Lookup
[Figure: two schemes.
Direct table-lookup implementation: the u-bit operand(s) directly address a 2^u × v table whose entries are the v-bit result(s).
Table lookup with pre- and post-processing: preprocessing logic transforms the u-bit operand(s), one or more smaller tables are consulted, and postprocessing logic combines their outputs into the v-bit result(s).
© Oxford U Press]
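A minimal sketch of the direct scheme: every possible u-bit operand gets a precomputed entry, so evaluation is a single read. The example function (reciprocal of 1.f on [1, 2)) and the names are illustrative assumptions:

```python
U = 8                                                        # operand width u in bits
TABLE = [1.0 / (1.0 + i / 2 ** U) for i in range(2 ** U)]    # 2^u precomputed results

def reciprocal_lookup(frac_bits):
    """Approximate 1/(1.f) by indexing the table with the u-bit fraction f."""
    return TABLE[frac_bits]

print(reciprocal_lookup(0b10000000))   # 1/1.5 = 0.666..., to the table's precision
```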
Linear Interpolation Using Four Subintervals
[Figure: the interval [x_min, x_max] is divided into four subintervals, and f(x) is approximated on subinterval i by a(i) + b(i)·x. The two leading bits of x form a 2-bit address into two 4-entry tables holding a(i) and b(i)/4; the remaining bits of x (4x, after moving the radix point) drive a multiplier whose output is added to a(i) to produce f(x). © Oxford U Press]
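A sketch of the four-subinterval scheme: the two leading bits of x select the table entries a(i) and b(i), and one multiply-add produces the approximation. The example function (sin on [0, 1)) and the chord-based table construction are illustrative assumptions:

```python
import math

F = math.sin                                   # example function to approximate (assumption)
XS = [i / 4 for i in range(5)]                 # subinterval endpoints 0, .25, .5, .75, 1
B = [(F(XS[i + 1]) - F(XS[i])) * 4 for i in range(4)]   # slope of each chord
A = [F(XS[i]) - B[i] * XS[i] for i in range(4)]         # intercept of each chord

def interp(x):
    """Approximate F(x) on [0,1): top 2 bits of x pick the subinterval, then one multiply-add."""
    i = int(x * 4)                             # 2-bit 'address' of the subinterval
    return A[i] + B[i] * x                     # a(i) + b(i) * x

print(interp(0.3), math.sin(0.3))   # ~0.2938 vs 0.29552 (four chords give a coarse fit)
```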
Piecewise Table Lookup
[Figure: piecewise table-lookup schemes in which the b-bit input z is split into fields that address two tables (Table 1 and Table 2); their d-bit (or d*-bit) outputs are combined with adders, sign logic, and a multiplexer to form the d-bit output. One of the variants shown computes z mod p. © Oxford U Press]
Accuracy vs. Lookup Table Size Trade-off
[Plot: worst-case absolute error (log scale, 10⁻¹ down to 10⁻⁹) versus the number of address bits h (0 to 10), with one curve each for linear, 2nd-degree, and 3rd-degree interpolation; the error shrinks as h grows and as the degree of the approximation increases. © Oxford U Press]
Useful Links
• M. E. Phair, “Free Floating-Point Madness!”,
http://www.hmc.edu/chips/