Download Rounding Techniques - the GMU ECE Department

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Lecture 2
Addendum
Rounding Techniques
ECE 645 – Computer Arithmetic
Required Reading
Parhami,
Chapter 17.5, Rounding schemes
Rounding Algorithms 101
http://www.diycalculator.com/popup-m-round.shtml
2
Rounding
• Rounding occurs when we want to approximate a more
precise number (i.e. more fractional bits L) with a less
precise number (i.e. fewer fractional bits L')
• Example 1:
• old: 000110.11010001 (K=6, L=8)
• new: 000110.11 (K'=6, L'=2)
• Example 2:
• old: 000110.11010001 (K=6, L=8)
• new: 000111. (K'=6, L'=0)
• The following pages show rounding from L>0 fractional bits
to L'=0 bits, but the mathematics hold true for any L' < L
• Usually, keep the number of integral bits the same K'=K
3
Rounding Equation
Whole part
Fractional part
xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l
Round
yk–1yk–2 . . . y1y0
• y = round(x)
4
Rounding Techniques
• There are different rounding techniques:
• 1) truncation
• results in round towards zero in signed magnitude
• results in round towards -∞ in two's complement
• 2) round to nearest number
• 3) round to nearest even number (or odd number)
• 4) round towards +∞
• Other rounding techniques
• 5) jamming or von Neumann
• 6) ROM rounding
• Each of these techniques will differ in their error depending
on representation of numbers i.e. signed magnitude versus
two's complement
• Error = round(x) – x
5
1) Truncation
The simplest possible rounding scheme: chopping or truncation
xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l
trunc
xk–1xk–2 . . . x1x0
ulp
•
Truncation in signed-magnitude results in a number chop(x) that is always of smaller
magnitude than x. This is called round towards zero or inward rounding
• 011.10 (3.5)10  011 (3)10
• Error = -0.5
• 111.10 (-3.5)10  111 (-3)10
• Error = +0.5
•
Truncation in two's complement results in a number chop(x) that is always smaller
than x. This is called round towards -∞ or downward-directed rounding
• 011.10 (3.5)10  011 (3)10
• Error = -0.5
• 100.01 (-3.5)10  100 (-4)10
• Error = -0.5
6
Truncation Function Graph: chop(x)
chop( x)
chop(x)
4
4
3
3
2
2
1
1
x
–4
–3
–2
–1
1
2
3
4
x
–4
–3
–2
–1
1
–1
–1
–2
–2
–3
–3
–4
–4
Fig. 17.5 Truncation or chopping
of a signed-magnitude number
(same as round toward 0).
2
3
4
Fig. 17.6 Truncation or chopping
of a 2’s-complement number (same
as round to -∞).
7
Bias in two's complement truncation
•
•
X
(binary)
X
(decimal)
chop(x)
(binary)
chop(x)
(decimal)
Error
(decimal)
011.00
3
011
3
0
011.01
3.25
011
3
-0.25
011.10
3.5
011
3
-0.5
011.11
3.75
011
3
-0.75
100.01
-3.75
100
-4
-0.25
100.10
-3.5
100
-4
-0.5
100.11
-3.25
100
-4
-0.75
101.00
-3
101
-3
0
Assuming all combinations of positive and negative values of x equally possible,
average error is -0.375
In general, average error = -(2-L'-2-L )/2, where L' = new number of fractional bits
8
Implementation truncation in hardware
• Easy, just ignore (i.e. truncate) the fractional digits
from L to L'+1
xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L
=
yk-1 yk-2 .. y1 y0. ignore (i.e. truncate the rest)
9
2) Round to nearest number
• Rounding to nearest number what we normally think of when say round
• 010.01 (2.25)10  010 (2)10
• Error = -0.25
• 010.11 (2.75)10  011 (3)10
• Error = +0.25
• 010.00 (2.00)10  010 (2)10
• Error = +0.00
• 010.10 (2.5)10  011 (3)10
• Error = +0.5 [round-half-up (arithmetic rounding)]
• 010.10 (2.5)10  010 (2)10
• Error = -0.5 [round-half-down]
10
Round-half-up: dealing with negative numbers
• Rounding to nearest number what we normally think of when say round
• 101.11 (-2.25)10  110 (-2)10
• Error = +0.25
• 101.01 (-2.75)10  101 (-3)10
• Error = -0.25
• 110.00 (-2.00)10  110 (-2)10
• Error = +0.00
• 101.10 (-2.5)10  110 (-2)10
• Error = +0.5 [asymmetric implementation]
• 101.10 (-2.5)10  101 (-3)10
• Error = -0.5 [symmetric implementation]
11
Round to Nearest Function Graph: rtn(x)
Round-half-up version
Asymmetric implementation
Symmetric implementation
rtn(x)
rtn(x)
4
4
3
3
2
2
1
1
x
–4
–3
–2
–1
1
2
3
4
x
–4
–3
–2
–1
1
–1
–1
–2
–2
–3
–3
–4
–4
2
3
4
12
Bias in two's complement round to nearest
Round-half-up asymmetric implementation
•
X
(decimal)
rtn(x)
(binary)
rtn(x)
(decimal)
Error
(decimal)
010.00
2
010
2
0
010.01
2.25
010
2
-0.25
010.10
2.5
011
3
+0.5
010.11
2.75
011
3
+0.25
101.01
-2.75
101
-3
-0.25
101.10
-2.5
110
-2
+0.5
101.11
-2.25
110
-2
+0.25
110.00
-2
110
-2
0
Assuming all combinations of positive and negative values of x equally possible, average error
is +0.125
•
•
•
X
(binary)
Smaller average error than truncation, but still not symmetric error
We have a problem with the midway value, i.e. exactly at 2.5 or -2.5 leads to positive error bias
always
Also have the problem that you can get overflow if only allocate K' = K integral bits
•
•
Example: rtn(011.10)  overflow
This overflow only occurs on positive numbers near the maximum positive value, not on negative
numbers
13
Implementing round to nearest (rtn) in hardware
Round-half-up asymmetric implementation
• Two methods
• Method 1: Add '1' in position one digit right of new LSB
(i.e. digit L'+1) and keep only L' fractional bits
=
xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L
+
1
yk-1 yk-2 .. y1 y0. y-1
ignore (i.e. truncate the rest)
• Method 2: Add the value of the digit one position to right
of new LSB (i.e. digit L'+1) into the new LSB digit (i.e.
digit L) and keep only L' fractional bits
xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L
+ x-1
yk-1 yk-2 .. y1 y0.
ignore (i.e truncate the rest)
14
Round to Nearest Even Function Graph: rtne(x)
•
To solve the problem with the midway value we implement round to nearest-even
number (or can round to nearest odd number)
rtne(x)
R*(x)
4
4
3
3
2
2
1
1
x
–4
–3
–2
–1
1
2
3
4
x
–4
–3
–2
–1
1
–1
–1
–2
–2
–3
–3
–4
–4
Fig. 17.8 Rounding to the
nearest even number.
2
3
4
Fig. 17.9 R* rounding or rounding
to the nearest odd number.
15
Bias in two's complement round to nearest even
(rtne)
X
(binary)
X
(decimal
)
rtne(x)
(binary)
rtne(x)
(decimal)
Error
(decimal)
000.10
0.5
000
0
-0.5
001.10
1.5
010
2
+0.5
010.10
2.5
010
2
-0.5
011.10
3.5
0100 (overfl)
4
+0.5
100.10
-3.5
100
-4
-0.5
101.10
-2.5
010
-2
+0.5
110.10
-1.5
010
-2
-0.5
111.10
-0.5
000
0
+0.5
• average error is now 0 (ignoring the overflow)
• cost: more hardware
16
4) Rounding towards infinity
• We may need computation errors to be in a known direction
• Example: in computing upper bounds, larger results are
acceptable, but results that are smaller than correct values
could invalidate upper bound
• Use upward-directed rounding (round toward +∞)
• up(x) always larger than or equal to x
• Similarly for lower bounds, use downward-directed rounding
(round toward -∞)
• down(x) always smaller than or equal to x
• We have already seen that round toward -∞ in two's complement
can be implemented by truncation
17
Rounding Toward Infinity Function Graph: up(x)
and down(x)
up(x)
down(x)
down(x) can be implemented by chop(x) in
two's complement
18
Two's Complement Round to Zero
• Two's complement round to zero (inward rounding)
also exists
inward(x )
4
3
2
1
x
–4
–3
–2
–1
1
2
3
4
–1
–2
–3
–4
19
Other Methods
• Note that in two's complement round to nearest (rtn)
involves an addition which may have a carry
propagation from LSB to MSB
• Rounding may take as long as an adder takes
• Can break the adder chain using the following two
techniques:
• Jamming or von Neumann
• ROM-based
20
5) Jamming or von Neumann
jam(x)
Chop and force the LSB
of the result to 1
4
3
2
1
x
–4
–3
–2
–1
1
–1
–2
2
3
Simplicity of chopping,
with the near-symmetry
or ordinary rounding
4
Max error is comparable
to chopping (double that
of rounding)
–3
–4
21
6) ROM Rounding
ROM(x)
Fig. 17.11 ROM rounding
with an 8  2 table.
4
3
Example: Rounding with a
32  4 table
2
1
x
–4
–3
–2
–1
1
2
3
4
–1
–2
–3
–4
xk–1 . . . x4x3x2x1x0 . x–1x–2 . . . x–l
ROM address
Rounding result is the same
as that of the round to nearest
scheme in 31 of the 32
possible cases, but a larger
error is introduced when
x3 = x2 = x1 = x0 = x–1 = 1
ROM
xk–1 . . . x4y3y2y1y0
ROM data
22
ANSI/IEEE Floating-Point Standard
ANSI/IEEE standard includes four rounding modes:
Round to nearest even [default rounding mode]
Round toward zero (inward)
Round toward + (upward)
Round toward – (downward)
23
Related documents