Floating Point in computers
Comply with standards:
IEEE 754
ISO/IEC 559
Timeline
• Introduction (quite short)
• Binary review (not so long)
• Integer Arithmetic (1/3)
• Floating Point (1/3)
• Floating Point Arithmetic (1/3)
• Other issues (extra short)
Introduction
• Who does computer arithmetic?
• Intel’s spare money
• How is it done in hardware?
• How Integer relates to Floating point
• Now, we go back to “computer structure”
Binary numbers
• What is 1 0 0 1 0 1 1 . 0 0 1 0 1 ?
Place values: $2^6\ 2^5\ 2^4\ 2^3\ 2^2\ 2^1\ 2^0\ .\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}$
$64 + 8 + 2 + 1 + \frac{1}{8} + \frac{1}{32} = 75\frac{5}{32}$
Signed Binary Integers
• Sign-magnitude
• 2’s complement
• 1’s complement
• biased
Sign-Magnitude
• High order bit = Sign
• 0101 = 5
• 1101 = -5
• Two zeros (0000 and 1000)
2’s complement
• Number + Negative = $2^n$
• 0101 = 5
• 1011 = -5
• Easy addition (drop carry)
• Formula: $-a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \dots + a_1 2^1 + a_0$
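A minimal Python sketch of the 2’s complement encoding and of the weighted-sum formula above. The function names are hypothetical, chosen for this illustration only.

```python
def to_twos_complement(value, n):
    """Encode a signed integer as an n-bit 2's complement bit string."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n bits")
    # A negative value v is stored as 2^n + v, so that number + negative = 2^n.
    return format(value % (1 << n), "0{}b".format(n))


def from_twos_complement(bits):
    """Decode with -a_{n-1}*2^(n-1) + a_{n-2}*2^(n-2) + ... + a_0."""
    n = len(bits)
    total = -int(bits[0]) * (1 << (n - 1))      # the top bit has negative weight
    for i, bit in enumerate(bits[1:], start=1):
        total += int(bit) * (1 << (n - 1 - i))  # remaining bits have positive weight
    return total
```

With n = 4 this reproduces the slide values: 5 encodes as 0101 and -5 as 1011.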
1’s Complement
• Negative = bitwise complement (each bit is subtracted from 1)
• 0101 = 5
• 1010 = -5
• Two zeros (0000 and 1111)
• Number + Negative = $2^n - 1$
Biased
• Binary = Number + Bias
• Bias = 5:
1010 = 5 (5+5 = 10)
0000 = -5 ((-5)+5 = 0)
• Relative order remains
Integer Arithmetic
Adding (unsigned) Integers
• Elementary school:
    1   11        (carries)
     11001101
  +  10000110
    101010011
• Result has n+1 bits!
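Adding Integers - hardware
[Figure: Half Adder with inputs a, b and outputs s, Cout; Full Adder with inputs a, b, Cin and outputs s, Cout]
Half Adder: $s = a\bar{b} + \bar{a}b$, $c_{out} = ab$
Full Adder: $s = a\bar{b}\bar{c} + \bar{a}b\bar{c} + \bar{a}\bar{b}c + abc$, $c_{out} = ab + ac + bc$ (with $c = c_{in}$)
2 logical levels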
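Ripple carry Adder
[Figure: chain of n full adders; stage i takes $a_i$, $b_i$ and the carry from stage i-1 and produces $s_i$; Cin enters at stage 0 and Cout leaves stage n-1]
• Slow - 2n logical levels
• Small constant (CMOS)
• Other ways exist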
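The ripple carry structure can be sketched in Python by chaining a one-bit full adder from the least significant bit upward. This is an illustration of the data flow, not a hardware model; the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit strings; returns (n-bit sum string, carry out)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = cin
    out = []
    # Walk from bit 0 (rightmost) upward; each stage waits for the previous carry.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        s, carry = full_adder(int(a), int(b), carry)
        out.append(str(s))
    return "".join(reversed(out)), carry
```

Applied to 11001101 and 10000110 it returns the 8-bit sum 01010011 with carry out 1, that is, the 9-bit result 101010011.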
Adding Signed Integers
• In 2’s complement:
$b + (-a) = b + (2^n - a) = 2^n + (b - a)$
$(-b) + (-a) = (2^n - b) + (2^n - a) = (2^n - (b + a)) + 2^n$
• hence - add as integers, discard carry out
• Example: 0011 + 1100 = ?
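A short Python sketch of "add as integers, discard carry out", which can be used to check the example. Reducing modulo $2^n$ plays the role of dropping the carry; the helper names are hypothetical.

```python
def add_twos_complement(x_bits, y_bits):
    """Add two n-bit 2's complement bit strings, discarding the carry out."""
    n = len(x_bits)
    total = int(x_bits, 2) + int(y_bits, 2)           # plain unsigned addition
    return format(total % (1 << n), "0{}b".format(n))  # modulo 2^n drops the carry


def signed_value(bits):
    """Interpret an n-bit string as a 2's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value
```

For the example, 0011 is 3 and 1100 is -4; the sum comes out as 1111, which decodes to -1.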
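Subtracting Integers
• Add the negation
• Negating 2’s complement: invert all bits and add 1 (equivalently, keep the lowest 1 and the zeros below it, and invert every bit above it)
-(11010100101011000110000) = ?
001010110101001110 10000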
Integer (unsigned) Multiplication
• Elementary school:
       1101
     * 1001
       ----
       1101
      0000
     0000
    1101
   --------
   01110101
• Result is 2n bits!
Hardware Multiplier
[Figure: shift registers Carry, P (n bits) and A (n bits), with register B (n bits) feeding the adder]
• P=0
• loop:
(i) if A0=1, add B to P
(ii) right-shift P & A
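The loop can be sketched in Python with plain integers standing in for the registers. It is assumed here that the loop runs n times and that the carry out of the addition is shifted into the top of P, as the figure suggests; the function name is hypothetical.

```python
def shift_add_multiply(a, b, n):
    """Multiply two unsigned n-bit integers with the shift-and-add loop."""
    mask = (1 << n) - 1
    p = 0
    for _ in range(n):
        carry = 0
        if a & 1:                  # (i) if A0 = 1, add B to P
            p += b
            carry = p >> n         # carry out of the n-bit adder
            p &= mask
        # (ii) right-shift the combined register (Carry, P, A) by one bit
        a = (a >> 1) | ((p & 1) << (n - 1))
        p = (p >> 1) | (carry << (n - 1))
    return (p << n) | a            # the 2n-bit product sits in P:A
```

With n = 4, multiplying 1001 by 1101 leaves 0111 in P and 0101 in A, the product 01110101.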
Integer (unsigned) Division
• Elementary school:
        0100
   11 ) 1101
       00
       011
        11
        000
         00
         001
          00
          01
Result: 0100, Rem 1
Dec: 13/3=4, Rem 1
Hardware Divider
[Figure: shift registers P (n+1 bits) and A (n bits), with register B (n+1 bits); a 0 is shifted in on the right]
• P=0
• loop:
(i) left-shift P & A
(ii) Sub. B from P:
positive: a0=1
negative: a0=0, restore P (add B)
Example
• 13 / 3 = 4 (1)
• n=4
• A=1101 B=00011 P=00000
Start: P = 00000, A = 1101, B = 00011
End: P = 00001 (Remainder), A = 0100 (Quotient), B = 00011
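A Python sketch of the restoring divider, with integers standing in for the P, A and B registers. It is assumed that the loop runs n times; the function name is hypothetical.

```python
def restoring_divide(a, b, n):
    """Divide unsigned n-bit a by b; returns (quotient, remainder)."""
    if b == 0:
        raise ZeroDivisionError("divider requires a nonzero B")
    mask = (1 << n) - 1
    p = 0
    for _ in range(n):
        # (i) left-shift the pair (P, A): the top bit of A moves into P
        p = (p << 1) | ((a >> (n - 1)) & 1)
        a = (a << 1) & mask
        # (ii) subtract B from P
        p -= b
        if p >= 0:
            a |= 1      # positive: quotient bit a0 = 1
        else:
            p += b      # negative: a0 stays 0, restore P
    return a, p         # A holds the quotient, P the remainder
```

For A = 1101, B = 00011 and n = 4 it ends with A = 0100 and P = 00001, matching the example.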
Division - remarks
• Non-restoring Algorithm
• Load P only if positive
• Check for 0
• (Total) Result is 2n bits!
Integer arithmetic - remarks
• Signed Multiply and Division
– Algorithms exist
– We will not use them
• What to do with extra bits?
• Faster methods
Floating Point
Non Integers - Other Methods
• Fixed Point
– example: # # # . #
– Binary point shifted
– Integer arithmetic (extra shifting)
– Small number magnitude
• Rational
– a/b ($a, b \in \mathbb{Z}$)
Floating Point
• Exponent + Significand (= Mantissa)
• $x = s \cdot 2^e$
• Example: s=101 e=011
$x = 101_2 \cdot 2^{11_2} = 5 \cdot 2^3 = 40 = 101000_2$
Uniqueness
• Denormal Numbers:
123.456 × 10^7
0.123 × 10^4
• Normalized:
#.### × 10^#
1.123 × 10^4
• What about 0 ?
Floating Point Standard
• Why Standardize?
– Hardware accelerators
– Software compatibility
– Build Software Libraries
– etc.
• IEEE 754-1985, ISO/IEC 559
• Includes: Structure, Arithmetic results
Float Types
• 4 Precision Types:
– Single
– Single extended
– Double
– Double extended
Single Precision
• 32 bits: Sign(1) | Exponent(8) | Significand(23)
• Exponent (e): Biased ( + 127)
• Significand (f): Fixed fraction: 0 . # # # …
• Number: $1.f \cdot 2^{e-127}$
Single Precision - Example
• 1 10000001 01000000000000000000000
• 10000001 = 129 → 129-127=2
• 01000… = 0.01000… → $1.01_2 = 1.25$
• $X = -1.25 \cdot 2^2$
• X= -5
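The decoding steps can be checked with a small Python sketch. It handles normalized numbers only and reads the sign bit as negating the value, as in the example; `decode_single` and `decode_with_struct` are hypothetical names, and the second function cross-checks the first against the standard `struct` module.

```python
import struct


def decode_single(bits):
    """Decode a normalized single-precision bit string: 1.f * 2^(e-127)."""
    bits = bits.replace(" ", "")
    sign = int(bits[0])
    e = int(bits[1:9], 2)            # biased exponent
    f = int(bits[9:], 2)             # 23-bit fraction
    value = (1 + f / 2 ** 23) * 2.0 ** (e - 127)
    return -value if sign else value


def decode_with_struct(bits):
    """Let the platform reinterpret the same 32 bits as a float."""
    word = int(bits.replace(" ", ""), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]
```

Both functions give -5.0 for `1 10000001 01000000000000000000000`.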
Single Precision - Range
• Emax = 127 (e = 254)
• Emin = -126 (e = 1)
• Why |Emin|<|Emax|?
– $1/2^{E_{min}}$ does not overflow
• Why Biased notation?
• What about 0 and 255 ?
Floating Point Precision
                Single   Single Extended   Double   Double Extended
Format Width    32       ≥ 43              64       ≥ 80
Precision       24       ≥ 32              53       ≥ 64
Emax            +127     ≥ +1023           +1023    ≥ +16383
Emin            -126     ≤ -1022           -1022    ≤ -16382
Exp. Width      8        ≥ 11              11       ≥ 15
Exp. Bias       127      (not specified)   1023     (not specified)
Examples
• We shall use base 10 sometimes:
• f will have 3 digits
• Emax will be 98
• Emin will be -97
• Ex: $5.34 \times 10^{70}$
NaN
• Not a Number
• Result of illegal computation:
– $\infty + (-\infty)$, $0 \cdot \infty$, $0/0$, $\infty/\infty$, rem(x,0), rem(∞, y)
– Any computation involving a NaN
• e = Emax + 1 & f ≠ 0
• # 11111111 #######################
• Many NaN’s (different f’s)
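A Python sketch that inspects the single-precision fields of a NaN and shows that it propagates through arithmetic. `single_fields` is a hypothetical helper; the exact fraction bits of the NaN are platform dependent, so only "f is nonzero" is relied on.

```python
import math
import struct


def single_fields(x):
    """Return (sign, biased exponent, fraction) of x stored as a single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


nan = float("nan")
sign, e, f = single_fields(nan)     # e is all ones (255) and f is nonzero
propagated = nan * 2.0 + 1.0        # any computation involving a NaN is a NaN
```

Note that a NaN compares unequal to everything, including itself, so `math.isnan` is the reliable test.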
NaN’s in use
• Zero finder outside domain
– f(x) = sqrt(x) - 1
• Works since every computation on a NaN returns a NaN
• No exception caused !
Zeros
• 0 00000000 00000000000000000000000 ?
• this is NOT $1.0 \cdot 2^{E_{min}}$
• 1 00000000 00000000000000000000000 ?
• 0 is signed! ±0 both exist!
• What is the difference?
Signed 0’s
• +0 = -0
BUT:
• Multiply/Divide keep sign rules:
$3 \cdot (+0) = (+0)$, $3 \cdot (-0) = (-0)$
• Motivation:
– Using inf correctly (described later)
– log(x) : log(0)=-inf
log(negative)=NaN
log(x) if x → (-0) ?
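The behaviour of the two zeros can be observed directly in Python, whose floats are IEEE doubles on common platforms (an assumption of this sketch).

```python
import math

pos_zero, neg_zero = 0.0, -0.0

zeros_compare_equal = (pos_zero == neg_zero)            # +0 = -0
sign_times_pos = math.copysign(1.0, 3.0 * pos_zero)     # sign of 3 * (+0)
sign_times_neg = math.copysign(1.0, 3.0 * neg_zero)     # sign of 3 * (-0)
```

`math.copysign` is used to read the sign because a comparison cannot tell the two zeros apart.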
± inf
• $1/(+0) = +\infty$, $1/(-0) = -\infty$, $-1/(+0) = -\infty$, $-1/(-0) = +\infty$
• More logic: $1/(\pm\infty) = \pm 0$, $x + (\pm\infty) = \pm\infty$
• e = Emax + 1 & f=0
• # 11111111 00000000000000000000000
Inf usage Example
$\cos^{-1}(x) = 2\tan^{-1}\sqrt{\frac{1-x}{1+x}}$
(If $\tan^{-1}$ is defined properly)
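A Python sketch of the identity above at x = -1, where the quotient is 2/0. Python raises an exception on float division by zero, so the hypothetical helper `ieee_div` emulates the IEEE result; everything else uses the standard `math` module.

```python
import math


def ieee_div(num, den):
    """Emulate IEEE division by zero, since Python raises ZeroDivisionError."""
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan                                   # 0/0 is invalid
        # nonzero / (+-0) is an infinity whose sign follows the usual sign rule
        return math.copysign(math.inf, num) * math.copysign(1.0, den)
    return num / den


def acos_via_atan(x):
    """arccos(x) = 2 * arctan(sqrt((1 - x) / (1 + x)))."""
    return 2.0 * math.atan(math.sqrt(ieee_div(1.0 - x, 1.0 + x)))
```

At x = -1 the quotient is +inf, its square root is +inf, and arctan(+inf) is π/2, so the result is π without any special case in `acos_via_atan`.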
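More on 0’s and inf’s
• General Rule for 0/inf arithmetic:
– Take appropriate limit:
$3/\infty = 0$, from $\lim_{x \to \infty} 3/x = 0$
$3/(-0) = -\infty$, from $\lim_{x \to (-0)} 3/x = -\infty$
• 1/(1/x) where x=0 or inf
• Why not Max # instead?
$\sqrt{x^2 + y^2}$ with $x = 3 \times 10^{70}$, $y = 4 \times 10^{70}$: $x^2 + y^2$ overflows
with Max #: $\sqrt{9.99 \times 10^{98}} = 3.16 \times 10^{49}$
answer: $5 \times 10^{70}$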
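The same overflow can be reproduced with IEEE doubles in Python. The inputs 3e200 and 4e200 are this sketch's assumption, chosen so that the squares exceed the double range; the rescaled formula is a common workaround, not something taken from the slides.

```python
import math

x, y = 3e200, 4e200

# x*x overflows to +inf, so the naive formula returns inf: a visibly wrong answer
naive = math.sqrt(x * x + y * y)

# Dividing by the larger magnitude first keeps every intermediate in range
m = max(abs(x), abs(y))
r = min(abs(x), abs(y)) / m
scaled = m * math.sqrt(1.0 + r * r)
```

An infinite result at least signals that something went wrong, whereas substituting the largest finite number would produce a plausible-looking but meaningless value.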
Zeros and inf’s - yet again
• $x/(x^2+1)$ is bad! Why?
• $1/(x+x^{-1})$ is better
• Do we need to check for x=0?
• Using two zeros and inf’s saves some special case checks.
Denormalized numbers
• Example:
– $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$
– $x - y = 1.20 \cdot 10^{-99} = 0$ (below Emin, so it is flushed to 0)
– so: x-y=0 but: x ≠ y
– think of: if (x ≠ y) then z=1/(x-y)
• Solution:
– use denormalized numbers!
Denormal Numbers
• Smallest normal: $1.0 \cdot 2^{E_{min}}$
• Below, use denormal: $0.f \cdot 2^{E_{min}}$
• e = Emin - 1 & f ≠ 0
• # 00000000 #######################
• Gradual underflow: $1.23 \cdot 10^{-4}$ ( /10 ) → $0.12 \cdot 10^{-4}$ ( /10 ) → $0.01 \cdot 10^{-4}$ ( /10 ) → 0
Denormal Numbers
• Back to our Example:
– $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$
– $x - y = 0.12 \cdot 10^{-98}$
– and this is not 0 !
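The same effect can be seen with IEEE doubles in Python. The values below are this sketch's own choice (multiples of the smallest normal double), not the decimal values of the example.

```python
import math
import sys

tiny = sys.float_info.min    # smallest normal double, 2**-1022
x = 1.5 * tiny
y = tiny

d = x - y                    # 0.5 * tiny lies below the normal range
z = 1.0 / d                  # safe: gradual underflow keeps d nonzero
```

Because the difference is stored as a denormal, `x != y` guarantees `x - y != 0`, so the guarded division is well defined.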
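Flush to 0 Vs Gradual Underflow
[Figure: two number lines from 0 with ticks at $2^{-4}$, $2^{-3}$, $2^{-2}$, $2^{-1}$, one for flush to 0 and one for gradual underflow]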
Special Values - Summary
Exponent            Fraction   Represents
Emin-1              f=0        ±0
Emin-1              f≠0        $0.f \times 2^{E_{min}}$
Emin ≤ e ≤ Emax     ---        $1.f \times 2^{e}$
Emax+1              f=0        ±∞
Emax+1              f≠0        NaN
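The summary table translates into a short classifier for single-precision bit patterns. This is a sketch with a hypothetical function name; biased exponent 0 stands for Emin-1 and 255 for Emax+1.

```python
def classify_single(bits):
    """Classify a 32-bit pattern given as a string of 0s and 1s (spaces allowed)."""
    bits = bits.replace(" ", "")
    e = int(bits[1:9], 2)        # biased exponent field
    f = int(bits[9:], 2)         # fraction field
    if e == 0:                   # Emin - 1
        return "zero" if f == 0 else "denormal"
    if e == 255:                 # Emax + 1
        return "infinity" if f == 0 else "NaN"
    return "normal"
```

The sign bit is ignored here because it does not change the class, only the sign of the value.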
Rounding
• Why is rounding needed?
• Infinite numbers → Finite representation
• Integers only overflow
• Almost all operations need rounding
• IEEE - specifies algorithms for arithmetic
Numbers need rounding
• Out of range:
– $x > 2 \cdot 2^{E_{max}}$, $x < 1 \cdot 2^{E_{min}}$
• Between 2 floats:
– $0.1_{10} = 0.00011001100\ldots_2 = 1.1001100\ldots \cdot 2^{-4}$
– $1.1001 \cdot 2^{-4}$
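That 0.1 falls between two floats can be checked in Python, here for doubles rather than the short significand shown above. `Fraction` converts the stored double exactly, so the difference from 1/10 is the rounding error.

```python
from fractions import Fraction

stored = Fraction(0.1)                   # exact value of the double nearest to 0.1
error = stored - Fraction(1, 10)         # nonzero: 0.1 is not representable
hex_form = (0.1).hex()                   # hexadecimal significand and power of two
```

The hexadecimal form shows the repeating 1001 pattern (hex digit 9) and the exponent -4 from the expansion above.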
Measuring Error
• ULPS (units in last place)
– $1.2 \cdot 10^{-1}$ Vs 0.124 : 0.4 ulps
– $1.2 \cdot 10^{-1}$ Vs 0.118 : 0.2 ulps
• Relative Error
– Difference/Original
– $1.2 \cdot 10^{-1}$ Vs 0.124 : Err=0.004/0.124=0.032
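Both measures can be computed for a binary double in Python. This sketch assumes Python 3.9 or later for `math.ulp`; the function names and the choice of 1/3 as the value being approximated are illustrative.

```python
import math
from fractions import Fraction


def error_in_ulps(stored, exact):
    """Distance between a double and the exact value, in units in the last place."""
    return abs(Fraction(stored) - exact) / Fraction(math.ulp(stored))


def relative_error(stored, exact):
    """Difference divided by the original (exact) value."""
    return abs(Fraction(stored) - exact) / abs(exact)


third = 1.0 / 3.0
ulps = error_in_ulps(third, Fraction(1, 3))
rel = relative_error(third, Fraction(1, 3))
```

For a correctly rounded result the error is at most half an ulp, which for doubles corresponds to a relative error of at most $2^{-53}$.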
Calculate Using Rounding
• Benign cancellation
– Calculate 10.1-9.93 (= 0.17)
– 9.93 is shifted to $0.993 \cdot 10^1$ and loses its last digit:
$1.01 \cdot 10^1 - 0.99 \cdot 10^1 = 0.02 \cdot 10^1 = 2.00 \cdot 10^{-1}$
– 30 ulps!
Rounding problems
• Catastrophic cancellation
– $b^2 - 4ac$
– both $b^2$ and $4ac$ are rounded
– the (-) exposes the error
– b=3.34 a=1.22 c=2.28
$b^2 = 11.2$, $4ac = 11.1$, $b^2 - 4ac = 0.10$
correct=0.0292 (70.8 ulps)
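The example can be replayed with Python's `decimal` module, which rounds every operation to a chosen number of significant digits. Three digits stand in for the toy decimal format; `discriminant` is a hypothetical helper.

```python
from decimal import Decimal, localcontext


def discriminant(a, b, c, digits):
    """Compute b*b - 4*a*c, rounding every operation to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return b * b - 4 * a * c


a, b, c = Decimal("1.22"), Decimal("3.34"), Decimal("2.28")
rounded = discriminant(a, b, c, 3)     # 11.2 - 11.1
exact = discriminant(a, b, c, 28)      # enough digits to be exact here
```

The subtraction itself is exact in both cases; the damage was done when the two products were rounded.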
IEEE Arithmetic
• Requirement:
– +, −, ×, ÷ should be EXACTLY rounded
– √, remainder should be EXACTLY rounded
– Integer conv. should be EXACTLY rounded
• Not all (transcendental, binary to decimal)
• “Tie break” - Round to Even
Round to Even
• How will 1.005 be rounded ?
– Round Up: 1.01
– Round Even: 1.00
• Why? Example:
– $x_i = x_{i-1} + y - y$, $x_0 = 1.00$, y=0.125
– Round up: 1.00, 1.01, 1.02, ….
– Round even: 1.00, 1.00, 1.00, ….
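The two tie-breaking rules can be compared with Python's `decimal` module. This sketch rounds to two decimal places, matching three significant digits for values near 1; the value 1.015 is an extra illustration not taken from the slides.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

step = Decimal("0.01")

tie = Decimal("1.005")
up = tie.quantize(step, rounding=ROUND_HALF_UP)        # ties always go up
even = tie.quantize(step, rounding=ROUND_HALF_EVEN)    # ties go to the even digit

other_tie = Decimal("1.015")
other_even = other_tie.quantize(step, rounding=ROUND_HALF_EVEN)
```

Round to even sends 1.005 down and 1.015 up, so ties are not biased in one direction.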
Float Multiplication
$(s_1 \cdot 2^{e_1}) \times (s_2 \cdot 2^{e_2}) = (s_1 \times s_2) \cdot 2^{e_1 + e_2}$
Significands: integer multiply. Exponents: biased addition.
• “Biased addition”:
$(e_1 + 127) + (e_2 + 127) - 127 = (e_3 + 127)$, where $e_3 = e_1 + e_2$
- detect Overflow: Use n+1 bit adder
- detect Underflow: Harder (Denormals)
Rounding Multiplication
• 1.23 × 6.78 = 8.3394 → Round to 8.34
• 2.83 × 4.47 = 12.6501 → Round to 1.27 × 10^1
• 1.28 × 7.81 = 09.9968 → Round to 1.00 × 10^1
[Figure: binary products 1.0001, 1.0010, 1.0010 and 0.1101, each annotated with its round bit and whether all remaining bits are 0; the 0.1101 case is marked "Shift needed"]
Round, Guard, Sticky
1.001000100 : fields from left to right are number, round, sticky
0.110100010 : fields from left to right are number, guard, round, sticky
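Rounding Multiplication
[Figure: the hardware multiplier registers Carry, P (n bits), A (n bits) and B (n bits), holding the product]
Product: x0 x1 . x2 x3 x4 x5 g r s s s s (Round digit, Sticky bit)
Results:
Case 1: x0=0, shift: x1 . x2 x3 x4 x5 g
Case 2: x0=1, inc. exp: x0 . x1 x2 x3 x4 x5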
Rounding rules
• r=0 → rounded OK
• r=1, s=1 → add 1 to LSB
• r=1, s=0 → add 1 if LSB=1
• Denormals → Extra shifting
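The three rules fit in a few lines of Python. The sketch treats the kept significand as an integer and assumes the round bit r and sticky bit s have already been extracted; `split_extra_bits` is a hypothetical helper showing one way to do that. A carry out of the increment (which would require renormalizing) and denormals are not handled.

```python
def split_extra_bits(value, extra):
    """Split off `extra` low bits (extra >= 1): returns (kept, r, s)."""
    kept = value >> extra
    r = (value >> (extra - 1)) & 1                       # first discarded bit
    s = int(value & ((1 << (extra - 1)) - 1) != 0)       # OR of the remaining bits
    return kept, r, s


def round_nearest_even(kept, r, s):
    """Apply the round/sticky rules to the kept bits."""
    if r == 0:
        return kept                 # below the halfway point: rounded OK
    if s == 1:
        return kept + 1             # above halfway: add 1 to LSB
    return kept + (kept & 1)        # exact tie: add 1 only if LSB = 1
```

For instance, `split_extra_bits(0b1011100, 3)` yields kept = 1011, r = 1, s = 0, an exact tie, which rounds up to the even value 1100.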
Float addition
• Compute all digits and round?
– $1.00 \cdot 2^{20} + 1.00 \cdot 2^{-20}$ = 10000000….0000001
– too long!
• Use Round and Sticky bits:
– shift to same exponent
– r = first discarded digit
– s = OR of rest discarded
Float addition - example
Calculate: $1.10011 \cdot 2^0 + 1.10001 \cdot 2^{-5}$
Shift exponents: $1.10011 \cdot 2^0 + 0.0000110001 \cdot 2^0$
  1.10011
+  .00001      (discarded digits 10001: r=1, s=0|0|0|1=1)
  1.10100
r=1, s=1: Round needed! 1.10101
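The worked example can be replayed in Python with integer significands (5 fraction bits, so 1.10011 is the integer 110011 in binary). This is a sketch for two positive operands where the sum does not carry out; `add_aligned` is a hypothetical name.

```python
def add_aligned(big_sig, small_sig, shift):
    """Add small_sig * 2^-shift to big_sig and round to nearest even."""
    if shift == 0:
        return big_sig + small_sig
    kept = small_sig >> shift                              # bits that stay after alignment
    r = (small_sig >> (shift - 1)) & 1                     # first discarded digit
    s = int(small_sig & ((1 << (shift - 1)) - 1) != 0)     # OR of the rest discarded
    total = big_sig + kept
    if r and (s or (total & 1)):                           # round up, ties go to even
        total += 1
    return total


result = add_aligned(0b110011, 0b110001, 5)
```

Only one round bit and one sticky bit are carried through the addition, instead of all the digits of the shifted operand.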
Signed Addition/Subtraction
• Simplest way - convert to 2’s cmpl.
• Cancellation of high order bit - shift
1.00000 − 0.00000101111
cmpl of 0.00000101111 = 1.11111010001
  1.00000
+ 1.11111
  0.11111
• more bits cancel - How many guard digits?
Float Division
$(s_1 \cdot 2^{e_1}) / (s_2 \cdot 2^{e_2}) = (s_1 / s_2) \cdot 2^{e_1 - e_2}$
Significands: integer division. Exponents: biased subtraction.
• Very similar to Multiplication
• Dividing using integer divide
• Compute 2 more bits (round, guard)
• Use remainder as sticky bit (Why?)
• Sign bit: XOR
More on floats
Rounding modes
• IEEE specifies 4 modes:
– Nearest (default)
– towards 0
– towards +inf
– towards -inf
• affects overflow (How?)
Exceptions
• Set a flag at:
– Underflow: $1.0 \cdot 2^{E_{min}} \times 1.0 \cdot 2^{E_{min}}$
– Overflow: $1.0 \cdot 2^{E_{max}} \times 1.0 \cdot 2^{E_{max}}$
– divide by 0: 1/0
– inexact: Rounding was needed
– invalid: NaN return operations
• flags are sticky
Speeding up
• Different algorithms may be used
• Result should be exact
• divide SRT algorithm in pentium
– 5/2048 entries in a table
– 1/9,000,000 chance
– check:
Precision
• Why extended precisions?
– Return higher accuracy (D*D → ext. D)
– use for computations: $\sqrt{x^2 + y^2}$