Floating Point

Number system corresponding to decimal scientific notation:
    1.837 * 10^4    (significand: 1.837, exponent: 4)
Many corresponding binary formats exist. One standard is in common use: IEEE 754-1985 (IEC 559).

Computer Engineering FloatingPoint page 1

IEEE 754-1985
Number representations:
– Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits
– Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits

Single Precision Format
    | S (1 bit) | E (8 bits) | M (23 bits) |
– Sign S
– Exponent E: excess-127 binary integer; the actual exponent is e = E - 127
– Mantissa M: normalized binary significand with hidden integer bit: 1.M
N = (-1)^S * (1.M [bit-string]) * 2^e

Example 1
S = 1, E = 01111110, M = 10000000000000000000000
e = E - 127 = 126 - 127 = -1
N = (-1)^1 * (1.1 [bit-string]) * 2^-1
N = -1 * 0.11 [bit-string]
N = -1 * (2^-1 * 1 + 2^-2 * 1)
N = -1 * (0.5 * 1 + 0.25 * 1) = -0.75

Single Precision Range
The magnitude of the normalized numbers that can be represented is in the range
    2^-126 * (1.0)  to  2^127 * (2 - 2^-23)
which is approximately
    1.2 * 10^-38  to  3.4 * 10^38

IEEE 754-1985
– Fraction part: 23 / 52 bits; 0 <= x < 1.
– Significand: 1 + fraction part. The “1” is not stored (“hidden bit”). Corresponds to about 7 / 16 decimal digits.
– Exponent: 127 / 1023 is added to the exponent (“biased exponent”). Corresponds to roughly 10^-38 to 10^38 / 10^-308 to 10^308.

IEEE 754-1985
Special features:
– Correct rounding of “halfway” results (to the even number).
– Includes special values: NaN (Not a Number) and Infinity.
– Uses denormalized numbers to represent magnitudes smaller than 2^Emin.
– Rounds to nearest by default; three other rounding modes exist.
– Sophisticated exception handling.
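The single precision format and Example 1 above can be checked with a short Python sketch. The function name `decode_single` and the choice to reject the special exponent values 0 and 255 are assumptions made for this illustration; they are not part of the slides.

```python
def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern as a normalized single-precision number.

    Illustrative sketch only: zero, denormals, Infinity and NaN
    (E = 0 or E = 255) are deliberately not handled.
    """
    sign = (bits >> 31) & 0x1          # S: 1 bit
    biased_exp = (bits >> 23) & 0xFF   # E: 8 bits, excess 127
    fraction = bits & 0x7FFFFF         # M: 23 bits
    if biased_exp in (0, 255):
        raise ValueError("special values are outside this sketch")
    e = biased_exp - 127               # actual exponent e = E - 127
    significand = 1 + fraction / 2**23 # hidden bit: 1.M
    return (-1) ** sign * significand * 2.0 ** e


# Example 1: S = 1, E = 01111110, M = 1000...0 (only the top fraction bit set)
example_bits = (1 << 31) | (0b01111110 << 23) | (1 << 22)
print(hex(example_bits), decode_single(example_bits))
```

Packing the three fields with shifts mirrors the S / E / M layout directly, which keeps the sketch close to the slide rather than relying on a library conversion.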
Add / Sub
(s1 * 2^e1) +/- (s2 * 2^e2) = (s1 +/- s2) * 2^e3 = s3 * 2^e3
– s = 1.s; the hidden bit is used during the operation.
1: Shift the summands so they have the same exponent:
   – e.g., if e2 < e1: shift s2 right and increment e2 until e1 = e2.
2: Add/subtract the significands using the sign bits for s1 and s2.
   – Set the sign bit of the result accordingly.
3: Normalize the result (sign bit kept separate):
   – shift s3 left and decrement e3 until MSB = 1.
4: Round s3 correctly.
   – More than 23 / 52 bits are used internally for the addition.

Multiplication
(s1 * 2^e1) * (s2 * 2^e2) = s1 * s2 * 2^(e1+e2)
So: multiply the significands and add the exponents.
Problem: the significand is coded in sign-and-magnitude; use unsigned multiplication and take care of the sign separately.
Round the 2n-bit significand to an n-bit significand.
Normalize the result and compute the new exponent with respect to the bias.

Accurate Arithmetic
1: Multiply the two significands to get the 2n-bit product:
   – But we have only 23 / 52 bits to store the result!
Product register layout (six significand bits shown):
    P: x0 x1 x2 x3 x4 x5 | A: g r s s s s
    g = guard bit, r = round bit; the s bits are OR:ed together (“sticky bit”, STICKY).
Case 1: x0 = 0, shift needed:
    P: x1 x2 x3 x4 x5 g | round bit = r, sticky bit = STICKY
Case 2: x0 = 1, no shift; increment the exponent:
    P: x0 x1 x2 x3 x4 x5 | round bit = g, sticky bit = (STICKY or r)

Rounding
2: For both cases, with r and STICKY taken after step 1:
   – if r = 0, P is the correctly rounded product.
   – if r = 1 and STICKY = 1, then P + 1 is the correctly rounded product.
   – if r = 1 and STICKY = 0 (the “halfway case”), then
       P is the correctly rounded product if x5 (or g) is 0;
       P + 1 is the correctly rounded product if x5 (or g) is 1.

Division
(s1 * 2^e1) / (s2 * 2^e2) = (s1 / s2) * 2^(e1-e2)
So: divide the significands and subtract the exponents.
Problem: the significand is coded in sign-and-magnitude; use unsigned division (different algorithms exist) and take care of the sign separately.
Round the n + 2 bit (guard and round) significand to an n-bit significand.
Compute the new exponent with respect to the bias.
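The significand multiplication with its two normalization cases and the round-to-nearest-even rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own implementation: significands are held as n-bit integers with the hidden bit included, the name `multiply_significands` is hypothetical, and sign handling, exponent bias, overflow, and denormals are left out.

```python
def multiply_significands(s1: int, s2: int, n: int = 24):
    """Multiply two n-bit significands (hidden bit included) and round
    the 2n-bit product to n bits with round-to-nearest-even.

    Returns (significand, exponent_adjust); the caller would add
    exponent_adjust to e1 + e2. Illustrative sketch only.
    """
    product = s1 * s2                    # up to 2n bits
    if product >> (2 * n - 1):           # case x0 = 1: product in [2, 4)
        exp_adjust = 1                   # increment the exponent
        shift = n
    else:                                # case x0 = 0: product in [1, 2)
        exp_adjust = 0
        shift = n - 1
    p = product >> shift                 # the n bits that are kept
    round_bit = (product >> (shift - 1)) & 1
    sticky = (product & ((1 << (shift - 1)) - 1)) != 0  # OR of lower bits
    # Round up if above halfway, or exactly halfway with an odd last bit.
    if round_bit and (sticky or (p & 1)):
        p += 1
        if p >> n:                       # rounding carried out of n bits
            p >>= 1
            exp_adjust += 1
    return p, exp_adjust


# 4-bit examples: 1.011 * 1.101 and the halfway case 1.010 * 1.010
print(multiply_significands(0b1011, 0b1101, n=4))
print(multiply_significands(0b1010, 0b1010, n=4))
```

Working on plain integers keeps every bit of the 2n-bit product visible, so the round bit and sticky bit can be read off directly instead of being tracked through register shifts as in hardware.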