Floating point

Floating point: a number system corresponding to the decimal notation $1.837 \times 10^{4}$, where 1.837 is the significand and 4 is the exponent.

A great number of corresponding binary standards exist. There is one common standard: IEEE 754-1985 (IEC 559).

Computer Engineering FloatingPoint

IEEE 754-1985

Number representations:
- Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits
- Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits

Single Precision Format

- Sign S: 1 bit
- Exponent E: 8 bits, an excess-127 binary integer; the actual exponent is $e = E - 127$
- Mantissa M: 23 bits, a normalized binary significand with a hidden integer bit: 1.M

$N = (-1)^{S} \times (1.M)_2 \times 2^{e}$

Example 1

S = 1, E = 01111110, M = 10000000000000000000000

$e = E - 127 = 126 - 127 = -1$
$N = (-1)^{1} \times (1.1)_2 \times 2^{-1}$
$N = -1 \times (0.11)_2$
$N = -1 \times (2^{-1} \times 1 + 2^{-2} \times 1)$
$N = -1 \times (0.5 + 0.25) = -0.75$

Single Precision Range

The magnitude of the normalized numbers that can be represented is in the range
$2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$,
which is approximately $1.2 \times 10^{-38}$ to $3.4 \times 10^{38}$.

IEEE 754-1985

- Fraction part: 23 / 52 bits; $0 \le x < 1$.
- Significand: 1 + fraction part. The "1" is not stored ("hidden bit"). This corresponds to 7 resp. 16 decimal digits.
- Exponent: 127 / 1023 is added to the exponent ("biased exponent"). This corresponds to roughly $10^{-38}$ to $10^{38}$ / $10^{-308}$ to $10^{308}$.

IEEE 754-1985

Special features:
- Correct rounding of "halfway" results (to the even number).
- Includes special values:
  - NaN: not a number
  - Infinity
- Uses denormal numbers to represent magnitudes smaller than $2^{E_{\min}}$ ($E_{\min} = -126$ in single precision).
- Rounds to nearest by default; three other rounding modes exist.
- Sophisticated exception handling.
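The single precision format and Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `decode_single` is made up for illustration, and it deliberately covers normalized numbers only (zero, denormals, infinity and NaN are rejected).

```python
import struct


def decode_single(bits: int):
    """Split a 32-bit pattern into S, E, M and evaluate (-1)^S * 1.M * 2^(E-127).

    Hypothetical helper for illustration; handles normalized numbers only.
    """
    sign = (bits >> 31) & 0x1          # S: 1 bit
    biased = (bits >> 23) & 0xFF       # E: 8 bits, excess 127
    fraction = bits & 0x7FFFFF         # M: 23 bits
    if biased == 0 or biased == 255:
        raise ValueError("zero, denormal, infinity or NaN: outside this sketch")
    e = biased - 127                   # actual exponent
    significand = 1 + fraction / 2**23  # hidden bit restored: 1.M
    value = (-1) ** sign * significand * 2.0 ** e
    return sign, biased, fraction, value


# Example 1 from the slides: S = 1, E = 01111110, M = 100...0
bits = int("1" + "01111110" + "1" + "0" * 22, 2)
sign, biased, fraction, value = decode_single(bits)
print(sign, biased, value)  # 1 126 -0.75

# Cross-check against the machine's own interpretation of the same bits.
assert struct.unpack(">f", struct.pack(">I", bits))[0] == value
```

The cross-check with `struct` reinterprets the same 32 bits as a hardware float, so the manual decoding and the platform's IEEE 754 decoding can be compared directly.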
Add / Sub

$(s_1 \times 2^{e_1}) \pm (s_2 \times 2^{e_2}) = (s_1 \pm s_2) \times 2^{e_3} = s_3 \times 2^{e_3}$

where $s_1$ and $s_2$ have first been aligned to the common exponent $e_3$.
- s = 1.s: the hidden bit is used during the operation.

1: Shift the summands so they have the same exponent:
- e.g., if e2 < e1: shift s2 right and increment e2 until e1 = e2.
2: Add/subtract the significands using the sign bits for s1 and s2.
- Set the sign bit accordingly for the result.
3: Normalize the result (sign bit kept separate):
- shift s3 left and decrement e3 until MSB = 1; if the addition carried out of the MSB, shift s3 right once and increment e3 instead.
4: Round s3 correctly.
- More than 23 / 52 bits are used internally for the addition.

Multiplication

$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = s_1 s_2 \times 2^{e_1 + e_2}$

so, multiply the significands and add the exponents.

Problem: the significand is coded in sign and magnitude; use unsigned multiplication and take care of the sign separately.
- Round the 2n-bit significand product to an n-bit significand.
- Normalize the result and compute the new exponent with respect to the bias (adding two biased exponents counts the bias twice, so it is subtracted once).

Accurate Arithmetic

1: Multiply the two significands to get the 2n-bit product:
- But we have only 23 / 52 bits to store the result!

The product is held in two registers, shown here for a 6-bit significand:

P: x0 x1 x2 x3 x4 x5
A: g r s s s s    (g = guard bit, r = round bit)

The s bits are ORed together ("sticky bit", STICKY).

Case 1: x0 = 0, shift needed. The product is shifted left one position, so g moves into P:

P: x1 x2 x3 x4 x5 g
A: r STICKY 0...0

Case 2: x0 = 1, increment the exponent. No shift is needed; g takes over the role of the round bit, and the sticky bit becomes (STICKY or r):

P: x0 x1 x2 x3 x4 x5
A: r = g, STICKY = (STICKY or r)

Rounding

2: For both cases, with r and STICKY as defined for that case:
- if r = 0, P is the correctly rounded product;
- if r = 1 and STICKY = 1, then P + 1 is the correctly rounded product;
- if r = 1 and STICKY = 0 (the "halfway case"), then P is the correctly rounded product if the least significant bit of P, x5 (or g in case 1), is 0, and P + 1 is the correctly rounded product if that bit is 1.

Division

$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$

so, divide the significands and subtract the exponents.

Problem: the significand is coded in sign and magnitude; use unsigned division (different algorithms exist) and take care of the sign separately.
- Round the n + 2 bit significand (n bits plus guard and round bits) to an n-bit significand.
- Compute the new exponent with respect to the bias.
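The multiplication and rounding rules can be sketched in Python. This is an illustration under assumptions, not a reference implementation: the name `multiply_round`, the parameter `n` and the choice to hold each significand 1.xxx as an n-bit integer with its MSB set are all made up for this example, and signs, the exponent bias, overflow and denormals are left out.

```python
def multiply_round(s1: int, e1: int, s2: int, e2: int, n: int = 24):
    """Multiply two normalized significands and round to nearest even.

    s1, s2: n-bit integers with the MSB set, encoding 1.xxx (value = s / 2**(n-1)).
    e1, e2: unbiased exponents. Assumes n >= 2. Hypothetical helper.
    Returns (s3, e3) with s3 again an n-bit integer with the MSB set.
    """
    product = s1 * s2                  # 2n-bit product, value in [1, 4)
    e3 = e1 + e2
    if product >> (2 * n - 1):         # case 2: x0 = 1, product in [2, 4)
        e3 += 1
        drop = n                       # bits g, r, s... fall below P
    else:                              # case 1: x0 = 0, shift left by one
        drop = n - 1                   # g stays in P; r, s... fall below P
    p = product >> drop                # the n bits that are kept
    r = (product >> (drop - 1)) & 1    # round bit: first discarded bit
    sticky = (product & ((1 << (drop - 1)) - 1)) != 0  # OR of the remaining bits
    # r = 0: keep P. r = 1 and sticky: P + 1. Halfway: P + 1 only if P is odd.
    if r and (sticky or (p & 1)):
        p += 1
        if p >> n:                     # rounding carried out: renormalize
            p >>= 1
            e3 += 1
    return p, e3


# 6-bit significands, as in the register diagram above.
print(multiply_round(0b101011, 0, 0b110101, 0, n=6))  # (36, 1)
print(multiply_round(0b100001, 0, 0b110000, 0, n=6))  # (50, 0)  halfway, P odd
print(multiply_round(0b100010, 0, 0b101000, 0, n=6))  # (42, 0)  halfway, P even
```

Python integers are unbounded, so the full 2n-bit product is available here. Hardware keeps only the guard, round and sticky bits of the low half, which is enough to reach the same rounding decision.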
