The IEEE 754 standard floating point number system

32 bit floating point

A floating point number has the form $\pm(1.b_1 b_2 \ldots b_{23})_2 \times 2^M$. The sign is determined by the sign bit $s$:

$$\pm = (-1)^s = \begin{cases} 1, & s = 0 \\ -1, & s = 1 \end{cases}$$

The exponent $M$ is determined by the 8 exponent bits $e_1 e_2 \ldots e_8$ as follows. The cases of all zeros and all ones are reserved for special values (zero, subnormal numbers, $\pm$infinity, NaN). Otherwise we interpret $e_1 e_2 \ldots e_8$ as a binary integer and subtract the bias value 127; the result is $M$. Thus the least exponent is $M = -126$, with the bit string $e_1 e_2 \ldots e_8 = 00000001$, and the greatest possible exponent is $M = 127$, with the bit string $e_1 e_2 \ldots e_8 = 11111110$.

The remaining 23 mantissa bits and the implied leading 1 then determine the fractional part:

$$(1.b_1 b_2 \ldots b_{23})_2 = 1 + \frac{b_1}{2} + \frac{b_2}{4} + \cdots + \frac{b_{23}}{2^{23}}.$$

The least value, when all the mantissa bits are 0, is $1$, and the greatest value, when all the mantissa bits are 1, is $2 - 2^{-23}$.

Combining these, we see that the largest floating point number has the bit string

0 11111110 11111111111111111111111

which represents the value $(2 - 2^{-23}) \times 2^{127} = 2^{128} - 2^{104} \approx 3.4 \times 10^{38}$. The smallest positive normalized floating point number has the bit string

0 00000001 00000000000000000000000

with the value $2^{-126} \approx 1.2 \times 10^{-38}$. The machine epsilon, which is the spacing between consecutive floating point numbers in the interval $[1, 2]$, is $2^{-23} \approx 1.2 \times 10^{-7}$.

smallest exponent                      $-126$
largest exponent                       $+127$
smallest positive normalized number    $1.2 \times 10^{-38}$
largest number                         $3.4 \times 10^{+38}$
machine epsilon                        $1.2 \times 10^{-7}$

64 bit floating point

This uses 1 sign bit, 11 exponent bits, and 52 bits for the mantissa, not counting the leading 1. The bias is 1023. With the same definitions as above, the machine epsilon is $2^{-52}$. This gives:

smallest exponent                      $-1022$
largest exponent                       $+1023$
smallest positive normalized number    $2.2 \times 10^{-308}$
largest number                         $1.8 \times 10^{+308}$
machine epsilon                        $2.2 \times 10^{-16}$
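The decoding rule for the 32 bit format can be checked directly. The following is a minimal Python sketch, not a reference implementation: the function names `decode_float32` and `decode_with_struct` are hypothetical, and the code handles only normalized numbers (exponent bits neither all zeros nor all ones). It applies the sign, bias, and implied leading 1 by hand, and compares against the standard library's `struct` module. For the 64 bit format it reads the corresponding constants from `sys.float_info`, assuming (as is the case on common platforms) that Python's `float` is an IEEE 754 64 bit number.

```python
import struct
import sys


def decode_float32(bits: str) -> float:
    """Decode a normalized 32 bit pattern written as 's eeeeeeee mmm...m'."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    s = int(bits[0])                # sign bit
    e = int(bits[1:9], 2)           # 8 exponent bits as a binary integer
    if e == 0 or e == 255:
        raise ValueError("all-zeros/all-ones exponents encode special values")
    frac = int(bits[9:], 2)         # 23 mantissa bits as a binary integer
    M = e - 127                     # subtract the bias
    significand = 1 + frac / 2**23  # implied leading 1 plus fractional part
    return (-1) ** s * significand * 2.0**M


def decode_with_struct(bits: str) -> float:
    """Cross-check: let the struct module interpret the same 32 bits."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


largest = decode_float32("0 11111110 11111111111111111111111")
smallest = decode_float32("0 00000001 00000000000000000000000")
# The next number after 1.0 has only the last mantissa bit set.
epsilon = decode_float32("0 01111111 00000000000000000000001") - 1.0

print("32 bit largest :", largest)
print("32 bit smallest:", smallest)
print("32 bit epsilon :", epsilon)

# 64 bit counterparts, as reported by the Python runtime.
print("64 bit largest :", sys.float_info.max)
print("64 bit smallest:", sys.float_info.min)
print("64 bit epsilon :", sys.float_info.epsilon)
```

The hand decoding works in 64 bit arithmetic, which represents every normalized 32 bit value exactly, so the comparison with `struct` is an exact equality rather than an approximate one.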
