Download PPT - Bucknell University

CSCI206 - Computer Organization & Programming
Introduction to Floating Point Numbers
zyBook: 10.5
Developed and maintained by the Bucknell University Computer Science Department - 2017

Real Numbers in Binary
Recall the equation describing a positional number system:
    value = d(n-1) * base^(n-1) + ... + d(1) * base^1 + d(0) * base^0
This can be extended to real numbers: digits to the right of the decimal point carry negative exponents.
    value = d(n-1) * base^(n-1) + ... + d(0) * base^0 + d(-1) * base^(-1) + d(-2) * base^(-2) + ...
http://www.virtualnerd.com/pre-algebra/real-numbers-right-triangles/real-and-irrational/define-realnumbers/real-number-definition

Real Numbers in Binary
Example: 0.1 in binary is 2^(-1) = 0.5 (base 10).

Real Numbers in Binary
Fractions of a power of 2 are easy.
Other values are represented using the sum of fractions with a power of 2 denominator.

Algorithm to Convert Decimal to Binary
For decimal to binary integers we divide by 2 and record the remainder.
For decimal to binary real numbers we multiply by 2 and record the integer part.
Example: convert 0.09375 to binary.
    0.09375 * 2 = 0.1875   -> 0
    0.1875  * 2 = 0.375    -> 0
    0.375   * 2 = 0.75     -> 0
    0.75    * 2 = 1.5      -> 1   (only work with the fractional part: continue with 0.5)
    0.5     * 2 = 1.0      -> 1   (fraction part is zero, stop!)
The recorded integer parts are the binary digits, beginning to the right of the decimal point:
    0.09375 = 0.00011 (binary)
It may look like magic, but the reasoning is this: v is expressed as b0*2^(-1) + b1*2^(-2) + ..., so each multiplication by 2 shifts the next digit into the integer position, where it can be read off. The algorithm stops at the k for which the remaining value times 2^k is exactly 1.0.

Convert binary to decimal
Convert 0.00011 (base 2) to decimal:
    0.00011 (base 2) == 2^(-4) + 2^(-5) == 1/16 + 1/32 == 3/32 == 0.09375 (base 10)

Number Representation in Computing
For a given range of integers, there is a corresponding range of (exact) binary representations.
Example: the range [0-15] corresponds to the 4-bit binary numbers 0000 through 1111.

Number Representation in Computing
Within a range of real numbers, there is no way to encode all possible values.
Example: the range [0.0 - 1.0] has an infinite number of points, so we would need an infinite number of bits to represent all of the possible values!
As a result, in computing, real numbers are approximate.

Activity 24, question 1 - 2

An Observation
Many common base 10 real numbers generate an infinite number of binary digits.

Fixed vs. Floating Point Approximations

Floating Point Representation
    Decimal notation     Scientific notation
    2                    2×10^0
    300                  3×10^2
    321.7                3.217×10^2
    −53,000              −5.3×10^4
    6,720,000,000        6.72×10^9
    0.2                  2×10^−1

    value = (−1)^S × M × 2^E
Where
    S is the sign bit
    M is a fixed point number (precision of numbers)
    E is a signed integer (range of numbers)
32 or 64 bit word:  S | Exponent | Mantissa or Fraction

IEEE 754 Standard (1985)
    S | Exponent | Mantissa
One bit for Sign
Single precision float (32 bits): 8 bit Exponent, 23 bit Mantissa
Double precision float (64 bits): 11 bit Exponent, 52 bit Mantissa

IEEE 754 Standard (1985)
    S | Exponent | Mantissa
Mantissa is normalized, meaning it is a fixed point number in the form 1.xxxxxx. To save one bit, the 1. is implicit (not represented).
Exponent is represented in biased form: B = 127 for single, B = 1023 for double.

IEEE 754 Standard (1985) (normalized)
    S | Exponent | Mantissa
    value = (−1)^S × 1.M × 2^(E − B)
S, E, and M are encoded in the binary word.

IEEE754 - Reserved Values
Not a Number (NaN)

IEEE754 - Example
Show 3.14 as a single precision float.

3.14 - step 1 write in binary
3.14 == 3 + 0.14
    0.14*2 = 0.28   -> 0
    0.28*2 = 0.56   -> 0
    0.56*2 = 1.12   -> 1
    0.12*2 = 0.24   -> 0
    ......
    0.14 == 0.0010...... (binary)

3.14 - step 1 write in binary
Need 24 bits for single (53 for double):
    3.14 == 11.0010001111010111000011

3.14 - step 2 normalize binary
Normalized form is 1.yyyyy
    3.14 == 11.0010001111010111000011 == 1.10010001111010111000011 × 2^1
Note a total of 24 bits.

3.14 - step 3 write mantissa & sign
    3.14 == 1.10010001111010111000011 × 2^1
    M = 10010001111010111000011
    S = 0 (positive)
Note that the mantissa keeps only 23 bits. The leading bit is always 1, so it is omitted from the representation (and only from the representation: it is still part of the value).
3.14 - step 4 encode exponent
    3.14 == 1.10010001111010111000011 × 2^1
    Exponent = 1, B = 127
    E (biased exponent) = 1 + 127 = 128 = 1000 0000 (8 bits)

3.14 - step 5 write result
    S = 0 (positive)
    E = 1000 0000
    M = 10010001111010111000011
    0 1000 0000 10010001111010111000011
    to hex = 0x4048f5c3

Endianness
On a little-endian system (Intel, etc), the IEEE754 value is byte & word swapped in memory.
    0x 40 48 f5 c3   (big endian)
       4840 c3f5     swap the bytes within each 16-bit word
    0x c3f5 4840     swap the words (little endian)

    float f = 3.14;
    unsigned char* p = (unsigned char*)&f;
    printf("%02x%02x %02x%02x\n", *p, *(p+1), *(p+2), *(p+3));
    // result: c3f5 4840
