Floating Point Numbers
• Floating point is used to represent “real” numbers
  • 1.23233, 0.0003002, 3323443898.3325358903
  • Real means “not imaginary”
• Computer floating-point numbers are a subset of real numbers
  • Limit on the largest/smallest number represented
    • Depends on number of bits used
  • Limit on the precision
    • 12345678901234567890 --> 12345678900000000000
• Floating point numbers are approximate, while integers are exact representations

Seattle Pacific University EE/CS/CPE 3760 - Computer Organization Ch3d

Scientific Notation
+ 34.383 x 10^2 = 3438.3         (Sign, Significand, Exponent)
+ 3.4383 x 10^3 = 3438.3         Normalized form: only one digit before the decimal point
+3.4383000E+03 = 3438.3          Floating point notation
An 8-digit significand can only represent 8 significant digits

Binary Floating Point Numbers
+ 101.1101 = 1 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4
           = 4 + 0 + 1 + 1/2 + 1/4 + 0 + 1/16
           = 5.8125
+1.011101 E+2    Normalized so that the binary point immediately follows the leading digit
Note: First digit is always non-zero --> First digit is always one.

Converting Decimal Fractions to Binary
Multiply by a power of 2, convert to binary, divide by the same power of 2
Example: 13.387
1. Multiply by 2^20
   13.387 x 1048576 = 14037286.912
2. If fraction remains, multiply by a larger number or truncate it
3. Convert integer portion to binary
   14037286_10 = 110101100011000100100110_2
4. Divide by 2^20 (shift radix point left 20)
   1101.01100011000100100110_2 (20 bits after the radix point)
This works with any power of 2! Use larger powers to get more bits.

IEEE Floating Point Format
Bit 31: Sign (0: Positive, 1: Negative)
Bits 30-23: Exponent, 8 bits. Biased by 127.
Bits 22-0: Significand, 23 bits.
Leading ‘1’ is implied, but not represented
Number = (-1)^S x (1 + Sig) x 2^(E-127)
• Allows representation of numbers in range 2^-127 to 2^+128 (10^±38)
• Since the significand always starts with ‘1’, we don’t have to represent it explicitly
  • Significand is effectively 24 bits
• Zero is represented by Sign=Significand=Exp=0

IEEE Double Precision Format
Bit 63: Sign
Bits 62-52: Exponent, 11 bits. Bias: 1023
Bits 51-32: Significand, upper 20 bits
Bits 31-0: Significand, lower 32 bits
Number = (-1)^S x (1 + Sig) x 2^(E-1023)
• Allows representation of numbers in range 2^-1023 to 2^+1024 (10^±308)
• Larger significand means more precision
• Takes two registers to hold one number

Conversion
Convert 5.75 to Single-Precision IEEE Floating Point
1. Convert 5.75_10 to Binary ---> 101.11_2
2. Normalize ---> 1.0111 x 2^2 (Significand = 1.0111, Exponent = 2)
3. Sign = 0 (positive).
4. Add 127 (bias) to exponent. Exponent = 129_10 = 10000001_2
5. Express significand as 24 bits: Sig = 1.01110000000000000000000
6. Remove leading one from significand, leaving 23 bits: Sig = .01110000000000000000000
7. Put in proper bit fields
   Number = 0 10000001 01110000000000000000000 = 0x40B80000

Adding Floating Point Numbers
1.2232E+3 + 4.211E+5
1. Normalize to higher exponent
   a. Find the difference between exponents (= 2)
   b. Shift smaller number right by that amount: 1.2232E+3 == 0.012232E+5
2. Now that exponents are the same, add significands together
     4.211    E+5
   + 0.012232 E+5
   = 4.223232 E+5
Note: If carry out of MSD, re-normalize
     5.0 E+2
   + 7.0 E+2
   = 12.0 E+2 = 1.2 E+3

Adding IEEE Floating Point Numbers
                 S E  Sig.
  0x45B8CD8D --> 0 8B 38CD8D = 5913.694_10
+ 0x46FC8672 --> 0 8D 7C8672 = 32323.22_10
1. Check for Sign=Exp=Significand=0 --> If so, treat as a special case
2. Put the ‘1’ back in bit 23 of significands
   38CD8D = 011 1000 1100 1101 1000 1101 ---> 1011 1000 1100 1101 1000 1101 = B8CD8D
   7C8672 = 111 1100 1000 0110 0111 0010 ---> 1111 1100 1000 0110 0111 0010 = FC8672
     0 8B B8CD8D
   + 0 8D FC8672
3. Normalize to higher exponent:
   a. Find difference in exponents: 8D - 8B = 2
   b. Shift significand of number with smaller exponent right by the difference
      B8CD8D = 1011 1000 1100 1101 1000 1101
      right shift by 2 --> 0010 1110 0011 0011 0110 0011 = 2E3363
   c. Set lower-valued exponent to higher one
        0 8D 2E3363 (re-normalized form of 0 8B B8CD8D)
      + 0 8D FC8672
4. Add significands: (note: carry produced one too many bits)
       0010 1110 0011 0011 0110 0011
     + 1111 1100 1000 0110 0111 0010
     1 0010 1010 1011 1001 1101 0101 = 12AB9D5
5. Since bit 24 is ‘1’, we must re-normalize by shifting significand right 1 and incrementing exponent by one.
   1 0010 1010 1011 1001 1101 0101 SRL --> 1001 0101 0101 1100 1110 1010 = 955CEA (significand)
   exp: 8D --> 8E
6. Get rid of bit 23 in significand (for IEEE standard)
   1001 0101 0101 1100 1110 1010 --> 001 0101 0101 1100 1110 1010 = 155CEA
Result is: 0 8E 155CEA or 0x47155CEA = 38236.91_10

Multiplying Floating Point Numbers
34.233 E +09 * 212.32 E +03
1. Add exponents: --> 9 + 3 = 12
2. Multiply significands --> 34.233 * 212.32 = 7268.35056
   Note: Number of digits to right of decimal point in product = sum of the number of digits to right of decimal points in factors
3. Result is 7268.35056 E +12
4. Normalize: 7.26835056 E +15
5. Truncate extra digits...
   --> 7.26835 E +15

Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694_10
x 0 8D 7C8672 = 32323.22_10
1. Check for zero.
2. Add exponents. (Note: both have the bias of 127 already. Only want to bias once, so subtract 127 (7F).)
   8B = 0C+7F. 8D = 0E+7F.
   Sum: (0C+7F)+(0E+7F)-7F = (1A+7F) = 99
3. Put ‘1’ back onto bit 23, multiply significands.
   38CD8D --> B8CD8D
   7C8672 --> FC8672
   Multiplying two 24-bit numbers, each with 23 bits to the right of the binary point: result has 46 bits to the right of the point
   B8CD8D * FC8672 = 10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
5. Re-normalize so one place to left of binary point.
   1.011 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
   (Add one to exponent) --> 99 + 1 = 9A
6. Remove extra bits so only 24 bits remain (truncate)
   1.011 0110 0100 1011 0110 0100
7. Remove implied one (bit 23)
   011 0110 0100 1011 0110 0100
Result is: 0 9A 364B64 ≈ 191149632.1747_10
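
The "Converting Decimal Fractions to Binary" steps can be tried out with a few lines of Python. This is only a sketch: the function name to_binary_fixed and the frac_bits parameter are made up for illustration, it handles non-negative values only, and Python's own float for 13.387 is itself an approximation.

```python
def to_binary_fixed(value, frac_bits=20):
    """Scale by 2**frac_bits, truncate, convert to binary, re-insert the point.

    Hypothetical helper; assumes value >= 0 and frac_bits >= 1.
    """
    scaled = int(value * 2**frac_bits)      # steps 1-2: multiply, drop leftover fraction
    bits = format(scaled, "b")              # step 3: integer portion in binary
    bits = bits.rjust(frac_bits + 1, "0")   # keep at least one digit left of the point
    # step 4: dividing by 2**frac_bits just moves the radix point left
    return bits[:-frac_bits] + "." + bits[-frac_bits:]


print(to_binary_fixed(13.387))  # 1101.01100011000100100110
```

Passing a larger frac_bits gives more bits after the point, matching the "use larger powers to get more bits" remark.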
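
The seven steps on the "Conversion" slide, and the formula Number = (-1)^S x (1 + Sig) x 2^(E-127), can be sketched in Python as follows. The names encode_single and decode_single are hypothetical. The sketch assumes a finite, nonzero input whose exponent fits the 8-bit field, and it truncates extra significand bits where real hardware would normally round, so it can differ from struct in the last bit for values that are not exactly representable.

```python
import math
import struct


def encode_single(value):
    """Build the 32-bit pattern for a finite, nonzero value by hand (truncating)."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("sketch handles finite, nonzero values only")
    sign = 1 if value < 0 else 0
    mag = abs(value)
    exp = 0
    while mag >= 2.0:                 # normalize: exactly one '1' before the point
        mag /= 2.0
        exp += 1
    while mag < 1.0:
        mag *= 2.0
        exp -= 1
    biased = exp + 127                # add the bias
    sig = int((mag - 1.0) * 2**23)    # drop the implied 1, keep 23 bits
    return (sign << 31) | (biased << 23) | sig


def decode_single(word):
    """Apply Number = (-1)^S x (1 + Sig) x 2^(E-127) to a 32-bit pattern."""
    sign = word >> 31
    biased = (word >> 23) & 0xFF
    sig = word & 0x7FFFFF
    return (-1) ** sign * (1 + sig / 2**23) * 2.0 ** (biased - 127)


print(hex(encode_single(5.75)))                                 # 0x40b80000
print(hex(struct.unpack(">I", struct.pack(">f", 5.75))[0]))     # 0x40b80000
print(decode_single(0x40B80000))                                # 5.75
```

The struct line is there only as a cross-check against the machine's own single-precision encoding.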
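
The six steps of "Adding IEEE Floating Point Numbers" map directly onto integer shifts and masks. The sketch below uses a hypothetical function name, add_single, and assumes both inputs are positive single-precision bit patterns. Like the slides, it truncates shifted-out bits and treats only the all-zero pattern as special.

```python
def add_single(a, b):
    """Add two positive single-precision bit patterns, following the slide steps."""
    if a == 0:                         # step 1: zero is a special case
        return b
    if b == 0:
        return a
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    sa = (a & 0x7FFFFF) | (1 << 23)    # step 2: put the '1' back in bit 23
    sb = (b & 0x7FFFFF) | (1 << 23)
    if ea < eb:                        # step 3: shift the smaller number right
        sa >>= eb - ea
        exp = eb
    else:
        sb >>= ea - eb
        exp = ea
    total = sa + sb                    # step 4: add significands
    if total >> 24:                    # step 5: carry into bit 24, re-normalize
        total >>= 1
        exp += 1
    return (exp << 23) | (total & 0x7FFFFF)   # step 6: drop bit 23 and pack


print(hex(add_single(0x45B8CD8D, 0x46FC8672)))  # 0x47155cea
```

Handling of negative operands, rounding, and exponent overflow is left out on purpose to keep the correspondence with the slides visible.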
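
The "Multiplying IEEE Floating Point Numbers" steps can be sketched the same way. The function name mul_single is hypothetical, and the sketch makes simplifying assumptions: only the all-zero pattern counts as zero, extra product bits are truncated rather than rounded, and exponent overflow or underflow is not checked.

```python
def mul_single(a, b):
    """Multiply two single-precision bit patterns, following the slide steps."""
    if a == 0 or b == 0:                   # step 1: check for zero
        return 0
    sign = (a >> 31) ^ (b >> 31)           # signs differ -> negative result
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    exp = ea + eb - 127                    # step 2: add exponents, remove one bias
    sa = (a & 0x7FFFFF) | (1 << 23)        # step 3: put the '1' back in bit 23
    sb = (b & 0x7FFFFF) | (1 << 23)
    prod = sa * sb                         # 46 bits to the right of the binary point
    if prod >> 47:                         # two digits left of the point: re-normalize
        exp += 1
        sig24 = prod >> 24                 # keep the top 24 bits (truncate)
    else:
        sig24 = prod >> 23
    return (sign << 31) | (exp << 23) | (sig24 & 0x7FFFFF)  # drop the implied one


print(hex(mul_single(0x45B8CD8D, 0x46FC8672)))  # 0x4d364b64
```

The printed pattern splits into the fields 0 9A 364B64, the same fields the slide arrives at.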
