Floating Point Numbers
• Floating point is used to represent "real" numbers
  • 1.23233, 0.0003002, 3323443898.3325358903
  • Real means "not imaginary"
• Computer floating-point numbers are a subset of the real numbers
  • Limit on the largest/smallest number represented
    • Depends on the number of bits used
  • Limit on the precision
    • 12345678901234567890 --> 12345678900000000000
• Floating-point numbers are approximate, while integers are exact representations

Seattle Pacific University EE/CS/CPE 3760 - Computer Organization Ch3d

Scientific Notation
+ 34.383 x 10^2 = 3438.3      (sign, significand, exponent)
+ 3.4383 x 10^3 = 3438.3      Normalized form: only one digit before the decimal point
+3.4383000E+03 = 3438.3       Floating point notation
• An 8-digit significand can only represent 8 significant digits

Binary Floating Point Numbers
+ 101.1101 (binary)
  = 1 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4
  = 4 + 0 + 1 + 1/2 + 1/4 + 0 + 1/16
  = 5.8125
+1.011101 E+2      Normalized so that the binary point immediately follows the leading digit
Note: The first digit is always non-zero --> the first digit is always one.

Converting Decimal Fractions to Binary
Multiply by a power of 2, convert to binary, divide by the same power of 2
Example: 13.387
1. Multiply by 2^20: 13.387 x 1048576 = 14037286.912
2. If a fraction remains, multiply by a larger power of 2 or truncate it
3. Convert the integer portion to binary: 14037286 (decimal) = 110101100011000100100110 (binary)
4. Divide by 2^20 (shift the radix point left 20): 1101.01100011000100100110 (binary), with 20 bits after the point
This works with any power of 2! Use larger powers to get more bits.

IEEE Floating Point Format
Bits:    31        30-23       22-0
Field:   Sign      Exponent    Significand
Width:   1 bit     8 bits      23 bits
• Sign: 0 = positive, 1 = negative
• Exponent: biased by 127
• Significand: leading '1' is implied, but not represented
Number = (-1)^S x (1 + Sig) x 2^(E-127)
• Allows representation of normalized numbers from about 2^-126 to 2^+128 (roughly 10^±38)
• Since the significand always starts with '1', we don't have to represent it explicitly
• Significand is effectively 24 bits
• Zero is represented by Sign=Significand=Exp=0

IEEE Double Precision Format
Bits:    63        62-52       51-32 and 31-0
Field:   Sign      Exponent    Significand
Width:   1 bit     11 bits     20 bits + 32 bits = 52 bits
• Exponent bias: 1023
Number = (-1)^S x (1 + Sig) x 2^(E-1023)
• Allows representation of normalized numbers from about 2^-1022 to 2^+1024 (roughly 10^±308)
• Larger significand means more precision
• Takes two 32-bit registers to hold one number

Conversion
Convert 5.75 to Single-Precision IEEE Floating Point
1. Convert 5.75 (decimal) to binary ---> 101.11 (binary)
2. Normalize ---> 1.0111 x 2^2 (significand 1.0111, exponent 2)
3. Sign = 0 (positive).
4. Add 127 (bias) to exponent. Exponent = 129 (decimal) = 10000001 (binary)
5. Express significand as 24 bits: Sig = 1.01110000000000000000000
6. Remove leading one from significand, leaving 23 bits: Sig = .01110000000000000000000
7. Put in proper bit fields: Number = 0 10000001 01110000000000000000000 = 0x40B80000

Adding Floating Point Numbers
1.2232E+3 + 4.211E+5
1. Align to the higher exponent
   a. Find the difference between exponents (= 2)
   b. Shift the smaller number right by that amount: 1.2232E+3 == 0.012232E+5
2. Now that the exponents are the same, add the significands together
     4.211    E+5
   + 0.012232 E+5
   = 4.223232 E+5
Note: If there is a carry out of the most significant digit, re-normalize:
     5.0 E+2
   + 7.0 E+2
   = 12.0 E+2 = 1.2 E+3

Adding IEEE Floating Point Numbers
                   S  E   Sig.
  0x45B8CD8D -->   0  8B  38CD8D = 5913.694 (decimal)
+ 0x46FC8672 -->   0  8D  7C8672 = 32323.22 (decimal)
1. Check for Sign=Exp=Significand=0 --> If so, treat as a special case
2. Put the '1' back in bit 23 of the significands
   38CD8D =  011 1000 1100 1101 1000 1101 ---> 1011 1000 1100 1101 1000 1101 = B8CD8D
   7C8672 =  111 1100 1000 0110 0111 0010 ---> 1111 1100 1000 0110 0111 0010 = FC8672
     0 8B B8CD8D
   + 0 8D FC8672
3. Align to the higher exponent:
   a. Find the difference in exponents: 8D - 8B = 2
   b. Shift the significand of the number with the smaller exponent right by the difference
      B8CD8D = 1011 1000 1100 1101 1000 1101
      right shift by 2 --> 0010 1110 0011 0011 0110 0011 = 2E3363
   c. Set the lower-valued exponent to the higher one
     0 8D 2E3363 (aligned form of 0 8B B8CD8D; the two bits shifted out are discarded)
   + 0 8D FC8672
4. Add significands (note: the carry produces one too many bits)
       0010 1110 0011 0011 0110 0011
   +   1111 1100 1000 0110 0111 0010
   = 1 0010 1010 1011 1001 1101 0101 = 12AB9D5
5. Since bit 24 is '1', we must re-normalize by shifting the significand right by 1 and incrementing the exponent by one.
   1 0010 1010 1011 1001 1101 0101 SRL --> 1001 0101 0101 1100 1110 1010 = 955CEA (significand)
   exp: 8D --> 8E
6. Get rid of bit 23 in the significand (for IEEE standard)
   1001 0101 0101 1100 1110 1010 --> 001 0101 0101 1100 1110 1010 = 155CEA
Result is: 0 8E 155CEA or 0x47155CEA = 38236.91 (decimal)

Multiplying Floating Point Numbers
34.233 E+09 * 212.32 E+03
1. Add exponents: --> 9 + 3 = 12
2. Multiply significands --> 34.233 * 212.32 = 7268.35056
3. Result is 7268.35056 E+12
4. Normalize: 7.26835056 E+15
   Note: The number of digits to the right of the decimal point in the product = the sum of the number of digits to the right of the decimal points in the factors
5. Truncate extra digits --> 7.26835 E+15

Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694 (decimal)
x 0 8D 7C8672 = 32323.22 (decimal)
1. Check for zero.
2. Add exponents. (Note: both already include the bias of 127. We only want to bias once, so subtract 127 (7F).)
   8B = 0C + 7F, 8D = 0E + 7F
   Sum: (0C + 7F) + (0E + 7F) - 7F = (1A + 7F) = 99
3. Put the '1' back onto bit 23: 38CD8D --> B8CD8D, 7C8672 --> FC8672
4. Multiply significands. Multiplying two 24-bit numbers, each with 23 bits to the right of the binary point, gives a result with 46 bits to the right of the point.
   B8CD8D * FC8672 = 10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
5. Re-normalize so that one digit is to the left of the binary point.
   1.011 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
   (Add one to exponent) --> 99 + 1 = 9A
6. Remove extra bits so only 24 bits remain (truncate)
   1.011 0110 0100 1011 0110 0100
7. Remove implied one (bit 23)
   011 0110 0100 1011 0110 0100
Result is: 0 9A 364B64 = 191149632 (decimal); for comparison, 5913.694 x 32323.22 = 191149632.1747
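
The worked examples above (scaling 13.387 by 2^20, encoding 5.75, and adding and multiplying 0x45B8CD8D and 0x46FC8672) can be checked with a short Python sketch. The function names here are illustrative and not part of the course material. The sketch assumes positive, normalized operands only, and it truncates shifted-out bits exactly as the slides do. Real IEEE hardware normally rounds to nearest, so its result can differ from these in the last bit.

```python
import struct

BIAS = 127          # single-precision exponent bias
HIDDEN = 1 << 23    # the implied leading '1' (bit 23)

def fraction_to_binary(x, bits=20):
    """Scale by 2**bits, truncate, convert to binary, re-insert the point."""
    digits = format(int(x * 2**bits), "b").zfill(bits + 1)
    return digits[:-bits] + "." + digits[-bits:]

def float_to_bits(x):
    """Encode x as an IEEE single-precision 32-bit word."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def split(word):
    """Return (sign, biased exponent, 23-bit stored significand)."""
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

def join(sign, exp, frac):
    """Pack the three fields back into a 32-bit word."""
    return (sign << 31) | (exp << 23) | frac

def decode(word):
    """Number = (-1)^S x (1 + Sig) x 2^(E-127); all-zero word means zero."""
    sign, exp, frac = split(word)
    if exp == 0 and frac == 0:
        return 0.0
    return (-1) ** sign * (1 + frac / 2**23) * 2.0 ** (exp - BIAS)

def add_trunc(a, b):
    """Add two positive normalized singles, truncating like the slides."""
    _, ea, fa = split(a)
    _, eb, fb = split(b)
    sa, sb = fa | HIDDEN, fb | HIDDEN      # put the '1' back in bit 23
    if ea < eb:                            # make (ea, sa) the larger exponent
        ea, sa, eb, sb = eb, sb, ea, sa
    sb >>= ea - eb                         # align to the higher exponent
    total = sa + sb
    if total >> 24:                        # carry into bit 24: re-normalize
        total >>= 1
        ea += 1
    return join(0, ea, total & 0x7FFFFF)   # drop bit 23 again

def mul_trunc(a, b):
    """Multiply two positive normalized singles, truncating like the slides."""
    _, ea, fa = split(a)
    _, eb, fb = split(b)
    exp = ea + eb - BIAS                   # bias must be counted only once
    prod = (fa | HIDDEN) * (fb | HIDDEN)   # 48-bit product, 46 fraction bits
    if prod >> 47:                         # product is 1x.xxx: re-normalize
        prod >>= 1
        exp += 1
    return join(0, exp, (prod >> 23) & 0x7FFFFF)

print(fraction_to_binary(13.387))                    # 1101.01100011000100100110
print(f"0x{float_to_bits(5.75):08X}")                # 0x40B80000
print(f"0x{add_trunc(0x45B8CD8D, 0x46FC8672):08X}")  # 0x47155CEA
print(f"0x{mul_trunc(0x45B8CD8D, 0x46FC8672):08X}")  # 0x4D364B64
print(decode(0x47155CEA))                            # 38236.9140625
```

Working on the integer bit fields rather than on Python floats keeps every shift, carry, and truncation visible, so each intermediate value (2E3363, 12AB9D5, 955CEA, exponent 9A) can be printed and compared against the corresponding step above.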