
Appendix B Floating Point Numbers

Making a computer's limited bit words represent a larger range of numerical values.
Modeling real floating-point number representations.

Mantissa: the defined fractional component of a floating-point number
Exponent: the exponentiation radix and the value of the exponent

$X = frac \times R^{exponent}$ or $X = (-1)^{sign} \times frac \times R^{exponent}$

Decimal floating point: range of values

Binary Floating Point

Typical two's complement

$X = frac \times R^{exponent}$

The fractional portion is in one of:
• Fractional two's complement: $1 > frac \ge 0.5$, $frac = 0$, and $-0.5 > frac \ge -1$
• Mixed two's complement: $2^{Off} > frac \ge 2^{Off-1}$ and $-2^{Off-1} > frac \ge -2^{Off}$
• Hidden 1, two's complement: $2 > frac \ge 1$ and $-1 > frac \ge -2$

The exponent portion is in (E is the number of exponent bits):
• Integer two's complement: $2^{E-1} - 1 \ge exp \ge -2^{E-1}$

Note: specific values of the exponent may be used as "flags". Flags may be required to represent 0 and infinity.

TI initially used a hidden-1, two's complement format for floating point. They now support IEEE, which is based on hidden-1, sign-magnitude.

Typical sign-magnitude

$X = (-1)^{sign} \times frac \times R^{exponent}$

The fractional portion (a magnitude, since the sign is carried separately) is in one of:
• Fractional sign-magnitude: $1 > frac \ge 0$
• Mixed sign-magnitude: $2^{Off} > frac \ge 2^{Off-1}$
• Hidden 1, sign-magnitude: $2 > frac \ge 1$

The exponent portion is in (E is the number of exponent bits):
• Excess M (integer with offset M): $2^{E} - 1 - M \ge exp \ge -M$

Note: specific values of the exponent may be used as "flags". Flags may be required to represent 0 and infinity.

IEEE floating point is based on hidden-1, sign-magnitude, excess exponent.

Normalized Numbers

Numerical processes may result in numbers that are not in the correctly normalized floating-point format.
Examples of normalized and unnormalized numbers

Terms:
Normalized: the number is in the defined format.
Unnormalized: the number is not in the normalized format.
Denormalized: an unnormalized number, but it may be that way for a reason.

Denormalized representations are required when adding or subtracting two normalized numbers! Since the operands may have unequal exponents, the mantissa of the smaller number has to be shifted until both exponents match before the mantissas can be combined.

IEEE Std. 754 Floating Point Numbers

IEEE single-precision floating-point numbers are 32-bit representations with separate fraction and exponent fields. The fraction is hidden-1 sign-magnitude, and the exponent is in excess-127 format. The double-precision format has the same layout with wider fields.
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4610935&tag=1

| Field | Single | Double | Contribution to the value |
| Sign (s) | 1 bit | 1 bit | $(-1)^s$ |
| Exponent (exp) | 8 bits | 11 bits | Single: $2^{exp-127}$; Double: $2^{exp-1023}$ |
| Fraction (f) | 23 bits | 52 bits | Norm: $1.f_1 f_2 \ldots$; Denorm: $0.f_1 f_2 \ldots$ |

Representation based on the exponent range for 32-bit single precision.
Excess 127: the "8-bit unsigned integer" is offset by 127 (i.e. UInt - 127). 0 and 255 are reserved values.

| Exponent | Value | Type |
| 1111 1111 (255) | Inf, NaN, Flags | Special |
| 1111 1110 (254) | $v = (-1)^s \times 1.f_1 f_2 \ldots \times 2^{127}$ | Normalized |
| ... | ... | ... |
| 1000 0000 (128) | $v = (-1)^s \times 1.f_1 f_2 \ldots \times 2^{1}$ | Normalized |
| 0111 1111 (127) | $v = (-1)^s \times 1.f_1 f_2 \ldots \times 2^{0}$ | Normalized |
| 0111 1110 (126) | $v = (-1)^s \times 1.f_1 f_2 \ldots \times 2^{-1}$ | Normalized |
| ... | ... | ... |
| 0000 0001 (1) | $v = (-1)^s \times 1.f_1 f_2 \ldots \times 2^{-126}$ | Normalized |
| 0000 0000 (0) | $v = (-1)^s \times 0.f_1 f_2 \ldots \times 2^{-126}$ | Denormalized |

Examples of IEEE Single Precision Floating Point

| IEEE Hex | Binary (upper 16 bits) | Interpretation | Value |
| 3F80 0000 | 0011 1111 1000 0000 etc | Norm: $v = (-1)^0 \times 2^{127-127} \times 1.0000000$ | $v = 1 \times 2^{0}$ |
| BFC0 0000 | 1011 1111 1100 0000 etc | Norm: $v = (-1)^1 \times 2^{127-127} \times 1.1000000$ | $v = -1.5 \times 2^{0}$ |
| 7F80 0000 | 0111 1111 1000 0000 etc | Special: (exp = 1111 1111) | +Inf |
| FF00 0000 | 1111 1111 0000 0000 etc | Norm: $v = (-1)^1 \times 2^{254-127} \times 1.0000000$ | $v = -1 \times 2^{127}$ |
| 0070 0000 | 0000 0000 0111 0000 etc | Denorm: $v = (-1)^0 \times 2^{-126} \times 0.1110000$ | $v = \frac{7}{8} \times 2^{-126}$ |

Practice (fill in the missing column):

| IEEE Hex/Binary | Value |
| | $v = 1 \times 2^{0}$ |
| | $v = 123$ |
| 4000 0000 | |
| BFFF 0000 | |

See: This is an on-line converter that I really liked!
http://babbage.cs.qc.edu/IEEE-754/

B1) Convert the following numbers to IEEE single precision floating-point format. Give the results as eight hexadecimal digits.

| Value | Sign bit | Binary w/ fraction | Exponent | Shifted Binary | Excess 127 Exp | Binary IEEE (upper 16 bits) | Hex IEEE |
| 9 | 0 | 1001.0 | 3 = 0000 0011 | 1.001 | 130 = 1000 0010 | 0100 0001 0001 0000 | 4110 0000 |
| 5/32 | 0 | 0.00101 | -3 = 1111 1101 | 1.010 | 124 = 0111 1100 | 0011 1110 0010 0000 | 3E20 0000 |
| -5/32 | 1 | 0.00101 | -3 = 1111 1101 | 1.010 | 124 = 0111 1100 | 1011 1110 0010 0000 | BE20 0000 |
| 6.125 | 0 | 0110.0010 | 2 = 0000 0010 | 1.100010 | 129 = 1000 0001 | 0100 0000 1100 0100 | 40C4 0000 |

B2) Convert the following IEEE single-precision floating-point numbers from hex to decimal:

| IEEE Value | Sign bit | Excess 127 Exp | Exponent | Fraction Format | Scaled Value | Magnitude Binary | Decimal Value |
| 42E4 8000 | 0 | 1000 0101 = 133 | 6 | 1.11001001000 | 2^6 x 1.11001001000 | 111 0010.01000 | 114.25 |
| 3F88 0000 | 0 | 0111 1111 = 127 | 0 | 1.0001000 | 2^0 x 1.0001000 | 1.0001000 | 1.0625 |
| 0080 0000 | 0 | 0000 0001 = 1 | -126 | 1.0000 | 2^-126 x 1.0 | 0.000...1 | 2^-126 |
| C7F0 0000 | 1 | 1000 1111 = 143 | 16 | 1.1110000 | -2^16 x 1.1110 | 11110000000000000 | -122,880 |

Everyone should read: What Every Computer Scientist Should Know About Floating-Point Arithmetic

David Goldberg. 1991. What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23, 1 (March 1991), 5-48. DOI=10.1145/103162.103163
http://doi.acm.org/10.1145/103162.103163
http://dl.acm.org/citation.cfm?id=103163
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#674

Abstract: Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.

Another on-line article: Five Tips for Floating Point Programming, by John D. Cook, 30 Oct 2008
http://www.codeproject.com/Articles/29637/Five-Tips-for-Floating-Point-Programming

From Wikipedia: http://en.wikipedia.org/wiki/Floating_point
• Dealing with exception cases
• Accuracy problems
• Machine precision definition
• Example of well-formed and poorly formed iteration for computing pi.

The Wikipedia article includes an example of "catastrophic cancellation", where an iterative algorithm performs subtraction and the numerical accuracy of the result is terrible or "dangerously inaccurate".

If you are going to be involved with "engineering or scientific" numerical processing, take a class in Numerical Analysis.
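The catastrophic cancellation mentioned above can be seen in a few lines. The following is a minimal sketch, not the pi iteration from the Wikipedia article: it swaps in a simpler, well-known case, computing 1 - cos(x) for a small x. The function names are hypothetical, and Python floats are assumed to be IEEE double precision.

```python
import math


def one_minus_cos_naive(x):
    """Subtract two nearly equal numbers: cos(x) is almost exactly 1 for small x."""
    return 1.0 - math.cos(x)


def one_minus_cos_stable(x):
    """Same quantity via the identity 1 - cos(x) = 2*sin(x/2)**2, with no subtraction."""
    s = math.sin(x / 2.0)
    return 2.0 * s * s


if __name__ == "__main__":
    x = 1.0e-8
    # Leading Taylor term x^2/2; the next term, x^4/24, is far below
    # double-precision resolution at this x, so it serves as a reference.
    reference = x * x / 2.0
    for name, f in (("naive", one_minus_cos_naive), ("stable", one_minus_cos_stable)):
        value = f(x)
        rel_err = abs(value - reference) / reference
        print(f"{name:7s} value={value:.17e} relative error={rel_err:.3e}")
```

The naive form loses essentially every significant digit because the information about x sits below the last bit of cos(x), while the rearranged form keeps nearly full precision. Rewriting a formula to avoid subtracting nearly equal quantities is one of the standard remedies covered in a numerical analysis course.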
Floating Point Arithmetic Hardware Implementation

Addition/Subtraction of floating point numbers:
• Denormalize the smaller number (shift its mantissa until its exponent matches the larger exponent)
• Add the mantissas
• Normalize the mantissa and adjust the exponent or form the correct representation.
• Check to see if the exponent has overflowed and form the correct representation.

[Figure: Floating Point Add-Subtract (from Chapter 6, Computer System Design and Architecture, V. Heuring and H. Jordan, Addison Wesley, 1997, ISBN 0-8053-4330-X)]

Multiplication of floating point numbers:
• Multiply the mantissas
• Add the exponents
• Normalize the mantissa and adjust the new exponent or form the correct representation.
• Check to see if the exponent has overflowed and form the correct representation.

[Figure: floating point multiply block diagram. Inputs A(Sign, Exp, Mant.) and B(Sign, Exp, Mant.); blocks XOR Sign, Add Exponents, Multiply Fractions, Normalize; output C(Sign, Exp, Mant.)]
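The multiplication steps listed above can be sketched in software on IEEE single-precision bit fields. This is an illustrative sketch under stated assumptions, not the hardware design from the cited figure: the helper names (`unpack_fields`, `pack_fields`, `fp_multiply`) are hypothetical, only normalized inputs and results are handled, and the product is truncated rather than rounded.

```python
import struct


def unpack_fields(x):
    """Return (sign, excess-127 exponent, 24-bit mantissa) of x as a single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 255:
        # Assumption of this sketch: zero, denormals, Inf and NaN are not handled.
        raise ValueError("only normalized numbers are supported")
    return sign, exp, frac | 0x800000  # restore the hidden 1


def pack_fields(sign, exp, mant):
    """Assemble the fields into a float, dropping the hidden 1 from the mantissa."""
    bits = (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_multiply(a, b):
    sa, ea, ma = unpack_fields(a)
    sb, eb, mb = unpack_fields(b)

    sign = sa ^ sb            # XOR the signs
    exp = ea + eb - 127       # add the exponents, removing one copy of the excess
    prod = ma * mb            # multiply the mantissas: a 47- or 48-bit product

    # Normalize: the product of two values in [1, 2) lies in [1, 4).
    if prod & (1 << 47):      # product in [2, 4): shift one extra place, bump exponent
        prod >>= 24
        exp += 1
    else:                     # product already in [1, 2)
        prod >>= 23
    # The right shifts discard low-order bits (truncation, an assumption of this sketch).

    # Check the exponent against the reserved values 0 and 255.
    if exp >= 255:
        raise OverflowError("exponent overflow")
    if exp <= 0:
        raise ValueError("exponent underflow (denormal results not handled)")
    return pack_fields(sign, exp, prod)


if __name__ == "__main__":
    for a, b in ((1.5, 2.5), (1.5, 1.5), (-9.0, 5 / 32)):
        print(a, "*", b, "=", fp_multiply(a, b))
```

The operands in the usage loop are exactly representable and have short mantissas, so truncation does not affect them. IEEE 754 arithmetic rounds to nearest by default, which would need guard bits kept from the discarded part of the product; that step is omitted here.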
